nlp-langid
Description
Note
More information about the service specification can be found in the Core concepts > Service documentation.
The purpose of a language identification service (langid for short) is to detect which language a snippet of text is written in. Detection usually works well from a couple of phrases onward, so there is no need to send a whole 100-page document to this service. If the input text contains several languages, the service outputs the dominant one. Note that some languages are closer to each other than others, e.g. the Latin-based languages.
The current langid model is based on a naive Bayes implementation that multiplies n-gram probabilities and assumes equal a priori probabilities for all languages. To simplify the implementation, n is fixed for a given model. n-grams are produced by simply sliding a window of length n over the input string. Tests have shown that 3-grams provide satisfactory results for at least 10 languages. Once the models are loaded, computation is fast: scoring one n-gram is essentially O(1), since it only requires dictionary lookups to retrieve the probabilities. The total computation time is therefore proportional to the length of the string and to the number of languages in the model set, which is very reasonable.
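The approach described above can be illustrated with a minimal sketch. It is not the service's actual code: the function names (`ngrams`, `train`, `identify`), the toy training sentences, and the `UNSEEN_LOG_PROB` penalty for n-grams missing from a model are all assumptions made for illustration. The sketch sums log-probabilities, which is equivalent to multiplying the probabilities but avoids numerical underflow on longer inputs. Because the priors are assumed equal, no prior term appears in the score.

```python
import math
from collections import Counter


def ngrams(text: str, n: int = 3) -> list[str]:
    """Slide a window of length n over the text, one character at a time."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train(samples: dict[str, str], n: int = 3) -> dict[str, dict[str, float]]:
    """Build one dictionary per language mapping n-gram -> log-probability."""
    models = {}
    for lang, text in samples.items():
        counts = Counter(ngrams(text.lower(), n))
        total = sum(counts.values())
        models[lang] = {g: math.log(c / total) for g, c in counts.items()}
    return models


# Hypothetical penalty for n-grams unseen in training; the real service may
# handle missing n-grams differently.
UNSEEN_LOG_PROB = math.log(1e-6)


def identify(text: str, models: dict[str, dict[str, float]], n: int = 3) -> str:
    """Return the language with the highest summed n-gram log-probability."""
    grams = ngrams(text.lower(), n)
    scores = {
        lang: sum(table.get(g, UNSEEN_LOG_PROB) for g in grams)
        for lang, table in models.items()
    }
    return max(scores, key=scores.get)


# Toy training data, far too small for real use.
toy_samples = {
    "en": "the quick brown fox jumps over the lazy dog and the cat",
    "fr": "le renard brun saute par-dessus le chien paresseux et le chat",
}
models = train(toy_samples)
print(identify("the dog and the fox", models))
```

Each n-gram costs one dictionary lookup per language, so the loop in `identify` scales with the input length times the number of languages, matching the cost described above.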
The languages that can be identified are those in the src/trained_models directory, which currently includes en, de, es, fr, it, nl, pl, pt, ru, tr and the dialect de-CH.
The API documentation is automatically generated by FastAPI using the OpenAPI standard. A user-friendly interface provided by Swagger is available under the /docs route, where the endpoints of the service are described.
This simple service has only one route, /compute, which takes a text as input and analyzes it to identify its language.
Environment variables
Check the Core concepts > Service > Environment variables documentation for more details.
Run the tests with Python
Check the Core concepts > Service > Run the tests with Python documentation for more details.
Start the service locally
Check the Core concepts > Service > Start the service locally documentation for more details.