Detect Languages

 

Updated: February 23, 2017

Detects the language of each line in the input file.

Category: Text Analytics

You can use the Detect Languages module to analyze text input and identify the language associated with each record in the input. The language detection algorithm can identify many different languages.

You specify which text column to analyze and the maximum number of languages to detect. The algorithm analyzes each row of text and assigns a score to each detected language. The first result column holds the most likely language.

  1. Add the Detect Languages module to your experiment, and connect a dataset that has a text column containing text in one or more languages.

    It is not necessary that the input contain any labels; the language detection algorithm works using morphological and lexical features of the supported languages.

  2. For Text column, choose the column you want to analyze.

  3. For Upper bound on number of languages to detect, indicate the maximum number of languages to detect.

    Setting an upper bound on the number of languages can improve performance.

  4. Run the experiment.

  5. The Detect Languages module outputs a language identifier and score for each row. If several languages are equally probable matches, more than one language might be listed, with a score for each.

For example, in these results, the first two columns, Col1 and Language label, are input columns already present in your dataset. The remaining columns are generated by the Detect Languages module.

| Col1 | Language label | Col1 Language | Col1 Iso6391 Language | Col1 Iso6391 Language Score |
|---|---|---|---|---|
| It was a wonderful hotel with a friendly staff and good service | English | English | en | 100 |
| Es war ein wunderbares Hotel mit freundlichem Personal und guter service | German | German | de | 100 |
| C’est un magnifique hôtel avec un personnel sympathique et un service de qualité | French | French | fr | 100 |
| Det var et dejligt hotel med et venligt personale og god service | Danish | Danish | nl | 100 |
| Va ser un magnífic hotel amb un personal amable i bon servei | Catalan | Catalan | ca | 92.30769348 |
| とても素敵なホテルで、スタッフは親切で、サービスもよかった | Japanese | (Unknown) | | 0 |
| qu mebpa'mey naQ friendly QaQ chavmoH je | Klingon | French | fr | 77.5 |
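
When the input already carries a language label, as in the example above, the generated columns can be compared against it to spot rows that were not detected as expected. The following is a minimal Python sketch and not part of the module. It assumes the results have been exported and loaded as dictionaries keyed by the column names shown above; `RAW`, `rows`, and `mismatches` are hypothetical names used only for illustration, and the empty ISO code for the Japanese row is an assumption about how a blank cell is loaded.

```python
# Values copied from the example results (the text column itself is omitted).
COLUMNS = (
    "Language label",
    "Col1 Language",
    "Col1 Iso6391 Language",
    "Col1 Iso6391 Language Score",
)
RAW = [
    ("English", "English", "en", 100.0),
    ("German", "German", "de", 100.0),
    ("French", "French", "fr", 100.0),
    ("Danish", "Danish", "nl", 100.0),
    ("Catalan", "Catalan", "ca", 92.30769348),
    ("Japanese", "(Unknown)", "", 0.0),  # blank ISO cell assumed to load as ""
    ("Klingon", "French", "fr", 77.5),
]
rows = [dict(zip(COLUMNS, values)) for values in RAW]


def mismatches(rows, label_col="Language label", detected_col="Col1 Language"):
    """Return the rows whose detected language differs from the known label."""
    return [row for row in rows if row[label_col] != row[detected_col]]


for row in mismatches(rows):
    print(
        f"{row['Language label']} detected as {row['Col1 Language']} "
        f"(score {row['Col1 Iso6391 Language Score']})"
    )
```

For the example rows, this flags the Japanese row, which was reported as (Unknown), and the Klingon row, which was reported as French.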

For a general idea of the languages that potentially can be detected, refer to Bing Translator.

Many more languages can be detected than Azure Machine Learning currently supports for advanced text analytics. We recommend that you use the results of Detect Languages to filter the results that you send to other modules that require language-specific processing.
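
The filtering recommended above might look like the following Python sketch, which is an illustration under stated assumptions rather than the module's own behavior. It assumes the results are available as dictionaries keyed by the generated column names; the supported-language set, the score threshold, and the function name `filter_for_downstream` are all hypothetical and should be replaced with values that match the downstream module you actually use.

```python
# Hypothetical settings: adjust to the languages your downstream module handles.
SUPPORTED_ISO_CODES = {"en", "de", "fr"}
MIN_SCORE = 90.0

CODE_COL = "Col1 Iso6391 Language"
SCORE_COL = "Col1 Iso6391 Language Score"

# ISO code and score pairs copied from the example results above.
results = [
    {CODE_COL: "en", SCORE_COL: 100.0},
    {CODE_COL: "de", SCORE_COL: 100.0},
    {CODE_COL: "fr", SCORE_COL: 100.0},
    {CODE_COL: "nl", SCORE_COL: 100.0},
    {CODE_COL: "ca", SCORE_COL: 92.30769348},
    {CODE_COL: "", SCORE_COL: 0.0},    # undetected row, blank code assumed ""
    {CODE_COL: "fr", SCORE_COL: 77.5},  # low-confidence match
]


def filter_for_downstream(rows, supported, min_score):
    """Keep rows whose detected language is supported and scored high enough."""
    return [
        row
        for row in rows
        if row[CODE_COL] in supported and row[SCORE_COL] >= min_score
    ]


kept = filter_for_downstream(results, SUPPORTED_ISO_CODES, MIN_SCORE)
print(f"kept {len(kept)} of {len(results)} rows")
```

Filtering on both the language code and the score drops rows in unsupported languages as well as low-confidence matches, such as the row detected as French with a score of 77.5.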

Expected inputs

| Name | Type | Description |
|---|---|---|
| Dataset | Data Table | The input |

Module parameters

| Name | Type | Range | Optional | Default | Description |
|---|---|---|---|---|---|
| Upper bound on number of languages to detect | Integer | [1;184] | Required | 1 | Upper bound on number of languages to detect. |
| Text column | ColumnSelection | | Required | | Name or one-based index of text column. |

Outputs

| Name | Type | Description |
|---|---|---|
| Results dataset | Data Table | The result |

Exceptions

| Exception | Description |
|---|---|
| Error 0003 | Exception occurs if one or more of inputs are null or empty. |
| Error 0010 | Exception occurs if input datasets have column names that should match but do not. |
| Error 0016 | Exception occurs if input datasets passed to the module should have compatible column types but do not. |
| Error 0008 | Exception occurs if parameter is not in range. |
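
The parameter constraints listed above (an integer upper bound in the range 1 to 184, and a text column given by name or one-based index) can be checked before an experiment is configured. The following Python sketch is a hypothetical pre-check, not the module's implementation: the function names and the Python exception types are assumptions, and the module itself reports problems through its own error codes.

```python
MIN_LANGUAGES = 1
MAX_LANGUAGES = 184  # upper end of the documented range [1;184]


def validate_upper_bound(value):
    """Return value if it is an integer within the documented range."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("upper bound must be an integer")
    if not MIN_LANGUAGES <= value <= MAX_LANGUAGES:
        raise ValueError(
            f"upper bound must be in [{MIN_LANGUAGES};{MAX_LANGUAGES}], got {value}"
        )
    return value


def resolve_text_column(column_names, selector):
    """Resolve a column given by name or by one-based index to its name."""
    if isinstance(selector, int):
        if not 1 <= selector <= len(column_names):
            raise IndexError(f"column index {selector} is out of range")
        return column_names[selector - 1]  # one-based to zero-based
    if selector not in column_names:
        raise KeyError(selector)
    return selector


columns = ["Col1", "Language label"]
print(validate_upper_bound(1))
print(resolve_text_column(columns, 1))
print(resolve_text_column(columns, "Col1"))
```

Both calls to `resolve_text_column` select the same column, because index 1 refers to the first column when indexes are one-based.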

Text Analytics
A-Z Module List
