How to: Set the text language for Windows Runtime OCR

 

This article is obsolete. It shows how to work with language settings in the Bing Optical Character Recognition (OCR) Control, including selecting an available language from a list, caching the selection, and caching the list itself.

Published date: March 4, 2014

Warning

The Bing OCR Control is deprecated as of March 12, 2014.

The Bing OCR service makes predictions about scanned text based on both the character set and the linguistic context. Therefore, letting the user set the OCR language to match the language of the text in the image yields more accurate results, even among languages that share a character set.

To set the language for OCR

  1. Create a C# or C++ Windows Store application and add an OcrControl, as described in Embedding the Bing OCR Control in an Application.

  2. From the Toolbox, drag a ComboBox onto the design surface of your XAML page, and name it LanguagePicker.

    <ComboBox x:Name="LanguagePicker" .../>
    
  3. Double-click the ComboBox to create a default event handler.

  4. In the event handler, set the OcrControl.CaptureLanguage Property to the selected value.

    private void LanguagePicker_SelectionChanged(object sender, SelectionChangedEventArgs e)
    {
        OCR.CaptureLanguage = (string) LanguagePicker.SelectedItem;
    }
    
  5. In the page load event handler, populate the ComboBox.

    try 
    {
        LanguagePicker.ItemsSource = await Bing.Ocr.OcrControl.GetLanguagesAsync();
    }
    catch (Exception ex)
    {
        // Failed to get supported language list from OCR service. Notify user.
    }
    LanguagePicker.SelectedItem = "en";
    
    Warning

    The page load event handler is not generated in all Windows Store Application projects, but should be present to set the ClientId and ClientSecret properties for the OcrControl. To add this event handler, type this.Loaded += MainPage_Loaded; in the MainPage() constructor, and then right-click to generate a method stub.
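
The SelectionChanged handler in step 4 casts SelectedItem directly to a string. SelectedItem can be null, for instance when the selection is cleared, so a small guard may be useful. The following is a minimal sketch only; LanguagePickerHelpers and SelectedLanguageOrCurrent are hypothetical names and are not part of the Bing OCR Control.

```csharp
using System;

public static class LanguagePickerHelpers
{
    // Hypothetical helper (an assumption, not part of the Bing OCR Control).
    // Returns the selected language code, or the current language when
    // nothing usable is selected (null, empty, or a non-string item).
    public static string SelectedLanguageOrCurrent(object selectedItem, string currentLanguage)
    {
        var code = selectedItem as string;
        return string.IsNullOrEmpty(code) ? currentLanguage : code;
    }

    // Possible use inside LanguagePicker_SelectionChanged:
    //   OCR.CaptureLanguage = LanguagePickerHelpers.SelectedLanguageOrCurrent(
    //       LanguagePicker.SelectedItem, OCR.CaptureLanguage);
}
```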

You can save your users the effort of setting their preferred language on startup by caching the language setting from the previous session. This procedure builds on the code from the previous section.

To cache the language setting

  1. Add the Windows.Storage namespace to the application.

    using Windows.Storage;
    
  2. In the SelectionChanged event handler for the ComboBox, create a “LastLanguage” value in the LocalSettings.Values collection, and set it to the selected language code.

    private void LanguagePicker_SelectionChanged(object sender, SelectionChangedEventArgs e)
    {
        OCR.CaptureLanguage = (string) LanguagePicker.SelectedItem;

        ApplicationData.Current.LocalSettings.Values["LastLanguage"] = OCR.CaptureLanguage;
    }
    
    
  3. In the page load event handler, replace the line that sets the selected language in the ComboBox (LanguagePicker.SelectedItem = "en";) with the following code.

    var settings = ApplicationData.Current.LocalSettings;

    if (settings.Values["LastLanguage"] != null)
    {
        LanguagePicker.SelectedItem = (string)settings.Values["LastLanguage"];
    }
    else LanguagePicker.SelectedItem = "en";
    
    

    This sets the selected item in the list to the last cached value if there is one, and sets it to English otherwise. Changing the SelectedItem property raises the SelectionChanged event, which in turn updates the OcrControl.CaptureLanguage property.
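
The fallback logic above can be sketched as a standalone function, which also makes it easy to check. This is an illustration under stated assumptions: LanguageSelection and ResolveStartupLanguage are hypothetical names, a plain dictionary stands in for ApplicationData.Current.LocalSettings.Values, and the extra check that the cached code is still in the supported list is an addition that the procedure itself does not perform.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LanguageSelection
{
    // Hypothetical helper (an assumption, not part of the Bing OCR Control).
    // "settings" stands in for ApplicationData.Current.LocalSettings.Values.
    public static string ResolveStartupLanguage(
        IDictionary<string, object> settings,
        IEnumerable<string> supported,
        string fallback = "en")
    {
        object cached;
        if (settings.TryGetValue("LastLanguage", out cached))
        {
            var code = cached as string;

            // Use the cached code only if it is a string that the
            // current language list still contains.
            if (code != null && supported != null && supported.Contains(code))
            {
                return code;
            }
        }

        // No usable cached value: fall back to the default language.
        return fallback;
    }
}
```

The result could then be assigned to LanguagePicker.SelectedItem in place of the if/else statement in step 3.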

You can save your users additional time at startup by caching the list of supported languages instead of requesting it from the service every time the application opens. Because this list rarely changes, you can use a long caching interval. This example updates the language cache once a month.

To cache the list of supported languages

  1. In the page load event handler, delete the try...catch block that contains the call to the OcrControl.GetLanguagesAsync() Method. Then, just below the line var settings = ApplicationData.Current.LocalSettings;, insert the following code.

    // If the cache is empty or old, update it.
    var freshness = settings.Values["LastLanguageRefresh"];
    if (freshness == null || (int)freshness != DateTime.Now.Month)
    {
        try
        {
            // Cache the language list and the current month.
            var langs = await Bing.Ocr.OcrControl.GetLanguagesAsync();
            settings.Values["LanguageList"] = langs.ToArray<string>();
            settings.Values["LastLanguageRefresh"] = DateTime.Now.Month;
        }
        catch (Exception ex)
        {
            // Failed to get supported language list from OCR service. Notify user.
        }
    }
    
    LanguagePicker.ItemsSource = settings.Values["LanguageList"]
        as IReadOnlyList<string>;
    

    This code reads the LastLanguageRefresh cache entry, then checks whether it is null (the cache has never been filled) or out of date (the stored month differs from the current month). In either case, it requests the language list again, stores it in the cache, and updates the refresh month. Note that if the call to GetLanguagesAsync() fails, nothing is refreshed.

    Converting the langs value from an IReadOnlyList<string> to an array allows it to be stored as a single cache value and then recast later. ToArray<string>() is a LINQ extension method, so the file also needs using System.Linq;.

    After the if block, the code updates the ComboBox from the cache, recasting the language list back to an IReadOnlyList<string>. If the cache is still empty at this time, the code that sets LanguagePicker.SelectedItem will provide the single value of “en”.
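
The freshness test can be isolated as a small function. The sketch below first mirrors the month comparison used in this procedure, then shows a variant that also stores the year. The variant is an assumption added for illustration, not the article's method: comparing only the month treats a cache written in the same month of an earlier year as fresh. LanguageCacheFreshness and its members are hypothetical names.

```csharp
using System;

public static class LanguageCacheFreshness
{
    // Mirrors the check in the procedure: refresh when nothing is stored
    // or when the stored month differs from the current month.
    // Unlike a direct cast, this treats any non-int value as stale.
    public static bool NeedsRefresh(object storedMonth, DateTime now)
    {
        return !(storedMonth is int) || (int)storedMonth != now.Month;
    }

    // Variant (assumption): combine year and month into one int so that
    // the same month in a different year counts as stale.
    public static int MonthStamp(DateTime now)
    {
        return now.Year * 12 + now.Month;
    }

    public static bool NeedsRefreshByStamp(object storedStamp, DateTime now)
    {
        return !(storedStamp is int) || (int)storedStamp != MonthStamp(now);
    }
}
```

With the variant, the value written to settings.Values["LastLanguageRefresh"] would be MonthStamp(DateTime.Now) rather than DateTime.Now.Month.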

The following example shows the completed methods for the procedures described in this article.

public MainPage()
{
    this.InitializeComponent();
    this.Loaded += MainPage_Loaded;
}

private async void MainPage_Loaded(object sender, RoutedEventArgs e)
{
    OCR.ClientId = "<Your ClientID>";
    OCR.ClientSecret = "<Your ClientSecret>";

    var settings = ApplicationData.Current.LocalSettings;

    // If the cache is empty or old, update it.
    var freshness = settings.Values["LastLanguageRefresh"];
    if (freshness == null || (int)freshness != DateTime.Now.Month)
    {
        try
        {
            // Cache the language list and the current month.
            var langs = await Bing.Ocr.OcrControl.GetLanguagesAsync();
            settings.Values["LanguageList"] = langs.ToArray<string>();
            settings.Values["LastLanguageRefresh"] = DateTime.Now.Month;
        }
        catch (Exception ex)
        {
            // Failed to get supported language list from OCR service. Notify user.
        }
    }

    LanguagePicker.ItemsSource = settings.Values["LanguageList"]
        as IReadOnlyList<string>;

    if (settings.Values["LastLanguage"] != null)
    {
        LanguagePicker.SelectedItem = (string)settings.Values["LastLanguage"];
    }
    else LanguagePicker.SelectedItem = "en";
}

private void LanguagePicker_SelectionChanged(object sender, SelectionChangedEventArgs e)
{
    OCR.CaptureLanguage = (string)LanguagePicker.SelectedItem;
    ApplicationData.Current.LocalSettings.Values["LastLanguage"] = OCR.CaptureLanguage;
}

Having set up user-controlled language settings for your OCR application, you should also create handlers for the Completed and Failed events, and consider adding a Cancel button that calls the OcrControl.ResetAsync() Method.
