How to: Extract text from an OCR result

 

This article is obsolete. It shows how to access the text returned by calls to the Bing OCR Service. The Bing Optical Character Recognition (OCR) Control sends captured images to the OCR Service, and the response from the service is converted to an OcrResult object.

Published date: March 4, 2014

Warning

The Bing OCR Control is deprecated as of March 12, 2014.

An OcrResult contains a collection of Line objects, each of which contains a collection of Word objects. Each Word object holds a text string in its Value property. The OcrResult is exposed through the Result property of the OcrCompletedEventArgs that is passed to the handler of the OcrControl.Completed event.
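
To picture this hierarchy without the deprecated control, the following minimal sketch uses hypothetical stand-in classes. `SketchWord`, `SketchLine`, `SketchResult`, and `OcrShapeSketch` are illustrative names, not Bing OCR types. They only mirror the Lines, Words, and Value members that the samples in this article use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins that mirror the shape described in this article.
// They are NOT the Bing OCR types.
public class SketchWord
{
    public string Value { get; set; }
}

public class SketchLine
{
    public List<SketchWord> Words { get; } = new List<SketchWord>();
}

public class SketchResult
{
    public List<SketchLine> Lines { get; } = new List<SketchLine>();
}

public static class OcrShapeSketch
{
    // Flattens result -> lines -> words into one list of strings,
    // preserving the reading order of the lines and words.
    public static List<string> AllWords(SketchResult result)
    {
        return result.Lines
            .SelectMany(line => line.Words)
            .Select(word => word.Value)
            .ToList();
    }
}
```

Flattening discards the line boundaries, so it is suitable for tasks such as counting words, but not for reproducing the layout of the text.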

To extract text and line breaks from an OcrResult object

  1. Create a handler for the OcrControl.Completed Event.

    private async void MainPage_Loaded(object sender, RoutedEventArgs e)
    {
        // Subscribe to the Completed event of the OcrControl.
        OCR.Completed += OCR_Completed;
        …
    }

    private void OCR_Completed(object sender, OcrCompletedEventArgs e)
    {
    }
    
    
  2. Ensure that there is text to extract.

    private void OCR_Completed(object sender, OcrCompletedEventArgs e)
    {
        // Make sure there is text.
        if (e.Result.Lines.Count == 0)
        {
            tbResults.Text = "No text found.";
            return;
        }
    }
    
    
  3. Use a nested loop to read the words into a string with a line break at the end of each line, and then display the string.

    private void OCR_Completed(object sender, OcrCompletedEventArgs e)
    {
        // Make sure there is text.
        if (e.Result.Lines.Count == 0)
        {
            tbResults.Text = "No text found.";
            return;
        }
    
        // Read the text and print it to a TextBlock.
        var sb = new System.Text.StringBuilder();
        foreach (Line l in e.Result.Lines)
        {
            foreach (Word w in l.Words)
            {
                sb.AppendFormat("{0} ", w.Value);
            }
            sb.AppendLine();
        }
        tbResults.Text = sb.ToString();
    }
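
The loop in step 3 appends a space after every word, so each line of output ends with a trailing space. As one possible variation, the sketch below joins the words of each line with `string.Join` instead. It works on plain sequences of strings rather than on the Bing OCR types. `OcrTextSketch` and `JoinLines` are hypothetical names introduced for illustration, and each inner sequence stands in for the Word.Value strings of one line.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class OcrTextSketch
{
    // Builds one string with the words of each line separated by single
    // spaces and a line break after every line (no trailing spaces).
    public static string JoinLines(IEnumerable<IEnumerable<string>> lines)
    {
        var sb = new StringBuilder();
        foreach (IEnumerable<string> words in lines)
        {
            sb.AppendLine(string.Join(" ", words));
        }
        return sb.ToString();
    }
}
```

Assuming the Lines and Words collections implement `IEnumerable<T>`, the handler could project each line to its Word.Value strings with LINQ and pass the result to `JoinLines`. That usage is an assumption and has not been verified against the control.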
    

Along with text, the OcrResult also contains positional information. The Line.Box and Word.Box properties give the coordinates of rectangles that mark where the text appears in the captured image. For more information, see How to: Extract text position data from an OCR result.
