Optical Character Recognition (U-SQL)
Updated: October 2, 2017
The OcrExtractor cognitive function detects and extracts text in an image. It analyzes images to detect embedded text and generates character streams.
Examples
- The examples can be executed in Visual Studio with the Azure Data Lake Tools plug-in.
- Ensure you have installed the cognitive assemblies; see Registering Cognitive Extensions in U-SQL for more information.
- The scripts can be executed locally if you first download the assemblies locally; see Enabling U-SQL Advanced Analytics for Local Execution for more information. An Azure subscription and an Azure Data Lake Analytics account are not needed when executing locally.
- You need images that are accessible to your ADLA account or local account.
- The examples use the table myImages from the example Load images to a table.
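The examples read from dbo.myImages. If you have not run Load images to a table, the script below is a minimal sketch of one way to build such a table. It assumes that Cognition.Vision.ImageExtractor (from the ImageCommon assembly) reads each image file into a row with an ImgData byte[] column, that OcrExtractor reads the image bytes from a column named ImgData, and that your images are .jpg files under the hypothetical folder /Samples/Images/. Check these names against the Load images to a table example and adjust the path to your account.

```usql
// Sketch only: ImageExtractor, the ImgData column name, and the
// /Samples/Images/ folder are assumptions; match them to your setup.
REFERENCE ASSEMBLY ImageCommon;

// Read each .jpg file into one row. {FileName} is a virtual column
// populated from the matched part of the file path.
@images =
    EXTRACT FileName string,
            ImgData byte[]
    FROM @"/Samples/Images/{FileName}.jpg"
    USING new Cognition.Vision.ImageExtractor();

// Recreate the table so the script can be rerun.
DROP TABLE IF EXISTS dbo.myImages;

// Persist the rowset as the table that the OCR example reads.
CREATE TABLE dbo.myImages
(
    INDEX idx CLUSTERED (FileName ASC)
    DISTRIBUTED BY HASH (FileName)
) AS SELECT * FROM @images;
```

Distributing by FileName keeps the sketch simple; any column that spreads rows evenly would also work.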
Extract text from the image using OCR Extractor
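// Reference the cognitive assemblies needed for OCR.
REFERENCE ASSEMBLY ImageCommon;
REFERENCE ASSEMBLY ImageOcr;

// Run OCR over every image in dbo.myImages. FileName is passed through
// unchanged (READONLY); Text holds the recognized characters.
@ocrs =
    PROCESS dbo.myImages
    PRODUCE FileName,
            Text string
    READONLY FileName
    USING new Cognition.Vision.OcrExtractor();

// Write one row per image to a tab-separated file with a header row.
OUTPUT @ocrs
TO "/ReferenceGuide/Cognition/Vision/OcrExtractor.txt"
USING Outputters.Tsv(outputHeader: true);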
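If you only want the images in which OCR found text, you can filter the rowset before writing it. The following is a minimal sketch that repeats the OCR step from the example and adds a SELECT with a C# predicate. It assumes that images with no recognizable text come back with a null or empty Text value; the output file name OcrExtractorNonEmpty.txt is a hypothetical example.

```usql
REFERENCE ASSEMBLY ImageCommon;
REFERENCE ASSEMBLY ImageOcr;

// Same OCR step as in the example: one row per image.
@ocrs =
    PROCESS dbo.myImages
    PRODUCE FileName,
            Text string
    READONLY FileName
    USING new Cognition.Vision.OcrExtractor();

// Keep only rows where OCR returned non-blank text.
// The WHERE clause is an ordinary C# boolean expression.
@withText =
    SELECT FileName,
           Text
    FROM @ocrs
    WHERE !String.IsNullOrWhiteSpace(Text);

// Hypothetical output path; change it to suit your account.
OUTPUT @withText
TO "/ReferenceGuide/Cognition/Vision/OcrExtractorNonEmpty.txt"
USING Outputters.Tsv(outputHeader: true);
```

Filtering inside the script means the output file never contains empty rows, so downstream consumers do not need to repeat the check.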