
AEM Forms

OCR Data Extraction in AEM Forms

Infodales Tech Solutions | June 09, 2023

OCR Data

OCR stands for Optical Character Recognition. OCR data refers to the output generated by OCR software or systems when they scan and convert printed or handwritten text into digital text that computers can understand and process. This data typically includes the recognized text itself, along with information about its formatting, layout, and structure.
OCR technology has advanced significantly in recent years, allowing for high accuracy in converting scanned documents, images, or even live text from a camera feed into editable and searchable text. OCR data is valuable in various applications, such as digitizing books and documents, extracting information from forms, enabling text search in scanned documents, and aiding visually impaired individuals in accessing written content.
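Here is a minimal sketch of how one page of OCR output could be modelled, which shows what "recognized text plus layout" means in practice. The names (Word, Line, OcrPage, confidence, bbox) are hypothetical. Tesseract, ABBYY FineReader, and Google Cloud Vision each return their own schema, but most carry the text, its position, and a confidence score in some form.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Word:
    text: str
    confidence: float  # engine-reported confidence, assumed here to be in [0, 1]
    bbox: Tuple[int, int, int, int]  # (left, top, width, height) in pixels


@dataclass
class Line:
    words: List[Word] = field(default_factory=list)

    def text(self) -> str:
        return " ".join(w.text for w in self.words)


@dataclass
class OcrPage:
    lines: List[Line] = field(default_factory=list)

    def plain_text(self) -> str:
        """Drop the layout and keep only the text, one OCR line per output line."""
        return "\n".join(line.text() for line in self.lines)

    def low_confidence_words(self, threshold: float = 0.6) -> List[Word]:
        """Words a reviewer may want to double-check before trusting them."""
        return [w for line in self.lines for w in line.words if w.confidence < threshold]


# Made-up sample: two lines of a scanned ID card.
page = OcrPage([
    Line([Word("JOHN", 0.98, (10, 10, 40, 12)), Word("DOE", 0.41, (55, 10, 30, 12))]),
    Line([Word("1990-01-01", 0.93, (10, 30, 80, 12))]),
])
print(page.plain_text())
print([w.text for w in page.low_confidence_words()])  # ['DOE']
```

Keeping positions and confidences alongside the text lets later steps decide which values to trust or to show to a user for confirmation.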

OCR Data Extraction in AEM Forms

Optical Character Recognition (OCR) integration in Adobe Experience Manager (AEM) Forms can significantly streamline data extraction from scanned documents or images. Start by choosing an OCR engine that suits your requirements: Adobe Acrobat provides OCR capabilities, or you can opt for third-party engines such as Tesseract, ABBYY FineReader, or the Google Cloud Vision API.

Integrate the chosen OCR engine with AEM Forms. This might involve installing plugins, libraries, or APIs provided by the OCR engine provider. Once the engine is integrated, the extraction flow typically looks like this:

  • Allow users to upload scanned documents or images through AEM Forms.

  • Preprocess the uploaded documents or images if necessary. This may include image enhancement, noise reduction, or deskewing to improve OCR accuracy.

  • Use the integrated OCR engine to extract text from the uploaded documents or images: pass each document or image to the engine and receive the extracted text.
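The flow above can be sketched as a small pipeline that keeps the OCR engine behind an interface. With that design, switching between Adobe Acrobat, Tesseract, ABBYY FineReader, or a REST service only means writing a different engine class. This is a minimal illustration under that assumption. OcrEngine, run_ocr, UppercaseEngine, and remove_noise are hypothetical names. The toy engine and noise filter stand in for real OCR and image processing so that the example runs with the standard library alone.

```python
from typing import Callable, List, Protocol


class OcrEngine(Protocol):
    """Anything that turns image bytes into text: Tesseract, ABBYY, a REST API..."""

    def extract_text(self, image: bytes) -> str: ...


# A preprocessing step (enhancement, denoising, deskewing) maps image bytes to image bytes.
Preprocessor = Callable[[bytes], bytes]


def run_ocr(image: bytes, engine: OcrEngine, steps: List[Preprocessor]) -> str:
    """Apply the preprocessing steps in order, then pass the result to the OCR engine."""
    for step in steps:
        image = step(image)
    return engine.extract_text(image)


# Toy stand-ins so the sketch runs without any OCR library installed.
class UppercaseEngine:
    def extract_text(self, image: bytes) -> str:
        return image.decode("ascii").upper()


def remove_noise(image: bytes) -> bytes:
    return image.replace(b"#", b"")  # pretend '#' bytes are scanner noise


print(run_ocr(b"jo#hn d#oe", UppercaseEngine(), [remove_noise]))  # JOHN DOE
```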

Many providers offer OCR as a service. When a provider has a well-documented REST API, you can integrate it with AEM Forms through the data integration capability with little effort. In this tutorial, we'll demonstrate OCR data extraction from uploaded documents using ID Analyzer.

OCR Data Extraction

Create a Swagger File

Creating a Swagger file involves defining your API's structure, endpoints, parameters, responses, and other details using the Swagger/OpenAPI Specification. Here's a step-by-step guide to help you create a Swagger file:

  • Understand Swagger/OpenAPI Specification

  • Choose a Swagger Editor

  • Define API Info

  • Define Paths and Operations

  • Define Parameters and Responses

  • Add Security Definitions (if needed)

  • Export and Save

  • Validate and Test

Use tools like Swagger Inspector or Postman to validate and test your Swagger file against your actual API endpoints. Here is the Swagger file used in this tutorial for the ID Analyzer endpoint:

swagger: '2.0'
info:
  version: 1.0.0
  title: Simple API
  description: Learning Swagger
host: api.idanalyzer.com
schemes:
  - https
paths:
  /:
    post:
      summary: Decode Documents
      produces:
        - application/json
      consumes:
        - application/x-www-form-urlencoded
      operationId: Decode Documents
      parameters:
        - in: formData
          name: file_base64
          type: string
          description: Base64-encoded image of the document
        - in: formData
          name: apikey
          type: string
          description: API secret key
      responses:
        '200':
          description: Successful response
          schema:
            $ref: '#/definitions/returnvalue'
definitions:
  result:
    type: object
    properties:
      fullName:
        type: string
      documentNumber:
        type: string
      address:
        type: string
      dob:
        type: string
      pincode:
        type: string
  returnvalue:
    type: object
    properties:
      result:
        $ref: '#/definitions/result'
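For the "Validate and Test" step, a quick local check can catch common mistakes before you upload the file to AEM. It flags a wrong swagger version, missing required fields, and $ref pointers that lead nowhere. This is a minimal sketch and not a full Swagger validator. Tools such as Swagger Editor do far more. The helpers validate and find_refs are hypothetical. The standard library has no YAML parser, so the spec appears here as the equivalent dict. With PyYAML installed, yaml.safe_load on the file would produce the same kind of structure.

```python
from typing import Any, Iterator, List, Tuple


def find_refs(node: Any, location: str = "#") -> Iterator[Tuple[str, str]]:
    """Yield (location, target) for every $ref in a parsed Swagger document."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref":
                yield location, value
            else:
                yield from find_refs(value, f"{location}/{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_refs(item, f"{location}/{index}")


def validate(spec: dict) -> List[str]:
    """Return a list of problems; an empty list means these basic checks passed."""
    problems = []
    if spec.get("swagger") != "2.0":
        problems.append("swagger must be the string '2.0'")
    for key in ("title", "version"):
        if key not in spec.get("info", {}):
            problems.append(f"info.{key} is required")
    if "paths" not in spec:
        problems.append("paths is required")
    # Every local $ref such as '#/definitions/result' must point at something.
    # (JSON Pointer escapes like ~1 are not handled in this sketch.)
    for location, ref in find_refs(spec):
        if not ref.startswith("#/"):
            continue  # remote references are out of scope here
        target: Any = spec
        for part in ref[2:].split("/"):
            if not isinstance(target, dict) or part not in target:
                problems.append(f"{location}: unresolved $ref {ref}")
                break
            target = target[part]
    return problems


# The Swagger file above as a Python dict, trimmed to the parts that matter here.
spec = {
    "swagger": "2.0",
    "info": {"version": "1.0.0", "title": "Simple API"},
    "host": "api.idanalyzer.com",
    "paths": {"/": {"post": {"responses": {"200": {
        "description": "Successful response",
        "schema": {"$ref": "#/definitions/returnvalue"},
    }}}}},
    "definitions": {
        "result": {"type": "object", "properties": {"fullName": {"type": "string"}}},
        "returnvalue": {
            "type": "object",
            "properties": {"result": {"$ref": "#/definitions/result"}},
        },
    },
}
print(validate(spec))  # []
```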

Create a Data Source

To connect AEM Forms to a third-party API, first create a data source in Cloud Services. You can use the Swagger file created above to configure this data source.

  • Log in to AEM and go to the Dashboard.

  • From Tools, select Cloud Services.

  • Pick or create a folder in Cloud Services to store your data sources.

  • Define settings like data type, endpoint URL, and authentication.

  • Save the data source.
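It can help to see what the configured data source will actually send. The sketch below assumes the Swagger file above: scheme https, host api.idanalyzer.com, path /, and the form fields file_base64 and apikey. It only builds the request and does not send it. AEM performs the equivalent call for you once the data source and form data model are set up. endpoint_url and build_decode_request are hypothetical helpers, and YOUR_API_KEY is a placeholder.

```python
import base64
import urllib.parse
import urllib.request


def endpoint_url(spec: dict, path: str) -> str:
    """Swagger 2.0 operation URL: scheme://host + basePath + path."""
    # Assumption: fall back to https when the spec lists no schemes.
    scheme = spec.get("schemes", ["https"])[0]
    base_path = spec.get("basePath", "").rstrip("/")
    return f"{scheme}://{spec['host']}{base_path}{path}"


def build_decode_request(spec: dict, document: bytes, api_key: str) -> urllib.request.Request:
    """Build, but do not send, the 'Decode Documents' POST from the Swagger file."""
    body = urllib.parse.urlencode({
        "file_base64": base64.b64encode(document).decode("ascii"),
        "apikey": api_key,
    }).encode("ascii")
    return urllib.request.Request(
        endpoint_url(spec, "/"),
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",  # from 'consumes'
            "Accept": "application/json",  # from 'produces'
        },
        method="POST",
    )


spec = {"host": "api.idanalyzer.com", "schemes": ["https"]}
request = build_decode_request(spec, b"example bytes", "YOUR_API_KEY")
print(request.get_method(), request.full_url)  # POST https://api.idanalyzer.com/
```

Passing the request to urllib.request.urlopen would send it. In AEM, the authentication settings of the data source take the place of the hard-coded key.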


Create a Form Data Model

Creating a form data model in AEM Forms involves defining the structure of your form data, including the fields, data types, and validation rules. When you create the form data model, select the data source you configured in the previous step so that the model can use the operations and definitions from the Swagger file.
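A form data model built on this data source works with the objects described in the Swagger definitions: returnvalue and its nested result. The Python sketch below is a rough analogue and does not describe how AEM works internally. It mirrors the result definition and parses a response body into it. Result and parse_return_value are hypothetical names, and the sample body is made up.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Result:
    """Mirrors the 'result' definition in the Swagger file."""
    fullName: Optional[str] = None
    documentNumber: Optional[str] = None
    address: Optional[str] = None
    dob: Optional[str] = None
    pincode: Optional[str] = None


def parse_return_value(body: str) -> Result:
    """Parse a 'returnvalue' JSON body, i.e. {"result": {...}}, into a Result.

    Keys outside the definition are ignored; absent keys stay None.
    """
    raw = json.loads(body).get("result") or {}
    known = {f.name for f in fields(Result)}
    return Result(**{key: value for key, value in raw.items() if key in known})


# Made-up response body, for illustration only.
sample = '{"result": {"fullName": "JOHN DOE", "dob": "1990-01-01", "issuer": "XYZ"}}'
print(parse_return_value(sample))
```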

Create a Client Lib

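Next, we need the base64-encoded representation of the uploaded document, because the REST call receives the document as a base64 string (the file_base64 parameter in the Swagger file). A client library (client lib) in the form produces this string from the uploaded file.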
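The client lib itself runs as JavaScript in the browser, typically reading the file with FileReader. The transformation it has to perform is sketched below in Python. It encodes the file's bytes as base64 and strips the "data:<mime type>;base64," prefix that a data URL carries. Whether the OCR service expects the bare payload is an assumption to check against its documentation. to_base64 and strip_data_url are hypothetical helpers.

```python
import base64


def to_base64(document: bytes) -> str:
    """Standard base64 text (RFC 4648) for the uploaded file's raw bytes."""
    return base64.b64encode(document).decode("ascii")


def strip_data_url(value: str) -> str:
    """Remove a 'data:<mime type>;base64,' prefix if present.

    Browsers' FileReader.readAsDataURL() returns strings in that form; this
    sketch assumes the OCR service expects the bare base64 payload.
    """
    header, sep, payload = value.partition(",")
    if sep and header.startswith("data:") and header.endswith(";base64"):
        return payload
    return value


print(to_base64(b"abc"))                             # YWJj
print(strip_data_url("data:image/png;base64,YWJj"))  # YWJj
print(strip_data_url("YWJj"))                        # YWJj
```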


Create an Adaptive Form

Use the form data model's POST invocation (the Decode Documents operation from the Swagger file) in the adaptive form to extract data from documents that users upload. Through this invocation, the form sends the base64-encoded string of the uploaded document to the OCR service over HTTPS. The values returned in the response can then be used in the form.
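However the invocation is wired up in the form, the final step is to map the returned result properties onto form fields. In AEM this mapping is typically configured in the form rather than written as code. The logic is sketched below in Python to make it explicit. FIELD_MAP and the form field names (applicantName, idNumber, dateOfBirth) are hypothetical. Never overwriting user input with empty OCR values is an assumed design choice and does not describe AEM behavior.

```python
from typing import Dict, Mapping, Optional

# Hypothetical mapping from keys of the OCR 'result' object (as defined in the
# Swagger file) to names of fields in the adaptive form.
FIELD_MAP = {
    "fullName": "applicantName",
    "documentNumber": "idNumber",
    "address": "address",
    "dob": "dateOfBirth",
    "pincode": "pincode",
}


def prefill_values(result: Mapping[str, Optional[str]],
                   current: Mapping[str, str]) -> Dict[str, str]:
    """Merge OCR output into the form's current values.

    Empty or missing OCR values never overwrite what the user already entered.
    """
    values = dict(current)
    for ocr_key, form_field in FIELD_MAP.items():
        value = (result.get(ocr_key) or "").strip()
        if value:
            values[form_field] = value
    return values


ocr_result = {"fullName": "JOHN DOE", "dob": "", "pincode": None}
print(prefill_values(ocr_result, {"dateOfBirth": "1990-01-01"}))
# {'dateOfBirth': '1990-01-01', 'applicantName': 'JOHN DOE'}
```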


By using the form data model's POST invocation, an adaptive form can extract data from user-uploaded documents. The form sends each document as a base64-encoded string, and the OCR service returns the recognized fields for the form to use.

I hope you found this article interesting and informative! Feel free to share it with your friends.
Don't forget to follow me for upcoming blogs. Thank you!


Nitish Bisen | AEM Developer