AEM Forms
OCR Data Extraction in AEM Forms
June 09, 2023
OCR Data
OCR stands for Optical Character Recognition. OCR data refers to the output generated by OCR software or systems when they scan and convert printed or handwritten text into digital text that computers can understand and process. This data typically includes the recognized text itself, along with information about its formatting, layout, and structure.
OCR technology has advanced significantly in recent years, allowing for high accuracy in converting scanned documents, images, or even live text from a camera feed into editable and searchable text. OCR data is valuable in various applications, such as digitizing books and documents, extracting information from forms, enabling text search in scanned documents, and aiding visually impaired individuals in accessing written content.
OCR Data Extraction in AEM Forms
Optical Character Recognition (OCR) integration in Adobe Experience Manager (AEM) Forms can significantly streamline data extraction from scanned documents or images. Choose a suitable OCR engine based on your requirements. Adobe Acrobat provides OCR capabilities, or you can opt for third-party OCR engines like Tesseract, ABBYY FineReader, or Google Cloud Vision API.
Integrate the chosen OCR engine with AEM Forms. This might involve installing plugins, libraries, or APIs provided by the OCR engine provider. Allow users to upload scanned documents or images through AEM Forms. Preprocess the uploaded documents/images if necessary. This may include tasks like image enhancement, noise reduction, or deskewing to improve OCR accuracy. Utilize the integrated OCR engine to extract text from the uploaded documents/images. This step involves passing the document/image to the OCR engine and receiving the extracted text.
Several organizations offer OCR services, and if they provide well-documented REST APIs, integrating them with AEM Forms through its data integration capability is straightforward. For this tutorial, we'll showcase OCR data extraction from uploaded documents using ID Analyzer.
Create a Swagger File
Creating a Swagger file involves defining your API's structure, endpoints, parameters, responses, and other details using the Swagger/OpenAPI Specification. Here's a step-by-step guide to help you create a Swagger file:
Understand Swagger/OpenAPI Specification
Choose a Swagger Editor
Define API Info
Define Paths and Operations
Define Parameters and Responses
Add Security Definitions (if needed)
Export and Save
Validate and Test
Use tools like Swagger Inspector or Postman to validate and test your Swagger file against your actual API endpoints.
swagger: '2.0'
info:
  version: 1.0.0
  title: Simple API
  description: Learning Swagger
host: api.idanalyzer.com
schemes:
  - https
paths:
  /:
    post:
      summary: Decode Documents
      produces:
        - application/json
      consumes:
        - application/x-www-form-urlencoded
      operationId: Decode Documents
      parameters:
        - in: formData
          name: file_base64
          type: string
          description: Base64-encoded image of the document
        - in: formData
          name: apikey
          type: string
          description: API Secret Key
      responses:
        '200':
          description: Successful Response
          schema:
            $ref: '#/definitions/returnvalue'
definitions:
  result:
    type: object
    properties:
      fullName:
        type: string
      documentNumber:
        type: string
      address:
        type: string
      dob:
        type: string
      pincode:
        type: string
  returnvalue:
    type: object
    properties:
      result:
        type: object
        $ref: '#/definitions/result'
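To make the request shape concrete, here is a minimal TypeScript sketch of the form-urlencoded body that the POST operation in this Swagger file describes. The field names file_base64 and apikey come from the file itself; the function name buildDecodeRequestBody is hypothetical, and in the actual integration AEM's Form Data Model builds this request for you.

```typescript
// Sketch only: builds the application/x-www-form-urlencoded body that the
// Swagger definition describes. buildDecodeRequestBody is a hypothetical
// name used for illustration; it is not an AEM or ID Analyzer API.
function buildDecodeRequestBody(fileBase64: string, apiKey: string): string {
  const fields: [string, string][] = [
    ["file_base64", fileBase64],
    ["apikey", apiKey],
  ];
  // Base64 can contain '+', '/' and '=', so every value must be
  // percent-encoded before it is placed in the body.
  return fields
    .map(([name, value]) => `${encodeURIComponent(name)}=${encodeURIComponent(value)}`)
    .join("&");
}
```

The percent-encoding matters here because an unescaped '+' in a form-urlencoded body is read as a space, which would corrupt the base64 payload.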
Create a Data Source
To connect AEM Forms with third-party APIs, you first create a data source in Cloud Services. You can use the Swagger file to set up this data source.
Log in to AEM and go to the Dashboard.
From Tools, select Cloud Services.
Pick or create a folder in Cloud Services to store your data sources.
Define settings like data type, endpoint URL, and authentication.
Save the data source
Create a Form Data Model
Creating a form data model in AEM Forms involves defining the structure of your form data, including the fields, data types, and validation rules. Here's a step-by-step guide:
Select your data source.
Create a Client Lib
To proceed, we'll require the base64 encoded representation of the uploaded document. This encoded string serves as a crucial parameter in our REST invocation process.
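As a rough sketch of that step, assume the client library reads the uploaded file in the browser with FileReader.readAsDataURL. That call yields a data URL (for example, data:image/png;base64,...) rather than the bare base64 string, so the prefix has to be removed before the value is sent. The helper name base64FromDataUrl is hypothetical and not part of AEM.

```typescript
// Sketch only: extracts the base64 payload from a data URL such as the one
// FileReader.readAsDataURL produces. base64FromDataUrl is a hypothetical
// helper name chosen for this example.
function base64FromDataUrl(dataUrl: string): string {
  const marker = ";base64,";
  const markerIndex = dataUrl.indexOf(marker);
  // Reject anything that is not a base64 data URL instead of sending
  // a malformed value to the OCR service.
  if (dataUrl.indexOf("data:") !== 0 || markerIndex === -1) {
    throw new Error("Expected a base64 data URL");
  }
  return dataUrl.slice(markerIndex + marker.length);
}
```

In a real client library this function would be called from the FileReader onload handler, with the returned string stored where the form can pass it to the REST invocation.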
Create an Adaptive Form
In the adaptive form, invoke the Form Data Model's POST service to extract data from user-uploaded documents. Pass the base64-encoded string of the uploaded document as the file_base64 input of the POST invocation. The service returns the extracted data, which can then be mapped to fields in the form.
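The response handling can be sketched as follows, assuming the service returns JSON shaped like the returnvalue definition in the Swagger file (a result object with fullName, documentNumber, address, dob, and pincode). The OcrResult interface and parseOcrResponse function are hypothetical names for illustration; in AEM the Form Data Model and the rule editor normally do this mapping for you.

```typescript
// Sketch only: field names mirror the 'result' definition in the Swagger
// file; OcrResult and parseOcrResponse are hypothetical names.
interface OcrResult {
  fullName?: string;
  documentNumber?: string;
  address?: string;
  dob?: string;
  pincode?: string;
}

function parseOcrResponse(json: string): OcrResult {
  const parsed = JSON.parse(json);
  const result = parsed && typeof parsed === "object" ? parsed.result : undefined;
  if (!result || typeof result !== "object") {
    throw new Error("Response has no 'result' object");
  }
  const keys: (keyof OcrResult)[] = ["fullName", "documentNumber", "address", "dob", "pincode"];
  const fields: OcrResult = {};
  // Copy only the known string fields so unexpected keys never reach the form.
  for (const key of keys) {
    if (typeof result[key] === "string") {
      fields[key] = result[key];
    }
  }
  return fields;
}
```

Each key of the returned object can then be bound to the form field of the same meaning, and any field the OCR service could not read is simply left empty.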
I'm glad you found this article interesting and informative! Feel free to share it with your friends to spread the knowledge.
Don't forget to follow me for upcoming blogs. Thank you!