Area Mapper
The Area Mapper action maps values found in specific areas of a document to corresponding keys. You define configurations that specify the areas, keys, and filters used for the mapping.
- [I] Extracted words (list<object>): The output extracted from a PDF using the PDF Extract Text action.
- [I] Area configuration (list<json>): List of JSON objects containing configuration options for a set of areas.
- [O] Mapped areas (list<object>): Key-value list containing value maps for each area.
The Area configuration JSON structure comprises the following properties:
- MapToKey (mandatory) (string): The key that appears in the key field of the output data.
- OnPages (optional) (list<int>): The pages of the document to search. If not specified, or if set to a value <= 0, all pages are searched.
- BetweenXLeft (int), BetweenXRight (int), BetweenYTop (int), BetweenYBottom (int), IsBelowWord (string), IsAboveWord (string), IsOnTheRightOfWord (string), IsOnTheLeftOfWord (string), RegExCaseInsensitivePattern (string), RegExCaseSensitivePattern (string) (all optional): Area and filter fields. Each one is optional, but at least two of them must be set to extract text from the PDF.
- MaxNoOfRowsPerTarget (mandatory) (int): The number of identified rows that make up a single result. Words found on these rows are concatenated into a single text. For example, if the data in the PDF spans 2 rows of text and you want both rows mapped to one key result, set this parameter to 2.
- TakeFirstElements (optional) (int): The maximum number of values to output for the MapToKey. If null or <= 0, all values are output. This parameter counts the "instances" found, not words; each "instance" can span one or more rows of text.
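The rules in the list above can be checked before a configuration is passed to the action. The Python sketch below is illustrative only and is not part of the product: the helpers `validate_area_config` and `assemble_values` are hypothetical, the sample configuration values are made up, and `assemble_values` assumes that each identified instance arrives as its rows of text in reading order and that rows are joined with a single space. It enforces the documented requirements (MapToKey and MaxNoOfRowsPerTarget are mandatory, at least two area/filter fields are set) and mimics how MaxNoOfRowsPerTarget and TakeFirstElements shape the output values.

```python
from typing import Any, Optional

# Area and filter fields; all optional, but the documentation requires
# at least two of them to be set for text to be extracted.
AREA_FIELDS = (
    "BetweenXLeft", "BetweenXRight", "BetweenYTop", "BetweenYBottom",
    "IsBelowWord", "IsAboveWord", "IsOnTheRightOfWord", "IsOnTheLeftOfWord",
    "RegExCaseInsensitivePattern", "RegExCaseSensitivePattern",
)


def validate_area_config(cfg: dict[str, Any]) -> list[str]:
    """Hypothetical helper: list the documented rules one config breaks."""
    problems = []
    if not cfg.get("MapToKey"):
        problems.append("MapToKey is mandatory")
    if not isinstance(cfg.get("MaxNoOfRowsPerTarget"), int):
        problems.append("MaxNoOfRowsPerTarget is mandatory (int)")
    # A value of 0 still counts as set; only missing or empty values do not.
    filled = [f for f in AREA_FIELDS if cfg.get(f) not in (None, "")]
    if len(filled) < 2:
        problems.append("set at least two area/filter fields")
    return problems


def assemble_values(instances: list[list[str]], max_rows: int,
                    take_first: Optional[int]) -> list[str]:
    """Hypothetical helper: build the output values for one MapToKey.

    Assumes each instance is given as its rows of text in reading order
    and that rows are joined with a single space.
    """
    # Concatenate up to MaxNoOfRowsPerTarget rows into a single text.
    values = [" ".join(rows[:max_rows]) for rows in instances]
    # TakeFirstElements: null or <= 0 means output every instance.
    if take_first is None or take_first <= 0:
        return values
    return values[:take_first]


# Made-up example configuration for one area.
config = [
    {
        "MapToKey": "InvoiceNumber",
        "OnPages": [1],
        "IsOnTheRightOfWord": "Invoice",
        "RegExCaseInsensitivePattern": r"INV-\d+",
        "MaxNoOfRowsPerTarget": 1,
        "TakeFirstElements": 1,
    },
]

for area in config:
    print(area["MapToKey"], validate_area_config(area))  # InvoiceNumber []

# One instance spanning three rows; only the first two are kept.
print(assemble_values([["Main Street 1", "Springfield", "USA"]], 2, 0))
# ['Main Street 1 Springfield']
```

Keeping validation separate from value assembly reflects the two roles the parameters play: the area and filter fields decide where to look, while MaxNoOfRowsPerTarget and TakeFirstElements decide how much of what is found ends up in the output.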
The images below depict the configuration parameters visually and show how they are used to extract data from a PDF.
The JSON below is an example output for the Area Configuration JSON example above.
- Ensure that the necessary configurations accurately map the desired areas to keys.
- Verify the output to ensure the expected values are correctly mapped.
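As a sketch of the verification step, the Python function below compares the action's output against the values you expect for each MapToKey. The helper `check_mapped_areas` is hypothetical, and the output shape it reads (one entry per area with `Key` and `Values` fields) is an assumption for illustration; adapt the lookup to the actual structure of the Mapped areas output.

```python
from typing import Any


def check_mapped_areas(mapped: list[dict[str, Any]],
                       expected: dict[str, list[str]]) -> list[str]:
    """Hypothetical helper: report keys whose mapped values differ."""
    # Assumed output shape: one entry per area, {"Key": ..., "Values": [...]}.
    actual = {entry["Key"]: entry["Values"] for entry in mapped}
    problems = []
    for key, want in expected.items():
        if key not in actual:
            problems.append(f"{key}: missing from output")
        elif actual[key] != want:
            problems.append(f"{key}: expected {want}, got {actual[key]}")
    return problems


# Made-up output and expectations for illustration.
mapped = [
    {"Key": "InvoiceNumber", "Values": ["INV-1042"]},
    {"Key": "Total", "Values": []},
]
expected = {"InvoiceNumber": ["INV-1042"], "Total": ["99.50"]}
for problem in check_mapped_areas(mapped, expected):
    print(problem)  # Total: expected ['99.50'], got []
```

An empty result means every expected key was found with exactly the expected values; an empty `Values` list, as for `Total` here, usually points to an area configuration that matched nothing.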
Updated 17 Apr 2024