Our client compiles data that is sourced from the public domain and published by government municipalities for public consumption.
The data is lifted from public PDFs, stored in a central database, and served to users in the form of dashboards.
The client needed an entirely different approach to data processing and management: a solution that would provide best-in-class PDF processing, data storage and business intelligence.
· Firstly, we’re eliminating the client’s cumbersome copy-and-paste approach to lifting text from PDFs. Originally, workers would sift through electronic documents and transcribe data, page by page.
Now, business analysts begin by building a metadata library for each PDF. They identify which pages contain relevant data, record them, and then capture the coordinates of the data in a zonal capture program. An OCR service can then digest those coordinates and automatically lift the text data from the PDFs. Fraxses can then re-assemble useful data and filter out the noise from the result sets provided.
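The zonal capture step above can be sketched as follows. This is a minimal illustration, not the client's implementation: the `Zone` and `Word` classes, the field names, and the coordinates are all hypothetical, and the OCR output is assumed to be a list of words with page numbers and centre points.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A rectangular capture zone on one PDF page (hypothetical metadata schema)."""
    name: str
    page: int
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


@dataclass(frozen=True)
class Word:
    """One OCR token with the centre of its bounding box (assumed output shape)."""
    text: str
    page: int
    x: float
    y: float


def extract_fields(zones, words):
    """Group OCR words by the zone they fall in; words outside every zone are dropped as noise."""
    result = {zone.name: [] for zone in zones}
    # Reading order: page, then top to bottom, then left to right.
    for word in sorted(words, key=lambda w: (w.page, w.y, w.x)):
        for zone in zones:
            if zone.page == word.page and zone.contains(word.x, word.y):
                result[zone.name].append(word.text)
                break
    return {name: " ".join(tokens) for name, tokens in result.items()}


# Illustrative metadata library entry for page 3 of one PDF (made-up values).
zones = [
    Zone("permit_number", page=3, x0=50, y0=100, x1=200, y1=120),
    Zone("applicant", page=3, x0=50, y0=130, x1=400, y1=150),
]
words = [
    Word("Page", 3, 500, 20),      # header text outside every zone
    Word("Smith", 3, 110, 140),
    Word("P-1042", 3, 60, 110),
    Word("Jane", 3, 60, 140),
]
fields = extract_fields(zones, words)
```

Because the zones are stored as metadata rather than hard-coded, the same extraction routine can be reused for every PDF that shares a layout.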
· Secondly, we have developed a classification microservice that automates enrichment and classification, which would otherwise be a manual process. The service is built on human-defined keyword matching.
This process adds over 100 unique dimensions to the dataset. These dimensions are later used to customize the user experience in the Fraxses visualization tool. Intenda offers this solution under a Software as a Service (SaaS) model. The input to the data platform is an Excel database that a business analyst has reviewed for accuracy.
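The keyword-based enrichment described above might look something like the sketch below. The rule set, dimension names, and the `classify` function are hypothetical; the real service defines over 100 dimensions, and its matching logic is not described here.

```python
import re

# Hypothetical, human-defined rules: dimension -> {label: keywords}.
RULES = {
    "sector": {
        "transport": ["road", "bridge", "transit"],
        "utilities": ["water", "sewer", "electric"],
    },
    "project_type": {
        "maintenance": ["repair", "resurfacing"],
        "new_build": ["construction", "new"],
    },
}


def classify(text, rules=RULES):
    """Return {dimension: label}, using the first label with a whole-word keyword match."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    dimensions = {}
    for dimension, labels in rules.items():
        for label, keywords in labels.items():
            if tokens.intersection(keywords):
                dimensions[dimension] = label
                break
        else:
            # No keyword matched for this dimension.
            dimensions[dimension] = "unclassified"
    return dimensions


record = {"description": "Resurfacing of Main Street bridge"}
enriched = {**record, **classify(record["description"])}
```

Matching on whole tokens rather than substrings avoids false hits such as "new" inside "renewal", at the cost of missing plurals unless they are listed explicitly.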
The Flowz module runs a batch process that breaks the data down into its component dimensions, while virtual data objects rebuild the original structure to create a hosting-friendly data model.
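As a rough illustration of the idea, and not of how Flowz or the virtual data objects actually work internally, the sketch below splits flat rows into dimension lookups plus keyed fact rows, then reassembles the original shape at read time. All function names, column names, and values are assumptions made for the example.

```python
def split_into_dimensions(rows, dimension_columns):
    """Replace each dimension value with a surrogate key; return (dimension tables, fact rows)."""
    dimensions = {column: {} for column in dimension_columns}  # value -> key
    facts = []
    for row in rows:
        fact = {}
        for column, value in row.items():
            if column in dimensions:
                lookup = dimensions[column]
                # Reuse the existing key, or assign the next one.
                fact[column + "_id"] = lookup.setdefault(value, len(lookup) + 1)
            else:
                fact[column] = value
        facts.append(fact)
    return dimensions, facts


def rebuild(dimensions, facts):
    """Yield rows in their original shape by resolving keys on read (a stand-in for a virtual view)."""
    reverse = {
        column: {key: value for value, key in table.items()}
        for column, table in dimensions.items()
    }
    for fact in facts:
        row = {}
        for column, value in fact.items():
            base = column[:-3] if column.endswith("_id") else None
            if base in reverse:
                row[base] = reverse[base][value]
            else:
                row[column] = value
        yield row


rows = [
    {"municipality": "Springfield", "sector": "transport", "amount": 1200},
    {"municipality": "Springfield", "sector": "utilities", "amount": 800},
]
dimensions, facts = split_into_dimensions(rows, ["municipality", "sector"])
restored = list(rebuild(dimensions, facts))
```

The stored form keeps each distinct dimension value once, while consumers still see rows in the original layout.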
While the current end-to-end processes are loosely coupled, the future state of these processes entails tight integration with document management software.
Enrichment and classification will move to machine learning.
Essentially, business analysts will be able to run these procedures from a data workbench in an automated fashion, which would not be possible without the metadata catalogue approach and automation we have provided for this client.