data pipelining | ETL & streaming | automated movement
The Integrate module comprises a rich set of features for managing the flow of data between different systems, allowing users to create, operate and monitor dataflows via a single graphical user interface. Integrate facilitates all types of data integration, including both real-time and batch processing, and can support any ETL and ELT process.
The module is fully managed within the Fraxses cluster, and is underpinned by leading open source technology that we have extended to maximise compatibility and facilitate seamless integration into Fraxses. Enhancements include the addition of custom processors and the integration of the module into the platform's Events Engine, allowing dataflows to be triggered by platform events.
Among its many uses, Integrate allows users to move data from a CSV file to a central database, perform social media sentiment analysis, or extract a subset of their data for analysis. It can be highly effective in situations where the federated approach is not practical. For example, when there is only a short window for collecting historic data from a REST API, a scheduled process can be set up to regularly fetch data from the API and store it in a more permanent data store, allowing for historic reporting. Similarly, if a data source contains complex JSON data objects, Integrate can be used to split the data out into separate related tables which can then be consumed by the system.
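To make the JSON-splitting pattern concrete, the sketch below shows the equivalent logic as a small standalone Python script (a hypothetical illustration of the pattern, not the Integrate module's own implementation): a nested order document is flattened into two related tables linked by a shared key.

```python
import json

def split_orders(payload):
    """Flatten nested order JSON into two related row sets:
    a parent 'orders' table and a child 'items' table, linked by order_id."""
    orders, items = [], []
    for order in json.loads(payload):
        orders.append({"order_id": order["id"], "customer": order["customer"]})
        for item in order["items"]:
            items.append({"order_id": order["id"],
                          "sku": item["sku"],
                          "qty": item["qty"]})
    return orders, items

# One nested document becomes two relational tables.
raw = json.dumps([
    {"id": 1, "customer": "acme",
     "items": [{"sku": "A-100", "qty": 2}, {"sku": "B-200", "qty": 1}]},
])
orders, items = split_orders(raw)
print(orders)      # [{'order_id': 1, 'customer': 'acme'}]
print(len(items))  # 2
```

The resulting row sets can then be loaded into ordinary relational tables and joined on `order_id`.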
The Integrate module offers several important advantages:
Prioritisation schemes allow users to determine how data is retrieved from the queue. By default, Integrate retrieves the oldest data first, but users can elect to pull the newest or smallest data first, or define a custom prioritisation scheme.
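The oldest-first, newest-first and smallest-first schemes described above can be sketched as keyed priority queues; the following minimal Python illustration (an assumption about the mechanism, not Integrate's actual code) shows how switching the key function changes retrieval order:

```python
import heapq
import itertools

def make_queue(scheme):
    """Priority queue keyed by the chosen scheme.
    Each queued entry carries (timestamp, size_bytes, payload)."""
    keys = {
        "oldest_first":   lambda ts, size: ts,    # default behaviour
        "newest_first":   lambda ts, size: -ts,
        "smallest_first": lambda ts, size: size,
    }
    key = keys[scheme]
    counter = itertools.count()  # tie-breaker keeps ordering stable
    heap = []
    def put(ts, size, payload):
        heapq.heappush(heap, (key(ts, size), next(counter), payload))
    def get():
        return heapq.heappop(heap)[2]
    return put, get

put, get = make_queue("smallest_first")
put(100, 900, "big")
put(200, 10, "small")
put(150, 500, "medium")
print(get())  # 'small' — smallest payload retrieved first
```

A custom scheme is then just another key function ranking queued entries by whatever attribute matters to the flow.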
Guaranteed delivery ensures reliable distribution at scale through optimal use of a content repository. The purpose-built persistent write-ahead log enables effective load-spreading, very high transaction rates, and copy-on-write.
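The core idea behind a persistent write-ahead log can be sketched in a few lines of Python (a deliberately simplified illustration of the concept, not Integrate's actual implementation): every record is written and fsynced to disk before it is acknowledged, so queued data survives a crash and can be replayed on restart.

```python
import json
import os
import tempfile

class WriteAheadLog:
    """Minimal append-only write-ahead log: a record is persisted
    (and fsynced) before being considered accepted, so in-flight
    data can be recovered after a process failure."""
    def __init__(self, path):
        self.path = path
        self.f = open(path, "a")

    def append(self, record):
        self.f.write(json.dumps(record) + "\n")
        self.f.flush()
        os.fsync(self.f.fileno())  # durable on disk before acknowledging

    def replay(self):
        """Re-read every persisted record, e.g. after a restart."""
        with open(self.path) as f:
            return [json.loads(line) for line in f]

path = os.path.join(tempfile.mkdtemp(), "flow.wal")
wal = WriteAheadLog(path)
wal.append({"flowfile": 1, "dest": "db"})
wal.append({"flowfile": 2, "dest": "s3"})
print(wal.replay())  # both records recovered from disk
```

Production systems layer batching, segment rotation and copy-on-write on top of this basic append-and-replay contract to reach high transaction rates.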
Integrate allows data to be moved to multiple destinations simultaneously. It buffers all queued data, applies back-pressure when downstream systems fall behind, and offers flow-specific configuration at points where data loss cannot be tolerated.
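Buffering with back-pressure can be illustrated with a bounded queue (a hypothetical Python sketch of the general technique, not the module's internals): when the buffer is full, a fast producer blocks rather than dropping data, so a slow destination never causes loss.

```python
import queue
import threading
import time

# Bounded buffer: a fast producer blocks once 3 items are in flight,
# so it can never outrun the slow consumer or drop data.
buf = queue.Queue(maxsize=3)  # flow-specific buffer limit
consumed = []

def consumer():
    while True:
        item = buf.get()
        if item is None:      # sentinel: end of flow
            break
        time.sleep(0.01)      # simulate a slow downstream destination
        consumed.append(item)

t = threading.Thread(target=consumer)
t.start()
for i in range(10):
    buf.put(i)                # blocks while the buffer is full
buf.put(None)
t.join()
print(consumed)  # all ten items delivered in order, none lost
```

Tuning the buffer size per flow is the knob that trades memory use against how far upstream the back-pressure propagates.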