Integrate: data integration module

data pipelining | ETL & streaming | automated movement

Introducing Integrate

The Integrate module comprises a rich set of features for managing the flow of data between different systems, allowing users to create, operate and monitor dataflows via a single graphical user interface. Integrate facilitates all types of data integration, including both real-time and batch processing, and can support any ETL and ELT process. 

The module is fully managed within the Fraxses cluster and is underpinned by leading open source technology that we have extended to maximise compatibility and integrate seamlessly into Fraxses. Enhancements include the addition of custom processors and the integration of the module into the platform’s Events Engine, which allows dataflows to be triggered by events.

Among its many uses, Integrate allows users to move data from a CSV file to a central database, perform social media sentiment analysis, or move a portion of their data for analysis. It can be highly effective in situations where the federated approach is not practical. For example, when a REST API only exposes historic data for a short window, a scheduled process can be set up to fetch data from the API regularly and store it in a more permanent data store, making historic reporting possible. Similarly, if a data source contains complex JSON data objects, Integrate can be used to split the data out into separate related tables which can then be consumed by the system.
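The JSON case above can be illustrated with a minimal sketch in plain Python. This is not Integrate's API or one of its processors; the function name, the field names (id, customer, items, sku, quantity) and the sample payload are all assumptions made for illustration. It only shows the general idea of turning one nested object into a parent table and a child table linked by a key.

```python
import json


def split_orders(raw_json):
    """Split nested order objects into two flat, related tables.

    Hypothetical field names; a real source would define its own schema.
    """
    orders, order_items = [], []
    for order in json.loads(raw_json):
        # Parent table: one row per order, scalar fields only.
        orders.append({"order_id": order["id"], "customer": order["customer"]})
        # Child table: one row per nested item, linked back by order_id.
        for line_no, item in enumerate(order.get("items", []), start=1):
            order_items.append({
                "order_id": order["id"],
                "line_no": line_no,
                "sku": item["sku"],
                "quantity": item["quantity"],
            })
    return orders, order_items


# Made-up sample payload with one populated order and one empty order.
SAMPLE = (
    '[{"id": 1, "customer": "Acme", "items": '
    '[{"sku": "A-1", "quantity": 2}, {"sku": "B-7", "quantity": 1}]}, '
    '{"id": 2, "customer": "Globex", "items": []}]'
)

orders, order_items = split_orders(SAMPLE)
```

Each resulting list of rows could then be written to its own table, with order_id acting as the join key between them.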

 

The Integrate module offers several important advantages:

Prioritised Queuing

Prioritisation schemes allow users to determine the order in which data is retrieved from a queue. By default, Integrate retrieves the oldest data first, but users can elect to pull the newest or smallest data first, or define a custom prioritisation scheme.
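As a rough illustration of how such schemes differ, the sketch below orders the same queued items under three key functions using Python's standard heapq module. The class, the scheme names and the sample items are hypothetical and do not reflect how Integrate implements or names its prioritisers.

```python
import heapq
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class QueuedItem:
    name: str
    enqueued_at: float  # seconds since some reference point
    size_bytes: int


# Each scheme maps an item to a sort key; the lowest key is retrieved first.
PRIORITISERS = {
    "oldest_first": lambda item: item.enqueued_at,
    "newest_first": lambda item: -item.enqueued_at,
    "smallest_first": lambda item: item.size_bytes,
}


def drain(items, scheme="oldest_first"):
    """Return item names in the order the given scheme would retrieve them."""
    key = PRIORITISERS[scheme]
    tie_breaker = count()  # keeps ordering stable when keys are equal
    heap = [(key(item), next(tie_breaker), item) for item in items]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, item = heapq.heappop(heap)
        order.append(item.name)
    return order


items = [
    QueuedItem("a.csv", enqueued_at=10.0, size_bytes=500),
    QueuedItem("b.csv", enqueued_at=20.0, size_bytes=100),
    QueuedItem("c.csv", enqueued_at=30.0, size_bytes=300),
]
```

A custom scheme would simply be another key function added to the mapping, for example one that ranks items by an attribute carried with the data.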

Guaranteed Delivery

Integrate guarantees delivery at scale by making effective use of a purpose-built persistent write-ahead log and a content repository. Together, these enable effective load-spreading, very high transaction rates, and copy-on-write.

Flow Management

Integrate allows data to be moved to multiple destinations simultaneously. It supports buffering of all queued data, can apply back pressure when queues fill up, and offers flow-specific configuration at points where data loss cannot be tolerated.
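The idea of buffering with back pressure can be sketched as a bounded queue that refuses new data once a threshold is reached, so the upstream side has to slow down rather than overwhelm the destination. The Connection class, its method names and the threshold value below are assumptions for illustration only, not Integrate's configuration or API.

```python
from collections import deque


class Connection:
    """A buffered link between two steps with a back-pressure threshold."""

    def __init__(self, back_pressure_threshold):
        self.threshold = back_pressure_threshold
        self._buffer = deque()

    def offer(self, item):
        """Try to enqueue an item; return False when back pressure applies."""
        if len(self._buffer) >= self.threshold:
            return False  # upstream should pause and retry later
        self._buffer.append(item)
        return True

    def poll(self):
        """Remove and return the oldest buffered item, or None if empty."""
        return self._buffer.popleft() if self._buffer else None


conn = Connection(back_pressure_threshold=2)
accepted = [conn.offer(n) for n in range(3)]  # third offer hits the threshold
drained = conn.poll()                         # downstream consumes one item
accepted_after_drain = conn.offer(3)          # space is available again
```

Nothing is dropped in this sketch: a rejected item stays with the producer until the downstream side has caught up.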

Ease of Use

Integrate enables users to visualise dataflows, reducing complexity and reflecting changes in real time. The module automatically records and indexes provenance data and makes it available as data moves through the system, including across transformations, fan-in, fan-out, and more.

S2S Protocol

The Site-to-Site (S2S) protocol facilitates the transfer of data and allows client libraries to be bundled into applications that communicate with Integrate. S2S supports both HTTP(S) and socket-based transport, which makes it possible to embed a proxy server into S2S communication.

Security

Encrypted protocols, including two-way SSL, ensure the secure exchange of data throughout the dataflow. Integrate enables encryption and decryption for senders and recipients, offers pluggable authorisation to control user access, and provides admins with access to the entire dataflow.
