It is no longer unusual for an organization to own several very large data systems. Organizations increasingly collect and analyze massive amounts of data to improve their business and decision-making processes.
By Rick F. van der Lans
Unfortunately, developing big data systems is not always easy and can be risky. The technology used to store the data can be complex, rely on unfamiliar database concepts, expose a different and proprietary language or API, and support complex, non-flat data structures.
An approach taken by organizations to simplify working with big data technology is to copy the data from the big data system to a more traditional SQL-based system. While this simplifies data access, such an approach has several drawbacks:
All of these aspects stand in the way of big data projects, and especially of making big data available to data consumers.
Data virtualization can simplify the use of data stored in big data systems by operating as a data abstraction layer between the data consumers and big data systems. Data consumers access the big data systems through the data virtualization platform.
The platform can translate the database concepts and the proprietary language or API of the big data systems into better-known interfaces such as SQL. It can also transform complex, non-flat data structures into flat, relational data structures. In other words, it can create a SQL-based view of the big data systems.
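To make the idea of turning non-flat structures into flat, relational ones concrete, here is a minimal sketch in Python. It is not how any particular data virtualization product works; the `flatten` function, the column-naming rule (parent and child names joined with an underscore) and the sample `order` record are all assumptions made for illustration.

```python
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> list[dict[str, Any]]:
    """Turn one nested record into a list of flat rows (hypothetical helper).

    Nested dicts become prefixed columns; a list of dicts fans out into
    one row per list element, like a join between parent and child.
    """
    rows: list[dict[str, Any]] = [{}]
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            children = flatten(value, prefix=f"{name}_")
        elif isinstance(value, list) and all(isinstance(v, dict) for v in value):
            children = [row for item in value
                        for row in flatten(item, prefix=f"{name}_")]
            if not children:
                # An empty list adds no columns but must not drop the parent row.
                children = [{}]
        else:
            # Scalars (and lists of scalars) are kept as a single column value.
            children = [{name: value}]
        # Combine every row built so far with every child row.
        rows = [{**row, **child} for row in rows for child in children]
    return rows


# Sample nested record (made up for this example).
order = {
    "order_id": 1,
    "customer": {"name": "Ada", "city": "Delft"},
    "lines": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}],
}

rows = flatten(order)
for row in rows:
    print(row)
```

Each resulting row has the same flat set of columns (`order_id`, `customer_name`, `customer_city`, `lines_sku`, `lines_qty`), so a SQL tool could treat the output as an ordinary table. The order and customer values are repeated on every line row, which is the usual price of flattening a nested structure.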
The advantages of this approach are:
In addition, products such as fraXses can use some of the big data technologies internally.
For example, for specific queries, fraXses creates, optimizes and enhances query plans with filter, join and aggregation pushdown, and pushes them into Apache Spark, which executes them using parallel processing. Data virtualization platforms also support advanced forms of data security and can extend or replace the security system of the underlying data store.
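The benefit of pushdown can be sketched with a small, self-contained Python example. It uses an in-memory SQLite database purely as a stand-in for a remote data source; it does not use Apache Spark or the fraXses API, and the `events` table and both function names are assumptions made for illustration. The point is only that pushing the filter and the aggregation to the source returns far fewer rows to the caller than fetching everything and processing it afterwards.

```python
import sqlite3

# In-memory SQLite database standing in for a remote data source.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE events (region TEXT, amount INTEGER)")
source.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("EU", 10), ("EU", 5), ("US", 7), ("US", 3), ("APAC", 4)],
)


def total_without_pushdown(conn: sqlite3.Connection, region: str) -> tuple[int, int]:
    """Fetch every row, then filter and aggregate on the caller's side."""
    rows = conn.execute("SELECT region, amount FROM events").fetchall()
    total = sum(amount for r, amount in rows if r == region)
    return len(rows), total


def total_with_pushdown(conn: sqlite3.Connection, region: str) -> tuple[int, int]:
    """Let the source apply the filter and the aggregation itself."""
    rows = conn.execute(
        "SELECT SUM(amount) FROM events WHERE region = ?", (region,)
    ).fetchall()
    return len(rows), rows[0][0]


# Each call returns (rows transferred from the source, total for the region).
print(total_without_pushdown(source, "EU"))
print(total_with_pushdown(source, "EU"))
```

Both functions compute the same total, but the first transfers all five rows while the second transfers a single aggregated row. On a real big data system the difference is between moving billions of rows and moving a handful.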
Some big data experts are not fully aware of what data virtualization can do for them. I therefore recommend that they study this technology. It may reduce the risks of big data projects and speed up the exploitation of big data within organizations.