Blog

ETL Strategy for Business Intelligence and Reporting Cloud Solutions

  • John Liette - Product Architect, Business Intelligence

With modern Business Intelligence solutions focused so heavily on being "in the cloud," one aspect of the traditional data warehousing approach that often gets over looked or over shadowed is your organization's ETL processes.

ETL (Extract, Transform, and Load) processes typically serve two primary functions: moving data from a source to a target and transforming that data to be utilized in the best way by the reporting tools. Many folks I talk to often think of them primarily as means to move data to a place their reporting tools can consume them efficiently, doubly so when talking about cloud solutions. When groups are designing cloud solutions, they often get caught up in where the data can be stored and still accessed by all the interacting processes within a solution, but in many organizations, the transformation piece is just as, if not more, critical than just simply moving data.

It is rare to find a robust, heavily BI solution reporting on data exactly how it lives in a transactional system. Usually, the data needs to be re-modeled to build conforming dimensions, join semi-related data elements together, or to just shape the data in a way that answers specific analytical questions.

In the traditional BI solution, this would be handled by a tool like SSIS, Informatica, ODI, or a similar product that allows you to build out processes that map data from a source to target while performing any necessary transformations along the way. These can range from very basic copies of the data from the source system to a data mart for querying purposes to huge, complex processes that heavily transform the data to feed a system of interrelated processes to serve reporting and analytical needs.

In terms of Cloud BI solutions, the good news is that not much has to change if you aren't ready for it. In most cases, you can continue to use on-premise ETL solutions to load data into cloud-based storage and database solutions. Connectivity between on-premise and cloud-based architectures can become tricky due to private networks, firewalls, and other security infrastructure. In most solutions, the path is difficult to connect from a cloud-based server into your network but connecting from within your network into the cloud is typically painless. This means that if the server initiating any transaction is within your network, you shouldn't have to worry about opening any access points.

Here's an important caveat that seems to pop in anytime I discuss cloud-based tools: always involve your network and security teams when designing cloud integrated solutions. This ensures the design meets corporate policies and industry best practices for maintaining the integrity of your data, so it is not left susceptible to outside interference. If you don't involve these folks, they will eventually find out about your system and request (sometimes not so nicely) a review of your system and its interfacing points with the open internet.

I often advocate for iterative steps to the cloud for most of my large scale BI clients. This means identifying one star schema or one set of reports to migrate to the cloud. Then we go through the processes to build just the objects that support that relatively narrowly specified set of requirements. Once that is successful you can build on it over time, adding more schemas and reporting requirements based on their priority and resource availability. This keeps projects moving briskly with a small, focused team.

An iterative approach for ETL transitions can sometimes mean simply pointing existing processes to a new target, assuming the technology between the existing target and the new target doesn't require significant revamping/rewriting of the processes. For example, if you have an ETL process that loads tables in an Oracle DB and are moving to an Oracle Cloud DB, this cut over should be straightforward.

However, if you're looking for a cloud-based ETL tool, you still have plenty of options. Oracle's ODI has a cloud-based counterpart, as does Informatica, Microsoft, and many others. Which one you use will depend on a variety of factors including cost, existing licensing obligations, resource availability, and others, but the important takeaway is that there are now a variety of options to meet the needs of complex data transformations for cloud reporting tools that are both on premise and in the cloud.

We have worked with clients who have wanted to keep one foot on premise at times, making hybrid approaches ideal for them where the reporting tool is in the cloud, but the data is kept in their on-premise databases. We have also worked with newer clients, typically younger companies without a large history of IT systems to keep integrated and functioning, who do not want to host a server on their internal network for even development tools. They want to be 100% cloud-based, and these ETL solutions certainly exist for them as well.

Contact MindStream Analytics

Contact us to learn more about Business Intelligence and Analytics, including data integration via ETL, MDM, and other tools. We would love to assist your organization in advancing to the next stage of your Business Intelligence and EPM evolution.


Latest testimonial

MindStream Analytics is a leading consulting firm focused on helping clients improve business understanding and decision making. With years of experience in the analytics and Business Performance Management area, MindStream offers customers services ranging from software selection and implementation to best practices for financial planning. MindStream will work with customers towards a solution that enhances value and offers more insight into their data. MindStream believes in the power of technology combined with new procedures to give customers better analytic capabilities


Partner SpotLight

Oracle Partner

Built using best in class hardware, market- leading business intelligence software and in-memory database technology, Oracle Exalytics is an optimized system that delivers speed of thought analysis with unmatched intelligence, simplicity and manageability. For customers evaluating Oracle Exalytics, MindStream Analytics houses an Oracle Exalytics machine dedicated to 'sandbox' environment testing.

Oracle Profile

Share: