With modern Business Intelligence solutions focused so heavily on being "in the cloud," one aspect of the traditional data warehousing approach that often gets over
looked or over shadowed is your organization's ETL processes.
ETL (Extract, Transform, and Load) processes typically serve two primary functions: moving data from a source to a target and transforming that data to be utilized
in the best way by the reporting tools. Many folks I talk to often think of them primarily as means to move data to a place their reporting tools can consume them
efficiently, doubly so when talking about cloud solutions. When groups are designing cloud solutions, they often get caught up in where the data can be stored
and still accessed by all the interacting processes within a solution, but in many organizations, the transformation piece is just as, if not more, critical
than just simply moving data.
It is rare to find a robust, heavily BI solution reporting on data exactly how it lives in a transactional system. Usually, the data needs to be re-modeled to
build conforming dimensions, join semi-related data elements together, or to just shape the data in a way that answers specific analytical questions.
In the traditional BI solution, this would be handled by a tool like SSIS, Informatica, ODI, or a similar product that allows you to build out processes that map
data from a source to target while performing any necessary transformations along the way. These can range from very basic copies of the data from the source system
to a data mart for querying purposes to huge, complex processes that heavily transform the data to feed a system of interrelated processes to serve reporting and
In terms of Cloud BI solutions, the good news is that not much has to change if you aren't ready for it. In most cases, you can continue to use on-premise ETL solutions
to load data into cloud-based storage and database solutions. Connectivity between on-premise and cloud-based architectures can become tricky due to private networks,
firewalls, and other security infrastructure. In most solutions, the path is difficult to connect from a cloud-based server into your network but connecting
from within your network into the cloud is typically painless. This means that if the server initiating any transaction is within your network, you shouldn't
have to worry about opening any access points.
Here's an important caveat that seems to pop in anytime I discuss cloud-based tools: always involve your network and security teams when designing cloud integrated
solutions. This ensures the design meets corporate policies and industry best practices for maintaining the integrity of your data, so it is not left susceptible
to outside interference. If you don't involve these folks, they will eventually find out about your system and request (sometimes not so nicely) a review of your
system and its interfacing points with the open internet.
I often advocate for iterative steps to the cloud for most of my large scale BI clients. This means identifying one star schema or one set of reports to migrate
to the cloud. Then we go through the processes to build just the objects that support that relatively narrowly specified set of requirements. Once that is
successful you can build on it over time, adding more schemas and reporting requirements based on their priority and resource availability. This keeps projects
moving briskly with a small, focused team.
An iterative approach for ETL transitions can sometimes mean simply pointing existing processes to a new target, assuming the technology between the existing
target and the new target doesn't require significant revamping/rewriting of the processes. For example, if you have an ETL process that loads tables in an
Oracle DB and are moving to an Oracle Cloud DB, this cut over should be straightforward.
However, if you're looking for a cloud-based ETL tool, you still have plenty of options. Oracle's ODI has a cloud-based counterpart, as does Informatica, Microsoft,
and many others. Which one you use will depend on a variety of factors including cost, existing licensing obligations, resource availability, and others, but the
important takeaway is that there are now a variety of options to meet the needs of complex data transformations for cloud reporting tools that are both on premise
and in the cloud.
We have worked with clients who have wanted to keep one foot on premise at times, making hybrid approaches ideal for them where the reporting tool is in the cloud,
but the data is kept in their on-premise databases. We have also worked with newer clients, typically younger companies without a large history of IT systems to
keep integrated and functioning, who do not want to host a server on their internal network for even development tools. They want to be 100% cloud-based, and these
ETL solutions certainly exist for them as well.