solutions focused so heavily on being "in the cloud," one aspect of the traditional data warehousing approach that often gets over looked or over shadowed is your organization's
ETL (Extract, Transform, and Load)
processes typically serve two primary functions: moving data from a source to a target and transforming that data to be utilized in the best way by the reporting tools. Many folks I talk to often think
of them primarily as means to move data to a place their reporting tools can consume them efficiently, doubly so when talking about
. When groups are designing cloud solutions, they often get caught up in where the data can be stored and still accessed by all the interacting processes within a solution, but in many organizations,
the transformation piece is just as, if not more, critical than just simply moving data.
It is rare to find a robust, heavily
solution reporting on data exactly how it lives in a transactional system. Usually, the data needs to be re-modeled to build conforming dimensions, join semi-related data elements together, or to just
shape the data in a way that answers specific analytical questions.
In the traditional
, this would be handled by a tool like SSIS, Informatica, ODI, or a similar product that allows you to build out processes that map data from a source to target while performing any necessary transformations
along the way. These can range from very basic copies of the data from the source system to a data mart for querying purposes to huge, complex processes that heavily transform the data to feed a system
of interrelated processes to serve reporting and analytical needs.
In terms of
Cloud BI solutions
, the good news is that not much has to change if you aren't ready for it. In most cases, you can continue to use on-premise
to load data into cloud-based storage and database solutions. Connectivity between on-premise and cloud-based architectures can become tricky due to private networks, firewalls, and other security infrastructure.
In most solutions, the path is difficult to connect from a cloud-based server into your network but connecting from within your network into the cloud is typically painless. This means that if the server
initiating any transaction is within your network, you shouldn't have to worry about opening any access points.
Here's an important caveat that seems to pop in anytime I discuss cloud-based tools: always involve your network and security teams when designing cloud integrated solutions. This ensures the design meets
corporate policies and industry best practices for maintaining the integrity of your data, so it is not left susceptible to outside interference. If you don't involve these folks, they will eventually
find out about your system and request (sometimes not so nicely) a review of your system and its interfacing points with the open internet.
I often advocate for iterative steps to the cloud for most of my large scale
clients. This means identifying one star schema or one set of reports to migrate to the cloud. Then we go through the processes to build just the objects that support that relatively narrowly specified
set of requirements. Once that is successful you can build on it over time, adding more schemas and reporting requirements based on their priority and resource availability. This keeps projects moving
briskly with a small, focused team.
An iterative approach for
transitions can sometimes mean simply pointing existing processes to a new target, assuming the technology between the existing target and the new target doesn't require significant revamping/rewriting
of the processes. For example, if you have an
that loads tables in an Oracle DB and are moving to an Oracle Cloud DB, this cut over should be straightforward.
However, if you're looking for a
cloud-based ETL tool
, you still have plenty of options.
has a cloud-based counterpart, as does Informatica, Microsoft, and many others. Which one you use will depend on a variety of factors including cost, existing licensing obligations, resource availability,
and others, but the important takeaway is that there are now a variety of options to meet the needs of complex data transformations for cloud reporting tools that are both on premise and in the cloud.
We have worked with clients who have wanted to keep one foot on premise at times, making hybrid approaches ideal for them where the reporting tool is in the cloud, but the data is kept in their on-premise
databases. We have also worked with newer clients, typically younger companies without a large history of IT systems to keep integrated and functioning, who do not want to host a server on their internal
network for even development tools. They want to be 100% cloud-based, and these
solutions certainly exist for them as well.