“ELT” has become a popular buzzword in the world of Modern Data Architecture. Why has that happened, and what exactly does it mean? In this article, we will show how ELT has become a philosophy just as much as a technique and, most importantly, how it enables organizations to accelerate data-to-decisions, boosting the competitive advantage of any business.
Why should you move from ETL to ELT for data ingestion?
ELT, or “Extract-Load-Transform,” represents an evolution of data pipelines from traditional ETL, or “Extract-Transform-Load,” practices. By moving the “Load” phase of data ingestion ahead of the “Transform” phase, data becomes available to BI analysts and data scientists sooner, which in turn means faster business analytics reports and models. Additionally, the ELT process, when paired with modern data storage solutions, leverages the scalability and security of the cloud. Although there are still tradeoffs with respect to data governance, ELT can help organizations deliver on the promise of scalable, secure data science and business intelligence while reducing costs by simplifying the data ingestion process.
What is ETL and why is it used?
ETL (Extract-Transform-Load) refers to the traditional data ingestion pipeline whereby raw data is extracted from a source system into a staging area, transformed into a usable product, and then loaded into a data warehouse. The process is akin to buying groceries and bringing them home (extract), preparing the groceries for consumption (transform), and then putting the prepared food into a pantry or refrigerator to be used later (load).
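The three stages can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the order records, field names, and SQLite "warehouse" are all stand-ins chosen for the example.

```python
import sqlite3

# Hypothetical raw source: order records, one of them incomplete (illustrative data)
raw_orders = [
    {"id": 1, "amount": "19.99", "email": "a@example.com"},
    {"id": 2, "amount": "",      "email": "b@example.com"},  # missing amount
    {"id": 3, "amount": "5.00",  "email": "c@example.com"},
]

def extract():
    """Pull raw records from the source system (stubbed here)."""
    return raw_orders

def transform(records):
    """Clean in the staging area BEFORE loading: drop incomplete rows, cast types."""
    return [
        (r["id"], float(r["amount"]), r["email"])
        for r in records
        if r["amount"]  # only usable rows ever reach the warehouse
    ]

def load(rows, conn):
    """Load only the cleaned product into the warehouse (SQLite stands in here)."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER, amount REAL, email TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 2 -- the bad row was filtered out
```

Note that the incomplete record is discarded before loading: the warehouse only ever holds pre-approved, pre-shaped data, which is exactly the property ETL was designed around.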
Historically, this process has been beneficial for on-premises data warehouses due to the costs of expanding and maintaining warehouse infrastructure. By filtering and cleaning the data before loading it, organizations only needed to store data that would be used. Additionally, this paradigm allows for filtering data for compliance reasons (e.g., GDPR) before sensitive data is loaded into a warehouse for user access.
How is ELT different?
ELT (Extract-Load-Transform) differs from the traditional ETL approach by inverting the final two stages of data ingestion and loading the data first. Due to advances in modern data storage and cloud data warehouses, quickly adding resources to accommodate new data is relatively cheap and organizations are now able to load large volumes of data quickly before making any alterations to the data. In other words, organizations can wait to spend effort on the transformation process until after the data is loaded. The most powerful and commonly used solution for storing raw data in an ELT process is a cloud data lake, an open-structure, cost-effective service available on all major cloud platforms.
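Inverting the last two stages looks like this in the same miniature setup. Again a sketch under stated assumptions: SQLite stands in for the cloud data lake/warehouse, and the record and view names are illustrative.

```python
import sqlite3

# Same hypothetical order records; the incomplete row is loaded anyway
raw_orders = [
    (1, "19.99", "a@example.com"),
    (2, "",      "b@example.com"),  # incomplete, but we keep it
    (3, "5.00",  "c@example.com"),
]

conn = sqlite3.connect(":memory:")

# Extract + Load: land ALL raw data untouched. In practice this would be a
# cloud data lake; SQLite stands in here.
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount TEXT, email TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_orders)

# Transform: deferred until after loading, and done inside the warehouse.
# An analyst can derive a view tailored to the question at hand without
# re-running ingestion.
conn.execute("""
    CREATE VIEW completed_orders AS
    SELECT id, CAST(amount AS REAL) AS amount, email
    FROM raw_orders
    WHERE amount <> ''
""")

print(conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0])        # 3 -- everything landed
print(conn.execute("SELECT COUNT(*) FROM completed_orders").fetchone()[0])  # 2 -- one usable shape of it
```

The key difference from the ETL version is that nothing is thrown away at ingestion time: the raw table keeps every record, and each transformation is just another query that can be added, changed, or discarded later.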
In the grocery analogy, ELT is like putting the groceries away without any alterations. This allows whoever is doing the cooking (e.g., a data scientist or business analyst) to process the raw materials they need in the way they need them processed.
By switching to the new way of ingesting data, organizations can reap several benefits at lower costs. First, because more data is loaded faster, data scientists and analysts can deliver quicker, more targeted insights. Second, this approach provides the flexibility to make changes along the way, removing the need to define everything up front. Third, ELT better leverages modern cloud solutions and cloud data lakes, which leads to higher scalability, security, and availability. Finally, along with streamlined processes and lower maintenance, cost is optimized using the pay-as-you-go pricing unique to cloud systems.
What are the costs and benefits of shifting from ETL to ELT?
The greatest benefits to be gained from switching from an ETL to an ELT process come from both the cloud compatibility of ELT and the flexibility it offers. While more options have become available for implementing this type of architecture on-premises, the ELT process fits naturally with a data lake hosted on a cloud platform. At the most basic level, moving to the cloud offers auto-scaling to meet the data needs of an organization. It also offers simpler security management, higher data availability, lower infrastructure maintenance, and data redundancy. These cloud benefits translate to lower operating costs for managing a data warehouse on the cloud, while at the same time expanding the options for leveraging the data.
Beyond the basic benefits of moving to the cloud, an ELT process leads to faster insights from data. Because data scientists and analysts are not waiting on engineers to transform data, they have more flexibility to transform and filter the data to fit the business problem at hand. This flexibility also translates to less time spent on the transformation process by engineers, who are now free to work on other critical tasks rather than creating a “one size fits all” data warehouse. Together, data scientists and analysts can collaborate with data engineers and architects to create targeted analytics pipelines in a truly agile data science environment.
While moving to ELT can lead to faster data insights at lower costs, there are some tradeoffs in moving to a more flexible and open data ecosystem. For one, a decision must be made about how much data should be loaded, weighed against the cloud costs; cloud storage is cheap, but data movement and processing can be expensive under some circumstances. Secondly, larger organizations need to consider the possibility that different analysts may reach different insights based on variations in their transformations. While this can be managed by engineers or by documentation of best practices for data cleaning, coordination and collaboration are key as an environment grows more complex. Finally, access to the data needs to be managed with respect to the volume of data that is available to users and the various roles throughout the organization. In the most extreme case, this could lead to compliance issues with standards such as GDPR, under which personally identifiable information (PII) must not be stored or accessed under certain conditions.
How can compliance issues be managed?
In the face of both external compliance issues and internal governance standards, it may make sense to do some light cleansing before the Load stage, even in an ELT process. For example, perhaps all data is passed through a PII scrubber to enforce data privacy. This light filtering stage could be thought of as a little “t” in an EtLT process. It provides some of the security benefits of transforming before loading, but can be done more quickly and cheaply than a full transformation stage would normally allow.
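A little-“t” PII scrub might look like the following sketch. The field names, the regex, and the choice to pseudonymize emails by hashing are all illustrative assumptions, not a prescribed approach; real scrubbers typically cover many more PII types.

```python
import hashlib
import re

# Simple illustrative pattern for email addresses embedded in free text
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def scrub_pii(record: dict) -> dict:
    """Little 't': pseudonymize PII before loading; everything else stays raw."""
    out = dict(record)
    if "email" in out:
        # Hashing keeps the column joinable across tables without
        # exposing the address itself
        out["email"] = hashlib.sha256(out["email"].encode()).hexdigest()[:12]
    if "notes" in out:
        # Mask emails hiding inside free-text fields as well
        out["notes"] = EMAIL_RE.sub("[REDACTED]", out["notes"])
    return out

rec = {"id": 7, "email": "a@example.com", "notes": "contact b@example.com re: refund"}
clean = scrub_pii(rec)
print(clean["notes"])  # contact [REDACTED] re: refund
```

Because only sensitive fields are touched, the scrub runs per record at ingestion speed, and everything downstream still enjoys the raw, untransformed data that makes ELT flexible.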
In other environments, compliance must be maintained downstream of the loading process. Under an ELT framework, data is no longer channeled through one simple, rigidly controlled pipeline. Many organizations manage the compliance challenges posed by this open framework using a Data Governance toolset, which controls data use throughout the environment.
What are the implications for organizational change?
As we have seen, ELT as a technique is changing the way organizations handle their data to respond to rapidly changing environments. But it changes more than one technical aspect of a business’ Data Estate—it leads to a paradigm shift in how an organization views and handles its data. Agile data management is one step on the roadmap to data maturity.
This shift involves areas like Development Processes and Data Governance, as well as evolutionary (or revolutionary) updates to one’s Modern Data Architecture. While this evolution can be achieved with incremental action, it requires a wholesale change in thinking. ELT is a philosophy just as much as a technique.
Advances in the Data World open many exciting opportunities for organizations to enhance their position in the competitive arena. These opportunities require new process and tool adoption and, more importantly, a fundamental rethinking of one’s approach to data valuation. Given the many ways to evolve one’s architecture to accelerate decision-making, we recommend starting with an assessment of your data ecosystem. We offer this through an Exploratory Data Analysis (EDA), a great way to understand where you stand in this journey and which strategies you can use to become a fully Data-Driven organization. The EDA assessment includes:
• Architectural Assessment
• Data Quality / Reliability Measurement
• Evaluation of ML / AI Readiness
• Roadmap Alignment
Please reach out to us today to begin the conversation and accelerate your organization’s path to data maturity.