The question of ETL versus ELT has been a frequent topic of discussion lately. In this post, we’ll look at the features that make each approach a good or poor fit for a modern cloud data warehouse and the challenges that underlie the two approaches.
Our previous blog posts presented details of agile analytics using Azure SQL Data Warehouse and Snowflake and the benefits of moving data into the cloud. Cloud analytics is gaining prominence thanks to improved agility and scalability, and we will look at how ELT-based cloud data integration plays an important role.
ETL is Getting Outdated
ETL has played a vital role in moving data between the layers of a traditional on-premises data warehouse. It starts with a collection/extraction layer, followed by a transformation layer that applies business rules, and ends with a reporting layer built for analytics.
ETL is best suited for structured data when data volumes are small or moderate. The data available to customers is not always in an organized and consistent structure, which makes the ETL process expensive for data preparation, in terms of both time and resource investments, when integrating newer requirements. It may also take several weeks or months to get the data prepared for BI needs.
Earlier implementations of data warehouse processes weren’t designed for the volumes we see now. Transformation time is proportional to the amount of data, so larger data sets take longer to transform and load. ETL must therefore be optimized to load only the data that is required within the available time frame.
In a typical data warehouse ETL environment, analysts and users spend time waiting for data loads to complete before they can look into recent data sets. This causes delays and often inconsistent results when reporting on high-volume data.
There are proven success stories on the ETL approach as it is more established and has been in the industry for decades. We can find best practices and plenty of how-to guides that act as a knowledge base. Below we will discuss how ELT differs and better suits modern cloud data warehouse solutions.
ELT is Emerging
With the swap of ‘T’ and ‘L’, ELT refers to extracting data, loading everything into the target system, and then applying any transformations after the data is loaded. Depending on the solution, data can be loaded into a cloud data warehouse, a data lake, or both, allowing users and data analysts to work as they choose.
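The load-first, transform-later flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: sqlite3 stands in for the target cloud data warehouse, and the table and column names are made up for the example. The key point is that the transformation runs as SQL inside the target system, after the raw data has landed.

```python
import sqlite3

def extract():
    # Extract: pull raw records from a source (hard-coded for this sketch).
    return [("2024-01-01", "widget", 3),
            ("2024-01-01", "gadget", 5),
            ("2024-01-02", "widget", 2)]

def load(conn, rows):
    # Load: land the data as-is in a raw/staging table, no transformation yet.
    conn.execute("CREATE TABLE raw_sales (day TEXT, product TEXT, qty INTEGER)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", rows)

def transform(conn):
    # Transform: apply business rules inside the target, after loading.
    conn.execute("""
        CREATE TABLE daily_totals AS
        SELECT day, SUM(qty) AS total_qty
        FROM raw_sales
        GROUP BY day
    """)

conn = sqlite3.connect(":memory:")
load(conn, extract())
transform(conn)
print(conn.execute("SELECT * FROM daily_totals ORDER BY day").fetchall())
# [('2024-01-01', 8), ('2024-01-02', 2)]
```

Because the raw table remains in place, other transformations can later be run against the same loaded data without re-extracting from the source.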
ELT is often used with NoSQL databases like HBase and MongoDB. Since it supports both structured and unstructured data, ELT is a good fit for columnar data warehouse environments in the cloud that can handle high volumes, varieties, and velocities of data.
One important differentiating factor is that ELT supports data lake architectures. A data lake is a repository of data in raw form, allowing multiple potential uses of the data from a single load. In Snowflake, a cloud solution, data can be ingested through an ELT process and transformations can be executed as needed on dedicated virtual warehouse(s). ELT thus helps to continuously refresh the cloud data warehouse while reporting continues uninterrupted.
ELT can store data in any form, and transformations demand less specialized tooling: developers with advanced SQL skillsets can transform data directly in the target system. ELT also enables schema on read, so the use of the data isn’t constrained by its structure at the time of load.
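Schema on read means no structure is enforced when data lands; each consumer projects the structure it needs at query time. The sketch below illustrates the idea in plain Python: the raw store, field names, and records are all invented for the example, and a real deployment would use a data lake or warehouse rather than an in-memory list.

```python
import json

raw_store = []  # stands in for a raw landing zone (e.g. a data lake)

def load_raw(record_text):
    # Load: accept any document as-is; no schema is validated or enforced.
    raw_store.append(record_text)

def read_with_schema(fields):
    # Read: project each raw document onto the schema this consumer wants.
    for text in raw_store:
        doc = json.loads(text)
        yield {f: doc.get(f) for f in fields}

load_raw('{"user": "ana", "clicks": 3, "country": "PE"}')
load_raw('{"user": "bo", "clicks": 7}')  # missing a field; the load still succeeds

# Two consumers apply two different schemas to the same loaded data.
print(list(read_with_schema(["user", "clicks"])))
print(list(read_with_schema(["user", "country"])))
```

The second consumer simply sees a null country for the record that lacks one; neither load had to anticipate either consumer’s structure.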
ELT techniques can use the same database management system for storage as well as transformations, so load processing speeds are less dependent on the size of the data.
ELT is replacing ETL in cloud data integration processes due to the factors discussed above. ELT is a relatively new approach that shifts data preparation effort to the time of analytic use. It works well for data warehouse modernization and also supports data lake deployments. This enables customers to combine data from centralized repositories with flexible BI reporting solutions based on their business needs.