Snowflake Cloud Data Warehouse
Traditionally, building a data warehouse with ETL takes longer to implement. This is due to factors such as upfront data modeling, ETL design and development, aggregates and performance tuning, administration, testing and implementation. In this modern digital era, demand for agile analytics is increasing, along with data needing to be processed, analyzed and reported. Since traditional data management strategies for on‑premises data warehouses were not designed to support cloud data warehousing or advanced analytics, a new approach is required to support these needs.
The following are some of the top challenges in traditional data warehouse architecture:
- Dependency on IT to provision, build and maintain hardware and software
- Long duration of implementation cycles and time to market
- Highly complex data pipelines
- Performance issues and tuning
- Handling a variety of data (structured and un-structured)
Snowflake is promising agile data management with its cloud data warehouse, addressing the above challenges with its innovative 3-layered architecture. This physically separates, but logically integrates storage, computation and services like security and metadata as shown below:
Picture courtesy of Snowflake.
The goal of this article is to show the user how to build a Snowflake cloud data warehouse, load data, and analyze this data with no ETL or aggregate tables required. We will perform the following steps to accomplish our goal:
- Create and configure a Snowflake data warehouse
- Load data into Snowflake (storage)
Create and Configure a Snowflake Cloud Data Warehouse
The first step in this process is to make sure you have a Snowflake account. If you are just getting started you can sign up for a free trial to conduct this exercise. Administration and configuration can be done in two ways through either the web site or SnowSQL. SnowSQL is a command line interface client that connects to Snowflake to execute DDL commands. These create and drop objects like databases, schemas, tables, views, and DML commands to load, insert, select and update data in the tables. To use SnowSQL, you need to install and configure SnowSQL on your local computer. You can use the web interface to create a database and data warehouse in a few clicks:
- Login to Snowflake web interface
- Switch role to SYSADMIN
- Create new database
- Create new data warehouse
Creation of the virtual data warehouse is required to load, process and execute queries. As shown in the above Snowflake architecture diagram, all the data processing is happening in data warehouse layer. You can learn more about connecting to Snowflake here.
Load data into Snowflake (Storage)
When loading data into Snowflake, it must be loaded as a flat data file or semi-structured data file (JSON, XML etc.) unless you use a tool that converts relational data sources for loading. Loading data into Snowflake can be done in multiple ways, including web interface, SnowSQL, and third party ETL tools. We can try to load data in to Snowflake using all the above methods. You can use the web interface to load a small data file, like dimension data, as there is restriction on the size of the data through the web interface. Likewise, you can use SnowSQL to load larger data files that might include transnational data.
In order to read data directly from relational tables, we can use third party tools like FiveTran and Stitch to load data from RDBMS tables into Snowflake. Both FiveTran and Stitch offer free trials and are very easy to use. They both work in a similar way using change tracking to basically replicate data and records into Snowflake. The diagram below depicts the various methods described above for data loading:
Loading data is greatly simplified because complex ETL data pipelines are no longer needed to prepare data for loading. Snowflake supports and optimizes diverse data, both structured and semi-structured, while making that data accessible via SQL. Here, you can learn more about loading data in Snowflake.
Prepare Data for Analysis
One of the biggest advantages of Snowflake is the elastic and scalable compute that allows for ease of analytical data preparation. For example, instead of creating a table and an ETL process to do aggregate calculations or logic, you can create a view on the raw transaction detail. You can then connect tools like Tableau, Power BI, and MicroStrategy to Snowflake using ODBC connectivity, and build analytical reporting on top of your created view. When the report is executed, Snowflake’s virtual warehouse processes the logic on the base tables, and returns the output usually faster than an on-premise traditional data warehouses.
In this approach, we can reduce the number of tables at different aggregate levels and dimensions, while writing and maintaining ETLS to process raw data and tables. We can rely on Snowflake to do all the ETL processing using its massively parallel processing power very quickly. We hope this exercise has helped you get up and running with Snowflake, and realize some of the key benefits integrated within.