The Path Towards Modern Cloud Data Warehousing with Snowflake
As business processes are adapting, so should your data warehouse. We’ll discuss what moving to a cloud data warehouse entails and how our partner, Snowflake, is providing a simple, efficient, and flexible solution to handling the high volume, variety, and velocity of data companies need to tend to.
Picture courtesy: Snowflake Computing Inc.
The Rising Need of a Modern Data Warehouse
Business Intelligence processes play a key role in all organizations that help various layers from executives to managers to make decisions and analyze their business. The analysis helps the business make better decisions and increase the productiveness of their organizations. Our data management services team understands that certain factors like a few listed below could impact the effectiveness of the analytics.
- In the past and present, several organizations have managed their data on-premise where the data sources are present in a local database and bound by network firewalls. More recently, the growing pain of a traditional on-premise data warehouse has led to performance issues, workload contentions, etc. One of the most common outcome is longer run times of user reports that eventually could time out or hang up without completion. Typically, companies have to work around with options like changing the data load processes by building smaller aggregate datasets or separating the conflicting run times for data and dashboards.
- The rise of new data sources and the growing trend of IoT (Internet of Things) has a huge impact on the analytics. This data being unstructured and huge in volume could pose challenges to design datasets using traditional database practices. This makes cloud data warehouse a perfect location to store and integrate the data.
Snowflake’s unique design and capabilities provide solution to several problems like the ones we discussed above. Let’s look at some of the key features below:
Picture courtesy: Snowflake Computing Inc.
Storage, Compute, and Services
The traditional on premise data warehouses are becoming expensive in terms of hardware and administration. It often needs a dedicated team of database administrators and design experts to monitor and maintain the system. As we all know, “data assets” are ever growing and companies have to estimate the costs upfront in a traditional on-premise data warehouse. This is where CDW (Cloud Data Warehouse) provides flexibility that eliminates the upfront costs by providing with the “pay as you go” pricing option. It also opens up the advantage of exploring options with broader data sets. Statistics show that storage costs have considerably declined over the past few years with data storage getting cheaper proving that integrating data in the cloud makes things easier. Snowflake’s unique architecture separates the service, compute and storage components.
- Snowflake uses Amazon S3 for storage. It supports columnar based storage which leads to improved query performance
- The independent compute resources handle the data processing tasks for loading and running queries
- The services layer is like the engine that communicates with Client applications to coordinate query processing and return results
Infinite Scalability and Elasticity
Analytics is critical to business and users often want to get results within a reasonable time frame. The limitation with the conventional data warehouse is the slowness of the system which results with workloads to run within the limited resource availability. The cloud data warehouse has the perfect solution of elastic scaling where the resources can be scaled up and down at any given time based on number of workloads.
Have you ever wondered how this works?
Snowflake uses Virtual Warehouses (VW) for the query processing. Each VW is an independent cluster, and can be scaled up and down as the need changes. The design allows multiple clusters to access the same database tables allowing data sharing without any data movement. Costs are incurred only when the VW is being used/up and running. Multi cluster warehouses can be created by just specifying name and size while Snowflake handles all the provisioning and configuration.
Say Goodbye to Work Load Contentions
It is a common scenario where multiple processes like data loads and user report queries access the system at the same time. This leads to a problem where the user might see partial/inaccurate data or end up modifying the report execution times which can impact the business team both in terms of time and value. Snowflake’s unique design distributes the loads on virtual warehouses across multiple compute clusters. Let’s look at how contentions are avoided:
For example, we can assign ETL and business user runs to run on their individual cluster – Data Cluster and Business Cluster. By doing this we avoid any bottlenecks that happen on loads and the user runs, as they run independently of each other. Further, it is also possible to clone a table, schema, or a database across different environments to get updated/refreshed results continuously.
Handle Various Data Forms
Data modeling is very critical because it represents all the entities of the business.
In the modern cloud world, data modeling is agile and can change overtime. To support this, Snowflake handles structured and semi-structured (JSON, XML, etc.) data very well. In addition to supporting the common SQL data types (Numeric, String & Binary, Data & Time), Snowflake has VARIANT, OBJECT, and ARRAY data types to support semi structured data. All the common database DDL operations can be performed using the SQL: ANSI.
The most common concern in cloud systems is the potential risk to the data security, security of users, and access to sensitive information. Snowflake provides the network, user, and data security similar to other cloud providers. In addition, there are some great features for data recovery and secure views offered by Snowflake.
- Let’s say an object was deleted accidentally. Do you think it can be recovered to the state it was in at that point of time? The answer is yes. Snowflake offers a retention period up to 90 days to recover the objects that have been deleted.
- Snowflake allows you to create a secure view when sharing needs to be restricted within a required data set. Snowflake strongly recommends sharing secure views and/or secure UDFs instead of directly sharing tables so that it does not expose all the sensitive data underneath. We can also add privileges to the secure view to other groups.
I hope this post gave you an overview for the importance of cloud data warehousing and the benefits of having Snowflake for your data warehousing needs.
Keep Reading: Getting Started with Snowflake Cloud Data Warehouse
There’s more to explore at Smartbridge.com!
Sign up to be notified when we publish articles, news, videos and more!