Modern Data Integration in Cloud – Azure Data Factory for Snowflake
As more organizations move toward modernizing their data warehouses, many applications and services have emerged to help facilitate this digital transformation. This is the first part of a two-part series on how Azure Data Factory, a cloud data integration service, can be used to load data from on-premises databases into Snowflake, a cloud data warehouse.
With the modernization of data warehouses and the emergence of platform as a service (PaaS) solutions, there are multiple options for data integration in the cloud, and Azure Data Factory is among the leaders. Azure Data Factory (ADF) is a fully managed cloud data integration service that composes data storage, movement, and processing services into automated data pipelines. It makes it possible to create, schedule, and orchestrate ETL/ELT workflows at scale, wherever the data lives, whether in the cloud or on a self-hosted network.
Azure Data Factory is a crucial element of the broader Azure big data ecosystem. ADF pipeline definitions are stored as a set of JSON files. Previously, these JSON files had to be authored by hand. Fortunately, that is no longer the case: with Azure Data Factory V2, developers and consumers of data no longer need to edit JSON files directly.
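To make the JSON underpinnings concrete, here is a minimal sketch of the kind of pipeline definition ADF V2 stores behind the scenes: a single Copy activity moving data between two datasets. The pipeline, activity, and dataset names here are illustrative placeholders, not taken from any real factory.

```python
import json

# Illustrative ADF V2 pipeline definition with one Copy activity.
# All names (pipeline, activity, datasets) are hypothetical.
pipeline = {
    "name": "CopyOnPremToSnowflake",
    "properties": {
        "activities": [
            {
                "name": "CopySalesData",
                "type": "Copy",
                "inputs": [
                    {"referenceName": "OnPremSalesDataset", "type": "DatasetReference"}
                ],
                "outputs": [
                    {"referenceName": "SnowflakeStageDataset", "type": "DatasetReference"}
                ],
                "typeProperties": {
                    "source": {"type": "SqlSource"},
                    "sink": {"type": "OdbcSink"},
                },
            }
        ]
    },
}

# Serialize it the way ADF would persist it as a .json file.
print(json.dumps(pipeline, indent=2))
```

In ADF V2 this structure is generated for you by the visual authoring experience; you only drop down to the JSON when you want source control diffs or programmatic generation.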
Data Factory is suited to both ELT- and ETL-style data loading jobs.
In ADF V2, SQL Server Integration Services (SSIS) packages can be lifted into Azure Data Factory and executed there. This is a giant step forward: Smartbridge has a number of clients with large libraries of SSIS packages, and the ability to move these into ADF enables a full migration to Azure. Companies gain the flexibility to migrate quickly and reuse their existing SSIS investment in the cloud with ADF, with no need to reinvent the wheel.
Some of the Capabilities of Azure Data Factory
- Creation and orchestration of ETL processes in Azure
- User-friendly interface for developers to create pipelines and transformations
- Secure and scalable
- Execution of stored procedures in the cloud
- Linked services that are easy to use and manage over the network
- Deployment of existing on-premises SSIS packages
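Orchestration in the first capability above is not limited to the portal: ADF V2 pipelines can also be triggered programmatically through the Azure management REST API. As a sketch, the snippet below builds the documented "Pipelines - Create Run" endpoint URL; the subscription, resource group, factory, and pipeline names are placeholders, and the actual POST call (with an Azure AD bearer token) is omitted.

```python
# Placeholder identifiers for illustration only.
subscription_id = "00000000-0000-0000-0000-000000000000"
resource_group = "rg-dataplatform"
factory_name = "adf-demo"
pipeline_name = "CopyOnPremToSnowflake"

# ADF V2 "Create Run" REST endpoint; POSTing here (with a bearer token)
# starts a pipeline run and returns its run ID.
run_url = (
    "https://management.azure.com"
    f"/subscriptions/{subscription_id}"
    f"/resourceGroups/{resource_group}"
    "/providers/Microsoft.DataFactory"
    f"/factories/{factory_name}"
    f"/pipelines/{pipeline_name}/createRun"
    "?api-version=2018-06-01"
)
print(run_url)
```

The same endpoint is what schedule and event triggers invoke under the hood, which is why anything you can run interactively can also be automated.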
Snowflake has been available on Azure since July 2018, giving clients more flexibility in choosing an underlying cloud platform.
There are a number of steps that need to be taken in order to establish an ODBC connection between on-premises databases and Snowflake.
- Create an Azure Data Factory Integration Runtime in Azure
- Install Azure Data Factory Integration Runtime on the on-premises server
- Pair these two Azure Data Factory Integration Runtime instances
- Download the ODBC Driver from Snowflake Help and install it on the on-premises server
- Add the Snowflake system DSN to the on-premises server
- Create an Azure Linked Service pointing to the on-premises source database
- Create an Azure Linked Service pointing to the Snowflake target database
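The last step above can be sketched as JSON, since linked services, like pipelines, are stored as JSON definitions. Below is an illustrative ODBC linked service for the Snowflake target, referencing the system DSN and the self-hosted integration runtime from the earlier steps. The DSN, user, database, warehouse, and runtime names are all hypothetical placeholders.

```python
import json

# Illustrative ADF linked service pointing at Snowflake through the
# ODBC driver's system DSN. All names here are placeholders.
snowflake_linked_service = {
    "name": "SnowflakeViaOdbc",
    "properties": {
        "type": "Odbc",
        "typeProperties": {
            "connectionString": "DSN=snowflake_dsn;database=ANALYTICS;warehouse=LOAD_WH",
            "authenticationType": "Basic",
            "userName": "etl_user",
            # The password would be supplied as a secure string or a
            # Key Vault reference, never stored in plain text.
        },
        "connectVia": {
            "referenceName": "SelfHostedIR",
            "type": "IntegrationRuntimeReference",
        },
    },
}

print(json.dumps(snowflake_linked_service, indent=2))
```

The `connectVia` block is what routes the connection through the self-hosted integration runtime on the on-premises server, which is why the pairing step matters before the linked service can work.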
In the next part, we will cover all seven steps in detail. Once the connections are established, developers can use them in data movement pipelines and activities.