Modern Data Integration in Cloud – Azure Data Factory for Snowflake

As more organizations move toward modernizing their data warehouse, many applications and services have taken part in helping facilitate this digital transformation. This article explains how the cloud data integration service, Azure Data Factory, can be used to load data from on-premises databases into Snowflake, a cloud data warehouse.

With the modernization of data warehouses and emergence of platform as a service (PaaS) solutions, there are multiple applications for data integrations in the cloud. Azure Data Factory is one which is leading the way. Azure Data Factory (ADF) is a fully managed cloud data integration service which is able to compose data storage, movement, and processing services into automated data pipelines. It is possible to create, schedule, and orchestrate ETL/ELT workflows, at scale and wherever the data lives, including the cloud or a self-hosted network.

Snowflake Logo

Learn more about our partnership with Snowflake

Azure Data Factory is a crucial element of the whole Azure Big Data ecosystem. ADF pipeline definitions are stored in a set of JSON files. Previously, the JSON files had to be created manually. Fortunately, this is no longer the case. With Azure Data Factory V2, developers and consumers of data don’t have to care about JSON files.

Data Factory is suited for both ELT and ETL type of data loading jobs.

In ADF V2, SQL Server Integration Services (SSIS) packages can be moved to Azure Data Factory. This is a giant step forward. Smartbridge has a number of clients with tons of SSIS packages. The ability to move these into ADF enables full migration to Azure. This gives companies the flexibility to migrate quickly and reuse their current SSIS code investment in the cloud with ADF. There is no need to reinvent the wheel.

Some of the Capabilities of Azure Data Factory

  • Creation and orchestration ETL processes in Azure

  • User friendly interface for developers to create pipelines and transformations

  • Secured and scalable

  • Execution of stored procedures in the cloud

  • Easy to use and manage linked services over the network

  • Deployment of existing on-premises SSIS packages

Snowflake is the leading massively parallel processing (MPP) cloud data warehouse. Snowflake is built for the cloud.

Since July 2018 Snowflake is available on Azure which has brought clients more flexibility in choosing underlying cloud platform.

Snowflake Cloud Data Warehouse on Azure Diagram
Picture courtesy of Snowflake

There are number of steps which need to be taken in an order to establish ODBC connection between on-premises databases and Snowflake.

  1. Add Azure Data Factory Integration Runtime on Azure
  2. Install Azure Data Factory Integration Runtime on the on-premises server (get the latest version here)
  3. Pair these two Azure Data Factory Integration Runtime instances
  4. Download the ODBC Driver from Snowflake Help and install it on the on-premises server
  5. Add the Snowflake system DNS to the on-premises server
  6. Create an Azure Linked Service pointing to the on-premises source database
  7. Create an Azure Linked Service pointing to the Snowflake target database

Once connections are established, developers will use them for data movement pipelines or activities.

There’s more to explore at Smartbridge.com!

Sign up to be notified when we publish articles, news, videos and more!