Best Practices in Azure Data Factory Version 2

While you can find several “how to” articles on the web about Microsoft’s Azure Data Factory (ADF), there are virtually no “why to” articles. Here we’ll define some best practices to remember while working in Azure Data Factory version 2.

Article originally published April 2020

Azure Data Factory Best Practices

Since there aren’t many guiding resources on Azure Data Factory version 2, I wanted to share some “bigger-picture” notions about how to approach orchestration and data pipelines from a more architectural perspective. Note that ADF version 2 differs from version 1 in its features, so the content below is aimed specifically at version 2.

First things first: remember that good architecture practice always calls for appropriate separation of concerns between your solution layers. If you are working in ADF, you are probably building a Modern Data Architecture solution in the Azure cloud. Your solution should therefore consist of at least three separate layers:

Ingestion Layer
This layer focuses purely on the intake of raw data from source systems. Typically in Modern Data Architecture, this layer stores data in a “raw zone” in a data lake store. Perform little, if any, cleansing or transformation here. Further, there should be little (if any) consumption of raw-zone data by non-system users.
Transformation/Experimentation Layer
This layer is where we massage data from the raw zone into a consumable form. Typically, this layer is more than just a data store: transformations are performed on raw data, and the results are stored in the consumption layer. This is also where we fulfill experimentation and data science needs, since those data sets might not take the same form as the ones prepared for end-user consumption.
Consumption Layer
This is where we store ready-to-use data for user consumption. It may consist of a formalized data warehouse or mart structure, but it might also live in the data lake itself in a “cleansed” or “user” zone.
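One way to keep the layers above separate in practice is to give each lake zone its own root folder and build every path through a single helper. The following is a minimal sketch, not a prescribed layout: the zone names come from the descriptions above, while the function name, the source/dataset arguments, and the year/month/day partitioning are assumptions made for illustration.

```python
from datetime import date
from pathlib import PurePosixPath

# Zone names taken from the layer descriptions; the folder layout itself
# is a hypothetical convention, not something ADF requires.
ZONES = ("raw", "cleansed")


def zone_path(zone: str, source: str, dataset: str, load_date: date) -> str:
    """Build a date-partitioned folder path inside one zone of the lake."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    return str(
        PurePosixPath(
            zone,
            source,
            dataset,
            f"{load_date:%Y}",
            f"{load_date:%m}",
            f"{load_date:%d}",
        )
    )


# The same dataset resolves to parallel locations in each zone.
print(zone_path("raw", "erp", "orders", date(2020, 4, 1)))
print(zone_path("cleansed", "erp", "orders", date(2020, 4, 1)))
```

Because ingestion only ever asks for "raw" paths and consumers only ever ask for "cleansed" paths, the separation between layers shows up directly in the folder structure.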

Having laid out these concepts, note that Azure Data Factory version 2 doesn’t play as much of a role in the consumption end of things. It is mostly intended as a key utility in the ingestion and transformation layers. That said, the specific role of ADF, and your approach to it, differs between those two layers.

While ingestion can be carried out solely by ADF itself (with some considerations), transformation is not as straightforward, and ADF is better relegated to the role of orchestrator.


Which brings an important point into focus…

ADF is primarily an orchestration tool, not so much a data transformation tool. Yes, it has capabilities in that regard, but typical uses delegate transformation logic to Databricks, Spark/Storm, or (less commonly these days) HDInsight.
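To make the orchestrator role concrete, here is a rough sketch of what a transformation pipeline definition can look like when ADF only triggers the work and Databricks performs it. The Python below just assembles the pipeline JSON as a dictionary. The activity type "DatabricksNotebook" and the "notebookPath" and "baseParameters" properties follow the ADF pipeline JSON format as I understand it, while the pipeline name, linked service name, notebook path, and parameter name are all hypothetical.

```python
import json


def build_transform_pipeline(name: str, notebook_path: str, raw_path: str) -> dict:
    """Return an abbreviated ADF pipeline definition with one notebook activity."""
    run_notebook = {
        "name": "TransformInDatabricks",
        "type": "DatabricksNotebook",
        # Hypothetical linked service pointing at a Databricks workspace.
        "linkedServiceName": {
            "referenceName": "DatabricksWorkspace",
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "notebookPath": notebook_path,
            # ADF passes the location only; the notebook owns the logic.
            "baseParameters": {"raw_path": raw_path},
        },
    }
    return {"name": name, "properties": {"activities": [run_notebook]}}


pipeline = build_transform_pipeline(
    "TransformOrders", "/Shared/transform_orders", "raw/erp/orders"
)
print(json.dumps(pipeline, indent=2))
```

Note that nothing in the definition describes how the data is reshaped. ADF knows which notebook to run and where the raw data sits, and that is all.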

What does this mean for your ELT/ETL architecture with ADF? It means considering each layer of the solution and zone of the data architecture separately. Couple your ingestion subsystem loosely with your transformation subsystem, and consider the needs of each separately. Don’t feel compelled to force ADF into a role it’s not suited for.
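The loose coupling described here can be sketched in plain Python, with local folders standing in for lake zones. This is an illustration under simplifying assumptions rather than ADF code: the function names, the orders.csv file name, and the whitespace-trimming "transformation" are all made up for the example. The point is that ingestion and transformation never call each other and share only the raw zone location.

```python
import csv
import tempfile
from pathlib import Path


def ingest(rows, raw_dir: Path) -> Path:
    """Ingestion: land source rows unchanged in the raw zone."""
    raw_dir.mkdir(parents=True, exist_ok=True)
    target = raw_dir / "orders.csv"
    with target.open("w", newline="") as f:
        csv.writer(f).writerows(rows)
    return target


def transform(raw_dir: Path, cleansed_dir: Path) -> Path:
    """Transformation: read from the raw zone, write a cleansed copy."""
    cleansed_dir.mkdir(parents=True, exist_ok=True)
    target = cleansed_dir / "orders.csv"
    with (raw_dir / "orders.csv").open(newline="") as src:
        with target.open("w", newline="") as dst:
            writer = csv.writer(dst)
            for row in csv.reader(src):
                # Stand-in for real transformation logic: trim whitespace.
                writer.writerow([cell.strip() for cell in row])
    return target


def run(rows, lake_root: Path) -> Path:
    """Orchestration: sequence the two steps; holds no data logic itself."""
    ingest(rows, lake_root / "raw")
    return transform(lake_root / "raw", lake_root / "cleansed")


with tempfile.TemporaryDirectory() as tmp:
    result = run([["id", " name "], ["1", " Ada "]], Path(tmp))
    print(result.read_text())
```

Either step can be replaced (for example, swapping the transformation for a Databricks job) without touching the other, as long as both keep honoring the raw zone location.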

Keep Reading: Using Azure ARM Templates for Data Factory Deployments

About the Author: Smartbridge
