Best Practices in Azure Data Factory Version 2
While you can find several “how to” articles on the web about Microsoft’s Azure Data Factory (ADF), there are virtually no “why to” articles. Here we’ll define some best practices to remember while working in Azure Data Factory version 2.
Article originally published April 2020
Azure Data Factory Best Practices
Since there aren’t many guiding resources on Azure Data Factory version 2, I wanted to share some “bigger-picture” notions about how to approach orchestration and data pipelines from an architectural perspective. Let me preface this by noting that ADF version 2 differs substantially from version 1 in its feature set (most notably its control-flow and trigger capabilities), which is why the content below is aimed squarely at version 2.
First things first – Remember that good architecture practices always call for appropriate separation of concerns/functionality between your solution layers. If you are working in ADF, it stands to reason that you are probably building a Modern Data Architecture solution in the Azure cloud. Therefore, your solution should consist of at least 3 separate components/layers:

1. An ingestion layer, which lands raw data from source systems into the platform
2. A transformation layer, which cleanses, shapes, and enriches that data
3. A consumption (serving) layer, which exposes the prepared data to reporting and analytics tools
Having laid out these concepts, note that Azure Data Factory version 2 doesn’t play much of a role in the consumption layer; it is mostly intended as a key utility in the ingestion and transformation layers. That said, the specific role of ADF, and your approach to it, differs between those two layers.
While ingestion can be carried out solely by ADF itself (with some considerations), transformation is not as straightforward, and ADF is better relegated to the role of orchestrator.
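To make the ingestion case concrete, here is a minimal sketch of an ingestion-only pipeline (a single Copy activity, no transformation logic) defined with the azure-mgmt-datafactory Python SDK. The subscription, resource group, factory, and dataset names are hypothetical placeholders, and exact model signatures vary slightly across SDK versions.

```python
# Minimal sketch: an ingestion-only ADF pipeline (a single Copy activity).
# All names below are hypothetical placeholders; model signatures vary
# slightly across azure-mgmt-datafactory versions.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    PipelineResource, CopyActivity, DatasetReference, BlobSource, BlobSink,
)

adf_client = DataFactoryManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
)

# Copy raw source data into the landing zone; no transformation logic here.
copy_raw = CopyActivity(
    name="CopySourceToLanding",
    inputs=[DatasetReference(type="DatasetReference", reference_name="SourceDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="LandingDataset")],
    source=BlobSource(),
    sink=BlobSink(),
)

pipeline = PipelineResource(activities=[copy_raw])
adf_client.pipelines.create_or_update(
    "my-resource-group", "my-data-factory", "IngestRawData", pipeline
)
```

Notice that the pipeline definition says nothing about how the data is reshaped – it only moves bytes. That is the posture ADF is comfortable in.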
Which brings an important point into focus…
ADF is primarily an orchestration tool – not so much a data transformation tool. Yes, it has capabilities in that regard (e.g., Mapping Data Flows), but typical uses defer transformation logic to Databricks, Spark/Storm, or (less commonly these days) HDInsight.
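The sketch below illustrates that deferral, using the same SDK and the same kind of hypothetical names as the previous example: ADF sequences a Copy activity into a Databricks notebook activity, while the actual transformation code lives in the notebook, not in the pipeline definition.

```python
# Minimal sketch: ADF as orchestrator, deferring transformation to Databricks.
# Notebook path, linked service, and dataset names are hypothetical placeholders.
from azure.mgmt.datafactory.models import (
    PipelineResource, CopyActivity, DatabricksNotebookActivity,
    LinkedServiceReference, ActivityDependency,
    DatasetReference, BlobSource, BlobSink,
)

ingest = CopyActivity(
    name="IngestToRawZone",
    inputs=[DatasetReference(type="DatasetReference", reference_name="SourceDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="RawZoneDataset")],
    source=BlobSource(),
    sink=BlobSink(),
)

# ADF only sequences the work; the transformation logic itself lives in the
# Databricks notebook, which runs once the ingestion step has succeeded.
transform = DatabricksNotebookActivity(
    name="TransformRawToCurated",
    notebook_path="/pipelines/transform_raw_to_curated",
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="AzureDatabricksLS"
    ),
    depends_on=[ActivityDependency(
        activity="IngestToRawZone", dependency_conditions=["Succeeded"]
    )],
)

orchestration = PipelineResource(activities=[ingest, transform])
# Deploy with the same pipelines.create_or_update call shown in the earlier sketch.
```

The pipeline holds only control flow – ordering and dependency conditions – which is exactly the role ADF is built for.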
What does this mean for your ELT/ETL architecture with ADF? It means considering each layer of the solution and zone of the data architecture separately. Couple your ingestion subsystem loosely with your transformation subsystem, and consider the needs of each separately. Don’t feel compelled to force ADF into a role it’s not suited for.