Agile Analytics – Azure SQL Data Warehouse

Today, everything needs to be instant, even our data. With the rise of cloud-based services like Microsoft Azure SQL Data Warehouse, companies are now able to gather the information they need whenever they need it with on-demand scalability. We will walk through Azure SQL Data Warehouse (ASDW), its implementation use cases, as well as how to load and visualize the data.

Article originally published March 2019

Introduction

With digital transformation, companies are facing increased demand to analyze data on the fly to obtain valuable insights in a timely manner. This might not be possible with existing infrastructure due to rapidly-growing volumes of structured and unstructured data. On-premises solutions take time to provision or scale up the resources. The emergence of cloud technologies brought agility, on-demand scalability, and quick return on investment. Smartbridge clients are now turning their attention to their legacy data warehouse as an opportunity to further leverage cloud-based services to modernize their data management architecture.

What is Azure SQL Data Warehouse?

Azure SQL Data Warehouse (ASDW) is a massively parallel processing (MPP) data warehouse with unlimited storage and computing power which can be scaled in and scaled out as per demand. Azure SQL Data Warehouse is optimized for reporting purposes. It is helping companies to analyze their structured and unstructured data faster to realize value sooner. With Polybase, SQL Server tables can be joined with unstructured data residing in Azure Blob Storage without need to bring that data into ASDW.

Azure SQL Data Warehouse Architecture

ASDW has a similar architecture to other managed MPP databases that separate storage from compute resources. Built on top of Azure Blob Storage, it dynamically provisions those compute resources to query data that resides there.

Azure SQL Data Warehouse Architecture Diagram
Picture courtesy of Microsoft

ASDW is divided between two types of nodes: Control (the “brain”) and Compute (the “engine”).

Control nodes:

  • Responsible for creating a query execution plan, breaking that query into parallel phases to be executed by each of the compute nodes

  • Stores query metadata, query optimization and distribution

Compute nodes:

  • Contains an instance of SQL Database responsible for processing the data

  • Returns query results to the control node to be aggregated once processed

Azure SQL DWH Implementation Use Cases

There are multiple implementation use cases available to suit company needs and long term plans.

1.   New: Create Azure SQL DWH on Microsoft Azure

In this use case, a completely new Azure SQL Data Warehouse is created and data is loaded from sources. This is for companies who want to build their first data warehouse.

Azure SQL DWH Diagram
Picture courtesy of Informatica

2.    Extend: On-Premises Enterprise Data Warehouse with Azure SQL Data Warehouse

In this use case, data from the on-premises data warehouse can be replicated to Azure SQL Data Warehouse to achieve higher performance and mitigate bottlenecks in data analysis and visualization.

3.    Lift and Shift: On-Premises Data Warehouse to Azure SQL Data Warehouse

In this approach, an on-premises data warehouse is migrated into Azure SQL Data Warehouse. This is accomplished using Microsoft Data Migration Service Utility. Alternatively, a current backup of SQL Server databases, residing in Azure Blob storage, can be restored to a managed instance of Azure SQL Data Warehouse.

Extend On-Premises Enterprise DWH Diagram
Picture courtesy of Informatica

Data Integration/Data Load

SQL Data Warehouse supports loading and exporting data through several Microsoft tools as well as third-party tools such as Informatica, Fivetran, Striim, Talend, and Alooma.

The most commonly used Microsoft tools are Azure Data Factory, SSIS, Polybase, and BCP.

Azure Data Factory (ADF)

  • Fully managed ETL service in cloud

  • Allows creation of data-driven workflows in the cloud for orchestrating and automating data movement and data transformation

  • Can load data from both relational and non-relational sources into Azure SQL Data Warehouse. It provides initial and incremental data load capabilities and jobs can be scheduled

  • Allows re-use of SSIS packages (see below)

  • Stored procedures with transformation logic can be implemented

AzCopy

  • Command-line utility designed for copying data to/from Microsoft Azure Blob, file, and table storage, using simple commands

SQL Server Integration Services (SSIS)

  • SSIS package can be used to load data from SQL Server into Azure SQL Data Warehouse

  • Allows restructuring, transformation and cleansing of data

  • SSIS uses ADO NET to connect to SQL Data Warehouse

  • SSIS packages can be executed into Azure Data Factory using Integration Runtime

Polybase

  • No ETL is required

  • Uses familiar T-SQL language to run queries on external data

  • Can push query operations to Hadoop

  • External data can be joined with data stored in ASDW

Apart from these tools, Microsoft also provides utilities to migrate an on-premises data warehouse into Azure SQL Data Warehouse.

Data Visualization

Azure SQL Data Warehouse supports leading data visualization tools for data analysis and reporting.

MicroStrategy

  • MicroStrategy’s connector enables web and desktop connectivity workflows and enhances the Relational Engine with support for Azure SQL Data Warehouse

Tableau

  • Connects to Azure SQL Data Warehouse through Tableau SQL Server connector

Power BI

  • Embedded Power BI as well Desktop Power BI can be connected for data visualization

MicroStrategy Partner
Tableau partner

Conclusion

With Microsoft Azure SQL Data Warehouse implementation, companies are achieving quicker return on their data management investment. Some of the most valuable benefits are:

Improved Analytics and Decision Making

  • With essentially unlimited scalability of cloud computing resources, companies achieve improvements in performance more easily when compared to on-premises architecture

  • Data scientists and business analysts can access and analyze data from multiple sources within a short time and bring agility to their business decision making

  • It’s secure and reliable, with minimum down time and data loss

Quick Setup and Lower Upfront Infrastructure Costs

  • No capital expenditure

  • There is no waiting period to procure infrastructure

  • Resources can be scaled up and down as needed, and paused when not in use

Improved Productivity

  • Reduced workload on Database Administrators and Developers

  • Overall reduction in overhead

Share This:

Looking for more on Data?

Explore more insights and expertise at Smartbridge.com/data

There’s more to explore at Smartbridge.com!

Sign up to be notified when we publish articles, news, videos and more!