Azure Synapse Analytics: The Service Formerly Known as SQL Data Warehouse

Azure Synapse Analytics (formerly SQL Data Warehouse), Microsoft’s latest data service offering, was announced earlier this month at Microsoft Ignite. Synapse is the next generation of Azure SQL Data Warehouse, blending big data analytics, data warehousing, and data integration into a single unified service that provides end-to-end analytics with limitless scale. It lets you query your data using either serverless on-demand compute or provisioned resources. Bringing these two compute models together, Synapse delivers the ability to ingest, explore, prep, train, manage, and serve data through a single pane of glass to support business intelligence, machine learning, and data science workloads in the cloud.

It is the first service of its kind to bring multiple technologies together into one unified experience to reduce time to market, increase development efficiencies, and knock down silos between teams. End-to-end development, from data ingestion through cleansing to visualization, can be completed in one user interface. No longer will you need to switch between multiple tools to build and support your data and analytics platform. Synapse Analytics brings members of multiple teams into one tool to support collaboration across the enterprise data landscape.

Azure Synapse Studio

Microsoft has enriched the Azure SQL Data Warehouse experience by providing a sleek new look and feel. Azure Synapse Studio brings together the ability to ingest, explore, analyze, and visualize your data all through one user interface. Does it look familiar? Well, it should: it has the same look and feel as the Azure Data Factory and Azure Databricks UI. It is an end-to-end unified experience, not only for data engineers, but for data scientists as well. If your gut reaction is like mine, right now you’re saying, “Oh great, yet another tool I need to learn.” No need to worry: Microsoft built this UI with users in mind. At its core, it leverages existing capabilities of other data services you may already use within the Azure ecosystem, as opposed to introducing a completely new tool.

Ingest

For those familiar with Azure Data Factory, the ingestion process in Azure Synapse Studio provides a very similar experience. You can still build pipelines, take advantage of the copy data wizard, use data flows to perform your business transformations, and trigger pipelines manually or on a schedule.

Explore

The Explore functionality is like Azure Data Explorer on steroids. Microsoft has provided us with the ability to explore storage accounts, data lakes, and databases all within the same interface. For those who have used Azure Data Explorer or Azure Data Studio, browsing your resources will feel no different. Beyond browsing, Synapse Studio has some great built-in features that make it easy to discover data in storage accounts or data lakes. For example, just as you can right-click a table or view in SQL Server Management Studio and choose “Select Top 1000”, you can now do the same in Azure Synapse Studio to query files that reside in storage accounts and data lakes.
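
To make the “Select Top” idea concrete, here is a minimal Python sketch that assembles the kind of T-SQL a row preview over a Parquet file might run against the serverless SQL endpoint. It assumes the OPENROWSET form with BULK and FORMAT = 'PARQUET'. The helper name build_preview_query and the storage URL are made up for illustration, and the exact statement Synapse Studio generates may differ, especially while the service is in preview.

```python
def build_preview_query(file_url: str, top: int = 100) -> str:
    """Build a T-SQL preview query over a Parquet file in a storage account or data lake."""
    if top <= 0:
        raise ValueError("top must be a positive integer")
    # Double any single quotes so the URL stays a valid T-SQL string literal.
    safe_url = file_url.replace("'", "''")
    return (
        f"SELECT TOP {top} *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{safe_url}',\n"
        "    FORMAT = 'PARQUET'\n"
        ") AS [result];"
    )


if __name__ == "__main__":
    # Hypothetical data lake path; replace it with a file you can actually read.
    print(build_preview_query(
        "https://examplelake.dfs.core.windows.net/raw/sales/2019.parquet",
        top=1000,
    ))
```

The point of the sketch is that no table needs to exist first: the file path itself is the thing being queried, which is what makes this style of exploration feel like browsing a database.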

Analyze & Visualize

Microsoft brings together the ability to run both SQL and Spark, providing a single pane of glass to help bridge the gap between data engineers and data scientists. Using Azure Synapse Studio, you can analyze and transform data using both T-SQL and Spark notebooks. The Develop tab within Azure Synapse Studio allows you to develop and explore T-SQL scripts, Spark notebooks, data flows, Spark job definitions, and Power BI. Aside from the wow factor of having one place to do all these things, in my opinion, the coolest capability is how Power BI has been exposed and integrated into Synapse Studio: it’s now linked directly to the Power BI Service. Any datasets or reports that live in your Power BI workspaces are now browsable, can be edited directly in Synapse Studio, and can be republished to the Power BI Service. Additionally, you can create new datasets and reports and publish those to Power BI as well.

New Features

Amongst all the new capabilities of Azure Synapse Studio, improvements to the data warehousing portion of the service were buried in the credits at Ignite, but are well worth mentioning.

Generally Available:

  • Result-set caching
  • Materialized views
  • Ordered clustered columnstore indexes
  • JSON support
  • Dynamic data masking
  • Integration with SQL Server Data Tools
  • Read committed snapshot isolation

In Public Preview:

  • Workload isolation
  • Simple ingestion with COPY INTO
  • Azure Data Share support
  • Private Link support
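
As a rough illustration of the “simple ingestion with COPY INTO” item, the Python sketch below assembles a COPY INTO statement for loading files from a storage account into a warehouse table. It uses only the FILE_TYPE, FIELDTERMINATOR, and FIRSTROW options; authentication options are deliberately left out. The helper name build_copy_into, the table name, and the storage URL are assumptions for the example, and because the feature is in public preview the syntax may change.

```python
def build_copy_into(
    table: str,
    source_url: str,
    file_type: str = "CSV",
    field_terminator: str = ",",
    first_row: int = 2,
) -> str:
    """Build a COPY INTO statement that loads files from storage into a table."""
    file_type = file_type.upper()
    if file_type not in {"CSV", "PARQUET", "ORC"}:
        raise ValueError(f"unsupported file type: {file_type}")

    options = [f"FILE_TYPE = '{file_type}'"]
    if file_type == "CSV":
        # Delimiter and header-skipping only make sense for delimited text.
        safe_terminator = field_terminator.replace("'", "''")
        options.append(f"FIELDTERMINATOR = '{safe_terminator}'")
        options.append(f"FIRSTROW = {first_row}")

    safe_url = source_url.replace("'", "''")
    joined = ",\n    ".join(options)
    return f"COPY INTO {table}\nFROM '{safe_url}'\nWITH (\n    {joined}\n);"


if __name__ == "__main__":
    # Hypothetical table and landing-zone path, for illustration only.
    print(build_copy_into(
        "dbo.Sales",
        "https://examplestore.blob.core.windows.net/landing/sales/*.csv",
    ))
```

Compared with a PolyBase load, there are no external data source, file format, or external table objects to create first, which is where the “simple” in the feature name comes from.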

In Private Preview:

  • Streaming ingestion and analytics
  • Built-in machine learning with native prediction and scoring capabilities
  • Fast query over Parquet files (10x faster than PolyBase)
  • Ability to update distribution columns
  • FROM clause with joins
  • Multi-column distribution support on tables
  • Column level encryption

A Unified Experience

Azure Synapse Analytics uses Azure Data Lake Storage as the building block for storing data and ingesting it into the data warehouse. Combining the existing capabilities of Azure SQL Data Warehouse with the ability to run both Spark and SQL in clustered and serverless form factors enables both data science and data engineering workloads. This helps bridge the gap between data scientists and data engineers. Traditionally, data engineers have used several tools to wrangle and shape data into a format that can support data science applications, while data scientists have used many tools unfamiliar to the data engineers. By supporting both data engineering and data science activities in one tool, Azure Synapse Analytics helps break down team silos and gives both groups one unified experience for collaboration. The integration, management, monitoring, and security capabilities are unparalleled in the market, providing a streamlined end-user experience. With deep integrations between Azure Data Lake Storage, Azure Machine Learning, and Power BI, this end-to-end analytics solution can significantly reduce project development time and time to market.

Azure has some of the most advanced security and privacy features in the marketplace today. Features such as threat detection, transparent data encryption, and always-on encryption are built into the underlying architecture of Azure Synapse. Synapse also provides fine-grained access control to help ensure data stays safe and private by leveraging column-level security and native row-level security, as well as dynamic data masking to automatically protect sensitive data in real time. Combine these features with a defense-in-depth security strategy, and Azure Synapse gives you control of security at all levels of the analytics platform.
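
To show what dynamic data masking means in practice, here is a small Python simulation of the idea. It is not how the engine implements it. The stored row is never changed; a mask is applied at read time for users who lack permission to see the raw values. The function names, the sample row, and the mask formats are assumptions made for the example, loosely modeled on the email-style and partial masks found in SQL Server.

```python
from typing import Callable, Dict


def mask_email(value: str) -> str:
    """Keep the first character and hide the rest behind a fixed pattern."""
    if not value:
        return value
    return value[0] + "XXX@XXXX.com"


def mask_partial(value: str, prefix: int, padding: str, suffix: int) -> str:
    """Expose `prefix` leading and `suffix` trailing characters around `padding`."""
    if len(value) <= prefix + suffix:
        # Too short to expose anything safely, so show only the padding.
        return padding
    tail = value[-suffix:] if suffix else ""
    return value[:prefix] + padding + tail


def read_row(
    row: Dict[str, str],
    masks: Dict[str, Callable[[str], str]],
    can_unmask: bool,
) -> Dict[str, str]:
    """Return the row as a given user would see it; the stored row is untouched."""
    if can_unmask:
        return dict(row)
    return {col: masks[col](val) if col in masks else val for col, val in row.items()}


if __name__ == "__main__":
    stored = {"name": "Ada", "email": "ada@contoso.com", "phone": "555-0142"}
    masks = {
        "email": mask_email,
        "phone": lambda v: mask_partial(v, 0, "XXX-", 4),
    }
    print(read_row(stored, masks, can_unmask=False))
    # {'name': 'Ada', 'email': 'aXXX@XXXX.com', 'phone': 'XXX-0142'}
    print(read_row(stored, masks, can_unmask=True))
    # {'name': 'Ada', 'email': 'ada@contoso.com', 'phone': '555-0142'}
```

Because the mask lives with the column definition rather than in each report or query, every tool that reads the table gets the same protection without application changes.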

Azure Synapse is a Game Changer

This truly was a massive release for the data team at Microsoft, and it’s obvious they are investing heavily in the platform. The continued development of rich features across Azure Synapse is positioning the service as one of the top players in the market for big data and data warehousing. As a data architect who adopted SQL Data Warehouse when it first became generally available in Azure, I’ve seen the service come a long way from its early stages. This release added rich features, not only to the engine itself to increase performance, but also new functionality that gives data teams a unified analytics experience.

Azure Synapse Analytics truly is a game changer and is paving the path toward an integrated data experience for data scientists and engineers. To learn more about how Azure can drive performance for your enterprise, explore AHEAD’s Cloud Workshop offerings and Briefings Catalog.