A brief overview of what, why, and how for Azure Synapse Analytics
Introduction
Azure Synapse Analytics is an integrated platform service from Microsoft Azure that combines data warehousing, data integration, ETL pipelines, analytics tools and services, big-data scale, and visualization and dashboards.
The platform supports enterprise-wide analytics requirements for decision-making. Its tools, processes, and techniques cover several dimensions of analytics. Descriptive and Diagnostic Analytics build on its data-warehouse capabilities, gathering business insights with T-SQL queries. Predictive and Prescriptive Analytics build on its integration with Apache Spark, Databricks, and Stream Analytics.
How it Works
Azure Synapse Analytics is a one-stop-shop analytics solution that offers the following capabilities:
- The dedicated pool of SQL servers, known as Synapse SQL, is the backbone of the analytics data storage and provides the infrastructure for implementing a data warehouse under the hood. It lets engineers run T-SQL queries that match their existing experience.
It also offers a serverless model for unplanned or ad-hoc workloads: data virtualization lets teams unlock insights from their own data stores without going through the formal process of setting up a data warehouse.
- ETL and data integration from disparate sources using Synapse Pipelines let the organization process data efficiently for warehousing and analytical purposes. The reusable workflows and pipeline orchestration capabilities are easy to adapt. Support for big-data compute services such as HDInsight for Hadoop and Databricks makes this a more powerful ETL tool.
- Enable the development of big-data workloads and machine learning solutions with Apache Spark for Azure Synapse. The platform processes big-data workloads on massively scalable, high-performance compute resources. SparkML algorithms and Azure ML integration make it a complete solution for training machine learning models.
- Deliver real-time operational analytics from operational data sources using Synapse Link.
Where it Fits
The most common business use-cases for Azure Synapse Analytics are:
- Data Warehouse: Ability to integrate with various data platforms and services.
- Descriptive/Diagnostic Analytics: Use T-SQL queries against the Synapse database to perform data exploration and discovery.
- Real-time Analytics: Azure Synapse Link enables integration with disparate operational data sources to implement real-time analytics solutions.
- Advanced Analytics: Leverage Azure Databricks to support decision-making.
- Reporting & Visualization: Integrate with Power BI to empower and enhance business decision-making.
Features Walkthrough
This walkthrough assumes that you have already set up and configured an Azure Synapse Analytics workspace in your Azure subscription.
Setting up and configuring the workspace is beyond the scope of this article. Please look into the references section for more details.
Load and Analyze Data using Spark
- From the Data hub, browse the gallery and select the covid-tracking dataset.
- Open a new Spark notebook using the dataset.
- The notebook contains the following default code, which you can extend to analyze the dataset with Spark:
## ---- ##
df = spark.read.parquet(wasbs_path)
display(df.limit(10))
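The default code above reads from a `wasbs_path` variable whose definition is elided. As a minimal sketch, assuming the dataset sits in Azure Blob Storage, such a path generally follows the `wasbs://<container>@<account>.blob.core.windows.net/<path>` form. The helper name `build_wasbs_path` and the container, account, and file names below are hypothetical placeholders, not the values used by the gallery dataset.

```python
def build_wasbs_path(container: str, account: str, relative_path: str) -> str:
    """Compose a wasbs:// URL for a blob in Azure Blob Storage."""
    if not container or not account:
        raise ValueError("container and account must be non-empty")
    # Drop any leading slash so the URL does not contain a double slash.
    clean_path = relative_path.lstrip("/")
    return f"wasbs://{container}@{account}.blob.core.windows.net/{clean_path}"


# Hypothetical placeholder values; substitute the ones from your own notebook.
wasbs_path = build_wasbs_path(
    container="example-container",
    account="exampleaccount",
    relative_path="data/covid_tracking.parquet",
)
print(wasbs_path)
```

Inside a Synapse notebook, where the `spark` session is already defined, the resulting string can be passed to `spark.read.parquet(wasbs_path)` as in the default code.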
Analyze the Data in Serverless SQL Pool
- Again, load the sample dataset from the Data hub, but this time create a new SQL script.
- This gives you a free-form script for analyzing the dataset with native T-SQL queries.
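To give a feel for what such a script can look like, the sketch below composes a serverless SQL query in Python, the same language as the notebook above. It relies on the T-SQL `OPENROWSET` function, which a serverless SQL pool can use to read a Parquet file directly from storage. The function name `build_openrowset_query` and the storage URL are hypothetical, and the script that Synapse Studio generates for you may differ.

```python
def build_openrowset_query(data_url: str, top: int = 100) -> str:
    """Return a T-SQL query that previews a Parquet file via OPENROWSET."""
    if top <= 0:
        raise ValueError("top must be a positive integer")
    # Escape single quotes so the URL stays a valid T-SQL string literal.
    escaped_url = data_url.replace("'", "''")
    return (
        f"SELECT TOP {top} *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{escaped_url}',\n"
        "    FORMAT = 'PARQUET'\n"
        ") AS [result];"
    )


# Hypothetical storage URL; replace it with the location of your own dataset.
query = build_openrowset_query(
    "https://exampleaccount.blob.core.windows.net/example-container/data/covid_tracking.parquet",
    top=10,
)
print(query)
```

The printed text can be pasted into the new SQL script and adjusted from there, for example by replacing `*` with specific columns or adding a `WHERE` clause.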
Setup and Integrate a Pipeline
This small demo shows how to integrate pipeline tasks in Azure Synapse.
- From the Integrate hub of Synapse Analytics Studio, select Pipeline.
- Once the pipeline is instantiated, you can add workflow activities according to the problem at hand.
- You can add triggers to run the pipeline in response to an event, or run it manually. To track execution progress, open the Monitor hub and choose Pipeline runs.
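Behind the visual designer, a pipeline is stored as a JSON document in which each activity lists the activities it depends on. The sketch below is a simplified model of that idea, not a complete pipeline definition: the pipeline name, the activity names, and the choice of activity types are illustrative assumptions. It derives a valid run order from the `dependsOn` entries.

```python
from graphlib import TopologicalSorter

# Simplified, hypothetical pipeline definition; real definitions carry many more fields.
pipeline = {
    "name": "ExamplePipeline",
    "properties": {
        "activities": [
            {"name": "CopyRawData", "type": "Copy", "dependsOn": []},
            {
                "name": "TransformWithNotebook",
                "type": "SynapseNotebook",
                "dependsOn": [
                    {"activity": "CopyRawData", "dependencyConditions": ["Succeeded"]}
                ],
            },
            {
                "name": "LoadWarehouse",
                "type": "SqlPoolStoredProcedure",
                "dependsOn": [
                    {"activity": "TransformWithNotebook", "dependencyConditions": ["Succeeded"]}
                ],
            },
        ]
    },
}


def execution_order(definition: dict) -> list[str]:
    """Return activity names ordered so that dependencies come first."""
    activities = definition["properties"]["activities"]
    # Map each activity to the set of activities it waits for.
    graph = {
        activity["name"]: {dep["activity"] for dep in activity.get("dependsOn", [])}
        for activity in activities
    }
    # Reject references to activities that are not defined in the pipeline.
    unknown = {dep for deps in graph.values() for dep in deps} - set(graph)
    if unknown:
        raise ValueError(f"unknown activities referenced: {sorted(unknown)}")
    return list(TopologicalSorter(graph).static_order())


print(execution_order(pipeline))
```

This ordering only illustrates how dependencies constrain the workflow. The service itself schedules the activities, and the Monitor hub reports what actually ran.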
Integrate Linked Services
Use Synapse Analytics Studio to integrate and enable various linked services, e.g. Power BI.
- Use the Manage hub to see existing linked services.
- Also, create a link to the Power BI workspace to leverage data visualization and reporting.
Conclusion
In this brief introduction to Azure Synapse Analytics, I have navigated the shallow waters to cover the basic functional capabilities of this managed service from Azure.
You now understand what Synapse Analytics is, why it matters for meeting an organization's goals, and, briefly, how its capabilities can be leveraged for various analytical use cases.
I hope you have gained some insights. Do let me know your feedback and questions.