Azure Data Factory For Data Engineers – Project on Covid19
Real-world project for Data Engineers using Azure Data Factory, SQL, Data Lake, Databricks, HDInsight, CI/CD [DP203]
What you’ll learn
You will learn how to build a real-world data pipeline in Azure Data Factory (ADF).
You will acquire good Data Engineering skills in Azure using Azure Data Factory (ADF), Azure Data Lake Storage Gen2, Azure SQL Database and Azure Monitor
You will learn how to ingest data from sources such as HTTP and Azure Blob Storage into Azure Data Lake Gen2 using Azure Data Factory (ADF)
You will learn how to transform data using Data Flows in Azure Data Factory (ADF) and load into Azure Data Lake Storage Gen2
You will learn how to transform data using Databricks Notebook Activity in Azure Data Factory (ADF) and load into Azure Data Lake Storage Gen2
You will learn how to transform data using Azure HDInsight Activity in Azure Data Factory (ADF) and load into Azure Data Lake Storage Gen2
You will learn how to load transformed data from Azure Data Lake Storage Gen2 to Azure SQL Database using Azure Data Factory (ADF)
You will learn extensively about Triggers in Azure Data Factory (ADF) and how to use them to schedule the data pipelines.
You will learn how to monitor pipelines using Azure Data Factory (ADF), Azure Monitor and Log Analytics with a real-world project.
You will learn how to build production-ready pipelines that follow good practices and naming standards
You will learn the topics required on Azure Data Factory to pass the Azure Data Engineer Associate Certification Exam DP203
You will learn how to create CI/CD pipelines in Azure DevOps to release ADF pipelines to higher environments (Testing/Production)
Requirements
A basic understanding of cloud computing is useful, but not necessary.
Experience with Azure is not required; I will take you through everything you need to follow this course and build the project.
An Azure account is required. If you don’t have one, we will create a free account during the course.
Major updates to the course since the launch
January 2023 – Updates to section 3 (Environment Set-up) to reflect the change to the User Interface. Re-recorded 5 lessons.
November 2022 – Addition of sections 15 & 16 focusing on Continuous Integration & Continuous Delivery (CI/CD)
I am looking forward to helping you learn one of the in-demand data engineering tools in the cloud, Azure Data Factory (ADF)! This course is taught by implementing a data engineering solution using Azure Data Factory (ADF) for a real-world problem: reporting Covid-19 trends and predicting the spread of the virus.
This is like no other course on Udemy for Azure Data Factory or Data Engineering technologies. Once you have completed the course, including all the assignments, I strongly believe that you will be in a position to start a real-world data engineering project on your own and will be proficient in Azure Data Factory (ADF).
I have also included lessons on storage solutions such as Azure Data Lake Storage, Azure Blob Storage and Azure SQL Database, as well as lessons on Azure HDInsight and Azure Databricks. I have even included lessons on building reports in Power BI from the data processed by the Azure Data Factory pipelines. Machine learning models are out of scope for this course, but you can use the processed data to build your own models and predict the spread.
The course follows the logical progression of a real-world project implementation, with technical concepts explained as the data pipelines in Azure Data Factory (ADF) are built. Even though this course is not specifically designed to teach the skills required to pass the Azure Data Engineer Associate Certification exam DP203, it can help you gain most of the skills required for the exam.
I value your time as much as I do mine, so I have designed this course to be fast-paced and to the point. The course is taught in simple English with no jargon. I start from the basics, and by the end of the course you will be proficient in the technologies used.
Currently the course teaches you the following
Azure Data Factory
- Building a solution architecture for a data engineering solution using Azure Data Engineering technologies such as Azure Data Factory (ADF), Azure Data Lake Gen2, Azure Blob Storage, Azure SQL Database, Azure Databricks, Azure HDInsight and Microsoft Power BI.
- Integrating data from HTTP sources, Azure Blob Storage and Azure Data Lake Gen2 using Azure Data Factory.
- Branching and chaining activities in Azure Data Factory (ADF) pipelines using control flow activities such as Get Metadata, If Condition, ForEach, Delete and Validation.
- Using parameters and variables in Pipelines, Datasets and Linked Services to create metadata-driven pipelines in Azure Data Factory (ADF).
- Debugging the data pipelines and resolving issues.
- Scheduling pipelines using triggers such as Event Trigger, Schedule Trigger and Tumbling Window Trigger in Azure Data Factory (ADF)
- Creating Mapping Data Flows to create transformation logic. The course covers all of the transformation steps such as Source, Filter, Select, Pivot, Lookup, Conditional Split, Derived Column, Aggregate, Join and Sink transformation.
- Debugging data flows, investigating issues and fixing failures.
- Implementing Azure Data Factory pipelines to invoke Mapping Data Flows and executing them.
- Creating ADF pipelines to execute HDInsight activities and carry out data transformations.
- Creating ADF pipelines to execute Databricks Notebook activities to carry out transformations.
- Creating dependencies between pipelines to orchestrate the data flow.
- Creating dependencies between triggers to orchestrate the data flow.
- Monitoring data pipelines, creating alerts, reporting of metrics from the Azure Data Factory Monitor.
- Monitoring Data Factory pipelines using Azure Monitor and configuring diagnostic settings to forward logs to an Azure Storage Account or a Log Analytics Workspace.
- Creating a Log Analytics workspace, and creating workbooks and charts from Log Analytics for the Azure Data Factory pipelines.
- Implementing the Azure Data Factory Analytics monitoring tool and extending its capability further.
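The tumbling window trigger mentioned in the list above fires over a series of fixed-size, contiguous, non-overlapping time intervals. As a rough illustration only (plain Python, not the ADF API; the function name `tumbling_windows` and the dates are made up for this sketch), the windows such a trigger would cover can be enumerated like this:

```python
from datetime import datetime, timedelta


def tumbling_windows(start, end, interval):
    """Return (window_start, window_end) pairs covering [start, end).

    Hypothetical helper for illustration; it only mimics the idea of
    fixed-size, back-to-back windows and does not call Azure Data Factory.
    Only complete windows are returned.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    windows = []
    window_start = start
    while window_start + interval <= end:
        window_end = window_start + interval
        windows.append((window_start, window_end))
        window_start = window_end  # the next window begins where this one ends
    return windows


# Example: daily windows over three days (dates chosen arbitrarily).
daily = tumbling_windows(
    datetime(2023, 1, 1), datetime(2023, 1, 4), timedelta(days=1)
)
for window_start, window_end in daily:
    print(window_start.isoformat(), "->", window_end.isoformat())
```

In ADF itself, the start and end of each window are passed to the pipeline through the trigger's system variables, so each run can process exactly one slice of data.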
Azure Storage Solutions
- Creating an Azure Storage Account, creating containers, uploading data, Access Control (IAM), using Azure Storage Explorer to interact with the storage account
- Creating Azure Data Lake Gen2, creating containers, uploading data, Access Control (IAM), using Azure Storage Explorer to interact with the storage account
- Creating Azure SQL Database, Pricing Tiers, Creating Admin User, Creating Tables, Loading Data and Querying the database.
Azure HDInsight & Databricks
- Creating HDInsight Clusters, Interacting with the UI, Using Ambari, Creating Hive tables, Invoking HDInsight activities from Azure Data Factory
- Creating Azure Databricks Workspace, Creating Databricks clusters, Mounting storage accounts, Creating Databricks notebooks, performing transformations using Databricks notebooks, Invoking Databricks notebooks from Azure Data Factory.
Azure DevOps (CI/CD)
- Creating an Azure DevOps environment and configuring an Azure DevOps Git repository
- CI/CD process for releasing Azure Data Factory artefacts to higher environments
- Creating build and release pipelines in Azure DevOps to release code to higher environments (Test/Prod)
- Configuring and parameterising CI/CD pipelines to release ADF pipelines that access Azure Data Lake Storage.
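To make the parameterisation idea above concrete: when ADF artefacts are released as an ARM template, values that differ per environment (for example the Data Lake URL) are usually supplied as template parameters. The sketch below is an illustration in plain Python under stated assumptions, not the course's actual release pipeline: the parameter names, factory names, storage URLs and the `apply_overrides` helper are all hypothetical.

```python
import copy
import json

# A cut-down stand-in for an ARM template parameters file.
# All parameter names and values are invented for illustration.
dev_parameters = {
    "contentVersion": "1.0.0.0",
    "parameters": {
        "factoryName": {"value": "covid-reporting-adf-dev"},
        "ls_adls_url": {
            "value": "https://covidreportingdldev.dfs.core.windows.net/"
        },
    },
}

# Per-environment overrides (hypothetical names and URLs).
overrides = {
    "test": {
        "factoryName": "covid-reporting-adf-test",
        "ls_adls_url": "https://covidreportingdltest.dfs.core.windows.net/",
    },
    "prod": {
        "factoryName": "covid-reporting-adf-prod",
        "ls_adls_url": "https://covidreportingdlprod.dfs.core.windows.net/",
    },
}


def apply_overrides(parameters_doc, env_overrides):
    """Return a copy of the parameters document with overrides applied."""
    result = copy.deepcopy(parameters_doc)  # leave the input untouched
    for name, value in env_overrides.items():
        if name not in result["parameters"]:
            # Fail loudly on typos instead of silently adding a parameter.
            raise KeyError(f"unknown parameter: {name}")
        result["parameters"][name] = {"value": value}
    return result


test_parameters = apply_overrides(dev_parameters, overrides["test"])
print(json.dumps(test_parameters, indent=2))
```

In an Azure DevOps release pipeline the same effect is typically achieved by overriding template parameters in the deployment step rather than with a script; the sketch only shows which values change between environments.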
Who this course is for:
- University students looking for a career in Data Engineering
- IT developers working on other disciplines trying to move to Data Engineering
- Data Engineers/ Data Warehouse Developers currently working on on-premises technologies, or other cloud platforms such as AWS or GCP who want to learn Azure Technologies
- Data Architects looking to gain an understanding of the Azure Data Engineering stack
- Data Scientists who want to extend their knowledge into data engineering
Created by Ramesh Retnasamy
Last updated 1/2023