Description
Prerequisites
We recommend that attendees of this course have:
- Completed the AWS Technical Essentials (AWSE) classroom course
- At least one year of experience building data analytics pipelines, or completed the Data Analytics Fundamentals digital course
Outline: AWS Data Analytics Course Collection Bundle (DACCB)
Day 1:
Module 1: Introduction to data lakes
- Describe the value of data lakes
- Compare data lakes and data warehouses
- Describe the components of a data lake
- Recognize common architectures built on data lakes
Module 2: Data ingestion, cataloging, and preparation
- Describe the relationship between data lake storage and data ingestion
- Describe AWS Glue crawlers and how they are used to create a data catalog
- Identify data formatting, partitioning, and compression for efficient storage and query
- Lab 1: Set up a simple data lake
Module 3: Data processing and analytics
- Recognize how data processing applies to a data lake
- Use AWS Glue to process data within a data lake
- Describe how to use Amazon Athena to analyze data in a data lake
Module 4: Building a data lake with AWS Lake Formation
- Describe the features and benefits of AWS Lake Formation
- Use AWS Lake Formation to create a data lake
- Understand the AWS Lake Formation security model
- Lab 2: Build a data lake using AWS Lake Formation
Module 5: Additional Lake Formation configurations
- Automate AWS Lake Formation using blueprints and workflows
- Apply security and access controls to AWS Lake Formation
- Match records with AWS Lake Formation FindMatches
- Visualize data with Amazon QuickSight
- Lab 3: Automate data lake creation using AWS Lake Formation blueprints
- Lab 4: Data visualization using Amazon QuickSight
Day 2:
Module A: Overview of Data Analytics and the Data Pipeline
- Data analytics use cases
- Using the data pipeline for analytics
Module 1: Introduction to Amazon EMR
- Using Amazon EMR in analytics solutions
- Amazon EMR cluster architecture
- Interactive Demo 1: Launching an Amazon EMR cluster
- Cost management strategies
Module 2: Data Analytics Pipeline Using Amazon EMR: Ingestion and Storage
- Storage optimization with Amazon EMR
- Data ingestion techniques
Module 3: High-Performance Batch Data Analytics Using Apache Spark on Amazon EMR
- Apache Spark on Amazon EMR use cases
- Why Apache Spark on Amazon EMR
- Spark concepts
- Interactive Demo 2: Connect to an EMR cluster and run Scala commands using the Spark shell
- Transformation, processing, and analytics
- Using notebooks with Amazon EMR
- Practice Lab 1: Low-latency data analytics using Apache Spark on Amazon EMR
Module 4: Processing and Analyzing Batch Data with Amazon EMR and Apache Hive
- Using Amazon EMR with Hive to process batch data
- Transformation, processing, and analytics
- Practice Lab 2: Batch data processing using Amazon EMR with Hive
- Introduction to Apache HBase on Amazon EMR
Module 5: Serverless Data Processing
- Serverless data processing, transformation, and analytics
- Using AWS Glue with Amazon EMR workloads
- Practice Lab 3: Orchestrate data processing in Spark using AWS Step Functions
Module 6: Security and Monitoring of Amazon EMR Clusters
- Securing EMR clusters
- Interactive Demo 3: Client-side encryption with EMRFS
- Monitoring and troubleshooting Amazon EMR clusters
- Demo: Reviewing Apache Spark cluster history
Module 7: Designing Batch Data Analytics Solutions
- Batch data analytics use cases
- Activity: Designing a batch data analytics workflow
Module B: Developing Modern Data Architectures on AWS
- Modern data architectures
Day 3:
Module A: Overview of Data Analytics and the Data Pipeline
- Data analytics use cases
- Using the data pipeline for analytics
Module 1: Using Amazon Redshift in the Data Analytics Pipeline
- Why Amazon Redshift for data warehousing?
- Overview of Amazon Redshift
Module 2: Introduction to Amazon Redshift
- Amazon Redshift architecture
- Interactive Demo 1: Touring the Amazon Redshift console
- Amazon Redshift features
- Practice Lab 1: Setting up your data warehouse using Amazon Redshift
Module 3: Ingestion and Storage
- Ingestion
- Interactive Demo 2: Connecting to your Amazon Redshift cluster using a Jupyter notebook with the Data API
- Data distribution and storage
- Interactive Demo 3: Analyzing semi-structured data using the SUPER data type
- Querying data in Amazon Redshift
- Practice Lab 2: Data analytics using Amazon Redshift Spectrum
Module 4: Processing and Optimizing Data
- Data transformation
- Advanced querying
- Practice Lab 3: Data transformation and querying in Amazon Redshift
- Resource management
- Interactive Demo 4: Applying mixed workload management on Amazon Redshift
- Automation and optimization
Module 5: Security and Monitoring of Amazon Redshift Clusters
- Securing the Amazon Redshift cluster
- Monitoring and troubleshooting Amazon Redshift clusters
Module 6: Designing Data Warehouse Analytics Solutions
- Data warehouse use case review
- Activity: Designing a data warehouse analytics workflow
Module B: Developing Modern Data Architectures on AWS
- Modern data architectures
Day 4:
Module A: Overview of Data Analytics and the Data Pipeline
- Data analytics use cases
- Using the data pipeline for analytics
Module 1: Using Streaming Services in the Data Analytics Pipeline
- The importance of streaming data analytics
- The streaming data analytics pipeline
- Streaming concepts
Module 2: Introduction to AWS Streaming Services
- Streaming data services in AWS
- Amazon Kinesis in analytics solutions
- Demonstration: Explore Amazon Kinesis Data Streams
- Practice Lab: Setting up a streaming delivery pipeline with Amazon Kinesis
- Using Amazon Kinesis Data Analytics
- Introduction to Amazon MSK
- Overview of Spark Streaming
Module 3: Using Amazon Kinesis for Real-time Data Analytics
- Exploring Amazon Kinesis using a clickstream workload
- Creating Kinesis data and delivery streams
- Demonstration: Understanding producers and consumers
- Building stream producers
- Building stream consumers
- Building and deploying Flink applications in Kinesis Data Analytics
- Demonstration: Explore Zeppelin notebooks for Kinesis Data Analytics
- Practice Lab: Streaming analytics with Amazon Kinesis Data Analytics and Apache Flink
Module 4: Securing, Monitoring, and Optimizing Amazon Kinesis
- Optimize Amazon Kinesis to gain actionable business insights
- Security and monitoring best practices
Module 5: Using Amazon MSK in Streaming Data Analytics Solutions
- Use cases for Amazon MSK
- Creating MSK clusters
- Demonstration: Provisioning an MSK cluster
- Ingesting data into Amazon MSK
- Practice Lab: Introduction to access control with Amazon MSK
- Transforming and processing in Amazon MSK
Module 6: Securing, Monitoring, and Optimizing Amazon MSK
- Optimizing Amazon MSK
- Demonstration: Scaling up Amazon MSK storage
- Practice Lab: Amazon MSK streaming pipeline and application deployment
- Security and monitoring
- Demonstration: Monitoring an MSK cluster
Module 7: Designing Streaming Data Analytics Solutions
- Use case review
- Class Exercise: Designing a streaming data analytics workflow
Module B: Developing Modern Data Architectures on AWS
- Modern data architectures