Prerequisites
We recommend that attendees of this course have:
- Completed the AWS Technical Essentials (AWSE) classroom course
 - One year of experience building data analytics pipelines, or completed the Data Analytics Fundamentals digital course
 
Outline: AWS Data Analytics Course Collection Bundle (DACCB)
Day 1:
Module 1: Introduction to data lakes
- Describe the value of data lakes
 - Compare data lakes and data warehouses
 - Describe the components of a data lake
 - Recognize common architectures built on data lakes
 
Module 2: Data ingestion, cataloging, and preparation
- Describe the relationship between data lake storage and data ingestion
 - Describe AWS Glue crawlers and how they are used to create a data catalog
 - Identify data formatting, partitioning, and compression for efficient storage and query
 - Lab 1: Set up a simple data lake
 
Module 3: Data processing and analytics
- Recognize how data processing applies to a data lake
 - Use AWS Glue to process data within a data lake
 - Describe how to use Amazon Athena to analyze data in a data lake
 
Module 4: Building a data lake with AWS Lake Formation
- Describe the features and benefits of AWS Lake Formation
 - Use AWS Lake Formation to create a data lake
 - Understand the AWS Lake Formation security model
 - Lab 2: Build a data lake using AWS Lake Formation
 
Module 5: Additional Lake Formation configurations
- Automate AWS Lake Formation using blueprints and workflows
 - Apply security and access controls to AWS Lake Formation
 - Match records with AWS Lake Formation FindMatches
 - Visualize data with Amazon QuickSight
 - Lab 3: Automate data lake creation using AWS Lake Formation blueprints
 - Lab 4: Data visualization using Amazon QuickSight
 
Day 2:
Module A: Overview of Data Analytics and the Data Pipeline
- Data analytics use cases
 - Using the data pipeline for analytics
 
Module 1: Introduction to Amazon EMR
- Using Amazon EMR in analytics solutions
 - Amazon EMR cluster architecture
 - Interactive Demo 1: Launching an Amazon EMR cluster
 - Cost management strategies
 
Module 2: Data Analytics Pipeline Using Amazon EMR: Ingestion and Storage
- Storage optimization with Amazon EMR
 - Data ingestion techniques
 
Module 3: High-Performance Batch Data Analytics Using Apache Spark on Amazon EMR
- Apache Spark on Amazon EMR use cases
 - Why Apache Spark on Amazon EMR
 - Spark concepts
 - Interactive Demo 2: Connect to an EMR cluster and perform Scala commands using the Spark shell
 - Transformation, processing, and analytics
 - Using notebooks with Amazon EMR
 - Practice Lab 1: Low-latency data analytics using Apache Spark on Amazon EMR
 
Module 4: Processing and Analyzing Batch Data with Amazon EMR and Apache Hive
- Using Amazon EMR with Hive to process batch data
 - Transformation, processing, and analytics
 - Practice Lab 2: Batch data processing using Amazon EMR with Hive
 - Introduction to Apache HBase on Amazon EMR
 
Module 5: Serverless Data Processing
- Serverless data processing, transformation, and analytics
 - Using AWS Glue with Amazon EMR workloads
 - Practice Lab 3: Orchestrate data processing in Spark using AWS Step Functions
 
Module 6: Security and Monitoring of Amazon EMR Clusters
- Securing EMR clusters
 - Interactive Demo 3: Client-side encryption with EMRFS
 - Monitoring and troubleshooting Amazon EMR clusters
 - Demo: Reviewing Apache Spark cluster history
 
Module 7: Designing Batch Data Analytics Solutions
- Batch data analytics use cases
 - Activity: Designing a batch data analytics workflow
 
Module B: Developing Modern Data Architectures on AWS
- Modern data architectures
 
Day 3:
Module A: Overview of Data Analytics and the Data Pipeline
- Data analytics use cases
 - Using the data pipeline for analytics
 
Module 1: Using Amazon Redshift in the Data Analytics Pipeline
- Why Amazon Redshift for data warehousing?
 - Overview of Amazon Redshift
 
Module 2: Introduction to Amazon Redshift
- Amazon Redshift architecture
 - Interactive Demo 1: Touring the Amazon Redshift console
 - Amazon Redshift features
 - Practice Lab 1: Setting up your data warehouse using Amazon Redshift
 
Module 3: Ingestion and Storage
- Ingestion
 - Interactive Demo 2: Connecting your Amazon Redshift cluster using a Jupyter notebook with Data API
 - Data distribution and storage
 - Interactive Demo 3: Analyzing semi-structured data using the SUPER data type
 - Querying data in Amazon Redshift
 - Practice Lab 2: Data analytics using Amazon Redshift Spectrum
 
Module 4: Processing and Optimizing Data
- Data transformation
 - Advanced querying
 - Practice Lab 3: Data transformation and querying in Amazon Redshift
 - Resource management
 - Interactive Demo 4: Applying mixed workload management on Amazon Redshift
 - Automation and optimization
 
Module 5: Security and Monitoring of Amazon Redshift Clusters
- Securing the Amazon Redshift cluster
 - Monitoring and troubleshooting Amazon Redshift clusters
 
Module 6: Designing Data Warehouse Analytics Solutions
- Data warehouse use case review
 - Activity: Designing a data warehouse analytics workflow
 
Module B: Developing Modern Data Architectures on AWS
- Modern data architectures
 
Day 4:
Module A: Overview of Data Analytics and the Data Pipeline
- Data analytics use cases
 - Using the data pipeline for analytics
 
Module 1: Using Streaming Services in the Data Analytics Pipeline
- The importance of streaming data analytics
 - The streaming data analytics pipeline
 - Streaming concepts
 
Module 2: Introduction to AWS Streaming Services
- Streaming data services in AWS
 - Amazon Kinesis in analytics solutions
 - Demonstration: Explore Amazon Kinesis Data Streams
 - Practice Lab: Setting up a streaming delivery pipeline with Amazon Kinesis
 - Using Amazon Kinesis Data Analytics
 - Introduction to Amazon MSK
 - Overview of Spark Streaming
 
Module 3: Using Amazon Kinesis for Real-time Data Analytics
- Exploring Amazon Kinesis using a clickstream workload
 - Creating Kinesis data and delivery streams
 - Demonstration: Understanding producers and consumers
 - Building stream producers
 - Building stream consumers
 - Building and deploying Flink applications in Kinesis Data Analytics
 - Demonstration: Explore Zeppelin notebooks for Kinesis Data Analytics
 - Practice Lab: Streaming analytics with Amazon Kinesis Data Analytics and Apache Flink
 
Module 4: Securing, Monitoring, and Optimizing Amazon Kinesis
- Optimize Amazon Kinesis to gain actionable business insights
 - Security and monitoring best practices
 
Module 5: Using Amazon MSK in Streaming Data Analytics Solutions
- Use cases for Amazon MSK
 - Creating MSK clusters
 - Demonstration: Provisioning an MSK Cluster
 - Ingesting data into Amazon MSK
 - Practice Lab: Introduction to access control with Amazon MSK
 - Transforming and processing in Amazon MSK
 
Module 6: Securing, Monitoring, and Optimizing Amazon MSK
- Optimizing Amazon MSK
 - Demonstration: Scaling up Amazon MSK storage
 - Practice Lab: Amazon MSK streaming pipeline and application deployment
 - Security and monitoring
 - Demonstration: Monitoring an MSK cluster
 
Module 7: Designing Streaming Data Analytics Solutions
- Use case review
 - Class Exercise: Designing a streaming data analytics workflow
 
Module B: Developing Modern Data Architectures on AWS
- Modern data architectures
 




