AWS Data Analytics Course Collection Bundle

Together, these four courses replace the Big Data on AWS course track and cover key topics on the exam for the AWS Certified Data Analytics – Specialty certification.

Days: 4

We recommend that attendees of this course have:

  • Completed the AWS Technical Essentials (AWSE) classroom course
  • One year of experience building data analytics pipelines, or completed the Data Analytics Fundamentals digital course

Outline: AWS Data Analytics Course Collection Bundle (DACCB)

Day 1:

Module 1: Introduction to data lakes

  • Describe the value of data lakes
  • Compare data lakes and data warehouses
  • Describe the components of a data lake
  • Recognize common architectures built on data lakes

Module 2: Data ingestion, cataloging, and preparation

  • Describe the relationship between data lake storage and data ingestion
  • Describe AWS Glue crawlers and how they are used to create a data catalog
  • Identify data formatting, partitioning, and compression for efficient storage and query
  • Lab 1: Set up a simple data lake

Module 3: Data processing and analytics

  • Recognize how data processing applies to a data lake
  • Use AWS Glue to process data within a data lake
  • Describe how to use Amazon Athena to analyze data in a data lake

Module 4: Building a data lake with AWS Lake Formation

  • Describe the features and benefits of AWS Lake Formation
  • Use AWS Lake Formation to create a data lake
  • Understand the AWS Lake Formation security model
  • Lab 2: Build a data lake using AWS Lake Formation

Module 5: Additional Lake Formation configurations

  • Automate AWS Lake Formation using blueprints and workflows
  • Apply security and access controls to AWS Lake Formation
  • Match records with AWS Lake Formation FindMatches
  • Visualize data with Amazon QuickSight
  • Lab 3: Automate data lake creation using AWS Lake Formation blueprints
  • Lab 4: Data visualization using Amazon QuickSight

Day 2:

Module A: Overview of Data Analytics and the Data Pipeline

  • Data analytics use cases
  • Using the data pipeline for analytics

Module 1: Introduction to Amazon EMR

  • Using Amazon EMR in analytics solutions
  • Amazon EMR cluster architecture
  • Interactive Demo 1: Launching an Amazon EMR cluster
  • Cost management strategies

Module 2: Data Analytics Pipeline Using Amazon EMR: Ingestion and Storage

  • Storage optimization with Amazon EMR
  • Data ingestion techniques

Module 3: High-Performance Batch Data Analytics Using Apache Spark on Amazon EMR

  • Apache Spark on Amazon EMR use cases
  • Why Apache Spark on Amazon EMR
  • Spark concepts
  • Interactive Demo 2: Connect to an EMR cluster and run Scala commands using the Spark shell
  • Transformation, processing, and analytics
  • Using notebooks with Amazon EMR
  • Practice Lab 1: Low-latency data analytics using Apache Spark on Amazon EMR

Module 4: Processing and Analyzing Batch Data with Amazon EMR and Apache Hive

  • Using Amazon EMR with Hive to process batch data
  • Transformation, processing, and analytics
  • Practice Lab 2: Batch data processing using Amazon EMR with Hive
  • Introduction to Apache HBase on Amazon EMR

Module 5: Serverless Data Processing

  • Serverless data processing, transformation, and analytics
  • Using AWS Glue with Amazon EMR workloads
  • Practice Lab 3: Orchestrate data processing in Spark using AWS Step Functions

Module 6: Security and Monitoring of Amazon EMR Clusters

  • Securing EMR clusters
  • Interactive Demo 3: Client-side encryption with EMRFS
  • Monitoring and troubleshooting Amazon EMR clusters
  • Demo: Reviewing Apache Spark cluster history

Module 7: Designing Batch Data Analytics Solutions

  • Batch data analytics use cases
  • Activity: Designing a batch data analytics workflow

Module B: Developing Modern Data Architectures on AWS

  • Modern data architectures

Day 3:

Module A: Overview of Data Analytics and the Data Pipeline

  • Data analytics use cases
  • Using the data pipeline for analytics

Module 1: Using Amazon Redshift in the Data Analytics Pipeline

  • Why Amazon Redshift for data warehousing?
  • Overview of Amazon Redshift

Module 2: Introduction to Amazon Redshift

  • Amazon Redshift architecture
  • Interactive Demo 1: Touring the Amazon Redshift console
  • Amazon Redshift features
  • Practice Lab 1: Setting up your data warehouse using Amazon Redshift

Module 3: Ingestion and Storage

  • Ingestion
  • Interactive Demo 2: Connecting to your Amazon Redshift cluster using a Jupyter notebook with the Data API
  • Data distribution and storage
  • Interactive Demo 3: Analyzing semi-structured data using the SUPER data type
  • Querying data in Amazon Redshift
  • Practice Lab 2: Data analytics using Amazon Redshift Spectrum

Module 4: Processing and Optimizing Data

  • Data transformation
  • Advanced querying
  • Practice Lab 3: Data transformation and querying in Amazon Redshift
  • Resource management
  • Interactive Demo 4: Applying mixed workload management on Amazon Redshift
  • Automation and optimization

Module 5: Security and Monitoring of Amazon Redshift Clusters

  • Securing the Amazon Redshift cluster
  • Monitoring and troubleshooting Amazon Redshift clusters

Module 6: Designing Data Warehouse Analytics Solutions

  • Data warehouse use case review
  • Activity: Designing a data warehouse analytics workflow

Module B: Developing Modern Data Architectures on AWS

  • Modern data architectures

Day 4:

Module A: Overview of Data Analytics and the Data Pipeline

  • Data analytics use cases
  • Using the data pipeline for analytics

Module 1: Using Streaming Services in the Data Analytics Pipeline

  • The importance of streaming data analytics
  • The streaming data analytics pipeline
  • Streaming concepts

Module 2: Introduction to AWS Streaming Services

  • Streaming data services in AWS
  • Amazon Kinesis in analytics solutions
  • Demonstration: Explore Amazon Kinesis Data Streams
  • Practice Lab: Setting up a streaming delivery pipeline with Amazon Kinesis
  • Using Amazon Kinesis Data Analytics
  • Introduction to Amazon MSK
  • Overview of Spark Streaming

Module 3: Using Amazon Kinesis for Real-time Data Analytics

  • Exploring Amazon Kinesis using a clickstream workload
  • Creating Kinesis data and delivery streams
  • Demonstration: Understanding producers and consumers
  • Building stream producers
  • Building stream consumers
  • Building and deploying Flink applications in Kinesis Data Analytics
  • Demonstration: Explore Zeppelin notebooks for Kinesis Data Analytics
  • Practice Lab: Streaming analytics with Amazon Kinesis Data Analytics and Apache Flink

Module 4: Securing, Monitoring, and Optimizing Amazon Kinesis

  • Optimize Amazon Kinesis to gain actionable business insights
  • Security and monitoring best practices

Module 5: Using Amazon MSK in Streaming Data Analytics Solutions

  • Use cases for Amazon MSK
  • Creating MSK clusters
  • Demonstration: Provisioning an MSK Cluster
  • Ingesting data into Amazon MSK
  • Practice Lab: Introduction to access control with Amazon MSK
  • Transforming and processing in Amazon MSK

Module 6: Securing, Monitoring, and Optimizing Amazon MSK

  • Optimizing Amazon MSK
  • Demonstration: Scaling up Amazon MSK storage
  • Practice Lab: Amazon MSK streaming pipeline and application deployment
  • Security and monitoring
  • Demonstration: Monitoring an MSK cluster

Module 7: Designing Streaming Data Analytics Solutions

  • Use case review
  • Class Exercise: Designing a streaming data analytics workflow

Module B: Developing Modern Data Architectures on AWS

  • Modern data architectures