Description

Prerequisites

We recommend that attendees of this course have:

Completed the AWS Technical Essentials (AWSE) classroom course
One year of experience building data analytics pipelines, or completion of the Data Analytics Fundamentals digital course

Outline: AWS Data Analytics Course Collection Bundle (DACCB)

Day 1:

Module 1: Introduction to data lakes

Describe the value of data lakes
Compare data lakes and data warehouses
Describe the components of a data lake
Recognize common architectures built on data lakes

Module 2: Data ingestion, cataloging, and preparation

Describe the relationship between data lake storage and data ingestion
Describe AWS Glue crawlers and how they are used to create a data catalog
Identify data formatting, partitioning, and compression for efficient storage and query
Lab 1: Set up a simple data lake

Module 3: Data processing and analytics

Recognize how data processing applies to a data lake
Use AWS Glue to process data within a data lake
Describe how to use Amazon Athena to analyze data in a data lake

Module 4: Building a data lake with AWS Lake Formation

Describe the features and benefits of AWS Lake Formation
Use AWS Lake Formation to create a data lake
Understand the AWS Lake Formation security model
Lab 2: Build a data lake using AWS Lake Formation

Module 5: Additional Lake Formation configurations

Automate AWS Lake Formation using blueprints and workflows
Apply security and access controls to AWS Lake Formation
Match records with AWS Lake Formation FindMatches
Visualize data with Amazon QuickSight
Lab 3: Automate data lake creation using AWS Lake Formation blueprints
Lab 4: Data visualization using Amazon QuickSight

Day 2:

Module A: Overview of Data Analytics and the Data Pipeline

Data analytics use cases
Using the data pipeline for analytics

Module 1: Introduction to Amazon EMR

Using Amazon EMR in analytics solutions
Amazon EMR cluster architecture
Interactive Demo 1: Launching an Amazon EMR cluster
Cost management strategies

Module 2: Data Analytics Pipeline Using Amazon EMR: Ingestion and Storage

Storage optimization with Amazon EMR
Data ingestion techniques

Module 3: High-Performance Batch Data Analytics Using Apache Spark on Amazon EMR

Apache Spark on Amazon EMR use cases
Why Apache Spark on Amazon EMR
Spark concepts
Interactive Demo 2: Connect to an EMR cluster and run Scala commands using the Spark shell
Transformation, processing, and analytics
Using notebooks with Amazon EMR
Practice Lab 1: Low-latency data analytics using Apache Spark on Amazon EMR

Module 4: Processing and Analyzing Batch Data with Amazon EMR and Apache Hive

Using Amazon EMR with Hive to process batch data
Transformation, processing, and analytics
Practice Lab 2: Batch data processing using Amazon EMR with Hive
Introduction to Apache HBase on Amazon EMR

Module 5: Serverless Data Processing

Serverless data processing, transformation, and analytics
Using AWS Glue with Amazon EMR workloads
Practice Lab 3: Orchestrate data processing in Spark using AWS Step Functions

Module 6: Security and Monitoring of Amazon EMR Clusters

Securing EMR clusters
Interactive Demo 3: Client-side encryption with EMRFS
Monitoring and troubleshooting Amazon EMR clusters
Demo: Reviewing Apache Spark cluster history

Module 7: Designing Batch Data Analytics Solutions

Batch data analytics use cases
Activity: Designing a batch data analytics workflow

Module B: Developing Modern Data Architectures on AWS

Modern data architectures

Day 3:

Module A: Overview of Data Analytics and the Data Pipeline

Data analytics use cases
Using the data pipeline for analytics

Module 1: Using Amazon Redshift in the Data Analytics Pipeline

Why Amazon Redshift for data warehousing?
Overview of Amazon Redshift

Module 2: Introduction to Amazon Redshift

Amazon Redshift architecture
Interactive Demo 1: Touring the Amazon Redshift console
Amazon Redshift features
Practice Lab 1: Setting up your data warehouse using Amazon Redshift

Module 3: Ingestion and Storage

Ingestion
Interactive Demo 2: Connecting your Amazon Redshift cluster using a Jupyter notebook with Data API
Data distribution and storage
Interactive Demo 3: Analyzing semi-structured data using the SUPER data type
Querying data in Amazon Redshift
Practice Lab 2: Data analytics using Amazon Redshift Spectrum

Module 4: Processing and Optimizing Data

Data transformation
Advanced querying
Practice Lab 3: Data transformation and querying in Amazon Redshift
Resource management
Interactive Demo 4: Applying mixed workload management on Amazon Redshift
Automation and optimization

Module 5: Security and Monitoring of Amazon Redshift Clusters

Securing the Amazon Redshift cluster
Monitoring and troubleshooting Amazon Redshift clusters

Module 6: Designing Data Warehouse Analytics Solutions

Data warehouse use case review
Activity: Designing a data warehouse analytics workflow

Module B: Developing Modern Data Architectures on AWS

Modern data architectures

Day 4:

Module A: Overview of Data Analytics and the Data Pipeline

Data analytics use cases
Using the data pipeline for analytics

Module 1: Using Streaming Services in the Data Analytics Pipeline

The importance of streaming data analytics
The streaming data analytics pipeline
Streaming concepts

Module 2: Introduction to AWS Streaming Services

Streaming data services in AWS
Amazon Kinesis in analytics solutions
Demonstration: Explore Amazon Kinesis Data Streams
Practice Lab: Setting up a streaming delivery pipeline with Amazon Kinesis
Using Amazon Kinesis Data Analytics
Introduction to Amazon MSK
Overview of Spark Streaming

Module 3: Using Amazon Kinesis for Real-time Data Analytics

Exploring Amazon Kinesis using a clickstream workload
Creating Kinesis data and delivery streams
Demonstration: Understanding producers and consumers
Building stream producers
Building stream consumers
Building and deploying Flink applications in Kinesis Data Analytics
Demonstration: Explore Zeppelin notebooks for Kinesis Data Analytics
Practice Lab: Streaming analytics with Amazon Kinesis Data Analytics and Apache Flink

Module 4: Securing, Monitoring, and Optimizing Amazon Kinesis

Optimize Amazon Kinesis to gain actionable business insights
Security and monitoring best practices

Module 5: Using Amazon MSK in Streaming Data Analytics Solutions

Use cases for Amazon MSK
Creating MSK clusters
Demonstration: Provisioning an MSK Cluster
Ingesting data into Amazon MSK
Practice Lab: Introduction to access control with Amazon MSK
Transforming and processing in Amazon MSK

Module 6: Securing, Monitoring, and Optimizing Amazon MSK

Optimizing Amazon MSK
Demonstration: Scaling up Amazon MSK storage
Practice Lab: Amazon MSK streaming pipeline and application deployment
Security and monitoring
Demonstration: Monitoring an MSK cluster

Module 7: Designing Streaming Data Analytics Solutions

Use case review
Class Exercise: Designing a streaming data analytics workflow

Module B: Developing Modern Data Architectures on AWS

Modern data architectures
