Great — DAY 6 is where many data engineers get confused, so we’ll lock this down at architect level.


📅 DAY 6 — DATABASES ON AWS (MASTER LEVEL FOR DATA ENGINEERS)

RDS vs DynamoDB vs Glue Catalog — zero confusion forever



🎯 DAY 6 GOAL

By the end of today, you will:

  • Clearly know WHEN to use which database
  • Understand why most data engineers don’t manage DB servers
  • Stop confusing Glue Catalog with databases
  • Design real production metadata & application data patterns
  • Answer database interview questions confidently

🧠 PART 1 — FIRST PRINCIPLE: WHY DATABASES EXIST

Core problem databases solve:

Fast, consistent access to structured data with guarantees

Not all data needs:

  • Transactions
  • Indexes
  • ACID
  • Millisecond latency

📌 Data engineers mostly process data, not serve transactions


🧠 PART 2 — AMAZON RDS (RELATIONAL DATABASES)


🧩 What RDS Is

Amazon RDS = managed:

  • PostgreSQL
  • MySQL
  • MariaDB
  • Oracle
  • SQL Server

You get:

  • Backups
  • Patching
  • High availability
  • Scaling (to a point)

🧠 WHEN RDS IS USED (REAL LIFE)

✔ Application databases
✔ Metadata tables
✔ Configuration stores
✔ Small control tables

❌ NOT for big data analytics
❌ NOT for data lakes


🧠 RDS REAL-WORLD DATA ENGINEER USE CASES

Airflow metadata DB
Job status tables
ETL control tables
Audit tables

📌 RDS is NOT where raw data lives
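
A minimal sketch of the ETL control table idea above. Assumptions: the standard-library sqlite3 module (in-memory) stands in for an RDS PostgreSQL instance so the example runs anywhere, and the table name, column names, and job name are hypothetical.

```python
import sqlite3


def create_control_table(conn):
    # One row per (job, run date): small, relational, transactional.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS etl_job_status (
            job_name TEXT NOT NULL,
            run_date TEXT NOT NULL,
            status   TEXT NOT NULL,
            PRIMARY KEY (job_name, run_date)
        )
        """
    )


def set_status(conn, job_name, run_date, status):
    # "with conn" wraps the write in a transaction: commit on success,
    # rollback on error. INSERT OR REPLACE is SQLite syntax; on PostgreSQL
    # the equivalent would be INSERT ... ON CONFLICT ... DO UPDATE.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO etl_job_status (job_name, run_date, status) "
            "VALUES (?, ?, ?)",
            (job_name, run_date, status),
        )


def get_status(conn, job_name, run_date):
    row = conn.execute(
        "SELECT status FROM etl_job_status WHERE job_name = ? AND run_date = ?",
        (job_name, run_date),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
create_control_table(conn)
set_status(conn, "daily_orders_load", "2024-01-01", "RUNNING")
set_status(conn, "daily_orders_load", "2024-01-01", "SUCCEEDED")
print(get_status(conn, "daily_orders_load", "2024-01-01"))  # SUCCEEDED
```

The table holds a handful of rows about the pipeline, not the pipeline's data, which is the kind of workload RDS fits.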


🔐 RDS + VPC (IMPORTANT)

  • RDS instances run inside a VPC
  • Usually placed in a private subnet
  • Accessed via:
    • EC2
    • Lambda (inside VPC)

🧠 PART 3 — AMAZON DYNAMODB (NO-SQL, SERVERLESS)


🧩 What DynamoDB Is

Amazon DynamoDB = serverless key-value / document DB

You get:

  • Massive scale
  • Single-digit ms latency
  • No servers
  • Auto scaling

🧠 WHEN DYNAMODB IS USED

✔ Event-driven systems
✔ High-scale lookups
✔ Session stores
✔ Job state tracking

❌ Complex joins
❌ Analytics queries


🧠 DynamoDB Mental Model

Primary Key
 ├── Partition Key (required)
 └── Sort Key (optional)

📌 Bad key design = disaster
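
To make the key model above concrete, here is a hedged sketch that only builds the arguments for a DynamoDB create_table call and does not contact AWS. Assumptions: the table name job_state, the attribute names job_name and run_id, string-typed keys, and on-demand billing are all hypothetical choices for illustration.

```python
def build_table_definition(table_name, partition_key, sort_key=None):
    """Build create_table keyword arguments for a table with string keys."""
    # The partition key is required and is declared with KeyType "HASH".
    key_schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    attribute_definitions = [
        {"AttributeName": partition_key, "AttributeType": "S"}
    ]
    # The sort key is optional and is declared with KeyType "RANGE".
    if sort_key is not None:
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
        attribute_definitions.append(
            {"AttributeName": sort_key, "AttributeType": "S"}
        )
    return {
        "TableName": table_name,
        "KeySchema": key_schema,
        "AttributeDefinitions": attribute_definitions,
        "BillingMode": "PAY_PER_REQUEST",  # on-demand, no capacity planning
    }


definition = build_table_definition("job_state", "job_name", "run_id")
print(definition["KeySchema"])

# With boto3 installed and credentials configured, this could be used as:
#   boto3.client("dynamodb").create_table(**definition)
```

Only key attributes are declared up front; every other attribute is schemaless, which is why the key choice carries so much weight.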


🧠 DynamoDB IN DATA PLATFORMS

Lambda → DynamoDB
Step Functions → DynamoDB
Job status → DynamoDB

📌 Often used instead of RDS for simple state
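
One way job-state tracking might look, as a sketch rather than a definitive implementation: a conditional update that moves a job from one status to another only if it is still in the expected status. The code only builds the arguments for a boto3 Table.update_item call. Assumptions: the job_name / run_id key attributes, the status attribute, and the status values are hypothetical.

```python
def build_status_transition(job_name, run_id, expected_status, new_status):
    """Build update_item keyword arguments for a guarded status change."""
    return {
        "Key": {"job_name": job_name, "run_id": run_id},
        # "status" is a DynamoDB reserved word, so it is aliased as #s.
        "UpdateExpression": "SET #s = :new",
        # The write is rejected unless the current status matches.
        "ConditionExpression": "#s = :expected",
        "ExpressionAttributeNames": {"#s": "status"},
        "ExpressionAttributeValues": {
            ":new": new_status,
            ":expected": expected_status,
        },
    }


kwargs = build_status_transition(
    "daily_orders_load", "run-001", "RUNNING", "SUCCEEDED"
)
print(kwargs["ConditionExpression"])

# With boto3 installed and credentials configured, this could be used as:
#   boto3.resource("dynamodb").Table("job_state").update_item(**kwargs)
```

The condition guards against two Lambda invocations overwriting each other's state, which is a common reason to reach for DynamoDB in serverless pipelines.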


🧠 PART 4 — GLUE CATALOG IS NOT A DATABASE (CRITICAL)


❌ Common confusion

Glue stores data

✅ Reality

Glue stores metadata only


Glue Catalog Stores:

  • Table name
  • Column schema
  • S3 location
  • Partitions

Glue DOES NOT store:

  • Rows
  • Records
  • Values

📌 Glue Data Catalog = managed, Hive Metastore-compatible replacement for running your own metastore
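
To show what "metadata only" means in practice, here is a hedged sketch of the kind of table definition the Glue Catalog holds. It builds a TableInput dictionary for a Glue create_table call without contacting AWS. Assumptions: the table name, column names, types, and the s3://example-bucket/orders/ location are hypothetical, and a real definition would usually also carry file format and SerDe settings.

```python
def build_glue_table_input(name, s3_location, columns, partition_keys):
    """Build a Glue TableInput: name, schema, location, partitions."""
    return {
        "Name": name,
        "TableType": "EXTERNAL_TABLE",
        # Partition columns are listed separately from data columns.
        "PartitionKeys": [{"Name": n, "Type": t} for n, t in partition_keys],
        "StorageDescriptor": {
            "Columns": [{"Name": n, "Type": t} for n, t in columns],
            # A pointer to where the files live; the rows stay in S3.
            "Location": s3_location,
        },
    }


table_input = build_glue_table_input(
    name="orders",
    s3_location="s3://example-bucket/orders/",
    columns=[("order_id", "string"), ("amount", "double")],
    partition_keys=[("order_date", "string")],
)
print(sorted(table_input.keys()))

# With boto3 installed and credentials configured, this could be used as:
#   boto3.client("glue").create_table(
#       DatabaseName="analytics", TableInput=table_input
#   )
```

Nothing in the definition contains a row or a value; it only describes where the data is and how to read it.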


🧠 PART 5 — DATABASE COMPARISON (SAVE THIS)

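Feature       RDS          DynamoDB     Glue Catalog
Data type     Relational   Key-value    Metadata
Stores data   ✔            ✔            ❌ (metadata only)
Serverless    ❌            ✔            ✔
Joins         ✔            ❌            ❌
Analytics     ❌            ❌            ❌ (schema for Athena / Spark)
Data Lake     ❌            ❌            Catalog only (data in S3)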

📌 S3 stores data for analytics


🧠 PART 6 — REAL-LIFE ARCHITECTURE PATTERNS

🔹 Pattern 1 — ETL Metadata

RDS (Postgres)
→ Job config
→ Schedules
→ Status

🔹 Pattern 2 — Serverless State

Lambda / Step Functions
→ DynamoDB
→ Job state

🔹 Pattern 3 — Analytics

S3 (data)
+ Glue Catalog (schema)
+ Athena / Spark
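
A hedged sketch of how Pattern 3 fits together from the query side: Athena is pointed at a Glue Catalog database for the schema and at an S3 location for results, while the data itself stays in S3. The code only builds the arguments for an Athena start_query_execution call. Assumptions: the analytics database, the orders table, and the s3://example-bucket/athena-results/ output location are hypothetical.

```python
def build_athena_query(database, table, output_location, limit=10):
    """Build start_query_execution keyword arguments for a sample query."""
    return {
        "QueryString": f'SELECT * FROM "{table}" LIMIT {int(limit)}',
        # The database name is resolved through the Glue Catalog.
        "QueryExecutionContext": {"Database": database},
        # Query results are written back to S3, not stored in Athena.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


query = build_athena_query(
    "analytics", "orders", "s3://example-bucket/athena-results/"
)
print(query["QueryString"])

# With boto3 installed and credentials configured, this could be used as:
#   boto3.client("athena").start_query_execution(**query)
```

Storage (S3), schema (Glue Catalog), and compute (Athena) are three separate services here, which is the decoupling this pattern relies on.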

🧠 PART 7 — INTERVIEW TRAPS (IMPORTANT)

❌ “We store analytics data in RDS”
❌ “Glue is a database”
❌ “DynamoDB replaces all databases”

✔ Correct answers show trade-off thinking


🎤 INTERVIEW STATEMENTS (MASTER LEVEL)

✔ “RDS is used for transactional and metadata workloads, not analytics.”
✔ “DynamoDB provides serverless, low-latency access for stateful workflows.”
✔ “Glue Catalog stores schema metadata; actual data resides in S3.”
✔ “Analytics workloads should decouple storage (S3) from compute.”


🧪 DAY 6 THINKING EXERCISE

Answer mentally:

  1. Why is RDS a bad choice for large analytical queries?
  2. Why do serverless pipelines prefer DynamoDB over RDS?
  3. Why does Glue Catalog scale effortlessly?

🧠 DAY 6 MEMORY MAP (SAVE THIS)

RDS        → Transactions / Metadata
DynamoDB  → High-scale state
Glue      → Metadata only
S3        → Actual data

⏭️ DAY 7 PREVIEW — WEEK 1 ARCHITECT REVIEW

Tomorrow we will:

  • Connect Days 1–6 into ONE architecture
  • Do whiteboard-style explanation
  • Identify gaps
  • Lock fundamentals forever

Reply with:

DAY 7

You’re progressing exactly like a senior AWS data engineer 🚀