AWS - Part 4
Amazon DynamoDB
- Fully managed, highly available with replication across multiple AZs
- NoSQL DB with transactional support
- Scales to massive workloads as a distributed database
- Millions of requests per second, trillions of rows, 100s of TB of storage
- Fast and consistent in performance
- Standard and Standard-Infrequent Access (Standard-IA) table classes
- Low cost and autoscaling capabilities
Basics
- DynamoDB is made of tables
- Each table has a primary key: a partition key and an optional sort key. The partition key determines the partition where an item is stored; the sort key, when present, orders and uniquely identifies items that share the same partition key
- Each table can have an infinite number of items
- Each item has attributes, e.g. name, email, score
- Max Size for an item is 400KB
- Supported data types:
- Scalar - String, Number, Binary, Boolean, Null
- Collection - List, Map
- Set Types - String Set, Number Set, Binary Set
- Because items in the same table do not need to share the same attributes, DynamoDB lets you rapidly evolve schemas
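The item shape above can be sketched in DynamoDB's low-level attribute-value format, where every attribute carries a type tag (the table and attribute names here are hypothetical):

```python
import json

# Hypothetical "Users" table: partition key user_id, sort key score.
# DynamoDB's low-level API wraps every attribute in a typed object.
item = {
    "user_id": {"S": "u-123"},          # partition key (String scalar)
    "score":   {"N": "1500"},           # sort key (Number scalar, sent as a string)
    "active":  {"BOOL": True},          # Boolean scalar
    "tags":    {"SS": ["gold", "eu"]},  # String Set
    "profile": {"M": {                  # Map collection
        "name":  {"S": "Ada"},
        "email": {"S": "ada@example.com"},
    }},
    "history": {"L": [{"N": "1200"}, {"N": "1350"}]},  # List collection
}

# With boto3's low-level client, this dict would be passed as:
#   dynamodb.put_item(TableName="Users", Item=item)
print(json.dumps(item, indent=2))
```

Note that Numbers are transmitted as strings to preserve precision; the higher-level boto3 `Table` resource hides these type tags entirely.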
Read/Write Capacity Modes
Read/Write Capacity Mode determines how table throughput is managed. Provisioned mode (the default) requires specifying read and write capacity units (RCUs/WCUs) in advance and is cheaper for predictable workloads. On-Demand mode automatically scales capacity with traffic and needs no capacity planning, but is generally more expensive per request.
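The two modes differ only in the billing parameters passed at table creation; a sketch of the `create_table` parameters for each (table and key names are placeholders):

```python
# Shared table definition: a simple table keyed on order_id.
common = {
    "TableName": "Orders",
    "AttributeDefinitions": [{"AttributeName": "order_id", "AttributeType": "S"}],
    "KeySchema": [{"AttributeName": "order_id", "KeyType": "HASH"}],
}

# Provisioned mode: capacity units are declared up front.
provisioned = {**common,
               "BillingMode": "PROVISIONED",
               "ProvisionedThroughput": {"ReadCapacityUnits": 10,
                                         "WriteCapacityUnits": 5}}

# On-Demand mode: no capacity planning, billed per request.
on_demand = {**common,
             "BillingMode": "PAY_PER_REQUEST"}

# With boto3, either dict would be passed as:
#   dynamodb.create_table(**provisioned)  or  dynamodb.create_table(**on_demand)
```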
DAX
DynamoDB Accelerator (DAX) is used to reduce read latency and database load by caching frequently accessed DynamoDB items in memory; it is a fully managed, write-through cache tightly integrated with DynamoDB, whereas a general-purpose cache (e.g., ElastiCache) provides more flexible caching for arbitrary data but requires manual cache management and invalidation. Adding DAX requires minimal application changes because the DAX client is API-compatible with DynamoDB, whereas adding ElastiCache requires explicit cache read/write and invalidation logic.
Stream Processing, DynamoDB Streams and Kinesis Data Streams
Stream processing is used to process and react to data changes in real time; DynamoDB Streams capture item-level changes in a DynamoDB table and are typically used to trigger downstream processing like Lambda functions, whereas Kinesis Data Streams is a general-purpose real-time streaming service designed to ingest, store, and process large volumes of streaming data from multiple producers.
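The common DynamoDB Streams → Lambda pattern can be sketched as a handler that inspects each change record; the event shape below follows the stream record format (attribute names like `user_id` are hypothetical):

```python
def handler(event, context):
    """Sketch of a Lambda function triggered by a DynamoDB Stream.

    Each record carries an eventName (INSERT, MODIFY, REMOVE) and, depending
    on the table's stream view type, the old and/or new item images.
    """
    changed_keys = []
    for record in event.get("Records", []):
        if record["eventName"] in ("INSERT", "MODIFY"):
            new_image = record["dynamodb"].get("NewImage", {})
            changed_keys.append(new_image.get("user_id", {}).get("S"))
    return changed_keys

# A minimal sample event, in the shape DynamoDB Streams delivers to Lambda:
sample_event = {
    "Records": [
        {"eventName": "INSERT",
         "dynamodb": {"NewImage": {"user_id": {"S": "u-1"}}}},
        {"eventName": "REMOVE", "dynamodb": {}},
    ]
}
print(handler(sample_event, None))  # -> ['u-1']
```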
Global Tables in DynamoDB
Global Tables are used to build low-latency, multi-region applications by replicating DynamoDB tables across multiple AWS regions; DynamoDB automatically handles multi-master replication and conflict resolution so applications can read and write to the nearest region with high availability.
API Gateway
API Gateway is used to securely expose backend services as scalable APIs without managing servers; it acts as a managed entry point that handles request routing, authentication, throttling, and monitoring for services like Lambda, ECS, or HTTP backends.
Features: Built-in authentication (IAM, Cognito, JWT), rate limiting and throttling, request/response transformation, caching, monitoring via CloudWatch, and seamless integration with AWS services. Limitations: Higher latency and cost compared to ALB for high-throughput workloads, strict payload limits, and configuration complexity for advanced routing.
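With a Lambda proxy integration, API Gateway hands the HTTP request to the function as an event and expects a specific response shape back; a minimal sketch (the `name` query parameter is a made-up example):

```python
import json

def lambda_handler(event, context):
    """Minimal Lambda proxy-integration handler behind API Gateway.

    API Gateway passes the HTTP request as `event` and expects a response
    dict with statusCode, headers, and a string body.
    """
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}"}),
    }

# Simulate API Gateway invoking the handler for GET /?name=aws
resp = lambda_handler({"queryStringParameters": {"name": "aws"}}, None)
print(resp["body"])  # {"message": "hello, aws"}
```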
AWS Step Functions
AWS Step Functions are used to orchestrate and coordinate multiple services in a reliable workflow, enabling you to build serverless applications by defining state machines that manage sequencing, retries, parallel tasks, and error handling across services like Lambda, ECS, or API calls.
Features: Visual workflow/state machine orchestration, built-in retries and error handling, parallel execution, long-running workflow support, and native integrations with many AWS services.
Example: An order processing pipeline where a workflow validates an order → processes payment → updates inventory → sends a confirmation email, with retries and error handling managed automatically.
Limitations: Can become expensive for high-frequency state transitions, has workflow execution limits, and is less suitable for ultra-low-latency real-time processing.
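The order-processing example above can be sketched as an Amazon States Language (ASL) state machine definition; the Lambda ARNs and state names are placeholders:

```python
import json

# ASL definition for the order pipeline: validate -> payment -> inventory
# -> confirmation email, with retries and a failure path on payment.
definition = {
    "StartAt": "ValidateOrder",
    "States": {
        "ValidateOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:::function:validate-order",
            "Next": "ProcessPayment",
        },
        "ProcessPayment": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:::function:process-payment",
            "Retry": [{"ErrorEquals": ["States.TaskFailed"],
                       "IntervalSeconds": 2, "MaxAttempts": 3, "BackoffRate": 2.0}],
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "OrderFailed"}],
            "Next": "UpdateInventory",
        },
        "UpdateInventory": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:::function:update-inventory",
            "Next": "SendConfirmation",
        },
        "SendConfirmation": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:::function:send-email",
            "End": True,
        },
        "OrderFailed": {"Type": "Fail", "Error": "OrderProcessingError"},
    },
}

# boto3's Step Functions client would receive this serialized as JSON:
#   sfn.create_state_machine(name="orders",
#                            definition=json.dumps(definition),
#                            roleArn="arn:aws:iam::...:role/...")
```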
Amazon Cognito
Amazon Cognito is used to add authentication, authorization, and user management to applications without building your own identity system, allowing users to sign up, sign in, and access resources securely.
Features: Managed user directory (User Pools), federated identity with social/OIDC/SAML providers, JWT-based authentication, MFA support, and integration with API Gateway and AWS IAM.
Example: A mobile/web app where users sign up with email or social providers (Google/Apple), and Cognito issues JWT tokens that the API Gateway uses to authorize requests.
Limitations: Complex configuration for advanced auth flows, limited customization of hosted UI and user flows, and debugging auth issues can be difficult.
Amazon Data Firehose
Amazon Data Firehose (formerly Amazon Kinesis Data Firehose) is used to reliably ingest and deliver streaming data to storage or analytics services without managing streaming infrastructure.
Example: Streaming application logs or clickstream data that Firehose automatically buffers, batches, and delivers to S3, Redshift, or OpenSearch for analytics.
Features: Fully managed streaming delivery, automatic scaling, built-in buffering and batching, optional data transformation via Lambda, and direct integration with S3, Redshift, OpenSearch, and Splunk.
Limitations: Limited control over stream processing, buffering introduces delivery latency, and it is designed for delivery pipelines rather than complex real-time stream processing.
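A sketch of building a `PutRecordBatch` payload for the clickstream example (the delivery-stream name and event fields are hypothetical):

```python
import json

def build_records(events):
    """Build Firehose PutRecordBatch records from a list of dict events.

    Firehose expects each record's Data as bytes; newline-delimited JSON is
    a common convention so S3/Athena consumers can split records later.
    """
    return [{"Data": (json.dumps(e) + "\n").encode("utf-8")} for e in events]

records = build_records([{"page": "/home", "ms": 42},
                         {"page": "/cart", "ms": 87}])

# With boto3, the batch would be delivered as:
#   firehose.put_record_batch(DeliveryStreamName="clickstream", Records=records)
print(len(records))  # 2
```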
Amazon MQ
Amazon MQ is used when applications require a fully managed message broker compatible with traditional messaging protocols (JMS, AMQP, MQTT), allowing legacy or enterprise systems to migrate to AWS without changing their existing messaging architecture.
Example: A microservices system using JMS with ActiveMQ/RabbitMQ where services publish messages to queues and other services consume them asynchronously.
Features: Managed ActiveMQ/RabbitMQ, supports standard messaging protocols (JMS, AMQP, STOMP, MQTT), durable queues and topics, message ordering, and compatibility with enterprise messaging systems.
Limitations: Higher cost and operational complexity than cloud-native services, scaling is limited by broker instances, and requires managing broker concepts.
Comparison:
SQS is a serverless message queue for simple, highly scalable asynchronous processing. SNS is a pub/sub service for fan-out messaging to multiple subscribers. Amazon MQ is a managed broker for protocol compatibility and legacy messaging systems, while SQS/SNS are cloud-native services optimized for scalability and simplicity.
When to use:
Use Amazon MQ when migrating applications that depend on JMS or standard messaging protocols. Use SQS for decoupled microservices and background job queues. Use SNS for event broadcasting or fan-out notifications to multiple services.
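The SQS and SNS choices above map to two different call shapes; a sketch of the boto3 request parameters for each (queue URL, topic ARN, and message fields are placeholders):

```python
import json

# SQS: decoupled background work -- one producer, one consumer pool
# pulling jobs off a queue.
sqs_send = {
    "QueueUrl": "https://sqs.eu-west-1.amazonaws.com/123456789012/jobs",
    "MessageBody": json.dumps({"job_id": "j-1", "action": "resize-image"}),
}
# sqs.send_message(**sqs_send)

# SNS: fan-out -- one publish delivered to every subscriber
# (SQS queues, Lambda functions, email endpoints, ...).
sns_publish = {
    "TopicArn": "arn:aws:sns:eu-west-1:123456789012:order-events",
    "Message": json.dumps({"event": "order_placed", "order_id": "o-9"}),
}
# sns.publish(**sns_publish)
```

A common combination is SNS → multiple SQS queues, giving fan-out plus per-consumer buffering.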
Choosing the Right Database
1. Understand the Workload and Choose the Appropriate Database Model
Before selecting a database, clearly define the characteristics of your workload:
- Data structure: Structured, semi-structured, or unstructured
- Query patterns: Key lookups, filtering, aggregations, search, or time-range queries
- Read vs write characteristics: Read-heavy, write-heavy, or balanced
- Scale expectations: Data size, request volume, and expected growth
- Consistency requirements: Strong consistency vs eventual consistency
- Latency requirements: User-facing requests vs analytical processing
- Data relationships: Complex relational joins vs independent records
Relational Databases
Examples: PostgreSQL, MySQL
Best suited when:
- Data is structured and fits a tabular model
- Transactions and strong consistency are required
- Queries involve joins or complex filtering
- Data integrity constraints must be enforced
- Mature tooling and operational stability are important
Relational databases are a strong default choice for most application workloads.
Document Databases
Examples: MongoDB, Couchbase
Use when:
- Data is semi-structured and schema changes frequently
- Records are typically retrieved as complete documents
- Flexible schemas are preferred over strict relational models
Trade-offs include weaker relational constraints and more complex cross-document queries.
Key–Value Stores / Caching
Examples: Redis, Memcached
Use when:
- Extremely low-latency reads and writes are required
- Data is ephemeral or derived (sessions, caching)
- Atomic counters, rate limits, or TTL-based storage are needed
These systems typically complement a primary database rather than replace it.
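The TTL-based storage pattern these stores provide (e.g. Redis's `SET key value EX seconds`) can be sketched as a minimal in-process equivalent:

```python
import time

class TTLCache:
    """Minimal in-memory sketch of the key-value-with-TTL pattern."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazy expiry on read, like Redis's passive expiration
            return None
        return value

cache = TTLCache()
cache.set("session:u-1", {"user": "ada"}, ttl_seconds=60)
print(cache.get("session:u-1"))  # {'user': 'ada'}
```

Real caches add eviction policies (LRU/LFU), active expiry sweeps, and atomic operations such as counters; this sketch only shows the core TTL semantics.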
Wide-Column Databases
Examples: Cassandra, ScyllaDB, Bigtable
Use when:
- Very high write throughput is required
- Access patterns are predictable
- Horizontal scalability across large datasets is critical
These systems require careful data modeling and typically trade query flexibility for scalability.
Time-Series Databases
Examples: TimescaleDB, InfluxDB
Use when:
- Data is indexed primarily by time
- Queries focus on time windows (e.g., last hour, last week)
- Metrics, monitoring data, or event streams are stored
These databases often provide built-in retention policies and aggregation capabilities.
Search Engines
Examples: Elasticsearch, OpenSearch
Use when:
- Full-text search is required
- Queries involve ranking, fuzzy matching, or complex filtering
Search systems are usually used as secondary indexes, with the primary database acting as the source of truth.
Graph Databases
Examples: Neo4j, Amazon Neptune
Use when:
- The application depends heavily on relationship traversal
- Queries involve network graphs, recommendations, or fraud detection
Analytical Data Warehouses
Examples: BigQuery, Snowflake, Redshift
Use when:
- Large-scale analytical queries and aggregations are required
- Data is used for reporting, dashboards, or business intelligence
- Analytical workloads should be isolated from production systems
Operational databases often stream data into warehouses for analytics.
Amazon Keyspaces
Amazon Keyspaces is used when applications need a serverless, scalable wide-column database compatible with Apache Cassandra, allowing existing Cassandra workloads to run on AWS without managing clusters.
AWS Timestream
Motivation: When you need to store and analyze large volumes of time-series data (data with timestamps) efficiently.
What it is: A serverless time-series database optimized for fast ingestion and queries on timestamped data.
Use cases: IoT sensor data, application metrics, infrastructure monitoring, real-time dashboards.
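A sketch of a Timestream `WriteRecords` payload for the IoT use case (database, table, and dimension names are placeholders):

```python
import time

# One temperature reading from a hypothetical sensor, in the record
# shape Timestream's write API expects.
now_ms = str(int(time.time() * 1000))
ts_records = [{
    "Dimensions": [{"Name": "device_id", "Value": "sensor-42"}],
    "MeasureName": "temperature",
    "MeasureValue": "21.7",
    "MeasureValueType": "DOUBLE",
    "Time": now_ms,  # timestamp, milliseconds by default
}]

# With boto3's timestream-write client:
#   ts.write_records(DatabaseName="iot", TableName="readings",
#                    Records=ts_records)
```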
Amazon Redshift
Motivation: When you need to run fast analytical queries on large historical datasets for business intelligence.
What it is: A fully managed data warehouse designed for complex SQL analytics on large structured datasets.
Use cases: BI dashboards, reporting, sales analytics, data warehousing.
Amazon EMR
Motivation: When you need to process huge datasets using big-data frameworks like Spark or Hadoop.
What it is: A managed platform for running distributed data processing frameworks at scale.
Use cases: Log processing, batch analytics, large-scale data transformations, big data pipelines.
AWS Glue
Motivation: When you need to prepare and transform data before analytics or storage in a data warehouse/data lake.
What it is: A serverless ETL (Extract, Transform, Load) service for building data pipelines.
Use cases: Cleaning data, transforming datasets, building ETL pipelines, managing data catalog schemas.
Amazon Athena
Motivation: When you want to query data directly in S3 without setting up databases or infrastructure.
What it is: A serverless service that runs SQL queries directly on data stored in S3.
Use cases: Ad-hoc analysis, log analysis, querying data lakes.
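An ad-hoc log query boils down to a single SQL statement plus an S3 output location; a sketch of the `start_query_execution` parameters (database, table, and bucket names are placeholders):

```python
# SQL runs directly over data in S3; access_logs is a hypothetical
# table registered in the Glue Data Catalog.
query = """
SELECT status, COUNT(*) AS hits
FROM access_logs
WHERE day = '2024-01-01'
GROUP BY status
"""

params = {
    "QueryString": query,
    "QueryExecutionContext": {"Database": "weblogs"},
    "ResultConfiguration": {"OutputLocation": "s3://my-athena-results/"},
}

# With boto3, the query is asynchronous:
#   execution = athena.start_query_execution(**params)
# then poll get_query_execution(QueryExecutionId=...) until SUCCEEDED
# and fetch rows with get_query_results.
```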
Amazon SageMaker
Motivation: When you need to build, train, and deploy machine learning models without managing ML infrastructure.
What it is: A fully managed machine learning platform covering the entire ML lifecycle.
Use cases: Model training, real-time predictions, recommendation systems, fraud detection.