AWS - Part 3
SQS
Simple Queue Service (SQS) is a fully managed message queuing service that enables you to decouple the components of your application. SQS stores messages in a highly available and durable manner, and provides APIs for sending and receiving messages.
- Unlimited throughput, unlimited number of messages in queue
- Low latency (<10 ms on publish and receive); 256 KB per message max
- Default retention of messages: 4 days, configurable up to a max of 14 days
- Can have duplicate messages
- can have out of order messages
- supports multiple queues
Producing Messages
- Messages are sent to SQS using the SDK (SendMessage API)
- The message is persisted in SQS until a consumer deletes it
Consuming Messages
- Consumers can be EC2 instances, Lambda functions, or any service that supports SQS
- Poll SQS for messages (receive up to 10 messages at a time)
- Process the messages
- Call the DeleteMessage API to remove processed messages from the queue
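The poll/process/delete loop above can be sketched with boto3-style calls. A minimal sketch: `client` is assumed to be a boto3 SQS client (or anything with the same interface), and `handle_message` and the queue URL are illustrative placeholders.

```python
def poll_once(client, queue_url, handle_message):
    """One iteration of the SQS consume loop: poll, process, delete."""
    resp = client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
    )
    for msg in resp.get("Messages", []):
        handle_message(msg["Body"])
        # Deleting marks the message as successfully processed;
        # otherwise it reappears after the visibility timeout.
        client.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=msg["ReceiptHandle"],
        )

# Usage (hypothetical): poll_once(boto3.client("sqs"), QUEUE_URL, print)
```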
Security
- in-flight encryption using HTTPS API
- at rest encryption using KMS
- client side encryption
A case where SQS might come into the picture
User asks backend to do bulk compression of 1000 music files. This might cause the backend to become overloaded. SQS can be used to offload this work by queuing the requests and processing them asynchronously.
Visibility Timeout
- After a message is polled by a consumer, it becomes invisible to other consumers for a configurable period of time (default 30 seconds)
- This allows the consumer to process the message without interference from other consumers
- If the message is not deleted within the visibility timeout, it becomes visible again and can be picked up by another consumer
- Consumer can extend the visibility timeout to give itself more time to process the message
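Extending the visibility timeout maps to the ChangeMessageVisibility API. A minimal sketch, assuming `client` behaves like a boto3 SQS client; the helper name is illustrative:

```python
def extend_visibility(client, queue_url, receipt_handle, seconds):
    """Ask SQS for more processing time before the message
    becomes visible to other consumers again."""
    client.change_message_visibility(
        QueueUrl=queue_url,
        ReceiptHandle=receipt_handle,
        VisibilityTimeout=seconds,  # counted from now; total capped at 12 hours
    )
```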
SQS - Long Polling
- SQS supports long polling: the consumer's receive call waits up to a configurable period (1-20 seconds, set via WaitTimeSeconds) for messages to arrive before returning
- This reduces the number of empty polls and improves efficiency
- Long polling is especially useful when the queue is expected to have few messages, as it reduces the number of API calls made by the consumer
- If the configured wait time is 20s and the consumer finds a message at t=7, the call returns immediately.
Q. Is long polling always better than short polling? A. Long polling is almost always better than short polling because it returns immediately when messages exist but reduces empty calls when they don’t.
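The t=7 example can be sketched as a toy model (plain Python, not AWS calls); the function names and the 1-second short-poll interval are illustrative assumptions:

```python
def short_poll_calls(arrival_t, interval=1):
    """Short polling: one receive call every `interval` seconds,
    each returning empty until the message arrives."""
    empty = 0
    t = 0
    while t < arrival_t:
        empty += 1  # empty receive, still billed as an API call
        t += interval
    return empty + 1  # plus the call that finally returns the message

def long_poll_calls(arrival_t, wait=20):
    """Long polling: a single call waits up to `wait` seconds and
    returns as soon as the message arrives."""
    calls = 1
    while arrival_t > wait:
        arrival_t -= wait
        calls += 1  # previous call timed out empty; poll again
    return calls
```

For a message arriving at t=7, short polling every second makes 8 API calls while long polling makes just 1.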
SQS - FIFO Queue
- Ordered delivery of messages within a message group (MessageGroupId)
- Messages are processed in the order they were sent
- Throughput (number of messages processed per second by consumers reading from the queue) is lower than standard queues because strict ordering prevents parallel processing within a message group.
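Sending to a FIFO queue uses the same SendMessage API plus two FIFO-specific parameters. A sketch, assuming `client` is a boto3 SQS client; the helper name and values are placeholders:

```python
def send_fifo(client, queue_url, body, group_id, dedup_id):
    """Send to a FIFO queue: MessageGroupId defines the ordering scope,
    MessageDeduplicationId suppresses duplicates within a 5-minute window."""
    return client.send_message(
        QueueUrl=queue_url,  # FIFO queue URLs end in .fifo
        MessageBody=body,
        MessageGroupId=group_id,
        MessageDeduplicationId=dedup_id,
    )
```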
SQS can be used as buffer to database writes in order to offload DB.
Queue vs Pub/Sub
Queue - One queue message can be consumed by only one consumer at a time. Pub/Sub - One message can be consumed by multiple subscribers.
Amazon SNS
It is based on pub/sub pattern where one message can be sent to multiple receivers.
- The ‘event producer’ sends messages to only one SNS topic.
- Multiple subscribers can subscribe to the topic.
Security
- in-flight encryption using HTTPS API
- at rest encryption using KMS
- client side encryption
SNS + SQS: Fan Out Pattern
+-----------+      +-----------+      +----------+      +-------------+
| Publisher | ---> |    SNS    | ---> | SQS Q #1 | ---> | Consumer #1 |
+-----------+      |   Topic   |      +----------+      +-------------+
                   |           |      +----------+      +-------------+
                   |           | ---> | SQS Q #2 | ---> | Consumer #2 |
                   |           |      +----------+      +-------------+
                   |           |      +----------+      +-------------+
                   |           | ---> | SQS Q #3 | ---> | Consumer #3 |
                   +-----------+      +----------+      +-------------+
- Fanout is a type of pub/sub.
- Push once in SNS, receive in all SQS queues that are subscribers.
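The fan-out above can be modeled in a few lines of plain Python (an illustrative in-memory sketch, not AWS code), with lists standing in for SQS queues:

```python
class Topic:
    """Minimal in-memory model of SNS fan-out: one publish, and every
    subscribed queue receives its own copy of the message."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, queue):
        self.subscribers.append(queue)

    def publish(self, message):
        # One publish call delivers to all subscribed queues.
        for queue in self.subscribers:
            queue.append(message)
```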
Amazon Kinesis Data Streams
- Amazon Kinesis Data Streams is a fully managed AWS service used for real-time data ingestion and processing at scale.
- A stream consists of shards, and each shard provides fixed throughput (1 MB/sec write, 2 MB/sec read).
- Producers (applications, logs, IoT devices) send records to the stream using a partition key, which determines the shard and guarantees ordering within that shard.
- Consumers (Lambda, KCL apps, EC2, etc.) read and process the data in real time.
- Supports data retention (24 hours by default, extendable up to 365 days), replay of records, and horizontal scaling by increasing the shard count.
- Security points are same as SNS
                 [Producers]
                      |
                      | (Data + Partition Key)
                      v
+---------------------------------------------+
|             Kinesis Data Stream             |
|  +---------------------------------------+  |
|  | Shard 1 | Shard 2 | Shard 3 |   ...   |  |
|  +---------------------------------------+  |
+---------------------------------------------+
                      |
                      | (Ordered per Shard)
                      v
                 [Consumers]
   Lambda | EC2 (KCL) | Analytics | Firehose
- Provisioned capacity mode: You manually set the number of shards and manage scaling based on expected throughput.
- On-demand mode: Kinesis automatically scales based on traffic, so you don’t manage capacity upfront.
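The partition-key routing described above can be sketched as follows (a simplification: real Kinesis assigns 128-bit MD5 hash-key ranges to shards rather than taking a modulo, but the key property is the same):

```python
import hashlib

def shard_for(partition_key, num_shards):
    """Hash the partition key (Kinesis uses MD5) to pick a shard.
    Records with the same key always map to the same shard, which is
    what guarantees per-shard ordering."""
    digest = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    return digest % num_shards
```

All records for, say, `"device-42"` land on one shard, so that device's events are consumed in order even though the stream as a whole is processed in parallel.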
ECS (Elastic Container Service)
- Amazon ECS is AWS’s managed container orchestration service.
- A Cluster is a logical group of compute capacity where containers run. It is backed by EC2 instances and/or Fargate.
- A Task Definition is the blueprint for running containers: Docker image, CPU, memory, port mappings, environment variables, IAM roles, volumes, and so on. A Task is a running instance of a Task Definition.
Fargate Launch Type
- Launch docker container on AWS without worrying about managing the infra
- Serverless
- Just need to create task definitions
- To scale, just increase the number of tasks
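Scaling by task count maps to the UpdateService API. A sketch, assuming `client` is a boto3 ECS client; the cluster and service names are placeholders:

```python
def scale_service(client, cluster, service, desired_count):
    """Scale a Fargate service by changing the desired task count;
    ECS launches or stops tasks to converge on this number."""
    return client.update_service(
        cluster=cluster,
        service=service,
        desiredCount=desired_count,
    )

# Usage (hypothetical): scale_service(boto3.client("ecs"), "prod", "web", 5)
```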
IAM Roles for EC2
EC2 Instance Profile
- Used by the ECS agent
- Makes API calls to the ECS service
- Sends container logs to CloudWatch, and so on
ECS Task Role
- Allows each task to have a specific role
- Defined in task definition
Q. What happens when user wants to visit example.com/bla/bla where infra is hosted on ECS? A. The request reaches the ALB, which checks its listener rules and forwards the request to the appropriate target group. That target group is linked to an ECS Service, which routes the request to one of the running ECS tasks (containers). The container processes the /bla/bla path and returns a response, which flows back through the load balancer to the user.
Q. Why ECS allows to use both fargate and EC2 in the same cluster? A. ECS allows both Fargate and EC2 in the same cluster because a cluster is just a logical grouping of services, not tied to a single compute type. This lets you run different workloads in the same logical environment while choosing the most suitable launch type per service — for example, running steady, cost-optimized workloads on EC2 and bursty or operationally simple workloads on Fargate. It provides flexibility in cost, control, and operational complexity without needing separate clusters.
Data Volumes
- Mount EFS fs onto ECS tasks
- Works for both EC2 and fargate launch types
- AZ independent as EFS is distributed. Multi attach is possible and safe.
ECS Auto Scaling
Automatically increase/decrease the desired number of tasks. Some of the factors that can come into play are:
- ECS service average CPU utilization
- ECS service average memory utilization
- Request count per target (RPS), when fronted by an ALB
Note - Task level autoscaling != instance level autoscaling
AWS ECR (Elastic Container Registry)
Registry to store and manage Docker images. Internally backed by S3. Supports image vulnerability scanning, versioning, image tags, image lifecycle policies, etc.
AWS EKS Overview
AWS-managed K8s. Alternative to ECS. Supports two modes - Fargate and EC2
Node Types
- Fargate: Serverless compute environment that runs containers without the need for you to provision or manage servers.
- EC2: Runs containers on Amazon EC2 instances that you manage.
- Managed Node Groups: Pre-configured node groups that are easy to use and automatically scale.
- Self Managed Nodes: Customized node groups that allow you to manage your own EC2 instances.
Data Volumes
- Need to specify storage class manifest on your EKS cluster.
- Leverages CSI(Container Storage Interface)
- Support for EBS, EFS, FSx
AWS App Runner
AWS App Runner is a fully managed service from AWS that lets you deploy and run containerized web applications and APIs without managing servers or infrastructure.
AWS App Runner vs Beanstalk
Elastic Beanstalk: A PaaS layer on top of EC2, Auto Scaling Groups, and ELBs. It provisions real infrastructure on your behalf but gives you full access to tweak it.
Q. How is it different from provisioning manually? A. Elastic Beanstalk abstracts and automates the provisioning, configuration, scaling, and deployment of AWS infrastructure (EC2, ELB, ASG, etc.), while still allowing you to fine-tune or override the underlying resources and configurations when needed, unlike manual provisioning where you build and manage everything from scratch.
App Runner - A fully managed service where you point AWS at a container image or source code repo and it handles everything — infrastructure, scaling, load balancing, TLS. Zero config required.
Serverless
Serverless computing allows you to run code without provisioning or managing servers. It does not mean that there are no servers; it just means you don’t manage/provision them. E.g. AWS Lambda, DynamoDB, AWS Cognito, AWS API Gateway, AWS S3, SNS & SQS, Kinesis Data Firehose, Aurora Serverless, Fargate.
AWS Lambda
- Virtual Functions with no servers to manage
- Limited by time - short executions
- Run on-demand
- Scaling is automated
- Cheaper than running EC2 instances all day when you have small workloads that happen in bursts.
Benefits
- Pay per request and compute time
- Free tier of 1,000,000 AWS Lambda requests and 400,000 GB-seconds of compute time per month
GB-seconds = Memory (GB) × Execution time (seconds). When you create an AWS Lambda function, you choose how much RAM it gets; that is the memory part. The number of seconds the function runs is the execution time.
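The formula can be checked with a couple of lines of Python (the function name is illustrative):

```python
def gb_seconds(memory_mb, duration_ms, invocations):
    """Billed GB-seconds: memory in GB times run time in seconds,
    summed over all invocations."""
    return (memory_mb / 1024) * (duration_ms / 1000) * invocations

# 400,000 invocations of a 1 GB function running 1 s each
# exactly exhaust the 400,000 GB-second free tier.
```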
- Tight integration with AWS
When not to use AWS Lambda
- When you need long-running processes - AWS Lambda has a timeout limit of 15 minutes.
- When you need to manage your own servers - AWS Lambda abstracts away the server management.
- When you need fine-grained control over resources - AWS Lambda is a managed service with predefined resource configurations.
Serverless CronJob
Serverless AWS cron jobs are implemented using Amazon EventBridge scheduled rules to trigger AWS Lambda functions at defined cron or rate intervals, without managing any servers.
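A sketch of wiring this up, assuming `events_client` is a boto3 EventBridge client; the rule name, cron expression, and ARN are placeholders:

```python
def schedule_lambda(events_client, rule_name, schedule, lambda_arn):
    """Create an EventBridge rule on a cron/rate schedule and point it
    at a Lambda function as the target."""
    events_client.put_rule(
        Name=rule_name,
        # e.g. "cron(0 12 * * ? *)" or "rate(5 minutes)"
        ScheduleExpression=schedule,
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "1", "Arn": lambda_arn}],
    )
```

In practice you would also grant EventBridge permission to invoke the function (Lambda's AddPermission with principal `events.amazonaws.com`).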
AWS Lambda Limits
- Memory: 128 MB - 10 GB
- Max execution time: 15 mins
- Env variables: 4 KB
- Concurrent executions: 1,000 (can be increased)
Lambda Concurrency and Throttling
- Up to 1,000 concurrent executions per account
- Above 1,000, requests are throttled. Throttle behaviour:
- synchronous invocation, response code 429 (Too Many Requests)
- asynchronous invocation, retry automatically. Exponential backoff is used for retries.
1,000 is the default limit on the user’s account; it can be increased by contacting AWS support.
Cold Start
When a Lambda function is invoked:
- AWS checks if there is an existing warm execution environment available.
- If yes, the function runs immediately.
- If no, AWS must: allocate compute resources, initialize the runtime, download the function code, and run it.
- The first request served by a new instance has higher latency than the rest.
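The steps above can be modeled as a toy (the init cost is an illustrative number, not a real measurement):

```python
class ExecutionEnvironment:
    """Toy model of cold vs warm starts: the first invocation pays an
    init cost (allocate resources, init runtime, download code); later
    invocations reuse the warm environment and pay nothing extra."""

    INIT_COST = 0.5  # illustrative seconds of cold-start overhead

    def __init__(self):
        self.warm = False

    def invoke(self):
        overhead = 0.0
        if not self.warm:
            overhead += self.INIT_COST  # cold start
            self.warm = True
        return overhead  # latency beyond the handler's own run time
```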
Provisioned Concurrency
Provisioned Concurrency in AWS Lambda is a feature that keeps a specified number of Lambda instances pre-initialized and ready to handle requests, eliminating cold starts; however, it is relatively expensive because you are billed continuously for the allocated concurrency (per GB-second) even when the function is idle, in addition to normal execution and request charges.
Customization at the Edge: CloudFront Functions vs Lambda@Edge
CloudFront Functions run lightweight JavaScript at CloudFront edge locations for ultra-low-latency request/response manipulation, whereas Lambda@Edge runs full AWS Lambda functions at the edge with broader capabilities (network calls, larger compute, Node/Python runtimes) but with higher latency and cost.