AWS - Part 3
SQS
Simple Queue Service (SQS) is a fully managed message queuing service that enables you to decouple the components of your application. SQS stores messages in a highly available and durable manner, and provides APIs for sending and receiving messages.
- Unlimited throughput, unlimited number of messages in queue
- Low latency (<10 ms on publish and receive); 256 KB per message max
- Default retention of messages: 4 days, configurable up to a max of 14 days
- Can have duplicate messages
- can have out of order messages
- supports multiple queues
Producing Messages
- Messages are sent to SQS using the SDK (SendMessage API)
- The message is persisted in SQS until a consumer deletes it
Consuming Messages
- Consumers can be EC2 instances, Lambda functions, or any service that supports SQS
- Poll SQS for messages (receive up to 10 messages at a time)
- Process the messages
- Call the DeleteMessage API to remove processed messages from the queue
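The poll/process/delete loop above can be sketched with boto3-style calls. A minimal sketch: `client` is assumed to be a boto3 SQS client (or anything with the same interface), and `handle_message` and the queue URL are illustrative placeholders.

```python
def poll_once(client, queue_url, handle_message):
    """One iteration of the SQS consume loop: poll, process, delete."""
    resp = client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
    )
    for msg in resp.get("Messages", []):
        handle_message(msg["Body"])
        # Deleting marks the message as successfully processed;
        # otherwise it reappears after the visibility timeout.
        client.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=msg["ReceiptHandle"],
        )

# Usage (hypothetical): poll_once(boto3.client("sqs"), QUEUE_URL, print)
```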
Security
- in-flight encryption using HTTPS API
- at rest encryption using KMS
- client side encryption
A case where SQS might come into the picture
User asks backend to do bulk compression of 1000 music files. This might cause the backend to become overloaded. SQS can be used to offload this work by queuing the requests and processing them asynchronously.
Visibility Timeout
- After a message is polled by a consumer, it becomes invisible to other consumers for a configurable period of time (default 30 seconds)
- This allows the consumer to process the message without interference from other consumers
- If the message is not deleted within the visibility timeout, it becomes visible again and can be picked up by another consumer
- Consumer can extend the visibility timeout to give itself more time to process the message
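Extending the visibility timeout maps to the ChangeMessageVisibility API. A minimal sketch, assuming `client` behaves like a boto3 SQS client; the helper name is illustrative:

```python
def extend_visibility(client, queue_url, receipt_handle, seconds):
    """Ask SQS for more processing time before the message
    becomes visible to other consumers again."""
    client.change_message_visibility(
        QueueUrl=queue_url,
        ReceiptHandle=receipt_handle,
        VisibilityTimeout=seconds,  # counted from now; total capped at 12 hours
    )
```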
SQS - Long Polling
- SQS supports long polling: the consumer's receive call waits up to a configurable period (1-20 seconds, set via WaitTimeSeconds) for messages to arrive before returning
- This reduces the number of empty polls and improves efficiency
- Long polling is especially useful when the queue is expected to have few messages, as it reduces the number of API calls made by the consumer
- If the configured wait time is 20s and the consumer finds a message at t=7, the call returns immediately.
Q. Is long polling always better than short polling? A. Long polling is almost always better than short polling because it returns immediately when messages exist but reduces empty calls when they don’t.
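The t=7 example can be sketched as a toy model (plain Python, not AWS calls); the function names and the 1-second short-poll interval are illustrative assumptions:

```python
def short_poll_calls(arrival_t, interval=1):
    """Short polling: one receive call every `interval` seconds,
    each returning empty until the message arrives."""
    empty = 0
    t = 0
    while t < arrival_t:
        empty += 1  # empty receive, still billed as an API call
        t += interval
    return empty + 1  # plus the call that finally returns the message

def long_poll_calls(arrival_t, wait=20):
    """Long polling: a single call waits up to `wait` seconds and
    returns as soon as the message arrives."""
    calls = 1
    while arrival_t > wait:
        arrival_t -= wait
        calls += 1  # previous call timed out empty; poll again
    return calls
```

For a message arriving at t=7, short polling every second makes 8 API calls while long polling makes just 1.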
SQS - FIFO Queue
- Ordered delivery of messages within a message group (MessageGroupId)
- Messages are processed in the order they were sent
- Throughput (number of messages processed per second by consumers reading from the queue) is lower than standard queues because strict ordering prevents parallel processing within a message group.
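Sending to a FIFO queue uses the same SendMessage API plus two FIFO-specific parameters. A sketch, assuming `client` is a boto3 SQS client; the helper name and values are placeholders:

```python
def send_fifo(client, queue_url, body, group_id, dedup_id):
    """Send to a FIFO queue: MessageGroupId defines the ordering scope,
    MessageDeduplicationId suppresses duplicates within a 5-minute window."""
    return client.send_message(
        QueueUrl=queue_url,  # FIFO queue URLs end in .fifo
        MessageBody=body,
        MessageGroupId=group_id,
        MessageDeduplicationId=dedup_id,
    )
```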
SQS can be used as buffer to database writes in order to offload DB.
Queue vs Pub/Sub
Queue - One queue message can be consumed by only one consumer at a time. Pub/Sub - One message can be consumed by multiple subscribers.
Amazon SNS
It is based on pub/sub pattern where one message can be sent to multiple receivers.
- The ‘event producer’ sends messages to only one SNS topic.
- Multiple subscribers can subscribe to the topic.
Security
- in-flight encryption using HTTPS API
- at rest encryption using KMS
- client side encryption
SNS + SQS: Fan Out Pattern
+-----------+      +-----------+      +----------+      +-------------+
| Publisher | ---> |    SNS    | ---> | SQS Q #1 | ---> | Consumer #1 |
+-----------+      |   Topic   |      +----------+      +-------------+
                   |           |      +----------+      +-------------+
                   |           | ---> | SQS Q #2 | ---> | Consumer #2 |
                   |           |      +----------+      +-------------+
                   |           |      +----------+      +-------------+
                   |           | ---> | SQS Q #3 | ---> | Consumer #3 |
                   +-----------+      +----------+      +-------------+
- Fanout is a type of pub/sub.
- Push once in SNS, receive in all SQS queues that are subscribers.
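The fan-out above can be modeled in a few lines of plain Python (an illustrative in-memory sketch, not AWS code), with lists standing in for SQS queues:

```python
class Topic:
    """Minimal in-memory model of SNS fan-out: one publish, and every
    subscribed queue receives its own copy of the message."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, queue):
        self.subscribers.append(queue)

    def publish(self, message):
        # One publish call delivers to all subscribed queues.
        for queue in self.subscribers:
            queue.append(message)
```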
Amazon Kinesis Data Streams
- Amazon Kinesis Data Streams is a fully managed AWS service used for real-time data ingestion and processing at scale.
- A stream consists of shards, and each shard provides fixed throughput (1 MB/sec write, 2 MB/sec read).
- Producers (applications, logs, IoT devices) send records to the stream using a partition key, which determines the shard and guarantees ordering within that shard.
- Consumers (Lambda, KCL apps, EC2, etc.) read and process the data in real time.
- Supports data retention (24 hours by default, extendable up to 365 days), replay of records, and horizontal scaling by increasing the shard count.
- Security points are same as SNS
                 [Producers]
                      |
                      | (Data + Partition Key)
                      v
+---------------------------------------------+
|             Kinesis Data Stream             |
|  +---------------------------------------+  |
|  | Shard 1 | Shard 2 | Shard 3 |   ...   |  |
|  +---------------------------------------+  |
+---------------------------------------------+
                      |
                      | (Ordered per Shard)
                      v
                 [Consumers]
   Lambda | EC2 (KCL) | Analytics | Firehose
- Provisioned capacity mode: You manually set the number of shards and manage scaling based on expected throughput.
- On-demand mode: Kinesis automatically scales based on traffic, so you don’t manage capacity upfront.
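The partition-key routing described above can be sketched as follows (a simplification: real Kinesis assigns 128-bit MD5 hash-key ranges to shards rather than taking a modulo, but the key property is the same):

```python
import hashlib

def shard_for(partition_key, num_shards):
    """Hash the partition key (Kinesis uses MD5) to pick a shard.
    Records with the same key always map to the same shard, which is
    what guarantees per-shard ordering."""
    digest = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    return digest % num_shards
```

All records for, say, `"device-42"` land on one shard, so that device's events are consumed in order even though the stream as a whole is processed in parallel.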
ECS (Elastic Container Service)
- Amazon ECS is AWS’s managed container orchestration service.
- A Cluster is a logical group of compute capacity where containers run. It is backed by EC2 instances and/or Fargate.
- A Task Definition is the blueprint for running containers: Docker image, CPU, memory, port mappings, environment variables, IAM roles, volumes, and so on. A Task is a running instance of a Task Definition.
Fargate Launch Type
- Launch docker container on AWS without worrying about managing the infra
- Serverless
- Just need to create task definitions
- To scale, just increase the number of tasks
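Scaling by task count maps to the UpdateService API. A sketch, assuming `client` is a boto3 ECS client; the cluster and service names are placeholders:

```python
def scale_service(client, cluster, service, desired_count):
    """Scale a Fargate service by changing the desired task count;
    ECS launches or stops tasks to converge on this number."""
    return client.update_service(
        cluster=cluster,
        service=service,
        desiredCount=desired_count,
    )

# Usage (hypothetical): scale_service(boto3.client("ecs"), "prod", "web", 5)
```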
IAM Roles for EC2
EC2 Instance Profile
- Used by the ECS agent
- Makes API calls to the ECS service
- Sends container logs to CloudWatch, and so on
ECS Task Role
- Allows each task to have a specific role
- Defined in task definition
Q. What happens when user wants to visit example.com/bla/bla where infra is hosted on ECS? A. The request reaches the ALB, which checks its listener rules and forwards the request to the appropriate target group. That target group is linked to an ECS Service, which routes the request to one of the running ECS tasks (containers). The container processes the /bla/bla path and returns a response, which flows back through the load balancer to the user.
Q. Why ECS allows to use both fargate and EC2 in the same cluster? A. ECS allows both Fargate and EC2 in the same cluster because a cluster is just a logical grouping of services, not tied to a single compute type. This lets you run different workloads in the same logical environment while choosing the most suitable launch type per service — for example, running steady, cost-optimized workloads on EC2 and bursty or operationally simple workloads on Fargate. It provides flexibility in cost, control, and operational complexity without needing separate clusters.
Data Volumes
- Mount EFS fs onto ECS tasks
- Works for both EC2 and fargate launch types
- AZ independent as EFS is distributed. Multi attach is possible and safe.
ECS Auto Scaling
Automatically increase/decrease the desired number of tasks. Some of the factors that can come into play are:
- ECS service average CPU utilization
- ECS service average memory utilization
- Request count per target (RPS), when fronted by an ALB
Note - Task level autoscaling != instance level autoscaling
AWS ECR (Elastic Container Registry)
Registry to store and manage Docker images. Internally backed by S3. Supports image vulnerability scanning, versioning, image tags, image lifecycle policies, etc.
AWS EKS Overview
AWS-managed K8s. Alternative to ECS. Supports two modes - Fargate and EC2
Node Types
- Fargate: Serverless compute environment that runs containers without the need for you to provision or manage servers.
- EC2: Runs containers on Amazon EC2 instances that you manage.
- Managed Node Groups: Pre-configured node groups that are easy to use and automatically scale.
- Self Managed Nodes: Customized node groups that allow you to manage your own EC2 instances.
Data Volumes
- Need to specify storage class manifest on your EKS cluster.
- Leverages CSI(Container Storage Interface)
- Support for EBS, EFS, FSx
AWS App Runner
AWS App Runner is a fully managed service from AWS that lets you deploy and run containerized web applications and APIs without managing servers or infrastructure.
AWS App Runner vs Beanstalk
Elastic Beanstalk: A PaaS layer on top of EC2, Auto Scaling Groups, and ELBs. It provisions real infrastructure on your behalf but gives you full access to tweak it.
Q. How is it different from provisioning manually? A. Elastic Beanstalk abstracts and automates the provisioning, configuration, scaling, and deployment of AWS infrastructure (EC2, ELB, ASG, etc.), while still allowing you to fine-tune or override the underlying resources and configurations when needed, unlike manual provisioning where you build and manage everything from scratch.
App Runner - A fully managed service where you point AWS at a container image or source code repo and it handles everything — infrastructure, scaling, load balancing, TLS. Zero config required.
Serverless
Serverless computing allows you to run code without provisioning or managing servers. It does not mean that there are no servers; it just means you don’t manage/provision them. E.g. AWS Lambda, DynamoDB, AWS Cognito, AWS API Gateway, AWS S3, SNS & SQS, Kinesis Data Firehose, Aurora Serverless, Fargate.
AWS Lambda
- Virtual Functions with no servers to manage
- Limited by time - short executions
- Run on-demand
- Scaling is automated
- Cheaper than running EC2 instances all day when you have small workloads that happen in bursts.
Benefits
- Pay per request and compute time
- Free tier of 1,000,000 AWS Lambda requests and 400,000 GB-seconds of compute time per month
GB-seconds = Memory (GB) × Execution time (seconds). When you create an AWS Lambda function, you choose how much RAM it gets; that is the memory part. The number of seconds the function runs is the execution time.
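The formula can be checked with a couple of lines of Python (the function name is illustrative):

```python
def gb_seconds(memory_mb, duration_ms, invocations):
    """Billed GB-seconds: memory in GB times run time in seconds,
    summed over all invocations."""
    return (memory_mb / 1024) * (duration_ms / 1000) * invocations

# 400,000 invocations of a 1 GB function running 1 s each
# exactly exhaust the 400,000 GB-second free tier.
```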
- Tight integration with AWS
When not to use AWS Lambda
- When you need long-running processes - AWS Lambda has a timeout limit of 15 minutes.
- When you need to manage your own servers - AWS Lambda abstracts away the server management.
- When you need fine-grained control over resources - AWS Lambda is a managed service with predefined resource configurations.
Serverless CronJob
Serverless AWS cron jobs are implemented using Amazon EventBridge scheduled rules to trigger AWS Lambda functions at defined cron or rate intervals, without managing any servers.
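A sketch of wiring this up, assuming `events_client` is a boto3 EventBridge client; the rule name, cron expression, and ARN are placeholders:

```python
def schedule_lambda(events_client, rule_name, schedule, lambda_arn):
    """Create an EventBridge rule on a cron/rate schedule and point it
    at a Lambda function as the target."""
    events_client.put_rule(
        Name=rule_name,
        # e.g. "cron(0 12 * * ? *)" or "rate(5 minutes)"
        ScheduleExpression=schedule,
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "1", "Arn": lambda_arn}],
    )
```

In practice you would also grant EventBridge permission to invoke the function (Lambda's AddPermission with principal `events.amazonaws.com`).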
AWS Lambda Limits
- Memory: 128 MB - 10 GB
- Max execution time: 15 mins
- Env variables: 4 KB
- Concurrent executions: 1,000 (can be increased)
Lambda Concurrency and Throttling
- Up to 1,000 concurrent executions per account
- Above 1,000, requests are throttled. Throttle behaviour:
- synchronous invocation, response code 429 (Too Many Requests)
- asynchronous invocation, retry automatically. Exponential backoff is used for retries.
1,000 is the default limit on the user’s account; it can be increased by contacting AWS support.
Cold Start
When a Lambda function is invoked:
- AWS checks if there is an existing warm execution environment available.
- If yes, the function runs immediately.
- If no, AWS must: allocate compute resources, initialize the runtime, download the function code, and run it.
- The first request served by a new instance has higher latency than the rest.
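The steps above can be modeled as a toy (the init cost is an illustrative number, not a real measurement):

```python
class ExecutionEnvironment:
    """Toy model of cold vs warm starts: the first invocation pays an
    init cost (allocate resources, init runtime, download code); later
    invocations reuse the warm environment and pay nothing extra."""

    INIT_COST = 0.5  # illustrative seconds of cold-start overhead

    def __init__(self):
        self.warm = False

    def invoke(self):
        overhead = 0.0
        if not self.warm:
            overhead += self.INIT_COST  # cold start
            self.warm = True
        return overhead  # latency beyond the handler's own run time
```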
Provisioned Concurrency
Provisioned Concurrency in AWS Lambda is a feature that keeps a specified number of Lambda instances pre-initialized and ready to handle requests, eliminating cold starts; however, it is relatively expensive because you are billed continuously for the allocated concurrency (per GB-second) even when the function is idle, in addition to normal execution and request charges.
Customization at the Edge: CloudFront Functions vs Lambda@Edge
CloudFront Functions run lightweight JavaScript at CloudFront edge locations for ultra-low-latency request/response manipulation, whereas Lambda@Edge runs full AWS Lambda functions at the edge with broader capabilities (network calls, larger compute, Node/Python runtimes) but with higher latency and cost.