Scalable Storage in the Cloud
Overview
Key/value-based object storage for the internet, with virtually unlimited total storage, an unlimited number of objects, and individual objects of up to 5 TB
is object-level storage (not block-level storage) and cannot be used to host an OS or dynamic websites (though a static site hosted on S3 can still make client-side calls via the JavaScript SDK)
provides durability by redundantly storing objects across multiple facilities within a region
regularly verifies the integrity of data using checksums and provides an auto-healing capability
integrates with CloudTrail for auditing, CloudWatch for monitoring, and SNS for event notifications
bucket names are globally unique
data model is a flat structure with no hierarchies or folders
a logical hierarchy can be inferred using the key name prefix, e.g. Folder1/Object1
Objects stored in S3 have the URL format: https://s3-[region].amazonaws.com/[bucketname]/[keyname]
If you host a static website in S3 the URL format: http://[bucketname].s3-website-[region].amazonaws.com
S3 is object based. Objects consist of the following:
- Key - unique, developer-assigned key
- Value - this is the data, which consists of a sequence of bytes.
- VersionID - used for versioning
- Metadata - data about data e.g. Content-Type, Author, CreateDate etc.
- Subresources - bucket specific configuration like Bucket policies and ACLs
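As a quick illustration of these components, here is a minimal boto3 sketch (the bucket name, key, and metadata values are hypothetical) that writes an object and reads it back along with its metadata:

import boto3

s3 = boto3.client('s3')

# Key + value (a sequence of bytes) + metadata; S3 assigns a VersionID if versioning is enabled
s3.put_object(
    Bucket='my-bucket',
    Key='Folder1/Object1',            # logical hierarchy expressed via the key name prefix
    Body=b'hello world',              # the value: a sequence of bytes
    ContentType='text/plain',         # system-defined metadata
    Metadata={'author': 'alice'},     # user-defined metadata (sent as x-amz-meta-* headers)
)

obj = s3.get_object(Bucket='my-bucket', Key='Folder1/Object1')
print(obj['ContentType'], obj['Metadata'], obj['Body'].read())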
Buckets & Objects
- list operations return up to 1,000 objects per call with pagination support; S3 is NOT well suited for list or prefix queries over a large number of objects
- a single PUT operation can upload an object of up to 5 GB
- use Multipart Upload to upload large objects up to 5 TB; it is recommended for objects over 100 MB for fault-tolerant uploads (see the sketch after this list)
- supports the Range HTTP header to retrieve partial objects, for fault-tolerant downloads where network connectivity is poor
- Pre-Signed URLs can also be shared for uploading/downloading objects for a limited time without requiring AWS security credentials
- allows deletion of a single object or multiple objects (max 1,000) in a single call
- Multipart Upload allows:
- parallel uploads with improved throughput and bandwidth utilization
- fault tolerance and quick recovery from network issues
- ability to pause and resume uploads
- begin an upload before the final object size is known
- Versioning
- allows you to preserve, retrieve, and restore every version of every object
- protects individual files but does NOT protect from Bucket deletion
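A minimal boto3 sketch of the upload and listing behaviour described above (bucket and file names are hypothetical); TransferConfig makes upload_file switch to a parallel multipart upload above the given threshold, and the paginator walks the list results 1,000 keys at a time:

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client('s3')

# Parallel multipart upload for large objects
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,  # use multipart above 100 MB
    multipart_chunksize=16 * 1024 * 1024,   # 16 MB parts
    max_concurrency=8,                      # parts uploaded in parallel
)
s3.upload_file('backup.tar.gz', 'my-bucket', 'backups/backup.tar.gz', Config=config)

# Paginated listing: each page returns up to 1,000 keys
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='my-bucket', Prefix='backups/'):
    for obj in page.get('Contents', []):
        print(obj['Key'], obj['Size'])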
Consistency
- provides strong read-after-write consistency for PUTs of new objects as well as for overwrite PUTs and DELETEs
- for new objects, synchronously stores data across multiple facilities before returning success
- updates to a single key are atomic
Security
IAM policies – grant users within your own AWS account permission to access S3 resources
Bucket and Object ACL – grant other AWS accounts (not specific users) access to S3 resources
Bucket policies – allow you to add or deny permissions across some or all of the objects within a single bucket
S3 Policy - Anonymous Read Access:
{
  "Version": "2012-10-17",
  "Id": "Policy1589785443274",
  "Statement": [
    {
      "Sid": "Stmt1589785361449",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::<bucketname>/*"
    }
  ]
}
CORS
CORS - Cross-Origin Resource Sharing is a typical concern when using resources from another domain.
When requesting a resource via JavaScript in the browser from a different domain than the one the page was loaded from, the browser checks that CORS is allowed on the target resource's site; if not, the resource is not loaded.
When using S3 buckets to store static resources for a WebApp, if the S3 bucket is not in the same domain as the WebApp, the browser will block loading the resource unless CORS is configured on the bucket.
To configure your bucket to allow cross-origin requests, you create a CORS configuration: an XML document with rules that identify the origins you will allow to access your bucket, the operations (HTTP methods) you will support for each origin, and other operation-specific information.
Sample CORS Configuration to allow all origins:
<CORSConfiguration>
  <CORSRule>
    <AllowedOrigin>*</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
    <MaxAgeSeconds>3000</MaxAgeSeconds>
    <AllowedHeader>Authorization</AllowedHeader>
  </CORSRule>
</CORSConfiguration>
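The same configuration can also be applied programmatically; a minimal boto3 sketch (the bucket name is hypothetical) mirroring the XML rule above:

import boto3

s3 = boto3.client('s3')

s3.put_bucket_cors(
    Bucket='my-bucket',
    CORSConfiguration={
        'CORSRules': [
            {
                'AllowedOrigins': ['*'],          # allow all origins
                'AllowedMethods': ['GET'],
                'AllowedHeaders': ['Authorization'],
                'MaxAgeSeconds': 3000,
            }
        ]
    },
)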
Versioning
- You can version your files in Amazon S3. It is enabled at the bucket level.
- Same key overwrite will increment the “version”.
- It is best practice to version your buckets to protect against unintended deletes (ability to restore a version) and to easily roll back to a previous version.
- Any file that is not versioned prior to enabling versioning will have version “null”
- Suspending versioning does not delete the previous versions
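A minimal boto3 sketch for enabling (or suspending) versioning at the bucket level (the bucket name is hypothetical):

import boto3

s3 = boto3.client('s3')

# Enable versioning; use 'Suspended' to suspend it (previous versions are kept)
s3.put_bucket_versioning(
    Bucket='my-bucket',
    VersioningConfiguration={'Status': 'Enabled'},
)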
Replication
- Objects can be replicated between two buckets.
Key Points:
Must enable versioning in source and destination Buckets
Buckets can be in different accounts
Copying is asynchronous
Must give proper IAM permissions to S3
Replication can be either:
- Cross Region Replication (CRR) - Use cases: compliance, lower latency access, replication across accounts
- Same Region Replication (SRR) - Use cases: log aggregation, live replication between production and test accounts
After activation, only new objects are replicated (replication is not retroactive)
For DELETE operations:
- If you delete without a version ID, a delete marker is added in the source bucket; the delete marker is not replicated
- If you delete with a version ID, the object version is deleted in the source bucket; the deletion is not replicated
So, in short, DELETE operations are NOT replicated.
There is no “chaining” of replication, i.e. if bucket 1 replicates into bucket 2, which replicates into bucket 3, then objects created in bucket 1 are not replicated to bucket 3
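A hedged boto3 sketch of a replication configuration (the bucket names, rule ID, and IAM role ARN are placeholders; both buckets must already have versioning enabled and the role must grant S3 the required permissions):

import boto3

s3 = boto3.client('s3')

s3.put_bucket_replication(
    Bucket='source-bucket',
    ReplicationConfiguration={
        'Role': 'arn:aws:iam::111122223333:role/s3-replication-role',  # placeholder role ARN
        'Rules': [
            {
                'ID': 'ReplicateAll',
                'Priority': 1,
                'Filter': {},                                    # empty filter = whole bucket
                'Status': 'Enabled',
                'DeleteMarkerReplication': {'Status': 'Disabled'},
                'Destination': {'Bucket': 'arn:aws:s3:::destination-bucket'},
            }
        ]
    },
)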
Encryption
Supports HTTPS (SSL/TLS) encryption of data in TRANSIT
S3 Bucket Policies – Force SSL:
To force SSL, create an S3 bucket policy with a DENY on the condition aws:SecureTransport = false
{ "Id": "ExamplePolicy", "Version": "2012-10-17", "Statement": [ { "Sid": "AllowSSLRequestsOnly", "Action": "s3:*", "Effect": "Deny", "Resource": [ "arn:aws:s3:::DOC-EXAMPLE-BUCKET", "arn:aws:s3:::DOC-EXAMPLE-BUCKET/*" ], "Condition": { "Bool": { "aws:SecureTransport": "false" } }, "Principal": "*" } ] }
Note: Using an ALLOW on aws:SecureTransport = true would allow anonymous GetObject if using SSL
Data encryption at REST – there are 4 options:
SSE-S3 - AES-256 encryption with S3-managed keys
SSE-KMS -
- KMS managed keys
- the customer controls the envelope key (the CMK in KMS), i.e. the key that encrypts your data’s encryption key
- Audit Trail available via KMS
SSE-C - Customer managed keys
Client side encryption
To encrypt a file at upload time, include the x-amz-server-side-encryption parameter in the request header.
There are two possible values for the x-amz-server-side-encryption header:
- AES256, which tells S3 to use S3-managed keys,
- aws:kms, which tells S3 to use AWS KMS–managed keys.
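For example, with boto3 this header is set via the ServerSideEncryption parameter (the bucket, keys, and KMS key ARN below are placeholders):

import boto3

s3 = boto3.client('s3')

# SSE-S3: sends x-amz-server-side-encryption: AES256
s3.put_object(Bucket='my-bucket', Key='doc.txt', Body=b'data',
              ServerSideEncryption='AES256')

# SSE-KMS: sends x-amz-server-side-encryption: aws:kms (plus the key id header)
s3.put_object(Bucket='my-bucket', Key='doc-kms.txt', Body=b'data',
              ServerSideEncryption='aws:kms',
              SSEKMSKeyId='arn:aws:kms:us-east-1:111122223333:key/example-key-id')  # placeholder ARN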
The old way to enable default encryption in a bucket was to use a bucket policy and refuse any HTTP command without the proper headers:
Security Policy for enforcing encryption via SSE-S3:
{ "Version": "2012-10-17", "Id": "PutObjPolicy", "Statement": [ { "Sid": "DenyIncorrectEncryptionHeader", "Effect": "Deny", "Principal": "*", "Action": "s3:PutObject", "Resource": "arn:aws:s3:::<bucket_name>/*", "Condition": { "StringNotEquals": { "s3:x-amz-server-side-encryption": "AES256" } } }, { "Sid": "DenyUnEncryptedObjectUploads", "Effect": "Deny", "Principal": "*", "Action": "s3:PutObject", "Resource": "arn:aws:s3:::<bucket_name>/*", "Condition": { "Null": { "s3:x-amz-server-side-encryption": true } } } ] }
Security Policy for enforcing encryption via SSE-KMS:
It gets more interesting in the SSE-KMS case. Here we need to deny all requests that use the wrong encryption type (i.e. an x-amz-server-side-encryption value other than aws:kms, such as AES256) or an incorrect KMS key (i.e. a wrong value of x-amz-server-side-encryption-aws-kms-key-id). A rule for that looks like this (you need to replace $BucketName, $Region, $AccountId and $KeyId):
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Deny", "Principal": "*", "Action": "s3:PutObject", "Resource": "arn:aws:s3:::$BucketName/*", "Condition": { "StringNotEqualsIfExists": { "s3:x-amz-server-side-encryption": "aws:kms" }, "Null": { "s3:x-amz-server-side-encryption": "false" } } }, { "Effect": "Deny", "Principal": "*", "Action": "s3:PutObject", "Resource": "arn:aws:s3:::$BucketName/*", "Condition": { "StringNotEqualsIfExists": { "s3:x-amz-server-side-encryption-aws-kms-key-id": "arn:aws:kms:$Region:$AccountId:key/$KeyId" } } } ] }
The new way is to use the “default encryption” option in S3.
Note: Bucket Policies are evaluated before “default encryption”
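A minimal boto3 sketch for turning on default encryption (the bucket name and KMS key ARN are placeholders; drop KMSMasterKeyID and use SSEAlgorithm 'AES256' for SSE-S3):

import boto3

s3 = boto3.client('s3')

s3.put_bucket_encryption(
    Bucket='my-bucket',
    ServerSideEncryptionConfiguration={
        'Rules': [
            {
                'ApplyServerSideEncryptionByDefault': {
                    'SSEAlgorithm': 'aws:kms',
                    'KMSMasterKeyID': 'arn:aws:kms:us-east-1:111122223333:key/example-key-id',  # placeholder
                }
            }
        ]
    },
)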
Storage
Standard
- Default storage class
- 99.999999999% durability & 99.99% availability
- Low latency and high throughput performance
- designed to sustain the concurrent loss of data in two facilities
Standard IA
- optimized for long-lived and less frequently accessed data
- designed to sustain the concurrent loss of data in two facilities
- 99.999999999% durability & 99.9% availability
- suitable for objects greater than 128 KB kept for at least 30 days
One Zone-IA
- High durability (99.999999999%) in a single AZ; data lost when AZ is destroyed
- 99.5% Availability
- Use Cases: Storing secondary backup copies of on-premises data, or data you can recreate
Glacier Instant Retrieval
- Millisecond retrieval, great for data accessed once a quarter
- Minimum storage duration of 90 days
Glacier Flexible Retrieval
- suitable for archiving data where data access is infrequent and retrieval time of several (3-5) hours is acceptable.
- 99.999999999% durability
- allows Lifecycle Management policies
- transition to move objects to different storage classes and Glacier
- expiration to remove objects
- Minimum storage duration of 90 days
- 3 retrieval options:
- Expedited (1 to 5 minutes)
- Standard (3 to 5 hours)
- Bulk (5 to 12 hours)
Glacier Deep Archive
- For long term storage – cheaper than Glacier
- 2 retrieval options:
- Standard (12 hours)
- Bulk (48 hours)
- Minimum storage duration of 180 days
Comparison of Storage Tiers
Lifecycle
You can transition objects between storage classes either manually or via Lifecycle policies.
Here’s a diagram to show the transition paths possible:
Transition actions: define when objects are transitioned to another storage class, for example (a sketch using the SDK follows this list):
- Move objects to Standard IA class 60 days after creation
- Move to Glacier for archiving after 6 months
Expiration actions: configure objects to expire (be deleted) after some time, for example:
- Access log files can be set to delete after 365 days
- Can be used to delete old versions of files (if versioning is enabled)
- Can be used to delete incomplete multi-part uploads
Rules can be created for a certain prefix (e.g. s3://mybucket/mp3/*)
Rules can be created for certain object tags (e.g. Department: Finance)
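A boto3 sketch of a lifecycle rule combining the ideas above (the bucket name, rule ID, and mp3/ prefix are placeholders):

import boto3

s3 = boto3.client('s3')

s3.put_bucket_lifecycle_configuration(
    Bucket='my-bucket',
    LifecycleConfiguration={
        'Rules': [
            {
                'ID': 'archive-mp3',
                'Filter': {'Prefix': 'mp3/'},                  # rule scoped to a prefix
                'Status': 'Enabled',
                'Transitions': [
                    {'Days': 60, 'StorageClass': 'STANDARD_IA'},
                    {'Days': 180, 'StorageClass': 'GLACIER'},
                ],
                'Expiration': {'Days': 365},                   # delete after a year
                'AbortIncompleteMultipartUpload': {'DaysAfterInitiation': 7},
            }
        ]
    },
)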
Access Logs
For audit purposes, you may want to log all access to S3 buckets
Any request made to S3, from any account, authorized or denied, will be logged into another S3 bucket
That data can be analyzed using data analysis tools, or Amazon Athena.
The log format is at: Log Format
Do not set your logging bucket to be the monitored bucket. It will create a logging loop, and your bucket will grow in size exponentially.
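A minimal boto3 sketch for enabling server access logging into a separate bucket (bucket names and prefix are placeholders; the target bucket must not be the monitored bucket):

import boto3

s3 = boto3.client('s3')

s3.put_bucket_logging(
    Bucket='my-bucket',
    BucketLoggingStatus={
        'LoggingEnabled': {
            'TargetBucket': 'my-logging-bucket',      # a different bucket, to avoid a logging loop
            'TargetPrefix': 'access-logs/my-bucket/',
        }
    },
)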
Performance
Amazon S3 automatically scales to high request rates, with latency of 100-200 ms
Your application can achieve at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix in a bucket.
There are no limits to the number of prefixes in a bucket.
Example (object path => prefix):
- bucket/folder1/sub1/file => /folder1/sub1/
- bucket/folder1/sub2/file => /folder1/sub2/
- bucket/1/file => /1/
- bucket/2/file => /2/
If you spread reads across all four prefixes evenly, you can achieve 22,000 requests per second for GET and HEAD.
S3 – KMS Limitation - If you use SSE-KMS, you may be impacted by the KMS limits. When you upload, it calls the GenerateDataKey KMS API. When you download, it calls the Decrypt KMS API. These count towards the KMS quota per second (5500, 10000, 30000 req/s based on region). As of today, you cannot request a quota increase for KMS.
Multi-Part upload: Recommended for files > 100MB, must use for files > 5GB. Can help parallelize uploads (speed up transfers)
S3 Transfer Acceleration (upload only) - Increase transfer speed by transferring file to an AWS edge location. From there it will forward the data to the S3 bucket in the target region. Compatible with multi-part upload.
S3 Byte-Range Fetches - Using the Range HTTP header in a GET Object request, you can fetch a byte-range from an object, transferring only the specified portion. Parallelize GETs by requesting specific byte ranges. Better resilience in case of failures. Can be used to speed up downloads. Can be used to retrieve only partial data (for example the head of a file)
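A minimal boto3 sketch of a byte-range fetch (bucket and key are placeholders); only the requested bytes are transferred:

import boto3

s3 = boto3.client('s3')

# Fetch only the first 1 KB of the object (e.g. to inspect a file header)
resp = s3.get_object(
    Bucket='my-bucket',
    Key='large-file.bin',
    Range='bytes=0-1023',
)
head = resp['Body'].read()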
S3 Select & Glacier Select - Retrieve less data using SQL by performing server side filtering. Can filter by rows & columns (simple SQL statements). Less network transfer, less CPU cost client-side
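A minimal boto3 sketch of S3 Select over a CSV object (the bucket, key, column name, and SQL expression are hypothetical):

import boto3

s3 = boto3.client('s3')

resp = s3.select_object_content(
    Bucket='my-bucket',
    Key='data/sales.csv',
    ExpressionType='SQL',
    Expression="SELECT * FROM s3object s WHERE s.region = 'EU'",  # server-side row filtering
    InputSerialization={'CSV': {'FileHeaderInfo': 'USE'}},
    OutputSerialization={'CSV': {}},
)

# The response payload is an event stream; Records events carry the filtered bytes
for event in resp['Payload']:
    if 'Records' in event:
        print(event['Records']['Payload'].decode('utf-8'))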
Presigned URLs
We can give users access to files in a bucket without giving them access to the bucket itself by using Presigned URLs. We can generate pre-signed URLs using the SDK or CLI (a sketch follows the examples below)
- For downloads (easy, can use the CLI)
- For uploads (harder, must use the SDK)
Valid for a default of 3600 seconds; the timeout can be changed with the --expires-in [TIME_BY_SECONDS] argument
Users given a pre-signed URL inherit the permissions of the person who generated the URL for GET / PUT
Examples :
- Allow only logged-in users to download a premium video on your S3 bucket
- Allow an ever changing list of users to download files by generating URLs dynamically
- Allow temporarily a user to upload a file to a precise location in our bucket
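A sketch of both approaches (bucket and key names are placeholders):

# CLI: download URL, valid for 1 hour
# aws s3 presign s3://my-bucket/premium/video.mp4 --expires-in 3600

import boto3

s3 = boto3.client('s3')

# SDK: download URL
download_url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'my-bucket', 'Key': 'premium/video.mp4'},
    ExpiresIn=3600,
)

# SDK: upload URL – the caller sends an HTTP PUT with the file body to this URL
upload_url = s3.generate_presigned_url(
    'put_object',
    Params={'Bucket': 'my-bucket', 'Key': 'uploads/report.csv'},
    ExpiresIn=600,
)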
Events
- When certain events happen in an S3 bucket (e.g. s3:ObjectCreated, s3:ObjectRemoved, s3:ObjectRestore, s3:Replication, ...), you can create event notification rules; these rules emit events that can trigger other AWS services such as SQS, SNS, Lambda functions and EventBridge
- Object name filtering is possible (e.g. *.jpg)
- S3 event notifications typically deliver events in seconds but can sometimes take a minute or longer
- If two writes are made to a single non-versioned object at the same time, it is possible that only a single event notification will be sent
- If you want to ensure that an event notification is sent for every successful write, you have to enable versioning on your bucket.
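A hedged boto3 sketch of an event notification rule (the bucket name and Lambda function ARN are placeholders; the function must already allow S3 to invoke it):

import boto3

s3 = boto3.client('s3')

s3.put_bucket_notification_configuration(
    Bucket='my-bucket',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [
            {
                'LambdaFunctionArn': 'arn:aws:lambda:us-east-1:111122223333:function:process-image',
                'Events': ['s3:ObjectCreated:*'],
                'Filter': {
                    'Key': {'FilterRules': [{'Name': 'suffix', 'Value': '.jpg'}]}  # object name filtering
                },
            }
        ]
    },
)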
Best Practices
- use a random hash prefix for keys and ensure a random access pattern; as S3 stores objects lexicographically, randomness helps distribute the contents across multiple partitions for better performance
- use parallel threads and Multipart upload for faster writes
- use parallel threads and Range Header GET for faster reads
- for list operations with a large number of objects, it is better to build a secondary index in DynamoDB
- use Versioning to protect from unintended overwrites and deletions, but this does not protect against bucket deletion
- use VPC S3 Endpoints with VPC to transfer data using Amazon internal network