Amazon S3 Notes
S3 is object storage
Simple web service interface
Use Cases:
- backup and archive for on-premises or cloud data
- content media and software storage and distribution
- big data analytics
- static website hosting
- cloud native mobile and internet application hosting
- disaster recovery
manage data through lifecycle policies
S3 Basics
Bucket names must be unique across all AWS accounts, like DNS names
Best practice is to name buckets with a domain name.
namespace is global but buckets are created in a specific region
- minimize latency
- satisfy data locality and sovereignty concerns
- disaster recovery and compliance
object size from 0 bytes to 5TB
unlimited amount of data
objects contain data and metadata
- system metadata is used by S3 and contains date last modified, object size, md5 digest
- user metadata is optional with use of tags
object identified by unique identifier called a key
keys are up to 1024 bytes of UTF-8 and may contain slashes, dots, and dashes
example:
http://mybucket.s3.amazonaws.com/files/mydoc.txt
- mybucket is bucket name
- files/mydoc.txt is key
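The bucket/key split in the example URL can be sketched in code. This is a minimal illustration that assumes a virtual-hosted-style URL (bucket name as the first host label); the function name `split_s3_url` is made up for these notes.

```python
from urllib.parse import urlparse, unquote

def split_s3_url(url):
    """Return (bucket, key) for a virtual-hosted-style S3 URL (assumed form)."""
    parsed = urlparse(url)
    # The bucket name is the first label of the host name.
    bucket = parsed.netloc.split(".", 1)[0]
    # The key is everything after the first slash of the path.
    key = unquote(parsed.path[1:])
    return bucket, key

bucket, key = split_s3_url("http://mybucket.s3.amazonaws.com/files/mydoc.txt")
print(bucket, key)
```

Note that the slash in the key is just a character in the key, not a real directory separator.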
S3 Operations
- Create/delete a bucket
- Write an object
- Read an object
- Delete an object
- List Keys in a bucket
REST Interface maps to HTTP verbs
HTTPS for secure requests
Durability and Availability
- durability “Will my data still be there?” 99.999999999%
- availability “Can I access my data now?” 99.99%
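To get a feel for these percentages, the arithmetic can be worked through. This is a rough sketch that treats both figures as simple annual rates, which is a simplification; the object count is an arbitrary example.

```python
ANNUAL_LOSS_RATE = 1e-11      # 1 - 99.999999999% durability
AVAILABILITY = 0.9999         # 99.99%
MINUTES_PER_YEAR = 365 * 24 * 60

def expected_objects_lost_per_year(object_count, loss_rate=ANNUAL_LOSS_RATE):
    """Average number of objects lost per year at the given loss rate."""
    return object_count * loss_rate

def expected_downtime_minutes_per_year(availability=AVAILABILITY):
    """Minutes per year during which requests may fail at the given availability."""
    return MINUTES_PER_YEAR * (1.0 - availability)

lost = expected_objects_lost_per_year(10_000_000)
print(f"objects lost per year (10 million stored): {lost:.6f}")
print(f"years per single lost object: {1 / lost:,.0f}")
print(f"downtime minutes per year: {expected_downtime_minutes_per_year():.2f}")
```

Under these assumptions, storing 10 million objects means losing on average one object every 10,000 years, while 99.99% availability allows roughly 53 minutes of unavailability per year.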
durability is achieved by storing data across multiple devices in multiple facilities within a region
Best practice is to protect data from user error using features such as versioning, cross-region replication, and MFA Delete
Data was historically ‘Eventually Consistent’ for overwrites and deletes
Changes could take time to propagate. PUTs to existing objects or DELETEs could return the old object on subsequent GETs. Since December 2020, S3 provides strong read-after-write consistency for all PUTs and DELETEs.
S3 is secure by default. Only the person creating a bucket has access
Access through
- coarse grained using access control lists
- fine grained using bucket policies, IAM policies, query-string authentication
coarse grained – Read, Write, Full-Control
Bucket Policies are recommended
Static Website Hosting
- Create bucket with same name as website hostname
- upload static files to the bucket
- make all files public
- enable static website hosting for the bucket including an Index and Error doc
- Site available at <bucket>.s3-website-<awsregion>.amazonaws.com
- create friendly DNS name in your own domain and use DNS cname or Route53 alias
- Website now available
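The "make all files public" step is commonly done with a bucket policy rather than per-object ACLs. This is a minimal sketch of such a policy plus the endpoint name from the steps above; the helper names and the bucket name are illustrative, and some regions use a dot instead of a dash after `s3-website`.

```python
import json

def public_read_policy(bucket_name):
    """Bucket policy document allowing anonymous GET on every object in the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }

def website_endpoint(bucket_name, region):
    """Website endpoint in the <bucket>.s3-website-<awsregion> form noted above."""
    return f"{bucket_name}.s3-website-{region}.amazonaws.com"

# Hypothetical bucket named after the site's hostname.
print(json.dumps(public_read_policy("www.example.com"), indent=2))
print(website_endpoint("www.example.com", "us-east-1"))
```

The JSON printed here is what would be attached to the bucket as its policy.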
S3 Advanced Features
Prefixes and Delimiters
flat structure in a bucket
prefixes and delimiters to emulate a file and folder hierarchy
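How a flat key list turns into "folders" can be shown with a small local simulation. This is only a sketch of the idea, not the S3 API itself: `list_keys` is a made-up function that mimics how a list request with a prefix and delimiter separates objects from common prefixes.

```python
def list_keys(keys, prefix="", delimiter="/"):
    """Split a flat key list into direct objects and 'subfolder' prefixes."""
    contents = []          # keys directly under the prefix
    common_prefixes = []   # rolled-up "subfolders"
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to the next delimiter is reported once as a prefix.
            folder = prefix + rest.split(delimiter, 1)[0] + delimiter
            if folder not in common_prefixes:
                common_prefixes.append(folder)
        else:
            contents.append(key)
    return contents, common_prefixes

keys = ["files/mydoc.txt", "files/img/a.png", "files/img/b.png", "readme.txt"]
print(list_keys(keys))                   # top level of the bucket
print(list_keys(keys, prefix="files/"))  # inside the "files/" folder
```

The bucket never stores a folder; the hierarchy exists only in how keys are filtered and grouped at list time.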
Storage Classes
S3 Standard
- high durability 99.999999999%
- high availability 99.99%
- low latency
- high performance
- low first byte latency
- high throughput
S3 Standard Infrequent Access (IA)
- high durability 99.999999999%
- low latency, 99.9% availability
- high throughput
- long-lived, less frequently accessed data
- lower price per GB
- minimum object size 128KB
- minimum duration 30 days
S3 Reduced Redundancy Storage (RRS)
- lower durability 99.99%
- derived data that can be reproduced
Amazon Glacier
- secure
- durable
- extremely low cost
- no real time access
- restore 3-5 hours later
- retrieve 5% each month free
- Amazon Glacier as a storage class. Data can only be retrieved by S3 APIs. Client can set a data retrieval policy to prevent cost overruns
Object Lifecycle Management
- manage files over the life of the document
- Lifecycle configs are attached to a bucket and can be applied to all objects or specified by a prefix
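A lifecycle configuration scoped by prefix might look like the sketch below. The dictionary follows the shape accepted by boto3's `put_bucket_lifecycle_configuration`; the rule ID, prefix, and day counts are made-up examples, and the function name is illustrative.

```python
def lifecycle_configuration(rule_id, prefix, ia_after_days, glacier_after_days, expire_after_days):
    """Build one rule: Standard -> Standard-IA -> Glacier -> delete."""
    if not ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("days must increase: IA < Glacier < expiration")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},  # an empty prefix applies to all objects
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }

# Hypothetical values: move logs to IA after 30 days, Glacier after 90, delete after 365.
config = lifecycle_configuration("archive-logs", "logs/", 30, 90, 365)
print(config)
# With boto3 this would be passed as (assumed bucket name):
# s3.put_bucket_lifecycle_configuration(Bucket="mybucket", LifecycleConfiguration=config)
```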
Encryption
- all sensitive data should be encrypted in flight and at rest
- in flight can use S3 SSL endpoints (HTTPS)
- at rest use server side encryption (SSE)
AWS Key Management Service
- uses 256-bit advanced encryption standard (AES)
- can encrypt on client side using Client Side encryption
SSE-S3 (AWS Managed Keys)
- fully integrated “check-box-style”
- AWS handles key management and key protection for S3
- Every object is encrypted with unique key
- further encrypted by a separate master key
- new master keys issued at least monthly with rotating keys
- keys stored on secure hosts
SSE-KMS (AWS KMS Keys)
- fully integrated where Amazon handles key management and protection for S3 but client manages keys
- separate permissions for using master keys
- provides auditing – who used key to access which object and when
- can view failed attempts
SSE-C (Customer Provided Keys)
- Client maintains encryption keys but does not want to manage a client-side encryption library
- AWS will do encryption/decryption of objects while client maintains keys
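On the wire, the three SSE options differ mainly in the request headers sent with a PUT. This is a hedged sketch that only builds the header dictionaries using the S3 REST header names; the helper names and key values are made up for illustration, and a real upload would go through an SDK.

```python
import base64
import hashlib

def sse_s3_headers():
    """SSE-S3: ask S3 to encrypt with keys it manages."""
    return {"x-amz-server-side-encryption": "AES256"}

def sse_kms_headers(kms_key_id):
    """SSE-KMS: encrypt under a KMS key that the client controls."""
    return {
        "x-amz-server-side-encryption": "aws:kms",
        "x-amz-server-side-encryption-aws-kms-key-id": kms_key_id,
    }

def sse_c_headers(key_bytes):
    """SSE-C: the client supplies a 256-bit key with every request."""
    if len(key_bytes) != 32:
        raise ValueError("SSE-C expects a 32-byte (256-bit) key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": base64.b64encode(key_bytes).decode("ascii"),
        # The MD5 of the key lets S3 check the key arrived intact.
        "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(
            hashlib.md5(key_bytes).digest()
        ).decode("ascii"),
    }

print(sse_s3_headers())
print(sse_kms_headers("alias/my-example-key"))  # hypothetical key alias
print(sorted(sse_c_headers(bytes(32))))         # all-zero key, for illustration only
```

With SSE-C the same key headers must be sent again on every GET, since AWS does not store the key.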
Client Side Encryption
- encrypt data on client side before sending to AWS S3
- use an AWS KMS-managed customer master key or a client-side master key
Versioning
- Protects data from accidental or malicious deletion
- once enabled, versioning cannot be removed from a bucket, only suspended
MFA Delete
- On top of versioning, another layer of protection from deletion
- only enabled by Root
Pre-Signed URLs
- S3 objects are private by default but can be shared with pre-signed URLs
- must provide :
- security credentials
- bucket name
- object key
- HTTP method
- expiration date and time
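The five inputs above are what a pre-signed URL is computed from. The sketch below hand-rolls AWS Signature Version 4 query-string signing with only the standard library, purely to show where each input ends up. The credentials, bucket, and region are fake placeholders, the host name form is an assumption, and in practice an SDK call such as boto3's `generate_presigned_url` would be used instead.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote

def _hmac_sha256(key, message):
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()

def presign_url(access_key, secret_key, bucket, key, region, http_method, expires_in, now):
    """Return a SigV4 pre-signed URL valid for expires_in seconds from now (UTC)."""
    host = f"{bucket}.s3.{region}.amazonaws.com"  # assumed virtual-hosted endpoint
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    scope = f"{date_stamp}/{region}/s3/aws4_request"
    canonical_uri = "/" + quote(key, safe="/-_.~")
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires_in),
        "X-Amz-SignedHeaders": "host",
    }
    canonical_query = "&".join(
        quote(name, safe="-_.~") + "=" + quote(value, safe="-_.~")
        for name, value in sorted(params.items())
    )
    # Method, path, query, headers, signed header names, payload marker.
    canonical_request = "\n".join(
        [http_method, canonical_uri, canonical_query, f"host:{host}\n", "host", "UNSIGNED-PAYLOAD"]
    )
    string_to_sign = "\n".join(
        [
            "AWS4-HMAC-SHA256",
            amz_date,
            scope,
            hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
        ]
    )
    # Derive the signing key from the secret key, date, region, and service.
    signing_key = ("AWS4" + secret_key).encode("utf-8")
    for part in (date_stamp, region, "s3", "aws4_request"):
        signing_key = _hmac_sha256(signing_key, part)
    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"https://{host}{canonical_uri}?{canonical_query}&X-Amz-Signature={signature}"

# Fake credentials and names, for illustration only.
url = presign_url(
    "AKIAEXAMPLE", "example-secret", "mybucket", "files/mydoc.txt",
    "us-east-1", "GET", 3600, datetime.datetime(2024, 1, 1, 12, 0, 0),
)
print(url)
```

Anyone holding the URL can perform that one HTTP method on that one key until the expiration time passes, without having AWS credentials of their own.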
Multipart Upload
- support uploading or copying large objects
- use multipart upload api
- 3 step process
- initiation
- uploading of the parts
- completion
- arbitrary order
- should be used for objects larger than 100MB
- must be used for objects larger than 5GB
- AWS CLI – multipart upload is automatic
- can set lifecycle policy on a bucket to abort incomplete multipart uploads after a specified number of days
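The "uploading of the parts" step can be pictured as slicing the object into numbered byte ranges. This is a planning sketch only, with no network calls: `plan_parts` is a made-up helper, the 100 MiB part size is an arbitrary choice, and the limits used (5 MiB minimum part size except for the last part, at most 10,000 parts) are the documented S3 limits as far as I know.

```python
import random

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # every part except the last must be at least this big
MAX_PARTS = 10_000

def plan_parts(object_size, part_size=100 * MIB):
    """Return a list of (part_number, offset, length) tuples covering the object."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part size below the 5 MiB minimum")
    parts = []
    offset = 0
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((len(parts) + 1, offset, length))  # part numbers start at 1
        offset += length
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part size")
    return parts

parts = plan_parts(250 * MIB)

# Parts may be uploaded in arbitrary order...
upload_order = parts[:]
random.shuffle(upload_order)

# ...but completion lists them by part number.
completed = sorted(upload_order)
print(completed)
```

Each tuple corresponds to one upload-part request, and the sorted list is what the completion request refers to.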
Range Get
- can download a portion of an object in S3 or Glacier
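A range GET is an ordinary GET with an HTTP `Range` header. This is a small sketch of building that header; the helper name is illustrative and byte positions are inclusive, as in standard HTTP range requests.

```python
def range_header(first_byte, last_byte=None):
    """Build the Range header for bytes first_byte..last_byte (inclusive)."""
    if first_byte < 0 or (last_byte is not None and last_byte < first_byte):
        raise ValueError("invalid byte range")
    if last_byte is None:
        # Open-ended range: from first_byte to the end of the object.
        return {"Range": f"bytes={first_byte}-"}
    return {"Range": f"bytes={first_byte}-{last_byte}"}

print(range_header(0, 1023))   # first 1024 bytes
print(range_header(1024))      # everything after the first 1024 bytes
```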
Cross Region Replication
- asynchronously replicate all new objects in source bucket in one region to target bucket in another region.
- only new objects, old objects must be copied manually
- versioning must be turned on for both the source and target buckets
- IAM policy to give S3 permission
Logging
- track requests to S3
- off by default
Event Notifications
- Sent in response to actions taken on an object uploaded or stored in S3
- enables client to run workflows, send alerts, or perform actions
- set at bucket level
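A workflow triggered by a notification usually starts by pulling the bucket and key out of the event message. This is a minimal sketch assuming the standard S3 event JSON layout (`Records`, `eventName`, `s3.bucket.name`, `s3.object.key`); the sample event is hand-written and the function name is made up.

```python
from urllib.parse import unquote_plus

def summarize_s3_event(event):
    """Extract (event name, bucket, key) from each record of an S3 event message."""
    summaries = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        summaries.append(
            (
                record["eventName"],
                s3["bucket"]["name"],
                # Keys arrive URL-encoded in the message (spaces as '+').
                unquote_plus(s3["object"]["key"]),
            )
        )
    return summaries

# Hand-written sample, trimmed to the fields used above.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "mybucket"},
                "object": {"key": "files/my+doc.txt", "size": 1024},
            },
        }
    ]
}
print(summarize_s3_event(sample_event))
```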
Amazon Glacier
- unlimited
- large tar
- 99.999999999% durability
Archives
- contain up to 40TB
- unlimited number of archives
- unique ID
- automatically encrypted
- immutable
Vaults
- containers for archive
- each account up to 1000 vaults
- IAM or vault policies for access
Vault Locks
- vault lock policy for Write Once Read Many
- once locked, policy cannot be changed
Data Retrieval
- 5% free each month
- 3-5 hours later

