Amazon Database
Database and AWS
Categories
- Relational
- NoSQL
Relational is most common
- Online Transaction Processing OLTP, frequent writing
- Online Analytic Processing OLAP, data warehouse
Amazon Relational Database Service (RDS)
supports 6 types of database engines
- MySQL
- Oracle
- PostrgreSQL
- Microsoft SQL Server
- MariaDB
- Amazon Aurora
Data Warehouses – Amazon Redshift
central repository for data from many sources
- OLAP
- Batch jobs
Amazon Redshift is designed for OLAP
NoSQL Amazon DynamoDB
- allows for horizontal scalability
- non-relational
- do not have same table and columns as relational
- Key/Value store or document stores
- used for managing
- user session state
- user profiles
- shopping cart
- time series database
Amazon RDS simplifies:
- setup
- operations
- scaling
Amazon RDS offload common tasks such as:
- backups
- patching
- scaling
- replication
Amazon RDS has no shell access and restricts access to certain system procedures and tables
DB Parameter group
a container for engine config values to be applied to one or more DB instances
DB Options
a container for engine features, ex: Oracle statspack
Operations Benefits
- reliability
- simplify common tasks
Database Engines
MySQL
support Multi-AZ deployment for HA and Read Replicas for horizontal scaling
PostgreSQL
support Multi-AZ deployment for HA and Read Replicas for horizontal scaling
MariaDB
support Multi-AZ deployment for HA and Read Replicas for horizontal scaling
Oracle
editions
- Standard Edition One
- Standard Edition
- Enterprise Edition
Microsoft SQL Server
editions
- Express Edition
- Web Edition
- Standard Edition
- Enterprise Edition
Licensing
Oracle and Microsoft SQL server
- License include
- AWS holds licenses
- included in price
- Oracle – Standard Edition One
- Microsoft – Express, Web, Standard
Amazon Aurora
5 times the performance of MySQL
Creates a DB cluster that has one or more instances and a cluster volume. Cluster Volume is a virtual database storage volume spanning multiple AZs.
Primary Instance is the main instance that supports both read and write
Amazon Aurora Replicas are secondary instances that supports read operations
each DB cluster can have up to 15 Amazon Aurora Replicas in addition to primary instances.
Can Locate Replicas in multiple AZs
Storage Options
- users EBS
- 4 to 6 TB
- up to 30,000 IOPS
Backup and Recovery
Automated backups
tracks change sand backs up DB
creates storage volume snapshot.
Backup Retention Period
- Default: retains 1 day of backups
- Max: 35 days of backups
Delete DB and all automated backups are deleted also
Daily during a configurable 30 minute maintenance window
Manual Backups
manually initiated and snapshots are kept until explicitly deleted
Recovery
when restored and only default Security Group and DB parameter is set, After completing recovery you must reset Security Group and custom DB parameters
High Availability with Multi-AZ
allows you to crate a dB cluster across Multiple AZs.
Lets you meet:
- RPO – recover point objective
- max period of data loss that is acceptable
- RTO – recovery time objective
- max amount of downtime that is permitted to recover from backup
Automatically replicated data from primary to slave
Automatically performs failover in the event:
- loss of primary AZ
- loss of network connectivity to primary
- compute unit failure on primary
- storage failure on primary
Failover takes 1-2 minutes
Scale up and out
when going up in compute, memory or storage, changes are scheduled during next maintenance or immediately using ModifyDBInstance
Horizontal Scalability with Partitioning
partitioning or sharding allows horizontal scaling
NoSQL are designed to scale horizontally
Horizontal Scalability with Read Replicas
- heavy read workloads
- offload reporting
To enhance disaster recovery or reduce global latencies use cross region read replicas to serve read requests
Security
- use IAM Policies
- Private Subnets
- ACL
- Security Groups
- SSL
Amazon Redshift
- petabyte scale data warehouse
- relational DB for OLAP
- uses PostgreSQL
Clusters and Nodes
key component of Redshift is a cluster
cluster
- Leader node
- one or more compute nodes
client only interacts with leader node
Dense Compute
up to 32 TB using fast SSDs
Dense Storage
up to 2 PB using magnetic disks
Increase query performance by adding compute nodes
Resize Operation on Redshift
- creates new cluster
- DB becomes read only until finished
Amazon Redshift uses compression
Samples then selects best compression scheme
Distribution Styles
- Even: default
- Key: distributed based on values in one column
- All: full copy of table distributed to every node
Sort Keys
- compound: more efficient when using prefixes
- interleaved: equal weight to each column
Loading Data
- standard insert and update
- bulk uses Copy
- Unload for exporting
Query Data
- Workload Management (WLM): queues and prioritized queries
Snapshots
- automatically stored internally on S3 but can do manual snapshots
Amazon DynamoDB
NoSQL – fast, low latency
must define a certain read or write capacity and Amazon DynamoDB auto adds capacity to support it
- on fast SSDs
- requires tables have a primary key
- stores info as key/value pair
Supports 5 Scalar Data Types
- String
- Number
- Binary
- Boolean
- Null
Set Data Types
unique list of one or more scalar types
- String
- Number
- Binary
Document Data Types
represents multiple nested attributes
- List: ordered list
- Map: unordered list
Primary Key is made of one attributes
Partition and Sort key
- primary key is mad of two attributes
- first attribute is partition key
- second attribute is sort (or range) key
Secondary Indexs can be made on a table
- Global
- Local
Searching
- Query: primary search operation that requires a partition key attribute name and a distinct value
- Scan: reads every time in table

