What I have learned about RDS on AWS

Amazon-Web-Services · 22 Mar 2022

This is the sixth post in a series on preparing for the AWS Solutions Architect Associate exam.

In this post, we are going to focus on RDS and Databases on AWS.

As I have done previously, let’s look at the similarities and differences with the closest equivalent in Azure. There are a number of relational and non-relational database platforms on AWS, as there are on Azure. As I go through this section, I will point out some of the similar platforms.

Relational Databases (RDS)

Rows and columns

RDS types in AWS:

SQL Server
Oracle
MySQL
PostgreSQL
Amazon Aurora
MariaDB

Key features:

Multi-AZ - for disaster recovery. Failover to the standby is automatic with this setting. Aurora is fault tolerant by design, so Multi-AZ is not a separately configurable option for it.
Read replicas - for performance. A read-only copy of the database with its own endpoint, so the application needs a new database URL if the primary database fails. Used for scaling, not DR. Up to 5 read replicas.
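
The split between the primary (writes) and read replicas (scaling reads) can be sketched as a small router. This is a minimal illustration in plain Python, not an AWS API: the `EndpointRouter` class and the endpoint hostnames are hypothetical, and treating every statement that starts with SELECT as a read is a simplifying assumption.

```python
from itertools import cycle


class EndpointRouter:
    """Toy router: writes go to the primary, reads rotate across replicas."""

    MAX_REPLICAS = 5  # limit quoted in the notes above

    def __init__(self, primary, replicas):
        if len(replicas) > self.MAX_REPLICAS:
            raise ValueError("too many read replicas")
        self.primary = primary
        self._replicas = cycle(replicas) if replicas else None

    def endpoint_for(self, sql):
        # Anything that is not a plain SELECT is treated as a write.
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


# Hypothetical endpoint names, for illustration only.
router = EndpointRouter(
    primary="mydb.example.internal",
    replicas=["mydb-ro-1.example.internal", "mydb-ro-2.example.internal"],
)
write_target = router.endpoint_for("INSERT INTO orders VALUES (1)")
read_target = router.endpoint_for("SELECT * FROM orders")
```

Because each replica has its own endpoint, a router like this has to be told about a new primary after a failure, which is why replicas are a scaling tool rather than a DR mechanism.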

RDS runs on virtual machines that you cannot log in to or manage remotely. Amazon patches the RDS operating system and database engine. RDS is NOT serverless; the exception is Aurora Serverless.

Automated backups have a retention period of 1 to 35 days. They are enabled by default and stored in S3. There is no charge for backup storage up to the size of the database. You can restore to a point in time within the retention period. A restore creates a new RDS instance with a different endpoint.
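
The retention window and the "new instance" rule can be sketched as a small request builder. This is a sketch under stated assumptions: the helper `build_pitr_request` and the instance names are hypothetical, the window check is a simplification (the service reports the real earliest and latest restorable times), and the returned keys mirror the parameters of boto3's `restore_db_instance_to_point_in_time` call without actually calling AWS.

```python
from datetime import datetime, timedelta, timezone


def build_pitr_request(source_id, target_id, restore_time, retention_days, now=None):
    """Build keyword arguments for a point-in-time restore (no AWS call is made)."""
    if not 1 <= retention_days <= 35:
        raise ValueError("retention period must be between 1 and 35 days")
    now = now or datetime.now(timezone.utc)
    earliest = now - timedelta(days=retention_days)
    if not earliest <= restore_time <= now:
        raise ValueError("restore time is outside the backup retention window")
    if source_id == target_id:
        raise ValueError("a restore always creates a new instance")
    return {
        "SourceDBInstanceIdentifier": source_id,
        "TargetDBInstanceIdentifier": target_id,
        "RestoreTime": restore_time,
    }


now = datetime(2022, 3, 22, 12, 0, tzinfo=timezone.utc)
params = build_pitr_request(
    "orders-db", "orders-db-restored", now - timedelta(days=2), retention_days=7, now=now
)
# With credentials configured, the call would look like:
# boto3.client("rds").restore_db_instance_to_point_in_time(**params)
```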

Database snapshots

Encryption uses AWS Key Management Service (KMS).

Non-relational databases

Collection
Documents
Key value pairs
JSON is a common format for documents in a non-relational database

DynamoDB - NoSQL, non-relational database with single-digit millisecond latency. Use cases include mobile, web, gaming, ad-tech, and IoT. Similar to Cosmos DB in Azure.

Stored on SSD storage
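Spread across three geographically distinct data centers within a region
Eventually consistent reads - consistency across all copies is usually reached within a second, best read performance. Default setting.
Strongly consistent reads - return a result that reflects all writes that received a successful response before the read.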
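
The difference between the two read modes can be illustrated with a toy model. This is not how DynamoDB is implemented; `ToyReplicatedTable` is a hypothetical class with one leader copy and two follower copies, and `replicate()` stands in for background replication. In the real API the equivalent switch is the `ConsistentRead` parameter on read calls.

```python
class ToyReplicatedTable:
    """Toy model of an item store with a leader copy and two follower copies."""

    def __init__(self):
        self.leader = {}
        self.followers = [{}, {}]
        self._pending = []  # writes not yet copied to the followers

    def put(self, key, value):
        # A write is acknowledged once the leader has it.
        self.leader[key] = value
        self._pending.append((key, value))

    def replicate(self):
        # Stand-in for background replication, which normally finishes quickly.
        for key, value in self._pending:
            for follower in self.followers:
                follower[key] = value
        self._pending.clear()

    def get(self, key, consistent_read=False):
        # Strongly consistent reads go to the leader; eventually consistent
        # reads may be served by a follower that is slightly behind.
        source = self.leader if consistent_read else self.followers[0]
        return source.get(key)


table = ToyReplicatedTable()
table.put("user#1", "v1")
stale = table.get("user#1")                        # follower not updated yet
fresh = table.get("user#1", consistent_read=True)  # read from the leader
table.replicate()
caught_up = table.get("user#1")                    # follower has caught up
```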

Advanced DynamoDB

DynamoDB Accelerator (DAX) - up to 10x performance improvement, from milliseconds to microseconds. Compatible with existing DynamoDB API calls.
- Fully managed in-memory cache, so read-heavy tables avoid throttling without a separate caching service. DAX is a read-through and write-through cache.
- Highly available, with failover to another Availability Zone.
Transactions
- Prepare/commit with every transaction
- Up to 25 items
On-demand capacity
- Pay per request
- no minimum capacity
- no read/write charges when idle, only storage and backups
- ideal for new product launches before capacity and use is predictable
Backup and restore
Point in Time recovery - restore any point in last 35 days, must be turned on.
Streams - combine with Lambda functions
Global tables - globally distributed applications, multi-region. Replication latency is typically under one second.
Security - KMS encryption, site-to-site VPN, IAM policies and roles.
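
The read-through and write-through behavior attributed to DAX above can be sketched with a toy cache. This is an illustration of the caching pattern only, not the DAX client: `WriteThroughCache` is a hypothetical class and a plain dict stands in for the DynamoDB table.

```python
class WriteThroughCache:
    """Toy read-through/write-through cache in front of a slower store."""

    def __init__(self, backing_store):
        self.store = backing_store  # stands in for the DynamoDB table
        self.cache = {}             # stands in for the in-memory cache
        self.store_reads = 0        # counts how often the store is hit

    def get(self, key):
        # Read-through: on a miss, load from the store and keep a copy.
        if key not in self.cache:
            self.store_reads += 1
            self.cache[key] = self.store.get(key)
        return self.cache[key]

    def put(self, key, value):
        # Write-through: update the store and the cache in the same call.
        self.store[key] = value
        self.cache[key] = value


table = {"sku#1": 10}
dax = WriteThroughCache(table)
first = dax.get("sku#1")   # miss: loaded from the table
second = dax.get("sku#1")  # hit: served from memory
dax.put("sku#1", 9)        # table and cache are updated together
third = dax.get("sku#1")   # hit: sees the new value without a table read
```

Because writes pass through the cache, a read that follows a write does not need to go back to the table, which is what keeps repeated reads off the table's read capacity.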

Database Migration Service (DMS) facilitates a migration from a source database to a target database. DMS can migrate on-premises to AWS, AWS to on-premises, and one AWS database service to another; at least one endpoint must be on an AWS service. DynamoDB cannot be a source database but can be a target.

Data Warehousing

Used for business intelligence and large amounts of data
OLTP, online transaction processing - queries return specific records, such as a single order.
OLAP, online analytical processing - queries pull in large numbers of records and aggregate them.
The two use different architectures at both the database and infrastructure layers
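
The OLTP and OLAP query shapes can be contrasted with a small example. This sketch uses Python's built-in `sqlite3` module purely for illustration; the `orders` table and its rows are made up, and a real warehouse would run the aggregate over far more data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (id, region, amount) VALUES (?, ?, ?)",
    [(1, "east", 20.0), (2, "west", 35.0), (3, "east", 15.0), (4, "west", 5.0)],
)

# OLTP-style: fetch one specific record by key.
oltp_row = conn.execute(
    "SELECT id, region, amount FROM orders WHERE id = ?", (2,)
).fetchone()

# OLAP-style: scan many records and aggregate them.
olap_rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(oltp_row)   # (2, 'west', 35.0)
print(olap_rows)  # [('east', 35.0), ('west', 40.0)]
```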

Redshift - Amazon’s data warehouse for business intelligence. Petabyte scale. Pricing starts at $0.25 per hour and scales to about $1,000 per TB per year, a fraction of the cost of an on-premises data warehouse.

Single node
Multi node
Advanced compression to use less space
Massive Parallel Processing (MPP)
Backups are enabled by default with a 1-day retention period, configurable up to 35 days. Redshift keeps at least 3 copies of the data and can asynchronously replicate snapshots to S3 in another region for DR.
Billed at 1 unit per node per hour. Compute node hours, backups, and data transfer are charged.
Encrypted in transit with SSL
Encrypted at rest
Redshift has its own key management by default
Only one AZ

Aurora - Amazon’s own relational database engine, offered through RDS. MySQL and PostgreSQL compatible. Combines speed and availability. Architected to compete with managed SQL databases such as Azure SQL Database and Oracle, while staying compatible with open source engines to decrease licensing costs. Storage starts at 10 GB and auto-scales to 64 TB. Two copies of the data are kept in each Availability Zone across a minimum of three AZs, making six copies. Data blocks are continuously scanned and repaired, similar to how geo-replicated storage works within Azure Storage. Up to 15 read replicas for maximum in-region performance, with automated failover.

Automated backups are always enabled
Can share Aurora backups with other accounts
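
The six-copy layout described above can be worked through as simple arithmetic. Note the assumption: the quorum sizes below (4 of 6 copies to write, 3 of 6 to read) are not in these notes; they come from AWS's published description of Aurora storage. The `availability` helper is a hypothetical name used only for this sketch.

```python
COPIES_PER_AZ = 2
AVAILABILITY_ZONES = 3
TOTAL_COPIES = COPIES_PER_AZ * AVAILABILITY_ZONES  # six copies in total

# Assumption from AWS's Aurora documentation, not from the notes above.
WRITE_QUORUM = 4
READ_QUORUM = 3


def availability(lost_copies):
    """Report whether reads and writes can continue after losing some copies."""
    remaining = TOTAL_COPIES - lost_copies
    return {
        "can_write": remaining >= WRITE_QUORUM,
        "can_read": remaining >= READ_QUORUM,
    }


one_az_down = availability(lost_copies=COPIES_PER_AZ)      # 4 copies left
az_plus_one = availability(lost_copies=COPIES_PER_AZ + 1)  # 3 copies left
```

Under these assumptions, losing a whole Availability Zone still allows writes, and losing one further copy leaves the data readable while the missing copies are repaired.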

Amazon Aurora Serverless

On-demand, autoscaling configuration of Amazon Aurora. Clusters automatically start up, shut down, and scale capacity based on application needs. Helpful for infrequent, intermittent, or unpredictable workloads. You pay only while the database is in use, for example when someone accesses the website. Works well as a backend to Lambda.

ElastiCache - managed service for open source, in-memory caching engines. Deploy, operate, and scale a cache for common web queries to increase performance and take load off the databases. Speeds up existing databases when there are frequent identical queries. These are not disk-based databases.

Memcached - simple, horizontal scale when getting started.
Redis - ranking and pub/sub, with multiple Availability Zones and backup/restore.

Database Migration Service

Cloud service to migrate relational databases, data warehouses, and NoSQL databases.

on-premises to cloud
cloud to on-premises
cloud to cloud
on-premises to on-premises is not supported; at least one endpoint must be on an AWS service

AWS Schema conversion tool (SCT) can create some or all of the target tables, indexes, views, triggers, etc. These can also be created manually before using the migration tool.

Homogeneous migrations - e.g., Oracle to Oracle, or SQL Server to SQL Server. SCT is not needed.

Heterogeneous migrations - e.g., SQL Server to Aurora. These need the SCT.

An EC2 instance runs the DMS replication software and, if necessary, the SCT.
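
The rules above (SCT only for migrations between different engines, DynamoDB as a target only) can be sketched as a small planning function. This is a simplification, not a DMS API: `plan_migration` is a hypothetical helper, and it treats any difference in engine name as a migration that needs schema conversion.

```python
TARGET_ONLY = {"dynamodb"}  # per the notes: valid as a target, not as a source


def plan_migration(source_engine, target_engine):
    """Return a small plan describing whether the SCT step is needed."""
    source, target = source_engine.lower(), target_engine.lower()
    if source in TARGET_ONLY:
        raise ValueError(f"{source_engine} cannot be used as a DMS source")
    needs_sct = source != target  # different engines need schema conversion
    steps = ["convert schema with SCT"] if needs_sct else []
    steps.append("migrate data with DMS")
    return {"needs_sct": needs_sct, "steps": steps}


same_engine = plan_migration("Oracle", "oracle")
cross_engine = plan_migration("sqlserver", "aurora")
```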

Caching capabilities within AWS (balance of providing information that is up-to-date and providing low latency):

CloudFront
API Gateway
ElastiCache
DynamoDB Accelerator (DAX)
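
The balance between up-to-date data and low latency is often tuned with a time-to-live (TTL) on cached entries. The sketch below is a generic cache-aside helper in plain Python, not the API of any of the services listed; `TTLCache`, `slow_query`, and the injected clock are hypothetical names used to keep the example deterministic.

```python
import time


class TTLCache:
    """Toy cache-aside helper with a time-to-live per entry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # query -> (expires_at, result)

    def get_or_load(self, query, run_query):
        now = self.clock()
        hit = self._entries.get(query)
        if hit is not None and hit[0] > now:
            return hit[1]              # fresh enough: skip the database
        result = run_query(query)      # miss or expired: ask the database
        self._entries[query] = (now + self.ttl, result)
        return result


calls = []


def slow_query(sql):
    # Stand-in for a database round trip.
    calls.append(sql)
    return f"rows for {sql}"


fake_time = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: fake_time[0])
cache.get_or_load("SELECT * FROM top_products", slow_query)  # loaded from the database
cache.get_or_load("SELECT * FROM top_products", slow_query)  # served from the cache
fake_time[0] = 61.0
latest = cache.get_or_load("SELECT * FROM top_products", slow_query)  # expired, reloaded
```

A longer TTL lowers latency and database load but allows older data to be served; a shorter TTL does the opposite.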

Amazon EMR (Elastic MapReduce)

Petabyte-scale data analysis. The big data solution within AWS, running frameworks such as Apache Spark, Hive, HBase, Flink, Hudi, and Presto. AWS claims up to three times the speed of a typical on-premises Apache Spark deployment.

Master node - manages the health and status of the cluster
Core node - runs tasks and stores data in the Hadoop Distributed File System (HDFS)
Task node - optional; runs tasks but does not store data in HDFS

Core nodes communicate with each other. Log data is stored on the master node and is lost if the master node is lost or terminated. Log data can be archived from the master node to S3, at 5-minute intervals by default, but this must be configured when you set up the cluster.
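
Since log archiving has to be chosen when the cluster is created, it helps to see where that setting sits next to the node types. The sketch below only builds a dict shaped like the keyword arguments of boto3's `emr.run_job_flow` call (`Name`, `LogUri`, `Instances`); it does not call AWS. The helper name, bucket name, and instance type are hypothetical, and a real request would also need settings such as the release label and IAM roles, which are omitted here.

```python
def build_cluster_config(name, log_bucket, core_count, task_count=0):
    """Build a dict shaped like the keyword arguments of emr.run_job_flow."""

    def group(role, count):
        return {
            "Name": role.lower(),
            "InstanceRole": role,         # MASTER, CORE, or TASK
            "InstanceType": "m5.xlarge",  # hypothetical choice
            "InstanceCount": count,
        }

    groups = [group("MASTER", 1), group("CORE", core_count)]
    if task_count > 0:
        groups.append(group("TASK", task_count))  # task nodes are optional

    return {
        "Name": name,
        # Logs are archived here so they survive the loss of the master node.
        "LogUri": f"s3://{log_bucket}/emr-logs/",
        "Instances": {"InstanceGroups": groups},
    }


config = build_cluster_config("demo-cluster", "my-log-bucket", core_count=2, task_count=3)
roles = [g["InstanceRole"] for g in config["Instances"]["InstanceGroups"]]
```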

I hope that you are enjoying this information so far. It is helping me to continue to comprehend these services as I prepare for the exam myself. Thank you very much.

Reference links:

What I’ve Learned about IAM on AWS

https://captainhyperscaler.github.io/amazon-web-services/2022/03/20/aws-iam

What I’ve Learned about S3 on AWS

https://captainhyperscaler.github.io/amazon-web-services/2022/03/20/aws-s3

What I’ve Learned about EC2 on AWS

https://captainhyperscaler.github.io/amazon-web-services/2022/03/22/aws-ec2

What I’ve Learned about EBS on AWS

https://captainhyperscaler.github.io/amazon-web-services/2022/03/22/aws-ebs

What I’ve Learned about Cloudwatch on AWS

https://captainhyperscaler.github.io/amazon-web-services/2022/03/22/aws-cloudwatch

What I’ve Learned about RDS on AWS