What I have learned about RDS on AWSAmazon-Web-Services ·
This is the sixth post of content for preparing yourself for becoming an AWS Solutions Architect Associate.
In this post, we are going to focus on RDS and Databases on AWS.
As I have done previously let’s look at the similarities and differences to the closest equivalent in Azure. There are a number of relational and non-relational database platforms on AWS. This is the same with Azure. As I go through this section, I will point out some of the similar platforms.
Relational Databases (RDS)
- Rows and columns
RDS types in AWS:
- SQL Server
- MySQL Server
- Amazon Aurora
- Multi-AZ - for disaster recovery. Fail-over is automatic with this settings. Aurora is fault-tolerance and Multi-AZ is not configurable.
- Read replicas - for performance. Perfect copy of the database, need to create a new database URL if primary database fails. Used for scaling, not DR. Up to 5 read-replicas.
RDS runs on virtual machines that you do not have access to remotely manage. Amazon patches the RDS OS and database. It is NOT serverless, though Aurora is considered serverless.
Automated backups between 1-35 days for retention period. Enabled by default and stored in S3. No charge for storage equal to the size of the database. Restore point in time of the most recent backup. Restores a new RDS instance and different endpoint.
Encryption uses AWS Key Management Service (KMS).
- Key value pairs
- json is non-relational database
DynamoDB - No SQL, non-relational database. single digit millisecond latency. Mobile, web, gaming, ad-tech, IoT use cases. Similar to CosmosDB in Azure.
- Stored on SSD storage
- Three geographically distinct regions
- Eventual consistent reads - consistency within seconds, best read performance. Default setting.
- Strongly consistent reads - all writes and reads in less than a second.
- DynamoDB Accelerator - 10x performance improvement to microseconds. Completely compatible with existing DynamoDB API calls.
- Avoids the throttling without the use of Cache services. DAX has a read and write-through cache.
- High availability to another availability zone.
- Prepare/commit with every transaction
- Up to 25 items
- On-demand capacity
- Pay per request
- no minimum capacity
- no read/write when idle, only storage and backups
- ideal for new product launches before capacity and use is predictable
- Backup and restore
- Point in Time recovery - restore any point in last 35 days, must be turned on.
- Streams - combine with Lambda functions
- Global tables - globally distributed applications, multi-region. Latency under one second.
- Security - KMS encryption, site-to-site VPN, IAM policies and roles.
Database Migration Service (DMS) Facilitates a migration from a source database to a target database. DMS can migration on-premises to on-premises, on-premise to AWS, AWS DB service to another AWS DB service. DynamoDB cannot be a source database but can be a target.
- Used for business intelligence and large amounts of data
- OLTP, online transaction processing - more specific results in the queries.
- OLAP, online analytics processing - pulls in large records, huge number of queries are run.
- Use different processing for data and infrastructure
Redshift - Amazon’s Data Warehouse, business intelligent and data warehousing. Petabyte scale. $.25 per hour up to $1,000 per TB per year. Fraction of the cost of an on-premises Data Warehouse.
- Single node
- Multi node
- Advanced compression to use less space
- Massive Parallel Processing (MPP)
- Backups enabled by default 1 day configurable up to 35 day retention period with at least 3 copies. Can asynchronously replicate in S3 in another region for DR.
- 1 node unit per hour. Compute node hours, backups, data transfer are charged.
- Encrypted in transit with SSL
- Encrypted at rest
- RedShift has its own Key Management
- Only one AZ
Aurora - Amazon’s RDS. MySQL and PostreSQL compatible RDS. Combines speed and availability. Architected to compete with Managed SQL Database, Azure SQL Database and Oracle. More open source to decrease licensing costs. 10 GB and auto-scales to 64 TB. Two copies in each Availablity Zones in a miniumum of three AZs, making six copies replicated and data blocks are continuously scanned and repaired, similar to how Geo-replicated storage works within Azure Storage. Up to 15 read replicas for maximum performance in-region with automated fail-over.
- Automated backups are always enabled
- Can share Aurora backups with other accounts
Amazon Aurora Serverless
On-demand, autoscaling configuration of Amazon Aurora. Clusters automatically start up, shut down, and scale capacity based on application needs. Helpful for infrequent, intermittent, or unpredictable workloads. Only pay when someone accesses the websites. Works well as a backend to Lambda.
Elasticache - open source, in memory caching engines. Deploy, operate, and scale for common web queries to increase performance and take the load off of the databases. Speed up existing databases when there are frequent identical queries. Not disk-based databases.
- Memcached - simple, horizontal scale when getting started.
- Redis - ranking and pub/sub with multiple availabilty zones and backup/restore.
Database Migration Service
Cloud service to migration relational, data warehouses, and no-SQL databases.
- on-premises to cloud
- cloud to on-premises
- cloud to cloud
- on-premises to on-premises
AWS Schema conversion tool (SCT) can create some or all of the target tables, indexes, views, triggers, etc. These can also be created manually before using the migration tool.
Homogenous migrations - ex: Oracle to Oracle, or SQL to SQL. SCT is not needed.
Heterogenous migrations - SQL to Aurora - will need the SCT.
EC2 instance runs the DMS and SCT, if necessary.
Caching capabilities within AWS (balance of providing information that is up-to-date and providing low latency):
- API Gateway
- DynamoDB Accelerator (DAX)
Amazon EMR (Elastic Map Reduce)
Petabyte scale data analysis. Solution for big data within AWS - Apache Spark, Hive, HBase, Flink, Hudi, and Presto. Three times the speed of typical Apache Spark on-premises.
- Master node - manages the health and status of the cluster
- Core node - run tasks and stores data in Hadoop Distributed File System (HDFS)
- Task node - optional
Core nodes communicate with each other. Log data is lost if the Master node is lost or deleted. Can move log data and archive log data of master node to S3. Default is 5-minute intervals and must be done at the time that you setup the cluster.
I hope that you are enjoying this information so far. It is helping me to continue to comprehend these services as I prepare for the exam myself. Thank you very much.
What I’ve Learned about IAM on AWS
What I’ve Learned about S3 on AWS
What I’ve Learned about EC2 on AWS
What I’ve Learned about EBS on AWS
What I’ve Learned about Cloudwatch on AWS
What I’ve Learned about RDS on AWS
What I’ve Learned about Route 53 on AWS
What I’ve Learned about VPC on AWS
What I’ve Learned about High Availability on AWS
What I’ve Learned about Applications on AWS
What I’ve Learned about Security on AWS
What I’ve Learned about Lambda and serverless on AWS