What I have learned about S3 on AWS

Amazon-Web-Services · 20 Mar 2022

This is the second post of content for preparing yourself for becoming an AWS Solutions Architect Associate.

As stated in my AWS IAM post, next up is AWS’ Simple Storage Service, or S3, as it is more commonly known.

As I did with IAM, let’s look at the differences to the closest equivalent in Azure. S3 is a service that is similar to Azure Storage service, but only the object, or BLOB, storage component of Azure Storage. S3 is object storage ONLY, no files, tables, or queues. AWS provides file storage with Elastic File Storage (EFS). This post is focusing on S3 object storage.

One major difference from a security perspective is that S3 storage is not encrypted at rest by default. You need to turn on this encryption for your S3 services. This encryption is turned on for the S3 Bucket which acts like a folder and is the equivalent to a BLOB container in Azure Storage.

There are different capabilities for identity and access security permissions, IAM, within the S3 buckets. They can created at the bucket level using role policies or at the object level using access control lists.

S3 has six different tiers that can be chosen for S3: Standard, Infrequently Accessed, One-zone Infrequently Accessed, Intelligent Tiering, Glacier, and Glacier sub-zero. Standard S3 has an availability SLA of 99.99%. It does have a durability of 11 “9s” or 99.999999999%, meaning that even if the service were to go down, your objects are safe. This is not true for One-zone Infrequently Accessed.

Let’s break-down each of these for understanding:

Standard S3 - the most commonly used tier and the original tier to the S3 service. This would be considered the Hot tier in an Azure Storage BLOB. These are for files that are used frequently and need immediate access. This tier has the highest cost of the tiers for per GB of storage. Standard S3 is zone redundant within the region to provide the 11 9s of durability.
Infrequently Accessed, IA-S3 - This would be similar to the Cool storage tier of an Azure Storage BLOB. IA-S3 has the same Service Level (SLA) and durability as the Standard S3, and is zone redundant. The per GB storage cost is lower, but their is a higher retrieval cost to access these objects.
One-zone-IA-S3 - same as IA-S3, but without the redundancy. This should only be used for files that are thought to be non-critical to the business because there is a lower durability. These objects are only stored in one zone, so if the storage in that zone fails, you will lose your objects. This is the lowest cost per GB for active storage due to the single zone use. A possible use case that I see for this storage tier would be for image files that aren’t frequently used.
Intelligent Tiering S3 - This tier provides a level of flexibility and life cycle management to your S3 buckets. Intelligent tiering uses both Standard S3 and IA-S3 with machine learning to review the use of objects and put them in the proper tier based on access frequency. Intelligent Tiering can help manage the cost of storage while maintaining the same SLA and durability with zone redundancy. The per GB cost is the same for your objects as they would be when storing them directly into a Standard or IA S3 tier, without the need for you to move these files between tiers yourself. This would be close to the use of Lifecycle management on your Azure Storage account. However, choosing this tier creates the parameters to move the files for you, where in Azure, you have to create the logic for moving files to different tiers.
Glacier - up to this point, the S3 tiers have been for storage account buckets and objects that are in an active status. What is you are no longer needing access of the files and only need to keep them for legal or regulatory retention requirements. This is where Glacier comes in. Glacier is the equivalent to the Archive tier in Azure Storage and is for objects that are accessed very infrequently, such as beyond 90 days. Glacier has a lower cost per GB than the Standard, IA, and Intelligent tiering tiers but a higher retrieval cost.
Glacier, sub-zero is the final tier. This is the same Glacier storage as mentioned previously, but this is for those objects that you almost never feel that you will need to access unless it is for litigation or tax investigation. These objects have a very high retrieval cost and may take over 24 hours to access once requested. This tier has the lowest cost per GB to all of the S3 tiers.

S3 Bucket sharing can be done in three ways:

Bucket policies and IAM - entire bucket - programmatic access only
Bucket ACL and IAM - individual objects within buckets - programmatic access only
Cross-account IAM roles - programmatic and console configuration. Temporarily access to resources.
- provides cross-account access between organizations. Create the role for the second account and then setup up a user and role that is going to be given and assigned to the other organization by going to the account and switching the role.

Cross region replication

Replicate buckets and objects to different regions.
Role for replication must be created.
Versioning must be enabled for the source and destination bucket.
Replication starts working for new objects from point in time forward, not existing objects.
Can be done within the same account or across different accounts.
Delete markers are not replicated.

S3 Transfer acceleration

Uses the CloudFront Edge network to upload to an edge location with a distinct URL to upload and then it replicates to the S3 bucket through the Amazon backbone.
Can test the difference in speed with the comparison tool that is accessible at this link: https://aws.amazon.com/premiumsupport/knowledge-center/upload-speed-s3-transfer-acceleration/

AWS DataSync

This is similar to Azure DataBox Edge
Transfer on-premises data over the WAN and provides data checks
Used to move large amounts of NFS or SMB file systems.
Can also replicate EFS to EFS through an EC2 instance with a DataSync agent.

Snowball

Petabyte scale data transport solution similar to DataBox in Azure.
Snowball is sent to customer site to upload the data and send to Amazon to upload into S3.

CloudFront

AWS Content Delivery Network (CDN)
Edge devices globally to deliver content to them faster than going to the origin region.
Data is cached at the edge device for the next user that accesses.

Storage Gateway

Additional way to move data to AWS storage from on-premises
Physical device or virtual appliance on-premises
VMware ESXi or Hyper-V for the virtual appliance
Management console to create the storage gateway
- File Gateway - NFS or SMB - Upload and store to S3 storage or Glacier.
- Volume gateway - iSCSI - Stored and Cached volumes - VM images and disks - Upload and store to EBS storage.
- Tape gateway
This is similar to using Microsoft System Center to replicate to Azure.

Athena versus Macie

Athena - Query service to analyze data in S3 with SQL queries. Works directly with S3 stored data. Pay per query or terabyte scanned, serverless infrastructure. Query log files in S3. Can generate business, cost, and usage reports.

Macie - Security service that uses AI/ML and natural language processing to analyze objects on S3 for Personal Identifiable Information (PII). Helpful for PCI-DSS compliance.

I hope that you are enjoying this information so far. It is helping me to continue to comprehend these services as I prepare for the exam myself. Thank you very much.

Reference links:

What I’ve Learned about IAM on AWS

https://captainhyperscaler.github.io/amazon-web-services/2022/03/20/aws-iam

What I’ve Learned about S3 on AWS

https://captainhyperscaler.github.io/amazon-web-services/2022/03/20/aws-s3

What I’ve Learned about EC2 on AWS

https://captainhyperscaler.github.io/amazon-web-services/2022/03/22/aws-ec2

What I’ve Learned about EBS on AWS

https://captainhyperscaler.github.io/amazon-web-services/2022/03/22/aws-ebs

What I’ve Learned about Cloudwatch on AWS

https://captainhyperscaler.github.io/amazon-web-services/2022/03/22/aws-cloudwatch

What I’ve Learned about RDS on AWS