Karl Robinson
January 21, 2020
Karl is CEO and Co-Founder of Logicata – he’s an AWS Community Builder in the Cloud Operations category, and AWS Certified to Solutions Architect Professional level. Knowledgeable, informal, and approachable, Karl has founded, grown, and sold internet and cloud-hosting companies.
What is Amazon S3?
Amazon S3 stands for ‘Simple Storage Service’ and it is an object ‘Storage’ ‘Service’ which provides secure, durable and highly scalable object storage as a service for IT teams and developers. Amazon S3 is very ‘Simple’ to use, offering a web services interface which enables customers to store and retrieve their data from anywhere on the web.
Objects are essentially flat files including documents, pictures, videos etc.
Amazon S3 Objects consist of:
- Key – essentially the name of the file
- Value – the file data made up of a sequence of bytes
- Version ID – used for file versioning
- Metadata – data about the data you are storing
- Access Control Lists – governing access to the object
- Torrent – enabling BitTorrent access to objects stored in S3
Amazon S3 Overview
S3 can be used to store objects from 0 bytes up to 5TB in size. Objects are stored in buckets, which you can think of as root folders. S3 bucket names must be unique – all buckets share a single global namespace, so no 2 buckets can have the same name.
The URL for an S3 bucket follows the format: https://s3.eu-west-1.amazonaws.com/bucketname
A successful upload to a bucket returns an HTTP 200 status code.
Amazon S3 Guarantees
- Built for 99.99% availability
- Amazon guarantee 99.9% availability (S3 Standard)
- ’11 9s’ durability for data stored on S3 (99.999999999%)
Wait – 11 9’s? What does that mean? Well, statistically, if durability is measured per object per year, it means that if you store 1 million files in S3, you can expect to lose one file roughly every 100,000 years on average. So S3 storage is very reliable!
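The arithmetic behind a durability figure can be checked in a few lines of Python. This is a back-of-the-envelope sketch, not an official AWS calculation: it assumes that 11 9s is the chance of a single object surviving one year and that objects are lost independently. The function name is made up for illustration.

```python
from fractions import Fraction

# Assumption: 99.999999999% is the probability that a single object
# survives one year, and objects are lost independently of each other.
ANNUAL_LOSS_PROBABILITY = Fraction(1, 10**11)


def years_per_lost_object(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    expected_losses_per_year = object_count * ANNUAL_LOSS_PROBABILITY
    return 1 / expected_losses_per_year


if __name__ == "__main__":
    for count in (10_000, 1_000_000, 10_000_000):
        years = int(years_per_lost_object(count))
        print(f"{count:>10,} objects: one loss every {years:,} years on average")
```

Exact fractions are used rather than floats so that the tiny loss probability is not rounded away.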
Amazon S3 Data Consistency
S3 offers ‘Read after Write’ consistency for PUTS of new objects. This means that if you write a new file and read it immediately afterwards, you will be able to view that data.
However, S3 offers ‘Eventual Consistency’ for overwrite PUTS and DELETES – these changes can take some time to propagate. This means that if you update or delete an existing file and read it immediately afterwards, you may still get the older version.
Amazon S3 Storage Classes
Amazon S3 is a tiered storage offering, with 6 different storage classes available. All classes benefit from 11 9’s durability and support lifecycle transitions.
- S3 Standard – the most commonly used storage class, designed for 99.99% availability, 99.9% availability guarantee. Data is stored across multiple devices in multiple facilities, designed to sustain the loss of 2 facilities concurrently. No minimum storage duration charge, and no retrieval fees.
- **NEW** S3 Intelligent-Tiering – announced at re:Invent 2018 – designed to optimise costs by monitoring access patterns and automatically moving data to the most cost-effective tier without performance impact or operational overhead. Designed for 99.9% availability, 99% availability guarantee. 30 day minimum storage duration charge, and no retrieval fees.
- S3 Standard-IA – Infrequent Access – for data that is accessed less often but requires rapid access when needed. Lower storage fee than S3 Standard, but a retrieval fee is charged. Designed for 99.9% availability, 99% availability guarantee. 30 day minimum storage duration charge, per GB retrieval fees.
- S3 One Zone-IA – lower cost option for infrequently accessed data that does not require multiple Availability Zone (AZ) resilience. Designed for 99.5% availability, 99% availability guarantee. 30 day minimum storage duration charge, per GB retrieval fees. Because the data is stored in a single AZ, destruction of that AZ will result in loss of data.
- S3 Glacier – secure, durable and low cost storage class for data archiving. Reliably store any amount of data at a cost that is competitive with, or cheaper than, on-premises storage. Retrieval times configurable from minutes to hours. Designed for 99.99% availability, 99.9% availability guarantee. 90 day minimum storage duration charge, per GB retrieval fees.
- **NEW** S3 Glacier Deep Archive – announced at re:Invent 2018 and generally available since 2019 – lowest cost storage where a 12 hour retrieval time is acceptable. Designed for 99.99% availability, 99.9% availability guarantee. 180 day minimum storage duration charge, per GB retrieval fees.
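The minimum storage duration charge is easy to overlook, so here is a small sketch of how it affects what you pay for. It assumes the minimums listed above and that an object deleted early is simply billed for the full minimum; the class labels and function name are informal names chosen for this example.

```python
# Minimum storage duration (in days) per storage class, as listed above.
# The class labels are informal names used only for this sketch.
MINIMUM_STORAGE_DAYS = {
    "Standard": 0,
    "Intelligent-Tiering": 30,
    "Standard-IA": 30,
    "One Zone-IA": 30,
    "Glacier": 90,
    "Glacier Deep Archive": 180,
}


def billable_days(storage_class: str, days_stored: int) -> int:
    """Days charged for an object that is deleted after `days_stored` days."""
    return max(days_stored, MINIMUM_STORAGE_DAYS[storage_class])


if __name__ == "__main__":
    # An object kept for only 10 days is billed differently in each class.
    for name in MINIMUM_STORAGE_DAYS:
        print(f"{name:<22} stored 10 days, billed {billable_days(name, 10)} days")
```

For short-lived data, S3 Standard can therefore work out cheaper than a class with a lower headline price.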
Amazon S3 Charges
As with all AWS services, you are only charged for what you use in S3. It is, however, important to understand the chargeable elements of S3, as you are charged for more than simply the storage consumed.
- Storage – per GB stored, per month – pricing varies by region and decreases with increased storage volume.
- Requests – GET, PUT, COPY, POST, LIST, Select, Lifecycle Transition & Data Retrieval Requests – per 1000 requests.
- Data Retrievals – per GB data retrieved.
- Management – Inventory per million objects listed, Analytics Storage Class Analysis per million objects monitored per month, and Object Tagging per 10,000 tags per month.
- Data Transfer – Transfer in and Transfer out in GB.
- Transfer Acceleration – Data accelerated by AWS Edge Locations, per GB.
- Cross Region Replication – per GB transferred.
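To see how these elements add up, the sketch below totals a month's bill from four of them. Every unit price in it is a placeholder invented for this example, not a real AWS price, and it ignores volume discounts, regional variation and the management, acceleration and replication charges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitPrices:
    """Placeholder unit prices in USD. These are NOT real AWS prices."""
    storage_per_gb_month: float = 0.02
    per_1000_requests: float = 0.005
    retrieval_per_gb: float = 0.01
    transfer_out_per_gb: float = 0.09


def estimate_monthly_cost(storage_gb, requests, retrieved_gb, transfer_out_gb,
                          prices=UnitPrices()):
    """Return a per-element cost breakdown plus the total, in USD."""
    breakdown = {
        "storage": storage_gb * prices.storage_per_gb_month,
        "requests": requests / 1000 * prices.per_1000_requests,
        "retrievals": retrieved_gb * prices.retrieval_per_gb,
        "transfer_out": transfer_out_gb * prices.transfer_out_per_gb,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


if __name__ == "__main__":
    bill = estimate_monthly_cost(storage_gb=500, requests=200_000,
                                 retrieved_gb=50, transfer_out_gb=100)
    for element, cost in bill.items():
        print(f"{element:<13} ${cost:,.2f}")
```

Swap in the current prices for your region before relying on any number this produces.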
Amazon S3 Security Access Controls
There are 3 ways to control access to your data stored in S3 – S3 Access Control Lists (ACLs), S3 Bucket Policies and User based policies.
S3 Access Control Lists – There are 2 types of S3 ACLs – Bucket and Object. Bucket ACLs allow you to control access at the bucket level, and Object ACLs control access at the object level. ACLs are probably not the best way to control access to S3 data, unless you are looking to manage individual object permissions within a bucket. It is best practice to control access at the user policy level using IAM.
S3 Bucket Policies – Bucket policies allow users to grant cross account access to S3 resources without having to configure user based roles in IAM. User based roles are a more desirable way to control access to S3 resources as poorly defined Bucket Policies can lead to data being shared with the wrong audience.
User Based Policies – User based policies are configured centrally in IAM.
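As an illustration of what a bucket policy looks like, the sketch below builds one that grants another AWS account read access to the objects in a bucket. The bucket name and account ID are placeholders, and the function name is made up for this example; the code only produces the JSON document and does not apply it to anything.

```python
import json


def cross_account_read_policy(bucket: str, account_id: str) -> str:
    """Build a bucket policy letting another account read objects."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCrossAccountRead",
                "Effect": "Allow",
                # The whole of the other account is trusted here; a real
                # policy would usually name a narrower principal.
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # Placeholder bucket name and account ID.
    print(cross_account_read_policy("bucketname", "111122223333"))
```

Keeping the action list and principal as narrow as possible is what guards against the over-sharing risk mentioned above.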
Amazon S3 Data At Rest Encryption
There are 2 ways that data can be encrypted at rest on Amazon S3 – Server Side Encryption and Client Side Encryption.
Server Side Encryption
- S3 Managed Keys – SSE-S3
- AWS Key Management Service Managed Keys – SSE-KMS
- Server Side Encryption with Customer Provided Keys – SSE-C
Client Side Encryption
- Data is encrypted locally, on the machine you upload from, before it is sent to S3, so you manage the keys and the encryption process yourself
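For server side encryption, the option is chosen per request through HTTP headers on the upload. The sketch below builds those headers for each of the three options without sending anything to S3. The function name and mode labels are assumptions made for this example, and the key is a throwaway generated on the spot.

```python
import base64
import hashlib
import os


def sse_headers(mode, kms_key_id=None, customer_key=None):
    """Return the request headers that select a server side encryption mode."""
    if mode == "SSE-S3":
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id:  # otherwise the account's default KMS key is used
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if mode == "SSE-C":
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 256-bit (32-byte) key")
        digest = hashlib.md5(customer_key).digest()
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            "x-amz-server-side-encryption-customer-key":
                base64.b64encode(customer_key).decode("ascii"),
            "x-amz-server-side-encryption-customer-key-MD5":
                base64.b64encode(digest).decode("ascii"),
        }
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    print(sse_headers("SSE-S3"))
    print(sorted(sse_headers("SSE-C", customer_key=os.urandom(32))))
```

With SSE-C the key travels with every request, so it should only ever be sent over HTTPS.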
Amazon S3 Version Control
Amazon S3 Version Control enables users to store all versions of an object (including every write, and even after you delete an object). This is a great backup tool, enabling AWS S3 customers to retrieve deleted or corrupted files. It is important to note that once Version Control is enabled, it cannot be disabled – only suspended.
Version Control integrates with Lifecycle Rules so that older versions of an object can be moved onto a cheaper storage class.
As an extra level of security, Version Control has an MFA delete capability, which uses Multi Factor Authentication to prevent accidental deletion of data.
It is also important to note that each version of an object takes up storage space and will therefore cost the user money. So it’s a good idea to implement a Lifecycle Policy to delete or archive older versions.
Amazon S3 Lifecycle Rules
Amazon S3 Lifecycle Rules enable S3 users to automatically transition objects to another S3 storage class. Users can also configure object expiration to delete old files. Lifecycle Rules can be used in conjunction with Version Control, and can be applied to current and previous versions of objects.
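A lifecycle rule is expressed as a small configuration document. The sketch below builds one as a Python dictionary in the shape the S3 lifecycle configuration API accepts (for example through boto3's `put_bucket_lifecycle_configuration`, which is not called here). The rule ID, the `logs/` prefix and all of the day counts are arbitrary values chosen for illustration.

```python
import json

# Hypothetical rule: the ID, prefix and day counts are illustrative only.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            # Current versions: move to cheaper classes, then delete.
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
            # Previous versions (only relevant with Version Control enabled).
            "NoncurrentVersionTransitions": [
                {"NoncurrentDays": 30, "StorageClass": "GLACIER"},
            ],
            "NoncurrentVersionExpiration": {"NoncurrentDays": 180},
        }
    ]
}

if __name__ == "__main__":
    print(json.dumps(lifecycle_configuration, indent=2))
```

The separate `Noncurrent...` keys are what let one rule treat current and previous versions differently.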
Amazon S3 Cross Region Replication
With Amazon S3 Cross Region Replication, users can replicate entire S3 buckets, or select objects based on object prefixes or tags. This can be useful for disaster recovery purposes. When using Cross Region Replication, the storage class for replicated objects can be changed – for example from a higher to a lower class of storage. It is also possible to change the ownership of replicated objects.
When setting up Cross Region Replication there are a few things to remember:
- Versioning must be enabled on both the source and destination buckets
- The source and destination buckets must be in different regions
- Files in existing buckets are not replicated automatically
- All subsequently updated files will be replicated automatically
- Deletion of individual versions or Delete Markers will not be replicated
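The first two prerequisites in the list above lend themselves to a quick pre-flight check. The sketch below is plain Python with no AWS calls: the `Bucket` class and `replication_problems` function are hypothetical helpers for this example, and in practice the region and versioning state would come from the S3 API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bucket:
    """Minimal stand-in for the bucket facts the checks need."""
    name: str
    region: str
    versioning_enabled: bool


def replication_problems(source: Bucket, destination: Bucket) -> List[str]:
    """Return the reasons Cross Region Replication cannot be set up yet."""
    problems = []
    for bucket in (source, destination):
        if not bucket.versioning_enabled:
            problems.append(f"versioning is not enabled on {bucket.name}")
    if source.region == destination.region:
        problems.append("source and destination are in the same region")
    return problems


if __name__ == "__main__":
    src = Bucket("photos-eu", "eu-west-1", versioning_enabled=True)
    dst = Bucket("photos-dr", "eu-west-1", versioning_enabled=False)
    for problem in replication_problems(src, dst):
        print("Cannot replicate:", problem)
```

An empty list means both prerequisites are met; objects that already exist in the source bucket would still need to be copied separately.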
Amazon S3 Transfer Acceleration
AWS S3 Transfer Acceleration enables geographically dispersed users to upload files faster to their S3 buckets. Transfer Acceleration utilises the AWS Edge Network locations to accelerate uploads to S3. When using S3 Transfer Acceleration, users are given a separate, distinct URL (not the bucket URL) to upload objects. This URL enables users to upload files to AWS Edge locations, from where they are transferred to the destination S3 bucket.
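To make the difference between the two URLs concrete, this sketch builds the regular path-style bucket URL alongside the Transfer Acceleration endpoint. The helper names are made up for this example and the bucket name is a placeholder; acceleration also has to be enabled on the bucket before the second URL will work.

```python
def path_style_url(bucket: str, region: str) -> str:
    """Regular bucket URL in the path-style format."""
    return f"https://s3.{region}.amazonaws.com/{bucket}"


def accelerate_endpoint(bucket: str) -> str:
    """Transfer Acceleration endpoint for a bucket."""
    # Bucket names containing periods cannot be used with acceleration.
    if "." in bucket:
        raise ValueError("bucket names with periods do not support acceleration")
    return f"https://{bucket}.s3-accelerate.amazonaws.com"


if __name__ == "__main__":
    print(path_style_url("bucketname", "eu-west-1"))
    print(accelerate_endpoint("bucketname"))
```

Note that the accelerated endpoint carries no region: the request is routed to whichever Edge location is closest to the uploader.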
Amazon CloudFront
While Amazon CloudFront is not strictly part of S3, it is the CloudFront network that is used by S3 Transfer Acceleration, so it’s useful to understand a little more about CloudFront. CloudFront is a Content Delivery Network (CDN) which uses AWS Edge Locations to speed up the delivery of content. Content is cached at the Edge location for the ‘Time to Live’ which is configurable by the user. Edge locations can be written to as well as read from, hence the use case with Transfer Acceleration. The content expires after the TTL and will be pulled down from the source location the next time a request is made via that Edge location.
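The Time to Live behaviour can be illustrated with a toy cache. This is a simplified model of a single Edge location, not how CloudFront is implemented: the `EdgeCache` class, its `origin` callable and the injectable clock are all inventions for this sketch.

```python
import time


class EdgeCache:
    """Toy model of one Edge location caching content for a fixed TTL."""

    def __init__(self, origin, ttl_seconds, clock=time.monotonic):
        self._origin = origin      # callable: key -> content at the source
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be simulated
        self._entries = {}         # key -> (content, expires_at)
        self.origin_fetches = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]        # cache hit: served from the Edge
        # Miss or expired: pull from the source and cache it again.
        content = self._origin(key)
        self.origin_fetches += 1
        self._entries[key] = (content, now + self._ttl)
        return content


if __name__ == "__main__":
    fake_now = [0.0]
    cache = EdgeCache(lambda key: f"content of {key}", ttl_seconds=60,
                      clock=lambda: fake_now[0])
    cache.get("index.html")
    cache.get("index.html")        # within the TTL, no origin fetch
    fake_now[0] = 61.0
    cache.get("index.html")        # TTL expired, fetched again
    print("origin fetches:", cache.origin_fetches)
```

A longer TTL means fewer trips to the source but a longer wait before updated content reaches users.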
CDNs use ‘Distributions’ – a distribution is in effect a policy which states which content will be cached in which Edge locations, and for how long. With CloudFront there are 2 types of distributions:
- Web Distribution – normally used for websites
- RTMP – leveraging the Adobe RTMP protocol for Media Streaming
AWS Snowball
AWS Snowball again is not strictly part of S3, but it is a useful way to import data to and export data from AWS S3 if the volume of data that you are looking to migrate will take too long over the network connection you have available.
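A rough calculation shows when the network becomes the bottleneck. The sketch below estimates how long a transfer would take over a given link; the 80% utilisation default is an assumption, decimal units are used (1TB = 10^12 bytes), and the function name is made up for this example.

```python
def transfer_days(data_tb: float, link_mbps: float,
                  utilisation: float = 0.8) -> float:
    """Estimated days to move `data_tb` terabytes over a `link_mbps` link.

    Assumes decimal units and that only `utilisation` of the link's
    nominal bandwidth is actually available for the transfer.
    """
    bits = data_tb * 10**12 * 8
    seconds = bits / (link_mbps * 10**6 * utilisation)
    return seconds / 86_400


if __name__ == "__main__":
    for mbps in (100, 1_000, 10_000):
        print(f"80TB over {mbps:>6,} Mbps: {transfer_days(80, mbps):6.1f} days")
```

On these assumptions, 80TB over a 100 Mbps link takes roughly three months, which is the kind of situation where shipping a device is the faster option.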
AWS Snowball is a ruggedised hardware device available in 50TB or 80TB configurations. Data on a Snowball is protected by 256-bit encryption, and the device has an industry-standard TPM (Trusted Platform Module) to add further security.
The Snowball device can be ordered via the AWS console. It is shipped to the customer site, and once data has been uploaded, returned to AWS for the data to be imported into S3. AWS will then securely wipe the device to ensure that there is no way that the customer data can be restored by other customers using the device.
AWS also offer the Snowball Edge device, which has a capacity of 100TB and some compute capability. This device can be used in remote locations to run some compute functions (for example Lambda functions) on the data stored on the device.
Finally, for exabyte-scale data transfers, AWS offers the AWS Snowmobile – a ruggedised shipping container which can be towed to customer premises and can import up to 100PB of data per Snowmobile.
AWS Storage Gateway
AWS Storage Gateway uses a virtual appliance to connect a customer’s on-premises storage systems to AWS S3. The appliance runs as a virtual machine on VMware ESXi or Microsoft Hyper-V.
There are 3 ways that the appliance can be used to connect to Amazon S3:
- File Gateway – this enables the customer to connect to and store files in S3 via NFS or SMB. Once the files are stored in S3, they can be managed as standard S3 Objects.
- Volume Gateway – presents applications with disk volumes backed by S3 using the iSCSI protocol. There are 2 types of volume:
  - Stored Volumes – store primary data locally on the customer’s on-premises storage system, with that data backed up to AWS S3. This gives the customer low latency access to the entire data set while providing a durable offsite backup.
  - Cached Volumes – store only the most frequently used data locally on the customer premises, with the full data set held in S3. This minimises the need to scale on-premises storage while still providing low latency access to the most used data.
- Tape Gateway – provides a virtual tape library interface, supported by NetBackup, Backup Exec, Veeam & other mainstream tape backup vendors.
So there you have it – Logicata’s Super Slick Summary of Amazon S3 Simple Storage Service!