Amazon S3
Amazon S3 is an online storage web service offered by Amazon Web Services. Amazon S3 provides storage through web services interfaces (REST, SOAP, and BitTorrent). Amazon launched S3, its first publicly available web service, in the United States in March 2006 and in Europe in November 2007.
At its inception, Amazon charged end users US$0.15 per gigabyte-month, with additional charges for bandwidth used in sending and receiving data, and a per-request (get or put) charge. As of November 1, 2008, pricing moved to tiers where end users storing more than 50 terabytes receive discounted pricing. Amazon claims that S3 uses the same scalable storage infrastructure that Amazon.com uses to run its own global e-commerce network.
Amazon S3 is reported to store more than 449 billion objects. This is up from 102 billion objects, 64 billion objects in August 2009, 52 billion in March 2009, 29 billion in October 2008, 14 billion in January 2008, and 10 billion in October 2007. S3 uses include web hosting, image hosting, and storage for backup systems. S3 comes with a 99.9% monthly uptime guarantee, which equates to approximately 43 minutes of downtime per month.
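The two figures quoted above (the launch-era storage rate and the 99.9% monthly uptime guarantee) can be checked with simple arithmetic. The following is a minimal sketch, not an official pricing or SLA calculator: the function names are illustrative, a 30-day month is assumed, and bandwidth and per-request charges are left out.

```python
# Launch-era flat rate quoted in the text (US$ per gigabyte-month).
LAUNCH_RATE_USD_PER_GB_MONTH = 0.15


def monthly_storage_cost(gigabytes, rate=LAUNCH_RATE_USD_PER_GB_MONTH):
    """Storage-only cost for one month; bandwidth and request fees are excluded."""
    return gigabytes * rate


def allowed_downtime_minutes(uptime_percent, days_in_month=30):
    """Minutes of downtime permitted in one month at the given uptime level.

    A 30-day month is an assumption made for this sketch.
    """
    minutes_in_month = days_in_month * 24 * 60
    return (1 - uptime_percent / 100) * minutes_in_month


if __name__ == "__main__":
    print(f"100 GB for one month: ${monthly_storage_cost(100):.2f}")
    print(f"99.9% uptime allows {allowed_downtime_minutes(99.9):.1f} minutes of downtime")
```

With a 30-day month, a 99.9% guarantee leaves 0.1% of 43,200 minutes, or about 43.2 minutes, which matches the figure in the text.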
Design
Details of S3's design are not made public by Amazon. According to Amazon, S3's design aims to provide scalability, high availability, and low latency at commodity costs.
S3 is designed to provide 99.999999999% durability and 99.99% availability of objects over a given year.
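To give the durability figure some scale, the sketch below converts it into an expected number of objects lost per year. This is a naive reading of a design target, not a guarantee or a model of how failures actually occur; the function name and the example object count are illustrative assumptions.

```python
from fractions import Fraction

# Design target quoted in the text, kept as an exact fraction to avoid
# floating-point rounding on a number this close to 1.
DURABILITY = Fraction("99.999999999") / 100


def expected_objects_lost_per_year(object_count, durability=DURABILITY):
    """Expected annual losses if each object independently survives the year
    with probability `durability` (a simplifying assumption)."""
    return object_count * (1 - durability)


if __name__ == "__main__":
    loss = expected_objects_lost_per_year(10_000_000)
    print(f"Expected objects lost per year among 10 million: {loss}")
```

Under this reading, storing ten million objects corresponds to an expected loss of 1/10000 of an object per year, or roughly one object every ten thousand years.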
S3 stores arbitrary objects up to 5 terabytes in size, each accompanied by up to 2 kilobytes of metadata. Objects are organized into buckets (each owned by an Amazon Web Services, or AWS, account) and identified within each bucket by a unique, user-assigned key. Amazon Machine Images (AMIs) that are modified in the Elastic Compute Cloud (EC2) can be exported to S3 as bundles.
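The bucket, key, and object relationship described above can be pictured as a small in-memory model. This is only an illustrative sketch: the class and method names are hypothetical and are not part of any AWS API, the size limits are read as binary units, and the metadata size is taken as the combined UTF-8 length of names and values, all of which are assumptions.

```python
from dataclasses import dataclass, field

MAX_OBJECT_BYTES = 5 * 1024**4   # assumption: "5 terabytes" read in binary units
MAX_METADATA_BYTES = 2 * 1024    # assumption: "2 kilobytes" read as 2048 bytes


@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)


@dataclass
class Bucket:
    name: str
    owner_account: str
    objects: dict = field(default_factory=dict)

    def put(self, key, data, metadata=None):
        """Store `data` under `key`, enforcing the size limits from the text."""
        metadata = dict(metadata or {})
        metadata_size = sum(
            len(k.encode("utf-8")) + len(v.encode("utf-8"))
            for k, v in metadata.items()
        )
        if len(data) > MAX_OBJECT_BYTES:
            raise ValueError("object exceeds the 5 TB limit")
        if metadata_size > MAX_METADATA_BYTES:
            raise ValueError("metadata exceeds the 2 KB limit")
        # Keys are unique within a bucket, so a repeated key overwrites.
        self.objects[key] = StoredObject(data, metadata)

    def get(self, key):
        return self.objects[key]

    def list_keys(self):
        return sorted(self.objects)
```

For example, `Bucket("photos", "account-1").put("2011/cat.jpg", b"...", {"camera": "x100"})` stores one object whose key is unique within that bucket only; another bucket may reuse the same key.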
Buckets and objects can be created, listed, and retrieved using either a REST-style HTTP interface or a SOAP interface. Additionally, objects can be downloaded using the HTTP GET interface and the BitTorrent protocol.
Requests are authorized using an access control list associated with each bucket and object.
Bucket names and keys are chosen so that objects are addressable using HTTP URLs:
- http://s3.amazonaws.com/bucket/key
- http://bucket.s3.amazonaws.com/key
- http://bucket/key (where bucket is a DNS CNAME record pointing to bucket.s3.amazonaws.com)
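The three addressing forms listed above can be generated mechanically from a bucket name and a key. The following is a minimal sketch; the helper names are illustrative, and percent-encoding the key with `urllib.parse.quote` is an assumption about how keys containing spaces or other special characters would be written in a URL.

```python
from urllib.parse import quote

S3_ENDPOINT = "s3.amazonaws.com"


def path_style_url(bucket, key):
    """http://s3.amazonaws.com/bucket/key"""
    return f"http://{S3_ENDPOINT}/{bucket}/{quote(key)}"


def virtual_hosted_url(bucket, key):
    """http://bucket.s3.amazonaws.com/key"""
    return f"http://{bucket}.{S3_ENDPOINT}/{quote(key)}"


def cname_url(bucket, key):
    """http://bucket/key, valid only when the bucket name is itself a DNS name
    with a CNAME record pointing to bucket.s3.amazonaws.com."""
    return f"http://{bucket}/{quote(key)}"


if __name__ == "__main__":
    for build in (path_style_url, virtual_hosted_url, cname_url):
        print(build("images.example.com", "photos/my cat.jpg"))
```

The third form is why bucket names intended for CNAME hosting have to be valid DNS names: the bucket name is the host name the client connects to.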
Because objects are accessible by unmodified HTTP clients, S3 can be used to replace significant existing (static) web hosting infrastructure. The AWS authentication mechanism allows the bucket owner to create an authenticated URL with time-bounded validity. That is, the owner can construct a URL that can be handed off to a third party for access for a limited period, such as the next thirty minutes or the next twenty-four hours.
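A time-bounded URL of this kind can be sketched as follows. This example follows the legacy query-string authentication scheme (often called Signature Version 2), in which an HMAC-SHA1 signature over the request description and an expiry timestamp is appended to the URL. It is an illustration under assumptions, not a reference implementation: the function name and the credentials are placeholders, newer AWS signing schemes differ, and an official SDK should be preferred in practice.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote, urlencode


def presign_get_url(bucket, key, access_key_id, secret_key, expires_at):
    """Build a GET URL that stops working after `expires_at` (Unix seconds).

    Assumes the legacy query-string authentication layout:
    StringToSign = "GET\\n\\n\\n<Expires>\\n/<bucket>/<key>"
    """
    resource = f"/{bucket}/{quote(key)}"
    string_to_sign = f"GET\n\n\n{expires_at}\n{resource}"
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    signature = base64.b64encode(digest).decode("ascii")
    query = urlencode(
        {
            "AWSAccessKeyId": access_key_id,
            "Expires": str(expires_at),
            "Signature": signature,  # urlencode percent-escapes '+', '/', '='
        }
    )
    return f"http://{bucket}.s3.amazonaws.com/{quote(key)}?{query}"
```

To grant access for the next thirty minutes, the caller would pass `int(time.time()) + 30 * 60` as `expires_at`. Anyone holding the resulting URL can fetch the object until that moment, without needing AWS credentials of their own.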
Every item in a bucket can also be served up as a BitTorrent feed. The S3 store can act as a seed host for a torrent, and any BitTorrent client can retrieve the file. This drastically reduces the bandwidth costs for the download of popular objects. However, AWS does not provide native bandwidth limiting, so users have no access to automated cost control. This can lead users of the S3 free tier, as well as small hobby users, to amass dramatic bills. AWS representatives have stated that such a feature was on the design table from 2006 to 2010, but have more recently stated that it is no longer in development.
A bucket can be configured to save HTTP log information to a sibling bucket; this can be used in later data mining operations. This feature is currently still in beta.
Hosting entire websites
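As of February 18, 2011, Amazon S3 provides options to host static websites with index document support and error document support. This support was added as a result of user requests dating at least to 2006. For example, suppose that Amazon S3 was configured with CNAME records to host http://subdomain.example.com/. In the past, a visitor to this URL would find only an XML-formatted list of objects instead of a general landing page (e.g., index.html) to accommodate casual visitors. Now, however, websites hosted on S3 may designate a default page to display, and another page to display in the event of a partially invalid URL. However, the CNAME specification only allows a subdomain to be hosted this way, not a second-level domain. That is, subdomain.example.com can be hosted, but not example.com. One may use an ANAME record pointing to the S3 server, but this method is not documented by Amazon.
Notable uses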
Photo hosting service SmugMug has used S3 since April 2006. They experienced a number of initial outages and slowdowns, but after one year they described it as being "considerably more reliable than our own internal storage" and claimed to have saved almost $1 million in storage costs.
There is a Filesystem in Userspace (FUSE) implementation for Unix-like operating systems (Linux, etc.) that lets EC2-hosted Xen images mount an S3 bucket as a file system. Because the semantics of S3 are not those of a POSIX file system, the mounted file system may not behave entirely as expected.
Apache Hadoop file systems can be hosted on S3, because S3 meets Hadoop's requirements of a file system. As a result, Hadoop can be used to run MapReduce algorithms on EC2 servers, reading data from S3 and writing results back to it.
Least Authority Enterprises, Dropbox, Zmanda, and Ubuntu One are some of the many online backup and synchronization services that use S3 as their storage and transfer facility.
Minecraft hosts game updates and player skins on the S3 servers.
Tumblr, Formspring, and Posterous images are hosted on the S3 servers.
External links
- S3 tools (open-source tools for accessing S3)
- S3fm console (free Ajax-based web interface for Amazon S3)
- DragonDisk (free cross-platform client for Amazon S3)
- S3 Browser (freeware Windows client for Amazon S3)
- xtbackup (open-source backup tool focused on backing up data to Amazon S3)
- CloudBerry Explorer (freeware Windows client for Amazon S3)
- AnyClient (free cross-platform client for Amazon S3 with support for FTP/S and SFTP)