
Amazon's S3 for ASP.NET Web Developers

Extending ASP.NET applications with Amazon's scalable storage service

A little over a year ago, I wanted to facilitate free video downloads from my site without incurring excessive hosting costs. I decided to use Amazon's Simple Storage Service (S3). Overall, the experience has been a resounding success. In this article, I'll discuss some of the pros and cons of using S3 to augment ASP.NET websites.

Amazon S3 at a Glance

As paraphrased from Amazon's site, S3 is an Internet storage engine designed to make web scalability easier for developers. Through a set of simple APIs, developers can upload, manage, and control access to data stored on Amazon's servers. It's a highly scalable, reliable, fast, and inexpensive storage infrastructure. S3 helps to commoditize bandwidth and level the playing field for small businesses by allowing companies to scale up content delivery without incurring expensive hosting overhead.

S3 lets users store an unlimited number of objects, each of which can be up to 5GB. Rather than monthly fees, users pay for what they consume. In the U.S., Amazon charges monthly for total data uploaded ($.10 per GB), stored ($.15 per GB), and served ($.17 per GB). There are also fees for API calls and requests ($.01 per 10,000 calls/requests). Discounts apply for users serving, storing, or uploading more than 40-50 TB per month.

Amazon has storage servers based in the U.S. and in Europe, where pricing is slightly higher. Amazon can also deliver content stored on S3 servers through its edge servers, which add locations such as Japan and Hong Kong. This edge delivery is done through integration with Amazon CloudFront, the company's Content Delivery Network (CDN). To use Amazon's edge servers, you just specify which content to serve and from where, and pay an additional 12 to 50 percent in per-GB costs, depending on the hosting location.

Interacting with Amazon S3

Objects or resources on S3 servers are stored in user-defined buckets, where each resource is stored with a unique key. Unlike traditional storage, where folders partition the content, Amazon's approach lets the service behave like a giant List in the sky, where each resource is a file or other collection of zeros and ones. However, because the folder metaphor is so common in storage, developers and third-party applications can use S3 keys containing slashes to transparently mimic folder semantics as needed.

Service developers can use SOAP or REST-style HTTP interfaces to create, modify, upload, manage, and list buckets and their contents. More importantly, files can also be requested with plain HTTP GET requests, meaning that either of the URIs shown in Figure 1 would work directly in a browser.

Figure 1: Example URLs
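For illustration, assuming a hypothetical bucket named media.contoso.com and a key of videos/episode-01.wmv, S3 accepts GET requests in both its virtual-hosted style and its path style:

http://media.contoso.com.s3.amazonaws.com/videos/episode-01.wmv
http://s3.amazonaws.com/media.contoso.com/videos/episode-01.wmv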

Because S3 was designed as a web service, it doesn't come with built-in admin tools, dashboards, GUIs, or other management tools. Instead, developers interact with S3 using only API calls. Those APIs are robust enough to cover any actions and options you need, not just afterthought levers and knobs that fall short of the mark. And because the APIs are so good, there are many ways to interact with S3's services, some of them very creative.

Many third-party products use S3 to power remote backup and storage solutions. .NET developers can easily use S3's functions with a great set of libraries that's available at www.codeplex.com/ThreeSharp. In this article, however, I'll ignore most of the cooler aspects of interacting with those APIs and will focus on the ins and outs of using S3 as a content delivery solution that you can integrate with ASP.NET sites.

Getting Started with Amazon S3

Setting up an S3 account is free. The account provides you with an Access Key (or ID) and a Secret Access Key that's used to sign secured requests when interacting with the API. Once you have an account, you can start issuing API calls to create buckets and upload files. For simple content-management purposes, however, it's easier to grab a third-party application that lets you interact with your S3 account much as you would an FTP site. Personally, I love CloudBerry Explorer for Amazon S3, shown in Figure 2. It's full-featured and updated regularly, and the free version is amazingly capable out of the box.

Figure 2: CloudBerry Explorer

Accessing uploaded content from your buckets with a browser is simple. Just concatenate your unique bucket name, the host name for Amazon's S3 servers, and the key for the file you want to pull down:

http://<bucket-name>.s3.amazonaws.com/<object-key>

You can also CNAME part of your site over to Amazon to make buckets part of your domain. Create a bucket whose name matches the full name of your host, then CNAME that host to Amazon by prepending the bucket name to Amazon's host name, as outlined in Figure 3.

Figure 3: Creating a downloads alias
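As a sketch (the bucket and file names here are made up), a bucket named downloads.contoso.com would be paired with a DNS entry along these lines:

downloads.contoso.com.    IN    CNAME    downloads.contoso.com.s3.amazonaws.com.

With that alias in place, a request for http://downloads.contoso.com/ebooks/sample.pdf resolves to Amazon's servers and pulls down the object stored under the key ebooks/sample.pdf.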

Content Disposition

Serving simple web content such as images, CSS, and JavaScript from S3 helps decrease page load times by increasing the number of hosts from which browsers can pull content (thereby working around the RFC 2616 guideline that limits browsers to two concurrent connections per host). But the big benefits of S3 show up with larger files and resources. For example, I use S3 to host download versions of my videos, each of which is roughly 20-30MB. Because Amazon's S3 servers are designed to serve content as fast as clients can pull it down, visitors to my site aren't forced to wait while these files trickle down, yet I'm still paying just pennies for total bandwidth.

More importantly, S3 allows content owners to define additional HTTP headers for their stored files. On my site, I let users watch videos in the browser or download them for later use. To get browsers to download WMV files instead of playing them inline, the files have to be treated as attachments. S3 makes that easy by letting you attach metadata, including optional HTTP headers, to each uploaded resource. I built a simple web form application (using the ThreeSharp library) that configures these headers each time I upload the various formats of a new video, but, as you can see in Figure 2, you can also use CloudBerry Explorer to control content disposition and other HTTP headers.
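For example, with a hypothetical file name, the metadata stored alongside a downloadable WMV just needs a Content-Disposition header so that browsers save the file instead of trying to play it inline:

Content-Type: video/x-ms-wmv
Content-Disposition: attachment; filename="episode-01.wmv"

S3 echoes stored headers like these back on every GET, so the browser's download prompt shows the friendly file name regardless of the object's key.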

Hosting Rich Media with S3

Serving rich media, such as Flash or Silverlight content, from Amazon S3 is no different from serving simple images, CSS, and other resources from your own sites. Just upload the content to Amazon's servers and link to that content from your pages. Browsers will pull up dynamically generated pages on your ASP.NET site, start the necessary players and controls, and pull the content from Amazon's servers.

However, because both Flash and Silverlight are designed to protect end users from some forms of cross-site scripting attacks (and to protect content owners from leeching), you must configure your S3 buckets to allow cross-domain requests. You can easily address this by uploading to the root of your S3 bucket a crossdomain.xml or clientaccesspolicy.xml file that defines which hostnames may access your S3 content. For example, if you've embedded a Silverlight player on your main site at www.contoso.com, you need to place a clientaccesspolicy.xml file at the root of your media.contoso.com S3 bucket that permits requests from pages generated on www.contoso.com. Of course, you must be using an aliased bucket in this case, because Amazon won't let you drop policy files at the root of its own s3.amazonaws.com host.
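A minimal clientaccesspolicy.xml along those lines (using the www.contoso.com host name from the example above) might look like this:

<?xml version="1.0" encoding="utf-8"?>
<access-policy>
  <cross-domain-access>
    <policy>
      <allow-from http-request-headers="*">
        <domain uri="http://www.contoso.com" />
      </allow-from>
      <grant-to>
        <resource path="/" include-subpaths="true" />
      </grant-to>
    </policy>
  </cross-domain-access>
</access-policy>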

In some cases, you'll need to restrict access to your content, such as when you want to prevent deadbeats from hotlinking your files or want users to register on your site before downloading a free eBook. At the most stringent level, S3 can use Amazon Web Services (AWS) credentials to define full-blown ACLs, which provide granular control over access to resources. The downsides to this approach are that it requires end users to have their own AWS accounts and it can become tedious to manage.

S3 provides another option that makes content protection both effective and easy to manage: query string authentication. With this approach, you simply append an absolute expiration time (in Unix time) to the query string for each resource, along with signed proof that you, the content or bucket owner, defined that expiration time. Setting an expiration time is fairly straightforward (even if S3's documentation isn't stellar in this regard), and Figure 4 outlines a sample class that you can use to sign your own requests. The only real gotcha in building this solution (other than converting to Unix time) was getting the exact formatting of the signature right; strangely, Amazon doesn't show that in its documentation.

Using this sample class (or a similar one of your own making), it's easy to define expiration times (which can be ratcheted down to just a few seconds for most content) within both ASP.NET web forms and ASP.NET MVC applications.
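For reference, here's a minimal sketch of that kind of signing helper, assuming S3's classic query string authentication scheme (an HMAC-SHA1 signature over the HTTP verb, expiration, and resource path); the class and parameter names are mine and won't necessarily match the Figure 4 listing:

using System;
using System.Security.Cryptography;
using System.Text;
using System.Web;

public static class S3UrlSigner
{
    // Builds a time-limited, signed GET URL for a private S3 object
    // using S3's query string authentication.
    public static string SignedUrl(string bucket, string key,
        string accessKeyId, string secretKey, int secondsValid)
    {
        // S3 expects the expiration as seconds since the Unix epoch (UTC).
        DateTime epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
        long expires = (long)(DateTime.UtcNow.AddSeconds(secondsValid) - epoch).TotalSeconds;

        // Canonical string: verb, (empty) Content-MD5 and Content-Type,
        // expiration, then the canonicalized resource.
        string stringToSign = "GET\n\n\n" + expires + "\n/" + bucket + "/" + key;

        string signature;
        using (HMACSHA1 hmac = new HMACSHA1(Encoding.UTF8.GetBytes(secretKey)))
        {
            signature = Convert.ToBase64String(
                hmac.ComputeHash(Encoding.UTF8.GetBytes(stringToSign)));
        }

        return "http://" + bucket + ".s3.amazonaws.com/" + key
            + "?AWSAccessKeyId=" + accessKeyId
            + "&Expires=" + expires
            + "&Signature=" + HttpUtility.UrlEncode(signature);
    }
}

Calling S3UrlSigner.SignedUrl("media.contoso.com", "videos/episode-01.wmv", accessKeyId, secretKey, 30) would yield a URL that works for roughly 30 seconds and returns a 403 once the expiration passes.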

The Hack From Hell

If you're serving Flash content that's access-restricted with query string authentication, you may run into a nasty problem that stems from how some Flash players treat URLs. Specifically, even though the + (plus) character is valid within URLs and can even be URL encoded as %2B, there's a long-standing tradition of treating + as a space within URLs. Consequently, some Flash players truncate URLs when they encounter this "space," whether it's URL encoded or not.

The problem is that because query string authentication uses a base64-encoded signature of the request, the + character will occasionally show up in those signatures. The end result is that if you generate a signed request for Flash content whose signature contains a +, some Flash players won't be able to load the content: the truncated signature results in an HTTP 403 error with details similar to those displayed in Figure 5.

Figure 5: An example error

After trying numerous workarounds (and encoding until I was blue in the face), I came across the mother of all hacks on a message board (bit.ly/1cQhzS) for users tackling this exact problem. Because the expiration time feeds into the signature, waiting a second or two will typically produce a new signature without a + in it. The hack, then, is simply to check signatures and loop until you get a signed URL without a + in the signature.

That solution feels entirely too dirty to put into production. Instead, I've found that looping while expanding the expiration window by a few seconds per iteration is a great way around the same problem. It still involves a loop, but it typically takes only two to five iterations before a clean signature pops out. The CLR churns through operations like this quickly, so the workaround performs well enough, even if it still leaves you feeling a little dirty.
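As a rough sketch of that loop (building on the hypothetical S3UrlSigner helper shown earlier, so treat the names as illustrative rather than my exact production code):

// Widens the expiration window a few seconds per pass until the base64
// signature no longer contains a '+' (which UrlEncode turns into %2b).
public static string FlashSafeUrl(string bucket, string key,
    string accessKeyId, string secretKey, int secondsValid)
{
    string url = null;

    // In practice, two to five passes are usually enough.
    for (int extra = 0; extra <= 60; extra += 3)
    {
        url = S3UrlSigner.SignedUrl(bucket, key, accessKeyId,
            secretKey, secondsValid + extra);

        if (url.IndexOf("%2b", StringComparison.OrdinalIgnoreCase) < 0)
            return url;   // signature is free of '+', safe for picky Flash players
    }

    return url;   // give up and hand back the last attempt
}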

Downsides and Weaknesses

Despite S3's numerous strengths and benefits, there are a few weaknesses. Documentation is a bit sparse in some areas, and hasn't been updated in three years. Tech support is slow to respond unless you pay for premium support offerings. However, because there's a ton of community-driven content available (with some searching), both of these weaknesses typically only amount to a nuisance when working on integration tasks.

Because S3 is a cloud-based service, it's subject to outages. These have been rare, and recent outages experienced by premier hosting companies highlight the fact that no web service or hosting solution is immune to the occasional hiccup. This is a relatively minor problem; where S3 still needs some serious help is error messages. For developers and applications relying on the service's APIs, S3's error messages are great. In fact, they're quite verbose (see Figure 5). The problem is that verbose, developer-oriented error messages don't do end users much good when they hit an S3 server for content and run into an error that can't be skinned, translated, or modified in any way.

Personally, I'd love to see Amazon enable a convention where a specifically named (or keyed) .xslt file within a bucket would be used to skin error messages. This way, it would be easy to address the look and feel of these error messages and use XSLT to translate the error messages into more user-friendly errors, facilitating a better overall end-user experience.

S3 Benefits for ASP.NET Sites

Aside from its poor support for end-user-facing error handling, S3 is a great solution. It's very easy to integrate into ASP.NET sites as an additional hosting source for static content, larger downloads, and rich media. It provides excellent download speeds (effectively limited only by the client's downstream bandwidth) at a fraction of the bandwidth costs you'd expect from high-end hosting services or CDNs. S3's rich, functional APIs give developers a huge amount of flexibility in addressing real-life business needs. You can even configure logging against buckets, letting comprehensive analytics packages show you who's accessing your content.

All in all, if you're serving rich media or large downloads, or if you just want to address increasing page load times, you should really check out Amazon S3. It's very flexible, very affordable, and packs a lot of benefits for ASP.NET sites with very little effort.

Michael K. Campbell ([email protected]) is an ASPInsider and consultant with years of SQL Server and development experience. He spends most of his time engaged in consulting, ASP.NET MVC development, and creating free online SQL Server Videos.
