Hosting Downloads on Amazon S3 with CloudFront


Since early 2019, I have been hosting downloads for my app The Archive on Amazon's servers. The S3 bucket provides cheap-enough storage for the zip files, and the CloudFront cache is a content distribution network across the globe that improves download speeds.

Here's a long tutorial, because I will most likely forget how I did all this in a while, and chances are you don't know how to do this either.

  • Don't be afraid. There are a lot of things you may not know, but they are not complicated. I only cover the bare bones.
  • It's quite fun, actually. When I had the setup running, I felt like a cloud computing wizard. With this, you could set up websites with fast response times around the globe and scale easily!

Backstory and End Result

A couple of years ago, I dabbled with Amazon S3 storage to host downloads of my macOS apps.

One lesson I learned: do not point the app's update feed setting to your S3 bucket. That's too fragile. Use your own domain and redirect if needed. This way you retain 100% control of where the traffic goes. If you point to S3 directly, you give away part of your control. And when Apple enforces stricter TLS requirements for macOS app updates, you end up with users who cannot download updates from a server you cannot control. Dang.

But hosting website downloads on AWS S3 still works fine. Add CloudFront on top of it, and your downloads will be served from the CloudFront server closest to the person downloading. It's a web cache of sorts, useful for content distribution. Using CloudFront actually reduces traffic cost, so that's what we'll be using.

This post describes how I configured an Amazon S3 bucket with the modern non-public settings and then set up CloudFront to actually publish the contents. In this case, I publish an index.html file to prevent directory listings. The rest are .zip files for downloads.

This is the result, using a .htaccess redirect of the download endpoint on my website:

$ curl -I https://zettelkasten.de/the-archive/download
HTTP/1.1 303 See Other
...
Location: http://dlyfuw95744jo.cloudfront.net/TheArchive-v1.1.11.dmg.zip
...
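Instead of pasting the URL into a browser over and over, the same check as the curl -I call above can be scripted. Here's a minimal Python sketch using only the standard library; the function name check_redirect is made up for this example. It sends a HEAD request and, because http.client never follows redirects, hands back the 303 status and the Location header as-is.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def check_redirect(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send a HEAD request to url and return (status code, Location header)."""
    parts = urlsplit(url)
    connection_class = (
        http.client.HTTPSConnection if parts.scheme == "https"
        else http.client.HTTPConnection
    )
    # netloc may include a port, e.g. "example.com:8080"; http.client parses it
    connection = connection_class(parts.netloc, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        # http.client does not follow redirects, so a 303 comes back unchanged
        connection.request("HEAD", target)
        response = connection.getresponse()
        return response.status, response.getheader("Location")
    finally:
        connection.close()
```

Calling check_redirect("https://zettelkasten.de/the-archive/download") returns the status code and whatever Location the website currently sends, so you can spot a broken redirect without opening a browser.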

I'm not an expert in cloud computing, cloud hosting, or cloud anything, really. Amazon S3 was the fanciest thing I had used. I once read that you could host static websites in S3 buckets, but it didn't sound cost-effective, so I never looked into it any further. But with the knowledge from this post, you will be able to host a static website on S3, with great response times thanks to CloudFront, if you want.

Prerequisites

You will need an AWS account. Then click around in the console to familiarize yourself with the menu. You will need to switch between S3 and CloudFront, so try to locate the items in the huge services list, or use the search.

We will be creating a new bucket to host files from. If you don't know anything about any of this and want to play around with S3 first, follow the instructions from the manual:

  1. Sign up for Amazon S3
  2. Create a Bucket
  3. Follow the other steps for uploading and looking at files. Once you have a bucket, you can play around with the bucket "file" browser and upload stuff, click on the file, and view the absolute URL for reference.

Chances are you won't be able to download files or view HTML pages from S3 buckets because public read access is not enabled. And granting public read access is discouraged nowadays, as we'll see shortly.

Amazon Access Rights

The access right levels for S3 buckets (the top-level "directories") are:

  • "List objects",
  • "Write objects",
  • "Read object permissions",
  • "Write object permissions".

For files, read access is called "Read object".

You will not want to give anyone access to file or directory permission settings. The "List objects" and "Read object" settings are the most interesting ones.
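When you later read the S3 API docs, these console labels show up under different names. Here's a small Python lookup table as a sketch (the names CONSOLE_TO_API_PERMISSION, api_permission and touches_permissions are made up) mapping the labels above to the permission names S3's ACL grants use. Note that both "List objects" on a bucket and "Read object" on a file map to READ; what it means depends on whether the grant sits on the bucket or on the object.

```python
# Console labels from the S3 access control list screens, mapped to the
# permission names used in S3 ACL grants.
CONSOLE_TO_API_PERMISSION = {
    # Bucket ("directory") level
    "List objects": "READ",
    "Write objects": "WRITE",
    "Read object permissions": "READ_ACP",   # read the ACL itself
    "Write object permissions": "WRITE_ACP",  # change the ACL itself
    # Object (file) level
    "Read object": "READ",
}


def api_permission(console_label: str) -> str:
    """Return the ACL permission name for a console label."""
    return CONSOLE_TO_API_PERMISSION[console_label]


def touches_permissions(console_label: str) -> bool:
    """True for grants that let someone read or change the ACL itself,
    i.e. the ones you will not want to hand out to anyone."""
    return api_permission(console_label).endswith("_ACP")
```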

Naively, I was granting read access ("List objects" and "Read object") to "Everyone" when I began hosting downloads on Amazon S3. But it turns out that serving lots of traffic as direct downloads from S3 buckets is more costly than serving it through the CloudFront distribution service. So it pays off to enable the CloudFront CDN to cache files in multiple data centers around the world, backed by a single S3 bucket as the repository.

CloudFront being a cache means changes to the repository will not be visible immediately. Keep that in mind when you experiment with settings. In these cases, you will need to invalidate the CloudFront cache to force re-caching. We'll get to that later.

Public read access is considered bad practice because companies have apparently leaked private data through publicly readable buckets. CloudFront is supposed to provide another layer around this and safeguard against putting private files into public folders.

You will want to keep all access levels enabled for your own AWS account, of course.

So here's how to use Amazon S3 to host files (or a static website) and offer download links using the CloudFront content distribution network.

Set up File Storage

First, you need an Amazon S3 Bucket to upload files.

Create a New Bucket

Navigate to S3 in the AWS Console: https://console.aws.amazon.com/s3/

There, select "Create Bucket" and enter a readable ID for the bucket name. Its domain will be something like http://BUCKETNAME.s3.amazonaws.com/, so you may want to keep it recognizable. During the permissions setup step, keep the public access blocking enabled. This will make AWS complain when you try to change a file to be publicly visible.
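S3 is picky about bucket names, since they become part of that domain. Here's a minimal Python sketch that checks a subset of the naming rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit; no adjacent dots; not shaped like an IP address) and builds the bucket URL in the form above. The function names and the example bucket name are hypothetical, and AWS's full rule set has a few more restrictions than this check covers.

```python
import re

# Subset of the S3 bucket naming rules: 3-63 characters from a-z, 0-9, "." and "-",
# starting and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IP_ADDRESS = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if name passes the simplified bucket naming checks."""
    if not _BUCKET_NAME.fullmatch(name):
        return False
    # Adjacent dots and IP-address-like names are not allowed either
    if ".." in name or _IP_ADDRESS.fullmatch(name):
        return False
    return True


def bucket_url(name: str) -> str:
    """Build the bucket's S3 domain URL, rejecting invalid names."""
    if not is_valid_bucket_name(name):
        raise ValueError(f"not a valid S3 bucket name: {name!r}")
    return f"http://{name}.s3.amazonaws.com/"
```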

It's supposed to be for your protection, and once CloudFront is set up, we won't need direct S3 access anyway, so stick with this.

Change an Existing Bucket for New Access Management

I had an existing bucket, so here's what I did.

The admin URL for your bucket is https://s3.console.aws.amazon.com/s3/buckets/BUCKETNAME/. You may want to keep this open in a separate browser tab.

Go to your S3 bucket's permissions. (Select S3 from the Services menu; then select your bucket from the list; then select the "Permissions" tab.)

Select the "Access Control List" permission setting page. (Yeah, another layer of tabs below tabs. I get lost in the AWS console a lot.) Remove "List objects" access for everybody but the owner.

Select the "Public access settings" permission setting page and edit the settings. Enable all checkboxes to block future uploads from being made public and to retroactively remove public access settings from all existing files.
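If you'd rather script this than click through the checkboxes, the four switches correspond to the four keys of S3's "Block public access" configuration. A minimal sketch, assuming you pass in a boto3 S3 client (boto3.client("s3")); block_public_access is a made-up name:

```python
# The four "Block public access" switches, all turned on.
BLOCK_ALL_PUBLIC_ACCESS = {
    "BlockPublicAcls": True,        # reject requests that set public ACLs
    "IgnorePublicAcls": True,       # ignore public ACLs that already exist
    "BlockPublicPolicy": True,      # reject bucket policies that grant public access
    "RestrictPublicBuckets": True,  # restrict access if a public policy exists anyway
}


def block_public_access(s3_client, bucket: str) -> None:
    """Apply all four public access blocks to the given bucket."""
    s3_client.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration=dict(BLOCK_ALL_PUBLIC_ACCESS),
    )
```

Usage would be block_public_access(boto3.client("s3"), "BUCKETNAME"). The CloudFront bucket policy we add later grants access to one specific principal, not to the public, so it does not conflict with these blocks.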

Now the S3 bucket contents cannot be accessed via direct links anymore. Next, we set up the CloudFront CDN to provide the actual downloads.

Set up CloudFront for Distribution

With the bucket's contents now protected from public access, we can add the CloudFront layer to manage the actual publication of our contents.

Navigate to the CloudFront service in the AWS console: https://console.aws.amazon.com/cloudfront/

This is a two-step process: you will create a CloudFront distribution (resulting in a public URL) and a CloudFront user that will have read access to your files.

I think this is similar to local web server configurations where your site content is generally protected in your user's home folder, but the Apache web server can read and show the data.

To save yourself some manual setup steps, we start with the user setup because then CloudFront does most of the grunt work.

Create a CloudFront "User"

To grant CloudFront read access to your bucket, you have to link the two services. This works by creating a CloudFront user, so to speak.

With CloudFront still open, in the menu to the left select Origin Access Identity (OAI) below the "Security" sub-heading. Create a so-called OAI, which will create one such user for you.

The created OAI consists of a short ID and a long canonical user name hash. You need the latter to grant access to individual files. You need the former for bucket policies later on, though. So keep both handy.
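The OAI can also be created with boto3's CloudFront client. A minimal sketch, assuming a client from boto3.client("cloudfront"); create_oai is a hypothetical helper. It returns both values you need to keep handy: the short Id and the S3CanonicalUserId. CallerReference only has to be unique per request, so a timestamp does the job.

```python
import time
from typing import Tuple


def create_oai(cloudfront_client, comment: str) -> Tuple[str, str]:
    """Create an Origin Access Identity; return (short Id, canonical user ID)."""
    response = cloudfront_client.create_cloud_front_origin_access_identity(
        CloudFrontOriginAccessIdentityConfig={
            # Unique per request so a retried call doesn't create a duplicate
            "CallerReference": f"oai-{time.time_ns()}",
            "Comment": comment,
        }
    )
    identity = response["CloudFrontOriginAccessIdentity"]
    return identity["Id"], identity["S3CanonicalUserId"]
```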

I entered "The Archive downloads" as a comment for the ID to identify it later.

Create a CloudFront Distribution

Select "Create Distribution", then "Get Started" for a Web Distribution.

Origin Domain Name is your S3 bucket. Click inside the field for suggestions.

Origin Path can be left empty when you want to publish the whole bucket. I, however, used a public sub-directory.

Now make sure to choose Restrict Bucket Access (Yes) and then, for Origin Access Identity, pick "Use an Existing Identity". In the drop-down, select the CloudFront user you just created. (If you edit an existing distribution, this setting will be tied to items in your "Origin" tab, not in the general distribution settings.)

For your convenience, also pick "Yes, Update Bucket Policy" for Grant Read Permissions on Bucket. That'll create the policies for you, which is nice.

I left most other settings at their defaults. HTTPS is nice, and I don't need to forward query parameters for downloads, so no need to change any of these.

I did change Default Root Object to index.html so I can add a simple HTML file with a link back to the main website in case someone copies the download link and tries to browse the directory.
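For reference, the origin settings above roughly correspond to this fragment of a CloudFront DistributionConfig as the API expects it. This is a sketch, not a complete configuration: a real create_distribution call needs more keys (CallerReference, Comment, Enabled, DefaultCacheBehavior and others), and the console's "Yes, Update Bucket Policy" option has no counterpart here, because the bucket policy is set separately on the S3 side. The helper name and the "S3-" prefix of the origin Id are assumptions.

```python
def distribution_fragment(bucket: str, oai_id: str, origin_path: str = "") -> dict:
    """Partial DistributionConfig: one S3 origin restricted to an OAI,
    plus index.html as the default root object."""
    path = origin_path.strip("/")
    origin = {
        "Id": f"S3-{bucket}",  # any unique label; cache behaviors refer to it
        "DomainName": f"{bucket}.s3.amazonaws.com",
        # Leading slash, no trailing slash; "" publishes the whole bucket
        "OriginPath": f"/{path}" if path else "",
        "S3OriginConfig": {
            # "Restrict Bucket Access: Yes" with an existing identity
            "OriginAccessIdentity": f"origin-access-identity/cloudfront/{oai_id}",
        },
    }
    return {
        "Origins": {"Quantity": 1, "Items": [origin]},
        "DefaultRootObject": "index.html",
    }
```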

Create the distribution.

You will need to wait a couple of minutes (about 15 in my case) until the distribution is ready and not "In Progress" anymore. Then you can access the S3 bucket files using the new CloudFront URL aka domain name.
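Two small helpers for this step, again as a sketch assuming a boto3 CloudFront client: one blocks on the "distribution_deployed" waiter instead of refreshing the console, and one works out which S3 key a CloudFront URL maps to, given the Origin Path and the Default Root Object. Both function names are made up.

```python
from urllib.parse import unquote, urlsplit


def wait_until_deployed(cloudfront_client, distribution_id: str) -> None:
    """Poll until the distribution is no longer "In Progress" but Deployed."""
    cloudfront_client.get_waiter("distribution_deployed").wait(Id=distribution_id)


def s3_key_for(cloudfront_url: str, origin_path: str = "",
               default_root_object: str = "index.html") -> str:
    """Return the S3 object key a CloudFront URL is served from."""
    path = unquote(urlsplit(cloudfront_url).path).lstrip("/")
    if not path:
        # Only a request for the root itself falls back to the default root object
        path = default_root_object
    prefix = origin_path.strip("/")
    return f"{prefix}/{path}" if prefix else path
```

With an Origin Path of /public, a request for https://dlyfuw95744jo.cloudfront.net/TheArchive-v1.1.11.dmg.zip would be served from the key public/TheArchive-v1.1.11.dmg.zip in the bucket.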

How You Could Manage CloudFront Access Yourself

Per-File Setup

Go to your S3 bucket's permissions. (Select S3 from the Services menu; then select your bucket from the list; then select the "Permissions" tab.)

Select the "Access Control List" permission setting page. (Yeah, another layer of tabs below tabs. I get lost in the AWS console a lot.)

Below the Access for other AWS accounts heading, select "Add Account". Paste the CloudFront canonical user ID you just generated and tick "List Objects". Now CloudFront can list all objects in your bucket.

To let CloudFront read the files themselves, repeat this step in the access control list of every file you want CloudFront to serve, this time granting "Read object" access.
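Repeating this per file is exactly the kind of chore a loop handles. A minimal sketch, assuming a boto3 S3 client and a bucket that still has ACLs enabled; the helper name and the owner ID parameter are hypothetical. A put_object_acl call with grant parameters replaces the object's whole ACL, so the sketch grants the owner full control again alongside CloudFront's read access.

```python
from typing import Iterable


def grant_cloudfront_read(s3_client, bucket: str, keys: Iterable[str],
                          cloudfront_canonical_id: str,
                          owner_canonical_id: str) -> None:
    """Give the CloudFront OAI "Read object" access to each key."""
    for key in keys:
        s3_client.put_object_acl(
            Bucket=bucket,
            Key=key,
            # Grant parameters replace the object's entire ACL, so the
            # owner's full control has to be listed again explicitly.
            GrantFullControl=f"id={owner_canonical_id}",
            GrantRead=f"id={cloudfront_canonical_id}",
        )
```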

For directories whose contents change, like my app updates, I'd rather not rely on a manual process and use bucket policies instead. Think of them as wildcard patterns for paths that access rights apply to.
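Under the hood, bucket policy resources use simple wildcards rather than full regular expressions: in an ARN, * matches any run of characters (including slashes) and ? matches a single character. Here's a small Python approximation of that matching to illustrate which keys a pattern like arn:aws:s3:::BUCKETNAME/* covers. It is a sketch for building intuition, not AWS's actual policy evaluator, and resource_matches is a made-up name.

```python
import re


def resource_matches(pattern: str, arn: str) -> bool:
    """Approximate IAM resource matching: '*' matches any run of characters
    (including '/'), '?' matches exactly one character, the rest is literal."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, arn, flags=re.DOTALL) is not None
```

Note that arn:aws:s3:::BUCKETNAME/* matches every object in the bucket, but not the bucket ARN arn:aws:s3:::BUCKETNAME itself; that is why object-level actions and bucket-level actions need different resources.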

CloudFront Access Rule in an S3 Bucket Policy

Look at your S3 Bucket Policy. (Select S3 from the Services menu; then select your bucket from the list; then select the "Permissions" tab; then select the "Bucket Policy" sub-tab.)

If CloudFront finished its set-up work, you will see an active policy already.

If not, or if you skipped the generation step above, here's what the policy looks like.

You can use the bucket policy generator to some extent. The hardest part for me was to find out how to specify the CloudFront user ID. With help from StackOverflow and the docs, I ended up using the following template. Note that the Id and Sid are human-readable strings with a unique timestamp so I avoid collisions in the future:

{
    "Version": "2012-10-17",
    "Id": "CloudFrontAccess20190224100906",
    "Statement": [
        {
            "Sid": "CloudFrontReadAccess20190224100921",
            "Effect": "Allow",
            "Principal": {
                "CanonicalUser": "CANONICAL_USER_ID"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::BUCKETNAME/*"
        }
    ]
}

Adapt the template to your settings:

  • Replace BUCKETNAME with your bucket name;
  • and replace CANONICAL_USER_ID with the long OAI canonical user ID from earlier.

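The s3:GetObject action grants read access to the objects matched by the Resource pattern, which is all CloudFront needs to serve downloads. Listing the bucket's contents would require the separate s3:ListBucket action on the bucket itself. You can come up with your own Statement ID (Sid) and policy ID (Id) if you want.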
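The template can also be generated, including the timestamped Id and Sid. A minimal Python sketch; cloudfront_read_policy is a made-up name, and unlike the template it uses one timestamp for both IDs. The resulting JSON string is what boto3's put_bucket_policy(Bucket=..., Policy=...) expects, if you prefer uploading it over pasting it into the console.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def cloudfront_read_policy(bucket: str, canonical_user_id: str,
                           now: Optional[datetime] = None) -> str:
    """Return the bucket policy JSON granting the OAI read access to all objects."""
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%d%H%M%S")
    policy = {
        "Version": "2012-10-17",
        "Id": f"CloudFrontAccess{stamp}",
        "Statement": [
            {
                "Sid": f"CloudFrontReadAccess{stamp}",
                "Effect": "Allow",
                "Principal": {"CanonicalUser": canonical_user_id},
                "Action": "s3:GetObject",  # read objects; no listing
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }
    return json.dumps(policy, indent=4)
```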

If you save the policy and refresh your browser, you'll notice that AWS replaced this line:

"CanonicalUser": "CANONICAL_USER_ID"

Instead, it'll read:

"AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity SHORT_CLOUDFRONT_ID"

You can also add additional statements to deny read access to any user except the CloudFront user if you want. Find details in the StackOverflow post. I don't use this.
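To check whether a bucket already has the CloudFront policy (for example, after letting CloudFront update it for you), here's a sketch that looks for an Allow statement with the rewritten AWS principal. It assumes the policy JSON string comes from boto3's get_bucket_policy(Bucket=...)["Policy"]; the function names are hypothetical, and it only compares exact strings, so wildcard actions such as s3:* are not recognized.

```python
import json


def oai_principal_arn(short_id: str) -> str:
    """The principal form AWS stores for an OAI in bucket policies."""
    return f"arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity {short_id}"


def _as_list(value):
    return value if isinstance(value, list) else [value]


def policy_grants_oai_read(policy_json: str, bucket: str, short_id: str) -> bool:
    """True if some Allow statement gives the OAI s3:GetObject on bucket/*."""
    wanted_principal = oai_principal_arn(short_id)
    wanted_resource = f"arn:aws:s3:::{bucket}/*"
    for statement in _as_list(json.loads(policy_json).get("Statement", [])):
        principal = statement.get("Principal", {})
        if statement.get("Effect") != "Allow" or not isinstance(principal, dict):
            continue  # skip Deny statements and "Principal": "*"
        if (wanted_principal in _as_list(principal.get("AWS", []))
                and "s3:GetObject" in _as_list(statement.get("Action", []))
                and wanted_resource in _as_list(statement.get("Resource", []))):
            return True
    return False
```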

How to Refresh CloudFront

If you play around with files and access rights, you may end up with CloudFront serving files that no longer exist, or denying access even though you fixed your permissions.

That's because CloudFront is a cache. Its cache is not refreshed on every request – that'd be pointless. Instead, you need to tell CloudFront to forget its cached contents.

Head to your CloudFront Distribution list, select your distribution, then go to the Invalidations tab.

Select Create Invalidation and enter a wildcard, *, to invalidate all cached files. The new invalidation request will show up in the list as "In Progress". My invalidations finished after about a minute; then you can try to access the public URLs again and see if it works now.
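The same invalidation can be requested through the API. A minimal sketch, assuming a boto3 CloudFront client; invalidate_everything is a made-up name. API paths start with a slash, so this uses /*, the pattern the CloudFront documentation gives for invalidating all files. CallerReference again just needs to be unique per request.

```python
import time


def invalidate_everything(cloudfront_client, distribution_id: str) -> str:
    """Invalidate all cached files of a distribution; return the invalidation Id."""
    response = cloudfront_client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch={
            "Paths": {"Quantity": 1, "Items": ["/*"]},
            "CallerReference": f"invalidate-all-{time.time_ns()}",
        },
    )
    return response["Invalidation"]["Id"]
```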

Conclusion

Cloud services still feel weird to me. I never know when things have updated correctly and whether the services work together properly. I probably pasted the download URLs into my web browser a hundred times to test their availability. But I'm happy to have learned that CloudFront caches my downloads all around the globe: this means faster download times! It is also supposed to reduce cost. I'm looking forward to next month's bill to verify.

The strangest thing about all this is the eventually consistent service setup. You create a service, then another, but things don't work right away. Changes take time: it takes a couple of minutes until the new state you configured is in effect everywhere. It's a flexible model, but a bit awkward to debug at first.

Another bonus of adding CloudFront as a layer around my S3 bucket is support for custom SSL certificates. Apple's download restrictions (aka "App Transport Security"), introduced a couple of years ago, made it impossible to download app updates from S3 buckets directly: the shared certificate didn't meet the requirements Apple imposed. That was when I moved update hosting to my own web server. Once I configure the CloudFront domain's certificates properly, I can switch back to providing downloads via Amazon's servers and save a couple of Euro cents more, yay!

A few years ago, the AWS console was worse. I couldn't do anything besides the most basic tasks. It seems the situation has improved a lot. Or I became smarter. Or less afraid to break things, I don't know. Either way, it was a pretty simple process to set up my own public file download CDN this time.
