TL;DR

Google App Engine is very easy to provision - automatic tls certs, friendly domain name, caching, scaling and robust debugging are among the useful features.

However using GAE to serve a static Jekyll site (admittedly not it’s most important use-case) has issues ranging from annoying to serious.

Before deploying a static site on GAE, make sure the issues outlined below are not blockers for you.

Deploying your Jekyll site on GAE

Jekyll creates a _site/ directory with all of your site’s markdown rendered to .html. Google App Engine takes a app.yaml description of your site and publishes the relevant files.

GAE issue: differentiating files and directories

One nice feature of Jekyll is Permalink style which let’s you link to your pages without annoying suffixes such as .html. Unfortunately GAE doesn’t handle this well.

Most webservers can handle all three of the following scenarios:

Type URL Proper Webserver Serves GAE Serves Makes sense?
directory _site/fun/ _site/fun/index.html _site/fun/index.html yep
directory _site/fun _site/fun/index.html _site/fun.html ???
file _site/fun _site/fun.html _site/fun.html yep

There is a workaround … you have to tell GAE about every directory and file in your site!

#!/bin/bash

# Script to generate the site yaml for GAE to serve your site
# Get all the unique file suffixes in the _site; e.g. ".txt", ".png".
# Output in form "js|html|yml", etc.
suffixes=`find _site -type f -iname \*.* -print | sed 's/.*\.//' | sort | uniq | paste -sd "|" -`

# Create a static file handler based on all the suffixes
printf -- "- url: /(.*\\.(%s))$\n" $suffixes
printf "  static_files: _site/\\\1\n"
printf "  upload: _site/(.*\\.(%s))$\n" $suffixes

Now let’s handle the case where someone forgets to append a / to a directory URL, for example: https://_site/fun when they mean to say https://_site/fun/

#!/bin/bash

# Get all the directories in the _site; e.g. "fun"
# Output in form "/dir1|/dir2", etc.
directories=`find _site -type d -print | sed 's/_site\///g' | sort | uniq | grep -v _site | paste -sd "|" -`

# Create a handler for URLs with a directory that do NOT have a terminal /.
printf "\n"
printf "# Handle any directory URLs that are missing a terminal /\n"
printf "#\n"
printf -- "- url: /(%s)$\n" $directories
printf "  static_files: _site/\\\1/index.html\n"
printf "  upload: _site/(%s)/index.html\n" $directories

Okay, this is awkward and doesn’t scale, but fortunately we have a pretty small site with fewer than 100 directories and 500 files. The generated app.yaml is big, but not impossibly big.

GAE issue: mime type for scripts or yaml?

Most webservers give you a robust pre-populated list of mime types. There are only about a dozen file suffixes in our site, e.g. yaml, txt, html, jpg. Nothing exotic.

Turns out GAE’s webserver doesn’t know what to do with .bash, .sh, or .yaml. Again, not a big deal, but our script is getting bigger:

# Serve up static files based on suffix. See:
#   https://www.iana.org/assignments/media-types/media-types.xhtml
#
# Specify mime-type for yaml files since GAE doesn't handle this correctly.
#
- url: /(.*\.(yaml|yml))$
  static_files: _site/\1
  mime_type: text/x-yaml
  upload: _site/(.*\.(yaml|yml))$

# Specify mime-type for sh files since GAE doesn't handle this correctly.
#
- url: /(.*\.(sh|bash))$
  static_files: _site/\1
  mime_type: text/x-shellscript
  upload: _site/(.*\.(sh|bash))$

# For all remaining files, let GAE infer mime-type

GAE issue: GAE sends compressed files, unexpectedly

This one drove me crazy for a little while. Trying to download a yaml file: curl -O https://my.site.io/my_kubernetes_manifest.yaml returned binary junk in the file!

It turns out GAE sometimes sends back gzipped data, even if the request header of the client doesn’t say it can accept it. But only on https, not http. And only sometimes.

There’s a one-year old bug Google frontend serves gzipped content even if the client doesn’t ask for it which captures this issue perfectly.

There are other interesting, and unresolved bugs around this area too. Try the handy search GAE issues.

The workaround is to force curl to expect a gzipped response: curl --compressed -O https://my.site.io/my_kubernetes_manifest.yaml.

Conclusion

GAE is a subpar approach for serving static content for the reasons outlined above (and others). More worryingly, the slow response to outstanding bugs implies Google’s attention is elsewhere.

Justin Krause did a nice job of reviewing GAE for python dynamic sites. GAE covered their needs nicely. As long as you’re using GAE for exactly what it was designed for, and you don’t mind working around bugs, GAE may work for you.

Serving a Jekyll site with Google App Engine

Here’s the bash script for generating the GAE site yaml in it’s entirety.

#!/bin/bash

# Script that inspects the jekyll-generated _site and emits
# a Google App Engine service specification file on stdout.

if [ ! -d _site ]; then
  echo _site/ doesn\'t exist yet, make sure you run \"jekyll build\".
  exit 1
fi

cat <<EOF
# A Google Application Engine "service" definition. The reference
# to python is a red-herring - we're just statically serving the
# contents of the _site subdirectory. GAE isn't setup to do this
# easily - we need to tell it how to handle each type of file, as
# well as infer "index.html" when we just refer to a site directory.
#
runtime: python27
api_version: 1
threadsafe: yes

handlers:
# Serve up static files based on suffix. See:
#   https://www.iana.org/assignments/media-types/media-types.xhtml
#
# Specify mime-type for yaml files since GAE doesn't handle this correctly.
#
- url: /(.*\.(yaml|yml))$
  static_files: _site/\1
  mime_type: text/x-yaml
  upload: _site/(.*\.(yaml|yml))$

# Specify mime-type for sh files since GAE doesn't handle this correctly.
#
- url: /(.*\.(sh|bash))$
  static_files: _site/\1
  mime_type: text/x-shellscript
  upload: _site/(.*\.(sh|bash))$

# For all remaining files, let GAE infer mime-type
#
EOF

# Get all the unique file suffixes in the _site; e.g. ".txt", ".png".
# Output in form "js|html|yml", etc.
suffixes=`find _site -type f -iname \*.* -print | sed 's/.*\.//' | sort | uniq | paste -sd "|" -`

# Create a static file handler based on all the suffixes
printf -- "- url: /(.*\\.(%s))$\n" $suffixes
printf "  static_files: _site/\\\1\n"
printf "  upload: _site/(.*\\.(%s))$\n" $suffixes

# Enumerate all the directories in the _site; e.g. "fun/". Output in form "/dir1|/dir2", etc.
directories=`find _site -type d -print | sed 's/_site\///g' | sort | uniq | grep -v _site | paste -sd "|" -`

# Create a handler for URLs with a directory that do NOT have a
# terminal /. This is a fail-safe in case someone misconstructs
# the URL without a terminal /.
printf "\n"
printf "# Handle any directory URLs that are missing a terminal /\n"
printf "#\n"
printf -- "- url: /(%s)$\n" $directories
printf "  static_files: _site/\\\1/index.html\n"
printf "  upload: _site/(%s)/index.html\n" $directories

cat <<EOF

# Default directory/html append rules
#
- url: /
  static_files: _site/index.html
  upload: _site/index.html

# For directories indicated by a terminal /, append "/index.html"
- url: /(.+)/
  static_files: _site/\1/index.html
  upload: _site/(.+)/index.html
  expiration: "15m"

# For md files, append ".html"
- url: /(.+[a-z0-9])
  static_files: _site/\1.html
  upload: _site/(.+[a-z]).html
  expiration: "15m"

- url: /(.+)
  static_files: _site/\1/index.html
  upload: _site/(.+)/index.html
  expiration: "15m"

- url: /(.*)
  static_files: _site/\1
  upload: _site/(.*)

libraries:
- name: webapp2
  version: "2.5.2"
EOF