
Accelerate your website on both sides of the Great Firewall, without high costs

Posted in Tech on May 13th 2016
COO and Co-Founder

At IT Consultis, we are often asked how to make a website easily accessible from both sides of the Great Firewall (GFW).

First, a bit of background on the most efficient way to accelerate a website: a CDN. A Content Delivery Network (CDN) is a distributed network of nodes that serve your static content (images, videos, files) from different locations. The idea is to serve users as fast as possible by delivering these often heavy resources from a location close to them. Most common CDNs have nodes either only in China or only outside of China, which makes them useless for users on the other side, who are geographically far from every node. For those users, we lose the advantage of using a CDN in the first place.

Fortunately, some global CDNs accelerate resource loading both within China and in the rest of the world. But these solutions come at a price: global CDNs are usually sized for enterprise customers and cost upwards of a couple of thousand dollars per month.

We have been working on a solution that costs far less to maintain. This article explains, step by step, how to replicate your website on both sides of the GFW and send traffic to the right server based on the user's location. We will also take the opportunity to introduce the benefits of the latest network protocols.

Please note that the procedure described in this article requires knowledge of server administration; experiment at your own risk.


Prerequisites

The following tools will be needed:

  • A website (in this experiment Drupal 7)
  • A master web server outside of China
  • A domain name with ICP registration
  • A slave web server in China
  • An account on DNSPod for the DNS management.

We will be using the following LEMP stack on the server:

  • Ubuntu 14.04 LTS
  • PHP 5.6
  • Nginx 1.9.5 or above
  • MySQL 5.5
  • Varnish Cache 3

What we will achieve

We will describe a way to replicate a website through Varnish on a remote server. We will then use DNSPod to direct each user to one of the two servers based on the user's location. Most pages will be served by Varnish; only the pages that are not yet cached will be generated by the web server when a user requests them.

In addition, we will take the opportunity to enable the latest network technology: HTTP/2.

Certain pages need to be generated by the web server on every request; these can easily be excluded from the Varnish cache.

Architecture

To keep things simple, we are only going to use two servers in this article, but a more complex architecture can be achieved with some additional effort. A better solution would be to separate the Varnish servers from the back-end server, and the back-end server(s) from the database server(s).

Server 1

This is our main server; it contains all the software needed to run our application.

  • Nginx
  • Varnish
  • php-fpm
  • MySQL

Server 2

This server only caches the requests that are directed to it. As the diagram below shows, its Varnish instance fetches uncached pages from the Nginx back-end on Server 1.

  • Nginx
  • Varnish
                    +------+
                    | USER |
                    +---+--+
                        |
                        |
                     +--v--+
                     | DNS |
             +-------+-----+------+
             |                    |
             |                    |
             |                    |
             |                    |
+------------+---------+  +-------+--------------+
|Nginx reverse proxy 1 |  | Nginx reverse proxy 2|
+----------+-----------+  +----------+-----------+
           |                         |
           |                         |
           |                         |
  +--------+---------+      +--------+---------+
  | Varnish server 1 |      | Varnish server 2 |
  +--------+---------+      +--------+---------+
           |                         |
           |                         |
+----------+------------+            |
| Nginx back-end server +------------+
+----------+------------+
           |
           |
      +----+----+
      | php-fpm |
      +----+----+
           |
           |
      +----+-----+
      |  Mysql   |
      +----------+

DNS

The first step is to set up your domain to use DNSPod's name servers.

  1. Create an account on DNSPod
  2. Add your domain
  3. Recreate your zone as per your previous configuration
  4. Update your name servers to the ones given by DNSPod

LEMP

We will assume that you already have this environment ready. Otherwise, you can refer to this article: How to Install Linux, Nginx, MySQL, PHP (LEMP) stack on Ubuntu 14.04.

Keep in mind that we want to use the new HTTP/2 protocol to gain even more speed. The idea is to load all your resources over a single multiplexed connection, which noticeably reduces load time. Most modern browsers support this protocol.

To get HTTP/2 working, you will need HTTPS support: browsers only implement HTTP/2 over TLS. You will therefore need an SSL certificate installed.

SSL

Since we will be using HTTP/2, we need SSL certificates installed. You can choose any provider you want, but we can only be impressed by the Let's Encrypt initiative, which is finally available and makes SSL accessible to anyone for free. Some hosting companies are already integrating this service, for a more private and safer web.

If you don't know how to use this tool, we recommend this tutorial at Digital Ocean.

Let's Encrypt certificates are valid for 90 days. Since the certificates need to be identical on the two servers, it might be easier to go for a more classic 365-day certificate.
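Whichever validity period you choose, both servers have to be updated before the certificate expires. As a minimal sketch (not part of the original setup), the remaining validity can be computed from the certificate's notAfter field with Python's standard library. The function names and the 30-day threshold are assumptions made for this illustration.

```python
import ssl
from datetime import datetime, timezone


def days_remaining(not_after: str, now: datetime) -> int:
    """Whole days between `now` and a certificate's notAfter timestamp.

    `not_after` uses the format returned by ssl.SSLSocket.getpeercert(),
    for example "Jan  5 09:34:43 2018 GMT".
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days


def needs_renewal(not_after: str, now: datetime, threshold_days: int = 30) -> bool:
    """True when fewer than `threshold_days` whole days of validity remain.

    The 30-day default is an arbitrary choice for this sketch.
    """
    return days_remaining(not_after, now) < threshold_days
```

In practice `now` would be `datetime.now(timezone.utc)` and `not_after` would come from the certificate actually served by each of the two servers.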

HTTP/2

Nginx now supports HTTP/2 with a fairly easy setup. Let's see what we have to do:

1 - Disable SPDY support by removing the spdy parameter from the listen directives in your Nginx configuration file

2 - Enable HTTP/2 by adding the http2 parameter and your SSL certificates. The relevant part of your Nginx configuration file should look similar to this:

server {
    server_name mywebsite.com;
    listen 443 ssl http2;

    ssl_certificate    server.crt;
    ssl_certificate_key server.key;
    ...

    # Redirect the bare domain to the www host
    return 301 https://www.mywebsite.com$request_uri;
}

server {
    server_name www.mywebsite.com;
    # default_server may appear only once per listen address and port
    listen 443 ssl http2 default_server;

    ssl_certificate    server.crt;
    ssl_certificate_key server.key;
    ...
}

3 - Create your back-end server configuration (or update your existing one) so that it listens on port 8000. It should look similar to this:

server {

    server_name www.mywebsite.com;
    listen 8000;
    
    [ALL THE REWRITES AND php-fpm CONFIGURATION]
}

4 - Reload Nginx with this command: nginx -s reload
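Once Nginx has been reloaded, one way to confirm that HTTP/2 is really offered is to look at the protocol negotiated through ALPN during the TLS handshake. The sketch below uses only Python's standard library; the function names are assumptions made for this illustration, and the host you pass in must be reachable from the machine running the script.

```python
import socket
import ssl
from typing import Optional


def h2_context() -> ssl.SSLContext:
    """TLS context that offers HTTP/2 first, then HTTP/1.1, via ALPN."""
    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    return ctx


def negotiated_protocol(host: str, port: int = 443, timeout: float = 5.0) -> Optional[str]:
    """Connect to host:port and return the ALPN protocol the server selected.

    Returns "h2" when the server accepts HTTP/2, "http/1.1" when it only
    accepts HTTP/1.1, and None when the server does not use ALPN at all.
    """
    ctx = h2_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()
```

For example, calling `negotiated_protocol("www.mywebsite.com")` against a server configured as above should report whether HTTP/2 was selected; the certificate must be valid for that host name, since the default context verifies it.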

Varnish Cache

Varnish Cache is an HTTP accelerator. Here we will use it as a proxy to serve the HTML output.

Install Varnish Cache 3 by running:

apt-get install varnish

On Ubuntu 14.04, this installs a Varnish 3.0.x release by default.

OK, now that our master server is working over HTTP/2, let's dig into the Varnish configuration.

1 - Edit your Varnish daemon configuration, located at /etc/default/varnish, to listen on port 80

DAEMON_OPTS="-a :80 \
             -T localhost:6082 \
             -f /etc/varnish/default.vcl \
             -S /etc/varnish/secret \
             -s malloc,64m"

2 - Edit your Varnish configuration file, which should be at /etc/varnish/default.vcl

backend default {
    .host = "127.0.0.1";
    .port = "8000";
    .first_byte_timeout = 3000s;
}

acl cache_acl {
    "127.0.0.1";
    # insert additional ip's here
}


# Like the default function, only that cookies don't prevent caching
sub vcl_recv {
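    # Redirect plain HTTP to HTTPS; status 750 is turned into a 301 in vcl_error.
    # mywebsite.com is the example domain used in the Nginx configuration.
    # If Nginx forwards HTTPS traffic to this same port, exclude that traffic
    # here, otherwise every request is redirected in a loop.
    if (req.http.host ~ "^(www\.)?mywebsite\.com$" && server.port == 80) {
        error 750 "https://www.mywebsite.com" + req.url;
    }
    else if (req.http.host == "mywebsite.com") {
        error 750 "https://www.mywebsite.com" + req.url;
    }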

    # Don't do anything
    # return (pipe);
    # see http://www.varnish-cache.org/trac/wiki/VCLExampleNormalizeAcceptEncoding
    # parse accept encoding rulesets to normalize
    if (req.http.Accept-Encoding) {
        if (req.url ~ "\.(jpg|jpeg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf|mp4|flv)$") {
            # don't try to compress already compressed files
            remove req.http.Accept-Encoding;
        } elsif (req.http.Accept-Encoding ~ "gzip") {
            set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # unknown algorithm
            remove req.http.Accept-Encoding;
        }
    }

    if (req.http.x-forwarded-for) {
        set req.http.X-Forwarded-For =
        req.http.X-Forwarded-For + ", " + client.ip;
    } else {
        set req.http.X-Forwarded-For = client.ip;
    }

    # Some known-static file types
    if (req.url ~ "^[^?]*\.(css|js|htc|xml|txt|swf|flv|pdf|gif|jpe?g|png|ico)$") {
        # Pretend no cookie was passed
        unset req.http.Cookie;
    }

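    # PURGE requests: invalidate the requested URL, then answer directly
    if (req.request == "PURGE") {
        if (client.ip ~ cache_acl) {
            # ban() expects a ban expression, not a bare URL
            ban("req.url == " + req.url);
            error 200 "Purged.";
        } else {
            error 405 "Not allowed.";
        }
    }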

    if (req.request != "GET" && req.request != "HEAD") {
        # We only deal with GET and HEAD by default
        return (pass);
    }

    if (req.http.Authorization) {
        return (pass);
    }

    if (req.url ~ "^/(index.php/)?(en/|fr/|cn/)?(user|admin|contact).*$") {
        return(pass);
    }

    set req.http.X-Cached = true;
    return (lookup);
}

# Called after the document was retrieved from back-end
sub vcl_fetch {
    set req.grace = 30s;

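    # Flags set when we want to delete cache headers received from back-end.
    # Note: the vcl_recv shown here does not set req.http.magicmarker or
    # req.http.staticmarker, so set them there if you want these blocks to run.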
    if (req.http.magicmarker){
        unset beresp.http.magicmarker;
        unset beresp.http.Cache-Control;
        unset beresp.http.Expires;
        unset beresp.http.Pragma;
        unset beresp.http.Cache;
        unset beresp.http.Server;
        unset beresp.http.Set-Cookie;
        unset beresp.http.Age;

        # default ttl for pages
        set beresp.ttl = 1d;
    }
    if (req.http.staticmarker) {
        set beresp.ttl = 30d; # static file cache expires in 30 days
        unset beresp.http.staticmarker;
        unset beresp.http.ETag; # Removes Etag in case we have multiple front ends
    }
    if (req.http.X-Requested-With == "XMLHttpRequest"){
        set beresp.http.X-Origin = "ajax";
    }
    if (beresp.http.Content-Type ~ "html"){
        set beresp.http.Cache-Control = "no-cache";
    }
    if (req.http.X-Cached) {
        unset beresp.http.Set-Cookie;
    }
    # Don't allow static files to set cookies.
    # (?i) denotes case insensitive in PCRE (perl compatible regular expressions).
    # This list of extensions appears twice, once here and again in vcl_recv so
    # make sure you edit both and keep them equal.
    if (req.url ~ "(?i)\.(pdf|asc|dat|txt|doc|xls|ppt|tgz|csv|png|gif|jpeg|jpg|ico|swf|css|js)(\?.*)?$") {
        unset beresp.http.set-cookie;
    }

    unset beresp.http.X-Drupal-Cache;

    return (deliver);
}

sub vcl_error {
    # See "750" in vcl_recv
    if (obj.status == 750) {
        set obj.http.Location = obj.response;
        set obj.status = 301;
        return (deliver);
    }

    set obj.http.Content-Type = "text/html; charset=utf-8";
    set obj.http.Retry-After = "5";
    synthetic {"
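<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
  <head>
    <title>"} + obj.status + " " + obj.response + {"</title>
  </head>
  <body>
    <h1>Error "} + obj.status + " " + obj.response + {"</h1>
    <p>"} + obj.response + {"</p>
    <h3>Guru Meditation:</h3>
    <p>XID: "} + req.xid + {"</p>
    <hr>
    <p>Varnish cache server</p>
  </body>
</html>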
"};
    return (deliver);
}
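Before deploying the VCL, it can help to check which URLs the two URL rules in vcl_recv actually match. The sketch below copies the pass rule and the static-file rule from the listing above into Python regular expressions; it assumes that Python's re module and the PCRE engine used by Varnish behave the same way for these simple patterns. The function names are made up for this illustration.

```python
import re

# Patterns copied verbatim from vcl_recv above.
PASS_RULE = re.compile(r"^/(index.php/)?(en/|fr/|cn/)?(user|admin|contact).*$")
STATIC_RULE = re.compile(
    r"^[^?]*\.(css|js|htc|xml|txt|swf|flv|pdf|gif|jpe?g|png|ico)$"
)


def bypasses_cache(url: str) -> bool:
    """True when the URL matches the rule that ends in return (pass)."""
    return PASS_RULE.search(url) is not None


def cookie_is_stripped(url: str) -> bool:
    """True when the URL matches the known-static rule that unsets Cookie."""
    return STATIC_RULE.search(url) is not None


if __name__ == "__main__":
    for url in ("/user/login", "/contact-us", "/about",
                "/sites/default/files/logo.png", "/misc/drupal.js?v=7"):
        print(url, bypasses_cache(url), cookie_is_stripped(url))
```

Two things are worth noticing when running it. The pass rule is a prefix match, so a page such as /contact-us is never cached even though only /contact was probably intended. The static rule does not match URLs that carry a query string, so requests such as /misc/drupal.js?v=7 keep their cookies.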

Optionally, you can add debugging information to the HTTP headers to check that Varnish is working well.

# Adding debugging information (Optional)
sub vcl_deliver {
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT (" + obj.hits + ")";
        set resp.http.Server = "Varnish (HIT: " + obj.hits + ")";
    } else {
        set resp.http.X-Cache = "MISS";
        set resp.http.Server = "Varnish (MISS)";
    }
}
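With this vcl_deliver in place, every response carries an X-Cache header of the form "MISS" or "HIT (n)". If you want to collect these values in a script, for instance while requesting the same page several times, a small parser is enough. This is only a sketch: the function name is hypothetical, and it assumes the header has exactly the format produced by the VCL above.

```python
import re
from typing import Tuple

# Matches "MISS" or "HIT (3)", as built by the vcl_deliver block above.
_X_CACHE = re.compile(r"^(HIT|MISS)(?: \((\d+)\))?$")


def parse_x_cache(value: str) -> Tuple[str, int]:
    """Split an X-Cache header value into (status, hit_count).

    The hit count is 0 when the header does not carry one.
    Raises ValueError when the value has an unexpected format.
    """
    match = _X_CACHE.match(value.strip())
    if match is None:
        raise ValueError(f"unexpected X-Cache value: {value!r}")
    status, hits = match.group(1), match.group(2)
    return status, (int(hits) if hits else 0)
```

A page that is being cached correctly should report MISS on the first request and HIT with an increasing counter on the following ones.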

Drupal with Varnish

It is important to note that we found the popular Varnish HTTP Accelerator Integration module for Drupal did not work as expected. The main issue we faced was that the cache was not refreshed properly after a page was modified.

DNS Route and geo-location

Once all of this is done, you can start testing by modifying your /etc/hosts file and verifying that both IPs load the website. Once confirmed, we can move on to the DNS setup.
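As an alternative to editing /etc/hosts, each server can be queried directly by IP while still presenting the real host name for TLS and in the Host header. The sketch below does this with Python's standard library and returns the HTTP status line. The function names are hypothetical, and the IP addresses you pass in are those of your own two servers.

```python
import socket
import ssl


def build_head_request(hostname: str, path: str = "/") -> bytes:
    """Raw HTTP/1.1 HEAD request carrying the site's host name."""
    return (
        f"HEAD {path} HTTP/1.1\r\n"
        f"Host: {hostname}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def status_line_via_ip(ip: str, hostname: str, port: int = 443,
                       timeout: float = 5.0) -> str:
    """Connect to a specific server IP and return the first response line.

    The TCP connection goes to `ip`, while `hostname` is used for SNI,
    certificate verification and the Host header, which is what a browser
    would send once DNS points at that server.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
            tls.sendall(build_head_request(hostname))
            response = tls.recv(4096)
    return response.split(b"\r\n", 1)[0].decode("ascii", "replace")
```

Calling `status_line_via_ip` once with the IP of the master server and once with the IP of the server in China, both with "www.mywebsite.com" as the host name, shows whether each of them answers for the site.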

The idea is to have two DNS records for the same host, on two different lines, with the line selected based on the IP of the user.

DNSPod offers a great option for geo-aware DNS: the same record can have multiple IPs, so the user is sent to the right server as early as the DNS lookup. In DNSPod, this is called a Line.
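To make the idea concrete, here is a small simulation of what a geo-aware resolver does with such records. It is not DNSPod's API or its actual matching logic: the line labels, the Record type and the IP addresses (taken from the ranges reserved for documentation) are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Record:
    line: str  # hypothetical line label, e.g. "china" or "default"
    ip: str    # address returned to clients on that line


# Same host name, two records, one per line (illustrative addresses).
RECORDS = [
    Record(line="china", ip="203.0.113.10"),     # server inside China
    Record(line="default", ip="198.51.100.20"),  # master server elsewhere
]


def resolve(records: List[Record], client_line: str) -> str:
    """Return the IP for the client's line, or the default line's IP."""
    by_line = {record.line: record.ip for record in records}
    return by_line.get(client_line, by_line["default"])
```

A client identified as being on the "china" line receives the address of the server in China, and every other client falls back to the default record, which is the behaviour we want from the two records described above.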

DNSPod Setup


According to the 360 audit tool, requests from China hit the Chinese IP most of the time, resulting in a faster website. Requests from outside of China simply hit the other IP. We now have a website that is easily accessible from all over the world, without using a CDN and with limited hardware infrastructure.