2018-05-10 22:42:46
With the updates to my development workflow (as detailed in the previous blog), I added a feature that automates a process called cache-busting.
Before I explain the automated process, I'll begin by going over what it is and why it's necessary.
Beyond media (images and other graphical content), a typical webpage has three different basic components:
- Markup (HTML): The content of the page
- Style (CSS): The layout and "looks" of the page
- Script (JavaScript): The dynamic functionality of the page
The markup is the root file, and it links to style and script roughly like this (slightly simplified):
markup.html:
<html>
<head>
<script src="script.js"></script>
<link rel="stylesheet" href="style.css">
</head>
<body>
Content goes here...
</body>
</html>
When you load a page like that, your browser first fetches the HTML file and processes it. When it runs into the script and style references, it requests those from the web server as well.
If the browser did this on every visit, the web would be very slow overall, so browsers use something called "caching". The browser remembers what it has already downloaded from the server and, instead of requesting the same content again, reuses the previously fetched scripts and styles. (The same is done for images, but normally not for HTML files.)
This is all great under normal circumstances, but as a developer, I often change these scripts and styles on the server. Unless something is done about it, browsers wouldn't load these updates unless the user forced a reload (shift-reload does this). That is clearly a problem, because an updated HTML file often needs the matching script update for the page to make sense and display properly.
To get around this, a technique called cache-busting is used. The basic idea is to change the link to the scripts and styles so that the browser thinks it has not seen them before. A naïve way to do this would be to give your scripts a new name on every change (script-v1.js, script-v2.js, etc.) and change your HTML to match. This approach would absolutely work, but it requires a lot of manual changes to HTML files, and you have to clean up old versions of a script whenever you make a new one.
Instead, a common technique is to append something to the script URL so that it still points to the same file but is a different URL, such as "script.js?v1". The question mark starts the query string, and for a plain file the server normally ignores what follows and serves the file named before the question mark. That solves the cleanup problem, but one thing still remains: you still have to change the HTML file every time a script is changed. I used to have a similar system in place for the site prior to the switch to the new dev platform.
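To make the query string trick a bit more concrete, here is a small sketch using the standard URL class available in Node.js and browsers. The host example.com is only a placeholder. It shows that the two versioned URLs differ as a whole, which is what a browser cache keys on, while the path that names the file stays the same.

```javascript
// example.com is a placeholder host, not the real site.
const v1 = new URL("https://example.com/script.js?v1");
const v2 = new URL("https://example.com/script.js?v2");

// The path (the file the server is asked for) is identical...
console.log(v1.pathname, v2.pathname); // /script.js /script.js

// ...only the query string differs...
console.log(v1.search, v2.search); // ?v1 ?v2

// ...so the full URLs are different.
console.log(v1.href === v2.href); // false
```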
With the changes I've made to the workflow, I was able to introduce a component called "gulp-rev". This component does two things:
1. Every time a file is run through it, it calculates a "hash" (a sort of fingerprint) of the file's contents. The hash is a 10-digit hexadecimal string that, for practical purposes, uniquely identifies the contents of the file.
2. It sets up a dictionary that maps the normal name of each script to a name that includes this hash.
For example, the dictionary from step 2 would contain a record like:
script.js => _a5b623b43d_/script.js
(Those of you who are familiar with gulp-rev will notice that this is a custom format and not the default one.)
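The two steps can be sketched in plain Node.js roughly as follows. This is not gulp-rev's code. The helper names fingerprint() and hashedName() are made up for illustration, and SHA-256 is an arbitrary choice, since nothing above says which hash function is actually used. The output follows the custom "_hash_/name" format from the record above.

```javascript
const crypto = require("crypto");
const path = require("path");

// Hypothetical helper: a 10-hex-digit fingerprint of the file contents.
// SHA-256 is an assumption; any stable hash function would do.
function fingerprint(contents) {
  return crypto.createHash("sha256").update(contents).digest("hex").slice(0, 10);
}

// Hypothetical helper: put the "_hash_" folder right before the file name.
function hashedName(name, contents) {
  const dir = path.posix.dirname(name); // "." when there is no folder
  const base = path.posix.basename(name);
  const hashed = `_${fingerprint(contents)}_/${base}`;
  return dir === "." ? hashed : `${dir}/${hashed}`;
}

// Build the dictionary for a couple of made-up files.
const files = {
  "script.js": "console.log('hello');",
  "js/other.js": "console.log('other');",
};
const dictionary = {};
for (const [name, contents] of Object.entries(files)) {
  dictionary[name] = hashedName(name, contents);
}
console.log(dictionary);
```

Because the fingerprint depends only on the contents, an unchanged file keeps its name between builds, and any edit to the file produces a new one.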
Because I process my HTML through a scripting language (PHP in my case), I can modify the HTML above to the following (again, simplified):
markup.php:
<html>
<head>
<script src="<?php echo cacheLookup('script.js'); ?>"></script>
<link rel="stylesheet" href="style.css">
</head>
<body>
Content goes here...
</body>
</html>
The cacheLookup() function looks up the hashed version of script.js, and the resulting page is the same as if I had written the following:
<script src="_a5b623b43d_/script.js"></script>
Whenever I process the script.js file with new changes, the hex hash changes, the URL changes with it, and browsers fetch the new version.
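The body of cacheLookup() isn't shown here, and the real one is written in PHP. As a rough JavaScript sketch of the same idea, assume the dictionary is a plain object and that a name without an entry falls back to the unhashed file name (that fallback is my assumption for the sketch):

```javascript
// Hypothetical dictionary, using the example record from earlier.
const dictionary = { "script.js": "_a5b623b43d_/script.js" };

// Return the hashed name if one is known, otherwise the name unchanged.
function cacheLookup(name, table = dictionary) {
  return Object.prototype.hasOwnProperty.call(table, name) ? table[name] : name;
}

console.log(cacheLookup("script.js")); // _a5b623b43d_/script.js
console.log(cacheLookup("style.css")); // style.css (no entry, so unchanged)
```

Falling back to the plain name means a missing dictionary entry costs you the cache-busting for that file, but the page still loads.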
So now you're asking why this doesn't force me to keep loads of folders and files on the web server for these scripts. The answer is a web server rewrite. It looks for a request of the form (something)/_(10 hex digits)_/(something).js and treats it as if the 10-digit part weren't there. The file actually sent for any such request is (something)/(something).js, which is the original name of the file and the only copy I've stored on the web server.
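The rewrite itself lives in the web server configuration, which isn't shown here. The pattern it matches can be sketched as a regular expression in JavaScript. The stripHash() name and the exact expression are illustrative guesses based on the description above, not the actual rule.

```javascript
// Matches a "_<10 hex digits>_/" folder directly before a .js file name.
const HASH_SEGMENT = /(^|\/)_[0-9a-f]{10}_\/([^\/]+\.js)$/;

// Hypothetical helper: map a requested path to the file actually stored.
function stripHash(requestPath) {
  return requestPath.replace(HASH_SEGMENT, "$1$2");
}

console.log(stripHash("/js/_a5b623b43d_/script.js")); // /js/script.js
console.log(stripHash("/js/script.js"));              // /js/script.js (unchanged)
```

Since any 10-digit hex value is accepted and thrown away, old hashed URLs keep working after a deploy. They just return the current file.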
And that's it. An automated way for me to make changes to a page, with the script (and style) automatically updated properly on the browser side. I am expecting this to significantly reduce the moments where I deploy an update and forget to update the HTML to force a script to be reloaded, something that has happened way too often in the past. :)
Note that this new method is not very prevalent on the site just yet (you'll see it in action on the tournaments and boxtrophy pages at the time of writing).
Hopefully, this blog was a bit easier to digest than the previous one, which I fully realize was quite technical and complex.