Helpful 404s for Jekyll (and GitHub Pages)
404 - not found pages for Jekyll and GitHub Pages that automatically suggest similar URLs to the one requested, based on your site’s sitemap.xml.
While the internet has long had a soft spot for clever 404 pages, it’s rare to see one that’s actually helpful, especially for static sites built with Jekyll or hosted on GitHub Pages, where dynamic searches are more difficult. Great 404 pages should help visitors find what they’re looking for.
Here’s how I updated the 404 (not found) pages on my own site to resolve typos and suggest other pages potentially relevant to the visitor’s intended URL, in case you’d like to implement the same or similar functionality on your own site:
How my 404 page suggests alternate URLs
If you were to click an invalid link or typo a URL on my site, the following would occur:
- You’d see a 404 - not found page.[1]
- Your browser would retrieve and parse my site’s sitemap.xml.[2]
- Your browser would find the valid path that has the shortest edit distance from the path you requested.
- Your browser would update the 404 page with a link to the suggested path.
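The third step relies on edit (Levenshtein) distance: the minimum number of single-character insertions, deletions, or substitutions needed to turn one string into another. The site's script gets this from `closest()` in fastest-levenshtein; what follows is only a plain dynamic-programming sketch of the same idea, not that library's implementation. `levenshtein`, `closestPath`, and the example.com URLs are hypothetical. Unlike the site's script, which compares full URLs, this sketch compares pathnames only, so a scheme or host mismatch between the sitemap and the visitor's address cannot skew the result.

```typescript
// Dynamic-programming Levenshtein distance: the minimum number of
// single-character insertions, deletions, or substitutions turning a into b.
function levenshtein(a: string, b: string): number {
  // prev[j] = distance between the first i-1 chars of a and the first j chars of b
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // delete a[i - 1]
        curr[j - 1] + 1, // insert b[j - 1]
        prev[j - 1] + cost, // substitute, or keep a matching character
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// HYPOTHETICAL helper: return the sitemap pathname closest to the requested one,
// comparing pathnames only. Ties keep the earliest entry; an empty list gives undefined.
function closestPath(requestedHref: string, sitemapUrls: string[]): string | undefined {
  const requested = new URL(requestedHref).pathname;
  let best: string | undefined;
  let bestDistance = Infinity;
  for (const loc of sitemapUrls) {
    const path = new URL(loc).pathname;
    const d = levenshtein(requested, path);
    if (d < bestDistance) {
      best = path;
      bestDistance = d;
    }
  }
  return best;
}

// Hypothetical sitemap entries for illustration
const sitemap = [
  'https://example.com/',
  'https://example.com/about/',
  'https://example.com/2022/06/30/helpful-404s-for-jekyll-and-github-pages/',
];
console.log(closestPath('https://example.com/2022/06/30/unhelpful-404s-for-jekyll', sitemap));
// prints "/2022/06/30/helpful-404s-for-jekyll-and-github-pages/"
```

With the example from this post, the mistyped path is 36 edits away from `/` but at most 20 away from the real post's path (drop "un", append "-and-github-pages/"), so the real post wins. Filling the table costs O(m·n) per comparison, which is manageable for a personal site's sitemap; fastest-levenshtein is a performance-focused implementation of the same metric.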
What it looks like
Let’s say you tried to navigate to a path that doesn’t exist, like /2022/06/30/unhelpful-404s-for-jekyll. Along with a list of recent posts, the experience would look something like this:
Perhaps you're looking for /2022/06/30/helpful-404s-for-jekyll-and-github-pages/?
How it works
This functionality is driven by a surprisingly small amount of JavaScript (really TypeScript):
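```typescript
import { closest } from 'fastest-levenshtein';

// The 404 page contains an empty placeholder element for the suggestion
const div = document.getElementById('four-oh-four-suggestion');

if (div) {
  const xhr = new XMLHttpRequest();
  xhr.onload = () => {
    const xml = xhr.responseXML; // null if the response could not be parsed as XML
    if (xhr.status === 200 && xml) {
      // Collect every <loc> URL listed in the sitemap, skipping empty entries
      const urls = Array.from(xml.querySelectorAll('urlset > url > loc'))
        .map((el) => el.textContent ?? '')
        .filter((loc) => loc !== '');
      if (urls.length > 0) {
        // Suggest the sitemap URL with the smallest edit distance to the requested URL
        const url = new URL(closest(window.location.href, urls));
        div.innerHTML = `<a href="${url.href}">${url.pathname}</a>`;
        return;
      }
    }
    // Otherwise fall back to a link to the home page
    div.innerHTML = '<a href="/">/</a>';
  };
  xhr.open('GET', `${window.location.protocol}//${window.location.host}/sitemap.xml`);
  xhr.send();
}
```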
The v0.1
Could it be written better? Absolutely (but it works!). For now, I’m using fastest-levenshtein to find the closest path to the one requested, and the lower-level XMLHttpRequest and querySelectorAll APIs to retrieve and parse the XML sitemap.
Along with better error handling, this could also be implemented with the more modern fetch API to retrieve the sitemap and something like fast-xml-parser to more properly parse the XML, but my modern JavaScript knowledge is limited.[3] If you’d like to take a pass at a better implementation, pull requests are always welcome.
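As a rough sketch of the fetch-based direction, the following TypeScript retrieves the sitemap with fetch, checks the response status, and returns the listed URLs, returning an empty array on any failure so the caller can fall back to the home-page link. Two assumptions to flag: `fetchSitemapUrls`, `decodeXmlEntities`, and the `FetchLike` type are hypothetical names, and instead of fast-xml-parser this sketch pulls `<loc>` values out with a regular expression. That is adequate for the flat, one-URL-per-`<loc>` structure of a typical generated sitemap, but it is not a general XML parser (it ignores CDATA sections and comments, for instance), so a real parser would be the sturdier choice.

```typescript
// Minimal shape of fetch used here, so the function can also be exercised with a stub
type FetchLike = (
  url: string,
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

// Undo the five predefined XML entities; &amp; goes last so "&amp;lt;" decodes to "&lt;"
function decodeXmlEntities(s: string): string {
  return s
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, '&');
}

// HYPOTHETICAL helper: fetch a sitemap and return every <loc> URL it lists, or [] on failure.
// ASSUMPTION: <loc> contents are plain text (no CDATA); the regex stands in for a real XML parser.
async function fetchSitemapUrls(sitemapUrl: string, fetchFn: FetchLike): Promise<string[]> {
  try {
    const response = await fetchFn(sitemapUrl);
    if (!response.ok) {
      return []; // e.g. a 404 or 500: let the caller fall back to "/"
    }
    const xml = await response.text();
    const urls: string[] = [];
    // Capture each <loc> body, trimming surrounding whitespace
    const locPattern = /<loc>\s*([^<]*?)\s*<\/loc>/g;
    let match: RegExpExecArray | null;
    while ((match = locPattern.exec(xml)) !== null) {
      urls.push(decodeXmlEntities(match[1]));
    }
    return urls;
  } catch {
    return []; // network error or unreadable body
  }
}
```

In the browser, the global `fetch` can be passed directly, for example `fetchSitemapUrls(`${window.location.origin}/sitemap.xml`, fetch)`, and a non-empty result handed to `closest(window.location.href, urls)`; an empty array is the cue to link to `/` instead.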
Conclusion
When I click on a broken link, the site that I land on should point me in the right direction. After all, typo’d or updated URLs are not uncommon, and the site I’m visiting knows more about its own content and structure than I ever will. While it’s still true that everything should have a URL, sometimes those URLs change or get lost in translation. Although you might hope a visitor would never see one, great 404 pages go that extra step and help visitors find what they’re looking for. If you’re interested in implementing the same functionality on your own site, the code above is part of the retlab Jekyll theme and is licensed under the MIT License.
[1] When a visitor tries to access a URL that does not exist, GitHub Pages will serve the 404.html file in the site’s root directory, if one exists.
[2] Generated automatically by the Jekyll Sitemap plugin. The same implementation would work with any other static site (or static site generator), so long as your site has a comprehensive sitemap.xml.
[3] I’m proud to say that no jQuery was harmed in the making of this functionality.
Ben Balter is the Director of Hubber Enablement within the Office of the COO at GitHub, the world’s largest software development platform, ensuring all Hubbers can do their best (remote) work. Previously, he served as the Director of Technical Business Operations, and as Chief of Staff for Security, he managed the office of the Chief Security Officer, improving overall business effectiveness of the Security organization through portfolio management, strategy, planning, culture, and values. As a Staff Technical Program Manager for Enterprise and Compliance, Ben managed GitHub’s on-premises and SaaS enterprise offerings, and as the Senior Product Manager overseeing the platform’s Trust and Safety efforts, Ben shipped more than 500 features in support of community management, privacy, compliance, content moderation, product security, platform health, and open source workflows to ensure the GitHub community and platform remained safe, secure, and welcoming for all software developers. Before joining GitHub’s Product team, Ben served as GitHub’s Government Evangelist, leading the efforts to encourage more than 2,000 government organizations across 75 countries to adopt open source philosophies for code, data, and policy development.