Analysis of federal .gov domains, 2015 edition
In 2011 and then again in 2014 I used a small tool that I wrote to crawl every site on the publicly available list of Federal Executive .gov domains to get a better sense of the state of federal IT, at least when it comes to agencies’ public-facing web presence. This weekend, I decided to resurrect that effort, with the recently updated list of .gov domains, and with a more finely tuned version of the open source Site Inspector tool, thanks to some contributions from Eric Mill.
You can always compare them to the original 2011 or 2014 crawls, or browse the entire dataset for yourself, but here are some highlights of what I found:
- 1177 of those domains are live (about 86%, up from 83% last year, and 73% originally)
- Of those live domains only 75% are reachable without the `www.` prefix, down from 83% last year
- 722 sites return an `AAAA` record, the first step towards IPv6 compliance (up from 64 last year, and 10 before that, more than a 10x increase)
- 344 sites are reachable via HTTPS (stagnant at one in four from last year), and like last year, only one in ten enforce it
- 87% of sites have no detectable CMS (the same as last year), with Drupal leading the pack with 123 sites, followed by WordPress with 29 sites (double last year's count), and Joomla powering 8 (up one from last year)
- Just shy of 40% of sites advertise that they are powered by open source server software (for example, Apache, Nginx), up from about a third last year, with about one in five sites responding that they are powered by closed source software (for example, Microsoft, Oracle, Sun)
- 61 sites are still somehow running IIS 6.0 (down from 74 last year), a 10+ year old server
- HHS is still the biggest perpetrator of domain sprawl with 117 domains (up from 110 last year), followed by GSA (104, down from 105), Treasury (95, up from 92), and Interior (86, down from 89)
- Only 67 domains have a `/developer` page, 99 have a `/data` page, and 74 have a `/data.json` file, all significantly down from past years, due to more accurate means of calculation, which brings us to:
- 255, or just shy of 20% of domains, don’t properly return “page not found” or 404 errors, meaning if you programmatically request their `/data.json` file (or any other non-existent URL), the server will tell you that it’s found the requested file, but really respond with a human-readable “page not found” error, making machine readability especially challenging
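
To make that last point concrete, here is a minimal sketch of how such a check could work, written in Python rather than as a copy of what Site Inspector does internally. The idea is to request a randomly generated path that almost certainly doesn't exist and see whether the server answers with a 404. The function names (`random_missing_path`, `fetch_status`, `returns_proper_404`) are hypothetical, and treating only a 404 status as "proper" is an assumption made for this illustration.

```python
import uuid
import urllib.error
import urllib.request
from typing import Callable


def random_missing_path() -> str:
    """Build a path that is very unlikely to exist on any real site."""
    return f"/{uuid.uuid4().hex}.json"


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status code for a GET request to `url`.

    Redirects are followed, so a site that bounces unknown URLs to its
    homepage reports 200 here. Connection failures (urllib.error.URLError)
    are not handled and propagate to the caller.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the code is still what we want.
        return error.code


def returns_proper_404(domain: str, fetch: Callable[[str], int] = fetch_status) -> bool:
    """True if a request for a nonsense URL comes back as a 404.

    Assumption for this sketch: anything other than 404 (including a 200
    with a human-readable "page not found" body) counts as improper.
    """
    status = fetch(f"http://{domain}{random_missing_path()}")
    return status == 404
```

Calling `returns_proper_404("example.gov")` would perform a live request (the domain here is a placeholder). The `fetch` parameter exists so the logic can be exercised with a stand-in, for example `returns_proper_404("example.gov", fetch=lambda url: 200)`, without touching the network.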
Edit (May 12, 2015): As @konklone properly points out, the list now includes legislative and judicial .gov domains, and thus isn’t limited to just federal executive .govs.
As I’ve said in past years, math’s never been my strong point, so I highly encourage you to check my work. You can browse the full results at dotgov-browser.herokuapp.com or check an individual site (.gov or otherwise) at site-inspector.herokuapp.com. The source code for all tools used is available on GitHub. If you find an error, I encourage you to open an issue or submit a pull request.
Ben Balter is the Director of Hubber Enablement within the Office of the COO at GitHub, the world’s largest software development platform, ensuring all Hubbers can do their best (remote) work. Previously, he served as the Director of Technical Business Operations, and as Chief of Staff for Security, he managed the office of the Chief Security Officer, improving overall business effectiveness of the Security organization through portfolio management, strategy, planning, culture, and values. As a Staff Technical Program manager for Enterprise and Compliance, Ben managed GitHub’s on-premises and SaaS enterprise offerings, and as the Senior Product Manager overseeing the platform’s Trust and Safety efforts, Ben shipped more than 500 features in support of community management, privacy, compliance, content moderation, product security, platform health, and open source workflows to ensure the GitHub community and platform remained safe, secure, and welcoming for all software developers. Before joining GitHub’s Product team, Ben served as GitHub’s Government Evangelist, leading the efforts to encourage more than 2,000 government organizations across 75 countries to adopt open source philosophies for code, data, and policy development. More about the author →