Regular Expression to Parse Word-style Footnotes into WordPress’s Simple Footnotes Format
I needed a quick-and-easy way to parse Microsoft Word’s footnote format into a more web-friendly format for a recent project. After a bit of regular expression hacking, I was able to build a WordPress plugin to automatically convert content pasted from Word into a format readable by Andrew Nacin’s popular Simple Footnotes plugin.
The process is surprisingly simple given WordPress’s extensive filter API. First, to grab the footnotes from Word’s ftnref format:
<?php
//grab all the Word-style footnotes into an array
$pattern = '#<a href\="\#_ftnref([0-9]+)">\[([0-9]+)\]</a> (.*)#';
preg_match_all( $pattern, $content, $footnotes, PREG_SET_ORDER);
?>
This creates an array ($footnotes) with both the footnote number and the text of each footnote. We then need a way to replace each in-text reference with the parsed footnote so that Simple Footnotes can understand it. I did this by creating two arrays, a find array and a replace array, pairing each Word-style footnote reference with its Simple Footnotes-formatted counterpart:
<?php
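//build find and replace arrays
$find = array();
$replace = array();
foreach ( $footnotes as $footnote ) {
    //in-text references point at #_ftnN; the footnote text is the third capture group
    $find[] = '#<a href\="\#_ftn' . $footnote[1] . '">\[' . $footnote[1] . '\]</a>#';
    $replace[] = '[ref]' . str_replace( array( "\r\n", "\r", "\n" ), "", $footnote[3] ) . '[/ref]';
}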
?>
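To make the two steps above concrete, here is a small sketch that runs them against a made-up sample. The sample markup is an assumption (a simplified version of what Word's paste might produce, with no extra attributes on the links), so treat the output as illustrative only.

```php
<?php
// Assumed, simplified sample of Word-pasted HTML: two in-text references
// followed by two footnote blocks.
$content = implode( "\n", array(
    '<p>First claim.<a href="#_ftn1">[1]</a> Second claim.<a href="#_ftn2">[2]</a></p>',
    '<div>',
    '<a href="#_ftnref1">[1]</a> First source.',
    '</div>',
    '<div>',
    '<a href="#_ftnref2">[2]</a> Second source.',
    '</div>',
    '',
) );

// Step 1: collect every footnote (number, number, text) as one row per footnote.
$pattern = '#<a href\="\#_ftnref([0-9]+)">\[([0-9]+)\]</a> (.*)#';
preg_match_all( $pattern, $content, $footnotes, PREG_SET_ORDER );

// Step 2: build one find/replace pair per footnote.
$find    = array();
$replace = array();
foreach ( $footnotes as $footnote ) {
    $find[]    = '#<a href\="\#_ftn' . $footnote[1] . '">\[' . $footnote[1] . '\]</a>#';
    $replace[] = '[ref]' . str_replace( array( "\r\n", "\r", "\n" ), '', $footnote[3] ) . '[/ref]';
}

// Inspect the pairs that would be handed to preg_replace().
print_r( $find );
print_r( $replace );
```

With PREG_SET_ORDER, each entry of $footnotes holds the full match at index 0 and the three capture groups at indexes 1 to 3, which is why the footnote text is read from $footnote[3].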
Finally, so that the entire replacement can be done in a single pass, push a final find/replace pair onto the end of the arrays to remove the original footnotes:
<?php
//remove all the original footnotes when done
$find[] = '#<div>\s*<a href\="\#_ftnref([0-9]+)">\[([0-9]+)\]</a> (.*)\s*</div>\s+#';
$replace[] = '';
?>
Because PHP’s preg_replace function can handle arrays, all we have to do is run a single function:
<?php
$content = preg_replace( $find, $replace, $content );
?>
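As a quick illustration of the array form, the sketch below hand-writes two pairs for a single hypothetical footnote instead of building them in a loop. The input string is an assumption chosen to keep the example short.

```php
<?php
// Hypothetical one-footnote input, simplified from what a Word paste might contain.
$content = 'Text.<a href="#_ftn1">[1]</a>' . "\n"
    . '<div><a href="#_ftnref1">[1]</a> Note.</div>' . "\n";

// Hand-written pairs standing in for the arrays built in the earlier steps.
$find = array(
    '#<a href\="\#_ftn1">\[1\]</a>#',
    '#<div>\s*<a href\="\#_ftnref([0-9]+)">\[([0-9]+)\]</a> (.*)\s*</div>\s+#',
);
$replace = array(
    '[ref]Note.[/ref]',
    '',
);

// Pairs are applied in array order, each one to the output of the previous one,
// so the footnote block is stripped only after the in-text reference is swapped.
$result = preg_replace( $find, $replace, $content );
echo $result;
```

One caveat worth keeping in mind: preg_replace treats sequences such as $1 or \1 in a replacement string as backreferences, so footnote text containing them may need escaping before it goes into the replace array.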
Putting it all together, the plugin adds a filter hook to call our function and a meta_value flag to prevent parsing on subsequent saves.
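The following is a rough sketch of how these pieces might fit together, not the plugin's actual source. The function names, the meta key, the choice of the content_save_pre filter, and the assumption that the filtered content arrives slashed are all mine and should be checked against your WordPress version before relying on them.

```php
<?php
/**
 * Pure conversion step: Word-style footnotes in, [ref]...[/ref] shortcodes out.
 * The function name is hypothetical.
 */
function example_convert_word_footnotes( $content ) {
    $pattern = '#<a href\="\#_ftnref([0-9]+)">\[([0-9]+)\]</a> (.*)#';
    if ( ! preg_match_all( $pattern, $content, $footnotes, PREG_SET_ORDER ) ) {
        return $content; // nothing that looks like a Word footnote
    }

    $find    = array();
    $replace = array();
    foreach ( $footnotes as $footnote ) {
        $find[]    = '#<a href\="\#_ftn' . $footnote[1] . '">\[' . $footnote[1] . '\]</a>#';
        $replace[] = '[ref]' . str_replace( array( "\r\n", "\r", "\n" ), '', $footnote[3] ) . '[/ref]';
    }

    // Remove the original footnote blocks last.
    $find[]    = '#<div>\s*<a href\="\#_ftnref([0-9]+)">\[([0-9]+)\]</a> (.*)\s*</div>\s+#';
    $replace[] = '';

    return preg_replace( $find, $replace, $content );
}

/**
 * Hypothetical filter callback. Assumes a global $post is set while saving and
 * that the content passed to the filter is slashed; verify both before use.
 */
function example_parse_word_footnotes( $content ) {
    global $post;
    if ( ! isset( $post->ID ) ) {
        return $content;
    }
    // Assumed meta key: skip posts whose footnotes were already converted.
    if ( get_post_meta( $post->ID, '_example_footnotes_parsed', true ) ) {
        return $content;
    }

    $original  = wp_unslash( $content );
    $converted = example_convert_word_footnotes( $original );

    // Design choice of this sketch: only set the flag once something was converted.
    if ( $converted !== $original ) {
        update_post_meta( $post->ID, '_example_footnotes_parsed', 1 );
    }
    return wp_slash( $converted );
}

if ( function_exists( 'add_filter' ) ) {
    // Inside WordPress: run the callback whenever post content is saved.
    add_filter( 'content_save_pre', 'example_parse_word_footnotes' );
} else {
    // Standalone: exercise the pure conversion step on a simplified sample.
    echo example_convert_word_footnotes(
        "A.<a href=\"#_ftn1\">[1]</a>\n<div>\n<a href=\"#_ftnref1\">[1]</a> Note one.\n</div>\n"
    );
}
```

Keeping the conversion in its own function means it can be tried outside WordPress, while the callback only handles the flag and the slashing around it.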
To use, download the plugin file¹ and activate it (be sure you already have Simple Footnotes installed). Copy the content from Word and paste it into the “Paste from Word” box (you may need to toggle the “Kitchen Sink”).²
Thoughts? Improvements? The above code solved a rather stubborn workflow problem in a project I was working on, and hopefully it can do the same for you. Feel free to use/improve the above code.
-
You can even Fork the plugin over on GitHub ↩
Ben Balter is the Director of Hubber Enablement within the Office of the COO at GitHub, the world’s largest software development platform, ensuring all Hubbers can do their best (remote) work. Previously, he served as the Director of Technical Business Operations, and as Chief of Staff for Security, he managed the office of the Chief Security Officer, improving overall business effectiveness of the Security organization through portfolio management, strategy, planning, culture, and values. As a Staff Technical Program manager for Enterprise and Compliance, Ben managed GitHub’s on-premises and SaaS enterprise offerings, and as the Senior Product Manager overseeing the platform’s Trust and Safety efforts, Ben shipped more than 500 features in support of community management, privacy, compliance, content moderation, product security, platform health, and open source workflows to ensure the GitHub community and platform remained safe, secure, and welcoming for all software developers. Before joining GitHub’s Product team, Ben served as GitHub’s Government Evangelist, leading the efforts to encourage more than 2,000 government organizations across 75 countries to adopt open source philosophies for code, data, and policy development. More about the author →