Regular Expression to Parse Word-style Footnotes into WordPress’s Simple Footnotes Format

I needed a quick-and-easy way to parse Microsoft Word’s footnote format into a more web-friendly format for a recent project. After a bit of regular expression hacking, I was able to build a WordPress plugin to automatically convert content pasted from Word into a format readable by Andrew Nacin’s popular Simple Footnotes plugin.

The process is surprisingly simple given WordPress’s extensive filter API. First, to grab the footnotes from Word’s ftnref format:

<?php
//grab all the Word-style footnotes into an array
$pattern = '#<a href\="\#_ftnref([0-9]+)">\[([0-9]+)\]</a> (.*)#';
preg_match_all( $pattern, $content, $footnotes, PREG_SET_ORDER);
?>
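To make the shape of that array concrete, here is a small sketch run against a made-up sample. The markup in $content is an assumption about what pasted Word content looks like by the time it reaches the editor; real output from Word often carries extra attributes, so the pattern may need adjusting.

```php
<?php
// Hypothetical sample: one in-text reference, then the footnote itself in a <div>.
$content = "Some claim.<a href=\"#_ftn1\">[1]</a>\n\n"
         . "<div>\n<a href=\"#_ftnref1\">[1]</a> Source for the claim.\n</div>\n";

$pattern = '#<a href\="\#_ftnref([0-9]+)">\[([0-9]+)\]</a> (.*)#';
preg_match_all( $pattern, $content, $footnotes, PREG_SET_ORDER );

// With PREG_SET_ORDER, each element of $footnotes is one complete match:
// [0] the full match, [1] the number from the href,
// [2] the number from the link text, [3] the footnote text.
$number = $footnotes[0][1];
$text   = $footnotes[0][3];
```

Because the dot does not match a newline by default, the last group stops at the end of the line, which keeps the closing </div> out of the captured text in this sample.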

This creates an array ($footnotes) containing both the footnote number and the footnote text for each match. We then need a way to replace each in-text reference with the parsed footnote so that Simple Footnotes can understand it. I did this by building two arrays, a find array and a replace array, pairing each Word-style footnote reference with its Simple Footnotes-formatted counterpart:

<?php
//build find and replace arrays
foreach ($footnotes as $footnote) {
 $find[] = '#<a href\="\#_ftn'.$footnote[1].'">\['.$footnote[1].'\]</a>#';
 $replace[] = '[ref]' . str_replace( array("\r\n", "\r", "\n"), "", $footnote[3]) . '[/ref]';
}
?>

Finally, so that the entire replacement can be done in a single pass, push one last find/replace pair onto the end of the arrays to remove the original footnotes:

<?php
//remove all the original footnotes when done
$find[] = '#<div>\s*<a href\="\#_ftnref([0-9]+)">\[([0-9]+)\]</a> (.*)\s*</div>\s+#';
$replace[] = '';
?>

Because PHP’s preg_replace function can handle arrays, all we have to do is run a single function:

<?php
$content = preg_replace( $find, $replace, $content );
?>
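As a small illustration of that array behavior, separate from the footnote code: each pattern is paired with the replacement at the same position, and the pairs are applied one after another to the running result. The strings below are invented purely for the demonstration.

```php
<?php
// Two find/replace pairs, matched up by position.
$find    = array( '#cat#', '#dog#' );
$replace = array( 'dog', 'bird' );

// The first pair turns 'cat' into 'dog'; the second pair then runs on that
// result, so both the new 'dog' and the original one become 'bird'.
$result = preg_replace( $find, $replace, 'cat and dog' );
```

Since the pairs run in order, the clean-up pair appended last is applied after the in-text references have been rewritten.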

Putting it all together, including a filter hook to call our function and a meta_value flag to prevent parsing on subsequent saves, the result is:
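A minimal sketch of how the steps above might combine into a single function follows. The function name word_footnotes_to_simple_footnotes and the sample markup are assumptions made for illustration, and the WordPress filter hook and meta flag are left out so the snippet runs on its own; it is not the plugin's actual source.

```php
<?php
// Hypothetical helper (name assumed): converts Word-style footnotes in $content
// into the [ref]...[/ref] shortcodes that Simple Footnotes understands.
function word_footnotes_to_simple_footnotes( $content ) {
    // Step 1: collect each footnote's number and text.
    $pattern = '#<a href\="\#_ftnref([0-9]+)">\[([0-9]+)\]</a> (.*)#';
    if ( ! preg_match_all( $pattern, $content, $footnotes, PREG_SET_ORDER ) ) {
        return $content; // no Word-style footnotes found, leave content alone
    }

    // Step 2: pair each in-text reference with its shortcode replacement.
    $find    = array();
    $replace = array();
    foreach ( $footnotes as $footnote ) {
        $find[]    = '#<a href\="\#_ftn' . $footnote[1] . '">\[' . $footnote[1] . '\]</a>#';
        $replace[] = '[ref]' . str_replace( array( "\r\n", "\r", "\n" ), '', $footnote[3] ) . '[/ref]';
    }

    // Step 3: append the pair that strips the original footnote blocks.
    $find[]    = '#<div>\s*<a href\="\#_ftnref([0-9]+)">\[([0-9]+)\]</a> (.*)\s*</div>\s+#';
    $replace[] = '';

    // Step 4: run every pair in a single call.
    return preg_replace( $find, $replace, $content );
}

// Hypothetical sample input, shaped the way the patterns above expect.
$sample = "Some claim.<a href=\"#_ftn1\">[1]</a>\n\n"
        . "<div>\n<a href=\"#_ftnref1\">[1]</a> Source for the claim.\n</div>\n";

$converted = word_footnotes_to_simple_footnotes( $sample );
```

Inside a plugin, a function like this would presumably be attached with add_filter() and guarded by the post meta flag mentioned above; which hook to use is left open here.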

To use it, download the plugin file[1] and activate it (be sure you already have Simple Footnotes installed). Copy the content from Word and paste it into the “Paste from Word” box (you may need to toggle the “Kitchen Sink”).[2]

Thoughts? Improvements? The above code solved a rather stubborn workflow problem in a project I was working on, and hopefully it can do the same for you. Feel free to use/improve the above code.

  1. Licensed under GPLv2 

  2. You can even Fork the plugin over on GitHub 

benbalter
