Regular Expression to Parse Word-style Footnotes into WordPress’s Simple Footnotes Format

I needed a quick-and-easy way to parse Microsoft Word’s footnote format into a more web-friendly format for a recent project. After a bit of regular expression hacking, I was able to build a WordPress plugin to automatically convert content pasted from Word into a format readable by Andrew Nacin’s popular Simple Footnotes plugin.

The process is surprisingly simple given WordPress’s extensive filter API. First, to grab the footnotes from Word’s ftnref format:

<?php
// Grab all the Word-style footnotes into an array.
// In each match, [1] and [2] hold the footnote number and [3] the footnote text.
$pattern = '#<a href="\#_ftnref([0-9]+)">\[([0-9]+)\]</a> (.*)#';
preg_match_all( $pattern, $content, $footnotes, PREG_SET_ORDER );
?>
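To see what the match step produces, here is a self-contained sketch that runs the same pattern over a small, made-up fragment of pasted Word HTML. The sample markup is an assumption: real Word output may carry extra attributes on the anchors (such as `name` or `title`), which this pattern would not match.

```php
<?php
// Hypothetical sample of pasted Word HTML: one in-text reference plus the
// footnote body at the bottom of the document.
$content = 'Some claim.<a href="#_ftn1">[1]</a>' . "\n\n"
	. "<div>\n" . '<a href="#_ftnref1">[1]</a> Source for the claim.' . "\n</div>\n";

$pattern = '#<a href="\#_ftnref([0-9]+)">\[([0-9]+)\]</a> (.*)#';
preg_match_all( $pattern, $content, $footnotes, PREG_SET_ORDER );

// With PREG_SET_ORDER, each entry is one complete match:
// [0] => whole match, [1] and [2] => footnote number, [3] => footnote text.
// The in-text anchor (#_ftn1) is not matched, because the pattern requires #_ftnref.
foreach ( $footnotes as $footnote ) {
	echo $footnote[1], ': ', $footnote[3], "\n"; // prints "1: Source for the claim."
}
```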

preg_match_all fills an array ($footnotes) with both the footnote number and the text of each footnote. We then need a way to replace each in-text reference with its footnote text in a form Simple Footnotes can understand. I did this by building two arrays, a find array and a replace array, holding each Word-style footnote reference and its Simple Footnotes counterpart:

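<?php
//build find and replace arrays
$find    = array();
$replace = array();
foreach ( $footnotes as $footnote ) {
	// In-text reference, e.g. <a href="#_ftn1">[1]</a>
	$find[] = '#<a href="\#_ftn' . $footnote[1] . '">\[' . $footnote[1] . '\]</a>#';
	// Strip line breaks, then escape \ and $ so preg_replace won't read them as backreferences.
	$text      = str_replace( array( "\r\n", "\r", "\n" ), '', $footnote[3] );
	$replace[] = '[ref]' . str_replace( array( '\\', '$' ), array( '\\\\', '\\$' ), $text ) . '[/ref]';
}
?>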
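A quick way to check the arrays is to run the loop on a hypothetical one-footnote `$footnotes` array and apply the result to an in-text reference. The sample text deliberately contains a dollar sign: preg_replace reads `$5` in a replacement string as a backreference to group 5 (and substitutes an empty string when no such group exists), so this sketch escapes `\` and `$` before wrapping the text in `[ref]` tags.

```php
<?php
// Hypothetical $footnotes entry, shaped like preg_match_all's PREG_SET_ORDER output.
$footnotes = array(
	array( '<a href="#_ftnref1">[1]</a> Costs $5 per copy.', '1', '1', 'Costs $5 per copy.' ),
);

$find    = array();
$replace = array();
foreach ( $footnotes as $footnote ) {
	$find[] = '#<a href="\#_ftn' . $footnote[1] . '">\[' . $footnote[1] . '\]</a>#';
	$text   = str_replace( array( "\r\n", "\r", "\n" ), '', $footnote[3] );
	// Escape \ first, then $, so the backslashes added for $ are not doubled.
	$replace[] = '[ref]' . str_replace( array( '\\', '$' ), array( '\\\\', '\\$' ), $text ) . '[/ref]';
}

echo $find[0], "\n"; // prints #<a href="\#_ftn1">\[1\]</a>#
echo preg_replace( $find, $replace, 'A claim.<a href="#_ftn1">[1]</a>' ), "\n";
// prints "A claim.[ref]Costs $5 per copy.[/ref]"
```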

Finally, so that the entire replacement can be done in a single call, append one last find/replace pair to the end of the arrays to remove the original footnotes:

<?php
//remove all the original footnotes when done
$find[] = '#<div>\s*<a href\="\#_ftnref([0-9]+)">\[([0-9]+)\]</a> (.*)\s*</div>\s+#';
$replace[] = '';
?>
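As a sanity check, the removal pattern can be run on its own against a hypothetical post body in which the in-text reference has already been converted and only the original footnote block remains:

```php
<?php
// Hypothetical content: the [ref] tag is already in place, but Word's
// original footnote block is still sitting at the bottom.
$content = 'Some claim.[ref]Source.[/ref]' . "\n\n<div>\n"
	. '<a href="#_ftnref1">[1]</a> Source.' . "\n</div>\n";

// Same removal pattern as above.
$find    = '#<div>\s*<a href\="\#_ftnref([0-9]+)">\[([0-9]+)\]</a> (.*)\s*</div>\s+#';
$cleaned = preg_replace( $find, '', $content );

echo $cleaned; // the footnote <div> is gone; only the converted text remains
```

Note the trailing `\s+`: the pattern only matches when some whitespace follows the closing `</div>`, so a footnote block at the very end of the content with nothing after it would be left in place.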

Because PHP’s preg_replace function accepts arrays of patterns and replacements (applying each pair in turn), all we have to do is make a single call:

<?php
$content = preg_replace( $find, $replace, $content );
?>
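The order of the pairs matters, because preg_replace applies each pattern to the output of the previous one rather than to the original string. A toy example with placeholder patterns makes the chaining visible:

```php
<?php
// Each pattern runs, in array order, over the result of the previous replacement.
$find    = array( '#a#', '#b#' );
$replace = array( 'b', 'c' );

// 'ab' -> 'bb' after the first pair -> 'cc' after the second.
echo preg_replace( $find, $replace, 'ab' ), "\n"; // prints "cc"
```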

Putting it all together, the finished plugin adds a filter hook to call our function and a meta_value flag to prevent parsing on subsequent saves.
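A rough sketch of how those pieces might be wired together follows; it is not the downloadable plugin itself. It wraps the three steps in one function and calls it from the `wp_insert_post_data` filter. The choice of filter, the function names (`wsf_convert_footnotes`, `wsf_filter_post_data`) and the `_wsf_footnotes_parsed` meta key are all hypothetical, chosen for illustration.

```php
<?php
/*
Plugin Name: Word Footnotes to Simple Footnotes (sketch)
Description: Hypothetical sketch that converts Word-style footnotes to [ref] tags on save.
*/

// Hypothetical meta key used as the "already parsed" flag.
const WSF_PARSED_META_KEY = '_wsf_footnotes_parsed';

// Convert Word-style footnotes in $content to Simple Footnotes [ref] tags.
function wsf_convert_footnotes( $content ) {
	$pattern = '#<a href="\#_ftnref([0-9]+)">\[([0-9]+)\]</a> (.*)#';
	if ( ! preg_match_all( $pattern, $content, $footnotes, PREG_SET_ORDER ) ) {
		return $content; // No Word footnotes found; leave content untouched.
	}

	$find    = array();
	$replace = array();
	foreach ( $footnotes as $footnote ) {
		$find[]    = '#<a href="\#_ftn' . $footnote[1] . '">\[' . $footnote[1] . '\]</a>#';
		$text      = str_replace( array( "\r\n", "\r", "\n" ), '', $footnote[3] );
		$replace[] = '[ref]' . str_replace( array( '\\', '$' ), array( '\\\\', '\\$' ), $text ) . '[/ref]';
	}

	// Last pair: remove the original footnote blocks.
	$find[]    = '#<div>\s*<a href\="\#_ftnref([0-9]+)">\[([0-9]+)\]</a> (.*)\s*</div>\s+#';
	$replace[] = '';

	return preg_replace( $find, $replace, $content );
}

// Callback for wp_insert_post_data. Post data arrives slashed here,
// so unslash before matching and re-slash afterwards.
function wsf_filter_post_data( $data, $postarr ) {
	$post_id = isset( $postarr['ID'] ) ? (int) $postarr['ID'] : 0;
	if ( $post_id && get_post_meta( $post_id, WSF_PARSED_META_KEY, true ) ) {
		return $data; // Already converted on an earlier save.
	}

	$content   = wp_unslash( $data['post_content'] );
	$converted = wsf_convert_footnotes( $content );

	if ( $converted !== $content ) {
		$data['post_content'] = wp_slash( $converted );
		if ( $post_id ) {
			update_post_meta( $post_id, WSF_PARSED_META_KEY, 1 );
		}
	}
	return $data;
}

// Register the hook only inside WordPress, so the conversion function
// can also be exercised on its own.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'wp_insert_post_data', 'wsf_filter_post_data', 10, 2 );
}
```

Outside WordPress, `wsf_convert_footnotes()` can be called directly on a string of pasted HTML, which makes the regular expressions easy to test in isolation.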

To use, download the plugin file[1] and activate it (be sure you already have Simple Footnotes installed). Copy the content from Word and paste it into the “Paste from Word” box (you may need to toggle the “Kitchen Sink” first).[2]

Thoughts? Improvements? The code above solved a rather stubborn workflow problem on a project I was working on, and hopefully it can do the same for you. Feel free to use and improve it.

  1. Licensed under GPLv2.

  2. You can even fork the plugin over on GitHub.
