Regular Expression to Parse Word-style Footnotes into WordPress’s Simple Footnotes Format

I needed a quick-and-easy way to parse Microsoft Word’s footnote format into a more web-friendly format for a recent project. After a bit of regular expression hacking, I was able to build a WordPress plugin to automatically convert content pasted from Word into a format readable by Andrew Nacin’s popular Simple Footnotes plugin.

The process is surprisingly simple thanks to WordPress’s extensive filter API. First, grab the footnotes from the _ftnref links in Word’s markup:

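<?php

// Grab all the Word-style footnotes into an array. For each match:
//   [1] = number in the "#_ftnref" link, [2] = bracketed number, [3] = footnote text
$pattern = '#<a href="\#_ftnref([0-9]+)">\[([0-9]+)\]</a> (.*)#';
preg_match_all( $pattern, $content, $footnotes, PREG_SET_ORDER );

?>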
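To make the structure of $footnotes concrete, here is a small, self-contained check against a hand-written snippet. The sample markup is an assumption, shaped to fit the pattern above rather than captured from an actual Word paste. Note that (.*) runs to the end of the line, so anything sharing the line with the footnote text (a closing </div>, say) would be captured as well.

```php
<?php

// Hypothetical pasted content: two in-text references, then the footnote
// blocks, each with its text on its own line.
$content = "<p>Alpha<a href=\"#_ftn1\">[1]</a> and beta<a href=\"#_ftn2\">[2]</a>.</p>\n"
         . "<div>\n<a href=\"#_ftnref1\">[1]</a> First note.\n</div>\n"
         . "<div>\n<a href=\"#_ftnref2\">[2]</a> Second note.\n</div>\n";

$pattern = '#<a href="\#_ftnref([0-9]+)">\[([0-9]+)\]</a> (.*)#';
preg_match_all( $pattern, $content, $footnotes, PREG_SET_ORDER );

foreach ( $footnotes as $footnote ) {
    // [1] = anchor number, [2] = bracketed number, [3] = rest of the line
    printf( "%s | %s | %s\n", $footnote[1], $footnote[2], $footnote[3] );
}
// prints "1 | 1 | First note." then "2 | 2 | Second note."
```

The in-text links (#_ftn1, #_ftn2) are not matched, because the pattern requires the literal _ftnref prefix.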

This creates an array ($footnotes) holding both the number and the text of each footnote. Next, the in-text references need to be replaced with the parsed footnotes in a form Simple Footnotes understands. I did this by building two arrays, a find array and a replace array, pairing each Word-style footnote reference with its Simple Footnotes counterpart:

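<?php

// Build parallel find and replace arrays, one pair per footnote
$find    = array();
$replace = array();
foreach ( $footnotes as $footnote ) {
  // In-text reference, e.g. <a href="#_ftn1">[1]</a>
  $find[] = '#<a href="\#_ftn' . $footnote[1] . '">\[' . $footnote[1] . '\]</a>#';
  // Strip line breaks from the footnote text, and escape "\" and "$",
  // which preg_replace would otherwise treat as backreferences
  $text      = str_replace( array( "\r\n", "\r", "\n" ), '', $footnote[3] );
  $replace[] = '[ref]' . addcslashes( $text, '\\$' ) . '[/ref]';
}

?>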
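One detail worth checking when building $replace: preg_replace interprets sequences like $1, ${1} and \1 in a replacement string as backreferences, so footnote text such as “It cost $5” would silently lose characters. The sketch below compares a raw replacement with one escaped via addcslashes(); the footnote text and the hard-coded pattern are made-up examples.

```php
<?php

// Hypothetical in-text reference and footnote text containing a dollar sign
$content = 'Price<a href="#_ftn1">[1]</a>.';
$text    = 'It cost $5 in 1999.';
$find    = '#<a href="\#_ftn1">\[1\]</a>#';

// Unescaped: "$5" is read as a reference to capture group 5, which does not exist
$raw = preg_replace( $find, '[ref]' . $text . '[/ref]', $content );

// Escaped: addcslashes() prefixes "\" and "$" with a backslash, so both survive literally
$safe = preg_replace( $find, '[ref]' . addcslashes( $text, '\\$' ) . '[/ref]', $content );

echo $raw, "\n";   // the "$5" is gone
echo $safe, "\n";  // prints "Price[ref]It cost $5 in 1999.[/ref]."
```

An alternative would be preg_replace_callback(), whose callback return value is inserted literally, but escaping keeps the single array-based preg_replace call intact.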

Finally, so that the entire replacement can be done in a single call, push one last find/replace pair onto the end of the arrays to remove the original footnotes:

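<?php

// Remove all the original footnote blocks when done
// (\s* rather than \s+ at the end, so a block at the very end of the content is removed too)
$find[]    = '#<div>\s*<a href="\#_ftnref([0-9]+)">\[([0-9]+)\]</a> (.*)\s*</div>\s*#';
$replace[] = '';

?>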
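As a quick check of the cleanup pattern, the snippet below runs it alone against footnote blocks shaped the way it expects: a bare <div> wrapper with the text on its own line (an assumption; real pasted markup may differ). It compares ending the pattern with \s+ against \s*, for the case where the last footnote block sits at the very end of the content with no trailing whitespace.

```php
<?php

// Hypothetical content whose last footnote block ends the string with no trailing newline
$content = "<p>Body text.</p>\n"
         . "<div>\n<a href=\"#_ftnref1\">[1]</a> First note.\n</div>\n"
         . "<div>\n<a href=\"#_ftnref2\">[2]</a> Second note.\n</div>";

$strict  = '#<div>\s*<a href="\#_ftnref([0-9]+)">\[([0-9]+)\]</a> (.*)\s*</div>\s+#';
$lenient = '#<div>\s*<a href="\#_ftnref([0-9]+)">\[([0-9]+)\]</a> (.*)\s*</div>\s*#';

// With \s+ the final block is left behind; with \s* both blocks are removed
$after_strict  = preg_replace( $strict, '', $content );
$after_lenient = preg_replace( $lenient, '', $content );

echo $after_lenient; // prints "<p>Body text.</p>" followed by a newline
```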

Because PHP’s preg_replace function accepts arrays of patterns and replacements (applying each pattern in turn, in array order), all we have to do is call it once:

<?php

$content = preg_replace( $find, $replace, $content );

?>

Putting it all together, the plugin wraps these steps in a function called from a filter hook, and sets a meta_value flag so that content isn't parsed again on subsequent saves.
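The steps above can be combined into a single function along these lines. This is a sketch, not the plugin itself: the function names are hypothetical, hooking content_save_pre is an assumed choice of filter, and the meta_value flag is left out (once converted, the content contains no _ftnref links, so running the function again changes nothing). Because WordPress passes slashed content to content_save_pre, the wrapper unslashes before parsing and re-slashes afterwards.

```php
<?php

// Hypothetical helper: convert Word-style footnotes into Simple Footnotes [ref] shortcodes
function wsf_convert_word_footnotes( $content ) {
    $pattern = '#<a href="\#_ftnref([0-9]+)">\[([0-9]+)\]</a> (.*)#';
    if ( ! preg_match_all( $pattern, $content, $footnotes, PREG_SET_ORDER ) ) {
        return $content; // nothing to convert
    }

    $find    = array();
    $replace = array();
    foreach ( $footnotes as $footnote ) {
        // In-text reference, e.g. <a href="#_ftn1">[1]</a>
        $find[] = '#<a href="\#_ftn' . $footnote[1] . '">\[' . $footnote[1] . '\]</a>#';
        $text   = str_replace( array( "\r\n", "\r", "\n" ), '', $footnote[3] );
        // Escape "\" and "$" so preg_replace does not treat them as backreferences
        $replace[] = '[ref]' . addcslashes( $text, '\\$' ) . '[/ref]';
    }

    // Last pair: remove the original footnote blocks
    $find[]    = '#<div>\s*<a href="\#_ftnref([0-9]+)">\[([0-9]+)\]</a> (.*)\s*</div>\s*#';
    $replace[] = '';

    return preg_replace( $find, $replace, $content );
}

// Hypothetical WordPress wiring; skipped when run outside WordPress
function wsf_filter_content_on_save( $content ) {
    return wp_slash( wsf_convert_word_footnotes( wp_unslash( $content ) ) );
}
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'content_save_pre', 'wsf_filter_content_on_save' );
}
```

Keeping the conversion in a plain function, separate from the hook, makes it easy to test outside WordPress and to move to a different filter if content_save_pre turns out not to fit.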

To use it, download the plugin file[1] and activate it (be sure you already have Simple Footnotes installed). Copy the content from Word and paste it into the “Paste from Word” box (you may need to toggle the “Kitchen Sink” first).[2]

Thoughts? Improvements? The above code solved a rather stubborn workflow problem in a project I was working on, and hopefully it can do the same for you. Feel free to use or improve it.

  1. Licensed under GPLv2

  2. You can even fork the plugin on GitHub.
