Saturday, July 2, 2011

mediawiki - Migrating away from vqwiki

Some time ago we had to migrate all the pages of our wiki; these few lines describe my little adventure.

For historical reasons we had adopted VQWiki. I must admit that for a long time it not only met our needs but also brought us a new way to write documentation, at a time when the official tool was MS SharePoint. At some point, however, we felt the need to change, and there were several reasons:
  • VQWiki is no longer maintained (the last product update was in 03/2009).
  • It has some bugs that were becoming really annoying. For example, tables have no borders, so you had to add an HTML style tag directly in the page to redefine the table CSS class.
  • VQWiki housed much of our (precious) documentation.
We needed a more flexible product, one that gives us security and continuity. So, with the start of our latest project, I decided to migrate to a new product. Among the many wikis available on the Internet, my choice fell on MediaWiki, then at version 1.16.5.

Note that in our case VQWiki saved all pages as text files, so this post cannot help you much if your VQWiki uses a database to persist pages. What it does explain is how to manipulate those files and turn them into a format compatible with the import utility provided by MediaWiki.

Let's finally get started with our 15 easy steps of VQWiki migration. Please download the scripts here.

I assume you have already successfully installed and configured MediaWiki.
You must copy the scripts into the MediaWiki maintenance directory (first level inside the docroot). For the sake of simplicity, let's say this directory is:

cd /var/www/mediawiki/maintenance

The scripts to copy are:

mediawiki_import.sh - a bash script that iterates over all files (the old wiki's pages) and runs patch_vqwiki.pl on each one.
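To give an idea of what such a loop involves, here is a minimal sketch. It is not the actual mediawiki_import.sh from the download: the function name convert_all is hypothetical, and I assume here that pages are stored as *.txt files and that the converter reads a page on standard input and writes the result to standard output.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the original mediawiki_import.sh.
# Usage: convert_all SRC_DIR OUT_DIR CONVERTER [ARGS...]
# Assumption: pages are *.txt files and the converter works as a
# filter (stdin -> stdout).
convert_all() {
    local src_dir=$1 out_dir=$2
    shift 2
    mkdir -p "$out_dir"
    local page
    for page in "$src_dir"/*.txt; do
        # Skip the literal pattern when the directory has no *.txt files.
        [ -e "$page" ] || continue
        # Run the converter and keep the same file name in OUT_DIR.
        "$@" < "$page" > "$out_dir/$(basename "$page")"
    done
}
```

With those assumptions, a call such as `convert_all /path/to/vqwiki/pages /tmp/converted perl patch_vqwiki.pl` would leave one converted file per page in /tmp/converted (both paths are placeholders).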

patch_vqwiki.pl - converts VQWiki format tags. Just to be clear, I'm not a Perl guru, so this script could probably be done better. Anyway, Perl seemed the right language for a pattern matching and replacing task on a large number of files. It was also an opportunity to learn Perl.
This is the code that takes care of modifying the pages' structure. It takes a file in VQWiki format as input and generates a new file in a format that can then be processed by importTextFile.php (a utility directly available in MediaWiki).
We have many tags to convert; the most important are those relating to formatting (e.g. bold, headings, tables, etc.), attachments and images (e.g. img, doc, pdf, ppt, etc.).
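To illustrate the kind of rewriting involved, here is a small sketch that uses sed instead of Perl, so it is not the code of patch_vqwiki.pl. The source syntax is an assumption made only for this example: headings marked by one to three leading `!` characters and bold text wrapped in double underscores. Check it against your own pages before relying on it. The target syntax (`== Heading ==`, `'''bold'''`) is standard MediaWiki markup. The function name vq_to_mw is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the original patch_vqwiki.pl.
# Assumed source syntax (verify against your own pages):
#   !!!Heading   !!Heading   !Heading   __bold__
# Reads a page on stdin and writes MediaWiki markup to stdout.
vq_to_mw() {
    # Headings are tried from the longest marker to the shortest, so a
    # line already rewritten (it now starts with "=") is not matched again.
    sed -E \
        -e 's/^!!!(.*)$/== \1 ==/' \
        -e 's/^!!(.*)$/=== \1 ===/' \
        -e 's/^!(.*)$/==== \1 ====/' \
        -e "s/__([^_]+)__/'''\\1'''/g"
}
```

For example, `printf '!!!Title\n' | vq_to_mw` prints `== Title ==`. Tables, attachments and images need more than single-line substitutions, which is one reason a full scripting language is a better fit for the real job.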

At the end of the text page conversion process, we must run the importImages.php utility for all images and attachments.
Don't let the name importImages.php fool you: this utility analyzes and loads any type of file, you simply have to specify the extension. In my case I needed "only" png, jpg, jpeg, gif, zip, doc, pdf, rtf, xls, xlsx, xml, ppt, docx ...
Obviously, if the extension you need is not in this list, just add it to the script by passing the correct value to the --extensions parameter.
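As a sketch of how the extension list ends up on the command line, the helper below only prints the importImages.php command instead of running it, so you can check it first. The function name import_files_cmd and the directory in the example are hypothetical; --extensions takes a comma-separated list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints (does not run) an importImages.php command.
# Usage: import_files_cmd DIR EXT [EXT...]
import_files_cmd() {
    local dir=$1
    shift
    local exts
    # Join the remaining arguments with commas, e.g. "png,jpg,pdf".
    exts=$(IFS=,; echo "$*")
    printf 'php importImages.php --extensions=%s %s\n' "$exts" "$dir"
}
```

For example, `import_files_cmd /path/to/attachments png jpg pdf` prints `php importImages.php --extensions=png,jpg,pdf /path/to/attachments`, which you can then run from the maintenance directory.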