WikiText and DocBook almost valid/well-formed

If most of your information is going to be output as HTML and you like the default format that Mylyn WikiText produces, then you do not need to read any further. However, if you are a big fan of single sourcing and want to produce DocBook output, keep reading, as there are some gotchas (yes, I opened a bug for the big one).

There is already a good example of DocBook generation, so I won’t document how to do that. However, there are several problems:

1. If your MediaWiki page uses code or pre tags, you can run into problems: WikiText will generate XML that is not well-formed.

2. Items like <strike> are passed through, but these aren’t valid DocBook elements. They need to be wrapped in an emphasis element with the appropriate role attribute specified.

I’ve opened bug 296705 to help address the first one.
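
A quick way to catch the first problem before it reaches the DocBook toolchain is to run the generated file through an XML parser. The following is a minimal sketch using the JDK's built-in parser; the class name `WellFormedCheck` and the sample fragments are hypothetical and only illustrate the kind of breakage to look for. Note that it checks well-formedness only, not validity against the DocBook schema.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Mylyn WikiText.
final class WellFormedCheck {

    // Returns true if the given text parses as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the document: it is not well-formed.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up fragments: one clean, one with an unclosed pre tag.
        System.out.println(isWellFormed("<para><literallayout>x</literallayout></para>"));
        System.out.println(isWellFormed("<para><pre>x</para>"));
    }
}
```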

Now for some possible workarounds to the issues:

Using Ant, one can use the replace task to search for the <pre> and </pre> tags and just replace them. WikiText's output already wraps the text in literallayout elements.

The same technique can be used to change strike to emphasis.
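
For anyone who would rather not do this in Ant, the same search-and-replace idea can be sketched in plain Java. This is only an illustration under a few assumptions: the class name `DocBookTagFixer` is made up, the pre tags are simply dropped on the assumption that the text is already wrapped in literallayout, and `role="strikethrough"` is my choice of role value, so check what your stylesheets expect.

```java
// Hypothetical helper, not part of Mylyn WikiText or Ant.
final class DocBookTagFixer {

    // Applies plain string replacements to the generated DocBook text.
    static String fixTags(String docbook) {
        return docbook
            // Assumption: the content is already inside literallayout,
            // so the stray pre tags can simply be removed.
            .replace("<pre>", "")
            .replace("</pre>", "")
            // Assumption: "strikethrough" is the role the stylesheets understand.
            .replace("<strike>", "<emphasis role=\"strikethrough\">")
            .replace("</strike>", "</emphasis>");
    }

    public static void main(String[] args) {
        System.out.println(fixTags("<para><strike>old text</strike></para>"));
    }
}
```

Plain string replacement is blunt: it will also rewrite these tags if they appear as literal text inside a code sample, so review the result.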

WikiText is like any automated converter I have used: it gets the job done, but the output it produces won’t win a beauty contest. In many cases it puts in too many tags when it shouldn’t. For example, code samples get extra literallayout elements when one literallayout or programlisting block would do nicely.


<literallayout>/**
* First load and optionally validate the XML document
*/
</literallayout>
<literallayout>// Create an InputStream from the XML document
InputStream is = new FileInputStream("XPexample.xml");
</literallayout>
<literallayout>// Initializing the Xerces DOM loader.
DOMLoader loader = new XercesLoader();
</literallayout>
<literallayout>// Optionally set flag to validate XML document
loader.setvalidating(validate);
// Loads the XML document and stores the DOM root
Document doc = loader.load(is);
</literallayout>

This ideally would be:


<literallayout>/**
* First load and optionally validate the XML document
*/

// Create an InputStream from the XML document
InputStream is = new FileInputStream("XPexample.xml");

// Initializing the Xerces DOM loader.
DOMLoader loader = new XercesLoader();

// Optionally set flag to validate XML document
loader.setvalidating(validate);
// Loads the XML document and stores the DOM root
Document doc = loader.load(is);
</literallayout>
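
Until the generator does this itself, adjacent blocks can be collapsed as a post-processing step. Here is a rough sketch, assuming the blocks are separated only by whitespace as in the output above; the class name `LiteralLayoutMerger` is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Mylyn WikiText.
final class LiteralLayoutMerger {

    // A closing tag followed, after optional whitespace, by a new opening tag.
    private static final Pattern ADJACENT =
        Pattern.compile("</literallayout>\\s*<literallayout>");

    // Joins back-to-back literallayout blocks into one block,
    // leaving a line break where the tag pair used to be.
    static String merge(String docbook) {
        return ADJACENT.matcher(docbook).replaceAll("\n");
    }

    public static void main(String[] args) {
        String generated =
            "<literallayout>// Create an InputStream from the XML document\n</literallayout>\n"
            + "<literallayout>// Initializing the Xerces DOM loader.\n</literallayout>";
        System.out.println(merge(generated));
    }
}
```

This also merges blocks that were meant to stay separate if nothing but whitespace sits between them, so it is a cleanup aid rather than a fix.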

People tend to complain about how wordy XML can be, and automated conversions don’t help when they aren’t optimized to generate the best output possible. That said, the various programs that convert wiki markup to DocBook all run into issues. None are perfect, and some manual cleanup always has to be done.


3 Responses to WikiText and DocBook almost valid/well-formed

  1. David Green says:

    Great to see efforts towards single-sourcing documentation via Eclipsepedia!

    In MediaWiki, lines that start with the space character are converted to preformatted text. Mylyn WikiText doesn't yet have a rule to detect <pre> tags. Your code sample is split into multiple literallayout tags as a result of leading spaces and this missing rule. With a block rule to detect <pre> added to Mylyn WikiText, this problem would go away.

    As with all projects, the quality of wiki markup language implementations in Mylyn WikiText is a direct result of the level of community contribution. MediaWiki language parsing has a few rough edges around embedded HTML tags in markup. Hopefully renewed interest in documentation at Eclipse will drive some improvements into Mylyn WikiText in this area.

  2. David Carver says:

    @David Since I have the source code for the standalone version, I plan to submit some patches to the bug I mentioned. I will attach something to the bug once I have it completed.

  3. David Green says:

    Fantastic! I'll see you on the bug.
