>If most of your content will be output as HTML and you like the default format that Mylyn WikiText produces, you do not need to read any further. However, if you are a big fan of single sourcing and want to produce DocBook output, keep reading, as there are some gotchas (yes, I opened a bug for the big one).
There is already a good example of DocBook generation, so I won’t document how to do that. However, there are several problems:
1. If your MediaWiki page uses code or pre tags, you can run into problems. WikiText will generate XML that is not well-formed.
2. Elements like <strike> are passed through unchanged, but they aren’t valid DocBook elements. They need to be converted to emphasis elements with the appropriate role attribute specified.
I’ve opened bug 296705 to help address the first one.
Now for some possible workarounds to these issues:
Using Ant, one can use the replace task to search for the <pre> and </pre> tags and replace them; WikiText’s output already wraps the text in literallayout elements.
The same technique can be used to change strike to emphasis.
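As a rough sketch of both workarounds, an Ant target along these lines could post-process the generated file. The file path, project name, and target name are hypothetical, and the role value "strikethrough" is an assumption on my part (pick whatever role your stylesheets expect). It also assumes the stray pre tags can simply be dropped because the text already sits inside literallayout elements.

```xml
<project name="wikitext-docbook-cleanup" default="cleanup">
  <!-- Hypothetical location of the DocBook file generated by WikiText -->
  <property name="docbook.file" value="build/doc/guide.xml"/>

  <target name="cleanup">
    <replace file="${docbook.file}" encoding="UTF-8">
      <!-- Drop the pre tags that were passed through untouched -->
      <replacefilter token="&lt;pre&gt;" value=""/>
      <replacefilter token="&lt;/pre&gt;" value=""/>
      <!-- Map strike to emphasis; the role value is an assumption -->
      <replacefilter token="&lt;strike&gt;"
                     value="&lt;emphasis role=&quot;strikethrough&quot;&gt;"/>
      <replacefilter token="&lt;/strike&gt;" value="&lt;/emphasis&gt;"/>
    </replace>
  </target>
</project>
```

Because this is a plain text substitution, it will also touch any literal occurrence of these tags inside sample text, so it is worth checking the result by hand.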
WikiText is like any automated converter I have used: it gets the job done, but the output it produces won’t win a beauty contest. In many cases it inserts too many tags, e.g. code samples get split across extra literallayout elements when a single literallayout or programlisting block would do nicely.
<literallayout>/**
* First load and optionally validate the XML document
*/
</literallayout>
<literallayout>// Create an InputStream from the XML document
InputStream is = new FileInputStream("XPexample.xml");
</literallayout>
<literallayout>// Initializing the Xerces DOM loader.
DOMLoader loader = new XercesLoader();
</literallayout>
<literallayout>// Optionally set flag to validate XML document
loader.setvalidating(validate);
// Loads the XML document and stores the DOM root
Document doc = loader.load(is);
</literallayout>
This ideally would be:
<literallayout>/**
* First load and optionally validate the XML document
*/
// Create an InputStream from the XML document
InputStream is = new FileInputStream("XPexample.xml");
// Initializing the Xerces DOM loader.
DOMLoader loader = new XercesLoader();
// Optionally set flag to validate XML document
loader.setvalidating(validate);
// Loads the XML document and stores the DOM root
Document doc = loader.load(is);
</literallayout>
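Part of this cleanup could probably be automated in the same Ant build. The sketch below uses the optional replaceregexp task to collapse a closing literallayout tag that is immediately followed by an opening one into a single line break. The file path and the project and target names are hypothetical, and it assumes the tags carry no attributes, as in the sample above.

```xml
<project name="wikitext-docbook-merge" default="merge-literallayout">
  <!-- Hypothetical location of the DocBook file generated by WikiText -->
  <property name="docbook.file" value="build/doc/guide.xml"/>

  <target name="merge-literallayout">
    <!-- Replace "</literallayout>" + optional whitespace + "<literallayout>"
         with one newline, across the whole file (flags="g") -->
    <replaceregexp file="${docbook.file}"
                   match="&lt;/literallayout&gt;\s*&lt;literallayout&gt;"
                   replace="&#10;"
                   flags="g"
                   encoding="UTF-8"/>
  </target>
</project>
```

Note that this merges every pair of adjacent literallayout blocks, including ones that were meant to stay separate, so the result still needs a quick review.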
People tend to complain about how wordy XML can be, and automated conversions don’t help when they aren’t optimized to generate the best output possible. That said, the various programs that convert wiki markup to DocBook all run into issues. None are perfect, and some manual cleanup is always needed.
>Great to see efforts towards single-sourcing documentation via Eclipsepedia! In MediaWiki, lines that start with the space character are converted to preformatted text. Mylyn WikiText doesn't yet have a rule to detect <pre> tags. Your code sample is split into multiple literallayout tags as a result of leading spaces and this missing rule. With a block rule to detect <pre> added to Mylyn WikiText, this problem would go away. As with all projects, the quality of wiki markup language implementations in Mylyn WikiText is a direct result of the level of community contribution. MediaWiki language parsing has a few rough edges around embedded HTML tags in markup. Hopefully renewed interest in documentation at Eclipse will drive some improvements into Mylyn WikiText in this area.
>@David Since I have the source code for the standalone version, I plan to submit some patches to the bug I mentioned. I will attach something to the bug once I have it completed.
>Fantastic! I'll see you on the bug.