<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Bandos&#039; Arcade &#187; xml</title>
	<atom:link href="http://www.nuwanbando.com/tag/xml/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.nuwanbando.com</link>
	<description>&#34;It&#039;s not about how it is, but how I see it &#34; - Stranger Than Fiction</description>
	<lastBuildDate>Thu, 02 Feb 2012 08:52:48 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Convert from HTML to XML with HTML Tidy</title>
		<link>http://www.nuwanbando.com/2009/09/convert-from-html-to-xml-with-html-tidy/</link>
		<comments>http://www.nuwanbando.com/2009/09/convert-from-html-to-xml-with-html-tidy/#comments</comments>
		<pubDate>Wed, 16 Sep 2009 06:37:18 +0000</pubDate>
		<dc:creator>Nuwan Bandara</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Maven]]></category>
		<category><![CDATA[Tidy]]></category>
		<category><![CDATA[xml]]></category>

		<guid isPermaLink="false">http://www.nuwanbando.com/?p=199</guid>
		<description><![CDATA[For a few days I was involved with the WSO2 Mashup Server 2.0 release documentation, giving a hand to the mashup team. Documentation is a painful task, but when it comes to open source, documentation is what matters most. Last night I had to convert a bunch of HTML files (some Java API docs) to XML in order [...]]]></description>
			<content:encoded><![CDATA[
<p>For a few days I was involved with the <a href="http://wso2.org/projects/mashup">WSO2 Mashup Server</a> 2.0 release documentation, giving a hand to the mashup team. Documentation is a painful task, but when it comes to open source, documentation is what matters most :D<br />
Last night I had to convert a bunch of HTML files (some Java API docs) to XML in order to port them into a Maven site. Formatting 30+ HTML files to XML by hand, !@#$%^&amp;*@% :D So I went googling for a tool to automate the task. A few clicks later I found a nice article on <a href="http://www.ibm.com">Big Blue</a>&#8217;s developerWorks site about a tool called &#8220;<a href="http://tidy.sourceforge.net/">Tidy</a>&#8221;. When I went to download it, I figured out that you can simply apt-get the package and use it. So,</p>
<pre name="code" class="xml">sudo apt-get install tidy</pre>
<p>and your box is now equipped with the tool, which can be run from the shell.</p>
<pre name="code" class="xml">tidy -asxhtml -numeric < index.html > index.xml</pre>
<p>But who wants to convert file by file with such a nice tool at hand? So I spent a few minutes writing a tiny shell script to get the job done. The snippet is:</p>
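<pre name="code" class="xml">
#!/bin/bash
# Convert every .html file under the directory given as $1 to XHTML.
# -print0 with read -d '' keeps paths containing spaces intact.
find "$1" -type f -iname '*.html' -print0 | while IFS= read -r -d '' file; do
	# Swap only the trailing extension, leaving the rest of the path alone.
	myf="${file%.*}.xml"
	tidy -asxhtml -numeric < "$file" > "$myf"
done
</pre>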
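<p>A note on deriving the output name: a global sed substitution such as s/html/xml/g rewrites every &#8220;html&#8221; in the path, directory names included, whereas stripping only the trailing extension leaves the rest of the path untouched. Here is a minimal sketch of the difference; the function names and the sample path are made up for illustration.</p>

```shell
#!/bin/bash
# Hypothetical helpers, named for illustration only.

# Global substitution: rewrites every "html" in the path.
name_with_sed() {
	echo "$1" | sed 's/html/xml/g'
}

# Parameter expansion: drops only the final extension, then appends .xml.
name_with_expansion() {
	printf '%s\n' "${1%.*}.xml"
}

name_with_sed 'docs/html/index.html'        # prints docs/xml/index.xml
name_with_expansion 'docs/html/index.html'  # prints docs/html/index.xml
```

<p>The second form also copes with upper-case extensions such as INDEX.HTML, which find -iname matches but a lower-case sed pattern would miss.</p>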
<p>All looked good and worked fine. However, my API docs had a few special tags, custom to our Mashup APIs (&lt;imconfig&gt;, &lt;yahoo&gt;, &lt;mail:config&gt;). Tidy reported errors for these files since it does not recognize the tags. </p>
<p>In such a case you can teach Tidy the new tags by adding a line to the Tidy configuration file (/etc/tidy.config; you can also pass your own config file on the command line with the -config option).</p>
<pre name="code" class="xml">new-pre-tags: imconfig, yahoo, msn, aim, icq, jabber, username, password</pre>
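<p>If you would rather not touch the system-wide file, the same setting can live in a private config passed with -config. This is a rough sketch, not the setup I actually used: the file name my-tidy.conf is an assumption, and the output-xhtml and numeric-entities lines are the config-file equivalents of the -asxhtml and -numeric switches.</p>

```shell
#!/bin/bash
# Write a private Tidy config; the file name my-tidy.conf is an assumption.
cat > my-tidy.conf <<'EOF'
output-xhtml: yes
numeric-entities: yes
new-pre-tags: imconfig, yahoo, msn, aim, icq, jabber, username, password
EOF

# Run Tidy with that config, but only if Tidy and a sample input are present.
if command -v tidy >/dev/null 2>&1 && [ -f index.html ]; then
	# Tidy exits with 1 on warnings and 2 on errors, so report the status instead of aborting.
	tidy -config my-tidy.conf < index.html > index.xml || echo "tidy exited with status $?"
fi
```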
<p>There are a whole bunch of tweaks you can do with Tidy; [<a href="http://www.ibm.com/developerworks/library/x-tiptidy.html">1</a>], [<a href="http://tidy.sourceforge.net/">2</a>] and [<a href="http://tidy.sourceforge.net/docs/tidy_man.html">3</a>] are some useful links to read up on when using the tool.</p>
<p>[1] : <a href="http://www.ibm.com/developerworks/library/x-tiptidy.html">http://www.ibm.com/developerworks/library/x-tiptidy.html</a><br />
[2] : <a href="http://tidy.sourceforge.net/">http://tidy.sourceforge.net/</a><br />
[3] : <a href="http://tidy.sourceforge.net/docs/tidy_man.html">http://tidy.sourceforge.net/docs/tidy_man.html</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.nuwanbando.com/2009/09/convert-from-html-to-xml-with-html-tidy/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

