On Fri, 2004-02-27 at 11:15, Dave Pawson wrote:
> My only other suggestion is to chunk the source into tiny bits, then use a plain text to xml program, and chunk the big bits via some other program into article or somesuch.
May I suggest html2db:
http://www.cise.ufl.edu/~ppadala/tidy/
It does a very nice, compliant conversion. Converting from HTML, it can't know much more than to turn <pre /> into <literal />, but that kind of thing is easy to fix. I've converted multi-page HTML into DocBook *ML in just a few hours with a simple convert and edit. The structure you get in the end is not the point; what matters is the chunks of markup, which can then be put into a DocBook template.
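
For the <literal /> cleanup, something like the following Python sketch might do as a first pass. It is not part of html2db; the function name promote_multiline_literals and the choice of <programlisting /> as the target element are my own assumptions, and it presumes the converted output is well-formed XML with no undefined entities.

```python
import xml.etree.ElementTree as ET


def promote_multiline_literals(docbook_xml: str) -> str:
    """Rename <literal> elements whose text spans several lines.

    Hypothetical helper: single-line literals are left alone as inline
    markup, multi-line ones become <programlisting>.
    """
    root = ET.fromstring(docbook_xml)
    # Collect matches first so renaming does not interfere with iteration.
    for elem in list(root.iter("literal")):
        text = "".join(elem.itertext())
        # Ignore leading/trailing newlines; only interior ones count.
        if "\n" in text.strip("\n"):
            elem.tag = "programlisting"
    return ET.tostring(root, encoding="unicode")


# Made-up sample input, standing in for converter output.
sample = (
    "<article><para>Run <literal>make</literal> first.</para>"
    "<literal>line one\nline two</literal></article>"
)
print(promote_multiline_literals(sample))
```

Anything the heuristic gets wrong still needs a manual pass, but it cuts down the hand editing.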
Of course, it's nice to have an editor that can do e.g. sgml-tag-region and tag creation/completion for manually marking up missed or incorrect bits.
- Karsten