Hi all,
Has anyone attempted converting PyDoc to Docbook XML recently with success? I've tried a project called HappyDoc but it seems to be unmaintained and pretty poorly documented, though it claims on the project site to be able to generate DocBook XML.
I have also tried generating the PyDoc HTML with `pydoc -w <module>`, converting it to XHTML with tidy:
tidy -indent -m -asxhtml -clean -bare -omit ovirtsdk.infrastructure.brokers.html
...and then turning it into Docbook with an XSLT [1]:
java -cp "/usr/share/java/xalan-j2.jar" org.apache.xalan.xslt.Process -XSL html2db.xsl -IN <module>.html > <module>.xml
The resulting XML still has a *lot* of errors according to XMLLINT, even before applying the DTD rules. Anyone had any success with a variation of this or maybe a different method entirely?
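For anyone who wants to reproduce the "errors even before the DTD" part without xmllint, here is a minimal sketch using only the Python standard library. It checks well-formedness only; `check_well_formed` is a hypothetical helper name, and the standard-library parser does not validate against the DocBook DTD. Named HTML entities such as `&nbsp;` may also be reported as undefined, since no DTD is loaded.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text parses, else a (line, column, message) tuple."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) pair for the first error.
        line, column = err.position
        return (line, column, str(err))
    return None
```

Running this over the XSLT output should at least show where the first well-formedness error sits, which is usually a good hint about which HTML construct the stylesheet mishandled.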
Thanks,
Steve
On Thu, Jul 26, 2012 at 9:26 AM, Steve Gordon <sgordon@redhat.com> wrote:
Has anyone attempted converting PyDoc to Docbook XML recently with success? I've tried a project called HappyDoc but it seems to be unmaintained and pretty poorly documented, though it claims on the project site to be able to generate DocBook XML.
Sorry, I haven't tried this, as I haven't spent a lot of time in PyDoc.
I have also tried generating the PyDoc HTML with `pydoc -w <module>`, converting it to XHTML with tidy:
tidy -indent -m -asxhtml -clean -bare -omit ovirtsdk.infrastructure.brokers.html
...and then turning it into Docbook with an XSLT [1]:
This is almost guaranteed to cause heartburn, as HTML and XHTML tend to tag items based on appearance (bold, italics, CSS styling) rather than their inherent types (section, paragraph, emphasis, program listing, and footnote, just to name a few).
-- Jared Smith
On Thu, Jul 26, 2012 at 09:26:30AM -0400, Steve Gordon wrote:
Hi all,
Has anyone attempted converting PyDoc to Docbook XML recently with success? I've tried a project called HappyDoc but it seems to be unmaintained and pretty poorly documented, though it claims on the project site to be able to generate DocBook XML.
I have also tried generating the PyDoc HTML with `pydoc -w <module>`, converting it to XHTML with tidy:
tidy -indent -m -asxhtml -clean -bare -omit ovirtsdk.infrastructure.brokers.html
...and then turning it into Docbook with an XSLT [1]:
java -cp "/usr/share/java/xalan-j2.jar" org.apache.xalan.xslt.Process -XSL html2db.xsl -IN <module>.html > <module>.xml
The resulting XML still has a *lot* of errors according to XMLLINT, even before applying the DTD rules. Anyone had any success with a variation of this or maybe a different method entirely?
Do you need to be able to turn anything that pydoc displays into DocBook, or just certain things that you have written? Is this for an ongoing conversion or a one-off? pydoc itself is a somewhat naive displayer of the text that's in docstrings. Docstrings are just text formatted so that humans can look at them and assign meaning based on their past experience, so simply converting pydoc's output to DocBook isn't likely to work well.
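To illustrate the point about docstrings being plain text, here is a rough sketch that skips pydoc's HTML entirely and reads the docstrings with the standard `inspect` module. Because there is no structure to rely on, each docstring is dropped verbatim into a `<literallayout>`. The function names `module_to_docbook` and `docbook_string` are made up for this example, and the output is only meant to be well-formed, not checked against the DocBook DTD.

```python
import inspect
import xml.etree.ElementTree as ET


def module_to_docbook(module):
    """Build a DocBook-style <section> tree from a module's docstrings."""
    root = ET.Element("section")
    ET.SubElement(root, "title").text = module.__name__
    # The docstring carries no reliable structure, so keep it verbatim.
    ET.SubElement(root, "literallayout").text = inspect.getdoc(module) or ""

    for name, obj in inspect.getmembers(module):
        if name.startswith("_"):
            continue
        if not (inspect.isfunction(obj) or inspect.isclass(obj)):
            continue
        # Skip names that were merely imported into the module.
        if getattr(obj, "__module__", None) != module.__name__:
            continue
        sect = ET.SubElement(root, "section")
        ET.SubElement(sect, "title").text = name
        ET.SubElement(sect, "literallayout").text = inspect.getdoc(obj) or ""
    return root


def docbook_string(module):
    """Serialize the tree; ElementTree escapes <, > and & in the text."""
    return ET.tostring(module_to_docbook(module), encoding="unicode")
```

This avoids the HTML-to-XML cleanup problems, but it still cannot tell a parameter description from an example or a note. That limitation is what the approaches below try to address.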
There are two different approaches I can think of to take -- I haven't used either but they have a higher likelihood of working than raw pydoc:
* HappyDoc is the only tool I've found that extracts Python API documentation and spits out DocBook. HappyDoc parses Python source files to do this. This allows doing things like extracting both comments and docstrings, but it also means it does not work with C extensions, only pure Python code. I haven't played around with HappyDoc in years, so I don't know how good it is at guessing the semantic meaning of docstrings in order to output nice DocBook. http://happydoc.sourceforge.net/
* Sphinx is a tool that is widely used by Python modules. (The Python stdlib uses docs hand-coded in Sphinx/reStructuredText; other Python modules use Sphinx to extract documentation from docstrings.) You write your docstrings in reStructuredText, with many extensions to mark the semantic content of your document. Then the Sphinx toolchain is used to convert that into an output format. There is not currently a builder for DocBook, so if you went this route, someone would have to write one. The advantage that balances that is that Sphinx is a semantic markup system, so you have well-defined markup entities that you can convert from rather than guessing. http://sphinx.pocoo.org/contents.html
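As a small illustration of why well-defined markup helps, the sketch below maps reStructuredText-style field lines (`:param name: ...`, `:returns: ...`) from a docstring onto a DocBook `<variablelist>`. This is not Sphinx's machinery or a real builder, just a hand-rolled regex under the assumption that each field fits on one line; `fields_to_docbook` and `FIELD_RE` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# Matches single-line field-list entries such as ":param name: text"
# or ":returns: text".
FIELD_RE = re.compile(r"^:(?P<kind>\w+)(?:\s+(?P<arg>\w+))?:\s*(?P<body>.*)$")


def fields_to_docbook(docstring):
    """Turn the field-list lines of a docstring into a <variablelist>."""
    vlist = ET.Element("variablelist")
    for line in docstring.splitlines():
        match = FIELD_RE.match(line.strip())
        if match is None:
            continue  # free-form prose is ignored in this sketch
        entry = ET.SubElement(vlist, "varlistentry")
        term = ET.SubElement(entry, "term")
        if match.group("arg"):
            # Fields with an argument, e.g. ":param x:", name a parameter.
            ET.SubElement(term, "parameter").text = match.group("arg")
        else:
            term.text = match.group("kind")
        item = ET.SubElement(entry, "listitem")
        ET.SubElement(item, "para").text = match.group("body")
    return vlist
```

The mapping is mechanical because the docstring already says which text is a parameter and which is a return value, so nothing has to be inferred from fonts or layout.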
-Toshio
docs@lists.stg.fedoraproject.org