XML handling in Python - SAX, DOM and XSLT examples
Archive - Originally posted on "The Horse's Mouth" - 2010-12-09 17:11:59 - Graham Ellis
XML is the "eXtensible Markup Language" ... a set of rules to which a language must adhere, rather than a complete language definition - you need to add other elements such as a DTD or Schema to complete the definition of a language that conforms to an XML standard.
How do you process XML, then?
There are three common ways.
a) You can use a SAX parser. SAX is the Simple API for XML. With a SAX parser, you pass the data through a handler which extracts pertinent information as it's passed through - so this is ideally suited to extracting a few very specific bits of information from what is potentially a huge data flow.
b) You can use DOM (the Document Object Model). Here, you read data into a structure in memory and can process it within that structure. Because XML tags can be nested, you'll end up with nested structures in memory, with the result that you'll probably find yourself writing recursive code.
c) XSLT - Extensible Stylesheet Language Transformations - is a way of defining how a file of XML is transformed into some other format. XSLT is a programming language in its own right.
Can we use SAX, DOM, or XSLT in Python?
Yes - you can use any of them. There are many classes supplied with Python, and others which are easy to download. I've uploaded some recent demonstrations - firstly, there's some XML that I've used [here] ... with sample processing of that in Python via SAX [here] and in Python via DOM [here].
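To give a flavour of the two approaches without following the links, here is a minimal sketch using only modules supplied with Python (xml.sax and xml.dom.minidom). It is not the linked demonstration: the sample XML, the PriceHandler class and the helper function names are all made up for illustration. The SAX half picks out just the prices as the data streams past; the DOM half loads the whole document and walks it recursively.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample data, invented for this sketch
SAMPLE = """<stock>
  <item name="apple"><price>0.40</price></item>
  <item name="pear"><price>0.55</price></item>
</stock>"""


class PriceHandler(xml.sax.ContentHandler):
    """SAX handler that collects the text of every <price> element."""

    def __init__(self):
        super().__init__()
        self.prices = []
        self._in_price = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "price":
            self._in_price = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it
        if self._in_price:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "price":
            self.prices.append(float("".join(self._buffer)))
            self._in_price = False


def prices_via_sax(text):
    """Stream the XML through the handler and return the prices found."""
    handler = PriceHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.prices


def element_names(node, depth=0):
    """Recursively walk a DOM node, returning (depth, tag name) pairs."""
    found = []
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            found.append((depth, child.tagName))
            found.extend(element_names(child, depth + 1))
    return found


def names_via_dom(text):
    """Load the whole document into memory, then walk the tree."""
    return element_names(minidom.parseString(text))


if __name__ == "__main__":
    print(prices_via_sax(SAMPLE))
    for depth, tag in names_via_dom(SAMPLE):
        print("  " * depth + tag)
```

Note how the SAX handler never holds more than one price's worth of text at a time, whereas the DOM version needs the whole tree in memory before the recursive walk can start.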
There's an example (using different data) with XSLT [here] - and you'll note that the XSLT example is much shorter. That's because a lot of the work has been transferred to the XSLT code (see [here]). The XML for that last example is also available - [here].
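As a rough illustration of why the Python side gets shorter, here is a small sketch that assumes the third-party lxml package is installed (Python's standard library has no XSLT processor of its own, and the linked example may well use something different). The stylesheet, the sample XML and the apply_stylesheet function are invented for this sketch. All the selection logic lives in the stylesheet; the Python code just loads the two documents and applies one to the other.

```python
try:
    from lxml import etree  # third-party package; assumed to be installed
except ImportError:
    etree = None

# Hypothetical sample data, invented for this sketch
XML = """<stock>
  <item name="apple"><price>0.40</price></item>
  <item name="pear"><price>0.55</price></item>
</stock>"""

# Hypothetical stylesheet: outputs each item's name on its own line
XSLT = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="stock/item">
      <xsl:value-of select="@name"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>"""


def apply_stylesheet(xml_text, xslt_text):
    """Transform xml_text with xslt_text and return the result as a string."""
    if etree is None:
        raise RuntimeError("lxml is not installed")
    transform = etree.XSLT(etree.fromstring(xslt_text))
    return str(transform(etree.fromstring(xml_text)))


if __name__ == "__main__":
    print(apply_stylesheet(XML, XSLT))
```

Changing what gets extracted means editing the stylesheet rather than the Python, which is the trade-off described above.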