[weboob] Which library to use to parse HTML pages

Christophe Benz christophe.benz at gmail.com
Wed Mar 31 19:00:42 CEST 2010


I think that the most appropriate behavior is to use the standard
Python parser first, and to switch to html5lib if and only if the
standard parser fails on the crappy HTML.
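
To make the idea concrete, here is a minimal sketch of that fallback. It
assumes that "the standard Python parser" means xml.dom.minidom and that
html5lib is installed for the slow path. The names parse_strict,
parse_lenient and parse_page are made up for this example; they are not
existing weboob functions.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def parse_strict(markup):
    """Fast path: parse well-formed markup with the standard library."""
    return xml.dom.minidom.parseString(markup)


def parse_lenient(markup):
    """Slow path: let html5lib repair broken HTML and build a minidom tree."""
    # Imported lazily so the fast path does not need the third-party package.
    import html5lib
    return html5lib.parse(markup, treebuilder="dom")


def parse_page(markup, strict=parse_strict, lenient=parse_lenient):
    """Try the strict parser first; fall back only when it rejects the page."""
    try:
        return strict(markup)
    except ExpatError:
        # minidom raises ExpatError on markup that is not well-formed.
        return lenient(markup)
```

Both paths return a DOM document here, so the calling code would not need
to know which parser handled the page.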

But I think that xml.dom.minidom is quite old, and ElementTree is the
newer interface for parsing documents into a tree in Python.
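
For comparison, a small sketch of the same lookup written against both
interfaces. The markup and the two function names are invented for the
example, and it only covers well-formed input.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Hypothetical well-formed page used only for this comparison.
MARKUP = "<html><body><a href='/inbox'>Inbox</a><a href='/sent'>Sent</a></body></html>"


def links_minidom(markup):
    """Collect href values through the DOM API."""
    doc = xml.dom.minidom.parseString(markup)
    return [a.getAttribute("href") for a in doc.getElementsByTagName("a")]


def links_elementtree(markup):
    """Collect href values through the ElementTree API."""
    root = ET.fromstring(markup)
    return [a.get("href") for a in root.iter("a")]
```

The two functions return the same list; the difference is only in how the
tree is walked.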

On Wed, 31 Mar 2010 18:57:14 +0200,
Romain Bignon <romain at peerfuse.org> wrote:

> On 31/Mar - 18:54, Laurent Bachelier wrote:
> > Not that I don't like Maemo or the n900 (thank god they exist), but
> > if not using html5lib makes development much harder, I would be
> > against it. Is it really slower? Since the n900 runs Firefox I
> > would be a bit surprised.
> 
> It is html5lib that is slower. You know it well, as I used it for
> AuM. Now, perhaps xml.dom.minidom isn't really more efficient, but it
> is a good thing to test.
> 
> Another important thing is that xml.dom.minidom and html5lib have
> approximately the same API, as they both implement DOM.


-- 
Christophe Benz
http://cbenz.pointique.org

