[weboob] Which library to use to parse HTML pages

Laurent Bachelier laurent at bachelier.name
Wed Mar 31 18:16:56 CEST 2010

Since weboob requires an Internet connection anyway, why not use
weboob on another machine, over SSH, from the N900?

On Wed, Mar 31, 2010 at 18:07, Romain Bignon <romain at peerfuse.org> wrote:
> Hi,
> Historically, the 'AuM' backend uses html5lib to parse HTML pages.
> This library has serious performance problems, and another issue is that it is
> not packaged on every system (for example, juke tells me that it is not
> available on the Nokia N900 cell phone).
> I propose using xml.dom.minidom instead, a lightweight implementation of the DOM.
> It is part of the standard library, so it probably performs well, is probably
> better supported, and is available on every system with Python.
> The only potential problem is: how tolerant is it of bad HTML?
> So I'll run some tests to find out whether this is a good solution. If you have
> other ideas, don't hesitate to share them.
> Romain
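
The tolerance question above can be checked with a few lines of Python. This
is only a minimal sketch, not the test Romain ran: the helper name
parses_with_minidom and the two sample strings are made up for illustration.
It feeds one well-formed document and one typical "tag soup" fragment (an
unclosed <br>) to xml.dom.minidom and reports whether each one parses.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parses_with_minidom(markup):
    """Return True if xml.dom.minidom accepts the markup, False otherwise.

    Hypothetical helper, written only for this illustration.
    """
    try:
        minidom.parseString(markup)
    except ExpatError:
        # minidom delegates to expat, which rejects anything that is not
        # well-formed XML.
        return False
    return True


# Illustrative inputs (assumptions, not pages taken from a real backend).
well_formed = "<html><body><p>Hello</p></body></html>"
tag_soup = "<html><body><p>Hello<br></body></html>"  # <br> is never closed

print(parses_with_minidom(well_formed))
print(parses_with_minidom(tag_soup))
```

Since minidom is an XML parser, the second input is rejected with a
mismatched-tag error rather than repaired, so real-world pages that are not
well-formed XML would need to be cleaned up before minidom can load them.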