[weboob] What library to use to parse HTML pages

Laurent Bachelier laurent at bachelier.name
Wed Mar 31 18:16:56 CEST 2010


Since weboob requires an Internet connection anyway, why not use
weboob on another machine, over SSH, from the N900?

On Wed, Mar 31, 2010 at 18:07, Romain Bignon <romain at peerfuse.org> wrote:
> Hi,
>
> Historically, the 'AuM' backend has used html5lib to parse HTML pages.
>
> This library has serious performance problems, and another issue is that it is
> not packaged on every system (for example, juke tells me that it is not
> available on the Nokia N900 phone).
>
> I propose to use xml.dom.minidom instead, a lightweight implementation of DOM.
> It is part of the standard library, so it probably performs well, is probably
> better supported, and is available on every system with Python.
>
> The only potential problem is: how tolerant is it of bad HTML?
>
> So I'll run some tests to find out whether this is a good solution. If you have
> other ideas, don't hesitate to share them.
>
> Romain

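On the tolerance question raised in the quoted message, here is a minimal
sketch of a quick check, assuming only Python's standard xml.dom.minidom. The
helper name try_minidom and both sample strings are made up for illustration
and are not taken from weboob or the AuM backend. minidom is built on the expat
XML parser, which requires well-formed input, so typical "tag soup" HTML is
likely to be rejected rather than repaired.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def try_minidom(markup):
    """Return the parsed document, or None if minidom rejects the markup.

    Hypothetical helper for illustration only.
    """
    try:
        return minidom.parseString(markup)
    except ExpatError:
        # expat stops at the first well-formedness error instead of recovering
        return None


# Invented samples: the first is well-formed XML, the second leaves <br> unclosed.
well_formed = "<html><body><p>hello</p></body></html>"
tag_soup = "<html><body><p>hello<br></p></body></html>"

for name, markup in (("well_formed", well_formed), ("tag_soup", tag_soup)):
    result = try_minidom(markup)
    print(name, "parsed" if result is not None else "rejected")
```

If real pages behave like the second sample, minidom alone would not be enough,
and the pages would need cleaning up before parsing.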
