[weboob] New way to execute javascript

Romain Bignon romain at symlink.me
Thu May 14 14:38:17 CEST 2015


Hello,

As Paypal has tried to add a new way to block scrapers like weboob, I've
changed my position on executing javascript snippets and added a way
to do it easily.

Actually, during login on Paypal, there is a transitional page where a received
token is processed by a javascript function before being sent back to the
server. Well, this function changes randomly on each call!

Here are some examples of the “convert” function found in the <script> tag:

  function convert (value) {var value_result = value + ':' + value;return value_result;}
  function convert (value) {var value_result = value.replace(/[a-zA-Z]/g, function(c){return Math.round(Math.sin(c.charCodeAt(0)) * 100);});return value_result;}
  function convert (value) {var value_result = btoa (value);return value_result;}

Instead of detecting which kind of function it is and re-implementing it in our
Python code, the better approach is to interpret the javascript code. That way
the module won't break if they add a new implementation of the function.
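To make the trade-off concrete, here is a rough sketch of the approach being
rejected: guessing the variant from the script text and mimicking it in Python.
The names convert_in_python and js_round are hypothetical, only the three
variants quoted above are handled, and the sine variant is only an
approximation of what a real javascript engine computes.

```python
import base64
import math
import re


def js_round(x):
    # JavaScript's Math.round rounds halves toward +infinity,
    # unlike Python's round(), so emulate it with floor().
    return math.floor(x + 0.5)


def convert_in_python(js_source, value):
    """Guess which known convert() variant js_source is and mimic it."""
    if "btoa" in js_source:
        # btoa() base64-encodes a string of Latin-1 characters.
        return base64.b64encode(value.encode("latin-1")).decode("ascii")
    if "Math.sin" in js_source:
        # Each ASCII letter is replaced by round(sin(char code) * 100).
        return re.sub(
            r"[a-zA-Z]",
            lambda m: str(js_round(math.sin(ord(m.group(0))) * 100)),
            value,
        )
    if "value + ':' + value" in js_source:
        return value + ":" + value
    # Any variant not listed above lands here and breaks the login.
    raise NotImplementedError("unknown convert() variant")
```

Every new variant on Paypal's side would hit the last line until someone
patches the module, which is exactly what interpreting the code avoids.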

I've seen that, in citibank, Oleg calls the 'd8' executable provided by v8, the
javascript engine of Chromium. But the main problem is that it isn't shipped in
the Debian package of libv8, nor in any other package.

On the other hand, there are other ways to execute javascript, like PyV8,
nodejs, phantomjs, etc.

I've found a small library called “PyExecJS”, which looks for the best js
interpreter available on the system and wraps it to execute our code. The main
problem is that PyExecJS is available through pip but not in Debian. That said,
it is small enough to be copied directly into weboob's sources if needed.
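The "look for an interpreter on the system" step can be pictured roughly as
below. This is only an illustration of the idea, not PyExecJS's actual code:
the function name find_js_runtime, the candidate executable names and their
preference order are all assumptions.

```python
import shutil

# Hypothetical preference order; the executable names are assumptions
# and may differ from one distribution to another.
CANDIDATES = ("node", "nodejs", "d8", "phantomjs")


def find_js_runtime(candidates=CANDIDATES, which=shutil.which):
    """Return (name, path) of the first interpreter found on PATH, else None."""
    for name in candidates:
        path = which(name)
        if path:
            return name, path
    return None


runtime = find_js_runtime()
if runtime is None:
    print("no javascript interpreter found")
else:
    print("would use %s at %s" % runtime)
```

The which parameter is there only so the lookup can be replaced by a fake
one when trying the function out on a machine without any interpreter.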

Here is, for example, the code from the Paypal module to execute javascript:

  import re

  from weboob.tools.js import Javascript

  # Gather the inline <script> that defines convert()
  code = ''.join(self.document.xpath('//script[contains(text(), "convert")]/text()'))
  # Drop the part that touches the DOM, which the interpreter cannot run
  code = re.sub(r'if \(autosubmit.*', '', code)
  js = Javascript(code)
  self.browser['ads_token_js'] = str(js.call('convert', self.browser['ads_token']))

  self.browser.submit(nologin=True)

Then, another big problem is that v8 and other interpreters don't build any DOM,
so “document” is undefined! That's why my code removes the part of the <script>
tag that tries to interact with the DOM. Also, let's hope that convert()
implementations do not use jquery or another library to perform operations.
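As a small standalone illustration of that clean-up step, the sketch below
applies the same regular expression to a made-up script and then checks
whether anything DOM-related is left. The sample script layout and the helper
names strip_dom_code and references_dom are hypothetical; the real page may
look different.

```python
import re


def strip_dom_code(script_text):
    """Drop everything from 'if (autosubmit' to the end of that line."""
    return re.sub(r'if \(autosubmit.*', '', script_text)


def references_dom(script_text):
    """Tell whether the script still mentions 'document' or 'window'."""
    return re.search(r'\b(document|window)\b', script_text) is not None


# Made-up sample; only the convert() line is taken from the examples above.
sample = (
    "function convert (value) {var value_result = value + ':' + value;"
    "return value_result;}\n"
    "if (autosubmit) { document.forms[0].submit(); }\n"
)

cleaned = strip_dom_code(sample)
if references_dom(cleaned):
    print("script still touches the DOM, the interpreter would fail")
else:
    print("script is safe to hand to the interpreter")
```

Note that '.' does not match a newline here, so only the rest of the line
starting at 'if (autosubmit' is removed; a multi-line block would need more.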

I tried to create a dummy 'document' object, but there is no equivalent of
__getattr__ in javascript, so it seems to be impossible.

It appears that our current solution is probably temporary: if Paypal wants
to piss us off, they can do it easily.

Perhaps in the future it will be necessary to use a browser other than ours,
based on selenium/phantomjs/anything else, to execute all pages exactly like a
real browser would.

Romain