SOS

Shane O'Sullivan's technical blog… really ties the room together

Dojo & GreaseMonkey == DaftMonkey

Posted by Shane O'Sullivan on January 26, 2009

Given that I’m looking for a new apartment, and I live in Ireland, I use the property search website Daft.ie.  Everyone does.  However, I wasn’t very happy with how slow it is to scan through the many results that match my meagre budget.  I realised that this could be readily fixed with GreaseMonkey, using the Dojo Ajax Toolkit to make life easier when it comes to parsing the page, adding effects, etc.

The result is DaftMonkey.

I wasn’t even sure if Dojo could be used from within a GreaseMonkey script, as GreaseMonkey sandboxes the custom script code away from the page.  However, with a little hackery it was (more or less) possible.  The steps I took were:

  1. Set up the djConfig parameter in the host window to tell Dojo that the page has already loaded, using unsafeWindow.djConfig = {afterOnLoad: true}.  unsafeWindow is what GreaseMonkey calls the normal, non-sandboxed window.
  2. Add the <script> tag for dojo.js to the head of the document.  In this case I used the dojo.js file hosted on AOL’s CDN servers – see http://dev.aol.com/dojo .
  3. Now you have to wait for Dojo to load.  This can be done with a simple setInterval function call, checking if unsafeWindow.dojo exists or not.  (Update: thanks to a comment from James, this has been changed to use the djConfig.addOnLoad function.)
  4. Once Dojo is loaded, you can call a function kicking off whatever it is that your script is supposed to do.  In this case, I wanted to add a bunch of DOM nodes to the page (which you can do without Dojo), and add some cool effects, so I also included the dojo.fx bundle.
  5. Copy the dojo variable back into the sandbox window using var dojo = unsafeWindow.dojo, otherwise you’ll have to refer to it as unsafeWindow.dojo all the time.
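
The steps above can be sketched roughly as follows.  This is a minimal illustration under stated assumptions, not the actual DaftMonkey code: injectDojo and DOJO_URL are made-up names, the URL is a placeholder rather than the real CDN address, and the window and document are passed in as parameters so the function can be tried outside GreaseMonkey.  It uses the djConfig.addOnLoad callback mentioned in the update to step 3, which needs Dojo 1.2 or later.

```javascript
// Hypothetical names throughout; the URL is a placeholder, not the CDN path.
var DOJO_URL = "http://example.com/dojo/dojo.js";

function injectDojo(pageWindow, doc, onReady) {
  // Step 1: tell Dojo the page has already loaded.
  // Step 3: addOnLoad (Dojo 1.2+) is called once dojo.js has been set up,
  // so no setInterval polling is needed.
  pageWindow.djConfig = {
    afterOnLoad: true,
    addOnLoad: function () {
      // Step 5: hand the page's dojo object back to the sandboxed script.
      onReady(pageWindow.dojo);
    }
  };

  // Step 2: add the <script> tag for dojo.js to the head of the document.
  var script = doc.createElement("script");
  script.type = "text/javascript";
  script.src = DOJO_URL;
  doc.getElementsByTagName("head")[0].appendChild(script);
}

// Only run inside GreaseMonkey, where unsafeWindow is defined.
if (typeof unsafeWindow !== "undefined") {
  injectDojo(unsafeWindow, document, function (dojo) {
    // Step 4: start the script's real work here, e.g. call
    // dojo.require("dojo.fx") and then use the effects inside a
    // dojo.addOnLoad callback.
  });
}
```

Passing the window and document in, rather than reaching for unsafeWindow directly, is purely a choice made for this sketch; a real user script could just as well inline everything.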

Screen Scraping With dojo.query

A lot of the features of DaftMonkey rely on asynchronously fetching remote HTML pages and scraping the required data from them.  The approach I used for this was:

  1. Perform a remote request using GreaseMonkey’s native Ajax function GM_xmlhttpRequest.  This works more or less the same as dojo.xhrGet, and I saw no reason to not use it.
  2. When the text is returned, create a DIV, and absolutely position it far to the left.  Fix its size to just one pixel so it doesn’t mess with the scroll bars.
  3. Set the innerHTML of the DIV to the text you have retrieved.  Congratulations, you can now use dojo.query to find whatever nodes you need.  e.g. to find all images inside anchor tags, use dojo.query("a img", tempDiv).  Note the second parameter: it tells Dojo to only search inside the temporary DIV we created, and not the whole document.
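
As a rough sketch of those three steps, assuming the temporary DIV is attached to the page body and that the caller only needs the matched nodes: makeScratchDiv and scrapeImages are hypothetical helper names, not taken from the DaftMonkey source, and the overflow setting and the exact offset are my own additions.

```javascript
// Hypothetical helper: build an off-screen, one-pixel DIV holding the HTML.
function makeScratchDiv(doc, html) {
  var div = doc.createElement("div");
  div.style.position = "absolute";
  div.style.left = "-10000px";   // far to the left, out of view
  div.style.width = "1px";       // one pixel, so scroll bars are unaffected
  div.style.height = "1px";
  div.style.overflow = "hidden"; // assumption: not mentioned in the post
  div.innerHTML = html;
  doc.body.appendChild(div);
  return div;
}

// Hypothetical helper: fetch a page and find all images inside anchors.
function scrapeImages(url, dojo, doc, callback) {
  GM_xmlhttpRequest({
    method: "GET",
    url: url,
    onload: function (response) {
      var tempDiv = makeScratchDiv(doc, response.responseText);
      // The second argument limits the search to the temporary DIV.
      var images = dojo.query("a img", tempDiv);
      callback(images);
      // Clean up once the caller has what it needs.
      doc.body.removeChild(tempDiv);
    }
  });
}
```

Inside a user script this might be called as scrapeImages(someUrl, dojo, document, function (images) { ... }), where someUrl is whichever results page is being fetched.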

Some other site-specific things were required as part of the screen scraping process.  Many of the sites had iframes included, and as soon as you add those to the temporary DIV, they start loading another page.  This was a nasty performance hit, so I had to remove them from the HTML string before setting the innerHTML of the temporary DIV.
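
One simple way to strip the iframes is a regular expression over the HTML string before it goes anywhere near innerHTML.  The function below is a hypothetical sketch rather than the code DaftMonkey uses, and regex-based HTML editing is crude: it would be confused by a ">" character inside an attribute value, for example.

```javascript
// Hypothetical helper: remove iframes from an HTML string so they never
// get the chance to start loading.
function stripIframes(html) {
  return html
    // Paired <iframe ...> ... </iframe> elements, including their content.
    .replace(/<iframe\b[^>]*>[\s\S]*?<\/iframe\s*>/gi, "")
    // Any leftover unclosed or self-closing <iframe ...> tags.
    .replace(/<iframe\b[^>]*\/?>/gi, "");
}
```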

Problems

One problem I found is that calling dojo.declare didn’t work from inside a GreaseMonkey script.  I don’t know why.  Therefore widgets had to be defined the old-fashioned way.
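
By "the old-fashioned way" I mean a plain constructor function with its methods on the prototype, which needs no help from dojo.declare.  The widget below is a made-up example to show the shape of it; ResultPreview and its toggle method are hypothetical and not part of DaftMonkey.

```javascript
// Hypothetical widget defined without dojo.declare.
function ResultPreview(node) {
  this.domNode = node;   // the DOM node this widget controls
  this.expanded = false;
}

// Show or hide the node and report the new state.
ResultPreview.prototype.toggle = function () {
  this.expanded = !this.expanded;
  this.domNode.style.display = this.expanded ? "block" : "none";
  return this.expanded;
};
```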

A second problem was more related to the website I was writing the script for, Daft.ie.  The entire site is programmed using TABLES!  Seriously, there are barely one or two DIVs on the page, with practically no CSS either.  This makes it quite difficult and brittle to screen scrape using dojo.query, as there are really no classes to match.  Still, it was possible, but it could break relatively easily if the site layout is changed.

Get the Source

You can get the entire source for the script at http://userscripts.org/scripts/show/41105 .

To read a bit more about DaftMonkey, I’ve put up a page about it at http://www.chofter.com/apps?n=daftmonkey .


8 Responses to “Dojo & GreaseMonkey == DaftMonkey”

  1. As of Dojo 1.2, you can specify a djConfig.addOnLoad = function(){} and it will execute that function once dojo loads, even in the djConfig.afterOnLoad case.

    This might be a way to avoid an interval check to see when Dojo is loaded, and it might fix the dojo.declare issue, doing the dojo.declare inside the djConfig.addOnLoad function? Maybe there are weird things with the window and unsafeWindow scoping that cause an issue for the dojo.declare thing.

    If the window vs unsafeWindow thing is not an issue, you may be able to do:

    djConfig.addOnLoad = function(){ dojo.require("…"); dojo.addOnLoad(function(){ /*stuff that uses the dojo.required module*/ }); }

    if you want to bring in more modules than just dojo base.

  2. Very cool, I didn’t know that – I was wondering how I could get a notification that Dojo had loaded.

    I was doing the dojo.declare after dojo had finished loading and it didn’t work, so I think it has something to do with GreaseMonkey’s sandboxing. dojo.require works just fine.

    Thanks for the tip.

  3. [...] more inject-dojo fun, check out Shane O’Sullivan’s DaftMonkey – a dojo/greasemonkey [...]

  4. [...] Dojo & GreaseMonkey == DaftMonkey [...]

  5. wei said

    It’s not that dojo.declare doesn’t work. It’s just that GreaseMonkey doesn’t allow you to create an object of a class defined in the remote script from inside your own script. What I did was use eval("var a = new xxxx()") to create an object in the global context, then import it back into my own script. I’m wondering whether there is a better workaround for this.

  6. Hi Wei,

    Maybe doing something like including a separate remotely hosted script in the head that declares these for you would work….. rather than declaring them in the GreaseMonkey script.

    Shane

  7. timdp said

    Very nice Shane, have you ever used Dojo within a Firefox extension you’ve developed yourself? I’m looking to do just that and any tips would be greatly appreciated.

  8. David Rees said

    @Timdp – I just started messing with this also. See the post on the forum at http://dojo-toolkit.33424.n3.nabble.com/Using-dojo-in-a-Firefox-extension-td706410.html. It’s actually even a little simpler than that, really all you need to do is set your baseUrl correctly and it seems to work just fine.
