Node.js Crawler Error

I'm on Mac OS X and have been trying to use Node.js with Crawler. I installed everything by following these steps, as instructed:

  • git clone git://github.com/ry/node.git
  • cd node
  • ./configure
  • make
  • sudo make install
  • curl http://npmjs.org/install.sh | sh
  • npm install crawler

After installing the last one (Crawler), I ran test/simple.js from its samples and got the following errors:

The "sys" module is now called "util". It should have a similar interface.
http://jamendo.com/
http://tedxparis.com

/crawler/node_modules/crawler/lib/crawler.js:74
                        response.body = body;
                                      ^
TypeError: Cannot set property 'body' of undefined
    at Object.callback (/crawler/node_modules/crawler/lib/crawler.js:74:39)
    at Request._callback (/crawler/node_modules/crawler/lib/crawler.js:70:43)
    at /crawler/node_modules/crawler/node_modules/request/main.js:119:22
    at Request.<anonymous> (native)
    at Request.emit (events.js:67:17)
    at Object._onTimeout (/crawler/node_modules/crawler/node_modules/request/main.js:532:12)
    at Timer.ontimeout (timers.js:84:39)

This means Crawler doesn't work yet. How can I fix it?

You have a few options:

  • Try a newer version of Crawler
  • Use an older version of Node
  • Use a different module (recommended, as Crawler is very out of date)
  • Fix Crawler yourself (and submit your patches!), although nobody seems to be maintaining this project anymore
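
If you go the "fix it yourself" route, the trace suggests the request timed out (note `_onTimeout`), so the callback ran with an error and no response object. A minimal sketch of a guard, assuming the callback follows the request module's `(error, response, body)` convention; `safeCallback` and `onResult` are hypothetical names, not part of Crawler:

```javascript
// Hypothetical wrapper, not Crawler's actual code: it guards a
// request-style callback (error, response, body) so that a failed or
// timed-out request never touches an undefined response.
function safeCallback(onResult) {
  return function (error, response, body) {
    if (error || !response) {
      // On a timeout the response is undefined, so report the error instead.
      onResult(error || new Error('No response received'), null);
      return;
    }
    response.body = body;
    onResult(null, response);
  };
}

// Simulated outcomes, so no network access is needed.
const results = [];
const cb = safeCallback(function (err, res) {
  results.push(err ? 'error: ' + err.message : 'ok: ' + res.body);
});
cb(new Error('ETIMEDOUT'), undefined, undefined); // simulated timeout
cb(null, { statusCode: 200 }, '<html></html>');   // simulated success
console.log(results);
```

The same check could go directly around line 74 of lib/crawler.js, before `response.body = body;` is executed.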

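If this is just a one-time error, you can catch the exception and handle it as needed. Keep in mind that the trace shows it being thrown from a timer callback, so a try/catch around the code that starts the crawl will not see it.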
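
One way to catch an exception raised in a later callback is a process-level handler, since a try/catch only covers code that runs synchronously inside it. The sketch below uses Node's `uncaughtException` event; the timer and the error message are made up for illustration and stand in for the crawler's timed-out request:

```javascript
// Illustrative only: the timer below stands in for the timed-out request.
let caughtByTryCatch = false;

// Last-resort handler for exceptions thrown outside any try/catch.
process.once('uncaughtException', function (err) {
  console.error('Caught at process level: ' + err.message);
});

try {
  setTimeout(function () {
    // Thrown on a later tick, after the try block has already finished.
    throw new TypeError("Cannot set property 'body' of undefined");
  }, 0);
} catch (err) {
  caughtByTryCatch = true; // never reached for the asynchronous throw
}

console.log('caught by try/catch:', caughtByTryCatch);
```

Treat this as a stopgap: it keeps the process alive, but the failed request is still lost, so fixing the callback itself is the better long-term option.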

I made a couple of fixes for you; let me know your feedback. https://github.com/eugenehp/node-crawler

You may also be interested in trying my Node Crawler (https://github.com/ecdeveloper/node-web-crawler). It's not a module but an independent web app. It uses Mongo, Express, Socket.io, and Twitter Bootstrap.