Here is my web crawler in Node.js, using the cheerio library:
var request = require('request');
var cheerio = require('cheerio');
var fs = require('fs');
var urls = [];
request('http://www.reddit.com', function(err, resp, body){
if(!err && resp.statusCode == 200){
var $ = cheerio.load(body);
$('a.title may-blank').each(function(){
var url = this.attr('href');
urls.push(url);
});
console.log(urls);
}
});
But when I run it, I get the following output:
[]
Instead of 25 links in the array.
What have I done wrong?
How can I fix that?
I'm guessing may-blank is a class, so you need a . in front of it:
$('a.title .may-blank').each(...
// Here ---^
...although at present, a.title .may-blank doesn't match any elements on the reddit front page for me; there are no .may-blank elements that are descendants of a.title.
If you wanted a elements that have both the class title and the class may-blank, remove the space before .may-blank; for me there are currently 36 of those:
$('a.title.may-blank').each(...
//        ^-- no space
Or just .may-blank, which matches 167.
There is a small typo, if I am not mistaken: the tag selector should be 'title may-blank ' (notice the space after blank), or you could change the selector to use '^=' (starts with) to be more forgiving. Hope that helps.