Here is my web crawler in Node.js using the cheerio library:
var request = require('request');
var cheerio = require('cheerio');
var fs = require('fs');
var urls = [];
request('http://www.reddit.com', function(err, resp, body){
    if(!err && resp.statusCode == 200){
        var $ = cheerio.load(body);
        $('a.title may-blank').each(function(){
            var url = $(this).attr('href');
            urls.push(url);
        });
        console.log(urls);
    }
});
But when I run it, I get the following output:
[]
instead of an array of 25 links.
What have I done wrong?
How can I fix that?
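I'm guessing may-blank is a class, so you need a . in front of it (without the dot, may-blank is a type selector, so a.title may-blank looks for <may-blank> elements inside a.title):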
$('a.title .may-blank').each(...
// Here ---^
...although at present, a.title .may-blank doesn't match any elements on the reddit front page for me; there are no .may-blank elements that are descendants of a.title.
If you want a elements that have both the class title and the class may-blank, remove the space before .may-blank; for me there are currently 36 of those:
$('a.title.may-blank').each(...
//        ^-- no space
Or .may-blank on its own matches 167 elements.
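To make the difference between these selectors concrete, here is a minimal sketch in plain JavaScript (no cheerio) that mimics descendant and compound class matching on a tiny, hypothetical tree of three links. The helper names (el, parseCompound, matchesCompound, matchesChain, select) and the toy markup are made up for illustration; the counts it prints come from that toy markup, not from reddit.

```javascript
// Hypothetical mini-DOM used only to illustrate CSS selector semantics;
// this is NOT how cheerio works internally.
function el(tag, classAttr = '', children = []) {
  // Class selectors match whitespace-separated tokens of the class attribute.
  const classes = classAttr.split(/\s+/).filter(Boolean);
  return { tag, classes, children };
}

// "a.title.may-blank" -> { tag: 'a', classes: ['title', 'may-blank'] }
// ".may-blank"        -> { tag: '', classes: ['may-blank'] }
// "may-blank"         -> { tag: 'may-blank', classes: [] }  (a type selector)
function parseCompound(sel) {
  const [tag, ...classes] = sel.split('.');
  return { tag, classes };
}

function matchesCompound(node, compound) {
  return (compound.tag === '' || node.tag === compound.tag) &&
    compound.classes.every((c) => node.classes.includes(c));
}

// A space between compounds is the descendant combinator: the node must match
// the last compound, and its ancestors must match the earlier ones, in order.
function matchesChain(node, ancestors, parts) {
  if (!matchesCompound(node, parts[parts.length - 1])) return false;
  let i = parts.length - 2;
  for (let j = ancestors.length - 1; j >= 0 && i >= 0; j--) {
    if (matchesCompound(ancestors[j], parts[i])) i--;
  }
  return i < 0;
}

// Walk the whole tree and collect every node the selector matches.
function select(root, selector) {
  const parts = selector.trim().split(/\s+/).map(parseCompound);
  const results = [];
  (function walk(node, ancestors) {
    if (matchesChain(node, ancestors, parts)) results.push(node);
    for (const child of node.children) walk(child, [...ancestors, node]);
  })(root, []);
  return results;
}

// Toy markup: two title links and one other link, all with class may-blank,
// none of them containing child elements.
const page = el('div', 'content', [
  el('a', 'title may-blank'),
  el('a', 'title may-blank'),
  el('a', 'thumbnail may-blank'),
]);

for (const sel of ['a.title may-blank', 'a.title .may-blank', 'a.title.may-blank', '.may-blank']) {
  console.log(`${sel} -> ${select(page, sel).length}`);
}
// a.title may-blank -> 0
// a.title .may-blank -> 0
// a.title.may-blank -> 2
// .may-blank -> 3
```

Matching ancestors greedily from right to left is enough here only because the sketch supports just the descendant combinator; child (>) or sibling combinators would need more careful handling.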
There is a little typo, if I'm not mistaken: the class attribute value should be 'title may-blank ' (notice the space after blank). Alternatively, change the selector to use ^= (starts with) to be more forgiving. Hope that helps.
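Whitespace inside the class value only matters for attribute selectors, not for class selectors. The following plain-JavaScript sketch (no cheerio; the helper names attrEquals, attrStartsWith and hasClasses are hypothetical, and the trailing-space value is assumed from the answer) contrasts an exact [class="..."] match, a [class^="..."] starts-with match, and a .title.may-blank class selector, which ignores extra whitespace.

```javascript
// Hypothetical helpers mimicking how CSS matches a raw class attribute string;
// an illustration only, not cheerio's implementation.

// [class="v"]: the whole attribute value must equal v, whitespace included.
const attrEquals = (value, v) => value === v;

// [class^="v"]: the attribute value must start with v (an empty v matches nothing).
const attrStartsWith = (value, v) => v !== '' && value.startsWith(v);

// .a.b: the value, split on whitespace, must contain every listed class.
const hasClasses = (value, ...names) => {
  const tokens = value.split(/\s+/).filter(Boolean);
  return names.every((n) => tokens.includes(n));
};

// Assumed class attribute with a trailing space, as the answer describes.
const classAttr = 'title may-blank ';

console.log(attrEquals(classAttr, 'title may-blank'));     // false (trailing space missing)
console.log(attrEquals(classAttr, 'title may-blank '));    // true
console.log(attrStartsWith(classAttr, 'title may-blank')); // true
console.log(hasClasses(classAttr, 'title', 'may-blank'));  // true
```

In selector terms this corresponds to a[class^='title may-blank'] versus a.title.may-blank; the class-selector form does not depend on the order or spacing of class names, which usually makes it the sturdier choice.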