I am requesting about 4000 URLs with a generic Node.js HTTP request script:
(function (i) {
    http.get(options, function (res) {
        var obj = {};
        obj.url = hostNames[i];
        obj.statusCode = res.statusCode;
        obj.headers = res.headers;
        db.scrape.save(obj);
    }).on('error', function (e) {
        console.log("Error: " + hostNames[i] + "\n" + e.stack);
    });
})(i);
Around 1300 URLs in, I get this error, which stops the entire script. I don't know what page.ly is, as it is not in my list of URLs. I've done a lot of research, but I could not pinpoint what's causing this error.
If someone is familiar with HTTP requests on Node.js - could you help me out?
Error: key page.ly must not contain '.'
at Error (unknown source)
at Function.checkKey (/Users/loop/node_modules/mongojs/node_modules/mongodb/node_modules/bson/lib/bson/bson.js:1421:11)
at serializeObject (/Users/loop/node_modules/mongojs/node_modules/mongodb/node_modules/bson/lib/bson/bson.js:355:14)
at packElement (/Users/loop/node_modules/mongojs/node_modules/mongodb/node_modules/bson/lib/bson/bson.js:854:23)
at serializeObject (/Users/loop/node_modules/mongojs/node_modules/mongodb/node_modules/bson/lib/bson/bson.js:359:15)
at Function.serializeWithBufferAndIndex (/Users/loop/node_modules/mongojs/node_modules/mongodb/node_modules/bson/lib/bson/bson.js:332:10)
at BSON.serializeWithBufferAndIndex (/Users/loop/node_modules/mongojs/node_modules/mongodb/node_modules/bson/lib/bson/bson.js:1502:15)
at InsertCommand.toBinary (/Users/loop/node_modules/mongojs/node_modules/mongodb/lib/mongodb/commands/insert_command.js:132:37)
at Connection.write (/Users/loop/node_modules/mongojs/node_modules/mongodb/lib/mongodb/connection/connection.js:198:35)
at __executeInsertCommand (/Users/loop/node_modules/mongojs/node_modules/mongodb/lib/mongodb/db.js:1745:14)
at Db._executeInsertCommand (/Users/loop/node_modules/mongojs/node_modules/mongodb/lib/mongodb/db.js:1801:5)
What could prevent this? It seems my script does not scale very well.
EDIT: From the answers I am getting - there exists a key somewhere that has a ".", which isn't allowed in MongoDB, and I am supposed to escape it. But the question remains - if my keys are only url, statusCode, and headers, what is causing the key with a . in it to show up?
EDIT: Bug is found. Answer below.
This error is raised when you attempt to persist an object in MongoDB and one (or more) of its keys contains the character '.', e.g.:
{
"name": "bob",
"url": "http://example.com",
"some.field": "value"
}
would raise the error Error: key some.field must not contain '.'.
Scrub your object keys of '.'s before saving to MongoDB!
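As a minimal sketch of that scrubbing step: the helper name `sanitizeKeys` and the choice of `_` as the replacement character are assumptions for illustration, not something MongoDB or mongojs provides. It copies the object recursively and rewrites any key containing a '.' before you save it:

```javascript
// Hypothetical helper: returns a copy of `value` in which every '.' in an
// object key is replaced with '_'. Arrays and nested plain objects are
// walked recursively; other values (strings, numbers, Dates) pass through.
function sanitizeKeys(value) {
  if (Array.isArray(value)) {
    return value.map(sanitizeKeys);
  }
  if (Object.prototype.toString.call(value) === '[object Object]') {
    var clean = {};
    Object.keys(value).forEach(function (key) {
      var safeKey = key.replace(/\./g, '_');
      clean[safeKey] = sanitizeKeys(value[key]);
    });
    return clean;
  }
  return value;
}

var doc = { name: 'bob', url: 'http://example.com', 'some.field': 'value' };
console.log(sanitizeKeys(doc));
```

Note that replacing '.' with '_' can make two distinct keys collide (for example `a.b` and `a_b`), so pick a replacement that cannot occur in your data if that matters.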
The site "divensurf.com" has a header which is called page.ly: v4.0
I have no idea what that is, but it broke my import into MongoDB, since keys cannot contain a '.'. I found the culprit by printing the output to a .txt file, searching for the header page.ly, finding the site, and deleting it.
I will be sanitizing the headers before importing.
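One possible way to do that, sketched under assumptions: instead of renaming keys, store the headers as an array of name/value pairs, so arbitrary header names end up as values and never as MongoDB keys. The helper name `headersToPairs` and the sample data are made up for illustration.

```javascript
// Hypothetical helper: turns a headers object (shaped like Node's
// res.headers) into an array of { name, value } pairs.
function headersToPairs(headers) {
  return Object.keys(headers).map(function (name) {
    return { name: name, value: headers[name] };
  });
}

// Sample data modelled on the response that broke the import.
var sampleHeaders = {
  'content-type': 'text/html; charset=UTF-8',
  'page.ly': 'v4.0'
};

var record = {
  url: 'divensurf.com',
  statusCode: 304,
  headers: headersToPairs(sampleHeaders)
};

console.log(JSON.stringify(record, null, 2));
```

The trade-off is that looking up a single header in a query then means matching on an array element rather than on a field name.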
Thanks for the help guys.
HTTP/1.1 304 Not Modified
X-Varnish: 2236761436 2236710300
Vary: Accept-Encoding,Cookie,X-UA-Device
Cache-Control: max-age=7200, must-revalidate
X-Cache: V1HIT 5
Content-Type: text/html; charset=UTF-8
Page.ly: v4.0
Content-Encoding: gzip
X-Pingback: http://divensurf.com/xmlrpc.php
Date: Thu, 21 Mar 2013 19:45:35 GMT
Accept-Ranges: bytes
Via: 1.1 varnish
Connection: keep-alive
Last-Modified: Thu, 21 Mar 2013 19:40:57 GMT
Age: 278
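Node exposes response header names in lower case as the keys of res.headers, which is why the error mentions page.ly rather than Page.ly. Rather than searching a text dump by hand, a small check like the following could flag such headers before saving. This is only a sketch; `findDottedKeys` is a made-up name and the sample object is a subset of the headers above.

```javascript
// Hypothetical helper: lists the header names that contain a '.',
// i.e. the ones MongoDB would reject as keys.
function findDottedKeys(headers) {
  return Object.keys(headers).filter(function (name) {
    return name.indexOf('.') !== -1;
  });
}

// A few of the headers shown above, keyed the way Node reports them.
var divensurfHeaders = {
  'x-cache': 'V1HIT 5',
  'content-type': 'text/html; charset=UTF-8',
  'page.ly': 'v4.0',
  'via': '1.1 varnish'
};

var offending = findDottedKeys(divensurfHeaders);
if (offending.length > 0) {
  console.log('Headers with a dot in the name: ' + offending.join(', '));
}
```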