I have to process a large XML file (around 25 MB in size) and organize the data into documents to import into MongoDB.
The issue is that there are around 5-6 types of elements in the XML document, each with around 10k rows.
After fetching one XML node of type a, I have to fetch its corresponding elements of types b, c, d, etc.
What I am trying to do in node:
If there are 10k rows of type a, the 2nd step runs 10k times. I am trying to get this to run in parallel so that the thing doesn't take forever. Hence, async.forEach seemed to be the perfect solution.
async.forEach(rowsA,fetchA);
My fetchA function looks roughly like this:
var fetchA = function(rowA) {
    // convert the XML row into an object
    var obj = {};
    for(i in rowA.attributes) {
        attribute = rowA.attributes[i];
        if(attribute.value === undefined)
            continue;
        obj[attribute.name] = attribute.value;
    }
    console.log(obj.someattribute);
    // find the other related rows;
    // the callback inserts the modified object with the subdocuments
    findRelations(obj,function(obj){
        insertA(obj,postInsert);
    });
};
When I run this, the console.log in the code fires only about once every 1.5 seconds, not in parallel for every row as I expected. I have been scratching my head over this for the past two hours, but I am not sure what I am doing wrong.
I am not very adept with node, so please be patient.
It looks to me like you're not declaring and calling the callback function which async will pass to your iterator function (fetchA). See the forEach documentation for an example.
Your code probably needs to look more like...
var fetchA = function(rowA, cb) {
    // convert the XML row into an object
    var obj = {};
    for (var i in rowA.attributes) {
        var attribute = rowA.attributes[i];
        // skip attributes without a value; do not call cb() here, or it would fire more than once per row
        if (attribute.value === undefined)
            continue;
        obj[attribute.name] = attribute.value;
    }
    console.log(obj.someattribute);
    // find the other related rows;
    // the callback inserts the modified object with the subdocuments
    findRelations(obj, function(obj) {
        insertA(obj, postInsert);
        cb(); // You may even need to call this within insertA or postInsert if those are asynchronous functions.
    });
};
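To make the callback contract concrete, here is a minimal sketch that runs without any dependencies. forEachParallel is a hypothetical stand-in I wrote to mimic the shape of async.forEach(items, iterator, callback); it is not the library's implementation. fakeInsert and fetchRow are also made-up names, standing in for your asynchronous findRelations/insertA work. The point it illustrates: every iterator call must invoke its cb exactly once, and the final callback only fires after all of them have done so (or as soon as one reports an error).

```javascript
// Hypothetical stand-in for async.forEach, for illustration only.
// Starts the iterator for every item, then calls done(err) once:
// either on the first error or after every iterator has called back.
function forEachParallel(items, iterator, done) {
  var pending = items.length;
  var finished = false;
  if (pending === 0) {
    return done(null);
  }
  items.forEach(function (item) {
    iterator(item, function (err) {
      if (finished) {
        return; // ignore callbacks that arrive after completion
      }
      if (err) {
        finished = true;
        return done(err);
      }
      pending -= 1;
      if (pending === 0) {
        finished = true;
        done(null);
      }
    });
  });
}

// Hypothetical asynchronous work (think: a MongoDB insert).
function fakeInsert(obj, cb) {
  setTimeout(function () {
    cb(null);
  }, 10);
}

// Iterator: receives the row AND the callback it must call when finished.
function fetchRow(row, cb) {
  var obj = { id: row.id };
  fakeInsert(obj, function (err) {
    cb(err); // signal completion (or failure) exactly once per row
  });
}

var rows = [{ id: 1 }, { id: 2 }, { id: 3 }];
forEachParallel(rows, fetchRow, function (err) {
  if (err) {
    console.error('failed:', err);
    return;
  }
  console.log('all rows processed');
});
```

With the real library you would pass the same kind of iterator and a final callback as the third argument to async.forEach. If an iterator never calls cb, the final callback never runs, so you cannot tell when the import has finished.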