I seek a little nudge in the right direction to understand Node workers. I currently have Node code that reads data from a file and performs a bunch of subsequent actions with network requests. All of the actions I do with the data currently take place in the callback of the read function.
What I struggle to wrap my head around is how best to take this single read function (which almost certainly is not slowing my application down -- I'm fairly certain it's the later requests I'd like to branch), and divide the manipulation into multiple child processes. Of course, I don't want to perform my battery of actions multiple times on the same row of data, but rather I want to give each worker a slice of the pie. Is my best bet to, in the read-callback, create several arrays with part of the data, and then feed one array into each worker, outside the callback? Are there other options? My end goal is to reduce the time it takes my script to run through x amount of data.
var request = require('request').defaults({
    jar: true
});
var yacsv = require('ya-csv');

// Post log-in form information to the appropriate URL -- occurs only once
// per script run -- log-in cookies are saved for subsequent requests
request.post({
    url: 'xxxxx.com',
    body: "login_info",
}, function(error, res, body) {
    // On response, instantiate the CSV reader
    var reader = yacsv.createCsvFileReader("somefile.csv");

    // Read data from the CSV, row by row -- this callback fires once per CSV row
    // THIS IS WHAT I -THINK- I CAN SPLIT AMONG MULTIPLE WORKERS
    var readData = reader.addListener('data', function(data) {
        // Bind each field from a CSV row to a corresponding variable for ease of use
        //[Variables here]

        // Second request for the search form -- uses information from a
        // single row to query more information from a database
        request.post({
            url: 'xxxxx.com/form',
            body: variable_with_csv_data,
        }, function(error, res, body) {
            // Parse the resulting page, then assign page elements to variables for ease of output
        });
    });
});
The cluster module is not an alternative to threads. It lets you balance HTTP requests to the same application logic over multiple processes, without the option of delegating a particular piece of work to a particular process.
What is it, exactly, that you are trying to optimize?
Is the overall process taking too long?
Is the separate processing of the data events too slow?
Are your database calls too slow?
Are the HTTP requests too slow?
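If you are not sure, measure before parallelizing. A trivial helper like this (the name `timed` is made up) around each synchronous stage tells you where the time goes:

```javascript
// Made-up helper: time a synchronous stage and log the duration.
// For the async request callbacks, record Date.now() before the call
// and log the difference inside the callback instead.
function timed(label, fn) {
    var start = Date.now();
    var result = fn();
    console.log(label + ' took ' + (Date.now() - start) + 'ms');
    return result;
}

// e.g. timed('csv-read', function () { /* read and parse the file */ });
```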
Also, I would do away with the ya-csv module; it seems somewhat outdated to me.