I have the following task: select data from "TABLE_FROM", modify it, and insert it into "TABLE_TO". The main problem is that the script must run in production and must not hurt live site performance, while "TABLE_FROM" contains hundreds of millions of rows. I'm going to run the script using Node.js. What techniques are used to solve this kind of problem? I.e., how do I make this script run "slowly", or in other words "softly", to prevent DB and CPU overload?
Execution time of the script is irrelevant. I use Cassandra DB.
Sample code:
var OFFSET = 0;
var BATCHSIZE = 100;
var TIMEOUT = 1000;
function fetchPush() {
  // fetch one batch from TABLE_FROM
  var rows = fetch(OFFSET, BATCHSIZE);
  // push the (modified) batch to TABLE_TO
  push(rows);
  // advance the offset so the next call reads the next batch
  OFFSET += BATCHSIZE;
  // a short batch means the source is exhausted; otherwise
  // schedule the next batch after a pause so the DB can breathe
  if (rows.length === BATCHSIZE) {
    setTimeout(fetchPush, TIMEOUT);
  }
}
fetchPush();
Here I'm assuming fetch and push are blocking calls; for asynchronous processing you could (obviously) use async.
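One Cassandra-specific caveat: CQL has no OFFSET, so in practice you page with the driver's page state instead of a numeric offset. Below is a minimal sketch of the same throttled batch-copy idea using the DataStax cassandra-driver npm package and async/await. The keyspace, contact point, data center, the table columns (id, value), and the modify() transform are assumptions for illustration, not part of the original question.

const cassandra = require('cassandra-driver');

const client = new cassandra.Client({
  contactPoints: ['127.0.0.1'],   // assumed contact point
  localDataCenter: 'datacenter1', // assumed data center name
  keyspace: 'my_keyspace'         // assumed keyspace
});

const FETCH_SIZE = 100; // rows per page
const PAUSE_MS = 1000;  // pause between pages to keep the load low

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// hypothetical transform applied to each row before re-inserting
function modify(row) {
  return row;
}

async function copyTable() {
  let pageState;
  do {
    // fetch one page from table_from; pageState resumes where the
    // previous page ended
    const result = await client.execute('SELECT * FROM table_from', [], {
      prepare: true,
      fetchSize: FETCH_SIZE,
      pageState: pageState
    });
    // insert the modified rows into table_to one by one
    for (const row of result.rows) {
      const out = modify(row);
      await client.execute(
        'INSERT INTO table_to (id, value) VALUES (?, ?)', // assumed columns
        [out.id, out.value],
        { prepare: true }
      );
    }
    pageState = result.pageState;
    // throttle: wait before requesting the next page
    await sleep(PAUSE_MS);
  } while (pageState);
}

copyTable().then(() => client.shutdown());

Tuning FETCH_SIZE and PAUSE_MS trades total runtime against cluster load; since execution time is irrelevant here, small pages with a generous pause are the safest choice. The driver's eachRow()/stream() helpers page the same way under the hood if you prefer a callback or stream style.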