I have a large array of filenames I need to check, but I also need to respond to network clients. The easiest way is to perform:
```javascript
for (var i = 0; i < array.length; i++) {
  fs.readFile(array[i], function (err, data) {
    // handle err / inspect data here
  });
}
```
But `array` can be of any length, say 100000, so it's not a good idea to start 100000 reads at once. On the other hand, fs.readFileSync() can take too long. Also, launching the next fs.readFile() from the callback, like this:
```javascript
var Idx = 0;
function checkFile() {
  fs.readFile(array[Idx], function (err, data) {
    // check err / data here
    Idx++;
    if (Idx < array.length) {
      checkFile();
    } else {
      Idx = 0;
      setTimeout(checkFile, 10000); // start the next pass in ten seconds
    }
  });
}
```
is also not the best option, because array[] is constantly updated by network clients: items are deleted, new ones are added, and so on.
What is the best way to accomplish such a task in node.js?
You should stick to your first solution (fs.readFile). For file I/O, node.js uses a thread pool, because most Unix kernels don't provide efficient asynchronous APIs for the file system. Even if you start 10,000 reads concurrently, only a few will actually run at any given moment and the rest will wait in a queue.
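A minimal sketch of that first approach, assuming you want one result per file plus a notification when a whole pass has finished. It copies the array before starting, so clients adding or removing entries mid-pass don't disturb the loop. `checkAll`, `onDone` and the shape of `results` are hypothetical names for illustration, not something from the question.

```javascript
const fs = require('fs');

// Start every read at once and let Node's thread pool queue them.
// `files` is copied first so the caller's array can change during a pass.
function checkAll(files, onDone) {
  const snapshot = files.slice();      // freeze the list for this pass
  const results = new Map();           // filename -> { err } or { size }
  let pending = snapshot.length;

  if (pending === 0) {
    // Keep the callback asynchronous even when there is nothing to read.
    process.nextTick(onDone, results);
    return;
  }

  snapshot.forEach((name) => {
    fs.readFile(name, (err, data) => {
      results.set(name, err ? { err } : { size: data.length });
      if (--pending === 0) onDone(results); // last read of this pass finished
    });
  });
}
```

To re-check periodically, start the next pass from `onDone`, e.g. `function loop() { checkAll(array, () => setTimeout(loop, 10000)); }`. Each pass then sees whatever `array` contains at the moment it starts.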
In order to make this answer more interesting, I browsed through node's code again to make sure that things hadn't changed.
Long story short, file I/O uses blocking system calls and is performed by a thread pool with at most 4 concurrent threads.
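To make the queueing behaviour concrete, here is a small JavaScript model of it: a FIFO queue drained by at most `poolSize` workers. It is an illustration only; `makePool` is a hypothetical helper, not libeio or libuv code. (In current libuv the real pool defaults to 4 threads and can be resized with the `UV_THREADPOOL_SIZE` environment variable.)

```javascript
// Model of a bounded worker pool: jobs wait in a FIFO queue and at most
// `poolSize` of them run at once. Each job receives a `finish` callback.
function makePool(poolSize) {
  const queue = [];
  let running = 0;

  function pump() {
    // Start queued jobs while there is a free "thread".
    while (running < poolSize && queue.length > 0) {
      const { job, cb } = queue.shift();
      running++;
      job((...args) => {
        running--;   // this worker is free again
        cb(...args);
        pump();      // pick up the next queued job, if any
      });
    }
  }

  return {
    submit(job, cb) {
      queue.push({ job, cb });
      pump();
    },
    get running() { return running; },
    get queued() { return queue.length; },
  };
}
```

Submitting 20 jobs to a pool of size 4 leaves 4 running and 16 queued; each completion lets exactly one queued job start. That is roughly what happens to a large batch of concurrent fs.readFile() calls.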
The important code is in libeio, which is abstracted by libuv. All I/O code is wrapped by macros which queue requests. For example:
```c
eio_req *eio_read (int fd, void *buf, size_t length, off_t offset, int pri, eio_cb cb, void *data, eio_channel *channel)
{
  REQ (EIO_READ);
  req->int1 = fd;
  req->offs = offset;
  req->size = length;
  req->ptr2 = buf;
  SEND;
}
```
REQ prepares the request and SEND queues it. We eventually end up in etp_maybe_start_thread:
```c
static unsigned int started, idle, wanted = 4;
(...)
static void
etp_maybe_start_thread (void)
{
  if (ecb_expect_true (etp_nthreads () >= wanted))
    return;
  (...)
```
The queue keeps up to 4 threads running to process the requests. When our read request is finally executed, eio simply uses the blocking pread() (when an offset is given) or read() from unistd.h:
```c
case EIO_READ: ALLOC (req->size);
  req->result = req->offs >= 0
              ? pread (req->int1, req->ptr2, req->size, req->offs)
              : read (req->int1, req->ptr2, req->size);
  break;
```
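From JavaScript, the same split shows up in the `position` argument of `fs.read()`: a number asks for a read at that byte offset (pread-style), while `null` reads from the descriptor's current position (read-style). That this maps onto the `offs >= 0` test above is my reading of how the request is passed down, so treat it as an assumption. `readChunk` below is a hypothetical helper for illustration.

```javascript
const fs = require('fs');

// Read up to `length` bytes of `file` starting at byte `offset`.
// A numeric `position` argument to fs.read requests a positioned read.
function readChunk(file, offset, length, cb) {
  fs.open(file, 'r', (err, fd) => {
    if (err) return cb(err);
    const buf = Buffer.alloc(length);
    fs.read(fd, buf, 0, length, offset, (readErr, bytesRead) => {
      // Always close the descriptor, then report the first error, if any.
      fs.close(fd, (closeErr) => {
        if (readErr || closeErr) return cb(readErr || closeErr);
        cb(null, buf.subarray(0, bytesRead)); // trim to what was actually read
      });
    });
  });
}
```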