I am building a command line tool and want to use Node.js in this particular case.
I have a TXT file; I want to run a regex on each line and then use the matches within another function.
1) Should I read the TXT file into an array using readFileSync or readFile and then go through the elements of that array?
2) Should I go with readLines?
The file may be up to 5 MB at the moment, but it will keep growing over time (up to hundreds of MB).
3) Should I use Python, Ruby, or any other language for this specific purpose? Would another language make it much better? (Please still answer the first two questions, as switching away from Node may not be an option for me.)
Ultimately I want all this data stored in memory so it can be used over and over again at different times, so I am open to any other solution as long as it is fast.
Thank you very much.
3) You should use something async, like Node.js. The benefit is that you can read a chunk of the file and process it on the spot, without blocking your entire app and without buffering the whole file, then move on to the next chunk and so on. You can also pause the stream at any time if you wish (see the pause/resume sketch at the end).
2) I think you should read (and then process) the file line by line.
1) You should definitely choose a readStream: http://nodejs.org/docs/v0.6.18/api/fs.html#fs_class_fs_readstream
That way you won't have to wait for the whole file to be read (and kept in memory). Here's a small snippet on how to achieve this using a readStream and carrier (https://github.com/pgte/carrier):
var fs = require('fs'),
    carrier = require('carrier'),
    file = 'test.txt',
    stream;

// Stream the file in chunks; carrier calls back once per complete line
stream = fs.createReadStream(file, { encoding: 'utf8' });
carrier.carry(stream, function(line) {
  extractWithRegex(line);
});
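If you also want to keep the extracted data around in memory (as mentioned in the question), one way is to collect the matches into an array as the lines come in and use that array once the stream has ended. This is only a rough sketch building on the snippet above; the regex, the results array, and the end handler are illustrative assumptions, not part of carrier's API:

var fs = require('fs'),
    carrier = require('carrier');

var results = [];                    // matches kept in memory for later reuse
var pattern = /\d+/;                 // placeholder regex, replace with your own

var stream = fs.createReadStream('test.txt', { encoding: 'utf8' });

carrier.carry(stream, function(line) {
  var match = line.match(pattern);   // run the regex on each line
  if (match) {
    results.push(match[0]);          // store the match for later use
  }
});

stream.on('end', function() {
  // results now holds every match and can be reused at any time
  console.log('collected ' + results.length + ' matches');
});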
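And to illustrate the pausing mentioned in 3), here is a minimal sketch with a plain readStream; processChunk is a made-up placeholder for whatever per-chunk work you do:

var fs = require('fs');

var stream = fs.createReadStream('test.txt', { encoding: 'utf8' });

stream.on('data', function(chunk) {
  stream.pause();                    // stop reading while this chunk is handled
  processChunk(chunk, function() {
    stream.resume();                 // continue with the next chunk when done
  });
});

stream.on('end', function() {
  console.log('finished reading the file');
});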