Performance considerations / issues when using Express / Busboy / Multer and GridFS in NodeJS

I have decided to use MongoDB's GridFS in my NodeJS Express application for storing (larger) files, which end users can upload and download. Until now, I have used MySQL and a filesystem-oriented approach, but it would be nice to have file data and metadata in one place.

I came up with two ideas for handling multipart/form-data uploads in my application:

First idea: Using Multer Middleware to upload files and "moving" them later into GridFS

var time = {startFile:"",endFile:"",startGFS:"",endGFS:""};
app.post('/upload',  multer({
    dest:"./tmp/",

    onFileUploadStart:function(file){
      console.log("=====================================================");
      console.log("starting upload " + file.originalname );
      time.startFile = new Date().getTime();
    },

    onFileUploadComplete:function(file) {
      // end process 

      time.endFile  = new Date().getTime();
      // Size in KB with one decimal place (also works for whole numbers)
      var sizeStr = (file.size / 1024).toFixed(1);

      console.log("Size:  " + sizeStr + " kb");
    }
  }),
  function(req,res) {

    var fileMulterName = req.files.file.path;
    var filename = req.files.file.originalname;
    var metadata = {
      userName : req.body.userName
    };
    time.startGFS = new Date().getTime();

    var writestream = gfs.createWriteStream({
        filename:filename,
        mode:"w",
        chunkSize:1024*2048,
        content_type:req.files.file.mimetype,
        root:"fs",
        metadata:metadata
    });

    fs.createReadStream(fileMulterName).pipe(writestream);

    writestream.on('close', function (file) {
      // do something with `file`
      time.endGFS = new Date().getTime();
      res.sendStatus(200);

      console.log("Time needed for writing already uploaded file to FS: " + (time.endFile-time.startFile) + " ms");
      console.log("Time needed for writing to GridFS: " + (time.endGFS-time.startGFS) + " ms");

      var startDelete = new Date().getTime();
      var endDelete = "";
      fs.unlink(fileMulterName, function (err) {
        if (err) {
          console.log(err);
        }
        else {
          endDelete = new Date().getTime();
          console.log("successfully deleted File from FS. This needed " + (endDelete - startDelete) + " ms");
          console.log("=====================================================");
        }
      });
    });
});

(See complete GIST here https://gist.github.com/derMani/63f82cd88320c2074dcd)

While this was quite easy to implement, I wondered whether there is a better, more direct approach that avoids the filesystem step. My highly unscientific time measurements suggest that storing a file in GridFS from the filesystem is quite slow: a ~640 MB file took between 25 and 45 seconds. On top of that come the filesystem operations themselves. The upside of this approach is that the server's memory is barely touched.
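To make the timings a little less unscientific, the measurements could use a monotonic clock and one timer per request instead of a shared `time` object. This is only a minimal sketch; `startTimer` is a hypothetical helper name, and it assumes a Node version that provides `process.hrtime.bigint()`.

```javascript
// Hypothetical helper: returns a function that reports the milliseconds
// elapsed since the timer was created, based on a monotonic clock.
function startTimer() {
  const start = process.hrtime.bigint(); // nanoseconds, unaffected by clock changes
  return function elapsedMs() {
    return Number(process.hrtime.bigint() - start) / 1e6;
  };
}

// Usage: create the timer inside the request handler so that concurrent
// uploads do not overwrite each other's start times.
const gridfsTimer = startTimer();
// ... pipe the file into GridFS, then in the 'close' handler:
console.log("Time needed for writing to GridFS: " + gridfsTimer().toFixed(1) + " ms");
```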

Second idea: Using just Busboy and writing into GridFS from the memory (onData)

app.post('/upload', function(req, res) {

  var gfsstream, startFileWrite, endFileWrite;

  var busboy = new Busboy({ headers: req.headers });

  busboy.on('file', function(fieldname, file, filename, encoding, mimetype) {
    startFileWrite = new Date().getTime();
    console.log('File [' + fieldname + ']: filename: ' + filename);

    gfsstream = gfs.createWriteStream({
        filename:filename,
        mode:"w",
        chunkSize:1024*256,
        content_type:mimetype,
        root:"fs",
        metadata: {} // put some crazy meta data in here
      });

    file.on('data', function(data) {
      gfsstream.write(data);
    });

    file.on('end', function() {
      gfsstream.end();
    });

    gfsstream.on('close', function (file) {
      // do something with `file`
      endFileWrite = new Date().getTime();

      console.log('File [' + fieldname + '] Finished');
      console.log("Time needed: " + (endFileWrite - startFileWrite) + " ms");
    });
  });

  busboy.on('error', function(err) {
    console.error(err);
    res.status(500).send('ERROR');
  });

  busboy.on('finish', function end() {
    res.sendStatus(200);
  });
  req.pipe(busboy);
});

(See complete GIST here https://gist.github.com/derMani/c7ec0d66d783804c012b)

This approach seems to be a lot quicker! It needs only ~12 seconds to store the file in GridFS. Of course, this depends heavily on the available upload speed; in this example it is just a localhost upload. I use the on('data') event to write the incoming data straight into GridFS. The problem: my application now consumes a lot of memory, depending on the size of the incoming file. Files up to 1 GB are fine, but larger files (like *.iso images) consume a lot of memory, and that memory is not freed promptly after the GridFS write. I monitored this in Mac OS X's Activity Monitor, and I think the garbage collector takes its time to free the file's data fragments once they have been used.
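One possible explanation for the memory growth, offered as a guess rather than a diagnosis: calling `write()` on every `'data'` event ignores its return value, so chunks pile up in the write stream's internal buffer whenever the destination is slower than the upload. The sketch below illustrates that effect using only Node's built-in `stream` module. `makeStalledSink` is a hypothetical stand-in for a slow destination, not the gridfs-stream API.

```javascript
const { Writable } = require('stream');

// Hypothetical stand-in for a slow GridFS write stream: it accepts chunks
// but never reports them as flushed, like a database that cannot keep up.
function makeStalledSink(highWaterMark) {
  return new Writable({
    highWaterMark: highWaterMark,
    write(chunk, encoding, callback) {
      // Intentionally never call callback(): the chunk stays "in flight".
    }
  });
}

// Mirrors file.on('data', ...) with an unconditional write().
function writeIgnoringBackpressure(sink, chunks) {
  chunks.forEach(function (chunk) { sink.write(chunk); });
  return sink.writableLength; // bytes currently held in memory by the sink
}

// Stops as soon as write() returns false, i.e. the buffer is full.
function writeRespectingBackpressure(sink, chunks) {
  var accepted = 0;
  for (var i = 0; i < chunks.length; i++) {
    accepted++;
    if (!sink.write(chunks[i])) break; // real code would pause until 'drain'
  }
  return { accepted: accepted, buffered: sink.writableLength };
}

// 100 chunks of 64 KB each, as if arriving from an upload.
var chunks = [];
for (var i = 0; i < 100; i++) chunks.push(Buffer.alloc(64 * 1024));

var naive = writeIgnoringBackpressure(makeStalledSink(256 * 1024), chunks);
var careful = writeRespectingBackpressure(makeStalledSink(256 * 1024), chunks);
console.log("buffered when ignoring backpressure:", naive);
console.log("buffered when respecting backpressure:", careful.buffered);
```

In the upload handler, `file.pipe(gfsstream)` would do this pausing and resuming automatically. Whether the GridFS write stream signals backpressure accurately is an assumption I have not verified.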

My question is: which approach is the "right" way to go? Is there maybe an even better mechanism for uploading files into GridFS? Can we somehow get better control of the incoming stream, so that all incoming data fragments are stored directly in GridFS and each blob is removed as soon as it is no longer needed?

edit

I made an error: I measured the time in the wrong method. The times for the same file are now comparable, but the main question remains.

Best Regards Rolf