Node.js & Amazon S3: How to iterate through all files in a bucket?

Is there an Amazon S3 client library for Node.js that can list all the files in an S3 bucket?

The best-known libraries, aws2js and knox, don't seem to have this functionality.

Using the official aws-sdk:

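var AWS = require('aws-sdk');
var s3 = new AWS.S3();
var s3bucket = 'your-bucket-name'; // replace with your bucket

var allKeys = [];
function listAllKeys(marker, cb)
{
  s3.listObjects({Bucket: s3bucket, Marker: marker}, function(err, data){
    if (err) return cb(err);
    // Contents is an array of objects; append its items, not the array itself
    allKeys.push.apply(allKeys, data.Contents);
    // Without a Delimiter, S3 omits NextMarker, so resume after the last key
    if(data.IsTruncated)
      listAllKeys(data.Contents.slice(-1)[0].Key, cb);
    else
      cb(null, allKeys);
  });
}

// Start with no marker to list from the beginning of the bucket
listAllKeys(undefined, function(err, keys){
  if (err) throw err;
  console.log(keys.length + ' objects');
});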

See the documentation for s3.listObjects.
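For comparison, here is a minimal sketch of the same marker-based pagination written with async/await. It assumes the AWS SDK for JavaScript v2, where `s3.listObjects(params).promise()` returns a promise. The listing function is passed in as a parameter, so the loop can be exercised against `makeFakeListObjects`, a hypothetical in-memory stand-in, instead of a real bucket.

```javascript
// Sketch: collect every key in a bucket by following S3 ListObjects (v1) pages.
// `listObjects` is injected so the loop can be tested without AWS; with the
// AWS SDK for JavaScript v2 you would pass: (params) => s3.listObjects(params).promise()
async function listAllKeys(listObjects, bucket, prefix) {
  const keys = [];
  let marker; // undefined on the first request
  for (;;) {
    const params = { Bucket: bucket };
    if (prefix) params.Prefix = prefix;
    if (marker) params.Marker = marker;

    const page = await listObjects(params);
    const contents = page.Contents || [];
    for (const obj of contents) keys.push(obj.Key);

    // Stop when S3 says there is nothing more (or, defensively, on an empty page)
    if (!page.IsTruncated || contents.length === 0) return keys;
    // NextMarker is only returned when a Delimiter is set, so fall back to the last key
    marker = page.NextMarker || contents[contents.length - 1].Key;
  }
}

// Hypothetical in-memory stand-in for listObjects: serves keys in lexicographic
// order, `pageSize` at a time, starting after `Marker` and filtered by `Prefix`.
function makeFakeListObjects(allKeys, pageSize) {
  const sorted = [...allKeys].sort();
  return async function (params) {
    const matching = sorted.filter(
      (k) => (!params.Prefix || k.startsWith(params.Prefix)) &&
             (!params.Marker || k > params.Marker)
    );
    return {
      Contents: matching.slice(0, pageSize).map((Key) => ({ Key })),
      IsTruncated: matching.length > pageSize,
    };
  };
}

// Usage: five keys served two per page, so three requests are made.
const fake = makeFakeListObjects(['b/2', 'a/1', 'b/1', 'c/1', 'b/3'], 2);
listAllKeys(fake, 'demo-bucket').then((keys) => console.log(keys));
```

Injecting the listing function keeps the pagination logic independent of the SDK version; only the one-line adapter changes if you switch clients.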

In fact, aws2js does support listing the objects in a bucket, at a low level, via the s3.get() method. To do so, pass the prefix query parameter, which is documented on the Amazon S3 REST API page:

var s3 = require('aws2js').load('s3', awsAccessKeyId, awsSecretAccessKey);    
s3.setBucket(bucketName);

var folder = encodeURI('some/path/to/S3/folder');
var url = '?prefix=' + folder;

s3.get(url, 'xml', function (error, data) {
    console.log(error);
    console.log(data);
});

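The data variable in the snippet above contains the listing of objects under that prefix in the bucketName bucket. Note that S3 returns at most 1,000 keys per request; for larger listings, repeat the request with a marker parameter set to the last key received.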
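To page through a prefix holding more than 1,000 keys with the REST API, the query string also needs marker (the last key already received) and, optionally, max-keys. Below is a small sketch of building that query string for s3.get(). The helper name buildListQuery is hypothetical and the sketch is untested against aws2js. It uses encodeURIComponent, which, unlike encodeURI, also escapes characters such as &, = and / inside the parameter value. Reading IsTruncated and the last key out of the parsed response is left out, because its exact shape depends on aws2js's XML handling.

```javascript
// Sketch: build the query string for one page of the S3 REST "GET Bucket"
// (ListObjects v1) request. prefix, marker and max-keys are S3 query parameters;
// buildListQuery itself is a hypothetical helper.
function buildListQuery(options) {
  const parts = [];
  if (options.prefix) parts.push('prefix=' + encodeURIComponent(options.prefix));
  if (options.marker) parts.push('marker=' + encodeURIComponent(options.marker));
  if (options.maxKeys) parts.push('max-keys=' + options.maxKeys);
  return parts.length ? '?' + parts.join('&') : '';
}

// e.g. the next page of a listing, continuing after the last key seen so far
console.log(buildListQuery({
  prefix: 'some/path/',
  marker: 'some/path/file 1.txt',
  maxKeys: 1000,
}));
```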

I published knox-copy when I couldn't find a good existing solution. It wraps all the pagination details of the REST API in a familiar Node stream:

var knoxCopy = require('knox-copy');

var client = knoxCopy.createClient({
  key: '<api-key-here>',
  secret: '<secret-here>',
  bucket: 'mrbucket'
});

client.streamKeys({
  // omit the prefix to list the whole bucket
  prefix: 'buckets/of/fun' 
}).on('data', function(key) {
  console.log(key);
});
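The 'data' handler above prints keys as they arrive. If you need the whole list at once, one option (a sketch, not part of knox-copy itself) is to wrap the stream in a Promise that collects keys until 'end'. It only assumes that streamKeys returns a standard Node readable stream that emits one key per 'data' event, as the example suggests. Here a Readable.from array stands in for it so the sketch runs offline.

```javascript
const { Readable } = require('stream');

// Sketch: gather every key emitted by a stream (such as the one returned by
// knox-copy's streamKeys) into an array, resolving once the stream ends.
function collectKeys(keyStream) {
  return new Promise((resolve, reject) => {
    const keys = [];
    keyStream.on('data', (key) => keys.push(String(key))); // String() also handles Buffers
    keyStream.on('error', reject);
    keyStream.on('end', () => resolve(keys));
  });
}

// Hypothetical stand-in for client.streamKeys({ prefix: 'buckets/of/fun' }):
const fakeKeyStream = Readable.from(['buckets/of/fun/a.txt', 'buckets/of/fun/b.txt']);
collectKeys(fakeKeyStream).then((keys) => console.log(keys.length + ' keys'));
```

Collecting into memory is fine for modest buckets; for very large ones, processing keys inside the 'data' handler as in the original snippet avoids holding the whole list.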

If you're listing fewer than 1,000 files, a single page will do:

client.listPageOfKeys({
  prefix: 'smaller/bucket/o/fun'
}, function(err, page) {
  if (err) throw err;
  console.log(page.Contents); // <- Here's your list of files
});

Although @Meekohi's answer does technically work, I've had enough heartache with the S3 portion of the AWS SDK for Node.js. After struggling with modules such as aws-sdk, s3 and knox, I decided to install s3cmd via the OS package manager and shell out to it using child_process.

Something like:

    var s3cmd = new cmd_exec('s3cmd', ['ls', filepath, 's3://'+inputBucket],
            // collect stdout as it arrives
            function (me, data) {me.stdout += data.toString();},
            // called when the output ends; only now is me.stdout complete
            function (me) {me.exit = 1; response.send(me.stdout);}
    );

(Using the cmd_exec implementation from this question)

This approach works well, including for other problematic tasks such as file uploads.
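If you go the s3cmd route, Node's built-in child_process.execFile can do the job of a cmd_exec helper, and it passes arguments without a shell, so bucket names are not shell-interpolated. The sketch below makes several assumptions: s3cmd is installed and configured, `ls --recursive` lists keys under every prefix, and each output line ends with the object's s3:// URL after a date, time and size (or "DIR" for prefixes). The function names parseS3cmdLs and listBucketWithS3cmd are hypothetical.

```javascript
const { execFile } = require('child_process');

// Sketch: pull object URLs out of `s3cmd ls` output, skipping "DIR" prefix lines.
function parseS3cmdLs(stdout) {
  return stdout
    .split('\n')
    .map((line) => {
      const i = line.indexOf('s3://');
      if (i === -1) return null;                      // blank or unrelated line
      if (line.slice(0, i).trim() === 'DIR') return null; // a prefix, not an object
      return line.slice(i).replace(/\r$/, '');        // keep spaces inside keys
    })
    .filter((url) => url !== null);
}

// Runs `s3cmd ls --recursive s3://<bucket>` and calls cb(err, urls).
function listBucketWithS3cmd(bucket, cb) {
  execFile('s3cmd', ['ls', '--recursive', 's3://' + bucket],
    { maxBuffer: 64 * 1024 * 1024 }, // large listings can exceed the default buffer
    (err, stdout) => (err ? cb(err) : cb(null, parseS3cmdLs(stdout))));
}
```

Keeping the parser separate from the process call makes it easy to check against a captured sample of s3cmd output before relying on it.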