Node.js: staggering keep-alives to a large number of TCP clients

I'm trying to send keep-alives from a server to a large number of TCP clients. To spread out the load of handling the responses, I want to stagger the keep-alives over time.

If I have 3000 TCP clients and a 60s keep-alive interval, I need to stagger the keep-alive messages over the 60s and send 50 keep-alives every second.
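
The arithmetic above as a quick sketch, assuming a fixed tick length (the helper name `keepAlivesPerTick` is only illustrative):

```javascript
// Hypothetical helper: how many keep-alives to send per tick so that
// every client is covered once per keep-alive interval.
function keepAlivesPerTick(numClients, intervalMs, tickMs) {
    var ticksPerInterval = Math.ceil(intervalMs / tickMs);
    return Math.ceil(numClients / ticksPerInterval);
}

console.log(keepAlivesPerTick(3000, 60 * 1000, 1000)); // 50
```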

Assumptions:

  1. Lots of TCP connections (in the thousands)
  2. TCP connections persist and can be expected to be active for several hours, minimum
  3. Server needs to know within, say, 60s if a client is no longer connected
  4. Other information from the server and clients will be sent back and forth
  5. Keep alive return messages from clients contain useful data (which I think rules out UDP)

Currently, my thought is to store my TCP connections in a standard JavaScript object, with an id mapping to each connection. Then, each second, I get the array of keys of this object and send keep-alives to a portion of them.

Is this a good approach? Are there better approaches or other things I should consider?

Example code for my initial stab at the problem:

var KEEP_ALIVE_INTERVAL = 1000; // time between groups
var KEEP_ALIVE_CYCLE = 3; // number of groups
var tcp_conns = {
    a:"a",
    b:"b",
    c:"c",
    d:"d",
    e:"e",
    f:"f",
    g:"g",
    h:"h",
    i:"i"
};

var intervalCounter = 0;
setInterval(function() {

    console.log("sending keep alives intervalCounter="+intervalCounter);

    // Compute the key list once and derive this tick's group from it
    var keys = Object.keys(tcp_conns);
    var connFactor = Math.ceil( keys.length / KEEP_ALIVE_CYCLE );
    var lowerLimit = connFactor*intervalCounter-1;
    var upperLimit = connFactor*(intervalCounter+1);

    console.log("connFactor="+connFactor+", limits=["+lowerLimit+","+upperLimit+"]");

    // Is this even async???
    for (var i = 0; i < keys.length; i++) {
        if(i>lowerLimit && i<upperLimit){
            var key = keys[i];
            var val = tcp_conns[key];
            console.log(" id="+key+" => "+val);
        }
    }

    intervalCounter++;
    if(intervalCounter==KEEP_ALIVE_CYCLE){
        intervalCounter=0;
    }
}, KEEP_ALIVE_INTERVAL);
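
One possible variation on the same idea, as a rough sketch rather than a tested solution: assign each connection to a slot when it connects, so each tick only touches one slot instead of scanning every key. The names `SlotRegistry`, `add`, `remove` and `nextBatch` are made up for illustration, and the sketch assumes a Node version that provides ES2015 `Map`.

```javascript
// Sketch: connections are spread round-robin over numSlots slots.
function SlotRegistry(numSlots) {
    this.slots = [];
    for (var i = 0; i < numSlots; i++) {
        this.slots.push(new Map()); // id -> connection
    }
    this.slotOf = new Map(); // id -> slot index, needed for removal
    this.nextSlot = 0;       // slot that the next new connection goes into
    this.current = 0;        // slot that the next tick will service
}

SlotRegistry.prototype.add = function (id, conn) {
    var slot = this.nextSlot;
    this.nextSlot = (this.nextSlot + 1) % this.slots.length;
    this.slots[slot].set(id, conn);
    this.slotOf.set(id, slot);
};

SlotRegistry.prototype.remove = function (id) {
    var slot = this.slotOf.get(id);
    if (slot === undefined) return;
    this.slots[slot].delete(id);
    this.slotOf.delete(id);
};

// Returns the connections due for a keep-alive on this tick
// and advances to the next slot.
SlotRegistry.prototype.nextBatch = function () {
    var batch = Array.from(this.slots[this.current].values());
    this.current = (this.current + 1) % this.slots.length;
    return batch;
};

// Demo with the same nine placeholder "connections" and three groups
var registry = new SlotRegistry(3);
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'].forEach(function (id) {
    registry.add(id, id);
});
console.log(registry.nextBatch()); // [ 'a', 'd', 'g' ]
```

In a server, `nextBatch()` would be called from the `setInterval` callback, with `add` on connect and `remove` on close. Round-robin assignment can drift out of balance as clients disconnect, so slot sizes are only roughly equal.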

Rather than explicitly managing a collection containing all the connections, I'd have each connection schedule its own keep-alive at a random delay between 45s and 75s. Since every connection picks its delays independently, the keep-alives end up spread over time. I'm not sure that the following code works as-is, but you'll get the basic idea.

  • I'm assuming that 'PONG' arrives as a single chunk, which might not be the case.
  • Be careful to avoid leaking listeners. Here, I add a "data" handler when I send PING, and I remove it when I get PONG. Not the most efficient solution.

Here's the code:

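var net = require('net');

var KEEP_ALIVE_TIMEOUT = 120*1000, // give up if no PONG arrives within 120s
    MIN_KEEP_ALIVE = 45*1000,      // shortest delay before the next PING
    MAX_KEEP_ALIVE = 75*1000;      // longest delay before the next PING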

// Random whole number of milliseconds in [min, max)
function randomInt(min, max) {
    return Math.floor(Math.random()*(max - min)) + min;
}

net.createServer(function(conn) {
  function ping() {
     var keepAliveTimer = setTimeout(function() {
       conn.destroy();
       console.log('timeout !');
     }, KEEP_ALIVE_TIMEOUT);

     conn.write('PING\r\n');

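     conn.on('data', function onData(chunk) {
        // Anything other than PONG goes to the application's own handler
        // (handleSomethingElse is a placeholder, not defined here)
        if(chunk.toString() !== 'PONG\r\n')
          return handleSomethingElse(chunk);

        // Got PONG: cancel the timeout, drop this listener, schedule the next PING
        clearTimeout(keepAliveTimer);
        conn.removeListener('data', onData);
        setTimeout(ping, randomInt(MIN_KEEP_ALIVE, MAX_KEEP_ALIVE));
     });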
  }

  ping();
});
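
If 'PONG' can't be assumed to arrive as a single chunk, one way around it is to buffer incoming data and split it on '\r\n'. This is only a minimal sketch: `createLineSplitter` is a made-up name, and it assumes an ASCII, line-based protocol like the PING/PONG exchange above.

```javascript
// Sketch: returns a 'data' handler that reassembles '\r\n'-terminated
// lines, so a message may be split across chunks or bundled with others.
function createLineSplitter(onLine) {
    var buffered = '';
    return function onData(chunk) {
        buffered += chunk.toString();
        var end;
        while ((end = buffered.indexOf('\r\n')) !== -1) {
            onLine(buffered.slice(0, end));     // line without the terminator
            buffered = buffered.slice(end + 2); // keep the remainder
        }
    };
}

// Demo without a socket: feed chunks by hand
var lines = [];
var feed = createLineSplitter(function (line) { lines.push(line); });
feed('PO');
feed('NG\r\nHELLO\r\nPO');
feed('NG\r\n');
console.log(lines); // [ 'PONG', 'HELLO', 'PONG' ]
```

On a real connection the handler would be attached once, for example `conn.on('data', createLineSplitter(function (line) { ... }))`, checking for `line === 'PONG'` inside the callback. That also avoids adding and removing a listener for every PING.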