Load testing ec2 Node.js with Apache AB - 6500 QPS run on server but 175 QPS from remote

I am trying to load test simple Node.js and Cyclone hello world apps on an EC2 c1.xlarge machine running 64-bit Ubuntu. It has 8 cores. I am using nginx as a load balancer, and Supervisor launches one process per core. When I run the command below on the machine itself, my QPS is about 6500 for Node.js.

 ab -n 5000 -c 25 http://127.0.0.1/

When I run ab from a remote machine, even one in the same zone, QPS drops to about 175. It's even worse if I run it from my dev machine.

So, what am I missing? Are there parameters I have to tune to allow more connections from remote machines? Is there a magic knob in the sysctl config I have to turn? It's a rather raw machine, but on boot these are the knobs I tune:

sysctl -w fs.file-max=128000
sysctl -w net.ipv4.tcp_keepalive_time=300
sysctl -w net.core.somaxconn=250000
sysctl -w net.ipv4.tcp_max_syn_backlog=2500
sysctl -w net.core.netdev_max_backlog=30000
sysctl -p

Latency is slowing the test down, which reduces throughput. In virtually every case a remote request takes longer than a local one, so a single thread has lower throughput running remotely than locally. Since ab does not pace requests, the overall throughput must decrease.

For example, you have 25 threads. Let's say it takes 50ms to make your request locally. For one thread this gives:

1000 (1 second) / 50 = 20 requests per second - this is the maximum throughput possible with one thread.

Over 25 threads that adds up to 25 * 20 = 500 req/s.

If you take that formula and change the response time to, say, 250ms, then the maximum throughput for one thread drops to 4 req/s, giving an overall maximum possible with 25 threads of 4 * 25 = 100 requests per second.
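To make the arithmetic concrete, here is a small sketch (the `max_throughput` helper is purely illustrative, not part of ab) of the throughput ceiling that a fixed pool of serial threads imposes:

```python
# Maximum throughput ab can reach at a fixed concurrency level,
# assuming each thread issues one request at a time with no pacing.
def max_throughput(concurrency, response_time_ms):
    per_thread = 1000.0 / response_time_ms  # requests/sec for a single thread
    return concurrency * per_thread

print(max_throughput(25, 50))   # 25 threads at 50ms each -> 500.0 req/s
print(max_throughput(25, 250))  # 25 threads at 250ms each -> 100.0 req/s
```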

Taking this a step further: if you get 6500 qps with 25 threads, then logically your app responds in about 4ms when called locally. If you can only get 175 qps remotely, it is because the per-request time has risen to about 143ms, so your system has roughly 139ms of latency - give or take - and this is the issue.
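Running the same formula backwards gives that latency estimate; this sketch (again illustrative only, and assuming all 25 ab threads stay fully busy for the whole run) derives it from the measured numbers:

```python
# Infer per-request time from observed throughput and concurrency,
# assuming every thread is busy with a request at all times.
def response_time_ms(qps, concurrency):
    return 1000.0 * concurrency / qps

local_ms = response_time_ms(6500, 25)   # ~3.8ms per request locally
remote_ms = response_time_ms(175, 25)   # ~142.9ms per request remotely
print(remote_ms - local_ms)             # ~139ms of network latency
```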