Performance testing a Twitter Streaming API consumer

I have a service that consumes Twitter posts in real time using the Twitter Streaming API.

I have built a background process which connects to the stream and pushes tweets into Redis. It is built with Node.js.

I need to figure out the maximum number of tweets this process can consume, which means I need to performance test this setup.

What is the best way to test this?

I need to know:

  • how many tweets it can handle before it falls over
  • what happens when the process can't handle any more tweets

Another reason I want to do this is to work out whether it's worth using Node.js at all; I would prefer to write it with EventMachine instead.

Since you're inherently limited by the frequency and volume of tweets coming from the Twitter Streaming API, what you're actually interested in benchmarking is the I/O performance of your background process with respect to Redis.

Mock the tweets: either generate pseudo-tweets or collect a significant sample of actual tweets, and use that data set in your benchmarking. With the data set in hand, you can write precise benchmarks against it. For example, you could push the entire data set into your tweet event-handling logic all at once, or simulate peaks and valleys of activity.
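A minimal sketch of the first approach, assuming a hypothetical `handleTweet` callback standing in for your own Redis-push logic (the tweet shape and field names are illustrative, not the full Streaming API payload):

```javascript
'use strict';

// Generate N pseudo-tweets shaped roughly like Streaming API payloads.
function makeDataset(n) {
  const tweets = [];
  for (let i = 0; i < n; i++) {
    tweets.push({
      id_str: String(1000000 + i),
      text: `pseudo tweet number ${i}`,
      user: { screen_name: `user_${i % 100}` },
      created_at: new Date().toISOString(),
    });
  }
  return tweets;
}

// Replay the whole data set at once and time it.
function replayAllAtOnce(tweets, handleTweet) {
  const start = process.hrtime.bigint();
  for (const t of tweets) handleTweet(t);
  return Number(process.hrtime.bigint() - start) / 1e6; // elapsed ms
}

// Example handler: just count tweets (stand-in for "push into Redis").
let handled = 0;
const dataset = makeDataset(10000);
const ms = replayAllAtOnce(dataset, () => { handled++; });
console.log(`handled ${handled} tweets in ${ms.toFixed(1)} ms`);
```

Because the data set is fixed, you can rerun the same replay against a Node.js handler and an EventMachine handler and compare numbers directly.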

The point is: when benchmarking, identify and isolate the variable you care about (number of tweets), use a standardized sample, and mock away inconsistent outside behavior (API limits, variable tweets/sec rates).
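To make the peaks-and-valleys idea concrete, one way to isolate the tweets/sec variable is to replay slices of the fixed sample according to a per-interval rate schedule. This is a synchronous sketch for clarity; in a real benchmark each slice would be dispatched on a timer, and `handleTweet` is again a placeholder for your own logic:

```javascript
'use strict';

// Replay tweets[i..] in slices sized by `schedule`, one slice per
// simulated time step. Returns how many tweets were actually replayed.
function replaySchedule(tweets, schedule, handleTweet) {
  let i = 0;
  for (const rate of schedule) {
    for (let k = 0; k < rate && i < tweets.length; k++) {
      handleTweet(tweets[i++]);
    }
  }
  return i;
}

const sample = Array.from({ length: 100 }, (_, i) => ({ id_str: String(i) }));
const schedule = [5, 20, 50, 20, 5]; // valley, ramp up, peak, ramp down, valley
let seen = 0;
const replayed = replaySchedule(sample, schedule, () => { seen++; });
console.log(`replayed ${replayed} tweets`);
```

Varying only the schedule between runs keeps every other factor constant, so differences in throughput or memory can be attributed to the load shape.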

I would suggest creating a custom client that simulates the Twitter Streaming API and generates tweets for your application to consume. You can use a load testing tool that supports custom scripts to run this client from distributed machines and generate the desired load. While the tweets are being generated, monitor the health of the system to measure the impact of tweet throughput on your application.