Twitter Streaming API Multiple Stream vs Custom Filter

I'm building a node.js application that opens up a connection to the Twitter Streaming API (v1.1)

I would like to filter multiple keywords (hashtags & words) as separate queries. My original idea was to have multiple public streams.

However, I understand that I can only have one open connection to the Twitter streaming api per application and per IP address and that Twitter encourages us to come up with creative solutions to get what we want.

So my question is this:

If I stream with no filters, such as using statuses/sample (which I believe is 1%) and use custom javascript to filter the output, would I get the same tweets if I used the API method of filtering (i.e track='twitter').

Edit: I have created a diagram explaining this:

enter image description here

As you can see, I want to know if the two outputs wil be the same. I suspect that they won't be because although both outputs are effectively the same filter, one source is a 1% sample, and maybe the other source is a 100% sample but only delivering 1% tweets from that.

So can someone please clarify if both outputs are the same?

Thank you.

javascript
node.js
twitter

According to the Twitter streaming api rules, if the keywords that you track doesn't exceed 1% of the whole global traffic you will receive all data (some tweets might be lost due to network issues etc but it is not significant). This is called garden-hose (firehose is a special filter which gives you all the data but it is given as a paid service through third parties such as http://datasift.com/)

So if a tweet is filtered through public stream then it would be part of your custom filter too unless your keyword set is too broad.

By using custom filters you can track multiple search keywords, and if you miss some data because your keyword set is too broad twitter sends a track limitation notice indicating how much data you are missing.

My suggestion to you would be to use a custom filter and analyze what you get from the stream and what you get as a result for the same keywords from twitter. And when you start getting track limitation notice from twitter, it is time for you to split your keyword set into chunks and start streaming through different streamers by running them from different machines.

The details of the filter streaming is below (taken from official website https://dev.twitter.com/docs/api/1.1/post/statuses/filter)

Returns public statuses that match one or more filter predicates. Multiple parameters may be specified which allows most clients to use a single connection to the Streaming API. Both GET and POST requests are supported, but GET requests with too many parameters may cause the request to be rejected for excessive URL length. Use a POST request to avoid long URLs.

The default access level allows up to 400 track keywords, 5,000 follow userids and 25 0.1-360 degree location boxes. If you need elevated access to the Streaming API, you should explore our partner providers of Twitter data here.

I would like to answer my question with the results of my findings.

I tested both side by side in the same time frame and concluded that the custom filter method, whilst it supports multiple filters does not provide enough tweets to create an interesting enough visualisation.

I think the only way to get something more interesting with concurrent filters is to look at other methods but I am wondering if its not possible. Maybe with a third party.

I have attached a screenshot of the visualisation tracking 'barackobama' The left is the custom filter, the right is statuses/filter.

enter image description here

The statuses/filter api operate on all tweets, instead of those returned by statuses/sample, you can tell by looking at their tweet id's: sample tweets all come from a specific time window. So from millisecond-resolution creation time, you can definitely tell that filter returns tweets outside of sample.

For more details about getting creation time from tweet id and the time window on sample tweets, consult this post: http://blog.falcondai.com/2013/06/666-and-how-twitter-samples-tweets-in.html