I need to build a simple analytics back end for capturing user behaviour. The data will be collected by a JavaScript snippet embedded in a web page, much as Google Analytics or Mixpanel do.
The system needs to capture near-real-time browser data (the page's scroll position, the mouse position, etc.). It will record the state of each user's page every 5 seconds. Each measurement has only three attributes, but measurements have to be taken frequently.
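To make "record the state every 5 seconds" concrete, here is a minimal TypeScript sketch of the client-side sampler. It assumes the three attributes are the vertical scroll offset and the mouse's x/y position (the question only says "scrolling position of page, mouse position etc."). Event handlers just overwrite the latest values, and a timer snapshots them every 5 seconds. `PageSample`, `PageStateSampler`, `startSampling` and `WindowLike` are hypothetical names; `WindowLike` is a minimal stand-in for `window` so the sketch also runs outside a browser.

```typescript
// Hypothetical shape of one measurement: assumes the three attributes are
// vertical scroll offset plus mouse x/y, with a timestamp added.
interface PageSample {
  t: number; // ms since the Unix epoch
  scrollY: number;
  mouseX: number;
  mouseY: number;
}

// Minimal stand-in for `window` so the sketch also runs outside a browser.
interface WindowLike {
  scrollY: number;
  addEventListener(
    type: string,
    listener: (event: any) => void,
    options?: { passive?: boolean },
  ): void;
}

// Holds the latest page state. Handlers only overwrite a few numbers,
// which keeps them cheap even though scroll/mousemove fire very often.
class PageStateSampler {
  private scrollY = 0;
  private mouseX = 0;
  private mouseY = 0;

  onScroll(scrollY: number): void {
    this.scrollY = scrollY;
  }

  onMouseMove(x: number, y: number): void {
    this.mouseX = x;
    this.mouseY = y;
  }

  // Snapshot the current state as one measurement.
  sample(now: number = Date.now()): PageSample {
    return { t: now, scrollY: this.scrollY, mouseX: this.mouseX, mouseY: this.mouseY };
  }
}

// Wire the sampler to the page and emit one sample every `intervalMs`
// (5 s by default). Returns a function that stops the timer.
function startSampling(
  sampler: PageStateSampler,
  win: WindowLike,
  emit: (sample: PageSample) => void,
  intervalMs = 5000,
): () => void {
  win.addEventListener("scroll", () => sampler.onScroll(win.scrollY), { passive: true });
  win.addEventListener("mousemove", (e: { clientX: number; clientY: number }) =>
    sampler.onMouseMove(e.clientX, e.clientY),
  );
  const timer = setInterval(() => emit(sampler.sample()), intervalMs);
  return () => clearInterval(timer);
}
```

In a page you would pass the real `window` and a function that hands each sample to whatever batches and sends it.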
The data doesn't necessarily need to be sent every 5 seconds; it could be batched and sent less frequently. However, it's imperative that I get all of the data while the user is on the page, i.e. I can't send it once per minute and lose the last 59 seconds of data for someone who leaves after 119 seconds.
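One common way to batch without losing the tail of a visit is to buffer samples on the client, send a batch every N samples, and flush whatever is left when the page is hidden or unloaded. The sketch below assumes that approach; `Batcher`, `flushOnLeave`, `beaconSender`, `DocLike` and `WinLike` are hypothetical names, and the collector URL is a placeholder. It uses `navigator.sendBeacon`, which is designed to let the browser deliver a small POST even after the page has gone away. The leave-time flush is best-effort: a crashed browser or a killed mobile tab can still lose the last batch.

```typescript
type Send<T> = (batch: T[]) => void;

// Buffers measurements client-side and sends them in batches, so a visitor
// makes one request per `maxBatch` samples instead of one every 5 seconds.
class Batcher<T> {
  private queue: T[] = [];
  private readonly send: Send<T>;
  private readonly maxBatch: number;

  constructor(send: Send<T>, maxBatch = 12) {
    this.send = send;
    this.maxBatch = maxBatch;
  }

  add(item: T): void {
    this.queue.push(item);
    if (this.queue.length >= this.maxBatch) this.flush();
  }

  // Send everything buffered so far; does nothing when the buffer is empty.
  flush(): void {
    if (this.queue.length === 0) return;
    const batch = this.queue;
    this.queue = [];
    this.send(batch);
  }
}

// Minimal stand-ins for `document` and `window`.
interface DocLike {
  visibilityState: string;
  addEventListener(type: string, listener: () => void): void;
}
interface WinLike {
  addEventListener(type: string, listener: () => void): void;
}

// Flush the unsent tail when the page becomes hidden (tab switch, navigation,
// close); "pagehide" is added as a fallback for browsers that miss it.
function flushOnLeave<T>(batcher: Batcher<T>, doc: DocLike, win: WinLike): void {
  doc.addEventListener("visibilitychange", () => {
    if (doc.visibilityState === "hidden") batcher.flush();
  });
  win.addEventListener("pagehide", () => batcher.flush());
}

// Sender built on navigator.sendBeacon. The URL is a hypothetical collector.
function beaconSender(url: string): Send<unknown> {
  return (batch) => {
    const nav = (globalThis as any).navigator;
    // sendBeacon returns false if the browser refused to queue the data
    // (e.g. payload too large); this sketch does not retry in that case.
    if (nav && typeof nav.sendBeacon === "function") {
      nav.sendBeacon(url, JSON.stringify(batch));
    }
  };
}
```

With 5-second samples, `maxBatch = 12` works out to roughly one request per minute per visitor. For the visitor who leaves after 119 seconds, the first 12 samples go out at the 60-second mark and the remaining 11 are flushed when the page is hidden.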
If possible, I'd like to build a system that will scale for the foreseeable future, which means it should work for 10,000 sites, each with 100 concurrent visitors, i.e. 100,000 concurrent users each sending one event every 5 seconds.
I'm not worried about querying the data, that can be done using a separate system. I'm most interested in how to handle the capture of the data itself.
Based on the figures above, the system needs to handle 20,000 events per second from a pool of 100,000 users.
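The load figure follows directly from the numbers in the question. The batching line is an added illustration: it assumes each visitor sends one request every $B$ seconds (a multiple of 5) and that visitors' send times are spread evenly.

$$
10{,}000\ \text{sites} \times 100\ \text{visitors/site} = 100{,}000\ \text{concurrent users}
$$

$$
R_{\text{events}} = \frac{100{,}000\ \text{users}}{5\ \text{s}} = 20{,}000\ \text{events/s}
$$

$$
R_{\text{requests}}(B) = \frac{100{,}000}{B}\ \text{requests/s}, \qquad \text{events per request} = \frac{B}{5}
$$

For example, $B = 60$ gives about $1{,}667$ requests/s carrying 12 events each: the event rate is unchanged, but the HTTP request rate drops twelvefold.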
I'd like to host this service on Heroku. However, while I've done a lot of work with Rails, I'm completely new to high-throughput systems (other than knowing you don't process them using Rails).
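For the capture side itself, here is a minimal sketch of a stateless ingestion process, assuming Node with TypeScript and only the built-in `node:http` module. It binds to the port Heroku supplies in the `PORT` environment variable, validates a batch, hands each sample to a sink, and responds immediately. The `/events` path, the JSON payload shape, the 64 KiB body cap and the stdout sink are all illustrative assumptions; `parseBatch` and `startServer` are hypothetical names.

```typescript
import { createServer, IncomingMessage, Server, ServerResponse } from "node:http";

// Hypothetical wire format: a JSON array of samples.
interface PageSample {
  t: number;
  scrollY: number;
  mouseX: number;
  mouseY: number;
}

const MAX_BODY_BYTES = 64 * 1024; // assumed cap; a minute of samples is far smaller
const FIELDS = ["t", "scrollY", "mouseX", "mouseY"];

// Parse and validate one request body; returns null if it is malformed.
function parseBatch(body: string): PageSample[] | null {
  let data: unknown;
  try {
    data = JSON.parse(body);
  } catch {
    return null;
  }
  if (!Array.isArray(data)) return null;
  const valid = data.every(
    (s) =>
      typeof s === "object" &&
      s !== null &&
      FIELDS.every((k) => typeof (s as Record<string, unknown>)[k] === "number"),
  );
  return valid ? (data as PageSample[]) : null;
}

// Stateless handler: accept a batch, validate it, pass each sample to `sink`.
// The default sink (one JSON line per sample on stdout) is only a placeholder
// for a durable queue or log store.
function startServer(
  sink: (s: PageSample) => void = (s) => {
    process.stdout.write(JSON.stringify(s) + "\n");
  },
): Server {
  const server = createServer((req: IncomingMessage, res: ServerResponse) => {
    if (req.method !== "POST" || req.url !== "/events") {
      res.writeHead(404).end();
      return;
    }
    let body = "";
    let bytes = 0;
    req.setEncoding("utf8");
    req.on("data", (chunk: string) => {
      bytes += Buffer.byteLength(chunk);
      if (bytes <= MAX_BODY_BYTES) body += chunk; // stop buffering past the cap
    });
    req.on("end", () => {
      const batch = bytes <= MAX_BODY_BYTES ? parseBatch(body) : null;
      if (batch === null) {
        res.writeHead(bytes > MAX_BODY_BYTES ? 413 : 400).end();
        return;
      }
      batch.forEach((s) => sink(s));
      res.writeHead(204).end();
    });
  });
  // Port binding via the environment: Heroku sets $PORT; 3000 is a local fallback.
  server.listen(Number(process.env.PORT ?? 3000));
  return server;
}
```

Because the process keeps no state between requests, capacity can be added by running more dynos; a real deployment would swap the stdout sink for a durable queue.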
My high-level advice is to build your system following the Twelve-Factor App methodology, and then worry about scaling as the customers arrive. I'm thrilled with Node.js and the npm ecosystem, but I also think you could build a perfectly acceptable platform with Rails. If it took 3 dynos to support 100K concurrent users with Node and double that with Rails, you might still be better off with Rails if your comfort with Ruby got you to market 3 months faster. Anyway, assuming you go with Node, here are my answers:
Good luck.
A few additional points I'd like to mention: use a CDN to distribute the JavaScript client, or better yet, provide the full JS so it can be served from the page itself. Either way, make it load fast and load asynchronously. It sounds like a fun project. Good luck!
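The "load asynchronously" advice is often implemented by having a tiny inline snippet inject a `<script>` element that points at the full client. The sketch below assumes that pattern; `loadTracker`, `DocumentLike` and `ScriptLike` are hypothetical names, the interfaces are minimal stand-ins for `document` and a script element so the code runs outside a browser, and the CDN URL in the test is a placeholder.

```typescript
// Minimal stand-ins for the DOM pieces used, so the sketch runs outside a browser.
interface ScriptLike {
  async: boolean;
  src: string;
}
interface DocumentLike {
  createElement(tagName: "script"): ScriptLike;
  head: { appendChild(node: ScriptLike): unknown };
}

// Inject the tracker so it downloads without blocking HTML parsing.
// Scripts inserted this way already run asynchronously; setting `async`
// just makes the intent explicit.
function loadTracker(doc: DocumentLike, src: string): ScriptLike {
  const script = doc.createElement("script");
  script.async = true;
  script.src = src;
  doc.head.appendChild(script);
  return script;
}
```

In the embed snippet itself, the same three steps run against the real `document`.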
EDIT: In an alternate universe where you don't have to use Heroku, WebSockets would be an awesome solution.