What is the exact point of using a database if I have simple relations (95% of queries depend on the ID)?
I am storing users and their stats.
Why would I use an external database if I can have neat constructs like:
db.users[32] = something
An array of 500K users is not that big a burden for RAM.
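For concreteness, here is a minimal sketch of the kind of in-process store I mean (the record fields and stats are made up for illustration):

```js
// The whole "database" is a plain object held in the node.js process's memory.
const db = {
  users: {} // userId -> user record
};

// Writing and reading a user is a constant-time key lookup, no network involved.
db.users[32] = { name: "someuser", stats: { wins: 10, losses: 2 } };

const user = db.users[32];
console.log(user.stats.wins); // 10
```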
Pros are:
P.S. And a consideration: would it be faster or slower to do collection[3] than db.query("select ...")?
It depends on the requirements for the durability, latency and lifetime of that data. In-memory access to a data structure is almost always significantly faster than hopping the network to an external database, but there are things to consider.
You can keep it solely in memory, but if your process recycles for some reason, it's gone. That may be OK for your scenario...
You can also have issues if you have multiple front ends/processes with load balancing (as opposed to partitioning) or don't have affinity. In scenarios like that, in-memory state can be problematic. There are also options like memcached to address such issues.
memcached is how Facebook solved problems like these: http://www.facebook.com/note.php?note_id=39391378919
Similar to Facebook, you can also persist data in a database (be it SQL or NoSQL like MongoDB) and cache it in memory for efficiency. If you cache in memory and it's backed by a database, then you have to worry about the staleness of that data and how to refresh it. memcached is a solution for scenarios like that as well. Either that or you write your own mechanism to piggyback data back, use polling (try to avoid it though), etc. That's essentially what Facebook is doing: using databases but offloading the database load onto distributed in-memory caches. From that post:
memcached is a high-performance, distributed memory object caching system. Here at Facebook, we're likely the world's largest user of memcached. We use memcached to alleviate database load.
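As a rough illustration of that database-plus-cache pattern, here is a minimal cache-aside sketch in node.js. A plain in-process Map stands in for memcached, queryUserFromDb and writeUserToDb are hypothetical async database calls, and the TTL is an arbitrary value:

```js
// Cache-aside sketch: serve reads from memory when possible,
// fall back to the database on a miss, and expire entries by TTL.
const cache = new Map();   // userId -> { value, expiresAt }
const TTL_MS = 60 * 1000;  // how long a cached entry is treated as fresh

async function getUser(userId, queryUserFromDb) {
  const hit = cache.get(userId);
  if (hit && hit.expiresAt > Date.now()) {
    return hit.value;                          // served from memory, no DB round trip
  }
  const user = await queryUserFromDb(userId);  // e.g. SELECT ... WHERE id = ?
  cache.set(userId, { value: user, expiresAt: Date.now() + TTL_MS });
  return user;
}

// On writes, update the database first, then invalidate the stale cache entry.
async function updateUser(userId, fields, writeUserToDb) {
  await writeUserToDb(userId, fields);
  cache.delete(userId);
}
```

With memcached, the Map is simply replaced by a shared cache that all of your processes can reach, which is what addresses the multiple-front-end problem mentioned above.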
This is going to be more of a reflective answer than anything. One thing you need to consider here as well is your language. I am a PHP programmer, and I am glad to have databases.
Trying to store a 500K-user array in memory in PHP (and operate on it) would be a living nightmare; in fact, it probably would be for most languages. Databases implement search strategies to overcome such scenarios, using logarithmic-time lookups on pre-defined indexes.
There is also the cost factor. Storing it in a MySQL or MongoDB database on the same server is actually cheaper, since you will most likely require less memory to hold your information.
I would seriously test your memory consumption under the load of such an array. I am also guessing this is just one array of many, right?
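To make the point about logarithmic lookups concrete, here is a small sketch (in the node.js of the question rather than PHP) of what an index buys you: a linear scan versus a binary search over users kept sorted by id, which is roughly what a B-tree index on a database column does conceptually. The data is invented for illustration:

```js
// Build 500K users already sorted by id (invented data).
const users = [];
for (let id = 0; id < 500000; id++) {
  users.push({ id, name: "user" + id });
}

// Without an index: O(n), scan every element until the id matches.
function findByScan(list, id) {
  for (const u of list) {
    if (u.id === id) return u;
  }
  return null;
}

// With an "index" (the sorted order): O(log n) binary search.
function findByBinarySearch(sortedList, id) {
  let lo = 0, hi = sortedList.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (sortedList[mid].id === id) return sortedList[mid];
    if (sortedList[mid].id < id) lo = mid + 1;
    else hi = mid - 1;
  }
  return null;
}

console.log(findByScan(users, 499999).id);          // worst case: ~500K comparisons
console.log(findByBinarySearch(users, 499999).id);  // ~19 comparisons
```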
Would it be faster or slower to do collection[3] than db.query("select ...")?
Now, that depends. I am unsure how node.js handles arrays and how it reaches a specific index within them, but some languages don't do an O(log n) search on the index, which means you would just get an O(n) search; that would actually be slower than a straight lookup on an indexed SQL table. Fair enough, but take into consideration the amount of time it would take for SQL to create the result set, write it to disk and then respond for node.js to pick it up, and it would probably be slower.
So node.js would definitely be faster on a small index or object, but on a much larger one... I am unsure.
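A rough way to check would be something like the sketch below: direct index access on a 500K-element array versus a database round trip. The array contents are invented, and db.query stands in for whatever client the question's db object uses:

```js
// Rough comparison sketch: in-memory index access vs. a database round trip.
const users = Array.from({ length: 500000 }, (_, i) => ({ id: i, name: "user" + i }));

console.time("in-memory users[3] x 1,000,000");
let u;
for (let i = 0; i < 1000000; i++) {
  u = users[3]; // plain index access on a dense array
}
console.timeEnd("in-memory users[3] x 1,000,000");

// A database query, even for a single row by primary key, always pays for
// serialization plus a socket round trip, e.g. (hypothetical client):
//   const row = await db.query("SELECT * FROM users WHERE id = ?", [3]);
```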
There is only ONE application/process accessing this data, and the code is executed line by line - please don't elaborate about locking.
That's surprising. I have, before now, easily had to spin up more than one node.js server. In fact, to maintain an ideal web hosting environment you should always have another server ready to come into the fray if your primary server fails (which, believe me, it does...). With this in mind, I think it is kind of odd that you are not taking locking and a central store for distributed data into account here.