I am recently trying to build a small website on Node.js as well as node-sqlite3 due to limited memory resource. I built a similar small-scale website in the past with Python and found that node-sqlite3 seems to be quite sluggish compared to Python's sqlite3 module. I did a batch of 1000 SELECT operations on both and noticed node-sqlite3 needed almost twice as much time to finish. Did anyone else notice the same thing? If I would like to continue using Node.js, any suggestion on optimizing the db access?
Update: I've added code I used for testing:
Node.js:
var sqlite3 = require('sqlite3').verbose();
var util=require('util');
var dbfn = './db.sqlite';
var db = new sqlite3.Database(dbfn,sqlite3.OPEN_READONLY, function(err) {
if(err) {
console.log(err);
} else {
var perpage=10,max=500,table='data2012',kw='sa';
for(var offset=1;offset<max;offset+=perpage) {
stmt = util.format('select * from %s where xxx like "%s%" limit %d offset %d',table, kw, perpage, offset);
db.all(stmt);
}
}
});
Python:
import sqlite3
DATABASE = './db.sqlite'
db=None
try:
db = sqlite3.connect(DATABASE)
except:
sys.stderr.write(os.getcwd())
table='data2011'
kw=('sa%',)
perpage=10
max=500
for offset in xrange(1,max,perpage):
stmt = 'select * from %s where xxx like ? limit %d offset %d'%(table, perpage, offset)
rs = db.execute(stmt, kw)
rs.fetchall()
db.close()
Well, the python library and the node.js library do not work at all in the same way.
Python will run everything in the same thread, in a sequential way. At most one query is processed at the same time, and the memory footprint is limited to the results of one query.
On the other hand, node.js is asynchronous and to get a non blocking behavior, the sqlite engine is distributed on a pool of threads. It does not run in the same thread as the event loop. Queries will therefore involve some context switches, and an overhead due to the cost of the OS scheduler. Futhermore, because they can be executed concurrently, you also need to consider the cost of concurrency (mutexes, semaphores, etc ...). Finally, the memory footprint is higher (more pending queries and query results in memory at a given point in time), which puts more pressure on the garbage collector.
With this specific benchmark, supposing the queries are only fast lookups, node.js is therefore slower than Python because the cost of the threading activity is higher than the benefit of running some queries in parallel.