I'm building a web crawler in Node.js using the npm crawler package. My program currently creates 5 child processes, each of which instantiates a new Crawler and crawls a list of URLs provided by the parent.
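For context, here is a stripped-down sketch of the setup. The file names and message format are simplified, not my exact code; the Crawler usage follows the crawler package's README.

// parent.js -- fork 5 children and send each a slice of the URL list
var fork = require('child_process').fork;

var urls = [ /* full list of URLs to crawl */ ];
var numChildren = 5;
var sliceSize = Math.ceil(urls.length / numChildren);

for (var i = 0; i < numChildren; i++) {
  var child = fork(__dirname + '/child.js');
  child.send({ urls: urls.slice(i * sliceSize, (i + 1) * sliceSize) });
}

// child.js -- each child instantiates a Crawler and queues its URLs
var Crawler = require('crawler').Crawler;

process.on('message', function (msg) {
  var c = new Crawler({
    maxConnections: 10,
    callback: function (error, result, $) {
      if (error) return console.error(error);
      // parse result.body, queue discovered links, etc.
    }
  });
  c.queue(msg.urls);
});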
After running for about 15-20 minutes, it slows to a halt, and in the output of the top command the STATE column reads stuck for all the children (see below).
I have little knowledge of the top command and the columns it provides, but is there a way to tell what is causing the processes to slow down by looking at top's output? I realize the bug is probably in my own code, but I want to know where to start debugging: a memory leak, a caching issue, too few children, too many children, etc.
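For what it's worth, I've been thinking about adding something like this to each child to watch memory growth from the inside; this is just a diagnostic sketch, not something already in my code:

// Log resident and heap memory every 30 seconds so growth over the
// 15-20 minute window shows up in each child's log.
setInterval(function () {
  var mem = process.memoryUsage();
  console.log('[pid ' + process.pid + ']',
    'rss=' + Math.round(mem.rss / 1048576) + 'MB',
    'heapUsed=' + Math.round(mem.heapUsed / 1048576) + 'MB');
}, 30000);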
Below is the entire output of top:
PID COMMAND %CPU TIME #TH #WQ #PORT #MREG MEM RPRVT PURG CMPRS VPRVT VSIZE PGRP PPID STATE UID FAULTS COW MSGSENT MSGRECV SYSBSD SYSMACH
11615 node 2.0 17:16.43 8 0 42 2519 94M- 94M- 0B 1347M+ 1538M 4150M 11610 11610 stuck 541697072 14789409+ 218 168 21 6481040 63691
11614 node 2.0 16:57.66 8 0 42 2448 47M- 47M- 0B 1360M+ 1498M- 4123M 11610 11610 stuck 541697072 14956093+ 217 151 21 5707766 64937
11613 node 4.4 17:17.37 8 0 44 2415 100M+ 100M+ 0B 1292M- 1485M 4114M 11610 11610 sleeping 541697072 14896418+ 215 181 22 6881669+ 66098+
11612 node 10.3 17:37.81 8 0 42 2478 24M+ 24M+ 0B 1400M- 1512M 4129M 11610 11610 stuck 541697072 14386703+ 215 171 21 7083645+ 65551
11611 node 2.0 17:09.52 8 0 42 2424 68M- 68M- 0B 1321M+ 1483M 4111M 11610 11610 sleeping 541697072 14504735+ 220 168 21 6355162 63701
11610 node 0.0 00:04.63 8 0 42 208 4096B 0B 0B 126M 227M 3107M 11610 11446 sleeping 541697072 45184 410 52 21 36376 6939
One thing that stands out even to me is that CMPRS is around 1.3-1.4GB for every child, which makes me wonder if memory pressure is the problem.
Here are the dependencies:
├── colors@0.6.2
├── crawler@0.2.6
├── log-symbols@1.0.0
├── robots@0.9.4
└── sitemapper@0.0.1
Sitemapper is one I wrote myself, so it could be a source of bugs.
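To figure out whether the leak (if it is one) lives in sitemapper, crawler, or my own glue code, I'm considering diffing heap snapshots from a child. A sketch using the heapdump module, which is an assumption on my part since it is not one of my current dependencies:

// Dump the heap every 5 minutes; loading two consecutive snapshots in
// Chrome DevTools and comparing them shows which objects accumulate.
var heapdump = require('heapdump');

setInterval(function () {
  heapdump.writeSnapshot('/tmp/crawler-' + process.pid + '-' + Date.now() + '.heapsnapshot');
}, 5 * 60 * 1000);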