Web Scrape Meteor Pages

Question


I'm trying to write an application that scrapes a Meteor web page. This is difficult because Meteor pages are rendered entirely by JavaScript in the browser, so the HTML the server sends initially contains almost none of the content. Is there a way to have the scraper render the page first?

I'll probably write it in Node.js, if that helps.

Thanks

node.js
meteor
web-scraping

Answer 1

You could use PhantomJS to render the page. Here is an example, adapted from the script used by Meteor's spiderable package, that captures the HTML of a Meteor page:

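// PhantomJS script (run with `phantomjs script.js`), not a Node.js script.
var page = require('webpage').create();
var system = require('system');

// Page to scrape; change this to the Meteor site you want.
var url = 'http://localhost:3000';

// Progress messages go to stderr so that stdout carries only the HTML.
system.stderr.write('Loading ' + url + '\n');

page.open(url, function (status) {
    if (status !== 'success') {
        system.stderr.write('Failed to load ' + url + '\n');
        phantom.exit(1);
        return;
    }

    var attempts = 0;
    // Poll every 100 ms until Meteor is connected and all subscriptions are ready.
    var timer = setInterval(function () {
        var ready = page.evaluate(function () {
            if (typeof Meteor !== 'undefined'
                && typeof Meteor.status !== 'undefined'
                && Meteor.status().connected) {
                // Flush pending reactive updates so the DOM is current.
                // (Deps was renamed Tracker in later Meteor versions.)
                (typeof Tracker !== 'undefined' ? Tracker : Deps).flush();
                return DDP._allSubscriptionsReady();
            }
            return false;
        });

        attempts++;
        if (ready) {
            clearInterval(timer);
            console.log(page.content);      // rendered HTML goes to stdout
            phantom.exit();
        } else if (attempts > 100) {        // give up after about 10 seconds
            clearInterval(timer);
            system.stderr.write('Timed out waiting for Meteor\n');
            phantom.exit(1);
        }
    }, 100);
});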

The script prints the rendered HTML to stdout, so you can also run it from Node with require('child_process').exec and capture that output in the callback.
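A minimal sketch of that wrapper, assuming a `phantomjs` binary on your PATH and the script saved as `script.js`. It uses `execFile`, the sibling of `exec` that runs the binary directly instead of through a shell, and wraps it in a Promise. `captureOutput` and `renderMeteorPage` are hypothetical helper names. Anything else the script prints with console.log (such as progress messages) ends up in the captured output too.

```javascript
// Sketch: run the PhantomJS script from Node.js and capture its stdout.
// Assumes `phantomjs` is on PATH and the script is saved as script.js
// (both are assumptions about your setup).
const { execFile } = require('child_process');

// Run `command` with `args` and resolve with everything it wrote to stdout.
function captureOutput(command, args) {
  return new Promise((resolve, reject) => {
    // maxBuffer is raised because a rendered page can exceed the default limit.
    execFile(command, args, { maxBuffer: 10 * 1024 * 1024 }, (err, stdout, stderr) => {
      if (err) {
        err.stderr = stderr; // keep PhantomJS's error messages for debugging
        reject(err);
      } else {
        resolve(stdout);
      }
    });
  });
}

// Render a Meteor page via the PhantomJS script and resolve with its HTML.
function renderMeteorPage(scriptPath = 'script.js') {
  return captureOutput('phantomjs', [scriptPath]);
}

module.exports = { captureOutput, renderMeteorPage };
```

For example, `renderMeteorPage().then(html => console.log(html.length))`. If you prefer `exec` exactly as mentioned, `exec('phantomjs script.js', callback)` delivers the same `(error, stdout, stderr)` arguments but goes through a shell.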

You can run it directly with phantomjs script.js, and it will print the HTML of the Meteor page (change the URL in the script to point at the site you want).

Answer 2

If the site has the spiderable package enabled, you can pretend to be a web crawler and have the server render the page for you.
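spiderable renders the page server-side for requests whose query string contains `_escaped_fragment_` (the convention from Google's old AJAX-crawling scheme), so a plain HTTP request with that parameter is enough to look like a crawler. A minimal sketch, assuming Node 18+ for the global `fetch` and a site that actually has spiderable enabled; `spiderableUrl` and `fetchRenderedHtml` are hypothetical names. spiderable can also key off crawler User-Agent strings, but this sketch relies only on the query parameter.

```javascript
// Sketch: ask a spiderable-enabled Meteor server for server-rendered HTML.
// Assumes Node 18+ (global fetch) and that the site has spiderable enabled.

// Return `pageUrl` with an empty `_escaped_fragment_` query parameter added,
// which is what spiderable looks for before rendering the page for crawlers.
function spiderableUrl(pageUrl) {
  const url = new URL(pageUrl);
  url.searchParams.set('_escaped_fragment_', '');
  return url.toString();
}

// Fetch the prerendered HTML; throws on a non-2xx response.
async function fetchRenderedHtml(pageUrl) {
  const res = await fetch(spiderableUrl(pageUrl));
  if (!res.ok) {
    throw new Error(`HTTP ${res.status} for ${pageUrl}`);
  }
  return res.text();
}

module.exports = { spiderableUrl, fetchRenderedHtml };
```

For example, `spiderableUrl('http://localhost:3000')` gives `http://localhost:3000/?_escaped_fragment_=`. If the response still looks like an empty shell, spiderable is probably not enabled on that site.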

If you don't control the server, or spiderable isn't enabled, you will probably have to drive a real browser with something like Selenium, but that kind of crawling is CPU-intensive and slow.
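A minimal Selenium sketch in Node, assuming the `selenium-webdriver` npm package plus a local Chrome and chromedriver (all assumptions about your environment). It loads the page, polls the same readiness condition the PhantomJS script in the first answer uses, then returns the page source. `meteorReady` and `scrapeWithSelenium` are hypothetical names; `executeScript` accepts a function and runs it inside the page.

```javascript
// Sketch: render a Meteor page in a real browser with Selenium WebDriver.
// Assumes `npm install selenium-webdriver` and a local Chrome + chromedriver.

// Runs inside the page: true once Meteor is connected and its subscriptions are ready.
function meteorReady() {
  if (typeof Meteor === 'undefined'
      || typeof Meteor.status === 'undefined'
      || !Meteor.status().connected) {
    return false;
  }
  // Flush pending reactive updates; Deps was renamed Tracker in later Meteor versions.
  (typeof Tracker !== 'undefined' ? Tracker : Deps).flush();
  return DDP._allSubscriptionsReady();
}

// Load `url`, wait up to timeoutMs for meteorReady, and return the page HTML.
async function scrapeWithSelenium(url, timeoutMs = 10000) {
  // Required lazily so this file loads even without the package installed.
  const { Builder } = require('selenium-webdriver');
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    await driver.get(url);
    // wait() re-evaluates the condition until it returns a truthy value or times out.
    await driver.wait(() => driver.executeScript(meteorReady), timeoutMs);
    return await driver.getPageSource();
  } finally {
    await driver.quit(); // always close the browser, even on timeout
  }
}

module.exports = { meteorReady, scrapeWithSelenium };
```

Launching a full browser per page is what makes this approach slow, so if you scrape many pages, reuse one driver across them rather than building a new one each time.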