How do I prevent JavaScript from mutating a page in Selenium? How do I download the original page source?

Question

I'm not using Selenium to automate testing, but to automate saving AJAX pages that inject content, even if they require prior authentication to access.

I tried

tl;dr: I tried several tools for downloading sites that use AJAX and gave up on each because it was hard to work with or simply didn't work. I'm resorting to Selenium after trying WebHTTrack (its GUI wouldn't start on my Ubuntu machine, and supplying authentication in interactive-terminal mode was a headache) and wget (which didn't download any of the scripts or stylesheets included on my page; see the bottom for what I tried with wget). I finally gave up after Crowbar, a Mozilla XULRunner AJAX scraper recommended in a promising post, simply segfaulted on me. So...

ended up making my own broken thing in NodeJS and Selenium-WebdriverJS

My NodeJS script uses the selenium-webdriver npm module, which is "officially supported by the main project", to:

provide login information + do necessary button-clicking & typing for authentication
download all JS and CSS referenced on target page
download target page with original JS/CSS file links changed to local file paths
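
The link-rewriting step above needs some rule for turning a resource URL into a local file path. A minimal sketch of one such rule, using only Node's built-in modules; the function name `localPathFor`, the `destDir` parameter, and the choice to keep the hostname as a directory are all assumptions for illustration, not what the script in this question actually does:

```javascript
const path = require('path');
const { URL } = require('url');

// Hypothetical helper: map a script/stylesheet href found on the page
// to a file path under destDir.
function localPathFor(resourceHref, pageUrl, destDir) {
    // Resolve relative hrefs against the URL of the page that referenced them.
    const resolved = new URL(resourceHref, pageUrl);
    // Drop the query string and fragment; keep the hostname as a directory
    // so that files from different origins do not collide.
    return path.posix.join(destDir, resolved.hostname, resolved.pathname);
}
```

With this rule, `localPathFor('/js/app.js?v=3', 'https://example.com/page', 'assets')` gives `assets/example.com/js/app.js`. URLs whose path ends in `/` would map to a directory and need separate handling.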

Now when I view my test page locally, many page elements appear twice, because the target site injects HTML snippets into the page each time it loads: the saved source already contains them, and the scripts inject them again. I use this to download my target page right now:

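var fs = require('fs');
var cheerio = require('cheerio');

var $;
// Load the page source, as the browser currently holds it, into cheerio.
var getTarget = function () {
    // Return the promise so the next step in the chain waits for it.
    return driver.getPageSource().then(function (source) {
        $ = cheerio.load(source.toString());
    });
};

var targetHtmlDest = 'test.html';
// Write the (link-rewritten) page to disk before moving on.
var writeTarget = function () {
    fs.writeFileSync(targetHtmlDest, $.html());
};

driver.get(targetSite)
    .then(authenticate)
    .then(getTarget)
    .then(downloadResources)
    .then(writeTarget)
    .then(function () {
        // Quit only after every earlier step has finished.
        return driver.quit();
    });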

The problem is that the page source I get has already been modified by the page's scripts; it is not the original source. Running alert("x");window.stop(); through driver.executeAsyncScript() or driver.executeScript() does nothing.

javascript
html
node.js
selenium
selenium-webdriver

Answer 1

Perhaps using curl to fetch the page (you can pass authentication on the command line) will get you the bare source? Otherwise, you may be able to turn off JavaScript in your test browser to prevent the scripts from running.
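
If you would rather stay in Node than shell out to curl, the same idea can be sketched with the built-in http/https modules: request the page directly, so no scripts ever run on it. This assumes the site authenticates by cookie and that you pass in cookie objects with `name` and `value` fields (the shape Selenium's cookie objects have); `cookieHeader` and `fetchRawSource` are hypothetical names, not part of any library.

```javascript
const http = require('http');
const https = require('https');

// Build a Cookie header value from objects shaped like {name, value}.
function cookieHeader(cookies) {
    return cookies.map(function (c) {
        return c.name + '=' + c.value;
    }).join('; ');
}

// Fetch the body of `url` exactly as the server sends it.
// Resolves with the raw HTML; nothing executes the page's JavaScript.
function fetchRawSource(url, cookies) {
    const client = url.startsWith('https:') ? https : http;
    const options = { headers: { Cookie: cookieHeader(cookies) } };
    return new Promise(function (resolve, reject) {
        client.get(url, options, function (res) {
            const chunks = [];
            res.on('data', function (chunk) { chunks.push(chunk); });
            res.on('end', function () {
                resolve(Buffer.concat(chunks).toString('utf8'));
            });
        }).on('error', reject);
    });
}
```

After logging in through Selenium, something like `driver.manage().getCookies().then(function (cookies) { return fetchRawSource(targetSite, cookies); })` should then give you the unmodified markup. The sketch does not follow redirects or check the status code, so treat it as a starting point.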