I'm not using Selenium to automate testing but to automate saving AJAX pages that inject content, including pages that require authentication before they can be accessed.
tl;dr: I tried multiple tools for downloading sites that use AJAX and gave up because they were hard to work with or simply didn't work. I'm resorting to Selenium after trying WebHTTrack (whose GUI wouldn't start on my Ubuntu machine, and which was a headache to provide authentication to in interactive-terminal mode) and wget (which didn't download any of the scripts or stylesheets included on my page; see the bottom for what I tried with wget). I finally gave up when Crowbar, a Mozilla XULRunner-based AJAX scraper described in a promising post, simply segfaulted on me. So...
My NodeJS script uses the selenium-webdriver npm module, which is "officially supported by the main project", to:
Now, when I view my saved test page locally, many page elements appear twice, because the target site loads HTML snippets into the page every time the page loads. This is what I currently use to download my target page:
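var fs = require('fs');
var cheerio = require('cheerio');

// driver (a selenium-webdriver WebDriver), targetSite, authenticate and
// downloadResources are set up elsewhere in the script.
var $;
var getTarget = function () {
  // Return the promise so the next .then() waits until $ has been loaded.
  return driver.getPageSource().then(function (source) {
    $ = cheerio.load(source.toString());
  });
};
var targetHtmlDest = 'test.html';
var writeTarget = function () {
  // fs.writeFile requires a callback; the promise form lets the chain wait.
  return fs.promises.writeFile(targetHtmlDest, $.html());
};
driver.get(targetSite)
  .then(authenticate)
  .then(getTarget)
  .then(downloadResources)
  .then(writeTarget)
  // Quit only after every step above has finished.
  .then(function () { return driver.quit(); });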
The problem is that the page source I get is the page as already modified by its scripts, not the original HTML. Running alert("x");window.stop(); through driver.executeAsyncScript() or driver.executeScript() has no effect.
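One option that stays inside the authenticated browser: instead of trying to stop the page, executeAsyncScript() can ask the page to re-request its own URL with XMLHttpRequest. Selenium passes a callback as the last argument of an async script, and the script returns a value by calling it. Because the request is same-origin, the browser attaches the session cookies, and responseText is the HTML the server sent before any script modified it. This is a minimal sketch, assuming the site serves the same HTML to an XHR as to a normal page load; REFETCH_SCRIPT and getOriginalSource are hypothetical names.

```javascript
// Runs inside the page. Selenium passes a callback as the last argument to an
// async script; calling it hands the value back to Node.
const REFETCH_SCRIPT = [
  'var done = arguments[arguments.length - 1];',
  'var xhr = new XMLHttpRequest();',
  // Same-origin request, so the browser sends the session cookies along.
  'xhr.open("GET", window.location.href, true);',
  'xhr.onload = function () { done(xhr.responseText); };',
  'xhr.onerror = function () { done(null); };',
  'xhr.send();'
].join('\n');

// driver: a selenium-webdriver WebDriver already on the authenticated target page.
async function getOriginalSource(driver) {
  const source = await driver.executeAsyncScript(REFETCH_SCRIPT);
  if (source === null) {
    const url = await driver.getCurrentUrl();
    throw new Error('Re-fetching ' + url + ' failed');
  }
  return source;
}
```

Calling getOriginalSource(driver) in place of driver.getPageSource() inside getTarget would then give cheerio the untouched markup, while later steps can still use the live page.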
Perhaps fetching the page with curl (you can pass the authentication in the command) will give you the bare source. Otherwise, you may be able to turn off JavaScript in your test browser so that the page's scripts never run.
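The curl idea can also be done from the same Node.js script without handling credentials twice. Log in with Selenium as before, copy the session cookies out of the browser with driver.manage().getCookies(), and send them with a plain HTTP request; the response is the server's HTML before any script has run. This sketch uses Node's built-in fetch instead of curl and assumes Node 18+ and a site whose login state lives entirely in cookies. cookieHeader and fetchRawSource are hypothetical helper names.

```javascript
// Build a Cookie request header from the cookie objects Selenium returns
// (each has at least `name` and `value`).
function cookieHeader(cookies) {
  return cookies.map((c) => c.name + '=' + c.value).join('; ');
}

// Fetch the unmodified HTML, reusing the browser's authenticated session.
// driver: a selenium-webdriver WebDriver that has already logged in.
// Uses the global fetch available in Node 18+.
async function fetchRawSource(driver, url) {
  const cookies = await driver.manage().getCookies();
  const res = await fetch(url, { headers: { Cookie: cookieHeader(cookies) } });
  if (!res.ok) {
    throw new Error('HTTP ' + res.status + ' for ' + url);
  }
  return res.text();
}
```

The same header string can be handed to curl through its -b/--cookie option if you would rather save the page from the shell. Some sites also check the User-Agent, which can be added to the headers object in the same way.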