I'm parsing HTML with regex in node.js to return a string. However, I have been told that this is not a good idea in this post: Pull a specific string from an HTTP request in node.js
What are the more stable alternatives?
I'm new to programming, so links to tutorials would be very helpful. I have trouble understanding some of the documentation explanations.
node-htmlparser handles all of the heavy lifting of parsing HTML. On top of that, node-soupselect lets you use CSS-style selectors to find the particular element you're looking for.
However, I looked at your other question, and the question you should really be asking is not "how do I scrape this data from an HTML page", but rather "is there a better way to retrieve the data I'm looking for?" The USGS has APIs that provide their data in machine-readable form.
Here's the JSON object for the location you're interested in. To get the "most recent instantaneous value" for the elevation of the reservoir surface, you'd download that file, parse its contents with JSON.parse into a variable d, and then run:
var result;
for (var i = 0; i < d.value.timeSeries.length; i++) {
    var series = d.value.timeSeries[i];
    if (series.variable.variableName === 'Elevation of reservoir water surface above datum, ft') {
        // The newest reading is the last entry of the first values array.
        var readings = series.values[0].value;
        result = readings[readings.length - 1];
    }
}
result will now look like { dateTime: "2012-04-07T17:15:00.000-05:00", value: "1065.91" }.
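If you want to try the lookup without downloading anything first, here is a minimal sketch of the same idea wrapped in a function. The helper name latestReading and the sample object are my own assumptions for illustration: the sample only mimics the fields the loop above walks, and its first reading is made up (the second is the one shown above). The real USGS file contains many more fields.

```javascript
// Hypothetical helper (not part of any USGS library): returns the newest
// reading for the named variable, or null if nothing matches.
function latestReading(d, variableName) {
  var series = d.value.timeSeries;
  for (var i = 0; i < series.length; i++) {
    if (series[i].variable.variableName === variableName) {
      var readings = series[i].values[0].value;
      // Readings are assumed to be ordered oldest to newest.
      return readings.length > 0 ? readings[readings.length - 1] : null;
    }
  }
  return null;
}

// Made-up sample text standing in for the downloaded file.
var sampleText = JSON.stringify({
  value: {
    timeSeries: [{
      variable: { variableName: 'Elevation of reservoir water surface above datum, ft' },
      values: [{
        value: [
          { dateTime: '2012-04-07T17:00:00.000-05:00', value: '1065.90' },
          { dateTime: '2012-04-07T17:15:00.000-05:00', value: '1065.91' }
        ]
      }]
    }]
  }
});

var d = JSON.parse(sampleText);
var latest = latestReading(d, 'Elevation of reservoir water surface above datum, ft');
console.log(latest.dateTime, latest.value);
```

Returning null when nothing matches makes it easier to notice if the variable name changes, instead of silently ending up with an undefined result. Note that value is a string, so convert it with parseFloat if you need a number.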