Alternative to HTML Parsing with Regex

Question

I'm parsing HTML with regex in node.js to return a string. However, I have been told that this is not a good idea in this post: Pull a specific string from an HTTP request in node.js

What are the more stable alternatives?

I'm new to programming, so links to tutorials would be very helpful. I have trouble understanding some of the documentation explanations.

javascript
regex
parsing
node.js

Answer 1

node-htmlparser handles all of the heavy lifting of parsing HTML. On top of that, node-soupselect lets you use CSS-style selectors to find the particular element you're looking for.

However, I looked at your other question, and the question you should really be asking is not "how do I scrape this data from an HTML page" but rather "is there a better way to retrieve the data I'm looking for?" The USGS has APIs that provide their data in machine-readable form.

Here's the JSON object for the location you're interested in. To get the "most recent instantaneous value" for the elevation of the reservoir surface, you'd download that file, parse it with JSON.parse (for example, var d = JSON.parse(body), where body holds the downloaded text), and then:

var target = 'Elevation of reservoir water surface above datum, ft';
var result;
for (var i = 0; i < d.value.timeSeries.length; i++) {
    var series = d.value.timeSeries[i];
    if (series.variable.variableName === target) {
        // The last entry in the value array is the most recent reading.
        var readings = series.values[0].value;
        result = readings[readings.length - 1];
    }
}

result will now look like { dateTime: "2012-04-07T17:15:00.000-05:00", value: "1065.91" }.
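
If you want to try the lookup without downloading anything first, here is a minimal sketch that wraps the loop above in a function and runs it against a tiny hand-written payload. The payload is an assumption: it only mimics the fields the loop reads and is not the real USGS response, and latestReading is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical, trimmed-down payload: only the fields the lookup reads.
var sampleBody = JSON.stringify({
    value: {
        timeSeries: [
            {
                variable: { variableName: 'Some other variable' },
                values: [{ value: [{ dateTime: '2012-04-07T17:15:00.000-05:00', value: '12.3' }] }]
            },
            {
                variable: { variableName: 'Elevation of reservoir water surface above datum, ft' },
                values: [{ value: [
                    { dateTime: '2012-04-07T17:00:00.000-05:00', value: '1065.90' },
                    { dateTime: '2012-04-07T17:15:00.000-05:00', value: '1065.91' }
                ] }]
            }
        ]
    }
});

// latestReading is a hypothetical helper, named here for illustration only.
// It parses the JSON text and returns the last reading of the named series,
// or null if the series is missing or has no readings.
function latestReading(body, variableName) {
    var d = JSON.parse(body);
    var series = d.value.timeSeries;
    for (var i = 0; i < series.length; i++) {
        if (series[i].variable.variableName === variableName) {
            var readings = series[i].values[0].value;
            return readings.length > 0 ? readings[readings.length - 1] : null;
        }
    }
    return null;
}

var reading = latestReading(sampleBody, 'Elevation of reservoir water surface above datum, ft');
// The value arrives as a string, so convert it before doing arithmetic.
console.log(reading.dateTime, parseFloat(reading.value));
```

Returning null when nothing matches means the caller has to check for it, which is easier to debug than an undefined result silently propagating through later code.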