I'm building a scraper in Node.js, and I've come across an issue I can't figure out.
Certain websites use location-specific content and I'd like to find a way to trigger/manipulate this.
Off the bat, I know this is probably a complicated issue. Some sites might use different methods for determining a user's location. Is there a general way to achieve this? I'm currently using Node's request module, and have my headers set like so:
    'headers': {
        'User-Agent': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)'
    }
Is there a way of manipulating my headers to spoof a location to a website?
There are multiple methods used by companies to determine what kind of content to serve you.
Big media organisations, like the BBC, use a database, maintained by a private company, that maps IP ranges to geographical locations. Because the lookup is based on the IP address your request comes from, changing headers will not affect it. The only way to defeat their access protections is to use a virtual server as a proxy in the country you wish to appear to be visiting from.
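As a rough sketch of the proxy approach: the request module accepts a `proxy` option, so the request can be routed through a server in the target country. The proxy address and target URL below are placeholders, and `buildProxiedOptions` is a hypothetical helper name, not part of any library.

```javascript
// Build an options object for the `request` module that routes the call
// through a proxy. Both URLs used below are placeholders (assumptions).
function buildProxiedOptions(url, proxyUrl) {
  return {
    url: url,
    proxy: proxyUrl, // the request module sends the call through this proxy
    headers: {
      'User-Agent': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)'
    }
  };
}

const proxiedOptions = buildProxiedOptions(
  'http://www.example.com/',
  'http://proxy.example.com:8080' // hypothetical proxy in the target country
);

// With the request module installed, the call would look something like:
// const request = require('request');
// request(proxiedOptions, function (error, response, body) {
//   if (!error) console.log(response.statusCode);
// });
```

The site then sees the proxy's IP address rather than yours, which is what the IP-range lookup is keyed on.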
Other companies (many European ones) may just be interested in knowing what language to serve up content in. For this they may look at headers in the web request, typically Accept-Language, which you can set yourself.
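For sites that choose a language from the request headers, the usual candidate is the standard Accept-Language header. Here is a minimal sketch that adds it to the options from the question. `withLanguage` is a hypothetical helper, and whether a given site honours the header is something you would have to test per site.

```javascript
// Return a copy of the request options with an Accept-Language header added.
// The original options object is left unmodified.
function withLanguage(options, languageTag) {
  const headers = Object.assign({}, options.headers, {
    'Accept-Language': languageTag
  });
  return Object.assign({}, options, { headers: headers });
}

const baseOptions = {
  url: 'http://www.example.com/', // placeholder URL
  headers: {
    'User-Agent': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)'
  }
};

// Ask for German content, preferring the Germany-specific variant.
const germanOptions = withLanguage(baseOptions, 'de-DE,de;q=0.9');

// With the request module installed:
// const request = require('request');
// request(germanOptions, function (error, response, body) { /* ... */ });
```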