So I'm stuck trying to figure out how Google Analytics avoids spoofing. Sure, when you sign up for an account, they make you verify that you own the domain by uploading a file. But you are also given some script tags with a unique public code (replaced with 'XXXXXXX' below). What's stopping somebody from copying that code, spoofing the request headers, and pretending to be my site by following Google's authentication strategy with curl?
<script type="text/javascript">
  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'XXXXXXX']);
  _gaq.push(['_trackPageview']);
  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
  })();
</script>
I ask because I'm trying to create a similar JavaScript plugin that exposes my site's data to participating websites ("clients"). I'm not sure how I can get this functionality without a private key on the client's server side. That kind of sucks, because I was really going for the whole "easy as Google Analytics to integrate" thing. Any thoughts?
It sounds like this question really has nothing to do with Google Analytics (I'd really suggest you remove that from your question as I think it's misleading most people and not getting you closer to your answer).
You have some data and you want to share it with only select sites. The only way to do that is to protect the data with some sort of authorization scheme and give the selected sites a password or key that grants access, so that anyone you did not give a key to is refused. Even this scheme only works if the code accessing the data runs in a private area on a server (where keys/passwords can be protected), not as JavaScript in a browser, where anyone can read the key from the page source.
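To make that concrete, here is a minimal sketch of such a scheme, assuming Node.js on both servers. The participating site's server signs each request with its secret key (HMAC-SHA256), and your server recomputes the signature before handing over any data. The names `SITE_KEYS`, `signRequest`, and `isAuthorized`, the signed fields, and the five-minute window are all made up for illustration.

```javascript
// Hypothetical sketch: key store, function names, and signed fields are illustrative.
const crypto = require('crypto');

// Keys handed out to participating sites. They live only on servers,
// never in JavaScript that is shipped to a browser.
const SITE_KEYS = new Map([['site-a', 'replace-with-a-long-random-secret']]);

// Runs on the participating site's server: sign the request with the secret key.
function signRequest(secret, path, timestamp) {
  return crypto.createHmac('sha256', secret)
    .update(`${path}\n${timestamp}`)
    .digest('hex');
}

// Runs on your server: recompute the signature and compare in constant time.
function isAuthorized(siteId, path, timestamp, signature, now = Date.now()) {
  const secret = SITE_KEYS.get(siteId);
  if (!secret) return false; // unknown site: no key was ever issued
  if (Math.abs(now - timestamp) > 5 * 60 * 1000) return false; // stale or replayed request
  const expected = Buffer.from(signRequest(secret, path, timestamp), 'hex');
  const given = Buffer.from(String(signature), 'hex');
  return given.length === expected.length && crypto.timingSafeEqual(given, expected);
}
```

The key itself never travels over the wire, only a signature derived from it, which is why this cannot be moved into browser code: whoever can read the script can read the key.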
As to the GA spoofing (which I don't think has anything to do with your real question), I suspect that Google doesn't worry about it much because, other than a denial of service attack on GA in general (which I suspect they do have protection against), what benefit is there to recording hits for someone else's web site? Whoever is doing it can't get access to the data because the data is in someone else's GA account. I suppose one could do it as an annoyance, to try to screw up someone's GA numbers, but without some more profitable motivation, there probably aren't a lot of people trying to do that.
Interesting question.
As the comments hint, Google doesn't really address this. In fact, it's common to have conditional code / preprocessing stuff to disable GA on your staging site / dev boxes, because if you don't, hits from those environments will screw up your numbers.
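One way that conditional code might look is a hostname check around the snippet from the question. This is only a sketch: `PRODUCTION_HOSTS` and `shouldTrack` are names I made up, and the hostnames are placeholders for your own.

```javascript
// Hypothetical hostnames; substitute your own production domain(s).
var PRODUCTION_HOSTS = ['www.example.com', 'example.com'];

// True only when the page is served from a production hostname.
function shouldTrack(hostname) {
  return PRODUCTION_HOSTS.indexOf(hostname) !== -1;
}

var _gaq = _gaq || [];
// Queue the tracking calls only in a browser, and only on production.
if (typeof document !== 'undefined' && shouldTrack(document.location.hostname)) {
  _gaq.push(['_setAccount', 'XXXXXXX']);
  _gaq.push(['_trackPageview']);
  // The ga.js loader from the question would go inside this block as well.
}
```

Note that this only keeps your own staging traffic out of the numbers. It does nothing against a third party who deliberately sends fake hits.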
You could try a sort of three-legged approach with the analytics server, the customer server, and the client. It could work something like this:
1. The customer's server and your analytics server share a secret key. When the client hits the customer's site, the customer's server tells your analytics server it wants to start tracking this particular client.
2. Your analytics server generates a session ID for this client and returns a dynamic URL to the customer's server. The URL points to your JavaScript tracking code (or a loader for it), injected with the session ID.
3. The customer's server sends the page to the client. The page contains your client-side tracking code with the unique session ID. Actions are tracked and sent to your analytics server.
4. On your analytics server, you receive tracking information from the client's machine. You check that the session ID is valid and not expired, and that the IP address matches the one the session was created for.
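The analytics-server half of this flow could be sketched roughly as follows, assuming Node.js and an in-memory session store. Everything named here (`startSession`, `isValidHit`, the 30-minute lifetime, the `analytics.example.com` URL) is hypothetical, and the shared-secret check on the customer's request is assumed to have already happened before `startSession` is called.

```javascript
// Sketch of the analytics server; all names, the URL, and the TTL are assumptions.
const crypto = require('crypto');

const SESSION_TTL_MS = 30 * 60 * 1000;
const sessions = new Map(); // sessionId -> { customerId, clientIp, expiresAt }

// Called when an already-authenticated customer server asks to start
// tracking a client. Returns the dynamic script URL to embed in the page.
function startSession(customerId, clientIp, now = Date.now()) {
  const sessionId = crypto.randomBytes(16).toString('hex'); // unguessable ID
  sessions.set(sessionId, { customerId, clientIp, expiresAt: now + SESSION_TTL_MS });
  return {
    sessionId,
    scriptUrl: `https://analytics.example.com/track.js?sid=${sessionId}`,
  };
}

// Called for every tracking hit that arrives from a browser.
function isValidHit(sessionId, requestIp, now = Date.now()) {
  const session = sessions.get(sessionId);
  if (!session) return false; // never issued, or already expired and removed
  if (now > session.expiresAt) {
    sessions.delete(sessionId);
    return false;
  }
  return session.clientIp === requestIp; // must come from the same client
}
```

A real deployment would want a shared store instead of a `Map`, and should expect the IP check to reject some legitimate clients behind proxies or on mobile networks.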
This should provide an extra level of security. Unfortunately, it will not be "easy as Google Analytics to integrate", since it involves server-side participation on the part of your customers. It also won't do much good for tracking users who haven't been authenticated by your customers, because a third party could simply visit your customer's site to get a valid session ID and then send some fake info to your analytics server. However, for clients authenticated by your customer's site, it could be useful.
Good luck!