WebSocket thru Amazon ELB or directly (remote IP issue)

Question

We use WebSockets to communicate with our EC2 instances. Our script is served using Node.js and Express, and it then initializes the WebSocket. Right now we use ELB, which makes it harder to identify the client IP. Using the x-forwarded-for header we can get the IP in the context of HTTP, but in the WebSocket context on the server, it looks like the header is not forwarded by Amazon.

We identified 2 options:

1. Have the WebSocket communicate directly with the instance (using its public DNS).
2. Maintain some sort of session ID: store the IP while in the context of HTTP and associate it with the session ID. The client gets its session ID from the HTTP response and uses it on the WebSocket. The server is then able to identify the client and resolve its IP from the cache.

Neither option is great: 1 is not fault tolerant and 2 is complex. Are there other solutions? Can Amazon somehow forward the IP? What is the best practice?

Thanks

node.js
amazon-ec2
websocket
amazon-elb

Answer 1

I have worked with websockets and I have worked with ELB, but I've never worked with them together, so I didn't realize that an HTTP forwarder on an Elastic Load Balancer doesn't understand websocket requests...

So I take it you must be using a TCP forwarder, which explains why you're using a different port, and of course the TCP forwarder is protocol-unaware, so it won't be adding any headers at all.

An option that seems fairly generic and uncomplicated would be for the http side of your application to advise the websocket side by pushing the information across rather than storing it in a cache for retrieval. It's scalable and lightweight, assuming there's not an obstacle in your environment that makes it difficult or impossible to implement.

While generating the web page that loads the websocket, take the string "ipv4:" and the client's IP ("192.168.1.1," for example), concatenate and encrypt them, and make the result url-friendly:

/* pseudo-code */
base64_encode(aes_encrypt('ipv4:192.168.1.1','super_secret_key'))

Using that example key with 128 bit aes and that example IP address, I get:

/* actual value returned by pseudo-code above */
1v5n2ybJBozw9Vz5HY5EDvXzEkcz2A4h1TTE2nKJMPk=

Then when rendering the html for the page containing the websocket, dynamically build the url:

ws = new WebSocket('ws://example.com/sock?client=1v5n2ybJBozw9Vz5HY5EDvXzEkcz2A4h1TTE2nKJMPk=');

Assuming the querystring from the websocket is accessible to your code, you could base64_decode and then aes_decrypt the string found in the query parameter "client" using the super-secret key, and then verify that it begins with "ipv4:" ... if it doesn't, then it's not a legitimate value.

Of course, "ipv4:" (at the beginning of the string) and "client" (for the query parameter) were arbitrary choices and do not have any actual significance. My choice of 128 bit AES was also arbitrary.
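
For a Node.js server, the pseudo-code above might be sketched roughly as follows. This is only an illustration under some assumptions of mine: the function names `encodeClientToken` and `decodeClientToken` are made up, and I have swapped in AES-128-GCM with a random IV (so tampering is detected and the output differs on every call, which means it will not reproduce the sample value shown earlier). It also assumes a Node.js release recent enough to support the 'base64url' Buffer encoding.

```javascript
const crypto = require('crypto');

// Example key only: 'super_secret_key' happens to be 16 bytes (128 bits).
// In practice, load a randomly generated key from configuration.
const KEY = Buffer.from('super_secret_key', 'utf8');

// Hypothetical helper: encrypt "ipv4:<ip>" and return a URL-friendly token.
function encodeClientToken(ip) {
  const iv = crypto.randomBytes(12); // fresh IV for every token
  const cipher = crypto.createCipheriv('aes-128-gcm', KEY, iv);
  const body = Buffer.concat([cipher.update('ipv4:' + ip, 'utf8'), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16-byte authentication tag
  return Buffer.concat([iv, tag, body]).toString('base64url');
}

// Hypothetical helper: return the IP, or null if the token is not legitimate.
function decodeClientToken(token) {
  try {
    const raw = Buffer.from(token, 'base64url');
    if (raw.length < 28) return null; // too short to hold IV + tag
    const decipher = crypto.createDecipheriv('aes-128-gcm', KEY, raw.subarray(0, 12));
    decipher.setAuthTag(raw.subarray(12, 28));
    const text = Buffer.concat([
      decipher.update(raw.subarray(28)),
      decipher.final(), // throws if the token was altered or the key is wrong
    ]).toString('utf8');
    return text.startsWith('ipv4:') ? text.slice('ipv4:'.length) : null;
  } catch (err) {
    return null;
  }
}
```

The HTTP side would call `encodeClientToken(clientIp)` while rendering the page, and the websocket side would call `decodeClientToken` on the value of the "client" parameter. A random IV changes the token each time, but a captured token still decrypts successfully later, so this sketch alone does nothing about replay.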

The problem, of course, with this setup is that it is subject to replay: a given client IP address will always generate the same value. If you are only using the client IP address for "informational purposes" (such as logging or debugging) then this may be sufficient. If you are using it for anything more significant, you may want to expand this implementation -- for example, by adding a timestamp:

'ipv4:192.168.1.1;valid:1356885663;'

On the receiving end, decode the string and check the timestamp. If it is not within +/- whatever interval in seconds that you deem safe, then don't trust it.
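
As a rough sketch of that check in JavaScript, assuming the string has already been decrypted and uses the `key:value;` layout shown above (the helper names `parseFields` and `isFresh` are my own, not part of any library):

```javascript
// Hypothetical helper: split 'ipv4:192.168.1.1;valid:1356885663;' into fields.
function parseFields(payload) {
  const fields = Object.create(null);
  for (const part of payload.split(';')) {
    if (part === '') continue; // skip the empty piece after the trailing ';'
    const idx = part.indexOf(':');
    if (idx === -1) return null; // malformed field
    fields[part.slice(0, idx)] = part.slice(idx + 1);
  }
  return fields;
}

// Hypothetical helper: true only if the "valid" timestamp (seconds since the
// epoch) is within maxSkewSeconds of nowSeconds, in either direction.
function isFresh(payload, nowSeconds, maxSkewSeconds) {
  const fields = parseFields(payload);
  if (!fields || !/^\d+$/.test(fields.valid || '')) return false;
  return Math.abs(nowSeconds - Number(fields.valid)) <= maxSkewSeconds;
}
```

A caller would typically pass `Math.floor(Date.now() / 1000)` as `nowSeconds` and whatever window it considers safe as `maxSkewSeconds`.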

These suggestions all hinge on your ability to dynamically generate the websocket url, the browser's ability to connect with it, and you being able to access the querystring portion of the URL in the websocket request... but if those pieces will fall into place, maybe this will help.
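
On the last point, here is a minimal sketch of reading the query parameter on the server, assuming a plain Node.js HTTP server where the websocket handshake arrives as an 'upgrade' request and `req.url` holds the path and querystring. The function name `clientParamFromUrl` is hypothetical, and a websocket library may expose the request differently.

```javascript
// Hypothetical helper: pull the "client" query parameter out of the request
// URL of a websocket handshake. Returns null if it is absent or unparseable.
function clientParamFromUrl(requestUrl) {
  try {
    // req.url is path-only (e.g. "/sock?client=..."), so supply a placeholder
    // base just to make it parseable; the host is never used.
    const url = new URL(requestUrl, 'ws://placeholder.invalid');
    return url.searchParams.get('client');
  } catch (err) {
    return null;
  }
}

// Possible wiring with Node's built-in http server (not executed here):
//   server.on('upgrade', (req, socket, head) => {
//     const token = clientParamFromUrl(req.url);
//     // ...decode and validate the token before accepting the connection
//   });
```

Note that query parsing treats a literal "+" as a space, which is one more reason to make the encoded value url-friendly before putting it in the URL.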

Additional thoughts (from comments):

The timestamp I suggested, above, is seconds from the epoch, which gives you an incrementing counter that requires no statefulness in your platform -- it only requires that all of your server clocks are correct -- so it doesn't add unnecessary complexity. If the decrypted value contains a timestamp less than (for example) 5 seconds different (+/-) from the server's current time, then you know you're dealing with an authenticated client. The time interval permitted only needs to be as long as the maximum reasonable time for a client to attempt its websocket connection after loading the original page, plus the maximum skew of all your server clocks.

It is true, of course, that with NAT, multiple different users could be behind the same source IP address. It's also true, though far less likely, that a user could actually make the websocket connection from a different source IP than the one where they originated the first http connection, and still be quite legitimate... and it sounds like the identity of the authenticated user may be a more important value for you than the actual source IP.

If you include the authenticated user's ID within the encrypted string as well, you have a value that is unique to origin IP, user account, and time, to a precision of 1 second. I think this is what you're referring to by additional salt. Adding the user account to the string should get you the information you're wanting.

'ipv4:192.168.1.1;valid:1356885663;memberid:32767;'
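
Building that string before encryption could look something like this. It is only a sketch: `buildPayload` is a name I made up, and rejecting ";" in the values is my own precaution so that one value cannot smuggle in an extra field.

```javascript
// Hypothetical helper: assemble the plaintext that gets encrypted into the
// websocket URL. nowSeconds is seconds since the epoch.
function buildPayload(ip, memberId, nowSeconds) {
  const values = [String(ip), String(memberId)];
  // Reject the field separator inside values.
  if (values.some((v) => v.includes(';'))) {
    throw new Error('field values must not contain ";"');
  }
  if (!Number.isInteger(nowSeconds) || nowSeconds < 0) {
    throw new Error('nowSeconds must be a non-negative integer');
  }
  return 'ipv4:' + values[0] + ';valid:' + nowSeconds + ';memberid:' + values[1] + ';';
}
```

On the websocket side, after decrypting, you would compare the member ID in the string with whatever your application considers the authenticated user, in addition to checking the timestamp.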

TLS should prevent discovery of this encrypted string by an unauthorized party, but avoidance of replayability is also important because the generated URL is available in clear text in a user's browser's "view source" for the html page. You don't want a user who is authorized today but unauthorized tomorrow to be able to spoof their way in with an encrypted string that should be recognized as no longer valid. Keying to a timestamp and requiring it to fall in a very small valid window prevents this.

Answer 2

It depends on how serious the application is.

Basing any kind of decision on client IP address is a risky proposition. Basing security on it, even more so. While the suggestions offered so far work well within the given constraints, they would not be sufficient for a robust enterprise application.

Client IP addresses can be obscured by NATs, as already pointed out. So people accessing the Web from their place of work will often appear to have the same IP address. People's routers at home act as a NAT, so every family member at home accessing the Web will appear to have the same IP address. Or even the same person accessing the application from a PC and a tablet...

Whether behind a NAT or not, using the application from two browsers on the same machine will appear to come from the same address. Similarly, multiple tabs in the same browser will appear to have the same address.

Other junction points like proxies or load balancers may also hide the original client IP address, so that the server behind the proxy or load balancer sees the intermediary as the client. (More sophisticated or lower level intermediaries can prevent this, which is what makes them more sophisticated or expensive.)

Given all of the above, a serious application should not rely on client IP address for any kind of important decision, especially around security.