Why does Internet Explorer fail to download a PDF using NodeJS and Express?

Question


I'm building a website with NodeJS that needs to serve some PDFs (among other files).

For reasons I cannot determine, Internet Explorer 8 fails to fully download the PDF into the Acrobat viewer the first time round (and sometimes on several attempts after that). Saving the file directly works just fine, but this isn't ideal. Chrome works fine, although I haven't tested other browsers.

There is no error message; the status bar simply stops updating and shows: Downloaded (2.97 MB of 16.33 MB)

I'm serving the file via NodeJS and the Express (v3, beta2) / Connect framework (it's the Connect Static middleware that is serving the file.) I'm also serving it via SSL, but turning this off doesn't appear to help.

Any ideas would be greatly appreciated! Thanks

EDIT - to include more details:

Firstly - I've upgraded from Express v2 to v3 to attempt to fix the issue - no such luck.

This is the app route that serves the files. The static serving component does work, so the issue appears to be somewhere within how either IE retrieves files or how express serves them to IE.

app.get('/store/*', ensureAuthenticated, express.static(__dirname + '/../uploads'));

function ensureAuthenticated(req, res, next) {
    // Let authenticated users through to the static handler
    if (req.isAuthenticated()) {
        return next();
    } else {
        res.redirect('/login');
    }
}

As for errors: I see no 404 or any other error in IE. It simply hangs with a blank screen, with the bottom-left status bar showing the amount downloaded as described above. Adobe eventually (~5 minutes later) fails with an alert: "This file is damaged and cannot be repaired". I know the file is not damaged, because occasionally IE will load it (see the Fiddler requests below).

In Fiddler, I see multiple requests from IE for the PDF. The first two requests failed, while the third successfully delivered the PDF.

If there is anything else I can provide do let me know.

internet-explorer
node.js
pdf
ssl

Answer 1

So, I figured out the problem (or at least, a few possible solutions).

It took me a while and an extensive amount of research, test cases and all sorts, but I got there eventually.

When IE8 (and possibly other browsers too, but I've not extensively tested) uses the Adobe 9 plugin (not verified with Adobe X or any other version) to request a PDF, it occasionally fetches the whole file in one request. This is the success I was seeing, and it corresponds to request 22 in the Fiddler capture from the question. It seems to grab the whole file if the file is small, but in this case I was testing with a 16MB PDF.

In other cases, Adobe will send a range header of this form:

Range: bytes=a-b, c-d

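Here a-b and c-d are two byte ranges that do not appear to be contiguous.

My knowledge of Range headers isn't that extensive, so I wasn't sure at first whether this is valid. As far as I can tell from the HTTP/1.1 spec, it is: a Range header may list several comma-separated byte ranges, and a server can either answer with a multipart/byteranges response or ignore the header and send the whole file.

Anyway, Connect's static.js uses a range-parsing method called parseRange, defined in lib/util.js. This method returns an array of ranges in the following form: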

[
   { start: a, end: b },
   { start: c, end: d}
]

static.js calls this method and assigns the result to ranges, then uses only ranges[0] to calculate the range, thereby ignoring the values c and d. The bytes between a and b are the only data read from the file and sent in the response.

I believe Adobe 9 then continues waiting for more data, which causes the hanging I was seeing.
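To make the behaviour described above concrete, here is a minimal sketch of a multi-range parser. It is not Connect's actual parseRange; the function name parseRangeSketch, the file size of 1000 and the byte offsets are all made up for illustration.

```javascript
// Hypothetical stand-in for a range parser; NOT Connect's real parseRange.
// size is the file size in bytes, header is the raw Range header value.
function parseRangeSketch(size, header) {
  const match = /^bytes=(.*)$/.exec(header.trim());
  if (!match) return null; // not a byte-range header

  const ranges = [];
  for (const part of match[1].split(',')) {
    const [rawStart, rawEnd] = part.trim().split('-');
    let start = parseInt(rawStart, 10);
    let end = parseInt(rawEnd, 10);

    if (isNaN(start)) {
      // "-n" means the last n bytes of the file
      start = size - end;
      end = size - 1;
    } else if (isNaN(end)) {
      // "n-" means from byte n to the end of the file
      end = size - 1;
    }

    // Reject anything malformed or out of order
    if (isNaN(start) || isNaN(end) || start < 0 || start > end) return null;
    ranges.push({ start: start, end: Math.min(end, size - 1) });
  }
  return ranges;
}

// Two non-contiguous ranges, like the header Adobe sends (offsets invented)
const ranges = parseRangeSketch(1000, 'bytes=0-99, 500-599');

// Using only the first entry, as described above, drops the second range
const served = ranges[0];
```

With this input, ranges holds two entries but served only covers the first, which matches the symptom: the client asked for two chunks and only ever receives one.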

The solutions

1. The simplest solution is to stop static.js from sending the 'Accept-Ranges' header.
2. A more complex (and not necessarily correct) solution I've implemented is to take the minimum of a and c and the maximum of b and d to create a single new range, and to return that in the range-checking code. This gist shows what I've done: https://gist.github.com/2930131
3. A third (possible) solution would be to detect when the client keeps waiting for data and do something with the connection (close it, send more data - I'm not sure!). I've no real idea how this would work, but I'll try referring it to people who might.
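A related workaround in the spirit of the first option, which avoids editing static.js, is to drop the incoming Range header before the static handler sees it. This is a different technique from removing 'Accept-Ranges', and it is only a sketch: the middleware name ignoreRange is hypothetical, it assumes the static handler only sends partial responses when a Range header is present, and I haven't verified it against IE8 with Adobe 9.

```javascript
// Hypothetical middleware: remove the Range request header so the static
// handler sends the whole file in one ordinary response.
// Node lowercases incoming header names, so the key is 'range'.
function ignoreRange(req, res, next) {
  delete req.headers.range;
  next();
}

// Assumed wiring, mirroring the route from the question:
// app.get('/store/*', ensureAuthenticated, ignoreRange,
//         express.static(__dirname + '/../uploads'));
```

The trade-off is that every client loses resumable and partial downloads for these files, not just the Adobe plugin.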

The second solution works for me in my use case. I hope that if you've stumbled across this, it's useful for you too!
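For anyone who can't reach the gist, the idea behind the second solution can be sketched roughly like this. It is my paraphrase of the description above rather than the gist's actual code, and mergeRanges is a hypothetical helper name.

```javascript
// Collapse several parsed ranges into one span that covers all of them:
// the smallest start and the largest end. Returns null for no ranges.
function mergeRanges(ranges) {
  if (!ranges || ranges.length === 0) return null;
  return {
    start: Math.min(...ranges.map(function (r) { return r.start; })),
    end: Math.max(...ranges.map(function (r) { return r.end; }))
  };
}

// Example with invented offsets: two separate chunks become one span
const merged = mergeRanges([
  { start: 0, end: 99 },
  { start: 500, end: 599 }
]);
```

Note that the merged span also includes the bytes between the two requested ranges, so the client receives more than it asked for in a single-range response. That is why this is "not necessarily correct", even though it worked in this case.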