Node vs. PHP - Right way to make HTTP POST web request

One website I have was originally written in PHP. It makes an HTTP POST request to another website every time a user runs a particular query on the site.

function post_request($url, $data, $referer = '') {
    $data = http_build_query($data);
    $url = parse_url($url);

    if ($url['scheme'] != 'http') {
        die('Error: Only HTTP requests are supported!');
    }

    // extract host and path:
    $host = $url['host'];
    $path = $url['path'];

    // open a socket connection on port 80 - timeout: 7 sec
    $fp = fsockopen($host, 80, $errno, $errstr, 7);

    if ($fp) {
        // Set non-blocking mode
        stream_set_blocking($fp, 0);

        // send the request headers:
        fputs($fp, "POST $path HTTP/1.1\r\n");
        fputs($fp, "Host: $host\r\n");

        if ($referer != '')
            fputs($fp, "Referer: $referer\r\n");

        fputs($fp, "User-Agent: Mozilla/5.0 Firefox/3.6.12\r\n");
        fputs($fp, "Content-Type: application/x-www-form-urlencoded\r\n");
        fputs($fp, "Content-Length: " . strlen($data) . "\r\n");
        fputs($fp, "Connection: close\r\n\r\n");
        fputs($fp, $data);

        $result = '';
        while (!feof($fp)) {
            // receive the results of the request
            $result .= fgets($fp, 128);
        }

        // close the socket connection:
        fclose($fp);
    } else {
        return array(
            'status' => 'err',
            'error' => "$errstr ($errno)"
        );
    }

    // split the result header from the content
    $result = explode("\r\n\r\n", $result, 2);

    $header = isset($result[0]) ? $result[0] : '';
    $content = isset($result[1]) ? $result[1] : '';

    // return as structured array:
    return array(
        'status' => 'ok',
        'header' => $header,
        'content' => $content
    );
}

This approach works trouble-free; the only problem is that it takes nearly 3 CPUs to support 100 concurrent users with the above code.

Thinking Node.js would be a good fit for this (the web request would be async), I wrote the following. In terms of CPU requirements there was a definite improvement: it mostly runs on a single CPU, at most 2.

var http = require('http');

function postPage(postPath, postData, postReferal, onReply, out) {
    var post_options = {
        host: 'www.somehost.com',
        port: '80',
        path: postPath,
        method: 'POST',
        headers: {
            'Referer': postReferal,
            'Content-Type': 'application/x-www-form-urlencoded',
            'Content-Length': postData.length,
            'User-Agent': 'Mozilla/5.0 Firefox/3.6.12',
            'Connection': 'close'
        }
    };

    // create request
    var post_req = http.request(post_options, function (res) {
        var reply = '';
        res.setEncoding('utf8');
        res.on('data', function (chunk) {
            reply += chunk;
        });

        res.on('end', function () {
            onReply(reply, out);
        });

        res.on('error', function (err) {
            out.writeHead(500, { 'Content-Type': 'text/html' });
            out.end('Error');
        });
    });

    // post the data
    post_req.write(postData);
    post_req.end();
}

The problem in this case is that it is very fragile: around 20% of the web requests fail. If the user tries the query again it works, but it is not a good experience.

I am using Windows Azure Web Sites to host both of the above solutions.

Now, the questions:

  1. Is PHP expected to use that many resources for this, or is my code just not optimal?
  2. What is wrong with my Node code (or with Azure) that causes so many requests to fail?

Use the request library

Buffering the entire response

The most basic way is to make a request, buffer the entire response from the remote service (indianrail.gov.in) into memory, and then send it back to the client. However, it is worth looking at the streaming example below as well.

Install the needed dependencies: npm install request eyespect

var request = require('request');
var inspect = require('eyespect').inspector({maxLength: 99999999});  // nicer console logging
var url = 'http://www.indianrail.gov.in';

var postData = {
  fooKey: 'foo value'
};
var postDataString = JSON.stringify(postData);
var opts = {
  url: url,
  method: 'post',
  body: postDataString // the body must be a string here; request can also encode key-value pairs for you, see the form option in the documentation
};

inspect(postDataString, 'post data body as a string');
inspect(url, 'posting to url');
request(opts, function (err, res, body) {
  if (err) {
    inspect('error posting request');
    console.log(err);
    return;
  }
  var statusCode = res.statusCode;
  inspect(statusCode, 'statusCode from remote service');
  inspect(body,'body from remote service');
});

Streaming

If you have a response stream to work with you can stream the post data without having to buffer everything into memory first. I am guessing in your example this is the out parameter.

To add some error handling, you can use the async module to repeatedly retry the POST request until it either completes successfully or the maximum number of attempts is reached.

npm install request filed temp eyespect async required-keys

var request = require('request');
var inspect = require('eyespect').inspector({maxLength: 99999999});  // nicer console logging
var filed = require('filed');
var temp = require('temp');
var rk = require('required-keys');
var async = require('async');

function postToService(data, cb) {

  // make sure the required key-value pairs were passed in the data parameter
  var keys = ['url', 'postData'];
  var err = rk.truthySync(data, keys);
  if (err) { return cb(err); }

  var url = data.url;
  var postData = data.postData;
  var postDataString = JSON.stringify(postData);
  var opts = {
    url: url,
    method: 'post',
    body: postDataString // the body must be a string here; request can also encode key-value pairs for you, see the form option in the documentation
  };

  var filePath = temp.path({suffix: '.html'});
  // open a writable stream to a file on disk. You could replace this with any writable stream, such as "out" in your example
  var file = filed(filePath);

  // listen for errors on the request itself, not just on the destination stream
  var req = request(opts);
  req.on('error', function (err) {
    inspect(err, 'error making the post request');
    cb(err);
  });

  // stream the response to disk just as an example
  var r = req.pipe(file);
  r.on('error', function (err) {
    inspect(err, 'error streaming response to file on disk');
    cb(err);
  });

  r.on('end', function () {
    cb();
  });
}

function keepPostingUntilSuccess(callback) {
  var url = 'http://www.google.com';
  var postData = {
    fooKey: 'foo value'
  };
  var data = {
    url: url,
    postData: postData
  };
  var complete = false;
  var maxAttempts = 50;
  var attempt = 0;
  async.until(
    function () {
      if (complete) {
        return true;
      }
      if (attempt >= maxAttempts) {
        return true;
      }
      return false;
    },

    function (cb) {
      attempt++;
      inspect(attempt, 'posting to remote service, attempt number');
      postToService(data, function (err) {

        // simulate the request failing 3 times, then completing correctly
        if (attempt < 3) {
          err = 'desired number of attempts not yet reached';
        }
        if (!err) {
          complete = true;
        }
        cb();
      });
    },
    function (err) {
      inspect(complete, 'done with posting, did we complete successfully?');
      if (complete) {
        return callback();
      }
      callback('failed to post data, maximum number of attempts reached');
    });
}


keepPostingUntilSuccess(function (err) {
  if (err) {
    inspect(err, 'error posting data');
    return;
  }
  inspect('posted data successfully');
});