In the PostgreSQL wire protocol, how can I reach ReadyForQuery from a partially executed SELECT?

I am trying to debug an issue with the `node-pg-cursor` module in node.js against a PostgreSQL server (version 9.3).

This module allows for sequential reads of N rows from a SELECT, and works by sending:

    cur.read(N): 'Execute' on portal=unnamed, rows=N

This command fetches up to N rows, and we can continue fetching rows incrementally until the end, where we receive:

    CommandComplete
    ReadyForQuery
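For reference, my client code looks roughly like this (a sketch using the pg and pg-cursor packages; the connection settings and `big_table` are placeholders):

    const { Client } = require('pg');
    const Cursor = require('pg-cursor');

    const client = new Client(); // connection settings come from the environment
    client.connect(err => {
      if (err) throw err;
      // Each cursor.read(N) is an 'Execute' on the unnamed portal with rows=N
      const cursor = client.query(new Cursor('SELECT * FROM big_table'));
      cursor.read(100, (err, rows) => {
        // rows.length < 100 means the portal is exhausted (CommandComplete);
        // otherwise the portal is merely suspended (PortalSuspended)
      });
    });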

Now my problem is that I want to bail out of the extended query before fetching all the rows and reaching the end of the Execute sequence: I would like to incrementally fetch N rows, N rows, N rows, ... and at some point decide that I have enough.

When I do that (stop fetching via Execute), the query never seems to reach CommandComplete or ReadyForQuery. This seems normal, since nothing tells the extended query that I am never going to ask it for rows again.

Apart from closing the connection, is there a command to reach CommandComplete or ReadyForQuery without fetching all the rows from the portal?

I tried sending Close and received CloseComplete, but the connection did not reach ReadyForQuery.

If I force an ErrorResponse by sending garbage on the protocol, I do reach ReadyForQuery, but that does not seem very clean...

It sounds like you might want to be using the asynchronous query processing API, if your driver is a libpq wrapper. If it's a native implementation, the source code for libpq may offer you clues.

Overall, it looks like you'll need to cancel the query using a new connection, then continue to consume input until the buffer is empty. You'll receive however much result data was buffered, then an error message indicating the query was cancelled (if it didn't buffer all its output before you cancelled it) and finally a ReadyForQuery.
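For what it's worth, the cancel itself is simple at the wire level: it goes out on a separate, throwaway connection. A minimal sketch in node.js, assuming you saved the backend process ID and secret key from the BackendKeyData message received at startup (host, port and those two values are placeholders here):

    const net = require('net');

    // CancelRequest: Int32 length (16), Int32 code (80877102),
    // Int32 backend process ID, Int32 secret key -- the last two come
    // from the BackendKeyData message received during startup.
    function sendCancelRequest(host, port, processID, secretKey) {
      const buf = Buffer.alloc(16);
      buf.writeInt32BE(16, 0);        // message length, including itself
      buf.writeInt32BE(80877102, 4);  // cancel request code
      buf.writeInt32BE(processID, 8);
      buf.writeInt32BE(secretKey, 12);
      const sock = net.connect(port, host, () => {
        sock.end(buf); // write the packet and close; the server sends no reply
      });
    }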

I quote the libpq manual:

A client that uses PQsendQuery/PQgetResult can also attempt to cancel a command that is still being processed by the server; see Section 31.6. But regardless of the return value of PQcancel, the application must continue with the normal result-reading sequence using PQgetResult. A successful cancellation will simply cause the command to terminate sooner than it would have otherwise.

Systems usually have quite big TCP send buffers, and they're typically dynamic. See Linux's tcp(7), the SO_SNDBUF option to setsockopt(2), etc. So quite a lot of data might be buffered before the PostgreSQL server blocks on writing to the socket. PostgreSQL doesn't offer per-connection control of the send buffer size, or even a global config option; you must do it at the operating system level. (That said, it'd be trivial to patch PostgreSQL to set a send buffer size with setsockopt and SO_SNDBUF if you wanted to.)

PostgreSQL can't just flush the output buffer when you cancel a query. Even if it were safe to do so and the platform supported it, Pg doesn't know for sure that the buffer has emptied of results from prior queries and other relevant messages, since you might have pipelined multiple queries.

So all you can really do is reduce the maximum size of the TCP output buffer. That'll reduce the amount of data you must read and throw away, but it may impact the performance of other queries that send bulk data.

Instead of trying to run the query and cancelling it when you've seen enough, I suggest reading rows in batches, requesting a new batch when you've consumed the current one. You can do this using protocol-level cursors. That way you control how much data the server queues up, and you don't have to mess with buffer sizes. You may already be doing this: using a named portal, sending an Execute with a maximum row count, and waiting for a PortalSuspended message to say there are more rows to read.

I think you're referring to this, in the documentation:

If Execute terminates before completing the execution of a portal (due to reaching a nonzero result-row count), it will send a PortalSuspended message; the appearance of this message tells the frontend that another Execute should be issued against the same portal to complete the operation. The CommandComplete message indicating completion of the source SQL command is not sent until the portal's execution is completed. Therefore, an Execute phase is always terminated by the appearance of exactly one of these messages: CommandComplete, EmptyQueryResponse (if the portal was created from an empty query string), ErrorResponse, or PortalSuspended.

Presumably, you're getting PortalSuspended and you want to discard the portal without executing any more of it or consuming any more results.
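All of those arrive as ordinary length-prefixed frames on the socket, so telling them apart is just a matter of switching on the type byte. A minimal frame-reading sketch (here `socket` is your connection to the backend; buffering across TCP chunks is handled, but error handling and row decoding are elided):

    let buf = Buffer.alloc(0);
    socket.on('data', chunk => {
      buf = Buffer.concat([buf, chunk]);
      // Each backend message: Byte1 type, Int32 length (counts itself
      // but not the type byte), then the payload
      while (buf.length >= 5) {
        const len = buf.readInt32BE(1);
        if (buf.length < len + 1) break; // incomplete frame; wait for more data
        const type = String.fromCharCode(buf[0]);
        const payload = buf.slice(5, len + 1);
        switch (type) {
          case 's': /* PortalSuspended: issue another Execute */ break;
          case 'C': /* CommandComplete: portal fully executed */ break;
          case 'E': /* ErrorResponse */ break;
          case 'Z': /* ReadyForQuery */ break;
        }
        buf = buf.slice(len + 1);
      }
    });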

If so, I think you can just send a Sync message:

At completion of each series of extended-query messages, the frontend should issue a Sync message. This parameterless message causes the backend to close the current transaction if it's not inside a BEGIN/COMMIT transaction block ("close" meaning to commit if no error, or roll back if error). Then a ReadyForQuery response is issued.
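Sync is about as small as a message gets; on the wire it is just the type byte and a length (a sketch, assuming you're writing to the connection's socket yourself):

    // Sync: Byte1('S'), Int32 length = 4 (the length counts itself,
    // not the type byte)
    socket.write(Buffer.from([0x53, 0x00, 0x00, 0x00, 0x04])); // 'S'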

You may wish to issue a Close against the portal first:

The Close message closes an existing prepared statement or portal and releases resources.
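Close is nearly as simple: a type byte, a length, a 'P' (portal) or 'S' (prepared statement) discriminator, and the name, which is empty for the unnamed portal (again a sketch for a hand-rolled client; the helper is hypothetical):

    // Close: Byte1('C'), Int32 length, Byte1('P' for portal), String name
    function closePortal(name) {
      const nameBuf = Buffer.from(name + '\0', 'utf8');
      const msg = Buffer.alloc(6 + nameBuf.length);
      msg.write('C', 0);                        // message type
      msg.writeInt32BE(5 + nameBuf.length, 1);  // 4 (self) + 1 ('P') + name
      msg.write('P', 5);                        // closing a portal
      nameBuf.copy(msg, 6);
      return msg;
    }
    socket.write(closePortal('')); // unnamed portal; server replies CloseComplete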

So, in message-flow terms, what I think you need to do is (a wire-level sketch follows the list):

  • Parse
  • Bind a named portal
  • Describe
  • Loop:
    • Execute with a row-count limit to fetch a batch of rows (followed by a Flush, so the server delivers the rows without ending the sequence)
    • If no more rows are needed:
      • Close the portal
      • Break out of the loop
    • If CommandComplete is received (the portal is exhausted):
      • Break out of the loop
  • Sync
  • Wait for ReadyForQuery
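Here's what that flow might look like at the wire level, with a small framing helper. This is a sketch, not a drop-in client: reading and dispatching the server's replies (see the frame reader above) is elided, and the query and portal name are illustrative.

    // Frame a message: Byte1 type, Int32 length (self-inclusive), body.
    // Strings become null-terminated C strings; Buffers pass through raw.
    function msg(type, ...parts) {
      const body = Buffer.concat(parts.map(p =>
        typeof p === 'string' ? Buffer.from(p + '\0', 'utf8') : p));
      const head = Buffer.alloc(5);
      head.write(type, 0);
      head.writeInt32BE(4 + body.length, 1);
      return Buffer.concat([head, body]);
    }
    const i16 = n => { const b = Buffer.alloc(2); b.writeInt16BE(n, 0); return b; };
    const i32 = n => { const b = Buffer.alloc(4); b.writeInt32BE(n, 0); return b; };

    // Parse: unnamed prepared statement, no parameter types declared
    socket.write(msg('P', '', 'SELECT * FROM big_table', i16(0)));
    // Bind: portal "c1" from the unnamed statement; no params, default formats
    socket.write(msg('B', 'c1', '', i16(0), i16(0), i16(0)));
    // Describe the portal, so a RowDescription comes back
    socket.write(msg('D', Buffer.from('P'), 'c1'));

    // Loop body: Execute with a row limit, then Flush ('H') so the server
    // actually delivers the rows without ending the extended-query sequence.
    // Repeat while the reply is PortalSuspended.
    socket.write(msg('E', 'c1', i32(100)));
    socket.write(msg('H'));

    // When you've seen enough (or after CommandComplete):
    socket.write(msg('C', Buffer.from('P'), 'c1')); // Close the portal
    socket.write(msg('S'));                         // Sync; expect ReadyForQuery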