31.5. Batch mode and query pipelining

libpq supports queueing up mulitiple queries into a pipeline to be executed as a batch on the server. Batching queries allows applications to avoid a client/server round-trip after each query to get the results before issuing the next query.

An example of batch use may be found in the source distribution in src/test/examples/libpqbatch.c.

31.5.1. When to use batching

Much like asynchronous query mode, there is no performance disadvantage to using batching and pipelining. It somewhat increased client application complexity and extra caution is required to prevent client/server network deadlocks, but can offer considerable performance improvements.

Batching is most useful when the server is distant, i.e. network latency ("ping time") is high, and when many small operations are being performed in rapid sequence. There is usually less benefit in using batches when each query takes many multiples of the client/server round-trip time to execute. A 100-statement operation run on a server 300ms round-trip-time away would take 30 seconds in network latency alone without batching; with batching it may spend as little as 0.3s waiting for results from the server.

Use batches when your application does lots of small INSERT, UPDATE and DELETE operations that can't easily be transformed into operations on sets or into a COPY operation.

Batching less useful when information from one operation is required by the client before it knows enough to send the next operation. The client must introduce a synchronisation point and wait for a full client/server round-trip to get the results it needs. However, it's often possible to adjust the client design to exchange the required information server-side. Read-modify-write cycles are especially good candidates; for example:

     BEGIN;
     SELECT x FROM mytable WHERE id = 42 FOR UPDATE;
     -- result: x=2
     -- client adds 1 to x:
     UPDATE mytable SET x = 3 WHERE id = 42;
     COMMIT;
    

could be much more efficiently done with:

     UPDATE mytable SET x = x + 1;
    

Note: The batch API was introduced in PostgreSQL 9.6, but clients using it can use batches on server versions 8.4 and newer. Batching works on any server that supports the v3 extended query protocol.

31.5.2. Using batch mode

To issue batches the application must switch libpq into batch mode. Enter batch mode with PQbeginBatchMode(conn) or test whether batch mode is active with PQisInBatchMode(conn). In batch mode only asynchronous operations are permitted, and COPY is not allowed. (The restriction on COPY is an implementation limit; the PostgreSQL protocol and server can support batched COPY).

The client uses libpq's asynchronous query functions to dispatch work, marking the end of each batch with PQsendEndBatch. Concurrently, it uses PQgetResult and PQgetNextQuery to get results. It may eventually exit batch mode with PQendBatchMode once all results are processed.

Note: It is best to use batch mode with libpq in non-blocking mode. If used in blocking mode it is possible for a client/server deadlock to occur. The client will block trying to send queries to the server, but the server will block trying to send results from queries it's already processed to the client. This only occurs when the client sends enough queries to fill its output buffer and the server's receive buffer before switching to processing input from the server, but it's hard to predict exactly when that'll happen so it's best to always use non-blocking mode.

31.5.2.1. Issuing queries

After entering batch mode the application dispatches requests using normal asynchronous libpq functions like PQsendQueryParams, PQsendPrepare, etc. The asynchronous requests are followed by a PQsendEndBatch(conn) call to mark the end of the batch. The client does not need to call PQgetResult immediately after dispatching each operation. Result processing is handled separately.

Batched operations will be executed by the server in the order the client sends them. The server will send the results in the order the statements executed. The server usually begins executing the batch before all commands in the batch are queued and the end of batch command is sent. If any statement encounters an error the server aborts the current transaction and skips processing the rest of the batch. Query processing resumes after the end of the failed batch.

It's fine for one operation to depend on the results of a prior one. One query may define a table that the next query in the same batch uses; similarly, an application may create a named prepared statement then execute it with later statements in the same batch.

31.5.2.2. Processing results

The client interleaves result processing with sending batch queries, or for small batches may process all results after sending the whole batch.

To get the result of the first batch entry the client must call PQgetNextQuery. It must then call PQgetResult and handle the results until PQgetResult returns null (or would return null if called). The result from the next batch entry may then be retrieved using PQgetNextQuery and the cycle repeated. The application handles individual statement results as normal.

PQgetResult behaves the same as for normal asynchronous processing except that it may contain the new PGresult types PGRES_BATCH_END and PGRES_BATCH_ABORTED. PGRES_BATCH_END is reported exactly once for each PQsendEndBatch call at the corresponding point in the result stream and at no other time. PGRES_BATCH_ABORTED is emitted during error handling; see error handling.

PQisBusy, PQconsumeInput, etc operate as normal when processing batch results.

libpq does not provide any information to the application about the query currently being processed. The application must keep track of the order in which it sent queries and the expected results. Applications will typically use a state machine or a FIFO queue for this.

31.5.2.3. Error handling

When a query in a batch causes an ERROR the server skips processing all subsequent messages until the end-of-batch message. The open transaction is aborted.

From the client perspective, after the client gets a PGRES_FATAL_ERROR return from PQresultStatus the batch is flagged as aborted. libpq will report PGRES_BATCH_ABORTED result for each remaining queued operation in an aborted batch. The result for PQsendEndBatch is reported as PGRES_BATCH_END to signal the end of the aborted batch and resumption of normal result processing.

The client must process results with PQgetNextQuery(...) and PQgetResult during error recovery.

If the batch used an implicit transaction then operations that have already executed are rolled back and operations that were queued for after the failed operation are skipped entirely. The same behaviour holds if the batch starts and commits a single explicit transaction (i.e. the first statement is BEGIN and the last is COMMIT) except that the session remains in an aborted transaction state at the end of the batch. If a batch contains multiple explicit transactions, all transactions that committed prior to the error remain committed, the currently in-progress transaction is aborted and all subsequent operations in the current and all later transactions in the same batch are skipped completely.

The client must not assume that work is committed when it sends a COMMIT, only when the corresponding result is received to confirm the commit is complete. Because errors arrive asynchronously the application needs to be able to restart from the last received committed change and resend work done after that point if something goes wrong.

31.5.2.4. Interleaving result processing and query dispatch

To avoid deadlocks on large batches the client should be structured around a nonblocking I/O loop using a function like select, poll, epoll, WaitForMultipleObjectEx, etc.

The client application should generally maintain a queue of work still to be dispatched and a queue of work that has been dispatched but not yet had its results processed. When the socket is writable it should dispatch more work. When the socket is readable it should read results and process them, matching them up to the next entry in its expected results queue. Batches should be scoped to logical units of work, usually (but not always) one transaction per batch. There's no need to exit batch mode and re-enter it between batches or to wait for one batch to finish before sending the next.

An example using select() and a simple state machine to track sent and received work is in src/test/examples/testlibpqbatch.c in the PostgreSQL source distribution.

31.5.2.5. Ending batch mode

Once all dispatched commands have had their results processed and the end batch result has been consumed the application may return to non-batched mode with PQendBatchMode(conn).

31.5.3. Functions associated with batch mode

PQisInBatchMode

Returns 1 if a libpq connection is in batch mode, otherwise 0.

int PQisInBatchMode(PGconn *conn);

PQbeginBatchMode

Causes a connection to enter batch mode if it is currently idle or already in batch mode and returns 1 for success. Returns 0 and has no effect if the connection is not currently idle, i.e. it has a result ready, is waiting for more input from the server, etc. This function does not actually send anything to the server, it just changes the libpq connection state.

int PQbeginBatchMode(PGconn *conn);

PQendBatchMode

Causes a connection to exit batch mode if it is currently in batch mode with an empty queue and no pending results and returns 1 for success. Returns 1 and takes no action if not in batch mode. If the connection has pending batch items in the queue for reading with PQgetNextQuery, the current statement isn't finished processing or there are results pending for collection with PQgetResult, returns 0 and does nothing.

int PQendBatchMode(PGconn *conn);

PQsendEndBatch

Delimits the end of a set of a batched commands by sending a sync message and flushing the send buffer. The end of a batch serves as the delimiter of an implicit transaction and an error recovery point; see error handling.

int PQsendEndBatch(PGconn *conn);

PQgetNextQuery

Causes the connection to start processing the next queued query's results. Returns 1 if a new query was popped from the result queue for processing. Returns 0 and has no effect if there are no query results pending, batch mode is not enabled, or if the query currently processed is incomplete or still has pending results. See processing results.

int PQgetNextQuery(PGconn *conn);

PQqueriesInBatch

Returns the number of queries still in the queue for this batch, not including any query that's currently having results being processsed. This is the number of times PQgetNextQuery has to be called before the query queue is empty again.

int PQqueriesInBatch(PGconn *conn);

PQbatchIsAborted

Returns 1 if the batch curently being received on a libpq connection in batch mode is aborted, 0 otherwise. The aborted flag is cleared as soon as the result of the PQsendEndBatch at the end of the aborted batch is processed. Clients don't usually need this function as they can tell that the batch is aborted from PGRES_BATCH_ABORTED result codes.

int PQbatchIsAborted(PGconn *conn);