PostgreSQL 9.6beta1 Documentation | |||
---|---|---|---|
Prev | Up | Chapter 31. libpq - C Library | Next |
libpq supports queueing up mulitiple queries into a pipeline to be executed as a batch on the server. Batching queries allows applications to avoid a client/server round-trip after each query to get the results before issuing the next query.
An example of batch use may be found in the source distribution in src/test/examples/libpqbatch.c.
Much like asynchronous query mode, there is no performance disadvantage to using batching and pipelining. It somewhat increased client application complexity and extra caution is required to prevent client/server network deadlocks, but can offer considerable performance improvements.
Batching is most useful when the server is distant, i.e. network latency ("ping time") is high, and when many small operations are being performed in rapid sequence. There is usually less benefit in using batches when each query takes many multiples of the client/server round-trip time to execute. A 100-statement operation run on a server 300ms round-trip-time away would take 30 seconds in network latency alone without batching; with batching it may spend as little as 0.3s waiting for results from the server.
Use batches when your application does lots of small INSERT, UPDATE and DELETE operations that can't easily be transformed into operations on sets or into a COPY operation.
Batching less useful when information from one operation is required by the client before it knows enough to send the next operation. The client must introduce a synchronisation point and wait for a full client/server round-trip to get the results it needs. However, it's often possible to adjust the client design to exchange the required information server-side. Read-modify-write cycles are especially good candidates; for example:
BEGIN; SELECT x FROM mytable WHERE id = 42 FOR UPDATE; -- result: x=2 -- client adds 1 to x: UPDATE mytable SET x = 3 WHERE id = 42; COMMIT;
could be much more efficiently done with:
UPDATE mytable SET x = x + 1;
Note: The batch API was introduced in PostgreSQL 9.6, but clients using it can use batches on server versions 8.4 and newer. Batching works on any server that supports the v3 extended query protocol.
To issue batches the application must switch
libpq into batch mode. Enter batch mode with PQbeginBatchMode(conn)
or test
whether batch mode is active with PQisInBatchMode(conn)
. In batch mode only asynchronous operations are permitted, and
COPY is not allowed. (The restriction on COPY is an implementation
limit; the PostgreSQL protocol and server can support batched COPY).
The client uses libpq's asynchronous query functions to dispatch work,
marking the end of each batch with PQsendEndBatch
.
Concurrently, it uses PQgetResult
and
PQgetNextQuery
to get results. It may eventually exit
batch mode with PQendBatchMode
once all results are
processed.
Note: It is best to use batch mode with libpq in non-blocking mode. If used in blocking mode it is possible for a client/server deadlock to occur. The client will block trying to send queries to the server, but the server will block trying to send results from queries it's already processed to the client. This only occurs when the client sends enough queries to fill its output buffer and the server's receive buffer before switching to processing input from the server, but it's hard to predict exactly when that'll happen so it's best to always use non-blocking mode.
After entering batch mode the application dispatches requests
using normal asynchronous libpq functions like
PQsendQueryParams
, PQsendPrepare
,
etc. The asynchronous requests are followed by a PQsendEndBatch(conn)
call to mark
the end of the batch. The client does not need to call
PQgetResult
immediately after dispatching each
operation. Result processing
is handled separately.
Batched operations will be executed by the server in the order the client sends them. The server will send the results in the order the statements executed. The server usually begins executing the batch before all commands in the batch are queued and the end of batch command is sent. If any statement encounters an error the server aborts the current transaction and skips processing the rest of the batch. Query processing resumes after the end of the failed batch.
It's fine for one operation to depend on the results of a prior one. One query may define a table that the next query in the same batch uses; similarly, an application may create a named prepared statement then execute it with later statements in the same batch.
The client interleaves result processing with sending batch queries, or for small batches may process all results after sending the whole batch.
To get the result of the first batch entry the client must call PQgetNextQuery
. It must then call
PQgetResult
and handle the results until
PQgetResult
returns null (or would return null if
called). The result from the next batch entry may then be retrieved using
PQgetNextQuery
and the cycle repeated. The
application handles individual statement results as normal.
PQgetResult
behaves the same as for normal asynchronous
processing except that it may contain the new PGresult types
PGRES_BATCH_END and PGRES_BATCH_ABORTED.
PGRES_BATCH_END is reported exactly once for each
PQsendEndBatch
call at the corresponding point in
the result stream and at no other time. PGRES_BATCH_ABORTED
is emitted during error handling; see error handling.
PQisBusy
, PQconsumeInput
, etc
operate as normal when processing batch results.
libpq does not provide any information to the application about the query currently being processed. The application must keep track of the order in which it sent queries and the expected results. Applications will typically use a state machine or a FIFO queue for this.
When a query in a batch causes an ERROR the server skips processing all subsequent messages until the end-of-batch message. The open transaction is aborted.
From the client perspective, after the client gets a
PGRES_FATAL_ERROR return from
PQresultStatus
the batch is flagged as aborted.
libpq will report
PGRES_BATCH_ABORTED result for each remaining queued
operation in an aborted batch. The result for
PQsendEndBatch
is reported as
PGRES_BATCH_END to signal the end of the aborted batch
and resumption of normal result processing.
The client must process results with
PQgetNextQuery(...)
and
PQgetResult
during error recovery.
If the batch used an implicit transaction then operations that have already executed are rolled back and operations that were queued for after the failed operation are skipped entirely. The same behaviour holds if the batch starts and commits a single explicit transaction (i.e. the first statement is BEGIN and the last is COMMIT) except that the session remains in an aborted transaction state at the end of the batch. If a batch contains multiple explicit transactions, all transactions that committed prior to the error remain committed, the currently in-progress transaction is aborted and all subsequent operations in the current and all later transactions in the same batch are skipped completely.
The client must not assume that work is committed when it sends a COMMIT, only when the corresponding result is received to confirm the commit is complete. Because errors arrive asynchronously the application needs to be able to restart from the last received committed change and resend work done after that point if something goes wrong.
To avoid deadlocks on large batches the client should be structured around
a nonblocking I/O loop using a function like select
,
poll
, epoll
,
WaitForMultipleObjectEx
, etc.
The client application should generally maintain a queue of work still to be dispatched and a queue of work that has been dispatched but not yet had its results processed. When the socket is writable it should dispatch more work. When the socket is readable it should read results and process them, matching them up to the next entry in its expected results queue. Batches should be scoped to logical units of work, usually (but not always) one transaction per batch. There's no need to exit batch mode and re-enter it between batches or to wait for one batch to finish before sending the next.
An example using select()
and a simple state machine
to track sent and received work is in
src/test/examples/testlibpqbatch.c in the PostgreSQL
source distribution.
Once all dispatched commands have had their results processed and the end batch
result has been consumed the application may return to non-batched mode with
PQendBatchMode(conn)
.
PQisInBatchMode
Returns 1 if a libpq connection is in batch mode, otherwise 0.
int PQisInBatchMode(PGconn *conn);
PQbeginBatchMode
Causes a connection to enter batch mode if it is currently idle or already in batch mode and returns 1 for success. Returns 0 and has no effect if the connection is not currently idle, i.e. it has a result ready, is waiting for more input from the server, etc. This function does not actually send anything to the server, it just changes the libpq connection state.
int PQbeginBatchMode(PGconn *conn);
PQendBatchMode
Causes a connection to exit batch mode if it is currently in batch mode
with an empty queue and no pending results and returns 1 for success.
Returns 1 and takes no action if not in batch mode. If the connection has
pending batch items in the queue for reading with
PQgetNextQuery
, the current statement isn't finished
processing or there are results pending for collection with
PQgetResult
, returns 0 and does nothing.
int PQendBatchMode(PGconn *conn);
PQsendEndBatch
Delimits the end of a set of a batched commands by sending a sync message and flushing the send buffer. The end of a batch serves as the delimiter of an implicit transaction and an error recovery point; see error handling.
int PQsendEndBatch(PGconn *conn);
PQgetNextQuery
Causes the connection to start processing the next queued query's results. Returns 1 if a new query was popped from the result queue for processing. Returns 0 and has no effect if there are no query results pending, batch mode is not enabled, or if the query currently processed is incomplete or still has pending results. See processing results.
int PQgetNextQuery(PGconn *conn);
PQqueriesInBatch
Returns the number of queries still in the queue for this batch, not
including any query that's currently having results being processsed.
This is the number of times PQgetNextQuery
has to be
called before the query queue is empty again.
int PQqueriesInBatch(PGconn *conn);
PQbatchIsAborted
Returns 1 if the batch curently being received on a
libpq connection in batch mode is
aborted, 0
otherwise. The aborted flag is cleared as soon as the result of the
PQsendEndBatch
at the end of the aborted batch is
processed. Clients don't usually need this function as they can tell
that the batch is aborted from PGRES_BATCH_ABORTED
result codes.
int PQbatchIsAborted(PGconn *conn);