UNLOGGED table CREATEd on one connection not immediately visible to another connection

Started by Geoff Winkless · 3 months ago · 6 messages · general

Hi

In our application we have a situation where once a day one process
CREATEs an UNLOGGED table and INSERTs several hundred records using
individual queries (no explicit transactions) all of which return
successfully. We then send the ID of the table that we have created
over a TCP socket to a second process, which runs a query that JOINs
against that new table.
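[Editor's sketch] The handoff described above can be simulated in a few lines. This is an illustrative sketch only, not the actual application code: the table naming, the newline framing, and the helper name are all assumptions, and a local socket pair stands in for the real TCP socket between separate processes.

```python
import socket

def make_table_name(seqid: int) -> str:
    """Build the per-run table name from the sequence id (naming assumed)."""
    return f"qreftmp{seqid}"

# A local socket pair stands in for the TCP socket between the two
# real processes.
producer, consumer = socket.socketpair()

seqid = 750
# Producer side: only after the CREATE and every INSERT have returned
# successfully is the id handed over.
producer.sendall(f"{seqid}\n".encode())

# Consumer side: receive the id and derive the table to JOIN against.
received = int(consumer.makefile().readline())
table = make_table_name(received)

producer.close()
consumer.close()
```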

Unfortunately quite often the second process is getting a
PGRES_FATAL_ERROR with

Primary: relation "qreftmp750" does not exist

Now (and this is very important) this appears to be a race condition,
because when that process immediately retries the same query (which we
do when we get FATAL_ERROR) it sometimes works on the second or third
(or even 11th) attempt.

If we were somehow failing to create the table then the retries would
never work, and we absolutely don't send the qreftmp ID to the second
process until we've successfully INSERTed all of the records, so the
race isn't on the application side. There's no explicit transactions
in either process involved, they all just use implicit autocommit, so
I don't see that this can be a DDL versioning issue.

I'm loath to point the finger at PG, because I'm sure that if this
were a real issue it would have been flagged up well before now, but
I've been staring at our code for days and I'm stumped. Any
suggestions?

Thanks

Geoff

#2 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Geoff Winkless (#1)
Re: UNLOGGED table CREATEd on one connection not immediately visible to another connection

On 1/29/26 07:04, Geoff Winkless wrote:

> In our application we have a situation where once a day one process
> CREATEs an UNLOGGED table and INSERTs several hundred records using
> individual queries (no explicit transactions) all of which return
> successfully. We then send the ID of the table that we have created
> over a TCP socket to a second process, which runs a query that JOINs
> against that new table.
>
> Unfortunately quite often the second process is getting a
> PGRES_FATAL_ERROR with
>
> Primary: relation "qreftmp750" does not exist
>
> [...]
>
> I've been staring at our code for days and I'm stumped. Any
> suggestions?

Provide the code for the procedure(s) that create the table and send the
ID to the other process.

Question, why is this not run in a single process?

Thanks

Geoff

--
Adrian Klaver
adrian.klaver@aklaver.com

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Geoff Winkless (#1)
Re: UNLOGGED table CREATEd on one connection not immediately visible to another connection

Geoff Winkless <pg@ukku.uk> writes:

> In our application we have a situation where once a day one process
> CREATEs an UNLOGGED table and INSERTs several hundred records using
> individual queries (no explicit transactions) all of which return
> successfully. We then send the ID of the table that we have created
> over a TCP socket to a second process, which runs a query that JOINs
> against that new table.
>
> Unfortunately quite often the second process is getting a
> PGRES_FATAL_ERROR with
>
> Primary: relation "qreftmp750" does not exist
>
> Now (and this is very important) this appears to be a race condition,
> because when that process immediately retries the same query (which we
> do when we get FATAL_ERROR) it sometimes works on the second or third
> (or even 11th) attempt.

Perhaps the second process is querying a standby server rather than
the primary? Replication lag could explain this. I don't really
believe that it's possible within a single PG server, though.
We take very substantial pains to avoid the race condition you're
positing.

If there is anything in your client software stack capable of issuing
implicit begin/commit, that'd deserve a second/third/fourth look...
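[Editor's sketch] Tom's first hypothesis is directly checkable from the suspect connection. A minimal sketch, assuming a `run_query` callable (the wrapper name is hypothetical) that executes SQL on the second process's connection and returns the first column of the first row; `pg_is_in_recovery()` itself is a real PostgreSQL function:

```python
def connected_to_standby(run_query) -> bool:
    # pg_is_in_recovery() returns true on a hot-standby replica and
    # false on the primary. If the second process's connection reports
    # true, replication lag would fully explain the
    # "relation does not exist" errors.
    return bool(run_query("SELECT pg_is_in_recovery()"))

# Stubs standing in for a real connection, for illustration only.
fake_primary = lambda sql: False
fake_standby = lambda sql: True
```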

regards, tom lane

#4 Geoff Winkless
pg@ukku.uk
In reply to: Adrian Klaver (#2)
Re: UNLOGGED table CREATEd on one connection not immediately visible to another connection

On Thu, 29 Jan 2026 at 15:09, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

> Question, why is this not run in a single process?

A little to do with logic separation but honestly mainly because of
history. Also the need for the first process to be allowed to get on
with other stuff while the work is done by the second process.

Yes, it could be threaded, but this is 35-year-old code; we could
rewrite it to use threads, but that really shouldn't be required,
should it?

> Provide the code for the procedure(s) that create the table and send the
> ID to the other process.

I can't provide the entire code, because it's not open. It's using
libpq and our own wrapper around PQsendQuery that handles things like
autoretries, auto-reconnection and grabbing all results to ensure the
connection is kept in a consistent state.

What we do:

INSERT INTO sometable (somefield) VALUES ('string') RETURNING seqid

We use that seqid to create a uniquely-named unlogged table

CREATE UNLOGGED TABLE qtemp%d (scid INTEGER PRIMARY KEY DEFAULT 0 NOT NULL)

We then do (in a loop around a list)

INSERT INTO qtemp%d (scid) values (%d) ON CONFLICT DO NOTHING

with the seqid from the first query and the values from our list

Each of these queries is tested for success by our wrapper. We would
see a failure in the log if any part of this had failed.

None of these queries is inside an explicit transaction.
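[Editor's sketch] Laid out end to end, the statement sequence just described looks like this. The helper is illustrative only; it simply fills the %d placeholders from the post with Python formatting:

```python
def build_setup_statements(seqid, scids):
    """Return the CREATE plus per-row INSERT statements for one run."""
    stmts = [
        f"CREATE UNLOGGED TABLE qtemp{seqid} "
        "(scid INTEGER PRIMARY KEY DEFAULT 0 NOT NULL)"
    ]
    # One INSERT per value, each sent as its own autocommitted statement,
    # exactly as described above.
    stmts += [
        f"INSERT INTO qtemp{seqid} (scid) VALUES ({scid}) ON CONFLICT DO NOTHING"
        for scid in scids
    ]
    return stmts

stmts = build_setup_statements(750, [101, 102])
```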

We then send that seqid value to the second process's socket, and that
process then runs a query that looks something like:

SELECT DISTINCT 0 AS gid, lid FROM galloc
INNER JOIN qtemp%d as qt USING(sc_id)
WHERE EXISTS (SELECT 1 FROM stg where s_id in (993,996,994,995,997)
and stg.party=0 AND stg.scid=qt.scid)
AND NOT EXISTS (SELECT 1 FROM gl WHERE gl.lid=galloc.lid AND gl.g_id=422)

We get the return error:

Primary: relation "qtempXXX" does not exist

(where XXX is the seqid)

The logging is verbose enough that I can see the seqid being used to
CREATE and INSERT the values matches the value that is being received
by the second process (and the query that is run matches that value).

Further, as I said previously it's very important that the receiving
process sometimes succeeds on retry - our wrapper retries the same
query string five times on fatal error, just in case of transient
failures. Sometimes we get 5 fatal errors in the log, sometimes only 2
(after which it then continues successfully).
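[Editor's sketch] The retry behaviour described, up to five attempts of the same query string on fatal error, can be sketched as follows. The wrapper and exception names are stand-ins for the real libpq wrapper, and the flaky stub reproduces the observed "2 fatal errors in the log, then success" pattern:

```python
class FatalQueryError(Exception):
    """Stand-in for a PGRES_FATAL_ERROR result from the wrapper."""

def run_with_retries(run_query, sql, attempts=5):
    """Re-send the same query string on fatal error, as the wrapper does."""
    last_error = None
    for _ in range(attempts):
        try:
            return run_query(sql)
        except FatalQueryError as exc:
            last_error = exc  # transient failure: try the same string again
    raise last_error

# Simulated backend that fails twice and then succeeds.
calls = {"n": 0}
def flaky(sql):
    calls["n"] += 1
    if calls["n"] <= 2:
        raise FatalQueryError('relation "qtemp750" does not exist')
    return "ok"

result = run_with_retries(flaky, "SELECT DISTINCT 0 AS gid, lid FROM galloc ...")
```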

Were it not for the retries succeeding I could accept that we might
somehow have failed to create the table (even though we are testing
for the success of the CREATE and the INSERTs), but it seems bizarre
to me that this could fail and then fail and then suddenly work.

There is no way the code could be dropping the table and then
recreating it after transmitting the seqid: there is no facility in
the code to drop the table. We do have a cron job that drops all
qtemp* tables overnight (at 23:00), but this error is occurring at
times nowhere near that (e.g. yesterday we saw this error at 13:28).

I'm fully prepared to accept that I'm doing something wrong. I've
tried to reproduce the error using a couple of test PHP processes
with a socket between them and haven't managed to reproduce the
failure in several million tries, but I'm at a loss as to what to look
for, so any suggestions would be gratefully accepted.

Geoff

#5 David G. Johnston
david.g.johnston@gmail.com
In reply to: Geoff Winkless (#4)

On Friday, February 6, 2026, Geoff Winkless <pg@ukku.uk> wrote:

> I can't provide the entire code, because it's not open. It's using
> libpq and our own wrapper around PQsendQuery that handles things like
> autoretries, auto-reconnection and grabbing all results to ensure the
> connection is kept in a consistent state.

Have you investigated schema and search_path interactions?
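[Editor's sketch] One way to rule search_path in or out is to schema-qualify the table name everywhere, so name resolution no longer depends on per-connection state. A sketch, assuming the tables live in `public` (the schema is an assumption; the thread never says):

```python
def qualified_table(seqid, schema="public"):
    # Schema-qualifying the name removes any dependence on the session's
    # search_path, which can differ between the two connections; running
    # SHOW search_path on both connections is the other half of the check.
    return f"{schema}.qtemp{seqid}"
```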

Do you have a reproducer on your end, so you can try different versions of
PostgreSQL to see if it is a regression, or remove components to
isolate a middleware dynamic? We'd probably need a reproducer without any
customizations to begin debugging core.

What version are you on?

David J.

#6 Greg Sabino Mullane
greg@turnstep.com
In reply to: Geoff Winkless (#4)
Re: UNLOGGED table CREATEd on one connection not immediately visible to another connection

On Fri, Feb 6, 2026 at 11:56 AM Geoff Winkless <pg@ukku.uk> wrote:

> I'm fully prepared to accept that I'm doing something wrong because I've
> tried to reproduce the error using a couple of test PHP processes
> with a socket between them and haven't managed to reproduce the failure in
> several million tries, but I'm at a loss as to what to look
> for, so any suggestion would be gratefully accepted.

Temporarily set log_statement = 'all', then parse through the logs to find
the error, and look around it to see when the table was created.
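[Editor's sketch] With log_statement = 'all' in place, the key question is ordering: was the CREATE for the failing table logged before the failing SELECT, and on which backend? A toy scan of that shape; the log lines here are invented for illustration, not real server output:

```python
def first_index(lines, needle):
    """Index of the first log line containing needle, or None."""
    for i, line in enumerate(lines):
        if needle in line:
            return i
    return None

# Invented log excerpt, for illustration only; real lines come from the
# server log after log_statement = 'all' is enabled and the config reloaded.
log = [
    "LOG:  statement: CREATE UNLOGGED TABLE qtemp750 (scid INTEGER PRIMARY KEY DEFAULT 0 NOT NULL)",
    "LOG:  statement: INSERT INTO qtemp750 (scid) VALUES (101) ON CONFLICT DO NOTHING",
    'ERROR:  relation "qtemp750" does not exist',
]

created_at = first_index(log, "CREATE UNLOGGED TABLE qtemp750")
failed_at = first_index(log, 'relation "qtemp750" does not exist')
```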

Cheers,
Greg