Cache lookup failed for type 34813 (CREATE TYPE AS ENUM + P/B/E insert, processor-specific)

Started by Eric Marsdenalmost 2 years ago3 messagesbugs
Jump to latest
#1Eric Marsden
eric.marsden@risk-engineering.org

Hello,

I have a strange bug which is either processor-specific (occurs on AMD
Ryzen 3 PRO 4350G processor, but not on Intel) or Linux kernel version
specific. It occurs with PostgreSQL v16.3 as packaged by Debian: 16.3
(Debian 16.3-1) on x86_64-pc-linux-gnu, compiled by gcc (Debian
13.2.0-24) 13.2.0, 64-bit, running on Linux 6.8.12-amd64 / Debian GLIBC
2.38-13.

I see a PostgreSQL error "cache lookup failed for type 34813" after the
following sequence of network requests over a single network connection
to PostgreSQL (which is otherwise idle):

CREATE TYPE rating AS ENUM('ungood', 'good')
CREATE TABLE act(name TEXT, value RATING)
INSERT INTO act VALUES('thoughtcrime', 'ungood') -- OK
INSERT INTO act VALUES('blackwhite', $1) -- with $1 = "good" / type oid for "rating" / text format

The last line, which triggers the error, is using the
prepare-bind-execute protocol, with a single parameter sent in text
mode, with the oid corresponding to the "CREATE TYPE" (queried from the
pg_type table). This is with an obscure Emacs Lisp implementation of the
frontend/backend protocol. I have produced a standalone Emacs Lisp file
which reproduces the problem, available as a gist here

   https://gist.github.com/emarsden/9bbaf02d62961cabaf5404f10da2c01c

To reproduce (requires Emacs to be installed), save as "cache-bug.el"
and run

PGURI="postgresql://pgeltestuser:password@localhost/pgeltestdb" emacs
--batch -Q -l cache-bug.el -f run

This produces a long backtrace of which the last line is (with FR locale)

   PostgreSQL error: "ERREUR : cache lookup failed for type 34841"

I have not made the effort to reproduce this with the psycopg Python
library (the use of custom types is a little inconvenient).

This bug does not occur on the exact same version of PostgreSQL on
Debian (same sha256sum on the postmaster binary) running on an Intel
processor with an older kernel version. Nor does it occur on the problem
machine when running the current Docker image for PostgreSQL (neither
docker.io/library/postgres:16.3-bullseye nor the 16.3-alpine label).

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Eric Marsden (#1)
Re: Cache lookup failed for type 34813 (CREATE TYPE AS ENUM + P/B/E insert, processor-specific)

Eric Marsden <eric.marsden@risk-engineering.org> writes:

I see a PostgreSQL error "cache lookup failed for type 34813" after the
following sequence of network requests over a single network connection
to PostgreSQL (which is otherwise idle):

CREATE TYPE rating AS ENUM('ungood', 'good')
CREATE TABLE act(name TEXT, value RATING)
INSERT INTO act VALUES('thoughtcrime', 'ungood') -- OK
INSERT INTO act VALUES('blackwhite', $1) -- with $1 = "good" / type oid for "rating" / text format

Interesting. To clarify, are these commands run within a single
transaction, and if not where are the transaction boundaries?

Would you run this under log_error_verbosity = verbose and post the
full postmaster log entry for the failure, especially the LOCATION
data? (Even better would be a stack trace from the errfinish call,
but perhaps LOCATION will be enough.)

regards, tom lane

#3Eric Marsden
eric.marsden@risk-engineering.org
In reply to: Tom Lane (#2)
Re: Cache lookup failed for type 34813 (CREATE TYPE AS ENUM + P/B/E insert, processor-specific) / user error

On 21/06/2024 17:35, Tom Lane wrote:

Eric Marsden<eric.marsden@risk-engineering.org> writes:

I see a PostgreSQL error "cache lookup failed for type 34813" after the
following sequence of network requests over a single network connection
to PostgreSQL (which is otherwise idle):
CREATE TYPE rating AS ENUM('ungood', 'good')
CREATE TABLE act(name TEXT, value RATING)
INSERT INTO act VALUES('thoughtcrime', 'ungood') -- OK
INSERT INTO act VALUES('blackwhite', $1) -- with $1 = "good" / type oid for "rating" / text format

Thanks to very useful discussions on #postgresql (IRC), I have found out
that this is a user error rather than a PostgreSQL bug. My PostgreSQL
network protocol library maintains an oid <-> type-name cache that is
populated at connection startup time. Due to improper cleanup in the
tests, a old value for the type oid was remaining when testing on my own
computer (but not in Docker nor for clean test runs on other machines),
so the PostgreSQL error message is perfectly valid.

I have realized that I need to stop using this oid <-> type-name cache
for types that are not PostgreSQL builtins, because there is no way of
ensuring it's synchronized with the backend values.

Thanks, and apologies for the spurious report,
Eric