Improve error message for duplicate labels in enum types

Started by Yugo Nagata6 months ago8 messages

nagata@sraoss.co.jp

6 months ago

1 attachment(s)

Hi,

Currently, when creating an enum type, duplicate labels are caught by a unique
index on pg_enum, resulting in a generic error message.

postgres=# create type t as enum ('a','b','a');
ERROR: duplicate key value violates unique constraint "pg_enum_typid_label_index"
DETAIL: Key (enumtypid, enumlabel)=(16418, a) already exists.

I propose adding an explicit check for duplicate labels during enum creation,
so that a more user-friendly and descriptive error message can be produced,
similar to what is already done in ALTER TYPE ... ADD VALUE
or ALTER TYPE ... RENAME VALUE .. TO ....

With the attached patch applied, the error message becomes:

ERROR: label "a" used more than once

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>

Attachments:

0001-Improve-error-message-for-duplicate-labels-in-enum-t.patchtext/x-diff; name=0001-Improve-error-message-for-duplicate-labels-in-enum-t.patchDownload

From 0b7f148e9ea6d9d6851e71f7ca3dc91b30b377af Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Thu, 3 Jul 2025 23:45:40 +0900
Subject: [PATCH] Improve error message for duplicate labels in enum types

Previously, duplicate labels in an enum type were caught by a unique
index on pg_enum, resulting in a generic error message. This adds an
explicit check beforehand to produce a more user-friendly and descriptive
error message.
---
 src/backend/catalog/pg_enum.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/src/backend/catalog/pg_enum.c b/src/backend/catalog/pg_enum.c
index a1634e58eec..9a4039d4669 100644
--- a/src/backend/catalog/pg_enum.c
+++ b/src/backend/catalog/pg_enum.c
@@ -164,6 +164,25 @@ EnumValuesCreate(Oid enumTypeOid, List *vals)
 	{
 		char	   *lab = strVal(lfirst(lc));
 		Name		enumlabel = palloc0(NAMEDATALEN);
+		ListCell   *lc2;
+
+		/*
+		 * Check for duplicate labels. The unique index on pg_enum would catch
+		 * that anyway, but we prefer a friendlier error message.
+		 */
+		foreach(lc2, vals)
+		{
+			char	   *lab2 = strVal(lfirst(lc2));
+
+			if (lc == lc2)
+				break;
+
+			if (strcmp(lab, lab2) == 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_DUPLICATE_OBJECT),
+						 errmsg("label \"%s\" used more than once",
+								lab)));
+		}
 
 		/*
 		 * labels are stored in a name field, for easier syscache lookup, so
-- 
2.43.0

Rahila Syed

rahilasyed90@gmail.com

6 months ago

In reply to: Yugo Nagata (#1)

Re: Improve error message for duplicate labels in enum types

Hi Yugo,

Currently, when creating an enum type, duplicate labels are caught by a
unique
index on pg_enum, resulting in a generic error message.

postgres=# create type t as enum ('a','b','a');
ERROR: duplicate key value violates unique constraint
"pg_enum_typid_label_index"
DETAIL: Key (enumtypid, enumlabel)=(16418, a) already exists.

I propose adding an explicit check for duplicate labels during enum
creation,
so that a more user-friendly and descriptive error message can be produced,
similar to what is already done in ALTER TYPE ... ADD VALUE
or ALTER TYPE ... RENAME VALUE .. TO ....

With the attached patch applied, the error message becomes:

ERROR: label "a" used more than once

Thank you for sharing the patch.
+1 to the idea of improving the error message.

Please take the following points mentioned into consideration.

1. I’m considering whether there might be a more efficient way to handle
this.
The current method adds an extra loop to check for duplicates, in addition
to the existing duplicate index check,
even when no duplicates are present. Would it be possible to achieve this
by wrapping the following
insert call in a PG_TRY() and PG_CATCH() block and logging more descriptive
error in the PG_CATCH() block?

CatalogTuplesMultiInsertWithInfo(pg_enum, slot, slotCount,

indstate);

2. If we choose to include the check in the 0001 patch you provided, would
it make more sense to place
it earlier in the function, before assigning OIDs to the labels and running
qsort? This way, we could
catch duplicates sooner and prevent unnecessary processing.

Thank you,
Rahila Syed

Yugo Nagata

nagata@sraoss.co.jp

6 months ago

In reply to: Rahila Syed (#2)

Re: Improve error message for duplicate labels in enum types

Hi Rahila,

On Fri, 4 Jul 2025 07:42:58 +0530
Rahila Syed <rahilasyed90@gmail.com> wrote:

Thank you for sharing the patch.
+1 to the idea of improving the error message.

Thank you for your review.

Please take the following points mentioned into consideration.

1. I’m considering whether there might be a more efficient way to handle
this.
The current method adds an extra loop to check for duplicates, in addition
to the existing duplicate index check,
even when no duplicates are present. Would it be possible to achieve this
by wrapping the following
insert call in a PG_TRY() and PG_CATCH() block and logging more descriptive
error in the PG_CATCH() block?

Although this introduces a loop-in-loop for checkin the dulicates, I believe
the impact to the performance is not high because the number of values in an
enum would be not so large and creating an enum type is not executed so fequently.
I found check_duplicates_in_publist() and interpret_function_parameter_list take
similar ways, where duplicates are checked in nested loops.

Although this introduces a nested loop to check for duplicates, I believe the
performance impact is negligible, since the number of values in an enum is typically
small, and enum type creation is not a frequent operation.
I found that check_duplicates_in_publist() and interpret_function_parameter_list()
take a similar approach, using nested loops to check for duplicates.

If we were to use PG_TRY and PG_CATCH block, we wouldn't be able to identify exactly
which label is duplicated.

2. If we choose to include the check in the 0001 patch you provided, would
it make more sense to place
it earlier in the function, before assigning OIDs to the labels and running
qsort? This way, we could
catch duplicates sooner and prevent unnecessary processing.

If the duplicate check were done before the loop, we would have to add an extra nested
loop, which seems wasteful. The loop is the first place where the actual label text is
accessed using strVal(), and there's already a check for the label length here.
So I believe this is also the appropriate place to check for duplicates.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>

Jim Jones

jim.jones@uni-muenster.de

5 months ago

In reply to: Yugo Nagata (#1)

Re: Improve error message for duplicate labels in enum types

Hi Yugo

On 03.07.25 17:04, Yugo Nagata wrote:

Currently, when creating an enum type, duplicate labels are caught by a unique
index on pg_enum, resulting in a generic error message.

postgres=# create type t as enum ('a','b','a');
ERROR: duplicate key value violates unique constraint "pg_enum_typid_label_index"
DETAIL: Key (enumtypid, enumlabel)=(16418, a) already exists.

I propose adding an explicit check for duplicate labels during enum creation,
so that a more user-friendly and descriptive error message can be produced,
similar to what is already done in ALTER TYPE ... ADD VALUE
or ALTER TYPE ... RENAME VALUE .. TO ....

With the attached patch applied, the error message becomes:

ERROR: label "a" used more than once

The error message for already existing enum labels starts with "enum",
e.g. ERROR: enum label "bar" already exists. So, perhaps this new
error message should follow the same pattern?

I also wonder if we need to add tests for it, so that we make sure the
new error is triggered prior to the generic one, e.g. in create_type.sql

-- check for duplicate enum entries
CREATE TYPE den AS ENUM ('foo','bar','foo');
CREATE TYPE en AS ENUM ('foo','bar');
ALTER TYPE en ADD VALUE 'foo';
ALTER TYPE en RENAME VALUE 'foo' TO 'bar';
DROP TYPE en;

Regards, Jim

Yugo Nagata

nagata@sraoss.co.jp

5 months ago

In reply to: Jim Jones (#4)

1 attachment(s)

Re: Improve error message for duplicate labels in enum types

Hi Jim,

On Mon, 25 Aug 2025 10:08:23 +0200
Jim Jones <jim.jones@uni-muenster.de> wrote:

Hi Yugo

On 03.07.25 17:04, Yugo Nagata wrote:

Currently, when creating an enum type, duplicate labels are caught by a unique
index on pg_enum, resulting in a generic error message.

postgres=# create type t as enum ('a','b','a');
ERROR: duplicate key value violates unique constraint "pg_enum_typid_label_index"
DETAIL: Key (enumtypid, enumlabel)=(16418, a) already exists.

I propose adding an explicit check for duplicate labels during enum creation,
so that a more user-friendly and descriptive error message can be produced,
similar to what is already done in ALTER TYPE ... ADD VALUE
or ALTER TYPE ... RENAME VALUE .. TO ....

With the attached patch applied, the error message becomes:

ERROR: label "a" used more than once

The error message for already existing enum labels starts with "enum",
e.g. ERROR: enum label "bar" already exists. So, perhaps this new
error message should follow the same pattern?

Thank you for taking a look. That makes sense, so I updated the message to:

ERROR: enum label "a" used more than once

I also wonder if we need to add tests for it, so that we make sure the
new error is triggered prior to the generic one, e.g. in create_type.sql

-- check for duplicate enum entries
CREATE TYPE den AS ENUM ('foo','bar','foo');
CREATE TYPE en AS ENUM ('foo','bar');
ALTER TYPE en ADD VALUE 'foo';
ALTER TYPE en RENAME VALUE 'foo' TO 'bar';
DROP TYPE en;

I also added a test for duplicate enum entries to enum.sql,
since tests for existing entries are already there.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>

Attachments:

v2-0001-Improve-error-message-for-duplicate-labels-in-enu.patchtext/x-diff; name=v2-0001-Improve-error-message-for-duplicate-labels-in-enu.patchDownload

From 9b7392f7e26af0c6464731c42adc5b8b91a011f7 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Thu, 3 Jul 2025 23:45:40 +0900
Subject: [PATCH v2] Improve error message for duplicate labels in enum types

Previously, duplicate labels in an enum type were caught by a unique
index on pg_enum, resulting in a generic error message. This adds an
explicit check beforehand to produce a more user-friendly and descriptive
error message.
---
 src/backend/catalog/pg_enum.c      | 19 +++++++++++++++++++
 src/test/regress/expected/enum.out |  3 +++
 src/test/regress/sql/enum.sql      |  3 +++
 3 files changed, 25 insertions(+)

diff --git a/src/backend/catalog/pg_enum.c b/src/backend/catalog/pg_enum.c
index a1634e58eec..daa11d34976 100644
--- a/src/backend/catalog/pg_enum.c
+++ b/src/backend/catalog/pg_enum.c
@@ -164,6 +164,25 @@ EnumValuesCreate(Oid enumTypeOid, List *vals)
 	{
 		char	   *lab = strVal(lfirst(lc));
 		Name		enumlabel = palloc0(NAMEDATALEN);
+		ListCell   *lc2;
+
+		/*
+		 * Check for duplicate labels. The unique index on pg_enum would catch
+		 * that anyway, but we prefer a friendlier error message.
+		 */
+		foreach(lc2, vals)
+		{
+			char	   *lab2 = strVal(lfirst(lc2));
+
+			if (lc == lc2)
+				break;
+
+			if (strcmp(lab, lab2) == 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_DUPLICATE_OBJECT),
+						 errmsg("enum label \"%s\" used more than once",
+								lab)));
+		}
 
 		/*
 		 * labels are stored in a name field, for easier syscache lookup, so
diff --git a/src/test/regress/expected/enum.out b/src/test/regress/expected/enum.out
index 4d9f36d0d36..09201489080 100644
--- a/src/test/regress/expected/enum.out
+++ b/src/test/regress/expected/enum.out
@@ -52,6 +52,9 @@ hint           |
 sql_error_code | 22P02
 
 \x
+-- check for duplicate enum entries
+CREATE TYPE den AS ENUM ('foo','bar','foo');
+ERROR:  enum label "foo" used more than once
 --
 -- adding new values
 --
diff --git a/src/test/regress/sql/enum.sql b/src/test/regress/sql/enum.sql
index ecc4878a678..8f744c71bea 100644
--- a/src/test/regress/sql/enum.sql
+++ b/src/test/regress/sql/enum.sql
@@ -23,6 +23,9 @@ SELECT * FROM pg_input_error_info('mauve', 'rainbow');
 SELECT * FROM pg_input_error_info(repeat('too_long', 32), 'rainbow');
 \x
 
+-- check for duplicate enum entries
+CREATE TYPE den AS ENUM ('foo','bar','foo');
+
 --
 -- adding new values
 --
-- 
2.43.0

Jim Jones

jim.jones@uni-muenster.de

5 months ago

In reply to: Yugo Nagata (#5)

Re: Improve error message for duplicate labels in enum types

On 26.08.25 04:55, Yugo Nagata wrote:

Thank you for taking a look. That makes sense, so I updated the message to:

ERROR: enum label "a" used more than once

Nice.

I also added a test for duplicate enum entries to enum.sql,
since tests for existing entries are already there.

LGTM; I'll mark the CF entry as Ready for Committer.

Regards, Jim

Tom Lane

tgl@sss.pgh.pa.us

4 months ago

In reply to: Jim Jones (#6)

Re: Improve error message for duplicate labels in enum types

Jim Jones <jim.jones@uni-muenster.de> writes:

LGTM; I'll mark the CF entry as Ready for Committer.

Pushed with some trivial cosmetic adjustments, including the
perhaps-not-so-trivial fix of removing the comment you falsified.

I was concerned about Rahila's upthread worry about the performance
of this approach, but in some quick testing it seemed to add only
barely-noticeable overhead even at 1000 enum labels. At 10000
labels it's slightly annoying: my machine goes from ~80ms to ~250ms.
But that seems well beyond what anybody would be likely to use,
so I judge it not worth trying to be smarter.

The obvious solution if we did wish to avoid the O(N^2) behavior would
be to qsort the labels and then compare only adjacent ones. That'd
require a temporary array though, and I'd bet it's actually slower
than this way for normal-sized enums. Another possibility perhaps is
to apply the check only when there are fewer than say 1000 labels,
reasoning that anything bigger is probably machine-generated anyhow.

regards, tom lane

Yugo Nagata

nagata@sraoss.co.jp

4 months ago

In reply to: Tom Lane (#7)

Re: Improve error message for duplicate labels in enum types

On Tue, 2 Sep 2025 14:00:52 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jim Jones <jim.jones@uni-muenster.de> writes:

LGTM; I'll mark the CF entry as Ready for Committer.

Pushed with some trivial cosmetic adjustments, including the
perhaps-not-so-trivial fix of removing the comment you falsified.

Thank you for committing the patch and for fixing it.

I was concerned about Rahila's upthread worry about the performance
of this approach, but in some quick testing it seemed to add only
barely-noticeable overhead even at 1000 enum labels. At 10000
labels it's slightly annoying: my machine goes from ~80ms to ~250ms.
But that seems well beyond what anybody would be likely to use,
so I judge it not worth trying to be smarter.

The obvious solution if we did wish to avoid the O(N^2) behavior would
be to qsort the labels and then compare only adjacent ones. That'd
require a temporary array though, and I'd bet it's actually slower
than this way for normal-sized enums. Another possibility perhaps is
to apply the check only when there are fewer than say 1000 labels,
reasoning that anything bigger is probably machine-generated anyhow.

I also thought the O(N^2) behavior was acceptable, since it seemed unlikely
that users would try to create such a large number of enum labels.
If any complaints arise in the future, we can address them then.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>