Patch: Add parse_type Function

Started by David E. Wheeleralmost 2 years ago49 messages
#1David E. Wheeler
david@justatheory.com
1 attachment(s)

Hackers,

Attached is a patch to add a new function, `parse_type()`. It parses a type string and returns a record with the typid and typmod for the type, or raises an error if the type is invalid. It’s effectively a thin layer over the parser’s parseTypeString() function.

The purpose of this function is to allow uses to resolve any type name, including aliases, to its formal name as rendered by, for example, pg_dump. An example:

david=# WITH s(s) AS (
SELECT * FROM unnest(ARRAY[
'timestamp(4)',
'interval(0)',
'interval second(0)',
'timestamptz',
'timestamptz(6)',
'varchar',
'varchar(128)'
])
),
p(type, typid, typmod) AS (
SELECT s, ((parse_type(s)).*)
FROM s
)
SELECT type, format_type(typid, typmod) FROM p;
type | format_type
--------------------+--------------------------------
timestamp(4) | timestamp(4) without time zone
interval(0) | interval(0)
interval second(0) | interval second(0)
timestamptz | timestamp with time zone
timestamptz(6) | timestamp(6) with time zone
varchar | character varying
varchar(128) | character varying(128)
(7 rows)

The use case leading to this patch was pgTAP column data type assertions. pgTAP traditionally required full type names for testing data types, but recently added support for aliases. This broke for some types, because it was difficult to extract the typmod correctly, and even when we did, it failed for types such as `interval second(0)`, where `second (0)` is the typmod, and there was no sensible way to access the bit mask to generate the proper typmod to pass to `format_type()`.

Erik Wienhold wrote this function to solve the problem, and I volunteered to submit a patch.

Potential issues and questions for this patch:

* Is `parse_type()` the right name to use? I can see arguments for `parse_type_string()`, `pg_parse_type()`, and `pg_parse_type_string()`. Whatever the consensus is works for me.

* The function returns a record, which is a little fussy. I could see adding a `format_type_string()` function that just contains `SELECT format_type(p.typmod, p.typid) FROM parse_type($1) p` and returns the formatted type. But I think it might also be useful to have access to the typmod and typid directly via this method, since they’re already exposed as `format_type()`’s arguments.

* Is format_type.c the right home for the function? I put it there because I closely associate it with format_type(), but happy to move it to a more appropriate location.

* I tried to link to `format_type` from the docs entry for `parse_type`, and added an ID for the former, but I’m not sure it resolves.

Best,

David

Attachments:

v1-0001-Add-parse_type-SQL-function.patchapplication/octet-stream; name=v1-0001-Add-parse_type-SQL-function.patch; x-unix-mode=0644Download
From b426a1a258b65237632039abffb1ab101ce709c9 Mon Sep 17 00:00:00 2001
From: "David E. Wheeler" <david@justatheory.com>
Date: Sun, 4 Feb 2024 12:49:18 -0500
Subject: [PATCH v1] Add parse_type() SQL function

The `parse_type()` function uses the underlying `parseTypeString()` C
function to parse a string representing a data type into a type ID and
typmod suitabld for passing to `format_type()`. This allows one to
derive the formal SQL name for a type from a string that may be an
alias:

    SELECT format_type(p.typid, p.typmod)
      FROM parse_type('timestamp(4)') p;

            format_type
    --------------------------------
    timestamp(4) without time zone

Also accounts for data typs that require the SQL grammar to be parsed:

    SELECT format_type(p.typid, p.typmod)
      FROM parse_type('interval second(0)') p;

        format_type
    --------------------
    interval second(0)

Useful for unit tests for against column data types, for example.
Originally written by Erik Wienhold for use in pgTAP.
---
 doc/src/sgml/func.sgml                    | 31 ++++++++++-
 src/backend/utils/adt/format_type.c       | 66 +++++++++++++++++++++++
 src/include/catalog/pg_proc.dat           |  4 ++
 src/include/utils/builtins.h              |  2 +
 src/test/regress/expected/create_type.out | 47 ++++++++++++++++
 src/test/regress/sql/create_type.sql      | 27 ++++++++++
 6 files changed, 176 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6788ba8ef4..12e2d69568 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24754,7 +24754,7 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
 
      <tbody>
       <row>
-       <entry role="func_table_entry"><para role="func_signature">
+       <entry id="format_type" role="func_table_entry"><para role="func_signature">
         <indexterm>
          <primary>format_type</primary>
         </indexterm>
@@ -24768,6 +24768,35 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
        </para></entry>
       </row>
 
+      <row>
+       <entry id="parse_type" role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>parse_type</primary>
+        </indexterm>
+        <function>parse_type</function> ( <parameter>type</parameter> <type>text</type> )
+        <returnvalue>record</returnvalue>
+        ( <parameter>typid</parameter> <type>oid</type>,
+          <parameter>typmod</parameter> <type>int4</type> )
+       </para>
+       <para>
+        Parses a string representing an SQL type declaration as used in a
+        <command>CREATE TABLE</command> statement, optionally schema-qualified.
+        Returns a record with two fields, <parameter>typid</parameter> and
+        <parameter>typid</parameter>, representing the OID and modifier for the
+        type. These correspond to the parameters to pass to the
+        <link linkend="format_type"><function>format_type</function> function.</link>
+       </para>
+       <para>
+        For example:
+<programlisting>
+SELECT format_type(p.typid, p.typmod) FROM parse_type('timestamp(4)') p;
+        format_type
+ --------------------------------
+  timestamp(4) without time zone
+</programlisting>
+       </para></entry>
+      </row>
+
       <row>
        <entry id="pg-char-to-encoding" role="func_table_entry"><para role="func_signature">
         <indexterm>
diff --git a/src/backend/utils/adt/format_type.c b/src/backend/utils/adt/format_type.c
index 28ba0fbd19..2477c8b8e0 100644
--- a/src/backend/utils/adt/format_type.c
+++ b/src/backend/utils/adt/format_type.c
@@ -26,6 +26,9 @@
 #include "utils/lsyscache.h"
 #include "utils/numeric.h"
 #include "utils/syscache.h"
+#include "fmgr.h"
+#include "funcapi.h"
+#include "parser/parse_type.h"
 
 static char *printTypmod(const char *typname, int32 typmod, Oid typmodout);
 
@@ -482,3 +485,66 @@ oidvectortypes(PG_FUNCTION_ARGS)
 
 	PG_RETURN_TEXT_P(cstring_to_text(result));
 }
+
+/*
+ * Given a string that is supposed to be a SQL-compatible type declaration,
+ * such as "int4" or "integer" or "character varying(32)", parse
+ * the string and convert it to a type OID and type modifier.
+ *
+ * Raises an error on an invalid type.
+ */
+
+/*
+ * parse_type() is the inverse of pg_catalog.format_type(): it takes a string
+ * representing an SQL-compatible type declaration, such as "int4" or "integer"
+ * or "character varying(32)", parses it, and returns the OID and type modifier.
+ * Returns NULL for an invalid type.
+ *
+ * Internally it relies on the Postgres core parseTypeString() function defined
+ * in src/backend/parser/parse_type.c.
+ */
+PG_FUNCTION_INFO_V1(parse_type);
+
+Datum
+parse_type(PG_FUNCTION_ARGS)
+{
+#define PARSE_TYPE_STRING_COLS 2 /* Returns two columns. */
+	const char *type;            /* the type string we want to resolve */
+	Oid         typid;           /* the resolved type oid */
+	int32       typmod;          /* the resolved type modifier */
+	TupleDesc   tupdesc;
+	HeapTuple   rettuple;
+	Datum       values[PARSE_TYPE_STRING_COLS] = {0};
+	bool        nulls[PARSE_TYPE_STRING_COLS] = {0};
+
+	type = text_to_cstring(PG_GETARG_TEXT_PP(0));
+
+	/*
+	 * Build a tuple descriptor for our result type; return an error if not
+	 * called in a context that expects a record.
+	 */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE) {
+		ereport(
+			ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			errmsg("function returning record called in context that cannot accept type record"))
+		);
+	}
+
+	BlessTupleDesc(tupdesc);
+
+	/*
+	 * Parse type-name argument to obtain type OID and encoded typmod. We don't
+	 * need to check for parseTypeString failure, but just let the error be
+	 * raised. The 0 arg works both as the `Node *escontext` arg in Postgres 16
+	 * and the `bool missing_ok` arg in 9.4-15.
+	 */
+	(void) parseTypeString(type, &typid, &typmod, 0);
+
+	/* Create and return tuple. */
+	values[0] = typid;
+	values[1] = typmod;
+	rettuple = heap_form_tuple(tupdesc, values, nulls);
+	return HeapTupleGetDatum(rettuple);
+#undef PARSE_TYPE_STRING_COLS
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 29af4ce65d..31d7024010 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -2184,6 +2184,10 @@
 { oid => '1081', descr => 'format a type oid and atttypmod to canonical SQL',
   proname => 'format_type', proisstrict => 'f', provolatile => 's',
   prorettype => 'text', proargtypes => 'oid int4', prosrc => 'format_type' },
+{ oid => '8401', descr => 'parse a type string into its a type oid and atttypmod',
+  proname => 'parse_type', proisstrict => 'f', provolatile => 's',
+  prorettype => 'record', proargtypes => 'text', prosrc => 'parse_type',
+  proallargtypes => '{text,oid,int4}', proargmodes => '{i,o,o}', proargnames => '{typname,typid,typmod}' },
 { oid => '1084', descr => 'I/O',
   proname => 'date_in', provolatile => 's', prorettype => 'date',
   proargtypes => 'cstring', prosrc => 'date_in' },
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 359c570f23..264ad090be 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -133,6 +133,8 @@ extern char *format_type_with_typemod(Oid type_oid, int32 typemod);
 
 extern int32 type_maximum_size(Oid type_oid, int32 typemod);
 
+extern Datum parse_type(PG_FUNCTION_ARGS);
+
 /* quote.c */
 extern char *quote_literal_cstr(const char *rawstr);
 
diff --git a/src/test/regress/expected/create_type.out b/src/test/regress/expected/create_type.out
index 7383fcdbb1..5a3ebfc658 100644
--- a/src/test/regress/expected/create_type.out
+++ b/src/test/regress/expected/create_type.out
@@ -249,6 +249,53 @@ select format_type('bpchar'::regtype, -1);
  bpchar
 (1 row)
 
+-- Test parse_type
+SELECT * FROM parse_type('text') p(typid, typmod);
+ typid | typmod 
+-------+--------
+    25 |     -1
+(1 row)
+
+-- Test parse_type errors
+SELECT parse_type('nonesuch'); -- error expected
+ERROR:  type "nonesuch" does not exist
+SELECT parse_type('interval nonesuch'); -- grammar error expected
+ERROR:  syntax error at or near "nonesuch"
+LINE 1: SELECT parse_type('interval nonesuch');
+                 ^
+CONTEXT:  invalid type name "interval nonesuch"
+SELECT parse_type('year(4)'); -- grammar error expected
+ERROR:  type "year" does not exist
+-- Test parse_type with various aliases and grammar-based types
+WITH s(s) AS (
+    SELECT * FROM unnest(ARRAY[
+        'timestamp(4)',
+        'interval(0)',
+        'interval second(0)',
+        'timestamptz',
+        'timestamptz(6)',
+        'varchar',
+        'varchar(128)',
+        'mytab'
+    ])
+),
+p(typid, typmod) AS (
+    SELECT ((parse_type(s)).*)
+      FROM s
+)
+SELECT format_type(typid, typmod) FROM p;
+          format_type           
+--------------------------------
+ timestamp(4) without time zone
+ interval(0)
+ interval second(0)
+ timestamp with time zone
+ timestamp(6) with time zone
+ character varying
+ character varying(128)
+ mytab
+(8 rows)
+
 -- Test non-error-throwing APIs using widget, which still throws errors
 SELECT pg_input_is_valid('(1,2,3)', 'widget');
  pg_input_is_valid 
diff --git a/src/test/regress/sql/create_type.sql b/src/test/regress/sql/create_type.sql
index c25018029c..c14c7590a9 100644
--- a/src/test/regress/sql/create_type.sql
+++ b/src/test/regress/sql/create_type.sql
@@ -192,6 +192,33 @@ select format_type('bpchar'::regtype, null);
 -- this behavior difference is intentional
 select format_type('bpchar'::regtype, -1);
 
+-- Test parse_type
+SELECT * FROM parse_type('text') p(typid, typmod);
+
+-- Test parse_type errors
+SELECT parse_type('nonesuch'); -- error expected
+SELECT parse_type('interval nonesuch'); -- grammar error expected
+SELECT parse_type('year(4)'); -- grammar error expected
+
+-- Test parse_type with various aliases and grammar-based types
+WITH s(s) AS (
+    SELECT * FROM unnest(ARRAY[
+        'timestamp(4)',
+        'interval(0)',
+        'interval second(0)',
+        'timestamptz',
+        'timestamptz(6)',
+        'varchar',
+        'varchar(128)',
+        'mytab'
+    ])
+),
+p(typid, typmod) AS (
+    SELECT ((parse_type(s)).*)
+      FROM s
+)
+SELECT format_type(typid, typmod) FROM p;
+
 -- Test non-error-throwing APIs using widget, which still throws errors
 SELECT pg_input_is_valid('(1,2,3)', 'widget');
 SELECT pg_input_is_valid('(1,2)', 'widget');  -- hard error expected
-- 
2.43.0

#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: David E. Wheeler (#1)
Re: Patch: Add parse_type Function

Hi

ne 4. 2. 2024 v 18:51 odesílatel David E. Wheeler <david@justatheory.com>
napsal:

Hackers,

Attached is a patch to add a new function, `parse_type()`. It parses a
type string and returns a record with the typid and typmod for the type, or
raises an error if the type is invalid. It’s effectively a thin layer over
the parser’s parseTypeString() function.

The purpose of this function is to allow uses to resolve any type name,
including aliases, to its formal name as rendered by, for example, pg_dump.
An example:

david=# WITH s(s) AS (
SELECT * FROM unnest(ARRAY[
'timestamp(4)',
'interval(0)',
'interval second(0)',
'timestamptz',
'timestamptz(6)',
'varchar',
'varchar(128)'
])
),
p(type, typid, typmod) AS (
SELECT s, ((parse_type(s)).*)
FROM s
)
SELECT type, format_type(typid, typmod) FROM p;
type | format_type
--------------------+--------------------------------
timestamp(4) | timestamp(4) without time zone
interval(0) | interval(0)
interval second(0) | interval second(0)
timestamptz | timestamp with time zone
timestamptz(6) | timestamp(6) with time zone
varchar | character varying
varchar(128) | character varying(128)
(7 rows)

The use case leading to this patch was pgTAP column data type assertions.
pgTAP traditionally required full type names for testing data types, but
recently added support for aliases. This broke for some types, because it
was difficult to extract the typmod correctly, and even when we did, it
failed for types such as `interval second(0)`, where `second (0)` is the
typmod, and there was no sensible way to access the bit mask to generate
the proper typmod to pass to `format_type()`.

Erik Wienhold wrote this function to solve the problem, and I volunteered
to submit a patch.

Potential issues and questions for this patch:

* Is `parse_type()` the right name to use? I can see arguments for
`parse_type_string()`, `pg_parse_type()`, and `pg_parse_type_string()`.
Whatever the consensus is works for me.

there can be an analogy with parse_ident, so parse_type looks ok

* The function returns a record, which is a little fussy. I could see
adding a `format_type_string()` function that just contains `SELECT
format_type(p.typmod, p.typid) FROM parse_type($1) p` and returns the
formatted type. But I think it might also be useful to have access to the
typmod and typid directly via this method, since they’re already exposed as
`format_type()`’s arguments.

I think so record has sense for this case

* Is format_type.c the right home for the function? I put it there because
I closely associate it with format_type(), but happy to move it to a more
appropriate location.

yes

* I tried to link to `format_type` from the docs entry for `parse_type`,
and added an ID for the former, but I’m not sure it resolves.

I thinks so proposed functionality can be useful

Regards

Pavel

Show quoted text

Best,

David

#3David E. Wheeler
david@justatheory.com
In reply to: Pavel Stehule (#2)
Re: Patch: Add parse_type Function

On Feb 4, 2024, at 13:02, Pavel Stehule <pavel.stehule@gmail.com> wrote:

I thinks so proposed functionality can be useful

Great, thank you!

Is that a review? :-)

D

#4Pavel Stehule
pavel.stehule@gmail.com
In reply to: David E. Wheeler (#3)
Re: Patch: Add parse_type Function

ne 4. 2. 2024 v 19:30 odesílatel David E. Wheeler <david@justatheory.com>
napsal:

On Feb 4, 2024, at 13:02, Pavel Stehule <pavel.stehule@gmail.com> wrote:

I thinks so proposed functionality can be useful

Great, thank you!

Is that a review? :-)

not yet, but I'll do it

Pavel

Show quoted text

D

#5David E. Wheeler
david@justatheory.com
In reply to: Pavel Stehule (#4)
Re: Patch: Add parse_type Function

On Feb 4, 2024, at 13:52, Pavel Stehule <pavel.stehule@gmail.com> wrote:

not yet, but I'll do it

Nice, thank you. I put it into the Commitfest.

https://commitfest.postgresql.org/47/4807/

Best,

David

#6Jim Jones
jim.jones@uni-muenster.de
In reply to: David E. Wheeler (#1)
Re: Patch: Add parse_type Function

Hi David,
Thanks for the patch.

On 04.02.24 18:51, David E. Wheeler wrote:

Hackers,

Attached is a patch to add a new function, `parse_type()`. It parses a type string and returns a record with the typid and typmod for the type, or raises an error if the type is invalid. It’s effectively a thin layer over the parser’s parseTypeString() function.

The purpose of this function is to allow uses to resolve any type name, including aliases, to its formal name as rendered by, for example, pg_dump. An example:

david=# WITH s(s) AS (
SELECT * FROM unnest(ARRAY[
'timestamp(4)',
'interval(0)',
'interval second(0)',
'timestamptz',
'timestamptz(6)',
'varchar',
'varchar(128)'
])
),
p(type, typid, typmod) AS (
SELECT s, ((parse_type(s)).*)
FROM s
)
SELECT type, format_type(typid, typmod) FROM p;
type | format_type
--------------------+--------------------------------
timestamp(4) | timestamp(4) without time zone
interval(0) | interval(0)
interval second(0) | interval second(0)
timestamptz | timestamp with time zone
timestamptz(6) | timestamp(6) with time zone
varchar | character varying
varchar(128) | character varying(128)
(7 rows)

The use case leading to this patch was pgTAP column data type assertions. pgTAP traditionally required full type names for testing data types, but recently added support for aliases. This broke for some types, because it was difficult to extract the typmod correctly, and even when we did, it failed for types such as `interval second(0)`, where `second (0)` is the typmod, and there was no sensible way to access the bit mask to generate the proper typmod to pass to `format_type()`.

Erik Wienhold wrote this function to solve the problem, and I volunteered to submit a patch.

Potential issues and questions for this patch:

* Is `parse_type()` the right name to use? I can see arguments for `parse_type_string()`, `pg_parse_type()`, and `pg_parse_type_string()`. Whatever the consensus is works for me.

* The function returns a record, which is a little fussy. I could see adding a `format_type_string()` function that just contains `SELECT format_type(p.typmod, p.typid) FROM parse_type($1) p` and returns the formatted type. But I think it might also be useful to have access to the typmod and typid directly via this method, since they’re already exposed as `format_type()`’s arguments.

* Is format_type.c the right home for the function? I put it there because I closely associate it with format_type(), but happy to move it to a more appropriate location.

* I tried to link to `format_type` from the docs entry for `parse_type`, and added an ID for the former, but I’m not sure it resolves.

Best,

David

The patch applies cleanly

Checking patch doc/src/sgml/func.sgml...
Checking patch src/backend/utils/adt/format_type.c...
Checking patch src/include/catalog/pg_proc.dat...
Checking patch src/include/utils/builtins.h...
Checking patch src/test/regress/expected/create_type.out...
Checking patch src/test/regress/sql/create_type.sql...
Applied patch doc/src/sgml/func.sgml cleanly.
Applied patch src/backend/utils/adt/format_type.c cleanly.
Applied patch src/include/catalog/pg_proc.dat cleanly.
Applied patch src/include/utils/builtins.h cleanly.
Applied patch src/test/regress/expected/create_type.out cleanly.
Applied patch src/test/regress/sql/create_type.sql cleanly.

The docs render just fine and all tests pass (check and check-world).

There are a few issues though:

1) The function causes a crash when called with a NULL parameter:

SELECT * FROM parse_type(NULL);
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.

You have to check it in the beginning of function and either return an
error message or just NULL, e.g

    if (PG_ARGISNULL(0))
        PG_RETURN_NULL();

2) Add a function call with NULL in the regression tests

SELECT * FROM parse_type(NULL);

3) Add the expected behaviour of calling the function with NULL in the
docs (error message or null)

4) The name of the returned columns is repeated in the docs

Returns a record with two fields, <parameter>typid</parameter> and

<parameter>typid</parameter>

--
Jim

#7David E. Wheeler
david@justatheory.com
In reply to: Jim Jones (#6)
1 attachment(s)
Re: Patch: Add parse_type Function

On Feb 5, 2024, at 08:02, Jim Jones <jim.jones@uni-muenster.de> wrote:

There are a few issues though:

Excellent, thank you very much! Updated patch attached.

Best,

David

Attachments:

v2-0001-Add-parse_type-SQL-function.patchapplication/octet-stream; name=v2-0001-Add-parse_type-SQL-function.patch; x-unix-mode=0644Download
From 3da1e483571ca91191f133e0df1de7009483cd9f Mon Sep 17 00:00:00 2001
From: "David E. Wheeler" <david@justatheory.com>
Date: Mon, 5 Feb 2024 09:09:18 -0500
Subject: [PATCH v2] Add parse_type() SQL function

The `parse_type()` function uses the underlying `parseTypeString()` C
function to parse a string representing a data type into a type ID and
typmod suitabld for passing to `format_type()`. This allows one to
derive the formal SQL name for a type from a string that may be an
alias:

    SELECT format_type(p.typid, p.typmod)
      FROM parse_type('timestamp(4)') p;

            format_type
    --------------------------------
    timestamp(4) without time zone

Also accounts for data typs that require the SQL grammar to be parsed:

    SELECT format_type(p.typid, p.typmod)
      FROM parse_type('interval second(0)') p;

        format_type
    --------------------
    interval second(0)

Useful for unit tests for against column data types, for example.
Originally written by Erik Wienhold for use in pgTAP.
---
 doc/src/sgml/func.sgml                    | 32 ++++++++++-
 src/backend/utils/adt/format_type.c       | 70 +++++++++++++++++++++++
 src/include/catalog/pg_proc.dat           |  4 ++
 src/include/utils/builtins.h              |  2 +
 src/test/regress/expected/create_type.out | 53 +++++++++++++++++
 src/test/regress/sql/create_type.sql      | 28 +++++++++
 6 files changed, 188 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6788ba8ef4..5344a870ed 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24754,7 +24754,7 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
 
      <tbody>
       <row>
-       <entry role="func_table_entry"><para role="func_signature">
+       <entry id="format_type" role="func_table_entry"><para role="func_signature">
         <indexterm>
          <primary>format_type</primary>
         </indexterm>
@@ -24768,6 +24768,36 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
        </para></entry>
       </row>
 
+      <row>
+       <entry id="parse_type" role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>parse_type</primary>
+        </indexterm>
+        <function>parse_type</function> ( <parameter>type</parameter> <type>text</type> )
+        <returnvalue>record</returnvalue>
+        ( <parameter>typid</parameter> <type>oid</type>,
+          <parameter>typmod</parameter> <type>int4</type> )
+       </para>
+       <para>
+        Parses a string representing an SQL type declaration as used in a
+        <command>CREATE TABLE</command> statement, optionally schema-qualified.
+        Returns a record with two fields, <parameter>typid</parameter> and
+        <parameter>typmod</parameter>, representing the OID and modifier for the
+        type. These correspond to the parameters to pass to the
+        <link linkend="format_type"><function>format_type</function> function.</link>
+        Returns a null record on null input.
+       </para>
+       <para>
+        For example:
+<programlisting>
+SELECT format_type(p.typid, p.typmod) FROM parse_type('timestamp(4)') p;
+        format_type
+ --------------------------------
+  timestamp(4) without time zone
+</programlisting>
+       </para></entry>
+      </row>
+
       <row>
        <entry id="pg-char-to-encoding" role="func_table_entry"><para role="func_signature">
         <indexterm>
diff --git a/src/backend/utils/adt/format_type.c b/src/backend/utils/adt/format_type.c
index 28ba0fbd19..91e1064a34 100644
--- a/src/backend/utils/adt/format_type.c
+++ b/src/backend/utils/adt/format_type.c
@@ -26,6 +26,9 @@
 #include "utils/lsyscache.h"
 #include "utils/numeric.h"
 #include "utils/syscache.h"
+#include "fmgr.h"
+#include "funcapi.h"
+#include "parser/parse_type.h"
 
 static char *printTypmod(const char *typname, int32 typmod, Oid typmodout);
 
@@ -482,3 +485,70 @@ oidvectortypes(PG_FUNCTION_ARGS)
 
 	PG_RETURN_TEXT_P(cstring_to_text(result));
 }
+
+/*
+ * Given a string that is supposed to be a SQL-compatible type declaration,
+ * such as "int4" or "integer" or "character varying(32)", parse
+ * the string and convert it to a type OID and type modifier.
+ *
+ * Raises an error on an invalid type.
+ */
+
+/*
+ * parse_type() is the inverse of pg_catalog.format_type(): it takes a string
+ * representing an SQL-compatible type declaration, such as "int4" or "integer"
+ * or "character varying(32)", parses it, and returns the OID and type modifier.
+ * Returns NULL for an invalid type.
+ *
+ * Internally it relies on the Postgres core parseTypeString() function defined
+ * in src/backend/parser/parse_type.c.
+ */
+PG_FUNCTION_INFO_V1(parse_type);
+
+Datum
+parse_type(PG_FUNCTION_ARGS)
+{
+#define PARSE_TYPE_STRING_COLS 2 /* Returns two columns. */
+	const char *type;            /* the type string we want to resolve */
+	Oid         typid;           /* the resolved type oid */
+	int32       typmod;          /* the resolved type modifier */
+	TupleDesc   tupdesc;
+	HeapTuple   rettuple;
+	Datum       values[PARSE_TYPE_STRING_COLS] = {0};
+	bool        nulls[PARSE_TYPE_STRING_COLS] = {0};
+
+	if (PG_ARGISNULL(0)) {
+        PG_RETURN_NULL();
+	}
+
+	type = text_to_cstring(PG_GETARG_TEXT_PP(0));
+
+	/*
+	 * Build a tuple descriptor for our result type; return an error if not
+	 * called in a context that expects a record.
+	 */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE) {
+		ereport(
+			ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			errmsg("function returning record called in context that cannot accept type record"))
+		);
+	}
+
+	BlessTupleDesc(tupdesc);
+
+	/*
+	 * Parse type-name argument to obtain type OID and encoded typmod. We don't
+	 * need to check for parseTypeString failure, but just let the error be
+	 * raised. The 0 arg works both as the `Node *escontext` arg in Postgres 16
+	 * and the `bool missing_ok` arg in 9.4-15.
+	 */
+	(void) parseTypeString(type, &typid, &typmod, 0);
+
+	/* Create and return tuple. */
+	values[0] = typid;
+	values[1] = typmod;
+	rettuple = heap_form_tuple(tupdesc, values, nulls);
+	return HeapTupleGetDatum(rettuple);
+#undef PARSE_TYPE_STRING_COLS
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 29af4ce65d..31d7024010 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -2184,6 +2184,10 @@
 { oid => '1081', descr => 'format a type oid and atttypmod to canonical SQL',
   proname => 'format_type', proisstrict => 'f', provolatile => 's',
   prorettype => 'text', proargtypes => 'oid int4', prosrc => 'format_type' },
+{ oid => '8401', descr => 'parse a type string into its a type oid and atttypmod',
+  proname => 'parse_type', proisstrict => 'f', provolatile => 's',
+  prorettype => 'record', proargtypes => 'text', prosrc => 'parse_type',
+  proallargtypes => '{text,oid,int4}', proargmodes => '{i,o,o}', proargnames => '{typname,typid,typmod}' },
 { oid => '1084', descr => 'I/O',
   proname => 'date_in', provolatile => 's', prorettype => 'date',
   proargtypes => 'cstring', prosrc => 'date_in' },
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 359c570f23..264ad090be 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -133,6 +133,8 @@ extern char *format_type_with_typemod(Oid type_oid, int32 typemod);
 
 extern int32 type_maximum_size(Oid type_oid, int32 typemod);
 
+extern Datum parse_type(PG_FUNCTION_ARGS);
+
 /* quote.c */
 extern char *quote_literal_cstr(const char *rawstr);
 
diff --git a/src/test/regress/expected/create_type.out b/src/test/regress/expected/create_type.out
index 7383fcdbb1..84ea71090c 100644
--- a/src/test/regress/expected/create_type.out
+++ b/src/test/regress/expected/create_type.out
@@ -249,6 +249,59 @@ select format_type('bpchar'::regtype, -1);
  bpchar
 (1 row)
 
+-- Test parse_type
+SELECT * FROM parse_type('text') p(typid, typmod);
+ typid | typmod 
+-------+--------
+    25 |     -1
+(1 row)
+
+SELECT * FROM parse_type(NULL) p(typid, typmod);
+ typid | typmod 
+-------+--------
+       |       
+(1 row)
+
+-- Test parse_type errors
+SELECT parse_type('nonesuch'); -- error expected
+ERROR:  type "nonesuch" does not exist
+SELECT parse_type('interval nonesuch'); -- grammar error expected
+ERROR:  syntax error at or near "nonesuch"
+LINE 1: SELECT parse_type('interval nonesuch');
+                 ^
+CONTEXT:  invalid type name "interval nonesuch"
+SELECT parse_type('year(4)'); -- grammar error expected
+ERROR:  type "year" does not exist
+-- Test parse_type with various aliases and grammar-based types
+WITH s(s) AS (
+    SELECT * FROM unnest(ARRAY[
+        'timestamp(4)',
+        'interval(0)',
+        'interval second(0)',
+        'timestamptz',
+        'timestamptz(6)',
+        'varchar',
+        'varchar(128)',
+        'mytab'
+    ])
+),
+p(typid, typmod) AS (
+    SELECT ((parse_type(s)).*)
+      FROM s
+)
+SELECT format_type(typid, typmod) FROM p;
+          format_type           
+--------------------------------
+ timestamp(4) without time zone
+ interval(0)
+ interval second(0)
+ timestamp with time zone
+ timestamp(6) with time zone
+ character varying
+ character varying(128)
+ mytab
+(8 rows)
+
 -- Test non-error-throwing APIs using widget, which still throws errors
 SELECT pg_input_is_valid('(1,2,3)', 'widget');
  pg_input_is_valid 
diff --git a/src/test/regress/sql/create_type.sql b/src/test/regress/sql/create_type.sql
index c25018029c..e4700b8719 100644
--- a/src/test/regress/sql/create_type.sql
+++ b/src/test/regress/sql/create_type.sql
@@ -192,6 +192,34 @@ select format_type('bpchar'::regtype, null);
 -- this behavior difference is intentional
 select format_type('bpchar'::regtype, -1);
 
+-- Test parse_type
+SELECT * FROM parse_type('text') p(typid, typmod);
+SELECT * FROM parse_type(NULL) p(typid, typmod);
+
+-- Test parse_type errors
+SELECT parse_type('nonesuch'); -- error expected
+SELECT parse_type('interval nonesuch'); -- grammar error expected
+SELECT parse_type('year(4)'); -- grammar error expected
+
+-- Test parse_type with various aliases and grammar-based types
+WITH s(s) AS (
+    SELECT * FROM unnest(ARRAY[
+        'timestamp(4)',
+        'interval(0)',
+        'interval second(0)',
+        'timestamptz',
+        'timestamptz(6)',
+        'varchar',
+        'varchar(128)',
+        'mytab'
+    ])
+),
+p(typid, typmod) AS (
+    SELECT ((parse_type(s)).*)
+      FROM s
+)
+SELECT format_type(typid, typmod) FROM p;
+
 -- Test non-error-throwing APIs using widget, which still throws errors
 SELECT pg_input_is_valid('(1,2,3)', 'widget');
 SELECT pg_input_is_valid('(1,2)', 'widget');  -- hard error expected
-- 
2.43.0

In reply to: Jim Jones (#6)
Re: Patch: Add parse_type Function

Jim Jones <jim.jones@uni-muenster.de> writes:

1) The function causes a crash when called with a NULL parameter:

SELECT * FROM parse_type(NULL);
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.

You have to check it in the beginning of function and either return an
error message or just NULL, e.g

    if (PG_ARGISNULL(0))
        PG_RETURN_NULL();

Instead of handling this in the function body, the function should be
declared as strict. This is in fact the default in pg_proc.h,

/* strict with respect to NULLs? */
bool proisstrict BKI_DEFAULT(t);

so removing the explicit proisstrict => 'f' from the pg_proc.dat entry
will fix it with no additional code.

2) Add a function call with NULL in the regression tests

SELECT * FROM parse_type(NULL);

3) Add the expected behaviour of calling the function with NULL in the
docs (error message or null)

Once the function is declared strict, I don't think either of these is
necessary: function strictness is tested elsewhere, and it's the default
behaviour. The only functions that explicitly say they return NULL on
NULL inputs are quote_literal (because you might expect it to return the
string 'NULL', but there's qoute_nullable for that) and xmlexists (which
I don't see any particular reason for).

- ilmari

#9Jim Jones
jim.jones@uni-muenster.de
In reply to: David E. Wheeler (#7)
Re: Patch: Add parse_type Function

On 05.02.24 15:10, David E. Wheeler wrote:

Excellent, thank you very much! Updated patch attached.

Best,

David

v2 no longer crashes with a null parameter.
docs and regress tests were updated accordingly.

patch no longer applies cleanly (tiny little indentation issue):

/home/jim/Downloads/v2-0001-Add-parse_type-SQL-function.patch:140:
indent with spaces.
        PG_RETURN_NULL();
warning: 1 line adds whitespace errors.

I read the comments again, and something is not entirely clear to me.
Line 494 says "Raises an error on an invalid type." and 501 says
"Returns NULL for an invalid type."
Perhaps merging both comment blocks and rephrasing these sentences would
make things clearer?

--
Jim

#10Jim Jones
jim.jones@uni-muenster.de
In reply to: Dagfinn Ilmari Mannsåker (#8)
Re: Patch: Add parse_type Function

On 05.02.24 15:32, Dagfinn Ilmari Mannsåker wrote:

Once the function is declared strict, I don't think either of these is
necessary: function strictness is tested elsewhere, and it's the default
behaviour. The only functions that explicitly say they return NULL on
NULL inputs are quote_literal (because you might expect it to return the
string 'NULL', but there's qoute_nullable for that) and xmlexists (which
I don't see any particular reason for).

- ilmari

+1
Yes, if the function was strict (which in the current design is not the
case) there is no need to check for nulls.

--
Jim

#11David E. Wheeler
david@justatheory.com
In reply to: Jim Jones (#10)
1 attachment(s)
Re: Patch: Add parse_type Function

On Feb 5, 2024, at 09:49, Jim Jones <jim.jones@uni-muenster.de> wrote:

Yes, if the function was strict (which in the current design is not the
case) there is no need to check for nulls.

Right, done, and cleaned up the redundant comments.

Best,

David

Attachments:

v3-0001-Add-parse_type-SQL-function.patchapplication/octet-stream; name=v3-0001-Add-parse_type-SQL-function.patch; x-unix-mode=0644Download
From c514e2900bf4967186c13922bcb9808dcfd3f851 Mon Sep 17 00:00:00 2001
From: "David E. Wheeler" <david@justatheory.com>
Date: Mon, 5 Feb 2024 10:12:40 -0500
Subject: [PATCH v3] Add parse_type() SQL function

The `parse_type()` function uses the underlying `parseTypeString()` C
function to parse a string representing a data type into a type ID and
typmod suitabld for passing to `format_type()`. This allows one to
derive the formal SQL name for a type from a string that may be an
alias:

    SELECT format_type(p.typid, p.typmod)
      FROM parse_type('timestamp(4)') p;

            format_type
    --------------------------------
    timestamp(4) without time zone

Also accounts for data typs that require the SQL grammar to be parsed:

    SELECT format_type(p.typid, p.typmod)
      FROM parse_type('interval second(0)') p;

        format_type
    --------------------
    interval second(0)

Useful for unit tests for against column data types, for example.
Originally written by Erik Wienhold for use in pgTAP.
---
 doc/src/sgml/func.sgml                    | 31 +++++++++++-
 src/backend/utils/adt/format_type.c       | 59 +++++++++++++++++++++++
 src/include/catalog/pg_proc.dat           |  4 ++
 src/include/utils/builtins.h              |  2 +
 src/test/regress/expected/create_type.out | 53 ++++++++++++++++++++
 src/test/regress/sql/create_type.sql      | 28 +++++++++++
 6 files changed, 176 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6788ba8ef4..1e2426acae 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24754,7 +24754,7 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
 
      <tbody>
       <row>
-       <entry role="func_table_entry"><para role="func_signature">
+       <entry id="format_type" role="func_table_entry"><para role="func_signature">
         <indexterm>
          <primary>format_type</primary>
         </indexterm>
@@ -24768,6 +24768,35 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
        </para></entry>
       </row>
 
+      <row>
+       <entry id="parse_type" role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>parse_type</primary>
+        </indexterm>
+        <function>parse_type</function> ( <parameter>type</parameter> <type>text</type> )
+        <returnvalue>record</returnvalue>
+        ( <parameter>typid</parameter> <type>oid</type>,
+          <parameter>typmod</parameter> <type>int4</type> )
+       </para>
+       <para>
+        Parses a string representing an SQL type declaration as used in a
+        <command>CREATE TABLE</command> statement, optionally schema-qualified.
+        Returns a record with two fields, <parameter>typid</parameter> and
+        <parameter>typmod</parameter>, representing the OID and modifier for the
+        type. These correspond to the parameters to pass to the
+        <link linkend="format_type"><function>format_type</function> function.</link>
+       </para>
+       <para>
+        For example:
+<programlisting>
+SELECT format_type(p.typid, p.typmod) FROM parse_type('timestamp(4)') p;
+        format_type
+ --------------------------------
+  timestamp(4) without time zone
+</programlisting>
+       </para></entry>
+      </row>
+
       <row>
        <entry id="pg-char-to-encoding" role="func_table_entry"><para role="func_signature">
         <indexterm>
diff --git a/src/backend/utils/adt/format_type.c b/src/backend/utils/adt/format_type.c
index 28ba0fbd19..0d56c8cbb1 100644
--- a/src/backend/utils/adt/format_type.c
+++ b/src/backend/utils/adt/format_type.c
@@ -26,6 +26,9 @@
 #include "utils/lsyscache.h"
 #include "utils/numeric.h"
 #include "utils/syscache.h"
+#include "fmgr.h"
+#include "funcapi.h"
+#include "parser/parse_type.h"
 
 static char *printTypmod(const char *typname, int32 typmod, Oid typmodout);
 
@@ -482,3 +485,59 @@ oidvectortypes(PG_FUNCTION_ARGS)
 
 	PG_RETURN_TEXT_P(cstring_to_text(result));
 }
+
+/*
+ * parse_type() is the inverse of pg_catalog.format_type(): it takes a string
+ * representing an SQL-compatible type declaration, such as "int4" or "integer"
+ * or "character varying(32)", parses it, and returns the OID and type modifier.
+ *
+ * Raises an error on an invalid type.
+ *
+ * Internally it relies on the Postgres core parseTypeString() function defined
+ * in src/backend/parser/parse_type.c.
+ */
+PG_FUNCTION_INFO_V1(parse_type);
+
+Datum
+parse_type(PG_FUNCTION_ARGS)
+{
+#define PARSE_TYPE_STRING_COLS 2 /* Returns two columns. */
+	const char *type;            /* the type string we want to resolve */
+	Oid         typid;           /* the resolved type oid */
+	int32       typmod;          /* the resolved type modifier */
+	TupleDesc   tupdesc;
+	HeapTuple   rettuple;
+	Datum       values[PARSE_TYPE_STRING_COLS] = {0};
+	bool        nulls[PARSE_TYPE_STRING_COLS] = {0};
+
+	type = text_to_cstring(PG_GETARG_TEXT_PP(0));
+
+	/*
+	 * Build a tuple descriptor for our result type; return an error if not
+	 * called in a context that expects a record.
+	 */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE) {
+		ereport(
+			ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			errmsg("function returning record called in context that cannot accept type record"))
+		);
+	}
+
+	BlessTupleDesc(tupdesc);
+
+	/*
+	 * Parse type-name argument to obtain type OID and encoded typmod. We don't
+	 * need to check for parseTypeString failure, but just let the error be
+	 * raised. The 0 arg works both as the `Node *escontext` arg in Postgres 16
+	 * and the `bool missing_ok` arg in 9.4-15.
+	 */
+	(void) parseTypeString(type, &typid, &typmod, 0);
+
+	/* Create and return tuple. */
+	values[0] = typid;
+	values[1] = typmod;
+	rettuple = heap_form_tuple(tupdesc, values, nulls);
+	return HeapTupleGetDatum(rettuple);
+#undef PARSE_TYPE_STRING_COLS
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 29af4ce65d..37205e1b31 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -2184,6 +2184,10 @@
 { oid => '1081', descr => 'format a type oid and atttypmod to canonical SQL',
   proname => 'format_type', proisstrict => 'f', provolatile => 's',
   prorettype => 'text', proargtypes => 'oid int4', prosrc => 'format_type' },
+{ oid => '8401', descr => 'parse a type string into its a type oid and atttypmod',
+  proname => 'parse_type', proisstrict => 't', provolatile => 's',
+  prorettype => 'record', proargtypes => 'text', prosrc => 'parse_type',
+  proallargtypes => '{text,oid,int4}', proargmodes => '{i,o,o}', proargnames => '{typname,typid,typmod}' },
 { oid => '1084', descr => 'I/O',
   proname => 'date_in', provolatile => 's', prorettype => 'date',
   proargtypes => 'cstring', prosrc => 'date_in' },
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 359c570f23..264ad090be 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -133,6 +133,8 @@ extern char *format_type_with_typemod(Oid type_oid, int32 typemod);
 
 extern int32 type_maximum_size(Oid type_oid, int32 typemod);
 
+extern Datum parse_type(PG_FUNCTION_ARGS);
+
 /* quote.c */
 extern char *quote_literal_cstr(const char *rawstr);
 
diff --git a/src/test/regress/expected/create_type.out b/src/test/regress/expected/create_type.out
index 7383fcdbb1..84ea71090c 100644
--- a/src/test/regress/expected/create_type.out
+++ b/src/test/regress/expected/create_type.out
@@ -249,6 +249,59 @@ select format_type('bpchar'::regtype, -1);
  bpchar
 (1 row)
 
+-- Test parse_type
+SELECT * FROM parse_type('text') p(typid, typmod);
+ typid | typmod 
+-------+--------
+    25 |     -1
+(1 row)
+
+SELECT * FROM parse_type(NULL) p(typid, typmod);
+ typid | typmod 
+-------+--------
+       |       
+(1 row)
+
+-- Test parse_type errors
+SELECT parse_type('nonesuch'); -- error expected
+ERROR:  type "nonesuch" does not exist
+SELECT parse_type('interval nonesuch'); -- grammar error expected
+ERROR:  syntax error at or near "nonesuch"
+LINE 1: SELECT parse_type('interval nonesuch');
+                 ^
+CONTEXT:  invalid type name "interval nonesuch"
+SELECT parse_type('year(4)'); -- grammar error expected
+ERROR:  type "year" does not exist
+-- Test parse_type with various aliases and grammar-based types
+WITH s(s) AS (
+    SELECT * FROM unnest(ARRAY[
+        'timestamp(4)',
+        'interval(0)',
+        'interval second(0)',
+        'timestamptz',
+        'timestamptz(6)',
+        'varchar',
+        'varchar(128)',
+        'mytab'
+    ])
+),
+p(typid, typmod) AS (
+    SELECT ((parse_type(s)).*)
+      FROM s
+)
+SELECT format_type(typid, typmod) FROM p;
+          format_type           
+--------------------------------
+ timestamp(4) without time zone
+ interval(0)
+ interval second(0)
+ timestamp with time zone
+ timestamp(6) with time zone
+ character varying
+ character varying(128)
+ mytab
+(8 rows)
+
 -- Test non-error-throwing APIs using widget, which still throws errors
 SELECT pg_input_is_valid('(1,2,3)', 'widget');
  pg_input_is_valid 
diff --git a/src/test/regress/sql/create_type.sql b/src/test/regress/sql/create_type.sql
index c25018029c..e4700b8719 100644
--- a/src/test/regress/sql/create_type.sql
+++ b/src/test/regress/sql/create_type.sql
@@ -192,6 +192,34 @@ select format_type('bpchar'::regtype, null);
 -- this behavior difference is intentional
 select format_type('bpchar'::regtype, -1);
 
+-- Test parse_type
+SELECT * FROM parse_type('text') p(typid, typmod);
+SELECT * FROM parse_type(NULL) p(typid, typmod);
+
+-- Test parse_type errors
+SELECT parse_type('nonesuch'); -- error expected
+SELECT parse_type('interval nonesuch'); -- grammar error expected
+SELECT parse_type('year(4)'); -- grammar error expected
+
+-- Test parse_type with various aliases and grammar-based types
+WITH s(s) AS (
+    SELECT * FROM unnest(ARRAY[
+        'timestamp(4)',
+        'interval(0)',
+        'interval second(0)',
+        'timestamptz',
+        'timestamptz(6)',
+        'varchar',
+        'varchar(128)',
+        'mytab'
+    ])
+),
+p(typid, typmod) AS (
+    SELECT ((parse_type(s)).*)
+      FROM s
+)
+SELECT format_type(typid, typmod) FROM p;
+
 -- Test non-error-throwing APIs using widget, which still throws errors
 SELECT pg_input_is_valid('(1,2,3)', 'widget');
 SELECT pg_input_is_valid('(1,2)', 'widget');  -- hard error expected
-- 
2.43.0

#12Jim Jones
jim.jones@uni-muenster.de
In reply to: David E. Wheeler (#11)
Re: Patch: Add parse_type Function

On 05.02.24 16:14, David E. Wheeler wrote:

Right, done, and cleaned up the redundant comments.

Best,

David

Nice.
The patch applies cleanly and it no longer crashes with NULL parameters.
Docs render fine and regression tests pass. I have nothing more to add.
Let's now wait for Pavel's review.

Thanks!

--
Jim

#13David E. Wheeler
david@justatheory.com
In reply to: Jim Jones (#12)
Re: Patch: Add parse_type Function

On Feb 5, 2024, at 10:47 AM, Jim Jones <jim.jones@uni-muenster.de> wrote:

The patch applies cleanly and it no longer crashes with NULL parameters.
Docs render fine and regression tests pass. I have nothing more to add.
Let's now wait for Pavel's review.

Much obliged!

D

#14jian he
jian.universality@gmail.com
In reply to: David E. Wheeler (#11)
Re: Patch: Add parse_type Function
+ /*
+ * Parse type-name argument to obtain type OID and encoded typmod. We don't
+ * need to check for parseTypeString failure, but just let the error be
+ * raised. The 0 arg works both as the `Node *escontext` arg in Postgres 16
+ * and the `bool missing_ok` arg in 9.4-15.
+ */
+ (void) parseTypeString(type, &typid, &typmod, 0);
if you are wondering around other code deal with NULL, ErrorSaveContext.
NULL would be more correct?
`(void) parseTypeString(type, &typid, &typmod, NULL);`

also parseTypeString already explained the third argument, our
comments can be simplified as:
/*
* Parse type-name argument to obtain type OID and encoded typmod. We don't
* need to handle parseTypeString failure, just let the error be
* raised.
*/

cosmetic issue. Looking around other error handling code, the format
should be something like:
`
if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("function returning record called in"
"context that cannot accept type record")));
`

`PG_FUNCTION_INFO_V1(parse_type);`
is unnecessary?
based on the error information: https://cirrus-ci.com/task/5899457502380032
not sure why it only fails on windows.

+#define PARSE_TYPE_STRING_COLS 2 /* Returns two columns. */
+#undef PARSE_TYPE_STRING_COLS
Are these necessary?
given that the comments already say return the OID and type modifier.
+        ( <parameter>typid</parameter> <type>oid</type>,
+          <parameter>typmod</parameter> <type>int4</type> )
here `int4` should be `integer`?

commit message:
`Also accounts for data typs that require the SQL grammar to be parsed:`
except the typo issue, this sentence is still hard for me to understand.

The `parse_type()` function uses the underlying `parseTypeString()` C
function to parse a string representing a data type into a type ID and
typmod suitabld for passing to `format_type()`.

suitabld should be suitable.

+       <para>
+        Parses a string representing an SQL type declaration as used in a
+        <command>CREATE TABLE</command> statement, optionally schema-qualified.
+        Returns a record with two fields, <parameter>typid</parameter> and
+        <parameter>typmod</parameter>, representing the OID and
modifier for the
+        type. These correspond to the parameters to pass to the
+        <link linkend="format_type"><function>format_type</function>
function.</link>
+       </para>
can be simplified:
+       <para>
+        Parses a string representing an SQL data type, optionally
schema-qualified.
+        Returns a record with two fields, <parameter>typid</parameter> and
+        <parameter>typmod</parameter>, representing the OID and
modifier for the
+        type. These correspond to the parameters to pass to the
+        <link linkend="format_type"><function>format_type</function>
function.</link>
+       </para>
(I don't have a strong opinion though)
#15Erik Wienhold
ewie@ewie.name
In reply to: jian he (#14)
Re: Patch: Add parse_type Function

Let me comment on some issues since I wrote the very first version of
parse_type() on which David's patch is based.

On 2024-02-01 01:00 +0100, jian he <jian.universality@gmail.com> wrote:

cosmetic issue. Looking around other error handling code, the format
should be something like:
`
if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("function returning record called in"
"context that cannot accept type record")));
`

+1

`PG_FUNCTION_INFO_V1(parse_type);`
is unnecessary?
based on the error information: https://cirrus-ci.com/task/5899457502380032
not sure why it only fails on windows.

Yeah, it's not necessary for internal functions per [1]https://www.postgresql.org/docs/current/xfunc-c.html#XFUNC-C-V1-CALL-CONV. It's a
leftover from when this function was part of the pgtap extension.

+#define PARSE_TYPE_STRING_COLS 2 /* Returns two columns. */
+#undef PARSE_TYPE_STRING_COLS
Are these necessary?
given that the comments already say return the OID and type modifier.

Not sure what the convention here is but I found this in a couple of
places, maybe even a tutorial on writing C functions. See
`git grep '_COLS\s\+[1-9]'` for those instances. It's in line with my
habit of avoiding magic constants.

+        ( <parameter>typid</parameter> <type>oid</type>,
+          <parameter>typmod</parameter> <type>int4</type> )
here `int4` should be `integer`?

+1

commit message:
`Also accounts for data typs that require the SQL grammar to be parsed:`
except the typo issue, this sentence is still hard for me to understand.

Yes, the sentence is rather handwavy. What is meant here is that this
function also resolves types whose typmod is determined by the SQL
parser or some step after that. There are types whose typmod is
whatever value is found inside the parenthesis, e.g. bit(13) has typmod
13. Our first idea before coming up with that function was to do some
string manipulation and extract the typmod from inside the parenthesis.
But we soon found out that other typmods don't translate one to one,
e.g. varchar(13) has typmod 17. The interval type is also special
because the typmod is some bitmask encoding of e.g. 'second(0)'. Hence
the mention of the SQL grammar.

+       <para>
+        Parses a string representing an SQL type declaration as used in a
+        <command>CREATE TABLE</command> statement, optionally schema-qualified.
+        Returns a record with two fields, <parameter>typid</parameter> and
+        <parameter>typmod</parameter>, representing the OID and
modifier for the
+        type. These correspond to the parameters to pass to the
+        <link linkend="format_type"><function>format_type</function>
function.</link>
+       </para>
can be simplified:
+       <para>
+        Parses a string representing an SQL data type, optionally
schema-qualified.
+        Returns a record with two fields, <parameter>typid</parameter> and
+        <parameter>typmod</parameter>, representing the OID and
modifier for the
+        type. These correspond to the parameters to pass to the
+        <link linkend="format_type"><function>format_type</function>
function.</link>
+       </para>
(I don't have a strong opinion though)

Sounds better. The CREATE TABLE reference is superfluous.

[1]: https://www.postgresql.org/docs/current/xfunc-c.html#XFUNC-C-V1-CALL-CONV

--
Erik

#16David E. Wheeler
david@justatheory.com
In reply to: Erik Wienhold (#15)
1 attachment(s)
Re: Patch: Add parse_type Function

On Feb 10, 2024, at 20:52, Erik Wienhold <ewie@ewie.name> wrote:

Let me comment on some issues since I wrote the very first version of
parse_type() on which David's patch is based.

Thanks Erik.

On 2024-02-01 01:00 +0100, jian he <jian.universality@gmail.com> wrote:
if you are wondering around other code deal with NULL, ErrorSaveContext.
NULL would be more correct?
`(void) parseTypeString(type, &typid, &typmod, NULL);`

Fixed.

also parseTypeString already explained the third argument, our
comments can be simplified as:
/*
* Parse type-name argument to obtain type OID and encoded typmod. We don't
* need to handle parseTypeString failure, just let the error be
* raised.
*/

Thanks, adopted that language.

cosmetic issue. Looking around other error handling code, the format
should be something like:
`
if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("function returning record called in"
"context that cannot accept type record")));
`

+1

I poked around and found some other examples, yes. I went with a single long line for errmsg following the example in adminpack.c[1]https://github.com/postgres/postgres/blob/09eb633e1baa3b7cd7929f3cc77f9c46f63c20b1/contrib/adminpack/adminpack.c#L508-L511

`PG_FUNCTION_INFO_V1(parse_type);`
is unnecessary?
based on the error information: https://cirrus-ci.com/task/5899457502380032
not sure why it only fails on windows.

Yeah, it's not necessary for internal functions per [1]. It's a
leftover from when this function was part of the pgtap extension.

Removed.

+#define PARSE_TYPE_STRING_COLS 2 /* Returns two columns. */
+#undef PARSE_TYPE_STRING_COLS
Are these necessary?
given that the comments already say return the OID and type modifier.

Not sure what the convention here is but I found this in a couple of
places, maybe even a tutorial on writing C functions. See
`git grep '_COLS\s\+[1-9]'` for those instances. It's in line with my
habit of avoiding magic constants.

Left in place for now.

+        ( <parameter>typid</parameter> <type>oid</type>,
+          <parameter>typmod</parameter> <type>int4</type> )
here `int4` should be `integer`?

+1

Fixed.

Yes, the sentence is rather handwavy. What is meant here is that this
function also resolves types whose typmod is determined by the SQL
parser or some step after that. There are types whose typmod is
whatever value is found inside the parenthesis, e.g. bit(13) has typmod
13. Our first idea before coming up with that function was to do some
string manipulation and extract the typmod from inside the parenthesis.
But we soon found out that other typmods don't translate one to one,
e.g. varchar(13) has typmod 17. The interval type is also special
because the typmod is some bitmask encoding of e.g. 'second(0)'. Hence
the mention of the SQL grammar.

I adopted some of your language here --- and fixed the spelling errors :-)

can be simplified:
+       <para>
+        Parses a string representing an SQL data type, optionally
schema-qualified.
+        Returns a record with two fields, <parameter>typid</parameter> and
+        <parameter>typmod</parameter>, representing the OID and
modifier for the
+        type. These correspond to the parameters to pass to the
+        <link linkend="format_type"><function>format_type</function>
function.</link>
+       </para>
(I don't have a strong opinion though)

Sounds better. The CREATE TABLE reference is superfluous.

Done.

Best,

David
[1]: https://github.com/postgres/postgres/blob/09eb633e1baa3b7cd7929f3cc77f9c46f63c20b1/contrib/adminpack/adminpack.c#L508-L511

Attachments:

v4-0001-Add-parse_type-SQL-function.patchapplication/octet-stream; name=v4-0001-Add-parse_type-SQL-function.patch; x-unix-mode=0644Download
From 72b2e9dc4f677d9d17544fbe23aee0418f76b5cf Mon Sep 17 00:00:00 2001
From: "David E. Wheeler" <david@justatheory.com>
Date: Mon, 12 Feb 2024 11:25:41 -0500
Subject: [PATCH v4] Add parse_type() SQL function

The `parse_type()` function uses the underlying `parseTypeString()` C
function to parse a string representing a data type into a type ID and
typmod suitable for passing to `format_type()`. This allows one to
derive the formal SQL name for a type from a string that may be an
alias:

    SELECT format_type(p.typid, p.typmod)
      FROM parse_type('timestamp(4)') p;

            format_type
    --------------------------------
    timestamp(4) without time zone

This function also resolves types whose typmod is determined by the SQL
parser or some step after that, such as interval types where the stored
field option is encoded in the typmod:

    SELECT format_type(p.typid, p.typmod)
      FROM parse_type('interval second(0)') p;

        format_type
    --------------------
    interval second(0)

Useful for unit tests for against column data types, for example.
Originally written by Erik Wienhold for use in pgTAP.
---
 doc/src/sgml/func.sgml                    | 30 ++++++++++++-
 src/backend/utils/adt/format_type.c       | 52 ++++++++++++++++++++++
 src/include/catalog/pg_proc.dat           |  4 ++
 src/include/utils/builtins.h              |  2 +
 src/test/regress/expected/create_type.out | 53 +++++++++++++++++++++++
 src/test/regress/sql/create_type.sql      | 28 ++++++++++++
 6 files changed, 168 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 11d537b341..adcc433a69 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24759,7 +24759,7 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
 
      <tbody>
       <row>
-       <entry role="func_table_entry"><para role="func_signature">
+       <entry id="format_type" role="func_table_entry"><para role="func_signature">
         <indexterm>
          <primary>format_type</primary>
         </indexterm>
@@ -24773,6 +24773,34 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
        </para></entry>
       </row>
 
+      <row>
+       <entry id="parse_type" role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>parse_type</primary>
+        </indexterm>
+        <function>parse_type</function> ( <parameter>type</parameter> <type>text</type> )
+        <returnvalue>record</returnvalue>
+        ( <parameter>typid</parameter> <type>oid</type>,
+          <parameter>typmod</parameter> <type>integer</type> )
+       </para>
+       <para>
+        Parses a string representing an SQL data type, optionally schema-qualified.
+        Returns a record with two fields, <parameter>typid</parameter> and
+        <parameter>typmod</parameter>, representing the OID and modifier for the
+        type. These correspond to the parameters to pass to the
+        <link linkend="format_type"><function>format_type</function> function.</link>
+       </para>
+       <para>
+        For example:
+<programlisting>
+SELECT format_type(p.typid, p.typmod) FROM parse_type('timestamp(4)') p;
+        format_type
+ --------------------------------
+  timestamp(4) without time zone
+</programlisting>
+       </para></entry>
+      </row>
+
       <row>
        <entry id="pg-char-to-encoding" role="func_table_entry"><para role="func_signature">
         <indexterm>
diff --git a/src/backend/utils/adt/format_type.c b/src/backend/utils/adt/format_type.c
index 28ba0fbd19..2be1cf9f5e 100644
--- a/src/backend/utils/adt/format_type.c
+++ b/src/backend/utils/adt/format_type.c
@@ -26,6 +26,9 @@
 #include "utils/lsyscache.h"
 #include "utils/numeric.h"
 #include "utils/syscache.h"
+#include "fmgr.h"
+#include "funcapi.h"
+#include "parser/parse_type.h"
 
 static char *printTypmod(const char *typname, int32 typmod, Oid typmodout);
 
@@ -482,3 +485,52 @@ oidvectortypes(PG_FUNCTION_ARGS)
 
 	PG_RETURN_TEXT_P(cstring_to_text(result));
 }
+
+/*
+ * parse_type() is the inverse of pg_catalog.format_type(): it takes a string
+ * representing an SQL-compatible type declaration, such as "int4" or "integer"
+ * or "character varying(32)", parses it, and returns the OID and type modifier.
+ *
+ * Raises an error on an invalid type.
+ *
+ * Internally it relies on the Postgres core parseTypeString() function defined
+ * in src/backend/parser/parse_type.c.
+ */
+Datum
+parse_type(PG_FUNCTION_ARGS)
+{
+#define PARSE_TYPE_STRING_COLS 2 /* Returns two columns. */
+	const char *type;            /* the type string we want to resolve */
+	Oid         typid;           /* the resolved type oid */
+	int32       typmod;          /* the resolved type modifier */
+	TupleDesc   tupdesc;
+	HeapTuple   rettuple;
+	Datum       values[PARSE_TYPE_STRING_COLS] = {0};
+	bool        nulls[PARSE_TYPE_STRING_COLS] = {0};
+
+	type = text_to_cstring(PG_GETARG_TEXT_PP(0));
+
+	/*
+	 * Build a tuple descriptor for our result type; return an error if not
+	 * called in a context that expects a record.
+	 */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("function returning record called in context that cannot accept type record")));
+
+	BlessTupleDesc(tupdesc);
+
+	/*
+	 * Parse type-name argument to obtain type OID and encoded typmod. We don't
+	 * need to handle parseTypeString failure, just let the error be raised.
+	 */
+	(void) parseTypeString(type, &typid, &typmod, NULL);
+
+	/* Create and return tuple. */
+	values[0] = typid;
+	values[1] = typmod;
+	rettuple = heap_form_tuple(tupdesc, values, nulls);
+	return HeapTupleGetDatum(rettuple);
+#undef PARSE_TYPE_STRING_COLS
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 29af4ce65d..37205e1b31 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -2184,6 +2184,10 @@
 { oid => '1081', descr => 'format a type oid and atttypmod to canonical SQL',
   proname => 'format_type', proisstrict => 'f', provolatile => 's',
   prorettype => 'text', proargtypes => 'oid int4', prosrc => 'format_type' },
+{ oid => '8401', descr => 'parse a type string into its a type oid and atttypmod',
+  proname => 'parse_type', proisstrict => 't', provolatile => 's',
+  prorettype => 'record', proargtypes => 'text', prosrc => 'parse_type',
+  proallargtypes => '{text,oid,int4}', proargmodes => '{i,o,o}', proargnames => '{typname,typid,typmod}' },
 { oid => '1084', descr => 'I/O',
   proname => 'date_in', provolatile => 's', prorettype => 'date',
   proargtypes => 'cstring', prosrc => 'date_in' },
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 359c570f23..264ad090be 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -133,6 +133,8 @@ extern char *format_type_with_typemod(Oid type_oid, int32 typemod);
 
 extern int32 type_maximum_size(Oid type_oid, int32 typemod);
 
+extern Datum parse_type(PG_FUNCTION_ARGS);
+
 /* quote.c */
 extern char *quote_literal_cstr(const char *rawstr);
 
diff --git a/src/test/regress/expected/create_type.out b/src/test/regress/expected/create_type.out
index 7383fcdbb1..84ea71090c 100644
--- a/src/test/regress/expected/create_type.out
+++ b/src/test/regress/expected/create_type.out
@@ -249,6 +249,59 @@ select format_type('bpchar'::regtype, -1);
  bpchar
 (1 row)
 
+-- Test parse_type
+SELECT * FROM parse_type('text') p(typid, typmod);
+ typid | typmod 
+-------+--------
+    25 |     -1
+(1 row)
+
+SELECT * FROM parse_type(NULL) p(typid, typmod);
+ typid | typmod 
+-------+--------
+       |       
+(1 row)
+
+-- Test parse_type errors
+SELECT parse_type('nonesuch'); -- error expected
+ERROR:  type "nonesuch" does not exist
+SELECT parse_type('interval nonesuch'); -- grammar error expected
+ERROR:  syntax error at or near "nonesuch"
+LINE 1: SELECT parse_type('interval nonesuch');
+                 ^
+CONTEXT:  invalid type name "interval nonesuch"
+SELECT parse_type('year(4)'); -- grammar error expected
+ERROR:  type "year" does not exist
+-- Test parse_type with various aliases and grammar-based types
+WITH s(s) AS (
+    SELECT * FROM unnest(ARRAY[
+        'timestamp(4)',
+        'interval(0)',
+        'interval second(0)',
+        'timestamptz',
+        'timestamptz(6)',
+        'varchar',
+        'varchar(128)',
+        'mytab'
+    ])
+),
+p(typid, typmod) AS (
+    SELECT ((parse_type(s)).*)
+      FROM s
+)
+SELECT format_type(typid, typmod) FROM p;
+          format_type           
+--------------------------------
+ timestamp(4) without time zone
+ interval(0)
+ interval second(0)
+ timestamp with time zone
+ timestamp(6) with time zone
+ character varying
+ character varying(128)
+ mytab
+(8 rows)
+
 -- Test non-error-throwing APIs using widget, which still throws errors
 SELECT pg_input_is_valid('(1,2,3)', 'widget');
  pg_input_is_valid 
diff --git a/src/test/regress/sql/create_type.sql b/src/test/regress/sql/create_type.sql
index c25018029c..e4700b8719 100644
--- a/src/test/regress/sql/create_type.sql
+++ b/src/test/regress/sql/create_type.sql
@@ -192,6 +192,34 @@ select format_type('bpchar'::regtype, null);
 -- this behavior difference is intentional
 select format_type('bpchar'::regtype, -1);
 
+-- Test parse_type
+SELECT * FROM parse_type('text') p(typid, typmod);
+SELECT * FROM parse_type(NULL) p(typid, typmod);
+
+-- Test parse_type errors
+SELECT parse_type('nonesuch'); -- error expected
+SELECT parse_type('interval nonesuch'); -- grammar error expected
+SELECT parse_type('year(4)'); -- grammar error expected
+
+-- Test parse_type with various aliases and grammar-based types
+WITH s(s) AS (
+    SELECT * FROM unnest(ARRAY[
+        'timestamp(4)',
+        'interval(0)',
+        'interval second(0)',
+        'timestamptz',
+        'timestamptz(6)',
+        'varchar',
+        'varchar(128)',
+        'mytab'
+    ])
+),
+p(typid, typmod) AS (
+    SELECT ((parse_type(s)).*)
+      FROM s
+)
+SELECT format_type(typid, typmod) FROM p;
+
 -- Test non-error-throwing APIs using widget, which still throws errors
 SELECT pg_input_is_valid('(1,2,3)', 'widget');
 SELECT pg_input_is_valid('(1,2)', 'widget');  -- hard error expected
-- 
2.43.1

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: David E. Wheeler (#16)
Re: Patch: Add parse_type Function

"David E. Wheeler" <david@justatheory.com> writes:

[ v4-0001-Add-parse_type-SQL-function.patch ]

It strikes me that this is basically to_regtype() with the additional
option to return the typmod. That leads to some questions:

* Should we choose a name related to to_regtype? I don't have any
immediate suggestions, but since you didn't seem entirely set on
parse_type, maybe it's worth thinking along those lines. OTOH,
to the extent that people would use this with format_type, maybe
parse_type is fine.

* Perhaps the type OID output argument should be declared regtype
not plain OID? It wouldn't make any difference for passing it to
format_type, but in other use-cases perhaps regtype would be more
readable. It's not a deal-breaker either way of course, since
you can cast oid to regtype or vice versa.

* Maybe the code should be in adt/regproc.c not format_type.c.

* Experience with to_regtype suggests strongly that people would
prefer "return NULL" to failure for an unrecognized type name.

Skimming the patch, I notice that the manual addition to
builtins.h should be unnecessary: the pg_proc.dat entry
should be enough to create an extern in fmgrprotos.h.
Also, I'm pretty sure that reformat_dat_file.pl will
think your pg_proc.dat entry is overly verbose. See
https://www.postgresql.org/docs/devel/system-catalog-initial-data.html

regards, tom lane

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#17)
Re: Patch: Add parse_type Function

I wrote:

It strikes me that this is basically to_regtype() with the additional
option to return the typmod. That leads to some questions:

BTW, another way that this problem could be approached is to use
to_regtype() as-is, with a separate function to obtain the typmod:

select format_type(to_regtype('timestamp(4)'), to_regtypmod('timestamp(4)'));

This is intellectually ugly, since it implies parsing the same
typename string twice. But on the other hand it avoids the notational
pain and runtime overhead involved in using a record-returning
function. So I think it might be roughly a wash for performance.
Question to think about is which way is easier to use. I don't
have an opinion particularly; just throwing the idea out there.

regards, tom lane

#19Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#18)
Re: Patch: Add parse_type Function

po 12. 2. 2024 v 19:20 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

I wrote:

It strikes me that this is basically to_regtype() with the additional
option to return the typmod. That leads to some questions:

BTW, another way that this problem could be approached is to use
to_regtype() as-is, with a separate function to obtain the typmod:

select format_type(to_regtype('timestamp(4)'),
to_regtypmod('timestamp(4)'));

This is intellectually ugly, since it implies parsing the same
typename string twice. But on the other hand it avoids the notational
pain and runtime overhead involved in using a record-returning
function. So I think it might be roughly a wash for performance.
Question to think about is which way is easier to use. I don't
have an opinion particularly; just throwing the idea out there.

+1

Pavel

Show quoted text

regards, tom lane

#20David E. Wheeler
david@justatheory.com
In reply to: Tom Lane (#17)
Re: Patch: Add parse_type Function

On Feb 12, 2024, at 12:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"David E. Wheeler" <david@justatheory.com> writes:

[ v4-0001-Add-parse_type-SQL-function.patch ]

It strikes me that this is basically to_regtype() with the additional
option to return the typmod. That leads to some questions:

Huh. I saw it more on a par with format_type(), at least in the output I expect.

* Should we choose a name related to to_regtype? I don't have any
immediate suggestions, but since you didn't seem entirely set on
parse_type, maybe it's worth thinking along those lines. OTOH,
to the extent that people would use this with format_type, maybe
parse_type is fine.

Yeah, and the `to_*` functions return a single OID, not two fields.

* Perhaps the type OID output argument should be declared regtype
not plain OID? It wouldn't make any difference for passing it to
format_type, but in other use-cases perhaps regtype would be more
readable. It's not a deal-breaker either way of course, since
you can cast oid to regtype or vice versa.

Sure. I used `oid` because that’s exactly what the argument to format_type() is called, so thought the parity there more appropriate. Maybe its argument should be renamed to regtype?

* Maybe the code should be in adt/regproc.c not format_type.c.

Happy to move it wherever makes the most sense.

* Experience with to_regtype suggests strongly that people would
prefer "return NULL" to failure for an unrecognized type name.

Could do, but as with to_regtype() there’s an issue with error handling when it descends into the SQL parser.

Skimming the patch, I notice that the manual addition to
builtins.h should be unnecessary: the pg_proc.dat entry
should be enough to create an extern in fmgrprotos.h.

Confirmed, will remove in next patch.

Also, I'm pretty sure that reformat_dat_file.pl will
think your pg_proc.dat entry is overly verbose. See
https://www.postgresql.org/docs/devel/system-catalog-initial-data.html

There are a bunch of things it reformats:

❯ git diff --name-only
src/include/catalog/pg_aggregate.dat
src/include/catalog/pg_database.dat
src/include/catalog/pg_proc.dat
src/include/catalog/pg_type.dat
src/include/utils/builtins.h

And even in pg_proc.dat there are 13 blocks it reformats. I can submit with just my block formatted if you’d prefer.

BTW, another way that this problem could be approached is to use
to_regtype() as-is, with a separate function to obtain the typmod:

select format_type(to_regtype('timestamp(4)'), to_regtypmod('timestamp(4)'));

Oh interesting.

This is intellectually ugly, since it implies parsing the same
typename string twice. But on the other hand it avoids the notational
pain and runtime overhead involved in using a record-returning
function. So I think it might be roughly a wash for performance.
Question to think about is which way is easier to use. I don't
have an opinion particularly; just throwing the idea out there.

Honestly I haven’t cared for the fact that parse_type() returns a record; it makes it a bit clunky. But yeah, so does this to pass the same value to two function calls.

Perhaps it makes sense to add to_regtypmod() as you suggest, but then also something like:

CREATE FUNCTION format_type_string(text) RETURNS TEXT AS $$
SELECT format_type(to_regtype($1), to_regtypmod($1))
$$;

Best,

David

#21David E. Wheeler
david@justatheory.com
In reply to: David E. Wheeler (#20)
1 attachment(s)
Re: Patch: Add parse_type Function

On Feb 12, 2024, at 15:55, David E. Wheeler <david@justatheory.com> wrote:

Happy to move it wherever makes the most sense.

Update with a bunch of the suggested changes. Some questions still open from the previous post, though.

Best,

David

Attachments:

v5-0001-Add-parse_type-SQL-function.patchapplication/octet-stream; name=v5-0001-Add-parse_type-SQL-function.patch; x-unix-mode=0644Download
From 857f7eace0bc65ef882dd8ced51993e37142055c Mon Sep 17 00:00:00 2001
From: "David E. Wheeler" <david@justatheory.com>
Date: Tue, 13 Feb 2024 17:58:34 -0500
Subject: [PATCH v5] Add parse_type() SQL function

The `parse_type()` function uses the underlying `parseTypeString()` C
function to parse a string representing a data type into a type ID and
typmod suitable for passing to `format_type()`. This allows one to
derive the formal SQL name for a type from a string that may be an
alias:

    SELECT format_type(p.typid, p.typmod)
      FROM parse_type('timestamp(4)') p;

            format_type
    --------------------------------
    timestamp(4) without time zone

This function also resolves types whose typmod is determined by the SQL
parser or some step after that, such as interval types where the stored
field option is encoded in the typmod:

    SELECT format_type(p.typid, p.typmod)
      FROM parse_type('interval second(0)') p;

        format_type
    --------------------
    interval second(0)

Useful for unit tests for against column data types, for example.
Originally written by Erik Wienhold for use in pgTAP.
---
 doc/src/sgml/func.sgml                | 30 +++++++++++++++-
 src/backend/utils/adt/regproc.c       | 51 +++++++++++++++++++++++++++
 src/include/catalog/pg_proc.dat       |  6 ++++
 src/test/regress/expected/regproc.out | 51 +++++++++++++++++++++++++++
 src/test/regress/sql/regproc.sql      | 28 +++++++++++++++
 5 files changed, 165 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 11d537b341..adcc433a69 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24759,7 +24759,7 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
 
      <tbody>
       <row>
-       <entry role="func_table_entry"><para role="func_signature">
+       <entry id="format_type" role="func_table_entry"><para role="func_signature">
         <indexterm>
          <primary>format_type</primary>
         </indexterm>
@@ -24773,6 +24773,34 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
        </para></entry>
       </row>
 
+      <row>
+       <entry id="parse_type" role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>parse_type</primary>
+        </indexterm>
+        <function>parse_type</function> ( <parameter>type</parameter> <type>text</type> )
+        <returnvalue>record</returnvalue>
+        ( <parameter>typid</parameter> <type>oid</type>,
+          <parameter>typmod</parameter> <type>integer</type> )
+       </para>
+       <para>
+        Parses a string representing an SQL data type, optionally schema-qualified.
+        Returns a record with two fields, <parameter>typid</parameter> and
+        <parameter>typmod</parameter>, representing the OID and modifier for the
+        type. These correspond to the parameters to pass to the
+        <link linkend="format_type"><function>format_type</function> function.</link>
+       </para>
+       <para>
+        For example:
+<programlisting>
+SELECT format_type(p.typid, p.typmod) FROM parse_type('timestamp(4)') p;
+        format_type
+ --------------------------------
+  timestamp(4) without time zone
+</programlisting>
+       </para></entry>
+      </row>
+
       <row>
        <entry id="pg-char-to-encoding" role="func_table_entry"><para role="func_signature">
         <indexterm>
diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index 1e3bf3f5fd..4fee27a139 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -30,6 +30,7 @@
 #include "catalog/pg_ts_config.h"
 #include "catalog/pg_ts_dict.h"
 #include "catalog/pg_type.h"
+#include "funcapi.h"
 #include "lib/stringinfo.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
@@ -2016,3 +2017,53 @@ parseNameAndArgTypes(const char *string, bool allowNone, List **names,
 
 	return true;
 }
+
+
+/*
+ * parse_type() is the inverse of pg_catalog.format_type(): it takes a string
+ * representing an SQL-compatible type declaration, such as "int4" or "integer"
+ * or "character varying(32)", parses it, and returns the OID and type modifier.
+ *
+ * Raises an error on an invalid type.
+ *
+ * Internally it relies on the Postgres core parseTypeString() function defined
+ * in src/backend/parser/parse_type.c.
+ */
+Datum
+parse_type(PG_FUNCTION_ARGS)
+{
+#define PARSE_TYPE_STRING_COLS 2 /* Returns two columns. */
+	const char *type;            /* the type string we want to resolve */
+	Oid         typid;           /* the resolved type oid */
+	int32       typmod;          /* the resolved type modifier */
+	TupleDesc   tupdesc;
+	HeapTuple   rettuple;
+	Datum       values[PARSE_TYPE_STRING_COLS] = {0};
+	bool        nulls[PARSE_TYPE_STRING_COLS] = {0};
+
+	type = text_to_cstring(PG_GETARG_TEXT_PP(0));
+
+	/*
+	 * Build a tuple descriptor for our result type; return an error if not
+	 * called in a context that expects a record.
+	 */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("function returning record called in context that cannot accept type record")));
+
+	BlessTupleDesc(tupdesc);
+
+	/*
+	 * Parse type-name argument to obtain type OID and encoded typmod. We don't
+	 * need to handle parseTypeString failure, just let the error be raised.
+	 */
+	(void) parseTypeString(type, &typid, &typmod, NULL);
+
+	/* Create and return tuple. */
+	values[0] = typid;
+	values[1] = typmod;
+	rettuple = heap_form_tuple(tupdesc, values, nulls);
+	return HeapTupleGetDatum(rettuple);
+#undef PARSE_TYPE_STRING_COLS
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 29af4ce65d..8071b44f6b 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -2184,6 +2184,12 @@
 { oid => '1081', descr => 'format a type oid and atttypmod to canonical SQL',
   proname => 'format_type', proisstrict => 'f', provolatile => 's',
   prorettype => 'text', proargtypes => 'oid int4', prosrc => 'format_type' },
+{ oid => '8401',
+  descr => 'parse a type string into its a type oid and atttypmod',
+  proname => 'parse_type', provolatile => 's', prorettype => 'record',
+  proargtypes => 'text', proallargtypes => '{text,oid,int4}',
+  proargmodes => '{i,o,o}', proargnames => '{typname,typid,typmod}',
+  prosrc => 'parse_type' },
 { oid => '1084', descr => 'I/O',
   proname => 'date_in', provolatile => 's', prorettype => 'date',
   proargtypes => 'cstring', prosrc => 'date_in' },
diff --git a/src/test/regress/expected/regproc.out b/src/test/regress/expected/regproc.out
index a9420850b8..033f6a81a5 100644
--- a/src/test/regress/expected/regproc.out
+++ b/src/test/regress/expected/regproc.out
@@ -544,3 +544,54 @@ SELECT * FROM pg_input_error_info('way.too.many.names', 'regtype');
 ERROR:  improper qualified name (too many dotted names): way.too.many.names
 SELECT * FROM pg_input_error_info('no_such_catalog.schema.name', 'regtype');
 ERROR:  cross-database references are not implemented: no_such_catalog.schema.name
+-- Test parse_type
+SELECT * FROM parse_type('text') p(typid, typmod);
+ typid | typmod 
+-------+--------
+    25 |     -1
+(1 row)
+
+SELECT * FROM parse_type(NULL) p(typid, typmod);
+ typid | typmod 
+-------+--------
+       |       
+(1 row)
+
+-- Test parse_type errors
+SELECT parse_type('nonesuch'); -- error expected
+ERROR:  type "nonesuch" does not exist
+SELECT parse_type('interval nonesuch'); -- grammar error expected
+ERROR:  syntax error at or near "nonesuch"
+LINE 1: SELECT parse_type('interval nonesuch');
+                 ^
+CONTEXT:  invalid type name "interval nonesuch"
+SELECT parse_type('year(4)'); -- grammar error expected
+ERROR:  type "year" does not exist
+-- Test parse_type with various aliases and grammar-based types
+WITH s(s) AS (
+    SELECT * FROM unnest(ARRAY[
+        'timestamp(4)',
+        'interval(0)',
+        'interval second(0)',
+        'timestamptz',
+        'timestamptz(6)',
+        'varchar',
+        'varchar(128)'
+    ])
+),
+p(typid, typmod) AS (
+    SELECT ((parse_type(s)).*)
+      FROM s
+)
+SELECT format_type(typid, typmod) FROM p;
+          format_type           
+--------------------------------
+ timestamp(4) without time zone
+ interval(0)
+ interval second(0)
+ timestamp with time zone
+ timestamp(6) with time zone
+ character varying
+ character varying(128)
+(7 rows)
+
diff --git a/src/test/regress/sql/regproc.sql b/src/test/regress/sql/regproc.sql
index de2aa881a8..489463aa9e 100644
--- a/src/test/regress/sql/regproc.sql
+++ b/src/test/regress/sql/regproc.sql
@@ -145,3 +145,31 @@ SELECT * FROM pg_input_error_info('incorrect type name syntax', 'regtype');
 SELECT * FROM pg_input_error_info('numeric(1,2,3)', 'regtype');  -- bogus typmod
 SELECT * FROM pg_input_error_info('way.too.many.names', 'regtype');
 SELECT * FROM pg_input_error_info('no_such_catalog.schema.name', 'regtype');
+
+-- Test parse_type
+SELECT * FROM parse_type('text') p(typid, typmod);
+SELECT * FROM parse_type(NULL) p(typid, typmod);
+
+-- Test parse_type errors
+SELECT parse_type('nonesuch'); -- error expected
+SELECT parse_type('interval nonesuch'); -- grammar error expected
+SELECT parse_type('year(4)'); -- grammar error expected
+
+-- Test parse_type with various aliases and grammar-based types
+WITH s(s) AS (
+    SELECT * FROM unnest(ARRAY[
+        'timestamp(4)',
+        'interval(0)',
+        'interval second(0)',
+        'timestamptz',
+        'timestamptz(6)',
+        'varchar',
+        'varchar(128)'
+    ])
+),
+p(typid, typmod) AS (
+    SELECT ((parse_type(s)).*)
+      FROM s
+)
+SELECT format_type(typid, typmod) FROM p;
+
-- 
2.43.1

#22Erik Wienhold
ewie@ewie.name
In reply to: Tom Lane (#18)
1 attachment(s)
Re: Patch: Add parse_type Function

On 2024-02-12 19:20 +0100, Tom Lane wrote:

I wrote:

It strikes me that this is basically to_regtype() with the additional
option to return the typmod. That leads to some questions:

BTW, another way that this problem could be approached is to use
to_regtype() as-is, with a separate function to obtain the typmod:

select format_type(to_regtype('timestamp(4)'), to_regtypmod('timestamp(4)'));

This is intellectually ugly, since it implies parsing the same
typename string twice. But on the other hand it avoids the notational
pain and runtime overhead involved in using a record-returning
function. So I think it might be roughly a wash for performance.
Question to think about is which way is easier to use. I don't
have an opinion particularly; just throwing the idea out there.

Out of curiosity, I benchmarked this with the attached to_regtypmod()
patch based on David's v5 applied to a6c21887a9. The script running
pgbench and its output are included at the end.

Just calling parse_type() vs to_regtype()/to_regtypmod() is a wash for
performance as you thought. But format_type() performs better with
to_regtypmod() than with parse_type(). Accessing the record fields
returned by parse_type() adds some overhead.

to_regtypmod() is better for our use case in pgTAP which relies on
format_type() to normalize the type name. The implementation of
to_regtypmod() is also simpler than parse_type(). Usage-wise, both are
clunky IMO.

Benchmark script:

#!/usr/bin/env bash

set -eu

cat <<'SQL' > parse_type.sql
SELECT parse_type('interval second(0)');
SQL

cat <<'SQL' > parse_type_and_format.sql
SELECT format_type(p.typid, p.typmod) FROM parse_type('interval second(0)') p;
SQL

cat <<'SQL' > to_regtypmod.sql
SELECT to_regtype('interval second(0)'), to_regtypmod('interval second(0)');
SQL

cat <<'SQL' > to_regtypmod_and_format.sql
SELECT format_type(to_regtype('interval second(0)'), to_regtypmod('interval second(0)'));
SQL

for f in \
parse_type.sql \
parse_type_and_format.sql \
to_regtypmod.sql \
to_regtypmod_and_format.sql
do
pgbench -n -f "$f" -T10 postgres
echo
done

pgbench output:

pgbench (17devel)
transaction type: parse_type.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 277017
number of failed transactions: 0 (0.000%)
latency average = 0.036 ms
initial connection time = 1.623 ms
tps = 27706.005513 (without initial connection time)

pgbench (17devel)
transaction type: parse_type_and_format.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 222487
number of failed transactions: 0 (0.000%)
latency average = 0.045 ms
initial connection time = 1.603 ms
tps = 22252.095670 (without initial connection time)

pgbench (17devel)
transaction type: to_regtypmod.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 276134
number of failed transactions: 0 (0.000%)
latency average = 0.036 ms
initial connection time = 1.570 ms
tps = 27617.628259 (without initial connection time)

pgbench (17devel)
transaction type: to_regtypmod_and_format.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 270820
number of failed transactions: 0 (0.000%)
latency average = 0.037 ms
initial connection time = 1.631 ms
tps = 27086.331104 (without initial connection time)

--
Erik

Attachments:

0001-Add-to_regtypmod-for-benchmarking-against-parse_type.patchtext/plain; charset=us-asciiDownload
From 0b60432a84d63a7fccaae0fe123a0aa2ae67493b Mon Sep 17 00:00:00 2001
From: Erik Wienhold <ewie@ewie.name>
Date: Sun, 18 Feb 2024 17:33:35 +0100
Subject: [PATCH] Add to_regtypmod() for benchmarking against parse_type()

---
 src/backend/utils/adt/regproc.c       | 18 ++++++++++
 src/include/catalog/pg_proc.dat       |  3 ++
 src/test/regress/expected/regproc.out | 51 +++++++++++++++++++++++++++
 src/test/regress/sql/regproc.sql      | 26 ++++++++++++++
 4 files changed, 98 insertions(+)

diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index 4fee27a139..6285dc7192 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -2067,3 +2067,21 @@ parse_type(PG_FUNCTION_ARGS)
 	return HeapTupleGetDatum(rettuple);
 #undef PARSE_TYPE_STRING_COLS
 }
+
+Datum
+to_regtypmod(PG_FUNCTION_ARGS)
+{
+	const char *type;            /* the type string we want to resolve */
+	Oid         typid;           /* the resolved type oid */
+	int32       typmod;          /* the resolved type modifier */
+
+	type = text_to_cstring(PG_GETARG_TEXT_PP(0));
+
+	/*
+	 * Parse type-name argument to obtain type OID and encoded typmod. We don't
+	 * need to handle parseTypeString failure, just let the error be raised.
+	 */
+	(void) parseTypeString(type, &typid, &typmod, NULL);
+
+	PG_RETURN_INT32(typmod);
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 6cad101bca..befeaeec37 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -2190,6 +2190,9 @@
   proargtypes => 'text', proallargtypes => '{text,oid,int4}',
   proargmodes => '{i,o,o}', proargnames => '{typname,typid,typmod}',
   prosrc => 'parse_type' },
+{ oid => '8402', descr => 'covert type name to typmod',
+  proname => 'to_regtypmod', provolatile => 's', prorettype => 'int4',
+  proargtypes => 'text', prosrc => 'to_regtypmod' },
 { oid => '1084', descr => 'I/O',
   proname => 'date_in', provolatile => 's', prorettype => 'date',
   proargtypes => 'cstring', prosrc => 'date_in' },
diff --git a/src/test/regress/expected/regproc.out b/src/test/regress/expected/regproc.out
index 033f6a81a5..c548b320c3 100644
--- a/src/test/regress/expected/regproc.out
+++ b/src/test/regress/expected/regproc.out
@@ -595,3 +595,54 @@ SELECT format_type(typid, typmod) FROM p;
  character varying(128)
 (7 rows)
 
+-- Test to_regtypmod
+SELECT * FROM to_regtypmod('text');
+ to_regtypmod 
+--------------
+           -1
+(1 row)
+
+SELECT * FROM to_regtypmod(NULL);
+ to_regtypmod 
+--------------
+             
+(1 row)
+
+-- Test to_regtypmod errors
+SELECT to_regtypmod('nonesuch'); -- error expected
+ERROR:  type "nonesuch" does not exist
+SELECT to_regtypmod('interval nonesuch'); -- grammar error expected
+ERROR:  syntax error at or near "nonesuch"
+LINE 1: SELECT to_regtypmod('interval nonesuch');
+                 ^
+CONTEXT:  invalid type name "interval nonesuch"
+SELECT to_regtypmod('year(4)'); -- grammar error expected
+ERROR:  type "year" does not exist
+-- Test to_regtypmod with various aliases and grammar-based types
+WITH s(s) AS (
+    SELECT * FROM unnest(ARRAY[
+        'timestamp(4)',
+        'interval(0)',
+        'interval second(0)',
+        'timestamptz',
+        'timestamptz(6)',
+        'varchar',
+        'varchar(128)'
+    ])
+),
+p(typid, typmod) AS (
+    SELECT to_regtype(s), to_regtypmod(s)
+      FROM s
+)
+SELECT format_type(typid, typmod) FROM p;
+          format_type           
+--------------------------------
+ timestamp(4) without time zone
+ interval(0)
+ interval second(0)
+ timestamp with time zone
+ timestamp(6) with time zone
+ character varying
+ character varying(128)
+(7 rows)
+
diff --git a/src/test/regress/sql/regproc.sql b/src/test/regress/sql/regproc.sql
index 489463aa9e..ac19737012 100644
--- a/src/test/regress/sql/regproc.sql
+++ b/src/test/regress/sql/regproc.sql
@@ -173,3 +173,29 @@ p(typid, typmod) AS (
 )
 SELECT format_type(typid, typmod) FROM p;
 
+-- Test to_regtypmod
+SELECT * FROM to_regtypmod('text');
+SELECT * FROM to_regtypmod(NULL);
+
+-- Test to_regtypmod errors
+SELECT to_regtypmod('nonesuch'); -- error expected
+SELECT to_regtypmod('interval nonesuch'); -- grammar error expected
+SELECT to_regtypmod('year(4)'); -- grammar error expected
+
+-- Test to_regtypmod with various aliases and grammar-based types
+WITH s(s) AS (
+    SELECT * FROM unnest(ARRAY[
+        'timestamp(4)',
+        'interval(0)',
+        'interval second(0)',
+        'timestamptz',
+        'timestamptz(6)',
+        'varchar',
+        'varchar(128)'
+    ])
+),
+p(typid, typmod) AS (
+    SELECT to_regtype(s), to_regtypmod(s)
+      FROM s
+)
+SELECT format_type(typid, typmod) FROM p;
-- 
2.43.2

#23Pavel Stehule
pavel.stehule@gmail.com
In reply to: Erik Wienhold (#22)
Re: Patch: Add parse_type Function

Hi

ne 18. 2. 2024 v 19:50 odesílatel Erik Wienhold <ewie@ewie.name> napsal:

On 2024-02-12 19:20 +0100, Tom Lane wrote:

I wrote:

It strikes me that this is basically to_regtype() with the additional
option to return the typmod. That leads to some questions:

BTW, another way that this problem could be approached is to use
to_regtype() as-is, with a separate function to obtain the typmod:

select format_type(to_regtype('timestamp(4)'),

to_regtypmod('timestamp(4)'));

This is intellectually ugly, since it implies parsing the same
typename string twice. But on the other hand it avoids the notational
pain and runtime overhead involved in using a record-returning
function. So I think it might be roughly a wash for performance.
Question to think about is which way is easier to use. I don't
have an opinion particularly; just throwing the idea out there.

Out of curiosity, I benchmarked this with the attached to_regtypmod()
patch based on David's v5 applied to a6c21887a9. The script running
pgbench and its output are included at the end.

Just calling parse_type() vs to_regtype()/to_regtypmod() is a wash for
performance as you thought. But format_type() performs better with
to_regtypmod() than with parse_type(). Accessing the record fields
returned by parse_type() adds some overhead.

to_regtypmod() is better for our use case in pgTAP which relies on
format_type() to normalize the type name. The implementation of
to_regtypmod() is also simpler than parse_type(). Usage-wise, both are
clunky IMO.

Benchmark script:

#!/usr/bin/env bash

set -eu

cat <<'SQL' > parse_type.sql
SELECT parse_type('interval second(0)');
SQL

cat <<'SQL' > parse_type_and_format.sql
SELECT format_type(p.typid, p.typmod) FROM parse_type('interval
second(0)') p;
SQL

cat <<'SQL' > to_regtypmod.sql
SELECT to_regtype('interval second(0)'), to_regtypmod('interval
second(0)');
SQL

cat <<'SQL' > to_regtypmod_and_format.sql
SELECT format_type(to_regtype('interval second(0)'),
to_regtypmod('interval second(0)'));
SQL

for f in \
parse_type.sql \
parse_type_and_format.sql \
to_regtypmod.sql \
to_regtypmod_and_format.sql
do
pgbench -n -f "$f" -T10 postgres
echo
done

pgbench output:

pgbench (17devel)
transaction type: parse_type.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 277017
number of failed transactions: 0 (0.000%)
latency average = 0.036 ms
initial connection time = 1.623 ms
tps = 27706.005513 (without initial connection time)

pgbench (17devel)
transaction type: parse_type_and_format.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 222487
number of failed transactions: 0 (0.000%)
latency average = 0.045 ms
initial connection time = 1.603 ms
tps = 22252.095670 (without initial connection time)

pgbench (17devel)
transaction type: to_regtypmod.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 276134
number of failed transactions: 0 (0.000%)
latency average = 0.036 ms
initial connection time = 1.570 ms
tps = 27617.628259 (without initial connection time)

pgbench (17devel)
transaction type: to_regtypmod_and_format.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 270820
number of failed transactions: 0 (0.000%)
latency average = 0.037 ms
initial connection time = 1.631 ms
tps = 27086.331104 (without initial connection time)

The overhead of parse_type_and_format can be related to higher planning
time. PL/pgSQL can assign composite without usage FROM clause.

Regards

Pavel

Show quoted text

--
Erik

#24Erik Wienhold
ewie@ewie.name
In reply to: Pavel Stehule (#23)
Re: Patch: Add parse_type Function

On 2024-02-18 20:00 +0100, Pavel Stehule wrote:

The overhead of parse_type_and_format can be related to higher planning
time. PL/pgSQL can assign composite without usage FROM clause.

Thanks, didn't know that this makes a difference. In that case both
variants are on par.

BEGIN;

CREATE FUNCTION format_with_parse_type(text)
RETURNS text
LANGUAGE plpgsql
STABLE STRICT
AS $$
DECLARE
p record := parse_type($1);
BEGIN
RETURN format_type(p.typid, p.typmod);
END
$$;

CREATE FUNCTION format_with_to_regtypmod(text)
RETURNS text
LANGUAGE plpgsql
STABLE STRICT
AS $$
BEGIN
RETURN format_type(to_regtype($1), to_regtypmod($1));
END
$$;

COMMIT;

Results:

SELECT format_with_parse_type('interval second(0)');

pgbench (17devel)
transaction type: format_with_parse_type.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 253530
number of failed transactions: 0 (0.000%)
latency average = 0.039 ms
initial connection time = 1.846 ms
tps = 25357.551681 (without initial connection time)

SELECT format_with_to_regtypmod('interval second(0)');

pgbench (17devel)
transaction type: format_with_to_regtypmod.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 257942
number of failed transactions: 0 (0.000%)
latency average = 0.039 ms
initial connection time = 1.544 ms
tps = 25798.015526 (without initial connection time)

--
Erik

#25David E. Wheeler
david@justatheory.com
In reply to: Erik Wienhold (#24)
Re: Patch: Add parse_type Function

On Feb 18, 2024, at 15:55, Erik Wienhold <ewie@ewie.name> wrote:

The overhead of parse_type_and_format can be related to higher planning
time. PL/pgSQL can assign composite without usage FROM clause.

Thanks, didn't know that this makes a difference. In that case both
variants are on par.

Presumably this is a side-effect of the use of a RECORD here, which requires a FROM clause in pure SQL, yes?

The only way I can think of to avoid that is to:

1. Add a to_regtypmod() for those who just want the typemod.

2. Add a format_type_string() function that returns a string, the equivalent of this function but in C:

CREATE FUNCTION format_type_string(text) RETURNS TEXT AS $$
SELECT format_type(to_regtype($1), to_regtypmod($1))
$$;

We could also keep the record-returning function for users who want both the regtype and the regytypemod at once, but with the other two I consider it less essential.

Thoughts?

Best,

David

#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: David E. Wheeler (#25)
Re: Patch: Add parse_type Function

"David E. Wheeler" <david@justatheory.com> writes:

The only way I can think of to avoid that is to:

1. Add a to_regtypmod() for those who just want the typemod.

Seems like there's a good case for doing that.

2. Add a format_type_string() function that returns a string, the equivalent of this function but in C:

CREATE FUNCTION format_type_string(text) RETURNS TEXT AS $$
SELECT format_type(to_regtype($1), to_regtypmod($1))
$$;

I'm less thrilled about that, mainly because I can't think of
a good name for it ("format_type_string" is certainly not that).
Is the use-case for this functionality really strong enough that
we need to provide it as a single function rather than something
assembled out of spare parts?

regards, tom lane

#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#26)
Re: Patch: Add parse_type Function

I wrote:

I'm less thrilled about that, mainly because I can't think of
a good name for it ("format_type_string" is certainly not that).

After some time ruminating, a couple of possibilities occurred to
me:
reformat_type(text)
canonical_type_name(text)

Is the use-case for this functionality really strong enough that
we need to provide it as a single function rather than something
assembled out of spare parts?

I'm still unconvinced about that, though.

regards, tom lane

#28David E. Wheeler
david@justatheory.com
In reply to: Tom Lane (#26)
Re: Patch: Add parse_type Function

On Feb 19, 2024, at 15:47, Tom Lane <tgl@sss.pgh.pa.us> wrote:

1. Add a to_regtypmod() for those who just want the typemod.

Seems like there's a good case for doing that.

I’ll work on that.

I'm less thrilled about that, mainly because I can't think of
a good name for it ("format_type_string" is certainly not that).
Is the use-case for this functionality really strong enough that
we need to provide it as a single function rather than something
assembled out of spare parts?

Not essential for pgTAP, no, as we can just use the parts. It was the typmod bit that was missing.

On Feb 19, 2024, at 17:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:

After some time ruminating, a couple of possibilities occurred to
me:
reformat_type(text)
canonical_type_name(text)

I was just thinking about hitting the thesaurus entry for “canonical”, but I quite like reformat_type. It’s super clear and draws the parallel to format_type() more clearly. Will likely steal the name for pgTAP if we don’t add it to core. :-)

I'm still unconvinced about that, though.

I guess the questions are:

* What are the other use cases for it?
* How obvious is it how to do it?

For the latter, it could easily be an example in the docs.

Best,

David

#29Erik Wienhold
ewie@ewie.name
In reply to: David E. Wheeler (#28)
Re: Patch: Add parse_type Function

On 2024-02-19 23:59 +0100, David E. Wheeler wrote:

On Feb 19, 2024, at 15:47, Tom Lane <tgl@sss.pgh.pa.us> wrote:

1. Add a to_regtypmod() for those who just want the typemod.

Seems like there's a good case for doing that.

I’ll work on that.

See the patch I wrote for my benchmarks. But it's pretty easy anyway to
cut down parse_type() ;)

I'm less thrilled about that, mainly because I can't think of a good
name for it ("format_type_string" is certainly not that). Is the
use-case for this functionality really strong enough that we need to
provide it as a single function rather than something assembled out
of spare parts?

Not essential for pgTAP, no, as we can just use the parts. It was the
typmod bit that was missing.

But you don't actually need reformat_type() in pgTAP. You can just get
the type OID and modifier of the want_type and have_type and compare
those. Then use format_type() for the error message. Looks a bit
cleaner to me than doing the string comparison.

On second thought, I guess comparing the reformatted type names is
necessary in order to have a uniform API on older Postgres releases
where pgTAP has to provide its own to_regtypmod() based on typmodin
functions.

On Feb 19, 2024, at 17:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:

After some time ruminating, a couple of possibilities occurred to
me: reformat_type(text) canonical_type_name(text)

I was just thinking about hitting the thesaurus entry for “canonical”,
but I quite like reformat_type. It’s super clear and draws the
parallel to format_type() more clearly. Will likely steal the name for
pgTAP if we don’t add it to core. :-)

I'm still unconvinced about that, though.

I guess the questions are:

* What are the other use cases for it?

Can't think of any right now other than this are-type-names-the-same
check. Perhaps also for pretty-printing user-provided type names where
we don't take the type info from e.g. pg_attribute.

* How obvious is it how to do it?

For the latter, it could easily be an example in the docs.

Can be mentioned right under format_type().

--
Erik

#30David E. Wheeler
david@justatheory.com
In reply to: Erik Wienhold (#29)
1 attachment(s)
Re: Patch: Add parse_type Function

On Feb 19, 2024, at 21:58, Erik Wienhold <ewie@ewie.name> wrote:

See the patch I wrote for my benchmarks. But it's pretty easy anyway to
cut down parse_type() ;)

LOL, I missed that, just wrote it myself in the last hour. :-) v6 attached.

But you don't actually need reformat_type() in pgTAP. You can just get
the type OID and modifier of the want_type and have_type and compare
those. Then use format_type() for the error message. Looks a bit
cleaner to me than doing the string comparison.

Fair.

On second thought, I guess comparing the reformatted type names is
necessary in order to have a uniform API on older Postgres releases
where pgTAP has to provide its own to_regtypmod() based on typmodin
functions.

Maybe. Worth playing with.

For the latter, it could easily be an example in the docs.

Can be mentioned right under format_type().

Well I included it in the to_regtypemod docs here, but could so either.

Best,

David

Attachments:

v6-0001-Add-to_regtypemod-SQL-function.patchapplication/octet-stream; name=v6-0001-Add-to_regtypemod-SQL-function.patch; x-unix-mode=0644Download
From 8ac7bf10036d762003702e2c2f84df45051e82f6 Mon Sep 17 00:00:00 2001
From: "David E. Wheeler" <david@justatheory.com>
Date: Mon, 19 Feb 2024 21:55:33 -0500
Subject: [PATCH v6] Add to_regtypemod() SQL function

The `to_regtypemod()` function uses the underlying `parseTypeString()` C
function to parse a string representing a data type into a type ID
(which id discards) and a typmod, which it returns. This value is
suitable for passing  as the second argument to `format_type()`, which
allow one to derive the formal SQL typmod from a string:

    SELECT format_type(
      to_regtype('varchar(32)'),
      to_regtypemod('varchar(32)')
    );
          format_type
    -----------------------
     character varying(32)

This function also resolves types whose typmod is determined by the SQL
parser or some step after that, such as interval types where the stored
field option is encoded in the typmod:

    SELECT format_type(
      to_regtype('interval second(0)'),
      to_regtypemod('interval second(0)')
    );
        format_type
    --------------------
    interval second(0)

Useful for unit tests for against column data types, for example. Based
on code originally written by Erik Wienhold for use in pgTAP.
---
 doc/src/sgml/func.sgml                | 29 +++++++++++-
 src/backend/catalog/genbki.pl         |  2 +-
 src/backend/utils/adt/regproc.c       | 30 ++++++++++++
 src/include/catalog/pg_proc.dat       |  3 ++
 src/test/regress/expected/regproc.out | 66 +++++++++++++++++++++++++++
 src/test/regress/sql/regproc.sql      | 14 ++++++
 6 files changed, 141 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index cf3de80394..57cfbe19f5 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24764,7 +24764,7 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
 
      <tbody>
       <row>
-       <entry role="func_table_entry"><para role="func_signature">
+       <entry id="format_type" role="func_table_entry"><para role="func_signature">
         <indexterm>
          <primary>format_type</primary>
         </indexterm>
@@ -25462,7 +25462,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
       </row>
 
       <row>
-       <entry role="func_table_entry"><para role="func_signature">
+       <entry id="to_regtype" role="func_table_entry"><para role="func_signature">
         <indexterm>
          <primary>to_regtype</primary>
         </indexterm>
@@ -25477,6 +25477,31 @@ SELECT collation for ('foo' COLLATE "de_DE");
         not found.
        </para></entry>
       </row>
+
+      <row>
+       <entry id="to_regtypemod" role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>to_regtypemod</primary>
+        </indexterm>
+        <function>to_regtypemod</function> ( <parameter>type</parameter> <type>text</type> )
+        <returnvalue>integer</returnvalue>
+       </para>
+       <para>
+        Parses a string representing an SQL data type, optionally schema-qualified, and
+        returns its typmod, modifier for the type. Complements <xref linkend="to_regtype"/>,
+        and can be passed to <xref linkend="format_type" />.
+       </para>
+       <para>
+        For example:
+<programlisting>
+SELECT format_type(to_regtype('varchar(32)'), to_regtypemod('varchar(32)'));
+      format_type
+-----------------------
+ character varying(32)
+</programlisting>
+       </para></entry>
+      </row>
+
      </tbody>
     </tgroup>
    </table>
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 94afdc5491..a0c299f067 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -1098,7 +1098,7 @@ sub lookup_oids
 			{
 				warn sprintf
 				  "unresolved OID reference \"%s\" in %s.dat field %s line %s\n",
-				  $lookupname, $catname, $attname, $bki_values->{line_number};
+				  $lookupname || '', $catname || '', $attname || '', $bki_values->{line_number} || '';
 				$num_errors++;
 			}
 		}
diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index 1e3bf3f5fd..0eb24ffe57 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -30,6 +30,7 @@
 #include "catalog/pg_ts_config.h"
 #include "catalog/pg_ts_dict.h"
 #include "catalog/pg_type.h"
+#include "funcapi.h"
 #include "lib/stringinfo.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
@@ -1220,6 +1221,35 @@ to_regtype(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(result);
 }
 
+
+/*
+ * to_regtypemod() complements to_regtype, returning the typmod for the type,
+ * if any.
+ *
+ * If the type name is not found, we return NULL.
+ *
+ * Internally it relies on the Postgres core parseTypeString() function defined
+ * in src/backend/parser/parse_type.c.
+ */
+Datum
+to_regtypemod(PG_FUNCTION_ARGS)
+{
+	char	   *typ_name = text_to_cstring(PG_GETARG_TEXT_PP(0));
+	Oid         typid;
+	int32       typmod;
+	ErrorSaveContext escontext = {T_ErrorSaveContext};
+
+	/*
+	 * Parse type-name argument to obtain the encoded typmod. Return NULL
+	 * on failure.
+	 */
+	if (!parseTypeString(typ_name, &typid, &typmod, (Node *) &escontext)) {
+		PG_RETURN_NULL();
+	}
+
+	PG_RETURN_INT32(typmod);
+}
+
 /*
  * regtypeout		- converts type OID to "typ_name"
  */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9c120fc2b7..2e1dd92ffd 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7162,6 +7162,9 @@
 { oid => '3493', descr => 'convert type name to regtype',
   proname => 'to_regtype', provolatile => 's', prorettype => 'regtype',
   proargtypes => 'text', prosrc => 'to_regtype' },
+{ oid => '8401', descr => 'convert type name to type mod',
+  proname => 'to_regtypemod', provolatile => 's', prorettype => 'int4',
+  proargtypes => 'text', prosrc => 'to_regtypemod' },
 { oid => '1079', descr => 'convert text to regclass',
   proname => 'regclass', provolatile => 's', prorettype => 'regclass',
   proargtypes => 'text', prosrc => 'text_regclass' },
diff --git a/src/test/regress/expected/regproc.out b/src/test/regress/expected/regproc.out
index a9420850b8..2f99d22cc9 100644
--- a/src/test/regress/expected/regproc.out
+++ b/src/test/regress/expected/regproc.out
@@ -544,3 +544,69 @@ SELECT * FROM pg_input_error_info('way.too.many.names', 'regtype');
 ERROR:  improper qualified name (too many dotted names): way.too.many.names
 SELECT * FROM pg_input_error_info('no_such_catalog.schema.name', 'regtype');
 ERROR:  cross-database references are not implemented: no_such_catalog.schema.name
+-- Test to_regtypemod
+SELECT to_regtypemod('text');
+ to_regtypemod 
+---------------
+            -1
+(1 row)
+
+SELECT to_regtypemod('timestamp(4)');
+ to_regtypemod 
+---------------
+             4
+(1 row)
+
+SELECT to_regtypemod('interval(0)');
+ to_regtypemod 
+---------------
+    2147418112
+(1 row)
+
+SELECT to_regtypemod('interval second(0)');
+ to_regtypemod 
+---------------
+     268435456
+(1 row)
+
+SELECT to_regtypemod('timestamptz');
+ to_regtypemod 
+---------------
+            -1
+(1 row)
+
+SELECT to_regtypemod('timestamptz(6)');
+ to_regtypemod 
+---------------
+             6
+(1 row)
+
+SELECT to_regtypemod('varchar');
+ to_regtypemod 
+---------------
+            -1
+(1 row)
+
+SELECT to_regtypemod('varchar(128)');
+ to_regtypemod 
+---------------
+           132
+(1 row)
+
+SELECT to_regtypemod(NULL); -- null expected
+ to_regtypemod 
+---------------
+              
+(1 row)
+
+SELECT to_regtypemod('interval nonesuch'); -- grammar error expected
+ERROR:  syntax error at or near "nonesuch"
+LINE 1: SELECT to_regtypemod('interval nonesuch');
+                 ^
+CONTEXT:  invalid type name "interval nonesuch"
+SELECT to_regtypemod('year(4)'); -- grammar error expected
+ to_regtypemod 
+---------------
+              
+(1 row)
+
diff --git a/src/test/regress/sql/regproc.sql b/src/test/regress/sql/regproc.sql
index de2aa881a8..bf83d77fab 100644
--- a/src/test/regress/sql/regproc.sql
+++ b/src/test/regress/sql/regproc.sql
@@ -145,3 +145,17 @@ SELECT * FROM pg_input_error_info('incorrect type name syntax', 'regtype');
 SELECT * FROM pg_input_error_info('numeric(1,2,3)', 'regtype');  -- bogus typmod
 SELECT * FROM pg_input_error_info('way.too.many.names', 'regtype');
 SELECT * FROM pg_input_error_info('no_such_catalog.schema.name', 'regtype');
+
+-- Test to_regtypemod
+SELECT to_regtypemod('text');
+SELECT to_regtypemod('timestamp(4)');
+SELECT to_regtypemod('interval(0)');
+SELECT to_regtypemod('interval second(0)');
+SELECT to_regtypemod('timestamptz');
+SELECT to_regtypemod('timestamptz(6)');
+SELECT to_regtypemod('varchar');
+SELECT to_regtypemod('varchar(128)');
+
+SELECT to_regtypemod(NULL); -- null expected
+SELECT to_regtypemod('interval nonesuch'); -- grammar error expected
+SELECT to_regtypemod('year(4)'); -- grammar error expected
-- 
2.43.2

#31jian he
jian.universality@gmail.com
In reply to: David E. Wheeler (#30)
Re: Patch: Add parse_type Function

On Tue, Feb 20, 2024 at 11:06 AM David E. Wheeler <david@justatheory.com> wrote:

LOL, I missed that, just wrote it myself in the last hour. :-) v6 attached.

+SELECT to_regtypemod('interval nonesuch'); -- grammar error expected
+ERROR:  syntax error at or near "nonesuch"
+LINE 1: SELECT to_regtypemod('interval nonesuch');
+                 ^
+CONTEXT:  invalid type name "interval nonesuch"
+SELECT to_regtypemod('year(4)'); -- grammar error expected
+ to_regtypemod
+---------------
+
+(1 row)
+

the second hint `-- grammar error expected` seems to contradict with
the results?

#32David E. Wheeler
david@justatheory.com
In reply to: jian he (#31)
1 attachment(s)
Re: Patch: Add parse_type Function

On Feb 20, 2024, at 01:30, jian he <jian.universality@gmail.com> wrote:

the second hint `-- grammar error expected` seems to contradict with
the results?

Quite right, thank you, that’s actually a trapped error. I’ve tweaked the comments and their order in v7, attached.

This goes back to the discussion of the error raising of to_regtype[1]/messages/by-id/57E1FDDC-5A38-452D-82D7-A44DA2E13862@justatheory.com, so I’ve incorporated the patch from that thread into this patch, and set up the docs for to_regtypemod() with similar information. The wording is still a little opaque for my taste, though, written more for someone who knows a bit about the internals, but it’s a start.

I’ve also fixed the wayward parameter in the function signature in the docs, and added a note about why I’ve also patched genbki.pl.

Best,

David

[1]: /messages/by-id/57E1FDDC-5A38-452D-82D7-A44DA2E13862@justatheory.com

Attachments:

v7-0001-Add-to_regtypemod-SQL-function.patchapplication/octet-stream; name=v7-0001-Add-to_regtypemod-SQL-function.patch; x-unix-mode=0644Download
From dd11116416a78d7fc4ae7f2f549dc1ad987b199d Mon Sep 17 00:00:00 2001
From: "David E. Wheeler" <david@justatheory.com>
Date: Tue, 20 Feb 2024 09:43:48 -0500
Subject: [PATCH v7] Add to_regtypemod() SQL function

The `to_regtypemod()` function uses the underlying `parseTypeString()` C
function to parse a string representing a data type into a type ID
(which id discards) and a typmod, which it returns. This value is
suitable for passing  as the second argument to `format_type()`, which
allow one to derive the formal SQL typmod from a string:

    SELECT format_type(
      to_regtype('varchar(32)'),
      to_regtypemod('varchar(32)')
    );
          format_type
    -----------------------
     character varying(32)

This function also resolves types whose typmod is determined by the SQL
parser or some step after that, such as interval types where the stored
field option is encoded in the typmod:

    SELECT format_type(
      to_regtype('interval second(0)'),
      to_regtypemod('interval second(0)')
    );
        format_type
    --------------------
    interval second(0)

Useful for unit tests for against column data types, for example. Based
on code originally written by Erik Wienhold for use in pgTAP.

Like `to_regtype()`, `to_regtypemod()` raises an error when the
`parseTypeString()` function drops into the SQL grammar parser raises
an error. So invalid type names like `inteval` or `interval nonesuch`
raise an error, rather than return `NULL`. Note this behavior in the
docs, and also update the documentation for `to_regtype()` to note
this behavior.

While at it, fix an "use of undefined value in sprintf" error in
`genbki.pl` that arose when I had specified `prorettype => 'integer'`
instead of `prorettype => 'int4'` in `pg_proc.dat`. I only realized the
invalid type name when I fixed the Perl bug.
---
 doc/src/sgml/func.sgml                | 40 +++++++++++++---
 src/backend/catalog/genbki.pl         |  2 +-
 src/backend/utils/adt/regproc.c       | 29 ++++++++++++
 src/include/catalog/pg_proc.dat       |  3 ++
 src/test/regress/expected/regproc.out | 66 +++++++++++++++++++++++++++
 src/test/regress/sql/regproc.sql      | 14 ++++++
 6 files changed, 146 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index cf3de80394..56c5d7b097 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24764,7 +24764,7 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
 
      <tbody>
       <row>
-       <entry role="func_table_entry"><para role="func_signature">
+       <entry id="format_type" role="func_table_entry"><para role="func_signature">
         <indexterm>
          <primary>format_type</primary>
         </indexterm>
@@ -25462,7 +25462,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
       </row>
 
       <row>
-       <entry role="func_table_entry"><para role="func_signature">
+       <entry id="to_regtype" role="func_table_entry"><para role="func_signature">
         <indexterm>
          <primary>to_regtype</primary>
         </indexterm>
@@ -25470,13 +25470,39 @@ SELECT collation for ('foo' COLLATE "de_DE");
         <returnvalue>regtype</returnvalue>
        </para>
        <para>
-        Translates a textual type name to its OID.  A similar result is
-        obtained by casting the string to type <type>regtype</type> (see
-        <xref linkend="datatype-oid"/>); however, this function will return
-        <literal>NULL</literal> rather than throwing an error if the name is
-        not found.
+        Parses a string of text, extracts a potential type name from it, and
+        translates that name into an OID. A similar result is obtained by
+        casting the string to type <type>regtype</type> (see
+        <xref linkend="datatype-oid"/>). Failure to extract a valid potential
+        type name results in an error; however, if the extracted name is not
+        known to the system, this function will return <literal>NULL</literal>.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry id="to_regtypemod" role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>to_regtypemod</primary>
+        </indexterm>
+        <function>to_regtypemod</function> ( <parameter>type</parameter> <type>text</type> )
+        <returnvalue>integer</returnvalue>
+       </para>
+       <para>
+        Parses a string of text, extracts a potential type name from it, and
+        translates its typmod, the modifier for the type, if any. Failure to
+        extract a valid potential type name results in an error; however, if
+        the extracted name is not known to the system, this function will
+        return <literal>NULL</literal>. Complements <xref linkend="to_regtype"/>,
+        and can be passed to <xref linkend="format_type" />. For example:
+<programlisting>
+SELECT format_type(to_regtype('varchar(32)'), to_regtypemod('varchar(32)'));
+      format_type
+-----------------------
+ character varying(32)
+</programlisting>
        </para></entry>
       </row>
+
      </tbody>
     </tgroup>
    </table>
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 94afdc5491..a0c299f067 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -1098,7 +1098,7 @@ sub lookup_oids
 			{
 				warn sprintf
 				  "unresolved OID reference \"%s\" in %s.dat field %s line %s\n",
-				  $lookupname, $catname, $attname, $bki_values->{line_number};
+				  $lookupname || '', $catname || '', $attname || '', $bki_values->{line_number} || '';
 				$num_errors++;
 			}
 		}
diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index 1e3bf3f5fd..75b8429e0d 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -1220,6 +1220,35 @@ to_regtype(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(result);
 }
 
+
+/*
+ * to_regtypemod() complements to_regtype, returning the typmod for the type,
+ * if any.
+ *
+ * If the type name is not found, we return NULL.
+ *
+ * Internally it relies on the Postgres core parseTypeString() function defined
+ * in src/backend/parser/parse_type.c.
+ */
+Datum
+to_regtypemod(PG_FUNCTION_ARGS)
+{
+	char	   *typ_name = text_to_cstring(PG_GETARG_TEXT_PP(0));
+	Oid         typid;
+	int32       typmod;
+	ErrorSaveContext escontext = {T_ErrorSaveContext};
+
+	/*
+	 * Parse type-name argument to obtain the encoded typmod. Return NULL
+	 * on failure.
+	 */
+	if (!parseTypeString(typ_name, &typid, &typmod, (Node *) &escontext)) {
+		PG_RETURN_NULL();
+	}
+
+	PG_RETURN_INT32(typmod);
+}
+
 /*
  * regtypeout		- converts type OID to "typ_name"
  */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9c120fc2b7..2e1dd92ffd 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7162,6 +7162,9 @@
 { oid => '3493', descr => 'convert type name to regtype',
   proname => 'to_regtype', provolatile => 's', prorettype => 'regtype',
   proargtypes => 'text', prosrc => 'to_regtype' },
+{ oid => '8401', descr => 'convert type name to type mod',
+  proname => 'to_regtypemod', provolatile => 's', prorettype => 'int4',
+  proargtypes => 'text', prosrc => 'to_regtypemod' },
 { oid => '1079', descr => 'convert text to regclass',
   proname => 'regclass', provolatile => 's', prorettype => 'regclass',
   proargtypes => 'text', prosrc => 'text_regclass' },
diff --git a/src/test/regress/expected/regproc.out b/src/test/regress/expected/regproc.out
index a9420850b8..ff1ee5b52c 100644
--- a/src/test/regress/expected/regproc.out
+++ b/src/test/regress/expected/regproc.out
@@ -544,3 +544,69 @@ SELECT * FROM pg_input_error_info('way.too.many.names', 'regtype');
 ERROR:  improper qualified name (too many dotted names): way.too.many.names
 SELECT * FROM pg_input_error_info('no_such_catalog.schema.name', 'regtype');
 ERROR:  cross-database references are not implemented: no_such_catalog.schema.name
+-- Test to_regtypemod
+SELECT to_regtypemod('text');
+ to_regtypemod 
+---------------
+            -1
+(1 row)
+
+SELECT to_regtypemod('timestamp(4)');
+ to_regtypemod 
+---------------
+             4
+(1 row)
+
+SELECT to_regtypemod('interval(0)');
+ to_regtypemod 
+---------------
+    2147418112
+(1 row)
+
+SELECT to_regtypemod('interval second(0)');
+ to_regtypemod 
+---------------
+     268435456
+(1 row)
+
+SELECT to_regtypemod('timestamptz');
+ to_regtypemod 
+---------------
+            -1
+(1 row)
+
+SELECT to_regtypemod('timestamptz(6)');
+ to_regtypemod 
+---------------
+             6
+(1 row)
+
+SELECT to_regtypemod('varchar');
+ to_regtypemod 
+---------------
+            -1
+(1 row)
+
+SELECT to_regtypemod('varchar(128)');
+ to_regtypemod 
+---------------
+           132
+(1 row)
+
+SELECT to_regtypemod(NULL);                -- returns null on null input
+ to_regtypemod 
+---------------
+              
+(1 row)
+
+SELECT to_regtypemod('year(4)');           -- error trapped, returns null
+ to_regtypemod 
+---------------
+              
+(1 row)
+
+SELECT to_regtypemod('interval nonesuch'); -- grammar error raised, not trapped
+ERROR:  syntax error at or near "nonesuch"
+LINE 1: SELECT to_regtypemod('interval nonesuch');
+                 ^
+CONTEXT:  invalid type name "interval nonesuch"
diff --git a/src/test/regress/sql/regproc.sql b/src/test/regress/sql/regproc.sql
index de2aa881a8..c1717e1f8f 100644
--- a/src/test/regress/sql/regproc.sql
+++ b/src/test/regress/sql/regproc.sql
@@ -145,3 +145,17 @@ SELECT * FROM pg_input_error_info('incorrect type name syntax', 'regtype');
 SELECT * FROM pg_input_error_info('numeric(1,2,3)', 'regtype');  -- bogus typmod
 SELECT * FROM pg_input_error_info('way.too.many.names', 'regtype');
 SELECT * FROM pg_input_error_info('no_such_catalog.schema.name', 'regtype');
+
+-- Test to_regtypemod
+SELECT to_regtypemod('text');
+SELECT to_regtypemod('timestamp(4)');
+SELECT to_regtypemod('interval(0)');
+SELECT to_regtypemod('interval second(0)');
+SELECT to_regtypemod('timestamptz');
+SELECT to_regtypemod('timestamptz(6)');
+SELECT to_regtypemod('varchar');
+SELECT to_regtypemod('varchar(128)');
+
+SELECT to_regtypemod(NULL);                -- returns null on null input
+SELECT to_regtypemod('year(4)');           -- error trapped, returns null
+SELECT to_regtypemod('interval nonesuch'); -- grammar error raised, not trapped
-- 
2.43.2

#33Erik Wienhold
ewie@ewie.name
In reply to: David E. Wheeler (#32)
Re: Patch: Add parse_type Function

On 2024-02-20 15:44 +0100, David E. Wheeler wrote:

I’ve tweaked the comments and their order in v7, attached.

This goes back to the discussion of the error raising of
to_regtype[1], so I’ve incorporated the patch from that thread into
this patch, and set up the docs for to_regtypemod() with similar
information.

Makes sense.

[1] /messages/by-id/57E1FDDC-5A38-452D-82D7-A44DA2E13862@justatheory.com

-       <entry role="func_table_entry"><para role="func_signature">
+       <entry id="format_type" role="func_table_entry"><para role="func_signature">
<indexterm>
<primary>format_type</primary>
</indexterm>
@@ -25462,7 +25462,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
</row>
<row>
-       <entry role="func_table_entry"><para role="func_signature">
+       <entry id="to_regtype" role="func_table_entry"><para role="func_signature">

The references are printed as "???" right now. Can be fixed with
xreflabel="format_type" and xreflabel="to_regtype" on those <entry>
elements.

+ <function>to_regtypemod</function> ( <parameter>type</parameter> <type>text</type> )

The docs show parameter name "type" but pg_proc.data does not define
proargnames. So the to_regtypemod() cannot be called using named
notation:

test=> select to_regtypemod(type => 'int'::text);
ERROR: function to_regtypemod(type => text) does not exist

+        Parses a string of text, extracts a potential type name from it, and
+        translates its typmod, the modifier for the type, if any. Failure to

s/typmod, the modifier of the type/type modifier/

Because format_type() already uses "type modifier" and I think it helps
searchability to use the same wording.

--
Erik

#34David E. Wheeler
david@justatheory.com
In reply to: Erik Wienhold (#33)
1 attachment(s)
Re: Patch: Add parse_type Function

On Feb 20, 2024, at 21:09, Erik Wienhold <ewie@ewie.name> wrote:

The references are printed as "???" right now. Can be fixed with
xreflabel="format_type" and xreflabel="to_regtype" on those <entry>
elements.

Thanks.

The docs show parameter name "type" but pg_proc.data does not define
proargnames. So the to_regtypemod() cannot be called using named
notation:

test=> select to_regtypemod(type => 'int'::text);
ERROR: function to_regtypemod(type => text) does not exist

Yes, this format is identical to that of to_regtype():

david=# select to_regtype(type => 'int'::text);
ERROR: function to_regtype(type => text) does not exist

+ Parses a string of text, extracts a potential type name from it, and

+ translates its typmod, the modifier for the type, if any. Failure to

s/typmod, the modifier of the type/type modifier/

Because format_type() already uses "type modifier" and I think it helps
searchability to use the same wording.

Yes, definitely better wording, thanks. V8 attached.

Best,

David

Attachments:

v8-0001-Add-to_regtypemod-SQL-function.patchapplication/applefile; name=v8-0001-Add-to_regtypemod-SQL-function.patchDownload
	2
<,v8-0001-Add-to_regtypemod-SQL-function.patch
#35Erik Wienhold
ewie@ewie.name
In reply to: David E. Wheeler (#34)
Re: Patch: Add parse_type Function

On 2024-02-21 16:14 +0100, David E. Wheeler wrote:

V8 attached.

Thanks. But it's an applefile again, not a patch :P

--
Erik

#36David E. Wheeler
david@justatheory.com
In reply to: Erik Wienhold (#35)
1 attachment(s)
Re: Patch: Add parse_type Function

On Feb 21, 2024, at 11:18, Erik Wienhold <ewie@ewie.name> wrote:

Thanks. But it's an applefile again, not a patch :P

WTF. I still have that feature disabled.

Oh, I think I deleted the file after dragged it into Mail but before sending, because it’s empty everywhere I look. 🤦🏻‍♂️ Let’s try that again.

Best,

David

Attachments:

v8-0001-Add-to_regtypemod-SQL-function.patchapplication/octet-stream; name=v8-0001-Add-to_regtypemod-SQL-function.patch; x-unix-mode=0644Download
From 8b918759fd6f2328ac06a17c69a0990e2748e834 Mon Sep 17 00:00:00 2001
From: "David E. Wheeler" <david@justatheory.com>
Date: Wed, 21 Feb 2024 10:12:08 -0500
Subject: [PATCH v8] Add to_regtypemod() SQL function

The `to_regtypemod()` function uses the underlying `parseTypeString()` C
function to parse a string representing a data type into a type ID
(which id discards) and a typmod, which it returns. This value is
suitable for passing  as the second argument to `format_type()`, which
allow one to derive the formal SQL typmod from a string:

    SELECT format_type(
      to_regtype('varchar(32)'),
      to_regtypemod('varchar(32)')
    );
          format_type
    -----------------------
     character varying(32)

This function also resolves types whose typmod is determined by the SQL
parser or some step after that, such as interval types where the stored
field option is encoded in the typmod:

    SELECT format_type(
      to_regtype('interval second(0)'),
      to_regtypemod('interval second(0)')
    );
        format_type
    --------------------
    interval second(0)

Useful for unit tests for against column data types, for example. Based
on code originally written by Erik Wienhold for use in pgTAP.

Like `to_regtype()`, `to_regtypemod()` raises an error when the
`parseTypeString()` function drops into the SQL grammar parser raises
an error. So invalid type names like `inteval` or `interval nonesuch`
raise an error, rather than return `NULL`. Note this behavior in the
docs, and also update the documentation for `to_regtype()` to note
this behavior.

While at it, fix an "use of undefined value in sprintf" error in
`genbki.pl` that arose when I had specified `prorettype => 'integer'`
instead of `prorettype => 'int4'` in `pg_proc.dat`. I only realized the
invalid type name when I fixed the Perl bug.
---
 doc/src/sgml/func.sgml                | 40 +++++++++++++---
 src/backend/catalog/genbki.pl         |  2 +-
 src/backend/utils/adt/regproc.c       | 29 ++++++++++++
 src/include/catalog/pg_proc.dat       |  3 ++
 src/test/regress/expected/regproc.out | 66 +++++++++++++++++++++++++++
 src/test/regress/sql/regproc.sql      | 14 ++++++
 6 files changed, 146 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e5fa82c161..4c3ffeb5e4 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24763,7 +24763,7 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
 
      <tbody>
       <row>
-       <entry role="func_table_entry"><para role="func_signature">
+       <entry id="format_type" xreflabel="format_type" role="func_table_entry"><para role="func_signature">
         <indexterm>
          <primary>format_type</primary>
         </indexterm>
@@ -25461,7 +25461,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
       </row>
 
       <row>
-       <entry role="func_table_entry"><para role="func_signature">
+       <entry id="to_regtype" xreflabel="to_regtype" role="func_table_entry"><para role="func_signature">
         <indexterm>
          <primary>to_regtype</primary>
         </indexterm>
@@ -25469,13 +25469,39 @@ SELECT collation for ('foo' COLLATE "de_DE");
         <returnvalue>regtype</returnvalue>
        </para>
        <para>
-        Translates a textual type name to its OID.  A similar result is
-        obtained by casting the string to type <type>regtype</type> (see
-        <xref linkend="datatype-oid"/>); however, this function will return
-        <literal>NULL</literal> rather than throwing an error if the name is
-        not found.
+        Parses a string of text, extracts a potential type name from it, and
+        translates that name into an OID. A similar result is obtained by
+        casting the string to type <type>regtype</type> (see
+        <xref linkend="datatype-oid"/>). Failure to extract a valid potential
+        type name results in an error; however, if the extracted name is not
+        known to the system, this function will return <literal>NULL</literal>.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry id="to_regtypemod" role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>to_regtypemod</primary>
+        </indexterm>
+        <function>to_regtypemod</function> ( <parameter>type</parameter> <type>text</type> )
+        <returnvalue>integer</returnvalue>
+       </para>
+       <para>
+        Parses a string of text, extracts a potential type name from it, and
+        translates its type modifier, if any. Failure to extract a valid
+        potential type name results in an error; however, if the extracted name
+        is not known to the system, this function will return
+        <literal>NULL</literal>. Complements <xref linkend="to_regtype"/>, and
+        can be passed to <xref linkend="format_type" />. For example:
+<programlisting>
+SELECT format_type(to_regtype('varchar(32)'), to_regtypemod('varchar(32)'));
+      format_type
+-----------------------
+ character varying(32)
+</programlisting>
        </para></entry>
       </row>
+
      </tbody>
     </tgroup>
    </table>
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 94afdc5491..a0c299f067 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -1098,7 +1098,7 @@ sub lookup_oids
 			{
 				warn sprintf
 				  "unresolved OID reference \"%s\" in %s.dat field %s line %s\n",
-				  $lookupname, $catname, $attname, $bki_values->{line_number};
+				  $lookupname || '', $catname || '', $attname || '', $bki_values->{line_number} || '';
 				$num_errors++;
 			}
 		}
diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index 1e3bf3f5fd..75b8429e0d 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -1220,6 +1220,35 @@ to_regtype(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(result);
 }
 
+
+/*
+ * to_regtypemod() complements to_regtype, returning the typmod for the type,
+ * if any.
+ *
+ * If the type name is not found, we return NULL.
+ *
+ * Internally it relies on the Postgres core parseTypeString() function defined
+ * in src/backend/parser/parse_type.c.
+ */
+Datum
+to_regtypemod(PG_FUNCTION_ARGS)
+{
+	char	   *typ_name = text_to_cstring(PG_GETARG_TEXT_PP(0));
+	Oid         typid;
+	int32       typmod;
+	ErrorSaveContext escontext = {T_ErrorSaveContext};
+
+	/*
+	 * Parse type-name argument to obtain the encoded typmod. Return NULL
+	 * on failure.
+	 */
+	if (!parseTypeString(typ_name, &typid, &typmod, (Node *) &escontext)) {
+		PG_RETURN_NULL();
+	}
+
+	PG_RETURN_INT32(typmod);
+}
+
 /*
  * regtypeout		- converts type OID to "typ_name"
  */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9c120fc2b7..2e1dd92ffd 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7162,6 +7162,9 @@
 { oid => '3493', descr => 'convert type name to regtype',
   proname => 'to_regtype', provolatile => 's', prorettype => 'regtype',
   proargtypes => 'text', prosrc => 'to_regtype' },
+{ oid => '8401', descr => 'convert type name to type mod',
+  proname => 'to_regtypemod', provolatile => 's', prorettype => 'int4',
+  proargtypes => 'text', prosrc => 'to_regtypemod' },
 { oid => '1079', descr => 'convert text to regclass',
   proname => 'regclass', provolatile => 's', prorettype => 'regclass',
   proargtypes => 'text', prosrc => 'text_regclass' },
diff --git a/src/test/regress/expected/regproc.out b/src/test/regress/expected/regproc.out
index a9420850b8..ff1ee5b52c 100644
--- a/src/test/regress/expected/regproc.out
+++ b/src/test/regress/expected/regproc.out
@@ -544,3 +544,69 @@ SELECT * FROM pg_input_error_info('way.too.many.names', 'regtype');
 ERROR:  improper qualified name (too many dotted names): way.too.many.names
 SELECT * FROM pg_input_error_info('no_such_catalog.schema.name', 'regtype');
 ERROR:  cross-database references are not implemented: no_such_catalog.schema.name
+-- Test to_regtypemod
+SELECT to_regtypemod('text');
+ to_regtypemod 
+---------------
+            -1
+(1 row)
+
+SELECT to_regtypemod('timestamp(4)');
+ to_regtypemod 
+---------------
+             4
+(1 row)
+
+SELECT to_regtypemod('interval(0)');
+ to_regtypemod 
+---------------
+    2147418112
+(1 row)
+
+SELECT to_regtypemod('interval second(0)');
+ to_regtypemod 
+---------------
+     268435456
+(1 row)
+
+SELECT to_regtypemod('timestamptz');
+ to_regtypemod 
+---------------
+            -1
+(1 row)
+
+SELECT to_regtypemod('timestamptz(6)');
+ to_regtypemod 
+---------------
+             6
+(1 row)
+
+SELECT to_regtypemod('varchar');
+ to_regtypemod 
+---------------
+            -1
+(1 row)
+
+SELECT to_regtypemod('varchar(128)');
+ to_regtypemod 
+---------------
+           132
+(1 row)
+
+SELECT to_regtypemod(NULL);                -- returns null on null input
+ to_regtypemod 
+---------------
+              
+(1 row)
+
+SELECT to_regtypemod('year(4)');           -- error trapped, returns null
+ to_regtypemod 
+---------------
+              
+(1 row)
+
+SELECT to_regtypemod('interval nonesuch'); -- grammar error raised, not trapped
+ERROR:  syntax error at or near "nonesuch"
+LINE 1: SELECT to_regtypemod('interval nonesuch');
+                 ^
+CONTEXT:  invalid type name "interval nonesuch"
diff --git a/src/test/regress/sql/regproc.sql b/src/test/regress/sql/regproc.sql
index de2aa881a8..c1717e1f8f 100644
--- a/src/test/regress/sql/regproc.sql
+++ b/src/test/regress/sql/regproc.sql
@@ -145,3 +145,17 @@ SELECT * FROM pg_input_error_info('incorrect type name syntax', 'regtype');
 SELECT * FROM pg_input_error_info('numeric(1,2,3)', 'regtype');  -- bogus typmod
 SELECT * FROM pg_input_error_info('way.too.many.names', 'regtype');
 SELECT * FROM pg_input_error_info('no_such_catalog.schema.name', 'regtype');
+
+-- Test to_regtypemod
+SELECT to_regtypemod('text');
+SELECT to_regtypemod('timestamp(4)');
+SELECT to_regtypemod('interval(0)');
+SELECT to_regtypemod('interval second(0)');
+SELECT to_regtypemod('timestamptz');
+SELECT to_regtypemod('timestamptz(6)');
+SELECT to_regtypemod('varchar');
+SELECT to_regtypemod('varchar(128)');
+
+SELECT to_regtypemod(NULL);                -- returns null on null input
+SELECT to_regtypemod('year(4)');           -- error trapped, returns null
+SELECT to_regtypemod('interval nonesuch'); -- grammar error raised, not trapped
-- 
2.43.2

#37Erik Wienhold
ewie@ewie.name
In reply to: David E. Wheeler (#36)
Re: Patch: Add parse_type Function

On 2024-02-21 17:51 +0100, David E. Wheeler wrote:

On Feb 21, 2024, at 11:18, Erik Wienhold <ewie@ewie.name> wrote:

Thanks. But it's an applefile again, not a patch :P

Let’s try that again.

Thanks.

+ <function>to_regtypemod</function> ( <parameter>type</parameter> <type>text</type> )

The docs still state that to_regtypemod() has a named parameter, which
is not the case per pg_proc.dat.

Besides that, LGTM.

--
Erik

#38David E. Wheeler
david@justatheory.com
In reply to: Erik Wienhold (#37)
1 attachment(s)
Re: Patch: Add parse_type Function

On Feb 21, 2024, at 16:43, Erik Wienhold <ewie@ewie.name> wrote:

The docs still state that to_regtypemod() has a named parameter, which
is not the case per pg_proc.dat.

Bah, I cleaned it up before but somehow put it back. Thanks for the catch. Fixed.

Best,

David

Attachments:

v9-0001-Add-to_regtypemod-SQL-function.patchapplication/octet-stream; name=v9-0001-Add-to_regtypemod-SQL-function.patch; x-unix-mode=0644Download
From f3606fdb98eeba3b5d61eae04efcaf6b6385efbf Mon Sep 17 00:00:00 2001
From: "David E. Wheeler" <david@justatheory.com>
Date: Wed, 21 Feb 2024 16:47:40 -0500
Subject: [PATCH v9] Add to_regtypemod() SQL function

The `to_regtypemod()` function uses the underlying `parseTypeString()` C
function to parse a string representing a data type into a type ID
(which id discards) and a typmod, which it returns. This value is
suitable for passing  as the second argument to `format_type()`, which
allow one to derive the formal SQL typmod from a string:

    SELECT format_type(
      to_regtype('varchar(32)'),
      to_regtypemod('varchar(32)')
    );
          format_type
    -----------------------
     character varying(32)

This function also resolves types whose typmod is determined by the SQL
parser or some step after that, such as interval types where the stored
field option is encoded in the typmod:

    SELECT format_type(
      to_regtype('interval second(0)'),
      to_regtypemod('interval second(0)')
    );
        format_type
    --------------------
    interval second(0)

Useful for unit tests for against column data types, for example. Based
on code originally written by Erik Wienhold for use in pgTAP.

Like `to_regtype()`, `to_regtypemod()` raises an error when the
`parseTypeString()` function drops into the SQL grammar parser raises
an error. So invalid type names like `inteval` or `interval nonesuch`
raise an error, rather than return `NULL`. Note this behavior in the
docs, and also update the documentation for `to_regtype()` to note
this behavior.

While at it, fix an "use of undefined value in sprintf" error in
`genbki.pl` that arose when I had specified `prorettype => 'integer'`
instead of `prorettype => 'int4'` in `pg_proc.dat`. I only realized the
invalid type name when I fixed the Perl bug.
---
 doc/src/sgml/func.sgml                | 40 +++++++++++++---
 src/backend/catalog/genbki.pl         |  2 +-
 src/backend/utils/adt/regproc.c       | 29 ++++++++++++
 src/include/catalog/pg_proc.dat       |  3 ++
 src/test/regress/expected/regproc.out | 66 +++++++++++++++++++++++++++
 src/test/regress/sql/regproc.sql      | 14 ++++++
 6 files changed, 146 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e5fa82c161..92c9b9d1c0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24763,7 +24763,7 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
 
      <tbody>
       <row>
-       <entry role="func_table_entry"><para role="func_signature">
+       <entry id="format_type" xreflabel="format_type" role="func_table_entry"><para role="func_signature">
         <indexterm>
          <primary>format_type</primary>
         </indexterm>
@@ -25461,7 +25461,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
       </row>
 
       <row>
-       <entry role="func_table_entry"><para role="func_signature">
+       <entry id="to_regtype" xreflabel="to_regtype" role="func_table_entry"><para role="func_signature">
         <indexterm>
          <primary>to_regtype</primary>
         </indexterm>
@@ -25469,13 +25469,39 @@ SELECT collation for ('foo' COLLATE "de_DE");
         <returnvalue>regtype</returnvalue>
        </para>
        <para>
-        Translates a textual type name to its OID.  A similar result is
-        obtained by casting the string to type <type>regtype</type> (see
-        <xref linkend="datatype-oid"/>); however, this function will return
-        <literal>NULL</literal> rather than throwing an error if the name is
-        not found.
+        Parses a string of text, extracts a potential type name from it, and
+        translates that name into an OID. A similar result is obtained by
+        casting the string to type <type>regtype</type> (see
+        <xref linkend="datatype-oid"/>). Failure to extract a valid potential
+        type name results in an error; however, if the extracted name is not
+        known to the system, this function will return <literal>NULL</literal>.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry id="to_regtypemod" role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>to_regtypemod</primary>
+        </indexterm>
+        <function>to_regtypemod</function> ( <type>text</type> )
+        <returnvalue>integer</returnvalue>
+       </para>
+       <para>
+        Parses a string of text, extracts a potential type name from it, and
+        translates its type modifier, if any. Failure to extract a valid
+        potential type name results in an error; however, if the extracted name
+        is not known to the system, this function will return
+        <literal>NULL</literal>. Complements <xref linkend="to_regtype"/>, and
+        can be passed to <xref linkend="format_type" />. For example:
+<programlisting>
+SELECT format_type(to_regtype('varchar(32)'), to_regtypemod('varchar(32)'));
+      format_type
+-----------------------
+ character varying(32)
+</programlisting>
        </para></entry>
       </row>
+
      </tbody>
     </tgroup>
    </table>
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 94afdc5491..a0c299f067 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -1098,7 +1098,7 @@ sub lookup_oids
 			{
 				warn sprintf
 				  "unresolved OID reference \"%s\" in %s.dat field %s line %s\n",
-				  $lookupname, $catname, $attname, $bki_values->{line_number};
+				  $lookupname || '', $catname || '', $attname || '', $bki_values->{line_number} || '';
 				$num_errors++;
 			}
 		}
diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index 1e3bf3f5fd..75b8429e0d 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -1220,6 +1220,35 @@ to_regtype(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(result);
 }
 
+
+/*
+ * to_regtypemod() complements to_regtype, returning the typmod for the type,
+ * if any.
+ *
+ * If the type name is not found, we return NULL.
+ *
+ * Internally it relies on the Postgres core parseTypeString() function defined
+ * in src/backend/parser/parse_type.c.
+ */
+Datum
+to_regtypemod(PG_FUNCTION_ARGS)
+{
+	char	   *typ_name = text_to_cstring(PG_GETARG_TEXT_PP(0));
+	Oid         typid;
+	int32       typmod;
+	ErrorSaveContext escontext = {T_ErrorSaveContext};
+
+	/*
+	 * Parse type-name argument to obtain the encoded typmod. Return NULL
+	 * on failure.
+	 */
+	if (!parseTypeString(typ_name, &typid, &typmod, (Node *) &escontext)) {
+		PG_RETURN_NULL();
+	}
+
+	PG_RETURN_INT32(typmod);
+}
+
 /*
  * regtypeout		- converts type OID to "typ_name"
  */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9c120fc2b7..2e1dd92ffd 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7162,6 +7162,9 @@
 { oid => '3493', descr => 'convert type name to regtype',
   proname => 'to_regtype', provolatile => 's', prorettype => 'regtype',
   proargtypes => 'text', prosrc => 'to_regtype' },
+{ oid => '8401', descr => 'convert type name to type mod',
+  proname => 'to_regtypemod', provolatile => 's', prorettype => 'int4',
+  proargtypes => 'text', prosrc => 'to_regtypemod' },
 { oid => '1079', descr => 'convert text to regclass',
   proname => 'regclass', provolatile => 's', prorettype => 'regclass',
   proargtypes => 'text', prosrc => 'text_regclass' },
diff --git a/src/test/regress/expected/regproc.out b/src/test/regress/expected/regproc.out
index a9420850b8..ff1ee5b52c 100644
--- a/src/test/regress/expected/regproc.out
+++ b/src/test/regress/expected/regproc.out
@@ -544,3 +544,69 @@ SELECT * FROM pg_input_error_info('way.too.many.names', 'regtype');
 ERROR:  improper qualified name (too many dotted names): way.too.many.names
 SELECT * FROM pg_input_error_info('no_such_catalog.schema.name', 'regtype');
 ERROR:  cross-database references are not implemented: no_such_catalog.schema.name
+-- Test to_regtypemod
+SELECT to_regtypemod('text');
+ to_regtypemod 
+---------------
+            -1
+(1 row)
+
+SELECT to_regtypemod('timestamp(4)');
+ to_regtypemod 
+---------------
+             4
+(1 row)
+
+SELECT to_regtypemod('interval(0)');
+ to_regtypemod 
+---------------
+    2147418112
+(1 row)
+
+SELECT to_regtypemod('interval second(0)');
+ to_regtypemod 
+---------------
+     268435456
+(1 row)
+
+SELECT to_regtypemod('timestamptz');
+ to_regtypemod 
+---------------
+            -1
+(1 row)
+
+SELECT to_regtypemod('timestamptz(6)');
+ to_regtypemod 
+---------------
+             6
+(1 row)
+
+SELECT to_regtypemod('varchar');
+ to_regtypemod 
+---------------
+            -1
+(1 row)
+
+SELECT to_regtypemod('varchar(128)');
+ to_regtypemod 
+---------------
+           132
+(1 row)
+
+SELECT to_regtypemod(NULL);                -- returns null on null input
+ to_regtypemod 
+---------------
+              
+(1 row)
+
+SELECT to_regtypemod('year(4)');           -- error trapped, returns null
+ to_regtypemod 
+---------------
+              
+(1 row)
+
+SELECT to_regtypemod('interval nonesuch'); -- grammar error raised, not trapped
+ERROR:  syntax error at or near "nonesuch"
+LINE 1: SELECT to_regtypemod('interval nonesuch');
+                 ^
+CONTEXT:  invalid type name "interval nonesuch"
diff --git a/src/test/regress/sql/regproc.sql b/src/test/regress/sql/regproc.sql
index de2aa881a8..c1717e1f8f 100644
--- a/src/test/regress/sql/regproc.sql
+++ b/src/test/regress/sql/regproc.sql
@@ -145,3 +145,17 @@ SELECT * FROM pg_input_error_info('incorrect type name syntax', 'regtype');
 SELECT * FROM pg_input_error_info('numeric(1,2,3)', 'regtype');  -- bogus typmod
 SELECT * FROM pg_input_error_info('way.too.many.names', 'regtype');
 SELECT * FROM pg_input_error_info('no_such_catalog.schema.name', 'regtype');
+
+-- Test to_regtypemod
+SELECT to_regtypemod('text');
+SELECT to_regtypemod('timestamp(4)');
+SELECT to_regtypemod('interval(0)');
+SELECT to_regtypemod('interval second(0)');
+SELECT to_regtypemod('timestamptz');
+SELECT to_regtypemod('timestamptz(6)');
+SELECT to_regtypemod('varchar');
+SELECT to_regtypemod('varchar(128)');
+
+SELECT to_regtypemod(NULL);                -- returns null on null input
+SELECT to_regtypemod('year(4)');           -- error trapped, returns null
+SELECT to_regtypemod('interval nonesuch'); -- grammar error raised, not trapped
-- 
2.43.2

#39Erik Wienhold
ewie@ewie.name
In reply to: David E. Wheeler (#38)
Re: Patch: Add parse_type Function

On 2024-02-21 22:49 +0100, David E. Wheeler wrote:

On Feb 21, 2024, at 16:43, Erik Wienhold <ewie@ewie.name> wrote:

The docs still state that to_regtypemod() has a named parameter, which
is not the case per pg_proc.dat.

Bah, I cleaned it up before but somehow put it back. Thanks for the
catch. Fixed.

Thanks David! LGTM.

--
Erik

#40David E. Wheeler
david@justatheory.com
In reply to: Erik Wienhold (#39)
Re: Patch: Add parse_type Function

On Feb 21, 2024, at 17:19, Erik Wienhold <ewie@ewie.name> wrote:

Thanks David! LGTM.

Thanks. Anyone else? Or is it ready for committer?

Best,

David

#41David E. Wheeler
david@justatheory.com
In reply to: David E. Wheeler (#40)
Re: Patch: Add parse_type Function

On Feb 21, 2024, at 19:13, David E. Wheeler <david@justatheory.com> wrote:

Thanks. Anyone else? Or is it ready for committer?

What’s the protocol for marking a patch ready for committer?

Thanks,

David

#42Jim Jones
jim.jones@uni-muenster.de
In reply to: David E. Wheeler (#41)
Re: Patch: Add parse_type Function

On 24.02.24 14:46, David E. Wheeler wrote:

What’s the protocol for marking a patch ready for committer?

I guess after the review of the last assigned reviewer

v9 applies cleanly, all tests pass and documentation builds correctly.

Just a very small observation:

The fact that a completely invalid type returns NULL ..

SELECT to_regtypemod('foo');
 to_regtypemod
---------------
              
(1 row)

.. but a "partially" valid one returns an error might be confusing

postgres=# SELECT to_regtypemod('timestamp(-4)');
ERROR:  syntax error at or near "-"
LINE 1: SELECT to_regtypemod('timestamp(-4)');
                  ^
CONTEXT:  invalid type name "timestamp(-4)"

postgres=# SELECT to_regtypemod('text(-4)');
ERROR:  type modifier is not allowed for type "text"

This behaviour is mentioned in the documentation, so I'd say it is ok.

+        <xref linkend="datatype-oid"/>). Failure to extract a valid
potential
+        type name results in an error; however, if the extracted name
is not
+        known to the system, this function will return
<literal>NULL</literal>.

I would personally prefer either NULL or an error in both cases, but I
can totally live with the current design.

OTOH, format_type returns "???" and it seems to be fine, so don't take
this comment too seriously :)

SELECT format_type(-1,-1);
 format_type
-------------
 ???
(1 row)

Other than that, LGTM.

Thanks David!

Best,
Jim

--
Jim

#43David E. Wheeler
david@justatheory.com
In reply to: Jim Jones (#42)
Re: Patch: Add parse_type Function

On Feb 24, 2024, at 19:11, Jim Jones <jim.jones@uni-muenster.de> wrote:

What’s the protocol for marking a patch ready for committer?

I guess after the review of the last assigned reviewer

Oh, I didn’t realize someone was assigned. :-)

The fact that a completely invalid type returns NULL ..

SELECT to_regtypemod('foo');
to_regtypemod
---------------

(1 row)

.. but a "partially" valid one returns an error might be confusing

postgres=# SELECT to_regtypemod('timestamp(-4)');
ERROR: syntax error at or near "-"
LINE 1: SELECT to_regtypemod('timestamp(-4)');
^
CONTEXT: invalid type name "timestamp(-4)"

postgres=# SELECT to_regtypemod('text(-4)');
ERROR: type modifier is not allowed for type "text"

Yeah, there was quite a bit of discussion of this issue back in September[1]/messages/by-id/09F9CAD6-5096-43CC-B6A7-685703E4714D@justatheory.com.

This behaviour is mentioned in the documentation, so I'd say it is ok.

This is my attempt to make it clearer that it can return an error, but I don’t love the wording TBH.

I would personally prefer either NULL or an error in both cases, but I
can totally live with the current design.

SAME.

Best,

David

[1]: /messages/by-id/09F9CAD6-5096-43CC-B6A7-685703E4714D@justatheory.com

#44David E. Wheeler
david@justatheory.com
In reply to: David E. Wheeler (#43)
1 attachment(s)
Re: Patch: Add parse_type Function

Hello Hackers,

On Feb 25, 2024, at 13:00, David E. Wheeler <david@justatheory.com> wrote:

postgres=# SELECT to_regtypemod('timestamp(-4)');
ERROR: syntax error at or near "-"
LINE 1: SELECT to_regtypemod('timestamp(-4)');
^
CONTEXT: invalid type name "timestamp(-4)"

postgres=# SELECT to_regtypemod('text(-4)');
ERROR: type modifier is not allowed for type "text"

Yeah, there was quite a bit of discussion of this issue back in September[1].

This behaviour is mentioned in the documentation, so I'd say it is ok.

This is my attempt to make it clearer that it can return an error, but I don’t love the wording TBH.

I’ve rebased the patch and, in an attempt to clarify this behavior, added a couple of examples to the docs for to_regtype. Updated patch attached.

Best,

David

Attachments:

v10-0001-Add-to_regtypemod-SQL-function.patchapplication/octet-stream; name=v10-0001-Add-to_regtypemod-SQL-function.patch; x-unix-mode=0644Download
From f878a001ff18b826dddb6cab98a5f703ebc9a2e1 Mon Sep 17 00:00:00 2001
From: "David E. Wheeler" <david@justatheory.com>
Date: Thu, 7 Mar 2024 20:32:23 -0500
Subject: [PATCH v10] Add to_regtypemod() SQL function

The `to_regtypemod()` function uses the underlying `parseTypeString()` C
function to parse a string representing a data type into a type ID
(which id discards) and a typmod, which it returns. This value is
suitable for passing as the second argument to `format_type()`, which
allow one to derive the formal SQL typmod from a string:

    SELECT format_type(
      to_regtype('varchar(32)'),
      to_regtypemod('varchar(32)')
    );
          format_type
    -----------------------
     character varying(32)

This function also resolves types whose typmod is determined by the SQL
parser or some step after that, such as interval types where the stored
field option is encoded in the typmod:

    SELECT format_type(
      to_regtype('interval second(0)'),
      to_regtypemod('interval second(0)')
    );
        format_type
    --------------------
    interval second(0)

Useful for unit tests for against column data types, for example. Based
on code originally written by Erik Wienhold for use in pgTAP.

Like `to_regtype()`, `to_regtypemod()` raises an error when the
`parseTypeString()` function drops into the SQL grammar parser raises
an error. So invalid type names like `inteval` or `interval nonesuch`
raise an error, rather than return `NULL`. Note this behavior in the
docs, and also update the documentation for `to_regtype()` to note
this behavior and cite examples.

While at it, fix an "use of undefined value in sprintf" error in
`genbki.pl` that arose when I had specified `prorettype => 'integer'`
instead of `prorettype => 'int4'` in `pg_proc.dat`. I only realized the
invalid type name when I fixed the Perl bug.
---
 doc/src/sgml/func.sgml                | 54 +++++++++++++++++++---
 src/backend/catalog/genbki.pl         |  2 +-
 src/backend/utils/adt/regproc.c       | 29 ++++++++++++
 src/include/catalog/pg_proc.dat       |  3 ++
 src/test/regress/expected/regproc.out | 66 +++++++++++++++++++++++++++
 src/test/regress/sql/regproc.sql      | 14 ++++++
 6 files changed, 160 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e5fa82c161..07e5207af6 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24763,7 +24763,7 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
 
      <tbody>
       <row>
-       <entry role="func_table_entry"><para role="func_signature">
+       <entry id="format_type" xreflabel="format_type" role="func_table_entry"><para role="func_signature">
         <indexterm>
          <primary>format_type</primary>
         </indexterm>
@@ -25461,7 +25461,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
       </row>
 
       <row>
-       <entry role="func_table_entry"><para role="func_signature">
+       <entry id="to_regtype" xreflabel="to_regtype" role="func_table_entry"><para role="func_signature">
         <indexterm>
          <primary>to_regtype</primary>
         </indexterm>
@@ -25469,13 +25469,53 @@ SELECT collation for ('foo' COLLATE "de_DE");
         <returnvalue>regtype</returnvalue>
        </para>
        <para>
-        Translates a textual type name to its OID.  A similar result is
-        obtained by casting the string to type <type>regtype</type> (see
-        <xref linkend="datatype-oid"/>); however, this function will return
-        <literal>NULL</literal> rather than throwing an error if the name is
-        not found.
+        Parses a string of text, extracts a potential type name from it, and
+        translates that name into an OID. A similar result is obtained by
+        casting the string to type <type>regtype</type> (see
+        <xref linkend="datatype-oid"/>). Failure to extract a valid potential
+        type name results in an error. For example:
+<programlisting>
+SELECT to_regtype('party');
+ to_regtype
+------------
+
+</programlisting>
+        However, if the extracted name is not known to the system, this function
+        will return <literal>NULL</literal>. For example:
+<programlisting>
+SELECT to_regtype('interval nonesuch');
+ERROR:  syntax error at or near "nonesuch"
+LINE 1: select to_regtype('interval nonesuch');
+                 ^
+CONTEXT:  invalid type name "interval nonesuch"
+</programlisting>
+       </para></entry>
+      </row>
+
+      <row>
+       <entry id="to_regtypemod" role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>to_regtypemod</primary>
+        </indexterm>
+        <function>to_regtypemod</function> ( <type>text</type> )
+        <returnvalue>integer</returnvalue>
+       </para>
+       <para>
+        Parses a string of text, extracts a potential type name from it, and
+        translates its type modifier, if any. Failure to extract a valid
+        potential type name results in an error; however, if the extracted name
+        is not known to the system, this function will return
+        <literal>NULL</literal>. Complements <xref linkend="to_regtype"/>, and
+        can be passed to <xref linkend="format_type" />. For example:
+<programlisting>
+SELECT format_type(to_regtype('varchar(32)'), to_regtypemod('varchar(32)'));
+      format_type
+-----------------------
+ character varying(32)
+</programlisting>
        </para></entry>
       </row>
+
      </tbody>
     </tgroup>
    </table>
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 94afdc5491..a0c299f067 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -1098,7 +1098,7 @@ sub lookup_oids
 			{
 				warn sprintf
 				  "unresolved OID reference \"%s\" in %s.dat field %s line %s\n",
-				  $lookupname, $catname, $attname, $bki_values->{line_number};
+				  $lookupname || '', $catname || '', $attname || '', $bki_values->{line_number} || '';
 				$num_errors++;
 			}
 		}
diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index 1e3bf3f5fd..75b8429e0d 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -1220,6 +1220,35 @@ to_regtype(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(result);
 }
 
+
+/*
+ * to_regtypemod() complements to_regtype, returning the typmod for the type,
+ * if any.
+ *
+ * If the type name is not found, we return NULL.
+ *
+ * Internally it relies on the Postgres core parseTypeString() function defined
+ * in src/backend/parser/parse_type.c.
+ */
+Datum
+to_regtypemod(PG_FUNCTION_ARGS)
+{
+	char	   *typ_name = text_to_cstring(PG_GETARG_TEXT_PP(0));
+	Oid         typid;
+	int32       typmod;
+	ErrorSaveContext escontext = {T_ErrorSaveContext};
+
+	/*
+	 * Parse type-name argument to obtain the encoded typmod. Return NULL
+	 * on failure.
+	 */
+	if (!parseTypeString(typ_name, &typid, &typmod, (Node *) &escontext)) {
+		PG_RETURN_NULL();
+	}
+
+	PG_RETURN_INT32(typmod);
+}
+
 /*
  * regtypeout		- converts type OID to "typ_name"
  */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 291ed876fc..48feafaad4 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7155,6 +7155,9 @@
 { oid => '3493', descr => 'convert type name to regtype',
   proname => 'to_regtype', provolatile => 's', prorettype => 'regtype',
   proargtypes => 'text', prosrc => 'to_regtype' },
+{ oid => '8401', descr => 'convert type name to type mod',
+  proname => 'to_regtypemod', provolatile => 's', prorettype => 'int4',
+  proargtypes => 'text', prosrc => 'to_regtypemod' },
 { oid => '1079', descr => 'convert text to regclass',
   proname => 'regclass', provolatile => 's', prorettype => 'regclass',
   proargtypes => 'text', prosrc => 'text_regclass' },
diff --git a/src/test/regress/expected/regproc.out b/src/test/regress/expected/regproc.out
index a9420850b8..ff1ee5b52c 100644
--- a/src/test/regress/expected/regproc.out
+++ b/src/test/regress/expected/regproc.out
@@ -544,3 +544,69 @@ SELECT * FROM pg_input_error_info('way.too.many.names', 'regtype');
 ERROR:  improper qualified name (too many dotted names): way.too.many.names
 SELECT * FROM pg_input_error_info('no_such_catalog.schema.name', 'regtype');
 ERROR:  cross-database references are not implemented: no_such_catalog.schema.name
+-- Test to_regtypemod
+SELECT to_regtypemod('text');
+ to_regtypemod 
+---------------
+            -1
+(1 row)
+
+SELECT to_regtypemod('timestamp(4)');
+ to_regtypemod 
+---------------
+             4
+(1 row)
+
+SELECT to_regtypemod('interval(0)');
+ to_regtypemod 
+---------------
+    2147418112
+(1 row)
+
+SELECT to_regtypemod('interval second(0)');
+ to_regtypemod 
+---------------
+     268435456
+(1 row)
+
+SELECT to_regtypemod('timestamptz');
+ to_regtypemod 
+---------------
+            -1
+(1 row)
+
+SELECT to_regtypemod('timestamptz(6)');
+ to_regtypemod 
+---------------
+             6
+(1 row)
+
+SELECT to_regtypemod('varchar');
+ to_regtypemod 
+---------------
+            -1
+(1 row)
+
+SELECT to_regtypemod('varchar(128)');
+ to_regtypemod 
+---------------
+           132
+(1 row)
+
+SELECT to_regtypemod(NULL);                -- returns null on null input
+ to_regtypemod 
+---------------
+              
+(1 row)
+
+SELECT to_regtypemod('year(4)');           -- error trapped, returns null
+ to_regtypemod 
+---------------
+              
+(1 row)
+
+SELECT to_regtypemod('interval nonesuch'); -- grammar error raised, not trapped
+ERROR:  syntax error at or near "nonesuch"
+LINE 1: SELECT to_regtypemod('interval nonesuch');
+                 ^
+CONTEXT:  invalid type name "interval nonesuch"
diff --git a/src/test/regress/sql/regproc.sql b/src/test/regress/sql/regproc.sql
index de2aa881a8..c1717e1f8f 100644
--- a/src/test/regress/sql/regproc.sql
+++ b/src/test/regress/sql/regproc.sql
@@ -145,3 +145,17 @@ SELECT * FROM pg_input_error_info('incorrect type name syntax', 'regtype');
 SELECT * FROM pg_input_error_info('numeric(1,2,3)', 'regtype');  -- bogus typmod
 SELECT * FROM pg_input_error_info('way.too.many.names', 'regtype');
 SELECT * FROM pg_input_error_info('no_such_catalog.schema.name', 'regtype');
+
+-- Test to_regtypemod
+SELECT to_regtypemod('text');
+SELECT to_regtypemod('timestamp(4)');
+SELECT to_regtypemod('interval(0)');
+SELECT to_regtypemod('interval second(0)');
+SELECT to_regtypemod('timestamptz');
+SELECT to_regtypemod('timestamptz(6)');
+SELECT to_regtypemod('varchar');
+SELECT to_regtypemod('varchar(128)');
+
+SELECT to_regtypemod(NULL);                -- returns null on null input
+SELECT to_regtypemod('year(4)');           -- error trapped, returns null
+SELECT to_regtypemod('interval nonesuch'); -- grammar error raised, not trapped
-- 
2.44.0

#45Erik Wienhold
ewie@ewie.name
In reply to: David E. Wheeler (#44)
Re: Patch: Add parse_type Function

Hi David,

On 2024-03-08 02:37 +0100, David E. Wheeler wrote:

I’ve rebased the patch and, in an attempt to clarify this behavior,
added a couple of examples to the docs for to_regtype. Updated patch
attached.

On your latest addition:

+        <xref linkend="datatype-oid"/>). Failure to extract a valid potential
+        type name results in an error. For example:
+<programlisting>
+SELECT to_regtype('party');
+ to_regtype
+------------
+
+</programlisting>
+        However, if the extracted name is not known to the system, this function
+        will return <literal>NULL</literal>. For example:
+<programlisting>
+SELECT to_regtype('interval nonesuch');
+ERROR:  syntax error at or near "nonesuch"
+LINE 1: select to_regtype('interval nonesuch');
+                 ^
+CONTEXT:  invalid type name "interval nonesuch"
+</programlisting>

I think you need to swap the examples. The text mentions the error case
first and the NULL case second.

--
Erik

#46David E. Wheeler
david@justatheory.com
In reply to: Erik Wienhold (#45)
1 attachment(s)
Re: Patch: Add parse_type Function

On Mar 7, 2024, at 23:39, Erik Wienhold <ewie@ewie.name> wrote:

I think you need to swap the examples. The text mentions the error case
first and the NULL case second.

🤦🏻‍♂️

Thanks, fixed in the attached patch.

David

Attachments:

v11-0001-Add-to_regtypemod-SQL-function.patchapplication/octet-stream; name=v11-0001-Add-to_regtypemod-SQL-function.patch; x-unix-mode=0644Download
From 2a58ef7d7c8d3d7558e0f21941c4707bedb4eda3 Mon Sep 17 00:00:00 2001
From: "David E. Wheeler" <david@justatheory.com>
Date: Fri, 8 Mar 2024 20:40:43 -0500
Subject: [PATCH v11] Add to_regtypemod() SQL function

The `to_regtypemod()` function uses the underlying `parseTypeString()` C
function to parse a string representing a data type into a type ID
(which id discards) and a typmod, which it returns. This value is
suitable for passing as the second argument to `format_type()`, which
allow one to derive the formal SQL typmod from a string:

    SELECT format_type(
      to_regtype('varchar(32)'),
      to_regtypemod('varchar(32)')
    );
          format_type
    -----------------------
     character varying(32)

This function also resolves types whose typmod is determined by the SQL
parser or some step after that, such as interval types where the stored
field option is encoded in the typmod:

    SELECT format_type(
      to_regtype('interval second(0)'),
      to_regtypemod('interval second(0)')
    );
        format_type
    --------------------
    interval second(0)

Useful for unit tests for against column data types, for example. Based
on code originally written by Erik Wienhold for use in pgTAP.

Like `to_regtype()`, `to_regtypemod()` raises an error when the
`parseTypeString()` function drops into the SQL grammar parser raises
an error. So invalid type names like `inteval` or `interval nonesuch`
raise an error, rather than return `NULL`. Note this behavior in the
docs, and also update the documentation for `to_regtype()` to note
this behavior and cite examples.

While at it, fix an "use of undefined value in sprintf" error in
`genbki.pl` that arose when I had specified `prorettype => 'integer'`
instead of `prorettype => 'int4'` in `pg_proc.dat`. I only realized the
invalid type name when I fixed the Perl bug.
---
 doc/src/sgml/func.sgml                | 54 +++++++++++++++++++---
 src/backend/catalog/genbki.pl         |  2 +-
 src/backend/utils/adt/regproc.c       | 29 ++++++++++++
 src/include/catalog/pg_proc.dat       |  3 ++
 src/test/regress/expected/regproc.out | 66 +++++++++++++++++++++++++++
 src/test/regress/sql/regproc.sql      | 14 ++++++
 6 files changed, 160 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 0bb7aeb40e..18908c8a48 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24763,7 +24763,7 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
 
      <tbody>
       <row>
-       <entry role="func_table_entry"><para role="func_signature">
+       <entry id="format_type" xreflabel="format_type" role="func_table_entry"><para role="func_signature">
         <indexterm>
          <primary>format_type</primary>
         </indexterm>
@@ -25461,7 +25461,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
       </row>
 
       <row>
-       <entry role="func_table_entry"><para role="func_signature">
+       <entry id="to_regtype" xreflabel="to_regtype" role="func_table_entry"><para role="func_signature">
         <indexterm>
          <primary>to_regtype</primary>
         </indexterm>
@@ -25469,13 +25469,53 @@ SELECT collation for ('foo' COLLATE "de_DE");
         <returnvalue>regtype</returnvalue>
        </para>
        <para>
-        Translates a textual type name to its OID.  A similar result is
-        obtained by casting the string to type <type>regtype</type> (see
-        <xref linkend="datatype-oid"/>); however, this function will return
-        <literal>NULL</literal> rather than throwing an error if the name is
-        not found.
+        Parses a string of text, extracts a potential type name from it, and
+        translates that name into an OID. A similar result is obtained by
+        casting the string to type <type>regtype</type> (see
+        <xref linkend="datatype-oid"/>). Failure to extract a valid potential
+        type name results in an error. For example:
+<programlisting>
+SELECT to_regtype('interval nonesuch');
+ERROR:  syntax error at or near "nonesuch"
+LINE 1: select to_regtype('interval nonesuch');
+                 ^
+CONTEXT:  invalid type name "interval nonesuch"
+</programlisting>
+        However, if the extracted name is not known to the system, this function
+        will return <literal>NULL</literal>. For example:
+<programlisting>
+SELECT to_regtype('party');
+ to_regtype
+------------
+
+</programlisting>
+       </para></entry>
+      </row>
+
+      <row>
+       <entry id="to_regtypemod" role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>to_regtypemod</primary>
+        </indexterm>
+        <function>to_regtypemod</function> ( <type>text</type> )
+        <returnvalue>integer</returnvalue>
+       </para>
+       <para>
+        Parses a string of text, extracts a potential type name from it, and
+        translates its type modifier, if any. Failure to extract a valid
+        potential type name results in an error; however, if the extracted name
+        is not known to the system, this function will return
+        <literal>NULL</literal>. Complements <xref linkend="to_regtype"/>, and
+        can be passed to <xref linkend="format_type" />. For example:
+<programlisting>
+SELECT format_type(to_regtype('varchar(32)'), to_regtypemod('varchar(32)'));
+      format_type
+-----------------------
+ character varying(32)
+</programlisting>
        </para></entry>
       </row>
+
      </tbody>
     </tgroup>
    </table>
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 94afdc5491..a0c299f067 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -1098,7 +1098,7 @@ sub lookup_oids
 			{
 				warn sprintf
 				  "unresolved OID reference \"%s\" in %s.dat field %s line %s\n",
-				  $lookupname, $catname, $attname, $bki_values->{line_number};
+				  $lookupname || '', $catname || '', $attname || '', $bki_values->{line_number} || '';
 				$num_errors++;
 			}
 		}
diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index 1e3bf3f5fd..75b8429e0d 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -1220,6 +1220,35 @@ to_regtype(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(result);
 }
 
+
+/*
+ * to_regtypemod() complements to_regtype, returning the typmod for the type,
+ * if any.
+ *
+ * If the type name is not found, we return NULL.
+ *
+ * Internally it relies on the Postgres core parseTypeString() function defined
+ * in src/backend/parser/parse_type.c.
+ */
+Datum
+to_regtypemod(PG_FUNCTION_ARGS)
+{
+	char	   *typ_name = text_to_cstring(PG_GETARG_TEXT_PP(0));
+	Oid         typid;
+	int32       typmod;
+	ErrorSaveContext escontext = {T_ErrorSaveContext};
+
+	/*
+	 * Parse type-name argument to obtain the encoded typmod. Return NULL
+	 * on failure.
+	 */
+	if (!parseTypeString(typ_name, &typid, &typmod, (Node *) &escontext)) {
+		PG_RETURN_NULL();
+	}
+
+	PG_RETURN_INT32(typmod);
+}
+
 /*
  * regtypeout		- converts type OID to "typ_name"
  */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 291ed876fc..48feafaad4 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7155,6 +7155,9 @@
 { oid => '3493', descr => 'convert type name to regtype',
   proname => 'to_regtype', provolatile => 's', prorettype => 'regtype',
   proargtypes => 'text', prosrc => 'to_regtype' },
+{ oid => '8401', descr => 'convert type name to type mod',
+  proname => 'to_regtypemod', provolatile => 's', prorettype => 'int4',
+  proargtypes => 'text', prosrc => 'to_regtypemod' },
 { oid => '1079', descr => 'convert text to regclass',
   proname => 'regclass', provolatile => 's', prorettype => 'regclass',
   proargtypes => 'text', prosrc => 'text_regclass' },
diff --git a/src/test/regress/expected/regproc.out b/src/test/regress/expected/regproc.out
index a9420850b8..ff1ee5b52c 100644
--- a/src/test/regress/expected/regproc.out
+++ b/src/test/regress/expected/regproc.out
@@ -544,3 +544,69 @@ SELECT * FROM pg_input_error_info('way.too.many.names', 'regtype');
 ERROR:  improper qualified name (too many dotted names): way.too.many.names
 SELECT * FROM pg_input_error_info('no_such_catalog.schema.name', 'regtype');
 ERROR:  cross-database references are not implemented: no_such_catalog.schema.name
+-- Test to_regtypemod
+SELECT to_regtypemod('text');
+ to_regtypemod 
+---------------
+            -1
+(1 row)
+
+SELECT to_regtypemod('timestamp(4)');
+ to_regtypemod 
+---------------
+             4
+(1 row)
+
+SELECT to_regtypemod('interval(0)');
+ to_regtypemod 
+---------------
+    2147418112
+(1 row)
+
+SELECT to_regtypemod('interval second(0)');
+ to_regtypemod 
+---------------
+     268435456
+(1 row)
+
+SELECT to_regtypemod('timestamptz');
+ to_regtypemod 
+---------------
+            -1
+(1 row)
+
+SELECT to_regtypemod('timestamptz(6)');
+ to_regtypemod 
+---------------
+             6
+(1 row)
+
+SELECT to_regtypemod('varchar');
+ to_regtypemod 
+---------------
+            -1
+(1 row)
+
+SELECT to_regtypemod('varchar(128)');
+ to_regtypemod 
+---------------
+           132
+(1 row)
+
+SELECT to_regtypemod(NULL);                -- returns null on null input
+ to_regtypemod 
+---------------
+              
+(1 row)
+
+SELECT to_regtypemod('year(4)');           -- error trapped, returns null
+ to_regtypemod 
+---------------
+              
+(1 row)
+
+SELECT to_regtypemod('interval nonesuch'); -- grammar error raised, not trapped
+ERROR:  syntax error at or near "nonesuch"
+LINE 1: SELECT to_regtypemod('interval nonesuch');
+                 ^
+CONTEXT:  invalid type name "interval nonesuch"
diff --git a/src/test/regress/sql/regproc.sql b/src/test/regress/sql/regproc.sql
index de2aa881a8..c1717e1f8f 100644
--- a/src/test/regress/sql/regproc.sql
+++ b/src/test/regress/sql/regproc.sql
@@ -145,3 +145,17 @@ SELECT * FROM pg_input_error_info('incorrect type name syntax', 'regtype');
 SELECT * FROM pg_input_error_info('numeric(1,2,3)', 'regtype');  -- bogus typmod
 SELECT * FROM pg_input_error_info('way.too.many.names', 'regtype');
 SELECT * FROM pg_input_error_info('no_such_catalog.schema.name', 'regtype');
+
+-- Test to_regtypemod
+SELECT to_regtypemod('text');
+SELECT to_regtypemod('timestamp(4)');
+SELECT to_regtypemod('interval(0)');
+SELECT to_regtypemod('interval second(0)');
+SELECT to_regtypemod('timestamptz');
+SELECT to_regtypemod('timestamptz(6)');
+SELECT to_regtypemod('varchar');
+SELECT to_regtypemod('varchar(128)');
+
+SELECT to_regtypemod(NULL);                -- returns null on null input
+SELECT to_regtypemod('year(4)');           -- error trapped, returns null
+SELECT to_regtypemod('interval nonesuch'); -- grammar error raised, not trapped
-- 
2.44.0

#47Erik Wienhold
ewie@ewie.name
In reply to: David E. Wheeler (#46)
Re: Patch: Add parse_type Function

On 2024-03-09 02:42 +0100, David E. Wheeler wrote:

On Mar 7, 2024, at 23:39, Erik Wienhold <ewie@ewie.name> wrote:

I think you need to swap the examples. The text mentions the error case
first and the NULL case second.

🤦🏻‍♂️

Thanks, fixed in the attached patch.

Thanks, LGTM.

--
Erik

#48Tom Lane
tgl@sss.pgh.pa.us
In reply to: David E. Wheeler (#46)
Re: Patch: Add parse_type Function

"David E. Wheeler" <david@justatheory.com> writes:

Thanks, fixed in the attached patch.

Pushed with some editorialization. Mostly, I whacked the
documentation around pretty heavily: we have a convention for what
examples in function descriptions should look like, and this wasn't
it. Not entirely your fault, since some nearby entries in that
table hadn't gotten the word either.

Also, I made a point of adding tests for a corner case that
I initially thought the patch would not get right:

SELECT format_type(to_regtype('bit'), to_regtypemod('bit'));
format_type
-------------
bit(1)
(1 row)

SELECT format_type(to_regtype('"bit"'), to_regtypemod('"bit"'));
format_type
-------------
"bit"
(1 row)

This comes from the comment in format_type():

/*
* bit with typmod -1 is not the same as BIT, which means
* BIT(1) per SQL spec. Report it as the quoted typename so
* that parser will not assign a bogus typmod.
*/

My initial fear was that we'd have to emit NULL for no typmod in order
to round-trip the second case, but it seems to work as-is, so that's good.
I left it emitting -1 for no typmod, and documented that explicitly.

regards, tom lane

#49David E. Wheeler
david@justatheory.com
In reply to: Tom Lane (#48)
Re: Patch: Add parse_type Function

On Mar 20, 2024, at 17:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Pushed with some editorialization. Mostly, I whacked the
documentation around pretty heavily: we have a convention for what
examples in function descriptions should look like, and this wasn't
it. Not entirely your fault, since some nearby entries in that
table hadn't gotten the word either.

Ah, great, and your wording on the parser error issue is much better, thank you!

Best,

David