pg_column_toast_chunk_id: a function to get a chunk ID of a TOASTed value
Hello,
Attached patch introduces a function pg_column_toast_chunk_id
that returns a chunk ID of a TOASTed value.
Recently, one of our clients needed a way to show which columns
are actually TOASTed because they would like to know how much
updates on the original table affects to its toast table
specifically with regard to auto VACUUM. We could not find a
function for this purpose in the current PostgreSQL, so I would
like propose pg_column_toast_chunk_id.
This function returns a chunk ID of a TOASTed value, or NULL
if the value is not TOASTed. Here is an example;
postgres=# \d val
Table "public.val"
Column | Type | Collation | Nullable | Default
--------+------+-----------+----------+---------
t | text | | |
postgres=# select length(t), pg_column_size(t), pg_column_compression(t), pg_column_toast_chunk_id(t), tableoid from val;
length | pg_column_size | pg_column_compression | pg_column_toast_chunk_id | tableoid
--------+----------------+-----------------------+--------------------------+----------
3 | 4 | | | 16388
3000 | 46 | pglz | | 16388
32000 | 413 | pglz | | 16388
305 | 309 | | | 16388
64000 | 64000 | | 16393 | 16388
(5 rows)
postgres=# select chunk_id, chunk_seq from pg_toast.pg_toast_16388;
chunk_id | chunk_seq
----------+-----------
16393 | 0
16393 | 1
16393 | 2
(snip)
16393 | 30
16393 | 31
16393 | 32
(33 rows)
This function is also useful to identify a problematic row when
an error like
"ERROR: unexpected chunk number ... (expected ...) for toast value"
occurs.
The patch is a just a concept patch and not including documentation
and tests.
What do you think about this feature?
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachments:
pg_column_toast_chunk_id.patchtext/x-diff; name=pg_column_toast_chunk_id.patchDownload
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 5778e3f0ef..4ddcf885cd 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -5070,6 +5070,47 @@ pg_column_compression(PG_FUNCTION_ARGS)
PG_RETURN_TEXT_P(cstring_to_text(result));
}
+/*
+ * Return the chunk id of the TOASTed value.
+ * Return NULL for unTOASTed value.
+ */
+Datum
+pg_column_toast_chunk_id(PG_FUNCTION_ARGS)
+{
+ int typlen;
+ struct varlena *attr;
+ struct varatt_external toast_pointer;
+
+ /* On first call, get the input type's typlen, and save at *fn_extra */
+ if (fcinfo->flinfo->fn_extra == NULL)
+ {
+ /* Lookup the datatype of the supplied argument */
+ Oid argtypeid = get_fn_expr_argtype(fcinfo->flinfo, 0);
+
+ typlen = get_typlen(argtypeid);
+ if (typlen == 0) /* should not happen */
+ elog(ERROR, "cache lookup failed for type %u", argtypeid);
+
+ fcinfo->flinfo->fn_extra = MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
+ sizeof(int));
+ *((int *) fcinfo->flinfo->fn_extra) = typlen;
+ }
+ else
+ typlen = *((int *) fcinfo->flinfo->fn_extra);
+
+ if (typlen != -1)
+ PG_RETURN_NULL();
+
+ attr = (struct varlena *) DatumGetPointer(PG_GETARG_DATUM(0));
+
+ if (!VARATT_IS_EXTERNAL_ONDISK(attr))
+ PG_RETURN_NULL();
+
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+
+ PG_RETURN_OID(toast_pointer.va_valueid);
+}
+
/*
* string_agg - Concatenates values and returns string.
*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 7c358cff16..4c214f4c63 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7414,6 +7414,9 @@
{ oid => '2121', descr => 'compression method for the compressed datum',
proname => 'pg_column_compression', provolatile => 's', prorettype => 'text',
proargtypes => 'any', prosrc => 'pg_column_compression' },
+{ oid => '8393', descr => 'chunk ID of TOASTed value',
+ proname => 'pg_column_toast_chunk_id', provolatile => 's', prorettype => 'oid',
+ proargtypes => 'any', prosrc => 'pg_column_toast_chunk_id' },
{ oid => '2322',
descr => 'total disk space usage for the specified tablespace',
proname => 'pg_tablespace_size', provolatile => 'v', prorettype => 'int8',
Hi!
I like the idea of having a standard function which shows a TOAST value ID
for a row. I've used my own to handle TOAST errors. Just, maybe, more
correct
name would be "...value_id", because you actually retrieve valueid field
from the TOAST pointer, and chunk ID consists of valueid + chunk_seq.
--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/
Hi Nikita,
On Wed, 5 Jul 2023 17:49:20 +0300
Nikita Malakhov <hukutoc@gmail.com> wrote:
Hi!
I like the idea of having a standard function which shows a TOAST value ID
for a row. I've used my own to handle TOAST errors. Just, maybe, more
correct
name would be "...value_id", because you actually retrieve valueid field
from the TOAST pointer, and chunk ID consists of valueid + chunk_seq.
Thank you for your review!
Although, the retrieved field is "va_valueid" and it is called "value ID" in the
code, I chose the name "..._chunk_id" because I found the description in the
documentation as followings:
-------------
Every TOAST table has the columns chunk_id (an OID identifying the particular TOASTed value), chunk_seq (a sequence number for the chunk within its value), and chunk_data (the actual data of the chunk). A unique index on chunk_id and chunk_seq provides fast retrieval of the values. A pointer datum representing an out-of-line on-disk TOASTed value therefore needs to store the OID of the TOAST table in which to look and the OID of the specific value (its chunk_id)
-------------
https://www.postgresql.org/docs/devel/storage-toast.html
Here, chunk_id defined separately from chunk_seq. Therefore, I wonder
pg_column_toast_chunk_id would be ok. However, I don't insist on this
and I would be happy to change it if the other name is more natural for users.
Regards,
Yugo Nagata
--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/
--
Yugo NAGATA <nagata@sraoss.co.jp>
On Fri, 7 Jul 2023 17:21:36 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:
Hi Nikita,
On Wed, 5 Jul 2023 17:49:20 +0300
Nikita Malakhov <hukutoc@gmail.com> wrote:Hi!
I like the idea of having a standard function which shows a TOAST value ID
for a row. I've used my own to handle TOAST errors. Just, maybe, more
correct
name would be "...value_id", because you actually retrieve valueid field
from the TOAST pointer, and chunk ID consists of valueid + chunk_seq.Thank you for your review!
Although, the retrieved field is "va_valueid" and it is called "value ID" in the
code, I chose the name "..._chunk_id" because I found the description in the
documentation as followings:-------------
Every TOAST table has the columns chunk_id (an OID identifying the particular TOASTed value), chunk_seq (a sequence number for the chunk within its value), and chunk_data (the actual data of the chunk). A unique index on chunk_id and chunk_seq provides fast retrieval of the values. A pointer datum representing an out-of-line on-disk TOASTed value therefore needs to store the OID of the TOAST table in which to look and the OID of the specific value (its chunk_id)
-------------
https://www.postgresql.org/docs/devel/storage-toast.htmlHere, chunk_id defined separately from chunk_seq. Therefore, I wonder
pg_column_toast_chunk_id would be ok. However, I don't insist on this
and I would be happy to change it if the other name is more natural for users.
I attached v2 patch that contains the documentation fix.
Regards,
Yugo Nagata
Regards,
Yugo Nagata--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/--
Yugo NAGATA <nagata@sraoss.co.jp>
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachments:
v2_pg_column_toast_chunk_id.patchtext/x-diff; name=v2_pg_column_toast_chunk_id.patchDownload
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5a47ce4343..c2c3156cd4 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -27608,6 +27608,20 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_column_toast_chunk_id</primary>
+ </indexterm>
+ <function>pg_column_toast_chunk_id</function> ( <type>"any"</type> )
+ <returnvalue>oid</returnvalue>
+ </para>
+ <para>
+ Returns the chunk id of the TOASTed value, or <literal>NULL</literal>
+ if the value is not TOASTed.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 884bfbc8ce..fe8788c1b1 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -5069,6 +5069,47 @@ pg_column_compression(PG_FUNCTION_ARGS)
PG_RETURN_TEXT_P(cstring_to_text(result));
}
+/*
+ * Return the chunk id of the TOASTed value.
+ * Return NULL for unTOASTed value.
+ */
+Datum
+pg_column_toast_chunk_id(PG_FUNCTION_ARGS)
+{
+ int typlen;
+ struct varlena *attr;
+ struct varatt_external toast_pointer;
+
+ /* On first call, get the input type's typlen, and save at *fn_extra */
+ if (fcinfo->flinfo->fn_extra == NULL)
+ {
+ /* Lookup the datatype of the supplied argument */
+ Oid argtypeid = get_fn_expr_argtype(fcinfo->flinfo, 0);
+
+ typlen = get_typlen(argtypeid);
+ if (typlen == 0) /* should not happen */
+ elog(ERROR, "cache lookup failed for type %u", argtypeid);
+
+ fcinfo->flinfo->fn_extra = MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
+ sizeof(int));
+ *((int *) fcinfo->flinfo->fn_extra) = typlen;
+ }
+ else
+ typlen = *((int *) fcinfo->flinfo->fn_extra);
+
+ if (typlen != -1)
+ PG_RETURN_NULL();
+
+ attr = (struct varlena *) DatumGetPointer(PG_GETARG_DATUM(0));
+
+ if (!VARATT_IS_EXTERNAL_ONDISK(attr))
+ PG_RETURN_NULL();
+
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+
+ PG_RETURN_OID(toast_pointer.va_valueid);
+}
+
/*
* string_agg - Concatenates values and returns string.
*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 6996073989..0cacd0391d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7403,6 +7403,9 @@
{ oid => '2121', descr => 'compression method for the compressed datum',
proname => 'pg_column_compression', provolatile => 's', prorettype => 'text',
proargtypes => 'any', prosrc => 'pg_column_compression' },
+{ oid => '8393', descr => 'chunk ID of TOASTed value',
+ proname => 'pg_column_toast_chunk_id', provolatile => 's', prorettype => 'oid',
+ proargtypes => 'any', prosrc => 'pg_column_toast_chunk_id' },
{ oid => '2322',
descr => 'total disk space usage for the specified tablespace',
proname => 'pg_tablespace_size', provolatile => 'v', prorettype => 'int8',
Hello
My +1 to have such a function in core or in some contrib at least (pg_surgery? amcheck?).
In the past, more than once I needed to find a damaged tuple knowing only chunk id and toastrelid. This feature would help a lot.
regards, Sergei
minor doc issues.
Returns the chunk id of the TOASTed value, or NULL if the value is not TOASTed.
Should it be "chunk_id"?
you may place it after pg_create_logical_replication_slot entry to
make it look like alphabetical order.
There is no test. maybe we can add following to src/test/regress/sql/misc.sql
create table val(t text);
INSERT into val(t) SELECT string_agg(
chr((ascii('B') + round(random() * 25)) :: integer),'')
FROM generate_series(1,2500);
select pg_column_toast_chunk_id(t) is not null from val;
drop table val;
On Mon, Nov 6, 2023 at 8:00 AM jian he <jian.universality@gmail.com> wrote:
minor doc issues.
Returns the chunk id of the TOASTed value, or NULL if the value is not TOASTed.
Should it be "chunk_id"?you may place it after pg_create_logical_replication_slot entry to
make it look like alphabetical order.There is no test. maybe we can add following to src/test/regress/sql/misc.sql
create table val(t text);
INSERT into val(t) SELECT string_agg(
chr((ascii('B') + round(random() * 25)) :: integer),'')
FROM generate_series(1,2500);
select pg_column_toast_chunk_id(t) is not null from val;
drop table val;
Hi
the main C function (pg_column_toast_chunk_id) I didn't change.
I added tests as mentioned above.
tests put it on src/test/regress/sql/misc.sql, i hope that's fine.
I placed pg_column_toast_chunk_id in "Table 9.99. Database Object
Location Functions" (below Table 9.98. Database Object Size
Functions).
Attachments:
v2_pg_column_toast_chunk_id (1).patchtext/x-patch; charset=US-ASCII; name="v2_pg_column_toast_chunk_id (1).patch"Download
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5a47ce4343..c2c3156cd4 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -27608,6 +27608,20 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_column_toast_chunk_id</primary>
+ </indexterm>
+ <function>pg_column_toast_chunk_id</function> ( <type>"any"</type> )
+ <returnvalue>oid</returnvalue>
+ </para>
+ <para>
+ Returns the chunk id of the TOASTed value, or <literal>NULL</literal>
+ if the value is not TOASTed.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 884bfbc8ce..fe8788c1b1 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -5069,6 +5069,47 @@ pg_column_compression(PG_FUNCTION_ARGS)
PG_RETURN_TEXT_P(cstring_to_text(result));
}
+/*
+ * Return the chunk id of the TOASTed value.
+ * Return NULL for unTOASTed value.
+ */
+Datum
+pg_column_toast_chunk_id(PG_FUNCTION_ARGS)
+{
+ int typlen;
+ struct varlena *attr;
+ struct varatt_external toast_pointer;
+
+ /* On first call, get the input type's typlen, and save at *fn_extra */
+ if (fcinfo->flinfo->fn_extra == NULL)
+ {
+ /* Lookup the datatype of the supplied argument */
+ Oid argtypeid = get_fn_expr_argtype(fcinfo->flinfo, 0);
+
+ typlen = get_typlen(argtypeid);
+ if (typlen == 0) /* should not happen */
+ elog(ERROR, "cache lookup failed for type %u", argtypeid);
+
+ fcinfo->flinfo->fn_extra = MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
+ sizeof(int));
+ *((int *) fcinfo->flinfo->fn_extra) = typlen;
+ }
+ else
+ typlen = *((int *) fcinfo->flinfo->fn_extra);
+
+ if (typlen != -1)
+ PG_RETURN_NULL();
+
+ attr = (struct varlena *) DatumGetPointer(PG_GETARG_DATUM(0));
+
+ if (!VARATT_IS_EXTERNAL_ONDISK(attr))
+ PG_RETURN_NULL();
+
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+
+ PG_RETURN_OID(toast_pointer.va_valueid);
+}
+
/*
* string_agg - Concatenates values and returns string.
*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 6996073989..0cacd0391d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7403,6 +7403,9 @@
{ oid => '2121', descr => 'compression method for the compressed datum',
proname => 'pg_column_compression', provolatile => 's', prorettype => 'text',
proargtypes => 'any', prosrc => 'pg_column_compression' },
+{ oid => '8393', descr => 'chunk ID of TOASTed value',
+ proname => 'pg_column_toast_chunk_id', provolatile => 's', prorettype => 'oid',
+ proargtypes => 'any', prosrc => 'pg_column_toast_chunk_id' },
{ oid => '2322',
descr => 'total disk space usage for the specified tablespace',
proname => 'pg_tablespace_size', provolatile => 'v', prorettype => 'int8',
On Tue, 2 Jan 2024 08:00:00 +0800
jian he <jian.universality@gmail.com> wrote:
On Mon, Nov 6, 2023 at 8:00 AM jian he <jian.universality@gmail.com> wrote:
minor doc issues.
Returns the chunk id of the TOASTed value, or NULL if the value is not TOASTed.
Should it be "chunk_id"?
Thank you for your suggestion. As you pointed out, it is called "chunk_id"
in the documentation, so I rewrote it and also added a link to the section
where the TOAST table structure is explained.
you may place it after pg_create_logical_replication_slot entry to
make it look like alphabetical order.
I've been thinking about where we should place the function in the doc,
and I decided place it in the table of Database Object Size Functions
because I think pg_column_toast_chunk_id also would assist understanding
the result of size functions as similar to pg_column_compression; that is,
those function can explain why a large value in size could be stored in
a column.
There is no test. maybe we can add following to src/test/regress/sql/misc.sql
create table val(t text);
INSERT into val(t) SELECT string_agg(
chr((ascii('B') + round(random() * 25)) :: integer),'')
FROM generate_series(1,2500);
select pg_column_toast_chunk_id(t) is not null from val;
drop table val;
Thank you for the test proposal. However, if we add a test, I want
to check that the chunk_id returned by the function exists in the
TOAST table, and that it returns NULL if the values is not TOASTed.
For the purpose, I wrote a test using a dynamic SQL since the table
name of the TOAST table have to be generated from the main table's OID.
Hi
the main C function (pg_column_toast_chunk_id) I didn't change.
I added tests as mentioned above.
tests put it on src/test/regress/sql/misc.sql, i hope that's fine.
I placed pg_column_toast_chunk_id in "Table 9.99. Database Object
Location Functions" (below Table 9.98. Database Object Size
Functions).
I could not find any change in your patch from my previous patch.
Maybe, you attached wrong file. I attached a patch updated based
on your review, including the documentation fixes and a test.
What do you think about this it?
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachments:
v3-0001-Add-pg_column_toast_chunk_id-function.patchtext/x-diff; name=v3-0001-Add-pg_column_toast_chunk_id-function.patchDownload
From 97af1b2300ecf07a34923da87a9d84e6aa963956 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Wed, 29 Mar 2023 09:59:25 +0900
Subject: [PATCH v3] Add pg_column_toast_chunk_id function
This function returns the chunk_id of an on-disk TOASTed value, or
NULL if the value is un-TOASTed or not on disk. This enables users to
know which columns are actually TOASTed. This function is also useful
to identify a problematic row when an error like
"ERROR: unexpected chunk number ... (expected ...) for toast value"
occurs.
---
doc/src/sgml/func.sgml | 17 ++++++++++++++
src/backend/utils/adt/varlena.c | 41 +++++++++++++++++++++++++++++++++
src/include/catalog/pg_proc.dat | 3 +++
3 files changed, 61 insertions(+)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 210c7c0b02..2d82331323 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28078,6 +28078,23 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_column_toast_chunk_id</primary>
+ </indexterm>
+ <function>pg_column_toast_chunk_id</function> ( <type>"any"</type> )
+ <returnvalue>oid</returnvalue>
+ </para>
+ <para>
+ Shows the <structfield>chunk_id</structfield> of an on-disk
+ <acronym>TOAST</acronym>ed value. Returns <literal>NULL</literal>
+ if the value is un-<acronym>TOAST</acronym>ed or not on-disk.
+ See <xref linkend="storage-toast-ondisk"/> for details about
+ <acronym>TOAST</acronym>.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 543afb66e5..84d36781a4 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -5105,6 +5105,47 @@ pg_column_compression(PG_FUNCTION_ARGS)
PG_RETURN_TEXT_P(cstring_to_text(result));
}
+/*
+ * Return the chunk_id of the on-disk TOASTed value.
+ * Return NULL if the value is unTOASTed or not on disk.
+ */
+Datum
+pg_column_toast_chunk_id(PG_FUNCTION_ARGS)
+{
+ int typlen;
+ struct varlena *attr;
+ struct varatt_external toast_pointer;
+
+ /* On first call, get the input type's typlen, and save at *fn_extra */
+ if (fcinfo->flinfo->fn_extra == NULL)
+ {
+ /* Lookup the datatype of the supplied argument */
+ Oid argtypeid = get_fn_expr_argtype(fcinfo->flinfo, 0);
+
+ typlen = get_typlen(argtypeid);
+ if (typlen == 0) /* should not happen */
+ elog(ERROR, "cache lookup failed for type %u", argtypeid);
+
+ fcinfo->flinfo->fn_extra = MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
+ sizeof(int));
+ *((int *) fcinfo->flinfo->fn_extra) = typlen;
+ }
+ else
+ typlen = *((int *) fcinfo->flinfo->fn_extra);
+
+ if (typlen != -1)
+ PG_RETURN_NULL();
+
+ attr = (struct varlena *) DatumGetPointer(PG_GETARG_DATUM(0));
+
+ if (!VARATT_IS_EXTERNAL_ONDISK(attr))
+ PG_RETURN_NULL();
+
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+
+ PG_RETURN_OID(toast_pointer.va_valueid);
+}
+
/*
* string_agg - Concatenates values and returns string.
*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 58811a6530..1d4521ac1f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7454,6 +7454,9 @@
{ oid => '2121', descr => 'compression method for the compressed datum',
proname => 'pg_column_compression', provolatile => 's', prorettype => 'text',
proargtypes => 'any', prosrc => 'pg_column_compression' },
+{ oid => '8393', descr => 'chunk ID of on-disk TOASTed value',
+ proname => 'pg_column_toast_chunk_id', provolatile => 's', prorettype => 'oid',
+ proargtypes => 'any', prosrc => 'pg_column_toast_chunk_id' },
{ oid => '2322',
descr => 'total disk space usage for the specified tablespace',
proname => 'pg_tablespace_size', provolatile => 'v', prorettype => 'int8',
--
2.25.1
On Fri, Jan 26, 2024 at 8:42 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
On Tue, 2 Jan 2024 08:00:00 +0800
jian he <jian.universality@gmail.com> wrote:On Mon, Nov 6, 2023 at 8:00 AM jian he <jian.universality@gmail.com> wrote:
minor doc issues.
Returns the chunk id of the TOASTed value, or NULL if the value is not TOASTed.
Should it be "chunk_id"?Thank you for your suggestion. As you pointed out, it is called "chunk_id"
in the documentation, so I rewrote it and also added a link to the section
where the TOAST table structure is explained.you may place it after pg_create_logical_replication_slot entry to
make it look like alphabetical order.I've been thinking about where we should place the function in the doc,
and I decided place it in the table of Database Object Size Functions
because I think pg_column_toast_chunk_id also would assist understanding
the result of size functions as similar to pg_column_compression; that is,
those function can explain why a large value in size could be stored in
a column.
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 210c7c0b02..2d82331323 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28078,6 +28078,23 @@ postgres=# SELECT '0/0'::pg_lsn +
pd.segment_number * ps.setting::int + :offset
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_column_toast_chunk_id</primary>
+ </indexterm>
+ <function>pg_column_toast_chunk_id</function> ( <type>"any"</type> )
+ <returnvalue>oid</returnvalue>
+ </para>
+ <para>
+ Shows the <structfield>chunk_id</structfield> of an on-disk
+ <acronym>TOAST</acronym>ed value. Returns <literal>NULL</literal>
+ if the value is un-<acronym>TOAST</acronym>ed or not on-disk.
+ See <xref linkend="storage-toast-ondisk"/> for details about
+ <acronym>TOAST</acronym>.
+ </para></entry>
+ </row>
v3 patch will place it on `Table 9.97. Replication Management Functions`
I agree with you. it should be placed after pg_column_compression. but
apply your patch, it will be at
There is no test. maybe we can add following to src/test/regress/sql/misc.sql
create table val(t text);
INSERT into val(t) SELECT string_agg(
chr((ascii('B') + round(random() * 25)) :: integer),'')
FROM generate_series(1,2500);
select pg_column_toast_chunk_id(t) is not null from val;
drop table val;Thank you for the test proposal. However, if we add a test, I want
to check that the chunk_id returned by the function exists in the
TOAST table, and that it returns NULL if the values is not TOASTed.
For the purpose, I wrote a test using a dynamic SQL since the table
name of the TOAST table have to be generated from the main table's OID.Hi
the main C function (pg_column_toast_chunk_id) I didn't change.
I added tests as mentioned above.
tests put it on src/test/regress/sql/misc.sql, i hope that's fine.
I placed pg_column_toast_chunk_id in "Table 9.99. Database Object
Location Functions" (below Table 9.98. Database Object Size
Functions).I could not find any change in your patch from my previous patch.
Maybe, you attached wrong file. I attached a patch updated based
on your review, including the documentation fixes and a test.
What do you think about this it?
sorry, I had attached the wrong file.
but your v3 also has no tests, documentation didn't fix.
maybe you also attached the wrong file too?
On Tue, 30 Jan 2024 12:12:31 +0800
jian he <jian.universality@gmail.com> wrote:
On Fri, Jan 26, 2024 at 8:42 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
On Tue, 2 Jan 2024 08:00:00 +0800
jian he <jian.universality@gmail.com> wrote:On Mon, Nov 6, 2023 at 8:00 AM jian he <jian.universality@gmail.com> wrote:
minor doc issues.
Returns the chunk id of the TOASTed value, or NULL if the value is not TOASTed.
Should it be "chunk_id"?Thank you for your suggestion. As you pointed out, it is called "chunk_id"
in the documentation, so I rewrote it and also added a link to the section
where the TOAST table structure is explained.you may place it after pg_create_logical_replication_slot entry to
make it look like alphabetical order.I've been thinking about where we should place the function in the doc,
and I decided place it in the table of Database Object Size Functions
because I think pg_column_toast_chunk_id also would assist understanding
the result of size functions as similar to pg_column_compression; that is,
those function can explain why a large value in size could be stored in
a column.diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 210c7c0b02..2d82331323 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -28078,6 +28078,23 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset </para></entry> </row>+ <row> + <entry role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>pg_column_toast_chunk_id</primary> + </indexterm> + <function>pg_column_toast_chunk_id</function> ( <type>"any"</type> ) + <returnvalue>oid</returnvalue> + </para> + <para> + Shows the <structfield>chunk_id</structfield> of an on-disk + <acronym>TOAST</acronym>ed value. Returns <literal>NULL</literal> + if the value is un-<acronym>TOAST</acronym>ed or not on-disk. + See <xref linkend="storage-toast-ondisk"/> for details about + <acronym>TOAST</acronym>. + </para></entry> + </row>v3 patch will place it on `Table 9.97. Replication Management Functions`
I agree with you. it should be placed after pg_column_compression. but
apply your patch, it will be atThere is no test. maybe we can add following to src/test/regress/sql/misc.sql
create table val(t text);
INSERT into val(t) SELECT string_agg(
chr((ascii('B') + round(random() * 25)) :: integer),'')
FROM generate_series(1,2500);
select pg_column_toast_chunk_id(t) is not null from val;
drop table val;Thank you for the test proposal. However, if we add a test, I want
to check that the chunk_id returned by the function exists in the
TOAST table, and that it returns NULL if the values is not TOASTed.
For the purpose, I wrote a test using a dynamic SQL since the table
name of the TOAST table have to be generated from the main table's OID.Hi
the main C function (pg_column_toast_chunk_id) I didn't change.
I added tests as mentioned above.
tests put it on src/test/regress/sql/misc.sql, i hope that's fine.
I placed pg_column_toast_chunk_id in "Table 9.99. Database Object
Location Functions" (below Table 9.98. Database Object Size
Functions).I could not find any change in your patch from my previous patch.
Maybe, you attached wrong file. I attached a patch updated based
on your review, including the documentation fixes and a test.
What do you think about this it?sorry, I had attached the wrong file.
but your v3 also has no tests, documentation didn't fix.
maybe you also attached the wrong file too?
Sorry, I also attached a wrong file.
Attached is the correct one.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachments:
v3-0001-Add-pg_column_toast_chunk_id-function.patchtext/x-diff; name=v3-0001-Add-pg_column_toast_chunk_id-function.patchDownload
From 97af1b2300ecf07a34923da87a9d84e6aa963956 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Wed, 29 Mar 2023 09:59:25 +0900
Subject: [PATCH v3] Add pg_column_toast_chunk_id function
This function returns the chunk_id of an on-disk TOASTed value, or
NULL if the value is un-TOASTed or not on disk. This enables users to
know which columns are actually TOASTed. This function is also useful
to identify a problematic row when an error like
"ERROR: unexpected chunk number ... (expected ...) for toast value"
occurs.
---
doc/src/sgml/func.sgml | 17 ++++++++++++++
src/backend/utils/adt/varlena.c | 41 +++++++++++++++++++++++++++++++++
src/include/catalog/pg_proc.dat | 3 +++
3 files changed, 61 insertions(+)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 210c7c0b02..2d82331323 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28078,6 +28078,23 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_column_toast_chunk_id</primary>
+ </indexterm>
+ <function>pg_column_toast_chunk_id</function> ( <type>"any"</type> )
+ <returnvalue>oid</returnvalue>
+ </para>
+ <para>
+ Shows the <structfield>chunk_id</structfield> of an on-disk
+ <acronym>TOAST</acronym>ed value. Returns <literal>NULL</literal>
+ if the value is un-<acronym>TOAST</acronym>ed or not on-disk.
+ See <xref linkend="storage-toast-ondisk"/> for details about
+ <acronym>TOAST</acronym>.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 543afb66e5..84d36781a4 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -5105,6 +5105,47 @@ pg_column_compression(PG_FUNCTION_ARGS)
PG_RETURN_TEXT_P(cstring_to_text(result));
}
+/*
+ * Return the chunk_id of the on-disk TOASTed value.
+ * Return NULL if the value is unTOASTed or not on disk.
+ */
+Datum
+pg_column_toast_chunk_id(PG_FUNCTION_ARGS)
+{
+ int typlen;
+ struct varlena *attr;
+ struct varatt_external toast_pointer;
+
+ /* On first call, get the input type's typlen, and save at *fn_extra */
+ if (fcinfo->flinfo->fn_extra == NULL)
+ {
+ /* Lookup the datatype of the supplied argument */
+ Oid argtypeid = get_fn_expr_argtype(fcinfo->flinfo, 0);
+
+ typlen = get_typlen(argtypeid);
+ if (typlen == 0) /* should not happen */
+ elog(ERROR, "cache lookup failed for type %u", argtypeid);
+
+ fcinfo->flinfo->fn_extra = MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
+ sizeof(int));
+ *((int *) fcinfo->flinfo->fn_extra) = typlen;
+ }
+ else
+ typlen = *((int *) fcinfo->flinfo->fn_extra);
+
+ if (typlen != -1)
+ PG_RETURN_NULL();
+
+ attr = (struct varlena *) DatumGetPointer(PG_GETARG_DATUM(0));
+
+ if (!VARATT_IS_EXTERNAL_ONDISK(attr))
+ PG_RETURN_NULL();
+
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+
+ PG_RETURN_OID(toast_pointer.va_valueid);
+}
+
/*
* string_agg - Concatenates values and returns string.
*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 58811a6530..1d4521ac1f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7454,6 +7454,9 @@
{ oid => '2121', descr => 'compression method for the compressed datum',
proname => 'pg_column_compression', provolatile => 's', prorettype => 'text',
proargtypes => 'any', prosrc => 'pg_column_compression' },
+{ oid => '8393', descr => 'chunk ID of on-disk TOASTed value',
+ proname => 'pg_column_toast_chunk_id', provolatile => 's', prorettype => 'oid',
+ proargtypes => 'any', prosrc => 'pg_column_toast_chunk_id' },
{ oid => '2322',
descr => 'total disk space usage for the specified tablespace',
proname => 'pg_tablespace_size', provolatile => 'v', prorettype => 'int8',
--
2.25.1
On Tue, Jan 30, 2024 at 12:35 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
Sorry, I also attached a wrong file.
Attached is the correct one.
I think you attached the wrong file again. also please name it as v4.
On Tue, 30 Jan 2024 13:47:45 +0800
jian he <jian.universality@gmail.com> wrote:
On Tue, Jan 30, 2024 at 12:35 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
Sorry, I also attached a wrong file.
Attached is the correct one.I think you attached the wrong file again. also please name it as v4.
Opps..sorry, again.
I attached the correct one, v4.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachments:
v4-0001-Add-pg_column_toast_chunk_id-function.patchtext/x-diff; name=v4-0001-Add-pg_column_toast_chunk_id-function.patchDownload
From 8c0f9993c49d1d4ed1bb3e10ba26652da98cd41e Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Wed, 29 Mar 2023 09:59:25 +0900
Subject: [PATCH v4] Add pg_column_toast_chunk_id function
This function returns the chunk_id of an on-disk TOASTed value, or
NULL if the value is un-TOASTed or not on disk. This enables users to
know which columns are actually TOASTed. This function is also useful
to identify a problematic row when an error like
"ERROR: unexpected chunk number ... (expected ...) for toast value"
occurs.
---
doc/src/sgml/func.sgml | 17 ++++++++
src/backend/utils/adt/varlena.c | 41 ++++++++++++++++++++
src/include/catalog/pg_proc.dat | 3 ++
src/test/regress/expected/misc_functions.out | 18 +++++++++
src/test/regress/sql/misc_functions.sql | 18 +++++++++
5 files changed, 97 insertions(+)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 210c7c0b02..2d82331323 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28078,6 +28078,23 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_column_toast_chunk_id</primary>
+ </indexterm>
+ <function>pg_column_toast_chunk_id</function> ( <type>"any"</type> )
+ <returnvalue>oid</returnvalue>
+ </para>
+ <para>
+ Shows the <structfield>chunk_id</structfield> of an on-disk
+ <acronym>TOAST</acronym>ed value. Returns <literal>NULL</literal>
+ if the value is un-<acronym>TOAST</acronym>ed or not on-disk.
+ See <xref linkend="storage-toast-ondisk"/> for details about
+ <acronym>TOAST</acronym>.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 543afb66e5..84d36781a4 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -5105,6 +5105,47 @@ pg_column_compression(PG_FUNCTION_ARGS)
PG_RETURN_TEXT_P(cstring_to_text(result));
}
+/*
+ * Return the chunk_id of the on-disk TOASTed value.
+ * Return NULL if the value is unTOASTed or not on disk.
+ */
+Datum
+pg_column_toast_chunk_id(PG_FUNCTION_ARGS)
+{
+ int typlen;
+ struct varlena *attr;
+ struct varatt_external toast_pointer;
+
+ /* On first call, get the input type's typlen, and save at *fn_extra */
+ if (fcinfo->flinfo->fn_extra == NULL)
+ {
+ /* Lookup the datatype of the supplied argument */
+ Oid argtypeid = get_fn_expr_argtype(fcinfo->flinfo, 0);
+
+ typlen = get_typlen(argtypeid);
+ if (typlen == 0) /* should not happen */
+ elog(ERROR, "cache lookup failed for type %u", argtypeid);
+
+ fcinfo->flinfo->fn_extra = MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
+ sizeof(int));
+ *((int *) fcinfo->flinfo->fn_extra) = typlen;
+ }
+ else
+ typlen = *((int *) fcinfo->flinfo->fn_extra);
+
+ if (typlen != -1)
+ PG_RETURN_NULL();
+
+ attr = (struct varlena *) DatumGetPointer(PG_GETARG_DATUM(0));
+
+ if (!VARATT_IS_EXTERNAL_ONDISK(attr))
+ PG_RETURN_NULL();
+
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+
+ PG_RETURN_OID(toast_pointer.va_valueid);
+}
+
/*
* string_agg - Concatenates values and returns string.
*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 58811a6530..1d4521ac1f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7454,6 +7454,9 @@
{ oid => '2121', descr => 'compression method for the compressed datum',
proname => 'pg_column_compression', provolatile => 's', prorettype => 'text',
proargtypes => 'any', prosrc => 'pg_column_compression' },
+{ oid => '8393', descr => 'chunk ID of on-disk TOASTed value',
+ proname => 'pg_column_toast_chunk_id', provolatile => 's', prorettype => 'oid',
+ proargtypes => 'any', prosrc => 'pg_column_toast_chunk_id' },
{ oid => '2322',
descr => 'total disk space usage for the specified tablespace',
proname => 'pg_tablespace_size', provolatile => 'v', prorettype => 'int8',
diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index 9302134077..6a1fcc22f5 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -670,3 +670,21 @@ FROM pg_walfile_name_offset('0/0'::pg_lsn + :segment_size - 1),
0 | t
(1 row)
+-- Test pg_column_toast_chunk_id:
+-- Check if the returned chunk_id exists in the TOAST table
+CREATE TABLE test_chunk_id (v1 text, v2 text);
+INSERT INTO test_chunk_id VALUES (
+ repeat('0123456789', 10), -- v1: small enough not to be TOASTed
+ repeat('0123456789', 100000)); -- v2: large enough to be TOASTed
+DO $$
+ DECLARE result text;
+ BEGIN
+ EXECUTE format(
+ 'SELECT ''ok'' FROM test_chunk_id
+ WHERE pg_column_toast_chunk_id(v1) IS NULL AND
+ pg_column_toast_chunk_id(v2) IN (SELECT chunk_id FROM pg_toast.pg_toast_%s)',
+ regclass('test_chunk_id')::int::text) INTO result;
+ RAISE INFO '%', result;
+END; $$;
+INFO: ok
+DROP TABLE test_chunk_id;
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index d3dc591173..fe3b6c26c6 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -250,3 +250,21 @@ FROM pg_walfile_name_offset('0/0'::pg_lsn + :segment_size + 1),
SELECT segment_number, file_offset = :segment_size - 1
FROM pg_walfile_name_offset('0/0'::pg_lsn + :segment_size - 1),
pg_split_walfile_name(file_name);
+
+-- Test pg_column_toast_chunk_id:
+-- Check if the returned chunk_id exists in the TOAST table
+CREATE TABLE test_chunk_id (v1 text, v2 text);
+INSERT INTO test_chunk_id VALUES (
+ repeat('0123456789', 10), -- v1: small enough not to be TOASTed
+ repeat('0123456789', 100000)); -- v2: large enough to be TOASTed
+DO $$
+ DECLARE result text;
+ BEGIN
+ EXECUTE format(
+ 'SELECT ''ok'' FROM test_chunk_id
+ WHERE pg_column_toast_chunk_id(v1) IS NULL AND
+ pg_column_toast_chunk_id(v2) IN (SELECT chunk_id FROM pg_toast.pg_toast_%s)',
+ regclass('test_chunk_id')::int::text) INTO result;
+ RAISE INFO '%', result;
+END; $$;
+DROP TABLE test_chunk_id;
--
2.25.1
On Tue, Jan 30, 2024 at 1:56 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
I attached the correct one, v4.
+-- Test pg_column_toast_chunk_id:
+-- Check if the returned chunk_id exists in the TOAST table
+CREATE TABLE test_chunk_id (v1 text, v2 text);
+INSERT INTO test_chunk_id VALUES (
+ repeat('0123456789', 10), -- v1: small enough not to be TOASTed
+ repeat('0123456789', 100000)); -- v2: large enough to be TOASTed
select pg_size_pretty(100000::bigint);
return 98kb.
I think this is just too much, maybe I didn't consider the
implications of compression.
Anyway, I refactored the tests, making the toast value size be small.
I aslo refactor the doc.
pg_column_toast_chunk_id entry will be right after pg_column_compression entry.
You can check the screenshot.
Attachments:
Screenshot from 2024-01-30 14-37-42.pngimage/png; name="Screenshot from 2024-01-30 14-37-42.png"Download
�PNG
IHDR � ���� sBIT|d� tEXtSoftware gnome-screenshot��>