[PATCH] Infinite loop while acquiring new TOAST Oid

Started by Nikita Malakhovabout 3 years ago26 messages
#1Nikita Malakhov
hukutoc@gmail.com
1 attachment(s)

Hi hackers!

While working on Pluggable TOAST we've detected a defective behavior
on tables with large amounts of TOASTed data - queries freeze and DB stalls.
Further investigation led us to the loop with GetNewOidWithIndex function
call - when all available Oid already exist in the related TOAST table this
loop continues infinitely. Data type used for value ID is the UINT32, which
is
unsigned int and has a maximum value of *4294967295* which allows
maximum 4294967295 records in the TOAST table. It is not a very big amount
for modern databases and is the major problem for productive systems.

Quick fix for this problem is limiting GetNewOidWithIndex loops to some
reasonable amount defined by related macro and returning error if there is
still no available Oid. Please check attached patch, any feedback is
appreciated.

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

Attachments:

0001_infinite_new_toast_oid_v1.patchapplication/octet-stream; name=0001_infinite_new_toast_oid_v1.patchDownload
From 73bbacd36de89a7d79b87270ddd89bf74fe2e1aa Mon Sep 17 00:00:00 2001
From: Nikita Malakhov <n.malakhov@postgrespro.ru>
Date: Mon, 28 Nov 2022 18:05:53 +0300
Subject: [PATCH] Infinite loop while acquiring new TOAST value Oid

When TOASTed value is inserted it acquires new Oid which is stored
in the TOAST Pointer. Acquired Oid is checked if it already exists
in TOAST table the value is inserted into. If such value already
exists Oid acquiring is repeated, and when all values for the type
of value id - uint32 - are used - the cycle is repeated infinitely.
This results in DB stalling and SQL queries freezing.
The problem is actual for production databases with large amounts
of data.
Fix proposes limiting repeatable calls of GetNewOidWithIndex function
that acquires new Oid to the value defined by GETNEWOID_MAX_THRESHOLD
macro (3 in this patch) and return error if acquired Oid still .

Author: Nikita Malakhov <n.malakhov@postgrespro.ru>
---
 src/backend/access/common/toast_internals.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index 576e585a89..e32796c75f 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -27,6 +27,8 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
+#define GETNEWOID_MAX_THRESHOLD 3
+
 static bool toastrel_valueid_exists(Relation toastrel, Oid valueid);
 static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
 
@@ -144,6 +146,7 @@ toast_save_datum(Relation rel, Datum value,
 	Pointer		dval = DatumGetPointer(value);
 	int			num_indexes;
 	int			validIndex;
+	uint32		getnewoid_counter = 0;
 
 	Assert(!VARATT_IS_EXTERNAL(value));
 
@@ -281,10 +284,20 @@ toast_save_datum(Relation rel, Datum value,
 			 */
 			do
 			{
+				if(getnewoid_counter > GETNEWOID_MAX_THRESHOLD)
+				{
+					toast_close_indexes(toastidxs, num_indexes, NoLock);
+					table_close(toastrel, NoLock);
+
+					elog(ERROR, "Cannot get new TOAST value Oid for toast relation with Oid %u",
+			 			RelationGetRelid(toastrel));
+				}
+
 				toast_pointer.va_valueid =
 					GetNewOidWithIndex(toastrel,
 									   RelationGetRelid(toastidxs[validIndex]),
 									   (AttrNumber) 1);
+				getnewoid_counter++;
 			} while (toastid_valueid_exists(rel->rd_toastoid,
 											toast_pointer.va_valueid));
 		}
-- 
2.25.1

#2Andres Freund
andres@anarazel.de
In reply to: Nikita Malakhov (#1)
Re: [PATCH] Infinite loop while acquiring new TOAST Oid

Hi,

On 2022-11-28 18:34:20 +0300, Nikita Malakhov wrote:

While working on Pluggable TOAST we've detected a defective behavior
on tables with large amounts of TOASTed data - queries freeze and DB stalls.
Further investigation led us to the loop with GetNewOidWithIndex function
call - when all available Oid already exist in the related TOAST table this
loop continues infinitely. Data type used for value ID is the UINT32, which
is
unsigned int and has a maximum value of *4294967295* which allows
maximum 4294967295 records in the TOAST table. It is not a very big amount
for modern databases and is the major problem for productive systems.

I don't think the absolute number is the main issue - by default external
toasting will happen only for bigger datums. 4 billion external datums
typically use a lot of space.

If you hit this easily with your patch, then you likely broke the conditions
under which external toasting happens.

IMO the big issue is the global oid counter making it much easier to hit oid
wraparound. Due to that we end up assigning oids that conflict with existing
toast oids much sooner than 4 billion toasted datums.

I think the first step to improve the situation is to not use a global oid
counter for toasted values. One way to do that would be to use the sequence
code to do oid assignment, but we likely can find a more efficient
representation.

Eventually we should do the obvious thing and make toast ids 64bit wide - to
combat the space usage we likely should switch to representing the ids as
variable width integers or such, otherwise the space increase would likely be
prohibitive.

Quick fix for this problem is limiting GetNewOidWithIndex loops to some
reasonable amount defined by related macro and returning error if there is
still no available Oid. Please check attached patch, any feedback is
appreciated.

This feels like the wrong spot to tackle the issue. For one, most of the
looping will be in GetNewOidWithIndex(), so limiting looping in
toast_save_datum() won't help much. For another, if the limiting were in the
right place, it'd break currently working cases. Due to oid wraparound it's
pretty easy to hit "ranges" of allocated oids, without even getting close to
2^32 toasted datums.

Greetings,

Andres Freund

#3Nikita Malakhov
hukutoc@gmail.com
In reply to: Andres Freund (#2)
Re: [PATCH] Infinite loop while acquiring new TOAST Oid

Hi!

We've already encountered this issue on large production databases, and
4 billion rows is not so much for modern bases, so this issue already arises
from time to time and would arise more and more often. I agree that global
oid counter is the main issue, and better solution would be local counters
with larger datatype for value id. This is the right way to solve this
issue,
although it would take some time. As I understand, global counter was taken
because it looked the fastest way of getting unique ID.
Ok, I'll prepare a patch with it.

Due to that we end up assigning oids that conflict with existing
toast oids much sooner than 4 billion toasted datums.

Just a note: global oid is checked for related TOAST table only, so equal
oids
in different TOAST tables would not collide.

Eventually we should do the obvious thing and make toast ids 64bit wide -

to

combat the space usage we likely should switch to representing the ids as
variable width integers or such, otherwise the space increase would likely

be

prohibitive.

I'm already working on it, but I thought that 64-bit value ID won't be
easily
accepted by community. I'd be very thankful for any advice on this.

On Mon, Nov 28, 2022 at 11:36 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2022-11-28 18:34:20 +0300, Nikita Malakhov wrote:

While working on Pluggable TOAST we've detected a defective behavior
on tables with large amounts of TOASTed data - queries freeze and DB

stalls.

Further investigation led us to the loop with GetNewOidWithIndex function
call - when all available Oid already exist in the related TOAST table

this

loop continues infinitely. Data type used for value ID is the UINT32,

which

is
unsigned int and has a maximum value of *4294967295* which allows
maximum 4294967295 records in the TOAST table. It is not a very big

amount

for modern databases and is the major problem for productive systems.

I don't think the absolute number is the main issue - by default external
toasting will happen only for bigger datums. 4 billion external datums
typically use a lot of space.

If you hit this easily with your patch, then you likely broke the
conditions
under which external toasting happens.

IMO the big issue is the global oid counter making it much easier to hit
oid
wraparound. Due to that we end up assigning oids that conflict with
existing
toast oids much sooner than 4 billion toasted datums.

I think the first step to improve the situation is to not use a global oid
counter for toasted values. One way to do that would be to use the sequence
code to do oid assignment, but we likely can find a more efficient
representation.

Eventually we should do the obvious thing and make toast ids 64bit wide -
to
combat the space usage we likely should switch to representing the ids as
variable width integers or such, otherwise the space increase would likely
be
prohibitive.

Quick fix for this problem is limiting GetNewOidWithIndex loops to some
reasonable amount defined by related macro and returning error if there

is

still no available Oid. Please check attached patch, any feedback is
appreciated.

This feels like the wrong spot to tackle the issue. For one, most of the
looping will be in GetNewOidWithIndex(), so limiting looping in
toast_save_datum() won't help much. For another, if the limiting were in
the
right place, it'd break currently working cases. Due to oid wraparound it's
pretty easy to hit "ranges" of allocated oids, without even getting close
to
2^32 toasted datums.

Greetings,

Andres Freund

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

#4Andres Freund
andres@anarazel.de
In reply to: Nikita Malakhov (#3)
Re: [PATCH] Infinite loop while acquiring new TOAST Oid

Hi,

On 2022-11-28 23:54:53 +0300, Nikita Malakhov wrote:

We've already encountered this issue on large production databases, and
4 billion rows is not so much for modern bases, so this issue already arises
from time to time and would arise more and more often.

Was the issue that you exceeded 4 billion toasted datums, or that assignment
took a long time? How many toast datums did you actually have? Was this due to
very wide rows leading to even small datums getting toasted?

Greetings,

Andres Freund

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#2)
Re: [PATCH] Infinite loop while acquiring new TOAST Oid

Andres Freund <andres@anarazel.de> writes:

I think the first step to improve the situation is to not use a global oid
counter for toasted values. One way to do that would be to use the sequence
code to do oid assignment, but we likely can find a more efficient
representation.

I don't particularly buy that, because the only real fix is this:

Eventually we should do the obvious thing and make toast ids 64bit wide

and if we do that there'll be no need to worry about multiple counters.

- to
combat the space usage we likely should switch to representing the ids as
variable width integers or such, otherwise the space increase would likely be
prohibitive.

And I don't buy that either. An extra 4 bytes with a 2K payload is not
"prohibitive", it's more like "down in the noise".

I think if we switch to int8 keys and widen the global OID counter to 8
bytes (using just the low 4 bytes for other purposes), we'll have a
perfectly fine solution. There is still plenty of work to be done under
that plan, because of the need to maintain backward compatibility for
existing TOAST tables --- and maybe people would want an option to keep on
using them, for non-enormous tables? If we add additional work on top of
that, it'll just mean that it will take longer to have any solution.

Quick fix for this problem is limiting GetNewOidWithIndex loops to some
reasonable amount defined by related macro and returning error if there is
still no available Oid. Please check attached patch, any feedback is
appreciated.

This feels like the wrong spot to tackle the issue.

Yeah, that is completely horrid. It does not remove the existing failure
mode, just changes it to have worse consequences.

regards, tom lane

#6Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#5)
Re: [PATCH] Infinite loop while acquiring new TOAST Oid

Hi,

On 2022-11-28 16:04:12 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

- to
combat the space usage we likely should switch to representing the ids as
variable width integers or such, otherwise the space increase would likely be
prohibitive.

And I don't buy that either. An extra 4 bytes with a 2K payload is not
"prohibitive", it's more like "down in the noise".

The space usage for the the the toast relation + index itself is indeed
irrelevant. Where it's not "down in the noise" is in struct varatt_external,
i.e. references to external toast datums. The size of that already is an
issue.

Greetings,

Andres Freund

#7Nikita Malakhov
hukutoc@gmail.com
In reply to: Tom Lane (#5)
Re: [PATCH] Infinite loop while acquiring new TOAST Oid

Hi,

Andres Freund <andres@anarazel.de> writes:

Was the issue that you exceeded 4 billion toasted datums, or that

assignment

took a long time? How many toast datums did you actually have? Was this

due to

very wide rows leading to even small datums getting toasted?

Yep, we had 4 billion toasted datums. I remind that currently relation has
single
TOAST table for all toastable attributes, so it is not so difficult to get
to 4 billion
of toasted values.

I think if we switch to int8 keys and widen the global OID counter to 8
bytes (using just the low 4 bytes for other purposes), we'll have a
perfectly fine solution. There is still plenty of work to be done under
that plan, because of the need to maintain backward compatibility for
existing TOAST tables --- and maybe people would want an option to keep on
using them, for non-enormous tables? If we add additional work on top of
that, it'll just mean that it will take longer to have any solution.

I agree, but:
1) Global OID counter is used not only for TOAST, so there would be a lot of
places where the short counter (low part of 64 OID, if we go with that) is
used;
2) Upgrading to 64-bit id would require re-toasting old TOAST tables. Or
there
is some way to distinguish old tables from new ones?

But I don't see any reason to keep an old short ID as an option.

...

Yeah, that is completely horrid. It does not remove the existing failure
mode, just changes it to have worse consequences.

On Tue, Nov 29, 2022 at 12:04 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

I think the first step to improve the situation is to not use a global

oid

counter for toasted values. One way to do that would be to use the

sequence

code to do oid assignment, but we likely can find a more efficient
representation.

I don't particularly buy that, because the only real fix is this:

Eventually we should do the obvious thing and make toast ids 64bit wide

and if we do that there'll be no need to worry about multiple counters.

- to
combat the space usage we likely should switch to representing the ids as
variable width integers or such, otherwise the space increase would

likely be

prohibitive.

And I don't buy that either. An extra 4 bytes with a 2K payload is not
"prohibitive", it's more like "down in the noise".

I think if we switch to int8 keys and widen the global OID counter to 8
bytes (using just the low 4 bytes for other purposes), we'll have a
perfectly fine solution. There is still plenty of work to be done under
that plan, because of the need to maintain backward compatibility for
existing TOAST tables --- and maybe people would want an option to keep on
using them, for non-enormous tables? If we add additional work on top of
that, it'll just mean that it will take longer to have any solution.

Quick fix for this problem is limiting GetNewOidWithIndex loops to some
reasonable amount defined by related macro and returning error if there

is

still no available Oid. Please check attached patch, any feedback is
appreciated.

This feels like the wrong spot to tackle the issue.

Yeah, that is completely horrid. It does not remove the existing failure
mode, just changes it to have worse consequences.

regards, tom lane

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#6)
Re: [PATCH] Infinite loop while acquiring new TOAST Oid

Andres Freund <andres@anarazel.de> writes:

On 2022-11-28 16:04:12 -0500, Tom Lane wrote:

And I don't buy that either. An extra 4 bytes with a 2K payload is not
"prohibitive", it's more like "down in the noise".

The space usage for the the the toast relation + index itself is indeed
irrelevant. Where it's not "down in the noise" is in struct varatt_external,
i.e. references to external toast datums.

Ah, gotcha. Yes, the size of varatt_external is a problem.

regards, tom lane

#9Andres Freund
andres@anarazel.de
In reply to: Nikita Malakhov (#7)
Re: [PATCH] Infinite loop while acquiring new TOAST Oid

Hi,

On 2022-11-29 00:24:49 +0300, Nikita Malakhov wrote:

2) Upgrading to 64-bit id would require re-toasting old TOAST tables. Or
there is some way to distinguish old tables from new ones?

The catalog / relcache entry should suffice to differentiate between the two.

Greetings,

Andres Freund

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#9)
Re: [PATCH] Infinite loop while acquiring new TOAST Oid

Andres Freund <andres@anarazel.de> writes:

On 2022-11-29 00:24:49 +0300, Nikita Malakhov wrote:

2) Upgrading to 64-bit id would require re-toasting old TOAST tables. Or
there is some way to distinguish old tables from new ones?

The catalog / relcache entry should suffice to differentiate between the two.

Yeah, you could easily look at the datatype of the first attribute
(in either the TOAST table or its index) to determine what to do.

As I said before, I think there's a decent argument that some people
will want the option to stay with 4-byte TOAST OIDs indefinitely,
at least for smaller tables. So even without the fact that forced
conversions would be horridly expensive, we'll need to continue
support for both forms of TOAST table.

regards, tom lane

#11Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#10)
Re: [PATCH] Infinite loop while acquiring new TOAST Oid

Hi,

On 2022-11-28 16:57:53 -0500, Tom Lane wrote:

As I said before, I think there's a decent argument that some people
will want the option to stay with 4-byte TOAST OIDs indefinitely,
at least for smaller tables.

I think we'll need to do something about the width of varatt_external to make
the conversion to 64bit toast oids viable - and if we do, I don't think
there's a decent argument for staying with 4 byte toast OIDs. I think the
varatt_external equivalent would end up being smaller in just about all cases.
And as you said earlier, the increased overhead inside the toast table / index
is not relevant compared to the size of toasted datums.

Greetings,

Andres Freund

#12Nikita Malakhov
hukutoc@gmail.com
In reply to: Andres Freund (#11)
Re: [PATCH] Infinite loop while acquiring new TOAST Oid

Hi,

I'll check that tomorrow. If it is so then there won't be a problem keeping
old tables without re-toasting.

On Tue, Nov 29, 2022 at 1:10 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2022-11-28 16:57:53 -0500, Tom Lane wrote:

As I said before, I think there's a decent argument that some people
will want the option to stay with 4-byte TOAST OIDs indefinitely,
at least for smaller tables.

I think we'll need to do something about the width of varatt_external to
make
the conversion to 64bit toast oids viable - and if we do, I don't think
there's a decent argument for staying with 4 byte toast OIDs. I think the
varatt_external equivalent would end up being smaller in just about all
cases.
And as you said earlier, the increased overhead inside the toast table /
index
is not relevant compared to the size of toasted datums.

Greetings,

Andres Freund

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#11)
Re: [PATCH] Infinite loop while acquiring new TOAST Oid

Andres Freund <andres@anarazel.de> writes:

On 2022-11-28 16:57:53 -0500, Tom Lane wrote:

As I said before, I think there's a decent argument that some people
will want the option to stay with 4-byte TOAST OIDs indefinitely,
at least for smaller tables.

And as you said earlier, the increased overhead inside the toast table / index
is not relevant compared to the size of toasted datums.

Perhaps not.

I think we'll need to do something about the width of varatt_external to make
the conversion to 64bit toast oids viable - and if we do, I don't think
there's a decent argument for staying with 4 byte toast OIDs. I think the
varatt_external equivalent would end up being smaller in just about all cases.

I agree that we can't simply widen varatt_external to use 8 bytes for
the toast ID in all cases. Also, I now get the point about avoiding
use of globally assigned OIDs here: if the counter starts from zero
for each table, then a variable-width varatt_external could actually
be smaller than currently for many cases. However, that bit is somewhat
orthogonal, and it's certainly not required for fixing the basic problem.

So it seems like the plan of attack ought to be:

1. Invent a new form or forms of varatt_external to allow different
widths of the toast ID. Use the narrowest width possible for any
given ID value.

2. Allow TOAST tables/indexes to store either 4-byte or 8-byte IDs.
(Conversion could be done as a side effect of table-rewrite
operations, perhaps.)

3. Localize ID selection so that tables can have small toast IDs
even when other tables have many IDs. (Optional, could do later.)

regards, tom lane

#14Nikita Malakhov
hukutoc@gmail.com
In reply to: Nikita Malakhov (#12)
Re: [PATCH] Infinite loop while acquiring new TOAST Oid

I've missed that -

4 billion external datums
typically use a lot of space.

Not quite so. It's 8 Tb for the minimal size of toasted data (about 2 Kb).
In my practice tables with more than 5 billions of rows are not of
something out
of the ordinary (highly loaded databases with large amounts of data in use).

On Tue, Nov 29, 2022 at 1:12 AM Nikita Malakhov <hukutoc@gmail.com> wrote:

Hi,

I'll check that tomorrow. If it is so then there won't be a problem keeping
old tables without re-toasting.

On Tue, Nov 29, 2022 at 1:10 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2022-11-28 16:57:53 -0500, Tom Lane wrote:

As I said before, I think there's a decent argument that some people
will want the option to stay with 4-byte TOAST OIDs indefinitely,
at least for smaller tables.

I think we'll need to do something about the width of varatt_external to
make
the conversion to 64bit toast oids viable - and if we do, I don't think
there's a decent argument for staying with 4 byte toast OIDs. I think the
varatt_external equivalent would end up being smaller in just about all
cases.
And as you said earlier, the increased overhead inside the toast table /
index
is not relevant compared to the size of toasted datums.

Greetings,

Andres Freund

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

#15Nikita Malakhov
hukutoc@gmail.com
In reply to: Nikita Malakhov (#14)
Re: [PATCH] Infinite loop while acquiring new TOAST Oid

Hi!

I'm working on this issue according to the plan Tom proposed above -

I agree that we can't simply widen varatt_external to use 8 bytes for
the toast ID in all cases. Also, I now get the point about avoiding
use of globally assigned OIDs here: if the counter starts from zero
for each table, then a variable-width varatt_external could actually
be smaller than currently for many cases. However, that bit is somewhat
orthogonal, and it's certainly not required for fixing the basic problem.

Have I understood correctly that you suppose using an individual counter
for each TOAST table? I'm working on this approach, so we store counters
in cache, but I see an issue with the first insert - when there is no
counter
in cache so we have to loop through the table with increasing steps to find
available one (i.e. after restart). Or we still use single global counter,
but
64-bit with a wraparound?

So it seems like the plan of attack ought to be:

1. Invent a new form or forms of varatt_external to allow different
widths of the toast ID. Use the narrowest width possible for any
given ID value.

I'm using the VARTAG field - there are plenty of available values, so there
is no problem in distinguishing regular toast pointer with 'short' value id
(4 bytes) from long (8 bytes).

varatt_external currently is 32-bit aligned, so there is no reason in using
narrower type for value ids up to 16 bits.Or is it?

2. Allow TOAST tables/indexes to store either 4-byte or 8-byte IDs.
(Conversion could be done as a side effect of table-rewrite
operations, perhaps.)

Still on toast/detoast part, would get to that later.

3. Localize ID selection so that tables can have small toast IDs
even when other tables have many IDs. (Optional, could do later.)

Thank you!

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

#16Nikita Malakhov
hukutoc@gmail.com
In reply to: Nikita Malakhov (#15)
Re: [PATCH] Infinite loop while acquiring new TOAST Oid

Hi hackers!

Here's some update on the subject - I've made a branch based on master from
28/11 and introduced 64-bit TOAST value ID there. It is not complete now but
is already working and has some features:
- extended structure for TOAST pointer (varatt_long_external) with 64-bit
TOAST value ID;
- individual ID counters for each TOAST table, being cached for faster
access
and lookup of maximum TOAST ID in use after server restart.
https://github.com/postgrespro/postgres/tree/toast_64bit

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

#17Nikita Malakhov
hukutoc@gmail.com
In reply to: Nikita Malakhov (#16)
1 attachment(s)
Re: [PATCH] Infinite loop while acquiring new TOAST Oid

Hi hackers!

I've prepared a patch with a 64-bit TOAST Value ID. It is a kind of
prototype
and needs some further work, but it is already working and ready to play
with.

I've introduced 64-bit value ID field in varatt_external, but to keep it
compatible
with older bases 64-bit value is stored as a structure of two 32-bit fields
and value
ID moved to be the last in the varatt_external structure. Also, I've
introduced
different ID assignment function - instead of using global OID assignment
function
I use individual counters for each TOAST table, automatically cached and
after
server restart when new value is inserted into TOAST table maximum used
value
is searched and used to assign the next one.

Andres Freund:
I think we'll need to do something about the width of varatt_external to

make

the conversion to 64bit toast oids viable - and if we do, I don't think
there's a decent argument for staying with 4 byte toast OIDs. I think the
varatt_external equivalent would end up being smaller in just about all

cases.

And as you said earlier, the increased overhead inside the toast table /

index

is not relevant compared to the size of toasted datums.

There is some small overhead due to indexing 64-bit values. Also, for
smaller
tables we can use 32-bit ID instead of 64-bit, but then we would have a
problem
when we reach the limit of 2^32.

Pg_upgrade seems to be not a very difficult case if we go keeping old TOAST
tables using 32-bit values,

Please have a look. I'd be grateful for some further directions.

GIT branch with this feature, rebased onto current master:
https://github.com/postgrespro/postgres/tree/toast_64bit

On Wed, Dec 7, 2022 at 12:00 AM Nikita Malakhov <hukutoc@gmail.com> wrote:

Hi hackers!

Here's some update on the subject - I've made a branch based on master from
28/11 and introduced 64-bit TOAST value ID there. It is not complete now
but
is already working and has some features:
- extended structure for TOAST pointer (varatt_long_external) with 64-bit
TOAST value ID;
- individual ID counters for each TOAST table, being cached for faster
access
and lookup of maximum TOAST ID in use after server restart.
https://github.com/postgrespro/postgres/tree/toast_64bit

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

Attachments:

0001_64_bit_toast_valueid_v1.patchapplication/octet-stream; name=0001_64_bit_toast_valueid_v1.patchDownload
From 3fe12335e9a0eb7beff9e04d82d7f853aeb6c30b Mon Sep 17 00:00:00 2001
From: Nikita Malakhov <n.malakhov@postgrespro.ru>
Date: Thu, 1 Dec 2022 23:56:25 +0300
Subject: [PATCH 1/2] 64-bit TOAST value id instead of 32-bit

This patch introduces 64-bit TOAST value id to fix issue with
expending all available TOAST value OIDs in TOAST table - this
causes INSERT queries to hang and DB stalls.
Also, TOAST tables have individual counters cached for fast access.
---
 contrib/amcheck/verify_heapam.c               |  63 ++--
 contrib/pageinspect/heapfuncs.c               |   3 +-
 contrib/test_decoding/test_decoding.c         |   2 +-
 src/backend/access/common/detoast.c           |  38 +--
 src/backend/access/common/toast_compression.c |   6 +-
 src/backend/access/common/toast_internals.c   | 315 ++++++++++++++++--
 src/backend/access/heap/heaptoast.c           |  22 +-
 src/backend/access/table/toast_helper.c       |   6 +-
 src/backend/catalog/toasting.c                |   6 +
 src/backend/replication/logical/proto.c       |   2 +-
 .../replication/logical/reorderbuffer.c       |   8 +-
 src/backend/replication/pgoutput/pgoutput.c   |   7 +-
 src/backend/utils/mmgr/aset.c                 |  32 +-
 src/backend/utils/mmgr/generation.c           |   9 +
 src/backend/utils/mmgr/slab.c                 |   4 +
 src/bin/initdb/initdb.c                       |   7 +-
 src/include/access/detoast.h                  |  23 ++
 src/include/access/heaptoast.h                |   6 +-
 src/include/access/tableam.h                  |   6 +-
 src/include/access/toast_internals.h          |   5 +
 src/include/postgres.h                        |  43 ++-
 src/test/regress/regress.c                    |   2 +-
 22 files changed, 507 insertions(+), 108 deletions(-)

diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index b72a5c96d1..ac40749fb6 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -68,7 +68,7 @@ typedef enum SkipPages
  */
 typedef struct ToastedAttribute
 {
-	struct varatt_external toast_pointer;
+	struct varatt_long_external toast_pointer;
 	BlockNumber blkno;			/* block in main table */
 	OffsetNumber offnum;		/* offset in main table */
 	AttrNumber	attnum;			/* attribute in main table */
@@ -1171,16 +1171,16 @@ check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx,
 	if (isnull)
 	{
 		report_toast_corruption(ctx, ta,
-								psprintf("toast value %u has toast chunk with null sequence number",
-										 ta->toast_pointer.va_valueid));
+								psprintf("toast value %ld has toast chunk with null sequence number",
+										 get_uint64align32(&(ta->toast_pointer.va_valueid))));
 		return;
 	}
 	if (chunk_seq != *expected_chunk_seq)
 	{
 		/* Either the TOAST index is corrupt, or we don't have all chunks. */
 		report_toast_corruption(ctx, ta,
-								psprintf("toast value %u index scan returned chunk %d when expecting chunk %d",
-										 ta->toast_pointer.va_valueid,
+								psprintf("toast value %ld index scan returned chunk %d when expecting chunk %d",
+										 get_uint64align32(&(ta->toast_pointer.va_valueid)),
 										 chunk_seq, *expected_chunk_seq));
 	}
 	*expected_chunk_seq = chunk_seq + 1;
@@ -1191,8 +1191,8 @@ check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx,
 	if (isnull)
 	{
 		report_toast_corruption(ctx, ta,
-								psprintf("toast value %u chunk %d has null data",
-										 ta->toast_pointer.va_valueid,
+								psprintf("toast value %ld chunk %d has null data",
+										 get_uint64align32(&(ta->toast_pointer.va_valueid)),
 										 chunk_seq));
 		return;
 	}
@@ -1211,8 +1211,8 @@ check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx,
 		uint32		header = ((varattrib_4b *) chunk)->va_4byte.va_header;
 
 		report_toast_corruption(ctx, ta,
-								psprintf("toast value %u chunk %d has invalid varlena header %0x",
-										 ta->toast_pointer.va_valueid,
+								psprintf("toast value %ld chunk %d has invalid varlena header %0x",
+										 get_uint64align32(&(ta->toast_pointer.va_valueid)),
 										 chunk_seq, header));
 		return;
 	}
@@ -1223,8 +1223,8 @@ check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx,
 	if (chunk_seq > last_chunk_seq)
 	{
 		report_toast_corruption(ctx, ta,
-								psprintf("toast value %u chunk %d follows last expected chunk %d",
-										 ta->toast_pointer.va_valueid,
+								psprintf("toast value %ld chunk %d follows last expected chunk %d",
+										 get_uint64align32(&(ta->toast_pointer.va_valueid)),
 										 chunk_seq, last_chunk_seq));
 		return;
 	}
@@ -1234,8 +1234,8 @@ check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx,
 
 	if (chunksize != expected_size)
 		report_toast_corruption(ctx, ta,
-								psprintf("toast value %u chunk %d has size %u, but expected size %u",
-										 ta->toast_pointer.va_valueid,
+								psprintf("toast value %ld chunk %d has size %u, but expected size %u",
+										 get_uint64align32(&(ta->toast_pointer.va_valueid)),
 										 chunk_seq, chunksize, expected_size));
 }
 
@@ -1267,7 +1267,8 @@ check_tuple_attribute(HeapCheckContext *ctx)
 	char	   *tp;				/* pointer to the tuple data */
 	uint16		infomask;
 	Form_pg_attribute thisatt;
-	struct varatt_external toast_pointer;
+/*	struct varatt_external toast_pointer; */
+	struct varatt_long_external toast_pointer;
 
 	infomask = ctx->tuphdr->t_infomask;
 	thisatt = TupleDescAttr(RelationGetDescr(ctx->rel), ctx->attnum);
@@ -1326,7 +1327,7 @@ check_tuple_attribute(HeapCheckContext *ctx)
 	{
 		uint8		va_tag = VARTAG_EXTERNAL(tp + ctx->offset);
 
-		if (va_tag != VARTAG_ONDISK)
+		if (va_tag != VARTAG_ONDISK && va_tag != VARTAG_LONG_EXTERNAL)
 		{
 			report_corruption(ctx,
 							  psprintf("toasted attribute has unexpected TOAST tag %u",
@@ -1373,13 +1374,15 @@ check_tuple_attribute(HeapCheckContext *ctx)
 	/*
 	 * Must copy attr into toast_pointer for alignment considerations
 	 */
-	VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+/* XXX LONG */
+	VARATT_EXTERNAL_GET_LONG_POINTER(toast_pointer, attr);
+/*	VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr); */
 
 	/* Toasted attributes too large to be untoasted should never be stored */
 	if (toast_pointer.va_rawsize > VARLENA_SIZE_LIMIT)
 		report_corruption(ctx,
-						  psprintf("toast value %u rawsize %d exceeds limit %d",
-								   toast_pointer.va_valueid,
+						  psprintf("toast value %ld rawsize %d exceeds limit %d",
+								   get_uint64align32(&toast_pointer.va_valueid),
 								   toast_pointer.va_rawsize,
 								   VARLENA_SIZE_LIMIT));
 
@@ -1406,16 +1409,16 @@ check_tuple_attribute(HeapCheckContext *ctx)
 		}
 		if (!valid)
 			report_corruption(ctx,
-							  psprintf("toast value %u has invalid compression method id %d",
-									   toast_pointer.va_valueid, cmid));
+							  psprintf("toast value %ld has invalid compression method id %d",
+									   get_uint64align32(&toast_pointer.va_valueid), cmid));
 	}
 
 	/* The tuple header better claim to contain toasted values */
 	if (!(infomask & HEAP_HASEXTERNAL))
 	{
 		report_corruption(ctx,
-						  psprintf("toast value %u is external but tuple header flag HEAP_HASEXTERNAL not set",
-								   toast_pointer.va_valueid));
+						  psprintf("toast value %ld is external but tuple header flag HEAP_HASEXTERNAL not set",
+								   get_uint64align32(&toast_pointer.va_valueid)));
 		return true;
 	}
 
@@ -1423,8 +1426,8 @@ check_tuple_attribute(HeapCheckContext *ctx)
 	if (!ctx->rel->rd_rel->reltoastrelid)
 	{
 		report_corruption(ctx,
-						  psprintf("toast value %u is external but relation has no toast relation",
-								   toast_pointer.va_valueid));
+						  psprintf("toast value %ld is external but relation has no toast relation",
+								   get_uint64align32(&toast_pointer.va_valueid)));
 		return true;
 	}
 
@@ -1443,7 +1446,7 @@ check_tuple_attribute(HeapCheckContext *ctx)
 
 		ta = (ToastedAttribute *) palloc0(sizeof(ToastedAttribute));
 
-		VARATT_EXTERNAL_GET_POINTER(ta->toast_pointer, attr);
+		VARATT_EXTERNAL_GET_LONG_POINTER(ta->toast_pointer, attr);
 		ta->blkno = ctx->blkno;
 		ta->offnum = ctx->offnum;
 		ta->attnum = ctx->attnum;
@@ -1477,10 +1480,16 @@ check_toasted_attribute(HeapCheckContext *ctx, ToastedAttribute *ta)
 	/*
 	 * Setup a scan key to find chunks in toast table with matching va_valueid
 	 */
+/*
 	ScanKeyInit(&toastkey,
 				(AttrNumber) 1,
 				BTEqualStrategyNumber, F_OIDEQ,
 				ObjectIdGetDatum(ta->toast_pointer.va_valueid));
+*/
+	ScanKeyInit(&toastkey,
+				(AttrNumber) 1,
+				BTEqualStrategyNumber, F_INT8EQ,
+				Int64GetDatum(get_uint64align32(&(ta->toast_pointer.va_valueid))));
 
 	/*
 	 * Check if any chunks for this toasted object exist in the toast table,
@@ -1504,11 +1513,11 @@ check_toasted_attribute(HeapCheckContext *ctx, ToastedAttribute *ta)
 	if (!found_toasttup)
 		report_toast_corruption(ctx, ta,
 								psprintf("toast value %u not found in toast table",
-										 ta->toast_pointer.va_valueid));
+										 get_uint64align32(&(ta->toast_pointer.va_valueid))));
 	else if (expected_chunk_seq <= last_chunk_seq)
 		report_toast_corruption(ctx, ta,
 								psprintf("toast value %u was expected to end at chunk %d, but ended while expecting chunk %d",
-										 ta->toast_pointer.va_valueid,
+										 get_uint64align32(&(ta->toast_pointer.va_valueid)),
 										 last_chunk_seq, expected_chunk_seq));
 }
 
diff --git a/contrib/pageinspect/heapfuncs.c b/contrib/pageinspect/heapfuncs.c
index aed2753253..3fa6d0b0e1 100644
--- a/contrib/pageinspect/heapfuncs.c
+++ b/contrib/pageinspect/heapfuncs.c
@@ -364,7 +364,8 @@ tuple_data_split_internal(Oid relid, char *tupdata,
 				 */
 				if (VARATT_IS_EXTERNAL(tupdata + off) &&
 					!VARATT_IS_EXTERNAL_ONDISK(tupdata + off) &&
-					!VARATT_IS_EXTERNAL_INDIRECT(tupdata + off))
+					!VARATT_IS_EXTERNAL_INDIRECT(tupdata + off) &&
+					!VARATT_IS_LONG_EXTERNAL(tupdata + off))
 					ereport(ERROR,
 							(errcode(ERRCODE_DATA_CORRUPTED),
 							 errmsg("first byte of varlena attribute is incorrect for attribute %d", i)));
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index e0fd6f1765..e027ae2ac7 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -580,7 +580,7 @@ tuple_to_stringinfo(StringInfo s, TupleDesc tupdesc, HeapTuple tuple, bool skip_
 		/* print data */
 		if (isnull)
 			appendStringInfoString(s, "null");
-		else if (typisvarlena && VARATT_IS_EXTERNAL_ONDISK(origval))
+		else if (typisvarlena && (VARATT_IS_EXTERNAL_ONDISK(origval) || VARATT_IS_LONG_EXTERNAL(origval)))
 			appendStringInfoString(s, "unchanged-toast-datum");
 		else if (!typisvarlena)
 			print_literal(s, typid,
diff --git a/src/backend/access/common/detoast.c b/src/backend/access/common/detoast.c
index 92c9c658d3..990d4a8387 100644
--- a/src/backend/access/common/detoast.c
+++ b/src/backend/access/common/detoast.c
@@ -46,7 +46,7 @@ detoast_external_attr(struct varlena *attr)
 {
 	struct varlena *result;
 
-	if (VARATT_IS_EXTERNAL_ONDISK(attr))
+	if (VARATT_IS_EXTERNAL_ONDISK(attr) || VARATT_IS_LONG_EXTERNAL(attr))
 	{
 		/*
 		 * This is an external stored plain value
@@ -115,7 +115,7 @@ detoast_external_attr(struct varlena *attr)
 struct varlena *
 detoast_attr(struct varlena *attr)
 {
-	if (VARATT_IS_EXTERNAL_ONDISK(attr))
+	if ((VARATT_IS_EXTERNAL_ONDISK(attr)) || VARATT_IS_LONG_EXTERNAL(attr))
 	{
 		/*
 		 * This is an externally stored datum --- fetch it back from there
@@ -223,11 +223,11 @@ detoast_attr_slice(struct varlena *attr,
 	else if (pg_add_s32_overflow(sliceoffset, slicelength, &slicelimit))
 		slicelength = slicelimit = -1;
 
-	if (VARATT_IS_EXTERNAL_ONDISK(attr))
+	if (VARATT_IS_EXTERNAL_ONDISK(attr) || VARATT_IS_LONG_EXTERNAL(attr))
 	{
-		struct varatt_external toast_pointer;
+		struct varatt_long_external toast_pointer;
 
-		VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+		VARATT_EXTERNAL_GET_LONG_POINTER(toast_pointer, attr);
 
 		/* fast path for non-compressed external datums */
 		if (!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
@@ -344,14 +344,14 @@ toast_fetch_datum(struct varlena *attr)
 {
 	Relation	toastrel;
 	struct varlena *result;
-	struct varatt_external toast_pointer;
+	struct varatt_long_external toast_pointer;
 	int32		attrsize;
 
-	if (!VARATT_IS_EXTERNAL_ONDISK(attr))
+	if (!VARATT_IS_EXTERNAL_ONDISK(attr) && !VARATT_IS_LONG_EXTERNAL(attr))
 		elog(ERROR, "toast_fetch_datum shouldn't be called for non-ondisk datums");
 
 	/* Must copy to access aligned fields */
-	VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+	VARATT_EXTERNAL_GET_LONG_POINTER(toast_pointer, attr);
 
 	attrsize = VARATT_EXTERNAL_GET_EXTSIZE(toast_pointer);
 
@@ -372,7 +372,7 @@ toast_fetch_datum(struct varlena *attr)
 	toastrel = table_open(toast_pointer.va_toastrelid, AccessShareLock);
 
 	/* Fetch all chunks */
-	table_relation_fetch_toast_slice(toastrel, toast_pointer.va_valueid,
+	table_relation_fetch_toast_slice(toastrel, get_uint64align32(&(toast_pointer.va_valueid)),
 									 attrsize, 0, attrsize, result);
 
 	/* Close toast table */
@@ -398,14 +398,14 @@ toast_fetch_datum_slice(struct varlena *attr, int32 sliceoffset,
 {
 	Relation	toastrel;
 	struct varlena *result;
-	struct varatt_external toast_pointer;
+	struct varatt_long_external toast_pointer;
 	int32		attrsize;
 
-	if (!VARATT_IS_EXTERNAL_ONDISK(attr))
+	if (!VARATT_IS_EXTERNAL_ONDISK(attr) && !VARATT_IS_LONG_EXTERNAL(attr))
 		elog(ERROR, "toast_fetch_datum_slice shouldn't be called for non-ondisk datums");
 
 	/* Must copy to access aligned fields */
-	VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+	VARATT_EXTERNAL_GET_LONG_POINTER(toast_pointer, attr);
 
 	/*
 	 * It's nonsense to fetch slices of a compressed datum unless when it's a
@@ -452,7 +452,7 @@ toast_fetch_datum_slice(struct varlena *attr, int32 sliceoffset,
 	toastrel = table_open(toast_pointer.va_toastrelid, AccessShareLock);
 
 	/* Fetch all chunks */
-	table_relation_fetch_toast_slice(toastrel, toast_pointer.va_valueid,
+	table_relation_fetch_toast_slice(toastrel, get_uint64align32(&(toast_pointer.va_valueid)),
 									 attrsize, sliceoffset, slicelength,
 									 result);
 
@@ -547,12 +547,12 @@ toast_raw_datum_size(Datum value)
 	struct varlena *attr = (struct varlena *) DatumGetPointer(value);
 	Size		result;
 
-	if (VARATT_IS_EXTERNAL_ONDISK(attr))
+	if (VARATT_IS_EXTERNAL_ONDISK(attr) || VARATT_IS_LONG_EXTERNAL(attr))
 	{
 		/* va_rawsize is the size of the original datum -- including header */
-		struct varatt_external toast_pointer;
+		struct varatt_long_external toast_pointer;
 
-		VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+		VARATT_EXTERNAL_GET_LONG_POINTER(toast_pointer, attr);
 		result = toast_pointer.va_rawsize;
 	}
 	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
@@ -603,16 +603,16 @@ toast_datum_size(Datum value)
 	struct varlena *attr = (struct varlena *) DatumGetPointer(value);
 	Size		result;
 
-	if (VARATT_IS_EXTERNAL_ONDISK(attr))
+	if (VARATT_IS_EXTERNAL_ONDISK(attr) || VARATT_IS_LONG_EXTERNAL(attr))
 	{
 		/*
 		 * Attribute is stored externally - return the extsize whether
 		 * compressed or not.  We do not count the size of the toast pointer
 		 * ... should we?
 		 */
-		struct varatt_external toast_pointer;
+		struct varatt_long_external toast_pointer;
 
-		VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+		VARATT_EXTERNAL_GET_LONG_POINTER(toast_pointer, attr);
 		result = VARATT_EXTERNAL_GET_EXTSIZE(toast_pointer);
 	}
 	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
diff --git a/src/backend/access/common/toast_compression.c b/src/backend/access/common/toast_compression.c
index 7156ae9c47..5b27c84275 100644
--- a/src/backend/access/common/toast_compression.c
+++ b/src/backend/access/common/toast_compression.c
@@ -261,11 +261,11 @@ toast_get_compression_id(struct varlena *attr)
 	 * the external toast pointer.  If compressed inline, fetch it from the
 	 * toast compression header.
 	 */
-	if (VARATT_IS_EXTERNAL_ONDISK(attr))
+	if (VARATT_IS_EXTERNAL_ONDISK(attr) || VARATT_IS_LONG_EXTERNAL(attr))
 	{
-		struct varatt_external toast_pointer;
+		struct varatt_long_external toast_pointer;
 
-		VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+		VARATT_EXTERNAL_GET_LONG_POINTER(toast_pointer, attr);
 
 		if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
 			cmid = VARATT_EXTERNAL_GET_COMPRESS_METHOD(toast_pointer);
diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index 576e585a89..278c7822f4 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -27,8 +27,168 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
+#include "access/htup_details.h"
+#include "commands/defrem.h"
+#include "lib/pairingheap.h"
+#include "utils/builtins.h"
+#include "utils/memutils.h"
+#include "utils/lsyscache.h"
+#include "utils/syscache.h"
+
+#define InvalidInt64 0x0000000000000000
+
 static bool toastrel_valueid_exists(Relation toastrel, Oid valueid);
 static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
+static bool toastrel_long_valueid_exists(Relation toastrel, uint64 valueid, bool long_ind);
+
+/* Toast Relation counters cache */
+typedef struct ToastCounterCacheEntry
+{
+	Oid			toastrelOid;
+	uint64		counter;
+} ToastCounterCacheEntry;
+
+static List	*ToastCounterCache = NIL;
+
+uint64
+SearchMinUnusedValueid(Oid	toastrelOid)
+{
+	ToastCounterCacheEntry  *entry;
+	bool	cnt_exists_ind = false;
+	Relation toast_rel;
+	uint64	valueid = 0;
+	uint64	hi_valueid = 0xffffffffffff;
+	uint64	lo_valueid = 0;
+	ScanKeyData toastkey;
+	SysScanDesc toastscan;
+	int			num_indexes;
+	int			validIndex;
+	Relation   *toastidxs;
+
+
+	toast_rel = table_open(toastrelOid, AccessShareLock);
+	validIndex = toast_open_indexes(toast_rel,
+									AccessShareLock,
+									&toastidxs,
+									&num_indexes);
+
+	ScanKeyInit(&toastkey,
+			(AttrNumber) 1,
+			BTEqualStrategyNumber, F_INT8EQ,
+			Int64GetDatum(lo_valueid + 1));
+	toastscan = systable_beginscan(toast_rel,
+								   RelationGetRelid(toastidxs[validIndex]),
+								   true, SnapshotAny, 1, &toastkey);
+
+	if (systable_getnext(toastscan) != NULL)
+		cnt_exists_ind = true;
+
+	systable_endscan(toastscan);
+	if(!cnt_exists_ind)
+		goto out;
+
+	while(true) //(hi_valueid - lo_valueid) > 0)
+	{
+		valueid = (lo_valueid + ((hi_valueid - lo_valueid) >> 1));
+		cnt_exists_ind = false;
+
+		ScanKeyInit(&toastkey,
+			(AttrNumber) 1,
+			BTEqualStrategyNumber, F_INT8EQ,
+			Int64GetDatum(valueid));
+		toastscan = systable_beginscan(toast_rel,
+								   RelationGetRelid(toastidxs[validIndex]),
+								   true, SnapshotAny, 1, &toastkey);
+
+		if (systable_getnext(toastscan) != NULL)
+			cnt_exists_ind = true;
+
+		systable_endscan(toastscan);
+
+		if( hi_valueid - lo_valueid == 0 )
+			goto out;
+
+		if(cnt_exists_ind)
+			lo_valueid = valueid + 1;
+		else
+			hi_valueid = valueid - 1;
+	}
+
+
+out:
+	toast_close_indexes(toastidxs, num_indexes, NoLock);
+
+	if(cnt_exists_ind) 
+		valueid++;
+
+	table_close(toast_rel, NoLock);
+
+	entry = palloc(sizeof(*entry));
+
+	entry->toastrelOid = toastrelOid;
+	entry->counter = valueid;
+
+	ToastCounterCache = lappend(ToastCounterCache, entry);
+
+	return valueid;
+}
+
+/*
+ * SearchTsrCache - get cached toaster routine, emits an error if toaster
+ * doesn't exist
+ */
+uint64
+SearchTrelCounterCache(Oid	toastrelOid)
+{
+	ListCell		   *lc;
+	ToastCounterCacheEntry  *entry;
+	MemoryContext		ctx;
+	uint64	valueid = 0;
+	ctx = MemoryContextSwitchTo(CacheMemoryContext);
+
+	foreach(lc, ToastCounterCache)
+	{
+		entry = (ToastCounterCacheEntry*)lfirst(lc);
+		if (entry->toastrelOid == toastrelOid)
+		{
+			if(entry->counter < 0xffffffffffffffff)
+				entry->counter++;
+			else
+				entry->counter = 1;
+			valueid = entry->counter;
+			goto out;
+		}
+	}
+
+out:
+	MemoryContextSwitchTo(ctx);
+
+	return valueid;
+}
+
+uint64
+InsertTrelCounterCache(Oid	toastrelOid)
+{
+	ToastCounterCacheEntry  *entry;
+	MemoryContext		ctx;
+	uint64	valueid = 0;
+	ctx = MemoryContextSwitchTo(CacheMemoryContext);
+
+	valueid = SearchMinUnusedValueid(toastrelOid);
+
+	/* did not find entry, make a new one */
+	entry = palloc(sizeof(*entry));
+
+	entry->toastrelOid = toastrelOid;
+	entry->counter = valueid;
+	ToastCounterCache = lappend(ToastCounterCache, entry);
+
+	MemoryContextSwitchTo(ctx);
+
+	return valueid;
+}
+
+/**/
 
 /* ----------
  * toast_compress_datum -
@@ -128,7 +288,7 @@ toast_save_datum(Relation rel, Datum value,
 	bool		t_isnull[3];
 	CommandId	mycid = GetCurrentCommandId(true);
 	struct varlena *result;
-	struct varatt_external toast_pointer;
+	struct varatt_long_external toast_pointer;
 	union
 	{
 		struct varlena hdr;
@@ -144,6 +304,7 @@ toast_save_datum(Relation rel, Datum value,
 	Pointer		dval = DatumGetPointer(value);
 	int			num_indexes;
 	int			validIndex;
+	bool			long_ind = false;
 
 	Assert(!VARATT_IS_EXTERNAL(value));
 
@@ -212,6 +373,16 @@ toast_save_datum(Relation rel, Datum value,
 	else
 		toast_pointer.va_toastrelid = RelationGetRelid(toastrel);
 
+	if(TupleDescAttr(toasttupDesc, 0)->atttypid == OIDOID || TupleDescAttr(toasttupDesc, 0)->atttypid == INT4OID)
+	{
+/* regular toast pointer*/
+		long_ind = false;
+	}
+	else
+	{
+/* 64-bit id */
+		long_ind = true;
+	}
 	/*
 	 * Choose an OID to use as the value ID for this toast value.
 	 *
@@ -226,27 +397,38 @@ toast_save_datum(Relation rel, Datum value,
 	 */
 	if (!OidIsValid(rel->rd_toastoid))
 	{
+		uint64 valueid = 0;
 		/* normal case: just choose an unused OID */
+/*		
 		toast_pointer.va_valueid =
 			GetNewOidWithIndex(toastrel,
 							   RelationGetRelid(toastidxs[validIndex]),
 							   (AttrNumber) 1);
+*/
+		valueid = SearchTrelCounterCache(rel->rd_rel->reltoastrelid);
+		if(valueid == 0)
+			valueid = InsertTrelCounterCache(rel->rd_rel->reltoastrelid);
+		set_uint64align32(&(toast_pointer.va_valueid), valueid);
 	}
 	else
 	{
+		uint64 zero_int8 = 0;
+		uint64 valueid = 0;
 		/* rewrite case: check to see if value was in old toast table */
-		toast_pointer.va_valueid = InvalidOid;
+		/* toast_pointer.va_valueid = InvalidOid; */
+		set_uint64align32(&(toast_pointer.va_valueid), zero_int8);
 		if (oldexternal != NULL)
 		{
-			struct varatt_external old_toast_pointer;
+			struct varatt_long_external old_toast_pointer;
 
-			Assert(VARATT_IS_EXTERNAL_ONDISK(oldexternal));
+			Assert(VARATT_IS_EXTERNAL_ONDISK(oldexternal) || VARATT_IS_LONG_EXTERNAL(oldexternal));
 			/* Must copy to access aligned fields */
-			VARATT_EXTERNAL_GET_POINTER(old_toast_pointer, oldexternal);
+			VARATT_EXTERNAL_GET_LONG_POINTER(old_toast_pointer, oldexternal);
 			if (old_toast_pointer.va_toastrelid == rel->rd_toastoid)
 			{
 				/* This value came from the old toast table; reuse its OID */
-				toast_pointer.va_valueid = old_toast_pointer.va_valueid;
+				/* toast_pointer.va_valueid = old_toast_pointer.va_valueid; */
+				set_uint64align32(&(toast_pointer.va_valueid), get_uint64align32(&(old_toast_pointer.va_valueid)));
 
 				/*
 				 * There is a corner case here: the table rewrite might have
@@ -265,35 +447,50 @@ toast_save_datum(Relation rel, Datum value,
 				 * copies belonging to already-deleted heap tuples would not
 				 * be reclaimed by VACUUM.
 				 */
+/*
 				if (toastrel_valueid_exists(toastrel,
 											toast_pointer.va_valueid))
+*/
+				if (toastrel_long_valueid_exists(toastrel,
+											get_uint64align32(&(toast_pointer.va_valueid)), long_ind))
 				{
 					/* Match, so short-circuit the data storage loop below */
 					data_todo = 0;
 				}
 			}
 		}
-		if (toast_pointer.va_valueid == InvalidOid)
+/*		if (toast_pointer.va_valueid == InvalidOid) */
+		if (get_uint64align32(&(toast_pointer.va_valueid)) == zero_int8)
 		{
+			bool exists_ind = false;
 			/*
 			 * new value; must choose an OID that doesn't conflict in either
 			 * old or new toast table
 			 */
 			do
 			{
+/*
 				toast_pointer.va_valueid =
 					GetNewOidWithIndex(toastrel,
 									   RelationGetRelid(toastidxs[validIndex]),
 									   (AttrNumber) 1);
-			} while (toastid_valueid_exists(rel->rd_toastoid,
-											toast_pointer.va_valueid));
+*/
+				valueid = SearchTrelCounterCache(rel->rd_rel->reltoastrelid);
+				if(valueid == 0)
+					valueid = InsertTrelCounterCache(rel->rd_rel->reltoastrelid);
+//				set_uint64align32(&(toast_pointer.va_valueid), valueid);
+
+//				set_uint64align32(&(toast_pointer.va_valueid), SearchTrelCounterCache(rel->rd_rel->reltoastrelid));
+				exists_ind = toastrel_long_valueid_exists(toastrel,
+											get_uint64align32(&(toast_pointer.va_valueid)), long_ind);
+			} while (!exists_ind);
+			set_uint64align32(&(toast_pointer.va_valueid), valueid);
 		}
 	}
-
 	/*
 	 * Initialize constant parts of the tuple data
 	 */
-	t_values[0] = ObjectIdGetDatum(toast_pointer.va_valueid);
+	t_values[0] = Int64GetDatum(get_uint64align32(&(toast_pointer.va_valueid))); //ObjectIdGetDatum(toast_pointer.va_valueid);
 	t_values[2] = PointerGetDatum(&chunk_data);
 	t_isnull[0] = false;
 	t_isnull[1] = false;
@@ -338,12 +535,14 @@ toast_save_datum(Relation rel, Datum value,
 		{
 			/* Only index relations marked as ready can be updated */
 			if (toastidxs[i]->rd_index->indisready)
+			{
 				index_insert(toastidxs[i], t_values, t_isnull,
 							 &(toasttup->t_self),
 							 toastrel,
 							 toastidxs[i]->rd_index->indisunique ?
 							 UNIQUE_CHECK_YES : UNIQUE_CHECK_NO,
 							 false, NULL);
+			}
 		}
 
 		/*
@@ -365,14 +564,14 @@ toast_save_datum(Relation rel, Datum value,
 	 */
 	toast_close_indexes(toastidxs, num_indexes, NoLock);
 	table_close(toastrel, NoLock);
-
 	/*
 	 * Create the TOAST pointer value that we'll return
 	 */
-	result = (struct varlena *) palloc(TOAST_POINTER_SIZE);
-	SET_VARTAG_EXTERNAL(result, VARTAG_ONDISK);
-	memcpy(VARDATA_EXTERNAL(result), &toast_pointer, sizeof(toast_pointer));
-
+	// result = (struct varlena *) palloc(TOAST_POINTER_SIZE);
+	result = (struct varlena *) palloc(VARSIZE_LONG_EXTERNAL(toast_pointer));
+/*	SET_VARTAG_EXTERNAL(result, VARTAG_ONDISK); */
+	SET_VARTAG_EXTERNAL(result, VARTAG_LONG_EXTERNAL);
+	memcpy(VARDATA_EXTERNAL(result), &toast_pointer, VARATT_LONG_SIZE);
 	return PointerGetDatum(result);
 }
 
@@ -386,7 +585,7 @@ void
 toast_delete_datum(Relation rel, Datum value, bool is_speculative)
 {
 	struct varlena *attr = (struct varlena *) DatumGetPointer(value);
-	struct varatt_external toast_pointer;
+	struct varatt_long_external toast_pointer;
 	Relation	toastrel;
 	Relation   *toastidxs;
 	ScanKeyData toastkey;
@@ -396,11 +595,11 @@ toast_delete_datum(Relation rel, Datum value, bool is_speculative)
 	int			validIndex;
 	SnapshotData SnapshotToast;
 
-	if (!VARATT_IS_EXTERNAL_ONDISK(attr))
+	if (!VARATT_IS_EXTERNAL_ONDISK(attr) && !VARATT_IS_LONG_EXTERNAL(attr))
 		return;
 
 	/* Must copy to access aligned fields */
-	VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+	VARATT_EXTERNAL_GET_LONG_POINTER(toast_pointer, attr);
 
 	/*
 	 * Open the toast relation and its indexes
@@ -416,10 +615,27 @@ toast_delete_datum(Relation rel, Datum value, bool is_speculative)
 	/*
 	 * Setup a scan key to find chunks with matching va_valueid
 	 */
-	ScanKeyInit(&toastkey,
+/*	if(VARATT_IS_LONG_EXTERNAL(attr))
+	{ */
+		ScanKeyInit(&toastkey,
+				(AttrNumber) 1,
+				BTEqualStrategyNumber, F_INT8EQ,
+				Int64GetDatum(get_uint64align32(&toast_pointer.va_valueid)));
+/*	}
+	else
+	{
+		ScanKeyInit(&toastkey,
+				(AttrNumber) 1,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum((&toast_pointer.va_valueid)->lo));
+*/
+/*
+		ScanKeyInit(&toastkey,
 				(AttrNumber) 1,
 				BTEqualStrategyNumber, F_OIDEQ,
 				ObjectIdGetDatum(toast_pointer.va_valueid));
+*/
+/*	} */
 
 	/*
 	 * Find all the chunks.  (We don't actually care whether we see them in
@@ -450,6 +666,65 @@ toast_delete_datum(Relation rel, Datum value, bool is_speculative)
 	table_close(toastrel, NoLock);
 }
 
+static bool
+toastrel_long_valueid_exists(Relation toastrel, uint64 valueid, bool long_ind)
+{
+	bool		result = false;
+	ScanKeyData toastkey;
+	SysScanDesc toastscan;
+	int			num_indexes;
+	int			validIndex;
+	Relation   *toastidxs;
+	TupleDesc	tupDesc;
+	uint32		short_valueid;
+
+	tupDesc = RelationGetDescr(toastrel);
+	/* Fetch a valid index relation */
+	validIndex = toast_open_indexes(toastrel,
+									RowExclusiveLock,
+									&toastidxs,
+									&num_indexes);
+
+	/*
+	 * Setup a scan key to find chunks with matching va_valueid
+	 */
+/*	
+	if(long_ind)
+	{*/
+		ScanKeyInit(&toastkey,
+				(AttrNumber) 1,
+				BTEqualStrategyNumber, F_INT8EQ,
+				Int64GetDatum(valueid));
+/*	}
+	else
+	{
+		short_valueid = (uint32) (valueid & 0xffffffff);
+
+		ScanKeyInit(&toastkey,
+				(AttrNumber) 1,
+				BTEqualStrategyNumber, F_OIDEQ,
+				Int32GetDatum(short_valueid));
+	}
+*/
+	/*
+	 * Is there any such chunk?
+	 */
+	toastscan = systable_beginscan(toastrel,
+								   RelationGetRelid(toastidxs[validIndex]),
+								   true, SnapshotAny, 1, &toastkey);
+
+	if (systable_getnext(toastscan) != NULL)
+		result = true;
+
+	systable_endscan(toastscan);
+
+	/* Clean up */
+	toast_close_indexes(toastidxs, num_indexes, RowExclusiveLock);
+
+	return result;
+}
+
+
 /* ----------
  * toastrel_valueid_exists -
  *
diff --git a/src/backend/access/heap/heaptoast.c b/src/backend/access/heap/heaptoast.c
index 1575a81b01..d5d0042744 100644
--- a/src/backend/access/heap/heaptoast.c
+++ b/src/backend/access/heap/heaptoast.c
@@ -266,7 +266,6 @@ heap_toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
 		biggest_attno = toast_tuple_find_biggest_attribute(&ttc, false, true);
 		if (biggest_attno < 0)
 			break;
-
 		toast_tuple_externalize(&ttc, biggest_attno, options);
 	}
 
@@ -329,9 +328,7 @@ heap_toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
 	}
 	else
 		result_tuple = newtup;
-
 	toast_tuple_cleanup(&ttc);
-
 	return result_tuple;
 }
 
@@ -623,7 +620,7 @@ toast_build_flattened_tuple(TupleDesc tupleDesc,
  * result is the varlena into which the results should be written.
  */
 void
-heap_fetch_toast_slice(Relation toastrel, Oid valueid, int32 attrsize,
+heap_fetch_toast_slice(Relation toastrel, uint64 valueid, int32 attrsize,
 					   int32 sliceoffset, int32 slicelength,
 					   struct varlena *result)
 {
@@ -652,10 +649,17 @@ heap_fetch_toast_slice(Relation toastrel, Oid valueid, int32 attrsize,
 	Assert(endchunk <= totalchunks);
 
 	/* Set up a scan key to fetch from the index. */
+/*
 	ScanKeyInit(&toastkey[0],
 				(AttrNumber) 1,
 				BTEqualStrategyNumber, F_OIDEQ,
 				ObjectIdGetDatum(valueid));
+*/
+	elog(NOTICE, "heap_fetch_toast_slice rel %u val %ld", toastrel->rd_id, valueid);
+	ScanKeyInit(&toastkey[0],
+				(AttrNumber) 1,
+				BTEqualStrategyNumber, F_INT8EQ,
+				Int64GetDatum(valueid));
 
 	/*
 	 * No additional condition if fetching all chunks. Otherwise, use an
@@ -727,7 +731,7 @@ heap_fetch_toast_slice(Relation toastrel, Oid valueid, int32 attrsize,
 		else
 		{
 			/* should never happen */
-			elog(ERROR, "found toasted toast chunk for toast value %u in %s",
+			elog(ERROR, "found toasted toast chunk for toast value %ld in %s",
 				 valueid, RelationGetRelationName(toastrel));
 			chunksize = 0;		/* keep compiler quiet */
 			chunkdata = NULL;
@@ -739,13 +743,13 @@ heap_fetch_toast_slice(Relation toastrel, Oid valueid, int32 attrsize,
 		if (curchunk != expectedchunk)
 			ereport(ERROR,
 					(errcode(ERRCODE_DATA_CORRUPTED),
-					 errmsg_internal("unexpected chunk number %d (expected %d) for toast value %u in %s",
+					 errmsg_internal("unexpected chunk number %d (expected %d) for toast value %ld in %s",
 									 curchunk, expectedchunk, valueid,
 									 RelationGetRelationName(toastrel))));
 		if (curchunk > endchunk)
 			ereport(ERROR,
 					(errcode(ERRCODE_DATA_CORRUPTED),
-					 errmsg_internal("unexpected chunk number %d (out of range %d..%d) for toast value %u in %s",
+					 errmsg_internal("unexpected chunk number %d (out of range %d..%d) for toast value %ld in %s",
 									 curchunk,
 									 startchunk, endchunk, valueid,
 									 RelationGetRelationName(toastrel))));
@@ -754,7 +758,7 @@ heap_fetch_toast_slice(Relation toastrel, Oid valueid, int32 attrsize,
 		if (chunksize != expected_size)
 			ereport(ERROR,
 					(errcode(ERRCODE_DATA_CORRUPTED),
-					 errmsg_internal("unexpected chunk size %d (expected %d) in chunk %d of %d for toast value %u in %s",
+					 errmsg_internal("unexpected chunk size %d (expected %d) in chunk %d of %d for toast value %ld in %s",
 									 chunksize, expected_size,
 									 curchunk, totalchunks, valueid,
 									 RelationGetRelationName(toastrel))));
@@ -783,7 +787,7 @@ heap_fetch_toast_slice(Relation toastrel, Oid valueid, int32 attrsize,
 	if (expectedchunk != (endchunk + 1))
 		ereport(ERROR,
 				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg_internal("missing chunk number %d for toast value %u in %s",
+				 errmsg_internal("missing chunk number %d for toast value %ld in %s",
 								 expectedchunk, valueid,
 								 RelationGetRelationName(toastrel))));
 
diff --git a/src/backend/access/table/toast_helper.c b/src/backend/access/table/toast_helper.c
index 74ba2189f0..91bf2c67c5 100644
--- a/src/backend/access/table/toast_helper.c
+++ b/src/backend/access/table/toast_helper.c
@@ -71,10 +71,10 @@ toast_tuple_init(ToastTupleContext *ttc)
 			 * we have to delete it later.
 			 */
 			if (att->attlen == -1 && !ttc->ttc_oldisnull[i] &&
-				VARATT_IS_EXTERNAL_ONDISK(old_value))
+				(VARATT_IS_EXTERNAL_ONDISK(old_value) || VARATT_IS_LONG_EXTERNAL(old_value)))
 			{
 				if (ttc->ttc_isnull[i] ||
-					!VARATT_IS_EXTERNAL_ONDISK(new_value) ||
+					!(VARATT_IS_EXTERNAL_ONDISK(new_value) || VARATT_IS_LONG_EXTERNAL(new_value)) ||
 					memcmp((char *) old_value, (char *) new_value,
 						   VARSIZE_EXTERNAL(old_value)) != 0)
 				{
@@ -330,7 +330,7 @@ toast_delete_external(Relation rel, Datum *values, bool *isnull,
 
 			if (isnull[i])
 				continue;
-			else if (VARATT_IS_EXTERNAL_ONDISK(value))
+			else if (VARATT_IS_EXTERNAL_ONDISK(value) || VARATT_IS_LONG_EXTERNAL(value))
 				toast_delete_datum(rel, value, is_speculative);
 		}
 	}
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 9bc10729b0..866fb8aa82 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -204,10 +204,16 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 
 	/* this is pretty painful...  need a tuple descriptor */
 	tupdesc = CreateTemplateTupleDesc(3);
+/*
 	TupleDescInitEntry(tupdesc, (AttrNumber) 1,
 					   "chunk_id",
 					   OIDOID,
 					   -1, 0);
+*/
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1,
+					   "chunk_id",
+					   INT8OID,
+					   -1, 0);
 	TupleDescInitEntry(tupdesc, (AttrNumber) 2,
 					   "chunk_seq",
 					   INT4OID,
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index f5f2bc24d8..0f6eb2dbc5 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -814,7 +814,7 @@ logicalrep_write_tuple(StringInfo out, Relation rel, TupleTableSlot *slot,
 			continue;
 		}
 
-		if (att->attlen == -1 && VARATT_IS_EXTERNAL_ONDISK(values[i]))
+		if (att->attlen == -1 && (VARATT_IS_EXTERNAL_ONDISK(values[i]) || VARATT_IS_LONG_EXTERNAL(values[i])))
 		{
 			/*
 			 * Unchanged toasted datum.  (Note that we don't promise to detect
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index b567b8b59e..533f1f3f8e 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -4755,12 +4755,13 @@ ReorderBufferToastReplace(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		struct varlena *varlena;
 
 		/* va_rawsize is the size of the original datum -- including header */
-		struct varatt_external toast_pointer;
+		struct varatt_long_external toast_pointer;
 		struct varatt_indirect redirect_pointer;
 		struct varlena *new_datum = NULL;
 		struct varlena *reconstructed;
 		dlist_iter	it;
 		Size		data_done = 0;
+		uint64	valueid = 0;
 
 		/* system columns aren't toasted */
 		if (attr->attnum < 0)
@@ -4784,14 +4785,15 @@ ReorderBufferToastReplace(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		if (!VARATT_IS_EXTERNAL(varlena))
 			continue;
 
-		VARATT_EXTERNAL_GET_POINTER(toast_pointer, varlena);
+		VARATT_EXTERNAL_GET_LONG_POINTER(toast_pointer, varlena);
 
 		/*
 		 * Check whether the toast tuple changed, replace if so.
 		 */
+		valueid = get_uint64align32(&toast_pointer.va_valueid);
 		ent = (ReorderBufferToastEnt *)
 			hash_search(txn->toast_hash,
-						(void *) &toast_pointer.va_valueid,
+						(void *) &valueid,
 						HASH_FIND,
 						NULL);
 		if (ent == NULL)
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index ca46fba3af..1b8e59ef7d 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -1284,8 +1284,11 @@ pgoutput_row_filter(Relation relation, TupleTableSlot *old_slot,
 		 * VARTAG_INDIRECT. See ReorderBufferToastReplace.
 		 */
 		if (att->attlen == -1 &&
-			VARATT_IS_EXTERNAL_ONDISK(new_slot->tts_values[i]) &&
-			!VARATT_IS_EXTERNAL_ONDISK(old_slot->tts_values[i]))
+			((VARATT_IS_EXTERNAL_ONDISK(new_slot->tts_values[i]) &&
+			!VARATT_IS_EXTERNAL_ONDISK(old_slot->tts_values[i])) ||
+			(VARATT_IS_LONG_EXTERNAL(new_slot->tts_values[i]) &&
+			!(VARATT_IS_LONG_EXTERNAL(old_slot->tts_values[i]) || VARATT_IS_EXTERNAL_ONDISK(old_slot->tts_values[i])))
+			))
 		{
 			if (!tmp_new_slot)
 			{
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index b6a8bbcd59..a821fcbcc9 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -1025,12 +1025,13 @@ AllocSetFree(void *pointer)
 #ifdef MEMORY_CONTEXT_CHECKING
 		{
 			Size		chunk_size = block->endptr - (char *) pointer;
-
 			/* Test for someone scribbling on unused space in chunk */
 			Assert(chunk->requested_size < chunk_size);
 			if (!sentinel_ok(pointer, chunk->requested_size))
-				elog(WARNING, "detected write past chunk end in %s %p",
-					 set->header.name, chunk);
+				elog(WARNING, "1 detected write past chunk end in %s %p req size %ld ch size %ld",
+					 set->header.name, chunk, chunk->requested_size, chunk_size);
+/*				elog(WARNING, "detected write past chunk end in %s %p",
+					 set->header.name, chunk); */
 		}
 #endif
 
@@ -1072,8 +1073,11 @@ AllocSetFree(void *pointer)
 		/* Test for someone scribbling on unused space in chunk */
 		if (chunk->requested_size < GetChunkSizeFromFreeListIdx(fidx))
 			if (!sentinel_ok(pointer, chunk->requested_size))
-				elog(WARNING, "detected write past chunk end in %s %p",
-					 set->header.name, chunk);
+				elog(WARNING, "2 detected write past chunk end in %s %p size %ld f size %ld",
+					 set->header.name, chunk, chunk->requested_size, GetChunkSizeFromFreeListIdx(fidx));
+
+/*				elog(WARNING, "detected write past chunk end in %s %p",
+					 set->header.name, chunk); */
 #endif
 
 #ifdef CLOBBER_FREED_MEMORY
@@ -1148,8 +1152,12 @@ AllocSetRealloc(void *pointer, Size size)
 		/* Test for someone scribbling on unused space in chunk */
 		Assert(chunk->requested_size < oldsize);
 		if (!sentinel_ok(pointer, chunk->requested_size))
+			elog(WARNING, "detected write past chunk end in %s %p  size %ld oldsize %ld",
+				 set->header.name, chunk, chunk->requested_size, oldsize);
+/*
 			elog(WARNING, "detected write past chunk end in %s %p",
 				 set->header.name, chunk);
+*/
 #endif
 
 #ifdef MEMORY_CONTEXT_CHECKING
@@ -1246,10 +1254,11 @@ AllocSetRealloc(void *pointer, Size size)
 
 #ifdef MEMORY_CONTEXT_CHECKING
 	/* Test for someone scribbling on unused space in chunk */
+//	elog(NOTICE, "write past chunk 4 req size %ld oldsize %ld", chunk->requested_size, oldsize);
 	if (chunk->requested_size < oldsize)
 		if (!sentinel_ok(pointer, chunk->requested_size))
-			elog(WARNING, "detected write past chunk end in %s %p",
-				 set->header.name, chunk);
+			elog(WARNING, "detected write past chunk end in %s %p size %ld oldsize %ld",
+				 set->header.name, chunk, chunk->requested_size, oldsize);
 #endif
 
 	/*
@@ -1603,9 +1612,12 @@ AllocSetCheck(MemoryContext context)
 			 */
 			if (dsize != InvalidAllocSize && dsize < chsize &&
 				!sentinel_ok(chunk, ALLOC_CHUNKHDRSZ + dsize))
-				elog(WARNING, "problem in alloc set %s: detected write past chunk end in block %p, chunk %p",
-					 name, block, chunk);
-
+				elog(WARNING, "problem in alloc set %s: detected write past chunk end in block %p, chunk %p size %ld oldsize %ld",
+					 name, block, chunk, dsize, chsize);
+/*
+				elog(WARNING, "problem in alloc set %s: detected write past chunk end in block %p, chunk %p size %ld oldsize %ld",
+					 name, block, chunk, dsize, chsize);
+*/
 			/*
 			 * If chunk is allocated, disallow external access to private part
 			 * of chunk header.
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index b432a92be3..8a83861518 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -671,8 +671,13 @@ GenerationFree(void *pointer)
 	/* Test for someone scribbling on unused space in chunk */
 	Assert(chunk->requested_size < chunksize);
 	if (!sentinel_ok(pointer, chunk->requested_size))
+		elog(WARNING, "detected write past chunk end in %s %p size %ld oldsize %ld",
+			 ((MemoryContext) block->context)->name, chunk, chunk->requested_size, chunksize);
+
+/*
 		elog(WARNING, "detected write past chunk end in %s %p",
 			 ((MemoryContext) block->context)->name, chunk);
+*/
 #endif
 
 #ifdef CLOBBER_FREED_MEMORY
@@ -780,8 +785,12 @@ GenerationRealloc(void *pointer, Size size)
 	/* Test for someone scribbling on unused space in chunk */
 	Assert(chunk->requested_size < oldsize);
 	if (!sentinel_ok(pointer, chunk->requested_size))
+		elog(WARNING, "detected write past chunk end in %s %p size %ld oldsize %ld",
+			 ((MemoryContext) set)->name, chunk, chunk->requested_size, oldsize);
+/*
 		elog(WARNING, "detected write past chunk end in %s %p",
 			 ((MemoryContext) set)->name, chunk);
+*/
 #endif
 
 	/*
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index 6df0839b6a..be5ababf75 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -482,8 +482,12 @@ SlabFree(void *pointer)
 	/* Test for someone scribbling on unused space in chunk */
 	Assert(slab->chunkSize < (slab->fullChunkSize - Slab_CHUNKHDRSZ));
 	if (!sentinel_ok(pointer, slab->chunkSize))
+		elog(WARNING, "detected write past chunk end in %s %p size %ld oldsize %ld",
+			 slab->header.name, chunk, slab->chunkSize, (slab->fullChunkSize - Slab_CHUNKHDRSZ));
+/*
 		elog(WARNING, "detected write past chunk end in %s %p",
 			 slab->header.name, chunk);
+*/
 #endif
 
 	/* compute index of the chunk with respect to block start */
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 7c391aaf0b..880035f8a0 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2683,11 +2683,16 @@ initialize_data_directory(void)
 	fputs(_("performing post-bootstrap initialization ... "), stdout);
 	fflush(stdout);
 
+	snprintf(cmd, sizeof(cmd),
+			 "\"%s\" %s %s template1 %s",
+			 backend_exec, backend_options, extra_options,
+			 debug ? "-d 5" : "");
+/*
 	snprintf(cmd, sizeof(cmd),
 			 "\"%s\" %s %s template1 >%s",
 			 backend_exec, backend_options, extra_options,
 			 DEVNULL);
-
+*/
 	PG_CMD_OPEN;
 
 	setup_auth(cmdfd);
diff --git a/src/include/access/detoast.h b/src/include/access/detoast.h
index b1d8ea09dd..f4381872bf 100644
--- a/src/include/access/detoast.h
+++ b/src/include/access/detoast.h
@@ -27,6 +27,29 @@ do { \
 	memcpy(&(toast_pointer), VARDATA_EXTERNAL(attre), sizeof(toast_pointer)); \
 } while (0)
 
+/* 	memcpy(&(toast_pointer), VARDATA_EXTERNAL(attre), sizeof(varatt_external)); \ */
+/*		memcpy(&(toast_pointer) + sizeof(varatt_external), VARDATA_EXTERNAL(attre) + sizeof(varatt_external), sizeof(uint32)); \  */
+/*
+	Assert(VARSIZE_EXTERNAL(attre) == sizeof(toast_pointer) + VARHDRSZ_EXTERNAL); \
+*/
+ // sizeof(varatt_long_external))
+#define LONG_TOAST_POINTER_SIZE \
+	sizeof(varatt_long_external) + VARHDRSZ_EXTERNAL
+
+#define VARATT_EXTERNAL_GET_LONG_POINTER(toast_pointer, attr) \
+do { \
+	varattrib_1b_e *attre = (varattrib_1b_e *) (attr); \
+	Assert(VARATT_IS_EXTERNAL(attre)); \
+	Assert(VARSIZE_EXTERNAL(attre) == LONG_TOAST_POINTER_SIZE); \
+	if(VARATT_IS_LONG_EXTERNAL(attr)) \
+		memcpy(&(toast_pointer), VARDATA_EXTERNAL(attre), sizeof(toast_pointer)); \
+	else \
+	{ \
+		memcpy(&(toast_pointer), VARDATA_EXTERNAL(attre), sizeof(varatt_external)); \
+		memset(&(toast_pointer) + sizeof(varatt_external), 0, sizeof(uint32)); \
+	} \
+} while (0)
+
 /* Size of an EXTERNAL datum that contains a standard TOAST pointer */
 #define TOAST_POINTER_SIZE (VARHDRSZ_EXTERNAL + sizeof(varatt_external))
 
diff --git a/src/include/access/heaptoast.h b/src/include/access/heaptoast.h
index a75699054a..44832861d8 100644
--- a/src/include/access/heaptoast.h
+++ b/src/include/access/heaptoast.h
@@ -80,11 +80,11 @@
 #define EXTERN_TUPLES_PER_PAGE	4	/* tweak only this */
 
 #define EXTERN_TUPLE_MAX_SIZE	MaximumBytesPerTuple(EXTERN_TUPLES_PER_PAGE)
-
+/* XXX long_external */
 #define TOAST_MAX_CHUNK_SIZE	\
 	(EXTERN_TUPLE_MAX_SIZE -							\
 	 MAXALIGN(SizeofHeapTupleHeader) -					\
-	 sizeof(Oid) -										\
+	 sizeof(uint64) -										\
 	 sizeof(int32) -									\
 	 VARHDRSZ)
 
@@ -142,7 +142,7 @@ extern HeapTuple toast_build_flattened_tuple(TupleDesc tupleDesc,
  *	Fetch a slice from a toast value stored in a heap table.
  * ----------
  */
-extern void heap_fetch_toast_slice(Relation toastrel, Oid valueid,
+extern void heap_fetch_toast_slice(Relation toastrel, uint64 valueid,
 								   int32 attrsize, int32 sliceoffset,
 								   int32 slicelength, struct varlena *result);
 
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 4d1ef405c2..4c587e93ff 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -724,7 +724,8 @@ typedef struct TableAmRoutine
 	 * table implemented by this AM.  See table_relation_fetch_toast_slice()
 	 * for more details.
 	 */
-	void		(*relation_fetch_toast_slice) (Relation toastrel, Oid valueid,
+	/*  Oid valueid, */
+	void		(*relation_fetch_toast_slice) (Relation toastrel, uint64 valueid,
 											   int32 attrsize,
 											   int32 sliceoffset,
 											   int32 slicelength,
@@ -1885,8 +1886,9 @@ table_relation_toast_am(Relation rel)
  * result is caller-allocated space into which the fetched bytes should be
  * stored.
  */
+ /* Oid valueid, */
 static inline void
-table_relation_fetch_toast_slice(Relation toastrel, Oid valueid,
+table_relation_fetch_toast_slice(Relation toastrel, uint64 valueid,
 								 int32 attrsize, int32 sliceoffset,
 								 int32 slicelength, struct varlena *result)
 {
diff --git a/src/include/access/toast_internals.h b/src/include/access/toast_internals.h
index 85e7dc0fc5..b92dd22df9 100644
--- a/src/include/access/toast_internals.h
+++ b/src/include/access/toast_internals.h
@@ -47,11 +47,16 @@ typedef struct toast_compress_header
 
 extern Datum toast_compress_datum(Datum value, char cmethod);
 extern Oid	toast_get_valid_index(Oid toastoid, LOCKMODE lock);
+/* extern bool toastrel_long_valueid_exists(Relation toastrel, uint64 valueid, bool long_ind); */
 
 extern void toast_delete_datum(Relation rel, Datum value, bool is_speculative);
 extern Datum toast_save_datum(Relation rel, Datum value,
 							  struct varlena *oldexternal, int options);
 
+extern uint64 SearchMinUnusedValueid(Oid	toastrelOid);
+extern uint64 SearchTrelCounterCache(Oid	toastrelOid);
+extern uint64 InsertTrelCounterCache(Oid	toastrelOid);
+
 extern int	toast_open_indexes(Relation toastrel,
 							   LOCKMODE lock,
 							   Relation **toastidxs,
diff --git a/src/include/postgres.h b/src/include/postgres.h
index 54730dfb38..f971bad233 100644
--- a/src/include/postgres.h
+++ b/src/include/postgres.h
@@ -72,10 +72,35 @@ typedef struct varatt_external
 	int32		va_rawsize;		/* Original data size (includes header) */
 	uint32		va_extinfo;		/* External saved size (without header) and
 								 * compression method */
-	Oid			va_valueid;		/* Unique ID of value within TOAST table */
 	Oid			va_toastrelid;	/* RelID of TOAST table containing it */
+	Oid			va_valueid;		/* Unique ID of value within TOAST table */
 }			varatt_external;
 
+typedef struct uint64align32
+{
+	uint32	hi;
+	uint32	lo;
+} uint64align32;
+
+#define set_uint64align32(p, v)	\
+	( \
+		(p)->hi = (v) >> 32, \
+		(p)->lo = (v) & 0xffffffff \
+	)
+
+#define get_uint64align32(p)	\
+	(((uint64)((p)->hi)) << 32 | ((uint64)((p)->lo)))
+
+typedef struct varatt_long_external
+{
+	int32				va_rawsize;		/* Original data size (includes header) */
+	uint32			va_extinfo;		/* External saved size (without header) and */
+								 			/* compression method */
+	Oid				va_toastrelid;	/* RelID of TOAST table containing it */
+	uint64align32	va_valueid;		/* Unique ID of value within TOAST table */
+/* 	char				*inline_tail_data; */
+}			varatt_long_external;
+
 /*
  * These macros define the "saved size" portion of va_extinfo.  Its remaining
  * two high-order bits identify the compression method.
@@ -124,6 +149,7 @@ typedef enum vartag_external
 	VARTAG_INDIRECT = 1,
 	VARTAG_EXPANDED_RO = 2,
 	VARTAG_EXPANDED_RW = 3,
+	VARTAG_LONG_EXTERNAL = 4,
 	VARTAG_ONDISK = 18
 } vartag_external;
 
@@ -135,6 +161,7 @@ typedef enum vartag_external
 	((tag) == VARTAG_INDIRECT ? sizeof(varatt_indirect) : \
 	 VARTAG_IS_EXPANDED(tag) ? sizeof(varatt_expanded) : \
 	 (tag) == VARTAG_ONDISK ? sizeof(varatt_external) : \
+	 (tag) == VARTAG_LONG_EXTERNAL ? sizeof(varatt_long_external) : \
 	 (AssertMacro(false), 0))
 
 /*
@@ -313,6 +340,13 @@ typedef struct
  * Other macros here should usually be used only by tuple assembly/disassembly
  * code and code that specifically wants to work with still-toasted Datums.
  */
+/* , datalen) (datalen)*/
+/* (datalen)  + (datalen) */
+/* #define VARATT_LONG_SIZE		((Size) offsetof(varatt_long_external, inline_tail_data)) */
+
+#define VARATT_LONG_SIZE		((Size) sizeof(varatt_long_external))
+#define VARSIZE_LONG_EXTERNAL(PTR)		((Size) VARHDRSZ_EXTERNAL + VARATT_LONG_SIZE)
+
 #define VARDATA(PTR)						VARDATA_4B(PTR)
 #define VARSIZE(PTR)						VARSIZE_4B(PTR)
 
@@ -320,7 +354,10 @@ typedef struct
 #define VARDATA_SHORT(PTR)					VARDATA_1B(PTR)
 
 #define VARTAG_EXTERNAL(PTR)				VARTAG_1B_E(PTR)
-#define VARSIZE_EXTERNAL(PTR)				(VARHDRSZ_EXTERNAL + VARTAG_SIZE(VARTAG_EXTERNAL(PTR)))
+#define VARSIZE_EXTERNAL(PTR)				(VARATT_IS_LONG_EXTERNAL(PTR) ? \
+													VARSIZE_LONG_EXTERNAL(PTR) : \
+													VARHDRSZ_EXTERNAL + VARTAG_SIZE(VARTAG_EXTERNAL(PTR)))
+
 #define VARDATA_EXTERNAL(PTR)				VARDATA_1B_E(PTR)
 
 #define VARATT_IS_COMPRESSED(PTR)			VARATT_IS_4B_C(PTR)
@@ -339,6 +376,8 @@ typedef struct
 	(VARATT_IS_EXTERNAL(PTR) && !VARTAG_IS_EXPANDED(VARTAG_EXTERNAL(PTR)))
 #define VARATT_IS_SHORT(PTR)				VARATT_IS_1B(PTR)
 #define VARATT_IS_EXTENDED(PTR)				(!VARATT_IS_4B_U(PTR))
+#define VARATT_IS_LONG_EXTERNAL(PTR) \
+	(VARATT_IS_EXTERNAL(PTR) && VARTAG_EXTERNAL(PTR) == VARTAG_LONG_EXTERNAL)
 
 #define SET_VARSIZE(PTR, len)				SET_VARSIZE_4B(PTR, len)
 #define SET_VARSIZE_SHORT(PTR, len)			SET_VARSIZE_1B(PTR, len)
diff --git a/src/test/regress/regress.c b/src/test/regress/regress.c
index 2977045cc7..fa59a11e70 100644
--- a/src/test/regress/regress.c
+++ b/src/test/regress/regress.c
@@ -606,7 +606,7 @@ make_tuple_indirect(PG_FUNCTION_ARGS)
 			continue;
 
 		/* copy datum, so it still lives later */
-		if (VARATT_IS_EXTERNAL_ONDISK(attr))
+		if (VARATT_IS_EXTERNAL_ONDISK(attr) || VARATT_IS_LONG_EXTERNAL(attr))
 			attr = detoast_external_attr(attr);
 		else
 		{
-- 
2.25.1


From 59444b9be33dc71bb3434f1d7517272c7802d098 Mon Sep 17 00:00:00 2001
From: Nikita Malakhov <n.malakhov@postgrespro.ru>
Date: Wed, 7 Dec 2022 13:25:42 +0300
Subject: [PATCH 2/2] 64-bit TOAST value ID - removed NOTICE message that
 failed lots of tests and returned initdb to original state.

---
 src/backend/access/heap/heaptoast.c | 1 -
 src/bin/initdb/initdb.c             | 6 +++---
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/src/backend/access/heap/heaptoast.c b/src/backend/access/heap/heaptoast.c
index d5d0042744..ded527c6d7 100644
--- a/src/backend/access/heap/heaptoast.c
+++ b/src/backend/access/heap/heaptoast.c
@@ -655,7 +655,6 @@ heap_fetch_toast_slice(Relation toastrel, uint64 valueid, int32 attrsize,
 				BTEqualStrategyNumber, F_OIDEQ,
 				ObjectIdGetDatum(valueid));
 */
-	elog(NOTICE, "heap_fetch_toast_slice rel %u val %ld", toastrel->rd_id, valueid);
 	ScanKeyInit(&toastkey[0],
 				(AttrNumber) 1,
 				BTEqualStrategyNumber, F_INT8EQ,
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 880035f8a0..60ecdd334b 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2682,17 +2682,17 @@ initialize_data_directory(void)
 	 */
 	fputs(_("performing post-bootstrap initialization ... "), stdout);
 	fflush(stdout);
-
+/*
 	snprintf(cmd, sizeof(cmd),
 			 "\"%s\" %s %s template1 %s",
 			 backend_exec, backend_options, extra_options,
 			 debug ? "-d 5" : "");
-/*
+*/
 	snprintf(cmd, sizeof(cmd),
 			 "\"%s\" %s %s template1 >%s",
 			 backend_exec, backend_options, extra_options,
 			 DEVNULL);
-*/
+
 	PG_CMD_OPEN;
 
 	setup_auth(cmdfd);
-- 
2.25.1

#18Nikita Malakhov
hukutoc@gmail.com
In reply to: Nikita Malakhov (#17)
Re: [PATCH] Infinite loop while acquiring new TOAST Oid

Hi hackers!

Any suggestions on the previous message (64-bit toast value ID with
individual counters)?

On Tue, Dec 13, 2022 at 1:41 PM Nikita Malakhov <hukutoc@gmail.com> wrote:

Hi hackers!

I've prepared a patch with a 64-bit TOAST Value ID. It is a kind of
prototype
and needs some further work, but it is already working and ready to play
with.

I've introduced 64-bit value ID field in varatt_external, but to keep it
compatible
with older bases 64-bit value is stored as a structure of two 32-bit
fields and value
ID moved to be the last in the varatt_external structure. Also, I've
introduced
different ID assignment function - instead of using global OID assignment
function
I use individual counters for each TOAST table, automatically cached and
after
server restart when new value is inserted into TOAST table maximum used
value
is searched and used to assign the next one.

Andres Freund:
I think we'll need to do something about the width of varatt_external to

make

the conversion to 64bit toast oids viable - and if we do, I don't think
there's a decent argument for staying with 4 byte toast OIDs. I think the
varatt_external equivalent would end up being smaller in just about all

cases.

And as you said earlier, the increased overhead inside the toast table /

index

is not relevant compared to the size of toasted datums.

There is some small overhead due to indexing 64-bit values. Also, for
smaller
tables we can use 32-bit ID instead of 64-bit, but then we would have a
problem
when we reach the limit of 2^32.

Pg_upgrade seems to be not a very difficult case if we go keeping old TOAST
tables using 32-bit values,

Please have a look. I'd be grateful for some further directions.

GIT branch with this feature, rebased onto current master:
https://github.com/postgrespro/postgres/tree/toast_64bit

--

Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

#19Gurjeet Singh
gurjeet@singh.im
In reply to: Nikita Malakhov (#18)
Re: [PATCH] Infinite loop while acquiring new TOAST Oid

On Thu, Dec 22, 2022 at 10:07 AM Nikita Malakhov <hukutoc@gmail.com> wrote:

Any suggestions on the previous message (64-bit toast value ID with individual counters)?

Was this patch ever added to CommitFest? I don't see it in the current
Open Commitfest.

https://commitfest.postgresql.org/43/

Best regards,
Gurjeet http://Gurje.et
http://aws.amazon.com

#20Nikita Malakhov
hukutoc@gmail.com
In reply to: Gurjeet Singh (#19)
Re: [PATCH] Infinite loop while acquiring new TOAST Oid

Hi!

No, it wasn't. It was a proposal, I thought I'd get some feedback on it
before sending it to commitfest.

On Sat, Apr 22, 2023 at 6:17 PM Gurjeet Singh <gurjeet@singh.im> wrote:

On Thu, Dec 22, 2022 at 10:07 AM Nikita Malakhov <hukutoc@gmail.com>
wrote:

Any suggestions on the previous message (64-bit toast value ID with

individual counters)?

Was this patch ever added to CommitFest? I don't see it in the current
Open Commitfest.

https://commitfest.postgresql.org/43/

Best regards,
Gurjeet http://Gurje.et
http://aws.amazon.com

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/

#21Aleksander Alekseev
aleksander@timescale.com
In reply to: Nikita Malakhov (#20)
Re: [PATCH] Infinite loop while acquiring new TOAST Oid

Hi,

I agree that we can't simply widen varatt_external to use 8 bytes for
the toast ID in all cases.

+1

Note that the user may have a table with multiple TOASTable
attributes. If we simply widen the TOAST pointer it may break the
existing tables in the edge case. Also this may be a reason why
certain users may prefer to continue using narrow pointers. IMO wide
TOAST pointers should be a table option. Whether the default for new
tables should be wide or narrow pointers is debatable.

In another discussion [1]https://commitfest.postgresql.org/43/3626/ we seem to agree that we also want to have
an ability to include a 32-bit dictionary_id to the TOAST pointers and
perhaps support more compression methods (ZSTD to name one). Besides
that it would be nice to have an ability to extend TOAST pointers in
the future without breaking the existing pointers. One possible
solution would be to add a varint feature bitmask to every pointer. So
we could add flags like TOAST_IS_WIDE, TOAST_HAS_DICTIONARY,
TOAST_UNKNOWN_FEATURE_FROM_2077, etc indefinitely.

I suggest we address all the current and future needs once and
completely refactor TOAST pointers rather than solving one problem at
a time. I believe this will be more beneficial for the community in
the long term.

Thoughts?

[1]: https://commitfest.postgresql.org/43/3626/

--
Best regards,
Aleksander Alekseev

#22Nikita Malakhov
hukutoc@gmail.com
In reply to: Aleksander Alekseev (#21)
Re: [PATCH] Infinite loop while acquiring new TOAST Oid

Hi!

Widening of a TOAST pointer is possible if we keep backward compatibility
with
old-fashioned TOAST tables - I mean differ 'long' and 'short' TOAST
pointers and
process them accordingly on insert and delete cases, and vacuum with logical
replication. It is not very difficult, however it takes some effort.
Recently I've found
out that I have not overseen all compatibility cases, so the provided patch
is
functional but limited in compatibility.

We already have a flag byte in the TOAST pointer which is responsible for
the type
of the pointer - va_flag field. It was explained in the Pluggable TOAST
patch.
One is enough, there is no need to add another one.

On Wed, Apr 26, 2023 at 3:55 PM Aleksander Alekseev <
aleksander@timescale.com> wrote:

Hi,

I agree that we can't simply widen varatt_external to use 8 bytes for
the toast ID in all cases.

+1

Note that the user may have a table with multiple TOASTable
attributes. If we simply widen the TOAST pointer it may break the
existing tables in the edge case. Also this may be a reason why
certain users may prefer to continue using narrow pointers. IMO wide
TOAST pointers should be a table option. Whether the default for new
tables should be wide or narrow pointers is debatable.

In another discussion [1] we seem to agree that we also want to have
an ability to include a 32-bit dictionary_id to the TOAST pointers and
perhaps support more compression methods (ZSTD to name one). Besides
that it would be nice to have an ability to extend TOAST pointers in
the future without breaking the existing pointers. One possible
solution would be to add a varint feature bitmask to every pointer. So
we could add flags like TOAST_IS_WIDE, TOAST_HAS_DICTIONARY,
TOAST_UNKNOWN_FEATURE_FROM_2077, etc indefinitely.

I suggest we address all the current and future needs once and
completely refactor TOAST pointers rather than solving one problem at
a time. I believe this will be more beneficial for the community in
the long term.

Thoughts?

[1]: https://commitfest.postgresql.org/43/3626/

--
Best regards,
Aleksander Alekseev

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/

#23Aleksander Alekseev
aleksander@timescale.com
In reply to: Nikita Malakhov (#22)
Re: [PATCH] Infinite loop while acquiring new TOAST Oid

Hi Nikita,

I've prepared a patch with a 64-bit TOAST Value ID. It is a kind of prototype
and needs some further work, but it is already working and ready to play with.

Unfortunately the patch rotted a bit. Could you please submit an
updated/rebased patch so that it could be reviewed in the scope of
July commitfest?

--
Best regards,
Aleksander Alekseev

#24Nikita Malakhov
hukutoc@gmail.com
In reply to: Aleksander Alekseev (#23)
Re: [PATCH] Infinite loop while acquiring new TOAST Oid

Hi!

Aleksander, thank you for reminding me of this patch, try to do it in a few
days.

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/

#25Aleksander Alekseev
aleksander@timescale.com
In reply to: Nikita Malakhov (#24)
Re: [PATCH] Infinite loop while acquiring new TOAST Oid

Hi,

Aleksander, thank you for reminding me of this patch, try to do it in a few days.

A consensus was reached [1]/messages/by-id/0737f444-59bb-ac1d-2753-873c40da0840@eisentraut.org to mark this patch as RwF for now. There
are many patches to be reviewed and this one doesn't seem to be in the
best shape, so we have to prioritise. Please feel free re-submitting
the patch for the next commitfest.

[1]: /messages/by-id/0737f444-59bb-ac1d-2753-873c40da0840@eisentraut.org

--
Best regards,
Aleksander Alekseev

#26Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#13)
Re: [PATCH] Infinite loop while acquiring new TOAST Oid

On Mon, Nov 28, 2022 at 05:24:29PM -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

And as you said earlier, the increased overhead inside the toast table / index
is not relevant compared to the size of toasted datums.

Perhaps not.

(Replying to an old thread here, due to some work involving more
compression methods in this area.)

I agree that we can't simply widen varatt_external to use 8 bytes for
the toast ID in all cases. Also, I now get the point about avoiding
use of globally assigned OIDs here: if the counter starts from zero
for each table, then a variable-width varatt_external could actually
be smaller than currently for many cases. However, that bit is somewhat
orthogonal, and it's certainly not required for fixing the basic problem.

So it seems like the plan of attack ought to be:

1. Invent a new form or forms of varatt_external to allow different
widths of the toast ID. Use the narrowest width possible for any
given ID value.

2. Allow TOAST tables/indexes to store either 4-byte or 8-byte IDs.
(Conversion could be done as a side effect of table-rewrite
operations, perhaps.)

3. Localize ID selection so that tables can have small toast IDs
even when other tables have many IDs. (Optional, could do later.)

Hmm, I was reading this thread, and I am wondering if tackling the
problem in the opposite order is not actually the path of least pain.
Doing 3 first where we would switch from a global source assigning the
toast IDs to a local source for each toast relation can lift quite a
bit of the pain we have, especially if we rely on a sequence attached
to the toast table to do all the value assignment work (WAL logging,
already 8 bytes, etc.). A new vartag_external would still be required
to store a 4-byte toast ID, and it could always be extended to 8 byte
if necessary. That sounds like having more benefits.
--
Michael