overflow bug for inhcounts
Per a review comment on the not-null constraint patch from Jian He, I
played a bit with the various inhcount variables in the tree, with this
simple test which creates a child table and 2^16+1 parents:
for i in `seq 1 $((2 ** 16 + 1))`; do
    echo "create table parent_$i (a int);"
done | psql

(echo "create table child () inherits (";
 for i in `seq 1 $((2 ** 16))`; do
     echo -n "parent_$i,";
 done;
 echo "parent_$((2 ** 16 + 1)));") | psql
(Needs a larger-than-default max_locks_per_transaction). With current
master, this command finishes successfully, but sets attinhcount to 1
for the column:
select attname, attnum, attinhcount from pg_attribute
  where attrelid = 'child'::regclass and attnum > 0;

 attname │ attnum │ attinhcount
─────────┼────────┼─────────────
 a       │      1 │           1
This is because ColumnDef->inhcount is a 32-bit int, but
Form_pg_attribute->attinhcount is int16, so we didn't break the overflow
test for ColumnDef inhcount, but attinhcount has overflowed during
assignment.
At this point we can disinherit the table from a single parent (which
brings attinhcount to 0) and then drop the column. This breaks
any query that examines one of the other 2^16 parents:
=# alter table child no inherit parent_1;
ALTER TABLE
=# alter table child drop column a;
ALTER TABLE
=# select * from parent_2;
ERROR: could not find inherited attribute "a" of relation "child"
I suppose the same could happen with CookedConstraint->inhcount, which
is also 32-bit int, but I didn't build an example for that.
In branch master, I propose we change those two members to int16
(ColumnDef->inhcount and CookedConstraint->inhcount) to make those
counters consistently use the same type; and then use
pg_add_s16_overflow() instead of ++ for the increments, as in the
attached patch. With this patch, the child table creation fails as
expected ("too many inheritance parents").
In stable branches, I see two possible approaches: we could use the same
patch as master (but add another int16 to the struct as padding, to
avoid changing the struct layout), or, less intrusively, we could leave
that alone and instead change the overflow test after the addition to
check inhcount > PG_INT16_MAX instead of inhcount < 0. Or we could
leave it all alone.
Oh and actually, we could change all these variables to be unsigned,
since there's no use for negative inhcounts. The patch doesn't do that;
it'd require changing the subtraction paths to use overflow-protected
ops as well.
Opinions?
(I'm not terribly enthused about adding a test that creates a child
table with 2^16 parents, because of the added runtime -- on my machine
the scripts above take about 4 seconds.)
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Things are good or bad according as our opinion makes them" (Lysias)
Attachments:
0001-Unbreak-overflow-test-for-attinhcount-coninhcount.patch
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> ... This is because ColumnDef->inhcount is a 32-bit int, but
> Form_pg_attribute->attinhcount is int16, so we didn't break the overflow
> test for ColumnDef inhcount, but attinhcount has overflowed during
> assignment.
Ugh ... somebody's ancient oversight there. Or maybe attinhcount
was once int32, and we narrowed it for space reasons?
> In branch master, I propose we change those two members to int16
> (ColumnDef->inhcount and CookedConstraint->inhcount) to make those
> counters consistently use the same type; and then use
> pg_add_s16_overflow() instead of ++ for the increments, as in the
> attached patch. With this patch, the child table creation fails as
> expected ("too many inheritance parents").
+1. I didn't check if there were any other places to touch, but
this looks like a good solution for master.
> In stable branches, I see two possible approaches: we could use the same
> patch as master (but add another int16 to the struct as padding, to
> avoid changing the struct layout),
That would not preserve ABI on machines with the wrong endianness.
> or, less intrusively, we could leave
> that alone and instead change the overflow test after the addition to
> check inhcount > PG_INT16_MAX instead of inhcount < 0. Or we could
> leave it all alone.
On the whole I'd leave it alone in back branches. Nobody who's not
intentionally trying to break their table will hit this.
> (I'm not terribly enthused about adding a test that creates a child
> table with 2^16 parents, because of the added runtime -- on my machine
> the scripts above take about 4 seconds.)
Agreed, too expensive for the value.
regards, tom lane
Hi,
On 2024-10-08 18:11:39 +0200, Alvaro Herrera wrote:
> Oh and actually, we could change all these variables to be unsigned,
> since there's no use for negative inhcounts. The patch doesn't do that;
> it'd require changing the subtraction paths to use overflow-protected
> ops as well.
Unfortunately we don't really have a way to represent unsigned numbers on the
SQL level today. So I'd not go there for now - it's not like this is a real
limitation for practical use cases.
One case where I'd like unsigned numbers is pg_class.relpages - it's pretty
awkward that it "looks" negative for large tables. 16TB isn't that large
anymore...
Greetings,
Andres
Oid
CreateConstraintEntry(const char *constraintName,
                      Oid constraintNamespace,
                      char constraintType,
                      ....
                      int conInhCount,
                      bool conNoInherit,
                      bool conPeriod,
                      bool is_internal)
We can change
    int conInhCount,
to
    int16 conInhCount.
Doing that means we also need to refactor some of the caller functions.

We can also change the Constraint node's inhcount field from int to int16.
The attached patch does this, based on your patch.

While at it, I made minor changes to the comments of CreateConstraintEntry's
callers.
Attachments:
further_refactor_intcount.no-cfbot
On 2024-Oct-08, Tom Lane wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > ... This is because ColumnDef->inhcount is a 32-bit int, but
> > Form_pg_attribute->attinhcount is int16, so we didn't break the overflow
> > test for ColumnDef inhcount, but attinhcount has overflowed during
> > assignment.
>
> Ugh ... somebody's ancient oversight there. Or maybe attinhcount
> was once int32, and we narrowed it for space reasons?
Yeah, both attinhcount and coninhcount were narrowed in 90189eefc1e1
during REL_16_STABLE development, so it's a relatively new problem.
> > In branch master, I propose we change those two members to int16
> > (ColumnDef->inhcount and CookedConstraint->inhcount) to make those
> > counters consistently use the same type; and then use
> > pg_add_s16_overflow() instead of ++ for the increments, as in the
> > attached patch. With this patch, the child table creation fails as
> > expected ("too many inheritance parents").
>
> +1. I didn't check if there were any other places to touch, but
> this looks like a good solution for master.
Thanks for looking.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"The conclusion we can draw from these studies is that
we can draw no conclusion from them" (Tanenbaum)
On 2024-Oct-09, jian he wrote:
> CreateConstraintEntry(const char *constraintName,

> We can change
>     int conInhCount,
> to
>     int16 conInhCount.
> Doing that means we also need to refactor some of the caller functions.
Good thought, thanks.
> We can also change the Constraint node's inhcount field from int to int16.
Nah, that field is unused, so I removed it. I gave another look at the
not-nulls branch, and we can do without it.
This is pushed, although there was a technical glitch (since fixed by
Magnus Hagander) and the pgsql-committers message was not sent. But you
can see it here:
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/