index unique
Hi,
postgres 12 with postgis.
on a table we need a primary key and to get a unique combinaison, we need 3
columns of that table:
1 of type integer,
1 of type text,
1 of type geometry
creating the PK constraint doesn work: (even with our current small data
set)
*ERROR: index row size 6072 exceeds btree version 4 maximum 2704 for index
"xxx_spkey"DETAIL: Index row references tuple (32,1) in relation
"xxx".HINT: Values larger than 1/3 of a buffer page cannot be
indexed.Consider a function index of an MD5 hash of the value, or use full
text indexing.*
*ok. we can do this.*
*but if so, we need to create a gist index on the geometry column to do any
topology request.*
*so 2 indexes containing this single column.*
*if we install extension btree_gist, no pb to create an index on all 3
columns.*
*but as gist does not support unicity, this index cannot be used for the
PK.*
*OK, we may try to use a function to get the bounding box around the
geometry objects and use the result into a btree index........*
*Any idea (I mean: another idea !) to tackle this ?*
*Or any critic on the "solution" ??*
*thanks,*
Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com
Primary key is going to be a BTREE index. I'm surprised you require the geometry in order to achieve uniqueness?
You can't put the geometry into a BTREE because it's too large.
You could add a column and stick the MD5 hash of the geometry there, and use that as the last piece of uniqueness?
If the bbox of the geometry is "good enough" for your key, that kind of speaks to the idea that maybe your geometry doesn't need to be part of the PK? It's possible for different geometries to have the same bbox...
ATB,
P
Show quoted text
On Jun 3, 2021, at 1:51 PM, Marc Millas <marc.millas@mokadb.com> wrote:
Hi,
postgres 12 with postgis.
on a table we need a primary key and to get a unique combinaison, we need 3 columns of that table:
1 of type integer,
1 of type text,
1 of type geometrycreating the PK constraint doesn work: (even with our current small data set)
ERROR: index row size 6072 exceeds btree version 4 maximum 2704 for index "xxx_spkey"
DETAIL: Index row references tuple (32,1) in relation "xxx".
HINT: Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.ok. we can do this.
but if so, we need to create a gist index on the geometry column to do any topology request.
so 2 indexes containing this single column.if we install extension btree_gist, no pb to create an index on all 3 columns.
but as gist does not support unicity, this index cannot be used for the PK.OK, we may try to use a function to get the bounding box around the geometry objects and use the result into a btree index........
Any idea (I mean: another idea !) to tackle this ?
Or any critic on the "solution" ??thanks,
Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com
Hi Paul,
obviously the bounding box is not a perfect choice. we also think of one of
the "centers" point, but quite similar non uniqueness.
so, if no "tech" solution, we continue to work with the business to try to
find an appropriate PK
thanks
Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com
On Thu, Jun 3, 2021 at 10:57 PM Paul Ramsey <pramsey@cleverelephant.ca>
wrote:
Show quoted text
Primary key is going to be a BTREE index. I'm surprised you require the
geometry in order to achieve uniqueness?
You can't put the geometry into a BTREE because it's too large.
You could add a column and stick the MD5 hash of the geometry there, and
use that as the last piece of uniqueness?
If the bbox of the geometry is "good enough" for your key, that kind of
speaks to the idea that maybe your geometry doesn't need to be part of the
PK? It's possible for different geometries to have the same bbox...
ATB,
POn Jun 3, 2021, at 1:51 PM, Marc Millas <marc.millas@mokadb.com> wrote:
Hi,
postgres 12 with postgis.
on a table we need a primary key and to get a unique combinaison, weneed 3 columns of that table:
1 of type integer,
1 of type text,
1 of type geometrycreating the PK constraint doesn work: (even with our current small data
set)
ERROR: index row size 6072 exceeds btree version 4 maximum 2704 for
index "xxx_spkey"
DETAIL: Index row references tuple (32,1) in relation "xxx".
HINT: Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full textindexing.
ok. we can do this.
but if so, we need to create a gist index on the geometry column to doany topology request.
so 2 indexes containing this single column.
if we install extension btree_gist, no pb to create an index on all 3
columns.
but as gist does not support unicity, this index cannot be used for the
PK.
OK, we may try to use a function to get the bounding box around the
geometry objects and use the result into a btree index........
Any idea (I mean: another idea !) to tackle this ?
Or any critic on the "solution" ??thanks,
Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com
On 2021-06-03 22:51:55 +0200, Marc Millas wrote:
postgres 12 with postgis.
on a table we need a primary key and to get a unique combinaison, we need 3
columns of that table:
1 of type integer,
1 of type text,
1 of type geometrycreating the PK constraint doesn work:
I find that if a natural primary key candidate is so complex, it is
usually better to use a surrogate key.
hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
On Sun, Jun 6, 2021 at 11:55 PM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2021-06-03 22:51:55 +0200, Marc Millas wrote:
postgres 12 with postgis.
on a table we need a primary key and to get a unique combinaison, weneed 3
columns of that table:
1 of type integer,
1 of type text,
1 of type geometrycreating the PK constraint doesn work:
I find that if a natural primary key candidate is so complex, it is
usually better to use a surrogate key.
You make this sound like an either-or proposition, but personally it takes
a very exceptional circumstance to forgo defining a unique natural key.
Whether I choose to supplement that with a surrogate key is a different
matter altogether.
In this case identity would seem to make more sense using labels, not
composition. For instance, on a substance table I would have the
identifier for water be "H20", and have the chemical composition of water
be "H20" (not the best example...). In this case ensuring uniqueness of
the formula-to-label dependency would be trivial to implement and I
probably would too - but in the case of geometry I'd just accept that using
an index to do this would not be possible and, if I really needed
reassurance of geometry uniqueness, I would do so in triggers.
David J.
On 2021-06-07 10:20:22 -0700, David G. Johnston wrote:
On Sun, Jun 6, 2021 at 11:55 PM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2021-06-03 22:51:55 +0200, Marc Millas wrote:postgres 12 with postgis.
on a table we need a primary key and to get a unique combinaison, we need
3 columns of that table:
1 of type integer,
1 of type text,
1 of type geometrycreating the PK constraint doesn work:
I find that if a natural primary key candidate is so complex, it is
usually better to use a surrogate key.You make this sound like an either-or proposition,
While he is talking about *a* primary key, it should be *the* primary
key. There can be only one (that's why it is the primary key).
There can be several unique keys, though.
but personally it takes a very exceptional circumstance to forgo
defining a unique natural key.
True, but not what he's talking about.
hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
quite funny to see how a tech question seems to end into an english grammar
thing :-)
quote
You make this sound like an either-or proposition,
While he is talking about *a* primary key, it should be *the* primary
key. There can be only one (that's why it is the primary key).
There can be several unique keys, though.
but personally it takes a very exceptional circumstance to forgo
defining a unique natural key.
True, but not what he's talking about.
end quote
when I did write the question, I write: "on a table we need a primary key"
its quite obvious that there is only one PK, but, at that stage, I
dont need one for functional reasons
so, I need "a" primary key, whatever it is, just because postgres logical
replication needs one to accept updates.
then, to build "the" needed PK, we need to decide which columns or set of
will be appropriate
as it has to be able to identify something unique. and this is how we get
to this list of 3 columns.
So, ok, this set doesn't work asis.
all solutions we thought of are just tech workaround, providing no
functionnal meaning.
still, its somehow surprising: (at least to me !)
postgres has no problem creating a btree for a geometry column, and, so, no
problem for putting a geom column in a PK.
(very different from Oracle...)
the only pb observed is the size of the object accepted. if the geom is a
bit "big" then the index errors.about btree size of index object.
but if I create a table test_l with a text column blabla as a PK, and
insert a 100 000 character long string , no pb.
if I do an explain analyze select blabla from test_l order by blabla,
postgres tells me that it did an index only scan.
so, I wonder why Postgres is able to put a 100 000 long text in a btree
index and NOT a geom column which wkt is 10 000 bytes long.
there, sure, is a reason.
please, let me know :-)
Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com
On Tue, Jun 8, 2021 at 12:09 AM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
Show quoted text
On 2021-06-07 10:20:22 -0700, David G. Johnston wrote:
On Sun, Jun 6, 2021 at 11:55 PM Peter J. Holzer <hjp-pgsql@hjp.at>
wrote:
On 2021-06-03 22:51:55 +0200, Marc Millas wrote:
postgres 12 with postgis.
on a table we need a primary key and to get a unique combinaison,we need
3 columns of that table:
1 of type integer,
1 of type text,
1 of type geometrycreating the PK constraint doesn work:
I find that if a natural primary key candidate is so complex, it is
usually better to use a surrogate key.You make this sound like an either-or proposition,
While he is talking about *a* primary key, it should be *the* primary
key. There can be only one (that's why it is the primary key).There can be several unique keys, though.
but personally it takes a very exceptional circumstance to forgo
defining a unique natural key.True, but not what he's talking about.
hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
On 2021-06-08 18:30:16 +0200, Marc Millas wrote:
the only pb observed is the size of the object accepted. if the geom is a bit
"big" then the index errors.about btree size of index object.
but if I create a table test_l with a text column blabla as a PK, and insert a
100 000 character long string , no pb.
if I do an explain analyze select blabla from test_l order by blabla, postgres
tells me that it did an index only scan.
so, I wonder why Postgres is able to put a 100 000 long text in a btree index
and NOT a geom column which wkt is 10 000 bytes long.
Good question. Maybe your texts compresses better than your geometries:
hjp=> insert into t(t) values(repeat('a', 235327) || '1');
INSERT 0 1
Time: 60.057 ms
hjp=> insert into t(t) values(repeat('a', 235328) || '1');
ERROR: index row size 2720 exceeds maximum 2712 for index "t_pkey1"
HINT: Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.
Time: 58.751 ms
Note the difference between the length of the string I was trying to
insert and the length of the row it complains about.
hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
Cristal clear !
and it have to be the case as my test was done with some not so random data.
but this mean that we cannot put a bunch of datatypes in a PK, as soon as
it may be longer than 2701.
I know, its clearly stated in postgres doc about btree," The only
limitation is that an index entry cannot exceed approximately one-third of
a page (after TOAST compression, if applicable)."
but as this looks quite hard coded, it means that for long utf8 things the
data length is not so long before hitting the limit.
is there any plan to adress this ?
kind of index toast ?? :-)
thanks
Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com
On Tue, Jun 8, 2021 at 8:42 PM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
Show quoted text
On 2021-06-08 18:30:16 +0200, Marc Millas wrote:
the only pb observed is the size of the object accepted. if the geom is
a bit
"big" then the index errors.about btree size of index object.
but if I create a table test_l with a text column blabla as a PK, andinsert a
100 000 character long string , no pb.
if I do an explain analyze select blabla from test_l order by blabla,postgres
tells me that it did an index only scan.
so, I wonder why Postgres is able to put a 100 000 long text in a btreeindex
and NOT a geom column which wkt is 10 000 bytes long.
Good question. Maybe your texts compresses better than your geometries:
hjp=> insert into t(t) values(repeat('a', 235327) || '1');
INSERT 0 1
Time: 60.057 ms
hjp=> insert into t(t) values(repeat('a', 235328) || '1');
ERROR: index row size 2720 exceeds maximum 2712 for index "t_pkey1"
HINT: Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text
indexing.
Time: 58.751 msNote the difference between the length of the string I was trying to
insert and the length of the row it complains about.hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
On Tuesday, June 8, 2021, Marc Millas <marc.millas@mokadb.com> wrote:
but as this looks quite hard coded, it means that for long utf8 things
the data length is not so long before hitting the limit.is there any plan to adress this ?
None that I’ve seen, and I don’t expect to see one either. Mainly because
I’ve yet to see or think of a use case that would warrant even considering
doing such, ignoring the fact that actually changing such a value is likely
to be a non-starter from a compatability perspective.
David J.
Marc Millas <marc.millas@mokadb.com> writes:
I know, its clearly stated in postgres doc about btree," The only
limitation is that an index entry cannot exceed approximately one-third of
a page (after TOAST compression, if applicable)."
Yup.
is there any plan to adress this ?
No. The cost/benefit ratio seems completely untenable.
The usual recommendation is that you shorten long values by hashing
them, eg create a unique index on md5(my_long_column).
regards, tom lane
Marc Millas schrieb am 03.06.2021 um 22:51:
on a table we need a primary key and to get a unique combinaison, we need 3 columns of that table:
1 of type integer,
1 of type text,
1 of type geometrycreating the PK constraint doesn work: (even with our current small data set)
ERROR: index row size 6072 exceeds btree version 4 maximum 2704 for index "xxx_spkey"
DETAIL: Index row references tuple (32,1) in relation "xxx".
HINT: Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.ok. we can do this.
but if so, we need to create a gist index on the geometry column to do any topology request.
so 2 indexes containing this single column.if we install extension btree_gist, no pb to create an index on all 3 columns.
but as gist does not support unicity, this index cannot be used for the PK.OK, we may try to use a function to get the bounding box around the geometry objects and use the result into a btree index........
Any idea (I mean: another idea !) to tackle this ?
Or any critic on the "solution" ??
How do you define the "uniqueness" of the geometry?
GIST can support "uniqueness" through exclusion constraints.
It's not a primary key, so you can't create foreign keys referencing that table,
but it does ensure uniqueness (In fact the "normal" unique indexes are
essentially a special case of exclusion constraints)
create index on the_table using gist (int_column with =, text_col with =, geometry_col with &&);
Replace the && operator with whatever is appropriate for your use case.
Thomas
Thanks Thomas,
but, as stated after the first post, the need was for a PK as asked by
postgres (ie. for tech needs, not for functionnal needs)
up to now, looks like we must create a PK (and so, the associated index)
just to answer logical replication needs.(and qgis which also needs a PK)
that index (some kind of hash on the geom column + the other 2). have no
meaning on a functionnal point of view, and there are chances that it will
never be used by postgres in normal use..
as the hash will not help on any topological request...
that was the reason of the first mail: as we must create a PK, is there any
way to make something useful and not this unuseful "thing" ?
Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com
On Tue, Jun 8, 2021 at 10:51 PM Thomas Kellerer <shammat@gmx.net> wrote:
Show quoted text
Marc Millas schrieb am 03.06.2021 um 22:51:
on a table we need a primary key and to get a unique combinaison, we
need 3 columns of that table:
1 of type integer,
1 of type text,
1 of type geometrycreating the PK constraint doesn work: (even with our current small data
set)
ERROR: index row size 6072 exceeds btree version 4 maximum 2704 for
index "xxx_spkey"
DETAIL: Index row references tuple (32,1) in relation "xxx".
HINT: Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full textindexing.
ok. we can do this.
but if so, we need to create a gist index on the geometry column to doany topology request.
so 2 indexes containing this single column.
if we install extension btree_gist, no pb to create an index on all 3
columns.
but as gist does not support unicity, this index cannot be used for the
PK.
OK, we may try to use a function to get the bounding box around the
geometry objects and use the result into a btree index........
Any idea (I mean: another idea !) to tackle this ?
Or any critic on the "solution" ??How do you define the "uniqueness" of the geometry?
GIST can support "uniqueness" through exclusion constraints.
It's not a primary key, so you can't create foreign keys referencing that
table,
but it does ensure uniqueness (In fact the "normal" unique indexes are
essentially a special case of exclusion constraints)create index on the_table using gist (int_column with =, text_col with
=, geometry_col with &&);Replace the && operator with whatever is appropriate for your use case.
Thomas
On 8 Jun 2021, at 22:50, Thomas Kellerer <shammat@gmx.net> wrote:
Marc Millas schrieb am 03.06.2021 um 22:51:
on a table we need a primary key and to get a unique combinaison, we need 3 columns of that table:
1 of type integer,
1 of type text,
1 of type geometryHow do you define the "uniqueness" of the geometry?
That is actually the big question here. Multiple “unique” geometries can specify the same geometry!
A geom as simple as a line from (0,0) - (1,0) can just as easily be specified as (1,0) - (0,0). That’s the simplest case, and one could argue that the point of origin is different, but the next example would be a triangle starting at the same origin but traversed in different directions. It gets harder the more vertices a polygon has.
I would argue that a geometry type is ill-suited as a primary key column candidate.
Now, of course, the OP could have a case where their geometries are guaranteed to be unique regardless, but they’d better make sure before adding them to the PK.
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
Hi Alban,
I plainly agree on the uniqueness thing. and on the fact that a PK with
only a geometry column can be considered somehow "ill suited".
That said, the PK we finally use contains, as said, 3 columns:
--an id (integer column)
--a topology describer (the text column)
--and the geometry column.
this set IS unique in our dataset. and helps provide some quality checks
(wrong id, non coherent tolology,. and some geometry errors..)
anyway, my original post was about the fact that we must create a PK based
on some kind of work around the limited btree length behaviour thus
creating an index that will never be of any use, just for technical reasons.
People in this list have been quite clear that the btree limitation will
NOT change. Which ends the point.
regards
Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com
On Thu, Jun 10, 2021 at 10:33 PM Alban Hertroys <haramrae@gmail.com> wrote:
Show quoted text
On 8 Jun 2021, at 22:50, Thomas Kellerer <shammat@gmx.net> wrote:
Marc Millas schrieb am 03.06.2021 um 22:51:
on a table we need a primary key and to get a unique combinaison, we
need 3 columns of that table:
1 of type integer,
1 of type text,
1 of type geometryHow do you define the "uniqueness" of the geometry?
That is actually the big question here. Multiple “unique” geometries can
specify the same geometry!A geom as simple as a line from (0,0) - (1,0) can just as easily be
specified as (1,0) - (0,0). That’s the simplest case, and one could argue
that the point of origin is different, but the next example would be a
triangle starting at the same origin but traversed in different directions.
It gets harder the more vertices a polygon has.I would argue that a geometry type is ill-suited as a primary key column
candidate.Now, of course, the OP could have a case where their geometries are
guaranteed to be unique regardless, but they’d better make sure before
adding them to the PK.Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
On 2021-06-11 14:37:57 +0200, Marc Millas wrote:
anyway, my original post was about the fact that we must create a PK based on
some kind of work around the limited btree length behaviour thus creating an
index that will never be of any use, just for technical reasons.
Actually, that index will be used, and it will be used every time you
update, insert, or delete a row - by the logical replication system
which has to identify the affected row(s) on the secondary system.
hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"