BRIN cost estimate breaks geometric indexes
Hi,
Found out today that BRIN indexes don't really work for PostGIS and box
datatypes.
Since
https://github.com/postgres/postgres/commit/7e534adcdc70866e7be74d626b0ed067c890a251
Postgres
requires datatype to provide correlation statistics. Such statistics wasn't
provided by PostGIS and box types.
Today I tried to replace a 200gb gist index with 8mb brin index and queries
didn't work as expected - it was never used. set enable_seqscan=off helped
for a bit but that's not a permanent solution.
Plans for context:
https://gist.github.com/Komzpa/2cd396ec9b65e2c93341e9934d974826
Debugging session on #postgis IRC channel leads to this ticket to create a
(not that meaningful) correlation statistics for geometry datatype:
https://trac.osgeo.org/postgis/ticket/4625#ticket
Postgres Professional mentioned symptoms of the issue in their in-depth
manual: https://habr.com/ru/company/postgrespro/blog/346460/ - box datatype
showed same unusable BRIN symptoms for them.
A reasonable course of action on Postgres side seems to be to not assume
selectivity of 1 in absence of correlation statistics, but something that
would prefer such an index to a parallel seq scan, but higher than similar
GIST.
Any other ideas?
--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
On 21.01.2020 0:00, Darafei "Komяpa" Praliaskouski wrote:
Hi,
Found out today that BRIN indexes don't really work for PostGIS and
box datatypes.Since
https://github.com/postgres/postgres/commit/7e534adcdc70866e7be74d626b0ed067c890a251 Postgres
requires datatype to provide correlation statistics. Such statistics
wasn't provided by PostGIS and box types.Today I tried to replace a 200gb gist index with 8mb brin index and
queries didn't work as expected - it was never used. set
enable_seqscan=off helped for a bit but that's not a permanent solution.
Plans for context:
https://gist.github.com/Komzpa/2cd396ec9b65e2c93341e9934d974826Debugging session on #postgis IRC channel leads to this ticket to
create a (not that meaningful) correlation statistics for geometry
datatype: https://trac.osgeo.org/postgis/ticket/4625#ticketPostgres Professional mentioned symptoms of the issue in their
in-depth manual:
https://habr.com/ru/company/postgrespro/blog/346460/ - box datatype
showed same unusable BRIN symptoms for them.
(Translated to English:
https://habr.com/en/company/postgrespro/blog/452900/)
A reasonable course of action on Postgres side seems to be to not
assume selectivity of 1 in absence of correlation statistics, but
something that would prefer such an index to a parallel seq scan, but
higher than similar GIST.Any other ideas?
As far as I understand, correlation is computed only for sortable types,
which means that the current concept of correlation works as intended
only for B-tree indexes.
Ideally, correlation should be computed for (attribute, index) pair,
taking into account order of values returned by the index scan. Less
ideal but more easier approach can be to ignore the computed correlation
for any index access except B-tree, and just assume some predefined
constant.
Hi,
Patch may look as simple as this one:
https://patch-diff.githubusercontent.com/raw/postgres/postgres/pull/49.diff
Previous mention in -hackers is available at
https://postgrespro.com/list/id/CAKJS1f9n-Wapop5Xz1dtGdpdqmzeGqQK4sV2MK-zZugfC14Xtw@mail.gmail.com
-
seems everyone overlooked that patch there breaks geometric indexing back
then.
On Tue, Jan 21, 2020 at 2:07 AM Egor Rogov <e.rogov@postgrespro.ru> wrote:
On 21.01.2020 0:00, Darafei "Komяpa" Praliaskouski wrote:
Hi,
Found out today that BRIN indexes don't really work for PostGIS and
box datatypes.Since
https://github.com/postgres/postgres/commit/7e534adcdc70866e7be74d626b0ed067c890a251 Postgres
requires datatype to provide correlation statistics. Such statistics
wasn't provided by PostGIS and box types.Today I tried to replace a 200gb gist index with 8mb brin index and
queries didn't work as expected - it was never used. set
enable_seqscan=off helped for a bit but that's not a permanent solution.
Plans for context:
https://gist.github.com/Komzpa/2cd396ec9b65e2c93341e9934d974826Debugging session on #postgis IRC channel leads to this ticket to
create a (not that meaningful) correlation statistics for geometry
datatype: https://trac.osgeo.org/postgis/ticket/4625#ticketPostgres Professional mentioned symptoms of the issue in their
in-depth manual:
https://habr.com/ru/company/postgrespro/blog/346460/ - box datatype
showed same unusable BRIN symptoms for them.(Translated to English:
https://habr.com/en/company/postgrespro/blog/452900/)A reasonable course of action on Postgres side seems to be to not
assume selectivity of 1 in absence of correlation statistics, but
something that would prefer such an index to a parallel seq scan, but
higher than similar GIST.Any other ideas?
As far as I understand, correlation is computed only for sortable types,
which means that the current concept of correlation works as intended
only for B-tree indexes.Ideally, correlation should be computed for (attribute, index) pair,
taking into account order of values returned by the index scan. Less
ideal but more easier approach can be to ignore the computed correlation
for any index access except B-tree, and just assume some predefined
constant.
--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa