extended statistics n-distinct on multiple columns not used when join two tables

Started by James Pang (chaolpan), almost 3 years ago, 6 messages, hackers
#1 James Pang (chaolpan)
chaolpan@cisco.com

Hi,
When joining two tables with equality conditions on multiple columns, the row estimate always uses a selectivity obtained by multiplying the selectivities of the individual columns, each based on that column's own n-distinct value. Is it possible to use extended n-distinct statistics on multiple columns instead?
PG v14.8-1; please see the attached test case for details.
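
To make the arithmetic concrete, here is a minimal sketch of the estimate described above. It is a simplification under stated assumptions: each equality join clause is given selectivity 1 / max(ndistinct), ignoring most-common-value lists and null fractions, and clauses are treated as independent. The tables, sizes, and function names are hypothetical and are not taken from the attached test case. The "combined" figure only shows what a multi-column n-distinct value would yield if the planner used it for joins, which it does not do here.

```python
from collections import Counter


def per_clause_selectivity(nd_left: int, nd_right: int) -> float:
    """Simplified equality-join selectivity for one clause: 1 / max(ndistinct)."""
    return 1.0 / max(nd_left, nd_right)


def estimate_join_rows(rows_left: int, rows_right: int, ndistinct_pairs) -> float:
    """Multiply per-clause selectivities as if the clauses were independent."""
    selectivity = 1.0
    for nd_left, nd_right in ndistinct_pairs:
        selectivity *= per_clause_selectivity(nd_left, nd_right)
    return rows_left * rows_right * selectivity


# Hypothetical data: two 10,000-row tables whose columns a and b are
# perfectly correlated (b == a), with 100 distinct values per column.
t1 = [(i % 100, i % 100) for i in range(10_000)]
t2 = [(i % 100, i % 100) for i in range(10_000)]

nd_a = len({a for a, _ in t1})   # distinct values of a alone
nd_b = len({b for _, b in t1})   # distinct values of b alone
nd_ab = len(set(t1))             # distinct (a, b) combinations

# Estimate with one clause per column, treated as independent.
independent = estimate_join_rows(len(t1), len(t2), [(nd_a, nd_a), (nd_b, nd_b)])

# Hypothetical estimate if the combined n-distinct value were used instead.
combined = estimate_join_rows(len(t1), len(t2), [(nd_ab, nd_ab)])

# Actual size of the join on (a, b), counted directly.
left_counts, right_counts = Counter(t1), Counter(t2)
actual = sum(n * right_counts[key] for key, n in left_counts.items())

print(f"independent-clause estimate: {independent:,.0f}")
print(f"combined n-distinct estimate: {combined:,.0f}")
print(f"actual join rows: {actual:,}")
```

With correlated columns, the independent-clause estimate falls far below the actual join size, while the combined n-distinct value matches it in this idealized case.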

Thanks,

James

Attachments:

test_join_selectivity_rows_estimation_join.txt (text/plain)
#2 Pavel Stehule
pavel.stehule@gmail.com
In reply to: James Pang (chaolpan) (#1)
Re: extended statistics n-distinct on multiple columns not used when join two tables

Hi

On Tue, 13 Jun 2023 at 11:21, James Pang (chaolpan) <chaolpan@cisco.com> wrote:

Hi,

When joining two tables with equality conditions on multiple columns, the row estimate always uses a selectivity obtained by multiplying the selectivities of the individual columns, each based on that column's own n-distinct value. Is it possible to use extended n-distinct statistics on multiple columns instead?

PG v14.8-1; please see the attached test case for details.

There isn't any support for multi-table statistics.

Regards

Pavel


#3 David Rowley
dgrowleyml@gmail.com
In reply to: Pavel Stehule (#2)
Re: extended statistics n-distinct on multiple columns not used when join two tables

(moving to -hackers)

On Tue, 13 Jun 2023 at 21:30, Pavel Stehule <pavel.stehule@gmail.com> wrote:

On Tue, 13 Jun 2023 at 11:21, James Pang (chaolpan) <chaolpan@cisco.com> wrote:

When joining two tables with equality conditions on multiple columns, the row estimate always uses a selectivity obtained by multiplying the selectivities of the individual columns, each based on that column's own n-distinct value. Is it possible to use extended n-distinct statistics on multiple columns instead?

PG v14.8-1; please see the attached test case for details.

There isn't any support for multi-table statistics.

I think it's probably worth adjusting the docs to mention this. It
seems like it might be something that could surprise someone.

Something like the attached, maybe?

David

Attachments:

mention_ext_stats_dont_work_for_joins.patch (application/octet-stream, +6 -0)
#4 Pavel Stehule
pavel.stehule@gmail.com
In reply to: David Rowley (#3)
Re: extended statistics n-distinct on multiple columns not used when join two tables

On Tue, 13 Jun 2023 at 13:26, David Rowley <dgrowleyml@gmail.com> wrote:

I think it's probably worth adjusting the docs to mention this. It
seems like it might be something that could surprise someone.

Something like the attached, maybe?

+1

Pavel


#5 James Pang (chaolpan)
chaolpan@cisco.com
In reply to: Pavel Stehule (#4)
RE: extended statistics n-distinct on multiple columns not used when join two tables

Thanks for the information. Yes, for equality joins on multiple correlated columns, it looks like extended statistics could significantly improve the row estimates. Hopefully this will be supported in a future version.

James


#6 David Rowley
dgrowleyml@gmail.com
In reply to: Pavel Stehule (#4)
Re: extended statistics n-distinct on multiple columns not used when join two tables

On Tue, 13 Jun 2023 at 23:29, Pavel Stehule <pavel.stehule@gmail.com> wrote:

I think it's probably worth adjusting the docs to mention this. It
seems like it might be something that could surprise someone.

Something like the attached, maybe?

+1

Ok, I pushed that patch. Thanks.

David