pg_stat_statements and "IN" conditions

Started by Dmitry Dolgovover 5 years ago155 messageshackers

9erthalion6@gmail.com

over 5 years ago

I would like to start another thread to follow up on [1]/messages/by-id/CAF42k=JCfHMJtkAVXCzBn2XBxDC83xb4VhV7jU7enPnZ0CfEQQ@mail.gmail.com, mostly to bump up the
topic. Just to remind, it's about how pg_stat_statements jumbling ArrayExpr in
queries like:

SELECT something FROM table WHERE col IN (1, 2, 3, ...)

The current implementation produces different jumble hash for every different
number of arguments for essentially the same query. Unfortunately a lot of ORMs
like to generate these types of queries, which in turn leads to
pg_stat_statements pollution. Ideally we want to prevent this and have only one
record for such a query.

As the result of [1]/messages/by-id/CAF42k=JCfHMJtkAVXCzBn2XBxDC83xb4VhV7jU7enPnZ0CfEQQ@mail.gmail.com I've identified two highlighted approaches to improve this
situation:

* Reduce the generated ArrayExpr to an array Const immediately, in cases where
all the inputs are Consts.

* Make repeating Const to contribute nothing to the resulting hash.

I've tried to prototype both approaches to find out pros/cons and be more
specific. Attached patches could not be considered a completed piece of work,
but they seem to work, mostly pass the tests and demonstrate the point. I would
like to get some high level input about them and ideally make it clear what is
the preferred solution to continue with.

# Reducing ArrayExpr to an array Const

IIUC this requires producing a Const with ArrayType constvalue in
transformAExprIn for ScalarArrayOpExpr. This could be a general improvement,
since apparently it's being done later anyway. But it deals only with Const,
which leaves more on the table, e.g. Params and other similar types of
duplication we observe when repeating constants are wrapped into VALUES.

Another point here is that it's quite possible this approach will still require
corresponding changes in pg_stat_statements, since just preventing duplicates
to show also loses the information. Ideally we also need to have some
understanding how many elements are actually there and display it, e.g. in
cases when there is just one outlier query that contains a huge IN list.

# Contribute nothing to the hash

I guess there could be multiple ways of doing this, but the first idea I had in
mind is to skip jumbling when necessary. At the same time it can be implemented
more centralized for different types of queries (although in the attached patch
there are only Const & Values). In the simplest case we just identify sequence
of constants of the same type, which just ignores any other cases when stuff is
mixed. But I believe it's something that could be considered a rare corner case
and it's better to start with the simplest solution.

Having said that I believe the second approach of contributing nothing to the
hash sounds more appealing, but would love to hear other opinions.

[1]: /messages/by-id/CAF42k=JCfHMJtkAVXCzBn2XBxDC83xb4VhV7jU7enPnZ0CfEQQ@mail.gmail.com

pg_stat_statements and "IN" conditions

Attachments:

Attachments:

Attachments:

Attachments:

Attachments:

Attachments:

Attachments: