Indexing on JSONB field not working

Started by Zhihong Zhangover 6 years ago43 messagesbugs
Jump to latest
#1Zhihong Zhang
zhihong@gmail.com

I have an index on JSONB fields like this,

CREATE INDEX float_number_index_path2

ON public.assets USING btree

(((_doc #> '{floatValue}'::text[])::double precision) ASC NULLS LAST)

TABLESPACE pg_default;

However query doesn't use it,

explain select id, _doc->>'floatValue' from assets where (_doc #>
'{floatValue}'::text[])::double precision < 3.0 limit 3;

Limit (cost=0.00..3.24 rows=3 width=53)

-> Seq Scan on assets (cost=0.00..936605.40 rows=867607 width=53)

Filter: (((_doc #> '{floatValue}'::text[]))::double precision <
'3'::double precision)

The version of the database,

"PostgreSQL 11.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.2
20140120 (Red Hat 4.8.2-16), 64-bit"

However, the index works for text field in JSONB.

Let me know if I can provide more information.

Zhihong Zhang

zhihong@gmail.com

#2Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Zhihong Zhang (#1)
Re: Indexing on JSONB field not working

On Fri, Dec 20, 2019 at 04:52:17PM -0500, Zhihong Zhang wrote:

I have an index on JSONB fields like this,

CREATE INDEX float_number_index_path2

ON public.assets USING btree

(((_doc #> '{floatValue}'::text[])::double precision) ASC NULLS LAST)

TABLESPACE pg_default;

However query doesn't use it,

explain select id, _doc->>'floatValue' from assets where (_doc #>
'{floatValue}'::text[])::double precision < 3.0 limit 3;

Limit (cost=0.00..3.24 rows=3 width=53)

-> Seq Scan on assets (cost=0.00..936605.40 rows=867607 width=53)

Filter: (((_doc #> '{floatValue}'::text[]))::double precision <
'3'::double precision)

The index scan is likely expected to be more expensive than the plain
sequential scan with the LIMIT interrupting it pretty much right away
(it's expected to scan only ~0.0003% of the table.

You can probably push the database to use the index by disabling
sequential scans, i.e.

SET enable_seqscan = off;

and then doing the explain again.

The interesting question however is which of the plans is faster. It's
quite possible the database is making the right choice - index scans are
not necessarily faster.

The version of the database,

"PostgreSQL 11.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.2
20140120 (Red Hat 4.8.2-16), 64-bit"

However, the index works for text field in JSONB.

Well, the text field probably has different statistics, so the
sequential scan would have so scan much larger part of the table. Who
knows - you haven't shared the execution plans.

Let me know if I can provide more information.

Show us explain analyze for both queries, with both index-scan and
seq-scan (you'll have to use enable_seqscan and enable_indexscan to
force the plan choice).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3Jeff Janes
jeff.janes@gmail.com
In reply to: Zhihong Zhang (#1)
Re: Indexing on JSONB field not working

On Fri, Dec 20, 2019 at 5:12 PM Zhihong Zhang <zhihong@gmail.com> wrote:

I have an index on JSONB fields like this,

CREATE INDEX float_number_index_path2

ON public.assets USING btree

(((_doc #> '{floatValue}'::text[])::double precision) ASC NULLS LAST)

TABLESPACE pg_default;

However query doesn’t use it,

Did you analyze the table after building the index? Expression indexes
have their own statistics, but they don't get populated until the table is
analyzed.

Cheers,

Jeff

#4Zhihong Zhang
zhihong@gmail.com
In reply to: Jeff Janes (#3)
Re: Indexing on JSONB field not working

A few clarifications,

1. The index is very effective. If I treat it as text, the index works. With index, the result returns in 1 second. Otherwise, it’s 5 minutes.
2. Removing limit doesn’t change the behavior.
3. I ran Analyze multiple times after indexing.

Zhihong

Show quoted text

On Dec 20, 2019, at 5:57 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Fri, Dec 20, 2019 at 5:12 PM Zhihong Zhang <zhihong@gmail.com <mailto:zhihong@gmail.com>> wrote:
I have an index on JSONB fields like this,

CREATE INDEX float_number_index_path2

ON public.assets USING btree

(((_doc #> '{floatValue}'::text[])::double precision) ASC NULLS LAST)

TABLESPACE pg_default;

However query doesn’t use it,

Did you analyze the table after building the index? Expression indexes have their own statistics, but they don't get populated until the table is analyzed.

Cheers,

Jeff

#5Zhihong Zhang
zhihong@gmail.com
In reply to: Tomas Vondra (#2)
Re: Indexing on JSONB field not working

Run those 2 EXPLAINs with seqscan off and on. See what difference it makes!

SET enable_seqscan = off;
explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;

"Limit (cost=0.43..313.25 rows=100 width=53) (actual time=0.092..0.236 rows=7 loops=1)"
" -> Index Scan using assets_floatvalue_idx on assets (cost=0.43..2714072.57 rows=867607 width=53) (actual time=0.089..0.230 rows=7 loops=1)"
" Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"
"Planning Time: 1.617 ms"
"Execution Time: 0.276 ms"

SET enable_seqscan = on;
explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;

"Limit (cost=0.00..107.95 rows=100 width=53) (actual time=41021.311..313501.746 rows=7 loops=1)"
" -> Seq Scan on assets (cost=0.00..936605.40 rows=867607 width=53) (actual time=41021.309..313501.732 rows=7 loops=1)"
" Filter: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"
" Rows Removed by Filter: 2602824"
"Planning Time: 0.283 ms"
"Execution Time: 313501.777 ms"

Zhihong

Show quoted text

On Dec 20, 2019, at 5:30 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

On Fri, Dec 20, 2019 at 04:52:17PM -0500, Zhihong Zhang wrote:

I have an index on JSONB fields like this,

CREATE INDEX float_number_index_path2

ON public.assets USING btree

(((_doc #> '{floatValue}'::text[])::double precision) ASC NULLS LAST)

TABLESPACE pg_default;

However query doesn't use it,

explain select id, _doc->>'floatValue' from assets where (_doc #>
'{floatValue}'::text[])::double precision < 3.0 limit 3;

Limit (cost=0.00..3.24 rows=3 width=53)

-> Seq Scan on assets (cost=0.00..936605.40 rows=867607 width=53)

Filter: (((_doc #> '{floatValue}'::text[]))::double precision <
'3'::double precision)

The index scan is likely expected to be more expensive than the plain
sequential scan with the LIMIT interrupting it pretty much right away
(it's expected to scan only ~0.0003% of the table.

You can probably push the database to use the index by disabling
sequential scans, i.e.

SET enable_seqscan = off;

and then doing the explain again.

The interesting question however is which of the plans is faster. It's
quite possible the database is making the right choice - index scans are
not necessarily faster.

The version of the database,

"PostgreSQL 11.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.2
20140120 (Red Hat 4.8.2-16), 64-bit"

However, the index works for text field in JSONB.

Well, the text field probably has different statistics, so the
sequential scan would have so scan much larger part of the table. Who
knows - you haven't shared the execution plans.

Let me know if I can provide more information.

Show us explain analyze for both queries, with both index-scan and
seq-scan (you'll have to use enable_seqscan and enable_indexscan to
force the plan choice).

regards

--
Tomas Vondra http://www.2ndQuadrant.com <http://www.2ndquadrant.com/&gt;
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#6Jeff Janes
jeff.janes@gmail.com
In reply to: Zhihong Zhang (#4)
Re: Indexing on JSONB field not working

On Fri, Dec 20, 2019 at 7:39 PM Zhihong Zhang <zhihong@gmail.com> wrote:

A few clarifications,

1. The index is very effective. If I treat it as text, the index works.
With index, the result returns in 1 second. Otherwise, it’s 5 minutes.

An index is over a datatype. You can't just "treat an index like text".
Either it is an index over text, or it is not. It you cast an expression
to text while defining the index, then that is a different index than if
you defined it with a cast to double precision.

2. Removing limit doesn’t change the behavior.

I have no problem getting your index to be used with or without the LIMIT ,
as long as the expression "(_doc #> '{floatValue}'::text[])::double
precision < 3.0" is selective enough that it appears to be worth using an
index for it. For example:

insert into assets (id,_doc) select x,
jsonb_build_object('floatValue',random()*1000) from
generate_series(1,1000000) f(x);

We can't comment on how two queries might differ, then we have only seen
one of them.

Cheers,

Jeff

#7Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Zhihong Zhang (#5)
Re: Indexing on JSONB field not working

On Fri, Dec 20, 2019 at 08:15:39PM -0500, Zhihong Zhang wrote:

Run those 2 EXPLAINs with seqscan off and on. See what difference it makes!

SET enable_seqscan = off;
explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;

"Limit (cost=0.43..313.25 rows=100 width=53) (actual time=0.092..0.236 rows=7 loops=1)"
" -> Index Scan using assets_floatvalue_idx on assets (cost=0.43..2714072.57 rows=867607 width=53) (actual time=0.089..0.230 rows=7 loops=1)"
" Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"
"Planning Time: 1.617 ms"
"Execution Time: 0.276 ms"

SET enable_seqscan = on;
explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;

"Limit (cost=0.00..107.95 rows=100 width=53) (actual time=41021.311..313501.746 rows=7 loops=1)"
" -> Seq Scan on assets (cost=0.00..936605.40 rows=867607 width=53) (actual time=41021.309..313501.732 rows=7 loops=1)"
" Filter: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"
" Rows Removed by Filter: 2602824"
"Planning Time: 0.283 ms"
"Execution Time: 313501.777 ms"

Well, this confirms what I suspected before - the optimizer believes the
seqscan plan is a bit cheaper compared to index scan (107 vs. 313) but
that mismatches the actual performance. The question is why ...

For LIMIT queries, I can think of two common issues - the cost estimate
is computed as a simple linear approximation in the input relation. For
example, we know the seqscan is expected to produce 867607 rows with a
total cost of 936605, so the cost of producing just 100 rows is

100 * 936605 / 867607 = 107.95

But that assumes a number of things: (a) that the seqscan row estimate
is correct, and that (b) the matching rows are uniformly distributed in
the table. If it's misestimated, or if the rows are towards the end of
the relation (i.e. after doing a most of the costed work) this estimate
may be quite off.

Can you do explain analyze of the query without the LIMIT?

BTW a LIMIT without an ORDER BY is a bit strange. Also, maybe you could
do try using a partial index (if the where condition does not change).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#8Jeff Janes
jeff.janes@gmail.com
In reply to: Zhihong Zhang (#5)
Re: Indexing on JSONB field not working

On Sat, Dec 21, 2019 at 7:00 AM Zhihong Zhang <zhihong@gmail.com> wrote:

Run those 2 EXPLAINs with seqscan off and on. See what difference it makes!

SET enable_seqscan = off;
explain analyze select id, _doc->>'floatValue' from assets where (_doc #>
'{floatValue}'::text[])::double precision < 3.0 limit 100;

"Limit (cost=0.43..313.25 rows=100 width=53) (actual time=0.092..0.236
rows=7 loops=1)"
" -> Index Scan using assets_floatvalue_idx on assets
(cost=0.43..2714072.57 rows=867607 width=53) (actual time=0.089..0.230
rows=7 loops=1)"
" Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision
< '3'::double precision)"
"Planning Time: 1.617 ms"
"Execution Time: 0.276 ms"

313 is a pretty high estimate for fetching an estimated 100 rows. It must
think that nearly every row fetched from the table is going to be a random
page fetch. Which means that it must think the correlation between
assets_floatvalue_idx and physical table order is close to 0.

SET enable_seqscan = on;
explain analyze select id, _doc->>'floatValue' from assets where (_doc #>
'{floatValue}'::text[])::double precision < 3.0 limit 100;

"Limit (cost=0.00..107.95 rows=100 width=53) (actual
time=41021.311..313501.746 rows=7 loops=1)"
" -> Seq Scan on assets (cost=0.00..936605.40 rows=867607 width=53)
(actual time=41021.309..313501.732 rows=7 loops=1)"
" Filter: (((_doc #> '{floatValue}'::text[]))::double precision <
'3'::double precision)"
" Rows Removed by Filter: 2602824"
"Planning Time: 0.283 ms"
"Execution Time: 313501.777 ms"

It thinks it will find 867607 rows which meet the <3.0 condition, but
really it only finds 7. It has to scan the full table, because with only 7
rows it can never stop early due to the LIMIT 100. Why is the estimate
wrong by a factor of over 100,000? It should be using the statistics from
the expression index here (even though it is not using the index during
execution), and so should have pretty good statistics.

Can you show the output of:

select * from pg_stats where tablename ='float_number_index_path2'

For readability, use the output format which shows the columns down the
screen, not across. In psql, that would toggled on with \x.

Cheers,

Jeff

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#7)
Re: Indexing on JSONB field not working

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On Fri, Dec 20, 2019 at 08:15:39PM -0500, Zhihong Zhang wrote:

"Limit (cost=0.43..313.25 rows=100 width=53) (actual time=0.092..0.236 rows=7 loops=1)"
" -> Index Scan using assets_floatvalue_idx on assets (cost=0.43..2714072.57 rows=867607 width=53) (actual time=0.089..0.230 rows=7 loops=1)"
" Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"

Well, this confirms what I suspected before - the optimizer believes the
seqscan plan is a bit cheaper compared to index scan (107 vs. 313) but
that mismatches the actual performance. The question is why ...

The planner evidently believes that 867607 rows will match the query
condition, so it expects that the scan will stop (after collecting
100 rows) very quickly. In reality only 7 rows match, so the scan
has to run to completion. This is what's bollixing the plan choice.

I suspect that 867607 is just a default estimate, but if ANALYZE has
been run then there should be stats for the index column, so why isn't
it doing better? When I try a similar case here, I get good estimates:

regression=# select version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 11.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23), 64-bit
(1 row)

regression=# create table public.assets(_doc jsonb);
CREATE TABLE
regression=# CREATE INDEX on assets (((_doc #> '{floatValue}'::text[])::double precision) ASC NULLS LAST);
CREATE INDEX
regression=# insert into assets select ('{"floatValue": ' || x || '}')::jsonb from generate_series(1,10000) x;
INSERT 0 10000
regression=# explain select _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;
QUERY PLAN
------------------------------------------------------------------------------------------------
Limit (cost=0.00..7.71 rows=100 width=32)
-> Seq Scan on assets (cost=0.00..293.44 rows=3808 width=32)
Filter: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)
(3 rows)

regression=# analyze assets;
ANALYZE
regression=# explain select _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;
QUERY PLAN
----------------------------------------------------------------------------------------------------
Limit (cost=0.29..8.33 rows=2 width=32)
-> Index Scan using assets_float8_idx on assets (cost=0.29..8.33 rows=2 width=32)
Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)
(3 rows)

The "3808" estimate is just a default for '<' with no stats, but
with stats I get a dead-on estimate.

regards, tom lane

#10Zhihong Zhang
zhihong@gmail.com
In reply to: Tom Lane (#9)
Re: Indexing on JSONB field not working

Just came back from the long break and I couldn’t quite follow the threads. Is this a bug or something I am doing wrong?

If it’s a bug, can I have a tracking or ticket number?

Thanks!

Zhihong

Show quoted text

On Dec 21, 2019, at 10:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On Fri, Dec 20, 2019 at 08:15:39PM -0500, Zhihong Zhang wrote:

"Limit (cost=0.43..313.25 rows=100 width=53) (actual time=0.092..0.236 rows=7 loops=1)"
" -> Index Scan using assets_floatvalue_idx on assets (cost=0.43..2714072.57 rows=867607 width=53) (actual time=0.089..0.230 rows=7 loops=1)"
" Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"

Well, this confirms what I suspected before - the optimizer believes the
seqscan plan is a bit cheaper compared to index scan (107 vs. 313) but
that mismatches the actual performance. The question is why ...

The planner evidently believes that 867607 rows will match the query
condition, so it expects that the scan will stop (after collecting
100 rows) very quickly. In reality only 7 rows match, so the scan
has to run to completion. This is what's bollixing the plan choice.

I suspect that 867607 is just a default estimate, but if ANALYZE has
been run then there should be stats for the index column, so why isn't
it doing better? When I try a similar case here, I get good estimates:

regression=# select version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 11.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23), 64-bit
(1 row)

regression=# create table public.assets(_doc jsonb);
CREATE TABLE
regression=# CREATE INDEX on assets (((_doc #> '{floatValue}'::text[])::double precision) ASC NULLS LAST);
CREATE INDEX
regression=# insert into assets select ('{"floatValue": ' || x || '}')::jsonb from generate_series(1,10000) x;
INSERT 0 10000
regression=# explain select _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;
QUERY PLAN
------------------------------------------------------------------------------------------------
Limit (cost=0.00..7.71 rows=100 width=32)
-> Seq Scan on assets (cost=0.00..293.44 rows=3808 width=32)
Filter: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)
(3 rows)

regression=# analyze assets;
ANALYZE
regression=# explain select _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;
QUERY PLAN
----------------------------------------------------------------------------------------------------
Limit (cost=0.29..8.33 rows=2 width=32)
-> Index Scan using assets_float8_idx on assets (cost=0.29..8.33 rows=2 width=32)
Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)
(3 rows)

The "3808" estimate is just a default for '<' with no stats, but
with stats I get a dead-on estimate.

regards, tom lane

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zhihong Zhang (#10)
Re: Indexing on JSONB field not working

Zhihong Zhang <zhihong@gmail.com> writes:

Just came back from the long break and I couldn’t quite follow the threads. Is this a bug or something I am doing wrong?

We're wondering why the optimizer doesn't derive a better rowcount
estimate, and waiting on you to provide information about what the
relevant pg_stats entries are.

regards, tom lane

#12Zhihong Zhang
zhihong@gmail.com
In reply to: Jeff Janes (#8)
Re: Indexing on JSONB field not working

That command yields nothing. If I use the tablename ‘assets’, it only returns stats on columns, nothing about indexes. The indexed field is in JSONB field ‘_doc’. This is the stats on the column,

select * from pg_stats where tablename='assets' and attname='_doc';
schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_
common_elem_freqs | elem_count_histogram
------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+------------------+-------------+-------------------+------
------------------+----------------------
public | assets | _doc | f | 0 | 18 | -1 | | | | | |
|
(1 row)

Let me know if you want any other information.

Zhihong

Show quoted text

On Dec 21, 2019, at 10:17 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

Can you show the output of:

select * from pg_stats where tablename ='float_number_index_path2'

For readability, use the output format which shows the columns down the screen, not across. In psql, that would toggled on with \x.

#13Jeff Janes
jeff.janes@gmail.com
In reply to: Zhihong Zhang (#12)
Re: Indexing on JSONB field not working

On Dec 21, 2019, at 10:17 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

Can you show the output of:

select * from pg_stats where tablename ='float_number_index_path2'

For readability, use the output format which shows the columns down the
screen, not across. In psql, that would toggled on with \x.

On Thu, Dec 26, 2019 at 11:30 AM Zhihong Zhang <zhihong@gmail.com> wrote:

That command yields nothing.

Did you use the actual name of the index (in case its actual name differs
from what you showed us in the CREATE INDEX)? If you did and it doesn't
show anything, then I don't think the table has been ANALYZED since the
index was created. Note that ANALYZE and EXPLAIN ANALYZE are different
things. You need to run "ANALYZE assets;" or "VACUUM ANALYZE assets;"

Cheers,

Jeff

#14Zhihong Zhang
zhihong@gmail.com
In reply to: Jeff Janes (#13)
Re: Indexing on JSONB field not working

The index name is correct and I ran ‘analyze' multiple times since the index is created. I have dozen indexes, some are created when table is created. None of them is in pg_stats.

The pg_stats only contains a row for each column, nothing else.

The database is on AWS RDS. Does that make any difference?

Zhihong

Show quoted text

On Dec 26, 2019, at 11:50 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Dec 21, 2019, at 10:17 AM, Jeff Janes <jeff.janes@gmail.com <mailto:jeff.janes@gmail.com>> wrote:

Can you show the output of:

select * from pg_stats where tablename ='float_number_index_path2'

For readability, use the output format which shows the columns down the screen, not across. In psql, that would toggled on with \x.

On Thu, Dec 26, 2019 at 11:30 AM Zhihong Zhang <zhihong@gmail.com <mailto:zhihong@gmail.com>> wrote:
That command yields nothing.

Did you use the actual name of the index (in case its actual name differs from what you showed us in the CREATE INDEX)? If you did and it doesn't show anything, then I don't think the table has been ANALYZED since the index was created. Note that ANALYZE and EXPLAIN ANALYZE are different things. You need to run "ANALYZE assets;" or "VACUUM ANALYZE assets;"

Cheers,

Jeff

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zhihong Zhang (#14)
Re: Indexing on JSONB field not working

Zhihong Zhang <zhihong@gmail.com> writes:

The index name is correct and I ran ‘analyze' multiple times since the index is created. I have dozen indexes, some are created when table is created. None of them is in pg_stats.

You still haven't shown us exactly what you're doing, only asserted that
you're doing the right thing, which seems somewhat unlikely given that
nobody has been able to reproduce this behavior based on what you've
told us so far.

One possible gotcha is indicated by this comment in analyze.c:

* Open all indexes of the relation, and see if there are any analyzable
* columns in the indexes. We do not analyze index columns if there was
* an explicit column list in the ANALYZE command, however. If we are

that is, if you did something like "ANALYZE assets(_doc);" it would
not result in creation of stats for any expression indexes.

The database is on AWS RDS. Does that make any difference?

Wouldn't have thought that their version of Postgres deviates much
from community Postgres. If you were talking about Aurora or Redshift,
I can't speak to what those do.

regards, tom lane

#16Zhihong Zhang
zhihong@gmail.com
In reply to: Tom Lane (#15)
Re: Indexing on JSONB field not working

I simply ran ‘analyze;’ without table name and I assume that analyzes everything. I just ran ‘analyze assets;’ again but still no stats on indexes.

I looked at pg_stats for all our databases. None of them have stats on indexes. Are there any settings disabling this?

Thanks!

Zhihong

Show quoted text

On Dec 26, 2019, at 1:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Zhihong Zhang <zhihong@gmail.com> writes:

The index name is correct and I ran ‘analyze' multiple times since the index is created. I have dozen indexes, some are created when table is created. None of them is in pg_stats.

You still haven't shown us exactly what you're doing, only asserted that
you're doing the right thing, which seems somewhat unlikely given that
nobody has been able to reproduce this behavior based on what you've
told us so far.

One possible gotcha is indicated by this comment in analyze.c:

* Open all indexes of the relation, and see if there are any analyzable
* columns in the indexes. We do not analyze index columns if there was
* an explicit column list in the ANALYZE command, however. If we are

that is, if you did something like "ANALYZE assets(_doc);" it would
not result in creation of stats for any expression indexes.

The database is on AWS RDS. Does that make any difference?

Wouldn't have thought that their version of Postgres deviates much
from community Postgres. If you were talking about Aurora or Redshift,
I can't speak to what those do.

regards, tom lane

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zhihong Zhang (#16)
Re: Indexing on JSONB field not working

Zhihong Zhang <zhihong@gmail.com> writes:

I looked at pg_stats for all our databases. None of them have stats on indexes. Are there any settings disabling this?

No, I don't think so. We only collect stats on index expressions, though,
not simple columns (since those would be duplicative of the underlying
column's stats).

Hmmm ... looking at the pg_stats view, it has a filter

WHERE NOT a.attisdropped AND
has_column_privilege(c.oid, a.attnum, 'select'::text) AND
(c.relrowsecurity = false OR NOT row_security_active(c.oid))

The has_column_privilege test might be getting in the way if you're
not superuser; it will probably think you have no access privileges
for the index. I now recall somebody complaining about that before [1]/messages/by-id/6369212.CF36pTLAQO@peanuts2,
but no fix has been accepted as yet.

Having said that, though, that only accounts for you not seeing the
entries in the pg_stats view; it doesn't explain why the optimizer
doesn't see them, assuming they're actually there in pg_statistic,
which they surely should be.

As I recall, RDS doesn't give out superuser access, so it may be
hard for you to learn more about what's happening :-(

regards, tom lane

[1]: /messages/by-id/6369212.CF36pTLAQO@peanuts2

#18Zhihong Zhang
zhihong@gmail.com
In reply to: Tom Lane (#17)
Re: Indexing on JSONB field not working

I am not superuser. Let me see how I can get access to the superuser on RDS.

I will get back to you.

Thanks!

Zhihong

Show quoted text

On Dec 26, 2019, at 3:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Zhihong Zhang <zhihong@gmail.com> writes:

I looked at pg_stats for all our databases. None of them have stats on indexes. Are there any settings disabling this?

No, I don't think so. We only collect stats on index expressions, though,
not simple columns (since those would be duplicative of the underlying
column's stats).

Hmmm ... looking at the pg_stats view, it has a filter

WHERE NOT a.attisdropped AND
has_column_privilege(c.oid, a.attnum, 'select'::text) AND
(c.relrowsecurity = false OR NOT row_security_active(c.oid))

The has_column_privilege test might be getting in the way if you're
not superuser; it will probably think you have no access privileges
for the index. I now recall somebody complaining about that before [1],
but no fix has been accepted as yet.

Having said that, though, that only accounts for you not seeing the
entries in the pg_stats view; it doesn't explain why the optimizer
doesn't see them, assuming they're actually there in pg_statistic,
which they surely should be.

As I recall, RDS doesn't give out superuser access, so it may be
hard for you to learn more about what's happening :-(

regards, tom lane

[1] /messages/by-id/6369212.CF36pTLAQO@peanuts2

#19Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Zhihong Zhang (#18)
Re: Indexing on JSONB field not working

On Thu, Dec 26, 2019 at 05:25:22PM -0500, Zhihong Zhang wrote:

I am not superuser. Let me see how I can get access to the superuser on RDS.

You can't - that's the point of am environment managed by someone else.

Perhaps you could inspect the data directly in pg_statistic, not through
the pg_stats view. But I haven't tried if that works on RDS.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#19)
Re: Indexing on JSONB field not working

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On Thu, Dec 26, 2019 at 05:25:22PM -0500, Zhihong Zhang wrote:

I am not superuser. Let me see how I can get access to the superuser on RDS.

You can't - that's the point of am environment managed by someone else.
Perhaps you could inspect the data directly in pg_statistic, not through
the pg_stats view. But I haven't tried if that works on RDS.

He definitely shouldn't be able to do that, if not superuser.

However, after experimenting a bit it seems like pg_stats does work for
indexes as long as you are the owner of the index's table, which probably
explains why it took so long for anybody to notice the problem. Maybe the
OP is trying to look at the stats as a non-owner that's just been GRANT'ed
table access?

regards, tom lane

#21Zhihong Zhang
zhihong@gmail.com
In reply to: Tom Lane (#20)
#22Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Zhihong Zhang (#21)
#23Zhihong Zhang
zhihong@gmail.com
In reply to: Zhihong Zhang (#10)
#24Jeff Janes
jeff.janes@gmail.com
In reply to: Zhihong Zhang (#14)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zhihong Zhang (#23)
#26Zhihong Zhang
zhihong@gmail.com
In reply to: Jeff Janes (#24)
#27Zhihong Zhang
zhihong@gmail.com
In reply to: Tom Lane (#25)
#28Zhihong Zhang
zhihong@gmail.com
In reply to: Tom Lane (#25)
#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zhihong Zhang (#26)
#30Jeff Janes
jeff.janes@gmail.com
In reply to: Zhihong Zhang (#28)
#31Jeff Janes
jeff.janes@gmail.com
In reply to: Tom Lane (#29)
#32Zhihong Zhang
zhihong@gmail.com
In reply to: Jeff Janes (#30)
#33Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Zhihong Zhang (#32)
#34Zhihong Zhang
zhihong@gmail.com
In reply to: Tomas Vondra (#33)
#35Jeff Janes
jeff.janes@gmail.com
In reply to: Zhihong Zhang (#32)
#36Zhihong Zhang
zhihong@gmail.com
In reply to: Jeff Janes (#35)
#37Zhihong Zhang
zhihong@gmail.com
In reply to: Zhihong Zhang (#36)
#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zhihong Zhang (#37)
#39Zhihong Zhang
zhihong@gmail.com
In reply to: Tom Lane (#38)
#40Zhihong Zhang
zhihong@gmail.com
In reply to: Tom Lane (#38)
#41Jeff Janes
jeff.janes@gmail.com
In reply to: Zhihong Zhang (#37)
#42Jeff Janes
jeff.janes@gmail.com
In reply to: Zhihong Zhang (#39)
#43Zhihong Zhang
zhihong@gmail.com
In reply to: Jeff Janes (#41)