Partial index on JSON column

Started by Samuel Williamsabout 7 years ago8 messagesgeneral
Jump to latest
#1Samuel Williams
space.ship.traveller@gmail.com

Hello

I have a table with ~3 billion events.

Of this, there are a small subset of events which match the following query:

CREATE INDEX index_user_event_for_suggestion_notification ON
public.user_event USING btree ((((parameters ->>
'suggestion_id'::text))::integer), what) WHERE ((parameters ->>
'suggestion_id'::text) IS NOT NULL)

When I do this query:

EXPLAIN SELECT COUNT(*) FROM "user_event" WHERE ((parameters ->>
'suggestion_id'::text)::integer = 26) AND what = 'suggestion_notification';

Aggregate (cost=7115959.48..7115959.49 rows=1 width=8)
-> Bitmap Heap Scan on user_event (cost=37360.24..7115907.56 rows=20771
width=0)
Recheck Cond: ((what)::text = 'suggestion_notification'::text)
Filter: (((parameters ->> 'suggestion_id'::text))::integer = 26)
-> Bitmap Index Scan on index_user_event_for_clustering
(cost=0.00..37355.05 rows=4154273 width=0)
Index Cond: ((what)::text = 'suggestion_notification'::text)

It's slow. I need to explicitly add the NULL constraint:

EXPLAIN SELECT COUNT(*) FROM "user_event" WHERE ((parameters ->>
'suggestion_id'::text)::integer = 26) AND ((parameters ->> 'suggestion_id'::
text) IS NOT NULL) AND what = 'suggestion_notification';

Aggregate (cost=38871.48..38871.49 rows=1 width=8)
-> Index Scan using index_user_event_for_suggestion_notification on
user_event (cost=0.42..38819.81 rows=20668 width=0)
Index Cond: ((((parameters ->> 'suggestion_id'::text))::integer =
26) AND ((what)::text = 'suggestion_notification'::text))

I feel like the null constraint should be implicit.

That being said:

- Is my partial index wrong? Should I write it differently so the optimiser
knows this?
- Is my query wrong? How can I make the most use of this index without
being explicit?
- Any other suggestions for how I can do this?

Thanks
Samuel

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Samuel Williams (#1)
Re: Partial index on JSON column

Samuel Williams <space.ship.traveller@gmail.com> writes:

When I do this query:

EXPLAIN SELECT COUNT(*) FROM "user_event" WHERE ((parameters ->>
'suggestion_id'::text)::integer = 26) AND what = 'suggestion_notification';

It's slow. I need to explicitly add the NULL constraint:

Try it like

EXPLAIN SELECT COUNT(*) FROM "user_event" WHERE ((parameters ->>
'suggestion_id'::text) = '26') AND what = 'suggestion_notification';

I don't think we assume that CoerceViaIO is strict, and without that
the deduction that the value couldn't be null doesn't hold. In any
case you're better off without the runtime type conversion: that
isn't doing much for you except raising the odds of getting an error.

regards, tom lane

#3Samuel Williams
space.ship.traveller@gmail.com
In reply to: Tom Lane (#2)
Re: Partial index on JSON column

Thanks for the quick reply Tom,

I will try your advice.

The reason why I used ::integer for the INDEX is because I assumed it would
be more efficient both in space and performance.

In the JSONB field, it is actually an integer, i.e. {"location_age": 1,
"suggestion_id": 26}

So, now that I think about it, maybe the way I'm using ::text is wrong.

Any further advice is most appreciated.

Kind regards,
Samuel

On Wed, 20 Feb 2019 at 10:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Samuel Williams <space.ship.traveller@gmail.com> writes:

When I do this query:

EXPLAIN SELECT COUNT(*) FROM "user_event" WHERE ((parameters ->>
'suggestion_id'::text)::integer = 26) AND what =

'suggestion_notification';

It's slow. I need to explicitly add the NULL constraint:

Try it like

EXPLAIN SELECT COUNT(*) FROM "user_event" WHERE ((parameters ->>
'suggestion_id'::text) = '26') AND what = 'suggestion_notification';

I don't think we assume that CoerceViaIO is strict, and without that
the deduction that the value couldn't be null doesn't hold. In any
case you're better off without the runtime type conversion: that
isn't doing much for you except raising the odds of getting an error.

regards, tom lane

#4Michael Lewis
mlewis@entrata.com
In reply to: Samuel Williams (#1)
Re: Partial index on JSON column

On Tue, Feb 19, 2019 at 1:41 PM Samuel Williams <
space.ship.traveller@gmail.com> wrote:

Hello

I have a table with ~3 billion events.

Of this, there are a small subset of events which match the following
query:

CREATE INDEX index_user_event_for_suggestion_notification ON
public.user_event USING btree ((((parameters ->>
'suggestion_id'::text))::integer), what) WHERE ((parameters ->>
'suggestion_id'::text) IS NOT NULL)

You may want to use this-
CREATE INDEX index_user_event_for_suggestion_notification ON
public.user_event USING btree (parameters ->> 'suggestion_id', what) WHERE
parameters ? 'suggestion_id';

-Michael

#5Michael Lewis
mlewis@entrata.com
In reply to: Samuel Williams (#3)
Re: Partial index on JSON column

On Wed, 20 Feb 2019 at 10:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Samuel Williams <space.ship.traveller@gmail.com> writes:

When I do this query:

EXPLAIN SELECT COUNT(*) FROM "user_event" WHERE ((parameters ->>
'suggestion_id'::text)::integer = 26) AND what =

'suggestion_notification';

It's slow. I need to explicitly add the NULL constraint:

Try it like

EXPLAIN SELECT COUNT(*) FROM "user_event" WHERE ((parameters ->>
'suggestion_id'::text) = '26') AND what = 'suggestion_notification';

I don't think we assume that CoerceViaIO is strict, and without that
the deduction that the value couldn't be null doesn't hold. In any
case you're better off without the runtime type conversion: that
isn't doing much for you except raising the odds of getting an error.

For run-time safe comparisons of something stored in jsonb that I expect to
only have booleans (and I don't trust input to always be valid), I have a
utility function like this-

CREATE OR REPLACE FUNCTION public.util_to_bool( pInput TEXT )
RETURNS BOOL AS
$BODY$
BEGIN
RETURN pInput::BOOL;
EXCEPTION WHEN OTHERS THEN
RETURN FALSE;
END;
$BODY$
LANGUAGE 'plpgsql' IMMUTABLE;

This is probably not best practice however.

Show quoted text

regards, tom lane

#6Samuel Williams
space.ship.traveller@gmail.com
In reply to: Samuel Williams (#3)
Re: Partial index on JSON column

So, I checked my original code, and I didn't add `::text`, it was actually
added by Postgres:

EXPLAIN for: SELECT "user_event".* FROM "user_event" WHERE
"user_event"."what" = $1 AND (((parameters ->> 'suggestion_id')::integer) =
119 AND ((parameters ->> 'suggestion_id') IS NOT NULL)) [["what",
"suggestion_notification"]]
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Index Scan using index_user_event_for_suggestion_notification on
user_event (cost=0.42..77193.95 rows=20669 width=138)
Index Cond: ((((parameters ->> 'suggestion_id'::text))::integer = 119)
AND ((what)::text = 'suggestion_notification'::text))
(2 rows)

Is there some way to directly use the integer value in the index with
minimal type coercions?

Thanks
Samuel

On Wed, 20 Feb 2019 at 10:24, Samuel Williams <
space.ship.traveller@gmail.com> wrote:

Show quoted text

Thanks for the quick reply Tom,

I will try your advice.

The reason why I used ::integer for the INDEX is because I assumed it
would be more efficient both in space and performance.

In the JSONB field, it is actually an integer, i.e. {"location_age": 1,
"suggestion_id": 26}

So, now that I think about it, maybe the way I'm using ::text is wrong.

Any further advice is most appreciated.

Kind regards,
Samuel

On Wed, 20 Feb 2019 at 10:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Samuel Williams <space.ship.traveller@gmail.com> writes:

When I do this query:

EXPLAIN SELECT COUNT(*) FROM "user_event" WHERE ((parameters ->>
'suggestion_id'::text)::integer = 26) AND what =

'suggestion_notification';

It's slow. I need to explicitly add the NULL constraint:

Try it like

EXPLAIN SELECT COUNT(*) FROM "user_event" WHERE ((parameters ->>
'suggestion_id'::text) = '26') AND what = 'suggestion_notification';

I don't think we assume that CoerceViaIO is strict, and without that
the deduction that the value couldn't be null doesn't hold. In any
case you're better off without the runtime type conversion: that
isn't doing much for you except raising the odds of getting an error.

regards, tom lane

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#2)
Re: Partial index on JSON column

I wrote:

Try it like

EXPLAIN SELECT COUNT(*) FROM "user_event" WHERE ((parameters ->>
'suggestion_id'::text) = '26') AND what = 'suggestion_notification';

I don't think we assume that CoerceViaIO is strict, and without that
the deduction that the value couldn't be null doesn't hold.

Hmm ... on closer inspection, we do assume that CoerceViaIO is strict,
but not everyplace knows that, so there's something that could be
improved here. In the meantime, try it as stated above.

regards, tom lane

#8Samuel Williams
space.ship.traveller@gmail.com
In reply to: Tom Lane (#7)
Re: Partial index on JSON column

Thanks Tom, I did solve the problem by adding the null constraint for now,
it's a quick solution, and I look forward to the future where this case is
handled appropriately.

On Wed, 20 Feb 2019 at 12:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

I wrote:

Try it like

EXPLAIN SELECT COUNT(*) FROM "user_event" WHERE ((parameters ->>
'suggestion_id'::text) = '26') AND what = 'suggestion_notification';

I don't think we assume that CoerceViaIO is strict, and without that
the deduction that the value couldn't be null doesn't hold.

Hmm ... on closer inspection, we do assume that CoerceViaIO is strict,
but not everyplace knows that, so there's something that could be
improved here. In the meantime, try it as stated above.

regards, tom lane