massive update on gin index

Started by Marcos Pegoraroalmost 4 years ago6 messagesgeneral

marcos@f10.com.br

almost 4 years ago

In a table with people's info I have 3 phone numbers, mobile, work and
home. But then some have 2 mobiles, some have 2 work numbers, so decided to
test it as an array of json. I know I could have another table for that,
but I was just testing.

So my original table had
Mobile, Work, Home and all of them are btree indexed.

Then added a jsonb field and updated it with those 3 phone numbers on it
[{"phone": 2236279878, "type": 1}, {"phone": 22998432631, "type": 2}]
[{"phone": 22996783278, "type": 2}]
create index idxPhones on People using gin(Phones)

If I select using old or new fields, both uses index and Execution Time is
similar
explain analyze select * from People where Phones @>
'[{"phone": 2236279878}]';
explain analyze select * from People where Mobile = 2236279878 or Work
= 2236279878 or Home = 2236279878;

But then I repeated 2 or 3 times that update which stores those 3 phones on
json and then my gin index became slow, very very slow, why ?

select using btree on 3 phone numbers - Execution Time: 0.164 ms
select using gin on json on first update - Execution Time: 0.220 ms
select using gin on json next to 2 or 3 updates - Execution Time: 11.220 ms

And that execution time will come back to 0.220 ms only if I recreate the
index.

Then I found gin_pending_list_limit and fast_update which I think are used
to update GIN indexes, but didn´t find any examples of both.

What am I missing ? That gin index needs to have some more options or
attributes on it ?
I know in a day by day use I'll never do that massive update twice but just
to understand when will this index be updated ?

Thanks
Marcos

Guyren Howe

guyren@gmail.com

almost 4 years ago

In reply to: Marcos Pegoraro (#1)

Re: massive update on gin index

You might consider defining a phone type that includes your “type” information, and just having an array of those, if you really want to do something like this.

But a related table instead would be the obvious answer.

Show quoted text

On Sep 14, 2022, at 12:33 , Marcos Pegoraro <marcos@f10.com.br <mailto:marcos@f10.com.br>> wrote:

In a table with people's info I have 3 phone numbers, mobile, work and home. But then some have 2 mobiles, some have 2 work numbers, so decided to test it as an array of json. I know I could have another table for that, but I was just testing.

So my original table had
Mobile, Work, Home and all of them are btree indexed.

Then added a jsonb field and updated it with those 3 phone numbers on it
[{"phone": 2236279878, "type": 1}, {"phone": 22998432631, "type": 2}]
[{"phone": 22996783278, "type": 2}]
create index idxPhones on People using gin(Phones)

If I select using old or new fields, both uses index and Execution Time is similar
explain analyze select * from People where Phones @> '[{"phone": 2236279878}]';
explain analyze select * from People where Mobile = 2236279878 or Work = 2236279878 or Home = 2236279878;

But then I repeated 2 or 3 times that update which stores those 3 phones on json and then my gin index became slow, very very slow, why ?

select using btree on 3 phone numbers - Execution Time: 0.164 ms
select using gin on json on first update - Execution Time: 0.220 ms
select using gin on json next to 2 or 3 updates - Execution Time: 11.220 ms

And that execution time will come back to 0.220 ms only if I recreate the index.

Then I found gin_pending_list_limit and fast_update which I think are used to update GIN indexes, but didn´t find any examples of both.

What am I missing ? That gin index needs to have some more options or attributes on it ?
I know in a day by day use I'll never do that massive update twice but just to understand when will this index be updated ?

Thanks
Marcos

Rob Sargent

robjsargent@gmail.com

almost 4 years ago

In reply to: Guyren Howe (#2)

Re: massive update on gin index

On 9/14/22 13:38, Guyren Howe wrote:

You might consider defining a phone type that includes your “type”
information, and just having an array of those, if you really want to
do something like this.

But a related table instead would be the obvious answer.

Did you try a simple array of phone numbers? If you really care about
mobile,work,home prepend the number with one of HMW. Easily stripped off
as necessary. I've had decent performance with arrays in the past.

Marcos Pegoraro

marcos@f10.com.br

almost 4 years ago

In reply to: Rob Sargent (#3)

Re: massive update on gin index

Did you try a simple array of phone numbers? If you really care about
mobile,work,home prepend the number with one of HMW. Easily stripped off
as necessary. I've had decent performance with arrays in the past.

I know I have other options, and possibly better, but I was trying to
understand what happens with gin indexes, just that.

Tom Lane

tgl@sss.pgh.pa.us

almost 4 years ago

In reply to: Marcos Pegoraro (#4)

Re: massive update on gin index

Marcos Pegoraro <marcos@f10.com.br> writes:

I know I have other options, and possibly better, but I was trying to
understand what happens with gin indexes, just that.

GIN does have a "pending list" of insertions not yet pushed into the main
index structure, and search performance will suffer if that gets too
bloated. I don't recall much about how to control that, but I think
vacuuming the table will serve to empty the pending list. Also see

https://www.postgresql.org/docs/current/gin-implementation.html#GIN-FAST-UPDATE

regards, tom lane

Marcos Pegoraro

marcos@f10.com.br

almost 4 years ago

In reply to: Tom Lane (#5)

Re: massive update on gin index

Em qua., 14 de set. de 2022 às 16:55, Tom Lane <tgl@sss.pgh.pa.us> escreveu:

GIN does have a "pending list" of insertions not yet pushed into the main
index structure, and search performance will suffer if that gets too
bloated. I don't recall much about how to control that, but I think
vacuuming the table will serve to empty the pending list. Also see

https://www.postgresql.org/docs/current/gin-implementation.html#GIN-FAST-UPDATE

regards, tom lane

Correct, if I want use that index immediately with same performance I have
to call

select pg_catalog.gin_clean_pending_list('idxphones');

Or wait next autovacuum.

thanks
Marcos