PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: HEAD seems to generate larger WAL regarding GIN index

Started by Fujii Masaoover 11 years ago31 messageshackers
Jump to latest
#1Fujii Masao
masao.fujii@gmail.com

On Mon, Apr 14, 2014 at 11:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Fujii Masao <masao.fujii@gmail.com> writes:

On Tue, Apr 1, 2014 at 1:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:

Should we try to install some hack around fastupdate for 9.4? I fear
the divergence between reasonable values of work_mem and reasonable
sizes for that list is only going to continue to get bigger. I'm sure
there's somebody out there who has work_mem = 16GB, and stuff like
263865a48973767ce8ed7b7788059a38a24a9f37 is only going to increase the
appeal of large values.

Controlling the threshold of the size of pending list only by GUC doesn't
seem reasonable. Users may want to increase the threshold only for the
GIN index which can be updated heavily, and decrease it otherwise. So
I think that it's better to add new storage parameter for GIN index to control
the threshold, or both storage parameter and GUC.

Yeah, -1 for a GUC. A GIN-index-specific storage parameter seems more
appropriate.

The attached patch introduces the GIN index storage parameter
"PENDING_LIST_CLEANUP_SIZE" which specifies the maximum size of
GIN pending list. If it's not set, work_mem is used as that maximum size,
instead. So this patch doesn't break the existing application which
currently uses work_mem as the threshold of cleanup operation of
the pending list. It has only not to set PENDING_LIST_CLEANUP_SIZE.

This is an index storage parameter, and allows us to specify each
threshold per GIN index. So, for example, we can increase the threshold
only for the GIN index which can be updated heavily, and decrease it otherwise.

This patch uses another patch that I proposed (*1) as an infrastructure.
Please apply that infrastructure patch first if you apply this patch.

(*1)
/messages/by-id/CAHGQGwEanQ_e8WLHL25=bm_8Z5zkyZw0K0yiR+kdMV2HgnE9FQ@mail.gmail.com

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Fujii Masao
masao.fujii@gmail.com
In reply to: Fujii Masao (#1)
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: HEAD seems to generate larger WAL regarding GIN index

On Fri, Aug 8, 2014 at 11:43 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Mon, Apr 14, 2014 at 11:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Fujii Masao <masao.fujii@gmail.com> writes:

On Tue, Apr 1, 2014 at 1:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:

Should we try to install some hack around fastupdate for 9.4? I fear
the divergence between reasonable values of work_mem and reasonable
sizes for that list is only going to continue to get bigger. I'm sure
there's somebody out there who has work_mem = 16GB, and stuff like
263865a48973767ce8ed7b7788059a38a24a9f37 is only going to increase the
appeal of large values.

Controlling the threshold of the size of pending list only by GUC doesn't
seem reasonable. Users may want to increase the threshold only for the
GIN index which can be updated heavily, and decrease it otherwise. So
I think that it's better to add new storage parameter for GIN index to control
the threshold, or both storage parameter and GUC.

Yeah, -1 for a GUC. A GIN-index-specific storage parameter seems more
appropriate.

The attached patch introduces the GIN index storage parameter
"PENDING_LIST_CLEANUP_SIZE" which specifies the maximum size of
GIN pending list. If it's not set, work_mem is used as that maximum size,
instead. So this patch doesn't break the existing application which
currently uses work_mem as the threshold of cleanup operation of
the pending list. It has only not to set PENDING_LIST_CLEANUP_SIZE.

This is an index storage parameter, and allows us to specify each
threshold per GIN index. So, for example, we can increase the threshold
only for the GIN index which can be updated heavily, and decrease it otherwise.

This patch uses another patch that I proposed (*1) as an infrastructure.
Please apply that infrastructure patch first if you apply this patch.

(*1)
/messages/by-id/CAHGQGwEanQ_e8WLHL25=bm_8Z5zkyZw0K0yiR+kdMV2HgnE9FQ@mail.gmail.com

Regards,

--
Fujii Masao

Sorry, I forgot to attached the patch.... This time, attached.

Regards,

--
Fujii Masao

Attachments:

pending_list_cleanup_size_v1.patchtext/x-patch; charset=US-ASCII; name=pending_list_cleanup_size_v1.patchDownload+270-352
#3Michael Paquier
michael@paquier.xyz
In reply to: Fujii Masao (#1)
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: HEAD seems to generate larger WAL regarding GIN index

On Fri, Aug 8, 2014 at 11:43 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

The attached patch introduces...

A patch perhaps? :)
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Fujii Masao (#2)
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: HEAD seems to generate larger WAL regarding GIN index

On Fri, Aug 8, 2014 at 11:45 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Fri, Aug 8, 2014 at 11:43 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Mon, Apr 14, 2014 at 11:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Fujii Masao <masao.fujii@gmail.com> writes:

On Tue, Apr 1, 2014 at 1:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:

Should we try to install some hack around fastupdate for 9.4? I fear
the divergence between reasonable values of work_mem and reasonable
sizes for that list is only going to continue to get bigger. I'm sure
there's somebody out there who has work_mem = 16GB, and stuff like
263865a48973767ce8ed7b7788059a38a24a9f37 is only going to increase the
appeal of large values.

Controlling the threshold of the size of pending list only by GUC doesn't
seem reasonable. Users may want to increase the threshold only for the
GIN index which can be updated heavily, and decrease it otherwise. So
I think that it's better to add new storage parameter for GIN index to control
the threshold, or both storage parameter and GUC.

Yeah, -1 for a GUC. A GIN-index-specific storage parameter seems more
appropriate.

The attached patch introduces the GIN index storage parameter
"PENDING_LIST_CLEANUP_SIZE" which specifies the maximum size of
GIN pending list. If it's not set, work_mem is used as that maximum size,
instead. So this patch doesn't break the existing application which
currently uses work_mem as the threshold of cleanup operation of
the pending list. It has only not to set PENDING_LIST_CLEANUP_SIZE.

This is an index storage parameter, and allows us to specify each
threshold per GIN index. So, for example, we can increase the threshold
only for the GIN index which can be updated heavily, and decrease it otherwise.

This patch uses another patch that I proposed (*1) as an infrastructure.
Please apply that infrastructure patch first if you apply this patch.

(*1)
/messages/by-id/CAHGQGwEanQ_e8WLHL25=bm_8Z5zkyZw0K0yiR+kdMV2HgnE9FQ@mail.gmail.com

Regards,

--
Fujii Masao

Sorry, I forgot to attached the patch.... This time, attached.

I think that this patch should be rebased.
It contains garbage code.

Regards,

-------
Sawada Masahiko

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Fujii Masao
masao.fujii@gmail.com
In reply to: Masahiko Sawada (#4)
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: HEAD seems to generate larger WAL regarding GIN index

On Sat, Aug 16, 2014 at 4:23 PM, Sawada Masahiko <sawada.mshk@gmail.com> wrote:

On Fri, Aug 8, 2014 at 11:45 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Fri, Aug 8, 2014 at 11:43 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Mon, Apr 14, 2014 at 11:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Fujii Masao <masao.fujii@gmail.com> writes:

On Tue, Apr 1, 2014 at 1:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:

Should we try to install some hack around fastupdate for 9.4? I fear
the divergence between reasonable values of work_mem and reasonable
sizes for that list is only going to continue to get bigger. I'm sure
there's somebody out there who has work_mem = 16GB, and stuff like
263865a48973767ce8ed7b7788059a38a24a9f37 is only going to increase the
appeal of large values.

Controlling the threshold of the size of pending list only by GUC doesn't
seem reasonable. Users may want to increase the threshold only for the
GIN index which can be updated heavily, and decrease it otherwise. So
I think that it's better to add new storage parameter for GIN index to control
the threshold, or both storage parameter and GUC.

Yeah, -1 for a GUC. A GIN-index-specific storage parameter seems more
appropriate.

The attached patch introduces the GIN index storage parameter
"PENDING_LIST_CLEANUP_SIZE" which specifies the maximum size of
GIN pending list. If it's not set, work_mem is used as that maximum size,
instead. So this patch doesn't break the existing application which
currently uses work_mem as the threshold of cleanup operation of
the pending list. It has only not to set PENDING_LIST_CLEANUP_SIZE.

This is an index storage parameter, and allows us to specify each
threshold per GIN index. So, for example, we can increase the threshold
only for the GIN index which can be updated heavily, and decrease it otherwise.

This patch uses another patch that I proposed (*1) as an infrastructure.
Please apply that infrastructure patch first if you apply this patch.

(*1)
/messages/by-id/CAHGQGwEanQ_e8WLHL25=bm_8Z5zkyZw0K0yiR+kdMV2HgnE9FQ@mail.gmail.com

Regards,

--
Fujii Masao

Sorry, I forgot to attached the patch.... This time, attached.

I think that this patch should be rebased.
It contains garbage code.

Thanks for reviewing the patch! ISTM that I failed to make the patch from
my git repository... Attached is the rebased version.

Regards,

--
Fujii Masao

Attachments:

pending_list_cleanup_size_v2.patchtext/x-patch; charset=US-ASCII; name=pending_list_cleanup_size_v2.patchDownload+68-21
#6Jeff Janes
jeff.janes@gmail.com
In reply to: Fujii Masao (#5)
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: HEAD seems to generate larger WAL regarding GIN index

On Sun, Aug 17, 2014 at 7:46 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

Thanks for reviewing the patch! ISTM that I failed to make the patch from
my git repository... Attached is the rebased version.

I get some compiler warnings on v2 of this patch:

reloptions.c:219: warning: excess elements in struct initializer
reloptions.c:219: warning: (near initialization for 'intRelOpts[15]')

Cheers,

Jeff

#7Fujii Masao
masao.fujii@gmail.com
In reply to: Jeff Janes (#6)
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: HEAD seems to generate larger WAL regarding GIN index

On Tue, Sep 9, 2014 at 1:28 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Sun, Aug 17, 2014 at 7:46 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

Thanks for reviewing the patch! ISTM that I failed to make the patch from
my git repository... Attached is the rebased version.

I get some compiler warnings on v2 of this patch:

reloptions.c:219: warning: excess elements in struct initializer
reloptions.c:219: warning: (near initialization for 'intRelOpts[15]')

Thanks for testing the patch!
Attached is the updated version of the patch.

Previously the patch depended on another infrastructure patch
(which allows a user to specify the unit in reloption (*1)). But that
infrastructure patch has serious problem and it's not easy to fix
the problem. So I changed the patch so that it doesn't depend on
that infrastructure patch at all. Even without the infrastructure
patch, the feature that this patch introduces is useful.

Also I added the regression test into the patch.

(*1)
/messages/by-id/CAHGQGwEanQ_e8WLHL25=bm_8Z5zkyZw0K0yiR+kdMV2HgnE9FQ@mail.gmail.com

Regards,

--
Fujii Masao

Attachments:

pending_list_cleanup_size_v3.patchtext/x-patch; charset=US-ASCII; name=pending_list_cleanup_size_v3.patchDownload+101-50
#8Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Fujii Masao (#7)
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: HEAD seems to generate larger WAL regarding GIN index

(2014/09/09 22:17), Fujii Masao wrote:

On Tue, Sep 9, 2014 at 1:28 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

I get some compiler warnings on v2 of this patch:

reloptions.c:219: warning: excess elements in struct initializer
reloptions.c:219: warning: (near initialization for 'intRelOpts[15]')

Attached is the updated version of the patch.

Thank you for updating the patch!

I took a quick review on the patch. It looks good to me, but one thing
I'm concerned about is

You wrote:

The attached patch introduces the GIN index storage parameter
"PENDING_LIST_CLEANUP_SIZE" which specifies the maximum size of
GIN pending list. If it's not set, work_mem is used as that

maximum size,

instead. So this patch doesn't break the existing application which
currently uses work_mem as the threshold of cleanup operation of
the pending list. It has only not to set PENDING_LIST_CLEANUP_SIZE.

As you mentioned, I think it's important to consider for the existing
applications, but I'm wondering if it would be a bit confusing users to
have two parameters, PENDING_LIST_CLEANUP_SIZE and work_mem, for this
setting. Wouldn't it be easy-to-use to have only one parameter,
PENDING_LIST_CLEANUP_SIZE? How about setting PENDING_LIST_CLEANUP_SIZE
to work_mem as the default value when running the CREATE INDEX command?

Sorry for the delay.

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Fujii Masao
masao.fujii@gmail.com
In reply to: Etsuro Fujita (#8)
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: HEAD seems to generate larger WAL regarding GIN index

On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

(2014/09/09 22:17), Fujii Masao wrote:

On Tue, Sep 9, 2014 at 1:28 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

I get some compiler warnings on v2 of this patch:

reloptions.c:219: warning: excess elements in struct initializer
reloptions.c:219: warning: (near initialization for 'intRelOpts[15]')

Attached is the updated version of the patch.

Thank you for updating the patch!

I took a quick review on the patch. It looks good to me,

Thanks for reviewing the patch!

but one thing I'm
concerned about is

You wrote:

The attached patch introduces the GIN index storage parameter
"PENDING_LIST_CLEANUP_SIZE" which specifies the maximum size of
GIN pending list. If it's not set, work_mem is used as that maximum
size,
instead. So this patch doesn't break the existing application which
currently uses work_mem as the threshold of cleanup operation of
the pending list. It has only not to set PENDING_LIST_CLEANUP_SIZE.

As you mentioned, I think it's important to consider for the existing
applications, but I'm wondering if it would be a bit confusing users to have
two parameters,

Yep.

PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting.
Wouldn't it be easy-to-use to have only one parameter,
PENDING_LIST_CLEANUP_SIZE? How about setting PENDING_LIST_CLEANUP_SIZE to
work_mem as the default value when running the CREATE INDEX command?

That's an idea. But there might be some users who want to change
the cleanup size per session like they can do by setting work_mem,
and your idea would prevent them from doing that...

So what about introduing pending_list_cleanup_size also as GUC?
That is, users can set the threshold by using either the reloption or
GUC.

Sorry for the delay.

No problem. Thanks!

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Fujii Masao (#9)
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: HEAD seems to generate larger WAL regarding GIN index

(2014/09/10 12:31), Fujii Masao wrote:

On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

(2014/09/09 22:17), Fujii Masao wrote:

Attached is the updated version of the patch.

I took a quick review on the patch. It looks good to me,

but one thing I'm
concerned about is

You wrote:

The attached patch introduces the GIN index storage parameter
"PENDING_LIST_CLEANUP_SIZE" which specifies the maximum size of
GIN pending list. If it's not set, work_mem is used as that maximum
size,
instead. So this patch doesn't break the existing application which
currently uses work_mem as the threshold of cleanup operation of
the pending list. It has only not to set PENDING_LIST_CLEANUP_SIZE.

As you mentioned, I think it's important to consider for the existing
applications, but I'm wondering if it would be a bit confusing users to have
two parameters,

Yep.

PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting.
Wouldn't it be easy-to-use to have only one parameter,
PENDING_LIST_CLEANUP_SIZE? How about setting PENDING_LIST_CLEANUP_SIZE to
work_mem as the default value when running the CREATE INDEX command?

That's an idea. But there might be some users who want to change
the cleanup size per session like they can do by setting work_mem,
and your idea would prevent them from doing that...

Why not use ALTER INDEX ... SET (PENDING_LIST_CLEANUP_SIZE= ...)? Maybe
I'm missing something, though.

So what about introduing pending_list_cleanup_size also as GUC?
That is, users can set the threshold by using either the reloption or
GUC.

Yeah, that's an idea. So, I'd like to hear the opinions of others.

Thanks,

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Fujii Masao
masao.fujii@gmail.com
In reply to: Etsuro Fujita (#10)
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: HEAD seems to generate larger WAL regarding GIN index

On Wed, Sep 10, 2014 at 5:35 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

(2014/09/10 12:31), Fujii Masao wrote:

On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

(2014/09/09 22:17), Fujii Masao wrote:

Attached is the updated version of the patch.

I took a quick review on the patch. It looks good to me,

but one thing I'm
concerned about is

You wrote:

The attached patch introduces the GIN index storage parameter
"PENDING_LIST_CLEANUP_SIZE" which specifies the maximum size of
GIN pending list. If it's not set, work_mem is used as that maximum
size,
instead. So this patch doesn't break the existing application which
currently uses work_mem as the threshold of cleanup operation of
the pending list. It has only not to set PENDING_LIST_CLEANUP_SIZE.

As you mentioned, I think it's important to consider for the existing
applications, but I'm wondering if it would be a bit confusing users to
have
two parameters,

Yep.

PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting.
Wouldn't it be easy-to-use to have only one parameter,
PENDING_LIST_CLEANUP_SIZE? How about setting PENDING_LIST_CLEANUP_SIZE
to
work_mem as the default value when running the CREATE INDEX command?

That's an idea. But there might be some users who want to change
the cleanup size per session like they can do by setting work_mem,
and your idea would prevent them from doing that...

Why not use ALTER INDEX ... SET (PENDING_LIST_CLEANUP_SIZE= ...)? Maybe I'm
missing something, though.

It takes AccessExclusive lock and has an effect on every sessions
(not only specified session).

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Fujii Masao (#9)
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: HEAD seems to generate larger WAL regarding GIN index

Fujii Masao wrote:

On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting.
Wouldn't it be easy-to-use to have only one parameter,
PENDING_LIST_CLEANUP_SIZE? How about setting PENDING_LIST_CLEANUP_SIZE to
work_mem as the default value when running the CREATE INDEX command?

That's an idea. But there might be some users who want to change
the cleanup size per session like they can do by setting work_mem,
and your idea would prevent them from doing that...

So what about introduing pending_list_cleanup_size also as GUC?
That is, users can set the threshold by using either the reloption or
GUC.

Yes, I think having both a GUC and a reloption makes sense -- the GUC
applies to all indexes, and can be tweaked for individual indexes using
the reloption.

I'm not sure about the idea of being able to change it per session,
though. Do you mean that you would like insert processes use a very
large value so that they can just append new values to the pending list,
and have vacuum use a small value so that it cleans up as soon as it
runs? Two things: 1. we could have an "autovacuum_" reloption which
only changes what autovacuum does; 2. we could have autovacuum run
index cleanup actions separately from actual vacuuming.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Fujii Masao
masao.fujii@gmail.com
In reply to: Alvaro Herrera (#12)
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: HEAD seems to generate larger WAL regarding GIN index

On Wed, Sep 10, 2014 at 10:37 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Fujii Masao wrote:

On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting.
Wouldn't it be easy-to-use to have only one parameter,
PENDING_LIST_CLEANUP_SIZE? How about setting PENDING_LIST_CLEANUP_SIZE to
work_mem as the default value when running the CREATE INDEX command?

That's an idea. But there might be some users who want to change
the cleanup size per session like they can do by setting work_mem,
and your idea would prevent them from doing that...

So what about introduing pending_list_cleanup_size also as GUC?
That is, users can set the threshold by using either the reloption or
GUC.

Yes, I think having both a GUC and a reloption makes sense -- the GUC
applies to all indexes, and can be tweaked for individual indexes using
the reloption.

Agreed.

I'm not sure about the idea of being able to change it per session,
though. Do you mean that you would like insert processes use a very
large value so that they can just append new values to the pending list,
and have vacuum use a small value so that it cleans up as soon as it
runs? Two things: 1. we could have an "autovacuum_" reloption which
only changes what autovacuum does; 2. we could have autovacuum run
index cleanup actions separately from actual vacuuming.

Yes, I was thinking something like that. But if autovacuum
has already been able to handle that, it's nice. Anyway,
as you pointed out, it's better to have both GUC and reloption
for the cleanup size of pending list.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Fujii Masao (#13)
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: HEAD seems to generate larger WAL regarding GIN index

(2014/09/13 2:42), Fujii Masao wrote:

On Wed, Sep 10, 2014 at 10:37 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Fujii Masao wrote:

On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting.
Wouldn't it be easy-to-use to have only one parameter,
PENDING_LIST_CLEANUP_SIZE? How about setting PENDING_LIST_CLEANUP_SIZE to
work_mem as the default value when running the CREATE INDEX command?

That's an idea. But there might be some users who want to change
the cleanup size per session like they can do by setting work_mem,
and your idea would prevent them from doing that...

So what about introduing pending_list_cleanup_size also as GUC?
That is, users can set the threshold by using either the reloption or
GUC.

Yes, I think having both a GUC and a reloption makes sense -- the GUC
applies to all indexes, and can be tweaked for individual indexes using
the reloption.

Agreed.

I'm not sure about the idea of being able to change it per session,
though. Do you mean that you would like insert processes use a very
large value so that they can just append new values to the pending list,
and have vacuum use a small value so that it cleans up as soon as it
runs? Two things: 1. we could have an "autovacuum_" reloption which
only changes what autovacuum does; 2. we could have autovacuum run
index cleanup actions separately from actual vacuuming.

Yes, I was thinking something like that. But if autovacuum
has already been able to handle that, it's nice. Anyway,
as you pointed out, it's better to have both GUC and reloption
for the cleanup size of pending list.

OK, I'd vote for your idea of having both the GUC and the reloption.
So, I think the patch needs to be updated. Fujii-san, what plan do you
have about the patch?

Sorry for the delay.

Thanks,

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Fujii Masao
masao.fujii@gmail.com
In reply to: Etsuro Fujita (#14)
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: HEAD seems to generate larger WAL regarding GIN index

On Wed, Sep 24, 2014 at 11:10 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

(2014/09/13 2:42), Fujii Masao wrote:

On Wed, Sep 10, 2014 at 10:37 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Fujii Masao wrote:

On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting.
Wouldn't it be easy-to-use to have only one parameter,
PENDING_LIST_CLEANUP_SIZE? How about setting PENDING_LIST_CLEANUP_SIZE
to
work_mem as the default value when running the CREATE INDEX command?

That's an idea. But there might be some users who want to change
the cleanup size per session like they can do by setting work_mem,
and your idea would prevent them from doing that...

So what about introduing pending_list_cleanup_size also as GUC?
That is, users can set the threshold by using either the reloption or
GUC.

Yes, I think having both a GUC and a reloption makes sense -- the GUC
applies to all indexes, and can be tweaked for individual indexes using
the reloption.

Agreed.

I'm not sure about the idea of being able to change it per session,
though. Do you mean that you would like insert processes use a very
large value so that they can just append new values to the pending list,
and have vacuum use a small value so that it cleans up as soon as it
runs? Two things: 1. we could have an "autovacuum_" reloption which
only changes what autovacuum does; 2. we could have autovacuum run
index cleanup actions separately from actual vacuuming.

Yes, I was thinking something like that. But if autovacuum
has already been able to handle that, it's nice. Anyway,
as you pointed out, it's better to have both GUC and reloption
for the cleanup size of pending list.

OK, I'd vote for your idea of having both the GUC and the reloption. So, I
think the patch needs to be updated. Fujii-san, what plan do you have about
the patch?

Please see the attached patch. In this patch, I introduced the GUC parameter,
pending_list_cleanup_size. I chose 4MB as the default value of the parameter.
But do you have any better idea about that default value?

BTW, I moved the CommitFest entry of this patch to next CF 2014-10.

Regards,

--
Fujii Masao

Attachments:

pending_list_cleanup_size_v4.patchtext/x-patch; charset=US-ASCII; name=pending_list_cleanup_size_v4.patchDownload+128-46
#16Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Fujii Masao (#15)
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: HEAD seems to generate larger WAL regarding GIN index

(2014/10/08 22:51), Fujii Masao wrote:

On Wed, Sep 24, 2014 at 11:10 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

On Wed, Sep 10, 2014 at 10:37 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Fujii Masao wrote:

On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting.
Wouldn't it be easy-to-use to have only one parameter,
PENDING_LIST_CLEANUP_SIZE? How about setting PENDING_LIST_CLEANUP_SIZE
to
work_mem as the default value when running the CREATE INDEX command?

So what about introduing pending_list_cleanup_size also as GUC?
That is, users can set the threshold by using either the reloption or
GUC.

Yes, I think having both a GUC and a reloption makes sense -- the GUC
applies to all indexes, and can be tweaked for individual indexes using
the reloption.

OK, I'd vote for your idea of having both the GUC and the reloption. So, I
think the patch needs to be updated. Fujii-san, what plan do you have about
the patch?

Please see the attached patch. In this patch, I introduced the GUC parameter,
pending_list_cleanup_size. I chose 4MB as the default value of the parameter.
But do you have any better idea about that default value?

It seems reasonable to me that the GUC has the same default value as
work_mem. So, +1 for the default value of 4MB.

BTW, I moved the CommitFest entry of this patch to next CF 2014-10.

OK, I'll review the patch in the CF.

Thanks,

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Etsuro Fujita (#16)
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: HEAD seems to generate larger WAL regarding GIN index

(2014/10/09 11:49), Etsuro Fujita wrote:

(2014/10/08 22:51), Fujii Masao wrote:

On Wed, Sep 24, 2014 at 11:10 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

On Wed, Sep 10, 2014 at 10:37 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Fujii Masao wrote:

On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting.
Wouldn't it be easy-to-use to have only one parameter,
PENDING_LIST_CLEANUP_SIZE? How about setting
PENDING_LIST_CLEANUP_SIZE
to
work_mem as the default value when running the CREATE INDEX command?

So what about introduing pending_list_cleanup_size also as GUC?
That is, users can set the threshold by using either the reloption or
GUC.

Yes, I think having both a GUC and a reloption makes sense -- the GUC
applies to all indexes, and can be tweaked for individual indexes
using
the reloption.

OK, I'd vote for your idea of having both the GUC and the reloption.
So, I
think the patch needs to be updated. Fujii-san, what plan do you
have about
the patch?

Please see the attached patch. In this patch, I introduced the GUC
parameter,
pending_list_cleanup_size. I chose 4MB as the default value of the
parameter.
But do you have any better idea about that default value?

It seems reasonable to me that the GUC has the same default value as
work_mem. So, +1 for the default value of 4MB.

BTW, I moved the CommitFest entry of this patch to next CF 2014-10.

OK, I'll review the patch in the CF.

Thank you for updating the patch! Here are my review comments.

* The patch applies cleanly and make and make check run successfully. I
think that the patch is mostly good.

* In src/backend/utils/misc/guc.c:
+ 	{
+ 		{"pending_list_cleanup_size", PGC_USERSET, CLIENT_CONN_STATEMENT,
+ 			gettext_noop("Sets the maximum size of the pending list for GIN 
index."),
+ 			 NULL,
+ 			GUC_UNIT_KB
+ 		},
+ 		&pending_list_cleanup_size,
+ 		4096, 0, MAX_KILOBYTES,
+ 		NULL, NULL, NULL
+ 	},

ISTM it'd be better to use RESOURCES_MEM, not CLIENT_CONN_STATEMENT. No?
Also why not set min to 64, not to 0, in accoradance with that of
work_mem?

* In src/backend/utils/misc/postgresql.conf.sample:
Likewise, why not put this variable in the section of RESOURCE USAGE,
not in that of CLIENT CONNECTION DEFAULTS.

* In src/backend/access/common/reloptions.c:
+ 	{
+ 		{
+ 			"pending_list_cleanup_size",
+ 			"Maximum size of the pending list for this GIN index, in kilobytes.",
+ 			RELOPT_KIND_GIN
+ 		},
+ 		-1, 0, MAX_KILOBYTES
+ 	},

As in guc.c, why not set min to 64, not to 0?

* In src/include/access/gin.h:
/* GUC parameter */
extern PGDLLIMPORT int GinFuzzySearchLimit;
+ extern int pending_list_cleanup_size;

The comment should be "GUC parameters".

* In src/backend/access/gin/ginfast.c:
+ /* GUC parameter */
+ int			pending_list_cleanup_size = 0;

Do we need to substitute 0 for pending_list_cleanup_size?

* In doc/src/sgml/config.sgml:
+      <varlistentry id="guc-pending-list-cleanup-size" 
xreflabel="pending_list_cleanup_size">
+       <term><varname>pending_list_cleanup_size</varname> 
(<type>integer</type>)

As in postgresql.conf.sample, ISTM it'd be better to explain this
variable in the section of Resource Consumption (maybe in "Memory"), not
in that of Client Connection Defaults.

* In doc/src/sgml/gin.sgml:
! <literal>pending_list_cleanup_size</>. To avoid fluctuations in
observed

ISTM it'd be better to use <varname> for pending_list_cleanup_size, not
<literal>, here.

* In doc/src/sgml/ref/create_index.sgml:
+ <term><literal>PENDING_LIST_CLEANUP_SIZE</></term>

IMHO, it seems to me better for this variable to be in lowercase in
accordance with the GUC version. Also, I think that the words "GIN
indexes accept a different parameter:" in the section of "Index Storage
Parameters" in this reference page would be "GIN indexes accept
different parameters:".

Sorry for the delay in reviewing the patch.

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Fujii Masao
masao.fujii@gmail.com
In reply to: Etsuro Fujita (#17)
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: HEAD seems to generate larger WAL regarding GIN index

On Thu, Oct 30, 2014 at 7:30 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

(2014/10/09 11:49), Etsuro Fujita wrote:

(2014/10/08 22:51), Fujii Masao wrote:

On Wed, Sep 24, 2014 at 11:10 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

On Wed, Sep 10, 2014 at 10:37 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Fujii Masao wrote:

On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting.
Wouldn't it be easy-to-use to have only one parameter,
PENDING_LIST_CLEANUP_SIZE? How about setting
PENDING_LIST_CLEANUP_SIZE
to
work_mem as the default value when running the CREATE INDEX command?

So what about introduing pending_list_cleanup_size also as GUC?
That is, users can set the threshold by using either the reloption or
GUC.

Yes, I think having both a GUC and a reloption makes sense -- the GUC
applies to all indexes, and can be tweaked for individual indexes
using
the reloption.

OK, I'd vote for your idea of having both the GUC and the reloption.
So, I
think the patch needs to be updated. Fujii-san, what plan do you
have about
the patch?

Please see the attached patch. In this patch, I introduced the GUC
parameter,
pending_list_cleanup_size. I chose 4MB as the default value of the
parameter.
But do you have any better idea about that default value?

It seems reasonable to me that the GUC has the same default value as
work_mem. So, +1 for the default value of 4MB.

BTW, I moved the CommitFest entry of this patch to next CF 2014-10.

OK, I'll review the patch in the CF.

Thank you for updating the patch! Here are my review comments.

* The patch applies cleanly and make and make check run successfully. I
think that the patch is mostly good.

Thanks! Attached is the updated version of the patch.

* In src/backend/utils/misc/guc.c:
+       {
+               {"pending_list_cleanup_size", PGC_USERSET,
CLIENT_CONN_STATEMENT,
+                       gettext_noop("Sets the maximum size of the pending
list for GIN index."),
+                        NULL,
+                       GUC_UNIT_KB
+               },
+               &pending_list_cleanup_size,
+               4096, 0, MAX_KILOBYTES,
+               NULL, NULL, NULL
+       },

ISTM it'd be better to use RESOURCES_MEM, not CLIENT_CONN_STATEMENT. No?

Yes if the pending list always exists in the memory. But not, IIUC. Thought?

Also why not set min to 64, not to 0, in accoradance with that of work_mem?

I'm OK to use 64. But I just chose 0 because I could not think of any reasonable
reason why 64k is suitable as the minimum size of the pending list.
IOW, I have no idea about whether it's reasonable to use the min value of
work_mem as the min size of the pending list.

* In src/backend/utils/misc/postgresql.conf.sample:
Likewise, why not put this variable in the section of RESOURCE USAGE, not in
that of CLIENT CONNECTION DEFAULTS.

Same as above.

* In src/backend/access/common/reloptions.c:
+       {
+               {
+                       "pending_list_cleanup_size",
+                       "Maximum size of the pending list for this GIN
index, in kilobytes.",
+                       RELOPT_KIND_GIN
+               },
+               -1, 0, MAX_KILOBYTES
+       },

As in guc.c, why not set min to 64, not to 0?

Same as above.

* In src/include/access/gin.h:
/* GUC parameter */
extern PGDLLIMPORT int GinFuzzySearchLimit;
+ extern int pending_list_cleanup_size;

The comment should be "GUC parameters".

Yes, fixed.

* In src/backend/access/gin/ginfast.c:
+ /* GUC parameter */
+ int                   pending_list_cleanup_size = 0;

Do we need to substitute 0 for pending_list_cleanup_size?

Same as above.

* In doc/src/sgml/config.sgml:
+      <varlistentry id="guc-pending-list-cleanup-size"
xreflabel="pending_list_cleanup_size">
+       <term><varname>pending_list_cleanup_size</varname>
(<type>integer</type>)

As in postgresql.conf.sample, ISTM it'd be better to explain this variable
in the section of Resource Consumption (maybe in "Memory"), not in that of
Client Connection Defaults.

Same as above.

* In doc/src/sgml/gin.sgml:
! <literal>pending_list_cleanup_size</>. To avoid fluctuations in
observed

ISTM it'd be better to use <varname> for pending_list_cleanup_size, not
<literal>, here.

Yes, fixed.

* In doc/src/sgml/ref/create_index.sgml:
+ <term><literal>PENDING_LIST_CLEANUP_SIZE</></term>

IMHO, it seems to me better for this variable to be in lowercase in
accordance with the GUC version.

Using lowercase only for pending_list_cleanup_size and uppercase for
other options
looks strange to me. What about using lowercase for all the storage options?
I changed the document in that way.

Also, I think that the words "GIN indexes
accept a different parameter:" in the section of "Index Storage Parameters"
in this reference page would be "GIN indexes accept different parameters:".

Yes, fixed.

Sorry for the delay in reviewing the patch.

No problem. Thanks!

Regards,

--
Fujii Masao

Attachments:

pending_list_cleanup_size_v5.patchtext/x-patch; charset=US-ASCII; name=pending_list_cleanup_size_v5.patchDownload+148-66
#19Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Fujii Masao (#18)
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: HEAD seems to generate larger WAL regarding GIN index

(2014/10/30 21:30), Fujii Masao wrote:

On Thu, Oct 30, 2014 at 7:30 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

Here are my review comments.

* The patch applies cleanly and make and make check run successfully. I
think that the patch is mostly good.

Thanks! Attached is the updated version of the patch.

Thank you for updating the patch!

* In src/backend/utils/misc/guc.c:
+       {
+               {"pending_list_cleanup_size", PGC_USERSET,
CLIENT_CONN_STATEMENT,
+                       gettext_noop("Sets the maximum size of the pending
list for GIN index."),
+                        NULL,
+                       GUC_UNIT_KB
+               },
+               &pending_list_cleanup_size,
+               4096, 0, MAX_KILOBYTES,
+               NULL, NULL, NULL
+       },

ISTM it'd be better to use RESOURCES_MEM, not CLIENT_CONN_STATEMENT. No?

Yes if the pending list always exists in the memory. But not, IIUC. Thought?

Exactly. But I think we can expect that in many cases, since I think
that the users would often set the GUC to a small value to the extent
that most of the pending list pages would be cached by shared buffer, to
maintain *search* performance.

I'd like to hear the opinions of others about the category for the GUC.

Also why not set min to 64, not to 0, in accoradance with that of work_mem?

I'm OK to use 64. But I just chose 0 because I could not think of any reasonable
reason why 64k is suitable as the minimum size of the pending list.
IOW, I have no idea about whether it's reasonable to use the min value of
work_mem as the min size of the pending list.

IIUC, I think that min = 0 disables fast update, so ISTM that it'd be
appropriate to set min to some positive value. And ISTM that the idea
of using the min value of work_mem is not so bad.

* In doc/src/sgml/ref/create_index.sgml:
+ <term><literal>PENDING_LIST_CLEANUP_SIZE</></term>

IMHO, it seems to me better for this variable to be in lowercase in
accordance with the GUC version.

Using lowercase only for pending_list_cleanup_size and uppercase for
other options
looks strange to me. What about using lowercase for all the storage options?

+1

I changed the document in that way.

*** 356,361 **** CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ <replaceable 
class="parameter">name</
--- 356,372 ----
       </listitem>
      </varlistentry>
      </variablelist>
+    <variablelist>
+    <varlistentry>
+     <term><literal>PENDING_LIST_CLEANUP_SIZE</></term>

The above is still in uppercse.

Thanks,

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Fujii Masao
masao.fujii@gmail.com
In reply to: Etsuro Fujita (#19)
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: HEAD seems to generate larger WAL regarding GIN index

On Tue, Nov 4, 2014 at 12:04 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

IIUC, I think that min = 0 disables fast update, so ISTM that it'd be
appropriate to set min to some positive value. And ISTM that the idea of
using the min value of work_mem is not so bad.

OK. I changed the min value to 64kB.

*** 356,361 **** CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ <replaceable
class="parameter">name</
--- 356,372 ----
</listitem>
</varlistentry>
</variablelist>
+    <variablelist>
+    <varlistentry>
+     <term><literal>PENDING_LIST_CLEANUP_SIZE</></term>

The above is still in uppercse.

Fixed.

Attached is the updated version of the patch. Thanks for the review!

Regards,

--
Fujii Masao

Attachments:

pending_list_cleanup_size_v6.patchtext/x-patch; charset=US-ASCII; name=pending_list_cleanup_size_v6.patchDownload+148-66
#21Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Fujii Masao (#20)
#22Fujii Masao
masao.fujii@gmail.com
In reply to: Etsuro Fujita (#21)
#23Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Fujii Masao (#22)
#24Fujii Masao
masao.fujii@gmail.com
In reply to: Etsuro Fujita (#23)
#25Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#24)
#26Fujii Masao
masao.fujii@gmail.com
In reply to: Robert Haas (#25)
#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Fujii Masao (#26)
#28Fujii Masao
masao.fujii@gmail.com
In reply to: Tom Lane (#27)
#29Fujii Masao
masao.fujii@gmail.com
In reply to: Fujii Masao (#28)
#30Jeff Janes
jeff.janes@gmail.com
In reply to: Fujii Masao (#18)
#31Fujii Masao
masao.fujii@gmail.com
In reply to: Jeff Janes (#30)