CLUSTER and synchronized scans and pg_dump et al

Started by Bruce Momjianover 18 years ago22 messageshackers

bruce@momjian.us

over 18 years ago

It occurred to me the other day that synchronized scans could play havoc with
clustered tables. When you dump and reload a table even if it was recently
clustered if any other sequential scans are happening in the system at the
time you dump it the dump could shuffle the records out of order.

Now the records would still be effectively ordered for most purposes but our
statistics can't detect that. Since the correlation would be poor the restored
database would have markedly different statistics showing virtually no
correlation on the clustered column.

Perhaps we should have some form of escape hatch for pg_dump to request real
physical order when dumping clustered tables.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!

Tom Lane

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Bruce Momjian (#1)

Re: CLUSTER and synchronized scans and pg_dump et al

Gregory Stark <stark@enterprisedb.com> writes:

Perhaps we should have some form of escape hatch for pg_dump to request real
physical order when dumping clustered tables.

Yeah, Rae Steining was complaining to me about that off-list a few weeks
ago. The whole syncscan behavior risks breaking many apps that "always
worked before", even if they were disregarding the letter of the SQL spec.

Maybe a GUC variable to enable/disable syncscan?

regards, tom lane

Guillaume Smet

guillaume.smet@gmail.com

over 18 years ago

In reply to: Tom Lane (#2)

Re: CLUSTER and synchronized scans and pg_dump et al

On Jan 27, 2008 6:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah, Rae Steining was complaining to me about that off-list a few weeks
ago. The whole syncscan behavior risks breaking many apps that "always
worked before", even if they were disregarding the letter of the SQL spec.

Maybe a GUC variable to enable/disable syncscan?

I'm not sure it's really a good reason for that because it's just a
matter of time for them to be broken anyway.

But it seems at least a good idea to have a way to build reproducible
test cases on production boxes without being perturbed by the other
scans running. Would it need a restart and be a global GUC variable or
could it be set temporarily per session?

--
Guillaume

Tom Lane

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Guillaume Smet (#3)

Re: CLUSTER and synchronized scans and pg_dump et al

"Guillaume Smet" <guillaume.smet@gmail.com> writes:

Would it need a restart and be a global GUC variable or
could it be set temporarily per session?

It could be PGC_USERSET, afaics.

regards, tom lane

Guillaume Smet

guillaume.smet@gmail.com

over 18 years ago

In reply to: Tom Lane (#4)

Re: CLUSTER and synchronized scans and pg_dump et al

On Jan 27, 2008 7:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It could be PGC_USERSET, afaics.

If so, it seems like a good idea even if it's just for debugging purposes.

--
Guillaume

Tom Lane

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Guillaume Smet (#5)

Re: CLUSTER and synchronized scans and pg_dump et al

"Guillaume Smet" <guillaume.smet@gmail.com> writes:

Maybe a GUC variable to enable/disable syncscan?

If so, it seems like a good idea even if it's just for debugging purposes.

Do we have nominations for a name? The first idea that comes to mind
is "synchronized_scanning" (defaulting to ON).

Also, does anyone object to making pg_dump just disable it
unconditionally? Greg's original gripe only mentioned the case of
clustered tables, but it'd be kind of a pain to make pg_dump turn it
on and off again for different tables. And I could see people
complaining about pg_dump failing to preserve row order even in
unclustered tables.

regards, tom lane

Guillaume Smet

guillaume.smet@gmail.com

over 18 years ago

In reply to: Tom Lane (#6)

Re: CLUSTER and synchronized scans and pg_dump et al

On Jan 27, 2008 7:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Also, does anyone object to making pg_dump just disable it
unconditionally? Greg's original gripe only mentioned the case of
clustered tables, but it'd be kind of a pain to make pg_dump turn it
on and off again for different tables. And I could see people
complaining about pg_dump failing to preserve row order even in
unclustered tables.

+1. I don't think it's worth it to enable it for non clustered tables
and it's always better to keep the order if we can.

--
Guillaume

Markus Bertheau

mbertheau.pg@googlemail.com

over 18 years ago

In reply to: Tom Lane (#6)

Re: CLUSTER and synchronized scans and pg_dump et al

2008/1/28, Tom Lane <tgl@sss.pgh.pa.us>:

Do we have nominations for a name? The first idea that comes to mind
is "synchronized_scanning" (defaulting to ON).

"synchronized_sequential_scans" is a bit long, but contains the
keyword "sequential scans", which will ring a bell with many, more so
than "synchronized_scanning".

Markus

Guillaume Smet

guillaume.smet@gmail.com

over 18 years ago

In reply to: Markus Bertheau (#8)

Re: CLUSTER and synchronized scans and pg_dump et al

On Jan 27, 2008 9:07 PM, Markus Bertheau <mbertheau.pg@googlemail.com> wrote:

2008/1/28, Tom Lane <tgl@sss.pgh.pa.us>:

Do we have nominations for a name? The first idea that comes to mind
is "synchronized_scanning" (defaulting to ON).

"synchronized_sequential_scans" is a bit long, but contains the
keyword "sequential scans", which will ring a bell with many, more so
than "synchronized_scanning".

synchronize_seqscans?

#10

Florian Pflug

fgp@phlo.org

over 18 years ago

In reply to: Guillaume Smet (#9)

Re: CLUSTER and synchronized scans and pg_dump et al

Guillaume Smet wrote:

On Jan 27, 2008 9:07 PM, Markus Bertheau
<mbertheau.pg@googlemail.com> wrote:

2008/1/28, Tom Lane <tgl@sss.pgh.pa.us>:

Do we have nominations for a name? The first idea that comes to
mind is "synchronized_scanning" (defaulting to ON).

"synchronized_sequential_scans" is a bit long, but contains the
keyword "sequential scans", which will ring a bell with many, more
so than "synchronized_scanning".

synchronize_seqscans?

How about enable_syncscan, or enable_seqscan_sync? It's not strictly
something the influences the planner, but maybe it's similar enough to
justify a similar naming?

regards, Florian Pflug

#11

Guillaume Smet

guillaume.smet@gmail.com

over 18 years ago

In reply to: Florian Pflug (#10)

Re: CLUSTER and synchronized scans and pg_dump et al

Hi Florian,

Glad to see you back!

On Jan 28, 2008 3:25 PM, Florian G. Pflug <fgp@phlo.org> wrote:

How about enable_syncscan, or enable_seqscan_sync? It's not strictly
something the influences the planner, but maybe it's similar enough to
justify a similar naming?

It was my first idea but I didn't propose it as it's really a
different thing IMHO. enable_* variables don't change the way
PostgreSQL really does the job as synchronize_scans (or whatever the
name will be) does.
And it's not very consistent with the other GUC variables (most of
them could have "enable" in their name) but we limited the usage of
enable_* to planner variables. I don't know if it's on purpose though.

--
Guillaume

#12

Kevin Grittner

Kevin.Grittner@wicourts.gov

over 18 years ago

In reply to: Bruce Momjian (#1)

Re: CLUSTER and synchronized scans and pg_dump et al

On Sun, Jan 27, 2008 at 9:02 AM, in message

<87odb7s45i.fsf@oxford.xeocode.com>, Gregory Stark <stark@enterprisedb.com>
wrote:

Perhaps we should have some form of escape hatch for pg_dump to request real
physical order when dumping clustered tables.

It would seem reasonable to me for pg_dump to use ORDER BY to select
data from clustered tables.

I don't see a general case for worrying about the order of rows
returned by queries which lack an ORDER BY clause.

-Kevin

#13

Andrew Dunstan

andrew@dunslane.net

over 18 years ago

In reply to: Kevin Grittner (#12)

Re: CLUSTER and synchronized scans and pg_dump et al

Kevin Grittner wrote:

On Sun, Jan 27, 2008 at 9:02 AM, in message

<87odb7s45i.fsf@oxford.xeocode.com>, Gregory Stark <stark@enterprisedb.com>
wrote:

Perhaps we should have some form of escape hatch for pg_dump to request real
physical order when dumping clustered tables.

It would seem reasonable to me for pg_dump to use ORDER BY to select
data from clustered tables.

I don't see a general case for worrying about the order of rows
returned by queries which lack an ORDER BY clause.

What will be the performance hit from doing that?

cheers

andrew

#14

Kevin Grittner

Kevin.Grittner@wicourts.gov

over 18 years ago

In reply to: Andrew Dunstan (#13)

Re: CLUSTER and synchronized scans and pg_dump et al

On Mon, Jan 28, 2008 at 9:00 AM, in message <479DEDF5.4090909@dunslane.net>,

Andrew Dunstan <andrew@dunslane.net> wrote:

Kevin Grittner wrote:

On Sun, Jan 27, 2008 at 9:02 AM, in message

<87odb7s45i.fsf@oxford.xeocode.com>, Gregory Stark <stark@enterprisedb.com>
wrote:

Perhaps we should have some form of escape hatch for pg_dump to request real
physical order when dumping clustered tables.

It would seem reasonable to me for pg_dump to use ORDER BY to select
data from clustered tables.

What will be the performance hit from doing that?

If the rows actually are in order of the clustered index, it
shouldn't add much more than the time needed to sequentially pass
the clustered index, should it? Even so, perhaps there should be a
command-line option on pg_dump to control whether it does this.

-Kevin

#15

Tom Lane

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Guillaume Smet (#11)

Re: CLUSTER and synchronized scans and pg_dump et al

"Guillaume Smet" <guillaume.smet@gmail.com> writes:

It was my first idea but I didn't propose it as it's really a
different thing IMHO. enable_* variables don't change the way
PostgreSQL really does the job as synchronize_scans (or whatever the
name will be) does.
And it's not very consistent with the other GUC variables (most of
them could have "enable" in their name) but we limited the usage of
enable_* to planner variables. I don't know if it's on purpose though.

Yeah, it is a more or less deliberate policy to use enable_ only for
planner control variables, which this one certainly isn't. I seem
to recall an argument also that prefixing enable_ is just noise; it
doesn't add anything to your understanding of what the variable does.

So far I think "synchronize_seqscans" is the best proposal.

regards, tom lane

#16

Tom Lane

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Andrew Dunstan (#13)

Re: CLUSTER and synchronized scans and pg_dump et al

Andrew Dunstan <andrew@dunslane.net> writes:

Kevin Grittner wrote:

It would seem reasonable to me for pg_dump to use ORDER BY to select
data from clustered tables.

What will be the performance hit from doing that?

That worries me too. Also, in general pg_dump's charter is to reproduce
the state of the database as best it can, not to "improve" it.

regards, tom lane

#17

Steve Atkins

steve@blighty.com

over 18 years ago

In reply to: Tom Lane (#16)

Re: CLUSTER and synchronized scans and pg_dump et al

On Jan 28, 2008, at 8:36 AM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Kevin Grittner wrote:

It would seem reasonable to me for pg_dump to use ORDER BY to select
data from clustered tables.

What will be the performance hit from doing that?

That worries me too. Also, in general pg_dump's charter is to
reproduce
the state of the database as best it can, not to "improve" it.

One common use of cluster around here is to act as a faster version
of vacuum full when there's a lot of dead rows in a table. There's no
intent to keep the table clustered on that index, and the cluster flag
isn't removed with alter table (why bother, the only thing it affects is
the cluster command).

I'm guessing that's not unusual, and it'd lead to sorting tables as part
of pg_dump.

Cheers,
Steve

#18

Kevin Grittner

Kevin.Grittner@wicourts.gov

over 18 years ago

In reply to: Tom Lane (#16)

Re: CLUSTER and synchronized scans and pg_dump et al

On Mon, Jan 28, 2008 at 10:36 AM, in message <3001.1201538162@sss.pgh.pa.us>,

Tom Lane <tgl@sss.pgh.pa.us> wrote:

in general pg_dump's charter is to reproduce
the state of the database as best it can, not to "improve" it.

Seems that I've often seen it recommended as a way to eliminate bloat.

It seems like there are some practical use cases where it would be
a pain to have to do a CLUSTER right on the heels of having used
pg_dump to psql.

This does seem like the right way to do it where a user really wants
to maintain the physical sequence; my biggest concern is that
CLUSTER is sometimes used to eliminate bloat, and there is no real
interest in maintaining that sequence later. I'd bet that people
generally do not alter the table to remove the clustered index
choice, so this option could be rather painful somewhere
downstream, when the sequence has become pretty random.

Maybe it would make sense if it was not the default, and the issues
were properly documented under the description of the option?

-Kevin

#19

Florian Pflug

fgp@phlo.org

over 18 years ago

In reply to: Steve Atkins (#17)

Re: CLUSTER and synchronized scans and pg_dump et al

Steve Atkins wrote:

On Jan 28, 2008, at 8:36 AM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Kevin Grittner wrote:

It would seem reasonable to me for pg_dump to use ORDER BY to select
data from clustered tables.

What will be the performance hit from doing that?

That worries me too. Also, in general pg_dump's charter is to reproduce
the state of the database as best it can, not to "improve" it.

One common use of cluster around here is to act as a faster version
of vacuum full when there's a lot of dead rows in a table. There's no
intent to keep the table clustered on that index, and the cluster flag
isn't removed with alter table (why bother, the only thing it affects is
the cluster command).

I'm guessing that's not unusual, and it'd lead to sorting tables as part
of pg_dump.

I've done that too - and every time I typed that "CLUSTER ... " I
thought why, oh why isn't there something like REWRITE TABLE <table>",
which would work just like CLUSTER, but without the sorting ;-) Maybe
something to put on the TODO list...

We might even call it "VACCUM REWRITE" ;-)

regards, Florian Pflug

#20

Jeff Davis

pgsql@j-davis.com

over 18 years ago

In reply to: Tom Lane (#2)

Re: CLUSTER and synchronized scans and pg_dump et al

On Sun, 2008-01-27 at 12:45 -0500, Tom Lane wrote:

Maybe a GUC variable to enable/disable syncscan?

The first iterations of the patch included a GUC.

I don't have any objection to re-introducing a GUC to enable/disable it.
However, I would suggest that it defaults to "on", because:

1. There aren't many cases where you'd want it to be off, and this
particular case with pg_dump is the best one that I've heard of (thanks
Greg). We want people who install 8.3 to see a boost without lots of
tuning, if possible.
2. It only turns on for tables over 25% of shared buffers anyway.

Introducing GUCs reintroduces the same questions that were discussed
before.

1. Should the 25% figure be tunable as well?
2. Remember that the 25% figure is also tied to Simon and Heikki's
buffer recycling patch (buffer ring patch). Should they be configurable
independently? Should they be tied together, but configurable?

The simplest solution, in my opinion, is something like:

large_scan_threshold = 0.25 # set to -1 to disable

Where a scan of any table larger than (large_scan_threshold *
shared_buffers) employs both synchronized scans and buffer recycling. We
may implement other large scan strategies in the future.

Regards,
Jeff Davis

#21

Jeff Davis

pgsql@j-davis.com

over 18 years ago

In reply to: Bruce Momjian (#1)

#22

Jeff Davis

pgsql@j-davis.com

over 18 years ago

In reply to: Tom Lane (#6)