CLUSTER and synchronized scans and pg_dump et al

Started by Gregory Starkalmost 18 years ago22 messages
#1Gregory Stark
stark@enterprisedb.com

It occurred to me the other day that synchronized scans could play havoc with
clustered tables. When you dump and reload a table even if it was recently
clustered if any other sequential scans are happening in the system at the
time you dump it the dump could shuffle the records out of order.

Now the records would still be effectively ordered for most purposes but our
statistics can't detect that. Since the correlation would be poor the restored
database would have markedly different statistics showing virtually no
correlation on the clustered column.

Perhaps we should have some form of escape hatch for pg_dump to request real
physical order when dumping clustered tables.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gregory Stark (#1)
Re: CLUSTER and synchronized scans and pg_dump et al

Gregory Stark <stark@enterprisedb.com> writes:

Perhaps we should have some form of escape hatch for pg_dump to request real
physical order when dumping clustered tables.

Yeah, Rae Steining was complaining to me about that off-list a few weeks
ago. The whole syncscan behavior risks breaking many apps that "always
worked before", even if they were disregarding the letter of the SQL spec.

Maybe a GUC variable to enable/disable syncscan?

regards, tom lane

#3Guillaume Smet
guillaume.smet@gmail.com
In reply to: Tom Lane (#2)
Re: CLUSTER and synchronized scans and pg_dump et al

On Jan 27, 2008 6:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah, Rae Steining was complaining to me about that off-list a few weeks
ago. The whole syncscan behavior risks breaking many apps that "always
worked before", even if they were disregarding the letter of the SQL spec.

Maybe a GUC variable to enable/disable syncscan?

I'm not sure it's really a good reason for that because it's just a
matter of time for them to be broken anyway.

But it seems at least a good idea to have a way to build reproducible
test cases on production boxes without being perturbed by the other
scans running. Would it need a restart and be a global GUC variable or
could it be set temporarily per session?

--
Guillaume

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Guillaume Smet (#3)
Re: CLUSTER and synchronized scans and pg_dump et al

"Guillaume Smet" <guillaume.smet@gmail.com> writes:

Would it need a restart and be a global GUC variable or
could it be set temporarily per session?

It could be PGC_USERSET, afaics.

regards, tom lane

#5Guillaume Smet
guillaume.smet@gmail.com
In reply to: Tom Lane (#4)
Re: CLUSTER and synchronized scans and pg_dump et al

On Jan 27, 2008 7:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It could be PGC_USERSET, afaics.

If so, it seems like a good idea even if it's just for debugging purposes.

--
Guillaume

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Guillaume Smet (#5)
Re: CLUSTER and synchronized scans and pg_dump et al

"Guillaume Smet" <guillaume.smet@gmail.com> writes:

Maybe a GUC variable to enable/disable syncscan?

If so, it seems like a good idea even if it's just for debugging purposes.

Do we have nominations for a name? The first idea that comes to mind
is "synchronized_scanning" (defaulting to ON).

Also, does anyone object to making pg_dump just disable it
unconditionally? Greg's original gripe only mentioned the case of
clustered tables, but it'd be kind of a pain to make pg_dump turn it
on and off again for different tables. And I could see people
complaining about pg_dump failing to preserve row order even in
unclustered tables.

regards, tom lane

#7Guillaume Smet
guillaume.smet@gmail.com
In reply to: Tom Lane (#6)
Re: CLUSTER and synchronized scans and pg_dump et al

On Jan 27, 2008 7:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Also, does anyone object to making pg_dump just disable it
unconditionally? Greg's original gripe only mentioned the case of
clustered tables, but it'd be kind of a pain to make pg_dump turn it
on and off again for different tables. And I could see people
complaining about pg_dump failing to preserve row order even in
unclustered tables.

+1. I don't think it's worth it to enable it for non clustered tables
and it's always better to keep the order if we can.

--
Guillaume

#8Markus Bertheau
mbertheau.pg@googlemail.com
In reply to: Tom Lane (#6)
Re: CLUSTER and synchronized scans and pg_dump et al

2008/1/28, Tom Lane <tgl@sss.pgh.pa.us>:

Do we have nominations for a name? The first idea that comes to mind
is "synchronized_scanning" (defaulting to ON).

"synchronized_sequential_scans" is a bit long, but contains the
keyword "sequential scans", which will ring a bell with many, more so
than "synchronized_scanning".

Markus

#9Guillaume Smet
guillaume.smet@gmail.com
In reply to: Markus Bertheau (#8)
Re: CLUSTER and synchronized scans and pg_dump et al

On Jan 27, 2008 9:07 PM, Markus Bertheau <mbertheau.pg@googlemail.com> wrote:

2008/1/28, Tom Lane <tgl@sss.pgh.pa.us>:

Do we have nominations for a name? The first idea that comes to mind
is "synchronized_scanning" (defaulting to ON).

"synchronized_sequential_scans" is a bit long, but contains the
keyword "sequential scans", which will ring a bell with many, more so
than "synchronized_scanning".

synchronize_seqscans?

In reply to: Guillaume Smet (#9)
Re: CLUSTER and synchronized scans and pg_dump et al

Guillaume Smet wrote:

On Jan 27, 2008 9:07 PM, Markus Bertheau
<mbertheau.pg@googlemail.com> wrote:

2008/1/28, Tom Lane <tgl@sss.pgh.pa.us>:

Do we have nominations for a name? The first idea that comes to
mind is "synchronized_scanning" (defaulting to ON).

"synchronized_sequential_scans" is a bit long, but contains the
keyword "sequential scans", which will ring a bell with many, more
so than "synchronized_scanning".

synchronize_seqscans?

How about enable_syncscan, or enable_seqscan_sync? It's not strictly
something the influences the planner, but maybe it's similar enough to
justify a similar naming?

regards, Florian Pflug

#11Guillaume Smet
guillaume.smet@gmail.com
In reply to: Florian G. Pflug (#10)
Re: CLUSTER and synchronized scans and pg_dump et al

Hi Florian,

Glad to see you back!

On Jan 28, 2008 3:25 PM, Florian G. Pflug <fgp@phlo.org> wrote:

How about enable_syncscan, or enable_seqscan_sync? It's not strictly
something the influences the planner, but maybe it's similar enough to
justify a similar naming?

It was my first idea but I didn't propose it as it's really a
different thing IMHO. enable_* variables don't change the way
PostgreSQL really does the job as synchronize_scans (or whatever the
name will be) does.
And it's not very consistent with the other GUC variables (most of
them could have "enable" in their name) but we limited the usage of
enable_* to planner variables. I don't know if it's on purpose though.

--
Guillaume

#12Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Gregory Stark (#1)
Re: CLUSTER and synchronized scans and pg_dump et al

On Sun, Jan 27, 2008 at 9:02 AM, in message

<87odb7s45i.fsf@oxford.xeocode.com>, Gregory Stark <stark@enterprisedb.com>
wrote:

Perhaps we should have some form of escape hatch for pg_dump to request real
physical order when dumping clustered tables.

It would seem reasonable to me for pg_dump to use ORDER BY to select
data from clustered tables.

I don't see a general case for worrying about the order of rows
returned by queries which lack an ORDER BY clause.

-Kevin

#13Andrew Dunstan
andrew@dunslane.net
In reply to: Kevin Grittner (#12)
Re: CLUSTER and synchronized scans and pg_dump et al

Kevin Grittner wrote:

On Sun, Jan 27, 2008 at 9:02 AM, in message

<87odb7s45i.fsf@oxford.xeocode.com>, Gregory Stark <stark@enterprisedb.com>
wrote:

Perhaps we should have some form of escape hatch for pg_dump to request real
physical order when dumping clustered tables.

It would seem reasonable to me for pg_dump to use ORDER BY to select
data from clustered tables.

I don't see a general case for worrying about the order of rows
returned by queries which lack an ORDER BY clause.

What will be the performance hit from doing that?

cheers

andrew

#14Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Andrew Dunstan (#13)
Re: CLUSTER and synchronized scans and pg_dump et al

On Mon, Jan 28, 2008 at 9:00 AM, in message <479DEDF5.4090909@dunslane.net>,

Andrew Dunstan <andrew@dunslane.net> wrote:

Kevin Grittner wrote:

On Sun, Jan 27, 2008 at 9:02 AM, in message

<87odb7s45i.fsf@oxford.xeocode.com>, Gregory Stark <stark@enterprisedb.com>
wrote:

Perhaps we should have some form of escape hatch for pg_dump to request real
physical order when dumping clustered tables.

It would seem reasonable to me for pg_dump to use ORDER BY to select
data from clustered tables.

What will be the performance hit from doing that?

If the rows actually are in order of the clustered index, it
shouldn't add much more than the time needed to sequentially pass
the clustered index, should it? Even so, perhaps there should be a
command-line option on pg_dump to control whether it does this.

-Kevin

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Guillaume Smet (#11)
Re: CLUSTER and synchronized scans and pg_dump et al

"Guillaume Smet" <guillaume.smet@gmail.com> writes:

It was my first idea but I didn't propose it as it's really a
different thing IMHO. enable_* variables don't change the way
PostgreSQL really does the job as synchronize_scans (or whatever the
name will be) does.
And it's not very consistent with the other GUC variables (most of
them could have "enable" in their name) but we limited the usage of
enable_* to planner variables. I don't know if it's on purpose though.

Yeah, it is a more or less deliberate policy to use enable_ only for
planner control variables, which this one certainly isn't. I seem
to recall an argument also that prefixing enable_ is just noise; it
doesn't add anything to your understanding of what the variable does.

So far I think "synchronize_seqscans" is the best proposal.

regards, tom lane

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#13)
Re: CLUSTER and synchronized scans and pg_dump et al

Andrew Dunstan <andrew@dunslane.net> writes:

Kevin Grittner wrote:

It would seem reasonable to me for pg_dump to use ORDER BY to select
data from clustered tables.

What will be the performance hit from doing that?

That worries me too. Also, in general pg_dump's charter is to reproduce
the state of the database as best it can, not to "improve" it.

regards, tom lane

#17Steve Atkins
steve@blighty.com
In reply to: Tom Lane (#16)
Re: CLUSTER and synchronized scans and pg_dump et al

On Jan 28, 2008, at 8:36 AM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Kevin Grittner wrote:

It would seem reasonable to me for pg_dump to use ORDER BY to select
data from clustered tables.

What will be the performance hit from doing that?

That worries me too. Also, in general pg_dump's charter is to
reproduce
the state of the database as best it can, not to "improve" it.

One common use of cluster around here is to act as a faster version
of vacuum full when there's a lot of dead rows in a table. There's no
intent to keep the table clustered on that index, and the cluster flag
isn't removed with alter table (why bother, the only thing it affects is
the cluster command).

I'm guessing that's not unusual, and it'd lead to sorting tables as part
of pg_dump.

Cheers,
Steve

#18Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#16)
Re: CLUSTER and synchronized scans and pg_dump et al

On Mon, Jan 28, 2008 at 10:36 AM, in message <3001.1201538162@sss.pgh.pa.us>,

Tom Lane <tgl@sss.pgh.pa.us> wrote:

in general pg_dump's charter is to reproduce
the state of the database as best it can, not to "improve" it.

Seems that I've often seen it recommended as a way to eliminate bloat.

It seems like there are some practical use cases where it would be
a pain to have to do a CLUSTER right on the heels of having used
pg_dump to psql.

This does seem like the right way to do it where a user really wants
to maintain the physical sequence; my biggest concern is that
CLUSTER is sometimes used to eliminate bloat, and there is no real
interest in maintaining that sequence later. I'd bet that people
generally do not alter the table to remove the clustered index
choice, so this option could be rather painful somewhere
downstream, when the sequence has become pretty random.

Maybe it would make sense if it was not the default, and the issues
were properly documented under the description of the option?

-Kevin

In reply to: Steve Atkins (#17)
Re: CLUSTER and synchronized scans and pg_dump et al

Steve Atkins wrote:

On Jan 28, 2008, at 8:36 AM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Kevin Grittner wrote:

It would seem reasonable to me for pg_dump to use ORDER BY to select
data from clustered tables.

What will be the performance hit from doing that?

That worries me too. Also, in general pg_dump's charter is to reproduce
the state of the database as best it can, not to "improve" it.

One common use of cluster around here is to act as a faster version
of vacuum full when there's a lot of dead rows in a table. There's no
intent to keep the table clustered on that index, and the cluster flag
isn't removed with alter table (why bother, the only thing it affects is
the cluster command).

I'm guessing that's not unusual, and it'd lead to sorting tables as part
of pg_dump.

I've done that too - and every time I typed that "CLUSTER ... " I
thought why, oh why isn't there something like REWRITE TABLE <table>",
which would work just like CLUSTER, but without the sorting ;-) Maybe
something to put on the TODO list...

We might even call it "VACCUM REWRITE" ;-)

regards, Florian Pflug

#20Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#2)
Re: CLUSTER and synchronized scans and pg_dump et al

On Sun, 2008-01-27 at 12:45 -0500, Tom Lane wrote:

Maybe a GUC variable to enable/disable syncscan?

The first iterations of the patch included a GUC.

I don't have any objection to re-introducing a GUC to enable/disable it.
However, I would suggest that it defaults to "on", because:

1. There aren't many cases where you'd want it to be off, and this
particular case with pg_dump is the best one that I've heard of (thanks
Greg). We want people who install 8.3 to see a boost without lots of
tuning, if possible.
2. It only turns on for tables over 25% of shared buffers anyway.

Introducing GUCs reintroduces the same questions that were discussed
before.

1. Should the 25% figure be tunable as well?
2. Remember that the 25% figure is also tied to Simon and Heikki's
buffer recycling patch (buffer ring patch). Should they be configurable
independently? Should they be tied together, but configurable?

The simplest solution, in my opinion, is something like:

large_scan_threshold = 0.25 # set to -1 to disable

Where a scan of any table larger than (large_scan_threshold *
shared_buffers) employs both synchronized scans and buffer recycling. We
may implement other large scan strategies in the future.

Regards,
Jeff Davis

#21Jeff Davis
pgsql@j-davis.com
In reply to: Gregory Stark (#1)
Re: CLUSTER and synchronized scans and pg_dump et al

On Sun, 2008-01-27 at 15:02 +0000, Gregory Stark wrote:

It occurred to me the other day that synchronized scans could play havoc with
clustered tables. When you dump and reload a table even if it was recently
clustered if any other sequential scans are happening in the system at the
time you dump it the dump could shuffle the records out of order.

Now the records would still be effectively ordered for most purposes but our
statistics can't detect that. Since the correlation would be poor the restored
database would have markedly different statistics showing virtually no
correlation on the clustered column.

Perhaps we should have some form of escape hatch for pg_dump to request real
physical order when dumping clustered tables.

Thank you for bringing this up, it's an interesting point.

Keep in mind that this only matters if you are actually running pg_dump
concurrently with another scan, because a scan will reset the starting
point after completing.

Regards,
Jeff Davis

#22Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#6)
Re: CLUSTER and synchronized scans and pg_dump et al

On Sun, 2008-01-27 at 13:37 -0500, Tom Lane wrote:

Also, does anyone object to making pg_dump just disable it
unconditionally? Greg's original gripe only mentioned the case of
clustered tables, but it'd be kind of a pain to make pg_dump turn it
on and off again for different tables. And I could see people
complaining about pg_dump failing to preserve row order even in
unclustered tables.

If you are running pg_dump, that increases the likelihood that multiple
sequential scans will be reading the same large table at the same time.
Sync scans prevent that additional scan from bringing your active
database to a halt during your dump (due to horrible seeking and poor
cache efficiency).

I think that pg_dump is a good use case for synchronized scans. Assuming
it doesn't hold up 8.3, I think it's worthwhile to consider only
disabling it for tables that have been clustered.

That being said, this isn't a strong objection. Having a painless 8.3
release is the top priority, of course.

Regards,
Jeff Davis