Is it ever necessary to vacuum a table that only gets inserts/updates?

Started by Jason Buberelover 14 years ago11 messagesgeneral
Jump to latest
#1Jason Buberel
jason@altosresearch.com

Just wondering if there is ever a reason to vacuum a very large table (> 1B
rows) containing rows that never has rows deleted.

Under what circumstance would the table benefit from a vacuum?

-jason

--
Jason L. Buberel
CTO, Altos Research
http://www.altosresearch.com/
650.603.0907

#2John R Pierce
pierce@hogranch.com
In reply to: Jason Buberel (#1)
Re: Is it ever necessary to vacuum a table that only gets inserts/updates?

On 11/16/11 4:24 PM, Jason Buberel wrote:

Just wondering if there is ever a reason to vacuum a very large table
(> 1B rows) containing rows that never has rows deleted.

Under what circumstance would the table benefit from a vacuum?

no updates either?

you still want to do a vacuum analyze every so often to update the
statistics used by the planner.

--
john r pierce N 37, W 122
santa cruz ca mid-left coast

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: John R Pierce (#2)
Re: Is it ever necessary to vacuum a table that only gets inserts/updates?

John R Pierce <pierce@hogranch.com> writes:

On 11/16/11 4:24 PM, Jason Buberel wrote:

Just wondering if there is ever a reason to vacuum a very large table
(> 1B rows) containing rows that never has rows deleted.

no updates either?

To clarify: in Postgres, an "update" means an insert and a delete.
So unless you mean that this table is insert-only, you certainly
still need vacuum.

you still want to do a vacuum analyze every so often to update the
statistics used by the planner.

If it's purely an insert-only table, such as a logging table, then in
principle you only need periodic ANALYZEs and not any VACUUMs.

VACUUM could still be worthwhile though, because (a) it will set commit
hint bits on all pages and (b) it will set visibility-map bits on all
pages. An ANALYZE would only do those things for the random sample of
pages that it visits. While neither of those things are critical, they
do offload work from future queries that would otherwise have to do that
work in-line. So if you've got a maintenance window where the database
isn't answering queries anyway, it could be worthwhile to run a VACUUM
just to get those bits set.

regards, tom lane

#4Jason Buberel
jason@altosresearch.com
In reply to: Tom Lane (#3)
Re: Is it ever necessary to vacuum a table that only gets inserts/updates?

Thank you Tom & John.

In this case, there are no updates/deleted - only inserts. For now, I have
set per-table autovacuum rules in order to minimize the frequency of
vacuums but to ensure the statistics are updated frequently with analyze:

Table auto-vacuum VACUUM base threshold 500000000
Table auto-vacuum VACUUM scale factor 0.3
Table auto-vacuum ANALYZE base threshold 50000
Table auto-vacuum ANALYZE scale factor 0.02
Table auto-vacuum VACUUM cost delay 20
Table auto-vacuum VACUUM cost limit 200

On Wed, Nov 16, 2011 at 9:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

John R Pierce <pierce@hogranch.com> writes:

On 11/16/11 4:24 PM, Jason Buberel wrote:

Just wondering if there is ever a reason to vacuum a very large table
(> 1B rows) containing rows that never has rows deleted.

no updates either?

To clarify: in Postgres, an "update" means an insert and a delete.
So unless you mean that this table is insert-only, you certainly
still need vacuum.

you still want to do a vacuum analyze every so often to update the
statistics used by the planner.

If it's purely an insert-only table, such as a logging table, then in
principle you only need periodic ANALYZEs and not any VACUUMs.

VACUUM could still be worthwhile though, because (a) it will set commit
hint bits on all pages and (b) it will set visibility-map bits on all
pages. An ANALYZE would only do those things for the random sample of
pages that it visits. While neither of those things are critical, they
do offload work from future queries that would otherwise have to do that
work in-line. So if you've got a maintenance window where the database
isn't answering queries anyway, it could be worthwhile to run a VACUUM
just to get those bits set.

regards, tom lane

--
Jason L. Buberel
CTO, Altos Research
http://www.altosresearch.com/
650.603.0907

#5Craig Ringer
craig@2ndquadrant.com
In reply to: Tom Lane (#3)
Re: Is it ever necessary to vacuum a table that only gets inserts/updates?

On Nov 17, 2011 1:32 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

John R Pierce <pierce@hogranch.com> writes:

On 11/16/11 4:24 PM, Jason Buberel wrote:

Just wondering if there is ever a reason to vacuum a very large table
(> 1B rows) containing rows that never has rows deleted.

no updates either?

To clarify: in Postgres, an "update" means an insert and a delete.
So unless you mean that this table is insert-only, you certainly
still need vacuum.

you still want to do a vacuum analyze every so often to update the
statistics used by the planner.

If it's purely an insert-only table, such as a logging table, then in
principle you only need periodic ANALYZEs and not any VACUUMs.

Won't a VACUUM FREEZE (or autovac equivalent) be necessary eventually, to
handle xid wraparound? If so, doing it pre-emptively might help avoid a
giant I/O load and work pause when its forced.

Or am I just confused?

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Craig Ringer (#5)
Re: Is it ever necessary to vacuum a table that only gets inserts/updates?

Craig Ringer <ringerc@ringerc.id.au> writes:

On Nov 17, 2011 1:32 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

If it's purely an insert-only table, such as a logging table, then in
principle you only need periodic ANALYZEs and not any VACUUMs.

Won't a VACUUM FREEZE (or autovac equivalent) be necessary eventually, to
handle xid wraparound?

Sure, but if he's continually adding new rows, I don't see much point in
launching extra freeze operations.

regards, tom lane

#7Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Tom Lane (#6)
Re: Is it ever necessary to vacuum a table that only gets inserts/updates?

On 18/11/11 04:59, Tom Lane wrote:

Craig Ringer<ringerc@ringerc.id.au> writes:

On Nov 17, 2011 1:32 PM, "Tom Lane"<tgl@sss.pgh.pa.us> wrote:

If it's purely an insert-only table, such as a logging table, then in
principle you only need periodic ANALYZEs and not any VACUUMs.

Won't a VACUUM FREEZE (or autovac equivalent) be necessary eventually, to
handle xid wraparound?

Sure, but if he's continually adding new rows, I don't see much point in
launching extra freeze operations.

regards, tom lane

Just curious...

Will the pattern of inserts be at all relevant?

For example random inserts compared to apending records. I thought that
random inserts would lead to bloat, as there would be lots of blocks far
from the optimum fill factor.

Regards,
Gavin

#8Adam Cornett
adam.cornett@gmail.com
In reply to: Gavin Flower (#7)
Re: Is it ever necessary to vacuum a table that only gets inserts/updates?

On Fri, Nov 18, 2011 at 2:56 PM, Gavin Flower <GavinFlower@archidevsys.co.nz

wrote:

On 18/11/11 04:59, Tom Lane wrote:

Craig Ringer<ringerc@ringerc.id.au> writes:

On Nov 17, 2011 1:32 PM, "Tom Lane"<tgl@sss.pgh.pa.us> wrote:

If it's purely an insert-only table, such as a logging table, then in
principle you only need periodic ANALYZEs and not any VACUUMs.

Won't a VACUUM FREEZE (or autovac equivalent) be necessary eventually, to
handle xid wraparound?

Sure, but if he's continually adding new rows, I don't see much point in
launching extra freeze operations.

regards, tom lane

Just curious...

Will the pattern of inserts be at all relevant?

For example random inserts compared to apending records. I thought that
random inserts would lead to bloat, as there would be lots of blocks far
from the optimum fill factor.

Regards,
Gavin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/**mailpref/pgsql-general&lt;http://www.postgresql.org/mailpref/pgsql-general&gt;

I might be wrong (I'm sure Tom will correct me if so), but Postgres does
not store tuples in an ordered format on disk, they are on disk in the
order they are inserted, unless the table is re-ordered by
cluster<http://www.postgresql.org/docs/current/interactive/sql-cluster.html&gt;,
which only does a one time sort.

Table bloat (and the table fill factor) are usually associated with deletes
and updates. If you delete a row, or update it so that it takes up less
room (by say removing a large text value) then postgres could use the now
free space on that page to store a new tuple.

-Adam

#9Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Adam Cornett (#8)
Re: Is it ever necessary to vacuum a table that only gets inserts/updates?

On 19/11/11 11:32, Adam Cornett wrote:

On Fri, Nov 18, 2011 at 2:56 PM, Gavin Flower
<GavinFlower@archidevsys.co.nz <mailto:GavinFlower@archidevsys.co.nz>>
wrote:

On 18/11/11 04:59, Tom Lane wrote:

Craig Ringer<ringerc@ringerc.id.au
<mailto:ringerc@ringerc.id.au>> writes:

On Nov 17, 2011 1:32 PM, "Tom Lane"<tgl@sss.pgh.pa.us
<mailto:tgl@sss.pgh.pa.us>> wrote:

If it's purely an insert-only table, such as a logging
table, then in
principle you only need periodic ANALYZEs and not any
VACUUMs.

Won't a VACUUM FREEZE (or autovac equivalent) be necessary
eventually, to
handle xid wraparound?

Sure, but if he's continually adding new rows, I don't see
much point in
launching extra freeze operations.

regards, tom lane

Just curious...

Will the pattern of inserts be at all relevant?

For example random inserts compared to apending records. I
thought that random inserts would lead to bloat, as there would be
lots of blocks far from the optimum fill factor.

Regards,
Gavin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org
<mailto:pgsql-general@postgresql.org>)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

I might be wrong (I'm sure Tom will correct me if so), but Postgres
does not store tuples in an ordered format on disk, they are on disk
in the order they are inserted, unless the table is re-ordered by
cluster
<http://www.postgresql.org/docs/current/interactive/sql-cluster.html&gt;,
which only does a one time sort.

Table bloat (and the table fill factor) are usually associated with
deletes and updates. If you delete a row, or update it so that it
takes up less room (by say removing a large text value) then postgres
could use the now free space on that page to store a new tuple.

-Adam

HI Adam,

I suspect that you are right - noiw I come to think of it- I think I got
caught out by the ghost of VSAM creeping up on me )You seriously do NOT
want to know about IBM's VSAM!).

Regards,
Gavin

#10Scott Marlowe
scott.marlowe@gmail.com
In reply to: Gavin Flower (#9)
Re: Is it ever necessary to vacuum a table that only gets inserts/updates?

On Sat, Nov 19, 2011 at 12:53 PM, Gavin Flower
<GavinFlower@archidevsys.co.nz> wrote:

On 19/11/11 11:32, Adam Cornett wrote:

On Fri, Nov 18, 2011 at 2:56 PM, Gavin Flower
<GavinFlower@archidevsys.co.nz> wrote:

On 18/11/11 04:59, Tom Lane wrote:

Craig Ringer<ringerc@ringerc.id.au>  writes:

On Nov 17, 2011 1:32 PM, "Tom Lane"<tgl@sss.pgh.pa.us>  wrote:

If it's purely an insert-only table, such as a logging table, then in
principle you only need periodic ANALYZEs and not any VACUUMs.

Won't a VACUUM FREEZE (or autovac equivalent) be necessary eventually,
to
handle xid wraparound?

Sure, but if he's continually adding new rows, I don't see much point in
launching extra freeze operations.

                       regards, tom lane

Just curious...

Will the pattern of inserts be at all relevant?

For example random inserts compared to apending records.  I thought that
random inserts would lead to bloat, as there would be lots of blocks far
from the optimum fill factor.

Regards,
Gavin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

I might be wrong (I'm sure Tom will correct me if so), but Postgres does not
store tuples in an ordered format on disk, they are on disk in the order
they are inserted, unless the table is re-ordered by cluster, which only
does a one time sort.
Table bloat (and the table fill factor) are usually associated with deletes
and updates.  If you delete a row, or update it so that it takes up less
room (by say removing a large text value) then postgres could use the now
free space on that page to store a new tuple.
-Adam

HI Adam,

I suspect that you are right - noiw I come to think of it- I think I got
caught out by the ghost of VSAM creeping up on me )You seriously do NOT want
to know about IBM's VSAM!).

Careful, on a list with as many old timers as this one, you may be
sending that message to the guy who wrote the original implementation.
:) I only go as far back as Rexx and JCL and RBase 5000, but never
used VSAM. ISAM yes.

#11Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Scott Marlowe (#10)
Re: Is it ever necessary to vacuum a table that only gets inserts/updates?

On 20/11/11 11:57, Scott Marlowe wrote:

On Sat, Nov 19, 2011 at 12:53 PM, Gavin Flower
<GavinFlower@archidevsys.co.nz> wrote:

On 19/11/11 11:32, Adam Cornett wrote:

On Fri, Nov 18, 2011 at 2:56 PM, Gavin Flower
<GavinFlower@archidevsys.co.nz> wrote:

On 18/11/11 04:59, Tom Lane wrote:

Craig Ringer<ringerc@ringerc.id.au> writes:

On Nov 17, 2011 1:32 PM, "Tom Lane"<tgl@sss.pgh.pa.us> wrote:

If it's purely an insert-only table, such as a logging table, then in
principle you only need periodic ANALYZEs and not any VACUUMs.

Won't a VACUUM FREEZE (or autovac equivalent) be necessary eventually,
to
handle xid wraparound?

Sure, but if he's continually adding new rows, I don't see much point in
launching extra freeze operations.

regards, tom lane

Just curious...

Will the pattern of inserts be at all relevant?

For example random inserts compared to apending records. I thought that
random inserts would lead to bloat, as there would be lots of blocks far
from the optimum fill factor.

Regards,
Gavin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

I might be wrong (I'm sure Tom will correct me if so), but Postgres does not
store tuples in an ordered format on disk, they are on disk in the order
they are inserted, unless the table is re-ordered by cluster, which only
does a one time sort.
Table bloat (and the table fill factor) are usually associated with deletes
and updates. If you delete a row, or update it so that it takes up less
room (by say removing a large text value) then postgres could use the now
free space on that page to store a new tuple.
-Adam

HI Adam,

I suspect that you are right - noiw I come to think of it- I think I got
caught out by the ghost of VSAM creeping up on me )You seriously do NOT want
to know about IBM's VSAM!).

Careful, on a list with as many old timers as this one, you may be
sending that message to the guy who wrote the original implementation.
:) I only go as far back as Rexx and JCL and RBase 5000, but never
used VSAM. ISAM yes.

Brings back memories... AAAARRRGGGGHHHHHHH!!!!!!!!!!!!!!!!!

Many years ago (when I was not so old as I am now) I had a junior
analyst/programmer, who I Had asked to insert about 20k ordered records
from a tape file into a VSAM file where its primary was the sort key of
the records on tape. He wrote a COBOL program that took 7 hours to do
so. Now, he and a Systems Programmer had each been on a 5 day VSAM
course, and they managed to optimise the download to take only half the
time. I went and looked at a COBOL manual for the first time in a few
yeas (I was a Systems Analyst, and above 'mere' programming), and
suggested they change the file type from 'RANDOM' to 'INDEX-SEQUENTIAL'
- the download now took a mere 70 seconds!

At that I time I was adept at designing index sequential files on ICL
mainframes, then VSAM on FACOM mainframes. So I avoided databases,
especially as it involved another (rather snooty) team to do anything
with a database, and program code had to be changed when migrating form
development to UAT and then to production. Once they insisted I create a
data model of a system I had designed with 5 files -- after 4 hours
overtime 2 members of that team and myself came up with a data model;
that exactly matched the 5 files and fields I had used...

I left the mainframe world many years ago, and did not fall into the
temptation to get back into COBOL programming for Y2K.

Now my favourite software stack is Linux/PostgreSQL/JBoss -- and now I
not only design systems, I am expected to code them too!