SELECT * FROM <table> LIMIT 1; is really slow
I have a table with about 16,000,000 rows in it.
When I do a:
SELECT * FROM <table> LIMIT 1;
it takes about 10 minutes (that's about how long it takes to do a full
sequential scan).
I had originally thought that there might be a large number of
"wasted/retired" tuples in the table so I "vacuum analysed" the
database. It had no effect. I had a "vacuum full" going on the table
for 17 hours before I killed it.
Any idea why it's so slow? It used to be instant.
dave
David Blasby wrote:
I have a table with about 16,000,000 rows in it.
When I do a:
SELECT * FROM <table> LIMIT 1;
it takes about 10 minutes (that's about how long it takes to do a full
sequential scan).
I had originally thought that there might be a large number of
"wasted/retired" tuples in the table so I "vacuum analysed" the
database. It had no effect. I had a "vacuum full" going on the table
for 17 hours before I killed it.
Are you sure the vacuum full was actually running, and not just sitting
there waiting on an idle transaction?
Regards
Gaetano Mendola
Gaetano Mendola wrote:
David Blasby wrote:
I have a table with about 16,000,000 rows in it.
When I do a:
SELECT * FROM <table> LIMIT 1;
it takes about 10 minutes (that's about how long it takes to do a full
sequential scan).
I had originally thought that there might be a large number of
"wasted/retired" tuples in the table so I "vacuum analysed" the
database. It had no effect. I had a "vacuum full" going on the table
for 17 hours before I killed it.
Are you sure the vacuum full was actually running, and not just sitting
there waiting on an idle transaction?
It was running at about 90% CPU.
I just did another vacuum analyse on the table:
cwb_prod_5_20=# VACUUM ANALYZE verbose csn_edges;
INFO: vacuuming "public.csn_edges"
INFO: index "csn_edges_group_code_idx" now contains 16289929 row
versions in 75789 pages
INFO: index "csn_edges_edge_id_idx" now contains 16289929 row versions
in 55210 pages
INFO: index "csn_edges_code_idx" now contains 16289929 row versions in
61203 pages
INFO: index "csn_edges_outside_idx" now contains 16289929 row versions
in 75719 pages
INFO: index "csn_edges_the_geom_idx" now contains 16289929 row versions
in 238795 pages
INFO: "csn_edges": found 0 removable, 16289929 nonremovable row
versions in 2783986 pages
INFO: vacuuming "pg_toast.pg_toast_126945560"
INFO: index "pg_toast_126945560_index" now contains 441432 row versions
in 3064 pages
INFO: "pg_toast_126945560": found 0 removable, 441432 nonremovable row
versions in 154691 pages
INFO: analyzing "public.csn_edges"
INFO: "csn_edges": 2783986 pages, 3000 rows sampled, 6724 estimated
total rows
VACUUM
It's still slow!
dave
David Blasby <dblasby@refractions.net> writes:
I have a table with about 16,000,000 rows in it.
When I do a:
SELECT * FROM <table> LIMIT 1;
it takes about 10 minutes (that's about how long it takes to do a full
sequential scan).
The only explanation that comes to mind is huge amounts of dead space
before the first live row. But the output of "vacuum verbose" would
probably be enough to tell whether this is really the correct diagnosis.
I had originally thought that there might be a large number of
"wasted/retired" tuples in the table so I "vacuum analysed" the
database. It had no effect.
It wouldn't; you'd need vacuum full to collapse out the dead space.
You could also try CLUSTER which is faster than VACUUM FULL when most
of the data has to be moved anyway. (Some days I think we should dump
VACUUM FULL, because it's optimized for a case that's no longer very
interesting...)
regards, tom lane
David Blasby <dblasby@refractions.net> writes:
I just did another vacuum analyse on the table:
Ah, here we go:
INFO: "csn_edges": found 0 removable, 16289929 nonremovable row
versions in 2783986 pages
That works out to just under 6 rows per 8K page, which wouldn't be too
bad if the rows are 1K wide on average, but are they? (You might want
to run contrib/pgstattuple to get some exact information about average
tuple size.)
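For anyone following along, the contrib/pgstattuple check Tom mentions can be sketched like this (assuming the contrib module has been installed in the database; the column names are from the 7.4-era version and may differ in other releases):

```sql
-- Exact per-tuple statistics for the table, including average
-- tuple length and the fraction of dead space:
SELECT * FROM pgstattuple('csn_edges');
-- The output includes table_len, tuple_count, tuple_len, tuple_percent,
-- dead_tuple_count, dead_tuple_len, free_space and free_percent;
-- tuple_len / tuple_count gives the average row width asked about above.
```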
INFO: analyzing "public.csn_edges"
INFO: "csn_edges": 2783986 pages, 3000 rows sampled, 6724 estimated
total rows
This looks like a smoking gun to me. The huge underestimate of number
of rows from ANALYZE is a known failure mode of the existing sampling
method when the early pages of the table are thinly populated. (Manfred
just fixed that for 7.5, btw.)
I think you want to VACUUM FULL or CLUSTER the table, and then take a
look at your FSM settings and routine vacuuming frequency to see if
you need to adjust them to keep this from happening again.
regards, tom lane
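Concretely, the fix described above might look like this (a sketch: the index name is taken from the VACUUM VERBOSE output earlier in the thread and is an arbitrary choice, and the FSM values are illustrative, not a recommendation):

```sql
-- Rewrite the table in index order, reclaiming all the dead space.
-- Takes an exclusive lock on the table for the duration (7.3/7.4 syntax):
CLUSTER csn_edges_edge_id_idx ON csn_edges;

-- Afterwards, in postgresql.conf, size the free space map so that
-- routine VACUUMs can track freed pages (requires a server restart):
--   max_fsm_pages = 2000000
--   max_fsm_relations = 1000
```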
Tom Lane wrote:
INFO: "csn_edges": found 0 removable, 16289929 nonremovable row
versions in 2783986 pages
That works out to just under 6 rows per 8K page, which wouldn't be too
bad if the rows are 1K wide on average, but are they? (You might want
to run contrib/pgstattuple to get some exact information about average
tuple size.)
The rows are "wide" - there's a PostGIS geometry present.
INFO: analyzing "public.csn_edges"
INFO: "csn_edges": 2783986 pages, 3000 rows sampled, 6724 estimated
total rows
This looks like a smoking gun to me. The huge underestimate of number
of rows from ANALYZE is a known failure mode of the existing sampling
method when the early pages of the table are thinly populated. (Manfred
just fixed that for 7.5, btw.)
I think you want to VACUUM FULL or CLUSTER the table, and then take a
look at your FSM settings and routine vacuuming frequency to see if
you need to adjust them to keep this from happening again.
I'm now clustering - thanks for the help!
The history of this table is quite short - I just created it last week.
The original table had a bigint column that I converted to int (using
the "ALTER TABLE csn_edges RENAME TO csn_edges_backup; CREATE TABLE
csn_edges AS SELECT a, b, c::int, d, e FROM csn_edges_backup; DROP TABLE
csn_edges_backup;" trick). I don't think there were any changes to the
current csn_edges table after it was created.
I have another copy of this table in another database - vacuum analyse
verbose says it's "only" 1,500,000 pages (vs 2,800,000). Shouldn't vacuum
know your table is wasting 10GB of space and fix it for you? Or at
least hint at it? Or a "TIDY" command?
Should I be upping my FSM to 2,000,000 pages?
dave
David Blasby <dblasby@refractions.net> writes:
I just did another vacuum analyse on the table:
Ah, here we go:
INFO: "csn_edges": found 0 removable, 16289929 nonremovable row
versions in 2783986 pages
That works out to just under 6 rows per 8K page, which wouldn't be too
bad if the rows are 1K wide on average, but are they? (You might want
to run contrib/pgstattuple to get some exact information about average
tuple size.)
INFO: analyzing "public.csn_edges"
INFO: "csn_edges": 2783986 pages, 3000 rows sampled, 6724 estimated
total rows
This looks like a smoking gun to me. The huge underestimate of number
of rows from ANALYZE is a known failure mode of the existing sampling
method when the early pages of the table are thinly populated. (Manfred
just fixed that for 7.5, btw.)
Tom, is there a way to choose between a sample and a full scan?
I think you want to VACUUM FULL or CLUSTER the table, and then take a
look at your FSM settings and routine vacuuming frequency to see if
you need to adjust them to keep this from happening again.
regards, tom lane
It wouldn't; you'd need vacuum full to collapse out the dead space.
You could also try CLUSTER which is faster than VACUUM FULL when most
of the data has to be moved anyway. (Some days I think we should dump
VACUUM FULL, because it's optimized for a case that's no longer very
interesting...)
Out of interest, is CLUSTER another fast way of truly removing OIDs from
a table, after going SET WITHOUT OIDS?
Chris
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
It wouldn't; you'd need vacuum full to collapse out the dead space.
You could also try CLUSTER which is faster than VACUUM FULL when most
of the data has to be moved anyway. (Some days I think we should dump
VACUUM FULL, because it's optimized for a case that's no longer very
interesting...)
Out of interest, is CLUSTER another fast way of truly removing OIDs from
a table, after going SET WITHOUT OIDS?
I think not --- IIRC, CLUSTER just copies the tuples verbatim. It'd do
fine for getting rid of dead tuples and unused space, but not for making
any actual changes in the individual tuples.
What would be useful for that (as of 7.5) is a dummy ALTER TABLE TYPE,
viz
ALTER TABLE tab ALTER COLUMN col TYPE same-type-it-already-has
There not being any special case to short-circuit this, the thing will
sit there and reconstruct the tuples and write them into a fresh table
file (and rebuild the indexes, too). Net effect is to collapse out any
dropped columns (or OIDs) as well as losing dead tuples and free space.
I was just noticing today that the ALTER TABLE documentation suggests
a dummy UPDATE and VACUUM FULL to get rid of the space occupied by a
dropped column. On reflection the ALTER TYPE method is likely to be
an order of magnitude faster. Will update the docs.
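Written out, the dummy ALTER looks like this (7.5+ only; the table and column names here are illustrative assumptions, not from the thread):

```sql
-- "Change" a column to the type it already has.  With no short-circuit
-- for this case, every tuple is reconstructed into a fresh table file
-- (and the indexes are rebuilt), collapsing dropped columns, OIDs,
-- dead tuples and free space:
ALTER TABLE csn_edges ALTER COLUMN edge_id TYPE integer;
```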
A further thought along this line: if you have to do an UPDATE that
affects every row of a large table, the traditional way is, eg,
UPDATE tab SET col = col + 1
which leaves you with N live rows, N dead rows, and lots of pain to get
back down to a less-than-twice-normal-size table. (Traditional way is
VACUUM FULL; CLUSTER is better, but still painful.) As of 7.5 you could
hack this with
ALTER TABLE tab ALTER COLUMN col TYPE same-type USING col + 1
which will have the effect of rewriting a fresh table file containing
the updated rows, and dropping the old file at commit. Sweet. (Peak
disk usage is 2x normal either way, but you don't pay through the nose
to get back down to 1x.) I'm not real sure where to document this
trick but it seems like we ought to mention it someplace.
regards, tom lane
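For reference, the two variants above side by side (a sketch; assumes col is an integer column):

```sql
-- Traditional whole-table update: leaves N dead rows behind, with a
-- VACUUM FULL or CLUSTER to pay for afterwards:
UPDATE tab SET col = col + 1;

-- The 7.5 trick: same net effect, but writes a fresh table file
-- containing the updated rows and drops the old file at commit:
ALTER TABLE tab ALTER COLUMN col TYPE integer USING col + 1;
```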
David Blasby <dblasby@refractions.net> writes:
I have another copy of this table in another database - vacuum analyse
verbose says its "only" 1,500,000 pages (vs 2,800,000).
Hmm ... this is consistent with the idea that you did an UPDATE
affecting every row of the table. That would generate N new rows which
would all get added to the end of the table, if there were no dead space
available within the table. After commit you have a lot of pages worth
of dead tuples followed by a lot of pages worth of live tuples. Plain
VACUUM cannot do much to fix this since it doesn't move rows around.
VACUUM FULL will fix it, but its index-update overhead is high enough
that CLUSTER is a better deal.
See followup message discussing ALTER TABLE as another alternative.
regards, tom lane
Tom Lane wrote:
of dead tuples followed by a lot of pages worth of live tuples. Plain
VACUUM cannot do much to fix this since it doesn't move rows around.
VACUUM FULL will fix it, but its index-update overhead is high enough
that CLUSTER is a better deal.
Tom: I was interested in performance improvements from cluster, so I
tried to cluster a table on a spatial index:
dra_working=# \d geomtest
Table "public.geomtest"
Column | Type | Modifiers
----------------+----------+-----------
rd_segment_id | integer |
admit_date | date |
retire_date | date |
most_recent | boolean |
lineargeometry | geometry |
Indexes: geomtest_idx gist (lineargeometry)
dra_working=# cluster geomtest_idx on geomtest;
ERROR: CLUSTER: cannot cluster when index access method does not handle
nulls
You may be able to work around this by marking column "lineargeometry"
NOT NULL
dra_working=# select version();
version
---------------------------------------
PostgreSQL 7.3.6 on i686-pc-linux-gnu
As of quite a while ago (7.2?) the GiST access method was made null-safe
by Teodor and Oleg, I think. Is this a safety wrapper left over from
before the upgrade to GiST?
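The workaround the error message suggests would be (a sketch; only safe if lineargeometry genuinely contains no NULLs, since SET NOT NULL scans the table and then enforces the constraint on future inserts):

```sql
ALTER TABLE geomtest ALTER COLUMN lineargeometry SET NOT NULL;
CLUSTER geomtest_idx ON geomtest;
```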
--
Paul Ramsey
Refractions Research
Email: pramsey@refractions.net
Phone: (250) 885-0632
On Wed, 26 May 2004, Tom Lane wrote:
if you have to do an UPDATE that affects every row of a large table
UPDATE tab SET col = col + 1
which leaves you with N live rows, N dead rows, and lots of pain to get
back down to a less-than-twice-normal-size table. (Traditional way is
VACUUM FULL; CLUSTER is better, but still painful.) As of 7.5 you could
hack this with
ALTER TABLE tab ALTER COLUMN col TYPE same-type USING col + 1
which will have the effect of rewriting a fresh table file containing
the updated rows, and dropping the old file at commit.
I'm not real sure where to document this
trick but it seems like we ought to mention it someplace.
Isn't it better to detect an UPDATE without a WHERE clause and do that
update the same way as the ALTER TABLE above? Then we don't need to
document and learn a new non-standard way of doing an update.
--
/Dennis Björklund
What would be useful for that (as of 7.5) is a dummy ALTER TABLE TYPE,
viz
ALTER TABLE tab ALTER COLUMN col TYPE same-type-it-already-has
I think we should add special syntax for this purpose, since I would like
to see all possible cases of "alter column" short-circuited (by me, or by
someone else later on).
The syntax I would see fit is something like:
ALTER TABLE tab [MOVE] TABLESPACE xy;
For the above special case the tablespace would be the same as before.
Andreas
Dennis Bjorklund <db@zigo.dhs.org> writes:
On Wed, 26 May 2004, Tom Lane wrote:
I'm not real sure where to document this
trick but it seems like we ought to mention it someplace.
Isn't it better to detect a UPDATE without a where and do that update in
the same way as the alter table above? Then we don't need to document and
learn a new non standard way of doing an update.
No, because the locking implications are completely different. I don't
want UPDATE to suddenly decide it needs an exclusive lock on the table
based on the shape of the WHERE clause.
regards, tom lane
I did a CLUSTER - it took just under 12 hours. The original load of the
table (including parsing all the geometries) took about 4 hours (+4 to
build the indexes and vacuum analyse). I don't know why the cluster is
so much slower...
My "SELECT * FROM <table> LIMIT 1;" is now fast.
Thanks for the help!
As a side note, I think the behavior of the Database is a bit
counter-intuitive. If I were to do a:
UPDATE <table> SET i = i + 1;
My table will "leak" 10GB of disk space, make queries extremely slow,
and require taking the database off-line for 12 hours to fix it!
Both the VACUUM and the UPDATE manuals should be updated to say that this
is what's happening. If I do my above query 100 times, will I be leaking
a terabyte of disk space?
Will increasing my max_fsm_pages to 2,000,000 solve my problem?
dave
On Wed, 26 May 2004 18:17:55 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
(Some days I think we should dump
VACUUM FULL, because it's optimized for a case that's no longer very
interesting...)
So we still have to stick with VACUUM FULL for some time, right?
The next set of compatibility breakers I'm currently working on requires
a change in VACUUM FULL behaviour. I would only move tuples that are
visible to all running transactions. OTOH I wouldn't stop at the first
unmovable tuple.
With
X active tuple
. free space or dead tuple
y new tuple, not yet visible to a running transaction
z deleted tuple, still visible to a running transaction
the current implementation transforms this relation
XXXX.... XXXX.... XXXX.... XXXX.... XXyX.... XXXX.... XzXX.... XXXX....
into
XXXXXXXX XXXXXzXX XXXXXXXX XXXXXXyX
The new implementation would leave it as
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXX.. ..y..... ........ .z......
If there are concurrent long-running transactions, the new VACUUM FULL
wouldn't truncate the relation as aggressively as it does now. It could
leave the relation with lots of free space near the end. This was
absolutely unacceptable at the time when VACUUM FULL was designed. But
now we could use lazy VACUUM as an excuse for VACUUM FULL not working so
hard. After the transaction still seeing z terminates, VACUUM (without
FULL) can truncate the relation to
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXX.. ..y.....
and when y is updated the new version will be stored in a lower block
and plain VACUUM can truncate the relation again:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXY.
AFAICS this would make vacuum.c much simpler (no more chain moves).
Clearly this change alone doesn't have any merit. But would such a
patch have any chance of being accepted, if it facilitates improvements
in other areas?
Servus
Manfred
Manfred Koizar <mkoi-pg@aon.at> writes:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXX.. ..y.....
and when y is updated the new version will be stored in a lower block
Oh? What makes you think that? I see no guarantee of it.
AFAICS this would make vacuum.c much simpler (no more chain moves).
How will it do that? I think you'd have to not move *any* updated
tuples to be sure you don't need the chain-move mechanism. Not moving
the outdated tuple isn't sufficient, you can't move the one it points at
either.
Clearly this change alone doesn't have any merit. But would such a
patch have any chance of being accepted, if it facilitates improvements
in other areas?
I'm disinclined to mess with VACUUM FULL without a clearer explanation
of where you're headed. So far as I can see, you're simply going to
make VACUUM FULL less effective with no stated benefit.
(BTW, it now occurs to me that CLUSTER and ALTER TABLE in their present
forms may be broken, because they only copy rows that are valid
according to SnapshotNow; this means that rows that are still visible to
old transactions could be lost. The lack of any attempt to preserve
update chain relationships seems ungood as well, if we might have old
transactions come along and try to update the table later.)
regards, tom lane
On Thu, 27 May 2004 14:23:07 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
and when y is updated the new version will be stored in a lower block
Oh? What makes you think that? I see no guarantee of it.
You're right, I see only a tendency, because the majority of free space
is before the last block (obviously). But don't we try to store the new
version on the same block as the old version? That'd weaken my argument
a bit.
I think you'd have to not move *any* updated
tuples to be sure you don't need the chain-move mechanism.
Yes, per definitionem (move only tuples that are visible to all).
I'm disinclined to mess with VACUUM FULL without a clearer explanation
of where you're headed.
Have no fear. I won't change anything in the near term. As you were
talking about the future of VACUUM FULL, I thought this might be a good
opportunity to ask. The fact that you didn't outright reject the idea
is good enough for now.
I have no clear explanation at the moment, just some fuzzy ideas that
are beginning to crystallise. I'm messing around with heap tuple
headers again, and the Xvac field seems to get in the way, unless I can
cut down the number of different scenarios where it is needed.
Servus
Manfred
On Thu, May 27, 2004 at 09:52:30PM +0200, Manfred Koizar wrote:
I have no clear explanation at the moment, just some fuzzy ideas that
are beginning to crystallise. I'm messing around with heap tuple
headers again, and the Xvac field seems to get in the way, unless I can
cut down the number of different scenarios where it is needed.
Now you are on the subject, can I ask you to take a peek at what I did
regarding tuple headers?
At first I thought I'd have to add back Xmax as a field on its own, but
later (in a chat with Bruce) I arrived at the conclusion that it isn't
really necessary, and I only added a bit to the infomask to flag when
the Cmin is overridden with Xmax.
However I'm not convinced that this is enough --- is there a situation
in which we would need to peek at Cmin after setting Xmax for a
particular tuple?
The problem was:
BEGIN;
insert into foo values (1);
begin;  -- subtransaction
delete from foo;
rollback;
-- at this point the tuple should be visible,
-- but it has my Xid as Xmin and Cmin was
-- overridden with Xmax
commit;
I'd appreciate your (Manfred's and Tom's) comments on the topic.
--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"And as an added bonus, now my computer goes to the toilet for me, leaving me
free to spend time on more useful activities! yay slug codefests!" (C. Parker)
On Thu, 27 May 2004 16:50:24 -0400, Alvaro Herrera
<alvherre@dcc.uchile.cl> wrote:
Now you are on the subject, can I ask you to take a peek at what I did
regarding tuple headers?
I did read your patch, but I didn't understand it. :-(
At first I thought I'd have to add back Xmax as a field on its own
Veto! This would increase heap tuple header size ==> less tuples per
page ==> more pages per table ==> more I/O ==> performance loss.
is there a situation
on which we should need to peek at Cmin after setting Xmax for a
particusar tuple?
http://archives.postgresql.org/pgsql-hackers/2002-05/msg00090.php
Servus
Manfred