xid wrap / optimize frozen tables?
Hi,
as many before, I ran into the issue of a postgresql database (8.4.1)
- committing many transactions
- to huge volume tables (3-figure GB in size)
- running the xid wrap vacuum (to freeze tuples)
where the additional read IO load has negative impact to the extent of the
system becoming unusable.
Besides considering the fact that this can be worked around by exchanging
printed sheets of paper or plastic (hello to .au) for hardware, I'd very much
appreciate answers to these questions:
* Have I missed any more recent improvements regarding this problem? My
understanding is that the full scan for unfrozen tuples can be made less likely
(by reducing the number of transactions and tuning the autovac), but that it is
still required. Is this correct? (A query sketch for checking how close tables
are to that scan follows after these questions.)
* A pretty obvious idea seems to be to add special casing for "fully frozen
tables": If we could register the fact that a table is fully frozen (has no new
tuples after the complete-freeze xid), a vacuum would get reduced to just
increasing that "last frozen" xid.
It seems like Alvaro Herrera had implemented a patch along the lines of this
idea, but I fail to find any other references to it:
http://grokbase.com/t/postgresql/pgsql-hackers/0666gann3t/how-to-avoid-transaction-id-wrap#200606113hlzxtcuzrcsfwc4pxjimyvwgu
Does anyone have pointers to what happened to that patch?
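To make the first question concrete, here is a minimal sketch (assuming nothing
beyond the standard system catalogs) of how one might watch how far each table
is from the point where autovacuum forces an anti-wraparound freeze vacuum:

    -- Distance of each table from a forced "(to prevent wraparound)" vacuum.
    -- age(relfrozenxid) is the number of XIDs consumed since the table was
    -- last fully frozen; once it exceeds autovacuum_freeze_max_age, autovacuum
    -- starts the freeze vacuum regardless of other thresholds.
    SELECT c.oid::regclass                      AS table_name,
           age(c.relfrozenxid)                  AS xid_age,
           current_setting('autovacuum_freeze_max_age')::bigint
             - age(c.relfrozenxid)              AS xids_until_forced_vacuum,
           pg_size_pretty(pg_table_size(c.oid)) AS table_size
    FROM   pg_class c
    WHERE  c.relkind = 'r'
    ORDER  BY age(c.relfrozenxid) DESC
    LIMIT  20;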
Thanks, Nils
On Sat, May 23, 2015 at 03:46:33PM +0200, Nils Goroll wrote:
> Hi,
> as many before, I ran into the issue of a postgresql database (8.4.1)

Uh, did you mean 9.4.1, as 8.4.1 is end-of-lifed?
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ Everyone has their own god. +
Nils Goroll <slink@schokola.de> writes:
> as many before, I ran into the issue of a postgresql database (8.4.1)
*Please* tell us that was a typo.
If it wasn't, there were possibly-relevant fixes in 8.4.6, 8.4.9,
and perhaps later; I got tired of scanning the release notes.
8.4.x is out of support entirely now, of course, but it ought to be fairly
painless to drop in the last 8.4.x release (8.4.22) and see if that makes
things any better. If not, you ought to be thinking about an upgrade to
something 9.recent rather than trying to hack up 8.4.
regards, tom lane
On 23/05/15 16:50, Tom Lane wrote:
>> as many before, I ran into the issue of a postgresql database (8.4.1)
> *Please* tell us that was a typo.
Yes it was, my sincere apologies. It's 9.4.1.
Nils
On Sat, May 23, 2015 at 6:46 AM, Nils Goroll <slink@schokola.de> wrote:
> Hi,
>
> as many before, I ran into the issue of a postgresql database (8.4.1)
> - committing many transactions
> - to huge volume tables (3-figure GB in size)
> - running the xid wrap vacuum (to freeze tuples)
> where the additional read IO load has negative impact to the extent of the
> system becoming unusable.
(9.4.1 noted)
Are you sure it is the read IO that causes the problem? It should be
pretty light, as it would mostly be satisfied by read-ahead (unless you
have GIN indexes), and for pages that aren't automatically prefetched it
just waits patiently for the requested data to arrive. (As opposed to
writing, where it runs around dirtying things that other processes need,
clogging up their IO rather than just its own.)
What monitoring techniques do you use to determine the source of the
slowdown?
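One rough way to answer that from inside the database (a sketch; the table name
below is a placeholder for the real partition) is to sample the per-table IO
counters while the vacuum runs and watch the autovacuum workers:

    -- Run twice, a few minutes apart, and diff the numbers: heap_blks_read is
    -- blocks requested from the OS (potential physical IO), heap_blks_hit is
    -- blocks already found in shared_buffers.
    SELECT relname, heap_blks_read, heap_blks_hit
    FROM   pg_statio_user_tables
    WHERE  relname = 'my_big_table';   -- placeholder name

    -- What the autovacuum workers are doing right now:
    SELECT pid, waiting, now() - query_start AS running_for, query
    FROM   pg_stat_activity
    WHERE  query LIKE 'autovacuum:%';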
> Besides considering the fact that this can be worked around by exchanging
> printed sheets of paper or plastic (hello to .au) for hardware, I'd very much
> appreciate answers to these questions:
>
> * Have I missed any more recent improvements regarding this problem? My
> understanding is that the full scan for unfrozen tuples can be made less likely
> (by reducing the number of transactions and tuning the autovac), but that it is
> still required. Is this correct?
>
> * A pretty obvious idea seems to be to add special casing for "fully frozen
> tables": If we could register the fact that a table is fully frozen (has no new
> tuples after the complete-freeze xid), a vacuum would get reduced to just
> increasing that "last frozen" xid.
>
> It seems like Alvaro Herrera had implemented a patch along the lines of this
> idea, but I fail to find any other references to it:
>
> Does anyone have pointers to what happened to that patch?
I don't know what happened to that, but there is another patch waiting for
review and testing:
https://commitfest.postgresql.org/5/221/
Cheers,
Jeff
Hi Jeff and all,
On 23/05/15 22:13, Jeff Janes wrote:
> Are you sure it is the read IO that causes the problem?

Yes. The trouble here is that we are talking about a 361 GB table:
                           List of relations
 Schema |        Name         | Type  |  Owner   |  Size  | Description
--------+---------------------+-------+----------+--------+-------------
 public | *redacted*_y2015m04 | table | postgres | 361 GB |
and while we have
shared_buffers = 325GB
huge_pages = on
this is not the only table of this size (total db size is 1.8 TB) and more
current data got written to *redacted*_y2015m05 (the manually-partitioned table
for May), so most of the m04 data would have got evicted from the cache when
this issue surfaced initially.
There is one application pushing data (bulk inserts) and we have transaction
rates for this app in a log. The moment the vacuum started, these rates dropped.
Unfortunately I cannot present helpful log excerpts here, as the autovacuum has
never finished so far (because the admin killed the db), so we have zero logging
about past autovac events.
At the moment, the application is shut down and the machine is only running the
vacs:
query_start | 2015-05-22 19:33:52.44334+02
waiting     | f
query       | autovacuum: VACUUM public.*redacted*_y2015m04 (to prevent wraparound)

query_start | 2015-05-22 19:34:02.46004+02
waiting     | f
query       | autovacuum: VACUUM ANALYZE public.*redacted*_y2015m05 (to prevent wraparound)

so we know that any IO must be caused by the vacs:
shell# uptime
13:33:33 up 1 day, 18:01, 2 users, load average: 5.75, 12.71, 8.43
shell# zpool iostat
                   capacity     operations    bandwidth
pool            alloc   free   read  write   read  write
--------------- -----  -----  -----  -----  -----  -----
tank1            358G  6.90T    872     55  15.1M  3.08M
Again, we know the IO capacity is insufficient; the pool is on only 2 magnetic
disks at the moment, so an average read rate of 872 IOPS over 42 hours is not
even bad.
> I don't know what happened to that, but there is another patch waiting for
> review and testing:
This is really interesting, thank you very much for the pointer.
Cheers, Nils
On 5/24/15 6:42 AM, Nils Goroll wrote:
> shared_buffers = 325GB
FWIW, a lot of people report performance loss with shared_buffers that
large. At a minimum, if you're going to set them that large then you
want to make sure that the OS has a bare minimum of memory in use for
its disk cache, since everything in there will be double cached.
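One way to put a number on that (a sketch, assuming the pg_buffercache contrib
extension is installed and the default 8 kB block size) is to see which
relations actually occupy shared_buffers:

    -- Requires: CREATE EXTENSION pg_buffercache;
    -- Shows how much of shared_buffers each relation in the current database
    -- occupies, which helps judge how much of the 325GB is useful versus
    -- simply duplicating the OS page cache.
    SELECT c.relname,
           pg_size_pretty(count(*) * 8192) AS buffered   -- assumes 8 kB blocks
    FROM   pg_buffercache b
    JOIN   pg_class c ON c.relfilenode = b.relfilenode
    WHERE  b.reldatabase = (SELECT oid FROM pg_database
                            WHERE datname = current_database())
    GROUP  BY c.relname
    ORDER  BY count(*) DESC
    LIMIT  10;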
> query_start | 2015-05-22 19:33:52.44334+02
> waiting     | f
> query       | autovacuum: VACUUM public.*redacted*_y2015m04 (to prevent wraparound)
>
> query_start | 2015-05-22 19:34:02.46004+02
> waiting     | f
> query       | autovacuum: VACUUM ANALYZE public.*redacted*_y2015m05 (to prevent wraparound)
By default autovac has a cost delay > 0, which will significantly slow
down those vacuums. Normally that's what you want (so you're not
swamping the system), but in this case it's going to be problematic for you.
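For instance (a sketch; the partition name is a placeholder), the throttling
can be relaxed globally or just for the tables in question:

    -- Globally: let autovacuum run unthrottled; picked up on reload by new
    -- autovacuum workers.
    ALTER SYSTEM SET autovacuum_vacuum_cost_delay = 0;
    SELECT pg_reload_conf();

    -- Or per table, only for the huge partitions (placeholder name):
    ALTER TABLE public.big_y2015m04
      SET (autovacuum_vacuum_cost_delay = 0,
           autovacuum_vacuum_cost_limit = 10000);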
Also, if there are other tables that need to be frozen you should
experiment with starting those vacuums now, by hand. That means you'll
have more IO happening than you have spindles, but my experience is that
it can take a lot of extra Postgres backends to actually swamp
sequential reads on an IO system. VACUUM's reads are sequential by
nature, so you should be able to get close to the maximum theoretical
sequential read speed from your drives.
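A sketch of what such a hand-started freeze vacuum could look like from a psql
session (again with a placeholder table name):

    -- Manual VACUUM is not subject to the autovacuum delay, but make sure the
    -- session-level throttling is off as well (0 is the default for manual runs):
    SET vacuum_cost_delay = 0;

    -- Freeze everything in one pass so the anti-wraparound autovacuum has
    -- nothing left to do on this table afterwards:
    VACUUM (FREEZE, VERBOSE) public.big_y2015m03;   -- placeholder partition name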
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On May 24, 2015 6:42 AM, "Nils Goroll" <slink@schokola.de> wrote:
> Hi Jeff and all,
>
> On 23/05/15 22:13, Jeff Janes wrote:
>> Are you sure it is the read IO that causes the problem?
>
> Yes. The trouble here is that we are talking about a 361 GB table:
>
>                            List of relations
>  Schema |        Name         | Type  |  Owner   |  Size  | Description
> --------+---------------------+-------+----------+--------+-------------
>  public | *redacted*_y2015m04 | table | postgres | 361 GB |
>
> and while we have
>
> shared_buffers = 325GB
> huge_pages = on
As mentioned, that is a very large setting for shared_buffers.
> this is not the only table of this size (total db size is 1.8 TB) and more
> current data got written to *redacted*_y2015m05 (the manually-partitioned table
> for May), so most of the m04 data would have got evicted from the cache when
> this issue surfaced initially.
>
> There is one application pushing data (bulk inserts) and we have transaction
> rates for this app in a log. The moment the vacuum started, these rates dropped.
> Unfortunately I cannot present helpful log excerpts here, as the autovacuum has
> never finished so far (because the admin killed the db), so we have zero logging
> about past autovac events.
Could you do an experiment in which you do a large sequential read on the
database files and measure the impact on the queries that way? Like:
tar -cf - data_dir | wc -c
Or better, use some fancy version that throttles to the read rate observed
below.
> At the moment, the application is shut down and the machine is only running the
> vacs:
>
> query_start | 2015-05-22 19:33:52.44334+02
> waiting     | f
> query       | autovacuum: VACUUM public.*redacted*_y2015m04 (to prevent wraparound)
>
> query_start | 2015-05-22 19:34:02.46004+02
> waiting     | f
> query       | autovacuum: VACUUM ANALYZE public.*redacted*_y2015m05 (to prevent wraparound)
>
> so we know that any IO must be caused by the vacs:
>
> shell# uptime
>  13:33:33 up 1 day, 18:01,  2 users,  load average: 5.75, 12.71, 8.43
What OS is this? That load average looks very high. Does the OS charge
processes that are blocked on IO against the load average?
> shell# zpool iostat
>                    capacity     operations    bandwidth
> pool            alloc   free   read  write   read  write
> --------------- -----  -----  -----  -----  -----  -----
> tank1            358G  6.90T    872     55  15.1M  3.08M
I'm not familiar with zpool but this shows a lot of writing going on. If
the table was already frozen and just needed to be observed as being all
frozen, then it should not be dirtying one block for every 5 blocks read.
I would not be surprised if it were the reading, not the writing, which
caused the performance problem.
Cheers, Jeff
Just FYI: we have worked around these issues by running regular (scripted and
thus controlled) vacuums on all tables but the active ones, and by adding an L2
ZFS cache (l2arc). I hope to get back to this again soon.
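For completeness, a sketch of how such a script might pick its targets
(assuming the active partition can simply be excluded by name; the pattern
below is a placeholder):

    -- Emit VACUUM (FREEZE) commands for every user table except the partition
    -- currently being written to, oldest relfrozenxid first; feed the output
    -- to psql from cron.
    SELECT format('VACUUM (FREEZE) %s;', c.oid::regclass)
    FROM   pg_class c
    JOIN   pg_namespace n ON n.oid = c.relnamespace
    WHERE  c.relkind = 'r'
      AND  n.nspname NOT IN ('pg_catalog', 'information_schema')
      AND  c.relname NOT LIKE '%y2015m05'   -- skip the active partition
    ORDER  BY age(c.relfrozenxid) DESC;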
On Wed, Jun 3, 2015 at 2:49 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> I would not be surprised if it were the reading, not the writing, which
> caused the performance problem.
Of course I screwed up that last sentence. I meant the opposite: it would
not surprise me if it were the writing that caused the problem, despite
there being 5 times less of it.
Jeff