reclaiming space from heavily used tables?

Started by Rob Nikanderover 8 years ago7 messagesgeneral
Jump to latest
#1Rob Nikander
rob.nikander@gmail.com

Hi,

I've got a large table from which I'd like to completely reclaim space. I read the docs and it sounds like I can’t run `vacuum full`, because this table is accessed constantly and can’t have downtime. Assuming that’s true, what do you think of the following idea? Is there a better alternative?

1. Replace the table T with two tables T1 and T2 and a view T that is `select * from T1 union T2`.
2. Write only to T1, and slowly move records from T2 to T1.
3. When T2 is empty, redefine the view T to be simply `select * from T1`.
4. Recreate or vacuum full T2, so space is fully reclaimed.
5. Redefine view as the union select and repeat process going other direction from T1 to T2, as needed.

I guess one question is whether I can do 1 and 3 (renaming tables, redefining views) atomically without disturbing concurrent select queries.

Rob

#2Jeremy Finzel
finzelj@gmail.com
In reply to: Rob Nikander (#1)
Re: reclaiming space from heavily used tables?

On Mon, Dec 18, 2017 at 1:03 PM, Rob Nikander <rob.nikander@gmail.com>
wrote:

Hi,

I've got a large table from which I'd like to completely reclaim space. I
read the docs and it sounds like I can’t run `vacuum full`, because this
table is accessed constantly and can’t have downtime. Assuming that’s true,
what do you think of the following idea? Is there a better alternative?

1. Replace the table T with two tables T1 and T2 and a view T that is
`select * from T1 union T2`.
2. Write only to T1, and slowly move records from T2 to T1.
3. When T2 is empty, redefine the view T to be simply `select * from T1`.
4. Recreate or vacuum full T2, so space is fully reclaimed.
5. Redefine view as the union select and repeat process going other
direction from T1 to T2, as needed.

I guess one question is whether I can do 1 and 3 (renaming tables,
redefining views) atomically without disturbing concurrent select queries.

This is what you want: https://github.com/reorg/pg_repack

This has been around for many years and is a very trusted extension (when
will it be in core????).

Show quoted text

Rob

#3Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jeremy Finzel (#2)
Re: reclaiming space from heavily used tables?

Jeremy Finzel wrote:

This is what you want: https://github.com/reorg/pg_repack

This has been around for many years and is a very trusted extension (when
will it be in core????).

It's been around, but is it trusted? I for one do not trust it. See
for example
/messages/by-id/CA+TgmoaWoU+BpBG4nwz1LGkmdsbL6_Hm9r8jNqte-5VRDLsrkQ@mail.gmail.com

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#4Jeremy Finzel
finzelj@gmail.com
In reply to: Alvaro Herrera (#3)
Re: reclaiming space from heavily used tables?

It's been around, but is it trusted? I for one do not trust it. See
for example
/messages/by-id/CA+TgmoaWoU+BpBG4nwz1L
GkmdsbL6_Hm9r8jNqte-5VRDLsrkQ@mail.gmail.com

Needs some discussion. Has anyone actually reported corruption related to
this? I don't doubt the opinion, but I do wonder why I have never seen
reports of corruption related to this when people have readily reported
such things in years past about pg_repack. If it (apparently) works so
well in practice for so many companies, but has this bug, why not leverage
what they have done and fix its bugs when it serves such an important
function? The extension serves a huge need on many fronts.

#5Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jeremy Finzel (#4)
Re: reclaiming space from heavily used tables?

Jeremy Finzel wrote:

It's been around, but is it trusted? I for one do not trust it. See
for example
/messages/by-id/CA+TgmoaWoU+BpBG4nwz1L
GkmdsbL6_Hm9r8jNqte-5VRDLsrkQ@mail.gmail.com

Needs some discussion. Has anyone actually reported corruption related to
this? I don't doubt the opinion, but I do wonder why I have never seen
reports of corruption related to this when people have readily reported
such things in years past about pg_repack.

Maybe the nature of the corruption caused is different. It took months
of running large databases on production for corruption to become
apparent from multixact bugs, for example. Or maybe because the
relfrozenxid is fixed by other activity in the system, any bugs are
masked -- but that you could get in trouble if autovacuum is disabled,
perhaps. Or maybe it's masked even in that case, since autovacuum
observes that the table has a very old frozenxid and runs a forced
vacuum on that table anyway.

If it (apparently) works so well in practice for so many companies,
but has this bug, why not leverage what they have done [...] ?

What have they done?

[...] and fix its bugs when it serves such an important function? The
extension serves a huge need on many fronts.

It's not PGDG's tool, so these are not "our" bugs to fix. Though it's
pretty clear that we need *something* to cover those needs ... but
nothing has been submitted to commitfests.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#6Jeremy Finzel
finzelj@gmail.com
In reply to: Alvaro Herrera (#5)
Re: reclaiming space from heavily used tables?

Maybe the nature of the corruption caused is different. It took months
of running large databases on production for corruption to become
apparent from multixact bugs, for example. Or maybe because the
relfrozenxid is fixed by other activity in the system, any bugs are
masked -- but that you could get in trouble if autovacuum is disabled,
perhaps. Or maybe it's masked even in that case, since autovacuum
observes that the table has a very old frozenxid and runs a forced
vacuum on that table anyway.

Or maybe there isn’t actual corruption. There is debate in the Git issue
discussed over this... I’m not sure. But whatever the case may be, I don’t
find it useful to throw out a tool out of hand because of what in theory
appears to be a huge bug but which in reality has not happened to a very
broad install base of users. pg_repack appears to be the number one way
people solve this kind of problem in very critical production environments.
So I don’t find it reasonable to just throw out the tool.

If it (apparently) works so well in practice for so many companies,
but has this bug, why not leverage what they have done [...] ?

What have they done?

Made a tool that can cluster tables, remove bloat and reindex, and move
tables between tablespaces with no downtime.

[...] and fix its bugs when it serves such an important function? The
extension serves a huge need on many fronts.

It's not PGDG's tool, so these are not "our" bugs to fix. Though it's
pretty clear that we need *something* to cover those needs ... but
nothing has been submitted to commitfests.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

I’m not trying to point fingers or say who should do what. My point is
that this tool solves some important production challenges and it would be
better to leverage this code base and maybe fix some issues then try to
write something completely new. And that is largely because there has
already been huge production usage of this tool for years which is already
exposed many issues that have been fixed.
Many thanks for the feedback,
Jeremy

#7Rob Nikander
rob.nikander@gmail.com
In reply to: Jeremy Finzel (#6)
Re: reclaiming space from heavily used tables?

I don’t see how I can use that extension. If some people are saying it’s dangerous and others are saying it’s fine, I don’t have the time to drill down into PG internals so that I can judge for myself. I’ll probably try the two table idea outlined in my original message.

Show quoted text

On Dec 18, 2017, at 5:24 PM, Jeremy Finzel <finzelj@gmail.com> wrote:
[…]

On Dec 18, 2017, at 2:51 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
[…]