Make relcache init write errors not be fatal

Started by Jeff Janes over 7 years ago, 3 messages, list: hackers
#1 Jeff Janes
jeff.janes@gmail.com

After running a testing server out of storage, I tried to track down why it
was so hard to get it back up again. (Rather than what I usually do which
is just throwing it away and making the test be smaller).

I couldn't start a backend because it couldn't write the relcache init file.

I found this comment, but it did not carry its sentiment to completion:

/*
* We used to consider this a fatal error, but we might as well
* continue with backend startup ...
*/

With the attached patch applied, I could at least get a backend going so I
could drop some tables/indexes and free up space.

I'm not enamoured with the implementation of passing a flag down
to write_item, but it seemed better than making write_item return an error
code and then checking the return status in a dozen places. Maybe we could
turn write_item into a macro, so the macro can implement the "return" from
the outer function directly?
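
A minimal sketch of the macro idea floated above, not the attached patch. It assumes a length-prefixed item format, and the names `try_write_item`, `WRITE_ITEM_OR_RETURN`, and `write_init_file_sketch` are hypothetical, chosen only for illustration. The helper reports failure instead of raising a fatal error, and the macro turns that failure into a warning plus an early `return false` from the calling function, so call sites need neither an extra flag nor explicit status checks:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: write one length-prefixed item and report failure
 * to the caller rather than treating it as fatal.
 */
static bool
try_write_item(const void *data, size_t len, FILE *fp)
{
    if (fwrite(&len, 1, sizeof(len), fp) != sizeof(len))
        return false;
    if (len > 0 && fwrite(data, 1, len, fp) != len)
        return false;
    return true;
}

/*
 * Hypothetical macro: on failure, emit a warning and return false from
 * the *enclosing* function. Only usable inside functions returning bool.
 */
#define WRITE_ITEM_OR_RETURN(data, len, fp) \
    do { \
        if (!try_write_item((data), (len), (fp))) \
        { \
            fprintf(stderr, "WARNING: could not write init file\n"); \
            return false; \
        } \
    } while (0)

/*
 * Stand-in for the outer function: each call site stays a single line,
 * and the first failed write abandons the file without aborting startup.
 */
static bool
write_init_file_sketch(FILE *fp)
{
    const char rel_a[] = "pg_class";
    const char rel_b[] = "pg_attribute";

    WRITE_ITEM_OR_RETURN(rel_a, sizeof(rel_a), fp);
    WRITE_ITEM_OR_RETURN(rel_b, sizeof(rel_b), fp);
    return true;
}
```

The trade-off is the usual one for macros with a hidden `return`: call sites stay short, but the control flow is less visible, and any cleanup (closing the file, unlinking the partial file) has to happen in the caller of the outer function.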

Cheers,

Jeff

Attachments:

relcache_init_v1.patch (application/octet-stream), +33 -17

#2 Andres Freund
andres@anarazel.de
In reply to: Jeff Janes (#1)
Re: Make relcache init write errors not be fatal

Hi,

On 2018-12-22 20:49:58 -0500, Jeff Janes wrote:

> After running a testing server out of storage, I tried to track down why it
> was so hard to get it back up again. (Rather than what I usually do which
> is just throwing it away and making the test be smaller).
>
> I couldn't start a backend because it couldn't write the relcache init file.
>
> I found this comment, but it did not carry its sentiment to completion:
>
> /*
> * We used to consider this a fatal error, but we might as well
> * continue with backend startup ...
> */
>
> With the attached patch applied, I could at least get a backend going so I
> could drop some tables/indexes and free up space.

Why is this a good idea? It'll just cause hard to debug performance
issues imo.

Greetings,

Andres Freund

#3 Jeff Janes
jeff.janes@gmail.com
In reply to: Andres Freund (#2)
Re: Make relcache init write errors not be fatal

On Sat, Dec 22, 2018 at 8:54 PM Andres Freund <andres@anarazel.de> wrote:

> Hi,
>
> On 2018-12-22 20:49:58 -0500, Jeff Janes wrote:
>
>> After running a testing server out of storage, I tried to track down why it
>> was so hard to get it back up again. (Rather than what I usually do which
>> is just throwing it away and making the test be smaller).
>>
>> I couldn't start a backend because it couldn't write the relcache init file.
>>
>> I found this comment, but it did not carry its sentiment to completion:
>>
>> /*
>> * We used to consider this a fatal error, but we might as well
>> * continue with backend startup ...
>> */
>>
>> With the attached patch applied, I could at least get a backend going so I
>> could drop some tables/indexes and free up space.
>
> Why is this a good idea? It'll just cause hard to debug performance
> issues imo.

You get lots of WARNINGs, so it shouldn't be too hard to debug. And once
you drop a table or an index, the init will succeed and you wouldn't have
the performance issues at all anymore.

The alternative, barring finding extraneous data on the same partition that
can be removed, seems to be having indefinite downtime until you can locate
a larger hard drive and move everything to it, or using dangerous hacks.

Cheers,

Jeff