Postgres crash? could not write to log file: No space left on device

Started by Yuri Levinsky almost 13 years ago, 17 messages, pgsql-bugs

yuril@celltick.com

almost 13 years ago

Dear All,

I have the following issue on Sun Solaris 10. The PostgreSQL version is
9.2.3. WAL logging is minimal and there is no archiving. The DB has
restarted several times; the box has been up for the last 23 days. The
PostgreSQL installation and files are under /data/postgres, which is half
empty. Could some other destination be causing the problem? Can I log the
space consumption and the name of the directory where the problem happens
with some debug level or trace setting?

PANIC: could not write to log file 81, segment 125 at offset 13959168,
length 1392640: No space left on device

LOG: process 10203 still waiting for ShareLock on transaction 3010915
after 1004.113 ms

STATEMENT: UPDATE tctuserinfo SET clickmiles = clickmiles + $1,
periodicalclickmiles = periodicalclickmiles + $2, active = $3,
activeupdatetime = $4, activationsetby = $5, smscid = $6, sdrtime = $7,
simvalue = simvalue + $8, totalsimvalue = totalsimvalue + $9,
firstclick = $10, lastclick = $11, firstactivationtime = $12,
cbchannel = $13, clickmilesupdatetime = $14, ci = $15, lac = $16,
bscid = $17, lastlocationupdatetime = $18, subscriptiontype = $19,
contentcategory = $20, livechannels = $21, contextclicks = $22
WHERE phonenumber = $23

LOG: WAL writer process (PID 10476) was terminated by signal 6

LOG: terminating any other active server processes

FATAL: the database system is in recovery mode

But it looks OK:

dbnetapp:/vol/postgres   90G   44G   46G   49%   /data/postgres

Is it possible that "heavy" queries consume disk space (as temporary
space) and that everything becomes OK again after the crash and recovery?

Tom Lane

tgl@sss.pgh.pa.us

almost 13 years ago

In reply to: Yuri Levinsky (#1)

Re: Postgres crash? could not write to log file: No space left on device

"Yuri Levinsky" <yuril@celltick.com> writes:

I have the following issue on Sun Solaris 10. The PostgreSQL version is
9.2.3. WAL logging is minimal and there is no archiving. The DB has
restarted several times; the box has been up for the last 23 days. The
PostgreSQL installation and files are under /data/postgres, which is half
empty. Could some other destination be causing the problem? Can I log the
space consumption and the name of the directory where the problem happens
with some debug level or trace setting?

PANIC: could not write to log file 81, segment 125 at offset 13959168,
length 1392640: No space left on device

That's definitely telling you it got ENOSPC from a write in
$PGDATA/pg_xlog. Maybe you have a user-specific space quota affecting
the postgres account?
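To connect the PANIC to a concrete file: in 9.2 a WAL segment file is named from three 8-digit hex fields (timeline, log file, segment). That matches the later log in this thread, where "log file 185, segment 201" appears as pg_xlog/00000001000000B9000000C9. A minimal sketch under that assumption, with hypothetical helper names and timeline 1 assumed, turns "log file 81, segment 125" into the name of the file to look for under $PGDATA/pg_xlog:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Length of a WAL segment file name: three 8-digit hex fields. */
#define WAL_NAME_LEN 24

/*
 * Hypothetical helper: format the 9.2-style WAL segment file name from a
 * timeline ID and the (log file, segment) pair shown in the PANIC message.
 */
static void wal_file_name(char *out, size_t outlen,
                          unsigned int tli, unsigned int log, unsigned int seg)
{
    assert(out != NULL && outlen > WAL_NAME_LEN);
    snprintf(out, outlen, "%08X%08X%08X", tli, log, seg);
}

/* Hypothetical helper: does `name` look like a WAL segment file name? */
static int is_wal_file_name(const char *name)
{
    return strlen(name) == WAL_NAME_LEN &&
           strspn(name, "0123456789ABCDEF") == WAL_NAME_LEN;
}
```

With timeline 1, log file 81 (0x51) and segment 125 (0x7D) give 00000001000000510000007D; temporary files such as xlogtemp.24205 are deliberately not matched by is_wal_file_name().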

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Yuri Levinsky

yuril@celltick.com

almost 13 years ago

In reply to: Tom Lane (#2)

Re: Postgres crash? could not write to log file: No space left on device

Tom,
There is no space quota on user postgres. It happens on a different log
file each time. When I create a huge dummy file on the same FS, nothing
goes wrong.

P.S. I get the same error when I use pg_dump in tar format. I don't know
whether the two issues are related. The cause of the pg_dump space
failure is that, in tar format, pg_dump writes temporary data into /var,
which may be smaller than the biggest table requires. The fix in this case
is simple: use the custom format or enlarge /var. I don't know whether
PostgreSQL itself writes anything to /var, or whether it is the same
problem. By the way, neither the pg_dump output nor the error in my
original issue makes it possible to tell where space ran out.

Sincerely yours,

Yuri Levinsky, DBA
Celltick Technologies Ltd., 32 Maskit St., Herzliya 46733, Israel
Mobile: +972 54 6107703, Office: +972 9 9710239; Fax: +972 9 9710222

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Tuesday, June 25, 2013 4:47 PM
To: Yuri Levinsky
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] Postgres crash? could not write to log file: No space left on device

Lou Picciano

loupicciano@comcast.net

almost 13 years ago

In reply to: Yuri Levinsky (#1)

Re: Postgres crash? could not write to log file: No space left on device

Yuri, are you sure the pg_xlog files are going where you expect them to? Have you fine-tooth-combed your conf file for log-file-related settings? Log files may well be directed to a filesystem other than /data/postgres (as is common in our environments, for example).

Do a $ df -h on the various FSes involved...
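Lou's df suggestion can also be scripted, which gets at Yuri's question about logging space consumption per directory. A minimal POSIX sketch using statvfs(3), with hypothetical function names; the paths to check (for example $PGDATA, $PGDATA/pg_xlog and /var/tmp) are whatever the installation actually uses. Note that statvfs reports filesystem-wide figures, so it will not reveal a per-user quota of the kind Tom suggested.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <sys/statvfs.h>

/*
 * Hypothetical helper: total and available bytes on the filesystem that
 * holds `path` (roughly the Size and Avail columns of df). 0 on success.
 */
static int fs_space(const char *path,
                    unsigned long long *total, unsigned long long *avail)
{
    struct statvfs sv;

    assert(path != NULL && total != NULL && avail != NULL);
    if (statvfs(path, &sv) != 0)
        return -1;
    /* f_blocks and f_bavail are counted in units of f_frsize bytes. */
    *total = (unsigned long long) sv.f_blocks * sv.f_frsize;
    *avail = (unsigned long long) sv.f_bavail * sv.f_frsize;
    return 0;
}

/* Print one line per path, e.g. for $PGDATA, $PGDATA/pg_xlog, /var/tmp. */
static void print_fs_space(const char *path)
{
    unsigned long long total, avail;

    if (fs_space(path, &total, &avail) != 0) {
        perror(path);
        return;
    }
    printf("%-30s %llu MB total, %llu MB available\n",
           path, total >> 20, avail >> 20);
}
```

Running such a check periodically on the pg_xlog directory, around the times of the crashes, would show whether that filesystem ever actually fills up.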

Are you using Solaris 10 ACLs? Dig deeper on Tom's point on user-specific quotas. ZFS in use? Various quota settings under Solaris can get you really unexpected mileage.
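On Lou's first question, whether the WAL really goes where Yuri expects: pg_xlog can be a symlink to another filesystem, in which case df on /data/postgres says nothing about the space available to the WAL. A small POSIX sketch (hypothetical name xlog_location(), assuming the 9.2 directory name pg_xlog) that reports whether $PGDATA/pg_xlog is a symlink and where it resolves to:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: where does <pgdata>/pg_xlog really live?
 * Returns 1 if pg_xlog is a symlink (WAL is on whatever filesystem it
 * points to), 0 if it is a plain directory, -1 on error. The resolved
 * absolute path is stored in `resolved` (at least PATH_MAX bytes).
 */
static int xlog_location(const char *pgdata, char *resolved)
{
    char path[PATH_MAX];
    struct stat st;
    int n;

    assert(pgdata != NULL && resolved != NULL);
    n = snprintf(path, sizeof path, "%s/pg_xlog", pgdata);
    if (n < 0 || (size_t) n >= sizeof path)
        return -1;                      /* path did not fit */
    if (lstat(path, &st) != 0)          /* lstat() does not follow symlinks */
        return -1;
    if (realpath(path, resolved) == NULL)
        return -1;                      /* e.g. a dangling symlink */
    return S_ISLNK(st.st_mode) ? 1 : 0;
}
```

The resolved path is the one to hand to a df-style check, since that is the filesystem the WAL writer actually writes to.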

Lou Picciano

----- Original Message -----
From: Yuri Levinsky
To: pgsql-bugs@postgresql.org
Sent: Tue, 25 Jun 2013 12:23:00 -0000 (UTC)
Subject: [BUGS] Postgres crash? could not write to log file: No space left on device

Yuri Levinsky

yuril@celltick.com

almost 13 years ago

In reply to: Lou Picciano (#4)

Re: Postgres crash? could not write to log file: No space left on device

Hi,

I did all that before raising the question: no ZFS, Solaris 10 64-bit; I personally created 2G files by hand with no limitations; the log location is correct and the individual files' timestamps are current as of today. It doesn't always happen, only 1-3 times a day. I inspected my config file and didn't see any destination other than /data/postgres. Do I have to make any specific setting to restrict it to /data/postgres?

Sincerely yours,

Yuri Levinsky, DBA

From: Lou Picciano [mailto:loupicciano@comcast.net]
Sent: Tuesday, June 25, 2013 5:34 PM
To: Yuri Levinsky
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] Postgres crash? could not write to log file: No space left on device

This mail was received via Mail-SeCure System.

bricklen

bricklen@gmail.com

almost 13 years ago

In reply to: Yuri Levinsky (#5)

Re: Postgres crash? could not write to log file: No space left on device

On Tue, Jun 25, 2013 at 9:43 AM, Yuri Levinsky <yuril@celltick.com> wrote:

I inspected my config file and didn't see any destination other than
/data/postgres. Do I have to make any specific setting to restrict it to
/data/postgres?

Execute from psql:

SHOW stats_temp_directory;
 stats_temp_directory
----------------------
 pg_stat_tmp

Assuming your stats_temp_directory is pg_stat_tmp, where is that directory
located? Under $PGDATA? What is the path to $PGDATA?
If your stats_temp_directory is located on a small partition, that could be
a problem.
Where is your pg_xlog dir located?

Tom Lane

tgl@sss.pgh.pa.us

almost 13 years ago

In reply to: bricklen (#6)

Re: Postgres crash? could not write to log file: No space left on device

bricklen <bricklen@gmail.com> writes:

Assuming your stats_temp_directory is pg_stat_tmp, where is that directory
located? Under $PGDATA?
What is the path to $PGDATA?
If your stats_temp_directory is located on a small partition, that could be
a problem.
Where is your pg_xlog dir located?

The given error message is definitely complaining about being unable to
write a pg_xlog file --- stats or other temp files are not relevant.

regards, tom lane

Jeff Davis

pgsql@j-davis.com

almost 13 years ago

In reply to: Tom Lane (#2)

Re: Postgres crash? could not write to log file: No space left on device

On Tue, 2013-06-25 at 09:46 -0400, Tom Lane wrote:

"Yuri Levinsky" <yuril@celltick.com> writes:

PANIC: could not write to log file 81, segment 125 at offset 13959168,
length 1392640: No space left on device

That's definitely telling you it got ENOSPC from a write in
$PGDATA/pg_xlog.

Either that, or write() wrote less than expected but did not set errno.
It looks like we assume ENOSPC when errno is not set.
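The convention Jeff describes, sketched as a standalone helper with a hypothetical name (not PostgreSQL's actual code): the caller clears errno before write(), and if the call comes back short without setting errno, the failure is reported as ENOSPC. That is why a partial write from some other cause would still surface as "No space left on device".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical helper: decide which errno to report after
 *     errno = 0;  written = write(fd, buf, requested);
 * `observed` is errno as seen right after the call. Returns 0 if the
 * write was complete.
 */
static int write_failure_errno(ssize_t written, size_t requested, int observed)
{
    assert(written >= -1);              /* write() returns -1 or a byte count */
    if (written >= 0 && (size_t) written == requested)
        return 0;
    /*
     * A short write (fewer bytes than requested) does not set errno, so
     * `observed` is still 0 here; following the convention Jeff describes,
     * that case is reported as "no space left on device".
     */
    return observed != 0 ? observed : ENOSPC;
}
```

A caller would clear errno, call write(), and report write_failure_errno(n, len, errno) whenever it is nonzero; a 4096-byte result for an 8192-byte request then becomes ENOSPC even though nothing said the disk was full.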

Regards,
Jeff Davis

Tom Lane

tgl@sss.pgh.pa.us

almost 13 years ago

In reply to: Jeff Davis (#8)

Re: Postgres crash? could not write to log file: No space left on device

Jeff Davis <pgsql@j-davis.com> writes:

On Tue, 2013-06-25 at 09:46 -0400, Tom Lane wrote:

That's definitely telling you it got ENOSPC from a write in
$PGDATA/pg_xlog.

Either that, or write() wrote less than expected but did not set errno.

Good point. I wonder if he's using a filesystem that is capable of
reporting partial writes for other reasons, eg maybe it allows signals
to end writes early. (Though if it is, it's not apparent why such
failures would only be manifesting on the pg_xlog files and not for
anything else.)

regards, tom lane

#10

Greg Stark

almost 13 years ago

In reply to: Tom Lane (#9)

Re: Postgres crash? could not write to log file: No space left on device

On Wed, Jun 26, 2013 at 12:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

(Though if it is, it's not apparent why such
failures would only be manifesting on the pg_xlog files and not for
anything else.)

Well, data files are only ever written to in 8k chunks. Maybe these
errors are only occurring on >8k xlog records such as records with
multiple full page images. I'm not sure how much we write for other
types of files but they won't be written to as frequently as xlog or
data files and might not cause errors that are as noticeable.

--
greg

#11

Andres Freund

andres@anarazel.de

almost 13 years ago

In reply to: Greg Stark (#10)

Re: Postgres crash? could not write to log file: No space left on device

On 2013-06-26 13:14:37 +0100, Greg Stark wrote:

On Wed, Jun 26, 2013 at 12:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

(Though if it is, it's not apparent why such
failures would only be manifesting on the pg_xlog files and not for
anything else.)

Well data files are only ever written to in 8k chunks. Maybe these
errors are only occurring on >8k xlog records such as records with
multiple full page images. I'm not sure how much we write for other
types of files but they won't be written to as frequently as xlog or
data files and might not cause errors that are as noticeable.

We only write xlog in XLOG_BLCKSZ units - which is 8kb by default as
well...

Yuri, have you compiled postgres with nonstandard configure or
pg_config_manual.h settings?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#12

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 13 years ago

In reply to: Andres Freund (#11)

Re: Postgres crash? could not write to log file: No space left on device

On 26.06.2013 15:21, Andres Freund wrote:

On 2013-06-26 13:14:37 +0100, Greg Stark wrote:

On Wed, Jun 26, 2013 at 12:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

(Though if it is, it's not apparent why such
failures would only be manifesting on the pg_xlog files and not for
anything else.)

Well data files are only ever written to in 8k chunks. Maybe these
errors are only occurring on >8k xlog records such as records with
multiple full page images. I'm not sure how much we write for other
types of files but they won't be written to as frequently as xlog or
data files and might not cause errors that are as noticeable.

We only write xlog in XLOG_BLCKSZ units - which is 8kb by default as
well...

Actually, XLogWrite() writes multiple pages at once. If all wal_buffers
are dirty, it can try to write them all in one write() call.
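Yuri's original PANIC fits that description. Assuming the default XLOG_BLCKSZ of 8 kB and 16 MB segments (Andres's question about nonstandard builds is exactly about whether these hold), offset 13959168 is page 1704 of that segment and length 1392640 is 170 pages, so the failed call was a single write() of about 1.3 MB rather than one 8 kB block. The arithmetic, as a small sketch with hypothetical names:

```c
#include <assert.h>

/* Assumed defaults (not read from the server): 8 kB WAL pages, 16 MB segments. */
#define ASSUMED_XLOG_BLCKSZ   8192UL
#define ASSUMED_XLOG_SEG_SIZE (16UL * 1024 * 1024)

struct wal_write_info {
    unsigned long first_page;   /* page index within the segment file */
    unsigned long npages;       /* pages covered by the write */
    int page_aligned;           /* offset and length are whole pages */
    int within_segment;         /* write ends inside the segment file */
};

/* Describe a WAL write given the offset and length from the PANIC message. */
static struct wal_write_info describe_wal_write(unsigned long offset,
                                                unsigned long length)
{
    struct wal_write_info info;

    assert(length > 0);
    info.first_page = offset / ASSUMED_XLOG_BLCKSZ;
    info.npages = (length + ASSUMED_XLOG_BLCKSZ - 1) / ASSUMED_XLOG_BLCKSZ;
    info.page_aligned = offset % ASSUMED_XLOG_BLCKSZ == 0 &&
                        length % ASSUMED_XLOG_BLCKSZ == 0;
    info.within_segment = offset + length <= ASSUMED_XLOG_SEG_SIZE;
    return info;
}
```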

We've discussed retrying short writes before, and IIRC Tom has argued
that it shouldn't be necessary when writing to disk. Nevertheless, I
think we should retry in XLogWrite(). It can write much bigger chunks
than most write() calls, so there's more room for a short write to
happen there if it can happen at all. Secondly, it PANICs on failure, so
it would be nice to try a bit harder to avoid that.
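A minimal sketch of the retry Heikki proposes, written as a standalone helper with a hypothetical name; it is not the patch that was committed, which may differ in which errors it retries. It keeps calling write() until every byte is out, treats EINTR as retryable, and falls back to ENOSPC only when write() makes no progress and sets no errno:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write all `len` bytes, retrying after short writes
 * and after EINTR. Returns 0 on success, or -1 with errno set; if write()
 * makes no progress and sets no errno, ENOSPC is assumed.
 */
static int write_fully(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;

    assert(buf != NULL || len == 0);
    while (left > 0) {
        ssize_t n;

        errno = 0;
        n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted before any byte: retry */
            return -1;                  /* genuine error, errno already set */
        }
        if (n == 0) {
            if (errno == 0)
                errno = ENOSPC;         /* no progress and no reason given */
            return -1;
        }
        p += n;                         /* short write: advance past what got out */
        left -= (size_t) n;
    }
    return 0;
}
```

Because each short write just advances the buffer pointer, a large multi-page write that the filesystem returns in pieces still completes instead of ending in a PANIC on the first partial result.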

- Heikki

#13

Yuri Levinsky

yuril@celltick.com

almost 13 years ago

In reply to: Andres Freund (#11)

Re: Postgres crash? could not write to log file: No space left on device

Sorry for the delay; the email somehow went into my junk mail folder. I
didn't compile Postgres; we downloaded the precompiled version for SUN
Solaris 10 64-bit. The same version works fine on other installations.
On this specific installation we got a "buggy kernel" error during a
massive data load (COPY) from Informix. As a result we added the "noac"
option to the NFS mount that points to the NetApp storage. "noac" disables
NFS caching, and the problem disappeared, although before that it had
reproduced repeatedly and immediately. Unfortunately, after the site went
live we started to see DB restarts for 2 days. I disabled the pg_dump run
with the tar format, which somehow filled /var to 100% and freed it after
the failure. I also limited the temp storage per session. Unfortunately I
lost the connection to the site and am unable to check today whether it
helped.

Sincerely yours,

Yuri Levinsky, DBA

-----Original Message-----
From: Andres Freund [mailto:andres@2ndquadrant.com]
Sent: Wednesday, June 26, 2013 3:21 PM
To: Greg Stark
Cc: Tom Lane; Jeff Davis; Yuri Levinsky; pgsql-bugs@postgresql.org
Subject: Re: [BUGS] Postgres crash? could not write to log file: No space left on device

#14

Andres Freund

andres@anarazel.de

almost 13 years ago

In reply to: Heikki Linnakangas (#12)

Re: Postgres crash? could not write to log file: No space left on device

On 2013-06-26 15:40:08 +0300, Heikki Linnakangas wrote:

On 26.06.2013 15:21, Andres Freund wrote:

On 2013-06-26 13:14:37 +0100, Greg Stark wrote:

On Wed, Jun 26, 2013 at 12:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

(Though if it is, it's not apparent why such
failures would only be manifesting on the pg_xlog files and not for
anything else.)

Well data files are only ever written to in 8k chunks. Maybe these
errors are only occurring on >8k xlog records such as records with
multiple full page images. I'm not sure how much we write for other
types of files but they won't be written to as frequently as xlog or
data files and might not cause errors that are as noticeable.

We only write xlog in XLOG_BLCKSZ units - which is 8kb by default as
well...

Actually, XLogWrite() writes multiple pages at once. If all wal_buffers are
dirty, it can try to write them all in one write() call.

Oh. Misremembered that.

We've discussed retrying short writes before, and IIRC Tom has argued that
it shouldn't be necessary when writing to disk. Nevertheless, I think we
should retry in XLogWrite(). It can write much bigger chunks than most
write() calls, so there's more room for a short write to happen there if it
can happen at all. Secondly, it PANICs on failure, so it would be nice to
try a bit harder to avoid that.

At the very least we should log the number of bytes actually written if
it was a short write to make it possible to discern that case from the
direct ENOSPC response.
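Andres's suggestion amounts to giving short writes their own message. A sketch with a hypothetical formatting helper (the wording is illustrative, not PostgreSQL's actual message text): when write() returned fewer bytes than requested without setting errno, report the byte counts instead of strerror(ENOSPC).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: format the message for a failed or short write.
 * `written` is what write() returned and `err` is errno as observed right
 * after the call (the caller cleared errno beforehand). A short write
 * without errno gets the byte counts, so it can no longer be mistaken for
 * a genuine ENOSPC; anything else gets the usual strerror() text.
 */
static void format_write_failure(char *out, size_t outlen, const char *file,
                                 ssize_t written, size_t requested, int err)
{
    assert(out != NULL && file != NULL);
    if (err == 0 && written >= 0 && (size_t) written < requested)
        snprintf(out, outlen,
                 "could not write to file \"%s\": wrote only %zd of %zu bytes",
                 file, written, requested);
    else
        snprintf(out, outlen, "could not write to file \"%s\": %s",
                 file, strerror(err));
}
```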

This might also be caused by the fact that until recently the SIGALRM
handler didn't set SA_RESTART... If a backend decided to write out the
xlog directly it very well might have an active alarm...
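Andres's SA_RESTART point can be demonstrated in isolation. A POSIX sketch with hypothetical names (it does not reproduce PostgreSQL's own signal setup): with a SIGALRM handler installed without SA_RESTART, a blocking read() that is interrupted by the alarm fails with EINTR instead of being restarted, which is the scenario Andres describes for a backend writing out WAL while an alarm is pending.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* The handler does nothing; its only effect is to interrupt blocking calls. */
static void on_alarm(int signo)
{
    (void) signo;
}

/* Install the SIGALRM handler, with or without SA_RESTART. Returns 0 on success. */
static int install_alarm_handler(int restart)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = restart ? SA_RESTART : 0;
    return sigaction(SIGALRM, &sa, NULL);
}

/* Report whether the currently installed SIGALRM handler uses SA_RESTART. */
static int alarm_handler_restarts(void)
{
    struct sigaction cur;
    int rc = sigaction(SIGALRM, NULL, &cur);

    assert(rc == 0);
    return (cur.sa_flags & SA_RESTART) != 0;
}

/*
 * Block in read() on an empty pipe with a 1-second alarm pending and the
 * handler installed WITHOUT SA_RESTART. Returns the errno read() failed
 * with (EINTR is expected), 0 if it did not fail, or -1 on setup failure.
 * With SA_RESTART the read would simply resume and keep blocking on the
 * empty pipe, which is why only this case is demonstrated.
 */
static int interrupted_read_errno(void)
{
    int fds[2];
    char c;
    int result = 0;

    if (install_alarm_handler(0) != 0 || pipe(fds) != 0)
        return -1;
    alarm(1);
    if (read(fds[0], &c, 1) < 0)
        result = errno;
    alarm(0);                           /* cancel any alarm still pending */
    close(fds[0]);
    close(fds[1]);
    return result;
}
```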

Greetings,

Andres Freund

#15

Tom Lane

tgl@sss.pgh.pa.us

almost 13 years ago

In reply to: Heikki Linnakangas (#12)

Re: Postgres crash? could not write to log file: No space left on device

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

We've discussed retrying short writes before, and IIRC Tom has argued
that it shouldn't be necessary when writing to disk. Nevertheless, I
think we should retry in XLogWrite(). It can write much bigger chunks
than most write() calls, so there's more room for a short write to
happen there if it can happen at all. Secondly, it PANICs on failure, so
it would be nice to try a bit harder to avoid that.

Seems reasonable. My concern about the idea in general was the
impossibility of being sure we'd protected every single write() call.
But if we can identify specific call sites that seem at more risk than
most, I'm okay with adding extra logic there.

regards, tom lane

#16

Yuri Levinsky

yuril@celltick.com

almost 13 years ago

In reply to: Andres Freund (#14)

Re: Postgres crash? could not write to log file: No space left on device

Dear All,
I managed to find a Postgres note where a similar issue was related to
the NFS mount option 'intr'. It recommended using 'hard,nointr' on
Solaris NFS, or not using NFS at all. The default on SUN is 'intr', and
on Linux it is 'nointr' even when not specified. Surprisingly, the issue
doesn't happen on any of my Linux installations. According to that very
long note, the issue happens under heavy load, when PostgreSQL gets no
answer for some time. By the way, adding the 'noac' mount option prevented
the "buggy kernel" error, which I had successfully reproduced and which was
also followed by a PostgreSQL crash on this particular system. All this
leads me to conclude that something is fundamentally wrong: either these
mount options should be documented somewhere as recommended for NFS, or
the log/bg writers should somehow take them into account.
Can this be confirmed as a bug? Can it be fixed in the near future?

2013-07-02 01:37:58 GMTXX000PANIC: could not write to file
"pg_xlog/xlogtemp.24205": Interrupted system call
2013-07-02 01:40:52 GMT00000LOG: WAL writer process (PID 24205) was
terminated by signal 6
2013-07-02 01:40:52 GMT00000LOG: terminating any other active server
processes
2013-07-02 01:40:52 GMT57P03FATAL: the database system is in recovery
mode

2013-07-02 02:17:41 GMT57P03FATAL: the database system is in recovery
mode
2013-07-02 02:17:41 GMT00000LOG: autovacuum launcher started
2013-07-02 02:17:41 GMT00000LOG: database system is ready to accept
connections
2013-07-02 02:44:50 GMTXX000PANIC: could not write to file
"pg_xlog/xlogtemp.14855": Interrupted system call
2013-07-02 02:48:02 GMT00000LOG: WAL writer process (PID 14855) was
terminated by signal 6
2013-07-02 02:48:02 GMT00000LOG: terminating any other active server
processes
2013-07-02 02:48:02 GMT57P03FATAL: the database system is in recovery
mode

2013-07-02 04:15:49 GMTXX000PANIC: could not open file
"pg_xlog/00000001000000B9000000C9" (log file 185, segment 201):
Interrupted system call
2013-07-02 04:18:55 GMT00000LOG: WAL writer process (PID 2296) was
terminated by signal 6
2013-07-02 04:18:55 GMT00000LOG: terminating any other active server
processes
2013-07-02 04:18:55 GMT57P03FATAL: the database system is in recovery
mode

Sincerely yours,

Yuri Levinsky, DBA

-----Original Message-----
From: Andres Freund [mailto:andres@2ndquadrant.com]
Sent: Wednesday, June 26, 2013 4:04 PM
To: Heikki Linnakangas
Cc: Greg Stark; Tom Lane; Jeff Davis; Yuri Levinsky; pgsql-bugs@postgresql.org
Subject: Re: [BUGS] Postgres crash? could not write to log file: No space left on device

#17

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 13 years ago

In reply to: Tom Lane (#15)

Re: Postgres crash? could not write to log file: No space left on device

On 26.06.2013 17:15, Tom Lane wrote:

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

We've discussed retrying short writes before, and IIRC Tom has argued
that it shouldn't be necessary when writing to disk. Nevertheless, I
think we should retry in XLogWrite(). It can write much bigger chunks
than most write() calls, so there's more room for a short write to
happen there if it can happen at all. Secondly, it PANICs on failure, so
it would be nice to try a bit harder to avoid that.

Seems reasonable. My concern about the idea in general was the
impossibility of being sure we'd protected every single write() call.
But if we can identify specific call sites that seem at more risk than
most, I'm okay with adding extra logic there.

Committed a patch to add a retry loop to XLogWrite().

I noticed that FileWrite() has some additional Windows-specific code to
also retry on an ERROR_NO_SYSTEM_RESOURCES error. That's a bit scary,
because we don't check for that in any other write() calls in the
backend. If we really need to be prepared for that on Windows, I think
that would need to be in a wrapper function in src/port or src/backend/port.

Would a Windows-person like to comment on that?

- Heikki
