stale WAL files?
PG10.7, CentOS 7
On Mar 15 we filled our default tablespace/WAL partition. We cleaned up some old dumps and restarted. pg_wal had apparently exploded, but it cleaned itself up by the next day. On Mar 16 I ran CHECKPOINT in all databases on that server (except template0). All seems fine except for 271 WAL files from Mar 16 going nowhere fast. Of course we see transient WAL files come and go every day. We restarted the server last Friday (Mar 22) without issue.
-rw-------. 1 postgres postgres 16777216 Mar 16 16:32 0000000100000CE9000000DD
...(269 others)
-rw-------. 1 postgres postgres 16777216 Mar 16 17:01 0000000100000CEA000000E9
What’s keeping these alive? Or can they be deleted? Is any normal admin routine likely to clean these up?
One of our crew was apparently trying to create a function: he had a runaway JSON query which caused memory grief, but I don’t see that reflected anywhere in the log. The log file for that day goes south as follows.
[1] 2019-03-15 12:09:15.230 MDT [29189] STATEMENT: select count(*) from sui.probancset_group_member where group_id = '4ce6a94d-bb2d-43c2-a1f9-7b68f1618cd4';
[2] 2019-03-15 17:07:30.818 MDT [35020] ERROR: could not write to hash-join temporary file: No space left on device
[3] 2019-03-15 17:07:30.818 MDT [35020] STATEMENT: update segment as s set events_less = s.events_less + b.events_less, events_equal = s.events_equal + b.events_equal, events_greater = s.events_greater + b.events_greater, threshold_events = s.threshold_events + b.threshold_events from bulk."bc_1819_17_fd6dbc1e_57e5_4d49_b896_59d6687c8ee6" as b where s.markerset_id = '8d723d2f-1281-48c5-9016-2dab3f4d242b' and s.probandset_id = b.probandset_id and s.markerset_id = b.markerset_id and s.startbase = b.startbase and s.endbase = b.endbase and s.probandset_id >= '90000000-0000-0000-0000-000000000000' and s.probandset_id < 'a0000000-0000-0000-0000-000000000000'
2019-03-15 17:07:30.818 MDT [317161] PANIC: could not write to file "pg_wal/xlogtemp.317161": No space left on device
[4] 2019-03-15 17:08:19.231 MDT [35166] ERROR: unterminated dollar-quoted string at or near "$$begin
--new probandset_group record
insert into probandset_group(id,name) select uuid_generate_v4(),'1808_p3c2n4';
--create necessary new probandset and probandset_group_member records
select addprobandset(a.name, people, groupname) from
(select powerset((select array_agg(name order by name) lst from base.person
where name in (superset))) as name
except (select regexp_split_to_array(p.name,',') from probandset p )) a
where array_length(a.name,1)>1;
--update superset field for new probandset_group
update probandset_group set proband_superset_id = (select id from probandset where name = superset);
end;
[1] Last reported normal operation
[2] death knell
[3] First failed operation from our application
[4] continued effort from create function work
Thanks in advance.
On 3/25/19 5:10 PM, Rob Sargent wrote:
What’s keeping these alive? Or can they be deleted? Any normal admin routine likely to clean these up?
wal_keep_segments?
Do you have replication set up and replication slots in use?
--
Adrian Klaver
adrian.klaver@aklaver.com
On Mar 26, 2019, at 8:08 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
wal_keep_segments?
Commented out in postgresql.conf.
Do you have replication set up and replication slots in use?
No, sorry I should have said that up front. We’re simple folk.
On Tue, Mar 26, 2019 at 09:50:37AM -0600, Rob Sargent wrote:
No, sorry I should have said that up front. We’re simple folk.
What is the WAL position (LSN) where Postgres is writing to and what
is the set of WAL segments in pg_wal (or pg_xlog if that's a server
older than 10)? Please double-check the configuration value of
wal_keep_segments, and as mentioned upthread, could you make sure that
you have no replication slots active? This can be done simply by
querying pg_replication_slots. Please note as well that checkpoints
are server-wide, so there is no point to run them on all databases.
Only one command will be effective for all databases.
--
Michael
On 3/28/19 7:30 AM, Michael Paquier wrote:
On Tue, Mar 26, 2019 at 09:50:37AM -0600, Rob Sargent wrote:
No, sorry I should have said that up front. We’re simple folk.
What is the WAL position (LSN)
postgres=# select * from pg_current_wal_flush_lsn();
pg_current_wal_flush_lsn
--------------------------
CEA/E57EAA8
(1 row)
postgres=# select * from pg_current_wal_insert_lsn();
pg_current_wal_insert_lsn
---------------------------
CEA/E57EAA8
(1 row)
postgres=# select * from pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
CEA/E57EAA8
(1 row)
where Postgres is writing to and what
is the set of WAL segments in pg_wal (or pg_xlog if that's a server
older than 10)?
select version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 10.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit
(1 row)
This is pg10 so it's pg_wal. ls -ltr:
-rw-------. 1 postgres postgres 16777216 Mar 16 16:33 0000000100000CEA000000B1
-rw-------. 1 postgres postgres 16777216 Mar 16 16:33 0000000100000CEA000000B2
... 217 more on through to ...
-rw-------. 1 postgres postgres 16777216 Mar 16 17:01 0000000100000CEA000000E8
-rw-------. 1 postgres postgres 16777216 Mar 16 17:01 0000000100000CEA000000E9
-rw-------. 1 postgres postgres 16777216 Mar 28 09:46 0000000100000CEA0000000E
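As a side check, an LSN such as the CEA/E57EAA8 reported above can be mapped to the name of the WAL segment file that contains it. A minimal sketch of the standard segment-naming arithmetic, assuming the default 16MB segment size and timeline 1 (both hold for this cluster):

```python
# Map a pg_lsn string like 'CEA/E57EAA8' to its WAL segment file name.
# Assumes the default 16MB segment size and timeline 1.
WAL_SEG_SIZE = 16 * 1024 * 1024                 # bytes per segment
SEGS_PER_XLOGID = 0x100000000 // WAL_SEG_SIZE   # 256 segments per "log" unit

def lsn_to_walfile(lsn: str, timeline: int = 1) -> str:
    hi, lo = (int(part, 16) for part in lsn.split("/"))
    segno = ((hi << 32) | lo) // WAL_SEG_SIZE
    # The file name is three 8-hex-digit fields: timeline, log, segment.
    return f"{timeline:08X}{segno // SEGS_PER_XLOGID:08X}{segno % SEGS_PER_XLOGID:08X}"

print(lsn_to_walfile("CEA/E57EAA8"))  # → 0000000100000CEA0000000E
```

That is exactly the newest file in the listing. Notice that the stranded Mar 16 files carry higher segment numbers (B1 through E9) than the current 0E, i.e. they sit ahead of the insert position, which is what one would expect of segments recycled for future use.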
Please double-check the configuration value of
wal_keep_segments,
#wal_keep_segments = 0 # in logfile segments, 16MB each
and as mentioned upthread, could you make sure that
you have no replication slots active?
This could be part of the problem?
#max_replication_slots = 10 # max number of replication slots
but
select * from pg_replication_slots;
 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
(0 rows)
This can be done simply by
querying pg_replication_slots. Please note as well that checkpoints
are server-wide, so there is no point to run them on all databases.
Only one command will be effective for all databases.
I suspected as much, but there weren't many dbs so I went all in.
On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
This is pg10 so it's pg_wal. ls -ltr:
-rw-------. 1 postgres postgres 16777216 Mar 16 16:33 0000000100000CEA000000B1
... 217 more on through to ...
-rw-------. 1 postgres postgres 16777216 Mar 28 09:46 0000000100000CEA0000000E
In Postgres 10 and older versions, the server keeps WAL segments back to the checkpoint before the last completed one. So even if a checkpoint is issued, the current WAL insert point is never really going to be on the first segment in pg_wal. Isn't that the origin of what you think is a problem? So, say, if you issue a checkpoint again, don't you see 0000000100000CEA000000B1 going away? In Postgres 11, WAL segments worth only one checkpoint are kept around.
On Mar 29, 2019, at 6:58 AM, Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
...
In Postgres 10 and older versions, the server keeps WAL segments back to the checkpoint before the last completed one. So, say, if you issue a checkpoint again, don't you see 0000000100000CEA000000B1 going away? In Postgres 11, WAL segments worth only one checkpoint are kept around.
I’m on the road all through the weekend with limited connectivity. But recall that I’m getting new, transient WAL files.
On Mar 29, 2019, at 6:58 AM, Michael Paquier <michael@paquier.xyz> wrote:
...
So, say, if you issue a checkpoint again, don't you see 0000000100000CEA000000B1 going away?
I had CEA000000015, generated today, as the only entry since Mar 16. Issued checkpoint; soon had CEA0000016 and 15. Five minutes later I still have 15 and 16 (and of course all 271 from Mar 16).
Today there are 210 Mar 16 WAL files versus the originally reported 271. I cannot at this point confirm the original count, other than to say I used “ls -ltr | grep ‘Mar 16’ | wc -l” to get both numbers.
Is it interesting that the earliest files (by ls timestamp) are not lowest numerically? Those two file names, ending “0000B[12]” (written at 16:33), sit in a range across the directory from “00001A” through “0000E9”; “0000B0” was written at 16:53 and “0000B3” at 16:54.
Is there any analysis of the file names I could do which might shed any light on the issue?
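One bit of analysis that is easy to do: each 24-hex-digit name splits into three 8-digit fields (timeline, log, segment), so the files can be ordered numerically and compared against their mtimes. A small sketch using names from the listings above (the helper is mine, not from any PostgreSQL tool):

```python
# Decode a WAL segment file name into (timeline, log, segment) integers so
# files can be sorted numerically; mtime order can differ because recycled
# segments are renamed in advance of being written.
def decode_walfile(name: str):
    return tuple(int(name[i:i + 8], 16) for i in (0, 8, 16))

names = ["0000000100000CEA000000E9",
         "0000000100000CEA000000B1",
         "0000000100000CEA0000000E"]
for n in sorted(names, key=decode_walfile):
    print(n, decode_walfile(n))
```

Sorting this way makes the numeric order explicit regardless of when each file was last touched.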
I’m now down to 208 Mar 16 WAL files, so they are being processed (at least deleted). I’ve taken a snapshot of the pg_wal dir such that I can see which files get processed. It’s none of the files I’ve listed previously.
Two more have been cleaned up: 001C and 001D, generated at 16:38 Mar 16.
On Sat, Mar 30, 2019 at 5:03 PM Gmail <robjsargent@gmail.com> wrote:
Two more have been cleaned up. 001C and 001D generated at 16:38 Mar 16
Please share your complete postgresql.conf file and the results from this query:
SELECT * FROM pg_settings;
Has someone in the past configured WAL archiving?
You've run out of disk space, as this log message you shared states:
No space left on device
What's the output of df -h?
--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/
On Mon, Apr 1, 2019 at 6:30 PM Rene Romero Benavides <rene.romero.b@gmail.com> wrote:
BTW, how spread apart are the checkpoints? Do you have stats on that? Maybe they're too spread apart, and that's why WAL files cannot be recycled rapidly enough.
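One way to get such stats, if log_checkpoints = on in postgresql.conf, is to diff the timestamps of the "checkpoint starting" lines in the server log. A rough sketch; the sample log lines below are invented for illustration, and only the timestamp prefix and the "checkpoint starting" marker are assumed:

```python
from datetime import datetime

# Sample server-log lines (invented for illustration); real ones appear
# when log_checkpoints = on.
log_lines = [
    "2019-03-28 09:00:01.123 MDT [1234] LOG:  checkpoint starting: time",
    "2019-03-28 09:05:01.456 MDT [1234] LOG:  checkpoint starting: time",
    "2019-03-28 09:12:44.789 MDT [1234] LOG:  checkpoint starting: xlog",
]

def checkpoint_gaps(lines):
    # Parse the leading "YYYY-MM-DD HH:MM:SS" of each checkpoint-start line.
    ts = [datetime.strptime(l[:19], "%Y-%m-%d %H:%M:%S")
          for l in lines if "checkpoint starting" in l]
    return [(b - a).total_seconds() for a, b in zip(ts, ts[1:])]

print(checkpoint_gaps(log_lines))  # seconds between successive checkpoints
```

Gaps much larger than checkpoint_timeout would point at checkpoints being driven by WAL volume rather than by the timer.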
I’m fighting a nasty cold. It may take a day or two for me to respond to recent questions and suggestions.
Sorry folks, I’m still home nursing a nasty chest cold and my only tool today is an iPad.
I have failed to get postgresql.conf into the copy buffer, so that, along with the results of pg_settings, will have to wait for another day.
Today there are “only” 135 Mar 16 WAL files. I haven’t sorted out which have been cleaned up but can do so if that’s thought to be helpful.
There is still 2.2G in the pg_wal directory, but that disk has ~360G left. (I believe the burst of WAL files was the result of a novice using LIMIT with a to-JSON function; the target table has >100M rows.)
Given that current WALs come and go regularly, I think the CHECKPOINT is running frequently enough (for normal loads at least).
I think my attempt to send postgresql.conf and the pg_settings report ran afoul of the mailer. Here they are again, by attachment.
Two attempts (one inline, one with attachment) at sending postgresql.conf and the pg_settings report have been sent to a moderator.
On Wed, Apr 3, 2019 at 1:05 PM Rob Sargent <robjsargent@gmail.com> wrote:
two attempts (one inline, one with attachment) at sending postgresql.conf and the pg_settings report have been sent to a moderator.
As per your configuration:
max_wal_size = 50GB
This seems to be the cause of the WAL files piling up. It has been declared twice; the last occurrence takes effect.
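The arithmetic behind that observation, for what it's worth: with the default 16MB segment size, 50GB corresponds to 3200 segments, so a few hundred leftover Mar 16 segments (~4GB) are well inside what the server is permitted to keep before it must recycle aggressively. A back-of-the-envelope sketch:

```python
# Back-of-the-envelope: how many default-size (16MB) segments fit inside a
# max_wal_size of 50GB, versus the space held by the 271 stale files.
max_wal_size_mb = 50 * 1024   # 50GB in MB
segment_mb = 16               # default WAL segment size in MB

segments_allowed = max_wal_size_mb // segment_mb
stale_gb = 271 * segment_mb / 1024

print(segments_allowed)  # → 3200
print(stale_gb)          # → 4.234375 (GB held by the 271 Mar 16 segments)
```

Nothing in those numbers forces the server to remove the stale files; they are far below the cap.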
On Apr 3, 2019, at 5:23 PM, Rene Romero Benavides <rene.romero.b@gmail.com> wrote:
As per your configuration:
max_wal_size = 50GB
this seems to be the cause for the WAL files piling up; it has been declared twice and the last one is taking effect.
That’s an interesting catch, thank you. I’ll have that reverted to the default.
Note that the WAL files are all the default 16MB, however.
Currently we’re down to 88 Mar 16 WAL files. My inclination is to wait this out, to see whether all of Mar 16 goes away quietly, then reset our backups.
I've managed to generate another 359 WAL files in a 10-minute span yesterday (now only 357 remain, and I suspect they will wither away as before). Are these being held simply because of the high max_wal_size value?
This is a development environment, wherein I'm loading 4M+ records, first into 41 staging tables at 100K rows per table. In a loop over each staging table, the data is then placed into application tables via selects: first select * into the matching table, then select id into an intersection record (id, fixed groupId). Each such iteration is in its own transaction. I have dropped and recreated this same database numerous times, working my way up from 100K to 4M records, dialing in application parameters according to the number of primary records. I have not, however, dropped the last incarnation.