hidden junk files in ...data/base/oid/
Hello,
I ran upgrade from 9.1 to 9.3 on CentOS using pg_upgrade and database
cluster size dropped from 77GB to 4.3GB. I wonder how this could happen. No
data lost. This means I had about 70GB junk files in my database...
I checked top 10 biggest database tables: nearly same size on old/new
cluster, the biggest table about 2GB, next 1GB, all other tables smaller
and smaller..
I checked files of biggest database (/var/lib/pgsql/9.1/data/base/27610):
# du -sm * |awk '{s+=$1} END {print s}'
7107
# du -sm . |sort -n
75264 .
So size of non-hidden database files is 1/10th of the whole database size.
Question: Where does the 70GB hidden files in ...data/base/oid/ come
from? Which postgres process could generate them? Some missed
maintenance from my side? A bug? Anybody else experienced such issue?
Thanks, Andrej
---------details: ...data/base/oid file listing shortened:
# ls -la
total 77069960
drwx------. 2 postgres postgres 32768 May 12 18:50 .
drwxr-xr-x. 5 postgres postgres 4096 May 12 11:47 ..
-rw-------. 1 postgres postgres 354156544 Feb 25 12:06 .27623.17rLmT
-rw-------. 1 postgres postgres 338952192 Feb 25 14:21 .27623.6dH1b6
-rw-------. 1 postgres postgres 5767168 Mar 7 16:00 ..27623.6dH1b6.6PrU4B
-rw-------. 1 postgres postgres 411041792 Feb 25 15:07 .27623.aN42DG
-rw-------. 1 postgres postgres 342884352 Mar 8 15:16 ..27623.aN42DG.0U5xfj
-rw-------. 1 postgres postgres 343146496 Mar 8 13:13 ..27623.aN42DG.2WFmNo
-rw-------. 1 postgres postgres 343408640 Mar 8 10:43 ..27623.aN42DG.384SXU
-rw-------. 1 postgres postgres 357302272 Mar 8 05:26 ..27623.aN42DG.3hHjZ8
-rw-------. 1 postgres postgres 360185856 Mar 7 18:19 ..27623.aN42DG.5lWta4
-rw-------. 1 postgres postgres 343146496 Mar 8 10:12 ..27623.aN42DG.64lNVQ
...shortened...
-rw-------. 1 postgres postgres 1005322240 Feb 25 15:38 .27731.2.JKYXGW
-rw-------. 1 postgres postgres 359661568 Mar 9 14:52 ..27731.2.JKYXGW.3h8RuF
-rw-------. 1 postgres postgres 331087872 Mar 9 07:37 ..27731.2.JKYXGW.3hK5aF
-rw-------. 1 postgres postgres 359923712 Mar 9 09:29 ..27731.2.JKYXGW.3KA5Cq
-rw-------. 1 postgres postgres 359923712 Mar 9 16:55 ..27731.2.JKYXGW.45nQei
-rw-------. 1 postgres postgres 137363456 Mar 9 04:47 ..27731.2.JKYXGW.4zya2Z
...shortened...
-rw-------. 1 postgres postgres 769916928 Feb 25 15:53 .27902.YboxvS
-rw-------. 1 postgres postgres 671612928 Feb 20 10:01 .27902.YMEtoS
-rw-------. 1 postgres postgres 159645696 Feb 25 16:24 .59866.Lkyxgs
-rw-------. 1 postgres postgres 272629760 Feb 20 18:37 .59866.RTcUkC
-rw-------. 1 postgres postgres 505151488 Feb 25 16:40 .59961.5BcZpK
-rw-------. 1 postgres postgres 91750400 Feb 25 16:55 .60194.gUqSdJ
-rw-------. 1 postgres postgres 8192 Apr 7 05:20 60592
-rw-------. 1 postgres postgres 8192 Jan 31 13:03 60594
-rw-------. 1 postgres postgres 8192 Apr 7 02:01 60596
-rw-------. 1 postgres postgres 8192 Feb 28 14:44 60598
-rw-------. 1 postgres postgres 8192 Apr 7 11:55 60600
...shortened...
-rw-------. 1 postgres postgres 139264 May 12 12:08 702364
-rw-------. 1 postgres postgres 24576 May 9 12:42 702364_fsm
-rw-------. 1 postgres postgres 8192 May 9 12:40 702364_vm
-rw-------. 1 postgres postgres 0 May 9 10:10 702369
-rw-------. 1 postgres postgres 860160 May 12 12:08 702372
-rw-------. 1 postgres postgres 24576 May 9 12:37 702372_fsm
-rw-------. 1 postgres postgres 8192 May 9 12:42 702372_vm
-rw-------. 1 postgres postgres 8192 May 9 10:10 702377
-rw-------. 1 postgres postgres 499712 May 12 12:08 702381
...shortened...
-rw-------. 1 postgres postgres 16384 May 9 14:34 704207
-rw-------. 1 postgres postgres 16384 May 9 14:34 704208
-rw-------. 1 postgres postgres 8192 May 9 14:34 704209
-rw-------. 1 postgres postgres 16384 May 9 14:34 704210
-rw-------. 1 postgres postgres 16384 May 9 14:34 704211
-rw-------. 1 postgres postgres 512 May 9 14:34 pg_filenode.map
-rw-------. 1 postgres postgres 106804 May 12 18:50 pg_internal.init
-rw-------. 1 postgres postgres 4 Jan 28 13:52 PG_VERSION
Hello,
solved.
This is not a postgres issue.
The system was used in HA-cluster with streaming replications.
The hidden files I asked for were created probably by broken (killed)
rsync. It uses such file-format for temporary files used during copying.
This rsync is used by master to slave database synchronization (full
on-line backup of master database to slave node) before starting postgres
in hot-standby mode on slave the node...
Best Regards, Andrej
Import Notes
Resolved by subject fallback
Andrej Vanek wrote:
Hello,
solved.
This is not a postgres issue.The system was used in HA-cluster with streaming replications.
The hidden files I asked for were created probably by broken (killed)
rsync. It uses such file-format for temporary files used during copying.This rsync is used by master to slave database synchronization (full
on-line backup of master database to slave node) before starting postgres
in hot-standby mode on slave the node...
You not only have leftover first-order rsync temp files (.NNNNN.uvwxyz)
-- but also when those temp files were being copied over by another
rsync run, which created temp files for the first-order temp files,
leaving you with second-order temp files (..NNNNN.uvwxyz.opqrst). Not
nice. I wonder if this is anywhere near sanity -- it looks like you're
copying stuff from one direction first, then failed over, then copied in
the opposite direction. I would have your setup reviewed real closely,
to avoid data-corrupting configuration mistakes. I have seen people
make subtle mistakes in their configuration, causing their whole HA
setups to be completely broken.
--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Hello,
thanks for your answer.
I've identified problems in my cluster agent script. It is a custom written
script with built in automated recovery of failed slave. It was written in
time when postgres 9.1 streaming replications feature was just in beta
release and there was no postgres agent for streaming replications
available out there.
The problem was that the failed slave recovery was hardcoded into start
operation. But this start operation was aborted by pacemaker due to startup
operation timeout. This occured before having finished backup from master
to failed slave (in case of bigger database). This is the point where rsync
could be aborted and left over temporary junk files. There was no cleanup
before re-running the backup from master (using rsync). This may be the
reason why there may be left rsync temporary files.
Second problem identified is what you write: copying stuff from one
direction first, then failed over, then copied in the opposite direction.
This was caused because my agent was missing the lock file that standard
clusterlabs pgsql agent uses to avoid starting failed master in case of
double failure followed by reboot.
Now I'm migrating to the standard pacemaker's postgres cluster agent
provided by clusterlabs.org to avoid such issues. It is surely much better
tested by plenty of installations worldwide with community feedback.
In addition I need to automate single (master or slave) failure recovery as
much as possible. For this purpose I plan to introduce a new resource on
top of pgsql resource which would recover failed pgsql slave(or master) in
case master is active on another node (I use only two node cluster). Manual
recovery by operator would be needed for cases when postgres on both nodes
is down to avoid accidental data loss.
Do you know whether there is such cluster agent already available?
Best Regards, Andrej
2014-05-27 16:09 GMT+02:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Show quoted text
Andrej Vanek wrote:
Hello,
solved.
This is not a postgres issue.The system was used in HA-cluster with streaming replications.
The hidden files I asked for were created probably by broken (killed)
rsync. It uses such file-format for temporary files used during copying.This rsync is used by master to slave database synchronization (full
on-line backup of master database to slave node) before starting postgres
in hot-standby mode on slave the node...You not only have leftover first-order rsync temp files (.NNNNN.uvwxyz)
-- but also when those temp files were being copied over by another
rsync run, which created temp files for the first-order temp files,
leaving you with second-order temp files (..NNNNN.uvwxyz.opqrst). Not
nice. I wonder if this is anywhere near sanity -- it looks like you're
copying stuff from one direction first, then failed over, then copied in
the opposite direction. I would have your setup reviewed real closely,
to avoid data-corrupting configuration mistakes. I have seen people
make subtle mistakes in their configuration, causing their whole HA
setups to be completely broken.--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services