bloated postgres data folder, clean up

Started by Rémi Curaabout 10 years ago10 messagesgeneral
Jump to latest
#1Rémi Cura
remi.cura@gmail.com

Hey dear list,
after a fex years of experiments and crash,
I ended up with a grossly bloated postgres folder.
I had about 8 Go of useless files.
All is in a virtualbox, so I'm sure to be able to reproduce exactly, and
fried my postgres folder a couple of time before getting it right.

Julien (Rouhaud) helped me to find those useless files via SQL.
The idea is to list files in postgres directory with `pg_ls_dir`, then to
check that the dir name correspond to something useful (using
pg_relation_filenode).
------------------
https://gist.github.com/Remi-C/926eaee04d61a7245eb8
------------------

To be sure I export the found files list,
then use oid2name to check that no file is recognized.

files can then be deleted (using plpythonu in my case).

So far a vacuum full analyze raise no errors.

Warning : for this to work, the SQL query must be sent while connected to
the database to clean.

Hope this may be useful
Cheers,
Rémi-C

#2Peter Devoy
peter@3xe.co.uk
In reply to: Rémi Cura (#1)
Re: bloated postgres data folder, clean up

Hope this may be useful

Thanks for sharing!

Peter

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Rémi Cura (#1)
Re: bloated postgres data folder, clean up

R�mi Cura wrote:

Hey dear list,
after a fex years of experiments and crash,
I ended up with a grossly bloated postgres folder.
I had about 8 Go of useless files.

Would you add a new page to the wiki with this?

https://wiki.postgresql.org/wiki/Category:Administrative_Snippets

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Rémi Cura
remi.cura@gmail.com
In reply to: Alvaro Herrera (#3)
Re: bloated postgres data folder, clean up

Would gladly do it,
but still this "wiki cooloff" stuff,
can't create a page
Cheers,
Rémi-C

2016-02-29 20:44 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:

Show quoted text

Rémi Cura wrote:

Hey dear list,
after a fex years of experiments and crash,
I ended up with a grossly bloated postgres folder.
I had about 8 Go of useless files.

Would you add a new page to the wiki with this?

https://wiki.postgresql.org/wiki/Category:Administrative_Snippets

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5Johnny Morano
johnny.morano@payon.com
In reply to: Rémi Cura (#4)
Re: bloated postgres data folder, clean up

Hi Remi!

This SQL function you have provided, seems to return all valid files, is that correct? In my case, it returned all my ‘base/’ files. Is that normal?
If yes, maybe you rename the function to ‘find_useful_postgres_files’ ;-)

Could you explain in steps how to use this function to make a cleanup of bloated data? (like in an example with commands and example output, if possible of course)

Thanks!

Mit besten Grüßen / With best regards,
Johnny Morano
____________________________________________________

Johnny Morano | Principal Systems Engineer

PAY.ON GmbH | AN ACI WORLDWIDE COMPANY | WWW.PAYON.COM<http://www.payon.com/&gt;
Jakob-Haringer-Str. 1 | 5020 Salzburg | Austria
Registered at: LG Salzburg | Company number: FN 315081 f | VAT-ID: ATU64439405
Managing Director: Christian Bamberger

Follow us on:

[cid:image001.jpg@01D126D0.E1AB0670]<http://blog.payon.com/&gt; [cid:image002.jpg@01D126D0.E1AB0670] <http://www.linkedin.com/company/146260?trk=tyah&gt; [cid:image003.jpg@01D126D0.E1AB0670] <https://twitter.com/PAYON_com&gt;

This email message and any attachments may contain confidential, proprietary or non-public information. This information is intended solely for the designated recipient(s). If an addressing or transmission error has misdirected this email, please notify the sender immediately and destroy this email. Any review, dissemination, use or reliance upon this information by unintended recipients is prohibited. Any opinions expressed in this email are those of the author personally.

This message and any attachments have been scanned for viruses prior leaving PAY.ON; however, PAY.ON does not guarantee the security of this message and will not be responsible for any damages arising as a result of any virus being passed on or arising from any alteration of this message by a third party. PAY.ON may monitor e-mails sent to and from PAY.ON.

From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Rémi Cura
Sent: Mittwoch, 2. März 2016 14:58
To: Alvaro Herrera
Cc: PostgreSQL General
Subject: Re: [GENERAL] bloated postgres data folder, clean up

Would gladly do it,
but still this "wiki cooloff" stuff,
can't create a page
Cheers,
Rémi-C

2016-02-29 20:44 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com<mailto:alvherre@2ndquadrant.com>>:
Rémi Cura wrote:

Hey dear list,
after a fex years of experiments and crash,
I ended up with a grossly bloated postgres folder.
I had about 8 Go of useless files.

Would you add a new page to the wiki with this?

https://wiki.postgresql.org/wiki/Category:Administrative_Snippets

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

image001.jpgimage/jpeg; name=image001.jpgDownload
image002.jpgimage/jpeg; name=image002.jpgDownload
image003.jpgimage/jpeg; name=image003.jpgDownload
#6Rémi Cura
remi.cura@gmail.com
In reply to: Johnny Morano (#5)
Re: bloated postgres data folder, clean up

Hey,
this is quite the *opposite*.
The function find files in the postgres database folder that are not used
by the database.

To use it :
* connect to the database you want to analyse ( **mandatory** ).
* create the function (execute function definition)
* Execute `SELECT * FROM find_useless_postgres_file('your_database_name')`

This will output a list of files that are on the disk but not used by
postgres,
and so can be removed.

To be extra sure, you should use oid2name programme to check that the
useless files are really useless.

For this :
* output the list of potential useless files with copy for instance
ex :
COPY ( SELECT file_name
FROM find_useless_postgres_file('your_database_name')
) TO 'path_to_you_database_folder/potential_useless.txt'

now you've got a file with a list of potential erroneous files.

* Then use oid2name
`$su postgres
$cd path_to_you_database_folder
$while read i; do oid2name -f "$i" -i -S -q -d your_database_name; done
< potential_useless.txt
`

Nothing should show, meaning that every potential erroneous file
has not been recognized by oid2name !
If you feel unconvinced, you can manually try oid2name on some
of the potential erroneous files, to be extra sure.
It should not find anything.

* Now delete all the files in `potential_useless.txt`.
It could be wiser to not delete the files but rename those
(for instance, adding `.potentially_useless` as a postfix)
so if it breaks something, you have an easy way to revert everything.

Anyway, use *-*extra extra*-* caution if you delete.
Except a backup, there would be no easy way to correct a mistake.

Cheers,
Rémi-C

2016-03-02 15:38 GMT+01:00 Johnny Morano <johnny.morano@payon.com>:

Show quoted text

Hi Remi!

This SQL function you have provided, seems to return all valid files, is
that correct? In my case, it returned all my ‘base/’ files. Is that normal?

If yes, maybe you rename the function to ‘find_useful_postgres_files’ ;-)

Could you explain in steps how to use this function to make a cleanup of
bloated data? (like in an example with commands and example output, if
possible of course)

Thanks!

Mit besten Grüßen / With best regards,

Johnny Morano

____________________________________________________

*Johnny Morano | Principal Systems Engineer*

PAY.ON GmbH | AN ACI WORLDWIDE COMPANY | WWW.PAYON.COM
<http://www.payon.com/&gt;

Jakob-Haringer-Str. 1 | 5020 Salzburg | Austria

Registered at: LG Salzburg | Company number: FN 315081 f | VAT-ID:
ATU64439405

Managing Director: Christian Bamberger

Follow us on:

[image: cid:image001.jpg@01D126D0.E1AB0670] <http://blog.payon.com/&gt; [image:
cid:image002.jpg@01D126D0.E1AB0670]
<http://www.linkedin.com/company/146260?trk=tyah&gt; [image:
cid:image003.jpg@01D126D0.E1AB0670] <https://twitter.com/PAYON_com&gt;

This email message and any attachments may contain confidential,
proprietary or non-public information. This information is intended solely
for the designated recipient(s). If an addressing or transmission error has
misdirected this email, please notify the sender immediately and destroy
this email. Any review, dissemination, use or reliance upon this
information by unintended recipients is prohibited. Any opinions expressed
in this email are those of the author personally.

This message and any attachments have been scanned for viruses prior
leaving PAY.ON; however, PAY.ON does not guarantee the security of this
message and will not be responsible for any damages arising as a result of
any virus being passed on or arising from any alteration of this message by
a third party. PAY.ON may monitor e-mails sent to and from PAY.ON.

*From:* pgsql-general-owner@postgresql.org [mailto:
pgsql-general-owner@postgresql.org] *On Behalf Of *Rémi Cura
*Sent:* Mittwoch, 2. März 2016 14:58
*To:* Alvaro Herrera
*Cc:* PostgreSQL General
*Subject:* Re: [GENERAL] bloated postgres data folder, clean up

Would gladly do it,

but still this "wiki cooloff" stuff,

can't create a page

Cheers,

Rémi-C

2016-02-29 20:44 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:

Rémi Cura wrote:

Hey dear list,
after a fex years of experiments and crash,
I ended up with a grossly bloated postgres folder.
I had about 8 Go of useless files.

Would you add a new page to the wiki with this?

https://wiki.postgresql.org/wiki/Category:Administrative_Snippets

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

image003.jpgimage/jpeg; name=image003.jpgDownload
image001.jpgimage/jpeg; name=image001.jpgDownload
image002.jpgimage/jpeg; name=image002.jpgDownload
#7Johnny Morano
johnny.morano@payon.com
In reply to: Rémi Cura (#6)
Re: bloated postgres data folder, clean up

Hi,

So, I still think the SQL function isnt really working well.
Here’s what I did:

/data/postgres # psql

postgres=# \copy ( select * from find_useless_postgres_file('live') ) to /tmp/useless_files.csv delimiter ';' csv header;
postgres=# \q

/data/postgres # wc -l /tmp/useless_files.csv
7422 /tmp/useless_files.csv

# filter out the .## files, e.g.:
# 48175847.37;/base/16398/48175847.37;/data/postgres/base/16398/48175847.37;1047420928
# 48175847.36;/base/16398/48175847.36;/data/postgres/base/16398/48175847.36;1073741824
# 48175847.35;/base/16398/48175847.35;/data/postgres/base/16398/48175847.35;1073741824
# 48175847.34;/base/16398/48175847.34;/data/postgres/base/16398/48175847.34;1073741824
# 48175847.33;/base/16398/48175847.33;/data/postgres/base/16398/48175847.33;1073741824
# 48175847.32;/base/16398/48175847.32;/data/postgres/base/16398/48175847.32;1073741824
# 48175847.31;/base/16398/48175847.31;/data/postgres/base/16398/48175847.31;1073741824
# 48175847.30;/base/16398/48175847.30;/data/postgres/base/16398/48175847.30;1073741824
#
# because oid2name doesn't like them, gives error:
# /data/postgres # oid2name -f 48175847.30 -i -S -q -d live
# oid2name: query failed: ERROR: invalid input syntax for type oid: "48175847.30"
# LINE 11: (pg_catalog.pg_relation_filenode(c.oid) IN ('48175847.30')...
# ^
#
# oid2name: query was: SELECT pg_catalog.pg_relation_filenode(c.oid) as "Filenode", relname as "Table Name"
# FROM pg_catalog.pg_class c
# LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
# LEFT JOIN pg_catalog.pg_database d ON d.datname = pg_catalog.current_database(),
# pg_catalog.pg_tablespace t
# WHERE relkind IN ('r', 'm', 'i', 'S', 't') AND
# t.oid = CASE
# WHEN reltablespace <> 0 THEN reltablespace
# ELSE dattablespace
# END AND
# (pg_catalog.pg_relation_filenode(c.oid) IN ('48175847.30'))
# ORDER BY relname

/data/postgres # cut -d ';' -f1 /tmp/useless_files.csv | cut -d'.' -f1 | sort -n | uniq >/tmp/potential_useless_oids.csv
/data/postgres # wc -l /tmp/potential_useless_oids.csv
1017 /tmp/potential_useless_oids.csv

# get a list of all used oids, instead of examing one by one
/data/postgres # oid2name -i -S -q -d live | awk '{print $1}' | sort -n | uniq >/tmp/used_oids.csv
/data/postgres # wc -l /tmp/used_oids.csv
940 /tmp/used_oids.csv

/data/postgres # while read i; do grep $i /tmp/used_oids.csv >/dev/null || (echo "$i" >>/tmp/not_in_use_oids.csv); done < /tmp/potential_useless_oids.csv
/data/postgres # wc -l /tmp/not_in_use_oids.csv
168 /tmp/not_in_use_oids.csv

/data/postgres # egrep -v 'fsm|vm' /tmp/not_in_use_oids.csv |wc -l
1
/data/postgres # egrep -v 'fsm|vm' /tmp/not_in_use_oids.csv
file_name
# The CSV header only

So, no bloated files.
But, using the function from https://wiki.postgresql.org/wiki/Show_database_bloat I get:

live=# select tbloat,wasted_space from table_bloat order by wasted_space desc limit 25;
tbloat │ wasted_space
────────┼──────────────
1.0 │ 9976 kB
1.2 │ 98 GB
1.0 │ 97 MB
1.4 │ 96 kB
1.2 │ 920 kB
1.2 │ 88 kB
1.1 │ 88 kB
2.0 │ 8192 bytes
0.0 │ 8192 bytes
1.3 │ 8192 bytes
2.0 │ 8192 bytes
1.3 │ 8192 bytes
1.5 │ 8192 bytes
1.5 │ 8192 bytes
2.0 │ 8192 bytes
1.1 │ 8192 bytes
1.0 │ 8192 bytes
1.1 │ 8192 bytes
1.3 │ 8192 bytes
1.5 │ 8192 bytes
1.1 │ 80 kB
1.0 │ 7584 kB
1.6 │ 71 MB
1.0 │ 704 kB
1.1 │ 6968 kB
(25 rows)

So actually, quite a lot of bloated data ☺
What am I doing wrong?

Mit freundlichen Grüßen / With kind regards,
Johnny Morano
____________________________________________________

Johnny Morano | Principal Systems Engineer

PAY.ON GmbH | AN ACI WORLDWIDE COMPANY | WWW.PAYON.COM<http://www.payon.com/&gt;
Jakob-Haringer-Str. 1 | 5020 Salzburg | Austria

This email message and any attachments may contain confidential, proprietary or non-public information. This information is intended solely for the designated recipient(s). If an addressing or transmission error has misdirected this email, please notify the sender immediately and destroy this email. Any review, dissemination, use or reliance upon this information by unintended recipients is prohibited. Any opinions expressed in this email are those of the author personally.

From: Rémi Cura [mailto:remi.cura@gmail.com]
Sent: Mittwoch, 2. März 2016 17:49
To: Johnny Morano
Cc: Alvaro Herrera; PostgreSQL General
Subject: Re: [GENERAL] bloated postgres data folder, clean up

Hey,
this is quite the *opposite*.
The function find files in the postgres database folder that are not used by the database.
To use it :
* connect to the database you want to analyse ( **mandatory** ).
* create the function (execute function definition)
* Execute `SELECT * FROM find_useless_postgres_file('your_database_name')`

This will output a list of files that are on the disk but not used by postgres,
and so can be removed.
To be extra sure, you should use oid2name programme to check that the useless files are really useless.

For this :
* output the list of potential useless files with copy for instance
ex :
COPY ( SELECT file_name
FROM find_useless_postgres_file('your_database_name')
) TO 'path_to_you_database_folder/potential_useless.txt'
now you've got a file with a list of potential erroneous files.
* Then use oid2name
`$su postgres
$cd path_to_you_database_folder
$while read i; do oid2name -f "$i" -i -S -q -d your_database_name; done < potential_useless.txt
`
Nothing should show, meaning that every potential erroneous file
has not been recognized by oid2name !
If you feel unconvinced, you can manually try oid2name on some
of the potential erroneous files, to be extra sure.
It should not find anything.

* Now delete all the files in `potential_useless.txt`.
It could be wiser to not delete the files but rename those
(for instance, adding `.potentially_useless` as a postfix)
so if it breaks something, you have an easy way to revert everything.

Anyway, use *-*extra extra*-* caution if you delete.
Except a backup, there would be no easy way to correct a mistake.
Cheers,
Rémi-C

2016-03-02 15:38 GMT+01:00 Johnny Morano <johnny.morano@payon.com<mailto:johnny.morano@payon.com>>:
Hi Remi!

This SQL function you have provided, seems to return all valid files, is that correct? In my case, it returned all my ‘base/’ files. Is that normal?
If yes, maybe you rename the function to ‘find_useful_postgres_files’ ;-)

Could you explain in steps how to use this function to make a cleanup of bloated data? (like in an example with commands and example output, if possible of course)

Thanks!

Mit besten Grüßen / With best regards,
Johnny Morano
____________________________________________________

Johnny Morano | Principal Systems Engineer

PAY.ON GmbH | AN ACI WORLDWIDE COMPANY | WWW.PAYON.COM<http://www.payon.com/&gt;
Jakob-Haringer-Str. 1 | 5020 Salzburg | Austria
Registered at: LG Salzburg | Company number: FN 315081 f | VAT-ID: ATU64439405
Managing Director: Christian Bamberger

Follow us on:

[cid:image001.jpg@01D126D0.E1AB0670]<http://blog.payon.com/&gt; [cid:image002.jpg@01D126D0.E1AB0670] <http://www.linkedin.com/company/146260?trk=tyah&gt; [cid:image003.jpg@01D126D0.E1AB0670] <https://twitter.com/PAYON_com&gt;

This email message and any attachments may contain confidential, proprietary or non-public information. This information is intended solely for the designated recipient(s). If an addressing or transmission error has misdirected this email, please notify the sender immediately and destroy this email. Any review, dissemination, use or reliance upon this information by unintended recipients is prohibited. Any opinions expressed in this email are those of the author personally.

This message and any attachments have been scanned for viruses prior leaving PAY.ON; however, PAY.ON does not guarantee the security of this message and will not be responsible for any damages arising as a result of any virus being passed on or arising from any alteration of this message by a third party. PAY.ON may monitor e-mails sent to and from PAY.ON.

From: pgsql-general-owner@postgresql.org<mailto:pgsql-general-owner@postgresql.org> [mailto:pgsql-general-owner@postgresql.org<mailto:pgsql-general-owner@postgresql.org>] On Behalf Of Rémi Cura
Sent: Mittwoch, 2. März 2016 14:58
To: Alvaro Herrera
Cc: PostgreSQL General
Subject: Re: [GENERAL] bloated postgres data folder, clean up

Would gladly do it,
but still this "wiki cooloff" stuff,
can't create a page
Cheers,
Rémi-C

2016-02-29 20:44 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com<mailto:alvherre@2ndquadrant.com>>:
Rémi Cura wrote:

Hey dear list,
after a fex years of experiments and crash,
I ended up with a grossly bloated postgres folder.
I had about 8 Go of useless files.

Would you add a new page to the wiki with this?

https://wiki.postgresql.org/wiki/Category:Administrative_Snippets

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

image001.jpgimage/jpeg; name=image001.jpgDownload
image002.jpgimage/jpeg; name=image002.jpgDownload
image003.jpgimage/jpeg; name=image003.jpgDownload
#8Rémi Cura
remi.cura@gmail.com
In reply to: Johnny Morano (#7)
Re: bloated postgres data folder, clean up

Hey,
first I forgot something in the querry to remove the annoying .XXX :
-----------
SELECT distinct substring(file_name from '\d+' )
FROM find_useless_postgres_file('your_database_name') ;
-----------

Now it seems you do everything all right,
with a slight confusion between bloating and useless files.

From what I understood, bloating is when postgres creates lots of file
as a cache for a table that once was big, or saw a lot of actions.
Still, the created files are referenced by postgres and postgres might use
them.
They are _not_ to be deleted, and are _not_ useless.
You can reduce the number by `VACUUM FULL`

On the other hand, you may end up with useless files in your file system
that are _not_ referenced by postgres in any way,
and are basically trash, uselss for anything.
(broken files)
On way to produce those useless files is for example to launch a vacuum full
(create a lot of files, then use those to clean table, then delete those
files),
and kill this vacuum full before it ends.
There might by other causes but I'm not an expert.

So to conclude, the fact that my function doesn't find useless file in you
postgres database folder is rather good news,
and by no mean imply that your tables are bloated (or not)

Cheers
Rémi-C

2016-03-03 8:31 GMT+01:00 Johnny Morano <johnny.morano@payon.com>:

Show quoted text

Hi,

So, I still think the SQL function isnt really working well.

Here’s what I did:

/data/postgres # psql

postgres=# \copy ( select * from find_useless_postgres_file('live') ) to
/tmp/useless_files.csv delimiter ';' csv header;

postgres=# \q

/data/postgres # wc -l /tmp/useless_files.csv

7422 /tmp/useless_files.csv

# filter out the .## files, e.g.:

#
48175847.37;/base/16398/48175847.37;/data/postgres/base/16398/48175847.37;1047420928

#
48175847.36;/base/16398/48175847.36;/data/postgres/base/16398/48175847.36;1073741824

#
48175847.35;/base/16398/48175847.35;/data/postgres/base/16398/48175847.35;1073741824

#
48175847.34;/base/16398/48175847.34;/data/postgres/base/16398/48175847.34;1073741824

#
48175847.33;/base/16398/48175847.33;/data/postgres/base/16398/48175847.33;1073741824

#
48175847.32;/base/16398/48175847.32;/data/postgres/base/16398/48175847.32;1073741824

#
48175847.31;/base/16398/48175847.31;/data/postgres/base/16398/48175847.31;1073741824

#
48175847.30;/base/16398/48175847.30;/data/postgres/base/16398/48175847.30;1073741824

#

# because oid2name doesn't like them, gives error:

# /data/postgres # oid2name -f 48175847.30 -i -S -q -d live

# oid2name: query failed: ERROR: invalid input syntax for type oid:
"48175847.30"

# LINE 11: (pg_catalog.pg_relation_filenode(c.oid) IN ('48175847.30')...

# ^

#

# oid2name: query was: SELECT pg_catalog.pg_relation_filenode(c.oid) as
"Filenode", relname as "Table Name"

# FROM pg_catalog.pg_class c

# LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace

# LEFT JOIN pg_catalog.pg_database d ON d.datname =
pg_catalog.current_database(),

# pg_catalog.pg_tablespace t

# WHERE relkind IN ('r', 'm', 'i', 'S', 't') AND

# t.oid = CASE

# WHEN reltablespace <> 0 THEN reltablespace

# ELSE dattablespace

# END AND

# (pg_catalog.pg_relation_filenode(c.oid) IN ('48175847.30'))

# ORDER BY relname

/data/postgres # cut -d ';' -f1 /tmp/useless_files.csv | cut -d'.' -f1 |
sort -n | uniq >/tmp/potential_useless_oids.csv

/data/postgres # wc -l /tmp/potential_useless_oids.csv

1017 /tmp/potential_useless_oids.csv

# get a list of all used oids, instead of examing one by one

/data/postgres # oid2name -i -S -q -d live | awk '{print $1}' | sort -n |
uniq >/tmp/used_oids.csv

/data/postgres # wc -l /tmp/used_oids.csv

940 /tmp/used_oids.csv

/data/postgres # while read i; do grep $i /tmp/used_oids.csv >/dev/null ||
(echo "$i" >>/tmp/not_in_use_oids.csv); done <
/tmp/potential_useless_oids.csv

/data/postgres # wc -l /tmp/not_in_use_oids.csv

168 /tmp/not_in_use_oids.csv

/data/postgres # egrep -v 'fsm|vm' /tmp/not_in_use_oids.csv |wc -l

1

/data/postgres # egrep -v 'fsm|vm' /tmp/not_in_use_oids.csv

file_name

# The CSV header only

So, no bloated files.

But, using the function from
https://wiki.postgresql.org/wiki/Show_database_bloat I get:

live=# select tbloat,wasted_space from table_bloat order by wasted_space
desc limit 25;

tbloat │ wasted_space

────────┼──────────────

1.0 │ 9976 kB

1.2 │ 98 GB

1.0 │ 97 MB

1.4 │ 96 kB

1.2 │ 920 kB

1.2 │ 88 kB

1.1 │ 88 kB

2.0 │ 8192 bytes

0.0 │ 8192 bytes

1.3 │ 8192 bytes

2.0 │ 8192 bytes

1.3 │ 8192 bytes

1.5 │ 8192 bytes

1.5 │ 8192 bytes

2.0 │ 8192 bytes

1.1 │ 8192 bytes

1.0 │ 8192 bytes

1.1 │ 8192 bytes

1.3 │ 8192 bytes

1.5 │ 8192 bytes

1.1 │ 80 kB

1.0 │ 7584 kB

1.6 │ 71 MB

1.0 │ 704 kB

1.1 │ 6968 kB

(25 rows)

So actually, quite a lot of bloated data J

What am I doing wrong?

Mit freundlichen Grüßen / With kind regards,

Johnny Morano

____________________________________________________

*Johnny Morano | Principal Systems Engineer*

PAY.ON GmbH | AN ACI WORLDWIDE COMPANY | WWW.PAYON.COM
<http://www.payon.com/&gt;

Jakob-Haringer-Str. 1 | 5020 Salzburg | Austria

This email message and any attachments may contain confidential,
proprietary or non-public information. This information is intended solely
for the designated recipient(s). If an addressing or transmission error has
misdirected this email, please notify the sender immediately and destroy
this email. Any review, dissemination, use or reliance upon this
information by unintended recipients is prohibited. Any opinions expressed
in this email are those of the author personally.

*From:* Rémi Cura [mailto:remi.cura@gmail.com]
*Sent:* Mittwoch, 2. März 2016 17:49
*To:* Johnny Morano
*Cc:* Alvaro Herrera; PostgreSQL General

*Subject:* Re: [GENERAL] bloated postgres data folder, clean up

Hey,

this is quite the *opposite*.

The function find files in the postgres database folder that are not used
by the database.

To use it :

* connect to the database you want to analyse ( **mandatory** ).

* create the function (execute function definition)

* Execute `SELECT * FROM find_useless_postgres_file('your_database_name')`

This will output a list of files that are on the disk but not used by
postgres,

and so can be removed.

To be extra sure, you should use oid2name programme to check that the
useless files are really useless.

For this :
* output the list of potential useless files with copy for instance
ex :
COPY ( SELECT file_name

FROM find_useless_postgres_file('your_database_name')

) TO 'path_to_you_database_folder/potential_useless.txt'

now you've got a file with a list of potential erroneous files.

* Then use oid2name

`$su postgres
$cd path_to_you_database_folder

$while read i; do oid2name -f "$i" -i -S -q -d your_database_name; done
< potential_useless.txt
`

Nothing should show, meaning that every potential erroneous file
has not been recognized by oid2name !

If you feel unconvinced, you can manually try oid2name on some
of the potential erroneous files, to be extra sure.
It should not find anything.

* Now delete all the files in `potential_useless.txt`.

It could be wiser to not delete the files but rename those

(for instance, adding `.potentially_useless` as a postfix)

so if it breaks something, you have an easy way to revert everything.

Anyway, use *-*extra extra*-* caution if you delete.
Except a backup, there would be no easy way to correct a mistake.

Cheers,

Rémi-C

2016-03-02 15:38 GMT+01:00 Johnny Morano <johnny.morano@payon.com>:

Hi Remi!

This SQL function you have provided, seems to return all valid files, is
that correct? In my case, it returned all my ‘base/’ files. Is that normal?

If yes, maybe you rename the function to ‘find_useful_postgres_files’ ;-)

Could you explain in steps how to use this function to make a cleanup of
bloated data? (like in an example with commands and example output, if
possible of course)

Thanks!

Mit besten Grüßen / With best regards,

Johnny Morano

____________________________________________________

*Johnny Morano | Principal Systems Engineer*

PAY.ON GmbH | AN ACI WORLDWIDE COMPANY | WWW.PAYON.COM
<http://www.payon.com/&gt;

Jakob-Haringer-Str. 1 | 5020 Salzburg | Austria

Registered at: LG Salzburg | Company number: FN 315081 f | VAT-ID:
ATU64439405

Managing Director: Christian Bamberger

Follow us on:

[image: cid:image001.jpg@01D126D0.E1AB0670] <http://blog.payon.com/&gt; [image:
cid:image002.jpg@01D126D0.E1AB0670]
<http://www.linkedin.com/company/146260?trk=tyah&gt; [image:
cid:image003.jpg@01D126D0.E1AB0670] <https://twitter.com/PAYON_com&gt;

This email message and any attachments may contain confidential,
proprietary or non-public information. This information is intended solely
for the designated recipient(s). If an addressing or transmission error has
misdirected this email, please notify the sender immediately and destroy
this email. Any review, dissemination, use or reliance upon this
information by unintended recipients is prohibited. Any opinions expressed
in this email are those of the author personally.

This message and any attachments have been scanned for viruses prior
leaving PAY.ON; however, PAY.ON does not guarantee the security of this
message and will not be responsible for any damages arising as a result of
any virus being passed on or arising from any alteration of this message by
a third party. PAY.ON may monitor e-mails sent to and from PAY.ON.

*From:* pgsql-general-owner@postgresql.org [mailto:
pgsql-general-owner@postgresql.org] *On Behalf Of *Rémi Cura
*Sent:* Mittwoch, 2. März 2016 14:58
*To:* Alvaro Herrera
*Cc:* PostgreSQL General
*Subject:* Re: [GENERAL] bloated postgres data folder, clean up

Would gladly do it,

but still this "wiki cooloff" stuff,

can't create a page

Cheers,

Rémi-C

2016-02-29 20:44 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:

Rémi Cura wrote:

Hey dear list,
after a fex years of experiments and crash,
I ended up with a grossly bloated postgres folder.
I had about 8 Go of useless files.

Would you add a new page to the wiki with this?

https://wiki.postgresql.org/wiki/Category:Administrative_Snippets

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

image001.jpgimage/jpeg; name=image001.jpgDownload
image002.jpgimage/jpeg; name=image002.jpgDownload
image003.jpgimage/jpeg; name=image003.jpgDownload
#9Julien Rouhaud
rjuju123@gmail.com
In reply to: Rémi Cura (#8)
Re: bloated postgres data folder, clean up

On 03/03/2016 18:15, Rémi Cura wrote:

Hey,

Hello Rémi,

first I forgot something in the querry to remove the annoying .XXX :
-----------
SELECT distinct substring(file_name from '\d+' )
FROM find_useless_postgres_file('your_database_name') ;
-----------

Thanks for working on this :)

I added a comment on the gist URL you provided. It's a simplified
version of the main query that should work fine and detect more orphan
files. Double checking it would be a good idea though.

Also, as you can't check other databases than the one you're connected
to, I used current_database() instead of user defined database name.

It's also important to warn that all of this only work for finding
orphan files on the default tablespace (and to never blindly remove
files in the PGDATA of course).

Now it seems you do everything all right,
with a slight confusion between bloating and useless files.

From what I understood, bloating is when postgres creates lots of file
as a cache for a table that once was big, or saw a lot of actions.
Still, the created files are referenced by postgres and postgres might
use them.
They are _not_ to be deleted, and are _not_ useless.
You can reduce the number by `VACUUM FULL`

On the other hand, you may end up with useless files in your file system
that are _not_ referenced by postgres in any way,
and are basically trash, uselss for anything.
(broken files)
On way to produce those useless files is for example to launch a vacuum full
(create a lot of files, then use those to clean table, then delete those
files),
and kill this vacuum full before it ends.
There might by other causes but I'm not an expert.

So to conclude, the fact that my function doesn't find useless file in
you postgres database folder is rather good news,
and by no mean imply that your tables are bloated (or not)

Cheers
Rémi-C

2016-03-03 8:31 GMT+01:00 Johnny Morano <johnny.morano@payon.com
<mailto:johnny.morano@payon.com>>:

Hi,____

__ __

So, I still think the SQL function isnt really working well.____

Here’s what I did:____

__ __

/data/postgres # psql ____

__ __

postgres=# \copy ( select * from find_useless_postgres_file('live')
) to /tmp/useless_files.csv delimiter ';' csv header;____

postgres=# \q____

__ __

/data/postgres # wc -l /tmp/useless_files.csv____

7422 /tmp/useless_files.csv____

__ __

# filter out the .## files, e.g.:____

#
48175847.37;/base/16398/48175847.37;/data/postgres/base/16398/48175847.37;1047420928____

#
48175847.36;/base/16398/48175847.36;/data/postgres/base/16398/48175847.36;1073741824____

#
48175847.35;/base/16398/48175847.35;/data/postgres/base/16398/48175847.35;1073741824____

#
48175847.34;/base/16398/48175847.34;/data/postgres/base/16398/48175847.34;1073741824____

#
48175847.33;/base/16398/48175847.33;/data/postgres/base/16398/48175847.33;1073741824____

#
48175847.32;/base/16398/48175847.32;/data/postgres/base/16398/48175847.32;1073741824____

#
48175847.31;/base/16398/48175847.31;/data/postgres/base/16398/48175847.31;1073741824____

#
48175847.30;/base/16398/48175847.30;/data/postgres/base/16398/48175847.30;1073741824____

#____

# because oid2name doesn't like them, gives error:____

# /data/postgres # oid2name -f 48175847.30 -i -S -q -d live____

# oid2name: query failed: ERROR: invalid input syntax for type oid:
"48175847.30"____

# LINE 11: (pg_catalog.pg_relation_filenode(c.oid) IN
('48175847.30')...____

# ^____

# ____

# oid2name: query was: SELECT pg_catalog.pg_relation_filenode(c.oid)
as "Filenode", relname as "Table Name" ____

# FROM pg_catalog.pg_class c ____

# LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
____

# LEFT JOIN pg_catalog.pg_database d ON d.datname =
pg_catalog.current_database(),____

# pg_catalog.pg_tablespace t ____

# WHERE relkind IN ('r', 'm', 'i', 'S', 't') AND ____

# t.oid = CASE____

# WHEN reltablespace <> 0 THEN reltablespace____

# ELSE dattablespace____

# END AND ____

# (pg_catalog.pg_relation_filenode(c.oid) IN ('48175847.30')) ____

# ORDER BY relname____

__ __

/data/postgres # cut -d ';' -f1 /tmp/useless_files.csv | cut -d'.'
-f1 | sort -n | uniq >/tmp/potential_useless_oids.csv____

/data/postgres # wc -l /tmp/potential_useless_oids.csv____

1017 /tmp/potential_useless_oids.csv____

__ __

# get a list of all used oids, instead of examing one by one____

/data/postgres # oid2name -i -S -q -d live | awk '{print $1}' | sort
-n | uniq >/tmp/used_oids.csv____

/data/postgres # wc -l /tmp/used_oids.csv ____

940 /tmp/used_oids.csv____

__ __

/data/postgres # while read i; do grep $i /tmp/used_oids.csv

/dev/null || (echo "$i" >>/tmp/not_in_use_oids.csv); done <

/tmp/potential_useless_oids.csv____

/data/postgres # wc -l /tmp/not_in_use_oids.csv____

168 /tmp/not_in_use_oids.csv____

__ __

/data/postgres # egrep -v 'fsm|vm' /tmp/not_in_use_oids.csv |wc -l____

1____

/data/postgres # egrep -v 'fsm|vm' /tmp/not_in_use_oids.csv ____

file_name____

# The CSV header only____

__ __

So, no bloated files.____

But, using the function from
https://wiki.postgresql.org/wiki/Show_database_bloat I get:____

__ __

live=# select tbloat,wasted_space from table_bloat order by
wasted_space desc limit 25;____

tbloat │ wasted_space ____

────────┼──────────────____

1.0 │ 9976 kB____

1.2 │ 98 GB____

1.0 │ 97 MB____

1.4 │ 96 kB____

1.2 │ 920 kB____

1.2 │ 88 kB____

1.1 │ 88 kB____

2.0 │ 8192 bytes____

0.0 │ 8192 bytes____

1.3 │ 8192 bytes____

2.0 │ 8192 bytes____

1.3 │ 8192 bytes____

1.5 │ 8192 bytes____

1.5 │ 8192 bytes____

2.0 │ 8192 bytes____

1.1 │ 8192 bytes____

1.0 │ 8192 bytes____

1.1 │ 8192 bytes____

1.3 │ 8192 bytes____

1.5 │ 8192 bytes____

1.1 │ 80 kB____

1.0 │ 7584 kB____

1.6 │ 71 MB____

1.0 │ 704 kB____

1.1 │ 6968 kB____

(25 rows)____

__ __

So actually, quite a lot of bloated data J____

What am I doing wrong?____

__ __

Mit freundlichen Grüßen / With kind regards,____

Johnny Morano____

________________________________________________________

__ __

*Johnny Morano | Principal Systems Engineer____*

__ __

PAY.ON GmbH | AN ACI WORLDWIDE COMPANY | WWW.PAYON.COM
<http://www.payon.com/&gt;____

Jakob-Haringer-Str. 1 | 5020 Salzburg | Austria____

__ __

This email message and any attachments may contain confidential,
proprietary or non-public information. This information is intended
solely for the designated recipient(s). If an addressing or
transmission error has misdirected this email, please notify the
sender immediately and destroy this email. Any review,
dissemination, use or reliance upon this information by unintended
recipients is prohibited. Any opinions expressed in this email are
those of the author personally.____

__ __

*From:*Rémi Cura [mailto:remi.cura@gmail.com
<mailto:remi.cura@gmail.com>]
*Sent:* Mittwoch, 2. März 2016 17:49
*To:* Johnny Morano
*Cc:* Alvaro Herrera; PostgreSQL General

*Subject:* Re: [GENERAL] bloated postgres data folder, clean up____

__ __

Hey,____

this is quite the *opposite*.____

The function find files in the postgres database folder that are not
used by the database.____

To use it : ____

* connect to the database you want to analyse ( **mandatory** ).____

* create the function (execute function definition)____

* Execute `SELECT * FROM
find_useless_postgres_file('your_database_name')`____

This will output a list of files that are on the disk but not used
by postgres,____

and so can be removed.____

To be extra sure, you should use oid2name programme to check that
the useless files are really useless.

For this :
* output the list of potential useless files with copy for instance
ex :
COPY ( SELECT file_name____

FROM find_useless_postgres_file('your_database_name')____

) TO 'path_to_you_database_folder/potential_useless.txt'____

now you've got a file with a list of potential erroneous files.____

* Then use oid2name____

`$su postgres
$cd path_to_you_database_folder____

$while read i; do oid2name -f "$i" -i -S -q -d
your_database_name; done < potential_useless.txt
`____

Nothing should show, meaning that every potential erroneous file
has not been recognized by oid2name !____

If you feel unconvinced, you can manually try oid2name on some
of the potential erroneous files, to be extra sure.
It should not find anything.____

__ __

* Now delete all the files in `potential_useless.txt`.____

It could be wiser to not delete the files but rename those____

(for instance, adding `.potentially_useless` as a postfix)____

so if it breaks something, you have an easy way to revert
everything.____

__ __

Anyway, use *-*extra extra*-* caution if you delete.
Except a backup, there would be no easy way to correct a mistake.____

Cheers,____

Rémi-C____

__ __

2016-03-02 15:38 GMT+01:00 Johnny Morano <johnny.morano@payon.com
<mailto:johnny.morano@payon.com>>:____

Hi Remi!____

____

This SQL function you have provided, seems to return all valid
files, is that correct? In my case, it returned all my ‘base/’
files. Is that normal?____

If yes, maybe you rename the function to
‘find_useful_postgres_files’ ;-)____

____

Could you explain in steps how to use this function to make a
cleanup of bloated data? (like in an example with commands and
example output, if possible of course)____

____

Thanks!____

____

____

Mit besten Grüßen / With best regards,____

Johnny Morano____

________________________________________________________

____

*Johnny Morano | Principal Systems Engineer*____

____

PAY.ON GmbH | AN ACI WORLDWIDE COMPANY | WWW.PAYON.COM
<http://www.payon.com/&gt;____

Jakob-Haringer-Str. 1 | 5020 Salzburg | Austria____

Registered at: LG Salzburg | Company number: FN 315081 f |
VAT-ID: ATU64439405____

Managing Director: Christian Bamberger____

____

____

Follow us on: ____

____

cid:image001.jpg@01D126D0.E1AB0670
<http://blog.payon.com/&gt; cid:image002.jpg@01D126D0.E1AB0670
<http://www.linkedin.com/company/146260?trk=tyah&gt; cid:image003.jpg@01D126D0.E1AB0670
<https://twitter.com/PAYON_com&gt; ____

____

This email message and any attachments may contain confidential,
proprietary or non-public information. This information is intended
solely for the designated recipient(s). If an addressing or
transmission error has misdirected this email, please notify the
sender immediately and destroy this email. Any review,
dissemination, use or reliance upon this information by unintended
recipients is prohibited. Any opinions expressed in this email are
those of the author personally.____

____

This message and any attachments have been scanned for viruses prior
leaving PAY.ON; however, PAY.ON does not guarantee the security of
this message and will not be responsible for any damages arising as
a result of any virus being passed on or arising from any alteration
of this message by a third party. PAY.ON may monitor e-mails sent to
and from PAY.ON.____

____

____

____

____

*From:*pgsql-general-owner@postgresql.org
<mailto:pgsql-general-owner@postgresql.org>
[mailto:pgsql-general-owner@postgresql.org
<mailto:pgsql-general-owner@postgresql.org>] *On Behalf Of *Rémi Cura
*Sent:* Mittwoch, 2. März 2016 14:58
*To:* Alvaro Herrera
*Cc:* PostgreSQL General
*Subject:* Re: [GENERAL] bloated postgres data folder, clean up____

____

Would gladly do it,____

but still this "wiki cooloff" stuff,____

can't create a page____

Cheers,____

Rémi-C____

____

2016-02-29 20:44 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com
<mailto:alvherre@2ndquadrant.com>>:____

Rémi Cura wrote:

Hey dear list,
after a fex years of experiments and crash,
I ended up with a grossly bloated postgres folder.
I had about 8 Go of useless files.

Would you add a new page to the wiki with this?

https://wiki.postgresql.org/wiki/Category:Administrative_Snippets

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training &
Services____

____

__ __

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#10Rémi Cura
remi.cura@gmail.com
In reply to: Julien Rouhaud (#9)
Re: bloated postgres data folder, clean up

​Hey Julien,
thanks for the original query !

There was a slight mistake in the query,
it was comparing the file name with
pg_class.relfilenode.
It is not safe in some case (see doc : "caution" in here
<http://www.postgresql.org/docs/current/static/storage-file-layout.html&gt;)
, so better use the pg_relation_filenode() function.
AS a result this database could not be started anymore (no worry I had
copy).
However using pg_relation_filenode() seems to be safe (passes vacuum full
analyse).

I'll modify the query as soon as I have access to gist.

I agree the warning about ​tablespace is also important.
I'll put all of this on the wiki as soon has I have permission to create a
new page
(man, how long is this "cool-off", it's been already several days !).

Cheers,
Rémi-C
​​

2016-03-03 20:10 GMT+01:00 Julien Rouhaud <julien.rouhaud@dalibo.com>:

Show quoted text

On 03/03/2016 18:15, Rémi Cura wrote:

Hey,

Hello Rémi,

first I forgot something in the querry to remove the annoying .XXX :
-----------
SELECT distinct substring(file_name from '\d+' )
FROM find_useless_postgres_file('your_database_name') ;
-----------

Thanks for working on this :)

I added a comment on the gist URL you provided. It's a simplified
version of the main query that should work fine and detect more orphan
files. Double checking it would be a good idea though.

Also, as you can't check other databases than the one you're connected
to, I used current_database() instead of user defined database name.

It's also important to warn that all of this only work for finding
orphan files on the default
​​
tablespace (and to never blindly remove
files in the PGDATA of course).
​​