Redundant database objects.
Hi all,
Our project has been running for 10 years now.
We have a large number of orphaned or redundant tables, views, and
functions, due to many years of inadequate source management.
We are running " PostgreSQL 8.3.7 on i486-pc-linux-gnu, compiled by GCC
gcc-4.3.real (Debian 4.3.2-1.1) 4.3.2 "
Is there an effective way to identify these objects using the stats tables?
Something like a last accessed/used or some such column?
Any suggestions welcomed.
Thanks
Andrew Bartley
Aimstats Pty Ltd
On 07/12/2010 02:40 PM, Andrew Bartley wrote:
We have a large number of orphaned or redundant tables, views, and
functions, due to many years of inadequate source management.We are running " PostgreSQL 8.3.7 on i486-pc-linux-gnu, compiled by GCC
gcc-4.3.real (Debian 4.3.2-1.1) 4.3.2 "Is there an effective way to identify these objects using the stats
tables? Something like a last accessed/used or some such column?
Maybe pg_statio* views?
http://www.postgresql.org/docs/8.3/interactive/monitoring-stats.html
Joe
--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & Support
Thanks Joe,
Unfortunately these views only give me what appears to be a certain time
frame. This does not help all that much. It will give a list of tables,
indexes and sequences that have been used in the time frame, so that is at
least a start.
It would be good if there was a timestamp (last accessed) that would give me
a clearer indication.
Thanks
Andrew
On 13 July 2010 08:46, Joe Conway <mail@joeconway.com> wrote:
Show quoted text
On 07/12/2010 02:40 PM, Andrew Bartley wrote:
We have a large number of orphaned or redundant tables, views, and
functions, due to many years of inadequate source management.We are running " PostgreSQL 8.3.7 on i486-pc-linux-gnu, compiled by GCC
gcc-4.3.real (Debian 4.3.2-1.1) 4.3.2 "Is there an effective way to identify these objects using the stats
tables? Something like a last accessed/used or some such column?Maybe pg_statio* views?
http://www.postgresql.org/docs/8.3/interactive/monitoring-stats.htmlJoe
--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & Support
Andrew Bartley wrote:
Unfortunately these views only give me what appears to be a certain
time frame. This does not help all that much. It will give a list of
tables, indexes and sequences that have been used in the time frame,
so that is at least a start.
You can use pg_stat_reset() to set those back to 0 again and then see
what actually gets used moving forward from the point you do that.
That's a reasonable idea to do anyway to make all those statistics
better reflect recent activity rather than historical. Just be warned
that it will screw up many monitoring systems if you have them pointed
toward those statistics tables and grabbing snapshots, some will view
the reset as the values going negative which doesn't make any real-world
sense.
--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.us
Thanks Greg,
It seems that the underlying stats tables are reset on a periodic basis, can
i stop this process? Is it a .conf setting? I have had a good look around,
nothing sticks out. If I can stop it, then i could use pg_stat_reset() then
monitor the stat views over an extended period without them being reset by
some periodic job. That, at the moment, is my main concern.
Thanks again.
Also i need to find similar information regarding functions and views....
Any suggestions?
Thanks
Andrew Bartley
On 13 July 2010 09:45, Greg Smith <greg@2ndquadrant.com> wrote:
Show quoted text
Andrew Bartley wrote:
Unfortunately these views only give me what appears to be a certain time
frame. This does not help all that much. It will give a list of tables,
indexes and sequences that have been used in the time frame, so that is at
least a start.You can use pg_stat_reset() to set those back to 0 again and then see what
actually gets used moving forward from the point you do that. That's a
reasonable idea to do anyway to make all those statistics better reflect
recent activity rather than historical. Just be warned that it will screw
up many monitoring systems if you have them pointed toward those statistics
tables and grabbing snapshots, some will view the reset as the values going
negative which doesn't make any real-world sense.--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.us
Hello.
When we moved old projects from postgresql 7.x to 8.4, I just looked at
modification time for files in base/<dboid>/<toid> . So, I could
determine, that some databases were inactive (precisely,not updated) for
about a year and move them to archive...
Andrew Bartley wrote:
Is there an effective way to identify these objects using the stats
tables? Something like a last accessed/used or some such column?Any suggestions welcomed.
Thanks
Andrew Bartley
Aimstats Pty Ltd
--
С уважением,
Александр Пыхалов,
системный администратор ЮГИНФО ЮФУ.
Thanks Alexander,
Wish i had thought of that.
I still need some way of finding redundant functions
Thanks again
Andrew
On 13 July 2010 15:38, Alexander Pyhalov <alp@rsu.ru> wrote:
Show quoted text
Hello.
When we moved old projects from postgresql 7.x to 8.4, I just looked at
modification time for files in base/<dboid>/<toid> . So, I could determine,
that some databases were inactive (precisely,not updated) for about a year
and move them to archive...Andrew Bartley wrote:
Is there an effective way to identify these objects using the stats
tables? Something like a last accessed/used or some such column?
Any suggestions welcomed.
ThanksAndrew Bartley
Aimstats Pty Ltd--
С уважением,
Александр Пыхалов,
системный администратор ЮГИНФО ЮФУ.
On 07/12/2010 11:07 PM, Andrew Bartley wrote:
I still need some way of finding redundant functions
A bit of a blunt instrument, but you could log all statements for a
while, and then grep through the logs using a list of all functions of
interest to see which ones never show up. Be wary of the performance hit
and rapidly growing log though. Another idea would be to modify a copy
of 8.3.x source code (I think that's what you said you were on in an
earlier post) to emit a NOTICE with a name whenever a function is called
if it meets some criteria.
Joe
--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & Support
Hello
2010/7/13 Andrew Bartley <ambartley@gmail.com>:
Thanks Alexander,
Wish i had thought of that.
I still need some way of finding redundant functions
Thanks again
Andrew
I used a function source code injection for this task
article is czech, but comments in code are in english (I used it in 8.3)
you can use a PL profiler too
http://www.depesz.com/index.php/2008/05/15/waiting-for-84-function-stats/
Regards
Pavel Stehule
Andrew Bartley wrote:
It seems that the underlying stats tables are reset on a periodic
basis, can i stop this process? Is it a .conf setting?
Up until PostgreSQL 8.2 there's a setting named
stats_reset_on_server_start that clears everything when the server
stops:
http://www.postgresql.org/docs/8.2/static/runtime-config-statistics.html
If you're on that version or earlier and it's turned on, there's your
problem. This went away in 8.3.
Also i need to find similar information regarding functions and
views.... Any suggestions?
Some suggestions already popped up here for functions. Views are
tougher because they essentially work like a macro substitution: the
content of the view gets substituted into the query where it appears,
and off the query planner goes. That's why there's no statistics about
them, they don't actually exist as objects that things are executed
against. I don't know of any way to track their use other than to log
all your queries and look for them popping up. A grep against the
application source code for them can be useful too.
The flip side to that is that eliminating views doesn't really improve
performance, so it's rarely a top priority to get rid of them--unlike
unused indexes for example.
--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.us
Thanks to all that replied,
I used Joe Conway's suggestion, using grep and an extracted list of tables,
functions and views form the DB. It worked very well.
I will attach the code I used to this thread once complete.
Again Thanks
Andrew Bartley
On 14 July 2010 00:43, Greg Smith <greg@2ndquadrant.com> wrote:
Show quoted text
Andrew Bartley wrote:
It seems that the underlying stats tables are reset on a periodic basis,
can i stop this process? Is it a .conf setting?Up until PostgreSQL 8.2 there's a setting named stats_reset_on_server_start
that clears everything when the server stops:
http://www.postgresql.org/docs/8.2/static/runtime-config-statistics.htmlIf you're on that version or earlier and it's turned on, there's your
problem. This went away in 8.3.Also i need to find similar information regarding functions and views....
Any suggestions?
Some suggestions already popped up here for functions. Views are tougher
because they essentially work like a macro substitution: the content of the
view gets substituted into the query where it appears, and off the query
planner goes. That's why there's no statistics about them, they don't
actually exist as objects that things are executed against. I don't know of
any way to track their use other than to log all your queries and look for
them popping up. A grep against the application source code for them can be
useful too.The flip side to that is that eliminating views doesn't really improve
performance, so it's rarely a top priority to get rid of them--unlike unused
indexes for example.
--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.us
On Thu, Jul 15, 2010 at 1:04 AM, Andrew Bartley <ambartley@gmail.com> wrote:
Thanks to all that replied,
I used Joe Conway's suggestion, using grep and an extracted list of tables,
functions and views form the DB. It worked very well.I will attach the code I used to this thread once complete.
Again Thanks
Andrew BartleyOn 14 July 2010 00:43, Greg Smith <greg@2ndquadrant.com> wrote:
Andrew Bartley wrote:
It seems that the underlying stats tables are reset on a periodic basis,
can i stop this process? Is it a .conf setting?Up until PostgreSQL 8.2 there's a setting named
stats_reset_on_server_start that clears everything when the server stops:
http://www.postgresql.org/docs/8.2/static/runtime-config-statistics.htmlIf you're on that version or earlier and it's turned on, there's your
problem. This went away in 8.3.Also i need to find similar information regarding functions and views....
Any suggestions?Some suggestions already popped up here for functions. Views are tougher
because they essentially work like a macro substitution: the content of the
view gets substituted into the query where it appears, and off the query
planner goes. That's why there's no statistics about them, they don't
actually exist as objects that things are executed against. I don't know of
any way to track their use other than to log all your queries and look for
them popping up. A grep against the application source code for them can be
useful too.The flip side to that is that eliminating views doesn't really improve
performance, so it's rarely a top priority to get rid of them--unlike unused
indexes for example.
--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.us
After some time of looking for a solution to a similar problem I came
up with the following probable solution.
1)Enable logging to CSV format.
2)Then log all queries, insert, update, deletion statements for the
given cluster.
3)Run your application for a period sufficient to have captured the
execution of all possible functions and the deletes, inserts and
updates to the tables and sequences.
4)Create a table in other database that conforms to the the field
structure of CSV logging.
5)Populate this table with the contents of your CSV file.
6)Now (I think) all the database objects invoked or used in anyway
(not sure about nested function calls though) will be listed in this
table. So now you may query the appropriate field(s) to find out the
"active" database objects.
Since you are interested in weeding out the "inactive" objects and
while obviously not loosing data or useful database objects I was
thinking of the additional steps
7)After working hours stop your application(s) and somehow perform a
database dump and restore (to other brand new database) on only these
"active" database objects (maybe using --table=<activetable1>
--table=<activetable..n>). Other database objects appearing in your
"active database objects list" may require manual creation on your new
database.
8)Rename the original database to other suitable label and rename the
new database to the original name of the original database.
9)Start your application, perform some investigations to see if all is well.
Allan.