Some frustrations with admin tasks on PGSQL database
I'm loving the fact that while I am doing some one-time updates to the
DB, users can still SELECT away to glory. This is a major boon in
comparison to my experience with another major opensource database.
However, I am a little frustrated by the amount of time PGSQL takes to
complete tasks. Just to accommodate these tasks, my conf file has the
following:
autovacuum = off
wal_buffers=64
checkpoint_segments=1000
checkpoint_timeout=900
fsync = off
maintenance_work_mem = 128MB
[PS: I will enable fsync after these operations, and decrease the
checkpoint_segments.]
I have dropped all indexes/indicises on my table, except for the
primary key. Still, when I run the query:
UPDATE mytable SET mycolumn = lower(mycolumn);
This is, at the time of this writing, has taken well over 35 minutes!
On a table of a "mere" 6 million rows (quoted from one discussion on
this mailing list).
I am on a 4GB RAM machine with two Intel Dual Core processors. Albeit
this is not a dedicated db server, another comparable FOSS database
never took these kinds of times to perform its operations.
Suspecting that locking may be the cause of this, I read up on
http://www.postgresql.org/docs/8.2/static/explicit-locking.html and
found nothing specific that would help a person starting out on the DB
to actually do meaningful explicit locking that the UPDATE command
does not already do.
I am now trying doing something like
UPDATE mytable SET mycolumn = lower(mycolumn)
WHERE id BETWEEN x AND y ;
This is way too laborious and untenable because I want to put the
fsync back on as soon as possible; this is a production database!
What else can I do to make this go fast enough to be normal!? Penny
for any thoughts and tips.
On Fri, Aug 17, 2007 at 07:49:08PM +0800, Phoenix Kiula wrote:
However, I am a little frustrated by the amount of time PGSQL takes to
complete tasks. Just to accommodate these tasks, my conf file has the
following:autovacuum = off
wal_buffers=64
checkpoint_segments=1000
checkpoint_timeout=900
fsync = off
maintenance_work_mem = 128MB
I note you did not give either the shared_buffers setting or the
work_mem setting, both of which would have a major impact on your
problem.
I have dropped all indexes/indicises on my table, except for the
primary key. Still, when I run the query:UPDATE mytable SET mycolumn = lower(mycolumn);
This is, at the time of this writing, has taken well over 35 minutes!
On a table of a "mere" 6 million rows (quoted from one discussion on
this mailing list).
How big is the actual table itself (in bytes).
Suspecting that locking may be the cause of this, I read up on
http://www.postgresql.org/docs/8.2/static/explicit-locking.html and
found nothing specific that would help a person starting out on the DB
to actually do meaningful explicit locking that the UPDATE command
does not already do.
I doubt it's locking.
What else can I do to make this go fast enough to be normal!? Penny
for any thoughts and tips.
Check your I/O throughput. You should be maxing out your disk...
Although, with 6million rows, it might even fit in memory. Can you see
(in ps) what it's actually doing?
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
From each according to his ability. To each according to his ability to litigate.
On 17/08/07, Martijn van Oosterhout <kleptog@svana.org> wrote:
On Fri, Aug 17, 2007 at 07:49:08PM +0800, Phoenix Kiula wrote:
However, I am a little frustrated by the amount of time PGSQL takes to
complete tasks. Just to accommodate these tasks, my conf file has the
following:autovacuum = off
wal_buffers=64
checkpoint_segments=1000
checkpoint_timeout=900
fsync = off
maintenance_work_mem = 128MBI note you did not give either the shared_buffers setting or the
work_mem setting, both of which would have a major impact on your
problem.
Oh, sorry, here are the rest of them:
max_connections = 250
shared_buffers = 60000
effective_cache_size = 60000
work_mem = 60000
max_fsm_relations = 1500
max_fsm_pages = 80000
sort_mem = 16348
temp_buffers = 4096
authentication_timeout = 10s
ssl = off
Are these too low?
How big is the actual table itself (in bytes).
Where should I find this?
I tried "/usr/lib/pgsql/data" and "/var/lib/pgsql/data" but there is
no folder that seems to show the data files.
TIA!
On Fri, Aug 17, 2007 at 07:49:08PM +0800, Phoenix Kiula wrote:
I have dropped all indexes/indicises on my table, except for the
primary key. Still, when I run the query:
UPDATE mytable SET mycolumn = lower(mycolumn);
can you please check this:
select count(*) from mytable;
select count(*) from mytable where mycolumn ~ '[A-Z]';
and if the second is lower than first make the update:
update mytable set mycolumn = lower(mycolumn) where mycolumn ~ '[A-Z]';
of course if your data contain national characters you have to include
them (upper case only) in this regexp.
depesz
--
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA. here's my CV!" :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)
On Fri, Aug 17, 2007 at 09:50:42PM +0800, Phoenix Kiula wrote:
How big is the actual table itself (in bytes).
Where should I find this?
select pg_relation_size('mytable');
depesz
--
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA. here's my CV!" :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)
On 17/08/07, hubert depesz lubaczewski <depesz@depesz.com> wrote:
On Fri, Aug 17, 2007 at 07:49:08PM +0800, Phoenix Kiula wrote:
I have dropped all indexes/indicises on my table, except for the
primary key. Still, when I run the query:
UPDATE mytable SET mycolumn = lower(mycolumn);can you please check this:
select count(*) from mytable;
select count(*) from mytable where mycolumn ~ '[A-Z]';and if the second is lower than first make the update:
update mytable set mycolumn = lower(mycolumn) where mycolumn ~ '[A-Z]';of course if your data contain national characters you have to include
them (upper case only) in this regexp.
Wow, smartest advice of the day! Yes, a lot of our data in that column
has dots and numbers (800,000 compared to 6 million), so I wanted to
get only to the stuff that was pure alphabets, but just didn't think
of how.
[Slithers away to get dunce cap].
Thanks much!
On Fri, Aug 17, 2007 at 10:22:55PM +0800, Phoenix Kiula wrote:
Wow, smartest advice of the day! Yes, a lot of our data in that column
has dots and numbers (800,000 compared to 6 million), so I wanted to
get only to the stuff that was pure alphabets, but just didn't think
of how.
what i really meant was: check how many record will actually be modified
- i.e. contain upper-case letters.
the fact that some records contain only dots and digits is nice, but i
suspect that you also have a lot of record with only lower-case letters
in the field. or am i wrong?
depesz
--
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA. here's my CV!" :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)
On 17/08/07, hubert depesz lubaczewski <depesz@depesz.com> wrote:
On Fri, Aug 17, 2007 at 10:22:55PM +0800, Phoenix Kiula wrote:
Wow, smartest advice of the day! Yes, a lot of our data in that column
has dots and numbers (800,000 compared to 6 million), so I wanted to
get only to the stuff that was pure alphabets, but just didn't think
of how.what i really meant was: check how many record will actually be modified
- i.e. contain upper-case letters.the fact that some records contain only dots and digits is nice, but i
suspect that you also have a lot of record with only lower-case letters
in the field. or am i wrong?
You guessed it right. The query makes perfect sense. Only update those
records that have uppercase, and therefore require updating! Many
thanks.