Some frustrations with admin tasks on PGSQL database

Started by Phoenix Kiulaover 18 years ago8 messagesgeneral

phoenix.kiula@gmail.com

over 18 years ago

I'm loving the fact that while I am doing some one-time updates to the
DB, users can still SELECT away to glory. This is a major boon in
comparison to my experience with another major opensource database.

However, I am a little frustrated by the amount of time PGSQL takes to
complete tasks. Just to accommodate these tasks, my conf file has the
following:

autovacuum = off
wal_buffers=64
checkpoint_segments=1000
checkpoint_timeout=900
fsync = off
maintenance_work_mem = 128MB

[PS: I will enable fsync after these operations, and decrease the
checkpoint_segments.]

I have dropped all indexes/indicises on my table, except for the
primary key. Still, when I run the query:

UPDATE mytable SET mycolumn = lower(mycolumn);

This is, at the time of this writing, has taken well over 35 minutes!
On a table of a "mere" 6 million rows (quoted from one discussion on
this mailing list).

I am on a 4GB RAM machine with two Intel Dual Core processors. Albeit
this is not a dedicated db server, another comparable FOSS database
never took these kinds of times to perform its operations.

Suspecting that locking may be the cause of this, I read up on
http://www.postgresql.org/docs/8.2/static/explicit-locking.html and
found nothing specific that would help a person starting out on the DB
to actually do meaningful explicit locking that the UPDATE command
does not already do.

I am now trying doing something like

UPDATE mytable SET mycolumn = lower(mycolumn)
WHERE id BETWEEN x AND y ;

This is way too laborious and untenable because I want to put the
fsync back on as soon as possible; this is a production database!

What else can I do to make this go fast enough to be normal!? Penny
for any thoughts and tips.

Martijn van Oosterhout

kleptog@svana.org

over 18 years ago

In reply to: Phoenix Kiula (#1)

Re: Some frustrations with admin tasks on PGSQL database

On Fri, Aug 17, 2007 at 07:49:08PM +0800, Phoenix Kiula wrote:

However, I am a little frustrated by the amount of time PGSQL takes to
complete tasks. Just to accommodate these tasks, my conf file has the
following:

autovacuum = off
wal_buffers=64
checkpoint_segments=1000
checkpoint_timeout=900
fsync = off
maintenance_work_mem = 128MB

I note you did not give either the shared_buffers setting or the
work_mem setting, both of which would have a major impact on your
problem.

I have dropped all indexes/indicises on my table, except for the
primary key. Still, when I run the query:

UPDATE mytable SET mycolumn = lower(mycolumn);

This is, at the time of this writing, has taken well over 35 minutes!
On a table of a "mere" 6 million rows (quoted from one discussion on
this mailing list).

How big is the actual table itself (in bytes).

Suspecting that locking may be the cause of this, I read up on
http://www.postgresql.org/docs/8.2/static/explicit-locking.html and
found nothing specific that would help a person starting out on the DB
to actually do meaningful explicit locking that the UPDATE command
does not already do.

I doubt it's locking.

What else can I do to make this go fast enough to be normal!? Penny
for any thoughts and tips.

Check your I/O throughput. You should be maxing out your disk...
Although, with 6million rows, it might even fit in memory. Can you see
(in ps) what it's actually doing?

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

From each according to his ability. To each according to his ability to litigate.

Phoenix Kiula

phoenix.kiula@gmail.com

over 18 years ago

In reply to: Martijn van Oosterhout (#2)

Re: Some frustrations with admin tasks on PGSQL database

On 17/08/07, Martijn van Oosterhout <kleptog@svana.org> wrote:

On Fri, Aug 17, 2007 at 07:49:08PM +0800, Phoenix Kiula wrote:

However, I am a little frustrated by the amount of time PGSQL takes to
complete tasks. Just to accommodate these tasks, my conf file has the
following:

autovacuum = off
wal_buffers=64
checkpoint_segments=1000
checkpoint_timeout=900
fsync = off
maintenance_work_mem = 128MB

I note you did not give either the shared_buffers setting or the
work_mem setting, both of which would have a major impact on your
problem.

Oh, sorry, here are the rest of them:

max_connections = 250
shared_buffers = 60000
effective_cache_size = 60000
work_mem = 60000
max_fsm_relations = 1500
max_fsm_pages = 80000
sort_mem = 16348
temp_buffers = 4096
authentication_timeout = 10s
ssl = off

Are these too low?

How big is the actual table itself (in bytes).

Where should I find this?

I tried "/usr/lib/pgsql/data" and "/var/lib/pgsql/data" but there is
no folder that seems to show the data files.

TIA!

hubert depesz lubaczewski

depesz@depesz.com

over 18 years ago

In reply to: Phoenix Kiula (#1)

Re: Some frustrations with admin tasks on PGSQL database

On Fri, Aug 17, 2007 at 07:49:08PM +0800, Phoenix Kiula wrote:

I have dropped all indexes/indicises on my table, except for the
primary key. Still, when I run the query:
UPDATE mytable SET mycolumn = lower(mycolumn);

can you please check this:

select count(*) from mytable;
select count(*) from mytable where mycolumn ~ '[A-Z]';

and if the second is lower than first make the update:
update mytable set mycolumn = lower(mycolumn) where mycolumn ~ '[A-Z]';

of course if your data contain national characters you have to include
them (upper case only) in this regexp.

depesz

--
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA. here's my CV!" :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)

hubert depesz lubaczewski

depesz@depesz.com

over 18 years ago

In reply to: Phoenix Kiula (#3)

Re: Some frustrations with admin tasks on PGSQL database

On Fri, Aug 17, 2007 at 09:50:42PM +0800, Phoenix Kiula wrote:

How big is the actual table itself (in bytes).

Where should I find this?

select pg_relation_size('mytable');

depesz

--
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA. here's my CV!" :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)

Phoenix Kiula

phoenix.kiula@gmail.com

over 18 years ago

In reply to: hubert depesz lubaczewski (#4)

Re: Some frustrations with admin tasks on PGSQL database

On 17/08/07, hubert depesz lubaczewski <depesz@depesz.com> wrote:

On Fri, Aug 17, 2007 at 07:49:08PM +0800, Phoenix Kiula wrote:

I have dropped all indexes/indicises on my table, except for the
primary key. Still, when I run the query:
UPDATE mytable SET mycolumn = lower(mycolumn);

can you please check this:

select count(*) from mytable;
select count(*) from mytable where mycolumn ~ '[A-Z]';

and if the second is lower than first make the update:
update mytable set mycolumn = lower(mycolumn) where mycolumn ~ '[A-Z]';

of course if your data contain national characters you have to include
them (upper case only) in this regexp.

Wow, smartest advice of the day! Yes, a lot of our data in that column
has dots and numbers (800,000 compared to 6 million), so I wanted to
get only to the stuff that was pure alphabets, but just didn't think
of how.

[Slithers away to get dunce cap].

Thanks much!

hubert depesz lubaczewski

depesz@depesz.com

over 18 years ago

In reply to: Phoenix Kiula (#6)

Re: Some frustrations with admin tasks on PGSQL database

On Fri, Aug 17, 2007 at 10:22:55PM +0800, Phoenix Kiula wrote:

Wow, smartest advice of the day! Yes, a lot of our data in that column
has dots and numbers (800,000 compared to 6 million), so I wanted to
get only to the stuff that was pure alphabets, but just didn't think
of how.

what i really meant was: check how many record will actually be modified
- i.e. contain upper-case letters.

the fact that some records contain only dots and digits is nice, but i
suspect that you also have a lot of record with only lower-case letters
in the field. or am i wrong?

depesz

--
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA. here's my CV!" :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)

Phoenix Kiula

phoenix.kiula@gmail.com

over 18 years ago

In reply to: hubert depesz lubaczewski (#7)

Re: Some frustrations with admin tasks on PGSQL database

On 17/08/07, hubert depesz lubaczewski <depesz@depesz.com> wrote:

On Fri, Aug 17, 2007 at 10:22:55PM +0800, Phoenix Kiula wrote:

Wow, smartest advice of the day! Yes, a lot of our data in that column
has dots and numbers (800,000 compared to 6 million), so I wanted to
get only to the stuff that was pure alphabets, but just didn't think
of how.

what i really meant was: check how many record will actually be modified
- i.e. contain upper-case letters.

the fact that some records contain only dots and digits is nice, but i
suspect that you also have a lot of record with only lower-case letters
in the field. or am i wrong?

You guessed it right. The query makes perfect sense. Only update those
records that have uppercase, and therefore require updating! Many
thanks.