While we're on the subject of searches...

Started by Vince Vielhaber · about 25 years ago · 10 messages · hackers, general
#1 Vince Vielhaber
vev@michvhf.com
hackers, general

Over the past few months there've been a number of requests for an
interactive-style documentation setup like the folks at php.net have.
The first version of it is now online and ready for testing. You can
also search the docs, but the search isn't that exotic - but since
there are fewer than 500 pages of documentation the crude ILIKE search
I'm doing will suffice for now. So check it out, beat it up, and if it
seems to work ok I'll move it to the main site and clear out the notes
that are currently in it (when you read them you'll know why). They're
available at:

http://odbc.postgresql.org/docs/

Vince.
--
==========================================================================
Vince Vielhaber -- KA8CSH email: vev@michvhf.com http://www.pop4.net
56K Nationwide Dialup from $16.00/mo at Pop4 Networking
Online Campground Directory http://www.camping-usa.com
Online Giftshop Superstore http://www.cloudninegifts.com
==========================================================================

#2 webb sprague
wsprague@o1.com
In reply to: Vince Vielhaber (#1)
hackers, general
WAL Log using all my disk space!

Hi all,

The problem:

I do a large bulk copy once a day (100,000 records of Radius data),
tearing down indices, truncating a large table that contains summary
information, and rebuilding everything after the copy. Over the course
of this operation, I can generate up to 1.5 gigs of WAL data in
pg_xlog. Sometimes (like just now), I will run out of disk space and
the postmaster will crash. I try to restart it, and it errors out.
Then I delete all the WAL logs, try to restart, and (surprise) it errors
out again.

I tried to set some of the WAL parameters in postgresql.conf like
so:

wal_buffers = 4              # min 4
wal_files = 8                # range 0-64
wal_sync_method = fdatasync  # fsync or fdatasync or open_sync or open_datasync

but I get 24+ separate files.

I would like to recover without an initdb, but if that isn't possible, I
would definitely like to avoid this problem in the future.

Thanks to all

#3 Brook Milligan
brook@biology.nmsu.edu
In reply to: Vince Vielhaber (#1)
hackers, general
Re: [HACKERS] While we're on the subject of searches...

> Over the past few months there've been a number of requests for an
> interactive type documentation setup like the folks at php.net have.

Great to add to the documentation, but I hope the PostgreSQL project
doesn't take it so far as to make the primary documentation
interactive. A well-thought-out, coherent document is _much_ more
useful than the scads of random tips that characterize some other
projects. The current document is very well written (though perhaps
incomplete). I would hate to see that decline in quality.

Cheers,
Brook

#4 Mikheev, Vadim
vmikheev@SECTORBASE.COM
In reply to: webb sprague (#2)
general
RE: WAL Log using all my disk space!

> I do a large bulk copy once a day (100,000 records of Radius data),
> tearing down indices, truncating a large table that contains summary
> information, and rebuilding everything after the copy. Over the course
> of this operation, I can generate up to 1.5 gigs of WAL data in
> pg_xlog. Sometimes (like just now), I will run out of disk space and
> the postmaster will crash. I try to restart it, and it errors out.
> Then I delete all the WAL logs, try to restart, and (surprise) it errors
> out again.

(Removing WAL logs is not a good idea.)

> I tried to set some of the WAL parameters in postgresql.conf like
> so:
>
> wal_buffers = 4 # min 4

(More is better and doesn't affect disk space usage).

> wal_files = 8 # range 0-64

Oops. With wal_files > 0 the server pre-allocates log files in advance!
It should be used only if disk space is not a problem...

> I would like to recover without an initdb, but if that isn't
> possible, I would definitely like to avoid this problem in the
> future.

So, are you able to restart? If not - send us the server startup log.
You should be able to remove some of the preallocated wal_files, but
I need the numbers from the server log to say what is safe to remove.

Now, how to reduce disk space usage.
First thing to keep in mind - the server removes old (useless) log files
at checkpoint time. Second - a log file becomes useless (from the
transaction subsystem's POV) if it holds no record from any running
transaction. Third - unfortunately (from my POV), we require two
checkpoints in the log files now, so we do not remove files with records
between the last two checkpoints.
Recommendation: try to split your bulk operation into a few transactions
with smaller write traffic and run CHECKPOINT commands between them.
You could also try changing the checkpoint_segments and/or
checkpoint_timeout params, but IMHO an explicit CHECKPOINT is better for
bulk ops, because it will not affect normal operations.
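Concretely, the recommendation might look something like this - a hypothetical sketch of the nightly load, with CHECKPOINTs between the chunks so the server can recycle old log files (all table, index, and file names are invented):

```sql
-- Hypothetical nightly bulk load, split so CHECKPOINTs can recycle WAL files.
DROP INDEX radius_log_stamp_idx;      -- tear down indices before the copy
TRUNCATE radius_log_summary;          -- clear the summary table
CHECKPOINT;                           -- old log files can be recycled here

COPY radius_log FROM '/var/tmp/radius_batch1.dat';
CHECKPOINT;                           -- ...and here, between smaller chunks
COPY radius_log FROM '/var/tmp/radius_batch2.dat';
CHECKPOINT;

CREATE INDEX radius_log_stamp_idx ON radius_log (stamp);
INSERT INTO radius_log_summary
    SELECT username, count(*) FROM radius_log GROUP BY username;
CHECKPOINT;
```

The point is that each CHECKPOINT bounds how much of the 1.5 GB of WAL must stay on disk at once, at the price of extra I/O at each checkpoint.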

Vadim

#5 Vince Vielhaber
vev@michvhf.com
In reply to: Brook Milligan (#3)
hackers, general
Re: [HACKERS] While we're on the subject of searches...

On Fri, 27 Apr 2001, Brook Milligan wrote:

> > Over the past few months there've been a number of requests for an
> > interactive type documentation setup like the folks at php.net have.
>
> Great to add to the documentation, but I hope the PostgreSQL project
> doesn't take it so far as to make the primary documentation
> interactive. A well-thought-out, coherent document is _much_ more
> useful than the scads of random tips that characterize some other
> projects. The current document is very well written (though perhaps
> incomplete). I would hate to see that decline in quality.

I wouldn't want that either. If anything I'd like to see some of the
tips and/or clarifications put into the regular docs. Something that
may appear well thought out and clear to many may still be confusing
to others. If this doc enhancement can help someone writing the docs
make them clearer, then it's definitely desirable.

Vince.

#6 Marcin Mazurek
M.Mazurek@poznan.multinet.pl
In reply to: Vince Vielhaber (#1)
hackers, general
Re: While we're on the subject of searches...

Vince Vielhaber (vev@michvhf.com) wrote:

> that are currently in it (when you read them you'll know why). They're
> available at:
>
> http://odbc.postgresql.org/docs/

Were you thinking about connecting it with some mailing list - a kind of
"notes-announce"? It might be interesting to read all new notes to the pg docs.

mazek

--
Don't tell me how hard you work. Tell me how much you get done.
-- James J. Ling

#7 webb sprague
wsprague@o1.com
In reply to: Mikheev, Vadim (#4)
general
Re: WAL Log using all my disk space!

I just put a CHECKPOINT command on both sides of my COPY statement, and
it seemed to go quite a bit faster and not give me any problems. Does
that make sense?
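In other words, presumably something like this (the table and file names are invented):

```sql
CHECKPOINT;                                        -- recycle old WAL first
COPY radius_log FROM '/var/tmp/radius_today.dat';  -- the daily bulk load
CHECKPOINT;                                        -- and again after the COPY
```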

Thanks
W


#8 Vince Vielhaber
vev@michvhf.com
In reply to: Marcin Mazurek (#6)
hackers, general
Re: While we're on the subject of searches...

On Fri, 27 Apr 2001, Marcin Mazurek wrote:

> Vince Vielhaber (vev@michvhf.com) wrote:
>
> > that are currently in it (when you read them you'll know why). They're
> > available at:
> >
> > http://odbc.postgresql.org/docs/
>
> Were you thinking about connecting it with some mailing list - a kind of
> "notes-announce"? It might be interesting to read all new notes to the pg docs.

Can't say that I had. It's food for thought tho.

Vince.

#9 Mikheev, Vadim
vmikheev@SECTORBASE.COM
In reply to: webb sprague (#7)
general
RE: WAL Log using all my disk space!

> I just put a CHECKPOINT command on both sides of my COPY
> statement, and it seemed to go quite a bit faster and not

Hmm, why would COPY be faster just after a CHECKPOINT?
I assume this was caused by some fortuity (like preparing a
new log file at checkpoint time - the server does this even if
wal_files = 0, when the current log file is 75% full).

> give me any problems. Does that make sense?

Of course. Also remember:

> Third - unfortunately (from my POV), we require two checkpoints in the
> log files now, so we do not remove files with records between the last
> two checkpoints.

Some additional CHECKPOINTs may be required.

BUT, I forgot about a fourth point - the server logs *entire* data pages
on the first modification of a page after the last checkpoint. This
doesn't affect COPY-ed tables (when the first row goes to a new page,
just this row will be logged) but may affect indices and UPDATE/DELETE
ops.

So, one should be very careful when using CHECKPOINTs.

Vadim

#10 webb sprague
wsprague@o1.com
In reply to: Mikheev, Vadim (#9)
general
Diagnostic

I am trying to determine how many seconds a query takes, and I can't
figure it out. I seem to remember some weird, mostly undocumented query
that went like:

foo# get diagnostics select processed into <int4 variable>;
foo# show <int4 variable>;

But this doesn't work, and I think it may only work in a plpgsql function
anyway.
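For what it's worth, the half-remembered syntax looks like plpgsql's GET DIAGNOSTICS, which indeed only works inside a plpgsql function body - and it reports a row count, not a timing. A sketch (the function, table, and column names are invented, and it assumes the plpgsql language is installed in the database):

```sql
-- Hypothetical: capture the row count of the last statement in plpgsql.
CREATE FUNCTION mark_seen() RETURNS integer AS '
DECLARE
    n integer;
BEGIN
    UPDATE radius_log SET seen = true;   -- the statement being measured
    GET DIAGNOSTICS n = ROW_COUNT;       -- rows processed by last statement
    RETURN n;
END;
' LANGUAGE 'plpgsql';

-- Rough wall-clock timing from psql: read the clock before and after.
SELECT timeofday();
SELECT count(*) FROM radius_log;
SELECT timeofday();
```

Subtracting the two timeofday() readings gives an approximate elapsed time, which can then be compared across the candidate queries against EXPLAIN's cost estimates.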

I know how to use EXPLAIN, but I want to see if postgres is correct about
which query is fastest.

Does anybody know how to find this info out?

Thx