Full page writes improvement, code update

Started by Koichi Suzuki, almost 19 years ago, 69 messages
#1Koichi Suzuki
suzuki.koichi@oss.ntt.co.jp
1 attachment(s)

Hi,

Here's an update of the code to improve full page writes, as proposed in

http://archives.postgresql.org/pgsql-hackers/2007-01/msg01491.php
and
http://archives.postgresql.org/pgsql-patches/2007-01/msg00607.php

The update includes some modifications to error handling in the archiver
and restoration commands.

In the previous threads, I posted several evaluations and showed that we
can keep all the full page writes needed for full XLOG crash recovery,
while compressing the archive log considerably better than gzip, with
less CPU consumption. I've seen no further objections to this proposal,
but I would still like to hear comments, opinions, and advice.

Regards;

--
Koichi Suzuki

Attachments:

pg_lesslog.tgz (application/octet-stream)
#2Simon Riggs
simon@2ndquadrant.com
In reply to: Koichi Suzuki (#1)
Re: Full page writes improvement, code update

On Tue, 2007-03-27 at 11:52 +0900, Koichi Suzuki wrote:

Here's an update of the code to improve full page writes, as proposed in

http://archives.postgresql.org/pgsql-hackers/2007-01/msg01491.php
and
http://archives.postgresql.org/pgsql-patches/2007-01/msg00607.php

The update includes some modifications to error handling in the archiver
and restoration commands.

In the previous threads, I posted several evaluations and showed that we
can keep all the full page writes needed for full XLOG crash recovery,
while compressing the archive log considerably better than gzip, with
less CPU consumption. I've seen no further objections to this proposal,
but I would still like to hear comments, opinions, and advice.

Koichi-san,

Looks interesting. I like the small amount of code to do this.

A few thoughts:

- Not sure why we need "full_page_compress", why not just mark them
always? That harms no one. (Did someone else ask for that? If so, keep
it)

- OTOH I'd like to see an explicit parameter set during recovery since
you're asking the main recovery path to act differently in case a single
bit is set/unset. If you are using that form of recovery, we should say
so explicitly, to keep everybody else safe.

- I'd rather mark just the nonremovable blocks. But no real difference

- We definitely don't want a normal elog in an XLogInsert critical
section, especially at DEBUG1 level

- diff -c format is easier and the standard

- pg_logarchive and pg_logrestore should be called by a name that
reflects what they actually do. Possibly pg_compresslog and
pg_decompresslog etc.. I've not reviewed those programs, but:

- Not sure why we have to compress away page headers. Touch as little as
you can has always been my thinking with recovery code.

- I'm very uncomfortable with touching the LSN. Maybe I misunderstand?

- Have you thought about how pg_standby would integrate with this
option? Can you please?

- I'll do some docs for this after Freeze, if you'd like. I have some
other changes to make there, so I can do this at the same time.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#3Koichi Suzuki
suzuki.koichi@oss.ntt.co.jp
In reply to: Simon Riggs (#2)
Re: [PATCHES] Full page writes improvement, code update

Simon;

Thanks a lot for your comments and advice. Here is some feedback.

Simon Riggs wrote:

On Tue, 2007-03-27 at 11:52 +0900, Koichi Suzuki wrote:

Here's an update of the code to improve full page writes, as proposed in

http://archives.postgresql.org/pgsql-hackers/2007-01/msg01491.php
and
http://archives.postgresql.org/pgsql-patches/2007-01/msg00607.php

The update includes some modifications to error handling in the archiver
and restoration commands.

In the previous threads, I posted several evaluations and showed that we
can keep all the full page writes needed for full XLOG crash recovery,
while compressing the archive log considerably better than gzip, with
less CPU consumption. I've seen no further objections to this proposal,
but I would still like to hear comments, opinions, and advice.

Koichi-san,

Looks interesting. I like the small amount of code to do this.

A few thoughts:

- Not sure why we need "full_page_compress", why not just mark them
always? That harms no one. (Did someone else ask for that? If so, keep
it)

No, no one asked for a separate option. Marking them always would do
no harm. As described below, full page writes can be
categorized as follows:

1) Needed for crash recovery: first page update after each checkpoint.
This has to be kept in WAL.

2) Needed for archive recovery: page update between pg_start_backup and
pg_stop_backup. This has to be kept in archive log.

3) For log-shipping slave such as pg_standby: no full page writes will
be needed for this purpose.

My proposal deals with 2). So, if we mark each "full_page_write", I'd
rather mark when this is needed. Still need only one bit because the
case 3) does not need any mark.
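The categories above can be sketched as a small model. This is only an illustration under stated assumptions: the `FullPageWrite` type, its fields, and the function names are hypothetical and are not taken from the patch; the point is only that a single bit is enough to tell pg_compresslog which full page writes must stay in the archive log.

```python
from dataclasses import dataclass
from enum import Enum


class Purpose(Enum):
    CRASH_RECOVERY = 1    # first page update after a checkpoint
    ARCHIVE_RECOVERY = 2  # page update between pg_start_backup and pg_stop_backup


@dataclass(frozen=True)
class FullPageWrite:
    # Hypothetical record with just enough fields to express the categories.
    block: int
    written_during_backup: bool


def purpose(fpw: FullPageWrite) -> Purpose:
    """Classify a full page write into category 1) or 2) from the list above."""
    if fpw.written_during_backup:
        return Purpose.ARCHIVE_RECOVERY
    return Purpose.CRASH_RECOVERY


def is_marked(fpw: FullPageWrite) -> bool:
    """The single mark bit: set only when archive recovery needs the page."""
    return purpose(fpw) is Purpose.ARCHIVE_RECOVERY


def keep_in_archive(fpw: FullPageWrite) -> bool:
    """Unmarked full page writes are the ones that may be dropped offline."""
    return is_marked(fpw)
```

Category 3) needs no entry of its own in this model, because a log-shipping standby does not need any full page write to be kept.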

- OTOH I'd like to see an explicit parameter set during recovery since
you're asking the main recovery path to act differently in case a single
bit is set/unset. If you are using that form of recovery, we should say
so explicitly, to keep everybody else safe.

The only thing I had to do was to create a "dummy" full page write to
maintain LSNs. Full page writes are omitted from the archive log, but we
have to keep LSNs the same as those in the original WAL. In this case,
recovery has to read the logical log, not the "dummy" full page writes.
On the other hand, if both the logical log and "real" full page writes
are found in a log record, recovery has to use the "real" full page writes.

- I'd rather mark just the nonremovable blocks. But no real difference

It sounds nicer. According to the full page write categories above, we
can mark full page writes as needed in "crash recovery" or "archive
recovery". Please give some feedback on the above full page write
categories.

- We definitely don't want a normal elog in an XLogInsert critical
section, especially at DEBUG1 level

I agree. I'll remove elog calls from critical sections.

- diff -c format is easier and the standard

I'll change the diff option.

- pg_logarchive and pg_logrestore should be called by a name that
reflects what they actually do. Possibly pg_compresslog and
pg_decompresslog etc.. I've not reviewed those programs, but:

I wasn't careful to choose command names that reflect what the commands
do. I'll change the command names.

- Not sure why we have to compress away page headers. Touch as little as
you can has always been my thinking with recovery code.

Eliminating page headers improves the compression rate slightly, by a
couple of percent. To make the code simpler, I'll drop this compression.

- I'm very uncomfortable with touching the LSN. Maybe I misunderstand?

The reason why we need pg_logarchive (or pg_decompresslog) is to
keep LSNs the same as those in the original WAL. For this purpose,
we need "dummy" full page writes. I don't like to touch LSNs either, and
this is the reason why pg_decompresslog needs some extra work, which
avoids many other changes in the recovery code.

- Have you thought about how pg_standby would integrate with this
option? Can you please?

Yes I believe so. As pg_standby does not include any chance to meet
partial writes of pages, I believe you can omit all the full page
writes. Of course, as Tom Lane suggested in
http://archives.postgresql.org/pgsql-hackers/2007-02/msg00034.php
removing full page writes can lose a chance to recover from
partial/inconsistent writes in the crash of pg_standby. In this case,
we have to import a backup and archive logs (with full page writes
during the backup) to recover. (We have to import them when the file
system crashes anyway). If it's okay, I believe
pg_compresslog/pg_decompresslog can be integrated with pg_standby.

Maybe we can work together to include pg_compresslog/pg_decompresslog in
pg_standby.

- I'll do some docs for this after Freeze, if you'd like. I have some
other changes to make there, so I can do this at the same time.

I'll be looking forward to your writings.

Best regards;

--
Koichi Suzuki

#4Simon Riggs
simon@2ndquadrant.com
In reply to: Koichi Suzuki (#3)
Re: [PATCHES] Full page writes improvement, code update

On Wed, 2007-03-28 at 10:54 +0900, Koichi Suzuki wrote:

As described below, full page writes can be
categorized as follows:

1) Needed for crash recovery: first page update after each checkpoint.
This has to be kept in WAL.

2) Needed for archive recovery: page update between pg_start_backup and
pg_stop_backup. This has to be kept in archive log.

3) For log-shipping slave such as pg_standby: no full page writes will
be needed for this purpose.

My proposal deals with 2). So, if we mark each "full_page_write", I'd
rather mark when this is needed. Still need only one bit because the
case 3) does not need any mark.

I'm very happy with this proposal, though I do still have some points in
detailed areas.

If you accept that 1 & 2 are valid goals, then 1 & 3 or 1, 2 & 3 are
also valid goals, ISTM. i.e. you might choose to use full_page_writes on
the primary and yet would like to see optimised data transfer to the
standby server. In that case, you would need the mark.

- Not sure why we need "full_page_compress", why not just mark them
always? That harms no one. (Did someone else ask for that? If so, keep
it)

No, no one asked to have a separate option. There'll be no bad
influence to do so. So, if we mark each "full_page_write", I'd
rather mark when this is needed. Still need only one bit because the
case 3) does not need any mark.

OK, different question:
Why would anyone ever set full_page_compress = off?

Why have a parameter that does so little? ISTM this is:

i) one more thing to get wrong

ii) cheaper to mark the block when appropriate than to perform the if()
test each time. That can be done only in the path where backup blocks
are present.

iii) If we mark the blocks every time, it allows us to do an offline WAL
compression. If the blocks aren't marked that option is lost. The bit is
useful information, so we should have it in all cases.

- OTOH I'd like to see an explicit parameter set during recovery since
you're asking the main recovery path to act differently in case a single
bit is set/unset. If you are using that form of recovery, we should say
so explicitly, to keep everybody else safe.

The only thing I had to do was to create a "dummy" full page write to
maintain LSNs. Full page writes are omitted from the archive log, but we
have to keep LSNs the same as those in the original WAL. In this case,
recovery has to read the logical log, not the "dummy" full page writes.
On the other hand, if both the logical log and "real" full page writes
are found in a log record, recovery has to use the "real" full page writes.

I apologise for not understanding your reply, perhaps my original
request was unclear.

In recovery.conf, I'd like to see a parameter such as

dummy_backup_blocks = off (default) | on

to explicitly indicate to the recovery process that backup blocks are
present, yet they are garbage and should be ignored. Having garbage data
within the system is potentially dangerous and I want to be told by the
user that they were expecting that and it's OK to ignore that data.
Otherwise I want to throw informative errors. Maybe it seems OK now, but
the next change to the system may have unintended consequences and it
may not be us making the change. "It's OK the Alien will never escape
from the lab" is the starting premise for many good sci-fi horrors and I
want to watch them, not be in one myself. :-)

We can call it other things, of course. e.g.
ignore_dummy_blocks
decompressed_blocks
apply_backup_blocks
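The requested behaviour can be sketched as a guard in the redo path. This is a toy model, not PostgreSQL code: `redo_source`, `RecoveryError`, and the boolean arguments are hypothetical names, and `dummy_backup_blocks` stands in for the proposed recovery.conf parameter.

```python
class RecoveryError(Exception):
    """Raised when recovery meets data it was not told to expect."""


def redo_source(has_backup_block: bool, block_is_dummy: bool,
                dummy_backup_blocks: bool = False) -> str:
    """Decide what to replay for one WAL record.

    dummy_backup_blocks mirrors the proposed parameter (default off).
    """
    if not has_backup_block:
        # Ordinary record: replay the logical log.
        return "logical"
    if not block_is_dummy:
        # A real full page write always wins over the logical log.
        return "backup_block"
    if dummy_backup_blocks:
        # The user said dummy blocks are expected: ignore them.
        return "logical"
    # Dummy data that nobody announced: fail loudly instead of guessing.
    raise RecoveryError("dummy backup block found but dummy_backup_blocks = off")
```

With the parameter left at its default, a dummy block produces an informative error rather than being silently skipped, which is the safety property asked for above.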

Yes I believe so. As pg_standby does not include any chance to meet
partial writes of pages, I believe you can omit all the full page
writes. Of course, as Tom Lane suggested in
http://archives.postgresql.org/pgsql-hackers/2007-02/msg00034.php
removing full page writes can lose a chance to recover from
partial/inconsistent writes in the crash of pg_standby. In this case,
we have to import a backup and archive logs (with full page writes
during the backup) to recover. (We have to import them when the file
system crashes anyway). If it's okay, I believe
pg_compresslog/pg_decompresslog can be integrated with pg_standby.

Maybe we can work together to include pg_compresslog/pg_decompresslog in
pg_standby.

ISTM there are two options.

I think this option is already possible:

1. Allow pg_decompresslog to operate on a file, replacing it with the
expanded form, like gunzip, so we would do this:
restore_command = 'pg_standby %f decomp.tmp && pg_decompresslog
decomp.tmp %p'

though the decomp.tmp file would not get properly initialised or cleaned
up when we finish.

whereas this will take additional work

2. Allow pg_standby to write to stdout, so that we can do this:
restore_command = 'pg_standby %f | pg_decompresslog - %p'

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#5Koichi Suzuki
suzuki.koichi@oss.ntt.co.jp
In reply to: Simon Riggs (#4)
Re: [PATCHES] Full page writes improvement, code update

Hi, here is some feedback on your comments:

Simon Riggs wrote:

On Wed, 2007-03-28 at 10:54 +0900, Koichi Suzuki wrote:

As described below, full page writes can be
categorized as follows:

1) Needed for crash recovery: first page update after each checkpoint.
This has to be kept in WAL.

2) Needed for archive recovery: page update between pg_start_backup and
pg_stop_backup. This has to be kept in archive log.

3) For log-shipping slave such as pg_standby: no full page writes will
be needed for this purpose.

My proposal deals with 2). So, if we mark each "full_page_write", I'd
rather mark when this is needed. Still need only one bit because the
case 3) does not need any mark.

I'm very happy with this proposal, though I do still have some points in
detailed areas.

If you accept that 1 & 2 are valid goals, then 1 & 3 or 1, 2 & 3 are
also valid goals, ISTM. i.e. you might choose to use full_page_writes on
the primary and yet would like to see optimised data transfer to the
standby server. In that case, you would need the mark.

Yes, I need the mark. In my proposal, only unmarked full-page-writes,
which were written as the first update after a checkpoint, are to be
removed offline (pg_compresslog).

- Not sure why we need "full_page_compress", why not just mark them
always? That harms no one. (Did someone else ask for that? If so, keep
it)

No, no one asked to have a separate option. There'll be no bad
influence to do so. So, if we mark each "full_page_write", I'd
rather mark when this is needed. Still need only one bit because the
case 3) does not need any mark.

OK, different question:
Why would anyone ever set full_page_compress = off?

Why have a parameter that does so little? ISTM this is:

i) one more thing to get wrong

ii) cheaper to mark the block when appropriate than to perform the if()
test each time. That can be done only in the path where backup blocks
are present.

iii) If we mark the blocks every time, it allows us to do an offline WAL
compression. If the blocks aren't marked that option is lost. The bit is
useful information, so we should have it in all cases.

A WAL record does not contain only a full-page-write. In my proposal,
both the full-page-write and the logical log are written in a WAL record,
which makes WAL slightly bigger (five percent or so). If
full_page_compress = off, only a full-page-write will be written in a
WAL record. I thought some people would not be happy with this size growth.

I agree to make this mandatory if everybody is happy with the extra logical
log in WAL records with full page writes.

I'd like to have your opinion.

- OTOH I'd like to see an explicit parameter set during recovery since
you're asking the main recovery path to act differently in case a single
bit is set/unset. If you are using that form of recovery, we should say
so explicitly, to keep everybody else safe.

The only thing I had to do was to create a "dummy" full page write to
maintain LSNs. Full page writes are omitted from the archive log, but we
have to keep LSNs the same as those in the original WAL. In this case,
recovery has to read the logical log, not the "dummy" full page writes.
On the other hand, if both the logical log and "real" full page writes
are found in a log record, recovery has to use the "real" full page writes.

I apologise for not understanding your reply, perhaps my original
request was unclear.

In recovery.conf, I'd like to see a parameter such as

dummy_backup_blocks = off (default) | on

to explicitly indicate to the recovery process that backup blocks are
present, yet they are garbage and should be ignored. Having garbage data
within the system is potentially dangerous and I want to be told by the
user that they were expecting that and it's OK to ignore that data.
Otherwise I want to throw informative errors. Maybe it seems OK now, but
the next change to the system may have unintended consequences and it
may not be us making the change. "It's OK the Alien will never escape
from the lab" is the starting premise for many good sci-fi horrors and I
want to watch them, not be in one myself. :-)

We can call it other things, of course. e.g.
ignore_dummy_blocks
decompressed_blocks
apply_backup_blocks

So far, we don't need any modification to the recovery and redo
functions. They ignore the dummy and apply logical logs. Also, if
there are both full page writes and logical log, current recovery
selects full page writes to apply.

I agree to introduce this option if 8.3 code introduces any conflict to
the current. Or, we could introduce this option for future safety. Do
you think we should introduce this option?

If this should be introduced now, what we should do is to check this
option when dummy full-page-write appears.

Yes I believe so. As pg_standby does not include any chance to meet
partial writes of pages, I believe you can omit all the full page
writes. Of course, as Tom Lane suggested in
http://archives.postgresql.org/pgsql-hackers/2007-02/msg00034.php
removing full page writes can lose a chance to recover from
partial/inconsistent writes in the crash of pg_standby. In this case,
we have to import a backup and archive logs (with full page writes
during the backup) to recover. (We have to import them when the file
system crashes anyway). If it's okay, I believe
pg_compresslog/pg_decompresslog can be integrated with pg_standby.

Maybe we can work together to include pg_compresslog/pg_decompresslog in
pg_standby.

ISTM there are two options.

I think this option is already possible:

1. Allow pg_decompresslog to operate on a file, replacing it with the
expanded form, like gunzip, so we would do this:
restore_command = 'pg_standby %f decomp.tmp && pg_decompresslog
decomp.tmp %p'

though the decomp.tmp file would not get properly initialised or cleaned
up when we finish.

whereas this will take additional work

2. Allow pg_standby to write to stdout, so that we can do this:
restore_command = 'pg_standby %f | pg_decompresslog - %p'

Both seem to work fine. pg_decompresslog reads the entire file at the
beginning, so the two will be equivalent. To make the second
option run quicker, pg_decompresslog needs some modification to read WAL
records one after another.

Anyway, could you try to run pg_standby with pg_compresslog and
pg_decompresslog?

----
Additional comment on page header removal:

I found that it is not simple to keep page header in the compressed
archive log. Because we eliminate unmarked full page writes and shift
the rest of the WAL file data, it is not simple to keep page header as
the page header in the compressed archive log. It is much simpler to
remove page header as well and rebuild them. I'd like to keep current
implementation in this point.

Any suggestions are welcome.
-------------------

I'll rename the commands and post the update, hopefully within 20 hours.

With Best Regards;

--
Koichi Suzuki

#6Simon Riggs
simon@2ndquadrant.com
In reply to: Koichi Suzuki (#5)
Re: [PATCHES] Full page writes improvement, code update

On Thu, 2007-03-29 at 17:50 +0900, Koichi Suzuki wrote:

A WAL record does not contain only a full-page-write. In my proposal,
both the full-page-write and the logical log are written in a WAL record,
which makes WAL slightly bigger (five percent or so). If
full_page_compress = off, only a full-page-write will be written in a
WAL record. I thought some people would not be happy with this size growth.

OK, I see what you're doing now and agree with you that we do need a
parameter. Not sure about the name you've chosen though - it certainly
confused me until you explained.

A parameter called ..._compress indicates to me that it would reduce
something in size whereas what it actually does is increase the size of
WAL slightly. We should have a parameter name that indicates what it
actually does, otherwise some people will choose to use this parameter
even when they are not using archive_command with pg_compresslog.

Some possible names...

additional_wal_info = 'COMPRESS'
add_wal_info
wal_additional_info
wal_auxiliary_info
wal_extra_data
attach_wal_info
...
others?

I've got some ideas for the future for adding additional WAL info for
various purposes, so it might be useful to have a parameter that can
cater for multiple types of additional WAL data. Or maybe we go for
something more specific like

wal_add_compress_info = on
wal_add_XXXX_info ...

In recovery.conf, I'd like to see a parameter such as

dummy_backup_blocks = off (default) | on

to explicitly indicate to the recovery process that backup blocks are
present, yet they are garbage and should be ignored. Having garbage data
within the system is potentially dangerous and I want to be told by the
user that they were expecting that and it's OK to ignore that data.
Otherwise I want to throw informative errors. Maybe it seems OK now, but
the next change to the system may have unintended consequences and it
may not be us making the change. "It's OK the Alien will never escape
from the lab" is the starting premise for many good sci-fi horrors and I
want to watch them, not be in one myself. :-)

We can call it other things, of course. e.g.
ignore_dummy_blocks
decompressed_blocks
apply_backup_blocks

So far, we don't need any modification to the recovery and redo
functions. They ignore the dummy and apply logical logs. Also, if
there are both full page writes and logical log, current recovery
selects full page writes to apply.

I agree to introduce this option if 8.3 code introduces any conflict to
the current. Or, we could introduce this option for future safety. Do
you think we should introduce this option?

Yes. You are skipping a correctness test and that should be by explicit
command only. It's no problem to include that as well, since you are
already having to specify pg_... decompress... but the recovery process
doesn't know whether or not you've done that.

Anyway, could you try to run pg_standby with pg_compresslog and
pg_decompresslog?

After freeze, yes.

----
Additional comment on page header removal:

I found that it is not simple to keep page header in the compressed
archive log. Because we eliminate unmarked full page writes and shift
the rest of the WAL file data, it is not simple to keep page header as
the page header in the compressed archive log. It is much simpler to
remove page header as well and rebuild them. I'd like to keep current
implementation in this point.

OK.

This is a good feature. Thanks for your patience with my comments.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#7Josh Berkus
josh@agliodbs.com
In reply to: Simon Riggs (#4)
Re: [PATCHES] Full page writes improvement, code update

Simon,

OK, different question:
Why would anyone ever set full_page_compress = off?

The only reason I can see is if compression costs us CPU but gains RAM &
I/O. I can think of a lot of applications ... benchmarks included ...
which are CPU-bound but not RAM or I/O bound. For those applications,
compression is a bad tradeoff.

If, however, CPU used for compression is made up elsewhere through smaller
file processing, then I'd agree that we don't need a switch.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

#8Simon Riggs
simon@2ndquadrant.com
In reply to: Josh Berkus (#7)
Re: [PATCHES] Full page writes improvement, code update

On Thu, 2007-03-29 at 11:45 -0700, Josh Berkus wrote:

OK, different question:
Why would anyone ever set full_page_compress = off?

The only reason I can see is if compression costs us CPU but gains RAM &
I/O. I can think of a lot of applications ... benchmarks included ...
which are CPU-bound but not RAM or I/O bound. For those applications,
compression is a bad tradeoff.

If, however, CPU used for compression is made up elsewhere through smaller
file processing, then I'd agree that we don't need a switch.

Koichi-san has explained things for me now.

I misunderstood what the parameter did and reading your post, ISTM you
have as well. I do hope Koichi-san will alter the name to allow
everybody to understand what it does.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#9Koichi Suzuki
suzuki.koichi@oss.ntt.co.jp
In reply to: Simon Riggs (#8)
Re: [PATCHES] Full page writes improvement, code update

Josh;

I'd like to explain again what the term "compression" means in my proposal
and show a comparison of resource consumption with
cp and gzip.

My proposal is to remove unnecessary full page writes (they are needed
only in crash recovery from inconsistent or partial writes) when we copy
WAL to the archive log, and to rebuild them as dummies when we restore
from the archive log. The dummy is needed to maintain LSNs. So it is very
different from general-purpose compression such as gzip, although
pg_compresslog compresses the archive log as a result.
As to CPU and I/O consumption, I've already evaluated them as follows:

1) Collected all the WAL segments.
2) Copied them by different means: cp, pg_compresslog and gzip.

I then compared the elapsed time as well as other resource consumption.

Benchmark: DBT-2
Database size: 120WH (12.3GB)
Total WAL size: 4.2GB (after 60min. run)

Elapsed time:
  cp:              120.6sec
  gzip:            590.0sec
  pg_compresslog:   79.4sec

Resultant archive log size:
  cp:              4.2GB
  gzip:            2.2GB
  pg_compresslog:  0.3GB

Resource consumption:
                   user       system    idle       I/O wait
  cp:              0.5sec     15.8sec   16.9sec    87.7sec
  gzip:            286.2sec   8.6sec    260.5sec   36.0sec
  pg_compresslog:  7.9sec     5.5sec    37.8sec    28.4sec

Because the resultant log size is considerably smaller than with cp or
gzip, pg_compresslog needs much less I/O, and because its logic is much
simpler than gzip's, it consumes very little CPU.
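For reference, the ratios implied by the figures above can be worked out with a short script. The numbers are copied from the benchmark; the variable and function names are illustrative only.

```python
# Figures from the DBT-2 run reported above.
wal_gb = 4.2
archive_gb = {"cp": 4.2, "gzip": 2.2, "pg_compresslog": 0.3}
elapsed_sec = {"cp": 120.6, "gzip": 590.0, "pg_compresslog": 79.4}
# CPU time is taken as user + system.
cpu_sec = {"cp": 0.5 + 15.8, "gzip": 286.2 + 8.6, "pg_compresslog": 7.9 + 5.5}


def size_reduction(method: str) -> float:
    """Fraction of the original WAL volume saved in the archive log."""
    return 1 - archive_gb[method] / wal_gb


for method in archive_gb:
    print(f"{method:15s} size reduction {size_reduction(method):6.1%}  "
          f"elapsed {elapsed_sec[method]:6.1f}s  cpu {cpu_sec[method]:6.1f}s")
```

On these figures pg_compresslog removes roughly 93% of the WAL volume against roughly 48% for gzip, and uses about 13 seconds of CPU against about 295 seconds for gzip.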

The term "compress" may not be appropriate. We may call this "log
optimization" instead.

So I don't see any reason why this (at least optimization "mark" in each
log record) can't be integrated.

Simon Riggs wrote:

On Thu, 2007-03-29 at 11:45 -0700, Josh Berkus wrote:

OK, different question:
Why would anyone ever set full_page_compress = off?

The only reason I can see is if compression costs us CPU but gains RAM &
I/O. I can think of a lot of applications ... benchmarks included ...
which are CPU-bound but not RAM or I/O bound. For those applications,
compression is a bad tradeoff.

If, however, CPU used for compression is made up elsewhere through smaller
file processing, then I'd agree that we don't need a switch.

As I wrote in reply to Simon's comment, I have only one concern.

Without a switch, because both full page writes and the corresponding
logical log are included in WAL, this will increase WAL size slightly
(maybe about five percent or so). If everybody is happy with this, we
don't need a switch.

Koichi-san has explained things for me now.

I misunderstood what the parameter did and reading your post, ISTM you
have as well. I do hope Koichi-san will alter the name to allow
everybody to understand what it does.

Here're some candidates:
full_page_writes_optimize
full_page_writes_mark: means it marks full_page_write as "needed in
crash recovery", "needed in archive recovery" and so on.

I don't insist on these names. It would be very helpful if you could
suggest a name that reflects what it really means.

Regards;
--
Koichi Suzuki

#10Koichi Suzuki
suzuki.koichi@oss.ntt.co.jp
In reply to: Koichi Suzuki (#5)
1 attachment(s)
Re: [PATCHES] Full page writes improvement, code update

Hi,

Here's a patch reflecting some of Simon's comments.

1) Removed an elog call in a critical section.

2) Renamed the commands to pg_compresslog and pg_decompresslog.

3) Changed the diff option used to make the patch.

--
Koichi Suzuki

Attachments:

pg_lesslog.tgzapplication/octet-stream; name=pg_lesslog.tgzDownload
��~�E��k�Y� �_��"*URb2_RJ%�j��II��W���jJN�f��d�#�Jeu���6<��0Xxv��5�lcac�c��?#w���y�WD�I=��#Rf2n�{����{�wz�I2�����>��}{n�������o��������������;�|sc~x��D�g��~�y�c���gj�����;��g��o���������������o�����_����K�u��p�=����������hB�{��c/�fi8	o���������O�g$i5�a�t-N����A����<�;8�o������G�{xa����:���:���Z���>P���U|P
'�����
�e�k8�z�sX���U�K���n���I�^�y��8L���$I�x�M/��~+��]|�H#����h��I��?
��~�O����]��w����.H>���p����7���f��=}�@�_�G}�����cz����4�I@O�|Ar��}\?�"�����=-����	��F������/F���?�������������F�����WY8�������0��{0�<���o����P[o�9��� P�����Geh+����n<�E�[�����}��g���a�g���l��@�~��(C�8�O�3��������i���q���������[;7���ln�������k���Q�O��V>.��jT����h�t������V���ln�A�i�SO��g��`�g�����J��/��I
���������	K[�_�!.��hD�4{���^0��k��>�6k�7����[���o��6o�����q�K�F^�����T:i?l����KI���_�W���"�;#o�f�'t���wr�����
��5?.������$��������L=��M�8���i4�.
��:��������_�u������:B�q���=�����6jf7�x9D������<|�.��]19�Y�$&���[[���H*%%�~\���R0J�R2� 0����\hU���jVO���]z6�z�(�V0�WP~�q���Y�d�R��2.F��<��k�������!��c\���62��������}|�9�w(�:��g��d�.~VL����������F,���1�������������D��������}|�o�,[_F���Q0&����D�����H�A������L���)�t�_�S��a����f�zu��xB������A�����N����zs[�����m��+|�a���������xHL�{����������!`!���������N��$8���0���0���o��������w���<�FO�����1��/�����_&af��`=�!���!U#4���u44��[Yp�P����&G@8�prF6�a�P��=.v	�!hV�v�������>.��g�+5�����B��J;��4�MBY��j��erD��e
�O��(���6*�l\E�`�%*��?>�+������
^!*Y�

d����l�����^K#3
�~�>��g ����3��c�~��J��8z���������}��g{Z��V������A�,(��e�3�zOZ���N���{_u���*��Q��y���������������B�����:np�\z���7w���c���������^�?g����s�����#�	�^ ����m�"+�A� }��mxG�o�_��5��u�F��N�m��oJ+����{��R�����>
&����M	�$-��8����S*'/�g�h�%�u����eB���v���b3�����H��c���}�Ie#Q�z,��C`Q���&�$J=��&U���:����]��~(on��V�[�'���`�t�O`"��{�{���3�1�_5���S���a+���(�8l�J>sz��yy����/�4�i�(�}t�ZN�2a��
��
�������?� ��_��?����/�
#11Zeugswetter Andreas ADI SD
ZeugswetterA@spardat.at
In reply to: Koichi Suzuki (#9)
Re: [PATCHES] Full page writes improvement, code update

Without a switch, because both full page writes and the
corresponding logical log are included in WAL, this will
increase WAL size slightly (maybe about five percent or so).
If everybody is happy with this, we don't need a switch.

Sorry, I still don't understand that. What is the "corresponding logical
log"?
It seems to me that a full page WAL record has enough info to produce a
dummy LSN WAL entry. So instead of just cutting the full page WAL record,
you could replace it with an LSN WAL entry when archiving the log.

Then all that is needed is the one flag, no extra space?

Andreas

#12Simon Riggs
simon@2ndquadrant.com
In reply to: Zeugswetter Andreas ADI SD (#11)
Re: [PATCHES] Full page writes improvement, code update

On Fri, 2007-03-30 at 10:22 +0200, Zeugswetter Andreas ADI SD wrote:

Without a switch, because both full page writes and the
corresponding logical log are included in WAL, this will
increase WAL size slightly (maybe about five percent or so).
If everybody is happy with this, we don't need a switch.

Sorry, I still don't understand that. What is the "corresponding logical
log"?
It seems to me that a full page WAL record has enough info to produce a
dummy LSN WAL entry. So instead of just cutting the full page WAL record,
you could replace it with an LSN WAL entry when archiving the log.

Then all that is needed is the one flag, no extra space?

The full page write is required for crash recovery, but that isn't
required during archive recovery because the base backup provides the
safe base. Archive recovery needs the normal xlog record, which in some
cases has been optimised away because the backup block is present, since
the full block already contains the changes.

If you want to remove the backup blocks, you need to put back the
information that was optimised away, otherwise you won't be able to do
the archive recovery correctly. Hence a slight increase in WAL volume to
allow it to be compressed does make sense.
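
To make that dependency concrete, here is a toy Python model. It is not PostgreSQL code: the `WalRecord` class and its fields are invented for illustration only. A record whose logical change was optimised away can be replayed only from its backup block, so stripping the block is safe only when the logical change was kept.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class WalRecord:
    """Toy WAL record with hypothetical fields, not the real on-disk layout."""
    lsn: int
    logical_change: Optional[bytes]  # None when it was optimised away
    backup_block: Optional[bytes]    # full page image, if one is attached


def strip_backup_block(rec: WalRecord) -> WalRecord:
    """Drop the full page image for archiving if the record stays replayable."""
    if rec.backup_block is None:
        return rec  # nothing to strip
    if rec.logical_change is None:
        # The page image is the only copy of the change, so removing it
        # would leave archive recovery with nothing to replay.
        raise ValueError(f"record at LSN {rec.lsn} has no logical change to replay")
    return replace(rec, backup_block=None)
```

In this model, the "slight increase in WAL volume" corresponds to always filling in `logical_change`, even when a backup block is attached.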

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#13Richard Huxton
dev@archonet.com
In reply to: Simon Riggs (#12)
Re: [PATCHES] Full page writes improvement, code update

Simon Riggs wrote:

On Fri, 2007-03-30 at 10:22 +0200, Zeugswetter Andreas ADI SD wrote:

Without a switch, because both full page writes and the
corresponding logical log are included in WAL, this will
increase WAL size slightly (maybe about five percent or so).
If everybody is happy with this, we don't need a switch.

Sorry, I still don't understand that. What is the "corresponding logical
log"?
It seems to me that a full page WAL record has enough info to produce a
dummy LSN WAL entry. So instead of just cutting the full page WAL record,
you could replace it with an LSN WAL entry when archiving the log.

Then all that is needed is the one flag, no extra space?

The full page write is required for crash recovery, but that isn't
required during archive recovery because the base backup provides the
safe base.

Is that always true? Could the backup not pick up a partially-written
page? Assuming it's being written to as the backup is in progress. (We
are talking about when disk blocks are smaller than PG blocks here, so
can't guarantee an atomic write for a PG block?)

--
Richard Huxton
Archonet Ltd

#14Zeugswetter Andreas ADI SD
ZeugswetterA@spardat.at
In reply to: Simon Riggs (#12)
Re: [PATCHES] Full page writes improvement, code update

Archive recovery needs the
normal xlog record, which in some cases has been optimised
away because the backup block is present, since the full
block already contains the changes.

Aah, I didn't know that optimization exists.
I agree that removing that optimization is good/ok.

Andreas

#15Simon Riggs
simon@2ndquadrant.com
In reply to: Richard Huxton (#13)
Re: [PATCHES] Full page writes improvement, code update

On Fri, 2007-03-30 at 11:27 +0100, Richard Huxton wrote:

Is that always true? Could the backup not pick up a partially-written
page? Assuming it's being written to as the backup is in progress. (We
are talking about when disk blocks are smaller than PG blocks here, so
can't guarantee an atomic write for a PG block?)

Any page written during a backup has a backup block that would not be
removable by Koichi's tool, so yes, you'd still be safe.

i.e. between pg_start_backup() and pg_stop_backup() we always use full
page writes, even if you are running in full_page_writes=off mode.
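
The rule can be sketched as a small decision function. This is a simplified Python model, not the server's code: the function and parameter names are illustrative, and the condition that a full page image is only taken on the first change to a page since the last checkpoint is added here as a simplifying assumption to make the example complete.

```python
def takes_full_page_image(full_page_writes: bool,
                          backup_in_progress: bool,
                          first_change_since_checkpoint: bool) -> bool:
    """Return True if a change to a page would carry a full page image."""
    # Between pg_start_backup() and pg_stop_backup(), full page writes are
    # forced on regardless of the full_page_writes setting.
    do_page_writes = full_page_writes or backup_in_progress
    return do_page_writes and first_change_since_checkpoint
```

For example, `takes_full_page_image(False, True, True)` is true: with full_page_writes=off, a page first touched during a backup still gets its full page image.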

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#16Richard Huxton
dev@archonet.com
In reply to: Simon Riggs (#15)
Re: [PATCHES] Full page writes improvement, code update

Simon Riggs wrote:

On Fri, 2007-03-30 at 11:27 +0100, Richard Huxton wrote:

Is that always true? Could the backup not pick up a partially-written
page? Assuming it's being written to as the backup is in progress. (We
are talking about when disk blocks are smaller than PG blocks here, so
can't guarantee an atomic write for a PG block?)

Any page written during a backup has a backup block that would not be
removable by Koichi's tool, so yes, you'd still be safe.

i.e. between pg_start_backup() and pg_stop_backup() we always use full
page writes, even if you are running in full_page_writes=off mode.

Ah, that's OK then.

--
Richard Huxton
Archonet Ltd

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#15)
Re: [PATCHES] Full page writes improvement, code update

"Simon Riggs" <simon@2ndquadrant.com> writes:

Any page written during a backup has a backup block that would not be
removable by Koichi's tool, so yes, you'd still be safe.

How does it know not to do that?

regards, tom lane

#18Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#17)
Re: [PATCHES] Full page writes improvement, code update

On Fri, 2007-03-30 at 16:35 -0400, Tom Lane wrote:

"Simon Riggs" <simon@2ndquadrant.com> writes:

Any page written during a backup has a backup block that would not be
removable by Koichi's tool, so yes, you'd still be safe.

How does it know not to do that?

Not sure what you mean, but I'll take a stab...

I originally questioned Koichi-san's request for a full_page_compress
parameter, which is how it would tell whether/not. After explanation, I
accepted the need for a parameter, but I think we're looking for a new
name for it.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#19Koichi Suzuki
koichi.szk@gmail.com
In reply to: Simon Riggs (#18)
Re: [PATCHES] Full page writes improvement, code update

Simon;
Tom;

Koichi is writing.

Your question is how to identify WAL records generated between
pg_start_backup and pg_stop_backup. Here's an answer.

XLogInsert() already has logic to determine whether the WAL record being
inserted falls between pg_start_backup and pg_stop_backup. Currently it
is used to decide whether full page writes can be omitted when
full_page_writes=off. We can use this to mark WAL records. We have one
unused bit in the WAL record header, the last bit of xl_info, where the
upper four bits are used to indicate the resource manager and three of
the rest are used to indicate the number of full page writes included in
the record.

So in my proposal, this unused bit is used to mark that full page
writes must not be removed during offline optimization by pg_compresslog.
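
The bit layout described above can be sketched as follows. This is a minimal Python illustration, not code from the patch: the constant and function names are hypothetical, and the mask values assume that "the last bit" means the least significant bit of a one-byte xl_info.

```python
RMGR_INFO_MASK = 0xF0  # upper four bits: resource manager
BKP_BLOCK_MASK = 0x0E  # next three bits: full page writes in the record
BKP_KEEP_FLAG = 0x01   # last bit (hypothetical name): "must not be removed"


def mark_keep(xl_info: int) -> int:
    """Set the keep flag, as would happen for records written during a backup."""
    return xl_info | BKP_KEEP_FLAG


def may_strip_full_pages(xl_info: int) -> bool:
    """True if the record carries full page writes and is not marked keep."""
    has_blocks = (xl_info & BKP_BLOCK_MASK) != 0
    keep = (xl_info & BKP_KEEP_FLAG) != 0
    return has_blocks and not keep
```

An offline tool would then only need to inspect xl_info in each record header to decide whether the full page writes can be dropped from the archived log.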

Sorry, I didn't have mailing list access from home and have only just
completed my subscription from there. I had to create a new thread to
continue my post. Sorry for the confusion.

Please refer to the original thread about this discussion.

Best Regards;

--
------
Koichi Suzuki

#20Koichi Suzuki
suzuki.koichi@oss.ntt.co.jp
In reply to: Tom Lane (#17)
Re: [HACKERS] Full page writes improvement, code update

Tom Lane wrote:

"Simon Riggs" <simon@2ndquadrant.com> writes:

Any page written during a backup has a backup block that would not be
removable by Koichi's tool, so yes, you'd still be safe.

How does it know not to do that?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

XLogInsert() already has logic to determine whether the WAL record being
inserted falls between pg_start_backup and pg_stop_backup. Currently it
is used to decide whether full page writes can be omitted when
full_page_writes=off. We can use this to mark WAL records. We have one
unused bit in the WAL record header, the last bit of xl_info, where the
upper four bits are used to indicate the resource manager and three of
the rest are used to indicate the number of full page writes included in
the record.

In my proposal, this unused bit is used to mark that full page
writes must not be removed during offline optimization by pg_compresslog.

Regards;

--
------
Koichi Suzuki

#21Bruce Momjian
bruce@momjian.us
In reply to: Koichi Suzuki (#10)
Re: [HACKERS] Full page writes improvement, code update

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

Koichi Suzuki wrote:

Hi,

Here's a patch reflecting some of Simon's comments.

1) Removed an elog call in a critical section.

2) Changed the names of the commands to pg_compresslog and pg_decompresslog.

3) Changed the diff option used to make the patch.

--
Koichi Suzuki

[ Attachment, skipping... ]

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#22Koichi Suzuki
suzuki.koichi@oss.ntt.co.jp
In reply to: Koichi Suzuki (#20)
1 attachment(s)
Re: [HACKERS] Full page writes improvement, code update again.

Here's the third revision of the WAL archival optimization patch. The GUC
parameter name was changed to wal_add_optimization_info.

Regards;

--
Koichi Suzuki

Attachments:

20070403_pg_lesslog.tar.gz (application/gzip)
/�x���������)t�E���h3I���4f��V���������wm�U{�q6e���Wsz����IL���|i%i|O�I��3����Y�h����~;�,��kzfKh�����h�<��3���Q����Z���E���(���	�������BWbxZV���lW(�l�6D��p*_�]���:v�MB�Pq6�rW��<��h�a<*�me&������J2~�dV�MK�(��o������4����i�Lg���"'�e����������!�c��^�%Z������p�B�����Nal���A��!��P\��$
+�d���vB�za�V�����h��,\�=r�%����0)b>*��,��J�q�*���Q�����	����@����n�t<$�����,�����Q������������N9e G���b���q�r��gMD�Li(�zC�5�o2Z��i��2�.J�\�����v=q�cx���2�b>)�MX��7F5WgQ�F/�Q�tq����66
U�9��X~��������5��\���Os�IG@�uq�hgK� f(a�'T�F����P�2�1�w�'l�$g�|2�fbs��k�K��}������%�[���V:Z�����0�^T-��l>�0���d�R�n��K��{��j��}p������u����)��V7c���@@�5��~3G�-�=�k��HjU�b[h��H���y�g6�W�;����k������>������I��w���a#����9`�lLH��e�
(�������_��QG2nie�dU��� }�����%L���,
��E�z�v��?%3�uT�:����M�(5g^P�t<t��-���-����������0����C8 '
��[��'���5+��1��~������������[{"������J/�,��Xs���2}zw^bIVG)
�����E��BGG�{���#�y|mQT���{:o�_��p���*�~�����f���"�x�
u���	5#�>��C��2 b���yl�������������pL��28�1�ys�q��<>��!��,N��Vu�������#M�p����{P����)*	st����B���n�;��oq"Q����
l��w�,S��iY>�]�aEq���e�C����tU;j�";Gd��� �5ac��u���*�Q���t���D�j��P���!%Gd���K���Z�B����+��U��P�4�� q7��!;.�)P�(�+�G�W���Bm�(2�����=Jt]�l�	�G��:*���'�<.F*������s	����������}aN��UX�-�\��>�1yvX���2{���,���Z���?Qi���+,�V����8#�����m�����y�5*�m�U�4Io�c�|g��t��|� DY?������ �F8��	�~�f�:���7j�h�d["�e�Y��@���Z�<M:�prC�������K	4���,��upb6P��B<�j�j���2�������h��q�t�����z�������^���{9��B.�PH�"��}�~j��a�j�8��,�2y���_6p�"(0`����I����)DTv���~�c:��,b����`����=�.�t��iQ D(��#��"�(Em|�������6E=��(�vli������RT��)��������i�uF��_��/'p3w�Z�?����:��]l
����4������l.�b��Z�9�p$��O
����A��A+��!��cq��fs�����y�����?�7���?|���������p����U���c��
��B��W�5���
��%~�,�X��6�C�|�?���������������S|��/���8�dt���.c���i`��������1���%����w�$�������#y�T1NFO��R�`80����!�*m��FT@��D��6��Q��������:�~�#	�g�Y��������{Xs�����i)��3��/g�T���|	
+��~�E&��	L'�{�8k����J�������ut_ee�0+
�$���]�w�j]�QQ���`|����;Bt3iw��N�.�t��oa�i�5��T0��?�(wtm�����D�g ����/Is���G��0�D�+�g]\e9���~p���!��P��
��q.fI�����cKI�nE�k<M�-w)lU�er&�b+
a�����X�������o������i?U�Y�*G�&�.�CcA5�Fs�
J�C���~���e���j����-���h��������l����lJ����-����|��cT0` ���`B�e����r�k*&K���UM� 2�r����(��PU��i]/��0	>
��!�'��J����*�!!E�)�@����/}��AI����(��P�t ��`�Np �����i!h�
p�����������w��g�jk2;eC��l��e_F�?�����T���=&w����?19F�e�����I*@�M�[�����}5�%�W��b�Js,����azA��Vt�:g=x��Y�.q���O�7ls��\�l���{�^���H�O�<�.L���*�B?�����31tT���1�^�9l�4�?]���}!3�#�B�?�����n��tS�kn`���KY�U��d8��`��;aP~����:
G�����e{R
�S�	�e#��;!m
�2�OR��\�	m��u
�h���}���"�����h�?U>k�$u>���w�VjVS����c�����'8	�3<��!w:<���#�@���._�HDk�V��W��y.���(�8%�IW�l\Lg�v�
ua���P�����cd��5������������wb�������)��������u�@�2��{jp>'�]�����X�Rv����K)�J�HQN��+/����H�[����-����W�9�~d�pu+��K�\���z�;=Kmw�"�hu���t_N}�[�����r��C[+G0���h�n�\�Z��Q�����w�j��@��DR���Qv�0���G�������q�0�	{���y�r��������j���J����R���n�'���~As=�����oK(�v�����2���k_������%fN2a_.���J��:{�{�[��sV���q���B�a�Z#w��D/�\��]&�7��������}�U����[]���eGu���c|�NJ�%=�
z�{q*�>
����]�M��4��(�������QPt��)�E)���{:�/j�2;�����?����������~�&����K]�_�^�A�7��.m�z!��������iz����Vf`��Z���|����A�����Z�X��cU�� �=*�z3���z�h^�s
�f��3\���LR��h�?������]w��3p`�@�bl������2��x�'��\|Y��X��s��w�w
�s!�o���J,,9����!�(:�T0bl<���������^fG�]6�����f�,��>���e��/��PNkr����g=Y"�����H)Xh�d�q9�u��[3YK���
�e&�t��L5�UF��h����
,���)�~)�����c���gk�F��ZDyl���%�c���P��6:� �i;����p���m��DF�d=�aK�<�����K�4%{�~����A�VcK,��������rJ���3�Y2��`Fj8c���(��Q��U@��c��r0n��^X1�*�F�$�T�.<(u �(�vS�8=*������a�g&��%��y!�i��I�}i���@F;F��_�K��SB�Ei�}������$�|^��n����t��E:����x��6d�,S�{�k�m��<�_H=D���t����S���$L%7�kJ����+�h�/w��/���Z�mPQ�nN����qcu���)�G~�}V9�V�������b�iy�Za����3'�����2R��+�e�U�q����`;T�J�c��VbP�����!�dTou�-K��qx�W��)���J�wN:���Z��F�U= 0�?�s�X�[������v�t��z�X��K���iH�����Fv,Y��N(�S"�r���u���d��p&*�j9#����_�Y=��9�I�'�D}O������m���zO��e8`��Y2H���@I|C9��$I`0)������{����@^B�N�j��rr0�KK���p���������+l}�aWH������g�yG��v��}�HS�wb�Z]��
���Pp/���I!Q#�i
LTo*U!��3��t4C"B�
jn^Xv�+�=J�qK���>7ZY��5�z������:���k�lWIe�����}�O�SJ�ce�,�5���\(|v�xj�{oO�dV��(]�a�5s�M����!JmJ6HhD�.N�[��b�0�C�+c-`��d!A)�g��H��p(��19%+��d�27�Q,I�����x��
��i!��j
���o�R^=A�Mg�~����r��i#0��)�',�330(������x��O���rv:�P�)���4����|�^F�������j�H������\��,E����
�Pk�Q#_�*<3*TK`>@�2�W)ag�����?U/!, �dBa����_?(N�V�X:����9�u"��?��%K6�K(�5�t��FAiQ�����2���z��EC����A�~�E
�x�{������7�?����HBV�A����3��=T��:��C^oa��1
%)�e|;��O�'���S0P�*������TM![�In�E�LQ�0�k�6N�.����NqQxn���m~o���|E������Ea6���KT0�aN�~1�/�?1���F8�����:�����GlL|�5x%e���
s�]�O��S��9�����sN��)Y��]�w��U��� ��P��3�W�<��w<fxs�D���� �����B��%j^a�*{D�6:R���6��#t�i���&�u�@LrK3�C��o5��I��Fo�X���j�i�G��S�F]s��n<�����d<R!��{Om�[Go���r��<��b_��w��I��s������G�~�������a�S�a�*��[�N�x5���NUD�F�
4
3�7^PI]WQ�����i���_ko��!��������b7��zoT���*��c�U�9{���}v����O����gJ�Z��l6�����I��<��,g����V`B��'����r���$c/�&<Y&����)���Y���s"��J������3����x�r=������hnm��6������_��s'���W����9e����&�X�^�����Q
����`���^o~���0�x�hc�Q�� ��7���J�����i���&U�w��])~�H�%N%������	e.F����:�/������i
�tv��z2�D�o
���h��	[����,�G�:H.W��b�P��{����v���J%��=��{'�^v+�#x�#fWF\S@�0�����o�ZE�}U���
��lv+5���=]h�h�7VO���SzI�`��!����7n�_U��R����Q{4/���u���������i���>�H�w��{����������4�6)�wc�3�� �`�?M����e�>ZC�����5��X��������U��������[�6�{���PyC����w�femm��I`?����[���G��G[[�~����W@���ZqE��c���6��7l�M�����am���?�e7�~o�W��%�RH�[��
At��7h��~�:?8�u>��������g���u����F���s���T���u8j
��<��Scu���P�ap��>�y�o��?�����+�Fj�e<y�������l	d���B�?
&
���w�z>���(��;W(~�b�T��_F�b�_�����%a�K)u����.i��%��G���GzN+*m6!�8n�������
���`���!���� ���Exb-Nua:+/O?�t�)��6tou�N������X�Wy�����1��&���6�����(� ���������yF��D��[l��Bmo�wv�l��H���@���",���m���-C�����gmQ��.����D��}U�5�������p�4��-c{H8���A/��� ��_��q��R]�-�6C�����!M���]��6R^\�g!�F1|<��&��\q�����6S9:>k?Z���rB�(q���IN+�p(���o����a�����\/j��a��#�q
������7�hr�2a���}HVO.=�����c�T�s��5�"w�"v���V��}7��l�7��/T�
�D�m)l�20�.Tl�rc
/���M��,��:4��r����n�4v�@��,��r��a�H&�^�Du9���g�U+%\�������L_�\���j���?���$�y"���R��ZM��kVP�}{�	��������I�����>�1�`>p�'��"�fQ�2�Q;���R��H�p���%��%n��FBZw.�>��[�-��uX���|�$tL��h�d�e��{	8��&n�{�D�����*5�z�L�-E���_'��dj�g�a�w}u�6��U<������K��;�4(U�1���s��a��hl���YD��?�4!�����R&�}	/R�'Sl�����e�kqa���A���O}�f���OA�ym-�f��{��?�����s�~}��^8�����r��J�e��8������#�-+��9wcm��`s��v�Q��-r��W����V7z�I��(���m��G��F�%b��TKk���f7����X�\=d����S�*�/�M(u�{���k?����zv��k����g�����e����W,r�[���=zk�~���Q8�2������������������-�����?	/�|Z�����\~i��<��u;�����I��u�9z����hH�a]�	�����GVN���\6M&a�t^P�����X�y�\%�g��}��:?���,�~ /�\���
oy	�,���{���}Z��=���^<o��uw's:��L`�^c�'u7���{�wt�S
;�s
��u����Y��q�9��LW]}g�`�5�Y����G������#CG6���# %[oAG���	��K��hA����Yz1�b�Q�����y�v�d�_�����~J�rl���$H<��"v�.�w�����F���|��	�#��K��:U|zpL)����;�h���]z>���|�k�S���4�}C��d��[�N�io�4����}sm���	��[�������V.�u'" �����������N�|������������e[��n��0;�!�F|b��N�����������b�)���+%
���u������X����)���S��� �"7�sF6�"��${�g�En9�Mcq$`e��uc��Gqt��:S���HU��?j47�v�;&�3�<1X�#Q��A���3R�A����t0���0�e���"Q�U)�����J�M�����R����!����k��rI��l�~6����N���������Y!�&��]�O(��&�0���Ly�5v ��{�v���v���!�w��n���OycM�|��.����nz��P�>X�����M��&k���j�����zyC+�w�L�1�evxH&U��������v�IU���d����%�5�-�?;��U���R�������}���m��	���������FNu���&����T��)}*U}A����
f}��U
C�ld{#����`�����c>�q�R���F��������==&�����p��� �,Pg?;�3��z�����
k�j�=pg_$�UY�M���?"Q.T�56��2���UJ3k>������8X����2'��oa�'�����e���{��-������T?��R�5�h����'AQN	<���yev�
�!&\3n���	���:�,X�4����S�r�����<%��0��`�QD:�v��a�q�<`�s�*�&�t��z1������TB+�*�����#��#��P���s��J�!�<�~Nf:P9����>����C��)��7�����>��E'������x�O�������Z��
%����CDn�|L�")�=t��"�ET��gkS`�G�r���U�j�%�X3]t�$ya^R�"O#��OX�Ug���/R�Qy��r!b�&�$���D��<w �?vB=��Q�h����t���'��{�����F��y�VP3�9�����G��
@R�P�t�
�8\�$A��y�R�����UW����Y�>	����t5�^�^
��W�����=�Nuo&�.\��Xy+����(�H�r[Mlmr�ceJ�@v�@��{l��%^	��<��D�l�n��hHv�QQ�P�����2��M��0��@��q�����8oG�lN|�������<E�Se�����Zt�h����d�*S>cT�MF��i�-�SSu��^������J�X�,���������gl��x�B��f)8����U\Dqwn�!������>�)��PY�0���1U��BK��3���4�����3��_g�F;s�g���&0�p�>�W�808�e�&�i������l��*�(�8�scV����heE��0����e,������)���p^��s[���-����h>x��A��?��O���U��7=���g,�\��M�SC�A
'�$�����������~{_)~��U�*���25�S��m���k��^}L�(�e�Q>.��uJ�X�tc;���>7�~2���B���R�K��B�Q���<��e��l:��}�z����u�j'n��X�q,�����XT���54�����`�]F���a�`������O��9���"�Vs�b���{��d������i>����W�On��nnm����p+|�����������7���Owk�)b����G,�����e�����# 7*J�K)PgL�|��
�OT��2�6}�����.��l\��
,<�W�=+'���;�&��W��Kae�2�(��1H�-Hu*�Y��?��"@�_�4����
=�Gee����u�v@CeX}��;����o���N�o7��=��-�v��m�o��g���=��6{���ow���� xh�}h���|����;��w���v��^�,n���mMmCV���}n����Z���u��d�����`U�m�
�,�sr��t.�A�j5��X�R��	)��+L��_�� �v��?TK���W=����w�� �X��*3����?�)��jHxN���������\g�8:~�����U��]Nu��)�6�R�y
�mZU}����wuU��IH������V��`����/�?t��A��rp��L��ul�����#.������@�$@���W�:uw��9�}����3��.�M
���<�sk�PY}��������&e�!��3dF�I}�us�%�3������}����7`��|=+4�/}�q[��U�}ql�������w+�P���HPj�r�}Sr|qa�,�,C2��e�(v�~�x���]?���<\'o9��zfa��8(��,_��D�b;JaI�|Q��a�3������^�@���Z��-�}��@����j6f	2�$V���U����{px����Y��k����Y�x#���z��$�4|������y�h�$������o�*T��6���w[���p
v����>/B�_���Q���B����Z�B������IU�e0LibVPh��b������8x
��=&�^����Z}M��$�2fO����lZ�-J>&Y����|�9p��5{��F��5#0�@E���/��w
�:����!�&Q����������O���}��5��f���i{��g��S|>z�'�g2@�4�
9�>f�'
�RI���������i?��O�� R|�1�������NQ����������@m;����w��J���
Mr��HE��9%H�\�^� ���R8�n�<CO��2�����
�����5���1�����h��`,c�%�B��
�wj,�p+�L�tuP�a�o�f�TG����q�:ka�".���W���
<�����L;#�r �l��n(a������~��2�@jL�9�|�,��(�$AQ~��~�%���*���>�&t�`�����%��w���vP�`���;���A*Oq%:r��Y%D����&z�\�U6K��	L�4c��G@�I���W� ��j>Ol����w��4E)��bv�	��t�+��8C�����C�W�{�yc%$'k;��sb�!�b^����u�fV���'Hd/�gm��u|~vr~����B�����������V%�@C>N�"KF�<��1�>�{�)����������t��	�O�����pZ��I���^x\EF0��Kb�`
��bN^9������������Tx��%<n�������@'9#~nw��U�
�:�8���v{/��4��cH�&��5����:����br�����2M�1��������=T
d�&@[�����.�|�-l�F��'�(�\-H�Kh��<�����pr8f]�	&�s��QiZJ��}k��P=��8�9�����$���!]V&+�e�3�Z�-s�����u�������02]�P�O��?���2����2)8A�����{U#�Z�c<��~�V�0d������]�`�������)��5���������hF��jDq�����4�#�����y8�_���'������r�/�&N����0�~8�����Y��<�'��S�K�?�p��4�^�����e�R�~�[8
yT�G���l����dm��#a%�z�#\�����*��$��.�c���	@av9�QN�58�e�����e��_���5�����{
��x�*��^2�E#��Ak5����16�:~vh��71�(���Y8]�7\���KjM#I.�\}�Af 1a��1�
4U�Z�bF��l7~O�u��&k�O�p���3����).�������I��������^�h�x�s��@��V(`_XP5�^v5����:�P���$Loju}���3�����=��o���f9#�#@gxh����Hgv]7���]��Or���v�O{�m��{���^��qfPs)�
�Md�PUG@O�������>��R1]���R6�fp�_5���($������~����wy�HS�3��t�u*����Y�8m�=�Q����4��WW+`8X�2
0X.B��\Pav���+X������|����w���s��=YP�`U^"�6S@�t$x��S3�Q3��;�!���b]��,�@OS���x��,��5�3�F�%�Fe�����9��"�4Q�]:$(���H��X�Q��F�F����0"�e�
(rHU
CU�:�5���x�&U�����g�����,&J�:�l"�MI��L���d�������T(�U��+>�>�Ta�(�9P�QdB��x��,"�k����(�J�MD�8����Or�������H��>���7���2�����bo�1����	�U��w�R�W�����~�V+����i���[6"��!��u���s����9h����=����f��i��71��;��6<�9�Y�g����	�4d�
e��o�2��3�21�t�������i�������@e�8�L&���%���T�a��P���%LH�p�+�A<��L�������	F0���T�eR�S��!=H��Z�uMF�)n����*��&E�==?8����������x��t���/3��.�A�,���	Ou J��X�9Q)����}tEKV8��w�6&f�Od6��EE����"~;`���i�������^��$��!R?@d���%z@	�tuHXL#��F��� ������6!8����L#E������=x�3���SD�wD���SJ'P��(H��%.K��]��*���'���)B���t�����$<Fph-���Y���B���T)c*?������uBvA�Q���QW����W!��Nd�A�q�A���f]��3�e��5?�e:Cx�R���2�
|�"041����[y�Cy����R�P���O����	�7�x�,	�w�j�(�Pd>��5s���h�U|q��3-m�-�2�s��Xu����6����>�?�����p��B���k��B��u~v<�	vh(�41�:�V������Q4iu�:��$�:�
�.��Z��,�&�C+�X�kv��YS�W
��{.������2I��q��U�f:��q���T8*��T��+)���:t���������l�
e��`:G-
�q�k�������#
=��T�F����_���/!���mXQe;��)m�P
�[��S�	�D��i���������5[U`�X�S���.8xm����2!
��C;2��gf������dYt��<�
�!)!�dZg�Dm��,�v���0-K�%� ,���9�[��y�&��R��'��)��yn �M)G�yQ�b��v�q�x<kr!���
�i2S����rg�D����s������t������
U�0����@a�N� %��i��h,�\�m�7���j����G�z���(���x�����.9z9!�]�V��/\4l���Ut<���P��������O�� T�+z��I�`Ueh�����x��<![h�_�i�	�����l��]���J�E�l����r9y^A���C���
Z����6}�=�$V��J�H�`��7
�Pw����C�������b��mN�i/�pz���(=V�y��T�;N���]
,�R�}�
V�#R1 �H�3k;���V>�+8u|r�9��������io�L�Sm��3q���G�[�~i����1����R�<�{zg)�k���2?����|J S����qRC�����~�Dy5�^ 7X�\,+�g>,��G�J|�����7,.��Wh(��"p$b_��Lc�(I^�����P1���.����p�%rmuSS[A�AL-����"5��%� �Zo���/B������n����q����'��S��R'����s!�����o���F��"�����n����Y��e�:I�n��(6,=w�����#9�>�;�0@�������:8��?������2J���/��16L����u������X����y0�_�	tTi��H�IE�rx�NH���U;9���:kS'���Y���g��f���
�9L���2��%��h���E�X��X~V��8����|��0�6��z{�n���SZ�*o3*I��V���<�K
	���*����V��A�����$�P�ZKv�R0�PZc��������0a�q}
.}n��V�L3���T��I���{k�;�S����?���{�kYF��A���yx�/���"vK����y����B^M�?	�Bf�[�����I�������S$3EB���^>������(z`�6��~�����p��@` �6)"g-��c�<�j5��-cu^4����zG��&�:�*�5����� �:<�V`F�MA������x���v�G�=,�wo���i�V�*���L��9� W��o�~_�7!d�V���]m���{���U�w�����/������0�.����7��-�r�P����E������f���<'
g�%�d���}z��V�����-=�a������&o���o(�[P��1����lQ����E�mu�]tI���!������@��u�G����g�^��H���*w,
?b,�  ��_��������%�`��Dz�5�UB����� +K+DI��(K�.�k���$�(G��8J�X�[���AY2nb�(����rq��A7�a���x� ��b��/�!zO������{�'s���FE�-�	�%��q�F_�{����d�y���{$E�>]�C{�^��>?~��@Q��%5��u�<�z	���w���7���OY�A�Q����S�m�6�?=^R�L�~�)��o������)wYJ'��rA��`�����~RKk�E|B�6�*2\+���l
��8O���H��\�(�e?q/
�����-J5�%[mq��K\	����F\���]�|���CV_����xpX��s����Q�c#����]���V�4�f���u9y���
snV�:g�s�����_(o���p�F�:M�
T�Z^�6_)n��e��^J���H��f$������zuv��<���������"���}�>E������b���!?�*L�&�.�7�zt\+� G�3�<H^�W�y�%��Wj+����J�<9��hw��kY��[@�*���&�`����g���m��c����������ON�{����,b���}��rk�N~�^ZzI�t�%���:��u�����%�T���\#"|V��<�_1B��'o�F��J���)l���s�u�:������WuI� ������e�#C����T^�-T$b��LKJRQ 9���8�G1$����|tGd�DMvh6��h^d��s����m�<H��F�,�k�R��
%.�8/�
���u<�0��2���
$�o�����<�3H��s��3g�����e�r��D*cJ��dx��d���v��<�:F���i6�Y\��wJ.�x�'c���$&K����?s���]��&Yu�� �;���������X�HO���tGn6h����'N�	
��;��T#(���{�Vpi�9�����&$��H��`5�a���0z>��[%�����������dS21���0���&����FX�h9��mVt���	����>6���uz�e~����gy���/�m'������\q���`�^�)�7��Gp!����O�,�N	^>-;���\-{S+k�
�r�+�i S
Y����B�m����t9�S�	�&���J*%�g+�A6��A���G������23�ibI�"���~�zG9�t[���V���[�����N��%y2�aPf��������������dWL���I���S$n�*����_!+���_QPP����=����tV	|����X�r��)��'t��K�zD�C,�sVVi��h��x���������Jy���dl��q�

U�bB#���MT������,q=URs���/��p�t����str��:l�w��N�x��K���a�z�nc���5o$����&���1�bi���U-��Q���,K)+��F��UtZ��.���?�����5u�b��d�?�e�]I��=@�T�<����J�W��u�
��(�RPJ�=��A?�4�!��'A�{A���^��~`y�D|��kN�'�A1JO(�
���rD�<Pt<������72�y�>��_�����c�AOcz��x2���?�A
�x%��`��~�3�1����-����q�����U������v;?y��
��oNVf#�U�/B~S����7�u�u�nc/*�P��0���$K�$M�����P�r*w���D�2B����pB6Ga!ok�B�Rhq�XnO�>T�9-���N}����WY:���P��?���c�e���&��G4����/t���:�P�;��f�aE5j^#��Y�����~ ��Fy�B��j'���R��#YPRH�bV8����_�O�����<s��R ��U���]�m���E��0TVX��j�k������� ����G7�9�V�{������#�(�m��GO�lj�<�MD�0������R�8v�R���3[�w|wq���8�9�rA��c���V���Q�7��`����hF�Bg�l`�''�F8�y)��7��u�*�0��R;��/8��W���t4+%�7QF�]�uH���Zc��R~�o�|O���A%h��������Y��{R��XS����":�J���]^��Tr���A��A����sj��������������UKs���u�6����}��A�|o�o'ZL+��!��o�N,��n����#\(��9L
�H2��0�}��0�� ��!�s7��(�!g�#D�G�\���H_~2���=~B��R����
��������C�����M����������a���6�������uW�����v����jNF�`�S���z��8������5
��2�����vA�T3�vVd������eE�
X���(�d��k�����.�`���=Q�3��p7�QP+f�2�q����
y��x$��6'k����f�S�O����Y������G�Z�������PmW�~�lU����9���t8�q���N�>[Ca�E�� ��c`���Q�����,���!�V*���z����70�7O�}�qcx��X!����s�����7��
��e��i�i��}���G����A����8)K��<j^B47�l�B]
�5VPN�E�w^s����s�I�v_7���M�7!�=�P:Y���g%z���c*������~6b������a�-��9�JKpu$��.��u��T:�A��c]�������Y�;:G�������c��������a��������h��:�b�e���M�[���V���'1FG�pTni���g�&~���rG��"��t����YPZ��6��d4O$7���Eg7`���*=��o��pr�/��f��Z����|��������oW`��|�r��dB�&�7u�P��@���wR0�[G�f���yvt|
R!���B���+���b��H���4u���H��J��H-y��������v����6�����Lm�,���M�G�</��sM��u������]���p��M%�/0-�T9��������������N)/W�b���n�g�dKh�<ku��~��+���z .��J�+��Y)�G����|4`�:Z4��0��hu1��]��-]��]�hd���[X
��QZ0����r�(�-E����_�-l5P*
�_�B��UF�2��o�E9&I�1���m���_@��gW����9fN
� ��3�F��Q�U�$	�W������#��=����D\�0���o6)DIy��a
�I��� �U�:,%Olb��������y���7��\����w#��e�����z�����-�v��^�6�����������\�V�8�&�n�'+wi�6�\+���ap1�!w��X{������y�����N�g�S�op^C�N���������D��(IF���=��{H�����X�pS����!�	l�%��6�W����G~�+�^��9]
6��>?�9~,�{�{���
9�n3�A���*Q�l�{�qt�Z��+��Q|��/�
9Yt��F��������{�~f_��A�K^TU�
�\@��V�r��Y�WW�F�&�������K�0����������+�����}���=���J�J��`
/xE\Cb�:�&6@��qn����8����8Ga�����k{����9j���;;�+�M�CIQ�t��2*`��Y`�!F��*�C��Js�*k*�gexc��������>���$�ci(���Iz4^�B�����0��+�_!
���$ef>�����b��|���I�a�J*�B�����+?q���R���%��&�<�C9mX����8A�5[����]+%������q���<\n��"MH�%�����{S���{?������D:O\�8`���:��K4��|,�1R�f�v^�R���ktO2pY������R4�D���������
(�S"
�/XJ����:��r�Rnu)d=��N��^�
���C�U�R<�N����X\l4����9�r`g�Q�C�t(6>�)u`�����=��c2�}�	��������:�SsN��&������E���G��x�J����:I����UYU��u<��k M�7TI]��d�6n�t�y#U�q5p�Eg�2�?	6����� <�^~*5���h=�����7E	w�#��diE	��/�����YJU�,��������}��@M� P.���S��)�+[f�v�z�zNsf~���Mw���/��F���0�K�@{&����rg�b<�����d ��'��i�IP���-)�����Igb�{]�M�o\e6�,j��f�(6"����0�Ef �"�5;B����p���=��da��i��#���Qx�#�(�����}���}t7�K*��h��(�K�:*n)�U��Y�l77��g���������)3G�.yQ��A���9+�����$Xe3#��9�o�
�!\u��/���h����/�D�W&���t�Ug	����)m�{�Q��a�UJ8sKK��W���wH�#����R���:��N�����d���pSx�����X�#T�M��J�]B��0�233�6����'	M]n
������������S�PTQk��,P���"Z�l�L�3L8��rv^+�9!�D��\���TZ|��*V�.�Rj�����M�J��.�m���t?"���
��U|��'���opkK��3���0#�n�l�����7c���s�w���^����2VLL��XfY�O����%"����������d�y	�K��YZ�af����5�5��R�<tEC_v�����ZD����{�w��6�(MN�����y0���YS��H�B9��(��xP:�$�{��1�oM,�����)Y���������O�_�P�y
���i)������8�C��,p��C�F�.�����`����M\��%Mv��aHA
b,�d�����Z����v��/���|��n-��=���x;����O� ��X������YZ����b�"���L���YCc��Q�u��8>�G��s��t�D9���p��ZLE9db��)L�J~�5��
�WJ����%
^-���p`��L�$�4���G.;VQL��w�@�;I�%5a�����5u�$>�X7�%����K�c��e=�\�����Y:��z� ������v�2L��l&�*����O���b'��a�B����]w:��/y��1�1��;���+�ju�d3�f��7<Y�f�=��W�a���Z�KN�{���K!K��oQ�8T�:���8U���g@�0:'���	��O�qSQ�llYh����{�e�����+���L����$�m�������E�G��*��H^�%W3��2�Cm'�%.��w9K5r:�7����~����?�L��|�$ES�e���O���\��X���������n.����o��
S4&k;��4KT��F���%�����C�^����K����d����w/��w��7�
�t���[�O�3Y�HX7��vsJK�,�2����E���WJ���u,����LY��N��������V{�8����Ka����g��C�����,>��x*�T�@T��R�0]��������+n��&���r������X$r��P
��a ��LX(���6t���Y:������j�Y�<�
�'�Dk>��}o��E��7i�{b
)���
�(���%��9��RJ^m���W���f��<�7 ����H��)Y
Ur�qB��E~mP����	��0��l��8j�����
�y~�������=�o0�V�c�P��@�j-sEY�$�c"��F+�� '��Kc��CY=��IL{��nQ�L��O���C���tg��4$���i���gl��������,Sl�D�Y�/�����!���������l[����S
�f��������V��z%5����#�����C�,�����(8j�Y�qE�B��P�l�-��@�
��v:f�_E���d�@y���A-�M����Br����e5#4��:���^2�GhU��7������i�fj��I�#��hw�ff���Y}u��2��hi���� �h��(�0.���v��g��-"������{k���XyS��7Jr�
YT�O[��W�����wz��]$�3W ��ea��+0xD(T3+)�u[	�B~gv�u��G
�"�4 � �rS5A
���#�R�y�X��${�G?#P�����"�4���)C�U�	��3�T:&�Z(�p;�6��}����s���2z4�V�g=�y���4k�������sU9c�P`�t��q���F�x��h�\����=��XQ���H^����:�,�X��c)��4
�H�L�I^�z�t!����p���D�<������Pv����U��J2���J%��N})��'�MU���E]\od�j�s��K���m�D���2������+��
������K�J�eH�)�����+�6�����8Y��%q����#b�k��u�L�6�X�'�T'�d����*��J��D9�J����
K�?�����
@�^����D*G��!������85��Y�Q�V���@���'�\Ib��[<=�9��sUEri�/�6�K�2��S6�\�(9[b#�9�j�:��;�������vq.�*`q���T��%+j6s��?vNzO����A�@�d������
)�=N�
I~	v1��t1�A$�6���Q.�DT*�����%�g;#V����,��d�|w�u�uZ�?�L%S5m5�S.�Y���:m2�!�iwhe����e�t�����4C�����8\+Z�$����WD���3N�$�n	b�%.F���BG7�����u����R�.�'�I��"C0�5��������D����H����(qW`��^����T�����=(-���(b.o��	t��r�O���U5k�W��*��*?h==-��m*��������X�c��Ne���m4�SN��|�#�����8�l���l7�7���������3J�|�1���3>��F�&�������,\VM(t�q&~?�����#ru��d��%�	���I����Mf/�hE�O0\B��Oa��sf�L'=-W�4�S����/bV�.��#V��6c��Aq\���5�:8@�)��ZDG#s���@8Q���[yj
B����H>)u[�S��R�X����RMzT���uP���e������Y	�:t���\�v����ug��DP�����.p 7TUGm�u!/�o�f�`�HL!���������D�R��	����d��5��UN����.T)���i�]�v�g�z[���qH���x���L�rk���O�*g!��C�i���f�Q��+=�C�.�x�R�@5���y��vo��4"���]%T��"�����)�T��&�rU�
�a���Y��rtC.d��k��B�<�B"U����p*�mB�&!\��E"��=[���T�o�2��I��X�b)�m��s�fca�����-8���u3�{E��h`[9���2�c.�3���!�����d�PVi�%��?C(����`P���C�?L�cx�U,v���7g���>R��h�����ZtuNc�JT�7���N��������0*�)�[���p��9M���_�[�K��K��7�������o�4��UD��J	��41����^��^*��u�����{|J���4�N}Qq�I5fUvu�7��u�
cI/�<�#Z�	�N��rM{r.�������������
*����DHSeR(K�WCt|�x��	s�� z�D�rN.g�U����=�Nu�3"��^D�u���hLBq�4H�	�F���O�O��}e��c=���u���������H�<�ec[���
s�d(n8���S'�����������,�o�R�qM��m:�*���
��V)����Jp�3��\Z����;�4{�'���7��)H�r�\�FY(-G�U2��"�:>Er���Z����eN��F�	#�L3��T"irS�4)��#$�I19�$�������(���p]p�R����b$����z�����TWT	CuZJ0F����g2��
�yE'�N�KXE����r��d}��(#�r���������RI\��J�4(������m�:p������$	U��M�6��.�*:�������]��~���wN��_��6�u���\e���D��r�}+�j��LB6���T���yd��m���!G��X����y���C����]A5$��r $}U��z��u�#B��"�*�F��~��
3����.�G�H��d�~z��Tn<"���T��T�-�����|���v����'��r7:�~�r�����>Y��nD^z���\����v���;2��Y��g��9?E�B��i��ju]�X��$��d��h`��s�����7�*��=���{���l)�DnC�H�6)������4�B�,t$��LH���(R��*��Ce��-�<0��*D�pG5y�`���V �U5@��LG�q�A6�Gl��y����H��I�j��������lGr�[�n���i��k�����Mo�6��s,]Z7�.�0Hf��I�\���{�*����$���H9d��Cin��:ZgU�����w�n�{����"1��D�aT��y��K��+w���i����>��u9C%�J���&������,N���aJ5�
E��b��0!����1v\�Qa��%��4�B_&��:I��
<��?�a��e�l6J.�����q���h���NS��������/6�;����;�;�hnl7�[_�?��g�~RA��nw�P?��m��M?���U�{��#4c}q����b5�u�9GM)����+w�!yZ��w����vh�r����*�+r5����������A���I8��u�sG���P�Q��i=	i%��FC��]P������v��:����ET'���2`��4��0����)���y�UR�������<H.O��sVUM����8�?T������i���{�rO�.�����K��rK�[�sX�OJS������	>��0�C��_���?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?o���:�hU�
#23Koichi Suzuki
suzuki.koichi@oss.ntt.co.jp
In reply to: Bruce Momjian (#21)
1 attachment(s)
Re: [HACKERS] Full page writes improvement, code update

Bruce Momjian wrote:

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

Thank you very much for including it. Attached is an update of the patch,
revised according to Simon Riggs's comment about the GUC name.

Regards;

--
Koichi Suzuki

Attachments:

20070403_pg_lesslog.tgzapplication/octet-stream; name=20070403_pg_lesslog.tgzDownload
4U�Z�bF��l7~O�u��&k�O�p���3����).�������I��������^�h�x�s��@��V(`_XP5�^v5����:�P���$Loju}���3�����=��o���f9#�#@gxh����Hgv]7���]��Or���v�O{�m��{���^��qfPs)�
�Md�PUG@O�������>��R1]���R6�fp�_5���($������~����wy�HS�3��t�u*����Y�8m�=�Q����4��WW+`8X�2
0X.B��\Pav���+X������|����w���s��=YP�`U^"�6S@�t$x��S3�Q3��;�!���b]��,�@OS���x��,��5�3�F�%�Fe�����9��"�4Q�]:$(���H��X�Q��F�F����0"�e�
(rHU
CU�:�5���x�&U�����g�����,&J�:�l"�MI��L���d�������T(�U��+>�>�Ta�(�9P�QdB��x��,"�k����(�J�MD�8����Or�������H��>���7���2�����bo�1����	�U��w�R�W�����~�V+����i���[6"��!��u���s����9h����=����f��i��71��;��6<�9�Y�g����	�4d�
e��o�2��3�21�t�������i�������@e�8�L&���%���T�a��P���%LH�p�+�A<��L�������	F0���T�eR�S��!=H��Z�uMF�)n����*��&E�==?8����������x��t���/3��.�A�,���	Ou J��X�9Q)����}tEKV8��w�6&f�Od6��EE����"~;`���i�������^��$��!R?@d���%z@	�tuHXL#��F��� ������6!8����L#E������=x�3���SD�wD���SJ'P��(H��%.K��]��*���'���)B���t�����$<Fph-���Y���B���T)c*?������uBvA�Q���QW����W!��Nd�A�q�A���f]��3�e��5?�e:Cx�R���2�
|�"041����[y�Cy����R�P���O����	�7�x�,	�w�j�(�Pd>��5s���h�U|q��3-m�-�2�s��Xu����6����>�?�����p��B���k��B��u~v<�	vh(�41�:�V������Q4iu�:��$�:�
�.��Z��,�&�C+�X�kv��YS�W
��{.������2I��q��U�f:��q���T8*��T��+)���:t���������l�
e��`:G-
�q�k�������#
=��T�F����_���/!���mXQe;��)m�P
�[��S�	�D��i���������5[U`�X�S���.8xm����2!
��C;2��gf������dYt��<�
�!)!�dZg�Dm��,�v���0-K�%� ,���9�[��y�&��R��'��)��yn �M)G�yQ�b��v�q�x<kr!���
�i2S����rg�D����s������t������
U�0����@a�N� %��i��h,�\�m�7���j����G�z���(���x�����.9z9!�]�V��/\4l���Ut<���P��������O�� T�+z��I�`Ueh�����x��<![h�_�i�	�����l��]���J�E�l����r9y^A���C���
Z����6}�=�$V��J�H�`��7
�Pw����C�������b��mN�i/�pz���(=V�y��T�;N���]
,�R�}�
V�#R1 �H�3k;���V>�+8u|r�9��������io�L�Sm��3q���G�[�~i����1����R�<�{zg)�k���2?����|J S����qRC�����~�Dy5�^ 7X�\,+�g>,��G�J|�����7,.��Wh(��"p$b_��Lc�(I^�����P1���.����p�%rmuSS[A�AL-����"5��%� �Zo���/B������n����q����'��S��R'����s!�����o���F��"�����n����Y��e�:I�n��(6,=w�����#9�>�;�0@�������:8��?������2J���/��16L����u������X����y0�_�	tTi��H�IE�rx�NH���U;9���:kS'���Y���g��f���
�9L���2��%��h���E�X��X~V��8����|��0�6��z{�n���SZ�*o3*I��V���<�K
	���*����V��A�����$�P�ZKv�R0�PZc��������0a�q}
.}n��V�L3���T��I���{k�;�S����?���{�kYF��A���yx�/���"vK����y����B^M�?	�Bf�[�����I�������S$3EB���^>������(z`�6��~�����p��@` �6)"g-��c�<�j5��-cu^4����zG��&�:�*�5����� �:<�V`F�MA������x���v�G�=,�wo���i�V�*���L��9� W��o�~_�7!d�V���]m���{���U�w�����/������0�.����7��-�r�P����E������f���<'
g�%�d���}z��V�����-=�a������&o���o(�[P��1����lQ����E�mu�]tI���!������@��u�G����g�^��H���*w,
?b,�  ��_��������%�`��Dz�5�UB����� +K+DI��(K�.�k���$�(G��8J�X�[���AY2nb�(����rq��A7�a���x� ��b��/�!zO������{�'s���FE�-�	�%��q�F_�{����d�y���{$E�>]�C{�^��>?~��@Q��%5��u�<�z	���w���7���OY�A�Q����S�m�6�?=^R�L�~�)��o������)wYJ'��rA��`�����~RKk�E|B�6�*2\+���l
��8O���H��\�(�e?q/
�����-J5�%[mq��K\	����F\���]�|���CV_����xpX��s����Q�c#����]���V�4�f���u9y���
snV�:g�s�����_(o���p�F�:M�
T�Z^�6_)n��e��^J���H��f$������zuv��<���������"���}�>E������b���!?�*L�&�.�7�zt\+� G�3�<H^�W�y�%��Wj+����J�<9��hw��kY��[@�*���&�`����g���m��c����������ON�{����,b���}��rk�N~�^ZzI�t�%���:��u�����%�T���\#"|V��<�_1B��'o�F��J���)l���s�u�:������WuI� ������e�#C����T^�-T$b��LKJRQ 9���8�G1$����|tGd�DMvh6��h^d��s����m�<H��F�,�k�R��
%.�8/�
���u<�0��2���
$�o�����<�3H��s��3g�����e�r��D*cJ��dx��d���v��<�:F���i6�Y\��wJ.�x�'c���$&K����?s���]��&Yu�� �;���������X�HO���tGn6h����'N�	
��;��T#(���{�Vpi�9�����&$��H��`5�a���0z>��[%�����������dS21���0���&����FX�h9��mVt���	����>6���uz�e~����gy���/�m'������\q���`�^�)�7��Gp!����O�,�N	^>-;���\-{S+k�
�r�+�i S
Y����B�m����t9�S�	�&���J*%�g+�A6��A���G������23�ibI�"���~�zG9�t[���V���[�����N��%y2�aPf��������������dWL���I���S$n�*����_!+���_QPP����=����tV	|����X�r��)��'t��K�zD�C,�sVVi��h��x���������Jy���dl��q�

U�bB#���MT������,q=URs���/��p�t����str��:l�w��N�x��K���a�z�nc���5o$����&���1�bi���U-��Q���,K)+��F��UtZ��.���?�����5u�b��d�?�e�]I��=@�T�<����J�W��u�
��(�RPJ�=��A?�4�!��'A�{A���^��~`y�D|��kN�'�A1JO(�
���rD�<Pt<������72�y�>��_�����c�AOcz��x2���?�A
�x%��`��~�3�1����-����q�����U������v;?y��
��oNVf#�U�/B~S����7�u�u�nc/*�P��0���$K�$M�����P�r*w���D�2B����pB6Ga!ok�B�Rhq�XnO�>T�9-���N}����WY:���P��?���c�e���&��G4����/t���:�P�;��f�aE5j^#��Y�����~ ��Fy�B��j'���R��#YPRH�bV8����_�O�����<s��R ��U���]�m���E��0TVX��j�k������� ����G7�9�V�{������#�(�m��GO�lj�<�MD�0������R�8v�R���3[�w|wq���8�9�rA��c���V���Q�7��`����hF�Bg�l`�''�F8�y)��7��u�*�0��R;��/8��W���t4+%�7QF�]�uH���Zc��R~�o�|O���A%h��������Y��{R��XS����":�J���]^��Tr���A��A����sj��������������UKs���u�6����}��A�|o�o'ZL+��!��o�N,��n����#\(��9L
�H2��0�}��0�� ��!�s7��(�!g�#D�G�\���H_~2���=~B��R����
��������C�����M����������a���6�������uW�����v����jNF�`�S���z��8������5
��2�����vA�T3�vVd������eE�
X���(�d��k�����.�`���=Q�3��p7�QP+f�2�q����
y��x$��6'k����f�S�O����Y������G�Z�������PmW�~�lU����9���t8�q���N�>[Ca�E�� ��c`���Q�����,���!�V*���z����70�7O�}�qcx��X!����s�����7��
��e��i�i��}���G����A����8)K��<j^B47�l�B]
�5VPN�E�w^s����s�I�v_7���M�7!�=�P:Y���g%z���c*������~6b������a�-��9�JKpu$��.��u��T:�A��c]�������Y�;:G�������c��������a��������h��:�b�e���M�[���V���'1FG�pTni���g�&~���rG��"��t����YPZ��6��d4O$7���Eg7`���*=��o��pr�/��f��Z����|��������oW`��|�r��dB�&�7u�P��@���wR0�[G�f���yvt|
R!���B���+���b��H���4u���H��J��H-y��������v����6�����Lm�,���M�G�</��sM��u������]���p��M%�/0-�T9��������������N)/W�b���n�g�dKh�<ku��~��+���z .��J�+��Y)�G����|4`�:Z4��0��hu1��]��-]��]�hd���[X
��QZ0����r�(�-E����_�-l5P*
�_�B��UF�2��o�E9&I�1���m���_@��gW����9fN
� ��3�F��Q�U�$	�W������#��=����D\�0���o6)DIy��a
�I��� �U�:,%Olb��������y���7��\����w#��e�����z�����-�v��^�6�����������\�V�8�&�n�'+wi�6�\+���ap1�!w��X{������y�����N�g�S�op^C�N���������D��(IF���=��{H�����X�pS����!�	l�%��6�W����G~�+�^��9]
6��>?�9~,�{�{���
9�n3�A���*Q�l�{�qt�Z��+��Q|��/�
9Yt��F��������{�~f_��A�K^TU�
�\@��V�r��Y�WW�F�&�������K�0����������+�����}���=���J�J��`
/xE\Cb�:�&6@��qn����8����8Ga�����k{����9j���;;�+�M�CIQ�t��2*`��Y`�!F��*�C��Js�*k*�gexc��������>���$�ci(���Iz4^�B�����0��+�_!
���$ef>�����b��|���I�a�J*�B�����+?q���R���%��&�<�C9mX����8A�5[����]+%������q���<\n��"MH�%�����{S���{?������D:O\�8`���:��K4��|,�1R�f�v^�R���ktO2pY������R4�D���������
(�S"
�/XJ����:��r�Rnu)d=��N��^�
���C�U�R<�N����X\l4����9�r`g�Q�C�t(6>�)u`�����=��c2�}�	��������:�SsN��&������E���G��x�J����:I����UYU��u<��k M�7TI]��d�6n�t�y#U�q5p�Eg�2�?	6����� <�^~*5���h=�����7E	w�#��diE	��/�����YJU�,��������}��@M� P.���S��)�+[f�v�z�zNsf~���Mw���/��F���0�K�@{&����rg�b<�����d ��'��i�IP���-)�����Igb�{]�M�o\e6�,j��f�(6"����0�Ef �"�5;B����p���=��da��i��#���Qx�#�(�����}���}t7�K*��h��(�K�:*n)�U��Y�l77��g���������)3G�.yQ��A���9+�����$Xe3#��9�o�
�!\u��/���h����/�D�W&���t�Ug	����)m�{�Q��a�UJ8sKK��W���wH�#����R���:��N�����d���pSx�����X�#T�M��J�]B��0�233�6����'	M]n
������������S�PTQk��,P���"Z�l�L�3L8��rv^+�9!�D��\���TZ|��*V�.�Rj�����M�J��.�m���t?"���
��U|��'���opkK��3���0#�n�l�����7c���s�w���^����2VLL��XfY�O����%"����������d�y	�K��YZ�af����5�5��R�<tEC_v�����ZD����{�w��6�(MN�����y0���YS��H�B9��(��xP:�$�{��1�oM,�����)Y���������O�_�P�y
���i)������8�C��,p��C�F�.�����`����M\��%Mv��aHA
b,�d�����Z����v��/���|��n-��=���x;����O� ��X������YZ����b�"���L���YCc��Q�u��8>�G��s��t�D9���p��ZLE9db��)L�J~�5��
�WJ����%
^-���p`��L�$�4���G.;VQL��w�@�;I�%5a�����5u�$>�X7�%����K�c��e=�\�����Y:��z� ������v�2L��l&�*����O���b'��a�B����]w:��/y��1�1��;���+�ju�d3�f��7<Y�f�=��W�a���Z�KN�{���K!K��oQ�8T�:���8U���g@�0:'���	��O�qSQ�llYh����{�e�����+���L����$�m�������E�G��*��H^�%W3��2�Cm'�%.��w9K5r:�7����~����?�L��|�$ES�e���O���\��X���������n.����o��
S4&k;��4KT��F���%�����C�^����K����d����w/��w��7�
�t���[�O�3Y�HX7��vsJK�,�2����E���WJ���u,����LY��N��������V{�8����Ka����g��C�����,>��x*�T�@T��R�0]��������+n��&���r������X$r��P
��a ��LX(���6t���Y:������j�Y�<�
�'�Dk>��}o��E��7i�{b
)���
�(���%��9��RJ^m���W���f��<�7 ����H��)Y
Ur�qB��E~mP����	��0��l��8j�����
�y~�������=�o0�V�c�P��@�j-sEY�$�c"��F+�� '��Kc��CY=��IL{��nQ�L��O���C���tg��4$���i���gl��������,Sl�D�Y�/�����!���������l[����S
�f��������V��z%5����#�����C�,�����(8j�Y�qE�B��P�l�-��@�
��v:f�_E���d�@y���A-�M����Br����e5#4��:���^2�GhU��7������i�fj��I�#��hw�ff���Y}u��2��hi���� �h��(�0.���v��g��-"������{k���XyS��7Jr�
YT�O[��W�����wz��]$�3W ��ea��+0xD(T3+)�u[	�B~gv�u��G
�"�4 � �rS5A
���#�R�y�X��${�G?#P�����"�4���)C�U�	��3�T:&�Z(�p;�6��}����s���2z4�V�g=�y���4k�������sU9c�P`�t��q���F�x��h�\����=��XQ���H^����:�,�X��c)��4
�H�L�I^�z�t!����p���D�<������Pv����U��J2���J%��N})��'�MU���E]\od�j�s��K���m�D���2������+��
������K�J�eH�)�����+�6�����8Y��%q����#b�k��u�L�6�X�'�T'�d����*��J��D9�J����
K�?�����
@�^����D*G��!������85��Y�Q�V���@���'�\Ib��[<=�9��sUEri�/�6�K�2��S6�\�(9[b#�9�j�:��;�������vq.�*`q���T��%+j6s��?vNzO����A�@�d������
)�=N�
I~	v1��t1�A$�6���Q.�DT*�����%�g;#V����,��d�|w�u�uZ�?�L%S5m5�S.�Y���:m2�!�iwhe����e�t�����4C�����8\+Z�$����WD���3N�$�n	b�%.F���BG7�����u����R�.�'�I��"C0�5��������D����H����(qW`��^����T�����=(-���(b.o��	t��r�O���U5k�W��*��*?h==-��m*��������X�c��Ne���m4�SN��|�#�����8�l���l7�7���������3J�|�1���3>��F�&�������,\VM(t�q&~?�����#ru��d��%�	���I����Mf/�hE�O0\B��Oa��sf�L'=-W�4�S����/bV�.��#V��6c��Aq\���5�:8@�)��ZDG#s���@8Q���[yj
B����H>)u[�S��R�X����RMzT���uP���e������Y	�:t���\�v����ug��DP�����.p 7TUGm�u!/�o�f�`�HL!���������D�R��	����d��5��UN����.T)���i�]�v�g�z[���qH���x���L�rk���O�*g!��C�i���f�Q��+=�C�.�x�R�@5���y��vo��4"���]%T��"�����)�T��&�rU�
�a���Y��rtC.d��k��B�<�B"U����p*�mB�&!\��E"��=[���T�o�2��I��X�b)�m��s�fca�����-8���u3�{E��h`[9���2�c.�3���!�����d�PVi�%��?C(����`P���C�?L�cx�U,v���7g���>R��h�����ZtuNc�JT�7���N��������0*�)�[���p��9M���_�[�K��K��7�������o�4��UD��J	��41����^��^*��u�����{|J���4�N}Qq�I5fUvu�7��u�
cI/�<�#Z�	�N��rM{r.�������������
*����DHSeR(K�WCt|�x��	s�� z�D�rN.g�U����=�Nu�3"��^D�u���hLBq�4H�	�F���O�O��}e��c=���u���������H�<�ec[���
s�d(n8���S'�����������,�o�R�qM��m:�*���
��V)����Jp�3��\Z����;�4{�'���7��)H�r�\�FY(-G�U2��"�:>Er���Z����eN��F�	#�L3��T"irS�4)��#$�I19�$�������(���p]p�R����b$����z�����TWT	CuZJ0F����g2��
�yE'�N�KXE����r��d}��(#�r���������RI\��J�4(������m�:p������$	U��M�6��.�*:�������]��~���wN��_��6�u���\e���D��r�}+�j��LB6���T���yd��m���!G��X����y���C����]A5$��r $}U��z��u�#B��"�*�F��~��
3����.�G�H��d�~z��Tn<"���T��T�-�����|���v����'��r7:�~�r�����>Y��nD^z���\����v���;2��Y��g��9?E�B��i��ju]�X��$��d��h`��s�����7�*��=���{���l)�DnC�H�6)������4�B�,t$��LH���(R��*��Ce��-�<0��*D�pG5y�`���V �U5@��LG�q�A6�Gl��y����H��I�j��������lGr�[�n���i��k�����Mo�6��s,]Z7�.�0Hf��I�\���{�*����$���H9d��Cin��:ZgU�����w�n�{����"1��D�aT��y��K��+w���i����>��u9C%�J���&������,N���aJ5�
E��b��0!����1v\�Qa��%��4�B_&��:I��
<��?�a��e�l6J.�����q���h���NS��������/6�;����;�;�hnl7�[_�?��g�~RA��nw�P?��m��M?���U�{��#4c}q����b5�u�9GM)����+w�!yZ��w����vh�r����*�+r5����������A���I8��u�sG���P�Q��i=	i%��FC��]P������v��:����ET'���2`��4��0����)���y�UR�������<H.O��sVUM����8�?T������i���{�rO�.�����K��rK�[�sX�OJS������	>��0�C��_���?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?o���:�hU�
#24Bruce Momjian
bruce@momjian.us
In reply to: Koichi Suzuki (#23)
Re: [HACKERS] Full page writes improvement, code update

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

Koichi Suzuki wrote:

Bruce Momjian wrote:

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

Thank you very much for including. Attached is an update of the patch
according to Simon Riggs's comment about GUC name.

Regards;

--
Koichi Suzuki

[ Attachment, skipping... ]

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#25Simon Riggs
simon@2ndquadrant.com
In reply to: Koichi Suzuki (#23)
Re: [HACKERS] Full page writes improvement, code update

On Tue, 2007-04-03 at 19:45 +0900, Koichi Suzuki wrote:

Bruce Momjian wrote:

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

Thank you very much for including. Attached is an update of the patch
according to Simon Riggs's comment about GUC name.

The patch comes with its own "install kit", which is great to review
(many thanks), but hard to determine where you think code should go when
committed.

My guess based on your patch
- the patch gets applied to core :-)
- pg_compresslog *and* pg_decompresslog go to a contrib directory called
contrib/lesslog?

Can you please produce a combined patch that does all of the above, plus
edits the contrib Makefile to add all of those, as well as editing the
README so it doesn't mention the patch, just the contrib executables?

The patch looks correct to me now. I haven't tested it yet, but will be
doing so in the last week of April, which is when I'll be doing docs for
this and other stuff, since time is pressing now.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#26Koichi Suzuki
suzuki.koichi@oss.ntt.co.jp
In reply to: Simon Riggs (#25)
Re: [HACKERS] Full page writes improvement, code update

Hi,

I agree to put the patch in core and the others (pg_compresslog and
pg_decompresslog) in contrib/lesslog.

I will prepare separate patches for core and contrib.

As for the patches, we have tested them against pgbench, DBT-2, and our
proprietary benchmarks, and they appear to run correctly.

Regards;

Simon Riggs wrote:

On Tue, 2007-04-03 at 19:45 +0900, Koichi Suzuki wrote:

Bruce Momjian wrote:

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

Thank you very much for including. Attached is an update of the patch
according to Simon Riggs's comment about GUC name.

The patch comes with its own "install kit", which is great to review
(many thanks), but hard to determine where you think code should go when
committed.

My guess based on your patch
- the patch gets applied to core :-)
- pg_compresslog *and* pg_decompresslog go to a contrib directory called
contrib/lesslog?

Can you please produce a combined patch that does all of the above, plus
edits the contrib Makefile to add all of those, as well as editing the
README so it doesn't mention the patch, just the contrib executables?

The patch looks correct to me now. I haven't tested it yet, but will be
doing so in the last week of April, which is when I'll be doing docs for
this and other stuff, since time is pressing now.

--
Koichi Suzuki

#27Koichi Suzuki
suzuki.koichi@oss.ntt.co.jp
In reply to: Simon Riggs (#25)
2 attachment(s)
Re: [HACKERS] Full page writes improvement, code update

Hi,

Here are two patches:

1) lesslog_core.patch, the patch for core, which marks log records whose
backup blocks can be removed during archiving,

2) lesslog_contrib.patch, the patch for contrib/lesslog, which adds
pg_compresslog and pg_decompresslog,

as asked. I hope they work.
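
As a rough illustration of what the core patch marks, here is a minimal sketch (not the patch itself). The flag values are taken from the xlog.h hunk in lesslog_core.patch; the helper names mark_removable and needs_length_check are hypothetical and exist only for this example. The first mirrors the condition added to the insert path in xlog.c, the second mirrors the condition that guards the xl_tot_len check when a record is read back.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as they appear in the xlog.h hunk of lesslog_core.patch. */
#define XLR_BKP_BLOCK_MASK  0x0E    /* bits for backup blocks 1..3 */
#define XLR_BKP_REMOVABLE   0x01    /* bit 0: backup blocks may be dropped */

/*
 * Hypothetical helper (not in the patch): return xl_info with the
 * removable mark added when the patch's condition holds, that is, both
 * settings are on, the record carries at least one backup block, and no
 * online backup is forcing full page writes.
 */
uint8_t
mark_removable(uint8_t info, bool full_page_writes,
               bool wal_add_optimization_info, bool force_page_writes)
{
    if (full_page_writes && wal_add_optimization_info &&
        (info & XLR_BKP_BLOCK_MASK) && !force_page_writes)
        info |= XLR_BKP_REMOVABLE;
    return info;
}

/*
 * Hypothetical helper (not in the patch): true when the total-length
 * check should still run.  It is skipped only for a record that is
 * marked removable and whose backup block bits have all been cleared.
 */
bool
needs_length_check(uint8_t info)
{
    return !(info & XLR_BKP_REMOVABLE) || (info & XLR_BKP_BLOCK_MASK);
}
```

With these definitions, mark_removable(0x08, true, true, false) gives 0x09, while the same call with force_page_writes set to true leaves the value at 0x08. needs_length_check(0x01) is false, which corresponds to a record whose backup blocks were stripped during archiving.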

Regards;

Simon Riggs wrote:

On Tue, 2007-04-03 at 19:45 +0900, Koichi Suzuki wrote:

Bruce Momjian wrote:

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

Thank you very much for including. Attached is an update of the patch
according to Simon Riggs's comment about GUC name.

The patch comes with its own "install kit", which is great to review
(many thanks), but hard to determine where you think code should go when
committed.

My guess based on your patch
- the patch gets applied to core :-)
- pg_compresslog *and* pg_decompresslog go to a contrib directory called
contrib/lesslog?

Can you please produce a combined patch that does all of the above, plus
edits the contrib Makefile to add all of those, as well as editing the
README so it doesn't mention the patch, just the contrib executables?

The patch looks correct to me now. I haven't tested it yet, but will be
doing so in the last week of April, which is when I'll be doing docs for
this and other stuff, since time is pressing now.

--
Koichi Suzuki

Attachments:

lesslog_core.patch (text/plain)
diff -Ncar postgresql-8.2.1.org/src/backend/access/transam/xlog.c postgresql-8.2.1/src/backend/access/transam/xlog.c
*** postgresql-8.2.1.org/src/backend/access/transam/xlog.c	2006-12-01 03:29:11.000000000 +0900
--- postgresql-8.2.1/src/backend/access/transam/xlog.c	2007-04-06 16:54:23.000000000 +0900
***************
*** 137,142 ****
--- 137,143 ----
  char	   *XLOG_sync_method = NULL;
  const char	XLOG_sync_method_default[] = DEFAULT_SYNC_METHOD_STR;
  bool		fullPageWrites = true;
+ bool		walAddOptimizationInfo = false;
  
  #ifdef WAL_DEBUG
  bool		XLOG_DEBUG = false;
***************
*** 626,632 ****
  				{
  					/* Buffer already referenced by earlier chain item */
  					if (dtbuf_bkp[i])
! 						rdt->data = NULL;
  					else if (rdt->data)
  					{
  						len += rdt->len;
--- 627,641 ----
  				{
  					/* Buffer already referenced by earlier chain item */
  					if (dtbuf_bkp[i])
! 					{
! 						if (fullPageWrites && walAddOptimizationInfo && rdt->data)
! 						{
! 							len += rdt->len;
! 							COMP_CRC32(rdata_crc, rdt->data, rdt->len);
! 						}
! 						else
! 							rdt->data = NULL;
! 					}
  					else if (rdt->data)
  					{
  						len += rdt->len;
***************
*** 642,648 ****
  										&(dtbuf_lsn[i]), &(dtbuf_xlg[i])))
  					{
  						dtbuf_bkp[i] = true;
! 						rdt->data = NULL;
  					}
  					else if (rdt->data)
  					{
--- 651,663 ----
  										&(dtbuf_lsn[i]), &(dtbuf_xlg[i])))
  					{
  						dtbuf_bkp[i] = true;
! 						if (fullPageWrites && walAddOptimizationInfo && rdt->data)
! 						{
! 							len += rdt->len;
! 							COMP_CRC32(rdata_crc, rdt->data, rdt->len);
! 						}
! 						else
! 							rdt->data = NULL;
  					}
  					else if (rdt->data)
  					{
***************
*** 908,913 ****
--- 923,941 ----
  		return RecPtr;
  	}
  
+ 	/*
+ 	 * If online backup is not in progress and wal_add_optimization_info is on, 
+ 	 * mark backup blocks removable if any.
+ 	 * This mark will be referenced during archiving to remove needless backup
+ 	 * blocks in the record and compress WAL segment files.
+ 	 * NOTE: wal_add_optimization_info is ignored when full_page_writes is off.
+ 	 */
+ 	if (fullPageWrites && walAddOptimizationInfo && (info & XLR_BKP_BLOCK_MASK) &&
+ 		!Insert->forcePageWrites)
+ 	{
+ 		info |= XLR_BKP_REMOVABLE;
+ 	}
+ 
  	/* Insert record header */
  
  	record = (XLogRecord *) Insert->currpos;
***************
*** 2820,2832 ****
  		blk += blen;
  	}
  
! 	/* Check that xl_tot_len agrees with our calculation */
! 	if (blk != (char *) record + record->xl_tot_len)
  	{
! 		ereport(emode,
! 				(errmsg("incorrect total length in record at %X/%X",
! 						recptr.xlogid, recptr.xrecoff)));
! 		return false;
  	}
  
  	/* Finally include the record header */
--- 2848,2876 ----
  		blk += blen;
  	}
  
! 	/*
! 	 * If the physical log has not been removed, check the length.  That is
! 	 * the case when:
! 	 *   - no physical log existed originally,
! 	 *   - the WAL record was not removable because it was generated during
! 	 *     an online backup, or
! 	 *   - the physical log could not be removed because it spanned
! 	 *     two segments.
! 	 * We skip the length check when the physical log was removed because
! 	 * the XLR_SET_BKP_BLOCK(0..2) flags are then reset to zero, which keeps
! 	 * the above loop from advancing blk to the end of the record.
! 	 */
! 	if (!(record->xl_info & XLR_BKP_REMOVABLE) ||
! 		record->xl_info & XLR_BKP_BLOCK_MASK)
  	{
! 		/* Check that xl_tot_len agrees with our calculation */
! 		if (blk != (char *) record + record->xl_tot_len)
! 		{
! 			ereport(emode,
! 					(errmsg("incorrect total length in record at %X/%X",
! 							recptr.xlogid, recptr.xrecoff)));
! 			return false;
! 		}
  	}
  
  	/* Finally include the record header */
diff -Ncar postgresql-8.2.1.org/src/backend/utils/misc/guc.c postgresql-8.2.1/src/backend/utils/misc/guc.c
*** postgresql-8.2.1.org/src/backend/utils/misc/guc.c	2006-11-29 23:50:07.000000000 +0900
--- postgresql-8.2.1/src/backend/utils/misc/guc.c	2007-04-06 16:54:23.000000000 +0900
***************
*** 97,102 ****
--- 97,103 ----
  extern int	CommitSiblings;
  extern char *default_tablespace;
  extern bool fullPageWrites;
+ extern bool walAddOptimizationInfo;
  
  #ifdef TRACE_SORT
  extern bool trace_sort;
***************
*** 546,551 ****
--- 547,560 ----
  		true, NULL, NULL
  	},
  	{
+ 		{"wal_add_optimization_info", PGC_SIGHUP, WAL_SETTINGS,
+ 			gettext_noop("Writes logical log corresponding to full pages in WAL record."),
+ 			gettext_noop("")
+ 		},
+ 		&walAddOptimizationInfo,
+ 		false, NULL, NULL
+ 	},
+ 	{
  		{"silent_mode", PGC_POSTMASTER, LOGGING_WHEN,
  			gettext_noop("Runs the server silently."),
  			gettext_noop("If this parameter is set, the server will automatically run in the "
diff -Ncar postgresql-8.2.1.org/src/backend/utils/misc/postgresql.conf.sample postgresql-8.2.1/src/backend/utils/misc/postgresql.conf.sample
*** postgresql-8.2.1.org/src/backend/utils/misc/postgresql.conf.sample	2006-11-21 10:23:37.000000000 +0900
--- postgresql-8.2.1/src/backend/utils/misc/postgresql.conf.sample	2007-04-06 16:54:23.000000000 +0900
***************
*** 154,159 ****
--- 154,161 ----
  					#   fsync_writethrough
  					#   open_sync
  #full_page_writes = on			# recover from partial page writes
+ #wal_add_optimization_info = off	# write logical log corresponding to
+ 					# full pages
  #wal_buffers = 64kB			# min 32kB
  					# (change requires restart)
  #commit_delay = 0			# range 0-100000, in microseconds
diff -Ncar postgresql-8.2.1.org/src/include/access/xlog.h postgresql-8.2.1/src/include/access/xlog.h
*** postgresql-8.2.1.org/src/include/access/xlog.h	2006-11-06 07:42:10.000000000 +0900
--- postgresql-8.2.1/src/include/access/xlog.h	2007-04-06 16:54:23.000000000 +0900
***************
*** 66,73 ****
  /*
   * If we backed up any disk blocks with the XLOG record, we use flag bits in
   * xl_info to signal it.  We support backup of up to 3 disk blocks per XLOG
!  * record.	(Could support 4 if we cared to dedicate all the xl_info bits for
!  * this purpose; currently bit 0 of xl_info is unused and available.)
   */
  #define XLR_BKP_BLOCK_MASK		0x0E	/* all info bits used for bkp blocks */
  #define XLR_MAX_BKP_BLOCKS		3
--- 66,74 ----
  /*
   * If we backed up any disk blocks with the XLOG record, we use flag bits in
   * xl_info to signal it.  We support backup of up to 3 disk blocks per XLOG
!  * record.
!  * Bit 0 of xl_info is used to represent that backup blocks are not necessary
!  * in archive-log.
   */
  #define XLR_BKP_BLOCK_MASK		0x0E	/* all info bits used for bkp blocks */
  #define XLR_MAX_BKP_BLOCKS		3
***************
*** 75,80 ****
--- 76,82 ----
  #define XLR_BKP_BLOCK_1			XLR_SET_BKP_BLOCK(0)	/* 0x08 */
  #define XLR_BKP_BLOCK_2			XLR_SET_BKP_BLOCK(1)	/* 0x04 */
  #define XLR_BKP_BLOCK_3			XLR_SET_BKP_BLOCK(2)	/* 0x02 */
+ #define XLR_BKP_REMOVABLE		XLR_SET_BKP_BLOCK(3)	/* 0x01 */
  
  /*
   * Sometimes we log records which are out of transaction control.
lesslog_contrib.patch (text/plain)
diff -Ncr postgresql-8.2.1.org/contrib/Makefile postgresql-8.2.1/contrib/Makefile
*** postgresql-8.2.1.org/contrib/Makefile	2006-09-09 13:07:51.000000000 +0900
--- postgresql-8.2.1/contrib/Makefile	2007-04-06 17:14:21.000000000 +0900
***************
*** 16,21 ****
--- 16,22 ----
  		intagg		\
  		intarray	\
  		isn		\
+ 		lesslog	\
  		lo		\
  		ltree		\
  		oid2name	\
diff -Ncr postgresql-8.2.1.org/contrib/README postgresql-8.2.1/contrib/README
*** postgresql-8.2.1.org/contrib/README	2006-09-09 13:07:52.000000000 +0900
--- postgresql-8.2.1/contrib/README	2007-04-06 17:14:21.000000000 +0900
***************
*** 68,73 ****
--- 68,77 ----
  	PostgreSQL type extensions for ISBN, ISSN, ISMN, EAN13 product numbers
  	by Germán Méndez Bravo (Kronuz) <kronuz@hotmail.com>
  
+ lesslog -
+     Reduce archive log file size by removing unnecessary physical log.
+ 	by Koichi Suzuki <suzuki.koichi@oss.ntt.co.jp>
+ 
  lo -
  	Large Object maintenance
  	by Peter Mount <peter@retep.org.uk> 
diff -Ncr postgresql-8.2.1.org/contrib/lesslog/Makefile postgresql-8.2.1/contrib/lesslog/Makefile
*** postgresql-8.2.1.org/contrib/lesslog/Makefile	1970-01-01 09:00:00.000000000 +0900
--- postgresql-8.2.1/contrib/lesslog/Makefile	2007-04-06 17:14:21.000000000 +0900
***************
*** 0 ****
--- 1,3 ----
+ all install clean:
+ 	$(MAKE) -f Makefile.pg_compresslog $@
+ 	$(MAKE) -f Makefile.pg_decompresslog $@
diff -Ncr postgresql-8.2.1.org/contrib/lesslog/Makefile.pg_compresslog postgresql-8.2.1/contrib/lesslog/Makefile.pg_compresslog
*** postgresql-8.2.1.org/contrib/lesslog/Makefile.pg_compresslog	1970-01-01 09:00:00.000000000 +0900
--- postgresql-8.2.1/contrib/lesslog/Makefile.pg_compresslog	2007-04-06 17:14:21.000000000 +0900
***************
*** 0 ****
--- 1,19 ----
+ PROGRAM = pg_compresslog
+ OBJS	= pg_compresslog.o file.o debug.o
+ 
+ PG_CPPFLAGS = -I$(libpq_srcdir)
+ PG_LIBS = $(libpq_pgport) $(top_builddir)/src/backend/utils/hash/pg_crc.o
+ 
+ DOCS = README.lesslog
+ 
+ ifdef USE_PGXS
+ PGXS := $(shell pg_config --pgxs)
+ include $(PGXS)
+ else
+ subdir = contrib/pg_compresslog
+ top_builddir = ../..
+ include $(top_builddir)/src/Makefile.global
+ include $(top_srcdir)/contrib/contrib-global.mk
+ endif
+ 
+ $(OBJS): Makefile.pg_compresslog
diff -Ncr postgresql-8.2.1.org/contrib/lesslog/Makefile.pg_decompresslog postgresql-8.2.1/contrib/lesslog/Makefile.pg_decompresslog
*** postgresql-8.2.1.org/contrib/lesslog/Makefile.pg_decompresslog	1970-01-01 09:00:00.000000000 +0900
--- postgresql-8.2.1/contrib/lesslog/Makefile.pg_decompresslog	2007-04-06 17:14:21.000000000 +0900
***************
*** 0 ****
--- 1,19 ----
+ PROGRAM = pg_decompresslog
+ OBJS	= pg_decompresslog.o file.o debug.o
+ 
+ PG_CPPFLAGS = -I$(libpq_srcdir)
+ PG_LIBS = $(libpq_pgport) $(top_builddir)/src/backend/utils/hash/pg_crc.o
+ 
+ DOCS = 
+ 
+ ifdef USE_PGXS
+ PGXS := $(shell pg_config --pgxs)
+ include $(PGXS)
+ else
+ subdir = contrib/pg_decompresslog
+ top_builddir = ../..
+ include $(top_builddir)/src/Makefile.global
+ include $(top_srcdir)/contrib/contrib-global.mk
+ endif
+ 
+ $(OBJS): Makefile.pg_decompresslog
diff -Ncr postgresql-8.2.1.org/contrib/lesslog/README.lesslog postgresql-8.2.1/contrib/lesslog/README.lesslog
*** postgresql-8.2.1.org/contrib/lesslog/README.lesslog	1970-01-01 09:00:00.000000000 +0900
--- postgresql-8.2.1/contrib/lesslog/README.lesslog	2007-04-06 17:14:21.000000000 +0900
***************
*** 0 ****
--- 1,71 ----
+ lesslog README        2006/04/06
+ 
+ ** What is lesslog?
+ 
+ lesslog is a set of tools to reduce the size of PostgreSQL archive log.  lesslog consists of the following materials.
+ 
+ - pg_compresslog
+     This is a command to remove physical log records with "removable" mark. 
+     This command should be specified as archive_command in postgresql.conf.  
+     This command also removes page headers by changing page size from 8kB to
+     16MB, which are restored by pg_decompresslog.
+ 
+ - pg_decompresslog
+     This command restores page headers, adds dummy data to make up for the
+     removed physical log records, and thereby restores the LSN of each log
+     record and the page size to be used in archive recovery. This command
+     should be specified as restore_command in recovery.conf.
+ 
+ 
+ ** How to use lesslog
+ 1. Build and install the additional tools.
+     Move to contrib/lesslog directory, then make and make install.    
+     pg_compresslog and pg_decompresslog will be installed to PostgreSQL install
+     directory.
+ 2. Edit postgresql.conf
+     Edit the postgresql.conf that initdb copied into the DB cluster and set
+     the parameters as follows.
+ 
+         full_page_writes = on
+         wal_add_optimization_info = on
+         archive_command = 'pg_compresslog "%p" <archive directory>/"%f"'
+ 
+ ** How to use pg_compresslog
+ Synopsis
+     pg_compresslog [from [to]]
+ 
+ Explanation
+     pg_compresslog removes physical log from the WAL segment file specified by
+     <from> and archives as <to> file name.
+ 
+     If <from> is omitted or specified as "-", it reads the segment file from
+     stdin.  If <to> is omitted or specified as "-", it writes to stdout.
+ 
+     Physical log records removed by pg_compresslog are those written while
+     online backup is not running and both full_page_writes and
+     wal_add_optimization_info are "on".
+ 
+     To use the output of pg_compresslog command in archive recovery, it must be
+     restored using pg_decompresslog command.
+ 
+ Return value
+     pg_compresslog returns zero if no error occurs, nonzero if an error occurs.
+ 
+ ** How to use pg_decompresslog
+ Synopsis
+     pg_decompresslog [from [to]]
+ 
+ Explanation
+     pg_decompresslog reads the archive log file specified by the <from>
+     argument, restores an area corresponding to the removed physical log
+     (which restores the LSN of each log record), and writes the result to
+     the file specified by the <to> argument.
+ 
+     If <from> is omitted or specified as "-", it reads from stdin.   If <to> is
+     omitted or specified as "-", it writes to stdout.
+ 
+     You can specify the file written by pg_compresslog as the <from> argument.
+ 
+ Return value
+     It returns zero if no error occurs, 1 if error occurs.
+ 
diff -Ncr postgresql-8.2.1.org/contrib/lesslog/debug.c postgresql-8.2.1/contrib/lesslog/debug.c
*** postgresql-8.2.1.org/contrib/lesslog/debug.c	1970-01-01 09:00:00.000000000 +0900
--- postgresql-8.2.1/contrib/lesslog/debug.c	2007-04-06 17:14:21.000000000 +0900
***************
*** 0 ****
--- 1,125 ----
+ /*
+  * debug.c
+  *    Debug dump function implementation.
+  */
+ #include <stdio.h>
+ #include <errno.h>
+ 
+ #include "postgres.h"
+ #include "access/xlog.h"
+ #include "access/xlog_internal.h"
+ 
+ void get_segment_id(const char *filename);
+ void dump_record(XLogRecPtr *ptr, size_t off, XLogRecord *precord);
+ void dump_page_header(int num, XLogPageHeader pheader);
+ void dumpXLogRecord(XLogRecPtr *ptr, size_t off, XLogRecord *record);
+ 
+ /* Current segment ID  */
+ static uint32 segment_id;
+ 
+ /* List for the resource manager. */
+ static const char * const RM_names[RM_MAX_ID + 1] = {
+ 	"XLOG ",					/* 0 */
+ 	"XACT ",					/* 1 */
+ 	"SMGR ",					/* 2 */
+ 	"CLOG ",					/* 3 */
+ 	"DBASE",					/* 4 */
+ 	"TBSPC",					/* 5 */
+ 	"MXACT",					/* 6 */
+ 	"RM  7",					/* 7 */
+ 	"RM  8",					/* 8 */
+ 	"RM  9",					/* 9 */
+ 	"HEAP ",					/* 10 */
+ 	"BTREE",					/* 11 */
+ 	"HASH ",					/* 12 */
+ 	"RTREE",					/* 13 */
+ 	"GIST ",					/* 14 */
+ 	"SEQ  " 					/* 15 */
+ };
+ 
+ /*
+  * Obtain segment ID from WAL segment file name.
+  *
+  * Parameters:
+  *    filename: WAL segment file name.
+  *
+  * Note: If no slash mark (path delimiter) is included in the argument, or if
+  * the argument does not follow WAL segment file name format, nothing will
+  * happen.
+  */
+ void
+ get_segment_id(const char *filename)
+ {
+ 	TimeLineID tli;
+ 	uint32 xlogid;
+ 	char *p;
+ 	p = strrchr(filename, '/');
+ 	if (!p)
+ 		return;
+ 	p++;
+ 	if (sscanf(p, "%08X%08X%08X", &tli, &xlogid, &segment_id) != 3)
+ 		return;
+ }
+ 
+ /*
+  * Dump the page header content.
+  *
+  * Parameters:
+  *    num: Page number to be included in the dump output.
+  *    page: Target page.
+  */
+ void
+ dump_page_header(int num, XLogPageHeader page)
+ {
+ 	printf("=[%04d]==================================================\n", num);
+ 	printf("PAGE: xlp_magic=%02X\n", page->xlp_magic);
+ 	printf("PAGE: xlp_info=%02X\n", page->xlp_info);
+ 	printf("PAGE: xlp_tli=%u\n", page->xlp_tli);
+ 	printf("PAGE: xlogid=%u\n", page->xlp_pageaddr.xlogid);
+ 	printf("PAGE: xrecoff=%u\n", page->xlp_pageaddr.xrecoff);
+ 	if (page->xlp_info & XLP_FIRST_IS_CONTRECORD)
+ 	{
+ 		XLogContRecord *cont =
+ 			(XLogContRecord *)((char *)page + XLogPageHeaderSize(page));
+ 		printf("PAGE: rem_len=%u\n", cont->xl_rem_len);
+ 	}
+ 	printf("=========================================================\n");
+ }
+ 
+ /*
+  * Dump record header content in xlogdump format.
+  *
+  * Parameters:
+  *    ptr: Record position information (only log ID will be used).
+  *    off: Record offset within the segment.
+  *    record: Pointer to the record.
+  *
+  * Note: this code is adapted from the xlogdump source.
+  */
+ void
+ dumpXLogRecord(XLogRecPtr *ptr, size_t off, XLogRecord *record)
+ {
+ 	static XLogRecPtr prevRecPtr = { 0, 0};
+ 
+ 	printf("%u/%08X: prv %u/%08X",
+ 		   ptr->xlogid, (uint32)off + segment_id * XLOG_SEG_SIZE,
+ 		   record->xl_prev.xlogid, record->xl_prev.xrecoff);
+ 
+ 	if (!XLByteEQ(record->xl_prev, prevRecPtr))
+ 		printf("(?)");
+ 	prevRecPtr.xlogid = ptr->xlogid;
+ 	prevRecPtr.xrecoff = (uint32)off + segment_id * XLOG_SEG_SIZE;
+ 
+ 	printf("; xid %u; ", record->xl_xid);
+ 
+ 	if (record->xl_rmid <= RM_MAX_ID)
+ 		printf("%s", RM_names[record->xl_rmid]);
+ 	else
+ 		printf("RM %2d", record->xl_rmid);
+ 
+ 	printf(" info %02X len %u tot_len %u\n", record->xl_info,
+ 		   record->xl_len, record->xl_tot_len);
+ 
+ 	fflush(stdout);
+ }
+ 
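Reviewer's sketch (not part of the patch): get_segment_id() above relies on
sscanf() with three %08X conversions to split a WAL segment file name into
timeline ID, log ID and segment number. The stand-alone version below shows
that parsing step in isolation. The name parse_segment_name and its
out-parameters are hypothetical, and unlike get_segment_id() it also accepts a
bare file name without a directory part and reports failure instead of
silently doing nothing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): split a WAL segment file
 * name such as "000000010000000000000003" into its three 8-hex-digit
 * fields.  A leading directory part, if any, is skipped.
 * Returns 0 on success, -1 if the name does not match the format.
 */
static int
parse_segment_name(const char *path, unsigned int *tli,
                   unsigned int *logid, unsigned int *segno)
{
    const char *base;

    assert(path != NULL && tli != NULL && logid != NULL && segno != NULL);

    base = strrchr(path, '/');
    base = base ? base + 1 : path;

    /* A segment name is exactly 24 hexadecimal digits. */
    if (strlen(base) != 24 || strspn(base, "0123456789ABCDEFabcdef") != 24)
        return -1;

    /* Each %8X conversion consumes at most eight hex digits. */
    if (sscanf(base, "%8X%8X%8X", tli, logid, segno) != 3)
        return -1;

    return 0;
}
```

Returning a status lets a caller tell "not a segment name" apart from
"segment 0", which the void interface in debug.c cannot express.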
diff -Ncr postgresql-8.2.1.org/contrib/lesslog/debug.h postgresql-8.2.1/contrib/lesslog/debug.h
*** postgresql-8.2.1.org/contrib/lesslog/debug.h	1970-01-01 09:00:00.000000000 +0900
--- postgresql-8.2.1/contrib/lesslog/debug.h	2007-04-06 17:14:21.000000000 +0900
***************
*** 0 ****
--- 1,28 ----
+ /*
+  * debug.h
+  *    Interface for debug dump function.
+  */
+ #ifndef DEBUG_H_INCLUDED
+ #define DEBUG_H_INCLUDED
+ 
+ #include "access/xlog.h"
+ #include "access/xlog_internal.h"
+ 
+ /*
+  * In the release, debug function call itself will be eliminated.
+  */
+ #ifdef DEBUG
+ 
+ void get_segment_id(const char *filename);
+ void dump_page_header(int num, XLogPageHeader pheader);
+ void dumpXLogRecord(XLogRecPtr *ptr, size_t off, XLogRecord *record);
+ 
+ #else
+ 
+ #define get_segment_id(a)
+ #define dump_page_header(a, b)
+ #define dumpXLogRecord(a, b, c)
+ 
+ #endif
+ 
+ #endif
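Reviewer's sketch (not part of the patch): debug.h removes the debug calls in
release builds by defining empty function-like macros. One consequence worth
keeping in mind is that the argument expressions disappear together with the
call, so they must not carry side effects the program depends on. The names
dump_value and count_evaluations below are hypothetical and exist only to
show this effect.

```c
#include <assert.h>

#ifdef DEBUG
/* Debug build: a real function, so the argument expression is evaluated. */
static void
dump_value(int value)
{
    assert(value >= 0);
}
#else
/* Release build: the call and its argument expression expand to nothing. */
#define dump_value(value)
#endif

/*
 * Returns how many times the argument expression was evaluated:
 * 1 when compiled with -DDEBUG, 0 otherwise.
 */
static int
count_evaluations(void)
{
    int evaluated = 0;

    dump_value(++evaluated);
    return evaluated;
}
```

The calls in the patch pass plain values (pointers and offsets), so they
should be unaffected; the sketch only documents the constraint for future
callers.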
diff -Ncr postgresql-8.2.1.org/contrib/lesslog/file.c postgresql-8.2.1/contrib/lesslog/file.c
*** postgresql-8.2.1.org/contrib/lesslog/file.c	1970-01-01 09:00:00.000000000 +0900
--- postgresql-8.2.1/contrib/lesslog/file.c	2007-04-06 17:14:21.000000000 +0900
***************
*** 0 ****
--- 1,180 ----
+ /*
+  * file.c
+  *    Common I/O routine implementation used in archive/restoration
+  */
+ #include <stdio.h>
+ #include <stdlib.h>
+ #include <string.h>
+ #include <sys/types.h>
+ #include <sys/stat.h>
+ #include <fcntl.h>
+ #include <unistd.h>
+ #include <errno.h>
+ 
+ #include "postgres.h"
+ #include "access/xlog_internal.h"
+ 
+ #include "file.h"
+ 
+ /*
+  * Read the file data and return the size actually read.
+  * The length to read is specified by an argument.
+  *
+  * Parameters:
+  *    fd: File descriptor.
+  *    buff: Buffer to read.
+  *    len: Size to read.
+  *
+  * Note: If an error occurs, exit(1) is called here and control does not
+  * return to the caller.
+  */
+ int
+ read_buff(int fd, char *buff, size_t len)
+ {
+ 	int ret;
+ 	size_t read_len = 0;
+ 
+ 	do
+ 	{
+ 		ret = read(fd, buff + read_len, len - read_len);
+ 		if (ret < 0)
+ 		{
+ 			if (errno == EINTR)
+ 				continue;
+ 			fprintf(stderr, "failed to read : %s\n", strerror(errno));
+ 			exit(1);
+ 		}
+ 		else if (ret == 0)
+ 			break;
+ 		read_len += ret;
+ 	} while (read_len < len);
+ 
+ 	return read_len;
+ }
+ 
+ /*
+  * Write the whole buffer to the file, retrying after partial writes.
+  * The length to write is specified by an argument.
+  *
+  * Parameters:
+  *    fd: File descriptor.
+  *    buff: Buffer to write.
+  *    len: Size to write.
+  *
+  * Note: If an error occurs, exit(1) is called in this function and control
+  * does not return to the caller.
+  */
+ void
+ write_buff(int fd, const char *buff, size_t len)
+ {
+ 	int ret;
+ 	size_t written_len = 0;
+ 
+ 	do
+ 	{
+ 		ret = write(fd, buff + written_len, len - written_len);
+ 		if (ret < 0)
+ 		{
+ 			if (errno == EINTR)
+ 				continue;
+ 			fprintf(stderr, "failed to write : %s\n", strerror(errno));
+ 			exit(1);
+ 		}
+ 		written_len += ret;
+ 	} while (written_len < len);
+ }
+ 
+ /*
+  * Copy the contents of the file.
+  *
+  * Parameter:
+  *    from_fd: File descriptor of the file to copy from.
+  *    to_fd: File descriptor of the file to copy to.
+  *
+  * Note: If an error occurs in this function, exit(1) is called here and
+  * control does not return to the caller.
+  */
+ void
+ copy_file(int from_fd, int to_fd)
+ {
+ 	int read_len = 0;
+ 	char buff[8 * 1024];	/* 8KB buffer */
+ 
+ 	while (1)
+ 	{
+ 		/* Read to the buffer. */
+ 		read_len = read(from_fd, buff, sizeof(buff));
+ 		if (read_len < 0)
+ 		{
+ 			if (errno == EINTR)
+ 				continue;
+ 			fprintf(stderr, "failed to read : %s\n", strerror(errno));
+ 			exit(1);
+ 		}
+ 		else if (read_len == 0)
+ 			break;
+ 		/* Write all the buffer content. */
+ 		write_buff(to_fd, buff, read_len);
+ 	}
+ 
+ 	return;
+ }
+ 
+ /*
+  * Validate the record by comparing CRC value.
+  *
+  * CRC value will be calculated in the following order.
+  *  - Logical Log
+  *  - Full page write (if exists)
+  *  - Record header (excluding the CRC area)
+  *
+  * Parameters:
+  *    precord: Pointer to the target record.
+  */
+ bool
+ is_valid_record(XLogRecord *precord)
+ {
+ 	pg_crc32 crc;
+ 	BkpBlock *pblk;
+ 	int i;
+ 
+ 	/* Calculate CRC for a logical log. */
+ 	INIT_CRC32(crc);
+ 	COMP_CRC32(crc, XLogRecGetData(precord), precord->xl_len);
+ 
+ 	/*
+ 	 * If full page writes exist, calculate CRC for each full page write.
+ 	 */
+ 	pblk = (BkpBlock *)((char *)XLogRecGetData(precord) + precord->xl_len);
+ 	for (i = 0; i < XLR_MAX_BKP_BLOCKS; i++)
+ 	{
+ 		uint32 blen;
+ 
+ 		if (!(precord->xl_info & XLR_SET_BKP_BLOCK(i)))
+ 			continue;
+ 
+ 		if (pblk->hole_offset + pblk->hole_length > BLCKSZ)
+ 		{
+ 			fprintf(stderr, "incorrect hole size in record.\n");
+ 			return false;
+ 		}
+ 
+ 		blen = sizeof(BkpBlock) + BLCKSZ - pblk->hole_length;
+ 		COMP_CRC32(crc, (char *)pblk, blen);
+ 		pblk = (BkpBlock *)((char *)pblk + blen);
+ 	}
+ 
+ 	/* Calculate record header CRC value. */
+ 	COMP_CRC32(crc, (char *)precord + sizeof(pg_crc32),
+ 		SizeOfXLogRecord - sizeof(pg_crc32));
+ 
+ 	/* Examine if the final CRC is the same as the value found in the record. */
+ 	FIN_CRC32(crc);
+ 	if (!EQ_CRC32(precord->xl_crc, crc))
+ 	{
+ 		fprintf(stderr, "incorrect resource manager data checksum.\n");
+ 		return false;
+ 	}
+ 
+ 	return true;
+ }
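Reviewer's sketch (not part of the patch): read_buff() above loops until the
requested length has arrived, retrying on EINTR and on short reads, which is
what makes reading a whole segment from a pipe work. The variant below shows
the same loop in isolation. The name read_full is hypothetical, and as a
deliberate difference it returns -1 on error instead of calling exit(), so the
policy is left to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical variant of read_buff(): read up to len bytes, retrying on
 * EINTR and on short reads.  Returns the byte count (less than len only at
 * end of input) or -1 with errno set.
 */
static ssize_t
read_full(int fd, char *buf, size_t len)
{
    size_t done = 0;

    assert(buf != NULL);

    while (done < len)
    {
        ssize_t n = read(fd, buf + done, len - done);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted before any data: retry */
            return -1;
        }
        if (n == 0)
            break;              /* end of input */
        done += (size_t) n;
    }
    return (ssize_t) done;
}
```

A result shorter than len then means end of input, which is the condition
pg_compresslog uses to decide that its input is not a full WAL segment.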
diff -Ncr postgresql-8.2.1.org/contrib/lesslog/file.h postgresql-8.2.1/contrib/lesslog/file.h
*** postgresql-8.2.1.org/contrib/lesslog/file.h	1970-01-01 09:00:00.000000000 +0900
--- postgresql-8.2.1/contrib/lesslog/file.h	2007-04-06 17:14:21.000000000 +0900
***************
*** 0 ****
--- 1,28 ----
+ /*
+  * file.h
+  *    Common file I/O routines for pg_compresslog and pg_decompresslog.
+  */
+ #ifndef FILE_H_INCLUDED
+ #define FILE_H_INCLUDED
+ 
+ #include "postgres.h"
+ #include "access/xlog.h"
+ 
+ int read_buff(int fd, char *buff, size_t len);
+ void write_buff(int fd, const char *buff, size_t len);
+ void copy_file(int from_fd, int to_fd);
+ bool is_valid_record(XLogRecord *precord);
+ 
+ /*
+  * Check if the page header in the buffer is valid.
+  */
+ #define IS_WAL_FILE(buff) \
+ 	(((XLogPageHeader)(buff))->xlp_magic == XLOG_PAGE_MAGIC)
+ 
+ /*
+  * Check if the record is log switch WAL record.
+  */
+ #define IS_XLOG_SWITCH(rec) \
+ 	((rec)->xl_rmid == RM_XLOG_ID && (rec)->xl_info == XLOG_SWITCH)
+ 
+ #endif /* !FILE_H_INCLUDED */
diff -Ncr postgresql-8.2.1.org/contrib/lesslog/pg_compresslog.c postgresql-8.2.1/contrib/lesslog/pg_compresslog.c
*** postgresql-8.2.1.org/contrib/lesslog/pg_compresslog.c	1970-01-01 09:00:00.000000000 +0900
--- postgresql-8.2.1/contrib/lesslog/pg_compresslog.c	2007-04-06 17:14:21.000000000 +0900
***************
*** 0 ****
--- 1,450 ----
+ /*
+  * pg_compresslog.c
+  *    Implementation of the archive command (pg_compresslog).
+  */
+ #include <stdio.h>
+ #include <sys/types.h>
+ #include <sys/stat.h>
+ #include <fcntl.h>
+ #include <unistd.h>
+ #include <errno.h>
+ 
+ #include "postgres.h"
+ #include "access/xlog.h"
+ #include "access/xlog_internal.h"
+ #include "catalog/pg_control.h"
+ 
+ #include "file.h"
+ #include "debug.h"
+ 
+ /* =============================================================================
+  * Global variables
+  * ===========================================================================*/
+ 
+ /* Buffer to read WAL segment file. */
+ static char xlog_buff[XLogSegSize];
+ /* Buffer to hold archive log file image with physical log removed. */
+ static char arch_buff[XLogSegSize];
+ 
+ static int cont_log_size;		/* Log size considering the former segment. */
+ static int logical_log_size;	/* Total size of the logical log. */
+ static int physical_log_size;	/* Total size of the physical log. */
+ 
+ /* =============================================================================
+  * Prototype declaration
+  * ===========================================================================*/
+ static void print_usage(int code);
+ static int open_xlog_file(int argc, char *argv[]);
+ static int open_arch_file(int argc, char *argv[]);
+ int create_arch_image(const char *from, char *to);
+ static bool remove_bkp_block(XLogRecord *record);
+ 
+ /* =============================================================================
+  * Macros
+  * ===========================================================================*/
+ /* Check if the physical log can be removed. */
+ #define IS_REMOVABLE(record) \
+ 	(((record)->xl_info & XLR_BKP_BLOCK_MASK) && \
+ 	((record)->xl_info & XLR_BKP_REMOVABLE))
+ 
+ /* =============================================================================
+  * Function definitions
+  * ===========================================================================*/
+ /*
+  * Entry point of pg_compresslog command.
+  */
+ int
+ main(int argc, char *argv[])
+ {
+ 	int from_fd = -1;
+ 	int to_fd = -1;
+ 	size_t xlog_len;
+ 	size_t arch_len;
+ 
+ 	/* Error if there are more argument(s) other than from and to. */
+ 	if (argc > 3)
+ 		print_usage(1);
+ 
+ 	/* Open WAL segment file to archive. */
+ 	from_fd = open_xlog_file(argc, argv);
+ 
+ 	/*
+ 	 * If input file is not stdin, check the size of the file.
+ 	 * If the size is not 16MB, then the specified file is not a WAL
+ 	 * segment file.  Since we cannot be sure the file can be scanned for
+ 	 * removable physical log, copy the whole file and then exit.
+ 	 */
+ 	if (from_fd != fileno(stdin))
+ 	{
+ 		struct stat st;
+ 
+ 		if (fstat(from_fd, &st) < 0)
+ 		{
+ 			fprintf(stderr, "failed to stat `%s': %s\n", argv[1],
+ 				strerror(errno));
+ 			exit(1);
+ 		}
+ 		if (st.st_size != XLogSegSize)
+ 		{
+ 			to_fd = open_arch_file(argc, argv);
+ 			copy_file(from_fd, to_fd);
+ 			exit(0);
+ 		}
+ 	}
+ 
+ 	/*
+ 	 * Read all the data from WAL segment file to archive.
+ 	 * If the amount of the data is not sufficient (less than 16MB: XLogSegSize)
+ 	 * or header is not valid, specified input file is not a WAL segment file.  
+ 	 * Copy the whole input to the output and then exit.
+ 	 */
+ 	xlog_len = read_buff(from_fd, xlog_buff, XLogSegSize);
+ 	if (xlog_len != XLogSegSize || !IS_WAL_FILE(xlog_buff))
+ 	{
+ 		/* Write the checked header part and then copy the rest of the input file. */
+ 		to_fd = open_arch_file(argc, argv);
+ 		write_buff(to_fd, xlog_buff, xlog_len);
+ 		copy_file(from_fd, to_fd);
+ 		if (close(from_fd) < 0)
+ 		{
+ 			fprintf(stderr, "failed to close `%s': %s\n", argv[1],
+ 				strerror(errno));
+ 			exit(1);
+ 		}
+ 		exit(0);
+ 	}
+ 	if (close(from_fd) < 0)
+ 	{
+ 		fprintf(stderr, "failed to close `%s': %s\n", argv[1], strerror(errno));
+ 		exit(1);
+ 	}
+ 
+ 	/*
+ 	 * Build the entire compressed output file image on the buffer,
+ 	 * removing physical logs, then write the whole compressed file image.
+ 	 */
+ 	arch_len = create_arch_image(xlog_buff, arch_buff); 
+ 	to_fd = open_arch_file(argc, argv);
+ 	write_buff(to_fd, arch_buff, arch_len);
+ 	if (close(to_fd) < 0)
+ 	{
+ 		fprintf(stderr, "failed to close `%s': %s\n", argv[2], strerror(errno));
+ 		exit(1);
+ 	}
+ 
+ 	exit(0);
+ }
+ 
+ /*
+  * Show the usage of the command and then exits with specified code.
+  */
+ static void
+ print_usage(int code)
+ {
+ 	printf(
+ 		"usage: pg_compresslog [from [to]]\n"
+ 		"    from - Input file name (stdin if omitted or '-' is given)\n"
+ 		"    to   - Output file name (stdout if omitted or '-' is given)\n"
+ 	);
+ 	exit(code);
+ }
+ 
+ /*
+  * Build archive log file image of 16MB, from WAL segment buffer image of
+  * 8kB pages. And return size of archive log file image.
+  *
+  * Parameters:
+  *    from: WAL segment buffer page
+  *    to: Archive log file image
+  *
+  * Note: exit() will be called here when an error is detected.  In that
+  * case, control does not return to the caller.
+  */
+ int
+ create_arch_image(const char *from, char *to)
+ {
+ 	const char *read_pos = from;
+ 	char *write_pos = to;
+ 	const char *crrpage = from;
+ 	XLogPageHeader page = (XLogPageHeader)from;
+ 	XLogRecord *rec = NULL;
+ 	XLogRecord *write_rec = NULL;
+ 
+ 	/*
+ 	 * Copy the first page header of the segment to the buffer.
+ 	 * If the record is the successor of the last record of the former segment,
+ 	 * then copies XLogContRecord too.
+ 	 * XLogContRecord.xl_rem_len means the total data length of the
+ 	 * continuation record, not the length of the record in the given page.
+ 	 * Therefore, this value is not influenced by the change of the page size.
+ 	 */
+ 	read_pos = crrpage = from;
+ 	memcpy(write_pos, read_pos, XLogPageHeaderSize(page));
+ 	write_pos += XLogPageHeaderSize(page);
+ 	read_pos += XLogPageHeaderSize(page);
+ 	if (page->xlp_info & XLP_FIRST_IS_CONTRECORD)
+ 	{
+ 		memcpy(write_pos, read_pos, SizeOfXLogContRecord);
+ 		write_pos += SizeOfXLogContRecord;
+ 	}
+ 
+ 	/*
+ 	 * Loop page by page.
+ 	 */
+ 	for (crrpage = from; crrpage < from + XLogSegSize; crrpage += XLOG_BLCKSZ)
+ 	{
+ 		XLogRecPtr ptr;
+ 
+ 		/* Parse the page header. */
+ 		page = (XLogPageHeader)crrpage;
+ 		read_pos = crrpage + XLogPageHeaderSize(page);
+ 		ptr = page->xlp_pageaddr;
+ 
+ 		/* If there is continuation data, copy it to the write buffer. */
+ 		if (page->xlp_info & XLP_FIRST_IS_CONTRECORD)
+ 		{
+ 			int cont_len = ((XLogContRecord *)read_pos)->xl_rem_len;
+ 			int copy_len = cont_len;
+ 			int free_len = XLOG_BLCKSZ -
+ 				(read_pos + SizeOfXLogContRecord - crrpage);
+ 			/*
+ 			 * Copy the continuous data within this page.
+ 			 * xl_rem_len specifies the length of the continuous data after this page,
+ 			 * so this may be larger than the length of the rest of this page.
+ 			 */
+ 			if (copy_len > free_len)
+ 				copy_len = free_len;
+ 			memcpy(write_pos, read_pos + SizeOfXLogContRecord, copy_len);
+ 			read_pos += MAXALIGN(SizeOfXLogContRecord + copy_len);
+ 			write_pos += copy_len;
+ 			if (!rec)
+ 				cont_log_size += copy_len;
+ 
+ 			/*
+ 			 * If the data continues to the next page and no record header
+ 			 * exists in this file, then switch to the next page.
+ 			 */
+ 			if (cont_len != copy_len)
+ 				continue;
+ 
+ 			/*
+ 			 * Set the write position to the end of the current record, 
+ 			 * considering alignment.
+ 			 */
+ 			write_pos = to + MAXALIGN(write_pos - to);
+ 
+ 			/*
+ 			 * If the record should have a header in this segment (not a continuous
+ 			 * record from the last segment), perform CRC check and check if
+ 			 * physical log record can be removed.
+ 			 */
+ 			if (write_rec)
+ 			{
+ 				/* Check if the record is valid. */
+ 				if (!is_valid_record(write_rec))
+ 					exit(1);
+ 
+ 				/*
+ 				 * Determine if the physical log can be removed.
+ 				 * If it can be removed, then rewind the position for the next log record
+ 				 * to the position of the physical log (plus padding).
+ 				 */
+ 				if (remove_bkp_block(write_rec))
+ 					write_pos = (char *)write_rec +
+ 						MAXALIGN(SizeOfXLogRecord + write_rec->xl_len);
+ 			}
+ 		}
+ 
+ 		/* Read the data within the page record by record. */
+ 		while(read_pos <= crrpage + XLOG_BLCKSZ - SizeOfXLogRecord)
+ 		{
+ 			int freespace = XLOG_BLCKSZ - (read_pos - crrpage);
+ 
+ 			/* Obtain the record header info. */
+ 			rec = (XLogRecord *)read_pos;
+ 			write_rec = (XLogRecord *)write_pos;
+ 			logical_log_size += rec->xl_len;
+ 			physical_log_size +=
+ 				rec->xl_tot_len - (SizeOfXLogRecord + rec->xl_len);
+ 			dumpXLogRecord(&ptr, read_pos - from, rec);
+ 
+ 			/*
+ 			 * If the record continues to the following pages, copy only the portion
+ 			 * in this page and then switch to the next page.
+ 			 */
+ 			if (rec->xl_tot_len > freespace)
+ 			{
+ 				/* Copy the log data only in the current page. */
+ 				memcpy(write_pos, read_pos, freespace);
+ 				/* read_pos will be overwritten at the next loop.  We don't need to update this here. */
+ 				write_pos += freespace;
+ 				break;
+ 			}
+ 
+ 			/* Copy the record data to the archive buffer.  */
+ 			memcpy(write_pos, read_pos, rec->xl_tot_len);
+ 			read_pos += MAXALIGN(rec->xl_tot_len);
+ 			write_pos += MAXALIGN(rec->xl_tot_len);
+ 
+ 			/* Check if the record is valid using CRC in the record header.  */
+ 			if (!is_valid_record(write_rec))
+ 				exit(1);
+ 
+ 			/*
+ 			 * A log record other than a log switch must have its logical data.
+ 			 * See the comment around the line 3065 of src/backend/access/transam/xlog.c
+ 			 * (8.2.0).
+ 			 */
+ 			if (IS_XLOG_SWITCH(write_rec))
+ 			{
+ 				if (write_rec->xl_len != 0)
+ 				{
+ 					fprintf(stderr, "invalid xlog switch record.\n");
+ 					exit(1);
+ 				}
+ 			}
+ 			else if (write_rec->xl_len == 0)
+ 			{
+ 				fprintf(stderr, "invalid record length.\n");
+ 				exit(1);
+ 			}
+ 
+ 			/* If the log record is the log switch record, no more log records exist
+ 			 * in the input file.  Return the image size.
+ 			 */
+ 			if (IS_XLOG_SWITCH(write_rec))
+ 				return write_pos - to;
+ 
+ 			/*
+ 			 * If the physical log is removable, then rewind the position of the
+ 			 * next record to the start position of the physical log (plus
+ 			 * padding).
+ 			 */
+ 			if (remove_bkp_block(write_rec))
+ 				write_pos = (char *)write_rec +
+ 					MAXALIGN(SizeOfXLogRecord + write_rec->xl_len);
+ 			else
+ 				write_pos = to + MAXALIGN(write_pos - to);
+ 
+ 		}
+ 	}
+ 
+ 	return (write_pos - to);
+ }
+ 
+ /*
+  * Remove the physical log which was marked `REMOVABLE'.
+  * Return true if the physical record has been removed, false otherwise.
+  */
+ static bool
+ remove_bkp_block(XLogRecord *record)
+ {
+ 	pg_crc32 crc;
+ 
+ 	/*
+ 	 * If no record is specified or the physical log is not removable, just
+ 	 * return.
+ 	 */
+ 	if (!record || !IS_REMOVABLE(record))
+ 		return false;
+ 
+ 	/*
+ 	 * Reset XLR_BKP_BLOCK_MASK.
+ 	 * The flag showing that the physical log is removable is not reset; it
+ 	 * is needed to restore the removed physical log with a dummy.
+ 	 */
+ 	record->xl_info &= ~XLR_BKP_BLOCK_MASK;
+ 
+ 	/*
+ 	 * Record contents changes by physical log removal and CRC has to be
+ 	 * recalculated.
+ 	 * CRC will be accumulated as follows:
+ 	 * 1. Logical log
+ 	 * 2. Physical log (It has been removed and we don't calculate its CRC here).
+ 	 * 3. WAL record header excluding CRC part
+ 	 * Please refer to the line 2817 of RecordIsValid(), src/backend/access/transam/xlog.c.
+ 	 */
+ 	INIT_CRC32(crc);
+ 	COMP_CRC32(crc, XLogRecGetData(record), record->xl_len);
+ 	COMP_CRC32(crc, (char *)record + sizeof(pg_crc32),
+ 		SizeOfXLogRecord - sizeof(pg_crc32));
+ 	FIN_CRC32(crc);
+ 	record->xl_crc = crc;
+ 
+ 	return true;
+ }
+ 
+ /*
+  * Open the WAL segment file to archive and return file descriptor.
+  * 
+  * The first argument of pg_compresslog will be regarded as an input file.
+  * If omitted or specified as "-", stdin will be used as an input file.
+  *
+  * Parameters:
+  *    argc: Number of arguments (argument to main() will be passed as is).
+  *    argv: Array of pointers to argument strings (argument to main() will be
+  *    passed as is).
+  *
+  * Note: exit() will be called here if error occurs.  Will not return to the
+  * caller in this case.
+  */
+ static int
+ open_xlog_file(int argc, char *argv[])
+ {
+ 	int from_fd = -1;
+ 
+ 	if (argc > 1 && strcmp(argv[1], "-") != 0)
+ 	{
+ 		/* Open WAL segment file to archive. */
+ 		from_fd = open(argv[1], O_RDONLY, 0);
+ 		if (from_fd < 0)
+ 		{
+ 			fprintf(stderr, "failed to open `%s': %s\n", argv[1],
+ 				strerror(errno));
+ 			exit(1);
+ 		}
+ 
+ 		/* Obtain segment ID from the file name (for record dump). */
+ 		get_segment_id(argv[1]);
+ 	}
+ 	else
+ 		from_fd = fileno(stdin);
+ 
+ 	return from_fd;
+ }
+ 
+ /*
+  * Open the archive segment file to write the result and return file descriptor.
+  * 
+  * The second argument to pg_compresslog will be regarded as an output file.
+  * If omitted or specified as "-", stdout will be used as an output file.
+  *
+  * Parameters:
+  *    argc: Number of arguments (argument to main() will be passed as is).
+  *    argv: Array of pointers to argument strings (argument to main(), passed as is).
+  *
+  * Note: When an error occurs within this function, exit() will be called here
+  * and will not return to the caller in this case.
+  */
+ static int
+ open_arch_file(int argc, char *argv[])
+ {
+ 	int to_fd = -1;
+ 
+ 	if (argc > 2 && strcmp(argv[2], "-") != 0)
+ 	{
+ 		/* Open the archive log file. */
+ 		to_fd = open(argv[2], O_RDWR | O_CREAT | O_EXCL | PG_BINARY,
+ 			S_IRUSR | S_IWUSR);
+ 		if (to_fd < 0)
+ 		{
+ 			fprintf(stderr, "failed to open `%s': %s\n", argv[2],
+ 				strerror(errno));
+ 			exit(1);
+ 		}
+ 	}
+ 	else
+ 		to_fd = fileno(stdout);
+ 
+ 	return to_fd;
+ }
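Reviewer's sketch (not part of the patch): create_arch_image() above saves
space by rewinding write_pos to MAXALIGN(header + logical data) whenever
remove_bkp_block() succeeds. The arithmetic can be checked in isolation as
below. Everything here is an assumption made for illustration: the 8-byte
alignment, the SKETCH_ names, the function archived_size, and the header size
passed in by the caller, which is not taken from the PostgreSQL sources.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 8-byte maximum alignment; the real value is platform-specific. */
#define SKETCH_ALIGNOF 8
#define SKETCH_MAXALIGN(len) \
    (((size_t) (len) + (SKETCH_ALIGNOF - 1)) & ~((size_t) (SKETCH_ALIGNOF - 1)))

/*
 * Bytes one record occupies in the archive image.
 *   hdr_len:   record header size
 *   data_len:  logical log (resource manager data) size
 *   tot_len:   total record length, including any full-page images
 *   removable: nonzero if the full-page images may be dropped
 */
static size_t
archived_size(size_t hdr_len, size_t data_len, size_t tot_len, int removable)
{
    assert(tot_len >= hdr_len + data_len);

    /* Full-page images present and removable: keep header + logical data. */
    if (removable && tot_len > hdr_len + data_len)
        return SKETCH_MAXALIGN(hdr_len + data_len);

    /* Otherwise the record is copied unchanged, padded to alignment. */
    return SKETCH_MAXALIGN(tot_len);
}
```

With a hypothetical 32-byte header, 50 bytes of logical data and one page
image, the record shrinks from 8296 aligned bytes to 88, which illustrates
where the archive size reduction reported in this thread comes from.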
diff -Ncr postgresql-8.2.1.org/contrib/lesslog/pg_decompresslog.c postgresql-8.2.1/contrib/lesslog/pg_decompresslog.c
*** postgresql-8.2.1.org/contrib/lesslog/pg_decompresslog.c	1970-01-01 09:00:00.000000000 +0900
--- postgresql-8.2.1/contrib/lesslog/pg_decompresslog.c	2007-04-06 17:14:21.000000000 +0900
***************
*** 0 ****
--- 1,543 ----
+ /*
+  * pg_decompresslog.c
+  *    Implementation of the archive restore command (pg_decompresslog).
+  */
+ #include <stdio.h>
+ #include <sys/types.h>
+ #include <sys/stat.h>
+ #include <fcntl.h>
+ #include <unistd.h>
+ 
+ #include "postgres.h"
+ #include "access/xlog.h"
+ #include "access/xlog_internal.h"
+ #include "catalog/pg_control.h"
+ 
+ #include "file.h"
+ #include "debug.h"
+ 
+ /* =============================================================================
+  * Global variables
+  * ===========================================================================*/
+ 
+ /* Buffer to hold restored WAL segment file image. */
+ static char xlog_buff[XLogSegSize];
+ /* Buffer to read an archive log file. */
+ static char arch_buff[XLogSegSize];
+ /* Position to write a record data. */
+ static char *write_pos = xlog_buff;
+ /* Position to read record data in the archive log buffer. */
+ static char *read_pos = arch_buff;
+ /* This holds data in the first page header of the segment. */
+ static XLogPageHeaderData baseheader;
+ 
+ /* =============================================================================
+  * Prototype declaration
+  * ===========================================================================*/
+ static void print_usage(int code);
+ int create_wal_image(int arch_len);
+ static int write_record(char *record_buff, int rem_len, bool isFromPrevSeg);
+ static int get_freespace(void);
+ static void insert_XLogContRecord(char *write_pos, int rem_len);
+ static void insert_pageheader(char *write_pos, XLogPageHeader pheader,
+ 	bool hasContRecord);
+ static int open_arch_file(int argc, char *argv[]);
+ static int open_xlog_file(int argc, char *argv[]);
+ 
+ /* =============================================================================
+  * Function definitions
+  * ===========================================================================*/
+ 
+ /*
+  * Entry point of pg_decompresslog command.
+  */
+ int
+ main(int argc, char *argv[])
+ {
+ 	int from_fd = -1;
+ 	int to_fd = -1;
+ 	size_t arch_len;
+ 
+ 	/* Error if argument(s) other than <from>, and <to> are given.  */
+ 	if (argc > 3)
+ 		print_usage(1);
+ 
+ 	/* Open the archive log file to restore. */
+ 	from_fd = open_arch_file(argc, argv);
+ 
+ 	/*
+ 	 * Read all the data in the input archive log file.
+ 	 * If the header at the first page is not valid, the input is not a WAL
+ 	 * segment file, so copy the whole input file to the output file.
+ 	 */
+ 	arch_len = read_buff(from_fd, arch_buff, XLogSegSize);
+ 	if (!IS_WAL_FILE(arch_buff))
+ 	{
+ 		/* Write what is read for header validation check and then copy the rest of the input file. */
+ 		to_fd = open_xlog_file(argc, argv);
+ 		write_buff(to_fd, arch_buff, arch_len);
+ 		copy_file(from_fd, to_fd);
+ 		if (close(from_fd) < 0)
+ 		{
+ 			fprintf(stderr, "failed to close `%s': %s\n", argv[1],
+ 				strerror(errno));
+ 			exit(1);
+ 		}
+ 		exit(0);
+ 	}
+ 	if (close(from_fd) < 0)
+ 	{
+ 		fprintf(stderr, "failed to close `%s': %s\n", argv[1], strerror(errno));
+ 		exit(1);
+ 	}
+ 
+ 	/*
+ 	 * Build the restored WAL segment file image.
+ 	 * Write all the restored WAL segment file image.
+ 	 */
+ 	if (create_wal_image((int)arch_len))
+ 	{
+ 		fprintf(stderr, "failed to create the image of `%s'\n", argv[1]);
+ 		exit(1);
+ 	}
+ 	to_fd = open_xlog_file(argc, argv);
+ 	write_buff(to_fd, xlog_buff, XLogSegSize);
+ 	if (close(to_fd) < 0)
+ 	{
+ 		fprintf(stderr, "failed to close `%s': %s\n", argv[2], strerror(errno));
+ 		exit(1);
+ 	}
+ 	
+ 	exit(0);
+ }
+ 
+ /*
+  * Show the usage of the command and exit with specified code.
+  */
+ static void
+ print_usage(int code)
+ {
+ 	printf(
+ 		"usage: pg_decompresslog [from [to]]\n"
+ 		"    from - Input file name (stdin if omitted or specified as '-')\n"
+ 		"    to   - Output file name (stdout if omitted or specified as '-')\n"
+ 	);
+ 	exit(code);
+ }
+ 
+ /*
+  * Restore the 8kB-page WAL segment file image from the 16MB-page archive log
+  * built by the pg_compresslog command.
+  *
+  * Parameters:
+  *   arch_len: Size of the archive log file.
+  */
+ int
+ create_wal_image(int arch_len)
+ {
+ 	/* Buffer holding one record data. */
+ 	static char record_buff[XLogSegSize];
+ 	XLogPageHeader pheader;
+ 	XLogContRecord *pcontrec = NULL;
+ 	XLogRecord *precord = NULL;
+ 	char *rec_write_pos = record_buff;
+ 	int rec_len = 0;
+ 	bool isFromPrevSeg = false;
+ 
+ 	/*
+ 	 * Copy the archive log file page header and hold info in the header.
+ 	 * They are used to restore page headers of WAL segment file.
+ 	 */
+ 	pheader = (XLogPageHeader)arch_buff;
+ 	if (XLogPageHeaderSize(pheader) != SizeOfXLogLongPHD)
+ 	{
+ 		fprintf(stderr, "invalid pageheader size.\n");
+ 		return -1;
+ 	}
+ 	memcpy(write_pos, (char *)pheader, SizeOfXLogLongPHD);
+ 	read_pos += SizeOfXLogLongPHD;
+ 	write_pos += SizeOfXLogLongPHD;
+ 
+ 	baseheader.xlp_magic = pheader->xlp_magic;
+ 	baseheader.xlp_info &= ~XLP_ALL_FLAGS;
+ 	baseheader.xlp_tli = pheader->xlp_tli;
+ 	baseheader.xlp_pageaddr.xlogid = pheader->xlp_pageaddr.xlogid;
+ 	baseheader.xlp_pageaddr.xrecoff = pheader->xlp_pageaddr.xrecoff;
+ 
+ 	/*
+ 	 * Copy XLogContRecord and the continuous record to the record buffer, if there is
+ 	 * a continuous record from the last segment file.  Then move them to WAL segment
+ 	 * file image buffer.
+ 	 */
+ 	if (pheader->xlp_info & XLP_FIRST_IS_CONTRECORD)
+ 	{
+ 		pcontrec = (XLogContRecord *)read_pos;
+ 
+ 		/* If the size of the continue record is not valid, it's an error. */
+ 		if (pcontrec->xl_rem_len == 0)
+ 		{
+ 			fprintf(stderr, "invalid continue record length : xl_rem_len = %u\n",
+ 				pcontrec->xl_rem_len);
+ 			return -1;
+ 		}
+ 
+ 		memcpy(write_pos, read_pos, SizeOfXLogContRecord);
+ 		write_pos += SizeOfXLogContRecord;
+ 		rec_len = pcontrec->xl_rem_len;
+ 		memcpy(rec_write_pos, (read_pos + SizeOfXLogContRecord), rec_len);
+ 		read_pos += MAXALIGN(SizeOfXLogContRecord + rec_len);
+ 		isFromPrevSeg = true;
+ 
+ 		/* Write the continuous data to WAL segment file image buffer. */
+ 		if (write_record(record_buff, rec_len, isFromPrevSeg))
+ 			return 0;
+ 	}
+ 	isFromPrevSeg = false;
+ 	dump_page_header(((write_pos - xlog_buff) / XLOG_BLCKSZ), 
+ 												(XLogPageHeader)xlog_buff);
+ 
+ 	/*
+ 	 * Loop record by record, and build each record image in the record buffer.
+ 	 */
+ 	while ((read_pos - arch_buff) < arch_len)
+ 	{
+ 		/* Set the write position of the record data. */
+ 		rec_write_pos = record_buff;
+ 		precord = (XLogRecord *)read_pos;
+ 
+ 		/* 
+ 		 * If the record data fits in the current segment, validate the record.
+ 		 * If the WAL record continues to the next segment, we cannot calculate the
+ 		 * CRC for the whole record, so the validation is skipped.
+ 		 */
+ 		if ((char *)precord - arch_buff + precord->xl_tot_len <= arch_len)
+ 			if (!is_valid_record(precord))
+ 				exit(1);
+ 
+ 		/*
+ 		 * Record other than the log switch must have corresponding logical data.
+ 		 * Refer to the comment around the line 3056 in src/backend/access/transam/xlog.c (8.2.0).
+ 		 */
+ 		if (IS_XLOG_SWITCH(precord))
+ 		{
+ 			if (precord->xl_len != 0)
+ 			{
+ 				fprintf(stderr, "invalid xlog switch record.\n");
+ 				exit(1);
+ 			}
+ 		}
+ 		else if (precord->xl_len == 0)
+ 		{
+ 			fprintf(stderr, "invalid record length.\n");
+ 			exit(1);
+ 		}
+ 
+ 		/*
+ 		 * Copy the record header and the logical log to the record buffer.
+ 		 * We don't move read_pos here, because the alignment must be calculated
+ 		 * with the physical log taken into account.
+ 		 */
+ 		memcpy(rec_write_pos, read_pos, (SizeOfXLogRecord + precord->xl_len));
+ 		rec_write_pos += (SizeOfXLogRecord + precord->xl_len);
+ 		rec_len = precord->xl_tot_len;
+ 		
+ 		/* Copy the physical log, or restore it. */
+ 		if (precord->xl_tot_len > SizeOfXLogRecord + precord->xl_len)
+ 		{
+ 			/*
+ 			 * If physical log does exist (not removed), then simply copy it.
+ 			 * If physical log is removed, then build a dummy.
+ 			 */
+ 			if (precord->xl_info & XLR_BKP_BLOCK_MASK)
+ 			{
+ 				memcpy(rec_write_pos, 
+ 					XLogRecGetData((XLogRecord *)read_pos) + precord->xl_len,
+ 					precord->xl_tot_len - (SizeOfXLogRecord + precord->xl_len));
+ 				read_pos += MAXALIGN(precord->xl_tot_len);
+ 			}
+ 			else
+ 			{
+ 				/*
+ 				 * Because full page write flag will be omitted during the archiving, CRC check
+ 				 * should be performed only against the record header and the logical log.
+ 				 * Therefore, we don't have to recalculate CRC value here.
+ 				 */
+ 				memset(rec_write_pos, '\0', 
+ 					precord->xl_tot_len - (SizeOfXLogRecord + precord->xl_len));
+ 				read_pos += MAXALIGN(SizeOfXLogRecord + precord->xl_len);
+ 			}
+ 		}
+ 		else
+ 		{
+ 			/* If there is no physical log in the original record, simply advance the input position. */
+ 			read_pos += MAXALIGN(precord->xl_tot_len);
+ 		}
+ 
+ 		/*
+ 		 * Write a record image to the WAL segment image buffer.  Hey, the page size
+ 		 * is again 8kB as the original WAL.
+ 		 */
+ 		if (write_record(record_buff, rec_len, isFromPrevSeg))
+ 			break;
+ 
+ 	}
+ 
+ 	return 0;
+ }
+ 
+ /*
+  * Build a record image to fit in 8kB page to WAL segment image buffer.
+  *
+  * Return value:
+  *    0: One record data build complete.
+  *    1: Hit the tail of the segment.
+  */
+ static int
+ write_record(char *record_buff, int rem_len, bool isFromPrevSeg)
+ {
+ 	char *phead_pos = NULL;
+ 	char *rec_read_pos = record_buff; /* Read position in the record buffer. */
+ 	char *rec_head_pos = NULL;
+ 	int freespace; /* Size of the free space in the page. */
+ 	bool hasContRecord = isFromPrevSeg;
+ 
+ 	/*
+ 	 * Hold the position of the record header (or XLogContRecord).
+ 	 * It is needed for an alignment.
+ 	 */
+ 	rec_head_pos = write_pos; /* Set the write position of the record data. */
+ 	if (isFromPrevSeg)
+ 		rec_head_pos -= SizeOfXLogContRecord;
+ 
+ 	freespace = get_freespace();
+ 
+  	/*
+  	 * If free space size is the same as the page size, it means that the last record restoration
+  	 * filled the last page.   So we add page header here.
+  	 * Because the record is complete in the last page, we don't need XLogContRecord.
+ 	 * The page header at the top of the segment must already have been written at the
+ 	 * first call of this function, so we always add a short-format header here.
+  	 */
+  	if (freespace == XLOG_BLCKSZ)
+  	{
+  		phead_pos = write_pos;
+  		insert_pageheader(write_pos, &baseheader, false);
+  		write_pos += SizeOfXLogShortPHD;
+  		freespace = (XLOG_BLCKSZ - SizeOfXLogShortPHD);
+  		rec_head_pos = write_pos;
+  		dump_page_header(((write_pos - xlog_buff) / XLOG_BLCKSZ),
+  											(XLogPageHeader)phead_pos);
+  	}
+ 
+ 	/*
+ 	 * Loop page by page.
+ 	 */
+ 	while(1)
+ 	{
+ 		/*
+ 		 * If the record header does not fit the page, then insert a page header to the next
+ 		 * page and copy the record data.
+ 		 */
+ 		if (!hasContRecord && freespace < SizeOfXLogRecord)
+ 		{
+ 			write_pos += freespace;
+ 		}
+ 		else if (freespace < rem_len)
+ 		{
+ 			/*
+ 			 * If the record data does not fit in the page, fill this page with the first
+ 			 * part of the record.  The rest is copied to the next page, after a page
+ 			 * header and an XLogContRecord are inserted there.
+ 			 */
+ 			memcpy(write_pos, rec_read_pos, freespace);
+ 			if (!hasContRecord)
+ 				dumpXLogRecord(&baseheader.xlp_pageaddr,
+ 					 			(size_t)(rec_head_pos - xlog_buff),
+ 												(XLogRecord *)rec_head_pos);
+ 			write_pos += freespace;
+ 			rec_read_pos += freespace;
+ 			rem_len -= freespace;
+ 			hasContRecord = true;
+ 		}
+ 		else
+ 		{
+ 			/*
+ 			 * If the record data fits in the page, copy the whole record data to the
+ 			 * buffer and switch to the next record.
+ 			 */
+ 			int len;
+ 			memcpy(write_pos, rec_read_pos, rem_len);
+ 			if (!hasContRecord)
+ 				dumpXLogRecord(&baseheader.xlp_pageaddr,
+ 					 			(size_t)(rec_head_pos - xlog_buff),
+ 												(XLogRecord *)rec_head_pos);
+ 
+ 			/*
+ 			 * Alignment handling.
+ 			 * Alignment has to be adjusted for each record.
+ 			 */
+ 			if (hasContRecord)
+ 				len = MAXALIGN(SizeOfXLogContRecord + rem_len);
+ 			else
+ 				len = MAXALIGN(rem_len);
+ 			write_pos = rec_head_pos + len;
+ 			hasContRecord = false;
+ 
+ 			break;
+ 		}
+ 
+ 		/*
+ 		 * Insert a page header.
+ 		 * If the page starts with data continued from the previous page,
+ 		 * insert an XLogContRecord too.
+ 		 */
+ 		if ((write_pos - xlog_buff) >= XLogSegSize)
+ 			return 1;
+ 		phead_pos = write_pos;
+ 		insert_pageheader(write_pos, &baseheader, hasContRecord);
+ 		write_pos += SizeOfXLogShortPHD;
+ 		freespace = (XLOG_BLCKSZ - SizeOfXLogShortPHD);
+ 		rec_head_pos = write_pos;
+ 		if (hasContRecord)
+ 		{
+ 			insert_XLogContRecord(write_pos, rem_len);
+ 			write_pos += SizeOfXLogContRecord;
+ 			freespace -= SizeOfXLogContRecord;
+ 		}
+ 		dump_page_header(((write_pos - xlog_buff) / XLOG_BLCKSZ),
+ 											(XLogPageHeader)phead_pos);
+ 	}
+ 	return 0;
+ }
+ 
+ /*
+  * Calculate free space size of the page.
+  */
+ static int
+ get_freespace(void)
+ {
+ 	return XLOG_BLCKSZ - (write_pos - xlog_buff) % XLOG_BLCKSZ;
+ }
+ 
+ /*
+  * Insert a XLogContRecord to the buffer.
+  *
+  * Parameters:
+  *    write_pos: Write position in the buffer.
+  *    rem_len: Length of the remaining record which continues from the last
+  *    page.
+  */
+ static void
+ insert_XLogContRecord(char *write_pos, int rem_len)
+ {
+ 	XLogContRecord contrec;
+ 
+ 	contrec.xl_rem_len = rem_len;
+ 	memcpy(write_pos, (char *)&contrec, SizeOfXLogContRecord);
+ }
+ 
+ /*
+  * Insert a page header to the buffer.
+  *
+  * Parameters:
+  *    write_pos: Write position in the buffer.
+  *    pheader: Pointer to the structure holding header info at the first page of
+  *    the segment.
+  *    hasContRecord: Flag to indicate a continuous record from the last page.
+  */
+ static void
+ insert_pageheader(char *write_pos, XLogPageHeader pheader, bool hasContRecord)
+ {
+ 	/*
+ 	 * Each page header is restored using the page header at the first page of
+ 	 * the WAL segment.   Magic number (xlp_magic), timeline id (xlp_tli) and
+ 	 * XLOGID (xlogid) should not change within a segment and they are copied
+ 	 * from the first page header.  Continuous data (xlp_info) depends on the
+ 	 * record of a given page.
+ 	 * xrecoff is calculated by adding XLOG_BLCKSZ to xrecoff value in the first
+ 	 * page header.
+ 	 */
+ 	pheader->xlp_info &= ~XLP_ALL_FLAGS;
+ 	if (hasContRecord)
+ 		pheader->xlp_info |= XLP_FIRST_IS_CONTRECORD;
+ 	pheader->xlp_pageaddr.xrecoff += XLOG_BLCKSZ;
+ 	memcpy(write_pos, (char *)pheader, SizeOfXLogShortPHD);
+ }
+ 
+ /*
+  * Open the archive log file to be restored and return file descriptor.
+  * 
+  * The first argument of the command is an input file name.
+  * If omitted or specified as "-", stdin will be used as an input file.
+  *
+  * Parameters:
+  *    argc: Number of arguments passed to the command.
+  *    argv: Array of pointers to the command's arguments.
+  *
+  * Note: If error occurs within this function, whole command will exit here
+  * using exit() and the caller will not have any chance to take care of errors.
+  */
+ static int
+ open_arch_file(int argc, char *argv[])
+ {
+ 	int from_fd = -1;
+ 
+ 	if (argc > 1 && strcmp(argv[1], "-") != 0)
+ 	{
+ 		/* Open archive log file to restore. */
+ 		from_fd = open(argv[1], O_RDONLY, 0);
+ 		if (from_fd < 0)
+ 		{
+ 			fprintf(stderr, "failed to open `%s': %s\n", argv[1],
+ 				strerror(errno));
+ 			exit(1);
+ 		}
+ 
+ 		/* Obtain the segment ID from the file name (for record dump). */
+ 		get_segment_id(argv[1]);
+ 	}
+ 	else
+ 		from_fd = fileno(stdin);
+ 
+ 	return from_fd;
+ }
+ 
+ /*
+  * Open the WAL segment file to write the restored data and return file
+  * descriptor.
+  * 
+  * The second argument to the command will be regarded as an output file.
+  * If omitted or specified as "-", stdout will be used as the output file.
+  *
+  * Parameters:
+  *    argc: Number of arguments passed to the command.
+  *    argv: Array of pointers to the command's arguments.
+  *
+  * Note: If error occurs within this function, whole command will exit here
+  * using exit() and the caller will not have any chance to take care of errors.
+  */
+ static int
+ open_xlog_file(int argc, char *argv[])
+ {
+ 	int to_fd = -1;
+ 
+ 	if (argc > 2 && strcmp(argv[2], "-") != 0)
+ 	{
+ 		/* Open the WAL segment file */
+ 		to_fd = open(argv[2], O_RDWR | O_CREAT | O_EXCL | PG_BINARY,
+ 			S_IRUSR | S_IWUSR);
+ 		if (to_fd < 0)
+ 		{
+ 			fprintf(stderr, "failed to open `%s': %s\n", argv[2],
+ 				strerror(errno));
+ 			exit(1);
+ 		}
+ 	}
+ 	else
+ 		to_fd = fileno(stdout);
+ 
+ 	return to_fd;
+ }
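
As a reading aid for write_record() and get_freespace() in the patch, here is a minimal standalone sketch of the two calculations the restore loop leans on: rounding a length up to the alignment boundary, and finding the free space left in the current page. The 8192-byte page size follows the "8kB" comment in the patch; the alignment of 8 and all sketch_* names are assumptions for illustration and are not taken from the patch or from the PostgreSQL headers.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_BLCKSZ   8192u   /* assumed WAL page size (8kB) */
#define SKETCH_MAXALIGN 8u      /* assumed maximum alignment; platform dependent */

/* Round len up to the next multiple of SKETCH_MAXALIGN. */
static size_t
sketch_maxalign(size_t len)
{
    /* The bit trick below only works for a power-of-two alignment. */
    assert((SKETCH_MAXALIGN & (SKETCH_MAXALIGN - 1)) == 0);
    return (len + (SKETCH_MAXALIGN - 1)) & ~((size_t) SKETCH_MAXALIGN - 1);
}

/* Bytes left in the page that contains buffer offset off. */
static size_t
sketch_freespace(size_t off)
{
    return SKETCH_BLCKSZ - (off % SKETCH_BLCKSZ);
}
```

Note that sketch_freespace() returns the full page size when the offset sits exactly on a page boundary. That is the case write_record() checks for with freespace == XLOG_BLCKSZ before it inserts a short page header.
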
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Koichi Suzuki (#9)
Re: [HACKERS] Full page writes improvement, code update

Koichi Suzuki <suzuki.koichi@oss.ntt.co.jp> writes:

My proposal is to remove unnecessary full page writes (they are needed
in crash recovery from inconsistent or partial writes) when we copy WAL
to archive log and rebuilt them as a dummy when we restore from archive
log.
...
Benchmark: DBT-2
Database size: 120WH (12.3GB)
Total WAL size: 4.2GB (after 60min. run)
Elapsed time:
cp: 120.6sec
gzip: 590.0sec
pg_compresslog: 79.4sec
Resultant archive log size:
cp: 4.2GB
gzip: 2.2GB
pg_compresslog: 0.3GB
Resource consumption:
cp: user: 0.5sec system: 15.8sec idle: 16.9sec I/O wait: 87.7sec
gzip: user: 286.2sec system: 8.6sec idle: 260.5sec I/O wait: 36.0sec
pg_compresslog:
user: 7.9sec system: 5.5sec idle: 37.8sec I/O wait: 28.4sec

What checkpoint settings were used to make this comparison? I'm
wondering whether much of the same benefit can't be bought at zero cost
by increasing the checkpoint interval, because that translates directly
to a reduction in the number of full-page images inserted into WAL.

Also, how much was the database run itself slowed down by the increased
volume of WAL (due to duplicated information)? It seems rather
pointless to me to measure only the archiving effort without any
consideration for the impact on the database server proper.

regards, tom lane

PS: there's something fishy about the gzip numbers ... why all the idle
time?

#29Koichi Suzuki
suzuki.koichi@oss.ntt.co.jp
In reply to: Tom Lane (#28)
Re: [HACKERS] Full page writes improvement, code update

Hi,

In the case below, we ran the DBT-2 benchmark for one hour to take the
measurements. Checkpoints occurred three times (the checkpoint interval
was 20min).

For reference, when the checkpoint interval is one hour, the archived
log sizes were as follows:
cp: 3.1GB
gzip: 1.5GB
pg_compresslog: 0.3GB

For both cases, database size was 12.7GB, relatively small.

As pointed out, if we never ran a checkpoint, the value for cp would
come close to that for pg_compresslog, but that is not practical.

The point here is that if we collect the archive log with cp and the
average workload is a quarter of full power, cp archiving will produce
about 0.8GB of archive log per hour (for the DBT-2 case; of course the
size depends on the nature of the transactions). If we run the database
for a whole day, the archive log will be as large as the database
itself. After one week, the archive log is seven times as large as
the database itself. This is the point. In production, such a large
archive log raises storage cost. The purpose of the proposal is
not to improve performance, but to decrease the size of the archive log
to save storage, while preserving the same chance of crash recovery as
full_page_writes=on.
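
The growth estimate above is plain multiplication, and a minimal sketch makes it easy to redo with other rates. It assumes a constant archive rate; the function names are made up for illustration. With the quoted 0.8GB per hour, one day gives about 19.2GB and one week about 134.4GB, so against the 12.7GB database the "as large as the database" and "seven times" figures read as rough, conservative estimates.

```c
#include <assert.h>

/* Archive volume in GB after running for `hours` at a constant rate. */
static double
archive_gb(double gb_per_hour, double hours)
{
    assert(gb_per_hour >= 0.0 && hours >= 0.0);
    return gb_per_hour * hours;
}

/* How many times larger the archive is than a database of db_gb. */
static double
archive_ratio(double gb_per_hour, double hours, double db_gb)
{
    assert(db_gb > 0.0);
    return archive_gb(gb_per_hour, hours) / db_gb;
}
```

For example, archive_ratio(0.8, 24.0, 12.7) is roughly 1.5 and archive_ratio(0.8, 168.0, 12.7) is roughly 10.6.
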

Because of the nature of DBT-2, it is not meaningful to compare the
throughput (the database size determines the number of transactions to
run). Instead, I compared the throughput using pgbench. The results
were cp: 570tps, gzip: 558tps, pg_compresslog: 574tps, a negligible
difference.

As for the idle time of gzip and the other commands used to archive WAL
offline, nothing in the environment differed other than the archive
command. My guess is that because the user time of gzip is very large,
the scheduler has more chances to give resources to other processes. In
the case of cp, idle time is more than 30 times longer than user time.
Pg_compresslog's idle time is seven times longer than its user time. On
the other hand, gzip's idle time is shorter than its user time.
Considering the total amount of user time, I think this is a reasonable
measurement.

Again, the issue in my proposal is not to increase run-time
performance. The issue is to decrease the size of the archive log to
save storage.

Regards;

--
Koichi Suzuki

#30Joshua D. Drake
jd@commandprompt.com
In reply to: Koichi Suzuki (#29)
Re: [HACKERS] Full page writes improvement, code update

In terms of idle time for gzip and other command to archive WAL offline,
no difference in the environment was given other than the command to
archive. My guess is because the user time is very large in gzip, it
has more chance for scheduler to give resource to other processes. In
the case of cp, idle time is more than 30times longer than user time.
Pg_compresslog uses seven times longer idle time than user time. On the
other hand, gzip uses less idle time than user time. Considering the
total amount of user time, I think it's reasonable measure.

Again, in my proposal, it is not the issue to increase run time
performance. Issue is to decrease the size of archive log to save the
storage.

Considering the relatively little amount of storage a transaction log
takes, it would seem to me that the performance angle is more appropriate.

Is it more efficient in other ways besides negligible tps? Possibly more
efficient memory usage? Better restore times for a crashed system?

Sincerely,

Joshua D. Drake

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

#31Hannu Krosing
hannu@skype.net
In reply to: Joshua D. Drake (#30)
Re: [HACKERS] Full page writes improvement, code update

On Tue, 2007-04-10 at 18:17, Joshua D. Drake wrote:

In terms of idle time for gzip and other command to archive WAL offline,
no difference in the environment was given other than the command to
archive. My guess is because the user time is very large in gzip, it
has more chance for scheduler to give resource to other processes. In
the case of cp, idle time is more than 30times longer than user time.
Pg_compresslog uses seven times longer idle time than user time. On the
other hand, gzip uses less idle time than user time. Considering the
total amount of user time, I think it's reasonable measure.

Again, in my proposal, it is not the issue to increase run time
performance. Issue is to decrease the size of archive log to save the
storage.

Considering the relatively little amount of storage a transaction log
takes, it would seem to me that the performance angle is more appropriate.

As I understand it, it's not about the transaction log but about the
write-ahead log.

And the amount of data in WAL can become very important once you have to
keep standby servers in different physical locations (cities, countries
or continents) where channel throughput and cost come into play.

With simple cp (scp/rsync) the amount of WAL data needing to be copied
is about 10x more than data collected by trigger based solutions
(Slony/pgQ). With pg_compresslog WAL-shipping seems to have roughly the
same amount and thus becomes a viable alternative again.

Is it more efficient in other ways besides negligible tps? Possibly more
efficient memory usage? Better restore times for a crashed system?

I think that TPS is more affected by number of writes than size of each
block written, so there is probably not that much to gain in TPS, except
perhaps from better disk cache usage.

For me pg_compresslog seems to be a winner even if it just does not
degrade performance.

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com

#32Tom Lane
tgl@sss.pgh.pa.us
In reply to: Koichi Suzuki (#29)
Re: [HACKERS] Full page writes improvement, code update

Koichi Suzuki <suzuki.koichi@oss.ntt.co.jp> writes:

For more information, when checkpoint interval is one hour, the amount
of the archived log size was as follows:
cp: 3.1GB
gzip: 1.5GB
pg_compresslog: 0.3GB

The notion that 90% of the WAL could be backup blocks even at very long
checkpoint intervals struck me as excessive, so I went looking for a
reason, and I may have found one. There has been a bug in CVS HEAD
since Feb 8 causing every btree page split record to include a backup
block whether needed or not. If these numbers were taken with recent
8.3 code, please retest with current HEAD.
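
The "90%" figure follows from the sizes quoted above if one assumes that what pg_compresslog removes is essentially the backup blocks. A minimal sketch of that arithmetic, with a made-up function name:

```c
#include <assert.h>

/* Fraction of the original volume that the compressed form no longer has. */
static double
removed_fraction(double original_gb, double compressed_gb)
{
    assert(original_gb > 0.0 && compressed_gb >= 0.0);
    return (original_gb - compressed_gb) / original_gb;
}
```

With the one-hour checkpoint numbers, removed_fraction(3.1, 0.3) is about 0.90; with the 20-minute numbers from the first report, removed_fraction(4.2, 0.3) is about 0.93.
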

regards, tom lane

#33Koichi Suzuki
suzuki.koichi@oss.ntt.co.jp
In reply to: Tom Lane (#32)
Re: [HACKERS] Full page writes improvement, code update

The figures below were measured with 8.2 code, not 8.3 code. So I don't
think they are caused by something introduced only in 8.3 code.

--
Koichi Suzuki

#34Koichi Suzuki
suzuki.koichi@oss.ntt.co.jp
In reply to: Hannu Krosing (#31)
Re: [HACKERS] Full page writes improvement, code update

I don't fully understand what "transaction log" means. If it means
"archived WAL", the current (8.2) code handles WAL as follows:

1) If full_page_writes=off, then no full page writes will be written to
WAL, except for those during online backup (between pg_start_backup and
pg_stop_backup). The WAL will be considerably smaller, but it cannot
recover from partial/inconsistent writes to the database files. We have
to go back to the online backup and apply all the archive logs.

2) If full_page_writes=on, then full page writes will be written at the
first update of a page after each checkpoint, plus the full page writes
described in 1). Because we have no means (in 8.2) to optimize the WAL
so far, all we can do is copy the WAL or gzip it at archive time.

If we'd like to keep a good chance of recovery after a crash, 8.2
provides only method 2), leaving the archive log considerably
large. My proposal keeps the chance of crash recovery the same as
in the case of full_page_writes=on and reduces the size of the archived
log as in the case of full_page_writes=off.

Regards;

--
Koichi Suzuki

#35Zeugswetter Andreas ADI SD
ZeugswetterA@spardat.at
In reply to: Koichi Suzuki (#34)
Re: [HACKERS] Full page writes improvement, code update

I don't fully understand what "transaction log" means. If it means
"archived WAL", the current (8.2) code handle WAL as follows:

Probably we can define "transaction log" to be the part of WAL that is
not full pages.

1) If full_page_writes=off, then no full page writes will be written to
WAL, except for those during online backup (between pg_start_backup and
pg_stop_backup). The WAL size will be considerably small but it cannot
recover from partial/inconsistent write to the database files. We have
to go back to the online backup and apply all the archive log.

2) If full_page_writes=on, then full page writes will be written at the
first update of a page after each checkpoint, plus full page writes at
1). Because we have no means (in 8.2) to optimize the WAL so far, what
we can do is to copy WAL or gzip it at archive time.

If we'd like to keep good chance of recovery after the crash, 8.2
provides only the method 2), leaving archive log size considerably
large. My proposal maintains the chance of crash recovery the same as
in the case of full_page_writes=on and reduces the size of archived log
as in the case of full_page_writes=off.

Yup, this is a good summary.

You say you need to remove the optimization that avoids
the logging of a new tuple because the full page image exists.
I think we must already have the info in WAL about which tuple inside
the full page image is new (the one for which we avoided the WAL entry).

How about this:
Leave current WAL as it is and only add the "not removable" flag to full
pages.
pg_compresslog then replaces the full page image with a record for the
one tuple that is changed.
I tend to think it is not worth the increased complexity only to save
bytes in the uncompressed WAL though.

Another point about pg_decompresslog:

Why do you need a pg_decompresslog? Imho pg_compresslog should already
do the replacing of the full_page with the dummy entry. Then
pg_decompresslog could be a simple gunzip, or whatever compression was
used, but no logic.

Andreas

#36Koichi Suzuki
suzuki.koichi@oss.ntt.co.jp
In reply to: Zeugswetter Andreas ADI SD (#35)
Re: [HACKERS] Full page writes improvement, code update

Hi,

Sorry, inline reply.

Zeugswetter Andreas ADI SD wrote:

Yup, this is a good summary.

You say you need to remove the optimization that avoids
the logging of a new tuple because the full page image exists.
I think we must already have the info in WAL which tuple inside the full
page image is new (the one for which we avoided the WAL entry for).

How about this:
Leave current WAL as it is and only add the not removeable flag to full
pages.
pg_compresslog then replaces the full page image with a record for the
one tuple that is changed.
I tend to think it is not worth the increased complexity only to save
bytes in the uncompressed WAL though.

That is essentially what my patch proposes. My patch adds a flag to
full page writes that "can be" removed.

Another point about pg_decompresslog:

Why do you need a pg_decompresslog ? Imho pg_compresslog should already
do the replacing of the full_page with the dummy entry. Then
pg_decompresslog could be a simple gunzip, or whatever compression was
used, but no logic.

Just removing full page writes does not work. If we shift the rest of
the WAL, then the LSNs become inconsistent in the compressed archive logs
which pg_compresslog produces. For recovery, we have to restore the LSNs
of the original WAL. Pg_decompresslog restores the removed full page
writes as dummy records so that the recovery redo functions won't be
confused.
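
To illustrate the LSN point, here is a minimal sketch that treats an LSN as a plain byte offset and ignores page headers and alignment. The SketchRecord type is a simplified stand-in, not the real XLogRecord layout, and all names are assumptions. It shows that dropping full page images shifts every later record, while reserving the same number of bytes as a dummy keeps every offset where it was.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified record: not the real XLogRecord layout. */
typedef struct
{
    size_t logical_len;   /* record header plus logical data */
    size_t fpw_len;       /* full page image bytes, 0 if none */
} SketchRecord;

/*
 * Compute the starting offset of each record.  If keep_fpw_space is
 * zero, full page images are dropped and later records shift down.
 * If it is nonzero, their space is reserved (as a dummy would do) and
 * the offsets match the original stream.
 */
static void
sketch_offsets(const SketchRecord *recs, size_t n, int keep_fpw_space,
               size_t *offsets)
{
    size_t pos = 0;
    size_t i;

    assert(recs != NULL && offsets != NULL);
    for (i = 0; i < n; i++)
    {
        offsets[i] = pos;
        pos += recs[i].logical_len;
        if (keep_fpw_space)
            pos += recs[i].fpw_len;
    }
}
```

For three records of sizes {100, 8192}, {60, 0} and {80, 8192}, the offsets with the space kept are 0, 8292 and 8352; with the images dropped they become 0, 100 and 160, so any stored reference to the original positions would no longer line up.
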

Regards;

Andreas

--
-------------
Koichi Suzuki

#37Zeugswetter Andreas ADI SD
ZeugswetterA@spardat.at
In reply to: Koichi Suzuki (#36)
Re: [HACKERS] Full page writes improvement, code update

Yup, this is a good summary.

You say you need to remove the optimization that avoids the logging of
a new tuple because the full page image exists.
I think we must already have the info in WAL which tuple inside the
full page image is new (the one for which we avoided the WAL entry
for).

How about this:
Leave current WAL as it is and only add the not removeable flag to
full pages.
pg_compresslog then replaces the full page image with a record for the
one tuple that is changed.
I tend to think it is not worth the increased complexity only to save
bytes in the uncompressed WAL though.

It is essentially what my patch proposes. My patch includes
flag to full page writes which "can be" removed.

Ok, a flag that marks full page images that can be removed is perfect.

But you also turn off the optimization that avoids writing regular
WAL records when the info is already contained in a full-page image
(increasing the uncompressed size of WAL).
It was that part I questioned. As already stated, maybe I should not
have, because it would be too complex to reconstruct a regular WAL
record from the full-page image.
But that code would also be needed for WAL based partial replication, so
if it were too complicated we would eventually want a switch to turn
off the optimization anyway (at least for heap page changes).

Another point about pg_decompresslog:

Why do you need a pg_decompresslog ? Imho pg_compresslog should
already do the replacing of the full_page with the dummy entry. Then
pg_decompresslog could be a simple gunzip, or whatever compression was
used, but no logic.

Just removing full page writes does not work. If we shift the rest of
the WAL, then LSN becomes inconsistent in compressed archive logs which
pg_compresslog produces. For recovery, we have to restore LSN as the
original WAL. Pg_decompresslog restores removed full page writes as
dummy records so that recovery redo functions won't be confused.

Ah sorry, I needed some pgsql/src/backend/access/transam/README reading.

LSN is the physical position of records in WAL. Thus your dummy record
size is equal to what you cut out of the original record.
What about disconnecting WAL LSN from physical WAL record position
during replay ?
Add simple short WAL records in pg_compresslog like: advance LSN by 8192
bytes.

Andreas

#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas ADI SD (#37)
Re: [HACKERS] Full page writes improvement, code update

"Zeugswetter Andreas ADI SD" <ZeugswetterA@spardat.at> writes:

But you also turn off the optimization that avoids writing regular
WAL records when the info is already contained in a full-page image
(increasing the uncompressed size of WAL).
It was that part I questioned.

That's what bothers me about this patch, too. It will be increasing
the cost of writing WAL (more data -> more CRC computation and more
I/O, not to mention more contention for the WAL locks) which translates
directly to a server slowdown.

The main arguments that I could see against Andreas' alternative are:

1. Some WAL record types are arranged in a way that actually would not
permit the reconstruction of the short form from the long form, because
they throw away too much data when the full-page image is substituted.
An example that's fresh in my mind is that the current format of the
btree page split WAL record discards newitemoff in that case, so you
couldn't identify the inserted item in the page image. Now this is only
saving two bytes in what's usually going to be a darn large record
anyway, and it complicates the code to do it, so I wouldn't cry if we
changed btree split to include newitemoff always. But there might be
some other cases where more data is involved. In any case, someone
would have to look through every single WAL record type to determine
whether reconstruction is possible and fix it if not.

2. The compresslog utility would have to have specific knowledge about
every compressible WAL record type, to know how to convert it to the
short format. That means an ongoing maintenance commitment there.
I don't think this is unacceptable, simply because we need only teach
it about a few common record types, not everything under the sun ---
anything it doesn't know how to fix, just leave alone, and if it's an
uncommon record type it really doesn't matter. (I guess that means
that we don't really have to do #1 for every last record type, either.)
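
The "teach it a few common record types and leave the rest alone" idea in point 2 could be sketched as a small lookup table with a pass-through default. Everything here is hypothetical: the record kinds, the converter signature and the stub converter are invented for illustration, and real WAL records are identified by resource manager id and info bits rather than by an enum like this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record kinds, for illustration only. */
typedef enum { SK_HEAP_INSERT, SK_BTREE_SPLIT, SK_OTHER } SketchKind;

typedef enum { SK_KEEP_IMAGE, SK_SHORT_FORM } SketchAction;

/* A converter returns 1 if it can rebuild the short form of a record. */
typedef int (*SketchConverter)(const void *rec);

/* Stub converter: a real one would extract the tuple from the page image. */
static int
sk_convert_heap_insert(const void *rec)
{
    (void) rec;
    return 1;
}

/* Only record kinds listed here are ever rewritten. */
static const struct
{
    SketchKind      kind;
    SketchConverter conv;
} sk_table[] = {
    { SK_HEAP_INSERT, sk_convert_heap_insert },
};

/* Decide what to do with one record; unknown kinds are left alone. */
static SketchAction
sk_decide(SketchKind kind, const void *rec)
{
    size_t i;

    assert(kind == SK_HEAP_INSERT || kind == SK_BTREE_SPLIT || kind == SK_OTHER);
    for (i = 0; i < sizeof(sk_table) / sizeof(sk_table[0]); i++)
    {
        if (sk_table[i].kind == kind && sk_table[i].conv(rec))
            return SK_SHORT_FORM;
    }
    return SK_KEEP_IMAGE;
}
```

With this shape, supporting another record type means adding one table entry, and a record type the table does not know (SK_BTREE_SPLIT here) simply keeps its full page image.
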

So I don't think either of these is a showstopper. Doing it this way
would certainly make the patch more acceptable, since the argument that
it might hurt rather than help performance in some cases would go away.

What about disconnecting WAL LSN from physical WAL record position
during replay ?
Add simple short WAL records in pg_compresslog like: advance LSN by 8192
bytes.

I don't care for that, as it pretty much destroys some of the more
important sanity checks that xlog replay does. The page boundaries
need to match the records contained in them. So I think we do need
to have pg_decompresslog insert dummy WAL entries to fill up the
space saved by omitting full pages.

regards, tom lane

#39Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#38)
Re: [HACKERS] Full page writes improvement, code update

On Fri, 2007-04-13 at 10:36 -0400, Tom Lane wrote:

"Zeugswetter Andreas ADI SD" <ZeugswetterA@spardat.at> writes:

But you also turn off the optimization that avoids writing regular
WAL records when the info is already contained in a full-page image
(increasing the uncompressed size of WAL).
It was that part I questioned.

I think it's right to question it, certainly.

That's what bothers me about this patch, too. It will be increasing
the cost of writing WAL (more data -> more CRC computation and more
I/O, not to mention more contention for the WAL locks) which translates
directly to a server slowdown.

I don't really understand this concern. Koichi-san has included a
parameter setting that would prevent any change at all in the way WAL is
written. If you don't want this slight increase in WAL, don't enable it.
If you do enable it, you'll also presumably be compressing the xlog too,
which works much better than gzip using less CPU. So overall it saves
more than it costs, ISTM, and nothing at all if you choose not to use
it.

The main arguments that I could see against Andreas' alternative are:

1. Some WAL record types are arranged in a way that actually would not
permit the reconstruction of the short form from the long form, because
they throw away too much data when the full-page image is substituted.
An example that's fresh in my mind is that the current format of the
btree page split WAL record discards newitemoff in that case, so you
couldn't identify the inserted item in the page image. Now this is only
saving two bytes in what's usually going to be a darn large record
anyway, and it complicates the code to do it, so I wouldn't cry if we
changed btree split to include newitemoff always. But there might be
some other cases where more data is involved. In any case, someone
would have to look through every single WAL record type to determine
whether reconstruction is possible and fix it if not.

2. The compresslog utility would have to have specific knowledge about
every compressible WAL record type, to know how to convert it to the
short format. That means an ongoing maintenance commitment there.
I don't think this is unacceptable, simply because we need only teach
it about a few common record types, not everything under the sun ---
anything it doesn't know how to fix, just leave alone, and if it's an
uncommon record type it really doesn't matter. (I guess that means
that we don't really have to do #1 for every last record type, either.)

So I don't think either of these is a showstopper. Doing it this way
would certainly make the patch more acceptable, since the argument that
it might hurt rather than help performance in some cases would go away.

Yeh, it's additional code paths, but it sounds like Koichi-san and
colleagues are going to be trail blazing any bugs there and will be
around to fix any more that emerge.

What about disconnecting WAL LSN from physical WAL record position
during replay ?
Add simple short WAL records in pg_compresslog like: advance LSN by 8192
bytes.

I don't care for that, as it pretty much destroys some of the more
important sanity checks that xlog replay does. The page boundaries
need to match the records contained in them. So I think we do need
to have pg_decompresslog insert dummy WAL entries to fill up the
space saved by omitting full pages.

Agreed. I don't want to start touching something that works so well.

We've been thinking about doing this for at least 3 years now, so I
don't see any reason to baulk at it now. I'm happy with Koichi-san's
patch as-is, assuming further extensive testing will be carried out on
it during beta.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#40Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#39)
Re: [HACKERS] Full page writes improvement, code update

"Simon Riggs" <simon@2ndquadrant.com> writes:

On Fri, 2007-04-13 at 10:36 -0400, Tom Lane wrote:

That's what bothers me about this patch, too. It will be increasing
the cost of writing WAL (more data -> more CRC computation and more
I/O, not to mention more contention for the WAL locks) which translates
directly to a server slowdown.

I don't really understand this concern.

The real objection is that a patch that's alleged to make WAL smaller
actually does the exact opposite. Now maybe you can buy that back
downstream of the archiver --- after yet more added-on processing ---
but it still seems that there's a fundamental misdesign here.

Koichi-san has included a parameter setting that would prevent any
change at all in the way WAL is written.

It bothers me that we'd need to have such a switch. That's just another
way to shoot yourself in the foot, either by not enabling it (in which
case applying pg_compresslog as it stands would actively break your
WAL), or by enabling it when you weren't actually going to use
pg_compresslog (because you misunderstood the documentation to imply
that it'd make your WAL smaller by itself). What I want to see is a
patch that doesn't bloat WAL at all and therefore doesn't need a switch.
I think Andreas is correct to complain that it should be done that way.

regards, tom lane

#41Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#40)
Re: [HACKERS] Full page writes improvement, code update

On Fri, 2007-04-13 at 11:47 -0400, Tom Lane wrote:

"Simon Riggs" <simon@2ndquadrant.com> writes:

On Fri, 2007-04-13 at 10:36 -0400, Tom Lane wrote:

That's what bothers me about this patch, too. It will be increasing
the cost of writing WAL (more data -> more CRC computation and more
I/O, not to mention more contention for the WAL locks) which translates
directly to a server slowdown.

I don't really understand this concern.

The real objection is that a patch that's alleged to make WAL smaller
actually does the exact opposite. Now maybe you can buy that back
downstream of the archiver --- after yet more added-on processing ---
but it still seems that there's a fundamental misdesign here.

Koichi-san has included a parameter setting that would prevent any
change at all in the way WAL is written.

It bothers me that we'd need to have such a switch. That's just another
way to shoot yourself in the foot, either by not enabling it (in which
case applying pg_compresslog as it stands would actively break your
WAL), or by enabling it when you weren't actually going to use
pg_compresslog (because you misunderstood the documentation to imply
that it'd make your WAL smaller by itself). What I want to see is a
patch that doesn't bloat WAL at all and therefore doesn't need a switch.
I think Andreas is correct to complain that it should be done that way.

I agree with everything you say because we already had *exactly* this
discussion when the patch was originally submitted, with me saying
everything you just said.

After a few things have been renamed to show their correct function and
impact, I am now comfortable with this patch.

Writing lots of additional code simply to remove a parameter that
*might* be mis-interpreted doesn't sound useful to me, especially when
bugs may leak in that way. My take is that this is simple and useful
*and* we have it now; other ways don't yet exist, nor will they in time
for 8.3.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#41)
Re: [HACKERS] Full page writes improvement, code update

"Simon Riggs" <simon@2ndquadrant.com> writes:

Writing lots of additional code simply to remove a parameter that
*might* be mis-interpreted doesn't sound useful to me, especially when
bugs may leak in that way. My take is that this is simple and useful
*and* we have it now; other ways don't yet exist, nor will they in time
for 8.3.

The potential for misusing the switch is only one small part of the
argument; the larger part is that this has been done in the wrong way
and will cost performance unnecessarily. The fact that it's ready
now is not something that I think should drive our choices.

I believe that it would be possible to make the needed core-server
changes in time for 8.3, and then to work on compress/decompress
on its own time scale and publish it on pgfoundry; with the hope
that it would be merged to contrib or core in 8.4. Frankly the
compress/decompress code needs work anyway before it could be
merged (eg, I noted a distinct lack of I/O error checking).

regards, tom lane

#43Koichi Suzuki
suzuki.koichi@oss.ntt.co.jp
In reply to: Zeugswetter Andreas ADI SD (#37)
Re: [HACKERS] Full page writes improvement, code update

Sorry I was very late to find this.

With DBT-2 benchmark, I've already compared the amount of WAL. The
result was as follows:

Amount of WAL after 60min. run of DBT-2 benchmark
wal_add_optimization_info = off (default) 3.13GB
wal_add_optimization_info = on (new case) 3.17GB -> can be optimized to
0.31GB by pg_compresslog.

So the difference will be around a couple of percent. I think this is
a very good figure.
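For reference, the arithmetic behind these figures can be checked with a few lines of C. The helper names are made up for this note; the inputs are the sizes quoted above. The WAL grows by roughly 1.3%, and the compressed archive is a little under 10% of the WAL it was produced from.

```c
/* Percentage growth from `before` to `after`. */
double percent_increase(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Size of `part` expressed as a percentage of `whole`. */
double percent_of(double part, double whole)
{
    return part / whole * 100.0;
}
```

Here percent_increase(3.13, 3.17) gives the extra WAL written with the switch on, and percent_of(0.31, 3.17) gives the archive size relative to that WAL.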

For information,
DB Size: 12.35GB (120WH)
Checkpoint timeout: 60min. Checkpoint occurred only once in the run.

----------------

I don't think replacing the LSN will work. For full recovery to the
current time, we need both the archive log and WAL. Replacing the LSN will
make the archive log's LSN inconsistent with WAL's LSN, and recovery will
not work.

Reconstruction to regular WAL is proposed as pg_decompresslog. We
must take care that the redo routines are not confused by the
dummy full page writes, as Simon suggested. So far, it works fine.

Regards;

Zeugswetter Andreas ADI SD wrote:

Yup, this is a good summary.

You say you need to remove the optimization that avoids the logging of
a new tuple because the full page image exists.
I think we must already have the info in WAL which tuple inside the
full page image is new (the one for which we avoided the WAL entry
for).

How about this:
Leave current WAL as it is and only add the not removable flag to
full pages.
pg_compresslog then replaces the full page image with a record for the
one tuple that is changed.
I tend to think it is not worth the increased complexity only to save
bytes in the uncompressed WAL though.

It is essentially what my patch proposes. My patch includes
a flag on full page writes which "can be" removed.

Ok, a flag that marks full page images that can be removed is perfect.

But you also turn off the optimization that avoids writing regular
WAL records when the info is already contained in a full-page image
(increasing the uncompressed size of WAL).
It was that part I questioned. As already stated, maybe I should not
have because it would be too complex to reconstruct a regular WAL
record from the full-page image.
But that code would also be needed for WAL based partial replication, so
if it were too complicated we would eventually want a switch to turn off
the optimization anyway (at least for heap page changes).

Another point about pg_decompresslog:

Why do you need a pg_decompresslog ? Imho pg_compresslog should
already do the replacing of the full_page with the dummy entry. Then
pg_decompresslog could be a simple gunzip, or whatever compression was
used, but no logic.

Just removing full page writes does not work. If we shift the rest of
the WAL, then LSN becomes inconsistent in compressed archive logs which
pg_compresslog produces. For recovery, we have to restore LSN as the
original WAL. Pg_decompresslog restores removed full page writes as
dummy records so that recovery redo functions won't be confused.

Ah sorry, I needed some pgsql/src/backend/access/transam/README reading.

LSN is the physical position of records in WAL. Thus your dummy record
size is equal to what you cut out of the original record.
What about disconnecting WAL LSN from physical WAL record position
during replay ?
Add simple short WAL records in pg_compresslog like: advance LSN by 8192
bytes.

Andreas

--
-------------
Koichi Suzuki

#44Zeugswetter Andreas ADI SD
ZeugswetterA@spardat.at
In reply to: Koichi Suzuki (#43)
Re: [HACKERS] Full page writes improvement, code update

With DBT-2 benchmark, I've already compared the amount of WAL. The
result was as follows:

Amount of WAL after 60min. run of DBT-2 benchmark
wal_add_optimization_info = off (default) 3.13GB

how about wal_fullpage_optimization = on (default)

wal_add_optimization_info = on (new case) 3.17GB -> can be
optimized to 0.31GB by pg_compresslog.

So the difference will be around a couple of percent. I think this is
a very good figure.

For information,
DB Size: 12.35GB (120WH)
Checkpoint timeout: 60min. Checkpoint occurred only once in the run.

Unfortunately I think DBT-2 is not a good benchmark to test the disabled
wal optimization.
The test should contain some larger rows (maybe some updates on large
toasted values), and maybe more frequent checkpoints. Actually the poor
ratio between full pages and normal WAL content in this benchmark is
strange to begin with.
Tom fixed a bug recently, and it would be nice to see the new ratio.

Have you read Tom's comment on not really having to be able to
reconstruct all record types from the full page image ? I think that
sounded very promising (e.g. start out with only heap insert/update).

Then:
- we would not need the wal optimization switch (the full page flag
would always be added depending only on backup)
- pg_compresslog would only remove such "full page" images where it
knows how to reconstruct a "normal" WAL record from
- with time and effort pg_compresslog would be able to compress [nearly]
all record types' full images (no change in backend)

I don't think replacing LSN works fine. For full recovery to
the current time, we need both archive log and WAL.
Replacing LSN will make archive log LSN inconsistent with
WAL's LSN and the recovery will not work.

WAL recovery would have had to be modified (decouple LSN from WAL
position during recovery).
An "archive log" would have been a valid WAL (with appropriate LSN
advance records).

Reconstruction to regular WAL is proposed as
pg_decompresslog. We should be careful enough not to make
redo routines confused with the dummy full page writes, as
Simon suggested. So far, it works fine.

Yes, Tom didn't like "LSN replacing" either. I withdraw my concern
regarding pg_decompresslog.

Your work in this area is extremely valuable and I hope my comments are
not discouraging.

Thank you
Andreas

#45Koichi Suzuki
suzuki.koichi@oss.ntt.co.jp
In reply to: Simon Riggs (#39)
Re: [HACKERS] Full page writes improvement, code update

Hi,

I agree that pg_compresslog should be aware of all the WAL records'
details so that it can optimize archive log safely. In my patch, I've
examined 8.2's WAL records to make pg_compresslog/pg_decompresslog safe.

I also agree that maintaining pg_compresslog in future means examining
every change in WAL record format. Because the number of such formats
will be limited, I think the amount of work will be reasonable.

Regards;

Simon Riggs wrote:

On Fri, 2007-04-13 at 10:36 -0400, Tom Lane wrote:

"Zeugswetter Andreas ADI SD" <ZeugswetterA@spardat.at> writes:

But you also turn off the optimization that avoids writing regular
WAL records when the info is already contained in a full-page image
(increasing the uncompressed size of WAL).
It was that part I questioned.

I think it's right to question it, certainly.

That's what bothers me about this patch, too. It will be increasing
the cost of writing WAL (more data -> more CRC computation and more
I/O, not to mention more contention for the WAL locks) which translates
directly to a server slowdown.

I don't really understand this concern. Koichi-san has included a
parameter setting that would prevent any change at all in the way WAL is
written. If you don't want this slight increase in WAL, don't enable it.
If you do enable it, you'll also presumably be compressing the xlog too,
which works much better than gzip using less CPU. So overall it saves
more than it costs, ISTM, and nothing at all if you choose not to use
it.

The main arguments that I could see against Andreas' alternative are:

1. Some WAL record types are arranged in a way that actually would not
permit the reconstruction of the short form from the long form, because
they throw away too much data when the full-page image is substituted.
An example that's fresh in my mind is that the current format of the
btree page split WAL record discards newitemoff in that case, so you
couldn't identify the inserted item in the page image. Now this is only
saving two bytes in what's usually going to be a darn large record
anyway, and it complicates the code to do it, so I wouldn't cry if we
changed btree split to include newitemoff always. But there might be
some other cases where more data is involved. In any case, someone
would have to look through every single WAL record type to determine
whether reconstruction is possible and fix it if not.

2. The compresslog utility would have to have specific knowledge about
every compressible WAL record type, to know how to convert it to the
short format. That means an ongoing maintenance commitment there.
I don't think this is unacceptable, simply because we need only teach
it about a few common record types, not everything under the sun ---
anything it doesn't know how to fix, just leave alone, and if it's an
uncommon record type it really doesn't matter. (I guess that means
that we don't really have to do #1 for every last record type, either.)

So I don't think either of these is a showstopper. Doing it this way
would certainly make the patch more acceptable, since the argument that
it might hurt rather than help performance in some cases would go away.

Yeh, it's additional code paths, but it sounds like Koichi-san and
colleagues are going to be trail blazing any bugs there and will be
around to fix any more that emerge.

What about disconnecting WAL LSN from physical WAL record position
during replay ?
Add simple short WAL records in pg_compresslog like: advance LSN by 8192
bytes.

I don't care for that, as it pretty much destroys some of the more
important sanity checks that xlog replay does. The page boundaries
need to match the records contained in them. So I think we do need
to have pg_decompresslog insert dummy WAL entries to fill up the
space saved by omitting full pages.

Agreed. I don't want to start touching something that works so well.

We've been thinking about doing this for at least 3 years now, so I
don't see any reason to baulk at it now. I'm happy with Koichi-san's
patch as-is, assuming further extensive testing will be carried out on
it during beta.

--
-------------
Koichi Suzuki

#46Koichi Suzuki
suzuki.koichi@oss.ntt.co.jp
In reply to: Tom Lane (#42)
Re: [HACKERS] Full page writes improvement, code update

Here's only part of the reply I owe, but as to I/O error
checking ...

Here's a list of system calls and other external function/library calls
used in the pg_lesslog patch series, together with how the current patch
checks each for errors and how the current PostgreSQL source handles
similar calls:

--------------------------------
1. No error check is done
1-1. fileno()
fileno() is called against stdin and stdout from pg_compresslog.c
and pg_decompresslog.c. They are intended to be invoked from a shell
and so stdin and stdout are both available. fileno() error occurs only
if invoker of pg_compresslog or pg_decompresslog closes stdin and/or
stdout before the invoker executes them. I found similar fileno()
usage in pg_dump/pg_backup_archive.c and postmaster/syslogger.c. I
don't think this is an issue.

1-2. fflush()
fflush() is called against stdout within a debug routine, debug.c.
Such usage can also be found in bin/initdb.c, bin/scripts/createdb.c,
bin/psql/common.c and more. I don't think this is an issue either.

1-3. printf() and fprintf()
It is common practice not to check the error. We can find such
calls in much of the existing source code.

1-4. strerror()
strerror() is called only after a system call has been found to
return an error. Similar code can be found in other PostgreSQL source too.

----------------------------------
2. Error check is done
The return value of each of the following calls is checked.
open(), close(), fstat(), read(), write()

-----------------------------------
3. Functions that do not return errors
The following functions do not report errors, so no error check is needed.
exit(), memcpy(), memset(), strcmp()
------------------------------------
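For readers wondering what a checked read()/write() as in item 2 can look like, here is a minimal sketch. It is not taken from pg_lesslog; write_all() and read_full() are hypothetical helper names, and only standard POSIX calls are used. Besides checking the return value, it also handles short writes and EINTR.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write exactly len bytes to fd.
 * Returns 0 on success, -1 on error (errno as set by write()). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted: retry */
            return -1;              /* real error: caller reports it */
        }
        p += n;                     /* short write: advance and loop */
        len -= (size_t) n;
    }
    return 0;
}

/* Read until len bytes have arrived or EOF is reached.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t len)
{
    char  *p = buf;
    size_t got = 0;

    while (got < len) {
        ssize_t n = read(fd, p + got, len - got);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;                  /* EOF */
        got += (size_t) n;
    }
    return (ssize_t) got;
}
```

A caller would test the result and report strerror(errno) on failure, in line with item 1-4 above.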

I hope this helps.

Regards;

Tom Lane wrote:

"Simon Riggs" <simon@2ndquadrant.com> writes:

Writing lots of additional code simply to remove a parameter that
*might* be mis-interpreted doesn't sound useful to me, especially when
bugs may leak in that way. My take is that this is simple and useful
*and* we have it now; other ways don't yet exist, nor will they in time
for 8.3.

The potential for misusing the switch is only one small part of the
argument; the larger part is that this has been done in the wrong way
and will cost performance unnecessarily. The fact that it's ready
now is not something that I think should drive our choices.

I believe that it would be possible to make the needed core-server
changes in time for 8.3, and then to work on compress/decompress
on its own time scale and publish it on pgfoundry; with the hope
that it would be merged to contrib or core in 8.4. Frankly the
compress/decompress code needs work anyway before it could be
merged (eg, I noted a distinct lack of I/O error checking).

regards, tom lane

--
-------------
Koichi Suzuki

#47Simon Riggs
simon@2ndquadrant.com
In reply to: Zeugswetter Andreas ADI SD (#44)
Re: [HACKERS] Full page writes improvement, code update

On Fri, 2007-04-20 at 10:16 +0200, Zeugswetter Andreas ADI SD wrote:

Your work in this area is extremely valuable and I hope my comments are
not discouraging.

I think it's too late in the day to make the changes suggested by
yourself and Tom. They make the patch more invasive and more likely to
error, plus we don't have much time. This really means the patch is
likely to be rejected at the 11th hour when what we have essentially
works...

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#48Koichi Suzuki
suzuki.koichi@oss.ntt.co.jp
In reply to: Zeugswetter Andreas ADI SD (#44)
Re: [HACKERS] Full page writes improvement, code update

Hi,

I don't insist on the name or the default of the GUC parameter. I'm
afraid wal_fullpage_optimization = on (default) would cause some confusion
because the default behavior becomes a bit different on WAL itself.

I'd like to have some more opinion on this.

Zeugswetter Andreas ADI SD wrote:

With DBT-2 benchmark, I've already compared the amount of WAL. The
result was as follows:

Amount of WAL after 60min. run of DBT-2 benchmark
wal_add_optimization_info = off (default) 3.13GB

how about wal_fullpage_optimization = on (default)

wal_add_optimization_info = on (new case) 3.17GB -> can be
optimized to 0.31GB by pg_compresslog.

So the difference will be around a couple of percent. I think this is
a very good figure.

For information,
DB Size: 12.35GB (120WH)
Checkpoint timeout: 60min. Checkpoint occurred only once in the run.

Unfortunately I think DBT-2 is not a good benchmark to test the disabled
wal optimization.
The test should contain some larger rows (maybe some updates on large
toasted values), and maybe more frequent checkpoints. Actually the poor
ratio between full pages and normal WAL content in this benchmark is
strange to begin with.
Tom fixed a bug recently, and it would be nice to see the new ratio.

Have you read Tom's comment on not really having to be able to
reconstruct all record types from the full page image ? I think that
sounded very promising (e.g. start out with only heap insert/update).

Then:
- we would not need the wal optimization switch (the full page flag
would always be added depending only on backup)
- pg_compresslog would only remove such "full page" images where it
knows how to reconstruct a "normal" WAL record from
- with time and effort pg_compresslog would be able to compress [nearly]
all record types' full images (no change in backend)

I don't think replacing LSN works fine. For full recovery to
the current time, we need both archive log and WAL.
Replacing LSN will make archive log LSN inconsistent with
WAL's LSN and the recovery will not work.

WAL recovery would have had to be modified (decouple LSN from WAL
position during recovery).
An "archive log" would have been a valid WAL (with appropriate LSN
advance records).

Reconstruction to regular WAL is proposed as
pg_decompresslog. We should be careful enough not to make
redo routines confused with the dummy full page writes, as
Simon suggested. So far, it works fine.

Yes, Tom didn't like "LSN replacing" either. I withdraw my concern
regarding pg_decompresslog.

Your work in this area is extremely valuable and I hope my comments are
not discouraging.

Thank you
Andreas

--
-------------
Koichi Suzuki

#49Zeugswetter Andreas ADI SD
ZeugswetterA@spardat.at
In reply to: Koichi Suzuki (#48)
Re: [HACKERS] Full page writes improvement, code update

I don't insist the name and the default of the GUC parameter.
I'm afraid wal_fullpage_optimization = on (default) makes
some confusion because the default behavior becomes a bit
different on WAL itself.

Seems my wal_fullpage_optimization is not a good name if it caused
misinterpretation already :-(

Amount of WAL after 60min. run of DBT-2 benchmark
wal_add_optimization_info = off (default) 3.13GB

how about wal_fullpage_optimization = on (default)

The meaning of wal_fullpage_optimization = on (default)
would be the same as your wal_add_optimization_info = off (default).
(Reversed name, reversed meaning of the boolean value)

It would be there to *turn off* the (default) WAL full_page
optimization.
For your pg_compresslog it would need to be set to off.
"add_optimization_info" sounded like added info about/for some
optimization which it is not. We turn off an optimization with the flag
for the benefit of an easier pg_compresslog implementation.

As already said I would decouple this setting from the part that sets
the "removable full page" flag in WAL and makes recovery able to
skip dummy records. This I would do unconditionally.

Andreas

#50Josh Berkus
josh@agliodbs.com
In reply to: Simon Riggs (#41)
Re: [HACKERS] Full page writes improvement, code update

Hackers,

Writing lots of additional code simply to remove a parameter that
*might* be mis-interpreted doesn't sound useful to me, especially when
bugs may leak in that way. My take is that this is simple and useful
*and* we have it now; other ways don't yet exist, nor will they in time
for 8.3.

How about naming the parameter wal_compressable? That would indicate pretty
clearly that the parameter is intended to be used with wal_compress and
nothing else.

However, I do agree with Andreas that anything which adds to WAL volume, even
3%, seems like going in the wrong direction. We already have higher log
output than any comparable database (higher than InnoDB by 3x) and we should
be looking for output to trim as well as compression.

So the relevant question is whether the patch in its current form provides
enough benefit to make it worthwhile for 8.3, or whether we should wait for
8.4. Questions:

1) is there any throughput benefit for platforms with fast CPU but constrained
I/O (e.g. 2-drive webservers)? Any penalty for servers with plentiful I/O?

2) Will this patch make attempts to reduce WAL volume in the future
significantly harder?

3) How is this better than command-line compression for log-shipping? e.g.
why do we need it in the database?

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

#51Koichi Suzuki
suzuki.koichi@oss.ntt.co.jp
In reply to: Josh Berkus (#50)
Re: [HACKERS] Full page writes improvement, code update

Hi,

Sorry, because of so many comments/questions, I'll write inline....

Josh Berkus wrote:

Hackers,

Writing lots of additional code simply to remove a parameter that
*might* be mis-interpreted doesn't sound useful to me, especially when
bugs may leak in that way. My take is that this is simple and useful
*and* we have it now; other ways don't yet exist, nor will they in time
for 8.3.

How about naming the parameter wal_compressable? That would indicate pretty
clearly that the parameter is intended to be used with wal_compress and
nothing else.

Hmm, it sounds nicer.

However, I do agree with Andreas that anything which adds to WAL volume, even
3%, seems like going in the wrong direction. We already have higher log
output than any comparable database (higher than InnoDB by 3x) and we should
be looking for output to trim as well as compression.

So the relevant question is whether the patch in its current form provides
enough benefit to make it worthwhile for 8.3, or whether we should wait for
8.4. Questions:

Before answering the questions below, I'd like to say that archive log
optimization has to address several points of view relative to the
current (up to 8.2) settings.

1) To deal with partial/inconsistent write to the data file at crash
recovery, we need full page writes at the first modification to pages
after each checkpoint. It consumes much of WAL space.

2) 1) is not necessary for archive recovery (PITR) and full page writes
can be removed for this purpose. However, we need full page writes
during hot backup to deal with partial writes by backup commands. This
is implemented in 8.2.

3) To maintain crash recovery chance and reduce the amount of archive
log, removal of unnecessary full page writes from archive logs is a
good choice. To do this, we need both logical log and full page writes
in WAL.
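To make point 1) concrete, the rule "full page write at the first modification after each checkpoint" can be sketched as a comparison between the page's last-modified LSN and the redo position of the latest checkpoint. This is a simplification written for this note; the type and function names are hypothetical, and it leaves out the forced full page writes during hot backup mentioned in point 2).

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;   /* simplified: a WAL position as one integer */

/* True if this change is the first one to touch the page since the last
 * checkpoint, i.e. the page has not been modified past the redo point. */
bool needs_full_page_image(Lsn page_lsn, Lsn checkpoint_redo_lsn,
                           bool full_page_writes)
{
    return full_page_writes && page_lsn <= checkpoint_redo_lsn;
}
```

Every page that passes this test costs a whole page image in WAL, which is where most of the volume discussed in this thread comes from.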

I don't think there should be only one setting. It depends on how the
database is operated. Leaving wal_add_optimization_info = off as the
default does not bring any change in WAL and archive log handling. I
understand some people may not be happy with an additional 3% or so
increase in WAL size, especially people who don't need archive log at
all. So I prefer to leave the default off.

For users, I think this is simple enough:

1) For people happy with 8.2 settings:
No change is needed to move to 8.3 and there's really no change.

2) For people who need to reduce archive log size but like to leave full
page writes to WAL (to maintain crash recovery chance):
a) Add GUC parameter: wal_add_optimization_info=on
b) Change archive command from "cp" to "pg_compresslog"
c) Change restore command from "cp" to "pg_decompresslog"

Archive log can be stored and restored as done in older releases.

1) is there any throughput benefit for platforms with fast CPU but constrained
I/O (e.g. 2-drive webservers)? Any penalty for servers with plentiful I/O?

I've only run benchmarks with the archive process running, because
wal_add_optimization_info=on does not make sense if we don't archive
WAL. In this situation, total I/O decreases because writes to the archive
log decrease. Because of the 3% or so increase in WAL size, there will be
an increase in WAL writes, but the decrease in archive writes makes up
for it.

2) Will this patch make attempts to reduce WAL volume in the future
significantly harder?

Yes, I'd like to continue to work to reduce the WAL size. It's still
an issue when the database grows to several hundreds of gigabytes.
Anyway, I think WAL size reduction has to be done in
XLogInsert() or XLogWrite(). We need much more discussion for this.
The issue will be how to keep crash recovery possible after inconsistent
writes (with full_page_writes=off, we have to give that up). On the other
hand we have to keep examining each WAL record.

3) How is this better than command-line compression for log-shipping? e.g.
why do we need it in the database?

I don't fully understand what command-line compression means. Simon
suggested that this patch can be used with log-shipping and I agree.
If we compare with gzip or other general purpose compression, the
compression ratio, CPU usage and I/O of pg_compresslog are all
considerably better than those of gzip.

Please let me know if you intended something different.

Regards;

--
-------------
Koichi Suzuki

#52Zeugswetter Andreas ADI SD
ZeugswetterA@spardat.at
In reply to: Koichi Suzuki (#51)
Re: [HACKERS] Full page writes improvement, code update

3) To maintain crash recovery chance and reduce the amount of
archive log, removal of unnecessary full page writes from
archive logs is a good choice.

Definitely, yes. pg_compresslog could even move the full pages written
during backup out of WAL and put them in a different file that needs to
be applied before replay of the corresponding WAL after a physical
restore. This would further help reduce log shipping volume.

To do this, we need both logical log and full page writes in WAL.

This is only true in the sense that it allows a less complex
implementation of pg_compresslog.

Basically a WAL record consists of info about what happened and
currently either per tuple new data or a full page image. The info of
"what happened" together with the full page image is sufficient to
reconstruct the "per tuple new data". There might be a few WAL record
types (e.g. in btree split ?) where this is not so, but we could either
fix those or not compress those.

This is why I don't like Josh's suggested name of wal_compressable
either.
WAL is compressible either way, only pg_compresslog would need to be
more complex if you don't turn off the full page optimization. I think a
good name would tell that you are turning off an optimization.
(thus my wal_fullpage_optimization on/off)

Andreas

#53Josh Berkus
josh@agliodbs.com
In reply to: Zeugswetter Andreas ADI SD (#52)
Re: [HACKERS] Full page writes improvement, code update

Koichi, Andreas,

1) To deal with partial/inconsistent writes to the data file at crash
recovery, we need full page writes at the first modification to pages
after each checkpoint. It consumes much of WAL space.

We need to find a way around this someday. Other DBs don't do this; it may be
because they're less durable, or because they fixed the problem.

I don't think there should be only one setting. It depends on how
the database is operated. Leaving wal_add_optimization_info = off as the default
does not bring any change in WAL and archive log handling. I
understand some people may not be happy with an additional 3% or so
increase in WAL size, especially people who don't need archive logs at
all. So I prefer to leave the default off.

Except that, is there any reason to turn this off if we are archiving? Maybe
it should just be slaved to archive_command ... if we're not using PITR, it's
off, if we are, it's on.

1) is there any throughput benefit for platforms with fast CPU but
constrained I/O (e.g. 2-drive webservers)? Any penalty for servers with
plentiful I/O?

I've only run benchmarks with the archive process running, because
wal_add_optimization_info=on does not make sense if we don't archive
WAL. In this situation, total I/O decreases because writes to the archive
log decrease. Because of the 3% or so increase in WAL size, there will be an
increase in WAL writes, but the decrease in archive writes makes up for it.

Yeah, I was just looking for a way to make this a performance feature. I see
now that it can't be. ;-)

3) How is this better than command-line compression for log-shipping?
e.g. why do we need it in the database?

I don't fully understand what command-line compression means. Simon
suggested that this patch can be used with log-shipping and I agree.
If we compare it with gzip or other general purpose
compression, the compression ratio, CPU usage and I/O of pg_compresslog are
all considerably better than with gzip.

OK, that answered my question.

This is why I don't like Josh's suggested name of wal_compressable
either.
WAL is compressible either way, only pg_compresslog would need to be
more complex if you don't turn off the full page optimization. I think a
good name would tell that you are turning off an optimization.
(thus my wal_fullpage_optimization on/off)

Well, as a PG hacker I find the name wal_fullpage_optimization quite baffling
and I think our general user base will find it even more so. Now that I have
Koichi's explanation of the problem, I vote for simply slaving this to the
PITR settings and not having a separate option at all.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

#54Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#53)
Re: [HACKERS] Full page writes improvement, code update

Josh Berkus <josh@agliodbs.com> writes:

Well, as a PG hacker I find the name wal_fullpage_optimization quite
baffling and I think our general user base will find it even more so.
Now that I have Koichi's explanation of the problem, I vote for simply
slaving this to the PITR settings and not having a separate option at
all.

The way to not have a separate option is to not need one, by having the
feature not cost anything extra in the first place. Andreas and I have
made the point repeatedly about how to do that.

regards, tom lane

#55Zeugswetter Andreas ADI SD
ZeugswetterA@spardat.at
In reply to: Josh Berkus (#53)
Re: [HACKERS] Full page writes improvement, code update

1) To deal with partial/inconsistent writes to the data file at crash
recovery, we need full page writes at the first modification to pages
after each checkpoint. It consumes much of WAL space.

We need to find a way around this someday. Other DBs don't
do this; it may be because they're less durable, or because
they fixed the problem.

They either can only detect a failure later (this may be a very long
time depending on access and verify runs) or they also write page
images. Those that write page images usually write "before images" to a
different area that is cleared periodically (e.g. during checkpoint).

Writing to a different area was considered in pg, but there were more
negative issues than positive.
So imho pg_compresslog is the correct path forward. The current
discussion is only about whether we want a more complex pg_compresslog
and no change to current WAL, or an increased WAL size for a less
complex implementation.
Both would be able to compress the WAL to the same "archive log" size.

Andreas

In reply to: Zeugswetter Andreas ADI SD (#55)
Re: [HACKERS] Full page writes improvement, code update

On Wed, Apr 25, 2007 at 10:00:16AM +0200, Zeugswetter Andreas ADI SD wrote:

1) To deal with partial/inconsistent writes to the data file at crash
recovery, we need full page writes at the first modification to pages
after each checkpoint. It consumes much of WAL space.

We need to find a way around this someday. Other DBs don't
do this; it may be because they're less durable, or because
they fixed the problem.

They either can only detect a failure later (this may be a very long
time depending on access and verify runs) or they also write page
images. Those that write page images usually write "before images" to a
different area that is cleared periodically (e.g. during checkpoint).

Writing to a different area was considered in pg, but there were more
negative issues than positive.
So imho pg_compresslog is the correct path forward. The current
discussion is only about whether we want a more complex pg_compresslog
and no change to current WAL, or an increased WAL size for a less
complex implementation.
Both would be able to compress the WAL to the same "archive log" size.

Andreas

I definitely am in the camp of not increasing WAL size at all. If we
need a bit more complexity to ensure that, so be it. Any approach that
increases WAL volume would need to have an amazing benefit to make it
warranted. This certainly does not meet that criterion.

Ken

#57Josh Berkus
josh@agliodbs.com
In reply to: Zeugswetter Andreas ADI SD (#55)
Re: [HACKERS] Full page writes improvement, code update

Andreas,

Writing to a different area was considered in pg, but there were more
negative issues than positive.
So imho pg_compresslog is the correct path forward. The current
discussion is only about whether we want a more complex pg_compresslog
and no change to current WAL, or an increased WAL size for a less
complex implementation.
Both would be able to compress the WAL to the same "archive log" size.

Huh? As conceived, pg_compresslog does nothing to lower log volume for
general purposes, just on-disk storage size for archiving. It doesn't help
us at all with the tremendous amount of log we put out for an OLTP server,
for example.

Not that pg_compresslog isn't useful on its own for improving warm standby
manageability, but it's completely separate from addressing the "we're logging
too much" issue.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

#58Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#57)
Re: [HACKERS] Full page writes improvement, code update

Josh Berkus <josh@agliodbs.com> writes:

Andreas,

So imho pg_compresslog is the correct path forward. The current
discussion is only about whether we want a more complex pg_compresslog
and no change to current WAL, or an increased WAL size for a less
complex implementation.
Both would be able to compress the WAL to the same "archive log" size.

Huh? As conceived, pg_compresslog does nothing to lower log volume for
general purposes, just on-disk storage size for archiving. It doesn't help
us at all with the tremendous amount of log we put out for an OLTP server,
for example.

I don't see how what you said refutes what he said. The sticking point
here is that the patch as-proposed *increases* the log volume before
compression.

regards, tom lane

#59Koichi Suzuki
suzuki.koichi@oss.ntt.co.jp
In reply to: Zeugswetter Andreas ADI SD (#49)
Re: [HACKERS] Full page writes improvement, code update

Hi,

Zeugswetter Andreas ADI SD wrote:

I don't insist on the name or the default of the GUC parameter.
I'm afraid wal_fullpage_optimization = on (default) causes
some confusion because the default behavior of WAL itself becomes
a bit different.

Seems my wal_fullpage_optimization is not a good name if it caused
misinterpretation already :-(

Amount of WAL after 60min. run of DBT-2 benchmark
wal_add_optimization_info = off (default) 3.13GB

how about wal_fullpage_optimization = on (default)

The meaning of wal_fullpage_optimization = on (default)
would be the same as your wal_add_optimization_info = off (default).
(Reversed name, reversed meaning of the boolean value)

It would be there to *turn off* the (default) WAL full_page
optimization.
For your pg_compresslog it would need to be set to off.
"add_optimization_info" sounded like added info about/for some
optimization, which it is not. We turn off an optimization with the flag
for the benefit of an easier pg_compresslog implementation.

For pg_compresslog to remove full page writes, we need
wal_add_optimization_info=on.

As already said I would decouple this setting from the part that sets
the "removeable full page" flag in WAL, and making the recovery able to
skip dummy records. This I would do unconditionally.

Andreas

--
-------------
Koichi Suzuki

#60Zeugswetter Andreas ADI SD
ZeugswetterA@spardat.at
In reply to: Josh Berkus (#57)
Re: too much WAL volume

Writing to a different area was considered in pg, but there were more
negative issues than positive.
So imho pg_compresslog is the correct path forward. The current
discussion is only about whether we want a more complex pg_compresslog
and no change to current WAL, or an increased WAL size for a less
complex implementation.
Both would be able to compress the WAL to the same "archive log" size.

Huh? As conceived, pg_compresslog does nothing to lower log
volume for general purposes, just on-disk storage size for
archiving. It doesn't help us at all with the tremendous
amount of log we put out for an OLTP server, for example.

Ok, that is not related to the original discussion though.
I have thus changed the subject, and removed [PATCHES].

You cannot directly compare the pg WAL size with other db's since they
write parts to other areas (e.g. physical log in Informix). You would
need to include those writes in a fair comparison.
It is definitely not true that writing to a different area has only
advantages. The consensus was that writing the page images to the WAL
has more pros. We could revisit the pros and cons though.

Other options involve special OS and hardware (we already have that), or
accepting a high risk of needing a restore after a power outage (we don't
have that, because we use no mechanism to detect such a failure).

I am not sure that shrinking per WAL record size (other than the full
page images), e.g. by only logging changed bytes and not whole tuples,
would have a huge impact on OLTP tx/sec, since the limiting factor is
IO's per second and not Mb per second. Recent developments like HOT seem
a lot more promising in this regard since they avoid IO.

Andreas

#61Koichi Suzuki
suzuki.koichi@oss.ntt.co.jp
In reply to: Josh Berkus (#53)
Re: [HACKERS] Full page writes improvement, code update

Josh,

Josh Berkus wrote:

Koichi, Andreas,

1) To deal with partial/inconsistent writes to the data file at crash
recovery, we need full page writes at the first modification to pages
after each checkpoint. It consumes much of WAL space.

We need to find a way around this someday. Other DBs don't do this; it may be
because they're less durable, or because they fixed the problem.

Maybe both. Fixing the problem may need some means to detect
partial/inconsistent writes to the data files, which may need
additional CPU resources.

I don't think there should be only one setting. It depends on how
the database is operated. Leaving wal_add_optimization_info = off as the default
does not bring any change in WAL and archive log handling. I
understand some people may not be happy with an additional 3% or so
increase in WAL size, especially people who don't need archive logs at
all. So I prefer to leave the default off.

Except that, is there any reason to turn this off if we are archiving? Maybe
it should just be slaved to archive_command ... if we're not using PITR, it's
off, if we are, it's on.

Hmm, this sounds like it would work. On the other hand, existing users who
are happy with the current archiving would have to change their
archive command to pg_compresslog, or else their archive log size will
increase a bit. I'd like to hear some more on this.

1) is there any throughput benefit for platforms with fast CPU but
constrained I/O (e.g. 2-drive webservers)? Any penalty for servers with
plentiful I/O?

I've only run benchmarks with the archive process running, because
wal_add_optimization_info=on does not make sense if we don't archive
WAL. In this situation, total I/O decreases because writes to the archive
log decrease. Because of the 3% or so increase in WAL size, there will be an
increase in WAL writes, but the decrease in archive writes makes up for it.

Yeah, I was just looking for a way to make this a performance feature. I see
now that it can't be. ;-)

As to the performance feature, I tested the patch against 8.3HEAD.
With pgbench, the setup was as follows:
Case1. Archiver: cp command, wal_add_optimization_info = off,
full_page_writes=on
Case2. Archiver: pg_compresslog, wal_add_optimization_info = on,
full_page_writes=on
DB Size: 1.65GB, Total transactions: 1,000,000

Throughput was:
Case1: 632.69 TPS
Case2: 653.10 TPS ... 3% gain.

Archive Log Size:
Case1: 1.92GB
Case2: 0.57GB (about 30% of Case1) ... Before compression, the size
was 1.92GB. Because this is measured by counting WAL segment
files, there may be up to 16MB of error in the measurement. Taking this
into account, the increase in WAL I/O will be less than 1%.
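
As a quick check of the figures above, the following C sketch recomputes the two percentages from the quoted measurements. The function names are made up for illustration; only the four input numbers come from the pgbench run.

```c
#include <stdio.h>

/* Percentage change from baseline to measured (positive means an increase). */
double percent_change(double baseline, double measured)
{
    return (measured - baseline) / baseline * 100.0;
}

/* "part" expressed as a percentage of "whole". */
double percent_of(double part, double whole)
{
    return part / whole * 100.0;
}

/* Print the derived figures for the two cases quoted above. */
void print_summary(void)
{
    /* Case1 = cp archiver, Case2 = pg_compresslog archiver */
    printf("throughput change: %+.2f%%\n", percent_change(632.69, 653.10));
    printf("archive size: %.1f%% of Case1\n", percent_of(0.57, 1.92));
}
```

Calling print_summary() from a main() should show a throughput gain a little above 3% and an archive size just under 30% of Case1, in line with the rounded figures in the message.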

3) How is this better than command-line compression for log-shipping?
e.g. why do we need it in the database?

I don't fully understand what command-line compression means. Simon
suggested that this patch can be used with log-shipping and I agree.
If we compare it with gzip or other general purpose
compression, the compression ratio, CPU usage and I/O of pg_compresslog are
all considerably better than with gzip.

OK, that answered my question.

This is why I don't like Josh's suggested name of wal_compressable
either.
WAL is compressible either way, only pg_compresslog would need to be
more complex if you don't turn off the full page optimization. I think a
good name would tell that you are turning off an optimization.
(thus my wal_fullpage_optimization on/off)

Well, as a PG hacker I find the name wal_fullpage_optimization quite baffling
and I think our general user base will find it even more so. Now that I have
Koichi's explanation of the problem, I vote for simply slaving this to the
PITR settings and not having a separate option at all.

Could I have a more specific suggestion on this?

Regards;

--
-------------
Koichi Suzuki

#62Greg Smith
gsmith@gregsmith.com
In reply to: Zeugswetter Andreas ADI SD (#60)
Re: too much WAL volume

On Thu, 26 Apr 2007, Zeugswetter Andreas ADI SD wrote:

I am not sure that shrinking per WAL record size (other than the full
page images), e.g. by only logging changed bytes and not whole tuples,
would have a huge impact on OLTP tx/sec, since the limiting factor is
IO's per second and not Mb per second.

With the kind of caching controller that's necessary for any serious OLTP
work with Postgres, number of I/Os per second isn't really an important
number. Total volume of writes to the WAL volume can be though. It's
difficult but not impossible to encounter a workload that becomes
bottlenecked by WAL volume on a good OLTP server, particularly because
that's often going to a single or RAID-1 disk. Whether those workloads
also have the appropriate properties such that their WAL could be shrunk
usefully in real-time is a good question.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#63Jim Nasby
decibel@decibel.org
In reply to: Greg Smith (#62)
Re: too much WAL volume

On Apr 27, 2007, at 4:58 AM, Greg Smith wrote:

On Thu, 26 Apr 2007, Zeugswetter Andreas ADI SD wrote:

I am not sure that shrinking per WAL record size (other than the full
page images), e.g. by only logging changed bytes and not whole
tuples,
would have a huge impact on OLTP tx/sec, since the limiting factor is
IO's per second and not Mb per second.

With the kind of caching controller that's necessary for any
serious OLTP work with Postgres, number of I/Os per second isn't
really an important number. Total volume of writes to the WAL
volume can be though. It's difficult but not impossible to
encounter a workload that becomes bottlenecked by WAL volume on a
good OLTP server, particularly because that's often going to a
single or RAID-1 disk. Whether those workloads also have the
appropriate properties such that their WAL could be shrunk usefully
in real-time is a good question.

Yes, but how many data drives would you need to have to bottleneck on
WAL? Even if the entire database is memory resident you'd still have
to write all the pages out at some point, and it seems to me that
you'd need a fair amount of disk capacity for the data directory before
you got pegged by WAL.

When I did some DBT2 testing a bit over a year ago I had a 20 drive
RAID10 for data and a mirror for WAL and was nowhere close to pegged
on WAL (this was on a Sun V40 connected to one of their storage arrays).
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#64Greg Smith
gsmith@gregsmith.com
In reply to: Jim Nasby (#63)
Re: too much WAL volume

On Fri, 27 Apr 2007, Jim Nasby wrote:

Yes, but how many data drives would you need to have to bottleneck on WAL?
Even if the entire database is memory resident you'd still have to write all
the pages out at some point, and it seems to me that you'd need a fair amount
of disk capacity for the data directory before you got pegged by WAL.

Depends on the type of transactions. If you're doing something with lots
of INSERT and perhaps some UPDATE volume that doesn't need to read heavily
from the database to complete, because most of the important stuff is
already in memory, you might run into the WAL limit without too much on
the database disk side. I did say it was difficult...

When I did some DBT2 testing a bit over a year ago I had a 20 drive RAID10
for data and a mirror for WAL and was nowhere close to pegged on WAL (this
was on a Sun V40 connected to one of their storage arrays).

No doubt, the main reason I haven't used DBT2 more is because the WAL
volume produced before you run into database limited bottlenecks isn't
large, and certainly not in the same ratio as some of the apps I'm
modeling. Mine lean more toward straightforward transaction logging in
parts.

I'm running on similar hardware (V40 is very close, I think the EMC array
I test against is a bit better than most of the Sun models) and I've
seen some scenarios that produce 40MB/s average - 60MB/s peak of WAL
volume. Sure seems like I'm rate limited by the RAID-1 WAL disk. As you
say, eventually all the data has to make it to disk, but since it's not
too expensive nowadays to have gigabytes worth of memory and disk array
cache you can put off database writes for a surprisingly long period of
time with the right system design. It's harder to buffer those pesky
O_DIRECT WAL writes when they blow right through at least one level of
cache.
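
To put those rates in terms of WAL segment files, here is a small C sketch. It assumes the 16MB segment size mentioned earlier in the thread; the helper names are illustrative only.

```c
/* Segment files filled per second at a given WAL write rate (MB/s). */
double segments_per_second(double wal_mb_per_s, double segment_mb)
{
    return wal_mb_per_s / segment_mb;
}

/* Seconds needed to fill one segment file at a given WAL write rate (MB/s). */
double seconds_per_segment(double wal_mb_per_s, double segment_mb)
{
    return segment_mb / wal_mb_per_s;
}
```

With 16MB segments, 40MB/s works out to 2.5 segments per second (one segment every 0.4 seconds), and the 60MB/s peak to 3.75 segments per second.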

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#65Koichi Suzuki
suzuki.koichi@oss.ntt.co.jp
In reply to: Tom Lane (#58)
1 attachment(s)
Re: [PATCHES] Full page writes improvement, code update

Hi,

As replied to "Patch queue triage" by Tom, here's a simplified patch to
mark WAL records as "compressible", with no increase in WAL itself.
Compression/decompression commands will be posted separately to
PgFoundry for further review.

-------------------------------
As suggested by Tom, I agree that WAL should not include "both" full
page writes and the incremental (logical) log. I began to examine the WAL
record format to see if the incremental log can be made from full page
writes. It will be okay even before 8.4, if a simplified patch to the
core is accepted. I will post a simplified patch to the core as follows:

1. Mark the flag to indicate that the WAL record is compressible from
full page writes to incremental log. This flag will be set if
a) It is not written during the hot backup, and
b) Archive command is active, and
c) WAL contains full page writes, and
d) full_page_writes=on.
No logical log will be written to WAL in this case, and
2. During recovery, the xl_tot_len check will be skipped for compressed WAL
records.

Please note that new GUC is not needed in this patch.

With this patch, compress/decompress can be developed outside the core.
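
The flag conditions listed above can be sketched as a standalone C function. This is only an illustration, not the patch itself: mark_removable() and its boolean parameters are hypothetical stand-ins for XLogArchivingActive(), fullPageWrites and Insert->forcePageWrites, and the body of XLR_SET_BKP_BLOCK is inferred from the 0x08/0x04/0x02/0x01 comments in the attached xlog.h hunk.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as shown in the xlog.h hunk; the macro body is inferred. */
#define XLR_SET_BKP_BLOCK(iblk) (0x08 >> (iblk))
#define XLR_BKP_BLOCK_MASK      0x0E                  /* backup blocks 1..3 */
#define XLR_BKP_REMOVABLE       XLR_SET_BKP_BLOCK(3)  /* 0x01 */

/*
 * Hypothetical helper: return xl_info with the "removable" bit set when
 * archiving is active, full_page_writes is on, the record carries at least
 * one backup block, and no online backup is in progress.
 */
uint8_t mark_removable(uint8_t info, bool archiving_active,
                       bool full_page_writes, bool backup_in_progress)
{
    if (archiving_active && full_page_writes &&
        (info & XLR_BKP_BLOCK_MASK) != 0 && !backup_in_progress)
        info |= XLR_BKP_REMOVABLE;

    return info;
}
```

For example, a record with one backup block (0x08) written while archiving is active and no backup is running comes back as 0x09, while the same record written during an online backup is returned unchanged.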

I'd be very grateful if this patch can be considered again.

Best Regards;

--
-------------
Koichi Suzuki

Attachments:

lesslog_core.patch_flagonlytext/plain; name=lesslog_core.patch_flagonlyDownload
diff -cr pgsql_org/src/backend/access/transam/xlog.c pgsql/src/backend/access/transam/xlog.c
*** pgsql_org/src/backend/access/transam/xlog.c	2007-05-02 15:56:38.000000000 +0900
--- pgsql/src/backend/access/transam/xlog.c	2007-05-07 16:30:38.000000000 +0900
***************
*** 837,842 ****
--- 837,854 ----
  		return RecPtr;
  	}
  
+ 	/*
+ 	 * If online backup is not in progress and WAL archiving is active, mark
+ 	 * backup blocks removable if any.
+ 	 * This mark will be referenced during archiving to remove needless backup
+ 	 * blocks in the record and compress WAL segment files.
+ 	 */
+ 	if (XLogArchivingActive() && fullPageWrites &&
+ 			(info & XLR_BKP_BLOCK_MASK) && !Insert->forcePageWrites)
+ 	{
+ 		info |= XLR_BKP_REMOVABLE;
+ 	}
+ 
  	/* Insert record header */
  
  	record = (XLogRecord *) Insert->currpos;
***************
*** 2738,2750 ****
  		blk += blen;
  	}
  
! 	/* Check that xl_tot_len agrees with our calculation */
! 	if (blk != (char *) record + record->xl_tot_len)
  	{
! 		ereport(emode,
! 				(errmsg("incorrect total length in record at %X/%X",
! 						recptr.xlogid, recptr.xrecoff)));
! 		return false;
  	}
  
  	/* Finally include the record header */
--- 2750,2778 ----
  		blk += blen;
  	}
  
! 	/*
! 	 * If physical log has not been removed, check the length to see
! 	 * the following.
! 	 *   - No physical log existed originally,
! 	 *   - WAL record was not removable because it is generated during
! 	 *     the online backup,
! 	 *   - Cannot be removed because the physical log spanned in
! 	 *     two segments.
! 	 * The reason why we skip the length check on the physical log removal is
! 	 * that the flag XLR_SET_BKB_BLOCK(0..2) is reset to zero and it prevents
! 	 * the above loop to proceed blk to the end of the record.
! 	 */
! 	if (!(record->xl_info & XLR_BKP_REMOVABLE) ||
! 		record->xl_info & XLR_BKP_BLOCK_MASK)
  	{
! 		/* Check that xl_tot_len agrees with our calculation */
! 		if (blk != (char *) record + record->xl_tot_len)
! 		{
! 			ereport(emode,
! 					(errmsg("incorrect total length in record at %X/%X",
! 							recptr.xlogid, recptr.xrecoff)));
! 			return false;
! 		}
  	}
  
  	/* Finally include the record header */
Only in pgsql/src/backend/access/transam: xlog.c.orig
diff -cr pgsql_org/src/include/access/xlog.h pgsql/src/include/access/xlog.h
*** pgsql_org/src/include/access/xlog.h	2007-01-06 07:19:51.000000000 +0900
--- pgsql/src/include/access/xlog.h	2007-05-07 16:30:38.000000000 +0900
***************
*** 66,73 ****
  /*
   * If we backed up any disk blocks with the XLOG record, we use flag bits in
   * xl_info to signal it.  We support backup of up to 3 disk blocks per XLOG
!  * record.	(Could support 4 if we cared to dedicate all the xl_info bits for
!  * this purpose; currently bit 0 of xl_info is unused and available.)
   */
  #define XLR_BKP_BLOCK_MASK		0x0E	/* all info bits used for bkp blocks */
  #define XLR_MAX_BKP_BLOCKS		3
--- 66,74 ----
  /*
   * If we backed up any disk blocks with the XLOG record, we use flag bits in
   * xl_info to signal it.  We support backup of up to 3 disk blocks per XLOG
!  * record.
!  * Bit 0 of xl_info is used to represent that backup blocks are not necessary
!  * in archive-log.
   */
  #define XLR_BKP_BLOCK_MASK		0x0E	/* all info bits used for bkp blocks */
  #define XLR_MAX_BKP_BLOCKS		3
***************
*** 75,80 ****
--- 76,82 ----
  #define XLR_BKP_BLOCK_1			XLR_SET_BKP_BLOCK(0)	/* 0x08 */
  #define XLR_BKP_BLOCK_2			XLR_SET_BKP_BLOCK(1)	/* 0x04 */
  #define XLR_BKP_BLOCK_3			XLR_SET_BKP_BLOCK(2)	/* 0x02 */
+ #define XLR_BKP_REMOVABLE		XLR_SET_BKP_BLOCK(3)	/* 0x01 */
  
  /*
   * Sometimes we log records which are out of transaction control.
#66Bruce Momjian
bruce@momjian.us
In reply to: Koichi Suzuki (#65)
Re: [PATCHES] Full page writes improvement, code update

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

Koichi Suzuki wrote:

Hi,

As replied to "Patch queue triage" by Tom, here's a simplified patch to
mark WAL records as "compressible", with no increase in WAL itself.
Compression/decompression commands will be posted separately to
PgFoundry for further review.

-------------------------------
As suggested by Tom, I agree that WAL should not include "both" full
page writes and the incremental (logical) log. I began to examine the WAL
record format to see if the incremental log can be made from full page
writes. It will be okay even before 8.4, if a simplified patch to the
core is accepted. I will post a simplified patch to the core as follows:

1. Mark the flag to indicate that the WAL record is compressible from
full page writes to incremental log. This flag will be set if
a) It is not written during the hot backup, and
b) Archive command is active, and
c) WAL contains full page writes, and
d) full_page_writes=on.
No logical log will be written to WAL in this case, and
2. During recovery, the xl_tot_len check will be skipped for compressed WAL
records.

Please note that new GUC is not needed in this patch.

With this patch, compress/decompress can be developed outside the core.

I'd be very grateful if this patch can be considered again.

Best Regards;

--
-------------
Koichi Suzuki

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#67Tom Lane
tgl@sss.pgh.pa.us
In reply to: Koichi Suzuki (#65)
Re: [PATCHES] Full page writes improvement, code update

Koichi Suzuki <suzuki.koichi@oss.ntt.co.jp> writes:

As I replied to Tom's "Patch queue triage", here's a simplified patch to
mark WAL records as "compressible", with no increase in WAL itself.
Compression/decompression commands will be posted separately to
pgFoundry for further review.

Applied with some minor modifications. I didn't like the idea of
suppressing the sanity-check on WAL record length; I think that's
fairly important. Instead, I added a provision for an XLOG_NOOP
WAL record type that can be used to fill in the extra space.
The way I envision that working is that the compressor removes
backup blocks and converts each compressible WAL record to have the
same contents and length it would've had if written without backup
blocks. Then, it inserts an XLOG_NOOP record with length set to
indicate the amount of extra space that needs to be chewed up --
but in the compressed version of the WAL file, XLOG_NOOP's "data
area" is not actually stored. The decompressor need only scan
the file looking for XLOG_NOOP and insert the requisite number of
zero bytes (and maybe recompute the XLOG_NOOP's CRC, depending on
whether you want it to be valid for the short-format record in the
compressed file). There will also be some games to be played for
WAL page boundaries, but you had to do that anyway.
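
The size accounting behind this scheme can be sketched as a toy model. Everything below is an assumption made for illustration: the ToyRec layout, the fixed TOY_HDR_SIZE, and the function names are hypothetical and are not the real XLogRecord format. CRCs and WAL page boundaries are ignored entirely. The point is only that the logical (decompressed) size stays the same, so every later record keeps its position, while the stored size shrinks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HDR_SIZE 32u    /* assumed fixed header size, for illustration only */

typedef enum { TOY_DATA, TOY_NOOP } ToyKind;

typedef struct ToyRec
{
    ToyKind  kind;
    bool     removable;     /* models the XLR_BKP_REMOVABLE mark */
    uint32_t main_len;      /* rmgr data bytes; for TOY_NOOP, the filler length */
    uint32_t bkp_len;       /* total bytes of backup blocks, 0 if none */
} ToyRec;

/* Bytes the record occupies in the original or decompressed WAL. */
static size_t
toy_logical_size(const ToyRec *r)
{
    return TOY_HDR_SIZE + r->main_len + r->bkp_len;
}

/* Bytes the record occupies in the compressed file: NOOP filler is not stored. */
static size_t
toy_stored_size(const ToyRec *r)
{
    return (r->kind == TOY_NOOP) ? TOY_HDR_SIZE : toy_logical_size(r);
}

/*
 * Strip removable backup blocks, following each stripped record with a NOOP
 * whose logical size equals the number of bytes removed.  "out" must have
 * room for 2 * n entries.  Returns the number of records written.
 */
static size_t
toy_compress(const ToyRec *in, size_t n, ToyRec *out)
{
    size_t nout = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
    {
        ToyRec rec = in[i];

        /* The NOOP needs its own header, so the removed span must fit one. */
        if (rec.kind == TOY_DATA && rec.removable &&
            rec.bkp_len >= TOY_HDR_SIZE)
        {
            ToyRec noop = { TOY_NOOP, false, rec.bkp_len - TOY_HDR_SIZE, 0 };

            rec.bkp_len = 0;
            out[nout++] = rec;
            out[nout++] = noop;
        }
        else
            out[nout++] = rec;
    }
    return nout;
}

/* Sum a size function over an array of records. */
static size_t
toy_total(const ToyRec *recs, size_t n, size_t (*size_of)(const ToyRec *))
{
    size_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += size_of(&recs[i]);
    return total;
}
```

In this model, decompression is simply writing toy_logical_size bytes for each record, padding each NOOP with zeros. A real tool would additionally have to rewrite headers, recompute CRCs, and handle records that span WAL pages.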

regards, tom lane

#68Koichi Suzuki
koichi.szk@gmail.com
In reply to: Tom Lane (#67)
Re: [PATCHES] Full page writes improvement, code update

I really appreciate the modification.

I also believe XLOG_NOOP is a good way to keep the XLOG format consistent.
I'll continue writing code to produce incremental log records from
the full page writes, as well as to maintain the CRC, XLOG_NOOP, and
other XLOG locations. I also found that you've added information to the
btree split log records, which makes it possible to produce the
corresponding incremental logs from the full page writes.

2007/5/21, Tom Lane <tgl@sss.pgh.pa.us>:

Koichi Suzuki <suzuki.koichi@oss.ntt.co.jp> writes:

As I replied to Tom's "Patch queue triage", here's a simplified patch to
mark WAL records as "compressible", with no increase in WAL itself.
Compression/decompression commands will be posted separately to
pgFoundry for further review.

Applied with some minor modifications. I didn't like the idea of
suppressing the sanity-check on WAL record length; I think that's
fairly important. Instead, I added a provision for an XLOG_NOOP
WAL record type that can be used to fill in the extra space.
The way I envision that working is that the compressor removes
backup blocks and converts each compressible WAL record to have the
same contents and length it would've had if written without backup
blocks. Then, it inserts an XLOG_NOOP record with length set to
indicate the amount of extra space that needs to be chewed up --
but in the compressed version of the WAL file, XLOG_NOOP's "data
area" is not actually stored. The decompressor need only scan
the file looking for XLOG_NOOP and insert the requisite number of
zero bytes (and maybe recompute the XLOG_NOOP's CRC, depending on
whether you want it to be valid for the short-format record in the
compressed file). There will also be some games to be played for
WAL page boundaries, but you had to do that anyway.

regards, tom lane

--
------
Koichi Suzuki

#69Simon Riggs
simon@2ndquadrant.com
In reply to: Koichi Suzuki (#27)
Re: [HACKERS] Full page writes improvement, code update

On Tue, 2007-04-10 at 16:23 +0900, Koichi Suzuki wrote:

Here're two patches for

1) lesslog_core.patch, patch for core, to set a mark to the log record
to be removed in archiving,

2) lesslog_contrib.patch, patch for contrib/lesslog, pg_compresslog and
pg_decompresslog,

respectively, as asked. I hope they work.

Koichi-san,

Earlier, I offered to document the use of pg_compresslog and
pg_decompresslog and would like to do that now.

My understanding was that we would make these programs available on
pgfoundry.org. Unfortunately, I can't find these files there, so perhaps
I misunderstood.

Do we have later versions of these programs that work with the changes
Tom committed on 20 May? Or is the code posted here the latest version?

Many thanks,

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com