Release Note Changes
Few proposals
- Can we say "smoothed" rather than "distributed" checkpoints?
"Smoothed checkpoints greatly reduce checkpoint I/O spikes"
- Heap-Only Tuples (HOT) accelerate space reuse for UPDATEs
change to
"Heap-Only Tuples (HOT) improve performance of frequent UPDATEs"
I also notice that two performance features have disappeared from the
release notes. (Presumably they have been removed from source). Both of
them have changes that can be seen by users, so can't see why we would
want them removed.
- Merge Join performance has been substantially improved by ring buffer
which avoids materializing the previous sort step. (Simon, Greg)
More info, not for release notes:
The materialization of the prior sort step would generally double the
time taken for the sort, so avoiding this effectively gives a 50%
performance gain on sorts that are part of large merge joins.
- WAL file switches don't update controlfile any longer. Recovery now
refers to the last checkpoint time, which may be many minutes earlier
than time previously mentioned. (Simon, Tom)
More info, not for release notes:
WAL file switches were performed holding important LWLocks, so this
improves scalability on high end systems as well as reducing response
time spikes under heavy load on all kinds of hardware.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
On Fri, 2007-11-30 at 06:31 +0000, Simon Riggs wrote:
I also notice that two performance features have disappeared from the
release notes. (Presumably they have been removed from source). Both of
them have changes that can be seen by users, so can't see why we would
want them removed.
Wow, just realised 3 of Heikki's performance patches aren't mentioned
either:
- CheckpointStartLock removal
- I/O reduction during recovery
- Tuning of Visibility code
I'm not sure what the rationale is for not mentioning these things.
They're at least as important, if not more so, than mentioning minor
source code changes.
If people understand there aren't 13 performance improvements there are
at *least* 19+ that is a positive message to help people decide to
upgrade.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
"Simon Riggs" <simon@2ndquadrant.com> writes:
If people understand there aren't 13 performance improvements there are
at *least* 19+ that is a positive message to help people decide to
upgrade.
Frankly I think the release notes are already too long. People who judge a
release by counting the number of items in the release notes are not worth
appeasing. Including every individual lock removed or code path optimized will
only obscure the important points on which people should be judging the
relevance of the release to them. Things like smoothing checkpoint i/o which
could be removing a show-stopper problem for them.
If they're mentioned at all a single release note bullet point saying "Many
optimizations and concurrency improvements in areas such as transaction start
and finish, checkpoint start, record visibility checking, merge join plans,
..." would suffice.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!
On 11/30/07, Gregory Stark <stark@enterprisedb.com> wrote:
"Simon Riggs" <simon@2ndquadrant.com> writes:
If people understand there aren't 13 performance improvements there are
at *least* 19+ that is a positive message to help people decide to
upgrade.
Frankly I think the release notes are already too long. People who judge a
release by counting the number of items in the release notes are not worth
appeasing. Including every individual lock removed or code path optimized will
only obscure the important points on which people should be judging the
relevance of the release to them. Things like smoothing checkpoint i/o which
could be removing a show-stopper problem for them.
If they're mentioned at all a single release note bullet point saying "Many
optimizations and concurrency improvements in areas such as transaction start
and finish, checkpoint start, record visibility checking, merge join plans,
..." would suffice.
I agree that release notes should not be too long, but maybe there should
be (if there isn't one already) something like a "change log" where people
can find out all the changes made since the previous release, if they are
interested?
--
Usama Munir Dar http://linkedin.com/in/usamadar
Consultant Architect
Cell:+92 321 5020666
Skype: usamadar
On Fri, 2007-11-30 at 09:49 +0000, Gregory Stark wrote:
"Simon Riggs" <simon@2ndquadrant.com> writes:
If people understand there aren't 13 performance improvements there are
at *least* 19+ that is a positive message to help people decide to
upgrade.
Frankly I think the release notes are already too long.
So why do we have stuff in there that the users will never see?
We already have a release summary, so why summarise *some* of the detail
as well, but not all of it???
I see no reason to diminish yours, Heikki's or my own contributions, all
of which were in the area of performance, which people do care about.
None of the ones I mentioned were trivial patches, nor were their
effects small.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
Simon Riggs wrote:
- Heap-Only Tuples (HOT) accelerate space reuse for UPDATEs
change to
"Heap-Only Tuples (HOT) improve performance of frequent UPDATEs"
I think we need to qualify this, or it could be quite misleading.
Perhaps add "that don't affect indexed columns" or something like that.
cheers
andrew
On Nov 30, 2007 4:49 AM, Gregory Stark <stark@enterprisedb.com> wrote:
"Simon Riggs" <simon@2ndquadrant.com> writes:
If people understand there aren't 13 performance improvements there are
at *least* 19+ that is a positive message to help people decide to
upgrade.
Frankly I think the release notes are already too long. People who judge a
release by counting the number of items in the release notes are not worth
appeasing. Including every individual lock removed or code path optimized will
only obscure the important points on which people should be judging the
relevance of the release to them. Things like smoothing checkpoint i/o which
could be removing a show-stopper problem for them.
IMO, it's probably good to include things that materially affect how
people operate the database. An example is improvements to statistics
gathering because it eliminates a historical trade-off in configuring
the server. I agree with you regarding basic operations though.
merlin
Simon Riggs wrote:
On Fri, 2007-11-30 at 06:31 +0000, Simon Riggs wrote:
I also notice that two performance features have disappeared from the
release notes. (Presumably they have been removed from source). Both of
them have changes that can be seen by users, so can't see why we would
want them removed.
Wow, just realised 3 of Heikki's performance patches aren't mentioned
either:
- CheckpointStartLock removal
I don't think it's worth mentioning, given that we have the Load
Distributed Checkpoints in there. That alone will tell people that
there's been some major changes to checkpoints.
- I/O reduction during recovery
This might be worth mentioning, since it can be quite a big difference
in the right circumstances, and it helps a bit with the scalability
problem of the recovery. Should mention that it only helps with
full_page_writes=on. One more reason to not gamble with data integrity ;-).
- Tuning of Visibility code
I don't think that was release notes worthy.
The release notes are quite long already...
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
"Usama Dar" <munir.usama@gmail.com> writes:
I agree that release notes should not be too long, but maybe there should
be (if there isn't one already) something like a "change log" where people
can find out all the changes made since the previous release, if they are
interested?
The CVS history (either direct from the CVS server, or in the
pgsql-committers archives) will give you as much detail as you could
possibly want.
regards, tom lane
Greg,
Frankly I think the release notes are already too long. People who judge a
release by counting the number of items in the release notes are not worth
appeasing. Including every individual lock removed or code path optimized
will only obscure the important points on which people should be judging
the relevance of the release to them. Things like smoothing checkpoint i/o
which could be removing a show-stopper problem for them.
I disagree. For people who want a quick summary of the major user-facing
things changed we'll have multiple sources: (a) the announcement, (b) the
press features list, (c) the Feature-Version matrix. The Release notes
should have a *complete* list of changes.
Why? Because we don't use a bug/feature tracker. So a user trying to figure
out "hey, was my issue XXX fixed so that I should upgrade?" has *no other
source* than the Release notes to look at, except CVS history. And if we
start asking sysadmins and application vendors to read the CVS history, we're
gonna simply push them towards other DBMSes which have this information more
clearly.
If we want to shorten the release notes, then we should adopt an issue
tracker.
--
Josh Berkus
PostgreSQL @ Sun
San Francisco
Heikki,
This might be worth mentioning, since it can be quite a big difference
in the right circumstances, and it helps a bit with the scalability
problem of the recovery. Should mention that it only helps with
full_page_writes=on. One more reason to not gamble with data integrity
;-).
Does this mean that recovery from logs with full_page_writes will be faster
than recovery from logs without them?
--
Josh Berkus
PostgreSQL @ Sun
San Francisco
Josh Berkus <josh@agliodbs.com> writes:
I disagree. For people who want a quick summary of the major user-facing
things changed we'll have multiple sources: (a) the announcement, (b) the
press features list, (c) the Feature-Version matrix. The Release notes
should have a *complete* list of changes.
Define "complete".
Why? Because we don't use a bug/feature tracker. So a user trying to
figure out "hey, was my issue XXX fixed so that I should upgrade?" has
*no other source* than the Release notes to look at, except CVS
history. And if we start asking sysadmins and application vendors to
read the CVS history, we're gonna simply push them towards other
DBMSes which have this information more clearly.
So in other words, you don't *really* want "complete".
This discussion is all about finding a suitable balance between length
and detail. Simplistic pronouncements don't help us strike that
balance.
FWIW, I tend to agree with the folks who think Bruce trimmed too much
this time. But the release notes are, and always have been, intended to
boil the CVS history down to something useful by eliminating irrelevant
detail. For the vast majority of people, the details that are being
mentioned here are indeed irrelevant. There will be some for whom they
are not. But depending on the question, almost any detail might not be
irrelevant, and at that point you have to be prepared to go check the
archives.
regards, tom lane
On Nov 30, 2007 11:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Josh Berkus <josh@agliodbs.com> writes:
I disagree. For people who want a quick summary of the major user-facing
things changed we'll have multiple sources: (a) the announcement, (b) the
press features list, (c) the Feature-Version matrix. The Release notes
should have a *complete* list of changes.
Define "complete".
Why? Because we don't use a bug/feature tracker. So a user trying to
figure out "hey, was my issue XXX fixed so that I should upgrade?" has
*no other source* than the Release notes to look at, except CVS
history. And if we start asking sysadmins and application vendors to
read the CVS history, we're gonna simply push them towards other
DBMSes which have this information more clearly.
So in other words, you don't *really* want "complete".
I think he means a list meant for end users which mentions all features and
bug fixes done for that release. Your argument of "go read the CVS logs" is
valid, but there are just too many for someone to go through to get the
complete picture. I mean, people may end up reading 1000+ logs in a worst
case scenario to find out if a bug they are interested in is fixed, when the
person who compiled the release notes didn't think it was important enough
to make it into the notes. Going through a 5K release notes document would take
half that time. Granted, over time their ability to read through logs
quickly will improve, but that's a learning curve they have to be willing to
go through, and not everyone will be interested in doing that.
If I had to find a word to describe what we need, I would say we need
something *compendious*, i.e. something that is at once full in scope and
brief and concise in treatment.
It is, however, work that someone will have to do, but it can be managed
such that it is a by-product of the process, instead of a 'one time at the
end' job.
This discussion is all about finding a suitable balance between length
and detail. Simplistic pronouncements don't help us strike that
balance.
FWIW, I tend to agree with the folks who think Bruce trimmed too much
this time. But the release notes are, and always have been, intended to
boil the CVS history down to something useful by eliminating irrelevant
detail. For the vast majority of people, the details that are being
mentioned here are indeed irrelevant. There will be some for whom they
are not. But depending on the question, almost any detail might not be
irrelevant, and at that point you have to be prepared to go check the
archives.
regards, tom lane
--
Usama Munir Dar http://linkedin.com/in/usamadar
Josh Berkus wrote:
This might be worth mentioning, since it can be quite a big difference
in the right circumstances, and it helps a bit with the scalability
problem of the recovery. Should mention that it only helps with
full_page_writes=on. One more reason to not gamble with data integrity
;-).
Does this mean that recovery from logs with full_page_writes will be faster
than recovery from logs without them?
In general, yes. Depends a lot on how randomly the data in the WAL is
distributed, speed of reading from WAL etc.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Fri, 30 Nov 2007 06:32:10 -0500
Andrew Dunstan <andrew@dunslane.net> wrote:
Simon Riggs wrote:
- Heap-Only Tuples (HOT) accelerate space reuse for UPDATEs
change to
"Heap-Only Tuples (HOT) improve performance of frequent UPDATEs"
I think we need to qualify this, or it could be quite misleading.
Perhaps add "that don't affect indexed columns" or something like
that.
Heap-Only Tuples (HOT) improve performance for heavily updated tables
where the column being updated isn't indexed?
Seems kind of long but isn't that "exactly" what it does?
Sincerely,
Joshua D. Drake
On Fri, 2007-11-30 at 13:07 -0500, Tom Lane wrote:
FWIW, I tend to agree with the folks who think Bruce trimmed too much
this time. But the release notes are, and always have been, intended to
boil the CVS history down to something useful by eliminating irrelevant
detail.
OK, so given everything mentioned on this thread, there are three items
that are user noticeable and so don't fall into the category of
irrelevant detail:
- Merge Join performance has been substantially improved when a low number
of duplicate join keys exists on the outer side of the join (Simon, Greg)
- Large I/O reduction during recovery when full_page_writes = on
(Heikki)
- WAL file switch performance has been improved. Recovery startup now
refers to the last checkpoint time, which may be anything up to the
checkpoint_timeout interval before a database crash. (Simon, Tom)
The last one seems important to me, but I can see that might be TMI for
some people, though I feel we should document it somewhere.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
Simon Riggs wrote:
[ Sorry for my delay in replying to this.]
Few proposals
- Can we say "smoothed" rather than "distributed" checkpoints?
"Smoothed checkpoints greatly reduce checkpoint I/O spikes"
Agreed. Changed.
- Heap-Only Tuples (HOT) accelerate space reuse for UPDATEs
change to
"Heap-Only Tuples (HOT) improve performance of frequent UPDATEs"
I used the original text because it tries to explain _how_ HOT improves
performance. The item that has the descriptive text explains how the
space reuse works. A generic "improve performance" doesn't seem like an
improvement.
I also notice that two performance features have disappeared from the
release notes. (Presumably they have been removed from source). Both of
them have changes that can be seen by users, so can't see why we would
want them removed.
- Merge Join performance has been substantially improved by ring buffer
which avoids materializing the previous sort step. (Simon, Greg)
More info, not for release notes:
The materialization of the prior sort step would generally double the
time taken for the sort, so avoiding this effectively gives a 50%
performance gain on sorts that are part of large merge joins.
- WAL file switches don't update controlfile any longer. Recovery now
refers to the last checkpoint time, which may be many minutes earlier
than time previously mentioned. (Simon, Tom)
More info, not for release notes:
WAL file switches were performed holding important LWLocks, so this
improves scalability on high end systems as well as reducing response
time spikes under heavy load on all kinds of hardware.
Let me give you the criteria I use for the release notes. The release
notes try to document all changes visible to the average user in a way
that is understandable to the average user.
The above items are probably neither visible (except as faster performance)
nor understandable. Now of course we can change those criteria, but that
is going to need a larger discussion.
One idea that would allow these to be included is a "geek" section of
the release notes that has items that would not be understandable by the
average user, e.g. optimizer improvements, locking improvements. It
would be kind of like "Postgres is faster in this release in some
obscure ways, and this is why". Of course the section would have to be
labeled clearly and it does open us up to the release notes being less
user-friendly.
Such a section seems to be more about satisfying curiosity than supplying
useful information, though.
I will address the issue of giving people credit for work in my next
email.
The good news is that we can keep adjusting the release notes until 8.3
final.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Simon Riggs wrote:
On Fri, 2007-11-30 at 06:31 +0000, Simon Riggs wrote:
I also notice that two performance features have disappeared from the
release notes. (Presumably they have been removed from source). Both of
them have changes that can be seen by users, so can't see why we would
want them removed.
Wow, just realised 3 of Heikki's performance patches aren't mentioned
either:
- CheckpointStartLock removal
- I/O reduction during recovery
- Tuning of Visibility code
I'm not sure what the rationale is for not mentioning these things.
They're at least as important, if not more so, than mentioning minor
source code changes.
The source code changes are more _visible_, I think, meaning they often
require programmers to adjust their code.
If people understand there aren't 13 performance improvements there are
at *least* 19+ that is a positive message to help people decide to
upgrade.
Frankly I think most people expect an X% improvement in every Postgres
release. I don't see how mentioning 19 vs. 13 items is going to change
the general understanding that you should upgrade to get better
performance.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com
Usama Dar wrote:
I agree that release notes should not be too long, but maybe there should
be (if there isn't one already) something like a "change log" where people
can find out all the changes made since the previous release, if they are
interested?
Right now only the CVS logs provide more detailed information. At some
point perhaps we should have something that summarizes the CVS logs.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com
Bruce Momjian wrote:
- Heap-Only Tuples (HOT) accelerate space reuse for UPDATEs
change to
"Heap-Only Tuples (HOT) improve performance of frequent UPDATEs"
I used the original text because it tries to explain _how_ HOT improves
performance. The item that has the descriptive text explains how the
space reuse works. A generic "improve performance" doesn't seem like an
improvement.
I still think this needs to be qualified either way. As it stands it's
quite misleading. Many update scenarios will not benefit one whit from
HOT updates.
cheers
andrew
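The qualification argued for above ("that don't affect indexed columns") can be illustrated with a small sketch. This is only a simplified model in Python, not PostgreSQL code: the function name, its parameters, and the example column names are all hypothetical, and the free-space condition is an assumption added here that the thread itself does not spell out.

```python
# Simplified, hypothetical model of when an UPDATE can benefit from HOT.
# Nothing here is PostgreSQL source; the names are illustrative only.

def is_hot_candidate(changed_columns, indexed_columns, page_has_free_space=True):
    """Return True if an UPDATE could be done as a heap-only update.

    Assumed conditions for this sketch:
    - none of the changed columns is covered by an index, and
    - the new row version fits on the same heap page as the old one.
    """
    touches_index = bool(set(changed_columns) & set(indexed_columns))
    return (not touches_index) and page_has_free_space


# Hypothetical table with indexes on "id" and "email".
indexed = {"id", "email"}

print(is_hot_candidate({"last_login"}, indexed))  # non-indexed column only
print(is_hot_candidate({"email"}, indexed))       # indexed column changed
print(is_hot_candidate({"last_login"}, indexed, page_has_free_space=False))
```

Under these assumptions only the first call qualifies, which is the point of the objection: an unqualified "improve performance of frequent UPDATEs" overstates what many workloads will see.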