postpone next week's release

Started by Robert Haas, over 10 years ago, 157 messages

robertmhaas@gmail.com

over 10 years ago

Hi,

I think we should postpone next week's release. I have been hard at
work on the multixact-related bugs that were reported in 9.4.2 and
9.3.7, and the subsequent bugs found by code-reading, but getting them
all fixed by Monday doesn't seem realistic. Such fixes should have
careful review, and not be dashed into the tree under time pressure.

We could do the release anyway to relieve the pain caused by the
fsync-pgdata hard-failure problem, but it seems to me that if we do
that, we're just going to end up having to do yet another release
almost right away. I think it would be better to wait and do one
release that fixes both sets of issues.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Bruce Momjian

bruce@momjian.us

over 10 years ago

In reply to: Robert Haas (#1)

Re: postpone next week's release

On Fri, May 29, 2015 at 02:02:43PM -0400, Robert Haas wrote:

Hi,

I think we should postpone next week's release. I have been hard at
work on the multixact-related bugs that were reported in 9.4.2 and
9.3.7, and the subsequent bugs found by code-reading, but getting them
all fixed by Monday doesn't seem realistic. Such fixes should have
careful review, and not be dashed into the tree under time pressure.

We could do the release anyway to relieve the pain caused by the
fsync-pgdata hard-failure problem, but it seems to me that if we do
that, we're just going to end up having to do yet another release
almost right away. I think it would be better to wait and do one
release that fixes both sets of issues.

It does seem wise to make sure we have all these items fixed. We have
PR'ed the recovery failure issue so I think we are good at this point.
I see having to put out another multi-xact-only fix release the week
after as being a bigger negative.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


Stephen Frost

sfrost@snowman.net

over 10 years ago

In reply to: Robert Haas (#1)

Re: postpone next week's release

* Robert Haas (robertmhaas@gmail.com) wrote:

I think we should postpone next week's release. I have been hard at
work on the multixact-related bugs that were reported in 9.4.2 and
9.3.7, and the subsequent bugs found by code-reading, but getting them
all fixed by Monday doesn't seem realistic. Such fixes should have
careful review, and not be dashed into the tree under time pressure.

We could do the release anyway to relieve the pain caused by the
fsync-pgdata hard-failure problem, but it seems to me that if we do
that, we're just going to end up having to do yet another release
almost right away. I think it would be better to wait and do one
release that fixes both sets of issues.

Agreed.

I just caution that we appreciate PGCon coming up and that we do our
best to avoid running into a case where we have to push it further due
to everyone being at the conference.

Thanks!

Stephen

Bruce Momjian

bruce@momjian.us

over 10 years ago

In reply to: Stephen Frost (#3)

Re: [CORE] postpone next week's release

On Fri, May 29, 2015 at 02:54:31PM -0400, Stephen Frost wrote:

* Robert Haas (robertmhaas@gmail.com) wrote:

I think we should postpone next week's release. I have been hard at
work on the multixact-related bugs that were reported in 9.4.2 and
9.3.7, and the subsequent bugs found by code-reading, but getting them
all fixed by Monday doesn't seem realistic. Such fixes should have
careful review, and not be dashed into the tree under time pressure.

We could do the release anyway to relieve the pain caused by the
fsync-pgdata hard-failure problem, but it seems to me that if we do
that, we're just going to end up having to do yet another release
almost right away. I think it would be better to wait and do one
release that fixes both sets of issues.

Agreed.

I just caution that we appreciate PGCon coming up and that we do our
best to avoid running into a case where we have to push it further due
to everyone being at the conference.

This brings up the issue of when we want to do 9.5 beta. Ideas?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


Magnus Hagander

magnus@hagander.net

over 10 years ago

In reply to: Robert Haas (#1)

Re: [CORE] postpone next week's release

On Fri, May 29, 2015 at 8:02 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Hi,

I think we should postpone next week's release. I have been hard at
work on the multixact-related bugs that were reported in 9.4.2 and
9.3.7, and the subsequent bugs found by code-reading, but getting them
all fixed by Monday doesn't seem realistic. Such fixes should have
careful review, and not be dashed into the tree under time pressure.

We could do the release anyway to relieve the pain caused by the
fsync-pgdata hard-failure problem, but it seems to me that if we do
that, we're just going to end up having to do yet another release
almost right away. I think it would be better to wait and do one
release that fixes both sets of issues.

Thoughts?

I'm a bit split on this.

We *definitely* don't want to release the multixact fix without it being
carefully reviewed, that's the part I'm not split about :) And I fully
appreciate we can't have that done by Monday.

However, the file-permission thing seems to hit quite a few people (have we
ever had this many bug reports after a minor release?), which means we'd
really want to get that out quickly.

Do you have any feeling of how likely people are to actually hit the
multixact one? I've followed some of that impressive debugging you guys
did, and I know it's a pretty critical bug if you hit it, but how
wide-spread will it be?

I guess one option we could do is encourage packagers to push updated
packages (-2 versions) basically. But if we do that, perhaps we might as
well release anyway?

AIUI, the permission thing won't actually be very likely to affect Windows
users. And Windows packages are the ones that take by far the most work to
make. Perhaps we should consider skipping making packages of that version
on Windows, and then plan to push yet another minor one or two weeks later,
that goes out on all platforms?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Magnus Hagander

magnus@hagander.net

over 10 years ago

In reply to: Stephen Frost (#3)

Re: [CORE] postpone next week's release

On Fri, May 29, 2015 at 8:54 PM, Stephen Frost <sfrost@snowman.net> wrote:

* Robert Haas (robertmhaas@gmail.com) wrote:

I think we should postpone next week's release. I have been hard at
work on the multixact-related bugs that were reported in 9.4.2 and
9.3.7, and the subsequent bugs found by code-reading, but getting them
all fixed by Monday doesn't seem realistic. Such fixes should have
careful review, and not be dashed into the tree under time pressure.

We could do the release anyway to relieve the pain caused by the
fsync-pgdata hard-failure problem, but it seems to me that if we do
that, we're just going to end up having to do yet another release
almost right away. I think it would be better to wait and do one
release that fixes both sets of issues.

Agreed.

I just caution that we appreciate PGCon coming up and that we do our
best to avoid running into a case where we have to push it further due
to everyone being at the conference.

If we plan it, we certainly *can* make a release during pgcon. If that's
what the reasonable timing comes down to, I think getting these fixes out
definitely has to be considered more important than the conference, so a
few of us will just have to take a break...

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Magnus Hagander (#5)

Re: [CORE] postpone next week's release

On Fri, May 29, 2015 at 3:09 PM, Magnus Hagander <magnus@hagander.net> wrote:

Do you have any feeling of how likely people are to actually hit the
multixact one? I've followed some of that impressive debugging you guys did,
and I know it's a pretty critical bug if you hit it, but how wide-spread
will it be?

That precise problem has been reported a few times, but it may not be
widespread. I don't know. My bigger concern is that, at present,
taking a base backup is broken. I haven't figured out the exact
reproduction scenario, but I think it's something like this:

- begin base backup
- checkpoint happens, truncating pg_multixact
- at this point pg_multixact gets copied
- end base backup

I think what will happen is that, when replaying the checkpoint, recovery
will try to reference pg_multixact files that don't exist any more
and die with a fatal error.
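
The suspected sequence can be sketched as a toy model. This is only an illustration of the ordering problem described above, not PostgreSQL code: the integer segment names, the event tuples, and the functions take_base_backup and replay are all hypothetical, and the real reproduction scenario may well differ.

```python
# Toy model only: segment numbers and the "oldest segment" pointer are
# illustrative stand-ins, not PostgreSQL's real on-disk structures.

def take_base_backup(live_segments, events):
    """Simulate a file-level copy that races with a truncating checkpoint.

    `events` is the ordered list of things that happen between the start
    and the end of the backup. Returns (copied_segments, oldest_needed).
    """
    segments = set(live_segments)
    # The backup's starting point still refers to the oldest segment on disk.
    oldest_needed = min(segments)
    copied = None
    for event in events:
        if event[0] == "checkpoint_truncate":
            # A checkpoint removes every segment older than the new cutoff.
            cutoff = event[1]
            segments = {s for s in segments if s >= cutoff}
        elif event[0] == "copy_pg_multixact":
            # The copy sees whatever is on disk at this moment.
            copied = set(segments)
    return copied, oldest_needed


def replay(copied_segments, oldest_needed):
    """Replay fails if the segment the starting point refers to is gone."""
    if oldest_needed not in copied_segments:
        raise RuntimeError(f"missing multixact segment {oldest_needed}")
    return "recovered"


# Truncation happens before the copy: the backup lacks a needed segment.
bad_copy, bad_oldest = take_base_backup(
    {1, 2, 3, 4}, [("checkpoint_truncate", 3), ("copy_pg_multixact",)]
)

# Copy happens before truncation: the backup is self-consistent.
good_copy, good_oldest = take_base_backup(
    {1, 2, 3, 4}, [("copy_pg_multixact",), ("checkpoint_truncate", 3)]
)
```

In this model only the ordering of the two events matters: replay(good_copy, good_oldest) succeeds, while replay(bad_copy, bad_oldest) raises, which mirrors the fatal error guessed at above.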

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Stephen Frost

sfrost@snowman.net

over 10 years ago

In reply to: Magnus Hagander (#6)

Re: [CORE] postpone next week's release

* Magnus Hagander (magnus@hagander.net) wrote:

On Fri, May 29, 2015 at 8:54 PM, Stephen Frost <sfrost@snowman.net> wrote:

* Robert Haas (robertmhaas@gmail.com) wrote:

I think we should postpone next week's release. I have been hard at
work on the multixact-related bugs that were reported in 9.4.2 and
9.3.7, and the subsequent bugs found by code-reading, but getting them
all fixed by Monday doesn't seem realistic. Such fixes should have
careful review, and not be dashed into the tree under time pressure.

We could do the release anyway to relieve the pain caused by the
fsync-pgdata hard-failure problem, but it seems to me that if we do
that, we're just going to end up having to do yet another release
almost right away. I think it would be better to wait and do one
release that fixes both sets of issues.

Agreed.

I just caution that we appreciate PGCon coming up and that we do our
best to avoid running into a case where we have to push it further due
to everyone being at the conference.

If we plan it, we certainly *can* make a release during pgcon. If that's
what the reasonable timing comes down to, I think getting these fixes out
definitely has to be considered more important than the conference, so a
few of us will just have to take a break...

I don't disagree with you about any of that, just wanted to make mention
of the timing.

Thanks!

Stephen

Joshua D. Drake

jd@commandprompt.com

over 10 years ago

In reply to: Robert Haas (#7)

Re: [CORE] postpone next week's release

On 05/29/2015 12:18 PM, Robert Haas wrote:

On Fri, May 29, 2015 at 3:09 PM, Magnus Hagander <magnus@hagander.net> wrote:

Do you have any feeling of how likely people are to actually hit the
multixact one? I've followed some of that impressive debugging you guys did,
and I know it's a pretty critical bug if you hit it, but how wide-spread
will it be?

That precise problem has been reported a few times, but it may not be
widespread. I don't know. My bigger concern is that, at present,
taking a base backup is broken.

This I think is the bigger issue. They both are horrible but basebackup
being broken is rather... egregious.

--
Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.


#10

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Magnus Hagander (#6)

Re: [CORE] postpone next week's release

Magnus Hagander <magnus@hagander.net> writes:

On Fri, May 29, 2015 at 8:54 PM, Stephen Frost <sfrost@snowman.net> wrote:

I just caution that we appreciate PGCon coming up and that we do our
best to avoid running into a case where we have to push it further due
to everyone being at the conference.

If we plan it, we certainly *can* make a release during pgcon. If that's
what the reasonable timing comes down to, I think getting these fixes out
definitely has to be considered more important than the conference, so a
few of us will just have to take a break...

I think there's no way that we wait more than one additional week to push
the fsync fix. So the problem is not with scheduling the update releases,
it's with whether we can also fit in a 9.5 beta release before PGCon.

(I can't see doing a beta *during* PGCon week. I for one am going to be
on an airplane at the time I'd normally have to be Doing Release Stuff.)

I know Josh doesn't like to do beta1 releases concurrently with back
branches because it confuses the PR messaging. But we could make an
exception perhaps; or do all those releases the same week but announce
the beta the day after the bugfix releases.

Or we just let the beta slide till after PGCon, but then I think we're
missing some excitement factor.

regards, tom lane


#11

Magnus Hagander

magnus@hagander.net

over 10 years ago

In reply to: Tom Lane (#10)

Re: [CORE] postpone next week's release

On Fri, May 29, 2015 at 9:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Magnus Hagander <magnus@hagander.net> writes:

On Fri, May 29, 2015 at 8:54 PM, Stephen Frost <sfrost@snowman.net> wrote:

I just caution that we appreciate PGCon coming up and that we do our
best to avoid running into a case where we have to push it further due
to everyone being at the conference.

If we plan it, we certainly *can* make a release during pgcon. If that's
what the reasonable timing comes down to, I think getting these fixes out
definitely has to be considered more important than the conference, so a
few of us will just have to take a break...

I think there's no way that we wait more than one additional week to push
the fsync fix. So the problem is not with scheduling the update releases,
it's with whether we can also fit in a 9.5 beta release before PGCon.

I think 9.5 beta has to stand back. The question is what we do with the
potentially two minor releases. Then we can slot in the beta whenever.

If we do the minor as currently planned, can we do another one the week
after to deal with the multixact issues? (scheduling wise we're going to
have to do one the week after *regardless*, the question is if we can make
two different ones, or if we need to fold them into one)

(I can't see doing a beta *during* PGCon week. I for one am going to be
on an airplane at the time I'd normally have to be Doing Release Stuff.)

Agreed. We can push a *minor* during pgcon, but not beta.

I know Josh doesn't like to do beta1 releases concurrently with back
branches because it confuses the PR messaging. But we could make an
exception perhaps; or do all those releases the same week but announce
the beta the day after the bugfix releases.

I can't comment on the PR parts, I'll leave that to Josh.

Or we just let the beta slide till after PGCon, but then I think we're
missing some excitement factor.

Well, most of the people going to PGCon know it already. And most of the
excitement affects people who are not at PGCon (simply because most
of our users are not at PGCon). If doing it the week after PGCon is what
ends up making sense once we've figured out what to do with the minors, then
so be it, IMNSHO.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#12

Stephen Frost

sfrost@snowman.net

over 10 years ago

In reply to: Tom Lane (#10)

Re: [CORE] postpone next week's release

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

(I can't see doing a beta *during* PGCon week. I for one am going to be
on an airplane at the time I'd normally have to be Doing Release Stuff.)

[...]

Or we just let the beta slide till after PGCon, but then I think we're
missing some excitement factor.

Personally, I'd be all for a "watch Tom do the 9.5 beta release!"
Unconference slot...

(mostly kidding, but I'm 100% sure it'd draw a huge crowd..)

Thanks!

Stephen

#13

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Magnus Hagander (#11)

Re: [CORE] postpone next week's release

Magnus Hagander <magnus@hagander.net> writes:

On Fri, May 29, 2015 at 9:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think there's no way that we wait more than one additional week to push
the fsync fix. So the problem is not with scheduling the update releases,
it's with whether we can also fit in a 9.5 beta release before PGCon.

I think 9.5 beta has to stand back. The question is what we do with the
potentially two minor releases. Then we can slot in the beta whenever.

If we do the minor as currently planned, can we do another one the week
after to deal with the multixact issues? (scheduling wise we're going to
have to do one the week after *regardless*, the question is if we can make
two different ones, or if we need to fold them into one)

I suppose we could, but it doubles the amount of release gruntwork
involved, and it doesn't exactly make us look good to our users either.

I believe Christoph indicated that he was going to cherry-pick the fsync
patch and push out an intermediate Debian package with that fix, so at
least for that community there is not an urgent reason to get out a set
of releases with only the fsync fixes and not the multixact fixes. I'm
not clear though on how many of the other reports we heard came from
Debian users. (Some of them did, but maybe not all.)

regards, tom lane


#14

Stephen Frost

sfrost@snowman.net

over 10 years ago

In reply to: Tom Lane (#13)

Re: [CORE] postpone next week's release

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Magnus Hagander <magnus@hagander.net> writes:

On Fri, May 29, 2015 at 9:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think there's no way that we wait more than one additional week to push
the fsync fix. So the problem is not with scheduling the update releases,
it's with whether we can also fit in a 9.5 beta release before PGCon.

I think 9.5 beta has to stand back. The question is what we do with the
potentially two minor releases. Then we can slot in the beta whenever.

If we do the minor as currently planned, can we do another one the week
after to deal with the multixact issues? (scheduling wise we're going to
have to do one the week after *regardless*, the question is if we can make
two different ones, or if we need to fold them into one)

I suppose we could, but it doubles the amount of release gruntwork
involved, and it doesn't exactly make us look good to our users either.

Agreed. Makes it look like we can't manage to figure out our bugs and
put fixes for them together in sensible releases..

Thanks!

Stephen

#15

Magnus Hagander

magnus@hagander.net

over 10 years ago

In reply to: Stephen Frost (#14)

Re: [CORE] postpone next week's release

On Fri, May 29, 2015 at 9:46 PM, Stephen Frost <sfrost@snowman.net> wrote:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Magnus Hagander <magnus@hagander.net> writes:

On Fri, May 29, 2015 at 9:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think there's no way that we wait more than one additional week to push
the fsync fix. So the problem is not with scheduling the update releases,
it's with whether we can also fit in a 9.5 beta release before PGCon.

I think 9.5 beta has to stand back. The question is what we do with the
potentially two minor releases. Then we can slot in the beta whenever.

If we do the minor as currently planned, can we do another one the week
after to deal with the multixact issues? (scheduling wise we're going to
have to do one the week after *regardless*, the question is if we can make
two different ones, or if we need to fold them into one)

I suppose we could, but it doubles the amount of release gruntwork
involved, and it doesn't exactly make us look good to our users either.

Agreed. Makes it look like we can't manage to figure out our bugs and
put fixes for them together in sensible releases..

The flipside of that is that we have a fix for a bug that's preventing people's
databases from starting, and we're intentionally delaying the shipment
of it. Though I guess a mitigating fact there is that it is very easy to
manually recover from that. But it's painful if your db server restarts
when you're not around...

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#16

Stephen Frost

sfrost@snowman.net

over 10 years ago

In reply to: Magnus Hagander (#15)

Re: [CORE] postpone next week's release

* Magnus Hagander (magnus@hagander.net) wrote:

On Fri, May 29, 2015 at 9:46 PM, Stephen Frost <sfrost@snowman.net> wrote:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Magnus Hagander <magnus@hagander.net> writes:

On Fri, May 29, 2015 at 9:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think there's no way that we wait more than one additional week to push
the fsync fix. So the problem is not with scheduling the update releases,
it's with whether we can also fit in a 9.5 beta release before PGCon.

I think 9.5 beta has to stand back. The question is what we do with the
potentially two minor releases. Then we can slot in the beta whenever.

If we do the minor as currently planned, can we do another one the week
after to deal with the multixact issues? (scheduling wise we're going to
have to do one the week after *regardless*, the question is if we can make
two different ones, or if we need to fold them into one)

I suppose we could, but it doubles the amount of release gruntwork
involved, and it doesn't exactly make us look good to our users either.

Agreed. Makes it look like we can't manage to figure out our bugs and
put fixes for them together in sensible releases..

The flipside of that is that we have a fix for a bug that's preventing people's
databases from starting, and we're intentionally delaying the shipment
of it. Though I guess a mitigating fact there is that it is very easy to
manually recover from that. But it's painful if your db server restarts
when you're not around...

And we have *another* fix for a *data corruption* bug which is coming in
the following *week*.

Yes, I think delaying a week to get both in is better than putting out a
fix for one bug when we *know* there's a data corruption bug sitting in
that code, and we're putting out a fix for it the following week.

If we were talking about a month-long delay, that'd be one thing, but
that isn't the impression I've got about what we're talking about.

Thanks!

Stephen

#17

Bruce Momjian

bruce@momjian.us

over 10 years ago

In reply to: Tom Lane (#10)

Re: [CORE] postpone next week's release

On Fri, May 29, 2015 at 03:32:57PM -0400, Tom Lane wrote:

I know Josh doesn't like to do beta1 releases concurrently with back
branches because it confuses the PR messaging. But we could make an
exception perhaps; or do all those releases the same week but announce
the beta the day after the bugfix releases.

Or we just let the beta slide till after PGCon, but then I think we're
missing some excitement factor.

I am unclear if we are anywhere near ready for beta1 even in June. Are
we?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


#18

Stephen Frost

sfrost@snowman.net

over 10 years ago

In reply to: Bruce Momjian (#17)

Re: [CORE] postpone next week's release

* Bruce Momjian (bruce@momjian.us) wrote:

On Fri, May 29, 2015 at 03:32:57PM -0400, Tom Lane wrote:

I know Josh doesn't like to do beta1 releases concurrently with back
branches because it confuses the PR messaging. But we could make an
exception perhaps; or do all those releases the same week but announce
the beta the day after the bugfix releases.

Or we just let the beta slide till after PGCon, but then I think we're
missing some excitement factor.

I am unclear if we are anywhere near ready for beta1 even in June. Are
we?

I'm all about having that discussion... but can we do it on another
thread or at least wait til we've decided about the back-branch
releases? They are clearly the more important issue to consider.

Thanks!

Stephen

#19

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Stephen Frost (#18)

Re: [CORE] postpone next week's release

Stephen Frost <sfrost@snowman.net> writes:

* Bruce Momjian (bruce@momjian.us) wrote:

I am unclear if we are anywhere near ready for beta1 even in June. Are
we?

I'm all about having that discussion... but can we do it on another
thread or at least wait til we've decided about the back-branch
releases? They are clearly the more important issue to consider.

It's the same discussion though, ie what releases are we expecting to
get out in the next couple of weeks.

It's possible that we ought to give up on a pre-conference beta.
Certainly a whole lot of time that I'd hoped would go into reviewing
9.5 feature commits has instead gone into back-branch bug chasing this
week.

regards, tom lane


#20

Stephen Frost

sfrost@snowman.net

over 10 years ago

In reply to: Tom Lane (#19)

Re: [CORE] postpone next week's release

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

It's possible that we ought to give up on a pre-conference beta.
Certainly a whole lot of time that I'd hoped would go into reviewing
9.5 feature commits has instead gone into back-branch bug chasing this
week.

I guess that's what I'm getting at. We need to take care of the
back-branches and that means pushing beta back. I fully expect a good
discussion on when to release beta when we get closer on that, but we're
not going to be close while we have outstanding big back-branch bugs.

Thanks!

Stephen

#21

Bruce Momjian

bruce@momjian.us

over 10 years ago

In reply to: Tom Lane (#19)

Re: [CORE] postpone next week's release

On Fri, May 29, 2015 at 04:01:00PM -0400, Tom Lane wrote:

Stephen Frost <sfrost@snowman.net> writes:

* Bruce Momjian (bruce@momjian.us) wrote:

I am unclear if we are anywhere near ready for beta1 even in June. Are
we?

I'm all about having that discussion... but can we do it on another
thread or at least wait til we've decided about the back-branch
releases? They are clearly the more important issue to consider.

It's the same discussion though, ie what releases are we expecting to
get out in the next couple of weeks.

Agreed. If we want to put out beta1 before PGCon, I need to start on
the release notes on Monday.

It's possible that we ought to give up on a pre-conference beta.
Certainly a whole lot of time that I'd hoped would go into reviewing
9.5 feature commits has instead gone into back-branch bug chasing this
week.

Based on what has transpired in the past two weeks, I am thinking we
need to move _slower_, not faster. I am concerned we have focused so
much on new features that we have taken our eye off of reliability.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


#22

Joshua D. Drake

jd@commandprompt.com

over 10 years ago

In reply to: Stephen Frost (#20)

Re: [CORE] postpone next week's release

On 05/29/2015 01:03 PM, Stephen Frost wrote:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

It's possible that we ought to give up on a pre-conference beta.
Certainly a whole lot of time that I'd hoped would go into reviewing
9.5 feature commits has instead gone into back-branch bug chasing this
week.

I guess that's what I'm getting at. We need to take care of the
back-branches and that means pushing beta back.

--
The most kicking donkey PostgreSQL Infrastructure company in existence.
The oldest, the most experienced, the consulting company to the stars.
Command Prompt, Inc. http://www.commandprompt.com/ +1 -503-667-4564 -
24x7 - 365 - Proactive and Managed Professional Services!


#23

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Tom Lane (#19)

Re: [CORE] postpone next week's release

On Fri, May 29, 2015 at 4:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It's possible that we ought to give up on a pre-conference beta.
Certainly a whole lot of time that I'd hoped would go into reviewing
9.5 feature commits has instead gone into back-branch bug chasing this
week.

I'm personally kind of astonished that we're even thinking about beta
so soon. I mean, we at least need to go through the stuff listed
here, I think:

https://wiki.postgresql.org/wiki/PostgreSQL_9.5_Open_Items

The bigger issue is: what's NOT on that list that should be? I think
we need to devote some cycles to figuring that out, and I sure haven't
had any this week.

In any case, I think the negative PR that we're going to get from not
getting this multixact stuff taken care of is going to far outweigh
any positive PR from getting 9.5beta1 out a little sooner, especially
if 9.5beta1 is bug-ridden because we gave it no time to settle.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#24

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Robert Haas (#23)

Re: [CORE] postpone next week's release

Robert Haas <robertmhaas@gmail.com> writes:

I'm personally kind of astonished that we're even thinking about beta
so soon. I mean, we at least need to go through the stuff listed
here, I think:
https://wiki.postgresql.org/wiki/PostgreSQL_9.5_Open_Items

Well, maybe we ought to call it an alpha not a beta, but I think we ought
to put out some kind of release that we can encourage people to test.
What you are suggesting is that we serialize resolution of the known
issues with discovery of new issues, and that's not an efficient use of
time. Especially seeing that we're approaching the summer season where
we won't get much input at all.

regards, tom lane


#25

Andres Freund

andres@anarazel.de

over 10 years ago

In reply to: Tom Lane (#24)

Re: [CORE] postpone next week's release

On 2015-05-29 16:37:00 -0400, Tom Lane wrote:

Well, maybe we ought to call it an alpha not a beta, but I think we ought
to put out some kind of release that we can encourage people to test.

I also do think it's important that we put out a beta (or alpha)
relatively soon. Both because we actually need input to find out what
works and what doesn't and also because it pushes us to tie up loose
ends.

A beta with open items isn't that bad a thing? There's many bigger
projects doing 4-8 beta releases before a major one; and most of them
have open items at the individual beta's release times.

I think we should define/document it so that there's no hard goal of
being compatible for beta releases and that the compatibility goal
starts with the first release candidate, and not the betas.


#26

Bruce Momjian

bruce@momjian.us

over 10 years ago

In reply to: Andres Freund (#25)

Re: [CORE] postpone next week's release

On Fri, May 29, 2015 at 11:04:59PM +0200, Andres Freund wrote:

On 2015-05-29 16:37:00 -0400, Tom Lane wrote:

Well, maybe we ought to call it an alpha not a beta, but I think we ought
to put out some kind of release that we can encourage people to test.

I also do think it's important that we put out a beta (or alpha)
relatively soon. Both because we actually need input to find out what
works and what doesn't and also because it pushes us to tie up loose
ends.

A beta with open items isn't that bad a thing? There's many bigger
projects doing 4-8 beta releases before a major one; and most of them
have open items at the individual beta's release times.

I think we should define/document it so that there's no hard goal of
being compatible for beta releases and that the compatibility goal
starts with the first release candidate, and not the betas.

Do we need release notes for an alpha? Once I do the release notes, it
is possible to miss subtle changes in the code that aren't mentioned in
commit messages.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


#27

Andres Freund

andres@anarazel.de

over 10 years ago

In reply to: Bruce Momjian (#26)

Re: [CORE] postpone next week's release

On May 29, 2015 2:12:24 PM PDT, Bruce Momjian <bruce@momjian.us> wrote:

On Fri, May 29, 2015 at 11:04:59PM +0200, Andres Freund wrote:

On 2015-05-29 16:37:00 -0400, Tom Lane wrote:

Well, maybe we ought to call it an alpha not a beta, but I think we ought
to put out some kind of release that we can encourage people to test.

I also do think it's important that we put out a beta (or alpha)
relatively soon. Both because we actually need input to find out what
works and what doesn't and also because it pushes us to tie up loose
ends.

A beta with open items isn't that bad a thing? There's many bigger
projects doing 4-8 beta releases before a major one; and most of them
have open items at the individual beta's release times.

I think we should define/document it so that there's no hard goal of
being compatible for beta releases and that the compatibility goal
starts with the first release candidate, and not the betas.

Do we need release notes for an alpha? Once I do the release notes, it
is possible to miss subtle changes in the code that aren't mentioned in
commit messages.

Yes I think so. Otherwise it's pretty useless for people not following closely. I see little point in explicitly delaying release note work any further.

Andres

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.


#28

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Bruce Momjian (#26)

Re: [CORE] postpone next week's release

Bruce Momjian <bruce@momjian.us> writes:

Do we need release notes for an alpha? Once I do the release notes, it
is possible to miss subtle changes in the code that aren't mentioned in
commit messages.

If the commit message isn't clear about something, you'd likely miss the
issue anyway, no? Anyway, once the release notes are in the tree, we
could expect that anyone committing a user-visible semantics change should
update the release notes themselves.

regards, tom lane


#29

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Tom Lane (#24)

Re: [CORE] postpone next week's release

On Fri, May 29, 2015 at 4:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I'm personally kind of astonished that we're even thinking about beta
so soon. I mean, we at least need to go through the stuff listed
here, I think:
https://wiki.postgresql.org/wiki/PostgreSQL_9.5_Open_Items

Well, maybe we ought to call it an alpha not a beta, but I think we ought
to put out some kind of release that we can encourage people to test.
What you are suggesting is that we serialize resolution of the known
issues with discovery of new issues, and that's not an efficient use of
time. Especially seeing that we're approaching the summer season where
we won't get much input at all.

Well, I think we ought to take at least a few weeks to try to do a bit
of code review and clean up what we can from the open items list.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#30

Bruce Momjian

bruce@momjian.us

over 10 years ago

In reply to: Tom Lane (#28)

Re: [CORE] postpone next week's release

On Fri, May 29, 2015 at 05:37:13PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Do we need release notes for an alpha? Once I do the release notes, it
is possible to miss subtle changes in the code that aren't mentioned in
commit messages.

If the commit message isn't clear about something, you'd likely miss the
issue anyway, no? Anyway, once the release notes are in the tree, we

I often do research in the git tree to get details on the feature beyond
just looking at the commit or the patch.

could expect that anyone committing a user-visible semantics change should
update the release notes themselves.

Yes, that would be nice.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


#31

Andres Freund

andres@anarazel.de

over 10 years ago

In reply to: Robert Haas (#29)

Re: [CORE] postpone next week's release

On 2015-05-29 18:02:36 -0400, Robert Haas wrote:

Well, I think we ought to take at least a few weeks to try to do a bit
of code review and clean up what we can from the open items list.

Why? A large portion of the input required to go from beta towards a
release is from actual users. To see when things break, what confuses
them and such.

I don't see why that requires that there are no minor entries in the
open items list - and that's what currently is on it. Neither does it
seem to be a problem to do code review concurrently to user beta
testing. We obviously can't start a beta if things crash left and
right, but I don't think that's the situation right now?


#32

Stephen Frost

sfrost@snowman.net

over 10 years ago

In reply to: Andres Freund (#31)

Re: [CORE] postpone next week's release

* Andres Freund (andres@anarazel.de) wrote:

On 2015-05-29 18:02:36 -0400, Robert Haas wrote:

Well, I think we ought to take at least a few weeks to try to do a bit
of code review and clean up what we can from the open items list.

Why? A large portion of the input required to go from beta towards a
release is from actual users. To see when things break, what confuses
them and such.

I don't see why that requires that there are no minor entries in the
open items list - and that's what currently is on it. Neither does it
seem to be a problem to do code review concurrently to user beta
testing. We obviously can't start a beta if things crash left and
right, but I don't think that's the situation right now?

Agreed.

Thanks!

Stephen

#33

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Andres Freund (#31)

Re: [CORE] postpone next week's release

On Fri, May 29, 2015 at 6:33 PM, Andres Freund <andres@anarazel.de> wrote:

On 2015-05-29 18:02:36 -0400, Robert Haas wrote:

Well, I think we ought to take at least a few weeks to try to do a bit
of code review and clean up what we can from the open items list.

Why? A large portion of the input required to go from beta towards a
release is from actual users. To see when things break, what confuses
them and such.

I have two concerns:

1. I'm concerned that once we release beta, any idea about reverting a
feature or fixing something that is broken will get harder, because
people will say "well, we can't do that after we've released a beta".
I confess to particularly wanting a solution to the item listed as
"custom-join has no way to construct Plan nodes of child Path nodes",
the history of which I'll avoid recapitulating until I'm sure I can do
it while maintaining my blood pressure at safe levels.

2. Also, if we're going to make significant multixact-related changes
to 9.5 to try to improve reliability, as you proposed on the other
thread, then it would be nice to do that before beta, so that it gets
tested. Of course, someone is bound to point out that we could make
those changes in time for beta2, and people could test that. But in
practice I think that'll just mean that stuff is only out there for
let's say 2 months before we put it in a major release, which ain't
much.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#34

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Robert Haas (#33)

Re: [CORE] postpone next week's release

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, May 29, 2015 at 6:33 PM, Andres Freund <andres@anarazel.de> wrote:

Why? A large portion of the input required to go from beta towards a
release is from actual users. To see when things break, what confuses
them and such.

I have two concerns:

1. I'm concerned that once we release beta, any idea about reverting a
feature or fixing something that is broken will get harder, because
people will say "well, we can't do that after we've released a beta".
I confess to particularly wanting a solution to the item listed as
"custom-join has no way to construct Plan nodes of child Path nodes",
the history of which I'll avoid recapitulating until I'm sure I can do
it while maintaining my blood pressure at safe levels.

2. Also, if we're going to make significant multixact-related changes
to 9.5 to try to improve reliability, as you proposed on the other
thread, then it would be nice to do that before beta, so that it gets
tested. Of course, someone is bound to point out that we could make
those changes in time for beta2, and people could test that. But in
practice I think that'll just mean that stuff is only out there for
let's say 2 months before we put it in a major release, which ain't
much.

I think your position is completely nuts. The GROUPING SETS code is
desperately in need of testing. The custom-plan code is desperately
in need of fixing and testing. The multixact code is desperately
in need of testing. The open-items list has several other problems
besides those. All of those problems are independent. If we insist
on tackling them serially rather than in parallel, 9.5 might not come
out till 2017.

I agree that we are not in a position to promise features won't change.
So let's call it an alpha not a beta --- but for heaven's sake let's
try to move forward on all these issues, not just some of them.

regards, tom lane


#35

Andres Freund

andres@anarazel.de

over 10 years ago

In reply to: Robert Haas (#33)

Re: [CORE] postpone next week's release

On May 29, 2015 8:56:40 PM PDT, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, May 29, 2015 at 6:33 PM, Andres Freund <andres@anarazel.de>
wrote:

On 2015-05-29 18:02:36 -0400, Robert Haas wrote:

Well, I think we ought to take at least a few weeks to try to do a bit
of code review and clean up what we can from the open items list.

Why? A large portion of the input required to go from beta towards a
release is from actual users. To see when things break, what confuses
them and such.

I have two concerns:

1. I'm concerned that once we release beta, any idea about reverting a
feature or fixing something that is broken will get harder, because
people will say "well, we can't do that after we've released a beta".
I confess to particularly wanting a solution to the item listed as
"custom-join has no way to construct Plan nodes of child Path nodes",
the history of which I'll avoid recapitulating until I'm sure I can do
it while maintaining my blood pressure at safe levels.

I think we should just document that this is a beta and that changes are to be expected. And have a release candidate once that's not the case.

I agree that it'd be very good if the custom join issue gets fixed. But I don't see a beta prohibiting it. Independently from that, I'm going to ask a Citus colleague to make sure that pg-shard can use this.

2. Also, if we're going to make significant multixact-related changes
to 9.5 to try to improve reliability, as you proposed on the other
thread, then it would be nice to do that before beta, so that it gets
tested. Of course, someone is bound to point out that we could make
those changes in time for beta2, and people could test that. But in
practice I think that'll just mean that stuff is only out there for
let's say 2 months before we put it in a major release, which ain't
much.

There seems to be enough other stuff in dire need of testing that I don't think that's sufficient cause, even though I understand the sentiment.

Andres

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.


#36

Andres Freund

andres@anarazel.de

over 10 years ago

In reply to: Tom Lane (#34)

Re: [CORE] postpone next week's release

On May 29, 2015 9:08:07 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think your position is completely nuts.

Yeehaa.

The GROUPING SETS code is
desperately in need of testing. The custom-plan code is desperately
in need of fixing and testing. The multixact code is desperately
in need of testing.

And the array/plpgsql changes and upsert, and...

Andres

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.


#37

Noah Misch

noah@leadboat.com

over 10 years ago

In reply to: Tom Lane (#19)

Re: [CORE] postpone next week's release

On Fri, May 29, 2015 at 04:01:00PM -0400, Tom Lane wrote:

Stephen Frost <sfrost@snowman.net> writes:

* Bruce Momjian (bruce@momjian.us) wrote:

I am unclear if we are anywhere near ready for beta1 even in June. Are
we?

I'm all about having that discussion... but can we do it on another
thread or at least wait til we've decided about the back-branch
releases? They are clearly the more important issue to consider.

It's the same discussion though, ie what releases are we expecting to
get out in the next couple of weeks.

+1 for Stephen's thought to decide about back-branch releases first and to
Magnus's sentiment upthread that beta has to stand back while we schedule
them. In other words, the feedback between these two scheduling decisions
ought to be one-way: bringing today's supported branches to a state we can be
content about deserves first pick from the calendar.


#38

Bruce Momjian

bruce@momjian.us

over 10 years ago

In reply to: Tom Lane (#34)

Re: [CORE] postpone next week's release

On Sat, May 30, 2015 at 12:08:07AM -0400, Tom Lane wrote:

desperately in need of testing. The custom-plan code is desperately
in need of fixing and testing. The multixact code is desperately
in need of testing. The open-items list has several other problems
besides those. All of those problems are independent. If we insist
on tackling them serially rather than in parallel, 9.5 might not come
out till 2017.

2017? Really? Is there any need for that hyperbole?

Frankly, based on how I feel now, I would have no problem doing 9.5 in
2016 and saying we have a lot of retooling to do. We could say we have
gotten too far out ahead of ourselves and we need to regroup and
restructure the code.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


#39

Bruce Momjian

bruce@momjian.us

over 10 years ago

In reply to: Bruce Momjian (#38)

Re: [CORE] postpone next week's release

On Sat, May 30, 2015 at 08:56:53AM -0400, Bruce Momjian wrote:

On Sat, May 30, 2015 at 12:08:07AM -0400, Tom Lane wrote:

desperately in need of testing. The custom-plan code is desperately
in need of fixing and testing. The multixact code is desperately
in need of testing. The open-items list has several other problems
besides those. All of those problems are independent. If we insist
on tackling them serially rather than in parallel, 9.5 might not come
out till 2017.

2017? Really? Is there any need for that hyperbole?

Frankly, based on how I feel now, I would have no problem doing 9.5 in
2016 and saying we have a lot of retooling to do. We could say we have
gotten too far out ahead of ourselves and we need to regroup and
restructure the code.

Actually, barrelling ahead to get releases out is how we got into this
mess in the first place. I would vote we put the 9.5 release on hold
while we do an honest assessment of where we are. In hindsight, we
should have known to do this even before 9.4 was released.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


#40

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Tom Lane (#34)

Re: [CORE] postpone next week's release

On Sat, May 30, 2015 at 12:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think your position is completely nuts. The GROUPING SETS code is
desperately in need of testing. The custom-plan code is desperately
in need of fixing and testing. The multixact code is desperately
in need of testing. The open-items list has several other problems
besides those. All of those problems are independent. If we insist
on tackling them serially rather than in parallel, 9.5 might not come
out till 2017.

If that means it's stable, +1 from me.

I dispute, on every level, the notion that not releasing a beta means
that we can't work on things in parallel. We can work on all of the
things on the open items list in parallel right now. We can also
test. And in fact, we should test. It's entirely appropriate to test
our own stuff before we ask other people to test it. It's also
appropriate to fix the things that we already know are broken before
we ask other people to test it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#41

Joshua D. Drake

jd@commandprompt.com

over 10 years ago

In reply to: Bruce Momjian (#39)

Re: [CORE] postpone next week's release

On 05/30/2015 06:11 AM, Bruce Momjian wrote:

2017? Really? Is there any need for that hyperbole?

Frankly, based on how I feel now, I would have no problem doing 9.5 in
2016 and saying we have a lot of retooling to do. We could say we have
gotten too far out ahead of ourselves and we need to regroup and
restructure the code.

Actually, barrelling ahead to get releases out is how we got into this
mess in the first place. I would vote we put the 9.5 release on hold
while we do an honest assessment of where we are. In hindsight, we
should have known to do this even before 9.4 was released.

It seems that we all are forgetting one of the fundamental concepts of
open source development:

Q. When will release X be?
A. When it is done.

A delay because of quality concerns shows the integrity of the project.

Sincerely,


#42

Bruce Momjian

bruce@momjian.us

over 10 years ago

In reply to: Robert Haas (#40)

Re: [CORE] postpone next week's release

On Sat, May 30, 2015 at 10:06:52AM -0400, Robert Haas wrote:

If that means it's stable, +1 from me.

I dispute, on every level, the notion that not releasing a beta means
that we can't work on things in parallel. We can work on all of the
things on the open items list in parallel right now. We can also
test. And in fact, we should test. It's entirely appropriate to test
our own stuff before we ask other people to test it. It's also
appropriate to fix the things that we already know are broken before
we ask other people to test it.

Let me share something that people have told me privately but don't want
to state publicly (at least with attribution), and that is that we have
seen great increases in feature development (often funded), without a
corresponding increase in development efforts focused on stability. The
fact that Alvaro has had to almost single-handedly fix multi-xact bugs until
very recently is testament to that.

The bottom line is that we just can't keep going on like this. The fact
we put out a release two weeks ago, then need to put out a fix release
for that, but we have more multi-xact bugs to fix and can't decide if we
should do one or two minor releases, and are pushing out an alpha of 9.5
because we know we aren't ready for a beta, just confirms my analysis.

I hate to be the bearer of bad news, but I think bad news is what we
must face.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


#43

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Bruce Momjian (#42)

Re: [CORE] postpone next week's release

On Sat, May 30, 2015 at 11:45 AM, Bruce Momjian <bruce@momjian.us> wrote:

On Sat, May 30, 2015 at 10:06:52AM -0400, Robert Haas wrote:

If that means it's stable, +1 from me.

I dispute, on every level, the notion that not releasing a beta means
that we can't work on things in parallel. We can work on all of the
things on the open items list in parallel right now. We can also
test. And in fact, we should test. It's entirely appropriate to test
our own stuff before we ask other people to test it. It's also
appropriate to fix the things that we already know are broken before
we ask other people to test it.

Let me share something that people have told me privately but don't want
to state publicly (at least with attribution), and that is that we have
seen great increases in feature development (often funded), without a
corresponding increase in development efforts focused on stability. The
fact that Alvaro has had to almost single-handedly fix multi-xact bugs until
very recently is testament to that.

It's clear - at least to me - that we need to put more resources into
stabilizing the new multixact system. This is killing us. If we can't
stabilize this, people will go use some other database.

Equally importantly, we need to make sure that we never release
something comparably broken ever again. And that's why I'm not
sanguine about shipping what we've got without adequate reflection.

What, in this release, could break things badly? RLS? Grouping sets?
Heikki's WAL format changes? That last one sounds really scary to me;
it's painful if not impossible to fix the WAL format in a minor
release.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#44

Peter Geoghegan

pg@heroku.com

over 10 years ago

In reply to: Bruce Momjian (#38)

Re: [CORE] postpone next week's release

On Sat, May 30, 2015 at 5:56 AM, Bruce Momjian <bruce@momjian.us> wrote:

Frankly, based on how I feel now, I would have no problem doing 9.5 in
2016 and saying we have a lot of retooling to do. We could say we have
gotten too far out ahead of ourselves and we need to regroup and
restructure the code.

I wouldn't mind doing that, but I think it's premature to conclude
that it's necessary to wait quite that long to release.

--
Peter Geoghegan


#45

Peter Geoghegan

pg@heroku.com

over 10 years ago

In reply to: Robert Haas (#43)

Re: [CORE] postpone next week's release

On Sat, May 30, 2015 at 11:10 AM, Robert Haas <robertmhaas@gmail.com> wrote:

Let me share something that people have told me privately but don't want
to state publicly (at least with attribution), and that is that we have
seen great increases in feature development (often funded), without a
corresponding increase in development efforts focused on stability. The
fact that Alvaro has had to almost single-handedly fix multi-xact bugs until
very recently is testament to that.

It's clear - at least to me - that we need to put more resources into
stabilizing the new multixact system. This is killing us. If we can't
stabilize this, people will go use some other database.

+1. I don't grok the MultiXact code as some people do, but even still,
I think problems have been ongoing for so long now that we must change
course. FWIW, my perception from afar is that the problems haven't
really tapered off, and we'd be better off taking a fresh approach.

Equally importantly, we need to make sure that we never release
something comparably broken ever again. And that's why I'm not
sanguine about shipping what we've got without adequate reflection.

As you said, there was a failure to appreciate the interactions with
VACUUM. That should have made us more introspective about what we
didn't know and couldn't know during 9.3 development, but it
didn't.

What, in this release, could break things badly? RLS? Grouping sets?
Heikki's WAL format changes? That last one sounds really scary to me;
it's painful if not impossible to fix the WAL format in a minor
release.

I think we actually have learned some lessons here. MultiXacts were a
somewhat unusual case for a couple of reasons that I need not rehash.

In contrast, Heikki's WAL format changes (just for example) are
fundamentally just a restructuring to the existing format. Sure, there
could be bugs, but I think that it's fundamentally different to the
9.3 MultiXact stuff, in that the MultiXact stuff appears to be
stubbornly difficult to stabilize over months and years. That feels
like something that is unlikely to be true for anything that made it
into 9.5.
--
Peter Geoghegan


#46

Andres Freund

andres@anarazel.de

over 10 years ago

In reply to: Bruce Momjian (#42)

Re: [CORE] postpone next week's release

Hi Bruce, Everyone,

On 2015-05-30 11:45:59 -0400, Bruce Momjian wrote:

Let me share something that people have told me privately but don't want
to state publicly (at least with attribution), and that is that we have
seen great increases in feature development (often funded), without a
corresponding increase in development efforts focused on stability.

Yes, I have seen and heard that too. What I think is also important is
that, in turn, our adoption has outpaced feature development (and thus
transitively stability work).

The bottom line is that we just can't keep going on like this. The fact
we put out a release two weeks ago, then need to put out a fix release
for that, but we have more multi-xact bugs to fix and can't decide if we
should do one or two minor releases, and are pushing out an alpha of 9.5
because we know we aren't ready for a beta, just confirms my analysis.

I don't think that alone confirms very much.

I hate to be the bearer of bad news, but I think bad news is what we
must face.

Well, the question is what we do with that observation. Personally I
think it's not a new one. This point has been made repeatedly, including
at most if not all developer meetings I attended. I definitely had
conversations around it both in person, on IM and on list.

I don't think it's primarily a problem of lack of review; although that
is a large problem. I think the biggest systematic problem is that the
compound complexity of postgres has increased dramatically over the
years. Features have added complexity little by little, each increment
not looking that bad. But very little has been done to
manage complexity. Since 8.0 the codesize has roughly doubled, but
little has been done to manage the increased complexity. Few new
abstractions have been introduced and the structure of the code is
largely the same.

As a somewhat extreme example, let's look at StartupXLOG(). In 8.0 it
was ~500 LOC, in master it's ~1500. The interactions in 8.0 were
complex, they have gotten much more complex since. It fulfills lots of
different roles, all in one function:

(roughly in the order things happen, but simplified)
* Read the control file/determine whether we crashed
* recovery.conf handling
* backup label handling
* tablespace map handling (huh, I missed that this was added directly to
  StartupXLOG. What a bad idea)
* Determine whether we're doing archive recovery, read the relevant
  checkpoint if so
* relcache init file removal
* timeline switch handling
* Loading the checkpoint we're starting from
* Initialization of a lot of subsystems
* crash recovery/replay
  * Including pgstat, unlogged table, exported snapshot handling
  * iff hot standby, some more subsystems are initialized here
  * hot standby state handling
  * replay process initialization
  * crash replay itself, including
    * progress tracking
    * recovery pause handling
    * nextxid tracking
    * timeline increase handling
    * hot standby state handling
* unlogged relations handling
* archive recovery handling
* creation/initialization of the end of recovery checkpoint
* timeline increment if failover
* subsystem initialization iff !hot_standby
* end of recovery actions

Yes, that's one routine. And, to make things even funnier, half of that
routine isn't exercised by our tests.
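
To illustrate what one such missing abstraction could look like, here is
a toy sketch in Python. It is purely hypothetical and is not PostgreSQL
code: the startup sequence becomes a list of small named phases sharing
one context object, so each phase can be read and tested on its own. All
names (StartupContext, the phase functions) are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StartupContext:
    """State shared between phases (hypothetical, not a PostgreSQL struct)."""
    in_recovery: bool = False
    log: list = field(default_factory=list)


def read_control_file(ctx):
    # Pretend the control file says we did not shut down cleanly.
    ctx.log.append("read_control_file")
    ctx.in_recovery = True


def replay_wal(ctx):
    # Only runs when an earlier phase decided recovery is needed.
    if ctx.in_recovery:
        ctx.log.append("replay_wal")


def end_of_recovery(ctx):
    if ctx.in_recovery:
        ctx.log.append("end_of_recovery")
        ctx.in_recovery = False


# The order of phases is explicit data instead of 1500 lines of control flow.
PHASES = [read_control_file, replay_wal, end_of_recovery]


def startup(ctx):
    for phase in PHASES:
        phase(ctx)
    return ctx
```

Because each phase only touches the context, a test can construct a
StartupContext in any state and call a single phase directly, which is
what a monolithic routine makes hard.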

You can argue that this is an outlier, but I don't think so. Heapam, the
planner, etc. have similar cases.

And I think this, to some degree, explains a lot of the multixact
problems. While there were a few "simple bugs", most of them were
interactions between the various subsystems that are rather intricate.

So, I think we have built up a lot of technical debt. And very little
effort has been made to fix that; and in the cases where people have, the
reception has often been cool, because refactoring things obviously will
destabilize in the short term, even if it fixes problems in the long
term. I don't think that's sustainable.

We can't improve the situation by just delaying the 9.5 release or
something like that. We need to actively work on making the codebase
easier to understand and better tested. But that is actual development
work, and shouldn't happen at the tail end of a release.

Regards,

Andres

#47

Andres Freund

andres@anarazel.de

over 10 years ago

In reply to: Robert Haas (#43)

Re: [CORE] postpone next week's release

On 2015-05-30 14:10:36 -0400, Robert Haas wrote:

It's clear - at least to me - that we need to put more resources into
stabilizing the new multixact system. This is killing us. If we can't
stabilize this, people will go use some other database.

I agree. Perhaps I don't see things quite as direly, but then I didn't
just spend weeks on the issue. I remember that I was incredibly
frustrated around 9.3.2 because I'd spent weeks on fixing issues around
this and it just never seemed to stop.

Equally importantly, we need to make sure that we never release
something comparably broken ever again. And that's why I'm not
sanguine about shipping what we've got without adequate reflection.

I think you're inferring something wrong here. A beta/alpha *is* getting
feedback on how good/bad things are. It's just one source of such
information, but we don't have that many others.

As explained in the email I sent before this, I think a lot of the
problems come from too complex code (with barely any testing). But we're
not going to be able to clean this up in 9.5. This will be a longer term
effort.

If we, without further changes, decide to let the release slip to, say,
Q1 2016, the only thing that'll happen is that 9.6 will have
larger, more complex features. With barely any additional review and
testing done. There was very little, if any, additional testing/review
outside jsonb due to the 9.4 slippage.

I don't think the problems have much to do with the release schedule.

What, in this release, could break things badly?

RLS?

Mostly localized to users of the feature. Niche use case.

Grouping sets?

Few changes to code unless grouping sets are used.

Heikki's WAL format changes?

Yes, that's quite invasive. On the other hand, I can't think of another
feature that had as much invested in tooling to detect problems.

What's more:
* Upsert - it's probably the most complex feature in 9.5. It's quite
localized though.
* The locking changes, a good amount of potential for subtle problems
* The signal handling, sinval, client communication changes. Few to
no problems so far, but it's complex stuff. These changes are an
example of potential for problems due to changes to reduce
complexity...

Greetings,

Andres Freund

#48

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Andres Freund (#47)

Re: [CORE] postpone next week's release

Andres Freund <andres@anarazel.de> writes:

* The signal handling, sinval, client communication changes. Few to
no problems so far, but it's complex stuff. These changes are an
example of potential for problems due to changes to reduce
complexity...

As far as that goes, it's quite clear from the buildfarm that the
atomics stuff is not very stable on non-mainstream architectures.

regards, tom lane

#49

Andres Freund

andres@anarazel.de

over 10 years ago

In reply to: Tom Lane (#48)

Re: [CORE] postpone next week's release

On May 30, 2015 2:19:00 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

* The signal handling, sinval, client communication changes. Few to
no problems so far, but it's complex stuff. These changes are an
example of potential for problems due to changes to reduce
complexity...

As far as that goes, it's quite clear from the buildfarm that the
atomics stuff is not very stable on non-mainstream architectures.

Is that the case? So far it seems to primarily be a problem of the old barrier emulation being buggy (non-reentrant), and of that becoming visible due to the new barrier in the latch code.

I'd not be surprised if there were more bugs, don't get me wrong; this is highly platform-dependent stuff.

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

#50

David Steele

david@pgmasters.net

over 10 years ago

In reply to: Robert Haas (#43)

Re: [CORE] postpone next week's release

On 5/30/15 2:10 PM, Robert Haas wrote:

What, in this release, could break things badly? RLS? Grouping sets?
Heikki's WAL format changes? That last one sounds really scary to me;
it's painful if not impossible to fix the WAL format in a minor
release.

I would argue Heikki's WAL stuff is a perfect case for releasing a
public alpha/beta soon. I'd love to test PgBackRest with an "official"
9.5dev build. The PgBackRest test suite has lots of tests that run on
versions 8.3+ and might well shake out any bugs that are lying around.

In fact, I've added a new feature based on monitoring the thread and I'm
interested to see how that pans out.

--
- David Steele
david@pgmasters.net

#51

Joshua D. Drake

jd@commandprompt.com

over 10 years ago

In reply to: David Steele (#50)

Re: [CORE] postpone next week's release

On 05/30/2015 03:48 PM, David Steele wrote:

On 5/30/15 2:10 PM, Robert Haas wrote:

What, in this release, could break things badly? RLS? Grouping sets?
Heikki's WAL format changes? That last one sounds really scary to me;
it's painful if not impossible to fix the WAL format in a minor
release.

I would argue Heikki's WAL stuff is a perfect case for releasing a
public alpha/beta soon. I'd love to test PgBackRest with an "official"
9.5dev build. The PgBackRest test suite has lots of tests that run on
versions 8.3+ and might well shake out any bugs that are lying around.

You are right. Clone git, run it nightly automated and please, please
report anything you find. There is no reason for a tagged release for
that. Consider it a custom, purpose built, build-test farm.
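
A minimal sketch of such a nightly job, assuming Python 3.9+ with git and
a build toolchain on the machine. The function names are hypothetical;
the repository URL is the project's public git server, and
--enable-cassert and --enable-debug are standard configure options. A
scheduler such as cron would call run_nightly() once a day, and a tool's
own test suite would be added as a further step.

```python
import subprocess
from pathlib import Path

REPO_URL = "https://git.postgresql.org/git/postgresql.git"


def plan_nightly_run(src: Path, have_checkout: bool) -> list[list[str]]:
    """Return the commands for one build-and-test cycle, in order."""
    if have_checkout:
        fetch = ["git", "-C", str(src), "pull", "--ff-only"]
    else:
        fetch = ["git", "clone", REPO_URL, str(src)]
    return [
        fetch,
        ["./configure", "--enable-cassert", "--enable-debug"],
        ["make", "-s"],
        ["make", "-s", "check"],  # core regression tests
    ]


def run_nightly(src: Path) -> bool:
    """Run every step; stop and report at the first failure."""
    src = src.resolve()
    steps = plan_nightly_run(src, have_checkout=(src / ".git").exists())
    for i, cmd in enumerate(steps):
        # The fetch step runs from the current directory; the build
        # steps run inside the checkout.
        cwd = None if i == 0 else src
        if subprocess.run(cmd, cwd=cwd).returncode != 0:
            print("FAILED:", " ".join(cmd))
            return False
    return True
```

Keeping the command plan separate from its execution means the plan can
be inspected or logged without touching the network.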

Sincerely,

#52

David Steele

david@pgmasters.net

over 10 years ago

In reply to: Joshua D. Drake (#51)

Re: [CORE] postpone next week's release

On 5/30/15 8:38 PM, Joshua D. Drake wrote:

On 05/30/2015 03:48 PM, David Steele wrote:

On 5/30/15 2:10 PM, Robert Haas wrote:

What, in this release, could break things badly? RLS? Grouping sets?
Heikki's WAL format changes? That last one sounds really scary to me;
it's painful if not impossible to fix the WAL format in a minor
release.

I would argue Heikki's WAL stuff is a perfect case for releasing a
public alpha/beta soon. I'd love to test PgBackRest with an "official"
9.5dev build. The PgBackRest test suite has lots of tests that run on
versions 8.3+ and might well shake out any bugs that are lying around.

You are right. Clone git, run it nightly automated and please, please
report anything you find. There is no reason for a tagged release for
that. Consider it a custom, purpose built, build-test farm.

Sure - I can write code to do that. But then why release a beta at all?

--
- David Steele
david@pgmasters.net

#53

Bruce Momjian

bruce@momjian.us

over 10 years ago

In reply to: Peter Geoghegan (#44)

Re: [CORE] postpone next week's release

On Sat, May 30, 2015 at 12:26:11PM -0700, Peter Geoghegan wrote:

On Sat, May 30, 2015 at 5:56 AM, Bruce Momjian <bruce@momjian.us> wrote:

Frankly, based on how I feel now, I would have no problem doing 9.5 in
2016 and saying we have a lot of retooling to do. We could say we have
gotten too far out ahead of ourselves and we need to regroup and
restructure the code.

I wouldn't mind doing that, but I think it's premature to conclude
that it's necessary to wait quite that long to release.

I agree it probably wouldn't take until 2016, but if it does take until
2016, we have to be fine with that. What I am saying is we can't just
continue to focus on hitting target dates and assume everything will be
fine, because it isn't.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

#54

Joshua D. Drake

jd@commandprompt.com

over 10 years ago

In reply to: David Steele (#52)

Re: [CORE] postpone next week's release

On 05/30/2015 06:51 PM, David Steele wrote:

On 5/30/15 8:38 PM, Joshua D. Drake wrote:

On 05/30/2015 03:48 PM, David Steele wrote:

On 5/30/15 2:10 PM, Robert Haas wrote:

What, in this release, could break things badly? RLS? Grouping sets?
Heikki's WAL format changes? That last one sounds really scary to me;
it's painful if not impossible to fix the WAL format in a minor
release.

I would argue Heikki's WAL stuff is a perfect case for releasing a
public alpha/beta soon. I'd love to test PgBackRest with an "official"
9.5dev build. The PgBackRest test suite has lots of tests that run on
versions 8.3+ and might well shake out any bugs that are lying around.

You are right. Clone git, run it nightly automated and please, please
report anything you find. There is no reason for a tagged release for
that. Consider it a custom, purpose built, build-test farm.

Sure - I can write code to do that. But then why release a beta at all?

1. Continuous testing (especially automated) is a great thing (see
Buildfarm)

2. The rules for patches change a bit when we move to Beta

3. We may be able to fix a problem now (or soon) that you might catch
before Beta.

Sincerely,

#55

David G. Johnston

david.g.johnston@gmail.com

over 10 years ago

In reply to: Bruce Momjian (#53)

Re: postpone next week's release

On Saturday, May 30, 2015, Bruce Momjian <bruce@momjian.us> wrote:

On Sat, May 30, 2015 at 12:26:11PM -0700, Peter Geoghegan wrote:

On Sat, May 30, 2015 at 5:56 AM, Bruce Momjian <bruce@momjian.us> wrote:

Frankly, based on how I feel now, I would have no problem doing 9.5 in
2016 and saying we have a lot of retooling to do. We could say we have
gotten too far out ahead of ourselves and we need to regroup and
restructure the code.

I wouldn't mind doing that, but I think it's premature to conclude
that it's necessary to wait quite that long to release.

I agree it probably wouldn't take until 2016, but if it does take until
2016, we have to be fine with that. What I am saying is we can't just
continue to focus on hitting target dates and assume everything will be
fine, because it isn't.

On a slightly tangential note: I'm not prepared to defend doing so but it
seems worth at least considering whether we should continue supporting 9.0
beyond this October.

I don't think it should be de-supported until at least a couple of 9.5
point releases have been found to be stable.

David J.

#56

Bruce Momjian

bruce@momjian.us

over 10 years ago

In reply to: Andres Freund (#46)

Re: [CORE] postpone next week's release

On Sat, May 30, 2015 at 10:47:27PM +0200, Andres Freund wrote:

The bottom line is that we just can't keep going on like this. The fact
we put out a release two weeks ago, then need to put out a fix release
for that, but we have more multi-xact bugs to fix and can't decide if we
should do one or two minor releases, and are pushing out an alpha of 9.5
because we know we aren't ready for a beta, just confirms my analysis.

I don't think that alone confirms very much.

Huh? In what world is that release timeline ever reasonable? It points
to a serious problem.

I hate to be the bearer of bad news, but I think bad news is what we
must face.

Well, the question is what we do with that observation. Personally I
think it's not a new one. This point has been made repeatedly, including
at most if not all developer meetings I attended. I definitely had
conversations around it in person, on IM, and on list.

Well, I think we stop what we are doing, focus on restructuring,
testing, and reviewing areas that historically have had problems, and
when we are done, we can look to go to 9.5 beta. What we don't want to
do is to push out more code and get back into a
whack-a-bug-as-they-are-found mode, which obviously did not serve us well
for multi-xact, and which is what releasing a beta will do, and of
course, more commit-fests, and more features.

If we have to totally stop feature development until we are all happy
with the code we have, so be it. If people feel they have to get into
cleanup mode or they will never get to add a feature to Postgres again,
so be it. If people say, hey, I am not going to do anything and just
come back when cleanup is done (by someone else), then we will end up
with a smaller but more dedicated development team, and I am fine with
that too. I am suggesting that until everyone is happy with the code we
have, we should not move forward. Forget 9.5 feature testing --- we
don't even have 9.3 and 9.4 working to my satisfaction yet, and I bet
others share my opinion. We do not want to look back on this period and
say _this_ is when Postgres lost its reputation for reliability, and
when other databases took that reputation from us.

I don't think it's primarily a problem of lack of review; although that
is a large problem. I think the biggest systematic problem is that the
compound complexity of postgres has increased dramatically over the
years. Features have added complexity little by little, each increment
not looking that bad. Since 8.0 the code size has roughly doubled, but
little has been done to manage the increased complexity. Few new
abstractions have been introduced and the structure of the code is
largely the same.

As a somewhat extreme example, let's look at StartupXLOG(). In 8.0 it
was ~500 LOC, in master it's ~1500. The interactions in 8.0 were
complex; they have gotten much more complex since. It fulfills lots of
different roles, all in one function:

Yep, great place to start our work.

So, I think we have built up a lot of technical debt. And very little
effort has been made to fix that; and in the cases where people have, the
reception has often been cool, because refactoring things obviously will
destabilize in the short term, even if it fixes problems in the long
term. I don't think that's sustainable.

Agreed.

We can't improve the situation by just delaying the 9.5 release or
something like that. We need to actively work on making the codebase
easier to understand and better tested. But that is actual development
work, and shouldn't happen at the tail end of a release.

It should start right now, and then, once we are happy with our code, we
can take periodic breaks to revisit the exact issues you describe. What
I am saying is that we shouldn't wait until after 9.5 beta or after 9.5
final, or after the next commitfest or whatever. We have already waited
too long to do this.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

#57

Michael Paquier

michael.paquier@gmail.com

over 10 years ago

In reply to: Bruce Momjian (#56)

Re: [CORE] postpone next week's release

On Sun, May 31, 2015 at 11:48 AM, Bruce Momjian wrote:

On Sat, May 30, 2015 at 10:47:27PM +0200, Andres Freund wrote:

So, I think we have built up a lot of technical debt. And very little
effort has been made to fix that; and in the cases where people have, the
reception has often been cool, because refactoring things obviously will
destabilize in the short term, even if it fixes problems in the long
term. I don't think that's sustainable.

Agreed.

+1. Complexity has increased, and we are actually never at 100% sure
that a given bug fix does not have side effects on other things, hence
I think that a portion of this technical debt is the lack of
regression test coverage, for both existing features and platforms
(like Windows). The thing is that complexity has increased, but for
example for many features we lack test coverage, thinking mainly
replication-related stuff here. Of course we will never get to a level
of 100% of confidence with just the test coverage and the buildfarm,
but we should at least try to get closer to such a goal.

Those are things I am really willing to work on in the very short term,
for what it's worth (not only that, of course, as
reviewing/refactoring/testing existing things is damn important as
well). Now, improving the test coverage requires new
infrastructure, so those are new features, and that's perhaps not
something for 9.5, unless we consider that this is part of the
technical debt accumulated over the years. Honestly I think it is.

We can't improve the situation by just delaying the 9.5 release or
something like that. We need to actively work on making the codebase
easier to understand and better tested. But that is actual development
work, and shouldn't happen at the tail end of a release.

It should start right now, and then, once we are happy with our code, we
can take periodic breaks to revisit the exact issues you describe. What
I am saying is that we shouldn't wait until after 9.5 beta or after 9.5
final, or after the next commitfest or whatever. We have already waited
too long to do this.

Definitely.
--
Michael

#58

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Peter Geoghegan (#45)

Re: [CORE] postpone next week's release

On Sat, May 30, 2015 at 3:46 PM, Peter Geoghegan <pg@heroku.com> wrote:

What, in this release, could break things badly? RLS? Grouping sets?
Heikki's WAL format changes? That last one sounds really scary to me;
it's painful if not impossible to fix the WAL format in a minor
release.

I think we actually have learned some lessons here. MultiXacts were a
somewhat unusual case for a couple of reasons that I need not rehash.

In contrast, Heikki's WAL format changes (just for example) are
fundamentally just a restructuring to the existing format. Sure, there
could be bugs, but I think that it's fundamentally different to the
9.3 MultiXact stuff, in that the MultiXact stuff appears to be
stubbornly difficult to stabilize over months and years. That feels
like something that is unlikely to be true for anything that made it
into 9.5.

I hope you're right. But I don't think any of us foresaw just how bad
the MultiXact thing was likely to be either.

In fact, I think to some extent we may STILL be in denial about how bad it is.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#59

Bruce Momjian

bruce@momjian.us

over 10 years ago

In reply to: Michael Paquier (#57)

Re: [CORE] postpone next week's release

On Sun, May 31, 2015 at 08:15:38PM +0900, Michael Paquier wrote:

On Sun, May 31, 2015 at 11:48 AM, Bruce Momjian wrote:

On Sat, May 30, 2015 at 10:47:27PM +0200, Andres Freund wrote:

So, I think we have built up a lot of technical debt. And very little
effort has been made to fix that; and in the cases where people have, the
reception has often been cool, because refactoring things obviously will
destabilize in the short term, even if it fixes problems in the long
term. I don't think that's sustainable.

Agreed.

+1. Complexity has increased, and we are actually never at 100% sure
that a given bug fix does not have side effects on other things, hence
I think that a portion of this technical debt is the lack of
regression test coverage, for both existing features and platforms
(like Windows). The thing is that complexity has increased, but for
example for many features we lack test coverage, thinking mainly
replication-related stuff here. Of course we will never get to a level
of 100% of confidence with just the test coverage and the buildfarm,
but we should at least try to get closer to such a goal.

FYI, I realize that one additional thing that has discouraged code
reorganization is the additional backpatch overhead. I think we now
need to accept that our reorganization-averse approach might have cost
us some reliability, and that reorganization is going to add work to
backpatching.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

#60

Bruce Momjian

bruce@momjian.us

over 10 years ago

In reply to: Bruce Momjian (#59)

Re: [CORE] postpone next week's release

On Sun, May 31, 2015 at 09:50:25AM -0400, Bruce Momjian wrote:

+1. Complexity has increased, and we are actually never at 100% sure
that a given bug fix does not have side effects on other things, hence
I think that a portion of this technical debt is the lack of
regression test coverage, for both existing features and platforms
(like Windows). The thing is that complexity has increased, but for
example for many features we lack test coverage, thinking mainly
replication-related stuff here. Of course we will never get to a level
of 100% of confidence with just the test coverage and the buildfarm,
but we should at least try to get closer to such a goal.

FYI, I realize that one additional thing that has discouraged code
reorganization is the additional backpatch overhead. I think we now
need to accept that our reorganization-averse approach might have cost
us some reliability, and that reorganization is going to add work to
backpatching.

Actually, code reorganization in HEAD might cause backpatching to be
more buggy, reducing reliability --- obviously we need to have a
discussion about that.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

#61

Noah Misch

noah@leadboat.com

over 10 years ago

In reply to: David Steele (#52)

Re: [CORE] postpone next week's release

On Sat, May 30, 2015 at 09:51:04PM -0400, David Steele wrote:

On 5/30/15 8:38 PM, Joshua D. Drake wrote:

On 05/30/2015 03:48 PM, David Steele wrote:

I would argue Heikki's WAL stuff is a perfect case for releasing a
public alpha/beta soon. I'd love to test PgBackRest with an "official"
9.5dev build. The PgBackRest test suite has lots of tests that run on
versions 8.3+ and might well shake out any bugs that are lying around.

You are right. Clone git, run it nightly automated and please, please
report anything you find. There is no reason for a tagged release for
that. Consider it a custom, purpose built, build-test farm.

Sure - I can write code to do that. But then why release a beta at all?

It's largely for the benefit of folks planning manual, or otherwise high-cost,
testing. If you budget for just one big test per year, make it a test of
beta1. For inexpensive testing, you may as well ignore beta and test git
master daily or weekly.

#62

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Bruce Momjian (#60)

Re: [CORE] postpone next week's release

Bruce Momjian <bruce@momjian.us> writes:

FYI, I realize that one additional thing that has discouraged code
reorganization is the additional backpatch overhead. I think we now
need to accept that our reorganization-averse approach might have cost
us some reliability, and that reorganization is going to add work to
backpatching.

Actually, code reorganization in HEAD might cause backpatching to be
more buggy, reducing reliability --- obviously we need to have a
discussion about that.

Commit 6b700301c36e380eb4972ab72c0e914cae60f9fd is a recent real example.
Not that that should dissuade us from ever doing any reorganizations,
but it's foolish to discount back-patching costs.

regards, tom lane

#63

David Steele

david@pgmasters.net

over 10 years ago

In reply to: Noah Misch (#61)

Re: [CORE] postpone next week's release

On 5/31/15 11:49 AM, Noah Misch wrote:

On Sat, May 30, 2015 at 09:51:04PM -0400, David Steele wrote:

On 5/30/15 8:38 PM, Joshua D. Drake wrote:

On 05/30/2015 03:48 PM, David Steele wrote:

I would argue Heikki's WAL stuff is a perfect case for releasing a
public alpha/beta soon. I'd love to test PgBackRest with an "official"
9.5dev build. The PgBackRest test suite has lots of tests that run on
versions 8.3+ and might well shake out any bugs that are lying around.

You are right. Clone git, run it nightly automated and please, please
report anything you find. There is no reason for a tagged release for
that. Consider it a custom, purpose built, build-test farm.

Sure - I can write code to do that. But then why release a beta at all?

It's largely for the benefit of folks planning manual, or otherwise high-cost,
testing. If you budget for just one big test per year, make it a test of
beta1. For inexpensive testing, you may as well ignore beta and test git
master daily or weekly.

I've gotten to the point of (relatively) high-cost coding/testing. The
removal of checkpoint_segments and pause_at_recovery_target is leading to
refactoring of not only the regression tests but the actual backup
code. 9.5 and 8.3 are the only versions that require exceptions in the
code base.

I've already done basic testing against 9.5 by disabling certain tests.
Now I'm at the point where I need to start modifying code to take new
9.5 features/changes into account and make sure the regression tests
work for 8.3-9.5 with the fewest number of exceptions possible.

From the perspective of backup/restore testing, 9.5 has the most changes
since 9.0. I'd like to know that the API at least is stable before
investing the time in new development.

Perhaps I'm just misunderstanding the nature of the discussion.

--
- David Steele
david@pgmasters.net

#64

Bruce Momjian

bruce@momjian.us

over 10 years ago

In reply to: Tom Lane (#62)

Re: [CORE] postpone next week's release

On Sun, May 31, 2015 at 11:55:44AM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

FYI, I realize that one additional thing that has discouraged code
reorganization is the additional backpatch overhead. I think we now
need to accept that our reorganization-averse approach might have cost
us some reliability, and that reorganization is going to add work to
backpatching.

Actually, code reorganization in HEAD might cause backpatching to be
more buggy, reducing reliability --- obviously we need to have a
discussion about that.

Commit 6b700301c36e380eb4972ab72c0e914cae60f9fd is a recent real example.
Not that that should dissuade us from ever doing any reorganizations,
but it's foolish to discount back-patching costs.

Yep.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

#65

Andres Freund

andres@anarazel.de

over 10 years ago

In reply to: Tom Lane (#62)

Re: [CORE] postpone next week's release

On 2015-05-31 11:55:44 -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

FYI, I realize that one additional thing that has discouraged code
reorganization is the additional backpatch overhead. I think we now
need to accept that our reorganization-averse approach might have cost
us some reliability, and that reorganization is going to add work to
backpatching.

Actually, code reorganization in HEAD might cause backpatching to be
more buggy, reducing reliability --- obviously we need to have a
discussion about that.

Commit 6b700301c36e380eb4972ab72c0e914cae60f9fd is a recent real example.
Not that that should dissuade us from ever doing any reorganizations,
but it's foolish to discount back-patching costs.

On the other hand, that code is a complete maintenance nightmare. If
there weren't literally dozens of places that needed to be touched to
add a single parameter, it'd be far less likely for such a mistake to be
made. Right now significant portions of the file differ between the
branches, despite primarily minor feature additions...

#66

Michael Paquier

michael.paquier@gmail.com

over 10 years ago

In reply to: Bruce Momjian (#60)

Re: [CORE] postpone next week's release

On Sun, May 31, 2015 at 11:03 PM, Bruce Momjian <bruce@momjian.us> wrote:

On Sun, May 31, 2015 at 09:50:25AM -0400, Bruce Momjian wrote:

+1. Complexity has increased, and we are actually never at 100% sure
that a given bug fix does not have side effects on other things, hence
I think that a portion of this technical debt is the lack of
regression test coverage, for both existing features and platforms
(like Windows). The thing is that complexity has increased, but for
example for many features we lack test coverage, thinking mainly
replication-related stuff here. Of course we will never get to a level
of 100% of confidence with just the test coverage and the buildfarm,
but we should at least try to get closer to such a goal.

FYI, I realize that one additional thing that has discouraged code
reorganization is the additional backpatch overhead. I think we now
need to accept that our reorganization-averse approach might have cost
us some reliability, and that reorganization is going to add work to
backpatching.

Actually, code reorganization in HEAD might cause backpatching to be
more buggy, reducing reliability --- obviously we need to have a
discussion about that.

As a result, IMO all the folks gathering at PGCon (won't be there,
sorry, but I read the MLs) should have a talk about that and define a
clear list of items to tackle in terms of reorganization for 9.5, and
then update this page:
https://wiki.postgresql.org/wiki/PostgreSQL_9.5_Open_Items
This does not prevent us from moving on with all the current items and
continuing to review existing features that have been pushed, of course.
--
Michael


#67

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Magnus Hagander (#5)

Re: [CORE] postpone next week's release

Magnus Hagander <magnus@hagander.net> writes:

On Fri, May 29, 2015 at 8:02 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I think we should postpone next week's release.

I'm a bit split on this.

We *definitely* don't want to release the multixact fix without it being
carefully reviewed, that's the part I'm not split about :) And I fully
appreciate we can't have that done by Monday.

However, the file-permission thing seems to hit quite a few people (have we
ever had this many bug reports after a minor release?), which means we'd
really want to get that out quickly.

After dithering over the weekend, the majority view on -core seems to be
that we should go ahead with making a release today for the fsync issue.
We'll plan another release next week, or whenever the dust seems to have
settled on the multixact issue(s).

regards, tom lane


#68

Jim Nasby

Jim.Nasby@BlueTreble.com

over 10 years ago

In reply to: Bruce Momjian (#30)

Re: [CORE] postpone next week's release

On 5/29/15 5:28 PM, Bruce Momjian wrote:

could expect that anyone committing a user-visible semantics change should
update the release notes themselves.

Yes, that would be nice.

FWIW, I've always wondered why we don't create an empty next-version
release notes as part of stamping a major release and expect patch
authors to add to it. I realize that likely creates merge conflicts, but
that seems less work than doing it all at the end. (Or maybe each patch
just creates a file and the final process is pulling all the files
together.)
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
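
A minimal sketch of the "each patch just creates a file" variant, assuming
one small text fragment per patch and a final step that concatenates them in
file-name order. The function name and the fragment naming scheme are
hypothetical; nothing like this exists in the PostgreSQL tree.

```python
def assemble_release_notes(fragments):
    """Join per-patch release-note fragments into one document.

    `fragments` maps a (hypothetical) fragment file name to its text.
    Fragments are ordered by file name so the result is deterministic;
    empty fragments are skipped.
    """
    parts = [fragments[name].strip() for name in sorted(fragments)]
    return "\n\n".join(p for p in parts if p) + "\n"


if __name__ == "__main__":
    notes = assemble_release_notes({
        "0002-fix-b.txt": "Fix B.\n",
        "0001-fix-a.txt": "Fix A.\n",
    })
    print(notes, end="")
```

Because each patch touches only its own file, two patches never edit the same
lines, which is the merge-conflict concern raised above. The sketch does
nothing for uniformity of editorial style across entries.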


#69

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Jim Nasby (#68)

Re: [CORE] postpone next week's release

Jim Nasby <Jim.Nasby@bluetreble.com> writes:

FWIW, I've always wondered why we don't create an empty next-version
release notes as part of stamping a major release and expect patch
authors to add to it. I realize that likely creates merge conflicts, but
that seems less work than doing it all at the end. (Or maybe each patch
just creates a file and the final process is pulling all the files
together.)

There are good reasons to write the release notes all in one batch:
otherwise you don't get any uniformity of editorial style.

regards, tom lane


#70

Andres Freund

andres@anarazel.de

over 10 years ago

In reply to: Tom Lane (#69)

Re: [CORE] postpone next week's release

On 2015-06-01 12:32:21 -0400, Tom Lane wrote:

There are good reasons to write the release notes all in one batch:
otherwise you don't get any uniformity of editorial style.

I agree that that's a good reason for major releases, I do however
wonder if it'd not be a good idea to do differently for backpatched
bugfixes. It's imo a good thing to force committers to write a release
notice at the same time they're backpatching. The memory is fresh, and
the commit message is more likely to contain pertinent details.


#71

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Andres Freund (#70)

Re: [CORE] postpone next week's release

Andres Freund <andres@anarazel.de> writes:

On 2015-06-01 12:32:21 -0400, Tom Lane wrote:

There are good reasons to write the release notes all in one batch:
otherwise you don't get any uniformity of editorial style.

I agree that that's a good reason for major releases, I do however
wonder if it'd not be a good idea to do differently for backpatched
bugfixes. It's imo a good thing to force committers to write a release
notice at the same time they're backpatching. The memory is fresh, and
the commit message is more likely to contain pertinent details.

We do expect committers to write commit log messages that contain
appropriate raw material for the release notes. That's not the same
as expecting them to prepare an actual, sgml-marked-up, release note
entry that's in good English and occupies a reasonable amount of space
relative to other items.

Jim's point about merge problems is very pertinent as well. In the
first place, if we had running release notes like that, they'd often
differ from one branch to the next, making back-patching rather annoying.
In the second place, SGML is so bulky that the patch context you'd be
working with would frequently look like not much more than

</para>
</listitem>

making it very easy for the hunks to be misapplied.

Lastly, we have recently adopted a practice of labeling release note
entries with the associated commit hashes. I dunno how much value that
really has, but it would be entirely impossible to write such labels
in advance of pushing the fixes.

regards, tom lane


#72

Josh Berkus

josh@agliodbs.com

over 10 years ago

In reply to: Stephen Frost (#18)

Re: [CORE] postpone next week's release

All,

Just my $0.02 on PR: it has never been a PR problem to do multiple
update releases, as long as we could provide a good reason for doing so
(like: fix A is available now and we didn't want to hold it back waiting
for fix B).

It's always a practical question of (a) packaging and (b) deployment.
That is, we can get packager fatigue where some updates don't get
packaged, and we can get user fatigue where they start ignoring updates.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



#73

Noah Misch

noah@leadboat.com

over 10 years ago

In reply to: David Steele (#63)

Re: [CORE] postpone next week's release

On Sun, May 31, 2015 at 12:09:16PM -0400, David Steele wrote:

On 5/31/15 11:49 AM, Noah Misch wrote:

On Sat, May 30, 2015 at 09:51:04PM -0400, David Steele wrote:

Sure - I can write code to do that. But then why release a beta at all?

It's largely for the benefit of folks planning manual, or otherwise high-cost,
testing. If you budget for just one big test per year, make it a test of
beta1. For inexpensive testing, you may as well ignore beta and test git
master daily or weekly.

I've gotten to the point of (relatively) high-cost coding/testing. The
removal of checkpoint_segments and pause_at_recovery_target are leading to
refactoring of not only the regression tests but the actual backup
code. 9.5 and 8.3 are the only versions that require exceptions in the
code base.

I've already done basic testing against 9.5 by disabling certain tests.
Now I'm at the point where I need to start modifying code to take new
9.5 features/changes into account and make sure the regression tests
work for 8.3-9.5 with the fewest number of exceptions possible.

Release of beta1 is the cue for that sort of work.

From the perspective of backup/restore testing, 9.5 has the most changes
since 9.0. I'd like to know that the API at least is stable before
investing the time in new development.

Its API will be as good as pgsql-hackers could make it; beta1 is also a call
for help discovering API problems we overlooked. Subsequent API changes are
usually reactions to beta test reports.


#74

Noah Misch

noah@leadboat.com

over 10 years ago

In reply to: Bruce Momjian (#56)

Restore-reliability mode

Subject changed from "Re: [CORE] postpone next week's release".

On Sat, May 30, 2015 at 10:48:45PM -0400, Bruce Momjian wrote:

Well, I think we stop what we are doing, focus on restructuring,
testing, and reviewing areas that historically have had problems, and
when we are done, we can look to go to 9.5 beta. What we don't want to
do is to push out more code and get back into a
whack-a-bug-as-they-are-found mode, which obviously did not serve us well
for multi-xact, and which is what releasing a beta will do, and of
course, more commit-fests, and more features.

If we have to totally stop feature development until we are all happy
with the code we have, so be it. If people feel they have to get into
cleanup mode or they will never get to add a feature to Postgres again,
so be it. If people say, heh, I am not going to do anything and just
come back when cleanup is done (by someone else), then we will end up
with a smaller but more dedicated development team, and I am fine with
that too. I am suggesting that until everyone is happy with the code we
have, we should not move forward.

I like the essence of this proposal. Two suggestions. We can't achieve or
even robustly measure "everyone is happy with the code," so let's pick
concrete exit criteria. Given criteria framed like "Files A,B,C and patches
X,Y,Z have a sign-off from a committer other than their original committer."
anyone can monitor progress and find specific ways to contribute. Second, I
would define the subject matter as "bug fixes, testing and review", not
"restructuring, testing and review." Different code structures are clearest
to different hackers. Restructuring, on average, adds bugs even more quickly
than feature development adds them.

Thanks,
nm
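
A minimal sketch of how exit criteria of the form "files A,B,C and patches
X,Y,Z have a sign-off from a committer other than their original committer"
could be tracked mechanically. The class, field names, and sample data are
hypothetical illustrations, not an existing project tool.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewItem:
    name: str                    # a file or patch under review (hypothetical)
    original_committer: str
    signoffs: set = field(default_factory=set)

    def is_signed_off(self) -> bool:
        # Only a sign-off from someone other than the original committer counts.
        return any(s != self.original_committer for s in self.signoffs)


def remaining(items):
    """Return the names of items that still lack an independent sign-off."""
    return [item.name for item in items if not item.is_signed_off()]


items = [
    ReviewItem("file_a.c", "alice", {"bob"}),   # independent sign-off
    ReviewItem("patch_x", "bob", {"bob"}),      # self sign-off does not count
    ReviewItem("patch_y", "carol"),             # no sign-off yet
]

if __name__ == "__main__":
    print(remaining(items))
```

The exit condition is then simply that `remaining(items)` is empty, which
anyone can check at any time to see where help is needed.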


#75

Geoff Winkless

pgsqladmin@geoff.dj

over 10 years ago

In reply to: Noah Misch (#74)

Re: Restore-reliability mode

On 3 June 2015 at 14:50, Noah Misch <noah@leadboat.com> wrote:

I would define the subject matter as "bug fixes, testing and review", not
"restructuring, testing and review." Different code structures are clearest
to different hackers. Restructuring, on average, adds bugs even more quickly
than feature development adds them.

+1 to this. Rewriting or restructuring code because you don't trust it
(even though you have no reported real-world bugs) is a terrible idea.

Stopping all feature development to do it is even worse.

I know you're not talking about rewriting, but I think
http://www.joelonsoftware.com/articles/fog0000000069.html is always worth a
re-read, if only because it's funny :)

I would always 100% support a decision to push back new releases because of
bugfixes for *known* issues, but if you think you *might* be able to find
bugs in code you don't like, you should do that on your own time. Iff you
find actual bugs, *then* you talk about halting new releases.

Geoff

#76

Andres Freund

andres@anarazel.de

over 10 years ago

In reply to: Noah Misch (#74)

Re: Restore-reliability mode

On 2015-06-03 09:50:49 -0400, Noah Misch wrote:

Second, I would define the subject matter as "bug fixes, testing and
review", not "restructuring, testing and review." Different code
structures are clearest to different hackers. Restructuring, on
average, adds bugs even more quickly than feature development adds
them.

I can't agree with this. While I agree with not doing large
restructuring for 9.5, I think we can't afford not to refactor for
clarity, even if that introduces bugs. Noticeable parts of our code have
to frequently be modified for new features and are badly structured at
the same time. While restructuring may temporarily increase the
number of bugs in the short term, it'll decrease the number of bugs long
term while increasing the number of potential contributors and new
features. That's obviously not to say we should just refactor for the
sake of it.


#77

Joshua D. Drake

jd@commandprompt.com

over 10 years ago

In reply to: Andres Freund (#76)

Re: Restore-reliability mode

On 06/03/2015 07:18 AM, Andres Freund wrote:

On 2015-06-03 09:50:49 -0400, Noah Misch wrote:

Second, I would define the subject matter as "bug fixes, testing and
review", not "restructuring, testing and review." Different code
structures are clearest to different hackers. Restructuring, on
average, adds bugs even more quickly than feature development adds
them.

I can't agree with this. While I agree with not doing large
restructuring for 9.5, I think we can't afford not to refactor for
clarity, even if that introduces bugs. Noticeable parts of our code have
to frequently be modified for new features and are badly structured at
the same time. While restructuring may temporarily increase the
number of bugs in the short term, it'll decrease the number of bugs long
term while increasing the number of potential contributors and new
features. That's obviously not to say we should just refactor for the
sake of it.

Our project has been continuing to increase momentum over the last few
years and our adoption has increased at an amazing rate. It is important
to remember that we have users. These users have needs that must be met
else those users will move on to a different technology.

I agree that we need to postpone this release. I also agree that there
is likely re-factoring to be done. I have also never met a programmer
who doesn't think something needs to be re-factored. The majority of
programmers I know all suffer from NIH and want to change how things are
implemented.

If we are going to re-factor, it should not be considered global and
should be attacked with specific goals in mind. If those goals are not
specifically defined and agreed on, we will get very pretty code with
very little use for our users. Then our users will leave because they
are busy waiting on us to re-factor.

In short, we must balance this effort with the needs of the code versus
the needs of our users.

Sincerely,


#78

Josh Berkus

josh@agliodbs.com

over 10 years ago

In reply to: Robert Haas (#23)

Re: [CORE] Restore-reliability mode

On 06/03/2015 06:50 AM, Noah Misch wrote:

Subject changed from "Re: [CORE] postpone next week's release".

On Sat, May 30, 2015 at 10:48:45PM -0400, Bruce Momjian wrote:

If we have to totally stop feature development until we are all happy
with the code we have, so be it. If people feel they have to get into
cleanup mode or they will never get to add a feature to Postgres again,
so be it. If people say, heh, I am not going to do anything and just
come back when cleanup is done (by someone else), then we will end up
with a smaller but more dedicated development team, and I am fine with
that too. I am suggesting that until everyone is happy with the code we
have, we should not move forward.

I like the essence of this proposal. Two suggestions. We can't achieve or
even robustly measure "everyone is happy with the code," so let's pick
concrete exit criteria. Given criteria framed like "Files A,B,C and patches
X,Y,Z have a sign-off from a committer other than their original committer."
anyone can monitor progress and find specific ways to contribute. Second, I
would define the subject matter as "bug fixes, testing and review", not
"restructuring, testing and review." Different code structures are clearest
to different hackers. Restructuring, on average, adds bugs even more quickly
than feature development adds them.

So, historically, this is what the period between feature freeze and
beta1 was for; the "consolidation" phase was supposed to deal with this.
The problem over the last few years, by my observation, has been that
consolidation has been left to just a few people (usually just Bruce &
Tom or Tom & Robert) and our code base is now much too large for that.

The way other projects deal with this is having continuous testing as
stuff comes in, and *more* testing than just our regression tests (e.g.
acceptance tests, integration tests, performance tests, etc.). So our
other issue has been that our code complexity has been growing faster
than our test suite. Part of that is that this community has never
placed much value in automated testing or testers, so people who are
interested in it find other projects to contribute to.

I would argue that if we delay 9.5 in order to do a 100% manual review
of code, without adding any new automated tests or other non-manual
tools for improving stability, then it's a waste of time; we might as
well just release the beta, and our users will find more issues than we
will. I am concerned that if we declare a cleanup period, especially in
the middle of the summer, all that will happen is that the project will
go to sleep for an extra three months.

I will also point out that there is a major adoption cost to delaying
9.5. Right now users are excited about UPSERT, big data, and extra
JSON features. If they have to wait another 7 months, they'll be a lot
less excited, and we'll lose more potential users to the new databases
and the MySQL forks. It could also delay the BDR project (Simon/Craig
can speak to this) which would suck.

Reliability of having a release every year is important as well as
database reliability ... and for a lot of the new webdev generation,
PostgreSQL is already the most reliable piece of software infrastructure
they use. So if we're going to have a cleanup delay, then let's please
make it an *intensive* cleanup delay, with specific goals, milestones,
and a schedule. Otherwise, don't bother.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



#79

Andres Freund

andres@anarazel.de

over 10 years ago

In reply to: Josh Berkus (#78)

Re: [CORE] Restore-reliability mode

On 2015-06-03 10:21:28 -0700, Josh Berkus wrote:

So, historically, this is what the period between feature freeze and
beta1 was for; the "consolidation" phase was supposed to deal with this.
The problem over the last few years, by my observation, has been that
consolidation has been left to just a few people (usually just Bruce &
Tom or Tom & Robert) and our code base is now much too large for that.

The way other projects deal with this is having continuous testing as
stuff comes in, and *more* testing than just our regression tests (e.g.
acceptance tests, integration tests, performance tests, etc.). So our
other issue has been that our code complexity has been growing faster
than our test suite. Part of that is that this community has never
placed much value in automated testing or testers, so people who are
interested in it find other projects to contribute to.

I would argue that if we delay 9.5 in order to do a 100% manual review
of code, without adding any new automated tests or other non-manual
tools for improving stability, then it's a waste of time; we might as
well just release the beta, and our users will find more issues than we
will. I am concerned that if we declare a cleanup period, especially in
the middle of the summer, all that will happen is that the project will
go to sleep for an extra three months.

I will also point out that there is a major adoption cost to delaying
9.5. Right now users are excited about UPSERT, big data, and extra
JSON features. If they have to wait another 7 months, they'll be a lot
less excited, and we'll lose more potential users to the new databases
and the MySQL forks. It could also delay the BDR project (Simon/Craig
can speak to this) which would suck.

Reliability of having a release every year is important as well as
database reliability ... and for a lot of the new webdev generation,
PostgreSQL is already the most reliable piece of software infrastructure
they use. So if we're going to have a cleanup delay, then let's please
make it an *intensive* cleanup delay, with specific goals, milestones,
and a schedule. Otherwise, don't bother.

+very many


#80

Stefan Kaltenbrunner

stefan@kaltenbrunner.cc

over 10 years ago

In reply to: David Steele (#52)

Re: [CORE] postpone next week's release

On 05/31/2015 03:51 AM, David Steele wrote:

On 5/30/15 8:38 PM, Joshua D. Drake wrote:

On 05/30/2015 03:48 PM, David Steele wrote:

On 5/30/15 2:10 PM, Robert Haas wrote:

What, in this release, could break things badly? RLS? Grouping sets?
Heikki's WAL format changes? That last one sounds really scary to me;
it's painful if not impossible to fix the WAL format in a minor
release.

I would argue Heikki's WAL stuff is a perfect case for releasing a
public alpha/beta soon. I'd love to test PgBackRest with an "official"
9.5dev build. The PgBackRest test suite has lots of tests that run on
versions 8.3+ and might well shake out any bugs that are lying around.

You are right. Clone git, run it nightly automated and please, please
report anything you find. There is no reason for a tagged release for
that. Consider it a custom, purpose built, build-test farm.

Sure - I can write code to do that. But then why release a beta at all?

FWIW: we also carry "official" snapshots on the download site (
https://ftp.postgresql.org/pub/snapshot/dev/) that you could use if you
don't want git directly - those even receive some form of QA (for a
snapshot to be posted it is required to pass a full buildfarm run on the
buildbox).

Stefan


#81

Heikki Linnakangas

hlinnaka@iki.fi

over 10 years ago

In reply to: Andres Freund (#46)

Re: [CORE] postpone next week's release

On 05/30/2015 11:47 PM, Andres Freund wrote:

I don't think it's primarily a problem of lack of review; although that
is a large problem. I think the biggest systematic problem is that the
compound complexity of postgres has increased dramatically over the
years. Features have added complexity little by little, each increment
not looking that bad. But very little has been done to
manage complexity. Since 8.0 the codesize has roughly doubled, but
little has been done to manage the increased complexity. Few new
abstractions have been introduced and the structure of the code is
largely the same.

As a somewhat extreme example, let's look at StartupXLOG(). In 8.0 it
was ~500 LOC, in master it's ~1500. The interactions in 8.0 were
complex, they have gotten much more complex since. It fulfills lots of
different roles, all in one function:

(roughly in the order things happen, but simplified)
* Read the control file/determine whether we crashed
* recovery.conf handling
* backup label handling
* tablespace map handling (huh, I missed that this was added directly to
  StartupXLOG. What a bad idea)
* Determine whether we're doing archive recovery, read the relevant
  checkpoint if so
* relcache init file removal
* timeline switch handling
* Loading the checkpoint we're starting from
* Initialization of a lot of subsystems
* crash recovery/replay
  * Including pgstat, unlogged table, exported snapshot handling
  * iff hot standby, some more subsystems are initialized here
  * hot standby state handling
  * replay process initialization
  * crash replay itself, including
    * progress tracking
    * recovery pause handling
    * nextxid tracking
    * timeline increase handling
    * hot standby state handling
* unlogged relations handling
* archive recovery handling
* creation/initialization of the end of recovery checkpoint
* timeline increment if failover
* subsystem initialization iff !hot_standby
* end of recovery actions

Yes, that's one routine. And, to make things even funnier, half of that
routine isn't exercised by our tests.

You can argue that this is an outlier, but I don't think so. Heapam, the
planner, etc. have similar cases.

And I think this, to some degree, explains a lot of the multixact
problems. While there were a few "simple bugs", most of them were
interactions between the various subsystems that are rather intricate.

I think this explanation is wrong. I agree that there are many places
that would be good to refactor - like StartupXLOG() - but the multixact
code was not too bad in that regard. IIRC the patch included some
refactoring, it added some new helper functions in heapam.c, for
example. You can argue that it didn't do enough of it, but that was not
the big issue.

The big issue was at the architecture level. Basically, we liked
vacuuming of XIDs and clog so much that we decided that it'd be nice if
you had to vacuum multixids too, in order to not lose data. Many of the
bugs and issues were not new - we had multixids before - but we upped
the ante and turned minor locking bugs into data loss. And that had
nothing to do with the code structure - we'd have similar issues if we
had rewritten everything in Java, with the same design.

So, I'm all for refactoring and adding abstractions where it makes
sense, but it's not going to solve design problems.

- Heikki


#82

Andres Freund

andres@anarazel.de

over 10 years ago

In reply to: Heikki Linnakangas (#81)

Re: [CORE] postpone next week's release

On 2015-06-04 11:51:44 +0300, Heikki Linnakangas wrote:

I think this explanation is wrong. I agree that there are many places that
would be good to refactor - like StartupXLOG() - but the multixact code was
not too bad in that regard. IIRC the patch included some refactoring, it
added some new helper functions in heapam.c, for example. You can argue that
it didn't do enough of it, but that was not the big issue.

Yea, but the bugs were more around the interactions with other parts of
the system. Like e.g. crash recovery, which now is about bug 7 or
so. And those are the ones that are hard to understand.

The big issue was at the architecture level. Basically, we liked vacuuming
of XIDs and clog so much that we decided that it'd be nice if you had to
vacuum multixids too, in order to not lose data. Many of the bugs and issues
were not new - we had multixids before - but we upped the ante and turned
minor locking bugs into data loss. And that had nothing to do with the code
structure - we'd have similar issues if we had rewritten everything in Java,
with the same design.

I think we're probably just using slightly different terms here - for me
one very good way of fixing some structurally bad things *is* improving
the design.

If you look at the bugs around multixacts: The first few were around
ctid-chaining, hard to find and fix because there's about 8-10 places
implementing it with slight differences. The next bunch were around
vacuuming, some of them oversights, a good bunch of them more
fundamental. Crash recovery wasn't thought about (lack of
testing/review), and more generally the new code tripped over bad old
decisions (hey, wraparound is ok!). Then there were a bunch of stupid
bugs in crash-recovery (testing mainly), and larger scale bugs (hey, let's
access stuff during recovery). Then there's the whole row level locking
code - which is by now among the hardest to understand code in
postgres - and voila it contained a bunch of oversights that were hard
to spot.

So yes, I think nicer code to work with would have prevented us from
making a significant portion of these. It might have also made us
realize earlier how significant the increase in complexity was.

So, I'm all for refactoring and adding abstractions where it makes sense,
but it's not going to solve design problems.

I personally don't really see the multixact changes being that bad on
the overall design. It pretty much just extended an earlier design. Now
that wasn't great, but I don't think too many people had realized that
at that point. The biggest problem was underestimating the complexity.

Greetings,

Andres Freund


#83

Heikki Linnakangas

hlinnaka@iki.fi

over 10 years ago

In reply to: Robert Haas (#23)

Re: [CORE] postpone next week's release

On 06/04/2015 12:17 PM, Andres Freund wrote:

On 2015-06-04 11:51:44 +0300, Heikki Linnakangas wrote:

So, I'm all for refactoring and adding abstractions where it makes sense,
but it's not going to solve design problems.

I personally don't really see the multixact changes being that bad on
the overall design. It pretty much just extended an earlier design. Now
that wasn't great, but I don't think too many people had realized that
at that point. The biggest problem was underestimating the complexity.

Yeah, many of the issues were pre-existing, and would've been good to
fix anyway.

The multixact issues remind me of another similar thing we did: the
visibility map. It too was non-critical when it was first introduced,
but later we started using it for index-only-scans, and it suddenly
became important that it's up-to-date and crash-safe. We did uncover
some bugs in that area when index-only-scans were introduced, similar to
the multixact bugs, only not as bad because it didn't lead to data loss.
I don't have any point to make with that comparison, but it was similar
in many ways.

- Heikki



#84

Simon Riggs

simon@2ndQuadrant.com

over 10 years ago

In reply to: Tom Lane (#34)

Re: [CORE] postpone next week's release

On 30 May 2015 at 05:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, May 29, 2015 at 6:33 PM, Andres Freund <andres@anarazel.de> wrote:

Why? A large portion of the input required to go from beta towards a
release is from actual users. To see when things break, what confuses
them and such.

I have two concerns:

1. I'm concerned that once we release beta, any idea about reverting a
feature or fixing something that is broken will get harder, because
people will say "well, we can't do that after we've released a beta".
I confess to particularly wanting a solution to the item listed as
"custom-join has no way to construct Plan nodes of child Path nodes",
the history of which I'll avoid recapitulating until I'm sure I can do
it while maintaining my blood pressure at safe levels.

2. Also, if we're going to make significant multixact-related changes
to 9.5 to try to improve reliability, as you proposed on the other
thread, then it would be nice to do that before beta, so that it gets
tested. Of course, someone is bound to point out that we could make
those changes in time for beta2, and people could test that. But in
practice I think that'll just mean that stuff is only out there for
let's say 2 months before we put it in a major release, which ain't
much.

I think your position is completely nuts. The GROUPING SETS code is
desperately in need of testing. The custom-plan code is desperately
in need of fixing and testing. The multixact code is desperately
in need of testing. The open-items list has several other problems
besides those. All of those problems are independent. If we insist
on tackling them serially rather than in parallel, 9.5 might not come
out till 2017.

I agree that we are not in a position to promise features won't change.
So let's call it an alpha not a beta --- but for heaven's sake let's
try to move forward on all these issues, not just some of them.

I think releasing 9.5 in some form NOW will aid its software quality.

We've never linked Beta release date to final release date, so if the
quality proves to be as poor as some people think then the list of bugs
will show that and we release later.

AFAIK the beta period is exactly the time when we are allowed to pull features
from the release. I welcome the idea that we test it; if it's stable and it
works, we release it. If it doesn't, we pull it.

Not releasing our software yet making a list of our fears doesn't work
towards a solution. Our fears will make us shout at each other too, so I
for one would rather skip that part and do some practical actions.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#85

Stephen Frost

sfrost@snowman.net

over 10 years ago

In reply to: Josh Berkus (#78)

Re: [CORE] Restore-reliability mode

Josh,

* Josh Berkus (josh@agliodbs.com) wrote:

I would argue that if we delay 9.5 in order to do a 100% manual review
of code, without adding any new automated tests or other non-manual
tools for improving stability, then it's a waste of time; we might as
well just release the beta, and our users will find more issues than we
will. I am concerned that if we declare a cleanup period, especially in
the middle of the summer, all that will happen is that the project will
go to sleep for an extra three months.

This is the exact same concern that I have. A delay just to have a
delay is not useful. I completely agree that we need more automated
testing, etc, though getting all of that set up and running could be
done at any time too- there's no reason to wait, nor do I believe
delaying 9.5 would make such automated testing appear.

Thanks!

Stephen

#86

Craig Ringer

craig@2ndquadrant.com

over 10 years ago

In reply to: Stephen Frost (#85)

Re: [CORE] Restore-reliability mode

On 4 June 2015 at 22:43, Stephen Frost <sfrost@snowman.net> wrote:

Josh,

* Josh Berkus (josh@agliodbs.com) wrote:

I would argue that if we delay 9.5 in order to do a 100% manual review
of code, without adding any new automated tests or other non-manual
tools for improving stability, then it's a waste of time; we might as
well just release the beta, and our users will find more issues than we
will. I am concerned that if we declare a cleanup period, especially in
the middle of the summer, all that will happen is that the project will
go to sleep for an extra three months.

This is the exact same concern that I have. A delay just to have a
delay is not useful. I completely agree that we need more automated
testing, etc, though getting all of that set up and running could be
done at any time too- there's no reason to wait, nor do I believe
delaying 9.5 would make such automated testing appear.

In terms of specific testing improvements, things I think we need to have
covered and runnable on the buildfarm are:

* pg_dump and pg_restore testing (because it's scary we don't do this)
* WAL archiving based warm standby testing with promotion
* Two node streaming replication with promotion, both with a slot and with
archive fallback
* Three node cascading streaming replication with middle node promotion
then tail end node promotion
* Logical decoding streaming testing, comparing to expected decoded output
* DDL deparse test coverage for all operations
* pg_basebackup + start up from backup
* hard-kill the postmaster, start up from crashed datadir
* pg_start_backup, rsync, pg_stop_backup, start up in hot standby
* disk exhaustion tests both for pg_xlog and for the main datadir, showing
we can recover OK when disk is filled then space is freed
* Tests of crash recovery during various DDL operations

Obviously some of these overlap, so one test can cover more than one item.

Implementing these requires stepping outside the comfortable zone of
pg_regress and the isolationtester and having something that can manage
multiple data directories. It's also hard to be sure you're testing the
same thing each time - for example, when using streaming replication with
archive fallback, it might be tricky to ensure that your replica falls
behind and falls back to WAL archive each time. There's always SIGSTOP I
guess.

While these are multi-node tests, at least in PostgreSQL we can just run on
different ports, so there's no need to muck about with containers or VMs.
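
A minimal sketch of that single-host layout, for illustration only. The helper name plan_nodes, the node names, the base directory and the base port are all hypothetical; the commands are assembled as lists but never executed, and only the standard initdb -D and pg_ctl -D / -o / -w options are assumed.

```python
from pathlib import Path


def plan_nodes(names, base_dir, base_port=5433):
    """Give every node its own data directory and TCP port on one host."""
    plan = {}
    for offset, name in enumerate(names):
        datadir = Path(base_dir) / name
        port = base_port + offset  # one port per node, no containers or VMs
        plan[name] = {
            "datadir": datadir,
            "port": port,
            # Commands are only assembled here, not run.
            "init": ["initdb", "-D", str(datadir)],
            "start": ["pg_ctl", "-D", str(datadir), "-o", f"-p {port}", "-w", "start"],
        }
    return plan


# Hypothetical three-node cascading setup.
plan = plan_nodes(["primary", "standby", "cascade"], "/tmp/pgtest")
for name, node in plan.items():
    print(name, node["port"], " ".join(node["start"]))
```

A real harness would also have to check that the chosen ports are actually free and write the replication settings for each node; the sketch only shows that distinct data directories and ports are enough to keep the nodes apart.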

I already run some of these tests using Ansible for BDR, but I don't
imagine that'd be acceptable in core. It's Python, and it's not especially
well suited to use as a regression testing framework, it's just what I had
to hand and already needed for other automation tasks.

Is pg_tap a reasonable starting point for this sort of testing?

Am I missing obvious and important tests?

How would a test that would've caught the multixact issues look?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#87

Michael Paquier

michael.paquier@gmail.com

over 10 years ago

In reply to: Craig Ringer (#86)

Re: [CORE] Restore-reliability mode

On Fri, Jun 5, 2015 at 8:53 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

On 4 June 2015 at 22:43, Stephen Frost <sfrost@snowman.net> wrote:

Josh,

* Josh Berkus (josh@agliodbs.com) wrote:

I would argue that if we delay 9.5 in order to do a 100% manual review
of code, without adding any new automated tests or other non-manual
tools for improving stability, then it's a waste of time; we might as
well just release the beta, and our users will find more issues than we
will. I am concerned that if we declare a cleanup period, especially in
the middle of the summer, all that will happen is that the project will
go to sleep for an extra three months.

This is the exact same concern that I have. A delay just to have a
delay is not useful. I completely agree that we need more automated
testing, etc, though getting all of that set up and running could be
done at any time too- there's no reason to wait, nor do I believe
delaying 9.5 would make such automated testing appear.

In terms of specific testing improvements, things I think we need to have
covered and runnable on the buildfarm are:

* pg_dump and pg_restore testing (because it's scary we don't do this)

We do test it in some way with pg_upgrade, using the set of objects that
are not removed by the regression test suite. Extension dumps are
not covered yet, though.

* WAL archiving based warm standby testing with promotion
* Two node streaming replication with promotion, both with a slot and with
archive fallback
* Three node cascading streaming replication with middle node promotion then
tail end node promotion
* Logical decoding streaming testing, comparing to expected decoded output
* hard-kill the postmaster, start up from crashed datadir
* pg_basebackup + start up from backup
* pg_start_backup, rsync, pg_stop_backup, start up in hot standby
* Tests of crash recovery during various DDL operations

Well, steps in this direction are the point of this patch, the
replication test suite:
https://commitfest.postgresql.org/5/197/
And this one, addition of Windows support for TAP tests:
https://commitfest.postgresql.org/5/207/

* DDL deparse test coverage for all operations

What do you have in mind except what is already in objectaddress.sql
and src/test/modules/test_ddl_deparse/?

* disk exhaustion tests both for pg_xlog and for the main datadir, showing
we can recover OK when disk is filled then space is freed

This may be tricky. How would you emulate that?

Is pg_tap a reasonable starting point for this sort of testing?

IMO, using the TAP machinery would be a good base for that. What is lacking
is a basic set of Perl routines that one can easily use to set up test
scenarios.

How would a test that would've caught the multixact issues look?

I have not followed those discussions closely, so I am not sure about that.

Regards,
--
Michael


#88

Simon Riggs

simon@2ndQuadrant.com

over 10 years ago

In reply to: Josh Berkus (#78)

Re: [CORE] Restore-reliability mode

On 3 June 2015 at 18:21, Josh Berkus <josh@agliodbs.com> wrote:

I would argue that if we delay 9.5 in order to do a 100% manual review
of code, without adding any new automated tests or other non-manual
tools for improving stability, then it's a waste of time; we might as
well just release the beta, and our users will find more issues than we
will. I am concerned that if we declare a cleanup period, especially in
the middle of the summer, all that will happen is that the project will
go to sleep for an extra three months.

Agreed. Cleanup can occur while we release code for public testing.

Many eyeballs of Beta beats anything we can throw at it thru manual
inspection. The whole problem of bugs is that they are mostly found by
people trying to use the software.

I will also point out that there is a major adoption cost to delaying
9.5. Right now users are excited about UPSERT, big data, and extra
JSON features. If they have to wait another 7 months, they'll be a lot
less excited, and we'll lose more potential users to the new databases
and the MySQL forks.

Reliability of having a release every year is important as well as
database reliability ... and for a lot of the new webdev generation,
PostgreSQL is already the most reliable piece of software infrastructure
they use. So if we're going to have a cleanup delay, then let's please
make it an *intensive* cleanup delay, with specific goals, milestones,
and a schedule. Otherwise, don't bother.

We've decided previously that having a fixed annual schedule was a good
thing for the project. Getting the features that work into the hands of the
people that want them is very important.

Discussing halting the development schedule publicly is very damaging.

If there are features in doubt, let's do more work on them or just pull them
now and return to the schedule. I don't really care which ones get canned
as long as we return to the schedule.

Whatever we do must be exact and measurable. If it's not, it means we
haven't assembled enough evidence for action that is sufficiently directed
to achieve the desired goal.

On 3 June 2015 at 18:21, Josh Berkus <josh@agliodbs.com> wrote:

It could also delay the BDR project (Simon/Craig can speak to this)
which would suck.

Nothing being discussed here can/will slow down the BDR project since it is
already a different thread of development. More so, 2ndQuadrant has zero
income tied to the release of 9.5 or the commit of any feature, so as far
as that company is concerned, the release could wait for 10 years.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#89

Simon Riggs

simon@2ndQuadrant.com

over 10 years ago

In reply to: Noah Misch (#74)

Re: Restore-reliability mode

On 3 June 2015 at 14:50, Noah Misch <noah@leadboat.com> wrote:

Subject changed from "Re: [CORE] postpone next week's release".

On Sat, May 30, 2015 at 10:48:45PM -0400, Bruce Momjian wrote:

Well, I think we stop what we are doing, focus on restructuring,
testing, and reviewing areas that historically have had problems, and
when we are done, we can look to go to 9.5 beta. What we don't want to
do is to push out more code and get back into a
whack-a-bug-as-they-are-found mode, which obviously did not serve us well
for multi-xact, and which is what releasing a beta will do, and of
course, more commit-fests, and more features.

If we have to totally stop feature development until we are all happy
with the code we have, so be it. If people feel they have to get into
cleanup mode or they will never get to add a feature to Postgres again,
so be it. If people say, heh, I am not going to do anything and just
come back when cleanup is done (by someone else), then we will end up
with a smaller but more dedicated development team, and I am fine with
that too. I am suggesting that until everyone is happy with the code we
have, we should not move forward.

I like the essence of this proposal. Two suggestions. We can't achieve or
even robustly measure "everyone is happy with the code," so let's pick
concrete exit criteria. Given criteria framed like "Files A,B,C and patches
X,Y,Z have a sign-off from a committer other than their original committer."
anyone can monitor progress and find specific ways to contribute.

I don't like the proposal, nor do I like the follow on comments made.

This whole idea of "feature development" vs reliability is bogus. It
implies people that work on features don't care about reliability. Given
the fact that many of the features are actually about increasing database
reliability in the event of crashes and corruptions it just makes no sense.

How will we participate in cleanup efforts? How do we know when something
has been "cleaned up", how will we measure our success or failure? I think
we should be clear that wasting N months on cleanup can *fail* to achieve a
useful objective. Without a clear plan it almost certainly will do so. The
flip side is that wasting N months will cause great amusement and dancing
amongst those people who wish to pull ahead of our open source project and
we should take care not to hand them a victory from an overreaction.

Lastly, the idea that we allow developers to drift away and we're OK with
that is just plain mad. I've spent a decade trying to grow the pool of
skilled developers who can assist the project. Acting against that, in deed
or just word, is highly counterproductive for the project.

Let's just take a breath and think about this.

It is normal for us to spend a month or so consolidating our work. It is
also normal for people that see major problems to call them out,
effectively using the "Stop The Line" technique.
https://leanbuilds.wordpress.com/tag/stop-the-line/

So let's do our normal things, not do a "total stop" for an indefinite
period. If someone has specific things that in their opinion need to be
addressed, list them and we can talk about doing them, together. I thought
that was what the Open Items list was for. Let's use it.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#90

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Simon Riggs (#88)

Re: [CORE] Restore-reliability mode

On Fri, Jun 5, 2015 at 2:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

Agreed. Cleanup can occur while we release code for public testing.

The code is available for public testing right now. Stamping it a
beta implies that we think it's something fairly stable that we'd be
pretty happy to release if things go well, which is a higher bar to
clear.

I can't help noticing for all the drumbeat of "let's release 9.5 beta
now", activity to clean up the items on this list seems quite
sluggish:

https://wiki.postgresql.org/wiki/PostgreSQL_9.5_Open_Items

I've seen Tom and a few other people doing some work that I would
describe as useful pre-beta stabilization, but I think there is a good
bit more that could be done, and that list is a good starting point.
I hope to have time to do some myself, but right now I am busy trying
to stabilize 9.3, along with Alvaro, Noah, Andres, and Thomas Munro,
and PGCon is coming up in just over a week. I think we could afford
to give ourselves at least until a few weeks following PGCon to tidy
up.

I do agree that an indefinite development freeze with unclear
parameters for resuming development and unclear goals is a bad plan.
But I think giving ourselves a little more time to, say, turn the
buildfarm consistently green, and, say, fix the known but
currently-unfixed multixact bugs, and, say, fix the known bugs in 9.5
features is a good plan, and I hope you and others will support it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#91

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Robert Haas (#90)

Re: [CORE] Restore-reliability mode

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Jun 5, 2015 at 2:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

Agreed. Cleanup can occur while we release code for public testing.

The code is available for public testing right now.

Only to people who have the time and ability to pull the code from git
and build from source. I don't know exactly what fraction of interested
testers that excludes, but I bet it's significant. The point of producing
packages would be to remove that barrier to testing.

Stamping it a
beta implies that we think it's something fairly stable that we'd be
pretty happy to release if things go well, which is a higher bar to
clear.

So let's call it an alpha, or some other way of setting expectations
appropriately. But I think it's silly to maintain that the code is not in
a state where end-user testing is useful. They just have to understand
that they can't trust it with production data.

I can't help noticing for all the drumbeat of "let's release 9.5 beta
now", activity to clean up the items on this list seems quite
sluggish:
https://wiki.postgresql.org/wiki/PostgreSQL_9.5_Open_Items

While we need to work on those items, I do not agree that getting that
list to empty has to happen before we release a test version. I think
serializing effort in that way is simply bad project management. And
it's not how we've operated in the past either: getting the open items
list to empty has always been understood as a prerequisite to RC versions,
not to betas.

To get to specifics instead of generalities: exactly which of the current
open items do you think is so bad that it precludes user testing? I do
not see a beta-blocker in the lot.

regards, tom lane


#92

Bruce Momjian

bruce@momjian.us

over 10 years ago

In reply to: Simon Riggs (#88)

Re: [CORE] Restore-reliability mode

On Fri, Jun 5, 2015 at 07:50:31AM +0100, Simon Riggs wrote:

On 3 June 2015 at 18:21, Josh Berkus <josh@agliodbs.com> wrote:

I would argue that if we delay 9.5 in order to do a 100% manual review
of code, without adding any new automated tests or other non-manual
tools for improving stability, then it's a waste of time; we might as
well just release the beta, and our users will find more issues than we
will. I am concerned that if we declare a cleanup period, especially in
the middle of the summer, all that will happen is that the project will
go to sleep for an extra three months.

Agreed. Cleanup can occur while we release code for public testing.

Many eyeballs of Beta beats anything we can throw at it thru manual inspection.
The whole problem of bugs is that they are mostly found by people trying to use
the software.

Please address some of the specific issues I mentioned. The problem
with the multi-xact case is that we just kept fixing bugs as people
found them, and did not do a holistic review of the code. I am saying
let's not _keep_ doing that and let's make sure we don't have any
systematic problems in our code where we just keep fixing things without
doing a thorough analysis.

To release 9.5 beta would be to get back into that cycle, and I am not
sure we are ready for that. I think the fact we have multiple people
all reviewing the multi-xact code now (and not dealing with 9.5) is a
good sign. If we were focused on 9.5 beta, I doubt this would have
happened.

I am saying let's make sure we are not deficient in other areas, then
let's move forward again. I would love to think we can do multiple
things at once, but for multi-xact, serious review didn't happen for 18
months, so if slowing release development is what is required, I support
it.

We've decided previously that having a fixed annual schedule was a good thing
for the project. Getting the features that work into the hands of the people
that want them is very important.

Yes, but let's not be a slave to the schedule if our reliability is
suffering, which it clearly has in the past 18 months.

Discussing halting the development schedule publicly is very damaging.

Agreed.

If there are features in doubt, let's do more work on them or just pull them now
and return to the schedule. I don't really care which ones get canned as long
as we return to the schedule.

Again, please address my concerns above. This is not about 9.5
features, but rather our overall focus on schedule vs. reliability, and
your arguments are reinforcing my idea that we do not have the proper
balance here.

Whatever we do must be exact and measurable. If it's not, it means we haven't
assembled enough evidence for action that is sufficiently directed to achieve
the desired goal.

Sure. I think everyone agrees the multi-xact work is all good, so I am
asking what else needs this kind of research. If there is nothing else,
we can move forward again --- I am just saying we need to ask the
reliability question _first_.

Let me restate something that has appeared in many replies to my ideas
--- I am not asking for infinite or unbounded review, but I am asking
that we make sure reliability gets the proper focus in relation to our
time pressures.  Our balance was so off a month ago that I feel only a
full stop on time pressure would allow us to refocus because people are
not good at focusing on multiple things. It is sometimes necessary to
stop everything to get people's attention, and to help them remember
that without reliability, a database is useless.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


#93

Alvaro Herrera

alvherre@2ndquadrant.com

over 10 years ago

In reply to: Michael Paquier (#87)

Re: [CORE] Restore-reliability mode

Michael Paquier wrote:

On Fri, Jun 5, 2015 at 8:53 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

In terms of specific testing improvements, things I think we need to have
covered and runnable on the buildfarm are:

* pg_dump and pg_restore testing (because it's scary we don't do this)

We do test it in some way with pg_upgrade, using the set of objects that
are not removed by the regression test suite. Extension dumps are
not covered yet, though.

We could put more emphasis on having objects of all kinds remain in the
regression database, so that the pg_upgrade test covers more of this.

What happened with the extension tests patches you submitted? They
seemed valuable to me, but I lost track.

* DDL deparse test coverage for all operations

What do you have in mind except what is already in objectaddress.sql
and src/test/modules/test_ddl_deparse/?

The current SQL scripts in that test do not cover all possible object
types, so there's a lot of the decoding capabilities that are currently
not exercised. So one way to attack this would be to add more object
types to those files. However, a completely different way is to have
the test process serial_schedule from src/test/regress and run
everything in there under deparse. That would be even more useful,
because whenever some future DDL is added, we will automatically get
coverage.

How would a test that would've caught the multixact issues look?

I have not followed those discussions closely, so I am not sure about that.

One issue with these bugs is that unless you use things such as
pg_burn_multixact, producing large enough numbers of multixacts takes a
long time. I've wondered if we could somehow make those easier to
reproduce by lowering the range, and thus doing thousands of
wraparounds, freezing and truncations in reasonable time. (For example,
change the typedefs to uint16 rather than uint32). But then the issue
becomes that the test code is not exactly equivalent to the production
code, which could cause its own bugs.
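
To illustrate why a narrower counter would reach wraparound so much sooner, here is a minimal sketch. It is not PostgreSQL code: the functions precedes and count_wraparounds are hypothetical, the comparison merely mimics the usual signed-difference test on a circular counter, and reserved values such as an invalid ID are ignored.

```python
def precedes(a, b, bits=32):
    """Circular 'a is older than b' test for counters that wrap at 2**bits."""
    mask = (1 << bits) - 1
    diff = (a - b) & mask
    # Read the masked difference as a signed number: top bit set means negative.
    return diff >= (1 << (bits - 1))


def count_wraparounds(allocations, bits):
    """Advance a counter `allocations` times and count how often it wraps to 0."""
    modulus = 1 << bits
    counter, wraps = 0, 0
    for _ in range(allocations):
        previous = counter
        counter = (counter + 1) % modulus
        # The ordering must survive every step, including the wrap itself.
        assert precedes(previous, counter, bits)
        if counter == 0:
            wraps += 1
    return wraps


print(count_wraparounds(1_000_000, bits=16))  # 15 wraparounds
print(count_wraparounds(1_000_000, bits=32))  # 0 wraparounds
```

With a 16-bit counter the same one million allocations cross the wrap point 15 times, whereas a 32-bit counter never gets there. The caveat raised above still applies: a build with narrowed types is no longer identical to the production code.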

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#94

Simon Riggs

simon@2ndQuadrant.com

over 10 years ago

In reply to: Robert Haas (#90)

Re: [CORE] Restore-reliability mode

On 5 June 2015 at 15:00, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Jun 5, 2015 at 2:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

Agreed. Cleanup can occur while we release code for public testing.

The code is available for public testing right now.

People test when they get the signal from us, not before. While what you
say is literally correct, that is not the point.

Stamping it a
beta implies that we think it's something fairly stable that we'd be
pretty happy to release if things go well, which is a higher bar to
clear.

We don't have a clear definition of what Beta means. For me, Beta has
always meant "trial software, please test".

I don't think anybody will say anything bad about us if we release a beta
and then later pull some of the features because, AFTER testing, they are
shown to be below our normal standard and we are no longer confident in
them; that will bring us credit, I feel. It is extremely common in
software development to defer some of the features if their goals aren't
met, or to change APIs and interfaces based upon user feedback.

Making decisions on what will definitely be in a release BEFORE testing and
feedback seems foolhardy and certainly not scientific.

None of this means I disagree with assessments of the current state of the
software, I'm saying that we should simply follow the normal process and
stick to the schedule we have previously agreed, for all of the reasons
cited when we agreed it.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#95

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Tom Lane (#91)

Re: [CORE] Restore-reliability mode

On Fri, Jun 5, 2015 at 10:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Jun 5, 2015 at 2:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

Agreed. Cleanup can occur while we release code for public testing.

The code is available for public testing right now.

Only to people who have the time and ability to pull the code from git
and build from source. I don't know exactly what fraction of interested
testers that excludes, but I bet it's significant. The point of producing
packages would be to remove that barrier to testing.

Sure, I agree with that.

Stamping it a
beta implies that we think it's something fairly stable that we'd be
pretty happy to release if things go well, which is a higher bar to
clear.

So let's call it an alpha, or some other way of setting expectations
appropriately. But I think it's silly to maintain that the code is not in
a state where end-user testing is useful. They just have to understand
that they can't trust it with production data.

I don't maintain that end-user testing is unuseful at this point. I
do maintain that it would be better to (1) finish fixing the known
multixact bugs and (2) clean up some of the open items before we make
a big push in that direction. For example, consider this item from
the open items list:

/messages/by-id/CAHGQGwEqWD=yNQE+ZojbpoxyWT3xLK52-V_q9S+XOfCKJd5egA@mail.gmail.com

Now this is a fundamental definitional issue about how RLS is supposed
to work. I'm not going to deny that we COULD ship a release without
deciding what the behavior should be there, but I don't think it's a
good idea. I am fine with the possibility that one of our new
features may, say, dump core someplace due to a NULL pointer dereference
we haven't found yet. Such bugs can always exist, but they are easily
fixed once found. But if we're not clear on how a feature is supposed
to behave, which seems to be the case here, I favor trying to resolve
that issue before shipping anything. Otherwise, we're saying "test
this, even though the final version will likely work differently".
That's not really helpful for us and will discourage testers from
doing anything at all.

Going through the open items, the other ones that seem to involve
definitional changes are:

1. FPW compression leaks information - The usefulness of the GUC may
depend on its PGC_*-ness. We should decide what we want to do before
asking people to test it.

2. custom-join has no way to construct Plan nodes of child Path nodes
- The entire feature is a C API, and the API needs to be changed. We
should finalize the API before asking people to test whether they can
use it for interesting things.

3. recovery_target_action = pause & hot_standby = off - Rumor has it
we replaced one surprising behavior with a different but
equally-surprising behavior. We should decide what the right thing is
and make sure the code is doing that before calling it a release.

4. Arguable RLS security bug, EvalPlanQual() paranoia - This seems
like another question of what the expectations around RLS actually
are.

I would also argue that we really ought to make a decision about
"basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe"
before we get too close to final release. Maybe it's not a
beta-blocker, exactly, but it doesn't seem like the sort of change
that should be rushed in too close to the end, because it looks sorta
complicated and scary. (Those are the technical terms.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#96

Simon Riggs

simon@2ndQuadrant.com

over 10 years ago

In reply to: Robert Haas (#90)

Re: [CORE] Restore-reliability mode

On 5 June 2015 at 15:00, Robert Haas <robertmhaas@gmail.com> wrote:

I do agree that an indefinite development freeze with unclear
parameters for resuming development and unclear goals is a bad plan.
But I think giving ourselves a little more time to, say, turn the
buildfarm consistently green, and, say, fix the known but
currently-unfixed multixact bugs, and, say, fix the known bugs in 9.5
features is a good plan, and I hope you and others will support it.

Yes, it's a good plan and I support that. That's just normal process.

If you mean we should allow that to stall the release of Beta then I
disagree. The presence of bugs clearly has nothing to do with the discovery
of new ones and we should be looking to discover as many as possible as
quickly as possible.

I can understand the argument to avoid releasing Beta because of Dev
Meeting, so we should aim for June 25th Beta 1.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#97

Andres Freund

andres@anarazel.de

over 10 years ago

In reply to: Bruce Momjian (#92)

Re: [CORE] Restore-reliability mode

On 2015-06-05 11:05:14 -0400, Bruce Momjian wrote:

To release 9.5 beta would be to get back into that cycle, and I am not
sure we are ready for that. I think the fact we have multiple people
all reviewing the multi-xact code now (and not dealing with 9.5) is a
good sign. If we were focused on 9.5 beta, I doubt this would have
happened.

At least for me, that I'm working on multixacts right now has nothing to
do with to beta or not to beta.

And I don't understand why releasing an alpha or beta would detract from
that right now. We need more people doing crazy shit with our codebase,
not fewer.

None of the master-only issues is a blocker for an alpha, so besides
some release work within the next two weeks I don't see what'd detract
us that much?

I am saying let's make sure we are not deficient in other areas, then
let's move forward again.

I don't think we actually can do that. The problem of the multixact
stuff is precisely that it looked so innocent that a bunch of
experienced people just didn't see the problem. Omniscience is easy in
hindsight.

I would love to think we can do multiple things at once, but for
multi-xact, serious review didn't happen for 18 months, so if slowing
release development is what is required, I support it.

FWIW, I can stomach a week or four of doing bugfix only stuff. After
that I'm simply not going to be efficient at that anymore. And I
seriously doubt that I'm the only one like that. Doing the same thing
for weeks makes you miss obvious stuff.

I don't think anything as localized as 'do nothing but bugfixes for a
while and then carry on' actually will solve the problem. We need to
find and reallocate resources to put more emphasis on review, robustness
and refactoring in the long term, not do panick-y stuff short term. This
isn't a problem that can be solved by focusing on bugfixing for a week
or four.

That means we have to convince employers to actually *pay* us (people
experienced with the codebase) to do work on these kind of things
instead of much-easier-to-market new features. A lot of
review/robustness work has been essentially done in our spare time,
after long days. Which means the employers need to get more people.

Sure. I think everyone agrees the multi-xact work is all good, so I am
asking what else needs this kind of research. If there is nothing else,
we can move forward again --- I am just saying we need to ask the
reliability question _first_.

I'm starting to get grumpy here. You've called for review in lots of
emails now. Let's get going then?

Greetings,

Andres Freund

#98

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Robert Haas (#95)

Re: [CORE] Restore-reliability mode

Robert Haas <robertmhaas@gmail.com> writes:

I don't maintain that end-user testing is unuseful at this point. I
do maintain that it would be better to (1) finish fixing the known
multixact bugs and (2) clean up some of the open items before we make
a big push in that direction. For example, consider this item from
the open items list:

/messages/by-id/CAHGQGwEqWD=yNQE+ZojbpoxyWT3xLK52-V_q9S+XOfCKJd5egA@mail.gmail.com

Now this is a fundamental definitional issue about how RLS is supposed
to work. I'm not going to deny that we COULD ship a release without
deciding what the behavior should be there, but I don't think it's a
good idea. I am fine with the possibility that one of our new
features may, say, dump core someplace due to a NULL pointer dereference
we haven't found yet. Such bugs can always exist, but they are easily
fixed once found. But if we're not clear on how a feature is supposed
to behave, which seems to be the case here, I favor trying to resolve
that issue before shipping anything. Otherwise, we're saying "test
this, even though the final version will likely work differently".
That's not really helpful for us and will discourage testers from
doing anything at all.

The other side of that coin is that we might get useful comments from
testers on how the feature ought to work. I don't agree with the notion
that all feature details must be graven on stone tablets before we start
trying to get feedback from people outside the core development community.

The same point applies to the FDW C API questions, or to RLS, or to the
"expanded objects" work that I did. (I'd really love it if the PostGIS
folk would try to use that sometime before it's too late to adjust the
definition...) Now, you could argue that people likely to have useful
input on those issues are fully capable of working with git tip, and you'd
probably be right, but would they do so? As Simon says nearby, publishing
an alpha/beta/whatever is our signal to the wider community that it's time
for them to start paying attention. I do not think they will look at 9.5
until we do that; and I think it'll be our loss if they don't start
looking at these things soon.

regards, tom lane

#99

Andres Freund

andres@anarazel.de

over 10 years ago

In reply to: Robert Haas (#95)

Re: [CORE] Restore-reliability mode

On 2015-06-05 11:20:52 -0400, Robert Haas wrote:

I don't maintain that end-user testing is unuseful at this point.

Unless I misunderstand you, and you're not saying that user level
testing wouldn't be helpful right now, I'm utterly baffled. There's
loads of user-exposed features that desperately need exposure.

Looking at https://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL I
don't see a single item that correlates with the ones on the open items
list. Sure, it's incomplete. But that's a lot of stuff to test
already. And the authors of those features can work on fixing the issues
coming up. Lots of those features have barely got any testing at this
point.

do maintain that it would be better to (1) finish fixing the known
multixact bugs and (2) clean up some of the open items before we make
a big push in that direction.

There's maybe 3-4 people that can actually do something about the
existing issues on that list. The community is far bigger than
that. Right now everyone is sitting on the sidelines and twiddling their
thumbs or developing new stuff. At least that's my impression.

2. custom-join has no way to construct Plan nodes of child Path nodes
- The entire feature is a C API, and the API needs to be changed. We
should finalize the API before asking people to test whether they can
use it for interesting things.

I think any real world exposure of that API will result in much larger
changes than that.

3. recovery_target_action = pause & hot_standby = off - Rumor has it
we replaced one surprising behavior with a different but
equally-surprising behavior. We should decide what the right thing is
and make sure the code is doing that before calling it a release.

Fujii pushed the bugfix, restoring the old behaviour afaics. It's imo
still crazy, but at this point it doesn't look like a 9.5 discussion.

4. Arguable RLS security bug, EvalPlanQual() paranoia - This seems
like another question of what the expectations around RLS actually
are.

In the end that's minor from the end user's perspective.

I would also argue that we really ought to make a decision about
"basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe"
before we get too close to final release. Maybe it's not a
beta-blocker, exactly, but it doesn't seem like the sort of change
that should be rushed in too close to the end, because it looks sorta
complicated and scary. (Those are the technical terms.)

Yea, I'd really like to get that in at some point. I'll work on it as
soon as I've finished the multixact truncation thingy.

Greetings,

Andres Freund

#100

Bruce Momjian

bruce@momjian.us

over 10 years ago

In reply to: Andres Freund (#97)

Re: [CORE] Restore-reliability mode

On Fri, Jun 5, 2015 at 05:36:41PM +0200, Andres Freund wrote:

I don't think anything as localized as 'do nothing but bugfixes for a
while and then carry on' actually will solve the problem. We need to
find and reallocate resources to put more emphasis on review, robustness
and refactoring in the long term, not do panick-y stuff short term. This
isn't a problem that can be solved by focusing on bugfixing for a week
or four.

Fine. We just need that refocus, and people usually can't refocus while
they are worried about other pressures, e.g. time --- it's like trying to
adjust the GPS while driving --- not easy.

That means we have to convince employers to actually *pay* us (people
experienced with the codebase) to do work on these kind of things
instead of much-easier-to-market new features. A lot of
review/robustness work has been essentially done in our spare time,
after long days. Which means the employers need to get more people.

Agreed --- that is a serious long-term need.

Sure. I think everyone agrees the multi-xact work is all good, so I am
asking what else needs this kind of research. If there is nothing else,
we can move forward again --- I am just saying we need to ask the
reliability question _first_.

I'm starting to get grumpy here. You've called for review in lots of
emails now. Let's get going then?

I really don't know. If people say we don't have anything like
multi-xact that we have avoided, then I have no further concerns. I am
asking that such decisions be made independent of external time
pressures.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

#101

Simon Riggs

simon@2ndQuadrant.com

over 10 years ago

In reply to: Bruce Momjian (#92)

Re: [CORE] Restore-reliability mode

On 5 June 2015 at 16:05, Bruce Momjian <bruce@momjian.us> wrote:

Please address some of the specific issues I mentioned.

I can discuss them but not because I am involved directly. I take
responsibility as a committer and have an interest from that perspective.

In my role at 2ndQuadrant, I approved all of the time Alvaro and Andres
spent on submitting, reviewing and fixing bugs - at this point that has
cost something close to fifty thousand dollars just on this feature and
subsequent actions. (I believe the feature was originally funded, but we
never saw a penny of that, though others did.)

The problem
with the multi-xact case is that we just kept fixing bugs as people
found them, and did not do a holistic review of the code.

I observed much discussion and review. The bugs we've had have all been
fairly straightforwardly fixed. There haven't been any design-level
oversights or head-palm moments. It's complex software that had complex
behaviour that caused problems. The problem has been that anything on-disk
causes more problems when errors occur. We should review carefully anything
that alters the way on-disk structures work, like the WAL changes, UPSERT's
new mechanism, etc.

From my side, it is only recently I got some clear answers to my questions
about how it worked. I think it is very important that major features have
extensive README type documentation with them so the underlying principles
used in the development are clear. I would define the measure of a good
feature as whether another committer can read the code comments and get a
good feel. A bad feature is one where committers walk away from it, saying
I don't really get it and I can't read an explanation of why it does that.
Tom's most significant contribution is his long descriptive comments on
what the problem is that needs to be solved, the options and the method
chosen. Clarity of thought is what solves bugs.

Overall, I don't see the need to stop the normal release process and do a
holistic review. But I do think we should check each feature to see whether
it is fully documented or whether we are simply trusting one of us to be
around to fix it.

I am just saying we need to ask the
reliability question _first_.

Agreed

Let me restate something that has appeared in many replies to my ideas
--- I am not asking for infinite or unbounded review, but I am asking
that we make sure reliability gets the proper focus in relation to our
time pressures.  Our balance was so off a month ago that I feel only a
full stop on time pressure would allow us to refocus because people are
not good at focusing on multiple things. It is sometimes necessary to
stop everything to get people's attention, and to help them remember
that without reliability, a database is useless.

Here, I think we are talking about different types of reliability.
PostgreSQL software is well ahead of most industry measures of quality;
these recent bugs have done nothing to damage that, other than a few people
woke up and said "Wow! Postgres had a bug??!?!?". The presence of bugs is
common and if we have grown unused to them, we should be wary of that,
though not tolerant.

PostgreSQL is now reliable in the sense that we have many features that
ensure availability even in the face of software problems and bug induced
corruption. Those have helped us get out of the current situations, giving
users a workaround while bugs are fixed. So the impact of database software
bugs is not what it once was.

Reliable delivery of new versions of software is important too. New
versions often contain new features that fix real world problems, just as
much as bug fixes do, hence why I don't wish to divert from the normal
process and schedule.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#102

Jim Nasby

Jim.Nasby@BlueTreble.com

over 10 years ago

In reply to: Michael Paquier (#87)

Re: [CORE] Restore-reliability mode

On 6/4/15 11:28 PM, Michael Paquier wrote:
<list of things to test>
* More configuration variations; ./configure, initdb options, and *.conf
* More edge-case testing. (ie: what happens to each varlena as it
approaches 1GB? 1B tables test. Etc.)
* More race-condition testing, like the tool Peter used heavily during
ON CONFLICT development (written by Jeff Janes?)
* More non-SQL testing. For example, the logic in HeapTupleSatisfies* is
quite complicated yet there's no tests dedicated to ensuring the logic
is correct because it'd be extremely difficult (if not impossible) to
construct those tests at a SQL level. Testing them with direct test
calls to HeapTupleSatisfies* wouldn't be difficult, but we have no
machinery to do C level testing.

Is pg_tap a reasonable starting point for this sort of testing?

IMO, using the TAP machinery would be a good base for that. What is lacking
is a basic set of Perl routines that one can easily use to set up test
scenarios.

I think Stephen was referring specifically to pgTap (http://pgtap.org/).

Isn't our TAP framework just different output from pg_regress? Is there
documentation on our TAP stuff?

How would a test that would've caught the multixact issues look?

I have not followed closely those discussions, not sure about that.

I've thought about this and unfortunately I think this may be a scenario
that's just too complex to completely protect against with a test. What
might help though is having better testing of edge cases (such as MXID
wrap) and then combining that with other forms of testing, such as
pg_upgrade and streaming rep. testing. Test things like "What happens if
we pg_upgrade a cluster that's in danger of wraparound?"
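
[Editorial sketch, not part of the original message.] One concrete form of
"edge cases such as MXID wrap" is checking that ordering still behaves
sensibly across the 32-bit wrap point. The snippet below assumes the usual
modulo-2^32 circular comparison for 32-bit counters. The name `precedes` is
hypothetical, and the sketch ignores any reserved or invalid ID values, so
treat it as an illustration of the kind of boundary test meant here rather
than as PostgreSQL's actual code.

```python
MASK32 = 0xFFFFFFFF  # IDs are modeled as unsigned 32-bit counters


def precedes(a: int, b: int) -> bool:
    """Return True if counter a is logically before b, modulo 2**32.

    The difference is taken modulo 2**32 and read as a signed 32-bit
    value; a negative result means a comes before b.
    """
    diff = (a - b) & MASK32
    return diff >= 0x80000000  # sign bit set, i.e. negative as int32


# Boundary cases around the wrap point (values chosen for illustration).
edge_cases = [
    (1, 2, True),                # ordinary ordering
    (2, 1, False),
    (7, 7, False),               # an ID never precedes itself
    (0xFFFFFFF0, 5, True),       # just before wrap precedes just after wrap
    (5, 0xFFFFFFF0, False),
]

for a, b, expected in edge_cases:
    assert precedes(a, b) == expected, (hex(a), hex(b))
```

A table of cases like this is cheap to extend, which is the point: the
interesting inputs sit right at the wrap boundary, not in the middle of the
range.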
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com

#103

Jim Nasby

Jim.Nasby@BlueTreble.com

over 10 years ago

In reply to: Tom Lane (#98)

Re: [CORE] Restore-reliability mode

On 6/5/15 10:39 AM, Tom Lane wrote:

The other side of that coin is that we might get useful comments from
testers on how the feature ought to work. I don't agree with the notion
that all feature details must be graven on stone tablets before we start
trying to get feedback from people outside the core development community.

The same point applies to the FDW C API questions, or to RLS, or to the
"expanded objects" work that I did. (I'd really love it if the PostGIS
folk would try to use that sometime before it's too late to adjust the
definition...) Now, you could argue that people likely to have useful
input on those issues are fully capable of working with git tip, and you'd
probably be right, but would they do so? As Simon says nearby, publishing
an alpha/beta/whatever is our signal to the wider community that it's time
for them to start paying attention. I do not think they will look at 9.5
until we do that; and I think it'll be our loss if they don't start
looking at these things soon.

+1, but I also think we should have a better mechanism for soliciting
user input on these things while design discussions are happening. ISTM
that there's a lot of hand-waving that happens around use cases that
could probably be clarified with end user input.

FWIW, I don't think the blocker here is git or building from source. If
someone has that amount of time to invest it's not much different than
grabbing a tarball.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com

#104

Alvaro Herrera

alvherre@2ndquadrant.com

over 10 years ago

In reply to: Simon Riggs (#94)

Re: [CORE] Restore-reliability mode

Simon Riggs wrote:

On 5 June 2015 at 15:00, Robert Haas <robertmhaas@gmail.com> wrote:

Stamping it a beta implies that we think it's something fairly
stable that we'd be pretty happy to release if things go well, which
is a higher bar to clear.

We don't have a clear definition of what Beta means. For me, Beta has
always meant "trial software, please test".

I think that definition *is* the problem, actually. To me, "beta" means
"trial software, please test, but final product will be very similar to
what you see here". What we need to convey at this point is what you
said, but I think a better word for that is "alpha". There may be more
mobility in there than in a beta, in users' perception, which is the
right impression we want to convey.

Another point is that historically, once we've released a beta, we're
pretty reluctant to bump catversion. We're not ready for that at this
stage, which is one criterion that suggests to me that we're not ready
for beta.

So I think the right thing to do at this point is to get an alpha out,
shortly after releasing upcoming minors.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#105

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Simon Riggs (#94)

Re: [CORE] Restore-reliability mode

On Fri, Jun 5, 2015 at 11:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

We don't have a clear definition of what Beta means. For me, Beta has always
meant "trial software, please test".

I don't think anybody will say anything bad about us if we release a beta
and then later pull some of the features because we are not confident with
them when AFTER testing the feature is shown to be below our normal
standard; that will bring us credit, I feel. It is extremely common in
software development to defer some of the features if their goals aren't
met, or to change APIs and interfaces based upon user feedback.

Yeah, but we usually haven't. Tom, for example, has previously not
wanted to even bump catversion after beta1, which rules out a huge
variety of possible fixes and interface changes. If we want to make a
policy decision to change our approach, we should be up-front about
that.

None of this means I disagree with assessments of the current state of the
software, I'm saying that we should simply follow the normal process and
stick to the schedule we have previously agreed, for all of the reasons
cited when we agreed it.

Well, to my way of looking at it, our feature freeze was later this
year than it has been in the past, so our beta will be later, too. If
we want to stick with the schedule, we have to do that throughout.
Our typical schedule has been a two-month final CommitFest starting on
January 15th. This year we had a three month final CommitFest
starting on February 15th. So we finished the last CommitFest two
months later than has been typical.

Typically our beta has been in early May, 1-2 months after the end of
the last CommitFest. If you add the same two months to that, you get
early July, which sounds reasonable, rather than early June, which
sounds rushed, especially since we have an urgent need to get minor
releases out the door to fix critical stability bugs right now, and
then we have PGCon, during which nobody's going to be looking at
anything.

It sounds to me like the original plan was to put out a beta in early
June, which would have been fine if we'd stuck to the traditional
2-month final CommitFest. But we didn't.
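
[Editorial sketch, not part of the original message.] The schedule
arithmetic above can be checked mechanically. The dates below come from the
message, except that "early May" is modeled as May 1 and the "typical" year
is written as 2015 purely so the two schedules can be compared; both are
assumptions, and `add_months` is a hypothetical helper.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, keeping the day of month.

    Only safe for days that exist in every month (day <= 28), which holds
    for the dates used here.
    """
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, d.day)


# Typical cycle: two-month final CommitFest starting January 15.
typical_end = add_months(date(2015, 1, 15), 2)

# This cycle: three-month final CommitFest starting February 15.
actual_end = add_months(date(2015, 2, 15), 3)

# How much later the final CommitFest ended, in whole months.
slip_months = (actual_end.year - typical_end.year) * 12 + (
    actual_end.month - typical_end.month
)

# Typical beta is "early May" (assumed: May 1); apply the same slip.
shifted_beta = add_months(date(2015, 5, 1), slip_months)

print(typical_end, actual_end, slip_months, shifted_beta)
```

Under those assumptions the slip is two months and the shifted beta date
lands at the start of July, which matches the reasoning in the message.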

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#106

Josh Berkus

josh@agliodbs.com

over 10 years ago

In reply to: Robert Haas (#23)

Re: [CORE] Restore-reliability mode

On 06/05/2015 07:23 AM, Tom Lane wrote:

So let's call it an alpha, or some other way of setting expectations
appropriately. But I think it's silly to maintain that the code is not in
a state where end-user testing is useful. They just have to understand
that they can't trust it with production data.

Yes ... that seems like a good compromise.

Frankly, I'm testing 9.5 already; having alpha packages would make that
testing easier for me, and maybe possible for others.

We'd need to take into account that our packagers are a bit overworked
this month due to update releases ...

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

#107

Peter Geoghegan

pg@heroku.com

over 10 years ago

In reply to: Andres Freund (#99)

Re: [CORE] Restore-reliability mode

On Fri, Jun 5, 2015 at 8:51 AM, Andres Freund <andres@anarazel.de> wrote:

4. Arguable RLS security bug, EvalPlanQual() paranoia - This seems
like another question of what the expectations around RLS actually
are.

In the end that's minor from the end user's perspective.

I think that depends on what we ultimately decide to do about it,
which is something that I have yet to form an opinion on (although I
know we need to document the issue, at the very least). For example,
one idea that Stephen and I discussed privately was making security
barrier quals referencing other relations lock the referenced rows.
This was an informal throwing around of ideas, but it's possible that
something like that could end up happening.

--
Peter Geoghegan

#108

Peter Geoghegan

pg@heroku.com

over 10 years ago

In reply to: Robert Haas (#90)

Re: [CORE] Restore-reliability mode

On Fri, Jun 5, 2015 at 7:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:

I do agree that an indefinite development freeze with unclear
parameters for resuming development and unclear goals is a bad plan.
But I think giving ourselves a little more time to, say, turn the
buildfarm consistently green, and, say, fix the known but
currently-unfixed multixact bugs, and, say, fix the known bugs in 9.5
features is a good plan, and I hope you and others will support it.

FWIW, I have 3 pending bug fixes for UPSERT. While those are pretty
benign issues, I'd be annoyed if they didn't get into the first 9.5
beta (or alpha, even).

--
Peter Geoghegan

#109

Bruce Momjian

bruce@momjian.us

over 10 years ago

In reply to: Simon Riggs (#101)

Re: [CORE] Restore-reliability mode

On Fri, Jun 5, 2015 at 04:54:56PM +0100, Simon Riggs wrote:

On 5 June 2015 at 16:05, Bruce Momjian <bruce@momjian.us> wrote:

Please address some of the specific issues I mentioned.

I can discuss them but not because I am involved directly. I take
responsibility as a committer and have an interest from that perspective.

In my role at 2ndQuadrant, I approved all of the time Alvaro and Andres spent
on submitting, reviewing and fixing bugs - at this point that has cost
something close to fifty thousand dollars just on this feature and subsequent
actions. (I believe the feature was originally funded, but we never saw a penny
of that, though others did.)

Yes, the burden has fallen heavily on Alvaro. I personally am concerned
that many people were focusing on 9.5 rather than helping him. I think
that was a mistake on our part and we need to take reliability problems
more seriously.

What has also concerned me is that there are so many 9.3/9.4 bugs in
this area that few of us can even understand what was fixed when, and we
are then having problems figuring out what bugs were present when
analyzing bug reports. pg_upgrade has made this worse by allowing
multi-xact bugs to propagate across major versions, and pg_upgrade had
some multi-xact bugs of its own in early 9.3 releases. :-(

The problem
with the multi-xact case is that we just kept fixing bugs as people
found them, and did not do a holistic review of the code.

I observed much discussion and review. The bugs we've had have all been fairly
straightforwardly fixed. There haven't been any design-level oversights or
head-palm moments. It's complex software that had complex behaviour that caused
problems. The problem has been that anything on-disk causes more problems when
errors occur. We should review carefully anything that alters the way on-disk
structures work, like the WAL changes, UPSERT's new mechanism, etc.

Agreed. However, I think a thorough review early on could have caught
many of these bugs before they were reported by users. As proof, even
in the past few weeks, review is finding bugs before they are found by
users.

From my side, it is only recently I got some clear answers to my questions
about how it worked. I think it is very important that major features have
extensive README type documentation with them so the underlying principles used
in the development are clear. I would define the measure of a good feature as
whether another committer can read the code comments and get a good feel. A bad
feature is one where committers walk away from it, saying I don't really get it
and I can't read an explanation of why it does that. Tom's most significant
contribution is his long descriptive comments on what the problem is that needs
to be solved, the options and the method chosen. Clarity of thought is what
solves bugs.

Yes, I think we should have done that early-on for multi-xact, and I am
hopeful we will learn to do that more often when complex features are
implemented, or when we identify areas that are more complex than we
thought.

Overall, I don't see the need to stop the normal release process and do a
holistic review. But I do think we should check each feature to see whether it
is fully documented or whether we are simply trusting one of us to be around to
fix it.

Agreed. We just need to be honest that we are doing what we need for
reliability and not allow schedule and feature pressure to cause us to
skimp in this area.

I am just saying we need to ask the
reliability question _first_.

Agreed

Let me restate something that has appeared in many replies to my ideas
--- I am not asking for infinite or unbounded review, but I am asking
that we make sure reliability gets the proper focus in relation to our
time pressures. Our balance was so off a month ago that I feel only a
full stop on time pressure would allow us to refocus because people are
not good at focusing on multiple things. It is sometimes necessary to
stop everything to get people's attention, and to help them remember
that without reliability, a database is useless.

Here, I think we are talking about different types of reliability. PostgreSQL
software is well ahead of most industry measures of quality; these recent bugs
have done nothing to damage that, other than a few people woke up and said
"Wow! Postgres had a bug??!?!?". The presence of bugs is common and if we have
grown unused to them, we should be wary of that, though not tolerant.

In going over the 9.5 commits, I was struck by a high volume of cleanups
and fixes, which is good.

PostgreSQL is now reliable in the sense that we have many features that ensure
availability even in the face of software problems and bug induced corruption.
Those have helped us get out of the current situations, giving users a
workaround while bugs are fixed. So the impact of database software bugs is not
what it once was.

Uh, yes, we have avoided the worst of the impact from these bugs. In my
understanding, each bug has X% chance of being serious, and you might go
for a long time before a serious bug is created, but the more bugs we
have, the more likely that one will be serious. The _volume_ of multi-xact
bugs should have triggered a review much sooner.
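
[Editorial sketch, not part of the original message.] The "X% chance per
bug" argument can be made concrete. If each bug independently has
probability p of being serious (independence is an assumption added here,
as is the function name), then the chance that at least one of n bugs is
serious is 1 - (1 - p)^n, which rises steadily with n.

```python
def chance_of_serious_bug(n_bugs: int, p_serious: float) -> float:
    """Probability that at least one of n_bugs is serious.

    Assumes each bug is serious independently with probability p_serious.
    """
    return 1.0 - (1.0 - p_serious) ** n_bugs


# Illustrative numbers only: a 5% per-bug chance across growing bug counts.
for n in (1, 5, 10, 20):
    print(n, round(chance_of_serious_bug(n, 0.05), 3))
```

With an illustrative 5% per-bug chance, ten bugs already give roughly a 40%
chance that at least one is serious, which is the sense in which sheer
volume is itself a warning sign.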

People think I want to stop feature development to review. What I am
saying is that we need to stop development so we can be honest about
whether we need review, and where. It is hard to be honest when time
and feature pressure are on you. It shouldn't take long to make that
decision as a group.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

#110

Michael Paquier

michael.paquier@gmail.com

over 10 years ago

In reply to: Alvaro Herrera (#93)

Re: [CORE] Restore-reliability mode

On Sat, Jun 6, 2015 at 12:05 AM, Alvaro Herrera wrote:

Michael Paquier wrote:
What happened with the extension tests patches you submitted? They
seemed valuable to me, but I lost track.

Those ones are registered in the queue of 9.6:
https://commitfest.postgresql.org/5/187/
And this is the latest patch:
/messages/by-id/CAB7nPqSQr1UjZ1h8=be1wBq3mMdmM38nrjBKvBJuM--tTTY=EA@mail.gmail.com
This patch extends prove_check by giving the possibility for a given
utility using t/ to add extra modules in t/extra that will be
installed and usable for its regression tests. This becomes more
interesting considering as well that pg_upgrade could be switched to
use the TAP infrastructure, where we could have modules dedicated to
only the tests of pg_upgrade (supporting TAP tests on Windows is a
necessary condition though before switching pg_upgrade).
--
Michael


#111

Simon Riggs

simon@2ndQuadrant.com

over 10 years ago

In reply to: Alvaro Herrera (#104)

Re: [CORE] Restore-reliability mode

On 5 June 2015 at 17:20, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Simon Riggs wrote:

On 5 June 2015 at 15:00, Robert Haas <robertmhaas@gmail.com> wrote:

Stamping it a beta implies that we think it's something fairly
stable that we'd be pretty happy to release if things go well, which
is a higher bar to clear.

We don't have a clear definition of what Beta means. For me, Beta has
always meant "trial software, please test".

I think that definition *is* the problem, actually. To me, "beta" means
"trial software, please test, but final product will be very similar to
what you see here". What we need to convey at this point is what you
said, but I think a better word for that is "alpha". There may be more
mobility in there than in a beta, in users' perception, which is the
right impression we want to convey.

Another point is that historically, once we've released a beta, we're
pretty reluctant to bump catversion. We're not ready for that at this
stage, which is one criterion that suggests to me that we're not ready
for beta.

So I think the right thing to do at this point is to get an alpha out,
shortly after releasing upcoming minors.

OK, I can get behind that.

My only additional point is that it is a good idea to release an Alpha
every time, not just this release.

And if it's called Alpha, let's release it immediately. We can allow Alpha1,
Alpha2 as needed, plus we allow catversion and file format changes between
Alpha versions.

Proposed definitions

Alpha: This is trial software please actively test and report bugs. Your
feedback is sought on usability and performance, which may result in
changes to the features included here. Not all known issues have been
resolved but work continues on resolving them. Multiple Alpha versions may
be released before we move to Beta. We reserve the right to change internal
API definitions, file formats and increment the catalog version between
Alpha versions and Beta, so we do not guarantee an easy upgrade path from
this version to later versions of this release.

Beta: This is trial software please actively test and report bugs and
performance issues. Multiple Beta versions may be released before we move
to Release Candidate. We will attempt to maintain APIs, file formats and
catversions.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#112

Gavin Flower

GavinFlower@archidevsys.co.nz

over 10 years ago

In reply to: Simon Riggs (#111)

Re: [CORE] Restore-reliability mode

On 06/06/15 21:07, Simon Riggs wrote:

On 5 June 2015 at 17:20, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Simon Riggs wrote:

On 5 June 2015 at 15:00, Robert Haas <robertmhaas@gmail.com> wrote:

Stamping it a beta implies that we think it's something fairly
stable that we'd be pretty happy to release if things go well, which
is a higher bar to clear.

We don't have a clear definition of what Beta means. For me, Beta has
always meant "trial software, please test".

I think that definition *is* the problem, actually. To me, "beta" means
"trial software, please test, but final product will be very similar to
what you see here". What we need to convey at this point is what you
said, but I think a better word for that is "alpha". There may be more
mobility in there than in a beta, in users' perception, which is the
right impression we want to convey.

Another point is that historically, once we've released a beta, we're
pretty reluctant to bump catversion. We're not ready for that at this
stage, which is one criterion that suggests to me that we're not ready
for beta.

So I think the right thing to do at this point is to get an alpha out,
shortly after releasing upcoming minors.

OK, I can get behind that.

My only additional point is that it is a good idea to release an Alpha
every time, not just this release.

And if it's called Alpha, let's release it immediately. We can allow
Alpha1, Alpha2 as needed, plus we allow catversion and file format
changes between Alpha versions.

Proposed definitions

Alpha: This is trial software please actively test and report bugs.
Your feedback is sought on usability and performance, which may result
in changes to the features included here. Not all known issues have
been resolved but work continues on resolving them. Multiple Alpha
versions may be released before we move to Beta. We reserve the right
to change internal API definitions, file formats and increment the
catalog version between Alpha versions and Beta, so we do not
guarantee an easy upgrade path from this version to later versions of
this release.

Beta: This is trial software please actively test and report bugs and
performance issues. Multiple Beta versions may be released before we
move to Release Candidate. We will attempt to maintain APIs, file
formats and catversions.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

As a 'user' I am very happy with the idea of having Alphas; it gives me a
feeling that there will be less chance of problems being released in the
final version.

Not only does it give more chances to test, but it might also encourage
more people to get involved in contributing, either ideas for minor
tweaks or simple patches etc. (since the release is not quite finished,
there is an expectation that minor functional changes have a possibility
of being accepted for the version, if there is sufficient merit).

Cheers,
Gavin


#113

Magnus Hagander

magnus@hagander.net

over 10 years ago

In reply to: Simon Riggs (#111)

Re: [CORE] Restore-reliability mode

On Sat, Jun 6, 2015 at 11:07 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

On 5 June 2015 at 17:20, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Simon Riggs wrote:

On 5 June 2015 at 15:00, Robert Haas <robertmhaas@gmail.com> wrote:

Stamping it a beta implies that we think it's something fairly
stable that we'd be pretty happy to release if things go well, which
is a higher bar to clear.

We don't have a clear definition of what Beta means. For me, Beta has
always meant "trial software, please test".

I think that definition *is* the problem, actually. To me, "beta" means
"trial software, please test, but final product will be very similar to
what you see here". What we need to convey at this point is what you
said, but I think a better word for that is "alpha". There may be more
mobility in there than in a beta, in users' perception, which is the
right impression we want to convey.

Another point is that historically, once we've released a beta, we're
pretty reluctant to bump catversion. We're not ready for that at this
stage, which is one criterion that suggests to me that we're not ready
for beta.

So I think the right thing to do at this point is to get an alpha out,
shortly after releasing upcoming minors.

OK, I can get behind that.

My only additional point is that it is a good idea to release an Alpha
every time, not just this release.

And if it's called Alpha, let's release it immediately. We can allow Alpha1,
Alpha2 as needed, plus we allow catversion and file format changes between
Alpha versions.

If I'm not mistaken, we (Simon and me) actually discussed something else
along this line a while ago that might be worth considering. That is, maybe
we should consider time-based alpha releases. That is, we can just decide
"we wrap an alpha every other Monday until we think we are good to go with
beta". The reason for that is to get much quicker iteration on bugfixes,
which would encourage people to use and test these versions. Report a bug
and if it was easy enough to fix, you have a wrapped release with the fix
in 2 weeks tops.
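
To make the "every other Monday" cadence concrete, here is a minimal sketch (an editorial illustration, not something proposed in the thread) that lists wrap dates two weeks apart. The function name, the starting Monday, and the count are assumptions chosen for the example, and the `-d` option requires GNU date.

```shell
# Hypothetical helper: print COUNT wrap dates, 14 days apart,
# starting from START (YYYY-MM-DD). Requires GNU date for -d.
alpha_wrap_dates() {
    start=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        # -u avoids daylight-saving shifts when adding whole days.
        date -u -d "$start +$((i * 14)) days" +%F
        i=$((i + 1))
    done
}

# Illustrative schedule: three alphas starting Monday 2015-06-08.
alpha_wrap_dates 2015-06-08 3
```

A cron job or release script could call something like this to decide whether a given Monday is a wrap day; how the wrap itself would be automated is left open in the thread.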

This would require that we can (at least mostly) automate the wrapping of
an alpha release, but I'm pretty sure we can solve that problem. We can
also, I think, get away with doing the release notes for an alpha just as
a wiki page and a lot less formal than others, meaning we don't need to
hold up any process for that.

Package availability would depend on platform. For those platforms where
package building is more or less entirely automatic already, this could
probably also be easily automated. And for those that take a lot more work,
such as the Windows installers, we could just go with wrapping every other
or every third alpha. As this is not a production release, I don't see why
we'd need to hold some back to cover for the rest.

Proposed definitions

Alpha: This is trial software please actively test and report bugs. Your
feedback is sought on usability and performance, which may result in
changes to the features included here. Not all known issues have been
resolved but work continues on resolving them. Multiple Alpha versions may
be released before we move to Beta. We reserve the right to change internal
API definitions, file formats and increment the catalog version between
Alpha versions and Beta, so we do not guarantee an easy upgrade path from
this version to later versions of this release.

Beta: This is trial software please actively test and report bugs and
performance issues. Multiple Beta versions may be released before we move
to Release Candidate. We will attempt to maintain APIs, file formats and
catversions.

These sound like good definitions. Might add to the beta one something like
"whilst we will try to avoid it, pg_upgrade may be required between betas
and from beta to rc versions".

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#114

Devrim GÜNDÜZ

devrim@gunduz.org

over 10 years ago

In reply to: Magnus Hagander (#113)

Re: [CORE] Restore-reliability mode

Hi,

On Sat, 2015-06-06 at 12:15 +0200, Magnus Hagander wrote:

If I'm not mistaken, we (Simon and me) actually discussed something
else along this line a while ago that might be worth considering. That
is, maybe we should consider time-based alpha releases. That is, we
can just decide "we wrap an alpha every other Monday until we think we
are good to go with beta". The reason for that is to get much quicker
iteration on bugfixes, which would encourage people to use and test
these versions. Report a bug and if it was easy enough to fix, you
have a wrapped release with the fix in 2 weeks tops.

+1.

Package availability would depend on platform. For those platforms
where package building is more or less entirely automatic already,
this could probably also be easily automated.

When we used to release more alphas years ago, I was releasing Alpha
RPMs for many platforms. I'll do it again if we keep doing it.

Regards,

--
Devrim GÜNDÜZ
Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR


#115

Geoff Winkless

pgsqladmin@geoff.dj

over 10 years ago

In reply to: Devrim GÜNDÜZ (#114)

Re: [CORE] Restore-reliability mode

To play devil's advocate for a moment, is there anyone who would genuinely
be prepared to download and install an alpha release who would not already
have downloaded one of the nightlies? I only ask because I assume that
releasing an alpha is not zero-developer-cost and I don't believe that
there's a large number of people who *would* be happy to install something
that's described as being buggy and subject to change but are put off by
having to type "configure" and "make".

Further, it seems to me that the number of people who won't roll their own
who are useful as bug-finders is even smaller.

I get the feeling that the argument appears to be "Bruce doesn't want to
release a beta, Simon wants to release something. Let's release an alpha
because it's sort-of half way in between" as a consensus compromise (I'm
not deliberately picking on specific people, I'm aware you're not the only
two involved and arguing for either side, but you do seem to be fairly
polar opposite sides of the argument :) ); I don't really believe that
releasing an alpha moves anything further forward from a testing point of
view, and I'm fairly sure that it will have just as deleterious an effect
on bugfixing as would a beta, with the added disadvantage of the extra
developer cost.

Geoff

#116

Sehrope Sarkuni

sehrope@jackdb.com

over 10 years ago

In reply to: Geoff Winkless (#115)

Re: [CORE] Restore-reliability mode

On Sat, Jun 6, 2015 at 6:47 AM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:

To play devil's advocate for a moment, is there anyone who would genuinely be prepared to download
and install an alpha release who would not already have downloaded one of the nightlies? I only ask
because I assume that releasing an alpha is not zero-developer-cost and I don't believe that
there's a large number of people who would be happy to install something that's described as being
buggy and subject to change but are put off by having to type "configure" and "make".

I fit into that category and I would guess there would be others as
well. Having system packages available via an "apt-get install ..."
lowers the bar significantly to try things out.

As an example, I installed the 9.4 beta as soon as it was available to
run a smoke test and try out some of the new jsonb features. I'll be
doing the same with a 9.5 alpha/beta (or whatever it's called), for
both similar testing and to try out UPSERT.

It's much easier to work into dev/test setups if there are system
packages as it's just a config change to an existing script. Building
from source would require a whole new workflow that I don't have time
to incorporate.

Further, it seems to me that the number of people who won't roll their own who are useful as bug-finders is even smaller.

That's probably true but they definitely won't find any bugs if they
don't test at all.

If it's possible to have automated packaging, even for just a subset
of platforms, I think that'd be useful.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/


#117

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Geoff Winkless (#115)

Re: [CORE] Restore-reliability mode

On Sat, Jun 6, 2015 at 6:47 AM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:

To play devil's advocate for a moment, is there anyone who would genuinely
be prepared to download and install an alpha release who would not already
have downloaded one of the nightlies? I only ask because I assume that
releasing an alpha is not zero-developer-cost and I don't believe that
there's a large number of people who would be happy to install something
that's described as being buggy and subject to change but are put off by
having to type "configure" and "make".

This is pretty much why Peter Eisentraut gave up on doing alphas after
the 9.1 cycle.

Admittedly, what is being proposed here is somewhat different.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#118

Geoff Winkless

pgsqladmin@geoff.dj

over 10 years ago

In reply to: Sehrope Sarkuni (#116)

Re: [CORE] Restore-reliability mode

On 6 June 2015 at 13:41, Sehrope Sarkuni <sehrope@jackdb.com> wrote:

On Sat, Jun 6, 2015 at 6:47 AM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:

To play devil's advocate for a moment, is there anyone who would
genuinely be prepared to download and install an alpha release who would
not already have downloaded one of the nightlies? I only ask because I
assume that releasing an alpha is not zero-developer-cost and I don't
believe that there's a large number of people who would be happy to
install something that's described as being buggy and subject to change
but are put off by having to type "configure" and "make".

I fit into that category and I would guess there would be others as
well. Having system packages available via an "apt-get install ..."
lowers the bar significantly to try things out.

But it also lowers the bar to the extent that you get the people who won't
read the todo list and end up complaining about the things that everyone
already knows about.

It's much easier to work into dev/test setups if there are system
packages as it's just a config change to an existing script. Building
from source would require a whole new workflow that I don't have time
to incorporate.

Really? You genuinely don't have time to paste, say:

mkdir -p ~/src/pgdevel
cd ~/src/pgdevel
wget https://ftp.postgresql.org/pub/snapshot/dev/postgresql-snapshot.tar.bz2
tar xjf postgresql-snapshot.tar.bz2
mkdir bld
cd bld
../postgresql-9.5devel/configure $(pg_config --configure | sed -e 's/\(pg\|postgresql[-\/]\)\(doc-\)\?9\.[0-9]*\(dev\)\?/\1\29.5dev/g')
make world
make check
make world-install

and yet you think you have enough time to provide more than a "looks like
it's working" report to the developers?

(NB the sed for the pg_config line will probably need work, it looks like
it should work on the two types of system I have here but I have to admit I
changed the config line manually when I built it)

Further, it seems to me that the number of people who won't roll their
own who are useful as bug-finders is even smaller.

That's probably true but they definitely won't find any bugs if they
don't test at all.

If it's possible to have automated packaging, even for just a subset
of platforms, I think that'd be useful.

Well yes, automated packaging of the nightly build, that doesn't involve
the developers having to stop what they're doing to write official alpha
release docs or any of the other stuff that goes along with doing a
release, would be zero-impact on development (assuming the developers
didn't have to build or maintain the auto-packager) and therefore any
return (however small) would make it worthwhile.

Fancy building (and maintaining) the auto-packaging system, and managing a
mailing list for its users?

Geoff

#119

Sehrope Sarkuni

sehrope@jackdb.com

over 10 years ago

In reply to: Geoff Winkless (#118)

Re: [CORE] Restore-reliability mode

On Sat, Jun 6, 2015 at 10:35 AM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:

Really? You genuinely don't have time to paste, say:

mkdir -p ~/src/pgdevel
cd ~/src/pgdevel
wget https://ftp.postgresql.org/pub/snapshot/dev/postgresql-snapshot.tar.bz2
tar xjf postgresql-snapshot.tar.bz2
mkdir bld
cd bld
../postgresql-9.5devel/configure $(pg_config --configure | sed -e 's/\(pg\|postgresql[-\/]\)\(doc-\)\?9\.[0-9]*\(dev\)\?/\1\29.5dev/g')
make world
make check
make world-install

and yet you think you have enough time to provide more than a "looks like it's working" report to the developers?

Adding steps to an existing process to fetch and build from source is
significantly more complicated than flipping a version number. And I'm
not trying to run PG's built in tests on my machine. I want to run the
tests for my applications, and ideally, my applications themselves.

If doing so leads me to find that something doesn't work then of
course I would research and report the cause. At that point it's
something that I know will directly affect me if it's not fixed!

Well yes, automated packaging of the nightly build, that doesn't involve the developers having to stop what they're doing to write official alpha release docs or any of the other stuff that goes along with doing a release, would be zero-impact on development (assuming the developers didn't have to build or maintain the auto-packager) and therefore any return (however small) would make it worthwhile.
Fancy building (and maintaining) the auto-packaging system, and managing a mailing list for its users?

I don't have much experience in setting things like this up so I'm not
one to estimate the work load involved. If it existed though, I'd use
it.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/


#120

Kevin Grittner

kgrittn@ymail.com

over 10 years ago

In reply to: Robert Haas (#105)

Re: [CORE] Restore-reliability mode

Robert Haas <robertmhaas@gmail.com> wrote:

Tom, for example, has previously not wanted to even bump
catversion after beta1, which rules out a huge variety of
possible fixes and interface changes. If we want to make a
policy decision to change our approach, we should be up-front
about that.

What?!? There have been catversion bumps between the REL?_?_BETA1
tag and the REL?_?_0 tag for 8.2, 8.3, 9.0, 9.1, 9.3, and 9.4.
(That is, it has happened on 6 of the last 8 releases.) I don't
think we're talking about any policy change here. We try to avoid
a catversion bump after beta if we can; we're not that reluctant to
do so if needed.
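
For readers who want to check a claim like this themselves, a minimal sketch follows (an editorial illustration, not part of the original message). It assumes a PostgreSQL git checkout in which the catalog version is defined as CATALOG_VERSION_NO in src/include/catalog/catversion.h; the helper name is hypothetical.

```shell
# Hypothetical helper: read a catversion.h on stdin and print the value
# of the CATALOG_VERSION_NO define.
catversion_from_header() {
    awk '$1 == "#define" && $2 == "CATALOG_VERSION_NO" { print $3 }'
}

# Usage inside a PostgreSQL git checkout (not run here); differing
# output for the two tags would indicate a bump between beta1 and release:
#   git show REL9_4_BETA1:src/include/catalog/catversion.h | catversion_from_header
#   git show REL9_4_0:src/include/catalog/catversion.h | catversion_from_header
```

The same pair of commands can be repeated for other release branches by substituting the corresponding tag names.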

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#121

Joshua D. Drake

jd@commandprompt.com

over 10 years ago

In reply to: Bruce Momjian (#109)

Re: [CORE] Restore-reliability mode

On 06/05/2015 08:07 PM, Bruce Momjian wrote:

From my side, it is only recently I got some clear answers to my questions
about how it worked. I think it is very important that major features have
extensive README type documentation with them so the underlying principles used
in the development are clear. I would define the measure of a good feature as
whether another committer can read the code comments and get a good feel. A bad
feature is one where committers walk away from it, saying I don't really get it
and I can't read an explanation of why it does that. Tom's most significant
contribution is his long descriptive comments on what the problem is that needs
to be solved, the options and the method chosen. Clarity of thought is what
solves bugs.

Yes, I think we should have done that early-on for multi-xact, and I am
hopeful we will learn to do that more often when complex features are
implemented, or when we identify areas that are more complex than we
thought.

I see this idea of the README as very useful. There are far more people
like me in this community than Simon or Alvaro. I can test, I can break
things, I can script up a harness but I need to understand HOW and
the README would help allow for that.

People think I want to stop feature development to review. What I am
saying is that we need to stop development so we can be honest about
whether we need review, and where. It is hard to be honest when time
and feature pressure are on you. It shouldn't take long to make that
decision as a group.

Right. This is all about taking a step back, a deep breath, an objective
look and then digging in in a more productive and reliable manner.

Sincerely,


#122

Joshua D. Drake

jd@commandprompt.com

over 10 years ago

In reply to: Robert Haas (#117)

Re: [CORE] Restore-reliability mode

On 06/06/2015 07:33 AM, Robert Haas wrote:

On Sat, Jun 6, 2015 at 6:47 AM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:

To play devil's advocate for a moment, is there anyone who would genuinely
be prepared to download and install an alpha release who would not already
have downloaded one of the nightlies? I only ask because I assume that
releasing an alpha is not zero-developer-cost and I don't believe that
there's a large number of people who would be happy to install something
that's described as being buggy and subject to change but are put off by
having to type "configure" and "make".

Yes, me and everyone like me in feature set.

Compiling takes time, time that does not need to be spent. If I can push
an alpha into a container and start testing, I will do so. If I have to:

git pull; configure --prefix; make -j8 install

Then I will likely move on to other things because my time (like anyone
else's on this list) is not free.

If you add into this a test harness that I can execute from the alpha
release (or another package) that allows me to instantly report via
buildfarm or just email a tarball to -hackers that is even better.

I know that I am not taking everything into account here but remember
that most of our users are not -hackers. They are practitioners and a
lot of them would love to help but just can't because a lot of the
infrastructure has never been built and -hackers think like -hackers.

Sincerely,


#123

Noah Misch

noah@leadboat.com

over 10 years ago

In reply to: Simon Riggs (#89)

Re: Restore-reliability mode

On Fri, Jun 05, 2015 at 08:25:34AM +0100, Simon Riggs wrote:

This whole idea of "feature development" vs reliability is bogus. It
implies people that work on features don't care about reliability. Given
the fact that many of the features are actually about increasing database
reliability in the event of crashes and corruptions it just makes no sense.

I'm contrasting work that helps to keep our existing promises ("reliability")
with work that makes new promises ("features"). In software development, we
invariably hazard old promises to make new promises; our success hinges on
electing neither too little nor too much risk. Two years ago, PostgreSQL's
track record had placed it in a good position to invest in new, high-risk,
high-reward promises. We did that, and we emerged solvent yet carrying an
elevated debt service ratio. It's time to reduce risk somewhat.

You write about a different sense of "reliability." (Had I anticipated this
misunderstanding, I might have written "Restore-probity mode.") None of this
was about classifying people, most of whom allocate substantial time to each
kind of work.

How will we participate in cleanup efforts? How do we know when something
has been "cleaned up", how will we measure our success or failure? I think
we should be clear that wasting N months on cleanup can *fail* to achieve a
useful objective. Without a clear plan it almost certainly will do so. The
flip side is that wasting N months will cause great amusement and dancing
amongst those people who wish to pull ahead of our open source project and
we should take care not to hand them a victory from an overreaction.

I agree with all that. We should likewise take care not to become insolvent
from an underreaction.

So let's do our normal things, not do a "total stop" for an indefinite
period. If someone has specific things that in their opinion need to be
addressed, list them and we can talk about doing them, together.

I recommend these four exit criteria:

1. Non-author committer review of foreign key locks/multixact durability.
Done when that committer certifies, as if he were committing the patch
himself today, that the code will not eat data.

2. Non-author committer review of row-level security. Done when that
committer certifies that the code keeps its promises and that the
documentation bounds those promises accurately.

3. Second committer review of the src/backend/access changes for INSERT ... ON
CONFLICT DO NOTHING/UPDATE. (Bugs affecting folks who don't use the new
syntax are most likely to fall in that portion.) Unlike the previous two
criteria, a review without certification is sufficient.

4. Non-author committer certifying that the 9.5 WAL format changes will not
eat your data. The patch lists Andres and Alvaro as reviewers; if they
already reviewed it enough to make that certification, this one is easy.

That ties up four people. For everyone else:

- Fix bugs those reviews find. This will start slow but will grow to keep
everyone busy. Committers won't certify code, and thus we can't declare
victory, until these bugs are fixed. The rest of this list, in contrast,
calls out topics to sample from, not topics to exhaust.

- Turn current buildfarm members green.

- Write, review and commit more automated test machinery to PostgreSQL. Test
whatever excites you. If you need ideas, Craig posted some good ones
upthread. Here are a few more:
  - Add a debug mode that calls sched_yield() in SpinLockRelease(); see
    6322.1406219591@sss.pgh.pa.us.
  - Improve TAP suite (src/test/perl/TestLib.pm) logging. Currently, these
    suites redirect much output to /dev/null. Instead, log that output and
    teach the buildfarm to capture the log.
  - Call VALGRIND_MAKE_MEM_NOACCESS() on a shared buffer when its local pin
    count falls to zero. Under CLOBBER_FREED_MEMORY, wipe a shared buffer
    when its global pin count falls to zero.
  - With assertions enabled, or perhaps in a new debug mode, have
    pg_do_encoding_conversion() and pg_server_to_any() check the data for a
    no-op conversion instead of assuming the data is valid.

- Add buildfarm members. This entails reporting any bugs that prevent an
initial passing run. Once you have a passing run, schedule regular runs.
Examples of useful additions:
  - "./configure ac_cv_func_getopt_long=no, ac_cv_func_snprintf=no ..." to
    enable all the replacement code regardless of the current platform's need
    for it. This helps distinguish "Windows bug" from "replacement code bug."
  - --disable-integer-datetimes, --disable-float8-byval, --disable-float4-byval,
    --disable-spinlocks, --disable-atomics, --disable-thread-safety,
    --disable-largefile, #define RANDOMIZE_ALLOCATED_MEMORY
  - Any OS or CPU architecture other than x86 GNU/Linux, even ones already
    represented.

- Write, review and commit fixes for the bugs that come to light by way of
these new automated tests.

- Anything else targeted to make PostgreSQL keep the promises it has already
made to our users.


#124

Michael Paquier

michael.paquier@gmail.com

over 10 years ago

In reply to: Noah Misch (#123)

Re: Restore-reliability mode

On Sun, Jun 7, 2015 at 4:58 AM, Noah Misch <noah@leadboat.com> wrote:

- Write, review and commit more automated test machinery to PostgreSQL. Test
whatever excites you. If you need ideas, Craig posted some good ones
upthread. Here are a few more:
- Improve TAP suite (src/test/perl/TestLib.pm) logging. Currently, these
suites redirect much output to /dev/null. Instead, log that output and
teach the buildfarm to capture the log.

We can capture the logs and redirect them by replacing
system_or_bail() with more calls to IPC::Run. That would be a simple
enough patch. pg_rewind's tests should be switched to use that as
well.
--
Michael


#125

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Kevin Grittner (#120)

Re: [CORE] Restore-reliability mode

On Sat, Jun 6, 2015 at 12:33 PM, Kevin Grittner <kgrittn@ymail.com> wrote:

Robert Haas <robertmhaas@gmail.com> wrote:

Tom, for example, has previously not wanted to even bump
catversion after beta1, which rules out a huge variety of
possible fixes and interface changes. If we want to make a
policy decision to change our approach, we should be up-front
about that.

What?!? There have been catversion bumps between the REL?_?_BETA1
tag and the REL?_?_0 tag for 8.2, 8.3, 9.0, 9.1, 9.3, and 9.4.
(That is, it has happened on 6 of the last 8 releases.) I don't
think we're talking about any policy change here. We try to avoid
a catversion bump after beta if we can; we're not that reluctant to
do so if needed.

Perhaps we're honoring this more in the breach than in the observance,
but I'm not making up what Tom has said about this:

/messages/by-id/27310.1251410965@sss.pgh.pa.us
/messages/by-id/19174.1299782543@sss.pgh.pa.us
/messages/by-id/3413.1301154369@sss.pgh.pa.us
/messages/by-id/3261.1401915832@sss.pgh.pa.us

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#126

Peter Geoghegan

pg@heroku.com

over 10 years ago

In reply to: Robert Haas (#125)

Re: [CORE] Restore-reliability mode

On Sat, Jun 6, 2015 at 7:07 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Perhaps we're honoring this more in the breach than in the observance,
but I'm not making up what Tom has said about this:

/messages/by-id/27310.1251410965@sss.pgh.pa.us
/messages/by-id/19174.1299782543@sss.pgh.pa.us
/messages/by-id/3413.1301154369@sss.pgh.pa.us
/messages/by-id/3261.1401915832@sss.pgh.pa.us

Of course, not doing a catversion bump after beta1 doesn't necessarily
have much value in and of itself. *Promising* to not do a catversion
bump, and then usually keeping that promise definitely has a certain
value, but clearly we are incapable of that.

--
Peter Geoghegan


#127

Joshua D. Drake

jd@commandprompt.com

over 10 years ago

In reply to: Peter Geoghegan (#126)

Re: [CORE] Restore-reliability mode

On 06/06/2015 07:14 PM, Peter Geoghegan wrote:

On Sat, Jun 6, 2015 at 7:07 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Perhaps we're honoring this more in the breach than in the observance,
but I'm not making up what Tom has said about this:

/messages/by-id/27310.1251410965@sss.pgh.pa.us
/messages/by-id/19174.1299782543@sss.pgh.pa.us
/messages/by-id/3413.1301154369@sss.pgh.pa.us
/messages/by-id/3261.1401915832@sss.pgh.pa.us

Of course, not doing a catversion bump after beta1 doesn't necessarily
have much value in and of itself. *Promising* to not do a catversion
bump, and then usually keeping that promise definitely has a certain
value, but clearly we are incapable of that.

It seems to me that a cat bump during Alpha or Beta should be absolutely
fine and reservedly fine respectively. Where we should absolutely not
cat bump unless there is absolutely no other choice is during an RC.


#128

Alvaro Herrera

alvherre@2ndquadrant.com

over 10 years ago

In reply to: Joshua D. Drake (#121)

Re: [CORE] Restore-reliability mode

Joshua D. Drake wrote:

On 06/05/2015 08:07 PM, Bruce Momjian wrote:

From my side, it is only recently I got some clear answers to my questions
about how it worked. I think it is very important that major features have
extensive README type documentation with them so the underlying principles used
in the development are clear. I would define the measure of a good feature as
whether another committer can read the code comments and get a good feel. A bad
feature is one where committers walk away from it, saying I don't really get it
and I can't read an explanation of why it does that. Tom's most significant
contribution is his long descriptive comments on what the problem is that needs
to be solved, the options and the method chosen. Clarity of thought is what
solves bugs.

Yes, I think we should have done that early-on for multi-xact, and I am
hopeful we will learn to do that more often when complex features are
implemented, or when we identify areas that are more complex than we
thought.

I see this idea of the README as very useful. There are far more people like
me in this community than Simon or Alvaro. I can test, I can break things, I
can script up a harness but I need to understand HOW and the README would
help allow for that.

There is a src/backend/access/README.tuplock that attempts to describe
multixacts. Is that not sufficient?

Now that I think about it, this file hasn't been updated with the latest
changes, so it's probably a bit outdated now.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#129

Kevin Grittner

kgrittn@ymail.com

over 10 years ago

In reply to: Joshua D. Drake (#127)

Re: [CORE] Restore-reliability mode

Joshua D. Drake <jd@commandprompt.com> wrote:

On 06/06/2015 07:14 PM, Peter Geoghegan wrote:

On Sat, Jun 6, 2015 at 7:07 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Perhaps we're honoring this more in the breach than in the
observance, but I'm not making up what Tom has said about this:

/messages/by-id/27310.1251410965@sss.pgh.pa.us

That's 9.0 release discussion:

| I think that the traditional criterion is that we don't release beta1
| as long as there are any known issues that might force an initdb.
| We were successful in avoiding a post-beta initdb this time, although
| IIRC the majority of release cycles have had one --- so maybe you
| could argue that that's not so important. It would certainly be
| less important if we had working pg_migrator functionality to ease
| the pain of going from beta to final.

/messages/by-id/19174.1299782543@sss.pgh.pa.us

That's 9.1 release discussion:

| Historically we've declared it beta once we think we are done with
| initdb-forcing problems.

| In any case, the existence of pg_upgrade means that "might we need
| another initdb?" is not as strong a consideration as it once was, so
| I'm not sure if we should still use that as a criterion. I don't know
| quite what "ready for beta" should mean otherwise, though.

/messages/by-id/3413.1301154369@sss.pgh.pa.us

Also 9.1, it is listed as one criterion:

| * No open issues that are expected to result in a catversion bump.
| (With pg_upgrade, this is not as critical as it used to be, but
| I still think catalog stability is a good indicator of a release's
| maturity)

/messages/by-id/3261.1401915832@sss.pgh.pa.us

Here we jump to 9.4 discussion:

| > Agreed. Additionally I also agree with Stefan that the price of a initdb
| > during beta isn't that high these days.
|
| Yeah, if nothing else it gives testers another opportunity to exercise
| pg_upgrade ;-). The policy about post-beta1 initdb is "avoid if
| practical", not "avoid at all costs".

So I think these examples show that the policy has shifted from a
pretty strong requirement to "it's probably nice if" status, with
some benefits seen in pg_upgrade testing to actually having a bump.

Of course, not doing a catversion bump after beta1 doesn't necessarily
have much value in and of itself. *Promising* to not do a catversion
bump, and then usually keeping that promise definitely has a certain
value, but clearly we are incapable of that.

As someone who was able to bring up a new production application on
8.2 because it was all redundant data and not yet mission-critical,
I appreciate that in very rare circumstances that combination could
have benefit. But really, how often are people in that position?

It seems to me that a cat bump during Alpha or Beta should be absolutely
fine and reservedly fine respectively. Where we should absolutely not
cat bump unless there is absolutely no other choice is during an RC.

+1 on all of that. And for a while now we've been talking about an
alpha test release, right?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#130

Jeff Janes

jeff.janes@gmail.com

over 10 years ago

In reply to: Geoff Winkless (#118)

Re: [CORE] Restore-reliability mode

On Sat, Jun 6, 2015 at 7:35 AM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:

On 6 June 2015 at 13:41, Sehrope Sarkuni <sehrope@jackdb.com> wrote:

It's much easier to work into dev/test setups if there are system
packages as it's just a config change to an existing script. Building
from source would require a whole new workflow that I don't have time
to incorporate.

Really? You genuinely don't have time to paste, say:

mkdir -p ~/src/pgdevel
cd ~/src/pgdevel
wget https://ftp.postgresql.org/pub/snapshot/dev/postgresql-snapshot.tar.bz2
tar xjf postgresql-snapshot.tar.bz2
mkdir bld
cd bld
../postgresql-9.5devel/configure $(pg_config --configure | sed -e 's/\(pg\|postgresql[-\/]\)\(doc-\)\?9\.[0-9]*\(dev\)\?/\1\29.5dev/g')
make world
make check
make world-install

I think this is rather uncharitable. You don't include yum, zypper, or
apt-get anywhere in there, and I vaguely recall it took me quite a bit of
time to install all the prereqs in order to get it to compile several years
ago when I first started trying to compile it. And then even more time to get
it to compile with several of the config flags I wanted. And then even
more time to get the docs to compile.

And now after I got all of that, when I try your code, it still doesn't
work. $(pg_config ....) seems to not quote things the way that configure
wants them quoted, or something. And the package from which I was using
pg_config uses more config options than I was set up for when compiling
myself, so I either have to manually remove the flags, or find more
dependencies (pam, xslt, ossp-uuid, tcl, tcl-dev, and counting). This is
not very fun, and I didn't even need to get bureaucratic approval to do any
of this stuff, like a lot of people do.

And then when I try to install it, it tries to overwrite some of the files
which were initially installed by the package manager in /usr/lib. That
doesn't seem good.

And how do I, as a hypothetical package manager user, start this puppy up?
Where is pg_ctlcluster? How does one do pg_upgrade between a
package-controlled data directory and this new binary?

And then when I find a bug, how do I know it is a bug and not me doing
something wrong in the build process, and getting the wrong .so to load
with the wrong binary or something like that?

and yet you think you have enough time to provide more than a "looks like
it's working" report to the developers?

If it isn't working, reports that it isn't working, with error messages, are
pretty helpful and don't take a whole lot of time. If it is working,
reports of that probably aren't terribly helpful without putting a lot more
work into it. But people might be willing to put a lot of work into, say,
performance regression testing, if that is their area of expertise, if they
don't also have to put a lot of work into getting the software to compile
in the first place, which is not their area.

Now I don't see a lot of evidence of beta testing from the public (i.e.
unfamiliar names) on -hackers and -bugs lists. But a lot of hackers report
things that "a client" or "someone on IRC" reported to them, so I'm willing
to believe that a lot of useful beta testing does go on, even though I
don't directly see it, and if there were alpha releases, why wouldn't it
extend to that?

(NB the sed for the pg_config line will probably need work, it looks like
it should work on the two types of system I have here but I have to admit I
changed the config line manually when I built it)

Right, and are the people who use apt-get to install everything likely to
know how to do that work?

Cheers,

Jeff
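
A minimal sketch of one way around the quoting problem described above. pg_config --configure prints each option wrapped in single quotes, so a plain $(...) substitution hands configure the literal quote characters; letting the shell re-parse the string with eval avoids that. The sample string, the target prefix, and the helper name rewrite_configure_args are assumptions for illustration, and eval should only be used on output you trust.

```shell
#!/bin/sh
# Sample text standing in for `pg_config --configure` output (an assumption;
# real output depends on how the installed package was built).
sample_config="'--prefix=/usr' '--with-pam' 'CFLAGS=-O2 -g'"

# Print one configure argument per line, with any packaged --prefix replaced
# so the build installs into a private directory instead of /usr.
# $1: quoted argument string, $2: new installation prefix
rewrite_configure_args() {
    new_prefix=$2
    eval "set -- $1"            # let the shell interpret the quoting
    n=$#
    for arg in "$@"; do
        case $arg in
            --prefix=*) ;;                  # drop the packaged prefix
            *) set -- "$@" "$arg" ;;        # keep everything else
        esac
    done
    shift "$n"                  # discard the original, unfiltered arguments
    set -- "$@" "--prefix=$new_prefix"
    printf '%s\n' "$@"
}

rewrite_configure_args "$sample_config" /tmp/pgdevel-inst
```

Installing under a private prefix also sidesteps the overwrite of package-managed files in /usr/lib. It does not address the missing build dependencies or the lack of pg_ctlcluster integration.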

#131

Peter Geoghegan

pg@heroku.com

over 10 years ago

In reply to: Noah Misch (#123)

Re: Restore-reliability mode

On Sat, Jun 6, 2015 at 12:58 PM, Noah Misch <noah@leadboat.com> wrote:

- Call VALGRIND_MAKE_MEM_NOACCESS() on a shared buffer when its local pin
count falls to zero. Under CLOBBER_FREED_MEMORY, wipe a shared buffer
when its global pin count falls to zero.

Did a patch for this ever materialize?

--
Peter Geoghegan


#132

David Gould

daveg@sonic.net

over 10 years ago

In reply to: Jeff Janes (#130)

Re: [CORE] Restore-reliability mode

I think Alphas are valuable and useful and even more so if they have release
notes. For example, some of my clients are capable of fetching sources and
building from scratch and filing bug reports and are often interested in
particular new features. They even have staging infrastructure that could
test new postgres releases with real applications. But they don't do it.
They also don't follow -hackers, they don't track git, and they don't have
any easy way to tell if the new feature they are interested in is
actually complete and ready to test at any particular time. A lot of
features are developed in multiple commits over a period of time and they
see no point in testing until at least most of the feature is complete and
expected to work. But it is not obvious from outside when that happens for
any given feature. For my clients the value of Alpha releases would
mainly be the release notes, or some other mark in the sand that says "As of
Alpha-3 feature X is included and expected to mostly work."

-dg

--
David Gould daveg@sonic.net
If simplicity worked, the world would be overrun with insects.


#133

Geoff Winkless

pgsqladmin@geoff.dj

over 10 years ago

In reply to: David Gould (#132)

Re: [CORE] Restore-reliability mode

Among several others, On 8 June 2015 at 13:59, David Gould <daveg@sonic.net>
wrote:

I think Alphas are valuable and useful and even more so if they have release
notes. For example, some of my clients are capable of fetching sources and
building from scratch and filing bug reports and are often interested in
particular new features. They even have staging infrastructure that could
test new postgres releases with real applications. But they don't do it.
They also don't follow -hackers, they don't track git, and they don't have
any easy way to tell if the new feature they are interested in is
actually complete and ready to test at any particular time. A lot of
features are developed in multiple commits over a period of time and they
see no point in testing until at least most of the feature is complete and
expected to work. But it is not obvious from outside when that happens for
any given feature. For my clients the value of Alpha releases would
mainly be the release notes, or some other mark in the sand that says "As of
Alpha-3 feature X is included and expected to mostly work."

Wow! I never knew there were all these people out there who would be
rushing to help test if only the PG developers released alpha versions.
It's funny how they never used to do it when those alphas were done.

I say again: in my experience you don't get useful test reports from people
who aren't able or prepared to compile software; what you do get is lots of
unrelated and/or unhelpful noise in the mailing list. That may be harsh or
unfair or whatever, it's just my experience.

I guess the only thing we can do is see who's right. I'm simply trying to
point out that it's not the zero-cost exercise that everyone appears to
think that it is.

Geoff

#134

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Geoff Winkless (#133)

Re: [CORE] Restore-reliability mode

On Mon, Jun 8, 2015 at 9:21 AM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:

Wow! I never knew there were all these people out there who would be rushing
to help test if only the PG developers released alpha versions. It's funny
how they never used to do it when those alphas were done.

That's probably overplaying your hand a little bit (and it sounds a
bit catty, too). Some testing got done and it had some value. It
just wasn't enough to make Peter feel like it was worthwhile. That
doesn't mean that no testing got done and that it had no value, or
that the same thing would happen this time. I'm as skeptical about
this whole rush-out-an-alpha business as anyone, but I think that
skepticism has to yield to contrary evidence, and people saying "I
would test if..." is legitimate contrary evidence.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#135

Joshua D. Drake

jd@commandprompt.com

over 10 years ago

In reply to: Geoff Winkless (#133)

Re: [CORE] Restore-reliability mode

On 06/08/2015 06:21 AM, Geoff Winkless wrote:

Wow! I never knew there were all these people out there who would be
rushing to help test if only the PG developers released alpha versions.
It's funny how they never used to do it when those alphas were done.

The type of responses you are providing on this thread are not warranted.


#136

Geoff Winkless

pgsqladmin@geoff.dj

over 10 years ago

In reply to: Robert Haas (#134)

Re: [CORE] Restore-reliability mode

On 8 June 2015 at 16:01, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Jun 8, 2015 at 9:21 AM, Geoff Winkless <pgsqladmin@geoff.dj>
wrote:

Wow! I never knew there were all these people out there who would be rushing
to help test if only the PG developers released alpha versions. It's funny
how they never used to do it when those alphas were done.

That's probably overplaying your hand a little bit (and it sounds a
bit catty, too).

I agree. The responses I had written yesterday but didn't send were much
worse.

Mainly because I think it's quite an attitude to take that open-source
developers should put extra time into building RPMs of development versions
rather than testers waiting 5 minutes while their machines compile.
Ohmygosh, you have to rpm install a bunch of -devel stuff? What a massive
hardship.

On 8 June 2015 at 16:06, Joshua D. Drake <jd@commandprompt.com> wrote:

The type of responses you are providing on this thread are not warranted.

I got people appearing completely insulted at my remarks and telling me
that if only they could run the alpha they would provide testing, so I
pointed out how easy it is to install the nightly from source and then they
tell me that actually compiling is far too difficult and complicated, and
that there are loads of clients who would run these nightlies if they had
RPMS...

If I truly believed that such an RPM would produce useful testing, I would
spend some of my own time building a setup to produce those RPMs myself and
post here publicising them, at which point we would have a huge number of
useful and productive test reports. Any one of the people telling me that
I'm wrong could easily do the same, but so far none has.

I'm not harping on because I want to make people feel bad, I'm harping on
because I don't want to see beta (and final) releases pushed back further
because of a bad compromise, and I believe that that will happen. I
apologise that I've clearly upset some people but they all have a very easy
route to prove me wrong, and I'll be happy to admit my error.

Geoff

#137

Petr Jelinek

petr.jelinek@2ndquadrant.com

over 10 years ago

In reply to: Robert Haas (#134)

Re: [CORE] Restore-reliability mode

On Mon, Jun 8, 2015 at 5:01 , Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Jun 8, 2015 at 9:21 AM, Geoff Winkless <pgsqladmin@geoff.dj>
wrote:

Wow! I never knew there were all these people out there who would be rushing
to help test if only the PG developers released alpha versions. It's funny
how they never used to do it when those alphas were done.

That's probably overplaying your hand a little bit (and it sounds a
bit catty, too). Some testing got done and it had some value. It
just wasn't enough to make Peter feel like it was worthwhile. That
doesn't mean that no testing got done and that it had no value, or
that the same thing would happen this time. I'm as skeptical about
this whole rush-out-an-alpha business as anyone, but I think that
skepticism has to yield to contrary evidence, and people saying "I
would test if..." is legitimate contrary evidence.

Agreed.

To get back to the point, I think the problem with original alphas was
that they were after CF snapshots, not something that represented the
final release.

I do think that proper alpha/beta release is signal for several
companies (I do know some that do testing once beta gets out) to do
testing as it does indeed say that we are releasing something that is
close in functionality to the final release.

Also the packages are really important, there are enough companies that
don't install development packages to servers at all so it's not just
compile and run for them, they have to move it over to other machines,
etc. We should be lowering the barrier to user based testing as much as
possible and doing alpha with packages is exactly how we do that.

IMHO the only real discussion here is if current 9.5 is ready for user
testing and FWIW I think it is.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#138

Claudio Freire

klaussfreire@gmail.com

over 10 years ago

In reply to: Geoff Winkless (#136)

Re: [CORE] Restore-reliability mode

On Mon, Jun 8, 2015 at 12:22 PM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:

On 8 June 2015 at 16:01, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Jun 8, 2015 at 9:21 AM, Geoff Winkless <pgsqladmin@geoff.dj>
wrote:

Wow! I never knew there were all these people out there who would be rushing
to help test if only the PG developers released alpha versions. It's funny
how they never used to do it when those alphas were done.

That's probably overplaying your hand a little bit (and it sounds a
bit catty, too).

I agree. The responses I had written yesterday but didn't send were much
worse.

Mainly because I think it's quite an attitude to take that open-source
developers should put extra time into building RPMs of development versions
rather than testers waiting 5 minutes while their machines compile.
Ohmygosh, you have to rpm install a bunch of -devel stuff? What a massive
hardship.

It's not about the 5 minutes of compile time, it's about the signalling.

Just *when* is git ready for testing? You don't know from the outside.

I do lurk here a lot and still am unsure quite often.

Even simply releasing an alpha *tarball* would be useful enough. What
is needed is the signal to test, rather than a fully-built package.


#139

Geoff Winkless

pgsqladmin@geoff.dj

over 10 years ago

In reply to: Claudio Freire (#138)

Re: [CORE] Restore-reliability mode

On 8 June 2015 at 17:03, Claudio Freire <klaussfreire@gmail.com> wrote:

It's not about the 5 minutes of compile time, it's about the signalling.

Just *when* is git ready for testing? You don't know from the outside.

I do lurk here a lot and still am unsure quite often.

Even simply releasing an alpha *tarball* would be useful enough. What
is needed is the signal to test, rather than a fully-built package.

I can see that, and can absolutely get behind the idea of a nightly being
flagged as an alpha, since it should involve next to no developer time.

I may be overestimating the amount of time that goes towards producing a
release; the fact that the full-on alpha releases were stopped did imply to
me that it's not insignificant.

Geoff

#140

David G. Johnston

david.g.johnston@gmail.com

over 10 years ago

In reply to: Claudio Freire (#138)

Re: [CORE] Restore-reliability mode

On Mon, Jun 8, 2015 at 12:03 PM, Claudio Freire <klaussfreire@gmail.com>
wrote:

Just *when* is git ready for testing? You don't know from the outside.

I do lurk here a lot and still am unsure quite often.

Even simply releasing an alpha *tarball* would be useful enough. What
is needed is the signal to test, rather than a fully-built package.

IIUC the master branch is always ready for testing.

I do not think the project cares whether everyone is testing the exact
same codebase; as long as test findings include the relevant commit hash
the results will be informative.

David J.
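
A minimal sketch of how a tester might attach the commit hash to a report, as suggested above. The helper name describe_build and the report layout are hypothetical; it assumes git is installed and falls back to "unknown" when the directory is not a git checkout (for example, an unpacked snapshot tarball).

```shell
#!/bin/sh
# Print a short header for a test report.
# $1: path to a PostgreSQL source checkout (hypothetical location)
describe_build() {
    commit=$(git -C "$1" rev-parse HEAD 2>/dev/null) || commit=unknown
    printf 'commit: %s\n' "$commit"
    printf 'tested: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
}

describe_build "${1:-.}"
```

Pasting these two lines at the top of a bug report lets developers map the finding back to an exact point on master.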

#141

David G. Johnston

david.g.johnston@gmail.com

over 10 years ago

In reply to: Geoff Winkless (#139)

Re: [CORE] Restore-reliability mode

On Mon, Jun 8, 2015 at 12:14 PM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:

On 8 June 2015 at 17:03, Claudio Freire <klaussfreire@gmail.com> wrote:

It's not about the 5 minutes of compile time, it's about the signalling.

Just *when* is git ready for testing? You don't know from the outside.

I do lurk here a lot and still am unsure quite often.

Even simply releasing an alpha *tarball* would be useful enough. What
is needed is the signal to test, rather than a fully-built package.

I can see that, and can absolutely get behind the idea of a nightly being
flagged as an alpha, since it should involve next to no developer time.

Nightly where? This is an international community.

The tip of the master branch is the current "alpha" - so the question is
whether a tar bundle should be provided instead of asking people to simply
keep their Git clone up-to-date. These both have the flaw of excluding
people who would test the application if it could simply be installed like
any other package on their system. But I'm not seeing where there would be
a huge group of people who would test an automatically generated source
tar-ball but would not be willing to use Git. Or are we talking about a
non-source tar-ball?

Maybe packagers could be convinced to bundle up the master branch on a
monthly basis and simply call it Master-SNAPSHOT. No alpha, no beta, no
version number. I've never packaged before, so I don't know, but while the
project should encourage this, as things currently stand the core project
is doing its job by ensuring that the tip of master is always in a usable
state.

Or, whenever a new patch release goes out packagers can also bundle up the
current master at the same time.

David J.

#142

Andres Freund

andres@anarazel.de

over 10 years ago

In reply to: David G. Johnston (#140)

Re: [CORE] Restore-reliability mode

On 2015-06-08 12:16:34 -0400, David G. Johnston wrote:

IIUC the master branch is always ready for testing.

I don't really think so. For one we often find bugs ourselves quite
quickly.

But more importantly, I'd much rather have users use their precious (and
thus limited!) time to test when the set of features (not every detail
of a feature) is mostly set in stone. There's not much point in doing
in-depth testing before that. Similarly it's not particularly worthwhile
to test while the buildfarm still shows failures on common platforms.

Andres


#143

Stephen Frost

sfrost@snowman.net

over 10 years ago

In reply to: David G. Johnston (#140)

Re: [CORE] Restore-reliability mode

David,

* David G. Johnston (david.g.johnston@gmail.com) wrote:

On Mon, Jun 8, 2015 at 12:03 PM, Claudio Freire <klaussfreire@gmail.com>
wrote:

Just *when* is git ready for testing? You don't know from the outside.

I do lurk here a lot and still am unsure quite often.

Even simply releasing an alpha *tarball* would be useful enough. What
is needed is the signal to test, rather than a fully-built package.

IIUC the master branch is always ready for testing.

I do not think the project cares whether everyone is testing the exact
same codebase; as long as test findings include the relevant commit hash
the results will be informative.

For my 2c, I do believe it's useful for projects which are based on PG
or which work with PG to have an 'alpha1' tag to refer to. Asking users
to test with git hash XYZABC isn't great. Getting more users of
applications which use PG to do testing is, in my view at least, a great
way to improve our test coverage and I do think having an alpha will
help with that.

That said, I'm not pushing to have one released this week or before
PGCon or any such- let's get the back-branch releases dealt with and
then we can get an alpha out.

Thanks!

Stephen

#144

Alvaro Herrera

alvherre@2ndquadrant.com

over 10 years ago

In reply to: David G. Johnston (#141)

Re: [CORE] Restore-reliability mode

David G. Johnston wrote:

On Mon, Jun 8, 2015 at 12:14 PM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:

I can see that, and can absolutely get behind the idea of a nightly being
flagged as an alpha, since it should involve next to no developer time.

Nightly where? This is an international community.

A "nightly" refers to our development snapshots, which are uploaded to
the ftp servers every "night" (according to some timezone). You can
find them in pub/snapshot/ for each branch.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#145

Bruce Momjian

bruce@momjian.us

over 10 years ago

In reply to: David G. Johnston (#141)

Re: [CORE] Restore-reliability mode

On Mon, Jun 8, 2015 at 12:32:45PM -0400, David G. Johnston wrote:

On Mon, Jun 8, 2015 at 12:14 PM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:

On 8 June 2015 at 17:03, Claudio Freire <klaussfreire@gmail.com> wrote:

It's not about the 5 minutes of compile time, it's about the
signalling.

Just *when* is git ready for testing? You don't know from the outside.

I do lurk here a lot and still am unsure quite often.

Even simply releasing an alpha *tarball* would be useful enough. What
is needed is the signal to test, rather than a fully-built package.

I can see that, and can absolutely get behind the idea of a nightly being
flagged as an alpha, since it should involve next to no developer time.

Nightly where? This is an international community.

The daily snapshot tarballs are built in a way to minimize the number of
development tools required:

http://www.postgresql.org/ftp/snapshot/dev/

These would be easier to use than pulling from git.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


#146

Magnus Hagander

magnus@hagander.net

over 10 years ago

In reply to: Alvaro Herrera (#144)

Re: [CORE] Restore-reliability mode

On Mon, Jun 8, 2015 at 7:01 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

David G. Johnston wrote:

On Mon, Jun 8, 2015 at 12:14 PM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:

I can see that, and can absolutely get behind the idea of a nightly being
flagged as an alpha, since it should involve next to no developer time.

Nightly where? This is an international community.

A "nightly" refers to our development snapshots, which are uploaded to
the ftp servers every "night" (according to some timezone). You can
find them in pub/snapshot/ for each branch.

Snapshots are actually not nightly anymore, and haven't been for some time.
They are currently run every 4 hours, and are uploaded to the ftp server
once a buildfarm run (on debian x64) finishes.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#147

Bruce Momjian

bruce@momjian.us

over 10 years ago

In reply to: Noah Misch (#123)

Re: Restore-reliability mode

On Sat, Jun 6, 2015 at 03:58:05PM -0400, Noah Misch wrote:

On Fri, Jun 05, 2015 at 08:25:34AM +0100, Simon Riggs wrote:

This whole idea of "feature development" vs reliability is bogus. It
implies people that work on features don't care about reliability. Given
the fact that many of the features are actually about increasing database
reliability in the event of crashes and corruptions it just makes no sense.

I'm contrasting work that helps to keep our existing promises ("reliability")
with work that makes new promises ("features"). In software development, we
invariably hazard old promises to make new promises; our success hinges on
electing neither too little nor too much risk. Two years ago, PostgreSQL's
track record had placed it in a good position to invest in new, high-risk,
high-reward promises. We did that, and we emerged solvent yet carrying an
elevated debt service ratio. It's time to reduce risk somewhat.

You write about a different sense of "reliability." (Had I anticipated this
misunderstanding, I might have written "Restore-probity mode.") None of this
was about classifying people, most of whom allocate substantial time to each
kind of work.

How will we participate in cleanup efforts? How do we know when something
has been "cleaned up", how will we measure our success or failure? I think
we should be clear that wasting N months on cleanup can *fail* to achieve a
useful objective. Without a clear plan it almost certainly will do so. The
flip side is that wasting N months will cause great amusement and dancing
amongst those people who wish to pull ahead of our open source project and
we should take care not to hand them a victory from an overreaction.

I agree with all that. We should likewise take care not to become insolvent
from an underreaction.

I understand the overreaction/underreaction debate. Here were my goals
in this discussion:

1. stop worrying about the 9.5 timeline so we could honestly assess our
software - *done*
2. seriously address multi-xact issues without 9.5/commit-fest pressure -
*in process*
3. identify any other areas in need of serious work

While I like the list you provided, I don't think we can be effective in
an environment where we assume every big new feature will have problems
like multi-xact. For example, we have not seen destabilization from any
major 9.4 features, that I can remember anyway.

Unless there is consensus about new areas for #3, I am thinking we will
continue looking at multi-xact until we are happy, then move ahead with
9.5 items in the way we have before.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


#148

Andres Freund

andres@anarazel.de

over 10 years ago

In reply to: Bruce Momjian (#147)

Re: Restore-reliability mode

On 2015-06-08 13:44:05 -0400, Bruce Momjian wrote:

I understand the overreaction/underreaction debate. Here were my goals
in this discussion:

1. stop worrying about the 9.5 timeline so we could honestly assess our
software - *done*
2. seriously address multi-xact issues without 9.5/commit-fest pressure -
*in process*
3. identify any other areas in need of serious work

While I like the list you provided, I don't think we can be effective in
an environment where we assume every big new feature will have problems
like multi-xact. For example, we have not seen destabilization from any
major 9.4 features, that I can remember anyway.

Unless there is consensus about new areas for #3, I am thinking we will
continue looking at multi-xact until we are happy, then move ahead with
9.5 items in the way we have before.

I think one important part is that we (continue to?) regularly tell our
employers that work on pre-commit, post-commit review, and refactoring
are critical for their long term business prospects. My impression so
far is that the employer side hasn't widely realized that fact, and
that many contributors do the review etc. part in their spare time.

Andres


#149

Bruce Momjian

bruce@momjian.us

over 10 years ago

In reply to: Andres Freund (#148)

Re: Restore-reliability mode

On Mon, Jun 8, 2015 at 07:48:36PM +0200, Andres Freund wrote:

On 2015-06-08 13:44:05 -0400, Bruce Momjian wrote:

I understand the overreaction/underreaction debate. Here were my goals
in this discussion:

1. stop worrying about the 9.5 timeline so we could honestly assess our
software - *done*
2. seriously address multi-xact issues without 9.5/commit-fest pressure -
*in process*
3. identify any other areas in need of serious work

While I like the list you provided, I don't think we can be effective in
an environment where we assume every big new feature will have problems
like multi-xact. For example, we have not seen destabilization from any
major 9.4 features, that I can remember anyway.

Unless there is consensus about new areas for #3, I am thinking we will
continue looking at multi-xact until we are happy, then move ahead with
9.5 items in the way we have before.

I think one important part is that we (continue to?) regularly tell our
employers that work on pre-commit, post-commit review, and refactoring
are critical for their long term business prospects. My impression so
far is that the employer side hasn't widely realized that fact, and
that many contributors do the review etc. part in their spare time.

Agreed. My bet is that more employers realize it now than they did a
few months ago --- kind of hard to miss all those minor releases and
customer complaints. :-(

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


#150

Gavin Flower

GavinFlower@archidevsys.co.nz

over 10 years ago

In reply to: David Gould (#132)

Re: [CORE] Restore-reliability mode

On 09/06/15 00:59, David Gould wrote:

I think Alphas are valuable and useful and even more so if they have release
notes. For example, some of my clients are capable of fetching sources and
building from scratch and filing bug reports and are often interested in
particular new features. They even have staging infrastructure that could
test new postgres releases with real applications. But they don't do it.
They also don't follow -hackers, they don't track git, and they don't have
any easy way to tell if the new feature they are interested in is
actually complete and ready to test at any particular time. A lot of
features are developed in multiple commits over a period of time and they
see no point in testing until at least most of the feature is complete and
expected to work. But it is not obvious from outside when that happens for
any given feature. For my clients the value of Alpha releases would
mainly be the release notes, or some other mark in the sand that says "As of
Alpha-3 feature X is included and expected to mostly work."

-dg

RELEASE NOTES

I think that having:

1. release notes

2. an Alpha people can simply install without having to compile

Would encourage more people to get involved. Such people would be
unlikely to have the time and inclination to use 'nightlies', even if
compiling was not required.

I have read other posts in this thread, that support the above.

Surely, it would be good for pg to have some more people checking
quality at an earlier stage? So reducing barriers to do so is a good thing?

Cheers,
Gavin


#151

David Gould

daveg@sonic.net

over 10 years ago

In reply to: Claudio Freire (#138)

Re: [CORE] Restore-reliability mode

On Mon, 8 Jun 2015 13:03:56 -0300
Claudio Freire <klaussfreire@gmail.com> wrote:

Ohmygosh, you have to rpm install a bunch of -devel stuff? What a massive
hardship.

It's not about the 5 minutes of compile time, it's about the signalling.

Just *when* is git ready for testing? You don't know from the outside.

I do lurk here a lot and still am unsure quite often.

Even simply releasing an alpha *tarball* would be useful enough. What
is needed is the signal to test, rather than a fully-built package.

This. The clients I referred to earlier don't even use the rpm packages,
they build from sources. They need to know when it is worthwhile to take a
new set of sources and test. Some sort of labeling about what the contents
are would enable them to do this.

I don't think a monthly snapshot would work as well, as the requirement is
knowing that "grouping sets are in", not that "it is July now".

-dg

--
David Gould daveg@sonic.net
If simplicity worked, the world would be overrun with insects.


#152

Noah Misch

noah@leadboat.com

over 10 years ago

In reply to: Andres Freund (#76)

Re: Restore-reliability mode

On Wed, Jun 03, 2015 at 04:18:37PM +0200, Andres Freund wrote:

On 2015-06-03 09:50:49 -0400, Noah Misch wrote:

Second, I would define the subject matter as "bug fixes, testing and
review", not "restructuring, testing and review." Different code
structures are clearest to different hackers. Restructuring, on
average, adds bugs even more quickly than feature development adds
them.

I can't agree with this. While I agree with not doing large
restructuring for 9.5, I think we can't afford not to refactor for
clarity, even if that introduces bugs. Noticeable parts of our code have
to frequently be modified for new features and are badly structured at
the same time. While restructuring may temporarily increase the
number of bugs in the short term, it'll decrease the number of bugs long
term while increasing the number of potential contributors and new
features. That's obviously not to say we should just refactor for the
sake of it.

I think I agree with everything after your first sentence. I liked your
specific proposal to split StartupXLOG(), but making broad-appeal
restructuring proposals is hard. I doubt we would get good results by casting
a wide net for restructuring ideas. Automated testing has a lower barrier to
entry and is far less liable to make things worse instead of better. I can
hope for good results from a TestSuiteFest, but not from a RestructureFest.
That said, if folks initiate compelling restructure proposals, we should be
willing to risk bugs from them like we risk bugs to acquire new features.


#153

Andres Freund

andres@anarazel.de

over 10 years ago

In reply to: Noah Misch (#152)

Re: Restore-reliability mode

On 2015-06-10 01:57:22 -0400, Noah Misch wrote:

I think I agree with everything after your first sentence. I liked your
specific proposal to split StartupXLOG(), but making broad-appeal
restructuring proposals is hard. I doubt we would get good results by casting
a wide net for restructuring ideas.

I'm not meaning that we should actively strive to find as many things to
refactor as possible (yes, over-emphasized a bit). But that we shouldn't
skip refactoring if we notice something structurally bad, just because
it's been that way and we don't want to touch something "working". That
argument has e.g. been made repeatedly for xlog.c contents.

My feeling is that we're reaching the stage where a significant number
of bugs are added because code is structured "needlessly" complicated
and/or repetitive. And better testing can only catch so much - often
enough somebody has to think of all the possible corner cases.

Automated testing has a lower barrier to
entry and is far less liable to make things worse instead of better. I can
hope for good results from a TestSuiteFest, but not from a RestructureFest.
That said, if folks initiate compelling restructure proposals, we should be
willing to risk bugs from them like we risk bugs to acquire new
features.

Sure, increasing testing and reviews are good independently. And
especially testing actually makes refactoring much more realistic.

Greetings,

Andres Freund


#154

Alvaro Herrera

alvherre@2ndquadrant.com

over 10 years ago

In reply to: Peter Geoghegan (#131)

1 attachment(s)

Re: Restore-reliability mode

Peter Geoghegan wrote:

On Sat, Jun 6, 2015 at 12:58 PM, Noah Misch <noah@leadboat.com> wrote:

- Call VALGRIND_MAKE_MEM_NOACCESS() on a shared buffer when its local pin
count falls to zero. Under CLOBBER_FREED_MEMORY, wipe a shared buffer
when its global pin count falls to zero.

Did a patch for this ever materialize?

I think the first part would be something like the attached.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

0001-add-valgrind-calls.patch (text/x-diff; charset=us-ascii)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index e4b25587..83fde10 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -47,6 +47,7 @@
 #include "storage/proc.h"
 #include "storage/smgr.h"
 #include "storage/standby.h"
+#include "utils/memdebug.h"
 #include "utils/rel.h"
 #include "utils/resowner_private.h"
 #include "utils/timestamp.h"
@@ -1438,6 +1439,9 @@ PinBuffer(volatile BufferDesc *buf, BufferAccessStrategy strategy)
 		ref = NewPrivateRefCountEntry(b + 1);
 
 		LockBufHdr(buf);
+
+		VALGRIND_MAKE_MEM_DEFINED(BufHdrGetBlock(buf), BLCKSZ);
+
 		buf->refcount++;
 		if (strategy == NULL)
 		{
@@ -1498,6 +1502,8 @@ PinBuffer_Locked(volatile BufferDesc *buf)
 	 */
 	Assert(GetPrivateRefCountEntry(b + 1, false) == NULL);
 
+	VALGRIND_MAKE_MEM_DEFINED(BufHdrGetBlock(buf), BLCKSZ);
+
 	buf->refcount++;
 	UnlockBufHdr(buf);
 
@@ -1543,6 +1549,8 @@ UnpinBuffer(volatile BufferDesc *buf, bool fixOwner)
 		Assert(buf->refcount > 0);
 		buf->refcount--;
 
+		VALGRIND_MAKE_MEM_NOACCESS(BufHdrGetBlock(buf), BLCKSZ);
+
 		/* Support LockBufferForCleanup() */
 		if ((buf->flags & BM_PIN_COUNT_WAITER) &&
 			buf->refcount == 1)
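
For readers who want to see the pattern in the patch above in isolation, here is a minimal standalone sketch. It is not bufmgr.c code: ToyBuffer, toy_pin(), toy_unpin() and TOY_BLCKSZ are hypothetical names, and the no-op fallbacks only imitate the approach of utils/memdebug.h. The one difference from the patch is that the page is made defined only on the 0 -> 1 pin transition.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * When not building for Valgrind, turn the client requests into no-ops,
 * in the spirit of utils/memdebug.h. Everything named Toy* / toy_* below
 * is hypothetical and exists only for illustration.
 */
#ifdef USE_VALGRIND
#include <valgrind/memcheck.h>
#else
#define VALGRIND_MAKE_MEM_DEFINED(addr, size)   do {} while (0)
#define VALGRIND_MAKE_MEM_NOACCESS(addr, size)  do {} while (0)
#endif

#define TOY_BLCKSZ 8192

typedef struct ToyBuffer
{
    int  local_pins;            /* this process's pin count */
    char block[TOY_BLCKSZ];     /* page contents */
} ToyBuffer;

/* Pin: on the 0 -> 1 transition, make the page addressable again. */
static char *
toy_pin(ToyBuffer *buf)
{
    if (buf->local_pins == 0)
        VALGRIND_MAKE_MEM_DEFINED(buf->block, TOY_BLCKSZ);
    buf->local_pins++;
    return buf->block;
}

/* Unpin: on the 1 -> 0 transition, mark the page off-limits for Memcheck. */
static void
toy_unpin(ToyBuffer *buf)
{
    assert(buf->local_pins > 0);
    buf->local_pins--;
    if (buf->local_pins == 0)
        VALGRIND_MAKE_MEM_NOACCESS(buf->block, TOY_BLCKSZ);
}
```

Built with -DUSE_VALGRIND and run under Memcheck, a read or write through the pointer returned by toy_pin() after the last toy_unpin() should be reported as an invalid access. Without Valgrind the macros compile away and the code behaves as usual.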

#155

Alvaro Herrera

alvherre@2ndquadrant.com

over 10 years ago

In reply to: Noah Misch (#123)

Re: Restore-reliability mode

Noah Misch wrote:

- Add buildfarm members. This entails reporting any bugs that prevent an
initial passing run. Once you have a passing run, schedule regular runs.
Examples of useful additions:
- "./configure ac_cv_func_getopt_long=no, ac_cv_func_snprintf=no ..." to
enable all the replacement code regardless of the current platform's need
for it. This helps distinguish "Windows bug" from "replacement code bug."
- --disable-integer-datetimes, --disable-float8-byval, --disable-float4-byval,
--disable-spinlocks, --disable-atomics, --disable-thread-safety,
--disable-largefile, #define RANDOMIZE_ALLOCATED_MEMORY

#define RELCACHE_FORCE_RELEASE + #define CLOBBER_FREED_MEMORY

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#156

Noah Misch

noah@leadboat.com

over 10 years ago

In reply to: Alvaro Herrera (#154)

Re: Restore-reliability mode

On Thu, Jul 23, 2015 at 04:53:49PM -0300, Alvaro Herrera wrote:

Peter Geoghegan wrote:

On Sat, Jun 6, 2015 at 12:58 PM, Noah Misch <noah@leadboat.com> wrote:

- Call VALGRIND_MAKE_MEM_NOACCESS() on a shared buffer when its local pin
count falls to zero. Under CLOBBER_FREED_MEMORY, wipe a shared buffer
when its global pin count falls to zero.

Did a patch for this ever materialize?

I think the first part would be something like the attached.

Neat. Does it produce any new complaints during "make installcheck"?


#157

Alvaro Herrera

alvherre@2ndquadrant.com

over 10 years ago

In reply to: Noah Misch (#156)

Re: Restore-reliability mode

Noah Misch wrote:

On Thu, Jul 23, 2015 at 04:53:49PM -0300, Alvaro Herrera wrote:

Peter Geoghegan wrote:

On Sat, Jun 6, 2015 at 12:58 PM, Noah Misch <noah@leadboat.com> wrote:

- Call VALGRIND_MAKE_MEM_NOACCESS() on a shared buffer when its local pin
count falls to zero. Under CLOBBER_FREED_MEMORY, wipe a shared buffer
when its global pin count falls to zero.

Did a patch for this ever materialize?

I think the first part would be something like the attached.

Neat. Does it produce any new complaints during "make installcheck"?

I only tried a few tests, for lack of time, and it didn't produce any.
(To verify that the whole thing was working properly, I reduced the
range of memory made available during PinBuffer and that resulted in a
crash immediately). I am not really familiar with valgrind TBH and just
copied a recipe to run postmaster under it, so if someone with more
valgrind-fu could verify this, it would be great.

This part:

Under CLOBBER_FREED_MEMORY, wipe a shared buffer when its
global pin count falls to zero.

can be done without any valgrind, I think. Any takers?
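
As a rough illustration of what that second part might look like, here is a toy sketch, not a patch. It assumes one possible design: when the global pin count reaches zero, write the page back if it is dirty, overwrite the buffer with a clobber byte, and mark it invalid so the next pin reloads it. ToySharedBuffer, the toy_shared_* functions and the 0x7F byte are all assumptions made for the example; a real implementation in bufmgr.c would have to decide how page contents are preserved, and may look quite different.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Normally a build-time define; set here only so that this sketch
 * exercises the wipe path.
 */
#define CLOBBER_FREED_MEMORY 1

#define TOY_BLCKSZ    8192
#define TOY_WIPE_BYTE 0x7F      /* assumed clobber pattern */

/* Hypothetical stand-in for a shared buffer and its on-disk page. */
typedef struct ToySharedBuffer
{
    int   refcount;             /* pins held across all processes */
    bool  valid;                /* does block[] hold the page image? */
    bool  dirty;                /* modified since last write-back? */
    char  block[TOY_BLCKSZ];
    char *disk;                 /* TOY_BLCKSZ bytes of backing store */
} ToySharedBuffer;

/* Pin: reload the page from "disk" if an earlier wipe invalidated it. */
static char *
toy_shared_pin(ToySharedBuffer *buf)
{
    if (!buf->valid)
    {
        memcpy(buf->block, buf->disk, TOY_BLCKSZ);
        buf->valid = true;
    }
    buf->refcount++;
    return buf->block;
}

/* Unpin: once nobody holds a pin, write back and clobber the buffer. */
static void
toy_shared_unpin(ToySharedBuffer *buf)
{
    assert(buf->refcount > 0);
    buf->refcount--;
#ifdef CLOBBER_FREED_MEMORY
    if (buf->refcount == 0)
    {
        if (buf->dirty)
        {
            memcpy(buf->disk, buf->block, TOY_BLCKSZ);
            buf->dirty = false;
        }
        memset(buf->block, TOY_WIPE_BYTE, TOY_BLCKSZ);
        buf->valid = false;
    }
#endif
}
```

The point of the wipe is that code which keeps using a page pointer after dropping its last pin reads obvious garbage instead of plausible stale data, so the bug shows up in ordinary regression runs without Valgrind.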

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
