Please advise features in 7.1
Hello,
I've looked at the resources available through the web page to CVS and other
stuff,
however I can't find a statement of what's likely to be in 7.1 and what is planned
for later.
Reason: I want to know if any of these features are scheduled.
1. Calculated fields in table definitions . eg.
Create table test (
A Integer,
B integer,
the_sum As (A+B),
);
This is like MSSQL
2. Any parameterised triggers
3. Any parameterised stored procedures that return a result set.
These are _extraordinarily_ useful for application development.
If anyone has a way of bolting on any of these to 7.0, I'd be keen to hear from
you.
Regards
John
"John Huttley" <John@mwk.co.nz> writes:
Reason: I want to know if any of these features are scheduled.
1. Calculated fields in table definitions . eg.
Create table test (
A Integer,
B integer,
the_sum As (A+B),
);
You can do that now (and for many versions past) with a trigger.
It's not quite as convenient as it ought to be, but it's possible.
AFAIK there's no change in that situation for 7.1.
2. Any parameterised triggers
We've had parameterized triggers for years. Maybe you attach some
meaning to that term beyond what I do?
3. Any parameterised stored procedures that return a result set.
There is some support (dating back to Berkeley Postquel) for functions
returning sets, but it's pretty ugly and limited. Proper support might
happen in 7.2 ...
regards, tom lane
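(For reference, that "pretty ugly and limited" set-returning support looks roughly like
this -- a sketch only, with invented names; the awkward part is that the set comes
back through the target list rather than the FROM clause:)

create function big_values () returns setof integer as '
select a from test where a > 100
' language 'sql';

-- called in the target list; one result row per element of the set
select big_values();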
----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "John Huttley" <John@mwk.co.nz>
Cc: <pgsql-hackers@postgresql.org>
Sent: Thursday, 23 November 2000 19:05
Subject: Re: [HACKERS] Please advise features in 7.1
"John Huttley" <John@mwk.co.nz> writes:
Reason: I want to know if any of these features are scheduled.
1. Calculated fields in table definitions . eg.
Create table test (
A Integer,
B integer,
the_sum As (A+B),
);
You can do that now (and for many versions past) with a trigger.
It's not quite as convenient as it ought to be, but it's possible.
AFAIK there's no change in that situation for 7.1.
Yes. Perhaps defining the table with a dummy field and setting up a 'before'
trigger which replaces that field with a calculated value?
Messy but feasible.
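(A rough sketch of that approach, purely for illustration -- the function and
trigger names are invented, and it assumes PL/pgSQL with the 7.0-era
"returns opaque" convention, with the_sum declared as an ordinary column:)

create table test (
a integer,
b integer,
the_sum integer    -- dummy column, filled in by the trigger below
);

create function fill_the_sum () returns opaque as '
begin
    new.the_sum := new.a + new.b;
    return new;
end;
' language 'plpgsql';

create trigger test_the_sum before insert or update on test
for each row execute procedure fill_the_sum();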
2. Any parameterised triggers
We've had parameterized triggers for years. Maybe you attach some
meaning to that term beyond what I do?
I'm referring to the manual, which says that functions used for triggers must have
no parameters and return type Opaque. And indeed it is impossible to create a trigger
from a PL/pgSQL function that takes any parameters.
Thus, if we have a lot of triggers which are very similar, we cannot just use
one function and pass an identifying parameter or two to it; we must create an
individual function for each trigger.
It's irritating more than fatal.
3. Any parameterised stored procedures that return a result set.
There is some support (dating back to Berkeley Postquel) for functions
returning sets, but it's pretty ugly and limited. Proper support might
happen in 7.2 ...
Something to look forward to! Meanwhile I'll have a play and see if it's
possible to use a read trigger to populate a temporary table. Hmm, that might
require a statement-level trigger. Another thing for 7.2, I guess.
The application programming we are doing now utilises stored procedures
returning record sets (MSSQL), and the lack is a showstopper in our migration
plans. Sigh.
Thanks Tom
Regards
John
Reason: I want to know if any of these features are scheduled.
1. Calculated fields in table definitions . eg.
Create table test (
A Integer,
B integer,
the_sum As (A+B),
);
This is currently easily done with a function named the_sum that takes a
table-type parameter and returns the sum of a + b.
Create table test (
A Integer,
B integer
);
create function the_sum (test) returns integer as
'
begin
return ($1.a + $1.b);
end;
' language 'plpgsql';
A select * won't return the_sum, but a
select t.a, t.b, t.the_sum from test t;
will do what you want.
Unfortunately it only works if you qualify the column the_sum with a table name or alias.
(But I heard you mention the Micro$oft word, and they tend to always use aliases anyway.)
Maybe we could even extend the column search to the unqualified case?
Andreas
At 18:00 23/11/00 +1300, John Huttley wrote:
1. Calculated fields in table definitions . eg.
Can't really do this - you might want to consider a view with an insert &
update rule. I'm not sure how flexible rules are, and you may not be able to
write rules to make views function like tables, but that is at least part
of their purpose, I think.
----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/
At 06:00 PM 11/23/00 +1300, John Huttley wrote:
1. Calculated fields in table definitions . eg.
Create table test (
A Integer,
B integer,
the_sum As (A+B),
);
...
These are _extraordinarily_ useful for application development.
If anyone has a way of bolting on any of these to 7.0, I'd be keen to hear
from
you.
Create a trigger on insert/update for this case...
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
At 12:28 PM 11/23/00 +0100, Zeugswetter Andreas SB wrote:
Reason: I want to know if any of these features are scheduled.
1. Calculated fields in table definitions . eg.
Create table test (
A Integer,
B integer,
the_sum As (A+B),
);
This is currently easily done with a procedure that takes a tabletype
parameter with the name the_sum returning the sum of a + b.
Create table test (
A Integer,
B integer
);
create function the_sum (test) returns integer as
'
begin
return ($1.a + $1.b);
end;
' language 'plpgsql';
A select * won't return the_sum
create view test2 as select A, B, A+B as the_sum from test;
will, though.
See, lots of ways to do it!
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
"john huttley" <john@mwk.co.nz> writes:
We've had parameterized triggers for years. Maybe you attach some
meaning to that term beyond what I do?
I'm referring to the manual that says functions used for triggers must
have no parameters and return a type Opaque.
The function has to be declared that way, but you can actually pass a
set of string parameters to it from the CREATE TRIGGER command. The
strings show up in some special variable or other inside the function.
(No, I don't know why it was done in that ugly way...) See the manual's
discussion of trigger programming.
regards, tom lane
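(For illustration, a minimal sketch of what Tom describes -- one generic trigger
function shared by several triggers, with the distinguishing strings supplied in
CREATE TRIGGER and read back through PL/pgSQL's TG_ARGV array; the table and
the "source" column here are invented:)

create function stamp_source () returns opaque as '
begin
    -- TG_ARGV[0] is the first string given in the CREATE TRIGGER command
    new.source := TG_ARGV[0];
    return new;
end;
' language 'plpgsql';

-- the same function serves many tables, parameterised per trigger
create trigger test_stamp before insert or update on test
for each row execute procedure stamp_source('web');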
Thanks for your help, everyone.
This is a summary of replies.
1. Calculated fields in table definitions . eg.
Create table test (
A Integer,
B integer,
the_sum As (A+B),
);
This functionality can be achieved through the use of views.
Implementing the create table syntax may not be too hard,
but not in 7.1...
2 Parameterised Triggers
The functionality is there; it's just that the documentation gave the wrong impression.
A user manual example of using parameterised triggers to implement referential
integrity would be welcome.
3. Stored Procedures returning a record set.
Dream on!
Regards
John
On Tue, 28 Nov 2000, John Huttley wrote:
3. Stored Procedures returning a record set.
Dream on!
This is something I would be really interested to see working. What are the
issues? My understanding is that it is technically feasible but too
complicated to add to PL/pgSQL? It seems to me a basic service that needs
to be implemented soon, even if it's just returning multiple rows of one
column...
- Andrew
Hi,
how long is PG7.1 already in beta testing? can it be released before Christmas day?
can PG7.1 will recover database from system crash?
Thanks,
XuYifeng
At 04:17 PM 11/28/00 +0800, xuyifeng wrote:
Hi,
how long is PG7.1 already in beta testing? can it be released before Christmas day?
can PG7.1 will recover database from system crash?
This guy's a troll from the PHP Builder's site (at least, Tim Perdue and I suspect this
due to some posts he made in regard to Tim's SourceForge/Postgres article).
Since he's read Tim's article, and at least some of the follow-up posts (given that
he's posted responses himself), he should know by now that PG 7.1 is still in a pre-beta
state and won't be released before Christmas day. I also posted a fairly long answer
to a question Tim's posted at phpbuilder.com regarding recoverability and this guy's
undoubtedly read it, too.
Have I forgotten anything, xuyifeng?
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
no doubt, I have touched some problems PG has, right? if PG is so good,
is there any necessary for the team to improve PG again?
Regards,
XuYifeng
----- Original Message -----
From: Don Baccus <dhogaza@pacifier.com>
To: xuyifeng <jamexu@telekbird.com.cn>; <pgsql-hackers@postgresql.org>
Sent: Tuesday, November 28, 2000 10:37 PM
Subject: Re: [HACKERS] beta testing version
At 11:15 PM 11/28/00 +0800, xuyifeng wrote:
no doubt, I have touched some problems PG has, right? if PG is so good,
is there any necessary for the team to improve PG again?
See? Troll...
The guy worships MySQL, just in case folks haven't made the connection.
I'm going to ignore him from now on, suggest others do the same, I'm sure
he'll go away eventually.
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
you are complete wrong, if I don't like PG, I'll never go here or talk anything about PG, I don't care it.
I just want PG can be improved quickly, for me crash recover is very urgent problem,
otherewise PG is forced to stay on my desktop machine, We'll dare not move it to our Server,
I always see myself as a customer, customer is always right.
Regards,
XuYifeng
----- Original Message -----
From: Don Baccus <dhogaza@pacifier.com>
To: xuyifeng <jamexu@telekbird.com.cn>; <pgsql-hackers@postgresql.org>
Sent: Tuesday, November 28, 2000 11:16 PM
Subject: Re: [HACKERS] beta testing version
On Tue, Nov 28, 2000 at 02:04:01PM +1300, John Huttley wrote:
Thanks for your help, everyone.
This is a summary of replies.
1. Calculated fields in table definitions . eg.
Create table test (
A Integer,
B integer,
the_sum As (A+B),
);
This functionality can be achieved through the use of views.
Using a view for this isn't quite the same functionality as a computed
field, from what I understand, since the calculation will be done at
SELECT time, rather than INSERT/UPDATE.
This can also be done with a trigger, which, while more cumbersome to
write, would be capable of doing the math at modification time.
Ross
--
Open source code is like a natural resource, it's the result of providing
food and sunshine to programmers, and then staying out of their way.
[...] [It] is not going away because it has utility for both the developers
and users independent of economic motivations. Jim Flynn, Sunnyvale, Calif.
no doubt, I have touched some problems PG has, right? if PG is so good,
is there any necessary for the team to improve PG again?
*rofl*
Good call Don :)
- Thomas
This is a summary of replies.
1. Calculated fields in table definitions . eg.
Create table test (
A Integer,
B integer,
the_sum As (A+B),
);
This functionality can be achieved through the use of views.
Using a view for this isn't quite the same functionality as a computed
field, from what I understand, since the calculation will be done at
SELECT time, rather than INSERT/UPDATE.
I would expect the calculated field from the above example to be calculated
at select time also, no? You don't want to waste disk space on something
you can easily compute at runtime.
Andreas
I guess it depends on what you're using it for -- disk space is cheap and
abundant anymore, I can see some advantages of having it computed only once
rather than X times, where X is the number of SELECTs as that could get
costly on really high traffic servers.. Costly not so much for simple
computations like that but more complex ones.
Once and for all forget the argument in database technology, that disk space
is cheap in regard to $/Mb. That is not the question. The issue is:
1. the amount of rows you can cache
2. number of rows you can read from disk per second
(note that it is not pages/sec)
3. how many rows you can sort in memory
In the above sense disk space is one of the most expensive things in a
database system. Saving disk space where possible will gain you drastic
performance advantages.
Andreas
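(A rough back-of-the-envelope illustration of the point, with invented numbers:
256 MB of cache / 100-byte rows = roughly 2.7 million cached rows
256 MB of cache /  60-byte rows = roughly 4.5 million cached rows
The same hardware caches about two-thirds more of the table, and a sequential
scan reads about 40% fewer pages.)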
On Tue, 28 Nov 2000, xuyifeng wrote:
no doubt, I have touched some problems PG has, right? if PG is so good,
is there any necessary for the team to improve PG again?
There is always room for improvements for any software package ... whether
it be PgSQL, Linux, FreeBSD or PHPBuilder ... as ppl learn more,
understand more and come up with new techniques, things tend to get better
...
Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
On Tue, 28 Nov 2000, xuyifeng wrote:
you are complete wrong, if I don't like PG, I'll never go here or talk
anything about PG, I don't care it. I just want PG can be improved
quickly, for me crash recover is very urgent problem, otherewise PG is
forced to stay on my desktop machine, We'll dare not move it to our
Server, I always see myself as a customer, customer is always right.
except when they are wrong ...
... but, as for crash recovery, the plan right now is that on Thursday, Dec
1st, 7.1 goes beta ... if you are so keen on the crash recovery stuff,
what I'd recommend is grab the snapshot, and work with that on your
machine, get used to the features that it presents and report any bugs you
find. Between beta and release, there will be bug fixes, but no features
added, so it makes for a relatively safe starting point. I wouldn't use
it in production (or, rather, I personally would, but it isn't something
I'd recommend for the faint of heart), but it will give you a base to
start from ...
release will be shortly into the new year, depending on what sorts of bugs
ppl report and how quickly they can be fixed ... if all goes well, Jan 1st
will be the release date, but, from experience, we're looking at closer to Jan
15th :)
Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
On Tue, 28 Nov 2000, Hannu Krosing wrote:
xuyifeng wrote:
I just noticed this conversation so I have not followed all of it,
but you seem to have strange priorities
I just want PG can be improved quickly, for me crash recover is very urgent problem,
Crash avoidance is usually much more urgent, at least on production
servers.
Good call, but I kinda jumped to the conclusion that since PgSQL itself
isn't that crash prone, it's his OS or his hardware that was the problem :0
On Tue, Nov 28, 2000 at 05:19:45PM +0100, Zeugswetter Andreas SB wrote:
I guess it depends on what you're using it for -- disk space is cheap and
abundant anymore, I can see some advantages of having it computed only once
rather than X times, where X is the number of SELECTs as that could get
costly on really high traffic servers.. Costly not so much for simple
computations like that but more complex ones.
<snip good arguments about disk space>
As I said in my original post, my understanding of computed fields may
be in error. If they're computed at SELECT time, to avoid creating table
space, then a VIEW is exactly the right solution. However, it's easy to
come up with examples of complex calculations that it would be useful
to cache the results of, in the table. Then, computing at INSERT/UPDATE
is clearly the way to go.
So, having _both_ is the best thing.
Ross
I guess it depends on what you're using it for -- disk space is cheap and
abundant anymore, I can see some advantages of having it computed only once
rather than X times, where X is the number of SELECTs as that could get
costly on really high traffic servers.. Costly not so much for simple
computations like that but more complex ones.
Just playing the devil's advocate a bit.
-Mitch
----- Original Message -----
From: "Zeugswetter Andreas SB" <ZeugswetterA@wien.spardat.at>
To: "'Ross J. Reedstrom'" <reedstrm@rice.edu>;
<pgsql-hackers@postgresql.org>
Sent: Tuesday, November 28, 2000 7:50 AM
Subject: AW: [HACKERS] Please advise features in 7.1 (SUMMARY)
So, having _both_ is the best thing.
Absolutely, that's always what I meant -- we already have views and views
can do this type of stuff at SELECT time can't they? So it's not a change,
just an addition....
And the precalculated and stored on disk thing can be done with triggers.
Andreas
So, having _both_ is the best thing.
Absolutely, that's always what I meant -- we already have views and views
can do this type of stuff at SELECT time can't they? So it's not a change,
just an addition....
-Mitch
This is one of the not-so-stomped boxes running PostgreSQL -- I've never
restarted PostgreSQL on it since it was installed.
12:03pm up 122 days, 7:54, 1 user, load average: 0.08, 0.11, 0.09
I had some index corruption problems in 6.5.3 but since 7.0.X I haven't
heard so much as a peep from any PostgreSQL backend. It's superbly stable on
all my machines..
Damn good work guys.
-Mitch
----- Original Message -----
From: "The Hermit Hacker" <scrappy@hub.org>
To: "Hannu Krosing" <hannu@tm.ee>
Cc: "xuyifeng" <jamexu@telekbird.com.cn>; <pgsql-hackers@postgresql.org>;
"Don Baccus" <dhogaza@pacifier.com>
Sent: Tuesday, November 28, 2000 8:53 AM
Subject: Re: [HACKERS] beta testing version
xuyifeng wrote:
I just noticed this conversation so I have not followed all of it,
but you seem to have strange priorities
I just want PG can be improved quickly, for me crash recover is very urgent problem,
Crash avoidance is usually much more urgent, at least on production
servers.
otherewise PG is forced to stay on my desktop machine, We'll dare not move it to our Server,
Why do you keep crashing your server ?
If your desktop crashes less often than your server you might exchange
them, no?
I always see myself as a customer, customer is always right.
I'd like to see myself as being always right too ;)
-------------------
Hannu
Mitch Vincent wrote:
This is one of the not-so-stomped boxes running PostgreSQL -- I've never
restarted PostgreSQL on it since it was installed.
12:03pm up 122 days, 7:54, 1 user, load average: 0.08, 0.11, 0.09
I had some index corruption problems in 6.5.3 but since 7.0.X I haven't
heard so much as a peep from any PostgreSQL backend. It's superbly stable on
all my machines..
I have a 6.5.x box at 328 days of active use.
Crash "recovery" seems silly to me. :-)
-Bop
--
Brought to you from boop!, the dual boot Linux/Win95 Compaq Presario 1625
laptop, currently running RedHat 6.1. Your bopping may vary.
At 03:25 PM 11/28/00 -0700, Ron Chmara wrote:
Mitch Vincent wrote:
This is one of the not-so-stomped boxes running PostgreSQL -- I've never
restarted PostgreSQL on it since it was installed.
12:03pm up 122 days, 7:54, 1 user, load average: 0.08, 0.11, 0.09
I had some index corruption problems in 6.5.3 but since 7.0.X I haven't
heard so much as a peep from any PostgreSQL backend. It's superbly stable on
all my machines..
I have a 6.5.x box at 328 days of active use.
Crash "recovery" seems silly to me. :-)
Well, not really ... but since our troll is a devoted MySQL user, it's a bit
of a red-herring anyway, at least as regards his own server.
You know, the one he's afraid to put Postgres on, but sleeps soundly at
night knowing the mighty bullet-proof MySQL with its full transaction
semantics, archive logging and recovery from REDO logs and all that
will save him? :)
Again ... he's a troll, not even a very entertaining one.
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
On Tue, 28 Nov 2000, Don Baccus wrote:
At 03:25 PM 11/28/00 -0700, Ron Chmara wrote:
Mitch Vincent wrote:
This is one of the not-so-stomped boxes running PostgreSQL -- I've never
restarted PostgreSQL on it since it was installed.
12:03pm up 122 days, 7:54, 1 user, load average: 0.08, 0.11, 0.09
I had some index corruption problems in 6.5.3 but since 7.0.X I haven't
heard so much as a peep from any PostgreSQL backend. It's superbly stable on
all my machines..
I have a 6.5.x box at 328 days of active use.
Crash "recovery" seems silly to me. :-)
Well, not really ... but since our troll is a devoted MySQL user, it's a bit
of a red-herring anyway, at least as regards his own server.
You know, the one he's afraid to put Postgres on, but sleeps soundly at
night knowing the mighty bullet-proof MySQL with its full transaction
semantics, archive logging and recovery from REDO logs and all that
will save him? :)
Again ... he's a troll, not even a very entertaining one.
Or informed?
NO, I just tested how solid PgSQL is. I ran a program busy inserting records into a PG table, then I
suddenly pulled the power from my machine and restarted PG; I could not insert any record into the database
table, and all backends were dead without any response (not a core dump). Note that I am using FreeBSD 4.2,
it's rock solid, it's not an OS crash, it just lost power. We use WindowsNT and MSSQL on our production
server; before we accepted MSSQL, we used this method to test whether MSSQL could endure this kind of strike,
and it was OK, all databases were safely recovered and we could continue our work. We are a stock exchange company,
our servers store millions of dollars of financial numbers, and we don't want any problems in this case.
We are using a UPS, but a UPS is not everything; if you bet everything on the UPS, you must be an idiot.
I know you must be an advocate of PG, but we are professional customers, corporate users; we store critical
data in the database, not your garbage data.
Regards,
XuYifeng
----- Original Message -----
From: Don Baccus <dhogaza@pacifier.com>
To: Ron Chmara <ron@opus1.com>; Mitch Vincent <mitch@venux.net>; <pgsql-hackers@postgresql.org>
Sent: Wednesday, November 29, 2000 6:58 AM
Subject: Re: [HACKERS] beta testing version
On Wed, Nov 29, 2000 at 09:59:34AM +0800, xuyifeng wrote:
NO, I just tested how solid PgSQL is, I run a program busy inserting
record into PG table, when I suddenly pulled out power from my machine ...
Nobody claims PostgreSQL is proof against power failures.
... We use WindowsNT and MSSQL on our production server,
before we accept MSSQL, we use this method to test if MSSQL can endure
this kind of strike, it's OK, all databases are safely recovered, we
can continue our work.
You got lucky. Period. MSSQL is not proof against power failures,
and neither is NTFS. In particular, that the database accepted
transactions afterward is far from proof that its files were not
corrupted.
Incompetent testers produce invalid tests. Invalid tests lead to
meaningless conclusions. Incompetent testers' employers suffer
from false confidence, and poor decision-making.
Nathan Myers
ncm@zembu.com
server, before we accept MSSQL, we use this method to test if MSSQL can
endure this kind of strik,
it's OK, all databases are safely recovered, we can continue our work. we
are a stock exchange company,
And how exactly did you test the integrity of your data? Unless every single
record has got at least a CRC stored somewhere, you won't be able AT ALL to
check for database integrity. The reports from NTFS and MSSQL internal
checking are meaningless for your data integrity.
We are doing this checksumming in our project, and already got a few nasty
surprises when the "CRC daemon" stumbled over a few corrupted records we
never would have discovered otherwise. Exactly this checksumming weeded out
our server alternatives; at present only PostgreSQL is left, as it was the most
reliable of all.
Horst
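(A sketch of the kind of per-record checksumming Horst describes -- the table
layout is invented, and crc32() stands in for whatever routine the "CRC daemon"
uses; PostgreSQL has no built-in checksum function in this era, so it would be
a user-defined C function or be computed client-side:)

create table accounts (
id integer,
payload text,
row_crc text    -- checksum of payload, stored when the row is written
);

-- the daemon periodically recomputes checksums and flags rows that no
-- longer match what was stored
select id from accounts where row_crc <> crc32(payload);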
NO, I just tested how solid PgSQL is, I run a program busy inserting record into PG table, when I
suddenly pulled out power from my machine and restarted PG, I can not insert any record into database
table, all backends are dead without any respone (not core dump), note that I am using FreeBSD 4.2,
it's rock solid, it's not OS crash, it just losted power.
PostgreSQL versions 7.0 and below have the potential of corrupting indices when the system crashes.
The usual procedure would be to reindex the database. Hiroshi has written code to allow this.
It is weird that your installation blocks. Have you checked the postmaster log?
We use WindowsNT and MSSQL on our production
server, before we accept MSSQL, we use this method to test if MSSQL can endure this kind of strik,
it's OK, all databases are safely recovered, we can continue our work. we are a stock exchange company,
our server are storing millilion $ finance number, we don't hope there are any problems in this case,
we are using UPS, but UPS is not everything, it you bet everything on UPS, you must be idiot.
I know you must be an avocation of PG, but we are professional customer, corporation user, we store critical
data into database, not your garbage data.
Yes, this is a test I would also do before putting very sensitive data onto a particular brand of database.
Fortunately Version 7.1 of PostgreSQL will live up to your expectations in this area.
Andreas
xuyifeng wrote:
NO, I just tested how solid PgSQL is, I run a program busy inserting record into PG table, when I
suddenly pulled out power from my machine and restarted PG, I can not insert any record into database
table, all backends are dead without any respone (not core dump), note that I am using FreeBSD 4.2,
it's rock solid, it's not OS crash, it just losted power. We use WindowsNT and MSSQL on our production
server, before we accept MSSQL, we use this method to test if MSSQL can endure this kind of strik,
it's OK, all databases are safely recovered, we can continue our work.
The only way to safely recover them after a major crash would be
manual/supervised recovery from backups + logs.
As not even NTFS is safe from power failures (I have lost an NTFS file
system a few times due to not having a UPS), it is irrelevant whether MSSQL
is. Even if MSSQL is "crash proof" (tm), how can you _prove_ to your
customers/superiors that the last N minutes of transactions were not lost?
If the DB is able to "continue your work" after the crash, you can of
course cover up the fact that the crash even happened and blame the lost
transactions on someone else when they surface at the next audit ;)
Or just claim that computer technology is so complicated that losing a
few transactions is normal - but you could go on working ;) :~) ;-p
What you want for mission-critical data is replicated databases or at
least off-site logging, not "crash
recovery" at some arbitrarily chosen layer. You will need to recover
from the crash even if it destroys
the whole computer.
May I suggest another test for your NT/MSSQL setup - don't pull the plug
but change the input voltage to 10 000 VAC; if this goes well, test with 100 000 VAC ;)
This is also a scenario much less likely to be protected by a UPS than
power loss.
we are a stock exchange company,
our server are storing millilion $ finance number, we don't hope there are any problems in this case,
we are using UPS, but UPS is not everything, it you bet everything on UPS, you must be idiot.
So are you, if you bet everything on hoping that DB will do crash
recovery from any type of crash.
A common case of "crash" that may need to be recovered from is also human
error, like typing drop database at the wrong console.
I know you must be an avocation of PG, but we are professional customer, corporation user, we store critical
data into database, not your garbage data.
Then you'd better have a crash recovery infrastructure and procedures in
place, and not hope that the DB server will do that automatically for you.
--------------------
Hannu
From: "Ross J. Reedstrom" <reedstrm@rice.edu>
On Tue, Nov 28, 2000 at 05:19:45PM +0100, Zeugswetter Andreas SB wrote:
I guess it depends on what you're using it for -- disk space is cheap and
abundant anymore, I can see some advantages of having it computed only once
rather than X times, where X is the number of SELECTs as that could get
costly on really high traffic servers.. Costly not so much for simple
computations like that but more complex ones.
<snip good arguments about disk space>
As I said in my original post, my understanding of computed fields may
be in error. If they're computed at SELECT time, to avoid creating table
space, then a VIEW is exacly the right solution. However, it's easy to
come up with examples of complex calculations that it would be useful
to cache the results of, in the table. Then, computing at INSERT/UPDATE
is clearly the way to go.
So, having _both_ is the best thing.
Ross
I'm new at this, but the view thing?
Isn't that just the same as:
create table test2 ( i1 int4, i2 int4);
...insert...
select i1,i2,i1+i2 from test2;
Magnus
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Programmer/Networker [|] Magnus Naeslund
PGP Key: http://www.genline.nu/mag_pgp.txt
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
I don't have the same luck, sorry to say!
I am running Mandrake linux with OpenWall patched 2.2.17 kernel, dual p3
550Mhz, 1gb memory.
It's a really busy webserver that is constantly running with a load of 10.
Sometimes it spikes to ~40-50 in load (the most we had was 114(!)).
I am running postgresql 7.0.2 (from the Mandrake rpm's).
One problem I have is that in one database we rapidly insert/delete in some
tables, and to maintain good performance on that db, I have to run a
vacuum every hour(!).
I think that db has excessive indexes all over the place (if that could have
anything to do with it?).
Another other problem that is more severe is that the database "crashes"
(read: stops working), if i run psql and do a select it says
"001129.07:04:15.688 [25474] FATAL 1: Memory exhausted in AllocSetAlloc()"
and fails.
I have a cron script that watches postgres, and restarts it if it can't get a
select right.
It fails this way maybe once a day or every two days.
I've searched the mailing list archives for this problem, but it always
seems that my problem doesn't fit the descriptions of the other ppl's
problems generating this error message.
I have not found the right time to upgrade to 7.0.3 yet, and I don't know if
that would solve anything.
Another problem i have is that i get "001128.12:58:01.248 [23444] FATAL 1:
Socket command type unknown" in my logs. I don't know if i get that from
the unix odbc driver, the remote windows odbc driver, or in unix standard db
connections.
I get "pq_recvbuf: unexpected EOF on client connection" alot too, but that i
think only indicates that the socket was closed in a not-so-nice way, and
that it is no "real" error.
It seems that the psql windows odbc driver is generating this.
The postmaster is running with these parameters: "-N 512 -B 1024 -i -o -S
4096"
But as a happy note I can tell you that we have a Linux box here (pentium
100, kernel 2.0.3x) that has nearly 1000 days of uptime, and runs postgres 6.5.x.
It has never failed, not even a single time :)
Magnus Naeslund
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Programmer/Networker [|] Magnus Naeslund
PGP Key: http://www.genline.nu/mag_pgp.txt
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
----- Original Message -----
From: "Mitch Vincent" <mitch@venux.net>
To: <pgsql-hackers@postgresql.org>
Sent: Tuesday, November 28, 2000 19:12
Subject: Re: [HACKERS] beta testing version
----- Original Message -----
From: Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at>
Subject: AW: [HACKERS] beta testing version
NO, I just tested how solid PgSQL is, I run a program busy inserting record into PG table, when I
suddenly pulled out power from my machine and restarted PG, I can not insert any record into database
table, all backends are dead without any respone (not core dump), note that I am using FreeBSD 4.2,
it's rock solid, it's not OS crash, it just losted power.
PostgreSQL Versions 7.0 and below have the potential of corruting indices when the system crashes.
The usual procedure would be to reindex the database. Hiroshi has written code to allow this.
It is weird, that your installation blocks. Have you checked the postmaster log ?
The REINDEX command failed: I have a unique index, and PG claimed there were two records with the same key,
but obviously that is not the case.
[snip]
Yes, this is a test I would also do before putting very sensitive data onto a particular brand of database.
Fortunately Version 7.1 of PostgreSQL will live up to your expectations in this area.
Andreas
Thanks,
---
XuYifeng
"Magnus Naeslund\(f\)" <mag@fbab.net> writes:
Another other problem that is more severe is that the database "crashes"
(read: stops working), if i run psql and do a select it says
"001129.07:04:15.688 [25474] FATAL 1: Memory exhausted in AllocSetAlloc()"
and fails.
That's odd. Does any select at all --- even, say, "SELECT 2+2" --- fail
like that, or just ones referencing a particular table, or maybe you
meant just one specific query?
Another problem i have is that i get "001128.12:58:01.248 [23444] FATAL 1:
Socket command type unknown" in my logs. I don't know if i get that from
the unix odbc driver, the remote windows odbc driver, or in unix standard db
connections.
Do any of your client applications complain that they're being
disconnected on? This might come from something not doing disconnection
cleanly, in which case the client probably wouldn't notice anything wrong.
I get "pq_recvbuf: unexpected EOF on client connection" alot too, but that i
think only indicates that the socket was closed in a not-so-nice way, and
that it is no "real" error.
It seems that the psql windows odbc driver is generating this.
That message is quite harmless AFAIK, although it'd be nice to clean up
the ODBC driver so that it disconnects in the approved fashion.
regards, tom lane
Is "if" clause support in PG?
for example:
"drop table aa if exist"
"insert into aa values(1) if not exists select * from aa where i=1"
I would like PG support it.
---
XuYifeng
----- Original Message -----
From: John Huttley <John@mwk.co.nz>
To: <pgsql-hackers@postgresql.org>
Sent: Tuesday, November 28, 2000 9:04 AM
Subject: [HACKERS] Please advise features in 7.1 (SUMMARY)
our server alternatives; at present only PostgreSQL is left, was the most
reliable of all.
Mind if I ask on which platform (operating system) you did your test? I'm
mostly used to Linux, but after I pay off my computer (still 5 months
remaining), I want to get a used SGI box from a reputable source and put NetBSD
as well as PostgreSQL on it (and maybe AOLserver too, depending on the
threading model of NetBSD).
Alain Toussaint
On Thu, 30 Nov 2000, Thomas Lockhart wrote:
Is "if" clause support in PG?
for example:
"drop table aa if exist"
"insert into aa values(1) if not exists select * from aa where i=1"No. afaict it is not in any SQL standard, so is unlikely to get much
attention from developers.
Plus, for that second one can't you just do:
INSERT INTO aa SELECT 1 WHERE NOT EXISTS (SELECT * FROM aa WHERE i=1);
- Andrew
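(And for the conditional drop, the usual workaround is to ask the system
catalog first and only issue the DROP when the table really exists -- a
sketch, with the decision made by the client or a script, since plain SQL
has no IF here:)

-- returns a row only if a regular table named 'aa' exists
select 1 from pg_class where relname = 'aa' and relkind = 'r';
-- if it does, it is then safe to issue:
--     drop table aa;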
Is "if" clause support in PG?
for example:
"drop table aa if exist"
"insert into aa values(1) if not exists select * from aa where i=1"
No. afaict it is not in any SQL standard, so is unlikely to get much
attention from developers.
- Thomas
At 05:24 AM 11/30/00 +0000, Thomas Lockhart wrote:
Is "if" clause support in PG?
for example:
"drop table aa if exist"
"insert into aa values(1) if not exists select * from aa where i=1"No. afaict it is not in any SQL standard, so is unlikely to get much
attention from developers.
The insert, at least, can be written in standard SQL anyway...
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
v7.1 should improve crash recovery for situations like this ... you'll
still have to do a recovery of the data on corruption of this magnitude,
but at least with the WAL stuff that Vadim is producing, you'll be able to
recover up until the point that the power cable was pulled out of the wall
...
Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
At 07:02 PM 11/30/00 -0400, The Hermit Hacker wrote:
v7.1 should improve crash recovery for situations like this ... you'll
still have to do a recovery of the data on corruption of this magnitude,
but at least with the WAL stuff that Vadim is producing, you'll be able to
recover up until the point that the power cable was pulled out of the wall
No, WAL won't help if an actual database file is corrupted, say by a
disk drive hosing a block or portion thereof with zeros. WAL-based
recovery at startup works on an intact database.
Still, in the general case you need real backup and recovery tools.
Then you can apply archives of REDOs to a backup made of a snapshot
and rebuild up to the last transaction. As opposed to your last
pg_dump.
So what about mirroring (RAID 1)? As the docs tell ya, that protects
you against one drive failing but not against power failure, which can
cause bad data to be written to both mirrors if both are actively
writing when the plug is pulled.
Power failures are evil, face it! :)
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
On Thu, Nov 30, 2000 at 07:02:01PM -0400, The Hermit Hacker wrote:
v7.1 should improve crash recovery ...
... with the WAL stuff that Vadim is producing, you'll be able to
recover up until the point that the power cable was pulled out of
the wall.
Please do not propagate falsehoods like the above. It creates
unsatisfiable expectations, and leads people to fail to take
proper precautions and recovery procedures.
After a power outage on an active database, you may have corruption
at low levels of the system, and unless you have enormous redundancy
(and actually use it to verify everything) the corruption may go
undetected and result in (subtly) wrong answers at any future time.
The logging in 7.1 protects transactions against many sources of
database crash, but not necessarily against OS crash, and certainly
not against power failure. (You might get lucky, or you might just
think you were lucky.) This is the same as for most databases; an
embedded database that talks directly to the hardware might be able
to do better.
Nathan Myers
ncm@zembu.com
On Thu, 30 Nov 2000, Don Baccus wrote:
At 07:02 PM 11/30/00 -0400, The Hermit Hacker wrote:
v7.1 should improve crash recovery for situations like this ... you'll
still have to do a recovery of the data on corruption of this magnitude,
but at least with the WAL stuff that Vadim is producing, you'll be able to
recover up until the point that the power cable was pulled out of the wall
No, WAL won't help if an actual database file is corrupted, say by a
disk drive hosing a block or portion thereof with zeros. WAL-based
recovery at startup works on an intact database.
No, WAL does help, cause you can then pull in your last dump and recover
up to the moment that power cable was pulled out of the wall ...
On Thu, Nov 30, 2000 at 07:47:08PM -0400, The Hermit Hacker wrote:
On Thu, 30 Nov 2000, Don Baccus wrote:
At 07:02 PM 11/30/00 -0400, The Hermit Hacker wrote:
v7.1 should improve crash recovery for situations like this ... you'll
still have to do a recovery of the data on corruption of this magnitude,
but at least with the WAL stuff that Vadim is producing, you'll be able to
recover up until the point that the power cable was pulled out of the wall
No, WAL won't help if an actual database file is corrupted, say by a
disk drive hosing a block or portion thereof with zeros. WAL-based
recovery at startup works on an intact database.
No, WAL does help, cause you can then pull in your last dump and recover
up to the moment that power cable was pulled out of the wall ...
False, on so many counts I can't list them all.
Nathan Myers
ncm
On Thu, 30 Nov 2000, Nathan Myers wrote:
On Thu, Nov 30, 2000 at 07:47:08PM -0400, The Hermit Hacker wrote:
On Thu, 30 Nov 2000, Don Baccus wrote:
At 07:02 PM 11/30/00 -0400, The Hermit Hacker wrote:
v7.1 should improve crash recovery for situations like this ... you'll
still have to do a recovery of the data on corruption of this magnitude,
but at least with the WAL stuff that Vadim is producing, you'll be able to
recover up until the point that the power cable was pulled out of the wall
No, WAL won't help if an actual database file is corrupted, say by a
disk drive hosing a block or portion thereof with zeros. WAL-based
recovery at startup works on an intact database.
No, WAL does help, cause you can then pull in your last dump and recover
up to the moment that power cable was pulled out of the wall ...
False, on so many counts I can't list them all.
would love to hear them ... I'm always open to having my
misunderstandings corrected ...
On Thu, 30 Nov 2000, Nathan Myers wrote:
On Thu, Nov 30, 2000 at 07:47:08PM -0400, The Hermit Hacker wrote:
On Thu, 30 Nov 2000, Don Baccus wrote:
At 07:02 PM 11/30/00 -0400, The Hermit Hacker wrote:
v7.1 should improve crash recovery for situations like this ... you'll
still have to do a recovery of the data on corruption of this magnitude,
but at least with the WAL stuff that Vadim is producing, you'll be able to
recover up until the point that the power cable was pulled out of the wall
No, WAL won't help if an actual database file is corrupted, say by a
disk drive hosing a block or portion thereof with zeros. WAL-based
recovery at startup works on an intact database.
No, WAL does help, cause you can then pull in your last dump and recover
up to the moment that power cable was pulled out of the wall ...
False, on so many counts I can't list them all.
*YAWN*
On Thu, Nov 30, 2000 at 05:37:58PM -0800, Mitch Vincent wrote:
No, WAL does help, cause you can then pull in your last dump and recover
up to the moment that power cable was pulled out of the wall ...
False, on so many counts I can't list them all.
Why? If we're not talking hardware damage and you have a dump made
sometime previous to the crash, why wouldn't that work to restore the
database? I've had to restore a corrupted database from a dump before,
there wasn't any hardware damage, the database (more specifically the
indexes) were corrupted. Of course WAL wasn't around but I don't see
why this wouldn't work...
I posted a more detailed explanation a few minutes ago, but
it appears to have been eaten by the mailing list server.
I won't re-post the explanations that you all have seen over the
last two days, about disk behavior during a power outage; they're
in the archives (I assume -- when last I checked, web access to it
didn't work). Suffice to say that if you pull the plug, there is
just too much about the state of the disks that is unknown.
As for replaying logs against a restored snapshot dump... AIUI, a
dump records tuples by OID, but the WAL refers to TIDs. Therefore,
the WAL won't work as a re-do log to recover your transactions
because the TIDs of the restored tables are all different.
To get replaying we need an "update log", something that might be
in 7.2 if somebody does a lot of work.
Note I'm not saying you're wrong, just asking that you explain your
comment a little more. If WAL can't be used to help recover from
crashes where database corruption occurs, what good is it?
The WAL is a performance optimization for the current recovery
capabilities, which assume uncorrupted table files. It protects
against those database server crashes that happen not to corrupt
the table files (i.e. most). It doesn't protect against corruption
of the tables, by bugs in PG or in the OS or from "hardware events".
It also doesn't protect against OS crashes that result in
write-buffered sectors not having been written before the crash.
Practically, this means that WAL file entries older than a few
seconds are not useful for much.
In general, it's foolish to expect a single system to store very
valuable data with much confidence. To get full recoverability,
you need a "hot failover" system duplicating your transactions in
real time. (Even then, you're vulnerable to application-level
mistakes.)
Nathan Myers
ncm@zembu.com
No, WAL does help, cause you can then pull in your last dump and recover
up to the moment that power cable was pulled out of the wall ...
False, on so many counts I can't list them all.
Why? If we're not talking hardware damage and you have a dump made sometime
previous to the crash, why wouldn't that work to restore the database? I've
had to restore a corrupted database from a dump before, there wasn't any
hardware damage, the database (more specifically the indexes) were
corrupted. Of course WAL wasn't around but I don't see why this wouldn't
work...
Note I'm not saying you're wrong, just asking that you explain your comment
a little more. If WAL can't be used to help recover from crashes where
database corruption occurs, what good is it?
-Mitch
At 05:15 PM 11/30/00 -0800, Nathan Myers wrote:
As for replaying logs against a restored snapshot dump... AIUI, a
dump records tuples by OID, but the WAL refers to TIDs. Therefore,
the WAL won't work as a re-do log to recover your transactions
because the TIDs of the restored tables are all different.
Actually, the dump doesn't record tuple OIDs (unless you specifically
ask for them), it just dumps source sql. When this gets reloaded
you get an equivalent database, but not the same database, that you
started out with.
That's why I've presumed you can't run the WAL against it.
If you and I are wrong I'd love to be surprised!
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
On Thu, 30 Nov 2000, Nathan Myers wrote:
On Thu, Nov 30, 2000 at 07:02:01PM -0400, The Hermit Hacker wrote:
v7.1 should improve crash recovery ...
... with the WAL stuff that Vadim is producing, you'll be able to
recover up until the point that the power cable was pulled out of
the wall.
Please do not propagate falsehoods like the above. It creates
unsatisfiable expectations, and leads people to fail to take
proper precautions and recovery procedures.
After a power outage on an active database, you may have corruption
at low levels of the system, and unless you have enormous redundancy
(and actually use it to verify everything) the corruption may go
undetected and result in (subtly) wrong answers at any future time.
The logging in 7.1 protects transactions against many sources of
database crash, but not necessarily against OS crash, and certainly
not against power failure. (You might get lucky, or you might just
think you were lucky.) This is the same as for most databases; an
embedded database that talks directly to the hardware might be able
to do better.
We're talking about transaction logging here ... nothing gets written to
it until completed ... if I take a "known to be clean" backup from the
night before, restore that and then run through the transaction logs, my
data should be clean, unless my tape itself is corrupt. If the power goes
off half way through a write to the log, then that transaction wouldn't be
marked as completed and won't roll into the restore ...
if a disk goes corrupt, I'd expect that the redo log would possibly have a
problem with corruption .. but if I pull the plug, unless I've somehow
damaged the disk, I would expect my redo log to be clean *and*, unless
Vadim totally messed something up, if there is any corruption in the redo
log, I'd expect that restoring from it would generate some red flags ...
At 03:35 PM 11/30/00 -0800, Nathan Myers wrote:
On Thu, Nov 30, 2000 at 07:02:01PM -0400, The Hermit Hacker wrote:
v7.1 should improve crash recovery ...
... with the WAL stuff that Vadim is producing, you'll be able to
recover up until the point that the power cable was pulled out of
the wall.
Please do not propagate falsehoods like the above. It creates
unsatisfiable expectations, and leads people to fail to take
proper precautions and recovery procedures.
Yeah, I posted similar stuff to the PHPbuilder forum in regard to
PG.
The logging in 7.1 protects transactions against many sources of
database crash, but not necessarily against OS crash, and certainly
not against power failure. (You might get lucky, or you might just
think you were lucky.) This is the same as for most databases; an
embedded database that talks directly to the hardware might be able
to do better.
Let's put it this way ... Oracle, a transaction-safe DB with REDO
logging, has for a very long time implemented disk mirroring. Now,
why would they do that if you could pull the plug on the processor
and depend on REDO logging to save you?
And even then you're expected to provide adequate power backup to
enable clean shutdown.
The real safety you get is that your battery sez "we need to shut
down!" but has enough power to let you. Transactions in progress
aren't logged, but everything else can tank cleanly, and your DB is
in a consistent state.
Mirroring protects you against (some) disk drive failures (but not
those that are transparent to the RAID controller/driver - if your
drive writes crap to the primary side of the mirror and no errors
are returned to the hardware/driver, the other side of the mirror
can faithfully reproduce them on the mirror!)
But since drives contain bearings and such that are much more likely
to fail than electronics (good electronics and good designs, at least),
mechanical failure's more likely and will be known to whatever is driving
the drive. And you're OK then...
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
On Thu, 30 Nov 2000, Nathan Myers wrote:
After a power outage on an active database, you may have corruption
at low levels of the system, and unless you have enormous redundancy
(and actually use it to verify everything) the corruption may go
undetected and result in (subtly) wrong answers at any future time.
Nathan, why are you so hostile against postgres? Is there an ax to grind?
The conditions under which WAL will completely recover your database:
1) OS guarantees complete ordering of fsync()'d writes. (i.e. having two
blocks A and B, A is fsync'd before B, it could NOT happen that B is on
disk but A is not).
2) on boot recovery, OS must not corrupt anything that was fsync'd.
Rule 1) is met by all unixish OSes in existence. Rule 2 is met by some
filesystems, such as reiserfs, tux2, and softupdates.
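Purely as an illustration of rule 1) -- my sketch, not code from this thread -- the ordering being relied on looks like this in C: block A is written and fsync'd before block B is touched, so a crash should never leave B on disk without A. The file name and the 8K block size are made up for the example.

/* ordered_writes.c -- illustrate ordered, fsync'd block writes (sketch only) */
#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLKSZ 8192                      /* assumed block size for the example */

static void write_block(int fd, off_t where, char fill)
{
    char buf[BLKSZ];

    memset(buf, fill, BLKSZ);
    if (lseek(fd, where, SEEK_SET) < 0 || write(fd, buf, BLKSZ) != BLKSZ)
    {
        perror("write");
        exit(1);
    }
}

int main(void)
{
    int fd = open("demo.data", O_RDWR | O_CREAT, 0600);   /* scratch file */

    if (fd < 0) { perror("open"); exit(1); }

    write_block(fd, 0, 'A');            /* block A */
    if (fsync(fd) != 0) { perror("fsync A"); exit(1); }   /* A must be durable... */

    write_block(fd, BLKSZ, 'B');        /* ...before B is even written */
    if (fsync(fd) != 0) { perror("fsync B"); exit(1); }

    close(fd);
    return 0;
}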
The logging in 7.1 protects transactions against many sources of
database crash, but not necessarily against OS crash, and certainly
not against power failure. (You might get lucky, or you might just
think you were lucky.) This is the same as for most databases; an
embedded database that talks directly to the hardware might be able
to do better.
As for replaying logs against a restored snapshot dump... AIUI, a
dump records tuples by OID, but the WAL refers to TIDs. Therefore,
the WAL won't work as a re-do log to recover your transactions
because the TIDs of the restored tables are all different.
True for current way of backing up - ie saving data in "external"
(sql) format. But there is another way - saving data files in their
natural (binary) format. WAL records may be applied to
such dump, right?
To get replaying we need an "update log", something that might be
What did you mean by "update log"?
Are you sure that WAL is not "update log" ? -:)
in 7.2 if somebody does a lot of work.
Note I'm not saying you're wrong, just asking that you explain your
comment a little more. If WAL can't be used to help recover from
crashes where database corruption occurs, what good is it?
The WAL is a performance optimization for the current recovery
capabilities, which assume uncorrupted table files. It protects
against those database server crashes that happen not to corrupt
the table files (i.e. most). It doesn't protect against corruption
of the tables, by bugs in PG or in the OS or from "hardware events".
It also doesn't protect against OS crashes that result in
write-buffered sectors not having been written before the crash.
Practically, this means that WAL file entries older than a few
seconds are not useful for much.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Even now, without BAR, WAL entries become useless only after checkpoints
(and I wouldn't recommend creating them every few seconds -:)). WAL based
BAR will require archiving of log records.
Vadim
On Fri, Dec 01, 2000 at 12:00:12AM -0400, The Hermit Hacker wrote:
On Thu, 30 Nov 2000, Nathan Myers wrote:
On Thu, Nov 30, 2000 at 07:02:01PM -0400, The Hermit Hacker wrote:
v7.1 should improve crash recovery ...
... with the WAL stuff that Vadim is producing, you'll be able to
recover up until the point that the power cable was pulled out of
the wall.
Please do not propagate falsehoods like the above. It creates
unsatisfiable expectations, and leads people to fail to take
proper precautions and recovery procedures.
After a power outage on an active database, you may have corruption
at low levels of the system, and unless you have enormous redundancy
(and actually use it to verify everything) the corruption may go
undetected and result in (subtly) wrong answers at any future time.
The logging in 7.1 protects transactions against many sources of
database crash, but not necessarily against OS crash, and certainly
not against power failure. (You might get lucky, or you might just
think you were lucky.) This is the same as for most databases; an
embedded database that talks directly to the hardware might be able
to do better.
We're talking about transaction logging here ... nothing gets written
to it until completed ... if I take a "known to be clean" backup from
the night before, restore that and then run through the transaction
logs, my data should be clean, unless my tape itself is corrupt. If
the power goes off half way through a write to the log, then that
transaction wouldn't be marked as completed and won't roll into the
restore ...
Sorry, wrong. First, the only way that your backups could have any
relationship with the transaction logs is if they are copies of the
raw table files with the database shut down, rather than the normal
"snapshot" backup.
Second, the transaction log is not, as has been noted far too frequently
for Vince's comfort, really written atomically. The OS has promised
to write it atomically, and given the opportunity, it will. If you pull
the plug, all promises are broken.
if a disk goes corrupt, I'd expect that the redo log would possibly
have a problem with corruption .. but if I pull the plug, unless I've
somehow damaged the disk, I would expect my redo log to be clean
*and*, unless Vadim totally messed something up, if there is any
corruption in the redo log, I'd expect that restoring from it would
generate some red flags ...
You have great expectations, but nobody has done the work to satisfy
them, so when you pull the plug, I'd expect that you will be left
in the dark, alone and helpless.
Vadim has done an excellent job on what he set out to do: optimize
transaction processing. Designing and implementing a factor-of-twenty
speed improvement on a professional-quality database engine demanded
great effort and expertise. To complain that he hasn't also done
a lot of other stuff would be petty.
Nathan Myers
ncm@zembu.com
On Fri, Dec 01, 2000 at 01:54:23AM -0500, Alex Pilosov wrote:
On Thu, 30 Nov 2000, Nathan Myers wrote:
After a power outage on an active database, you may have corruption
at low levels of the system, and unless you have enormous redundancy
(and actually use it to verify everything) the corruption may go
undetected and result in (subtly) wrong answers at any future time.
Nathan, why are you so hostile against postgres? Is there an ax to grind?
Alex, please don't invent enemies. It's clear what important features
PostgreSQL still lacks; over the next several releases these features
will be implemented, at great expense. PostgreSQL is useful and usable
now, given reasonable precautions and expectations. In the future it
will satisfy greater (albeit still reasonable) expectations.
The conditions under which WAL will completely recover your database:
1) OS guarantees complete ordering of fsync()'d writes. (i.e. having two
blocks A and B, A is fsync'd before B, it could NOT happen that B is on
disk but A is not).
2) on boot recovery, OS must not corrupt anything that was fsync'd.
Rule 1) is met by all unixish OSes in existence. Rule 2 is met by some
filesystems, such as reiserfs, tux2, and softupdates.
No. The OS asks the disk to write blocks in a certain order, but
disks normally reorder writes. Not only that; as noted earlier,
typical disks report the write completed long before the blocks
actually hit the disk.
A logging file system protects against the simpler forms of OS crash,
where the OS data-structure corruption is noticed before any more disk
writes are scheduled. It can't (by itself) protect against disk
errors. For critical applications, you must supply that protection
yourself, with (e.g.) battery-backed mirroring.
The logging in 7.1 protects transactions against many sources of
database crash, but not necessarily against OS crash, and certainly
not against power failure. (You might get lucky, or you might just
think you were lucky.) This is the same as for most databases; an
embedded database that talks directly to the hardware might be able
to do better.
The best possible database code can't overcome a broken OS or a broken
disk. It would be unreasonable to expect otherwise.
Nathan Myers
ncm@zembu.com
Date: Fri, 1 Dec 2000 01:54:23 -0500 (EST)
From: Alex Pilosov <alex@pilosoft.com>
On Thu, 30 Nov 2000, Nathan Myers wrote:
After a power outage on an active database, you may have corruption
at low levels of the system, and unless you have enormous redundancy
(and actually use it to verify everything) the corruption may go
undetected and result in (subtly) wrong answers at any future time.
Nathan, why are you so hostile against postgres? Is there an ax to grind?
I don't think he is being hostile (I work with him, so I know that he
is generally pro-postgres).
The conditions under which WAL will completely recover your database:
1) OS guarantees complete ordering of fsync()'d writes. (i.e. having two
blocks A and B, A is fsync'd before B, it could NOT happen that B is on
disk but A is not).
2) on boot recovery, OS must not corrupt anything that was fsync'd.
Rule 1) is met by all unixish OSes in existence. Rule 2 is met by some
filesystems, such as reiserfs, tux2, and softupdates.
I think you are missing his main point, which he stated before, which
is that modern disk hardware is both smarter and stupider than most
people realize.
Some disks cleverly accept writes into a RAM cache, and return a
completion signal as soon as they have done that. They then feel free
to reorder the writes to magnetic media as they see fit. This
significantly helps performance. However, it means that all bets are off
on a sudden power loss.
Your rule 1 is met at the OS level, but it is not met at the physical
drive level. The fact that the OS guarantees ordering of fsync()'d
writes means little since the drive is capable of reordering writes
behind the back of the OS.
At least with IDE, it is possible to tell the drive to disable this
sort of caching and reordering. However, GNU/Linux, at least, does
not do this. After all, doing it would hurt performance, and would
move us back to the old days when operating systems had to care a
great deal about disk geometry.
I expect that careful attention to the physical disks you purchase can
help you avoid these problems. For example, I would hope that EMC
disk systems handle power loss gracefully. But if you buy ordinary
off the shelf PC hardware, you really do need to arrange for a UPS,
and some sort of automatic shutdown if the UPS is running low.
Otherwise, although the odds are certainly with you, there is no 100%
guarantee that a busy database will survive a sudden power outage.
Ian
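One rough way to check what your own drive does -- my illustration, nothing from this thread -- is to time a series of small write()+fsync() pairs. If each pair completes in much less than one platter revolution (about 8 ms on a 7200 RPM drive), the drive is almost certainly acknowledging writes out of its RAM cache rather than from the media. The file name and iteration count below are arbitrary.

/* fsync_timer.c -- rough test of whether fsync waits for the platter (sketch) */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

int main(void)
{
    const int iters = 200;
    char buf[512];
    struct timeval start, end;
    double secs;
    int i, fd;

    memset(buf, 'x', sizeof(buf));
    fd = open("fsync_test.dat", O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) { perror("open"); return 1; }

    gettimeofday(&start, NULL);
    for (i = 0; i < iters; i++)
    {
        if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf))
        { perror("write"); return 1; }
        if (fsync(fd) != 0)
        { perror("fsync"); return 1; }
    }
    gettimeofday(&end, NULL);

    secs = (end.tv_sec - start.tv_sec)
         + (end.tv_usec - start.tv_usec) / 1000000.0;
    printf("%d write+fsync pairs in %.3f s (%.2f ms each)\n",
           iters, secs, 1000.0 * secs / iters);
    /* Each pair should cost at least one platter revolution if the
     * drive really flushes its cache on fsync. */
    return 0;
}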
On Thu, Nov 30, 2000 at 11:06:31PM -0800, Vadim Mikheev wrote:
As for replaying logs against a restored snapshot dump... AIUI, a
dump records tuples by OID, but the WAL refers to TIDs. Therefore,
the WAL won't work as a re-do log to recover your transactions
because the TIDs of the restored tables are all different.
True for current way of backing up - ie saving data in "external"
(sql) format. But there is another way - saving data files in their
natural (binary) format. WAL records may be applied to
such dump, right?
But (AIUI) you can only safely/usefully copy those files when the
database is shut down.
Many people hope to run PostgreSQL 24x7x365. With vacuuming, you
might just as well shut down afterward; but when that goes away
(in 7.2?), when will you get the chance to take your backups?
Clearly we need either another form of snapshot backup that can
be taken with the database running, and compatible with the
current WAL (or some variation on it); or, we need another kind
of log, in addition to the WAL.
To get replaying we need an "update log", something that might be
in 7.2 if somebody does a lot of work.
What did you mean by "update log"?
Are you sure that WAL is not "update log" ? -:)
No, I'm not sure. I think it's possible that a new backup utility
could be written to make a hot backup which could be restored and
then replayed using the current WAL format. It might be easier to
add another log which could be replayed against the existing form
of backups. That last is what I called the "update log".
The point is, WAL now does one job superbly: maintain a consistent
on-disk database image. Asking it to do something else, such as
supporting hot BAR, could interfere with it doing its main job.
Of course, only the person who implements hot BAR can say.
Nathan Myers
ncm@zembu.com
No, WAL does help, cause you can then pull in your last dump and recover
up to the moment that power cable was pulled out of the wall ...
False, on so many counts I can't list them all.
would love to hear them ... I'm always open to having my
misunderstandings corrected ...
Only what has been transferred off site can be considered safe.
But: all the WAL improvements serve to reduce the probability that
you 1. need to restore and 2. need to restore from offsite backups.
If you need to restore from offsite backup you lose transactions
unless you transfer the WAL synchronously with every commit.
Andreas
At 00:55 1/12/00 -0800, Nathan Myers wrote:
On Thu, Nov 30, 2000 at 11:06:31PM -0800, Vadim Mikheev wrote:
As for replaying logs against a restored snapshot dump... AIUI, a
dump records tuples by OID, but the WAL refers to TIDs. Therefore,
the WAL won't work as a re-do log to recover your transactions
because the TIDs of the restored tables are all different.
True for current way of backing up - ie saving data in "external"
(sql) format. But there is another way - saving data files in their
natural (binary) format. WAL records may be applied to
such dump, right?
But (AIUI) you can only safely/usefully copy those files when the
database is shut down.
This is not true; the way Vadim has implemented WAL is to write a series of
files of fixed size. When all transactions that have records in one file
have completed, that file is (currently) deleted. When BAR is going, the
files will be archived.
The only circumstance in which this strategy will fail is if there are a
large number of intensive long-standing single transactions - which is
unlikely (not to mention bad practice).
As a result of this, BAR will just need to take a snapshot of the database
and apply the logs (basically like a very extended recovery process).
You have raised some interesting issues regarding write-order etc. Can we
assume that when fsync *returns*, all records are written - though not
necessarily in the order that the IO's were executed?
----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/
At 11:06 PM 11/30/00 -0800, Vadim Mikheev wrote:
As for replaying logs against a restored snapshot dump... AIUI, a
dump records tuples by OID, but the WAL refers to TIDs. Therefore,
the WAL won't work as a re-do log to recover your transactions
because the TIDs of the restored tables are all different.
True for current way of backing up - ie saving data in "external"
(sql) format. But there is another way - saving data files in their
natural (binary) format. WAL records may be applied to
such dump, right?
Right. That's what's missing in PG 7.1, the existence of tools to
make such backups.
Probably the best answer to the "what does WAL get us, if it doesn't
get us full recoverability" questions is to simply say "it's a prerequisite
to getting full recoverability, PG 7.1 sets the foundation and later
work will get us there".
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
At 12:30 AM 12/1/00 -0800, Ian Lance Taylor wrote:
For example, I would hope that EMC
disk systems handle power loss gracefully.
They must, their marketing literature says so :)
But if you buy ordinary
off the shelf PC hardware, you really do need to arrange for a UPS,
and some sort of automatic shutdown if the UPS is running low.
Which is what disk subsystems like those from EMC do for you. They've
got built-in battery backup that lets them guarantee (assuming the
hardware's working right) that in the case of a power outage, all blocks
the operating system thinks have been written will in actuality be written
before the disk subsystem powers itself down.
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
At 12:55 AM 12/1/00 -0800, Nathan Myers wrote:
Many people hope to run PostgreSQL 24x7x365. With vacuuming, you
might just as well shut down afterward; but when that goes away
(in 7.2?), when will you get the chance to take your backups?
Clearly we need either another form of snapshot backup that can
be taken with the database running, and compatible with the
current WAL (or some variation on it); or, we need another kind
of log, in addition to the WAL.
Vadim's not ignorant of such matters, when he says "make a copy
of the files" he's not talking about using tar on a running
database. BAR tools are needed, as Vadim has pointed out here in
the past.
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
As for replaying logs against a restored snapshot dump... AIUI, a
dump records tuples by OID, but the WAL refers to TIDs. Therefore,
the WAL won't work as a re-do log to recover your transactions
because the TIDs of the restored tables are all different.
True for current way of backing up - ie saving data in "external"
(sql) format. But there is another way - saving data files in their
natural (binary) format. WAL records may be applied to
such dump, right?
But (AIUI) you can only safely/usefully copy those files when the
database is shut down.
No. You can read/save datafiles at any time. But block reads must be
"atomic" - no one should be able to change any part of a block while
we read it. Cp & tar are probably not suitable for this, but an internal
BACKUP command could do this.
Restoring from such backup will be like recovering after pg_ctl -m i stop: all
data blocks are consistent and WAL records may be applied to them.
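To make the shape of that concrete -- my sketch, not anything Vadim has posted -- such a BACKUP command would essentially be a block-by-block copy like the loop below, except that inside the backend each read would be taken under the lock that keeps writers from changing the block mid-read. The 8192-byte block size and the file paths are assumptions for the example; as plain user code this is just an ordinary file copy.

/* block_copy.c -- copy a data file one block at a time (illustrative sketch) */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BLCKSZ 8192                     /* assumed block size */

int main(void)
{
    char block[BLCKSZ];
    ssize_t n;
    int in = open("base/mydb/mytable", O_RDONLY);                 /* hypothetical paths */
    int out = open("backup/mytable", O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (in < 0 || out < 0) { perror("open"); exit(1); }

    /* Inside the backend, each iteration would hold the lock that keeps
     * writers from touching this block until the read completes. */
    while ((n = read(in, block, BLCKSZ)) > 0)
    {
        if (write(out, block, n) != n) { perror("write"); exit(1); }
    }
    if (n < 0) { perror("read"); exit(1); }

    close(in);
    close(out);
    return 0;
}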
Many people hope to run PostgreSQL 24x7x365. With vacuuming, you
might just as well shut down afterward; but when that goes away
(in 7.2?), when will you get the chance to take your backups?
Ability to shut down 7.2 will be preserved -:))
But it's not required for backup.
To get replaying we need an "update log", something that might be
in 7.2 if somebody does a lot of work.
What did you mean by "update log"?
Are you sure that WAL is not "update log" ? -:)
No, I'm not sure. I think it's possible that a new backup utility
could be written to make a hot backup which could be restored and
then replayed using the current WAL format. It might be easier to
add another log which could be replayed against the existing form
of backups. That last is what I called the "update log".
Consistent read of data blocks is easier to implement, sure.
The point is, WAL now does one job superbly: maintain a consistent
on-disk database image. Asking it to do something else, such as
supporting hot BAR, could interfere with it doing its main job.
Of course, only the person who implements hot BAR can say.
There will be no interference because BAR will not ask WAL to do
anything other than what it does right now - redo-ing changes.
Vadim
On Fri, Dec 01, 2000 at 06:39:57AM -0800, Don Baccus wrote:
Probably the best answer to the "what does WAL get us, if it doesn't
get us full recoverability" questions is to simply say "it's a
prerequisite to getting full recoverability, PG 7.1 sets the foundation
and later work will get us there".
Not to quibble, but for most of us, the answer to Don's question is:
"It gives a ~20x speedup over 7.0." That's pretty valuable to some of us.
If it turns out to be useful for other stuff, that's gravy.
Nathan Myers
ncm@zembu.com
On Fri, Dec 01, 2000 at 10:01:15AM +0100, Zeugswetter Andreas SB wrote:
No, WAL does help, cause you can then pull in your last dump and recover
up to the moment that power cable was pulled out of the wall ...
False, on so many counts I can't list them all.
would love to hear them ... I'm always open to having my
misunderstandings corrected ...
Only what has been transferred off site can be considered safe.
But: all the WAL improvements serve to reduce the probability that
you 1. need to restore and 2. need to restore from offsite backups.
If you need to restore from offsite backup you lose transactions
unless you transfer the WAL synchronously with every commit.
Currently the only way to avoid losing those transactions is by
replicating transactions at the application layer. That is, the
application talks to two different database instances, and enters
transactions into both. That's pretty hard to retrofit into an
existing application, so you'd really rather have replication in
the database. Of course, that's something PostgreSQL, Inc. is also
working on.
Nathan Myers
ncm@zembu.com
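For what it's worth, the crudest form of that application-layer approach against 7.x would look something like the libpq sketch below (mine, not from this thread): run every updating statement against both servers and treat it as committed only if both succeed. The connection strings, table name, and statement are placeholders, and real code would need queueing and retry logic for when one server is unreachable. Build it with the libpq headers and -lpq; the point is only the control flow, not the error handling.

/* dualwrite.c -- naive application-level "replication": apply the same
 * statement to two PostgreSQL servers (illustrative sketch only). */
#include <stdio.h>
#include <libpq-fe.h>

static int run(PGconn *conn, const char *sql)
{
    PGresult *res = PQexec(conn, sql);
    int ok = (PQresultStatus(res) == PGRES_COMMAND_OK);

    if (!ok)
        fprintf(stderr, "failed: %s", PQerrorMessage(conn));
    PQclear(res);
    return ok;
}

int main(void)
{
    /* placeholder connection info */
    PGconn *primary = PQconnectdb("host=db1 dbname=app");
    PGconn *standby = PQconnectdb("host=db2 dbname=app");
    const char *stmt = "INSERT INTO log_table VALUES (now(), 'hello')";
    int ok1, ok2;

    if (PQstatus(primary) != CONNECTION_OK || PQstatus(standby) != CONNECTION_OK)
    {
        fprintf(stderr, "could not connect to both servers\n");
        return 1;
    }

    /* the same change goes to both databases; a real application would
     * queue and retry if one of them is unreachable */
    ok1 = run(primary, stmt);
    ok2 = run(standby, stmt);

    PQfinish(primary);
    PQfinish(standby);
    return (ok1 && ok2) ? 0 : 1;
}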
At 11:02 AM 12/1/00 -0800, Nathan Myers wrote:
On Fri, Dec 01, 2000 at 06:39:57AM -0800, Don Baccus wrote:
Probably the best answer to the "what does WAL get us, if it doesn't
get us full recoverability" questions is to simply say "it's a
prerequisite to getting full recoverability, PG 7.1 sets the foundation
and later work will get us there".
Not to quibble, but for most of us, the answer to Don's question is:
"It gives a ~20x speedup over 7.0." That's pretty valuable to some of us.
If it turns out to be useful for other stuff, that's gravy.
Oh, but given that power failures eat disks anyway, you can just run PG 7.0
with -F and be just as fast as PG 7.1, eh? With no theoretical loss in
safety? Where's your faith in all that doom and gloom you've been
spreading? :) :)
You're right, of course, we'll get roughly -F performance while maintaining
a much more comfortable level of risk than you get with -F.
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
On Fri, Dec 01, 2000 at 09:13:28PM +1100, Philip Warner wrote:
You have raised some interesting issues regarding write-order etc. Can we
assume that when fsync *returns*, all records are written - though not
necessarily in the order that the IO's were executed?
Not with ordinary disks. With a battery-backed disk server, yes.
Nathan Myers
ncm@zembu.com
ncm@zembu.com (Nathan Myers) writes:
On Fri, Dec 01, 2000 at 09:13:28PM +1100, Philip Warner wrote:
You have raised some interesting issues regarding write-order etc. Can we
assume that when fsync *returns*, all records are written - though not
necessarily in the order that the IO's were executed?
Not with ordinary disks. With a battery-backed disk server, yes.
I think the real point of this discussion is that there's no such thing
as an ironclad guarantee. That's why people make backups.
All we can do is the best we can ;-). In that light, I think it's
reasonable for Postgres to proceed on the assumption that fsync does
what it claims to do, ie, all blocks are written when it returns.
We can't realistically expect to persuade a disk controller that
reorders writes to stop doing so. We can, however, expect that we've
minimized the probability of failures induced by anything other than
disk hardware failure or power failure.
regards, tom lane
At 11:09 AM 12/1/00 -0800, Nathan Myers wrote:
On Fri, Dec 01, 2000 at 10:01:15AM +0100, Zeugswetter Andreas SB wrote:
If you need to restore from offsite backup you lose transactions
unless you transfer the WAL synchronously with every commit.
Currently the only way to avoid losing those transactions is by
replicating transactions at the application layer. That is, the
application talks to two different database instances, and enters
transactions into both. That's pretty hard to retrofit into an
existing application, so you'd really rather have replication in
the database. Of course, that's something PostgreSQL, Inc. is also
working on.
Recovery alone isn't quite that difficult. You don't need to instantiate
your database instance until you need to apply the archived transactions,
i.e. after catastrophic failure destroys your db server.
You need to do two things:
1. Transmit a consistent (known-state) snapshot of the database offsite.
2. Synchronously transfer the WAL as part of every commit (question: do we
wait to log a "commit" locally until after the remote site acks that
it got the WAL?)
Then you take a new machine, build a database out of the snapshot, and
apply the archived redo logs and off you go. If you get tired of saving
oodles of redo archives, you make a new snapshot and accumulate the
WAL from that point forward.
Of course, that's not a fast failover solution. The scenario you describe
leads to being able to quickly switch over to a backup server when the
primary server fails. Much better for 24/7/365-style computing.
Exactly what is PostgreSQL, Inc doing in this area? I've not seen
discussions about it here, and the two of the three most active developers
(Jan and Tom) work for Great Bridge, not PostgreSQL, Inc...
I should think Vadim should play a large role in any effort to add WAL-based
replication to Postgres.
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
On Fri, Dec 01, 2000 at 08:10:40AM -0800, Vadim Mikheev wrote:
... a new backup utility
could be written to make a hot backup which could be restored and
then replayed using the current WAL format. It might be easier to
add another log which could be replayed against the existing form
of backups. That last is what I called the "update log".
Consistent read of data blocks is easier to implement, sure.
The point is, WAL now does one job superbly: maintain a consistent
on-disk database image. Asking it to do something else, such as
supporting hot BAR, could interfere with it doing its main job.
Of course, only the person who implements hot BAR can say.
There will be no interference because BAR will not ask WAL to do
anything other than what it does right now - redo-ing changes.
The interference I meant is that the current WAL file format is designed
for its current job. For BAR, you would be better-served by a more
compact format, so you need not archive your logs so frequently.
(The size of the WAL doesn't matter much because you can rotate them
very quickly.) A more compact format is also better as a basis for
replication, to minimize network traffic. To compress the WAL would
hurt performance -- but adding performance was the point of the WAL.
A log encoded at a much higher semantic level could be much more
compact, but wouldn't be useful as a WAL because it describes
differences from a snapshot backup, not from the current table
file contents.
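To illustrate the size argument -- these are hypothetical record layouts of my own, not PostgreSQL's actual WAL format -- compare what a physical redo record has to be able to carry with what a logical "update log" record would carry:

/* log_records.c -- two hypothetical log record shapes (illustration only) */
#include <stdint.h>

#define BLCKSZ 8192

/* A physical redo record of the kind a WAL needs: enough to rebuild the
 * exact bytes of a block, so in the worst case it carries a full block image. */
typedef struct
{
    uint32_t xid;               /* transaction id */
    uint32_t rel_id;            /* which relation */
    uint32_t block_no;          /* which block */
    uint16_t offset;            /* where in the block */
    uint16_t length;            /* how many data bytes follow */
    char     data[BLCKSZ];      /* worst case: the whole block image */
} PhysicalRedoRecord;

/* A logical "update log" record: describes the change at the SQL level,
 * so it is usually far smaller, but it can only be replayed against a
 * consistent snapshot, not against arbitrary on-disk block contents. */
typedef struct
{
    uint32_t xid;
    uint16_t length;            /* length of the statement text */
    char     statement[1];      /* e.g. "UPDATE t SET x = 2 WHERE id = 7" */
} LogicalUpdateRecord;

int main(void) { return 0; }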
Thus, I'm not saying that you can't implement both WAL and hot BAR
using the same log; rather, it's just not _obviously_ the best way to
do it.
Nathan Myers
ncm@zembu.com
Ok, this has piqued my interest in learning exactly what WAL is and what it
does... I don't see any in-depth explanation of WAL on the postgresql.org
site, can someone point me to some documentation? (if any exists, that is).
Thanks!
-Mitch
----- Original Message -----
From: "Nathan Myers" <ncm@zembu.com>
To: <pgsql-hackers@postgresql.org>
Sent: Friday, December 01, 2000 11:02 AM
Subject: Re: [HACKERS] beta testing version
On Fri, Dec 01, 2000 at 06:39:57AM -0800, Don Baccus wrote:
Probably the best answer to the "what does WAL get us, if it doesn't
get us full recoverability" questions is to simply say "it's a
prerequisite to getting full recoverability, PG 7.1 sets the foundation
and later work will get us there".
Not to quibble, but for most of us, the answer to Don's question is:
"It gives a ~20x speedup over 7.0." That's pretty valuable to some of us.
If it turns out to be useful for other stuff, that's gravy.
Nathan Myers
ncm@zembu.com
On Fri, Dec 01, 2000 at 11:48:23AM -0800, Don Baccus wrote:
At 11:09 AM 12/1/00 -0800, Nathan Myers wrote:
On Fri, Dec 01, 2000 at 10:01:15AM +0100, Zeugswetter Andreas SB wrote:
If you need to restore from offsite backup you lose transactions
unless you transfer the WAL synchronously with every commit.
Currently the only way to avoid losing those transactions is by
replicating transactions at the application layer. That is, the
application talks to two different database instances, and enters
transactions into both. That's pretty hard to retrofit into an
existing application, so you'd really rather have replication in
the database. Of course, that's something PostgreSQL, Inc. is also
working on.
Recovery alone isn't quite that difficult. You don't need to instantiate
your database instance until you need to apply the archived transactions,
i.e. after catastrophic failure destroys your db server.
True, it's sufficient for the application just to log the text of
its updating transactions off-site. Then, to recover, instantiate
a database from a backup and have the application re-run its
transactions.
You need to do two things:
(Remember, we're talking about what you could do *now*, with 7.1.
Presumably with 7.2 other options will open.)
1. Transmit a consistent (known-state) snapshot of the database offsite.
2. Synchronously transfer the WAL as part of every commit (question: do we
wait to log a "commit" locally until after the remote site acks that
it got the WAL?)
Then you take a new machine, build a database out of the snapshot, and
apply the archived redo logs and off you go. If you get tired of saving
oodles of redo archives, you make a new snapshot and accumulate the
WAL from that point forward.
I don't know of any way to synchronously transfer the WAL, currently.
Anyway, I would expect doing it to interfere seriously with performance.
The "wait to log a 'commit' locally until after the remote site acks that
it got the WAL" is (akin to) the familiar two-phase commit.
Nathan Myers
ncm@zembu.com
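Just to pin down the ordering being discussed -- a sketch of mine built on stub functions, not an existing PostgreSQL facility -- the two-phase-commit-like rule is that the local commit record is not written until the remote site has acknowledged the transaction's WAL data:

/* wal_ship.c -- ordering sketch for synchronous WAL shipping (stubs only) */
#include <stdio.h>
#include <string.h>

/* stand-ins for a real network transport and a real WAL writer */
static int send_wal_to_remote(const char *rec, size_t len)
{
    (void) rec; (void) len;
    return 1;                   /* pretend the remote site stored it */
}
static int wait_for_remote_ack(void) { return 1; }
static void write_local_wal(const char *rec) { printf("local WAL: %s\n", rec); }

static int commit_transaction(const char *changes)
{
    /* 1. write the transaction's WAL data locally */
    write_local_wal(changes);

    /* 2. ship it to the remote site and wait for the ack */
    if (!send_wal_to_remote(changes, strlen(changes)) || !wait_for_remote_ack())
        return 0;               /* no ack: do NOT declare the commit done */

    /* 3. only now write the commit record; the transaction ends up durable
     *    on both sites or on neither */
    write_local_wal("COMMIT");
    return 1;
}

int main(void)
{
    return commit_transaction("UPDATE accounts ...") ? 0 : 1;
}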
At 12:56 PM 12/1/00 -0800, Nathan Myers wrote:
(Remember, we're talking about what you could do *now*, with 7.1.
Presumably with 7.2 other options will open.)
Maybe *you* are :) Seriously, I'm thinking out loud about future
possibilities. Putting a lot of work into building up a temporary
solution on top of 7.1 doesn't make a lot of sense, anyone wanting
to work on such things ought to think about 7.2, which presumably will
beta sometime mid-2001 or so???
And I don't think there are 7.1 hacks that are simple ... could be
wrong, though.
I don't know of any way to synchronously transfer the WAL, currently.
Nope.
Anyway, I would expect doing it to interfere seriously with performance.
Yep. Anyone here have experience with replication and Oracle or others?
I've heard from one source that setting it up reliably in Oracle and
getting the switch from the dead to the backup server working properly was
something of a DBA nightmare, but that's true of just about anything in
Oracle. Once it was up, it worked reliably, though (also typical
of Oracle).
The "wait to log a 'commit' locally until after the remote site acks that
it got the WAL" is (akin to) the familiar two-phase commit.
Right.
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
Don Baccus writes:
Exactly what is PostgreSQL, Inc doing in this area?
Good question... See http://www.erserver.com/.
I've not seen discussions about it here, and the two of the three most
active developers (Jan and Tom) work for Great Bridge, not PostgreSQL,
Inc...
Vadim Mikheev and Thomas Lockhart work for PostgreSQL, Inc., at least in
some form or another. Which *might* be construed as a reason for their
perceived inactivity.
--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
From: "Nathan Myers" <ncm@zembu.com>
On Thu, Nov 30, 2000 at 07:02:01PM -0400, The Hermit Hacker wrote:
[snip]
The logging in 7.1 protects transactions against many sources of
database crash, but not necessarily against OS crash, and certainly
not against power failure. (You might get lucky, or you might just
think you were lucky.) This is the same as for most databases; an
embedded database that talks directly to the hardware might be able
to do better.
If PG had a type of tree-based logging filesystem that it itself handles,
wouldn't that be almost perfectly safe? I mean that you might lose some data
in a transaction, but the client never gets an OK anyways...
Like a combination of raw block io and a tux2-like fs.
Doesn't Oracle do its own block io, no?
Magnus
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Programmer/Networker [|] Magnus Naeslund
PGP Key: http://www.genline.nu/mag_pgp.txt
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
At 05:42 PM 12/2/00 +0100, Peter Eisentraut wrote:
Don Baccus writes:
Exactly what is PostgreSQL, Inc doing in this area?
Good question... See http://www.erserver.com/.
"Advanced Replication and Distributed Information capabilities are also under development to meet specific
business and competitive requirements for both PostgreSQL, Inc. and clients. Several of these enhanced
PostgreSQL, Inc. developments may remain proprietary for up to 24 months, with availability limited to
clients and partners, in order to assist us in recovering development costs and continue to provide funding
for our other Open Source contributions. "
Boy, I can just imagine the uproar this statement will cause on Slashdot when
the world finds out about it.
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
On Sat, Dec 02, 2000 at 11:31:37AM -0800, Don Baccus wrote:
At 05:42 PM 12/2/00 +0100, Peter Eisentraut wrote:
Don Baccus writes:
Exactly what is PostgreSQL, Inc doing in this area?
Good question... See http://www.erserver.com/.
<snip>
Boy, I can just imagine the uproar this statement will cause on Slashdot when
the world finds out about it.
That one doesn't worry me as much as this quote from the press release at
http://www.pgsql.com/press/PR_5.html
"We expect to have the source code tested and ready to contribute to
the open source community before the middle of October. Until that time
we are considering requests from a number of development companies and
venture capital groups to join us in this process."
Where's the damn core code? I've seen a number of examples already of
people asking about remote access/replication function, with an eye
toward implementing it, and being told "PostgreSQL, Inc. is working
on that". It's almost Microsoftesque: preannounce future functionality
suppressing the competition.
I realize this is probably just the typical deadline slip that we see
on the public releases of pgsql itself, not a silent retraction of the
promise to release the code (especially since some of the same core
people are involved), but there is a difference: if I absolutely need
something that's only in CVS right now, I can bite the bullet and use
a snapshot server. With erserver, I'm stuck sitting on my hands, with a
promise of future functionality. Well, not really sitting on my hands:
working on other tasks, with the assumption that erserver will be there
soon. I'd rather not roll my own in an incompatible way, and have to
port or redo the custom parts.
So, now I'm going into a couple critical, funding decision making
meetings in the next few weeks. I was planning on being able to promise
certain systems with concrete knowledge of what I will and won't be
able to provide, and how much custom coding will be needed. Now, if the
schedule slips much more, I won't. It's even possible that the erserver's
implementation won't fit my needs at all, and I'll be back rolling my own.
I realize this sounds a bit ungrateful: they're giving away the code,
after all, and potentially saving me a lot of work.
It's just the contrast between the really open work on the core server,
and the lack of a peep when the promised deadlines have rolled past that
gets under my skin.
I'd be really happy with someone reiterating the commitment to an
open release, and letting us all know how badly the schedule has
slipped. Remember, we're all here to help! Get everyone stomping bugs
in code you're going to release soon anyway, and concentrate on the
quasi-proprietary extensions.
Ross
On Sat, 2 Dec 2000, Don Baccus wrote:
...
Will Great Bridge step to the plate and fund a truly open source alternative,
leaving us with a potential code fork? If IB gets its political problems
under control and developers rally around it, two years is going to be a
long time to just sit back and wait for PG, Inc to release eRServer.
I doubt that. There is an IB (Interbase) replication option today, but
you must purchase it. That isn't so bad actually. PostgreSQL looks to be
going that way too: base functionality is open source, peripheral
companies make money selling extensions.
Besides, simple master-slave replication is old news anyhow, and not
terribly useful. Products like FrontBase (www.frontbase.com) have full
shared-nothing cluster support too (FrontBase is commercial). Clustering
is a much better solution for redundancy purposes than replication.
Tom
At 03:51 PM 12/2/00 -0600, Ross J. Reedstrom wrote:
"We expect to have the source code tested and ready to contribute to
the open source community before the middle of October. Until that time
we are considering requests from a number of development companies and
venture capital groups to join us in this process."
Where's the damn core code? I've seen a number of examples already of
people asking about remote access/replication function, with an eye
toward implementing it, and being told "PostgreSQL, Inc. is working
on that". It's almost Microsoftesque: preannounce future functionality
suppressing the competition.
Well, this is just all 'round a bad precedent and an unwelcome path
for PostgreSQL, Inc to embark upon.
They've also embarked on one fully proprietary product (built on PG),
which means they're not an Open Source company, just a sometimes Open
Source company.
It's a bit ironic to learn about this on the same day I learned that
Solaris 8 is being made available in source form. Sun's slowly "getting
it" and moving glacially towards Open Source, while PostgreSQL, Inc.
seems to be drifting in the opposite direction.
if I absolutely need
something that's only in CVS right now, I can bite the bullet and use
a snapshot server.
This work might be released as Open Source, but it isn't an open development
scenario. The core work's not available for public scrutiny, and the details
of what they're actually up to don't appear to be public either.
OK, they're probably funding Vadim's work on WAL, so the indictment's probably
not 100% accurate - but I don't know that.
I'd be really happy with someone reiterating the commitment to an
open release, and letting us all know how badly the schedule has
slipped. Remember, we're all here to help! Get everyone stomping bugs
in code you're going to release soon anyway, and concentrate on the
quasi-propriatary extensions.
Which makes me wonder, is Vadim's time going to be eaten up by working
on these quasi-proprietary extensions that the rest of us won't get
for two years unless we become customers of Postgres, Inc?
Will Great Bridge step to the plate and fund a truly open source alternative,
leaving us with a potential code fork? If IB gets its political problems
under control and developers rally around it, two years is going to be a
long time to just sit back and wait for PG, Inc to release eRServer.
These developments are a major annoyance.
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
At 01:52 PM 12/2/00 -0800, Tom Samplonius wrote:
I doubt that. There is an IB (Interbase) replication option today, but
you must purchase it. That isn't so bad actually. PostgreSQL looks to be
going that way too: base functionality is open source, peripheral
companies make money selling extensions.
PostgreSQL, Inc perhaps has that as a game plan. Thus far Great Bridge claims
to be 100% devoted to the Open Source model.
Besides, simple master-slave replication is old news anyhow, and not
terribly useful. Products like FrontBase (www.frontbase.com) have full
shared-nothing cluster support too (FrontBase is commercial). Clustering
is a much better solution for redundancy purposes than replication.
I'm not so much concerned about exactly what PG, Inc is planning to offer
as a proprietary piece - I'm purist enough that I worry about what this
signals for their future direction.
If PG, Inc starts doing proprietary chunks, and Great Bridge remains 100%
dedicated to Open Source, I know who I'll want to succeed and prosper.
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
On Sat, Dec 02, 2000 at 03:47:19PM -0800, Adam Haberlach wrote:
Where's the damn core code? I've seen a number of examples already of
people asking about remote access/replication function, with an eye
toward implementing it, and being told "PostgreSQL, Inc. is working
on that". It's almost Microsoftesque: preannounce future functionality
suppressing the competition.
Well, I'll admit that this was getting a little over the top, especially
quoted out of context. ;-)
For What It's Worth: In the three years (has it really been that long?)
that I've been off and on Postgres mailing lists, I've probably seen at
least 100 requests for replication, with about 40 of them mentioning
implementing it themselves.
I'm pretty sure that being told "PostgreSQL Inc. is working on that" is
not the only thing stopping it from happening. Most people just aren't up
to making it happen.
Indeed. And it's only been less than a year that that response
has been given. However, it is only in that same timespan that the
functionality and performance of the core server have gotten to the point
where replication/remote access is one of the immediately fruitful itches to
scratch. We'll see what happens in the future.
Ross
On Sat, Dec 02, 2000 at 03:51:15PM -0600, Ross J. Reedstrom wrote:
On Sat, Dec 02, 2000 at 11:31:37AM -0800, Don Baccus wrote:
At 05:42 PM 12/2/00 +0100, Peter Eisentraut wrote:
Don Baccus writes:
Exactly what is PostgreSQL, Inc doing in this area?
Good question... See http://www.erserver.com/.
<snip>
Boy, I can just imagine the uproar this statement will cause on Slashdot when
the world finds out about it.
That one doesn't worry me as much as this quote from the press release at
http://www.pgsql.com/press/PR_5.html
"We expect to have the source code tested and ready to contribute to
the open source community before the middle of October. Until that time
we are considering requests from a number of development companies and
venture capital groups to join us in this process."
Where's the damn core code? I've seen a number of examples already of
people asking about remote access/replication function, with an eye
toward implementing it, and being told "PostgreSQL, Inc. is working
on that". It's almost Microsoftesque: preannounce future functionality
suppressing the competition.
For What It's Worth: In the three years (has it really been that long?)
that I've been off and on Postgres mailing lists, I've probably seen at
least 100 requests for replication, with about 40 of them mentioning
implementing it themselves.
I'm pretty sure that being told "PostgreSQL Inc. is working on that" is
not the only thing stopping it from happening. Most people just aren't up
to making it happen.
--
Adam Haberlach |"California's the big burrito, Texas is the big
adam@newsnipple.com | taco ... and following that theme, Florida is
http://www.newsnipple.com| the big tamale ... and the only tamale that
'88 EX500 | counts any more." -- Dan Rather
PostgreSQL, Inc perhaps has that as a game plan.
I'm not so much concerned about exactly what PG, Inc is planning to offer
as a proprietary piece - I'm purist enough that I worry about what this
signals for their future direction.
Hmm. What has kept replication from happening in the past? It is a big
job and difficult to do correctly. It is entirely my fault that you
haven't seen the demo code released; I've been packaging it to make it a
bit easier to work with.
If PG, Inc starts doing proprietary chunks, and Great Bridge remains 100%
dedicated to Open Source, I know who I'll want to succeed and prosper.
Let me be clear: PostgreSQL Inc. is owned and controlled by people who
have lived the Open Source philosophy, which is not typical of most
companies in business today. We are eager to show how this can be done
on a full time basis, not only as an avocation. And we are eager to do
this as part of the community we have helped to build.
As soon as you find a business model which does not require income, let
me know. The .com'ers are trying it at the moment, and there seems to be
a few flaws... ;)
- Thomas
At 02:58 AM 12/3/00 +0000, Thomas Lockhart wrote:
PostgreSQL, Inc perhaps has that as a game plan.
I'm not so much concerned about exactly what PG, Inc is planning to offer
as a proprietary piece - I'm purist enough that I worry about what this
signals for their future direction.
Hmm. What has kept replication from happening in the past? It is a big
job and difficult to do correctly.
Presumably what has kept it from happening in the past is that other
things were of much higher priority. Replicating a database on an
engine as unreliable as PG was in earlier incarnations would simply
replicate your problems, for instance.
This statement of yours kinda belittles the work done over the past
few years by volunteers. It also ignores the fact that folks in other
companies do get paid to work on open source software full-time without having
to resort to creating closed source, proprietary products.
It is entirely my fault that you
haven't seen the demo code released; I've been packaging it to make it a
bit easier to work with.
OK, good, this part gets open sourced. Still not an open development model.
Knowing details about what's going on while code's being developed, not to mention
being able to critique decisions, is one of the major benefits of the open
development model.
Let me be clear: PostgreSQL Inc. is owned and controlled by people who
have lived the Open Source philosophy, which is not typical of most
companies in business today. We are eager to show how this can be done
on a full time basis, not only as an avocation.
Building closed source proprietary products helps you live the open source
philosophy on a full-time basis?
...
As soon as you find a business model which does not require income, let
me know.
Red herring, and you know it. The question isn't whether or not your business
generates income, but how it generates income.
Your comment is the classic one tossed out by closed-source, proprietary
software advocates who dismiss open source software out-of-hand.
Couldn't you think of something better, at least? Like ... something
original?
The .com'ers are trying it at the moment, and there seems to be
a few flaws... ;)
That's a horrible analogy, and I suspect you know it, but at least it is
original.
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
This statement of yours kinda belittles the work done over the past
few years by volunteers.
imho it does not, and if somehow you can read that into it then you have
a much different understanding of language than I. I *am* one of those
volunteers, and know that the hundreds of hours I have contributed is
only a small part of the whole.
My discussion on this is over; apologies to others for helping to waste
bandwidth :(
I'll be happy to continue it next over some beers, which is a much more
appropriate setting.
- Thomas
Thomas Lockhart wrote:
PostgreSQL, Inc perhaps has that as a game plan.
I'm not so much concerned about exactly what PG, Inc is planning to offer
as a proprietary piece - I'm purist enough that I worry about what this
signals for their future direction.
Hmm. What has kept replication from happening in the past? It is a big
job and difficult to do correctly.
Well, this has nothing whatsoever to do with open or closed source. Linux
and FreeBSD are much larger, much harder to do correctly, as they are supersets
of thousands of open source projects. Complexity is not relative to licensing.
If PG, Inc starts doing proprietary chunks, and Great Bridge remains 100%
dedicated to Open Source, I know who I'll want to succeed and prosper.
Let me be clear: PostgreSQL Inc. is owned and controlled by people who
have lived the Open Source philosophy, which is not typical of most
companies in business today.
That's one of the reasons why it's worked... open source meant open
contribution, open collaboration, open bug fixing. The price of admission
was doing your own installs, service, support, and giving something back....
PG, I assume, is pretty much the same as most open source projects, massive
amounts of contribution shepherded by one or two individuals.
We are eager to show how this can be done
on a full time basis, not only as an avocation. And we are eager to do
this as part of the community we have helped to build.
As soon as you find a business model which does not require income, let
me know. The .com'ers are trying it at the moment, and there seems to be
a few flaws... ;)
Well, whether or not a product is open, or closed, has very little
to do with commercial success. Heck, the entire IBM PC spec was open, and
that certainly didn't hurt Dell, Compaq, etc.... the genie coming out
of the bottle _only_ hurt IBM. In this case, however, the genie's been
out for quite a while....
BUT:
People don't buy a product because it's open, they buy it because it offers
significant value above and beyond what they can do *without* paying for
a product. Linus didn't start a new kernel out of some idealistic mantra
of freeing the world, he was broke and wanted a *nix-y OS. Years later,
the product has grown massively. Those who are profiting off of it are
unrelated to the code, to most of the developers.... why is this?
As it is, any company trying to make a closed version of an open source
product has some _massive_ work to do. Manuals. Documentation. Sales.
Branding. Phone support lines. Legal departments/Lawsuit prevention. Figuring
out how to prevent open source from stealing the thunder by duplicating
features. And building a _product_.
Most Open Source projects are not products, they are merely code, and some
horrid documentation, and maybe some support. The companies making money
are not making better code, they are making better _products_....
And I really haven't seen much in the way of full featured products, complete
with printed docs, 24 hour support, tutorials, wizards, templates, a company
to sue if the code causes damage, GUI install, setup, removal, etc. etc. etc.
Want to make money from open source? Well, you have to find, or build,
a _product_. Right now, there are no OS db products that can compare to oh,
an Oracle product, a MSSQL product. There may be superior code, but that
doesn't make a difference in business. Business has very little to do
with building the perfect mousetrap, if nobody can easily use it.
-Bop
--
Brought to you from boop!, the dual boot Linux/Win95 Compaq Presario 1625
laptop, currently running RedHat 6.1. Your bopping may vary.
On Sat, Dec 02, 2000 at 07:32:14PM -0800, Don Baccus wrote:
At 02:58 AM 12/3/00 +0000, Thomas Lockhart wrote:
PostgreSQL, Inc perhaps has that as a game plan.
I'm not so much concerned about exactly what PG, Inc is planning to offer
as a proprietary piece - I'm purist enough that I worry about what this
.
.
.
As soon as you find a business model which does not require income, let
me know.
Red herring, and you know it. The question isn't whether or not your business
generates income, but how it generates income.
So far, Open Source doesn't. The VA Linux IPO made ME some income,
but I'm not sure that was part of their plan...
Your comment is the classic one tossed out by closed-source, proprietary
software advocates who dismiss open source software out-of-hand.
Couldn't you think of something better, at least? Like ... something
original?
The .com'ers are trying it at the moment, and there seems to be
a few flaws... ;)
That's a horrible analogy, and I suspect you know it, but at least it is
original.
It wasn't an analogy.
In any case, can we create pgsql-politics so we don't have to go over
this issue every three months? Can we create pgsql-benchmarks while we
are at it, to take care of the other thread that keeps popping up?
--
Adam Haberlach |"California's the big burrito, Texas is the big
adam@newsnipple.com | taco ... and following that theme, Florida is
http://www.newsnipple.com| the big tamale ... and the only tamale that
'88 EX500 | counts any more." -- Dan Rather
And I really haven't seen much in the way of full featured products, complete
with printed docs, 24 hour support, tutorials, wizards, templates, a company
to sue if the code causes damage, GUI install, setup, removal, etc. etc. etc.
Mac OS X.
;-)
-pmb
--
bierman@apple.com
"4 out of 5 people with the wrong hardware want to run Mac OS X because..."
http://www.newertech.com/oscompatibility/osxinfo.html
At 09:29 PM 12/2/00 -0800, Adam Haberlach wrote:
Red herring, and you know it. The question isn't whether or not your business
generates income, but how it generates income.So far, Open Source doesn't. The VA Linux IPO made ME some income,
but I'm not sure that was part of their plan...
VA Linux is a HARDWARE COMPANY. They sell servers. "We've engineered 2U
performance into a 1U box" is their current line.
Dell probably makes more money on their Linux server offerings (I have to
admit that donb.photo.net is running on one of their PowerEdge servers) than
VA Linux does.
If I can show you a HARDWARE COMPANY that is diving on selling MS NT servers,
will you agree that this proves that the closed source and open source models
both must be wrong, because HARDWARE COMPANIES based on each paradigm are
losing money???
The .com'ers are trying it at the moment, and there seems to be
a few flaws... ;)
That's a horrible analogy, and I suspect you know it, but at least it is
original.
It wasn't an analogy.
Sure it is. Read, damn it. First he makes the statement that a business
based on open source is, by definition, a zero-revenue company then he
raises the spectre of .com companies (how many of them are open source?)
as support for his argument.
OK, it's not an analogy, it's a disassociation with reality. Feel better?
In any case, can we create pgsql-politics so we don't have to go over
this issue every three months?
Maybe you don't care about the open source aspect of this, but as a user
with about 1500 Open Source advocates using my code, I do. If IB comes
forth in a fully Open Source state my user base will insist I switch.
And I will.
And I'll stop telling the world that MySQL sucks, too. Or at least that
they suck worse than the PG world :)
There is risk here. It isn't so much in the fact that PostgreSQL, Inc
is doing a couple of modest closed-source things with the code. After
all, the PG community has long acknowledged that the BSD license would
allow others to co-op the code and commercialize it with no obligations.
It is rather sad to see PG, Inc. take the first step in this direction.
How long until the entire code base gets co-opted?
(Yeah, that's extremist, but seeing PG, Inc. lay down the formal foundation
for such co-opting by taking the first step might well make the potential
reality become real. It certainly puts some of the long-term developers
in no position to argue against such a co-opted snitch of the code).
I have to say I'm feeling pretty silly about raising such an effort to
increase PG awareness in mindshare vs. MySQL. I mean, if PG, Inc's
efforts somehow delineate the hopes and goals of the PG community, I'm
fairly disgusted.
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
At 04:42 AM 12/3/00 +0000, Thomas Lockhart wrote:
This statement of yours kinda belittles the work done over the past
few years by volunteers.
imho it does not,
Sure it does. You in essence are saying that "advanced replication is so
hard that it could only come about if someone were willing to finance a
PROPRIETARY solution. The PG developer group couldn't manage it if
it were done Open Source".
In other words, it is much harder than any of the work done by the
same group of people before they started working on proprietary
versions.
And that the only way to get them doing their best work is to put them
on proprietary, or "semi-proprietary" projects, though 24 months from
now, who's going to care? You've opened the door to IB prominence, not
only shooting PG's open source purity down in flames, but probably PG, Inc's
as well - IF IB can figure out their political problems.
IB, as it stands, is a damned good product in many ways ahead of PG. You're
giving them life by this approach, which is a kind of bizarre business strategy.
I *am* one of those volunteers
Yes, I well remember you screwing up PG 7.0 just before beta, without bothering
to test your code, and leaving on vacation.
You were irresponsible then, and you're being irresponsible now.
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
At 09:56 PM 12/2/00 -0700, Ron Chmara wrote:
...
And I really haven't seen much in the way of full featured products, complete
with printed docs, 24 hour support, tutorials, wizards, templates, a company
to sue if the code causes damage, GUI install, setup, removal, etc. etc. etc.
Want to make money from open source? Well, you have to find, or build,
a _product_. Right now, there are no OS db products that can compare to oh,
an Oracle product, a MSSQL product. There may be superior code, but that
doesn't make a difference in business. Business has very little to do
with building the perfect mousetrap, if nobody can easily use it.
Which of course is the business model - certainly not a "zero revenue" model
as Thomas arrogantly suggests - which OSS service companies are following.
They provide the cocoon around the code.
I buy RH releases from Fry's. Yes, I could download, but the price is such
that I'd rather just go buy the damned release CDs. I don't begrudge it,
they're providing me a real SERVICE, saving me time, which saves me dollars
in opportunity costs (given my $200/hr customer billing rate). They make
money by publishing releases, I still get all the sources. We all win.
It is not a bad model.
Question - if this model sucks, then certainly PG, Inc's net revenue last
year was greater than any true open source software company's? I mean, let's
see that slam against the "zero revenue business model" be proven by showing
us some real numbers.
Just what was PG, Inc's net revenue last year, and just how does their mixed
revenue model stack up against the OSS world?
(NOT the .com world, which is in a different business, no matter what Thomas
wants to claim).
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
There is risk here. It isn't so much in the fact that PostgreSQL, Inc
is doing a couple of modest closed-source things with the code. After
all, the PG community has long acknowledged that the BSD license would
allow others to co-op the code and commercialize it with no obligations.
It is rather sad to see PG, Inc. take the first step in this direction.
How long until the entire code base gets co-opted?
I totally missed your point here. How is closing the source of ERserver related
to closing the code of the PostgreSQL DB server? Let me clear things up:
1. ERserver isn't based on WAL. It will work with any version >= 6.5
2. WAL was partially sponsored by my employer, Sectorbase.com,
not by PG, Inc.
Vadim
Don Baccus <dhogaza@pacifier.com> writes:
At 04:42 AM 12/3/00 +0000, Thomas Lockhart wrote:
This statement of yours kinda belittles the work done over the past
few years by volunteers.
imho it does not,
Sure it does. You in essence are saying that "advanced replication is so
hard that it could only come about if someone were willing to finance a
PROPRIETARY solution. The PG developer group couldn't manage it if
it were done Open Source".
<snip>
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
Mr. Baccus,
It is funny how you rant and rave about the importance of opensource
and how Postgresql Inc. making an non-opensource product is bad. Yet I
go to your website which is full of photographs and you make a big
deal about how people should not steal your photographs and how someone
must buy a commercial license to use them.
'open-source' to me! Why don't you practice what you preach and allow
redistribution of those photographs?
--
Prasanth Kumar
kumar1@home.com
Don Baccus writes:
How long until the entire code base gets co-opted?
Yeah so what? Nobody's forcing you to use, buy, or pay attention to any
such efforts. The market will determine whether the release model of
PostgreSQL, Inc. appeals to customers. Open source software is a
privilege, and nobody has the right to call someone "irresponsible"
because they want to get paid for their work and don't choose to give away
their code.
--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
Ron Chmara wrote:
As it is, any company trying to make a closed version of an open source
product has some _massive_ work to do. Manuals. Documentation. Sales.
Branding. Phone support lines. Legal departments/Lawsuit prevention. Figuring
out how to prevent open source from stealing the thunder by duplicating
features. And building a _product_.
Most Open Source projects are not products, they are merely code, and some
horrid documentation, and maybe some support. The companies making money
are not making better code, they are making better _products_....
And I really haven't seen much in the way of full featured products, complete
with printed docs, 24 hour support, tutorials, wizards, templates, a company
to sue if the code causes damage, GUI install, setup, removal, etc. etc. etc.
This kind of stuff is more along the lines of what Great Bridge is doing. In about
a week, we'll be releasing a GB-branded release of 7.0.3 - including printed
manuals (much of which is new), a GUI installer (which is open source), support
packages including fully-staffed 24/7. Details to follow soon on pgsql-announce.
I don't want to speak for Pgsql Inc., but it seems to me that they are pursuing a
slightly different business model than us - more focused on providing custom
development around the base PostgreSQL software. And that's a great way to get
more people using PostgreSQL. Some of what they create for their customers may be
open source, some not. It's certainly their decision - and it's a perfectly
justifiable business model, followed by open source companies such as Covalent
(Apache), Zend (PHP), and TurboLinux. I don't think it's productive or appropriate
to beat up on Pgsql Inc for developing bolt-on products in a different way -
particularly with Vadim's clarification that the bolt-ons don't require anything
special in the open source backend.
Our own business model is, as I indicated, different. We got a substantial
investment from our parent company, whose chairman sat on the Red Hat board for
three years, and a mandate to create a *big* company that could provide the
infrastructure (human and technical) to enable PostgreSQL to go up against the
proprietary players like Oracle and Microsoft. A fully-staffed 24/7 data center
isn't cheap, and our services won't be either. But it's a different type of
business - we're providing the benefits of the open source development model to a
group of customers that might not otherwise get involved, precisely because they
demand to see a company of Great Bridge's heft behind a product before they buy.
I think PostgreSQL and other open source projects are big enough for lots of
different companies, with lots of different types of business models. Indeed, from
what I've seen of Pgsql Inc (and I hope I haven't mischaracterized them), our
business models are highly complementary. At Great Bridge, we hope and expect that
other companies that "get it" will get more involved with PostgreSQL - that can
only add to the strength of the project.
Regards,
Ned
--
----------------------------------------------------
Ned Lilly e: ned@greatbridge.com
Vice President w: www.greatbridge.com
Evangelism / Hacker Relations v: 757.233.5523
Great Bridge, LLC f: 757.233.5555
Branding. Phone support lines. Legal departments/Lawsuit prevention.
Figuring
out how to prevent open source from stealing the thunder by duplicating
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
features. And building a _product_.
Oops. You didn't really mean that, did you? Could it be that there are some
people out there thinking "let them free software fools do the hard initial
work, once things are working nicely, we take over, add a few "secret"
ingredients, and voila - the commercial product has been created?"
After reading the statement above I believe that surely most of the honest
developers involved in postgres would wish they had chosen GPL as licensing
scheme.
I agree that most of the work is always done by a few. I also agree that it
would be nice if they could get some financial reward for it. But no dirty
tricks please. Do not betray the base. Otherwise, the broad developer base
will be gone before you even can say "freesoftware".
I, for my part, have learned another lesson today. I was just about to give
in with the licensing scheme in our project to allow the GPL incompatible
OpenSSL to be used. After reading the above now I know it is worth the extra
effort to "roll our own" or wait for another GPL'd solution rather than
sacrificing the unique protection the GPL gives us.
Horst
coordinator gnumed project
How long until the entire code base gets co-opted?
Yeah so what? Nobody's forcing you to use, buy, or pay attention to any
such efforts. The market will determine whether the release model of
PostgreSQL, Inc. appeals to customers. Open source software is a
privilege, and nobody has the right to call someone "irresponsible"
because they want to get paid for their work and don't choose to give away
their code.
Just bear in mind that although a few developers always deliver outstanding
performance in any project, those open source projects have usually seen a
huge broad developer base. Hundreds of people putting their effort into the
project. These people never ask for a cent, never even dream of some
commercial benefit. They do it for the sake of creating something good,
being part of something great.
Especially in the case of Postgres the "product" has a long heritage, and
the most active people today are not necessarily the ones who have put in
most "total" effort (AFAIK, I might be wrong here). Anyway, Postgres would
not be where it is today without the hundreds of small cooperators &
testers. Lock them out from the source code - even if it is only a side
branch, and Postgres will die (well, at least it would die for our project)
Open source is not a mere marketing model. It is a philosophy. It is about
essential freedom, about human progress, about freedom of speech and
thought. It is about sharing and caring. Those who don't understand this,
should please stick to their ropes and develop closed source from the
beginning and not try to fool the free software community.
Horst
Thomas Lockhart wrote:
As soon as you find a business model which does not require income, let
me know. The .com'ers are trying it at the moment, and there seems to be
a few flaws... ;)
While I have not contributed anything to Postgres yet, I have
contributed to other environments. The prospect that I could create a
piece of code, spend weeks/years of my own time on something and some
entity can come along, take what I've written and create a product which
is better for it, and then not share back is offensive. Under GPL it is
illegal. (Postgres should try to move to GPL)
I am working on a full-text search engine for Postgres. A really fast
one, something better than anything else out there. It combines the
power and scalability of a web search engine, with the data-mining
capabilities of SQL.
If I write this extension to Postgres, and release it, is it right that
a business can come along, add a few things here and there and introduce
a new closed source product on what I have written? That is certainly
not what I intend. My intention was to honor the people before me for
providing the rich environment which is Postgres. I have made real money
using Postgres in a work environment. The time I would give back more
than covers MSSQL/Oracle licenses.
Open source is a social agreement, not a business model. If you break
the social agreement for a business model, the business model will fail
because the society which fundamentally created the product you wish to
sell will crumble from mistrust (or shun you). In short, it is wrong to
sell the work of others without proper compensation and the full
agreement of everyone that has contributed. If you don't get that, get
out of the open source market now.
That said, there is a long standing business model which is 100%
compatible with Open Source and it is that of the lowly 'VAR.' You do not
think for one minute that an Oracle VAR would dare to add features to
Oracle and make their own SQL do you?
As a PostgreSQL "VAR" you are in a better position than any other VAR.
You get to partner in the code development process. (You couldn't ask
Oracle to add a feature and expect to keep it to yourself, could you?)
I know this is a borderline rant, and I am sorry, but I think it is very
important that the integrity of open source be preserved at 100% because
it is a very slippery slope, and we are all surrounded by the temptation
to cheat the spirit of open source "just a little" for short term gain.
At 11:00 PM 12/2/00 -0800, Vadim Mikheev wrote:
There is risk here. It isn't so much in the fact that PostgreSQL, Inc
is doing a couple of modest closed-source things with the code. After
all, the PG community has long acknowledged that the BSD license would
allow others to co-op the code and commercialize it with no obligations.
It is rather sad to see PG, Inc. take the first step in this direction.
How long until the entire code base gets co-opted?
I totally missed your point here. How is closing the source of ERserver related
to closing the code of the PostgreSQL DB server? Let me clear things up:
(not based on WAL)
That wasn't clear from the blurb.
Still, this notion that PG, Inc will start producing closed-source products
poisons the well. It strengthens FUD arguments of the "open source can't
provide enterprise solutions" variety. "Look, even PostgreSQL, Inc realizes
that you must follow a closed-source model in order to provide tools for
the corporate world."
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
At 01:06 PM 12/3/00 +0100, Peter Eisentraut wrote:
Open source software is a
privilege,
I admit that I don't subscribe to Stallman's "source to software is a
right" argument. That's far off my reality map.
and nobody has the right to call someone "irresponsible"
because they want to get paid for their work and don't choose to give away
their code.
However, I do have the right to make such statements, just as you have the
right to disagree. It's called the first amendment in my country.
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
I totally missed your point here. How is closing the source of ERserver related
to closing the code of the PostgreSQL DB server? Let me clear things up:
(not based on WAL)
That wasn't clear from the blurb.
Still, this notion that PG, Inc will start producing closed-source products
poisons the well. It strengthens FUD arguments of the "open source can't
provide enterprise solutions" variety. "Look, even PostgreSQL, Inc realizes
that you must follow a closed-source model
in order to provide tools for the corporate world."
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Did you miss Thomas' answer? Wasn't it clear that the order is to provide
income?
Vadim
I think this trend is MUCH bigger than what Postgres, Inc. is doing... it's
happening all over
the community. Heck, take a look around... Jabber, Postgres, Red Hat, SuSe,
Storm etc. etc.
these companies are making good money off a business plan that was basically
"hey, lets take some
of that open source and make a real product out of it...". As long as they
dribble releases into
the community, they're not in violation... Its not a bad business model if
you think about it, if you
can take a product that is good (great as in PG) and add value, sell it and
make money, why not?
Hell, you didn't have to spend the gazillion R&D dollars on the initial
design and implementation,
you're basically reaping the rewards off of the work of other people.
Are you ready for hundreds upon hundreds of little projects turning into
"startup" companies?
It was bound to happen. Why? because money is involved, plain and simple.
Maybe it's a natural progression of this stuff, who knows, I just know that
I've been around
the block a couple times, been in the industry too long to know that the
minority voice never
gets the prize... we usually set the trend and pay for it in the end...
fatalistic? maybe. But not
far from the truth...
Sorry to be a downer... The Red Sox didn't get Mussina....
----- Original Message -----
From: "Don Baccus" <dhogaza@pacifier.com>
To: "Ross J. Reedstrom" <reedstrm@rice.edu>
Cc: "Peter Eisentraut" <peter_e@gmx.net>; "PostgreSQL Development"
<pgsql-hackers@postgresql.org>
Sent: Saturday, December 02, 2000 5:11 PM
Subject: Re: [HACKERS] beta testing version
At 03:51 PM 12/2/00 -0600, Ross J. Reedstrom wrote:
"We expect to have the source code tested and ready to contribute to
the open source community before the middle of October. Until that time
we are considering requests from a number of development companies and
venture capital groups to join us in this process."
Where's the damn core code? I've seen a number of examples already of
people asking about remote access/replication function, with an eye
toward implementing it, and being told "PostgreSQL, Inc. is working
on that". It's almost Microsoftesque: preannounce future functionality,
suppressing the competition.
Well, this is just all 'round a bad precedent and an unwelcome path
for PostgreSQL, Inc to embark upon.
They've also embarked on one fully proprietary product (built on PG),
which means they're not an Open Source company, just a sometimes Open
Source company.
It's a bit ironic to learn about this on the same day I learned that
Solaris 8 is being made available in source form. Sun's slowly "getting
it" and moving glacially towards Open Source, while PostgreSQL, Inc.
seems to be drifting in the opposite direction.
if I absolutely need
something that's only in CVS right now, I can bite the bullet and use
a snapshot server.
This work might be released as Open Source, but it isn't an open
development scenario. The core work's not available for public scrutiny,
and the details of what they're actually up to don't appear to be public either.
OK, they're probably funding Vadim's work on WAL, so the indictment's
probably not 100% accurate - but I don't know that.
I'd be really happy with someone reiterating the commitment to an
open release, and letting us all know how badly the schedule has
slipped. Remember, we're all here to help! Get everyone stomping bugs
in code you're going to release soon anyway, and concentrate on the
quasi-proprietary extensions.
Which makes me wonder, is Vadim's time going to be eaten up by working
on these quasi-proprietary extensions that the rest of us won't get
for two years unless we become customers of Postgres, Inc?
Will Great Bridge step to the plate and fund a truly open source
alternative,
leaving us with a potential code fork? If IB gets its political problems
under control and developers rally around it, two years is going to be a
long time to just sit back and wait for PG, Inc to release eRServer.
These developments are a major annoyance.
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
Peter Eisentraut wrote:
mlw writes:
There are hundreds (thousands?) of people that have contributed to the
development of Postgres, either directly with code, or beta testing,
with the assumption that they are benefiting a community. Many would
probably not have done so if they had suspected that what they do is
used in a product that excludes them.
With the BSD license it has always been clear that this would be possible,
and for as long as I've been around the core/active developers have
frequently reiterated that this is a desirable aspect and in fact
encouraged. If you don't like that, then you should have read the license
before using the product.
I have said before, open source is a social contract, not a business
model.
Well, you're free to take the PostgreSQL source and start your own "social
contract" project; but we don't do that around here.
And you don't feel that this is a misappropriation of a public trust? I
feel shame for you.
On Sat, 2 Dec 2000, Adam Haberlach wrote:
In any case, can we create pgsql-politics so we don't have to go over
this issue every three months? Can we create pgsql-benchmarks while we
are at it, to take care of the other thread that keeps popping up?
no skin off my back:
pgsql-advocacy
pgsql-chat
pgsql-benchmarks
-advocacy/-chat are pretty much the same concept ...
On Sat, 2 Dec 2000, Don Baccus wrote:
I *am* one of those volunteers
Yes, I well remember you screwing up PG 7.0 just before beta, without bothering
to test your code, and leaving on vacation.
You were irresponsible then, and you're being irresponsible now.
Okay, so let me get this one straight ... it was irresponsible for him to
put code in that was broken the last time, but it wouldn't be
irresponsible for us to release code that we don't feel is ready this
time? *raised eyebrow*
Just want to get this straight, as it kinda sounds hypocritical to me, but
want to make sure that I understand before I fully arrive at that
conclusion ... :)
Don Baccus wrote:
At 04:42 AM 12/3/00 +0000, Thomas Lockhart wrote:
This statement of yours kinda belittles the work done over the past
few years by volunteers.
imho it does not,
Sure it does. You in essence are saying that "advanced replication is so
hard that it could only come about if someone were willing to finance a
PROPRIETARY solution. The PG developer group couldn't manage it if
it were done Open Source".In other words, it is much harder than any of the work done by the
same group of people before they started working on proprietary
versions.
And that the only way to get them doing their best work is to put them
on proprietary, or "semi-proprietary" projects, though 24 months from
now, who's going to care? You've opened the door to IB prominence, not
only shooting PG's open source purity down in flames, but probably PG, Inc's
as well - IF IB can figure out their political problems.
IB, as it stands, is a damned good product in many ways ahead of PG. You're
giving them life by this approach, which is a kind of bizarre business strategy.
You (and others ;) may also be interested in SAPDB (SAP's version of Adabas),
that is soon to be released under GPL. It is already downloadable for free use
from www.sapdb.org
-------------
Hannu
On Mon, 4 Dec 2000, Horst Herb wrote:
Branding. Phone support lines. Legal departments/Lawsuit prevention.
Figuring
out how to prevent open source from stealing the thunder by duplicating
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
features. And building a _product_.
Oops. You didn't really mean that, did you? Could it be that there are some
people out there thinking "let them free software fools do the hard initial
work, once things are working nicely, we take over, add a few "secret"
ingredients, and voila - the commercial product has been created?"
After reading the statement above I believe that surely most of the
honest developers involved in postgres would wish they had chosen GPL
as licensing scheme.
I agree that most of the work is always done by a few. I also agree
that it would be nice if they could get some financial reward for it.
But no dirty tricks please. Do not betray the base. Otherwise, the
broad developer base will be gone before you even can say
"freesoftware".I, for my part, have learned another lesson today. I was just about to
give in with the licensing scheme in our project to allow the GPL
incompatible OpenSSL to be used. After reading the above now I know it
is worth the extra effort to "roll our own" or wait for another GPL'd
solution rather than sacrificing the unique protection the GPL gives
us.
to this day, this still cracks me up ... if a BSD licensed OSS project
somehow gets its code base "closed", that closing can only affect the code
base from its closing on forward ... on that day, there is *nothing*
stopping the OSS community from taking the code base from the second
before it was closed and running with it ...
you get no more, and no less, protection under either license.
the "protection" that GPL provides is that it prevents someone from taking
the code, making proprietary modifications to it and branding it as their
own for release ... cause under GPL, they would have to release the source
code for the modifications ...
PgSQL, Inc hasn't done anything so far but develop third party
*applications* over top of PgSQL, with plans to release them at various
stages as the clients we are developing them for permit ... as well as
providing consulting to clients looking at moving towards PgSQL and
requiring help with migrations ...
We aren't going to release something that is half-assed and buggy ... the
whole erServer stuff right now is *totally* external to the PgSQL server,
and, as such, is a third-party application, not a proprietary extension
like Don wants to make it out to be ...
"Gary MacDougall" <gary@freeportweb.com> writes:
I think this trend is MUCH bigger than what Postgres, Inc. is
doing... it's happening all over the community. Heck, take a look
around... Jabber, Postgres, Red Hat, SuSe, Storm etc. etc. these
companies are making good money off a business plan that was basically
"hey, lets take some of that open source and make a real product out
of it...".
I doubt many of these "make good money". We're almost breaking even,
which is probably the best among these.
Note also that some companies contribute engineering resources into
core free software components, like gcc, gdb, the linux kernel, glibc,
gnome, gtk+, rpm, apache, XFree, KDE - AFAIK, Red Hat and SuSE are by
far the two doing this the most.
--
Trond Eivind Glomsrød
Red Hat, Inc.
Adam Haberlach wrote:
In any case, can we create pgsql-politics so we don't have to go over
this issue every three months? Can we create pgsql-benchmarks while we
are at it, to take care of the other thread that keeps popping up?
pgsql-yawn, where any of them can happen as often and long as
they want.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #
mlw wrote:
Thomas Lockhart wrote:
As soon as you find a business model which does not require income, let
me know. The .com'ers are trying it at the moment, and there seems to be
a few flaws... ;)
While I have not contributed anything to Postgres yet, I have
contributed to other environments. The prospect that I could create a
piece of code, spend weeks/years of my own time on something and some
entity can come along, take what I've written and create a product which
is better for it, and then not share back is offensive. Under GPL it is
illegal. (Postgres should try to move to GPL)
I think that forbidding anyone else from profiting from your work is also
somewhat obscene ;)
The whole idea of open source is that in the open, ideas mature faster, bugs are
I am working on a full-text search engine for Postgres. A really fast
one, something better than anything else out there.
Isn't everybody ;)
It combines the power and scalability of a web search engine, with
the data-mining capabilities of SQL.
Are you doing it in a fully open-source fashion or just planning to release
it as OS "when it somewhat works"?
If I write this extension to Postgres, and release it, is it right that
a business can come along, add a few things here and there and introduce
a new closed source product on what I have written? That is certainly
not what I intend.
If your intention is to later cash in on proprietary uses of your code
you should of course use GPL.
My intention was to honor the people before me for
providing the rich environment which is Postgres. I have made real money
using Postgres in a work environment. The time I would give back more
than covers MSSQL/Oracle licenses.
Open source is a social agreement, not a business model.
Not one but many (and btw. incompatible) social agreements.
If you break the social agreement for a business model,
You are free to put your additions under GPL, it is just a tradition in the PG
community not to contaminate the core with anything less free than BSD
(and yes, forcing your idea of freedom on other people qualifies as "less free" ;)
the business model will fail
because the society which fundamentally created the product you wish to
sell will crumble from mistrust (or shun you). In short, it is wrong to
sell the work of others without proper compensation and the full
agreement of everyone that has contributed. If you don't get that, get
out of the open source market now.
So now a social contract is a market? I _am_ confused.
That said, there is a long standing business model which is 100%
compatible with Open Source and it is that of the lowly 'VAR.' You do not
think for one minute that an Oracle VAR would dare to add features to
Oracle and make their own SQL do you?
But if Oracle were released under BSD license, it might benefit both the
VAR and the customer to do so under some circumstances.
As a PostgreSQL "VAR" you are in a better position than any other VAR.
You get to partner in the code development process. (You couldn't ask
Oracle to add a feature and expect to keep it to yourself, could you?)
You could ask another VAR to do that if you yourself are incapable/don't
have time, etc.
And of course I can keep it to myself even if done by Oracle.
What I can't do is forbid others from having it too.
I know this is a borderline rant, and I am sorry, but I think it is very
important that the integrity of open source be preserved at 100% because
it is a very slippery slope, and we are all surrounded by the temptation
to cheat the spirit of open source "just a little" for short term gain.
Do you mean that anyone who has contributed to an opensource project should
be forbidden from doing any closed-source development?
-----------
Hannu
The Hermit Hacker wrote:
On Sat, 2 Dec 2000, Don Baccus wrote:
I *am* one of those volunteers
Yes, I well remember you screwing up PG 7.0 just before beta, without bothering
to test your code, and leaving on vacation.
You were irresponsible then, and you're being irresponsible now.
Okay, so let me get this one straight ... it was irresponsible for him to
put code in that was broken the last time, but it wouldn't be
irresponsible for us to release code that we don't feel is ready this
time? *raised eyebrow*
Just want to get this straight, as it kinda sounds hypocritical to me, but
want to make sure that I understand before I fully arrive at that
conclusion ... :)
IIRC, this thread woke up on someone complaining about PostgreSQL Inc promising
to release some code for replication in mid-October and asking for confirmation
that this is just a schedule slip and that the project is still going on and
going to be released as open source.
What seems to be the answer is: "NO, we will keep the replication code
proprietary".
I have not seen this answer myself, but I've got this impression from
the contents of the whole discussion.
Do you know if this is the case?
-----------
Hannu
Hannu Krosing wrote:
I know this is a borderline rant, and I am sorry, but I think it is very
important that the integrity of open source be preserved at 100% because
it is a very slippery slope, and we are all surrounded by the temptation
to cheat the spirit of open source "just a little" for short term gain.
Do you mean that anyone who has contributed to an opensource project
should be forbidden from doing any closed-source development?
No, not at all. At least for me, if I write code which is dependent on
the open source work of others, then hell yes, that work should also be
open source. That, to me, is the difference between right and wrong.
If you write a program which stands on its own, takes no work from
uncompensated parties, then you have the unambiguous right to do what
ever you want.
I honestly feel that it is wrong to take what others have shared and use
it for the basis of something you will not share, and I can't understand
how anyone could think differently.
mlw wrote:
Hannu Krosing wrote:
I know this is a borderline rant, and I am sorry, but I think it is very
important that the integrity of open source be preserved at 100% because
it is a very slippery slope, and we are all surrounded by the temptation
to cheat the spirit of open source "just a little" for short term gain.
Do you mean that anyone who has contributed to an opensource project
should be forbidden from doing any closed-source development?
No, not at all. At least for me, if I write code which is dependent on
the open source work of others, then hell yes, that work should also be
open source. That, to me, is the difference between right and wrong.
That may be so, but the world as a whole is not that far yet. If open-source
is going to prevail (which I believe it will do), it is not because it is
"right", but because it is a more efficient way of producing quality software.
I honestly feel that it is wrong to take what others have shared and use
it for the basis of something you will not share, and I can't understand
how anyone could think differently.
There can be many many reasons you would need to also write
closed-source code.
BSD license gives you that freedom (GPL does not).
By distributing your code under BSD license you acknowledge that the world is
not perfect. This is not the way of a true revolutionary ;)
Don't let that scare you away from contributing to PostgreSQL though. You could
always contribute and keep your code under a different license,
GPL, LGPL, MPL, ...
It would probably not be integrated in the core, but would very likely be kept
in contrib.
----------
Hannu
No offense Trond, if you were in on the Red Hat IPO from the start,
you'd have to say those people made "good money". Bad market
or good market, those "friends of Red Hat" made some serious coin.
Let me clarify, I'm not against this process (and making money), I just
think there is an issue with OSL that will start to catch up with itself
pretty soon.
g.
----- Original Message -----
From: "Trond Eivind Glomsr�d" <teg@redhat.com>
To: "PostgreSQL Development" <pgsql-hackers@postgresql.org>
Sent: Sunday, December 03, 2000 4:24 PM
Subject: Re: [HACKERS] beta testing version
"Gary MacDougall" <gary@freeportweb.com> writes:
I think this trend is MUCH bigger than what Postgres, Inc. is
doing... it's happening all over the community. Heck, take a look
around... Jabber, Postgres, Red Hat, SuSe, Storm etc. etc. these
companies are making good money off a business plan that was basically
"hey, lets take some of that open source and make a real product out
of it...".
I doubt many of these "make good money". We're almost breaking even,
which is probably the best among these.
Note also that some companies contribute engineering resources into
core free software components, like gcc, gdb, the linux kernel, glibc,
gnome, gtk+, rpm, apache, XFree, KDE - AFAIK, Red Hat and SuSE are by
far the two doing this the most.
--
Trond Eivind Glomsrød
Red Hat, Inc.
"Gary MacDougall" <gary@freeportweb.com> writes:
No offense Trond, if you were in on the Red Hat IPO from the start,
you'd have to say those people made "good money".
I'm talking about the business as such, not the IPO where the price
went stratospheric (we were priced like we were earning 1 or 2 billion
dollars a year, which was kind of weird).
--
Trond Eivind Glomsrød
Red Hat, Inc.
No, not at all. At least for me, if I write code which is dependent on
the open source work of others, then hell yes, that work should also be
open source. That, to me, is the difference between right and wrong.
Actually, you're not legally bound to anything if you write "new" additional
code, even if it's dependent on something. You could consider it "proprietary"
and charge for it. There are tons of these things going on right now.
Having a dependency on an open source product/code/functionality does not
make one bound to make their code "open source".
If you write a program which stands on its own, takes no work from
uncompensated parties, then you have the unambiguous right to do what
ever you want.
That's a given.
I honestly feel that it is wrong to take what others have shared and use
it for the basis of something you will not share, and I can't understand
how anyone could think differently.
The issue isn't "fairness", the issue is really trust. And from what I'm
seeing, like anything else in life, if you rely solely on trust when money is
involved, the system will fail--eventually.
sad... isn't it?
Gary MacDougall wrote:
No, not at all. At least for me, if I write code which is dependent on
the open source work of others, then hell yes, that work should also be
open source. That, to me, is the difference between right and wrong.
Actually, you're not legally bound to anything if you write "new" additional
code, even if it's dependent on something. You could consider it "proprietary"
and charge for it. There are tons of these things going on right now.
Having a dependency on an open source product/code/functionality does not
make one bound to make their code "open source".
If you write a program which stands on its own, takes no work from
uncompensated parties, then you have the unambiguous right to do what
ever you want.
That's a given.
I honestly feel that it is wrong to take what others have shared and use
it for the basis of something you will not share, and I can't understand
how anyone could think differently.
The issue isn't "fairness", the issue is really trust. And from what I'm
seeing, like anything else in life, if you rely solely on trust when money is
involved, the system will fail--eventually.
sad... isn't it?
That's why, as bad as it is, GPL is the best answer.
On Sun, Dec 03, 2000 at 05:17:36PM -0500, mlw wrote:
... if I write code which is dependent on
the open source work of others, then hell yes, that work should also be
open source. That, to me, is the difference between right and wrong.
This is short and I will say no more:
The entire social contract around PostgreSQL is written down in the
license. Those who have contributed to the project (are presumed to)
have read it and agreed to it before submitting their changes. Some
people have contributed intending someday to fold the resulting code
base into their proprietary product, and carefully checked to ensure
the license would allow it. Nobody has any legal or moral right to
impose extra use restrictions, on their own code or (especially!) on
anybody else's.
If you would like to place additional restrictions on your own
contributions, you can:
1. Work on other projects. (Adabas will soon be GPL, but you can
start now. Others are coming, too.) There's always plenty of
work to be done on Free Software.
2. Fork the source base, add your code, and release the whole thing
under GPL. You can even fold in changes from the original project,
later. (Don't expect everybody to get along, afterward.) A less
drastic alternative is to release GPL'd patches.
3. Grin and bear it. Greed is a sin, but so is envy.
Flame wars about licensing mainly distract people from writing code.
How would *you* like the time spent?
Nathan Myers
ncm@zembu.com
mlw wrote: [heavily edited]
No, not at all. At least for me, if I write code which is dependent on
the open source work of others, then hell yes, that work should also be
open source. That, to me, is the difference between right and wrong.
I honestly feel that it is wrong to take what others have shared and use
it for the basis of something you will not share, and I can't understand
how anyone could think differently.
You're missing the point almost completely. We've been around on this
GPL-vs-BSD discussion many many many times before, and the discussion
always ends up at the same place: we aren't changing the license.
The two key reasons (IMHO) are:
1. The original code base is BSD. We do not have the right to
unilaterally relabel that code as GPL. Maybe we could try to say that
all additions/changes after a certain date are GPL, but that'd become a
hopeless mess very shortly; how would you keep track of what was which?
Not to mention the fact that a mixed-license project would not satisfy
GPL partisans anyway.
2. Since Postgres is a database, and the vast majority of uses for
databases are business-related, we have to have a license that
businesses will feel comfortable with. One aspect of that comfort is
that they be able to do things like building proprietary applications
atop the database. If we take a purist GPL approach, we'll just drive
away a lot of potential users and contributors. (I for one wouldn't be
here today, most likely, if Postgres had been GPL --- my then company
would not have gotten involved with it.)
I have nothing against GPL; it's appropriate for some things. But
it's not appropriate for *this* project, because of history and subject
matter. We've done just fine with the BSD license and I do not see a
reason to think that GPL would be an improvement.
regards, tom lane
At 5:17 PM -0500 12/3/00, mlw wrote:
I honestly feel that it is wrong to take what others have shared and use
it for the basis of something you will not share, and I can't understand
how anyone could think differently.
Yeah, it really sucks when companies that are in business to make money by creating solutions and support for end users take the hard work of volunteers, commit resources to extending and enhancing that work, and make that work more accessible to end users (for a fee).
Maybe it's unfair that the people at the bottom of that chain don't reap a percentage of the revenue generated at the top, but those people were free to read the license of the product they were contributing to.
Ironically, the GPL protects the future income of a programmer much better than the BSD license, because under the GPL the original author can sell the code to a commercial enterprise who otherwise would not have been able to use it. Even more ironically, the GPL doesn't prevent 3rd parties from feeding at the trough as long as they DON'T extend and enhance the product. (Though Red Hat and friends donate work back to maintain community support.)
To me, Open Source is about admitting that the Computer Science field is in its infancy, and the complex systems we're building today are the fundamental building blocks of tomorrow's systems. It is about exchanging control for adoption, a trade-off that has millions of case studies.
Think Different,
-pmb
--
"Every time you provide an option, you're asking the user to make a decision.
That means they will have to think about something and decide about it.
It's not necessarily a bad thing, but, in general, you should always try to
minimize the number of decisions that people have to make."
http://joel.editthispage.com/stories/storyReader$51
On Sun, 3 Dec 2000, Hannu Krosing wrote:
The Hermit Hacker wrote:
On Sat, 2 Dec 2000, Don Baccus wrote:
I *am* one of those volunteers
Yes, I well remember you screwing up PG 7.0 just before beta, without bothering
to test your code, and leaving on vacation.
You were irresponsible then, and you're being irresponsible now.
Okay, so let me get this one straight ... it was irresponsible for him to
put code in that was broken the last time, but it wouldn't be
irresponsible for us to release code that we don't feel is ready this
time? *raised eyebrow*
Just want to get this straight, as it kinda sounds hypocritical to me, but
want to make sure that I understand before I fully arrive at that
conclusion ... :)
IIRC, this thread woke up on someone complaining about PostgreSQL Inc promising
to release some code for replication in mid-October and asking for confirmation
that this is just a schedule slip and that the project is still going on and
going to be released as open source.
What seems to be the answer is: "NO, we will keep the replication code
proprietary".
I have not seen this answer myself, but I've got this impression from
the contents of the whole discussion.
Do you know if this is the case?
If this is the impression that someone gave, I am shocked ... Thomas
himself has already posted stating that it was a schedule slip on his
part. Vadim did up the software days before the Oracle OpenWorld
conference, but it was a very rudimentary implementation. At the show,
Thomas dove in to build a basic interface to it, and, as time permits, has
been working on packaging to get it into contrib before v7.1 is released
...
I've been trying to follow this thread, and seem to have missed where
someone arrived at the conclusion that we were proprietarizing(word?) this
... we do apologize that it didn't get out mid-October, but it is/was
purely a schedule slip ...
On Sun, 3 Dec 2000, mlw wrote:
Hannu Krosing wrote:
I know this is a borderline rant, and I am sorry, but I think it is very
important that the integrity of open source be preserved at 100% because
it is a very slippery slope, and we are all surrounded by the temptation
to cheat the spirit of open source "just a little" for short term gain.
Do you mean that anyone who has contributed to an opensource project
should be forbidden from doing any closed-source development?
No, not at all. At least for me, if I write code which is dependent on
the open source work of others, then hell yes, that work should also be
open source. That, to me, is the difference between right and wrong.
If you write a program which stands on its own, takes no work from
uncompensated parties, then you have the unambiguous right to do what
ever you want.
I honestly feel that it is wrong to take what others have shared and use
it for the basis of something you will not share, and I can't understand
how anyone could think differently.
On Sun, 3 Dec 2000, Gary MacDougall wrote:
If you write a program which stands on its own, takes no work from
uncompensated parties, then you have the unambiguous right to do what
ever you want.
That's a given.
okay, then now I'm confused ... neither SePICK nor erServer are derived
from uncompensated parties ... they work over top of PgSQL, but are not
integrated into it, nor have they required any changes to PgSQL in order to
make it work ...
... so, where is this whole outcry coming from?
On Sun, 3 Dec 2000, Don Baccus wrote:
At 11:00 PM 12/2/00 -0800, Vadim Mikheev wrote:
There is risk here. It isn't so much in the fact that PostgreSQL, Inc
is doing a couple of modest closed-source things with the code. After
all, the PG community has long acknowledged that the BSD license would
allow others to co-op the code and commercialize it with no obligations.
It is rather sad to see PG, Inc. take the first step in this direction.
How long until the entire code base gets co-opted?
I totally missed your point here. How is closing the source of ERserver related
to closing the code of the PostgreSQL DB server? Let me clear things up:
(not based on WAL)
That wasn't clear from the blurb.
Still, this notion that PG, Inc will start producing closed-source products
poisons the well. It strengthens FUD arguments of the "open source can't
provide enterprise solutions" variety. "Look, even PostgreSQL, Inc realizes
that you must follow a closed-source model in order to provide tools for
the corporate world."
Don ... have you never worked for a client that has paid you to develop a
product for them? Have you taken the work you did for them, that they
paid for, and shoved it out into the community to use for free? Why would
we do anything any differently?
Your clients ask you to develop something for them as an extension to
PgSQL (it's extensible, ya know?) that can be loaded as a simple module
(ala IPMeter) that gives them a competitive advantage over their
competitors, but that doesn't require any changes to the physical backend
to implement ... would you refuse their money? or would you do like
PgSQL, Inc is doing, where we do a risk-analysis of the changes and work
with the client to make use of the "competitive advantage" it gives them
for a period of time prior to releasing it open source?
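To make the "loaded as a simple module" point concrete, here is a minimal sketch of what such an extension looks like on the SQL side; the function name and library path below are made up purely for illustration, not taken from any actual client work:
-- add_one.so is compiled separately from the backend sources;
-- registering it requires no changes to PgSQL itself.
CREATE FUNCTION add_one(int4) RETURNS int4
    AS '/usr/local/pgsql/lib/add_one.so' LANGUAGE 'C';
SELECT add_one(3);
The shared object is built and shipped entirely outside the backend tree, which is why work like this can stay with a client for a while without touching the open source server at all.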
Geoff explains it much better than I do, from a business perspective, but
any extension/application that PgSQL, Inc develops for our clients has a
life-span on it ... after which, keeping it in track with what is being
developed would cost more then the competitive advantage it gives the
clients ... sometimes, that is 0, some times, 6 months ... extreme cases,
24 months ...
Nobody is going to pay you to develop X if you are going to turn around
and give it for free to their competitor ... it makes no business sense. In
a lot of cases, making these changes benefits the project as some of the
stuff that is required for them get integrated into the backend ...
On Sun, Dec 03, 2000 at 08:49:09PM -0400, The Hermit Hacker wrote:
On Sun, 3 Dec 2000, Hannu Krosing wrote:
IIRC, this thread woke up on someone complaining about PostgreSQL Inc promising
to release some code for replication in mid-October and asking for confirmation
that this is just a schedule slip and that the project is still going on and
going to be released as open source.
That would be me asking the question, as a reply to Don's concern regarding
the 'proprietary extension on a 24 mo. release delay'
What seems to be the answer is: "NO, we will keep the replication code
proprietary".I have not seen this answer myself, but i've got this impression from
the contents
of the whole discussion.Do you know if this is the case ?
If this is the impression that someone gave, I am shocked ... Thomas
himself has already posted stating that it was a schedule slip on his
part.
Actually, Thomas said:
Thomas> Hmm. What has kept replication from happening in the past? It
Thomas> is a big job and difficult to do correctly. It is entirely my
Thomas> fault that you haven't seen the demo code released; I've been
Thomas> packaging it to make it a bit easier to work with.
I noted the use of the words "demo code" rather than "core code". That
bothered (and still bothers) me, but I didn't reply at the time,
since there was already enough heat in this thread. I'll take your
interpretation to mean it's just a matter of semantics.
[...] Vadim did up the software days before the Oracle OpenWorld
conference, but it was a very rudimentary implementation. At the show,
Thomas dove in to build a basic interface to it, and, as time permits, has
been working on packaging to get it into contrib before v7.1 is released
...
I've been trying to follow this thread, and seem to have missed where
someone arrived at the conclusion that we were proprietarizing(word?) this
... we do apologize that it didn't get out mid-October, but it is/was
purely a schedule slip ...
Mixture of the silent schedule slip on the core code, and the explicit
statement on the erserver.com page regarding the 'proprietary extensions'
with a delayed source release.
The biggest problem I see with having core developers making proprietary
extensions is the potential for conflict of interest when and if
some of us donate equivalent code to the core. The core developers who
have also done proprietary versions will have to be very cautious
when working on such code. They're in a bind, with two parts. First,
they have obligations to their employer and their employer's partners
to not release the closed work early. Second, they may end up ignoring such
independent extensions, or even actively excluding them from the core,
in favor of their own code. The core developers _do_ have a bit of a
track record of favoring each other's code over external code, as is natural:
we all trust work more from sources we know better, especially when that
source is ourselves. But this favoritism could work against the earliest
possible open solution.
I'm still anxious to see the core patches needed to support replication.
Since you've leaked that they work going back to v6.5, I have a feeling
the approach may not be the one I was hoping for.
Ross
I'm still anxious to see the core patches needed to support replication.
Since you've leaked that they work going back to v6.5, I have a feeling
the approach may not be the one I was hoping for.
There are no core patches required to support replication. This has been
said already, but perhaps lost in the noise.
- Thomas
I'm agreeing with the people like SePICK and erServer.
I'm only being sort of cheeky in saying that they wouldn't have had a product had
it not been for the Open Source that they are leveraging off of.
Making money? I don't know what their plans are, but at some point I would
fully expect *someone* to make money.
----- Original Message -----
From: "The Hermit Hacker" <scrappy@hub.org>
To: "Gary MacDougall" <gary@freeportweb.com>
Cc: "mlw" <markw@mohawksoft.com>; "Hannu Krosing" <hannu@tm.ee>; "Thomas
Lockhart" <lockhart@alumni.caltech.edu>; "Don Baccus"
<dhogaza@pacifier.com>; "PostgreSQL Development"
<pgsql-hackers@postgresql.org>
Sent: Sunday, December 03, 2000 7:53 PM
Subject: Re: [HACKERS] beta testing version
On Sun, 3 Dec 2000, Gary MacDougall wrote:
If you write a program which stands on its own, takes no work from
uncompensated parties, then you have the unambiguous right to do what
ever you want. That's a given.
okay, then now I'm confused ... neither SePICK or erServer are derived
from uncompensated parties ... they work over top of PgSQL, but are not
integrated into them, nor have required any changes to PgSQL in order to
make it work ...
... so, where is this whole outcry coming from?
Correct me if I'm wrong but in the last 3 years what company that you
know of didn't consider an IPO part of the "business and such". Most
tech companies that have been formed in the last 4 - 5 years have one
thing on the brain--IPO. It's the #1 thing (sadly) that they care about.
I only wish these companies cared as much about *creating* and
innovation as they cared about going public...
g.
No offense Trond, if you were in on the Red Hat IPO from the start,
you'd have to say those people made "good money".
I'm talking about the business as such, not the IPO where the price
went stratospheric (we were priced like we were earning 1 or 2 billion
dollars a year, which was kind of weird).
--
Trond Eivind Glomsrød
Red Hat, Inc.
On Sun, 3 Dec 2000, Ross J. Reedstrom wrote:
If this is the impression that someone gave, I am shocked ... Thomas
himself has already posted stating that it was a schedule slip on his
part.
Actually, Thomas said:
Thomas> Hmm. What has kept replication from happening in the past? It
Thomas> is a big job and difficult to do correctly. It is entirely my
Thomas> fault that you haven't seen the demo code released; I've been
Thomas> packaging it to make it a bit easier to work with.
I noted the use of the words "demo code" rather than "core code". That
bothered (and still bothers) me, but I didn't reply at the time,
since there was already enough heat in this thread. I'll take your
interpretation to mean it's just a matter of semantics.
there is nothing that we are developing at this date that is *core* code
... the "demo code" that we are going to be putting into contrib is a
simplistic version, and the first cut, of what we are developing ... like
everything in contrib, it will be hack-on-able, extendable, etc ...
I'm still anxious to see the core patches needed to support
replication. Since you've leaked that they work going back to v6.5, I
have a feeling the approach may not be the one I was hoping for.
this is where the 'confusion' appears to be arising ... there are no
*patches* ... anything that will require patches to the core server will
almost certainly have to be put out as open source, or we hit problems where
development continues without us ... what we are doing with replication
requires *zero* patches to the server, it is purely a third-party
application ...
On Sun, 3 Dec 2000, Gary MacDougall wrote:
I'm agreeing with the people like SePICK and erServer.
I'm only being sort of cheeky in saying that they wouldn't have had a product had
it not been for the Open Source that they are leveraging off of.
So, basically, if I hadn't pulled together Thomas, Bruce and Vadim 5 years
ago, when Jolly and Andrew finished their graduate thesis, and continued
to provide the resources required to bring PgSQL from v1.06 to now, we
wouldn't be able to use that as a basis for third party applications
... pretty much, ya, that sums it up ...
----- Original Message -----
From: "The Hermit Hacker" <scrappy@hub.org>
To: "Gary MacDougall" <gary@freeportweb.com>
Cc: "mlw" <markw@mohawksoft.com>; "Hannu Krosing" <hannu@tm.ee>; "Thomas
Lockhart" <lockhart@alumni.caltech.edu>; "Don Baccus"
<dhogaza@pacifier.com>; "PostgreSQL Development"
<pgsql-hackers@postgresql.org>
Sent: Sunday, December 03, 2000 7:53 PM
Subject: Re: [HACKERS] beta testing version
On Sun, 3 Dec 2000, Gary MacDougall wrote:
If you write a program which stands on its own, takes no work from
uncompensated parties, then you have the unambiguous right to do what
ever you want. That's a given.
okay, then now I'm confused ... neither SePICK or erServer are derived
from uncompensated parties ... they work over top of PgSQL, but are not
integrated into them, nor have required any changes to PgSQL in order to
make it work ...
... so, where is this whole outcry coming from?
Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
On Sun, Dec 03, 2000 at 08:53:08PM -0400, The Hermit Hacker wrote:
On Sun, 3 Dec 2000, Gary MacDougall wrote:
If you write a program which stands on its own, takes no work from
uncompensated parties, then you have the unambiguous right to do what
ever you want. That's a given.
okay, then now I'm confused ... neither SePICK or erServer are derived
from uncompensated parties ... they work over top of PgSQL, but are not
integrated into them, nor have required any changes to PgSQL in order to
make it work ...
... so, where is this whole outcry coming from?
This paragraph from erserver.com:
eRServer development is currently concentrating on core, universal
functions that will enable individuals and IT professionals
to implement PostgreSQL ORDBMS solutions for mission critical
datawarehousing, datamining, and eCommerce requirements. These
initial developments will be published under the PostgreSQL Open
Source license, and made available through our sites, Certified
Platinum Partners, and others in PostgreSQL community.
led me (and many others) to believe that this was going to be a tightly
integrated service, requiring code in the PostgreSQL core, since that's the
normal use of 'core' around here.
Now that I know it's a completely external implementation, I feel bad about
griping about deadlines. I _do_ wish I'd known this _design choice_ a bit
earlier, as it impacts how I'll try to do some things with pgsql, but that's
my own fault for over interpreting press releases and pre-announcements.
Ross
On Sun, 3 Dec 2000, Ross J. Reedstrom wrote:
On Sun, Dec 03, 2000 at 08:53:08PM -0400, The Hermit Hacker wrote:
On Sun, 3 Dec 2000, Gary MacDougall wrote:
If you write a program which stands on its own, takes no work from
uncompensated parties, then you have the unambiguous right to do what
ever you want. That's a given.
okay, then now I'm confused ... neither SePICK or erServer are derived
from uncompensated parties ... they work over top of PgSQL, but are not
integrated into them, nor have required any changes to PgSQL in order to
make it work ...
... so, where is this whole outcry coming from?
This paragraph from erserver.com:
eRServer development is currently concentrating on core, universal
functions that will enable individuals and IT professionals
to implement PostgreSQL ORDBMS solutions for mission critical
datawarehousing, datamining, and eCommerce requirements. These
initial developments will be published under the PostgreSQL Open
Source license, and made available through our sites, Certified
Platinum Partners, and others in PostgreSQL community.
led me (and many others) to believe that this was going to be a tightly
integrated service, requiring code in the PostgreSQL core, since that's the
normal use of 'core' around here.
Now that I know it's a completely external implementation, I feel bad about
griping about deadlines. I _do_ wish I'd known this _design choice_ a bit
earlier, as it impacts how I'll try to do some things with pgsql, but that's
my own fault for over interpreting press releases and pre-announcements.
Apologies from our side as well ... failings in the English language and
choice of words on our side ... the last thing that we want to do is have
to maintain patches across multiple versions for stuff that is core to the
server ... Thomas/Vadim can easily correct me if I've missed something,
but to the best of my knowledge, from our many discussions, anything that
is *core* to the PgSQL server itself will always be released similar to
any other project (namely, tested and open) ... including hooks for any
proprietary projects ... the sanctity of the *core* server is *always*
foremost in our minds, no matter what other projects we are working on ...
bingo.
Not just third-party apps, but think of all the vertical products that
include PG...
I'm right now wondering if TiVo uses it?
You have to think that PG will show up in some pretty interesting money
making products...
So yes, had you not got the ball rolling.... well, you know what I'm saying.
g.
----- Original Message -----
From: "The Hermit Hacker" <scrappy@hub.org>
To: "Gary MacDougall" <gary@freeportweb.com>
Cc: "mlw" <markw@mohawksoft.com>; "Hannu Krosing" <hannu@tm.ee>; "Thomas
Lockhart" <lockhart@alumni.caltech.edu>; "Don Baccus"
<dhogaza@pacifier.com>; "PostgreSQL Development"
<pgsql-hackers@postgresql.org>
Sent: Sunday, December 03, 2000 10:18 PM
Subject: Re: [HACKERS] beta testing version
On Sun, 3 Dec 2000, Gary MacDougall wrote:
I'm agreeing with the people like SePICK and erServer.
I'm only being sort of cheeky in saying that they wouldn't have had a product had
it not been for the Open Source that they are leveraging off of.
So, basically, if I hadn't pulled together Thomas, Bruce and Vadim 5 years
ago, when Jolly and Andrew finished their graduate thesis, and continued
to provide the resources required to bring PgSQL from v1.06 to now, we
wouldn't be able to use that as a basis for third party applications
... pretty much, ya, that sums it up ...
----- Original Message -----
From: "The Hermit Hacker" <scrappy@hub.org>
To: "Gary MacDougall" <gary@freeportweb.com>
Cc: "mlw" <markw@mohawksoft.com>; "Hannu Krosing" <hannu@tm.ee>; "Thomas
Lockhart" <lockhart@alumni.caltech.edu>; "Don Baccus"
<dhogaza@pacifier.com>; "PostgreSQL Development"
<pgsql-hackers@postgresql.org>
Sent: Sunday, December 03, 2000 7:53 PM
Subject: Re: [HACKERS] beta testing version
On Sun, 3 Dec 2000, Gary MacDougall wrote:
If you write a program which stands on its own, takes no work from
uncompensated parties, then you have the unambiguous right to do whatever
you want.
That's a given.
okay, then now I'm confused ... neither SePICK or erServer are derived
from uncompensated parties ... they work over top of PgSQL, but are not
integrated into them, nor have required any changes to PgSQL in order to
make it work ...
... so, where is this whole outcry coming from?
Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
At 09:42 PM 12/3/00 -0600, Ross J. Reedstrom wrote:
This paragraph from erserver.com:
eRServer development is currently concentrating on core, universal
functions that will enable individuals and IT professionals
to implement PostgreSQL ORDBMS solutions for mission critical
datawarehousing, datamining, and eCommerce requirements. These
initial developments will be published under the PostgreSQL Open
Source license, and made available through our sites, Certified
Platinum Partners, and others in PostgreSQL community.
led me (and many others) to believe that this was going to be a tightly
integrated service, requiring code in the PostgreSQL core, since that's the
normal use of 'core' around here.
Right. This is a big source of misunderstanding. There's still the fact
that 50% of the PG steering committee are involved in [partially] closed-source
development based on PG, though. This figure disturbs me.
50% is a lot. It's like ... half, right? Or did I miss something in the
conversion?
This represents a significant change from the past where 0%, AFAIK, were
involved in closed source PG add-ons.
Now that I know it's a completely external implementation, I feel bad about
griping about deadlines. I _do_ wish I'd known this _design choice_ a bit
earlier, as it impacts how I'll try to do some things with pgsql, but that's
my own fault for over interpreting press releases and pre-announcements.
IF 50% of the steering committee is to embark on such a task in a closed source
or semi-closed source development model, it would seem common courtesy to inform the
community of the facts as early as they were decided upon.
In fact, it might seem to be common courtesy to float the notion in the community,
to gauge reaction and to build support, before finalizing such a decision.
AFAIC this arrived out of nowhere, a sort of stealth "50% of the steering committee
has decided to embark on a semi-proprietary solution to the replication problem that
you won't see as open source for [up to] two years after its completion".
That's a paradigm shift. Whether right or wrong, there's a responsibility to
communicate the fact that 50% of the steering committee has decided to partially
abandon the open source development model for one that is (in some cases) closed
for two years and (in other cases) forever.
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
At 11:59 PM 12/3/00 -0400, The Hermit Hacker wrote:
the sanctity of the *core* server is *always*
foremost in our minds, no matter what other projects we are working on ...
What happens if financially things aren't entirely rosy with your company?
The problem in taking itty-bitty steps in this direction is that you're
involving outside money interests that don't necessarily adhere to this
view.
Having taken the first steps to a proprietary, closed source future, would
you pledge to bankrupt your company rather than accept a large capital
investment with an ROI based on proprietary extensions to the core that
might not be likely to come out of the non-tainted side of the development
house?
Would your company sign a contract to that effect with independent parties,
i.e. that it would never violate the sanctity of the *core*? Even if it means
you go broke? And that your investors go broke?
Or would your investors prefer you not make such a formal commitment, in order
to keep options open if things don't go well?
(in the early 80's my company received a total of $8,000,000 in pre-IPO
capital investments, so I have some experience with the expectations of investors.
It tends to make me a bit paranoid. I'm not the only COO to have such experiences
while living the life).
What happens in two years if those investors in eRServer haven't gotten adequate
return on their investment? Do you have a formal agreement that the source will
be released regardless? Can the community inspect the agreement so we can judge
for ourselves whether or not this assurance is adequately backed by contract
language?
Are your agreements Open Source? :)
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
At 02:47 PM 12/1/00 -0500, Tom Lane wrote:
All we can do is the best we can ;-). In that light, I think it's
reasonable for Postgres to proceed on the assumption that fsync does
what it claims to do, ie, all blocks are written when it returns.
We can't realistically expect to persuade a disk controller that
reorders writes to stop doing so. We can, however, expect that we've
minimized the probability of failures induced by anything other than
disk hardware failure or power failure.
Right. This is very much the guarantee that RAID (non-zero) makes,
except "other than disk hardware failure" is replaced by "other than
the failure of two drives". RAID gives you that (a very, very substantial
boost, which is why it is so popular for DB servers). It doesn't give
you power failure assurance, for much the same reason that PG (or Oracle,
etc) can't.
If transaction processing alone could give you protection against a
single disk hardware failure, Oracle wouldn't've bothered implementing
mirroring in the past before software (and even reasonable hardware)
RAID was available.
Likewise, if mirroring + transaction processing could protect against
disks hosing themselves in power failure situations Oracle wouldn't
suggest that enterprise level customers invest in external disk
subsystems with battery backup sufficient to guarantee everything
the db server believes has been written really is written.
Of course, Oracle license fees are high enough that proper hardware
support tends to look cheap in comparison...
Vadim's WAL code is excellent, and the fact that we run in essence
with -F performance and also less write activity to the disk both
increases performance and tends to lessen the probability that the
disk will actually be writing a block when the power goes off. The
dice aren't quite so loaded against the server with this lowered
disk activity...
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
In fact, it might seem to be common courtesy...
An odd choice of words coming from you Don.
We are offering our services and expertise to a community outside
-hackers, as a business formed in a way that this new community expects
to see. Nothing special or sinister here. Other than it seems to have
raised the point that you expected each of us to be working for you,
gratis, on projects you find compelling, using all of our available
time, far into the future just as each of us has over the last five
years.
After your recent spewing, it irks me a little to admit that this will
not change, and that we are likely to continue to each work on OS
PostgreSQL projects using all of our available time, just as we have in
the past.
A recent example of non-sinister change in another area is the work done
to release 7.0.3. This is a release which would not have happened in
previous cycles, since we are so close to beta on 7.1. But GB paid Tom
Lane to work on it as part of *their* business plan, and he shepherded it
through the cycle. There was no outcry from you at this presumption, and
on this diversion of community resources for this effort. Not sure why,
other than you chose to pick some other fight.
And no matter which fight you chose, you're wasting the time of others
as you fight your demons.
- Thomas
Horst Herb wrote:
Branding. Phone support lines. Legal departments/Lawsuit prevention.
Figuring
out how to prevent open source from stealing the thunder by duplicating
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
features. And building a _product_.
Oops. You didn't really mean that, did you? Could it be that there are some
people out there thinking "let them free software fools do the hard initial
work, once things are working nicely, we take over, add a few "secret"
ingredients, and voila - the commercial product has been created"?
That wasn't the _intended_ meaning, but I suppose that it's a related issue.
I was referring to companies expending variable amounts of time and resources
on a new closed source technology, only to have their market share shriveled up
by OSS coders rapidly duplicating their efforts, and releasing free code or a
less expensive product.
To put it in proper context:
If the project under discussion was reverse engineered (or even clean room
"re-engineered") and released as a separate, open source, product (or
even just "free" code), the demand for the PG, Inc. software is placed at
risk.
The actual size, and scope, of the project is irrelevant, as determined
OSS advocates have pretty much taken on any, and every, viable project. It's
not really about "stealing" code efforts, any more than RedHat "stole"
Linux, or that Pg has been stealing features from other ORDBMS's... it's that
OSS is a difficult market to capture, if you are selling closed source
code that can be created, or duplicated, by others.
Stronghold and Raven(?) were more successful products before the OSS
encryption efforts took off. Now anybody can build an SSL server, without
paying for licenses that used to cost thousands (I know there's the RSA
issue in this history as well, but let's be realistic about who actually
obeyed all those laws, okay?). Zend is trying to build an IDE for
PHP, but the open-source market moves fast enough that within a few
months of release, there will be clones, reverse engineered versions,
etc. SSH tried valiantly to close their code base.... which created
a market for OpenSSH. You see it time and again, there's a closed
version/extension/plug-in, feature, and an OSS clone gets built up
for it. GUI for sendmail? OSS now. New AIM protocols? gaim was on
it in days. New, proprietary, M$ mail software that took years to build
up, research, and develop? Give the OSS hordes a few months. New,
closed, SMB protocols? Give the samba team a few days, maybe a few
weeks.
To wrap up this point, a closed derivative (or closed new) product is
now competing against OSS pools of developers, which is much harder to
stop than a single closed source company. It's difficult to compete
on code quality, or code features.... you have to compete with a
*product* that is better than anything globally co-ordinated code
hackers can build themselves.
-Bop
--
Brought to you from iBop the iMac, a MacOS, Win95, Win98, LinuxPPC machine,
which is currently in MacOS land. Your bopping may vary.
At 07:11 AM 12/4/00 +0000, Thomas Lockhart wrote:
We are offering our services and expertise to a community outside
-hackers, as a business formed in a way that this new community expects
to see. Nothing special or sinister here. Other than it seems to have
raised the point that you expected each of us to be working for you,
gratis, on projects you find compelling, using all of our available
time, far into the future just as each of us has over the last five
years.
No, not at all. Working gratis is not the issue, as I made clear. There
are - despite your rather condescending statement implying otherwise -
business models that lead to revenue without abandoning open source.
I'm making a decent living following such a business model, thank
you very much. I'm living proof that it is possible.
...
A recent example of non-sinister change in another area is the work done
to release 7.0.3. This is a release which would not have happened in
previous cycles, since we are so close to beta on 7.1. But GB paid Tom
Lane to work on it as part of *their* business plan, and he shepherded it
through the cycle. There was no outcry from you at this presumption, and
on this diversion of community resources for this effort. Not sure why,
other than you chose to pick some other fight.
There's a vast difference between releasing 7.0.3 in open source form TODAY
and eRServer, which may not be released in open source form for up to two
years after it enters the market on a closed source, proprietary footing.
To suggest there is no difference, as you seem to be doing, is a hopelessly
unconvincing argument.
The fact that you seem blind to the difference is one reason why PG, Inc
worries me (since you are a principal in the company).
The reason you heard no outcry from me in the PG 7.0.3 case is because there
*is* a difference between it and a semi-proprietary product like eRServer.
If GB had held Tom's work on PG 7.0.3 and released it only in (say) a packaged
release for purchase, saying "we'll release it to the CVS tree after we
recoup our investment", there would've been an outcry from me, bet on it.
Probably others, too...
And no matter which fight you chose, you're wasting the time of others
as you fight your demons.
Well, I guess I'll have to stay off my medication, otherwise my demons
might disappear. I'm a regular miracle of medical science until I forget
to take them.
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
On Thu, 30 Nov 2000, Nathan Myers wrote:
Second, the transaction log is not, as has been noted far too frequently
for Vince's comfort, really written atomically. The OS has promised
to write it atomically, and given the opportunity, it will. If you pull
the plug, all promises are broken.
Say what?
Vince.
--
==========================================================================
Vince Vielhaber -- KA8CSH email: vev@michvhf.com http://www.pop4.net
128K ISDN from $22.00/mo - 56K Dialup from $16.00/mo at Pop4 Networking
Online Campground Directory http://www.camping-usa.com
Online Giftshop Superstore http://www.cloudninegifts.com
==========================================================================
Judging by the information below, taken *directly* from PostgreSQL, Inc.
website, it appears that they will be releasing all code into the main
source code branch -- with the exception of "Advanced Replication and
Distributed Information capabilities" (to which capabilities they are
referring is not made clear) which may remain proprietary for up to 24
months "in order to assist us in recovering development costs and continue
to provide funding for our other Open Source contributions."
I have interpreted this to mean that basic replication (server -> server,
server -> client, possibly more) will be available shortly for Postgres
(with the release of 7.1?) and that those more advanced features will
follow behind. This is one of the last features that was missing from
Postgres (along with recordset returning functions and clusters, among
others) that was holding it back from the enterprise market -- and I do
not blame PostgreSQL, Inc. one bit for withholding some of the more
advanced features to recoup their development costs -- it was *their time*
and *their money* they spent developing the *product* and it must be
recouped for projects like this to make sense in the future (who knows,
maybe next they will implement RS returning SP's or clusters, projects
that are funded with their profit off the advanced replication and
distributed information capabilities that they *may* withhold -- would
people still be whining then?)
Michael Fork - CCNA - MCP - A+
Network Support - Toledo Internet Access - Toledo Ohio
(http://www.pgsql.com/press/PR_5.html)
"At the moment we are limiting our test groups to our existing Platinum
Partners and those clients whose requirements include these
features." advises Jeff MacDonald, VP of Support Services. "We expect to
have the source code tested and ready to contribute to the open source
community before the middle of October. Until that time we are considering
requests from a number of development companies and venture capital groups
to join us in this process."
Davidson explains, "These initial Replication functions are important to
almost every commercial user of PostgreSQL. While we've fully funded all
of this development ourselves, we will be immediately donating these
capabilities to the open source PostgreSQL Global Development Project as
part of our ongoing commitment to the PostgreSQL community."
http://www.erserver.com/
eRServer development is currently concentrating on core, universal
functions that will enable individuals and IT professionals to implement
PostgreSQL ORDBMS solutions for mission critical datawarehousing,
datamining, and eCommerce requirements. These initial developments will be
published under the PostgreSQL Open Source license, and made available
through our sites, Certified Platinum Partners, and others in PostgreSQL
community.
Advanced Replication and Distributed Information capabilities are also
under development to meet specific business and competitive requirements
for both PostgreSQL, Inc. and clients. Several of these enhanced
PostgreSQL, Inc. developments may remain proprietary for up to 24 months,
with availability limited to clients and partners, in order to assist us
in recovering development costs and continue to provide funding for our
other Open Source contributions.
On Sun, 3 Dec 2000, Hannu Krosing wrote:
The Hermit Hacker wrote:
IIRC, this thread woke up on someone complaining about PostgreSQL inc promising
to release some code for replication in mid-October and asking for confirmation
that this is just a schedule slip and that the project is still going on
and going to be released as open source.
What seems to be the answer is: "NO, we will keep the replication code
proprietary".
I have not seen this answer myself, but I've got this impression from
the contents of the whole discussion. Do you know if this is the case?
-----------
Hannu
On Sun, 3 Dec 2000, Don Baccus wrote:
At 11:59 PM 12/3/00 -0400, The Hermit Hacker wrote:
the sanctity of the *core* server is *always*
foremost in our minds, no matter what other projects we are working on ...
What happens if financially things aren't entirely rosy with your
company? The problem in taking itty-bitty steps in this direction is
that you're involving outside money interests that don't necessarily
adhere to this view.
Having taken the first steps to a proprietary, closed source future,
would you pledge to bankrupt your company rather than accept a large
capital investment with an ROI based on proprietary extensions to the
core that might not be likely to come out of the non-tainted side of
the development house?
You mean sort of like Great Bridge investing in core developers? Quite
frankly, I have yet to see anything but good come out of Tom as a result
of that, as now he has more time on his hands ... then again, maybe Outer
Joins was a bad idea? *raised eyebrow*
PgSQL is *open source* ... that means that if you don't like it, take the
code, fork off your own version if you don't like what's happening to the
current tree and build your own community *shrug*
Can we PLEASE kill this thread? There are only a handful of people who
are making contributions here and nothing really new is being said. I
agree that the issue should be discussed, but this does not seem like the
right forum.
Thanks.
- brandon
b. palmer, bpalmer@crimelabs.net
pgp: www.crimelabs.net/bpalmer.pgp5
On Mon, 4 Dec 2000, Don Baccus wrote:
A recent example of non-sinister change in another area is the work done
to release 7.0.3. This is a release which would not have happened in
previous cycles, since we are so close to beta on 7.1. But GB paid Tom
Lane to work on it as part of *their* business plan, and he shepherded it
through the cycle. There was no outcry from you at this presumption, and
on this diversion of community resources for this effort. Not sure why,
other than you chose to pick some other fight.
There's a vast difference between releasing 7.0.3 in open source form
TODAY and eRServer, which may not be released in open source form for
up to two years after it enters the market on a closed source,
proprietary footing. To suggest there is no difference, as you seem to
be doing, is a hopelessly unconvincing argument.
Except, eRServer, the basic model, will be released Open Source, and, if
all goes as planned, in time for inclusion in contrib of v7.1 ...
This paragraph from erserver.com:
eRServer development is currently concentrating on core, universal
functions that will enable individuals and IT professionals
to implement PostgreSQL ORDBMS solutions for mission critical
datawarehousing, datamining, and eCommerce requirements. These
initial developments will be published under the PostgreSQL Open
Source license, and made available through our sites, Certified
Platinum Partners, and others in PostgreSQL community.
led me (and many others) to believe that this was going to be a tightly
integrated service, requiring code in the PostgreSQL core, since that's the
normal use of 'core' around here.
"Around here" isn't "around there" ;)
As you can see, "core" == "fundamental" in the general sense, in a
statement not written specifically for the hacker community but for the
world at large. In many cases, taking one syllable rather than four is a
good thing, but sorry it led to confusion.
My schedule is completely out of whack, partly from taking the afternoon
off to cool down from the personal attacks being lobbed my direction.
Will pick things up as time permits, but we should have some code for
contrib/ in time for beta2, if it is acceptable to the community to put
it in there.
- Thomas
Right. This is very much the guarantee that RAID (non-zero) makes,
except "other than disk hardware failure" is replaced by "other than
the failure of two drives". RAID gives you that (a very, very substantial
boost, which is why it is so popular for DB servers). It doesn't give
you power failure assurance, for much the same reason that PG (or Oracle,
etc) can't.
As far as I know (and have tested in excess) Informix IDS does survive
any power loss without leaving the db in a corrupted state.
The basic technology is that it only relies on writes to one "file"
(raw device in that case), the txlog, which is directly written.
All writes to the txlog are basically appends to that log. Meaning that all writes
are sync writes to the currently active (== last) page. All other IO is not a problem,
because a backup image "physical log" is kept for each page that needs to
be written. During fast recovery the content of the physical log is restored to the
originating pages (thus all pending IO is undone) before rollforward is started.
Andreas
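[A minimal illustrative sketch of the before-image scheme Andreas describes above. This is not Informix or PostgreSQL source; every name in it is invented for the example. The idea: before a data page is overwritten in place, its old contents are appended to a "physical log"; crash recovery first restores those before-images (undoing any torn in-place writes) and only then rolls forward from the transaction log.]

#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 8192
#define NPAGES    4

typedef struct { char data[PAGE_SIZE]; } Page;

static Page disk[NPAGES];          /* the data file */
static Page physlog[NPAGES];       /* before-images taken since the last checkpoint */
static int  logged[NPAGES];        /* has page i been copied to the physical log? */

/* Save the old contents of a page before its first in-place write
 * after a checkpoint, so a torn write can be undone later. */
static void physlog_save(int pageno)
{
    if (!logged[pageno])
    {
        physlog[pageno] = disk[pageno];   /* in reality: append + fsync */
        logged[pageno] = 1;
    }
}

/* A "checkpoint": all data pages are known to be safely on disk,
 * so the physical log can be discarded. */
static void checkpoint(void)
{
    memset(logged, 0, sizeof(logged));
}

/* Recovery step 1: restore every saved before-image, undoing any
 * partially written pages, before roll-forward from the txlog starts. */
static void restore_physical_log(void)
{
    for (int i = 0; i < NPAGES; i++)
        if (logged[i])
            disk[i] = physlog[i];
}

int main(void)
{
    strcpy(disk[0].data, "consistent page contents");
    checkpoint();

    physlog_save(0);                       /* before-image first ...              */
    strcpy(disk[0].data, "half-writ");     /* ... then pretend power failed
                                              in the middle of the write */

    restore_physical_log();                /* crash recovery, before roll-forward */
    printf("page 0 after recovery: %s\n", disk[0].data);
    return 0;
}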
On Tue, Dec 05, 2000 at 05:29:36AM +0000, Thomas Lockhart wrote:
As you can see, "core" == "fundamental" in the general sense, in a
statement not written specifically for the hacker community but for the
world at large. In many cases, taking one syllable rather than four is a
good thing, but sorry it led to confusion.
Yep, a closer re-read led me to enlightenment.
My schedule is completely out of whack, partly from taking the afternoon
off to cool down from the personal attacks being lobbed my direction.
I'm sorry about that. I hope the part of this thread that I helped start
didn't contribute too much to your distress. Had I realized at the time
that there was _no_ pgsql core work involved, I would have been less
distressed myself by the time slip. With beta on the way, I was concerned
that it wouldn't get in until the 7.2 tree opened.
Will pick things up as time permits, but we should have some code for
contrib/ in time for beta2, if it is acceptable to the community to put
it in there.
I'm of the 'contrib is for stuff that doesn't even necessarily currently
build' school, although I appreciate the work that's been done to reverse
the bit rot. Drop it in at any time, as far as I'm concerned.
Ross
As far as I know (and have tested in excess) Informix IDS
does survive any power loss without leaving the db in a
corrupted state. The basic technology is that it only relies
on writes to one "file" (raw device in that case), the txlog,
which is directly written. All writes to the txlog are basically
appends to that log. Meaning that all writes are sync writes to
the currently active (== last) page. All other IO is not a problem,
because a backup image "physical log" is kept for each page
that needs to be written. During fast recovery the content of the
physical log is restored to the originating pages (thus all pending
IO is undone) before rollforward is started.
Sounds great! We can follow this way: when the first update to a page after the
last checkpoint is being logged, the XLOG code can log not an AM-specific update
record but the entire page (creating a backup "physical log"). During after-crash
recovery such pages will be redone first, ensuring page consistency
for further redo ops. This means a bigger log, of course.
Initdb will not be required for these code changes, so it can be
implemented in any 7.1.X, X >=1.
Thanks, Andreas!
Vadim
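[A rough sketch of the decision rule Vadim describes above - log a full page image on the first modification of a page after a checkpoint, and only small delta records afterwards. Illustrative C only, with invented names; this is not the actual XLOG code.]

#include <stdio.h>

#define NPAGES 8

typedef unsigned int XLogId;              /* stand-in for a log sequence number */

static XLogId page_last_imaged[NPAGES];   /* LSN of the last full-page image per page */
static XLogId last_checkpoint_lsn = 0;
static XLogId current_lsn = 0;

/* For an update of 'pageno', decide what goes into the log:
 * a full page image if this is the first touch since the last checkpoint,
 * otherwise just the small AM-specific delta record. */
static void log_update(int pageno, const char *delta_desc)
{
    current_lsn++;
    if (page_last_imaged[pageno] <= last_checkpoint_lsn)
    {
        printf("LSN %u: FULL PAGE IMAGE of page %d (first update since checkpoint)\n",
               current_lsn, pageno);
        page_last_imaged[pageno] = current_lsn;
    }
    else
        printf("LSN %u: delta record for page %d: %s\n",
               current_lsn, pageno, delta_desc);
}

static void checkpoint(void)
{
    current_lsn++;
    last_checkpoint_lsn = current_lsn;
    printf("LSN %u: CHECKPOINT\n", current_lsn);
}

int main(void)
{
    log_update(3, "insert tuple");   /* full image: first touch ever             */
    log_update(3, "update tuple");   /* delta: already imaged since checkpoint   */
    checkpoint();
    log_update(3, "delete tuple");   /* full image again: first touch after ckpt */
    return 0;
}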
On Sunday 03 December 2000 04:00, Vadim Mikheev wrote:
There is risk here. It isn't so much in the fact that PostgreSQL, Inc
is doing a couple of modest closed-source things with the code. After
all, the PG community has long acknowledged that the BSD license would
allow others to co-opt the code and commercialize it with no obligations.
It is rather sad to see PG, Inc. take the first step in this direction.
How long until the entire code base gets co-opted?
I totally missed your point here. How is closing the source of ERserver related
to closing the code of the PostgreSQL DB server? Let me clear things:
1. ERserver isn't based on WAL. It will work with any version >= 6.5
2. WAL was partially sponsored by my employer, Sectorbase.com,
not by PG, Inc.
Has somebody thought about putting PG in the GPL licence instead of the BSD?
PG inc would still be able to make their money giving support (just like IBM,
HP and Compaq are doing their share with Linux), without being able to close
the code.
Only a thought...
Saludos... :-)
--
"And I'm happy, because you make me feel good, about me." - Melvin Udall
-----------------------------------------------------------------
Martín Marqués email: martin@math.unl.edu.ar
Santa Fe - Argentina http://math.unl.edu.ar/~martin/
Administrador de sistemas en math.unl.edu.ar
-----------------------------------------------------------------
I totally missed your point here. How is closing the source of
ERserver related to closing the code of the PostgreSQL DB server?
Let me clear things:
1. ERserver isn't based on WAL. It will work with any version >= 6.5
2. WAL was partially sponsored by my employer, Sectorbase.com,
not by PG, Inc.
Has somebody thought about putting PG in the GPL licence
instead of the BSD?
PG inc would still be able to make their money giving support
(just like IBM, HP and Compaq are doing their share with Linux),
without being able to close the code.
ERserver is an *external* application that changes *nothing* in
the PostgreSQL code. So, no matter what licence the server code is
under, any company will be able to close the code of
any privately developed *external* application.
And I don't see what's wrong with this, do you?
Vadim
On Sunday 03 December 2000 12:41, mlw wrote:
Thomas Lockhart wrote:
As soon as you find a business model which does not require income, let
me know. The .com'ers are trying it at the moment, and there seems to be
a few flaws... ;)
While I have not contributed anything to Postgres yet, I have
contributed to other environments. The prospect that I could create a
piece of code, spend weeks/years of my own time on something and some
entity can come along, take what I've written and create a product which
is better for it, and then not share back is offensive. Under GPL it is
illegal. (Postgres should try to move to GPL)
With you on the last statement.
I am working on a full-text search engine for Postgres. A really fast
one, something better than anything else out there. It combines the
power and scalability of a web search engine, with the data-mining
capabilities of SQL.
If you want to make something GPL I would be more than interested to help
you. We could use something like that over here, and I have no problem at all
with releasing it as GPL code.
If I write this extension to Postgres, and release it, is it right that
a business can come along, add a few things here and there and introduce
a new closed source product on what I have written? That is certainly
not what I intend. My intention was to honor the people before me for
providing the rich environment which is Postgres. I have made real money
using Postgres in a work environment. The time I would give back more
than covers MSSQL/Oracle licenses.
I'm not sure, but you could introduce a piece of GPL code in the BSD code,
but the result would have to be GPL.
Hoping to hear from you,
--
"And I'm happy, because you make me feel good, about me." - Melvin Udall
-----------------------------------------------------------------
Martín Marqués email: martin@math.unl.edu.ar
Santa Fe - Argentina http://math.unl.edu.ar/~martin/
Administrador de sistemas en math.unl.edu.ar
-----------------------------------------------------------------
On Sunday 03 December 2000 21:49, The Hermit Hacker wrote:
I've been trying to follow this thread, and seem to have missed where
someone arrived at the conclusion that we were proprietarizing(word?) this
I have missed that part as well.
... we do apologize that it didn't get out mid-October, but it is/was
purely a schedule slip ...
I would never say something about schedules of OSS. Let it be in BSD or GPL
license.
Saludos... :-)
--
"And I'm happy, because you make me feel good, about me." - Melvin Udall
-----------------------------------------------------------------
Martín Marqués email: martin@math.unl.edu.ar
Santa Fe - Argentina http://math.unl.edu.ar/~martin/
Administrador de sistemas en math.unl.edu.ar
-----------------------------------------------------------------
On Tuesday 05 December 2000 16:23, Martin A. Marques wrote:
Has somebody thought about putting PG in the GPL licence instead of the
BSD? PG inc would still be able to make their money giving support (just like
IBM, HP and Compaq are doing their share with Linux), without being able to
close the code.
I shouldn't be answering myself, but I just got to the end of the thread
(exams got on me the last 2 days), so I want to apologize for sending this
mail (even if it reflects what my feelings are) without reading the other
mails in the thread.
Sorry
--
"And I'm happy, because you make me feel good, about me." - Melvin Udall
-----------------------------------------------------------------
Martín Marqués email: martin@math.unl.edu.ar
Santa Fe - Argentina http://math.unl.edu.ar/~martin/
Administrador de sistemas en math.unl.edu.ar
-----------------------------------------------------------------
I totally missed your point here. How is closing the source of
ERserver related to closing the code of the PostgreSQL DB server?
Let me clear things:
1. ERserver isn't based on WAL. It will work with any version >= 6.5
2. WAL was partially sponsored by my employer, Sectorbase.com,
not by PG, Inc.
Has somebody thought about putting PG in the GPL licence
instead of the BSD?
PG inc would still be able to make their money giving support
(just like IBM, HP and Compaq are doing their share with Linux),
without being able to close the code.
This gets brought up every couple of months; I don't see the point
in denying any of the current PostgreSQL developers the chance
to make some money selling a non-freeware version of PostgreSQL.
We can also look at it another way: let's say ER server was meant
to be closed source; if the code it was derived from was GPL'd,
then that chance was gone before it even happened. Hence no
reason to develop it.
*poof* no ER server.
--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."
On Tue, 5 Dec 2000, Martin A. Marques wrote:
On Sunday 03 December 2000 04:00, Vadim Mikheev wrote:
There is risk here. It isn't so much in the fact that PostgreSQL, Inc
is doing a couple of modest closed-source things with the code. After
all, the PG community has long acknowledged that the BSD license would
allow others to co-opt the code and commercialize it with no obligations.
It is rather sad to see PG, Inc. take the first step in this direction.
How long until the entire code base gets co-opted?
I totally missed your point here. How is closing the source of ERserver related
to closing the code of the PostgreSQL DB server? Let me clear things:
1. ERserver isn't based on WAL. It will work with any version >= 6.5
2. WAL was partially sponsored by my employer, Sectorbase.com,
not by PG, Inc.
Has somebody thought about putting PG in the GPL licence instead of the BSD?
it's been brought up and rejected continuously ... in some of our opinions,
GPL is more harmful than helpful ... as has been said before many times,
and I'm sure will continue to be said "changing the license to GPL is a
non-discussable issue" ...
On Tue, Dec 05, 2000 at 10:43:03AM -0800, Mikheev, Vadim wrote:
As far as I know (and have tested in excess) Informix IDS
does survive any power loss without leaving the db in a
corrupted state. The basic technology is that it only relies
on writes to one "file" (raw device in that case), the txlog,
which is directly written. All writes to the txlog are basically
appends to that log. Meaning that all writes are sync writes to
the currently active (== last) page. All other IO is not a problem,
because a backup image "physical log" is kept for each page
that needs to be written. During fast recovery the content of the
physical log is restored to the originating pages (thus all pending
IO is undone) before rollforward is started.
Sounds great! We can follow this way: when the first update to a page after the
last checkpoint is being logged, the XLOG code can log not an AM-specific update
record but the entire page (creating a backup "physical log"). During after-crash
recovery such pages will be redone first, ensuring page consistency
for further redo ops. This means a bigger log, of course.
Be sure to include a CRC of each part of the block that you hope
to replay individually.
Nathan Myers
ncm@zembu.com
Sounds great! We can follow this way: when the first update to a page after the
last checkpoint is being logged, the XLOG code can log
not an AM-specific update record but the entire page (creating a backup
"physical log"). During after-crash recovery such pages will
be redone first, ensuring page consistency for further redo ops.
This means a bigger log, of course.
Be sure to include a CRC of each part of the block that you hope
to replay individually.
Why should we do this? I'm not going to replay parts individually,
I'm going to write entire pages to the OS cache and then apply changes to
them. Recovery is considered successful after the server ensures
that all applied changes are on the disk. In the case of a crash during
recovery we'll replay the entire game.
Vadim
The Hermit Hacker wrote:
it's been brought up and rejected continuously ... in some of our opinions,
GPL is more harmful than helpful ... as has been said before many times,
and I'm sure will continue to be said "changing the license to GPL is a
non-discussable issue" ...
I've declined commenting on this thread until now -- but this statement
bears amplification.
GPL is NOT the be-all end-all Free Software (in the FSF/GNU sense!)
license. There is room for more than one license -- just as there is
room for more than one OS, more than one Unix, more than one Free RDBMS,
more than one Free webserver, more than one scripting language, more
than one compiler system, more than one Linux distribution, more than
one BSD, and more than one CPU architecture.
Why make a square peg development group fit a round peg license? :-)
Use a round peg for round holes, and a square peg for square holes.
Choice of license for PostgreSQL is not negotiable. I don't say that as
an edict from Lamar Owen (after all, I am in no position to edict
anything :-)) -- I say that as a studied observation of the last times
this subject has come up.
I personally prefer GPL. But my personal preference and what is good
for the project are two different things. BSD is good for this project
with this group of developers -- and it should not change.
And, like any other open development effort, there will be missteps --
which missteps should, IMHO, be put behind us. No software is perfect;
no development team is, either.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
On Tuesday 05 December 2000 18:03, The Hermit Hacker wrote:
Has somebody thought about putting PG in the GPL licence instead of the
BSD?
it's been brought up and rejected continuously ... in some of our opinions,
GPL is more harmful than helpful ... as has been said before many times,
and I'm sure will continue to be said "changing the license to GPL is a
non-discussable issue" ...
It's pretty clear to me, and I respect the decision (I really do).
--
"And I'm happy, because you make me feel good, about me." - Melvin Udall
-----------------------------------------------------------------
Martín Marqués email: martin@math.unl.edu.ar
Santa Fe - Argentina http://math.unl.edu.ar/~martin/
Administrador de sistemas en math.unl.edu.ar
-----------------------------------------------------------------
Regardless of what license is best, could the license even be changed now? I
mean, some of the initial Berkeley code is still in there in some sense and
I would think that the original license (BSD I assume) of the initial source
code release would have to be somehow honored.. I'm just wondering if the PG
team could change the license even if they wanted to.. I should go read the
license again, I know the answer to the above is in there but it's been a
long time since I've looked it over and I'm in the middle of packing, so I
haven't got the time right now.. Thanks to anyone for satisfying my
curiosity in answering this question.
I think that it's very, very good if the license is indeed untouchable, it
keeps PostgreSQL from becoming totally closed-source and/or totally
commercial.. Obviously things can be added to PG and sold commercially, but
there will always be the base PostgreSQL out there for everyone...... I
hope.
Just my $0.02 worth..
-Mitch
----- Original Message -----
From: "Lamar Owen" <lamar.owen@wgcr.org>
To: "PostgreSQL Development" <pgsql-hackers@postgresql.org>
Sent: Tuesday, December 05, 2000 1:45 PM
Subject: Re: [HACKERS] beta testing version
The Hermit Hacker wrote:
it's been brought up and rejected continuously ... in some of our opinions,
GPL is more harmful than helpful ... as has been said before many times,
and I'm sure will continue to be said "changing the license to GPL is a
non-discussable issue" ...
I've declined commenting on this thread until now -- but this statement
bears amplification.
GPL is NOT the be-all end-all Free Software (in the FSF/GNU sense!)
license. There is room for more than one license -- just as there is
room for more than one OS, more than one Unix, more than one Free RDBMS,
more than one Free webserver, more than one scripting language, more
than one compiler system, more than one Linux distribution, more than
one BSD, and more than one CPU architecture.
Why make a square peg development group fit a round peg license? :-)
Use a round peg for round holes, and a square peg for square holes.
Choice of license for PostgreSQL is not negotiable. I don't say that as
an edict from Lamar Owen (after all, I am in no position to edict
anything :-)) -- I say that as a studied observation of the last times
this subject has come up.
I personally prefer GPL. But my personal preference and what is good
for the project are two different things. BSD is good for this project
with this group of developers -- and it should not change.
And, like any other open development effort, there will be missteps --
which missteps should, IMHO, be put behind us. No software is perfect;
no development team is, either.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
Mitch Vincent wrote:
Regardless of what license is best, could the license even be changed now? I
mean, some of the initial Berkeley code is still in there in some sense and
I would think that the original license (BSD I assume) of the initial source
code release would have to be somehow honored.. I'm just wondering if the PG
team could change the license even if they wanted to.. I should go read the
license again, I know the answer to the above is in there but it's been a
_Every_single_ copyright holder of code in the core server would have to
agree to any change.
Not a likely event.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
Lamar Owen <lamar.owen@wgcr.org> writes:
Mitch Vincent wrote:
Regardless of what license is best, could the license even be changed now? I
mean, some of the initial Berkeley code is still in there in some sense and
I would think that the original license (BSD I assume) of the initial source
code release would have to be somehow honored.. I'm just wondering if the PG
team could change the license even if they wanted to.. I should go read the
license again, I know the answer to the above is in there but it's been a
_Every_single_ copyright holder of code in the core server would have to
agree to any change.
No - GPL projects can include BSD-copyrighted code, no problem
there. That being said, creating bad blood is not a good thing, so an
approach like this would hurt PostgreSQL a lot.
--
Trond Eivind Glomsrød
Red Hat, Inc.
"Martin A. Marques" wrote:
Has somebody thought about putting PG in the GPL licence instead of the BSD?
It is somewhat difficult to put other people's code under some different
license.
And AFAIK (IANAL) the old license would still apply, too, for all the code
that has been released under it.
PG inc would still be able to make their money giving support (just like IBM,
HP and Compaq are doing their share with Linux), without being able to close
the code.
PG inc would also be able to make money selling dairy products (as they seem
to employ some smart people, and smart people, if in need, are usually able to
make the money they need).
But I suspect that the farther away from developing postgres(-related) products
they have to look for making a living, the worse the results are for
PostgreSQL.
Only a thought...
You can always license _your_ contributions under whatever license you choose -
GPL, LGPL, MPL, SCL or even a shrink-wrap, open-the-wrap-before-reading license
that demands that users forfeit their firstborn child for even looking at
the product.
From what I have read on this list (I guess) it may be unsafe for you
to release something into the public domain (in the US at least), as you are then
unable to claim yourself not liable for your users' blunders (akin to
leaving a loaded gun on a park bench).
----------
Hannu
Trond Eivind Glomsrød wrote:
Lamar Owen <lamar.owen@wgcr.org> writes:
Mitch Vincent wrote:
code release would have to be somehow honored.. I'm just wondering if the PG
team could change the license even if they wanted to.. I should go read the
_Every_single_ copyright holder of code in the core server would have to
agree to any change.
No - GPL projects can include BSD-copyrighted code, no problem
there. That being said, creating bad blood is not a good thing, so an
approach like this would hurt PostgreSQL a lot.
Well, in actuality, the original code from PostgreSQL would still be
BSD-licensed and would be immune to infection from the GPL 'virus'. See
rms' comments on the Vista software package -- that package is public
domain, and the original code will always be public domain.
To get the 'original' code relicensed would require the consent of every
developer.
Of course, the BSD license allows redistribution under virtually any
license -- but said redistribution doesn't affect the original in any
way.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
PS: is there a difference in the '�' you used in your e-mail this time
and the '�' that used to be present? I'm ignorant of that letter's
usage.
Sounds great! We can follow this way: when the first update to a page
after the last checkpoint is being logged, the XLOG code can log
not an AM-specific update record but the entire page (creating a backup
"physical log"). During after-crash recovery such pages will
be redone first, ensuring page consistency for further redo ops.
This means bigger log, of course.
Be sure to include a CRC of each part of the block that you hope
to replay individually.
Why should we do this? I'm not going to replay parts individually,
I'm going to write entire pages to OS cache and then apply changes to
them. Recovery is considered successful after the server has ensured
that all applied changes are on the disk. In the case of a crash during
recovery we'll replay the entire game.
Yes, but there would need to be a way to verify the last page or record from txlog when
running on crap hardware. The point was that crap hardware writes our 8k pages
in any order (e.g. 512 bytes from the end, then 512 bytes from front ...), and does not
even notice that it only wrote part of one such 512 byte block when reading it back
after a crash. But, I actually doubt that this is true for all but the most crappy hardware.
Andreas
Zeugswetter Andreas SB <ZeugswetterA@Wien.Spardat.at> writes:
Yes, but there would need to be a way to verify the last page or
record from txlog when running on crap hardware.
How exactly *do* we determine where the end of the valid log data is,
anyway?
regards, tom lane
Tom Lane wrote:
Zeugswetter Andreas SB <ZeugswetterA@Wien.Spardat.at> writes:
Yes, but there would need to be a way to verify the last page or
record from txlog when running on crap hardware.
How exactly *do* we determine where the end of the valid log data is,
anyway?
Couldn't you use a CRC ?
Anyway... may I suggest adding CRCs to the data ? I just discovered that
I had a faulty HD controller and I fear that something could have been
written erroneously (this could also help to detect faulty memory,
though only in certain cases).
Bye!
--
Daniele Orlandi
Planet Srl
On Wed, Dec 06, 2000 at 11:15:26AM -0500, Tom Lane wrote:
Zeugswetter Andreas SB <ZeugswetterA@Wien.Spardat.at> writes:
Yes, but there would need to be a way to verify the last page or
record from txlog when running on crap hardware.
How exactly *do* we determine where the end of the valid log data is,
anyway?
I don't know how pgsql does it, but the only safe way I know of is to
include an "end" marker after each record. When writing to the log,
append the records after the last end marker, ending with another end
marker, and fdatasync the log. Then overwrite the previous end marker
to indicate it's not the end of the log any more and fdatasync again.
To ensure that it is written atomically, the end marker must not cross a
hardware sector boundary (typically 512 bytes). This can be trivially
guaranteed by making the marker a single byte.
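(For illustration only: a minimal sketch of the append-then-flip-the-marker
protocol described above, using POSIX pwrite/fdatasync. The names and record
layout are hypothetical, not anything from the PostgreSQL sources.)

#include <sys/types.h>
#include <unistd.h>

#define LOG_END_MARK ((unsigned char) 0xFF)  /* one byte: cannot span a sector */

/* Append 'len' bytes of records after the marker at 'end_off'.  On
   success, *new_end_off is the offset of the new end marker. */
static int
log_append(int fd, off_t end_off, const void *rec, size_t len,
           off_t *new_end_off)
{
    unsigned char mark = LOG_END_MARK;

    /* 1. Write the records after the existing end marker, followed by
          a fresh end marker; the old marker is left untouched.  Sync. */
    if (pwrite(fd, rec, len, end_off + 1) != (ssize_t) len)
        return -1;
    if (pwrite(fd, &mark, 1, end_off + 1 + (off_t) len) != 1)
        return -1;
    if (fdatasync(fd) != 0)
        return -1;

    /* 2. Overwrite the previous end marker so the new records become
          valid, then sync again.  A single-byte write cannot be torn
          across a 512-byte hardware sector. */
    mark = 0;
    if (pwrite(fd, &mark, 1, end_off) != 1)
        return -1;
    if (fdatasync(fd) != 0)
        return -1;

    *new_end_off = end_off + 1 + (off_t) len;
    return 0;
}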
Any other way I've seen discussed (here and elsewhere) either
- Requires atomic multi-sector writes, which are possible only if all
the sectors are sequential on disk, the kernel issues one large write
for all of them, and you don't powerfail in the middle of the write.
- Assume that a CRC is a guarantee. A CRC would be a good addition to
help ensure the data wasn't broken by flakey drive firmware, but
doesn't guarantee consistency.
--
Bruce Guenter <bruceg@em.ca> http://em.ca/~bruceg/
On Wed, Dec 06, 2000 at 11:49:10AM -0600, Bruce Guenter wrote:
On Wed, Dec 06, 2000 at 11:15:26AM -0500, Tom Lane wrote:
Zeugswetter Andreas SB <ZeugswetterA@Wien.Spardat.at> writes:
Yes, but there would need to be a way to verify the last page or
record from txlog when running on crap hardware.
How exactly *do* we determine where the end of the valid log data is,
anyway?
I don't know how pgsql does it, but the only safe way I know of is to
include an "end" marker after each record. When writing to the log,
append the records after the last end marker, ending with another end
marker, and fdatasync the log. Then overwrite the previous end marker
to indicate it's not the end of the log any more and fdatasync again.
To ensure that it is written atomically, the end marker must not cross a
hardware sector boundary (typically 512 bytes). This can be trivially
guaranteed by making the marker a single byte.
An "end" marker is not sufficient, unless all writes are done in
one-sector units with an fsync between, and the drive buffering
is turned off. For larger writes the OS will re-order the writes.
Most drives will re-order them too, even if the OS doesn't.
Any other way I've seen discussed (here and elsewhere) either
- Requires atomic multi-sector writes, which are possible only if all
the sectors are sequential on disk, the kernel issues one large write
for all of them, and you don't powerfail in the middle of the write.
- Assume that a CRC is a guarantee.
We are already assuming a CRC is a guarantee.
The drive computes a CRC for each sector, and if the CRC is OK the
drive is happy. CRC errors within the drive are quite frequent, and
the drive re-reads when a bad CRC comes up. (If it sees errors too
frequently on a sector, it rewrites it; if it sees persistent errors
on a sector, it marks that one bad and relocates it.) You can expect
to experience, in production, about the error rate that the drive
manufacturer specifies as "maximum".
... A CRC would be a good addition to
help ensure the data wasn't broken by flakey drive firmware, but
doesn't guarantee consistency.
No, a CRC would be a good addition to compensate for sector write
reordering, which is done both by the OS and by the drive, even for
"atomic" writes.
It is not only "flaky" or "cheap" drives that re-order writes, or
acknowledge writes as complete that are not yet on disk. You
can generally assume that *any* drive does it unless you have
specifically turned that off. The assumption is that if you care,
you have a UPS, or at least have configured the hardware yourself
to meet your needs.
It is purely wishful thinking to believe otherwise.
Nathan Myers
ncm@zembu.com
On Wed, Dec 06, 2000 at 12:29:00PM +0100, Zeugswetter Andreas SB wrote:
Why should we do this? I'm not going to replay parts individually,
I'm going to write entire pages to OS cache and then apply changes
to them. Recovery is considered successful after the server has ensured
that all applied changes are on the disk. In the case of a crash
during recovery we'll replay the entire game.
Yes, but there would need to be a way to verify the last page or
record from txlog when running on crap hardware. The point was that
crap hardware writes our 8k pages in any order (e.g. 512 bytes from
the end, then 512 bytes from front ...), and does not even notice
that it only wrote part of one such 512 byte block when reading it
back after a crash. But, I actually doubt that this is true for all
but the most crappy hardware.
By this standard all hardware is crap. The behavior Andreas describes
as "crappy" is the normal behavior of almost all drives in production,
including the ones in your machine.
Furthermore, OSes re-order "atomic" writes into file systems (i.e.
not raw partitions) to match partition block order, which often doesn't
match the file block order. Hence, the OSes are "crappy" too.
Wishful thinking is a poor substitute for real atomicity. Block
CRCs can at least verify complete writes to reasonable confidence,
if not ensure them.
Nathan Myers
ncm
Bruce Guenter wrote:
- Assume that a CRC is a guarantee. A CRC would be a good addition to
help ensure the data wasn't broken by flakey drive firmware, but
doesn't guarantee consistency.
Even a CRC per transaction (it could be a nice END record) ?
Bye!
--
Daniele
-------------------------------------------------------------------------------
Daniele Orlandi - Utility Line Italia - http://www.orlandi.com
Via Mezzera 29/A - 20030 - Seveso (MI) - Italy
-------------------------------------------------------------------------------
On Wed, Dec 06, 2000 at 11:08:00AM -0800, Nathan Myers wrote:
On Wed, Dec 06, 2000 at 11:49:10AM -0600, Bruce Guenter wrote:
On Wed, Dec 06, 2000 at 11:15:26AM -0500, Tom Lane wrote:
How exactly *do* we determine where the end of the valid log data is,
anyway?
I don't know how pgsql does it, but the only safe way I know of is to
include an "end" marker after each record. When writing to the log,
append the records after the last end marker, ending with another end
marker, and fdatasync the log. Then overwrite the previous end marker
to indicate it's not the end of the log any more and fdatasync again.
To ensure that it is written atomically, the end marker must not cross a
hardware sector boundary (typically 512 bytes). This can be trivially
guaranteed by making the marker a single byte.
An "end" marker is not sufficient, unless all writes are done in
one-sector units with an fsync between, and the drive buffering
is turned off.
That's why an end marker must follow all valid records. When you write
records, you don't touch the marker, and add an end marker to the end of
the records you've written. After writing and syncing the records, you
rewrite the end marker to indicate that the data following it is valid,
and sync again. There is no state in that sequence in which partially-
written data could be confused as real data, assuming either your drives
aren't doing write-back caching or you have a UPS, and fsync doesn't
return until the drives return success.
For larger writes the OS will re-order the writes.
Most drives will re-order them too, even if the OS doesn't.
I'm well aware of that.
Any other way I've seen discussed (here and elsewhere) either
- Assume that a CRC is a guarantee.
We are already assuming a CRC is a guarantee.
The drive computes a CRC for each sector, and if the CRC is OK the
drive is happy. CRC errors within the drive are quite frequent, and
the drive re-reads when a bad CRC comes up.
The kind of data failures that a CRC is guaranteed to catch (N-bit
errors) are almost precisely those that a mis-read on a hardware sector
would cause.
... A CRC would be a good addition to
help ensure the data wasn't broken by flakey drive firmware, but
doesn't guarantee consistency.
No, a CRC would be a good addition to compensate for sector write
reordering, which is done both by the OS and by the drive, even for
"atomic" writes.
But it doesn't guarantee consistency, even in that case. There is a
possibility (however small) that the random data that was located in the
sectors before the write will match the CRC.
--
Bruce Guenter <bruceg@em.ca> http://em.ca/~bruceg/
On Wed, Dec 06, 2000 at 11:13:33PM +0000, Daniele Orlandi wrote:
Bruce Guenter wrote:
- Assume that a CRC is a guarantee. A CRC would be a good addition to
help ensure the data wasn't broken by flakey drive firmware, but
doesn't guarantee consistency.
Even a CRC per transaction (it could be a nice END record) ?
CRCs are designed to catch N-bit errors (ie N bits in a row with their
values flipped). N is (IIRC) the number of bits in the CRC minus one.
So, a 32-bit CRC can catch all 31-bit errors. That's the only guarantee
a CRC gives. Everything else has a 1 in 2^32-1 chance of producing the
same CRC as the original data. That's pretty good odds, but not a
guarantee.
--
Bruce Guenter <bruceg@em.ca> http://em.ca/~bruceg/
CRCs are designed to catch N-bit errors (ie N bits in a row with their
values flipped). N is (IIRC) the number of bits in the CRC minus one.
So, a 32-bit CRC can catch all 31-bit errors. That's the only guarantee
a CRC gives. Everything else has a 1 in 2^32-1 chance of producing the
same CRC as the original data. That's pretty good odds, but not a
guarantee.
You've got a higher chance of undetected hard drive errors, memory errors,
solar flares, etc. than a CRC of that quality failing...
Chris
CRCs are designed to catch N-bit errors (ie N bits in a row with their
values flipped). N is (IIRC) the number of bits in the CRC minus one.
So, a 32-bit CRC can catch all 31-bit errors. That's the only guarantee
a CRC gives. Everything else has a 1 in 2^32-1 chance of producing the
same CRC as the original data. That's pretty good odds, but not a
guarantee.
You've got a higher chance of undetected hard drive errors, memory errors,
solar flares, etc. than a CRC of that quality failing...
Also, how long is the CRC in TCP/IP packets? => there is always a
risk that the backend will commit not what you sent to it.
Vadim
Sounds great! We can follow this way: when the first update to a page
after the last checkpoint is being logged, the XLOG code can log
not an AM-specific update record but the entire page (creating a backup
"physical log"). During after-crash recovery such pages will
be redone first, ensuring page consistency for further redo ops.
This means bigger log, of course.
Be sure to include a CRC of each part of the block that you hope
to replay individually.
Why should we do this? I'm not going to replay parts individually,
I'm going to write entire pages to OS cache and then apply
changes to them. Recovery is considered successful after the server
has ensured that all applied changes are on the disk. In the case of
a crash during recovery we'll replay the entire game.
Yes, but there would need to be a way to verify the last page
or record from txlog when running on crap hardware. The point was
that crap hardware writes our 8k pages in any order (e.g. 512 bytes
from the end, then 512 bytes from front ...), and does not
even notice that it only wrote part of one such 512 byte block when
reading it back after a crash. But, I actually doubt that this is
true for all but the most crappy hardware.
Oh, I didn't consider log consistency that time. Anyway we need a CRC
for the entire log record, not for its 512-byte parts.
Well, I didn't care about non-atomic 8K-block writes in the current WAL
implementation - we were never protected from this: the backend inserts a
tuple, but only the line pointers go to disk => the new lp points to some
garbage inside un-updated page content. Yes, the transaction was not
committed, but who knows the content of this garbage and what we'll get
from a scan trying to read it. Same for index pages.
Can we come to agreement about CRC in log records? Probably it's
not too late to add it (initdb).
Seeing a bad CRC, the recovery procedure will assume that the current
record (and all others after it, if any) is garbage - i.e. comes from an
interrupted disk write - and may be ignored (the backend writes data
pages only after changes are logged - if changes weren't
successfully logged then the on-disk image of the data pages was not
updated and we are not interested in the log records).
This may be implemented very fast (if someone points me where
I can find CRC func). And I could implement "physical log"
till next monday.
Comments?
Vadim
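(A hedged sketch of the record-level CRC check described above: each log
record carries a CRC over its header and data, and replay stops at the first
record whose CRC does not verify. The structure and names are illustrative
only, not the actual XLOG layout; update_crc() is the incremental routine
from the PNG sample code posted just below.)

typedef struct LogRecordHdr
{
    unsigned long xl_len;   /* length of record data following the header */
    unsigned long xl_crc;   /* CRC of xl_len plus the record data */
} LogRecordHdr;

unsigned long update_crc(unsigned long crc, unsigned char *buf, int len);

/* Returns 1 if the record looks valid, 0 if recovery should treat this
   point as the end of the usable log. */
static int
record_is_valid(LogRecordHdr *hdr, unsigned char *data)
{
    unsigned long c = 0xffffffffL;

    c = update_crc(c, (unsigned char *) &hdr->xl_len, (int) sizeof(hdr->xl_len));
    c = update_crc(c, data, (int) hdr->xl_len);
    c ^= 0xffffffffL;

    return c == hdr->xl_crc;
}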
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
This may be implemented very fast (if someone points me where
I can find CRC func).
Lifted from the PNG spec (RFC 2083):
15. Appendix: Sample CRC Code
The following sample code represents a practical implementation of
the CRC (Cyclic Redundancy Check) employed in PNG chunks. (See also
ISO 3309 [ISO-3309] or ITU-T V.42 [ITU-V42] for a formal
specification.)
/* Make the table for a fast CRC. */
void make_crc_table(void)
{
    unsigned long c;
    int n, k;

    for (n = 0; n < 256; n++) {
        c = (unsigned long) n;
        for (k = 0; k < 8; k++) {
            if (c & 1)
                c = 0xedb88320L ^ (c >> 1);
            else
                c = c >> 1;
        }
        crc_table[n] = c;
    }
    crc_table_computed = 1;
}

/* Update a running CRC with the bytes buf[0..len-1]--the CRC
   should be initialized to all 1's, and the transmitted value
   is the 1's complement of the final running CRC (see the
   crc() routine below)). */
unsigned long update_crc(unsigned long crc, unsigned char *buf,
                         int len)
{
    unsigned long c = crc;
    int n;

    if (!crc_table_computed)
        make_crc_table();
    for (n = 0; n < len; n++) {
        c = crc_table[(c ^ buf[n]) & 0xff] ^ (c >> 8);
    }
    return c;
}

/* Return the CRC of the bytes buf[0..len-1]. */
unsigned long crc(unsigned char *buf, int len)
{
    return update_crc(0xffffffffL, buf, len) ^ 0xffffffffL;
}
regards, tom lane
Lifted from the PNG spec (RFC 2083):
Drat, I dropped the table declarations:
/* Table of CRCs of all 8-bit messages. */
unsigned long crc_table[256];
/* Flag: has the table been computed? Initially false. */
int crc_table_computed = 0;
regards, tom lane
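(A small usage sketch, assuming the sample routines and the table
declarations above are compiled into the same file; the record contents are
of course made up.)

#include <stdio.h>
#include <string.h>

unsigned long crc(unsigned char *buf, int len);  /* from the sample above */

int
main(void)
{
    unsigned char rec[] = "pretend this is a WAL record";

    /* crc() does the init-to-all-ones and the final complement itself */
    printf("crc32 = %08lx\n", crc(rec, (int) strlen((char *) rec)));
    return 0;
}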
This may be implemented very fast (if someone points me where
I can find CRC func). And I could implement "physical log"
till next monday.
I have been experimenting with CRCs for the past 6 months in our database for
internal logging purposes. Downloaded a lot of hash libraries, tried
different algorithms, and implemented a few myself. Which algorithm do you
want? Have a look at the openssl libraries (www.openssl.org) for a start - if
you don't find what you want let me know.
As the logging might include large data blocks, especially now that we can
TOAST our data, I would strongly suggest to use strong hashes like RIPEMD or
MD5 instead of CRC-32 and the like. Sure, it takes more time to calculate and
more space on the hard disk, but then: a database without data integrity
(and means of _proving_ integrity) is pretty worthless.
Horst
Recently I have downloaded a pre-beta PostgreSQL, and I found insert and update speed is slower than 7.0.3.
Even if I turn off the sync flag, it is still slower than 7.0. Why? How can I make it faster?
Regards,
XuYifeng
Horst Herb wrote:
This may be implemented very fast (if someone points me where
I can find CRC func). And I could implement "physical log"
till next monday.
I have been experimenting with CRCs for the past 6 months in our database for
internal logging purposes. Downloaded a lot of hash libraries, tried
different algorithms, and implemented a few myself. Which algorithm do you
want? Have a look at the openssl libraries (www.openssl.org) for a start - if
you don't find what you want let me know.
As the logging might include large data blocks, especially now that we can
TOAST our data, I would strongly suggest to use strong hashes like RIPEMD or
MD5 instead of CRC-32 and the like. Sure, it takes more time to calculate and
more space on the hard disk, but then: a database without data integrity
(and means of _proving_ integrity) is pretty worthless.
The choice of hash algorithm could be made a compile-time switch quite
easily, I guess.
---------
Hannu
On Thu, Dec 07, 2000 at 06:40:49PM +1100, Horst Herb wrote:
This may be implemented very fast (if someone points me where
I can find CRC func). And I could implement "physical log"
till next monday.
As the logging might include large data blocks, especially now that
we can TOAST our data, I would strongly suggest to use strong hashes
like RIPEMD or MD5 instead of CRC-32 and the like.
Cryptographically-secure hashes are unnecessarily expensive to compute.
A simple 64-bit CRC would be of equal value, at much less expense.
Nathan Myers
ncm@zembu.com
This may be implemented very fast (if someone points me where
I can find CRC func).Lifted from the PNG spec (RFC 2083):
Thanks! What about Copyrights/licence?
Vadim
This may be implemented very fast (if someone points me where
I can find CRC func). And I could implement "physical log"
till next monday.
I have been experimenting with CRCs for the past 6 months in
our database for internal logging purposes. Downloaded a lot of
hash libraries, tried different algorithms, and implemented a few
myself. Which algorithm do you want? Have a look at the openssl
libraries (www.openssl.org) for a start - if you don't find what
you want let me know.
Thanks.
As the logging might include large data blocks, especially
now that we can TOAST our data,
TOAST breaks data into a few 2K (or so) tuples to be inserted
separately. But the first btree split after a checkpoint will require
logging of a 2x8K record -:(
I would strongly suggest to use strong hashes like RIPEMD or
MD5 instead of CRC-32 and the like. Sure, it takes more time
to calculate and more space on the hard disk, but then: a database
without data integrity (and means of _proving_ integrity) is
pretty worthless.
Other opinions? Also, we shouldn't forget licence issues.
Vadim
Recently I have downloaded a pre-beta PostgreSQL, and I found
insert and update speed is slower than 7.0.3.
Even if I turn off the sync flag, it is still slower than 7.0. Why?
How can I make it faster?
Try to compare 7.0.3 & 7.1beta in multi-user environment.
Vadim
That's why an end marker must follow all valid records.
...
That requires an extra out-of-sequence write.
Yes, and it also increases the probability of corrupting data already
committed to the log.
(I'd also like to see CRCs on all the table blocks as well; is there
a place to put them?)
Do we need it? The "physical log" feature suggested by Andreas will
protect us from non-atomic data block writes.
Vadim
On Wed, Dec 06, 2000 at 06:53:37PM -0600, Bruce Guenter wrote:
On Wed, Dec 06, 2000 at 11:08:00AM -0800, Nathan Myers wrote:
On Wed, Dec 06, 2000 at 11:49:10AM -0600, Bruce Guenter wrote:
I don't know how pgsql does it, but the only safe way I know of
is to include an "end" marker after each record.
An "end" marker is not sufficient, unless all writes are done in
one-sector units with an fsync between, and the drive buffering
is turned off.
That's why an end marker must follow all valid records. When you write
records, you don't touch the marker, and add an end marker to the end of
the records you've written. After writing and syncing the records, you
rewrite the end marker to indicate that the data following it is valid,
and sync again. There is no state in that sequence in which partially-
written data could be confused as real data, assuming either your drives
aren't doing write-back caching or you have a UPS, and fsync doesn't
return until the drives return success.
That requires an extra out-of-sequence write.
Any other way I've seen discussed (here and elsewhere) either
- Assume that a CRC is a guarantee.
We are already assuming a CRC is a guarantee.
The drive computes a CRC for each sector, and if the CRC is OK the
drive is happy. CRC errors within the drive are quite frequent, and
the drive re-reads when a bad CRC comes up.
The kind of data failures that a CRC is guaranteed to catch (N-bit
errors) are almost precisely those that a mis-read on a hardware sector
would cause.
They catch a single mis-read, but not necessarily the quite likely
double mis-read.
... A CRC would be a good addition to
help ensure the data wasn't broken by flakey drive firmware, but
doesn't guarantee consistency.
No, a CRC would be a good addition to compensate for sector write
reordering, which is done both by the OS and by the drive, even for
"atomic" writes.
But it doesn't guarantee consistency, even in that case. There is a
possibility (however small) that the random data that was located in
the sectors before the write will match the CRC.
Generally, there are no guarantees, only reasonable expectations. A
64-bit CRC would give sufficient confidence without the out-of-sequence
write, and also detect corruption from any source including power outage.
(I'd also like to see CRCs on all the table blocks as well; is there
a place to put them?)
Nathan Myers
ncm@zembu.com
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
This may be implemented very fast (if someone points me where
I can find CRC func).Lifted from the PNG spec (RFC 2083):
Thanks! What about Copyrights/licence?
Should fit fine under our regular BSD license. CRC as such is long
since in the public domain...
regards, tom lane
(I'd also like to see CRCs on all the table blocks as well; is there
a place to put them?)
Do we need it? The "physical log" feature suggested by Andreas will
protect us from non-atomic data block writes.
CRCs are necessary because of glitches, hardware failures, operating system
bugs, viruses, etc. - a lot of factors which can alter data stored on the
hard disk independent of postgresql. I learned this lesson the hard way when
I wrote a database application for a hospital, where data integrity is
vital.
Logging CRCs with each record gave us proof that data had been corrupted by
"external" factors (we never found out what it was). It was only a few bytes
in a database with several 100k of records, but still intolerable. Medicine
is heading in a direction where decisions will be backed up by computerized
algorithms which in turn depend on exact data. A one-bit glitch in a
terabyte database can make the difference between life and death. These
glitches will happen, no doubt. Doesn't matter - as long as you have some
means of proving your data integrity and some mechanism of alerting you
when shit has happened.
At present I am coordinating another medical project; we have chosen
PostgreSQL as our backend, and the main problem we have is creating
efficient CRC triggers (I wish postgres would support generic triggers
that are valid system-wide or at least valid for all tables inheriting the
same table) for our own homegrown integrity logging.
Horst
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
I would strongly suggest to use strong hashes like RIPEMD or
MD5 instead of CRC-32 and the like.
Other opinions? Also, we shouldn't forget licence issues.
I agree with whoever commented that crypto hashes are silly for this
application. A 64-bit CRC *might* be enough stronger than a 32-bit
CRC to be worth the extra calculation, but frankly I doubt that too.
Remember that we are already sitting atop hardware that's really pretty
reliable, despite the carping that's been going on in this thread. All
that we have to do is detect the infrequent case where a block of data
didn't get written due to system failure. It's wildly pessimistic to
think that we might get called on to do so as much as once a day (if
you are trying to run a reliable database, and are suffering power
failures once a day, and haven't bought a UPS, you're a lost cause).
A 32-bit CRC will fail to detect such an error with a probability of
about 1 in 2^32. So, a 32-bit CRC will have an MTBF of 2^32 days, or
11 million years, on the wildly pessimistic side --- real installations
probably 100 times better. That's plenty for me, and improving the odds
to 2^64 or 2^128 is not worth any slowdown IMHO.
regards, tom lane
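(For reference, the arithmetic behind that figure, assuming one potential
failure event per day and a miss probability of $2^{-32}$ per event:)

$$\text{MTBF} \approx 2^{32}\ \text{days} \approx 4.29\times 10^{9}\ \text{days}
\approx \frac{4.29\times 10^{9}}{365}\ \text{years} \approx 1.2\times 10^{7}\ \text{years}$$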
P.S.: I would volunteer to integrate CRC routines into postgres if somebody
points me in the right direction in the source code.
Horst
On Thu, Dec 07, 2000 at 12:22:12PM -0800, Mikheev, Vadim wrote:
That's why an end marker must follow all valid records.
That requires an extra out-of-sequence write.
Yes, and it also increases the probability of corrupting data already
committed to the log.
Are you referring to the case where the drive loses power in mid-write?
That is solved by either arranging for the markers to always be placed
at the start of a block, or by plugging in a UPS.
--
Bruce Guenter <bruceg@em.ca> http://em.ca/~bruceg/
On Thu, Dec 07, 2000 at 12:25:41PM -0800, Nathan Myers wrote:
That requires an extra out-of-sequence write.
Ayup!
Generally, there are no guarantees, only reasonable expectations.
I would differ, but that's irrelevant.
A 64-bit CRC would give sufficient confidence...
This is part of what I was getting at, in a roundabout way. If you use
a CRC, hash, or any other kind of non-trivial check code, you have a
certain level of confidence in the data, but not a guarantee. If you
decide, based on your expert opinions, that a 32 or 64 bit CRC or hash
gives you an adequate level of confidence in the event of a crash, then
I'll be satisfied, but don't call it a guarantee.
Them's small nits we're picking at, though.
--
Bruce Guenter <bruceg@em.ca> http://em.ca/~bruceg/
On Thu, Dec 07, 2000 at 12:22:12PM -0800, Mikheev, Vadim wrote:
That's why an end marker must follow all valid records.
...
That requires an extra out-of-sequence write.
Yes, and it also increases the probability of corrupting data already
committed to the log.
(I'd also like to see CRCs on all the table blocks as well; is there
a place to put them?)
Do we need it? The "physical log" feature suggested by Andreas will
protect us from non-atomic data block writes.
There are myriad sources of corruption, including RAM bit rot and
software bugs. The earlier and more reliably it's caught, the better.
The goal is to be able to say that a power outage won't invisibly
corrupt your database.
Here are sources for a 64-bit CRC computation, under BSD license:
http://gcc.gnu.org/ml/gcc/1999-11n/msg00592.html
Nathan Myers
ncm@zembu.com
On Thu, Dec 07, 2000 at 04:35:00PM -0500, Tom Lane wrote:
Remember that we are already sitting atop hardware that's really
pretty reliable, despite the carping that's been going on in this
thread. All that we have to do is detect the infrequent case where a
block of data didn't get written due to system failure. It's wildly
pessimistic to think that we might get called on to do so as much as
once a day (if you are trying to run a reliable database, and are
suffering power failures once a day, and haven't bought a UPS, you're
a lost cause). A 32-bit CRC will fail to detect such an error with a
probability of about 1 in 2^32. So, a 32-bit CRC will have an MTBF of
2^32 days, or 11 million years, on the wildly pessimistic side ---
real installations probably 100 times better. That's plenty for me,
and improving the odds to 2^64 or 2^128 is not worth any slowdown
IMHO.
1. Computing a CRC-64 takes only about twice as long as a CRC-32, for
2^32 times the confidence. That's pretty cheap confidence.
2. I disagree with the way the above statistics were computed. That eleven
million-year figure gets whittled down pretty quickly when you
factor in all the sources of corruption, even without crashes.
(Power failures are only one of many sources of corruption.) They
grow with the size and activity of the database. Databases are
getting very large and busy indeed.
3. Many users clearly hope to be able to pull the plug on their hardware
and get back up confidently. While we can't promise they won't have
to go to their backups, we should at least be equipped to promise,
with confidence, that they will know whether they need to.
4. For a way to mark the "current final" log entry, you want a lot more
confidence, because you read a lot more of them, and reading beyond
the end may cause you to corrupt a currently-valid database, which
seems a lot worse than just using a corrupted database.
Still, I agree that a 32-bit CRC is better than none at all.
Nathan Myers
ncm@zembu.com
ncm@zembu.com (Nathan Myers) writes:
2. I disagree with the way the above statistics were computed. That eleven
million-year figure gets whittled down pretty quickly when you
factor in all the sources of corruption, even without crashes.
(Power failures are only one of many sources of corruption.) They
grow with the size and activity of the database. Databases are
getting very large and busy indeed.
Sure, but the argument still holds. If the net MTBF of your underlying
system is less than a day, it's too unreliable to run a database that
you want to trust. Doesn't matter what the contributing failure
mechanisms are. In practice, I'd demand an MTBF of a lot more than a
day before I'd accept a hardware system as satisfactory...
3. Many users clearly hope to be able to pull the plug on their hardware
and get back up confidently. While we can't promise they won't have
to go to their backups, we should at least be equipped to promise,
with confidence, that they will know whether they need to.
And the difference in odds between 2^32 and 2^64 matters here? I made
a numerical case that it doesn't, and you haven't refuted it. By your
logic, we might as well say that we should be using a 128-bit CRC, or
256-bit, or heck, a few kilobytes. It only takes a little longer to go
up each step, right, so where should you stop? I say MTBF measured in
megayears ought to be plenty. Show me the numerical argument that 64
bits is the right place on the curve.
4. For a way to mark the "current final" log entry, you want a lot more
confidence, because you read a lot more of them,
You only need to make the distinction during a restart, so I don't
think that argument is correct.
regards, tom lane
I believe that there are many good points to having CRC facilities "built
in", and I failed to detect any arguments against it. In my domain (the
medical domain) we simply can't use data without "proof" of integrity
("proof" as in highest possible level of confidence within reasonable
effort)
Therefore, I propose defining new data types like "CRC32", "CRC64",
"RIPEMD", whatever (rather than pluggable arbitrary CRCs).
Similar to the SERIAL data type, the CRC data type would automatically
generate a trigger function that calculates a CRC across a tuple
(omitting the CRC property of course, and maybe the OID as well) before each
update and stores it in itself.
Is there anything wrong with this idea?
Can somebody help me by pointing me in the right direction to implement
it? (The person who implemented the SERIAL data type maybe?)
Regards,
Horst
Therefore, I propose defining new data types like "CRC32", "CRC64",
"RIPEMD", whatever (rather than pluggable arbitrary CRCs).
I suspect that you are really looking at the problem from the wrong end.
CRC checking should not need to be done by the database user, with a fancy
type. The postgres server itself should guarantee data integrity - you
shouldn't have to worry about it in userland.
This is, in fact, what the recent discussion on this list has been
proposing...
Chris
I suspect that you are really looking at the problem from the wrong end.
CRC checking should not need to be done by the database user, with a fancy
type. The postgres server itself should guarantee data integrity - you
shouldn't have to worry about it in userland.
I agree in principle. However, performance sometimes is more important than
integrity. Think of a data logger of uncritical data, or an online forum.
There are plenty of occasions where you don't have to worry about a single
bit on or off, but have a lot to worry about performance-wise. Look at all
those people using M$ Access or MySQL who don't give a damn about data
integrity. As opposed to them, there will always be other "typical" database
applications where 100% integrity is paramount. Then it is nice to have a
choice of CRCs, where the database designer can choose according to his/her
specific balance of performance and integrity needs. This is why I would
prefer the "datatype" solution.
This is, in fact, what the recent discussion on this list has been
proposing...
AFAIK the thread for "built in" CRCs referred only to CRCs in the
transaction log. This here is a different thing. CRCs in the transaction log
are crucial to prove integrity of the log; CRCs as a datatype are necessary
to prove integrity of database entries at row level. Always remember that a
postgres database on the hard disk can be manipulated accidentally /
maliciously without postgres even running. These are the cases where you
need row level CRCs.
Horst
"Mikheev, Vadim" wrote:
Recently I have downloaded a pre-beta PostgreSQL, and I found
insert and update speed is slower than 7.0.3.
Even if I turn off the sync flag, it is still slower than 7.0. Why?
How much slower do you see it to be?
How can I make it faster?
Try to compare 7.0.3 & 7.1beta in multi-user environment.
As I understand it you claim it to be faster in a multi-user environment?
Could you give some brief technical background on why this is so,
and why it must make single-user slower?
---------------
Hannu
"Horst Herb" <hherb@malleenet.net.au> writes:
AFAIK the thread for "built in" CRCs referred only to CRCs in the
transaction log. This here is a different thing. CRCs in the transaction log
are crucial to prove integrity of the log; CRCs as a datatype are necessary
to prove integrity of database entries at row level.
I think a row-level CRC is rather pointless. Perhaps it'd be a good
idea to have a disk-page-level CRC, though. That would check the rows
on the page *and* allow catching errors in the page and tuple overhead
structures, which row-level CRCs would not cover.
I suspect TOAST breaks your notion of computing a CRC at trigger time
anyway --- some of the fields may be toasted already, some not.
If you're sufficiently paranoid that you insist you need a row-level
CRC, it seems to me that you'd want to generate it and later check it
in your application, not in the database. That's the only way you get
end-to-end coverage. Surely you don't trust your TCP connection to the
server, either?
regards, tom lane
I think a row-level CRC is rather pointless. Perhaps it'd be a good
idea to have a disk-page-level CRC, though. That would check the rows
on the page *and* allow catching errors in the page and tuple overhead
structures, which row-level CRCs would not cover.
Row level is necessary to be able to check integrity at the application level.
I suspect TOAST breaks your notion of computing a CRC at trigger time
anyway --- some of the fields may be toasted already, some not.
The workaround is a logging table where you store the CRCs as well. Later on,
an "integrity daemon" can compare whether they match or not.
If you're sufficiently paranoid that you insist you need a row-level
CRC, it seems to me that you'd want to generate it and later check it
in your application, not in the database. That's the only way you get
Oh, sure, that is the way we do it now. And no, nothing to do with paranoia.
We were previously badly burnt by the assumption that a decent SQL server is a
"guarantee" for data integrity. Shit simply happens.
end-to-end coverage. Surely you don't trust your TCP connection to the
server, either?
TCP _IS_ heavily checksummed. But yes, we _do_ calculate checksums at the
client, recalculate at the server, and compare after the transaction is
completed. As we have only a few writes between heavy read access, the
performance penalty doing this (for our purposes) is minimal.
Horst
On Thu, Dec 07, 2000 at 04:01:23PM -0800, Nathan Myers wrote:
1. Computing a CRC-64 takes only about twice as long as a CRC-32, for
2^32 times the confidence. That's pretty cheap confidence.
Incidentally, I benchmarked the previously mentioned 64-bit fingerprint,
the standard 32-bit CRC, MD5 and SHA, and the fastest algorithm on my
Celeron and on a PIII was MD5. The 64-bit fingerprint was only a hair
slower, the CRC was (quite surprisingly) about 40% slower, and the
implementation of SHA that I had available was a real dog. Taking an
arbitrary 32 bits of a MD5 would likely be less collision prone than
using a 32-bit CRC, and it appears faster as well.
--
Bruce Guenter <bruceg@em.ca> http://em.ca/~bruceg/
Try to compare 7.0.3 & 7.1beta in multi-user environment.
As I understand it you claim it to be faster in a multi-user
environment?
Could you give some brief technical background on why this is so,
and why it must make single-user slower?
Because commit in 7.1 does fsync, with or without -F
(we can discuss and change this), but in a multi-user env
a number of commits can be made with a single fsync.
Seems I've described this before?
Vadim
"Horst Herb" <hherb@malleenet.net.au> writes:
Surely you don't trust your TCP connection to the
server, either?
TCP _IS_ heavily checksummed.
Yes, and so are the disk drives that you are asserting you don't trust.
My point is that in both cases, there are lots and lots of failure
mechanisms that won't be caught by the transport or storage CRC.
The same applies to anything other than an end-to-end check.
regards, tom lane
Date: Fri, 8 Dec 2000 12:19:39 -0600
From: Bruce Guenter <bruceg@em.ca>
Incidentally, I benchmarked the previously mentioned 64-bit fingerprint,
the standard 32-bit CRC, MD5 and SHA, and the fastest algorithm on my
Celeron and on a PIII was MD5. The 64-bit fingerprint was only a hair
slower, the CRC was (quite surprisingly) about 40% slower, and the
implementation of SHA that I had available was a real dog. Taking an
arbitrary 32 bits of a MD5 would likely be less collision prone than
using a 32-bit CRC, and it appears faster as well.
I just want to confirm that you used something like the fast 32-bit
CRC algorithm, appended. The one posted earlier was accurate but
slow.
Ian
/*
* Copyright (C) 1986 Gary S. Brown. You may use this program, or
* code or tables extracted from it, as desired without restriction.
*/
/* Modified slightly by Ian Lance Taylor, ian@airs.com, for use with
Taylor UUCP. */
#include "uucp.h"
#include "prot.h"
/* First, the polynomial itself and its table of feedback terms. The */
/* polynomial is */
/* X^32+X^26+X^23+X^22+X^16+X^12+X^11+X^10+X^8+X^7+X^5+X^4+X^2+X^1+X^0 */
/* Note that we take it "backwards" and put the highest-order term in */
/* the lowest-order bit. The X^32 term is "implied"; the LSB is the */
/* X^31 term, etc. The X^0 term (usually shown as "+1") results in */
/* the MSB being 1. */
/* Note that the usual hardware shift register implementation, which */
/* is what we're using (we're merely optimizing it by doing eight-bit */
/* chunks at a time) shifts bits into the lowest-order term. In our */
/* implementation, that means shifting towards the right. Why do we */
/* do it this way? Because the calculated CRC must be transmitted in */
/* order from highest-order term to lowest-order term. UARTs transmit */
/* characters in order from LSB to MSB. By storing the CRC this way, */
/* we hand it to the UART in the order low-byte to high-byte; the UART */
/* sends each low-bit to hight-bit; and the result is transmission bit */
/* by bit from highest- to lowest-order term without requiring any bit */
/* shuffling on our part. Reception works similarly. */
/* The feedback terms table consists of 256, 32-bit entries. Notes: */
/* */
/* The table can be generated at runtime if desired; code to do so */
/* is shown later. It might not be obvious, but the feedback */
/* terms simply represent the results of eight shift/xor opera- */
/* tions for all combinations of data and CRC register values. */
/* [this code is no longer present--ian] */
/* */
/* The values must be right-shifted by eight bits by the "updcrc" */
/* logic; the shift must be unsigned (bring in zeroes). On some */
/* hardware you could probably optimize the shift in assembler by */
/* using byte-swap instructions. */
static const unsigned long aicrc32tab[] = { /* CRC polynomial 0xedb88320 */
0x00000000L, 0x77073096L, 0xee0e612cL, 0x990951baL, 0x076dc419L, 0x706af48fL, 0xe963a535L, 0x9e6495a3L,
0x0edb8832L, 0x79dcb8a4L, 0xe0d5e91eL, 0x97d2d988L, 0x09b64c2bL, 0x7eb17cbdL, 0xe7b82d07L, 0x90bf1d91L,
0x1db71064L, 0x6ab020f2L, 0xf3b97148L, 0x84be41deL, 0x1adad47dL, 0x6ddde4ebL, 0xf4d4b551L, 0x83d385c7L,
0x136c9856L, 0x646ba8c0L, 0xfd62f97aL, 0x8a65c9ecL, 0x14015c4fL, 0x63066cd9L, 0xfa0f3d63L, 0x8d080df5L,
0x3b6e20c8L, 0x4c69105eL, 0xd56041e4L, 0xa2677172L, 0x3c03e4d1L, 0x4b04d447L, 0xd20d85fdL, 0xa50ab56bL,
0x35b5a8faL, 0x42b2986cL, 0xdbbbc9d6L, 0xacbcf940L, 0x32d86ce3L, 0x45df5c75L, 0xdcd60dcfL, 0xabd13d59L,
0x26d930acL, 0x51de003aL, 0xc8d75180L, 0xbfd06116L, 0x21b4f4b5L, 0x56b3c423L, 0xcfba9599L, 0xb8bda50fL,
0x2802b89eL, 0x5f058808L, 0xc60cd9b2L, 0xb10be924L, 0x2f6f7c87L, 0x58684c11L, 0xc1611dabL, 0xb6662d3dL,
0x76dc4190L, 0x01db7106L, 0x98d220bcL, 0xefd5102aL, 0x71b18589L, 0x06b6b51fL, 0x9fbfe4a5L, 0xe8b8d433L,
0x7807c9a2L, 0x0f00f934L, 0x9609a88eL, 0xe10e9818L, 0x7f6a0dbbL, 0x086d3d2dL, 0x91646c97L, 0xe6635c01L,
0x6b6b51f4L, 0x1c6c6162L, 0x856530d8L, 0xf262004eL, 0x6c0695edL, 0x1b01a57bL, 0x8208f4c1L, 0xf50fc457L,
0x65b0d9c6L, 0x12b7e950L, 0x8bbeb8eaL, 0xfcb9887cL, 0x62dd1ddfL, 0x15da2d49L, 0x8cd37cf3L, 0xfbd44c65L,
0x4db26158L, 0x3ab551ceL, 0xa3bc0074L, 0xd4bb30e2L, 0x4adfa541L, 0x3dd895d7L, 0xa4d1c46dL, 0xd3d6f4fbL,
0x4369e96aL, 0x346ed9fcL, 0xad678846L, 0xda60b8d0L, 0x44042d73L, 0x33031de5L, 0xaa0a4c5fL, 0xdd0d7cc9L,
0x5005713cL, 0x270241aaL, 0xbe0b1010L, 0xc90c2086L, 0x5768b525L, 0x206f85b3L, 0xb966d409L, 0xce61e49fL,
0x5edef90eL, 0x29d9c998L, 0xb0d09822L, 0xc7d7a8b4L, 0x59b33d17L, 0x2eb40d81L, 0xb7bd5c3bL, 0xc0ba6cadL,
0xedb88320L, 0x9abfb3b6L, 0x03b6e20cL, 0x74b1d29aL, 0xead54739L, 0x9dd277afL, 0x04db2615L, 0x73dc1683L,
0xe3630b12L, 0x94643b84L, 0x0d6d6a3eL, 0x7a6a5aa8L, 0xe40ecf0bL, 0x9309ff9dL, 0x0a00ae27L, 0x7d079eb1L,
0xf00f9344L, 0x8708a3d2L, 0x1e01f268L, 0x6906c2feL, 0xf762575dL, 0x806567cbL, 0x196c3671L, 0x6e6b06e7L,
0xfed41b76L, 0x89d32be0L, 0x10da7a5aL, 0x67dd4accL, 0xf9b9df6fL, 0x8ebeeff9L, 0x17b7be43L, 0x60b08ed5L,
0xd6d6a3e8L, 0xa1d1937eL, 0x38d8c2c4L, 0x4fdff252L, 0xd1bb67f1L, 0xa6bc5767L, 0x3fb506ddL, 0x48b2364bL,
0xd80d2bdaL, 0xaf0a1b4cL, 0x36034af6L, 0x41047a60L, 0xdf60efc3L, 0xa867df55L, 0x316e8eefL, 0x4669be79L,
0xcb61b38cL, 0xbc66831aL, 0x256fd2a0L, 0x5268e236L, 0xcc0c7795L, 0xbb0b4703L, 0x220216b9L, 0x5505262fL,
0xc5ba3bbeL, 0xb2bd0b28L, 0x2bb45a92L, 0x5cb36a04L, 0xc2d7ffa7L, 0xb5d0cf31L, 0x2cd99e8bL, 0x5bdeae1dL,
0x9b64c2b0L, 0xec63f226L, 0x756aa39cL, 0x026d930aL, 0x9c0906a9L, 0xeb0e363fL, 0x72076785L, 0x05005713L,
0x95bf4a82L, 0xe2b87a14L, 0x7bb12baeL, 0x0cb61b38L, 0x92d28e9bL, 0xe5d5be0dL, 0x7cdcefb7L, 0x0bdbdf21L,
0x86d3d2d4L, 0xf1d4e242L, 0x68ddb3f8L, 0x1fda836eL, 0x81be16cdL, 0xf6b9265bL, 0x6fb077e1L, 0x18b74777L,
0x88085ae6L, 0xff0f6a70L, 0x66063bcaL, 0x11010b5cL, 0x8f659effL, 0xf862ae69L, 0x616bffd3L, 0x166ccf45L,
0xa00ae278L, 0xd70dd2eeL, 0x4e048354L, 0x3903b3c2L, 0xa7672661L, 0xd06016f7L, 0x4969474dL, 0x3e6e77dbL,
0xaed16a4aL, 0xd9d65adcL, 0x40df0b66L, 0x37d83bf0L, 0xa9bcae53L, 0xdebb9ec5L, 0x47b2cf7fL, 0x30b5ffe9L,
0xbdbdf21cL, 0xcabac28aL, 0x53b39330L, 0x24b4a3a6L, 0xbad03605L, 0xcdd70693L, 0x54de5729L, 0x23d967bfL,
0xb3667a2eL, 0xc4614ab8L, 0x5d681b02L, 0x2a6f2b94L, 0xb40bbe37L, 0xc30c8ea1L, 0x5a05df1bL, 0x2d02ef8dL
};
/*
* IUPDC32 macro derived from article Copyright (C) 1986 Stephen Satchell.
* NOTE: First argument must be in range 0 to 255.
* Second argument is referenced twice.
*
* Programmers may incorporate any or all code into their programs,
* giving proper credit within the source. Publication of the
* source routines is permitted so long as proper credit is given
* to Stephen Satchell, Satchell Evaluations and Chuck Forsberg,
* Omen Technology.
*/
#define IUPDC32(b, ick) \
(aicrc32tab[((int) (ick) ^ (b)) & 0xff] ^ (((ick) >> 8) & 0x00ffffffL))
unsigned long
icrc (z, c, ick)
     const char *z;
     size_t c;
     unsigned long ick;
{
  while (c > 4)
    {
      ick = IUPDC32 (*z++, ick);
      ick = IUPDC32 (*z++, ick);
      ick = IUPDC32 (*z++, ick);
      ick = IUPDC32 (*z++, ick);
      c -= 4;
    }
  while (c-- != 0)
    ick = IUPDC32 (*z++, ick);

  return ick;
}
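(A usage sketch only, not from the Taylor UUCP sources. For this reflected
polynomial the common convention is to start from all ones and complement
the result at the end; whether Taylor UUCP applies the final complement I
haven't checked, so adjust to taste. The uucp.h/prot.h includes above would
need to be stubbed out to build this standalone.)

#include <stdio.h>
#include <string.h>

unsigned long icrc (const char *z, size_t c, unsigned long ick);

int
main (void)
{
  const char *msg = "pretend this is a WAL record";
  unsigned long c;

  /* start from all ones, complement at the end (ZIP-style convention) */
  c = icrc (msg, strlen (msg), 0xffffffffL) ^ 0xffffffffL;
  printf ("crc32 = %08lx\n", c);
  return 0;
}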
Bruce Guenter <bruceg@em.ca> writes:
... Taking an
arbitrary 32 bits of a MD5 would likely be less collision prone than
using a 32-bit CRC, and it appears faster as well.
... but that would be an algorithm that you know NOTHING about the
properties of. What is your basis for asserting it's better than CRC?
CRC is pretty well studied and its error-detection behavior is known
(and good). MD5 has been studied less thoroughly AFAIK, and in any
case what's known about its behavior is that the *entire* MD5 output
provides a good signature for a datastream. If you pick some ad-hoc
method like taking a randomly chosen subset of MD5's output bits,
you really don't know anything at all about what the error-detection
properties of the method are.
I am reminded of Knuth's famous advice about random number generators:
"Random numbers should not be generated with a method chosen at random.
Some theory should be used." Error-detection codes, like random-number
generators, have decades of theory behind them. Seat-of-the-pants
tinkering, even if it starts with a known-good method, is not likely to
produce an improvement.
regards, tom lane
On Fri, Dec 08, 2000 at 10:36:39AM -0800, Ian Lance Taylor wrote:
Incidentally, I benchmarked the previously mentioned 64-bit fingerprint,
the standard 32-bit CRC, MD5 and SHA, and the fastest algorithm on my
Celeron and on a PIII was MD5. The 64-bit fingerprint was only a hair
slower, the CRC was (quite surprisingly) about 40% slower, and the
implementation of SHA that I had available was a real dog. Taking an
arbitrary 32 bits of a MD5 would likely be less collision prone than
using a 32-bit CRC, and it appears faster as well.
I just want to confirm that you used something like the fast 32-bit
CRC algorithm, appended. The one posted earlier was accurate but
slow.
Yes. I just rebuilt the framework using this exact code, and it
performed identically to the previous CRC code (which didn't have an
unrolled inner loop). These were compiled with -O6 with egcs 1.1.2.
--
Bruce Guenter <bruceg@em.ca> http://em.ca/~bruceg/
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
Because commit in 7.1 does fsync, with or without -F
(we can discuss and change this), but in a multi-user env
a number of commits can be made with a single fsync.
I was planning to ask why you disabled the -F switch. Seems to me that
people who trusted their OS+hardware before would still want to do so
in 7.1, and so it still makes sense to be able to suppress the fsync
calls.
regards, tom lane
On Fri, Dec 08, 2000 at 12:19:39PM -0600, Bruce Guenter wrote:
On Thu, Dec 07, 2000 at 04:01:23PM -0800, Nathan Myers wrote:
1. Computing a CRC-64 takes only about twice as long as a CRC-32, for
2^32 times the confidence. That's pretty cheap confidence.
Incidentally, I benchmarked the previously mentioned 64-bit fingerprint,
the standard 32-bit CRC, MD5 and SHA, and the fastest algorithm on my
Celeron and on a PIII was MD5. The 64-bit fingerprint was only a hair
slower, the CRC was (quite surprisingly) about 40% slower, and the
implementation of SHA that I had available was a real dog. Taking an
arbitrary 32 bits of a MD5 would likely be less collision prone than
using a 32-bit CRC, and it appears faster as well.
This is very interesting. MD4 is faster than MD5. (MD5, described as
"MD4 with suspenders on", does some extra stuff to protect against more-
obscure attacks, of no interest to us.) Which 64-bit CRC code did you
use, Mark Mitchell's? Are you really saying MD5 was faster than CRC-32?
I don't know of any reason to think that 32 bits of an MD5 would be
better distributed than a CRC-32, or that having computed the 64 bits
there would be any point in throwing away half.
Current evidence suggests that MD4 would be a good choice for a hash
algorithm.
Nathan Myers
ncm@zembu.com
Try to compare 7.0.3 & 7.1beta in multi-user environment.
As I understand it you claim it to be faster in a multi-user
environment?
Could you give some brief technical background on why this is so,
and why it must make single-user slower?
Because commit in 7.1 does fsync, with or without -F
(we can discuss and change this), but in a multi-user env
a number of commits can be made with a single fsync.
Seems I've described this before?
Oops, I forgot to answer the question "why in single-user env 7.1 is
slower than 7.0.3?". I assumed that 7.1 was compared with 7.0.3
*with -F*, which probably is not correct, I don't know.
Well, the next test shows that 7.1 is faster in single-user env
than 7.0.3 *without -F*:
table (i int, t text); 1000 INSERTs (in separate transactions),
sizeof(t) 1 .. 256:
7.0.3: 42 sec -> 24 tps
7.1 : 24 sec -> 42 tps
Vadim
Because commit in 7.1 does fsync, with or without -F
(we can discuss and change this), but in a multi-user env
a number of commits can be made with a single fsync.
I was planning to ask why you disabled the -F switch. Seems
to me that people who trusted their OS+hardware before would
still want to do so in 7.1, and so it still makes sense to be
able to suppress the fsync calls.
I just didn't care about -F functionality, sorry.
I agreed that we should resurrect it.
Vadim
On Fri, Dec 08, 2000 at 01:58:12PM -0500, Tom Lane wrote:
Bruce Guenter <bruceg@em.ca> writes:
... Taking an
arbitrary 32 bits of a MD5 would likely be less collision prone than
using a 32-bit CRC, and it appears faster as well.
... but that would be an algorithm that you know NOTHING about the
properties of. What is your basis for asserting it's better than CRC?
MD5 is a cryptographic hash, which means (AFAIK) that ideally it is
impossible to produce a collision using any other method than brute
force attempts. In other words, any stream of input to the hash that is
longer than the hash length (16 bytes for MD5) is equally likely to
produce a given hash code.
CRC is pretty well studied and its error-detection behavior is known
(and good). MD5 has been studied less thoroughly AFAIK, and in any
case what's known about its behavior is that the *entire* MD5 output
provides a good signature for a datastream. If you pick some ad-hoc
method like taking a randomly chosen subset of MD5's output bits,
you really don't know anything at all about what the error-detection
properties of the method are.
Actually, in my reading regarding the properties of MD5, I read an
article that stated that if a smaller number of bits was desired, one
could either (and here's where my memory fails me) just select the
middle N bits from the resulting hash, or fold the hash using XOR until
the desired number of bits was reached. I'll see if I can find a
reference...
RFC2289 (http://www.ietf.org/rfc/rfc2289.txt) includes an algorithm for
folding MD5 digests down to 64 bits by XORing the top half with the
bottom half. See appendix A.
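(A sketch of that fold, assuming a 16-byte MD5 digest obtained from any MD5
implementation; the function name is mine, not RFC 2289's.)

/* XOR the top half of the digest onto the bottom half, leaving 64 bits. */
static void
fold_md5_to_64(const unsigned char digest[16], unsigned char out[8])
{
    int i;

    for (i = 0; i < 8; i++)
        out[i] = digest[i] ^ digest[i + 8];
}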
--
Bruce Guenter <bruceg@em.ca> http://em.ca/~bruceg/
Bruce Guenter wrote:
CRCs are designed to catch N-bit errors (ie N bits in a row with their
values flipped). N is (IIRC) the number of bits in the CRC minus one.
So, a 32-bit CRC can catch all 31-bit errors. That's the only guarantee
a CRC gives. Everything else has a 1 in 2^32-1 chance of producing the
same CRC as the original data. That's pretty good odds, but not a
guarantee.
Nothing is a guarantee. Everywhere you have a non-null probability of
failure. Memories of any kind don't give you a *guarantee* that the
data you read is exactly the data you wrote. CPUs and transmission lines
are subject to errors too.
You may only be guaranteed that the overall failure probability of your
system is under a specified level. When the level is low enough you
usually consider the absence of errors guaranteed.
With CRC32 you considerably reduce that probability, and given how
rarely the CRC would need to reveal an error, I would consider it enough.
Bye!
--
Daniele
-------------------------------------------------------------------------------
Daniele Orlandi - Utility Line Italia - http://www.orlandi.com
Via Mezzera 29/A - 20030 - Seveso (MI) - Italy
-------------------------------------------------------------------------------
I just didn't care about -F functionality, sorry.
I agreed that we should resurrect it.
OK. Do you want to work on that, or shall I?
In the near future I'll be busy doing CRC + "physical log"
things...
Vadim
Bruce Guenter <bruceg@em.ca> writes:
MD5 is a cryptographic hash, which means (AFAIK) that ideally it is
impossible to produce a collision using any other method than brute
force attempts.
True but irrelevant. What we need to worry about is the probability
that a random error will be detected, not the computational effort that
a malicious attacker would need in order to insert an undetectable
error.
MD5 is designed for a purpose that really doesn't have much to do with
error detection, when you think about it. It says "you will have a hard
time computing a different string that produces the same hash as some
prespecified string". This is not the same as promising
better-than-random odds against a damaged copy of some string having the
same hash as the original. CRC, on the other hand, is specifically
designed for error detection, and for localized errors (such as a
corrupted byte or two) it does a provably better-than-random job.
For nonlocalized errors you don't get a guarantee, but you do get
same-as-random odds of detection (ie, 1 in 2^N for an N-bit CRC).
I really doubt that MD5 can beat a CRC with the same number of output
bits for the purpose of error detection; given the lack of guarantee
about short burst errors, I doubt it's even as good. (Wild-pointer
stomps on disk buffers are an example of the sort of thing that may
look like a burst error.)
Now, if you are worried about crypto-capable gremlins living in your
file system, maybe what you want is MD5. But I'm inclined to think that
CRC is more appropriate for the job at hand.
regards, tom lane
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
I just didn't care about -F functionality, sorry.
I agreed that we should resurrect it.
OK. Do you want to work on that, or shall I?
regards, tom lane
On Fri, Dec 08, 2000 at 11:10:19AM -0800, Nathan Myers wrote:
This is very interesting. MD4 is faster than MD5. (MD5, described as
"MD4 with suspenders on", does some extra stuff to protect against more-
obscure attacks, of no interest to us.) Which 64-bit CRC code did you
use, Mark Mitchell's?
Yes.
Are you really saying MD5 was faster than CRC-32?
Yes. I expect it's because the operations used in MD5 are easily
parallelized, and operate on blocks of 64-bytes at a time, while the CRC
is mostly non-parallelizable, uses a table lookup, and operates on
single bytes.
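For reference, the kind of byte-at-a-time, table-driven CRC update
being described looks roughly like this (a sketch, not the actual
benchmark code; crc_table[] stands for the usual 256-entry table
derived from the CRC-32 polynomial):

#include <stddef.h>
#include <stdint.h>

extern const uint32_t crc_table[256];   /* generated from the CRC-32 polynomial */

uint32_t
crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    /* each step depends on the previous crc value: load, XOR, mask,
       table lookup, shift, XOR -- hence the serial behavior */
    while (len-- > 0)
        crc = crc_table[(crc ^ *buf++) & 0xFF] ^ (crc >> 8);
    return crc;
}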
--
Bruce Guenter <bruceg@em.ca> http://em.ca/~bruceg/
On Fri, Dec 08, 2000 at 11:10:19AM -0800, I wrote:
Current evidence suggests that MD4 would be a good choice for a hash
algorithm.
Thinking about it, I suspect that any CRC implementation that can't outrun
MD5 by a wide margin is seriously sub-optimal. Can you post any more
details about how the tests were run? I'd like to try it.
Nathan Myers
ncm@zembu.com
ncm@zembu.com (Nathan Myers) writes:
Thinking about it, I suspect that any CRC implementation that can't outrun
MD5 by a wide margin is seriously sub-optimal.
I was finding that hard to believe, too, at least for CRC-32 (CRC-64
would take more code, so I'm not so sure about it).
Is that 64-bit code you pointed us to before actually a CRC, or
something else? It doesn't call itself a CRC, and I was having a hard
time extracting anything definite (like the polynomial) from all the
bit-pushing underbrush :-(
regards, tom lane
On Fri, Dec 08, 2000 at 03:38:09PM -0500, Tom Lane wrote:
Bruce Guenter <bruceg@em.ca> writes:
MD5 is a cryptographic hash, which means (AFAIK) that ideally it is
impossible to produce a collision using any other method than brute
force attempts.
True but irrelevant. What we need to worry about is the probability
that a random error will be detected,
Which I indicated immediately after the sentence you quoted. The
probability that a random error will be detected is the same as the
probability of a collision in the hash given two different inputs. The
brute force note means that the probability of a collision is as good as
random.
MD5 is designed for a purpose that really doesn't have much to do with
error detection, when you think about it. It says "you will have a hard
time computing a different string that produces the same hash as some
prespecified string". This is not the same as promising
better-than-random odds against a damaged copy of some string having the
same hash as the original.
It does provide as-good-as-random odds against a damaged copy of some
string having the same hash as the original -- nobody has been able to
exhibit any collisions in MD5 (see http://cr.yp.to/papers/hash127.ps,
page 18 for notes on this).
CRC, on the other hand, is specifically
designed for error detection, and for localized errors (such as a
corrupted byte or two) it does a provably better-than-random job.
For nonlocalized errors you don't get a guarantee, but you do get
same-as-random odds of detection (ie, 1 in 2^N for an N-bit CRC).
For the log, the CRC's primary function (as far as I understand it)
would be to guard against inconsistent transactions being treated as
consistent data. Such inconsistent transactions would be partially
written, resulting in errors much larger than a small burst.
For guarding the actual record data, I agree with you 100% -- what we're
likely to see is a few localized bytes with flipped bits due to hardware
failure of one kind or another. However, if the data is really
critical, an ECC may be more appropriate, but that would make the data
significantly larger (9/8 for the algorithms I've seen).
I really doubt that MD5 can beat a CRC with the same number of output
bits for the purpose of error detection;
Agreed. However, MD5 provides four times as many bits as the standard
32-bit CRC.
(I think I initially suggested you could take an arbitrary 32 bits out
of MD5 to provide a check code "as good as CRC-32". I now take that
back. Due to the burst error nature of CRCs, nothing else could be as
good as it, unless the alternate algorithm also made some guarantees,
which MD5 definitely doesn't.)
(Wild-pointer
stomps on disk buffers are an example of the sort of thing that may
look like a burst error.)
Actually, wild-pointer incidents involving disk buffers at the kernel
level, from my experience, are characterized by content from one file
appearing in another, which is distinctly different than a burst error,
and more like what would be seen if a log record were partially written.
--
Bruce Guenter <bruceg@em.ca> http://em.ca/~bruceg/
Bruce Guenter <bruceg@em.ca> writes:
Are you really saying MD5 was faster than CRC-32?
Yes. I expect it's because the operations used in MD5 are easily
parallelized, and operate on blocks of 64-bytes at a time, while the CRC
is mostly non-parallelizable, uses a table lookup, and operates on
single bytes.
What MD5 implementation did you use? The one I have handy (the original
RSA reference version) sure looks like it's more computation per byte
than a CRC.
regards, tom lane
On Fri, Dec 08, 2000 at 04:21:21PM -0500, Tom Lane wrote:
ncm@zembu.com (Nathan Myers) writes:
Thinking about it, I suspect that any CRC implementation that can't outrun
MD5 by a wide margin is seriously sub-optimal.
I was finding that hard to believe, too, at least for CRC-32 (CRC-64
would take more code, so I'm not so sure about it).
Would you like to see the simple benchmarking setup I used? The amount
of code involved (once all the hashes are factored in) is fairly large,
so I'm somewhat hesitant to just send it to the mailing list.
Is that 64-bit code you pointed us to before actually a CRC, or
something else? It doesn't call itself a CRC, and I was having a hard
time extracting anything definite (like the polynomial) from all the
bit-pushing underbrush :-(
It isn't a CRC. It's a fingerprint. As you've mentioned, it doesn't
have the guarantees against burst errors that a CRC would have, but it
does have as good as random collision avoidance over any random data
corruption. At least, that's what the author claims. My math isn't
nearly good enough to verify such claims.
--
Bruce Guenter <bruceg@em.ca> http://em.ca/~bruceg/
On Fri, Dec 08, 2000 at 04:30:58PM -0500, Tom Lane wrote:
Bruce Guenter <bruceg@em.ca> writes:
Are you really saying MD5 was faster than CRC-32?
Yes. I expect it's because the operations used in MD5 are easily
parallelized, and operate on blocks of 64-bytes at a time, while the CRC
is mostly non-parallelizable, uses a table lookup, and operates on
single bytes.
What MD5 implementation did you use?
I used the GPL'ed implementation written by Ulrich Drepper in 1995. The
code from OpenSSL looks identical in terms of the operations performed.
The one I have handy (the original
RSA reference version) sure looks like it's more computation per byte
than a CRC.
The algorithm itself does use more computation per byte. However, the
algorithm works on blocks of 64 bytes at a time. As well, the
operations should be easily pipelined. On the other hand, the CRC code
is largely serial, and highly dependent on a table lookup operation.
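By contrast, one MD5 "basic operation" (round-1 form, per RFC 1321) is
all register arithmetic with no table lookups, roughly this shape
(sketch only; a, b, c, d, x, t are 32-bit unsigned values):

#define ROTL32(v, n)  (((v) << (n)) | ((v) >> (32 - (n))))
#define F(x, y, z)    (((x) & (y)) | (~(x) & (z)))
/* one round-1 step: a = b + ((a + F(b,c,d) + x + t) <<< s) */
#define FF(a, b, c, d, x, s, t) \
    ((a) = (b) + ROTL32((a) + F((b), (c), (d)) + (x) + (t), (s)))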
--
Bruce Guenter <bruceg@em.ca> http://em.ca/~bruceg/
Bruce Guenter <bruceg@em.ca> writes:
Would you like to see the simple benchmarking setup I used? The amount
of code involved (once all the hashes are factored in) is fairly large,
so I'm somewhat hesitant to just send it to the mailing list.
I agree, don't send it to the whole list. But I'd like a copy.
regards, tom lane
Bruce Guenter <bruceg@em.ca> writes:
I agree, don't send it to the whole list. But I'd like a copy.
Here you go.
As near as I could tell, the test as you have it (one CRC computation per
fread) is purely I/O bound. I changed the main loop to this:
/* hash_t, init() and update() are supplied by the benchmark harness
   (the CRC or MD5 implementation under test) */
#include <stdio.h>

int main()
{
    static char buf[8192];
    size_t rd;
    hash_t hash;

    while ((rd = fread(buf, 1, sizeof buf, stdin)) > 0)
    {
        int i;

        /* hash each block 1000 times so the test is CPU-bound, not I/O-bound */
        for (i = 0; i < 1000; i++)
        {
            init(&hash);
            update(&hash, buf, rd);
        }
    }
    return 0;
}
so as to get a reasonable amount of computation per fread. On an
otherwise idle HP 9000 C180 machine, I get the following numbers on a
1MB input file:
time benchcrc <random32
real 35.3
user 35.0
sys 0.0
time benchmd5 <random32
real 37.6
user 37.3
sys 0.0
This is a lot closer than I'd have expected, but it sure ain't
"MD5 40% faster" as you reported. I wonder why the difference
in results between your platform and mine?
BTW, I used gcc 2.95.2 to compile, -O6, no other switches.
regards, tom lane
A couple further observations while playing with this benchmark ---
1. This MD5 implementation is not too robust. On my machine it dumps
core if given a non-word-aligned data buffer. We could probably work
around that, but it bespeaks a little *too* much hand optimization...
2. It's a bad idea to ignore the startup/termination costs of the
algorithms. Of course startup/termination is trivial for CRC, but
it's not so trivial for MD5. I changed the code so that the md5
update() routine also calls md5_finish_ctx(), so that each inner
loop represents a complete MD5 calculation for a message of the
size of the main routine's fread buffer. I then experimented with
different buffer sizes. At a buffer size of 1K:
time benchcrc <random32
real 35.4
user 35.1
sys 0.0
time benchmd5 <random32
real 41.4
user 41.1
sys 0.0
At a buffer size of 100 bytes:
time benchcrc <random32
real 36.3
user 36.0
sys 0.0
time benchmd5 <random32
real 1:09.7
user 1:09.2
sys 0.0
(The total amount of data processed is 1000 MB in either case, but
it's divided into more messages in the second case.)
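(For concreteness, each "message" now pays for all three of these calls
from the Drepper md5.c interface the benchmark links against; buf and
len are just placeholders for one fread buffer, and md5_finish_ctx()
has to pad and run at least one more 64-byte compression on top of the
per-block work in md5_process_bytes():)

#include <stddef.h>
#include "md5.h"        /* Drepper's GPL'ed md5.c/md5.h, as used above */

void
checksum_one_message(const char *buf, size_t len, unsigned char digest[16])
{
    struct md5_ctx ctx;

    md5_init_ctx(&ctx);                 /* startup: set initial chaining state */
    md5_process_bytes(buf, len, &ctx);  /* one compression per 64-byte block */
    md5_finish_ctx(&ctx, digest);       /* termination: pad, run final block(s) */
}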
I'm not sure exactly what Vadim has in mind for computing CRCs on the
WAL log. If he's thinking of a CRC for each log message, the MD5 stuff
would be at a definite disadvantage. For disk-page checksums (8K or
more) this isn't too much of an issue, however.
regards, tom lane
On Fri, Dec 08, 2000 at 09:28:38PM -0500, Tom Lane wrote:
Bruce Guenter <bruceg@em.ca> writes:
I agree, don't send it to the whole list. But I'd like a copy.
Here you go.
As near as I could tell, the test as you have it (one CRC computation per
fread) is purely I/O bound.
Nope. They got 99-100% CPU time with the original version.
I changed the main loop to this:
[...hash each block repeatedly...]
Good idea. Might have been even better to just read the block once and
hash it even more times.
On an
otherwise idle HP 9000 C180 machine, I get the following numbers on a
1MB input file:time benchcrc <random32
real 35.3 > user 35.0 > sys 0.0time benchmd5 <random32
real 37.6 > user 37.3 > sys 0.0This is a lot closer than I'd have expected, but it sure ain't
"MD5 40% faster" as you reported. I wonder why the difference
in results between your platform and mine?
The difference is likely because PA-RISC (like most other RISC
architectures) lack a "roll" opcode that is very prevalent in the MD5
algorithm. Intel CPUs have it. With a new version modified to repeat
the inner loop 100,000 times, I got the following:
time benchcrc <random
21.35user 0.01system 0:21.39elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (79major+11minor)pagefaults 0swaps
time benchmd5 <random
12.79user 0.01system 0:12.79elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (80major+11minor)pagefaults 0swaps
time benchcrc <random
21.32user 0.06system 0:21.52elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (79major+11minor)pagefaults 0swaps
time benchmd5 <random
12.79user 0.01system 0:12.80elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (80major+11minor)pagefaults 0swaps
--
Bruce Guenter <bruceg@em.ca> http://em.ca/~bruceg/
Bruce Guenter <bruceg@em.ca> writes:
This is a lot closer than I'd have expected, but it sure ain't
"MD5 40% faster" as you reported. I wonder why the difference
in results between your platform and mine?
The difference is likely because PA-RISC (like most other RISC
architectures) lack a "roll" opcode that is very prevalent in the MD5
algorithm.
A good theory, but unfortunately not a correct theory. PA-RISC can do a
circular shift in one cycle using the "shift right double" instruction,
with the same register specified as both high and low halves of the
64-bit operand. And gcc does know about that.
After some groveling through assembly code, it seems that the CRC-32
implementation is about as tight as it could get: two loads, two XORs,
and two EXTRU's per byte (one used to implement the right shift, the
other to implement masking with 0xFF). And the wall clock timing does
indeed come out to just about six cycles per byte. The MD5 code also
looks pretty tight. Each basic OP requires either two or three logical
operations (and/or/xor/not) depending on which round you're looking at,
plus four additions and a circular shift. PA-RISC needs two cycles to
load an arbitrary 32-bit constant, but other than that I see no wasted
cycles here:
ldil L'-1444681467,%r20
xor %r3,%r14,%r19
ldo R'-1444681467(%r20),%r20
and %r1,%r19,%r19
addl %r15,%r20,%r20
xor %r14,%r19,%r19
addl %r19,%r26,%r19
addl %r20,%r19,%r15
shd %r15,%r15,27,%r15
addl %r15,%r3,%r15
Note gcc has been smart enough to assign all the correct_words[] array
elements to registers, else we'd lose another cycle to a load operation
--- fortunately PA-RISC has lots of registers.
There are 64 of these basic OPs needed in each round, and each round
processes 64 input bytes, so basically you can figure one OP per byte.
Ignoring loop overhead and so forth, it's nine or ten cycles per byte
for MD5 versus six for CRC.
I'm at a loss to see how a Pentium would arrive at a better result for
MD5 than for CRC. For one thing, it's going to be at a disadvantage
because it hasn't got enough registers. I'd be interested to see the
assembly code...
regards, tom lane
On Fri, Dec 08, 2000 at 10:17:00PM -0500, Tom Lane wrote:
A couple further observations while playing with this benchmark ---
1. This MD5 implementation is not too robust. On my machine it dumps
core if given a non-word-aligned data buffer. We could probably work
around that, but it bespeaks a little *too* much hand optimization...
The operations in the MD5 core are based on word-sized chunks.
Obviously, the implementation only does word-sized loads and stores for
that data, and you got a bus error.
2. It's a bad idea to ignore the startup/termination costs of the
algorithms.
Yes. I had included the startup costs in my benchmark, but not the
termination costs, which are large for MD5 as you point out.
Of course startup/termination is trivial for CRC, but
it's not so trivial for MD5. I changed the code so that the md5
update() routine also calls md5_finish_ctx(), so that each inner
loop represents a complete MD5 calculation for a message of the
size of the main routine's fread buffer. I then experimented with
different buffer sizes. At a buffer size of 1K:
On my Celeron, at 1K blocks MD5 is still significantly faster than CRC,
but is slightly slower at 100 byte blocks. For comparison, I added
RIPEMD-160, but it's far slower than any of them (twice as long as CRC).
--
Bruce Guenter <bruceg@em.ca> http://em.ca/~bruceg/
On Sat, Dec 09, 2000 at 06:46:23PM -0500, Tom Lane wrote:
Bruce Guenter <bruceg@em.ca> writes:
The difference is likely because PA-RISC (like most other RISC
architectures) lack a "roll" opcode that is very prevalent in the MD5
algorithm.
A good theory, but unfortunately not a correct theory. PA-RISC can do a
circular shift in one cycle using the "shift right double" instruction,
with the same register specified as both high and low halves of the
64-bit operand. And gcc does know about that.
Interesting. I was under the impression that virtually no RISC CPU had
a rotate instruction. Do any others?
After some groveling through assembly code, it seems that the CRC-32
implementation is about as tight as it could get: two loads, two XORs,
and two EXTRU's per byte (one used to implement the right shift, the
other to implement masking with 0xFF).
Same with the x86 core:
movb %dl,%al
xorb (%ecx),%al
andl $255,%eax
shrl $8,%edx
incl %ecx
xorl (%esi,%eax,4),%edx
And the wall clock timing does
indeed come out to just about six cycles per byte.
On my Celeron, the timing for those six opcodes is almost whopping 13
cycles per byte. Obviously there's some major performance hit to do the
memory instructions, because there's no more than 4 cycles worth of
dependent instructions in that snippet.
BTW, for reference, P3 timings are almost identical to those of the
Celeron, so it's not causing problems outside the built-in caches common
to the two chips.
The MD5 code also
looks pretty tight. Each basic OP requires either two or three logical
operations (and/or/xor/not) depending on which round you're looking at,
plus four additions and a circular shift. PA-RISC needs two cycles to
load an arbitrary 32-bit constant, but other than that I see no wasted
cycles here:
ldil L'-1444681467,%r20
xor %r3,%r14,%r19
ldo R'-1444681467(%r20),%r20
and %r1,%r19,%r19
addl %r15,%r20,%r20
xor %r14,%r19,%r19
addl %r19,%r26,%r19
addl %r20,%r19,%r15
shd %r15,%r15,27,%r15
addl %r15,%r3,%r15
Here's the x86 assembly code for what appears to be the same basic OP:
movl %edx,%eax
xorl %esi,%eax
andl %edi,%eax
xorl %esi,%eax
movl -84(%ebp),%ecx
leal -1444681467(%ecx,%eax),%eax
addl %eax,%ebx
roll $5,%ebx
addl %edx,%ebx
This is a couple fewer instructions, mainly saving on doing any loads to
use the constant value. This takes almost exactly 9 cycles per byte.
There are 64 of these basic OPs needed in each round, and each round
processes 64 input bytes, so basically you can figure one OP per byte.
Ignoring loop overhead and so forth, it's nine or ten cycles per byte
for MD5 versus six for CRC.
On Celeron/P3, CRC scores almost 13 cycles per byte, MD4 is about 6
cycles per byte, and MD5 is about 9 cycles per byte. On Pentium MMX,
CRC is 7.25, MD4 is 7.5 and MD5 is 10.25. So, the newer CPUs actually
do worse on CRC than the older ones do. Weirder and weirder.
I'm at a loss to see how a Pentium would arrive at a better result for
MD5 than for CRC. For one thing, it's going to be at a disadvantage
because it hasn't got enough registers.
I agree. It would appear that the table lookup is causing a major
bubble in the pipelines on the newer Celeron/P2/P3 CPUs.
--
Bruce Guenter <bruceg@em.ca> http://em.ca/~bruceg/
On Sat, Dec 09, 2000 at 06:46:23PM -0500, Tom Lane wrote:
I'm at a loss to see how a Pentium would arrive at a better result for
MD5 than for CRC. For one thing, it's going to be at a disadvantage
because it hasn't got enough registers. I'd be interested to see the
assembly code...
Minutiae aside, it's clear that the MD5 and CRC are "comparable",
regardless of CPU.
For a 32-bit hash, the proven characteristics of CRCs are critical in
some applications. With a good 64-bit hash, the probability of any
collision whether from a burst error or otherwise becomes much lower
than every other systematic source of error -- the details just don't
matter any more. If you miss the confidence that CRCs gave you about
burst errors, consider how easy it would be to construct a collision
if you could just try changing a couple of adjacent bytes -- an
exhaustive search would be easy.
MD4 would be a better choice than MD5, despite that a theoretical attack
on MD4 has been described (albeit never executed). We don't even care
about real attacks, never mind theoretical ones. What matters is that
MD4 is entirely good enough, and faster to compute than MD5.
I find these results very encouraging. BSD-licensed MD4 code is readily
available, e.g. from any of the BSDs themselves.
Nathan Myers
ncm@zembu.com
ncm@zembu.com (Nathan Myers) writes:
Minutiae aside, it's clear that the MD5 and CRC are "comparable",
regardless of CPU.
We've established that the inner loops are pretty comparable. I'm
still concerned about the startup/termination costs of MD5 on short
records. The numbers Bruce and I were trading were mostly for records
long enough to make the startup costs negligible, but they're not
negligible for sub-100-byte records.
For a 32-bit hash, the proven characteristics of CRCs are critical in
some applications. With a good 64-bit hash, the probability of any
collision whether from a burst error or otherwise becomes much lower
than every other systematic source of error -- the details just don't
matter any more.
That's a good point. Of course the critical phrase there is *good*
hash, ie, one without any systematic weaknesses, but as long as we
don't use a "method chosen at random" you're right, it hardly matters.
However, this just begs the question: can't the same be said of a 32-bit
checksum? My argument the other day essentially was that 32 bits is
plenty for what we need to do, and I have not heard a refutation.
One thing we should look at before going with a 64-bit method is the
extra storage space for the larger checksum. We can clearly afford
an extra 32 bits for a checksum on an 8K disk page, but if Vadim is
envisioning checksumming each individual XLOG record then the extra
space is more annoying.
Also, there's the KISS issue. When it takes less than a dozen lines
to do a CRC, versus pages to do MD5, you have to ask yourself what the
extra code space is buying you... also whether you want to get into
licensing issues by borrowing someone else's code. The MD5 code that
Bruce was using is GPL, not BSD, and so couldn't go into the Postgres
core anyway.
MD4 would be a better choice than MD5, despite that a theoretical attack
on MD4 has been described (albeit never executed). We don't even care
about real attacks, never mind theoretical ones. What matters is that
MD4 is entirely good enough, and faster to compute than MD5.
I find these results very encouraging. BSD-licensed MD4 code is readily
available, e.g. from any of the BSDs themselves.
MD4 would be worth looking at, especially if it has less
startup/shutdown overhead than MD5. I think a 64-bit true CRC might
also be worth looking at, just for completeness. But I don't know
where to find code for one.
regards, tom lane
Bruce Guenter <bruceg@em.ca> writes:
A good theory, but unfortunately not a correct theory. PA-RISC can do a
circular shift in one cycle using the "shift right double" instruction,
Interesting. I was under the impression that virtually no RISC CPU had
a rotate instruction. Do any others?
Darn if I know. A RISC purist would probably say that PA-RISC isn't all
that reduced ... for example, the reason it needs six cycles not seven
for the CRC inner loop is that the LOAD instruction has an option to
postincrement the pointer register (like a C "*ptr++").
Same with the x86 core:
movb %dl,%al
xorb (%ecx),%al
andl $255,%eax
shrl $8,%edx
incl %ecx
xorl (%esi,%eax,4),%edx
On my Celeron, the timing for those six opcodes is almost whopping 13
cycles per byte. Obviously there's some major performance hit to do the
memory instructions, because there's no more than 4 cycles worth of
dependent instructions in that snippet.
Yes. It looks like we're looking at pipeline stalls for the memory
reads. I expect PA-RISC would have the same problem if it were not that
the CRC table and data buffer are almost certainly loaded into level-2
cache memory. Curious that you don't get the same result --- what is
the memory cache architecture on your box?
As Nathan remarks nearby, this is just minutiae, but I'm interested
anyway...
regards, tom lane
* Tom Lane <tgl@sss.pgh.pa.us> [001210 12:00] wrote:
[...]
Yes. It looks like we're looking at pipeline stalls for the memory
reads. I expect PA-RISC would have the same problem if it were not that
the CRC table and data buffer are almost certainly loaded into level-2
cache memory. Curious that you don't get the same result --- what is
the memory cache architecture on your box?
As Nathan remarks nearby, this is just minutiae, but I'm interested
anyway...
I would try unrolling the loop some (if possible) and retesting.
--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."
On Sun, Dec 10, 2000 at 02:53:43PM -0500, Tom Lane wrote:
On my Celeron, the timing for those six opcodes is almost whopping 13
cycles per byte. Obviously there's some major performance hit to do the
memory instructions, because there's no more than 4 cycles worth of
dependent instructions in that snippet.
Yes. It looks like we're looking at pipeline stalls for the memory
reads.
In particular, for the single-byte memory read. By loading in 32-bit
words at a time, the cost drops to about 7 cycles per byte. I
imagine on a 64-bit CPU, loading 64-bit words at a time would drop the
cost even further.
/* process the input a 32-bit word at a time instead of byte-at-a-time;
   IUPDC32() is the usual table-lookup CRC update step */
word1 = *(unsigned long *)z;
while (c > 4)
{
    z += 4;
    c -= 4;
    ick = IUPDC32(word1, ick); word1 >>= 8;   /* byte 0 */
    ick = IUPDC32(word1, ick); word1 >>= 8;   /* byte 1 */
    ick = IUPDC32(word1, ick); word1 >>= 8;   /* byte 2 */
    ick = IUPDC32(word1, ick);                /* byte 3 */
    word1 = *(unsigned long *)z;              /* load the next word */
}
I tried loading two words at a time, starting to load the second word
well before it's used, but that didn't actually reduce the time taken.
As Nathan remarks nearby, this is just minutiae, but I'm interested
anyway...
Yup.
--
Bruce Guenter <bruceg@em.ca> http://em.ca/~bruceg/
On Sun, Dec 10, 2000 at 12:24:59PM -0800, Alfred Perlstein wrote:
I would try unrolling the loop some (if possible) and retesting.
The inner loop was already unrolled, but was only processing single
bytes at a time. By loading in 32-bit words at once, it reduced the
cost to only 7 cycles per byte (from 13).
--
Bruce Guenter <bruceg@em.ca> http://em.ca/~bruceg/
PG should include support for SHA1 anyway. MD5 is not being used in
new stuff anymore, I think. I actually have an SHA1 implementation
that links into PG if anyone is interested (especially if it could get
included in a future release).
e
Interesting. I was under the impression that virtually no RISC CPU had
a rotate instruction. Do any others?
(fyi; doesn't really contribute to the thread :/
Most or all do. There are no "pure RISC" chips in production; all have
had some optimized complex operations added for performance and for code
compactness.
- Thomas
On Sun, Dec 10, 2000 at 02:36:38PM -0500, Tom Lane wrote:
MD4 would be a better choice than MD5, despite that a theoretical attack
on MD4 has been described (albeit never executed). We don't even care
about real attacks, never mind theoretical ones. What matters is that
MD4 is entirely good enough, and faster to compute than MD5.
I find these results very encouraging. BSD-licensed MD4 code is readily
available, e.g. from any of the BSDs themselves.
MD4 would be worth looking at, especially if it has less
startup/shutdown overhead than MD5. I think a 64-bit true CRC might
also be worth looking at, just for completeness. But I don't know
where to find code for one.
The startup/shutdown for MD4 is identical to that of MD5; however, the
inner loop is much smaller (a total of 48 operations instead of 64, with
fewer constants). The inner MD4 loop is about 1.5 times the speed of
MD5.
--
Bruce Guenter <bruceg@em.ca> http://em.ca/~bruceg/
One thing we should look at before going with a 64-bit method is the
extra storage space for the larger checksum. We can clearly afford
an extra 32 bits for a checksum on an 8K disk page, but if Vadim is
envisioning checksumming each individual XLOG record then the extra
space is more annoying.
We need a checksum for each record. But there is no problem with a
64-bit CRC: the log record header is 8-byte aligned, so adding the CRC
will add 8 bytes to the header anyway. Is there any CRC64 code?
Vadim
On Mon, Dec 11, 2000 at 10:09:01AM -0800, Mikheev, Vadim wrote:
One thing we should look at before going with a 64-bit method is the
extra storage space for the larger checksum. We can clearly afford
an extra 32 bits for a checksum on an 8K disk page, but if Vadim is
envisioning checksumming each individual XLOG record then the extra
space is more annoying.
We need a checksum for each record. But there is no problem with a
64-bit CRC: the log record header is 8-byte aligned, so adding the CRC
will add 8 bytes to the header anyway. Is there any CRC64 code?
All you need is a good 64-bit polynomial. Unfortunately, I've been
unable to find one that's been analyzed to any amount.
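(Once a polynomial is chosen, the code itself is mechanical. A minimal
bit-at-a-time sketch -- slow and table-free, using the ECMA-182
polynomial purely as an example, since how well any particular 64-bit
polynomial has been analyzed is exactly the open question:)

#include <stddef.h>
#include <stdint.h>

#define CRC64_POLY UINT64_C(0x42F0E1EBA9EA3693)   /* ECMA-182; example only */

uint64_t
crc64(const unsigned char *buf, size_t len)
{
    uint64_t crc = 0;
    size_t   i;
    int      bit;

    for (i = 0; i < len; i++)
    {
        crc ^= (uint64_t) buf[i] << 56;           /* feed next byte in at the top */
        for (bit = 0; bit < 8; bit++)
            crc = (crc & UINT64_C(0x8000000000000000))
                ? (crc << 1) ^ CRC64_POLY
                : (crc << 1);
    }
    return crc;
}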
--
Bruce Guenter <bruceg@em.ca> http://em.ca/~bruceg/
On Thu, Dec 07, 2000 at 07:36:33PM -0500, Tom Lane wrote:
ncm@zembu.com (Nathan Myers) writes:
2. I disagree with way the above statistics were computed. That eleven
million-year figure gets whittled down pretty quickly when you
factor in all the sources of corruption, even without crashes.
(Power failures are only one of many sources of corruption.) They
grow with the size and activity of the database. Databases are
getting very large and busy indeed.
Sure, but the argument still holds. If the net MTBF of your underlying
system is less than a day, it's too unreliable to run a database that
you want to trust. Doesn't matter what the contributing failure
mechanisms are. In practice, I'd demand an MTBF of a lot more than a
day before I'd accept a hardware system as satisfactory...
In many intended uses (such as Landmark's original plan?) it is not just
one box, but hundreds or thousands. With thousands of databases deployed,
the MTBF (including power outages) for commodity hardware is well under a
day, and there's not much you can do about that.
In a large database (e.g. 64GB) you have 8M blocks, and each checksum
covers one block. With a 32-bit checksum, checking any one block gives
you a 2^(-32) chance of missing an error, assuming there is one. But
across 8M (2^23) blocks, the best you can claim overall is about
2^23 * 2^(-32) = 2^(-9).
This is what I meant by "whittling". A factor of ten or a thousand
here, another there, and pretty soon the possibility of undetected
corruption is something that can't reasonably be ruled out.
3. Many users clearly hope to be able to pull the plug on their hardware
and get back up confidently. While we can't promise they won't have
to go to their backups, we should at least be equipped to promise,
with confidence, that they will know whether they need to.
And the difference in odds between 2^32 and 2^64 matters here? I made
a numerical case that it doesn't, and you haven't refuted it. By your
logic, we might as well say that we should be using a 128-bit CRC, or
256-bit, or heck, a few kilobytes. It only takes a little longer to go
up each step, right, so where should you stop? I say MTBF measured in
megayears ought to be plenty. Show me the numerical argument that 64
bits is the right place on the curve.
I agree that this is a reasonable question. However, the magic of
exponential growth makes any dissatisfaction with a 64-bit checksum
far less likely than with a 32-bit checksum.
It would forestall any such problems to arrange a configure-time
flag such as "--with-checksum crc-32" or "--with-checksum md4",
and make it clear where to plug in the checksum of one's choice.
Then, ship 7.2 with just crc-32 and let somebody else produce
patches for alternatives if they want them.
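(As a sketch of what the plumbing could look like -- none of these
names exist anywhere, they are purely hypothetical -- the configure
switch could just select the implementation behind a single macro:)

#include <stdint.h>

#if defined(USE_CHECKSUM_MD4)
#define DATA_CHECKSUM(buf, len)   md4_checksum64((buf), (len))
#elif defined(USE_CHECKSUM_CRC64)
#define DATA_CHECKSUM(buf, len)   crc64_checksum((buf), (len))
#else                                  /* default: CRC-32 */
#define DATA_CHECKSUM(buf, len)   ((uint64_t) crc32_checksum((buf), (len)))
#endif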
BTW, I have been looking for Free 64-bit CRC codes/polynomials and
the closest thing I have found so far was Mark Mitchell's hash,
translated from the Modula-3 system. All the tape drive makers
advertise (but don't publish (AFAIK)) a 64-bit CRC.
A reasonable approach would be to deliver CRC-32 in 7.2, and then
reconsider the default later if anybody contributes good alternatives.
Nathan Myers
ncm@zembu.com
On Sat, Dec 09, 2000 at 12:03:52AM +1100, Horst Herb wrote:
AFAIK the thread for "built in" crcs referred only to CRCs in
the transaction log.
We have been discussing checksums for both the table blocks and for
the transaction log.
Always remember that a postgres database on the hard disk can be
manipulated accidentally / maliciously without postgres even running.
These are the cases where you need row-level CRCs.
"There is no security without physical security." If somebody can
change the row contents, they can also change the row and/or block
checksum to match.
Nathan Myers
ncm@zembu.com
Always remember that a postgres database on the hard disk can be
manipulated accidentally / maliciously without postgres even running.
These are the cases where you need row-level CRCs.
"There is no security without physical security." If somebody can
change the row contents, they can also change the row and/or block
checksum to match.
They may, but in a proper setup they won't be able to access the CRC log
files. That way, you can still detect alterations. I presume that most
alterations would be accidental rather than malicious anyway, and in that
case the CRC is extremely helpful.
Horst