RE: WAL versus Postgres (or: what goes around, comes around)

Started by Mikheev, Vadim | almost 26 years ago | 42 messages | hackers
#1Mikheev, Vadim
vmikheev@SECTORBASE.COM

I've read this paper ~2 years ago. My plans so far were:

1. WAL in 7.1
2. New (overwriting) storage manager in 7.2

Comments?

Vadim,

Perhaps the best solution would be to keep both (or three) storage
managers and specify which one to use at database creation time.

After reading Stonebraker's paper, I think there are situations
where we want the no-overwrite storage manager and others where an
overwriting storage manager may offer better performance.
Wasn't Postgres originally designed to allow different storage
managers?

Overwriting and non-overwriting smgrs are quite different in nature.
Access methods would have to care about what type of smgr is used for
a specific table/index...

Vadim

#2Mikheev, Vadim
vmikheev@SECTORBASE.COM
In reply to: Mikheev, Vadim (#1)
RE: WAL versus Postgres (or: what goes around, comes around)

I've read this paper ~2 years ago. My plans so far were:

1. WAL in 7.1
2. New (overwriting) storage manager in 7.2

Oh, so Vadim has an overwriting storage manager concept for 7.2.
Vadim, how will you keep old rows around for MVCC?

Just as you described: some separate files for old
versions, something like Oracle's rollback segments.
And, for sure, this will be the most complex part of the smgr, and
that's why I think that we can't use their smgr if we're
going to keep MVCC.

As for WAL, WAL itself (as a collection of routines to log changes,
create checkpoints, etc.) is 90% done. Now it has to be integrated
into the system, and the hardest part of this work is the
access-method-specific redo/undo functions. If we're going to use our
access methods, then we'll have to write these functions no matter
which WAL implementation is used.

Vadim
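
To make the point about access-method-specific redo/undo functions concrete, here is a minimal sketch in Python. It is not PostgreSQL's WAL code: the record layout, the REDO/UNDO registries, and the toy "heap" method are all hypothetical names chosen for illustration. It only shows the shape of the work: the log routines are generic, but every access method has to supply its own pair of functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class LogRecord:
    xid: int        # transaction that made the change
    method: str     # access method that owns the record (hypothetical tag)
    payload: dict   # method-specific description of the change


# Hypothetical registries: one redo and one undo function per access method.
REDO: Dict[str, Callable[[dict, dict], None]] = {}
UNDO: Dict[str, Callable[[dict, dict], None]] = {}


def heap_redo(pages: dict, p: dict) -> None:
    """Re-apply a logged heap change."""
    pages[p["slot"]] = p["new"]


def heap_undo(pages: dict, p: dict) -> None:
    """Restore the state before a logged heap change."""
    if p["old"] is None:
        pages.pop(p["slot"], None)
    else:
        pages[p["slot"]] = p["old"]


REDO["heap"] = heap_redo
UNDO["heap"] = heap_undo


def recover(log: List[LogRecord], committed: Set[int]) -> dict:
    """Replay the whole log, then roll back uncommitted transactions."""
    pages: dict = {}
    # Redo pass: repeat every logged change in order.
    for rec in log:
        REDO[rec.method](pages, rec.payload)
    # Undo pass: walk backwards and reverse changes of uncommitted xids.
    for rec in reversed(log):
        if rec.xid not in committed:
            UNDO[rec.method](pages, rec.payload)
    return pages
```

An index access method would register its own pair under a different tag; recover() itself would not change, which is why the per-method functions are needed whatever WAL implementation sits underneath.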

#3Daniel Kalchev
daniel@digsys.bg
In reply to: Mikheev, Vadim (#1)
Re: WAL versus Postgres (or: what goes around, comes around)

"Mikheev, Vadim" said:

Perhaps the best solution would be to keep both (or three) storage
managers and specify which one to use at database creation time.

After reading Stonebraker's paper, I think there are situations
where we want the no-overwrite storage manager and others where an
overwriting storage manager may offer better performance.
Wasn't Postgres originally designed to allow different storage
managers?

Overwriting and non-overwriting smgrs are quite different in nature.
Access methods would have to care about what type of smgr is used for
a specific table/index...

In light of the discussion of whether we can use Berkeley DB (or Sleepycat DB?),
perhaps it is indeed a good idea to start working on the access methods layer,
or perhaps just define a more 'reasonable' SMGR layer at a higher level than the
current Postgres code.

The idea is: once we have this storage manager layer, we could use different
storage managers (or heap managers in current terms) to manage different
tables/databases.

My idea to use different managers at the database level comes from the fact
that we do not have transactions that span databases, and that transactions
are probably the things that will be difficult to implement (in a short time)
for heaps using different storage managers - such as one table no-overwrite,
another table WAL, a third table Berkeley DB, etc.

From Vadim's response I imagine he considers this easier to implement...

On the license issue - it is unlikely that PostgreSQL will rip out its storage
internals to replace everything with Berkeley DB. This may have worked three
or five years ago, but the current storage manager is reasonable (especially
its crash recovery - I have not seen any other DBMS that is even close to
PostgreSQL in terms of 'cost of crash recovery' - but that is a different
topic). But, if we have the storage manager layer, it may be possible to use
Berkeley DB as an additional access method - for databases/applications that
may benefit from it, performance-wise and where the license permits.

Daniel
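
The "choose a storage manager at database creation time" idea discussed above can be sketched as follows. This is a toy illustration in Python, not the Postgres smgr interface: the class names, the MANAGERS registry, and create_database() are all hypothetical. The no-overwrite variant keeps every version in place, while the overwriting variant updates in place and pushes the previous value into a separate rollback area, loosely in the spirit of the rollback-segment idea mentioned earlier in the thread.

```python
from abc import ABC, abstractmethod


class StorageManager(ABC):
    """Hypothetical minimal interface every storage manager implements."""

    @abstractmethod
    def write(self, key, value): ...

    @abstractmethod
    def read(self, key): ...


class NoOverwrite(StorageManager):
    """Never overwrites: each write appends a new version."""

    def __init__(self):
        self.versions = {}

    def write(self, key, value):
        self.versions.setdefault(key, []).append(value)

    def read(self, key):
        return self.versions[key][-1]


class Overwrite(StorageManager):
    """Updates in place and saves the old value in a rollback area."""

    def __init__(self):
        self.current = {}
        self.rollback = []

    def write(self, key, value):
        if key in self.current:
            self.rollback.append((key, self.current[key]))
        self.current[key] = value

    def read(self, key):
        return self.current[key]


# Hypothetical registry consulted once, when the database is created.
MANAGERS = {"no-overwrite": NoOverwrite, "overwrite": Overwrite}


def create_database(smgr: str) -> StorageManager:
    """Return a fresh storage manager of the requested kind."""
    return MANAGERS[smgr]()
```

Because the choice is made per database and no transaction spans databases, no single transaction in this sketch ever has to coordinate two different managers, which is the simplification argued for above.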

#4Michael A. Olson
mao@sleepycat.com
In reply to: Daniel Kalchev (#3)
Berkeley DB license

Yesterday I sent out a message explaining Sleepycat's standard
licensing policy with respect to binary redistribution. That
policy generally imposes GPL-like restrictions on the embedding
application, unless the distributor purchases a separate
license.

We've talked it over at Sleepycat, and we're willing to write a
special agreement for PostgreSQL's use of Berkeley DB. That
agreement would permit redistribution of Berkeley DB with
PostgreSQL at no charge, in binary or source code form, by any
party, with or without modifications to the engine.

In short, we can adopt the PostgreSQL license terms for PostgreSQL's
use of Berkeley DB.

The remaining issues are technical ones.

Rather than replacing just the storage manager, you'd be replacing
the access methods, buffer manager, transaction manager, and some
of the shared memory plumbing with our stuff. I wasn't sufficiently
clear in my earlier message, and talked about "no-overwrite" as if
it were the only component.

Clearly, that's a lot of work. On the other hand, you'd have the
benefit of an extremely well-tested and widely deployed library to
provide those services. Lots of different groups have used the
software, so the abstractions that the API presents are well-thought
out and work well in most cases.

The group is interested in multi-version concurrency control, so that
readers never block on writers. If that's genuinely critical, we'd
be willing to see some work done to add it to Berkeley DB, so that it
can do either conventional 2PL without versioning, or MV. Naturally,
we'd participate in any kind of design discussions you wanted, but
we'd like to see the PostgreSQL implementors build it, since you
understand the feature you want.

Finally, there's the question of whether a tree-based heap store with
an artificial key will be as fast as the heap structure you're using
now. Benchmarking is the only way to know for sure. I don't believe
that this will be a major problem. The internal nodes of any btree
generally wind up in the cache very quickly, and stay there because
they're hot. So you're not doing a lot of disk I/O to get a record
off disk, you're chasing pointers in memory. We don't lose technical
evaluations on performance, as a general thing; I think that you will
be satisfied with the speed.

mike

#5Philip Warner
pjw@rhyme.com.au
In reply to: Michael A. Olson (#4)
Re: Berkeley DB license

At 06:57 16/05/00 -0700, Michael A. Olson wrote:

We've talked it over at Sleepycat, and we're willing to write a
special agreement for PostgreSQL's use of Berkeley DB. That
agreement would permit redistribution of Berkeley DB with
PostgreSQL at no charge, in binary or source code form, by any
party, with or without modifications to the engine.

Just to clarify - if I take PostgreSQL, make a few minor changes to create
a commercial product called Boastgress, your proposed license would allow
the distribution of binaries for the new product without further
interaction, payments, or licensing from Sleepycat?

Similarly, if changes were made to BDB, I would not have to send those
changes to you, nor would I have to make the source available?

Please don't misunderstand me - it seems to me that you are making a very
generous offer, and I want to clarify that I have understood correctly.

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: +61-03-5367 7422 | _________ \
Fax: +61-03-5367 7430 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/

#6Michael A. Olson
mao@sleepycat.com
In reply to: Philip Warner (#5)
Re: Berkeley DB license

At 12:28 AM 5/17/00 +1000, you wrote:

Just to clarify - if I take PostgreSQL, make a few minor changes to create
a commercial product called Boastgress, your proposed license would allow
the distribution of binaries for the new product without further
interaction, payments, or licensing from Sleepycat?

Correct.

Similarly, if changes were made to BDB, I would not have to send those
changes to you, nor would I have to make the source available?

Also correct. However, the license would only permit redistribution of
the Berkeley DB software embedded in the PostgreSQL engine or the
derivative product that the proprietary vendor distributes. The
vendor would not be permitted to extract Berkeley DB from PostgreSQL
and distribute it separately, as part of some other product offering
or as a standalone embedded database engine.

The intent here is to clear the way for use of Berkeley DB in PostgreSQL,
but not to apply PostgreSQL's license to Berkeley DB for other uses.

mike

#7Bruce Momjian
bruce@momjian.us
In reply to: Michael A. Olson (#6)
Re: Berkeley DB license

Also correct. However, the license would only permit redistribution of
the Berkeley DB software embedded in the PostgreSQL engine or the
derivative product that the proprietary vendor distributes. The
vendor would not be permitted to extract Berkeley DB from PostgreSQL
and distribute it separately, as part of some other product offering
or as a standalone embedded database engine.

The intent here is to clear the way for use of Berkeley DB in PostgreSQL,
but not to apply PostgreSQL's license to Berkeley DB for other uses.

Totally agree, and totally reasonable.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#8The Hermit Hacker
scrappy@hub.org
In reply to: Michael A. Olson (#4)
Re: Berkeley DB license

On Tue, 16 May 2000, Michael A. Olson wrote:

Rather than replacing just the storage manager, you'd be replacing
the access methods, buffer manager, transaction manager, and some
of the shared memory plumbing with our stuff.

So, basically, we rip out 3+ years of work on our backend and put an SQL
front-end over top of Berkeley DB?

#9Bruce Momjian
bruce@momjian.us
In reply to: The Hermit Hacker (#8)
Re: Berkeley DB license

On Tue, 16 May 2000, Michael A. Olson wrote:

Rather than replacing just the storage manager, you'd be replacing
the access methods, buffer manager, transaction manager, and some
of the shared memory plumbing with our stuff.

So, basically, we rip out 3+ years of work on our backend and put an SQL
front-end over top of Berkeley DB?

Well, if we look at our main components,
parser/rewrite/optimizer/executor, they stay pretty much the same. It
is the lower level stuff that would change.

Now, no one is suggesting we do this. The issue is exploring what gains
we could make in doing this.

I would hate to throw out our code, but I would also hate to not make
a change because we think our code is better without objectively judging
ours against someone else's.

In the end, we may find that the needs of a database for storage are
different enough that SDB would not be a win, but I think it is worth
exploring to see if that is true.

#10Mikheev, Vadim
vmikheev@SECTORBASE.COM
In reply to: Bruce Momjian (#9)
RE: Berkeley DB license

Rather than replacing just the storage manager, you'd be replacing
the access methods, buffer manager, transaction manager, and some
of the shared memory plumbing with our stuff. I wasn't sufficiently
clear in my earlier message, and talked about "no-overwrite" as if
it were the only component.

Clearly, that's a lot of work. On the other hand, you'd have the
benefit of an extremely well-tested and widely deployed library to
provide those services. Lots of different groups have used the
software, so the abstractions that the API presents are well-thought
out and work well in most cases.

True. But after replacement the system as a whole would not be well-tested.
"A lot of work" means "a lot of bugs, errors, etc.".

The group is interested in multi-version concurrency control, so that
readers never block on writers. If that's genuinely critical, we'd
be willing to see some work done to add it to Berkeley DB, so that it
can do either conventional 2PL without versioning, or MV. Naturally,

This would be the first system with both types of CC -:)

Well, so, before replacing anything we would have to add MVCC to BDB.
I still haven't looked at your sources; I'll do so in a few days...

Vadim

#11Michael A. Olson
mao@sleepycat.com
In reply to: The Hermit Hacker (#8)
Re: Berkeley DB license

At 01:05 PM 5/16/00 -0300, The Hermit Hacker wrote:

So, basically, we rip out 3+ years of work on our backend and put an SQL
front-end over top of Berkeley DB?

I'd put this differently.

Given that you're considering rewriting the low-level storage code
anyway, and given that Berkeley DB offers a number of interesting
services, you should consider using it.

It may make sense for you to leverage the 9+ years of work in
Berkeley DB to save yourself a major reimplementation effort now.

We'd like you guys to make the decision on technical merit, so we
agreed to the license terms you require for PostgreSQL.

mike

#12Brian E Gallew
geek+@cmu.edu
In reply to: Bruce Momjian (#9)
Re: Berkeley DB license

Then <pgman@candle.pha.pa.us> spoke up and said:

In the end, we may find that the needs of a database for storage are
different enough that SDB would not be a win, but I think it is worth
exploring to see if that is true.

Actually, there are other possibilities, too. As a for-instance, it
might be interesting to see what a reiserfs-based storage manager
looks/performs like.

--
=====================================================================
| JAVA must have been developed in the wilds of West Virginia. |
| After all, why else would it support only single inheritance?? |
=====================================================================
| Finger geek@cmu.edu for my public key. |
=====================================================================

#13The Hermit Hacker
scrappy@hub.org
In reply to: Bruce Momjian (#9)
Re: Berkeley DB license

On Tue, 16 May 2000, Bruce Momjian wrote:

On Tue, 16 May 2000, Michael A. Olson wrote:

Rather than replacing just the storage manager, you'd be replacing
the access methods, buffer manager, transaction manager, and some
of the shared memory plumbing with our stuff.

So, basically, we rip out 3+ years of work on our backend and put an SQL
front-end over top of Berkeley DB?

Now, no one is suggesting we do this. The issue is exploring what gains
we could make in doing this.

Definitely ... I'm just reducing it down to simpler terms, that's all :)

#14Bruce Momjian
bruce@momjian.us
In reply to: The Hermit Hacker (#13)
Re: Berkeley DB license

On Tue, 16 May 2000, Bruce Momjian wrote:

On Tue, 16 May 2000, Michael A. Olson wrote:

Rather than replacing just the storage manager, you'd be replacing
the access methods, buffer manager, transaction manager, and some
of the shared memory plumbing with our stuff.

So, basically, we rip out 3+ years of work on our backend and put an SQL
front-end over top of Berkeley DB?

Now, no one is suggesting we do this. The issue is exploring what gains
we could make in doing this.

Definitely ... I'm just reducing it down to simpler terms, that's all :)

I am glad you did. I like the fact we are open to re-evaluate our code
and consider code from outside sources. Many open-source efforts have
problems with code-not-made-here.

#15Bruce Momjian
bruce@momjian.us
In reply to: Mikheev, Vadim (#10)
Re: Berkeley DB license

The group is interested in multi-version concurrency control, so that
readers never block on writers. If that's genuinely critical, we'd
be willing to see some work done to add it to Berkeley DB, so that it
can do either conventional 2PL without versioning, or MV. Naturally,

This would be the first system with both types of CC -:)

Well, so, before replacing anything we would have to add MVCC to BDB.
I still haven't looked at your sources; I'll do so in a few days...

Vadim, I thought you said you were going to be doing a new storage
manager for 7.2, including an over-write storage manager that keeps MVCC
tuples in a separate location. Could SDB work in that environment
easier, without having MVCC integrated into SDB?

#16Mikheev, Vadim
vmikheev@SECTORBASE.COM
In reply to: Bruce Momjian (#15)
RE: Berkeley DB license

Well, so, before replacing anything we would have to add
MVCC to BDB. I still haven't looked at your sources; I'll do so
in a few days...

Vadim, I thought you said you were going to be doing a new storage
manager for 7.2, including an over-write storage manager that
keeps MVCC tuples in a separate location. Could SDB work in that
environment easier, without having MVCC integrated into SDB?

How can we integrate SDB code into PostgreSQL without MVCC support
in SDB if we still want to have MVCC?! Did I miss something?
Or are you asking whether replacement+changes_in_SDB_for_MVCC is easier
than WAL+new_our_smgr? I don't know.

Vadim

#17Bruce Momjian
bruce@momjian.us
In reply to: Mikheev, Vadim (#16)
Re: Berkeley DB license

Well, so, before replacing anything we would have to add
MVCC to BDB. I still haven't looked at your sources; I'll do so
in a few days...

Vadim, I thought you said you were going to be doing a new storage
manager for 7.2, including an over-write storage manager that
keeps MVCC tuples in a separate location. Could SDB work in that
environment easier, without having MVCC integrated into SDB?

How can we integrate SDB code into PostgreSQL without MVCC support
in SDB if we still want to have MVCC?! Did I miss something?
Or are you asking whether replacement+changes_in_SDB_for_MVCC is easier
than WAL+new_our_smgr? I don't know.

You stated that the new storage manager will do over-writing, and that
the MVCC-needed tuples will be kept somewhere else and removed when not
needed.

It is possible to use SDB, and keep the MVCC-needed tuples somewhere
else, also in SDB, so we don't have to add MVCC into the SDB existing
code, we just need to use SDB to implement MVCC.

The issue was that SDB does two-phase locking, and I was asking if MVCC
could be layered on top of SDB, rather than being added into SDB.

#18Bruce Momjian
bruce@momjian.us
In reply to: Mikheev, Vadim (#16)
Re: Berkeley DB license

How can we integrate SDB code into PostgreSQL without MVCC support
in SDB if we still want to have MVCC?! Did I miss something?
Or are you asking whether replacement+changes_in_SDB_for_MVCC is easier
than WAL+new_our_smgr? I don't know.

I guess I was asking if MVCC could be implemented on top of SDB, rather
than changes made to SDB itself.

#19Mikheev, Vadim
vmikheev@SECTORBASE.COM
In reply to: Bruce Momjian (#18)
RE: Berkeley DB license

You stated that the new storage manager will do over-writing, and that
the MVCC-needed tuples will be kept somewhere else and removed when not
needed.

It is possible to use SDB, and keep the MVCC-needed tuples somewhere
else, also in SDB, so we don't have to add MVCC into the SDB existing
code, we just need to use SDB to implement MVCC.

Possible, in theory.

The issue was that SDB does two-phase locking, and I was asking if MVCC

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Because of this, it seems we'll have to change SDB anyway. With MVCC,
per-tuple locking is not needed. Short-term per-buffer _latches_ are used to
prevent concurrent changes in a buffer; no locks are taken via the lock manager.
I'm not sure whether the SDB API allows _any_ access to modified tuples.
I would rather assume that it doesn't.

could be layered on top of SDB, rather than being added into SDB.

As I said: possible, in theory, and also not a good thing to do, in theory.
MVCC and 2PL are quite different approaches to the problem of concurrency
control. So how good is layering one approach over the other? Who knows.

Vadim
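
As a rough illustration of why MVCC readers need no per-tuple locks, here is a minimal sketch in Python. The field names xmin and xmax follow PostgreSQL's tuple header, but everything else (the Version class, treating a snapshot as a plain set of committed transaction ids, the read() helper) is a simplifying assumption, not how either PostgreSQL or Berkeley DB implements it.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Version:
    value: str
    xmin: int                    # transaction that created this version
    xmax: Optional[int] = None   # transaction that replaced or deleted it


def visible(v: Version, snapshot: Set[int]) -> bool:
    """A version is visible if its creator is in the snapshot and its
    remover (if any) is not."""
    created = v.xmin in snapshot
    removed = v.xmax is not None and v.xmax in snapshot
    return created and not removed


def read(chain: List[Version], snapshot: Set[int]) -> Optional[str]:
    """Return the value of the first visible version in the chain,
    without taking any lock."""
    for v in chain:
        if visible(v, snapshot):
            return v.value
    return None
```

A reader whose snapshot predates a writer's commit simply keeps seeing the older version, so it never waits on the writer. Under 2PL the same reader would block on the writer's lock, which is why access to the old, already-modified tuple is the capability in question here.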

#20Hannu Krosing
hannu@tm.ee
In reply to: Bruce Momjian (#14)
Re: Berkeley DB license

Bruce Momjian wrote:

On Tue, 16 May 2000, Bruce Momjian wrote:

On Tue, 16 May 2000, Michael A. Olson wrote:

Rather than replacing just the storage manager, you'd be replacing
the access methods, buffer manager, transaction manager, and some
of the shared memory plumbing with our stuff.

So, basically, we rip out 3+ years of work on our backend and put an SQL
front-end over top of Berkeley DB?

Now, no one is suggesting we do this. The issue is exploring what gains
we could make in doing this.

Definitely ... I'm just reducing it down to simpler terms, that's all :)

I am glad you did. I like the fact we are open to re-evaluate our code
and consider code from outside sources. Many open-source efforts have
problems with code-not-made-here.

I have been planning to add a full-text index (a.k.a. inverted index) to
postgres text and array types for some time already. This is the only
major index type not yet supported by postgres and currently implemented
as an external index in our products. I have had a good excuse (for me)
for postponing that work in the limited size of text datatype (actually
the tuples) but AFAIK it is going away in 7.1 ;)

I have done a full-text index for a major national newspaper that has
worked ok for several years using python and old (v1.86, pre-sleepycat)
BSD DB code - I stayed away from 2.x versions due to SC license terms.
I'm happy to hear that BSD DB and postgreSQL storage schemes are designed
to be compatible.

But I still suspect that taking some existing postgres index (most likely
btree) as base would be less effort than integrating locking/transactions
of PG/BDB.

------------
Hannu

#21Michael A. Olson
mao@sleepycat.com
In reply to: Mikheev, Vadim (#19)
#22Michael A. Olson
mao@sleepycat.com
In reply to: Michael A. Olson (#4)
#23Michael A. Olson
mao@sleepycat.com
In reply to: Michael A. Olson (#22)
#24Michael A. Olson
mao@sleepycat.com
In reply to: Michael A. Olson (#23)
#25Bruce Momjian
bruce@momjian.us
In reply to: Michael A. Olson (#24)
#26Mike Mascari
mascarm@mascari.com
In reply to: Bruce Momjian (#25)
#27Bruce Momjian
bruce@momjian.us
In reply to: Mike Mascari (#26)
#28Philip Warner
pjw@rhyme.com.au
In reply to: Mike Mascari (#26)
#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael A. Olson (#23)
#30Alfred Perlstein
bright@wintelcom.net
In reply to: Michael A. Olson (#23)
#31Hannu Krosing
hannu@tm.ee
In reply to: Tom Lane (#29)
In reply to: Michael A. Olson (#4)
#33Mikheev, Vadim
vmikheev@SECTORBASE.COM
In reply to: Benjamin Adida (#32)
#34Michael A. Olson
mao@sleepycat.com
In reply to: Mikheev, Vadim (#33)
#35Bruce Momjian
bruce@momjian.us
In reply to: Michael A. Olson (#22)
#36Alex Pilosov
alex@pilosoft.com
In reply to: Bruce Momjian (#35)
#37Bruce Momjian
bruce@momjian.us
In reply to: Alex Pilosov (#36)
#38Thomas Lockhart
lockhart@alumni.caltech.edu
In reply to: Alex Pilosov (#36)
#39Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Lockhart (#38)
#40Matthias Urlichs
smurf@noris.net
In reply to: Alex Pilosov (#36)
#41Bruce Momjian
bruce@momjian.us
In reply to: Matthias Urlichs (#40)
#42Michael A. Olson
mao@sleepycat.com
In reply to: Bruce Momjian (#35)