Priorities for 6.6

Started by Tom Lane · over 26 years ago · 106 messages
#1 Tom Lane
tgl@sss.pgh.pa.us

Jan Wieck writes (over in pgsql-sql):

* WE STILL NEED THE GENERAL TUPLE SPLIT CAPABILITY!!! *

I've been thinking about making this post for a while ... with 6.5
almost out the door, I guess now is a good time.

I don't know what people have had in mind for 6.6, but I propose that
there ought to be three primary objectives for our next release:

1. Eliminate arbitrary restrictions on tuple size.

2. Eliminate arbitrary restrictions on query size (textual
length/complexity that is).

3. Cure within-statement memory leaks, so that processing large numbers
of tuples in one statement is reliable.

All of these are fairly major projects, and it might be that we get
little or nothing else done if we take these on. But these are the
problems we've been hearing about over and over and over. I think
fixing these would do more to improve Postgres than almost any other
work we might do.

Comments? Does anyone have a different list of pet peeves? Is there
any chance of getting everyone to subscribe to a master plan like this?

regards, tom lane

#2 Vadim Mikheev
vadim@krs.ru
In reply to: Tom Lane (#1)
Re: [HACKERS] Priorities for 6.6

Tom Lane wrote:

I don't know what people have had in mind for 6.6, but I propose that
there ought to be three primary objectives for our next release:

1. Eliminate arbitrary restrictions on tuple size.

This is not primary for me -:)
Though, it's required by PL/pgSQL and so... I agree that
this problem must be resolved in some way. Related TODO items:

* Allow compression of large fields or a compressed field type
* Allow large text type to use large objects(Peter)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I like it very much, though I don't like that LOs are stored
in separate files. This is known as the "multi-representation" feature
in Illustra.

2. Eliminate arbitrary restrictions on query size (textual
length/complexity that is).

Yes, this is quite an annoying thing.

3. Cure within-statement memory leaks, so that processing large numbers
of tuples in one statement is reliable.

Quite significant!

All of these are fairly major projects, and it might be that we get
little or nothing else done if we take these on. But these are the
problems we've been hearing about over and over and over. I think
fixing these would do more to improve Postgres than almost any other
work we might do.

Comments? Does anyone have a different list of pet peeves? Is there
any chance of getting everyone to subscribe to a master plan like this?

No chance -:))

This is what I would like to see in 6.6:

1. Referential integrity.
2. Dirty reads (required by 1 if we decide to follow the way
proposed by Jan, using rules, though there is another way
I'll talk about later; dirty reads are useful anyway).
3. Savepoints (they are my primary wish-to-implement thing).
4. elog(ERROR) must return error codes, not just messages!
This is very important for non-interactive applications...
in conjunction with 3. -:)

Vadim

#3 The Hermit Hacker
scrappy@hub.org
In reply to: Vadim Mikheev (#2)
Re: [HACKERS] Priorities for 6.6

On Fri, 4 Jun 1999, Vadim Mikheev wrote:

* Allow compression of large fields or a compressed field type

This one looks cool...

All of these are fairly major projects, and it might be that we get
little or nothing else done if we take these on. But these are the
problems we've been hearing about over and over and over. I think
fixing these would do more to improve Postgres than almost any other
work we might do.

Comments? Does anyone have a different list of pet peeves? Is there
any chance of getting everyone to subscribe to a master plan like this?

No chance -:))

have to agree with Vadim here...the point that has *always* been stressed
here is that if something is important to you, fix it. Don't expect
anyone else to fall into some sort of "party line" or schedule, because
then people lose the enjoyment in what they are doing *shrug*

for instance, out of the three things you listed, the only one that I'd
consider an issue is the third, as I've never hit the first two
limitations ...*shrug*

Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org

#4 Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Vadim Mikheev (#2)
Re: [HACKERS] Priorities for 6.6

This is what I would like to see in 6.6:

1. Referential integrity.

Bingo. Item #1. Period. End of story. Everything else pales in
comparison. We just get too many requests for this, though I think it
an insignificant feature myself. Jan, I believe you have some ideas on
this. (Like an elephant, I never forget.)

4. elog(ERROR) must return error codes, not just messages!
This is very important for non-interactive applications...
in conjunction with 3. -:)

Added to TODO.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#5 Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Tom Lane (#1)
Re: [HACKERS] Priorities for 6.6

Jan Wieck writes (over in pgsql-sql):

* WE STILL NEED THE GENERAL TUPLE SPLIT CAPABILITY!!! *

I've been thinking about making this post for a while ... with 6.5
almost out the door, I guess now is a good time.

I don't know what people have had in mind for 6.6, but I propose that
there ought to be three primary objectives for our next release:

1. Eliminate arbitrary restrictions on tuple size.

2. Eliminate arbitrary restrictions on query size (textual
length/complexity that is).

3. Cure within-statement memory leaks, so that processing large numbers
of tuples in one statement is reliable.

I think the other hot item for 6.6 is outer joins.

#6 Vadim Mikheev
vadim@krs.ru
In reply to: Bruce Momjian (#5)
Re: [HACKERS] Priorities for 6.6

Bruce Momjian wrote:

I think the other hot item for 6.6 is outer joins.

I would like to have 48 hours in a day -:)

Vadim

#7 Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Vadim Mikheev (#6)
Re: [HACKERS] Priorities for 6.6

Bruce Momjian wrote:

I think the other hot item for 6.6 is outer joins.

I would like to have 48 hours in a day -:)

Vadim

You and I are off the hook. Jan volunteered for foreign keys, and
Thomas for outer joins. We can relax. :-)

#8 Vadim Mikheev
vadim@krs.ru
In reply to: Bruce Momjian (#7)
Re: [HACKERS] Priorities for 6.6

Bruce Momjian wrote:

Bruce Momjian wrote:

I think the other hot item for 6.6 is outer joins.

I would like to have 48 hours in a day -:)

Vadim

You and I are off the hook. Jan volunteered for foreign keys, and
Thomas for outer joins. We can relax. :-)

I volunteered for savepoints -:))

Vadim

#9 Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Vadim Mikheev (#8)
Re: [HACKERS] Priorities for 6.6

Bruce Momjian wrote:

Bruce Momjian wrote:

I think the other hot item for 6.6 is outer joins.

I would like to have 48 hours in a day -:)

Vadim

You and I are off the hook. Jan volunteered for foreign keys, and
Thomas for outer joins. We can relax. :-)

I volunteered for savepoints -:))

Oh.

Hey, I thought you were going to sleep?

#10 Vadim Mikheev
vadim@krs.ru
In reply to: Bruce Momjian (#9)
Re: [HACKERS] Priorities for 6.6

Bruce Momjian wrote:

I think the other hot item for 6.6 is outer joins.

I would like to have 48 hours in a day -:)

Vadim

You and I are off the hook. Jan volunteered for foreign keys, and
Thomas for outer joins. We can relax. :-)

I volunteered for savepoints -:))

Oh.

Hey, I thought you were going to sleep?

I just try to have at least 25 hours in a day :)

Vadim

#11 Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Vadim Mikheev (#10)
Re: [HACKERS] Priorities for 6.6

I volunteered for savepoints -:))

Oh.

Hey, I thought you were going to sleep?

I just try to have at least 25 hours in a day :)

Just have some pelmeni and go to sleep.

#12 Vadim Mikheev
vadim@krs.ru
In reply to: Bruce Momjian (#11)
Re: [HACKERS] Priorities for 6.6

Bruce Momjian wrote:

I volunteered for savepoints -:))

Oh.

Hey, I thought you were going to sleep?

I just try to have at least 25 hours in a day :)

Just have some pelmeni and go to sleep.

-:)))

Vadim

#13 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Vadim Mikheev (#12)
Re: [HACKERS] Priorities for 6.6

Vadim Mikheev <vadim@krs.ru> writes:

Tom Lane wrote:

1. Eliminate arbitrary restrictions on tuple size.

This is not primary for me -:)

Fair enough; it's not something I need either. But I see complaints
about it constantly on the mailing lists; a lot of people do need it.

* Allow large text type to use large objects(Peter)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I like it very much, though I don't like that LOs are stored
in separate files.

But, but ... if we fixed the tuple-size problem then people could stop
using large objects at all, and instead just put their data into tuples.
I hate to see work going into improving LO support when we really ought
to be phasing out the whole feature --- it's got *so* many conceptual
and practical problems ...

any chance of getting everyone to subscribe to a master plan like this?

No chance -:))

Yeah, I know ;-). But I was hoping to line up enough people so that
these things have some chance of getting done. I doubt that any of
these projects can be implemented by just one or two people; they all
affect too much of the code. (For instance, eliminating query-size
restrictions will require looking at all of the interface libraries,
psql, pg_dump, and probably other apps, even though the fixes in
the backend should be somewhat localized.)

regards, tom lane

#14 Don Baccus
dhogaza@pacifier.com
In reply to: Tom Lane (#13)
Re: [HACKERS] Priorities for 6.6

At 05:39 PM 6/3/99 -0400, Tom Lane wrote:

But, but ... if we fixed the tuple-size problem then people could stop
using large objects at all, and instead just put their data into tuples.
I hate to see work going into improving LO support when we really ought
to be phasing out the whole feature --- it's got *so* many conceptual
and practical problems ...

Making them go away would be a real blessing. Oracle folk
bitch about CLOBS and BLOBS and the like, too. They're a
pain.

- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, and other goodies at
http://donb.photo.net

#15 Vadim Mikheev
vadim@krs.ru
In reply to: Don Baccus (#14)
Re: [HACKERS] Priorities for 6.6

Don Baccus wrote:

At 05:39 PM 6/3/99 -0400, Tom Lane wrote:

* Allow large text type to use large objects(Peter)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I like it very much, though I don't like that LOs are stored
in separate files. This is known as the "multi-representation" feature
in Illustra.

But, but ... if we fixed the tuple-size problem then people could stop
using large objects at all, and instead just put their data into tuples.
I hate to see work going into improving LO support when we really ought
to be phasing out the whole feature --- it's got *so* many conceptual
and practical problems ...

Making them go away would be a real blessing. Oracle folk
bitch about CLOBS and BLOBS and the like, too. They're a
pain.

Note: I was talking about the "multi-representation" feature, not just
about LO/CLOBS/BLOBS support. "Multi-representation" means that the
server stores tuple fields sometimes inside the main relation file,
sometimes outside of it, but this is hidden from the user, so people
"just put their data into tuples". I think that putting big fields
outside of the main relation file is a very good thing.
BTW, this approach also allows what you are proposing - why not
put a not-too-big field (~ 8K or so) in another block of the main file?
BTW, I don't like using LOs as external storage.

Implementation seems easy:

struct varlena
{
    int32   vl_len;
    char    vl_dat[1];
};

1. make vl_len uint32;
2. use vl_len & 0x80000000 as flag that underlying data is
in another place;
3. put oid of external "relation" (where data is stored),
blocknumber and item position (something else?) to vl_dat.
...
...
...
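A minimal C sketch of the flag-bit idea above (illustrative only: `varattrib`, `VARATT_EXTERNAL_FLAG`, and the helper functions are assumed names, not the actual backend definitions):

```c
#include <stdint.h>

/* Sketch of the proposal: reuse the top bit of the length word as an
 * "is stored externally" flag; the remaining 31 bits stay the length. */
#define VARATT_EXTERNAL_FLAG 0x80000000u

typedef struct varattrib
{
    uint32_t vl_len;    /* length word; top bit = data lives elsewhere */
    char     vl_dat[1]; /* inline data, or the external pointer payload */
} varattrib;

static int varatt_is_external(const varattrib *v)
{
    return (v->vl_len & VARATT_EXTERNAL_FLAG) != 0;
}

static uint32_t varatt_size(const varattrib *v)
{
    return v->vl_len & ~VARATT_EXTERNAL_FLAG;
}
```

With this encoding, only the tuple read/write paths need to know about the flag; callers that see an external datum would follow the (oid, block, item) payload in vl_dat.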

Vadim

#16 Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Vadim Mikheev (#15)
Re: [HACKERS] Priorities for 6.6

Implementation seems easy:

struct varlena
{
    int32   vl_len;
    char    vl_dat[1];
};

1. make vl_len uint32;
2. use vl_len & 0x80000000 as flag that underlying data is
in another place;
3. put oid of external "relation" (where data is stored),
blocknumber and item position (something else?) to vl_dat.
...

Yes, it would be very nice to have this.

#17 Don Baccus
dhogaza@pacifier.com
In reply to: Vadim Mikheev (#15)
Re: [HACKERS] Priorities for 6.6

At 10:56 AM 6/4/99 +0800, Vadim Mikheev wrote:

Note: I was talking about the "multi-representation" feature, not just
about LO/CLOBS/BLOBS support. "Multi-representation" means that the
server stores tuple fields sometimes inside the main relation file,
sometimes outside of it, but this is hidden from the user, so people
"just put their data into tuples". I think that putting big fields
outside of the main relation file is a very good thing.

Yes, it is, though "big" is relative (as computers grow). The
key is to hide the details of where things are stored from the
user, so the user doesn't really have to know what is "big"
(today) vs. "small" (tomorrow or today, for that matter). I
don't think it's so much the efficiency hit of having big
items stored outside the main relation file, as the need for
the user to know what's "big" and what's "small", that's the
problem.

I mean, my background is as a compiler writer for high-level
languages...call me a 1970's idealist if you will, but I
really think such things should be hidden from the user.


#18 Don Baccus
dhogaza@pacifier.com
In reply to: Vadim Mikheev (#15)
Re: [HACKERS] Priorities for 6.6

At 10:56 AM 6/4/99 +0800, Vadim Mikheev wrote:

Note: I was talking about the "multi-representation" feature, not just
about LO/CLOBS/BLOBS support. "Multi-representation" means that the
server stores tuple fields sometimes inside the main relation file,
sometimes outside of it, but this is hidden from the user, so people
"just put their data into tuples".

OK, in my first response I didn't pick up on your generalization,
but I did respond with a generalization that implementation
details should be hidden from the user.

Which is what you're saying.

As a compiler writer, this is more or less what I devoted my
life to 20 years ago...of course, reasonable efficiency is
a pre-condition if you're going to hide details from the
user...

I'll back off a bit, though, and say that a lot of DB users
really don't need an enterprise engine like Oracle (i.e.
something that requires a suite of $100K/yr DBAs :)

There's a niche for a solid reliable, rich feature set,
reasonably well-performing db out there, and this niche
is ever-growing with the web.

With $500 web servers sitting on $29.95/mo DSL lines,
as does mine (http://donb.photo.net/tweeterdom), who
wants to pay $6K to Oracle?


#19 Philip Warner
pjw@rhyme.com.au
In reply to: Don Baccus (#18)
Re: [HACKERS] Priorities for 6.6

Dear All,

It seems to me that there are a bunch of related issues that probably need to be tied together (and forgotten about?):

1. A 'nice' user interface for blobs
2. Text fields stored as blobs
3. Naming issues for 'system' tables etc.
4. pg_dump support for blobs and other 'internal' structures.
5. Blob storage in files vs. a 'nicer' storage medium.
6. The tuple-size problem(?)

Points (1) & (2) are really the same thing; if you provide a nice interface to blobs: "select len(blob_field) from ..." and "select blob_field from ...", then any discussion of the messiness associated with blobs will go away. Personally, I would hate to lose the ability to store a blob's data using a series of 'lo_write' calls: one system I work on (not in PG) has blob data as large as 24MB, which makes blob_write functionality essential.

Points (3) & (4) recognize that there are a number of issues floating around that relate to the basic inappropriateness of using SQL to reload the data structures of an existing database. I have only used a few commercial DBs, but the ones I have used uniformly have a 'dump' that produces data files in its own format. There is no question that having pg_dump produce a schema and/or INSERT statements is nice, but a new option needs to be added to allow raw exports, and a new pg_load utility needs to be written. Cross-version compatibility between export formats must also be maintained (obviously).

Point (5) recognizes that storing 'large' data in the same area that a row is stored in will remove any benefits of clustering, so a method of handling blob data needs to be found, irrespective of whether PG still supports blobs as such. I don't know how PG handles large text fields - some commercial systems allow the user to 'map' specific fields to separate data files. The current system (storing blobs in files) is fine except insofar as it *looks* messy, produces *huge* directories, and is slow for many small blobs (file open/read/close per row).

I don't know anything about the 'tuple-size' problem (point 6), but it may also relate to a solution for storing blob-data (or specific columns) in alternate locations.

I hope this is not all static...

Philip Warner.

----------------------------------------------------------------
Philip Warner
Albatross Consulting Pty. Ltd. (A.C.N. 008 659 498)
Tel: +61-03-5367 7422
Fax: +61-03-5367 7430
Http://www.rhyme.com.au
PGP key available upon request, and from pgp5.ai.mit.edu:11371

#20 Hannu Krosing
hannu@trust.ee
In reply to: Tom Lane (#1)
Re: [HACKERS] Priorities for 6.6

Tom Lane wrote:

I don't know what people have had in mind for 6.6, but I propose that
there ought to be three primary objectives for our next release:

1. Eliminate arbitrary restrictions on tuple size.

2. Eliminate arbitrary restrictions on query size (textual
length/complexity that is).

3. Cure within-statement memory leaks, so that processing large numbers
of tuples in one statement is reliable.

I would add a few that I think would be important:

A. Add outer joins

B. Add the possibility to prepare statements and then execute them
with a set of arguments. This already exists in SPI, but for many
C/S apps it would be desirable to have this in the fe/be protocol
as well.

C. Look over the protocol and unify the _binary_ representations of
datatypes on the wire. In fact each type already has two sets of
in/out conversion functions in its definition tuple, one for disk and
another for the net; it's only that until now they are the same for
all types and thus probably used wrong in some parts of the code.

D. After B. and C., add a possibility to insert binary data in a
"(small)binary" field without relying on LOs or expensive
(4x the size) quoting. Allow any characters in said binary field.

E. To make 2. and B., C., D. possible, some more fundamental changes
in the fe/be protocol may be needed. There seems to be some effort
for a new fe/be communications mechanism using CORBA.
But my proposal would be to adopt the X11 protocol, which is quite
light but still very clean, well understood, and can transfer
arbitrary data in an efficient way.
There are even "low bandwidth" variants of it for use over really
slow links. Also some kinds of "out of band" provisions exist,
which are used by window managers.
It should also be trivial to adapt crypto wrappers/proxies (such as
the one in ssh).
The protocol is described in a document available from
http://www.x.org

F. As a lousy alternative to 1., fix the LO storage. Currently _all_
of the LO files are kept in the same directory as the tables and
indexes. This can bog down the whole database quite fast if one has
lots of LOs and a file system that does linear scans on open (like
ext2).
A scheme where LOs are kept in subdirectories based on the hex
representation of their oids would avoid that (so the LO with OID
0x12345678 would be stored in $PG_DATA/DBNAME/LO/12/34/56/78.lo, or
maybe reversed, $PG_DATA/DBNAME/LO/78/56/34/12.lo, to distribute
them more evenly in "buckets").
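The bucketing in item F amounts to splitting the oid into hex bytes. A tiny C sketch (the `lo_path` helper and the `LO/` prefix are hypothetical, just to show the split):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: map an LO's oid to a bucketed relative path,
 * so each directory level holds at most 256 entries instead of one
 * huge flat directory. */
static void lo_path(unsigned long oid, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "LO/%02lx/%02lx/%02lx/%02lx.lo",
             (oid >> 24) & 0xff, (oid >> 16) & 0xff,
             (oid >> 8) & 0xff, oid & 0xff);
}
```

Reversing the byte order, as Hannu suggests, would spread sequentially-assigned oids across top-level buckets instead of clustering them.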

All of these are fairly major projects, and it might be that we get
little or nothing else done if we take these on.

But then, the other things to do _are_ little compared to these ;)

But these are the problems we've been hearing about over and over and
over.

The LO thing (and lack of decent full-text indexing) is what has kept me
using hybrid solutions where I keep the LO data and home-grown full-text
indexes in file system outside of the database.

I think fixing these would do more to improve Postgres than
almost any other work we might do.

Amen!

----------------
Hannu

#21 Peter Galbavy
Peter.Galbavy@knowledge.com
In reply to: Bruce Momjian (#16)
Re: [HACKERS] Priorities for 6.6

On Thu, Jun 03, 1999 at 11:27:14PM -0400, Bruce Momjian wrote:

Implementation seems easy:

struct varlena
{
    int32   vl_len;
    char    vl_dat[1];
};

1. make vl_len uint32;
2. use vl_len & 0x80000000 as flag that underlying data is
in another place;
3. put oid of external "relation" (where data is stored),
blocknumber and item position (something else?) to vl_dat.
...

Yes, it would be very nice to have this.

I hate to be fussy - normally I am just watching, but could we
*please* keep any flag like the above in a separate field? That way,
when the size of an object reaches 2^31 we will not have legacy
problems...

struct varlena
{
    size_t  vl_len;
    int     vl_flags;
    caddr_t vl_dat[1];
};

(Please:)

Regards,
--
Peter Galbavy
Knowledge Matters Ltd
http://www.knowledge.com/

#22 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Galbavy (#21)
Re: [HACKERS] Priorities for 6.6

Vadim Mikheev <vadim@krs.ru> writes:

Note: I was talking about the "multi-representation" feature, not just
about LO/CLOBS/BLOBS support. "Multi-representation" means that the
server stores tuple fields sometimes inside the main relation file,
sometimes outside of it, but this is hidden from the user, so people
"just put their data into tuples". I think that putting big fields
outside of the main relation file is a very good thing.

Ah, I see what you mean. If you think that is easier than splitting
tuples, we could go that way. We'd have a limit of about 500 fields in
a tuple (maybe less if the tuple contains "small" fields that are not
pushed to another place). That's annoying if the goal is to eliminate
limits, but I think it would be unlikely to be a big problem in
practice.

Perhaps a better way is to imagine these "pointers to another place"
to be just part of the tuple structure on disk, without tying them to
individual fields. In other words, the tuple's data is still a string
of fields, but now you can have that data either right there with the
tuple header, or pointed to by a list of "indirect links" that are
stored with the tuple header. (Kinda like direct vs indirect blocks in
Unix filesystem.) You can chop the tuple data into blocks without
regard for field boundaries if you do it that way. I think that might
be better than altering the definition of varlena --- it'd be visible
only to the tuple read and write mechanisms, not to everything in the
executor that deals with varlena fields...
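A rough C sketch of the direct-vs-indirect layout described above (every name here is hypothetical, modeled on the idea rather than on real PostgreSQL structures):

```c
#include <stdint.h>

/* An overflow chunk lives in some other relation, much like an
 * indirect block in a Unix filesystem. */
typedef struct OverflowLink
{
    uint32_t rel_oid;  /* oid of the relation holding the chunk */
    uint32_t block;    /* block number within that relation     */
    uint16_t item;     /* item position within the block        */
} OverflowLink;

/* The tuple's field data is either stored inline right after the
 * header, or as a list of links, chopped into blocks without regard
 * for field boundaries. */
typedef struct TupleData
{
    uint32_t total_len; /* total length of the tuple's field data */
    uint16_t n_links;   /* 0 => data is inline                    */
    union
    {
        char         inline_data[1];
        OverflowLink links[1];
    } u;
} TupleData;

static int tuple_data_is_inline(const TupleData *td)
{
    return td->n_links == 0;
}
```

The point of the design is visibility: only the tuple read/write mechanisms see the links, while the executor keeps seeing an ordinary string of varlena fields.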

regards, tom lane

#23 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#22)
Re: [HACKERS] Priorities for 6.6

Hannu Krosing <hannu@trust.ee> writes:

E. to make 2. and B., C, D. possible, some more fundamental changes in
fe/be-protocol may be needed. There seems to be some effort for a new
fe/be communications mechanism using CORBA.
But my proposal would be to adopt the X11 protocol which is quite
light but still very clean, well understood and which can transfer
arbitrary data in an efficient way.

... but no one uses it for database work. If we're going to go to the
trouble of overhauling the fe/be protocol, I think we should adopt
something fairly standard, and that seems to mean CORBA.

F. As a lousy alternative to 1. fix the LO storage. Currently _all_ of
the LO files are kept in the same directory as the tables and
indexes. this can bog down the whole database quite fast

Yes. I was thinking last night that there's no good reason not to
just stick all the LOs into a single relation --- or actually two
relations, one having a row per LO (which would really just act to tell
you what LOs exist, and perhaps store access-privileges info) and one
that has a row per LO chunk, with columns LONumber, Offset, Data rather
than just Offset and Data as is done now. The existing index on Offset
would be replaced by a multi-index on LONumber and Offset. In this
scheme the LONumbers need not be tied hard-and-fast to OIDs, but could
actually be anything you wanted, which would be much nicer for
dump/reload purposes.

However, I am loath to put *any* work into improving LOs, since I think
the right answer is to get rid of the need for the durn things by
eliminating the size restrictions on regular tuples.

regards, tom lane

#24 Vince Vielhaber
vev@michvhf.com
In reply to: Tom Lane (#23)
Re: [HACKERS] Priorities for 6.6

On 04-Jun-99 Tom Lane wrote:

However, I am loath to put *any* work into improving LOs, since I think
the right answer is to get rid of the need for the durn things by
eliminating the size restrictions on regular tuples.

Is this doable? I just looked at the list of datatypes and didn't see
binary as one of them. Imagining a Real Estate database with pictures
of homes (inside and out), etc. or an employee database with mugshots of
the employees, what datatype would you use to store the pictures (short
of just storing a filename of the pic)?

Vince.
--
==========================================================================
Vince Vielhaber -- KA8CSH email: vev@michvhf.com flame-mail: /dev/null
# include <std/disclaimers.h> TEAM-OS2
Online Campground Directory http://www.camping-usa.com
Online Giftshop Superstore http://www.cloudninegifts.com
==========================================================================

#25 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Vince Vielhaber (#24)
Re: [HACKERS] Priorities for 6.6

Vince Vielhaber <vev@michvhf.com> writes:

On 04-Jun-99 Tom Lane wrote:

However, I am loath to put *any* work into improving LOs, since I think
the right answer is to get rid of the need for the durn things by
eliminating the size restrictions on regular tuples.

Is this doable? I just looked at the list of datatypes and didn't see
binary as one of them.

bytea ... even if we didn't have one, inventing it would be trivial.
(Although I wonder whether pg_dump copes with arbitrary data in fields
properly ... I think there are still some issues about COPY protocol
not being fully 8-bit-clean...)

As someone else pointed out, you'd still want an equivalent of
lo_read/lo_write, but now it would mean fetch or put N bytes at an
offset of M bytes within the value of field X of tuple Y in some
relation. Otherwise field X is pretty much like any other item in the
database. I suppose it'd only make sense to allow random data to be
fetched/stored in a bytea field --- other datatypes would want to
constrain the data to valid values...

regards, tom lane

#26 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Tom Lane (#25)
Re: [HACKERS] Priorities for 6.6

I obtained Oracle for Linux and did some comparisons with PostgreSQL
6.5 current using the test suite I mentioned before that is good for
creating heavy loads.

With default postmaster settings (postmaster -S -i), PostgreSQL was
several times slower than Oracle. However, with -F (postmaster -S -i -o
'-F'), PostgreSQL was much faster than with the default settings. Yes,
this is well-known behavior of PostgreSQL. Without -F, PostgreSQL does
fsync() every time a transaction is committed, and that is the
bottleneck for performance. I observed the disk activity LED almost
always on while running PostgreSQL without -F. However, with -F there
is a chance that we lose committed data if the computer crashes.

On the other hand, the LED was on only every few secs while running
Oracle. I heard that Oracle has a "REDO log file" and a log record is
written there when a transaction is committed. If so, apparently Oracle
does not issue sync() or fsync() every time a transaction gets
committed. I don't know how Oracle guarantees the log is written to
disk without sync() or fsync() at commit time, but it seems something
like this is one of the most important techniques for enhancing the
performance of PostgreSQL.

Does anybody have an idea on this?
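For reference, the per-commit pattern being compared boils down to something like this (illustrative C, not backend code; `commit_record` is a made-up helper):

```c
#include <fcntl.h>
#include <unistd.h>

/* Illustrative only: without -F, every commit pays for its own
 * fsync(); a redo-log design instead batches several commits into
 * one sequential log write before a single fsync(). */
static int commit_record(int log_fd, const char *rec, size_t len)
{
    if (write(log_fd, rec, len) != (ssize_t) len)
        return -1;
    return fsync(log_fd);   /* wait until the log reaches the disk */
}
```

The cost Tatsuo observed is the fsync() call, not the write(): the write only dirties kernel buffers, while fsync() forces a physical disk operation per transaction.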
---
Tatsuo Ishii

#27 Don Baccus
dhogaza@pacifier.com
In reply to: Tatsuo Ishii (#26)
Re: [HACKERS] Priorities for 6.6

At 08:50 AM 6/5/99 +0900, Tatsuo Ishii wrote:

On the other hand the LED was on only every few secs while running
Oracle. I heard that Oracle has a "REDO log file" and a log is written
into there when a transaction is committed. If so, apparently Oracle
does not issue sync() or fsync() every time a transaction gets
committed. I don't know how Oracle guarantees the log be written into
the disk without sync() or fsync() at the commit time, but seems
something like it is one of the most important technique to enhance
the performance of PostgreSQL.

Does anybody have an idea on this?

It's a well-known bug in the current Oracle release for Linux;
the redo log is supposed to be fsync'd on commit. Oracle
does fsync on other Unices. It will be interesting to see if
the upcoming 8.1.5 (or "8i", "i" for internet, as it's called)
will have the bug fixed. This still won't cause a lot of
disk thrashing in a recommended Oracle installation, as the
redo log should be on a separate spindle from any db spindles,
and Oracle grabs the entire file when the db's created in
order to increase the odds that the file will be one sequential
series of blocks (of course, real Oracle studs use raw disks,
in which case the db can guarantee serial block writing).

There's a separate daemon hanging around that writes dirty
database pages to the disk at its leisure.

Of course, if I've understood past postings to this list,
Postgres also fsyncs after read-only selects, too, and
my own experience would seem to confirm it (putting a
string of selects in a transaction makes the disk get
quiet, just as it does with inserts).

I can guarantee that Oracle NEVER does that :)


#28Thomas Lockhart
lockhart@alumni.caltech.edu
In reply to: Vince Vielhaber (#24)
Re: [HACKERS] Priorities for 6.6

eliminating the size restrictions on regular tuples.

Is this doable?

Presumably we would have to work out a "chunking" client/server
protocol to allow sending very large tuples. Also, it would need to
report the size of the tuple before it shows up, to allow very large
rows to be caught correctly.
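
A chunked transfer of the kind described here (total size announced up
front, data streamed in pieces) might look like the following toy sketch.
The protocol details are invented for illustration and are not a proposal
for the actual wire format:

```python
# Toy sketch of a chunked tuple transfer: the sender announces the
# total size first, then streams fixed-size chunks; the receiver can
# reject an over-large row before any data arrives. Invented protocol.

CHUNK = 8192  # pretend 8K wire chunks

def send_tuple(data):
    yield ("size", len(data))                 # size goes out first
    for off in range(0, len(data), CHUNK):
        yield ("chunk", data[off:off + CHUNK])

def recv_tuple(stream, max_size):
    kind, total = next(stream)
    assert kind == "size"
    if total > max_size:
        raise ValueError("tuple too large")   # caught before transfer
    parts = []
    for kind, payload in stream:
        parts.append(payload)
    data = b"".join(parts)
    assert len(data) == total
    return data

big = b"x" * 20000
assert recv_tuple(send_tuple(big), max_size=1 << 20) == big
```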

- Thomas

--
Thomas Lockhart lockhart@alumni.caltech.edu
South Pasadena, California

#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Lockhart (#28)
Re: [HACKERS] Priorities for 6.6

Don Baccus <dhogaza@pacifier.com> writes:

Of course, if I've understood past postings to this list,
Postgres also fsynch's after read-only selects, too,

I recently learned something about this that I hadn't understood before.
When a tuple is written out during an insert/update transaction, it is
marked as not definitely committed (since of course Postgres can't know
whether you'll abort the transaction later). The ID of the transaction
that wrote it is stored with it. Subsequently, whenever the tuple is
scanned, the backend has to go to the "transaction log" to see if that
transaction has been committed yet --- if not, it ignores the tuple.

As soon as the transaction is known to be committed, the next operation
that visits that tuple will mark it as "known committed", so as to avoid
future consultations of the transaction log. This happens *even if the
current operation is a select*. That is why selects can cause disk
writes in Postgres.
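
The mechanism can be modeled as a toy: each tuple carries the writing
transaction's ID plus a "known committed" mark; the first scan after
commit consults the transaction log and sets the mark (dirtying the
page), and later scans skip the lookup. All names here are hypothetical:

```python
# Toy model of the "known committed" mark: a scan consults the
# transaction log only until the tuple is marked; setting the mark
# dirties the page, which is why a SELECT can cause disk writes.

clog = {}            # xid -> "committed" / "aborted"
clog_lookups = 0

class Tuple:
    def __init__(self, xid, value):
        self.xmin = xid              # ID of the writing transaction
        self.known_committed = False
        self.value = value
        self.page_dirty = False

def scan(tup):
    global clog_lookups
    if not tup.known_committed:
        clog_lookups += 1            # must consult the transaction log
        if clog.get(tup.xmin) == "committed":
            tup.known_committed = True   # set the mark ...
            tup.page_dirty = True        # ... which dirties the page
        else:
            return None                  # ignore uncommitted tuple
    return tup.value

t = Tuple(xid=42, value="hello")
clog[42] = "committed"
scan(t)   # first scan: consults the log, sets the mark, dirties the page
scan(t)   # second scan: the mark short-circuits the log lookup
```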

Similar things happen when a tuple is replaced or deleted, of course.

In short, if you load a bunch of tuples into a table, the first select
after the load can run a lot slower than you might expect, because it'll
be writing back most or all of the pages it touches. But that penalty
doesn't affect every select, only the first one to scan a newly-written
tuple.

regards, tom lane

#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#29)
Re: [HACKERS] Priorities for 6.6

Thomas Lockhart <lockhart@alumni.caltech.edu> writes:

eliminating the size restrictions on regular tuples.

Is this doable?

Presumably we would have to work out a "chunking" client/server
protocol to allow sending very large tuples.

I don't really see a need to change the protocol. It's true that
a single tuple containing a couple dozen megabytes (per someone's
recent example) would stress the system unpleasantly, but that would
be true in a *lot* of ways. Perhaps we should plan on keeping the
LO feature to allow for really huge objects.

As far as I've seen, 99% of users are not interested in storing objects
that are so large that handling them as single tuples would pose serious
performance problems. It's just that a hard limit at 8K (or any other
particular small number) is annoying.

regards, tom lane

#31Don Baccus
dhogaza@pacifier.com
In reply to: Tom Lane (#29)
Re: [HACKERS] Priorities for 6.6

At 11:31 AM 6/5/99 -0400, Tom Lane wrote:

In short, if you load a bunch of tuples into a table, the first select
after the load can run a lot slower than you might expect, because it'll
be writing back most or all of the pages it touches. But that penalty
doesn't affect every select, only the first one to scan a newly-written
tuple.

While I don't doubt your analysis is correct for the case you've
uncovered, it doesn't explain why surrounding a bunch of selects
with a begin/end block greatly decreases disk activity for tables
that don't change. I'm pulling out "select" lists (html <select>)
from small tables of counties, states, countries for the project
I'm working on. The two countries, for instance, are "USA" and
"CA" and the table's not been updated in two months :). I'm
building a form and doing a very simple "select * from county_locales"
type selects, then building a <select> list containing all of the
possible values (not as many as you might think, this project
involves only the Pacific Northwest). There are several of
these selects executed for each form. Without the transaction
block, there's a lot of disk activity. With it, much less.

I can go pull out the begin/end blocks; they're conditionalized
in my Tcl scripts based on a "postgres" predicate so they'll
disappear if I migrate the database to another engine. Maybe
I'll have time this afternoon, if you'd like me to confirm, I'm
going to a brunch right now...

- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, and other goodies at
http://donb.photo.net

#32Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Tom Lane (#29)
Re: [HACKERS] Priorities for 6.6

In short, if you load a bunch of tuples into a table, the first select
after the load can run a lot slower than you might expect, because it'll
be writing back most or all of the pages it touches. But that penalty
doesn't affect every select, only the first one to scan a newly-written
tuple.

I have removed this from the TODO list:

* Prevent fsync in SELECT-only queries

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#32)
Re: [HACKERS] Priorities for 6.6

Don Baccus <dhogaza@pacifier.com> writes:

While I don't doubt your analysis is correct for the case you've
uncovered, it doesn't explain why surrounding a bunch of selects
with a begin/end block greatly decreases disk activity for tables
that don't change.

Hmm, I'm not sure why that should be, either. Anyone?

regards, tom lane

#34Hannu Krosing
hannu@trust.ee
In reply to: Tom Lane (#33)
Re: [HACKERS] Priorities for 6.6

Tom Lane wrote:

Don Baccus <dhogaza@pacifier.com> writes:

While I don't doubt your analysis is correct for the case you've
uncovered, it doesn't explain why surrounding a bunch of selects
with a begin/end block greatly decreases disk activity for tables
that don't change.

Hmm, I'm not sure why that should be, either. Anyone?

From a recent discussion I remember that every block that is read
in is marked as dirty, regardless of whether it is modified or not.

It is not a genuine bug (as it only slows things down instead of
getting wrong results), but still a misfeature.

It is most likely an ancient quickfix for some execution path that
failed to set the dirty mark when it should have.

---------------------
Hannu

#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannu Krosing (#34)
Re: [HACKERS] Priorities for 6.6

Hannu Krosing <hannu@trust.ee> writes:

Hmm, I'm not sure why that should be, either. Anyone?

From a recent discussion I remember that every block that is read
in is marked as dirty, regardless of whether it is modified or not.

No, that was me claiming that, on the basis of a profile I had taken
that showed an unreasonably large number of writes --- but the case
I was profiling was a selective UPDATE on a table that had just
been loaded. When I repeated the test, the number of writes decreased
to the right ballpark.

I am not sure what effect Don is seeing, but I don't think it's quite
as dumb a mistake as that...

regards, tom lane

#36Don Baccus
dhogaza@pacifier.com
In reply to: Hannu Krosing (#34)
Re: [HACKERS] Priorities for 6.6

At 11:51 PM 6/5/99 +0300, Hannu Krosing wrote:

It is not a genuine bug (as it only slows things down instead of
getting wrong results), but still a misfeature.

Well, it depends on how one defines "bug", I suppose :) In the
strictest sense you're correct, yet for real world use, particularly
in environments with high traffic, it's a killer.

It is most likely an ancient quickfix for some execution path that
failed to set the dirty mark when it should have.

Yep, I remember this from the earlier conversation, too.

- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, and other goodies at
http://donb.photo.net

#37Don Baccus
dhogaza@pacifier.com
In reply to: Tom Lane (#35)
Re: [HACKERS] Priorities for 6.6

At 04:58 PM 6/5/99 -0400, Tom Lane wrote:

I am not sure what effect Don is seeing, but I don't think it's quite
as dumb a mistake as that...

If you want, I can wait until the 6.5 release is out, then
play some more to make sure I can make the disk thrash with
old tables. This certainly isn't the kind of thing that
deserves rush treatment.

- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, and other goodies at
http://donb.photo.net

#38Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Don Baccus (#37)
Re: [HACKERS] Priorities for 6.6

While I don't doubt your analysis is correct for the case you've
uncovered, it doesn't explain why surrounding a bunch of selects
with a begin/end block greatly decreases disk activity for tables
that don't change. I'm pulling out "select" lists (html <select>)
from small tables of counties, states, countries for the project
I'm working on. The two countries, for instance, are "USA" and
"CA" and the table's not been updated in two months :). I'm
building a form and doing a very simple "select * from county_locales"
type selects, then building a <select> list containing all of the
possible values (not as many as you might think, this project
involves only the Pacific Northwest). There are several of
these selects executed for each form. Without the transaction
block, there's a lot of disk activity. With it, much less.

I can go pull out the begin/end blocks; they're conditionalized
in my Tcl scripts based on a "postgres" predicate so they'll
disappear if I migrate the database to another engine. Maybe
I'll have time this afternoon, if you'd like me to confirm, I'm
going to a brunch right now...

PostgreSQL writes into pg_log each time a transaction gets committed,
even if it is a read-only one. Once any file writes happen in
the transaction, an fsync() is forced at commit time. That is probably
why you observe less disk activity when you surround a batch of
selects with begin/end blocks.
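
The effect described above can be counted in a toy model: each committed
transaction records its status in pg_log, so N autocommitted SELECTs cost
N log writes while a single BEGIN/END block costs one. The class and its
names are hypothetical, invented only to make the accounting concrete:

```python
# Toy count of pg_log traffic: every committed transaction, even a
# read-only one, records its status, so wrapping many SELECTs in one
# BEGIN/END replaces N log writes with one.

class ToyBackend:
    def __init__(self):
        self.pg_log_writes = 0
        self.next_xid = 0

    def run(self, statements, explicit_block=False):
        if explicit_block:
            self.next_xid += 1           # one transaction for the block
            for _ in statements:
                pass                     # execute the SELECTs
            self.pg_log_writes += 1      # single commit record
        else:
            for _ in statements:
                self.next_xid += 1       # each statement is its own xact
                self.pg_log_writes += 1  # one commit record apiece

selects = ["SELECT * FROM county_locales"] * 10
a = ToyBackend(); a.run(selects)                      # autocommit
b = ToyBackend(); b.run(selects, explicit_block=True) # BEGIN ... END
```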

By the way, may I ask one more question regarding Oracle? You mentioned
the magic of no-fsync in Oracle is actually a bug. Ok, I understand. I
also heard that Oracle does some kind of redo-log bufferings. Does
this mean certain committed data might be lost if the system crashed
before the buffered data is written into the disk?
---
Tatsuo Ishii

#39Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Tatsuo Ishii (#38)
Re: [HACKERS] Priorities for 6.6

By the way, may I ask one more question regarding Oracle? You mentioned
the magic of no-fsync in Oracle is actually a bug. Ok, I understand. I
also heard that Oracle does some kind of redo-log bufferings. Does
this mean certain committed data might be lost if the system crashed
before the buffered data is written into the disk?

That is my guess. Informix does that. No one runs with non-buffered
logging. They run with buffered logging, which may lose transactions
from the few seconds or minutes before a crash.

I think we need that, and it should be the default, but few people agree
with me. I have some schemes to do this.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#40Kaare Rasmussen
kar@webline.dk
In reply to: Bruce Momjian (#39)
Re: [HACKERS] Priorities for 6.6

I think we need that, and it should be the default, but few people agree
with me. I have some schemes to do this.

I believe you're absolutely right. To most people, performance matters
more than safety in a system breakdown. After all, we're talking
Linux, FreeBSD and other systems here. And if people worry, they can
buy UPSes, duplicate hardware and so on. It's extremely rare that the
hardware actually fails.

To counter this, I think Postgresql needs some roll forward mechanism.
Maybe that's what Vadim means with savepoints? Now we're at the
Enterprise end, I could add that companies need hot backup. And if you
include the parallelizing server I believe the commercial community
will be served very well.

I was at a seminar last week where Oracle bragged about 8i. Maybe
Postgresql some time in the future could have hooks for other
languages? I know there's a PL-thing and a C-thing, but I would
personally like a Perl interface.

#41Kaare Rasmussen
kar@webline.dk
In reply to: Bruce Momjian (#39)
Re: [HACKERS] Priorities for 6.6

On a personal note, I hope that outer joins and views on unions will
get attention in 6.6.

Ryan Bradetich is working on views. Maybe I can get my wish in 6.6?

If Tom's idea about removing the 8K tuple limit and Bruce's idea about
relaxed sync'ing will make it into the next release, it should be
version 7.0 in my opinion.

#42Hannu Krosing
hannu@trust.ee
In reply to: Kaare Rasmussen (#40)
Re: [HACKERS] Priorities for 6.6

Kaare Rasmussen wrote:

I was at a seminar last week where Oracle bragged about 8i. Maybe
Postgresql some time in the future could have hooks for other
languages? I know there's a PL-thing and a C-thing, but I would
personally like a Perl interface.

The hooks are already in place, thanks to Jan.
He started with Tcl first and PL/pgSQL after that.
It should be quite possible to add others with not too much work.

Hannu

#43Vadim Mikheev
vadim@krs.ru
In reply to: Tom Lane (#29)
Re: [HACKERS] Priorities for 6.6

Tom Lane wrote:

Don Baccus <dhogaza@pacifier.com> writes:

Of course, if I've understood past postings to this list,
Postgres also fsynch's after read-only selects, too,

I recently learned something about this that I hadn't understood before.
When a tuple is written out during an insert/update transaction, it is
marked as not definitely committed (since of course Postgres can't know
whether you'll abort the transaction later). The ID of the transaction
that wrote it is stored with it. Subsequently, whenever the tuple is
scanned, the backend has to go to the "transaction log" to see if that
transaction has been committed yet --- if not, it ignores the tuple.

As soon as the transaction is known to be committed, the next operation
that visits that tuple will mark it as "known committed", so as to avoid
future consultations of the transaction log. This happens *even if the
current operation is a select*. That is why selects can cause disk
writes in Postgres.

Right. But we could avoid fsync for such write operations, i.e.
do the write call but not the fsync. This will not avoid the real disk
writes, but the select will not wait for them.

Vadim

#44Vadim Mikheev
vadim@krs.ru
In reply to: Bruce Momjian (#32)
Re: [HACKERS] Priorities for 6.6

Bruce Momjian wrote:

In short, if you load a bunch of tuples into a table, the first select
after the load can run a lot slower than you might expect, because it'll
be writing back most or all of the pages it touches. But that penalty
doesn't affect every select, only the first one to scan a newly-written
tuple.

I have removed this from the TODO list:

* Prevent fsync in SELECT-only queries

When a selecting (i.e. read-only) transaction commits,
it changes pg_log - we obviously can avoid this!
There is no sense in storing the commit/abort status of read-only xactions!

Vadim

#45Vadim Mikheev
vadim@krs.ru
In reply to: Tom Lane (#33)
Re: [HACKERS] Priorities for 6.6

Tom Lane wrote:

Don Baccus <dhogaza@pacifier.com> writes:

While I don't doubt your analysis is correct for the case you've
uncovered, it doesn't explain why surrounding a bunch of selects
with a begin/end block greatly descreases disk activity for tables
that don't change.

Hmm, I'm not sure why that should be, either. Anyone?

pg_log fsync for read-only xactions...
And more than that, commit fsyncs ALL dirty buffers
in the pool, even ones not dirtied by the xaction being committed!

Vadim

#46Vadim Mikheev
vadim@krs.ru
In reply to: Tom Lane (#33)
Re: [HACKERS] Priorities for 6.6

Hannu Krosing wrote:

Tom Lane wrote:

Don Baccus <dhogaza@pacifier.com> writes:

While I don't doubt your analysis is correct for the case you've
uncovered, it doesn't explain why surrounding a bunch of selects
with a begin/end block greatly descreases disk activity for tables
that don't change.

Hmm, I'm not sure why that should be, either. Anyone?

From a recent discussion I remember that every block that is read
in is marked as dirty, regardless of whether it is modified or not.

No!

Vadim

#47Vadim Mikheev
vadim@krs.ru
In reply to: Kaare Rasmussen (#40)
Re: [HACKERS] Priorities for 6.6

Kaare Rasmussen wrote:

I think we need that, and it should be the default, but few people agree
with me. I have some schemes to do this.

I remember this, Bruce. But I would like to see it implemented
in the right way. I'm not happy with the "two sync() in postmaster" idea.
We have to implement a Shared Catalog Cache (SCC), mark all dirtied
relation files there and then just fsync() these files before the
fsync() of pg_log.
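
The commit ordering Vadim sketches (fsync just the relation files the
transaction dirtied, then fsync pg_log) can be written as a toy. The
class and method names below are hypothetical, not the proposed SCC
implementation:

```python
# Toy of the proposed scheme: remember which relation files were
# dirtied, fsync just those at commit, and fsync pg_log only after
# the data files it refers to are safe -- instead of a blanket sync().

class ToyCatalogCache:
    def __init__(self):
        self.dirtied_files = set()
        self.synced = []          # order in which files were "fsync'd"

    def mark_dirty(self, relfile):
        self.dirtied_files.add(relfile)

    def commit(self):
        for relfile in sorted(self.dirtied_files):
            self.synced.append(relfile)   # fsync() each dirtied file
        self.synced.append("pg_log")      # commit status forced last
        self.dirtied_files.clear()

scc = ToyCatalogCache()
scc.mark_dirty("tblFoo")
scc.mark_dirty("tblFoo_idx")
scc.commit()
```

The invariant the ordering buys is that pg_log never claims a commit
whose data-file writes have not reached disk.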

To counter this, I think Postgresql needs some roll forward mechanism.
Maybe that's what Vadim means with savepoints? Now we're at the

No. Savepoints are short-term things, living during xaction.

Vadim

#48Kaare Rasmussen
kar@webline.dk
In reply to: Hannu Krosing (#42)
Re: [HACKERS] Priorities for 6.6

The hooks are already in place, thanks to Jan.
He started with Tcl first and PL/pgSQL after that.
It should be quite possible to add others with not too much work.

Explain a bit more - I'd like to have a Perl interface. It has to be
added by some of the clever postgresql hackers? A non-C-speaking
individual like me can't do it, right?

#49Kristofer Munn
kmunn@munn.com
In reply to: Kaare Rasmussen (#48)
Re: [HACKERS] Priorities for 6.6

I've been following this thread via Usenet and wanted to weigh in as a
large application developer using Postgresql on Linux. I want to thank
everyone for their wonderful work on getting the database to where it is
today. I haven't seen the following discussed at length anywhere and
hopefully it's not because I've overlooked some key functionality that
already does this...

With Postgresql, the biggest handicap I run into in day-to-day development
(as compared to the numerous other databases I deal with) is the lack of
stored procedures that can return multiple rows (a la Sybase, MS SQL,
etc). These procedures are pre-compiled (pre-planned) and would execute
faster than feeding the commands to the parser one at a time and
performing any conditional logic on the front-end. The ability to store
logic equivalent to (please forgive any syntax mistakes)...

IF EXISTS(SELECT 1 FROM tblFoo WHERE checkFlag = 1)
    SELECT col1, col2, col3 FROM tblFoo WHERE checkFlag = 1
ELSE
BEGIN
    SELECT 'No rows were found.' AS errorStr
    RETURN -1
END
RETURN 0

or

BEGIN TRANSACTION

IF EXISTS(SELECT 1 FROM tblFoo WHERE uniqueID = @idparam)
BEGIN
    UPDATE tblFoo SET col1 = @col1, col2 = @col2
    WHERE uniqueID = @idparam

    IF @error != 0
    BEGIN
        ROLLBACK TRANSACTION
        SELECT "Unable to update record." AS errorStr
        RETURN -1
    END
END
ELSE BEGIN
    INSERT INTO tblFoo (col1, col2) VALUES (@col1, @col2)

    IF @error != 0
    BEGIN
        ROLLBACK TRANSACTION
        SELECT "Unable to insert record." AS errorStr
        RETURN -1
    END

    SELECT @idparam = @@identity
END

COMMIT TRANSACTION
SELECT @idparam AS rowID
RETURN 0

into some sort of compiled procedure at the database would be tremendously
useful. Plus, most of the execution time for some multiway joins seems to
be spent on the planning of the command, not the actual doing - time which
can be recaptured by compiling a procedure once (and perhaps after every
VACUUM ANALYZE).
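
The plan-once-run-many saving described above can be sketched as a cache
keyed by statement text. This is a toy, not the proposed stored-procedure
implementation; the names are invented:

```python
# Toy plan cache: the expensive "plan" step runs once per statement
# text; later executions reuse the stored plan, which is the saving a
# precompiled stored procedure would capture.

plan_calls = 0

def plan(sql):
    global plan_calls
    plan_calls += 1
    return ("PLAN", sql)          # stand-in for an expensive planner run

class PlanCache:
    def __init__(self):
        self.cache = {}

    def execute(self, sql):
        if sql not in self.cache:
            self.cache[sql] = plan(sql)   # plan only on first use
        return self.cache[sql]            # "run" the cached plan

pc = PlanCache()
for _ in range(50):
    pc.execute("SELECT col1, col2, col3 FROM tblFoo WHERE checkFlag = 1")
```

Fifty executions trigger a single planner run; the other forty-nine pay
only the lookup. (Replanning after VACUUM ANALYZE would amount to
clearing the cache.)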

The procedures would also help developers centralize their code across
platforms. My application runs PHP for the web interface and Perl for
behind-the-scenes processing. I have to re-implement/re-write the same
code to do the same thing - once under Perl and once under PHP and
maintain both. With stored procedures that return multiple rows, I could
simply put that code in the database and simplify my PHP and Perl code by
an order of magnitude (not a trivial thing for interpreted languages).

Finally, implementing stored procedures using the same language constructs
as MS SQL and Sybase (virtually identical) would allow existing developers
to write new applications and/or port existing applications to Postgresql.
Even if we weren't to add the same language but simply add the ability to
return multiple rows to the existing stored function languages, it would
be a giant step forward for myself and other application developers.

Thanks for your time...

- K

Kristofer Munn * http://www.munn.com/~kmunn/ * ICQ# 352499 * AIM: KrMunn

#52Don Baccus
dhogaza@pacifier.com
In reply to: Tatsuo Ishii (#38)
Re: [HACKERS] Priorities for 6.6

At 12:22 PM 6/6/99 +0900, Tatsuo Ishii wrote:

By the way, may I ask one more question regarding Oracle? You mentioned
the magic of no-fsync in Oracle is actually a bug. Ok, I understand. I
also heard that Oracle does some kind of redo-log bufferings. Does
this mean certain committed data might be lost if the system crashed
before the buffered data is written into the disk?

Not sure, actually, I'm by no means an Oracle expert, I was just
passing along information gleaned from the Oracle/linux newsgroup.
You can access this via the main Oracle website, go to the Oracle
Technology Network and register, much as you did to download your
developer's copy of the db engine. Some very experienced Oracle
types hang out there.

- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, and other goodies at
http://donb.photo.net

#53Jan Wieck
wieck@debis.com
In reply to: Bruce Momjian (#32)
Re: [HACKERS] Priorities for 6.6

In short, if you load a bunch of tuples into a table, the first select
after the load can run a lot slower than you might expect, because it'll
be writing back most or all of the pages it touches. But that penalty
doesn't affect every select, only the first one to scan a newly-written
tuple.

I have removed this from the TODO list:

* Prevent fsync in SELECT-only queries

I think this entry should stay.

In fact, there is a write on every transaction that
commits/aborts even if it's one that doesn't modify any data.
pg_log is written for SELECT only transactions too. I'm
nearly 99.5% sure that not fsync()'ing those transactions
would not hurt reliability, and we should work it out.

This might be one cause that surrounding a bunch of SELECT
statements by BEGIN/END speeds up PostgreSQL in non -F mode.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #

#54Jan Wieck
wieck@debis.com
In reply to: Hannu Krosing (#34)
Re: [HACKERS] Priorities for 6.6

Tom Lane wrote:

Don Baccus <dhogaza@pacifier.com> writes:

While I don't doubt your analysis is correct for the case you've
uncovered, it doesn't explain why surrounding a bunch of selects
with a begin/end block greatly descreases disk activity for tables
that don't change.

Hmm, I'm not sure why that should be, either. Anyone?

From a recent discussion I remember that every block that is read
in is marked as dirty, regardless of whether it is modified or not.

It is not a genuine bug (as it only slows things down instead of
getting wrong results), but still a misfeature.

It is most likely an ancient quickfix for some execution path that
failed to set the dirty mark when it should have.

Can't believe that this is true - uhhhhhh!

If it is, then it's surely a severe BUG!

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #

#55Oleg Bartunov
oleg@sai.msu.su
In reply to: Kaare Rasmussen (#48)
Re: [HACKERS] Priorities for 6.6

It would be cool to have a Perl interface to the postgres internals!
Unfortunately I'm not a C programmer. I think Edmund could do this.

Regards,

Oleg

On Sun, 6 Jun 1999, Kaare Rasmussen wrote:

Date: Sun, 6 Jun 1999 16:35:14 +0200 (CEST)
From: Kaare Rasmussen <kar@webline.dk>
To: pgsql-hackers@postgreSQL.org
Subject: Re: [HACKERS] Priorities for 6.6

The hooks are already in place, thanks to Jan.
He started with Tcl first and PL/pgSQL after that.
It should be quite possible to add others with not too much work.

Explain a bit more - I'd like to have a Perl interface. It has to be
added by some of the clever postgresql hackers? A non-C-speaking
individual like me can't do it, right?

_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

#56Jan Wieck
wieck@debis.com
In reply to: Bruce Momjian (#39)
Re: [HACKERS] Priorities for 6.6

By the way, may I ask one more question regarding Oracle? You mentioned
the magic of no-fsync in Oracle is actually a bug. Ok, I understand. I
also heard that Oracle does some kind of redo-log bufferings. Does
this mean certain committed data might be lost if the system crashed
before the buffered data is written into the disk?

That is my guess. Informix does that. No one runs with non-buffered
logging. They run with buffered logging, which may lose transactions
from the few seconds or minutes before a crash.

I think we need that, and it should be the default, but few people agree
with me. I have some schemes to do this.

The major problem in this area is that, with the given model
of telling which tuples are committed, no one can guarantee a
consistent PostgreSQL database in the case of a non-fsynced
crash. You might lose some tuples and might get some
outdated ones back. Which ones depends on subsequently
executed SELECTs, and it has nothing to do with transaction
boundaries or with the order in which transactions committed.

As I understand Oracle, the entire reliability depends on the
redo logs. If a crash is too bad, you can always restore
the last backup and recover from that. The database crash
recovery will roll forward until the last COMMIT that occurs
in the redo log (except for point-in-time recovery).

Someone can live with the case that the last COMMITs
(sorted by time) cannot get recovered. But no one can live
with a database that's left corrupt.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #

#57Noname
wieck@debis.com
In reply to: Oleg Bartunov (#55)
Re: [HACKERS] Priorities for 6.6

It would be cool to have a Perl interface to the postgres internals!
Unfortunately I'm not a C programmer. I think Edmund could do this.

Now I've got a sequence overflow counting the requests for
something like PL/Perl!

Even if I can't believe it, could it be that there are too many
Perl users and too few Perl developers?

Several times I clearly said that I don't know much more
about Perl than that it's YASL (Yet Another Scripting
Language).

But ...

1. I've designed and implemented the PL interface into the
PostgreSQL function manager including the CREATE/DROP
PROCEDURAL LANGUAGE statements.

2. I've added the procedural languages PL/Tcl and PL/pgSQL,
which are described in the programmer's manual and which
have both been in use out in the world since v6.4.

3. I've offered to help build PL/Perl several times now.
But only "it would be nice if someone else ..." requests
are coming in.

Some years ago I searched for a general purpose scripting
language. I found Tcl, which has a graphical user interface
(Tk) and is supported on the platforms I need (all UN*X and
(for sake) Windows-NT/95). Since then I've created a couple
of Tcl interfaces to things it cannot do by default (like
SAP remote function calls - not available as open source, so
don't ask for them :-( ). The simplicity with which it interfaces
to foreign things is why it's called the "Tool Command
Language".

This simplicity gave me the power to create PL/Tcl. PL/pgSQL
was my answer to requests for a native language that
doesn't depend on anything else installed on a PostgreSQL
target system.

To explain point 3 in detail: I still feel responsible for
the function manager's procedural language interface -- since
I created it. BUT I WOULDN'T LEARN PERL PLUS ITS API ONLY TO
PROVIDE PL/Perl TO THE {U|LOO}SER-COMMUNITY! That would mean
becoming responsible for one more thing I don't need
myself.

If there's (only one) Perl PROGRAMMER out in the world
reading this, who does see a (however small) possibility to
explain how to integrate a Perl interpreter into PostgreSQL,
RESPOND!!!!!!!!!!!

I'll let y'all know about the responses I got. Even if I
don't expect a single one where a PL/Perl could result from.

Maybe Perl isn't the scripting language someone should choose
because it is too limited in its capabilities - remember
that real programmers don't use pascal... - maybe a real
programmer wouldn't ever use Perl...

Maybe - (please don't) - Jan

Regards,

Oleg

On Sun, 6 Jun 1999, Kaare Rasmussen wrote:

Date: Sun, 6 Jun 1999 16:35:14 +0200 (CEST)
From: Kaare Rasmussen <kar@webline.dk>
To: pgsql-hackers@postgreSQL.org
Subject: Re: [HACKERS] Priorities for 6.6

The hooks are already in place, thanks to Jan.
He started with Tcl first and PL only after that.
It should be quite possible to add others with not too much work.

Explain a bit more - I'd like to have a Perl interface. It has to be
added by some of the clever postgresql hackers? A non-C-speaking
individual like me can't do it, right?

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #

#58Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Vadim Mikheev (#44)
Re: [HACKERS] Priorities for 6.6

Oh, I see. A SELECT is a transaction, so it flushes pg_log. Re-added to
TODO.

Bruce Momjian wrote:

In short, if you load a bunch of tuples into a table, the first select
after the load can run a lot slower than you might expect, because it'll
be writing back most or all of the pages it touches. But that penalty
doesn't affect every select, only the first one to scan a newly-written
tuple.

I have removed this from the TODO list:

* Prevent fsync in SELECT-only queries

When a selecting (i.e. read-only) transaction commits,
it changes pg_log - we obviously can avoid this!
There's no sense in storing the commit/abort status of read-only xactions!

Vadim

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#59Mark Hollomon
mhh@nortelnetworks.com
In reply to: Noname (#57)
Re: [HACKERS] Priorities for 6.6

Jan Wieck wrote:

It would be cool to have a Perl interface to the postgres internals!
Unfortunately I'm not a C programmer. I think Edmund could do this.

Now I've got a sequence overflow counting the requests for
something like PL/Perl!

[personal view on perl and tcl deleted]

Jan:

I've been looking for a project to get me active in the postgresql
community after lurking since before it was (officially) PostgreSQL.

I will do the PL/Perl interface.

Perl is a great integration tool. This can be seen from the enormous
growth of its use in such areas as CGI programming. And embedding
functionality into perl from other sources is rarely a hard problem.

But embedding perl in other applications is not as easy as it could be.

--

Mark Hollomon
mhh@nortelnetworks.com

#60Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Vadim Mikheev (#47)
Re: [HACKERS] Priorities for 6.6

Vadim Mikheev wrote:

I think we need that, and it should be the default, but few people agree
with me. I have some schemes to do this.

I remember this, Bruce. But I would like to see it implemented
in the right way. I'm not happy with the "two sync() in postmaster" idea.
We have to implement a Shared Catalog Cache (SCC), mark all dirtied
relation files there and then just fsync() these files before the
fsync() of pg_log.

I see. You want to use the shared catalog cache to flag relations that
have been modified, and fsync those before fsync of pglog. Another idea
is to send a signal to each backend that has marked a bit in shared
memory saying it has written to a relation, and have the signal handler
fsync all its dirty relations, set a finished bit, and have the
postmaster then fsync pglog. The shared catalog cache still requires
the postmaster to open every relation that is marked as dirty to fsync
it, which could be a performance problem. Now, if we could pass file
descriptors between processes, that would make things easy. I think BSD
can do it, but I don't believe it is portable.

My idea would be:

	backend:   1  2  3  4  5  6  7
	dirtied:

	backend:   1  2  3  4  5  6  7
	fsync'ed:

Each backend sets its 'dirtied' bit when it modifies a relation.

Every 5 seconds, the postmaster scans the dirtied list and sends a signal
to each backend that has dirtied something. Each backend fsyncs its
relations, then sets its fsync'ed bit. When all have signaled fsync'ed,
the postmaster can update pg_log on disk. Another issue is that now that
we update the transaction status as part of SELECT, pg_log is not the only
representation of committed status.

Of course, we have to prevent the OS from flushing pg_log, perhaps by
making a copy of the last two pages of pg_log beforehand and removing it
afterwards. If a backend starts up and sees that pg_log copy file, it
puts it in place of the current last two pages of pg_log.

Also, for 6.6, I am going to add system table indexes so all cache
lookups use indexes. I am unsure what a shared catalog cache is going to
do that the buffer cache doesn't already do. Perhaps if we just flushed
the system table cache buffers less frequently, there would be no need
for a shared system cache.

Basically, this fsync() thing is killing performance, and I think we can
come up with a smart solution if we discuss the options.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#61Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Noname (#56)
Re: [HACKERS] Priorities for 6.6

The major problem in this area is that, with the given model
of telling which tuples are committed, no one can guarantee a
consistent PostgreSQL database in the case of a non-fsynced
crash. You might lose some tuples and might get some
outdated ones back. Which ones depends on subsequently
executed SELECTs, and it all has nothing to do
with transaction boundaries or with the order in which
transactions committed.

As I understand Oracle, the entire reliability depends on the
redo logs. If a crash is too bad, you can always restore
the last backup and recover from that. The database crash
recovery will roll forward until the last COMMIT that occurs
in the redo log (except for point-in-time recovery).

Someone can live with the case that the last COMMITs
(sorted by time) cannot get recovered. But no one can live
with a database that's left corrupt.

Yes, I 100% agree. We have to bring the database back to a consistent
state where only the last few transactions are not done at all, and all
previous ones are completely done. See my previous post on methods and
issues.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#62Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Mark Hollomon (#59)
Re: [HACKERS] Priorities for 6.6

Added to TODO list.

Jan Wieck wrote:

It would be cool to have a Perl interface to the postgres internals!
Unfortunately I'm not a C programmer. I think Edmund could do this.

Now I've got a sequence overflow counting the requests for
something like PL/Perl!

[personal view on perl and tcl deleted]

Jan:

I've been looking for a project to get me active in the postgresql
community after lurking since before it was (officially) PostgreSQL.

I will do the PL/Perl interface.

Perl is a great integration tool. This can be seen from the enormous
growth of its use in such areas as CGI programming. And embedding
functionality into perl from other sources is rarely a hard problem.

But embedding perl in other applications is not as easy as it could be.

--

Mark Hollomon
mhh@nortelnetworks.com

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#63Vadim Mikheev
vadim@krs.ru
In reply to: Bruce Momjian (#60)
Re: [HACKERS] Priorities for 6.6

Bruce Momjian wrote:

update pg_log on disk. Another issue is that now that we update the
transaction status as part of SELECT, pg_log is not the only

We shouldn't update pg_log for read-only xactions.

representation of committed status.

Of course, we have to prevent flush of pglog by OS, perhaps by making a
copy of the last two pages of pg_log before this and remove it after.
If a backend starts up and sees that pg_log copy file, it puts that in
place of the current last two pages of pg_log.

Keep two last pg_log pages in shmem, lock them, copy, unlock,
write copy to pg_log.

Also, for 6.6, I am going to add system table indexes so all cache
lookups use indexes. I am unsure what a shared catalog cache is going to
do that the buffer cache doesn't already do. Perhaps if we just flushed
the system table cache buffers less frequently, there would be no need
for a shared system cache.

I would like to see ntuples and npages in pg_class up-to-date.
Now we do fseek for each heap_insert and for each heap_beginscan.
And note that we have to open() system relation files, even
if pages are in buffer pool.

Vadim

#64Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Vadim Mikheev (#63)
Re: [HACKERS] Priorities for 6.6

Bruce Momjian wrote:

update pg_log on disk. Another issue is that now that we update the
transaction status as part of SELECT, pg_log is not the only

We shouldn't update pg_log for read-only xactions.

No, I was saying we mark those SELECT'ed rows as being part of committed
transactions. When we SELECT a row, we look at pg_log to see if it is
committed, and mark that row as part of a committed transaction so we
don't have to check pg_log again. We can't do that with the system we're
envisioning until we have put pg_log on disk with the transaction marked
committed. Could be tricky, though having two copies of pg_log in
memory, one disk copy and one active copy, and using the disk copy for
row xact status updates, would do the trick.

representation of committed status.

Of course, we have to prevent flush of pglog by OS, perhaps by making a
copy of the last two pages of pg_log before this and remove it after.
If a backend starts up and sees that pg_log copy file, it puts that in
place of the current last two pages of pg_log.

Keep two last pg_log pages in shmem, lock them, copy, unlock,
write copy to pg_log.

Yes, much better. Control what gets to disk by not updating the file at
all.

Also, for 6.6, I am going to add system table indexes so all cache
lookups use indexes. I am unsure what a shared catalog cache is going to
do that the buffer cache doesn't already do. Perhaps if we just flushed
the system table cache buffers less frequently, there would be no need
for a shared system cache.

I would like to see ntuples and npages in pg_class up-to-date.
Now we do fseek for each heap_insert and for each heap_beginscan.
And note that we have to open() system relation files, even
if pages are in buffer pool.

Why do we have to open system tables if they're already in the buffer
cache? I guess it's in case we need to write them out, or fault in
another page.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#65Vadim Mikheev
vadim@krs.ru
In reply to: Bruce Momjian (#64)
Re: [HACKERS] Priorities for 6.6

Bruce Momjian wrote:

Also, for 6.6, I am going to add system table indexes so all cache
lookups use indexes. I am unsure what a shared catalog cache is going to
do that the buffer cache doesn't already do. Perhaps if we just flushed
the system table cache buffers less frequently, there would be no need
for a shared system cache.

I would like to see ntuples and npages in pg_class up-to-date.
Now we do fseek for each heap_insert and for each heap_beginscan.
And note that we have to open() system relation files, even
if pages are in buffer pool.

Why do we have to open system tables if they're already in the buffer
cache? I guess it's in case we need to write them out, or fault in
another page.

Just because of ... heap_open()->RelationBuildDesc() does it.
Maybe we could delay smgropen?

But in any case note that the big guys have a shared catalog cache,
and this is not because they don't have a good buffer pool -:)
Keeping a page in the pool for just a single row is not good.

"Oracle itself accesses the data dictionary frequently during
the parsing of SQL statements. This access is essential to the
continuing operation of Oracle. See Chapter 8, "The Data Dictionary,"
for more information on the data dictionary.

...

Caching of the Data Dictionary for Fast Access

Because Oracle constantly accesses the data dictionary during database
operation to validate user access and to verify the state of objects,
much of the data dictionary information is cached in the SGA. All
information is stored in memory using the LRU (least recently
used) algorithm. Information typically kept in the caches is that
required for parsing."

Vadim

#66Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Vadim Mikheev (#65)
Re: [HACKERS] Priorities for 6.6

Just because of ... heap_open()->RelationBuildDesc() does it.
Maybe we could delay smgropen?

But in any case note that the big guys have a shared catalog cache,
and this is not because they don't have a good buffer pool -:)
Keeping a page in the pool for just a single row is not good.

"Oracle itself accesses the data dictionary frequently during
the parsing of SQL statements. This access is essential to the
continuing operation of Oracle. See Chapter 8, "The Data Dictionary,"
for more information on the data dictionary.

...

Caching of the Data Dictionary for Fast Access

Because Oracle constantly accesses the data dictionary during database
operation to validate user access and to verify the state of objects,
much of the data dictionary information is cached in the SGA. All
information is stored in memory using the LRU (least recently
used) algorithm. Information typically kept in the caches is that
required for parsing."

I agree we need it. I just think we need better fsync handling more,
and seeing how hard a shared catalog cache may be, it may be good to get
fsync faster first.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#67Kristofer Munn
kmunn@munn.com
In reply to: Vadim Mikheev (#65)
Re: [HACKERS] Priorities for 6.6

My apologies for the apparent triple posting of my earlier message
regarding stored procedures. The mailing list sent me mail notifying me
that the message had bounced due to my not being subscribed, so I tried
to subscribe and send again. Somehow, all three eventually got
through.

- K

Kristofer Munn * http://www.munn.com/~kmunn/ * ICQ# 352499 * AIM: KrMunn

#68ZEUGSWETTER Andreas IZ5
Andreas.Zeugswetter@telecom.at
In reply to: Kristofer Munn (#67)
Re: [HACKERS] Priorities for 6.6

By the way, may I ask one more question regarding Oracle? You mentioned
that the magic of no-fsync in Oracle is actually a bug. Ok, I understand. I
also heard that Oracle does some kind of redo-log buffering. Does
this mean certain committed data might be lost if the system crashed
before the buffered data is written to the disk?

Yes, you might lose a transaction that has been reported committed to
the client. But it guarantees that every transaction is either committed
or rolled back as a whole, thus leaving the database in a consistent
state. We have a lot of applications where this is acceptable, and
others where it is not. It is the ability to choose (in Informix,
buffered versus unbuffered logging) that makes us happy.

Andreas

#69Tom Lane
tgl@sss.pgh.pa.us
In reply to: ZEUGSWETTER Andreas IZ5 (#68)
Re: [HACKERS] Priorities for 6.6

"Mark Hollomon" <mhh@nortelnetworks.com> writes:

I've been looking for a project to get me active in the postgresql
community after lurking since before it was (officially) PostgreSQL.

I will do the PL/Perl interface.

Great! Glad to hear it.

regards, tom lane

#70Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#69)
Re: [HACKERS] Priorities for 6.6

Bruce Momjian <maillist@candle.pha.pa.us> writes:

... Another idea
is to send a signal to each backend that has marked a bit in shared
memory saying it has written to a relation, and have the signal handler
fsync all its dirty relations, set a finished bit, and have the
postmaster then fsync pglog.

I do not think it's practical to expect any useful work to happen inside
a signal handler. The signal could come at any moment, such as when
data structures are being updated and are in a transient invalid state.
Unless you are willing to do a lot of fooling around with blocking &
unblocking the signal, about all the handler can safely do is set a flag
variable that will be examined somewhere in the backend main loop.

However, if enough information is available in shared memory, perhaps
the postmaster could do this scan/update/flush all by itself?

Of course, we have to prevent flush of pglog by OS, perhaps by making a
copy of the last two pages of pg_log before this and remove it after.
If a backend starts up and sees that pg_log copy file, it puts that in
place of the current last two pages of pg_log.

It seems to me that one or so disk writes per transaction is not all
that big a cost. Does it take much more than one write to update
pg_log, and if so why?

regards, tom lane

#71Vadim Mikheev
vadim@krs.ru
In reply to: Tom Lane (#69)
Re: [HACKERS] Priorities for 6.6

Tom Lane wrote:

"Mark Hollomon" <mhh@nortelnetworks.com> writes:

I've been looking for a project to get me active in the postgresql
community after lurking since before it was (officially) PostgreSQL.

I will do the PL/Perl interface.

Great! Glad to hear it.

And me!

Vadim

#72Noname
wieck@debis.com
In reply to: Mark Hollomon (#59)
Re: [HACKERS] Priorities for 6.6

Mark Hollomon wrote:

Jan Wieck wrote:

It would be cool to have a Perl interface to the postgres internals!
Unfortunately I'm not a C programmer. I think Edmund could do this.

Now I've got a sequence overflow counting the requests for
something like PL/Perl!

[personal view on perl and tcl deleted]

Really sorry for that. It's not my favorite behaviour to talk
dirty about something I don't know. But after asking kindly
several times I thought making someone angry could work - and
it did :-)

Jan:

I've been looking for a project to get me active in the postgresql
community after lurking since before it was (officially) PostgreSQL.

I will do the PL/Perl interface.

That's a word! I'll contact you with private mail.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #

#73Les Mikesell
les@mcs.com
In reply to: Vadim Mikheev (#71)
Re: [HACKERS] Priorities for 6.6

According to Vadim Mikheev:

"Mark Hollomon" <mhh@nortelnetworks.com> writes:

I've been looking for a project to get me active in the postgresql
community after lurking since before it was (officially) PostgreSQL.

I will do the PL/Perl interface.

Great! Glad to hear it.

And me!

It would be really nice if the client/server interface could be
fixed up to remove the tuple size limits by the time the embedded
perl interface is added. I think preloading some perl functions
to get arbitrarily processed output back from a select would be
handy for a lot of uses, and even better if we didn't have to worry
about the size of the returned "record".

Les Mikesell
les@mcs.com

#74Brian E Gallew
geek+@cmu.edu
In reply to: Tom Lane (#70)
Re: [HACKERS] Priorities for 6.6

Bruce Momjian <maillist@candle.pha.pa.us> writes:

... Another idea
is to send a signal to each backend that has marked a bit in shared
memory saying it has written to a relation, and have the signal handler
fsync all its dirty relations, set a finished bit, and have the
postmaster then fsync pglog.

One other problem with signals is that things get complicated if
PostgreSQL ever moves to a multi-threading model.

--
=====================================================================
| JAVA must have been developed in the wilds of West Virginia. |
| After all, why else would it support only single inheritance?? |
=====================================================================
| Finger geek@cmu.edu for my public key. |
=====================================================================

#75Kaare Rasmussen
kar@webline.dk
In reply to: Noname (#57)
Re: [HACKERS] Priorities for 6.6

If there's (only one) Perl PROGRAMMER out in the world
reading this, who does see a (however small) possibility to
explain how to integrate a Perl interpreter into PostgreSQL,
RESPOND!!!!!!!!!!!

Well I'm a Perl programmer. I don't have a clue about how to integrate
Perl into PostgreSQL, but it should be possible.

I'd like to help. I know Perl, but nothing about the internals of
PostgreSQL and I don't code C.

Maybe Perl isn't the scripting language someone should choose
because it is too limited in its capabilities - remember

Too limited? Perl? You're joking.

that real programmers don't use pascal... - maybe a real
programmer wouldn't ever use Perl...

Then I'm no real programmer. But then again I program in any language
that is needed.

#76Kaare Rasmussen
kar@webline.dk
In reply to: Noname (#72)
Re: [HACKERS] Priorities for 6.6

Really sorry for that. It's not my favorite behaviour to talk
dirty about something I don't know. But after asking kindly
several times I thought making someone angry could work - and
it did :-)

I've never seen your post before. But then again I haven't been on
hackers for very long.

#77Dmitry Samersoff
dms@wplus.net
In reply to: Kaare Rasmussen (#75)
1 attachment(s)
Re: [HACKERS] Priorities for 6.6

On 07-Jun-99 Kaare Rasmussen wrote:

If there's (only one) Perl PROGRAMMER out in the world
reading this, who does see a (however small) possibility to
explain how to integrate a Perl interpreter into PostgreSQL,
RESPOND!!!!!!!!!!!

Easy.
See attachment.

---
Dmitry Samersoff, dms@wplus.net, ICQ:3161705
http://devnull.wplus.net
* There will come soft rains ...

Attachments:

loadmail.pl (application/octet-stream, 3402 bytes)
#78Noname
wieck@debis.com
In reply to: Dmitry Samersoff (#77)
Re: [HACKERS] Priorities for 6.6

Dmitry Samersoff wrote:

On 07-Jun-99 Kaare Rasmussen wrote:

If there's (only one) Perl PROGRAMMER out in the world
reading this, who does see a (however small) possibility to
explain how to integrate a Perl interpreter into PostgreSQL,
RESPOND!!!!!!!!!!!

Easy.
See attachment.

Content-Disposition: attachment; filename="loadmail.pl"

Dmitry,

it's well known that a Perl script can access a PostgreSQL
database. But that's not the thing we're looking for.

For building PL/Perl, the Perl INTERPRETER must be INSIDE the
backend. Only if it is part of the PostgreSQL backend can it
have access to the SPI. It will not work to spawn off an
external interpreter which then contacts the database back
via Pg. Thus, there must be some way to link a shared object
against the Perl libraries, so that when the PostgreSQL
backend loads our shared object it can call functions in
the Perl library.

The attachment you've sent is simply a Perl script that does
some db access. Nice, but not the point. Please show us how
easy it is to do what we want.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #

#79Noname
wieck@debis.com
In reply to: Kaare Rasmussen (#75)
Re: [HACKERS] Priorities for 6.6

Kaare Rasmussen wrote:

If there's (only one) Perl PROGRAMMER out in the world
reading this, who does see a (however small) possibility to
explain how to integrate a Perl interpreter into PostgreSQL,
RESPOND!!!!!!!!!!!

Well I'm a Perl programmer. I don't have a clue about how to integrate
Perl into PostgreSQL, but it should be possible.

I'd like to help. I know Perl, but nothing about the internals of
PostgreSQL and I don't code C.

That's the point, and why I wrote PROGRAMMER in capitals (REAL
PROGRAMMERS STILL THINK AND TALK IN CAPITALS). It's not Perl
script-writing that's needed at this stage. We need
programmers who are familiar with the C API of Perl and
could easily write things like the Pg package. And even that's
still not enough knowledge.

Maybe Perl isn't the scripting language someone should choose
because it is too limited in its capabilities - remember

Too limited? Perl? You're joking.

No, I wasn't joking. Up to now there's only one person who
has said that what we need is possible, but that it wouldn't be
as easy as it should be. I wasn't talking about how powerful the
Perl language is. And I know that it's easy to integrate
anything into Perl. But after all, it's then a Perl script that
has the entire control.

This time, the Perl interpreter has to become a silly little
working slave. Being quiet until it's called, and quiet
again after having served one function call, until the big
master PostgreSQL calls it again.

This flexibility requires a really good design of the
interpreter's internals. And that's what I'm addressing here.

that real programmers don't use pascal... - maybe a real
programmer wouldn't ever use Perl...

Then I'm no real programmer. But then again I program in any language
that is needed.

You aren't - you're a script writer, and that's today's quiche
eater :-)

The term "Real Programmer" is something any hacker should
know!

Top of the article "Real Programmers Don't Use Pascal":

<<Back in the good old days -- the "Golden Era" of computers,
it was easy to separate the men from the boys (sometimes
called "Real Men" and "Quiche Eaters" in the literature).
During this period, the Real Men were the ones that
understood computer programming, and the Quiche Eaters were
the ones that didn't. ...>>

Take a look at

http://burks.bton.ac.uk/burks/foldoc/33/86.htm

and follow the links - enjoy.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #

#80Oleg Broytmann
phd@emerald.netskate.ru
In reply to: Noname (#79)
Re: [HACKERS] PL/Lang (was: Priorities for 6.6)

Hello!

On Tue, 8 Jun 1999, Jan Wieck wrote:

This time, the Perl interpreter has to become a silly little
working slave. Being quiet until it's called, and quiet
again after having served one function call, until the big
master PostgreSQL calls it again.

This flexibility requires a really good design of the
interpreter's internals. And that's what I'm addressing here.

I know exactly 1 (one) program that incorporates (embeds) a Perl
interpreter - the editor VIM (the well-known vi clone from www.vim.org).
I think anyone who wants to learn how to embed perl may start by looking
into the vim sources. Once I tried to compile vim+perl myself, but perl
didn't work. I am a perl-hater, so that probably was the reason.
Another example - mod_perl for Apache - is a rather bad example, as
mod_perl is too big, overcomplicated and too perlish :)

VIM can also be compiled with a builtin Python interpreter, and I had no
problem compiling and using vim+python. Python is well known for its
extending and embedding capabilities. mod_python (it is called PyApache)
is a small and elegant example of how to embed python, but of course it
is not as powerful as mod_perl (one cannot touch Apache internals from
mod_python, though the author is leading PyApache development this way).
Yes, I am biased toward Python, but I cannot say "I recommend embedding
Python to construct PL/Python" - I have no time to lead the development,
and I doubt there are many pythoners here (D'Arcy?).


Oleg.
----
Oleg Broytmann http://members.xoom.com/phd2/ phd2@earthling.net
Programmers don't die, they just GOSUB without RETURN.

#81Dmitry Samersoff
dms@wplus.net
In reply to: Noname (#79)
Re: [HACKERS] Priorities for 6.6

On 08-Jun-99 Jan Wieck wrote:

Kaare Rasmussen wrote:

If there's (only one) Perl PROGRAMMER out in the world
reading this, who does see a (however small) possibility to
explain how to integrate a Perl interpreter into PostgreSQL,
RESPOND!!!!!!!!!!!

Well I'm a Perl programmer. I don't have a clue about how to integrate
Perl into PostgreSQL, but it should be possible.

I'd like to help. I know Perl, but nothing about the internals of
PostgreSQL and I don't code C.

That's the point and why I wrote programmer in capitals (REAL
PROGRAMMERS STILL THINK AND TALK IN CAPITALS). It's not Perl
script-writing that's needed at this stage. We need
programmers who are familiar with the C API of Perl and
could easily write things like the Pg package. And that's
still not enough knowledge.

Ok!
I have no problem writing a C package like Pg.pm to use with Perl 5.x.

However, IMHO all tasks requiring such packages
can better be done using C++ and the STL.

---
Dmitry Samersoff, dms@wplus.net, ICQ:3161705
http://devnull.wplus.net
* There will come soft rains ...

#82Hannu Krosing
hannu@trust.ee
In reply to: Oleg Broytmann (#80)
Re: [HACKERS] PL/Lang (was: Priorities for 6.6)

Oleg Broytmann wrote:

Hello!

VIM can also be compiled with a builtin Python interpreter, and I had no
problem compiling and using vim+python. Python is well known for its
extending and embedding capabilities. mod_python (it is called PyApache) is
a small and elegant example of how to embed Python, but of course it is not
as powerful as mod_perl (one cannot touch Apache internals from mod_python,
though the author led PyApache development in this direction).

Actually, about 1.5 years ago it used to allow access to the internals,
but seemingly nobody used it, and so it was thrown out in later versions.

Yes, I am biased toward Python, but I cannot say "I recommend embedding
Python to construct PL/Python" - I have no time to lead the development,
and I doubt there are many pythoners here (D'Arcy?).

I have contemplated it several times, but to be really useful we would
first need a nice interface for returning "tables" from PL functions.

I suspect this is not trivial to add?

With it I could use PL/Python to make all kinds of external objects -
mailboxes (both local and POP/IMAP/NNTP), conf files (/etc/passwd,
pg_hba.conf), DNS/LDAP/... queries, or any other nice things reachable
through existing Python modules - available to postgres queries.

-----------------
Hannu

#83Michael Robinson
robinson@netrinsics.com
In reply to: Hannu Krosing (#82)
Re: [HACKERS] PL/Lang (was: Priorities for 6.6)

Oleg Broytmann <phd@emerald.netskate.ru> writes:

I know exactly one program that embeds a Perl interpreter
- the editor VIM (the well-known vi clone from www.vim.org). I think anyone
who wants to learn how to embed Perl could start by looking into the vim sources.

Also now GIMP.

Yes, I am biased toward Python, but I cannot say "I recommend embedding
Python to construct PL/Python" - I have no time to lead the development,
and I doubt there are many pythoners here (D'Arcy?).

I went down that road a little ways to see what it looked like, and decided
that was the wrong road to take.

The reason people want an embedded procedural language is that procedures
in such a language have access to the guts of the backend, and can perform
many operations much more efficiently than having to push everything
through the FE->SQL->compiler->executor->tuple list->FE->lather->rinse->repeat
bottleneck.
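This bottleneck can be sketched with a toy model (pure illustration: the `ToyBackend` class and all its names are invented, and a real FE/BE round trip costs far more than an integer increment):

```python
# Toy model: a client-side procedure pays one protocol crossing per
# tuple, while an "embedded" procedure runs in-process and only the
# final result crosses the FE/BE boundary.
class ToyBackend:
    def __init__(self, rows):
        self.rows = rows
        self.crossings = 0          # simulated FE<->BE round trips

    def fetch_all(self):
        # Client-side processing: every tuple crosses the boundary.
        for row in self.rows:
            self.crossings += 1
            yield row

    def run_procedure(self, proc):
        # "Embedded PL": the procedure runs inside the backend;
        # only the result crosses the boundary.
        result = proc(self.rows)
        self.crossings += 1
        return result

backend = ToyBackend(list(range(10000)))

# Frontend sums the even rows itself: 10000 crossings.
fe_sum = sum(r for r in backend.fetch_all() if r % 2 == 0)
fe_crossings = backend.crossings

backend.crossings = 0
# The same work done in-process needs a single crossing.
be_sum = backend.run_procedure(lambda rows: sum(r for r in rows if r % 2 == 0))
```

The answers are identical; only the number of boundary crossings differs.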

I decided that the proper solution was to expose all the internal guts of
the backend through a proper CORBA interface. That way, any language with
an ORB could act as an embedded procedural language.

Currently, I'm working on embedding ORBit into Python to get my skills up
to speed. When that's accomplished, I intend to tackle PostgreSQL. For
a proof of concept, you can look at mod_corba (ORBit embedded in Apache,
exposing the Apache API).

-Michael Robinson

#84Bernard Frankpitt
frankpit@pop.dn.net
In reply to: Noname (#79)
Re: [HACKERS] Priorities for 6.6

I re-read "Real Programmers Don't Use Pascal" ---- it beats writing
this damned proposal I'm working on. If the line about Real Programmers
using gotos is anything to go by, then you should nominate Vadim as the
v6.5 Real Programmer. You should _see_ all those gotos in nbtree.c!
He's done a fine job with them too.

Bernie

#85Philip Warner
pjw@rhyme.com.au
In reply to: Michael Robinson (#83)
Re: [HACKERS] PL/Lang (was: Priorities for 6.6)

At 14:40 8/06/99 +0400, you wrote:

Hello!

On Tue, 8 Jun 1999, Jan Wieck wrote:

This time, the Perl interpreter has to become a silly little
working slave. Being quiet until it's called and quiet
again after having served one function call until the big
master PostgreSQL calls him again.

This flexibility requires a real good design of the
interpreter's internals. And that's what I'm addressing here.

I know exactly one program that embeds a Perl interpreter
- the editor VIM (the well-known vi clone from www.vim.org). I think anyone

Apache also has Perl very nicely embedded (as opposed to available through indirect CGI calls); when it is embedded it automatically reloads and recompiles changed scripts.

I presume that the Apache code may be useful in this process.

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: +61-03-5367 7422 | _________ \
Fax: +61-03-5367 7430 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/

#86Noname
wieck@debis.com
In reply to: Michael Robinson (#83)
Re: [HACKERS] PL/Lang (was: Priorities for 6.6)

Michael Robinson wrote:

The reason people want an embedded procedural language is that procedures
in such a language have access to the guts of the backend, and can perform
many operations much more efficiently than having to push everything
through the FE->SQL->compiler->executor->tuple list->FE->lather->rinse->repeat
bottleneck.

That's one reason. Another is that you can create stored
procedures that get triggered on events
(INSERT/UPDATE/DELETE) and can then perform referential
integrity checks and other things.

Some of them could also be done by constraints (if we
someday have the rule system powerful enough to do it
correctly). Some can't.
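A minimal sketch of such a trigger in PL/pgSQL (the table, column, and function names are invented for illustration; the syntax follows the 6.x-era quoted-body conventions):

```sql
-- Reject an order row whose customer doesn't exist, regardless of
-- which client or access path performed the INSERT/UPDATE.
CREATE FUNCTION check_customer() RETURNS opaque AS '
DECLARE
    c RECORD;
BEGIN
    SELECT INTO c * FROM customer WHERE cust_id = NEW.cust_id;
    IF NOT FOUND THEN
        RAISE EXCEPTION ''customer % not found'', NEW.cust_id;
    END IF;
    RETURN NEW;
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER orders_ri BEFORE INSERT OR UPDATE ON orders
    FOR EACH ROW EXECUTE PROCEDURE check_customer();
```

Because the trigger fires inside the backend, no client can bypass the check.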

I decided that the proper solution was to expose all the internal guts of
the backend through a proper CORBA interface. That way, any language with
an ORB could act as an embedded procedural language.

And how does your CORBA get triggered when someone
uses the good old psql? Or MUST everything then talk CORBA,
with every other kind of access disabled completely? Note
that for a trigger that has to ensure referential integrity
it's not enough to say "it will be triggered if the user uses
the correct access path". It has to ensure that the user
doesn't use the wrong one!

Jan


#87Noname
wieck@debis.com
In reply to: Bernard Frankpitt (#84)
Real Programmers (was: [HACKERS] Priorities for 6.6)

I re-read "Real Programmers Don't Use Pascal" ---- it beats writing
this damned proposal I'm working on. If the line about Real Programmers
using gotos is anything to go by, then you should nominate Vadim as the
v6.5 Real Programmer. You should _see_ all those gotos in nbtree.c!
He's done a fine job with them too.

Vadim is surely one of the real programmers in our project.
It's not only that he isn't afraid of using GOTOs. He also
knows very well how to speed things up by complicating the
code :-)

Jan


#88Dmitry Samersoff
dms@wplus.net
In reply to: Philip Warner (#85)
Re: [HACKERS] PL/Lang (was: Priorities for 6.6)

On 09-Jun-99 Philip Warner wrote:

At 14:40 8/06/99 +0400, you wrote:

Hello!

On Tue, 8 Jun 1999, Jan Wieck wrote:

This time, the Perl interpreter has to become a silly little
working slave. Being quiet until it's called and quiet
again after having served one function call until the big
master PostgreSQL calls him again.

This flexibility requires a real good design of the
interpreter's internals. And that's what I'm addressing here.

I know exactly one program that embeds a Perl interpreter
- the editor VIM (the well-known vi clone from www.vim.org). I think anyone

Apache also has Perl very nicely embedded (as opposed to available through
indirect CGI calls); when it is embedded it automatically reloads and
recompiles changed scripts.

IMHO, it's bad practice to embed Perl, C++ and so on into postgres,
because it slows postgres down and increases memory requirements
and the number of leaks and errors.

Postgres should use its own language, like plpgsql, and
it's better to focus on improving and speeding it up.

For example:
Add pre-compilation and a syntax check when CREATE FUNCTION is
executed
Add some string and regex manipulation functions.
Add exception handling.

All completely non-standard things may (and should) be done outside of postgres,
or in the worst case by the DYNALOAD mechanism.

You can look at Apache's mod_perl and mod_php3 to compare the two ways
mentioned above:

First - embedding Perl, with all its history and lots of functions completely
unnecessary and inconvenient for web programming.

Second - php3 - a language initially developed to be embedded into Apache.


#89Michael Robinson
robinson@netrinsics.com
In reply to: Noname (#86)
Re: [HACKERS] PL/Lang (was: Priorities for 6.6)

wieck@debis.com (Jan Wieck) writes:

That's one reason. Another one is that you can create stored
procedures that get triggered on events
(INSERT/UPDATE/DELETE) and can perform referential integrity
checks and other things then.

Some of them could also be done by constraints (if we
sometimes have the rule system powerful enought to do it
correctly). Some can't.

Yes, this is true. However, between SQL and PL/pgSQL, I think we have
this covered, and I don't see a lot of urgency for adding new languages
just for this purpose.

And how does your CORBA get triggered when someone
uses the good old psql? Or MUST everything then talk CORBA,
with every other kind of access disabled completely? Note
that for a trigger that has to ensure referential integrity
it's not enough to say "it will be triggered if the user uses
the correct access path". It has to ensure that the user
doesn't use the wrong one!

Well, that's the nice thing about ORBit. You can link two CORBA-connected
systems into one binary, and ORBit will give a clean and efficient in-process
connection between the two. So, just as pgsql can expose its guts via
CORBA, so, too, can the programming runtime of your choice. All that's
required (in theory) is a one-time wrapper for pgsql's current
embedded-language API, and you don't have to mess with the pgsql side
ever again. Of course, this win only applies to languages with ORBit
bindings (currently C, with C++, Python, Ada, and several others in the pipeline).

But, again, I don't see a lot of urgency for this kind of solution.

-Michael Robinson

#90Philip Warner
pjw@rhyme.com.au
In reply to: Dmitry Samersoff (#88)
Re: [HACKERS] PL/Lang (was: Priorities for 6.6)

At 13:34 9/06/99 +0400, you wrote:

IMHO, it's bad practice to embed Perl, C++ and so on into postgres,
because it slows postgres down and increases memory requirements
and the number of leaks and errors.

You can't possibly mean to say Perl is a slow, leaky memory hog! ;-}

Do I detect a conspiracy here? (Oleg?)

Postgres should use its own language, like plpgsql, and
it's better to focus on improving and speeding it up.

For example:
Add pre-compilation and a syntax check when CREATE FUNCTION is
executed

At least a syntax check...

Add some string and regex manipulation functions.
Add exception handling.

These all sound like a good idea.

Jan: If I volunteer to attempt any of these, can you provide advice?


#91Oleg Broytmann
phd@emerald.netskate.ru
In reply to: Philip Warner (#90)
Re: [HACKERS] PL/Lang (was: Priorities for 6.6)

On Wed, 9 Jun 1999, Philip Warner wrote:

You can't possibly mean to say Perl is a slow, leaky memory hog! ;-}
Do I detect a conspiracy here? (Oleg?)

I am not in the language wars, sorry :)

For example:
Add pre-compilation and a syntax check when CREATE FUNCTION is
executed

At least a syntax check...

Python can store compiled bytecodes on disk.

Add exception handling.

Python is definitely good at this.
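Both points can be seen with a few lines of stock Python (nothing here is PostgreSQL-specific; the function name is invented for illustration):

```python
import marshal

# 1. Compiled bytecode can be serialized and reloaded, so a PL/Python
#    could cache compiled function bodies between calls.
code = compile("result = 6 * 7", "<pl_func>", "exec")
blob = marshal.dumps(code)        # what would be written to disk
restored = marshal.loads(blob)
ns = {}
exec(restored, ns)                # ns["result"] is now 42

# 2. Errors surface as structured exceptions that a call handler
#    could catch and report instead of crashing the backend.
def call_pl_function(src):
    try:
        exec(compile(src, "<pl_func>", "exec"), {})
        return "ok"
    except Exception as err:
        return "error: %s" % err

status = call_pl_function("1 / 0")   # caught, not a crash
```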

I'll be following the discussion.


Oleg.

#92Dmitry Samersoff
dms@wplus.net
In reply to: Philip Warner (#90)
Re: [HACKERS] PL/Lang (was: Priorities for 6.6)

On 09-Jun-99 Philip Warner wrote:

At 13:34 9/06/99 +0400, you wrote:

IMHO, it's bad practice to embed Perl, C++ and so on into postgres,
because it slows postgres down and increases memory requirements
and the number of leaks and errors.

You can't possibly mean to say Perl is a slow, leaky memory hog! ;-}

No - it's just a huge bug ;-))
I like Perl - but it's really too big to be combined with something else.

Do I detect a conspiracy here? (Oleg?)

I hope not. No masquerading is installed - I'm still Dmitry. ;-))

Postgres should use its own language, like plpgsql, and
it's better to focus on improving and speeding it up.

For example:
Add pre-compilation and a syntax check when CREATE FUNCTION is
executed

At least a syntax check...

Add some string and regex manipulation functions.
Add exception handling.

These all sound like a good idea.

Thanks

Jan: If I volunteer to attempt any of these, can you provide advice?


#93Kaare Rasmussen
kar@webline.dk
In reply to: Noname (#79)
Re: [HACKERS] Priorities for 6.6

Perl language is. And I know that it's easy to integrate
anything into Perl. But after all, it's a Perl script that
has the entire control.

I don't think so, but then again I'm only speculating. If the Apache
people can embed an entire Perl interpreter in their Web server,
shouldn't it be possible for PostgreSQL? Or maybe the Apache people are
REALLY REAL PROGRAMMERS? :-)

This flexibility requires a real good design of the
interpreters internals. And that's what I'm addressing here.

As I said, I don't code C. I haven't got the time to learn it right
now, and not the time to learn PostgreSQL's internals. If my offer to
help with any Perl question I can help with is below your standards,
I'm sorry.

You aren't - you're a script writer and that's today's quiche
eater :-)

That remark shows that you know nothing about Perl. But it's okay; be
ignorant in your own little way ;-]

Btw. How do you define script writing as opposed to programming?

#94Hannu Krosing
hannu@trust.ee
In reply to: Dmitry Samersoff (#88)
Re: [HACKERS] PL/Lang (was: Priorities for 6.6)

Dmitry Samersoff wrote:

IMHO, it's bad practice to embed Perl, C++ and so on into postgres,
because it slows postgres down and increases memory requirements
and the number of leaks and errors.

Postgres should use its own language, like plpgsql, and
it's better to focus on improving and speeding it up.

For example:
Add pre-compilation and a syntax check when CREATE FUNCTION is
executed
Add some string and regex manipulation functions.
Add exception handling.

What we really need is a better PL interface; the current one has
quite a few limitations. CORBA may help here, but _not_ as a simple
CORBA wrapper of the existing API.

All completely non-standard things may (and should) be done outside of postgres,
or in the worst case by the DYNALOAD mechanism.

Currently we are doing it in your "worst case" way :)

v6.5 even has special scripts to create/destroy PLs. Only SQL and
internal (compiled-in C) are "embedded" in a non-removable way.

Even PL/pgSQL must be installed before it can be used.

You can look at Apache's mod_perl and mod_php3 to compare the two ways
mentioned above:

First - embedding Perl, with all its history and lots of functions completely
unnecessary and inconvenient for web programming.

Second - php3 - a language initially developed to be embedded into Apache.

Compare this to Zope - using an advanced language to craft a special
tool (in this case an application server), which can both be used from
other servers and has its own server (also written in Python), which can
outperform Apache in many usage patterns. It even has a small SQL server
(gadfly, also written in Python) embedded, both as an example of an SQL
adapter and for smaller-scale work.
I think this achieves both the slickness of php3 and the extendability
of perl.

Now - what has this to do with embedding languages in PostgreSQL?

IMHO having better support for PLs should go further than just calling
functions in SELECT or triggers - to make it possible for people to
easily add new types/indexing schemes, the PL should also be usable for
input/output functions, access methods, and maybe even for prototyping a
new fe/be protocol.

I hope to get a draft implementation of this in 6.5 before its official
launch :)

-------------------
Hannu

#95Noname
wieck@debis.com
In reply to: Kaare Rasmussen (#93)
Re: [HACKERS] Priorities for 6.6

Kaare Rasmussen wrote:

shouldn't it be possible for PostgreSQL? Or maybe the Apache people are
REALLY REAL PROGRAMMERS? :-)

There are some.

You aren't - you're a script writer and that's today's quiche
eater :-)

That remark shows that you know nothing about Perl. But it's okay; be
ignorant in your own little way ;-]

Kaare,

I would never really flame on a list like this. And I
personally prefer scripts wherever possible. Sometimes I
can't resist writing some humor - it's just that my kind of
humor is a little hard to understand. But real programmers
don't care if other humans understand them as long as their
computers do. But even then, they have a programming problem,
and totally don't care any more about humans and their
communication problems.

Btw. How do you define script writing as opposed to programming?

#define SCRIPT_WRITING "modern form of quiche eating"
#define PROGRAMMING (((READABILITY_OF_WRITTEN_CODING <= 0 && \
ABLE_TO_WRITE_FORTRAN_STYLE_IN_C > 4) || \
USES_GOTO_EXCESSIVELY) ? TRUE : FALSE)

Disclaimer: This entire message isn't serious!

Jan


#96Noname
wieck@debis.com
In reply to: Hannu Krosing (#94)
Re: [HACKERS] PL/Lang (was: Priorities for 6.6)

Hannu Krosing wrote:

What we really need is a better PL interface; the current one has
quite a few limitations. CORBA may help here, but _not_ as a simple
CORBA wrapper of the existing API.

Actually I'm a little in doubt whether calling it an interface
isn't going too far. The only difference (from the call
handler's point of view) is that the fmgr calls one and the
same C entry point for different functions and therefore
passes the OID of the function meant. So in reality it's
more another call technique than an interface.

On the point of CORBA I say: if someone can create a general
interface that makes integration of external
languages/applications so easy that little old ladies and 12
year old pacman players can do it, then he should. If that
new, fantastic, glorious technique is based on CORBA or COBOL, I
wouldn't mind - I would shout a welcome!

I'm not able to produce such a magic thing. All I can create
are things like PL/Tcl, PL/pgSQL, the actual pain of the rule
system and some other ugly ones. And as long as no one comes
up with the above piece of magic, I'll go on providing the
things I can create.

It was hard to write them, so it should be hard to use them.
I only try to make it not as hard as learning an entirely new
language from scratch. In the PL/Tcl tree there is a little
test suite. I'm sure that a Perl expert could create the
same suite in a few minutes (as I did) - but only in
PL/Perl - never in PL/Tcl, because there he would have a
larger learning curve ahead.

Let people use the languages they're familiar with, and
you'll get good, reliable, performant applications. Force them
to use something they've never seen before and they'll "shoot
themselves in the foot".

To understand this one someone must (re)read:

http://www.cs.rpi.edu/~edwardsb/shoot.in.foot.html

All completely non standard thing may (and should) be done outside of postgres
or in worst case by DYNALOAD mechanic.

Currently we are doing it in your "worst case" way :)

the v6.5 has even special scripts to create/destroy PLs. Only SQL and
internal (compiled-in C) are "embedded" in non-removable way.

Even PL/PGSQL must be installed to be used

And I don't think that having PL/pgSQL follow the
generalized standard way all PLs must go is the worst case.
IMHO it's the best of all cases. Think of other standards -
ALL Windows applications must use the Windows API - except
for those created by M$. Maybe they have a good reason not to
use the Windows API - they know its internals.

IMHO having better support for PLs should go further than just calling
functions in SELECT or triggers - to make it possible for people to
easily add new types/indexing schemes, the PL should also be usable for
input/output functions, access methods, and maybe even for prototyping a
new fe/be protocol.

For PL/Tcl, this time will surely come. When Tcl 8.1 has
settled, so that 8.0 can be considered the release that has to
be supported for backward compatibility, I'll kick out the 7.6
support from PL/Tcl. This would again increase its
performance.

The actual limitation that data type input/output functions
cannot be created exists because the call handler doesn't
support it. Definitely, it could! It already has to look up
pg_type to find the input/output functions for the return
value/arguments. If the OID of the in-/out-function in the
pg_type tuple is the one the call handler is actually called
for, why not act accordingly?

The PL/Tcl module you're seeing with v6.5 is mainly still the
one I developed based on Tcl 7.6 (the fact that one and the
same sources actually compile with 7.6 and 8.0 speaks for
itself). All the above features would have required the
Tcl_Obj interface, which was new in 8.0. But at the time I
created it, 7.6 was the default in the install packages of
unices and surely there were many 7.5's still in use. I decided
it would be better to first support most Tcl installations and
later force those who used it to do an upgrade.

I hope to get a draft implementation of this in 6.5 before its official
launch :)

I'll be back...

Jan


#97Vadim Mikheev
vadim@krs.ru
In reply to: Noname (#87)
Re: Real Programmers (was: [HACKERS] Priorities for 6.6)

Jan Wieck wrote:

I re-read "Real Programmers Don't Use Pascal" ---- it beats writing
this damned proposal I'm working on. If the line about Real Programmers
using gotos is anything to go by, then you should nominate Vadim as the
v6.5 Real Programmer. You should _see_ all those gotos in nbtree.c!
He's done a fine job with them too.

Unfortunately, I use loops in 9 cases out of 10, so it seems
I have no chance to win the nomination -:(
Though, you should _see_ the gotos in heapam.c - maybe there
is still some chance for me.
-:)

Actually, I just don't think that breaks in loops are always better
than gotos.

Vadim is surely one of the real programmers in our project.
It's not only that he isn't afraid of using GOTOs. He also

Like someone who wasn't afraid to use siglongjmp in elog.c.

knows very well how to speed things up by complicating the
code :-)

-:)

Vadim

#98Kaare Rasmussen
kar@webline.dk
In reply to: Noname (#95)
Re: [HACKERS] Priorities for 6.6

I would never really flame on a list like this. And I

Who's flaming? I'm just tickling your funny bone. Maybe you'll end up
believing it yourself if nobody tells you otherwise ;-}

#99Noname
wieck@debis.com
In reply to: Vadim Mikheev (#97)
Re: Real Programmers (was: [HACKERS] Priorities for 6.6)

Actually, I just don't think that breaks in loops are always better
than gotos.

Vadim is surely one of the real programmers in our project.
It's not only that he isn't afraid of using GOTOs. He also

Like someone who wasn't afraid to use siglongjmp in elog.c.

There are much better ones in the PL handlers! memcpy()s
mangling sigjmp_bufs between the sigsetjmp()/siglongjmp() stuff.

knows very well how to speed things up by complicating the
code :-)

-:)

Vadim

Jan


#100Vadim Mikheev
vadim@krs.ru
In reply to: Noname (#99)
Re: Real Programmers (was: [HACKERS] Priorities for 6.6)

Jan Wieck wrote:

Actually, I just don't think that breaks in loops are always better
than gotos.

Vadim is surely one of the real programmers in our project.
It's not only that he isn't afraid of using GOTOs. He also

Like someone who wasn't afraid to use siglongjmp in elog.c.

There are much better ones in the PL handlers! memcpy()s
mangling sigjmp_bufs between the sigsetjmp()/siglongjmp() stuff.

Wow! Voodoo!
I like such things very much -:)
This is really the way Real Programmers follow -:)

Vadim

#101A James Lewis
james@fsck.co.uk
In reply to: Vadim Mikheev (#100)
Re: Real Programmers (was: [HACKERS] Priorities for 6.6)

Hey, why don't you just overwrite the jmp instruction with a nop....


A.J. (james@fsck.co.uk)
Ignorance is not knowing.
Stupidity is the active pursuit of ignorance.

#102Noname
wieck@debis.com
In reply to: A James Lewis (#101)
Re: Real Programmers (was: [HACKERS] Priorities for 6.6)

Hey, why don't you just overwrite the jmp instruction with a nop....

Hmmmm - this would require the code segment to be writable,
which it isn't on most modern systems.

But the shared objects are usually compiled with -fPIC
(position independent code), so it should be possible to copy
the code segment part of the PL handlers into a malloc()'ed
area to get it into writable memory and execute it there
through function pointers...

Nice idea, we'll try it with the upcoming PL/Perl handler.

On second thought, there may be another tricky way to
prevent it all. Copy the entire Perl interpreter into
malloc()'ed memory and modify its calls to malloc() and
free(), redirecting them to private ones. Then we have total
control over its allocations, can create an image copy of it
after every few successful calls into another area, and in
the case of a transaction abort reset it to the last valid
state by restoring the copy.

On third thought, we could also do it the Microsoft way. Hook
into the kernel's virtual memory control and trace the
first write operation into each page. At that time we copy the
old page's state to somewhere else. This will save some
allocated memory because we only need restorable copies of
the pages modified since the last save cycle. It requires
hacking down ways to get around access restrictions so the
postmaster is able to patch the OS kernel at startup (it only
requires root permissions so /dev/kmem can be opened for
writing), but since this is definitely the best way to do it,
it's worth the effort.

The result of this work will then become the base for more
changes. If the postmaster is already patching the kernel,
it can also take over process scheduling to optimize the
system for PostgreSQL performance, and we could get rid of
these damned SYSV IPC semaphores. Finally the postmaster will
control a new type of block cache, by mapping parts of the
relations into virtual memory pages of the backends on demand,
avoiding SYSV shared memory too.

Jan


#103A James Lewis
james@fsck.co.uk
In reply to: Noname (#102)
Re: Real Programmers (was: [HACKERS] Priorities for 6.6)

Then again, I never coded assembler on a modern system...... it was fun
though....

Cheers for a great database! If you have to delay 6.5 longer, do it...
it's better to have something stable.

James

On Thu, 10 Jun 1999, Jan Wieck wrote:

Hey, why don't you just overwrite the jmp instruction with a nop....

Hmmmm - this would require that the code segment is writable
what it isn't on most modern systems.

But the shared objects are usually compiled with -fPIC
(position independent code), so it should be possible to copy
the code segment part of the PL handlers into an malloc()'ed
area to get it into writable memory and execute it there over
function pointers...

Nice idea, we'll try it with the upcoming PL/Perl handler.

On second thought, there maybe is another tricky way to
prevent it all. Copy the entire Perl interpreter into
malloc()'ed memory and modify it's calls to malloc(), free()
redirecting them to private ones. Then we have total control
over it's allocations, can create an image copy of it after
each some successful calls into another area and in the case
of a transaction abort reset it to the last valid state by
restoring the copy.

On third thought, we could also do it the Microsoft way.
Hook into the kernel's virtual memory control and trace the
first write operation into each page. At that point we copy
the old page's state to somewhere else. This saves some
allocated memory because we only need restorable copies of
the pages modified since the last save cycle. It requires
hacking up ways to get around access restrictions so the
postmaster is able to patch the OS kernel at startup (it
only requires root permissions so /dev/kmem can be opened
for writing), but since this is definitely the best way to
do it, it's worth the effort.

The result of this work will then become the base for more
changes. If the postmaster is already patching the kernel,
it can also take over process scheduling to optimize the
system for PostgreSQL performance, and we could get rid of
these damned SYSV IPC semaphores. Finally, the postmaster
will control a new type of block cache, mapping parts of the
relations into virtual memory pages of the backends on
demand, avoiding SYSV shared memory too.

Jan


A.J. (james@fsck.co.uk)
Ignorance is not knowing.
Stupidity is the active pursuit of ignorance.

#104 Mark Hollomon
mhh@nortelnetworks.com
In reply to: A James Lewis (#103)
Re: Real Programmers (was: [HACKERS] Priorities for 6.6)

In message "Re: Real Programmers (was: [HACKERS] Priorities for 6.6)",
you write:

On third thought, we could also do it the Microsoft way.
Hook into the kernel's virtual memory control and trace the
first write operation into each page. At that point we copy
the old page's state to somewhere else. This saves some
allocated memory because we only need restorable copies of
the pages modified since the last save cycle. It requires
hacking up ways to get around access restrictions so the
postmaster is able to patch the OS kernel at startup (it
only requires root permissions so /dev/kmem can be opened
for writing), but since this is definitely the best way to
do it, it's worth the effort.

The result of this work will then become the base for more
changes. If the postmaster is already patching the kernel,
it can also take over process scheduling to optimize the
system for PostgreSQL performance, and we could get rid of
these damned SYSV IPC semaphores. Finally, the postmaster
will control a new type of block cache, mapping parts of the
relations into virtual memory pages of the backends on
demand, avoiding SYSV shared memory too.
Or, embed PostgreSQL directly in the OS using the OS toolkit.

#105 Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Hannu Krosing (#20)
Re: [HACKERS] Priorities for 6.6

C. Look over the protocol and unify the _binary_ representations of
datatypes on the wire. In fact each type already has two sets of
in/out conversion functions in its definition tuple, one for disk and
another for the net; it's only that until now they are the same for
all types and thus probably used wrong in some parts of the code.

Added to TODO:

* remove duplicate type in/out functions for disk and net

D. After B. and C., add the possibility of inserting binary data
into a "(small)binary" field without relying on LOs or expensive
(4x the size) quoting. Allow any characters in said binary field.

I will add this to the TODO list if you can tell me how the user
passes this into the backend via a query.

* Add non-large-object binary field

F. As a lousy alternative to 1., fix the LO storage. Currently _all_
of the LO files are kept in the same directory as the tables and
indexes. This can bog down the whole database quite fast if one has
lots of LOs and a file system that does linear scans on open (like
ext2). A scheme where LOs are kept in subdirectories based on the hex
representation of their OIDs would avoid that (so an LO with OID
0x12345678 would be stored in $PG_DATA/DBNAME/LO/12/34/56/78.lo, or
maybe reversed, $PG_DATA/DBNAME/LO/78/56/34/12.lo, to distribute them
more evenly in "buckets").

I have already added a TODO item to use hash directories for large
objects. Probably single- or double-level 256-way directory buckets
are enough:

04/4A/file
09/B3/file

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#106 Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Tom Lane (#25)
Re: [HACKERS] Priorities for 6.6

OK, question answered, TODO item added:

* Add non-large-object binary field

Is this doable? I just looked at the list of datatypes and didn't see
binary as one of them.

bytea ... even if we didn't have one, inventing it would be trivial.
(Although I wonder whether pg_dump copes with arbitrary data in fields
properly ... I think there are still some issues about COPY protocol
not being fully 8-bit-clean...)

As someone else pointed out, you'd still want an equivalent of
lo_read/lo_write, but now it would mean fetch or put N bytes at an
offset of M bytes within the value of field X of tuple Y in some
relation. Otherwise field X is pretty much like any other item in the
database. I suppose it'd only make sense to allow random data to be
fetched/stored in a bytea field --- other datatypes would want to
constrain the data to valid values...

regards, tom lane
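The lo_read/lo_write equivalent Tom describes amounts to byte-range get/put on a field's value. A minimal sketch of those semantics on an in-memory value (the names `bytea_read`/`bytea_write` are hypothetical helpers, not a real API):

```python
def bytea_read(value: bytes, offset: int, n: int) -> bytes:
    # lo_read-style fetch: N bytes starting at a 0-based offset M.
    return value[offset:offset + n]

def bytea_write(value: bytes, offset: int, data: bytes) -> bytes:
    # lo_write-style put: overwrite bytes at the offset, zero-padding
    # if the write starts past the current end of the value.
    if offset > len(value):
        value += b"\x00" * (offset - len(value))
    return value[:offset] + data + value[offset + len(data):]

v = b"hello world"
print(bytea_read(v, 6, 5))          # b'world'
print(bytea_write(v, 6, b"there"))  # b'hello there'
```

The point of the interface is that the client never has to fetch or resend the whole value just to touch a small range of it.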
