Joins and links
Hello all!
You probably remember me - recently I complained about the speed
of joins in Postgres. After a short investigation, a way was
found to make Postgres's optimizer do the right job. It
was a constructive discussion. Now I want to tell you what could
make Postgres better and faster, and what would make us (our
development group) happy. Maybe I am bothering someone; if
I am, tell me.
Let me begin.
First of all, some accounting reports need to be delivered
very fast - within minutes or so. What's bad is that
quite a few of these reports are time-consuming and
search-intensive. In particular, these reports internally
involve a lot of joins on tables.
Secondly, almost all accounting information naturally
fits the network data model, which can be implemented very
efficiently.
What is described here is not accounting-specific; it
can be found in every database which uses master-detail
tables and other such relations.
So, how is a join performed in such cases? Although I am
not an expert, I can imagine the way: first there is an (index)
scan on the first table, and then an (index) scan on the second.
That is the best case; reality can be much worse, as we have seen.
How can we radically improve performance in such cases? There
is a simple and quite obvious way. (So you don't think I am
hallucinating, I will mention that there exist real
servers that offer the features I am talking about.)
We should make a real reference from one table to another! That
means there could be a special data type called, say, "link",
which is a physical record number in the foreign table.
Queries could look like this:

table1:
    a int4
    b link (-> table2)

table2:
    c int4
    recnum (system auxiliary field, really a record number in the table)

select * from table1, table2 where table1.a > 5 and table1.b = table2.recnum
Such joins can fly really fast, as practice shows :)
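To make the proposal concrete, here is a sketch of what the definitions might look like if such a "link" type existed. The syntax is entirely hypothetical - nothing like it exists in Postgres:

-- Hypothetical syntax: there is no "link" type in Postgres.
CREATE TABLE table2 (
    c int4
    -- recnum would be a hidden system field, like oid or ctid
);

CREATE TABLE table1 (
    a int4,
    b link REFERENCES table2  -- stores table2's physical record number
);

-- The join then needs no index at all:
SELECT * FROM table1, table2
WHERE table1.a > 5 AND table1.b = table2.recnum;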
Just consider: table1.b = table2.recnum is a READY-MADE
join, so the server doesn't have to build anything on top of it. It
can simply perform a lookup through the link, and since the link is a
physical record number, this is done with the efficiency of C
pointers! Thus the performance gain is ENORMOUS.
And it simplifies the optimizer, because it doesn't have to decide
anything about whether to use indices and the like. The join is
always performed the same way, and it is the best way.
This feature, once implemented, could put Postgres ahead
of most commercial servers, proving the creative abilities of
the free software community. Let us take a step into the future!
Best regards,
Leon
At 15:46 +0300 on 05/07/1999, Leon wrote:
How can we radically improve performance in such cases? There
is a simple and quite obvious way. (So you don't think I am
hallucinating, I will mention that there exist real
servers that offer the features I am talking about.)
We should make a real reference from one table to another! That
means there could be a special data type called, say, "link",
which is a physical record number in the foreign table.
Queries could look like this:
table1:
    a int4
    b link (-> table2)

table2:
    c int4
    recnum (system auxiliary field, really a record number in the table)

select * from table1, table2 where table1.a > 5 and table1.b = table2.recnum
Such joins can fly really fast, as practice shows :)
If you are interested in such a feature, I would send it to the hackers
list and not the general list, which is not intended for development
issues, but for general problems and issues with existing versions.
The best would be, of course, to get hold of CVS and develop the needed
code yourself. That's what open software is all about. Perhaps if it's so
important to you, you could pay PostgreSQL incorporated, buying programmer
time for this feature.
In any case, a message to the hackers list may help you understand how
things are implemented in Postgres and how much work will be needed for
such a development. On the face of it, I can see several problems with this
idea, namely inefficiency in deletions, the need for fixed-length records
with no ability to add or drop columns or have variable-length fields
without maximum length, and needing to know all the tables that reference
a table for the sake of vacuuming. But maybe the hackers (who write
Postgres) would think differently. Ask them.
Herouth
--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma
Leon,
I have a few questions about what you are proposing.
My background was in a non-SQL DBMS (DataFlex) which supported recnums,
which, as you pointed out, have a number of advantages. However, recnums
also have a significant number of problems. Some features like
replication have significant difficulties. Others, such as exporting and
re-loading your data, also need special work-arounds.
A solution I have seen in some SQL DBMSs (e.g. MS SQL Server) is to be able
to choose one index and have the database table sorted by this index.
Then you don't need a separate index for that sort order. It means that
almost any index can work like a recnum and avoid looking in both the
index and the data. I am trying to remember the name of this feature but
cannot at the moment.
Regards
Dave
--
David Warnock
Sundayta Ltd
Leon,
I have a few questions about what you are proposing.
My background was in a non-SQL DBMS (DataFlex) which supported recnums,
which, as you pointed out, have a number of advantages. However, recnums
also have a significant number of problems. Some features like
replication have significant difficulties. Others, such as exporting and
re-loading your data, also need special work-arounds.
A solution I have seen in some SQL DBMSs (e.g. MS SQL Server) is to be able
to choose one index and have the database table sorted by this index.
Then you don't need a separate index for that sort order. It means that
almost any index can work like a recnum and avoid looking in both the
index and the data. I am trying to remember the name of this feature but
cannot at the moment.
We have the CLUSTER command, but it has to be run manually.
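For illustration, with the CLUSTER syntax of that era (the table and index names here are made up):

-- Reorder the table on disk to match an existing index.
-- This is a one-shot operation: later inserts and updates
-- are not kept in index order, so it must be re-run by hand.
CREATE INDEX order_id_idx ON orders (order_id);
CLUSTER order_id_idx ON orders;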
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Hello hackers!
I posted this message to the general mailing list, and was told
that the hackers' list is a more appropriate place to post it.
What do you say?
This is a forwarded message
From: Leon <leon@udmnet.ru>
To: pgsql-general <pgsql-general@postgreSQL.org>
Date: Monday, July 05, 1999, 5:46:36 PM
Subject: Joins and links
===8<==============Original message text===============
Hello all!
You probably remember me - recently I complained about the speed
of joins in Postgres. After a short investigation, a way was
found to make Postgres's optimizer do the right job. It
was a constructive discussion. Now I want to tell you what could
make Postgres better and faster, and what would make us (our
development group) happy. Maybe I am bothering someone; if
I am, tell me.
Let me begin.
First of all, some accounting reports need to be delivered
very fast - within minutes or so. What's bad is that
quite a few of these reports are time-consuming and
search-intensive. In particular, these reports internally
involve a lot of joins on tables.
Secondly, almost all accounting information naturally
fits the network data model, which can be implemented very
efficiently.
What is described here is not accounting-specific; it
can be found in every database which uses master-detail
tables and other such relations.
So, how is a join performed in such cases? Although I am
not an expert, I can imagine the way: first there is an (index)
scan on the first table, and then an (index) scan on the second.
That is the best case; reality can be much worse, as we have seen.
How can we radically improve performance in such cases? There
is a simple and quite obvious way. (So you don't think I am
hallucinating, I will mention that there exist real
servers that offer the features I am talking about.)
We should make a real reference from one table to another! That
means there could be a special data type called, say, "link",
which is a physical record number in the foreign table.
Queries could look like this:

table1:
    a int4
    b link (-> table2)

table2:
    c int4
    recnum (system auxiliary field, really a record number in the table)

select * from table1, table2 where table1.a > 5 and table1.b = table2.recnum
Such joins can fly really fast, as practice shows :)
Just consider: table1.b = table2.recnum is a READY-MADE
join, so the server doesn't have to build anything on top of it. It
can simply perform a lookup through the link, and since the link is a
physical record number, this is done with the efficiency of C
pointers! Thus the performance gain is ENORMOUS.
And it simplifies the optimizer, because it doesn't have to decide
anything about whether to use indices and the like. The join is
always performed the same way, and it is the best way.
This feature, once implemented, could put Postgres ahead
of most commercial servers, proving the creative abilities of
the free software community. Let us take a step into the future!
Best regards,
Leon
===8<===========End of original message text===========
Best regards,
Leon mailto:leon@udmnet.ru
David Warnock wrote:
A solution I have seen in some SQL DBMSs (e.g. MS SQL Server) is to be able
to choose one index and have the database table sorted by this index.
Then you don't need a separate index for that sort order. It means that
almost any index can work like a recnum and avoid looking in both the
index and the data. I am trying to remember the name of this feature but
cannot at the moment.
CLUSTER ?
--
Maarten Boekhold, boekhold@tibco.com
TIBCO Finance Technology Inc.
The Atrium
Strawinskylaan 3051
1077 ZX Amsterdam, The Netherlands
tel: +31 20 3012158, fax: +31 20 3012358
http://www.tibco.com
Maarten Boekhold wrote:
David Warnock wrote:
A solution I have seen in some SQL DBMSs (e.g. MS SQL Server) is to be able
to choose one index and have the database table sorted by this index.
Then you don't need a separate index for that sort order. It means that
almost any index can work like a recnum and avoid looking in both the
index and the data. I am trying to remember the name of this feature but
cannot at the moment.
CLUSTER ?
That's the one. Thanks.
Dave
Bruce,
I did not know PostgreSQL had that. I have just looked at the docs, and
it seems that the PostgreSQL CLUSTER is similar to the feature in MS SQL
Server, but at present it is rather more limited:
a) It is static, whereas in MS SQL Server CLUSTER can be set as an
attribute on an index; then the index is not created separately, and the
table is kept permanently sorted by the index. This is obviously very
slow if you add to the middle of the index, and wasteful of space if you
delete rows from the table. However, for a sequential ID it is supposed
to work well.
b) As the PostgreSQL CLUSTER is static, it does not replace the index.
I should say at this point that I never actually used the CLUSTER
feature in MS SQL Server as we decided not to use that product after
evaluating it. So I have no practical experience to know how much of the
speed improvement wanted by Leon it would deliver.
Dave
--
David Warnock
Sundayta Ltd
David Warnock wrote:
A solution I have seen in some SQL DBMSs (e.g. MS SQL Server) is to be able
to choose one index and have the database table sorted by this index.
Then you don't need a separate index for that sort order. It means that
almost any index can work like a recnum and avoid looking in both the
index and the data. I am trying to remember the name of this feature but
cannot at the moment.
CLUSTER ?
man cluster.
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce,
I did not know PostgreSQL had that. I have just looked at the docs, and
it seems that the PostgreSQL CLUSTER is similar to the feature in MS SQL
Server, but at present it is rather more limited:
It is in the FAQ under Performance too.
a) It is static, whereas in MS SQL Server CLUSTER can be set as an
attribute on an index; then the index is not created separately, and the
table is kept permanently sorted by the index. This is obviously very
slow if you add to the middle of the index, and wasteful of space if you
delete rows from the table. However, for a sequential ID it is supposed
to work well.
Well, sometimes it is better to have control over when this happens.
That is why cluster is nice, and we don't have time to add dynamic
cluster right now.
b) As the PostgreSQL CLUSTER is static, it does not replace the index.
I should say at this point that I never actually used the CLUSTER
feature in MS SQL Server as we decided not to use that product after
evaluating it. So I have no practical experience to know how much of the
speed improvement wanted by Leon it would deliver.
See the performance FAQ. If you look in
pgsql/contrib/fulltextindex/README, we mention it too because without
it, fulltext indexing is slow.
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Hello David,
Monday, July 05, 1999 you wrote:
D> I should say at this point that I never actually used the CLUSTER
D> feature in MS SQL Server as we decided not to use that product after
D> evaluating it. So I have no practical experience to know how much of the
D> speed improvement wanted by Leon it would deliver.
Not much, because the main idea is to eliminate index scans
entirely, whereas with CLUSTER, merely "once the index
identifies the heap page for the first row that matches, all other rows that
match are probably already on the same heap page, saving disk accesses and
speeding up the query" - this is at best a few percent gain, and it means
nothing if the database is entirely in memory (as it often is).
Best regards, Leon
Bruce,
Thanks for your informative messages.
Well, sometimes it is better to have control over when this happens.
That is why cluster is nice, and we don't have time to add dynamic
cluster right now.
I personally would not see it as a high priority.
My only reason for suggesting it was as a possible way to help provide
more performance for Leon without adding the full record number support
that he was asking for.
In fact, you were mentioning that inserting into the middle is slow but
that sequential adding to the end is good; heap already
does this, doesn't it? I guess if you only add occasionally, it is OK.
Also, our no-overwrite table structure has a tendency to mess up that
ordering, because updated rows do not go into the same place as the
original row.
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce,
Thanks for your informative messages.
Well, sometimes it is better to have control over when this happens.
That is why cluster is nice, and we don't have time to add dynamic
cluster right now.
I personally would not see it as a high priority.
My only reason for suggesting it was as a possible way to help provide
more performance for Leon without adding the full record number support
that he was asking for.
Dave
--
David Warnock
Sundayta Ltd
Leon,
I agree that the static CLUSTER that PostgreSQL currently supports will
not help you much. When I suggested looking for a CLUSTER-type feature, I
only knew of dynamic clustering that removed the need for an index.
The discussion is not going anywhere, as static clustering will not help
you and dynamic clustering is not about to be added.
If you are interested in other solutions that do not involve adding
record number support (which I personally still feel to be a mistake in
a set-orientated DBMS), then have you considered an application server
linked to triggers?
For some applications it is possible for an application server to
maintain the latest reports on-line, recalculating as required when a
trigger notifies it of relevant changes.
Then reporting comes instantly from the app server.
If there are a large number of different reports, or the reports have a
lot of selections and options, this may not be possible (but a halfway
house might still be possible by using a slightly flattened table
structure for reporting).
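For what it's worth, Postgres's existing LISTEN/NOTIFY machinery could carry the trigger-to-app-server notification described above. A rough sketch, with invented names:

-- The app server keeps a connection open and runs:
--     LISTEN order_change;
-- A rule signals it whenever the watched table is appended to:
CREATE RULE order_change_notify AS
    ON INSERT TO order_line
    DO NOTIFY order_change;
-- On each notification the app server recalculates the affected
-- reports, so report reads are served straight from the app server.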
Regards
Dave
--
David Warnock
Sundayta Ltd
Bruce,
It is amazing when you get responses written this fast (so that the
response arrives before the copy of the message from the list).
In fact, you were mentioning that inserting into the middle is slow, but
that sequential adding to the end is good,
Yes, this is what I was told about the way MS SQL Server does clustering.
but in fact, heap already does this, doesn't it?
heap? I am not sure what you mean.
I guess if you only add occasionally, it is OK.
Also, our no-over-write table structure had a tendency to mess up that
ordering because updated rows do not go into the same place as the
original row.
I have just been thinking a bit more and have realised that the
multi-generational architecture of 6.5 (which I have used in Interbase)
means that probably both clustering (in the dynamic sense) and full
record number support as requested by Leon are impractical.
It seems to me that record number relationships will fail completely if
there can be more than one version of a record. (Well, even if they are
forced to work, they will lose some or all of their speed advantage.)
Dynamically clustered indexes might still work, but unless tables are
only appended to, with no inserts or updates, maintaining the table in
index order when there can be multiple versions of each row would be very
slow.
Dave
--
David Warnock
Sundayta Ltd
Bruce,
It is amazing when you get responses written this fast (so that the
response arrives before the copy of the message from the list).
Yep.
but in fact, heap already does this, doesn't it?
heap? I am not sure what you mean.
Heap is our base table structure. Records are always inserted at the
end of the heap file. Vacuum removes old, superseded rows.
I guess if you only add occasionally, it is OK.
Also, our no-overwrite table structure has a tendency to mess up that
ordering, because updated rows do not go into the same place as the
original row.
I have just been thinking a bit more and have realised that the
multi-generational architecture of 6.5 (which I have used in Interbase)
means that probably both clustering (in the dynamic sense) and full
record number support as requested by Leon are impractical.
Yes, would be a big problem. Most commercial databases have found that
the network data model is impractical in most cases.
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Hello David,
Monday, July 05, 1999 you wrote:
D> If you are interested in other solutions that do not involve adding
D> record number support (which I personally still feel to be a mistake in
D> a set-orientated DBMS)
Why? There would be no such field as "record number"; the only
place it can exist is the field which references another
table. I can quite share your feeling about the wrongness of
physical-oriented things in abstract tables, but don't
plain old indices deal with physical record numbers? We could
do the same - hide the value stored in such a field and only
offer the user the ability to use it in queries without knowing
the value.
D> then have you considered an application server
D> linked to triggers.
Unfortunately, every day the user demands new types of reports
for financial analysis. And nobody knows what the user's
wish will be tomorrow.
And besides, it is not only my personal wish. What I am
proposing is a huge (dozen-fold) performance gain on widespread
tasks. If you implement this, happy users will erect a golden
monument to the Postgres development team.
Best regards, Leon
Leon <leon@udmnet.ru> writes:
We should make a real reference from one table to another! That
means there could be a special data type called, say, "link",
which is a physical record number in the foreign table.
There is no such thing as a physical record number for a tuple in
Postgres. The closest you could come is an OID, which isn't really any
faster than any other joinable field --- you still need an index to
support fast lookup by OID.
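With Leon's example tables, the OID route Tom describes would look something like this (the index name is invented, and table1.b would simply be declared with type oid):

-- An OID behaves like any other joinable column: fast lookup
-- still requires an ordinary index on it.
CREATE INDEX table2_oid_idx ON table2 (oid);

SELECT * FROM table1, table2
WHERE table1.a > 5 AND table1.b = table2.oid;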
If we did have such a concept, the speed penalties for supporting
hard links from one tuple to another would be enormous. Every time
you change a tuple, you'd have to try to figure out what other tuples
reference it, and update them all.
Finally, I'm not convinced that the results would be materially faster
than a standard mergejoin (assuming that you have indexes on both the
fields being joined) or hashjoin (in the case that one table is small
enough to be loaded into memory).
regards, tom lane
And besides, it is not only my personal wish. What I am
proposing is a huge (dozen-fold) performance gain on widespread
tasks. If you implement this, happy users will erect a golden
monument to the Postgres development team.
We (Vadim) did MVCC, and I haven't seen any monuments yet.
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Hello Tom,
Monday, July 05, 1999 you wrote:
T> If we did have such a concept, the speed penalties for supporting
T> hard links from one tuple to another would be enormous. Every time
T> you change a tuple, you'd have to try to figure out what other tuples
T> reference it, and update them all.
I'm afraid that's mainly because fields in Postgres have variable
length, and after an update they go to the end of the table. Am I right?
In that case such referencing could be allowed only for
tables with fixed-width rows, whose updates can naturally be done
without moving. It is a little sacrifice, but it is worth it.
T> Finally, I'm not convinced that the results would be materially faster
T> than a standard mergejoin (assuming that you have indexes on both the
T> fields being joined) or hashjoin (in the case that one table is small
T> enough to be loaded into memory).
Consider this: no indices, no optimizer thinking, no index lookups -
no nothing! Just the sequential number of the record multiplied by the
record size. Exactly three CPU instructions: read, multiply,
lookup. Can you see the gain now?
Best regards, Leon
Hello Bruce,
Monday, July 05, 1999 you wrote:
I have just been thinking a bit more and have realised that the
multi-generational architecture of 6.5 (which I have used in Interbase)
means that probably both clustering (in the dynamic sense) and full
record number support as requested by Leon are impractical.
B> Yes, would be a big problem. Most commercial databases have found that
B> the network data model is impractical in most cases.
That's exactly why such a powerful tool as SQL is incomparably slower
than plain dbf in most cases. Ignoring the network data model was
a big mistake.
Best regards, Leon
Hello David,
Monday, July 05, 1999 you wrote:
D> I have just been thinking a bit more and have realised that the
D> multi-generational architecture of 6.5 (which I have used in Interbase)
D> means that probably both clustering (in the dynamic sense) and full
D> record number support as requested by Leon are impractical.
D> It seems to me that record number relationships will fail completely if
D> there can be more than one version of a record.
Maybe it is a silly question, but what is "more than one version
of a record"? In my opinion a record is an atomic, unique entity.
Isn't it?
D> (well even if they are
D> forced to work they will lose some/all of their speed advantage).
Best regards, Leon
Leon wrote:
Why? There would be no such field as "record number"; the only
place it can exist is the field which references another
table. I can quite share your feeling about the wrongness of
physical-oriented things in abstract tables, but don't
plain old indices deal with physical record numbers? We could
do the same - hide the value stored in such a field and only
offer the user the ability to use it in queries without knowing
the value.
Leon,
In my understanding, pointer-based approaches like you
are recommending have been implemented in several prototype
object-oriented databases. They have been shown to be
orders of magnitude slower than set-oriented techniques; thus
many OO databases are implemented as wrappers over
relational systems!
In general, the best way to handle stuff like this for reports
is to cache small tables which are joined (like product lookups)
in memory to make the queries run much faster. To do this,
your design has to be smart, separating those tuples which
are "active" products from those which are "inactive", so that
the database can cache the active records and not the inactive
records. Perhaps something like:
1. CREATE VIEW PRODUCT AS
       (SELECT * FROM PRODUCT_ACTIVE_CACHED
        UNION ALL
        SELECT * FROM PRODUCT_INACTIVE);

2. SELECT ORDER_NO, PRODUCT_NAME
   FROM ORDER_LINE, PRODUCT
   WHERE PRODUCT.PRODUCT = ORDER_LINE.PRODUCT AND ORDER_LINE.ORDER = 120;
This would be a fairly general solution, where orders with active
products are brought up quickly since the join is done
in memory, but orders with inactive products take much
longer, since the query on the active table is a cache
miss, leaving a disk access on the inactive table.
Perhaps there are several other, nicer ways to do this; from my
understanding, a HASH-based cache could allow frequently accessed
tuples to be cached in memory? ... anyway, I'm no expert.
A more traditional method (which I use all the time) is to
have canned reports that are pre-generated using common
conditions. These are then saved on a web server and
updated daily. It is a bit less accurate, but often, for 99%
of purposes, day-old information is just fine....
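A minimal sketch of the canned-report approach, with invented table and column names - a nightly job rebuilds a summary table, and the web pages read only that:

-- Rebuild the canned report from scratch each night (e.g. via cron).
DROP TABLE daily_sales_report;
SELECT product_id, sum(qty) AS total_qty
INTO TABLE daily_sales_report
FROM order_line
GROUP BY product_id;
-- The web server then serves daily_sales_report directly:
-- day-old figures, but instant delivery.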
Hope this helps!
;) Clark
Hello Clark,
Monday, July 05, 1999 you wrote:
C> In my understanding, pointer-based approaches like you
C> are recommending have been implemented in several prototype
C> object-oriented databases. They have been shown to be
C> orders of magnitude slower than set-oriented techniques; thus
C> many OO databases are implemented as wrappers over
C> relational systems!
I can't guess where you got such information. On the contrary,
I know at least one (commercial) network database server which is
orders of magnitude faster than ANY SQL server. They are simply no
match for it. That experience is exactly what made me write
to the Postgres mailing list. As I wrote (maybe to the hackers' list),
a pointer lookup ideally takes three CPU commands - read,
multiply, lookup - whereas an index scan takes dozens of them and
puts a strain on the optimizer's intellectual abilities, and
as we have seen it can hardly choose the optimum way of
performing a join. In the pointer-field case the optimizer can be quite
dumb, because there is only one way to perform a query.
Best regards, Leon
Leon <leon@udmnet.ru> writes:
I'm afraid that's mainly because fields in Postgres have variable
length, and after an update they go to the end of the table. Am I right?
In that case such referencing could be allowed only for
tables with fixed-width rows, whose updates can naturally be done
without moving. It is a little sacrifice, but it is worth it.
No, you are not right. Tuple updates can *never* be done without
moving the tuple, because the old tuple value must not be overwritten
until and unless the transaction is committed. (Under MVCC, it may
need to stick around even longer than that, I believe.) Thus, a tuple
update would require an update (and move) of every referencing tuple,
which could cascade into updates of tuples that reference those tuples,
etc.
But the really serious problem is simply that of finding the tuples
that reference the tuple you are about to update. This is relatively
straightforward for indexes --- you just compute the index value for
the old tuple and probe into the index with it. When the tuples might
be anywhere in the database, there are no easy shortcuts. I think the
only way would be maintaining an index listing all the referencing
tuples for every referenced tuple. This would be a bit of a bear to
maintain, as well as a source of unwanted blocking/deadlocks, since it
would be a bottleneck for updates to *every* table in the database.
T> Finally, I'm not convinced that the results would be materially faster
T> than a standard mergejoin (assuming that you have indexes on both the
T> fields being joined) or hashjoin (in the case that one table is small
T> enough to be loaded into memory).
Consider this: no indices,
You'd still need indices --- see above
no optimizer thinking,
You'd still need to run the optimizer to decide whether you wanted to
use this technique or some more-conventional one (unless your proposal
is to remove all other join mechanisms? Rather inflexible, that...)
no nothing! Just a sequential number of record multiplied by
record size. Exactly three CPU instructions: read, multiply,
lookup. Can you see the gain now?
If you think it takes three instructions to access a tuple that's out
on disk somewhere, I'm afraid you're sadly mistaken.
regards, tom lane
Leon <leon@udmnet.ru> writes:
We should make a real reference from one table to another! That
means there could be a special data type called, say, "link",
which is a physical record number in the foreign table.
There is no such thing as a physical record number for a tuple in
Postgres. The closest you could come is an OID, which isn't really any
faster than any other joinable field --- you still need an index to
support fast lookup by OID.
Actually, there is:
select ctid from pg_class;
ctid
------
(0,1)
(0,2)
...
The numbers are the block number and the offset within the block. It
doesn't help because UPDATEd rows get a new tid. Tids can be used for
short periods if you are sure the data in the table doesn't change, and
there is a TODO item to allow ctid references in the WHERE clause.
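Once that TODO item were done, a direct probe would presumably look like this (not valid at the time of writing):

-- Hypothetical until the TODO item is implemented: fetch the row
-- at block 0, line pointer 1 directly, with no scan at all.
SELECT * FROM pg_class WHERE ctid = '(0,1)';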
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Hello David,
Monday, July 05, 1999 you wrote:
D> I have just been thinking a bit more and have realised that the
D> multi-generational architecture of 6.5 (which I have used in Interbase)
D> means that probably both clustering (in the dynamic sense) and full
D> record number support as requested by Leon are impractical.
D> It seems to me that record number relationships will fail completely if
D> there can be more than one version of a record.
Maybe it is a silly question, but what is "more than one version
of a record"? In my opinion a record is an atomic, unique entity.
Isn't it?
Read how MVCC works in the manuals.
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Hello Tom,
Tuesday, July 06, 1999 you wrote:
T> No, you are not right. Tuple updates can *never* be done without
T> moving the tuple, because the old tuple value must not be overwritten
T> until and unless the transaction is committed. (Under MVCC, it may
T> need to stick around even longer than that, I believe.) Thus, a tuple
T> update would require an update (and move) of every referencing tuple,
T> which could cascade into updates of tuples that reference those tuples,
T> etc.
This problem can be solved. An offhand solution is to have
an additional system field which points to the new tuple left after
an update. It is filled at the same time as the original tuple is
marked invalid. So the scenario is as follows: we follow the link,
and if we find that the tuple we arrived at has this system field
non-NULL, we go to where it points (within the same table, of course).
Surely VACUUM will eliminate these chains. The performance penalty is small.
T>> Finally, I'm not convinced that the results would be materially faster
T>> than a standard mergejoin (assuming that you have indexes on both the
T>> fields being joined) or hashjoin (in the case that one table is small
T>> enough to be loaded into memory).
Consider this: no indices,
T> You'd still need indices --- see above
The only time we will need to look at who is referencing us is
during VACUUM. So there is no real need for indices.
no optimizer thinking,
T> You'd still need to run the optimizer to decide whether you wanted to
T> use this technique or some more-conventional one (unless your proposal
T> is to remove all other join mechanisms? Rather inflexible, that...)
No, I am not evil itself, trying to eliminate everything :)
I said that when the optimizer sees a join made through such a field, it
has only one option - to follow the link. It simply has no choice.
T> If you think it takes three instructions to access a tuple that's out
T> on disk somewhere, I'm afraid you're sadly mistaken.
No. I meant a tuple which is in memory somewhere :)
Best regards, Leon
Hello Bruce,
Tuesday, July 06, 1999 you wrote:
Maybe it is a silly question, but what are "more than one version
of a record"? In my opinion record is a atomic unique entity.
Isn't it?
B> Read how MVCC works in the manuals.
Ah, you mean MVCC! That's what I replied to Tom Lane:
This problem can be solved. An offhand solution is to have
an additional system field which points to the new tuple left after
an update. It is filled at the same time as the original tuple is
marked invalid. So the scenario is as follows: we follow the link,
and if we find that the tuple we arrived at has this system field
non-NULL, we go to where it points (within the same table, of course).
Surely VACUUM will eliminate these chains. The performance penalty is small.
Best regards, Leon
Hello Bruce,
Tuesday, July 06, 1999 you wrote:
B> Actually, there is:
B> select ctid from pg_class;
B> ctid
B> ------
B> (0,1)
B> (0,2)
B> ...
B> The numbers are the block number and the offset within the block. It
B> doesn't help because UPDATEd rows get a new tid. Tids can be used for
B> short periods if you are sure the data in the table doesn't change, and
B> there is a TODO item to allow ctid references in the WHERE clause.
It seems that you are moving in the right direction. But these tids
seem to be of short-term use anyway. Why not upgrade to full-featured
links at once?
Best regards, Leon
Leon <leon@udmnet.ru> writes:
This problem can be solved. An offhand solution is to have
an additional system field which points to the new tuple left after
an update. It is filled at the same time as the original tuple is
marked invalid. So the scenario is as follows: we follow the link,
and if we find that the tuple we arrived at has this system field
non-NULL, we go to where it points (within the same table, of course).
Surely VACUUM will eliminate these chains. The performance penalty is small.
Is it small? After multiple updates to the referenced tuple, you'd be
talking about following a chain of TID references in order to find the
referenced tuple from the referencing tuple. I'd expect this to take
more time than an index access within a fairly small number of updates
(maybe four or so, just on the basis of counting disk-block fetches).
VACUUM is an interesting problem as well: to clean up the chains as you
suggest, VACUUM could no longer be a one-table-at-a-time proposition.
It would have to be able to update tuples elsewhere while repacking the
tuples in the current table. This probably means that VACUUM requires
a global lock across the whole database. Also, making those updates
in an already-vacuumed table without undoing its nicely vacuumed state
might be tricky.
regards, tom lane
Hello Tom,
Tuesday, July 06, 1999 you wrote:
T> Is it small?
It is :) First you should tell me the cost of a tid lookup.
If it is significantly more expensive than a C pointer, then we
should consider implementing such a cheap pointer. If tid is already
cheap, then even 10 consecutive lookups will cost almost nothing.
And besides, you should consider statistics: can you name
five or even three applications where large databases are
massively updated without being vacuumed often?
T> After multiple updates to the referenced tuple, you'd be
T> talking about following a chain of TID references in order to find the
T> referenced tuple from the referencing tuple. I'd expect this to take
T> more time than an index access within a fairly small number of updates
T> (maybe four or so, just on the basis of counting disk-block fetches).
T> VACUUM is an interesting problem as well: to clean up the chains as you
T> suggest, VACUUM could no longer be a one-table-at-a-time proposition.
T> It would have to be able to update tuples elsewhere while repacking the
T> tuples in the current table. This probably means that VACUUM requires
T> a global lock across the whole database.
Does VACUUM require a lock on the vacuumed table now? I am sure it
does. And in our case we must lock the vacuumed table and all
the tables that reference it, not all tables.
And besides, the manual suggests that VACUUM should be done
nightly, not daily :)
Having acquired such a lock, vacuum should update the "main"
table first, then update all links in the referencing tables.
It can be done using oids, which are matched in the new and old
versions of the "main" table (are oids preserved during vacuum? -
if they are not, this can be done with the primary key).
T> Also, making those updates
T> in an already-vacuumed table without undoing its nicely vacuumed state
T> might be tricky.
I didn't get the idea of the last sentence. Anyway, I am going to sleep.
See you tomorrow :)
Best regards, Leon
Leon <leon@udmnet.ru> writes:
I'm afraid that's mainly because fields in Postgres have variable
length, and after an update they go to the end of the table. Am I right?
In that case such referencing could be allowed only for
tables with fixed-width rows, whose updates can naturally be done
without moving. It is a little sacrifice, but it is worth it.
No, you are not right. Tuple updates can *never* be done without
moving the tuple, because the old tuple value must not be overwritten
until and unless the transaction is committed. (Under MVCC, it may
need to stick around even longer than that, I believe.) Thus, a tuple
update would require an update (and move) of every referencing tuple,
which could cascade into updates of tuples that reference those tuples,
etc.
Yes, thanks, Tom. This is exactly the issue.
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Congrats Tom for rising to the challenge ;)
istm that a brute force hack such as is being suggested should be left
as an exercise for the reader.
These "table links" seem to controvert the ability for a RDBMS to mix
and match tables in ways which are not hardcoded beforehand.
Regardless of whether "there exist some real servers that offer such
features I am talking", a departure from the relational model in a
relational database is likely to lead to undesirable constraints and
restrictions in our future development.
If they are a good idea, you might be able to implement and prove them
using an embedded language and the SPI facilities.
Just the usual $.02...
- Thomas
--
Thomas Lockhart lockhart@alumni.caltech.edu
South Pasadena, California
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
Regardless of whether "there exist some real servers that offer such
features I am talking", a departure from the relational model in a
relational database is likely to lead to undesirable constraints and
restrictions in our future development.
That was another thing that was bothering me about the idea of "version
links" between tuples (as in Leon's second proposal). They don't fit
into the fundamental relational model.
I am not sure there's anything fundamentally wrong with his basic point;
if, say, we could find a way to construct OIDs so that a tuple could be
found very quickly from its OID, that wouldn't violate the relational
model AFAICS, and such OIDs would work fine as "links". But I don't see
any way to do that without either giving up UPDATE or introducing a huge
amount of baggage into all processes that can update tables (VACUUM
being the worst case, likely). Without doubt the best compromise would
look remarkably like an index on OID.
Ultimately, when you consider both the update costs and the access
costs, I doubt that this sort of thing could be a win, except maybe
in the case where the referenced table is hardly ever changed so that
the update costs are seldom incurred. But in that situation it's not
clear you want to store the referenced table in an RDBMS anyway ---
there are lots of lower-overhead ways to deal with fixed tables, such
as perfect-hash generators.
If they are a good idea, you might be able to implement and prove them
using an embedded language and the SPI facilities.
I don't think VACUUM invokes triggers, so you couldn't really do
anything about VACUUM rearranging the table under you that way,
could you?
I'll be interested to see Vadim's comments on this thread...
regards, tom lane
Hi Leon,
In the long wars between the object databases - Versant, which has
logical record ids, and ObjectStore, which has physical record ids -
Versant has consistently beaten ObjectStore in the sort of queries that
financial people are likely to do. (On graphic-design-type apps,
ObjectStore tends to win.)
Versant's object ids actually go through an index to get to the physical
location. Admittedly a VERY highly optimised index, but an index
nevertheless.
What this says to me is that Postgres's concept of oids is OK as is. If
your queries are too slow, either the optimiser is not using an index
when it should, or else the indexing mechanism is not fast enough. I
suspect it would be nice, from an object database perspective, if a
special-case oid-index type were written.
Physical record ids have lots of problems. You can't re-organise space
dynamically, so you have to take your database off-line for a while to
totally re-organise it. You lose space because it can't be re-used
dynamically. There are problems with backup.
I'm writing a web page on what would be needed to make Postgres into an
Object database.....
http://www.tech.com.au/postgres
Leon wrote:
Ah, you mean MVCC! That's what I replied to Tom Lane:
This problem can be solved. An offhand solution is to have
an additional system field which points to the new tuple left after
an update. It is filled at the same time as the original tuple is
marked invalid. So the scenario is as follows: we follow the link,
and if we find that the tuple we arrived at has this system field
non-NULL, we go to where it points (within the same table, of course).
Surely VACUUM will eliminate these chains. The performance penalty is small.
The old tuple version points to the new version right now -:).
How else could we handle updates in read committed mode
(new tuple versions are not visible to a concurrent update)?
Let's go to hackers list.
But also let's relax for some more time, Leon... -:)
Vadim
Thomas Lockhart wrote:
Congrats Tom for rising to the challenge ;)
istm that a brute force hack such as is being suggested should be left
as an exercise for the reader.
These "table links" seem to controvert the ability of an RDBMS to mix
and match tables in ways which are not hardcoded beforehand.
Excuse me, in what way? All the usual features of SQL remain in their
legal place. What you are getting is one more additional ability,
not a constraint.
Regardless of whether "there exist some real servers that offer such
features I am talking", a departure from the relational model in a
relational database is likely to lead to undesirable constraints and
restrictions in our future development.
Until you offer an example of such a constraint, these words are
groundless fear.
Think of it not merely as an index lookup speedup, but as a quick and
clever way to fix the optimizer. As we have seen, the optimizer now
has trouble choosing the fastest way of doing a query. This
is a deep-rooted trouble, because the optimizer generally needs some
fine-grained statistics on the tables it is working on, and
gathering such statistics would require, I am afraid, a total
rewrite of the optimizer. In the case of links the query is always
done the best way.
--
Leon.
Tom Lane wrote:
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
Regardless of whether "there exist some real servers that offer such
features I am talking", a departure from the relational model in a
relational database is likely to lead to undesirable constraints and
restrictions in our future development.
Yep. Also, you fix one set of hard links and the next day you need to do
a slightly different join and it doesn't fit into the links you
constructed, because you left out a table or something silly.
I used to work on a system storing and retrieving real-time trading data
on Tandem. We were integrating it with their first-generation CORBA
system, but could handle 70 updates a second + heavy reads of
historical/real-time tick data. And this was on an old 4-processor
K10000. Nearly all data was stored at least twice -- made the inserts
slightly slower (Tandem is bloody good at inserts!) but otherwise we
couldn't cope with the reads of several MBytes of historical data/query.
Leon, I think you should study the accesses, and build the right
intermediate tables. Yes, I know you are not supposed to duplicate data,
but hey, this is the real world, and disk is cheap. And triggers etc
make it fairly manageable to retain integrity. But what is indispensable
is the flexibility you have in a true relational model, so that you can
easily adapt to changing demands -- adding temporary tables as you need
them for new reports and dropping them as they go out of use.
Of course you can use views, but this can still be slow.
As far as I can see: if you know which hard links you need, you know
which additional table you need to build. And knocking up the triggers
to keep it updated is child's play. Ah, yes -- and I always have to add a
sanity_function as well that can fix things when I've made a real
balls-up ;-)
Have fun,
Adriaan
I am not sure there's anything fundamentally wrong with his basic point;
if, say, we could find a way to construct OIDs so that a tuple could be
found very quickly from its OID, that wouldn't violate the relational
model AFAICS, and such OIDs would work fine as "links". But I don't see
any way to do that without either giving up UPDATE or introducing a huge
amount of baggage into all processes that can update tables (VACUUM
being the worst case, likely). Without doubt the best compromise would
look remarkably like an index on OID.
I think the best compromise would be to have ctid in the where clause.
This would need to always be considered the best path by the optimizer.
Then the situation is similar to a primary-key/foreign-key scenario.
The referenced table does not need an index, since we know the physical
position of the row we want (where ctid='(5,255)').
What we need second is an update trigger for the referenced table that
updates old.ctid to new.ctid in the referencing table. For this to be
efficient, you would need to create an index on the column that stores
the reference.
I do not actually think that we would need extra syntax to allow this,
only the access method for a ctid where clause.
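A sketch of the scheme Andreas describes, with invented names, and assuming both the ctid access method and an indexable tid column existed (neither did at the time):

-- detail rows store the referenced master row's physical address:
CREATE TABLE detail (
    d          int4,
    master_ref tid   -- holds master.ctid
);
CREATE INDEX detail_master_ref_idx ON detail (master_ref);

-- The join becomes a direct physical lookup, with no index on master:
SELECT * FROM master, detail WHERE detail.master_ref = master.ctid;

-- An update trigger on master would then rewrite detail.master_ref
-- from old.ctid to new.ctid, using the index above to find the rows.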
Andreas
Tom Lane wrote:
I am not sure there's anything fundamentally wrong with his basic point;
if, say, we could find a way to construct OIDs so that a tuple could be
found very quickly from its OID, that wouldn't violate the relational
model AFAICS, and such OIDs would work fine as "links". But I don't see
any way to do that without either giving up UPDATE or introducing a huge
amount of baggage into all processes that can update tables (VACUUM
being the worst case, likely). Without doubt the best compromise would
look remarkably like an index on OID.
There are no problems with UPDATE: an updated tuple points to the newer
version, so we can avoid updating referencing tuples here.
VACUUM would have to update referencing tuples (via normal
heap_replace, nothing special) while removing old versions.
This may cause deadlocks, but we could give vacuum higher priority
and abort the others.
So, vacuum is the worst case, as pointed out by Tom.
No problems with MVCC and other things.
Vadim
Zeugswetter Andreas IZ5 wrote:
I think the best compromise would be to have ctid in the where clause.
And we talked about this a few days ago, but no one did it.
This would need to always be considered the best path by the optimizer.
Then the situation is similar to a primary-key/foreign-key scenario.
The referenced table does not need an index, since we know the physical
position of the row we want (where ctid='(5,255)').
What we need second is an update trigger for the referenced table that
updates old.ctid to new.ctid in the referencing table. For this to be
Triggers are not fired by vacuum...
Vadim
Hello Vadim,
Tuesday, July 06, 1999 you wrote:
V> There are no problems with UPDATE: an updated tuple points to the newer
V> version, so we can avoid updating referencing tuples here.
V> VACUUM would have to update referencing tuples (via normal
V> heap_replace, nothing special) while removing old versions.
V> This may cause deadlocks, but we could give vacuum higher priority
V> and abort the others.
V> So, vacuum is the worst case, as pointed out by Tom.
V> No problems with MVCC and other things.
So, the main drawback is a higher priority for VACUUM. Not
too large, eh?
When you decide whether to implement it or not,
I urge you to think again about the relief for the optimizer,
which I stressed many times. No one has yet rebutted that adding
brains to the optimizer so that it can use the appropriate join method
will require a major rewrite. With links you get the best join
method as a side effect - virtually for free. These joins
will never be too slow for an unknown reason. Think carefully.
I hope you will make a wise decision.
Best regards, Leon
Leon wrote:
So, the main drawback is a higher priority for VACUUM. Not
too large, eh?
When you decide whether to implement it or not,
We will not decide -:))
If someone wants to implement it - welcome.
I urge you to think again about the relief for the optimizer,
which I stressed many times. No one has yet rebutted that adding
brains to the optimizer so that it can use the appropriate join method
will require a major rewrite. With links you get the best join
method as a side effect - virtually for free. These joins
will never be too slow for an unknown reason. Think carefully.
I hope you will make a wise decision.
The optimizer requires a major rewrite in any case, even
with links implemented.
Vadim
After thinking a bit more, I decided to reply in a more constructive
way.
Thomas Lockhart wrote:
These "table links" seem to controvert the ability for a RDBMS to mix
and match tables in ways which are not hardcoded beforehand.
Certainly links are only of use for their intended purpose, and for
nothing more. But you should be aware that real-life relationships
are exactly of this kind. The drawback of the general relational model
is that links (= joins) are built from scratch at the moment of the join.
This may seem an advantage, but really it is often an unnecessary,
redundant feature whose design allows building a swarm of relationships
which never existed and will never be used.
Keeping all that in mind, we might consider building a subsystem in
the SQL server which is carefully optimized for such real-life tasks.
There is no need to put any restrictions on general SQL; the only
thing proposed is an enhancement of a particular side of the server.
Regardless of whether "there exist some real servers that offer such
features I am talking", a departure from the relational model in a
relational database is likely to lead to undesirable constraints and
restrictions in our future development.
You have already done a heroic deed in implementing MVCC, which seemed
the feature most likely to be interfered with. I can see no serious
interference with any SQL feature which you might implement.
--
Leon.
Hello Vadim,
Tuesday, July 06, 1999 you wrote:
These joins
will never be too slow for an unknown reason. Think carefully.
I hope you will make wise decision.
V> Optimizer requires major rewrite in any case, even
V> having links implemented.
I am afraid that the optimizer, even totally rewritten, can't always
choose the best method. That is simply because it is such a
complex animal :) Bacterium-simple links will always win
in the field where they live :)
Best regards, Leon
Adriaan Joubert wrote:
Yep. Also, you fix one set of hard links and the next day you need to do
a slightly different join and it doesn't fit into the links you
constructed, because you left out a table or something silly.
No one is talking about abolishing any standard SQL feature. After
you have carefully verified the design you can hard-code links to
speed up access. Before that has happened the usual SQL will do.
Leon, I think you should study the accesses, and build the right
intermediate tables. Yes, I know you are not supposed to duplicate data,
but hey, this is the real world, and disk is cheap.
But RAM is not as big as an HDD. If the database doesn't fit in RAM,
performance degrades severely.
And triggers etc
make it fairly manageable to retain integrity.
Making a trigger would cost the same as rearranging the table after
a poor design of links is discovered.
But what is indispensable
is the flexibility you have in a true relational model, so that you can
easily adapt to changing demands -- adding temporary tables as you need
them for new reports and dropping them as they go out of use.
This will immensely bloat the database, thus flooding the disk
channel and, what is worse, the main memory.
--
Leon.
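For concreteness, the intermediate-table approach Adriaan describes
might look something like this sketch (table and column names are
hypothetical; SELECT ... INTO materializes the join once, so reports
can then read the result without joining):

    -- build the report table once; drop it when it goes out of use
    SELECT o.order_id, p.name AS product_name, l.quantity
      INTO TABLE order_report
      FROM orders o, order_line l, product p
     WHERE l.order_id = o.order_id
       AND l.product_id = p.product_id;

    -- no join at query time
    SELECT * FROM order_report WHERE order_id = 42;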
Leon wrote:
No one is talking about abolishing any standard SQL feature. After
you have carefully verified the design you can hard-code links to
speed up access. Before that has happened the usual SQL will do.
Leon,
Interesting idea. You are re-introducing some of the
hierarchical database ideas, is this right? Here is
what I'm receiving, could you correct me if I'm
misunderstanding? (much of this you did not say...)
- - - - - -
In current SQL implementations, if updates are done on a
tuple being modified, referencing tables are not usually
identified or checked, let alone updated. Then, when a
query is requested, the database figures out how referenced
data can be retrieved, and does this at the time of the query.
In this proposal, in addition to carrying a primary key
for a referenced table, tuples in the referencing table
will also have a place to record the physical address
of each referenced tuple. In this way, referenced
data is easily retrieved during a query, since the
physical address of the referenced information is
stored in the referent.
For example, let's take the following schema
ORDER (ORDER_ID, ... );
PRODUCT (PRODUCT_ID, NAME, ... );
ORDER_LINE (ORDER_ID,PRODUCT_ID, ... );
In the current cases, changes to the PRODUCT table,
let's say a changed name, do not result in an update
of the ORDER_LINE tuples which reference the product
tuple being changed.
In this proposal, a few hidden fields (ID/REF) would be added:
ORDER ( LEON_ID, ORDER_ID, ... );
PRODUCT ( LEON_ID, PRODUCT_ID, NAME, ... );
ORDER_LINE ( LEON_ID, ORDER_ID, PRODUCT_ID, ... , PRODUCT_LEON_REF );
Where the ORDER_LINE table would have a reference to the
physical LEON_ID of the tuple being referenced by PRODUCT_ID.
Then, an update of the PRODUCT table would result in a cascading
update of all referencing tables, including ORDER_LINE to
change the PRODUCT_LEON_REF from its previous value to the
updated value. The LEON_ID and LEON_REF are internal implementation
fields and not available through SQL.
SUMMARY,
With this method, query speed is drastically improved since
the "join" logic is performed once during insert, instead
of many times during select.
This method should work well when the referenced table
changes relatively infrequently. Thus people, products,
and other relatively static "reference" information is
a key candidate for this 'indexing' technique.
This technique should not be used if the frequency of
updates exceeds the frequency of select statements.
- - - - - - -
Overall, I think it is a good idea. I frequently do weaker
versions all the time that I call "field caching", where
the NAME field of infrequently changing tuples is frequently
accessed. In this case, one may put PRODUCT_NAME in the
ORDER_LINE table and put a trigger on PRODUCT to cascade
updates of NAME to the ORDER_LINE.PRODUCT_NAME column.
I tend to make monsters like this a nightly process, since
product name changes need not be immediate (they are rare,
and thus not frequent, and thus, not usually immediate).
This allows the cascade update to run at night when
things are a lot less stressful on the database.
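A rough sketch of such a field-caching trigger, assuming PL/pgSQL is
available (all names here are hypothetical):

    CREATE FUNCTION sync_product_name() RETURNS opaque AS '
    BEGIN
        -- push the changed name into the cached column
        UPDATE order_line SET product_name = new.name
         WHERE product_id = new.product_id;
        RETURN new;
    END;
    ' LANGUAGE 'plpgsql';

    CREATE TRIGGER product_name_cascade
        AFTER UPDATE ON product
        FOR EACH ROW EXECUTE PROCEDURE sync_product_name();

If immediate cascades are too heavy, a nightly batch UPDATE can
replace the trigger, as described above.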
Is this in line with what you are saying?
I suggest borrowing an XML term for the idea, GROVE.
In XML, a GROVE is a tree built from XML/SGML/notations.
In this case, you can think of frequently joined
information as cutting deep into the landscape: the more
often the query is done, the better the chance that the
UPDATE/SELECT ratio will be small, and thus, the
greater the chance that the hard-wired physical address
is cached in the table. The reason I like the
name is that it defines a common pathway that is
easy, without preventing the efficiency of uncommon
paths (where updates >> selects).
Hmm. I'm just worrying about the CASCADE nature
of the beast. On the extreme that I was writing
about earlier, a prototype OO dbms that I was
looking at about 6 years ago (god knows what the
name is), they did *everything* this way. And
GOD it was slow... especially since it cascaded
when the frequency of updates far exceeded the frequency
of selects.
Thoughts?
Best,
Clark
Clark Evans wrote:
In this proposal, a few hidden fields (ID/REF) would be added:
^^^^^^
Not hidden, but with _link_ type.
Vadim
Hello Clark,
Tuesday, July 06, 1999 you wrote:
C> Interesting idea. You are re-introducing some of the
C> hierarchical database ideas, is this right? Here is
C> what I'm receiving, could you correct me if I'm
C> misunderstanding? (much of this you did not say...)
Strictly speaking, this is neither a hierarchical nor a network
database. It is not hierarchical because cyclic graphs are
allowed (when tables reference one another, maybe through
some intermediate table). And it is not network because it lacks
the weird restriction put on network databases
(the textbook says that in a network database one referenced tuple
must be in at most one link of a certain link type).
C> In this proposal, in addition to carrying a primary key
C> for a referenced table, tuples in the referencing table
C> will also have a place to record the physical address
C> of each referenced tuple.
I have read the description carefully. I am afraid that MVCC
will break your scheme, because a referencing tuple must have
a way to reach all versions of the updated foreign tuple. If
you update the referencing field, all other versions of the
foreign tuple are lost. It seems the only way to satisfy
MVCC is to chain updated foreign tuples, with subsequent
VACUUM. That way there is no need of indices, since they
would only be needed during VACUUM.
Best regards, Leon
When you decide - to implement or not to implement,
I urge you to think again about the relief for the optimizer,
which I stressed many times. No one has rebutted yet that adding
brains to the optimizer so that it can use the appropriate join method
will require a major rewrite. With links you get the best join
method as a side effect - virtually for free. These joins
will never be too slow for an unknown reason. Think carefully.
I hope you will make wise decision.
I believe Ingres does allow this, as it has tids too. If you are
creating a temp table, you could use tids during your processing. In
fact, it seems tids would be valid until a vacuum is performed.
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
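A sketch of the temp-table use of tids Bruce mentions (hypothetical
names; it assumes the captured tid values stay valid until the next
vacuum):

    -- capture each row's physical address alongside the data
    SELECT ctid AS row_ref, order_id
      INTO TABLE work_set
      FROM orders;

Later lookups during the same processing run can then go straight to
the captured addresses instead of through an index.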
Leon wrote:
C> In this proposal, in addition to carrying a primary key
C> for a referenced table, tuples in the referencing table
C> will also have a place to record the physical address
C> of each referenced tuple.I have read description carefully. I am afraid that MVCC
will break your scheme, because referencing tuple must have
a way to reach all versions of foreign updated tuple.
If you update the referencing field, all other versions of
foreign tuple are lost.
It seems the only way to satisfy
MVCC is to chain updated foreign tuples with subsequent
VACUUM. That's because there is no need of indices, as soon
as the need of them is only during VACUUM.
(look of puzzlement) Where did I go wrong with what
you are proposing? I'm not trying to invent my
own scheme... I'm trying to understand yours.
;) Clark
Hello Clark,
Tuesday, July 06, 1999 you wrote:
C> (look of puzzlement) Where did I go wrong with what
C> you are proposing? I'm not trying to invent my
C> own scheme... I'm trying to understand yours.
Ok. If the American people want to know the True Path, they
can be enlightened :))) (it's a joke)
So, what exactly is proposed:
Introduction of what will look like a new data type in the table
structure:
CREATE TABLE atable (a int4)
CREATE TABLE btable (b int4, c link (atable)) - "link" looks like
a new data type.
Example query with link:
SELECT * FROM atable where btable.b < 5 AND btable.c = atable.tid
(or maybe ctid should go here - you would know better)
Type checking:
CREATE TABLE ctable (d int4)
SELECT * FROM ctable where btable.b < 5 AND btable.c = ctable.tid -
it should produce an error because the link isn't to ctable.
No additional constraint is placed. Tables can reference one
another in any combination; maybe a table should even be able
to reference itself.
How all that is implemented:
As we have seen, the link is matched against a tid in queries. It
means that internally the link can be of the same data type as tid.
MVCC stuff: as Vadim pointed out, updated tuples are chained
already, so this feature can naturally be utilized. The referencing
tuple always points to the oldest version of the updated foreign
tuple. If a transaction needs a version of the foreign
tuple other than the oldest, it follows the chain.
Vacuuming removes these chains, thus packing the table and
rewriting references to the vacuumed table in other tables.
Vacuuming thus needs high priority, and maybe a lock on the table
being vacuumed and on all referencing tables.
Since referencing fields are rewritten only during vacuum,
there is no need for indices on any field.
Best regards, Leon
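Under this proposal, the multi-table join where Leon expects the
biggest win would presumably chain links like this (hypothetical
syntax - none of it is implemented):

    CREATE TABLE city   (cname text);
    CREATE TABLE street (sname text, in_city link (city));
    CREATE TABLE house  (hno int4, on_street link (street));

    -- each comparison against a tid is a direct physical lookup,
    -- with no index scan on street or city
    SELECT c.cname, s.sname, h.hno
      FROM house h, street s, city c
     WHERE h.hno = 7
       AND h.on_street = s.tid
       AND s.in_city = c.tid;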
On Mon, 5 Jul 1999, Tom Lane wrote:
I am not sure there's anything fundamentally wrong with his basic point;
if, say, we could find a way to construct OIDs so that a tuple could be
found very quickly from its OID, that wouldn't violate the relational
model AFAICS, and such OIDs would work fine as "links". But I don't see
any way to do that without either giving up UPDATE or introducing a huge
amount of baggage into all processes that can update tables (VACUUM
being the worst case, likely). Without doubt the best compromise would
look remarkably like an index on OID.
So is there anything wrong with that?
Ultimately, when you consider both the update costs and the access
costs, I doubt that this sort of thing could be a win, except maybe
in the case where the referenced table is hardly ever changed so that
the update costs are seldom incurred. But in that situation it's not
clear you want to store the referenced table in an RDBMS anyway ---
there are lots of lower-overhead ways to deal with fixed tables, such
as perfect-hash generators.
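For comparison, the compromise Tom describes is expressible today as
an ordinary index on oid (a sketch with hypothetical names; whether
the optimizer picks the index for a given plan is another matter):

    CREATE INDEX product_oid_idx ON product (oid);

    -- a "link" is then just a stored oid, resolved through the index
    SELECT * FROM product WHERE oid = 123456;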
While I read this thread I noticed that a lot of people are concerned
about their update speeds. I am primarily concerned about query speeds.
Consider how often you update data vs. how often you query it. That's the
whole point of a database: to optimize information retrieval. Now I am not
sure how big those update performance penalties would be but I am not
concerned really.
Meanwhile I agree that hard-linking via record IDs sounds suspiciously
like a page from the OODB textbook where it is praised for exactly the
same reasons the person who started this discussion cited: no joins. But
in order for that to work (if it works) the database software would have
to be written from scratch in order for it to be marginally efficient.
The question I ask myself though is, are there any concrete plans for
referential integrity via foreign key clauses? 6.6, 7.0, never? To me,
that's really much more important than query speed or MVCC.
--
Peter Eisentraut
PathWay Computing, Inc.
Leon wrote:
Hello Vadim,
Tuesday, July 06, 1999 you wrote:
These joins
will never be too slow for an unknown reason. Think carefully.
I hope you will make wise decision.
V> Optimizer requires major rewrite in any case, even
V> having links implemented.
I am afraid that the optimizer, even totally rewritten, can't always
choose the best method. That is simply because it is such a
complex animal :) Bacterium-simple links will always win
in the field where they live :)
From what I have read from earlier posts about the optimizer,
there can be situations where using links would actually be slower
than going through the optimiser, similar to the case where scanning
the whole table using an index can be orders of magnitude slower than
doing a direct scan.
That is of course if used unwisely ;)
Another thing that has remained unclear to me is the way to actually
insert or update the links - you can't just put another record there,
so that should be some kind of field (tid,oid,...) or some function
like last_touched('other_table_name').
So, what have you thought to put there ?
------
Hannu
-----Original Message-----
From: owner-pgsql-hackers@postgreSQL.org
[mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Vadim Mikheev
Sent: Tuesday, July 06, 1999 8:12 PM
To: Tom Lane
Cc: Thomas Lockhart; Leon; pgsql-hackers@postgreSQL.org
Subject: Re: [HACKERS] Fwd: Joins and links
Tom Lane wrote:
I am not sure there's anything fundamentally wrong with his basic point;
if, say, we could find a way to construct OIDs so that a tuple could be
found very quickly from its OID, that wouldn't violate the relational
model AFAICS, and such OIDs would work fine as "links". But I don't see
any way to do that without either giving up UPDATE or introducing a huge
amount of baggage into all processes that can update tables (VACUUM
being the worst case, likely). Without doubt the best compromise would
look remarkably like an index on OID.
There are no problems with UPDATE: updated tuple points to newer
version, so we can avoid update of referencing tuples here.
VACUUM would have to update referencing tuples (via normal
heap_replace, nothing special) while removing old versions.
This may cause deadlocks but we could give vacuum higher priority
and abort others.
So, vacuum is the worst case, as pointed out by Tom.
No problems with MVCC and other things.
What about dump/reload ?
And would vacuum be much more complicated than now ?
I think vacuum is sufficiently complicated already.
Didn't this kind of annoying thing let the RDBMS overtake the
NDBMS in spite of its lower performance ?
If "link" is necessary at any cost, how about the following story ?
"link" = OID + TID
If the oid pointed to by the TID is different from the held OID, the
executor resets the TID using OID indices (my story needs OID indices).
This way we need not change vacuum/dump/reload etc.
A command to update TIDs to the latest ones may be needed.
Comments ?
Hiroshi Inoue
Inoue@tpf.co.jp
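Hiroshi's "link" = OID + TID lookup might go roughly like this (a
sketch with hypothetical names and values; it assumes the executor can
fetch by ctid and falls back to the OID index when the stored TID is
stale):

    -- fast path: fetch by the stored TID, verifying the OID still matches
    SELECT * FROM product WHERE ctid = '(5,2)' AND oid = 123456;

    -- slow path: the row moved, so re-resolve the TID via the OID index
    SELECT ctid FROM product WHERE oid = 123456;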
Hannu Krosing wrote:
Another thing that has remained unclear to me is the way to actually
insert or update the links - you can't just put another record there,
so that should be some kind of field (tid,oid,...) or some function
like last_touched('other_table_name').
So, what have you thought to put there ?
Earlier I proposed that links should be of a type similar to tid,
so inserts should be fed with values of tid. But this requires an
intermediate step, so there can be a function which takes a primary
key and returns a tid, or as you say a function
last_touched('other_table_name') - this seems the best choice.
--
Leon.
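The intermediate step Leon mentions can at least be spelled in plain
SQL (a sketch using his atable/btable example; it assumes the link
column accepts tid values, and the stored value goes stale after
updates until vacuum):

    -- feed the referencing insert with the current tid of the target row
    INSERT INTO btable (b, c)
        SELECT 5, a.ctid FROM atable a WHERE a.a = 42;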
All,
Peter Eisentraut wrote:
The question I ask myself though is, are there any concrete plans for
referential integrity via foreign key clauses? 6.6, 7.0,
never? To me,
that's really much more important than query speed or MVCC.
I'd have to go along with this, because I've noticed that DRI is a LOT
faster than using triggers to implement RI. Although not on PG (on Oracle
actually), I think that the results can be extrapolated closely enough. DRI
reduces the implementation overhead dramatically.
MikeA
All,
Peter Eisentraut wrote:
The question I ask myself though is, are there any concrete plans for
referential integrity via foreign key clauses? 6.6, 7.0,
never? To me,
that's really much more important than query speed or MVCC.
I'd have to go along with this, because I've noticed that DRI is a LOT
faster than using triggers to implement RI. Although not on PG (on Oracle
actually), I think that the results can be extrapolated closely enough. DRI
reduces the implementation overhead dramatically.
This is on our 6.6 short list, and someone has said he will work on it
"after 6.5".
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Leon <leon@udmnet.ru> wrote:
Earlier I proposed that links should be of a type similar to tid,
so inserts should be fed with values of tid. But this requires an
intermediate step, so there can be a function which takes a primary
key and returns a tid, or as you say a function
last_touched('other_table_name') - this seems the best choice.
Beware of adding special purpose hard-links as a way to
skip the run-time value comparisons. A link looks attractive
but it really only works for one-to-one relationships
(any multi-way relationships would require a list of links
to follow) and a link has all of the overhead that a
foreign key requires.
As someone who has developed several commercial dbms systems,
I would discourage doing a special "link" type. There are
other ways to gain performance -- de-normalize your tables
if you are doing mainly reads; carefully check your storage
layout; and, of course, buy more RAM ;-)
--
Bob Devine devine@cs.utah.edu
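The denormalization Bob suggests trades storage for read speed; a
minimal sketch (hypothetical names - the cached column must then be
kept in sync, for example with a trigger like the one sketched
earlier):

    -- copy the hot field into the referencing table ...
    ALTER TABLE order_line ADD COLUMN product_name text;
    UPDATE order_line SET product_name = product.name
      FROM product WHERE order_line.product_id = product.product_id;

    -- ... so frequent reads skip the join entirely
    SELECT order_id, product_name FROM order_line WHERE order_id = 42;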
Leon <leon@udmnet.ru> wrote:
Earlier I proposed that links should be of a type similar to tid,
so inserts should be fed with values of tid. But this requires an
intermediate step, so there can be a function which takes a primary
key and returns a tid, or as you say a function
last_touched('other_table_name') - this seems the best choice.
Beware of adding special purpose hard-links as a way to
skip the run-time value comparisons. A link looks attractive
but it really only works for one-to-one relationships
(any multi-way relationships would require a list of links
to follow) and a link has all of the overhead that a
foreign key requires.
As someone who has developed several commercial dbms systems,
I would discourage doing a special "link" type. There are
other ways to gain performance -- de-normalize your tables
if you are doing mainly reads; carefully check your storage
layout; and, of course, buy more RAM ;-)
Good to see you around Bob. This guy does know what he is talking
about.
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote:
As someone who has developed several commercial dbms systems,
I would discourage doing a special "link" type. There are
other ways to gain performance -- de-normalize your tables
if you are doing mainly reads; carefully check your storage
layout; and, of course, buy more RAM ;-)
Good to see you around Bob. This guy does know what he is talking
about.
Believe me, I know what I am saying. Some day I spoke exactly like you,
but having seen an implementation of a network DBMS, I suddenly
realized that SQL's days are numbered. The sooner you understand that
the better.
--
Leon.
Bob Devine wrote:
Beware of adding special purpose hard-links as a way to
skip the run-time value comparisons. A link looks attractive
but it really only works for one-to-one relationships
(any multi-way relationships would require a list of links
to follow)
Not exactly. If you have a fixed set of links in a tuple, you
don't have to follow a list of them.
and a link has all of the overhead that a
foreign key requires.
We looked at the matter carefully and found no overhead like
the foreign key's. Maybe you should read the thread more carefully
once again.
As someone who has developed several commercial dbms systems,
I would discourage doing a special "link" type. There are
other ways to gain performance -- de-normalize your tables
if you are doing mainly reads;
If I denormalize my tables, they will grow some five to ten
times in size.
But simply think what you are proposing: you are proposing
exactly to break RDBMS "alphabet" to gain performance! This
means that even SQL warriors see RDBMS's ideology as not
proper and as corrupt, because it hinders performance.
You are contradicting yourself!
carefully check your storage
layout; and, of course, buy more RAM ;-)
And what will I do about the performance loss from bloated tables?
--
Leon.
Bruce Momjian wrote:
As someone who has developed several commercial dbms systems,
I would discourage doing a special "link" type. There are
other ways to gain performance -- de-normalize your tables
if you are doing mainly reads; carefully check your storage
layout; and, of course, buy more RAM ;-)
Good to see you around Bob. This guy does know what he is talking
about.
After thinking a bit, it became clear to me that we are flaming
senselessly here. So can anyone do a fast hack to test links
for speed? Especially with three or more tables being joined.
--
Leon.
Bob Devine wrote:
As someone who has developed several commercial dbms systems,
I would discourage doing a special "link" type.
Of course you tried to implement links and failed, didn't you?
In such a case I personally, and maybe others, want to hear what
can go wrong, in order to benefit from your mistakes and lessons.
--
Leon.
Bruce Momjian wrote:
As someone who has developed several commercial dbms systems,
I would discourage doing a special "link" type. There are
other ways to gain performance -- de-normalize your tables
if you are doing mainly reads; carefully check your storage
layout; and, of course, buy more RAM ;-)
Good to see you around Bob. This guy does know what he is talking
about.
No. I wasn't flaming, just confirming that he has lots of experience.
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Leon wrote:
If I denormalize my tables, they will grow some five to ten
times in size.
But simply think what you are proposing: you are proposing
exactly to break RDBMS "alphabet" to gain performance! This
means that even SQL warriors see RDBMS's ideology as not
proper and as corrupt, because it hinders performance.
and he wrote:
After thinking a bit, it became clear to me that we are flaming
senselessly here. So can anyone do a fast hack to test links
for speed? Especially with three or more tables being joined.
and in another message:
Of course you tried to implement links and failed, didn't you?
In such a case I personally, and maybe others, want to hear what
can go wrong, in order to benefit from your mistakes and lessons.
It's a good idea to test it out. My guess is that a hard link
between tables would speed up the join a small amount.
The bigger drawbacks are:
1) the application design is now encoded in the database structure.
Using a link forces your _one_ application's needs onto all other
users of that table. Each affected table would be bloated with
at least one more column. All updates now affect multiple tables,
leading to more locking, paging, and synchronization overhead. Etc.
2) adding performance tweaks for a version condemns you to always
be aware of it for future versions. I know of many cases where
people now hate the idea of a database performance "improvement"
that has prevented them from modifying the database schema.
One person's company is still using a database that everyone hates
because one critical application prevents them from changing it.
Indexes are about the only useful physical-level hack that has
survived the test of time. An index is not part of the relational
model, but indexes are universally implemented because they yield
a huge payback.
3) Be aware of hardware improvements. System performance is
still doubling every 18 months. If a software hack can't match
that rate, it is probably not worth doing.
In my experience, old style network and hierarchical databases
are still faster than relational systems. Just like OO DBMSs
can be faster. However, the non-relational databases gain their
speed by optimizing for a single application instead of being a
more general-purpose approach. Nearly every company that uses
databases realizes that flexibility is more important than a bit
more speed, unless that business has already maxed out their
computer's performance and is desperate for that extra bit.
It is my many years of watching databases in use that suggest
that links are not worth the overhead. My gut feeling is that
links would speed up a simple join by only 10% and there are
many other ways to speed up joins.
--
Bob Devine devine@cs.utah.edu
Bob Devine wrote:
The bigger drawbacks are:
1) the application design is now encoded in the database structure.
This is true.
Using a link forces your _one_ application's needs onto all other
users of that table. Each affected table would be bloated with
at least one more column.
In fact the link is intended to replace the foreign key in a given
table, not coexist with it. Given that it eliminates the need for an
index, there is even a small space gain.
All updates now affect multiple tables
leading to more locking, paging, and synchronization overhead. Etc.
Oh, no :) After a short discussion it became clear that there
need not be a link rewrite in the referencing table during an update.
So the update goes as usual, involving only one table. Instead we have
a chain of referenced tuple versions left after the update. VACUUM
eliminates these.
2) adding performance tweaks for a version condemns you to always
be aware of it for future versions.
Absolutely right. Since we have started a talk on general matters, let
me make my position clear.
Every tool is suitable for its purpose. No one walks from city
to city - one takes a car instead. And no one takes a car to get to a
neighbor's home for evening tea :) So. There are tasks of
different kinds. Some are flexible and often require redesigning of
relationships. But there are others, which are well known
and well explored, and have a well-known structure. Accounting is
one of them. There are a lot of others, without doubt. What is
proposed is a tool to handle tasks of the second sort effectively,
since a general RDBMS is a tool for the other, flexible tasks. This is
a matter of design, and it is the designer's job to choose the right
tool. If the designer made a wrong choice, it is his problem and his
kicked ass. You should give the designer as many tools as possible
and let him choose. They will love you for that :)
3) Be aware of hardware improvements. System performance is
still doubling every 18 months. If a software hack can't match
that rate, it is probably not worth doing.
Oh, that argument again :) I'll tell you - sooner or later
this development will stop. There are purely physical obstacles
that prevent manufacturing of silicon chips with frequencies much
higher than 10 gigahertz.
It is my many years of watching databases in use that suggest
that links are not worth the overhead. My gut feeling is that
links would speed up a simple join by only 10% and there are
many other ways to speed up joins.
Let's count. We have two tables, joined by a link. What is the
cost of a lookup? First there is an index scan, which is between
2 and 5 iterations, and a link lookup, which is 1 iteration - about
4 iterations on average. And if we don't have the link, there are 6
iterations. More than 10% already! And we still haven't considered
joining multiple tables, or big tables. So the gain will be big anyway.
And that is without considering the optimizer (do I sound like a broken
record? :) To be honest, the current Postgres optimizer sucks heavily
and in most cases can't figure out the fastest way. Implementing
links is a quick and cheap way to get a performance gain on
a wide range of tasks. I am obliged to repeat this again and again,
because every day a new developer appears who hasn't heard
it yet :)
--
Leon.
Leon <leon@udmnet.ru> writes:
3) Be aware of hardware improvements. System performance is
still doubling every 18 months. If a software hack can't match
that rate, it is probably not worth doing.
Oh, that argument again :) I'll tell you - sooner or later
this development will stop. There are purely physical obstacles
that prevent manufacturing of silicon chips with frequencies much
higher than 10 gigahertz.
Furthermore, the continuous availability of ever faster hardware at
low prices will slow down very soon, now that the MS Windows users
finally don't need to upgrade to twice as fast computers every 18
months just to be able to run the latest version of MS bloatware, and
will spend their money on peripherals and fast net access instead.
...but as for "purely physical obstacles", I don't buy it. We will
always find a way to make what we need. Count on it.
-tih
--
Popularity is the hallmark of mediocrity. --Niles Crane, "Frasier"
Bob Devine <devine@cs.utah.edu> writes:
3) Be aware of hardware improvements. System performance is
still doubling every 18 months. If a software hack can't match
that rate, it is probably not worth doing.
I like Kernighan's and Pike's argument, presented in their recent
book, The Practice of Programming, that if a software improvement is
expected to save more accumulated user time than the programmer time
spent making it, then it should be considered worthwhile.
Great book, by the way. Highly recommended.
-tih
--
Popularity is the hallmark of mediocrity. --Niles Crane, "Frasier"
Leon wrote:
Bob Devine wrote:
It is my many years of watching databases in use that suggest
that links are not worth the overhead. My gut feeling is that
links would speed up a simple join by only 10% and there are
many other ways to speed up joins.
Let's count. We have two tables, joined by a link. What is the
cost of a lookup? First there is an index scan, which is between
2 and 5 iterations, and a link lookup, which is 1 iteration - about
4 iterations on average.
This is true for the case where you want to look up only one row.
The difference will quickly degrade as more rows are fetched in one
query and cache misses and disk head movement start rattling your
disks. The analogy being a man who needs 10 different items from a
supermarket and takes 10 full round trips from home to buy them.
And if we don't have the link, there are 6 iterations.
More than 10% already! And we still haven't considered joining
multiple tables, or big tables.
I think that the two-tables-one-row lookup will gain the most,
probably even more than 10%
So the gain will be big anyway.
And that is without considering the optimizer (do I sound like a broken
record? :) To be honest, the current Postgres optimizer sucks heavily
and in most cases can't figure out the fastest way.
Adding links does nothing to improve the optimizer; it's still free
to choose sucky plans. It is possible that links are faster if used
in the right way, as they cut out the index lookup, but I suspect that
hard-coding link-is-always-faster into the optimiser would also produce
a lot of very bad plans.
Link-is-always-faster is probably true only for all-memory databases,
and even there not always - for example, if it happened to produce a
worse initial ordering for sort/group by than some other strategy, a
complex query can still run slower (the difference will be small either
way).
Implementing links is a quick and cheap way to get a performance
gain on a wide range of tasks.
Fixing the optimizer would get a performance gain on a far wider
range of tasks, and is still needed for links.
I am obliged to repeat this again and again,
because every day a new developer appears who hasn't heard
it yet :)
Unfortunately there are far fewer _developers_ than letter-writers, and
it is sometimes quite hard to get them to commit even good and useful
patches that are ready.
So I guess that if you want links in the foreseeable future, your best
bet would be to start coding, and to coordinate with whoever starts to
fix/rewrite the optimizer (probably Vadim)
(BTW, in PostgreSQL, I still consider myself a letter-writer and not a
developer, as I have committed no code for the backend)
-------------
Hannu
Hannu Krosing wrote:
The difference will quickly degrade as more rows are fetched in one
query and cache misses and disk head movement start rattling your
disks. The analogy being a man who needs 10 different items from a
supermarket and takes 10 full round trips from home to buy them.
Frankly, I didn't even consider fetching the database from disk. This
slows queries immensely, and I wonder if there exists anyone who
doesn't keep their entire DB in RAM.
I think that the two-tables-one-row lookup will gain the most,
probably even more than 10%
I think the gain will rise with the number of tables, because
the more tables, the more index lookups are saved.
Adding links does nothing to improve the optimizer; it's still free
to choose sucky plans. It is possible that links are faster if used
in the right way, as they cut out the index lookup, but I suspect that
hard-coding link-is-always-faster into the optimiser would also produce
a lot of very bad plans.
Methinks that hard-wiring link-is-always-faster into the optimizer
will still help it very much, because there are few situations where
it is not true.
Fixing the optimizer would get a performance gain on a far wider
range of tasks, and is still needed for links.
But a general fix of the optimizer means a total rewrite of it, whereas
the link fix is almost a fast hack.
So I guess that if you want links in the foreseeable future, your best
bet would be to start coding, and to coordinate with whoever starts to
fix/rewrite the optimizer (probably Vadim)
Unfortunately I already have a project to work on. There is too
little of me for two projects.
--
Leon.
Leon wrote:
Hannu Krosing wrote:
The difference will quickly degrade as more rows are fetched in one
query and cache misses and disk head movement start rattling your
disks. The analogy being a man who needs 10 different items from a
supermarket and takes 10 full round trips from home to buy them.
Frankly, I didn't even consider fetching the database from disk. This
slows queries immensely, and I wonder if there exists anyone who
doesn't keep their entire DB in RAM.
Well, I personally don't even know how I could keep my entire
PostgreSQL DB in RAM :)
It would be interesting to know what percentage of people do use
PostgreSQL for databases that are small enough to fit in RAM -
surely not the ones who need splitting of tables >2GB.
And I think that setting up PostgreSQL for maximum RAM usage would
make a nice topic for "Optimizing PostgreSQL".
When my backends are mostly idle they usually use about 3-4MB of memory
(hardly enough for any database :).
It is quite possible that some bigger tables end up in the disk cache,
but you can expect to find all your data in that cache only if you do
many queries on the same tables in a row, and the machine is otherwise
idle.
I think the gain will rise with the number of tables, because
the more tables, the more index lookups are saved.
My point is that sometimes even a sequential scan is faster than an
index lookup, and not only due to the overhead of using the index, but
due to the better disk performance of sequential reads vs. random
reads.
For in-memory databases this of course does not count.
Still I'm quite sure that the main effort in PostgreSQL development
has so far gone to optimising queries where most of the data is
fetched from the disk.
Fixing the optimizer would get a performance gain on a far wider
range of tasks, and is still needed for links.
But a general fix of the optimizer means a total rewrite of it, whereas
the link fix is almost a fast hack.
I'm not too sure about it. It certainly can be done without a _total_
rewrite, but getting all the new node types and access methods into the
parser/planner/executor may not be trivial.
One idea would be a cross-table OID index for anything in memory.
Then, assuming that everything is in memory, using oids as links would
be only trivially, if at all, slower (2-10 memory accesses and
comparisons) than a "straight" link lookup, which could also end up
chasing linear chains of forward-id-links on frequently updated DBs.
On infrequently updated DBs you could just use triggers and/or cron
jobs to keep your reports updated; I guess that this is what most
commercial OLAP systems do.
Actually I lived my first half-year of using PostgreSQL under the
delusion that lookup by OID would be somewhat special (fast).
Probably due to my misunderstanding of the (ever diminishing) O in
ORDBMS :)
There have been some attempts to get the object-orientedness better
supported by PGSQL (possibly even some contrib functions), but nobody
seems to have needed it badly enough to
1) implement it
and
2) shout long enough to get it included in the standard distribution.
Most (all?) of the developers seem to be die-hard RDBMS guys, and
thanks to that we now have a solid and reasonably fast Relational DBMS
with some OO rudiments.
So I guess that unless you do (at least part of) links, no one else
will ;(
Unfortunately I already have a project to work on. There is too
little of me for two projects.
Darn! I almost hoped we would get one more PostgreSQL hacker, as I'm
sure that after familiarising oneself with the code enough to implement
links one would be quite capable of helping with most of PostgreSQL
development <grin>
----------------------
Hannu
Unfortunately there are far fewer _developers_ than letter-writers, and
it is sometimes quite hard to get them to commit even good and useful
patches that are ready.
So I guess that if you want links in the foreseeable future, your best
bet would be to start coding, and to coordinate with whoever starts to
fix/rewrite the optimizer (probably Vadim)
Are people complaining about the 6.5 optimizer, or the pre-6.5
optimizer? 6.5 has a much improved optimizer.
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Hello Hannu,
Friday, July 09, 1999 you wrote:
H> Still I'm quite sure that the main effort in PostgreSQL development
H> has so far gone to optimising queries where most of the data is
H> fetched from the disk.
Oh, I see. This is appropriate for some non-time-critical
and dumb applications, such as web DBs. But this is out of the
way of fast server tasks. Maybe Postgres has been designed with
such a plan in mind - to use big DBs from disk? That is not
good news for me either. Almost everyone has suggested that I
use more RAM to speed up queries, and now it turns out not to
be in Postgres's mainstream. Maybe there is something
wrong with this ideology, since RAM gets bigger and cheaper
every day?
H> forward-id-links on frequently updated DBs. On infrequently updated DBs
H> you could just use triggers and/or cron jobs to keep your reports
H> updated; I guess that this is what most commercial OLAP systems do.
It seems that a trigger will be the last resort.
Best regards, Leon
Hannu Krosing wrote:
Leon wrote:
Frankly, I didn't even consider fetching the database from disk. This
slows queries immensely, and I wonder if there exists anyone who
doesn't keep their entire DB in RAM.
Well, I personally don't even know how I could keep my entire
PostgreSQL DB in RAM :)
I thought about doing this once on a Linux box. What I was thinking
about was creating a large RAM disk, and using that disk together with
a physical drive in a mirror setup. However, I was never able to create
a large enough RAM disk back then (must have been like Linux 2.0), and
also the RAID mirror code wasn't able to support such a mix of devices
(i.e. RAM disk + physical disk). The situation might have changed by
now.
Maarten
--
Maarten Boekhold, boekhold@tibco.com
TIBCO Finance Technology Inc.
The Atrium
Strawinskylaan 3051
1077 ZX Amsterdam, The Netherlands
tel: +31 20 3012158, fax: +31 20 3012358
http://www.tibco.com
Then <boekhold@tibco.com> spoke up and said:
Hannu Krosing wrote:
Leon wrote:
Frankly, I didn't even consider fetching the database from disk. This
slows queries immensely, and I wonder if there exists anyone who
doesn't keep their entire DB in RAM.
Well, I personally don't even know how I could keep my entire
PostgreSQL DB in RAM :)
I thought about doing this once on a Linux box. What I was thinking
about was creating a large RAM disk, and using that disk together with
a physical drive in a mirror setup. However, I was never able to create
a large enough RAM disk back then (must have been like Linux 2.0), and
also the RAID mirror code wasn't able to support such a mix of devices
(i.e. RAM disk + physical disk). The situation might have changed by
now.
Maarten, PostgreSQL keeps its data in the filesystem, rather than on
raw disks. Due to the nature of *nix, all you need to do to keep your
entire DB in memory is have enough memory. The buffer cache will do
the rest for you. Of course, you still need to start it up with -F
to avoid fsyncs. This is also somewhat OS dependent, as you may have
to do some tuning to allow full memory utilization in this manner.
--
=====================================================================
| JAVA must have been developed in the wilds of West Virginia. |
| After all, why else would it support only single inheritance?? |
=====================================================================
| Finger geek@cmu.edu for my public key. |
=====================================================================
Well, I personally don't even know how I could keep my entire
PostgreSQL DB in RAM :)
I thought about doing this once on a Linux box. What I was thinking
about was creating a large RAM disk, and using that disk together with
a physical drive in a mirror setup. However, I was never able to create
a large enough RAM disk back then
[...]
Maarten, PostgreSQL keeps its data in the filesystem, rather than on
raw disks. Due to the nature of *nix, all you need to do to keep your
entire DB in memory is have enough memory. The buffer cache will do
the rest for you. Of course, you still need to start it up with -F
I know, but there's no *guarantee* that the complete database is going to be in RAM.
That's what I was trying to solve. Putting the thing on a RAM disk would guarantee that
it is.
Maarten
--
Maarten Boekhold, boekhold@tibco.com
TIBCO Finance Technology Inc.
The Atrium
Strawinskylaan 3051
1077 ZX Amsterdam, The Netherlands
tel: +31 20 3012158, fax: +31 20 3012358
http://www.tibco.com