Suggestion for optimization
It would be nice if total table cardinality could be maintained live.
So (after the initial vacuum) we update the cardinality for each table
in the system table (or perhaps add an entry to the table itself).
There are two reasons why this is an important optimization. Firstly,
it is a psychological benefit for both benchmarks and customers when
doing a select count(*) from <tablename>. This is something that pops
up all the time in benchmarks and customers do it too, in order to get a
feel for speed. By storing the current number and incrementing for
every insert and decrementing for every delete, the count(*) case with
no where clause can return the value instantly.
The far more important reason is for optimizations. An accurate
cardinality figure can greatly enhance the optimizer's ability to
perform joins in the correct order.
An example of a SQL system that does this sort of thing is Microsoft
SQL*Server. If you have 100 million rows in a table and do:
SELECT COUNT(*) FROM table_name
it will return the correct number instantly. The same is true for
Oracle.
It might also be possible to keep an array in memory of:
typedef struct tag_cardinality_list {
char *table_name;
unsigned long cardinality;
} cardinality_list;
and keep the data updated there with simple interlocked exchange
operations. The list would be loaded on Postmaster startup and saved on
shutdown.
"Dann Corbit" <DCorbit@connx.com> writes:
It would be nice if total table cardinality could be maintained live.
So (after the initial vacuum) we update the cardinality for each table
in the system table (or perhaps add an entry to the table itself).
There are two reasons why this is an important optimization. Firstly,
it is a psychological benefit for both benchmarks and customers when
doing a select count(*) from <tablename>. This is something that pops
up all the time in benchmarks and customers do it too, in order to get a
feel for speed. By storing the current number and incrementing for
every insert and decrementing for every delete, the count(*) case with
no where clause can return the value instantly.
How would this work with MVCC?
-Doug
--
Doug McNaught Wireboard Industries http://www.wireboard.com/
Custom software development, systems and network consulting.
Java PostgreSQL Enhydra Python Zope Perl Apache Linux BSD...
Doug McNaught <doug@wireboard.com> writes:
"Dann Corbit" <DCorbit@connx.com> writes:
It would be nice if total table cardinality could be maintained live.
How would this work with MVCC?
It wouldn't. That's why it's not there. Under MVCC, table cardinality
is in the eye of the beholder...
regards, tom lane
-----Original Message-----
From: Doug McNaught [mailto:doug@wireboard.com]
Sent: Friday, April 05, 2002 11:30 AM
To: Dann Corbit
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Suggestion for optimization
"Dann Corbit" <DCorbit@connx.com> writes:
It would be nice if total table cardinality could be maintained live.
So (after the initial vacuum) we update the cardinality for each table
in the system table (or perhaps add an entry to the table itself).
There are two reasons why this is an important optimization. Firstly,
it is a psychological benefit for both benchmarks and customers when
doing a select count(*) from <tablename>. This is something that pops
up all the time in benchmarks and customers do it too, in order to get a
feel for speed. By storing the current number and incrementing for
every insert and decrementing for every delete, the count(*) case with
no where clause can return the value instantly.
How would this work with MVCC?
Whenever a commit occurs, the pending inserts are totaled into the sum
and the pending deletes are subtracted. It can be a list in memory or
whatever. Maybe you are referring to the old (expired) rows being
stored until vacuum? Perhaps I really don't understand your question or
the issues involved. Why does MVCC complicate issues?
<<
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, April 05, 2002 11:37 AM
To: Doug McNaught
Cc: Dann Corbit; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Suggestion for optimization
Doug McNaught <doug@wireboard.com> writes:
"Dann Corbit" <DCorbit@connx.com> writes:
It would be nice if total table cardinality could be maintained live.
How would this work with MVCC?
It wouldn't. That's why it's not there. Under MVCC, table cardinality
is in the eye of the beholder...
-------------------------
If this is true (even after a commit) then MVCC is a very bad thing. No
transactions occur, and two people ask the same question yet get
different answers. Doesn't that scare anyone? That would mean (among
other things) that Postgresql cannot be used for a data warehouse.
One of the primary facets of a reliable database transaction system is
repeatability. In fact, if there is no certain cardinality known after
commits, then there are no reliable database operations that can be
trusted.
How many accounts are 90 days overdue? Bill says 78 and Frank says 82.
Who is right and how can we know?
I have spent months working on Postgresql projects here (at CONNX
Solutions Inc.) and talked management into using an open source
database. Please tell me I'm not going to look like a bloody idiot in
the near term.
<<-------------------------
"Dann Corbit" <DCorbit@connx.com> writes:
If this is true (even after a commit) then MVCC is a very bad thing. No
transactions occur, and two people ask the same question yet get
different answers. Doesn't that scare anyone? That would mean (among
other things) that Postgresql cannot be used for a data warehouse.
Have you read the doc chapter about MVCC? Sounds like you don't
quite understand how it works yet.
-Doug
--
Doug McNaught Wireboard Industries http://www.wireboard.com/
Custom software development, systems and network consulting.
Java PostgreSQL Enhydra Python Zope Perl Apache Linux BSD...
"Dann Corbit" <DCorbit@connx.com> writes:
How many accounts are 90 days overdue? Bill says 78 and Frank says 82.
Who is right and how can we know?
If Bill and Frank look at exactly the same instant (ie, from read-only
transactions started at the same time), they will get the same answer.
If they are looking from transactions started at different times, or
from transactions that have themselves added/removed rows, they won't
necessarily get the same answer. That does not make either answer
wrong, just a reflection of the snapshots they are seeing.
regards, tom lane
"Dann Corbit" <DCorbit@connx.com> writes:
How would this work with MVCC?
Whenever a commit occurs, the pending inserts are totaled into the sum
and the pending deletes are subtracted. It can be a list in memory or
whatever. Maybe you are referring to the old (expired) rows being
stored until vacuum? Perhaps I really don't understand your question or
the issues involved. Why does MVCC complicate issues?
<<
Because the row count depends on what transactions have committed when
yours starts. Also, you will see the count(*) reflecting INSERTs in
your transaction, but others won't until you commit. So there is no
well-defined concept of cardinality under MVCC--it depends on which
rows are visible to which transactions.
-Doug
--
Doug McNaught Wireboard Industries http://www.wireboard.com/
Custom software development, systems and network consulting.
Java PostgreSQL Enhydra Python Zope Perl Apache Linux BSD...
-----Original Message-----
From: Doug McNaught [mailto:doug@wireboard.com]
Sent: Friday, April 05, 2002 11:55 AM
To: Dann Corbit
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Suggestion for optimization
"Dann Corbit" <DCorbit@connx.com> writes:
How would this work with MVCC?
Whenever a commit occurs, the pending inserts are totaled into the sum
and the pending deletes are subtracted. It can be a list in memory or
whatever. Maybe you are referring to the old (expired) rows being
stored until vacuum? Perhaps I really don't understand your question or
the issues involved. Why does MVCC complicate issues?
<<
Because the row count depends on what transactions have committed when
yours starts. Also, you will see the count(*) reflecting INSERTs in
your transaction, but others won't until you commit. So there is no
well-defined concept of cardinality under MVCC--it depends on which
rows are visible to which transactions.
-----------------------------------------------------------------
I guess that this model can be viewed as "everything is a snapshot".
It seems plain that the repercussions for a data warehouse and for
reporting have not been thought out very well. This is definitely
very, very bad in that arena. I suppose that reporting could still
be accomplished, but it would require pumping the data into a new
copy of the database that does not allow writes at all. Yuck.
At any rate, there is clearly a concept of cardinality in any case.
Perhaps the information would have to be kept as part of the
connection. If (after all) you cannot even compute cardinality
for a single connection then the database truly is useless. In
fact, under a scenario where cardinality has no meaning, neither does
select count(*) since that is what it measures. Might as well
remove it from the language.
I have read a couple books on Postgresql and somehow missed the
whole MVCC idea. Maybe after I understand it better the clammy
beads of sweat on my forehead will dry up a little.
<<-----------------------------------------------------------------
"Dann Corbit" <DCorbit@connx.com> writes:
At any rate, there is clearly a concept of cardinality in any case.
Certainly. The count(*) value is perfectly well defined within any one
transaction. We *could*, if we wanted to, implement bookkeeping logic
that would keep track of the number of rows inserted by all transactions
and allow derivation of the count-as-seen-by-any-one-transaction at all
times. The point is that that logic would be vastly more complex than
you thought it would be; and it would not be optional. (AFAICS, the
counts would have to be determined at postmaster startup and then
maintained faithfully by all transactions. There wouldn't be any good
way for a transaction to initialize the bookkeeping logic on-the-fly ---
unless you call acquiring an exclusive lock on a table good.) No one
who's looked at it has thought that it would be a good tradeoff for
making count(*) faster.
regards, tom lane
Not to mention it only increases the speed of:
SELECT count(*) FROM foo;
and not:
SELECT count(*) FROM foo WHERE bar;
--
Rod Taylor
Your eyes are weary from staring at the CRT. You feel sleepy. Notice
how restful it is to watch the cursor blink. Close your eyes. The
opinions stated above are yours. You cannot imagine why you ever felt
otherwise.
----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Dann Corbit" <DCorbit@connx.com>
Cc: "Doug McNaught" <doug@wireboard.com>;
<pgsql-hackers@postgresql.org>
Sent: Friday, April 05, 2002 3:21 PM
Subject: Re: [HACKERS] Suggestion for optimization
"Dann Corbit" <DCorbit@connx.com> writes:
At any rate, there is clearly a concept of cardinality in any case.
Certainly. The count(*) value is perfectly well defined within any one
transaction. We *could*, if we wanted to, implement bookkeeping logic
that would keep track of the number of rows inserted by all transactions
and allow derivation of the count-as-seen-by-any-one-transaction at all
times. The point is that that logic would be vastly more complex than
you thought it would be; and it would not be optional. (AFAICS, the
counts would have to be determined at postmaster startup and then
maintained faithfully by all transactions. There wouldn't be any good
way for a transaction to initialize the bookkeeping logic on-the-fly ---
unless you call acquiring an exclusive lock on a table good.) No one
who's looked at it has thought that it would be a good tradeoff for
making count(*) faster.
regards, tom lane
Dann Corbit wrote:
I guess that this model can be viewed as "everything is a snapshot".
It seems plain that the repercussions for a data warehouse and for
reporting have not been thought out very well. This is definitely
very, very bad in that arena. I suppose that reporting could still
be accomplished, but it would require pumping the data into a new
copy of the database that does not allow writes at all. Yuck.
At any rate, there is clearly a concept of cardinality in any case.
Perhaps the information would have to be kept as part of the
connection. If (after all) you cannot even compute cardinality
for a single connection then the database truly is useless. In
fact, under a scenario where cardinality has no meaning, neither does
select count(*) since that is what it measures. Might as well
remove it from the language.
I have read a couple of books on Postgresql and somehow missed the
whole MVCC idea. Maybe after I understand it better the clammy
beads of sweat on my forehead will dry up a little.
Oracle is also an MVCC database. So this notion that MVCC somehow makes
it inappropriate for data warehousing would imply that Oracle is also
inappropriate. However, in your defense, Oracle did apparently find
enough customer demand for an MVCC-compatible hack of COUNT() to
implement a short-cut route to calculate its value...
Mike Mascari
mascarm@mascari.com
-----Original Message-----
From: Mike Mascari [mailto:mascarm@mascari.com]
Sent: Friday, April 05, 2002 12:44 PM
To: Dann Corbit
Cc: Doug McNaught; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Suggestion for optimization
Dann Corbit wrote:
I guess that this model can be viewed as "everything is a snapshot".
It seems plain that the repercussions for a data warehouse and for
reporting have not been thought out very well. This is definitely
very, very bad in that arena. I suppose that reporting could still
be accomplished, but it would require pumping the data into a new
copy of the database that does not allow writes at all. Yuck.
At any rate, there is clearly a concept of cardinality in any case.
Perhaps the information would have to be kept as part of the
connection. If (after all) you cannot even compute cardinality
for a single connection then the database truly is useless. In
fact, under a scenario where cardinality has no meaning, neither does
select count(*) since that is what it measures. Might as well
remove it from the language.
I have read a couple of books on Postgresql and somehow missed the
whole MVCC idea. Maybe after I understand it better the clammy
beads of sweat on my forehead will dry up a little.
Oracle is also an MVCC database. So this notion that MVCC somehow makes
it inappropriate for data warehousing would imply that Oracle is also
inappropriate. However, in your defense, Oracle did apparently find
enough customer demand for an MVCC-compatible hack of COUNT() to
implement a short-cut route to calculate its value...
------------------------------------------------------------------
That's interesting. If Oracle is an MVCC database, how did they
manage to implement the ANSI standard isolation levels? It seems it
ought to be impossible.
<<------------------------------------------------------------------
At 12:08 PM -0800 4/5/02, Dann Corbit wrote:
I guess that this model can be viewed as "everything is a snapshot".
It seems plain that the repercussions for a data warehouse and for
reporting have not been thought out very well. This is definitely
very, very bad in that arena. I suppose that reporting could still
be accomplished, but it would require pumping the data into a new
copy of the database that does not allow writes at all. Yuck.
That is exactly the point of MVCC. When you start your reporting cycle, you initiate a transaction. That transaction causes the database to _act_ as if you had "pump[ed] the data into a new copy of the database that does not allow writes at all."
Your transaction is isolated from ongoing activities in the database. Your transaction _is_ a snapshot of the database at some instant in time.
This is a good thing. You should probably ponder it for a while before claiming it hasn't been thought out well wrt. certain applications.
Still, your suggestion _could_ be implemented. Your comment: "An accurate
cardinality figure can greatly enhance the optimizer's ability to
perform joins in the correct order" was intriguing, and I'd be interested in Tom's thoughts on just that bit.
-pmb
-----Original Message-----
From: Jon Grov [mailto:jon@linpro.no]
Sent: Friday, April 05, 2002 12:54 PM
To: Dann Corbit
Cc: Mike Mascari; Doug McNaught; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Suggestion for optimization
"Dann Corbit" <DCorbit@connx.com> writes:
That's interesting. If Oracle is an MVCC database, how did they
manage to implement the ANSI standard isolation levels? It seems it
ought to be impossible.
There's an excellent introduction to MVCC and snapshot isolation in
the PostgreSQL docs.
See
http://www2.no.postgresql.org/users-lounge/docs/7.2/postgres/mvcc.html
------------------------------------------------------------------
I have read these documents (and some others) now. It seems that
there is a serializable transaction level, and so the goal I was
after can be reached anyway. So never mind. I am at peace again
(and breathing a heavy sigh of relief).
But I am a bit puzzled. How can a serializable transaction be
performed in an MVCC system? I realize that Oracle does it, and also
Postgresql, but I can't picture how that would work.
<<------------------------------------------------------------------
-----Original Message-----
From: Rod Taylor [mailto:rbt@zort.ca]
Sent: Friday, April 05, 2002 12:35 PM
To: Dann Corbit; Tom Lane
Cc: Doug McNaught; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Suggestion for optimization
Not to mention it only increases the speed of:
SELECT count(*) FROM foo;
and not:
SELECT count(*) FROM foo WHERE bar;
-----------------------------------------------
Of course. Nobody optimizes that one (that I am
aware of).
<<-----------------------------------------------
It would be nice if total table cardinality could be maintained live.
How would this work with MVCC?
It wouldn't. That's why it's not there. Under MVCC, table cardinality
is in the eye of the beholder...
That makes me curious how Oracle implements it. I was under the impression
that Oracle managed to get the two together (MVCC and cardinality).
Also, can't triggers basically solve the problem from a functionality
standpoint? I mean, it would be slow for inserts, but I wonder if similar
principles could be applied without the full overhead of triggers?
What I had in mind was to create a stats table (I guess much like the one
that already exists, or a separate attribute of the existing one) and all of
the tables in there, with a rowcount. Once a row is inserted, the trigger
runs and updates the count (or decrements for delete), making the new count
visible to the current transaction. Then when the transaction commits, it's
visible everywhere at the same time as the count.
Is there anything wrong with that setup, other than the obvious performance
hit?
By the way, since this discussion has happened before, I actually read a
similar idea in a previous email (if I remember correctly), but I didn't see
much talk about it.
Regards,
Jeff
(Sorry that my previous post did not reach the pgsql-hackers list, I
sent it from the wrong address and was thus not considered a
subscriber)
"Dann Corbit" <DCorbit@connx.com> writes:
But I am a bit puzzled. How can a serializable transaction be
performed in an MVCC system? I realize that Oracle does it, and also
Postgresql, but I can't picture how that would work.
In short, snapshot isolation implies this (assuming all transactions
are assigned a monotonically increasing timestamp, and that all updates
create a new, unique version of the object):
- A transaction reads the most recent version of an object written by
a transaction that committed before its beginning.
- Two concurrent (i.e. at some point in time, they're both active)
transactions must have disjoint write sets.
This is probably best illustrated through an example:
Let x and y be columns in a table a. Initially, x = 20 and y = 5 in
the row where k = 1 (k is the primary key).
- Let T_1 be transaction as follows:
BEGIN;
SELECT x FROM a WHERE k = 1; SELECT y FROM a WHERE k = 1;
END;
- Let T_2 be another transaction:
BEGIN;
UPDATE a SET x = 10 WHERE k = 1; UPDATE a SET y = 10 WHERE k = 1;
END;
Then, if we have the following execution:
10 BEGIN; /* T_2 */
20 BEGIN; /* T_1 */
30 SELECT x FROM a WHERE k = 1; /* T_1 */
40 UPDATE a SET x = 10 WHERE k = 1; /* T_2 */
50 UPDATE a SET y = 10 WHERE k = 1; /* T_2 */
60 SELECT y FROM a WHERE k = 1; /* T_1 */
70 END; /* T_2 */
80 END; /* T_1 */
Clearly, this would not be serializable in a non-versioned
database. But under MVCC and snapshot isolation, T_1 reads from a
snapshot, i.e. it would not (under the serializable isolation level, at
least) be allowed to read anything T_2 has written. The requirement
is that a transaction T reads only values written by transactions
committed before T's beginning.
So T_1 would find that x = 20 and y = 5, just as it would if T_1 and
T_2 were executed serially with T_1 before T_2.
With two (or more) updating transactions, things get more complicated.
Assume we have the following two transactions:
T_1:
BEGIN;
UPDATE x SET x = x + 10;
END;
T_2:
BEGIN;
UPDATE x SET x = x - 5;
END;
and the following execution (assuming x = 20 before it starts):
10 BEGIN; /* T_1 */
20 BEGIN; /* T_2 */
30 UPDATE x SET x = x + 10; /* T_1 */
40 END; /* T_1 */
50 UPDATE x SET x = x - 5; /* T_2 */
60 END; /* T_2 */
Since a transaction is only allowed to read values written by
transactions committed before its beginning, both T_1 and T_2 will
read x = 20. A transaction started after just this execution would
then read T_2's newly written value of x, that is 15, and T_1's update
at line 30 would become a lost update. To avoid this, PostgreSQL offers two
solutions:
- Read committed isolation, where a statement of the form UPDATE
<table> SET <column> = <column> + <value> is considered a special
case and T_2 is allowed to read T_1's value.
- Serializable isolation, where T_2 would have to be aborted.
If line 40 and 50 were swapped, T_2 would wait to see what happens to
T_1. If it's aborted, it can safely read x = 20 regardless of the
isolation level, if it's committed, the result would again depend on
the selected isolation level.
Hopefully, this illustrates the basic concepts. An interesting article
concerning the subtleties of this subject was posted by Tom Lane a
couple of days ago:
In addition, this seems to be the "canonical paper" on snapshot
isolation:
http://citeseer.nj.nec.com/berenson95critique.html
--
Jon Grov, Linpro as
I don't think your idea would work for a concurrent user setup where
people have different transactions started at different times with
different amounts of changes inside each transaction.
That's why it would have to be tracked on a "per connection" basis for
all the tables.
I tried it out with concurrent connections and it seemed to hold up just
fine. I think MVCC took care of everything. Transactions got a different
count depending on whether they could see the inserted values or not. Once
committed all transactions could see the new table count.
Can you provide a case where it wouldn't?
I imagine this causes some major performance issues, not to mention the dead
tuples would pile up fast, but it seems to work just fine.
My SQL is below.
Regards,
Jeff
jdavis=> create table tuple_count(tuples int);
CREATE
jdavis=> create table c1(a int);
CREATE
jdavis=> create function f1() returns opaque as '
jdavis'> BEGIN
jdavis'> UPDATE tuple_count set tuples=tuples+1;
jdavis'> RETURN NEW;
jdavis'> END;
jdavis'> ' language 'plpgsql';
CREATE
jdavis=> create function f2() returns opaque as '
jdavis'> BEGIN
jdavis'> UPDATE tuple_count set tuples=tuples-1;
jdavis'> RETURN NEW;
jdavis'> END;
jdavis'> ' language 'plpgsql';
CREATE
jdavis=> create trigger t1 after insert on c1 for each row execute procedure
f1();
CREATE
jdavis=> create trigger t2 after delete on c1 for each row execute procedure
f2();
CREATE
Peter Bierman <bierman@apple.com> writes:
... Your comment: "An
accurate cardinality figure can greatly enhance the optimizer's
ability to perform joins in the correct order" was intriguing, and I'd
be interested in Tom's thoughts on just that bit.
Approximate figures are quite sufficient for the planner's purposes.
AFAICS, making them exact would not improve the planning estimates
at all, because there are too many other sources of error. We have
approximate stats already via vacuum/analyze statistics gathering.
regards, tom lane