how good is PostgreSQL
Hi,
For users of large PostgreSQL installations and PostgreSQL builders, this is for you.
I'm having a terrible time deciding now. :(
We're about to build a "huge" website now. I got tied up in signing the
contract without really getting enough information about PgSQL, since this is
what we plan to implement with PHP (normally we use MySQL, but I guess it
does not fit huge databases like that).
Here's my problem: we're about to build a site like hitbox.com, where a large
amount of database capacity is required. If, say, there are 100,000 users with
1,000 page hits per day each, and everything is logged, you can imagine how
huge this will be. I'm just so "nervous" (really, that's the term) that we'll
implement this and later on experience a slowdown or, worse than that, a
server crash.
My questions are:
1. What is the limit on the number of records in a table WITHOUT SUFFERING A
SLOWDOWN?
2. ... the limit on the number of tables per database?
3. ... the limit on the number of databases?
Thanks for your comments. I would really appreciate every comment I receive
regarding this.
Arnold
"Arnold Gamboa" <arnold@php4us.com> writes:
> We're about to build a "huge" website now. I got tied up in signing the
> contract without really getting enough information about PgSQL, since this is
> what we plan to implement with PHP (normally we use MySQL, but I guess it
> does not fit huge databases like that).
Can you do connection pooling and client-side caching of database queries
in PHP? Working with Java, this is where we really improve speed.
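A client-side query cache of the sort described here can be sketched in a few lines (Python used as a stand-in for PHP; `QueryCache` and `fake_db` are illustrative names, not an existing library):

```python
import time

class QueryCache:
    """Client-side cache: remember result sets for repeated SELECTs."""
    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self.store = {}  # query text -> (timestamp, rows)

    def get(self, query, run_query):
        now = time.time()
        hit = self.store.get(query)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]          # served from cache, no round trip
        rows = run_query(query)    # cache miss: hit the database
        self.store[query] = (now, rows)
        return rows

calls = []
def fake_db(query):
    calls.append(query)            # count real database round trips
    return [("row1",), ("row2",)]

cache = QueryCache(ttl_seconds=60)
first = cache.get("SELECT * FROM users", fake_db)
second = cache.get("SELECT * FROM users", fake_db)  # no second round trip
```

The second lookup never touches the database, which is exactly the saving a pooled, caching client buys you.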
Here's my problem.. We're about to build a site like hitbox.com where there
is a large amount of database required.. If say there is 100,000 users with
1000 page hits per day for each, and everything will be logged, you could
imagine how huge this will be. I'm just so "nervous" (really, that's the
term) if we implement this and later on experience a slow down or worse than
that, crash in the server.
How many database queries do you have per page hit ?
How many database inserts/updates do you have per page hit ?
Are you using the database for httpd access logging, or is it application-level
logging? Either way, you might want to look into an architecture where you
have a dedicated box for the logging.
But most important, test with real data. Populate your database and run
stress tests.
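A stress test along these lines might look like the following sketch. It uses Python's built-in sqlite3 purely as a self-contained stand-in; against a real PostgreSQL server you would point the same loop at your actual schema and query mix:

```python
import sqlite3, time

# Populate a throwaway table with synthetic data resembling the real load.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (user_id INTEGER, url TEXT)")
conn.executemany("INSERT INTO hits VALUES (?, ?)",
                 [(i % 1000, "/page/%d" % i) for i in range(10000)])
conn.commit()

# Time a burst of the per-page queries and report a rate.
start = time.time()
n_queries = 500
for i in range(n_queries):
    conn.execute("SELECT COUNT(*) FROM hits WHERE user_id = ?",
                 (i % 1000,)).fetchone()
elapsed = time.time() - start
rate = n_queries / elapsed if elapsed > 0 else float("inf")
```

The number that comes out is only meaningful when the data volume and query shapes match production, which is the point of populating with real data first.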
I was doing some testing on a portal my company has developed with
PostgreSQL as the backend database. Running on my Linux laptop (a P466 with
128MB RAM, Apache JServ, PostgreSQL 7.0.2), I managed to get about ~20 pageviews
a second. Each pageview had on average 4 queries and 1 insert.
But measure for yourself. Remember that you can gain a lot by tuning the
application, the database, and the OS.
regards,
Gunnar
Arnold Gamboa wrote:
> Hi,
> For users of large PostgreSQL installations and PostgreSQL builders, this is
> for you.
> I'm having a terrible time deciding now. :(
> We're about to build a "huge" website now. I got tied up in signing the
> contract without really getting enough information about PgSQL, since this is
> what we plan to implement with PHP (normally we use MySQL, but I guess it
> does not fit huge databases like that).
> Here's my problem: we're about to build a site like hitbox.com, where a large
> amount of database capacity is required. If, say, there are 100,000 users with
> 1,000 page hits per day each, and everything is logged, you can imagine how
> huge this will be. I'm just so "nervous" (really, that's the term) that we'll
> implement this and later on experience a slowdown or, worse than that, a
> server crash.
That is a LOT of work for any system. That is over 1100 page views a second, or
under 900µs each. A standard Pentium III system serving static pages would
have problems with that.
If you look at search engines, to get that performance with readonly data, they
usually cluster multiple systems and load balance across them. You may need to
segment your data and have multiple SQL servers perform different functions.
Also, that 1100 page view per second is assuming an even distribution of
traffic, which does not happen in a web server. If you average that much,
chances are there will be periods of twice that.
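The arithmetic behind those figures, spelled out:

```python
users = 100_000
hits_per_user_per_day = 1_000
seconds_per_day = 24 * 60 * 60            # 86,400

total_hits = users * hits_per_user_per_day   # 100 million page views/day
average_rate = total_hits / seconds_per_day  # ~1157 pages/second averaged
peak_rate = 2 * average_rate                 # the rule of thumb above: plan
                                             # for at least twice the average
```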
Look into a "local director," "Alteon," or even LVS.
> My questions are:
> 1. What is the limit on the number of records in a table WITHOUT SUFFERING A
> SLOWDOWN?
> 2. ... the limit on the number of tables per database?
> 3. ... the limit on the number of databases?
There are a couple of factors involved; it's more complex than a simple response.
Use multiple databases and put each on a separate disk, with its own
controller. Better yet, have multiple load balanced web boxes do a lot of
processing in PHP and offload much of the CPU bound SQL work to the "cheap" web
boxes, and have multiple SQL databases in the back handling various independent
tasks.
In a web site I worked on, we had multiple front end web servers, load balanced
with an Alteon. Each web server had its own SQL database which provided SQL
access to "static" data which was updated each week. We had an additional
single SQL database backend which all the Web servers accessed for synchronized
dynamic data.
If you are serious about the load you expect to put on this system you must be
careful:
Do not create any indexes you do not need.
Do not use the "foreign key" constraint as it forces a trigger for each insert.
Make sure you index the keys by which you will access data.
Avoid searching by strings, try to use keys.
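The indexing advice above can be demonstrated concretely (sqlite3 used here as a self-contained stand-in; on PostgreSQL you would check the same thing with EXPLAIN):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(i, "user%d" % i, "u%d@example.com" % i) for i in range(1000)])

# Index only the key you will actually search by -- nothing else.
conn.execute("CREATE INDEX idx_users_email ON users (email)")

# Confirm the planner uses the index rather than a full table scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM users WHERE email = ?",
    ("u42@example.com",)).fetchall()
plan_text = " ".join(str(row) for row in plan)
```

If the plan shows a scan instead of the index, the query cost grows with the table, which is exactly the slowdown being warned about.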
Even after that, you have a long way to go before you will hit 1000
transactions per second from any SQL database.
If you are betting your business on this implementation, you have a lot of
homework to do.
> Thanks for your comments. I would really appreciate every comment I receive
> regarding this.
> Arnold
> Even after that, you have a long way to go before you will hit 1000
> transactions per second from any SQL database.
I guess they could always buy a few Sun E10000's on the backend, and a
large room of rack-mountable PC's for web/CGI serving. Nothing like
plopping down ten or twenty million dollars on hardware. : )
steve
Steve Wolfe wrote:
> > Even after that, you have a long way to go before you will hit 1000
> > transactions per second from any SQL database.
> I guess they could always buy a few Sun E10000's on the backend, and a
> large room of rack-mountable PC's for web/CGI serving. Nothing like
> plopping down ten or twenty million dollars on hardware. : )
Or they could buy a single IBM S/390, run Linux/390 and PostgreSQL on
that. Probably would cost less, and be more reliable. And they can
always load another Linux/390 VM -- an S/390 can run something like
41,000 virtual machines each running Linux/390 and Apache.
However, if you want to see the architecture of a _large_
database-backed website, see the story behind Digital City at
www.aolserver.com. While they're using Sybase instead of PostgreSQL,
the architecture is the same.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
> Or they could buy a single IBM S/390, run Linux/390 and PostgreSQL on
> that. Probably would cost less, and be more reliable. And they can
> always load another Linux/390 VM -- an S/390 can run something like
> 41,000 virtual machines each running Linux/390 and Apache.
Yeah.... I'm very optimistic about IBM's new chips that are coming out
next year. Each "processor module" will have 4 processors, but each
processor will have 2 cores - so in effect, each "processor module" has 8
processors on it. All processors will have copper interconnects and,
depending on the source, will debut at anywhere from 1.3 to 2 gigahertz. I
think that will certainly help them get a larger share of the high-end
market!
steve
> Even after that, you have a long way to go before you will hit 1000
> transactions per second from any SQL database.
Since my last post probably wasn't too useful, here's some information
that might be a little more help. It's a little long, I know, but hopefully
it will be of use to someone.
As programmers, we naturally want to throw things into databases for
three reasons. First, it's easy to get data in. Second, it's easy to get
relevant data out. And third, it's "cool". We don't want to work with flat
files, now do we? ; )
However, in some cases, using the database to get data out ends up
costing us a lot of time and money. Sometimes we do the same nasty query so
often, that we end up purchasing bigger hardware to make the system work
reasonably. Why? Because it was easier for us to write a program that did:
GetDataFromDatabase();
PrepareData();
PrintData();
Each time, the database server does the work. But it doesn't
necessarily have to be that way. In our company, we've found two trends
that have enabled us to save a LOT of processing power on our machines.
(read: Increase the capacity of our servers by 30% or more, with fairly
minor changes)
The first case is that of rarely-changing data. Some of our datasets
probably have around 50,000 to 1,000,000 views (selects) for each update
(insert/delete). Having the database repeat the query every time is a
waste. So, we began writing our programs such that they grab the data
from the database once and generate the HTML for every page, plus the
indexes. Then, when an update is made to the database (via the
administrative tools), it simply rewrites *the relevant HTML files* and
changes the indices pointing to them. (There are also some other very large
advantages to this sort of thing, but I'm not allowed to say them. ; ) )
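The regenerate-on-update scheme described above can be sketched roughly like this (all names are illustrative, not the actual system):

```python
import os, tempfile

PAGES = {}  # stand-in for rows in the database: page id -> body text

def render_page(page_id, body, outdir):
    """Write the static HTML file for one page; called only on updates."""
    path = os.path.join(outdir, "%s.html" % page_id)
    with open(path, "w") as f:
        f.write("<html><body>%s</body></html>" % body)
    return path

def update_page(page_id, body, outdir):
    """Admin-tool hook: change the data, then rewrite only the affected file."""
    PAGES[page_id] = body
    return render_page(page_id, body, outdir)

outdir = tempfile.mkdtemp()
path = update_page("about", "Rarely-changing content", outdir)
with open(path) as f:
    html = f.read()
```

The web server then serves the flat file on every hit; the database is touched once per update instead of once per view.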
The second case is that of often-repeated queries. One of the
offerings on our site is an online directory, which gets a pretty fair
amount of traffic. Unfortunately, it uses a proprietary program that was
purchased by management before they spoke with us. Grr.... It was the
most utterly inefficient program I've ever seen. It would *not* allow the
database to do joins, it would grab entire tables, then try to do the joins
itself, in Perl.
We rewrote the program to let PostgreSQL do the joins, and that sped
it up. Then we realized that a very small number of queries (those for the
first one or two levels of pages) accounted for a huge portion of the
usage. So, we replaced the front page with a static HTML page (the front
page doesn't change...), and saw another terrific drop in our system loads.
Overall, by only modifying a couple of our more heavily-used programs,
our server loads dropped by about 30%-40%. If we went to the trouble to
modify some others, it would drop even more. But we're going to rewrite
them completely for other reasons. : )
In any event, there are ways like this to save a LOT of CPU and disk I/O.
Most web servers can serve out several hundred static pages with the
resources that would otherwise deliver one dynamically-created,
database-driven page. It also allows you to cluster the web servers with
cheap commodity hardware, instead of using big iron on the database. And if
you have a big-iron machine running the back end, this can severely lighten
the load on it, keeping you from dropping a few hundred grand on the next
step up. ; )
(Incidentally, we've toyed around with developing a query-caching system
that would sit between PostgreSQL and our DB libraries. However, it seems
like it could be done *much* more efficiently in PostgreSQL itself, as it
would be much easier to keep track of which tables have changed, etc.
Anybody know if this sort of functionality is planned? It would be terrific
to simply give the machine another 256 megs of RAM and tell it to use it as
a DB cache...)
steve
markw wrote:
> Arnold Gamboa wrote:
> > Hi,
> > For users of large PostgreSQL installations and PostgreSQL builders, this
> > is for you.
> ..snip..
> Also, that 1100 page view per second is assuming an even distribution of
> traffic, which does not happen in a web server. If you average that much,
> chances are there will be periods of twice that.
That's excessively optimistic. If your daily average is 1100 per second, you'll
have 2200 average for many of the hours in that day, 5500 for a few hours, and
some 10-minute periods with 11,000, certainly once in a while.
++ kevin
--
Kevin O'Gorman (805) 650-6274 mailto:kogorman@pacbell.net
Permanent e-mail forwarder: mailto:Kevin.O'Gorman.64@Alum.Dartmouth.org
At school: mailto:kogorman@cs.ucsb.edu
Web: http://www.cs.ucsb.edu/~kogorman/index.html
Web: http://trixie.kosman.via.ayuda.com/~kevin/index.html
"There is a freedom lying beyond circumstance,
derived from the direct intuition that life can
be grounded upon its absorption in what is
changeless amid change"
-- Alfred North Whitehead
> As programmers, we naturally want to throw things into databases for
> three reasons. First, it's easy to get data in. Second, it's easy to get
> relevant data out. And third, it's "cool". We don't want to work with
> flat files, now do we? ; )
Kiddin', eh? :) Actually, the third reason seems to dominate the younger
developers' minds. People often tend to keep everything in a poor DBMS until
it begins to kick back. And this has an impact on the customers. Does your
system use a database? No, why should it? You mean you'll keep our dearly
beloved banner ads as flat files? Yes, this is where they belong. Sorry,
we'll look for someone more advanced. Good luck.
Of course, hardware vendors jump for joy :) Maybe I don't get it, but
IMHO there's no reason to put into the DB something that can't be indexed and
used in a WHERE clause.
> It would *not* allow the database to do joins, it would grab entire tables,
> then try to do the joins itself, in Perl.
Umh.... Yeah.... Well.... To keep compatibility with other Open Source
Databases and ESR/RMS, you know :)
> (Incidentally, we've toyed around with developing a query-caching
> system that would sit between PostgreSQL and our DB libraries.
Sounds amazing, but requires some research, I guess. However, in many
cases one would be more than happy with cached connections. Of course,
cached query results can be naturally added to that, but just connections
are OK to start with. Security....
--
contaminated fish and microchips
huge supertankers on Arabian trips
oily propaganda from the leaders' lips
all about the future
there's people over here, people over there
everybody's looking for a little more air
crossing all the borders just to take their share
planning for the future
Rainbow, Difficult to Cure
> > (Incidentally, we've toyed around with developing a query-caching
> > system that would sit between PostgreSQL and our DB libraries.
> Sounds amazing, but requires some research, I guess. However, in many
> cases one would be more than happy with cached connections. Of course,
> cached query results can be naturally added to that, but just connections
> are OK to start with. Security....
To me, it doesn't sound like it would be that difficult of a project, at
least not for the likes of the PostgreSQL developers. It also doesn't seem
like it would really introduce any security problems, not if it were done
inside of PostgreSQL. Long ago, I got sidetracked from my endeavors in C,
and so I don't feel that I'm qualified to do it. (Otherwise, I would have
done it already. : ) ) If you wanted it done in Perl or Object Pascal, I
could help. : )
Here's a simple design that I was tossing back and forth. Please
understand that I'm not saying this is the best way to do it, or even a good
way to do it. Just a possible way to do it. I haven't been able to give it
as much thought as I would like to. Here goes.
------------
Implementation
Upon starting, the PostgreSQL engine could allocate a chunk of memory,
sized according to the administrator's desire. That chunk would be used
solely for query caching.
When a query came in that was not cached (say, the first query), the
database engine would process it as normal. It would then return it to the
user, and add it to the cache. "Adding it to the cache" would mean that it
would enter the query itself, the result set, and a list of which tables the
query relied upon. The query that is stored could be either the query
coming from the user, or the query after it goes through the optimizer.
Each has pros and cons, I would probably favor using the query that comes
from the user.
When another query comes along, the caching engine would quickly look
in the hash table, and see if it already had the cached results of the
query. If so, it returns them, and wham. You've just avoided all of the
work of optimizing, parsing, and executing, not to mention the disk I/O. A
hash lookup seems extremely cheap compared to the work of actually
processing a query.
When an update/delete/insert comes along, the engine would analyze
which tables were affected, and clear the cache entries that relied upon
those tables.
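The three steps above (cache on miss, hash lookup on hit, invalidate on write) can be sketched as follows. This is an illustration of the proposed design, not actual or proposed PostgreSQL internals, and the class and method names are made up:

```python
class QueryCacheWithInvalidation:
    """Cache query results; invalidate entries when their tables change."""
    def __init__(self):
        self.results = {}    # query text -> result rows
        self.tables_of = {}  # query text -> set of tables the query reads

    def lookup(self, query):
        # Hash lookup: cheap compared to parsing, planning, and executing.
        return self.results.get(query)

    def add(self, query, rows, tables):
        self.results[query] = rows
        self.tables_of[query] = set(tables)

    def invalidate(self, changed_table):
        # An insert/update/delete touched changed_table: drop dependent entries.
        stale = [q for q, tabs in self.tables_of.items() if changed_table in tabs]
        for q in stale:
            del self.results[q]
            del self.tables_of[q]
        return len(stale)

cache = QueryCacheWithInvalidation()
cache.add("SELECT * FROM users", [("alice",)], tables=["users"])
cache.add("SELECT * FROM logs", [("hit",)], tables=["logs"])

hit = cache.lookup("SELECT * FROM users")   # cached
dropped = cache.invalidate("users")         # an UPDATE touched 'users'
miss = cache.lookup("SELECT * FROM users")  # gone; must re-execute
```

Note that entries depending on untouched tables survive the invalidation, which is what makes the scheme better than flushing everything on every write.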
-----------------
Cache Clearing
Cache clearing would be achieved via an LRU-based algorithm, which
would also take into account the amount of RAM used by each query in the
cache.
-----------------
Performance Impact
The potential performance differences range from a minuscule decrease to
a tremendous increase. And it's a lot cheaper to throw an extra half gig of
RAM in a machine than to upgrade processors and disk subsystems!
------------------
Possible Changes
One potential drawback is that when a table is modified, the queries
that rely upon it would be discarded. Where a table is updated frequently,
that could greatly reduce the performance benefit. One possible alternative
is to store the query cost with each query in the cache. When a table is
updated, those queries are marked as "dirty". If the system load is below a
certain amount, or the system has been idle, it could then re-execute those
queries and update the cache. Which queries it re-executed would be
determined by query cost and how frequently those cache entries
were used.
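The dirty-marking alternative might look like this sketch; the cost-times-usage priority is one possible weighting, not a tested formula, and all names are illustrative:

```python
class RefreshingCache:
    """Mark entries dirty instead of dropping them; refresh the costly,
    frequently-used ones first when the system is idle."""
    def __init__(self):
        self.entries = {}  # query -> dict(rows, tables, cost, uses, dirty)

    def add(self, query, rows, tables, cost):
        self.entries[query] = dict(rows=rows, tables=set(tables),
                                   cost=cost, uses=0, dirty=False)

    def lookup(self, query):
        e = self.entries.get(query)
        if e is None or e["dirty"]:
            return None            # dirty entries behave like cache misses
        e["uses"] += 1
        return e["rows"]

    def mark_dirty(self, changed_table):
        for e in self.entries.values():
            if changed_table in e["tables"]:
                e["dirty"] = True

    def refresh_order(self):
        # During idle time, re-execute costly, popular queries first.
        dirty = [(q, e) for q, e in self.entries.items() if e["dirty"]]
        dirty.sort(key=lambda qe: qe[1]["cost"] * (1 + qe[1]["uses"]),
                   reverse=True)
        return [q for q, _ in dirty]

cache = RefreshingCache()
cache.add("Q_cheap", [1], ["t"], cost=1)
cache.add("Q_costly", [2], ["t"], cost=100)
cache.lookup("Q_costly")      # bump its use count
cache.mark_dirty("t")         # a write touched table 't'
order = cache.refresh_order()
```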
-------------------
The reason I would prefer it done in the PostgreSQL engine (as opposed to
in a middleware application) is that the caching engine needs to know (a)
which tables a query relies upon, and (b) which tables get changed. It
seems that it would significantly reduce overhead to do those inside of
PostgreSQL (which is already doing the query parsing and analysis).
This could certainly give PostgreSQL a huge advantage over other
database systems, too. It could save administrators a very large chunk of
cash that they would otherwise have to spend on large systems. And it would
just be cool. ; )
steve
> Whenever a query is executed (not found in cache, etc.), the caching
> system would simply store the query, the results, and a list of tables
> queried. When a new query came in, it would do a quick lookup in the query
> hash to see if it already had the results. If so, whammo. Whenever an
> insert/delete/update was sensed, it would look at the tables being affected,
> and the caching mechanism would clear out the entries depending on those
> tables.
It seems to me that tracking the list of cached queries and watching for
queries that might invalidate them adds a lot of complexity to the back end,
and the front end still has to establish the connection and wait to transfer
the data over the socket.
On a more practical level, a backend solution would require someone with
fairly detailed knowledge of the internals of the backend. A front-end
solution is more likely to be implementable by someone not as knowledgeable.
One of the big advantages of your technique is that there is no code change at
the application level. This means less database lock-in. Maybe that is a
disadvantage too. ;-)
> It seems to me that tracking the list of cached queries and watching for
> queries that might invalidate them adds a lot of complexity to the back end,
> and the front end still has to establish the connection and wait to transfer
> the data over the socket.
I really don't think that it would. Checking to see if you have a query
(a hash lookup) is very, very cheap relative to normally processing a query,
I would think.
And invalidating cache entries would also be very, very cheap compared to
the normal activity of the database. Assuming hashes are done correctly, it
would probably be done much, much faster than any query could execute. If
software caches can increase the performance of disk drives that have
latencies in thousandths of seconds, I'm sure they could help with queries
that take hundredths or tenths of seconds. ; )
> On a more practical level, a backend solution would require someone with
> fairly detailed knowledge of the internals of the backend. A front-end
> solution is more likely to be implementable by someone not as knowledgeable.
Yeah. I was hoping that one of the developers would say "oooh... that
would rock. We should do that." : )
> One of the big advantages of your technique is that there is no code change at
> the application level. This means less database lock-in. Maybe that is a
> disadvantage too. ;-)
I'm sure that someone with a better understanding of the theory associated
with cache invalidation would design a better algorithm than I would, but it
seems that even a fairly rudimentary implementation would seriously increase
performance.
steve
> Here's a simple design that I was tossing back and forth. Please
> understand that I'm not saying this is the best way to do it, or even a
> good way to do it. Just a possible way to do it.
Sounds interesting, I certainly have reasons to play bad guy, but that's
what I always do, so nevermind :)
However, there's one major point where I disagree. Not that I have real
reasons to, or observation or analysis to back my position, just a
feeling. And the feeling is that the connection/query cache should be separate
from the DBMS server itself.
Several things come to mind right off, like possibilities to cache
connections to different sources, like PGSQL and Oracle, as well as a
chance to run this cache on a separate box that will perform various
additional functions, like load balancing. But that's right on the surface.
Still in doubt....
--
> Sounds interesting, I certainly have reasons to play bad guy, but that's
> what I always do, so nevermind :)
That's OK. Somebody has to be a realist. : )
> However, there's one major point where I disagree. Not that I have real
> reasons to, or observation or analysis to back my position, just a
> feeling. And the feeling is that the connection/query cache should be
> separate from the DBMS server itself.
> Several things come to mind right off, like possibilities to cache
> connections to different sources, like PGSQL and Oracle,
That would be a benefit if you're running multiple DBMS' in the house -
and you're certainly welcome to do something like that as a standalone
package. ; ) I think it would be terrific if PostgreSQL could have the
feature added to it, which would (a) give it a big performance benefit, (b)
let it take advantage of already-written code, and (c) make one less machine
and service to administer.
> as well as a chance to run this cache on a separate box that will perform
> various additional functions, like load balancing. But that's right on the
> surface.
> Still in doubt....
Yes, load balancing would be another good factor. However, to my (very
limited) knowledge, there aren't any truly good ways of splitting up
database work. If you're doing nothing but selects, it would be easy. But
when updates come around, it gets hairier - and when you try for
dynamic continuity-checking and database rebuilding, it gets very ugly. If
there are any systems that get around those without huge performance hits,
I'd love to hear about it.
(Of course, if you have lots of money, a Beowulf-style cluster with
high-bandwidth, low-latency interconnects becomes desirable. But that's a
different ballgame.)
However, there is one other possibility: With caching, your servers
might see enough of a performance increase that you wouldn't need to
load-balance them. : )
steve
* Steve Wolfe <steve@iboats.com> [001031 13:47] wrote:
> > > (Incidentally, we've toyed around with developing a query-caching
> > > system that would sit between PostgreSQL and our DB libraries.
> > Sounds amazing, but requires some research, I guess. However, in many
> > cases one would be more than happy with cached connections. Of course,
> > cached query results can be naturally added to that, but just connections
> > are OK to start with. Security....
> To me, it doesn't sound like it would be that difficult of a project, at
> least not for the likes of the PostgreSQL developers. It also doesn't seem
> like it would really introduce any security problems, not if it were done
> inside of PostgreSQL. Long ago, I got sidetracked from my endeavors in C,
> and so I don't feel that I'm qualified to do it. (Otherwise, I would have
> done it already. : ) ) If you wanted it done in Perl or Object Pascal, I
> could help. : )
> Here's a simple design that I was tossing back and forth. Please
> understand that I'm not saying this is the best way to do it, or even a good
> way to do it. Just a possible way to do it. I haven't been able to give it
> as much thought as I would like to. Here goes.
> ------------
> Implementation
> [snip]
Karel Zak <zakkr@zf.jcu.cz> implemented stored procedures for
PostgreSQL, but they still haven't been integrated.
You can find his second attempt to get a response from the developers
here:
http://people.freebsd.org/~alfred/karel-pgsql.txt
--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."
> I think this feature deserves to be put on the TODO list under exotic
> features.
Well, it's kinda implemented already, I believe, with decades of being run
unattended :)
> This feature would probably also be a threat to MySQL dominance in the
> web scripting area for websites with medium to high traffic ;)
Dominance? Who needs it, anyway?
--
KuroiNeko wrote:
> > Here's a simple design that I was tossing back and forth. Please
> > understand that I'm not saying this is the best way to do it, or even a
> > good way to do it. Just a possible way to do it.
> Sounds interesting, I certainly have reasons to play bad guy, but that's
> what I always do, so nevermind :)
I think this feature deserves to be put on the TODO list under exotic
features.
This feature would probably also be a threat to MySQL dominance in the
web scripting area for websites with medium to high traffic ;)
Poul L. Christiansen
a) Don't log to a database. Log data should be sent into a process
that collects any needed on-the-fly statistics and then outputs
into disk files (rotating hourly or daily depending on your needs).
This model is becoming pretty standard with Apache now; look at
rotatelogs in the Apache distribution for an example.
b) Number of records isn't really the issue. Query complexity and
number of queries are more pertinent. Generally, for example, a
single SELECT that pulls in multiple rows is much faster than
a bunch of small SELECTs.
c) For very high traffic, you are going to have multiple front-end
servers. If you design the system carefully, you can have a single
shared network disk used by all of your front ends, then just stack
boxes in front of it. This doesn't give you endless scalability, though;
at some point you'll saturate your network file server and/or database box.
d) PHP may not be a great choice. It doesn't provide a lot of hooks
for effective caching of database connections and/or results.
mod_perl or Java servlets may be better, depending on the details.
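Point (a) above can be sketched as a log writer that appends to hourly files rather than an SQL table (a minimal Python sketch; the file-naming scheme and function name are just illustrations):

```python
import os, tempfile, time

def log_hit(line, logdir, when=None):
    """Append one log line to an hourly file instead of an SQL table."""
    when = time.time() if when is None else when
    name = time.strftime("access-%Y%m%d-%H.log", time.gmtime(when))
    path = os.path.join(logdir, name)
    with open(path, "a") as f:
        f.write(line + "\n")
    return path

logdir = tempfile.mkdtemp()
t0 = 0  # fixed epoch timestamps for illustration
p1 = log_hit("GET /index.html", logdir, when=t0)
p2 = log_hit("GET /about.html", logdir, when=t0)
p3 = log_hit("GET /index.html", logdir, when=t0 + 3600)  # next hour, new file
```

Appending a line is far cheaper than an SQL insert, and the hourly files can be batch-loaded or summarized offline.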
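Point (b)'s advice, one SELECT pulling multiple rows instead of a bunch of small SELECTs, can be illustrated like this (Python's built-in sqlite3 as a stand-in for PostgreSQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, "user%d" % i) for i in range(100)])

wanted = [3, 7, 11, 42]

# Many small SELECTs: one round trip and one query per row.
rows_slow = [conn.execute("SELECT name FROM users WHERE id = ?",
                          (i,)).fetchone()[0]
             for i in wanted]

# One SELECT pulling all the rows at once.
placeholders = ",".join("?" * len(wanted))
rows_fast = [r[0] for r in conn.execute(
    "SELECT name FROM users WHERE id IN (%s) ORDER BY id" % placeholders,
    wanted)]
```

Both produce the same rows, but the batched form pays the per-query overhead once instead of once per row.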
- Tim Kientzle
> d) PHP may not be a great choice. It doesn't provide a lot of hooks
> for effective caching of database connections and/or results.
> mod_perl or Java servlets may be better, depending on the details.
One of our competitors spent a very, very large amount of money on high-end
Sun equipment so that they could write their CGI stuff in Java servlets.
It still ran slow. We run Perl on machines that pale compared to theirs,
and get far better performance. : )
steve
> We run Perl on machines that pale compared to theirs,
> and get far better performance. : )
Well, don't get me wrong, I'm not going to a war here :) But CGI is so
simple and straightforward that anything more than C is quite an overkill
(think assembly).
Myself, I'm planning to port all my Perl stuff eventually. Yes, Perl is
great for string handling, but when you spend a couple of weeks on BugTraq,
you'll suddenly feel that it's still too much. When you only let `known
good' values in, lex or regexp libs will do.
Sorry for the offtopic; anyone interested is welcome to email me in
private.
Ed
--