A query planner that learns
While all the talk of a hinting system over on the hackers and
performance lists is good, and I have a few queries that could benefit
from a simple hint system now and again, I keep thinking that a query
planner that learns from its mistakes over time is far more desirable.
Is it reasonable or possible for the system to have a way to look at
query plans it's run and look for obvious mistakes it's made, like being
off by a factor of 10 or more in its estimates, and slowly learn to
apply its own hints?
Seems to me that would be far more useful than my having to babysit the
queries that are running slow and come up with hints to have the
database do what I want.
I already log slow queries and review them once a week by running them
with explain analyze and adjust what little I can, like stats targets
and such.
It seems to me the first logical step would be having the ability to
flip a switch so that when the postmaster hits a slow query, it saves
both the query that ran long and the output of explain or explain
analyze, or some stripped-down version missing some of the inner timing
info. Even just saving the parts of the plan where the planner thought
it would get 1 row and instead got 350,000 and was using a nested loop
to join would be VERY useful. I could see something like that
eventually evolving into a self-tuning system.
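As a rough sketch of what that first step might look like (this is my own illustration, not any existing tool; the regex and the line format it assumes are based on typical EXPLAIN ANALYZE output), a small post-processor over saved plan text could flag exactly those nodes whose row estimates are off by a factor of 10 or more:

```python
import re

# Matches the "rows=E ...) (actual ... rows=A" portion of an EXPLAIN ANALYZE
# node line; the exact text format is an assumption about typical output.
NODE_RE = re.compile(r'rows=(\d+)[^)]*\)\s+\(actual[^)]*rows=(\d+)')

def misestimated_nodes(plan_text, factor=10):
    """Return (line, estimated, actual) for nodes off by >= factor."""
    flagged = []
    for line in plan_text.splitlines():
        m = NODE_RE.search(line)
        if not m:
            continue
        est, act = int(m.group(1)), int(m.group(2))
        # Ratio in either direction; clamp zeros so we never divide by zero.
        ratio = max((act or 1) / (est or 1), (est or 1) / (act or 1))
        if ratio >= factor:
            flagged.append((line.strip(), est, act))
    return flagged

plan = """Nested Loop  (cost=0.00..10.50 rows=1 width=8) (actual time=0.1..900.2 rows=350000 loops=1)
  ->  Seq Scan on a  (cost=0.00..5.00 rows=100 width=4) (actual time=0.0..1.2 rows=98 loops=1)"""
print(misestimated_nodes(plan))
```

Run nightly over the saved plans, something like this would already produce the "which queries had bad plans" report mentioned below, without anyone re-running explain analyze by hand.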
Well, I'm busy learning to be an Oracle DBA right now, so I can't do
it. But it would be a very cool project for the next college student
who shows up looking for one.
On Thu, Oct 12, 2006 at 03:31:50PM -0500, Scott Marlowe wrote:
While all the talk of a hinting system over in hackers and perform is
good, and I have a few queries that could live with a simple hint system
pop up now and again, I keep thinking that a query planner that learns
from its mistakes over time is far more desirable.
Is it reasonable or possible for the system to have a way to look at
query plans it's run and look for obvious mistakes its made, like being
off by a factor of 10 or more in estimations, and slowly learn to apply
its own hints?
Seems to me that would be far more useful than my having to babysit the
queries that are running slow and come up with hints to have the
database do what I want.
I already log slow queries and review them once a week by running them
with explain analyze and adjust what little I can, like stats targets
and such.
It seems to me the first logical step would be having the ability to
flip a switch and when the postmaster hits a slow query, it saves both
the query that ran long, as well as the output of explain or explain
analyze or some bastardized version missing some of the inner timing
info. Even just saving the parts of the plan where the planner thought
it would get 1 row and got instead 350,000 and was using a nested loop
to join would be VERY useful. I could see something like that
eventually evolving into a self tuning system.
Saves it and then... does what? That's the whole key...
Well, I'm busy learning to be an Oracle DBA right now, so I can't do
it. But it would be a very cool project for the next college student
who shows up looking for one.
Why? There's a huge demand for PostgreSQL experts out there... or is
this for a current job?
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
On Thu, 2006-10-12 at 17:14, Jim C. Nasby wrote:
On Thu, Oct 12, 2006 at 03:31:50PM -0500, Scott Marlowe wrote:
While all the talk of a hinting system over in hackers and perform is
good, and I have a few queries that could live with a simple hint system
pop up now and again, I keep thinking that a query planner that learns
from its mistakes over time is far more desirable.
Is it reasonable or possible for the system to have a way to look at
query plans it's run and look for obvious mistakes its made, like being
off by a factor of 10 or more in estimations, and slowly learn to apply
its own hints?
Seems to me that would be far more useful than my having to babysit the
queries that are running slow and come up with hints to have the
database do what I want.
I already log slow queries and review them once a week by running them
with explain analyze and adjust what little I can, like stats targets
and such.
It seems to me the first logical step would be having the ability to
flip a switch and when the postmaster hits a slow query, it saves both
the query that ran long, as well as the output of explain or explain
analyze or some bastardized version missing some of the inner timing
info. Even just saving the parts of the plan where the planner thought
it would get 1 row and got instead 350,000 and was using a nested loop
to join would be VERY useful. I could see something like that
eventually evolving into a self tuning system.
Saves it and then... does what? That's the whole key...
It's meant as a first step. I could certainly use a daily report on
which queries had bad plans so I'd know which ones to investigate
without having to run them each myself in explain analyze. Again, my
point was to do it incrementally. This is something someone could do
now, and someone could build on later.
To start with, it does nothing. Just saves it for the DBA to look at.
Later, it could feed any number of the different hinting systems people
have been proposing.
It may well be that by first looking at the data collected from problem
queries, the solution for how to adjust the planner becomes more
obvious.
Well, I'm busy learning to be an Oracle DBA right now, so I can't do
it. But it would be a very cool project for the next college student
who shows up looking for one.
Why? There's a huge demand for PostgreSQL experts out there... or is
this for a current job?
Long story. I do get to do lots of pgsql stuff. But right now I'm
learning Oracle as well, cause we use both DBs. It's just that since I
know pgsql pretty well, and know oracle hardly at all, Oracle is taking
up lots more of my time.
Forgive me if I'm way off here as I'm not all that familiar with the
internals of postgres, but isn't this what the genetic query optimizer
discussed in one of the manual's appendixes is supposed to do? Or, is
that more about using the known costs of atomic operations that make
up a query plan, i.e. more of a bottom-up approach than what you're
discussing? If it's the latter, then it sounds like what you're looking
for is a classifier system. See the Wikipedia article at
http://en.wikipedia.org/wiki/Learning_classifier_system for a short
description of them along with a couple of reference links; you can
google for many more.
Scott Marlowe wrote:
On Thu, 2006-10-12 at 17:14, Jim C. Nasby wrote:
On Thu, Oct 12, 2006 at 03:31:50PM -0500, Scott Marlowe wrote:
While all the talk of a hinting system over in hackers and perform is
good, and I have a few queries that could live with a simple hint system
pop up now and again, I keep thinking that a query planner that learns
from its mistakes over time is far more desirable.
Is it reasonable or possible for the system to have a way to look at
query plans it's run and look for obvious mistakes its made, like being
off by a factor of 10 or more in estimations, and slowly learn to apply
its own hints?
Seems to me that would be far more useful than my having to babysit the
queries that are running slow and come up with hints to have the
database do what I want.
I already log slow queries and review them once a week by running them
with explain analyze and adjust what little I can, like stats targets
and such.
It seems to me the first logical step would be having the ability to
flip a switch and when the postmaster hits a slow query, it saves both
the query that ran long, as well as the output of explain or explain
analyze or some bastardized version missing some of the inner timing
info. Even just saving the parts of the plan where the planner thought
it would get 1 row and got instead 350,000 and was using a nested loop
to join would be VERY useful. I could see something like that
eventually evolving into a self tuning system.
Saves it and then... does what? That's the whole key...
It's meant as a first step. I could certainly use a daily report on
which queries had bad plans so I'd know which ones to investigate
without having to run them each myself in explain analyze. Again, my
point was to do it incrementally. This is something someone could do
now, and someone could build on later.
To start with, it does nothing. Just saves it for the DBA to look at.
Later, it could feed any number of the different hinting systems people
have been proposing.
It may well be that by first looking at the data collected from problems
queries, the solution for how to adjust the planner becomes more
obvious.
Well, I'm busy learning to be an Oracle DBA right now, so I can't do
it. But it would be a very cool project for the next college student
who shows up looking for one.
Why? There's a huge demand for PostgreSQL experts out there... or is
this for a current job?
Long story. I do get to do lots of pgsql stuff. But right now I'm
learning Oracle as well, cause we use both DBs. It's just that since I
know pgsql pretty well, and know oracle hardly at all, Oracle is taking
up lots more of my time.
--
erik jones <erik@myemma.com>
software development
emma(r)
Erik Jones wrote:
Forgive me if I'm way off here as I'm not all that familiar with
the internals of postgres, but isn't this what the genetic query
optimizer discussed the one of the manual's appendixes is supposed
to do.
No - it's not an "optimizer" in that sense. When there are a small
enough set of tables involved, the planner uses a dynamic programming
algorithm to explore the entire space of all possible plans. But the
space grows exponentially (I think) with the number of tables - when
this would take too long, the planner switches to a genetic algorithm
approach, which explores a small fraction of the plan space, in a
guided manner.
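For a sense of why exhaustive search stops scaling, a toy calculation helps (this counts only the n! orderings of a left-deep join pipeline; the full space of binary plan trees is larger still, so it is a lower bound):

```python
import math

# Number of ways to order n tables in a left-deep join pipeline is n!;
# this is why exhaustive plan search stops being practical somewhere
# around a dozen tables and a heuristic search takes over.
def join_orderings(n):
    return math.factorial(n)

for n in (4, 8, 12):
    print(n, join_orderings(n))
```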
But with both approaches, the planner is just using the static
statistics gathered by ANALYZE to estimate the cost of each candidate
plan, and these statistics are based on sampling your data - they may
be wrong, or at least misleading. (In particular, the statistic for
total number of unique values is frequently =way= off, per a recent
thread here. I have been reading about this, idly thinking about how
to improve the estimate.)
The idea of a learning planner, I suppose, would be one that examines
cases where these statistics lead to very misguided expectations.
The simplest version of a "learning" planner could simply bump up the
statistics targets on certain columns. A slightly more sophisticated
idea would be for some of the statistics to optionally use parametric
modeling (this column is a Gaussian, let's estimate the mean and
variance, this one is a Beta distribution ...). Then the smarter
planner could spend some cycles applying more sophisticated
statistical modeling to problematic tables/columns.
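One way to picture that parametric-modeling idea (a hypothetical sketch, not anything PostgreSQL does; `statistics.NormalDist` stands in for the fitting machinery) is: fit a Gaussian to a sampled column, then answer range-predicate selectivity questions from the fitted CDF instead of a fixed-size histogram:

```python
from statistics import NormalDist, mean, stdev

# Hypothetical sketch: model a column's values as a Gaussian and use the
# fitted distribution to estimate the selectivity of a range predicate.
def fit_column(sample):
    return NormalDist(mean(sample), stdev(sample))

def range_selectivity(dist, lo, hi):
    """Estimated fraction of rows with lo <= value <= hi."""
    return dist.cdf(hi) - dist.cdf(lo)

sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
d = fit_column(sample)
sel = range_selectivity(d, 9.9, 10.1)
```

A Beta or other distribution would slot in the same way; the planner-facing interface is just "give me the estimated fraction of rows matching this predicate."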
- John D. Burger
MITRE
On Oct 13, 2006, at 11:47 , John D. Burger wrote:
Erik Jones wrote:
Forgive me if I'm way off here as I'm not all that familiar with
the internals of postgres, but isn't this what the genetic query
optimizer discussed the one of the manual's appendixes is supposed
to do.
No - it's not an "optimizer" in that sense. When there are a small
enough set of tables involved, the planner uses a dynamic
programming algorithm to explore the entire space of all possible
plans. But the space grows exponentially (I think) with the number
of tables - when this would take too long, the planner switches to
a genetic algorithm approach, which explores a small fraction of
the plan space, in a guided manner.
But with both approaches, the planner is just using the static
statistics gathered by ANALYZE to estimate the cost of each
candidate plan, and these statistics are based on sampling your
data - they may be wrong, or at least misleading. (In particular,
the statistic for total number of unique values is frequently =way=
off, per a recent thread here. I have been reading about this,
idly thinking about how to improve the estimate.)
The idea of a learning planner, I suppose, would be one that
examines cases where these statistics lead to very misguided
expectations. The simplest version of a "learning" planner could
simply bump up the statistics targets on certain columns. A
slightly more sophisticated idea would be for some of the
statistics to optionally use parametric modeling (this column is a
Gaussian, let's estimate the mean and variance, this one is a Beta
distribution ...). Then the smarter planner could spend some
cycles applying more sophisticated statistical modeling to
problematic tables/columns.
One simple first step would be to run an ANALYZE whenever a
sequential scan is executed. Is there a reason not to do this? It
could be controlled by a GUC variable in case someone wants
repeatable plans.
Further down the line, statistics could be collected during the
execution of any query- updating histograms on delete and update, as
well.
-M
On Thu, Oct 12, 2006 at 05:39:20PM -0500, Scott Marlowe wrote:
It seems to me the first logical step would be having the ability to
flip a switch and when the postmaster hits a slow query, it saves both
the query that ran long, as well as the output of explain or explain
analyze or some bastardized version missing some of the inner timing
info. Even just saving the parts of the plan where the planner thought
it would get 1 row and got instead 350,000 and was using a nested loop
to join would be VERY useful. I could see something like that
eventually evolving into a self tuning system.
Saves it and then... does what? That's the whole key...
It's meant as a first step. I could certainly use a daily report on
which queries had bad plans so I'd know which ones to investigate
without having to run them each myself in explain analyze. Again, my
point was to do it incrementally. This is something someone could do
now, and someone could build on later.
To start with, it does nothing. Just saves it for the DBA to look at.
Later, it could feed any number of the different hinting systems people
have been proposing.
It may well be that by first looking at the data collected from problems
queries, the solution for how to adjust the planner becomes more
obvious.
Yeah, that would be useful to have. The problem I see is storing that
info in a format that's actually useful... and I'm thinking that a
logfile doesn't qualify since you can't really query it.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
On Fri, Oct 13, 2006 at 11:53:15AM -0400, AgentM wrote:
One simple first step would be to run an ANALYZE whenever a
sequential scan is executed. Is there a reason not to do this? It
Yes. You want a seqscan on a small (couple of pages) table, and ANALYZE
has a very high overhead on some platforms.
Just recording the query plan and actual vs estimated rowcounts would be
a good start, though. And useful to DBA's, provided you had some means
to query against it.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
On Fri, 2006-10-13 at 12:48, Jim C. Nasby wrote:
On Thu, Oct 12, 2006 at 05:39:20PM -0500, Scott Marlowe wrote:
It seems to me the first logical step would be having the ability to
flip a switch and when the postmaster hits a slow query, it saves both
the query that ran long, as well as the output of explain or explain
analyze or some bastardized version missing some of the inner timing
info. Even just saving the parts of the plan where the planner thought
it would get 1 row and got instead 350,000 and was using a nested loop
to join would be VERY useful. I could see something like that
eventually evolving into a self tuning system.
Saves it and then... does what? That's the whole key...
It's meant as a first step. I could certainly use a daily report on
which queries had bad plans so I'd know which ones to investigate
without having to run them each myself in explain analyze. Again, my
point was to do it incrementally. This is something someone could do
now, and someone could build on later.
To start with, it does nothing. Just saves it for the DBA to look at.
Later, it could feed any number of the different hinting systems people
have been proposing.
It may well be that by first looking at the data collected from problems
queries, the solution for how to adjust the planner becomes more
obvious.
Yeah, that would be useful to have. The problem I see is storing that
info in a format that's actually useful... and I'm thinking that a
logfile doesn't qualify since you can't really query it.
grep / sed / awk can do amazing things to a text file.
I'd actually recommend URL encoding (or something like that) so they'd
be single lines, then you could grep for certain things and feed the
lines to a simple de-encoder.
We do it with our log files at work and can search through some fairly
large files for the exact entry we need fairly quickly.
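The encode-to-one-line idea can be sketched in a few lines (my own illustration of the suggestion above; the set of characters left unescaped is an arbitrary choice made purely for grep-friendliness):

```python
from urllib.parse import quote, unquote

# Percent-encode newlines (and '%' itself) so each logged plan becomes a
# single line that grep can match; decode hits back to readable form.
def encode_entry(text):
    return quote(text, safe=" ()=.<>-")  # keep common plan chars greppable

def decode_entry(line):
    return unquote(line)

plan = "Nested Loop (rows=1)\n  -> Seq Scan on a (rows=100)"
one_line = encode_entry(plan)
```

`grep 'Nested Loop' logfile | while read l; do decode "$l"; done` is then enough to pull out and re-expand every matching entry.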
Jim C. Nasby wrote:
Just recording the query plan and actual vs estimated rowcounts would
be a good start, though. And useful to DBA's, provided you had some
means to query against it.
If the DBMS stores the incoming query (or some parsed version of it) and
identifies frequency of usage over time, it can then spend spare cycles
more deeply analyzing frequently used queries. Many DB servers have
usage patterns just like end-user workstations (not all, I realize):
they are busy for predictable periods of the day and fairly idle at
other times. To provide acceptable response times, DBMSs don't have the
luxury of analyzing numerous query paths when queries are received. But
they could do this offline so that the next time it sees the same query,
it can use a better-optimized plan.
Storing the query itself is probably a better idea than storing the
plan, since the plan may change over time.
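A minimal sketch of that frequency-tracking idea (the normalization regexes are simplistic placeholders, not a real SQL parser, and the function names are mine): count normalized query shapes, then hand the most frequent ones to an offline deep-analysis pass during idle hours.

```python
from collections import Counter
import re

# Replace literals with placeholders so "id = 1" and "id = 42" count as
# the same query shape.
def normalize(sql):
    sql = re.sub(r"'[^']*'", "?", sql)   # string literals
    sql = re.sub(r"\b\d+\b", "?", sql)   # numeric literals
    return " ".join(sql.split()).lower()

seen = Counter()

def record(sql):
    seen[normalize(sql)] += 1

def candidates_for_offline_tuning(top=3):
    return [q for q, _ in seen.most_common(top)]

record("SELECT * FROM t WHERE id = 1")
record("SELECT * FROM t WHERE id = 42")
record("SELECT name FROM u WHERE city = 'Austin'")
```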
--
Guy Rouillier
On 10/13/06 10:47, John D. Burger wrote:
Erik Jones wrote:
[snip]
But with both approaches, the planner is just using the static
statistics gathered by ANALYZE to estimate the cost of each candidate
plan, and these statistics are based on sampling your data - they may be
wrong, or at least misleading. (In particular, the statistic for total
number of unique values is frequently =way= off, per a recent thread
here. I have been reading about this, idly thinking about how to
improve the estimate.)
What about an ANALYZE FULL <table>, reading every record in the
table, and every node in every index, storing in pg_statistic (or
some new, similar table) such items as the AVG and STD of the number
of records in each page, and b-tree depth, keys per node, records
per key and per segment? Maybe even "average distance between pages
in the tablespace".
This would let the optimizer know things like "the value of the
field which is first segment of an index (and which is the only part
of the index in the WHERE clause) describes 75% of the rows in the
table, and the records are all packed in tightly in the pages, and
the pages are close together", so the optimizer could decide "a
table scan would be much more efficient".
In some ways, this would be similar in functionality to the existing
histogram created by ANALYZE, but would provide a slightly different
picture.
--
Ron Johnson, Jr.
Jefferson LA USA
Jim C. Nasby wrote:
On Thu, Oct 12, 2006 at 05:39:20PM -0500, Scott Marlowe wrote:
It may well be that by first looking at the data collected from problems
queries, the solution for how to adjust the planner becomes more
obvious.
Yeah, that would be useful to have. The problem I see is storing that
info in a format that's actually useful... and I'm thinking that a
logfile doesn't qualify since you can't really query it.
You'd need something that contains the query plan (obviously), but also
all relevant information about the underlying data model and data
distribution. Some meta-data, like the version of PostgreSQL, is
probably required as well.
The current statistics contain some of this information, but from
reading this list I know that this is rarely enough information to
determine where the planner went wrong.
Regards,
--
Alban Hertroys
alban@magproductions.nl
magproductions b.v.
T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede
// Integrate Your World //
Scott Marlowe wrote:
While all the talk of a hinting system over in hackers and perform is
good, and I have a few queries that could live with a simple hint system
pop up now and again, I keep thinking that a query planner that learns
from its mistakes over time is far more desirable.
Is it reasonable or possible for the system to have a way to look at
query plans it's run and look for obvious mistakes its made, like being
off by a factor of 10 or more in estimations, and slowly learn to apply
its own hints?
Technically it is very feasible. But I think you might want to check US
Patent 6,763,359 before you start writing any code.
It seems to me the first logical step would be having the ability to
flip a switch and when the postmaster hits a slow query, it saves both
the query that ran long, as well as the output of explain or explain
analyze or some bastardized version missing some of the inner timing
info. Even just saving the parts of the plan where the planner thought
it would get 1 row and got instead 350,000 and was using a nested loop
to join would be VERY useful. I could see something like that
eventually evolving into a self tuning system.
I think it would be a good start if we could specify a
log_selectivity_error_threshold, and if estimates are more than that
factor off, the query, parameters and planner estimates get logged for
later analysis. That would be driven entirely by selectivity estimates
and not (estimated) cost since cost is influenced by outside factors
such as other processes competing for resources. If a system for
statistical hints emerges from the current discussion we would indeed
have the input to start tuning the selectivity estimations.
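The threshold check itself is simple; a sketch (log_selectivity_error_threshold is the proposal above, not an existing PostgreSQL setting, and the helper names here are mine):

```python
# Log a plan node when estimated and actual row counts disagree by more
# than the configured factor, in either direction.
def selectivity_error(estimated, actual):
    est = max(estimated, 1)  # clamp so a 0-row estimate can't divide by zero
    act = max(actual, 1)
    return max(est / act, act / est)

def should_log(estimated, actual, threshold=10.0):
    return selectivity_error(estimated, actual) >= threshold

print(should_log(1, 350000))  # the 1-row-vs-350,000 case from this thread
print(should_log(100, 98))
```

Taking the ratio in both directions is what makes this a pure selectivity-error measure, independent of the cost noise from concurrent load.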
Jochem
Jochem van Dieten wrote:
Scott Marlowe wrote:
While all the talk of a hinting system over in hackers and perform is
good, and I have a few queries that could live with a simple hint system
pop up now and again, I keep thinking that a query planner that learns
from its mistakes over time is far more desirable.
Is it reasonable or possible for the system to have a way to look at
query plans it's run and look for obvious mistakes its made, like being
off by a factor of 10 or more in estimations, and slowly learn to apply
its own hints?
Technically it is very feasible. But I think you might want to check US
Patent 6,763,359 before you start writing any code.
I think it would be a very good idea if you guys stopped looking at the
US patent database. It does no good to anyone. There's no way we can
avoid stomping on one patent or another -- there are patents for everything.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Jochem van Dieten wrote:
I think you might want to check US Patent 6,763,359 before you
start writing any code.
- John D. Burger
MITRE
Alvaro Herrera wrote:
Jochem van Dieten wrote:
Scott Marlowe wrote:
While all the talk of a hinting system over in hackers and perform is
good, and I have a few queries that could live with a simple hint system
pop up now and again, I keep thinking that a query planner that learns
from its mistakes over time is far more desirable.
Is it reasonable or possible for the system to have a way to look at
query plans it's run and look for obvious mistakes its made, like being
off by a factor of 10 or more in estimations, and slowly learn to apply
its own hints?
Technically it is very feasible. But I think you might want to check US
Patent 6,763,359 before you start writing any code.
I think it would be a very good idea if you guys stopped looking at the
US patent database. It does no good to anyone. There's no way we can
avoid stomping on a patent or another -- there are patents for everything.
Hasn't IBM released a pile of its patents for use (or at least stated
they won't sue) to OSS projects? If so, is this patent covered by that
"amnesty"?
Simply ignoring patents because "there is a patent for everything" is a
recipe for disaster. Companies like MS are running out of ways to tear
open OSS and they are certainly not above (below?) suing the heck out of
OSS projects for patent infringement.
What's needed is reform in the USPTO. Call your congressman and
complain, but don't flout the law; you will lose.
Madi
On Oct 16, 2006, at 16:17 , Madison Kelly wrote:
Alvaro Herrera wrote:
Jochem van Dieten wrote:
Scott Marlowe wrote:
While all the talk of a hinting system over in hackers and perform is
good, and I have a few queries that could live with a simple hint system
pop up now and again, I keep thinking that a query planner that learns
from its mistakes over time is far more desirable.
Is it reasonable or possible for the system to have a way to look at
query plans it's run and look for obvious mistakes its made, like being
off by a factor of 10 or more in estimations, and slowly learn to apply
its own hints?
Technically it is very feasible. But I think you might want to
check US Patent 6,763,359 before you start writing any code.
I think it would be a very good idea if you guys stopped looking at the
US patent database. It does no good to anyone. There's no way we can
avoid stomping on a patent or another -- there are patents for
everything.
Hasn't IBM release a pile of it's patents for use (or at least
stated they won't sue) to OSS projects? If so, is this patent
covered by that "amnesty"?
Simply ignoring patents because "there is a patent for everything"
is a recipe for disaster. Companies like MS are running out of ways
to tear open OSS and they are certainly not above (below?) suing
the heck out of OSS projects for patent infringement.
What's needed is reform in the USPO. Call you congress (wo)man and
complain, but don't flaunt the law; you will lose.
Alvaro's advice is sound. If the patent holder can prove that a
developer looked at a patent (for example, from an email in a mailing
list archive) and the project proceeded with the implementation
regardless, malice can be shown and "damages" can be substantially
higher. You're screwed either way, but your safest bet is to never
look at patents.
Disclaimer: I am not a lawyer- I don't even like lawyers.
-M
Madison Kelly wrote:
Alvaro Herrera wrote:
Jochem van Dieten wrote:
Scott Marlowe wrote:
While all the talk of a hinting system over in hackers and perform is
good, and I have a few queries that could live with a simple hint system
pop up now and again, I keep thinking that a query planner that learns
from its mistakes over time is far more desirable.
Is it reasonable or possible for the system to have a way to look at
query plans it's run and look for obvious mistakes its made, like being
off by a factor of 10 or more in estimations, and slowly learn to apply
its own hints?
Technically it is very feasible. But I think you might want to check US
Patent 6,763,359 before you start writing any code.
I think it would be a very good idea if you guys stopped looking at the
US patent database. It does no good to anyone. There's no way we can
avoid stomping on a patent or another -- there are patents for everything.
Hasn't IBM release a pile of it's patents for use (or at least stated
they won't sue) to OSS projects? If so, is this patent covered by that
"amnesty"?
This is useless as a policy, because we have plenty of companies basing
their proprietary code on PostgreSQL, which wouldn't be subject to the
grant (EnterpriseDB, Command Prompt, Fujitsu, SRA). We do support them.
Simply ignoring patents because "there is a patent for everything" is a
recipe for disaster. Companies like MS are running out of ways to tear
open OSS and they are certainly not above (below?) suing the heck out of
OSS projects for patent infringement.
It has been said that unknowingly infringing a patent is much less
problematic than knowingly doing the same. We don't have the manpower to
implement the whole of Postgres without infringing a single patent, so
the best approach is to refrain from researching possible patents
applying to us in the first place.
If people come here and point at patents that we infringe or may
infringe, it causes much more (useless) work for hackers, who then
have to search for alternative ways of doing the same things.
What's needed is reform in the USPO. Call you congress (wo)man and
complain, but don't flaunt the law; you will lose.
I agree. However, I am not a US inhabitant in the first place, and
bless my parents for that. Heck, I was even denied a visa -- twice.
Please do us all a favor and write to your congresspeople.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera wrote:
Hasn't IBM release a pile of it's patents for use (or at least stated
they won't sue) to OSS projects? If so, is this patent covered by that
"amnesty"?This is useless as a policy, because we have plenty of companies basing
their proprietary code on PostgreSQL, which wouldn't be subject to the
grant (EnterpriseDB, Command Prompt, Fujitsu, SRA). We do support them.
More specifically, it is then up to the 3rd party (non-OSS) developers
to clear the patents. It's not the PgSQL team's problem in this case
(assuming it's the case at all).
Simply ignoring patents because "there is a patent for everything" is a
recipe for disaster. Companies like MS are running out of ways to tear
open OSS and they are certainly not above (below?) suing the heck out of
OSS projects for patent infringement.
It has been said that unknowingly infringing a patent is much less
problematic than knowingly doing same. We don't have the manpower to
implement the whole Postgres without infringing a single patent, so the
best approach is to refrain from researching possible patents applying
to us in the first place.
If people comes here and points at patents that we infringe or may
infringe, it will cause much more (useless) work for hackers which then
have to search alternative ways of doing the same things.
"Plausible Deniability" and all that jazz. There is another truism
though; "Ignorance of the law is no excuse". Besides, claiming ignorance
doesn't keep you out of the courts in the first place. The people who
would attack OSS applications generally have very, very deep pockets and
can run a project out of money before the trial was over. They could do
that non-the-less (SCO, hello?) but I still suggest NOT tempting fate.
It's unfortunate that this legal mess causes the developers more
headaches than they need, but it still can't be ignored, imho.
What's needed is reform in the USPO. Call you congress (wo)man and
complain, but don't flaunt the law; you will lose.
I agree. However, I am not an US inhabitant in the first place, and
bless my parents for that. Heck, I was even denied a visa -- twice.
Please do us all a favor and write to your congresspeople.
Heh, I'm not an American either, so I can't do anything but shake my
head (and be equally glad that my own personal OSS program is here in
Canada).
American industry wonders why they are losing so many IT positions...
It's such a difficult and unfriendly environment there for anyone but
the biggest companies. Far too litigious.
Madi
AgentM wrote:
Alvaro's advice is sound. If the patent holder can prove that a
developer looked at a patent (for example, from an email in a mailing
list archive) and the project proceeded with the implementation
regardless, malice can been shown and "damages" can be substantially
higher. You're screwed either way but your safest bet is to never look
at patents.
Disclaimer: I am not a lawyer- I don't even like lawyers.
Nor am I a lawyer, but I still hold that hoping "ignorance" will be a
decent defense is very, very risky. In the end I am not a pgSQL
developer so it isn't in my hands either way.
Madi