Long update query ?

Started by Sergei Chernev over 27 years ago, 30 messages

ser@nsu.ru

over 27 years ago

Hello,
I have a query:
UPDATE userd_session_stat SET status =1 WHERE status=0 AND ((uid <>627 AND
tty <>'ttyA03') OR (uid <> 425 AND tty <> 'ttyA05') OR (uid <> 8011 AND tty
<> 'ttyA09') OR (uid <> 2092 AND tty <> 'ttyA0f') OR (uid <> 249 AND tty <>
'ttyp3') OR (uid <> 249 AND tty <> 'ttyp4') OR (uid <> 249 AND tty <>
'ttyp5') OR (uid <> 249 AND tty <> 'ttyp6'))

But, postgres complains that:
FATAL 1: palloc failure: memory exhausted

As I understand it, a query must be smaller than 4 kB, and this one is.
Long SELECT queries work fine.
Any ideas? Maybe I have to change the postmaster's settings? The query
is executed from a libpq program.

Thanx,
---------------------------
Sergei Chernev
Internet: ser@nsu.ru
Phone: +7-3832-397354

David Hartwig

daveh@insightdist.com

over 27 years ago

In reply to: Sergei Chernev (#1)

Re: [GENERAL] Long update query ?

This is caused by a semi-well-known weakness in the optimizer. The optimizer
rewrites the WHERE clause in conjunctive normal form (CNF):

(A and B) or (C and D) ==> (A or C) and (A or D) and (B or C) and (B or D)

Try this with your statement and you will see the expression explode. For
now, I would suggest that you break this up into multiple statements.
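
A minimal Python sketch of the blow-up, treating each comparison as an opaque label. The helper name cnf_clauses is made up for illustration, and the sketch only counts the clauses that distributing OR over AND produces; it does not model how the backend allocates memory.

```python
from itertools import product


def cnf_clauses(dnf):
    """Distribute OR over AND: one CNF clause per way of picking one term from each disjunct."""
    return [frozenset(choice) for choice in product(*dnf)]


# The eight (uid <> x AND tty <> y) pairs from the UPDATE, as opaque labels.
pairs = [(627, "ttyA03"), (425, "ttyA05"), (8011, "ttyA09"), (2092, "ttyA0f"),
         (249, "ttyp3"), (249, "ttyp4"), (249, "ttyp5"), (249, "ttyp6")]
dnf = [(f"uid<>{uid}", f"tty<>'{tty}'") for uid, tty in pairs]

clauses = cnf_clauses(dnf)
print(len(cnf_clauses([("A", "B"), ("C", "D")])))  # 4, matching the example above
print(len(clauses))                                # 2**8 = 256 clauses for the UPDATE
```

Each additional two-term disjunct doubles the clause count, which is why a query that looks short can still expand badly.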


Taral

taral@mail.utexas.edu

over 27 years ago

In reply to: David Hartwig (#2)

RE: [GENERAL] Long update query ?

This is caused by a semi-well-known weakness in the optimizer. The optimizer
rewrites the WHERE clause in conjunctive normal form (CNF):

(A and B) or (C and D) ==> (A or C) and (A or D) and (B or C) and (B or D)

Wouldn't disjunctive normal form be better, since it can be implemented as
the simple union of a set of small queries?

Taral

Bruce Momjian

maillist@candle.pha.pa.us

over 27 years ago

In reply to: Taral (#3)

Re: [GENERAL] Long update query ?

This is caused by a semi-well-known weakness in the optimizer. The optimizer
rewrites the WHERE clause in conjunctive normal form (CNF):

(A and B) or (C and D) ==> (A or C) and (A or D) and (B or C) and (B or D)

Wouldn't disjunctive normal form be better, since it can be implemented as
the simple union of a set of small queries?

Please tell us more.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Taral

taral@mail.utexas.edu

over 27 years ago

In reply to: Bruce Momjian (#4)

Re: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

Wouldn't disjunctive normal form be better, since it can be implemented as
the simple union of a set of small queries?

Please tell us more.

Well, I don't know how the backend processes queries, but one can imagine
this scenario (for DNF):

1) Analyze query and set up columns in result table
2) Rewrite query into DNF
3) Split query into subqueries
4) For each subquery:
a) Process query
b) Append matching tuples to result table
5) Do any post-processing (ORDER BY, etc.)
6) Return result

How is the processing currently done (with CNF)?

Taral

Bruce Momjian

maillist@candle.pha.pa.us

over 27 years ago

In reply to: Taral (#5)

Re: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

Wouldn't disjunctive normal form be better, since it can be implemented as
the simple union of a set of small queries?

Please tell us more.

Well, I don't know how the backend processes queries, but one can imagine
this scenario (for DNF):

1) Analyze query and set up columns in result table
2) Rewrite query into DNF
3) Split query into subqueries
4) For each subquery:
a) Process query
b) Append matching tuples to result table
5) Do any post-processing (ORDER BY, etc.)
6) Return result

How is the processing currently done (with CNF)?

It currently converts to CNF so it can select the most restrictive
restriction and join, and use those first. However, the CNF conversion
is a memory exploder for some queries, and we certainly need
another method to split those queries up into UNIONs. I think we need
code to identify those queries capable of being converted to UNIONs,
and to do that before the query gets to the CNF section. That would be
great, and David Hartwig has implemented a limited capability of doing
this, but we really need a general routine to do this with 100%
reliability.


Taral

taral@mail.utexas.edu

over 27 years ago

In reply to: Bruce Momjian (#6)

RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

It currently converts to CNF so it can select the most restrictive
restriction and join, and use those first. However, the CNF conversion
is a memory exploder for some queries, and we certainly need
another method to split those queries up into UNIONs. I think we need
code to identify those queries capable of being converted to UNIONs,
and to do that before the query gets to the CNF section. That would be
great, and David Hartwig has implemented a limited capability of doing
this, but we really need a general routine to do this with 100%
reliability.

Well, if you're talking about a routine to generate a heuristic for CNF vs.
DNF, it is possible to precalculate the query sizes for CNF and DNF
rewrites...

For conversion to CNF:

At every node:

if nodeType = AND then f(node) = f(left) + f(right)
if nodeType = OR then f(node) = f(left) * f(right)

f(root) = a reasonable (but not wonderful) metric

For DNF just switch AND and OR in the above. You may want to compute both
metrics and compare... take the smaller one and use that path.

How to deal with other operators depends on their implementation...
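
The metric above can be sketched in Python as follows. The Leaf and Node classes and the names rewrite_size and choose_form are hypothetical, chosen only for this illustration; counting a leaf as 1 and sending ties to CNF are assumptions that the message does not state.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    name: str            # an opaque comparison such as "uid <> 627"


@dataclass(frozen=True)
class Node:
    op: str              # "AND" or "OR"
    left: "Expr"
    right: "Expr"


Expr = Union[Leaf, Node]


def rewrite_size(expr: Expr, target: str) -> int:
    """Estimate the number of clauses after rewriting expr to target ("CNF" or "DNF")."""
    if isinstance(expr, Leaf):
        return 1         # assumption: a leaf counts as one clause
    left = rewrite_size(expr.left, target)
    right = rewrite_size(expr.right, target)
    # For CNF, AND adds and OR multiplies; for DNF the roles are swapped.
    additive_op = "AND" if target == "CNF" else "OR"
    return left + right if expr.op == additive_op else left * right


def choose_form(expr: Expr) -> str:
    """Pick the smaller rewrite; ties go to CNF (an assumption)."""
    cnf, dnf = rewrite_size(expr, "CNF"), rewrite_size(expr, "DNF")
    return "DNF" if dnf < cnf else "CNF"


# (A AND B) OR (C AND D)
example = Node("OR", Node("AND", Leaf("A"), Leaf("B")), Node("AND", Leaf("C"), Leaf("D")))
print(rewrite_size(example, "CNF"), rewrite_size(example, "DNF"), choose_form(example))  # 4 2 DNF
```

Both sizes come from one walk of the tree each, so the estimate is cheap compared with performing either rewrite.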

Taral

Bruce Momjian

maillist@candle.pha.pa.us

over 27 years ago

In reply to: Taral (#7)

Re: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

It currently converts to CNF so it can select the most restrictive
restriction and join, and use those first. However, the CNF conversion
is a memory exploder for some queries, and we certainly need
another method to split those queries up into UNIONs. I think we need
code to identify those queries capable of being converted to UNIONs,
and to do that before the query gets to the CNF section. That would be
great, and David Hartwig has implemented a limited capability of doing
this, but we really need a general routine to do this with 100%
reliability.

Well, if you're talking about a routine to generate a heuristic for CNF vs.
DNF, it is possible to precalculate the query sizes for CNF and DNF
rewrites...

For conversion to CNF:

At every node:

if nodeType = AND then f(node) = f(left) + f(right)
if nodeType = OR then f(node) = f(left) * f(right)

f(root) = a reasonable (but not wonderful) metric

For DNF just switch AND and OR in the above. You may want to compute both
metrics and compare... take the smaller one and use that path.

How to deal with other operators depends on their implementation...

[Moved to Hackers list.]

This is interesting. Check CNF size and DNF size. Choose smallest.
CNF uses existing code, DNF converts to UNIONs. How do you return the
proper rows with/without proper duplicates?

i.e.

SELECT * FROM tab1 WHERE x > 1 or x > 2

We need to return all rows where x > 1, even if there are some identical
rows in tab1.

What I do in the index OR code is test that rows matched by the 2nd and
3rd index scans fail the restrictions of the earlier index scans. I am
not sure how to do that with a UNION query, but it may be possible.

We currently have UNION and UNION ALL, and I think we may need a new
UNION type internally to prevent 2nd and 3rd queries from returning rows
returned by earlier UNION queries.
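
A small Python sketch of the duplicate problem, using a list of integers as a stand-in for tab1.x. All function names are hypothetical, and the last function is only one possible reading of the "new UNION type" idea: each branch keeps a row only if it fails every earlier branch.

```python
def expected_rows(rows, branches):
    """What a single scan with the OR'ed WHERE clause returns: each matching row once."""
    return [x for x in rows if any(p(x) for p in branches)]


def union_all(rows, branches):
    """One scan per branch, results concatenated; rows matching several branches repeat."""
    return [x for p in branches for x in rows if p(x)]


def union_distinct(rows, branches):
    """UNION semantics: duplicates removed, including genuinely identical rows."""
    return sorted(set(union_all(rows, branches)))


def union_excluding_earlier(rows, branches):
    """Each branch keeps only the rows that fail every earlier branch."""
    out = []
    for i, p in enumerate(branches):
        earlier = branches[:i]
        out.extend(x for x in rows if p(x) and not any(q(x) for q in earlier))
    return out


rows = [1, 2, 3, 3]                            # two identical rows with x = 3
branches = [lambda x: x > 1, lambda x: x > 2]  # WHERE x > 1 OR x > 2

print(expected_rows(rows, branches))            # [2, 3, 3]
print(union_all(rows, branches))                # [2, 3, 3, 3, 3]  too many rows
print(union_distinct(rows, branches))           # [2, 3]           too few rows
print(union_excluding_earlier(rows, branches))  # [2, 3, 3]
```

Neither existing UNION flavor reproduces the single-scan result here, which is the gap the message describes.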

This is interesting.


Taral

taral@mail.utexas.edu

over 27 years ago

In reply to: Bruce Momjian (#8)

RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

This is interesting. Check CNF size and DNF size. Choose smallest.
CNF uses existing code, DNF converts to UNIONs. How do you return the
proper rows with/without proper duplicates?

Create a temporary oid hash? (for each table selected on, I guess)

Taral

#10

Bruce Momjian

maillist@candle.pha.pa.us

over 27 years ago

In reply to: Taral (#9)

Re: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

This is interesting. Check CNF size and DNF size. Choose smallest.
CNF uses existing code, DNF converts to UNIONs. How do you return the
proper rows with/without proper duplicates?

Create a temporary oid hash? (for each table selected on, I guess)

Taral

What I did with indexes was to run the previous OR clause index
restrictions through the qualification code, and make sure it failed,
but I am not sure how that is going to work with a more complex WHERE
clause. Perhaps I need to restrict this to just simple cases of
constants, which are easy to pick out and run through. Doing this with
joins would be very hard, I think.


#11

Taral

taral@mail.utexas.edu

over 27 years ago

In reply to: Bruce Momjian (#10)

RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

Create a temporary oid hash? (for each table selected on, I guess)

What I did with indexes was to run the previous OR clause index
restrictions through the qualification code, and make sure it failed,
but I am not sure how that is going to work with a more complex WHERE
clause. Perhaps I need to restrict this to just simple cases of
constants, which are easy to pick out and run through. Doing this with
joins would be very hard, I think.

Actually, I was thinking more of an index of returned rows... After each
subquery, the backend would check each row to see if it was already in the
index... Simple duplicate check, in other words. Of course, I don't know how
well this would behave with large tables being returned...

Anyone else have some ideas they want to throw in?

Taral

#12

Noname

jwieck@debis.com

over 27 years ago

In reply to: Taral (#11)

Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

Create a temporary oid hash? (for each table selected on, I guess)

What I did with indexes was to run the previous OR clause index
restrictions through the qualification code, and make sure it failed,
but I am not sure how that is going to work with a more complex WHERE
clause. Perhaps I need to restrict this to just simple cases of
constants, which are easy to pick out and run through. Doing this with
joins would be very hard, I think.

Actually, I was thinking more of an index of returned rows... After each
subquery, the backend would check each row to see if it was already in the
index... Simple duplicate check, in other words. Of course, I don't know how
well this would behave with large tables being returned...

Anyone else have some ideas they want to throw in?

Taral

But what about unions of join queries? Which OID's then should
be checked against which? And unions from view selects? There
are no OID's at all after rewriting.

Jan

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #

#13

Bruce Momjian

maillist@candle.pha.pa.us

over 27 years ago

In reply to: Taral (#11)

Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

Create a temporary oid hash? (for each table selected on, I guess)

What I did with indexes was to run the previous OR clause index
restrictions through the qualification code, and make sure it failed,
but I am not sure how that is going to work with a more complex WHERE
clause. Perhaps I need to restrict this to just simple cases of
constants, which are easy to pick out and run through. Doing this with
joins would be very hard, I think.

Actually, I was thinking more of an index of returned rows... After each
subquery, the backend would check each row to see if it was already in the
index... Simple duplicate check, in other words. Of course, I don't know how
well this would behave with large tables being returned...

Anyone else have some ideas they want to throw in?

I certainly think we are heading in the right direction for a good general
solution.


#14

Bruce Momjian

maillist@candle.pha.pa.us

over 27 years ago

In reply to: Noname (#12)

Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

Create a temporary oid hash? (for each table selected on, I guess)

What I did with indexes was to run the previous OR clause index
restrictions through the qualification code, and make sure it failed,
but I am not sure how that is going to work with a more complex WHERE
clause. Perhaps I need to restrict this to just simple cases of
constants, which are easy to pick out and run through. Doing this with
joins would be very hard, I think.

Actually, I was thinking more of an index of returned rows... After each
subquery, the backend would check each row to see if it was already in the
index... Simple duplicate check, in other words. Of course, I don't know how
well this would behave with large tables being returned...

Anyone else have some ideas they want to throw in?

Taral

But what about unions of join queries? Which OID's then should
be checked against which? And unions from view selects? There
are no OID's at all after rewriting.

Yep, you can't just use oid's, I think. Joins and specifying a table
multiple times using a table alias would break this anyway.

CNF'ify only goes through the tables once, so we somehow need to
simulate this. Perhaps we can restrict the kinds of queries used for
DNF so we can do this easily.

Another idea is that we rewrite queries such as:

SELECT *
FROM tab
WHERE (a=1 AND b=2 AND c=3) OR
(a=1 AND b=2 AND c=4) OR
(a=1 AND b=2 AND c=5) OR
(a=1 AND b=2 AND c=6)

into:

SELECT *
FROM tab
WHERE (a=1 AND b=2) AND (c=3 OR c=4 OR c=5 OR c=6)

and we do this BEFORE calling cnfify(). How much does this do for us?

Seems this would not be too hard, and would be a good performer.

You could even convert:

SELECT *
FROM tab
WHERE (a=1 AND b=2 AND c=3) OR
(a=1 AND b=2 AND c=4) OR
(a=1 AND b=52 AND c=5) OR
(a=1 AND b=52 AND c=6)

into:

SELECT *
FROM tab
WHERE ((a=1 AND b=2) AND (c=3 OR c=4)) OR
((a=1 AND b=52) AND (c=5 OR c=6))

This should work OK too. Someone want to try this? David, is this what
your code does?
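
A minimal Python sketch of the first rewrite, assuming the WHERE clause is already a list of AND groups and that each term can be compared as an opaque string. The name factor_common is hypothetical. It pulls out only the terms shared by every group; the partial grouping in the second example (by b=2 and b=52) would need an extra grouping step that is not shown.

```python
def factor_common(dnf):
    """Return (common, residuals) so that OR(dnf) == AND(common) AND OR(residuals).

    An empty residuals list means the expression reduces to AND(common).
    """
    groups = [frozenset(g) for g in dnf]
    common = frozenset.intersection(*groups)
    residuals = [g - common for g in groups]
    if any(not r for r in residuals):
        # One group is exactly the common part, so it absorbs all the others.
        return common, []
    return common, residuals


where = [
    {"a=1", "b=2", "c=3"},
    {"a=1", "b=2", "c=4"},
    {"a=1", "b=2", "c=5"},
    {"a=1", "b=2", "c=6"},
]
common, residuals = factor_common(where)
print(sorted(common))                           # ['a=1', 'b=2']
print(sorted(sorted(r) for r in residuals))     # [['c=3'], ['c=4'], ['c=5'], ['c=6']]
```

The cost is one set intersection plus one set difference per group, which is small next to the clause growth that cnfify() would otherwise produce on the unfactored form.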


#15

Taral

taral@mail.utexas.edu

over 27 years ago

In reply to: Bruce Momjian (#14)

RE: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

Another idea is that we rewrite queries such as:

SELECT *
FROM tab
WHERE (a=1 AND b=2 AND c=3) OR
(a=1 AND b=2 AND c=4) OR
(a=1 AND b=2 AND c=5) OR
(a=1 AND b=2 AND c=6)

into:

SELECT *
FROM tab
WHERE (a=1 AND b=2) AND (c=3 OR c=4 OR c=5 OR c=6)

Very nice, but that's like trying to code factorization of numbers... not
pretty, and very CPU intensive on complex queries...

Taral

#16

Bruce Momjian

maillist@candle.pha.pa.us

over 27 years ago

In reply to: Taral (#15)

Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

Another idea is that we rewrite queries such as:

SELECT *
FROM tab
WHERE (a=1 AND b=2 AND c=3) OR
(a=1 AND b=2 AND c=4) OR
(a=1 AND b=2 AND c=5) OR
(a=1 AND b=2 AND c=6)

into:

SELECT *
FROM tab
WHERE (a=1 AND b=2) AND (c=3 OR c=4 OR c=5 OR c=6)

Very nice, but that's like trying to code factorization of numbers... not
pretty, and very CPU intensive on complex queries...

Yes, but how large are the WHERE clauses going to be? Considering the
cost of cnfify() and UNION, it seems like a clear win. Is it general
enough to solve our problems?


#17

Taral

taral@mail.utexas.edu

over 27 years ago

In reply to: Bruce Momjian (#16)

RE: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

Very nice, but that's like trying to code factorization of numbers... not
pretty, and very CPU intensive on complex queries...

Yes, but how large are the WHERE clauses going to be? Considering the
cost of cnfify() and UNION, it seems like a clear win. Is it general
enough to solve our problems?

Could be... the examples I received where the cnfify() was really bad were
cases where the query was submitted already in DNF... and where the UNION was
a simple one. However, I don't know of any algorithms for generic
simplification of logical constraints. One problem is resolution/selection
of factors:

SELECT * FROM a WHERE (a = 1 AND b = 2 AND c = 3) OR (a = 4 AND b = 2 AND c
= 3) OR (a = 1 AND b = 5 AND c = 3) OR (a = 1 AND b = 2 AND c = 6);

Try that on for size. You can understand why that code gets ugly, fast.
Somebody could try coding it, but it's not a clear win to me.

My original heuristic was missing one thing: "Where the heuristic fails to
process or decide, default to CNF." Since that's the current behavior, we're
less likely to break things.

Taral

#18

Bruce Momjian

maillist@candle.pha.pa.us

over 27 years ago

In reply to: Taral (#17)

Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

Very nice, but that's like trying to code factorization of numbers... not
pretty, and very CPU intensive on complex queries...

Yes, but how large are the WHERE clauses going to be? Considering the
cost of cnfify() and UNION, it seems like a clear win. Is it general
enough to solve our problems?

Could be... the examples I received where the cnfify() was really bad were
cases where the query was submitted already in DNF... and where the UNION was
a simple one. However, I don't know of any algorithms for generic
simplification of logical constraints. One problem is resolution/selection
of factors:

SELECT * FROM a WHERE (a = 1 AND b = 2 AND c = 3) OR (a = 4 AND b = 2 AND c
= 3) OR (a = 1 AND b = 5 AND c = 3) OR (a = 1 AND b = 2 AND c = 6);

Try that on for size. You can understand why that code gets ugly, fast.
Somebody could try coding it, but it's not a clear win to me.

My original heuristic was missing one thing: "Where the heuristic fails to
process or decide, default to CNF." Since that's the current behavior, we're
less likely to break things.

OK, but if we use UNION, how do we return the proper rows? Is there any
solution for that? We would also be executing the query over and over again.
Any factoring would be faster than running those multiple queries,
wouldn't it?

Also, I imagine the case where we are doing a join, so we have:

SELECT *
FROM tab1, tab2
WHERE tab1.col1 = tab2.col2 AND
((a=1 and b=2 and c=3) OR
(a=1 and b=2 and c=4))

How do we do that with UNION, and return the right rows? Seems the
_join_ happening multiple times would be much worse than the factoring.


#19

Taral

taral@mail.utexas.edu

over 27 years ago

In reply to: Bruce Momjian (#18)

RE: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

How do we do that with UNION, and return the right rows? Seems the
_join_ happening multiple times would be much worse than the factoring.

Ok... We have two problems:

1) DNF for unjoined queries.
2) Factorization for the rest.

I have some solutions for (1). Not for (2). Remember that unjoined queries
are quite common. :)

For (1), we can always try to parallelize the multiple queries... especially in
the case where a sequential search is required.

Taral

#20

Bruce Momjian

maillist@candle.pha.pa.us

over 27 years ago

In reply to: Taral (#19)

Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

How do we do that with UNION, and return the right rows? Seems the
_join_ happening multiple times would be much worse than the factoring.

Ok... We have two problems:

1) DNF for unjoined queries.
2) Factorization for the rest.

I have some solutions for (1). Not for (2). Remember that unjoined queries
are quite common. :)

For (1), we can always try to parallelize the multiple queries... especially in
the case where a sequential search is required.

I don't know how to return the proper rows with UNION.


#21

Bruce Momjian

maillist@candle.pha.pa.us

over 27 years ago

In reply to: Taral (#19)

Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

I have another idea.

When we cnfify, this:

(A AND B) OR (C AND D)

becomes

(A OR C) AND (A OR D) AND (B OR C) AND (B OR D)

however if A and C are identical, this could become:

(A OR A) AND (A OR D) AND (B OR A) AND (B OR D)

and A OR A is A:

A AND (A OR D) AND (B OR A) AND (B OR D)

and since we are now saying A has to be true, we can remove OR's with A:

A AND (B OR D)

Much smaller, and a big win for queries like:

SELECT *
FROM tab
WHERE (a=1 AND b=2) OR
(a=1 AND b=3)

This becomes:

(a=1) AND (b=2 OR b=3)

which is accurate, and uses our OR indexing.

Seems I could code cnfify() to look for identical qualifications in two
joined OR clauses and remove the duplicates.

Sounds like a big win, and fairly easy and inexpensive in processing time.
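
A rough Python sketch of the idea, with clauses modeled as frozensets of opaque term labels. The names simplify and cnfify_with_cleanup are hypothetical and are not the backend's cnfify(); the sketch only shows duplicate removal and absorption being applied after each distribution step instead of once at the end.

```python
from itertools import product


def simplify(clauses):
    """Drop duplicate clauses and any clause that strictly contains another one.

    (A OR A) collapses because a clause is a set; A AND (A OR D) reduces to A
    because {A} is a strict subset of {A, D}.
    """
    unique = set(clauses)
    return {c for c in unique if not any(other < c for other in unique)}


def cnfify_with_cleanup(dnf):
    """Distribute one AND group at a time, simplifying after every step."""
    clauses = {frozenset()}          # the empty clause is the identity for OR
    for group in dnf:
        clauses = simplify(c | {term} for c, term in product(clauses, group))
    return clauses


# (a=1 AND b=2) OR (a=1 AND b=3)
result = cnfify_with_cleanup([("a=1", "b=2"), ("a=1", "b=3")])
print(sorted(sorted(c) for c in result))   # [['a=1'], ['b=2', 'b=3']]
```

Because the cleanup runs inside the loop, the intermediate clause set never holds the full unsimplified product when groups share terms.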

Comments?


#22

Taral

taral@mail.utexas.edu

over 27 years ago

In reply to: Bruce Momjian (#21)

RE: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

however if A and C are identical, this could become:

(A OR A) AND (A OR D) AND (B OR A) AND (B OR D)

and A OR A is A:

A AND (A OR D) AND (B OR A) AND (B OR D)

and since we are now saying A has to be true, we can remove OR's with A:

A AND (B OR D)

Very nice... and you could do that after each iteration of the rewrite,
preventing the size from getting too big. :)

I have a symbolic expression tree evaluator that would be perfect for
this... I'll see if I can't adapt it.

Can someone mail me the structures for expression trees? I don't want to
have to excise them from the source. Please?

Taral

#23

Bruce Momjian

maillist@candle.pha.pa.us

over 27 years ago

In reply to: Taral (#22)

Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

however if A and C are identical, this could become:

(A OR A) AND (A OR D) AND (B OR A) AND (B OR D)

and A OR A is A:

A AND (A OR D) AND (B OR A) AND (B OR D)

and since we are now saying A has to be true, we can remove OR's with A:

A AND (B OR D)

Very nice... and you could do that after each iteration of the rewrite,
preventing the size from getting too big. :)

I have a symbolic expression tree evaluator that would be perfect for
this... I'll see if I can't adapt it.

Can someone mail me the structures for expression trees? I don't want to
have to excise them from the source. Please?

That is very hard to do. We have lots of structures involved. I
recommend you look at backend/optimizer/prep/prepqual.c. That has the
CNF'ify code, and I am studying it now. There are supporting functions
in backend/nodes that will allow comparisons of many structures.

We may not be that far off. normalize() does much of the work, and
qual_cleanup() removes duplicates using remove_duplicates(), but
qual_cleanup() is only called after normalize() completes, not during the
normalization, which seems to be the problem. If we can remove the
duplicates BEFORE the OR explosion, we are much better off.

You can then use ctags to jump around to see the supporting structures.
See the developers FAQ on the web site or in the doc directory.


#24

David Hartwig

daveh@insightdist.com

over 27 years ago

In reply to: Bruce Momjian (#21)

Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

Bruce Momjian wrote:

I have another idea.

When we cnfify, this:

(A AND B) OR (C AND D)

becomes

(A OR C) AND (A OR D) AND (B OR C) AND (B OR D)

however if A and C are identical, this could become:

(A OR A) AND (A OR D) AND (B OR A) AND (B OR D)

and A OR A is A:

A AND (A OR D) AND (B OR A) AND (B OR D)

and since we are now saying A has to be true, we can remove OR's with A:

A AND (B OR D)

Much smaller, and a big win for queries like:

SELECT *
FROM tab
WHERE (a=1 AND b=2) OR
(a=1 AND b=3)

This becomes:

(a=1) AND (b=2 OR b=3)

which is accurate, and uses our OR indexing.

Seems I could code cnfify() to look for identical qualifications in two
joined OR clauses and remove the duplicates.

Sounds like a big win, and fairly easy and inexpensive in processing time.

Comments?

Apologies for not commenting sooner. I have been incognito.

As to your earlier question, Bruce, the KSQO patch rewrites qualifying
queries as UNIONs.

(A AND B) OR (C AND D) ==> (A AND B) UNION (C AND D)

The rules to qualify are fairly strict: the query must have ANDs; be rectangular in
shape; have all (VAR op CONST) type nodes; have a minimum of 10 nodes; etc. I was
targeting the keyset queries generated by ODBC tools.

As for the current direction this thread is going (factoring), I have one
word of caution: PREPARE. If you take this route, you will never be able
to implement a workable PREPARE statement. I believe that in order for
PostgreSQL to ever become an industrial-strength client/server system, it must implement
a PREPARE statement with parameters.

#25

Bruce Momjian

maillist@candle.pha.pa.us

over 27 years ago

In reply to: David Hartwig (#24)

Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

(A AND B) OR (C AND D) ==> (A AND B) UNION (C AND D)

The rules to qualify are fairly strict: the query must have ANDs; be rectangular in
shape; have all (VAR op CONST) type nodes; have a minimum of 10 nodes; etc. I was
targeting the keyset queries generated by ODBC tools.

As for the current direction this thread is going (factoring), I have one
word of caution: PREPARE. If you take this route, you will never be able
to implement a workable PREPARE statement. I believe that in order for
PostgreSQL to ever become an industrial-strength client/server system, it must implement
a PREPARE statement with parameters.

I see that adding nodes is going to mess up prepare, but we already add
extra nodes as part of "col in (1, 2, 3)."

I think the PARAM's we already use will be duplicated/removed and still
retain their values for substitution. They just may be in a different
order.


#26

David Hartwig

daveh@insightdist.com

over 27 years ago

In reply to: Bruce Momjian (#25)

Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

Bruce Momjian wrote:

(A AND B) OR (C AND D) ==> (A AND B) UNION (C AND D)

The rules to qualify are fairly strict: the query must have ANDs; be rectangular in
shape; have all (VAR op CONST) type nodes; have a minimum of 10 nodes; etc. I was
targeting the keyset queries generated by ODBC tools.

As for the current direction this thread is going (factoring), I have one
word of caution: PREPARE. If you take this route, you will never be able
to implement a workable PREPARE statement. I believe that in order for
PostgreSQL to ever become an industrial-strength client/server system, it must implement
a PREPARE statement with parameters.

I see that adding nodes is going to mess up prepare, but we already add
extra nodes as part of "col in (1, 2, 3)."

It's not extra nodes I am worried about. It is factored out nodes.

I think the PARAM's we already use will be duplicated/removed and still
retain their values for substitution. They just may be in a different
order.

I realize I may be stretching the point, but since I brought it up I will complete my
thoughts. You may already understand this, but just to be sure, here is a typical
client/server scenario:

- prepare statement S
- retrieve result description of S
- retrieve number of parameters of S
- retrieve parameter descriptions of S
- put data into parameters of S
- execute S
- retrieve result
[REPEAT]
- put different data into parameters of S
- execute S
- retrieve result
[END REPEAT]
- free statement S

The problem is that you cannot depend upon factoring to reduce these complex
statements. We need to retain a place holder (pointer) for each passed
parameter. Otherwise we need to re-(parse and plan) the statement before each
execution, thus losing one of the major benefits of PREPARE.

The other major benefits are:
1. Gaining access to the statement result description w/o having to actually
execute the statement. Client/server tools live off this stuff.
2. Smaller statement size. The parameters in the WHERE clause can be sent to the
backend in separate chunks.

Back to the subject at hand.

My point is that the factoring approach may be a bit short-sighted in the long-term
evolution of PostgreSQL.
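
As a rough illustration of the scenario listed above, here is a minimal Python sketch of a statement that is parsed once and then executed repeatedly with different parameter values. The class `PreparedStatement`, its methods, and the `$1`, `$2` placeholder spelling are assumptions made for this sketch; they are not an existing PostgreSQL or SPI interface, and binding is modeled as plain text substitution rather than real execution.

```python
import re


class PreparedStatement:
    """Toy model of a prepared statement (hypothetical, for illustration only)."""

    parse_count = 0  # counts how often statement text has been parsed

    def __init__(self, sql):
        # "prepare statement S": parse once, remember one slot per placeholder.
        PreparedStatement.parse_count += 1
        self.sql = sql
        numbers = [int(n) for n in re.findall(r"\$(\d+)", sql)]
        self.param_count = max(numbers, default=0)

    def execute(self, *params):
        # "put data into parameters of S" and "execute S": bind by position.
        # Returns the bound statement text instead of running a query.
        if len(params) != self.param_count:
            raise ValueError("expected %d parameters" % self.param_count)
        return re.sub(
            r"\$(\d+)",
            lambda m: repr(params[int(m.group(1)) - 1]),
            self.sql,
        )


stmt = PreparedStatement("SELECT * FROM dates WHERE _day = $1 OR _day = $2")
first = stmt.execute("1/1/99", "1/2/99")
second = stmt.execute("1/1/99", "1/1/99")  # same statement, different data
```

Both executions reuse the single parse, and the statement keeps two parameter slots even when the two values happen to coincide. That is the property a constant-dependent rewrite would put at risk.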

#27

Bruce Momjian

maillist@candle.pha.pa.us

over 27 years ago

In reply to: David Hartwig (#26)

Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

Yikes. I see what you mean. The factoring of a query with one set of
constants will be different from the factoring of the same query with
another set. That will certainly be a problem.
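
To make the concern concrete, here is a small Python sketch of constant-dependent factoring. The representation (a list of AND-groups joined by OR) and the helper names `bind` and `factor_common` are assumptions for illustration only; this is not the optimizer's code.

```python
def bind(template, params):
    """Substitute parameter values into a template of (column, param_index) tests."""
    return [[(col, params[i]) for col, i in group] for group in template]


def factor_common(groups):
    """Pull out the equality tests that appear in every AND-group."""
    common = [t for t in groups[0] if all(t in g for g in groups[1:])]
    rest = [[t for t in g if t not in common] for g in groups]
    return common, rest


# (uid = $0 AND tty = $1) OR (uid = $2 AND tty = $3)
template = [[("uid", 0), ("tty", 1)], [("uid", 2), ("tty", 3)]]

# First set of constants: both groups share uid = 249, so it factors out.
common_a, rest_a = factor_common(bind(template, [249, "ttyp3", 249, "ttyp4"]))

# Second set of constants: nothing is shared, so nothing factors out.
common_b, rest_b = factor_common(bind(template, [249, "ttyp3", 627, "ttyA03"]))
```

With the first set of values the qualifier collapses to `uid = 249 AND (tty = 'ttyp3' OR tty = 'ttyp4')`, while with the second it keeps its original shape. A plan built from the first shape has lost one parameter slot and cannot simply be re-run with the second set of values.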

I still haven't had time to look over the cnfify code, to see if calling
qual_cleanup earlier in the code will help reduce the palloc failures.
If it is easy to do, I will implement it, and we can remove it or change
it once we start looking at prepared queries.


#28

Bruce Momjian

maillist@candle.pha.pa.us

over 27 years ago

In reply to: David Hartwig (#26)

Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

The problem is that you cannot depend upon factoring to reduce these complex
statements. We need to retain a placeholder (pointer) for each passed
parameter. Otherwise we need to re-(parse and plan) the statement before each
execution, thus losing one of the major benefits of PREPARE.

I think we already have such a problem. When using optimization
statistics, the optimizer checks the value of the constant to estimate
how many rows will be returned by a clause like "x > 10", by looking at
the min/max values for the column. Prepared queries where this value
changes between executions would make that a problem.
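
A minimal Python sketch of the idea, assuming values are spread uniformly between the column's min and max. The function names and the 0.2 cutoff are invented for illustration and are not taken from the PostgreSQL source.

```python
def gt_selectivity(const, col_min, col_max):
    """Estimated fraction of rows with column > const, assuming uniform values."""
    if const <= col_min:
        return 1.0
    if const >= col_max:
        return 0.0
    return (col_max - const) / (col_max - col_min)


def choose_scan(const, col_min, col_max, threshold=0.2):
    """Pick a scan type from the estimate; threshold is an arbitrary cutoff."""
    if gt_selectivity(const, col_min, col_max) < threshold:
        return "index scan"
    return "sequential scan"


# Same clause shape "x > const", column values assumed to range from 0 to 100.
plan_low = choose_scan(10, 0, 100)   # most rows qualify
plan_high = choose_scan(95, 0, 100)  # few rows qualify
```

The two constants lead to different plan choices for the same clause, so a plan prepared with one value may be a poor fit when the statement is executed with another.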


#29

David Hartwig

daveh@insightdist.com

over 27 years ago

In reply to: Bruce Momjian (#28)

Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

Gad Zooks. The future is here. I wonder how Vadim's SPI_prepare() will respond
to this. I have not used it much, but I believe it accepts parameters.

For that matter, I seem to recall some kind of reduction going on in the query
plan. In 6.3.2 something like:

-- with an index on bar
EXPLAIN SELECT stuff FROM foo WHERE bar = 1 OR bar = 2;
-- does not use index; this is expected in 6.3.2

EXPLAIN SELECT stuff FROM foo WHERE bar = 1 OR bar = 1;
-- uses index; I speculated on some reduction going on here.

...

I just tried it out with our corp (6.3.2) database. _day is an indexed field
on dates.

corp=> explain select * from dates where _day = '1/1/99';
NOTICE: QUERY PLAN:

Index Scan on dates (cost=2.05 size=1 width=24)

EXPLAIN
corp=> explain select * from dates where _day = '1/1/99' or _day = '1/1/99';
NOTICE: QUERY PLAN:

Index Scan on dates (cost=2.05 size=1 width=24)

EXPLAIN
corp=> explain select * from dates where _day = '1/1/99' or _day = '1/2/99';
NOTICE: QUERY PLAN:

Seq Scan on dates (cost=91.27 size=219 width=24)

SPI_prepare may need to be tested, along with your example, to see how it responds.

#30

Bruce Momjian

maillist@candle.pha.pa.us

over 27 years ago

In reply to: David Hartwig (#29)

Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

Gad Zooks. The future is here. I wonder how Vadim's SPI_prepare() will respond
to this. I have not used it much, but I believe it accepts parameters.

For that matter, I seem to recall some kind of reduction going on in the query
plan. In 6.3.2 something like:

-- with an index on bar
EXPLAIN SELECT stuff FROM foo WHERE bar = 1 OR bar = 2;
-- does not use index; this is expected in 6.3.2

EXPLAIN SELECT stuff FROM foo WHERE bar = 1 OR bar = 1;
-- uses index; I speculated on some reduction going on here.

...

I just tried it out with our corp (6.3.2) database. _day is an indexed field
on dates.

corp=> explain select * from dates where _day = '1/1/99';
NOTICE: QUERY PLAN:

Index Scan on dates (cost=2.05 size=1 width=24)

EXPLAIN
corp=> explain select * from dates where _day = '1/1/99' or _day = '1/1/99';
NOTICE: QUERY PLAN:

Index Scan on dates (cost=2.05 size=1 width=24)

I believe this reduction is done by cnfify when it removes duplicates as
its last step.

EXPLAIN
corp=> explain select * from dates where _day = '1/1/99' or _day = '1/2/99';
NOTICE: QUERY PLAN:

Seq Scan on dates (cost=91.27 size=219 width=24)

Yes, sure looks like that is what is happening.
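
The duplicate removal mentioned above can be sketched in Python as follows. The tuple representation of a clause and the function name `remove_duplicate_arms` are assumptions for illustration; this is not the cnfify code itself.

```python
def remove_duplicate_arms(or_arms):
    """Drop repeated arms of an OR clause, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for arm in or_arms:
        if arm not in seen:
            seen.add(arm)
            unique.append(arm)
    return unique


# _day = '1/1/99' OR _day = '1/1/99'  ->  a single clause remains
same = remove_duplicate_arms([("_day", "=", "1/1/99"), ("_day", "=", "1/1/99")])

# _day = '1/1/99' OR _day = '1/2/99'  ->  both arms remain, still an OR
different = remove_duplicate_arms([("_day", "=", "1/1/99"), ("_day", "=", "1/2/99")])
```

Once the repeated arm is dropped, the first qualifier is no longer an OR at all, which would be consistent with the index scan in the EXPLAIN output above.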

SPI_prepare may need to be tested, along with your example, to see how it responds.

I don't think it has actual values in the prepare, but just
place-holders, so it doesn't do the reduction, and my code wouldn't do
it either.

It is only when they use constants and want to re-run the query with
new constants that there could be a problem.
