Opptimizing projections containing unused columns

Started by Andrusover 17 years ago12 messagesgeneral
Jump to latest
#1Andrus
kobruleht2@hot.ee

I have lot of autogenerated from projection queries in form

SELECT source.c1, source.c2, t1.col1, t1.col2, ...
FROM (SELECT c1, c2, c3, ..... , c20 FROM ... WHERE ... ) source
LEFT JOIN t2 USING (somecolumn)

Main SELECT uses only few columns (source.c1 and source.c2 in this sample)
from source projection.
Inner table may have up to 500000 rows.

I read 8.4 WITH documentation and this describes somewhat that PostgreSQL
can discard unused columns automatically but this is not clear. I havent
found exact doc about this. I havent way how to determine this using EXPLAIN
or similar command.

Questions:

1. Does selecting of unused columns ( c3 ... c20) affect to query prefomance
?
2. Will PostgreSQL 8.0+ automatically exclude columns c3 .. c20 while
executing
query ?
3. Or should I make sql builder much more sophisticated so that it will not
generate columns c3 .. c20 when it creates sql ?

Andrus.

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrus (#1)
Re: Opptimizing projections containing unused columns

"Andrus" <kobruleht2@hot.ee> writes:

I have lot of autogenerated from projection queries in form
SELECT source.c1, source.c2, t1.col1, t1.col2, ...
FROM (SELECT c1, c2, c3, ..... , c20 FROM ... WHERE ... ) source
LEFT JOIN t2 USING (somecolumn)
Main SELECT uses only few columns (source.c1 and source.c2 in this sample)
from source projection.

1. Does selecting of unused columns ( c3 ... c20) affect to query prefomance

Don't worry about it. All modern Postgres versions ignore columns that
aren't actually used in the query --- at least for examples as simple as
this one. In cases where you intentionally defeat optimization (eg via
OFFSET 0 in a sub-select) it's possible that the sub-select will compute
all its output columns even though the upper query doesn't use 'em all.

The width field in EXPLAIN output offers a good hint as to how many
columns the query is actually fetching.

regards, tom lane

#3Grzegorz Jaśkiewicz
gryzman@gmail.com
In reply to: Tom Lane (#2)
Re: Opptimizing projections containing unused columns

looks like most ppl nowdays have two simple problems, and try to work
against it. Instead they all should focus on getting their data organized
properly, and queries writeen for project before they start to code other
stuff.

The problems are: trying to outsmart db, still belive that you can catch all
data to memory, and work it out in software.

I see it time and time again :/

#4Andrus
kobruleht2@hot.ee
In reply to: Tom Lane (#2)
Re: Opptimizing projections containing unused columns

Tom,

Don't worry about it. All modern Postgres versions ignore columns that
aren't actually used in the query --- at least for examples as simple as
this one. In cases where you intentionally defeat optimization (eg via
OFFSET 0 in a sub-select) it's possible that the sub-select will compute
all its output columns even though the upper query doesn't use 'em all.

The width field in EXPLAIN output offers a good hint as to how many
columns the query is actually fetching.

thank you very much.
I don't use OFFSET clause in those queries and have servers starting at 8.0
version.
So I hope this works.

I have also a related question.
Inner query returns about 500000 records, outer query returns about 10000
records.

Is it OK to put all filters to outer query WHERE clase?
Or should I move as many filter conditions as possible to inner query so
that inner query returns 10000 records instead of 500000 records.

Is there difference in perfomance if inner query returs large number of rows
which are filtered only by outer query ?

Andrus.

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrus (#4)
Re: Opptimizing projections containing unused columns

"Andrus" <kobruleht2@hot.ee> writes:

Is it OK to put all filters to outer query WHERE clase?
Or should I move as many filter conditions as possible to inner query so
that inner query returns 10000 records instead of 500000 records.

Is there difference in perfomance if inner query returs large number of rows
which are filtered only by outer query ?

This question is too vague to be answerable --- especially if you want
an answer that applies across all 8.x releases. I'd suggest
experimenting a bit using EXPLAIN ANALYZE to see what happens in your
actual application.

regards, tom lane

#6Andrus
kobruleht2@hot.ee
In reply to: Tom Lane (#5)
Re: Optimizing projections containing unused columns

Tom,

This question is too vague to be answerable --- especially if you want
an answer that applies across all 8.x releases. I'd suggest
experimenting a bit using EXPLAIN ANALYZE to see what happens in your
actual application.

Thank you very much.
I cannot experiment with application currently since this part is at
planning state.
I asked this to get right designing direction.

OK, I will create applicaton so that inner WHERE clauses contain as much
restrictions as it is possible without adding additional tables to
inner SELECTs. Hopefully this cannot be slower than moving those
restrictions to outer select.

A third related issue:

Is it OK to use 5 levels of nested queries like

SELECT ...
FROM ...
(SELECT .. FROM ...
(SELECT ... FROM ..
...
) p1
) p2

or is it better to create queries with as few levels of nestings as possible
?
In some cases we have selection to use UNIONS, JOINS and CREATE TEMP TABLE x
ON COMMIT DROP clauses instead of nested queries but
nested queries look more natural to create.

Currently we are planning to support servers starting at 8.0 so we are not
planning to use WITH statement.

Andrus

#7陈伟楠
weinan@staff.sina.com.cn
In reply to: Tom Lane (#5)
Re: Opptimizing projections containing unused columns

Andrus,Hi!

1.From the query plan,it makes no difference.

2.PostgreSQL 8.0+ will not exclude columns c3..c20 while executing.

3.I just make a test,From query plan or executing time,it makes no difference.

I'm from China.I hope you could see what I mean :)

============================================================

I have lot of autogenerated from projection queries in form

SELECT source.c1, source.c2, t1.col1, t1.col2, ...
FROM (SELECT c1, c2, c3, ..... , c20 FROM ... WHERE ... ) source
LEFT JOIN t2 USING (somecolumn)

Main SELECT uses only few columns (source.c1 and source.c2 in this sample)
from source projection.
Inner table may have up to 500000 rows.

I read 8.4 WITH documentation and this describes somewhat that PostgreSQL
can discard unused columns automatically but this is not clear. I havent
found exact doc about this. I havent way how to determine this using EXPLAIN
or similar command.

Questions:

1. Does selecting of unused columns ( c3 ... c20) affect to query prefomance
?
2. Will PostgreSQL 8.0+ automatically exclude columns c3 .. c20 while
executing
query ?
3. Or should I make sql builder much more sophisticated so that it will not
generate columns c3 .. c20 when it creates sql ?

Andrus.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

weinna.chen

Beijing China

#8Martijn van Oosterhout
kleptog@svana.org
In reply to: Andrus (#6)
Re: Optimizing projections containing unused columns

On Wed, Oct 15, 2008 at 09:07:28PM +0300, Andrus wrote:

Tom,

This question is too vague to be answerable --- especially if you want
an answer that applies across all 8.x releases. I'd suggest
experimenting a bit using EXPLAIN ANALYZE to see what happens in your
actual application.

Thank you very much.
I cannot experiment with application currently since this part is at
planning state.
I asked this to get right designing direction.

The basic answer is: every version of pg gets smarter. And the current
version is really smart. What you really have to watch out for is that
the moving of the clause really works. In the presense of outer joins
and NULL some things arn't as obvious as they seem...

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.

#9Grzegorz Jaśkiewicz
gryzman@gmail.com
In reply to: Martijn van Oosterhout (#8)
Re: Optimizing projections containing unused columns

not so smart yet ;)
thickbook:~ gj$ psql gj
Timing is on.
psql (8.4devel)
Type "help" for help.

gj=# create table smartarse(id bigserial, v varchar not null , primary key
(id, v));
NOTICE: CREATE TABLE will create implicit sequence "smartarse_id_seq" for
serial column "smartarse.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
"smartarse_pkey" for table "smartarse"
CREATE TABLE
Time: 172,315 ms
gj=# \d+ smartarse
Table "public.smartarse"
Column | Type | Modifiers
| Storage | Description
--------+-------------------+--------------------------------------------------------+----------+-------------
id | bigint | not null default
nextval('smartarse_id_seq'::regclass) | plain |
v | character varying | not null
| extended |
Indexes:
"smartarse_pkey" PRIMARY KEY, btree (id, v)
Has OIDs: no

gj=# insert into smartarse(v) values('one'), ('two'), ('three'), ('four');
INSERT 0 4
Time: 1,500 ms
gj=# explain select count(*) from smartarse where v is null;
QUERY PLAN
----------------------------------------------------------------
Aggregate (cost=21.62..21.63 rows=1 width=0)
-> Seq Scan on smartarse (cost=0.00..21.60 rows=6 width=0)
Filter: (v IS NULL)
(3 rows)

Time: 1,115 ms

but, getting more and more smart - that's for sure :)

#10Andrus
kobruleht2@hot.ee
In reply to: Grzegorz Jaśkiewicz (#9)
Re: Optimizing projections containing unused columns

gj=# explain select count(*) from smartarse where v is null;
----------------------------------------------------------------
Aggregate (cost=21.62..21.63 rows=1 width=0)

< -> Seq Scan on smartarse (cost=0.00..21.60 rows=6 width=0)

Thread question and sample was about using nested queries.
Your sample does not use nested queries but it shows (width=0) that all
columns are excluded.

So I do'nt understand how this applies to topic.

Andrus.

#11Andrus
kobruleht2@hot.ee
In reply to: Andrus (#10)
Re: Optimizing projections containing unused columns

So I do'nt understand how this applies to topic.

I'm sorry, after re-reading yuor reply I understood this.
You think thank PostgreSQL should use not null inforamtion from metadata.

Maybe it is better to create separate static code analyzer like FxCop,
Smokey or Gendarme for C# for PostgreSQL to analyze code.

Andrus.

#12Grzegorz Jaśkiewicz
gryzman@gmail.com
In reply to: Andrus (#11)
Re: Optimizing projections containing unused columns

I just generally wanted to show, that even tho it is much smarter from
version to version, it still has this little things. I generally don't put
'is null' or 'is not null' in queries on column that is 'not null' - but it
is imo example of those little things, where postgres isn't so smart yet :)