Order of columns in query is important?!

Started by Colin 't Hartabout 11 years ago8 messageshackers

Jump to latest

Colin 't Hart

colin@sharpheart.org

about 11 years ago

Hi,

I hope this is the best place to report this or should I be on
pgsql-general or pgsql-bugs?

It seems that the order of columns in a query can make a difference in
execution times.

In my brief investigation, queries on table(a,b,c,d,e,f,g,h) of the form

select * from table order by non-indexed-column limit 25;
select a,b,c,d,e,f,g,h from table order by non-indexed-column limit 25;

performed the same (approx 1.5 seconds on our customers table --
rows=514431 width=215), while the query

select h,g,f,e,d,c,b,a from table order by non-indexed-column limit 25;

was about 50% slower (approx 2.2 seconds on our customers table).

I had expected these to perform the same -- to my mind column ordering
in a query should be purely presentation -- as far as I'm concerned,
the DBMS can retrieve the columns in a different order as long as it
displays it in the order I've asked for them. Although, again, the
order of columns in a resultset in a Java or Python is mostly
irrelevant, though when displayed in psql I'd want the columns in the
order I asked for them.

Is there really something strange happening here? Or perfectly
explainable and expected?

Regards,

Colin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Colin 't Hart

colin@sharpheart.org

about 11 years ago

In reply to: Colin 't Hart (#1)

Re: Order of columns in query is important?!

I should have mentioned that it wasn't me that discovered this but
"Evgeny" on dba.stackexchange
He was reporting a much greater disparity in times.
See http://dba.stackexchange.com/questions/102403/why-is-select-much-faster-than-selecting-all-columns-by-name

Thanks,

Colin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Amit Langote

Langote_Amit_f8@lab.ntt.co.jp

about 11 years ago

In reply to: Colin 't Hart (#1)

Re: Order of columns in query is important?!

On 2015-05-25 PM 06:26, Colin 't Hart wrote:

It seems that the order of columns in a query can make a difference in
execution times.

In my brief investigation, queries on table(a,b,c,d,e,f,g,h) of the form

select * from table order by non-indexed-column limit 25;
select a,b,c,d,e,f,g,h from table order by non-indexed-column limit 25;

performed the same (approx 1.5 seconds on our customers table --
rows=514431 width=215), while the query

select h,g,f,e,d,c,b,a from table order by non-indexed-column limit 25;

was about 50% slower (approx 2.2 seconds on our customers table).

I had expected these to perform the same -- to my mind column ordering
in a query should be purely presentation -- as far as I'm concerned,
the DBMS can retrieve the columns in a different order as long as it
displays it in the order I've asked for them. Although, again, the
order of columns in a resultset in a Java or Python is mostly
irrelevant, though when displayed in psql I'd want the columns in the
order I asked for them.

Is there really something strange happening here? Or perfectly
explainable and expected?

I think any difference may have to do with an extra projection step on top of
the underlying scan when the target list does not match the tuple descriptor.
When that happens there has to happen additional processing in Sort data
initialization which converts the data back (from values[], nulls[] lists
form) to a form that sorting code expects/understands.

That means the specified order of columns in a query does matter which would
have to match the defined order in order to avoid extra processing (that is
only when specified columns *exactly* matches the tuple descriptor).

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

CK Tan

cktan@vitessedata.com

about 11 years ago

In reply to: Colin 't Hart (#1)

Re: Order of columns in query is important?!

It has to do with the implementation of slot_getattr, which tries to do the
deform on-demand lazily.

if you do select a,b,c, the execution would do slot_getattr(1) and deform
a, and then slot_getattr(2) which reparse the tuple to deform b, and
finally slot_getattr(3), which parse the tuple yet again to deform c.

Where as if you do select c, b, a, it would do slot_getattr(3) to deform c,
and in the process deform a and b in one pass. Subsequent calls to
slot_getattr 1 and 2 would find the attribute ready and available, and
return it (without parsing the tuple again).

For Vitesse X, we mark all columns that were required in the query during
JIT compile, and deform it in one shot. PG should be able to do the same.

-cktan

On Mon, May 25, 2015 at 2:26 AM, Colin 't Hart <colin@sharpheart.org> wrote:

Show quoted text

Hi,

I hope this is the best place to report this or should I be on
pgsql-general or pgsql-bugs?

It seems that the order of columns in a query can make a difference in
execution times.

In my brief investigation, queries on table(a,b,c,d,e,f,g,h) of the form

select * from table order by non-indexed-column limit 25;
select a,b,c,d,e,f,g,h from table order by non-indexed-column limit 25;

performed the same (approx 1.5 seconds on our customers table --
rows=514431 width=215), while the query

select h,g,f,e,d,c,b,a from table order by non-indexed-column limit 25;

was about 50% slower (approx 2.2 seconds on our customers table).

I had expected these to perform the same -- to my mind column ordering
in a query should be purely presentation -- as far as I'm concerned,
the DBMS can retrieve the columns in a different order as long as it
displays it in the order I've asked for them. Although, again, the
order of columns in a resultset in a Java or Python is mostly
irrelevant, though when displayed in psql I'd want the columns in the
order I asked for them.

Is there really something strange happening here? Or perfectly
explainable and expected?

Regards,

Colin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Petr Jelinek

petr@2ndquadrant.com

about 11 years ago

In reply to: CK Tan (#4)

Re: Order of columns in query is important?!

On 26/05/15 11:59, CK Tan wrote:

It has to do with the implementation of slot_getattr, which tries to do
the deform on-demand lazily.

if you do select a,b,c, the execution would do slot_getattr(1) and
deform a, and then slot_getattr(2) which reparse the tuple to deform b,
and finally slot_getattr(3), which parse the tuple yet again to deform c.

Where as if you do select c, b, a, it would do slot_getattr(3) to deform
c, and in the process deform a and b in one pass. Subsequent calls to
slot_getattr 1 and 2 would find the attribute ready and available, and
return it (without parsing the tuple again).

If this was the case, changing column order would lead to performance
increase, not decrease as reported.

My guess would be same as Amits, it's most likely the additional
projection step.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

CK Tan

cktan@vitessedata.com

about 11 years ago

In reply to: Petr Jelinek (#5)

Re: Order of columns in query is important?!

You're right. I misread the problem description.

On Tue, May 26, 2015 at 3:13 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:

Show quoted text

On 26/05/15 11:59, CK Tan wrote:

It has to do with the implementation of slot_getattr, which tries to do
the deform on-demand lazily.

if you do select a,b,c, the execution would do slot_getattr(1) and
deform a, and then slot_getattr(2) which reparse the tuple to deform b,
and finally slot_getattr(3), which parse the tuple yet again to deform c.

Where as if you do select c, b, a, it would do slot_getattr(3) to deform
c, and in the process deform a and b in one pass. Subsequent calls to
slot_getattr 1 and 2 would find the attribute ready and available, and
return it (without parsing the tuple again).

If this was the case, changing column order would lead to performance
increase, not decrease as reported.

My guess would be same as Amits, it's most likely the additional
projection step.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Simon Riggs

simon@2ndQuadrant.com

about 11 years ago

In reply to: Amit Langote (#3)

Re: Order of columns in query is important?!

On 25 May 2015 at 11:48, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

On 2015-05-25 PM 06:26, Colin 't Hart wrote:

That means the specified order of columns in a query does matter which
would
have to match the defined order in order to avoid extra processing (that is
only when specified columns *exactly* matches the tuple descriptor).

...and it matters a lot in those cases because we are sorting all of the
data scanned, not just 25 rows.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Tom Lane

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: CK Tan (#4)

Re: Order of columns in query is important?!

CK Tan <cktan@vitessedata.com> writes:

For Vitesse X, we mark all columns that were required in the query during
JIT compile, and deform it in one shot. PG should be able to do the same.

See ExecProject().

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers