client performance vs. server statistics
Hi,
I am checking a performance problem encountered after porting an old embedded
DB to PostgreSQL. The system is real-time sensitive, so we are concerned
about per-query cost. In our environment a sequential scan (select * from
...) over a table with tens of thousands of records costs 1 - 2 seconds,
regardless of whether we measure it through the ODBC driver or the "timing"
result shown in the psql client (which in turn relies on libpq). However,
according to EXPLAIN ANALYZE, or the statistics in the pg_stat_statements
view, the query takes less than 100 ms.
So, is the cost of the client interface (ODBC, libpq) mainly due to TCP? Do
pg_stat_statements and EXPLAIN ANALYZE include the cost of copying tuples
from shared buffers to the result set?
Could you experts share your views on this big gap? Any suggestions to
optimise?
P.S. Our original embedded DB provided a "fastpath" interface that reads
records directly from shared memory, which gives extremely fast real-time
access (of course at the cost of some other features, such as consistency).
Best regards,
Han
So, is the cost of the client interface (ODBC, libpq) mainly due to TCP?
The difference you are seeing compared to your embedded DB seems to be mainly
due to TCP.
One optimization is to use PostgreSQL's Unix-domain socket mode. You can
refer to the unix_socket_directory parameter in postgresql.conf and other
related parameters.
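For example, something along these lines (the socket directory path and
database name below are only placeholders for illustration):

# in postgresql.conf:
unix_socket_directory = '/var/run/postgresql'

# then connect through the socket instead of TCP, e.g. from psql:
psql -h /var/run/postgresql -d yourdb

When the host passed to libpq starts with a slash it is taken as the socket
directory, so the same approach works from other libpq-based clients.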
I am suggesting this because you were previously using an embedded DB, so
your client and server are presumably on the same machine. If that is no
longer the case, this will not help.
Could you please clarify a few more things:
1. After the sequential scan, does the client need all the records being
scanned? If only a few records are needed, why have you not created an
index?
2. What is the exact scenario for fetching records?
Hi,
I have tried a unix domain socket and the performance is similar to the TCP
socket. This is a MIPS architecture, so memory copies to/from the kernel can
take much time, and apparently using a unix domain socket makes no difference
from TCP in terms of memory copying.
But the ten-fold gap between the client-side and the server-side statistics
is still hard to believe. So I want to know exactly which operations are
covered by the server-side statistics in EXPLAIN ANALYZE. I may check the
code later when I get time.
As for the query itself, it was just for performance comparison. There are
other index-based queries, which are of course much faster, but they still
show a similar ten-fold time gap between the client-side and server-side
statistics.
I am thinking of a client interface that does not involve the kernel. Is
there such an option, or do I have to develop one from scratch?
Best regards,
Han
Hi,
Forwarding my question from the HACKERS list here (see the messages above),
with some additional notes:
Besides, the test was done on the same host (so there is no network cost).
And even considering the memory copying cost, the gap is not reasonable,
because the client did a similar job using another IPC mechanism, which also
goes through kernel space to transfer the data again to another program, and
that appears to be quite fast - it costs even much less than the time shown
by EXPLAIN ANALYZE.
Can anyone help me explain this?
Best regards,
Han
Hi,
On Wednesday, February 15, 2012 11:19:00 AM Zhou Han wrote:
I have tried a unix domain socket and the performance is similar to the TCP
socket. This is a MIPS architecture, so memory copies to/from the kernel can
take much time, and apparently using a unix domain socket makes no
difference from TCP in terms of memory copying.
But the ten-fold gap between the client-side and the server-side statistics
is still hard to believe. So I want to know exactly which operations are
covered by the server-side statistics in EXPLAIN ANALYZE. I may check the
code later when I get time.
My guess is that the time difference you're seeing is actually the planning
time. The timing shown at the end of EXPLAIN ANALYZE is just the execution,
not the planning time. You can use "\timing on" in psql to display timing
information that includes planning.
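For example, in psql (the table name is only a placeholder):

\timing on
EXPLAIN ANALYZE SELECT * FROM your_table;

The "Total runtime" printed by EXPLAIN ANALYZE covers execution only, while
the "Time:" line added by \timing also includes parsing/planning and the
round trip to the client.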
What's the query?
For the query itself, it was just for performance comparison. There are
other index-based queries, which are of course much faster, but they still
show a similar ten-fold time gap between the client-side and server-side
statistics.
I am thinking of a client interface that does not involve the kernel. Is
there such an option, or do I have to develop one from scratch?
It's unlikely that's possible in a sensible amount of time. But I don't
think that's your problem anyway.
Andres
Hi Andres,
Maybe you missed my first post; I paste it here again:
In our environment a sequential scan (select * from ...) over a table
with tens of thousands of records costs 1 - 2 seconds, regardless of
whether we measure it through the ODBC driver or the "timing" result shown
in the psql client (which in turn relies on libpq). However, according to
EXPLAIN ANALYZE, or the statistics in the pg_stat_statements view, the
query takes less than 100 ms.
Do pg_stat_statements and EXPLAIN ANALYZE include the cost of copying
tuples from shared buffers to the result set?
Best regards,
Han
Hi,
To be more specific, here is my breakdown:
The timing shown in psql may include: planning + execution + copying to a
result set in the backend (does this step exist?) + transferring the data to
the client via the socket.
Then I want to know which of the above parts are covered by the times shown
in pg_stat_statements and EXPLAIN ANALYZE. And why is the gap almost
10-fold (100 ms vs. 1 second)? As a comparison, transferring the same amount
of data over a unix domain socket should cost only a very small fraction of
this (almost negligible), according to my other performance tests.
And I don't think the planning time plays an important role here in
EXPLAIN ANALYZE, because the command itself takes a similar time to the
"Total runtime" shown in psql (with timing on), which means the plan is too
simple to account for any significant part of the time in this case.
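As a rough cross-check (using the test table, called my_large here just for
illustration), I can compare the following in psql with \timing on:

SELECT * FROM my_large;          -- scan + convert every row to text + send it to the client
SELECT count(*) FROM my_large;   -- roughly the same sequential scan, but almost nothing to format or transfer

If the count(*) timing lands near the EXPLAIN ANALYZE figure while the plain
SELECT stays around 1 second, that would point to result formatting and
transfer rather than the scan itself.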
Best regards,
Han
On Wednesday, February 15, 2012 12:33:13 PM Han Zhou wrote:
Hi,
To be more specific, here is my breakdown:
The timing shown in psql may include: planning + execution + copying to a
result set in the backend (does this step exist?) + transferring the data to
the client via the socket.
Correct.
Then I want to know which of the above parts are covered by the times shown
in pg_stat_statements and EXPLAIN ANALYZE. And why is the gap almost
10-fold (100 ms vs. 1 second)? As a comparison, transferring the same amount
of data over a unix domain socket should cost only a very small fraction of
this (almost negligible), according to my other performance tests.
Yea, you proved my quick theory wrong.
And I don't think the planning time plays an important role here in
EXPLAIN ANALYZE, because the command itself takes a similar time to the
"Total runtime" shown in psql (with timing on), which means the plan is too
simple to account for any significant part of the time in this case.
Sounds like that.
It would be interesting to see the time difference between:
COPY (SELECT * FROM blub) TO '/tmp/somefile';
COPY (SELECT * FROM blub) TO '/tmp/somefile' BINARY;
EXPLAIN ANALYZE SELECT * FROM blub;
Andres
So I want to know exactly which operations are covered by the server-side
statistics in EXPLAIN ANALYZE
It gives the time for execution of the query on the server. To my knowledge,
it does not account for sending the data over TCP.
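For reference, the server-side figure can be read from the statistics view
like this (this assumes the pg_stat_statements contrib module is installed;
total_time is in milliseconds):

SELECT query, calls, total_time
  FROM pg_stat_statements
 ORDER BY total_time DESC
 LIMIT 5;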
Hi Andres,
Good hint!
DBRNWHSB=# COPY (SELECT * FROM my_large) TO '/tmp/somefile';
COPY 73728
Time: 1405.976 ms
DBRNWHSB=# COPY (SELECT * FROM my_large) TO '/tmp/somefile_binary' BINARY ;
COPY 73728
Time: 840.987 ms
DBRNWHSB=# EXPLAIN ANALYZE SELECT * FROM my_large;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Seq Scan on my_large (cost=0.00..1723.78 rows=80678 width=59)
(actual time=0.036..114.400 rows=73728 loops=1)
Total runtime: 171.561 ms
(2 rows)
Time: 172.523 ms
DBRNWHSB=# SELECT * FROM my_large;
...
Time: 1513.274 ms
In this test there are 73728 records, each of tens of bytes. The size of
somefile is 5,455,872 bytes, and the size of somefile_binary is even larger:
6,782,997 bytes. However, the BINARY COPY to the in-memory file takes less
time, so something else must be at work, e.g. preparing the result is taking
CPU time. But even the BINARY COPY still takes much more time than EXPLAIN
ANALYZE: 840 ms vs. 172 ms. So I guess most of the time is spent preparing
and transferring the result in the backend, and this part of the time is not
counted in the EXPLAIN ANALYZE or pg_stat_statements statistics.
If this assumption is true, is it possible to optimise result preparation
and transfer in the backend? Or is there any "bulk" output operation already
supported by existing PostgreSQL options?
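For comparison, the closest thing to a bulk client-side path I can see is
driving the COPY protocol from the client, e.g. with psql's \copy (the
output file name is only an example):

\copy (SELECT * FROM my_large) TO '/tmp/somefile_client'

This streams the rows to the client as COPY data instead of ordinary result
rows, so timing it against the plain SELECT above should show how much of
the cost is in the normal result path.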
Best regards,
Han