Asynchronous queries with bound data.

Started by Вячеслав Блинников, over 15 years ago. 7 messages. general

Database connection using "libpq":
So, generally speaking, I need to:
- send multiple queries using "PQsendQuery()" (or whatever else is appropriate)
- bind some data (void*) to each query
- obtain results using "PQgetResult()" (or whatever else) and know which data
is bound to each result (the correspondence between queries and results)
How can this be implemented?

In reply to: Вячеслав Блинников (#1)
Re: Asynchronous queries with bound data.

2011/1/5 Вячеслав Блинников <slavmfm@gmail.com>:

Database connection using "libpq":
So, generally speaking, I need to:
- send multiple queries using "PQsendQuery()" (or whatever else is appropriate)
- bind some data (void*) to each query
- obtain results using "PQgetResult()" (or whatever else) and know which data
is bound to each result (the correspondence between queries and results)
How can this be implemented?

You cannot "bind" data to a query. You cannot execute two queries in
parallel on one connection. However, PGresults exist totally
independently of the connection or thread that originated them, so
feel free to stash them as you see fit.

libpq usually operates within the same thread context as the client
application, so in general it doesn't make sense to pass a function
pointer (which you suggested in a slightly earlier mail to the list)
that will be called back asynchronously like a signal handler.
Asynchronous command processing exists mainly as a way to keep a GUI
responsive and things like that, because PQexec() blocks.

Maybe it would help if you stepped back and described your problem in
broader terms.

--
Regards,
Peter Geoghegan

In reply to: Peter Geoghegan (#2)
Re: Asynchronous queries with bound data.

The whole thing is:
- the server connects to the database and waits for incoming connections
- when an incoming connection arrives, the server requests some data about
the connected client from the database - it must do this asynchronously and
without creating a thread per connected client (there can be more than a
thousand clients)
- at the moment such a request is made, there may already be other active
requests
- when the database responds, the server needs to know which client the
received data is for (and what kind of request it was)
That's all.

P.S. It is possible to implement this by requesting data one-by-one (pushing
each request onto a queue and popping it (FIFO) after each database response),
but that is slow, because the server has to wait for data to travel over the
network while the database itself sits idle.

[Did I get the "cc" right? And can I remove the quoted previous letters in
each reply?]


In reply to: Вячеслав Блинников (#3)
Re: Asynchronous queries with bound data.

2011/1/5 Вячеслав Блинников <slavmfm@gmail.com>:

The whole thing is:
- the server connects to the database and waits for incoming connections
- when an incoming connection arrives, the server requests some data about
the connected client from the database - it must do this asynchronously and
without creating a thread per connected client (there can be more than a
thousand clients)
- at the moment such a request is made, there may already be other active
requests
- when the database responds, the server needs to know which client the
received data is for (and what kind of request it was)
That's all.

P.S. It is possible to implement this by requesting data one-by-one (pushing
each request onto a queue and popping it (FIFO) after each database response),
but that is slow, because the server has to wait for data to travel over the
network while the database itself sits idle.

I'm afraid I don't understand your problem, but may I suggest that you:

1. Wrap the connection in a semaphore. This isn't pretty, but if I've
understood you correctly, it will do the job.

or

2. Have one database connection per "server" client. Having one
process that itself has up to a thousand clients but uses only one PG
connection is a very questionable approach - the single database
connection is certain to become a bottleneck.

and

3. Use a connection pooler.

--
Regards,
Peter Geoghegan

In reply to: Peter Geoghegan (#4)
Re: Asynchronous queries with bound data.

1: I didn't figure out what that means - can you explain it better?

2: The operating system will refuse to let me create a thousand threads,
and anyway, on average the database will return its responses only once all
of them are finished.

3: I never close a connection once it has been created, so no pool will help
me (if Google tells me the right thing about "connection pools").

I found this statement: "you can't issue another PQsendQuery until you've
gotten that NULL." If that is true, then it is impossible to send multiple
requests before any result has been received, because that is the only
function available for sending an asynchronous request.

The problem can be viewed abstractly like this:
Transferring data from the application server (which connects to the
database) takes 200 ms (and the same amount to transfer it back); adding
data to the database and then selecting data (all in one request) takes
250 ms, so each database operation of that kind takes 200 + 250 + 200 = 650
ms. Two such operations take 650 + 650 = 1300 ms, but if there were a way
to send two queries at once and then get two results at once (of course
tracking the correspondence between requests and responses), we could reduce
those two "dialogs" from 1300 ms to 200 + 250 + 250 + 200 = 900 ms.
So we win 400 ms - and with a thousand requests every few minutes, that
becomes a very significant amount of time.


In reply to: Вячеслав Блинников (#5)
Re: Asynchronous queries with bound data.

2011/1/6 Вячеслав Блинников <slavmfm@gmail.com>:

1: I didn't figure out what that means - can you explain it better?

http://ru.wikipedia.org/wiki/%D0%A1%D0%B5%D0%BC%D0%B0%D1%84%D0%BE%D1%80_(%D0%B8%D0%BD%D1%84%D0%BE%D1%80%D0%BC%D0%B0%D1%82%D0%B8%D0%BA%D0%B0)

2: The operating system will refuse to let me create a thousand threads,
and anyway, on average the database will return its responses only once all
of them are finished.

I don't know how I can help you, since you haven't explained the
architecture of your application very well.

3: I never close a connection once it was created, so any pool will not help
me (I google says right about "connection pool").

Maybe you should.

The problem can be viewed abstractly like this:
Transferring data from the application server (which connects to the
database) takes 200 ms (and the same amount to transfer it back); adding
data to the database and then selecting data (all in one request) takes
250 ms, so each database operation of that kind takes 200 + 250 + 200 = 650
ms. Two such operations take 650 + 650 = 1300 ms, but if there were a way
to send two queries at once and then get two results at once (of course
tracking the correspondence between requests and responses), we could reduce
those two "dialogs" from 1300 ms to 200 + 250 + 250 + 200 = 900 ms.
So we win 400 ms - and with a thousand requests every few minutes, that
becomes a very significant amount of time.

Databases are optimized for throughput, not latency. It isn't in
question that there would be less latency if we could parallelise the
queries. What is in question is:

1. Whether or not it matters.

2. Whether or not that's possible, given the restrictions you insist on.

--
Regards,
Peter Geoghegan

In reply to: Peter Geoghegan (#6)
Re: Asynchronous queries with bound data.

2: The operating system will refuse to let me create a thousand threads,
and anyway, on average the database will return its responses only once all
of them are finished.

I don't know how I can help you, since you haven't explained the
architecture of your application very well.

It does not really matter, but anyway: the "application architecture" is an
implementation of distributed computing in which multiple clients connect to
the server, receive their piece of the job, calculate, and return the
results. The server must (generally speaking) track all the work done by
each client (processor data, adapter data, number of completed tasks,
calculation time, etc.).

3: I never close a connection once it has been created, so no pool will help
me (if Google tells me the right thing about "connection pools").

Maybe you should.

No, I shouldn't. I am the only user of the database, so I am very welcome
there.

The problem can be viewed abstractly like this:
Transferring data from the application server (which connects to the
database) takes 200 ms (and the same amount to transfer it back); adding
data to the database and then selecting data (all in one request) takes
250 ms, so each database operation of that kind takes 200 + 250 + 200 = 650
ms. Two such operations take 650 + 650 = 1300 ms, but if there were a way
to send two queries at once and then get two results at once (of course
tracking the correspondence between requests and responses), we could reduce
those two "dialogs" from 1300 ms to 200 + 250 + 250 + 200 = 900 ms.
So we win 400 ms - and with a thousand requests every few minutes, that
becomes a very significant amount of time.

Databases are optimized for throughput, not latency. It isn't in
question that there would be less latency if we could parallelise the
queries.

I was not talking about (nor requesting) any kind of parallelization - I
just meant the database's ability to stack incoming requests and then pop
them one-by-one - simply not refusing new requests while "there is one
already in progress".
And it is not the point to start working in that direction right now - it
would take weeks or months - and I believe that if it is not implemented
already, there were reasons for that - PostgreSQL has not existed for just
one day (it is 6 years younger than me).