Async IO HTTP server frontend for PostgreSQL

Started by Dmitriy Igrishinover 11 years ago12 messagesgeneral
Jump to latest
#1Dmitriy Igrishin
dmitigr@gmail.com

Dear community,

I need a %subj% -- high performance HTTP server solution
based on asynchronous IO with ability to run PostgreSQL's
functions from HTML templates asynchronously and passing
the results to the HTTP client.
For example, consider a simple template:
<html>
<div id="rows">
${get_rows(id := :id)}
</div>
</html>
The function get_rows() will be called asynchronously
during the dispatching HTTP request and the result of
it will streams immediately to the HTTP client via async IO.

Currently, I've found only a module for NGINX
https://github.com/FRiCKLE/ngx_postgres
but it does not what I need.

Ideally, it should be a simple C (libevent based) or C++
(boost::asio based) library.

Please, if anyone has a ready solution of the idea described
above, let me know, because I don't want waste my time to write
it from scratch.

--
// Dmitriy.

#2Merlin Moncure
mmoncure@gmail.com
In reply to: Dmitriy Igrishin (#1)
Re: Async IO HTTP server frontend for PostgreSQL

On Mon, Sep 8, 2014 at 12:59 PM, Dmitriy Igrishin <dmitigr@gmail.com> wrote:

Dear community,

I need a %subj% -- high performance HTTP server solution
based on asynchronous IO with ability to run PostgreSQL's
functions from HTML templates asynchronously and passing
the results to the HTTP client.
For example, consider a simple template:
<html>
<div id="rows">
${get_rows(id := :id)}
</div>
</html>
The function get_rows() will be called asynchronously
during the dispatching HTTP request and the result of
it will streams immediately to the HTTP client via async IO.

Currently, I've found only a module for NGINX
https://github.com/FRiCKLE/ngx_postgres
but it does not what I need.

Ideally, it should be a simple C (libevent based) or C++
(boost::asio based) library.

Please, if anyone has a ready solution of the idea described
above, let me know, because I don't want waste my time to write
it from scratch.

It's not in C, but you should take a very good look at node.js.

merlin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Dmitriy Igrishin
dmitigr@gmail.com
In reply to: Merlin Moncure (#2)
Re: Async IO HTTP server frontend for PostgreSQL

2014-09-09 1:28 GMT+04:00 Merlin Moncure <mmoncure@gmail.com>:

On Mon, Sep 8, 2014 at 12:59 PM, Dmitriy Igrishin <dmitigr@gmail.com>
wrote:

Dear community,

I need a %subj% -- high performance HTTP server solution
based on asynchronous IO with ability to run PostgreSQL's
functions from HTML templates asynchronously and passing
the results to the HTTP client.
For example, consider a simple template:
<html>
<div id="rows">
${get_rows(id := :id)}
</div>
</html>
The function get_rows() will be called asynchronously
during the dispatching HTTP request and the result of
it will streams immediately to the HTTP client via async IO.

Currently, I've found only a module for NGINX
https://github.com/FRiCKLE/ngx_postgres
but it does not what I need.

Ideally, it should be a simple C (libevent based) or C++
(boost::asio based) library.

Please, if anyone has a ready solution of the idea described
above, let me know, because I don't want waste my time to write
it from scratch.

It's not in C, but you should take a very good look at node.js.

merlin

Yeah, it looks interesting and AFAIK there are already bindings
for node.js to asynchronous libpq's API --
https://github.com/brianc/node-postgres/blob/master/src/binding.cc#L43
Thanks for the point, Merlin.

--
// Dmitriy.

#4David Boreham
david_list@boreham.org
In reply to: Dmitriy Igrishin (#3)
Re: Async IO HTTP server frontend for PostgreSQL

Hi Dmitriy, are you able to say a little about what's driving your quest
for async http-to-pg ?
I'm curious as to the motivations, and whether they match up with some
of my own reasons for wanting to use low-thread-count solutions.

Thanks.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Dmitriy Igrishin
dmitigr@gmail.com
In reply to: David Boreham (#4)
Re: Async IO HTTP server frontend for PostgreSQL

Hello, David

2014-09-10 4:31 GMT+04:00 David Boreham <david_list@boreham.org>:

Hi Dmitriy, are you able to say a little about what's driving your quest
for async http-to-pg ?
I'm curious as to the motivations, and whether they match up with some of
my own reasons for wanting to use low-thread-count solutions.

For many web projects I consider Postgres as a development platform. Thus,
I prefer to keep the business logic (data integrity trigger functions and
API functions) in the database. Because of nature of the Web, many
concurrent
clients can request a site and I want to serve maximum possible of them with
minimal overhead. Also I want to avoid a complex solutions. So, I believe
that
with asynchronous solution it's possible to *stream* the data from the
database
to the maximum number of clients (which possible can request my site over a
slow connection).

--
// Dmitriy.

#6John DeSoi
desoi@pgedit.com
In reply to: David Boreham (#4)
Re: Async IO HTTP server frontend for PostgreSQL

On Sep 9, 2014, at 7:31 PM, David Boreham <david_list@boreham.org> wrote:

Hi Dmitriy, are you able to say a little about what's driving your quest for async http-to-pg ?
I'm curious as to the motivations, and whether they match up with some of my own reasons for wanting to use low-thread-count solutions.

For some discussion and preliminary design, see also

https://wiki.postgresql.org/wiki/HTTP_API

John DeSoi, Ph.D.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Dmitriy Igrishin
dmitigr@gmail.com
In reply to: John DeSoi (#6)
Re: Async IO HTTP server frontend for PostgreSQL

Hello, John

2014-09-10 17:25 GMT+04:00 John DeSoi <desoi@pgedit.com>:

On Sep 9, 2014, at 7:31 PM, David Boreham <david_list@boreham.org> wrote:

Hi Dmitriy, are you able to say a little about what's driving your quest

for async http-to-pg ?

I'm curious as to the motivations, and whether they match up with some

of my own reasons for wanting to use low-thread-count solutions.

For some discussion and preliminary design, see also

https://wiki.postgresql.org/wiki/HTTP_API

John DeSoi, Ph.D.

While this is not related to the %subj%, I've glanced and the link above.
And I found this complex. (As many many many "nice featured" things
on the market today.) Could you tell me please, for example, why the
URLs like
/databases/:database_name/schemas/:schema_name/tables/:table_name
OR
/databases/:database_name/schemas/:schema_name/table/:table_name/indexes/:index_name
Whats wrong with SQL way to get such info from sys. catalogs or
the information schema?
I guess, I know the answer -- "its just pretty nice and clear". But
I think it's just a waste of time (of users which should learn all of
these fancies, and developers which should implement them).

--
// Dmitriy.

#8Steve Atkins
steve@blighty.com
In reply to: Dmitriy Igrishin (#5)
Re: Async IO HTTP server frontend for PostgreSQL

On Sep 10, 2014, at 12:16 AM, Dmitriy Igrishin <dmitigr@gmail.com> wrote:

Hello, David

2014-09-10 4:31 GMT+04:00 David Boreham <david_list@boreham.org>:
Hi Dmitriy, are you able to say a little about what's driving your quest for async http-to-pg ?
I'm curious as to the motivations, and whether they match up with some of my own reasons for wanting to use low-thread-count solutions.
For many web projects I consider Postgres as a development platform. Thus,
I prefer to keep the business logic (data integrity trigger functions and
API functions) in the database. Because of nature of the Web, many concurrent
clients can request a site and I want to serve maximum possible of them with
minimal overhead. Also I want to avoid a complex solutions. So, I believe that
with asynchronous solution it's possible to *stream* the data from the database
to the maximum number of clients (which possible can request my site over a
slow connection).

That's going to require you to have one database connection open for each
client. If the client is over a slow connection it'll keep the database connection
open far longer than is needed, (compared to the usual "pull data from the
database as fast as the disks will go, then spoonfeed it out to the slow client"
approach). Requiring a live database backend for every open client connection
doesn't seem like a good idea if you're supporting many slow concurrent clients.

Cheers,
Steve

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#9Dmitriy Igrishin
dmitigr@gmail.com
In reply to: Steve Atkins (#8)
Re: Async IO HTTP server frontend for PostgreSQL

Hello, Steve

2014-09-10 21:08 GMT+04:00 Steve Atkins <steve@blighty.com>:

On Sep 10, 2014, at 12:16 AM, Dmitriy Igrishin <dmitigr@gmail.com> wrote:

Hello, David

2014-09-10 4:31 GMT+04:00 David Boreham <david_list@boreham.org>:
Hi Dmitriy, are you able to say a little about what's driving your quest

for async http-to-pg ?

I'm curious as to the motivations, and whether they match up with some

of my own reasons for wanting to use low-thread-count solutions.

For many web projects I consider Postgres as a development platform.

Thus,

I prefer to keep the business logic (data integrity trigger functions and
API functions) in the database. Because of nature of the Web, many

concurrent

clients can request a site and I want to serve maximum possible of them

with

minimal overhead. Also I want to avoid a complex solutions. So, I

believe that

with asynchronous solution it's possible to *stream* the data from the

database

to the maximum number of clients (which possible can request my site

over a

slow connection).

That's going to require you to have one database connection open for each
client. If the client is over a slow connection it'll keep the database
connection
open far longer than is needed, (compared to the usual "pull data from the
database as fast as the disks will go, then spoonfeed it out to the slow
client"
approach). Requiring a live database backend for every open client
connection
doesn't seem like a good idea if you're supporting many slow concurrent
clients.

Good point. Thus, some of caching on the HTTP server side should be
implemented
then.

--
// Dmitriy.

#10Allan Kamau
kamauallan@gmail.com
In reply to: Dmitriy Igrishin (#9)
Re: Async IO HTTP server frontend for PostgreSQL

Dear Dmitriy,

To add on to David's suggestions, Data caching is a difficult task to
undertake. Consider an example where your data may not all fit into memory,
when you cache these data outside PostgreSQL you would need to look into
memory management as well as issues around concurrent population of this
cache as well as means to keep the data in the cache fresh in tune with any
changes to the data. These are no trivial tasks and the database community
has spent years constructing and improving algorithms to do this on behalf
of the front end database application developer. Also each time a TCP
connection is created, additional compute resources are consumed by the OS
as well as the database management server software.

A simpler way would be to use connection pooling where a thread of your
running application "borrows" a connection from a pool of open connections,
executes the SQL command then returns the connection immediately on
completion of the SQL command. This will require few concurrent connections
(depending of configuration) and let the database do the caching of the
data for you. For effective database data caching may need to make
adjustments of the PostgreSQL configuration file (postgresql.conf file) as
well as the operating system resources configuration. This way the response
time of your client application will degrade gracefully with the increase
of concurrent client requests.

For small number of concurrent connections, the speed advantage direct
"streaming" solution may have over the traditional connection pooling
solution may hardly be noticeable to end user. The easier way to increase
response time is to look into PostgreSQL performance tuning as well as
investing in faster hardware (mainly the the disk subsystem and more RAM).

Regards,

Allan.

On Wed, Sep 10, 2014 at 8:25 PM, Dmitriy Igrishin <dmitigr@gmail.com> wrote:

Show quoted text

Hello, Steve

2014-09-10 21:08 GMT+04:00 Steve Atkins <steve@blighty.com>:

On Sep 10, 2014, at 12:16 AM, Dmitriy Igrishin <dmitigr@gmail.com> wrote:

Hello, David

2014-09-10 4:31 GMT+04:00 David Boreham <david_list@boreham.org>:
Hi Dmitriy, are you able to say a little about what's driving your

quest for async http-to-pg ?

I'm curious as to the motivations, and whether they match up with some

of my own reasons for wanting to use low-thread-count solutions.

For many web projects I consider Postgres as a development platform.

Thus,

I prefer to keep the business logic (data integrity trigger functions

and

API functions) in the database. Because of nature of the Web, many

concurrent

clients can request a site and I want to serve maximum possible of them

with

minimal overhead. Also I want to avoid a complex solutions. So, I

believe that

with asynchronous solution it's possible to *stream* the data from the

database

to the maximum number of clients (which possible can request my site

over a

slow connection).

That's going to require you to have one database connection open for each
client. If the client is over a slow connection it'll keep the database
connection
open far longer than is needed, (compared to the usual "pull data from the
database as fast as the disks will go, then spoonfeed it out to the slow
client"
approach). Requiring a live database backend for every open client
connection
doesn't seem like a good idea if you're supporting many slow concurrent
clients.

Good point. Thus, some of caching on the HTTP server side should be
implemented
then.

--
// Dmitriy.

#11John DeSoi
desoi@pgedit.com
In reply to: Dmitriy Igrishin (#7)
Re: Async IO HTTP server frontend for PostgreSQL

On Sep 10, 2014, at 12:02 PM, Dmitriy Igrishin <dmitigr@gmail.com> wrote:

While this is not related to the %subj%, I've glanced and the link above.
And I found this complex. (As many many many "nice featured" things
on the market today.) Could you tell me please, for example, why the
URLs like

I did not write the spec, so you'll have to post to the wiki to ask questions. Just thought it might be relevant for anyone interested in this topic.

Best,

John DeSoi, Ph.D.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#12Dmitriy Igrishin
dmitigr@gmail.com
In reply to: Allan Kamau (#10)
Re: Async IO HTTP server frontend for PostgreSQL

Hello, Allan

2014-09-11 0:29 GMT+04:00 Allan Kamau <kamauallan@gmail.com>:

Dear Dmitriy,

To add on to David's suggestions, Data caching is a difficult task to
undertake. Consider an example where your data may not all fit into memory,
when you cache these data outside PostgreSQL you would need to look into
memory management as well as issues around concurrent population of this
cache as well as means to keep the data in the cache fresh in tune with any
changes to the data. These are no trivial tasks and the database community
has spent years constructing and improving algorithms to do this on behalf
of the front end database application developer. Also each time a TCP
connection is created, additional compute resources are consumed by the OS
as well as the database management server software.

Memory - limited resource. Hence, it's possible to catch "out of memory"
everywhere.
By caching in this case I mean flushing the data retrieved by HTTP server
from the
database server on the disk during "socket read-ready event" dispatching if
the
retrieved data cannot be send to the HTTP-client (socket not write-ready
yet). When
the socket to the HTTP-client became "write-ready" the data will be streamed
from cache.
I believe, it's not hard to implement it. And note, there is no need to
make such
cache shared. It's just a temporary files with live time of HTTP request
dispatching.

A simpler way would be to use connection pooling where a thread of your
running application "borrows" a connection from a pool of open connections,
executes the SQL command then returns the connection immediately on
completion of the SQL command. This will require few concurrent connections
(depending of configuration) and let the database do the caching of the
data for you. For effective database data caching may need to make
adjustments of the PostgreSQL configuration file (postgresql.conf file) as
well as the operating system resources configuration. This way the response
time of your client application will degrade gracefully with the increase
of concurrent client requests.

In practice, this approach works. How well? It works. But with
thread/connection pools
your site can be blocked by attacker ("cool-hacker") by initiating many slow
concurrent connections.
Using threads a good only if you're doing a lot of CPU work, rather than
communication work.

For small number of concurrent connections, the speed advantage direct
“streaming” solution may have over the traditional connection pooling
solution may hardly be noticeable to end user. The easier way to increase
response time is to look into PostgreSQL performance tuning as well as
investing in faster hardware (mainly the the disk subsystem and more RAM).

Yes, I agree. Good and fast hardware is always good :-). But every hardware
will has limited resources anyway. And I believe, that for many tasks async
solution will help to leverage it to the max.

Regards,

Allan.

On Wed, Sep 10, 2014 at 8:25 PM, Dmitriy Igrishin <dmitigr@gmail.com>
wrote:

Hello, Steve

2014-09-10 21:08 GMT+04:00 Steve Atkins <steve@blighty.com>:

On Sep 10, 2014, at 12:16 AM, Dmitriy Igrishin <dmitigr@gmail.com>
wrote:

Hello, David

2014-09-10 4:31 GMT+04:00 David Boreham <david_list@boreham.org>:
Hi Dmitriy, are you able to say a little about what's driving your

quest for async http-to-pg ?

I'm curious as to the motivations, and whether they match up with some

of my own reasons for wanting to use low-thread-count solutions.

For many web projects I consider Postgres as a development platform.

Thus,

I prefer to keep the business logic (data integrity trigger functions

and

API functions) in the database. Because of nature of the Web, many

concurrent

clients can request a site and I want to serve maximum possible of

them with

minimal overhead. Also I want to avoid a complex solutions. So, I

believe that

with asynchronous solution it's possible to *stream* the data from the

database

to the maximum number of clients (which possible can request my site

over a

slow connection).

That's going to require you to have one database connection open for each
client. If the client is over a slow connection it'll keep the database
connection
open far longer than is needed, (compared to the usual "pull data from
the
database as fast as the disks will go, then spoonfeed it out to the slow
client"
approach). Requiring a live database backend for every open client
connection
doesn't seem like a good idea if you're supporting many slow concurrent
clients.

Good point. Thus, some of caching on the HTTP server side should be
implemented
then.

--
// Dmitriy.

--
// Dmitriy.