Connections dropping while using Postgres backend DB with Ejabberd

Started by Dipanjan Ganguly, almost 6 years ago · 6 messages
#1 Dipanjan Ganguly
dipagnjan@gmail.com

Greetings,

I was load-testing an Ejabberd XMPP server with a PostgreSQL database as
the backend, using Tsung.

I noticed that while using Mnesia, the "simultaneous users and open TCP/UDP
connections" graph in the Tsung report stays consistent, but while using
Postgres we see a drop in connections between roughly 100 and 500 seconds of
runtime, after which it recovers and stays consistent.

I have been trying to figure out what the issue could be, without any
success. I am fairly new to this technology and am hoping for some help
from the good people of the community to understand the problem and how
to fix it. Some details below:

· Postgres server utilization is low (avg load 1, highest CPU
utilization 26%, lowest freemem 9000)

Tsung graph:
[image: image.png]
Graph 1: Postgres 12 backend
[image: image.png]
Graph 2: Mnesia backend

· Ejabberd server: Ubuntu 16.04, 16 GB RAM, 4-core CPU.

· Postgres on a remote server: same configuration.

· Errors encountered during the same window: error_connect_etimedout
(same outcome for the other 2 tests)

· *Tsung Load:* 512-byte message size, user arrival rate 50/s,
80k registered users.

· Same tsung.xml and user list used for the tests in Mnesia and
Postgres.

*Postgres Configuration used:*
shared_buffers = 4GB
effective_cache_size = 12GB
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 4
effective_io_concurrency = 2
work_mem = 256MB
min_wal_size = 1GB
max_wal_size = 2GB
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4
max_parallel_maintenance_workers = 2
max_connections = 50000
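
For reference, whether the running server actually picked these values up
can be checked with a standard query against pg_settings (the names below
are built-in settings, nothing specific to this setup):

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('max_connections', 'work_mem', 'shared_buffers');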

Kindly help me understand this behavior. Any advice on how to fix it
would be a big help.

Thanks,

Dipanjan

#2 Justin
zzzzz.graf@gmail.com
In reply to: Dipanjan Ganguly (#1)
Re: Connections dropping while using Postgres backend DB with Ejabberd

Hi Dipanjan,

Please do not post to all the PostgreSQL mailing lists; let's keep this on
one list at a time. Keep this on the general list.

Am I reading this correctly: 10,000 to 50,000 open connections?
PostgreSQL is really not meant to serve that many open connections.
Due to PostgreSQL's design, each client connection can use up to the
work_mem of 256MB per operation, plus additional memory for parallel
processes. Memory will be exhausted long before 50,000 connections is reached.
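
A back-of-the-envelope worst case makes the scale concrete (assuming just
one work_mem-sized allocation per connection, which is conservative):

50,000 connections x 256 MB = 12,800,000 MB ≈ 12.8 TB of potential demand,
roughly 800x the 16 GB of RAM on the server.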

I'm not surprised PostgreSQL and the server are showing issues long before
10K connections is reached. The OS is probably pushing everything out to
swap, and connections get dropped or time out.
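
If that is what is happening, it will be visible from the OS side with
standard tools (nothing specific to this setup):

free -h      # total vs. used swap
vmstat 1     # nonzero si/so columns mean active swap-in/swap-out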

You should be using a connection pooler to service this kind of load, so
that PostgreSQL does not exhaust resources just from the open connections:
https://www.pgbouncer.org/
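
A minimal pgbouncer.ini sketch for this kind of workload (database name,
auth file path, and pool sizes here are hypothetical and illustrative, not
tuned values):

[databases]
ejabberd = host=127.0.0.1 port=5432 dbname=ejabberd

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; transaction pooling lets many clients share a few server connections
pool_mode = transaction
max_client_conn = 50000
default_pool_size = 20

The application then connects to port 6432 instead of 5432, and PostgreSQL
itself only ever sees default_pool_size backends per database/user pair.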

#3 Michael Lewis
mlewis@entrata.com
In reply to: Justin (#2)
Re: Connections dropping while using Postgres backend DB with Ejabberd

work_mem can be used many times per connection, since it applies per sort,
hash, or other operation, and as mentioned that can be multiplied if the
query is handled by parallel workers. I am guessing the server has 16GB of
memory total, given shared_buffers and effective_cache_size, and a more
reasonable work_mem setting might be on the order of 32-64MB.
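
To experiment without a full restart, work_mem can be changed per session
or cluster-wide; these are standard commands, and the values are just the
suggestion above:

-- per session, for a quick test
SET work_mem = '64MB';

-- cluster-wide; new sessions pick it up after a reload
ALTER SYSTEM SET work_mem = '48MB';
SELECT pg_reload_conf();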

Depending on the type of work being done and how quickly the application
releases the DB connection once it is done, I would expect max_connections
on the order of 4-20x the number of cores. If more simultaneous users need
to be serviced, a connection pooler like pgbouncer or pgpool will allow
those connections to be re-used quickly.

These numbers are generalizations based on my experience. Others with more
experience may have different configurations to recommend.

#4 Dipanjan Ganguly
dipagnjan@gmail.com
In reply to: Michael Lewis (#3)
Re: Connections dropping while using Postgres backend DB with Ejabberd

Thanks Michael for the recommendation and clarification.

Will try with 32 MB on my next run.

BR,
Dipanjan

#5 Justin
zzzzz.graf@gmail.com
In reply to: Dipanjan Ganguly (#1)
Re: Connections dropping while using Postgres backend DB with Ejabberd

Hi Dipanjan,

If the connections are being left open rather than closed, you should see
50,000 processes running on the server, because PostgreSQL creates/forks a
new process for each connection.

Just having that many processes running will exhaust resources, so I would
confirm that the processes are actually still running. You can use the
command

ps aux | wc -l

to get a count of the number of processes.

Beyond just opening the connections, are there any actions, such as a
SELECT * FROM sometable, being fired off to measure performance?
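
To narrow that count to Postgres alone, and to see what the backends are
actually doing, something like this should work (standard tools; the state
breakdown comes from pg_stat_activity):

ps aux | grep '[p]ostgres' | wc -l   # the [p] trick excludes grep itself

-- inside psql: how many backends exist, and in what state
SELECT state, count(*) FROM pg_stat_activity GROUP BY state;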

Attempting to open and leave 50K connections open should exhaust the
server's resources long before reaching 50K.

Something is off here. I would look into how this test actually works, how
the connections are opened, and what commands it sends to PostgreSQL.

On Tue, Feb 25, 2020 at 2:12 PM Dipanjan Ganguly <dipagnjan@gmail.com>
wrote:


Hi Justin,

Thanks for your insight.

I agree with you completely, but as mentioned in my previous email, the
fact that Postgres server resource utilization is low ("avg load 1,
highest CPU utilization 26%, lowest freemem 9000"), and that it recovers at
a certain point and then consistently stays close to 50k, is what confuses
me.

Legends from the Tsung report:
users: number of simultaneous users (their session has started, but not
yet finished).
connected: number of users with an open TCP/UDP connection (example: for
HTTP, during a think time, the TCP connection can be closed by the server,
and it won't be reopened until the think time has expired).

I have also used pgcluu to monitor the events. Sharing the stats below.

*Memory information*

- 15.29 GB Total memory
- 8.79 GB Free memory
- 31.70 MB Buffers
- 5.63 GB Cached
- 953.12 MB Total swap
- 953.12 MB Free swap
- 13.30 MB Page Tables
- 3.19 GB Shared memory

Any thoughts ??!! 🤔🤔

Thanks,
Dipanjan

#6 Dipanjan Ganguly
dipagnjan@gmail.com
In reply to: Justin (#5)
Re: Connections dropping while using Postgres backend DB with Ejabberd

Hi Justin,
I have already checked the running Postgres processes and, strangely, never
counted more than 20.

I'll check, as you recommend, how ejabberd-to-PostgreSQL connectivity
works; maybe the answer lies there (see the config sketch below). Will get
back if I find something.
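
From a first look, ejabberd manages its own pool of SQL connections,
configured in ejabberd.yml; a minimal sketch of the relevant options
(option names are ejabberd's documented SQL settings, host and credentials
hypothetical):

sql_type: pgsql
sql_server: db.example.com
sql_database: ejabberd
sql_username: ejabberd
sql_password: secret
# ejabberd opens this many connections per virtual host and reuses them,
# which would be consistent with seeing only ~20 postgres processes
sql_pool_size: 10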

Thanks for giving some direction to my thoughts.

Good talk. 👍👍

BR,
Dipanjan
