DB Server slow down & hang during Peak hours of Usage
List ,
*Description:*
Maintaining a DB Server Postgres and with a lot of read writes to this
Server( virtual machine running on ESXi 7 with CentOS 7) .
( I am not sure how to get the read / write counts or required IOPS or any
other parameters for you. If you point our I can execute those commands
and get the data. )
Peak hours say 19:00 Hrs to 21:00 hrs it hangs ( The application is an
Emergency call response system writing many Emergency Response vehicles
locations coordinates to the DB every 30 Seconds and every emergency call
metadata (username, phone number, location info and address of the caller
to the DB for each call)
During these hours the system hangs and the Application ( which shows the
location of the vehicles on a GIS map hangs ) and the CAD machines which
connects to the system hangs as those machines can't connect to the DB and
get data for displaying the caller information to the call taking persons
working on them. )
*Issue : *
How to trace out what makes this DB hangs and make it slow and how to fix
it..
*Resource poured on the system :*
*64 vCPUs allocate ( Out of a host machine comprised of 2 processor slots
of 20 cores each with Hyper Threading, intel xeon 2nd Gen, CPU usage show
50 % in vCentre Console), and RAM 64 GB allocated ( buy usage always
showing around 33 GB only ) *
*Query :*
How to rectify the issues that makes the DB server underperforming and find
a permanent fix for this slow down issue*. *
*Attached the Postgres.conf file here for reference .*
*Any more information required I can share for analysis to fix the issue. *
*Krishane *
Attachments:
On 8/7/23 08:41, KK CHN wrote:
List ,
*Description:*
Maintaining a DB Server Postgres and with a lot of read writes to this
Server( virtual machine running on ESXi 7 with CentOS 7) .( I am not sure how to get the read / write counts or required IOPS or any
other parameters for you. If you point our I can execute those commands
and get the data. )Peak hours say 19:00 Hrs to 21:00 hrs it hangs ( The application is an
Emergency call response system writing many Emergency Response vehicles
locations coordinates to the DB every 30 Seconds and every emergency call
metadata (username, phone number, location info and address of the caller
to the DB for each call)During these hours the system hangs and the Application ( which shows
the location of the vehicles on a GIS map hangs ) and the CAD machines
which connects to the system hangs as those machines can't connect to the
DB and get data for displaying the caller information to the call taking
persons working on them. )
Have you looked at the postgresql log file?
Have you looked in the *application* error log?
*Issue : *
How to trace out what makes this DB hangs and make it slow and how to
fix it..*Resource poured on the system :*
*
*
*64 vCPUs allocate ( Out of a host machine comprised of 2 processor slots
of 20 cores each with Hyper Threading, intel xeon 2nd Gen, CPU usage show
50 % in vCentre Console), and RAM 64 GB allocated ( buy usage always
showing around 33 GB only ) *
*
*
What else happens on the ESXi host between 19:00 and 21:00?
*Query :*
*
*
How to rectify the issues that makes the DB server underperforming and
find a permanent fix for this slow down issue*. *
*
*
*Attached the Postgres.conf file here for reference .*
*
*
*Any more information required I can share for analysis to fix the issue. *
*
**Krishane *
--
Born in Arizona, moved to Babylonia.
Hello,
in the postgresql.conf joined, 2 things (at least) look strange:
1) the values for background writer are the default values, fit for a
server with a limited writes throughput.
you may want to increase those, like:
bgwriter_delay = 50ms
bgwriter_lru_maxpages = 400
bgwriter_lru_multiplier = 4.0
and check the checkpoint log to see if there are still backend processes
writes.
2) work_mem is set to 2 GB.
so, if 50 simultaneous requests use at least one buffer for sorting,
joining, ..., you will consume 100 GB of RAM
this value seems huge for the kind of config/usage you describe.
You may try to set work_mem to 100 MB and check what's happening.
Also check the logs, postgres tells his life there...
Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com
On Mon, Aug 7, 2023 at 3:36 PM KK CHN <kkchn.in@gmail.com> wrote:
Show quoted text
List ,
*Description:*
Maintaining a DB Server Postgres and with a lot of read writes to this
Server( virtual machine running on ESXi 7 with CentOS 7) .( I am not sure how to get the read / write counts or required IOPS or any
other parameters for you. If you point our I can execute those commands
and get the data. )Peak hours say 19:00 Hrs to 21:00 hrs it hangs ( The application is an
Emergency call response system writing many Emergency Response vehicles
locations coordinates to the DB every 30 Seconds and every emergency call
metadata (username, phone number, location info and address of the caller
to the DB for each call)During these hours the system hangs and the Application ( which shows
the location of the vehicles on a GIS map hangs ) and the CAD machines
which connects to the system hangs as those machines can't connect to the
DB and get data for displaying the caller information to the call taking
persons working on them. )*Issue : *
How to trace out what makes this DB hangs and make it slow and how to
fix it..*Resource poured on the system :*
*64 vCPUs allocate ( Out of a host machine comprised of 2 processor slots
of 20 cores each with Hyper Threading, intel xeon 2nd Gen, CPU usage show
50 % in vCentre Console), and RAM 64 GB allocated ( buy usage always
showing around 33 GB only ) **Query :*
How to rectify the issues that makes the DB server underperforming and
find a permanent fix for this slow down issue*. **Attached the Postgres.conf file here for reference .*
*Any more information required I can share for analysis to fix the issue. *
*Krishane *
also,
checkpoint setup are all default values
you may try to
checkpoint_completion_target = 0.9
checkpoint_timeout = 15min
max_wal_size = 5GB
and, as said in the previous mail, check the checkpoint logs
Also, all vacuum and autovacuum values are defaults
so, as autovacuum_work_mem = -1
the autovacuum processes will use the 4 GB setuped by maintenance_work_mem
= 4096MB
as there are 3 launched at the same time, its 12 GB "eaten"
which doesn't look like a good idea, so set
autovacuum_work_mem = 128MB
also pls read the autovacuum doc for your version (which is ?) here for
postgres 12:
https://www.postgresql.org/docs/12/runtime-config-autovacuum.html
Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com
On Tue, Aug 8, 2023 at 1:59 PM Marc Millas <marc.millas@mokadb.com> wrote:
Show quoted text
Hello,
in the postgresql.conf joined, 2 things (at least) look strange:
1) the values for background writer are the default values, fit for a
server with a limited writes throughput.
you may want to increase those, like:
bgwriter_delay = 50ms
bgwriter_lru_maxpages = 400
bgwriter_lru_multiplier = 4.0
and check the checkpoint log to see if there are still backend processes
writes.2) work_mem is set to 2 GB.
so, if 50 simultaneous requests use at least one buffer for sorting,
joining, ..., you will consume 100 GB of RAM
this value seems huge for the kind of config/usage you describe.
You may try to set work_mem to 100 MB and check what's happening.Also check the logs, postgres tells his life there...
Marc MILLAS
Senior Architect
+33607850334
www.mokadb.comOn Mon, Aug 7, 2023 at 3:36 PM KK CHN <kkchn.in@gmail.com> wrote:
List ,
*Description:*
Maintaining a DB Server Postgres and with a lot of read writes to this
Server( virtual machine running on ESXi 7 with CentOS 7) .( I am not sure how to get the read / write counts or required IOPS or
any other parameters for you. If you point our I can execute those
commands and get the data. )Peak hours say 19:00 Hrs to 21:00 hrs it hangs ( The application is an
Emergency call response system writing many Emergency Response vehicles
locations coordinates to the DB every 30 Seconds and every emergency call
metadata (username, phone number, location info and address of the caller
to the DB for each call)During these hours the system hangs and the Application ( which shows
the location of the vehicles on a GIS map hangs ) and the CAD machines
which connects to the system hangs as those machines can't connect to the
DB and get data for displaying the caller information to the call taking
persons working on them. )*Issue : *
How to trace out what makes this DB hangs and make it slow and how to
fix it..*Resource poured on the system :*
*64 vCPUs allocate ( Out of a host machine comprised of 2 processor
slots of 20 cores each with Hyper Threading, intel xeon 2nd Gen, CPU usage
show 50 % in vCentre Console), and RAM 64 GB allocated ( buy usage always
showing around 33 GB only ) **Query :*
How to rectify the issues that makes the DB server underperforming and
find a permanent fix for this slow down issue*. **Attached the Postgres.conf file here for reference .*
*Any more information required I can share for analysis to fix the
issue. **Krishane *
On Tue, Aug 8, 2023 at 5:49 PM Marc Millas <marc.millas@mokadb.com> wrote:
also,
checkpoint setup are all default valuesyou may try to
checkpoint_completion_target = 0.9
checkpoint_timeout = 15min
max_wal_size = 5GBand, as said in the previous mail, check the checkpoint logs
Also, all vacuum and autovacuum values are defaults
so, as autovacuum_work_mem = -1
the autovacuum processes will use the 4 GB setuped by maintenance_work_mem
= 4096MB
as there are 3 launched at the same time, its 12 GB "eaten"
which doesn't look like a good idea, so set
autovacuum_work_mem = 128MBalso pls read the autovacuum doc for your version (which is ?) here for
postgres 12:
https://www.postgresql.org/docs/12/runtime-config-autovacuum.htmlMarc MILLAS
Senior Architect
+33607850334
www.mokadb.comOn Tue, Aug 8, 2023 at 1:59 PM Marc Millas <marc.millas@mokadb.com> wrote:
Hello,
in the postgresql.conf joined, 2 things (at least) look strange:
1) the values for background writer are the default values, fit for a
server with a limited writes throughput.
you may want to increase those, like:
bgwriter_delay = 50ms
bgwriter_lru_maxpages = 400
bgwriter_lru_multiplier = 4.0
and check the checkpoint log to see if there are still backend processes
writes.2) work_mem is set to 2 GB.
so, if 50 simultaneous requests use at least one buffer for sorting,
joining, ..., you will consume 100 GB of RAM
this value seems huge for the kind of config/usage you describe.
You may try to set work_mem to 100 MB and check what's happening.Also check the logs, postgres tells his life there...
Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com
Thank you all for your time and the valuable inputs to fix the issue. Let
me tune conf parameters as advised and will get back with the results and
log outputs .
Krishane
Show quoted text
On Mon, Aug 7, 2023 at 3:36 PM KK CHN <kkchn.in@gmail.com> wrote:
List ,
*Description:*
Maintaining a DB Server Postgres and with a lot of read writes to this
Server( virtual machine running on ESXi 7 with CentOS 7) .( I am not sure how to get the read / write counts or required IOPS or
any other parameters for you. If you point our I can execute those
commands and get the data. )Peak hours say 19:00 Hrs to 21:00 hrs it hangs ( The application is an
Emergency call response system writing many Emergency Response vehicles
locations coordinates to the DB every 30 Seconds and every emergency call
metadata (username, phone number, location info and address of the caller
to the DB for each call)During these hours the system hangs and the Application ( which shows
the location of the vehicles on a GIS map hangs ) and the CAD machines
which connects to the system hangs as those machines can't connect to the
DB and get data for displaying the caller information to the call taking
persons working on them. )*Issue : *
How to trace out what makes this DB hangs and make it slow and how to
fix it..*Resource poured on the system :*
*64 vCPUs allocate ( Out of a host machine comprised of 2 processor
slots of 20 cores each with Hyper Threading, intel xeon 2nd Gen, CPU usage
show 50 % in vCentre Console), and RAM 64 GB allocated ( buy usage always
showing around 33 GB only ) **Query :*
How to rectify the issues that makes the DB server underperforming and
find a permanent fix for this slow down issue*. **Attached the Postgres.conf file here for reference .*
*Any more information required I can share for analysis to fix the
issue. **Krishane *