vacuumdb not letting me connect to db

Started by Atul Kumar · about 5 years ago · 20 messages · general
#1 Atul Kumar
akumar14871@gmail.com

Hi,

I have 160 GB of RAM, and Postgres 9.6 is running on the server.

After the upgrade I ran the command below:

"/usr/edb/as9.6/bin/vacuumdb" -U 'enterprisedb' --verbose --all -j 300
--analyze-only

After running that command I was not able to connect to the database
using psql for a few minutes.

After 20-30 minutes I was able to connect to the db, and at that time I
checked pg_stat_activity; the number of active connections had dropped
to 27.

My max_connections is set to 700.

I tried to find out why I could not connect to the db (while the 300
jobs were running) but still have not found the answer, so suggestions
are welcome.

#2 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Atul Kumar (#1)
Re: vacuumdb not letting me connect to db

On Thu, 2021-02-04 at 16:56 +0530, Atul Kumar wrote:

I have 160 GB of RAM, and Postgres 9.6 is running on the server.

After the upgrade I ran the command below:

"/usr/edb/as9.6/bin/vacuumdb" -U 'enterprisedb' --verbose --all -j 300
--analyze-only

After running that command I was not able to connect to the database
using psql for a few minutes.

That is to be expected.

If you have 300 processes performing I/O and using CPU, your machine
will certainly be overloaded.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

#3 Atul Kumar
akumar14871@gmail.com
In reply to: Laurenz Albe (#2)
Re: vacuumdb not letting me connect to db

Hi,

The CPU and RAM usage are normal even with 300 jobs (only 1-4%
consumption), but I don't understand one thing: if max_connections is
set to 700, why am I not able to connect to the db?

The running jobs (300) number fewer than half of max_connections.

Regards
Atul


#4 Ron
ronljohnsonjr@gmail.com
In reply to: Atul Kumar (#1)
Re: vacuumdb not letting me connect to db

On 2/4/21 5:26 AM, Atul Kumar wrote:

Hi,

I have 160 GB of RAM, and Postgres 9.6 is running on the server.

After the upgrade I ran the command below:

"/usr/edb/as9.6/bin/vacuumdb" -U 'enterprisedb' --verbose --all -j 300
--analyze-only

After running that command I was not able to connect to the database
using psql for a few minutes.

What's the exact error message?

After 20-30 minutes I was able to connect to the db, and at that time I
checked pg_stat_activity; the number of active connections had dropped
to 27.

My max_connections is set to 700.

I tried to find out why I could not connect to the db (while the 300
jobs were running) but still have not found the answer, so

Connect to the cluster *before* running "vacuumdb -j300", and start
looking at pg_stat_activity while vacuumdb is running.
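
For example, something like this (a minimal sketch; the connection
options are illustrative, and the wait_event columns shown here exist
as of 9.6):

    # Hypothetical monitoring query; adjust -U/-d to your setup.
    psql -U enterprisedb -d postgres -c "
        SELECT pid, state, wait_event_type, wait_event, left(query, 60)
        FROM pg_stat_activity
        WHERE state <> 'idle';"

If new connections hang rather than fail, the wait events of the
sessions that are already in will tell you what everyone is stuck on.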

--
Angular momentum makes the world go 'round.

#5 Ron
ronljohnsonjr@gmail.com
In reply to: Atul Kumar (#3)
Re: vacuumdb not letting me connect to db

What about disk I/O? That's what really gets saturated when running 300 threads.
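
A quick way to check, assuming the sysstat package is installed:

    # Extended device stats every 5 seconds; %util near 100 together
    # with a climbing await column means the disks are saturated.
    iostat -x 5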

--
Angular momentum makes the world go 'round.

#6 Ravi Krishna
rkrishna_pg@aol.com
In reply to: Atul Kumar (#3)
Re: vacuumdb not letting me connect to db

The CPU and RAM usage are normal even with 300 jobs (only 1-4%
consumption), but I don't understand one thing: if max_connections is
set to 700, why am I not able to connect to the db? The running jobs
(300) number fewer than half of max_connections.

Please paste the error message.

#7 Atul Kumar
akumar14871@gmail.com
In reply to: Ravi Krishna (#6)
Re: vacuumdb not letting me connect to db

There is no error message; when I try to connect to the database while
vacuumdb is running with 300 jobs, psql just hangs.


#8 Atul Kumar
akumar14871@gmail.com
In reply to: Ron (#4)
Re: vacuumdb not letting me connect to db

There is no error message; when I try to connect to the database while
vacuumdb is running with 300 jobs, psql just hangs.


#9 Ron
ronljohnsonjr@gmail.com
In reply to: Atul Kumar (#7)
Re: vacuumdb not letting me connect to db

Your problem screams "I/O saturation".
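
One simple way to see it (vmstat ships with procps on most Linux
distributions) is to watch the I/O-wait column while vacuumdb runs:

    # "wa" is the percentage of CPU time stalled waiting on I/O; a high
    # value alongside low us/sy confirms an I/O bottleneck.
    vmstat 5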


--
Angular momentum makes the world go 'round.

#10 Ravi Krishna
rkrishna_pg@aol.com
In reply to: Atul Kumar (#8)
Re: vacuumdb not letting me connect to db

There is no error message; when I try to connect to the database while
vacuumdb is running with 300 jobs, psql just hangs.

But you mentioned max_connections, which now seems to be a red herring.
Based on your description, the impression I got was that you were
getting a "sorry, too many clients already" error.

As others have pointed out, the db is probably saturated with I/O.

#11 Atul Kumar
akumar14871@gmail.com
In reply to: Ron (#9)
Re: vacuumdb not letting me connect to db

OK, how do I resolve it?

Any suggestions?


#12 Ron
ronljohnsonjr@gmail.com
In reply to: Atul Kumar (#11)
Re: vacuumdb not letting me connect to db

Obviously... don't use 300 threads.


--
Angular momentum makes the world go 'round.

#13 Rob Sargent
robjsargent@gmail.com
In reply to: Ron (#12)
Re: vacuumdb not letting me connect to db

On 2/5/21 9:11 AM, Ron wrote:

Obviously... don't use 300 threads.

No, no Ron. Clearly the answer is more CPUs

#14 Ron
ronljohnsonjr@gmail.com
In reply to: Rob Sargent (#13)
Re: vacuumdb not letting me connect to db

On 2/5/21 10:22 AM, Rob Sargent wrote:

On 2/5/21 9:11 AM, Ron wrote:

Obviously... don't use 300 threads.

No, no Ron.  Clearly the answer is more CPUs

I hope you're being sarcastic.

--
Angular momentum makes the world go 'round.

#15 Gavan Schneider
list.pg.gavan@pendari.org
In reply to: Ron (#14)
Re: vacuumdb not letting me connect to db

On 6 Feb 2021, at 3:37, Ron wrote:

On 2/5/21 10:22 AM, Rob Sargent wrote:

On 2/5/21 9:11 AM, Ron wrote:

Obviously... don't use 300 threads.

No, no Ron.  Clearly the answer is more CPUs

I hope you're being sarcastic.

A reasonable conjecture… though there is the consideration that 300
CPU-intensive tasks spread across a given number of CPUs are going to
waste some resources on context switching, i.e., need more CPUs :)

Basically, if there is plenty of wait time for I/O completion, then CPU
task switching can get more total work done. So far, so obvious. In this
thread I can see why it is disappointing to have a system considered
capable of 700 connections get saturated by a “mere” 300 threads. But
this is only a “problem” if connections are equated to threads. The PG
max connection count is about external users having access to the
resources needed to get a task done. Like all resource allocations, this
relies on estimated average usage, i.e., each connection only asks for a
lot of CPU in brief bursts, and then the result is transmitted with a
time lag before the connection makes another CPU demand. The system
designer should use estimates of usage and load to budget and configure
the system, and then monitor it all against actual performance in the
real world. Of course, estimates are a standing invitation for outliers,
and the system will show stress under an unexpected load.

So far I have not seen an analysis of where the bottleneck has occurred:
CPU, RAM, disk, and/or the data bus connecting them; some of these
hardware resources must be maxed out to the extent that the system will
not immediately pick up an additional work unit. As I see it, the OP
started 300 CPU-intensive tasks on hardware intended for 700
connections. If the connection count was designed with, say, 50%
CPU-intensive time per connection, you would expect this hardware to be
fully saturated by 300 CPU-intensive tasks. More than that, doing the
work with 300 threads would probably take longer than with (say) 200
threads, as the increased CPU context-switching time is just wasted
effort.

The OP now has a choice: decrease the thread count or (seriously)
upgrade the hardware. We in the gallery would love to see a plot of
total time to completion as a function of threads invoked (50-300 in
increments of 50), assuming the starting conditions are the same :)

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a
well-known solution to every human problem — neat, plausible, and
wrong.
— H. L. Mencken, 1920

#16 Atul Kumar
akumar14871@gmail.com
In reply to: Gavan Schneider (#15)
Re: vacuumdb not letting me connect to db

Hi Gavan,

Thanks for providing the details. I need more clarification on this: how
should I analyze what the ideal number of connections is, based on the
available hardware resources, so as to avoid I/O overhead?

How do I do this calculation?

Note: even with 300 threads, my RAM utilisation is totally normal.

Regards
Atul


#17 Gavan Schneider
list.pg.gavan@pendari.org
In reply to: Atul Kumar (#16)
Re: vacuumdb not letting me connect to db

On 6 Feb 2021, at 23:06, Atul Kumar wrote:

Thanks for providing the details. I need more clarification on this: how
should I analyze what the ideal number of connections is, based on the
available hardware resources, so as to avoid I/O overhead?

How to do this analysis properly is outside my expertise and likely off
topic for this list. But you can get an overall idea by doing what I
suggested: measure the performance (i.e., total time to completion) when
using different numbers of threads. In an ideal world the time will get
shorter as you invoke more threads. But there will be a point where
there is a bottleneck (or some resource hits its limit) and the time
will stay more or less the same no matter how many threads you add.

How do I do this calculation?

That’s likely too hard. Measurement will tell you what you need to know.
Once you know how many threads it takes to saturate the system, you will
know you need to use a smaller number. This may, or may not, get you to
revisit the max_connections = 700 setting, but once you know there is a
performance limit, and you decide it’s not good enough, then you can ask
the hardware people for help.

Note: even with 300 threads, my RAM utilisation is totally normal.

That’s not unexpected. The CPU activity should be using the data already
held in RAM, not spending all those cycles allocating RAM.

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a
well-known solution to every human problem — neat, plausible, and
wrong.
— H. L. Mencken, 1920

#18 Rob Sargent
robjsargent@gmail.com
In reply to: Ron (#14)
Re: vacuumdb not letting me connect to db

On Feb 5, 2021, at 9:37 AM, Ron <ronljohnsonjr@gmail.com> wrote:

On 2/5/21 10:22 AM, Rob Sargent wrote:

On 2/5/21 9:11 AM, Ron wrote:
Obviously... don't use 300 threads.

No, no Ron. Clearly the answer is more CPUs

I hope you're being sarcastic.

--
Angular momentum makes the world go 'round.

Ron, I was being sarcastic, but I’ve gone over the thread and I do not see any mention of the number of cores on the server. If it has a single quad-core chip, then the queue for each processor could be, on average, 75 deep at the outset, all of them anxiously awaiting data. The -j connections are processes, correct? Not threads.

Atul, if you list the tools you know and have used for watching system performance, others can perhaps suggest alternatives (or re-interpret results).

#19 Ron
ronljohnsonjr@gmail.com
In reply to: Atul Kumar (#16)
Re: vacuumdb not letting me connect to db

On 2/6/21 6:06 AM, Atul Kumar wrote:

Hi Gavan,

Thanks for providing the details. I need more clarification on this: how
should I analyze what the ideal number of connections is, based on the
available hardware resources, so as to avoid I/O overhead?
How do I do this calculation?

Run "iotop -o -u postgres", and then compare that with the total bandwidth
available to the system.  If it's (even almost) saturated, then everything
else will be starved.

The "--jobs=" value should AT MOST be some *small* multiple of the number of
CPUs (like 1x, 1.5x or *maybe* 2x if the core count is low, and nothing else
is running on the system.
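
As a rough sketch (the multiplier is only the rule of thumb above, not a
hard number; nproc is from GNU coreutils):

    # Derive a sane job count from the core count instead of picking 300.
    CORES=$(nproc)
    JOBS=$CORES    # conservative 1x; raise cautiously toward 2x at most
    /usr/edb/as9.6/bin/vacuumdb -U enterprisedb --all --analyze-only -j "$JOBS"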


--
Angular momentum makes the world go 'round.

#20 Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Ron (#19)
Re: vacuumdb not letting me connect to db

On 2021-02-06 12:54:11 -0600, Ron wrote:

On 2/6/21 6:06 AM, Atul Kumar wrote:

Hi Gavan,

Thanks for providing the details. I need more clarification on this: how
should I analyze what the ideal number of connections is, based on the
available hardware resources, so as to avoid I/O overhead?
How do I do this calculation?

Run "iotop -o -u postgres", and then compare that with the total bandwidth
available to the system.  If it's (even almost) saturated, then everything else
will be starved.

The "--jobs=" value should AT MOST be some *small* multiple of the number of
CPUs (like 1x, 1.5x or maybe 2x if the core count is low, and nothing else is
running on the system.

While this is generally good advice, Atul reported earlier in the thread
that CPU consumption is only 1 to 4%. So the problem is clearly not
related to CPU usage, but very probably (as some have already noted) to
I/O. You could use a similar rule of thumb for disks ("a small multiple
of the number of disks"), but with SSDs this might not be very accurate.

I think the only method to get to the optimal number of vacuums that can
be run in parallel is to determine it experimentally: 300 is clearly too
high, so try 150, 75, 37, etc. For each number, try to do some other
work - is the performance acceptable? If yes, note the time until the
vacuum is finished. Finally, among those runs where the performance was
acceptable, choose the value which was fastest.

(Note: if you do this on the same database, subsequent runs will benefit
from work already done, so take the results with a grain of salt.)
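
A minimal harness for that experiment might look like this (the job
counts and the binary path are simply the ones from this thread):

    # Time each run; exercise your normal workload while it executes.
    for J in 150 75 37 18; do
        echo "jobs=$J"
        time /usr/edb/as9.6/bin/vacuumdb -U enterprisedb --all \
            --analyze-only -j "$J"
    done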

hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"