synchronous_commit=off doesn't always return immediately

Started by tomrevam · over 16 years ago · 21 messages · general
#1 tomrevam
tomer@fabrix.tv

Hi,

I set synchronous_commit to off and expected trivial inserts (single row,
6-8 columns) to always return quickly. However, I see that during
checkpoints they sometimes take over a second. I tried setting
full_page_writes to off, but this didn't seem to have an effect.

The version I'm using is 8.3.3
The configuration is:
bgwriter_delay = 20ms # 10-10000ms between rounds
bgwriter_lru_maxpages = 1000 # 0-1000 max buffers written/round
bgwriter_lru_multiplier = 5.0 # 0-10.0 multiplier on buffers scanned/round
synchronous_commit = off # immediate fsync at commit
full_page_writes = off # recover from partial page writes
checkpoint_segments = 32 # in logfile segments, min 1, 16MB each
checkpoint_timeout = 30min # range 30s-1h
checkpoint_completion_target = 0.9 # checkpoint target duration, 0.0 - 1.0
checkpoint_warning = 30s # 0 is off

Thanks,
Tomer
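
A minimal way to see exactly which statements stall, and for how long, is
statement-duration logging (a sketch; log_min_duration_statement exists in
8.3, and the 100 ms threshold below is only illustrative):

log_min_duration_statement = 100 # log any statement that runs longer than 100 ms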

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: tomrevam (#1)
Re: synchronous_commit=off doesn't always return immediately

tomrevam <tomer@fabrix.tv> writes:

I set synchronous_commit to off and expected trivial inserts (single row,
6-8 columns) to always return quickly. However, I see that during
checkpoints they sometimes take over a second.

What sort of disk hardware have you got? It sounds overstressed.

It might help to increase wal_buffers.

regards, tom lane

#3 Greg Smith
gsmith@gregsmith.com
In reply to: tomrevam (#1)
Re: synchronous_commit=off doesn't always return immediately

On Thu, 23 Jul 2009, tomrevam wrote:

bgwriter_delay = 20ms # 10-10000ms between rounds
bgwriter_lru_maxpages = 1000 # 0-1000 max buffers written/round
bgwriter_lru_multiplier = 5.0 # 0-10.0 multiplier on buffers scanned/round

These settings may be contributing to the problem. You should never run
the background writer that frequently--it just wastes resources and writes
more than it should. I'd suggest turning it off altogether
(bgwriter_lru_maxpages = 0) and seeing if things improve any, just to rule
that out as a potential source of issues.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#4 tomrevam
tomer@fabrix.tv
In reply to: Greg Smith (#3)
Re: synchronous_commit=off doesn't always return immediately

Tom Lane-2 wrote:

What sort of disk hardware have you got? It sounds overstressed.

It might help to increase wal_buffers.

The disk is a SATA drive; it will be replaced with a better drive in future
projects. However, I still don't understand why the client stalls at all
when synchronous_commit is off.

Thanks,
Tomer

#5 tomrevam
tomer@fabrix.tv
In reply to: Greg Smith (#3)
Re: synchronous_commit=off doesn't always return immediately

Greg Smith-12 wrote:

On Thu, 23 Jul 2009, tomrevam wrote:

bgwriter_delay = 20ms # 10-10000ms between rounds
bgwriter_lru_maxpages = 1000 # 0-1000 max buffers written/round
bgwriter_lru_multiplier = 5.0 # 0-10.0 multiplier on buffers scanned/round

These settings may be contributing to the problem. You should never run
the background writer that frequently--it just wastes resources and writes
more than it should. I'd suggest turning it off altogether
(bgwriter_lru_maxpages = 0) and seeing if things improve any, just to rule
that out as a potential source of issues.

The behavior of the system is the same with bgwriter_lru_maxpages = 0.
Can you explain why transactions are sometimes synchronous even with
synchronous_commit set to off?

Thanks,
Tomer

#6 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: tomrevam (#5)
Re: synchronous_commit=off doesn't always return immediately

tomrevam wrote:

The behavior of the system is the same with bgwriter_lru_maxpages = 0.
Can you explain why transactions are sometimes synchronous even with
synchronous_commit set to off?

Asynchronous transactions must still be logged to the WAL from time
to time. There's a background process called the "wal writer" that's
supposed to do that in the background, but if there is too much WAL
traffic, transactions may block when there isn't enough space in the
in-memory WAL buffers to hold their changes. This is why Tom suggested
increasing wal_buffers. Did you try that?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
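
A minimal sketch of that suggestion (the value is illustrative; on 8.3,
changing wal_buffers requires a server restart):

SHOW wal_buffers; -- 8.3 defaults to 64kB, which heavy WAL traffic can fill quickly
-- then in postgresql.conf:
-- wal_buffers = 2MB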

#7 tomrevam
tomer@fabrix.tv
In reply to: Alvaro Herrera (#6)
Re: synchronous_commit=off doesn't always return immediately

Alvaro Herrera-7 wrote:

This is why Tom suggested increasing wal_buffers. Did you try that?

Thanks for the explanation. I will try increasing the wal_buffers.
Unfortunately this is on a system I can't restart for the next few days.

Tomer

#8 tomrevam
tomer@fabrix.tv
In reply to: Alvaro Herrera (#6)
Re: synchronous_commit=off doesn't always return immediately

Alvaro Herrera-7 wrote:

This is why Tom suggested increasing wal_buffers. Did you try that?

Increasing the wal_buffers improved the performance. Is there a limit on how
high I can raise them? Is there anything that would work worse with bigger
wal_buffers?

Thanks,
Tomer

#9 tomrevam
tomer@fabrix.tv
In reply to: Tom Lane (#2)
Re: synchronous_commit=off doesn't always return immediately

Tom Lane-2 wrote:

It might help to increase wal_buffers.

Hi,

I increased the wal_buffers to 2 MB. Initially this improved the performance
very much, but after 4 days of continuous operation the system is back to
very long inserts and updates (at least as bad as it was before).
Is there a way to know what resource the DB is running out of?

Thanks,
Tomer

#10 Merlin Moncure
mmoncure@gmail.com
In reply to: tomrevam (#9)
Re: synchronous_commit=off doesn't always return immediately

On Mon, Aug 3, 2009 at 2:14 AM, tomrevam <tomer@fabrix.tv> wrote:

Tom Lane-2 wrote:

It might help to increase wal_buffers.

Hi,

I increased the wal_buffers to 2 MB. Initially this improved the performance
very much, but after 4 days of continuous operation the system is back to
very long inserts and updates (at least as bad as it was before).
Is there a way to know what resource the DB is running out of?

I can almost guarantee you that you are I/O bound. synchronous_commit=off
helps, but it is not magic: your single SATA disk can only deliver about
100-200 iops and you are expecting more than that. I think you have
only two options: add more/faster disks (and consider separating the WAL
from the data volume) or disable syncing completely with fsync=off (which
is of course quite dangerous).

This can be confirmed with iostat, noting the tps on each volume
and the iowait.

merlin
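
From inside the database, pg_stat_bgwriter (present in 8.3) gives a rough
breakdown of where the write traffic originates; a sketch:

SELECT checkpoints_timed, checkpoints_req,
       buffers_checkpoint, buffers_clean, buffers_backend
FROM pg_stat_bgwriter;

A buffers_backend value that is large relative to the others suggests
backends are doing their own writes because the disk can't keep up.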

#11 Michael Gould
mgould@intermodalsoftwaresolutions.net
In reply to: Merlin Moncure (#10)
Partition tables

I am considering whether or not to partition a few tables in our system.
The guidelines I read say that this can help when the size of a table
exceeds the amount of system memory. I'm wondering whether that statement
applies to a single table or to the whole database.

The way our system operates is that each individual location is actually a
separate operation owned by another party. In most cases the data from one
location should not be seen by others. There are several tables which are
global in nature which hold mainly lookup information, but the actual
processing tables are by location.

I am wondering if partitioning would be a good way to isolate the
information for each location. Each partition would be created by location
number. Our db server is a dual quad-core Intel with 4 GB of RAM running
Windows 2000 Enterprise Server. All client machines are quad-core servers
with 8-16 GB of RAM, partitioned using Windows 2003 and accessed by Citrix.

The total size of our database with 5 years worth of data is about 3.4 GB.
In the business we are in, we open about 5-7 new locations each year and
close 2-3. I was also thinking that if each partition was by location, it
would be easier to detach the partition tables for historical use when we
close a location. We would probably get 10 or so queries on a closed
location in the 6 months after closing.

Does this seem like an appropriate use of table partitioning?

Best Regards

--
Michael Gould, Managing Partner
Intermodal Software Solutions, LLC
904.226.0978
904.592.5250 fax

#12 Bill Reynolds
Bill.Reynolds@ateb.com
In reply to: Michael Gould (#11)
using generate_series to iterate through months

Ok, I'm a bit stumped on getting my GROUP BY query to work; it iterates
through a number of months that a generate_series basically provides for
me.

Here is what I am using in the from clause (along with other tables) to
generate the series of numbers for the number of months. This seems to
work:
generate_series( 0, ((extract(years from age(current_date,
DATE('2008-05-01')))*12) + extract(month from age(current_date,
DATE('2008-05-01'))))::INTEGER) as s(a)

Next, I want to group by and compare against dates in my table. These
are the two parts I can't get to work:

In the select part:
select DATE('2008-05-01') + interval (s.a??? ' months') as Month_of

and in the where clause:
and DATE(sometimestamp) >= DATE('2008-05-01') || 's.a??? months' -
someOffsetVariable - 30
and DATE(sometimestamp) < DATE('2008-05-01') + s.a + 1 || 'months'

The point is I want to use interval (s.a 'months') to iterate through
them. I can obviously get DATE('2008-05-01') + interval '3 months' to
work, but I want the '3' part to be generated with a series. I have
tried various type casting to no avail. Any help with this syntax would
be appreciated.

Thanks in advance for insight on how to do this.

#13 Wojtek
foo@twine.pl
In reply to: Michael Gould (#11)
Re: Partition tables

Michael Gould wrote:

I am considering whether or not to partition a few tables in our system.
The guidelines I read say that this can help when the size of a table
exceeds the amount of system memory. I'm wondering whether that statement
applies to a single table or to the whole database.

The way our system operates is that each individual location is actually a
separate operation owned by another party. In most cases the data from one
location should not be seen by others. There are several tables which are
global in nature which hold mainly lookup information, but the actual
processing tables are by location.

I am wondering if partitioning would be a good way to isolate the
information for each location. Each partition would be created by location
number. Our db server is a dual quad-core Intel with 4 GB of RAM running
Windows 2000 Enterprise Server. All client machines are quad-core servers
with 8-16 GB of RAM, partitioned using Windows 2003 and accessed by Citrix.

The total size of our database with 5 years worth of data is about 3.4 GB.
In the business we are in, we open about 5-7 new locations each year and
close 2-3. I was also thinking that if each partition was by location, it
would be easier to detach the partition tables for historical use when we
close a location. We would probably get 10 or so queries on a closed
location in the 6 months after closing.

Does this seem like an appropriate use of table partitioning?

Hi - partitioning, by definition, should be used to get at your data faster
and, in most cases, should be transparent to your client.
Questions I'd suggest asking first:
1. do you have performance issues reading from your db?
2. can you come up with a pattern you can use to segment your data into
smaller chunks (like weeks, months, ids) to create partitions that are
reasonably big but not huge?
3. how do you populate your db? inserts? copy? if you ever create
partitions, will you write to many or just one (the most recent one)?

Yes, it can be useful for archiving old 'locations'. In that case I'd
suggest creating a dedicated tablespace on (slower = cheaper) disks and
moving your older partitions there... but again, that's not the main
reason to consider partitioning.


Regards,
foo
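
A sketch of the "older partitions on cheaper disks" idea above (the path
and table name are hypothetical):

CREATE TABLESPACE archive LOCATION '/mnt/slow_disks/pg_archive';
ALTER TABLE bookings_loc_042 SET TABLESPACE archive; -- physically moves the table's files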

#14 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bill Reynolds (#12)
Re: using generate_series to iterate through months

"Bill Reynolds" <Bill.Reynolds@ateb.com> writes:

Ok, I'm a bit stumped on getting my GROUP BY query to work; it iterates
through a number of months that a generate_series basically provides for
me.
...
select DATE('2008-05-01') + interval (s.a??? ' months') as Month_of

No, you're confusing a syntax that's meant to handle literal constants
with something that's appropriate for computation. What you want is

select DATE('2008-05-01') + s.a * interval '1 month' as Month_of

that is, multiply the sequence output by a suitable interval constant.

regards, tom lane
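
The same trick carries over to the WHERE clause from the original question
(a clause sketch; sometimestamp is the poster's placeholder column):

AND DATE(sometimestamp) >= DATE('2008-05-01') + s.a * interval '1 month'
AND DATE(sometimestamp) < DATE('2008-05-01') + (s.a + 1) * interval '1 month'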

#15 Wojtek
foo@twine.pl
In reply to: Wojtek (#13)
Re: Partition tables

Michael Gould wrote:

2. can you come up with a pattern you can use to segment your data into
smaller chunks (like weeks, months, ids) to create partitions that are
reasonably big but not huge?

Yes, each location has its own specific location code.

good!
how many rows you will have per each location code?
how do you use the data? most of your reporting would read from one
location only or from multiple ones?

3. how do you populate your db? inserts? copy? if you ever create
partitions, will you write to many or just one (the most recent one)?

Everything will be done via inserts, either via online entry or from
external processes such as EDI processing. It would depend on how the
partitions are set up. If we set them up by location, there would be one
or more inserts for each record entered, and they would go to just that
location's tables. If we did it by date range then everyone would be
entering data into a single set of tables based on date range instead of
by location.

do you read the tables at the same time as you insert into them?
how many inserts per sec/min/hour might you have? approximately...

Yes, it can be useful for archiving old 'locations'. In that case I'd
suggest creating a dedicated tablespace on (slower = cheaper) disks and
moving your older partitions there... but again, that's not the main
reason to consider partitioning.

On our current platform, disk space isn't an issue. Our db is highly
normalized and we've had about 200 locations over the past 10 years; the
db is currently about 4 GB of total data. That being said, the only way to
reclaim space with this db is to do a complete unload/reload, which we
don't do very often, and it appears that the Postgres vacuum process is
much easier to use.

ok, in that case why do you want to archive the data in the first place?
you have no space constraints and you still need to read the data from
time to time...

Regards,
foo

#16 Andreas Kretschmer
akretschmer@spamfence.net
In reply to: Bill Reynolds (#12)
Re: using generate_series to iterate through months

Bill Reynolds <Bill.Reynolds@ateb.com> wrote:

Ok, I'm a bit stumped on getting my GROUP BY query to work; it iterates
through a number of months that a generate_series basically provides for
me.

Here is what I am using in the from clause (along with other tables) to
generate the series of numbers for the number of months. This seems to
work:
generate_series( 0, ((extract(years from age(current_date,
DATE('2008-05-01')))*12) + extract(month from age(current_date,
DATE('2008-05-01'))))::INTEGER) as s(a)

Next, I want to group by and compare against dates in my table. These
are the two parts I can't get to work:

In the select part:
select DATE('2008-05-01') + interval (s.a??? ' months') as Month_of

The trick is easy:

test=*# select current_date + s.a * '1 month'::interval from (select
generate_series(0,10) as a) as s;
?column?
---------------------
2009-08-03 00:00:00
2009-09-03 00:00:00
2009-10-03 00:00:00
2009-11-03 00:00:00
2009-12-03 00:00:00
2010-01-03 00:00:00
2010-02-03 00:00:00
2010-03-03 00:00:00
2010-04-03 00:00:00
2010-05-03 00:00:00
2010-06-03 00:00:00
(11 rows)

I think, you can solve your problem now.

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknown)
Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°

#17 Bill Reynolds
Bill.Reynolds@ateb.com
In reply to: Tom Lane (#14)
Re: using generate_series to iterate through months

Ok, it is Monday -:) Thanks Tom!

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Monday, August 03, 2009 11:44 AM
To: Bill Reynolds
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] using generate_series to iterate through months

"Bill Reynolds" <Bill.Reynolds@ateb.com> writes:

Ok, I'm a bit stumped on getting my GROUP BY query to work; it iterates
through a number of months that a generate_series basically provides for
me.
...
select DATE('2008-05-01') + interval (s.a??? ' months') as Month_of

No, you're confusing a syntax that's meant to handle literal constants
with something that's appropriate for computation. What you want is

select DATE('2008-05-01') + s.a * interval '1 month' as Month_of

that is, multiply the sequence output by a suitable interval constant.

regards, tom lane

#18 Sam Mason
sam@samason.me.uk
In reply to: Bill Reynolds (#12)
Re: using generate_series to iterate through months

On Mon, Aug 03, 2009 at 11:15:25AM -0400, Bill Reynolds wrote:

Here is what I am using in the from clause (along with other tables) to
generate the series of numbers for the number of months. This seems to
work:
generate_series( 0, ((extract(years from age(current_date,
DATE('2008-05-01')))*12) + extract(month from age(current_date,
DATE('2008-05-01'))))::INTEGER) as s(a)

I doubt you're using it, but the generate_series in 8.4 knows how to
handle dates and intervals, for example you can do:

SELECT generate_series(timestamp '2001-1-1','2004-1-1',interval '1 month');

to go from 2001 to 2004 in one month steps. If not, I'd be tempted to
bung the above into a function that does the same. Something like
this should work OK for series with only a few thousand rows, but don't
use it to generate a microsecond spaced series covering several years:

CREATE FUNCTION generate_series(timestamp,timestamp,interval)
RETURNS SETOF timestamp
LANGUAGE plpgsql
IMMUTABLE AS $$
DECLARE
_c timestamp := $1;
BEGIN
WHILE _c < $2 LOOP
RETURN NEXT _c;
_c := _c + $3;
END LOOP;
END;
$$;

--
Sam http://samason.me.uk/
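
A quick usage sketch of the hand-rolled function above:

SELECT * FROM generate_series(timestamp '2008-05-01',
                              timestamp '2008-09-01',
                              interval '1 month');
-- returns 2008-05-01 through 2008-08-01: the loop above treats the upper
-- bound as exclusive, whereas the 8.4 built-in would also return 2008-09-01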

#19 Vick Khera
vivek@khera.org
In reply to: Michael Gould (#11)
Re: Partition tables

On Mon, Aug 3, 2009 at 10:51 AM, Michael
Gould <mgould@intermodalsoftwaresolutions.net> wrote:

The total size of our database with 5 years worth of data is about 3.4 gig.
In the business we are in, we open about 5-7 new locations each year and
close 2-3.  I was also thinking that if each partition was by location it

It seems to me it would be cheaper to populate the server with 8GB of
RAM (or more) and be done with it. The effort you will expend to
partition and manage the partitions as locations come and go will be
worth far more than the cost of the RAM.

Once your DB is in the 100+ GB range, then you may want to consider
partitioning, or perhaps using a schema per location and, for your few
global queries, making a view that encompasses all the schemas. Given
your rate of change, redefining the views will not be a major burden.
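
A rough sketch of the schema-per-location layout (all names are
hypothetical):

CREATE SCHEMA loc_101;
CREATE SCHEMA loc_102;
CREATE TABLE loc_101.shipments (id int, shipped date);
CREATE TABLE loc_102.shipments (LIKE loc_101.shipments);
-- the global view, re-created whenever a location opens or closes:
CREATE VIEW all_shipments AS
    SELECT 101 AS location_id, s.* FROM loc_101.shipments s
    UNION ALL
    SELECT 102, s.* FROM loc_102.shipments s;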

#20 Michael Gould
mgould@intermodalsoftwaresolutions.net
In reply to: Vick Khera (#19)
LDAP using Active Directory

I am wondering how others handle the login situation. We use Active
Directory and require our users to change their passwords every 30 days.
Currently in our old system using SQL Anywhere we use the integrated login
feature. Our db server is Windows 2003 R2

I believe we can mimic this in Postgres.

What are people's feelings about using passwords in Postgres in this
situation? We know that only people authenticated to access our servers are
actually getting logged in. All of our users must log in through Citrix and
access our system via our Citrix web page login.

I do not believe we can capture the password that the user types from Active
Directory, so I really do not want to use a password on the Postgres side.
We do have application-level security as well, which only allows certain
users (same as the login id) access to the allowed areas within the system,
and only at the level of access prescribed within the system.

What are others' thoughts on this? With SQL Anywhere, if you are using
integrated logins, you need to enter a password when the account is first
defined to the database, but it is bypassed from that point forward unless
you remove the user's access to integrated logins.

Best Regards
--
Michael Gould, Managing Partner
Intermodal Software Solutions, LLC
904.226.0978
904.592.5250 fax
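
For reference, a hedged sketch of what AD-backed logins can look like in
pg_hba.conf (this is the option syntax of later releases, 9.0+; the server
name and domain prefix are hypothetical):

# simple-bind LDAP against Active Directory
host all all 192.168.0.0/24 ldap ldapserver=ad.example.com ldapprefix="EXAMPLE\"

On a Windows server, the sspi auth method (native Windows authentication,
added I believe in 8.3) gives true integrated logins with no password
stored on the Postgres side.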

#21 Magnus Hagander
magnus@hagander.net
In reply to: Michael Gould (#20)