default_isolation_level='serializable' crashes on Windows

Started by Heikki Linnakangasover 13 years ago21 messages
#1Heikki Linnakangas
heikki.linnakangas@enterprisedb.com

A customer reported that when you set
default_isolation_level='serializable' in postgresql.conf on Windows,
and try to start up the database, it crashes immediately. And sure
enough, it does, on REL9_1_STABLE as well as on master.

Stack trace:

postgres!RecoveryInProgress+0x3a
[c:\postgresql\src\backend\access\transam\xlog.c @ 7125]
postgres!check_XactIsoLevel+0x162
[c:\postgresql\src\backend\commands\variable.c @ 617]
postgres!call_string_check_hook+0x6d
[c:\postgresql\src\backend\utils\misc\guc.c @ 8226]
postgres!set_config_option+0x13e5
[c:\postgresql\src\backend\utils\misc\guc.c @ 5652]
postgres!read_nondefault_variables+0x27f
[c:\postgresql\src\backend\utils\misc\guc.c @ 7677]
postgres!SubPostmasterMain+0x227
[c:\postgresql\src\backend\postmaster\postmaster.c @ 4101]
postgres!main+0x1e9 [c:\postgresql\src\backend\main\main.c @ 187]
postgres!__tmainCRTStartup+0x192
[f:\dd\vctools\crt_bld\self_64_amd64\crt\src\crtexe.c @ 586]
postgres!mainCRTStartup+0xe
[f:\dd\vctools\crt_bld\self_64_amd64\crt\src\crtexe.c @ 403]
kernel32!BaseThreadInitThunk+0xd
ntdll!RtlUserThreadStart+0x1d

The problem is that when a postmaster subprocess is launched, it calls
read_nondefault_variables() very early, before shmem initialization, to
read the non-default config options from the file that postmaster wrote.
When check_XactIsoLevel() calls RecoveryInProgress(), it crashes,
because XLogCtl is NULL.

I'm not sure what the cleanest fix for this would be. It seems that we
could should just trust the values the postmaster passes to us and
accept them without checking RecoveryInProgress(), but there's no
straightforward way to tell that within check_XactIsoLevel(). Another
thought is that there's really no need to pass XactIsoLevel from
postmaster to a backend anyway, because it's overwritten from
default_transaction_isolation as soon as you begin a transaction.

There's also a call to RecoveryInProgress() in
check_transaction_read_only() as well, but it seems to not have this
problem. That's more by accident than by design, though.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#1)
Re: default_isolation_level='serializable' crashes on Windows

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

The problem is that when a postmaster subprocess is launched, it calls
read_nondefault_variables() very early, before shmem initialization, to
read the non-default config options from the file that postmaster wrote.
When check_XactIsoLevel() calls RecoveryInProgress(), it crashes,
because XLogCtl is NULL.

Hm, how did the same code fail to crash in the postmaster itself, when
the postmaster read the setting from postgresql.conf?

A larger point is that I think it's broken for any GUC assignment
function to be calling something as transient as RecoveryInProgress to
start with. We probably ought to re-think the logic, not just band-aid
this by having it skip the check when shmem isn't initialized yet.
I'm thinking that the check has to occur somewhere outside GUC.

regards, tom lane

#3Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#2)
Re: default_isolation_level='serializable' crashes on Windows

On 12.08.2012 17:39, Tom Lane wrote:

Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes:

The problem is that when a postmaster subprocess is launched, it calls
read_nondefault_variables() very early, before shmem initialization, to
read the non-default config options from the file that postmaster wrote.
When check_XactIsoLevel() calls RecoveryInProgress(), it crashes,
because XLogCtl is NULL.

Hm, how did the same code fail to crash in the postmaster itself, when
the postmaster read the setting from postgresql.conf?

It's not the check function for default_transaction_isolation that
crashes, but the one for transaction_isolation.

I'm not exactly sure how transaction_isolation gets set to a non-default
value, though. The default for transaction_isolation is 'default', so
it's understandable that the underlying XactIsoLevel variable gets set
to XACT_SERIALIZABLE, but AFAICS the code to read/write the GUCs from/to
file only cares about the string value of the guc, not the integer value
of the underlying global variable.

A larger point is that I think it's broken for any GUC assignment
function to be calling something as transient as RecoveryInProgress to
start with. We probably ought to re-think the logic, not just band-aid
this by having it skip the check when shmem isn't initialized yet.
I'm thinking that the check has to occur somewhere outside GUC.

Hmm, it seems like the logical place to complain if you do a manual "SET
transaction_isolation='serializable'". But I think we should only do the
check if we're not in a transaction. Setting the guc won't have any
effect outside a transaction anyway, because StartTransaction will
overwrite it from default_transaction_isolation as soon as you begin a
transaction.

While playing around, I bumped into another related bug, and after
googling around I found out that it was already reported by Robert Haas
earlier, but still not fixed:
http://archives.postgresql.org/message-id/CA%2BTgmoa0UM2W1YkjjneEgJctzxopC3G53ocYPaCyoEOWT3aKiA%40mail.gmail.com.
Kevin, the last message on that thread
(http://archives.postgresql.org/pgsql-hackers/2012-04/msg01394.php) says
you'll write a patch for that. Ping? Or would you like me to try that?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#3)
Re: default_isolation_level='serializable' crashes on Windows

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

On 12.08.2012 17:39, Tom Lane wrote:

A larger point is that I think it's broken for any GUC assignment
function to be calling something as transient as RecoveryInProgress to
start with.

Hmm, it seems like the logical place to complain if you do a manual "SET
transaction_isolation='serializable'". But I think we should only do the
check if we're not in a transaction.

You mean "if we *are* in a transaction"? Possibly that would work.
The concerns I've got about running this type of test in a GUC hook
are (1) what if you're not in a transaction or (2) what if you aren't
in a process that has access to shared memory; the early-startup-in-
EXEC_BACKEND case loses on both counts. But I think we can test our
local in-transaction state without touching shared memory, and then
go from there.

While playing around, I bumped into another related bug, and after
googling around I found out that it was already reported by Robert Haas
earlier, but still not fixed:
http://archives.postgresql.org/message-id/CA%2BTgmoa0UM2W1YkjjneEgJctzxopC3G53ocYPaCyoEOWT3aKiA%40mail.gmail.com

That sounds like a different bug, although perhaps with the same theme
of overreliance on what a GUC hook can do.

regards, tom lane

#5Amit Kapila
amit.kapila@huawei.com
In reply to: Heikki Linnakangas (#3)
Re: default_isolation_level='serializable' crashes on Windows

From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Heikki Linnakangas
Sent: Monday, August 13, 2012 12:14 PM
On 12.08.2012 17:39, Tom Lane wrote:

Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes:

The problem is that when a postmaster subprocess is launched, it calls
read_nondefault_variables() very early, before shmem initialization, to
read the non-default config options from the file that postmaster wrote.
When check_XactIsoLevel() calls RecoveryInProgress(), it crashes,
because XLogCtl is NULL.

Hm, how did the same code fail to crash in the postmaster itself, when
the postmaster read the setting from postgresql.conf?

It's not the check function for default_transaction_isolation that
crashes, but the one for transaction_isolation.

I 'm not exactly sure how transaction_isolation gets set to a non-default
value, though. The default for transaction_isolation is 'default', so
it's understandable that the underlying XactIsoLevel variable gets set
to XACT_SERIALIZABLE, but AFAICS the code to read/write the GUCs from/to
file only cares about the string value of the guc, not the integer value
of the underlying global variable.

Here What I am able to trace is that function read_nondefault_variables(),
reads all variables
from config_exec_params which contains both default_transaction_isolation
and transaction_isolation.

1. it first reads default_transaction_isolation and sets value of
DefaultXactIsoLevel to 'serializable'.
2. As for parameter default_transaction_isolation, there is no check
function it passes.
3. After that when variable transaction_isolation is processed, function
check_XactIsoLevel() sets
XactIsoLevel to XACT_SERIALIZABLE which causes crash.

Actually function read_nondefault_variables(), should only process non
default values (default_transaction_isolation)
not transaction_isolation, but currently it processes both?

With Regards,
Amit Kapila.

#6Amit Kapila
amit.kapila@huawei.com
In reply to: Amit Kapila (#5)
Re: default_isolation_level='serializable' crashes on Windows

From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Amit Kapila
Sent: Monday, August 13, 2012 12:47 PM
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Heikki Linnakangas
Sent: Monday, August 13, 2012 12:14 PM
On 12.08.2012 17:39, Tom Lane wrote:

Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes:

Hm, how did the same code fail to crash in the postmaster itself, when
the postmaster read the setting from postgresql.conf?

It's not the check function for default_transaction_isolation that
crashes, but the one for transaction_isolation.

I 'm not exactly sure how transaction_isolation gets set to a non-default

value, though. The default for transaction_isolation is 'default', so
it's understandable that the underlying XactIsoLevel variable gets set
to XACT_SERIALIZABLE, but AFAICS the code to read/write the GUCs from/to
file only cares about the string value of the guc, not the integer value
of the underlying global variable.

Here What I am able to trace is that function read_nondefault_variables(),
reads all variables
from config_exec_params which contains both default_transaction_isolation
and transaction_isolation.

1. it first reads default_transaction_isolation and sets value of
DefaultXactIsoLevel to 'serializable'.
2. As for parameter default_transaction_isolation, there is no check
function it passes.
3. After that when variable transaction_isolation is processed, function
check_XactIsoLevel() sets
XactIsoLevel to XACT_SERIALIZABLE which causes crash.

Actually function read_nondefault_variables(), should only process non
default values (default_transaction_isolation)
not transaction_isolation, but currently it processes both?

transaction_isolation is getting written to config_exec_params file as
function
write_one_nondefault_variable() checks if conf source is not PGC_S_DEFAULT,
then it writes to file.
For transaction_isolation, the conf source is set to PGC_S_OVERRIDE in
function InitializeGUCOptions()
so this also gets written to config_exec_params file.
Should the parameter transaction_isolation be written to config_exec_params?

With Regards,
Amit Kapila.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#3)
Re: default_isolation_level='serializable' crashes on Windows

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

While playing around, I bumped into another related bug, and
after googling around I found out that it was already reported by
Robert Haas earlier, but still not fixed:

http://archives.postgresql.org/message-id/CA%2BTgmoa0UM2W1YkjjneEgJctzxopC3G53ocYPaCyoEOWT3aKiA%40mail.gmail.com

Kevin, the last message on that thread (

http://archives.postgresql.org/pgsql-hackers/2012-04/msg01394.php

) says you'll write a patch for that. Ping? Or would you like me
to try that?

Let me try to redeem myself by taking a shot at it tonight.

-Kevin

#8Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#7)
Re: default_isolation_level='serializable' crashes on Windows

Heikki Linnakangas wrote:

While playing around, I bumped into another related bug, and after
googling around I found out that it was already reported by Robert
Haas earlier, but still not fixed:

Kevin, the last message on that thread says you'll write a patch
for that. Ping?

OK, attached is a first cut to confirm that the approach looks sane
to everyone; I *think* it is along the lines that we reached
consensus. After moving the check to the point where we get a
serializable snapshot, it was still behaving badly. It took a bit,
but forcing the snapshot acquisition in InitPostgres to use 'read
committed' instead of the default isolation level got reasonable
behavior in my initial tests. It sure looks a lot better to *me*
than what was happening before.

Besides confirming that this is the solution people want, this has
not been tested well enough yet to be ready for commit. It's getting
late, though, and I'm fading. If the overall approach looks good
I'll beat up on it some more tomorrow.

Since the existing behavior is so bad, I'm inclined to think this
merits backpatching to 9.1. Thoughts on that?

-Kevin

#9Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#8)
1 attachment(s)
Re: default_isolation_level='serializable' crashes on Windows

"Kevin Grittner" wrote:

OK, attached

[sigh]

This time for sure!

-Kevin

Attachments:

hotstandby-serializable.patchapplication/octet-stream; name=hotstandby-serializable.patchDownload
*** a/src/backend/commands/variable.c
--- b/src/backend/commands/variable.c
***************
*** 613,625 **** check_XactIsoLevel(char **newval, void **extra, GucSource source)
  			GUC_check_errmsg("SET TRANSACTION ISOLATION LEVEL must not be called in a subtransaction");
  			return false;
  		}
! 		/* Can't go to serializable mode while recovery is still active */
! 		if (newXactIsoLevel == XACT_SERIALIZABLE && RecoveryInProgress())
! 		{
! 			GUC_check_errmsg("cannot use serializable mode in a hot standby");
! 			GUC_check_errhint("You can use REPEATABLE READ instead.");
! 			return false;
! 		}
  	}
  
  	*extra = malloc(sizeof(int));
--- 613,624 ----
  			GUC_check_errmsg("SET TRANSACTION ISOLATION LEVEL must not be called in a subtransaction");
  			return false;
  		}
! 		/*
! 		 * NOTE: We cannot check for an attempt to set SERIALIZABLE
! 		 * transactions on a hot standby here, because when that is set as the
! 		 * default, this would get called before shared memory is even set up.
! 		 * Instead, this is checked when a serializable snapshot is requested.
! 		 */
  	}
  
  	*extra = malloc(sizeof(int));
*** a/src/backend/storage/lmgr/predicate.c
--- b/src/backend/storage/lmgr/predicate.c
***************
*** 1572,1577 **** GetSerializableTransactionSnapshot(Snapshot snapshot)
--- 1572,1587 ----
  	Assert(IsolationIsSerializable());
  
  	/*
+ 	 * Can't use serializable mode while recovery is still active, as it is,
+ 	 * for example, on a hot standby.
+ 	 */
+ 	if (RecoveryInProgress())
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("cannot use serializable mode in a hot standby"),
+ 				 errhint("You can use REPEATABLE READ instead.")));
+ 
+ 	/*
  	 * A special optimization is available for SERIALIZABLE READ ONLY
  	 * DEFERRABLE transactions -- we can wait for a suitable snapshot and
  	 * thereby avoid all SSI overhead once it's running.
*** a/src/backend/utils/init/postinit.c
--- b/src/backend/utils/init/postinit.c
***************
*** 584,589 **** InitPostgres(const char *in_dbname, Oid dboid, const char *username,
--- 584,591 ----
  		/* statement_timestamp must be set for timeouts to work correctly */
  		SetCurrentStatementStartTimestamp();
  		StartTransactionCommand();
+ 		SetConfigOption("transaction_isolation", "read committed",
+ 						PGC_SUSET, PGC_S_OVERRIDE);
  		(void) GetTransactionSnapshot();
  	}
  
#10Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#8)
Re: default_isolation_level='serializable' crashes on Windows

On 14.08.2012 08:23, Kevin Grittner wrote:

OK, attached is a first cut to confirm that the approach looks sane
to everyone; I *think* it is along the lines that we reached
consensus. After moving the check to the point where we get a
serializable snapshot, it was still behaving badly. It took a bit,
but forcing the snapshot acquisition in InitPostgres to use 'read
committed' instead of the default isolation level got reasonable
behavior in my initial tests. It sure looks a lot better to *me*
than what was happening before.

A comment in InitPostgres would be nice, explaining why we want a read
committed snapshot there.

Besides confirming that this is the solution people want, this has
not been tested well enough yet to be ready for commit. It's getting
late, though, and I'm fading. If the overall approach looks good
I'll beat up on it some more tomorrow.

Thanks! This fixes both the assertion failure and the Windows crash,
which is good.

Now that the error is thrown when the first snapshot is taken, rather
than at the SET command, I think the error message needs some work:

postgres=# select 123;
ERROR: cannot use serializable mode in a hot standby
HINT: You can use REPEATABLE READ instead.

If the isolation level came from default_transaction_isolation, set in
postgresql.conf file, it might take the user a while to figure that out.
Perhaps something like this:

ERROR: cannot use serializable mode in a hot standby
DETAIL: default_transaction_isolation was set to 'serializable' in the
config file.
HINT: You can use "SET transaction_isolation = 'repeatable read'"
before the first query in the transaction to override it.

This still leaves the RecoveryInProgress() call in
check_transaction_read_only(), which isn't a problem at the moment, but
seems fragile. I think we should still add the IsTransactionState()
check in there, so that it works without shared memory access. If we're
not in a transaction yet (TRANS_DEFAULT), setting transaction_read_only
has no effect, as it's overwritten at the beginning of a transaction. If
we're in one of the transitory states, TRANS_START or
TRANS_COMMIT/ABORT/PREPARE, I'm not sure what should happen, but it
should not be possible for user code to run and change
transaction_read_only in those states.

Since the existing behavior is so bad, I'm inclined to think this
merits backpatching to 9.1. Thoughts on that?

Yes, we have to somehow fix the crash and the assertion failure on 9.1.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#11Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#10)
Re: default_isolation_level='serializable' crashes on Windows

Heikki Linnakangas wrote:

On 14.08.2012 08:23, Kevin Grittner wrote:

OK, attached is a first cut to confirm that the approach looks
sane to everyone; I *think* it is along the lines that we reached
consensus. After moving the check to the point where we get a
serializable snapshot, it was still behaving badly. It took a bit,
but forcing the snapshot acquisition in InitPostgres to use 'read
committed' instead of the default isolation level got reasonable
behavior in my initial tests. It sure looks a lot better to *me*
than what was happening before.

Oh, further testing this morning shows that while *queries* on the HS
seem OK, streaming replication is now broken. I probably need to
override transaction isolation on the recovery process. I'll take a
look at that.

A comment in InitPostgres would be nice, explaining why we want a
read committed snapshot there.

OK.

This fixes both the assertion failure and the Windows crash, which
is good.

I wasn't sure whether it would help with the Windows crash or not.
I'm not set up to build under Windows, so I'm glad you gave that a
check.

Now that the error is thrown when the first snapshot is taken,
rather than at the SET command, I think the error message needs
some work:

postgres=# select 123;
ERROR: cannot use serializable mode in a hot standby
HINT: You can use REPEATABLE READ instead.

If the isolation level came from default_transaction_isolation, set
in postgresql.conf file, it might take the user a while to figure
that out. Perhaps something like this:

ERROR: cannot use serializable mode in a hot standby
DETAIL: default_transaction_isolation was set to 'serializable' in
the config file.
HINT: You can use "SET transaction_isolation = 'repeatable read'"
before the first query in the transaction to override it.

Did you mean?:

HINT: You can use "SET default_transaction_isolation = 'repeatable
read'" before the first query in the transaction to override it.

Otherwise, I agree and will do.

This still leaves the RecoveryInProgress() call in
check_transaction_read_only(), which isn't a problem at the moment,
but seems fragile. I think we should still add the
IsTransactionState() check in there, so that it works without
shared memory access. If we're not in a transaction yet
(TRANS_DEFAULT), setting transaction_read_only has no effect, as
it's overwritten at the beginning of a transaction. If we're in one
of the transitory states, TRANS_START or
TRANS_COMMIT/ABORT/PREPARE, I'm not sure what should happen, but it
should not be possible for user code to run and change
transaction_read_only in those states.

I'll take a look.

Since the existing behavior is so bad, I'm inclined to think this
merits backpatching to 9.1. Thoughts on that?

Yes, we have to somehow fix the crash and the assertion failure on
9.1.

Should the check_transaction_read_only() stuff be back-patched to
9.1, too? So far as we know, that's fragile, not broken, right?
Could the fix you envision there cause a behavioral change that could
break anything that users might have in place?

-Kevin

#12Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#11)
1 attachment(s)
Re: default_isolation_level='serializable' crashes on Windows

Heikki Linnakangas wrote:

we have to somehow fix the crash and the assertion failure on 9.1.

Here's a revision with some changes based on your feedback. I have
to go to my "day job" now, and I was unable to find the right place
to fix the streaming replication problem. I fear I won't be able to
get this sorted out before the wrap of back-branch releases this
evening, so if you feel it is urgent enough to need to get into
that, feel free to finish it; otherwise I'll keep at it tonight.

-Kevin

Attachments:

hotstandby-serializable-2.patchapplication/octet-stream; name=hotstandby-serializable-2.patchDownload
*** a/src/backend/commands/variable.c
--- b/src/backend/commands/variable.c
***************
*** 613,625 **** check_XactIsoLevel(char **newval, void **extra, GucSource source)
  			GUC_check_errmsg("SET TRANSACTION ISOLATION LEVEL must not be called in a subtransaction");
  			return false;
  		}
! 		/* Can't go to serializable mode while recovery is still active */
! 		if (newXactIsoLevel == XACT_SERIALIZABLE && RecoveryInProgress())
! 		{
! 			GUC_check_errmsg("cannot use serializable mode in a hot standby");
! 			GUC_check_errhint("You can use REPEATABLE READ instead.");
! 			return false;
! 		}
  	}
  
  	*extra = malloc(sizeof(int));
--- 613,624 ----
  			GUC_check_errmsg("SET TRANSACTION ISOLATION LEVEL must not be called in a subtransaction");
  			return false;
  		}
! 		/*
! 		 * NOTE: We cannot check for an attempt to set SERIALIZABLE
! 		 * transactions on a hot standby here, because when that is set as the
! 		 * default, this would get called before shared memory is even set up.
! 		 * Instead, this is checked when a serializable snapshot is requested.
! 		 */
  	}
  
  	*extra = malloc(sizeof(int));
*** a/src/backend/storage/lmgr/predicate.c
--- b/src/backend/storage/lmgr/predicate.c
***************
*** 1572,1577 **** GetSerializableTransactionSnapshot(Snapshot snapshot)
--- 1572,1596 ----
  	Assert(IsolationIsSerializable());
  
  	/*
+ 	 * Can't use serializable mode while recovery is still active, as it is,
+ 	 * for example, on a hot standby.
+ 	 */
+ 	if (RecoveryInProgress())
+ 	{
+ 		if (DefaultXactIsoLevel == XACT_SERIALIZABLE)
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 					 errmsg("cannot use serializable mode in a hot standby"),
+ 					 errhint("You can use REPEATABLE READ instead.")));
+ 		else
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 					 errmsg("cannot use serializable mode in a hot standby"),
+ 					 errdetail("default_transaction_isolation was set to 'serializable'"),
+ 					 errhint("You can use \"SET default_transaction_isolation = 'repeatable read'\" before a transaction to change it for a connection.")));
+ 	}
+ 
+ 	/*
  	 * A special optimization is available for SERIALIZABLE READ ONLY
  	 * DEFERRABLE transactions -- we can wait for a suitable snapshot and
  	 * thereby avoid all SSI overhead once it's running.
*** a/src/backend/utils/init/postinit.c
--- b/src/backend/utils/init/postinit.c
***************
*** 584,589 **** InitPostgres(const char *in_dbname, Oid dboid, const char *username,
--- 584,597 ----
  		/* statement_timestamp must be set for timeouts to work correctly */
  		SetCurrentStatementStartTimestamp();
  		StartTransactionCommand();
+ 
+ 		/*
+ 		 * transaction_isolation will have been set to the default by the
+ 		 * above.  Connection to a hot standby will have a problem here if we
+ 		 * are in 'serializable' mode, so force to something safe.
+ 		 */
+ 		SetConfigOption("transaction_isolation", "read committed",
+ 						PGC_SUSET, PGC_S_OVERRIDE);
  		(void) GetTransactionSnapshot();
  	}
  
#13Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#11)
Re: default_isolation_level='serializable' crashes on Windows

On 14.08.2012 14:25, Kevin Grittner wrote:

Heikki Linnakangas wrote:

On 14.08.2012 08:23, Kevin Grittner wrote:

OK, attached is a first cut to confirm that the approach looks
sane to everyone; I *think* it is along the lines that we reached
consensus. After moving the check to the point where we get a
serializable snapshot, it was still behaving badly. It took a bit,
but forcing the snapshot acquisition in InitPostgres to use 'read
committed' instead of the default isolation level got reasonable
behavior in my initial tests. It sure looks a lot better to *me*
than what was happening before.

Oh, further testing this morning shows that while *queries* on the HS
seem OK, streaming replication is now broken. I probably need to
override transaction isolation on the recovery process. I'll take a
look at that.

Hmm, seems to work for me. Do you get an unexpected error or what?

Now that the error is thrown when the first snapshot is taken,
rather than at the SET command, I think the error message needs
some work:

postgres=# select 123;
ERROR: cannot use serializable mode in a hot standby
HINT: You can use REPEATABLE READ instead.

If the isolation level came from default_transaction_isolation, set
in postgresql.conf file, it might take the user a while to figure
that out. Perhaps something like this:

ERROR: cannot use serializable mode in a hot standby
DETAIL: default_transaction_isolation was set to 'serializable' in
the config file.
HINT: You can use "SET transaction_isolation = 'repeatable read'"
before the first query in the transaction to override it.

Did you mean?:

HINT: You can use "SET default_transaction_isolation = 'repeatable
read'" before the first query in the transaction to override it.

Otherwise, I agree and will do.

I didn't, I meant to point out that you can set transaction_isolation
just for the one transaction. But your suggested hint is OK as well -
you suggest to set it for the whole session, which will also work around
the problem. The "before the first query in the transaction" isn't
necessary in that case though.

Yet another option is to suggest the "BEGIN ISOLATION LEVEL REPEATABLE
READ" syntax, instead of SET.

Since the existing behavior is so bad, I'm inclined to think this
merits backpatching to 9.1. Thoughts on that?

Yes, we have to somehow fix the crash and the assertion failure on
9.1.

Should the check_transaction_read_only() stuff be back-patched to
9.1, too? So far as we know, that's fragile, not broken, right?
Could the fix you envision there cause a behavioral change that could
break anything that users might have in place?

Good question. I don't see how it could cause a behavioral change, but
I've been wrong before.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#14Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#13)
Re: default_isolation_level='serializable' crashes on Windows

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

On 14.08.2012 14:25, Kevin Grittner wrote:

Heikki Linnakangas wrote:

On 14.08.2012 08:23, Kevin Grittner wrote:

Oh, further testing this morning shows that while *queries* on
the HS seem OK, streaming replication is now broken. I probably
need to override transaction isolation on the recovery process.
I'll take a look at that.

Hmm, seems to work for me. Do you get an unexpected error or what?

No, I wasn't getting errors in the clients or the logs, but I wasn't
seeing data pop up on the replica when I expected. Perhaps I messed
up my streaming replication configuration somehow. Do I understand
that you are now seeing correction of both bugs and no new
misbehaviors in your tests?

Now that the error is thrown when the first snapshot is taken,
rather than at the SET command, I think the error message needs
some work:

postgres=# select 123;
ERROR: cannot use serializable mode in a hot standby
HINT: You can use REPEATABLE READ instead.

If the isolation level came from default_transaction_isolation,
set in postgresql.conf file, it might take the user a while to
figure that out. Perhaps something like this:

ERROR: cannot use serializable mode in a hot standby
DETAIL: default_transaction_isolation was set to 'serializable'
in the config file.
HINT: You can use "SET transaction_isolation = 'repeatable
read'" before the first query in the transaction to override it.

Did you mean?:

HINT: You can use "SET default_transaction_isolation =
'repeatable read'" before the first query in the transaction to
override it.

I didn't, I meant to point out that you can set
transaction_isolation just for the one transaction. But your
suggested hint is OK as well - you suggest to set it for the whole
session, which will also work around the problem. The "before the
first query in the transaction" isn't necessary in that case
though.

Yeah, I figured that out before posting version 2 of the patch. I
should have said something.

Yet another option is to suggest the "BEGIN ISOLATION LEVEL
REPEATABLE READ" syntax, instead of SET.

I'm inclined toward hinting at a session override of the default.
If you're typing away in psql, that's a lot less work. :-) That's
what I included in version 2 of the patch, but only if the default
was serializable. I did say to set it "before a transaction"
because a quick test showed that setting the default in a
transaction didn't keep that transaction from getting the error if
you proceeded to run a SELECT statement.

Since the existing behavior is so bad, I'm inclined to think
this merits backpatching to 9.1. Thoughts on that?

Yes, we have to somehow fix the crash and the assertion failure
on 9.1.

Should the check_transaction_read_only() stuff be back-patched to
9.1, too? So far as we know, that's fragile, not broken, right?
Could the fix you envision there cause a behavioral change that
could break anything that users might have in place?

Good question. I don't see how it could cause a behavioral change,
but I've been wrong before.

If we don't know of any actual existing bugs I'm inclined to not
back-patch that part to 9.1, although it makes sense for 9.2 since
we shouldn't be risking breakage of any production systems. I'm
really cautious about giving anybody any excuse not to apply a minor
update.

-Kevin

#15Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#14)
1 attachment(s)
Re: default_isolation_level='serializable' crashes on Windows

"Kevin Grittner" wrote:
Heikki Linnakangas wrote:

On 14.08.2012 14:25, Kevin Grittner wrote:

Attached is version 3.

Oh, further testing this morning shows that while *queries* on
the HS seem OK, streaming replication is now broken. I probably
need to override transaction isolation on the recovery process.
I'll take a look at that.

Hmm, seems to work for me. Do you get an unexpected error or what?

No, I wasn't getting errors in the clients or the logs, but I
wasn't seeing data pop up on the replica when I expected. Perhaps I
messed up my streaming replication configuration somehow.

Yeah, setup error. I accidentally dropped the primary_conninfo
setting from my recovery.conf file. Now that I've put that back, it
works just fine.

I didn't, I meant to point out that you can set
transaction_isolation just for the one transaction. But your
suggested hint is OK as well - you suggest to set it for the whole
session, which will also work around the problem. The "before the
first query in the transaction" isn't necessary in that case
though.

Yet another option is to suggest the "BEGIN ISOLATION LEVEL
REPEATABLE READ" syntax, instead of SET.

I'm inclined toward hinting at a session override of the default.
If you're typing away in psql, that's a lot less work. :-)

I tinkered with the messages (including correcting a reverse-logic
bug in which was displayed under what circumstances) and tweaked a
comment. I struggled with how to capitalize and quote. Let me know
what you think.

Since the existing behavior is so bad, I'm inclined to think
this merits backpatching to 9.1. Thoughts on that?

Yes, we have to somehow fix the crash and the assertion failure
on 9.1.

Should the check_transaction_read_only() stuff be back-patched to
9.1, too? So far as we know, that's fragile, not broken, right?
Could the fix you envision there cause a behavioral change that
could break anything that users might have in place?

Good question. I don't see how it could cause a behavioral change,
but I've been wrong before.

If we don't know of any actual existing bugs I'm inclined to not
back-patch that part to 9.1, although it makes sense for 9.2 since
we shouldn't be risking breakage of any production systems. I'm
really cautious about giving anybody any excuse not to apply a
minor update.

How about we fix the serializable versus HS & Windows bugs in one
patch, and then look at the other as a separate patch? If that's OK,
I think this is ready, unless my message text can be improved. (And
I will have a shot at my first back-patching....)

-Kevin

Attachments:

hotstandby-serializable-3.patchapplication/octet-stream; name=hotstandby-serializable-3.patchDownload
*** a/src/backend/commands/variable.c
--- b/src/backend/commands/variable.c
***************
*** 613,625 **** check_XactIsoLevel(char **newval, void **extra, GucSource source)
  			GUC_check_errmsg("SET TRANSACTION ISOLATION LEVEL must not be called in a subtransaction");
  			return false;
  		}
! 		/* Can't go to serializable mode while recovery is still active */
! 		if (newXactIsoLevel == XACT_SERIALIZABLE && RecoveryInProgress())
! 		{
! 			GUC_check_errmsg("cannot use serializable mode in a hot standby");
! 			GUC_check_errhint("You can use REPEATABLE READ instead.");
! 			return false;
! 		}
  	}
  
  	*extra = malloc(sizeof(int));
--- 613,624 ----
  			GUC_check_errmsg("SET TRANSACTION ISOLATION LEVEL must not be called in a subtransaction");
  			return false;
  		}
! 		/*
! 		 * NOTE: We cannot check for an attempt to set SERIALIZABLE
! 		 * transactions on a hot standby here, because when that is set as the
! 		 * default, this would get called before shared memory is even set up.
! 		 * Instead, this is checked when a serializable snapshot is requested.
! 		 */
  	}
  
  	*extra = malloc(sizeof(int));
*** a/src/backend/storage/lmgr/predicate.c
--- b/src/backend/storage/lmgr/predicate.c
***************
*** 1572,1577 **** GetSerializableTransactionSnapshot(Snapshot snapshot)
--- 1572,1596 ----
  	Assert(IsolationIsSerializable());
  
  	/*
+ 	 * Can't use serializable mode while recovery is still active, as it is,
+ 	 * for example, on a hot standby.
+ 	 */
+ 	if (RecoveryInProgress())
+ 	{
+ 		if (DefaultXactIsoLevel == XACT_SERIALIZABLE)
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 					 errmsg("cannot use SERIALIZABLE mode in a hot standby"),
+ 					 errdetail("\"default_transaction_isolation\" is set to \"serializable\"."),
+ 					 errhint("You can use \"SET default_transaction_isolation = 'repeatable read'\" to change the default.")));
+ 		else
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 					 errmsg("cannot use SERIALIZABLE mode in a hot standby"),
+ 					 errhint("You can use REPEATABLE READ instead.")));
+ 	}
+ 
+ 	/*
  	 * A special optimization is available for SERIALIZABLE READ ONLY
  	 * DEFERRABLE transactions -- we can wait for a suitable snapshot and
  	 * thereby avoid all SSI overhead once it's running.
*** a/src/backend/utils/init/postinit.c
--- b/src/backend/utils/init/postinit.c
***************
*** 584,589 **** InitPostgres(const char *in_dbname, Oid dboid, const char *username,
--- 584,597 ----
  		/* statement_timestamp must be set for timeouts to work correctly */
  		SetCurrentStatementStartTimestamp();
  		StartTransactionCommand();
+ 
+ 		/*
+ 		 * transaction_isolation will have been set to the default by the
+ 		 * above.  Connection to a hot standby will have a problem here if we
+ 		 * are in 'serializable' mode, so force isolation to something safe.
+ 		 */
+ 		SetConfigOption("transaction_isolation", "read committed",
+ 						PGC_SUSET, PGC_S_OVERRIDE);
  		(void) GetTransactionSnapshot();
  	}
  
#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#15)
Re: default_isolation_level='serializable' crashes on Windows

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

How about we fix the serializable versus HS & Windows bugs in one
patch, and then look at the other as a separate patch? If that's OK,
I think this is ready, unless my message text can be improved. (And
I will have a shot at my first back-patching....)

I poked around this area a bit. I notice that
check_transaction_read_only has got the same fundamental error: it
thinks it can safely consult RecoveryInProgress(), which in general
it cannot.

The particular cases we have discussed so far cannot lead to a crash
there, because in startup scenarios XactReadOnly wouldn't be on already.
However I think that a background process not connected to shared memory
could be crashed if somebody changed transaction_read_only from true to
false in postgresql.conf. That's sufficiently far-fetched that maybe we
shouldn't worry about it, but still it seems ugly.

On reflection I think maybe the best solution is for
check_transaction_read_only to test IsTransactionState(), and just
allow the change always if outside a transaction. That would make its
IsSubTransaction test a bit saner/safer as well. We could do the same
thing in check_XactIsoLevel. We still need a test in
GetSerializableTransactionSnapshot, or someplace else in the vicinity
of that, to cover the default_transaction_isolation = serializable case;
but if we leave the error check in place in check_XactIsoLevel, you'll
get an immediate rather than delayed error from trying to crank the
level up manually in hot standby, which seems more user-friendly.

Will send an updated patch as soon as I'm done testing this idea.

One other point: I notice that Kevin used ERRCODE_FEATURE_NOT_SUPPORTED
in the error messages he added, which seems sane in isolation.
However, the GUC-based code is (by default) throwing
ERRCODE_INVALID_PARAMETER_VALUE when it rejects due to
RecoveryInProgress. I'm not totally sure whether that was thought about
or just fell out of not thinking. Which one do we want here?

regards, tom lane

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#16)
Re: default_isolation_level='serializable' crashes on Windows

I wrote:

I poked around this area a bit. I notice that
check_transaction_read_only has got the same fundamental error: it
thinks it can safely consult RecoveryInProgress(), which in general
it cannot.

After rereading the whole thread I saw that Heikki had already pointed
this out, and come to the same conclusion about how to fix it:

On reflection I think maybe the best solution is for
check_transaction_read_only to test IsTransactionState(), and just
allow the change always if outside a transaction.

Attached is a version of the patch that does it like that. I've checked
that this fixes the problem (as well as Robert's earlier report) in an
EXEC_BACKEND build, which is as close as I can get to the Windows
environment.

I tweaked Kevin's error message to keep the same capitalization as the
existing text for the message in check_XactIsoLevel --- if we change
that it will cause work for the translators, and I don't think it's
enough of an improvement to justify that.

I also back-propagated the use of ERRCODE_FEATURE_NOT_SUPPORTED into
the GUC check hooks. On reflection this seems more appropriate than
ERRCODE_INVALID_PARAMETER_VALUE.

Lastly, I simplified the added code in InitPostgres down to just a
bare assignment to XactIsoLevel. It doesn't seem worthwhile to add
all the cycles involved in SetConfigOption(), when we have no desire
to change the GUC permanently. (I think Kevin's code was wrong anyway
in that it was using PGC_S_OVERRIDE, which would impact the reset
state for the GUC.)

I think this is ready to go. Kevin, do you want to apply it? You
had mentioned wanting some practice with back-patches.

regards, tom lane

#18Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#17)
Re: default_isolation_level='serializable' crashes on Windows

Tom Lane <tgl@sss.pgh.pa.us> wrote:

I tweaked Kevin's error message to keep the same capitalization as
the existing text for the message in check_XactIsoLevel --- if we
change that it will cause work for the translators, and I don't
think it's enough of an improvement to justify that.

That's one of the reasons I agonized over it -- I think the way I
left it is a little better and more consistent with other messages,
but didn't know whether the difference was worth translator effort.
I'm happy to trust your judgment on that.

Lastly, I simplified the added code in InitPostgres down to just a
bare assignment to XactIsoLevel. It doesn't seem worthwhile to
add all the cycles involved in SetConfigOption(), when we have no
desire to change the GUC permanently. (I think Kevin's code was
wrong anyway in that it was using PGC_S_OVERRIDE, which would
impact the reset state for the GUC.)

Point taken on PGC_S_OVERRIDE. And that probably fixes the issue
that caused me to hold up when I was about ready to pull the trigger
this past weekend. A final round of testing showed a "SET" line on
psql start, which is clearly wrong. I suspected that I needed to go
to a lower level in setting that, but hadn't had a chance to sort
out just what the right path was. In retrospect, just directly
assigning the value seems pretty obvious.

I think this is ready to go.

With your changes, I agree.

Kevin, do you want to apply it? You had mentioned wanting some
practice with back-patches.

I'm getting on a plane to Istanbul in less than 48 hours for the
VLDB conference, and scrambling to tie up loose ends. I don't want
to be under that kind of time-pressure when I back-patch for the
first time, for fear of making a mess of things and not being around
to clean up the mess; so my first back-patch is probably best left
for another time.

I'll run through my tests again tonight, against your patch, not
that I expect any problems with it. Unfortunately I can't test
Windows, as I don't have a build environment for that.

Thanks for going over this.

-Kevin

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#18)
Re: default_isolation_level='serializable' crashes on Windows

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

I'll run through my tests again tonight, against your patch, not
that I expect any problems with it. Unfortunately I can't test
Windows, as I don't have a build environment for that.

FWIW, you can approximate Windows close enough for this type of problem
by building with EXEC_BACKEND defined. I usually add the #define to
pg_config.h after configuring, though of course there's more than one
way to do it.

regards, tom lane

#20Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#19)
Re: default_isolation_level='serializable' crashes on Windows

Tom Lane wrote:
"Kevin Grittner" writes:

I'll run through my tests again tonight, against your patch, not
that I expect any problems with it. Unfortunately I can't test
Windows, as I don't have a build environment for that.

FWIW, you can approximate Windows close enough for this type of
problem by building with EXEC_BACKEND defined. I usually add the
#define to pg_config.h after configuring, though of course there's
more than one way to do it.

It tested out fine both ways.

For anybody else wanting to use EXEC_BACKEND for testing -- don't
count on --enable-depend to rebuild everything correctly. I used
maintainer-clean and it worked.

-Kevin

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#18)
Re: default_isolation_level='serializable' crashes on Windows

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Lastly, I simplified the added code in InitPostgres down to just a
bare assignment to XactIsoLevel. It doesn't seem worthwhile to
add all the cycles involved in SetConfigOption(), when we have no
desire to change the GUC permanently. (I think Kevin's code was
wrong anyway in that it was using PGC_S_OVERRIDE, which would
impact the reset state for the GUC.)

Point taken on PGC_S_OVERRIDE. And that probably fixes the issue
that caused me to hold up when I was about ready to pull the trigger
this past weekend. A final round of testing showed a "SET" line on
psql start, which is clearly wrong. I suspected that I needed to go
to a lower level in setting that, but hadn't had a chance to sort
out just what the right path was. In retrospect, just directly
assigning the value seems pretty obvious.

Hm, I'm not sure why using SetConfigOption() would result in anything
happening client-side. (If this GUC were GUC_REPORT, that would result
in a parameter-change report to the client; but it isn't, and anyway
that shouldn't cause psql to print "SET".) Might be worth looking
closer at what was happening there.

Kevin, do you want to apply it? You had mentioned wanting some
practice with back-patches.

I'm getting on a plane to Istanbul in less than 48 hours for the
VLDB conference, and scrambling to tie up loose ends.

OK, I've committed it.

regards, tom lane