Assertion failure in base backup code path

Started by Antonin Houskaabout 12 years ago8 messages
#1Antonin Houska
antonin.houska@gmail.com

In HEAD, pg_basebackup causes WAL sender to fail when the replication
user is not a superuser:

#0 0x00007f34f671dd25 in raise () from /lib64/libc.so.6
#1 0x00007f34f671f1a8 in abort () from /lib64/libc.so.6
#2 0x00000000008989a9 in ExceptionalCondition (conditionName=0xa51ac1
"!(IsTransactionState())", errorType=0xa51734 "FailedAssertion",
fileName=0xa516e0 "catcache.c", lineNumber=1111) at assert.c:54
#3 0x000000000087ea36 in SearchCatCache (cache=0x23c4fb8, v1=16384,
v2=0, v3=0, v4=0) at catcache.c:1111
#4 0x0000000000890cdd in SearchSysCache (cacheId=11, key1=16384,
key2=0, key3=0, key4=0) at syscache.c:909
#5 0x00000000008a9a99 in has_rolreplication (roleid=16384) at
miscinit.c:401
#6 0x00000000005146d1 in do_pg_start_backup (backupidstr=0x239d5b0
"bkp_01", fast=0 '\000', starttli_p=0x7fff78e4f8ec,
labelfile=0x7fff78e4f8e0) at xlog.c:9633
#7 0x0000000000733a24 in perform_base_backup (opt=0x7fff78e4fa30,
tblspcdir=0x242c6a0) at basebackup.c:106
#8 0x0000000000735013 in SendBaseBackup (cmd=0x239dbf8) at basebackup.c:563
#9 0x000000000072f4f2 in exec_replication_command (cmd_string=0x23ea078
"BASE_BACKUP LABEL 'bkp_01' WAL NOWAIT") at walsender.c:668
#10 0x000000000077c5c4 in PostgresMain (argc=1, argv=0x2385358,
dbname=0x2385248 "", username=0x2385210 "postgres_replication") at
postgres.c:4009
#11 0x0000000000717c94 in BackendRun (port=0x23a2e90) at postmaster.c:4085
#12 0x000000000071742e in BackendStartup (port=0x23a2e90) at
postmaster.c:3774
#13 0x0000000000713cc9 in ServerLoop () at postmaster.c:1585
#14 0x0000000000713370 in PostmasterMain (argc=3, argv=0x2381f60) at
postmaster.c:1240
#15 0x0000000000677698 in main (argc=3, argv=0x2381f60) at main.c:196

Some additional condition may be needed in the Assert() statement?

// Antonin Houska (Tony)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Andres Freund
andres@2ndquadrant.com
In reply to: Antonin Houska (#1)
Re: Assertion failure in base backup code path

Hi,

On 2013-12-16 15:08:51 +0100, Antonin Houska wrote:

In HEAD, pg_basebackup causes WAL sender to fail when the replication
user is not a superuser:

#0 0x00007f34f671dd25 in raise () from /lib64/libc.so.6
#1 0x00007f34f671f1a8 in abort () from /lib64/libc.so.6
#2 0x00000000008989a9 in ExceptionalCondition (conditionName=0xa51ac1
"!(IsTransactionState())", errorType=0xa51734 "FailedAssertion",
fileName=0xa516e0 "catcache.c", lineNumber=1111) at assert.c:54
#3 0x000000000087ea36 in SearchCatCache (cache=0x23c4fb8, v1=16384,
v2=0, v3=0, v4=0) at catcache.c:1111
#4 0x0000000000890cdd in SearchSysCache (cacheId=11, key1=16384,
key2=0, key3=0, key4=0) at syscache.c:909
#5 0x00000000008a9a99 in has_rolreplication (roleid=16384) at
miscinit.c:401
#6 0x00000000005146d1 in do_pg_start_backup (backupidstr=0x239d5b0
"bkp_01", fast=0 '\000', starttli_p=0x7fff78e4f8ec,
labelfile=0x7fff78e4f8e0) at xlog.c:9633
#7 0x0000000000733a24 in perform_base_backup (opt=0x7fff78e4fa30,
tblspcdir=0x242c6a0) at basebackup.c:106
#8 0x0000000000735013 in SendBaseBackup (cmd=0x239dbf8) at basebackup.c:563
#9 0x000000000072f4f2 in exec_replication_command (cmd_string=0x23ea078
"BASE_BACKUP LABEL 'bkp_01' WAL NOWAIT") at walsender.c:668
#10 0x000000000077c5c4 in PostgresMain (argc=1, argv=0x2385358,
dbname=0x2385248 "", username=0x2385210 "postgres_replication") at
postgres.c:4009
#11 0x0000000000717c94 in BackendRun (port=0x23a2e90) at postmaster.c:4085
#12 0x000000000071742e in BackendStartup (port=0x23a2e90) at
postmaster.c:3774
#13 0x0000000000713cc9 in ServerLoop () at postmaster.c:1585
#14 0x0000000000713370 in PostmasterMain (argc=3, argv=0x2381f60) at
postmaster.c:1240
#15 0x0000000000677698 in main (argc=3, argv=0x2381f60) at main.c:196

Some additional condition may be needed in the Assert() statement?

Actually it more looks like a bug around the basebackup facility. It
shouldn't do syscache lookups without having the resource management
stuff around it (the transaction state asserted in SearchCatCache()).

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Andres Freund
andres@2ndquadrant.com
In reply to: Andres Freund (#2)
Re: Assertion failure in base backup code path

Hi Magnus,

It looks to me like the path to do_pg_start_backup() outside of a
transaction context comes from your initial commit of the base backup
facility.
The problem is that you're not allowed to do anything leading to a
syscache/catcache lookup in those contexts.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Magnus Hagander
magnus@hagander.net
In reply to: Andres Freund (#3)
Re: Assertion failure in base backup code path

On Dec 19, 2013 12:06 AM, "Andres Freund" <andres@2ndquadrant.com> wrote:

Hi Magnus,

It looks to me like the path to do_pg_start_backup() outside of a
transaction context comes from your initial commit of the base backup
facility.
The problem is that you're not allowed to do anything leading to a
syscache/catcache lookup in those contexts.

I think that may have come with the addition of the replication privilege
actually but that doesn't change the fact that yes, it appears broken..

/Magnus

#5Andres Freund
andres@2ndquadrant.com
In reply to: Magnus Hagander (#4)
Re: Assertion failure in base backup code path

On 2013-12-23 18:28:51 +0100, Magnus Hagander wrote:

On Dec 19, 2013 12:06 AM, "Andres Freund" <andres@2ndquadrant.com> wrote:

Hi Magnus,

It looks to me like the path to do_pg_start_backup() outside of a
transaction context comes from your initial commit of the base backup
facility.
The problem is that you're not allowed to do anything leading to a
syscache/catcache lookup in those contexts.

I think that may have come with the addition of the replication privilege
actually but that doesn't change the fact that yes, it appears broken..

There was a if (!superuser()) check there before as well, that has the
same dangers.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Magnus Hagander
magnus@hagander.net
In reply to: Andres Freund (#5)
1 attachment(s)
Re: Assertion failure in base backup code path

On Tue, Dec 24, 2013 at 1:24 PM, Andres Freund <andres@2ndquadrant.com>wrote:

On 2013-12-23 18:28:51 +0100, Magnus Hagander wrote:

On Dec 19, 2013 12:06 AM, "Andres Freund" <andres@2ndquadrant.com>

wrote:

Hi Magnus,

It looks to me like the path to do_pg_start_backup() outside of a
transaction context comes from your initial commit of the base backup
facility.
The problem is that you're not allowed to do anything leading to a
syscache/catcache lookup in those contexts.

I think that may have come with the addition of the replication privilege
actually but that doesn't change the fact that yes, it appears broken..

There was a if (!superuser()) check there before as well, that has the
same dangers.

I think the correct fix is to move the security check outside of
do_pg_start_backup() and leave it to the caller. That means
pg_start_backup() for a manual backup. And for a streaming base backup the
check has already been made - you can't get through the authentication step
if you don't have the required permissions.

Does the attached seem right to you?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Attachments:

basebackup_permissions.patchtext/x-patch; charset=US-ASCII; name=basebackup_permissions.patchDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 43a0416..bf1174e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9727,6 +9727,9 @@ XLogFileNameP(TimeLineID tli, XLogSegNo segno)
  *
  * Every successfully started non-exclusive backup must be stopped by calling
  * do_pg_stop_backup() or do_pg_abort_backup().
+ *
+ * It is the responsibility of the caller of this function to verify the
+ * permissions of the calling user!
  */
 XLogRecPtr
 do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
@@ -9747,11 +9750,6 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
 
 	backup_started_in_recovery = RecoveryInProgress();
 
-	if (!superuser() && !has_rolreplication(GetUserId()))
-		ereport(ERROR,
-				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-		   errmsg("must be superuser or replication role to run a backup")));
-
 	/*
 	 * Currently only non-exclusive backup can be taken during recovery.
 	 */
@@ -10053,6 +10051,9 @@ pg_start_backup_callback(int code, Datum arg)
  *
  * Returns the last WAL position that must be present to restore from this
  * backup, and the corresponding timeline ID in *stoptli_p.
+ *
+ * It is the responsibility of the caller of this function to verify the
+ * permissions of the calling user!
  */
 XLogRecPtr
 do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
@@ -10085,11 +10086,6 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
 
 	backup_started_in_recovery = RecoveryInProgress();
 
-	if (!superuser() && !has_rolreplication(GetUserId()))
-		ereport(ERROR,
-				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-		 (errmsg("must be superuser or replication role to run a backup"))));
-
 	/*
 	 * Currently only non-exclusive backup can be taken during recovery.
 	 */
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index c421a59..4be31b4 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -56,6 +56,11 @@ pg_start_backup(PG_FUNCTION_ARGS)
 
 	backupidstr = text_to_cstring(backupid);
 
+	if (!superuser() && !has_rolreplication(GetUserId()))
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+		   errmsg("must be superuser or replication role to run a backup")));
+
 	startpoint = do_pg_start_backup(backupidstr, fast, NULL, NULL);
 
 	snprintf(startxlogstr, sizeof(startxlogstr), "%X/%X",
@@ -82,6 +87,11 @@ pg_stop_backup(PG_FUNCTION_ARGS)
 	XLogRecPtr	stoppoint;
 	char		stopxlogstr[MAXFNAMELEN];
 
+	if (!superuser() && !has_rolreplication(GetUserId()))
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+		 (errmsg("must be superuser or replication role to run a backup"))));
+
 	stoppoint = do_pg_stop_backup(NULL, true, NULL);
 
 	snprintf(stopxlogstr, sizeof(stopxlogstr), "%X/%X",
#7Andres Freund
andres@2ndquadrant.com
In reply to: Magnus Hagander (#6)
Re: Assertion failure in base backup code path

On 2014-01-07 17:40:07 +0100, Magnus Hagander wrote:

On Tue, Dec 24, 2013 at 1:24 PM, Andres Freund <andres@2ndquadrant.com>wrote:

On 2013-12-23 18:28:51 +0100, Magnus Hagander wrote:

On Dec 19, 2013 12:06 AM, "Andres Freund" <andres@2ndquadrant.com>

wrote:

Hi Magnus,

It looks to me like the path to do_pg_start_backup() outside of a
transaction context comes from your initial commit of the base backup
facility.
The problem is that you're not allowed to do anything leading to a
syscache/catcache lookup in those contexts.

I think that may have come with the addition of the replication privilege
actually but that doesn't change the fact that yes, it appears broken..

There was a if (!superuser()) check there before as well, that has the
same dangers.

I think the correct fix is to move the security check outside of
do_pg_start_backup() and leave it to the caller. That means
pg_start_backup() for a manual backup. And for a streaming base backup the
check has already been made - you can't get through the authentication step
if you don't have the required permissions.

Yes, that's what I was thinking and was planning to write you about at
some point.

Does the attached seem right to you?

I haven't run the code, but it looks right to me.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Magnus Hagander
magnus@hagander.net
In reply to: Andres Freund (#7)
Re: Assertion failure in base backup code path

On Tue, Jan 7, 2014 at 5:43 PM, Andres Freund <andres@2ndquadrant.com>wrote:

On 2014-01-07 17:40:07 +0100, Magnus Hagander wrote:

On Tue, Dec 24, 2013 at 1:24 PM, Andres Freund <andres@2ndquadrant.com
wrote:

On 2013-12-23 18:28:51 +0100, Magnus Hagander wrote:

On Dec 19, 2013 12:06 AM, "Andres Freund" <andres@2ndquadrant.com>

wrote:

Hi Magnus,

It looks to me like the path to do_pg_start_backup() outside of a
transaction context comes from your initial commit of the base

backup

facility.
The problem is that you're not allowed to do anything leading to a
syscache/catcache lookup in those contexts.

I think that may have come with the addition of the replication

privilege

actually but that doesn't change the fact that yes, it appears

broken..

There was a if (!superuser()) check there before as well, that has the
same dangers.

I think the correct fix is to move the security check outside of
do_pg_start_backup() and leave it to the caller. That means
pg_start_backup() for a manual backup. And for a streaming base backup

the

check has already been made - you can't get through the authentication

step

if you don't have the required permissions.

Yes, that's what I was thinking and was planning to write you about at
some point.

Good, then we think the same :)

Does the attached seem right to you?

I haven't run the code, but it looks right to me.

It worked fine in my testing, so I've pushed that version.

Looks slightly different in both 9.2 and 9.1 (not clean backpatching) due
to code reorganization and such, but AFAICT it's just code in different
places.

Thanks!

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/