Detecting File Damage & Inconsistencies

Started by Simon Riggsabout 5 years ago22 messages
#1Simon Riggs
simon@2ndquadrant.com

I would like to propose a few points that will help us detect file
damage, inconsistencies in files and track actions of users.

Optionally, we could add these fields onto the WAL commit record:
* 8-byte session start time (from MyStartTime)
* 2-byte pid (from MyProcPid)
* 4-byte user oid
Identified by a name flag bit, XACT_XINFO_HAS_USERINFO.
That allows us to match up transactions with the server log %c option.
Another option might allow text username to be added to each commit as well.

XLogLongPageHeaderData has 4 bytes of unused data because of
alignment. We could use that space for 1) the Xid Epoch value or 2) a
CRC value - since only WAL records are covered by a CRC, not page or
file headers. Perhaps we should add both?

There are also 2 bytes unused in the XLogRecord header (with 8 byte
alignment). We could optionally use that space for the pid that wrote
the record, but that's not compelling. What can we use those 2 bytes
for?

REINDEX VERIFY
After the new index is created, but before we drop the old index:
Check whether the two indexes match:
* checks whether the previous index had pointers to row versions that
don't exist
* checks whether the heap has rows that were not in the old index
This approach piggybacks on existing operations. AccessShareLock is
held on both indexes before the old one is dropped.

Other ideas are welcome.

Thoughts?

--
Simon Riggs http://www.EnterpriseDB.com/

#2Jose Luis Tallon
jltallon@adv-solutions.net
In reply to: Simon Riggs (#1)
Re: Detecting File Damage & Inconsistencies

On 11/11/20 21:56, Simon Riggs wrote:

[ŝnip]

REINDEX VERIFY
After the new index is created, but before we drop the old index:
Check whether the two indexes match:
* checks whether the previous index had pointers to row versions that
don't exist
* checks whether the heap has rows that were not in the old index
This approach piggybacks on existing operations. AccessShareLock is
held on both indexes before the old one is dropped.

FWIW, as long as it's optional (due to the added runtime), it'd be a
welcome feature.

Maybe something along the lines of:

    REINDEX (verify yes) ....

Other ideas are welcome.

I still have nightmares from an specific customer case w/ shared storage
(using VxFS) among two postmaster instances ---supposedly could never be
active concurrently, not completely sure that it didn't actually
happen--- and the corruption that we found there. I seem to remember
that they even had scripts to remove the locking when switching over and
back :S

I don't think Postgres can do much about this, but maybe someone can
come up with a suitable countermeasure.

Just my .02€

Thanks,

    / J.L.

#3tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Simon Riggs (#1)
RE: Detecting File Damage & Inconsistencies

From: Simon Riggs <simon@2ndquadrant.com>

I would like to propose a few points that will help us detect file
damage, inconsistencies in files and track actions of users.

Hello, Simon san. Long time no see. I'm happy to see you be back here recently.

What kind of improvement do you expect? What problems would this make detectable?

* 2-byte pid (from MyProcPid)

pid is 4 bytes on Windows. Isn't it also 4 byte on Linux when some kernel parameter is set to a certain value?

Regards
Takayuki Tsunakawa

#4Simon Riggs
simon@2ndquadrant.com
In reply to: tsunakawa.takay@fujitsu.com (#3)
Re: Detecting File Damage & Inconsistencies

On Thu, 12 Nov 2020 at 06:42, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Simon Riggs <simon@2ndquadrant.com>

I would like to propose a few points that will help us detect file
damage, inconsistencies in files and track actions of users.

Hello, Simon san. Long time no see. I'm happy to see you be back here recently.

Thank you, happy to be back. It's good to have the time to contribute again.

What kind of improvement do you expect? What problems would this make detectable?

If a rogue user/process is suspected, this would allow you to identify
more easily the changes made by specific sessions/users.

* 2-byte pid (from MyProcPid)

pid is 4 bytes on Windows. Isn't it also 4 byte on Linux when some kernel parameter is set to a certain value?

4 bytes is no problem, thanks for pointing that out.

--
Simon Riggs http://www.EnterpriseDB.com/

#5tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Simon Riggs (#4)
RE: Detecting File Damage & Inconsistencies

From: Simon Riggs <simon@2ndquadrant.com>

If a rogue user/process is suspected, this would allow you to identify
more easily the changes made by specific sessions/users.

Isn't that kind of auditing a job of pgAudit or log_statement = mod? Or, does "more easily" mean that you find pgAudit complex to use and/or log_statement's overhead is big?

Regards
Takayuki Tsunakawa

#6Simon Riggs
simon@2ndquadrant.com
In reply to: tsunakawa.takay@fujitsu.com (#5)
Re: Detecting File Damage & Inconsistencies

On Fri, 13 Nov 2020 at 00:50, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Simon Riggs <simon@2ndquadrant.com>

If a rogue user/process is suspected, this would allow you to identify
more easily the changes made by specific sessions/users.

Isn't that kind of auditing a job of pgAudit or log_statement = mod? Or, does "more easily" mean that you find pgAudit complex to use and/or log_statement's overhead is big?

Well, I designed pgaudit, so yes, I think pgaudit is useful.

However, pgaudit works at the statement level, not the data level. So
using pgaudit to locate data rows that have changed is fairly hard.

What I'm proposing is an option to add 16 bytes onto each COMMIT
record, which is considerably less than turning on full auditing in
pgaudit. This option would allow identifying data at the row level, so
you could for example find all rows changed by specific sessions.
Also, because it is stored in WAL it will show updates that might no
longer exist in the database because the changed row versions might
have been vacuumed away. So pgaudit will tell you that happened, but
having extra info in WAL is important also.

So thank you for the question because it has allowed me to explain why
it is useful and important.

--
Simon Riggs http://www.EnterpriseDB.com/

#7Simon Riggs
simon@2ndquadrant.com
In reply to: Simon Riggs (#6)
1 attachment(s)
Re: Detecting File Damage & Inconsistencies

On Fri, 13 Nov 2020 at 11:24, Simon Riggs <simon@2ndquadrant.com> wrote:

On Fri, 13 Nov 2020 at 00:50, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Simon Riggs <simon@2ndquadrant.com>

If a rogue user/process is suspected, this would allow you to identify
more easily the changes made by specific sessions/users.

Isn't that kind of auditing a job of pgAudit or log_statement = mod? Or, does "more easily" mean that you find pgAudit complex to use and/or log_statement's overhead is big?

Well, I designed pgaudit, so yes, I think pgaudit is useful.

However, pgaudit works at the statement level, not the data level. So
using pgaudit to locate data rows that have changed is fairly hard.

What I'm proposing is an option to add 16 bytes onto each COMMIT
record, which is considerably less than turning on full auditing in
pgaudit. This option would allow identifying data at the row level, so
you could for example find all rows changed by specific sessions.
Also, because it is stored in WAL it will show updates that might no
longer exist in the database because the changed row versions might
have been vacuumed away. So pgaudit will tell you that happened, but
having extra info in WAL is important also.

So thank you for the question because it has allowed me to explain why
it is useful and important.

Patch attached to implement "wal_sessioninfo" option.

--
Simon Riggs http://www.EnterpriseDB.com/

Attachments:

wal_sessioninfo.v2.patchapplication/octet-stream; name=wal_sessioninfo.v2.patchDownload
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a632cf98ba..ab4a6e6c7b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3008,6 +3008,25 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-wal-sessioninfo" xreflabel="wal_sessioninfo">
+      <term><varname>wal_sessioninfo</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>wal_sessioninfo</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        When this parameter is <literal>on</literal>, the <productname>PostgreSQL</productname>
+        server will add information about the user's session onto every
+        commit or abort record. This is intended to provide additional information
+        to trace the changes made by specific sessions, allowing more in-depth
+        investigation during security audits or impact assessment of bugs.
+        The default value is <literal>off</literal>.
+        Only superusers can change this setting.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-wal-init-zero" xreflabel="wal_init_zero">
       <term><varname>wal_init_zero</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c
index addd95faec..77af6911e9 100644
--- a/src/backend/access/rmgrdesc/xactdesc.c
+++ b/src/backend/access/rmgrdesc/xactdesc.c
@@ -123,6 +123,20 @@ ParseCommitRecord(uint8 info, xl_xact_commit *xlrec, xl_xact_parsed_commit *pars
 
 		data += sizeof(xl_xact_origin);
 	}
+
+	if (parsed->xinfo & XACT_XINFO_HAS_SESSIONINFO)
+	{
+		xl_xact_sessioninfo xl_sessioninfo;
+
+		/* no alignment is guaranteed, so copy onto stack */
+		memcpy(&xl_sessioninfo, data, sizeof(xl_sessioninfo));
+
+		parsed->session_start_time = xl_sessioninfo.session_start_time;
+		parsed->session_pid = xl_sessioninfo.session_pid;
+		parsed->userid = xl_sessioninfo.userid;
+
+		data += sizeof(xl_xact_sessioninfo);
+	}
 }
 
 void
@@ -207,6 +221,20 @@ ParseAbortRecord(uint8 info, xl_xact_abort *xlrec, xl_xact_parsed_abort *parsed)
 
 		data += sizeof(xl_xact_origin);
 	}
+
+	if (parsed->xinfo & XACT_XINFO_HAS_SESSIONINFO)
+	{
+		xl_xact_sessioninfo xl_sessioninfo;
+
+		/* no alignment is guaranteed, so copy onto stack */
+		memcpy(&xl_sessioninfo, data, sizeof(xl_sessioninfo));
+
+		parsed->session_start_time = xl_sessioninfo.session_start_time;
+		parsed->session_pid = xl_sessioninfo.session_pid;
+		parsed->userid = xl_sessioninfo.userid;
+
+		data += sizeof(xl_xact_sessioninfo);
+	}
 }
 
 /*
@@ -310,6 +338,14 @@ xact_desc_commit(StringInfo buf, uint8 info, xl_xact_commit *xlrec, RepOriginId
 						 (uint32) parsed.origin_lsn,
 						 timestamptz_to_str(parsed.origin_timestamp));
 	}
+
+	if (parsed.xinfo & XACT_XINFO_HAS_SESSIONINFO)
+	{
+		appendStringInfo(buf, "; session: pid %u, start at %s userid %u",
+						 parsed.session_pid,
+						 timestamptz_to_str(parsed.session_start_time),
+						 parsed.userid);
+	}
 }
 
 static void
@@ -327,6 +363,14 @@ xact_desc_abort(StringInfo buf, uint8 info, xl_xact_abort *xlrec)
 
 	xact_desc_relations(buf, "rels", parsed.nrels, parsed.xnodes);
 	xact_desc_subxacts(buf, parsed.nsubxacts, parsed.subxacts);
+
+	if (parsed.xinfo & XACT_XINFO_HAS_SESSIONINFO)
+	{
+		appendStringInfo(buf, "; session: pid %u, start at %s userid %u",
+						 parsed.session_pid,
+						 timestamptz_to_str(parsed.session_start_time),
+						 parsed.userid);
+	}
 }
 
 static void
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 03c553e7ea..83022e28f8 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -82,6 +82,8 @@ bool		XactDeferrable;
 
 int			synchronous_commit = SYNCHRONOUS_COMMIT_ON;
 
+bool		wal_sessioninfo = false;
+
 /*
  * CheckXidAlive is a xid value pointing to a possibly ongoing (sub)
  * transaction.  Currently, it is used in logical decoding.  It's possible
@@ -5515,6 +5517,7 @@ XactLogCommitRecord(TimestampTz commit_time,
 	xl_xact_invals xl_invals;
 	xl_xact_twophase xl_twophase;
 	xl_xact_origin xl_origin;
+	xl_xact_sessioninfo xl_sessioninfo;
 	uint8		info;
 
 	Assert(CritSectionCount > 0);
@@ -5594,6 +5597,15 @@ XactLogCommitRecord(TimestampTz commit_time,
 		xl_origin.origin_timestamp = replorigin_session_origin_timestamp;
 	}
 
+	if (wal_sessioninfo)
+	{
+		xl_xinfo.xinfo |= XACT_XINFO_HAS_SESSIONINFO;
+
+		xl_sessioninfo.session_start_time = MyStartTimestamp;
+		xl_sessioninfo.session_pid = MyProcPid;
+		xl_sessioninfo.userid = GetUserId();
+	}
+
 	if (xl_xinfo.xinfo != 0)
 		info |= XLOG_XACT_HAS_INFO;
 
@@ -5642,6 +5654,9 @@ XactLogCommitRecord(TimestampTz commit_time,
 	if (xl_xinfo.xinfo & XACT_XINFO_HAS_ORIGIN)
 		XLogRegisterData((char *) (&xl_origin), sizeof(xl_xact_origin));
 
+	if (xl_xinfo.xinfo & XACT_XINFO_HAS_SESSIONINFO)
+		XLogRegisterData((char *) (&xl_sessioninfo), sizeof(xl_xact_sessioninfo));
+
 	/* we allow filtering by xacts */
 	XLogSetRecordFlags(XLOG_INCLUDE_ORIGIN);
 
@@ -5668,6 +5683,7 @@ XactLogAbortRecord(TimestampTz abort_time,
 	xl_xact_twophase xl_twophase;
 	xl_xact_dbinfo xl_dbinfo;
 	xl_xact_origin xl_origin;
+	xl_xact_sessioninfo xl_sessioninfo;
 
 	uint8		info;
 
@@ -5730,6 +5746,15 @@ XactLogAbortRecord(TimestampTz abort_time,
 		xl_origin.origin_timestamp = replorigin_session_origin_timestamp;
 	}
 
+	if (wal_sessioninfo)
+	{
+		xl_xinfo.xinfo |= XACT_XINFO_HAS_SESSIONINFO;
+
+		xl_sessioninfo.session_start_time = MyStartTimestamp;
+		xl_sessioninfo.session_pid = MyProcPid;
+		xl_sessioninfo.userid = GetUserId();
+	}
+
 	if (xl_xinfo.xinfo != 0)
 		info |= XLOG_XACT_HAS_INFO;
 
@@ -5771,6 +5796,9 @@ XactLogAbortRecord(TimestampTz abort_time,
 	if (xl_xinfo.xinfo & XACT_XINFO_HAS_ORIGIN)
 		XLogRegisterData((char *) (&xl_origin), sizeof(xl_xact_origin));
 
+	if (xl_xinfo.xinfo & XACT_XINFO_HAS_SESSIONINFO)
+		XLogRegisterData((char *) (&xl_sessioninfo), sizeof(xl_xact_sessioninfo));
+
 	if (TransactionIdIsValid(twophase_xid))
 		XLogSetRecordFlags(XLOG_INCLUDE_ORIGIN);
 
@@ -5870,6 +5898,8 @@ xact_redo_commit(xl_xact_parsed_commit *parsed,
 						   false /* backward */ , false /* WAL */ );
 	}
 
+	/* No action if XACT_INFO_HAS_SESSIONINFO */
+
 	/* Make sure files supposed to be dropped are dropped */
 	if (parsed->nrels > 0)
 	{
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index bb34630e8e..c69754531d 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -130,6 +130,7 @@ extern char *temp_tablespaces;
 extern bool ignore_checksum_failure;
 extern bool ignore_invalid_pages;
 extern bool synchronize_seqscans;
+extern bool wal_sessioninfo;
 
 #ifdef TRACE_SYNCSCAN
 extern bool trace_syncscan;
@@ -1269,6 +1270,16 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"wal_sessioninfo", PGC_SUSET, WAL_SETTINGS,
+			gettext_noop("Includes sessionfo onto the WAL records for transaction completion."),
+			NULL
+		},
+		&wal_sessioninfo,
+		true,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"wal_init_zero", PGC_SUSET, WAL_SETTINGS,
 			gettext_noop("Writes zeroes to new WAL files before first use."),
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 7320de345c..533be43950 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -172,6 +172,7 @@ typedef void (*SubXactCallback) (SubXactEvent event, SubTransactionId mySubid,
 #define XACT_XINFO_HAS_ORIGIN			(1U << 5)
 #define XACT_XINFO_HAS_AE_LOCKS			(1U << 6)
 #define XACT_XINFO_HAS_GID				(1U << 7)
+#define XACT_XINFO_HAS_SESSIONINFO		(1U << 8)
 
 /*
  * Also stored in xinfo, these indicating a variety of additional actions that
@@ -267,6 +268,13 @@ typedef struct xl_xact_origin
 	TimestampTz origin_timestamp;
 } xl_xact_origin;
 
+typedef struct xl_xact_sessioninfo
+{
+	TimestampTz	session_start_time;
+	int			session_pid;
+	Oid			userid;
+} xl_xact_sessioninfo;
+
 typedef struct xl_xact_commit
 {
 	TimestampTz xact_time;		/* time of commit */
@@ -279,6 +287,7 @@ typedef struct xl_xact_commit
 	/* xl_xact_twophase follows if XINFO_HAS_TWOPHASE */
 	/* twophase_gid follows if XINFO_HAS_GID. As a null-terminated string. */
 	/* xl_xact_origin follows if XINFO_HAS_ORIGIN, stored unaligned! */
+	/* xl_xact_sessioninfo follows if XINFO_HAS_SESSIONINFO */
 } xl_xact_commit;
 #define MinSizeOfXactCommit (offsetof(xl_xact_commit, xact_time) + sizeof(TimestampTz))
 
@@ -294,6 +303,7 @@ typedef struct xl_xact_abort
 	/* xl_xact_twophase follows if XINFO_HAS_TWOPHASE */
 	/* twophase_gid follows if XINFO_HAS_GID. As a null-terminated string. */
 	/* xl_xact_origin follows if XINFO_HAS_ORIGIN, stored unaligned! */
+	/* xl_xact_sessioninfo follows if XINFO_HAS_SESSIONINFO */
 } xl_xact_abort;
 #define MinSizeOfXactAbort sizeof(xl_xact_abort)
 
@@ -344,6 +354,10 @@ typedef struct xl_xact_parsed_commit
 
 	XLogRecPtr	origin_lsn;
 	TimestampTz origin_timestamp;
+
+	TimestampTz	session_start_time;
+	int			session_pid;
+	Oid			userid;
 } xl_xact_parsed_commit;
 
 typedef xl_xact_parsed_commit xl_xact_parsed_prepare;
@@ -367,6 +381,10 @@ typedef struct xl_xact_parsed_abort
 
 	XLogRecPtr	origin_lsn;
 	TimestampTz origin_timestamp;
+
+	TimestampTz	session_start_time;
+	int			session_pid;
+	Oid			userid;
 } xl_xact_parsed_abort;
 
 
#8Craig Ringer
craig.ringer@enterprisedb.com
In reply to: Simon Riggs (#6)
Re: Detecting File Damage & Inconsistencies

On Fri, Nov 13, 2020 at 7:24 PM Simon Riggs <simon@2ndquadrant.com> wrote:

What I'm proposing is an option to add 16 bytes onto each COMMIT
record

Would it make sense to write this at the time we write a topxid assignment
to WAL instead?

Otherwise it won't be accessible to streaming-mode logical decoding.

#9Simon Riggs
simon@2ndquadrant.com
In reply to: Craig Ringer (#8)
Re: Detecting File Damage & Inconsistencies

On Wed, 18 Nov 2020 at 06:42, Craig Ringer
<craig.ringer@enterprisedb.com> wrote:

On Fri, Nov 13, 2020 at 7:24 PM Simon Riggs <simon@2ndquadrant.com> wrote:

What I'm proposing is an option to add 16 bytes onto each COMMIT
record

Would it make sense to write this at the time we write a topxid assignment to WAL instead?

Otherwise it won't be accessible to streaming-mode logical decoding.

Do you mean extend the xl_xact_assignment record? My understanding is
that is not sent in all cases, so not sure what you mean by "instead".

--
Simon Riggs http://www.EnterpriseDB.com/

#10Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Simon Riggs (#7)
Re: Detecting File Damage & Inconsistencies

Hi Simon,

On Wed, Nov 18, 2020 at 2:14 AM Simon Riggs <simon@2ndquadrant.com> wrote:

On Fri, 13 Nov 2020 at 11:24, Simon Riggs <simon@2ndquadrant.com> wrote:

On Fri, 13 Nov 2020 at 00:50, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Simon Riggs <simon@2ndquadrant.com>

If a rogue user/process is suspected, this would allow you to identify
more easily the changes made by specific sessions/users.

Isn't that kind of auditing a job of pgAudit or log_statement = mod? Or, does "more easily" mean that you find pgAudit complex to use and/or log_statement's overhead is big?

Well, I designed pgaudit, so yes, I think pgaudit is useful.

However, pgaudit works at the statement level, not the data level. So
using pgaudit to locate data rows that have changed is fairly hard.

What I'm proposing is an option to add 16 bytes onto each COMMIT
record, which is considerably less than turning on full auditing in
pgaudit. This option would allow identifying data at the row level, so
you could for example find all rows changed by specific sessions.
Also, because it is stored in WAL it will show updates that might no
longer exist in the database because the changed row versions might
have been vacuumed away. So pgaudit will tell you that happened, but
having extra info in WAL is important also.

So thank you for the question because it has allowed me to explain why
it is useful and important.

Patch attached to implement "wal_sessioninfo" option.

You sent in your patch, wal_sessioninfo.v2.patch to pgsql-hackers on
Nov 18, but you did not post it to the next CommitFest[1]https://commitfest.postgresql.org/31/. If this
was intentional, then you need to take no action. However, if you
want your patch to be reviewed as part of the upcoming CommitFest,
then you need to add it yourself before 2021-01-01 AoE[2]https://en.wikipedia.org/wiki/Anywhere_on_Earth. Thanks for
your contributions.

Regards,

[1]: https://commitfest.postgresql.org/31/
[2]: https://en.wikipedia.org/wiki/Anywhere_on_Earth

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

#11David Steele
david@pgmasters.net
In reply to: Simon Riggs (#9)
Re: Detecting File Damage & Inconsistencies

On 11/18/20 5:23 AM, Simon Riggs wrote:

On Wed, 18 Nov 2020 at 06:42, Craig Ringer
<craig.ringer@enterprisedb.com> wrote:

On Fri, Nov 13, 2020 at 7:24 PM Simon Riggs <simon@2ndquadrant.com> wrote:

What I'm proposing is an option to add 16 bytes onto each COMMIT
record

Would it make sense to write this at the time we write a topxid assignment to WAL instead?

Otherwise it won't be accessible to streaming-mode logical decoding.

Do you mean extend the xl_xact_assignment record? My understanding is
that is not sent in all cases, so not sure what you mean by "instead".

Craig, can you clarify?

Cleysson, you are signed up as a reviewer. Do you know when you'll have
a change to have a look?

Regards,
--
-David
david@pgmasters.net

#12Craig Ringer
craig.ringer@enterprisedb.com
In reply to: David Steele (#11)
Re: Detecting File Damage & Inconsistencies

On Mon, 15 Mar 2021 at 21:01, David Steele <david@pgmasters.net> wrote:

On 11/18/20 5:23 AM, Simon Riggs wrote:

On Wed, 18 Nov 2020 at 06:42, Craig Ringer
<craig.ringer@enterprisedb.com> wrote:

On Fri, Nov 13, 2020 at 7:24 PM Simon Riggs <simon@2ndquadrant.com>

wrote:

What I'm proposing is an option to add 16 bytes onto each COMMIT
record

Would it make sense to write this at the time we write a topxid

assignment to WAL instead?

Otherwise it won't be accessible to streaming-mode logical decoding.

Do you mean extend the xl_xact_assignment record? My understanding is
that is not sent in all cases, so not sure what you mean by "instead".

Craig, can you clarify?

Right. Or write a separate WAL record when the feature is enabled. But it's
probably sufficient to write it as an optional chunk on xl_xact_assignment
records. We often defer writing them so we can optimise away xacts that
never actually wrote anything, but IIRC we still write one before we write
any WAL that references the xid. That'd be fine, since we don't need the
info any sooner than that during decoding. I'd have to double check that we
write it in all cases and won't get to that too soon, but I'm pretty sure
we do...

#13Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Craig Ringer (#12)
Re: Detecting File Damage & Inconsistencies

On Thu, Mar 18, 2021 at 6:20 AM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:

On Mon, 15 Mar 2021 at 21:01, David Steele <david@pgmasters.net> wrote:

On 11/18/20 5:23 AM, Simon Riggs wrote:

On Wed, 18 Nov 2020 at 06:42, Craig Ringer
<craig.ringer@enterprisedb.com> wrote:

On Fri, Nov 13, 2020 at 7:24 PM Simon Riggs <simon@2ndquadrant.com> wrote:

What I'm proposing is an option to add 16 bytes onto each COMMIT
record

Would it make sense to write this at the time we write a topxid assignment to WAL instead?

Otherwise it won't be accessible to streaming-mode logical decoding.

Do you mean extend the xl_xact_assignment record? My understanding is
that is not sent in all cases, so not sure what you mean by "instead".

Craig, can you clarify?

Right. Or write a separate WAL record when the feature is enabled. But it's probably sufficient to write it as an optional chunk on xl_xact_assignment records. We often defer writing them so we can optimise away xacts that never actually wrote anything, but IIRC we still write one before we write any WAL that references the xid. That'd be fine, since we don't need the info any sooner than that during decoding. I'd have to double check that we write it in all cases and won't get to that too soon, but I'm pretty sure we do...

The commit record is optimized away if no xid is assigned, though is
still present if we didn't write any WAL records.

But if a commit record exists in the WAL stream, we want to know where
it came from.

A later patch will add PITR capability based on this information so
attaching it directly to the commit record is fairly important, IMHO.

--
Simon Riggs http://www.EnterpriseDB.com/

#14Craig Ringer
craig.ringer@enterprisedb.com
In reply to: Simon Riggs (#13)
Re: Detecting File Damage & Inconsistencies

On Tue, 22 Jun 2021 at 00:24, Simon Riggs <simon.riggs@enterprisedb.com>
wrote:

On Thu, Mar 18, 2021 at 6:20 AM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:

On Mon, 15 Mar 2021 at 21:01, David Steele <david@pgmasters.net> wrote:

On 11/18/20 5:23 AM, Simon Riggs wrote:

On Wed, 18 Nov 2020 at 06:42, Craig Ringer
<craig.ringer@enterprisedb.com> wrote:

On Fri, Nov 13, 2020 at 7:24 PM Simon Riggs <simon@2ndquadrant.com>

wrote:

What I'm proposing is an option to add 16 bytes onto each COMMIT
record

Would it make sense to write this at the time we write a topxid

assignment to WAL instead?

Otherwise it won't be accessible to streaming-mode logical decoding.

Do you mean extend the xl_xact_assignment record? My understanding is
that is not sent in all cases, so not sure what you mean by "instead".

Craig, can you clarify?

Right. Or write a separate WAL record when the feature is enabled. But

it's probably sufficient to write it as an optional chunk on
xl_xact_assignment records. We often defer writing them so we can optimise
away xacts that never actually wrote anything, but IIRC we still write one
before we write any WAL that references the xid. That'd be fine, since we
don't need the info any sooner than that during decoding. I'd have to
double check that we write it in all cases and won't get to that too soon,
but I'm pretty sure we do...

The commit record is optimized away if no xid is assigned, though is
still present if we didn't write any WAL records.

But if a commit record exists in the WAL stream, we want to know where
it came from.

A later patch will add PITR capability based on this information so
attaching it directly to the commit record is fairly important, IMHO.

Why?

All the proposed info:

* 8-byte session start time (from MyStartTime)
* 2-byte pid (from MyProcPid)
* 4-byte user oid

are available at topxid assignment time. If we defer writing them until
commit, we lose the ability to use this information during streaming
logical decoding. That's something I believe you've wanted for other
functionality in the past, such as logical decoding based audit
functionality.

IIRC the restart_lsn horizon already ensures that we can't miss the
xl_xact_assignment at the start of a txn. We would ensure that the desired
info is available throughout decoding of the txn, including at commit
record processing time, by adding it to the toplevel ReorderBufferTxn.

The only advantage I can see to annotating the commit record instead is
that we don't have to spend a few bytes per reorder-buffered topxid to
track this info between start of decoding for the tx and processing of the
commit record. I don't think that's worth caring about.The advantages that
having it earlier would give us are much more significant.

A few examples:

* Skip reorder buffering of non-target transactions early, so we can decode
the WAL stream to find the target transactions much faster using less
memory and I/O;

* Read the database change stream and use the session info to stream info
into an intrusion detection system and/or audit engine in real time, using
txn streaming to avoid the need to create huge reorder buffers;

* Re-decode the WAL stream to identify a target txn you know was aborted,
and commit it instead, so you can recover data from aborted txns from the
WAL stream using logical decoding. (Only possible if the catalog_xmin
hasn't advanced past that point already though)

So yeah. I think it'd be better to log the info you want at start-of-txn
unless there's a compelling reason not so, and I don't see one yet.

#15Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Craig Ringer (#14)
Re: Detecting File Damage & Inconsistencies

On Tue, Jun 22, 2021 at 6:32 AM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:

IIRC the restart_lsn horizon already ensures that we can't miss the xl_xact_assignment at the start of a txn. We would ensure that the desired info is available throughout decoding of the txn, including at commit record processing time, by adding it to the toplevel ReorderBufferTxn.

The only advantage I can see to annotating the commit record instead is that we don't have to spend a few bytes per reorder-buffered topxid to track this info between start of decoding for the tx and processing of the commit record. I don't think that's worth caring about.The advantages that having it earlier would give us are much more significant.

A few examples:

* Skip reorder buffering of non-target transactions early, so we can decode the WAL stream to find the target transactions much faster using less memory and I/O;

* Read the database change stream and use the session info to stream info into an intrusion detection system and/or audit engine in real time, using txn streaming to avoid the need to create huge reorder buffers;

* Re-decode the WAL stream to identify a target txn you know was aborted, and commit it instead, so you can recover data from aborted txns from the WAL stream using logical decoding. (Only possible if the catalog_xmin hasn't advanced past that point already though)

So yeah. I think it'd be better to log the info you want at start-of-txn unless there's a compelling reason not so, and I don't see one yet.

AFAIK, XLOG_XACT_ASSIGNMENT does not occur for normal top-level
transactions, only for subxids.

I don't really want to add an extra record just for this because it
will slow down applications and it won't get turned on as often.

Thoughts?

--
Simon Riggs http://www.EnterpriseDB.com/

#16Craig Ringer
craig.ringer@enterprisedb.com
In reply to: Simon Riggs (#15)
Re: Detecting File Damage & Inconsistencies

On Fri, 2 Jul 2021 at 00:19, Simon Riggs <simon.riggs@enterprisedb.com>
wrote:

So yeah. I think it'd be better to log the info you want at start-of-txn

unless there's a compelling reason not so, and I don't see one yet.

AFAIK, XLOG_XACT_ASSIGNMENT does not occur for normal top-level
transactions, only for subxids.

I don't really want to add an extra record just for this because it
will slow down applications and it won't get turned on as often.

OK, that makes sense - I was indeed operating on an incorrect assumption.

I wouldn't want to add a new record either. I thought we could piggyback on
XLOG_XACT_ASSIGNMENT with a new chunk that's only added when the feature is
required, much like we do for replication origin info on commit records.

Is it worth considering forcing XLOG_XACT_ASSIGNMENT to be logged if this
feature is enabled?

If you don't think the sorts of use cases I presented are worth the trouble
that's fair enough. I'm not against adding it on the commit record. It's
just that with logical decoding moving toward a streaming model I suspect
only having it at commit time may cause us some pain later.

#17Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Craig Ringer (#16)
Re: Detecting File Damage & Inconsistencies

On Fri, Jul 2, 2021 at 5:34 AM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:

On Fri, 2 Jul 2021 at 00:19, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

So yeah. I think it'd be better to log the info you want at start-of-txn unless there's a compelling reason not so, and I don't see one yet.

AFAIK, XLOG_XACT_ASSIGNMENT does not occur for normal top-level
transactions, only for subxids.

I don't really want to add an extra record just for this because it
will slow down applications and it won't get turned on as often.

OK, that makes sense - I was indeed operating on an incorrect assumption.

I wouldn't want to add a new record either. I thought we could piggyback on XLOG_XACT_ASSIGNMENT with a new chunk that's only added when the feature is required, much like we do for replication origin info on commit records.

Is it worth considering forcing XLOG_XACT_ASSIGNMENT to be logged if this feature is enabled?

My feeling is that the drop in performance would lead to it being
turned off most of the time, reducing the value of the feature.

Does anyone else disagree?

If you don't think the sorts of use cases I presented are worth the trouble that's fair enough. I'm not against adding it on the commit record. It's just that with logical decoding moving toward a streaming model I suspect only having it at commit time may cause us some pain later.

I think you have some good ideas about how to handle larger
transactions with streaming. As a separate patch it might be worth
keeping track of transaction size and logging something when a
transaction gets too large.

--
Simon Riggs http://www.EnterpriseDB.com/

#18Amit Kapila
amit.kapila16@gmail.com
In reply to: Simon Riggs (#17)
Re: Detecting File Damage & Inconsistencies

On Fri, Jul 2, 2021 at 8:29 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote:

On Fri, Jul 2, 2021 at 5:34 AM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:

If you don't think the sorts of use cases I presented are worth the trouble that's fair enough. I'm not against adding it on the commit record. It's just that with logical decoding moving toward a streaming model I suspect only having it at commit time may cause us some pain later.

I think you have some good ideas about how to handle larger
transactions with streaming. As a separate patch it might be worth
keeping track of transaction size and logging something when a
transaction gets too large.

If we want this additional information for streaming mode in logical
replication then can't we piggyback it on the very first record
written for a transaction when this info is required? Currently, we do
something similar for logging top_level_xid for subtransaction in
XLogRecordAssemble().

--
With Regards,
Amit Kapila.

#19Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Amit Kapila (#18)
Re: Detecting File Damage & Inconsistencies

On Tue, Jul 6, 2021 at 4:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jul 2, 2021 at 8:29 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote:

On Fri, Jul 2, 2021 at 5:34 AM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:

If you don't think the sorts of use cases I presented are worth the trouble that's fair enough. I'm not against adding it on the commit record. It's just that with logical decoding moving toward a streaming model I suspect only having it at commit time may cause us some pain later.

I think you have some good ideas about how to handle larger
transactions with streaming. As a separate patch it might be worth
keeping track of transaction size and logging something when a
transaction gets too large.

If we want this additional information for streaming mode in logical
replication then can't we piggyback it on the very first record
written for a transaction when this info is required? Currently, we do
something similar for logging top_level_xid for subtransaction in
XLogRecordAssemble().

It's possible, but I'm struggling to believe anybody would accept that
as an approach because it breaks simplicity, modularity and makes it
harder to search for this info in the WAL.

I was imagining that we'd keep track of amount of WAL written by a
transaction and when it reaches a certain size generate a "streaming
info" record as an early warning that we have a big transaction coming
down the pipe.

I'm feeling that a simple patch is expanding well beyond its original
scope and timeline. How can we do this simply?

--
Simon Riggs http://www.EnterpriseDB.com/

#20Amit Kapila
amit.kapila16@gmail.com
In reply to: Simon Riggs (#19)
Re: Detecting File Damage & Inconsistencies

On Tue, Jul 13, 2021 at 8:29 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

On Tue, Jul 6, 2021 at 4:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

If you don't think the sorts of use cases I presented are worth the trouble that's fair enough. I'm not against adding it on the commit record. It's just that with logical decoding moving toward a streaming model I suspect only having it at commit time may cause us some pain later.

I think you have some good ideas about how to handle larger
transactions with streaming. As a separate patch it might be worth
keeping track of transaction size and logging something when a
transaction gets too large.

If we want this additional information for streaming mode in logical
replication then can't we piggyback it on the very first record
written for a transaction when this info is required? Currently, we do
something similar for logging top_level_xid for subtransaction in
XLogRecordAssemble().

It's possible, but I'm struggling to believe anybody would accept that
as an approach because it breaks simplicity, modularity and makes it
harder to search for this info in the WAL.

I was imagining that we'd keep track of amount of WAL written by a
transaction and when it reaches a certain size generate a "streaming
info" record as an early warning that we have a big transaction coming
down the pipe.

I am not sure if that satisfies Craig's requirement of making it
available during the streaming of in-progress xacts during logical
replication. It is quite possible that by the time we decide to start
streaming a transaction this information won't be logged yet.

I'm feeling that a simple patch is expanding well beyond its original
scope and timeline. How can we do this simply?

The patch is simple but its use doesn't seem to be very clear. You
have mentioned its use for future PITR patches and Craig mentioned
some use cases in logical decoding and it appears to me that to
support the use cases mentioned by Craig, it is important to LOG this
earlier than at commit time. As there are no details about how it will
be used for PITR patches and whether such patch ideas are accepted, it
makes it harder to judge the value of this patch.

I think if we would have patches (even at WIP/POC stage) for the ideas
you and Craig have in mind, it would have been much easier to see the
value of this patch.

--
With Regards,
Amit Kapila.

#21Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Amit Kapila (#20)
Re: Detecting File Damage & Inconsistencies

On Wed, 14 Jul 2021 at 05:01, Amit Kapila <amit.kapila16@gmail.com> wrote:

The patch is simple but its use doesn't seem to be very clear. You
have mentioned its use for future PITR patches and Craig mentioned
some use cases in logical decoding and it appears to me that to
support the use cases mentioned by Craig, it is important to LOG this
earlier than at commit time. As there are no details about how it will
be used for PITR patches and whether such patch ideas are accepted, it
makes it harder to judge the value of this patch.

I think if we would have patches (even at WIP/POC stage) for the ideas
you and Craig have in mind, it would have been much easier to see the
value of this patch.

Fair question. This is one of a series of 4 independent patches I have
planned to provide enhanced information and/or recovery tools. (The
second one is already in this CF). This is an area I know lots about
and nobody else is working on, so I thought I would contribute. I've
not discussed this off-list with anyone else. So this is PITR as
recovery in a broad sense, not just that specific feature.

For this patch, the idea is to be able to go in either direction:
Data damage <--> User info

So if you know a user was an intruder, you can detect the damage they caused.
Or, if you detect damage, you can work out who caused it, work out if
they were an intruder and if so, detect what else they did.

The most important thing is to have the info available in WAL, nothing
is possible until that is available.
We already added an option to add this same info to log_line_prefix,
yet nobody said it wasn't useful there, or needed other uses to allow
the feature.
The two sources of info are designed to be able to be used in combination.

My experience of recovery scenarios is that you often have to build
custom search tools to make it work. It's hard to say whether you'll
want to track the user, the specific session, or even specific
transactions.

But I do understand the overall request, so I propose adding
* pg_waldump output for wal_sessioninfo data, if it exists
* pg_waldump --user=USERNAME as a filter on username
to demonstrate the use of this

--
Simon Riggs http://www.EnterpriseDB.com/

#22Amit Kapila
amit.kapila16@gmail.com
In reply to: Simon Riggs (#21)
Re: Detecting File Damage & Inconsistencies

On Thu, Jul 22, 2021 at 7:10 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

On Wed, 14 Jul 2021 at 05:01, Amit Kapila <amit.kapila16@gmail.com> wrote:

But I do understand the overall request, so I propose adding
* pg_waldump output for wal_sessioninfo data, if it exists
* pg_waldump --user=USERNAME as a filter on username
to demonstrate the use of this

This makes sense but I am still thinking of some more concrete way.
Can we think of providing a way to filter WAL from user/process (like
filter_by_origin) for logical decoding? If so, then we can have an
example to show via test_decoding.

--
With Regards,
Amit Kapila.