Re: Review: Patch to compute Max LSN of Data Pages

Started by Amit Kapilaover 12 years ago7 messages

amit.kapila16@gmail.com

over 12 years ago

1 attachment(s)

On Monday, July 08, 2013 5:16 PM Andres Freund wrote:
On 2013-07-08 17:10:43 +0530, Amit Kapila wrote:
On Monday, July 08, 2013 4:26 PM Andres Freund wrote:
On 2013-07-08 16:17:54 +0530, Hari Babu wrote:

+ This utility can also be used to decide whether backup is

required or not when the data page

+ in old-master precedes the last applied LSN in old-standby

(i.e., new-master) at the
+    moment of the failover.
+   </para>
+  </refsect1>
I don't think this is safe in any interesting set of cases. Am I
missing
something?
No, you are not missing anything. It can be only used to find max LSN in
database which can avoid further corruption

Why is the patch submitted documenting it as a use-case then? I find it
rather scary if the *patch authors* document a known unsafe use case as
one of the known use-cases.

I got the problem which can occur with the specified use case. Removed the
wrong use case specified above.
Thanks for the review, please find the updated patch attached in the mail.

Patch is not getting compiled on Windows, I had made following changes:
a. updated the patch for resolving Windows build
b. few documentation changes in (pg_resetxlog.sgml) for spelling
mistake and minor line change
c. corrected year for Copyright in file pg_computemaxlsn.c

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

pg_computemaxlsn_v11.patchapplication/octet-stream; name=pg_computemaxlsn_v11.patchDownload

diff --git a/contrib/Makefile b/contrib/Makefile
index 8a2a937..38c2612 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -31,6 +31,7 @@ SUBDIRS = \
 		passwordcheck	\
 		pg_archivecleanup \
 		pg_buffercache	\
+		pg_computemaxlsn \
 		pg_freespacemap \
 		pg_standby	\
 		pg_stat_statements \
diff --git a/contrib/pg_computemaxlsn/Makefile b/contrib/pg_computemaxlsn/Makefile
new file mode 100644
index 0000000..eb41ff1
--- /dev/null
+++ b/contrib/pg_computemaxlsn/Makefile
@@ -0,0 +1,22 @@
+# contrib/pg_computemaxlsn/Makefile
+
+PGFILEDESC = "pg_computemaxlsn - an utility to find max LSN from data pages"
+PGAPPICON = win32
+
+PROGRAM  = pg_computemaxlsn
+OBJS = pg_computemaxlsn.o $(WIN32RES)
+
+PG_CPPFLAGS  = -I$(srcdir)
+PG_LIBS = $(libpq_pgport)
+
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_computemaxlsn
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_computemaxlsn/pg_computemaxlsn.c b/contrib/pg_computemaxlsn/pg_computemaxlsn.c
new file mode 100644
index 0000000..532f18b
--- /dev/null
+++ b/contrib/pg_computemaxlsn/pg_computemaxlsn.c
@@ -0,0 +1,430 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_computemaxlsn.c
+ *	  A utility to compute the maximum LSN in data pages
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * contrib/pg_computemaxlsn/pg_computemaxlsn.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+/*
+ * We have to use postgres.h not postgres_fe.h here, because there's so much
+ * backend-only stuff for reading data files we need.  But we need a
+ * frontend-ish environment otherwise.	Hence this ugly hack.
+ */
+#define FRONTEND 1
+
+#include "postgres.h"
+
+#include <dirent.h>
+#include <fcntl.h>
+#include <locale.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <time.h>
+#include <unistd.h>
+
+#include "getopt_long.h"
+
+#include "access/xlog_internal.h"
+#include "catalog/catalog.h"
+#include "storage/bufpage.h"
+#include "storage/fd.h"
+
+/* Page header size */
+#define PAGEHDRSZ (sizeof(PageHeaderData))
+
+extern int	optind;
+extern char *optarg;
+static const char *progname;
+static int	quiet = false;
+
+static bool FindMaxLSNinFile(char *filename, XLogRecPtr *maxlsn);
+static bool FindMaxLSNinDir(char *path, XLogRecPtr *maxlsn);
+static bool FindMaxLSNinPgData(char *pgdatapath, XLogRecPtr *maxlsn);
+static void usage(void);
+
+/*
+ * This function validates the given cluster directory - we search for a
+ * small set of subdirectories that we expect to find in a valid data directory.
+ * directory. If any of the subdirectories are missing, then it treats as a
+ * normal directory.
+ */
+static bool
+check_data_dir(const char *pg_data)
+{
+	char		subDirName[MAXPGPATH];
+	int			dnum;
+
+	/* start check with top-most directory */
+	const char *requiredSubdirs[] = {"", "base", "global", "pg_tblspc",
+		"pg_multixact", "pg_subtrans", "pg_clog", "pg_twophase",
+	"pg_xlog"};
+
+	for (dnum = 0; dnum < lengthof(requiredSubdirs); ++dnum)
+	{
+		struct stat statBuf;
+
+		snprintf(subDirName, sizeof(subDirName), "%s%s%s", pg_data,
+		/* Win32 can't stat() a directory with a trailing slash. */
+				 *requiredSubdirs[dnum] ? "/" : "",
+				 requiredSubdirs[dnum]);
+
+		if (stat(subDirName, &statBuf) != 0
+			|| (!S_ISDIR(statBuf.st_mode)))
+			return false;
+	}
+
+	return true;
+}
+
+/*
+ * relfilenode name validation.
+ * Format with_ext == true	[0-9]+[ \w | _vm | _fsm | _init ][\.][0-9]*
+ *		  with_ext == false [0-9]+
+ */
+static bool
+validateRelfilenodename(char *name)
+{
+	int			pos = 0;
+
+	while ((name[pos] >= '0') && (name[pos] <= '9'))
+		pos++;
+
+	if (name[pos] == '_')
+	{
+		pos++;
+		while ((name[pos] >= 'a') && (name[pos] <= 'z'))
+			pos++;
+	}
+	if (name[pos] == '.')
+	{
+		pos++;
+		while ((name[pos] >= '0') && (name[pos] <= '9'))
+			pos++;
+	}
+
+	if (name[pos] == 0)
+		return true;
+
+	return false;
+}
+
+int
+main(int argc, char *argv[])
+{
+	static struct option long_options[] = {
+		{"quiet", no_argument, NULL, 'q'},
+		{NULL, 0, NULL, 0}
+	};
+	int			optindex;
+	int			c;
+	char	   *LsnSearchPath = NULL;
+	XLogRecPtr	maxLSN = 0;
+	XLogSegNo	logSegNo = 0;
+	bool		result = true;
+	struct stat statbuf;
+
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_computemaxlsn"));
+
+	progname = get_progname(argv[0]);
+
+	if (argc > 1)
+	{
+		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+		{
+			usage();
+			exit(0);
+		}
+		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+		{
+			puts("pg_computemaxlsn (PostgreSQL) " PG_VERSION);
+			exit(0);
+		}
+	}
+
+	while ((c = getopt_long(argc, argv, "q", long_options, &optindex)) != -1)
+	{
+		switch (c)
+		{
+			case 'q':
+				quiet = true;
+				break;
+			default:
+				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+				exit(1);
+		}
+	}
+
+	/* Loop over all input arguements */
+	do
+	{
+		if (optind < argc)
+			LsnSearchPath = argv[optind];
+		else
+			LsnSearchPath = getenv("PGDATA");
+
+		if (LsnSearchPath == NULL)
+		{
+			fprintf(stderr, _("%s: no file/directory specified\n"), progname);
+			fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+			exit(1);
+		}
+
+		maxLSN = 0;
+
+		/* Check for the provided search path is a data directory? */
+		if (check_data_dir(LsnSearchPath))
+		{
+			result = FindMaxLSNinPgData(LsnSearchPath, &maxLSN);
+		}
+		else
+		{
+			if (lstat(LsnSearchPath, &statbuf) < 0)
+			{
+				if (errno == ENOENT)
+				{
+					fprintf(stderr, _("%s: file or directory \"%s\" does not exists.\n"),
+							progname, LsnSearchPath);
+				}
+				else
+				{
+					fprintf(stderr, _("%s: could not stat file or directory \"%s\": %s\n"),
+							progname, LsnSearchPath, strerror(errno));
+				}
+				exit(1);
+			}
+
+			if (S_ISREG(statbuf.st_mode))
+				result = FindMaxLSNinFile(LsnSearchPath, &maxLSN);
+			else
+				result = FindMaxLSNinDir(LsnSearchPath, &maxLSN);
+		}
+
+		if (!result)
+		{
+			/* Message already provided, simply exit */
+			exit(1);
+		}
+
+		XLByteToSeg(maxLSN, logSegNo);
+
+		printf("Maximum LSN found in \"%s\" is: %X/%X \n"
+			   "WAL segment file name (fileid,seg): %X/%X\n",
+			   LsnSearchPath, (uint32) (maxLSN >> 32), (uint32) maxLSN,
+			   (uint32) (logSegNo >> 32), (uint32) (logSegNo));
+	} while (++optind < argc);
+
+	return 0;
+}
+
+
+/*
+ * Read the maximum LSN number in the one of data file (relnode file).
+ */
+static bool
+FindMaxLSNinFile(char *filename, XLogRecPtr *maxlsn)
+{
+	XLogRecPtr	pagelsn;
+	off_t		len,
+				seekpos;
+	uint32		nblocks,
+				blocknum;
+	char		buffer[PAGEHDRSZ];
+	int			nbytes;
+	int			fd;
+	XLogRecPtr	filemaxlsn = 0;
+	uint32		filemaxlsn_block = 0;
+
+	if ((fd = open(filename, O_RDONLY | PG_BINARY, 0)) < 0)
+	{
+		/*
+		 * If file does not exist or we can't read it. give error
+		 */
+		fprintf(stderr, _("%s: could not open file \"%s\" for reading: %s\n"),
+				progname, filename, strerror(errno));
+		return false;
+	}
+
+	/* Calculate the number of pages in file */
+	len = lseek(fd, 0L, SEEK_END);
+	if (len < 0)
+	{
+		close(fd);
+		fprintf(stderr, _("%s: file \"%s\" for seeking: %s\n"),
+				progname, filename, strerror(errno));
+		return false;
+	}
+
+	nblocks = (len / BLCKSZ);
+	if (nblocks > RELSEG_SIZE)
+	{
+		/*
+		 * In one relfilenode file length can't be more that RELSEG_SIZE
+		 */
+		close(fd);
+		fprintf(stderr, _("%s: file \"%s\" length is more than segment size: %d.\n"),
+				progname, filename, RELSEG_SIZE);
+		return false;
+	}
+
+	/*
+	 * Read the only page header and find the LSN of the page and continue to
+	 * next page.
+	 */
+	seekpos = 0;
+	for (blocknum = 0; blocknum < nblocks; blocknum++)
+	{
+		len = lseek(fd, seekpos, SEEK_SET);
+		if (len != seekpos)
+		{
+			close(fd);
+			fprintf(stderr, _("%s: could not seek to offset %u in file \"%s\": %s\n"),
+					progname, blocknum, filename, strerror(errno));
+			return false;
+		}
+
+		nbytes = read(fd, buffer, PAGEHDRSZ);
+		if (nbytes < 0)
+		{
+			close(fd);
+			fprintf(stderr, _("%s: could not read file \"%s\" at offset %u: %s\n"),
+					progname, filename, blocknum, strerror(errno));
+			return false;
+		}
+
+		/*
+		 * Remember this lsn as the highest (if it is)
+		 */
+		pagelsn = PageGetLSN(buffer);
+		if (filemaxlsn < pagelsn)
+		{
+			filemaxlsn = pagelsn;
+			filemaxlsn_block = blocknum;
+		}
+
+		seekpos += (off_t) BLCKSZ;
+	}
+
+	close(fd);
+
+	if (*maxlsn < filemaxlsn)
+		*maxlsn = filemaxlsn;
+
+	if (!quiet)
+		printf("Highest LSN for file:\"%s\" is: %X/%X in block %u\n",
+			   filename, (uint32) (filemaxlsn >> 32), (uint32) filemaxlsn,
+			   filemaxlsn_block);
+
+	return true;
+}
+
+/*
+ * Read the maximum LSN number in current directory; including sub directories
+ * and links.
+ */
+static bool
+FindMaxLSNinDir(char *path, XLogRecPtr *maxlsn)
+{
+	DIR		   *dir;
+	struct dirent *de;
+	char		pathbuf[MAXPGPATH];
+	struct stat statbuf;
+	bool		result = true;
+
+	dir = opendir(path);
+	if (NULL == dir)
+	{
+		fprintf(stderr, _("%s: could not open directory \"%s\": %s\n"),
+				progname, path, strerror(errno));
+		return false;
+	}
+
+	while ((de = readdir(dir)) != NULL)
+	{
+		/* Skip special stuff */
+		if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
+			continue;
+
+		/* Skip temporary files */
+		if (strncmp(de->d_name,
+					PG_TEMP_FILE_PREFIX,
+					strlen(PG_TEMP_FILE_PREFIX)) == 0)
+			continue;
+
+		/*
+		 * Skip all invalid files, and read remaining relfilenode files
+		 */
+		if (!validateRelfilenodename(de->d_name))
+			continue;
+
+		snprintf(pathbuf, MAXPGPATH, "%s/%s", path, de->d_name);
+
+		if (lstat(pathbuf, &statbuf) != 0)
+		{
+			if (errno != ENOENT)
+				fprintf(stderr, _("%s: could not stat file or directory \"%s\": %s\n"),
+						progname, pathbuf, strerror(errno));
+
+			/* If the file went away while scanning, it's no error. */
+			continue;
+		}
+
+		if (S_ISREG(statbuf.st_mode))
+		{
+			if (!FindMaxLSNinFile(pathbuf, maxlsn))
+				result = false;
+		}
+		else
+		{
+			if (!FindMaxLSNinDir(pathbuf, maxlsn))
+				result = false;
+		}
+	}
+
+	closedir(dir);
+	return result;
+}
+
+
+/*
+ * Read the maximum LSN number in the DATA directory.
+ */
+static bool
+FindMaxLSNinPgData(char *pgdatapath, XLogRecPtr *maxlsn)
+{
+	char		pathbuf[MAXPGPATH];
+	bool		result = true;
+
+	/* scan all the relfilenodes in data directory */
+	snprintf(pathbuf, MAXPGPATH, "%s/global", pgdatapath);
+	if (!FindMaxLSNinDir(pathbuf, maxlsn))
+		result = false;
+
+	snprintf(pathbuf, MAXPGPATH, "%s/base", pgdatapath);
+	if (!FindMaxLSNinDir(pathbuf, maxlsn))
+		result = false;
+
+	snprintf(pathbuf, MAXPGPATH, "%s/pg_tblspc", pgdatapath);
+	if (!FindMaxLSNinDir(pathbuf, maxlsn))
+		result = false;
+
+	return result;
+}
+
+static void
+usage(void)
+{
+	printf(_("%s compute the maximum LSN in PostgreSQL data pages.\n\n"), progname);
+	printf(_("Usage:\n  %s [OPTION] [FILE/DIRECTORY...]\n\n"), progname);
+	printf(_("Options:\n"));
+	printf(_("  -q, --quiet        stops the output of highest LSN in a file\n"));
+	printf(_("  -V, --version      output version information, then exit\n"));
+	printf(_("  -?, --help         show this help, then exit\n"));
+	printf(_("\nReport bugs to <pgsql-bugs@postgresql.org>.\n"));
+}
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index dd8e09e..a5c76b6 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -208,5 +208,6 @@ pages.
  &pgtesttiming;
  &pgupgrade;
  &pgxlogdump;
+ &pgcomputemaxlsn;
  </sect1>
 </appendix>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 914090d..b75d8e7 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -124,6 +124,7 @@
 <!ENTITY pgbench         SYSTEM "pgbench.sgml">
 <!ENTITY pgarchivecleanup SYSTEM "pgarchivecleanup.sgml">
 <!ENTITY pgbuffercache   SYSTEM "pgbuffercache.sgml">
+<!ENTITY pgcomputemaxlsn   SYSTEM "pg_computemaxlsn.sgml">
 <!ENTITY pgcrypto        SYSTEM "pgcrypto.sgml">
 <!ENTITY pgfreespacemap  SYSTEM "pgfreespacemap.sgml">
 <!ENTITY pgrowlocks      SYSTEM "pgrowlocks.sgml">
diff --git a/doc/src/sgml/pg_computemaxlsn.sgml b/doc/src/sgml/pg_computemaxlsn.sgml
new file mode 100644
index 0000000..4f8b50f
--- /dev/null
+++ b/doc/src/sgml/pg_computemaxlsn.sgml
@@ -0,0 +1,94 @@
+<!--
+doc/src/sgml/pg_computemaxlsn.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="pgcomputemaxlsn">
+ <refmeta>
+  <refentrytitle><application>pg_computemaxlsn</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_computemaxlsn</refname>
+  <refpurpose>computes the maximum LSN in database of a <productname>PostgreSQL</productname> database cluster</refpurpose>
+ </refnamediv>
+
+ <indexterm zone="pgcomputemaxlsn">
+  <primary>pg_computemaxlsn</primary>
+ </indexterm>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_computemaxlsn</command>
+   <arg choice="opt"><replaceable>option</replaceable></arg>
+   <arg rep="repeat"><replaceable>file or directory</replaceable></arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1 id="R1-APP-PGCOMPUTEMAXLSN-1">
+  <title>Description</title>
+  <para>
+   <command>pg_computemaxlsn</command> computes maximun LSN from database pages
+   in the specified list of files or directories.
+  </para>
+
+  <para>
+   If user doesn't provide the file or directory to find the max lsn then
+   <command>pg_computemaxlsn</command> use the environment variable <envar>PGDATA</>
+   if exists otherwise reports an error.
+  </para>
+
+  <para>
+   The <option>-q</> and <option>--quiet</> options stops the print of
+   highest LSN in a file.
+  </para>
+
+  <para>
+   The <option>-V</> and <option>--version</> options
+   print the <application>pg_computemaxlsn</application> version and exit. The
+   options <option>-?</> and <option>--help</> show supported syntax, and exit.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>use cases</title>
+  <para>
+   This utility can be used to avoid more data corruption by finding out the
+   maximum lsn in the data directory which is required by the <command>pg_resetxlog</command>
+   to provide the new WAL segment file name.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+  <screen>
+pg_computemaxlsn> ./pg_computemaxlsn -q ../../../data/
+Maximum LSN found in "../../../data/" is: 0/181B090
+WAL segment file name (fileid,seg): 0/1
+  </screen>
+
+  <screen>
+pg_computemaxlsn> ./pg_computemaxlsn -q ../../../data/base/126*
+Maximum LSN found in "../../../data/base/12625" is: 0/17D5360
+WAL segment file name (fileid,seg): 0/1
+Maximum LSN found in "../../../data/base/12630" is: 0/181B090
+WAL segment file name (fileid,seg): 0/1
+  </screen>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+  <para>
+   This utility just parse the header of the page and provides the LSN, please be sure
+   when using this tool on directory which is not a data directory.
+  </para>
+
+  <para>
+   If any error occurs during the finding of max LSN in the provided file or directory,
+   the max lsn is not printed. Once the reported problem solves need to run the tool again
+   for proper result.
+  </para>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/ref/pg_resetxlog.sgml b/doc/src/sgml/ref/pg_resetxlog.sgml
index c680680..b042ce8 100644
--- a/doc/src/sgml/ref/pg_resetxlog.sgml
+++ b/doc/src/sgml/ref/pg_resetxlog.sgml
@@ -139,6 +139,15 @@ PostgreSQL documentation
       largest entry in <filename>pg_xlog</>, use <literal>-l 00000001000000320000004B</> or higher.
      </para>
 
+     <para>
+      If <command>pg_resetxlog</command> complains that it cannot determine
+      valid data for <filename>pg_control</> and if you do not have it or WAL segment
+      files in the directory <filename>pg_xlog</> under the data directory are corrupted,
+      then to identify larger WAL segment file from data files we can use utility <command>pg_computemaxlsn</command>
+      by specifying file or folder or data directory. Once larger WAL segment file is found use <option>-l</> option
+      for setting the value.
+     </para>
+
      <note>
       <para>
        <command>pg_resetxlog</command> itself looks at the files in
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 8dd6173..8928dae 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -37,13 +37,13 @@ my @contrib_uselibpgport = (
 	'pg_standby',    'pg_archivecleanup',
 	'pg_test_fsync', 'pg_test_timing',
 	'pg_upgrade',    'pg_xlogdump',
-	'vacuumlo');
+	'vacuumlo',      'pg_computemaxlsn');
 my @contrib_uselibpgcommon = (
 	'oid2name',      'pgbench',
 	'pg_standby',    'pg_archivecleanup',
 	'pg_test_fsync', 'pg_test_timing',
 	'pg_upgrade',    'pg_xlogdump',
-	'vacuumlo');
+	'vacuumlo', 	 'pg_computemaxlsn');
 my $contrib_extralibs = { 'pgbench' => ['wsock32.lib'] };
 my $contrib_extraincludes =
   { 'tsearch2' => ['contrib/tsearch2'], 'dblink' => ['src/backend'] };

Robert Haas

robertmhaas@gmail.com

over 12 years ago

In reply to: Amit Kapila (#1)

On Sat, Sep 14, 2013 at 3:03 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Monday, July 08, 2013 5:16 PM Andres Freund wrote:
On 2013-07-08 17:10:43 +0530, Amit Kapila wrote:
On Monday, July 08, 2013 4:26 PM Andres Freund wrote:
On 2013-07-08 16:17:54 +0530, Hari Babu wrote:

+ This utility can also be used to decide whether backup is

required or not when the data page

+ in old-master precedes the last applied LSN in old-standby

(i.e., new-master) at the
+    moment of the failover.
+   </para>
+  </refsect1>
I don't think this is safe in any interesting set of cases. Am I
missing
something?
No, you are not missing anything. It can be only used to find max LSN in
database which can avoid further corruption
Why is the patch submitted documenting it as a use-case then? I find it
rather scary if the *patch authors* document a known unsafe use case as
one of the known use-cases.

I got the problem which can occur with the specified use case. Removed the
wrong use case specified above.
Thanks for the review, please find the updated patch attached in the mail.

Patch is not getting compiled on Windows, I had made following changes:
a. updated the patch for resolving Windows build
b. few documentation changes in (pg_resetxlog.sgml) for spelling
mistake and minor line change
c. corrected year for Copyright in file pg_computemaxlsn.c

I am OK with this patch in its current form, modulo some grammar
issues in the documentation which I can fix before committing.
However, I'm unclear whether there was sufficient consensus to proceed
with this. Can others weigh in? If there is too much residual
unhappiness with this, then we should just mark this as Rejected and
stop wasting time on it; it can be pushed to PGXN or similar even if
we don't put it in core.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Amit Kapila

amit.kapila16@gmail.com

over 12 years ago

In reply to: Robert Haas (#2)

On Wed, Oct 9, 2013 at 11:58 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Sep 14, 2013 at 3:03 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Monday, July 08, 2013 5:16 PM Andres Freund wrote:
On 2013-07-08 17:10:43 +0530, Amit Kapila wrote:
On Monday, July 08, 2013 4:26 PM Andres Freund wrote:
On 2013-07-08 16:17:54 +0530, Hari Babu wrote:

+ This utility can also be used to decide whether backup is

required or not when the data page

+ in old-master precedes the last applied LSN in old-standby

(i.e., new-master) at the
+    moment of the failover.
+   </para>
+  </refsect1>
I don't think this is safe in any interesting set of cases. Am I
missing
something?
No, you are not missing anything. It can be only used to find max LSN in
database which can avoid further corruption
Why is the patch submitted documenting it as a use-case then? I find it
rather scary if the *patch authors* document a known unsafe use case as
one of the known use-cases.

I got the problem which can occur with the specified use case. Removed the
wrong use case specified above.
Thanks for the review, please find the updated patch attached in the mail.

Patch is not getting compiled on Windows, I had made following changes:
a. updated the patch for resolving Windows build
b. few documentation changes in (pg_resetxlog.sgml) for spelling
mistake and minor line change
c. corrected year for Copyright in file pg_computemaxlsn.c
I am OK with this patch in its current form, modulo some grammar
issues in the documentation which I can fix before committing.
However, I'm unclear whether there was sufficient consensus to proceed
with this.

I would like to summarize in short about this patch.

This idea started with me proposing some solutions for cases where
user can recover from some of corruption scenario's, then Tom Lane had
suggested
this idea and I prepared a Patch for it, then you have given
suggestions about this patch at various phases during development.
Both Fujji Masao and Alvaro seems to be in favor of use case and
Alvaro has given few suggestions for patch as well. Muhammad Usama had
reviewed this patch.

It appears to me that Andres is not in favour of this Patch.

So, I think there are good number of people who had participated in
this patch and were in favour of this patch.

The above summarization is from what I remember about this Patch, so
if anybody feels that I have misunderstood, then kindly correct me.

Can others weigh in? If there is too much residual
unhappiness with this, then we should just mark this as Rejected and
stop wasting time on it; it can be pushed to PGXN or similar even if
we don't put it in core.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Robert Haas

robertmhaas@gmail.com

about 12 years ago

In reply to: Robert Haas (#2)

On Wed, Oct 9, 2013 at 2:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Sep 14, 2013 at 3:03 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Monday, July 08, 2013 5:16 PM Andres Freund wrote:
On 2013-07-08 17:10:43 +0530, Amit Kapila wrote:
On Monday, July 08, 2013 4:26 PM Andres Freund wrote:
On 2013-07-08 16:17:54 +0530, Hari Babu wrote:

+ This utility can also be used to decide whether backup is

required or not when the data page

+ in old-master precedes the last applied LSN in old-standby

(i.e., new-master) at the
+    moment of the failover.
+   </para>
+  </refsect1>
I don't think this is safe in any interesting set of cases. Am I
missing
something?
No, you are not missing anything. It can be only used to find max LSN in
database which can avoid further corruption
Why is the patch submitted documenting it as a use-case then? I find it
rather scary if the *patch authors* document a known unsafe use case as
one of the known use-cases.

I got the problem which can occur with the specified use case. Removed the
wrong use case specified above.
Thanks for the review, please find the updated patch attached in the mail.

Patch is not getting compiled on Windows, I had made following changes:
a. updated the patch for resolving Windows build
b. few documentation changes in (pg_resetxlog.sgml) for spelling
mistake and minor line change
c. corrected year for Copyright in file pg_computemaxlsn.c
I am OK with this patch in its current form, modulo some grammar
issues in the documentation which I can fix before committing.
However, I'm unclear whether there was sufficient consensus to proceed
with this. Can others weigh in? If there is too much residual
unhappiness with this, then we should just mark this as Rejected and
stop wasting time on it; it can be pushed to PGXN or similar even if
we don't put it in core.

I didn't hear any other votes. Anyone else have an opinion about
this? If I can't get a +1 from anyone who wasn't involved in writing
the patch, I'm inclined to think we don't have sufficient consensus to
commit this.

On further review of the patch, I also found a number of other issues
that I think need to be fixed before we could consider committing it:

- Per a previous request of mine, the patch has three different modes:
it can be run on an individual file, on a directory potentially
containing multiple relation files, or on an entire data directory.
This is not explained in the documentation.

- The patch skips printing an error if attempting to open a file
returns ENOENT. I don't see why that case shouldn't print an error.
Yeah, it could be legit if you're executing this against a running
server, but why are you doing that? And even if you are (e.g. for
corruption detection), printing a error message and proceeding makes
more sense than proceeding without printing anything, at least IMHO.

- Some of the static functions in this file preceed main and others
follow it. And they use different naming conventions. I suggest
putting all of them after main() and using names like check_data_dir,
check_data_file_name (instead of validateRelfilenodename; note that
the comment for that function is also incorrect),
find_max_lsn_in_file, find_max_lsn_in_directory,
find_max_lsn_in_pgdata.

- Since the patch goes to some pains to exit with 0 if no errors were
encountered and 1 if any were, that probably ought to be documented.
But I think instead of passing a "result" argument back up the call
stack it would be better to just keep a global variable called
"errors_encountered". When we encounter an error, bump that value.
Then you can just do exit(errors_encountered > 0 ? 1 : 0) and not
bother passing all of this stuff around via the return value. The
current behavior is inconsistent in another way, too: if you encounter
an error while scanning through a particular directory, you finish the
whole directory. But if multiple command-line arguments were passed,
you don't proceed to any subsequent command-line arguments. I think
you should continue always, which the above-mentioned change will take
care of basically for free.

- The description of the -q option is not very clear. A reader could
be forgiven for thinking that the option suppresses all output, which
would be quite useless, or at least left in doubt about what output
will still be provided.

A broader complaint I have with this patch is that it almost but
not-quite solves a problem I've had a few times in the past: namely,
searching through the data directory for data blocks which have LSNs
in the future. This has come up a few times for me, and this tool
would make it easier, because I'd be able to run it and look through
the output to see which relations have high max-LSN values. However,
it wouldn't be quite enough, because it'd only tell me about the block
with the highest LSN in each file, whereas what I'd really want to
find is every block with an LSN greater than some threshold value.
Maybe I'm pushing the envelope too much by trying to fit that into the
framework of this patch, but I can't help thinking we're not going to
want both pg_computemaxlsn and pg_findlsnsaftersomethreshold that are
95% the same code, so maybe we ought to rename the utility to
something slightly more generic than "pg_computemaxlsn".

In some ways this is almost a plausible shell for a generic
block-corruption checker. If we wanted to validate, for example, that
every page in a relation (or a whole data directory) had the expected
page version number, we'd want almost exactly this same code
structure, just with more checks added in.

I'm going to mark this "Returned with Feedback" in the CF app for now.
I still think that there's promise in this idea but, on further
reflection, committing what we have here now doesn't feel like the
right decision.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Alvaro Herrera

alvherre@2ndquadrant.com

about 12 years ago

In reply to: Robert Haas (#4)

Robert Haas escribiï¿½:

A broader complaint I have with this patch is that it almost but
not-quite solves a problem I've had a few times in the past: namely,
searching through the data directory for data blocks which have LSNs
in the future. This has come up a few times for me, and this tool
would make it easier, because I'd be able to run it and look through
the output to see which relations have high max-LSN values. However,
it wouldn't be quite enough, because it'd only tell me about the block
with the highest LSN in each file, whereas what I'd really want to
find is every block with an LSN greater than some threshold value.
Maybe I'm pushing the envelope too much by trying to fit that into the
framework of this patch, but I can't help thinking we're not going to
want both pg_computemaxlsn and pg_findlsnsaftersomethreshold that are
95% the same code, so maybe we ought to rename the utility to
something slightly more generic than "pg_computemaxlsn".

Perhaps not coincidentally, I had a need to do this recently. Perhaps
we should turn the utility into a generic tool to report existing LSNs,
with options to 1) report only the highest one in a given file, 2)
report only those that exceed some threshold. So maybe pg_reportlsn or
pg_extractlsn.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Amit Kapila

amit.kapila16@gmail.com

about 12 years ago

In reply to: Robert Haas (#4)

On Fri, Oct 18, 2013 at 9:01 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Oct 9, 2013 at 2:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Sep 14, 2013 at 3:03 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Monday, July 08, 2013 5:16 PM Andres Freund wrote:

On 2013-07-08 17:10:43 +0530, Amit Kapila wrote:

On Monday, July 08, 2013 4:26 PM Andres Freund wrote:

On 2013-07-08 16:17:54 +0530, Hari Babu wrote:

I am OK with this patch in its current form, modulo some grammar
issues in the documentation which I can fix before committing.
However, I'm unclear whether there was sufficient consensus to proceed
with this. Can others weigh in? If there is too much residual
unhappiness with this, then we should just mark this as Rejected and
stop wasting time on it; it can be pushed to PGXN or similar even if
we don't put it in core.

I didn't hear any other votes. Anyone else have an opinion about
this? If I can't get a +1 from anyone who wasn't involved in writing
the patch, I'm inclined to think we don't have sufficient consensus to
commit this.

On further review of the patch, I also found a number of other issues
that I think need to be fixed before we could consider committing it:

- Per a previous request of mine, the patch has three different modes:
it can be run on an individual file, on a directory potentially
containing multiple relation files, or on an entire data directory.
This is not explained in the documentation.

There is some explanation about it, but I think you want to see in
different format or wording.
I will change it to explain it more clearly in next update of this patch.

+  <title>Description</title>
+  <para>
+   <command>pg_computemaxlsn</command> computes maximun LSN from database pages
+   in the specified list of files or directories.
+  </para>
+
+  <para>
+   If user doesn't provide the file or directory to find the max lsn then
+   <command>pg_computemaxlsn</command> use the environment variable
<envar>PGDATA</>
+   if exists otherwise reports an error.
+  </para>

- The patch skips printing an error if attempting to open a file
returns ENOENT. I don't see why that case shouldn't print an error.
Yeah, it could be legit if you're executing this against a running
server, but why are you doing that? And even if you are (e.g. for
corruption detection), printing a error message and proceeding makes
more sense than proceeding without printing anything, at least IMHO.

- Some of the static functions in this file preceed main and others
follow it. And they use different naming conventions. I suggest
putting all of them after main() and using names like check_data_dir,
check_data_file_name (instead of validateRelfilenodename; note that
the comment for that function is also incorrect),
find_max_lsn_in_file, find_max_lsn_in_directory,
find_max_lsn_in_pgdata.

- Since the patch goes to some pains to exit with 0 if no errors were
encountered and 1 if any were, that probably ought to be documented.
But I think instead of passing a "result" argument back up the call
stack it would be better to just keep a global variable called
"errors_encountered". When we encounter an error, bump that value.
Then you can just do exit(errors_encountered > 0 ? 1 : 0) and not
bother passing all of this stuff around via the return value. The
current behavior is inconsistent in another way, too: if you encounter
an error while scanning through a particular directory, you finish the
whole directory. But if multiple command-line arguments were passed,
you don't proceed to any subsequent command-line arguments. I think
you should continue always, which the above-mentioned change will take
care of basically for free.

- The description of the -q option is not very clear. A reader could
be forgiven for thinking that the option suppresses all output, which
would be quite useless, or at least left in doubt about what output
will still be provided.

Thank you for your feedback.
I will update it in next version of patch if there is a consensus to
proceed for this utility.

A broader complaint I have with this patch is that it almost but
not-quite solves a problem I've had a few times in the past: namely,
searching through the data directory for data blocks which have LSNs
in the future. This has come up a few times for me, and this tool
would make it easier, because I'd be able to run it and look through
the output to see which relations have high max-LSN values. However,
it wouldn't be quite enough, because it'd only tell me about the block
with the highest LSN in each file, whereas what I'd really want to
find is every block with an LSN greater than some threshold value.
Maybe I'm pushing the envelope too much by trying to fit that into the
framework of this patch, but I can't help thinking we're not going to
want both pg_computemaxlsn and pg_findlsnsaftersomethreshold that are
95% the same code, so maybe we ought to rename the utility to
something slightly more generic than "pg_computemaxlsn".

In some ways this is almost a plausible shell for a generic
block-corruption checker. If we wanted to validate, for example, that
every page in a relation (or a whole data directory) had the expected
page version number, we'd want almost exactly this same code
structure, just with more checks added in.

Okay, I have some idea to make the use case of this utility bit
broader, so that it can be helpful in many more cases to detect
corruption.
This utility can be extended to validate database in below ways:
1. It will validate the blocks based on checksum, if database has checksums
2. Can validate the version number in pages
3. Can validate if all data files exist
4. Can validate if block contains all zeros
5. It can validate the blocks based on whether rows are intact in a
block, doesn't have clear idea at this point for how to achieve it.
6. It can report empty pages in a file.

more ideas can be thought of to validate the database, but I think we
can start with something minimal like
a. its current usage of computing LSN and extend it find LSN greater
than threshold
b. validate the checksum

I'm going to mark this "Returned with Feedback" in the CF app for now.
I still think that there's promise in this idea but, on further
reflection, committing what we have here now doesn't feel like the
right decision.

Thank you very much for giving valuable suggestions and supporting the idea.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Amit Kapila

amit.kapila16@gmail.com

about 12 years ago

In reply to: Alvaro Herrera (#5)

On Fri, Oct 18, 2013 at 9:24 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Robert Haas escribió:

A broader complaint I have with this patch is that it almost but
not-quite solves a problem I've had a few times in the past: namely,
searching through the data directory for data blocks which have LSNs
in the future. This has come up a few times for me, and this tool
would make it easier, because I'd be able to run it and look through
the output to see which relations have high max-LSN values. However,
it wouldn't be quite enough, because it'd only tell me about the block
with the highest LSN in each file, whereas what I'd really want to
find is every block with an LSN greater than some threshold value.
Maybe I'm pushing the envelope too much by trying to fit that into the
framework of this patch, but I can't help thinking we're not going to
want both pg_computemaxlsn and pg_findlsnsaftersomethreshold that are
95% the same code, so maybe we ought to rename the utility to
something slightly more generic than "pg_computemaxlsn".

Perhaps not coincidentally, I had a need to do this recently. Perhaps
we should turn the utility into a generic tool to report existing LSNs,
with options to 1) report only the highest one in a given file, 2)
report only those that exceed some threshold. So maybe pg_reportlsn or
pg_extractlsn.

How about extending it validate database in more meaningful way and
name it as validatedb.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers