[PATCH] Better logging of COPY queries if log_statement='all'

Started by Aleksander Alekseevabout 9 years ago13 messages
#1Aleksander Alekseev
a.alekseev@postgrespro.ru
1 attachment(s)

Hello.

Sometimes it's useful to log content of files used in COPY ... TO ... and
COPY ... FROM ... queries. Unfortunately PostgreSQL doesn't allow to do
it, even if log_statement='all'. Suggested patch fixes this.

Log example:

```
LOG: statement: create table test (k int, v text);
LOG: statement: insert into test values (111, 'aaa'), (222, 'bbb');
LOG: statement: copy test to '/tmp/copy.txt';
LOG: statement: 111 aaa
LOG: statement: 222 bbb
LOG: statement: delete from test;
LOG: statement: copy test from '/tmp/copy.txt';
LOG: statement: 111 aaa
LOG: statement: 222 bbb
```
--
Best regards,
Aleksander Alekseev

Attachments:

log-statement-all-for-copy.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 5947e72..82b3a18 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -331,6 +331,38 @@ static bool CopyGetInt16(CopyState cstate, int16 *val);
 
 
 /*
+ * Logs file content during COPY ... FROM / COPY ... TO execution if
+ * log_statement = 'all'.
+ */
+static void
+CopyLogStatement(const char* str, bool flush)
+{
+	static StringInfo logString = NULL;
+
+	if(log_statement != LOGSTMT_ALL)
+		return;
+
+	if(logString == NULL)
+	{
+		MemoryContext oldctx = MemoryContextSwitchTo(TopMemoryContext);
+		logString = makeStringInfo();
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	appendStringInfoString(logString, str);
+
+	if(flush)
+	{
+		ereport(LOG,
+				(errmsg("statement: %s", logString->data),
+				 errhidestmt(true),
+				 errhidecontext(true)));
+
+		resetStringInfo(logString);
+	}
+}
+
+/*
  * Send copy start/stop messages for frontend copies.  These have changed
  * in past protocol redesigns.
  */
@@ -2045,14 +2077,20 @@ CopyOneRowTo(CopyState cstate, Oid tupleOid, Datum *values, bool *nulls)
 		if (!cstate->binary)
 		{
 			if (need_delim)
+			{
 				CopySendChar(cstate, cstate->delim[0]);
+				CopyLogStatement(cstate->delim, false);
+			}
 			need_delim = true;
 		}
 
 		if (isnull)
 		{
 			if (!cstate->binary)
+			{
 				CopySendString(cstate, cstate->null_print_client);
+				CopyLogStatement(cstate->null_print_client, false);
+			}
 			else
 				CopySendInt32(cstate, -1);
 		}
@@ -2062,6 +2100,9 @@ CopyOneRowTo(CopyState cstate, Oid tupleOid, Datum *values, bool *nulls)
 			{
 				string = OutputFunctionCall(&out_functions[attnum - 1],
 											value);
+
+				CopyLogStatement(string, false);
+
 				if (cstate->csv_mode)
 					CopyAttributeOutCSV(cstate, string,
 										cstate->force_quote_flags[attnum - 1],
@@ -2083,6 +2124,7 @@ CopyOneRowTo(CopyState cstate, Oid tupleOid, Datum *values, bool *nulls)
 	}
 
 	CopySendEndOfRow(cstate);
+	CopyLogStatement("", true);
 
 	MemoryContextSwitchTo(oldcontext);
 }
@@ -2914,6 +2956,8 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
 	if (done && cstate->line_buf.len == 0)
 		return false;
 
+	CopyLogStatement(cstate->line_buf.data, true);
+
 	/* Parse the line into de-escaped field values */
 	if (cstate->csv_mode)
 		fldct = CopyReadAttributesCSV(cstate);
#2Andrew Dunstan
andrew@dunslane.net
In reply to: Aleksander Alekseev (#1)
Re: [PATCH] Better logging of COPY queries if log_statement='all'

On 10/17/2016 09:57 AM, Aleksander Alekseev wrote:

Hello.

Sometimes it's useful to log content of files used in COPY ... TO ... and
COPY ... FROM ... queries. Unfortunately PostgreSQL doesn't allow to do
it, even if log_statement='all'. Suggested patch fixes this.

Log example:

```
LOG: statement: create table test (k int, v text);
LOG: statement: insert into test values (111, 'aaa'), (222, 'bbb');
LOG: statement: copy test to '/tmp/copy.txt';
LOG: statement: 111 aaa
LOG: statement: 222 bbb
LOG: statement: delete from test;
LOG: statement: copy test from '/tmp/copy.txt';
LOG: statement: 111 aaa
LOG: statement: 222 bbb
```

I'm not in favor of this, especially if it's not even optional.
log_statement is about logging, er, statements, not logging data.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#2)
Re: [PATCH] Better logging of COPY queries if log_statement='all'

Andrew Dunstan <andrew@dunslane.net> writes:

On 10/17/2016 09:57 AM, Aleksander Alekseev wrote:

Sometimes it's useful to log content of files used in COPY ... TO ... and
COPY ... FROM ... queries. Unfortunately PostgreSQL doesn't allow to do
it, even if log_statement='all'. Suggested patch fixes this.

I'm not in favor of this, especially if it's not even optional.

I'm not either. It sounds good when you're looking at toy examples,
but not when it means repeating gigabytes of COPY data into the log.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Aleksander Alekseev
a.alekseev@postgrespro.ru
In reply to: Tom Lane (#3)
Re: [PATCH] Better logging of COPY queries if log_statement='all'

I'm not in favor of this, especially if it's not even optional.

I'm not either. It sounds good when you're looking at toy examples,
but not when it means repeating gigabytes of COPY data into the log.

I understand your concern. Perhaps we could create and additional
parameter for enabling/disabling this feature or a new log_statement
value, or maybe both - i.e. rename log_statement and add a new possible
value?

According to my colleagues it would be very nice to have this feature.
For instance, if you are trying to optimize PostgreSQL for application
that uses COPY and you don't have access to or something like this.
It could also be useful in some other cases.

This feature is very simple and easy to maintain. I'm sure we could find
a solution that will make happy both developers and users.

--
Best regards,
Aleksander Alekseev

#5Stephen Frost
sfrost@snowman.net
In reply to: Aleksander Alekseev (#4)
Re: [PATCH] Better logging of COPY queries if log_statement='all'

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

On 10/17/2016 09:57 AM, Aleksander Alekseev wrote:

Sometimes it's useful to log content of files used in COPY ... TO ... and
COPY ... FROM ... queries. Unfortunately PostgreSQL doesn't allow to do
it, even if log_statement='all'. Suggested patch fixes this.

I'm not in favor of this, especially if it's not even optional.

I'm not either. It sounds good when you're looking at toy examples,
but not when it means repeating gigabytes of COPY data into the log.

This isn't new- I've seen many cases of people happily loading gigabytes
of data via INSERT with log_statement='all' on. What I don't like is
the idea of springing this change on people.

* Aleksander Alekseev (a.alekseev@postgrespro.ru) wrote:

I understand your concern. Perhaps we could create and additional
parameter for enabling/disabling this feature or a new log_statement
value, or maybe both - i.e. rename log_statement and add a new possible
value?

Ugh. Adding more options to log_statement is just an ugly route to go
in, in my view. We really need a better solution here.

According to my colleagues it would be very nice to have this feature.
For instance, if you are trying to optimize PostgreSQL for application
that uses COPY and you don't have access to or something like this.
It could also be useful in some other cases.

This use-case doesn't really make much sense to me. Can you explain it
in more detail? Is the goal here to replicate all of the statements
that are changing data in the database?

Thanks!

Stephen

#6Daniel Verite
daniel@manitou-mail.org
In reply to: Aleksander Alekseev (#4)
Re: [PATCH] Better logging of COPY queries if log_statement='all'

Aleksander Alekseev wrote:

According to my colleagues it would be very nice to have this feature.
For instance, if you are trying to optimize PostgreSQL for application
that uses COPY and you don't have access to or something like this.
It could also be useful in some other cases.

Outside of the app, what can be already set up is an AFTER INSERT
FOR EACH ROW trigger that essentially does:

raise LOG, '%', NEW;

The main drawback of this approach is that, for each line of data
emitted to the log, there's another STATEMENT: copy... line added.
But that might be not too bad for certain uses.

Ideally we should be able to access the new rowset as a whole through
a statement-level trigger. In that case, the data could be logged in
a one-shot operation by that trigger.
There's a related item in the TODO list:
"Allow statement-level triggers to access modified rows"
and an old thread on -hackers here:
/messages/by-id/20060522150647.GE24404@svana.org
that discussed this topic in relation to MSSQL having this functionality.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Aleksander Alekseev
a.alekseev@postgrespro.ru
In reply to: Stephen Frost (#5)
Re: [PATCH] Better logging of COPY queries if log_statement='all'

According to my colleagues it would be very nice to have this feature.
For instance, if you are trying to optimize PostgreSQL for application
that uses COPY and you don't have access to or something like this.
It could also be useful in some other cases.

This use-case doesn't really make much sense to me. Can you explain it
in more detail? Is the goal here to replicate all of the statements
that are changing data in the database?

The idea is to record application workload in real environment and write
a benchmark based on this record. Then using this benchmark we could try
different OS/DBMS configuration (or maybe hardware), find an extremum,
then change configuration in production environment.

It's not always possible to change an application or even database (e.g.
to use triggers) for this purpose. For instance, if DBMS is provided as
a service.

Currently PostgreSQL allows to record all workload _except_ COPY
queries. Considering how easily it could be done I think it's wrong.
Basically the only real question here is how it should look like in
postgresql.conf.

--
Best regards,
Aleksander Alekseev

#8Aleksander Alekseev
a.alekseev@postgrespro.ru
In reply to: Aleksander Alekseev (#7)
1 attachment(s)
File content logging during execution of COPY queries (was: Better logging of COPY queries if log_statement='all')

According to my colleagues it would be very nice to have this feature.
For instance, if you are trying to optimize PostgreSQL for application
that uses COPY and you don't have access to or something like this.
It could also be useful in some other cases.

This use-case doesn't really make much sense to me. Can you explain it
in more detail? Is the goal here to replicate all of the statements
that are changing data in the database?

The idea is to record application workload in real environment and write
a benchmark based on this record. Then using this benchmark we could try
different OS/DBMS configuration (or maybe hardware), find an extremum,
then change configuration in production environment.

It's not always possible to change an application or even database (e.g.
to use triggers) for this purpose. For instance, if DBMS is provided as
a service.

Currently PostgreSQL allows to record all workload _except_ COPY
queries. Considering how easily it could be done I think it's wrong.
Basically the only real question here is how it should look like in
postgresql.conf.

OK, how about introducing a new boolean parameter named log_copy?
Corresponding patch is attached.

--
Best regards,
Aleksander Alekseev

Attachments:

log-copy-v2.patchtext/x-diff; charset=us-asciiDownload
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8c25b45..84a7542 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5205,6 +5205,20 @@ FROM pg_stat_activity;
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-log-copy" xreflabel="log_copy">
+      <term><varname>log_copy</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>log_copy</> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Controls whether file content is logged during execution of
+        COPY queries.  The default is <literal>off</>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-log-replication-commands" xreflabel="log_replication_commands">
       <term><varname>log_replication_commands</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 5947e72..1863e27 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -331,6 +331,38 @@ static bool CopyGetInt16(CopyState cstate, int16 *val);
 
 
 /*
+ * Logs file content during COPY ... FROM / COPY ... TO execution if
+ * log_copy = 'on'.
+ */
+static void
+CopyLogStatement(const char* str, bool flush)
+{
+	static StringInfo logString = NULL;
+
+	if(log_copy == false)
+		return;
+
+	if(logString == NULL)
+	{
+		MemoryContext oldctx = MemoryContextSwitchTo(TopMemoryContext);
+		logString = makeStringInfo();
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	appendStringInfoString(logString, str);
+
+	if(flush)
+	{
+		ereport(LOG,
+				(errmsg("statement: %s", logString->data),
+				 errhidestmt(true),
+				 errhidecontext(true)));
+
+		resetStringInfo(logString);
+	}
+}
+
+/*
  * Send copy start/stop messages for frontend copies.  These have changed
  * in past protocol redesigns.
  */
@@ -2045,14 +2077,20 @@ CopyOneRowTo(CopyState cstate, Oid tupleOid, Datum *values, bool *nulls)
 		if (!cstate->binary)
 		{
 			if (need_delim)
+			{
 				CopySendChar(cstate, cstate->delim[0]);
+				CopyLogStatement(cstate->delim, false);
+			}
 			need_delim = true;
 		}
 
 		if (isnull)
 		{
 			if (!cstate->binary)
+			{
 				CopySendString(cstate, cstate->null_print_client);
+				CopyLogStatement(cstate->null_print_client, false);
+			}
 			else
 				CopySendInt32(cstate, -1);
 		}
@@ -2062,6 +2100,9 @@ CopyOneRowTo(CopyState cstate, Oid tupleOid, Datum *values, bool *nulls)
 			{
 				string = OutputFunctionCall(&out_functions[attnum - 1],
 											value);
+
+				CopyLogStatement(string, false);
+
 				if (cstate->csv_mode)
 					CopyAttributeOutCSV(cstate, string,
 										cstate->force_quote_flags[attnum - 1],
@@ -2083,6 +2124,7 @@ CopyOneRowTo(CopyState cstate, Oid tupleOid, Datum *values, bool *nulls)
 	}
 
 	CopySendEndOfRow(cstate);
+	CopyLogStatement("", true);
 
 	MemoryContextSwitchTo(oldcontext);
 }
@@ -2914,6 +2956,8 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
 	if (done && cstate->line_buf.len == 0)
 		return false;
 
+	CopyLogStatement(cstate->line_buf.data, true);
+
 	/* Parse the line into de-escaped field values */
 	if (cstate->csv_mode)
 		fldct = CopyReadAttributesCSV(cstate);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index bc9d33f..0f035ac 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -415,6 +415,8 @@ bool		log_planner_stats = false;
 bool		log_executor_stats = false;
 bool		log_statement_stats = false;		/* this is sort of all three
 												 * above together */
+bool		log_copy = false;
+
 bool		log_btree_build_stats = false;
 char	   *event_source;
 
@@ -1161,6 +1163,15 @@ static struct config_bool ConfigureNamesBool[] =
 		false,
 		check_log_stats, NULL, NULL
 	},
+	{
+		{"log_copy", PGC_SUSET, STATS_MONITORING,
+			gettext_noop("Writes file content during COPY queries to the server log."),
+			NULL
+		},
+		&log_copy,
+		false,
+		NULL, NULL, NULL
+	},
 #ifdef BTREE_BUILD_STATS
 	{
 		{"log_btree_build_stats", PGC_SUSET, DEVELOPER_OPTIONS,
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index e1de1a5..4f25331 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -240,6 +240,7 @@ extern bool log_parser_stats;
 extern bool log_planner_stats;
 extern bool log_executor_stats;
 extern bool log_statement_stats;
+extern bool log_copy;
 extern bool log_btree_build_stats;
 
 extern PGDLLIMPORT bool check_function_bodies;
#9Stephen Frost
sfrost@snowman.net
In reply to: Aleksander Alekseev (#8)
Re: File content logging during execution of COPY queries (was: Better logging of COPY queries if log_statement='all')

Aleksander,

* Aleksander Alekseev (a.alekseev@postgrespro.ru) wrote:

The idea is to record application workload in real environment and write
a benchmark based on this record. Then using this benchmark we could try
different OS/DBMS configuration (or maybe hardware), find an extremum,
then change configuration in production environment.

It's not always possible to change an application or even database (e.g.
to use triggers) for this purpose. For instance, if DBMS is provided as
a service.

Currently PostgreSQL allows to record all workload _except_ COPY
queries. Considering how easily it could be done I think it's wrong.
Basically the only real question here is how it should look like in
postgresql.conf.

OK, how about introducing a new boolean parameter named log_copy?
Corresponding patch is attached.

The parameter would be better as 'log_copy_data', I believe. The actual
COPY command is already logged with just 'log_statement = all', of
course.

Also..

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8c25b45..84a7542 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5205,6 +5205,20 @@ FROM pg_stat_activity;
</listitem>
</varlistentry>
+     <varlistentry id="guc-log-copy" xreflabel="log_copy">
+      <term><varname>log_copy</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>log_copy</> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Controls whether file content is logged during execution of
+        COPY queries.  The default is <literal>off</>.
+       </para>
+      </listitem>
+     </varlistentry>

"file" isn't accurate here and I don't know that it actually makes sense
to log "COPY TO" data- we don't log the results of SELECT statements,
after all, and the use-case you outlined above (which I generally agree
is one we should consider) doesn't have any need for the data of "COPY
TO" statements to be in the log, it seems to me.

Can you elaborate on why we would want to log the data sent to the
client with a COPY TO command. If there is a reason, why wouldn't we
want to support that for SELECT and ... RETURNING commands too?

Thanks!

Stephen

#10Grigory Smolkin
g.smolkin@postgrespro.ru
In reply to: Aleksander Alekseev (#8)
Re: File content logging during execution of COPY queries

On 10/20/2016 12:36 PM, Aleksander Alekseev wrote:

According to my colleagues it would be very nice to have this feature.
For instance, if you are trying to optimize PostgreSQL for application
that uses COPY and you don't have access to or something like this.
It could also be useful in some other cases.

This use-case doesn't really make much sense to me. Can you explain it
in more detail? Is the goal here to replicate all of the statements
that are changing data in the database?

The idea is to record application workload in real environment and write
a benchmark based on this record. Then using this benchmark we could try
different OS/DBMS configuration (or maybe hardware), find an extremum,
then change configuration in production environment.

It's not always possible to change an application or even database (e.g.
to use triggers) for this purpose. For instance, if DBMS is provided as
a service.

Currently PostgreSQL allows to record all workload _except_ COPY
queries. Considering how easily it could be done I think it's wrong.
Basically the only real question here is how it should look like in
postgresql.conf.

OK, how about introducing a new boolean parameter named log_copy?
Corresponding patch is attached.

This is a useful feature I was waiting for some time.
If some application which workload you want to collect is using COPY
statement, then recording network traffic was your only option.

--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#11Stephen Frost
sfrost@snowman.net
In reply to: Grigory Smolkin (#10)
Re: File content logging during execution of COPY queries

Grigory,

* Grigory Smolkin (g.smolkin@postgrespro.ru) wrote:

On 10/20/2016 12:36 PM, Aleksander Alekseev wrote:

According to my colleagues it would be very nice to have this feature.
For instance, if you are trying to optimize PostgreSQL for application
that uses COPY and you don't have access to or something like this.
It could also be useful in some other cases.

This use-case doesn't really make much sense to me. Can you explain it
in more detail? Is the goal here to replicate all of the statements
that are changing data in the database?

The idea is to record application workload in real environment and write
a benchmark based on this record. Then using this benchmark we could try
different OS/DBMS configuration (or maybe hardware), find an extremum,
then change configuration in production environment.

It's not always possible to change an application or even database (e.g.
to use triggers) for this purpose. For instance, if DBMS is provided as
a service.

Currently PostgreSQL allows to record all workload _except_ COPY
queries. Considering how easily it could be done I think it's wrong.
Basically the only real question here is how it should look like in
postgresql.conf.

OK, how about introducing a new boolean parameter named log_copy?
Corresponding patch is attached.

This is a useful feature I was waiting for some time.
If some application which workload you want to collect is using COPY
statement, then recording network traffic was your only option.

As I pointed out to Aleksander, you would still need to record network
traffic to see all of the data going to and from the database since we
do not include SELECT or ... RETURNING results in the log files. If
that is needed then that's a whole different discussion.

Thanks!

Stephen

#12Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Aleksander Alekseev (#7)
Re: [PATCH] Better logging of COPY queries if log_statement='all'

Hi Dmitry,

This is a gentle reminder.

you assigned as reviewer to the current patch in the 11-2016 commitfest.
But you haven't shared your review yet. Can you please try to share your
views
about the patch. This will help us in smoother operation of commitfest.

some people are against to the current patch approach. If you can share your
views on the use case and etc, it will be helpful. If you are not agreed
with the
approach similar like others, better reject the patch.

Please Ignore if you already shared your review.

Regards,
Hari Babu
Fujitsu Australia

#13Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Haribabu Kommi (#12)
Re: [PATCH] Better logging of COPY queries if log_statement='all'

On Thu, Nov 17, 2016 at 12:57 PM, Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

Hi Dmitry,

This is a gentle reminder.

you assigned as reviewer to the current patch in the 11-2016 commitfest.
But you haven't shared your review yet. Can you please try to share your
views
about the patch. This will help us in smoother operation of commitfest.

some people are against to the current patch approach. If you can share
your
views on the use case and etc, it will be helpful. If you are not agreed
with the
approach similar like others, better reject the patch.

Please Ignore if you already shared your review.

Closed in 2016-11 commitfest with "rejected" status.

Regards,
Hari Babu
Fujitsu Australia