[PATCH] Initial progress reporting for COPY command
Hello, as proposed by Pavel Stěhule and discussed on local czech PostgreSQL
maillist (
https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email&utm_source=footer),
I have prepared an initial patch for COPY command progress reporting.
Few examples first:
"COPY (SELECT * FROM test) TO '/tmp/ids';"
yr=# SELECT * from pg_stat_progress_copy;
pid | datid | datname | relid | direction | file | program |
lines_processed | file_bytes_processed
---------+-------+---------+-------+-----------+------+---------+-----------------+----------------------
3347126 | 16384 | yr | 0 | TO | t | f |
3529943 | 24906226
(1 row)
"COPY test FROM '/tmp/ids';
yr=# SELECT * from pg_stat_progress_copy;
pid | datid | datname | relid | direction | file | program |
lines_processed | file_bytes_processed
---------+-------+---------+-------+-----------+------+---------+-----------------+----------------------
3347126 | 16384 | yr | 16385 | FROM | t | f |
121591999 | 957218816
(1 row)
Columns are inspired by CREATE INDEX progress report system view.
pid - integer - PID of backend
datid - oid - OID of related database
datname - name - name of related database (this seems redundant, since oid
should be enough, but it is the same in CREATE INDEX)
relid - oid - oid of table related to COPY command, when not known (for
example when copying to file, it is 0)
direction - text - one of "FROM" or "TO" depends on command used
file - bool - is file is used?
program - bool - is program used?
lines_processed - bigint - amount of processed lines, works for both
directions (FROM/TO)
file_bytes_processed - amount of bytes processed when file is used
(otherwise 0), works for both direction (
FROM/TO) when file is used (file = t)
Patch is attached and can be found also at
https://github.com/simi/postgres/pull/5.
Diff version: https://github.com/simi/postgres/pull/5.diff
Patch version: https://github.com/simi/postgres/pull/5.patch
I havefew initial notes and questions.
I'm using ftell to get current position in file to populate
file_bytes_processed without error handling (ftell can return -1L and also
populate errno on problems).
1. Is that a good way to get progress of file processing?
2. Is it safe in given context to not care about errors? If not, what to do
on error?
Some columns are not populated on certain COPY commands. For example when a
file is not used, file_bytes_processed is set to 0. Would it be better to
use NULL instead when the column is not related to the current command?
Same problem is for relid column.
I have not found any tests for progress reporting. Are there any? It would
need two backends running (one running COPY, one checking output of report
view). Is there any similar test I can inspire at? In theory, it should be
possible to use dblink_send_query to run async COPY command in the
background.
My initial (attached) patch also doesn't introduce documentation for this
system view. I can add that later once this patch is finalized (if that
happens).
Attachments:
copy-progress-v1.patchapplication/octet-stream; name=copy-progress-v1.patchDownload
From f8ed06cfee691d9ba9602a93113b9a4debe97b5b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Josef=20=C5=A0im=C3=A1nek?= <josef.simanek@gmail.com>
Date: Mon, 8 Jun 2020 02:00:53 +0200
Subject: [PATCH 1/2] Initial work on COPY progress.
---
src/backend/commands/copy.c | 13 ++++++++++---
src/backend/utils/adt/pgstatfuncs.c | 2 ++
src/include/commands/progress.h | 3 +++
src/include/pgstat.h | 3 ++-
4 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6d53dc463c18..6c66a1631a40 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -29,6 +29,7 @@
#include "catalog/pg_type.h"
#include "commands/copy.h"
#include "commands/defrem.h"
+#include "commands/progress.h"
#include "commands/trigger.h"
#include "executor/execPartition.h"
#include "executor/executor.h"
@@ -45,6 +46,7 @@
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
+#include "pgstat.h"
#include "port/pg_bswap.h"
#include "rewrite/rewriteHandler.h"
#include "storage/fd.h"
@@ -1752,6 +1754,9 @@ BeginCopy(ParseState *pstate,
cstate->copy_dest = COPY_FILE; /* default */
+ pgstat_progress_start_command(PROGRESS_COMMAND_COPY, queryRelId);
+ pgstat_progress_update_param(PROGRESS_COPY_PROCESSED,0);
+
MemoryContextSwitchTo(oldcontext);
return cstate;
@@ -1811,6 +1816,8 @@ EndCopy(CopyState cstate)
cstate->filename)));
}
+ pgstat_progress_end_command();
+
MemoryContextDelete(cstate->copycontext);
pfree(cstate);
}
@@ -2123,7 +2130,7 @@ CopyTo(CopyState cstate)
/* Format and send the data */
CopyOneRowTo(cstate, slot);
- processed++;
+ pgstat_progress_update_param(PROGRESS_COPY_PROCESSED, ++processed);
}
ExecDropSingleTupleTableSlot(slot);
@@ -3262,7 +3269,7 @@ CopyFrom(CopyState cstate)
* or FDW; this is the same definition used by nodeModifyTable.c
* for counting tuples inserted by an INSERT command.
*/
- processed++;
+ pgstat_progress_update_param(PROGRESS_COPY_PROCESSED, ++processed);
}
}
@@ -5119,7 +5126,7 @@ copy_dest_receive(TupleTableSlot *slot, DestReceiver *self)
/* Send the data */
CopyOneRowTo(cstate, slot);
- myState->processed++;
+ pgstat_progress_update_param(PROGRESS_COPY_PROCESSED, ++myState->processed);
return true;
}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 2aff739466ff..b740eef7c102 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -494,6 +494,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
cmdtype = PROGRESS_COMMAND_CREATE_INDEX;
else if (pg_strcasecmp(cmd, "BASEBACKUP") == 0)
cmdtype = PROGRESS_COMMAND_BASEBACKUP;
+ else if (pg_strcasecmp(cmd, "COPY") == 0)
+ cmdtype = PROGRESS_COMMAND_COPY;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 36b073e67757..2d72f12a75b5 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -133,4 +133,7 @@
#define PROGRESS_BASEBACKUP_PHASE_WAIT_WAL_ARCHIVE 4
#define PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL 5
+/* Commands of PROGRESS_CLUSTER */
+#define PROGRESS_COPY_PROCESSED 0
+
#endif
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index c55dc1481ca5..45b8006ff2b8 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -994,7 +994,8 @@ typedef enum ProgressCommandType
PROGRESS_COMMAND_ANALYZE,
PROGRESS_COMMAND_CLUSTER,
PROGRESS_COMMAND_CREATE_INDEX,
- PROGRESS_COMMAND_BASEBACKUP
+ PROGRESS_COMMAND_BASEBACKUP,
+ PROGRESS_COMMAND_COPY
} ProgressCommandType;
#define PGSTAT_NUM_PROGRESS_PARAM 20
From bb528a21dcf8d12b06961d8fd591d3ac3bbf2c5b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Josef=20=C5=A0im=C3=A1nek?= <josef.simanek@gmail.com>
Date: Sun, 14 Jun 2020 02:46:04 +0200
Subject: [PATCH 2/2] Enhance copy progress with more info. - add
pg_stat_progress_copy system view
---
src/backend/catalog/system_views.sql | 14 ++++++++++++++
src/backend/commands/copy.c | 16 +++++++++++-----
src/include/commands/progress.h | 8 ++++++--
src/test/regress/expected/rules.out | 15 +++++++++++++++
4 files changed, 46 insertions(+), 7 deletions(-)
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 56420bbc9d6f..05f995b43937 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1093,6 +1093,20 @@ CREATE VIEW pg_stat_progress_basebackup AS
S.param5 AS tablespaces_streamed
FROM pg_stat_get_progress_info('BASEBACKUP') AS S;
+CREATE VIEW pg_stat_progress_copy AS
+ SELECT
+ S.pid AS pid, S.datid AS datid, D.datname AS datname,
+ S.relid AS relid,
+ CASE S.param1 WHEN 0 THEN 'TO'
+ WHEN 1 THEN 'FROM'
+ END as direction,
+ CAST (S.param2::integer AS bool) AS file,
+ CAST (S.param3::integer AS bool) AS program,
+ S.param4 AS lines_processed,
+ S.param5 AS file_bytes_processed
+ FROM pg_stat_get_progress_info('COPY') AS S
+ LEFT JOIN pg_database D ON S.datid = D.oid;
+
CREATE VIEW pg_user_mappings AS
SELECT
U.oid AS umid,
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6c66a1631a40..3462e4f414d1 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -566,6 +566,7 @@ CopySendEndOfRow(CopyState cstate)
(errcode_for_file_access(),
errmsg("could not write to COPY file: %m")));
}
+ pgstat_progress_update_param(PROGRESS_COPY_BYTES_PROCESSED, ftell(cstate->copy_file));
break;
case COPY_OLD_FE:
/* The FE/BE protocol uses \n as newline for all platforms */
@@ -618,6 +619,7 @@ CopyGetData(CopyState cstate, void *databuf, int minread, int maxread)
{
case COPY_FILE:
bytesread = fread(databuf, 1, maxread, cstate->copy_file);
+ pgstat_progress_update_param(PROGRESS_COPY_BYTES_PROCESSED, ftell(cstate->copy_file));
if (ferror(cstate->copy_file))
ereport(ERROR,
(errcode_for_file_access(),
@@ -1754,8 +1756,12 @@ BeginCopy(ParseState *pstate,
cstate->copy_dest = COPY_FILE; /* default */
- pgstat_progress_start_command(PROGRESS_COMMAND_COPY, queryRelId);
- pgstat_progress_update_param(PROGRESS_COPY_PROCESSED,0);
+ pgstat_progress_start_command(PROGRESS_COMMAND_COPY, cstate->rel ? RelationGetRelid(cstate->rel) : InvalidOid);
+
+ pgstat_progress_update_param(PROGRESS_COPY_BYTES_PROCESSED, 0);
+ pgstat_progress_update_param(PROGRESS_COPY_IS_FROM, (int) cstate->is_copy_from);
+ pgstat_progress_update_param(PROGRESS_COPY_IS_FILE, (int) cstate->copy_dest == COPY_FILE);
+ pgstat_progress_update_param(PROGRESS_COPY_IS_PROGRAM, (int) cstate->is_program);
MemoryContextSwitchTo(oldcontext);
@@ -2130,7 +2136,7 @@ CopyTo(CopyState cstate)
/* Format and send the data */
CopyOneRowTo(cstate, slot);
- pgstat_progress_update_param(PROGRESS_COPY_PROCESSED, ++processed);
+ pgstat_progress_update_param(PROGRESS_COPY_LINES_PROCESSED, ++processed);
}
ExecDropSingleTupleTableSlot(slot);
@@ -3269,7 +3275,7 @@ CopyFrom(CopyState cstate)
* or FDW; this is the same definition used by nodeModifyTable.c
* for counting tuples inserted by an INSERT command.
*/
- pgstat_progress_update_param(PROGRESS_COPY_PROCESSED, ++processed);
+ pgstat_progress_update_param(PROGRESS_COPY_LINES_PROCESSED, ++processed);
}
}
@@ -5126,7 +5132,7 @@ copy_dest_receive(TupleTableSlot *slot, DestReceiver *self)
/* Send the data */
CopyOneRowTo(cstate, slot);
- pgstat_progress_update_param(PROGRESS_COPY_PROCESSED, ++myState->processed);
+ pgstat_progress_update_param(PROGRESS_COPY_LINES_PROCESSED, ++myState->processed);
return true;
}
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 2d72f12a75b5..3947222e6f61 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -133,7 +133,11 @@
#define PROGRESS_BASEBACKUP_PHASE_WAIT_WAL_ARCHIVE 4
#define PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL 5
-/* Commands of PROGRESS_CLUSTER */
-#define PROGRESS_COPY_PROCESSED 0
+/* Commands of PROGRESS_COPY */
+#define PROGRESS_COPY_IS_FROM 0
+#define PROGRESS_COPY_IS_FILE 1
+#define PROGRESS_COPY_IS_PROGRAM 2
+#define PROGRESS_COPY_LINES_PROCESSED 3
+#define PROGRESS_COPY_BYTES_PROCESSED 4
#endif
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index b813e322153d..dac5da4e6dd9 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1925,6 +1925,21 @@ pg_stat_progress_cluster| SELECT s.pid,
s.param8 AS index_rebuild_count
FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_copy| SELECT s.pid,
+ s.datid,
+ d.datname,
+ s.relid,
+ CASE s.param1
+ WHEN 0 THEN 'TO'::text
+ WHEN 1 THEN 'FROM'::text
+ ELSE NULL::text
+ END AS direction,
+ ((s.param2)::integer)::boolean AS file,
+ ((s.param3)::integer)::boolean AS program,
+ s.param4 AS lines_processed,
+ s.param5 AS file_bytes_processed
+ FROM (pg_stat_get_progress_info('COPY'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+ LEFT JOIN pg_database d ON ((s.datid = d.oid)));
pg_stat_progress_create_index| SELECT s.pid,
s.datid,
d.datname,
Hi Josef,
On Sun, Jun 14, 2020 at 02:32:33PM +0200, Josef Šimánek wrote:
Hello, as proposed by Pavel Stěhule and discussed on local czech PostgreSQL
maillist (
https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email&utm_source=footer),
I have prepared an initial patch for COPY command progress reporting.
Sounds like a good idea to me.
I have not found any tests for progress reporting. Are there any? It would
need two backends running (one running COPY, one checking output of report
view). Is there any similar test I can inspire at? In theory, it should be
possible to use dblink_send_query to run async COPY command in the
background.
We don't have any tests in core. I think that making deterministic
test cases is rather tricky here as long as we don't have a more
advanced testing framework that allows is to lock certain code paths
and keep around an expected state until a second session comes around
and looks at the progress catalog (even that would need adding more
code to core to mark the extra point looked at). So I think that it is
fine to not focus on that for this feature. The important parts are
the choice of the progress points and the data sent to MyProc, and
both should be chosen wisely.
My initial (attached) patch also doesn't introduce documentation for this
system view. I can add that later once this patch is finalized (if that
happens).
You may want to add it to the next commit fest:
https://commitfest.postgresql.org/28/
Documentation is necessary, and having some would ease reviews.
--
Michael
On 2020/06/14 21:32, Josef Šimánek wrote:
Hello, as proposed by Pavel Stěhule and discussed on local czech PostgreSQL maillist (https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email&utm_source=footer), I have prepared an initial patch for COPY command progress reporting.
Sounds nice!
file - bool - is file is used?
program - bool - is program used?
Are these fields really necessary in a progress view?
What values are reported when STDOUT/STDIN is specified in COPY command?
file_bytes_processed - amount of bytes processed when file is used (otherwise 0), works for both direction (
FROM/TO) when file is used (file = t)
What value is reported when STDOUT/STDIN is specified in COPY command?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
I'm using ftell to get current position in file to populate file_bytes_processed without error handling (ftell can return -1L and also populate errno on problems).
1. Is that a good way to get progress of file processing?
IMO, it's better to handle the error cases. One possible case where
ftell can return -1 and set errno is when the total bytes processed is
more than LONG_MAX.
Will your patch handle file_bytes_processed reporting for COPY FROM
STDIN cases? For this case, ftell can't be used.
Instead of using ftell and worrying about the errors, a simple
approach could be to have a uint64 variable in CopyStateData to track
the number of bytes read whenever CopyGetData is called. This approach
can also handle the case of COPY FROM STDIN.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
po 15. 6. 2020 v 2:18 odesílatel Michael Paquier <michael@paquier.xyz>
napsal:
Hi Josef,
On Sun, Jun 14, 2020 at 02:32:33PM +0200, Josef Šimánek wrote:
Hello, as proposed by Pavel Stěhule and discussed on local czech
PostgreSQL
maillist (
I have prepared an initial patch for COPY command progress reporting.
Sounds like a good idea to me.
Great. I will continue working on this.
I have not found any tests for progress reporting. Are there any? It
would
need two backends running (one running COPY, one checking output of
report
view). Is there any similar test I can inspire at? In theory, it should
be
possible to use dblink_send_query to run async COPY command in the
background.We don't have any tests in core. I think that making deterministic
test cases is rather tricky here as long as we don't have a more
advanced testing framework that allows is to lock certain code paths
and keep around an expected state until a second session comes around
and looks at the progress catalog (even that would need adding more
code to core to mark the extra point looked at). So I think that it is
fine to not focus on that for this feature. The important parts are
the choice of the progress points and the data sent to MyProc, and
both should be chosen wisely.
Thanks for the info. I'm focusing exactly at looking for right spots to
report the progress. I'll attach new patch with better places and
supporting more options of reporting (including STDIN, STDOUT) soon and
also I'll try to add it to commitfest.
Show quoted text
My initial (attached) patch also doesn't introduce documentation for this
system view. I can add that later once this patch is finalized (if that
happens).You may want to add it to the next commit fest:
https://commitfest.postgresql.org/28/
Documentation is necessary, and having some would ease reviews.
--
Michael
po 15. 6. 2020 v 6:39 odesílatel Fujii Masao <masao.fujii@oss.nttdata.com>
napsal:
On 2020/06/14 21:32, Josef Šimánek wrote:
Hello, as proposed by Pavel Stěhule and discussed on local czech
PostgreSQL maillist (
https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email&utm_source=footer),
I have prepared an initial patch for COPY command progress reporting.Sounds nice!
file - bool - is file is used?
program - bool - is program used?Are these fields really necessary in a progress view?
What values are reported when STDOUT/STDIN is specified in COPY command?
For STDOUT and STDIN file is true and program is false.
file_bytes_processed - amount of bytes processed when file is used
(otherwise 0), works for both direction (
FROM/TO) when file is used (file = t)
What value is reported when STDOUT/STDIN is specified in COPY command?
For my first patch nothing was reported on STDOUT/STDIN usage. I'll attach
new patch soon supporting those as well.
Show quoted text
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
po 15. 6. 2020 v 2:18 odesílatel Michael Paquier <michael@paquier.xyz>
napsal:
Hi Josef,
On Sun, Jun 14, 2020 at 02:32:33PM +0200, Josef Šimánek wrote:
Hello, as proposed by Pavel Stěhule and discussed on local czech
PostgreSQL
maillist (
I have prepared an initial patch for COPY command progress reporting.
Sounds like a good idea to me.
I have not found any tests for progress reporting. Are there any? It
would
need two backends running (one running COPY, one checking output of
report
view). Is there any similar test I can inspire at? In theory, it should
be
possible to use dblink_send_query to run async COPY command in the
background.We don't have any tests in core. I think that making deterministic
test cases is rather tricky here as long as we don't have a more
advanced testing framework that allows is to lock certain code paths
and keep around an expected state until a second session comes around
and looks at the progress catalog (even that would need adding more
code to core to mark the extra point looked at). So I think that it is
fine to not focus on that for this feature. The important parts are
the choice of the progress points and the data sent to MyProc, and
both should be chosen wisely.My initial (attached) patch also doesn't introduce documentation for this
system view. I can add that later once this patch is finalized (if that
happens).You may want to add it to the next commit fest:
https://commitfest.postgresql.org/28/
Documentation is necessary, and having some would ease reviews.
I have added documentation, more code comments and I'll upload patch to
commit fest.
Show quoted text
--
Michael
po 15. 6. 2020 v 7:34 odesílatel Bharath Rupireddy <
bharath.rupireddyforpostgres@gmail.com> napsal:
I'm using ftell to get current position in file to populate
file_bytes_processed without error handling (ftell can return -1L and also
populate errno on problems).1. Is that a good way to get progress of file processing?
IMO, it's better to handle the error cases. One possible case where
ftell can return -1 and set errno is when the total bytes processed is
more than LONG_MAX.Will your patch handle file_bytes_processed reporting for COPY FROM
STDIN cases? For this case, ftell can't be used.Instead of using ftell and worrying about the errors, a simple
approach could be to have a uint64 variable in CopyStateData to track
the number of bytes read whenever CopyGetData is called. This approach
can also handle the case of COPY FROM STDIN.
Thanks for suggestion. I used this approach and latest patch supports both
STDIN and STDOUT now.
Show quoted text
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Thanks for all comments. I have updated code to support more options
(including STDIN/STDOUT) and added some documentation.
Patch is attached and can be found also at
https://github.com/simi/postgres/pull/5.
Diff version: https://github.com/simi/postgres/pull/5.diff
Patch version: https://github.com/simi/postgres/pull/5.patch
I'm also attaching screenshot of HTML documentation and html documentation
file.
I'll do my best to get this to commitfest now.
ne 14. 6. 2020 v 14:32 odesílatel Josef Šimánek <josef.simanek@gmail.com>
napsal:
Show quoted text
Hello, as proposed by Pavel Stěhule and discussed on local czech
PostgreSQL maillist (
https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email&utm_source=footer),
I have prepared an initial patch for COPY command progress reporting.Few examples first:
"COPY (SELECT * FROM test) TO '/tmp/ids';"
yr=# SELECT * from pg_stat_progress_copy;
pid | datid | datname | relid | direction | file | program |
lines_processed | file_bytes_processed---------+-------+---------+-------+-----------+------+---------+-----------------+----------------------
3347126 | 16384 | yr | 0 | TO | t | f |
3529943 | 24906226
(1 row)"COPY test FROM '/tmp/ids';
yr=# SELECT * from pg_stat_progress_copy;
pid | datid | datname | relid | direction | file | program |
lines_processed | file_bytes_processed---------+-------+---------+-------+-----------+------+---------+-----------------+----------------------
3347126 | 16384 | yr | 16385 | FROM | t | f |
121591999 | 957218816
(1 row)Columns are inspired by CREATE INDEX progress report system view.
pid - integer - PID of backend
datid - oid - OID of related database
datname - name - name of related database (this seems redundant, since oid
should be enough, but it is the same in CREATE INDEX)
relid - oid - oid of table related to COPY command, when not known (for
example when copying to file, it is 0)
direction - text - one of "FROM" or "TO" depends on command used
file - bool - is file is used?
program - bool - is program used?
lines_processed - bigint - amount of processed lines, works for both
directions (FROM/TO)
file_bytes_processed - amount of bytes processed when file is used
(otherwise 0), works for both direction (
FROM/TO) when file is used (file = t)Patch is attached and can be found also at
https://github.com/simi/postgres/pull/5.Diff version: https://github.com/simi/postgres/pull/5.diff
Patch version: https://github.com/simi/postgres/pull/5.patchI havefew initial notes and questions.
I'm using ftell to get current position in file to populate
file_bytes_processed without error handling (ftell can return -1L and also
populate errno on problems).1. Is that a good way to get progress of file processing?
2. Is it safe in given context to not care about errors? If not, what to
do on error?Some columns are not populated on certain COPY commands. For example when
a file is not used, file_bytes_processed is set to 0. Would it be better to
use NULL instead when the column is not related to the current command?
Same problem is for relid column.I have not found any tests for progress reporting. Are there any? It would
need two backends running (one running COPY, one checking output of report
view). Is there any similar test I can inspire at? In theory, it should be
possible to use dblink_send_query to run async COPY command in the
background.My initial (attached) patch also doesn't introduce documentation for this
system view. I can add that later once this patch is finalized (if that
happens).
Attachments:
copy-progress-v2.patchtext/x-patch; charset=US-ASCII; name=copy-progress-v2.patchDownload
From 64bf6cd66478e5b5b710d404b726c4147a98a660 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Josef=20=C5=A0im=C3=A1nek?= <josef.simanek@gmail.com>
Date: Mon, 8 Jun 2020 02:00:53 +0200
Subject: [PATCH 1/3] Initial work on COPY progress.
---
src/backend/commands/copy.c | 13 ++++++++++---
src/backend/utils/adt/pgstatfuncs.c | 2 ++
src/include/commands/progress.h | 3 +++
src/include/pgstat.h | 3 ++-
4 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6b1fd6d4cca6..bef551770862 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -29,6 +29,7 @@
#include "catalog/pg_type.h"
#include "commands/copy.h"
#include "commands/defrem.h"
+#include "commands/progress.h"
#include "commands/trigger.h"
#include "executor/execPartition.h"
#include "executor/executor.h"
@@ -45,6 +46,7 @@
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
+#include "pgstat.h"
#include "port/pg_bswap.h"
#include "rewrite/rewriteHandler.h"
#include "storage/fd.h"
@@ -1751,6 +1753,9 @@ BeginCopy(ParseState *pstate,
cstate->copy_dest = COPY_FILE; /* default */
+ pgstat_progress_start_command(PROGRESS_COMMAND_COPY, queryRelId);
+ pgstat_progress_update_param(PROGRESS_COPY_PROCESSED,0);
+
MemoryContextSwitchTo(oldcontext);
return cstate;
@@ -1810,6 +1815,8 @@ EndCopy(CopyState cstate)
cstate->filename)));
}
+ pgstat_progress_end_command();
+
MemoryContextDelete(cstate->copycontext);
pfree(cstate);
}
@@ -2122,7 +2129,7 @@ CopyTo(CopyState cstate)
/* Format and send the data */
CopyOneRowTo(cstate, slot);
- processed++;
+ pgstat_progress_update_param(PROGRESS_COPY_PROCESSED, ++processed);
}
ExecDropSingleTupleTableSlot(slot);
@@ -3261,7 +3268,7 @@ CopyFrom(CopyState cstate)
* or FDW; this is the same definition used by nodeModifyTable.c
* for counting tuples inserted by an INSERT command.
*/
- processed++;
+ pgstat_progress_update_param(PROGRESS_COPY_PROCESSED, ++processed);
}
}
@@ -5114,7 +5121,7 @@ copy_dest_receive(TupleTableSlot *slot, DestReceiver *self)
/* Send the data */
CopyOneRowTo(cstate, slot);
- myState->processed++;
+ pgstat_progress_update_param(PROGRESS_COPY_PROCESSED, ++myState->processed);
return true;
}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 2aff739466ff..b740eef7c102 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -494,6 +494,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
cmdtype = PROGRESS_COMMAND_CREATE_INDEX;
else if (pg_strcasecmp(cmd, "BASEBACKUP") == 0)
cmdtype = PROGRESS_COMMAND_BASEBACKUP;
+ else if (pg_strcasecmp(cmd, "COPY") == 0)
+ cmdtype = PROGRESS_COMMAND_COPY;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 36b073e67757..2d72f12a75b5 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -133,4 +133,7 @@
#define PROGRESS_BASEBACKUP_PHASE_WAIT_WAL_ARCHIVE 4
#define PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL 5
+/* Commands of PROGRESS_CLUSTER */
+#define PROGRESS_COPY_PROCESSED 0
+
#endif
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 13872013823e..c338a1a5ba00 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -995,7 +995,8 @@ typedef enum ProgressCommandType
PROGRESS_COMMAND_ANALYZE,
PROGRESS_COMMAND_CLUSTER,
PROGRESS_COMMAND_CREATE_INDEX,
- PROGRESS_COMMAND_BASEBACKUP
+ PROGRESS_COMMAND_BASEBACKUP,
+ PROGRESS_COMMAND_COPY
} ProgressCommandType;
#define PGSTAT_NUM_PROGRESS_PARAM 20
From f9cb4fe9b3fc223cb3174044ea00edbb96675138 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Josef=20=C5=A0im=C3=A1nek?= <josef.simanek@gmail.com>
Date: Sun, 14 Jun 2020 02:46:04 +0200
Subject: [PATCH 2/3] Enhance copy progress with more info. - add
pg_stat_progress_copy system view
---
src/backend/catalog/system_views.sql | 14 ++++++++++++++
src/backend/commands/copy.c | 16 +++++++++++-----
src/include/commands/progress.h | 8 ++++++--
src/test/regress/expected/rules.out | 15 +++++++++++++++
4 files changed, 46 insertions(+), 7 deletions(-)
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5314e9348fa7..7bc429ce299e 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1093,6 +1093,20 @@ CREATE VIEW pg_stat_progress_basebackup AS
S.param5 AS tablespaces_streamed
FROM pg_stat_get_progress_info('BASEBACKUP') AS S;
+CREATE VIEW pg_stat_progress_copy AS
+ SELECT
+ S.pid AS pid, S.datid AS datid, D.datname AS datname,
+ S.relid AS relid,
+ CASE S.param1 WHEN 0 THEN 'TO'
+ WHEN 1 THEN 'FROM'
+ END as direction,
+ CAST (S.param2::integer AS bool) AS file,
+ CAST (S.param3::integer AS bool) AS program,
+ S.param4 AS lines_processed,
+ S.param5 AS file_bytes_processed
+ FROM pg_stat_get_progress_info('COPY') AS S
+ LEFT JOIN pg_database D ON S.datid = D.oid;
+
CREATE VIEW pg_user_mappings AS
SELECT
U.oid AS umid,
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index bef551770862..a733b6e1c676 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -565,6 +565,7 @@ CopySendEndOfRow(CopyState cstate)
(errcode_for_file_access(),
errmsg("could not write to COPY file: %m")));
}
+ pgstat_progress_update_param(PROGRESS_COPY_BYTES_PROCESSED, ftell(cstate->copy_file));
break;
case COPY_OLD_FE:
/* The FE/BE protocol uses \n as newline for all platforms */
@@ -617,6 +618,7 @@ CopyGetData(CopyState cstate, void *databuf, int minread, int maxread)
{
case COPY_FILE:
bytesread = fread(databuf, 1, maxread, cstate->copy_file);
+ pgstat_progress_update_param(PROGRESS_COPY_BYTES_PROCESSED, ftell(cstate->copy_file));
if (ferror(cstate->copy_file))
ereport(ERROR,
(errcode_for_file_access(),
@@ -1753,8 +1755,12 @@ BeginCopy(ParseState *pstate,
cstate->copy_dest = COPY_FILE; /* default */
- pgstat_progress_start_command(PROGRESS_COMMAND_COPY, queryRelId);
- pgstat_progress_update_param(PROGRESS_COPY_PROCESSED,0);
+ pgstat_progress_start_command(PROGRESS_COMMAND_COPY, cstate->rel ? RelationGetRelid(cstate->rel) : InvalidOid);
+
+ pgstat_progress_update_param(PROGRESS_COPY_BYTES_PROCESSED, 0);
+ pgstat_progress_update_param(PROGRESS_COPY_IS_FROM, (int) cstate->is_copy_from);
+ pgstat_progress_update_param(PROGRESS_COPY_IS_FILE, (int) cstate->copy_dest == COPY_FILE);
+ pgstat_progress_update_param(PROGRESS_COPY_IS_PROGRAM, (int) cstate->is_program);
MemoryContextSwitchTo(oldcontext);
@@ -2129,7 +2135,7 @@ CopyTo(CopyState cstate)
/* Format and send the data */
CopyOneRowTo(cstate, slot);
- pgstat_progress_update_param(PROGRESS_COPY_PROCESSED, ++processed);
+ pgstat_progress_update_param(PROGRESS_COPY_LINES_PROCESSED, ++processed);
}
ExecDropSingleTupleTableSlot(slot);
@@ -3268,7 +3274,7 @@ CopyFrom(CopyState cstate)
* or FDW; this is the same definition used by nodeModifyTable.c
* for counting tuples inserted by an INSERT command.
*/
- pgstat_progress_update_param(PROGRESS_COPY_PROCESSED, ++processed);
+ pgstat_progress_update_param(PROGRESS_COPY_LINES_PROCESSED, ++processed);
}
}
@@ -5121,7 +5127,7 @@ copy_dest_receive(TupleTableSlot *slot, DestReceiver *self)
/* Send the data */
CopyOneRowTo(cstate, slot);
- pgstat_progress_update_param(PROGRESS_COPY_PROCESSED, ++myState->processed);
+ pgstat_progress_update_param(PROGRESS_COPY_LINES_PROCESSED, ++myState->processed);
return true;
}
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 2d72f12a75b5..3947222e6f61 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -133,7 +133,11 @@
#define PROGRESS_BASEBACKUP_PHASE_WAIT_WAL_ARCHIVE 4
#define PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL 5
-/* Commands of PROGRESS_CLUSTER */
-#define PROGRESS_COPY_PROCESSED 0
+/* Commands of PROGRESS_COPY */
+#define PROGRESS_COPY_IS_FROM 0
+#define PROGRESS_COPY_IS_FILE 1
+#define PROGRESS_COPY_IS_PROGRAM 2
+#define PROGRESS_COPY_LINES_PROCESSED 3
+#define PROGRESS_COPY_BYTES_PROCESSED 4
#endif
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index b813e322153d..dac5da4e6dd9 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1925,6 +1925,21 @@ pg_stat_progress_cluster| SELECT s.pid,
s.param8 AS index_rebuild_count
FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_copy| SELECT s.pid,
+ s.datid,
+ d.datname,
+ s.relid,
+ CASE s.param1
+ WHEN 0 THEN 'TO'::text
+ WHEN 1 THEN 'FROM'::text
+ ELSE NULL::text
+ END AS direction,
+ ((s.param2)::integer)::boolean AS file,
+ ((s.param3)::integer)::boolean AS program,
+ s.param4 AS lines_processed,
+ s.param5 AS file_bytes_processed
+ FROM (pg_stat_get_progress_info('COPY'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+ LEFT JOIN pg_database d ON ((s.datid = d.oid)));
pg_stat_progress_create_index| SELECT s.pid,
s.datid,
d.datname,
From 5b1ddddf0c0a83606757b762f996e9c40469a831 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Josef=20=C5=A0im=C3=A1nek?= <josef.simanek@gmail.com>
Date: Sun, 21 Jun 2020 13:15:57 +0200
Subject: [PATCH 3/3] Support STDIN, STDOUT. Add docs.
---
doc/src/sgml/monitoring.sgml | 119 +++++++++++++++++++++++++++
src/backend/catalog/system_views.sql | 2 +-
src/backend/commands/copy.c | 34 ++++++--
src/test/regress/expected/rules.out | 2 +-
4 files changed, 149 insertions(+), 8 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index dfa9d0d6410c..d4a9fa9e2f2f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -390,6 +390,13 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
</entry>
</row>
+ <row>
+ <entry><structname>pg_stat_progress_copy</structname><indexterm><primary>pg_stat_progress_copy</primary></indexterm></entry>
+ <entry>One row for each backend running <command>COPY</command>, showing current progress.
+ See <xref linkend='copy-progress-reporting'/>.
+ </entry>
+ </row>
+
</tbody>
</tgroup>
</table>
@@ -5013,6 +5020,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
which support progress reporting are <command>ANALYZE</command>,
<command>CLUSTER</command>,
<command>CREATE INDEX</command>, <command>VACUUM</command>,
+ <command>COPY</command>
and <xref linkend="protocol-replication-base-backup"/> (i.e., replication
command that <xref linkend="app-pgbasebackup"/> issues to take
a base backup).
@@ -6142,6 +6150,117 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
</table>
</sect2>
+
+ <sect2 id="copy-progress-reporting">
+ <title>COPY Progress Reporting</title>
+
+ <para>
+ Whenever <command>COPY</command> is running, the
+ <structname>pg_stat_copy_progress</structname> view will contain one row
+ for each backend that is currently running <command>COPY</command> command.
+ The table bellow describes the information that will be reported and provide
+ information how to interpret it.
+ </para>
+
+ <table id="pg-stat-progress-copy-view" xreflabel="pg_stat_progress_copy">
+ <title><structname>pg_stat_progress_copy</structname> View</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ Column Type
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>pid</structfield> <type>integer</type>
+ </para>
+ <para>
+ Process ID of backend.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>datid</structfield> <type>text</type>
+ </para>
+ <para>
+ OID of the database to which this backend is connected.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>datname</structfield> <type>name</type>
+ </para>
+ <para>
+ Name of the database to which this backend is connected.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>relid</structfield> <type>oid</type>
+ </para>
+ <para>
+ OID of the table on which the <command>COPY</command> command is executed. It is set to 0 if <structfield>SELECT query</structfield> is provided.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>direction</structfield> <type>text</type>
+ </para>
+ <para>
+ Can be one of <structfield>TO</structfield> or <structfield>FROM</structfield>.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>file</structfield> <type>boolean</type>
+ </para>
+ <para>
+ <structfield>true</structfield> if <structfield>filename</structfield> argument is used. Notice STDIN and STDOUT are considered file as well.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>program</structfield> <type>boolean</type>
+ </para>
+ <para>
+ <structfield>true</structfield> if <structfield>PROGRAM</structfield> option is used.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>line_processed</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of lines already processed by <command>COPY</command> command.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>bytes_processed</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of bytes already processed by <command>COPY</command> command.
+ </para></entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </sect2>
</sect1>
<sect1 id="dynamic-trace">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 7bc429ce299e..53b38ad70cfc 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1103,7 +1103,7 @@ CREATE VIEW pg_stat_progress_copy AS
CAST (S.param2::integer AS bool) AS file,
CAST (S.param3::integer AS bool) AS program,
S.param4 AS lines_processed,
- S.param5 AS file_bytes_processed
+ S.param5 AS bytes_processed
FROM pg_stat_get_progress_info('COPY') AS S
LEFT JOIN pg_database D ON S.datid = D.oid;
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index a733b6e1c676..1eccc7721fe5 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -225,6 +225,7 @@ typedef struct CopyStateData
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
+ uint64 bytes_processed; /* total # of bytes processed, used for progress reporting */
} CopyStateData;
/* DestReceiver for COPY (query) TO */
@@ -393,6 +394,7 @@ static void CopySendInt32(CopyState cstate, int32 val);
static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
+static int CopyUpdateBytesProgress(CopyState cstate, int newbytesprocessed);
/*
@@ -500,18 +502,19 @@ SendCopyEnd(CopyState cstate)
static void
CopySendData(CopyState cstate, const void *databuf, int datasize)
{
- appendBinaryStringInfo(cstate->fe_msgbuf, databuf, datasize);
+ appendBinaryStringInfo(cstate->fe_msgbuf, databuf, CopyUpdateBytesProgress(cstate, datasize));
}
static void
CopySendString(CopyState cstate, const char *str)
{
- appendBinaryStringInfo(cstate->fe_msgbuf, str, strlen(str));
+ appendBinaryStringInfo(cstate->fe_msgbuf, str, CopyUpdateBytesProgress(cstate, strlen(str)));
}
static void
CopySendChar(CopyState cstate, char c)
{
+ CopyUpdateBytesProgress(cstate, 1);
appendStringInfoCharMacro(cstate->fe_msgbuf, c);
}
@@ -565,7 +568,6 @@ CopySendEndOfRow(CopyState cstate)
(errcode_for_file_access(),
errmsg("could not write to COPY file: %m")));
}
- pgstat_progress_update_param(PROGRESS_COPY_BYTES_PROCESSED, ftell(cstate->copy_file));
break;
case COPY_OLD_FE:
/* The FE/BE protocol uses \n as newline for all platforms */
@@ -618,7 +620,6 @@ CopyGetData(CopyState cstate, void *databuf, int minread, int maxread)
{
case COPY_FILE:
bytesread = fread(databuf, 1, maxread, cstate->copy_file);
- pgstat_progress_update_param(PROGRESS_COPY_BYTES_PROCESSED, ftell(cstate->copy_file));
if (ferror(cstate->copy_file))
ereport(ERROR,
(errcode_for_file_access(),
@@ -713,6 +714,8 @@ CopyGetData(CopyState cstate, void *databuf, int minread, int maxread)
break;
}
+ CopyUpdateBytesProgress(cstate, bytesread);
+
return bytesread;
}
@@ -1757,7 +1760,8 @@ BeginCopy(ParseState *pstate,
pgstat_progress_start_command(PROGRESS_COMMAND_COPY, cstate->rel ? RelationGetRelid(cstate->rel) : InvalidOid);
- pgstat_progress_update_param(PROGRESS_COPY_BYTES_PROCESSED, 0);
+ /* initialize progress */
+ cstate->bytes_processed = 0;
pgstat_progress_update_param(PROGRESS_COPY_IS_FROM, (int) cstate->is_copy_from);
pgstat_progress_update_param(PROGRESS_COPY_IS_FILE, (int) cstate->copy_dest == COPY_FILE);
pgstat_progress_update_param(PROGRESS_COPY_IS_PROGRAM, (int) cstate->is_program);
@@ -2135,6 +2139,8 @@ CopyTo(CopyState cstate)
/* Format and send the data */
CopyOneRowTo(cstate, slot);
+
+ /* Increment processed lines counter and update progress */
pgstat_progress_update_param(PROGRESS_COPY_LINES_PROCESSED, ++processed);
}
@@ -3272,7 +3278,8 @@ CopyFrom(CopyState cstate)
/*
* We count only tuples not suppressed by a BEFORE INSERT trigger
* or FDW; this is the same definition used by nodeModifyTable.c
- * for counting tuples inserted by an INSERT command.
+ * for counting tuples inserted by an INSERT command. Update
+ * progress as well.
*/
pgstat_progress_update_param(PROGRESS_COPY_LINES_PROCESSED, ++processed);
}
@@ -5127,6 +5134,7 @@ copy_dest_receive(TupleTableSlot *slot, DestReceiver *self)
/* Send the data */
CopyOneRowTo(cstate, slot);
+ /* Increment processed lines counter and update progress */
pgstat_progress_update_param(PROGRESS_COPY_LINES_PROCESSED, ++myState->processed);
return true;
@@ -5169,3 +5177,17 @@ CreateCopyDestReceiver(void)
return (DestReceiver *) self;
}
+
+/*
+ * CopyUpdateBytesProgress --- increment bytes_processed
+ * on CopyState, updates progress and returns amount
+ * of processed bytes back;
+ */
+
+static int
+CopyUpdateBytesProgress(CopyState cstate, int newbytesprocessed)
+{
+ cstate->bytes_processed += newbytesprocessed;
+ pgstat_progress_update_param(PROGRESS_COPY_BYTES_PROCESSED, cstate->bytes_processed);
+ return newbytesprocessed;
+}
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index dac5da4e6dd9..c04f7b94c3f0 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1937,7 +1937,7 @@ pg_stat_progress_copy| SELECT s.pid,
((s.param2)::integer)::boolean AS file,
((s.param3)::integer)::boolean AS program,
s.param4 AS lines_processed,
- s.param5 AS file_bytes_processed
+ s.param5 AS bytes_processed
FROM (pg_stat_get_progress_info('COPY'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
LEFT JOIN pg_database d ON ((s.datid = d.oid)));
pg_stat_progress_create_index| SELECT s.pid,
copy-progress-v2.difftext/x-patch; charset=US-ASCII; name=copy-progress-v2.diffDownload
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index dfa9d0d6410c..d4a9fa9e2f2f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -390,6 +390,13 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
</entry>
</row>
+ <row>
+ <entry><structname>pg_stat_progress_copy</structname><indexterm><primary>pg_stat_progress_copy</primary></indexterm></entry>
+ <entry>One row for each backend running <command>COPY</command>, showing current progress.
+ See <xref linkend='copy-progress-reporting'/>.
+ </entry>
+ </row>
+
</tbody>
</tgroup>
</table>
@@ -5013,6 +5020,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
which support progress reporting are <command>ANALYZE</command>,
<command>CLUSTER</command>,
<command>CREATE INDEX</command>, <command>VACUUM</command>,
+ <command>COPY</command>
and <xref linkend="protocol-replication-base-backup"/> (i.e., replication
command that <xref linkend="app-pgbasebackup"/> issues to take
a base backup).
@@ -6142,6 +6150,117 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
</table>
</sect2>
+
+ <sect2 id="copy-progress-reporting">
+ <title>COPY Progress Reporting</title>
+
+ <para>
+ Whenever <command>COPY</command> is running, the
+ <structname>pg_stat_copy_progress</structname> view will contain one row
+ for each backend that is currently running <command>COPY</command> command.
+ The table bellow describes the information that will be reported and provide
+ information how to interpret it.
+ </para>
+
+ <table id="pg-stat-progress-copy-view" xreflabel="pg_stat_progress_copy">
+ <title><structname>pg_stat_progress_copy</structname> View</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ Column Type
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>pid</structfield> <type>integer</type>
+ </para>
+ <para>
+ Process ID of backend.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>datid</structfield> <type>text</type>
+ </para>
+ <para>
+ OID of the database to which this backend is connected.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>datname</structfield> <type>name</type>
+ </para>
+ <para>
+ Name of the database to which this backend is connected.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>relid</structfield> <type>oid</type>
+ </para>
+ <para>
+ OID of the table on which the <command>COPY</command> command is executed. It is set to 0 if <structfield>SELECT query</structfield> is provided.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>direction</structfield> <type>text</type>
+ </para>
+ <para>
+ Can be one of <structfield>TO</structfield> or <structfield>FROM</structfield>.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>file</structfield> <type>boolean</type>
+ </para>
+ <para>
+ <structfield>true</structfield> if <structfield>filename</structfield> argument is used. Notice STDIN and STDOUT are considered file as well.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>program</structfield> <type>boolean</type>
+ </para>
+ <para>
+ <structfield>true</structfield> if <structfield>PROGRAM</structfield> option is used.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>line_processed</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of lines already processed by <command>COPY</command> command.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>bytes_processed</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of bytes already processed by <command>COPY</command> command.
+ </para></entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </sect2>
</sect1>
<sect1 id="dynamic-trace">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5314e9348fa7..53b38ad70cfc 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1093,6 +1093,20 @@ CREATE VIEW pg_stat_progress_basebackup AS
S.param5 AS tablespaces_streamed
FROM pg_stat_get_progress_info('BASEBACKUP') AS S;
+CREATE VIEW pg_stat_progress_copy AS
+ SELECT
+ S.pid AS pid, S.datid AS datid, D.datname AS datname,
+ S.relid AS relid,
+ CASE S.param1 WHEN 0 THEN 'TO'
+ WHEN 1 THEN 'FROM'
+ END as direction,
+ CAST (S.param2::integer AS bool) AS file,
+ CAST (S.param3::integer AS bool) AS program,
+ S.param4 AS lines_processed,
+ S.param5 AS bytes_processed
+ FROM pg_stat_get_progress_info('COPY') AS S
+ LEFT JOIN pg_database D ON S.datid = D.oid;
+
CREATE VIEW pg_user_mappings AS
SELECT
U.oid AS umid,
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6b1fd6d4cca6..1eccc7721fe5 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -29,6 +29,7 @@
#include "catalog/pg_type.h"
#include "commands/copy.h"
#include "commands/defrem.h"
+#include "commands/progress.h"
#include "commands/trigger.h"
#include "executor/execPartition.h"
#include "executor/executor.h"
@@ -45,6 +46,7 @@
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
+#include "pgstat.h"
#include "port/pg_bswap.h"
#include "rewrite/rewriteHandler.h"
#include "storage/fd.h"
@@ -223,6 +225,7 @@ typedef struct CopyStateData
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
+ uint64 bytes_processed; /* total # of bytes processed, used for progress reporting */
} CopyStateData;
/* DestReceiver for COPY (query) TO */
@@ -391,6 +394,7 @@ static void CopySendInt32(CopyState cstate, int32 val);
static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
+static int CopyUpdateBytesProgress(CopyState cstate, int newbytesprocessed);
/*
@@ -498,18 +502,19 @@ SendCopyEnd(CopyState cstate)
static void
CopySendData(CopyState cstate, const void *databuf, int datasize)
{
- appendBinaryStringInfo(cstate->fe_msgbuf, databuf, datasize);
+ appendBinaryStringInfo(cstate->fe_msgbuf, databuf, CopyUpdateBytesProgress(cstate, datasize));
}
static void
CopySendString(CopyState cstate, const char *str)
{
- appendBinaryStringInfo(cstate->fe_msgbuf, str, strlen(str));
+ appendBinaryStringInfo(cstate->fe_msgbuf, str, CopyUpdateBytesProgress(cstate, strlen(str)));
}
static void
CopySendChar(CopyState cstate, char c)
{
+ CopyUpdateBytesProgress(cstate, 1);
appendStringInfoCharMacro(cstate->fe_msgbuf, c);
}
@@ -709,6 +714,8 @@ CopyGetData(CopyState cstate, void *databuf, int minread, int maxread)
break;
}
+ CopyUpdateBytesProgress(cstate, bytesread);
+
return bytesread;
}
@@ -1751,6 +1758,14 @@ BeginCopy(ParseState *pstate,
cstate->copy_dest = COPY_FILE; /* default */
+ pgstat_progress_start_command(PROGRESS_COMMAND_COPY, cstate->rel ? RelationGetRelid(cstate->rel) : InvalidOid);
+
+ /* initialize progress */
+ cstate->bytes_processed = 0;
+ pgstat_progress_update_param(PROGRESS_COPY_IS_FROM, (int) cstate->is_copy_from);
+ pgstat_progress_update_param(PROGRESS_COPY_IS_FILE, (int) cstate->copy_dest == COPY_FILE);
+ pgstat_progress_update_param(PROGRESS_COPY_IS_PROGRAM, (int) cstate->is_program);
+
MemoryContextSwitchTo(oldcontext);
return cstate;
@@ -1810,6 +1825,8 @@ EndCopy(CopyState cstate)
cstate->filename)));
}
+ pgstat_progress_end_command();
+
MemoryContextDelete(cstate->copycontext);
pfree(cstate);
}
@@ -2122,7 +2139,9 @@ CopyTo(CopyState cstate)
/* Format and send the data */
CopyOneRowTo(cstate, slot);
- processed++;
+
+ /* Increment processed lines counter and update progress */
+ pgstat_progress_update_param(PROGRESS_COPY_LINES_PROCESSED, ++processed);
}
ExecDropSingleTupleTableSlot(slot);
@@ -3259,9 +3278,10 @@ CopyFrom(CopyState cstate)
/*
* We count only tuples not suppressed by a BEFORE INSERT trigger
* or FDW; this is the same definition used by nodeModifyTable.c
- * for counting tuples inserted by an INSERT command.
+ * for counting tuples inserted by an INSERT command. Update
+ * progress as well.
*/
- processed++;
+ pgstat_progress_update_param(PROGRESS_COPY_LINES_PROCESSED, ++processed);
}
}
@@ -5114,7 +5134,8 @@ copy_dest_receive(TupleTableSlot *slot, DestReceiver *self)
/* Send the data */
CopyOneRowTo(cstate, slot);
- myState->processed++;
+ /* Increment processed lines counter and update progress */
+ pgstat_progress_update_param(PROGRESS_COPY_LINES_PROCESSED, ++myState->processed);
return true;
}
@@ -5156,3 +5177,17 @@ CreateCopyDestReceiver(void)
return (DestReceiver *) self;
}
+
+/*
+ * CopyUpdateBytesProgress --- increment bytes_processed
+ * on CopyState, updates progress and returns amount
+ * of processed bytes back;
+ */
+
+static int
+CopyUpdateBytesProgress(CopyState cstate, int newbytesprocessed)
+{
+ cstate->bytes_processed += newbytesprocessed;
+ pgstat_progress_update_param(PROGRESS_COPY_BYTES_PROCESSED, cstate->bytes_processed);
+ return newbytesprocessed;
+}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 2aff739466ff..b740eef7c102 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -494,6 +494,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
cmdtype = PROGRESS_COMMAND_CREATE_INDEX;
else if (pg_strcasecmp(cmd, "BASEBACKUP") == 0)
cmdtype = PROGRESS_COMMAND_BASEBACKUP;
+ else if (pg_strcasecmp(cmd, "COPY") == 0)
+ cmdtype = PROGRESS_COMMAND_COPY;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 36b073e67757..3947222e6f61 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -133,4 +133,11 @@
#define PROGRESS_BASEBACKUP_PHASE_WAIT_WAL_ARCHIVE 4
#define PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL 5
+/* Commands of PROGRESS_COPY */
+#define PROGRESS_COPY_IS_FROM 0
+#define PROGRESS_COPY_IS_FILE 1
+#define PROGRESS_COPY_IS_PROGRAM 2
+#define PROGRESS_COPY_LINES_PROCESSED 3
+#define PROGRESS_COPY_BYTES_PROCESSED 4
+
#endif
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 13872013823e..c338a1a5ba00 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -995,7 +995,8 @@ typedef enum ProgressCommandType
PROGRESS_COMMAND_ANALYZE,
PROGRESS_COMMAND_CLUSTER,
PROGRESS_COMMAND_CREATE_INDEX,
- PROGRESS_COMMAND_BASEBACKUP
+ PROGRESS_COMMAND_BASEBACKUP,
+ PROGRESS_COMMAND_COPY
} ProgressCommandType;
#define PGSTAT_NUM_PROGRESS_PARAM 20
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index b813e322153d..c04f7b94c3f0 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1925,6 +1925,21 @@ pg_stat_progress_cluster| SELECT s.pid,
s.param8 AS index_rebuild_count
FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_copy| SELECT s.pid,
+ s.datid,
+ d.datname,
+ s.relid,
+ CASE s.param1
+ WHEN 0 THEN 'TO'::text
+ WHEN 1 THEN 'FROM'::text
+ ELSE NULL::text
+ END AS direction,
+ ((s.param2)::integer)::boolean AS file,
+ ((s.param3)::integer)::boolean AS program,
+ s.param4 AS lines_processed,
+ s.param5 AS bytes_processed
+ FROM (pg_stat_get_progress_info('COPY'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+ LEFT JOIN pg_database d ON ((s.datid = d.oid)));
pg_stat_progress_create_index| SELECT s.pid,
s.datid,
d.datname,
On 2020/06/21 20:33, Josef Šimánek wrote:
po 15. 6. 2020 v 6:39 odesílatel Fujii Masao <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>> napsal:
On 2020/06/14 21:32, Josef Šimánek wrote:
Hello, as proposed by Pavel Stěhule and discussed on local czech PostgreSQL maillist (https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email&utm_source=footer), I have prepared an initial patch for COPY command progress reporting.
Sounds nice!
file - bool - is file is used?
program - bool - is program used?Are these fields really necessary in a progress view?
What values are reported when STDOUT/STDIN is specified in COPY command?For STDOUT and STDIN file is true and program is false.
Could you tell me why these columns are necessary in *progress* view?
If we want to see what copy command is actually running, we can see
pg_stat_activity, instead. For example,
SELECT pc.*, a.query FROM pg_stat_progress_copy pc, pg_stat_activity a WHERE pc.pid = a.pid;
file_bytes_processed - amount of bytes processed when file is used (otherwise 0), works for both direction (
FROM/TO) when file is used (file = t)What value is reported when STDOUT/STDIN is specified in COPY command?
For my first patch nothing was reported on STDOUT/STDIN usage. I'll attach new patch soon supporting those as well.
Thanks for the patch!
With the patch, pg_stat_progress_copy seems to report the progress of
the processing on file_fdw. Is this intentional?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Sun, Jun 21, 2020 at 5:11 PM Josef Šimánek <josef.simanek@gmail.com> wrote:
Thanks for all comments. I have updated code to support more options (including STDIN/STDOUT) and added some documentation.
Patch is attached and can be found also at https://github.com/simi/postgres/pull/5.
Diff version: https://github.com/simi/postgres/pull/5.diff
Patch version: https://github.com/simi/postgres/pull/5.patchI'm also attaching screenshot of HTML documentation and html documentation file.
I'll do my best to get this to commitfest now.
ne 14. 6. 2020 v 14:32 odesílatel Josef Šimánek <josef.simanek@gmail.com> napsal:
Hello, as proposed by Pavel Stěhule and discussed on local czech PostgreSQL maillist (https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email&utm_source=footer), I have prepared an initial patch for COPY command progress reporting.
Few examples first:
"COPY (SELECT * FROM test) TO '/tmp/ids';"
yr=# SELECT * from pg_stat_progress_copy;
pid | datid | datname | relid | direction | file | program | lines_processed | file_bytes_processed
---------+-------+---------+-------+-----------+------+---------+-----------------+----------------------
3347126 | 16384 | yr | 0 | TO | t | f | 3529943 | 24906226
(1 row)"COPY test FROM '/tmp/ids';
yr=# SELECT * from pg_stat_progress_copy;
pid | datid | datname | relid | direction | file | program | lines_processed | file_bytes_processed
---------+-------+---------+-------+-----------+------+---------+-----------------+----------------------
3347126 | 16384 | yr | 16385 | FROM | t | f | 121591999 | 957218816
(1 row)Columns are inspired by CREATE INDEX progress report system view.
pid - integer - PID of backend
datid - oid - OID of related database
datname - name - name of related database (this seems redundant, since oid should be enough, but it is the same in CREATE INDEX)
relid - oid - oid of table related to COPY command, when not known (for example when copying to file, it is 0)
direction - text - one of "FROM" or "TO" depends on command used
file - bool - is file is used?
program - bool - is program used?
lines_processed - bigint - amount of processed lines, works for both directions (FROM/TO)
file_bytes_processed - amount of bytes processed when file is used (otherwise 0), works for both direction (
FROM/TO) when file is used (file = t)Patch is attached and can be found also at https://github.com/simi/postgres/pull/5.
Few comments:
@@ -713,6 +714,8 @@ CopyGetData(CopyState cstate, void *databuf, int
minread, int maxread)
break;
}
+ CopyUpdateBytesProgress(cstate, bytesread);
+
return bytesread;
}
This is actually the read data, actual processing will happen later
like in CopyReadLineText, it would be better if
CopyUpdateBytesProgress is done later, if not it will give the same
value even though it does multiple inserts on the table.
lines_processed will keep getting updated but file_bytes_processed
will not be updated.
+pg_stat_progress_copy| SELECT s.pid,
+ s.datid,
+ d.datname,
+ s.relid,
+ CASE s.param1
+ WHEN 0 THEN 'TO'::text
+ WHEN 1 THEN 'FROM'::text
+ ELSE NULL::text
+ END AS direction,
+ ((s.param2)::integer)::boolean AS file,
+ ((s.param3)::integer)::boolean AS program,
+ s.param4 AS lines_processed,
+ s.param5 AS file_bytes_processed
You could include pg_size_pretty for s.param5 like
pg_size_pretty(S.param5) AS bytes_processed, it will be easier for
users to understand bytes_processed when the data size increases.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
po 22. 6. 2020 v 4:48 odesílatel Fujii Masao <masao.fujii@oss.nttdata.com>
napsal:
On 2020/06/21 20:33, Josef Šimánek wrote:
po 15. 6. 2020 v 6:39 odesílatel Fujii Masao <
masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>> napsal:
On 2020/06/14 21:32, Josef Šimánek wrote:
Hello, as proposed by Pavel Stěhule and discussed on local czech
PostgreSQL maillist (
https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email&utm_source=footer),
I have prepared an initial patch for COPY command progress reporting.Sounds nice!
file - bool - is file is used?
program - bool - is program used?Are these fields really necessary in a progress view?
What values are reported when STDOUT/STDIN is specified in COPYcommand?
For STDOUT and STDIN file is true and program is false.
Could you tell me why these columns are necessary in *progress* view?
If we want to see what copy command is actually running, we can see
pg_stat_activity, instead. For example,SELECT pc.*, a.query FROM pg_stat_progress_copy pc, pg_stat_activity
a WHERE pc.pid = a.pid;
If that doesn't make any sense, I can remove those. I have not strong
opinion about those values. Those were just around when I was looking for
possible values to include in the progress report.
file_bytes_processed - amount of bytes processed when file is
used (otherwise 0), works for both direction (
FROM/TO) when file is used (file = t)
What value is reported when STDOUT/STDIN is specified in COPY
command?
For my first patch nothing was reported on STDOUT/STDIN usage. I'll
attach new patch soon supporting those as well.
Thanks for the patch!
With the patch, pg_stat_progress_copy seems to report the progress of
the processing on file_fdw. Is this intentional?
Every action using internally COPY will be included in the progress report
view.
I have spotted for example pg_dump does that and is reported there as well.
I do not see any problem regarding this. For pg_dump it is consistent with
"pg_stat_activity" reporting COPY command in the query field.
Show quoted text
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
po 22. 6. 2020 v 9:15 odesílatel vignesh C <vignesh21@gmail.com> napsal:
On Sun, Jun 21, 2020 at 5:11 PM Josef Šimánek <josef.simanek@gmail.com>
wrote:Thanks for all comments. I have updated code to support more options
(including STDIN/STDOUT) and added some documentation.
Patch is attached and can be found also at
https://github.com/simi/postgres/pull/5.
Diff version: https://github.com/simi/postgres/pull/5.diff
Patch version: https://github.com/simi/postgres/pull/5.patchI'm also attaching screenshot of HTML documentation and html
documentation file.
I'll do my best to get this to commitfest now.
ne 14. 6. 2020 v 14:32 odesílatel Josef Šimánek <josef.simanek@gmail.com>
napsal:
Hello, as proposed by Pavel Stěhule and discussed on local czech
PostgreSQL maillist (
https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email&utm_source=footer),
I have prepared an initial patch for COPY command progress reporting.Few examples first:
"COPY (SELECT * FROM test) TO '/tmp/ids';"
yr=# SELECT * from pg_stat_progress_copy;
pid | datid | datname | relid | direction | file | program |lines_processed | file_bytes_processed
---------+-------+---------+-------+-----------+------+---------+-----------------+----------------------
3347126 | 16384 | yr | 0 | TO | t | f |
3529943 | 24906226
(1 row)
"COPY test FROM '/tmp/ids';
yr=# SELECT * from pg_stat_progress_copy;
pid | datid | datname | relid | direction | file | program |lines_processed | file_bytes_processed
---------+-------+---------+-------+-----------+------+---------+-----------------+----------------------
3347126 | 16384 | yr | 16385 | FROM | t | f |
121591999 | 957218816
(1 row)
Columns are inspired by CREATE INDEX progress report system view.
pid - integer - PID of backend
datid - oid - OID of related database
datname - name - name of related database (this seems redundant, sinceoid should be enough, but it is the same in CREATE INDEX)
relid - oid - oid of table related to COPY command, when not known (for
example when copying to file, it is 0)
direction - text - one of "FROM" or "TO" depends on command used
file - bool - is file is used?
program - bool - is program used?
lines_processed - bigint - amount of processed lines, works for bothdirections (FROM/TO)
file_bytes_processed - amount of bytes processed when file is used
(otherwise 0), works for both direction (
FROM/TO) when file is used (file = t)
Patch is attached and can be found also at
https://github.com/simi/postgres/pull/5.
Few comments:
@@ -713,6 +714,8 @@ CopyGetData(CopyState cstate, void *databuf, int
minread, int maxread)
break;
}+ CopyUpdateBytesProgress(cstate, bytesread); + return bytesread; }This is actually the read data, actual processing will happen later
like in CopyReadLineText, it would be better if
CopyUpdateBytesProgress is done later, if not it will give the same
value even though it does multiple inserts on the table.
lines_processed will keep getting updated but file_bytes_processed
will not be updated.
First I would like to explain what's reported (or at least I'm trying to
get reported) at bytes_processed column.
When exporting to file it should start at 0 and end up at the actual final
file size.
When importing from file, it should do the same. You can check file size
before you start COPY FROM and get actual progress looking at
bytes_processed.
This column is just a counter of bytes read from input on COPY FROM or
amount of bytes going through COPY TO.
Thanks for the hint regarding "CopyReadLineText". I'll take a look.
For now I have tested those cases:
CREATE TABLE test(id int);
INSERT INTO test SELECT 1 FROM generate_series(1, 1000000);
COPY (SELECT * FROM test) TO '/tmp/ids';
COPY test FROM '/tmp/ids';
psql -h /tmp yr -c 'COPY (SELECT 1 from generate_series(1,100000000)) TO
STDOUT;' > /tmp/ryba.txt
echo /tmp/ryba.txt | psql -h /tmp yr -c 'COPY test FROM STDIN'
It is easy to check lines count and bytes count are in sync (since 1 line
is 2 bytes here - "1" and newline character).
I'll try to check more complex COPY commands to ensure everything is in
sync.
If you have any ideas for testing queries, feel free to suggest.
+pg_stat_progress_copy| SELECT s.pid,
+ s.datid, + d.datname, + s.relid, + CASE s.param1 + WHEN 0 THEN 'TO'::text + WHEN 1 THEN 'FROM'::text + ELSE NULL::text + END AS direction, + ((s.param2)::integer)::boolean AS file, + ((s.param3)::integer)::boolean AS program, + s.param4 AS lines_processed, + s.param5 AS file_bytes_processedYou could include pg_size_pretty for s.param5 like
pg_size_pretty(S.param5) AS bytes_processed, it will be easier for
users to understand bytes_processed when the data size increases.
I was looking at the rest of reporting views and for me those seem to be
just basic ones providing just raw data to be used later in custom nice
friendly human-readable views built on the client side.
For example "pg_stat_progress_basebackup" also reports "backup_streamed" in
raw form.
Anyway if you would like to make this view more user-friendly, I can add
that. Just ping me.
Regards,
Show quoted text
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Sun, Jun 21, 2020 at 01:40:34PM +0200, Josef Šimánek wrote:
Thanks for all comments. I have updated code to support more options
(including STDIN/STDOUT) and added some documentation.Patch is attached and can be found also at
https://github.com/simi/postgres/pull/5.Diff version: https://github.com/simi/postgres/pull/5.diff
Patch version: https://github.com/simi/postgres/pull/5.patchI'm also attaching screenshot of HTML documentation and html documentation
file.I'll do my best to get this to commitfest now.
I see we're not showing the total number of bytes the COPY is expected
to process, which makes it hard to estimate how far we actually are.
Clearly there are cases when we really don't know that (exports, import
from stdin/program), but why not to show file size for imports from a
file? I'd expect that to be the most common case.
I wonder if it made sense to show some estimates in the other cases. For
example when exporting query result, maybe we could show the estimated
number of rows and size? Of course, that's prone to estimation errors
and it's more a wild idea for the future, I don't expect this patch to
implement that.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
po 22. 6. 2020 v 14:14 odesílatel Tomas Vondra <tomas.vondra@2ndquadrant.com>
napsal:
On Sun, Jun 21, 2020 at 01:40:34PM +0200, Josef Šimánek wrote:
Thanks for all comments. I have updated code to support more options
(including STDIN/STDOUT) and added some documentation.Patch is attached and can be found also at
https://github.com/simi/postgres/pull/5.Diff version: https://github.com/simi/postgres/pull/5.diff
Patch version: https://github.com/simi/postgres/pull/5.patchI'm also attaching screenshot of HTML documentation and html documentation
file.I'll do my best to get this to commitfest now.
I see we're not showing the total number of bytes the COPY is expected
to process, which makes it hard to estimate how far we actually are.
Clearly there are cases when we really don't know that (exports, import
from stdin/program), but why not to show file size for imports from a
file? I'd expect that to be the most common case.
For COPY FROM file fstat is done and info is available already at
https://github.com/postgres/postgres/blob/fe186b4c200b76a5c0f03379fe8645ed1c70a844/src/backend/commands/copy.c#L1934.
It should be easy to update some param (param6 for example) with file size
and expose it in report view. When not available, this column can be NULL.
Would that be enough?
On the other side everyone can check file size manually to get total value
expected and just compare to reported bytes_processed. Alt. "wc -l" can be
checked to get amount of lines and check lines_processed column to get
progress. Should it check amount of lines and populate another column with
lines total (using a configured separator) as well? AFAIK that would need
full file scan which can be slow for huge files.
I wonder if it made sense to show some estimates in the other cases. For
example when exporting query result, maybe we could show the estimated
number of rows and size? Of course, that's prone to estimation errors
and it's more a wild idea for the future, I don't expect this patch to
implement that.
My plan here was to expose numbers not being currently available and let
clients get the rest of info on their own.
For example:
- for "COPY (query) TO file" - EXPLAIN or COUNT variant of query could be
executed before to get the amount of expected rows
- for "COPY table FROM file" - file size or amount of lines in file can be
inspected first to get amount of expected rows or bytes to be processed
I see the current system view in my patch (and also all other report views
currently available) more as a scaffold to build own tools.
For example CLI tools can use this to provide some kind of progress.
Show quoted text
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Jun 22, 2020 at 03:33:00PM +0200, Josef Šimánek wrote:
po 22. 6. 2020 v 14:14 odesílatel Tomas Vondra <tomas.vondra@2ndquadrant.com>
napsal:On Sun, Jun 21, 2020 at 01:40:34PM +0200, Josef Šimánek wrote:
Thanks for all comments. I have updated code to support more options
(including STDIN/STDOUT) and added some documentation.Patch is attached and can be found also at
https://github.com/simi/postgres/pull/5.Diff version: https://github.com/simi/postgres/pull/5.diff
Patch version: https://github.com/simi/postgres/pull/5.patchI'm also attaching screenshot of HTML documentation and html documentation
file.I'll do my best to get this to commitfest now.
I see we're not showing the total number of bytes the COPY is expected
to process, which makes it hard to estimate how far we actually are.
Clearly there are cases when we really don't know that (exports, import
from stdin/program), but why not to show file size for imports from a
file? I'd expect that to be the most common case.For COPY FROM file fstat is done and info is available already at
https://github.com/postgres/postgres/blob/fe186b4c200b76a5c0f03379fe8645ed1c70a844/src/backend/commands/copy.c#L1934.
It should be easy to update some param (param6 for example) with file size
and expose it in report view. When not available, this column can be NULL.Would that be enough?
Yes, I think that'd be fine. The rows without a file should have NULL,
because we literally don't know what the value is. And 0 is a valid file
size, so we can't use it anyway.
On the other side everyone can check file size manually to get total value
expected and just compare to reported bytes_processed. Alt. "wc -l" can be
checked to get amount of lines and check lines_processed column to get
progress. Should it check amount of lines and populate another column with
lines total (using a configured separator) as well? AFAIK that would need
full file scan which can be slow for huge files.
Sure, but the extra `wc -l` is less convenient and you then need to
combine that with pg_stat_progress_copy. With the information right in
the view, you can do (100.0 * bytes_processed / bytes_total) and you get
the progress as a percentage. (I've omitted the NULL handling.)
As for the number of lines, I certainly don't think we need to scan the
file - that'd be far too expensive. What we might do is estimate it as
total_bytes / (processed_bytes / processed_rows)
but that's something people can easily do on their own. So I don't think
it needs to be part of the patch, and IMHO bytes_processed / bytes_total
is a sufficient measure of progress.
I wonder if it made sense to show some estimates in the other cases. For
example when exporting query result, maybe we could show the estimated
number of rows and size? Of course, that's prone to estimation errors
and it's more a wild idea for the future, I don't expect this patch to
implement that.My plan here was to expose numbers not being currently available and let
clients get the rest of info on their own.For example:
- for "COPY (query) TO file" - EXPLAIN or COUNT variant of query could be
executed before to get the amount of expected rows
- for "COPY table FROM file" - file size or amount of lines in file can be
inspected first to get amount of expected rows or bytes to be processedI see the current system view in my patch (and also all other report views
currently available) more as a scaffold to build own tools.For example CLI tools can use this to provide some kind of progress.
True, but I'd advise against putting this into v1 of the patch. Let's
keep it simple, get it committed and then maybe improve it later.
Some of these stats (like the estimates from a query) may be quite
unreliable, so I think it needs more discussion. We might invent
lines_estimated or something like that, for example.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Jun 22, 2020 at 4:28 PM Josef Šimánek <josef.simanek@gmail.com> wrote:
Thanks for the hint regarding "CopyReadLineText". I'll take a look.
For now I have tested those cases:
CREATE TABLE test(id int);
INSERT INTO test SELECT 1 FROM generate_series(1, 1000000);
COPY (SELECT * FROM test) TO '/tmp/ids';
COPY test FROM '/tmp/ids';psql -h /tmp yr -c 'COPY (SELECT 1 from generate_series(1,100000000)) TO STDOUT;' > /tmp/ryba.txt
echo /tmp/ryba.txt | psql -h /tmp yr -c 'COPY test FROM STDIN'It is easy to check lines count and bytes count are in sync (since 1 line is 2 bytes here - "1" and newline character).
I'll try to check more complex COPY commands to ensure everything is in sync.If you have any ideas for testing queries, feel free to suggest.
For copy from statement you could attach the session, put a breakpoint
at CopyReadLineText, execution will hit this breakpoint for every
record it is doing COPY FROM and parallely check if
pg_stat_progress_copy is getting updated correctly. I noticed it was
showing the file read size instead of the actual processed bytes.
+pg_stat_progress_copy| SELECT s.pid, + s.datid, + d.datname, + s.relid, + CASE s.param1 + WHEN 0 THEN 'TO'::text + WHEN 1 THEN 'FROM'::text + ELSE NULL::text + END AS direction, + ((s.param2)::integer)::boolean AS file, + ((s.param3)::integer)::boolean AS program, + s.param4 AS lines_processed, + s.param5 AS file_bytes_processedYou could include pg_size_pretty for s.param5 like
pg_size_pretty(S.param5) AS bytes_processed, it will be easier for
users to understand bytes_processed when the data size increases.I was looking at the rest of reporting views and for me those seem to be just basic ones providing just raw data to be used later in custom nice friendly human-readable views built on the client side.
For example "pg_stat_progress_basebackup" also reports "backup_streamed" in raw form.Anyway if you would like to make this view more user-friendly, I can add that. Just ping me.
I felt we could add pg_size_pretty to make the view more user friendly.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Tue, Jun 23, 2020 at 03:40:08PM +0530, vignesh C wrote:
On Mon, Jun 22, 2020 at 4:28 PM Josef Šimánek <josef.simanek@gmail.com> wrote:
Thanks for the hint regarding "CopyReadLineText". I'll take a look.
For now I have tested those cases:
CREATE TABLE test(id int);
INSERT INTO test SELECT 1 FROM generate_series(1, 1000000);
COPY (SELECT * FROM test) TO '/tmp/ids';
COPY test FROM '/tmp/ids';psql -h /tmp yr -c 'COPY (SELECT 1 from generate_series(1,100000000)) TO STDOUT;' > /tmp/ryba.txt
echo /tmp/ryba.txt | psql -h /tmp yr -c 'COPY test FROM STDIN'It is easy to check lines count and bytes count are in sync (since 1 line is 2 bytes here - "1" and newline character).
I'll try to check more complex COPY commands to ensure everything is in sync.If you have any ideas for testing queries, feel free to suggest.
For copy from statement you could attach the session, put a breakpoint
at CopyReadLineText, execution will hit this breakpoint for every
record it is doing COPY FROM and parallely check if
pg_stat_progress_copy is getting updated correctly. I noticed it was
showing the file read size instead of the actual processed bytes.+pg_stat_progress_copy| SELECT s.pid, + s.datid, + d.datname, + s.relid, + CASE s.param1 + WHEN 0 THEN 'TO'::text + WHEN 1 THEN 'FROM'::text + ELSE NULL::text + END AS direction, + ((s.param2)::integer)::boolean AS file, + ((s.param3)::integer)::boolean AS program, + s.param4 AS lines_processed, + s.param5 AS file_bytes_processedYou could include pg_size_pretty for s.param5 like
pg_size_pretty(S.param5) AS bytes_processed, it will be easier for
users to understand bytes_processed when the data size increases.I was looking at the rest of reporting views and for me those seem to be just basic ones providing just raw data to be used later in custom nice friendly human-readable views built on the client side.
For example "pg_stat_progress_basebackup" also reports "backup_streamed" in raw form.Anyway if you would like to make this view more user-friendly, I can add that. Just ping me.
I felt we could add pg_size_pretty to make the view more user friendly.
Please no. That'd make processing of the data (say, computing progress
as processed/total) impossible. It's easy to add pg_size_pretty if you
want it, it's impossible to undo it. I don't see a single pg_size_pretty
call in system_views.sql.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
út 23. 6. 2020 v 13:15 odesílatel Tomas Vondra <tomas.vondra@2ndquadrant.com>
napsal:
On Tue, Jun 23, 2020 at 03:40:08PM +0530, vignesh C wrote:
On Mon, Jun 22, 2020 at 4:28 PM Josef Šimánek <josef.simanek@gmail.com>
wrote:
Thanks for the hint regarding "CopyReadLineText". I'll take a look.
For now I have tested those cases:
CREATE TABLE test(id int);
INSERT INTO test SELECT 1 FROM generate_series(1, 1000000);
COPY (SELECT * FROM test) TO '/tmp/ids';
COPY test FROM '/tmp/ids';psql -h /tmp yr -c 'COPY (SELECT 1 from generate_series(1,100000000))
TO STDOUT;' > /tmp/ryba.txt
echo /tmp/ryba.txt | psql -h /tmp yr -c 'COPY test FROM STDIN'
It is easy to check lines count and bytes count are in sync (since 1
line is 2 bytes here - "1" and newline character).
I'll try to check more complex COPY commands to ensure everything is in
sync.
If you have any ideas for testing queries, feel free to suggest.
For copy from statement you could attach the session, put a breakpoint
at CopyReadLineText, execution will hit this breakpoint for every
record it is doing COPY FROM and parallely check if
pg_stat_progress_copy is getting updated correctly. I noticed it was
showing the file read size instead of the actual processed bytes.+pg_stat_progress_copy| SELECT s.pid, + s.datid, + d.datname, + s.relid, + CASE s.param1 + WHEN 0 THEN 'TO'::text + WHEN 1 THEN 'FROM'::text + ELSE NULL::text + END AS direction, + ((s.param2)::integer)::boolean AS file, + ((s.param3)::integer)::boolean AS program, + s.param4 AS lines_processed, + s.param5 AS file_bytes_processedYou could include pg_size_pretty for s.param5 like
pg_size_pretty(S.param5) AS bytes_processed, it will be easier for
users to understand bytes_processed when the data size increases.I was looking at the rest of reporting views and for me those seem to
be just basic ones providing just raw data to be used later in custom nice
friendly human-readable views built on the client side.For example "pg_stat_progress_basebackup" also reports
"backup_streamed" in raw form.
Anyway if you would like to make this view more user-friendly, I can
add that. Just ping me.
I felt we could add pg_size_pretty to make the view more user friendly.
Please no. That'd make processing of the data (say, computing progress
as processed/total) impossible. It's easy to add pg_size_pretty if you
want it, it's impossible to undo it. I don't see a single pg_size_pretty
call in system_views.sql.
+1, *_pretty functions should be used on the client side only. Server side
(source) should be in raw format.
Regards
Pavel
Show quoted text
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
út 23. 6. 2020 v 13:15 odesílatel Tomas Vondra <tomas.vondra@2ndquadrant.com>
napsal:
On Tue, Jun 23, 2020 at 03:40:08PM +0530, vignesh C wrote:
On Mon, Jun 22, 2020 at 4:28 PM Josef Šimánek <josef.simanek@gmail.com>
wrote:
Thanks for the hint regarding "CopyReadLineText". I'll take a look.
For now I have tested those cases:
CREATE TABLE test(id int);
INSERT INTO test SELECT 1 FROM generate_series(1, 1000000);
COPY (SELECT * FROM test) TO '/tmp/ids';
COPY test FROM '/tmp/ids';psql -h /tmp yr -c 'COPY (SELECT 1 from generate_series(1,100000000))
TO STDOUT;' > /tmp/ryba.txt
echo /tmp/ryba.txt | psql -h /tmp yr -c 'COPY test FROM STDIN'
It is easy to check lines count and bytes count are in sync (since 1
line is 2 bytes here - "1" and newline character).
I'll try to check more complex COPY commands to ensure everything is in
sync.
If you have any ideas for testing queries, feel free to suggest.
For copy from statement you could attach the session, put a breakpoint
at CopyReadLineText, execution will hit this breakpoint for every
record it is doing COPY FROM and parallely check if
pg_stat_progress_copy is getting updated correctly. I noticed it was
showing the file read size instead of the actual processed bytes.+pg_stat_progress_copy| SELECT s.pid, + s.datid, + d.datname, + s.relid, + CASE s.param1 + WHEN 0 THEN 'TO'::text + WHEN 1 THEN 'FROM'::text + ELSE NULL::text + END AS direction, + ((s.param2)::integer)::boolean AS file, + ((s.param3)::integer)::boolean AS program, + s.param4 AS lines_processed, + s.param5 AS file_bytes_processedYou could include pg_size_pretty for s.param5 like
pg_size_pretty(S.param5) AS bytes_processed, it will be easier for
users to understand bytes_processed when the data size increases.I was looking at the rest of reporting views and for me those seem to
be just basic ones providing just raw data to be used later in custom nice
friendly human-readable views built on the client side.For example "pg_stat_progress_basebackup" also reports
"backup_streamed" in raw form.
Anyway if you would like to make this view more user-friendly, I can
add that. Just ping me.
I felt we could add pg_size_pretty to make the view more user friendly.
Please no. That'd make processing of the data (say, computing progress
as processed/total) impossible. It's easy to add pg_size_pretty if you
want it, it's impossible to undo it. I don't see a single pg_size_pretty
call in system_views.sql.
I think the same.
Show quoted text
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
po 15. 6. 2020 v 7:34 odesílatel Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> napsal:
I'm using ftell to get current position in file to populate file_bytes_processed without error handling (ftell can return -1L and also populate errno on problems).
1. Is that a good way to get progress of file processing?
IMO, it's better to handle the error cases. One possible case where
ftell can return -1 and set errno is when the total bytes processed is
more than LONG_MAX.Will your patch handle file_bytes_processed reporting for COPY FROM
STDIN cases? For this case, ftell can't be used.Instead of using ftell and worrying about the errors, a simple
approach could be to have a uint64 variable in CopyStateData to track
the number of bytes read whenever CopyGetData is called. This approach
can also handle the case of COPY FROM STDIN.Thanks for suggestion. I used this approach and latest patch supports both STDIN and STDOUT now.
Thanks.
It would be good to see the performance of the copy command(probably
with a few GBs of data) with patch and without patch for both csv/text
and binary files.
For copy from command CopyGetData gets called for every
RAW_BUF_SIZE(64KB) and so is CopyUpdateBytesProgress function, but for
binary format files, CopyGetData gets called for each field/column for
all rows/lines/tuples.
Can we make CopyUpdateBytesProgress() a macro or an inline
function(probably by using pg_attribute_always_inline) to reduce
function call overhead as it just handles two statements?
I tried to apply the patch on commit #
7ce461560159948ba0c802c767e42c5f5ae08b4a, seems like a warning.
bharath:postgres$ git apply /mnt/hgfs/Downloads/copy-progress-v2.diff
/mnt/hgfs/Downloads/copy-progress-v2.diff:277: trailing whitespace.
* for counting tuples inserted by an INSERT
command. Update
warning: 1 line adds whitespace errors.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On 2020/06/22 17:21, Josef Šimánek wrote:
po 22. 6. 2020 v 4:48 odesílatel Fujii Masao <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>> napsal:
On 2020/06/21 20:33, Josef Šimánek wrote:
po 15. 6. 2020 v 6:39 odesílatel Fujii Masao <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com> <mailto:masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>>> napsal:
On 2020/06/14 21:32, Josef Šimánek wrote:
> Hello, as proposed by Pavel Stěhule and discussed on local czech PostgreSQL maillist (https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email&utm_source=footer), I have prepared an initial patch for COPY command progress reporting.Sounds nice!
> file - bool - is file is used?
> program - bool - is program used?Are these fields really necessary in a progress view?
What values are reported when STDOUT/STDIN is specified in COPY command?For STDOUT and STDIN file is true and program is false.
Could you tell me why these columns are necessary in *progress* view?
If we want to see what copy command is actually running, we can see
pg_stat_activity, instead. For example,SELECT pc.*, a.query FROM pg_stat_progress_copy pc, pg_stat_activity a WHERE pc.pid = a.pid;
If that doesn't make any sense, I can remove those. I have not strong opinion about those values. Those were just around when I was looking for possible values to include in the progress report.
I vote not to expose them. *If* we expose them, we should also
expose the options in pg_stat_progress_xxx views, for example,
the options for BASE_BACKUP command in pg_stat_progress_basebackup,
for the consistency. But I don't think that makes sense.
> file_bytes_processed - amount of bytes processed when file is used (otherwise 0), works for both direction (
> FROM/TO) when file is used (file = t)What value is reported when STDOUT/STDIN is specified in COPY command?
For my first patch nothing was reported on STDOUT/STDIN usage. I'll attach new patch soon supporting those as well.
Thanks for the patch!
With the patch, pg_stat_progress_copy seems to report the progress of
the processing on file_fdw. Is this intentional?Every action using internally COPY will be included in the progress report view.
I have spotted for example pg_dump does that and is reported there as well.
I do not see any problem regarding this. For pg_dump it is consistent with "pg_stat_activity" reporting COPY command in the query field.
So it's better to add this kind of information into the docs?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2020/06/22 21:14, Tomas Vondra wrote:
On Sun, Jun 21, 2020 at 01:40:34PM +0200, Josef Šimánek wrote:
Thanks for all comments. I have updated code to support more options
(including STDIN/STDOUT) and added some documentation.Patch is attached and can be found also at
https://github.com/simi/postgres/pull/5.Diff version: https://github.com/simi/postgres/pull/5.diff
Patch version: https://github.com/simi/postgres/pull/5.patchI'm also attaching screenshot of HTML documentation and html documentation
file.I'll do my best to get this to commitfest now.
I see we're not showing the total number of bytes the COPY is expected
to process, which makes it hard to estimate how far we actually are.
Clearly there are cases when we really don't know that (exports, import
from stdin/program), but why not to show file size for imports from a
file? I'd expect that to be the most common case.
+1
I like using \copy psql meta command. So I feel better if the total size
is reported even when using \copy (i.e., COPY STDIN).
As just idea, what about adding new option into COPY command,
allowing users (including \copy command) to specify the estimated size
of input file in that option, and making pg_stat_progress_copy view
display it as the total size? If we implement this mechanism, we can
change \copy command so that it calculate the actual size of input file
and specify it in that option.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Tue, Jun 23, 2020 at 4:45 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
Anyway if you would like to make this view more user-friendly, I can add that. Just ping me.
I felt we could add pg_size_pretty to make the view more user friendly.
Please no. That'd make processing of the data (say, computing progress
as processed/total) impossible. It's easy to add pg_size_pretty if you
want it, it's impossible to undo it. I don't see a single pg_size_pretty
call in system_views.sql.
I thought of including pg_size_pretty as we there was no total_bytes
to compare with, but I'm ok without it too as there is an option for
user to always include it in the client side like "SELECT
pg_size_pretty(file_bytes_processed) from pg_stat_progress_copy;" if
required.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
The automated patchtester for the Commitfest gets confused when there are two
versions of the same changeset attached to the email, as it tries to apply them
both which obviously results in an application failure. I've attached just the
previously submitted patch version to this email to see if we can get a test
run of it.
cheers ./daniel
Attachments:
copy-progress-v2.patchapplication/octet-stream; name=copy-progress-v2.patch; x-unix-mode=0644Download
From 64bf6cd66478e5b5b710d404b726c4147a98a660 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Josef=20=C5=A0im=C3=A1nek?= <josef.simanek@gmail.com>
Date: Mon, 8 Jun 2020 02:00:53 +0200
Subject: [PATCH 1/3] Initial work on COPY progress.
---
src/backend/commands/copy.c | 13 ++++++++++---
src/backend/utils/adt/pgstatfuncs.c | 2 ++
src/include/commands/progress.h | 3 +++
src/include/pgstat.h | 3 ++-
4 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6b1fd6d4cca6..bef551770862 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -29,6 +29,7 @@
#include "catalog/pg_type.h"
#include "commands/copy.h"
#include "commands/defrem.h"
+#include "commands/progress.h"
#include "commands/trigger.h"
#include "executor/execPartition.h"
#include "executor/executor.h"
@@ -45,6 +46,7 @@
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
+#include "pgstat.h"
#include "port/pg_bswap.h"
#include "rewrite/rewriteHandler.h"
#include "storage/fd.h"
@@ -1751,6 +1753,9 @@ BeginCopy(ParseState *pstate,
cstate->copy_dest = COPY_FILE; /* default */
+ pgstat_progress_start_command(PROGRESS_COMMAND_COPY, queryRelId);
+ pgstat_progress_update_param(PROGRESS_COPY_PROCESSED,0);
+
MemoryContextSwitchTo(oldcontext);
return cstate;
@@ -1810,6 +1815,8 @@ EndCopy(CopyState cstate)
cstate->filename)));
}
+ pgstat_progress_end_command();
+
MemoryContextDelete(cstate->copycontext);
pfree(cstate);
}
@@ -2122,7 +2129,7 @@ CopyTo(CopyState cstate)
/* Format and send the data */
CopyOneRowTo(cstate, slot);
- processed++;
+ pgstat_progress_update_param(PROGRESS_COPY_PROCESSED, ++processed);
}
ExecDropSingleTupleTableSlot(slot);
@@ -3261,7 +3268,7 @@ CopyFrom(CopyState cstate)
* or FDW; this is the same definition used by nodeModifyTable.c
* for counting tuples inserted by an INSERT command.
*/
- processed++;
+ pgstat_progress_update_param(PROGRESS_COPY_PROCESSED, ++processed);
}
}
@@ -5114,7 +5121,7 @@ copy_dest_receive(TupleTableSlot *slot, DestReceiver *self)
/* Send the data */
CopyOneRowTo(cstate, slot);
- myState->processed++;
+ pgstat_progress_update_param(PROGRESS_COPY_PROCESSED, ++myState->processed);
return true;
}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 2aff739466ff..b740eef7c102 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -494,6 +494,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
cmdtype = PROGRESS_COMMAND_CREATE_INDEX;
else if (pg_strcasecmp(cmd, "BASEBACKUP") == 0)
cmdtype = PROGRESS_COMMAND_BASEBACKUP;
+ else if (pg_strcasecmp(cmd, "COPY") == 0)
+ cmdtype = PROGRESS_COMMAND_COPY;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 36b073e67757..2d72f12a75b5 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -133,4 +133,7 @@
#define PROGRESS_BASEBACKUP_PHASE_WAIT_WAL_ARCHIVE 4
#define PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL 5
+/* Commands of PROGRESS_CLUSTER */
+#define PROGRESS_COPY_PROCESSED 0
+
#endif
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 13872013823e..c338a1a5ba00 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -995,7 +995,8 @@ typedef enum ProgressCommandType
PROGRESS_COMMAND_ANALYZE,
PROGRESS_COMMAND_CLUSTER,
PROGRESS_COMMAND_CREATE_INDEX,
- PROGRESS_COMMAND_BASEBACKUP
+ PROGRESS_COMMAND_BASEBACKUP,
+ PROGRESS_COMMAND_COPY
} ProgressCommandType;
#define PGSTAT_NUM_PROGRESS_PARAM 20
From f9cb4fe9b3fc223cb3174044ea00edbb96675138 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Josef=20=C5=A0im=C3=A1nek?= <josef.simanek@gmail.com>
Date: Sun, 14 Jun 2020 02:46:04 +0200
Subject: [PATCH 2/3] Enhance copy progress with more info. - add
pg_stat_progress_copy system view
---
src/backend/catalog/system_views.sql | 14 ++++++++++++++
src/backend/commands/copy.c | 16 +++++++++++-----
src/include/commands/progress.h | 8 ++++++--
src/test/regress/expected/rules.out | 15 +++++++++++++++
4 files changed, 46 insertions(+), 7 deletions(-)
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5314e9348fa7..7bc429ce299e 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1093,6 +1093,20 @@ CREATE VIEW pg_stat_progress_basebackup AS
S.param5 AS tablespaces_streamed
FROM pg_stat_get_progress_info('BASEBACKUP') AS S;
+CREATE VIEW pg_stat_progress_copy AS
+ SELECT
+ S.pid AS pid, S.datid AS datid, D.datname AS datname,
+ S.relid AS relid,
+ CASE S.param1 WHEN 0 THEN 'TO'
+ WHEN 1 THEN 'FROM'
+ END as direction,
+ CAST (S.param2::integer AS bool) AS file,
+ CAST (S.param3::integer AS bool) AS program,
+ S.param4 AS lines_processed,
+ S.param5 AS file_bytes_processed
+ FROM pg_stat_get_progress_info('COPY') AS S
+ LEFT JOIN pg_database D ON S.datid = D.oid;
+
CREATE VIEW pg_user_mappings AS
SELECT
U.oid AS umid,
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index bef551770862..a733b6e1c676 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -565,6 +565,7 @@ CopySendEndOfRow(CopyState cstate)
(errcode_for_file_access(),
errmsg("could not write to COPY file: %m")));
}
+ pgstat_progress_update_param(PROGRESS_COPY_BYTES_PROCESSED, ftell(cstate->copy_file));
break;
case COPY_OLD_FE:
/* The FE/BE protocol uses \n as newline for all platforms */
@@ -617,6 +618,7 @@ CopyGetData(CopyState cstate, void *databuf, int minread, int maxread)
{
case COPY_FILE:
bytesread = fread(databuf, 1, maxread, cstate->copy_file);
+ pgstat_progress_update_param(PROGRESS_COPY_BYTES_PROCESSED, ftell(cstate->copy_file));
if (ferror(cstate->copy_file))
ereport(ERROR,
(errcode_for_file_access(),
@@ -1753,8 +1755,12 @@ BeginCopy(ParseState *pstate,
cstate->copy_dest = COPY_FILE; /* default */
- pgstat_progress_start_command(PROGRESS_COMMAND_COPY, queryRelId);
- pgstat_progress_update_param(PROGRESS_COPY_PROCESSED,0);
+ pgstat_progress_start_command(PROGRESS_COMMAND_COPY, cstate->rel ? RelationGetRelid(cstate->rel) : InvalidOid);
+
+ pgstat_progress_update_param(PROGRESS_COPY_BYTES_PROCESSED, 0);
+ pgstat_progress_update_param(PROGRESS_COPY_IS_FROM, (int) cstate->is_copy_from);
+ pgstat_progress_update_param(PROGRESS_COPY_IS_FILE, (int) cstate->copy_dest == COPY_FILE);
+ pgstat_progress_update_param(PROGRESS_COPY_IS_PROGRAM, (int) cstate->is_program);
MemoryContextSwitchTo(oldcontext);
@@ -2129,7 +2135,7 @@ CopyTo(CopyState cstate)
/* Format and send the data */
CopyOneRowTo(cstate, slot);
- pgstat_progress_update_param(PROGRESS_COPY_PROCESSED, ++processed);
+ pgstat_progress_update_param(PROGRESS_COPY_LINES_PROCESSED, ++processed);
}
ExecDropSingleTupleTableSlot(slot);
@@ -3268,7 +3274,7 @@ CopyFrom(CopyState cstate)
* or FDW; this is the same definition used by nodeModifyTable.c
* for counting tuples inserted by an INSERT command.
*/
- pgstat_progress_update_param(PROGRESS_COPY_PROCESSED, ++processed);
+ pgstat_progress_update_param(PROGRESS_COPY_LINES_PROCESSED, ++processed);
}
}
@@ -5121,7 +5127,7 @@ copy_dest_receive(TupleTableSlot *slot, DestReceiver *self)
/* Send the data */
CopyOneRowTo(cstate, slot);
- pgstat_progress_update_param(PROGRESS_COPY_PROCESSED, ++myState->processed);
+ pgstat_progress_update_param(PROGRESS_COPY_LINES_PROCESSED, ++myState->processed);
return true;
}
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 2d72f12a75b5..3947222e6f61 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -133,7 +133,11 @@
#define PROGRESS_BASEBACKUP_PHASE_WAIT_WAL_ARCHIVE 4
#define PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL 5
-/* Commands of PROGRESS_CLUSTER */
-#define PROGRESS_COPY_PROCESSED 0
+/* Commands of PROGRESS_COPY */
+#define PROGRESS_COPY_IS_FROM 0
+#define PROGRESS_COPY_IS_FILE 1
+#define PROGRESS_COPY_IS_PROGRAM 2
+#define PROGRESS_COPY_LINES_PROCESSED 3
+#define PROGRESS_COPY_BYTES_PROCESSED 4
#endif
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index b813e322153d..dac5da4e6dd9 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1925,6 +1925,21 @@ pg_stat_progress_cluster| SELECT s.pid,
s.param8 AS index_rebuild_count
FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_copy| SELECT s.pid,
+ s.datid,
+ d.datname,
+ s.relid,
+ CASE s.param1
+ WHEN 0 THEN 'TO'::text
+ WHEN 1 THEN 'FROM'::text
+ ELSE NULL::text
+ END AS direction,
+ ((s.param2)::integer)::boolean AS file,
+ ((s.param3)::integer)::boolean AS program,
+ s.param4 AS lines_processed,
+ s.param5 AS file_bytes_processed
+ FROM (pg_stat_get_progress_info('COPY'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+ LEFT JOIN pg_database d ON ((s.datid = d.oid)));
pg_stat_progress_create_index| SELECT s.pid,
s.datid,
d.datname,
From 5b1ddddf0c0a83606757b762f996e9c40469a831 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Josef=20=C5=A0im=C3=A1nek?= <josef.simanek@gmail.com>
Date: Sun, 21 Jun 2020 13:15:57 +0200
Subject: [PATCH 3/3] Support STDIN, STDOUT. Add docs.
---
doc/src/sgml/monitoring.sgml | 119 +++++++++++++++++++++++++++
src/backend/catalog/system_views.sql | 2 +-
src/backend/commands/copy.c | 34 ++++++--
src/test/regress/expected/rules.out | 2 +-
4 files changed, 149 insertions(+), 8 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index dfa9d0d6410c..d4a9fa9e2f2f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -390,6 +390,13 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
</entry>
</row>
+ <row>
+ <entry><structname>pg_stat_progress_copy</structname><indexterm><primary>pg_stat_progress_copy</primary></indexterm></entry>
+ <entry>One row for each backend running <command>COPY</command>, showing current progress.
+ See <xref linkend='copy-progress-reporting'/>.
+ </entry>
+ </row>
+
</tbody>
</tgroup>
</table>
@@ -5013,6 +5020,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
which support progress reporting are <command>ANALYZE</command>,
<command>CLUSTER</command>,
<command>CREATE INDEX</command>, <command>VACUUM</command>,
+ <command>COPY</command>
and <xref linkend="protocol-replication-base-backup"/> (i.e., replication
command that <xref linkend="app-pgbasebackup"/> issues to take
a base backup).
@@ -6142,6 +6150,117 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
</table>
</sect2>
+
+ <sect2 id="copy-progress-reporting">
+ <title>COPY Progress Reporting</title>
+
+ <para>
+ Whenever <command>COPY</command> is running, the
+ <structname>pg_stat_copy_progress</structname> view will contain one row
+ for each backend that is currently running <command>COPY</command> command.
+ The table bellow describes the information that will be reported and provide
+ information how to interpret it.
+ </para>
+
+ <table id="pg-stat-progress-copy-view" xreflabel="pg_stat_progress_copy">
+ <title><structname>pg_stat_progress_copy</structname> View</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ Column Type
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>pid</structfield> <type>integer</type>
+ </para>
+ <para>
+ Process ID of backend.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>datid</structfield> <type>text</type>
+ </para>
+ <para>
+ OID of the database to which this backend is connected.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>datname</structfield> <type>name</type>
+ </para>
+ <para>
+ Name of the database to which this backend is connected.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>relid</structfield> <type>oid</type>
+ </para>
+ <para>
+ OID of the table on which the <command>COPY</command> command is executed. It is set to 0 if <structfield>SELECT query</structfield> is provided.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>direction</structfield> <type>text</type>
+ </para>
+ <para>
+ Can be one of <structfield>TO</structfield> or <structfield>FROM</structfield>.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>file</structfield> <type>boolean</type>
+ </para>
+ <para>
+ <structfield>true</structfield> if <structfield>filename</structfield> argument is used. Notice STDIN and STDOUT are considered file as well.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>program</structfield> <type>boolean</type>
+ </para>
+ <para>
+ <structfield>true</structfield> if <structfield>PROGRAM</structfield> option is used.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>line_processed</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of lines already processed by <command>COPY</command> command.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>bytes_processed</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of bytes already processed by <command>COPY</command> command.
+ </para></entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </sect2>
</sect1>
<sect1 id="dynamic-trace">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 7bc429ce299e..53b38ad70cfc 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1103,7 +1103,7 @@ CREATE VIEW pg_stat_progress_copy AS
CAST (S.param2::integer AS bool) AS file,
CAST (S.param3::integer AS bool) AS program,
S.param4 AS lines_processed,
- S.param5 AS file_bytes_processed
+ S.param5 AS bytes_processed
FROM pg_stat_get_progress_info('COPY') AS S
LEFT JOIN pg_database D ON S.datid = D.oid;
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index a733b6e1c676..1eccc7721fe5 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -225,6 +225,7 @@ typedef struct CopyStateData
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
+ uint64 bytes_processed; /* total # of bytes processed, used for progress reporting */
} CopyStateData;
/* DestReceiver for COPY (query) TO */
@@ -393,6 +394,7 @@ static void CopySendInt32(CopyState cstate, int32 val);
static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
+static int CopyUpdateBytesProgress(CopyState cstate, int newbytesprocessed);
/*
@@ -500,18 +502,19 @@ SendCopyEnd(CopyState cstate)
static void
CopySendData(CopyState cstate, const void *databuf, int datasize)
{
- appendBinaryStringInfo(cstate->fe_msgbuf, databuf, datasize);
+ appendBinaryStringInfo(cstate->fe_msgbuf, databuf, CopyUpdateBytesProgress(cstate, datasize));
}
static void
CopySendString(CopyState cstate, const char *str)
{
- appendBinaryStringInfo(cstate->fe_msgbuf, str, strlen(str));
+ appendBinaryStringInfo(cstate->fe_msgbuf, str, CopyUpdateBytesProgress(cstate, strlen(str)));
}
static void
CopySendChar(CopyState cstate, char c)
{
+ CopyUpdateBytesProgress(cstate, 1);
appendStringInfoCharMacro(cstate->fe_msgbuf, c);
}
@@ -565,7 +568,6 @@ CopySendEndOfRow(CopyState cstate)
(errcode_for_file_access(),
errmsg("could not write to COPY file: %m")));
}
- pgstat_progress_update_param(PROGRESS_COPY_BYTES_PROCESSED, ftell(cstate->copy_file));
break;
case COPY_OLD_FE:
/* The FE/BE protocol uses \n as newline for all platforms */
@@ -618,7 +620,6 @@ CopyGetData(CopyState cstate, void *databuf, int minread, int maxread)
{
case COPY_FILE:
bytesread = fread(databuf, 1, maxread, cstate->copy_file);
- pgstat_progress_update_param(PROGRESS_COPY_BYTES_PROCESSED, ftell(cstate->copy_file));
if (ferror(cstate->copy_file))
ereport(ERROR,
(errcode_for_file_access(),
@@ -713,6 +714,8 @@ CopyGetData(CopyState cstate, void *databuf, int minread, int maxread)
break;
}
+ CopyUpdateBytesProgress(cstate, bytesread);
+
return bytesread;
}
@@ -1757,7 +1760,8 @@ BeginCopy(ParseState *pstate,
pgstat_progress_start_command(PROGRESS_COMMAND_COPY, cstate->rel ? RelationGetRelid(cstate->rel) : InvalidOid);
- pgstat_progress_update_param(PROGRESS_COPY_BYTES_PROCESSED, 0);
+ /* initialize progress */
+ cstate->bytes_processed = 0;
pgstat_progress_update_param(PROGRESS_COPY_IS_FROM, (int) cstate->is_copy_from);
pgstat_progress_update_param(PROGRESS_COPY_IS_FILE, (int) cstate->copy_dest == COPY_FILE);
pgstat_progress_update_param(PROGRESS_COPY_IS_PROGRAM, (int) cstate->is_program);
@@ -2135,6 +2139,8 @@ CopyTo(CopyState cstate)
/* Format and send the data */
CopyOneRowTo(cstate, slot);
+
+ /* Increment processed lines counter and update progress */
pgstat_progress_update_param(PROGRESS_COPY_LINES_PROCESSED, ++processed);
}
@@ -3272,7 +3278,8 @@ CopyFrom(CopyState cstate)
/*
* We count only tuples not suppressed by a BEFORE INSERT trigger
* or FDW; this is the same definition used by nodeModifyTable.c
- * for counting tuples inserted by an INSERT command.
+ * for counting tuples inserted by an INSERT command. Update
+ * progress as well.
*/
pgstat_progress_update_param(PROGRESS_COPY_LINES_PROCESSED, ++processed);
}
@@ -5127,6 +5134,7 @@ copy_dest_receive(TupleTableSlot *slot, DestReceiver *self)
/* Send the data */
CopyOneRowTo(cstate, slot);
+ /* Increment processed lines counter and update progress */
pgstat_progress_update_param(PROGRESS_COPY_LINES_PROCESSED, ++myState->processed);
return true;
@@ -5169,3 +5177,17 @@ CreateCopyDestReceiver(void)
return (DestReceiver *) self;
}
+
+/*
+ * CopyUpdateBytesProgress --- increment bytes_processed
+ * on CopyState, updates progress and returns amount
+ * of processed bytes back;
+ */
+
+static int
+CopyUpdateBytesProgress(CopyState cstate, int newbytesprocessed)
+{
+ cstate->bytes_processed += newbytesprocessed;
+ pgstat_progress_update_param(PROGRESS_COPY_BYTES_PROCESSED, cstate->bytes_processed);
+ return newbytesprocessed;
+}
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index dac5da4e6dd9..c04f7b94c3f0 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1937,7 +1937,7 @@ pg_stat_progress_copy| SELECT s.pid,
((s.param2)::integer)::boolean AS file,
((s.param3)::integer)::boolean AS program,
s.param4 AS lines_processed,
- s.param5 AS file_bytes_processed
+ s.param5 AS bytes_processed
FROM (pg_stat_get_progress_info('COPY'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
LEFT JOIN pg_database d ON ((s.datid = d.oid)));
pg_stat_progress_create_index| SELECT s.pid,
čt 2. 7. 2020 v 14:25 odesílatel Daniel Gustafsson <daniel@yesql.se> napsal:
The automated patchtester for the Commitfest gets confused when there are
two
versions of the same changeset attached to the email, as it tries to apply
them
both which obviously results in an application failure. I've attached
just the
previously submitted patch version to this email to see if we can get a
test
run of it.
Thanks, I'm new to commitfest and I was confused as well. I tried to
reattach the thread there. I'll prepare a new patch soon, what should I do?
Just attach it again?
Show quoted text
cheers ./daniel
On 2 Jul 2020, at 14:42, Josef Šimánek <josef.simanek@gmail.com> wrote:
čt 2. 7. 2020 v 14:25 odesílatel Daniel Gustafsson <daniel@yesql.se <mailto:daniel@yesql.se>> napsal:
The automated patchtester for the Commitfest gets confused when there are two
versions of the same changeset attached to the email, as it tries to apply them
both which obviously results in an application failure. I've attached just the
previously submitted patch version to this email to see if we can get a test
run of it.Thanks, I'm new to commitfest and I was confused as well.
No worries, we're here to help.
I tried to reattach the thread there. I'll prepare a new patch soon, what should I do? Just attach it again?
Correct, just reply to the thread with a new version of the patch attached, and
it'll get picked up automatically. No need to do anything more.
cheers ./daniel
On 2020/07/02 21:51, Daniel Gustafsson wrote:
On 2 Jul 2020, at 14:42, Josef Šimánek <josef.simanek@gmail.com> wrote:
čt 2. 7. 2020 v 14:25 odesílatel Daniel Gustafsson <daniel@yesql.se <mailto:daniel@yesql.se>> napsal:The automated patchtester for the Commitfest gets confused when there are two
versions of the same changeset attached to the email, as it tries to apply them
both which obviously results in an application failure. I've attached just the
previously submitted patch version to this email to see if we can get a test
run of it.Thanks, I'm new to commitfest and I was confused as well.
No worries, we're here to help.
I tried to reattach the thread there. I'll prepare a new patch soon, what should I do? Just attach it again?
New patch has not been sent yet.
So I marked this as "Waiting on Author" at Commit Fest.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Thanks for the info. I am waiting for review. Is there any summary of
requested changes needed?
Dne út 28. 7. 2020 19:00 uživatel Fujii Masao <masao.fujii@oss.nttdata.com>
napsal:
Show quoted text
On 2020/07/02 21:51, Daniel Gustafsson wrote:
On 2 Jul 2020, at 14:42, Josef Šimánek <josef.simanek@gmail.com> wrote:
čt 2. 7. 2020 v 14:25 odesílatel Daniel Gustafsson <daniel@yesql.se<mailto:daniel@yesql.se>> napsal:
The automated patchtester for the Commitfest gets confused when there
are two
versions of the same changeset attached to the email, as it tries to
apply them
both which obviously results in an application failure. I've attached
just the
previously submitted patch version to this email to see if we can get a
test
run of it.
Thanks, I'm new to commitfest and I was confused as well.
No worries, we're here to help.
I tried to reattach the thread there. I'll prepare a new patch soon,
what should I do? Just attach it again?
New patch has not been sent yet.
So I marked this as "Waiting on Author" at Commit Fest.Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
út 28. 7. 2020 v 20:25 odesílatel Josef Šimánek <josef.simanek@gmail.com>
napsal:
Thanks for the info. I am waiting for review. Is there any summary of
requested changes needed?
Maybe it is just noise - you wrote so you will resend a patch to different
thread
I tried to reattach the thread there. I'll prepare a new patch soon,
what should I do? Just attach it again?
Regards
Pavel
Show quoted text
Dne út 28. 7. 2020 19:00 uživatel Fujii Masao <masao.fujii@oss.nttdata.com>
napsal:On 2020/07/02 21:51, Daniel Gustafsson wrote:
On 2 Jul 2020, at 14:42, Josef Šimánek <josef.simanek@gmail.com>
wrote:
čt 2. 7. 2020 v 14:25 odesílatel Daniel Gustafsson <daniel@yesql.se
<mailto:daniel@yesql.se>> napsal:
The automated patchtester for the Commitfest gets confused when there
are two
versions of the same changeset attached to the email, as it tries to
apply them
both which obviously results in an application failure. I've attached
just the
previously submitted patch version to this email to see if we can get
a test
run of it.
Thanks, I'm new to commitfest and I was confused as well.
No worries, we're here to help.
I tried to reattach the thread there. I'll prepare a new patch soon,
what should I do? Just attach it again?
New patch has not been sent yet.
So I marked this as "Waiting on Author" at Commit Fest.Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2020/07/29 3:30, Pavel Stehule wrote:
út 28. 7. 2020 v 20:25 odesílatel Josef Šimánek <josef.simanek@gmail.com <mailto:josef.simanek@gmail.com>> napsal:
Thanks for the info. I am waiting for review. Is there any summary of requested changes needed?
Maybe it is just noise - you wrote so you will resend a patch to different thread
I tried to reattach the thread there. I'll prepare a new patch soon, what should I do? Just attach it again?
Yeah, since I read this message, I was thinking that new patch will be
posted. But, Josef, if the situation was changed, could you correct me?
Anyway the patch seems not to be applied cleanly, so at least it needs to
be updated to address that.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Thu, Jul 30, 2020 at 08:51:36AM +0900, Fujii Masao wrote:
Yeah, since I read this message, I was thinking that new patch will be
posted. But, Josef, if the situation was changed, could you correct me?
Anyway the patch seems not to be applied cleanly, so at least it needs to
be updated to address that.
Josef, the patch has been waiting on author for a bit more than one
month, so could you send a rebased version please? It looks that you
are still a bit confused by the commit fest process, and as a first
step we need a clean version to be able to review it. This would also
allow the commit fest bot to check it at http://commitfest.cputube.org/.
--
Michael
On Mon, Sep 07, 2020 at 01:13:10PM +0900, Michael Paquier wrote:
Josef, the patch has been waiting on author for a bit more than one
month, so could you send a rebased version please? It looks that you
are still a bit confused by the commit fest process, and as a first
step we need a clean version to be able to review it. This would also
allow the commit fest bot to check it at http://commitfest.cputube.org/.
This feature has some appeal, but there is no activity lately, so I am
marking it as returned with feedback. Please feel free to send a new
patch once you have time to do so, and I would recommend to register a
new entry in the commit fest app when done.
--
Michael
čt 17. 9. 2020 v 7:09 odesílatel Michael Paquier <michael@paquier.xyz> napsal:
On Mon, Sep 07, 2020 at 01:13:10PM +0900, Michael Paquier wrote:
Josef, the patch has been waiting on author for a bit more than one
month, so could you send a rebased version please? It looks that you
are still a bit confused by the commit fest process, and as a first
step we need a clean version to be able to review it. This would also
allow the commit fest bot to check it at http://commitfest.cputube.org/.This feature has some appeal, but there is no activity lately, so I am
marking it as returned with feedback. Please feel free to send a new
patch once you have time to do so, and I would recommend to register a
new entry in the commit fest app when done.
Thanks for info. I hope I'll be able to revisit this patch soon,
rebase and post again. I'm still interested in this.
Show quoted text
--
Michael