[RFC] Add an until-0 loop in psql
Hi
When running database migrations with .sql files on a live database, it's not
uncommon to have to run a migration in a loop to prevent a big lock on a
table.
For instance, if one wants to delete some old data from a big table, one would
write:
DELETE FROM big_table WHERE id IN (SELECT id FROM big_table WHERE bad = true
LIMIT 1000);
VACUUM big_table;
Right now, doing this is quite inefficient. We either have to write a script
in another language, or run psql in a shell loop and wait for the migration to
stop altering rows.
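For the record, the shell-loop workaround looks something like this (a rough
sketch, assuming a database named mydb; it keys off the DELETE command tag
that psql prints):

while :; do
    # run one batch; psql prints the command tag, e.g. "DELETE 1000"
    tag=$(psql -X -d mydb -c "DELETE FROM big_table WHERE id IN (SELECT id FROM big_table WHERE bad = true LIMIT 1000);")
    psql -X -q -d mydb -c "VACUUM big_table;"
    # stop once a batch deletes nothing
    [ "$tag" = "DELETE 0" ] && break
done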
The attached **proof of concept** patch (I insist, it's a 15-minute hack
sprint with no previous knowledge of psql code) implements an 'until-0' loop
in psql.
The previous migration could simply be written as:
\until-0
BEGIN;
DELETE FROM big_table WHERE id IN (SELECT id FROM big_table WHERE bad = true
LIMIT 1000);
VACUUM big_table;
COMMIT;
\end-until
And psql will execute it until no rows are affected by the inner queries.
I am willing to write a proper patch for this (I hope the tell/seek approach
is an acceptable implementation…), but I would like some feedback first.
Thanks
Pierre
Attachments:
psql-until-0--hack01.patch (text/x-patch)
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 4c85f43f09..d706e38ffc 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -401,6 +401,19 @@ exec_command(const char *cmd,
 		status = exec_command_shell_escape(scan_state, active_branch);
 	else if (strcmp(cmd, "?") == 0)
 		status = exec_command_slash_command_help(scan_state, active_branch);
+	else if (strcmp(cmd, "until-0") == 0) {
+		status = PSQL_CMD_SKIP_LINE;
+		pset.is_in_until = ftell(pset.cur_cmd_source);
+		pset.has_affected_rows = false;
+	} else if (strcmp(cmd, "end-until") == 0) {
+		status = PSQL_CMD_SKIP_LINE;
+		if (pset.has_affected_rows) {
+			fseek(pset.cur_cmd_source, pset.is_in_until, SEEK_SET);
+			pset.has_affected_rows = false;
+		} else {
+			pset.is_in_until = 0;
+		}
+	}
 	else
 		status = PSQL_CMD_UNKNOWN;
diff --git a/src/bin/psql/mainloop.c b/src/bin/psql/mainloop.c
index c06ce3ca09..869dbf6dcd 100644
--- a/src/bin/psql/mainloop.c
+++ b/src/bin/psql/mainloop.c
@@ -430,6 +430,12 @@ MainLoop(FILE *source)
 			{
 				success = SendQuery(query_buf->data);
 				slashCmdStatus = success ? PSQL_CMD_SEND : PSQL_CMD_ERROR;
+				if (success && pset.is_in_until && !pset.has_affected_rows)
+				{
+					const char *row_count = GetVariable(pset.vars, "ROW_COUNT");
+					if (row_count != NULL && strcmp(row_count, "0") != 0)
+						pset.has_affected_rows = true;
+				}
 				pset.stmt_lineno = 1;
 				/* transfer query to previous_buf by pointer-swapping */
diff --git a/src/bin/psql/settings.h b/src/bin/psql/settings.h
index 69e617e6b5..f3f92fe899 100644
--- a/src/bin/psql/settings.h
+++ b/src/bin/psql/settings.h
@@ -139,6 +139,9 @@ typedef struct _psqlSettings
 	const char *prompt3;
 	PGVerbosity verbosity;		/* current error verbosity level */
 	PGContextVisibility show_context;	/* current context display level */
+
+	long		is_in_until;
+	int			has_affected_rows;
 } PsqlSettings;
 
 extern PsqlSettings pset;
diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c
index be57574cd3..beba8851a3 100644
--- a/src/bin/psql/startup.c
+++ b/src/bin/psql/startup.c
@@ -134,6 +134,8 @@ main(int argc, char *argv[])
 	pset.last_error_result = NULL;
 	pset.cur_cmd_source = stdin;
 	pset.cur_cmd_interactive = false;
+	pset.is_in_until = 0;
+	pset.has_affected_rows = false;
 	/* We rely on unmentioned fields of pset.popt to start out 0/false/NULL */
 	pset.popt.topt.format = PRINT_ALIGNED;
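For context, a sketch of how this POC would presumably be driven. Since it
relies on ftell()/fseek() on the command source, the input would have to be a
seekable file (e.g. psql -f migration.sql) rather than a pipe or an
interactive session. In this sketch the VACUUM is also moved after the
COMMIT, since VACUUM cannot run inside a transaction block:

-- migration.sql, run as: psql -f migration.sql
\until-0
BEGIN;
DELETE FROM big_table WHERE id IN (SELECT id FROM big_table WHERE bad = true LIMIT 1000);
COMMIT;
VACUUM big_table;
\end-until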
Hi
2018-04-24 9:58 GMT+02:00 Pierre Ducroquet <pierre.ducroquet@people-doc.com>:
I am willing to write a proper patch for this (I hope the tell/seek approach
is an acceptable implementation…), but I would like some feedback first.
I like this idea, but it is really a hack :)
In this case, the cycle should be \repeat ... \end-repeat-until.
Maybe the expression should be more generic.
Regards
Pavel
As of v11, DO blocks can do transactions. I think this will meet your needs.
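For the batched delete itself, that would look something like the sketch
below (VACUUM left out; it has its own restrictions, more on that further
down):

DO $$
DECLARE
    n bigint;
BEGIN
    LOOP
        DELETE FROM big_table
        WHERE id IN (SELECT id FROM big_table WHERE bad = true LIMIT 1000);
        GET DIAGNOSTICS n = ROW_COUNT;  -- rows deleted in this batch
        COMMIT;                         -- allowed inside DO blocks as of v11
        EXIT WHEN n = 0;
    END LOOP;
END
$$;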
A loop that starts at point X in the code and terminates at point Y has to
know how to jump back in the file (if there even is a file!) to point X and
re-interpret commands as it makes its way back through the "file" toward
point Y again... a place it might not reach, or it might reach some other
loop termination first. \commands can be hidden inside psql variables,
files can be conditionally included based on \if statements, and those
files might have loop starters/terminators in them. And those commands *are*
processed.
That, or you'd have to capture the code by somehow parsing ahead to the
next \end-until (processing all inner loops as you go, and the files they
include, etc), but that means that variables that were expanded the first
time are *not* expanded on subsequent iterations, and that makes it hard to
set an exit-condition variable. It would also seriously alter what psql is
when inside that loop.
I once did a presentation on ways to (ab)use psql, and one thing I did was
recursion via include files. Adapting your loop as literally as possible,
it would look like this:
loop_file.sql:

BEGIN;
WITH deleted_rows AS (DELETE FROM big_table
                      WHERE id IN (SELECT id FROM big_table WHERE bad = true LIMIT 1000)
                      RETURNING 1)
SELECT (COUNT(*) > 0) AS deleted_some_rows FROM deleted_rows
\gset
VACUUM big_table;
COMMIT;
\if :deleted_some_rows
    \include loop_file.sql
\endif
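(You'd start the loop by running the file once, e.g. psql -f loop_file.sql.
Strictly speaking the VACUUM would also have to move after the COMMIT, since
VACUUM cannot run inside a transaction block.)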
What you don't see here is that you're using your psql process's available
open file handles as a stack, and when you hit that limit psql will fail.
If you remove that limit, then you get a bit further before psql segfaults
on you. I think I got ~2700 files deep before that happened. Your stackage
may vary.
I'm not saying this is a good solution, quite the contrary. I think the
sane solution is right around the corner in Version 11.
Now if we just had a way of passing parameters into DO blocks...
Hi
2018-04-27 21:40 GMT+02:00 Corey Huinker <corey.huinker@gmail.com>:
Now if we just had a way of passing parameters into DO blocks...
I hope that there will be schema (temporal) variables:

create temp variable foo int default 10;

do $$
begin
  for i in 1..foo loop
    raise notice '%', i;
  end loop;
end;
$$;
That would be nice too.
A while back, somebody explained why implementing parameters in a DO block
was so hard, but I don't recall why, and the search terms "do" and
"parameter" don't really narrow things down.
2018-04-28 7:36 GMT+02:00 Corey Huinker <corey.huinker@gmail.com>:
A while back, somebody explained why implementing parameters in a DO block
was so hard, but I don't recall why.
I did it too. It is not too hard - there was just no agreement on the syntax.
Something like a CTE could be nice:
WITH PROCEDURE x(a int, b int) AS ... $$ SELECT x(10);
Maybe Oracle supports this syntax.
Pavel
I've seen "WITH function...." syntax in Oracle (here's an example:
https://oracle-base.com/articles/12c/with-clause-enhancements-12cr1) , but
I can see where we'd run into trouble with SELECT statements being atomic,
and the PROCEDURE being transactional. If we only allowed a one-line SELECT
that problem would go away, but that raises the question of whether SELECT
is the right keyword for the job. Is there something in the SQL standard
for that?
Either way, we're clearly talking about something to add to postgres, not
psql itself, and we should split this conversation to another thread. Sorry
for hijacking your thread, Pierre.
2018-04-28 21:25 GMT+02:00 Corey Huinker <corey.huinker@gmail.com>:
Is there something in the SQL standard for that?
My example was wrong:
WITH PROCEDURE x(a int, b int) AS ... $$ CALL x(10);
but still, we could use a different syntax based on default parameters:
DO (a => 10, b => 20) $$ ... $$
The implementation probably should not be hard.
Either way, we're clearly talking about something to add to postgres, not
psql itself, and we should split this conversation to another thread. Sorry
for hijacking your thread, Pierre.
+1, we can start a new thread if there is interest in this topic. The problem
is just the syntax, nothing else.
Note: DO is executed as a function if I remember correctly, not as a
procedure! So this would have to be changed.
Regards
Pavel
Corey Huinker wrote:
As of v11, DO blocks can do transactions. I think this will meet your needs.
They do support COMMIT and ROLLBACK in the current
development tree, but not VACUUM as in Pierre's example.
postgres=# \echo :SERVER_VERSION_NAME
11devel
postgres=# do ' begin vacuum; end ';
ERROR: VACUUM cannot be executed from a function
CONTEXT: SQL statement "vacuum"
PL/pgSQL function inline_code_block line 1 at SQL statement
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
On Monday, April 30, 2018 1:01:25 PM CEST Daniel Verite wrote:
They do support COMMIT and ROLLBACK in the current
development tree, but not VACUUM as in Pierre's example.
Indeed, vacuum is going to be the biggest offender here, sadly.
One could work around this of course (off the top of my head, using NOTIFY to
wake up another client that would launch the required vacuums…).
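A very rough sketch of that idea (the channel name "vacuum_requests" is made
up). After each committed batch, the migrating session would run:

NOTIFY vacuum_requests;

and a second client, subscribed with LISTEN vacuum_requests;, would run
VACUUM big_table; each time a notification arrives. The caveat is that psql
only displays notifications, so that second client would in practice be a
small program built around LISTEN rather than psql itself.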
Being able to do transactions in DO blocks is a great new feature of v11 I was
not aware of. But psql saw the addition of \if recently, so why not have loops
in there too? (Something better than this hack of course; it was just a
10-minute hack sprint for a demo.)
Regards
Pierre
On Mon, Apr 30, 2018 at 7:05 AM Pierre Ducroquet <pierre.ducroquet@people-doc.com> wrote:
Indeed, vacuum is going to be the biggest offender here, sadly.
But psql saw the addition of \if recently, so why not have loops in there too?
Bummer about vacuum.
If you dig into the very long discussion about \if (which, incidentally,
started off as a 20-line command patch called \quit-if, so don't discount
that your idea could take off), you'll see some of the problems with looping
discussed, mostly the issues I already alluded to (no concept of reading
backwards on STDIN, scoping outside the current "file", the ability of psql
vars to contain executable \commands), and you'll get a pretty good grasp of
the places where psql would need changes.
In the meantime, if you believe the table won't get much larger during the
operation, you could use \gexec as a finite loop iterator:

SELECT count(*)::bigint / 1000 AS num_iters FROM big_table
\gset
SELECT
    'BEGIN',
    'DELETE FROM big_table WHERE id IN (SELECT id FROM big_table WHERE bad = true LIMIT 1000)',
    'VACUUM big_table',
    'COMMIT'
FROM generate_series(1, :num_iters) g
\gexec
If the number of rows increases, then your finite loop will fall short, and if
something else deletes a bunch of rows, your loop will spin its wheels a few
times at the end, but it would do most of what you want.
On 4/30/18 07:01, Daniel Verite wrote:
As of v11, DO blocks can do transactions. I think this will meet your needs.
They do support COMMIT and ROLLBACK in the current
development tree, but not VACUUM as in Pierre's example.
Support for VACUUM can be added in the future. There is no big problem
with it.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services