tracking scripts...
I have a fairly large table (4.3 billion rows) that I am running an update
script on (a bit over 127 thousand individual update queries). I am using
the gui. It has been running for about 24 hours now. Is there any good way
to gauge progress (as in, how many of the individual update queries have
finished)?
Now it's too late,
but maybe you could allow to not use a single transaction ( but instead
127k transactions).4
Then at the end of every transaction you could print something in gui
(print for pgscript, raise for plpgsql) or execute a command to write in a
file (copy for instance).
It would also be in the log, but not so clear.
Cheers,
Rémi-C
2013/11/26 Joey Quinn <bjquinniii@gmail.com>
Show quoted text
I have a fairly large table (4.3 billion rows) that I am running an update
script on (a bit over 127 thousand individual update queries). I am using
the gui. It has been running for about 24 hours now. Is there any good way
to gauge progress (as in, how many of the individual update queries have
finished)?
Connect to the DB and run "select * from pg_stat_activity" to see what
specific query your other connection is running. Then find that in your
file to see how far it has progressed.
I hope you profiled your queries to make sure they run fast before you
started. :)
On Tue, Nov 26, 2013 at 10:28 AM, Joey Quinn <bjquinniii@gmail.com> wrote:
Show quoted text
I have a fairly large table (4.3 billion rows) that I am running an update
script on (a bit over 127 thousand individual update queries). I am using
the gui. It has been running for about 24 hours now. Is there any good way
to gauge progress (as in, how many of the individual update queries have
finished)?
The queries themselves are written like so:
update ipv4_table set country='xx' where ipv4 between 'xxx.xxx.xxx.xxx' and
'xxx.xxx.xxx.xxx';
There are 127k lines like that (each with a different range and the
appropriate country code). Each is terminated with a semi-colon. Does that
make them individual transactions in postgres or not? (postgres newbie
here). Is there something else I need to do for them to be treated like
separate transactions?
On Tue, Nov 26, 2013 at 11:16 AM, Rémi Cura <remi.cura@gmail.com> wrote:
Show quoted text
Now it's too late,
but maybe you could allow to not use a single transaction ( but instead
127k transactions).4Then at the end of every transaction you could print something in gui
(print for pgscript, raise for plpgsql) or execute a command to write in a
file (copy for instance).
It would also be in the log, but not so clear.Cheers,
Rémi-C
2013/11/26 Joey Quinn <bjquinniii@gmail.com>
I have a fairly large table (4.3 billion rows) that I am running an
update script on (a bit over 127 thousand individual update queries). I am
using the gui. It has been running for about 24 hours now. Is there any
good way to gauge progress (as in, how many of the individual update
queries have finished)?
On Tue, Nov 26, 2013 at 12:11 PM, Joey Quinn <bjquinniii@gmail.com> wrote:
update ipv4_table set country='xx' where ipv4 between 'xxx.xxx.xxx.xxx'
and 'xxx.xxx.xxx.xxx';There are 127k lines like that (each with a different range and the
appropriate country code). Each is terminated with a semi-colon. Does that
make them individual transactions in postgres or not? (postgres newbie
here). Is there something else I need to do for them to be treated like
separate transactions?
If you did not wrap the whole thing with begin/commit then each is its own
transaction. I certainly hope you have an appopriate index on that ipv4
column and it is appropriately typed.
When I ran that command (select * from pg_stat_activity"), it returned the
first six lines of the scripts. I'm fairly sure it has gotten a bit beyond
that (been running over 24 hours now, and the size has increased about 300
GB). Am I missing something for it to tell me what the last line processed
was?
I didn't do any profiling (postgres newbie here). All of the updates are
fairly straightforward and only hit a single table. They are updating a
single column based upon a "where between" clause which hits an index. I
did run a single one initially, and then a group of about 10k to make sure
they were behaving properly before launching the rest of the pile...
This is my first postgres project. It's a table of the complete IPV4
address space. Trying out postgres because the MySQL (actually MariaDB)
attempt was not scaling well.
On Tue, Nov 26, 2013 at 11:20 AM, Vick Khera <vivek@khera.org> wrote:
Show quoted text
Connect to the DB and run "select * from pg_stat_activity" to see what
specific query your other connection is running. Then find that in your
file to see how far it has progressed.I hope you profiled your queries to make sure they run fast before you
started. :)On Tue, Nov 26, 2013 at 10:28 AM, Joey Quinn <bjquinniii@gmail.com> wrote:
I have a fairly large table (4.3 billion rows) that I am running an
update script on (a bit over 127 thousand individual update queries). I am
using the gui. It has been running for about 24 hours now. Is there any
good way to gauge progress (as in, how many of the individual update
queries have finished)?
The ipv4 column is of type inet. It is the primary key (btree access) and
access times for queries on individual ip addresses have been around 10-15
ms.
On Tue, Nov 26, 2013 at 12:13 PM, Vick Khera <vivek@khera.org> wrote:
Show quoted text
On Tue, Nov 26, 2013 at 12:11 PM, Joey Quinn <bjquinniii@gmail.com> wrote:
update ipv4_table set country='xx' where ipv4 between 'xxx.xxx.xxx.xxx'
and 'xxx.xxx.xxx.xxx';There are 127k lines like that (each with a different range and the
appropriate country code). Each is terminated with a semi-colon. Does that
make them individual transactions in postgres or not? (postgres newbie
here). Is there something else I need to do for them to be treated like
separate transactions?If you did not wrap the whole thing with begin/commit then each is its own
transaction. I certainly hope you have an appopriate index on that ipv4
column and it is appropriately typed.
On 11/26/2013 9:24 AM, Joey Quinn wrote:
When I ran that command (select * from pg_stat_activity"), it returned
the first six lines of the scripts. I'm fairly sure it has gotten a
bit beyond that (been running over 24 hours now, and the size has
increased about 300 GB). Am I missing something for it to tell me what
the last line processed was?
that means your GUI lobbed the entire file at postgres in a single
PQexec call, so its all being executed as a single statement.
psql -f "filename.sql" dbname would have processed the queries one at
a time.
--
john r pierce 37N 122W
somewhere on the middle of the left coast
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Tue, Nov 26, 2013 at 12:24 PM, Joey Quinn <bjquinniii@gmail.com> wrote:
When I ran that command (select * from pg_stat_activity"), it returned the
first six lines of the scripts. I'm fairly sure it has gotten a bit beyond
that (been running over 24 hours now, and the size has increased about 300
GB). Am I missing something for it to tell me what the last line processed
was?
I agree with what John R Pierce says about your GUI lumping all of it into
one statement. What you can do is get indirect evidence by looking to see
which rows are set. I'm assuming that your IP ranges are non-overlapping,
so just do a binary search until you narrow it down to see how far along
you are.
Sounds like I will have to get comfortable with the command line version of
things... sigh... hate that.
Would that command be from within the psql SQL Shell that came as part of
the install? (I'm living in Windows land).
(and thank-you for the command)
On Tue, Nov 26, 2013 at 1:24 PM, John R Pierce <pierce@hogranch.com> wrote:
Show quoted text
On 11/26/2013 9:24 AM, Joey Quinn wrote:
When I ran that command (select * from pg_stat_activity"), it returned
the first six lines of the scripts. I'm fairly sure it has gotten a bit
beyond that (been running over 24 hours now, and the size has increased
about 300 GB). Am I missing something for it to tell me what the last line
processed was?that means your GUI lobbed the entire file at postgres in a single PQexec
call, so its all being executed as a single statement.psql -f "filename.sql" dbname would have processed the queries one at a
time.--
john r pierce 37N 122W
somewhere on the middle of the left coast--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
The ranges are indeed overlapping, though the update statements were
generated alphabetically rather than in IP order... If the command line
will let me query the table directly without being blocked by the ongoing
updates, then I could get a rough order of magnitude of progress by doing a
null count on the county field... hate to throw queries at it while it's
busy updating though...
On Tue, Nov 26, 2013 at 1:43 PM, Vick Khera <vivek@khera.org> wrote:
Show quoted text
On Tue, Nov 26, 2013 at 12:24 PM, Joey Quinn <bjquinniii@gmail.com> wrote:
When I ran that command (select * from pg_stat_activity"), it returned
the first six lines of the scripts. I'm fairly sure it has gotten a bit
beyond that (been running over 24 hours now, and the size has increased
about 300 GB). Am I missing something for it to tell me what the last line
processed was?I agree with what John R Pierce says about your GUI lumping all of it into
one statement. What you can do is get indirect evidence by looking to see
which rows are set. I'm assuming that your IP ranges are non-overlapping,
so just do a binary search until you narrow it down to see how far along
you are.
On Tue, Nov 26, 2013 at 2:48 PM, Joey Quinn <bjquinniii@gmail.com> wrote:
The ranges are indeed overlapping, though the update statements were
generated alphabetically rather than in IP order... If the command line
will let me query the table directly without being blocked by the ongoing
updates, then I could get a rough order of magnitude of progress by doing a
null count on the county field... hate to throw queries at it while it's
busy updating though...
Try a SELECT ... LIMIT 1; it will find the first row that matches and exit.
So you can see if any particular country code has been set. Again, binary
search on the codes.
On 11/26/2013 11:45 AM, Joey Quinn wrote:
Would that command be from within the psql SQL Shell that came as part
of the install? (I'm living in Windows land).
if you're already in psql, logged onto your database, it would be \i
filename.sql
psql -f filename.sql dbname... would be at the system shell prompt,
but that assumes the postgresql binary directory is in your path, which
it may not be on default Windows installs.
--
john r pierce 37N 122W
somewhere on the middle of the left coast
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
nope, that appears to be being blocked by the updates...
tried "select * from ipv4_table where country='gb' limit 1;"
it just sat there...
On Tue, Nov 26, 2013 at 3:00 PM, Vick Khera <vivek@khera.org> wrote:
Show quoted text
On Tue, Nov 26, 2013 at 2:48 PM, Joey Quinn <bjquinniii@gmail.com> wrote:
The ranges are indeed overlapping, though the update statements were
generated alphabetically rather than in IP order... If the command line
will let me query the table directly without being blocked by the ongoing
updates, then I could get a rough order of magnitude of progress by doing a
null count on the county field... hate to throw queries at it while it's
busy updating though...Try a SELECT ... LIMIT 1; it will find the first row that matches and
exit. So you can see if any particular country code has been set. Again,
binary search on the codes.
yeah, unlikely that it is already in the path (I certainly didn't add it
yet).
Thanks for the command (new version).
On Tue, Nov 26, 2013 at 3:13 PM, John R Pierce <pierce@hogranch.com> wrote:
Show quoted text
On 11/26/2013 11:45 AM, Joey Quinn wrote:
Would that command be from within the psql SQL Shell that came as part of
the install? (I'm living in Windows land).if you're already in psql, logged onto your database, it would be \i
filename.sqlpsql -f filename.sql dbname... would be at the system shell prompt,
but that assumes the postgresql binary directory is in your path, which it
may not be on default Windows installs.--
john r pierce 37N 122W
somewhere on the middle of the left coast--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Tue, Nov 26, 2013 at 9:28 AM, Joey Quinn <bjquinniii@gmail.com> wrote:
I have a fairly large table (4.3 billion rows) that I am running an update
script on (a bit over 127 thousand individual update queries). I am using
the gui. It has been running for about 24 hours now. Is there any good way
to gauge progress (as in, how many of the individual update queries have
finished)?
There are not many ways to Hand off information outside of the
database while a transaction Is running. one way Is to write a Simple
trigger in plpgsql that 'raise'es A notice every 'n' times trigger
condition fires. that'S Essentially the only Clean way to do it in
such a way that the information is Returned to the Executing console.
Thanks!
merlin
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 11/26/2013 12:30 PM, Merlin Moncure wrote:
There are not many ways to Hand off information outside of the
database while a transaction Is running. one way Is to write a Simple
trigger in plpgsql that 'raise'es A notice every 'n' times trigger
condition fires. that'S Essentially the only Clean way to do it in
such a way that the information is Returned to the Executing console.
Thanks!
how would that trigger track N?
--
john r pierce 37N 122W
somewhere on the middle of the left coast
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Tue, Nov 26, 2013 at 2:38 PM, John R Pierce <pierce@hogranch.com> wrote:
On 11/26/2013 12:30 PM, Merlin Moncure wrote:
There are not many ways to Hand off information outside of the
database while a transaction Is running. one way Is to write a Simple
trigger in plpgsql that 'raise'es A notice every 'n' times trigger
condition fires. that'S Essentially the only Clean way to do it in
such a way that the information is Returned to the Executing console.
Thanks!how would that trigger track N?
A couple of different ways. Easiest would just be to manage a sequence.
merlin
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 26/11/2013 20:30, Merlin Moncure wrote:
There are not many ways to Hand off information outside of the
database while a transaction Is running. one way Is to write a Simple
trigger in plpgsql that 'raise'es A notice every 'n' times trigger
condition fires. that'S Essentially the only Clean way to do it in
such a way that the information is Returned to the Executing console.
Thanks!
Totally unrelated to the thread.... I noticed that the capitalised
letters in the email above spell out this:
THIISASECRET
.. which (almost) spells "This is a secret". Was this intentional, or am
I just working too hard? :-)
Ray.
--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Tue, Nov 26, 2013 at 3:20 PM, Raymond O'Donnell <rod@iol.ie> wrote:
On 26/11/2013 20:30, Merlin Moncure wrote:
There are not many ways to Hand off information outside of the
database while a transaction Is running. one way Is to write a Simple
trigger in plpgsql that 'raise'es A notice every 'n' times trigger
condition fires. that'S Essentially the only Clean way to do it in
such a way that the information is Returned to the Executing console.
Thanks!Totally unrelated to the thread.... I noticed that the capitalised
letters in the email above spell out this:THIISASECRET
.. which (almost) spells "This is a secret". Was this intentional, or am
I just working too hard? :-)
Well, bad spelling on my part. To get the joke, you have to be A.
observant, B. be using a gmail account, and C. be a comic book geek
that grew up in the 80's.
merlin
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general