Unexpected query result
In order to process a large amount of data I need to run a procedure using parallel batch processes. The query I run in each process is expected to:
1- select a bunch of ids (500) from a table (wait4processing) containing the list of all records to process
2- remove the selected records from the wait4processing table in order to eliminate duplicate processing
3- run the procedure (build_contributions_clusters) over the range of selected ids
-- The query I use:
WITH ids AS (
  DELETE FROM wait4processing
  WHERE id IN (SELECT id FROM wait4processing LIMIT 500)
  RETURNING id
)
SELECT build_contributions_clusters(min(id), max(id)) FROM ids;
The query runs properly if I send it sequentially (waiting for the completion of each query before sending the next one), but it doesn't work when sending multiple occurrences in parallel. From the results I got, it seems the first query received by the server runs properly, but the following ones try to process the same first 500 records even though they were deleted - the build_contributions_clusters procedure receives NULL values instead of the following 500 records.
Since I am almost certain this is the expected behavior, I would like to understand why, and I would also appreciate seeing alternative queries to do the job.
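In case it helps, here is a stripped-down sketch of the setup (simplified for illustration; the real table has more columns and the function body is a placeholder for the actual work):

```sql
-- Assumed minimal schema, for reproduction purposes only
CREATE TABLE wait4processing (id bigint PRIMARY KEY);
INSERT INTO wait4processing SELECT generate_series(1, 100000);

-- Stub standing in for the real procedure
CREATE FUNCTION build_contributions_clusters(min_id bigint, max_id bigint)
RETURNS void LANGUAGE plpgsql AS
$$ BEGIN /* real clustering work happens here */ END $$;
```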
Thanks :-)
On 10/05/2015 05:02 AM, Begin Daniel wrote:
See here:
http://www.postgresql.org/docs/9.4/interactive/transaction-iso.html
"13.2.1. Read Committed Isolation Level
Read Committed is the default isolation level in PostgreSQL. When a
transaction uses this isolation level, a SELECT query (without a FOR
UPDATE/SHARE clause) sees only data committed before the query began; it
never sees either uncommitted data or changes committed during query
execution by concurrent transactions. In effect, a SELECT query sees a
snapshot of the database as of the instant the query begins to run.
However, SELECT does see the effects of previous updates executed within
its own transaction, even though they are not yet committed. Also note
that two successive SELECT commands can see different data, even though
they are within a single transaction, if other transactions commit
changes after the first SELECT starts and before the second SELECT starts."
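In other words, a second worker's inner SELECT chooses the same 500 rows (the first worker's deletes are not yet committed), its DELETE then blocks on those rows, and once the first transaction commits it finds nothing left to delete - so min(id)/max(id) come back NULL. One way around it is to lock the candidate rows so concurrent workers skip each other's batches. Something along these lines - note that SKIP LOCKED requires 9.5, currently in beta, and this is untested against your schema:

```sql
WITH ids AS (
  DELETE FROM wait4processing
  WHERE id IN (
    SELECT id FROM wait4processing
    ORDER BY id
    LIMIT 500
    FOR UPDATE SKIP LOCKED  -- concurrent workers skip rows already claimed
  )
  RETURNING id
)
SELECT build_contributions_clusters(min(id), max(id)) FROM ids;
```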
--
Adrian Klaver
adrian.klaver@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Thanks Adrian, it makes sense. I'll adapt the calling procedures.
Daniel
Hi,
On Mon, Oct 5, 2015 at 5:02 AM, Begin Daniel <jfd553@hotmail.com> wrote:
In order to process a large amount of data I need to run a procedure using
parallel batch processes.
The query I run in each process is expected to ...
It seems you are trying to achieve the same thing we did. My blog entry
<http://manojadinesh.blogspot.com/2015/07/parallel-operations-with-plpgsql_9.html>
may help you in this scenario.
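The core of that approach is an advisory lock per id, which also works on 9.4. Roughly (untested here, and assuming id fits in bigint):

```sql
-- Each worker claims only ids it can lock; xact-scoped locks
-- are released automatically at commit or rollback.
WITH ids AS (
  DELETE FROM wait4processing
  WHERE id IN (
    SELECT id FROM wait4processing
    WHERE pg_try_advisory_xact_lock(id)
    LIMIT 500
  )
  RETURNING id
)
SELECT build_contributions_clusters(min(id), max(id)) FROM ids;
```

One caveat: the planner may evaluate pg_try_advisory_xact_lock() on more rows than the LIMIT keeps, so a worker can hold a few extra locks until its transaction ends; that is harmless for this pattern.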
--
Regards,
Dinesh
manojadinesh.blogspot.com