[PROPOSAL] DIAGNOSTICS <var> = SKIPPED_ROW_COUNT

Started by dinesh kumarabout 10 years ago9 messages
#1dinesh kumar
dineshkumar02@gmail.com

Hi Team,

Would like to propose a new DIAGNOSTICS attribute, which returns the no.of
rows got skipped during the FOR UPDATE SKIP LOCKED;

Using this attribute, we can have more control on parallel operations like,

IF SKIPPED_ROW_COUNT =0 THEN
<<Treat me as, a complete transaction, and do below stuff>>
ELSE
<<Got only few tuples than required, and do below stuff>>
END IF;

Kindly let me know your inputs/suggestions on this.

Regards,
Dinesh
manojadinesh.blogspot.com

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: dinesh kumar (#1)
Re: [PROPOSAL] DIAGNOSTICS <var> = SKIPPED_ROW_COUNT

dinesh kumar <dineshkumar02@gmail.com> writes:

Would like to propose a new DIAGNOSTICS attribute, which returns the no.of
rows got skipped during the FOR UPDATE SKIP LOCKED;

I'm concerned that there may not be any implementation-independent
definition of this. That is, the query plan might or might not reject
rows before the locking step is reached, which would result in
random-looking changes in the output of the proposed counter.

Constraining the query plan might fix that, but only at unacceptable
performance costs, especially since those constraints would have to apply
to every plan ever generated (since the query planner can't know whether
you will inquire about this counter value later).

Using this attribute, we can have more control on parallel operations like,

IF SKIPPED_ROW_COUNT =0 THEN
<<Treat me as, a complete transaction, and do below stuff>>
ELSE
<<Got only few tuples than required, and do below stuff>>
END IF;

Um ... so what? This is not a use-case.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3dinesh kumar
dineshkumar02@gmail.com
In reply to: Tom Lane (#2)
Re: [PROPOSAL] DIAGNOSTICS <var> = SKIPPED_ROW_COUNT

On Mon, Oct 12, 2015 at 9:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

dinesh kumar <dineshkumar02@gmail.com> writes:

Would like to propose a new DIAGNOSTICS attribute, which returns the

no.of

rows got skipped during the FOR UPDATE SKIP LOCKED;

I'm concerned that there may not be any implementation-independent
definition of this. That is, the query plan might or might not reject
rows before the locking step is reached, which would result in
random-looking changes in the output of the proposed counter.

Constraining the query plan might fix that, but only at unacceptable
performance costs, especially since those constraints would have to apply
to every plan ever generated (since the query planner can't know whether
you will inquire about this counter value later).

Thanks Tom. Understood.

Using this attribute, we can have more control on parallel operations

like,

IF SKIPPED_ROW_COUNT =0 THEN
<<Treat me as, a complete transaction, and do below stuff>>
ELSE
<<Got only few tuples than required, and do below stuff>>
END IF;

Um ... so what? This is not a use-case.

In my view, "How one can be sure that, he obtained all the tuples with SKIP
LOCKED". If the end user has this counter value, he may proceed with a
different approach with partially locked tuples.

regards, tom lane

--

Regards,
Dinesh
manojadinesh.blogspot.com

#4David G. Johnston
david.g.johnston@gmail.com
In reply to: dinesh kumar (#3)
Re: [PROPOSAL] DIAGNOSTICS <var> = SKIPPED_ROW_COUNT

Using this attribute, we can have more control on parallel operations

like,

IF SKIPPED_ROW_COUNT =0 THEN
<<Treat me as, a complete transaction, and do below stuff>>
ELSE
<<Got only few tuples than required, and do below stuff>>
END IF;

Um ... so what? This is not a use-case.

In my view, "How one can be sure that, he obtained all the tuples with
SKIP LOCKED". If the end user has this counter value, he may proceed with a
different approach with partially locked tuples.

​Can you be more specific? In most cases I can come up with (queues,
basically) where skipped locked is running the processing performing the
query is going to re-query the database on the next tick regardless of
whether they thought they say only some or all of the potential rows on the
prior pass.

David J.

#5dinesh kumar
dineshkumar02@gmail.com
In reply to: David G. Johnston (#4)
Re: [PROPOSAL] DIAGNOSTICS <var> = SKIPPED_ROW_COUNT

On Tue, Oct 13, 2015 at 5:53 AM, David G. Johnston <
david.g.johnston@gmail.com> wrote:

Using this attribute, we can have more control on parallel operations

like,

IF SKIPPED_ROW_COUNT =0 THEN
<<Treat me as, a complete transaction, and do below stuff>>
ELSE
<<Got only few tuples than required, and do below stuff>>
END IF;

Um ... so what? This is not a use-case.

In my view, "How one can be sure that, he obtained all the tuples with
SKIP LOCKED". If the end user has this counter value, he may proceed with a
different approach with partially locked tuples.

​Can you be more specific? In most cases I can come up with (queues,
basically) where skipped locked is running the processing performing the
query is going to re-query the database on the next tick regardless of
whether they thought they say only some or all of the potential rows on the
prior pass.

Sure,

In an existing wait policies like WAIT(default) and NO WAIT,
one can be sure to determine(Using ROW_COUNT daignostics counter),
how many required tuples he processed in a transaction.
But this is not case when it comes to SKIP LOCKED. My concern is,
how we come to know that, our SKIP LOCKED missed few tuples due to lock, OR
it's processed all the tuples.
I understood that, the name itself defining SKIP the LOCKED rows, but we
are not measuring it.

I wrote the patch and it's working as below. But I haven't extended this to
other planners yet.

postgres=# DO
$$
DECLARE
rc INT;
BEGIN
PERFORM * FROM test WHERE mod(t,2)=0 FOR UPDATE SKIP LOCKED;
GET DIAGNOSTICS rc=SKIPPED_ROW_COUNT;
RAISE NOTICE 'Skipped : %', rc;
GET DIAGNOSTICS rc=ROW_COUNT;
RAISE NOTICE 'Processed : %', rc;
END;
$$
;
NOTICE: Skipped : 2
NOTICE: Processed : 3
DO

Once we measured the SKIP LOCKED, then we can only consider
to re-process the Skipped !=0 transactions rather doing every transaction
again.

In my view, SKIP LOCKED is a nice feature, which gives only the available
OR unlocked tuples.
But those are not the complete required tuples for the given SQL statement.
Isn't it ?!

Let me know, if I am still not clear about this.

David J.

--

Regards,
Dinesh
manojadinesh.blogspot.com

#6Robert Haas
robertmhaas@gmail.com
In reply to: dinesh kumar (#5)
Re: [PROPOSAL] DIAGNOSTICS <var> = SKIPPED_ROW_COUNT

On Tue, Oct 13, 2015 at 10:37 PM, dinesh kumar <dineshkumar02@gmail.com> wrote:

In an existing wait policies like WAIT(default) and NO WAIT,
one can be sure to determine(Using ROW_COUNT daignostics counter),
how many required tuples he processed in a transaction.
But this is not case when it comes to SKIP LOCKED.

Sure it is. You didn't process the ones that you skipped. This is no
different than if you say WHERE a = 5. Depending on plan choice and
table contents, you may have "skipped" a large number of rows where a
!= 5, or you may have skipped none at all.

In my view, SKIP LOCKED is a nice feature, which gives only the available OR
unlocked tuples.
But those are not the complete required tuples for the given SQL statement.
Isn't it ?!

They better be. If you wanted the locked tuples, you shouldn't have
asked to skip them.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7dinesh kumar
dineshkumar02@gmail.com
In reply to: Robert Haas (#6)
Re: [PROPOSAL] DIAGNOSTICS <var> = SKIPPED_ROW_COUNT

Hi Robert,

On Wed, Oct 14, 2015 at 12:56 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Oct 13, 2015 at 10:37 PM, dinesh kumar <dineshkumar02@gmail.com>
wrote:

In an existing wait policies like WAIT(default) and NO WAIT,
one can be sure to determine(Using ROW_COUNT daignostics counter),
how many required tuples he processed in a transaction.
But this is not case when it comes to SKIP LOCKED.

Sure it is. You didn't process the ones that you skipped. This is no
different than if you say WHERE a = 5. Depending on plan choice and
table contents, you may have "skipped" a large number of rows where a
!= 5, or you may have skipped none at all.

Yes, True.

But, using SKIP LOCKED we may bypass the tuples where a = 5, If those were
locked by parallel operations.

In my view, SKIP LOCKED is a nice feature, which gives only the

available OR

unlocked tuples.
But those are not the complete required tuples for the given SQL

statement.

Isn't it ?!

They better be.

Agreed.

If you wanted the locked tuples, you shouldn't have
asked to skip them.

Kindly let me know if I am going in a wrong way.

I see this feature as an add on to do the parallel DML operations.
There won't be any problem, if operations are mutually exclusive.
I mean, each session operates on unique set of tuples.

In the above case, we don't even need of SKIP LOCKED wait policy.

But, when it comes to mutually depend operations, isn't it nice to provide,
how much were locked by the other sessions. OR atlest a HINT to the other
session like,

GET DIAGNOSTICS var = DID_I_MISS_ANYTHING_FROM_OTHER_SESSIONS;

I agree that, adding counter will take a performance hit.
Rather going to my actual proposal on providing the counter value,
isn't it good to provide a boolean type HINT, if we miss atleast a single
tuple.

Let me know your thoughts.

Thanks in advance.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--

Regards,
Dinesh
manojadinesh.blogspot.com

#8Robert Haas
robertmhaas@gmail.com
In reply to: dinesh kumar (#7)
Re: [PROPOSAL] DIAGNOSTICS <var> = SKIPPED_ROW_COUNT

On Wed, Oct 14, 2015 at 6:28 PM, dinesh kumar <dineshkumar02@gmail.com> wrote:

I see this feature as an add on to do the parallel DML operations.
There won't be any problem, if operations are mutually exclusive.
I mean, each session operates on unique set of tuples.

In the above case, we don't even need of SKIP LOCKED wait policy.

But, when it comes to mutually depend operations, isn't it nice to provide,
how much were locked by the other sessions. OR atlest a HINT to the other
session like,

GET DIAGNOSTICS var = DID_I_MISS_ANYTHING_FROM_OTHER_SESSIONS;

I agree that, adding counter will take a performance hit.
Rather going to my actual proposal on providing the counter value,
isn't it good to provide a boolean type HINT, if we miss atleast a single
tuple.

Suppose there are 5 locked rows and 5 unlocked rows in the heap and you do this:

select * from t1 for share skip locked limit 5

The Boolean you propose will be false if the first 5 rows in physical
order are locked, and otherwise it will be false. But there's no
difference between those two scenarios from the perspective of the
application. Here's another example:

with foo as (select * from t1 for share skip locked) select * from foo
where a = 2;

If foo contains any locked rows at all, this will return true,
regardless of whether a = 2.

It's true that, for a lot of normal-ish queries, LockRows is applied
late enough that your proposed Boolean would return the intended
answer. But there are a bunch of exceptions, like the ones shown
above, and there might be more in the future.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9dinesh kumar
dineshkumar02@gmail.com
In reply to: Robert Haas (#8)
Re: [PROPOSAL] DIAGNOSTICS <var> = SKIPPED_ROW_COUNT

On Wed, Oct 14, 2015 at 4:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Oct 14, 2015 at 6:28 PM, dinesh kumar <dineshkumar02@gmail.com>
wrote:

I see this feature as an add on to do the parallel DML operations.
There won't be any problem, if operations are mutually exclusive.
I mean, each session operates on unique set of tuples.

In the above case, we don't even need of SKIP LOCKED wait policy.

But, when it comes to mutually depend operations, isn't it nice to

provide,

how much were locked by the other sessions. OR atlest a HINT to the other
session like,

GET DIAGNOSTICS var = DID_I_MISS_ANYTHING_FROM_OTHER_SESSIONS;

I agree that, adding counter will take a performance hit.
Rather going to my actual proposal on providing the counter value,
isn't it good to provide a boolean type HINT, if we miss atleast a single
tuple.

Suppose there are 5 locked rows and 5 unlocked rows in the heap and you do
this:

select * from t1 for share skip locked limit 5

The Boolean you propose will be false if the first 5 rows in physical
order are locked, and otherwise it will be false. But there's no
difference between those two scenarios from the perspective of the
application. Here's another example:

with foo as (select * from t1 for share skip locked) select * from foo
where a = 2;

If foo contains any locked rows at all, this will return true,
regardless of whether a = 2.

It's true that, for a lot of normal-ish queries, LockRows is applied
late enough that your proposed Boolean would return the intended
answer. But there are a bunch of exceptions, like the ones shown
above, and there might be more in the future.

Hi Robert,

As usual, a great guidance from you. Thanks :-)

But I'm still trying to see, is there a way we can implement this for all
use cases.
Will update this thread with my findings.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--

Regards,
Dinesh
manojadinesh.blogspot.com