ctid ranges

Started by Thomas Munroalmost 14 years ago7 messagesgeneral
Jump to latest
#1Thomas Munro
thomas.munro@gmail.com

Hi

In 9.1.3, this is fast, handled with a tid scan using the physical address:

SELECT ... FROM ... WHERE ctid = ...;

This is slow, handled with a seq scan (as are various rephrasing with
<, <=, etc):

SELECT ... FROM ... WHERE ctid BETWEEN ... AND ...;

Is there a way to retrieve the rows in a physical range quickly?

(I realise this is a pretty odd thing to want to do... I was
experimenting with a crackpot idea for storing some data in a known
physical order and finding the beginning of ends ranges by binary
chop, instead of using a btree.)

Thanks
Thomas Munro

#2Jeff Davis
pgsql@j-davis.com
In reply to: Thomas Munro (#1)
Re: ctid ranges

On Fri, 2012-06-08 at 22:27 +0100, Thomas Munro wrote:

This is slow, handled with a seq scan (as are various rephrasing with
<, <=, etc):

SELECT ... FROM ... WHERE ctid BETWEEN ... AND ...;

Is there a way to retrieve the rows in a physical range quickly?

Interesting idea. However, as far as I know, there is no such support.

Regards,
Jeff Davis

#3Merlin Moncure
mmoncure@gmail.com
In reply to: Jeff Davis (#2)
Re: ctid ranges

On Mon, Jun 11, 2012 at 7:57 PM, Jeff Davis <pgsql@j-davis.com> wrote:

On Fri, 2012-06-08 at 22:27 +0100, Thomas Munro wrote:

This is slow, handled with a seq scan (as are various rephrasing with
<, <=, etc):

SELECT ... FROM ... WHERE ctid BETWEEN ... AND ...;

Is there a way to retrieve the rows in a physical range quickly?

Interesting idea. However, as far as I know, there is no such support.

yeah -- and I think it's a great thing to want to be able to do. it
could be used in parallelizing tricks for example: divide up a table
into N approximately equal parts and hand each one off to a work
thread.

merlin

#4Bruce Momjian
bruce@momjian.us
In reply to: Merlin Moncure (#3)
Re: ctid ranges

On Wed, Jun 13, 2012 at 03:15:14PM -0500, Merlin Moncure wrote:

On Mon, Jun 11, 2012 at 7:57 PM, Jeff Davis <pgsql@j-davis.com> wrote:

On Fri, 2012-06-08 at 22:27 +0100, Thomas Munro wrote:

This is slow, handled with a seq scan (as are various rephrasing with
<, <=, etc):

SELECT ... FROM ... WHERE ctid BETWEEN ... AND ...;

Is there a way to retrieve the rows in a physical range quickly?

Interesting idea. However, as far as I know, there is no such support.

yeah -- and I think it's a great thing to want to be able to do. it
could be used in parallelizing tricks for example: divide up a table
into N approximately equal parts and hand each one off to a work
thread.

Can we add this as a TODO? It would basically be adding
less/greater-than comparisons for the 'tid' data type.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#5Merlin Moncure
mmoncure@gmail.com
In reply to: Bruce Momjian (#4)
Re: ctid ranges

On Wed, Jun 13, 2012 at 3:18 PM, Bruce Momjian <bruce@momjian.us> wrote:

On Wed, Jun 13, 2012 at 03:15:14PM -0500, Merlin Moncure wrote:

yeah -- and I think it's a great thing to want to be able to do.  it
could be used in parallelizing tricks for example: divide up a table
into N approximately equal parts and hand each one off to a work
thread.

Can we add this as a TODO?  It would basically be adding
less/greater-than comparisons for the 'tid' data type.

IMNSHO, it's a no-brainer for the todo (but I think it's more
complicated than adding some comparisons -- which are working now):

postgres=# explain select ctid from foo where ctid >= '(3786,67)'::tid limit 1;
QUERY PLAN
------------------------------------------------------------------
Limit (cost=0.00..0.05 rows=1 width=6)
-> Seq Scan on foo (cost=0.00..16422.00 rows=333333 width=6)
Filter: (ctid >= '(3786,67)'::tid)
(3 rows)

merlin

#6Bruce Momjian
bruce@momjian.us
In reply to: Merlin Moncure (#5)
Re: ctid ranges

On Wed, Jun 13, 2012 at 03:21:17PM -0500, Merlin Moncure wrote:

On Wed, Jun 13, 2012 at 3:18 PM, Bruce Momjian <bruce@momjian.us> wrote:

On Wed, Jun 13, 2012 at 03:15:14PM -0500, Merlin Moncure wrote:

yeah -- and I think it's a great thing to want to be able to do. �it
could be used in parallelizing tricks for example: divide up a table
into N approximately equal parts and hand each one off to a work
thread.

Can we add this as a TODO? �It would basically be adding
less/greater-than comparisons for the 'tid' data type.

IMNSHO, it's a no-brainer for the todo (but I think it's more
complicated than adding some comparisons -- which are working now):

postgres=# explain select ctid from foo where ctid >= '(3786,67)'::tid limit 1;
QUERY PLAN
------------------------------------------------------------------
Limit (cost=0.00..0.05 rows=1 width=6)
-> Seq Scan on foo (cost=0.00..16422.00 rows=333333 width=6)
Filter: (ctid >= '(3786,67)'::tid)
(3 rows)

I see. Seems we have to add index smarts to those comparisons. That
might be complicated.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#6)
Re: ctid ranges

Bruce Momjian <bruce@momjian.us> writes:

On Wed, Jun 13, 2012 at 03:21:17PM -0500, Merlin Moncure wrote:

IMNSHO, it's a no-brainer for the todo (but I think it's more
complicated than adding some comparisons -- which are working now):

I see. Seems we have to add index smarts to those comparisons. That
might be complicated.

Uh, the whole point of a TID scan is to *not* need an index.

What would be needed is for tidpath.c to let through more kinds of TID
comparison quals than it does now, and then for nodeTidscan.c to know
what to do with them. The latter logic might well look something like
btree indexscan qual preparation, but it wouldn't be the same code.

regards, tom lane