pg_index.indislossy

Started by Bruce Momjianover 24 years ago28 messages

pgman@candle.pha.pa.us

over 24 years ago

Can someone tell me what we use indislossy for? The only comment I
could find was:

/*
* WITH clause reinstated to handle lossy indices. -- JMH, 7/22/96
*/
foreach(pl, parameterList)
{
DefElem *param = (DefElem *) lfirst(pl);

if (!strcasecmp(param->defname, "islossy"))
lossy = true;
else
elog(NOTICE, "Unrecognized index attribute \"%s\" ignored",
param->defname);
}

and

bool indislossy; /* do we fetch false tuples (lossy
* compression)? */

Should I remove this column?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

The Hermit Hacker

scrappy@hub.org

over 24 years ago

In reply to: Bruce Momjian (#1)

Re: pg_index.indislossy

On Mon, 14 May 2001, Bruce Momjian wrote:

Bruce Momjian wrote:

Can someone tell me what we use indislossy for?

IIRC it means that if you get something by this index you must check
again in the actual data

I think that at least the GIST intarray (actually intset) methods use
it.

So you probably should _not_ remove it ;)

I did a search and found it used only a few places. I do not see it
used as part of GIST. My rememberance is that it is involved in partial
indexes, where you index only certain values in a column. It was an old
idea that was never working in PostgreSQL.

Let's avoid removing things for the sake of removing them ... might be an
old idea that, if someone takes the time to research, might prove useful
...

Import Notes

Reply to msg id not found: 200105142149.f4ELnMV08293@candle.pha.pa.us | Resolved by subject fallback

Bruce Momjian

pgman@candle.pha.pa.us

over 24 years ago

In reply to: The Hermit Hacker (#2)

Re: pg_index.indislossy

I think that at least the GIST intarray (actually intset) methods use
it.

So you probably should _not_ remove it ;)

I did a search and found it used only a few places. I do not see it
used as part of GIST. My rememberance is that it is involved in partial
indexes, where you index only certain values in a column. It was an old
idea that was never working in PostgreSQL.

Let's avoid removing things for the sake of removing them ... might be an
old idea that, if someone takes the time to research, might prove useful
...

Yea, there is actually some code attached to this vs. the others that
had no code at all. Are we ever going to do partial indexes? I guess
that is the question.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Bruce Momjian

pgman@candle.pha.pa.us

over 24 years ago

In reply to: Bruce Momjian (#3)

Re: pg_index.indislossy

Let's avoid removing things for the sake of removing them ... might be an
old idea that, if someone takes the time to research, might prove useful
...

Yea, there is actually some code attached to this vs. the others that
had no code at all. Are we ever going to do partial indexes? I guess
that is the question.

One problem with keeping it is that interface coders are getting
confused by some of the unused system table columns, assuming they mean
something, when in fact they don't. Both ODBC and JDBC have had this
problem that I fixed today.

Maybe the best solution is to mark the code as NOT_USED and remove the
column. That way, the code stays around but no one sees it.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Tom Lane

tgl@sss.pgh.pa.us

over 24 years ago

In reply to: Bruce Momjian (#1)

Re: pg_index.indislossy

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Should I remove this column?

NO. The fact that we don't currently have any lossy index types doesn't
mean that we won't ever have them again. I've been wondering, for
example, whether the frequently heard requests for case-insensitive
indexes might not be answerable by implementing 'em as lossy indexes.

regards, tom lane

Bruce Momjian

pgman@candle.pha.pa.us

over 24 years ago

In reply to: Tom Lane (#5)

Re: pg_index.indislossy

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Should I remove this column?

NO. The fact that we don't currently have any lossy index types doesn't
mean that we won't ever have them again. I've been wondering, for
example, whether the frequently heard requests for case-insensitive
indexes might not be answerable by implementing 'em as lossy indexes.

Is lossy and partial indexes the same? I can't see how they were
supposed to be used. The _only_ mention of this field I see in
indexcmd.c, around line 132. All the other mentions are just passing
around parameters.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Bruce Momjian

pgman@candle.pha.pa.us

over 24 years ago

In reply to: Tom Lane (#5)

Re: pg_index.indislossy

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Should I remove this column?

NO. The fact that we don't currently have any lossy index types doesn't
mean that we won't ever have them again. I've been wondering, for
example, whether the frequently heard requests for case-insensitive
indexes might not be answerable by implementing 'em as lossy indexes.

I see it now. It is just looking for a flag in the index creation:

if (!strcasecmp(param->defname, "islossy"))

That is the extent of the lossy code. There is optimizer code checking
for it. Seems an ifdef NOT_USED around those would document what needs
to be done, and document it doesn't do anything now, or I can just leave
it alone.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Tom Lane

tgl@sss.pgh.pa.us

over 24 years ago

In reply to: Bruce Momjian (#6)

Re: pg_index.indislossy

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Is lossy and partial indexes the same?

No.

I can't see how they were
supposed to be used. The _only_ mention of this field I see in
indexcmd.c, around line 132.

You missed a few then --- the most critical being in createplan.c.

AFAIK this is a working feature, and I am going to insist that you
keep your hands off it ...

regards, tom lane

Bruce Momjian

pgman@candle.pha.pa.us

over 24 years ago

In reply to: Tom Lane (#8)

Re: pg_index.indislossy

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Is lossy and partial indexes the same?

No.

I can't see how they were
supposed to be used. The _only_ mention of this field I see in
indexcmd.c, around line 132.

You missed a few then --- the most critical being in createplan.c.

AFAIK this is a working feature, and I am going to insist that you
keep your hands off it ...

Really, it actually works? What are they?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

#10

Tom Lane

tgl@sss.pgh.pa.us

over 24 years ago

In reply to: Bruce Momjian (#9)

Re: pg_index.indislossy

Bruce Momjian <pgman@candle.pha.pa.us> writes:

AFAIK this is a working feature, and I am going to insist that you
keep your hands off it ...

Really, it actually works? What are they?

Read create_indexscan_plan().

regards, tom lane

#11

Hannu Krosing

hannu@tm.ee

over 24 years ago

In reply to: Bruce Momjian (#9)

Re: pg_index.indislossy

Bruce Momjian wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Is lossy and partial indexes the same?

No.

I can't see how they were
supposed to be used. The _only_ mention of this field I see in
indexcmd.c, around line 132.

You missed a few then --- the most critical being in createplan.c.

AFAIK this is a working feature, and I am going to insist that you
keep your hands off it ...

Really, it actually works? What are they?

rom the readme of contrib/intarray/README.intarray

This is an implementation of RD-tree data structure using GiST interface
of PostgreSQL. It has built-in lossy compression - must be declared
in index creation - with (islossy). Current implementation provides
index
support for one-dimensional array of int4's - gist__int_ops, suitable
for
small and medium size of arrays (used on default), and gist__intbig_ops
for
indexing large arrays (we use superimposed signature with length of 4096
bits to represent sets).

-----------------------
Hannu

#12

Oleg Bartunov

oleg@sai.msu.su

over 24 years ago

In reply to: Bruce Momjian (#1)

Re: pg_index.indislossy

GiST certainly use it ! Please don't remove.
GiST uses compression/decompression technique which could
produce false drops.

Oleg
On Mon, 14 May 2001, Bruce Momjian wrote:

Can someone tell me what we use indislossy for? The only comment I
could find was:

/*
* WITH clause reinstated to handle lossy indices. -- JMH, 7/22/96
*/
foreach(pl, parameterList)
{
DefElem *param = (DefElem *) lfirst(pl);

if (!strcasecmp(param->defname, "islossy"))
lossy = true;
else
elog(NOTICE, "Unrecognized index attribute \"%s\" ignored",
param->defname);
}

and

bool indislossy; /* do we fetch false tuples (lossy
* compression)? */

Should I remove this column?

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

#13

Zeugswetter Andreas SB

ZeugswetterA@wien.spardat.at

over 24 years ago

In reply to: Oleg Bartunov (#12)

AW: pg_index.indislossy

Let's avoid removing things for the sake of removing them ... might be an
old idea that, if someone takes the time to research, might prove useful
...

Yea, there is actually some code attached to this vs. the others that
had no code at all. Are we ever going to do partial indexes? I guess
that is the question.

The idea is very very good, and since there is an exaple implementation in
pg 4 it should probably be possible to reimplement. (DB2 has this feature also)

In real life, you would e.g. index a status column for rows, that need more work.
create index deposit_status_index on deposit (status) where status <> 0;
99% of your rows would have status = 0 thus the index would be extremely
efficient for all select statements that search for a specific status other than 0.

Imho it would be a shame to give up that idea so easily.

Andreas

Import Notes

Resolved by subject fallback

#14

Zeugswetter Andreas SB

ZeugswetterA@wien.spardat.at

over 24 years ago

In reply to: Zeugswetter Andreas SB (#13)

AW: pg_index.indislossy

One problem with keeping it is that interface coders are getting
confused by some of the unused system table columns, assuming they mean
something, when in fact they don't. Both ODBC and JDBC have had this
problem that I fixed today.

Imho the correct answer to this would be to implement the SQL standard
system views, and make all interfaces use those.

Andreas

Import Notes

Resolved by subject fallback

#15

Zeugswetter Andreas SB

ZeugswetterA@wien.spardat.at

over 24 years ago

In reply to: Zeugswetter Andreas SB (#14)

AW: pg_index.indislossy

Can someone tell me what we use indislossy for?

Ok, so the interpretation of this field is:
A match in the index needs to be reevaluated in the heap tuple data,
since a match in the index does not necessarily mean, that the heap tuple
matches.
If the heap tuple data matches, the index must always match.

A very typical example for such an index is a hash index. This might explain the
fact, that the ODBC driver misinterpreted that field as meaning that the index is a hash.
The field has nothing to do with partial index.

Andreas

Import Notes

Resolved by subject fallback

#16

Thomas Lockhart

lockhart@alumni.caltech.edu

over 24 years ago

In reply to: Zeugswetter Andreas SB (#13)

Re: AW: pg_index.indislossy

Yea, there is actually some code attached to this vs. the others that
had no code at all. Are we ever going to do partial indexes? I guess
that is the question.

The idea is very very good, and since there is an exaple implementation in
pg 4 it should probably be possible to reimplement. (DB2 has this feature also)

...

Imho it would be a shame to give up that idea so easily.

Agreed. Another common example is to create an index on all non-null
values of a column.

- Thomas

#17

Bruce Momjian

pgman@candle.pha.pa.us

over 24 years ago

In reply to: Oleg Bartunov (#12)

Re: pg_index.indislossy

GiST certainly use it ! Please don't remove.
GiST uses compression/decompression technique which could
produce false drops.

No problem. It is unchanged. I was just asking.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

#18

Bruce Momjian

pgman@candle.pha.pa.us

over 24 years ago

In reply to: Zeugswetter Andreas SB (#15)

Re: AW: pg_index.indislossy

Added to pg_index.h file as a comment.

Can someone tell me what we use indislossy for?

Ok, so the interpretation of this field is:
A match in the index needs to be reevaluated in the heap tuple data,
since a match in the index does not necessarily mean, that the heap tuple
matches.
If the heap tuple data matches, the index must always match.

A very typical example for such an index is a hash index. This might explain the
fact, that the ODBC driver misinterpreted that field as meaning that the index is a hash.
The field has nothing to do with partial index.

Andreas

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

#19

Peter Eisentraut

peter_e@gmx.net

over 24 years ago

In reply to: Bruce Momjian (#18)

Re: AW: pg_index.indislossy

Bruce Momjian writes:

Can someone tell me what we use indislossy for?

Ok, so the interpretation of this field is:
A match in the index needs to be reevaluated in the heap tuple data,
since a match in the index does not necessarily mean, that the heap tuple
matches.
If the heap tuple data matches, the index must always match.

AFAIK, this is true for all indexes in PostgreSQL, because index rows
don't store the transactions status. Of course those are two different
underlying reasons why a heap lookup is always necessary, but there
shouldn't be any functional difference in the current implementation.

--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter

#20

Bruce Momjian

pgman@candle.pha.pa.us

over 24 years ago

In reply to: Peter Eisentraut (#19)

Re: AW: pg_index.indislossy

Bruce Momjian writes:

Can someone tell me what we use indislossy for?

Ok, so the interpretation of this field is:
A match in the index needs to be reevaluated in the heap tuple data,
since a match in the index does not necessarily mean, that the heap tuple
matches.
If the heap tuple data matches, the index must always match.

AFAIK, this is true for all indexes in PostgreSQL, because index rows
don't store the transactions status. Of course those are two different
underlying reasons why a heap lookup is always necessary, but there
shouldn't be any functional difference in the current implementation.

Seems it is something they added for the index abstraction and not for
practical use by PostgreSQL.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

#21

Zeugswetter Andreas SB

ZeugswetterA@wien.spardat.at

over 24 years ago

In reply to: Bruce Momjian (#20)

AW: AW: pg_index.indislossy

Can someone tell me what we use indislossy for?

Ok, so the interpretation of this field is:
A match in the index needs to be reevaluated in the heap tuple data,
since a match in the index does not necessarily mean, that the heap tuple
matches.
If the heap tuple data matches, the index must always match.

AFAIK, this is true for all indexes in PostgreSQL, because index rows
don't store the transactions status. Of course those are two different
underlying reasons why a heap lookup is always necessary, but there
shouldn't be any functional difference in the current implementation.

Seems it is something they added for the index abstraction and not for
practical use by PostgreSQL.

Why, you do not need to call the comparison function on the heap data
if the index is not lossy, saves some CPU cycles.

Andreas

Import Notes

Resolved by subject fallback

#22

Tom Lane

tgl@sss.pgh.pa.us

over 24 years ago

In reply to: Peter Eisentraut (#19)

Re: AW: pg_index.indislossy

Peter Eisentraut <peter_e@gmx.net> writes:

Bruce Momjian writes:
A match in the index needs to be reevaluated in the heap tuple data,
since a match in the index does not necessarily mean, that the heap tuple
matches.

AFAIK, this is true for all indexes in PostgreSQL, because index rows
don't store the transactions status.

Not true at all. The tuple commit status needs to be rechecked, yes,
but with a normal index it is not necessary to recheck whether the index
key field actually satisfies the index qual conditions. With a lossy
index it *is* necessary to recheck --- the index may return more tuples
than the ones that match the given qual. For example, an r-tree index
applied to a "distance from point X <= D" query might return all the
tuples lying within a bounding box of the circle actually wanted.

The LIKE index optimization can also be thought of as using an index as
a lossy index: the index scan gives you all the tuples you want, plus
some you don't, so you have to evaluate the LIKE operator over again at
each returned tuple.

Basically, what this is good for is using an index for more kinds of
WHERE conditions than it could otherwise support. It is *not* a useless
abstraction. It occurs to me though that marking the index itself
as lossy is the wrong way to think about it --- the right way is to
associate the "lossy" flag with use of a particular operator with an
index. So maybe the flag should be in pg_amop or pg_amproc, instead.
Someday I'd also like to see those tables extended so that the LIKE
index optimization is described by the tables, rather than being
hard-wired into the planner as it is now.

regards, tom lane

#23

Peter Eisentraut

peter_e@gmx.net

over 24 years ago

In reply to: Tom Lane (#22)

Re: AW: pg_index.indislossy

Tom Lane writes:

Not true at all. The tuple commit status needs to be rechecked, yes,
but with a normal index it is not necessary to recheck whether the index
key field actually satisfies the index qual conditions. With a lossy
index it *is* necessary to recheck --- the index may return more tuples
than the ones that match the given qual.

Okay, this is not surprising. I agree that storing this in the index
might be suboptimal.

But why is this called lossy? Shouldn't it be called "exceedy"?

--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter

#24

Bruce Momjian

pgman@candle.pha.pa.us

over 24 years ago

In reply to: Zeugswetter Andreas SB (#21)

Re: AW: AW: pg_index.indislossy

Can someone tell me what we use indislossy for?

Ok, so the interpretation of this field is:
A match in the index needs to be reevaluated in the heap tuple data,
since a match in the index does not necessarily mean, that the heap tuple
matches.
If the heap tuple data matches, the index must always match.

AFAIK, this is true for all indexes in PostgreSQL, because index rows
don't store the transactions status. Of course those are two different
underlying reasons why a heap lookup is always necessary, but there
shouldn't be any functional difference in the current implementation.

Seems it is something they added for the index abstraction and not for
practical use by PostgreSQL.

Why, you do not need to call the comparison function on the heap data
if the index is not lossy, saves some CPU cycles.

Because we don't know of the tuples expired status until we check the
heap.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

#25

Tom Lane

tgl@sss.pgh.pa.us

over 24 years ago

In reply to: Peter Eisentraut (#23)

Re: AW: pg_index.indislossy

Peter Eisentraut <peter_e@gmx.net> writes:

But why is this called lossy? Shouldn't it be called "exceedy"?

Good point ;-). "lossy" does sound like the index might "lose" tuples,
which is exactly what it's not allowed to do; it must find all the
tuples that match the query.

The terminology is correct by analogy to "lossy compression" --- the
index loses information, in the sense that its result isn't quite the
result you wanted. But I can see where it'd confuse the unwary.
Perhaps we should consult the literature and see if there is another
term for this concept.

regards, tom lane

#26

Bruce Momjian

pgman@candle.pha.pa.us

over 24 years ago

In reply to: Tom Lane (#25)

Re: AW: pg_index.indislossy

Peter Eisentraut <peter_e@gmx.net> writes:

But why is this called lossy? Shouldn't it be called "exceedy"?

Good point ;-). "lossy" does sound like the index might "lose" tuples,
which is exactly what it's not allowed to do; it must find all the
tuples that match the query.

The terminology is correct by analogy to "lossy compression" --- the
index loses information, in the sense that its result isn't quite the
result you wanted. But I can see where it'd confuse the unwary.
Perhaps we should consult the literature and see if there is another
term for this concept.

Seeing how our ODBC driver refrences it in previous releases, we are
going to have trouble changing it. I always thought it was "lossy" in
terms of compression too.

I don't see it mentioned now in ODBC, but I think it used to be there.
I changed it recently to check for word "hash" instead.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

#27

Noname

ncm@zembu.com

over 24 years ago

In reply to: Tom Lane (#25)

Re: AW: pg_index.indislossy

On Tue, Jul 10, 2001 at 01:36:33PM -0400, Tom Lane wrote:

Peter Eisentraut <peter_e@gmx.net> writes:

But why is this called lossy? Shouldn't it be called "exceedy"?

Good point ;-). "lossy" does sound like the index might "lose" tuples,
which is exactly what it's not allowed to do; it must find all the
tuples that match the query.

The terminology is correct by analogy to "lossy compression" --- the
index loses information, in the sense that its result isn't quite the
result you wanted. But I can see where it'd confuse the unwary.
Perhaps we should consult the literature and see if there is another
term for this concept.

How about "hinty"? :-)

Seriously, "indislossy" is a singularly poor name for a predicate.
Also, are we so poor that we can't afford whole words, or even word
breaks? I propose "index_is_hint".

Actually, is the "ind[ex]" part even necessary?
How about "must_check_heap"?

Nathan Myers
ncm@zembu.com

#28

Tom Lane

tgl@sss.pgh.pa.us

over 24 years ago

In reply to: Noname (#27)

Re: AW: pg_index.indislossy

ncm@zembu.com (Nathan Myers) writes:

Seriously, "indislossy" is a singularly poor name for a predicate.

Perhaps, but it fits with the existing naming conventions for Postgres
catalog columns. Unless we want to indulge in wholesale renaming of
the system's catalog columns (and break an awful lot of applications)
I'd resist any name for a pg_index column that's not of the form "indFOO".

regards, tom lane