[PROPOSAL] Covering + unique indexes.

Started by Anastasia Lubennikovaover 10 years ago32 messageshackers

a.lubennikova@postgrespro.ru

over 10 years ago

Hi, hackers!

Use case:
Index-only scans is a wonderful feature that allows to speed up select
queries of indexed columns.
Therefore some users want to create multicolumn indexes on columns which
are queried often.
But if there's unique constraint on some column, they have to maintain
another unique index.
Even if the column is already in covering index.
This adds overhead to data manipulation operations and database size.

I've started work on a patch that allows to combine covering and unique
functionality.
The main idea is to allow user create multicolumn indexes with a
definite number of unique columns.
For example (don't mind SQL syntax here, please):
CREATE INDEX index ON table (c1, c2, c3) UNIQUE ON (c1, c2);
Created index has three columns, but only two of them have unique
constraint.

This idea has obvious restriction. We can set unique only for first
index columns.
There is no clear way to maintain following index.
CREATE INDEX index ON table (c1, c2, c3) UNIQUE ON (c1, c3);

So I suggest following syntax:
CREATE [UNIQUE {ON FIRST {COLUMN | n_unique_column COLUMNS}} INDEX ON
table_name (column_name1, column_name2 ...);

Examples:
CREATE UNIQUE INDEX ON table_name (c1, c2, c3); // (c1,c2, c3) must be
UNIQUE. That's how it works now.

CREATE UNIQUE ON FIRST COLUMN INDEX ON table_name (c1, c2, c3); // (c1)
must be UNIQUE
CREATE UNIQUE ON FIRST 2 COLUMNS INDEX ON table_name (c1, c2, c3); //
(c1,c2) must be UNIQUE
CREATE UNIQUE ON FIRST 3 COLUMNS INDEX ON table_name (c1, c2, c3); //
(c1,c2, c3) must be UNIQUE

Next issue is pg_index changes.
Now there's only a boolean flag

* bool indisunique; /* is this a unique index? */

But new algorithm requires to store a single number

* unit16n_unique_columns; /* number of first columns of index which
has unique constrains. */

I think, that numbers of all attributes themselves are not needed. Is it
right?

I'd like to see your suggestions about syntax changes.
And of course any other comments are welcome.

--
Anastasia Lubennikova
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

Import Notes

Reply to msg id not found: 55F2C1BC.3010109@postgrespro.ruReference msg id not found: 55F2C1BC.3010109@postgrespro.ru

Jim Nasby

Jim.Nasby@BlueTreble.com

over 10 years ago

In reply to: Anastasia Lubennikova (#1)

Re: [PROPOSAL] Covering + unique indexes.

On 9/11/15 7:45 AM, Anastasia Lubennikova wrote:

This idea has obvious restriction. We can set unique only for first
index columns.
There is no clear way to maintain following index.
CREATE INDEX index ON table (c1, c2, c3) UNIQUE ON (c1, c3);

So I suggest following syntax:
CREATE [UNIQUE {ON FIRST {COLUMN | n_unique_column COLUMNS}} INDEX ON
table_name (column_name1, column_name2 ...);

I would use the first (simple) syntax and just throw an error if the
user tries to skip a column on the UNIQUE clause.

Have you by chance looked to see what other databases have done for
syntax? I'm guessing this isn't covered by ANSI but maybe there's
already an industry consensus.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Teodor Sigaev

teodor@sigaev.ru

over 10 years ago

In reply to: Jim Nasby (#2)

Re: [PROPOSAL] Covering + unique indexes.

CREATE INDEX index ON table (c1, c2, c3) UNIQUE ON (c1, c3);

CREATE [UNIQUE {ON FIRST {COLUMN | n_unique_column COLUMNS}} INDEX ON
table_name (column_name1, column_name2 ...);

I would use the first (simple) syntax and just throw an error if the
user tries to skip a column on the UNIQUE clause.

Seems, second option looks as more natural extension of CREATE UNIQUE INDEX

Have you by chance looked to see what other databases have done for
syntax? I'm guessing this isn't covered by ANSI but maybe there's
already an industry consensus.

MS SQL and DB/2 suggests (with changes for postgresql):
CREATE UNIQUE INDEX i ON t (a,b) INCLUDE (c)

MS SQL supports both unique and non-unique indexes, DB/2 only unique
indexes. Oracle/MySQL doesn't support covering indexes. Readed at
http://use-the-index-luke.com/sql/clustering/index-only-scan-covering-index

MSSQL/DB/2 variants looks good too.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Thomas Munro

thomas.munro@gmail.com

over 10 years ago

In reply to: Teodor Sigaev (#3)

Re: [PROPOSAL] Covering + unique indexes.

On Tue, Sep 15, 2015 at 6:08 AM, Teodor Sigaev <teodor@sigaev.ru> wrote:

CREATE INDEX index ON table (c1, c2, c3) UNIQUE ON (c1, c3);

CREATE [UNIQUE {ON FIRST {COLUMN | n_unique_column COLUMNS}} INDEX ON
table_name (column_name1, column_name2 ...);

I would use the first (simple) syntax and just throw an error if the
user tries to skip a column on the UNIQUE clause.

Seems, second option looks as more natural extension of CREATE UNIQUE INDEX

Have you by chance looked to see what other databases have done for
syntax? I'm guessing this isn't covered by ANSI but maybe there's
already an industry consensus.

MS SQL and DB/2 suggests (with changes for postgresql):
CREATE UNIQUE INDEX i ON t (a,b) INCLUDE (c)

MS SQL supports both unique and non-unique indexes, DB/2 only unique
indexes. Oracle/MySQL doesn't support covering indexes. Readed at
http://use-the-index-luke.com/sql/clustering/index-only-scan-covering-index

It surprised me that you can INCLUDE extra columns on non-UNIQUE indexes,
since you could just add them as regular indexed columns for the same
effect. It looks like when you do that in SQL Server, the extra columns
are only stored on btree leaf pages and so can't be used for searching or
ordering. I don't know how useful that is or if we would ever want it...
but I just wanted to note that difference, and that the proposed UNIQUE ON
FIRST n COLUMNS syntax and catalog change can't express that.

http://sqlperformance.com/2014/07/sql-indexes/new-index-columns-key-vs-include

--
Thomas Munro
http://www.enterprisedb.com

Teodor Sigaev

teodor@sigaev.ru

over 10 years ago

In reply to: Thomas Munro (#4)

Re: [PROPOSAL] Covering + unique indexes.

It surprised me that you can INCLUDE extra columns on non-UNIQUE
indexes, since you could just add them as regular indexed columns for
the same effect. It looks like when you do that in SQL Server, the
extra columns are only stored on btree leaf pages and so can't be used
for searching or ordering. I don't know how useful that is or if we
would ever want it... but I just wanted to note that difference, and

Agree

that the proposed UNIQUE ON FIRST n COLUMNS syntax and catalog change
can't express that.

Proposal suggests to work only with unique index by exactly your
reasons above.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Jim Nasby

Jim.Nasby@BlueTreble.com

over 10 years ago

In reply to: Thomas Munro (#4)

Re: [PROPOSAL] Covering + unique indexes.

On 9/14/15 1:50 PM, Thomas Munro wrote:

CREATE [UNIQUE {ON FIRST {COLUMN | n_unique_column COLUMNS}}
INDEX ON
table_name (column_name1, column_name2 ...);

I would use the first (simple) syntax and just throw an error if the
user tries to skip a column on the UNIQUE clause.

Seems, second option looks as more natural extension of CREATE
UNIQUE INDEX

True, but it's awefully verbose. :( And...

It surprised me that you can INCLUDE extra columns on non-UNIQUE
indexes, since you could just add them as regular indexed columns for
the same effect. It looks like when you do that in SQL Server, the
extra columns are only stored on btree leaf pages and so can't be used
for searching or ordering. I don't know how useful that is or if we
would ever want it... but I just wanted to note that difference, and
that the proposed UNIQUE ON FIRST n COLUMNS syntax and catalog change
can't express that.

... we might want to support INCLUDE at some point. It enhances covering
scans without bloating the heck out of the btree. (I'm not sure if it
would help other index types...) So it seems like a bad idea to preclude
that.

I don't see that UNIQUE ON FIRST precludes also supporting INCLUDE.
Presumably we could do either

CREATE INDEX ... ON table (f1, f2, f3) UNIQUE(f1, f2) INCLUDE(f4);
or
CREATE UNIQUE ON FIRST 2 COLUMNS INDEX ... ON table (f1, f2, f3)
INCLUDE(f4);

Personally, I find the first form easier to read.

Are we certain that no index type could ever support an index on (f1,
f2, f3) UNIQUE(f1, f3)? Even if it doesn't make sense for btree, perhaps
some other index could handle it.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Oleg Bartunov

oleg@sai.msu.su

over 10 years ago

In reply to: Jim Nasby (#6)

Re: [PROPOSAL] Covering + unique indexes.

On Tue, Sep 15, 2015 at 12:44 AM, Jim Nasby <Jim.Nasby@bluetreble.com>
wrote:

On 9/14/15 1:50 PM, Thomas Munro wrote:

CREATE [UNIQUE {ON FIRST {COLUMN | n_unique_column COLUMNS}}
INDEX ON
table_name (column_name1, column_name2 ...);

I would use the first (simple) syntax and just throw an error if
the
user tries to skip a column on the UNIQUE clause.

Seems, second option looks as more natural extension of CREATE
UNIQUE INDEX

True, but it's awefully verbose. :( And...

It surprised me that you can INCLUDE extra columns on non-UNIQUE

indexes, since you could just add them as regular indexed columns for
the same effect. It looks like when you do that in SQL Server, the
extra columns are only stored on btree leaf pages and so can't be used
for searching or ordering. I don't know how useful that is or if we
would ever want it... but I just wanted to note that difference, and
that the proposed UNIQUE ON FIRST n COLUMNS syntax and catalog change
can't express that.

... we might want to support INCLUDE at some point. It enhances covering
scans without bloating the heck out of the btree. (I'm not sure if it would
help other index types...) So it seems like a bad idea to preclude that.

I don't see that UNIQUE ON FIRST precludes also supporting INCLUDE.
Presumably we could do either

CREATE INDEX ... ON table (f1, f2, f3) UNIQUE(f1, f2) INCLUDE(f4);
or
CREATE UNIQUE ON FIRST 2 COLUMNS INDEX ... ON table (f1, f2, f3)
INCLUDE(f4);

Personally, I find the first form easier to read.

Why not normal syntax with optional INCLUDE ?

CREATE UNIQUE INDEX ON table (f1,f2,f3) INCLUDE (f4)

Show quoted text

Are we certain that no index type could ever support an index on (f1, f2,
f3) UNIQUE(f1, f3)? Even if it doesn't make sense for btree, perhaps some
other index could handle it.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Thom Brown

thom@linux.com

over 10 years ago

In reply to: Jim Nasby (#6)

Re: [PROPOSAL] Covering + unique indexes.

On 14 September 2015 at 22:44, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 9/14/15 1:50 PM, Thomas Munro wrote:

CREATE [UNIQUE {ON FIRST {COLUMN | n_unique_column COLUMNS}}
INDEX ON
table_name (column_name1, column_name2 ...);

I would use the first (simple) syntax and just throw an error if
the
user tries to skip a column on the UNIQUE clause.

Seems, second option looks as more natural extension of CREATE
UNIQUE INDEX

True, but it's awefully verbose. :( And...

It surprised me that you can INCLUDE extra columns on non-UNIQUE

indexes, since you could just add them as regular indexed columns for
the same effect. It looks like when you do that in SQL Server, the
extra columns are only stored on btree leaf pages and so can't be used
for searching or ordering. I don't know how useful that is or if we
would ever want it... but I just wanted to note that difference, and
that the proposed UNIQUE ON FIRST n COLUMNS syntax and catalog change
can't express that.

... we might want to support INCLUDE at some point. It enhances covering
scans without bloating the heck out of the btree. (I'm not sure if it would
help other index types...) So it seems like a bad idea to preclude that.

I don't see that UNIQUE ON FIRST precludes also supporting INCLUDE.
Presumably we could do either

CREATE INDEX ... ON table (f1, f2, f3) UNIQUE(f1, f2) INCLUDE(f4);
or
CREATE UNIQUE ON FIRST 2 COLUMNS INDEX ... ON table (f1, f2, f3)
INCLUDE(f4);

Personally, I find the first form easier to read.

I guess the standard CREATE UNIQUE INDEX can be seen as shorthand for
CREATE INDEX with all columns listed in the UNIQUE clause.

Are we certain that no index type could ever support an index on (f1, f2,

f3) UNIQUE(f1, f3)? Even if it doesn't make sense for btree, perhaps some
other index could handle it.

That's certainly an interesting question. At the moment, only btree is
capable of enforcing uniqueness, but that's not to say it will always be
that way. But I guess you'd need a way for the access method list of
defining whether it's capable of multi-column indexes with out-of-order
unique columns. (or some more sensible way of describing it)

Thom

Thom Brown

thom@linux.com

over 10 years ago

In reply to: Oleg Bartunov (#7)

Re: [PROPOSAL] Covering + unique indexes.

On 14 September 2015 at 23:12, Oleg Bartunov <obartunov@gmail.com> wrote:

On Tue, Sep 15, 2015 at 12:44 AM, Jim Nasby <Jim.Nasby@bluetreble.com>
wrote:

On 9/14/15 1:50 PM, Thomas Munro wrote:

CREATE [UNIQUE {ON FIRST {COLUMN | n_unique_column COLUMNS}}
INDEX ON
table_name (column_name1, column_name2 ...);

I would use the first (simple) syntax and just throw an error if
the
user tries to skip a column on the UNIQUE clause.

Seems, second option looks as more natural extension of CREATE
UNIQUE INDEX

True, but it's awefully verbose. :( And...

It surprised me that you can INCLUDE extra columns on non-UNIQUE

indexes, since you could just add them as regular indexed columns for
the same effect. It looks like when you do that in SQL Server, the
extra columns are only stored on btree leaf pages and so can't be used
for searching or ordering. I don't know how useful that is or if we
would ever want it... but I just wanted to note that difference, and
that the proposed UNIQUE ON FIRST n COLUMNS syntax and catalog change
can't express that.

... we might want to support INCLUDE at some point. It enhances covering
scans without bloating the heck out of the btree. (I'm not sure if it would
help other index types...) So it seems like a bad idea to preclude that.

I don't see that UNIQUE ON FIRST precludes also supporting INCLUDE.
Presumably we could do either

CREATE INDEX ... ON table (f1, f2, f3) UNIQUE(f1, f2) INCLUDE(f4);
or
CREATE UNIQUE ON FIRST 2 COLUMNS INDEX ... ON table (f1, f2, f3)
INCLUDE(f4);

Personally, I find the first form easier to read.

Why not normal syntax with optional INCLUDE ?

CREATE UNIQUE INDEX ON table (f1,f2,f3) INCLUDE (f4)

That's not the same thing. Then you're including f3 in the unique
constraint, which you may not want for uniqueness purposes, but may want in
the index for sorting. But then saying that, if f1 and f2 are unique,
you'd only get 1 value of f3 for each combination of f1 and f2, so sorting
probably isn't useful. You might as well only INCLUDE f3 rather than have
it in the multi-column index for sorting. So to adjust your example:

CREATE UNIQUE INDEX ON table (f1,f2) INCLUDE (f3, f4);

Is there a scenario anyone can think of where f3 here:

CREATE INDEX ... ON table (f1, f2, f3) UNIQUE(f1, f2) INCLUDE(f4);

would be advantageous outside of INCLUDE?

Out of curiosity, why is this only being discussed for unique indexes?
What if you want additional columns included on non-unique indexes?

Thom

#10

Gavin Flower

GavinFlower@archidevsys.co.nz

over 10 years ago

In reply to: Jim Nasby (#6)

Re: [PROPOSAL] Covering + unique indexes.

On 15/09/15 09:44, Jim Nasby wrote:

On 9/14/15 1:50 PM, Thomas Munro wrote:

CREATE [UNIQUE {ON FIRST {COLUMN | n_unique_column COLUMNS}}
INDEX ON
table_name (column_name1, column_name2 ...);

I would use the first (simple) syntax and just throw an error
if the
user tries to skip a column on the UNIQUE clause.

Seems, second option looks as more natural extension of CREATE
UNIQUE INDEX

True, but it's awefully verbose. :( And...

It surprised me that you can INCLUDE extra columns on non-UNIQUE
indexes, since you could just add them as regular indexed columns for
the same effect. It looks like when you do that in SQL Server, the
extra columns are only stored on btree leaf pages and so can't be used
for searching or ordering. I don't know how useful that is or if we
would ever want it... but I just wanted to note that difference, and
that the proposed UNIQUE ON FIRST n COLUMNS syntax and catalog change
can't express that.

... we might want to support INCLUDE at some point. It enhances
covering scans without bloating the heck out of the btree. (I'm not
sure if it would help other index types...) So it seems like a bad
idea to preclude that.

I don't see that UNIQUE ON FIRST precludes also supporting INCLUDE.
Presumably we could do either

CREATE INDEX ... ON table (f1, f2, f3) UNIQUE(f1, f2) INCLUDE(f4);

Of the formats I've seen so far, I prefer this one.

I think using "[ALSO] INCLUDE(f4)" - might be potentially more readable
than using just "INCLUDE(f4)". even if not used, the noise word also
would help people understand that the other fields mentioned are already
covered.

If not too difficult then allowing the unique fields to be separated by
other fields could be useful - in the example allowing "UNIQUE(f1,
f3)". Especially if the index is likely to be used to CLUSTER a table,
where the order f1, f2, ... is important.

or
CREATE UNIQUE ON FIRST 2 COLUMNS INDEX ... ON table (f1, f2, f3)
INCLUDE(f4);

Personally, I find the first form easier to read.

Are we certain that no index type could ever support an index on (f1,
f2, f3) UNIQUE(f1, f3)? Even if it doesn't make sense for btree,
perhaps some other index could handle it.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11

Teodor Sigaev

teodor@sigaev.ru

over 10 years ago

In reply to: Jim Nasby (#6)

Re: [PROPOSAL] Covering + unique indexes.

CREATE INDEX ... ON table (f1, f2, f3) UNIQUE(f1, f2) INCLUDE(f4);

I don't see an advantage this form. What is f3 column? just order? and
f4 will not be used for compare? At least now it requires additional
checks that UNIQUE() fields are the same as first columns in definition.
Non ordering field f4 will require invasive intervention in planner
because now it believes that all columns in btree are ordered.

I agree, that form
CREATE UNIQUE INDEX i ON t (f1, f2, f3) INCLUDE (f4)
is clear. f4 will be used in row compare and actually planner will be
able to use it as unique index (f1, f2, f3) with additional f4 or as
as unique index (f1, f2, f3, f4), because uniqueness on (f1, f2, f3)
gives warranty of uniqueness on (f1, f2, f3, f4)

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12

Teodor Sigaev

teodor@sigaev.ru

over 10 years ago

In reply to: Thom Brown (#9)

Re: [PROPOSAL] Covering + unique indexes.

Why not normal syntax with optional INCLUDE ?

CREATE UNIQUE INDEX ON table (f1,f2,f3) INCLUDE (f4)

That's not the same thing. Then you're including f3 in the unique
constraint, which you may not want for uniqueness purposes, but may want
in the index for sorting. But then saying that, if f1 and f2 are
unique, you'd only get 1 value of f3 for each combination of f1 and f2,
so sorting probably isn't useful. You might as well only INCLUDE f3
rather than have it in the multi-column index for sorting. So to adjust
your example:

CREATE UNIQUE INDEX ON table (f1,f2) INCLUDE (f3, f4);

Is there a scenario anyone can think of where f3 here:

CREATE INDEX ... ON table (f1, f2, f3) UNIQUE(f1, f2) INCLUDE(f4);

would be advantageous outside of INCLUDE?

Out of curiosity, why is this only being discussed for unique indexes?
What if you want additional columns included on non-unique indexes?

Because there is no difference for non-unique indexes between (f1,f2,f3)
and (f1, f2) INCLUDE (f3). In second case we just got index with
unordered f3 column.

Oh no, it's possible that f3 column type has not btree operator class...
If we want to support this case then intervention in planner will be a
bit invasive.

BTW, I don't see in foreseen future another unique access methods.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

David Rowley

dgrowleyml@gmail.com

over 10 years ago

In reply to: Teodor Sigaev (#11)

Re: [PROPOSAL] Covering + unique indexes.

On 15 September 2015 at 18:16, Teodor Sigaev <teodor@sigaev.ru> wrote:

CREATE INDEX ... ON table (f1, f2, f3) UNIQUE(f1, f2) INCLUDE(f4);

I don't see an advantage this form. What is f3 column? just order? and f4
will not be used for compare? At least now it requires additional checks
that UNIQUE() fields are the same as first columns in definition. Non
ordering field f4 will require invasive intervention in planner because
now it believes that all columns in btree are ordered.

I'm also a bit confused where f3 comes in here. If it's UNIQUE on (f1,f2)
and we include f4. Where's f3?

I agree, that form
CREATE UNIQUE INDEX i ON t (f1, f2, f3) INCLUDE (f4)
is clear. f4 will be used in row compare and actually planner will be able
to use it as unique index (f1, f2, f3) with additional f4 or as
as unique index (f1, f2, f3, f4), because uniqueness on (f1, f2, f3) gives
warranty of uniqueness on (f1, f2, f3, f4)

I'd vote for this too. However, INCLUDE does not seem to be a reserved word
at the moment.

I think this syntax fits in nicely to with non-unique indexes too.

Regards

David Rowley

--
David Rowley http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Training & Services

#14

Vik Fearing

vik@postgresfriends.org

over 10 years ago

In reply to: David Rowley (#13)

Re: [PROPOSAL] Covering + unique indexes.

On 09/15/2015 10:57 AM, David Rowley wrote:

I agree, that form

CREATE UNIQUE INDEX i ON t (f1, f2, f3) INCLUDE (f4)
is clear. f4 will be used in row compare and actually planner will be able
to use it as unique index (f1, f2, f3) with additional f4 or as
as unique index (f1, f2, f3, f4), because uniqueness on (f1, f2, f3) gives
warranty of uniqueness on (f1, f2, f3, f4)

I'd vote for this too. However, INCLUDE does not seem to be a reserved word
at the moment.

What about CREATE UNIQUE INDEX i ON t (f1, f2, f3) WITH (f4); ?
--
Vik Fearing +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15

David Rowley

dgrowleyml@gmail.com

over 10 years ago

In reply to: Anastasia Lubennikova (#1)

Re: [PROPOSAL] Covering + unique indexes.

On 12 September 2015 at 00:45, Anastasia Lubennikova <
a.lubennikova@postgrespro.ru> wrote:

I've started work on a patch that allows to combine covering and unique
functionality.

Great to hear someone is working on this!

Next issue is pg_index changes.
Now there's only a boolean flag

- bool indisunique; /* is this a unique index? */

But new algorithm requires to store a single number

- unit16 n_unique_columns; /* number of first columns of index which
has unique constrains. */

I think, that numbers of all attributes themselves are not needed. Is it
right?

I think the total number of attributes is already in indnatts.
I imagine you're planning to keep the indexed columns at the start of
the indkey[] array, and just use n_unique_columns to mark how many of the
indkey[] attributes to check as indexed columns. I'd imagine the change
would be fairly simple from a planner point of view as you'd just need to
check columns 1 to n_unique_columns instead of 1 to indnatts. Although I
have a tendency to under estimate these things :(

I imagine you don't want to name the new column n_unique_columns, since it
does not fit too well with non-unique indexes.
Perhaps just indindexedatts, or something slightly along those lines. But
perhaps it would be a good idea to also rename "ncolumns" in code, to
ensure that any non-updated code does not even compile. Perhaps including
"tot" or "total" in there might help indicate it's new meaning.

Regards

David Rowley
--
David Rowley http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Training & Services

#16

Anastasia Lubennikova

a.lubennikova@postgrespro.ru

over 10 years ago

In reply to: David Rowley (#15)

Re: [PROPOSAL] Covering + unique indexes.

15.09.2015 12:18, David Rowley:

On 12 September 2015 at 00:45, Anastasia Lubennikova
<a.lubennikova@postgrespro.ru <mailto:a.lubennikova@postgrespro.ru>>
wrote:

I've started work on a patch that allows to combine covering and
unique functionality.

Great to hear someone is working on this!

Thank you! It looks like very interesting project to me)

Next issue is pg_index changes.
Now there's only a boolean flag

* bool indisunique; /* is this a unique index? */

But new algorithm requires to store a single number

* unit16n_unique_columns; /* number of first columns of index
which has unique constrains. */

I think, that numbers of all attributes themselves are not needed.
Is it right?

I think the total number of attributes is already in indnatts.
I imagine you're planning to keep the indexed columns at the start of
the indkey[] array, and just use n_unique_columns to mark how many of
the indkey[] attributes to check as indexed columns. I'd imagine the
change would be fairly simple from a planner point of view as you'd
just need to check columns 1 to n_unique_columns instead of 1 to
indnatts. Although I have a tendency to under estimate these things :(

I've asked that for the same reason. I'm not too deep in access method
and btree code, so I don't want to miss any hidden dependencies.
As I see it, changes are required in _bt_isequal()
<http://doxygen.postgresql.org/nbtinsert_8c.html#abcfb7a3dcd40a5d1759652239f3a0115>.
Instead of

for (i = 1; i <= keysz; i++) {} // where /keysz/ is actually /natts//=
rel->rd_rel->relnatts;
/Look at _bt_check_uinque
<http://doxygen.postgresql.org/nbtinsert_8c.html#a96eb8c53ffdf53f139b037194a9721a3>
and pg_class
<http://doxygen.postgresql.org/pg__class_8h.html#ac8bf924d36feee5f3ca4c36aa01c75ec>
for more info.

cycle should be
for (i = 1; i <= nuinique; i++) {} // where /nunique /is value of
/rel->rd_index->//indnuinque

//indnuinque /is a new field, which I suggest to add to pg_index
<http://doxygen.postgresql.org/pg__index_8h.html#a5065be0408f03576083a30c97b43a09a>
instead of /indisunique/ (or may be in addition to it).

But I'm doubt about the problem which Jim has mentioned.

Are we certain that no index type could ever support an index on (f1,

f2, f3) UNIQUE(f1, f3)? Even if it >doesn't make sense for btree,
perhaps some other index could handle it.

If it's really so, we certainly have to keep in pg_index
<http://doxygen.postgresql.org/pg__index_8h.html#a5065be0408f03576083a30c97b43a09a>
not just number of unique columns (/indnunique/), but their numbers
themselves (an array /indnuniqueatts/ on the model of /indnatts/).

I imagine you don't want to name the new column n_unique_columns,
since it does not fit too well with non-unique indexes.

Hm, I think that it would be quite clear to set it to zero for
non-unique indexes.
/(nunique == 0)/ is equal to /(indisunique==false)/.

But maybe I've missed some reason why we should to save /indisunique/ flag.

Perhaps just indindexedatts, or something slightly along those lines.
But perhaps it would be a good idea to also rename "ncolumns" in code,
to ensure that any non-updated code does not even compile. Perhaps
including "tot" or "total" in there might help indicate it's new meaning.

I hope, that all these changes are not needed. Just continue to use
/ncolumns/ for all indexed columns. And new variable /nuinique/ (or
something like that) is for number of unique columns in the index.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#17

Anastasia Lubennikova

a.lubennikova@postgrespro.ru

over 10 years ago

In reply to: Vik Fearing (#14)

Re: [PROPOSAL] Covering + unique indexes.

15.09.2015 12:11, Vik Fearing:

On 09/15/2015 10:57 AM, David Rowley wrote:

I agree, that form

CREATE UNIQUE INDEX i ON t (f1, f2, f3) INCLUDE (f4)
is clear. f4 will be used in row compare and actually planner will be able
to use it as unique index (f1, f2, f3) with additional f4 or as
as unique index (f1, f2, f3, f4), because uniqueness on (f1, f2, f3) gives
warranty of uniqueness on (f1, f2, f3, f4)

I'd vote for this too. However, INCLUDE does not seem to be a reserved word
at the moment.

What about CREATE UNIQUE INDEX i ON t (f1, f2, f3) WITH (f4); ?

WITH seems ambiguity to me. It refers to CTE, so I expect to see after
that a kind of query expression. But maybe that's just matter of habit.

BTW, that's the first syntax change I'm working with.
Is there any convention in PostgreSQL about new keywords and so on?
Where can I find it?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18

David Rowley

dgrowleyml@gmail.com

over 10 years ago

In reply to: Anastasia Lubennikova (#16)

Re: [PROPOSAL] Covering + unique indexes.

On 15 September 2015 at 22:20, Anastasia Lubennikova <
a.lubennikova@postgrespro.ru> wrote:

Hm, I think that it would be quite clear to set it to zero for non-unique
indexes.
*(nunique == 0)* is equal to *(indisunique==false)*.

But maybe I've missed some reason why we should to save *indisunique*
flag.

I'd say that Jim summed this one up well, with:

... we might want to support INCLUDE at some point. It enhances covering

scans without bloating the heck out of the btree. (I'm not sure if it would
help other index types...) So it seems like a bad idea to preclude that.

Which I take to mean non-unique indexes.

So if you just kept the indisunique flag, and added a column to state the
number of columns that are actually in the "index" (not INCLUDE columns).
Then your code would probably work for both unique and non-unique index.
This way users don't have to pay the price of index bloat if they tag on
high cardinality columns onto the end of the index's column list.

Perhaps it would be easier just to add a new column to pg_index which
stores the total attrs, that way you could get away with not having to edit
each of the existing for() loop that go over the index attributes. This
would just store the idxnattrs + number of included columns. Perhaps
something named idxtotnatts or idxtotalnatts.

Regards

David Rowley

--
David Rowley http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Training & Services

#19

Teodor Sigaev

teodor@sigaev.ru

over 10 years ago

In reply to: Anastasia Lubennikova (#16)

Re: [PROPOSAL] Covering + unique indexes.

Seems, final form is

CREATE INDEX idx ON tbl (f1, f2, f3) [UNIQUE ON (f1, f2)] [INCLUDE (f4)]

f1, f2, f3 are participated in index row comparence (btre, gist etc)
f1, f2 are participated in unique constrain and it gives warranty for
(f1, f2, f3[, f4]) uniqueness. Now supported by Btree only
f4 doesn't participate in row comparence and could even do not have an operator
class. Btree and GiST could support that.

The form
CREATE UNIQUE INDEX ON tbl (f1, f2, f3)
is exact equivalent of form
CREATE INDEX idx ON tbl (f1, f2, f3) UNIQUE ON (f1, f2, f3)

I hope, that it's doible without a lot of difficulties.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20

Vik Fearing

vik@postgresfriends.org

over 10 years ago

In reply to: Anastasia Lubennikova (#17)

Re: [PROPOSAL] Covering + unique indexes.

On 09/15/2015 12:45 PM, Anastasia Lubennikova wrote:

15.09.2015 12:11, Vik Fearing:

On 09/15/2015 10:57 AM, David Rowley wrote:

I agree, that form

CREATE UNIQUE INDEX i ON t (f1, f2, f3) INCLUDE (f4)
is clear. f4 will be used in row compare and actually planner will
be able
to use it as unique index (f1, f2, f3) with additional f4 or as
as unique index (f1, f2, f3, f4), because uniqueness on (f1, f2,
f3) gives
warranty of uniqueness on (f1, f2, f3, f4)

I'd vote for this too. However, INCLUDE does not seem to be a
reserved word
at the moment.

What about CREATE UNIQUE INDEX i ON t (f1, f2, f3) WITH (f4); ?

WITH seems ambiguity to me. It refers to CTE, so I expect to see after
that a kind of query expression. But maybe that's just matter of habit.

Not necessarily. See WITH ORDINALITY, for example. However, now that
I've looked at it, index creation already takes a WITH clause for
storage parameters, so that's out.

BTW, that's the first syntax change I'm working with.
Is there any convention in PostgreSQL about new keywords and so on?
Where can I find it?

I don't think it's written anywhere except peppered throughout the
archives. New keywords are greatly frowned upon.

INCLUDING is already an unreserved keyword, and sounds more natural than
INCLUDE anyway. Maybe that could work?
--
Vik Fearing +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21