Better name/syntax for "online" index creation

Started by Tom Laneover 19 years ago34 messageshackers

tgl@sss.pgh.pa.us

over 19 years ago

I'm fairly dissatisfied with the naming of Greg Stark's proposed new
feature for creating indexes without blocking writers of the table.
To my mind, "ONLINE" just doesn't convey any useful information ---
the existing CREATE INDEX functionality could already be said to be
"online", in the sense that you don't have to take down the database
to do it. I thought about "SHARED" but someone could probably raise
the same objection to it. Anyone have a better idea?

I'm also wondering about where in the command the keyword should go.
As submitted it's

( { <replaceable class="parameter">column</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ <replaceable class="parameter">opclass</replaceable> ] [, ...] )
[ WITH ( <replaceable class="PARAMETER">storage_parameter</replaceable> = <replaceable class="PARAMETER">value</replaceable> [, ... ] ) ]
[ TABLESPACE <replaceable class="parameter">tablespace</replaceable> ]
+ [ ONLINE]
[ WHERE <replaceable class="parameter">predicate</replaceable> ]

which seems a bit randomly chosen; what's more it creates a problem for
psql, which would have to parse nearly the entire command to discover
whether it's safe to execute inside a transaction block or not.
I'm tempted to put the new keyword at the very front:

SHARED CREATE INDEX ....

which would probably mean that we'd have to document it as if it were a
completely separate command from CREATE INDEX, but then again that might
not be a bad thing considering how differently the two cases behave.
If not that, we probably still need to put it somewhere near the front
for psql's sake.

Comments?

regards, tom lane

Rod Taylor

rbt@rbt.ca

over 19 years ago

In reply to: Tom Lane (#1)

Re: Better name/syntax for "online" index creation

SHARED CREATE INDEX ....

Comments?

CREATE [UNIQUE] INDEX <foo> [WITH NOLOCK] ON ...

Rod Taylor

rbt@rbt.ca

over 19 years ago

In reply to: Rod Taylor (#2)

Re: Better name/syntax for "online" index creation

Sorry, hit send too quickly.

NOLOCK is kinda like NOWAIT, except implies that the command will not
take a strong lock instead of stating that it will not wait for one.

On Mon, 2006-07-24 at 11:20 -0400, Rod Taylor wrote:

SHARED CREATE INDEX ....

Comments?

CREATE [UNIQUE] INDEX <foo> [WITH NOLOCK] ON ...

Peter Eisentraut

peter_e@gmx.net

over 19 years ago

In reply to: Tom Lane (#1)

Re: Better name/syntax for "online" index creation

Am Montag, 24. Juli 2006 17:13 schrieb Tom Lane:

To my mind, "ONLINE" just doesn't convey any useful information ---
the existing CREATE INDEX functionality could already be said to be
"online", in the sense that you don't have to take down the database
to do it. I thought about "SHARED" but someone could probably raise
the same objection to it. Anyone have a better idea?

CONCURRENTLY

I'm tempted to put the new keyword at the very front:

SHARED CREATE INDEX ....

which would probably mean that we'd have to document it as if it were a
completely separate command from CREATE INDEX, but then again that might
not be a bad thing considering how differently the two cases behave.

What is so different about them that would justify this?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Tom Lane

tgl@sss.pgh.pa.us

over 19 years ago

In reply to: Peter Eisentraut (#4)

Re: Better name/syntax for "online" index creation

Peter Eisentraut <peter_e@gmx.net> writes:

Am Montag, 24. Juli 2006 17:13 schrieb Tom Lane:

I'm tempted to put the new keyword at the very front:

SHARED CREATE INDEX ....

which would probably mean that we'd have to document it as if it were a
completely separate command from CREATE INDEX, but then again that might
not be a bad thing considering how differently the two cases behave.

What is so different about them that would justify this?

Well, mainly it's exactly the reason that psql wants to know the
difference: one can be executed inside a transaction block, and the
other one can't. To my mind that's a sufficiently big difference
that it deserves a different command name. We messed this up with
CLUSTER but that's not a precedent I want to follow.

regards, tom lane

Bruce Momjian

bruce@momjian.us

over 19 years ago

In reply to: Tom Lane (#1)

Re: Better name/syntax for "online" index creation

Tom Lane <tgl@sss.pgh.pa.us> writes:

I'm fairly dissatisfied with the naming of Greg Stark's proposed new
feature for creating indexes without blocking writers of the table.
To my mind, "ONLINE" just doesn't convey any useful information ---
the existing CREATE INDEX functionality could already be said to be
"online", in the sense that you don't have to take down the database
to do it. I thought about "SHARED" but someone could probably raise
the same objection to it. Anyone have a better idea?

I know Oracle calls this "online" index builds. In fact it works similarly
with a single keyword "online" tacked on near the end of the create index
statement.

Anyone know what MSSQL or DB2 call it?

I'm also wondering about where in the command the keyword should go.
As submitted it's

[ WITH ( <replaceable class="PARAMETER">storage_parameter</replaceable> = <replaceable class="PARAMETER">value</replaceable> [, ... ] ) ]
[ TABLESPACE <replaceable class="parameter">tablespace</replaceable> ]
+ [ ONLINE]
[ WHERE <replaceable class="parameter">predicate</replaceable> ]

One thing I didn't like about this is that really all of these clauses should
be legal to put in in any order. I'm not sure that's doable with the WHERE
clause but the others ought to be possible to make an arbitrary list that can
be given in any order. But perhaps that's irrelevant if we don't go with
ONLINE at the end at all.

which seems a bit randomly chosen; what's more it creates a problem for
psql, which would have to parse nearly the entire command to discover
whether it's safe to execute inside a transaction block or not.

One thing to think about, what will the command to execute stored procedures
look like? Those will also need to be called from outside a transaction.

I keep coming back to this feeling that the server should be the one starting
the transaction, not psql. But then that could just be my experience with
Oracle. In Oracle you're never "outside a transaction". Transactions
implicitly start the first time you execute almost any statement.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

Bort, Paul

pbort@tmwsystems.com

over 19 years ago

In reply to: Bruce Momjian (#6)

Re: Better name/syntax for "online" index creation

Greg Stark asked:

I know Oracle calls this "online" index builds. In fact it
works similarly
with a single keyword "online" tacked on near the end of the
create index
statement.

Anyone know what MSSQL or DB2 call it?

I have to live with MSSQL at work, and I didn't remember anything like
this, so I looked up the syntax for CREATE INDEX
(http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlre
f/ts_tsqlcon_6lyk.asp), and I can't find anything that gives the user
control over this. The closest is this note in the remarks:

Backup and CREATE INDEX operations do not block each other. If a
backup is in progress, index is created in a fully logged mode, which
may require extra log space.

It sounds to me like they fall back to 'online' index creation if a
backup is in progress, but give the user no control over it. I also
looked in the settings and didn't see anything relevant.

Regards,
Paul Bort

Alvaro Herrera

alvherre@2ndquadrant.com

over 19 years ago

In reply to: Bruce Momjian (#6)

Re: Better name/syntax for "online" index creation

Greg Stark wrote:

Tom Lane <tgl@sss.pgh.pa.us> writes:

I'm also wondering about where in the command the keyword should go.
As submitted it's

[ WITH ( <replaceable class="PARAMETER">storage_parameter</replaceable> = <replaceable class="PARAMETER">value</replaceable> [, ... ] ) ]
[ TABLESPACE <replaceable class="parameter">tablespace</replaceable> ]
+ [ ONLINE]
[ WHERE <replaceable class="parameter">predicate</replaceable> ]

One thing I didn't like about this is that really all of these clauses should
be legal to put in in any order. I'm not sure that's doable with the WHERE
clause but the others ought to be possible to make an arbitrary list that can
be given in any order. But perhaps that's irrelevant if we don't go with
ONLINE at the end at all.

If you were going to propose something like the "INCLUDING CONSTRAINTS"
or "EXCLUDING CONSTRAINTS" stuff, which you can specify multiple times
and then the last one takes precedence, then I personally don't like it.
IMHO it's not nice that we accept that kind of things.

which seems a bit randomly chosen; what's more it creates a problem for
psql, which would have to parse nearly the entire command to discover
whether it's safe to execute inside a transaction block or not.

One thing to think about, what will the command to execute stored procedures
look like? Those will also need to be called from outside a transaction.

CALL presumably?

I keep coming back to this feeling that the server should be the one starting
the transaction, not psql. But then that could just be my experience with
Oracle. In Oracle you're never "outside a transaction". Transactions
implicitly start the first time you execute almost any statement.

Well, the problem is the AUTOCOMMIT mode. In that case it isn't the
server starting the transaction, but the client app. We already had the
server starting the transaction (back in 7.3 IIRC) and it was such a
problem that it was rolled back, and instead the autocommit logic was
put into the client apps.

In Postgres, just like in Oracle, you're never outside of a transaction
as well. The only difference is when does the transaction end -- in
Postgres, it's either right after the command, or when COMMIT/ROLLBACK
is issued, depending on whether the user (or the client app) issued
BEGIN or not. But you already knew that. So the only problem is that
psql should be figuring out if it should send a BEGIN or not, which
depends on the command being executed. In the normal CREATE INDEX case,
psql should send a BEGIN right before. But in the ONLINE CREATE INDEX
case, psql should figure out that it must *not* send the BEGIN.

The server logic does not change in either case. It only knows to
reject the ONLINE CREATE INDEX when inside a transaction block, because
it cannot possibly roll it back.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Peter Eisentraut

peter_e@gmx.net

over 19 years ago

In reply to: Bruce Momjian (#6)

Re: Better name/syntax for "online" index creation

Greg Stark wrote:

One thing to think about, what will the command to execute stored
procedures look like? Those will also need to be called from outside
a transaction.

Huh? Only if you invent your own stored-procedure theory or have a
hitherto unknown interpretation of the SQL standard.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#10

Alvaro Herrera

alvherre@2ndquadrant.com

over 19 years ago

In reply to: Peter Eisentraut (#9)

Re: Better name/syntax for "online" index creation

Peter Eisentraut wrote:

Greg Stark wrote:

One thing to think about, what will the command to execute stored
procedures look like? Those will also need to be called from outside
a transaction.

Huh? Only if you invent your own stored-procedure theory or have a
hitherto unknown interpretation of the SQL standard.

We've discussed a couple of times that stored procedures should have
enough abilities to start and commit transactions on their own, to be
useful to administration tasks like vacuuming. It's not something that
has been set in stone but it's something to consider.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#11

Bruce Momjian

bruce@momjian.us

over 19 years ago

In reply to: Rod Taylor (#3)

Re: Better name/syntax for "online" index creation

Rod Taylor wrote:

Sorry, hit send too quickly.

NOLOCK is kinda like NOWAIT, except implies that the command will not
take a strong lock instead of stating that it will not wait for one.

On Mon, 2006-07-24 at 11:20 -0400, Rod Taylor wrote:

SHARED CREATE INDEX ....

I like NOLOCK myself. ONLINE makes it sound like non-ONLINE index
creation has to happen when the server is down.

I don't like SHARED as the first word because we don't do this in any
other cases --- it makes it look like the command is SHARED, and I am
sure people will try putting SHARED in front of other commands like
UPDATE. "Hey, where is the SHARED manual page?"

Anyway, if you want psql to easily identify it, just return NOLOCK as
part of the command string returned:

test=> create index i on test(x);
CREATE INDEX NOLOCK

--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#12

Bruce Momjian

bruce@momjian.us

over 19 years ago

In reply to: Bruce Momjian (#11)

Re: Better name/syntax for "online" index creation

Bruce Momjian wrote:

Rod Taylor wrote:

Sorry, hit send too quickly.

NOLOCK is kinda like NOWAIT, except implies that the command will not
take a strong lock instead of stating that it will not wait for one.

On Mon, 2006-07-24 at 11:20 -0400, Rod Taylor wrote:

SHARED CREATE INDEX ....

I like NOLOCK myself. ONLINE makes it sound like non-ONLINE index
creation has to happen when the server is down.

I don't like SHARED as the first word because we don't do this in any
other cases --- it makes it look like the command is SHARED, and I am
sure people will try putting SHARED in front of other commands like
UPDATE. "Hey, where is the SHARED manual page?"

Anyway, if you want psql to easily identify it, just return NOLOCK as
part of the command string returned:

test=> create index i on test(x);
CREATE INDEX NOLOCK

Oh, psql needs to know before the command is sent? How do we handle it
now with CLUSTER? Whatever psql is trying to prevent doesn't seem to
warrant mucking up the logical order of the CREATE INDEX command.

--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#13

Alvaro Herrera

alvherre@2ndquadrant.com

over 19 years ago

In reply to: Bruce Momjian (#12)

Re: Better name/syntax for "online" index creation

Bruce Momjian wrote:

Anyway, if you want psql to easily identify it, just return NOLOCK as
part of the command string returned:

test=> create index i on test(x);
CREATE INDEX NOLOCK

Oh, psql needs to know before the command is sent? How do we handle it
now with CLUSTER?

We don't, which is exactly the problem. If I'm not mistaken, currently
psql in autocommit off mode, CLUSTER doesn't start a transaction block,
which is arguably wrong because some forms of CLUSTER (single-table) are
able to work within a transaction. But since not all of them are, then
we must act like they all were, because otherwise we would send spurious
error messages to the user.

Whatever psql is trying to prevent doesn't seem to
warrant mucking up the logical order of the CREATE INDEX command.

Personally I'm not sure if this is too serious an issue.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#14

Tom Lane

tgl@sss.pgh.pa.us

over 19 years ago

In reply to: Alvaro Herrera (#13)

Re: Better name/syntax for "online" index creation

Alvaro Herrera <alvherre@commandprompt.com> writes:

Bruce Momjian wrote:

Oh, psql needs to know before the command is sent? How do we handle it
now with CLUSTER?

We don't, which is exactly the problem. If I'm not mistaken, currently
psql in autocommit off mode, CLUSTER doesn't start a transaction block,
which is arguably wrong because some forms of CLUSTER (single-table) are
able to work within a transaction.

psql could actually tell these apart if it worked just a bit harder.
CLUSTER with no arguments is the one case, CLUSTER with anything after
it is the other. Not sure why we couldn't be bothered to get that
right in psql the first time :-(.

But to get back to the point at hand, I think that there should be some
equally obvious syntactic clue about what CREATE INDEX does --- and
burying an ONLINE keyword near the end of the command doesn't qualify.

regards, tom lane

#15

Bruce Momjian

bruce@momjian.us

over 19 years ago

In reply to: Tom Lane (#14)

Re: Better name/syntax for "online" index creation

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Bruce Momjian wrote:

Oh, psql needs to know before the command is sent? How do we handle it
now with CLUSTER?

We don't, which is exactly the problem. If I'm not mistaken, currently
psql in autocommit off mode, CLUSTER doesn't start a transaction block,
which is arguably wrong because some forms of CLUSTER (single-table) are
able to work within a transaction.

psql could actually tell these apart if it worked just a bit harder.
CLUSTER with no arguments is the one case, CLUSTER with anything after
it is the other. Not sure why we couldn't be bothered to get that
right in psql the first time :-(.

But to get back to the point at hand, I think that there should be some
equally obvious syntactic clue about what CREATE INDEX does --- and
burying an ONLINE keyword near the end of the command doesn't qualify.

OK, how about CREATE INDEX x NOLOCK ON tab ... Please don't use it as
the first word. I am afraid we would regret that.

Or maybe CREATE INDEX x ON tab NOLOCK.... Having the NOLOCK
associated with the table name makes sense.

--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#16

Tom Lane

tgl@sss.pgh.pa.us

over 19 years ago

In reply to: Bruce Momjian (#15)

Re: Better name/syntax for "online" index creation

Bruce Momjian <bruce@momjian.us> writes:

OK, how about CREATE INDEX x NOLOCK ON tab ... Please don't use it as
the first word. I am afraid we would regret that.
Or maybe CREATE INDEX x ON tab NOLOCK.... Having the NOLOCK
associated with the table name makes sense.

psql would be a lot happier if the NOLOCK keyword were in front of
either of the identifiers. Otherwise it's got to deal with quoting
and possible schema qualification.

I have no objection to either of these syntaxes as long as someone else
writes the psql patch, however ;-)

regards, tom lane

#17

Bort, Paul

pbort@tmwsystems.com

over 19 years ago

In reply to: Tom Lane (#14)

Re: Better name/syntax for "online" index creation

Tom Lane wrote:

psql could actually tell these apart if it worked just a bit harder.
CLUSTER with no arguments is the one case, CLUSTER with
anything after it is the other. Not sure why we couldn't be
bothered to get that right in psql the first time :-(.

Should this go on the to-do list?

Regards,
Paul Bort

#18

Peter Eisentraut

peter_e@gmx.net

over 19 years ago

In reply to: Bruce Momjian (#15)

Re: Better name/syntax for "online" index creation

Am Dienstag, 25. Juli 2006 06:55 schrieb Bruce Momjian:

Or maybe CREATE INDEX x ON tab NOLOCK.... Having the NOLOCK
associated with the table name makes sense.

Semantically, NOLOCK (or whatever word) seems most closely associated with
CREATE INDEX, so having it as third word would make sense, and it would be
quite easy to parse for psql.

Strictly speaking, however, it would have to be NOLOCKLY in that case. :-)

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#19

Csaba Nagy

nagy@ecircle-ag.com

over 19 years ago

In reply to: Peter Eisentraut (#18)

Re: Better name/syntax for "online" index creation

Semantically, NOLOCK (or whatever word) seems most closely associated with
CREATE INDEX, so having it as third word would make sense, and it would be
quite easy to parse for psql.

Strictly speaking, however, it would have to be NOLOCKLY in that case. :-)

In this case CREATE INDEX CONCURRENTLY ... sounds better to me, although
the whole feature sounds nice any way you will finally call it ;-)

Cheers,
Csaba.

#20

Bruce Momjian

bruce@momjian.us

over 19 years ago

In reply to: Peter Eisentraut (#18)

Re: Better name/syntax for "online" index creation

Peter Eisentraut wrote:

Am Dienstag, 25. Juli 2006 06:55 schrieb Bruce Momjian:

Or maybe CREATE INDEX x ON tab NOLOCK.... Having the NOLOCK
associated with the table name makes sense.

Semantically, NOLOCK (or whatever word) seems most closely associated with
CREATE INDEX, so having it as third word would make sense, and it would be
quite easy to parse for psql.

Strictly speaking, however, it would have to be NOLOCKLY in that case. :-)

We can use the same code we use in psql for tab completion to find a
keyword in the line.

--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#21

Tom Lane

tgl@sss.pgh.pa.us

over 19 years ago

In reply to: Csaba Nagy (#19)

#22