Re: Materialized views WIP patch

Started by Kevin Grittnerabout 13 years ago152 messageshackers
Jump to latest
#1Kevin Grittner
Kevin.Grittner@wicourts.gov

Here is a new version of the patch, with most issues discussed in
previous posts fixed.

I've been struggling with two areas:

- pg_dump sorting for MVs which depend on other MVs
- proper handling of the relisvalid flag for unlogged MVs after recovery

I've been hacking at the code in those areas without success;
what's here is the least broken form I have, but work is still
needed for these cases. Any other problems are news to me.

In addition, the docs need another pass, and there is an open
question about what is the right thing to use for TRUNCATE syntax.

-Kevin

Attachments:

matview-v2.patchtext/x-patch; charset=utf-8; name=matview-v2.patchDownload+2922-881
#2Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#1)

[resending with patch compressed, since original post didn't make
it through to the list]

Here is a new version of the patch, with most issues discussed in
previous posts fixed.

I've been struggling with two areas:

- pg_dump sorting for MVs which depend on other MVs
- proper handling of the relisvalid flag for unlogged MVs after recovery

I've been hacking at the code in those areas without success;
what's here is the least broken form I have, but work is still
needed for these cases. Any other problems are news to me.

In addition, the docs need another pass, and there is an open
question about what is the right thing to use for TRUNCATE syntax.

-Kevin

Attachments:

matview-v2.patch.gzapplication/x-gzip; charset=utf-8; name=matview-v2.patch.gzDownload
#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#2)

"Kevin Grittner" <kgrittn@mail.com> writes:

I've been struggling with two areas:
- pg_dump sorting for MVs which depend on other MVs

Surely that should fall out automatically given that the dependency is
properly expressed in pg_depend?

If you mean you're trying to get it to cope with circular dependencies
between MVs, it might take some work on the pg_dump side, but plain
ordering shouldn't require new code.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#3)

Tom Lane wrote:

"Kevin Grittner" <kgrittn@mail.com> writes:

I've been struggling with two areas:
- pg_dump sorting for MVs which depend on other MVs

Surely that should fall out automatically given that the
dependency is properly expressed in pg_depend?

If you mean you're trying to get it to cope with circular
dependencies between MVs, it might take some work on the pg_dump
side, but plain ordering shouldn't require new code.

The *definitions* sort properly, but what I'm trying to do is
define them WITH NO DATA and load data after all the COPY
statements for tables. If mva is referenced by mvb, the goal is the
REFRESH mva, build its indexes before running REFRESH for mvb and
building its indexes. To do things in any other order does't seem
to me to leave things after restore in the same state they were in
at the time of the dump.

So I should have been a little more verbose describing the problem:
pg_dump sorting of REFRESH and CREATE INDEX steps for MVs which
depend on other MVs.

Last night I found why my previous attempts had been failing -- I
was trying to build the dependencies at the wrong point in the dump
process, after the sorts had already been done.  Now that I've
spotted that fundamental flaw, I think I can get this out of the
way without too much more fanfare. I kept thinking I had something
wrong in the detail of my approach, while the problem was at a much
higher level.

Where I really need someone to hit me upside the head with a
clue-stick is the code I added to the bottom of RelationBuildDesc()
in relcache.c. The idea is that on first access to an unlogged MV,
to detect that the heap has been replaced by the init fork, set
relisvalid to false, and make the heap look normal again. I
couldn't see any way to do that which wasn't a kludge, and I can't
figure out how to deal with relcache properly in implementing that
kludge. Either a tip about the right way to work the kludge, or a
suggestion for a less kludgy alternative would be welcome.

-Kevin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Thom Brown
thom@linux.com
In reply to: Kevin Grittner (#1)

On 16 January 2013 05:40, Kevin Grittner <kgrittn@mail.com> wrote:

Here is a new version of the patch, with most issues discussed in
previous posts fixed.

I've been struggling with two areas:

- pg_dump sorting for MVs which depend on other MVs
- proper handling of the relisvalid flag for unlogged MVs after recovery

Some weirdness:

postgres=# CREATE VIEW v_test2 AS SELECT 1 moo;
CREATE VIEW
postgres=# CREATE MATERIALIZED VIEW mv_test2 AS SELECT moo, 2*moo FROM
v_test2 UNION ALL SELECT moo, 3*moo FROM v_test2;
SELECT 2
postgres=# \d+ mv_test2
Materialized view "public.mv_test2"
Column | Type | Modifiers | Storage | Stats target | Description
----------+---------+-----------+---------+--------------+-------------
moo | integer | | plain | |
?column? | integer | | plain | |
View definition:
SELECT "*SELECT* 1".moo, "*SELECT* 1"."?column?";
Has OIDs: no

The "weirdness" I refer you to is the view definition. This does not occur
with a straightforward UNION.

This does not occur with a regular view:

postgres=# CREATE VIEW v_test3 AS SELECT moo, 2*moo FROM v_test2 UNION ALL
SELECT moo, 3*moo FROM v_test2;
CREATE VIEW
postgres=# \d+ v_test3
View "public.v_test3"
Column | Type | Modifiers | Storage | Description
----------+---------+-----------+---------+-------------
moo | integer | | plain |
?column? | integer | | plain |
View definition:
SELECT v_test2.moo, 2 * v_test2.moo
FROM v_test2
UNION ALL
SELECT v_test2.moo, 3 * v_test2.moo
FROM v_test2;

--
Thom

#6Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Thom Brown (#5)

Thom Brown wrote:

Some weirdness:

postgres=# CREATE VIEW v_test2 AS SELECT 1 moo;
CREATE VIEW
postgres=# CREATE MATERIALIZED VIEW mv_test2 AS SELECT moo, 2*moo FROM
v_test2 UNION ALL SELECT moo, 3*moo FROM v_test2;
SELECT 2
postgres=# \d+ mv_test2
 Materialized view "public.mv_test2"
 Column | Type | Modifiers | Storage | Stats target | Description
----------+---------+-----------+---------+--------------+-------------
 moo | integer | | plain | |
 ?column? | integer | | plain | |
View definition:
 SELECT "*SELECT* 1".moo, "*SELECT* 1"."?column?";

You are very good at coming up with these, Thom!

Will investigate.

Can you confirm that *selecting* from the MV works as you would
expect; it is just the presentation in \d+ that's a problem?

Thanks,

-Kevin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Thom Brown
thom@linux.com
In reply to: Kevin Grittner (#6)

On 16 January 2013 17:20, Kevin Grittner <kgrittn@mail.com> wrote:

Thom Brown wrote:

Some weirdness:

postgres=# CREATE VIEW v_test2 AS SELECT 1 moo;
CREATE VIEW
postgres=# CREATE MATERIALIZED VIEW mv_test2 AS SELECT moo, 2*moo FROM
v_test2 UNION ALL SELECT moo, 3*moo FROM v_test2;
SELECT 2
postgres=# \d+ mv_test2
Materialized view "public.mv_test2"
Column | Type | Modifiers | Storage | Stats target | Description
----------+---------+-----------+---------+--------------+-------------
moo | integer | | plain | |
?column? | integer | | plain | |
View definition:
SELECT "*SELECT* 1".moo, "*SELECT* 1"."?column?";

You are very good at coming up with these, Thom!

Will investigate.

Can you confirm that *selecting* from the MV works as you would
expect; it is just the presentation in \d+ that's a problem?

Yes, nothing wrong with using the MV, or refreshing it:

postgres=# TABLE mv_test2;
moo | ?column?
-----+----------
1 | 2
1 | 3
(2 rows)

postgres=# SELECT * FROM mv_test2;
moo | ?column?
-----+----------
1 | 2
1 | 3
(2 rows)

postgres=# REFRESH MATERIALIZED VIEW mv_test2;
REFRESH MATERIALIZED VIEW

But a pg_dump of the MV has the same issue as the view definition:

--
-- Name: mv_test2; Type: MATERIALIZED VIEW; Schema: public; Owner: thom;
Tablespace:
--

CREATE MATERIALIZED VIEW mv_test2 (
moo,
"?column?"
) AS
SELECT "*SELECT* 1".moo, "*SELECT* 1"."?column?"
WITH NO DATA;

--
Thom

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#4)

"Kevin Grittner" <kgrittn@mail.com> writes:

Tom Lane wrote:

Surely that should fall out automatically given that the
dependency is properly expressed in pg_depend?

The *definitions* sort properly, but what I'm trying to do is
define them WITH NO DATA and load data after all the COPY
statements for tables. If mva is referenced by mvb, the goal is the
REFRESH mva, build its indexes before running REFRESH for mvb and
building its indexes. To do things in any other order does't seem
to me to leave things after restore in the same state they were in
at the time of the dump.

Ah. Can't you treat this using the same pg_dump infrastructure as
for the data for an ordinary table? The dependencies made for the
TableDataInfo object might be a bit different, but after that it
seems like the sort logic ought to be happy.

Where I really need someone to hit me upside the head with a
clue-stick is the code I added to the bottom of RelationBuildDesc()
in relcache.c. The idea is that on first access to an unlogged MV,
to detect that the heap has been replaced by the init fork, set
relisvalid to false, and make the heap look normal again.

Hmm. I agree that relcache.c has absolutely no business doing that,
but not sure what else to do instead. Seems like it might be better
done at first touch of the MV in the parser, rewriter, or planner ---
but the fact that I can't immediately decide which of those is right
makes me feel that it's still too squishy.

I'm also wondering about locking issues there. Obviously you don't
want more than one backend trying to rebuild the MV.

Do we really need unlogged MVs in the first iteration? Seems like
that's adding a whole bunch of new issues, when you have quite enough
already without that.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#8)

Do we really need unlogged MVs in the first iteration? Seems like
that's adding a whole bunch of new issues, when you have quite enough
already without that.

While I think there is strong user demand for unlogged MVs, if we can
get MVs without unlogged ones for 9.3, I say go for that. We'll add
unlogged in 9.4.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#8)

On Wed, Jan 16, 2013 at 1:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Where I really need someone to hit me upside the head with a
clue-stick is the code I added to the bottom of RelationBuildDesc()
in relcache.c. The idea is that on first access to an unlogged MV,
to detect that the heap has been replaced by the init fork, set
relisvalid to false, and make the heap look normal again.

Hmm. I agree that relcache.c has absolutely no business doing that,
but not sure what else to do instead. Seems like it might be better
done at first touch of the MV in the parser, rewriter, or planner ---
but the fact that I can't immediately decide which of those is right
makes me feel that it's still too squishy.

I think we shouldn't be doing that at all. The whole business of
transferring the relation-is-invalid information from the relation to
a pg_class flag seems like a Rube Goldberg device to me. I'm still
not convinced that we should have a relation-is-invalid flag at all,
but can we at least not have two?

It seems perfectly adequate to detect that the MV is invalid when we
actually try to execute a plan - that is, when we first access the
heap or one of its indexes. So the bit can just live in the
file-on-disk, and there's no need to have a second copy of it in
pg_class.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Thom Brown
thom@linux.com
In reply to: Thom Brown (#7)

On 16 January 2013 17:25, Thom Brown <thom@linux.com> wrote:

On 16 January 2013 17:20, Kevin Grittner <kgrittn@mail.com> wrote:

Thom Brown wrote:

Some weirdness:

postgres=# CREATE VIEW v_test2 AS SELECT 1 moo;
CREATE VIEW
postgres=# CREATE MATERIALIZED VIEW mv_test2 AS SELECT moo, 2*moo FROM
v_test2 UNION ALL SELECT moo, 3*moo FROM v_test2;
SELECT 2
postgres=# \d+ mv_test2
Materialized view "public.mv_test2"
Column | Type | Modifiers | Storage | Stats target | Description
----------+---------+-----------+---------+--------------+-------------
moo | integer | | plain | |
?column? | integer | | plain | |
View definition:
SELECT "*SELECT* 1".moo, "*SELECT* 1"."?column?";

You are very good at coming up with these, Thom!

Will investigate.

Can you confirm that *selecting* from the MV works as you would
expect; it is just the presentation in \d+ that's a problem?

Yes, nothing wrong with using the MV, or refreshing it:

postgres=# TABLE mv_test2;
moo | ?column?
-----+----------
1 | 2
1 | 3
(2 rows)

postgres=# SELECT * FROM mv_test2;
moo | ?column?
-----+----------
1 | 2
1 | 3
(2 rows)

postgres=# REFRESH MATERIALIZED VIEW mv_test2;
REFRESH MATERIALIZED VIEW

But a pg_dump of the MV has the same issue as the view definition:

--
-- Name: mv_test2; Type: MATERIALIZED VIEW; Schema: public; Owner: thom;
Tablespace:
--

CREATE MATERIALIZED VIEW mv_test2 (
moo,
"?column?"
) AS
SELECT "*SELECT* 1".moo, "*SELECT* 1"."?column?"
WITH NO DATA;

A separate issue is with psql tab-completion:

postgres=# COMMENT ON MATERIALIZED VIEW ^IIS

This should be offering MV names instead of prematurely providing the "IS"
keyword.

--
Thom

#12Simon Riggs
simon@2ndQuadrant.com
In reply to: Kevin Grittner (#1)

On 16 January 2013 05:40, Kevin Grittner <kgrittn@mail.com> wrote:

Here is a new version of the patch, with most issues discussed in
previous posts fixed.

Looks good.

The patch implements one kind of MV. In the future, we hope to have
other features or other kinds of MV alongside this:
* Snapshot MV - built once at start and then refreshed by explicit command only
* Snapshot MV with fast refresh
* Maintained MV (lazy) - changes trickle continuously into lazy MVs
* Maintained MV (transactional) - changes applied as part of write transactions
and or others

So I think we should agree now some aspects of those other options so
we can decide syntax. Otherwise we'll be left in the situation that
what we implement in 9.3 becomes the default for all time and/or we
have difficulties adding things later. e.g.
REFRESH ON COMMAND

Also, since there is no optimizer linkage between these MVs and the
tables they cover, I think we need to have that explicitly as a
command option, e.g.
DISABLE OPTIMIZATION

That way in the future we can implement "ENABLE OPTIMIZATION" mode and
REFRESH TRANSACTIONAL mode as separate items.

So all I am requesting is that we add additional syntax now, so that
future additional features are clear.

Please suggest syntax, not wedded to those... and we may want to use
more compatible syntax also.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Thom Brown
thom@linux.com
In reply to: Thom Brown (#11)

On 17 January 2013 16:03, Thom Brown <thom@linux.com> wrote:

On 16 January 2013 17:25, Thom Brown <thom@linux.com> wrote:

On 16 January 2013 17:20, Kevin Grittner <kgrittn@mail.com> wrote:

Thom Brown wrote:

Some weirdness:

postgres=# CREATE VIEW v_test2 AS SELECT 1 moo;
CREATE VIEW
postgres=# CREATE MATERIALIZED VIEW mv_test2 AS SELECT moo, 2*moo FROM
v_test2 UNION ALL SELECT moo, 3*moo FROM v_test2;
SELECT 2
postgres=# \d+ mv_test2
Materialized view "public.mv_test2"
Column | Type | Modifiers | Storage | Stats target | Description
----------+---------+-----------+---------+--------------+-------------
moo | integer | | plain | |
?column? | integer | | plain | |
View definition:
SELECT "*SELECT* 1".moo, "*SELECT* 1"."?column?";

You are very good at coming up with these, Thom!

Will investigate.

Can you confirm that *selecting* from the MV works as you would
expect; it is just the presentation in \d+ that's a problem?

Yes, nothing wrong with using the MV, or refreshing it:

postgres=# TABLE mv_test2;
moo | ?column?
-----+----------
1 | 2
1 | 3
(2 rows)

postgres=# SELECT * FROM mv_test2;
moo | ?column?
-----+----------
1 | 2
1 | 3
(2 rows)

postgres=# REFRESH MATERIALIZED VIEW mv_test2;
REFRESH MATERIALIZED VIEW

But a pg_dump of the MV has the same issue as the view definition:

--
-- Name: mv_test2; Type: MATERIALIZED VIEW; Schema: public; Owner: thom;
Tablespace:
--

CREATE MATERIALIZED VIEW mv_test2 (
moo,
"?column?"
) AS
SELECT "*SELECT* 1".moo, "*SELECT* 1"."?column?"
WITH NO DATA;

A separate issue is with psql tab-completion:

postgres=# COMMENT ON MATERIALIZED VIEW ^IIS

This should be offering MV names instead of prematurely providing the "IS"
keyword.

Also in doc/src/sgml/ref/alter_materialized_view.sgml:

s/materailized/materialized/

In src/backend/executor/execMain.c:

s/referrenced/referenced/

--
Thom

#14Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#10)

On Thu, Jan 17, 2013 at 07:54:55AM -0500, Robert Haas wrote:

On Wed, Jan 16, 2013 at 1:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Where I really need someone to hit me upside the head with a
clue-stick is the code I added to the bottom of RelationBuildDesc()
in relcache.c. The idea is that on first access to an unlogged MV,
to detect that the heap has been replaced by the init fork, set
relisvalid to false, and make the heap look normal again.

Hmm. I agree that relcache.c has absolutely no business doing that,
but not sure what else to do instead. Seems like it might be better
done at first touch of the MV in the parser, rewriter, or planner ---
but the fact that I can't immediately decide which of those is right
makes me feel that it's still too squishy.

I think we shouldn't be doing that at all. The whole business of
transferring the relation-is-invalid information from the relation to
a pg_class flag seems like a Rube Goldberg device to me. I'm still
not convinced that we should have a relation-is-invalid flag at all,
but can we at least not have two?

It seems perfectly adequate to detect that the MV is invalid when we
actually try to execute a plan - that is, when we first access the
heap or one of its indexes. So the bit can just live in the
file-on-disk, and there's no need to have a second copy of it in
pg_class.

Like Kevin, I want a way to distinguish unpopulated MVs from MVs that
genuinely yielded the empty set at last refresh. I agree that there's no
particular need to store that fact in pg_class, and I would much prefer only
storing it one way in any case. A user-visible disadvantage of the current
implementation is that relisvalid remains stale until something opens the rel.
That's fine for the system itself, but it can deceive user-initiated catalog
queries. Imagine a check_postgres action that looks for invalid MVs to
complain about. It couldn't just scan pg_class; it would need to first do
something that opens every MV.

I suggest the following:

1. Let an invalid MV have a zero-length heap. Distinguish a valid, empty MV
by giving it a page with no tuples. This entails VACUUM[1]For the time being, it's unfortunate to VACUUM materialized views at all; they only ever bear frozen tuples. not truncating
MVs below one page and the refresh operation, where necessary, extending
the relation from zero pages to one.
2. Remove pg_class.relisvalid.
3. Add a bool field to RelationData. The word "valid" is used in that context
to refer to the validity of the structure itself, so perhaps call the new
field rd_scannable. RelationIsFlaggedAsValid() can become a macro;
consider changing its name for consistency with the field name.
4. During relcache build, set the field to "RelationGetNumberBlocks(rel) != 0"
for MVs, fixed "true" for everyone else. Any operation that changes the
field must, and probably would anyway, instigate a relcache invalidation.
5. Expose a database function, say pg_relation_scannable(), for querying the
current state of a relation. This supports user-level monitoring.

Does that seem reasonable? One semantic difference to keep in mind is that
unlogged MVs will be considered invalid on the standby while valid on the
master. That's essentially an accurate report, so I won't mind it.

For the benefit of the archives, I note that we almost need not truncate an
unlogged materialized view during crash recovery. MVs are refreshed in a
VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the MV's
pg_class to that relfilenode. When a crash occurs with no refresh in flight,
the latest refresh had been safely synced. When a crash cuts short a refresh,
the pg_class update will not stick, and the durability of the old heap is not
in doubt. However, non-btree index builds don't have the same property; we
would need to force an immediate sync of the indexes to be safe here. It
would remain necessary to truncate unlogged MVs when recovering a base backup,
which may contain a partially-written refresh that did eventually commit.
Future MV variants that modify the MV in place would also need the usual
truncate on crash.

I'm going to follow this with a review covering a broader range of topics.

Thanks,
nm

[1]: For the time being, it's unfortunate to VACUUM materialized views at all; they only ever bear frozen tuples.
they only ever bear frozen tuples.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Noah Misch
noah@leadboat.com
In reply to: Kevin Grittner (#1)

Hi Kevin,

The patch conflicts with git master; I tested against master@{2013-01-20}.

On Wed, Jan 16, 2013 at 12:40:55AM -0500, Kevin Grittner wrote:

I've been struggling with two areas:

- pg_dump sorting for MVs which depend on other MVs

From your later messages, I understand that you have a way forward on this.

- proper handling of the relisvalid flag for unlogged MVs after recovery

I have discussed this in a separate email. While reading the patch to assess
that topic, I found a few more things:

*** a/contrib/pg_upgrade/version_old_8_3.c
--- b/contrib/pg_upgrade/version_old_8_3.c
***************
*** 145,151 **** old_8_3_check_for_tsquery_usage(ClusterInfo *cluster)
"FROM	pg_catalog.pg_class c, "
"		pg_catalog.pg_namespace n, "
"		pg_catalog.pg_attribute a "
! 								"WHERE	c.relkind = 'r' AND "
"		c.oid = a.attrelid AND "
"		NOT a.attisdropped AND "
"		a.atttypid = 'pg_catalog.tsquery'::pg_catalog.regtype AND "
--- 145,151 ----
"FROM	pg_catalog.pg_class c, "
"		pg_catalog.pg_namespace n, "
"		pg_catalog.pg_attribute a "
! 								"WHERE	c.relkind in ('r', 'm') AND "
"		c.oid = a.attrelid AND "
"		NOT a.attisdropped AND "
"		a.atttypid = 'pg_catalog.tsquery'::pg_catalog.regtype AND "

PostgreSQL 8.3 clusters won't contain materialized views, so it doesn't really
matter whether this change happens or not. I suggest adding a comment,
whether or not you keep the code change.

*** a/contrib/sepgsql/sepgsql.h
--- b/contrib/sepgsql/sepgsql.h
***************
*** 32,37 ****
--- 32,39 ----
/*
* Internally used code of object classes
+  *
+  * NOTE: Materialized views are treated as tables for now.

This smells like a bypass of mandatory access control. Unless you've
determined that this is correct within the sepgsql security model, I suggest
starting with a draconian policy, like simply crippling MVs. Even if you have
determined that, separating out the nontrivial sepgsql support might be good.
The set of ideal reviewers is quite different.

*/
#define SEPG_CLASS_PROCESS			0
#define SEPG_CLASS_FILE				1
*** a/contrib/vacuumlo/vacuumlo.c
--- b/contrib/vacuumlo/vacuumlo.c
***************
*** 209,215 **** vacuumlo(const char *database, const struct _param * param)
strcat(buf, "      AND a.atttypid = t.oid ");
strcat(buf, "      AND c.relnamespace = s.oid ");
strcat(buf, "      AND t.typname in ('oid', 'lo') ");
! 	strcat(buf, "      AND c.relkind = 'r'");
strcat(buf, "      AND s.nspname !~ '^pg_'");
res = PQexec(conn, buf);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
--- 209,215 ----
strcat(buf, "      AND a.atttypid = t.oid ");
strcat(buf, "      AND c.relnamespace = s.oid ");
strcat(buf, "      AND t.typname in ('oid', 'lo') ");
! 	strcat(buf, "      AND c.relkind in ('r', 'm')");

It concerns me slightly that older vacuumlo could silently remove large
objects still referenced by MVs. Only slightly, though, because the next MV
refresh would remove those references anyway. Is there anything we should do
to help that situation? If nothing else, perhaps backpatch this patch hunk.

+    <varlistentry>
+     <term><literal>WITH OIDS</></term>
+     <term><literal>WITHOUT OIDS</></term>
+     <listitem>
+      <para>
+       These are obsolescent syntaxes equivalent to <literal>WITH (OIDS)</>
+       and <literal>WITH (OIDS=FALSE)</>, respectively.  If you wish to give
+       both an <literal>OIDS</> setting and storage parameters, you must use
+       the <literal>WITH ( ... )</> syntax; see above.
+      </para>
+     </listitem>
+    </varlistentry>

Let's not support OIDs on MVs. They'll be regenerated on every refresh.

***************
*** 336,342 **** ExplainOneQuery(Query *query, IntoClause *into, ExplainState *es,
*/
void
ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
! 				  const char *queryString, ParamListInfo params)
{
if (utilityStmt == NULL)
return;
--- 338,345 ----
*/
void
ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
! 				  const char *queryString, DestReceiver *dest,
! 				  ParamListInfo params)
{
if (utilityStmt == NULL)
return;
***************
*** 349,361 **** ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* contained parsetree another time, but let's be safe.
*/
CreateTableAsStmt *ctas = (CreateTableAsStmt *) utilityStmt;
! 		List	   *rewritten;
Assert(IsA(ctas->query, Query));
! 		rewritten = QueryRewrite((Query *) copyObject(ctas->query));
! 		Assert(list_length(rewritten) == 1);
! 		ExplainOneQuery((Query *) linitial(rewritten), ctas->into, es,
! 						queryString, params);
}
else if (IsA(utilityStmt, ExecuteStmt))
ExplainExecuteQuery((ExecuteStmt *) utilityStmt, into, es,
--- 352,366 ----
* contained parsetree another time, but let's be safe.
*/
CreateTableAsStmt *ctas = (CreateTableAsStmt *) utilityStmt;
! 		Query	   *query = (Query *) ctas->query;
! 
! 		dest = CreateIntoRelDestReceiver(into);

Assert(IsA(ctas->query, Query));
!
! query = SetupForCreateTableAs(query, ctas->into, queryString, params, dest);
!
! ExplainOneQuery(query, ctas->into, es, queryString, dest, params);
}
else if (IsA(utilityStmt, ExecuteStmt))
ExplainExecuteQuery((ExecuteStmt *) utilityStmt, into, es,

If I'm reading this right, you always overwrite the passed-in dest without
looking at it. What's the intent here?

+ 	/*
+ 	 * Kludge here to allow refresh of a materialized view which is invalid
+ 	 * (that is, it was created WITH NO DATA or was TRUNCATED). We flag the
+ 	 * first two RangeTblEntry list elements, which were added to the front
+ 	 * of the rewritten Query to keep the rules system happy, with the
+ 	 * isResultRel flag to indicate that it is OK if they are flagged as
+ 	 * invalid.
+ 	 */
+ 	rtable = dataQuery->rtable;
+ 	((RangeTblEntry *) linitial(rtable))->isResultRel = true;
+ 	((RangeTblEntry *) lsecond(rtable))->isResultRel = true;

Is it safe to assume that the first two RTEs are the correct ones to flag?

+ 	/*
+ 	 * Swap the physical files of the target and transient tables, then
+ 	 * rebuild the target's indexes and throw away the transient table.
+ 	 */
+ 	finish_heap_swap(matviewOid, OIDNewHeap, false, false, false, RecentXmin);

The check_constraints argument should be "true", because the refresh could
have invalidated a UNIQUE index.

***************
*** 3049,3055 **** ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
break;
case AT_ClusterOn:		/* CLUSTER ON */
case AT_DropCluster:	/* SET WITHOUT CLUSTER */
! 			ATSimplePermissions(rel, ATT_TABLE);
/* These commands never recurse */
/* No command-specific prep needed */
pass = AT_PASS_MISC;
--- 3104,3110 ----
break;
case AT_ClusterOn:		/* CLUSTER ON */
case AT_DropCluster:	/* SET WITHOUT CLUSTER */
! 			ATSimplePermissions(rel, ATT_TABLE | ATT_MATVIEW);

If the user desires an actually-clustered MV, he must re-CLUSTER it after each
refresh. That deserves a documentation mention.

***************
*** 724,729 **** InitPlan(QueryDesc *queryDesc, int eflags)
--- 765,775 ----
ExecCheckRTPerms(rangeTable, true);

/*
+ * Ensure that all referrenced relations are flagged as valid.

Typo.

+ */
+ ExecCheckRelationsValid(rangeTable);

I believe this ought to happen after the executor lock acquisitions, perhaps
right in ExecOpenScanRelation(). Since you'll then have an open Relation,
RelationIsFlaggedAsValid() can use the relcache.

***************
*** 1591,1596 **** fireRIRrules(Query *parsetree, List *activeRIRs, bool forUpdatePushedDown)
--- 1592,1607 ----
rel = heap_open(rte->relid, NoLock);
/*
+ 		 * Skip materialized view expansion when resultRelation is set.
+ 		 */
+ 		if (rel->rd_rel->relkind == RELKIND_MATVIEW &&
+ 			rel->rd_rel->relisvalid)
+ 		{
+ 			heap_close(rel, NoLock);
+ 			break;
+ 		}

Would you elaborate on this?

+ 	/* Strip off the trailing semicolon so that other things may follow. */
+ 	appendBinaryPQExpBuffer(result, PQgetvalue(res, 0, 0), len - 1);

I suggest verifying that the last character is indeed a semicolon.

/*
+  * dumpMatViewIndex
+  *	  write out to fout a user-defined index
+  */
+ static void
+ dumpMatViewIndex(Archive *fout, IndxInfo *indxinfo)

This is so similar to dumpIndex(); can we avoid this level of duplication?

*** /dev/null
--- b/src/test/regress/sql/matview.sql
+ -- test diemv when the mv does exist
+ DROP MATERIALIZED VIEW IF EXISTS tum;
+ 
+ -- make sure that dependencies are reported properly when they block the drop
+ DROP TABLE t;
+ 
+ -- make sure dependencies are dropped and reported
+ DROP TABLE t CASCADE;

Please retain an interesting sample of materialized views in the regression
database. Among other benefits, the pg_upgrade test suite exercises pg_dump
and pg_upgrade for all object types retained in the regression database.

The regression tests should probably include a few other wrinkles, like an
index on a MV.

Creating a RULE on an MV succeeds, but refreshing the view then fails:

[local] test=# create rule mvrule as on insert to mymv where 1 = 0 do also select 1;
CREATE RULE
[local] test=# REFRESH MATERIALIZED VIEW mymv;
ERROR: materialized view "mymv" has too many rules

The documentation is a good start. I would expect a brief introduction in
Tutorial -> Advanced Features and possibly a deeper discussion under The SQL
Language. I suggest updating Explicit Locking to mention the new commands;
users will be interested in the lock level of a refresh.

You have chosen to make pg_dump preserve the valid-or-invalid state of each
MV. That seems reasonable, though I'm slightly concerned about the case of a
dump taken from a standby.

We support ALTER TABLE against regular views for historical reasons. When we
added foreign tables, we did not extend that permissiveness; one can only use
ALTER FOREIGN TABLE on foreign tables. Please do the same for materialized
views. See RangeVarCallbackForAlterRelation(). Note that "ALTER TABLE
... RENAME colname TO newname" and "ALTER TABLE ... RENAME CONSTRAINT" are
currently supported for MVs by ALTER TABLE but not by ALTER MATERIALIZED VIEW.

There's no documented support for table constraints on MVs, but UNIQUE
constraints are permitted:

[local] test=# alter materialized view mymv add unique (c);
ALTER MATERIALIZED VIEW
[local] test=# alter materialized view mymv add check (c > 0);
ERROR: "mymv" is not a table
[local] test=# alter materialized view mymv add primary key (c);
ERROR: "mymv" is not a table or foreign table

Some of the ALTER TABLE variants would make plenty of sense for MVs:

ALTER [ COLUMN ] column_name SET STATISTICS integer
ALTER [ COLUMN ] column_name SET ( attribute_option = value [, ... ] )
ALTER [ COLUMN ] column_name RESET ( attribute_option [, ... ] )
ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }

It wouldn't be a problem to skip those for the first patch, though.
Conversely, this syntax is accepted:

ALTER MATERIALIZED VIEW [ IF EXISTS ] name SET ( view_option_name [= view_option_value] [, ... ] )

But there are no available options. The only option accepted for regular
views, security_barrier, is rejected. MVs always have security_barrier
semantics, in any event.

Overall, I recommend auditing all the ALTER TABLE and ALTER VIEW options to
determine which ones make sense for MVs. For each one in the sensible set,
either allow it or add a comment indicating that it could reasonably be
allowed in the future. For each one outside the set, forbid it. Verify that
the documentation, the results of your evaluation, and the actual allowed
operations are all consistent.

Thanks,
nm

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Noah Misch (#15)

Thanks for looking at this!

Noah Misch wrote:

For the benefit of the archives, I note that we almost need not truncate an
unlogged materialized view during crash recovery. MVs are refreshed in a
VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the MV's
pg_class to that relfilenode. When a crash occurs with no refresh in flight,
the latest refresh had been safely synced. When a crash cuts short a refresh,
the pg_class update will not stick, and the durability of the old heap is not
in doubt. However, non-btree index builds don't have the same property; we
would need to force an immediate sync of the indexes to be safe here. It
would remain necessary to truncate unlogged MVs when recovering a base backup,
which may contain a partially-written refresh that did eventually commit.
Future MV variants that modify the MV in place would also need the usual
truncate on crash.

Hmm. That's a very good observation. Perhaps the issue can be
punted to a future release where we start adding more incremental
updates to them. I'll think on that, but on the face of it, it
sounds like the best choice.

I'm going to follow this with a review covering a broader range
of topics.

I'll need time to digest the rest of it. As you note, recent
commits conflict with the last patch. Please look at the github
repo where I've been working on this. I'll post an updated patch
later today.

https://github.com/kgrittn/postgres/tree/matview

You might want to ignore the interim work on detecting the new
pg_dump dependencies through walking the internal structures. I
decided that was heading in a direction which might be
unnecessarily fragile and slow; so I tried writing it as a query
against the system tables. I'm pretty happy with the results.
Here's the query:

with recursive w as
(
select
   d1.objid,
   d1.objid as wrkid,
   d2.refobjid,
   c2.relkind as refrelkind
 from pg_depend d1
 join pg_class c1 on c1.oid = d1.objid
                 and c1.relkind = 'm'
                 and c1.relisvalid
 join pg_rewrite r1 on r1.ev_class = d1.objid
 join pg_depend d2 on d2.classid = 'pg_rewrite'::regclass
                  and d2.objid = r1.oid
                  and d2.refobjid <> d1.objid
 join pg_class c2 on c2.oid = d2.refobjid
                 and c2.relkind in ('m','v')
                 and c2.relisvalid
 where d1.classid = 'pg_class'::regclass
union
select
   w.objid,
   w.refobjid as wrkid,
   d3.refobjid,
   c3.relkind as refrelkind
 from w
 join pg_rewrite r3 on r3.ev_class = w.refobjid
 join pg_depend d3 on d3.classid = 'pg_rewrite'::regclass
                  and d3.objid = r3.oid
                  and d3.refobjid <> w.refobjid
 join pg_class c3 on c3.oid = d3.refobjid
                 and c3.relkind in ('m','v')
                 and c3.relisvalid
 where w.refrelkind <> 'm'
),
x as
(
select objid::regclass, refobjid::regclass from w
 where refrelkind = 'm'
)
select 'm'::text as type, x.objid, x.refobjid from x
union all
select
   'i'::text as type,
   x.objid,
   i.indexrelid as refobjid
 from x
 join pg_index i on i.indrelid = x.refobjid
                and i.indisvalid
;

If we bail on having pg_class.relisvalid, then it will obviously
need adjustment.

-Kevin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Noah Misch
noah@leadboat.com
In reply to: Kevin Grittner (#16)

On Thu, Jan 24, 2013 at 01:29:10PM -0500, Kevin Grittner wrote:

Noah Misch wrote:

For the benefit of the archives, I note that we almost need not truncate an
unlogged materialized view during crash recovery. MVs are refreshed in a
VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the MV's
pg_class to that relfilenode. When a crash occurs with no refresh in flight,
the latest refresh had been safely synced. When a crash cuts short a refresh,
the pg_class update will not stick, and the durability of the old heap is not
in doubt. However, non-btree index builds don't have the same property; we
would need to force an immediate sync of the indexes to be safe here. It
would remain necessary to truncate unlogged MVs when recovering a base backup,
which may contain a partially-written refresh that did eventually commit.
Future MV variants that modify the MV in place would also need the usual
truncate on crash.

Hmm. That's a very good observation. Perhaps the issue can be
punted to a future release where we start adding more incremental
updates to them. I'll think on that, but on the face of it, it
sounds like the best choice.

That situation is challenging for the same reason pg_class.relisvalid was hard
to implement for unlogged relations. The startup process doesn't know the
relkind of the unlogged-relation relfilenodes it cleans. If you can work
through all that, it's certainly a nice endpoint to not lose unlogged snapshot
MVs on crash. But I intended the first half of my message as the
recommendation and the above as a wish for the future.

You might want to ignore the interim work on detecting the new
pg_dump dependencies through walking the internal structures. I
decided that was heading in a direction which might be
unnecessarily fragile and slow; so I tried writing it as a query
against the system tables. I'm pretty happy with the results.
Here's the query:

with recursive w as

[snip]

Why is the dependency problem of ordering MV refreshes and MV index builds so
different from existing pg_dump dependency problems?

If we bail on having pg_class.relisvalid, then it will obviously
need adjustment.

Even if we don't have the column, we can have the fact of an MV's validity
SQL-visible in some other way.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Noah Misch (#17)

Noah Misch wrote:

On Thu, Jan 24, 2013 at 01:29:10PM -0500, Kevin Grittner wrote:

Noah Misch wrote:

For the benefit of the archives, I note that we almost need not truncate an
unlogged materialized view during crash recovery. MVs are refreshed in a
VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the MV's
pg_class to that relfilenode. When a crash occurs with no refresh in flight,
the latest refresh had been safely synced. When a crash cuts short a refresh,
the pg_class update will not stick, and the durability of the old heap is not
in doubt. However, non-btree index builds don't have the same property; we
would need to force an immediate sync of the indexes to be safe here. It
would remain necessary to truncate unlogged MVs when recovering a base backup,
which may contain a partially-written refresh that did eventually commit.
Future MV variants that modify the MV in place would also need the usual
truncate on crash.

Hmm. That's a very good observation. Perhaps the issue can be
punted to a future release where we start adding more incremental
updates to them. I'll think on that, but on the face of it, it
sounds like the best choice.

That situation is challenging for the same reason pg_class.relisvalid was hard
to implement for unlogged relations. The startup process doesn't know the
relkind of the unlogged-relation relfilenodes it cleans. If you can work
through all that, it's certainly a nice endpoint to not lose unlogged snapshot
MVs on crash. But I intended the first half of my message as the
recommendation and the above as a wish for the future.

Well, if I just don't create an init fork for MVs, they are left as
they were on recovery, aren't they? So for 9.3, that solves that
issue, I think. pg_class.relisvald is a separate issue.

You might want to ignore the interim work on detecting the new
pg_dump dependencies through walking the internal structures. I
decided that was heading in a direction which might be
unnecessarily fragile and slow; so I tried writing it as a query
against the system tables. I'm pretty happy with the results.
Here's the query:

with recursive w as

[snip]

Why is the dependency problem of ordering MV refreshes and MV index builds so
different from existing pg_dump dependency problems?

If mva has indexes and is referenced by mvb, the CREATE statements
are all properly ordered, but you want mva populated and indexed
before you attempt to populate mvb. (Populated to get correct
results, indexed to get them quickly.) We don't have anything else
like that.

If we bail on having pg_class.relisvalid, then it will obviously
need adjustment.

Even if we don't have the column, we can have the fact of an MV's validity
SQL-visible in some other way.

Sure, I didn't say we had to abandon the query -- probably just
replace the relisvalid tests with a function call using the oid of
the MV.

-Kevin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Noah Misch
noah@leadboat.com
In reply to: Kevin Grittner (#18)

On Thu, Jan 24, 2013 at 03:14:15PM -0500, Kevin Grittner wrote:

Noah Misch wrote:

On Thu, Jan 24, 2013 at 01:29:10PM -0500, Kevin Grittner wrote:

Noah Misch wrote:

For the benefit of the archives, I note that we almost need not truncate an
unlogged materialized view during crash recovery. MVs are refreshed in a
VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the MV's
pg_class to that relfilenode. When a crash occurs with no refresh in flight,
the latest refresh had been safely synced. When a crash cuts short a refresh,
the pg_class update will not stick, and the durability of the old heap is not
in doubt. However, non-btree index builds don't have the same property; we
would need to force an immediate sync of the indexes to be safe here. It
would remain necessary to truncate unlogged MVs when recovering a base backup,
which may contain a partially-written refresh that did eventually commit.
Future MV variants that modify the MV in place would also need the usual
truncate on crash.

Hmm. That's a very good observation. Perhaps the issue can be
punted to a future release where we start adding more incremental
updates to them. I'll think on that, but on the face of it, it
sounds like the best choice.

That situation is challenging for the same reason pg_class.relisvalid was hard
to implement for unlogged relations. The startup process doesn't know the
relkind of the unlogged-relation relfilenodes it cleans. If you can work
through all that, it's certainly a nice endpoint to not lose unlogged snapshot
MVs on crash. But I intended the first half of my message as the
recommendation and the above as a wish for the future.

Well, if I just don't create an init fork for MVs, they are left as
they were on recovery, aren't they? So for 9.3, that solves that
issue, I think. pg_class.relisvald is a separate issue.

The startup process just looks for init forks, yes. But it's acceptable to
leave the unlogged MV materials alone during *crash* recovery only. When
recovering from a base backup, we once again need an init fork to refresh the
unlogged-MV relations. In turn, we would still need a relisvalid
implementation that copes. This is all solvable, sure, but it looks like a
trip off into the weeds relative to the core aim of this patch.

Why is the dependency problem of ordering MV refreshes and MV index builds so
different from existing pg_dump dependency problems?

If mva has indexes and is referenced by mvb, the CREATE statements
are all properly ordered, but you want mva populated and indexed
before you attempt to populate mvb. (Populated to get correct
results, indexed to get them quickly.) We don't have anything else
like that.

Is the REFRESH order just a replay of the CREATE order (with index builds
interspersed), or can it differ?

nm

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Noah Misch (#19)

Noah Misch wrote:

The patch conflicts with git master; I tested against master@{2013-01-20}.

New patch rebased, fixes issues raised by Thom Brown, and addresses
some of your points.

PostgreSQL 8.3 clusters won't contain materialized views, so it doesn't really
matter whether this change happens or not. I suggest adding a comment,
whether or not you keep the code change.

Reverted code changes to version_old_8_3.c; added comments.

*** a/contrib/sepgsql/sepgsql.h
--- b/contrib/sepgsql/sepgsql.h
***************
*** 32,37 ****
--- 32,39 ----
/*
* Internally used code of object classes
+ *
+ * NOTE: Materialized views are treated as tables for now.

This smells like a bypass of mandatory access control. Unless you've
determined that this is correct within the sepgsql security model, I suggest
starting with a draconian policy, like simply crippling MVs. Even if you have
determined that, separating out the nontrivial sepgsql support might be good.
The set of ideal reviewers is quite different.

Robert suggested this way of coping for now. Will post just the
sepgsql separately to try to attract the right crowd to confirm.

It concerns me slightly that older vacuumlo could silently remove large
objects still referenced by MVs. Only slightly, though, because the next MV
refresh would remove those references anyway. Is there anything we should do
to help that situation? If nothing else, perhaps backpatch this patch hunk.

Defensive backpatching of this code sounds like a good idea to me.
I'm open to other opinions on whether we need to defend 9.3 and
later against earler versions of vacuumlo being run against them.

Let's not support OIDs on MVs. They'll be regenerated on every refresh.

Do they have any value for people who might want to use cursors? If
nobody speaks up for them, I will drop OID support for materialized
views.

If I'm reading this right, you always overwrite the passed-in dest without
looking at it. What's the intent here?

Let me get back to you on that one.

+ /*
+ * Kludge here to allow refresh of a materialized view which is invalid
+ * (that is, it was created WITH NO DATA or was TRUNCATED). We flag the
+ * first two RangeTblEntry list elements, which were added to the front
+ * of the rewritten Query to keep the rules system happy, with the
+ * isResultRel flag to indicate that it is OK if they are flagged as
+ * invalid.
+ */
+ rtable = dataQuery->rtable;
+ ((RangeTblEntry *) linitial(rtable))->isResultRel = true;
+ ((RangeTblEntry *) lsecond(rtable))->isResultRel = true;

Is it safe to assume that the first two RTEs are the correct ones to flag?

I'm trying to play along with UpdateRangeTableOfViewParse() in
view.c. See the comment in front of that function for details.

+ finish_heap_swap(matviewOid, OIDNewHeap, false, false, false, RecentXmin);

The check_constraints argument should be "true", because the refresh could
have invalidated a UNIQUE index.

Fixed.

If the user desires an actually-clustered MV, he must re-CLUSTER it after each
refresh. That deserves a documentation mention.

That point had not occurred to me. Let me see if I can fix that before changing docs.

+ * Ensure that all referrenced relations are flagged as valid.

Typo.

Fixed.

+ ExecCheckRelationsValid(rangeTable);

I believe this ought to happen after the executor lock acquisitions, perhaps
right in ExecOpenScanRelation(). Since you'll then have an open Relation,
RelationIsFlaggedAsValid() can use the relcache.

That would break MVs entirely. This probably deserves more
comments. It's a little fragile, but was the best way I found to
handle things. An MV has a rule associated with it, just like a
"regular" view, which is parse-analyzed but not rewritten or
planned. We need to allow the rewrite and planning for statements
which populate the view, but suppress expansion of the rule for
simple references. It is OK for an MV to be invalid if it is being
populated, but not if it is being referenced. Long story short,
this call helps determine which relations will be opened.

If someone can suggest a better alternative, I'll see what I can
do; otherwise I guess I should add comments around the key places.

***************
*** 1591,1596 **** fireRIRrules(Query *parsetree, List *activeRIRs, bool forUpdatePushedDown)
--- 1592,1607 ----
rel = heap_open(rte->relid, NoLock);
/*
+ * Skip materialized view expansion when resultRelation is set.
+ */
+ if (rel->rd_rel->relkind == RELKIND_MATVIEW &&
+ rel->rd_rel->relisvalid)
+ {
+ heap_close(rel, NoLock);
+ break;
+ }

Would you elaborate on this?

It's diretly related to the point immediately preceding. At this
point we have thrown an error if the MV is invalid and being used
as a source of data. If it is the target of data it is flagged as
invalid so that it will not be expanded. Maybe we need a better way
to determine this, but I'm not sure just what to use.

+ /* Strip off the trailing semicolon so that other things may follow. */
+ appendBinaryPQExpBuffer(result, PQgetvalue(res, 0, 0), len - 1);

I suggest verifying that the last character is indeed a semicolon.

How about if I have it exit_horribly if the semicolon added 21
lines up has disappeared? Or use Assert if we have that for the
frontend now?

+ static void
+ dumpMatViewIndex(Archive *fout, IndxInfo *indxinfo)

This is so similar to dumpIndex(); can we avoid this level of duplication?

It is identical except for name. I can only assume that I thought I
needed a modified version and changed my mind. Removed.

Please retain an interesting sample of materialized views in the regression
database. Among other benefits, the pg_upgrade test suite exercises pg_dump
and pg_upgrade for all object types retained in the regression database.

OK

The regression tests should probably include a few other wrinkles, like an
index on a MV.

Yeah. Will do.

Creating a RULE on an MV succeeds, but refreshing the view then fails:

Fixed by prohibiting CREATE RULE on an MV.

The documentation is a good start. I would expect a brief introduction in
Tutorial -> Advanced Features and possibly a deeper discussion under The SQL
Language. I suggest updating Explicit Locking to mention the new commands;
users will be interested in the lock level of a refresh.

Yeah, the docs need another pass. It seemed prudent to make sure of
what I was documenting first.

You have chosen to make pg_dump preserve the valid-or-invalid state of each
MV. That seems reasonable, though I'm slightly concerned about the case of a
dump taken from a standby.

I'm not clear on the problem. Could you explain?

Some of the ALTER TABLE variants would make plenty of sense for MVs:

 ALTER [ COLUMN ] column_name SET STATISTICS integer
 ALTER [ COLUMN ] column_name SET ( attribute_option = value [, ... ] )
 ALTER [ COLUMN ] column_name RESET ( attribute_option [, ... ] )
 ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }

It wouldn't be a problem to skip those for the first patch, though.
Conversely, this syntax is accepted:

 ALTER MATERIALIZED VIEW [ IF EXISTS ] name SET ( view_option_name [= view_option_value] [, ... ] )

But there are no available options. The only option accepted for regular
views, security_barrier, is rejected. MVs always have security_barrier
semantics, in any event.

I think those are doc problems, not implementation of the
functionality. Will double-check and fix where needed.

Overall, I recommend auditing all the ALTER TABLE and ALTER VIEW options to
determine which ones make sense for MVs. For each one in the sensible set,
either allow it or add a comment indicating that it could reasonably be
allowed in the future. For each one outside the set, forbid it. Verify that
the documentation, the results of your evaluation, and the actual allowed
operations are all consistent.

I have already tried to do that in the coding, although maybe you
think more comments are needed there? The docs definitely need to
catch up. This part isn't in flux, so I'll fix that part of the
docs in the next day or two.

Thanks for the review!

-Kevin

Attachments:

matview-v3.patch.gzapplication/x-gzip; charset=utf-8; name=matview-v3.patch.gzDownload
#21Thom Brown
thom@linux.com
In reply to: Kevin Grittner (#20)
#22Noah Misch
noah@leadboat.com
In reply to: Kevin Grittner (#20)
#23Peter Eisentraut
peter_e@gmx.net
In reply to: Kevin Grittner (#20)
#24Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Noah Misch (#14)
#25Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#24)
#26Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#25)
#27Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#25)
#28Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Bruce Momjian (#26)
#29Noah Misch
noah@leadboat.com
In reply to: Kevin Grittner (#27)
#30Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Noah Misch (#29)
#31Thom Brown
thom@linux.com
In reply to: Kevin Grittner (#24)
#32Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Kevin Grittner (#30)
#33Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Alvaro Herrera (#32)
#34Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Thom Brown (#31)
#35Noah Misch
noah@leadboat.com
In reply to: Kevin Grittner (#33)
#36Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Noah Misch (#35)
#37Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#36)
#38Nicolas Barbier
nicolas.barbier@gmail.com
In reply to: Robert Haas (#37)
#39Peter Eisentraut
peter_e@gmx.net
In reply to: Robert Haas (#37)
#40Erik Rijkers
er@xs4all.nl
In reply to: Kevin Grittner (#24)
#41Josh Berkus
josh@agliodbs.com
In reply to: Erik Rijkers (#40)
#42David Fetter
david@fetter.org
In reply to: Erik Rijkers (#40)
#43Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Erik Rijkers (#40)
#44Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Josh Berkus (#41)
#45Josh Berkus
josh@agliodbs.com
In reply to: Kevin Grittner (#44)
#46Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#24)
#47Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Josh Berkus (#45)
#48Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#47)
#49Josh Berkus
josh@agliodbs.com
In reply to: Kevin Grittner (#48)
#50Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#46)
#51Robert Haas
robertmhaas@gmail.com
In reply to: Peter Eisentraut (#39)
#52Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#51)
#53Peter Eisentraut
peter_e@gmx.net
In reply to: David Fetter (#42)
#54Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#50)
#55Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Eisentraut (#53)
#56Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#54)
#57Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#56)
#58Peter Eisentraut
peter_e@gmx.net
In reply to: Robert Haas (#51)
#59Erik Rijkers
er@xs4all.nl
In reply to: Kevin Grittner (#55)
#60Bruce Momjian
bruce@momjian.us
In reply to: Kevin Grittner (#57)
#61Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Bruce Momjian (#60)
#62Josh Berkus
josh@agliodbs.com
In reply to: Bruce Momjian (#60)
#63Peter Eisentraut
peter_e@gmx.net
In reply to: Kevin Grittner (#46)
#64Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Eisentraut (#63)
#65Stephen Frost
sfrost@snowman.net
In reply to: Kevin Grittner (#64)
#66Peter Eisentraut
peter_e@gmx.net
In reply to: Kevin Grittner (#64)
#67Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Eisentraut (#66)
#68Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Bruce Momjian (#60)
#69Bruce Momjian
bruce@momjian.us
In reply to: Peter Eisentraut (#66)
#70Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#69)
#71Andres Freund
andres@anarazel.de
In reply to: Bruce Momjian (#69)
#72Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#69)
#73Peter Eisentraut
peter_e@gmx.net
In reply to: Kevin Grittner (#67)
#74Peter Eisentraut
peter_e@gmx.net
In reply to: Bruce Momjian (#69)
#75Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Eisentraut (#74)
#76Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#70)
#77Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Bruce Momjian (#69)
#78Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#76)
#79Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Eisentraut (#73)
#80Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Andres Freund (#71)
#81Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#76)
#82Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#78)
#83Andres Freund
andres@anarazel.de
In reply to: Kevin Grittner (#80)
#84Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#81)
#85Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Andres Freund (#83)
#86Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#84)
#87Peter Eisentraut
peter_e@gmx.net
In reply to: Kevin Grittner (#75)
#88Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#86)
#89Bruce Momjian
bruce@momjian.us
In reply to: Kevin Grittner (#76)
#90Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Bruce Momjian (#89)
#91Andres Freund
andres@anarazel.de
In reply to: Kevin Grittner (#88)
#92Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#91)
#93Josh Berkus
josh@agliodbs.com
In reply to: Kevin Grittner (#84)
#94Bruce Momjian
bruce@momjian.us
In reply to: Josh Berkus (#93)
#95Michael Paquier
michael@paquier.xyz
In reply to: Bruce Momjian (#94)
#96Erik Rijkers
er@xs4all.nl
In reply to: Kevin Grittner (#55)
#97Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#95)
#98Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#97)
#99Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Robert Haas (#97)
#100Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#98)
#101Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#99)
#102Ants Aasma
ants.aasma@cybertec.at
In reply to: Kevin Grittner (#101)
#103Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Ants Aasma (#102)
#104Ants Aasma
ants.aasma@cybertec.at
In reply to: Kevin Grittner (#103)
#105Bruce Momjian
bruce@momjian.us
In reply to: Ants Aasma (#104)
#106Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Bruce Momjian (#105)
#107Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Kevin Grittner (#106)
#108Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Dean Rasheed (#107)
#109Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#106)
#110Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Kevin Grittner (#108)
#111Nicolas Barbier
nicolas.barbier@gmail.com
In reply to: Kevin Grittner (#108)
#112Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Nicolas Barbier (#111)
#113Bruce Momjian
bruce@momjian.us
In reply to: Kevin Grittner (#106)
#114Nicolas Barbier
nicolas.barbier@gmail.com
In reply to: Kevin Grittner (#112)
#115Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nicolas Barbier (#114)
#116Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#115)
#117Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#109)
#118Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#117)
#119Craig Ringer
craig@2ndquadrant.com
In reply to: Josh Berkus (#116)
In reply to: Josh Berkus (#116)
#121Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Kevin Grittner (#118)
#122Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#115)
#123Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Simon Riggs (#122)
#124Simon Riggs
simon@2ndQuadrant.com
In reply to: Kevin Grittner (#123)
#125Josh Berkus
josh@agliodbs.com
In reply to: Simon Riggs (#124)
#126Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#123)
#127Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Josh Berkus (#125)
#128Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#126)
#129Nicolas Barbier
nicolas.barbier@gmail.com
In reply to: Robert Haas (#126)
#130Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#126)
#131Nicolas Barbier
nicolas.barbier@gmail.com
In reply to: Kevin Grittner (#128)
#132Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Nicolas Barbier (#131)
#133Josh Berkus
josh@agliodbs.com
In reply to: Kevin Grittner (#127)
#134Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Kevin Grittner (#109)
#135Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#130)
#136Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#135)
#137Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tatsuo Ishii (#134)
#138Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#23)
#139Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#126)
#140Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#137)
#141David E. Wheeler
david@kineticode.com
In reply to: Kevin Grittner (#140)
#142Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Kevin Grittner (#140)
#143Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#136)
#144Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: David E. Wheeler (#141)
#145David E. Wheeler
david@kineticode.com
In reply to: Kevin Grittner (#144)
#146Nicolas Barbier
nicolas.barbier@gmail.com
In reply to: Kevin Grittner (#132)
#147Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#136)
#148Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#122)
#149Robert Haas
robertmhaas@gmail.com
In reply to: David E. Wheeler (#145)
#150Noah Misch
noah@leadboat.com
In reply to: Noah Misch (#15)
#151Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Noah Misch (#150)
#152Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#151)