Implementing SQL ASSERTION
Hi all,
I’m wondering if there are other people out there working on implementing SQL ASSERTION functionality?
I’ve recently spent a bit of time looking to implement the execution models described in “Applied Mathematics for Database Professionals” by Toon Koppelaars and Lex de Haan. I’ve gotten as far as execution model 3 and am now looking at deriving polarity of involved tables to do EM4 (described in some detail in “Deriving Production Rules for Constraint Maintenance”, Ceri & Widom, VLDB Conference 1990, p555-577). EM5 & EM6 look rather more difficult but I’m intending to try and implement those, too.
If there are other people working on this stuff it would be great to collaborate.
Regards.
-Joe
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Apr 30, 2015 at 6:36 PM, Joe Wildish
<joe-postgresql.com@elusive.cx> wrote:
I’m wondering if there are other people out there working on implementing SQL ASSERTION functionality?
I’ve recently spent a bit of time looking to implement the execution models described in “Applied Mathematics for Database Professionals” by Toon Koppelaars and Lex de Haan. I’ve gotten as far as execution model 3 and am now looking at deriving polarity of involved tables to do EM4 (described in some detail in “Deriving Production Rules for Constraint Maintenance”, Ceri & Widom, VLDB Conference 1990, p555-577). EM5 & EM6 look rather more difficult but I’m intending to try and implement those, too.
If there are other people working on this stuff it would be great to collaborate.
I don't know of anyone working on this. It sounds very difficult.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 1 May 2015, at 19:51, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Apr 30, 2015 at 6:36 PM, Joe Wildish
<joe-postgresql.com@elusive.cx> wrote:
I’m wondering if there are other people out there working on implementing SQL ASSERTION functionality?
I’ve recently spent a bit of time looking to implement the execution models described in “Applied Mathematics for Database Professionals” by Toon Koppelaars and Lex de Haan. I’ve gotten as far as execution model 3 and am now looking at deriving polarity of involved tables to do EM4 (described in some detail in “Deriving Production Rules for Constraint Maintenance”, Ceri & Widom, VLDB Conference 1990, p555-577). EM5 & EM6 look rather more difficult but I’m intending to try and implement those, too.
If there are other people working on this stuff it would be great to collaborate.
I don't know of anyone working on this. It sounds very difficult.
The book I mention details a series of execution models, where each successive model aims to validate the assertion in a more efficient manner than the last. This is achieved by performing static analysis of the assertion's expression to determine under what circumstances the assertion need be (re)checked. Briefly:
EM1: after all DML statements;
EM2: only after DML statements involving tables mentioned in the assertion expression;
EM3: only after DML statements involving the columns mentioned in the assertion expression;
EM4: only after DML statements involving the columns, plus if the statement has a “polarity” that may affect the assertion expression.
“Polarity” here means that one is able to (statically) determine if only INSERTS and not DELETES can affect an expression or vice-versa.
EMs 5 and 6 are further enhancements that make use of querying the “transition effect” data of what actually changed in a statement, to determine if the assertion expression need be validated. I’ve not done as much reading around this topic yet so am concentrating on EMs 1-4.
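A minimal sketch of how EM1 to EM4 narrow the set of statements that trigger a recheck. This is a toy model, not code from the book or from any patch: the names `Statement`, `AssertionInfo` and `needs_recheck` are hypothetical, and it assumes that an INSERT or DELETE always "involves" the mentioned columns while an UPDATE only does so if it writes one of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    kind: str                          # "INSERT", "UPDATE" or "DELETE"
    table: str
    columns: frozenset = frozenset()   # columns written (only relevant for UPDATE)


@dataclass(frozen=True)
class AssertionInfo:
    # Output of a (hypothetical) static analysis of the assertion expression.
    tables: frozenset         # tables mentioned in the expression
    columns: frozenset        # (table, column) pairs mentioned in the expression
    invalidating: frozenset   # (table, kind) pairs whose polarity may invalidate it


def needs_recheck(em: int, stmt: Statement, a: AssertionInfo) -> bool:
    """Decide whether the assertion must be re-validated after stmt under EM1..EM4."""
    if em == 1:
        return True                    # EM1: after every DML statement
    if stmt.table not in a.tables:
        return False                   # EM2 and later: unrelated tables never matter
    if em == 2:
        return True
    # EM3: an UPDATE matters only if it writes a column the expression mentions.
    if stmt.kind == "UPDATE" and not any(
        (stmt.table, c) in a.columns for c in stmt.columns
    ):
        return False
    if em == 3:
        return True
    # EM4: additionally require that this kind of statement can invalidate.
    return (stmt.table, stmt.kind) in a.invalidating
```

For an assertion such as 10 > (SELECT COUNT(*) FROM t), the analysis would mention table t, no columns, and only (t, INSERT) as invalidating, so under EM4 a DELETE on t no longer causes a recheck whereas under EM3 it still does.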
I agree it is a difficult problem but there are a fair number of published academic papers relating to this topic. The AM4DP book draws a lot of this research together and presents the execution models.
I may start writing up on a blog of where I get to, and then post further to this list, if there is interest.
Regards.
-Joe
On Sat, May 02, 2015 at 10:42:24PM +0100, Joe Wildish wrote:
On 1 May 2015, at 19:51, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Apr 30, 2015 at 6:36 PM, Joe Wildish
<joe-postgresql.com@elusive.cx> wrote:
I’m wondering if there are other people out there working on implementing SQL ASSERTION functionality?
I’ve recently spent a bit of time looking to implement the execution models described in “Applied Mathematics for Database Professionals” by Toon Koppelaars and Lex de Haan. I’ve gotten as far as execution model 3 and am now looking at deriving polarity of involved tables to do EM4 (described in some detail in “Deriving Production Rules for Constraint Maintenance”, Ceri & Widom, VLDB Conference 1990, p555-577). EM5 & EM6 look rather more difficult but I’m intending to try and implement those, too.
If there are other people working on this stuff it would be great to collaborate.
I don't know of anyone working on this. It sounds very difficult.
The book I mention details a series of execution models, where each successive model aims to validate the assertion in a more efficient manner than the last. This is achieved by performing static analysis of the assertion's expression to determine under what circumstances the assertion need be (re)checked. Briefly:
EM1: after all DML statements;
EM2: only after DML statements involving tables mentioned in the assertion expression;
EM3: only after DML statements involving the columns mentioned in the assertion expression;
EM4: only after DML statements involving the columns, plus if the statement has a “polarity” that may affect the assertion expression.
“Polarity” here means that one is able to (statically) determine if only INSERTS and not DELETES can affect an expression or vice-versa.
EMs 5 and 6 are further enhancements that make use of querying the “transition effect” data of what actually changed in a statement, to determine if the assertion expression need be validated. I’ve not done as much reading around this topic yet so am concentrating on EMs 1-4.
I agree it is a difficult problem but there are a fair number of published academic papers relating to this topic. The AM4DP book draws a lot of this research together and presents the execution models.
I may start writing up on a blog of where I get to, and then post further to this list, if there is interest.
I suspect that you would get a lot further with a PoC patch including
the needed documentation. Remember to include how this would work at
all the transaction isolation levels and combinations of same that we
support. Recall also to include the lock strength needed. Just about
anything can be done with a database-wide lock :)
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On 3 May 2015, at 02:42, David Fetter <david@fetter.org> wrote:
On Sat, May 02, 2015 at 10:42:24PM +0100, Joe Wildish wrote:
I may start writing up on a blog of where I get to, and then post further to this list, if there is interest.
I suspect that you would get a lot further with a PoC patch including
the needed documentation. Remember to include how this would work at
all the transaction isolation levels and combinations of same that we
support. Recall also to include the lock strength needed. Just about
anything can be done with a database-wide lock :)
Thanks David. I’m obviously new here so I’m not that familiar with how one starts contributing.
Once I get to a decent level with the EM4 PoC I will post the details to this list. The general idea is that upon assertion creation, the expression is analysed to determine when it needs to be validated; corresponding internal “after statement” triggers are then created. There will definitely need to be some serialisation taking place on the basis of when an assertion has been validated, but I’ve not got that far yet. I’ll be sure to include the details when I post though.
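As a rough illustration of the "analyse, then create triggers" step, the sketch below groups a set of derived (table, operation) pairs into one statement-level trigger description per table. The function name `plan_triggers`, the trigger naming scheme and the dictionary layout are all assumptions made for illustration; they do not describe how the PoC actually stores or creates its internal triggers.

```python
from collections import defaultdict


def plan_triggers(assertion_name, invalidating):
    """Group (table, operation) pairs into one AFTER ... STATEMENT trigger per table."""
    events = defaultdict(set)
    for table, op in invalidating:
        events[table].add(op)
    return [
        {
            "name": f"{assertion_name}_{table}_check",  # hypothetical naming scheme
            "table": table,
            "timing": "AFTER",
            "level": "STATEMENT",
            "events": sorted(ops),
        }
        for table, ops in sorted(events.items())
    ]
```

Tables that contribute no invalidating operation get no trigger at all, which is where the saving over checking after every statement comes from.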
Regards.
-Joe
On 4/30/15 6:36 PM, Joe Wildish wrote:
I’m wondering if there are other people out there working on implementing SQL ASSERTION functionality?
I was the last one, probably:
</messages/by-id/1384486216.5008.17.camel@vanquo.pezone.net>.
I intend to pick up that work sometime, but feel free to review the
thread for a start. The main question was how to manage transaction
isolation.
Hackers,
Attached is a WIP patch for SQL assertion. I am posting it for anyone who might be interested in seeing it, for comments/feedback, and to see if others are keen to collaborate on taking it further. It is not near production-ready (see thoughts on that below).
The patch builds on the work posted by Peter back in 2013. I've taken his code and updated it to conform to some general changes made to the codebase since then. The bulk of the new work I have done is around when an assertion needs to be checked. Essentially it is an implementation of the algorithm described by Ceri & Widom in "Deriving Production Rules for Constraint Maintenance": http://infolab.stanford.edu/pub/papers/constraint-maintenance.ps
The general idea is to traverse the expression tree and derive the set of potentially invalidating operations. These operations are used to determine when the constraint trigger fires and causes a re-check. The detail is in the paper but some examples are:
* insertion into the subject of an exists cannot be invalidating;
* deletion from the subject of a not exists cannot be invalidating;
* update of columns in the target list of an exists cannot be invalidating;
* certain combinations of aggregates with comparison operations cannot be invalidating.
As an example of the last point, the expression "CHECK (10 > (SELECT COUNT(*) FROM t))" cannot be invalidated by a delete or an update but can be invalidated by an insert.
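The rules above can be sketched as a recursive walk that tracks whether a sub-expression sits under an even or odd number of negations. This is a heavily simplified model and not the patch's code: the node classes (`Exists`, `CountBelow`, `Not`, `And`) and `invalidating_ops` are hypothetical, UPDATE is treated conservatively for EXISTS (a finer analysis would drop updates that touch only target-list columns), and the COUNT(*) subquery is assumed to have no WHERE clause.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exists:
    table: str      # EXISTS (SELECT ... FROM table WHERE ...)


@dataclass(frozen=True)
class CountBelow:
    table: str      # const > (SELECT COUNT(*) FROM table)


@dataclass(frozen=True)
class Not:
    arg: object


@dataclass(frozen=True)
class And:
    left: object
    right: object


def invalidating_ops(expr, positive=True):
    """Return the (table, operation) pairs that could turn a true assertion false."""
    if isinstance(expr, Not):
        return invalidating_ops(expr.arg, not positive)
    if isinstance(expr, And):
        # Taking the union is conservative under either polarity.
        return invalidating_ops(expr.left, positive) | invalidating_ops(expr.right, positive)
    if isinstance(expr, Exists):
        # EXISTS cannot be invalidated by INSERT; NOT EXISTS cannot be by DELETE.
        row_op = "DELETE" if positive else "INSERT"
        return {(expr.table, row_op), (expr.table, "UPDATE")}
    if isinstance(expr, CountBelow):
        # const > COUNT(*) breaks only when rows are added; its negation only
        # when rows are removed. Without a WHERE clause, UPDATE never matters.
        return {(expr.table, "INSERT" if positive else "DELETE")}
    raise TypeError(f"unsupported node: {expr!r}")
```

With this model, `invalidating_ops(CountBelow("t"))` yields only the INSERT on t, matching the COUNT(*) example above.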
I have implemented most of the optimisations mentioned in the paper. There are one or two that I am unsure about, specifically how to deal with set-operations that are the subject of an exists. According to the paper, these are optimisable when they're the subject of an exists, but I think it is only applicable for union and not intersect or except, so I have skipped that particular optimisation for the time being.
The algorithm works under the assumption that when a recheck occurs the previous check result was true (the research report by Ceri & Widom does acknowledge this assumption). Unfortunately, the SQL specification requires that both true and unknown be valid results for an assertion's check expression. This doesn't play too well with the algorithm so for the time being I have disallowed null. I think the solution here may be that when a null result for a check occurs, the assertion is changed to trigger on all operations against the involved tables; once it returns to true, the triggers can be returned to fire only on the derived invalidating operations. More thought required though. (note: having just written this paragraph, I've realised I can't right now think of a concrete example illustrating the point, so it may be that I'm wrong on this).
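The fallback idea floated in the paragraph above can be written down in a few lines. This is only a sketch of that proposal, which the paragraph itself flags as tentative; `assertion_passes`, `events_to_watch` and `ALL_OPS` are hypothetical names, and Python's None stands in for SQL's unknown.

```python
from typing import Optional

ALL_OPS = frozenset({"INSERT", "UPDATE", "DELETE"})


def assertion_passes(result: Optional[bool]) -> bool:
    # Three-valued logic: only a definite False is a violation;
    # True and unknown (None) both pass.
    return result is not False


def events_to_watch(last_result: Optional[bool], derived_ops: frozenset) -> frozenset:
    # The derived set is only sound if the previous check was definitely true,
    # so fall back to watching every operation after an unknown result.
    if last_result is True:
        return derived_ops
    return ALL_OPS
```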
The paper does mention a set of optimisations that I have not yet attempted to implement. These are essentially the technique of evaluating the expression against the deltas of a change rather than the full tables. Clearly there is a large overlap with incremental maintenance of views and actually the two authors of the paper have a similarly named paper called "Deriving Production Rules for Incremental View Maintenance". Although I have yet to finish reviewing all the literature on the subject, I suspect that realistically for this to make it into production, we'd need some implementation of these techniques to make the performance palatable.
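To make the "deltas rather than full tables" idea concrete for the single case of 10 > (SELECT COUNT(*) FROM t), the sketch below compares a full recount with a check that only looks at the rows a statement inserted and deleted. It assumes the previous row count is known and trustworthy; the function names are hypothetical, and the techniques in the literature are far more general than this one aggregate.

```python
def check_full(rows, limit=10):
    """Re-evaluate the assertion by scanning the whole table."""
    return limit > len(rows)


def check_delta(prev_count, inserted, deleted, limit=10):
    """Re-evaluate using only the statement's delta and the previous count."""
    new_count = prev_count + len(inserted) - len(deleted)
    return limit > new_count, new_count


# Usage: a table of 8 rows receives a statement inserting 3 rows and deleting 1.
table = list(range(8))
inserted, deleted = [100, 101, 102], [0]
ok, count = check_delta(len(table), inserted, deleted)
after = [r for r in table if r not in deleted] + inserted
same_answer = ok == check_full(after)
```

The delta check touches four rows instead of ten here; the gap grows with the size of the table, which is the performance argument made above.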
Cheers,
-Joe
Attachments:
0001-SQL-assertion-WIP.patch (application/octet-stream; +3571 -69)
Hello Joe,
Just a reaction to the example, which is maybe addressed in the patch
which I have not investigated.
* certain combinations of aggregates with comparison operations cannot
be invalidating.
As an example of the last point, the expression "CHECK (10 > (SELECT
COUNT(*) FROM t))" cannot be invalidated by a delete or an update but
can be invalidated by an insert.
I'm wondering about the effect of MVCC on this: if the check is performed
when the INSERT is done, concurrent inserting transactions would count the
current status which would be ok, but on commit all concurrent inserts
would be there and the count could not be ok anymore?
Maybe if the check was deferred, but this is not currently possible with
pg (eg the select can simply be put in a function), and there might be
race conditions. ISTM that such a check would imply non trivial locking to
be okay, it is not just a matter of deciding whether to invoke the check
or not.
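The concern can be reproduced with a toy model of snapshot visibility. This is plain Python list copying, not PostgreSQL's MVCC implementation, and the variable names are made up for illustration: two transactions each start from the same committed state of 8 rows, insert one row, and check the assertion against what they can see.

```python
LIMIT = 10


def assertion_ok(rows):
    # Models CHECK (10 > (SELECT COUNT(*) FROM t)).
    return LIMIT > len(rows)


committed = [f"row{i}" for i in range(8)]

# Both transactions take their snapshot before either commits.
snapshot_a = list(committed)
snapshot_b = list(committed)

# Each sees its snapshot plus its own insert, so each counts 9 rows.
check_a = assertion_ok(snapshot_a + ["a"])
check_b = assertion_ok(snapshot_b + ["b"])

# Both commit; the table now holds 10 rows and 10 > 10 is false.
committed += ["a", "b"]
final_ok = assertion_ok(committed)
```

Both per-transaction checks pass while the committed state violates the assertion, which is why deciding when to run the check is not sufficient without locking or a stricter isolation level.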
--
Fabien.
Hi Fabien,
* certain combinations of aggregates with comparison operations cannot be invalidating.
As an example of the last point, the expression "CHECK (10 > (SELECT COUNT(*) FROM t))" cannot be invalidated by a delete or an update but can be invalidated by an insert.
I'm wondering about the effect of MVCC on this: if the check is performed when the INSERT is done, concurrent inserting transactions would count the current status which would be ok, but on commit all concurrent inserts would be there and the count could not be ok anymore?
Yes, there was quite a bit of discussion in the original thread about concurrency. See here:
/messages/by-id/1384486216.5008.17.camel@vanquo.pezone.net
The patch doesn’t attempt to address concurrency (beyond the obvious benefit of reducing the circumstances under which the assertion is checked). I am working under the assumption that we will find some acceptable way for that to be resolved :-) And at the moment, working in serialisable mode addresses this issue. I think that is suggested in the thread actually (essentially, if you want to use assertions, you require that transactions be performed at serialisable isolation level).
Maybe if the check was deferred, but this is not currently possible with pg (eg the select can simply be put in a function), and there might be race conditions. ISTM that such a check would imply non trivial locking to be okay, it is not just a matter of deciding whether to invoke the check or not.
I traverse into SQL functions so that the analysis can capture invalidating operations from the expression inside the function. Only internal and SQL functions are considered legal. Other languages are rejected.
-Joe
I'm wondering about the effect of MVCC on this: if the check is
performed when the INSERT is done, concurrent inserting transactions
would count the current status which would be ok, but on commit all
concurrent inserts would be there and the count could not be ok
anymore?
The patch doesn't attempt to address concurrency (beyond the obvious
benefit of reducing the circumstances under which the assertion is
checked). I am working under the assumption that we will find some
acceptable way for that to be resolved :-) And at the moment, working in
serialisable mode addresses this issue. I think that is suggested in the
thread actually (essentially, if you want to use assertions, you require
that transactions be performed at serialisable isolation level).
Thanks for the pointers. The "serializable" isolation level restriction
sounds reasonable.
--
Fabien.
On Mon, Jan 15, 2018 at 03:40:57PM +0100, Fabien COELHO wrote:
I'm wondering about the effect of MVCC on this: if the check is
performed when the INSERT is done, concurrent inserting transactions
would count the current status which would be ok, but on commit all
concurrent inserts would be there and the count could not be ok anymore?
The patch doesn’t attempt to address concurrency (beyond the obvious
benefit of reducing the circumstances under which the assertion is
checked). I am working under the assumption that we will find some
acceptable way for that to be resolved :-) And at the moment, working in
serialisable mode addresses this issue. I think that is suggested in the
thread actually (essentially, if you want to use assertions, you require
that transactions be performed at serialisable isolation level).
Thanks for the pointers. The "serializable" isolation level restriction
sounds reasonable.
It sounds reasonable enough that I'd like to make a couple of Modest
Proposals™, to wit:
- We follow the SQL standard and make SERIALIZABLE the default
transaction isolation level, and
- We disallow writes at isolation levels other than SERIALIZABLE when
any ASSERTION could be in play.
That latter could range in implementation from crashingly unsubtle to
very precise.
Crashingly Unsubtle:
Disallow writes at any isolation level other than SERIALIZABLE.
Very Precise:
Disallow writes at any other isolation level when the ASSERTION
could come into play using the same machinery that enforces the
ASSERTION in the first place.
What say?
Best,
David.
Hi David,
On 15 Jan 2018, at 16:35, David Fetter <david@fetter.org> wrote:
It sounds reasonable enough that I'd like to make a couple of Modest
Proposals™, to wit:
- We follow the SQL standard and make SERIALIZABLE the default
transaction isolation level, and
- We disallow writes at isolation levels other than SERIALIZABLE when
any ASSERTION could be in play.
Certainly it would be easy to put a test into the assertion check function to require the isolation level be serialisable. I didn’t realise that that was also the default level as per the standard. That need not necessarily be changed, of course; it would be obvious to the user that it was a requirement as the creation of an assertion would fail without it, as would any subsequent attempts to modify the involved tables.
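A sketch of what such a guard might look like, modelled in Python rather than in the backend's C. The names `IsolationLevelError` and `run_assertion_check` are hypothetical, and the isolation level is passed in as a string (in PostgreSQL the current value can be read with SHOW transaction_isolation); nothing here reflects the actual patch.

```python
class IsolationLevelError(Exception):
    """Raised when an assertion is checked outside SERIALIZABLE (hypothetical)."""


def run_assertion_check(isolation_level, evaluate):
    # Refuse to evaluate the assertion unless the transaction is serializable.
    if isolation_level.lower() != "serializable":
        raise IsolationLevelError(
            f"assertions require SERIALIZABLE, got {isolation_level!r}"
        )
    return evaluate()
```

Because the same function would run at CREATE ASSERTION time and on every later recheck, both the creation and any subsequent write to the involved tables would fail at a weaker isolation level, as described above.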
-Joe
On Mon, Jan 15, 2018 at 09:14:02PM +0000, Joe Wildish wrote:
Hi David,
On 15 Jan 2018, at 16:35, David Fetter <david@fetter.org> wrote:
It sounds reasonable enough that I'd like to make a couple of Modest
Proposals™, to wit:
- We follow the SQL standard and make SERIALIZABLE the default
transaction isolation level, and
- We disallow writes at isolation levels other than SERIALIZABLE when
any ASSERTION could be in play.
Certainly it would be easy to put a test into the assertion check
function to require the isolation level be serialisable. I didn’t
realise that that was also the default level as per the standard.
That need not necessarily be changed, of course; it would be obvious
to the user that it was a requirement as the creation of an
assertion would fail without it, as would any subsequent attempts to
modify the involved tables.
This patch no longer applies. Any chance of a rebase?
Best,
David.
Hi David,
This patch no longer applies. Any chance of a rebase?
Of course. I’ll look at it this weekend,
Cheers,
-Joe
On Thu, Mar 08, 2018 at 09:11:58PM +0000, Joe Wildish wrote:
Hi David,
This patch no longer applies. Any chance of a rebase?
Of course. I’ll look at it this weekend,
Much appreciated!
Best,
David.
On Mon, Jan 15, 2018 at 11:35 AM, David Fetter <david@fetter.org> wrote:
- We follow the SQL standard and make SERIALIZABLE the default
transaction isolation level, and
The consequences of such a decision would include:
- pgbench -S would run up to 10x slower, at least if these old
benchmark results are still valid:
/messages/by-id/CA+TgmoZog1wFbyrqzJUkiLSXw5sDUjJGUeY0c2BqSG-tciSB7w@mail.gmail.com
- pgbench without -S would fail outright, because it doesn't have
provision to retry failed transactions.
https://commitfest.postgresql.org/16/1419/
- Many user applications would probably also experience similar difficulties.
- Parallel query would no longer work by default, unless this patch
gets committed:
https://commitfest.postgresql.org/17/1004/
I think a good deal of work to improve the performance of serializable
would need to be done before we could even think about making it the
default -- and even then, the fact that it really requires the
application to be retry-capable seems like a pretty major obstacle.
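For readers unfamiliar with what "retry-capable" means in practice, here is a minimal sketch of the loop an application needs around each serializable transaction. `SerializationFailure` is a stand-in exception defined here for illustration (PostgreSQL reports this condition as SQLSTATE 40001); a real application would catch whatever its driver raises for that code, and `run_with_retry` is a hypothetical helper, not part of any library.

```python
import time


class SerializationFailure(Exception):
    """Stand-in for a driver error carrying SQLSTATE 40001."""
    sqlstate = "40001"


def run_with_retry(txn, max_attempts=5, base_delay=0.0):
    """Run txn(), retrying the whole transaction on serialization failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise                          # give up after the last attempt
            time.sleep(base_delay * attempt)   # simple linear backoff
```

The important property is that `txn` must be safe to run again from the start, including any application-side effects, which is the burden the paragraph above refers to.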
On Sat, Mar 10, 2018 at 6:37 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Jan 15, 2018 at 11:35 AM, David Fetter <david@fetter.org> wrote:
- We follow the SQL standard and make SERIALIZABLE the default
transaction isolation level, and
The consequences of such a decision would include:
- pgbench -S would run up to 10x slower, at least if these old
benchmark results are still valid:
/messages/by-id/CA+TgmoZog1wFbyrqzJUkiLSXw5sDUjJGUeY0c2BqSG-tciSB7w@mail.gmail.com
- pgbench without -S would fail outright, because it doesn't have
provision to retry failed transactions.
https://commitfest.postgresql.org/16/1419/
- Many user applications would probably also experience similar difficulties.
- Parallel query would no longer work by default, unless this patch
gets committed:
https://commitfest.postgresql.org/17/1004/
I think a good deal of work to improve the performance of serializable
would need to be done before we could even think about making it the
default -- and even then, the fact that it really requires the
application to be retry-capable seems like a pretty major obstacle.
Also:
- It's not available on hot standbys. Experimental patches have been
developed based on the read only safe snapshot concept, but some
tricky problems remain unsolved.
- Performance is terrible (conflicts are maximised) if you use any
index type except btree, unless some of these get committed:
https://commitfest.postgresql.org/17/1172/
https://commitfest.postgresql.org/17/1183/
https://commitfest.postgresql.org/17/1466/
--
Thomas Munro
http://www.enterprisedb.com
This patch no longer applies. Any chance of a rebase?
Attached is a rebased version of this patch. It takes into account the ACL checking changes and a few other minor amendments.
Cheers,
-Joe
Attachments:
0001-SQL-ASSERTION-prototype.patch (application/octet-stream; +3572 -69)
On Sun, Mar 18, 2018 at 12:29:50PM +0000, Joe Wildish wrote:
This patch no longer applies. Any chance of a rebase?
Attached is a rebased version of this patch. It takes into account the ACL checking changes and a few other minor amendments.
Thanks!
Best,
David.
On Sun, Mar 18, 2018 at 12:29:50PM +0000, Joe Wildish wrote:
This patch no longer applies. Any chance of a rebase?
Attached is a rebased version of this patch. It takes into account
the ACL checking changes and a few other minor amendments.
Sorry to bother you again, but this now doesn't compile atop master.
Best,
David.