Difference between revisions of "Update through views: a possible approach"

From D Wiki
Jump to: navigation, search
m (References)
m
Line 82: Line 82:
 
The ETNF paper's use of ''essential'' tuple is (loosley speaking) that without that tuple some constraint would be violated, and we couldn't 'recreate' that tuple from other content to satisfy the constraint. I'm extending that idea: without that tuple (inserted or deleted) in the base some constraint would be violated and/or the update would not have precisely the effect requested for the view.
 
The ETNF paper's use of ''essential'' tuple is (loosley speaking) that without that tuple some constraint would be violated, and we couldn't 'recreate' that tuple from other content to satisfy the constraint. I'm extending that idea: without that tuple (inserted or deleted) in the base some constraint would be violated and/or the update would not have precisely the effect requested for the view.
  
==== References ====
+
== References ==
  
 
* Codd's 12 rules -- Rule 6 and definition, see also discussion under Rule 9.
 
* Codd's 12 rules -- Rule 6 and definition, see also discussion under Rule 9.
Line 101: Line 101:
 
: note this is Fabian Pascal reporting a Date/McGoveran posting from 2004, it's unlikely either author would hold the same views today.
 
: note this is Fabian Pascal reporting a Date/McGoveran posting from 2004, it's unlikely either author would hold the same views today.
 
* Links to the Date & McGovern 1994 series of articles (need to dig them out via the wayback machine)
 
* Links to the Date & McGovern 1994 series of articles (need to dig them out via the wayback machine)
: http://www.dbdebunk.citymax.com/page/page/622302.htm -- Part 1 links through to the other parts
+
: http://www.dbdebunk.citymax.com/page/page/622302.htm -- Part 1 links through to the other parts, in a haphazard fashion
 
<blockquote>I am not alone in thinking that the treatment of view updating within the overall theory of relational databases has always been rather unsatisfactory for one reason or another. As Nat Goodman puts it:  "... there is no theory behind any of this.  Each [rule] seems intuitively correct, but there is no overall framework.  It would be better to have a general rule that states what a correct view update algorithm has to do, and then derive the special rules for each case from that general rule.  Without this, the [rules] feel like a crazy patchwork of exceptions and special notes.  In some cases, two or more [rules] are equally sensible ..." (and so on).</blockquote>
 
<blockquote>I am not alone in thinking that the treatment of view updating within the overall theory of relational databases has always been rather unsatisfactory for one reason or another. As Nat Goodman puts it:  "... there is no theory behind any of this.  Each [rule] seems intuitively correct, but there is no overall framework.  It would be better to have a general rule that states what a correct view update algorithm has to do, and then derive the special rules for each case from that general rule.  Without this, the [rules] feel like a crazy patchwork of exceptions and special notes.  In some cases, two or more [rules] are equally sensible ..." (and so on).</blockquote>
 
: http://www.dbdebunk.com/page/page/622150.htm  -- Part 6 is the concluding summary
 
: http://www.dbdebunk.com/page/page/622150.htm  -- Part 6 is the concluding summary

Revision as of 02:02, 7 February 2017

(In draft as at early February 2017.)

It is assumed the reader is familiar with the topic of update through views. 'View' is the term usual in database discussions, as for example in SQL. The ttm term is 'Virtual relvar'. Updating through views is discussed, for example, in both DTATRM and DBE. Most textbooks on the Relational Model or SQL discuss views and at least mention updating through views.

Background

Update through views is a regular discussion topic on the ttm forum. And a regular topic for Chris Date, co-author of The Third Manifesto. He has published many treatments, since particularly a series of articles (with David McGoveran) 1994 [see #References below for publication details]. Arising from that 1994 treatment also was the Principle Of Orthogonal Design, aimed at making update through views more tractable. McGoveran continues to publish on the topic (nowadays separately to Date), emphasises Orthogonal Design as key to updating databases -- albeit a different interpretation from Date. The topic has also been much discussed on older forums, such as comp.database.theory.

There is a wide range of opinions on the desirability, possibility and practicability of update through views:

  • Those who assert update is possible through all views, and the mechanisms merely need refining.
  • Those who prescribe update should be possible through all views "that are theoretically updatable".
    • (The quote is from Codd's 12 rules (1985), rule 6. The narrative for that rule includes a definition of 'theoretically updatable'.)
  • Those who think some but not all views are updatable through, (which is not inconsistent with Codd 1985) and there is merit in a relational dbms being able to support that ability.
    • Note that the SQL standard stipulates some capabilities, which have been somewhat implemented by vendors.
  • Those who think that although some views might support being updated through, that is certainly not all of the views encountered in practice, so the uncertainty and continuing debate do not justify implementing a feature that can at best be only partial.
  • Those who are adamant that (even if some views are updatable-through unproblematically), it is an unnecessary feature, and adds complexity both to the dbms and to developers/schema designers wanting to update the database.
    • (It is true that update to base relvars is always sufficient, and necessarily so: the objective for update through views is to translate updates requested against views into updates against base.)

The "possible approach" propounded in this article is squarely aimed at getting some progress in the middle of this range. Specifically:

  • The debate has tended to be polarised, with the most prominent advocates (Date particularly) at the "all views can be updated through" position.
  • Date's continuing inability to 'solve' the problem (even to his own satisfaction) have lead to many being sceptical.
  • The only major attempt at a partial implementation is SQL. It is beset with limitations and hard-to-explain behaviour. (And subtle differences between different vendors' implementations.) Such that most developers avoid the 'feature', and advise others to do the same. (See, for example, regular questions on StackOverflow.)
  • Consequently there appears to be little thinking that acknowledges some views can be updated through, some not, and seeks to arrive at a precise characterisation of which and why. (And how, for those updates that can be supported.)

Considerations for an approach

[email from the forum]

On Jan 21, 2017, at 2:53 AM, Anthony Clayden <anthony.d.clayden@gmail.com> wrote:

The whole topic has suffered from too much ad-hockery. I have a small and perfectly-formed set of rules in mind.


So my general position:
  • some views are updatable-through; some aren't;
  • some views are INSERTable-through but not DELETEable-through, and vv.
  • rather than putting effort into developing byzantine rules to make more views updatable;
  • the effort should go into agreeing a small set of rules [**] that
a) are clear what's updatable, what isn't;
b) have clear effects for views that are updatable
c) can give clear explanations (compilation error messages) for those that are not.

Note [**] the "rules" were in later discussions re-titled as "prescriptions". The following has been editted accordingly.

I have four-and-a-bit prescriptions. Of which the bit is merely to say that if the four prescriptions don't give a clear answer how to update, then the view is not updatable-through (for that action) -- i.e. c) above.

The prescriptions

The first two prescriptions are standard on this topic [Dayal & Bernstein 1978]'s preserves semantic consistency and no side effects.

The third prescription is really to make view updating tractable and predictable/manageable for the dbms and application developers. The same motivation as for ttm's OO Pre 1 (compile time type checking).

It is the fourth I've long had trouble with -- hence this prologue. When I say "fourth", this has been up to 4 or 5 separate rules in my previous versions. I think I've now found a formulation to condense [D&B 1978]'s unique, no extraneous updates, my own unambiguous; and to find grounds to reject [D&B 1978]'s "uncomfortable dimension" with which I've never been, er, comfortable. The notion -- that both tuples for the update and their attributes be essential -- is cribbed from D,D&F's ETNF paper. [Darwen, Date, Fagin 2012, ETNF = Essential Tuple Normal Form]

There are footnotes for each prescription, but let's state them succinctly first:

  1. At all times all constraints must hold. (rubric: The "Golden Rule".)
  2. Any update through a view must have precisely the effect on the content of the view as if it were a base relation. (rubric: "Assignment Principle")
  3. That a view is updatable-through (for some action INSERT, DELETE, UPDATE, assignment :=) must be determined by reference only to the schema, view's definition, constraints and the specified update effect for that action, per prescription 2. (rubric: "Not dependent on the happenstance of content")
  4. Noting that the update is to be effected by some combination of tuples deleted and inserted into the database (base relations): each such tuple must be necessary to achieving 1. and 2., the set of tuples (deletes and inserts) must be sufficient, and each attribute value must be similarly necessary. (rubric: essential tuples, essential attribute values.)
  5. If there is no set of necessary, sufficient tuples/attributes per 4., that view is not updatable-through for that action.

Footnotes to the prescriptions

  1. Knowing all constraints must hold, the compiler can:
    a) rely on them all holding before it plans the update steps;
    b) examine them (in the catalogue or otherwise) to guide the update.
  2. I'm not expecting the update actions to be necessarily limited to the traditional INSERT/DELETE/UPDATE/assignment(:=).
    Opportunity for new forms of update. (I've toyed with a few on the forum.)
  3. The consequence is that the algorithm for update be derivable at compile time.
    (So could be implemented as a stored procedure.)
  4. Essential tuple means that it is necessary for those tuples to be deleted/inserted to comply with 1., 2 for all valid states of the database.
    Essential attribute value means that value is the only possible to comply with 1., 2 for all valid states of the database.
    Sufficient requires that if there are two (or more) different sets of tuple actions/values that comply:
    • and one is a superset of another (deletes and inserts taken separately), then the tuples in the difference are not necessary (not essential).
    • or the sets only partially overlap (and their intersection is not sufficient), then the tuples unique to each set are not necessary (not essential).


The ETNF paper's use of essential tuple is (loosley speaking) that without that tuple some constraint would be violated, and we couldn't 'recreate' that tuple from other content to satisfy the constraint. I'm extending that idea: without that tuple (inserted or deleted) in the base some constraint would be violated and/or the update would not have precisely the effect requested for the view.

References

  • Codd's 12 rules -- Rule 6 and definition, see also discussion under Rule 9.
(The bare rules just prescribe.)
http://computing.derby.ac.uk/c/codds-twelve-rules/
  • Dayal and Bernstein 1978 "On the Updatability of Relational Views”
https://pdfs.semanticscholar.org/39f4/853a35a913c714c5f3956dbefd02916447c4.pdf
references a 1975 paper Chamberlin, Gray, Traiger
https://www.computer.org/csdl/proceedings/afips/1975/5083/00/50830425.pdf
  • Bancilhon and Spyratos 1981 "Update Semantics of Relational VIews"
http://pages.cs.wisc.edu/~jhuang/qual/update-semantics-of-views.pdf
  • Darwen, Date, Fagin 2012 "A Normal Form for Preventing Redundant Tuples in Relational Databases"
http://researcher.watson.ibm.com/researcher/files/us-fagin/icdt12.pdf
  • An example SQL implementation (SQL-Server) reference manual
https://msdn.microsoft.com/en-us/library/ms187956.aspx#Anchor_3
  • For another round of the continuing debate
http://www.dbdebunk.com/2016/12/on-view-updating-c-j-date-and-d.html
note this is Fabian Pascal reporting a Date/McGoveran posting from 2004, it's unlikely either author would hold the same views today.
  • Links to the Date & McGovern 1994 series of articles (need to dig them out via the wayback machine)
http://www.dbdebunk.citymax.com/page/page/622302.htm -- Part 1 links through to the other parts, in a haphazard fashion
I am not alone in thinking that the treatment of view updating within the overall theory of relational databases has always been rather unsatisfactory for one reason or another. As Nat Goodman puts it: "... there is no theory behind any of this. Each [rule] seems intuitively correct, but there is no overall framework. It would be better to have a general rule that states what a correct view update algorithm has to do, and then derive the special rules for each case from that general rule. Without this, the [rules] feel like a crazy patchwork of exceptions and special notes. In some cases, two or more [rules] are equally sensible ..." (and so on).
http://www.dbdebunk.com/page/page/622150.htm -- Part 6 is the concluding summary
Comments On Republication: [on dbdebunk] Originally published in Database Programming & Design 7, No. 6 (June 1994) and published as a two-part article in RELATIONAL DATABASE WRITINGS1991-94. It is republished here by permission of David McGoveran, Miller Freeman Inc. and Pearson Education, Inc. © All rights reserved by C.J. Date. Research has shown that certain detail level corrections might be needed, which we may undertake in the future. However, we still believe strongly that the overall approach is sound.

Musings: a possible comparison

On 3/02/2017, at 9:06 PM, Dave Voorhis <dave@armchair.mb.ca> wrote:

The essence of the debate is perhaps something like this:

The update-through-views proponent sees this...

VAR v VIEW ...

…and feels it is perfectly reasonable to allow

UPDATE v WHERE Name = ‘Dave Vorhis’: {Name := ‘Dave Voorhis’}.

The update-through-views opponent sees the above, and feels it is akin to...

#define e (int)((p / n + sqrt(r) * 0.156) * pow(q, 3))

…and expecting an assignment like e := 2 to meaningfully update p, n, r and q.

The update-through-views proponent sees the above and says no, it’s really akin to…

a[e] := 2;

…which is fine, isn’t it?

The update-through-views opponent disagrees that it’s akin to updating an array.

(Speaking as an update-through-views-sometimes proponent:)

I've never been happy likening update-through-view to update-to-array. Because there's only one way to update an array, and if two different expressions are targetting the same cell, that's referentially transparent.

An array you can't change the size/shape merely by assigning a new value; you can't at run time insert/delete cells (indexes) in the middle; you can't add an extra dimension. (Changing shape would be something like renumbering the indexes in an array with rationals between the integers, and leaving 'gaps'.)

Contrast that with updating relations -- even just base ones -- there are expressions that target the same tuple, but in non-obvious ways:

  • S WHERE S# = 'S1';
  • S WHERE SNAME = 'Smith';

You can update that base, and those same expressions no longer target the same tuple:

  • UPDATE S WHERE SNAME = 'Smith' : {SNAME := 'Smythe'};
  • INSERT S REL{TUP{S# 'S17', SNAME 'Smith', ...}};

(Indeed the idea of "the same tuple" is a nonsense, unlike an array cell. I mean: no longer return the same result as each other.)

You can update a relation to change it's size/shape, such that expressions that used to target a single tuple now target multiple tuples, or no tuples at all.

Note all this applies just with update to base relations.

If we're looking for a conventional data structure "akin to" relations, could I suggest a doubly-linked list:

  • There are expressions that target the same cell, but in non-obvious ways
left(left(left(L))) === right(right(right(right(L))))
  • You can at run time insert/delete cells, which changes the list's size/shape, and
  • thereby those expressions no longer target the same cell.
  • You can 'target' cells by content: form a list by filtering.
  • For extra fun, consider that each cell could itself be a doubly-linked list.

So what would be the doubly-linked equivalent of update-through-views vs to-base? Perhaps targetting through the left vs through the right. Both are feasible; and for every leftwards update there's an equivalent rightwards update. But each time you insert/delete cells you have to recalibrate those equivalences.

You could also target by content an update to doubly-linked list: insert before the leftmost cell value 'XYZ'; delete 3 cells to the right of the rightmost cell value 'ABC'.


Worked examples

[todo: move to a separate wiki page; expand with other examples from discussions; precisely specify the rewrite rules and qualifying conditions]

JOINs

A) INSERT through 1:many JOIN with IND Supported for update (algorithm given before).

B) INSERT through 1:many JOIN without IND Not supported: an INSERT on the 1 side could MATCH tuples already in the many side, that would thereby 'appear' in the result; so breaking Assignment Principle.

C) DELETE through 1:many JOIN (with or without IND) Supported: delete from the many side only. Deleting from the 1 side might 'unMATCH' non-targetted tuples on the many side, that would thereby 'disappear' from the result. If no such tuples, we might DELETE the 1 side, but that's not essential.

D) DELETE through 1:1 JOIN with two-way IND (i.e. EQD) Supported: delete from both sides.

E) DELETE through 1:1 JOIN with a single IND Suppoerted: delete from the included side only.

F) DELETE through 1:1 join without IND Not supported: could be achieved by deleting from either side only or both. Then neither tuple DELETE is essential.

Projections

Projections are awkward for updating through. Push projections towards the leaves as far as possible, for example:

  • For [NOT] MATCHING, which needs a projection and JOIN
Appendix A gives: re1 MATCHING re2 ==> (re1 JOIN re2) { attribs of re1 }
Instead: re1 MATCHING re2 ==> re1 JOIN (re2 { attribs in common } )
  • For projections that are really implementing RENAME, change to RENAME
(EXTEND re : { B := A }) {ALL BUT A} ==> (re RENAME A AS B)
  • For no-op projections (i.e. is projecting all attributes), all updates are supported.
Rewrite updop (re{ ... }) ...; ==> updop re ...;

G) DELETE through projection Supported (whether or not the projection is key-preserving.) DELETE (R{ ... }) WHERE p; ==> DELETE R WHERE p; DELETE (R{ ... }) v; ==> DELETE R (R MATCHING v); We must DELETE exactly those base tuples to achieve the DELETE of the proj. We can't DELETE too many, by those rewrites.

H) INSERT through projection Not valid. (No surprise.) Therefore

I) assign (:=) through projection Not valid.

J) UPDATE through projection Valid providing the projection is key-preserving (i.e. at least one key). UPDATE (R{ ... }) WHERE p : assigns; ==> UPDATE R WHERE p : assigns; Note: the assigns could be changing the key.

Valid even if the projection is not key-preserving, providing the assigns are not changing attribs in the based-on's key (for at least one key).