Update through views: a possible approach

From D Wiki
Revision as of 11:51, 6 February 2017 by AntC (Talk | contribs) (Worked examples)

Jump to: navigation, search

(In draft as at early February 2017.)

It is assumed the reader is familiar with the topic of update through views. 'View' is the term usual in database discussions, as for example in SQL. The ttm term is 'Virtual relvar'. Updating through views is discussed, for example, in both DTATRM and DBE. Most textbooks on the Relational Model or SQL discuss views and at least mention updating through views.

Background

Update through views is a regular discussion topic on the ttm forum. And a regular topic for Chris Date, co-author of The Third Manifesto. He has published many treatments, since particularly a series of articles (with David McGoveran) 1994 [published on dbdebunk]. Arising from that 1994 treatment also was the Principle Of Orthogonal Design, aimed at making update through views more tractable. McGoveran continues to publish on the topic (nowadays separately to Date), emphasises Orthogonal Design as key to updating databases -- albeit a different interpretation from Date. The topic has also been much discussed on older forums, such as comp.database.theory.

There is a wide range of opinions on the desirability, possibility and practicability of update through views:

  • Those who assert update is possible through all views, and the mechanisms merely need refining.
  • Those who prescribe update should be possible through all views "that are theoretically updatable".
    • (The quote is from Codd's 12 rules (1985), rule 6. The narrative for that rule includes a definition of 'theoretically updatable'.)
  • Those who think some but not all views are updatable through, (which is not inconsistent with Codd 1985) and there is merit in a relational dbms being able to support that ability.
    • Note that the SQL standard stipulates some capabilities, which have been somewhat implemented by vendors.
  • Those who think that although some views might support being updated through, that is certainly not all of the views encountered in practice, so the uncertainty and continuing debate do not justify implementing a feature that can at best be only partial.
  • Those who are adamant that (even if some views are updatable-through unproblematically), it is an unnecessary feature, and adds complexity both to the dbms and to developers/schema designers wanting to update the database.
    • (It is true that update to base relvars is always sufficient, and necessarily so: the objective for update through views is to translate updates requested against views into updates against base.)

The "possible approach" propounded in this article is squarely aimed at getting some progress in the middle of this range. Specifically:

  • The debate has tended to be polarised, with the most prominent advocates (Date particularly) at the "all views can be updated through" position.
  • Date's continuing inability to 'solve' the problem (even to his own satisfaction) have lead to many being sceptical.
  • The only major attempt at a partial implementation is SQL. It is beset with limitations and hard-to-explain behaviour. (And subtle differences between different vendors' implementations.) Such that most developers avoid the 'feature', and advise others to do the same. (See, for example, regular questions on StackOverflow.)
  • Consequently there appears to be little thinking that acknowledges some views can be updated through, some not, and seeks to arrive at a precise characterisation of which and why. (And how, for those updates that can be supported.)

Considerations for an approach

[email from the forum]

On Jan 21, 2017, at 2:53 AM, Anthony Clayden <anthony.d.clayden@gmail.com> wrote:

The whole topic has suffered from too much ad-hockery. I have a small and perfectly-formed set of rules in mind.


So my general position:
  • some views are updatable-through; some aren't;
  • some views are INSERTable-through but not DELETEable-through, and vv.
  • rather than putting effort into developing byzantine rules to make more views updatable;
  • the effort should go into agreeing a small set of rules [**] that
a) are clear what's updatable, what isn't;
b) have clear effects for views that are updatable
c) can give clear explanations (compilation error messages) for those that are not.

Note [**] the "rules" were in later discussions re-titled as "prescriptions". The following has been editted accordingly.

I have four-and-a-bit prescriptions. Of which the bit is merely to say that if the four prescriptions don't give a clear answer how to update, then the view is not updatable-through (for that action) -- i.e. c) above.

The prescriptions

The first two prescriptions are standard on this topic [Dayal & Bernstein 1978]'s preserves semantic consistency and no side effects.

The third prescription is really to make view updating tractable and predictable/manageable for the dbms and application developers. The same motivation as for ttm's OO Pre 1 (compile time type checking).

It is the fourth I've long had trouble with -- hence this prologue. When I say "fourth", this has been up to 4 or 5 separate rules in my previous versions. I think I've now found a formulation to condense [D&B 1978]'s unique, no extraneous updates, my own unambiguous; and to find grounds to reject [D&B 1978]'s "uncomfortable dimension" with which I've never been, er, comfortable. The notion -- that both tuples for the update and their attributes be essential -- is cribbed from D,D&F's ETNF paper. [Darwen, Date, Fagin 2012, ETNF = Essential Tuple Normal Form]

There are footnotes for each prescription, but let's state them succinctly first:

  1. At all times all constraints must hold. (rubric: The "Golden Rule".)
  2. Any update through a view must have precisely the effect on the content of the view as if it were a base relation. (rubric: "Assignment Principle")
  3. That a view is updatable-through (for some action INSERT, DELETE, UPDATE, assignment :=) must be determined by reference only to the schema, view's definition, constraints and the specified update effect for that action, per prescription 2. (rubric: "Not dependent on the happenstance of content")
  4. Noting that the update is to be effected by some combination of tuples deleted and inserted into the database (base relations): each such tuple must be necessary to achieving 1. and 2., the set of tuples (deletes and inserts) must be sufficient, and each attribute value must be similarly necessary. (rubric: essential tuples, essential attribute values.)
  5. If there is no set of necessary, sufficient tuples/attributes per 4., that view is not updatable-through for that action.

Footnotes to the prescriptions

  1. Knowing all constraints must hold, the compiler can:
    a) rely on them all holding before it plans the update steps;
    b) examine them (in the catalogue or otherwise) to guide the update.
  2. I'm not expecting the update actions to be necessarily limited to the traditional INSERT/DELETE/UPDATE/assignment(:=).
    Opportunity for new forms of update. (I've toyed with a few on the forum.)
  3. The consequence is that the algorithm for update be derivable at compile time.
    (So could be implemented as a stored procedure.)
  4. Essential tuple means that it is necessary for those tuples to be deleted/inserted to comply with 1., 2 for all valid states of the database.
    Essential attribute value means that value is the only possible to comply with 1., 2 for all valid states of the database.
    Sufficient requires that if there are two (or more) different sets of tuple actions/values that comply:
    • and one is a superset of another (deletes and inserts taken separately), then the tuples in the difference are not necessary (not essential).
    • or the sets only partially overlap (and their intersection is not sufficient), then the tuples unique to each set are not necessary (not essential).


The ETNF paper's use of essential tuple is (loosley speaking) that without that tuple some constraint would be violated, and we couldn't 'recreate' that tuple from other content to satisfy the constraint. I'm extending that idea: without that tuple (inserted or deleted) in the base some constraint would be violated and/or the update would not have precisely the effect requested for the view.

References

  • Codd's 12 rules -- Rule 6 and definition, see also discussion under Rule 9.
(The bare rules just prescribe.)
http://computing.derby.ac.uk/c/codds-twelve-rules/
  • Dayal and Bernstein 1978 "On the Updatability of Relational Views”
https://pdfs.semanticscholar.org/39f4/853a35a913c714c5f3956dbefd02916447c4.pdf
references a 1975 paper Chamberlin, Gray, Traiger
https://www.computer.org/csdl/proceedings/afips/1975/5083/00/50830425.pdf
  • Bancilhon and Spyratos 1981 "Update Semantics of Relational VIews"
http://pages.cs.wisc.edu/~jhuang/qual/update-semantics-of-views.pdf
  • Darwen, Date, Fagin 2012 "A Normal Form for Preventing Redundant Tuples in Relational Databases"
http://researcher.watson.ibm.com/researcher/files/us-fagin/icdt12.pdf

Worked examples

[todo: move to a separate wiki page; expand with other examples from discussions; precisely specify the rewrite rules and qualifying conditions]

JOINs

A) INSERT through 1:many JOIN with IND Valid for update (algorithm given before).

B) INSERT through 1:many JOIN without IND Not valid: an INSERT on the 1 side could MATCH tuples already in the many side, that would thereby 'appear' in the result; so breaking Assignment Principle.

C) DELETE through 1:many JOIN (with or without IND) Valid: delete from the many side only. Deleting from the 1 side might 'unMATCH' non-targetted tuples on the many side, that would thereby 'disappear' from the result. If no such tuples, we might DELETE the 1 side, but that's not essential.

D) DELETE through 1:1 JOIN with two-way IND (i.e. EQD) Valid: delete from both sides.

E) DELETE through 1:1 JOIN with a single IND Valid: delete from the included side only.

F) DELETE through 1:1 join without IND Not valid: could be achieved by deleting from either side only or both. Then neither tuple DELETE is essential.

Projections

G) DELETE through projection Valid (whether or not the projection is key-preserving.) DELETE (R{ ... }) WHERE p; ==> DELETE R WHERE p; DELETE (R{ ... }) v; ==> DELETE R (R MATCHING v); We must DELETE exactly those base tuples to achieve the DELETE of the proj. We can't DELETE too many, by those rewrites.

H) INSERT through projection Not valid. (No surprise.) Therefore

    • Except where this is a no-op projection (i.e. is projecting all attributes).

I) assign (:=) through projection Not valid.

J) UPDATE through projection Valid providing the projection is key-preserving (i.e. at least one key). UPDATE (R{ ... }) WHERE p : assigns; ==> UPDATE R WHERE p : assigns; Note: the assigns could be changing the key.

Valid even if the projection is not key-preserving, providing the assigns are not changing attribs in the based-on's key (for at least one key).