Catalog

TTM and the Catalog

While the concept of a DBMS having a catalog is almost universally accepted, The Third Manifesto (TTM) has precious little to say about it. Some ramblings to explore why this is so, and perhaps inevitably so ...

Definition

"The catalog" is usually understood to be a database in which information about other databases is recorded (this does not preclude the possibility that the catalog also contains similar information about itself). However, the scope covered by the term "information" may differ vastly between systems, and the detailed nature of what is to be recorded may differ as well. That makes it unrealistic to try and prescribe in detail what could be called a "logical schema for the catalog". It does not mean that no "common ground" can be found regarding what should reasonably be findable in the catalog for a given D. An overview follows.

Types-and-operators systems

> The mere existence of a type, and the details of all its possreps.

> Technical details about the implementation of the type.

> If the system supports the inheritance model (IM), the sub-/supertype relationships between those types.

> The inventory of operators available for operating on values of those types.

> Technical details about the implementation of those operators.

Below we mention some issues that make it an iffy proposition to try and prescribe in detail what a catalog should look like in a D.

> Type names. Types are named sets of values, and it is only obvious that a type's existence should be documented in the catalog. So the catalog should keep a record of existing types and their names. So a type name should appear as a value of an attribute in a tuple in a catalog relvar defining types. So there should be a type for that name. But prescribing what type that should be amounts to prescribing what should and should not be a valid name in a particular D. That is not something TTM wants to do.

> Physical implementation details. The separation of the physical from the logical exists to give the implementer the freedom to offer whatever features he likes at the physical level. That makes it by definition impossible to prescribe anything regarding what a catalog should contain for physical implementation details. By way of example, one system could have its type and/or operator implementations packaged as entry points in a JAR or DLL on the local file system. Another system could be using a shared types server for that purpose.

Logical descriptions of the database

> The definitions of the database relvars. This concerns base relvars as well as virtual relvars, if the system supports the latter.

> The definitions of all constraints applicable to those relvars.

Once again, below are some issues that make it an iffy proposition to try and prescribe in detail what a catalog should look like in a D.

> Details to document. Since the raison d'être for a relvar in a database is essentially nothing else but its relvar predicate, it seems only natural that a catalog should include this vital piece of information. Hardly any systems exist that actually do that, however. Most systems regard it as a piece of "documentation" that can equally well be kept outside the catalog.

> To enforce or not to enforce certain design practices. The practice of maintaining "data dictionaries" may be less widespread than it once used to be, but it has not entirely lost its flock of followers. Typically, data dictionaries encourage the practice of using the same name for the same business element. Typically, the practice will also encourage using the same data type for the same business element. In a catalog describing database relvars, it is fairly self-evident that a relation schema (relvarname, attrname, typename) will exist, somewhere, somehow. Said data dictionary practice then boils down to the question of whether the FD ATTRNAME->TYPENAME applies to this schema or not (and to the question of whether normalization should then split this schema in two). The practice will have its adepts as well as its opponents, with not much objective material available for settling the issue once and for all.
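The question of whether the FD ATTRNAME->TYPENAME holds can be made concrete with a small sketch. Everything below is hypothetical: the relvar, attribute and type names are made up for illustration, and the catalog relation is modelled as a plain Python set of tuples rather than as anything a real D would expose. The sketch checks the FD and shows that splitting the schema in two is only lossless when the FD actually holds.

```python
from collections import defaultdict

# Hypothetical catalog relation with heading (relvarname, attrname, typename).
ATTRIBUTES = {
    ("EMP", "EMP_ID", "INTEGER"),
    ("EMP", "NAME", "CHAR"),
    ("DEPT", "DEPT_ID", "INTEGER"),
    ("DEPT", "NAME", "CHAR"),
    ("PROJECT", "NAME", "INTEGER"),  # deliberately breaks ATTRNAME -> TYPENAME
}


def fd_violations(rows):
    """Return {attrname: typenames} for attribute names that map to more than one type."""
    types_by_attr = defaultdict(set)
    for _relvar, attr, typ in rows:
        types_by_attr[attr].add(typ)
    return {attr: types for attr, types in types_by_attr.items() if len(types) > 1}


def split(rows):
    """Project onto (relvarname, attrname) and (attrname, typename)."""
    return {(r, a) for r, a, _t in rows}, {(a, t) for _r, a, t in rows}


def join(relvar_attr, attr_type):
    """Natural join of the two projections on attrname."""
    return {(r, a, t) for r, a in relvar_attr for a2, t in attr_type if a == a2}


violations = fd_violations(ATTRIBUTES)
lossless = join(*split(ATTRIBUTES)) == ATTRIBUTES
```

With the conflicting PROJECT tuple present, `violations` reports NAME and the split-then-join round trip produces spurious tuples, so `lossless` is false. Remove that tuple and the decomposition becomes lossless, which is exactly what a system enforcing the data dictionary practice would be relying on.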

> Categorisation of constraints. People like to compartmentalize, and database designers are rarely any different. Looking at a given database design, lots of constraints will apply to this design, and one trick in the bag for keeping this bunch of constraints "manageable" is to divide them into "classes". Single-relvar constraints are often distinguished from multi-relvar constraints, for example. For reasons of familiarity, the special class of "foreign key" constraints might be isolated from all the other multi-relvar constraints as a class in its own right. Some systems have a distinct class of "single-tuple constraints". The benefits of this or that particular classification of constraints are largely in the eye of the beholder. Any prescription in this area would amount mostly to the prescriber forcing his private views/preferences/biases on the others.
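As an illustration only, here is a sketch of the single-relvar versus multi-relvar distinction, assuming a hypothetical catalog that records which relvars each constraint references. The constraint and relvar names are invented. The classification is computed from that neutral information rather than stored, which is one way a system could offer a classification without the catalog prescribing it.

```python
# Hypothetical catalog content: constraint name -> relvars referenced by its expression.
CONSTRAINT_RELVARS = {
    "EMP_KEY": {"EMP"},
    "SALARY_POSITIVE": {"EMP"},
    "EMP_DEPT_FK": {"EMP", "DEPT"},
    "BUDGET_COVERS_SALARIES": {"EMP", "DEPT", "PROJECT"},
}


def classify(relvars):
    """Classify a constraint by the number of relvars it references."""
    if not relvars:
        raise ValueError("this sketch assumes every constraint references a relvar")
    return "single-relvar" if len(relvars) == 1 else "multi-relvar"


classes = {name: classify(relvars) for name, relvars in CONSTRAINT_RELVARS.items()}
```

A different beholder could just as easily derive a "foreign key" class or a "single-tuple" class from the same catalog content, using whatever criteria suit that beholder.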

Details of the security system

> Identity of the users who are allowed to access the system

> Details of the authentication procedures and credentials per user

> Details of who is allowed to do what on/to the database

Administrative data

> Statistics about data volumes and distribution of values used in the database

Details of the physical design of the database

> Storage volumes, locations of physical files, ...

> Inventory of data record types and indexes recorded in those files

> If storage is relegated to an underlying SQL system, connection details of that SQL engine

(Ab)Using the catalog

The mere fact of having information about physical implementation details recorded in the catalog, combined with the catalog having to be available for querying, by definition opens a backdoor for breaking the separation of the logical from the physical. User programs always have a means at their disposal to "introspect" into the physical details of the system they're working with. Maintaining the logical/physical separation is therefore also a matter of the user not using the catalog for such purposes.
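To make the backdoor tangible, here is a small sketch using SQLite through Python's standard `sqlite3` module. SQLite is not a D, and it is used here only because its catalog table `sqlite_master` is queryable like any other table. The table and index names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER)")
conn.execute("CREATE INDEX emp_by_dept ON emp (dept_id)")  # a purely physical design choice

# A user program peeking at physical design through the catalog.
indexes = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'emp'"
    )
]

# Branching on the result couples the program to the physical level.
query_strategy = "filter on dept_id" if "emp_by_dept" in indexes else "avoid dept_id filters"
```

Nothing in the system stops the program from making that last decision. If the DBA later drops or renames the index, the program's behaviour changes even though the logical design did not.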

Updating the catalog

It is common for database systems to make the DBA use a separate language, a data definition language (DDL), for the purpose of creating and maintaining the structure of databases. In such a scenario, "updating the catalog" is not an action performed directly by the DBA, but is in a sense merely a byproduct of the DBA using this DDL. Not all systems work this way though, and it turns out to be perfectly possible to expose the catalog directly as an updateable database (updateable just like any other "regular" user database, that is).
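The first scenario can be observed in SQLite, used here purely as a familiar example of a system that takes the DDL route (it is not a D). In the sketch below, a `CREATE TABLE` statement makes a row appear in the catalog table `sqlite_master` as a byproduct, while an attempt to update that catalog table directly is refused under SQLite's default settings. The table name is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def catalog_tables(connection):
    """List the table names currently recorded in SQLite's catalog."""
    return [
        row[0]
        for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]


before = catalog_tables(conn)
conn.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT)")  # DDL
after = catalog_tables(conn)  # the catalog changed as a byproduct of the DDL

# With default settings, SQLite does not accept direct updates to its catalog.
try:
    conn.execute("DELETE FROM sqlite_master WHERE name = 'dept'")
    direct_update_rejected = False
except sqlite3.Error:
    direct_update_rejected = True
```

A system of the second kind would instead accept the direct update and treat it as the definition (or removal) of the relvar itself.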