Thursday, December 15, 2005

Marks

From The Man Who Wasn't There: problems of missing or partially missing data in geoscience databases:

"In the literature discussions between Codd and Date on the propriety or otherwise of NULLs in relational databases, there seems to have been some confusion on both sides, on one very important question. That is the distinction between database representation and function evaluation. NULLs are one approach to the problem of handling missing data within the database."

So in this respect RDF is great - you don't have to come up with a value or values to represent missing data. You only have to worry about function evaluation.

"In fact the Codd 'mark' solution does not in itself require, as unfortunately implied by Codd himself, and vigorously attacked by Date, the use of 3- or 4-valued logic, and therefore cannot be dismissed so easily. Relational database theory is based on first-order predicate logic, which uses two truth values TRUE and FALSE. If there is no value for a data item, then the logical statement corresponding to the tuple containing that item can simply omit any mention of that particular column. If the value of this data item is required in an operation, then there is only one truth value which can be returned: FALSE. This applies to database set operations such as JOINs and also to numerical operations such as totals and averages where the absence of any required data value prevents the computation from being carried out. If a total or average is required in such a situation, then the problem can be circumvented only by first selecting non-absent data. This is the correct treatment, to ensure that statistics are computed on a valid data set."

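A rough sketch of that treatment in Python may help. The sample rows, the column name depth_m and the helpers holds and present are all made up for illustration; nothing here comes from the paper. A missing value is modelled by leaving the key out of the row, any test that needs it comes back False, and the average is taken only after selecting the rows where the value is present.

```python
from statistics import mean

# Hypothetical sample rows. A missing value is modelled by omitting the key,
# not by storing a NULL placeholder.
samples = [
    {"id": "S1", "depth_m": 10.0},
    {"id": "S2"},  # depth was never recorded
    {"id": "S3", "depth_m": 30.0},
]


def holds(row, column, predicate):
    """Two-valued evaluation: False whenever the required value is absent."""
    return column in row and predicate(row[column])


def present(rows, column):
    """Select only the rows that actually carry a value for `column`."""
    return [row for row in rows if column in row]


# S2 satisfies neither the test nor its negation; both simply return False.
deeper_than_5 = [r["id"] for r in samples if holds(r, "depth_m", lambda d: d > 5)]
not_deeper_than_5 = [r["id"] for r in samples if holds(r, "depth_m", lambda d: not d > 5)]

# The average is computed on the non-absent data only.
average_depth = mean(r["depth_m"] for r in present(samples, "depth_m"))
```

Note that S2 ends up in neither list, which is the price of staying with two truth values: absence is treated as "cannot be asserted", not as a third answer.
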
Tuple marks are another example of marks.

SH writes in about incomplete data in observational science databases, the open world assumption, three-valued logic (3VL) and NULL. McGoveran responds: "In a scientific database such as the type to which you allude, a reasonable interpretation of True and False under CWA is "valid by experiment and consistent with hypotheses" and "not validated by experiment or inconsistent with hypotheses". If you give this differentiation up with CWA and nulls, you've given up scientific reasoning and the scientific method."

In Kowari, if you have the following triples: _b1, <urn:sno>, "S1"; _b2, <urn:sno>, "S2"; _b3, <urn:pno>, "P1".

And perform the following query:
select $s1
...
where $s1 <urn:sno> $o1 or $s2 <urn:pno> $o2;

It returns: _b1, _b2, null (the null is really an unconstrained variable).

However, if you select $s2 instead it returns: null, _b3. Using the idea quoted above, where a row that lacks a required value is simply left out, it would return _b1, _b2 for the first query and _b3 for the second.
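
To make the two behaviours concrete, here is a small Python model of the example. It is only a sketch that reproduces the results described in this post, not Kowari's actual implementation; the functions match, disjunction and select and the keep_unbound flag are invented for illustration, and None stands in for null.

```python
# The three triples from the example (blank nodes written as plain strings).
triples = [
    ("_b1", "urn:sno", "S1"),
    ("_b2", "urn:sno", "S2"),
    ("_b3", "urn:pno", "P1"),
]


def match(subject_var, predicate, object_var):
    """Bind the two variables against every triple with the given predicate."""
    return [{subject_var: s, object_var: o} for s, p, o in triples if p == predicate]


def disjunction(left, right):
    """Union of both sides; a variable a row does not bind is left as None."""
    variables = {v for row in left + right for v in row}
    return [{v: row.get(v) for v in variables} for row in left + right]


def select(rows, var, keep_unbound=True):
    """Project one variable, removing duplicates but keeping first-seen order."""
    values = list(dict.fromkeys(row[var] for row in rows))
    if keep_unbound:
        return values  # unbound rows survive as None
    return [v for v in values if v is not None]  # unbound rows are dropped


rows = disjunction(match("$s1", "urn:sno", "$o1"), match("$s2", "urn:pno", "$o2"))

current_s1 = select(rows, "$s1")                      # ['_b1', '_b2', None]
current_s2 = select(rows, "$s2")                      # [None, '_b3']
marks_s1 = select(rows, "$s1", keep_unbound=False)    # ['_b1', '_b2']
marks_s2 = select(rows, "$s2", keep_unbound=False)    # ['_b3']
```

The only difference between the two behaviours is whether the projection keeps rows where the selected variable was never constrained.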

I'm not sure I really like this solution. Preserving unknown seems to make more sense, as demonstrated in How FirstSQL Solves the EXISTS and Other Problems.
