More News: 03/01/2005

Thursday, March 31, 2005

You are the Same, Just Like Everybody Else

Two follow-ups from today's post:
* OWL Cardinality "If minimum cardinality means nothing in an open world, then there was very little point in including it in the OWL language. Whoever did so probably forgot about the open world assumption at the time. I can't blame them, as I find it very easy to forget. So I think I should assume that the intent was for this to operate in a closed world assumption. Similarly, I have to assume the same about the intent of maximum cardinality."

* An inconsistency or not? "As you can see from the above sentences, Harry is a Person.
And, by r1, Harry can have exactly one value for hasFather property.
But, it is asserted that Harry has two different
values, John & Johnny, for hasFather property.

I can't decide which of the following is true.

A. John and Johnny are actually the same individual.
B. The above sequence of sentences are inconsistent."

Why Different Things are the Same

I was reading what I think is a good introductory book, "A Semantic Web Primer", at the same time as reading "OWL and Cardinality".

In the Primer book it states: "...the non-unique-names assumption is the most plausible one to make on the World Wide Web..." Which I didn't think made sense.

In Cardinality: "The difficulty in checking this for consistency is that an <owl:sameAs> statement can make it all valid:
<ns:firstPropertyObject> <owl:sameAs> <ns:secondPropertyObject>" Which did make sense.

The thing that eventually struck me was that the assumption Paul makes, and I have made, is something very common - names are unique. However, this is not the case in OWL: "Just because two names are different does not mean they refer to different individuals."

So using cardinality restrictions, systems can infer that if you have two values for a property when the ontology says you should have one, it's not that you've broken cardinality, it's that those values are actually the same individual. In Paul's example sameAs isn't required - firstPropertyObject and secondPropertyObject are inferred to be the same. This seems weird, completely backwards and very non-intuitive.
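To make the inference concrete, here is a minimal sketch in Python, assuming a property with a maximum cardinality of one and no unique name assumption. The Individuals class and apply_max_cardinality_one function are made up for illustration, and it does a single pass only; it is not how Pellet or any other reasoner is implemented.

```python
class Individuals:
    """Union-find over names: names in the same set denote the same individual."""

    def __init__(self):
        self.parent = {}

    def find(self, name):
        self.parent.setdefault(name, name)
        while self.parent[name] != name:
            # Path halving keeps later lookups short.
            self.parent[name] = self.parent[self.parent[name]]
            name = self.parent[name]
        return name

    def same_as(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def apply_max_cardinality_one(triples, prop, individuals):
    """For each subject, treat every value of `prop` as one and the same individual."""
    first_value = {}
    for subject, predicate, value in triples:
        if predicate != prop:
            continue
        subject = individuals.find(subject)
        if subject in first_value:
            # A second value is not an error: it is inferred equal to the first.
            individuals.same_as(first_value[subject], value)
        else:
            first_value[subject] = value
    return individuals


triples = [("Harry", "hasFather", "John"), ("Harry", "hasFather", "Johnny")]
inds = apply_max_cardinality_one(triples, "hasFather", Individuals())
print(inds.find("John") == inds.find("Johnny"))  # True
```

With only these statements, answer A from the question above is what falls out: John and Johnny end up as the same individual.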

Looking around I found Pellet: An OWL DL Reasoner which says: "In general, a Semantic Web reasoner should handle individuals (provide ABox reasoning), should not make the Unique Name Assumption..."

It doesn't really say, though, why you need non-unique names. It does seem perverse that the basis for this, the URI, is unique.

In fact, the opposite case seems to be well stated in "OWL Flight vs. OWL DL": "We argue that very few equalities can actually be resolved with reasoning and that many derived equalities are actually faulty. Thus, it makes more sense in our opinion to either resolve equalities beyond the logical language or to make strong assumptions on the available knowledge, i.e. assume that each identifier in the knowledge base uniquely identifies an individual..."

It was resolved as an issue by the Web Ontology group by requiring the use of AllDifferent, where a collection is used to hold all your distinct items. This seems like a decision to make the usual case more difficult and less scalable.
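A rough sketch of what AllDifferent changes, in Python. The function names are hypothetical and the pairwise expansion is only one naive way to read an AllDifferent collection (a real reasoner need not materialize the pairs), but it shows both the inconsistency and why the scalability worry comes up.

```python
from itertools import combinations


def all_different(names):
    """Expand an AllDifferent collection into pairwise 'different from' facts."""
    return {frozenset(pair) for pair in combinations(names, 2)}


def check_functional(values, different):
    """Judge the values one subject has for a max-cardinality-1 property."""
    for pair in combinations(values, 2):
        if frozenset(pair) in different:
            return "inconsistent"
    return "same individual"


fathers = ["John", "Johnny"]
print(check_functional(fathers, set()))                   # same individual
print(check_functional(fathers, all_different(fathers)))  # inconsistent
print(len(all_different(range(1000))))                    # 499500
```

Read this way, a thousand distinct items imply n(n-1)/2 = 499,500 pairwise facts.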

Saturday, March 12, 2005

Making Money from the Long Tail

The long tail of software. Millions of Markets of Dozens. "You know the real reason Excite went out of business? We couldn’t figure out how to make money from 97% of our traffic. We couldn’t figure out how to make money from the long tail – from those queries asked only once a day."

"57% of Amazon’s sales come from books you can’t even buy at a Barnes and Noble (to be fair, there is some skepticism around this number voiced here). This runs totally counter to the traditional 80/20 rule in retailing – that 80% of your sales come from 20% of your inventory. In Amazon’s case, 57% of their book revenue comes from 0% of Barnes and Nobles inventory."

"iTunes has over one million songs in it’s catalog. You know how many have been bought at least once?

Every one."

Promising Project

A scalable environment for the Semantic Web "Taking a bottom-up approach to the development of the Semantic Web, a scalable and modular Knowledge Management system has been successfully validated on two university websites.

Created by the IST program funded project MOSES, which ends this month, it enhances the possibilities for ‘knowledge mining’ by providing users with a natural-language query interface that offers direct responses to specific questions. Most significantly, the MOSES system creates a modular and scalable environment allowing content to be automatically updated, while interlinking ontologies from different academic websites in different languages.

“The biggest achievement I think was being able to combine different ontologies from different websites into a single application,” explains project coordinator Alfredo Ricchi at Finsa Consulting in Italy. “The system has had some encouraging results.”"

Mentioned as part of Grid Start.

Example OWL Ontologies and Cooking for the Semantic Web
OWL and Topic Map Pudding.

Wednesday, March 09, 2005

Bunch of Links

* Too Darned Big to Test - Make it smaller.
* XForms Editor in OpenOffice.org and A First Look at OpenOffice.org 2.0.
* Why REST is Better - Part 3 - Links to Push and Pull Message Delivery in the PSB Architecture.
* Open Source Profilers for Java. Also mentions the free but not open source JFluid.

Using the Semantic Web to Handle Petascale Datasets

Scientific Data Management in the Coming Decade "Increasingly, the datasets are so large, and the application programs are so complex, that it is much more economical to move the end-user’s programs to the data and only communicate questions and answers rather than moving the source data and its applications to the user‘s local system."

"If the data is to be analyzed by generic tools, the tools need to “understand” the data. You cannot just present a bundle-of-bytes to a tool and expect the tool to intuit where the data values are and what they mean. The tool will want to know the metadata."

"In addition to standardization, computer-usable ontologies will help build the Semantic Web: applications will be semantically compatible beyond the mere syntactic compatibility that current-generation of Web services offer with type matching interfaces. However, it will take some time before high-performance general-purpose ontology engines will be available and integrated with data analysis tools...The XML integration in modern Database Management Systems (DBMS) opens the door for existing standards like RDF and OWL."

I would disagree with this - existing DBMSs are not flexible enough to handle things like RDF. OWL is affected even more, as it needs the kind of indexing available from something like Kowari. Other systems like Sesame and Jena are both adding tree structures to store their RDF. Some of that is for maintenance and setup, but it is also for performance.
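As a rough illustration of the kind of indexing meant here, the following Python sketch keeps each triple in three sort orders so that any pattern with bound positions becomes a prefix scan. The TripleIndex class is hypothetical, and a sorted list stands in for a tree; it is not Kowari's, Sesame's or Jena's actual storage layout.

```python
from bisect import bisect_left, insort


class TripleIndex:
    """Keeps every triple in three sort orders so a bound pattern is a prefix scan."""

    # Each ordering lists which triple positions (0=s, 1=p, 2=o) form the sort key.
    ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}

    def __init__(self):
        self.indexes = {name: [] for name in self.ORDERS}

    def add(self, s, p, o):
        triple = (s, p, o)
        for name, order in self.ORDERS.items():
            insort(self.indexes[name], tuple(triple[i] for i in order))

    def match(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None means 'any value'."""
        pattern = (s, p, o)

        def bound_prefix(order):
            prefix = []
            for position in order:
                if pattern[position] is None:
                    break
                prefix.append(pattern[position])
            return tuple(prefix)

        # Pick the ordering whose leading columns cover the most bound values.
        name = max(self.ORDERS, key=lambda n: len(bound_prefix(self.ORDERS[n])))
        order, prefix = self.ORDERS[name], bound_prefix(self.ORDERS[name])
        index = self.indexes[name]
        results = []
        for i in range(bisect_left(index, prefix), len(index)):
            key = index[i]
            if key[:len(prefix)] != prefix:
                break
            triple = [None, None, None]
            for slot, position in enumerate(order):
                triple[position] = key[slot]
            results.append(tuple(triple))
        return results


store = TripleIndex()
store.add("Harry", "hasFather", "John")
store.add("Harry", "type", "Person")
store.add("John", "type", "Person")
print(store.match(p="type"))  # [('Harry', 'type', 'Person'), ('John', 'type', 'Person')]
print(store.match(o="John"))  # [('Harry', 'hasFather', 'John')]
```

The cost is writing each statement three times, which is part of why a general-purpose table layout is not an obvious fit.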

"As file systems grow to petabyte-scale archives with billions of files, the science community must create a synthesis of database systems and file systems. At a minimum, the file hierarchy will be replaced with a database that catalogs the attributes and lineage of each file. Set-oriented file processing will make file names increasingly irrelevant – analysis will be applied to “all data with these attributes” rather than working on a list of file/directory names or name patterns."

This is very similar to the TMex V2 architecture that I worked on.
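A small sketch of the "all data with these attributes" idea, using Python's built-in sqlite3 module. The table layout, file paths and attribute names are all invented for illustration; they come from neither the paper nor TMex.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (
    id INTEGER PRIMARY KEY,
    path TEXT,
    derived_from INTEGER REFERENCES files(id)  -- lineage: the file this one came from
);
CREATE TABLE attributes (file_id INTEGER REFERENCES files(id), name TEXT, value TEXT);
""")
conn.executemany("INSERT INTO files VALUES (?, ?, ?)", [
    (1, "/raw/run-001.dat", None),
    (2, "/raw/run-002.dat", None),
    (3, "/derived/run-001-calibrated.dat", 1),
])
conn.executemany("INSERT INTO attributes VALUES (?, ?, ?)", [
    (1, "instrument", "telescope-a"), (1, "stage", "raw"),
    (2, "instrument", "telescope-b"), (2, "stage", "raw"),
    (3, "instrument", "telescope-a"), (3, "stage", "calibrated"),
])


def files_with(conn, **wanted):
    """Return paths of files carrying every requested attribute (at least one)."""
    clauses = " OR ".join("(a.name = ? AND a.value = ?)" for _ in wanted)
    params = [part for item in wanted.items() for part in item]
    rows = conn.execute(
        "SELECT f.path FROM files f JOIN attributes a ON a.file_id = f.id "
        f"WHERE {clauses} GROUP BY f.id HAVING COUNT(*) = ? ORDER BY f.path",
        params + [len(wanted)],
    )
    return [row[0] for row in rows]


def lineage(conn, path):
    """Return the path of the file that `path` was derived from, if any."""
    row = conn.execute(
        "SELECT parent.path FROM files f JOIN files parent ON f.derived_from = parent.id "
        "WHERE f.path = ?", (path,)
    ).fetchone()
    return row[0] if row else None


print(files_with(conn, instrument="telescope-a", stage="raw"))  # ['/raw/run-001.dat']
print(lineage(conn, "/derived/run-001-calibrated.dat"))         # /raw/run-001.dat
```

The caller never names a directory; the path is just another column that comes back with the answer.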

Also Distributed Computing Economics "Computing economics are changing. Today there is rough price parity between (1) one database access, (2) ten bytes of network traffic, (3) 100,000 instructions, (4) 10 bytes of disk storage, and (5) a megabyte of disk bandwidth."

Via Peta-scale Data Centers - a Trio of Great Jim Gray Papers (from: Kevin Schofield’s Weblog).
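The price parity quoted above can be turned into a back-of-the-envelope check. This Python sketch assumes only the two figures from the quote (ten bytes of network traffic costing about the same as 100,000 instructions); the function names and the example workload are made up.

```python
# From the quoted parity: 10 bytes of network traffic ~ 100,000 instructions.
INSTRUCTIONS_PER_NETWORK_BYTE = 100_000 // 10  # 10,000


def network_cost(bytes_moved):
    """Cost of moving data, expressed in instruction-equivalents."""
    return bytes_moved * INSTRUCTIONS_PER_NETWORK_BYTE


def worth_moving(bytes_moved, instructions_of_work):
    """True if the computation outweighs the cost of shipping its data."""
    return instructions_of_work > network_cost(bytes_moved)


dataset = 10**9        # hypothetical 1 GB dataset
work = dataset * 100   # hypothetical scan at 100 instructions per byte
question_and_answer = 1024

print(worth_moving(dataset, work))              # False
print(worth_moving(question_and_answer, work))  # True
```

Under these assumptions, shipping the gigabyte costs a hundred times more than the scan itself, while shipping a kilobyte of question and answer is negligible, which is the argument for moving the program to the data.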

Tuesday, March 08, 2005

Watching the Watchers

Test your tests with Jester "Any statements that are not executed at least once are not being tested.

This approach, taken by tools like Clover and EMMA (see Resources), is valuable for finding untested statements -- but it's not enough. Knowing that a statement isn't executed by the test suite proves that it isn't being tested. However, the inverse is not true. If a line of code is executed, it doesn't necessarily follow that it's tested. It's entirely possible that the test doesn't check whether the line of code produces the correct result."

"Because Jester recompiles the code base and reruns the test suite for each change it makes, it runs orders of magnitude more slowly than more traditional tools like Clover. It's therefore important to pay some attention to performance. You can use a number of techniques to speed up Jester runs."
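The gap between "executed" and "tested" is easy to show by hand. In this Python sketch the mutant is written out manually, whereas Jester generates such changes automatically against Java source; the function and test names are invented for illustration.

```python
def is_adult(age):
    return age >= 18


def is_adult_mutant(age):
    # The kind of small change a mutation tester makes: >= becomes >.
    return age > 18


def weak_test(fn):
    """Executes every line of fn, so coverage is 100%, but never checks the boundary."""
    return fn(30) is True and fn(5) is False


def strong_test(fn):
    """Adds the boundary case that distinguishes >= from >."""
    return weak_test(fn) and fn(18) is True


def mutant_survives(test, mutant):
    """A mutant survives when the test still passes against the changed code."""
    return test(mutant)


print(weak_test(is_adult), strong_test(is_adult))     # True True
print(mutant_survives(weak_test, is_adult_mutant))    # True
print(mutant_survives(strong_test, is_adult_mutant))  # False
```

Both tests give full statement coverage of is_adult, but only the second one notices when the code is wrong.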

A list of useful open source tools from ThoughtWorks (who have contributed to Jester) is: ThoughtWorks Open Source Java.

Kowari Changes since Pre-Release 2

Here's what's been committed so far (further updates will be put here):
* Transaction Timeout configurable. SR
* Client JRDF modifications - client side API now handles blank nodes, currently memory bound. AN
* KModel modifications - modified with changes to client JRDF and added RMI versions of API. Have to finish blank node client side changes. AN
* MP3 Content handler - changed to use different library. AN
* JRDF's RDF/XML parser (based on RIO) passes all tests. AN/DM
* UUID changes for blank node generation in JRDF and Kowari. RT/AN
* Fulltext acceptance tests now work without making changes to the OS environment. AN
* xsd:dateTime now millisecond accurate - also now using thread safe version to parse/format dates. ES/AN
* Refactoring of Tuples layer. More to come to remove duplication, reduce coupling and possibly adding features. AN
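For the xsd:dateTime item, this is roughly what millisecond-accurate formatting and parsing looks like. It is a Python sketch only, assuming timezone-aware values and UTC output; the Kowari change itself is Java, and these function names are hypothetical. The functions share no state, so calling them from several threads is safe.

```python
from datetime import datetime, timezone


def format_xsd_datetime(moment):
    """Format a timezone-aware datetime as xsd:dateTime in UTC with milliseconds."""
    utc = moment.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def parse_xsd_datetime(text):
    """Parse the format produced above back into a timezone-aware datetime."""
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


moment = datetime(2005, 3, 8, 12, 34, 56, 789000, tzinfo=timezone.utc)
text = format_xsd_datetime(moment)
print(text)                                # 2005-03-08T12:34:56.789Z
print(parse_xsd_datetime(text) == moment)  # True
```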

Sunday, March 06, 2005

Powerbook for Top Guns

The PowerBook Sudden Motion Sensor "This example creates a window displaying a bicycle wheel. The window is "stable" in the sense that if you rotate the PowerBook left or right, the window compensates by rotating itself by an equal amount in the opposite direction in an attempt to remain in its original orientation with respect to the ground. The bicycle wheel rotates too - independently of the window."

"While the PowerBook only uses the AMS as a defensive measure to prevent accidental damage to the disk drive, such sensors could have a variety of uses. In particular, they have been considered as alternative input methods in user interfaces for video game controllers, phones, PDAs, and other mobile devices. While it is to be seen if they will be successful in these areas, such use at least has a novelty value."
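The "stable" window comes down to one line of trigonometry. This Python sketch assumes the sensor reports the sideways (x) and vertical (y) components of gravity. That axis convention and both function names are guesses for illustration; they are not the actual AMS interface.

```python
import math


def tilt_angle_degrees(x, y):
    """Roll of the machine, from the sideways and vertical components of gravity."""
    return math.degrees(math.atan2(x, y))


def window_rotation_degrees(x, y):
    """Rotate the window by an equal amount in the opposite direction."""
    return -tilt_angle_degrees(x, y)


print(window_rotation_degrees(0.0, 1.0))         # level: no rotation needed
print(round(window_rotation_degrees(1.0, 0.0)))  # tipped onto its side: -90
```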

Friday, March 04, 2005

ODA not OOD or OP

Grady Booch and others have written Ontology Driven Architectures and Potential Uses of the Semantic Web in Software Engineering. Anything that begins with: "Throughout the history of computing..." must be good.

"Having raised the idea of using of the Semantic Web in Software Engineering, a commonly asked question arises, namely; how does one broadly characterise the Semantic Web in terms of Systems or Software' Engineering use?"

"As a 'classification', merely to group together related tools and techniques for modeling rigorous semantics during specification and design stages of the Software Lifecycle."

"As a 'mechanism' for strongly identifying, discovering and sharing artifacts amongst discrete subsystems, systems and systems' design teams both during design and at runtime"

This second use is very similar to Ontological Programming and Lisp (better is better and Ontology-Oriented Design and Programming), where developers, customers, analysts and others share and maintain ontologies and develop solutions against these ontologies.

"Given that The Semantic Web uses triple-based data representation and that this is merely a minimalisation of the representation employed in relational database technologies, the attraction of considering the Semantic Web as a specialised relational framework has been recognised for some time."
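To see what "a minimalisation of the representation employed in relational database technologies" can look like, here is a sketch with Python's sqlite3 module: one three-column table, with a self-join doing the work a foreign key would do in a conventional schema. The data is made up (Albert is invented), and the table layout is an assumption, not something the note prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("Harry", "hasFather", "John"),
    ("John", "hasFather", "Albert"),
    ("Harry", "type", "Person"),
])


def paternal_grandfathers(conn):
    """Follow hasFather twice by joining the triples table to itself."""
    rows = conn.execute("""
        SELECT child.subject, parent.object
        FROM triples AS child
        JOIN triples AS parent ON parent.subject = child.object
        WHERE child.predicate = 'hasFather' AND parent.predicate = 'hasFather'
        ORDER BY child.subject
    """)
    return rows.fetchall()


print(paternal_grandfathers(conn))  # [('Harry', 'Albert')]
```

Every extra hop in a query is another self-join on the same table, which is where the indexing question comes in.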

Thursday, March 03, 2005

REST vs the Rest

Web Resources and WS-Resources "Now I can still only really talk from the point of view of first impressions, but it seems to me that the WS-Resource approach seems generally more concrete than the resource abstraction of RDF. I think it likely that WS-Resource constructs could be modelled in RDF. However, the emphasis of WS-Resource is very much on the protocol, so this isn’t really comparing like with like."

"I can see how state can be managed using WS-Resources, but what I haven't discovered is why the assumption is made that because SOAP++ messaging is usually done statelessly, it has to be that way. State can (and often is) maintained at both client and server sides of communications. If state that is global across client and server is needed then that can be maintained using one of the many reliable messaging protocols possible on top of HTTP."

Another XML UI Framework

XUI: Finally, a Java GUI Framework You Can Love "From experience, it is almost always the case that the UI description and the functionality and business logic are mixed up within the Java classes.

With XUI, separating them isn't just easy, it's automatic. Using the XUI editor, you select an item, associate the name of the event that processes that item, and XUI will create a separate Java class containing the stubs for running that event."

HyperJournal

HyperJournal "HyperJournals versus “core journals”. By clicking on an author’s name, the HyperJournal system automatically searches the entire HyperJournal network and produces a citation list that includes all the articles written by the author, all the articles the author has cited, and all the articles that cite the author. Comprehensive bibliometric lists can thereby be composed without the need to rely on the manual consultation of a small set of “core journals,” often exclusively in English. In this system, by contrast, it will be the actual give-and-take of academic discourse, registered automatically on the network through citations, which will signal the prestige of a journal (even of small niche journals written in so-called minor languages) and establish the reputation of scholars. In addition, through the use of (semantic web) RDF describers, bibliometric lists can be constructed that distinguish, for example, between positive and negative citations."
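The citation lists described above amount to three lookups over a graph of typed citation links. A minimal Python sketch, with invented papers, authors and a hypothetical stance label standing in for the RDF descriptions:

```python
from collections import namedtuple

Citation = namedtuple("Citation", "citing cited stance")

# Made-up data: which author wrote which paper, and who cites whom.
authors = {"paper-a": "Rossi", "paper-b": "Rossi", "paper-c": "Meyer", "paper-d": "Tanaka"}
citations = [
    Citation("paper-a", "paper-c", "positive"),
    Citation("paper-c", "paper-b", "negative"),
    Citation("paper-d", "paper-a", "positive"),
]


def citation_list(author, stance=None):
    """Articles by the author, articles they cite, and articles citing them."""
    own = {paper for paper, name in authors.items() if name == author}
    kept = [c for c in citations if stance in (None, c.stance)]
    return {
        "written": sorted(own),
        "cites": sorted({c.cited for c in kept if c.citing in own}),
        "cited_by": sorted({c.citing for c in kept if c.cited in own}),
    }


print(citation_list("Rossi"))
print(citation_list("Rossi", stance="negative")["cited_by"])  # ['paper-c']
```

Filtering on the stance is what lets a list distinguish positive from negative citations rather than just counting them.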

Wednesday, March 02, 2005

Fedora 2.0 and Kowari

Fedora 2.0 Release Notes "Fedora 2.0 includes significant new features and improvements including the introduction of the Fedora Object XML (FOXML) schema as the new internal storage format for objects, introduction of the Resource Index that provides enhanced search capability, introduction of a Batch Modify utility, upgrades of all third party libraries, performance enhancements, and a number of bug fixes."

Kowari is mentioned in "Fedora: An Architecture for Complex Objects and their Relationships": "Fedora expresses relationships by defining a base relationship ontology using RDFS [17] and provides a slot in the digital object abstraction for RDF expression of relationships based on this ontology. Assertions from other ontologies may also be included along with the base Fedora relationships. All relationships are reflected in a native RDF triple-store using Kowari [44]. The query interface to this triple-store is exposed as a web service, providing a rich information foundation for external services."

Looks like there's some Trippi work too.

Chris Wilper and Edwin Shin have both contributed back to Kowari too, most recently with changes to the DateTime datatypes made after Kowari Pre-Release 2.

Agile Services

I'm probably just combining two buzz words to appear like a consultant, but it's more like RDF for Agile Services.

When You Sit Down To Write A Description "...you discover that doing so is unnecessary."

"In a REST-style service, we want an analogy to the self describing hypermedia we have in the HTML scenario. First let's assume we'll use some form of XML as our hypermedia. It's easy to imagine a XML document. Maybe even more machine friendly would be an RDF document -- a bunch of RDF statements. Your client invokes GET on the well known URL of the service, and receives an RDF form. The RDF form describes the names of parameters and how to serialize them. So just as in the user driven HTML case, the client needs no foreknowledge of what the necessary parameters do, or even what their names are. But we still need an automated "user" to fill out those parameters. Since the RDF form describes the parameters in RDF, your client can map the RDF types of those parameters to elements in its data model. Your client has to "understand" the service's ontology, sure. But that is a one-time mapping of ontology elements to, say, SQL queries."

Follow-up.
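A sketch of the automated "user" filling out an RDF form, in Python. Everything here is assumed for illustration: the form: and ex: vocabularies, the parameter names, and the order record. Plain functions stand in for the SQL queries the quote mentions.

```python
from urllib.parse import urlencode

# What the client might receive from GET on the service URL (hypothetical vocabulary).
FORM = [
    ("form:search", "form:hasParameter", "param:q"),
    ("param:q", "form:name", "q"),
    ("param:q", "rdf:type", "ex:ProductName"),
    ("form:search", "form:hasParameter", "param:max"),
    ("param:max", "form:name", "max"),
    ("param:max", "rdf:type", "ex:MaximumPrice"),
]

# The one-time mapping from ontology types to the client's own data model.
LOCAL_MODEL = {
    "ex:ProductName": lambda order: order["item"],
    "ex:MaximumPrice": lambda order: str(order["budget"]),
}


def objects(triples, subject, predicate):
    return [o for s, p, o in triples if s == subject and p == predicate]


def fill_form(triples, form, mapping, record):
    """Fill each parameter by its RDF type, without knowing its name in advance."""
    filled = {}
    for param in objects(triples, form, "form:hasParameter"):
        name = objects(triples, param, "form:name")[0]
        rdf_type = objects(triples, param, "rdf:type")[0]
        filled[name] = mapping[rdf_type](record)
    return filled


order = {"item": "espresso machine", "budget": 200}
filled = fill_form(FORM, "form:search", LOCAL_MODEL, order)
print(urlencode(filled))  # q=espresso+machine&max=200
```

If the service renames q to query, only the form changes; the client's mapping from ex:ProductName stays the same.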

Throw Away Your Hierarchies of Knowledge

Taxonomies and Tags: From Trees to Piles of Leaves "But traditional taxonomic trees aren’t something we can throw away without a thought. They are an amazingly efficient way of organizing complexity because they enable us to focus on one aspect (e.g., that’s an apple) while keeping a universe of context (it’s a fruit, part of a plant, a type of living thing) in the background, ready for access. Tree structures are built into our institutions. They may even be built into our genes. So we are in a confusing and fertile period as we try to sort out what works and what doesn’t. Without trees, how would we organize college curricula, business org charts, the local library, and the order of species? How will we organize knowledge itself?"

"Faceted classification still presents users with a hierarchical tree, making it easy for them to browse to what they want. But unlike traditional trees, faceted systems don’t decide beforehand how the branches are arranged. For example, if an ice cream stand organized its “customer experience” around a traditional hierarchical taxonomy – a tree – it might have a customer first choose between two flavors, then among three sizes, and finally between a cup or cone."
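The ice cream stand makes a compact example of the difference. In this Python sketch the facet values (vanilla, chocolate and so on) are made up; the point is only that narrowing can happen in any order, where a fixed tree would force flavor, then size, then container.

```python
from itertools import product

FACETS = {
    "flavor": ["vanilla", "chocolate"],
    "size": ["small", "medium", "large"],
    "container": ["cup", "cone"],
}

# Every combination on offer: 2 flavors x 3 sizes x 2 containers.
ITEMS = [dict(zip(FACETS, combo)) for combo in product(*FACETS.values())]


def narrow(items, **choices):
    """Keep the items matching the chosen facet values, whatever order they come in."""
    return [item for item in items
            if all(item[facet] == value for facet, value in choices.items())]


print(len(ITEMS))                                    # 12
cones_first = narrow(narrow(ITEMS, container="cone"), flavor="vanilla")
flavor_first = narrow(narrow(ITEMS, flavor="vanilla"), container="cone")
print(len(cones_first), cones_first == flavor_first)  # 3 True
```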

"Tagging systems are possible only if people are motivated to do more of the work themselves, for individual and/or social reasons. They are necessarily sloppy systems, so if it’s crucial to find each and every object that has to do with, say, apples, tagging won’t work. But for an inexpensive, easy way of using the wisdom of the crowd to make resources visible and sortable, there’s nothing like tags."

Another Way

I heard about this before I saw it (as I now have to prioritize what I do), first ant game. I guess I'm the only one who pines for declarative Ant.