More News: 07/01/2004

Friday, July 30, 2004

Which one?

The question is: the Semantic Web or global warming. 6-7 metres if Greenland melts is nothing compared to 100 metres if Antarctica melts.

Mocking Woody Guthrie's Memory "Guthrie wanted credit for what he wrote, but he had contempt for severe legal restrictions on what others might do with it. According to Pete Seeger, in this account (widely acknowledged in the folk world to be true) from the Museum of Musical Instruments, when Guthrie was singing on the radio in Los Angeles during the Depression, he'd mail mimeographed songs to listeners, and wrote on one:
"“This song is Copyrighted in U.S., under Seal of Copyright # 154085, for a period of 28 years, and anybody caught singin’ it without our permission, will be mighty good friends of ourn, cause we don’t give a dern. Publish it. Write it. Sing it. Swing to it. Yodel it. We wrote it, that’s all we wanted to do.”
I'll bet, therefore, that Woody would be horrified -- and angered -- by the behavior of an outfit called The Richmond Organization, which controls the copyright to his music. This humor-impaired crew has gone ballistic and has launched legal threats (CNN) at JibJab."

Thursday, July 29, 2004

Triple Store Bake-Off

Scalability Report on Triple Store Applications "Drawing conclusions about remotely accessible stores is more pertinent to our project requirements. In passing, it seems MySQL 3 performs the most quickly in general as a Jena store, and Kowari shows some great promise with its order of magnitude less time for configuration and its speed of loading data into the store.

Browsing and configuration times were the most pertinent figures to our future work. We don't believe the browsing times are really significant beyond the second granularity, so by that metric, it appears models with a performance between one and two seconds are potentially worth pursuing. All of our network models with caching appear to fall in that range, which is perhaps not a surprise since all of them implement caching in approximately the same fashion.

This leaves configuration time as the more interesting metric - how fast does a store return its results for creating the in-memory cache? For network models, the fastest were 3store and Sesame with files, though using files for the remote store is akin to using an in-memory model for our application, meaning it probably is not feasible for extremely large stores. So 3store and Sesame using MySQL 3 appear to be our best choices."

What would be nice to see is the data and queries being done. Some of the code is here.

There do seem to be some slight errors in the code, like creating a new ItqlInterpreterBean every time, which effectively sets up a new RMI session. There are also large differences in the testing: the local "Load Page" is slower than over the network by two orders of magnitude, which may have to do with using the Jena API on top of Kowari.
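The cost of a bean per query can be sketched in a few lines. The `Session` class below is a hypothetical stand-in, not Kowari's API; it only counts how often the expensive setup step (the RMI session in the real code) would run.

```python
class Session:
    """Hypothetical stand-in for a remote session (not Kowari's real API)."""

    setups = 0  # counts expensive connection setups across all instances

    def __init__(self):
        Session.setups += 1  # pretend this is the RMI handshake

    def query(self, text):
        return f"result of {text}"


def run_with_new_session_each_time(queries):
    # The pattern criticised above: one setup per query.
    return [Session().query(q) for q in queries]


def run_with_shared_session(queries):
    # One setup, reused for every query.
    session = Session()
    return [session.query(q) for q in queries]


queries = ["q1", "q2", "q3"]

Session.setups = 0
run_with_new_session_each_time(queries)
setups_per_query = Session.setups

Session.setups = 0
run_with_shared_session(queries)
setups_shared = Session.setups

print(setups_per_query, setups_shared)
```

The setup count grows with the number of queries in the first case and stays at one in the second, which is the kind of overhead that could skew a timing test.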

The "configure" tests appear to be testing different things, because the results, including both the network and local tests, vary from 2ms to 200,000ms. Kowari local takes 2166ms against 80304ms for Kowari over the network. That shows the network version is slower, but the only difference should be RMI, and 78 seconds seems excessive even for RMI.

And the use of "In-Memory" should probably be "In JVM".

Something not shown in the graphs is the time taken to load the triples:
* Jena w/ Postgres - 971784 ms
* Jena w/ MySQL 4 - 844257 ms
* Jena w/ MySQL 3 - 667138 ms
* Kowari - 139092 ms
* 3Store - 213088 ms
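To put the load times side by side, here is a small sketch that expresses each one as a multiple of Kowari's. The figures are the ones listed above; the choice of Kowari as the baseline is mine.

```python
# Load times in milliseconds, as listed above.
load_times_ms = {
    "Jena w/ Postgres": 971784,
    "Jena w/ MySQL 4": 844257,
    "Jena w/ MySQL 3": 667138,
    "Kowari": 139092,
    "3Store": 213088,
}

baseline = load_times_ms["Kowari"]

# Ratio of each store's load time to Kowari's.
ratios = {name: ms / baseline for name, ms in load_times_ms.items()}

for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    seconds = load_times_ms[name] / 1000
    print(f"{name:18s} {seconds:8.1f} s  {ratio:4.1f}x Kowari")
```

On these numbers the Jena-backed stores take roughly five to seven times as long as Kowari to load, and 3Store about one and a half times.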

Overall, it's pretty much what's expected: Kowari can achieve an order of magnitude improvement over SQL databases even over small datasets. Comparing Kowari against an SQL database with 5-10 million statements would show a greater margin of difference. Jena Fastpath and creating our own Model implementation should speed some of these results up.

Wednesday, July 28, 2004

Is that curve a little steep?

Learning curves "The Semantic Web and ARRESTED are extensions built upon the Web and REST. I believe that these will be the future of loosely coupled, document oriented services offered and integrated over the Internet."

First I'd heard of ARRESTED.

The Sound of IR

How-To Turn your iPod in to a Universal Infrared Remote Control " How did we do this? Basically, we “recorded” the “sounds” an infrared remote makes on a PC and then put them on an iPod as songs. Adding a special sound-to-IR converter then turns those sounds back to IR and allows you to use your iPod as a remote control. As an added bonus, it works up to 100 feet. It’s a slick all-in-one unit and we’re never going back to 6 remotes ever again."

Tuesday, July 27, 2004

More Kowari References

* Bipartite Graphs as Intermediate Model for RDF.
* An approach to using the Resource Description Framework (RDF) for Life Science Data "Several open source solution were evaluated but did not meet our performance requirements. To be fair, few projects claim to support such large data sets and most focus on providing advanced features such as inference capabilities instead. Kowari [http://kowari.sourceforge.net/] was the most promising solution, but does not at the time fulfill the last two requirements." The last requirement was maintaining insertion order.
* del.icio.us / url
* RDF APIs (JRDF 0.3 should be out soon, btw).

First Kowari Kontribution

KModel - Client-side Jena Model Impl for Kowari by Chris Wilper. Haven't had a look at it yet but hopefully this can be rolled into Kowari.

Chris is the author of RDQLPlus.

Monday, July 26, 2004

What to do with a 40 Petabyte iPod

A Conversation with Brewster Kahle "Let's consider the question of how much information there is. If you break it down, it turns out to be not that big of a deal. The largest print library in the world, which is the Library of Congress, has about 28 million volumes. A book is about a megabyte. That's just the ASCII of a book, if you put it in Microsoft Word. So 28 million megabytes is 28 terabytes, which fits in a bookshelf and costs about $60,000 right now. Storing books in ASCII is no problem, and the scanned images are more but still affordable.

Scanning books costs between $5 and $20. That's the mechanical cost if you just wanted to scan a book and end up with the images of the pages at high enough resolution that you could print it on a high-end laser printer so it would be a good facsimile at 600 DPI, color—a nice-looking book. So books are doable, in terms of technology.

Now let's take music. It's been estimated that there are about 2 to 3 million albums. In terms of salable units—things that were sold as either 78s, LPs, or CDs—that's the universe of commercial music. If you do the math again, it's a few more of your bookshelves. So you're still not talking about anything daunting.

If you take movies and video, Rick Prelinger [founder of a film collection known as the Prelinger Archives] estimated that the total number of theatrical releases of movies was between 100,000 and 200,000. Again if you do the math, based on DVD quality, you come up with low numbers of petabytes [one petabyte is 1 million gigabytes]."

You'd still have enough storage space left over for your address book, email, and every second of your life in video.
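The arithmetic in the interview is easy to redo. This sketch uses the figures quoted above and decimal units; the 4.7 GB per film is my assumption (a single-layer DVD), not a number from the interview.

```python
MB_PER_TB = 1_000_000  # decimal units
GB_PER_PB = 1_000_000  # "one petabyte is 1 million gigabytes"

# Books: 28 million volumes at about a megabyte each.
books = 28_000_000
mb_per_book = 1
books_tb = books * mb_per_book / MB_PER_TB

# Movies: 100,000 to 200,000 theatrical releases at DVD quality.
dvd_gb = 4.7  # assumption: one single-layer DVD per film
movies_pb_low = 100_000 * dvd_gb / GB_PER_PB
movies_pb_high = 200_000 * dvd_gb / GB_PER_PB

print(f"books:  {books_tb:.0f} TB")
print(f"movies: {movies_pb_low:.2f} to {movies_pb_high:.2f} PB")
```

Under that DVD assumption the theatrical releases alone come to a little under a petabyte, so 40 petabytes leaves a lot of room.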

Also interesting are the comments about printing library books rather than lending them:
"A 100-page black-and-white book with current toner and paper costs in the United States is $1, not figuring labor costs, rights costs, or depreciation of capital. That's an interesting number, because at a buck a book, it turns out that for a library, it could be less expensive to give books away than to loan them. In his book, Practical Digital Libraries, Michael Lesk reported that it cost Harvard incrementally $2 to loan a book out and bring it back and put it on the shelf. This is not figuring in the warehousing costs and all the building costs. This is just the incremental cost of loaning a book out."

D2RMap 0.3

Now with added Kowari. "D2R Map Version 0.3 has been released. The new release supports different Jena model implementations like the Kowari Metastore. ProcessMap methods, connection and driver accessors have been added to the D2R processor. The error handling has changed to Log4J and Ant build scripts have been added. Thanks a lot to Robert Turner from Tucana Technologies for his contributions."

Sunday, July 25, 2004

RDF Mapper 2.0

RDFMapper: an RDF-Based Web Mapping Service "RDFMapper is a web service that searches an RDF or RSS file for resources with geographic locations, and returns a map overlayed with dots representing located resources. Clicking on a dot displays a web page representing the clicked resource (see these examples). Arbitrary images can be treated as maps, so the service can be used for any kind of image annotation.

RSS is translated into RDF before processing (except for RSS 1.0, which is already RDF). For brevity, RSS is mentioned in what follows only when the non-RDF variants of RSS (RSS 0.9x and RSS 2.0) require explicit discussion."

Thursday, July 22, 2004

Tamino goes Semantic (sort of)

Software AG's Tamino takes a 'semantic' step "Whether the W3C vision of the Semantic Web can be implemented in the real world may be debatable. But XML-based semantic technologies do have potential to be useful within the enterprise, contends Mike Champion, senior technologist with Software AG Inc., Reston Va."

"In keeping with Champion's vision of Tamino evolving with semantic technology, he pointed out that the new version offers capabilities for a meta data repository containing definitions of business terms that can be used for 'semantic integration.'

The new version has a special developer's edition and includes improvements made for developers, including:

* expanded XQuery, XPath and text retrieval functions, including a thesaurus;
* additional indexing capabilities for rapid query execution;
* improved handling of standard XML schemas; and
* a redesigned and more intuitive online tutorial."

Also of interest is Perspective on XML: Steady steps spell success with Google.

Two for Thursday

Making RDF Data Available for XML processing "The RDF Data Access Working Group is charged with providing access to RDF Knowledge Bases (repositories, data stores – we will use the term repository) by selecting instances of subgraphs from an RDF graph. This will involve a language for the query, and the use of RDF in some serializations for the returned results. As part of the requirements process, the Working Group has refined this to include Variable Binding Results and local access to RDF repositories."

Defining N-ary Relations on the Semantic Web: Use With Individuals "In Semantic Web languages, such as RDF and OWL, a property is a binary relation: it links two individuals or an individual and a value. How do we represent relations among more than two individuals? How do we represent properties of a relation, such as our certainty about it, severity or strength of a relation, relevance of a relation, and so on?"
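A common answer to the question in that excerpt is to give the relation instance its own node and hang the extra values off it. The sketch below shows the idea with plain tuples; the `ex:` names and the `nary_relation` helper are made up for illustration and are not taken from the note.

```python
def nary_relation(node_id, relation_type, roles):
    """Return triples that describe one relation instance as its own node."""
    node = "_:" + node_id  # blank node standing for the relation instance
    triples = [(node, "rdf:type", relation_type)]
    triples += [(node, role, value) for role, value in roles.items()]
    return triples


def objects(triples, subject, predicate):
    """All objects for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# A binary property links the individual to the relation node...
triples = [("ex:Christine", "ex:hasDiagnosis", "_:d1")]

# ...and the node carries the value plus a property of the relation itself.
triples += nary_relation(
    "d1",
    "ex:Diagnosis",
    {
        "ex:diagnosisValue": "ex:BreastTumor",
        "ex:diagnosisProbability": "ex:High",
    },
)

for triple in triples:
    print(triple)
```

Every statement is still a binary triple; the certainty about the diagnosis is simply one more property of the `_:d1` node.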

If you look closely...

... you can see the lunar research and hosting centre.

IBM Releases Semantics Toolkit

IBM Semantics Toolkit "The semantics toolkit contains three main components (Orient, EODM, and Rstar), which are designed for users of different levels.

1. Integrated Ontology Development Toolkit (Orient), as a visual ontology management tool, is mainly used by domain experts who have limited computer knowledge but who are familiar with specific domain knowledge. It is designed as a set of loosely-coupled cooperative Eclipse plug-ins. Orient can now run on Eclipse 3.0 or compatiable software. Orient is a joint R&D project of IBM China Research Laboratory, Beijing, and APEX Data and Knowledge Management Lab, Shanghai Jiao Tong University.
2. Extended Ontolgy Definition Metamodel (EODM) and RDF Repository Star (RStar) provide a set of programming APIs for programmers and IT specialists. EODM is designed to provide a high performance OO interface for the programmer. Now, it is mainly used to manage ontology-level data with limited size.
3. RStar is used for storing and querying mass data, most of which belong to the instance level. In such a situation, the programmer will use SQL-like sentences to manipulate data."

"RStar provides a high-performance RDF storage and query system. It can takes RDF/XML files or RDF triples as input for loading ontology and instances. It accepts queries in the RStar Query Language and returns results as tables. It supports RDF(S) inference. Currently, RStar uses relational database as its back-end storage."

From, Semantic Web Interest Group IRC Scratchpad.

Wednesday, July 21, 2004

XQuery or SQL

SQLfX: Is It Progress Or Piffle? "“XML is important enough that it’s pulling the SQL market apart,” asserted David, expressing concern about the proprietary solutions that have emerged from the so-called big three database vendors, which have largely ignored David’s ideas. “IBM, Oracle and Microsoft are all very different in how they approach XML support, and each requires training. If you want to combine or pull data from two of those products, you have to learn those two.”"

"“With XPath calls, you go down one leg at a time,” he said, with manual coding required to traverse more than one leg at a time. “XQuery’s FLWR statement has loop statements. But you’d have to do your own correlation between paths and set up a different path call on each leg, and that gets complex.”"

"At the heart of SQLfX, which David expects to release in mid-2005, is SQL’s “outer join” operation. This brings two hierarchical structures together as a means of coping with XML’s nesting. “If you’ve ever looked at two legs of an org chart to see how they’re related, that’s what this does. The user doesn’t have to know the structure; they just need to say what data they need.”"

"“Because XML documents can and often do have a large maximum depth of nesting, with 10 or 15 levels not uncommon,” Melton continued, “a combination of 10 to 15 outer joins would be required to reassemble the data into a hierarchical representation,” which he said is enough to make many SQL engines bog down.

Ironically, David claims to address these inefficiencies with proprietary algorithms."

So, there is a similar debate in the database world about using XQuery over SQL to query XML.

The use case for multiple paths in a hierarchy is similar to the Optional Match requirement in the DAWG. With RDF, of course, it's graph matching not multiple hierarchies.
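As a rough illustration of what an optional match means over triples, here is a sketch that behaves like a left outer join: every subject with the required property is returned, and the optional property is filled in where it exists. The data and function names are invented, and it assumes at most one value per subject and predicate.

```python
triples = [
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:email", "alice@example.org"),
    ("ex:bob", "ex:name", "Bob"),  # no email for Bob
]


def match(triples, predicate):
    """Map each subject to its value for one predicate."""
    return {s: o for s, p, o in triples if p == predicate}


def required_and_optional(triples, required, optional):
    """Rows of (subject, required value, optional value or None)."""
    required_values = match(triples, required)
    optional_values = match(triples, optional)
    return [
        (subject, value, optional_values.get(subject))
        for subject, value in sorted(required_values.items())
    ]


rows = required_and_optional(triples, "ex:name", "ex:email")
for row in rows:
    print(row)
```

Bob still appears in the results with `None` for the email, which is the behaviour an outer join gives in SQL.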

With respect to querying RDF, I'm not sure that there should automatically be only one type of syntax. Currently, the DAWG is focused on the use cases and the required operations to meet these use cases. Then I'm sure the group can make a judgement as to how it could be expressed functionally (like XQuery does for XML) or declaratively (like in a BRQL/iTQL way).

Another problem that was brought up in our discussions at work was with the return syntax in XQuery. Applying some of the syntax of XQuery to an RDF query language, it would have to describe returning either a graph or some sort of list of results. This seems to be mixing the binding of results with the presentation of the results.

Paul's most recent blog discusses some of the issues, especially as Network Inference continues to make the claim that RDF is "grounded" in XML.

Monday, July 19, 2004

Adaptive Information

TopQuadrant's White Papers page has a preview of the book Adaptive Information: Improving Business Through Semantic Interoperability, Grid Computing, and Enterprise Integration.

"Semantic Interoperability Framework – A highly dynamic, adaptable, loosely-coupled, flexible, real-time, secure and open infrastructure service to facilitate a more automated information sharing framework among diverse organizational environments."

This was preceded by:
"One way to describe a system is with a set of buzzwords. A standard set of them has been used to describe the framework. The rest of this section is to explain what is meant by those buzzwords and the problems that are being addressed."

Everyone wants to Integrate

JBoss airs expansion plans "JBoss is looking specifically to open-source, standards-based integration software, called an enterprise service bus, and business process management (BPM) software, which is server-based software for automating complex business processes, Bickel said. Currently, enterprise service bus and BPM software are offered by both large commercial software companies and smaller, specialized ones."

"He noted that adding integration capabilities to the JBoss application server mirrors what other Java server companies are already doing and could help make JBoss more competitive.

"Integration is a critical factor in many of the same projects that people are deploying application servers for," O'Grady said. "It's almost as if integration is a new checklist item for application server projects." "

And something I thought I wouldn't see: JBoss Application Server gets J2EE-certified.

Ontology Editors

A nicely timed posting, given our recent ontology editing at work: 94 ontology editors on the wall… links to Ontology Tools Survey, Revisited.

"Reference to taxonomies and ontologies by vendors of mainstream enterprise-application-integration (EAI) solutions are becoming commonplace. Popularly tagged as semantic integration, vendors like Verity, Modulant, Unicorn, Semagix, and many more are offering platforms to interchange information among mutually heterogeneous resources including legacy databases, semi-structured repositories, industry-standard directories and vocabularies like ebXML, and streams of unstructured content as text and media."

"The ontology editor enhancement mentioned most often by respondents was a higher-level abstraction of ontology language constructs to allow more intuitive and more powerful knowledge modeling expressions."

And on the second page:
"While achieving full-range ontology editing functionality is a tall order for toolmakers, the capabilities called out above are not the only demands toolmakers face...Some see the gathering demands as an impending crisis for providing editing environments that can accommodate an expanding scope of ontology language responsibilities. Eventually, editors will have to address the ontology language and reasoner functions currently under development..."

XQuery, XDS and Oracle

Integrating Data Using XML Data Synthesis "XDS provides an easy-to-use declarative framework to plug-in and query across the information sources. Instead of writing custom applications to access information from disparate information sources, customers have a choice of using XDS to build their information integration applications."

"There's a lot of similarity between the technologies Andrew links to, together with what Oracle are trying to achieve with XDS and XQuery, and what we're trying to do with business intelligence, data warehousing and data mining. It wouldn't suprise me if we start to hear more about XML, XQuery, RDF and so on in a business intelligence context in the future, and I fully expect these sorts of technologies making their way into Oracle's BI & knowledge management products over the next few years."

I've mentioned Oracle's recent interest in RDF here and here.

Saturday, July 17, 2004

SW is Vietnam

Johnson and FDR "I'm thinking of course of the great commander-in-chief of the Semantic Web, Tim Berners-Lee. Like Johnson, he had a vision for a great society, the HTML web, but let it languish while he fought a no-way-to-win war in Semantic Web Land".

So when did the French try the Semantic Web?

That was quick

Explanation of the Network Inference DAWG Strawman Objection "...the working group outright rejected any requirement or objective which expressed any commitment, at any level, to XQuery.

We believe that the DAWG working group is making an egregious error by rejecting any level of commitment to XQuery at this critical juncture."

"Regardless of outcome, Network Inference will remain devoted to our customer feedback by continuing our XQuery support for query-driven inferencing across RDF and OWL data inside our Cerebra Server product family."

Friday, July 16, 2004

Semantic Web and MDA

The July edition of the MDA Journal "...the only thought I had as to the potential for integrating the Semantic Web and MDA was the idea a rather obvious one to MDA aficionados that MOF metamodels of the Semantic Web languages would help to integrate ontologies into the MDA world. I did not appreciate the role that reasoning could play in making MDA more scalable."

"As ontologies move into industry they need to coexist with industrial metadata. We do not want ontologies to become yet another silo in a fragmented metadata landscape. Since much enterprise tooling is moving toward MOF-based metadata management, a minimal goal would be to make it possible for MOF-based tools to physically manage ontologies using the common MOF mechanisms."

"In order to achieve the goal of using MDA and the Semantic Web together, the OMG issued an RFP that calls for standardizing the following:
* A MOF metamodel for ontology definition
* A UML profile for ontology definition
* A mapping between the UML profile and the MOF metamodel".

Found by “Semantic Web” applied.

RDF Querying going to the DAWGs

The most recent DAWG face-to-face raised some interesting issues with respect to using XQuery for querying RDF. Jim Hendler has a response:

"You show RDFS/OWL/Rule query langauges as somehow being more easy inXquery, but again I think that is because you are assuming these things will be kept in their RDF/XML documents, or in APIs that respect the "boundaries" of those. I already see many applications moving towards multiontologies w/linking, and that seems to me to argue that we simply don't know yet which of these models are better."

The original proposal suggests that we're going to need a query language for OWL, Rules and RDF, which probably won't happen, and it's suggested without proof. It also suggests that because RDF can be serialized in XML it has something in common with XQuery, which is untrue. The standard RDF/XML serialization can have multiple forms of the same RDF graph. The same RDF query will work across different RDF/XML serializations, because it is operating on the same data model. This isn't true for XQuery.
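A small sketch of the serialization point: the two documents below write the same statement once as a property element and once as a property attribute, both of which RDF/XML allows. The `toy_triples` function is my own and handles only these two forms; it is not a real RDF/XML parser. The path expression stands in for an XML-level query.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
NS = {"rdf": RDF, "dc": DC}

HEADER = f'<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">'

# The title written as a child element.
AS_ELEMENT = HEADER + """
  <rdf:Description rdf:about="http://example.org/doc">
    <dc:title>Example</dc:title>
  </rdf:Description>
</rdf:RDF>"""

# The same title written as an attribute.
AS_ATTRIBUTE = HEADER + """
  <rdf:Description rdf:about="http://example.org/doc" dc:title="Example"/>
</rdf:RDF>"""


def uri(qualified):
    # ElementTree names look like "{namespace}local"; join them into one URI.
    return qualified.replace("{", "").replace("}", "")


def toy_triples(xml_text):
    """Extract triples from the two forms above only (not a full parser)."""
    found = set()
    for node in ET.fromstring(xml_text).findall("rdf:Description", NS):
        subject = node.get(f"{{{RDF}}}about")
        for name, value in node.attrib.items():
            if name != f"{{{RDF}}}about":
                found.add((subject, uri(name), value))
        for child in node:
            found.add((subject, uri(child.tag), child.text))
    return found


def path_titles(xml_text):
    """An XML-level lookup that expects the title to be a child element."""
    root = ET.fromstring(xml_text)
    return [e.text for e in root.findall("rdf:Description/dc:title", NS)]


print(toy_triples(AS_ELEMENT) == toy_triples(AS_ATTRIBUTE))
print(path_titles(AS_ELEMENT), path_titles(AS_ATTRIBUTE))
```

The triples are identical for both documents, while the path lookup finds the title in only one of them. A query written against the graph does not care which form the file used.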

It reminded me of the recent anti-XQuery article, "If You Liked SQL, You'll Love XQUERY".

Fabian is saying that the relational model was a simplification of graph theory. In this respect relational theory and RDF have much in common, much more in common than XML.

At a syntactic level, query languages like RDQL, iTQL and other SQL-like RDF query languages are leveraging off a legacy of SQL, Datalog and other similar languages. This is something that XQuery lacks as well. Do we really want FLWOR and Conditional Expressions in our query language?

Fabian also mentions NULLs, a continual pet peeve of the anti-SQL crowd. It's good to see XQuery avoids this, and I hope an RDF query language avoids it as well.

Interestingly, Don Chamberlin's XQuery tutorial is quoted both by Fabian and in Jeff's proposal.

Andrae is blogging some of this as well, "Jumping the gun".

Thursday, July 15, 2004

New co-Chair of SW Best Practices

"This is to inform you that as of 14 July 2004, David Wood of Tucana Technologies joins Guus Schrieber of Ibrow as co-Chair of the Semantic Web Best Practices and Deployment Working Group (SWBPD) [1]. We wish to express our thanks to David and Tucana and to Guus and Ibrow for their generous support of the Semantic Web Activity."

Fwd: W3C Announcement: David Wood, New Co-Chair of the Semantic Web Best Practices and Deployment Working Group

Querying with Rules

ISWC 2004 Research Track: Accepted Papers. The first paper, titled "Query Answering for OWL-DL with Rules", when plugged into Google, gave two interesting papers:
* Answering DL Queries using Deductive Database Techniques and
* Rules and Queries with Ontologies: a Unified Logical Framework.

This is appropriate for our current TKS work, see Paul's blog for more details.

Everything New is Old Again

A short while ago I wrote Accessing vs Naming Models. I was unaware that @semantics had done a presentation called "A naming mechanism for the RDF model" which lists the RFCs: 3401, 3402, 3403 and 3404 (both interesting) and 3405.

There's also the older, yet still relevant THTTP specification for encoding resolution into an HTTP request.

I was aware of the Handle System, which is for documents, and has its own RFCs including RFC 3650.

I think I got rid of all the times I tried to type RFC and my hand spat out RDF.

Wednesday, July 14, 2004

Supersonik

MOLVANIA DISQUALIFIED FROM EUROVISION! "The tiny Eastern European republic of Molvania was disqualified from the Eurovision Song Contest this year.

Zladko “Zlad” Vladcik was to perform his very popular techno-ballad, “Elektronik – Supersonik” - described as “a melodic fusion combining hot disco rhythms with cold war rhetoric”."

"Hey baby, wake up from your asleep
We have arrived on to the future,
And the whole world has become...
Electronic... Supersonic...
Supersonic... Electronic"

Champagne comedy indeed...by Working Dog.

Via Metafilter.

Discretization

semantic what? "people don't want to manage. people want to interact. applications that want users to enter metadata that enable management, at least in the consumer marketplace, are doomed to failure."

And while a little unreadable, The unbearable inevitability of discretization is an interesting rant about the Semantic Web and all things in general:
"Evolutive efficiency also applies to the Semantic Web. Luckily for us, it benefits from two distinct evolutionary avenues. It indeed gains effectiveness both from cleverer agents and from semiotically-complete ontology representation formats (relational databases, XML/RDF, OWL, UML, etc.). Therefore, with some site correctly implementing the Semantic Web-enabling technologies, one is right to argue that the Web already shows some signs of semantic intelligence.
Discretization is the fundamental mechanism behind any form of cognition. Solve et coagula-based computing rules!"

For XML Users

A no-nonsense guide to Semantic Web specs for XML people (Part I) "The Semantic Web has a serious problem: the XML people don't understand it.

They think it's an utterly complex way to write metadata that you can do with simple namespaces. The two worlds (despite being both hosted inside W3C) don't talk very much. Many (if not all) W3C folks are all in the RDF camp (and have been there for a while) and they see XML as a half-baked attempt to solve issues that RDF already solves. Unfortunately, not having been in the XML camp at all, they have no way to communicate with the other side.

The XML camp, on the other hand, thinks that they know how to build things that work, while the RDF people are all sitting in their ivory towers telling them that what they are doing is wrong, but without understanding their real-world needs.

As it normally happens in a debate, both are right and both are wrong. "

Tuesday, July 13, 2004

Nature nurturing Oracle

A presentation by Nature Publishing Group looking at Oracle's NDM as an RDF store, the last slide:
"NPG and Oracle investigating the suitability of the NDM to store and query RDF-encoded information
o Storage looks OK
o Can hold directed labelled graphs
o Allows URIs, literals and blank nodes
o Can include provenance information

Querying may need more development:
o Can extract sub-graphs but performance and scalability need to be tested
o RDF/XML import and export would be desirable
o Support for RDFS- and OWL-based inferencing"

Mentions Urchin.

Monday, July 12, 2004

Ontologies for the Web

Leveraging Ontologies: The Intersection of Data Integration and Business Intelligence, Part 2 "The real significance of ontologies - leveraging the reusable aspects - is within vertical domains where the use of common meta data, services and processes has the most worth. Once we get semantics under control within vertical systems (more often, a collection of systems), data integration, or linking a common set of semantics to back-end systems, won't be as daunting as this process is today. What's more, the application of standards such as Semantic Web and OWL will make ontologies that much more attractive."

Sunday, July 11, 2004

N3QL

N3QL - RDF Data Query Language "N3QL is an implementation of an N3-based query language for RDF. It treats RDF as data and provides query with triple patterns and constraints over a single RDF model. The target usage is for scripting and for experimentation in information modelling languages. The language is derived from Notation3.and RDQL."

Part of CWM.

WebMethods and RDF

WebMethods, for One, Believes in UDDI " But UDDI doesn’t paint the whole development picture, which Glass believes may partially explain why its adoption has been slow. “In its current version, UDDI is not well suited for metadata about everything. If you’re building an application out of parts, Web services is only one portion.” Others might include portals, portlets, schemas and business processes, among other things.

Glass said WebMethods has been looking at specifications for publishing metadata of other types, such as the W3C’s Resource Definition Framework (RDF). “This looks promising as a way to represent a broader array of metadata than simply that of Web services. And in [the forthcoming] UDDI version 4, there’s a lot of work on leveraging RDF.”"

Friday, July 09, 2004

Pay me!

Eternal Refactoring "Some may consider Semantic Web developers to be very much concerned with the abstract, but a recent thread shows that good old materialism is as good a driver for progress on the Semantic Web as anywhere.

Many of us, no doubt, employ the wishlist facility on Amazon to communicate our birthday needs to distant relatives. The decentralization of this seems like a natural target for RDF-savvy developers."

More metadata than data (again)

Anyone who has done any work on RDFS/OWL won't be surprised by this, Behind the Scenes at Yahoo Labs, Part 2:

"I would claim that there is more implied data (or inferable meta-data) than "raw" data on the web, and that we are barely scratching the surface of it. Today, all search engines are scraping for some simple forms of implied data: language, locality, etc. What's missing from this list is a nearly infinite collection of relationships that are obvious to most any human reader but extremely difficult to infer from a single document. The reason why implied data is so hard to identify is because, in the aggregate, it forms our collective cultural wisdom."

Thursday, July 08, 2004

Oracle's RDF Store

Create a logical Network Data Model in Oracle and it would be great for storing RDF. That was what I thought after reading "Re: Chemistry and the Semantic Web".

The attached document is based on articles available from Oracle. An in-depth description is available in "Network Data Model Overview" (free registration required).

There are various schemas defined for storing networks, which include:
"NODE_NAME VARCHAR2(32) Name of the node.
NODE_TYPE VARCHAR2(24) User-defined string to identify the node type."

The schema is obviously not designed to store RDF, unlike other RDBMS mappings.

One difference is their flexibility in storing different graphs and giving links a "cost".

Another difference is their nodes and links are typed as strings; this looks like it would limit the effectiveness of data type operations. Querying for all nodes that are numbers between two values or dates between two ranges is going to be costly compared to dedicated data type handling. That's apart from the obvious difficulty in trying to put everything into a VARCHAR2(24).
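The problem with range queries over string-typed values is easy to show. This is a generic sketch in Python, not Oracle-specific: the same "between 10 and 30" test gives a wrong answer when the values are compared as strings.

```python
values = ["9", "10", "25", "100"]

# Comparing as strings goes character by character, so "100" sorts before "30".
as_strings = [v for v in values if "10" <= v <= "30"]

# Converting to integers first gives the intended numeric range.
as_numbers = [v for v in values if 10 <= int(v) <= 30]

print(as_strings)
print(as_numbers)
```

The string comparison lets "100" through. Getting the right answer means converting every value at query time, which is the cost that dedicated data type handling avoids.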

Unless they have optimised the query layer specifically for the task, which might be the case, it will also incur the costs of joining against the same table many 10s or 100s of times.
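
To make the join cost concrete, here is a rough sketch using SQLite from Python rather than Oracle. The `link` table and its column names are assumptions for illustration, not Oracle's network schema. Each pattern in the graph query needs another copy of the same table, so a three-step path is already a three-way self-join.

```python
import sqlite3

# In-memory database with a single, hypothetical link table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE link (start_node TEXT, link_name TEXT, end_node TEXT)"
)
conn.executemany(
    "INSERT INTO link VALUES (?, ?, ?)",
    [
        ("alice", "knows", "bob"),
        ("bob", "knows", "carol"),
        ("carol", "worksFor", "acme"),
    ],
)

# "Who does alice know, who in turn knows someone that works for acme?"
# Three patterns, so the link table is joined against itself three times.
rows = conn.execute(
    """
    SELECT l1.end_node, l2.end_node
    FROM link AS l1
    JOIN link AS l2 ON l2.start_node = l1.end_node
    JOIN link AS l3 ON l3.start_node = l2.end_node
    WHERE l1.start_node = 'alice'
      AND l1.link_name = 'knows'
      AND l2.link_name = 'knows'
      AND l3.link_name = 'worksFor'
      AND l3.end_node = 'acme'
    """
).fetchall()

print(rows)
conn.close()
```

A query with dozens of patterns grows the same way, one more alias of the same table per pattern.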

It does have some neat operations (like shortest-path), a Java API, PL/SQL integration and of course it integrates well with existing Oracle databases.

Tuesday, July 06, 2004

Kowari 1.0.4 Released

New in this release (links to kowari.org will be inside a frame):
* Walk and transitive constraints.
* Backup individual models.
* Automatic reconnect of the iTQL Swing UI when the server restarts.
* Constructing Jena and JRDF with sessions rather than databases to allow multiple access.
* Sub-queries and the greater-than/less-than constraints are much faster.

Download here.

Monday, July 05, 2004

A Simpler Time

I read recently that someone thought animation was the key to Java's early success. JAVA TECHNOLOGY: THE EARLY YEARS "Next, Gosling and Gage pushed the audience over the edge with an animated line-sorting algorithm that Gosling had written.

In each of three sets of horizontal lines of random lengths, the demo sorted the collection by size, from shortest to longest, by actually moving them up and down in the browser. The audience had never seen anything but static images in a browser before this: The lines were moving, as if being sorted by unseen hands!

Suddenly, everyone in the room was rethinking the potential of the Internet. Far from the crash-and-burn scenario Gosling had first envisioned, his demo had jolted a very influential audience off their seats, and they were delivering enthusiastic applause. And within this technology-entertainment crowd, word would spread quickly."

That jaw-dropping demo still runs, too.

Sunday, July 04, 2004

Java Rules

The Mandarax Project "Mandarax is based on backward reasoning. This fits perfectly in a computing landscape based on a pull model (e.g. a transaction initiated from a web site). Data (e.g., from relational databases) can be integrated on the fly at query time, no replication is necessary (see the manual for a more detailed discussion of "Mandarax vs. RETE")."

The manual says: "The mandarax inference engine uses backward reasoning, and the reference implementation uses an object oriented version of backward reasoning similar to the algorithm used in Prolog. On the other hand, most commercial rule systems such as ILOG and popular open source solutions like CLIPS and JESS use forward reasoning, in particular an algorithm called RETE. This algorithm keeps the derivation structure in memory and propagates changes in the rule and fact base."
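
The difference between the two styles can be sketched in a few lines. This is not Mandarax's API and the forward version is a naive loop, not RETE; the rule names and facts are invented. Backward reasoning starts from the question and only touches the rules it needs, while forward reasoning derives everything it can up front.

```python
# Hypothetical rule base: a head holds if every atom in one of its bodies holds.
RULES = {
    "discount": [["gold_customer", "large_order"]],
    "gold_customer": [["ten_orders"]],
}
FACTS = {"ten_orders", "large_order"}


def prove(goal, facts, rules, seen=frozenset()):
    """Backward reasoning: start at the goal and work back towards the facts."""
    if goal in facts:
        return True
    if goal in seen:  # guard against cyclic rules
        return False
    for body in rules.get(goal, []):
        if all(prove(atom, facts, rules, seen | {goal}) for atom in body):
            return True
    return False


def saturate(facts, rules):
    """Forward reasoning: keep firing rules until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for head, bodies in rules.items():
            if head in known:
                continue
            if any(all(atom in known for atom in body) for body in bodies):
                known.add(head)
                changed = True
    return known


print(prove("discount", FACTS, RULES))   # answers one question on demand
print(sorted(saturate(FACTS, RULES)))    # materialises every derivable fact
```

The pull model in the quote corresponds to `prove`: nothing is computed or kept in memory until a query arrives, at the price of redoing the work for each query.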

A description of the RETE algorithm is here.

Desktop Metadata

RDF For Desktop Metadata? "There is an article "Metadata for the desktop" that suggests that RDF should be used to describe data in desktop environments. This is an interesting idea. RDF is already used by Creative Commons to attach license metadata to its works. Mozilla also supports it. RDF was designed for the web, but can it also find its way to the desktop? And what metadata is most important to describe?"

Comments link to: WinFS is not filesystem, Spotlight, rdf semweb winfs, Haystack, Questions about Longhorn, Pike and libferris.

Friday, July 02, 2004

Blast from the Past

The Ur-Quan Masters "The project started in August 2002, when Toys For Bob released the partially ported sources of Star Control 2 3DO version to the fan community. Our goal is to port this wonderful game to current personal computers and operating systems. It is and will remain 100% free of charge, and anyone can contribute to the project and thus help make it even better."

Many wasted hours spent playing this the first time...

Thursday, July 01, 2004

Updated Kowari Site

The new kowari.org web site is up. The only negative is that it's now frame-based; the website equivalent of SOAP. The positive is that it's now more up-to-date and includes documentation for Kowari 1.0.4. There's more to come including the Javadoc and the Jena tutorial.

Word! The DOPE Project

A recent IEEE article, Exploring Large Document Repositories with RDF Technology: The DOPE Project "(Drug Ontology Project for Elsevier) explores ways to provide access to multiple life-science information sources through a single interface."

"Current performance problems stem mostly from query procedures between the Sesame system and the Collexis-SOAP interface. We plan to address these problems by expanding DOPE with other data sources and thesauri."

Download and project page here.

FOAF next for Feedster?

An Interview With Feedster’s Scott Rafer, Part II "All this semantic web stuff is derived from a file format called RDF. RSS is a very simple version of that. The next one to gain any popularity is FOAF, which stands for “friend of a friend.” Under the hood, several of the social networks, LinkedIn, Tribe, maybe a couple of the others, are FOAF-based, and it’s very easy for them to start turning all those relationships into feeds, if they want. And that’s what I personally want, given my own habits. I know a lot of people in my “second degree” in LinkedIn. If I could have a feed of my second degree, as it increases, so I could go into my RSS aggregator and just hit links for “Yes, I know that guy” or “No, don’t know him,” they would end up with a much richer database, knowing more about me and my network. It would be really time efficient, and I’d be even more likely to pay for their service when they start charging."

RDF Query Languages

Design Evaluations Links Includes: SeRQL, RDQL, REX, iTQL, Algae2, and TriQL. Also links to Versa, BRQL and XsRQL (interesting given a recent anti-XQuery link).

RDF - Just don't mention the Semantic Web

Metadata for the desktop "My premise then is that more metadata is required to create a usable desktop for users and manage the increasing volume of information stored in our homes. That's a conclusion other people are agreeing with, too. Microsoft's next-generation operating systems will ultimately include WinFS, a file system supporting the attachment of arbitrary metadata to files. Mac OS X is acquiring similar functionality. ReiserFS has been trying to do it for ages. Closer to GNOME, there are projects like Dashboard, Storage and iFolder."

"First introduced in 1998, the W3C's Resource Description Framework is a computer-processible way of describing things. And that's about as simple as it gets. Despite being mired for some time in controversy over an awkward XML expression, the current view and consensus over RDF is in terms of its data model. The data model is simple and expressive, and is the best starting point for understanding RDF."