Thursday, October 30, 2003

Semaview Releases Sherpa

"Sherpa Calendar is a Windows / PC Platform (based) intelligent calendar application (iCal compatible) that allows anyone to easily publish and consume RDF based calendars and their HTML representation. Users create semantic content without even knowing it!"

"SherpaFind is a calendar search engine for machine understandable RDF and ICS representations of intelligent calendars and events. SherpaFind allows anyone to quickly and easily search for calendars of interest and preview or subscribe to their iCal or RDF representation. An event search capability will be added very soon. "

Their technology overview shows they're using Apache, MySQL, PHP and the like.

Wednesday, October 29, 2003

Mooting Mooter

Graphical Web searching gets Mooted “We have experienced triple the level of traffic of our most ambitious target,” Cappell said. “It’s really a lovely problem to have.”

The Mooter search engine, named after “a question that can have more than one answer”, is an Australian-made Web search tool which uses intelligent algorithms to group search results into themes or “clusters” of information. “In a traditional search a user might put in ‘travel’,” Cappell said. From the range of results brought up by this search, users may then narrow down what they’re looking for to “car hire” or “accommodation”, resulting in another list of search results, she said."

No such thing as bad publicity. The results for "windows" are all about the operating system; likewise, "java" is all about the programming language.

"How mooter technology works:
We push the results through our proprietary nodal structure. Phrases and chunks of meaning accumulate in nodes, our intelligent algorithms then analyse those nodes and present the user with a series of categories that encapsulate the content of the search results."

Kowari Update

In the current Developer Beta Release there are a couple of things that are being changed:
* The process of localization (resources to numeric values) and globalization (numeric values to resources) hasn't been totally moved over to the new way of doing things. Basically, globalization should only occur when we present the user with an Answer object.
* There are object allocation leaks (with anonymous resources, due to changes in ARP) which lead to memory usage above the existing commercial version and is much less than what should be expected.
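
To make the localization/globalization distinction concrete, here is a minimal sketch in Python. It is not Kowari's code: the NodePool class, its method names and the example URIs are all made up for illustration. The point is only that statements are stored and matched as numbers, and resources are turned back into strings at the very end, when the answer is handed to the user.

```python
class NodePool:
    """Toy string pool (hypothetical, not Kowari's implementation)."""

    def __init__(self):
        self._ids = {}      # resource -> numeric id
        self._values = []   # numeric id -> resource

    def localize(self, resource):
        """Resource to numeric value, allocating a new id if needed."""
        if resource not in self._ids:
            self._ids[resource] = len(self._values)
            self._values.append(resource)
        return self._ids[resource]

    def globalize(self, node_id):
        """Numeric value back to the resource it stands for."""
        return self._values[node_id]


pool = NodePool()
KNOWS = "http://xmlns.com/foaf/0.1/knows"
statements = [
    ("http://example.org/alice", KNOWS, "http://example.org/bob"),
    ("http://example.org/carol", KNOWS, "http://example.org/bob"),
    ("http://example.org/bob", KNOWS, "http://example.org/alice"),
]

# The store only ever holds tuples of integers.
store = {tuple(pool.localize(r) for r in s) for s in statements}


def subjects_with(predicate, obj):
    """Match on numeric ids only; no strings are touched here."""
    p, o = pool.localize(predicate), pool.localize(obj)
    return [s for (s, p2, o2) in store if p2 == p and o2 == o]


# Globalization happens once, when building the answer for the user.
answer = sorted(pool.globalize(s)
                for s in subjects_with(KNOWS, "http://example.org/bob"))
print(answer)
```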

Tuesday, October 28, 2003

Commercial Software - Moral Failure

Kapor: Why the old development model is history ""No sustaining value was created during the boom years," said Kapor to an auditorium packed with a mix of open source and "traditional" developers representing virtually every mainline firm in Silicon Valley. Instead, a "lot of wealth was created and distributed among those who were either lucky or opportunistic." Kapor says that this GRQ -- Get Rich Quick -- development model has had a profoundly negative affect, amounting to "moral failure" that will be with us for years to come."

""Open source software, like flowing water, will go everywhere it can go," said Kapor. And that's not a bad thing; it may be harder to get ultra-rich developing software, he said, but it's easier to start a software company, thanks to the rich base of existing open source projects."

""Open source is a surprising success. No one predicted it in the '80s or '90s. It's completely crazy, it defies conventional wisdom, it has no central control, yet it produces products that are the equal of commercial software," he said.

Kapor ended his talk with a Bill Joy quote that seemed to sum up his take on open source software: "Sooner or later, you realize that most of the smart people in the world don't work for your company." "

Old is relative: open source software was used in the '50s, '60s and '70s, and most money seems to have been made through hardware and, later, consulting.

Stored Procedures for RDF

"My goal is to implement a fully-featured query system using just database stored procedures and plain ANSI-92 SQL statements."

"The prototype uses triples gathered from the FOAFnaut crawler and made available by Jim Ley. There are approximately 400,000 triples. This represents a small triple store but what is of interest is that some predicates are very popular (for example, the 'knows' predicate occurs in around 100,000 of the triples, so the work in evaluating 'least popular' terms first clearly pays dividends. Queries of 5 query terms are around 20ms on a laptop PC."

"An immediate goal is to import a much larger triple dataset, such as that used by 3store, which contains approximately 5 million triples. I also want to revisit the database schema, and ensure that existing RDF model structures are supported robustly. Once this has been achieved, I want to start to look at more sophisticated capabilities such as inference and subclasses."

Solving RDF triple queries with a relational database - another approach
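
The "least popular terms first" idea can be sketched in a few lines of Python. This is only an illustration of the ordering trick, not the author's stored procedures: the triples, the estimate and match helpers and the query are invented, and a real implementation would do this in SQL against the database.

```python
from collections import Counter

# Hypothetical toy data; 'knows' is deliberately the popular predicate.
triples = [
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "knows", "carol"),
    ("carol", "knows", "alice"),
    ("carol", "nick", "caz"),
]


def is_var(term):
    return term.startswith("?")


def estimate(pattern, predicate_counts):
    """Rough cost of a query term: how many triples share its predicate."""
    predicate = pattern[1]
    if is_var(predicate):
        return len(triples)
    return predicate_counts[predicate]


def match(pattern, binding):
    """Yield extended bindings for every triple that fits the pattern."""
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if is_var(term):
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield new


def query(patterns):
    counts = Counter(p for _, p, _ in triples)
    # Evaluate the least popular terms first so later joins start small.
    ordered = sorted(patterns, key=lambda pat: estimate(pat, counts))
    bindings = [{}]
    for pattern in ordered:
        bindings = [b2 for b in bindings for b2 in match(pattern, b)]
    return ordered, bindings


ordered, bindings = query([("?x", "knows", "?y"), ("?y", "nick", "caz")])
print(ordered[0], sorted(b["?x"] for b in bindings))
```

Here the rare 'nick' term is evaluated first and pins ?y to a single value, so the popular 'knows' term only has to be checked against that one binding.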


"XCoder is an extensible model transformation and code generation framework. The framework is itself modelled with UML and generated using the standard UML to Java model transformation included in the distribution.

Currently supported input meta models: UML via XMI
Currently supported output meta models: Java, C# and C++"

From MDA code generator framework XCoder is now open source

Safari 1.1 begins XUL Support

XUL "The piece of XUL that Safari implements is "(b) implement some additional layout primitives." XUL basically introduces four new layout primitives to CSS: the flexible box, the grid (flexible boxes in 2 dimensions), rich popups/tooltips, and stacks. Safari in Panther has implemented the first (and most useful) of those layout primitives, the flexible box model. Since the box layout primitives are defined via CSS, you can even use them in HTML (in either Safari or Mozilla)."

See also, Safari 1.1 and XUL renderer for KDE.

IE Development: XAML "The operating system lock-in created and perpetuates the browser lock-in. Now the browser will give that extra boost to the OS. “Our application only works on Windows, using IE. No, sorry, extra development costs are too much. You’ll just have to use a Windows desktop.”

Apple has a tremendous amount of momentum right now, and about three years to innovate and compete against an OS that was released two years ago tomorrow. If they embrace a development platform like XUL and actually make inroads with the customers who will be deploying applications using it, will it be enough?

All this can be somewhat negated by the opportunity that exists for Microsoft to play fair. If, as Simon postulates, XAML can be transformed server-side to XUL (using XSLT), then we all win. But Eric remembers his history, which suggests we shouldn’t rely on that happening. I’ll throw this into the ring then: even if it’s possible, will it be cost-effective? For organizations to spend an extra 20% of a development budget to support a 3% share of the market is a bit of a stretch. But it’s obviously do-able, seeing as how many still bend over backward for NN4.x."

SW Misunderstandings

"* - One big web - trust everything
* - One inconsistency trips it all up
* - One big ontology
* - AI has promised us so much before
* SW points to make
* - Communities of all sizes"

Also, there's no Semantic Web killer application, "Its the integration, stupid!".

Sunday, October 26, 2003

Shaddap You Face

" Joe Dolce had a giant hit in the early 1980s with Shaddap You Face. Since then over 50 different artists have recorded covers of the song, including a hip hop version, and it's become the biggest-selling Australian single ever, surpassing Slim Dusty's Pub with No Beer and Mike Brady's Up There Cazaley. "

From Shaddap You Face to Gift -- Joe Dolce talks and plays live in the Deep End

Kowari Developer Beta Release

At long last it's now available:

Kowari Developer Beta Release.

Features included:
* A transactional triple store capable of storing many millions of triples,
* iTQL - A Squish based query language that allows subqueries, operators for data-types (greater than, less than, etc),
* Web based and command line iTQL interpreter,
* Descriptors - A combination of XSLT and iTQL that can be used to generate renderings of RDF data (comes with a v-card example),
* Lucene integration - full text insertion and querying,
* Views - allows the combination of multiple models,
* Jena 2.0 support - currently only through the use of ARP, and
* Written in Java - 1.4.0 and above required.

Proposed future features include:
* Improved ARP integration (non-memory bound using disk based string pool).
* Move distributed queries to server - use server join code.
* Streaming end to end (mainly Driver work).
* Jena 2 Support: "store" and "model" integration, support of OWL at query time, and full support of Joseki and RDQL.
* Pluggable data types.
* Pluggable security.
* Pluggable data handlers: EXIF extraction, MP3 extraction, and XML RSS extraction.
* Streaming Descriptors.
* Back-end refactoring (Windows, OS X, 64-bit Unix).
* Small embeddable version - Jena lite plus Kowari lite (should be less than 5MB).
* J2EE Connector and MBean support.
* Non-RMI version of streaming of queries.
* Support all OWL entailments of models at the query layer.
* 64-bit testing and loading of large data sets (~150 million) including improving bulk loading support and 6 index support.
* Better iTQL command line processor.
* Review joins.
* Review subqueries vs ontologies.
* Upgrade of Lucene support.

Saturday, October 25, 2003

MS to Copy XUL

XUL and XAML at 4 PM "For those of you Mozilla folks who are interested in XUL, I suggest you pay close attention to what Microsoft is going to unveil next week at the PDC conference. The massive new Longhorn API will be revealed, including XAML, Microsoft's own markup language which is similar to XUL, but way more powerful. It's flattering to know that Microsoft is modelling the future of Windows UI development after a technology we all worked so hard to bring to life for Mozilla. We were truly ahead of our time.

Click here on Monday, October 27th and keep a napkin handy to wipe the drool off your face. And please, spare me complaints about how this is not a cross platform technology. Who cares, it's going to be so mind-blowing that it will make huge waves in the industry regardless."

See also, Microsoft will ship Longhorn Betas with built-in XUL motor this fall

If anyone asks

Google knows: Semantic Web, OWL, Ontology, Taxonomy and RDF.

Metalog 2.0b Released

"Metalog is a next-generation reasoning system for the Semantic Web. Historically, Metalog has been the first system to introduce reasoning within the Semantic Web infrastructure, by adding the query/logical layer on top of RDF."

Requires Python (at least version 2.2), and SWI-Prolog (at least version 5.0). 2.0b Download.

Revisiting Knowledge Navigator

In Apple's Knowledge Navigator revisited Jon Udell sees Google, iSight, WiFi and Powerbooks. What I found interesting was the type of interactions made available. As Jon says: "Presence, attention management, and multimodal communication are woven into the piece in ways that we can clearly imagine if not yet achieve. "Contact Jill," says Prof. Bradford at one point. Moments later the computer announces that Jill is available, and brings her onscreen. While they collaboratively create some data visualizations, other calls are held in the background and then announced when the call ends. I feel as if we ought to be further down this road than we are. A universal canvas on which we can blend data from different sources is going to require clever data preparation and serious transformation magic. The obstacles that keep data and voice/video networks apart seem more political and economic than technical."

Sculley's ACM paper The Relationship Between Business and Higher Education: A Perspective on the 21st Century, which has pictures of the Knowledge Navigator video throughout, is still an interesting read.

Booch Still Positive on the Semantic Web

Back in April, Booch said roughly the same thing as in this article: "One goal is to boost support of the semantic Web via modeling capabilities in the Rational Rose and Rose XDE (eXtended Development Environment) tools, said Grady Booch, an IBM Fellow"

IBM execs ponder technology plans

Open Workflows

Topicus open source workflow Reviews and news of Workflow Engines. Brief reviews include XFlow and OpenWFE.

New Version of JESS

Jess Inventor Opines About Rule Engines and Java "Jess is designed from the ground up for integration, and in Jess 7.0 it's going to get even better. Current versions of Jess can only reason about data in its own working memory (although you can use backward chaining to fetch data into working memory as needed.). Jess 7.0 is going to have the ability to reason about data that isn't in working memory, making it possible to efficiently make inferences about truly huge data sets.

Jess has been integrated with agent frameworks and other tools. It's also been integrated with the popular ontology editor Protégé 2000. This is a powerful combination that many people use to develop knowledge structures as well as code that acts on them...Third-party translators between Jess and RuleML and Jess and DAML exist. One of the features planned for Charlemagne (Jess 7.0) is native XML support."

Friday, October 24, 2003

Competition for DSpace?

The Fedora Project "The Fedora project was funded by the Andrew W. Mellon Foundation to build a digital object repository management system based on the Flexible Extensible Digital Object and Repository Architecture (Fedora). The new system, designed to be a foundation upon which interoperable web-based digital libraries, institutional repositories and other information management systems can be built, demonstrates how distributed digital library architecture can be deployed using web-based technologies, including XML and Web services."

Another way to grep dead trees (well hopefully better than grepping).

Amazon Search Goes Live

" unveiled a massive new search engine Thursday called "Search Inside the Book", containing 33 million pages of a collection of 120,000 books."

"It's an eerie ability, sort of an extension of the omniscient feeling one gets when digging around in google or the Internet Archive. It extends easy search capabilities to printed material, which fights the old adage about grepping dead trees. Of course, you're limited to a subset of Amazon's catalog (and not every book ever printed), but it's still an insanely useful feature.

With Amazon's web services initiative, it could lead to all sorts of interesting implications. Imagine if your local library had the ability to search the entire contents of its store of books, quickly and free of charge, and not only told you instantly which books were relevant, but offered to deliver them to your door for a reasonable fee. Good heavens."

Wired has a piece too, called The Great Library of Amazonia.

Semantic Web as a Service

Tutorials " The program focuses on the use of semantic technology to enable the next generation of Enterprise Solutions for business and government. The first Tutorials in the series are listed below. They will be offered in the Washington, DC area on November 3-4 and December 3-4, 2003. Additional classes are planned, and any offering can be customized to the specific needs of groups or organizations on request."

Querying RDF and SQL

Two interesting technical pieces: Heterogeneous RDF Databases and RDF Access to Relational Databases.

A great summary of why you would create an RDF specific database:
"Querying generic triple stores are inefficient as they do not gather all the properties of an entity together. This forces a self join for each additional attribute involved in the query. Migrating to a conventional relation provides the efficiency we are used to, and the brittleness we are used to enduring. The goal of the heterogeneous rdf databases is to provide an optimizable compromise between the two."

It's quite possible to optimize a database for joins. As I'm so fond of saying, RDF databases can be more relational, in the sense of the original relational model, than current commercial SQL databases.
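
For anyone who hasn't seen the "self join for each additional attribute" problem, here is a small sketch using Python's built-in sqlite3 module. The table names, column names and rows are invented for the example; it shows the same question asked of a generic triples table and of a conventional relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT);
INSERT INTO triples VALUES
  ('p1', 'name', 'Alice'), ('p1', 'email', 'alice@example.org'), ('p1', 'city', 'Brisbane'),
  ('p2', 'name', 'Bob'),   ('p2', 'email', 'bob@example.org'),   ('p2', 'city', 'Sydney');

CREATE TABLE person (id TEXT, name TEXT, email TEXT, city TEXT);
INSERT INTO person VALUES
  ('p1', 'Alice', 'alice@example.org', 'Brisbane'),
  ('p2', 'Bob',   'bob@example.org',   'Sydney');
""")

# Generic triple store: one extra self join per attribute in the query.
triple_rows = conn.execute("""
SELECT n.object, e.object
FROM triples AS n
JOIN triples AS e ON e.subject = n.subject AND e.predicate = 'email'
JOIN triples AS c ON c.subject = n.subject AND c.predicate = 'city'
WHERE n.predicate = 'name' AND c.object = 'Brisbane'
""").fetchall()

# Conventional relation: the properties are already gathered in one row.
relation_rows = conn.execute(
    "SELECT name, email FROM person WHERE city = 'Brisbane'"
).fetchall()

print(triple_rows, relation_rows)
```

Both queries return the same row; the difference is that the triples version needs a join per attribute, while the person table breaks as soon as someone wants to record a property it has no column for.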

Engaging the Hackers

How the Semantic Web Will Really Happen "What makes me think, you may be asking yourself, that the hackers and the LAMP crowd will ever work on the Semantic Web effort? After all, the open source world isn't exactly a hotbed of knowledge representation, formal reasoning, and logic programming. Ah, dear and gentle reader, I'm glad that I made you ask yourself that question, for now I can deploy my simplistic analogy with the Web itself. Before the Web, the free software world -- as it was called back then -- was, first, considerably smaller. As others have noted, the Web was an enabling technology for the hackers as much as the hackers (by creating the LAMP platform) enabled the Web itself. But, second, before the Web the free software world was hardly a hotbed of relational database, hypertext, and document markup technology. The Web was the motivation for an entire generation of hackers to learn SQL, SGML, XML, and so on.

It's not much of a leap to think that the promise of the Semantic Web may fuel a new generation of hackers to learn RDF, OWL, and rule systems. I anticipate that, at some point, we will talk about, say, an RORB (RDF, OWL, Rules, Bayes) platform for Semantic Web development."

"Aside from conference considerations, there are other things we can all do. Professors should encourage (or mandate?) their students to use open source software whenever possible, to participate in relevant open source projects and communities, to use open source resources like SourceForge in order to increase the visibility of research and increase the prospects for mutually fruitful collaboration. Finally, everyone in academia should think about the lesson of n3."

Finding Context

The Web: Search engines still evolving " "What's missing here is the context," Barak Pridor, CEO of ClearForest Corp., of New York City, a developer of data management software, told United Press International. "Information is only meaningful when it is in context."

Computer scientists around the globe, funded by private investors, and government agencies, such as the National Science Foundation and Science Foundation Ireland, are seeking to solve this vexing dilemma. The search problem is inherent in the Internet -- a technology already 30 years old that has been commercialized only within the last decade."

" Using a combination of statistical mathematics, heuristics, artificial intelligence and new computer languages, researchers are developing a "Semantic Web," as it is called, which responds to online queries more effectively. The new tools are enabling users -- now on internal corporate networks and, within a year, on the global Internet -- to search using more natural language queries."

" Investors -- including Greylock -- have given ClearForest $7.5 million in recent weeks to take its technology to the next level.

Computer scientists also are employing artificial intelligence for the Semantic Web, James Lester, chief scientist and chairman of LiveWire Logic in Research Triangle Park, N.C., a linguistic software agent developer, told UPI."

The End of the AI Winter?

Commercializing the Semantic Web "For reasons I don't entirely understand, the term "Semantic Web" tanks with corporate clients, with venture capitalists, and in other non-academic contexts. This may yet be a hangover from the AI Winter, but the interesting difference is that, as I discuss below, the reaction is mostly to the label and to its perceived implications, rather than to the technology itself. "Web Services" does much better, and one of the things Network Inference seems to have done, at least at the marketing level, is to hitch its semantic wagon to the web services star. (This is a move I suggested, though more in the research than marketing context, in an article last summer, "The True Meaning of Service".)

Given the problems with the various application spaces, Network Inference has apparently been working to define a new application space, one which the Gartner Group has coined as "semantic oriented business applications". That doesn't raise the hackles that "Semantic Web" raises; it's different than EAI, and it's nicely distinguished from "Web Services"."

Network Inference made news recently when they announced "...a strategic partnership with Rightscom, the digital strategy consultancy, and ioko, an enterprise technology services specialist." at the ISWC 2003.

Monday, October 20, 2003

Metadata: The future of storage?

Metadata: The future of storage? "When vendors discuss metadata-driven storage, the phrase "storage virtualisation" invariably comes up. Vendors will tell you that, depending on what the goals of a particular metadata application are, the benefits of storage virtualisation can range from improvements in retrieval performance to searchability to ease of management to better allowance for heterogeneity at the hardware level (but usually not all of the above).

Return on investment theoretically comes in the form of increased productivity to both end users and those tasked with planning and managing enterprise storage. Storage virtualisation can result in capacity optimisations that bring hardware savings."

"Databases consist of structured data, which means relational records that are usually fairly dynamic and that have highly relational characteristics. Unstructured data is a photograph. That's unstructured data, where you're storing a big object with a little bit of information around the object. It's usually what we call fixed content. It's a medical image or an e-mail record or a document that's been scanned in. It's not relational, but you still want it to be a record."

Couple of Quick Links

Dan Bricklin on The Innovator's Solution.
JOHO on Metadata and Desire.

A Little Magic for Java

"JHOVE (pronounced "jove"), the JSTOR/Harvard Object Validation Environment, is a tool to automate the validation of file formats. Unlike less reliable approaches that rely on superficial indicators such as file extensions and MIME types, JHOVE uses format-specific modules to probe a file's internal structure. JHOVE's plug-in style architecture will allow the work of developing format modules to be shared. The JHOVE site will eventually include a tutorial on module writing, and a full explanation of the module interface."

It comes with some simple formats such as ASCII, byte stream, and UTF-8 as well as more interesting ones: TIFF and PDF.

It's actually much more feature rich than Unix's magic. From the tutorial:
"Format validation conformance is determined at three levels: well-formedness, validity, and consistency.

1. A digital object is well-formed if it meets the purely syntactic requirements for its format
2. An object is valid if it is well-formed and it meets the higher-level semantic requirements for format validity
3. An object is consistent if it is valid and its internally extracted representation information is consistent with externally supplied representation information "
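
The three levels nest, which is easy to see in code. The following Python sketch is not a JHOVE module (JHOVE is Java and its module interface is different); the assess function, the toy "plain UTF-8 text" rules and the external metadata keys are assumptions made up to show the ordering of the checks.

```python
def assess(data: bytes, external: dict) -> str:
    """Return the highest conformance level reached by a toy UTF-8 text object."""
    # Level 1, well-formed: the purely syntactic requirement (decodes as UTF-8).
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "not well-formed"

    # Level 2, valid: a higher-level rule. Hypothetical rule for this sketch:
    # no control characters other than tab, newline and carriage return.
    if any(ord(ch) < 32 and ch not in "\t\n\r" for ch in text):
        return "well-formed"

    # Level 3, consistent: internally extracted information must agree with
    # whatever was supplied externally (for example by a catalogue record).
    internal = {"size": len(data), "format": "UTF-8"}
    if any(internal.get(key) != value for key, value in external.items()):
        return "valid"
    return "consistent"


print(assess(b"\xff\xfe", {}))
print(assess(b"a\x00b", {}))
print(assess(b"hello", {"size": 99}))
print(assess(b"hello", {"size": 5, "format": "UTF-8"}))
```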

"The set of characteristics reported by JHOVE about a digital object is known as the object's representation information...includes: file pathname or URI, last modification date, byte size, format, format version, MIME type, format profiles, and optionally, CRC32, MD5, and SHA-1 checksums [CRC32, MD5, SHA-1]."

It also has output handlers, again pluggable, which include text and XML. Part of the JSTOR project (More information).
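
Part of the representation information quoted above (pathname, modification date, byte size and the optional CRC32, MD5 and SHA-1 checksums) is easy to gather with the Python standard library. This is a rough sketch, not JHOVE's output format: the representation_info function and its dictionary keys are invented, and the format-specific fields (format, version, profiles) are left out because they need a real format module.

```python
import hashlib
import os
import tempfile
import zlib
from datetime import datetime, timezone


def representation_info(path, checksums=True):
    """Collect a subset of the representation information for one file."""
    with open(path, "rb") as handle:
        data = handle.read()
    modified = datetime.fromtimestamp(os.stat(path).st_mtime, timezone.utc)
    info = {
        "path": path,
        "last_modified": modified.isoformat(),
        "size": len(data),
    }
    if checksums:  # checksums are optional in the quoted description
        info["crc32"] = format(zlib.crc32(data) & 0xFFFFFFFF, "08x")
        info["md5"] = hashlib.md5(data).hexdigest()
        info["sha1"] = hashlib.sha1(data).hexdigest()
    return info


# Usage: describe a small temporary file, then clean it up.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
info = representation_info(tmp.name)
os.unlink(tmp.name)
print(info)
```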

Sunday, October 19, 2003

No Code

I've been spending most of my time trying to fix the last of the problems with Kowari - it's now down to Windows-specific problems (mainly with java.nio). Frustratingly, I can't yet show how to solve Sam Ruby's problem of being able to query across namespaces. We have metaqueries in iTQL; they're simply subqueries, like in SQL. That's not the interesting way (inferencing is) but it's a start. I think using OWL's sameAs is appropriate to show the two are equivalent.

I haven't checked this but something like:

select $creator
select $type
from <rss_schemas>
where $type <>
from <rss_feeds>
where $creator $type '';

"rss_schemas" is the model which holds mappings from Dublin Core to other schemas. "rss_feeds" are any URLs like "", file or Kowari model.

Danny Ayers noted that in RDQL: "SELECT ?a, ?c WHERE (?a, <dc:title>, ?c)" will get you the subProperties as well (without having to specifically list them).
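
What the inferencing version buys you can be sketched without any RDF toolkit. The Python below is an assumption-laden illustration, not RDQL or Jena: the SUB_PROPERTY_OF mapping, the prefixed names and the sample statements are made up, and the closure is computed naively.

```python
# Hypothetical schema: child property -> parent property (rdfs:subPropertyOf).
SUB_PROPERTY_OF = {
    "rss:title": "dc:title",
    "ex:headline": "rss:title",
}

statements = [
    ("feed1", "dc:title", "A weblog"),
    ("item1", "rss:title", "First post"),
    ("item2", "ex:headline", "Second post"),
    ("item2", "dc:creator", "Someone"),
]


def sub_properties(prop):
    """Return prop plus every property that is transitively a subProperty of it."""
    found = {prop}
    changed = True
    while changed:
        changed = False
        for child, parent in SUB_PROPERTY_OF.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def select(triples, prop):
    """Match (?a, prop, ?c), treating subProperties of prop as matches too."""
    wanted = sub_properties(prop)
    return [(s, o) for s, p, o in triples if p in wanted]


print(select(statements, "dc:title"))
```

Asking for dc:title returns all three titles, even though only one statement uses dc:title directly.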

Zoe is one of the open source projects nominated to go to Comdex.

I was going to totally stay away from coding this weekend.

Friday, October 17, 2003

The Myths of Myths of Federated Searching

The Truth About Federated Searching "Not all federated search engines can search all databases, although most can search Z39.50 and free databases. But many vendors that claim to offer federated search engines cannot currently search all licensed databases for both walk-up and remote users. Why? Authentication."

TKS (as do other systems) allows authentication to be presented as needed so that searching can be performed on both secure and free data.

"It's impossible to perform a relevancy ranking that's totally relevant. A relevancy ranking basically counts the occurrence of words being searched in a citation. Based on this frequency of occurrence, items will be moved closer to the top or farther down the results list. Here's the problem: When attempting to relevancy-rank citations, the only words you have to work with are those that appear in the citation. Often, the search word doesn't even appear. "

Relevance calculations, using a histogram, can provide you with better relevancy rankings based on shared nodes. Of course, if the relationships are not there then it can't work. Although, to get around not having shared keywords you can use shared concepts extracted from a text mining tool.
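
The word-counting style of ranking described in the quote is simple enough to write down, which also makes its weakness obvious. This is a generic Python sketch with made-up citations, not how TKS or any particular federated search product scores results.

```python
import re


def relevancy(citation, query):
    """Count how often the query words occur in the citation text."""
    words = re.findall(r"[a-z0-9]+", citation.lower())
    return sum(words.count(term) for term in query.lower().split())


def rank(citations, query):
    """Order citations by descending word count (ties keep their input order)."""
    return sorted(citations, key=lambda c: relevancy(c, query), reverse=True)


# Hypothetical citations for illustration.
citations = [
    "Migraine triggers and migraine treatment",
    "A survey of headache disorders",
    "Treatment of migraine",
]

scores = [relevancy(c, "migraine") for c in citations]
ranked = rank(citations, "migraine")
print(scores, ranked)
```

The headache survey scores zero because the search word never appears in the citation, however relevant the article may be; that is the gap shared concepts are meant to fill.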

"You can't get better results with a federated search engine than you can with the native database search. The same content is being searched, and a federated engine does not enhance the native database's search interface."

This is totally wrong. The more context, the more metadata that's extracted and related across different data sources, the better (more relevant) the search results.

Concept Analyser

Digging for Nuggets of Wisdom "How well computers truly make sense of what they are reading is, of course, highly questionable, and most of those who use text-mining software say that it works best when guided by smart people with knowledge of the particular subject."

"In one instance, a reference to a neural phenomenon called "spreading depression" caused him to look for articles with that term in their titles. Reading those pieces, he found that magnesium was often mentioned as preventing this spreading depression. Other connections to magnesium deficiencies started to appear, so he dug further. In a 1988 paper on his research, he wrote, "One is led to the conjecture that magnesium deficiency might be a causal factor in migraine."

Today, Dr. Swanson's work is considered significant both for migraine studies and for text mining. The link between the headaches and magnesium deficiency was soon backed up by actual experiments."
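
The pattern in the Swanson story (A is linked to B in one set of papers, B is linked to C in another, and nobody has connected A and C) can be sketched as a co-occurrence search. The Python below is a toy under stated assumptions: the term sets are invented stand-ins for article titles, not real literature data, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical term sets, one per article title.
docs = [
    {"migraine", "spreading depression"},
    {"spreading depression", "magnesium"},
    {"migraine", "aspirin"},
    {"magnesium", "epilepsy"},
]


def cooccurring(term, documents):
    """All terms that appear alongside the given term in some document."""
    found = set()
    for terms in documents:
        if term in terms:
            found |= terms - {term}
    return found


def hidden_links(a, documents):
    """Map each indirectly linked term C to the intermediate B terms joining it to A."""
    direct = cooccurring(a, documents)
    links = defaultdict(set)
    for b in direct:
        for c in cooccurring(b, documents):
            # Keep only terms never seen directly alongside A.
            if c != a and c not in direct:
                links[c].add(b)
    return dict(links)


print(hidden_links("migraine", docs))
```

As the article says, this kind of output is only a list of conjectures; someone who knows the subject still has to decide which ones are worth an experiment.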

Thursday, October 16, 2003

The End of Google (again)

Google, Weblogs, and the End of the World "I’ve probably said this before, but it bears saying again. It is not the job of the Web to conform to Google’s search algorithms. In fact, that bears saying once more, with feeling.

It is not the job of the Web to conform to Google’s search algorithms!

The World Wide Web is not the carefully measured, centralised system of the ‘longhair database people’. The Web is Small Pieces, Loosely Joined. It is, and always has been an anarchy of links, and has taken over the world because anyone, anywhere on the web can innovate, and build on top of it."

The original Register article is worth a read, as are the results from the wiki-wankers (including the hilarious Foo Bar) and their blogs.

Semantic Blog

David Czarneck notes HP's semantic blog "The folks over at HP Labs are doing some interesting stuff in the way of semantic blogging."

Cleaning Data

KDnuggets : Polls : Data preparation (Oct 2003) 64% said that 60% or more time was spent cleaning the data. Interestingly, 8% said less than 20%.

"Karl Brazier, The blip at the bottom
Suspect there may be a small peak at the bottom end caused by model induction researchers like myself. Doesn't mean we don't think cleaning is important, just that our remit is to focus elsewhere. So we'll just have one or two new data sets to clean at the start of our work and probably supplement these with some of the cleaner sets from the UCI Repository. At the time of writing, I think I see this blip beginning to form. Well anyway, there's the offer of an explanation if it does. "

What a Wonderful Web

WonderWeb "We are on the brink of a new generation of World Wide Web (WWW) which, in his recent book Weaving the Web, Tim Berners-Lee calls the Semantic Web...The development of ontologies will be central to this effort. Ontologies are meta data, providing a controlled vocabulary of terms, each with an explicitly defined and machine processable semantics. By defining shared and common domain theories, ontologies help both people and machines to communicate more effectively."

OWL API with inferencing support. The comments on the RDF Interest scratchpad are not very positive. The best thing I can say is that the build scripts design is novel and difficult to get going (I gave up).

Tuesday, October 14, 2003

EROS - RDFS Explorer

EROS "EROS is a graphical java-based user interface for browsing RDFSchemas and for expressing (RQL) queries over these." The querying (by example) seems to be a bit more user-friendly than most and it uses Sesame (makes a change from Jena!).

A few write-ups are available: EROS: A User Interface for the Semantic Web, Bringing the Semantic Web closer to the User, and EROS: Explorer for RDFS-based Ontologies.

From the Hera project.

Monday, October 13, 2003

In Search of Wonder GUI

Ever since Aurora (screenshots) was first mentioned the idea of data being able to render itself or "Wonder GUI" has been thrown around in various forms (Haystack's Ozone, Microcontent or Topicalla).

I've been looking, again, at XBL and RDF Datasources in Mozilla, particularly at the way in which Kowari's Descriptors will interact with them, given all of the widgets available. The RDF datasources may be URLs, using something like Joseki's WebAPI to query the database or generate XUL dynamically.

Jon Udell's recent piece on Why Mozilla matters also reinforces this: "But as Web services redefine documents, Mozilla, an open and extensible document-handling engine, looks more strategic than ever."

Sunday, October 12, 2003

Verity and Autonomy - Magic

Visionaries Invade the 2003 Search Engine Magic Quadrant "Autonomy and Verity separately approached the challenge of information location. Verity chose a word-by-word sleuthing perspective, and Autonomy chose a pattern-matching, keyword-unexamined direction. Together, they lead the 2003 Search Engine Magic Quadrant.

Autonomy remains highly adept at pattern-matching and word-independent consideration of the content it indexes and processes. However, because it lacks a direct service division, it is handicapped in developing marquee installations and installations for midsize businesses that sometimes prefer not to use expensive systems integrators. It provides a broad spectrum of pricing options but tends to be priced higher, on average, than other vendors.

Verity's roots in keyword location and related relevancy analysis, where the word is the critical foundation for comprehension, is different than the Autonomy approach. Just as Autonomy has developed a credible search product, Verity has expanded its ability to address high-sophistication pattern matching and is now capable of matching Autonomy in many deals. Verity's acquisition of Ultraseek is simultaneously a cause for concern and a reason to be hopeful. Verity will find it challenging to produce a one-size-fits-all, variably priced product from the purchase. Success in doing so would benefit its status as a leader."

Saturday, October 11, 2003

Semantic Washable

Is The Semantic Web Hype? "Maybe incompatible with existing XML tools. Databases may take up to ten times as much memory and 24 hours to load."

Kowari's descriptors use XSLT and iTQL, RDF is serializable as XML, and SQL databases are the wrong tool for the job. I've responded to this before.

Friday, October 10, 2003

Understanding Data to Create Metadata

Information Quality, Liability, and Corrections "Nonetheless, information on its own is neither inherently good nor bad. It is often a sequence of events that leads to the consequences of what we simplistically refer to as bad information. One striking example is the Johns Hopkins clinical trials case, in which an insufficient search in PubMed resulted in a death [1]."

Five ways information can go wrong: incorrect fitness or "quality", ambiguous or fraudulent, biased, incomplete, and out of date.

Semantic Information Architecture: Creating Value by Understanding Data "Enterprises should consider capturing data semantics for two main reasons. Tactically, semantics saves time by capturing the meaning of data once. Without semantics, each data asset will be interpreted multiple times by different developers as it is designed, implemented, integrated, extended and decommissioned. This independent interpretation will be time-consuming and error-prone. With semantics, the data asset is mapped and interpreted only once. Moreover, any new assets can be generated from the information model so that they use official business terminology from the outset.

The second and most significant benefit of semantics is a strategic one. Semantics can turn hundreds of data sources into a single coherent body of information. This single body can then provide a common understanding showing where data is located, what it means and how it can be managed systematically. This keeps the data consistent and well defined and removes redundancy. Privacy and security policies may be applied uniformly based on the business content of the data."

Thursday, October 09, 2003

Two Vs

"Ordinary Australians are preparing with much excitement for President Bush's impending visit. The community has developed its own special greeting, the Big W which combines two 'V's — for victory in Iraq and Afghanistan — to form a 'W' for welcome, and also, of course, to represent the President's famous middle initial."

Australians plan to fall into line with a Big W

Another Review of Scopeware

I have previously linked to a review of Scopeware (Google Cache: 1, 2, 3). I recently came across a far more positive review:
"There is a lot that I like about Scopeware.

• It is incredibly innovative.
• The ability to search for related files via a keyword is great.
• I really dig the time based navigation.
• The process for adding feeds is identical to most aggregators.
• Seamless integration with MS Outlook.
• Three different and intuitive navigation styles to choose from."

He seems to have liked the 3D, time-based method, whereas the previous review did not:
"Scopeware orders all your stuff by time. Its creators assume that time is the most natural way to organize information. But that's a bit absurd. The phone book isn't ordered by time. Mail-order catalog aren't ordered by time. Real estate properties aren't ordered by time. Online auction items aren't ordered by time. Music albums aren't ordered by time — at least, not all the time. Even video libraries usually aren't ordered by time. "

I agree. The summary of the article is great:
"It's clear that the problems of today's conventional filesystems have not been adequately solved by Scopeware, and that Scopeware actually introduces a whole new set of problems. What, then, is the answer? What sort of information management technology should be developed instead? I firmly believe the answer lies within each individual user. The user should always be in control of their information. You should be able to organize your information in a way that makes sense to you, rather than have to figure out how the computer has organized your information. "

"I propose an information management system where files can be stored in categories, and metadata attributes would be attached to each file such as description, keywords, context, and other information that users can enter. Additional file-specific metadata, such as author and publication date for a manuscript, or director, cast, and crew for a movie, or artist and media type for a digitized painting, should be easy to specify and query later on."

Tuesday, October 07, 2003

Creative Commons Ideas

Technology Challenges "Currently we have a demonstration search that works by telling AlltheWeb to limit results to pages that link to Creative Commons licenses. While useful, this is far from our vision of a metadata-aware search engine.

The first requirement for a Creative Commons license-aware search engine is that license metadata (RDF embedded in pages) must be indexed. It wouldn't be necessary to index arbitrary RDF initially -- indexing only Creative Commons license metadata would be a good first step along the path to a Semantic Web-enabled search engine.

Once you start indexing license metadata, you can do two obvious things with it:

Provide users with an interface to filter their results by license or license characteristic. The aforementioned AlltheWeb demonstration interface is an example of the latter.

Display license information in search results. This could be done even if a query does not involve a license filter. If you have license information for a result, display the license in proximity to the result.

As you index and understand more metadata, you'll be able to go beyond these basics, with enhanced format or domain-specific searches and richly annotated results."

There are 8 other ideas on the page.
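The two uses of license metadata quoted above (filtering results and displaying the license next to a result) can be sketched roughly as follows. This is only an illustration: the `Result` record, its field names and the characteristic labels such as `"nc"` are assumptions made for the example, not part of any Creative Commons or AlltheWeb API.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Result:
    """A search hit plus whatever license metadata was indexed for it (hypothetical)."""
    url: str
    title: str
    license_uri: Optional[str] = None            # None when no license metadata was found
    characteristics: FrozenSet[str] = frozenset()  # assumed labels, e.g. {"by", "nc"}


def filter_by_license(results: List[Result], license_uri: str) -> List[Result]:
    """Keep only results published under exactly this license."""
    return [r for r in results if r.license_uri == license_uri]


def filter_by_characteristic(results: List[Result], characteristic: str) -> List[Result]:
    """Keep only results whose license has the given characteristic."""
    return [r for r in results if characteristic in r.characteristics]


def render(result: Result) -> str:
    """Show the license next to the result, even when no filter was applied."""
    line = f"{result.title} <{result.url}>"
    if result.license_uri:
        line += f" [license: {result.license_uri}]"
    return line
```

A query with no license filter would still call `render` on every hit, so the license shows up wherever the index happens to know it.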

Metadata Zealots

metadata "If I were to rewrite my weblog software today I would create a universal text box... it receives data and a data type. If the data type is an entry it posts the data to my weblog. If it's a low-threshold link, it posts the data to my morale-o-meter. If it's a category, it creates a new category for entries and low-threshold links. If it's a photo it updates my mopho. If it's a name it can create a new author or update my friends list. Etc. I should be able to post comments to other people's entries and send emails from this universal text box. I should be able to ping a server or post to a remote API."
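The "universal text box" in that quote is essentially a dispatch on data type. A minimal sketch, assuming made-up type labels and using plain in-memory lists as stand-ins for the weblog, morale-o-meter and so on:

```python
from collections import defaultdict

# Destinations come from the quote; the type labels on the left are hypothetical.
ROUTES = {
    "entry": "weblog",
    "link": "morale-o-meter",
    "category": "categories",
    "photo": "mopho",
    "name": "friends list",
}


class UniversalTextBox:
    """Receives data plus a data type and routes the data to one destination."""

    def __init__(self):
        # Each destination is just a list here; a real one would be a remote API.
        self.destinations = defaultdict(list)

    def submit(self, data_type, data):
        """Store the data under the destination for its type and return that destination."""
        if data_type not in ROUTES:
            raise ValueError(f"unknown data type: {data_type!r}")
        destination = ROUTES[data_type]
        self.destinations[destination].append(data)
        return destination
```

Adding a new kind of post then means adding one entry to `ROUTES` rather than another form.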

The Drowned World of Data "This is pretty much what I want to do in my own personal world of information. I want to pare down my data platforms to the bare minimum. I want 4-5 platforms maximum, which could be the following:

- Writing tool - my weblog and my paper notebooks (I still need both computer and pen)
- Ideas: Database
- Communication: Email
- Tasks: Outlook (or Chandler when it is finished)
- Reading tool: RSS Aggregator"

The idea behind Kowari is to write something that lets developers solve these problems. The release will probably be in about two weeks, and we'll have a developer beta release when we've got it close (probably in the next day or so). We should have some Jena 2.0 (or 2.1) support, data types, better sub-query speed, descriptors (XSL style sheets filled in by queries), and other stuff. We're after a good demo. I've got a list of over 30 Semantic Web/RDF applications that we're going to look at, but if you've got an idea (with code), let me know.

Sunday, October 05, 2003

65 Million Euros for the Semantic Web

The first IST call in FP6: One Billion Euro to strengthen Europe's role in shaping the future of ICT and its impact on society and economy "...more than 65 M€ are devoted to semantic Web and context based knowledge handling. This will enable the development of far more “intelligent” search engines and knowledge sharing tools that are based on content and context."

From Semantic Weblog.

Friday, October 03, 2003

The Impedance Imperative

The Impedance Imperative Tuples + Objects + Infosets = Too Much Stuff! "In particular they added new native types to the database to support objects (called user defined data types) and large text types. Both of these extended types were syntactic extensions on Blobs, which were added largely to support images, and documents. SQL was extended to allow query operations over Blobs using special content selector objects. Recently text types have been enhanced to support XML schemas or DTDs."

"Meta programming [11] or generative programming is clearly the least offensive way to cope with this mess. A model driven generator can clearly address the syntactic redundancy and associated mappings. The generator handles the syntactic redundancy. This however is the easy part. Processing is still far too complicated. Unfortunately, most generative tools do not support debugging at the level of the abstraction, forcing programmers to have deep knowledge of the generated code and the underlying framework."

This is very close to my current thinking. He even references "The Third Manifesto". I would add something about how RDF could support meta programming though.

Niche Markets

The demise of the XML database "When is an XML database not an XML database? Answer: when it's an XML database. While you can still buy an XML database purely because it provides faster storage capability and greater functionality than a conventional database, all the erstwhile XML database vendors are increasingly turning to other sources of use for their products."

"For the XML database vendors this means that, like the object database suppliers before them, they need to look for alternative outlets, hence the drift into integration and content management. Even though they may provide faster retrieval capabilities the truth is that (like it or not) the market will make do with what the relational database products offer, if they can."

"The question then arises as to how long XML databases can exist within their new markets. It seems unlikely that they can long survive as content management products since the leading vendors in this market will increasingly avail themselves of the facilities provided by their underlying databases.

This leaves the use of these products for integration. But on-the-fly translation of XML documents would surely be better managed by an in-memory XML mapping tool rather than an XML database per se. Products such as eXcelon might evolve in this direction but in the long term it is surely the case that XML databases will be relegated to the same twilight (but profitable for those remaining standing) world that is inhabited by the object database vendors."

Storing RDF

Workshop on Semantic Web Storage and Retrieval "The goals of this workshop are to:

1. Bring together existing and new developers working on semantic web data storage and retrieval systems.
2. Discuss implementation lessons learnt.
3. Share this information to the wider community."

Thursday, October 02, 2003

Arguing about Metadata

Metadata, Semiotics, and the Tower of Babel

"Anyone who has built or been a heavy user of text databases or similar systems has run into this problem repeatedly. Fancy lexical or statistical processing does help, as does the integration of information like link patterns, but at the end there is always a significant and irreduceable noise, that means one either has a certain amount of garbage in the output (from the view of the user), or is dropping some amount of useful information. Nor is there a way out by using an artificial set of symbols, e.g., 'controlled terms', taxonomies and the like. In the large, with a heterogenous set of users, these perform no better than grinding up plain text."

Some of this is true: given a large enough set of users and a large enough set of documents, the use of one word to mean one thing is lost. Not a big deal. But what Google is trying to do, and has been for a long time (with the country-based search engines, Froogle, etc.), is not only to give documents context through things like link analysis but also to reduce the data searched by making the search engine aware of the user's context.

Mathematicians, lawyers and falconers all have their own vocabulary and context. So you can describe your context when you're searching the Semantic Web. In fact, one of the early use cases for RDF was with P3P and automatic negotiation, requiring both client and server side descriptions of privacy policies.

"Now why should we suspect that taking character strings, and wrapping them in XML or RDF is going to change all of this? The syntactic sugar is all wonderful, and indeed a better mousetrap from the POV of systems integration, but the real basis for the blue sky claims that we're approaching Semantic Web nirvana is bound up in the signifiers, the symbols, that are to be wrapped in that sugar. Is there some magic in angle brackets that was not found in LISP parentheses, that will repeal human nature and semiotics? I think not. Call it a taxonomy, a controlled vocabulary, a metadata dictionary, it's all the same thing: yet another language, either small and brittle, or large and ambiguous. Either way, just another layer on the Tower of Babel."

"Coming soon: One place where the French and the Chicago school agree: economic reasons why the Semantic Web is a crock."

It's hard to disagree with the main arguments of the piece, that all language is symbolic and that meaning is based on context. That's fine. Not sure what that's got to do with RDF though. Saying that RDF will fail because it's based on language is making an argument in exactly the wrong direction. The reason why RDF will succeed is based on removing most of the complexities of language.

The same points that Cory lists can be used to support why RDF can work. It pains me that this is still used as a good example of why metadata/RDF/Semantic Web will fail (I've commented on this before).

People lie - People also pay for (and sometimes get for free) reliable information from trustworthy sources (or sources they consider trustworthy).
People are lazy - Yes, and the best way to be lazy is to do the thing that requires the least effort. People want to find their book, document, music or video quickly and are willing either to do it themselves or to get someone else to do it for them (sometimes for money), because having the right metadata leads to less effort.
People are stupid - The web, search engines, classification tools, the Semantic Web, computers, etc. are all trying to help people become smarter - they are tools that help everyone think. In fact, it's a way to make the reliance on individual intelligence less important - it doesn't matter because you can look it up on Google. Many of the rules and requirements of language have been stripped out of RDF. You can't even make a false statement. It's easier to produce good RDF than to produce good English (luckily for me).

Good metadata gives one law firm an advantage over another, it lets you find that song in your MP3 collection when you're jogging, it stops your teeth from falling out, etc. It's wrong to say that because it's not going to be perfect it can't work. It's also wrong because, plainly, people already rely on creating and using good metadata.

Joi Ito responds as well.

D2R Updated

"D2R MAP is a declarative language to describe mappings between relational database schemata and OWL ontologies. The mappings can be used by a D2R processor to export data from a relational database into RDF.

The mapping process executed by the D2R processor has four logical steps:

1. Selection of a record set from the database using SQL
2. Grouping of the record set by the d2r:groupBy columns.
3. Creation of class instances and identifier construction.
4. Mapping of the grouped record set data to instance properties. "

Based on Jena 2.
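The four logical steps quoted above can be imitated in a few lines to show the shape of the process. This is a rough sketch only, not D2R itself: it uses Python's `sqlite3` instead of Jena, plain tuples instead of an RDF model, and the `papers` table, the `http://example.org/` namespace and the function names are all invented for the example.

```python
import sqlite3
from itertools import groupby

BASE = "http://example.org/"  # hypothetical namespace


def export_triples(conn, sql, group_by, class_uri, property_map):
    """Walk through the four logical steps and return a list of (s, p, o) triples."""
    conn.row_factory = sqlite3.Row
    # Step 1: select a record set from the database using SQL.
    rows = conn.execute(sql).fetchall()
    # Step 2: group the record set by the grouping column.
    rows.sort(key=lambda row: row[group_by])
    triples = []
    for key, group in groupby(rows, key=lambda row: row[group_by]):
        # Step 3: create a class instance and construct its identifier.
        subject = f"{BASE}author/{key}"
        triples.append((subject, "rdf:type", class_uri))
        # Step 4: map the grouped record set data to instance properties.
        for row in group:
            for column, predicate in property_map.items():
                triple = (subject, predicate, row[column])
                if triple not in triples:  # repeated values give one triple
                    triples.append(triple)
    return triples


def demo():
    """Build a tiny in-memory table and export it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE papers (author_id INTEGER, name TEXT, title TEXT)")
    conn.executemany(
        "INSERT INTO papers VALUES (?, ?, ?)",
        [(1, "Ann", "On RDF"), (1, "Ann", "On OWL"), (2, "Bob", "On SQL")],
    )
    return export_triples(
        conn,
        "SELECT author_id, name, title FROM papers",
        group_by="author_id",
        class_uri=f"{BASE}Author",
        property_map={"name": f"{BASE}name", "title": f"{BASE}title"},
    )
```

The grouping step is what turns Ann's two rows into a single instance with one name and two titles, rather than two separate instances.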

World Wide Mind

The World-Wide-Mind project "This work proposes that the construction of advanced artificial minds may be too difficult for any single laboratory to complete. At the moment, no easy system exists whereby a working mind can be made from the components of two or more laboratories. This system aims to change that, and accelerate the growth of Artificial Intelligence, once the requirement that a single laboratory understand the entire system is removed."

"The World-Wide-Mind is an artificial intelligence project that aims to make it possible to build large, complex, distributed minds by integrating independently developed modules into single minds."

Of course, each new attempt deserves another mark-up language; in this case, SOML (Society of Mind Markup Language).


Noticed this in my referrers this morning: Semantic Collaboration "seco is a system to enable collaboration in online communities. It collects RDF data from the web, stores it in an index, and makes it accessible via a web interface. At the moment the system contains information about more than 7000 people and 2000 news items. This represents most of the information on the emerging semantic web in FOAF and RDF 1.0 vocabularies. This data has been created by a large number of people. The challenge is to tidy up this data and integrate it in a way that facilitates easy access and re-use."

Wednesday, October 01, 2003

Innovative = Successful

Fast, Focused and Fertile - the Innovation Evolution "Rebellion is out, relating is in. Twenty-six (26%) percent of companies define innovation as "a solution" that identifies and addresses the unmet needs of consumers. Very few associated innovation with a more likely term such as "discovery" or "revolution."

-- Tech companies out-innovate everyone. Microsoft was cited most frequently as one of today's most innovative companies (137 mentions) in unaided open-ended responses, followed by Dell (47 mentions), Apple (40 mentions). The only non-tech companies to make it on the Top 10 List are Wal-Mart (38 mentions) and Daimler Chrysler (21 mentions). "