Thursday, September 30, 2004

DAWG query language has a name

SPARQL - A Query Language for RDF from the latest DAWG minutes. Even has a Simpsons reference and possible logo, Mr Sparkle.

Kendall Clark has some ideas about whether FROM is necessary in "Re: [Fwd: FROM keyword unnecessary?]". I wish I had time to consider this more fully.

RDF's Rainbow

RDF Gravity (RDF Graph Visualization Tool) "RDF Gravity is a tool for visualising RDF/OWL Graphs/ ontologies. Its main features are:
* Graph Visualization
* Global and Local Filters (enabling specific views on a graph)
* Full text Search
* Generating views from RDQL Queries
* Visualising multiple RDF files"

It uses Jena and Jung.

Wednesday, September 29, 2004

One Persistence Framework?

Sun Proposes Single Persistence Model for Java ""Ideally I think a new JSR devoted to transparent persistence would have been more appropriate," Johnson said. "It's odd for EJB3 to be defining a persistence API applicable to J2SE. But overall I think it's promising. Everyone seems to agree that transparent persistence is important, and that POJO persistence is the future. The EJB expert group seems finally to accept that persistence should not be tied to the EJB container. Hopefully from here the focus will be on getting a good technical solution, rather than on politics.""

Good Ontologies and the Semantic Web

Judging the likely Success of an Ontology "The debate about the promised value of the Semantic Web seems to me to be missing a dispassionate examination of the success, or otherwise, of existing ontology based solutions. Clay Shirky is obviously right when he states that a single monolithic ontology will never work. His critics are equally right when they claim the Semantic web will only work if it is a melange of multiple interoperable Ontologies."

"The success or failure of any ontology should be judged primarily by it's ability to support the exchange of operational activity data between agents. This can only be confirmed after the ontology is implemented by assessing how the system performs in the context of use. To reduce the risk of failure in the early stages of specification the various components of the ontology should be assessed individually and collectively in terms of their ability to support required use cases for operational activity data."

From the same blog, Ontology Review 1. The NHS Common Basic Specification. Why top level Ontologies don't work. and Ontology Review 2: The International System of Units (SI). US Resistance to Adoption of the Metric System.

"Building a great ontology is only the first step. Getting people to adopt it is far more challenging. Adoption is not driven by the merits of the new ontology alone. Enforcement is often required. The US will not become metric until congress is prepared to enact enforceable laws that mandate the use of the metric system."

Bubbling to the top

What the Bubble Got Right Another Paul Graham essay. "So what's the connection between nerds and technology? Roughly that you can't fool mother nature. In technical matters, you have to get the right answers. If your software miscalculates the path of a space probe, you can't finesse your way out of trouble by saying that your code is patriotic, or avant-garde, or any of the other dodges people use in nontechnical fields."

"There does seem to be: that in the coming century, good ideas will count for more. That 26 year olds with good ideas will increasingly have an edge over 50 year olds with powerful connections. That doing good work will matter more than dressing up-- or advertising, which is the same thing for companies. That people will be rewarded a bit more in proportion to the value of what they create.

If so, this is good news indeed. Good ideas always tend to win eventually. The problem is, it can take a very long time."

Finding things in a non-semantic web

Drowned out by keywords "So here's a case for the semantic web. It's stupidly difficult to search for news of my hometown.

I live in the beautiful city of York, UK. In most search oriented applications I cannot search for my city. Why?

Because "New York" always matches a search for "York", too. Google ameliorates this a little: you can search for "York UK" or "York -New". But most search facilities aren't as good, and quite a few of my local news sources would not consider "UK" a term they should include in their metadata. "

Here's one closer to home, ISWC'04 or "International Symposium on Wearable Computers", not to be confused with ISWC 2004 or "International Semantic Web Conference". We're in trouble when 4 letter acronyms collide.

Not is no more

The operation still exists but it's now called "exclude". We spent too much time explaining to people how our "not" is not SQL's "not" but another type of "not" that inverts constraint values; combined with the implementer occasionally getting confused between the syntax and the semantics, this led to the new name. "Not" isn't what it used to be but then again maybe it never was.

From what I can tell so far, "exclude" and subqueries will allow you to perform set difference operations, whether it's clear is another problem. Similarly, you can do OPTIONAL queries using only subqueries but once you get to a few layers of subqueries your head explodes. In these cases it's probably better to have syntactic sugar for certain operations - how users express something and what the machinery underneath does to perform it shouldn't be so tightly coupled.
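
As a rough illustration of the set difference idea, here is a minimal sketch using plain Python sets of triples rather than iTQL. The data and the `difference` helper are made up for the example, and it says nothing about how "exclude" is actually implemented.

```python
# Hypothetical data: (subject, predicate, object) triples.
triples = {
    ("alabama", "rdf:type", "State"),
    ("alaska", "rdf:type", "State"),
    ("arizona", "rdf:type", "State"),
}


def difference(matched, excluded):
    """Return the triples in `matched` that are not in `excluded`."""
    return matched - excluded


# Everything typed as a State, and the part we want to leave out.
all_states = {t for t in triples if t[1] == "rdf:type" and t[2] == "State"}
alabama_only = {t for t in triples if t[0] == "alabama"}

not_alabama = sorted(t[0] for t in difference(all_states, alabama_only))
print(not_alabama)
```

The point is only that the user asks for "A minus B"; how the engine gets there (nested subqueries or otherwise) can stay hidden behind the syntax.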

I made a mistake in the allValuesFrom restriction query, here's the corrected version:

select $s $p $x subquery (
select $instance $t $x subquery (
from <...#testexclude>
where exclude($instance $t $x))
from <...#testexclude>
where $s $p $instance and $t <tucana:#is> <rdf:type>
and exclude($instance $t $x))
from <...#testexclude>
where $r <rdf:type> <owl:Restriction>
and $r <owl:onProperty> $p
and $r <owl:allValuesFrom> $x
and $s $p $o2
order by $s ;

I also recently read a paper, "OWL Lite- Reasoning with Rules", which clears up the difference between range and allValuesFrom: "[it] takes into account not only the property but also the domain of the statement. For example, you can say that Stefan manages Wolf, and Wolf is of type Student entails that Stefan is of class Advisor." That's also missing from the query but that's an intentional omission. :-)

Tuesday, September 28, 2004

Gnowsis Alpha

Available for Download I had to prioritize downloading it or blogging it...

Installing and running it was pretty painless (just running a shell script) and I'm already going through my MP3s - it even executes iTunes. It's good to see these ideas implemented.

Feature list :
Server Features
* Local RDF Database (Jena Model based)
* Data integration Hub. integrates different Data sources.
* Filesystem adapter
* MP3-ID3 tag adapter (using MP3 Library by Jens Vonderheide)
* Microsoft Outlook adapter
* Mozilla Thunderbird email adapter
* Mozilla Firefox bookmarks adapter
* Java Client API
* full text indexing (using Apache Lucene)
* local webserver for experiments (using Jetty)

Browser Features
* Browse the local Semantic Desktop
* shows related information for any resource
* Manage your projects using ordinary File Folders
* full text search
* Link anything with drag-drop
* annotate photos and persons

Check out the planned features too.

No New Ideas

Paul has blogged (under the title "Graph Data Model") about a book called "Graph Data Model and its Data Language". It explains how you would go about creating a graph based database and query language - it would've saved us some time when writing TKS/Kowari. It even suggests powerful operations you could perform on such a database such as transitive closure.
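
For anyone unfamiliar with the term, transitive closure over a graph can be sketched in a few lines of Python. This is a naive fixed-point version written for illustration only; it is not taken from the book or from TKS/Kowari, and the class names in the data are invented.

```python
def transitive_closure(edges):
    """Return every (a, b) pair such that b is reachable from a."""
    closure = set(edges)
    while True:
        # Join the relation with itself: a -> b and b -> d gives a -> d.
        new_pairs = {
            (a, d)
            for (a, b) in closure
            for (c, d) in closure
            if b == c
        }
        if new_pairs <= closure:
            return closure  # nothing new was found, so we are done
        closure |= new_pairs


# Hypothetical subclass hierarchy stored as (child, parent) edges.
sub_class_of = {("Student", "Person"), ("Person", "Agent"), ("Agent", "Thing")}
closure = transitive_closure(sub_class_of)
print(("Student", "Thing") in closure)
```

A real store would do this with indexes rather than repeated self-joins, but the result is the same set of pairs.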

Semantic Web Services Graphically

ODE SWS "The ODE SWS framework proposes the use of Problem-Solving Methods (PSM) to describe the services at the knowledge level; that is, independently of the language in which the service will be expressed."

"The Unified Problem-solving Method Language (UPML) has been proposed to provide a high-level description of the knowledge components of a PSM:
* Tasks, describe the operation to be solved, defining its input/output roles and the pre/post-conditions that should be verified to execute the task;
* Methods, detail the control of the reasoning process needed to achieve a task, specifying, if required, the decomposition of tasks into their sub-tasks as well as the coordination of the execution of such sub-tasks to obtain the result (operational description);
* Domain models, contain the knowledge of the domain of an application;
* Ontologies, semantically describe the elements used in the definition of tasks and methods (roles), and domain models (objects); and
* Adapters, establish mappings among the other knowledge components of a PSM, enabling the reuse of the components since adapters define the conditions in which the components could be applicable each other."

The manual was convincing enough to give it a try. The main graph UI is UML like, similar to Rational Rose. Written in Java it uses Jena and JGraph. The JAR is self executing but you need Jena and the rest in the classpath for it to work. It also comes with a distribution of Minerva and WebODE.

Monday, September 27, 2004

Other links

* Social Networks Connecting people with Information and Each Other "Verity K2 solutions take advantage of social network technology to automatically provide users with personalized discovery features. Through automatic analysis of the way users create, modify, locate and retrieve information, a model of the entire community..."
* Sir Tim Berners-Lee "TR: What kinds of Semantic Web applications are people making for the next phase?
B-L: Exciting things are happening in the life sciences. The big challenges such as cancer, AIDS, and drug discovery for new viruses require the interplay of vast amounts of data from many fields that overlap—genomics, proteomics, epidemiology, and so on. Some of this data is public, some very proprietary to drug companies, and some very private to a patient. The Semantic Web challenge of getting interoperability across these fields is great but has huge potential benefits."
* MG4J (Managing Gigabytes for Java) From a book I found interesting (NZDL is great). "MG4J (Managing Gigabytes for Java) is a collaborative effort aimed at providing a free Java implementation of inverted-index compression techniques; as a by-product, it offers several general-purpose optimised classes, including fast & compact mutable strings, bit-level I/O, fast unsychronised buffered streams, (possibly signed) minimal perfect hashing for very large strings collections, etc."
* Practical RDF Browser "So who’s going to do it? Awk, I wish I had time. Most of the stuff I’ve done previously along visualization lines has been from scratch using Java. But this time, for The Browser I’d go with SVG-enabled Mozilla/Firefox (or even forget the SVG, just make heavy use of CSS for a more graphic layout, or Plan C use XUL). Moz is already riddled with RDF, which should help. Its purpose would be a viewer, so I’d try and avoid having too many dependencies - a full RDF API/triplestore shouldn’t be needed, though being able to handle subclasses/properties would be very desirable, to enable approximate rendering."

Web Based Ontology Editing

pOWL " * POWL supports viewing, editing of RDFS/OWL ontologies of arbitrary size. It is even quite fast with the largest available models (e.g. NCI Cancer Ontology containing about 28,000 classes).
* Sophisticated widgets for data editing such as widgets for editing HTML in a WYSIWIG manner or for dates are integrated.
* Questioning the knowledge base. pOWL currently offers an RDQL query builder as well as a full-text search for literals and resources.
* Plugin concept. POWL is easy extensible, unfortunately still laking exhaustive documentation on this - please have a look at the source code and pester the developers. :-)
* Powerful object oriented API. All functionality is accessible by a clean application programming interface.
* Authentification scheme. Fine grained exposition of features and model data: Privileges (view, edit) for users and groups are planned to be assigned to Models, Classes and Properties.
* Versioning. All edits of a knowledge base may be logged and rolled back (depending on time, user and edit action).
* POWL is fast. Models are stored in database tables and only those parts of the model are loaded into main memory which are actually needed. Thus POWL is scalable and fast.
* Multi language support. POWL comes with English and German translations of the user interface. If you would like to provide a translation please contact us!"

The live demo is well worth a look.

Saturday, September 25, 2004

Load Speed

Triple Loading "What doesn’t show in the numbers is the fact that the entire load process is CPU bound, it’s not disk accesses that’s taking time. Also, the amount of triples, 473589, isn’t enough to fill up the in-memory cache MySQL maintains (here), not even importing a large dataset like Jim’s 6.7 million scuttered statements (converted into NTriples with jim2ntriples) seems to be. With bulk loading turned on, that entire process takes about 34 minutes, equivalent to about 3300 triples per second, as compared to the about 3000 triples per second for the best case above."
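
The quoted rate can be checked with a couple of lines of arithmetic. Both inputs are the rounded figures from the quote ("6.7 million" and "about 34 minutes"), so the result is only approximate.

```python
statements = 6_700_000  # "6.7 million scuttered statements"
minutes = 34            # "about 34 minutes" with bulk loading on

rate = statements / (minutes * 60)  # triples per second
print(round(rate))
```

That works out to a little under 3,300 triples per second, in line with the figure in the quote.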

allValuesFrom in iTQL

I like this query because:
* It shows what NOT is (basically allows you to say things like "give me all the states that are not Alabama").
* Uses the new HAVING clause.
* Seems to be a nice use case for count and/or subqueries i.e. the ability to execute an outer query and plug the results into the inner query finally makes sense.

select $s $p $x count (
select $type
from <rmi://.../server1#foo>
where $s $p $type
and not($type <rdf:type> $x ))
from <rmi://.../server1#foo>
where $r <rdf:type> <owl:Restriction>
and $r <owl:onProperty> $p
and $r <owl:allValuesFrom> $x
and $s $p $o2
having $k0 <tucana:occursMoreThan> '0'^^<xmlschema:nonNegativeInteger>
order by $s ;

$k0 is the variable binding for the COUNT - we should offer a way to explicitly bind aggregates to a variable.
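
To make the intent of the query easier to follow, here is a rough Python sketch of what I read it as computing: for every restriction, count the values of the restricted property that are not typed as the allValuesFrom class, and keep only the subjects where that count is above zero. This is an interpretation, not the engine's actual evaluation; the function name and the `ex:` data are hypothetical.

```python
RDF_TYPE = "rdf:type"


def all_values_from_violations(triples):
    """Map (subject, property, class) to the number of offending values."""
    triples = set(triples)
    restrictions = {
        s for (s, p, o) in triples if p == RDF_TYPE and o == "owl:Restriction"
    }
    offenders = {}
    for r in restrictions:
        props = {o for (s, p, o) in triples if s == r and p == "owl:onProperty"}
        classes = {o for (s, p, o) in triples if s == r and p == "owl:allValuesFrom"}
        for prop in props:
            for cls in classes:
                for (s, p, value) in triples:
                    # Inner query: values of the property lacking the type.
                    if p == prop and (value, RDF_TYPE, cls) not in triples:
                        offenders.setdefault((s, prop, cls), set()).add(value)
    # Only subjects with at least one offender appear, like the HAVING clause.
    return {key: len(values) for key, values in offenders.items()}


# Hypothetical data: one restriction and two subjects using the property.
data = [
    ("_:r1", RDF_TYPE, "owl:Restriction"),
    ("_:r1", "owl:onProperty", "ex:manages"),
    ("_:r1", "owl:allValuesFrom", "ex:Student"),
    ("ex:stefan", "ex:manages", "ex:wolf"),
    ("ex:wolf", RDF_TYPE, "ex:Student"),
    ("ex:anna", "ex:manages", "ex:rex"),
    ("ex:rex", RDF_TYPE, "ex:Dog"),
]
violations = all_values_from_violations(data)
print(violations)
```

Here only `ex:anna` shows up, because `ex:rex` is not typed as `ex:Student`. Like the query, the sketch ignores which class the restriction is attached to.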

Another new feature that Paul's been working on, querying only URIs, makes use of virtual graphs and magical predicates like our data typing support.

Improving Guided Search

Site-search tools expand scope ""It is all about creating the metadata surrounding your content and using it effectively. With automated metadata creation, we can go through a site and pick out portions to expose," he said.

Whereas other search tools go through SQL for relational data queries, Atomz Commerce does relational metadata queries through site metadata.

"It is easier than SQL queries because that [metadata] is readily available on the Web site," Kusmer said."

"Guided search and self-service tools aim to improve on basic keyword site searching by pointing users to what they really want, said Eric Peterson, site operation and technology analyst at Jupiter Research.

"We see search vendors each looking at [their] core technology to see how they can better leverage that into other areas," such as applying natural language processing to self-service or guided commerce, Peterson said. "

Google's XUL

Old but Google's XUL is something for new Mozilla/Firefox users (2 million in 10 days). Via Xuggle.

XUL, something for your generic RDF browser. Not one UI for data but a framework to render your data with. Mozilla/Firefox is using RDF, and a query language, templates and bindings. There's also Kowari's descriptors which allow nesting of templates (vCard example).

Friday, September 24, 2004

Think Local, Search Global

Simple search lightens Net load " Researchers from the University of California at Los Angeles have devised a fast search algorithm that uses local rules to find nodes and content in randomly-formed, scale-free networks such as the Internet."

"The researchers' algorithm is based on the bond percolation threshold, or the smallest probability that a message is guaranteed to reach a core sub-network of highly-connected nodes, said Roychowdhury.

As connections randomly percolate through the network at a low rate, only small, isolated islands form. Once the bond percolation threshold is passed, the core of the network becomes connected. The threshold is an abrupt phase transition like the quick transformation that takes place when water boils or freezes.

The algorithm involves three basic steps: content caching, query implantation, and bond percolation, said Roychowdhury.

Content caching happens when a node joins a peer-to-peer network and performs a one-time short random survey, or walk of nearby nodes and adds its content directory to each of these neighboring nodes.

The query implantation step is similar, but happens at the beginning of every query. When a node has a query, it performs a short random walk and passes the query along to each node it encounters.

These random walks are long enough that any given node will almost surely encounter at least one highly-connected node, said Roychowdhury. "So after these two steps one of the high-degree nodes has a copy of a node's directory, and a query is implanted at one of the high-degree nodes.""

"The key step in working out the algorithm was making sure that each query could reach a high-degree node starting from any node in the network. The conceptual breakthrough was the realization that the whole search process could be implemented locally, through messages passed among neighbors only, said Nima Sarshar, a researcher at the University of California at Los Angeles."
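
The three steps described in the article can be sketched as a toy simulation. This is only a simplified illustration of the description above, not the researchers' algorithm: the graph, the walk length and the forwarding probability `prob` are all hand-picked assumptions, and no percolation threshold is actually computed.

```python
import random


def random_walk(graph, start, steps, rng):
    """Return the nodes visited by a short random walk from `start`."""
    visited = [start]
    node = start
    for _ in range(steps):
        node = rng.choice(graph[node])
        visited.append(node)
    return visited


def percolation_search(graph, owner, item, asker, walk_len, prob, seed=0):
    """Return True if the query from `asker` reaches a cached copy of `item`."""
    rng = random.Random(seed)

    # 1. Content caching: the owner's directory entry is copied to the
    #    nodes met on a short random walk.
    cache = {node: set() for node in graph}
    for node in random_walk(graph, owner, walk_len, rng):
        cache[node].add(item)

    # 2. Query implantation: the query is planted on the nodes met on
    #    another short random walk, starting from the asking node.
    frontier = list(dict.fromkeys(random_walk(graph, asker, walk_len, rng)))
    seen = set(frontier)

    # 3. Bond percolation: each node that holds the query forwards it
    #    across each of its links with probability `prob`.
    while frontier:
        node = frontier.pop()
        for neighbour in graph[node]:
            if neighbour not in seen and rng.random() < prob:
                seen.add(neighbour)
                frontier.append(neighbour)

    return any(item in cache[node] for node in seen)


# Hypothetical network: one highly-connected hub and a few leaf nodes.
graph = {
    "hub": ["a", "b", "c", "d"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "d": ["hub", "e"],
    "e": ["d"],
}
found = percolation_search(graph, "a", "song.mp3", "e", walk_len=2, prob=1.0)
print(found)
```

Every decision uses only a node's own neighbour list, which is the "local rules" part of the idea. With `prob` set to 1.0 the query simply floods the network; the interesting behaviour in the article is what happens when it sits just above the threshold.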

For some background see: Linked.

Matter's has a follow up with a link to: Scalable Percolation Search in Power Law Networks .

Adobe's use of RDF

XMP Lowdown "Adobe looks very strongly committed to XMP. Their decision to make it an RDF-based format, the high profile that Adobe products have in the commercial publishing world, and big print media publishers' growing interest in efficiently tracking their metadata are three factors that combine to make XMP a golden opportunity for the business world to appreciate the value of RDF. This business community doesn't care much about the Semantic Web, and its use of XMP (and hence RDF) will be behind firewalls, but an increased use of XMP in PDF, JPEG, and other formats will eventually mean more files with RDF-based metadata sitting on publicly accessible web servers, and hence a greater extension of the Semantic Web."

Verity Entity Extractor

Verity extracts meaning from content "Extractor scans documents of all types, and identifies the expected "information entities" such as names, addresses, dates and Social Security numbers, but also less obvious ones such as places, measurements and holidays."

Wednesday, September 22, 2004

Does Google's browser have a name?

gbrowser "I would just love if the following rumor was true: Google builds a browser based upon Mozilla. Of course, there is some evidence Google is going to build a browser, just whois"

Google Browser Opens Up Gates For Web Services "This would take Google on a head-to-head strategy with Microsoft. Gmail or something like it could even very easily serve as a personal online server and archive.

The only core product that Google couldn’t challenge with that is the operating software and the office suite."

"In fact, a Google browser move is a strong example of what David Stutz warned Microsoft of; networked computing will overtake PC-based computing and Microsoft doesn’t have a competitive answer.

Google would have a lot to gain from a central position in Internet-based communications. Microsoft would have everything to lose.

Give me one good reason why Google wouldn’t do it."

IBM and its Semantic Web Technologies

Introduction to semantics technology "New semantic information-management schemes enable companies to make better use of their information. What exactly is semantics? And how can semantics technology help your development efforts?"

"Semantics technologies are important in a variety of industries such as life sciences, in which biomedical text plays a fundamental role in knowledge discovery. It can also be applied to numerous solutions, for example autonomic computing, where data correlation and inference technologies can be used as core components for building autonomic computing systems and to perform automated and continuous analysis that can result in actions that protect and heal a system."

"...State-of-the-art ontology management systems either are memory-based or use ad hoc solutions for persisting data...To overcome these limitations, researchers work on database-centered architectures for storing and manipulating ontological data. Such solutions will take advantage of existing standards for data management and the DBMS features that have been optimized over the years (robustness, concurrency control, recovery, and scalability). There is also much activity in the application domain. Two prominent applications that come to mind are the efforts involved in semantic Web services for capability-based discovery and composition of business processes, and the use of semantics with social networks for better collaboration."

Apart from the previous releases (IBM Integrated Ontology Development Toolkit and Ontology Management System (SNOBASE)) there's the more recent Ontology-based Web Services for Business Integration. Only for Windows (screenshots).

Tuesday, September 21, 2004

Early days for the Semantic Web Browser

Google's Browser Project...And Ours... "If true, the folks at Google should get in touch with me... without disclosing too much (yet), we are working on a project (for SRI and DARPA) to build a Java-based fully-semantic open-sourced PIM that grafts Mozilla onto my company's Semantic Applications Platform. The result is an integrated cross-platform PIM suite comprised of an OWL-ontology-based Web browser, e-mail, calendar, to-do manager, and chat... and that's just the beginning..."

FOAF based social network

Clique "Clique is a social networking site helping to drive the next evolution of the Web.

Clique is based on FOAF, an RDF vocabulary which helps to describe people and the relationships between them. These are the technologies that are driving the Semantic Web, which is envisioned to be the next evolution of the Semantic Web, and is well under way."

Pfizer Goes Semantic Too

WD: Pfizer Position Paper From the attached Word document:"There are really two issues here where Semantic Web technologies could be used to great effect. The first of these is the communications issue. How can experimental protocols, descriptions of model systems, statistical criteria for data acceptability, and many other critical elements be effectively communicated between technology silos? The second issue is that of synthesizing results from the various technology silos into a holistic picture of physiology..."

"...there are gaps between discovery, development, and clinical groups as well. These gaps have to do with inter-group communications and with the fact that, while there are some similarities, the different groups often work with very different types of information. Here again it seems that Semantic Web technologies could be instrumental in solving some of these problems, although it is clear that the solutions are not trivial."

"...When presented with a large amount of data and other information, IT departments typically create databases to store the data and provide interfaces by which users can query the data. Unfortunately, there are some problems with this approach. Non-IT users are typically very poorly equipped to perform queries of even middling sophistication, especially when they require the use of a query language such as SQL to accomplish. The upshot is that the data gets into the database, but it doesn’t come out again..."


Monday, September 20, 2004

Scaling Redland

Impact of storing RDF triples with Redland "In my prototyping Archipel, my pet software configuration management system, I started to use Redland to store the version information as RDF triples. I quickly realised that the RDF storage (stored by Redland using Berkeley DB) was using a lot of space, compared to what I was doing. For instance, storing 143 files generated a database of 624Kb, plus a directory containing the actual file content (the RDF storage did only contain versioning information). This is something like 5 to 6 time the size I was expecting."

"It seems that unless I am not using Redland Python API properly, Redland has an important overhead on storing triples. I hoped to use it as a storage backend for Archipel, because I liked the idea of managing version information in RDF, but the overhead is disappointing, if not discouraging.

However, Redland scales really well, and obvisouly will not grow in an unexpected manner when you reach the million of triples, which makes it really robust."

Data is structured

Forget unstructured data "In other words, all data is structured - it is just that some data is more structured than others. Actually, our misconceptions go further than this. The truth is that a photograph has more structure than a sales order, say. In fact, it has so much structure that we cannot usefully encapsulate it.

Indeed, this has important consequences for the way we that we think about metadata. This usually described as data about data. Now, consider the metadata that you would require to describe the Mona Lisa. We could agree that it was a painting, a portrait of a woman, and that it was painted by Leonardo. But metadata is supposed to describe the data: could you get any two people to agree on how to describe the Mona Lisa? And, if you can't get agreement on your metadata then it ceases to be worthwhile.

Perhaps then that is the distinction we need: it is not a question of whether data is structured or not, but actually about whether the data can be fully described by the relevant metadata."

Sunday, September 19, 2004

So much for the truth

Iraq had no WMD "A draft of the Iraq Survey Group's final report circulating in Washington found no sign of the alleged illegal stockpiles that the US and Britain presented as the justification for going to war, nor did it find any evidence of efforts to reconstitute Iraq's nuclear weapons programme."

There's a similar NY Times article, and the Guardian also has another article on how Iraq is "Far graver than Vietnam".

But then, there's the article "Who Cares About the Truth?".
"At the end of the day, is it always better to believe and speak the truth? Does the truth itself really matter? While generalizing is always dangerous, the above responses to the Iraq affair indicate that many Americans would look at such questions with a jaundiced eye. We are rather cynical about the value of truth."

"Sure, we may say we want to believe the truth, but what we really desire is to believe what is useful. Good beliefs get us what we want, whether nicer suits, bigger tax cuts, or a steady source of oil for our SUV's. At the end of the day, the truth of what we believe and say is beside the point. What matters are the consequences."

Saturday, September 18, 2004

Own your Data

The Future of the Semantic Web is Here Today and is Evenly Distributed "In this essay I will argue that the tools to do this exist now, and that all factors point toward a future of owning and publishing all our contributions."

"The most obvious example where redundant work has been noticed recently has been the proliferation of social networking sites like Orkut and Friendster. You had to reproduce the same data across every site. This didn't work out so well. In fact people were sick of these sites pretty fast. Now imagine producing, by hand, an rss feed for every rss aggregator in existence. Worst. Idea. Ever."

"We contribute our thoughts and work to the web now. We all write these incredibly useful pieces of information now, just not for ourselves.

Own your data. This future is here and is evenly distributed."

Friday, September 17, 2004

Drops to make an Ocean

many little models "Like many others (e.g. lost boy and more news) I’m wondering about the scalability of RDF data stores. In particular, we have many very small models (under 25 statements for most models, the largest is unlikely to exceed 100 statements) which we don’t really need super fast access to, but store more for reflection. While at current we only have twelve thousand of these models, we are expecting to collect much closer to 100,000 each term, and want to keep data around for at least the last couple of years (si lets say a million or so models). The 12,000 or so has already kicked our database to its knees (MySQL, Jena 2.1, an old P3 700Mhz with minimal ram), which we didn’t really expect."

Sesame 1.1 RC1 Released

"1.1-RC1 is the first official Release Candidate for Sesame 1.1. This release has a number of major improvements:

* The Graph API, an extension of Sesame's access APIs, allows fine-grained manipulation of RDF models directly from Java.
* The Native Disk Store is a new storage backend that works directly on the file system, without need for a DBMS. It uses B-Tree indexing on binary files for fast, efficient and scalable storage.
* SeRQL revision 1.1 is a syntax revision that makes SeRQL queries even easier to read and write, and makes embedding in XML easier.
* Blank node handling has dramatically improved compared to 1.0.x.
* RDF Schema inferencing has been updated to be fully compliant with the W3C RDF Semantics Recommendation.
* Support for MS SQL Server as storage backend RDBMS. Thanks to Adam Skutt for providing fixes and suggestions for this.
* Partial OWL reasoning support through Sesame's custom inferencer."


In A new Sail for Sesame thread: "1. Loading of multimedia metadata (includes metadata-extraction from 2777 MP3 files, transformation and load; so don't care for the absolute times...):

MSR: 85 sec (49089 explicit + 139894 inferred =188983 total statements)
NR: 87 sec (49089 explicit statements)

=> As NR does not generate inferred statements, I'm not sure what the performance will be with inferencing, but this looks good to me at the moment.

2. Renaming of a Class with 2777 associated instances:
MSR: 52 sec
NR: 3 sec
=> Wow! This is much better and depends on the number of triples involved (in contrast to the memoryrepository)."

Live Link Sharing

What are Live Bookmarks? "Services like let you publish your own bookmarks as RSS feeds, so that other Firefox users can subscribe to your bookmarks through Live Bookmarks. Live Bookmarks and makes it easy to share cool sites you like with your friends."

Thursday, September 16, 2004

More Military Intelligence

Uncle Sam's Semantic Web ""The beginning of the Iraqi operation was postponed for weeks because information systems couldn't be made interoperable in the time required," said Hendler. "Systems couldn't talk to one another." It was a problem, he says, that a Semantic Web framework could have solved."

"As more triples are created, checkbooks are coming out. TopQuadrant, one of the co-sponsors of the conference, has decided that the Semantic Web is a bankable technology, and predicts that a U.S. $63 billion market for "Semantic Technology" will emerge by 2010.

Miller, the W3C Semantic Web Activity lead, is not as confident in these numbers -- not from a lack of confidence in the technology, but from a suspicion of all predictions. "I am cautiously optimistic. I'm seeing an increase in tools and services," said Miller."

Wednesday, September 15, 2004

The Right Amount of Integration

Killing the "WinFS is About Making Search Better" Myth "At its core, WinFS was about storing strongly typed objects in the file system instead of opaque blobs of bits. The purpose of doing this was to make accessing and manipulating the content and metadata of these files simpler and more consistent. For example, instead of having to know how to manipulate JPEG, TIFF, GIF and BMP files there would just be a Photo item type that applications would have to deal with. Similarly one could imagine just interacting with a built in Music item instead of programming against MP3, WMA, OGG, AAC, and WAV files. In talking to Mike Deem a few months ago and recently seeing Bill Gates discuss his vision for WinFS to folks in our building a few weeks ago it is clear to me that the major benefits of WinFS to end users is the possibilities it creates in user interfaces for data organization."

Monday, September 13, 2004

Fibrous Blogging

David Wood's new blog is where he sits down and gives us a vigorous new article, "Making the Semantic Web Viral": "The Semantic Web has not undergone the explosion of users that caused the early World Wide Web to be successful...The barrier to entry is currently substantially higher than it was in the early Web."

"The way I see it, Web links with semantics need an attribute on a hyperlink that defines the relationship between the resources...Once you have links with semantics, you need to provide a means of navigating from the existing Web to the more abstract space of RDF and back again. You can already link a Web page to another Web page, but what do you do when you want to link to a person? Representing people as email addresses or even FOAF files is weak, silly and just plain insufficient. People exist only in Meat Space and so have to be represented in Semantic Web Space by some non-trivial means. RDF & OWL already provide a great answer for this: A person object can exist that link to email addresses, FOAF and all the rest. Now we only need a way to navigate it via the Web."

I'm not sure if this is anything like what David wants, but I recently came across This is a nice OS X client for The screenshots show a user searching and adding metadata.

Saturday, September 11, 2004

Quick Links

* Dave Orchard on the Semantic Web
* Study: Facets on the web
* Mindmapping blogs
* Allchin's last stand?

Combine SW and Google

"Swoogle provides data service that discovers, digests, analyzes and indexes semantic web documents as well as portal service for the semantic web researchers. It is carried out by the ebiquity research group at UMBC."

Topic search and statistics.

Also, a paper is available: eBiquity: Publication: Swoogle: A Semantic Web Search and Metadata Engine.

Friday, September 10, 2004


Named Graphs API for Jena "The Named Graphs API for Jena (NG4J) is an extension to the Jena Semantic Web framework for parsing, manipulating and serializing sets of Named Graphs."

A lot of these methods are similar to the changes made between Kowari 1.0.4 and Kowari 1.0.5, including methods on DatabaseSession that actually use Jena objects.

Chemically assisted nuclear reactions

Cold Fusion Back From the Dead "THE FIRST HINT that the tide may be changing came in February 2002, when the U.S. Navy revealed that its researchers had been studying cold fusion on the quiet more or less continuously since the debacle began. Much of this work was carried out at the Space and Naval Warfare Systems Center in San Diego, where the idea of generating energy from sea water—a good source of heavy water—may have seemed more captivating than at other laboratories."

"Then, last August, in a small hotel near the Massachusetts Institute of Technology, in Cambridge, some 150 engineers and scientists met for the Tenth International Conference on Cold Fusion. Conference observers were struck by the careful way in which various early criticisms of the research were being addressed. Over the years, a number of groups around the world have reproduced the original Pons-Fleischmann excess heat effect, yielding sometimes as much as 250 percent of the energy put in."

Thanks to Paul for letting me know about this.

Open Workbench source available

Open Workbench is a replacement for MS Project written in Java and C++.

Historic News

Brooklyn Daily Eagle 1852-1902 It's a very interesting experience. Via Shelflife, this article explains a bit more about it:
"The Brooklyn Daily Eagle, founded in 1841, was Brooklyn's newspaper of record. During the Civil War it was the most widely read afternoon newspaper in the nation, gaining prestige as the century drew on. The vast repository contains vibrant reporting on local and national events through a period of enormous growth. It is also a rich resource for illustrations and photographs."

"Naturally, the demand for digitizing 1902–55 of the Eagle is very high, since these recent years cover the lifetimes of many people's close ancestors. But several issues need to be resolved. First, web access to newspapers after 1923 presents copyright complications, which lawyers are now reviewing."

I enjoyed flipping through it: you can find things like a clipping from December 5th, 1902, read the sports section before basketball was invented, or read about the solar system before Pluto was discovered.

Reiser4 Metadata

Reiser4 file semantics: An opportunity for open source "Some people feel that the Reiser4 file semantics will present problems for the Linux community. In a nutshell, every file now looks like a directory and can be opened as a directory. The names in that directory are not new files but metadata associated with the file, as documented by Hans Reiser on the Namesys site. The immediate response in the community has been that this is too big a change and should be withdrawn. I humbly propose that this is a challenge we should face head on now or we may not have an opportunity to do so in the future.

The best way for open source to fight patents is to create prior art, and you can only create prior art if you have a problem to solve. WinFS is going to give Microsoft the opportunity to discover the problems that have to be solved when faced with a filesystem that offers rich metadata. "

Wednesday, September 08, 2004

A Rel Query Language

"Rel is a project to develop a true relational database server based on a relational language called "Tutorial D". Rel is not SQL!"

"In The Third Manifesto, Date and Darwen propose a database language that:

* Is truly relational;
* Provides certain desirable programming language features, including some that are typically (but not exclusively) found under the "Object Oriented" heading;
* Represents a "firm foundation for the future of data.""

They haven't finished implementing TCLOSE (transitive closure) yet though.


Dyson's Meta-Mail: merging email with business processes "The point of this example is that a business process is being conducted via email. Each message is a step within our ad-hoc "renew a support contract" process."

The idea of business process via email doesn't seem novel to me; it's something people have been doing with applications like Exchange for quite a long time. I did notice a link to Roundup, though, whose data model (hyperdatabase) sounds a lot like RDF:
"For our application, we store each item as a item in a hyperdatabase. The item's properties are stored as key-value pairs on its item. Several types of properties are allowed: string, number, boolean, date, interval, link, and multilink...The link type denotes a single selection from a number of options. A link property entails a link from the item possessing the property to the item representing the chosen option.

The multilink type is for a list of links to any number of other items in the database. A multilink property, for example, can be used to refer to related items or topic categories relevant to an item."
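To make the RDF comparison concrete, here is a minimal Python sketch of how items with key-value properties could be flattened into subject-predicate-object triples. It is not Roundup's actual API: the `Link` class, the `to_triples` function and the sample item ids are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple, Union


@dataclass(frozen=True)
class Link:
    """Reference to another item in the store, by its (hypothetical) id."""
    target: str


Literal = Union[str, int, float, bool]
Value = Union[Literal, Link, List[Link]]
Triple = Tuple[str, str, Union[Literal, Link]]


def to_triples(items: Dict[str, Dict[str, Value]]) -> Iterator[Triple]:
    """Flatten key-value items into (subject, predicate, object) triples."""
    for item_id, properties in items.items():
        for key, value in properties.items():
            if isinstance(value, list):
                # A multilink becomes one triple per linked item.
                for link in value:
                    yield (item_id, key, link)
            else:
                # Literals and single links each map to exactly one triple.
                yield (item_id, key, value)


# Sample data, made up for this sketch.
items = {
    "issue1": {
        "title": "Renew support contract",
        "status": Link("status-open"),
        "topics": [Link("topic-email"), Link("topic-contracts")],
    },
    "status-open": {"name": "open"},
}

triples = list(to_triples(items))
for triple in triples:
    print(triple)
```

The item id plays the role of the subject, the property key the predicate, and a link or multilink simply yields objects that are themselves item ids, which is roughly what an RDF graph does with resources.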

Trans and Walk in the DAWG

"In order to traverse class/property hierarchies defined in a schema, RQL provides functions such as subClassOf (for transitive subclasses) and subClassOf^ (for direct subclasses). For example, we can issue the queries: subClassOf(Artist) and subClassOf^(Artist) to find all transitive (direct) subclasses of class Artist. Similarly, functions superClassOf and superClassOf^ return transitive (direct) superclasses.

Similar functions exist for schema properties (i.e., subPropertyOf and subPropertyOf^). For example, we can ask for all transitive (direct) subproperties of creates: subPropertyOf(creates) and subPropertyOf^(creates)."

"That's what I had in mind for 4.6 as originally proposed."

I hope I'm correct in characterizing this as the functionality we currently provide in iTQL with trans and walk. Swapping the values of the first parameter switches between superclass and subclass traversal. I'd also like to add a limit on recursion, called depth.
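As a rough illustration of the direct versus transitive distinction, and of what a depth limit might look like, here is a small Python sketch. It is not iTQL or RQL syntax: the `reachable` and `invert` functions, the `depth` parameter and the sample class hierarchy are assumptions made up for this example.

```python
from typing import Dict, Optional, Set

# Hypothetical schema: each class maps to its direct subclasses.
DIRECT_SUBCLASSES: Dict[str, Set[str]] = {
    "Artist": {"Painter", "Sculptor"},
    "Painter": {"Cubist"},
    "Sculptor": set(),
    "Cubist": set(),
}


def invert(edges: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Reverse every edge, turning a subclass map into a superclass map."""
    inverted: Dict[str, Set[str]] = {node: set() for node in edges}
    for parent, children in edges.items():
        for child in children:
            inverted.setdefault(child, set()).add(parent)
    return inverted


def reachable(edges: Dict[str, Set[str]], start: str,
              depth: Optional[int] = None) -> Set[str]:
    """Return the nodes reachable from start by following at most `depth` edges.

    depth=None means no limit (the transitive case); depth=1 gives only
    the direct neighbours.
    """
    found: Set[str] = set()
    frontier = {start}
    steps = 0
    while frontier and (depth is None or steps < depth):
        next_frontier: Set[str] = set()
        for node in frontier:
            for neighbour in edges.get(node, set()):
                if neighbour != start and neighbour not in found:
                    found.add(neighbour)
                    next_frontier.add(neighbour)
        frontier = next_frontier
        steps += 1
    return found


transitive_subs = reachable(DIRECT_SUBCLASSES, "Artist")
direct_subs = reachable(DIRECT_SUBCLASSES, "Artist", depth=1)
transitive_supers = reachable(invert(DIRECT_SUBCLASSES), "Cubist")
```

Reversing the edges is the sketch's stand-in for swapping the parameter values around: the same traversal then answers the superclass question instead of the subclass one.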

Google Photo

Picasa Ads Appearing On Google Image Search Results "When the search engine first featured a link for the digital photo utility, traffic to Picasa increased by 6000%.

Now Google has begun including an ad-style Picasa banner on the result pages for their image search engine."

Further information, " – Google’s New Photographic Portal": " is a free photo organizer that allows you to share your photos with friends and family. Best of all, its free! is integrated with enhanced photo editing tools that could possibly rival Photoshop with the ease of use and portability."

This continues the Googlification of all things, like the recently rumoured Google Chat based on Jabber.

RDF Dump

RDF Data "This is a list of stores of RDF data available on the web."

From is go.

Tuesday, September 07, 2004


"Database File System; It is a new type of file system that does away with places where you store your files. Actually do not think of it as a file system, instead think of it as a document system. And while being precise, it is not database system either, it is a faceted system. It is a file system geared toward serving the user and is meant to make your live easier. It supports 'locating' files the way you think about them."

"The implementation is not perfect, it is a prototype grade application, meant as a proof of concept. The server and client are implemented using O'Caml, but the client has different APIs: O'Caml, C, C++ and Objective-C."

Semantic Web Technologies

Semantic Web Technologies "...provides the core technology found in Topic Map Explorer. We are also about to release a C# RDF Engine, C# Peer-to-Peer infrastructure Technology and a C# Topic Map Engine."

Topic Map Explorer.

Monday, September 06, 2004

Creative Commons Search

Our updated Search "We flipped the switch last week and have been testing it ever since. Compared to the last version of our search engine, this one is blazingly fast to return results, the results are much more specific to what you're looking for, and it is constantly keeping up to date on over 1 million pages with Creative Commons license info in them."

What I like is the "explain" feature. It looks like the Nutch API is specific to Creative Commons and isn't available for general RDF use. Nutch has a bunch of other plugins, similar to resolvers in Kowari, like mp3 and mbox.

Florida Courts deploying RDF

Florida Supreme Court Selects Metatomix "Metatomix, announced that the Florida Supreme Court, Office of State Courts Administrator, has selected it to deliver a next-generation Judicial Integration Solution for the Florida Supreme Court system. The Metatomix solution enables a seamless, cross-agency, real-time information-sharing system for the Florida courts."

"Metatomix, Inc. provides what it calls next-generation Semantic Web-based technology for enterprise resource interoperability (ERI). Using the W3C RDF standard, Metatomix technology offers bi-directional, distributed interoperability platform, that enables business managers and enterprise IT professionals to manage their entire resource set and IT infrastructure assets through a unified view."

Military Intelligence

Army seeks to use semantic Web to improve knowledge management "Lt. Gen. Steven W. Boutelle, CIO of the Army, said the service must move away from the current Web environment and turn to the semantic Web, which will help military users improve their knowledge management.

“This is the next step … to really get knowledge out of terabytes of information,” Boutelle said yesterday in a keynote speech at the Army’s 2004 Directories of Information Management conference."

Friday, September 03, 2004

The Banality of

I guess I'm just annoyed that they publish these things when I'm asleep but I actually liked Paul Ford's The Banality of Google:
"Oh, and I almost forgot—after the first focus group, and this is true, Satan stayed after to get his $30 and then tried to tempt Google. He said, “Larry and Sergey, I will give you everything! I will give you a cluster of 100,000 Linux servers!” The Google guys were like, “Linux servers, huh. That sounds cool.”

And Satan said, “I will give you an algorithm to rank pages based on links.” And they were like, “you totally read our minds! Did you go to Stanford?” And Satan said, “I will give you six years of press coverage by fawning, lazy journalists, and a million slavering blogger acolytes to command as you will.”

Now the Google guys were nodding. They were like, “whatever you need, Satan. We'll give interviews to pornographic magazines, sell ads to the world's worst polluters, whatever you want.” So Satan said, “To seal the deal, I will give you a new logo—one that does not look like half-digested fridge magnets!”

But Satan had miscalculated! Because the logo was Google's number one secret technological advantage, the one thing no one would ever think to copy: it was so bad that it always made their search look good, no matter what could happen. So they said, “begone, foul wyrm, destroyer of worlds!” Although they later invited him back as a consultant. And that is how Google became the Jennifer Lopez of the Internet. "

Sure, there's Screenscraping the Senate and Converting XML to RDF.

Thursday, September 02, 2004

Google for the Semantic Web

Accepted papers for FOAF-Galway included the following submissions:
* Open Rating Systems "If we ignore distrust and negative ratings, i.e., we only consider statements of trust and positive ratings, we can easily adapt PageRank as follows..." (that's Google's PageRank),
* SemIndex: Preliminary results from semantic web indexing "...indexing RDF documents such that they may be ranked and retreived by their contents." It talks about indexing 1,000 million (an American billion) triples.
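
For anyone who hasn't seen PageRank in code, here is a minimal Python sketch of the plain algorithm by power iteration, run over positive trust statements only. The Open Rating Systems quote trails off before giving its adaptation, so this is ordinary PageRank and not the paper's method; the `pagerank` function, the damping value of 0.85, the iteration count and the sample names are all assumptions made for illustration.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Plain PageRank by power iteration over a who-trusts-whom graph."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the same small baseline share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank evenly among the nodes it trusts.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node that trusts nobody spreads its rank over everyone.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical trust statements: alice and bob trust carol, carol trusts alice.
trust = {"alice": ["carol"], "bob": ["carol"], "carol": ["alice"]}
scores = pagerank(trust)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

In this toy graph carol ends up ranked highest because two people trust her, alice comes next because carol trusts her, and bob, whom nobody trusts, keeps only the baseline share.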

Also of interest:
* Fentwine: A navigational RDF browser and editor.

Also, earlier this year Semantic Web Personalization.