Friday, October 29, 2004

OntoBuilder

DG Research Team Is Developing a Digital Interpreter "One of the first steps in the process is the creation of an ontology, a cross-lingual thesaurus that provides a thematic structure for all the terms one is likely to encounter. Ontologies need to be semantically sensitive: "bank" near "river" means something quite different from "bank" near "teller.""

"In the context of Digital Government, ontologies play an increasingly important role, as database metadata schemas, terminology standardization structures and the foundation for interfaces between applications," says Hovy. "Yet the complexity and cost of building ontologies remains a daunting challenge.""

"Since ontologies are such a necessary and time-consuming precursor to any integration project, researchers are investigating, "semi-automated methods to build ontologies to align and merge existing ones for new purposes and to adapt old ones for re-use," says Hovy, who hopes this will become a prototype project for just such semi-automation technology."

OntoBuilder was covered previously; it's Java-based and is mentioned in the Ontology Tools Survey, Revisited.

Thursday, October 28, 2004

More ISWC Papers

The previous posting I made on scalable OWL seems to have been very popular, and the Semantic Web Interest Group IRC Scratchpad posted links to all the papers at ISWC 2004 Annotations. Sadly, what impressed me most is that, under Safari, the title has a drop shadow done using stylesheets. More information: Text-Shadow in Safari 1.1.

The U in URI

A couple of interesting postings which seem to have a common theme:
* First in a series of five postings on the "W3C Semantic Web in Life Sciences Workshop", TBL Keynote at W3C Semantic Web in Life Sciences Workshop "True to form, Sir Tim ensured that everyone present understood that they should all use URIs for everything. I wonder what his sock drawer looks like. His example was to define a URI for a concept like "colour"...Tim was asked where to find a browser for the Semantic Web (again). The difference this year is that there is starting to be a good answer to that question. Have a look at Longwell and Haystack. He also pointed to Ontaria, which is more of a directory than a browser..."
* Tim Berners-Lee on the Failure of URIs as Identifiers in RDF, Faulty URIs, Identifiers: Uniform Resource Identifier (URI).
* Learn About Haystack: The Universal Information Client

Wednesday, October 27, 2004

Isolation

ACID is Good "Unfortunately, ACID transactions do not work effectively over a long period of time. Do not expect things to work if your transactions last more than even a few seconds...This is the so-called “long transaction problem”. No one has found a solution for it after many years of research. The basic problem is achieving isolation – the “I” in “ACID”. There are no known concurrency control algorithms that will operate over a long period of time."

"Over the years, several techniques have been proposed for managing long-lived activities. One of the first is called a Saga. [GGKKS] Sagas require you to define compensating transactions. A compensating transaction compensates for the effects of a transaction. For example, a compensating transaction for reserving a hotel room would be a transaction that cancels the reservation."

"Therefore, rather than provide support for a single model, such as Sagas, JSR 95 defines an infrastructure to support a wide range of extended transaction models. The architecture is based on the insight that the various extended transaction models can be supported by providing a general purpose event signaling mechanism that can be programmed to enable activities (application specific units of computations) to coordinate each other in a manner prescribed by the extended transaction model under consideration."

The read phases in Kowari/TKS can work over days without interrupting other phases. I've been meaning to try to explain its isolation level, which is somewhere between 2 and 3. Reads don't interact with locking, but writes are serialized. It doesn't prevent phantom reads, but it does mean that querying graphs and getting sub-graphs will always be consistent - i.e. as you constrain a query further you always get fewer results.

Tuesday, October 26, 2004

Scalable OWL Lite

An Evaluation of Knowledge Base Systems for Large OWL Datasets "In this paper, we present an evaluation of four knowledge base systems (KBS) with respect to use in large OWL applications."

"DLDB-OWL [23], is a repository for processing, storing, and querying large amounts of OWL data. Its major feature is the extension of a relational database system with description logic inference capabilities. Specifically, DLDBOWL uses Microsoft Access® as the DBMS and FaCT [16] as the OWL reasoner. It uses the reasoner to precompute subsumption and employs relational views to answer extensional queries based on the implicit hierarchy that is inferred."

"...we were surprised to see that Sesame-Memory could load up to 10 universities, and was able to do it in 5% of the time of the next fastest system. However, for 20 or more universities, Sesame-Memory also succumbed to memory limitations...The result reveals an apparent problem for Sesame-DB: it does not scale in data loading...As an example, it took over 300 times longer to load the 20-university data set than the 1-university data set, although the former set contains only about 25 times more instances than the later...Sesame is a forward-chaining reasoner, and in order to support statement deletions it uses a truth maintenance system to track all deductive dependencies between statements."

"From our analysis, of the systems tested: DLDB is the best for large data sets where an equal emphasis is placed on query response time and completeness."

Lehigh University Benchmark.

Network Inference gets $8.1 million

Network Inference Announces $8.1 Million in New Funding "Network Inference, the leading provider of standards-based adaptive software infrastructure, today announced the first quarter 2004 close of $8.1 million as part of its Series B venture capital funding round."

In Search of Mosaic for the SW

Mining the Semantic Web "You've got to have ontologies; you've got to have a language for doing it; you need a social networking mechanism to enable you to make these inferences between representations that people are putting up about themselves. And you need a higher layer that actually crawls or somehow gathers that data and integrates it behind a single interface.

Today we have this new generation of technology. We don't really even have a browser for this information. There are some standards that have just come out, and we're at the stage now where there's no Yahoo. There's not even a Mosaic. There's definitely no Netscape. The question will be really who is in position to make these critical, enabling first key pieces of technology that will catalyze the rest of it."

"Where we're going is to take the hundreds of thousands of real-time news feeds today—going on millions—and turn it into broad search. The way we look at RSS is it's everything on the web that changes, which is to say everything on the web that has value."

Slides for Urchin and Kowari

NPG and the Semantic Web, Urchin and Kowari
"Urchin:
* RSS Aggregator which represents all feeds in RDF
* Written in Perl
Kowari:
* Specialized RDF database
* Written in Java
Urchin-Kowari:
* Replace Perl's RCQL with Tucana's iTQL query
* The interface is otherwise unchanged
* Urchin and Kowari are connected via SOAP"

P2P Review

The Networked Semantic Desktop "In the first phase, Semantic Web, P2P, and Social Networking technologies are developed, researched and partially deployed. In the second stage, we will see convergence: Semantic Web technology is deployed on the Desktop, resulting in the Semantic Desktop. Similarly, Semantic Web technology is incorporated in P2P networks and Social Networking. Once there is a reliable technology available for the technology convergences, the next phase can be tackled: the combination of the three fields Semantic Desktop, Semantic P2P and ontology-driven Social Networking into the Networked Semantic Desktop."

Via Post details: Networked Semantic Desktop.

Information Architecture

Metadata? Thesauri? Taxonomies? Topic Maps! "Information architects have so far applied known and well-tried tools from library science to solve this problem, and now topic maps are sailing up as another potential tool for information architects. This raises the question of how topic maps compare with the traditional solutions, and that is the question this paper attempts to address."

Offers a good introduction to the ideas of taxonomies, ontologies, faceted classification, controlled vocabularies, and of course, topic maps.

XP vs RUP

XP vs RUP "The main reason is that I think XP focus is the correct one: Simplicity, feedback, embracing change, people developing quality software that works. XP has a positive perspective on software development, putting trust before control. Furthermore, many tools for quality assurance, testing, reporting, and continuous integration have emerged from the XP community."

Monday, October 25, 2004

Interception

Using Spring AOP to Implement the Cuckoo's Egg Design Pattern "This article has shown how to carefully use the active around form of advice inside of the Spring framework. The around form of advice is commonly used when implementing the Cuckoo's Egg design pattern, and so you were shown one example of how this aspect-oriented design pattern can be implemented using Spring AOP."
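As a rough idea of what around advice looks like in Spring AOP: an interceptor wraps the target call and can either proceed or hand back a substitute, which is the core of the Cuckoo's Egg pattern. The method name being matched and the CannedReport type below are made up for the sketch, and the ProxyFactoryBean wiring in the Spring config is left out.

import org.aopalliance.intercept.MethodInterceptor;
import org.aopalliance.intercept.MethodInvocation;

// Around advice: it wraps the target call, so it can run code before and after
// proceed(), or skip the target entirely and substitute a result.
public class CuckoosEggInterceptor implements MethodInterceptor {

    public Object invoke(MethodInvocation invocation) throws Throwable {
        // Only meddle with the (hypothetical) method we care about.
        if ("createReport".equals(invocation.getMethod().getName())) {
            // Hand back the "cuckoo's egg" instead of the real result.
            return new CannedReport();
        }
        // Everything else passes straight through to the target.
        return invocation.proceed();
    }

    // Hypothetical substitute; in practice it would implement the same
    // interface the caller expects from the intercepted method.
    private static class CannedReport {
    }
}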

Sunday, October 24, 2004

Apple in the Enterprise

Apple's Stupidity - Not Supporting "Third Party" User Space Applications That Cause Kernel Panics "...when Qmail is spam bombed with a flood of emails, the system suffers a kernel panic. Yet, this problem only occurs on dual processor machines! Apple's lovely suggestion - "Oh, just disable the second processor"...Under no circumstances should a userspace application cause the kernel to panic. On Unix, this is just plain unacceptable and represents an issue with something in the kernel or core libraries. Basically, you are saying that any end user has the ability to crash the machine. For example, if I have an account on a Mac OS X box, I can compile qmail and bind it to an unprivileged port. At that point, I can flood the machine with enough requests to crash the system."

That's exactly what we've seen with Kowari/TKS. I like developing on an Apple system; sure, it's slow and Apple keeps tying upgrades of the JVM to the OS, but it's those other productivity benefits that make it worthwhile.

Recent Java 1.5 Articles

* To Annotate or Not? "...when the goal is really about adding layers of metadata on top of the source and that applies uniquely to specific code elements, metadata in Java should fill the bill perfectly. The ability to supply the metadata at the same time and locality as the code increases ease of development. Not only is development made easier but the processing is also much simpler, because the object model is defined by the annotation type definitions themselves." A minimal annotation sketch follows after this list.
* GETTING TO KNOW SYNTH "Synth is a new look and feel added to project Swing in J2SE 5.0. Synth is a skinnable (that is, a customizable) look and feel, where the "skin" (that is, the user interface) is controlled by an XML file. Instead of customizing the look and feel by providing default properties to the UIManager in a Properties table, you load in an XML file with component definitions. That means you can create a custom look without writing code."
* While not 1.5 related, IBM security providers: An overview, shows JAAS login modules for OS/390, AIX, and Windows (both Active Directory and NT 4).
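Here's the promised annotation sketch: a made-up @ReviewedBy annotation type and one use of it, just to show how the metadata sits next to the code it describes.

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// A made-up annotation type: metadata declared right next to the code it describes.
@Retention(RetentionPolicy.RUNTIME)   // keep it available to reflection at runtime
@Target(ElementType.METHOD)           // may only be applied to methods
@interface ReviewedBy {
    String value();
    String date() default "unknown";
}

class Example {
    // Applying the annotation; a tool could later read it back with
    // Example.class.getMethod("save").getAnnotation(ReviewedBy.class).
    @ReviewedBy(value = "AN", date = "2004-10-24")
    public void save() {
    }
}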

Friday, October 22, 2004

Instance editing with Giblet

Announcing Giblet Features include:
* Class selection and resource creation
* Appropriate property selection, based on the domain of the class and its super-classes
* Appropriate value input, whether as a string literal or an internal resource reference (based on the range of the property)
* Saving and loading of simple graphs to and from local storage, via cookies
* Export of graphs as RDF/XML, and a submission to the W3C's online image generator for RDF graphs
* Availability of templates (initially FOAFNet only) to help a user get started

Demo.

I also noticed RDF Templates which seems to have similarities with Kowari's Descriptors.

Thursday, October 21, 2004

Pychinko

"Pychinko is a Python implementation of the classic Rete algorithm (see Forgy82 for original report.) Rete (and its since improved variants) has shown to be, in many cases, the most efficient way to apply rules to a set of facts--the basic functionality of an expert system. Pychinko employs an optimized implemention of the algorithm to handle facts, expressed as triples, and process them using a set of N3 rules. We've tried to closely mimic the features available in CWM, as it is one of the most widely used rule engines in the RDF community. Several benchmarks have shown our Rete-based Pychinko to be upto 5x faster than the naive rule application used in CWM (see presentation below for preliminary results.) A typical use case for Pychinko might be applying the RDFS inference rules, available in N3, to a document. Similar rules are available for XSD and a dialect of OWL."

Gibson vs Stephenson

Who would win? "In a fight between you and William Gibson, who would win?

You don't have to settle for mere idle speculation. Let me tell you how it came out on the three occasions when we did fight.

The first time was a year or two after SNOW CRASH came out. I was doing a reading/signing at White Dwarf Books in Vancouver. Gibson stopped by to say hello and extended his hand as if to shake. But I remembered something Bruce Sterling had told me...During the regeneration process, telescoping Carbonite stilettos had been incorporated into Gibson's arms...Gibson and I dueled among blazing stacks of books for a while. Slowly I gained the upper hand, for, on defense, his Praying Mantis style was no match for my Flying Cloud technique. But I lost him behind a cloud of smoke."

Ethnoclassification

Metadata for the Masses "We’re beginning to see ethnoclassification in action on the social bookmarks site Del.icio.us, and the photo sharing site Flickr. Both services encourage users to apply their own freely listed tags to content — tags that others can then employ when looking for content. See a web page that looks interesting, but don’t have time to read it? Post it to Del.icio.us with a tag that will help you find it again."

"The primary benefit of free tagging is that we know the classification makes sense to users. It can also reveal terms that “experts” might have overlooked. “Cameraphone” and “moblog” are newborn words that are already among Flickr’s most popular; such adoption speed is unheard of in typical classifications. For a content creator who is uploading information into such a system, being able to freely list subjects, instead of choosing from a pre-approved “pick list,” makes tagging content much easier. This, in turn, makes it more likely that users will take time to classify their contributions."

Wednesday, October 20, 2004

Unit of Knowledge

CBD - Concise Bounded Description "A concise bounded description of a resource is a body of knowledge about that resource which does not include any explicit knowledge about any other resource which can be obtained separately from the same source.

Concise bounded descriptions of resources can be considered to be a form of representation, however they are a highly specialized form and not the most usual or obvious form in a web primarily intended for human consumption. They are, however, a key form of representation which semantic web agents need in order to reason about such resources and adjust their behavior accordingly."
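Roughly, a concise bounded description is assembled by taking the resource's own statements and then chasing only those objects that are blank nodes, since they can't be obtained separately from the source. A small sketch with made-up types (reified statements, which the CBD note also pulls in, are ignored here):

import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of pulling a concise bounded description out of a statement list.
// The Node and Statement types are hypothetical, and reified statements
// (which the CBD note also includes) are left out to keep the idea visible.
public class CbdSketch {

    static class Node {
        final String label;
        final boolean blank;   // blank nodes can't be obtained separately from the source
        Node(String label, boolean blank) { this.label = label; this.blank = blank; }
    }

    static class Statement {
        final Node subject;
        final String predicate;
        final Node object;
        Statement(Node s, String p, Node o) { subject = s; predicate = p; object = o; }
    }

    static Set<Statement> conciseBoundedDescription(Node resource, List<Statement> graph) {
        Set<Statement> cbd = new HashSet<Statement>();
        describe(resource, graph, cbd);
        return cbd;
    }

    // Take every statement about the resource; recurse only into objects that
    // are blank nodes, since their description lives nowhere else.
    private static void describe(Node resource, List<Statement> graph, Set<Statement> cbd) {
        for (Statement st : graph) {
            if (st.subject == resource && cbd.add(st) && st.object.blank) {
                describe(st.object, graph, cbd);
            }
        }
    }
}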

Deploying Metadata

To Metadata or Not To Metadata "Given the rather pathetic record that many metadata efforts have racked up, it is little wonder that organizations have begun to question the entire value of adding metadata. However, there is another side to the story. First, the cost of adding metadata can be reduced in several ways. For example, the $200K for a metadata initiative performed by outside consultants can be greatly reduced by not starting from scratch in each case, but rather starting with existing metadata standards and controlled vocabularies and taxonomies. The cost of a unique custom job will always be higher than one that at least starts with predefined components.

In addition, the cost of doing metadata has to be weighed against the cost of not doing metadata. Assume for the moment that adding metadata would solve all of the problems associated with search. One estimate from IDC puts the cost of bad search at $6 million for a 1,000 person company. Now it is unlikely that adding metadata will solve all search problems, but even if it only solves half, that is still a savings of $3 million per year. In this context, $200,000 for metadata doesn't seem so exorbitant."

"You wouldn't think of running a company without organizing your employees, why do you think you can create access to information without organizing that information?"

Tuesday, October 19, 2004

Googling your way to freedom

Australian journalist tells of capture in Iraq "Initially they thought Martinkus was a CIA agent or a contractor working for the US. Then they entered his name into an internet search engine to check his story. Convinced he had no links to the United States, his captors let him go."

The BBC story confirms that it was Google.

tolog

One topic map query language, tolog. In comparison to SPARQL, behold the magical abilities of projection (no distinct required), count and sorting. Also, non-failing queries (optional), negation, limit and offset, and built-in inferencing predicates.

Their negation looks very similar to iTQL's exclude.

Monday, October 18, 2004

Date and Pascal on RDF

ON METADATA, RDF AND RELATIONAL REPRESENTATION and ON RELATIONAL BINARY DATABASE DESIGN.

From the latter:
"Research papers abound from the Semantic, Topic Map and RDF camps that claim relational binary database design is the superset of n-ary or the true relational model.

The only reference to relational binary that I could find on your site was mentioned by C.J. Date in passing. He mentioned that Codd showed that n-ary and even 0-ary relations have unique and important properties that would in fact make n-ary the superset."

"The foregoing point notwithstanding, there are actually some pretty strong arguments in favor of making all base relations binary. Well, not binary exactly, but irreducible, rather; an n-ary relation is irreducible if it can’t be non-loss decomposed into two or more projections each of degree less than n. (In practice, irreducible relations often are binary, a fact that might account for part of the confusion I mentioned; however, some irreducible relations are not binary and some binary relations are not irreducible.) But the question of whether base relations should be irreducible is a database design question, not a relational model question."

From Good recent writings on data.

Another one, ON THE “SEMANTIC WEB”. I was going to say that iTQL doesn't return NULL, but it does if the value is unconstrained. I was going to say that SPARQL doesn't return NULL, but it does too. I've been playing with REL as well, just to see how it performs certain queries.

Breaking Classloader Hierarchy

Find a way out of the ClassLoader maze "Thread context classloaders were introduced in Java 2 Platform, Standard Edition (J2SE). Every Thread has a context classloader associated with it (unless it was created by native code). It is set via the Thread.setContextClassLoader() method. If you don't invoke this method following a Thread's construction, the thread will inherit its context classloader from its parent Thread. If you don't do anything at all in the entire application, all Threads will end up with the system classloader as their context classloader. It is important to understand that nowadays this is rarely the case since Web and Java 2 Platform, Enterprise Edition (J2EE) application servers utilize sophisticated classloader hierarchies for features like Java Naming and Directory Interface (JNDI), thread pooling, component hot redeployment, and so on."
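The usual pattern that falls out of this is save, swap, restore: set the context classloader a library expects before calling into it, and always put the original back. A small sketch (pluginLoader is just a placeholder for whatever classloader you need):

import java.util.concurrent.Callable;

// The usual save/swap/restore dance with the thread context classloader.
// 'pluginLoader' is a placeholder for whatever classloader the code being
// called into expects to resolve classes and resources through.
public final class ContextClassLoaderSwitcher {

    public static <T> T callWith(ClassLoader pluginLoader, Callable<T> work) throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(pluginLoader);
        try {
            return work.call();   // the callee now sees the loader it expects
        } finally {
            current.setContextClassLoader(previous);   // always restore the original
        }
    }

    private ContextClassLoaderSwitcher() {
    }
}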

The 3rd Age

The end of data? "There used to be a difference between data and metadata. Data was the suitcase and metadata was the name tag on it. Data was the folder and metadata was its label. Data was the contents of the book and metadata was the Dewey Decimal number on its spine. But, in the Third Age of Order (see the previous issue), everything is becoming metadata."

"So, in the Third Age of Order, all data is metadata. Contents are labels. Data is all surface and no insides...Why does this matter? It changes the primary job of information architects. It makes stores of information more useful to users. It enables research that otherwise would be difficult, thus making our culture smarter overall...But, now that everything is metadata, no particular way of understanding something is any more inherently valuable than any other; it all depends on what you're trying to do. The old framework of knowledge — and authority — are getting a pretty good shake."

Saturday, October 16, 2004

SWAP Highlights

From the Semantic Web Applications and Perspectives program:
* Peer-to-Peer Semantic Coordination which considers "...the problem of coordinating hierarchical classifications". A paper on the algorithm listed, CtxMatch, is "Improving CtxMatch by means of grammatical and ontological knowledge – in order to handle attributes".
* RDFGrowth and the Dbin Project. Another paper was published here. Dbin is "..a p2p semantic web algorithm designed so that each single peer is only allowed to cause a minimal, bounded, remote computational burden".
* Semantic Web Tool Evaluation "We have tested the following systems: Jena, Sesame, RdfSuite, Triple, RdfStore, 4Suite, RdfDB....RDF(S) is unable to express basic representation needs typical in simple ontologies...few tools are unable to correctly handle RDF(S) ontologies...no semantic web tool is able to correctly handle even OWL-Lite...the only tool able to correctly handle OWL-Lite ontologies is the description logic inference engine Racer." I'm not sure quite how correct these statements are; maybe they hold for the projects listed, but I know we intend for Kowari to support RDFS and OWL-Lite.
* Querying the Semantic Web: a new approach Also talks about using CtxMatch, "...we introduce in the paper a first set of semantic parameters we consider relevant: (i) the type of relation, i.e., the semantic relation that holds between a concept of the source schemas and concepts in the target schemas (e.g., equivalence, greater or lesser generality); (ii) the ontological distance, namely the distance between a source and a target concept with respect to some reference ontology; and (iii) the lexical distance, that is the distance between the formulation of semantically related concepts in different schemas."
* A Framework for Unified Information Browsing Which discusses the RDFX Project.

Friday, October 15, 2004

Google is the platform

Search your own computer "Google Desktop Search is how our brains would work if we had photographic memories. It's a desktop search application that provides full text search over your email, computer files, chats, and the web pages you've viewed. By making your computer searchable, Google Desktop Search puts your information easily within your reach and frees you from having to manually organize your files, emails, and bookmarks."

Screenshots.

Google Desktop Search Launched "Indexing is fast and only happens when your computer is idle...Get a new email? Visit a new web page? All this information is automatically recorded and made searchable within seconds."

"Each time you view something, a snapshot of what you've seen is created. Did you visit the same web page several times in a month? A copy of the page each time you visited is made. The "1 cached" link will change to reflect the number of copies recorded."

While support is incomplete, this article mentions that other file types, "Images, Acrobat, Windows Media and MP3", are indexed. This probably suggests future features.

The most important missing feature seems to be the ability to capture metadata or to modify it. What a wasted opportunity. There's no advertising, but hopefully this can stay as a way to integrate people's desktops with the web site and not have to become its own money spinner.

Another article with screenshots, "Google Your Desktop".

AOL is going to do the same; curiously, "...the desktop search tool is being tested alongside the AOL Browser but declined to elaborate further. She said the AOL Browser will launch as early as November."

Tuesday, October 12, 2004

Resolvers

Masala. That's what Kowari's resolvers should do: they will be able either to provide a standard interface to live data (like LDAP queries, SQL queries, etc.) or to extract the metadata/data (from files, email, etc.) and index it in order to perform full text or normal RDF queries on it (through iTQL, RDQL, etc.).

Democratization of Radio

podcasting "Podcasting is a term used to describe the recent phenomenon of pointcasting audio programs over the Internet to people's iPod for later listening. We're all familiar with the iPod; it's become an icon of the times: hip, fun, and progressive. Most people use the iPod, and other MP3 players, to listen to music. But, combine an iPod with the Internet and innovative things start to happen."

"iPodder and similar applications have sprung up over the last few months to solve that problem. iPodder checks a list of RSS feeds on a scheduled basis and downloads any attachments right into iTunes. Anytime I hook my iPod up to my PowerBook, new MP3s are downloaded to my iPod. Very convenient."

Masala

IBM delivers Masala ""What IBM will be able to do [with Masala] is offer a federated data model that brings together a number of disparate sources in one place [to let users] search, index, and retrieve data without writing to individual data sources as you might have had to in the past," said Stephen O'Grady, senior analyst at Redmonk. "It is a fairly significant step up," he added."

"Masala has been at the heart of that effort to help "redefine" business intelligence. It is designed to allow corporate users to create a "virtual database" by collecting information seamlessly across the enterprise from things such as customer service records, e-mails, tables of numbers, photos, and other forms of information and view them all as if they were in one location.

With the new offering, IBM will also be able to compete against traditional enterprise information integration vendors such as Composite Software, MetaMatrix, and BEA's Liquid Data along with enterprise search vendors such as Verity, Endeca, and Autonomy."

Sunday, October 10, 2004

Bi-Directional Interoperability

Metatomix Delivers the mtx ERI PLATFORM for Enterprise Resource Interoperability "Metatomix(TM), Inc...announced the release of the industry's first bi-directional Java platform for integration, correlation, and policy-based automation...Based on the Semantic Web's Resource Description Framework (RDF) and a central ontology framework -- two technologies recommended by the W3C for commercial-grade data integration -- the mtx ERI Platform is the only interoperability platform that enables enterprises and government agencies to leverage node-based data integration, distributable across an entire network, for unprecedented data interoperability. The mtx ERI Platform combines scalability, reliability, and extensibility with the power of the Semantic Web to create complete resource interoperability."

Saturday, October 09, 2004

Jumping to Java 5

* Java 5 - "final" is not final anymore "It makes sense to allow updates to final fields. For example, we could relax the requirement to have fields non-final in JDO. If we read section 9.1.1 carefully, we see that we should only modify final fields as part of our construction process. The use case is where we deserialize an object, and then once we have constructed the object, we initialise the final fields, before passing it on. Once we have made the object available to another thread, we should not change final fields using reflection. The result would not be predictable."
* Annotations in Tiger, Part 1: Add metadata to Java code "Annotations are modifiers you can add to your code and apply to package declarations, type declarations, constructors, methods, fields, parameters, and variables. Tiger includes built-in annotations and also supports custom annotations you can write yourself."
* AbstractStringBuilder "AbstractStringBuilder and its subclasses demonstrate another new feature of Java 5.0: covariant returns. This is the ability to narrow the return type of a method when you override it in a subclass. It got added to the language along with generics." See also, Generics Tutorial. A small covariant return sketch follows after this list.
* alt.lang.jre: Twice as Nice "Java 5.0 incorporates generic types as an effective means of addressing ClassCastExceptions, but this adaptation will do little for many corporations still using Java 1.3. Nice, on the other hand, was developed with ClassCastExceptions and NullPointerExceptions in mind. As such, the language supports two features, parametric classes and optional types, that go a long way toward keeping applications from throwing these exceptions. What's more, with Nice you can employ these features today, on any Java platform 1.2 or higher."
* Tiger's Corner - Displaying array contents java.util.Arrays has toString, and deepToString for multi-dimensional arrays, which will print out the contents of an array.
* Why Mac Developers are Concerned About the J2SE 5.0 Wait Highlights the historic lag of porting Java to Apple OSes.
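The promised covariant return sketch, with made-up class names: in Java 5.0 an override may declare a narrower return type than the method it overrides.

// Covariant returns in Java 5.0: an override may narrow its declared return type.
// Class names are made up for the example.
class Producer {
    public CharSequence result() {
        return "base";
    }
}

class StringProducer extends Producer {
    // Before 5.0 this override would have had to declare CharSequence as well.
    public String result() {   // String is a subtype of CharSequence
        return "narrowed";
    }
}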

Mistakes Made by OO Databases

Whatever happened to OODBMSes, anyway? The 5 reasons given are:
1. They were seen as nonscalable and nonperformant.
2. It was "new" technology in a time when only the tried-and-true was being used. 3. It had no compelling features for anybody but the developer.
4. They just didn't perform or scale well at first.
5. OODBMS doesn't store the same way as the RDBMS. This may be the kicker--people building OODBMS systems as if they were RDBMS systems were bound to be hurt by worlds of disappointment, and vice versa (which is where we are today).

Friday, October 08, 2004

iCite

Why can’t I manage academic papers like MP3s? The evolution and intent of Metadata standards "Two key differences are identified: Firstly, digital music metadata is standardized and moves with the content file, while academic metadata is not and does not. Secondly digital music metadata lookup services are collaborative and automate the movement from a digital file to the appropriate metadata, while academic metadata services do not."

"Academic metadata is obtained in a variety of ways, yet even the most sophisticated methods stop short of the level of automation and convenience reached by digital music. For those not using citation management software the process is a manual one of re-entering citations in the required format. This is often done through “reference list raiding”, is subject to copying errors and is extremely tedious."

"Thankfully systems like the CDDB or the one proposed above have the characteristic that only one upload is required to make the metadata available to all. It is established that online availability increases citations to one’s work so it does not seem a stretch that easy availability of accurate metadata would also increase citations...We propose that two tasks are required to move towards more effective practices: using XMP to store academic metadata with the file and using content-based hashing to map between the digital files and a user contributed metadata database."

There's also the recent Wanted: Cheap Metadata: "What about new incentives for adding judgment-call metadata? Stephen Cayzer's work at HP Labs (see his XML Europe paper), which demonstrates how better user interfaces can make the entry of metadata less trouble for the user, will hopefully inspire others to think more about acquiring good metadata and postpone some of their ideas about what to do with that metadata."

Jotspot

Begin the Intro Tour or Advanced Tour.

Jon Udell posts a Flash demo too.

This joins other Wiki technologies like SnipSnap and XWiki.

Web of the Future

Tomorrow's Semantic Web: Understanding What We Mean "We're moving to a web where I know what was meant instead of I know what was input, where the web can understand the meaning of the terms on the page instead of just the text size and color that it should present those words in. It knows a little bit about the user background. We're facilitating interoperability so you don't have to have all these stand-alone applications that don't understand each other. The web is moving to be programmable by normal people instead of just geeks like me."

"Analyses from Gartner and Forrester say that expectation citing is critical for applications. And one of the ways you can do that is exposing the top levels of your hierarchy about what your site covers. You also see things like umbrella upper-level structures; I gave an example of the UNSPSC, which provides the top five layers for business-to-business ontologies. They don't really expect to have enough information for any one application in the business space to function, but they say if you're going to do B2B applications, come use our top levels of integration terms so that if you're using a certain kind of widget, we know where that fits in our hierarchy."

The Semantic Web "Although search engine services will still play a big role in locating information for people across the Web, the foundations set by the Semantic Web will lower the entry barrier for applications to utilize a wide array of software agents for the purposes of information retrieval and mining."

Thursday, October 07, 2004

1.5's Executor

Put JDK 1.5's Executor to Work for You "The new java.util.concurrent package that is part of JDK 1.5 (Tiger) contains a host of concurrency utilities that are designed to make developing concurrent applications easier. The package provides the Executor framework, which is a collection of classes for managing asynchronous task execution. One way to execute a task asynchronously is to simply fire off a new thread:

new Thread(aRunnable).start();

While this approach is convenient, it has a number of serious drawbacks. Thread creation on some platforms is relatively expensive, and starting a new thread for each task could limit throughput. Even if thread creation were free, in a server application, or any application that will likely be executing many asynchronous tasks, there is no way to bound the number of threads created. This situation can cause the application to crash with OutOfMemoryException, or perform poorly because of resource exhaustion. A stable server application needs a policy to govern how many requests it can be processing at once. A thread pool is an ideal way to manage the number of tasks executing concurrently.

The Executor framework simplifies the management of asynchronous tasks and decouples task submission and management from task execution. The various implementations of Executor in java.util.concurrent can support thread-per-task execution, a single background thread for executing tasks, or thread pooling (and it would even be possible to write an Executor that executes the Runnable in another Java Virtual Machine). It can provide for execution policies such as before-execution and after-execution hooks, saturation policies, queuing policies, and so on."

There's a 1.4 backport of the concurrent library available. The CopyOnWriteArrayList and ConcurrentHashMap would be useful for JRDF. A more general article on the new 1.5 features is online as well.
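For reference, the thread-pool flavour described above looks like this in java.util.concurrent (the Runnable body is just a stand-in):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Instead of new Thread(task).start() per request, hand tasks to a bounded
// pool and let the Executor apply the execution policy.
public class PooledTasks {

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);   // at most four worker threads

        for (int i = 0; i < 20; i++) {
            final int taskId = i;
            pool.execute(new Runnable() {   // stand-in task body
                public void run() {
                    System.out.println("task " + taskId + " on " + Thread.currentThread().getName());
                }
            });
        }

        pool.shutdown();   // stop accepting tasks; workers finish what is queued
    }
}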

Medical Metadata

ADL (Archetype Definition Language) is a developing standard that is part of the Open Electronic Health Records site. A screenshot of the archetype editor shows a familiar ontology editor interface.

This looks like it is being converted to OWL by the Mayo Clinic and the University of Manchester.

RE: OWL vs. ADL "We at Mayo are working with OpenEHR to produce a set of Java methods to create Archetypes which can be represented in OWL. These Archetypes are output in the abstract syntax which is loaded into memory as Java objects which are exposed for your use. Then we have a set of routines to output these objects as OWL:rdf, which is the exchange syntax for the semantic web. This output of Archetypes will be linked to higher level models such as the CDA and a terminology model (expressed as an archetype)."

See the previous message in this thread as well.

A paper explaining what archetypes are is available here. From the paper, "In this approach [interoperable knowledge methodology], the single, hard-coded software model has become a small reference object model (ROM), while domain concepts are expressed in a separate formalism and maintained in a concept library. Accordingly, software development can proceed separately from domain modelling.

Since the software is an implementation of the ROM, it has no direct dependency on domain concepts; i.e., if new concept models are introduced, the system does not need to be altered."

Seems similar to ontology based programming or meta-programming.

Wednesday, October 06, 2004

Kowari 1.0.5 Released

From the release notes:
* iTQL now supports the having clause, exclude constraints, repeated variables in a where constraint, improved subqueries and empty select clauses (a small query sketch follows after these notes):
o The having clause allows restrictions to be placed on aggregate functions, such as count.
o exclude imposes a logically opposite match on the graph to normal constraints.
o Repeated variables in the where clause allow, for example, the ability to find statements with the same subject and object values.
o Subqueries now support the use of trans, walk and exclude.
o An empty select clause returns true if the items in the where clause exist in the given graph.
* The existing string pool was rewritten to support adding new hard-coded datatypes. The caching was also moved closer to the string pool implementation, increasing load speed by up to 50%.
* Complete rewrite of the Jena support layer. Multiple Jena models/graphs can be created and accessed independently on multiple machines. Fastpath support allows RDQL queries to make use of Kowari's native query handling. A client/server API is added allowing client access to a Jena Model or Graph backed by Kowari. There are also bug fixes and further enhancements to performance, such as reading RDF files.
* Enhanced JRDF support including a client/server interface, similar to Jena's. A new OWL API, called SOFA, is also available. This can operate on any JRDF-compliant implementation. Currently this can be in-memory or Kowari.
* N3 file support for importing and exporting of models.
* Improved RMI support including the ability to load or save data from a client to a remote server or from a remote server to a client.

It's been a long time between releases. A Kowari 1.1 preview release should be along next, followed by a Kowari 1.0.6 release.

Sunday, October 03, 2004

Visualizing Online Social Network

Vizster "Vizster is an interactive visualization tool for online social networks, allowing exploration of the community structure of social networking services such as friendster.com [4], tribe.net [12], and orkut [10]. Such services provide means by which users can publicly articulate their mutual "friendship" in the form of friendship links, forming an undirected graph in which users are the nodes and friendship links are the edges. These services also allow users to describe themselves in a profile, including attributes such as age, marital status, sexual orientation, and various interests."

Built on prefuse.

Saturday, October 02, 2004

Quick Complement of Links

* RIO 1.0 "Rio 1.0 is the first seperate release that supports the Terse RDF Triple Language, or Turtle, format"
* Tim Berners-Lee: Weaving a Semantic Web "The MIT Technology Review Emerging Technologies conference opened today with a keynote by Tim Berners-Lee, inventor of the World Wide Web. Promising “a one-hour talk in 30 minutes,” Berners-Lee gave an animated, rapid-fire presentation -- more like a 90-minute talk in 30 minutes -- about the Semantic Web, his latest initiative."
* Semantic Web: Funding Nascent Democracy "Semantic web enthusiasts, here's your chance! For just US$1000 a pop, you can be the proud owner of fo.af or lo.af!"
* Metadata "Metadata switch: thinking about some metadata management and knowledge organization issues in the changing research and learning landscape "