More News: 2004

Friday, December 24, 2004

Happy Festivus

This should really annoy the people trying to keep the true meaning of Christmas.

'Seinfeld' Festivus display vies with Nativity

"When a Florida church group put a Nativity scene on public property, officials warned it might open the door to other religious -- and not-so-religious -- displays. They were right.

Since the Nativity was erected in Polk County, displays have gone up honoring Zoroastrianism and the fake holiday Festivus, featured on the TV sitcom "Seinfeld.""

It's real if it has a Wikipedia entry: http://en.wikipedia.org/wiki/Festivus (oh, and a NY Times article, "Fooey to the World: Festivus Is Come"). Danny has a very Britney Christmas (still the number one search term).

Thursday, December 23, 2004

Visual Browser

"Visual Browser is a Java application that can visualise the data in RDF scheme. The main principle of the visualisation is that:

* the triple (resource, resource, resource) is represented by two nodes connected by an edge
* the triple (resource, resource, literal) is represented by a hint (small window appearing on mouse over the subject node)

Visual Browser uses the Jena framework to obtain the data, since the RDF scheme can be saved in different forms (a single XML file or a relational database).

The visualisation engine is derived from TouchGraph LLC."

Only for Java 1.5.
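To make the two mapping rules concrete, here's a minimal sketch in plain Java (written for a current JDK; it is not Visual Browser's code). The `Node`, `Triple` and `GraphView` classes are hypothetical stand-ins: a real application would read its statements through Jena, and here a literal is simply a node with its `literal` flag set.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical RDF term: either a resource (URI) or a literal value.
final class Node {
    final String value;
    final boolean literal;

    Node(String value, boolean literal) {
        this.value = value;
        this.literal = literal;
    }
}

// Hypothetical subject-predicate-object statement.
final class Triple {
    final Node subject, predicate, object;

    Triple(Node subject, Node predicate, Node object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

// Splits statements into drawable edges and mouse-over hints.
final class GraphView {
    // (resource, resource, resource): drawn as an edge between two nodes.
    final List<Triple> edges = new ArrayList<>();
    // (resource, resource, literal): hint text, keyed by the subject's URI.
    final Map<String, List<String>> hints = new LinkedHashMap<>();

    GraphView(List<Triple> triples) {
        for (Triple t : triples) {
            if (t.object.literal) {
                hints.computeIfAbsent(t.subject.value, k -> new ArrayList<>())
                     .add(t.predicate.value + " = " + t.object.value);
            } else {
                edges.add(t);
            }
        }
    }
}
```

A renderer would then draw `edges` as lines and show the strings in `hints` when the mouse is over the subject node.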

Wednesday, December 22, 2004

Pre-release 2 Available

See the Kowari project page. It includes a sample application, an example AudioManager, which uses resolvers to extract metadata from MP3s.

Gridbag

IBM Reflexive User Interface Builder "The IBM Reflexive User Interface Builder is an application that constructs and renders graphical user interfaces (GUIs) for Java Swing and Eclipse Standard Widget Toolkit (SWT) based upon a descriptive XML document. (Java Swing is a rich GUI toolkit included with Java that provides operating system-independent GUI components. Eclipse SWT is an add-on GUI toolkit that takes advantage of host operating system GUI components for maximum host integration.) IBM Reflexive User Interface Builder is both a specification for a mark-up language in which to describe GUIs and an engine for creating (and, if desired, rendering) them. This application can be used as a stand-alone application for testing and evaluating basic GUI layout and functionality, or it can be used as a library within the context of a Java application for creating and rendering GUIs for that application."

So you never have to go Totally Gridbag (via Tom).

Flink

"Flink is a visualization of the social networks of the Semantic Web community. Information about researchers and their relationships is extracted from the Web, FOAF profiles, emails and publications. Flink itself uses Semantic Web technology to represent, store and reason with metadata. For more information, please see the About section of the website."

Results of a Query

SPARQL Variable Binding Results XML Format "This document describes an XML format for the variable binding results format provided by the SPARQL query language for RDF, developed by the W3C RDF Data Access Working Group (DAWG), part of the Semantic Web Activity as described in the activity statement"
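Roughly, a result document lists the variables in a `head` and then one `result` per solution. The sketch below writes one out. It assumes the element names and the `http://www.w3.org/2005/sparql-results#` namespace of the later W3C Recommendation (the December 2004 draft may differ in detail), and the `SparqlXml` class and its `Term` type are invented for the example. Only URIs and plain literals are handled.

```java
import java.util.List;
import java.util.Map;

final class SparqlXml {
    static final String NS = "http://www.w3.org/2005/sparql-results#";

    // A bound value: either a URI or a plain literal.
    static final class Term {
        final String value;
        final boolean isUri;

        Term(String value, boolean isUri) {
            this.value = value;
            this.isUri = isUri;
        }
    }

    static Term uri(String u) { return new Term(u, true); }

    static Term literal(String s) { return new Term(s, false); }

    // Escape the characters that are special in XML text and attributes.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Serialise the variable names plus one map of bindings per solution.
    // Unbound variables are simply left out of that <result>.
    static String write(List<String> vars, List<Map<String, Term>> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("<sparql xmlns=\"").append(NS).append("\">\n  <head>\n");
        for (String v : vars) {
            sb.append("    <variable name=\"").append(escape(v)).append("\"/>\n");
        }
        sb.append("  </head>\n  <results>\n");
        for (Map<String, Term> row : rows) {
            sb.append("    <result>\n");
            for (String v : vars) {
                Term t = row.get(v);
                if (t == null) continue;
                String tag = t.isUri ? "uri" : "literal";
                sb.append("      <binding name=\"").append(escape(v)).append("\"><")
                  .append(tag).append('>').append(escape(t.value))
                  .append("</").append(tag).append("></binding>\n");
            }
            sb.append("    </result>\n");
        }
        sb.append("  </results>\n</sparql>\n");
        return sb.toString();
    }
}
```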

Tuesday, December 21, 2004

Just Ask for Help

Came across Yet Another RDF Store: Perfect Index Structures for Storing Semantic Web Data With Contexts "We tried to install Kowari, but failed to get a running version. In one of our installations, inserting a 1 MB N-triples file via the Jena interface resulted in 30 minutes processing time before the process threw an exception because of a full disk (there were 200MB disk space available before starting the Kowari server). On another installation, we got a core dump of the JVM when running Kowari. Therefore we concentrated in our efforts on Sesame 1.1RC2 and Redland 0.9.18."

This is the first report I've seen of anyone having this problem with Kowari. Without any idea of which version of Kowari, Java or the OS they used, I downloaded their example code. Using Kowari 1.0.5 (they mention Sesame 1.1RC2, which was released after Kowari 1.0.5) with the given code, you get an exception because NTriples is not supported as a parameter. So this code doesn't work as given. However, N-Triples is a subset of N3, so if the code is modified so that the last lines become:

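  // N-Triples is a subset of N3, so the "N3" reader can parse the .nt file
  model.read(new FileReader(file), file.toURI().toString(), "N3");
  // Close the model, then the database
  model.close();
  database.close();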

Then running the code gives:

...
 INFO [main] (AbstractDatabaseSession.java:699) - Loading file:/.../University1_0.nt into rmi://.../server1#camera
 INFO [main] (AbstractDatabaseSession.java:770) - Loaded 7304 statements
...

If people have problems running Kowari, maybe they should ask someone how to get it running before writing it up in a paper. Then again, this isn't the first time.

Type Safe Enums

Overloading int considered harmful: part "On the other hand, enums do not handle multi-classloader issues automatically. It is absolutely possible to have FontStyle.PLAIN != FontStyle.PLAIN when the FontStyle class is loaded twice by two different class loaders. If this is a real possibility in your code (and if you're writing a library, it's always a real possibility), the obvious solution is to override equals with a method that does work. Unfortunately, you can't do that. The equals method in java.lang.Enum is declared final, and all it does is compare objects for object identity with ==. So you really have no choice but to buckle down, write your own type-safe enum class, and warn client programmers to always use equals instead of == when comparing objects."
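A minimal sketch of that hand-written alternative, using the classic pre-1.5 type-safe enum pattern. `FontStyle` and `PLAIN` come from the quote; the other constants, and comparing by class name plus `toString()`, are my assumptions. A `FontStyle` loaded by another class loader is a different `Class`, so `equals` can neither use `instanceof` nor cast to this loader's `FontStyle`; it compares names instead. `==` still fails across loaders, which is why clients have to use `equals`.

```java
import java.io.Serializable;

// Hand-written type-safe enum whose equals() survives duplicate class loading.
final class FontStyle implements Serializable {
    private static final long serialVersionUID = 1L;

    public static final FontStyle PLAIN = new FontStyle("PLAIN");
    public static final FontStyle BOLD = new FontStyle("BOLD");
    public static final FontStyle ITALIC = new FontStyle("ITALIC");

    private final String name;

    private FontStyle(String name) { this.name = name; }

    @Override
    public String toString() { return name; }

    // Compare by class *name* and constant name rather than identity or
    // instanceof: a copy from another class loader is a different Class,
    // so instanceof would be false and a cast would throw.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (other == null) return false;
        return other.getClass().getName().equals(FontStyle.class.getName())
                && other.toString().equals(name);
    }

    // Consistent with equals: equal names give equal hash codes.
    @Override
    public int hashCode() { return name.hashCode(); }

    // Keep deserialised copies canonical within a single class loader.
    private Object readResolve() {
        if (name.equals("BOLD")) return BOLD;
        if (name.equals("ITALIC")) return ITALIC;
        return PLAIN;
    }
}
```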

Monday, December 20, 2004

Eye on Search

Conquering the Digital Haystack "But clearly it's time to revisit the search engine. Google's IPO didn't end the search wars, it fanned the flames. Few fields are as rife with activity, and a slew of start-ups are angling for position. Some claim new and better technology than the PageRank algorithm made famous by Google. Others seek merely to be different -- filling voids left by the big players. And though the technologies, in most cases, are brand-new and untested, they promise to change the way consumers search the Web -- and the way advertisers reach those consumers. A look at three of the hottest search start-ups -- all planning services for small businesses by early 2005 -- shows how."

"San Francisco-based Blinkx launched last July and already claims more than a million users. What does Blinkx do differently? Its technology not only matches keywords but also locates related concepts...What's more, Blinkx searches everything -- not just the Web but also the contents of your computer, including e-mail messages and attachments and files on your hard drive, as well as weblogs and digital television content, which are currently ignored by most other search engines."

"Hence Dipsie, which searches based on semantic rules rather than keywords or even concepts. Wiener claims his semantic algorithm can sift through Web information and get you in one click what might take several with a conventional engine -- if it got you there at all. He also says the ability to map concepts will enable him to index some 10 billion webpages, more than double the four billion claimed by Google."

Proximity and MonetDB

"PROXIMITY incorporates major research findings from the Knowledge Discovery Laboratory, including model corrections for statistical biases inherent in relational data such as autocorrelation and degree disparity, as well as our graphical query language. PROXIMITY provides an open-source platform that can be used for both research into relational knowledge discovery and practical applications to real-world data."

"MonetDB achieves this goal using innovations at all layers of a DBMS: a storage model based on vertical fragmentation, a modern CPU-tuned vectorized query execution architecture that often gives MonetDB a more than 10-fold raw speed advantage on the same algorithm over a typical interpreter-based RDBMS. MonetDB is one of the first database systems to focus its query optimization effort on exploiting CPU caches. MonetDB also features automatic and self-tuning indexes, run-time query optimization, a modular software architecture, etc.. In-depth information on the technical innovations in the design and implementation of MonetDB can be found in our digital library."

Proximity is released under the Apache license and MonetDB under the MPL.
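Vertical fragmentation just means storing each column separately instead of each row together. The toy below illustrates only that layout (an in-memory table whose array index plays the role of a row id); it isn't how MonetDB is built, and `ColumnTable` is invented for the example. A query that needs two columns scans two contiguous arrays and never touches the third, which is the kind of cache-friendliness the quote is talking about.

```java
// Toy column store: one array per column; row i is the i-th entry of each.
final class ColumnTable {
    final String[] names;
    final int[] ages;
    final int[] salaries;

    ColumnTable(String[] names, int[] ages, int[] salaries) {
        if (names.length != ages.length || ages.length != salaries.length) {
            throw new IllegalArgumentException("columns must have equal length");
        }
        this.names = names;
        this.ages = ages;
        this.salaries = salaries;
    }

    // SELECT SUM(salary) WHERE age > minAge: reads only two of the three columns.
    long sumSalariesOlderThan(int minAge) {
        long total = 0;
        for (int i = 0; i < ages.length; i++) {
            if (ages[i] > minAge) total += salaries[i];
        }
        return total;
    }

    // Reassembling a whole row means visiting every column at the same position.
    String row(int i) {
        return names[i] + "," + ages[i] + "," + salaries[i];
    }
}
```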

Another Agile Database User

Re: Non SemWeb uses of RDF "Of course this could also be done using a big relational database. A big benefit of RDF we've found is that you can dump the data together first, and then join it up with heuristics (e.g. switch port has same MAC entry as server NIC etc..). I suspect that this is an order-of-magnitude time saving compared with doing static schema design up front."
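Here's a small sketch of "dump first, join later", with statements held as plain subject/predicate/object strings. The property names `seesMac` and `hasMac`, the resource names and the `MacJoin` class are invented for the example. The heuristic join links a switch port to any server NIC that shares its MAC address, with no schema designed up front.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class MacJoin {
    // Hypothetical statement: subject, predicate and object as plain strings.
    static final class Stmt {
        final String s, p, o;

        Stmt(String s, String p, String o) {
            this.s = s;
            this.p = p;
            this.o = o;
        }
    }

    // Derive (port, connectedTo, nic) statements wherever a switch port's
    // learned MAC ("seesMac") equals a NIC's MAC ("hasMac").
    // MAC addresses are compared case-insensitively.
    static List<Stmt> linkPortsToNics(List<Stmt> data) {
        Map<String, List<String>> nicsByMac = new HashMap<>();
        for (Stmt st : data) {
            if (st.p.equals("hasMac")) {
                nicsByMac.computeIfAbsent(st.o.toLowerCase(Locale.ROOT), k -> new ArrayList<>())
                         .add(st.s);
            }
        }
        List<Stmt> inferred = new ArrayList<>();
        for (Stmt st : data) {
            if (st.p.equals("seesMac")) {
                String mac = st.o.toLowerCase(Locale.ROOT);
                for (String nic : nicsByMac.getOrDefault(mac, Collections.emptyList())) {
                    inferred.add(new Stmt(st.s, "connectedTo", nic));
                }
            }
        }
        return inferred;
    }
}
```

The derived statements can simply be added back into the same pile, so the next heuristic can build on them.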

StAX

StAX utility classes "The purpose of this project is to help facilitate the adoption of JSR-173: Streaming API for XML (StAX) by providing a set of utility classes that make it easy for developers to integrate StAX into their existing XML processing applications."
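For anyone who hasn't used JSR-173 yet, this is what plain StAX pull parsing looks like with the `javax.xml.stream` API (bundled with Java 6 and later). It doesn't use the utility classes from the project above; the element-counting task and the `StaxCount` class are made up for the example.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StaxCount {
    // Pull events one at a time and count start tags by local name.
    static Map<String, Integer> countElements(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        Map<String, Integer> counts = new TreeMap<>();
        try {
            while (reader.hasNext()) {
                // The application asks for the next event instead of receiving callbacks.
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    counts.merge(reader.getLocalName(), 1, Integer::sum);
                }
            }
        } finally {
            reader.close();
        }
        return counts;
    }
}
```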

Jena 2.2 beta 1

"This release is primarily a maintenance release. The "beta" designation means that not all work on Jena 2.2 is completed but also it is a recognition that a pre-release to the growing Jena user community is needed before a full release in order to smooth the transition between versions."

"This release was built with Java 1.5.0 with source and target set to level 1.4. It has been tested on WindowsXP, Linux and MacOS with a mixture of Java 1.4.1, 1.4.2, 1.5.0."

Some good stuff from the release notes:
* "Reifier::getHiddenTriples() and getReificationTriples() REMOVED and replaced by iterator..."
* "added new API operation Model::removeAll() which removes "all" statements from a model"
* "a new memory-based Graph, SmallGraphMem, has been introduced."

Available for download from SourceForge.

Also released was NG4J 0.3.

Sunday, December 19, 2004

The Future of Everything

Future of Online Marketing: Semantic Ontology & Social Marketing "Despite these struggles, the Web, for all its flaws and bottlenecks and viruses and spam, is one of the crowning achievements of the last 50 years."

"As for the impact, the Semantic Web will make what we have more useful, and that has been Berners-Lee's mandate all along: "It's all about the people, not the technology," he said at W3C10.

"We made the Web to use the Web, to get other people to use the Web and to get things done," Berners-Lee said. "We are going into a new area ... that could change the rules of Web and enable us to do all that we on this planet can accomplish." "

Intelligent Enterprise also recently published an article, Beyond the World Wide Web.

Great Southern Land

A land of wasted web opportunity "Of the World Wide Web Consortium's (W3C) 368 members, only five are from Australia, says the head of W3C's international offices Ivan Herman, who was in Brisbane last week for a conference at the Distributed Systems Technology Centre (DSTC)."

""I often hear people in Australia say they are too small to do anything, they are happy to let the big guys in the US fight out the standards and they will just use them. I find this attitude strange because there is no reason for (it); the W3C tries to have a structure where any organisation can have their wishes heard," Herman says.

This attitude limits Australia's involvement in the development of the "semantic web", the next big e-commerce wave that promises to unleash a slew of new technologies, where computers on the internet can communicate data effortlessly with each other and therefore provide more useful applications. Whereas HTML was developed for humans to read websites, the semantic web puts computers in touch with each other so they can automatically exchange data, such as stock prices, weather forecasts, bus timetables, plane routes, and sports scores and GPS co-ordinates. It is envisaged that once these services are widely available that new applications and business opportunities will emerge."

It's basically correct: Australia is generally very conservative, sprinkled with a bunch of very active people. Brisbane is doing okay: my GP is working on ontologies, libferris is made here, and of course there's Kowari/TKS. DSTC, UQ and QUT are all doing semantic web work as well.

Principles

W3C Delivers Web Architecture Overview ""In the architecture document, [TAG participants] emphasize what characteristics of the Web must be preserved when inventing new technology," said Tim Berners-Lee, the Web's creator who now serves as W3C director and co-chair of the TAG. "They notice where the current systems don't work well, and, as a result, show weakness. This document is a pithy summary of the wisdom of the community.""

Some of the highlights:
* 2. Identification "Global naming leads to global network effects."
* 2.5. URI Opacity "Agents making use of URIs SHOULD NOT attempt to infer properties of the referenced resource."
* 4.6. Future Directions for Data Formats "Data formats enable the creation of new applications to make use of the information space infrastructure. The Semantic Web is one such application, built on top of RDF [RDFXML]. This document does not discuss the Semantic Web in detail; the TAG expects that future volumes of this document will. See the related TAG issue httpRange-14."
* 5.4. Protocol-based Interoperability "Protocols designed to be resilient in the face of widely varying environments have helped the Web scale and have facilitated communication across multiple trust boundaries. Traditional application programming interfaces (APIs) do not always take these constraints into account, nor should they be required to. One effect of protocol-based design is that the technology shared among agents often lasts longer than the agents themselves."

Wednesday, December 15, 2004

Two Papers

* Web ontology reasoning with logic databases "Put together, the results developed in this thesis allow scalable and practical solutions to ABox-related reasoning problems for OWL-based applications."
* Reverse Engineering Ontology to Conceptual Data Models "One of the main problems facing genomics data integration is related to multiple representations of the data semantics within a set of sources. For example, the same gene may be represented as a genetic map locus in one data source, an aggregation of multiple individual exon entries in another data source or a set of EST sequences in yet another one. In this case, ontologies become an integral part of bioinformatics since they encourage a common vocabulary for describing complex and evolving biological knowledge [17, 18] and can be used as a common access to diverse information repositories."

Testing

* Unit Test Patterns "...two things are needed--a formalization of unit testing by establishing unit test patterns, and the early adoption of object oriented design patterns in the developing application to specifically target the needs of unit testing."
* FIT "Fit allows customers, testers, and programmers to learn what their software should do and what it does do. It automatically compares customers' expectations to actual results."
* SelfEsteem "SelfEsteem is a graphical presentation of Acceptance Test results."
* Marathon: similar to Abbot, JFCUnit, and GUITAR.
* XMLUnit.
* MockEJB "MockEJB is a lightweight framework for running EJBs. MockEJB implements javax.ejb APIs and creates Home and EJBObject implementation classes for your EJBs. Internally, MockEJB relies on dynamic proxies and interceptors...Additionally, MockEJB comes with the "mock" implementation of non-EJB APIs. Currently it provides in-memory JNDI and JMS implementations which can be used independently from MockEJB's EJB support."
* ThoughtWorks OSS.

Tuesday, December 14, 2004

I am the Amazon and the Zip Code

Suggested Google Alphabet. Now all someone has to do is design the letters like they did at primary school.

Mock Objects

Recently, I was linked to by The case for unit testing; Mock databases?; Single-day software development... which points to things such as I changed my mind - Mock objects are wrong for database unit testing and the very good How To Write Unmaintainable Code.

Zoom, Zoom, Zoom

A Structured 2D Graphics Framework "Why use Piccolo? It will allow you to build structured graphical applications without worrying so much about the low level details. The infrastructure provides efficient repainting of the screen, bounds management, event handling and dispatch, picking (determining which visual object the mouse is over), animation, layout, and more. Normally, you would have to write all of this code from scratch. Additionally, if you want to build an application with zooming, that’s built right into the framework too."

Used by Jambalaya.
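The zooming Piccolo builds in boils down to an affine transform on the view. This sketch uses only `java.awt.geom.AffineTransform`, not Piccolo's API, and shows the standard way to zoom about a point such as the mouse position: translate the point to the origin, scale, then translate back, so the point stays put on screen. The `ZoomView` class is hypothetical.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

final class ZoomView {
    // Maps scene (world) coordinates to screen coordinates.
    final AffineTransform view = new AffineTransform();

    // Zoom by `factor` while keeping screen point (sx, sy) fixed.
    void zoomAbout(double sx, double sy, double factor) {
        AffineTransform step = new AffineTransform();
        step.translate(sx, sy);      // 3. move the point back
        step.scale(factor, factor);  // 2. scale about the origin
        step.translate(-sx, -sy);    // 1. move the point to the origin
        // Apply the zoom in screen space, after the existing view transform.
        view.preConcatenate(step);
    }

    Point2D toScreen(double x, double y) {
        return view.transform(new Point2D.Double(x, y), null);
    }
}
```

Repeated calls accumulate, so scrolling the mouse wheel over one spot keeps zooming in around that spot.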

Waiting for the Schema Fairy

The Semantic WinFS "Instead, people will simply release metadata. Bad metadata. Good metadata. Incompatible metadata. Heck, they're already doing it without Microsoft's filesystem as a catalyst. Rather than a Schema Fairy, we need a standard method for generalized structuring of metadata coupled with a language for semantic translation. The good people of the world can release all the metadata they want, and when critical mass has been reached, somebody will create a translation between the two. If only we had some knight in shining armor to rescue us!

Have you ever heard those Trojan commercials where the Trojan Man rides up on his horse and saves the, um, date? Well, imagine that same commercial, except that the horse is the W3C, the Trojan Man is Sir Tim Berners-Lee, and instead of free prophylactics he's handing out RDF and OWL."

Via On the WinFS delay "If WinFS was only about figuring out how to plug in an object store to the file system, I think it would have a definite ship date by now. The technical issues involved in doing that are tractable, and the PDC build showed that MS is making decent progress on that front. The problem of associating blobs of metadata with blobs of information in the filestore is most likely done (or getting there). However, even with that in place there are larger issues: what does this metadata look like, and where does it come from?"

"Even if the Schema Fairy came down today and blessed us with canonical schemas for everything, there would still be the problem of metadata extraction and annotation. WinFS will only work if it has metadata to process, and right now there aren’t very many good ways of marking up a document with semantic metadata. Manual annotation just doesn’t work – the time cost to the user is too high."

Monday, December 13, 2004

REST of the world

How I explained REST to my wife... "Ryan: Machines don't have a universal noun - that's why they suck. Every programming language, database, or other kind of system has a different way of talking about nouns. That's why the URL is so important. It lets all of these systems tell each other about each other's nouns.

Wife: But when I'm looking at a web page, I don't think of it like that.

Ryan: Nobody does. Except Fielding and a handful of other people. That's why machines still suck.

Wife: What about verbs and pronouns and adjectives?

Ryan: Funny you asked because that's another big aspect of REST. Well, verbs are anyway."
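To make the nouns and verbs concrete, here's a toy in Java: URIs are the nouns, and every resource answers the same few verbs. It's an in-memory sketch that only mimics the shape of HTTP's GET, PUT and DELETE and its status codes (200, 201, 204, 404, 405); it isn't a web server, and `ResourceSpace` and the example URI are invented.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

final class ResourceSpace {
    private final Map<URI, String> resources = new HashMap<>();

    // Minimal response: an HTTP-style status code plus an optional body.
    static final class Response {
        final int status;
        final String body;

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    // The same verbs work on every noun: the URI names the thing,
    // the verb says what to do with it.
    Response handle(String verb, URI uri, String body) {
        switch (verb) {
            case "GET":
                return resources.containsKey(uri)
                        ? new Response(200, resources.get(uri))
                        : new Response(404, null);
            case "PUT": {
                boolean created = !resources.containsKey(uri);
                resources.put(uri, body);
                return new Response(created ? 201 : 200, null);
            }
            case "DELETE": {
                if (!resources.containsKey(uri)) return new Response(404, null);
                resources.remove(uri);
                return new Response(204, null);
            }
            default:
                return new Response(405, null); // Method Not Allowed
        }
    }
}
```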

Updated Kowari Javadoc

For all your API goodness: the Kowari Metadata Store 1.1.0 API.

Sunday, December 12, 2004

Another XML Accelerator

I was recently talking about Intel's old NetStructure™ 7210 XML Accelerator. Here's a more recent (and different) piece of hardware: DataPower XML Hardware Accelerator "As Rich mentioned in his comment hardware acceleration is a pretty hot ticket at the moment, and for good reason. While some might argue all hardware acceleration does is bring XML applications to the same performance level of binary data-based applications my counter question/remark would be "You mean I can get all the power and benefit that a XML/XSLT-based solution brings AND get the same performance as an application that has been painstakingly handcoded in C or at best C++? WOOHOO!!!""

To create an RDF accelerator, maybe you could use a programmable graphics card.

How to Create a Good Web UI

Or should that be Google Web UI: "Looking at the web applications Google is producing, I can comfortably say (and I’m almost never comfortable making technological predictions), that if you develop web applications and you aren’t looking today for ways to include dynamic interface techniques like those made practical by XmlHttpRequest, you’re going to end up losing to someone who is."

Friday, December 10, 2004

Kowari 1.1.0 Pre-Release 1

What's new:
* Resolvers allow developers to create components that expose data sources as RDF.
* Content handlers allow Kowari to extract metadata from different types of files.
* Improved datatype handling for most XSD datatypes and RDF's inbuilt XML Literal datatype. Unsupported datatypes can still be stored and queried.
* Improved performance, specifically on small queries and subqueries.
* AbstractDatabaseSession has been replaced to allow pluggable Session implementations, so Jena, JRDF and iTQL can each be used separately.

See http://www.kowari.org/.
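To give a feel for what a resolver does (present some non-RDF data source as statements), here's a hypothetical sketch. The `StatementSource` interface, the `FileSizeSource` class and the predicate URI are invented for illustration and are not Kowari's resolver interfaces; it also assumes the file names are absolute Unix-style paths.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical interface: anything that can expose its contents as triples.
interface StatementSource {
    List<String[]> statements(); // each entry is {subject, predicate, object}
}

// Exposes an in-memory map of file path -> size in bytes as RDF-style triples.
final class FileSizeSource implements StatementSource {
    // Hypothetical predicate URI chosen for the example.
    static final String SIZE = "http://example.org/file#size";

    private final Map<String, Long> sizes;

    FileSizeSource(Map<String, Long> sizes) { this.sizes = sizes; }

    @Override
    public List<String[]> statements() {
        List<String[]> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : sizes.entrySet()) {
            // Assumes absolute Unix-style paths, e.g. /music/a.mp3 -> file:///music/a.mp3
            out.add(new String[] {"file://" + e.getKey(), SIZE, Long.toString(e.getValue())});
        }
        return out;
    }
}
```

The store can then query these statements like any others, without the data ever being copied into RDF by hand.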

Provenance to Unify Browsing

A Framework for Unified Information Browsing via RDFX "This Unified Information Service Browser (UISB) project is investigating the means of joining heterogeneous information from multiple sources into a combined and navigable structure...Attempting to unify disparate data sources illustrates that hierarchical and relational data models can place undesirable constraints on the stored data; however, the same data can be stored without constraint in a dynamic graph, which also provides significant potential for enhanced traversal and discovery capabilities."

"The solution that was adopted was to include provenance information for every statement that is added to the graph. Thereafter if a graph, or a statement or a resource is removed from the store, it can be done without the risk of losing information. In addition to identifying where a statement originated, the provenance objects can store metadata describing (among other things) when it was discovered, and when it should expire, therefore enabling the automated management of expired information, or potentially providing an event mechanism to instigate the refresh of the data. The benefit of this solution, is that it does not rely on the graph implementation to ensure that graph merging and un-merging happens as expected, however the added overhead of four extra statements for tracking provenance may negatively affect performance and increase the size of the data store. This will depend on the implementation of the graph store since some stores (e.g. Jena 6.4) can reify statements more efficiently."

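The "four extra statements" are the standard RDF reification quad (`rdf:type rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`), after which provenance can hang off the statement's identifier. A minimal sketch of that idea, not the UISB code: the provenance property URI, the `_:stmtN` identifiers, the `Reifier` class and the string-array triples are all assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class Reifier {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    // Hypothetical provenance property for the example.
    static final String SOURCE = "http://example.org/prov#source";

    private int nextId = 0;

    // Returns the asserted triple, its four reification triples, and one
    // provenance triple recording where the statement came from.
    List<String[]> addWithProvenance(String s, String p, String o, String source) {
        String id = "_:stmt" + (nextId++); // blank-node-style id for the statement
        List<String[]> out = new ArrayList<>();
        out.add(new String[] {s, p, o});
        out.add(new String[] {id, RDF + "type", RDF + "Statement"});
        out.add(new String[] {id, RDF + "subject", s});
        out.add(new String[] {id, RDF + "predicate", p});
        out.add(new String[] {id, RDF + "object", o});
        out.add(new String[] {id, SOURCE, source});
        return out;
    }
}
```

Removing everything from one source then means finding the statement ids with that `SOURCE` value and deleting the triples they describe, which is the "un-merging" the quote refers to.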
Bottom Up Tagging

Playing with Taxonomies "I spoke with Stewart Butterfield of Ludicorp, developer of Flickr, about this effort. (Latin scholars note: The Ludi in "Ludicorp" suggests the Latin words for play and game.) Stewart had many insights about this new approach to building taxonomies. "If you can hire enough excellent librarians, you will get better keyword results than with social approaches. However, as the content grows, tagging (and retagging) becomes an order of magnitude more difficult. In other words, social approaches are 80% as good as and 10 times easier than top-down approaches." As to whether Flickr's approach would work in the button-down corporate world, Butterfield had this to say: "Anticipate resistance in the CIO crowd who don't want to risk losing control in a social self-correcting process and do not want anything to get lost." Butterfield says that at least 55% of photos uploaded to Flickr have one or more tags, and 66% have both a tag and user-supplied metadata. As of early September, Flickr had 500,000 photos on its site and was growing at the rate of 15,000 to 20,000 more each day."

Thursday, December 09, 2004

Sometimes You Can't Make It On Your Own

On Folly "The unthinkable rapprochement between topic maps and RDF has occurred, signified by the formation of the W3C RDF/Topic Maps Interoperability Task Force. The task force is part of the Semantic Web Best Practices and Deployment Working Group. The last time I was with the majority of the people listed as members of the task force, it was in a very pleasant drinking establishment in Amsterdam. It's nice to think that the bonhomie of that evening has persisted into forming the basis of the task force."

"...and I think he mischaracterizes Berners-Lee's approach as calling for globally agreed ontologies. Nevertheless, he does note an increase in independent software activity concerning ontologies and the Semantic Web."

"Both Shirky and Udell seem to be pretty much convinced the Semantic Web requires, from the outset, globally agreed ontologies. It seems more that they've set up a straw man. I had always envisaged that in the same way user interface and other conventions have emerged from the messy web, so would ontological conventions. Messy, but good enough."

When different isn't better

Why Coding Standards? "Coding standards are being adopted by more development organizations as a means of ensuring the delivery of reliable and sound applications. Estimates suggest that 80 percent of the total cost of a piece of software go to maintenance, and not enough effort is being directed at ensuring that the quality goes in during the development process.

With the advent of coding standards, companies are saving money and time on the front end, reducing potential safety risks and safeguarding their reputation both internally and externally. Equally, development teams can reduce their time in code reviews, correcting coding standards violations that could be detected—and in some cases corrected—automatically."

IBM CICS J2EE

IBM Extends CICS for Web Services, J2EE "Announced last week, the CICS Transaction Server for z/OS 3.1 provides Web services capabilities, while the CICS Transaction Gateway 6.0 provides J2EE (Java 2 Platform, Enterprise Edition) connectivity. The software enables customers to more easily integrate business processes and extend existing mainframe-based applications with better Web services and integration capabilities."

This was a long time in coming: "IBM Corp. will bring its San Francisco project into line with Sun Microsystems's Enterprise JavaBeans object specification and announce intentions to add Enterprise Java application server functionality to a range of its middleware products, all at the Java strategy day at JavaOne this March in San Francisco...IBM may achieve this by building a separate application server, forming a partnership with EJB server vendors such as Gemstone or WebLogic, or developing an application container that will run inside or on top of current server offerings such as CICS, according to sources."

Metafilter on Metadata

Recently we've all been thinking about flat (or better, faceted) hierarchy web apps that organize email, photos, bookmarks, and general knowledge. The common threads are metadata (tags, categories, labels) that enrich relationships within and hence searchability of large collections. But besides marketroid hype (buzzwords, snark) and a computer that plays Twenty Questions what else can we do and study using faceted data structures: searchable culture references in The Simpsons, library science, computer filesystems, A.I. development, models for human memory and cognition?

FacetedClassification.
From http://www.metafilter.com/mefi/37515.
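In faceted classification each item is described along several independent facets, and you narrow a collection by picking values on any of them in any order, rather than filing each item in one place in a single hierarchy. A minimal in-memory sketch; `FacetIndex` is hypothetical and the facet names are up to the caller. `counts` gives the numbers a faceted browser typically shows next to each remaining choice.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FacetIndex {
    // item id -> (facet name -> value)
    private final Map<String, Map<String, String>> items = new HashMap<>();

    void add(String id, Map<String, String> facets) {
        items.put(id, new HashMap<>(facets));
    }

    // Ids of the items matching every requested facet=value pair, sorted.
    List<String> select(Map<String, String> filter) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : items.entrySet()) {
            boolean ok = true;
            for (Map.Entry<String, String> f : filter.entrySet()) {
                if (!f.getValue().equals(e.getValue().get(f.getKey()))) {
                    ok = false;
                    break;
                }
            }
            if (ok) hits.add(e.getKey());
        }
        Collections.sort(hits);
        return hits;
    }

    // How many of the items matching `filter` have each value of `facet`.
    Map<String, Integer> counts(String facet, Map<String, String> filter) {
        Map<String, Integer> result = new HashMap<>();
        for (String id : select(filter)) {
            String v = items.get(id).get(facet);
            if (v != null) result.merge(v, 1, Integer::sum);
        }
        return result;
    }
}
```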

RDF, the ultimate agile database

Perspective on XML: Be humble, not imperial "The revolt against imperial modeling of code has already taken shape in the form of languages and agile methods. Agile programming emphasizes highly iterative development in close collaboration with the eventual users of the product. Even more important, it stresses the inevitability of change and evolution. In effect, agile developers pride themselves on being able to rapidly accommodate change."

"The same revolution is in the offing for data modeling. There have been some developments in agile databases which, literally, adapt the ideas of agile programming to the design of (usually relational) databases, but there is also progress occurring in semi-structured databases and, in particular, XML."

Is XML Zen in opposition to "strong" data modeling? "One thing I seem to share with so many of my colleagues in the XML world is a wary attitude towards traditional data modeling practices. It's an attitude that has also informed my thinking in related articles pondering data supermodels, coupling of distributed systems, OO encapsulation, and the like.

Some of us see XML as a bit of a refuge from established schools of data modeling. OO and Unified Process in my case, E-R and other relational based modeling in others'. Some just came from document-centric backgrounds where such extremely rationalized data modeling was not the mainstay. In my case, interest in XML was part of a general interest in data modeling as a vehicle for human expression rather than for robotic simulation of the real world."

As I've said before, XML is not relational enough, whereas RDF is relational, or rather "RDF provides a relational data model of the Web".

2-4 Players, Ages 10 and above

The Epistomat: Generating Consensus About RDF Ontologies and Rules "What happens when you take an ontology-driven interface generator, and feed it the ontology for OWL itself? A simple recursive technique allows for bootstrapping a world model and programmatically generating interfaces to it. Such a model can then serve as the platform for the evolution of collective consensus about ontologies and statements, by means of a curious game of Nomic."

From the Nomic site:
"Nomic is a game in which changing the rules is a move. In that respect it differs from almost every other game. The primary activity of Nomic is proposing changes in the rules, debating the wisdom of changing them in that way, voting on the changes, deciding what can and cannot be done afterwards, and doing it. Even this core of the game, of course, can be changed."

Protein Database

Constructing ontology-driven protein family databases. "The protein phosphatase resource, PhosphaBase, is freely available over the internet (http://www.bioinf.man.ac.uk/phosphabase). The DAML+OIL ontology for the protein phosphatases and the ABC transporters is available on request from the authors."

Ganesha

I previously mentioned Ganesha, but that was before it was available for download. Well, now you can download it.

Wednesday, December 08, 2004

OO DB goes OS

Object database goes open source "Startup company db4objects this week is releasing its object database, db4o, under an open source format, with the product now available either under the GPL via open source or commercially as embeddable software.

Built for Java and .Net development, db4o enables storage of objects, according to the company. An example of an object could be a vitamin in a biotech application or a brake configuration in an automotive application, according to Christof Wittig, CEO of db4objects."

Their homepage prominently displays the fact too.

Tuesday, December 07, 2004

Good-bye JUnit

TestNG "TestNG is a testing framework inspired from JUnit and NUnit but introducing some new functionalities that make it more powerful and easier to use, such as:

* JSR 175 Annotations (JDK 1.4 is also supported with JavaDoc annotations).
* Flexible test configuration.
* Default JDK functions for runtime and logging (no dependencies).
* Powerful execution model (no more TestSuite).
* Supports dependent methods.

I started TestNG out of frustration for some JUnit deficiencies which I have documented on my weblog here, here and in particular, here. Reading these entries might give you a better idea of the goal I am trying to achieve with TestNG. You can also check out a quick overview of the main features and an article describing a very concrete example where the combined use of several TestNG's features provides for a very intuitive and maintainable testing design."

A simple example via this blog entry.
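TestNG itself isn't reproduced here, but the two ideas the quote leads with, JSR 175 annotations and dependent methods, are easy to sketch with plain Java reflection. The `@Check` annotation, its `dependsOn` element and the `MiniRunner` and `SampleChecks` classes are hypothetical names for this sketch, not TestNG's API (in TestNG you'd use `@Test` with its `dependsOnMethods` attribute). The point is just that a failed method causes the methods depending on it to be skipped rather than reported as failures.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical marker annotation for this sketch (not TestNG's @Test).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Check {
  // Names of @Check methods that must pass before this one is run.
  String[] dependsOn() default {};
}

class MiniRunner {
  // Runs every @Check method of testClass on one fresh instance and returns
  // method name -> "PASS", "FAIL" or "SKIP" (a dependency did not pass).
  static Map<String, String> run(Class<?> testClass) {
    Object instance;
    try {
      Constructor<?> ctor = testClass.getDeclaredConstructor();
      ctor.setAccessible(true);
      instance = ctor.newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("cannot instantiate " + testClass, e);
    }
    Map<String, Method> checks = new HashMap<>();
    for (Method m : testClass.getDeclaredMethods()) {
      if (m.isAnnotationPresent(Check.class)) {
        checks.put(m.getName(), m);
      }
    }
    Map<String, String> results = new LinkedHashMap<>();
    for (String name : checks.keySet()) {
      runOne(name, instance, checks, results, new ArrayList<>());
    }
    return results;
  }

  private static String runOne(String name, Object instance, Map<String, Method> checks,
      Map<String, String> results, List<String> inProgress) {
    if (results.containsKey(name)) {
      return results.get(name);            // already run
    }
    Method m = checks.get(name);
    if (m == null || inProgress.contains(name)) {
      return "SKIP";                       // unknown or circular dependency
    }
    inProgress.add(name);
    String outcome = "PASS";
    for (String dep : m.getAnnotation(Check.class).dependsOn()) {
      if (!"PASS".equals(runOne(dep, instance, checks, results, inProgress))) {
        outcome = "SKIP";
        break;
      }
    }
    if (outcome.equals("PASS")) {
      try {
        m.setAccessible(true);
        m.invoke(instance);
      } catch (ReflectiveOperationException e) {
        // InvocationTargetException wraps whatever the check threw, including AssertionError.
        outcome = "FAIL";
      }
    }
    inProgress.remove(name);
    results.put(name, outcome);
    return outcome;
  }
}

// Example: query fails, so report (which depends on it) is skipped.
class SampleChecks {
  @Check
  void connect() { }

  @Check(dependsOn = "connect")
  void query() { throw new AssertionError("query returned nothing"); }

  @Check(dependsOn = "query")
  void report() { }
}
```

Running `MiniRunner.run(SampleChecks.class)` should report `connect` as passed, `query` as failed and `report` as skipped, which is the behaviour JUnit's flat model can't express directly.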

Monday, December 06, 2004

Metadata as Self Defense

Bootstrapping the semantic Web "It's tempting to draw parallels between the careers of Albert Einstein and Tim Berners-Lee. Both men made world-transforming breakthroughs and then pursued even grander visions. Einstein, of course, never found the unified theory he sought for three decades. A lot of people think Berners-Lee's vision of a semantic Web will prove equally elusive."

"Semantic-Web naysayers think people and organizations can't be bothered to assert machine-readable facts about themselves. And, today, that is undoubtedly true. But when others assert facts about you -- as they increasingly will -- the tide could begin to turn. Individual acts of self-defense may ultimately combine to bootstrap the semantic Web."

Rice of the future

Berners-Lee Maps Vision of a Web Without Walls "To envision the Internet of the future, W3C director and WWW founding father Tim Berners-Lee suggested during the W3C's 10-year birthday bash here Wednesday, first envision groceries—say a box of rice.

On the box's side, in small, rice grain-sized type, you will find nutrition information. On its back, you will find directions on how to cook it. Somewhere else you may find a URL that you can use to research any number of rice-related things: recipes, country of agricultural origin, Uncle Ben company data or relevant information pertaining to the allergenic nature of rice, perhaps."

"Haystack knocks down the partitions that separate e-mail clients, file systems, calendars, address books, the Web and other repositories so that information can be worked with regardless of its origin.

Such applications will have a big impact on personal information management, Berners-Lee said, as users will be able to do things such as drop their bank statements onto their calendars and have items automatically populate given dates.

Such descriptions sound familiar to anybody who's been following IBM's work with its Information Integrator technology or Oracle Corp.'s upcoming Tsunami content management offering, which it plans to roll out at Oracle OpenWorld in San Francisco next week."

"The Semantic Web is going to be like a huge data bus, Berners-Lee said—a back-end bus that spans the planet. Comparing it to Tsunami or Information Integrator is like saying there used to be Hypercards before the Web. "Yes, there were innumerable Hypercard applications before the World Wide Web," he said. "They just didn't talk the same language.""

Books, Music and Humour

* The nation's 10 favourite books, and there are actually 11. 1984 is an interesting inclusion, with one panelist noting that it's relevant because of the similarities between Osama and Goldstein (given the recent news of him disappearing).
* Don't believe the hype "U2 are probably the most over-rated band in history. Their debut, Boy, was a classic and still sounds fresh and impassioned. Fatally though, they became a band that believed their own (fawning) press and whose egotism has devoured their talent. The Joshua Tree showed what a good guitar group/stadium rock band U2 could be. Sadly, they had the sort of pretensions that usually afflict mediocre American outfits like the Chili Peppers. On Achtung Baby and Zooropa they started plundering other bands' innovations and moving into "dance music" - though only the whitest, geekiest student could dance to them..." Also The Beatles, both Elvii (Elvis Costello and Elvis), Prince, Nirvana, etc.
* The 10 Least Successful Holiday Specials of All Time Includes: Ayn Rand's A Selfish Christmas (1951) and The Lost Star Trek Christmas Episode: "A Most Illogical Holiday" (1968).

Processing Java Annotations

Gram "Gram is a simple xdoclet-like tool for processing doclet tags or Java 5 annotations in source code or bytecode and auto-generating files, data or resources.

Gram = Groovy + JAM. JAM does all the hard work of abstracting away the details between annotations and doclet tags and handling Java 1.4 and 5 compliance. Groovy takes care of the scripting, code generation & templating. Gram is the little tidy bit of code in between."
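Gram uses JAM to read the tags and Groovy to do the templating. The sketch below swaps both for plain Java 5 reflection, just to show the underlying idea of reading annotations and generating an artifact from them. The `@Column` annotation, the `Person` class and the SQL-style output are all hypothetical, and unlike Gram this only sees annotations kept at runtime, not doclet tags in source.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Locale;

// Hypothetical annotation; RUNTIME retention lets reflection see it on compiled classes.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Column {
  String name();
  String type() default "VARCHAR(255)";
}

class Person {
  @Column(name = "full_name")
  String name;

  @Column(name = "age", type = "INTEGER")
  int age;

  String nickname; // not annotated, so no column is generated for it
}

class DdlGenerator {
  // Builds a CREATE TABLE statement from the @Column-annotated fields of a class.
  static String generate(Class<?> type) {
    Field[] fields = type.getDeclaredFields();
    // getDeclaredFields() makes no ordering guarantee, so sort for stable output.
    Arrays.sort(fields, Comparator.comparing(Field::getName));
    StringBuilder columns = new StringBuilder();
    for (Field f : fields) {
      Column c = f.getAnnotation(Column.class);
      if (c == null) {
        continue;
      }
      if (columns.length() > 0) {
        columns.append(", ");
      }
      columns.append(c.name()).append(' ').append(c.type());
    }
    return "CREATE TABLE " + type.getSimpleName().toLowerCase(Locale.ROOT) + " (" + columns + ")";
  }

  public static void main(String[] args) {
    System.out.println(generate(Person.class));
  }
}
```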

To Southampton and beyond

Berners-Lee takes professorial chair at Southampton University "Berners-Lee will take up a chair in computer science at the University of Southampton's School of Electronics and Computer Science, holding this position as well as being senior research scientist at MIT, and director of W3C."

Friday, December 03, 2004

OWLchestra

OWLchestra "OWLchestra is a Web vocabulary management system (WVMS) that enables the collaborative development of small-scale OWL ontologies and RDF schemas." Includes screenshots and a brief paper. I saw this a while back, via BNode.

My Other Computer is Google

The magic that makes Google tick "The numbers alone are enough to make your eyes water.

* Over four billion Web pages, each an average of 10KB, all fully indexed.
* Up to 2,000 PCs in a cluster.
* Over 30 clusters.
* 104 interface languages including Klingon and Tagalog.
* One petabyte of data in a cluster -- so much that hard disk error rates of 10^-15 begin to be a real issue.
* Sustained transfer rates of 2Gbps in a cluster.
* An expectation that two machines will fail every day in each of the larger clusters.
* No complete system failure since February 2000."
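A couple of these figures are worth a back-of-the-envelope check. The sketch assumes 10KB means 10,000 bytes, a petabyte is 10^15 bytes and the quoted error rate is per bit read; the article doesn't state any of these units. On those assumptions the index is about 40 terabytes of raw pages, and reading one cluster's petabyte once would be expected to hit around eight bit errors.

```java
class GoogleNumbers {
  // Total raw size of the indexed pages in bytes.
  static double corpusBytes(double pages, double bytesPerPage) {
    return pages * bytesPerPage;
  }

  // Expected bit errors when reading 'bytes' once, given an error rate per bit (assumed unit).
  static double expectedBitErrors(double bytes, double errorRatePerBit) {
    return bytes * 8 * errorRatePerBit;
  }

  public static void main(String[] args) {
    double corpus = corpusBytes(4e9, 10e3);          // four billion pages x 10KB each
    System.out.printf("Corpus: %.0f TB%n", corpus / 1e12);
    double errors = expectedBitErrors(1e15, 1e-15);  // one petabyte at 10^-15 per bit
    System.out.printf("Expected bit errors per full read: %.0f%n", errors);
    // Two failures a day in a 2,000-machine cluster is about 0.1% of it per day.
    System.out.printf("Daily failure fraction: %.3f%%%n", 100.0 * 2 / 2000);
  }
}
```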

PISTA

Semantic Association Identification and Knowledge Discovery for National Security Applications "Our goal is to research new techniques and improving effectiveness of techniques to identify semantic associations and knowledge discovery by exploiting a large knowledge base. Specific objectives include (a) ontology driven lazy semantic metadata extraction (i.e., annotation) to complement traditional active metadata extraction techniques, and (c) formal modeling and high-performance computation of semantic association discovery including ontology-based contextual processing and relevancy ranking of interesting relationships."

From the project report: "The naïve algorithm to find all paths between 2 nodes in a directed graph [1] shown below is a recursive implementation of a depth-first search. Our first implementation of the ρ-path operator is based on this algorithm."
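The naïve algorithm they describe is easy to sketch: a recursive depth-first search that extends the current path one edge at a time, records it when the target is reached, and backtracks. This is a generic version over a string-keyed adjacency list, assuming only simple paths (no node visited twice) are wanted; it isn't the project's own code, which runs over an RDF knowledge base. As with any enumeration of all paths, the worst case is exponential in the size of the graph.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

class AllPaths {
  private final Map<String, List<String>> edges = new HashMap<>();

  void addEdge(String from, String to) {
    edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
  }

  // Returns every simple path from source to target.
  List<List<String>> find(String source, String target) {
    List<List<String>> paths = new ArrayList<>();
    LinkedHashSet<String> current = new LinkedHashSet<>(); // keeps path order, fast membership test
    current.add(source);
    dfs(source, target, current, paths);
    return paths;
  }

  private void dfs(String node, String target, LinkedHashSet<String> current,
      List<List<String>> paths) {
    if (node.equals(target)) {
      paths.add(new ArrayList<>(current)); // record a copy of the path so far
      return;
    }
    for (String next : edges.getOrDefault(node, Collections.emptyList())) {
      if (current.add(next)) {             // skip nodes already on the path (avoids cycles)
        dfs(next, target, current, paths);
        current.remove(next);              // backtrack
      }
    }
  }

  public static void main(String[] args) {
    AllPaths g = new AllPaths();
    g.addEdge("a", "b");
    g.addEdge("b", "d");
    g.addEdge("a", "c");
    g.addEdge("c", "d");
    g.addEdge("d", "a");                   // a cycle, which the search must not loop on
    System.out.println(g.find("a", "d"));
  }
}
```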

"Our initial implementation of the ρ-Intersect operator is based on the ρ-path operator. It searches for nodes where two ρ-paths intersect (see Figure 5). We recognize the fact that there could be multiple intersection points for the ρ-paths. Hence our implementation returns the sequence of nodes that are common between 2 ρ-paths."
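Once paths have been found, the intersection step can be sketched on its own: given two paths as node lists, return the nodes they share. Returning them in the order they appear on the first path, without duplicates, is my reading of "the sequence of nodes that are common" and an assumption; `PathIntersect` is a made-up name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PathIntersect {
  // Nodes occurring on both paths, in first-path order, each reported once.
  static List<String> intersect(List<String> first, List<String> second) {
    Set<String> onSecond = new HashSet<>(second);
    Set<String> seen = new HashSet<>();
    List<String> common = new ArrayList<>();
    for (String node : first) {
      if (onSecond.contains(node) && seen.add(node)) {
        common.add(node);
      }
    }
    return common;
  }

  public static void main(String[] args) {
    System.out.println(intersect(List.of("a", "b", "x", "c"), List.of("d", "x", "e", "c")));
  }
}
```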

"The goal of ρ-Iso is to take two resources as input and discover all paths that are “isomorphic” in both resources..."

Also, Semantic Association Identification and Knowledge Discovery for National Security Applications and Context-Aware Semantic Association Ranking.

Keeping Secrets

When Secrets Make Sense "Recently I wrote a short piece making a strong and general claim that the same forces that are pushing data towards XML are pushing software towards Open Source." From that article "These days, interoperation and integration are everything. You’d better have open interfaces, open networks, open services; that is, open data."

"I'll put my finger on two pieces of Apple technology that benefit from being (for now) closed-source: Aqua and the video part of iChat. Both of them contain some magic that nobody else has figured out how to do yet, and if they can figure out how to make a few bucks in the gap before the world catches up, more power to ’em."

"Closed source isn’t over or anything like that. It’s just headed for a niche role, business-wise."

Not sure I believe that XML is the force to free you from vendor lock-in. RDF is of course - remember to put all your data in Kowari/TKS though, you know it makes sense.

A couple of quickies

* Via Matters EPOS "The objective of EPOS is to leverage a user's efforts for his personal knowledge management for his own benefit as well as to evolve this within the organization." The successor to Frodo.
* I'm sure everyone is now reading SWS IG scratchpad "Helpful paper tips: Repeat, "my god, we save X billion dollars with OWL-S! But we can't quite tell you how since that would reveal trade secrets! But trust us!" as many times as possible. Shrink margins if you must!" for the WWW2005 Tutorials and Workshops

Thursday, December 02, 2004

Space Colonies and Ontological Programs

What the Heck is an Ontology? (in PDF format) "There's no doubt about it; ontologies are on the up. The ideas are gaining wider acceptance and these days you're much less likely to be greeted by that familiar baffled expression of those who are too afraid to ask what the heck you are talking about. Tools like Protege (http://protege.stanford.edu/) continue to improve in robustness, and extend their functionality; and there are new tools on the scene too, such as SWOOP (http://www.mindswap.org/2004/SWOOP/) and Haystack (http://haystack.lcs.mit.edu/index.html)."

"Perhaps the hardest technical challenge is using ontologies to enable the ad-hoc reuse of existing software components. In other words, it is much easier to design a suite of programs that all commit to the same ontology than to coerce existing programs to communicate effectively."

Wednesday, December 01, 2004

Coding for Failure

Automating Software Failure Reporting "Developing a failure reporting system requires an understanding of a product’s customer base, as well as usage profile. It is also important that you understand the failure profile of your product so that you can focus on events most annoying to your customers. Although product attributes are unique, there is a generic set of data that, if collected, will help in diagnosing failures. In collecting customer data, however, you must address all privacy concerns prior to rolling out your process. Along with a failure collection system, you also need a process that can distribute patches to address those failures.

At Microsoft, our experience has led us to develop a generic methodology to process, transmit, analyze, and respond to customer failure data. Differences exist in the way this process can be implemented; it is usually dependent upon the product type and its failure profile. "
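As a rough sketch of the "focus on events most annoying to your customers" step, the code below groups incoming reports by a crash signature and ranks the buckets by how often they occur. The `FailureReport` fields (application, version, module, offset) are an assumption about what a generic failure record might contain; the article's actual data set isn't described in this excerpt.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FailureBuckets {
  // Hypothetical minimal failure record; real systems collect more, within privacy limits.
  static final class FailureReport {
    final String application;
    final String version;
    final String module;
    final long offset;

    FailureReport(String application, String version, String module, long offset) {
      this.application = application;
      this.version = version;
      this.module = module;
      this.offset = offset;
    }

    // Reports with the same signature are treated as the same underlying failure.
    String signature() {
      return application + "/" + version + "/" + module + "+0x" + Long.toHexString(offset);
    }
  }

  // Returns (signature, count) pairs ordered from most to least frequently reported.
  static List<Map.Entry<String, Integer>> rank(List<FailureReport> reports) {
    Map<String, Integer> counts = new HashMap<>();
    for (FailureReport r : reports) {
      counts.merge(r.signature(), 1, Integer::sum);
    }
    List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
    ranked.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
    return ranked;
  }
}
```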

Java and LGPL

The LGPL and Java "FSF's position has remained constant throughout: the LGPL works as intended with all known programming languages, including Java. Applications which link to LGPL libraries need not be released under the LGPL. Applications need only follow the requirements in section 6 of the LGPL: allow new versions of the library to be linked with the application; and allow reverse engineering to debug this."

"If you distribute a Java application that imports LGPL libraries, it's easy to comply with the LGPL. Your application's license needs to allow users to modify the library, and reverse engineer your code to debug these modifications. This doesn't mean you need to provide source code or any details about the internals of your application. Of course, some changes the users may make to the library may break the interface, rendering the library unable to work with your application. You don't need to worry about that -- people who modify the library are responsible for making it work."

Monday, November 29, 2004

IBM boosts Oncology Ontology

IBM and Massachusetts General Hospital Announce Effort to Improve Information Sharing Among Cancer Researchers ""Effective tools for information management, integrated tightly with underlying computing and data infrastructures, are key to life sciences researchers gaining new insights into complex problems," said David Grossman, Distinguished Engineer, IBM Internet Technology Group. "In addition, the use of semantic web technologies to integrate many sources and formats of data with advanced modeling algorithms is particularly helpful for this type of large-scale collaborative project.""

""There is an urgent need to develop a common, unifying infrastructure that enables the integration and sharing of knowledge about cancer -- both in terms of disparate data and distinct computational tools -- with the goal of modeling cancer as a complex dynamic system," said Dr. Deisboeck. "While advances in cancer research and new technologies have generated a wealth of new data and insight, all too often the lack of shared systems and standards makes integration of this crucial knowledge difficult or impossible.""

Python and PHP

The Next Language "The vast majority of J2EE deployments (over 80% according to Gartner) are simply Servlet/JSP to JDBC applications. Basically HTML front-ends to relational databases. It is ironic that much of what makes Java complicated today is all of its numerous band-aid extensions, such as generics and JSP templates, which were added to make these types of simple applications easier to develop."

"Apparently what is needed is a language/environment that is loosely typed in order to encapsulate XML well and that can efficiently process text. It should be very well suited for specifying control flow. And it should be a thin veneer over the operating system."

Just to make this clear, the idea that the future of application development is turning applications into "a big text pump" seems rather foreign to me, and the complete opposite of where application development seems to be going.

Requirements and Architecture

Requirements guru shares 'cosmic truths' "Wiegers' list of cosmic truths also includes:

* "Customer involvement is the most critical factor in achieving software quality."
* "The customer is not always right, but the customer always has a point."
* "Change happens."
* "If it’s not in the requirements specifications, don’t expect to find it in the product."
* "Even the best requirements document cannot replace human dialog."
* "You are never going to have perfect requirements.""

Architected RAD gets an A in Gartner study "The Gartner survey of development teams, completed in the past month, found the ARAD approach reduces training time and increases productivity of coders regardless of the vendor tool used.

"We have gotten consistently positive feedback from users of Computer Associates' Advantage:Plex, Compuware's OptimalJ and IBM's Rational Rapid Developer offerings," writes Michael Blechar, one of the Gartner analysts who worked on the survey."

"Of the newest tool technology, the Gartner report says: ARAD methods and tools are just beginning to achieve recognition by mainstream Java 2 Platform, Enterprise Edition (J2EE) and .NET developers. The tools provide development teams with pre-built J2EE and .NET frameworks as well as pre-built technical components, which Gartner says can be customized by technical architects and used to generate 60 to 85 percent of the code. Then the programmers on the development team can add the business logic specific to the application."

Sunday, November 28, 2004

Learning with the Semantic Web

Reasoning and Ontologies for Personalized E-Learning in the Semantic Web "Adaptive educational hypermedia systems are able to adapt various visible aspects of the hypermedia systems to the individual requirements of the learners and are very promising tools in the area of e-Learning: Especially in the area of eLearning it is important to take the different needs of learners into account in order to propose learning goals, learning paths, help students in orienting in the e-Learning systems and support them during their learning progress...We propose a framework for such adaptive or personalized educational hypermedia systems for the semantic web. The aim of this approach is to facilitate the development of an adaptive web as envisioned e.g. in (Brusilovsky and Maybury, 2002). In particular, we show how rules can be enabled to reason over distributed information resources in order to dynamically derive hypertext relations. On the web, information can be found in various resources (e.g. documents), in annotation of these resources (like RDF-annotations on the documents themselves), in metadata files (like RDF descriptions), or in ontologies. Based on these sources of information we can think of functionality allowing us to derive new relations between information. "
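To make "rules ... reason over distributed information resources in order to dynamically derive hypertext relations" a bit more concrete, here is a toy rule: if the current document requires a concept the learner doesn't yet know, derive a link to any document that covers it. The triples, the property names (`requires`, `covers`, `knows`) and the rule itself are invented for illustration; the paper uses its own ontologies and rule language. The facts could come from different sources and simply be merged into one list before the rule runs.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class LinkRule {
  static final class Triple {
    final String s;
    final String p;
    final String o;

    Triple(String s, String p, String o) {
      this.s = s;
      this.p = p;
      this.o = o;
    }
  }

  // Derived links from 'current': documents covering a concept that 'current'
  // requires and that the learner does not already know.
  static Set<String> derivedLinks(List<Triple> facts, String learner, String current) {
    Set<String> links = new LinkedHashSet<>();
    for (Triple req : facts) {
      if (!req.s.equals(current) || !req.p.equals("requires")) {
        continue;
      }
      if (holds(facts, learner, "knows", req.o)) {
        continue; // learner already knows this prerequisite
      }
      for (Triple cov : facts) {
        if (cov.p.equals("covers") && cov.o.equals(req.o) && !cov.s.equals(current)) {
          links.add(cov.s);
        }
      }
    }
    return links;
  }

  private static boolean holds(List<Triple> facts, String s, String p, String o) {
    for (Triple t : facts) {
      if (t.s.equals(s) && t.p.equals(p) && t.o.equals(o)) {
        return true;
      }
    }
    return false;
  }
}
```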

Friday, November 26, 2004

Free trade that isn't free

Patently yours "We quite understand that (the title) How to Kill a Country may sound alarmist...We use the parallel experience of Canada to buttress some of these points. Canada is now being described by leading author, Mel Hurtig, as a "Vanishing country"...By the mid 1980s, about half of the major US corporations in Canada were 100-percent American-owned. Ten years later, some 85 per cent had no Canadian shareholders...As Canadian shareholders were eliminated, corporate boards were substantially reduced in size and more American directors were added, as were more U.S. CEOs and board chairmen. As external directors were eliminated, there was no longer a force to influence policy decisions which would be beneficial to Canada. Gone too was the ability to scrutinise the payment of dividends, management fees, and content costs paid to the parent company."

"But the Australian negotiators overlooked the point that Australia is a net importer of IPRs...As a whole, Australian industry has everything to gain by moving away from the Microsoft stranglehold and towards an Open Source mode - rather like governments in Germany and Taiwan are currently doing in earnest...local firms would do well to shift towards the Open Source model, and utilise open source programs such as Linux..."

"...frequently the actions are entirely justified, and entirely in the spirit of competition - as when an importer of copyright-protected CDs seeks them out in a third market and imports them, entirely legally, at a lower cost than is stipulated by the IPR-holder. The FTA makes this action much more difficult - in the name of placing severe restrictions on parallel imports. Another name for this is placing restrictions on free trade in IPR-protected goods - all within a "free trade" agreement!"

Dangling Databases

Why Relational Databases And Semantics Don't Mix "Jarg Corporation, which takes its name from "jargon", is about the next evolution of search. Actually, it is about more than that since semantics has wider applicability than search, but we will stick with search as an easily understood example of this technology."

"So, the question is: how does it do that?

The first answer is that it doesn't - at least in any general sense - only where it has already built an ontology which, in this case, is within healthcare. Indeed, its first customer is a hospital medical library. However, industry knowledge bases are becoming widely available and Jarg reckons that about 75% of the work involved in creating an ontology can be automated, so extending its product for new customers should not be a big issue."

"The second answer is that it achieves this sort of performance by refusing to use a relational database. Instead, it has patented its own approach, which involves storing semantic fragments. A semantic fragment is either two elements and the relationship that joins them (for example, "Waterloo is a station") or it can store an element with a "dangling" relationship. This latter concept is especially important. The whole point about searching is that you want to be able to discover relationships that you didn't know existed."
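Jarg's storage is patented and not described beyond this, but the idea of a fragment that is either a complete relationship or an element with a "dangling" relationship can be sketched as a small data structure. Everything below (the class names, using `null` for the missing end, the lookup methods) is an assumption for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

class FragmentStore {
  // A fragment joins two elements, e.g. ("Waterloo", "is a", "station").
  // A dangling fragment has a relationship but no second element yet (other == null).
  static final class Fragment {
    final String element;
    final String relation;
    final String other;

    Fragment(String element, String relation, String other) {
      this.element = element;
      this.relation = relation;
      this.other = other;
    }

    boolean isDangling() {
      return other == null;
    }
  }

  private final List<Fragment> fragments = new ArrayList<>();

  void add(String element, String relation, String other) {
    fragments.add(new Fragment(element, relation, other));
  }

  // All fragments mentioning an element at either end, dangling ones included.
  List<Fragment> about(String element) {
    List<Fragment> hits = new ArrayList<>();
    for (Fragment f : fragments) {
      if (f.element.equals(element) || element.equals(f.other)) {
        hits.add(f);
      }
    }
    return hits;
  }

  // Relationships known to exist for an element whose other end is still unknown.
  List<String> openRelations(String element) {
    List<String> open = new ArrayList<>();
    for (Fragment f : fragments) {
      if (f.element.equals(element) && f.isDangling()) {
        open.add(f.relation);
      }
    }
    return open;
  }
}
```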

MEST Architecture

The MEST architectural style "We have both agreed that we shouldn’t call our architectural style ProcessMessage after all. Instead, we decided to call it MEST (MESsage Transfer) so as to recognise the big influence REST had in this work and our thinking in general. So, after all the blog entries and the discussions with the community, we have finally arrived to the MEST architectural style of which ProcessMessage is part."

WWW2005 Tutorial: Architecting and Developing Message-Oriented Web Services "Savas and I have been accepted to present a tutorial at WWW2005 in Chiba next year. We're going to be talking about message-orientation and the MEST architectural style. Our approach is going to be very interactive: We'll be doing head-to-head live coding and will have the audience involved right the way through.

Broadly speaking, we're going to introduce a simple problem domain (probably a simple game), get the audience to work through the domain with us, identifying the services and message exchanges involved, then we'll code up a solution. Once we've got a solution in place we'll break it in various interesting ways and show how various WS-* protocols can help prevent such breakages from occurring."
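As I read it, the core of the MEST style is that a service has a single logical operation, processing a message, and what it does depends on the message it receives rather than on which method was called. A minimal sketch of that shape, using a made-up number-guessing game in the spirit of the "simple game" mentioned above, with invented message types; it isn't the tutorial's code and leaves out the WS-* protocols entirely.

```java
import java.util.HashMap;
import java.util.Map;

class GuessingGameService {
  // Hypothetical message: a type name plus a body of simple key/value fields.
  static final class Message {
    final String type;
    final Map<String, String> body;

    Message(String type, Map<String, String> body) {
      this.type = type;
      this.body = body;
    }
  }

  private final Map<String, Integer> secrets = new HashMap<>();

  // The service's single entry point: inspect the message, then decide what to do.
  Message processMessage(Message in) {
    switch (in.type) {
      case "StartGame": {
        // The caller supplies the secret only to keep this sketch deterministic.
        String gameId = "game-" + (secrets.size() + 1);
        secrets.put(gameId, Integer.parseInt(in.body.get("secret")));
        return new Message("GameStarted", Map.of("gameId", gameId));
      }
      case "Guess": {
        Integer secret = secrets.get(in.body.get("gameId"));
        if (secret == null) {
          return new Message("Fault", Map.of("reason", "unknown game"));
        }
        int guess = Integer.parseInt(in.body.get("value"));
        // "higher" means the secret is higher than the guess.
        String result = guess == secret ? "correct" : guess < secret ? "higher" : "lower";
        return new Message("GuessResult", Map.of("result", result));
      }
      default:
        return new Message("Fault", Map.of("reason", "unsupported message " + in.type));
    }
  }
}
```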

RDF on Lambda

RDF and Databases "Some RDF research dropped me to a nice paper (PDF) from IBM discussing RDF with relational databases. This combination can replace half-baked application data mechanisms. These crop up regularly in my consulting work. Think nested directories of Windows INI files and brittle, binary files breaking on minor design iterations. The pain, the pain."

"There are several projects in this domain. My favorite so far is OpenRDF Sesame. It supports querying at the semantic level. It seems more mature than others, having derived from previous efforts, and works with both PostgreSQL and MySQL as well as Oracle. An abstraction layer called SAIL makes Sesame database-agnostic. Sesame even sports a stand-alone b-tree system, or in-memory operation, if you don't want an external database."
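The point of a layer like SAIL is that query code talks to an interface while the storage underneath (an RDBMS, a B-tree on disk, or memory) can be swapped. The sketch shows that shape with a hypothetical `TripleStore` interface and an in-memory implementation; it does not reproduce Sesame's actual SAIL interfaces, and a relational backend would typically map the same calls onto queries against a triples table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical storage abstraction (not Sesame's SAIL API); null means "any" in a pattern.
interface TripleStore {
  void add(String subject, String predicate, String object);

  List<String[]> match(String subject, String predicate, String object);
}

class MemoryTripleStore implements TripleStore {
  private final List<String[]> triples = new ArrayList<>();

  public void add(String s, String p, String o) {
    triples.add(new String[] {s, p, o});
  }

  public List<String[]> match(String s, String p, String o) {
    List<String[]> out = new ArrayList<>();
    for (String[] t : triples) {
      if ((s == null || s.equals(t[0]))
          && (p == null || p.equals(t[1]))
          && (o == null || o.equals(t[2]))) {
        out.add(t);
      }
    }
    return out;
  }
}

class StoreClient {
  // Client code depends only on the interface, so any backend can be plugged in.
  static List<String> objectsOf(TripleStore store, String subject, String predicate) {
    List<String> objects = new ArrayList<>();
    for (String[] t : store.match(subject, predicate, null)) {
      objects.add(t[2]);
    }
    return objects;
  }
}
```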

Thursday, November 25, 2004

One step closer to making a lighter Kowari

An initial port of Sesame 1.1's RIO RDF/XML parser to JRDF has been checked into JRDF's CVS repository. It's basically the same as the original, with a few modifications: changes to the constructor, the SAXFactory now explicitly asks for a namespace-aware reader, and there are fewer dependencies on other Sesame classes.

Kowari already uses JRDF for N3 and RDF/XML exporting. A requirement I was just asked about was producing RDF/XML from the result of an iTQL query. The client-side JRDF API allows a JRDF graph to be created from an iTQL answer, so it might be possible to plug this into the exporter classes.

Instructions on getting it are here.

A simple example of using it:


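import java.io.InputStream;
import java.net.URL;
import java.util.Iterator;

// JRDF classes. These package names assume the JRDF CVS layout of the time;
// adjust them if your checkout differs.
import org.jrdf.graph.Graph;
import org.jrdf.graph.ObjectNode;
import org.jrdf.graph.PredicateNode;
import org.jrdf.graph.SubjectNode;
import org.jrdf.graph.mem.GraphImpl;
import org.jrdf.parser.StatementHandler;
import org.jrdf.parser.rdfxml.RdfXmlParser;

public class RdfXmlParserExample {

  public static void main(String[] args) throws Exception {
    // The document to parse; also used as the base URI for relative references.
    String baseURI = "http://slashdot.org/index.rss";
    URL url = new URL(baseURI);
    InputStream is = url.openStream();
    try {
      // In-memory JRDF graph that will hold the parsed triples.
      final Graph jrdfMem = new GraphImpl();
      RdfXmlParser parser = new RdfXmlParser(jrdfMem);

      // Add every statement the parser reports to the graph.
      parser.setStatementHandler(new StatementHandler() {
        public void handleStatement(SubjectNode subject,
            PredicateNode predicate, ObjectNode object) {
          try {
            jrdfMem.add(subject, predicate, object);
          }
          catch (Exception e) {
            e.printStackTrace();
          }
        }
      });

      parser.parse(is, baseURI);

      // null in every position is used as a wildcard, so this visits every triple.
      Iterator iter = jrdfMem.find(null, null, null);
      while (iter.hasNext()) {
        System.out.println("Graph: " + iter.next());
      }
    }
    finally {
      // Close the stream even if parsing fails.
      is.close();
    }
  }
}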

While I'm mentioning Kowari, one of the resolvers to be included in 1.1 will generate statements based on the latitude and longitude of two points. We used the DAML Geofile linked from Semantic Web Application Integration: Travel Tools.
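I haven't said exactly which statements the resolver produces. Assuming one of them is the distance between the two points, the usual haversine formula for great-circle distance is a reasonable way to compute it. The class and method names below are hypothetical, the Earth is treated as a sphere with a mean radius of 6,371 km, and this is not the resolver's code.

```java
class GreatCircle {
  // Mean Earth radius in kilometres (a spherical approximation).
  static final double EARTH_RADIUS_KM = 6371.0;

  // Haversine great-circle distance in km between two points given in decimal degrees.
  static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
    double phi1 = Math.toRadians(lat1);
    double phi2 = Math.toRadians(lat2);
    double dPhi = Math.toRadians(lat2 - lat1);
    double dLambda = Math.toRadians(lon2 - lon1);
    double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
        + Math.cos(phi1) * Math.cos(phi2) * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
    return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
  }

  public static void main(String[] args) {
    // One degree of longitude along the equator.
    System.out.println(distanceKm(0, 0, 0, 1));
  }
}
```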

Tuesday, November 23, 2004

15 parts of Classification Theory

Another JOT link, this time the 15 parts on "The Theory of Classification". Here they are: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, and 15.

Creating Requirements

Generating Complete, Unambiguous, and Verifiable Requirements from Stories, Scenarios, and Use Cases "Although very valuable as requirements elicitation, analysis, and initial validation tools, stories, scenarios, and use case path specifications are typically inadequate for specifying requirements because they are incomplete, ambiguous, and therefore unverifiable. For example, they usually do not address preconditions and postconditions, which have a huge influence on the meaning of the requirements. Similarly, they do not tend to state the triggering events that cause them to be true. They also do not typically clarify the distinctions between requirements (i.e., what the system must do and what postconditions it must ensure) and ancillary information (e.g., triggering events produced by actors and preconditions that may or may not be ensured). This column has provided examples and guidance on how to transform stories, scenarios, and use case path specifications into complete, unambiguous, and verifiable textual requirements."
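One way to picture the transformation the column describes is a requirement record that forces each part to be stated explicitly: the triggering event, the preconditions, and the postconditions the system must ensure. The field names and the "When ..., if ..., the system shall ensure that ..." template are an illustrative assumption, not the column's exact format.

```java
import java.util.List;

class TextualRequirement {
  final String trigger;              // event produced by an actor
  final List<String> preconditions;  // must hold for the requirement to apply
  final List<String> postconditions; // what the system must ensure afterwards

  TextualRequirement(String trigger, List<String> preconditions, List<String> postconditions) {
    if (postconditions.isEmpty()) {
      // Without a postcondition there is nothing to verify.
      throw new IllegalArgumentException("a verifiable requirement needs at least one postcondition");
    }
    this.trigger = trigger;
    this.preconditions = List.copyOf(preconditions);
    this.postconditions = List.copyOf(postconditions);
  }

  // Renders the requirement as a single "When / if / shall ensure" sentence.
  String render() {
    StringBuilder sb = new StringBuilder("When ").append(trigger);
    if (!preconditions.isEmpty()) {
      sb.append(", if ").append(String.join(" and ", preconditions));
    }
    sb.append(", the system shall ensure that ")
        .append(String.join(" and ", postconditions))
        .append('.');
    return sb.toString();
  }
}
```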

Anti-metadata Google

Dear Mr. Bosworth "I can't believe that a smart guy like you could advocate Podcasting as a solution to the complexity of the semantic web and all that enterprise, corporate complexity - without a hint of meta-data - anywhere.

How's the work? Inference? Osmosis?

I know that Google is known as an anti-meta-data sort of place - but PLEASE oH LORd - get over that!

This is NOT about the religious wars of RSS 1.0 vs RSS 2.0. I couldn't give a dam about rdf, T B-L or any of that semantic web hooey.

I'd just like to see folks standardize on attributes, properties and meta-data around these new, burgeoning forms of micro-content."

Monday, November 22, 2004

Third SIGSEMIS

Volume 1, Issue 3 (PDF) "For the third time AIS SIGSEMIS bulletin is in your hands. Many interesting articles, a featured interview with Tom Gruber, our regular columns as well as several interesting announcements are waiting for your attention."

Interview with Tom Gruber: "In fact, the World Wide Web is based on a semiformal ontology, and it shows how ontological commitment works in software interoperability. At its core, the concept of the hyperlink is based on an ontological commitment to object identity. In order to hyperlink to an object requires that there be a stable notion of object and that its identity doesn’t depend on context (which page I am on now, or time, or who I am). Most of the machinery of the early Web standards are specifications of what can be an object with identity, and how to identify it independently of context. These standards documents serve as ontologies – specifications of the concepts that you need to commit to if you want to play fairly on the Web. If one built a system with these commitments, all of the web infrastructure works well."

"Intraspect was designed on the assumption that it is more valuable to get evidence of human knowledge into a collective memory than to add structure to existing online material. So we created technology that helped people work together on-line, and as a byproduct their work became available for discovery using information retrieval technology."

Other interesting articles: "Component Requirements for a Universal Semantic Web Framework", "Elements of a First Visual Rule Language for the Semantic Web" (about REWERSE), "Response Management in Multidimensional Web Information Systems", "The Semantic Web Trends in Brief", "Using e-business Registry / Repository for E-Health Semantics" and many others.

Your blog is boring and other links

* RSS, Blogs on a Roll, But How Extensive is Their Use? and a response.
* Swebok All you need to know about Software Engineering.
* Best Software Essays of 2004 via Danny.
* ICSOC04 Talk on how RSS 1.0 failed, with lots of other comments that seem to support a RESTful-style system such as RDF. With followups Fielding Bosworth and Quick Reactions.
* The Many Faces of J2EE, v5.0 Another annotations for J2EE 1.5 piece as well as Google and JBoss join SE/EE Java Council.
* Domain Specific Language and Domain Specific Modelling. Is UML really the best tool for the job? The end of UML?

Friday, November 19, 2004

Metadata does Matter

In a similar vein to I.T. does Matter, a recent column entitled Does Meta Data Matter? highlights the competitive advantage that metadata can give the enterprise.

"Is meta data strategic in nature? Yes. The reality is that information technology continues to get more complex. Our ability to manage these technologies and solutions requires a higher degree of knowledge and management skills. In the dynamic environment we see emerging, command and control style of management fails to deliver a competitive advantage. As a wise man once said, "All great things have been done in spite of management." Our ability to adapt within the technology community may be dependent on our ability to handle multiple tasks, objectives and strategies which can then change on a dime. Meta data plays a central role in your organization's ability to become an agile organization. Moving to common infrastructures, software platforms and even systems does not negate the competitive advantages that technology and meta data can bring."

Keeping abreast of the Semantic Web

Revolutionising breast cancer treatment through knowledge management "The system uses Semantic web technologies, enabling information from X-ray mammograms, MRI images, biopsy results and data from the clinician to be made available when the practitioners meet for their weekly Triple Assessment Procedure. Semantic web technologies allow information to be linked in such a way that it can be easily processed by machines. Practitioners can then view different types of images and scans, call up patient information, and automatically generate reports. It is also possible to investigate, annotate and analyse the data using web and Grid services.

‘This research draws on technologies in which the UK is a world leader,’ says Professor Nigel Shadbolt of the School of Electronics and Computer Science at the University of Southampton. ‘Eventually, e-health will be delivered using the web and incredibly powerful networks of computers. Medical practitioners will have the information and evidence at their fingertips to support decision-making that has a direct impact on us all.’"

More Working Notes

Andrae is outdoing both Paul and me with two blogs: Etymon, which has links to papers, interesting articles and the like, and Circumlocute, which is similar to Working notes.

When two worlds combine

"Gnowsis and Fenfire have met and another time, great Semantic Web developers (aka us) have proven that using ontologies, RDF and web protocols RULEZ. In just a day, two open source projects made substantial integration work."

An early screenshot of Fenfire and Gnowsis combining their efforts.

Google Scholar

Hits for the Semantic Web gives you that piece in Scientific American. Another interesting one is citations to "The Description Logic Handbook: Theory, Implementation and Applications".

Sesame 1.1

Sesame 1.1 released "Highlights of this release are:

* The Graph API, an extension of Sesame's access APIs, allows fine-grained manipulation of RDF models directly from Java.
* The Native Disk Store is a new storage backend that works directly on the file system, without need for a DBMS. It uses B-Tree indexing on binary files for fast, efficient and scalable storage.
* SeRQL revision 1.1 is a syntax revision that makes SeRQL queries even easier to read and write, and makes embedding in XML easier.
* Blank node handling has dramatically improved compared to 1.0.x.
* Lots of issues related to full Unicode support have been fixed.
* RDF Schema inferencing has been updated to be fully compliant with the W3C RDF Semantics Recommendation.
* Support for MS SQL Server as storage backend RDBMS. Thanks to Adam Skutt for providing fixes and suggestions for this.
* The Rio parser now supports the Turtle serialization format.
* Partial OWL reasoning support through Sesame's custom inferencer.
* Fully updated and extended User Documentation, including code examples for use of the Sesame APIs and a new Troubleshooting and FAQ chapter."

Thursday, November 18, 2004

What's new with Java 6

Sun invites outside involvement with Java 6 "The new version will be easier to manage, exposing information that outside management software can use to make control decisions, said Mark Reinhold, chief J2SE engineer. And it will be easier to find problems, with an "attach-on-demand" feature that can let debugging software graft onto software while it's running instead of just before it's launched.

Another item on the list is support for a basic set of Web services called WS-I, Hamilton said. That basic set, standardized through the Web Services Interoperability organization, had been scheduled for the Tiger release.

And Mustang will have better integration with graphical user interfaces, including Microsoft's upcoming Longhorn version of Windows, Reinhold said. "

Details here.
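Java 5 already exposes much of this runtime information through java.lang.management, and the article suggests Mustang builds on it. As a rough sketch of the kind of data outside management software can read today (this uses only the Java 5 platform MXBeans, not the new attach-on-demand feature, whose API the article doesn't describe):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

// Reads a few of the runtime statistics exposed by the platform MXBeans.
// The same beans can be published to external JMX tools such as jconsole.
public class RuntimeStats {

    /** Bytes of heap currently in use. */
    public static long heapUsed() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        return heap.getUsed();
    }

    /** Number of live threads, including daemon threads. */
    public static int liveThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.getThreadCount();
    }

    /** Milliseconds since the JVM started. */
    public static long uptimeMillis() {
        return ManagementFactory.getRuntimeMXBean().getUptime();
    }

    public static void main(String[] args) {
        System.out.println("heap used: " + heapUsed() + " bytes");
        System.out.println("live threads: " + liveThreads());
        System.out.println("uptime: " + uptimeMillis() + " ms");
    }
}
```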

Tuesday, November 16, 2004

The Problem with Ontologies

The Ontology Problem: A Definition with Commentary "If the Ontology Integration Problem is not solved it will not be possible to answer a semantic search query across the open Web for a question such as "find all software products that work with Linux and are open-source and are endorsed by people or companies I trust." Why not? Because while there could be tons of raw RDF and OWL instance data out on the Web that is relevant from various ontologies, unless it either all uses the same ontology or all the ontologies that various instances refer to are integrated, the query agent will have no way of making sense of or normalizing the results. Of course, the query agent could simply run the query on all data from all ontologies it knows about, and then just present the results in a single list, sorted by ontology -- but as we've seen above, different ontologies might mean different things by classes with the same names -- and thus the results returned may not really be relevant or well-ordered."

"I believe the solution will ultimately stem from a solution to the Upper Ontology Problem -- if we can solve that problem, then much of the Ontology Integration Problem will go away as most ontologies will automatically be inter-mapped at the Upper-Ontology Level at least. If we had a standard Upper Ontology and furthermore, if this standard were to include concepts for mapping between ontologies and expressing shades semantic mapping and structure between ontological definitions, then mapping would be even easier."

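A toy illustration of the integration problem: two hypothetical ontologies use different class names for "software product", and a mapping from each local class to a shared upper-level class lets one query cover both. All class URIs and instance names below are made up for the example, and real ontology mapping involves far more than renaming classes.

```java
import java.util.*;

public class OntologyMapping {

    /** A single "instance rdf:type class" assertion. */
    public static final class Typed {
        public final String instance;
        public final String classUri;
        public Typed(String instance, String classUri) {
            this.instance = instance;
            this.classUri = classUri;
        }
    }

    // Hypothetical mappings from local ontology classes to upper-ontology classes.
    private final Map<String, String> toUpper = new HashMap<>();

    public void map(String localClass, String upperClass) {
        toUpper.put(localClass, upperClass);
    }

    /** Instances whose class is, or maps to, the requested upper-ontology class. */
    public List<String> instancesOf(String upperClass, List<Typed> data) {
        List<String> result = new ArrayList<>();
        for (Typed t : data) {
            // Unmapped classes are compared as-is.
            String normalized = toUpper.getOrDefault(t.classUri, t.classUri);
            if (normalized.equals(upperClass)) {
                result.add(t.instance);
            }
        }
        return result;
    }
}
```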
Metadata Driven Component Model

BEA Announces Open-Source Milestones for Apache Beehive "BEA Systems, Inc. (Nasdaq: BEAS - News), a world leader in enterprise infrastructure software, today announced significant milestones in the company's open-source efforts including code release milestones, updated tools and additional platform support for Apache Beehive -- the lightweight metadata-driven component model which is designed to help accelerate the development of service-oriented architectures (SOAs)."

"The M1 code release for the Apache Beehive project is now publicly available for use in both Beehive open-source development and by BEA WebLogic Workshop 8.1 users, and can help developers to begin developing and collaborating on SOA-based applications."

The Apache Beehive Project "This is the project working on making J2EE easier by building a simple object model on J2EE and Struts. The goal is to take the new JSR-175 metadata annotations and use them to reduce the coding necessary for J2EE. The initial Beehive project has three pieces."

Related Pollinate Project "Pollinate is an Eclipse technology project slated to build an Eclipse-based IDE and toolset that leverages the open source Apache Beehive application framework." The PDF slides have some more information on NetUI Page Flows.

Via BEA announces Apache Beehive Milestone

Monday, November 15, 2004

HAVING

One of the features we recently added to iTQL was HAVING. This is practically identical to SQL's use of HAVING. For example:

SELECT $foo COUNT (SELECT $bar
                   FROM ...
                   WHERE $bar <-> <->)
FROM ...
WHERE $foo <-> <->
HAVING $k0 <tucana:occurs> '1.0'^^<xsd:double> ;
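Conceptually, a query like this groups solutions by $foo, counts the rows the subquery returns for each one, and only then applies the HAVING constraint to that count. A minimal Java sketch of those semantics over plain (foo, bar) pairs; the method name and data are hypothetical and this is not how Kowari actually evaluates iTQL:

```java
import java.util.*;

public class HavingSketch {

    /**
     * Groups (foo, bar) solutions by foo, counts the bars per foo, and keeps
     * only the groups whose count equals the HAVING value.
     */
    public static Map<String, Long> countHaving(List<String[]> solutions, long wanted) {
        // GROUP BY $foo with COUNT; the TreeMap keeps output sorted by $foo.
        Map<String, Long> counts = new TreeMap<>();
        for (String[] row : solutions) {
            counts.merge(row[0], 1L, Long::sum);
        }
        // HAVING: a filter on the aggregated value, applied after grouping.
        counts.values().removeIf(count -> count != wanted);
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> solutions = Arrays.asList(
            new String[] {"alice", "dog"},
            new String[] {"bob", "cat"},
            new String[] {"bob", "fish"});
        System.out.println(countHaving(solutions, 1)); // {alice=1}
    }
}
```

Using a long for the count, rather than the xsd:double in the query, also reflects that a count is naturally a non-negative integer.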

There are a few things that bother me about iTQL's HAVING. The first is the implicit column names. Every aggregate function in iTQL is implicitly given the variable $kn, where n is an integer. The user should be able to set the variable name, with something like "SELECT $no_people=COUNT ...", or iTQL could copy SQL-92's use of AS.

Another problem comes from copying SQL. Put a constraint that belongs in the HAVING clause into the WHERE clause and an SQL interpreter will produce an error explaining that the constraint must go in the HAVING clause. What this really means is that the constraint could have been written in the WHERE clause and extracted automatically when necessary.

Also, the count should really be typed as xsd:nonNegativeInteger rather than xsd:double.

A good summary of these first two points is given in the presentation "The Importance of Column Names". It is part of the web site for the new version of "The Third Manifesto".

Here's an example:

SELECT D#, COUNT(*)
FROM EMP
GROUP BY D#
HAVING COUNT(*) >= 50

In SQL:1992 this is equivalent to:

SELECT *
FROM ( SELECT D#, COUNT(*) AS NUMBER_OF_EMPS
       FROM EMP
       GROUP BY D# ) AS TEETH_GNASHER
WHERE NUMBER_OF_EMPS >= 50

The Tutorial D version:

( SUMMARIZE EMP BY { D# } ADD COUNT ( ) AS NUMBER_OF_EMPS )
  WHERE NUMBER_OF_EMPS >= 50
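The same shape in Java: first summarize into rows that carry a named count column, then restrict those rows with an ordinary filter, so no special HAVING construct is needed. The Emp and Summary classes and the sample data are hypothetical.

```java
import java.util.*;
import java.util.stream.Collectors;

public class SummarizeSketch {

    public static final class Emp {
        public final String name;
        public final String dept;
        public Emp(String name, String dept) { this.name = name; this.dept = dept; }
    }

    /** One row of SUMMARIZE EMP BY { D# } ADD COUNT ( ) AS NUMBER_OF_EMPS. */
    public static final class Summary {
        public final String dept;
        public final long numberOfEmps;
        public Summary(String dept, long numberOfEmps) {
            this.dept = dept;
            this.numberOfEmps = numberOfEmps;
        }
    }

    /** Groups employees by department and names the count column explicitly. */
    public static List<Summary> summarize(List<Emp> emps) {
        Map<String, Long> byDept = emps.stream()
            .collect(Collectors.groupingBy(e -> e.dept, TreeMap::new, Collectors.counting()));
        List<Summary> rows = new ArrayList<>();
        byDept.forEach((dept, n) -> rows.add(new Summary(dept, n)));
        return rows;
    }

    /** The restriction refers to the named column, like the Tutorial D WHERE. */
    public static List<String> deptsWithAtLeast(List<Emp> emps, long min) {
        return summarize(emps).stream()
            .filter(s -> s.numberOfEmps >= min)
            .map(s -> s.dept)
            .collect(Collectors.toList());
    }
}
```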

Closing the World

'Closed world' assumptions in RDF "Could we not use the XML declaration attribute standalone to determine that a document is self-contained? As in:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

That assumption would possibly eliminate the usefulness of that document in a Semantic Web, but could potentially make the 'open world' issue controllable."

RDF/XML is just one serialization of RDF, so changing the XML declaration doesn't change the meaning of the RDF contained within. While I don't think you can make RDF closed world, ontologies do let you state that "no pigs fly", or you can run a query such as "do any pigs in this graph fly?", which may be what you wanted anyway.
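To make the distinction concrete: a query only ever answers about the triples it is given. In the hypothetical sketch below, "do any pigs in this graph fly?" is a question about one graph, and a false answer means only that no such statement is present, not that pigs can't fly. The predicate and resource names are made up for the example.

```java
import java.util.*;

public class GraphQuery {

    public static final class Triple {
        public final String s, p, o;
        public Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    /** True if some subject typed as a pig is also stated to fly in this graph. */
    public static boolean anyPigFliesInGraph(List<Triple> graph) {
        Set<String> pigs = new HashSet<>();
        for (Triple t : graph) {
            if (t.p.equals("rdf:type") && t.o.equals("ex:Pig")) {
                pigs.add(t.s);
            }
        }
        for (Triple t : graph) {
            if (pigs.contains(t.s) && t.p.equals("ex:ability") && t.o.equals("ex:Flight")) {
                return true;
            }
        }
        // false only means "not stated in this graph"; under the open world
        // assumption some other graph could still say otherwise.
        return false;
    }
}
```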

Manufacturing Semantics

A question of semantics "“I believe the big trend right now in terms of integration of business systems is the movement toward the semantic Web,” says Steven Ray, division chief for the Manufacturing Systems Integration Division (MSID), a division within the National Institute of Standards and Technology (NIST). “The semantic web gives meaning to information—it makes that information formal and acceptable to a computerized system to allow truly intelligent searching.”
According to Ray, the semantic Web is more than a structured database. Today, companies rely on programmers to give meaning to information. For example, if an enterprise resource planning (ERP) company wants to work closely with suppliers to monitor inventory levels, a programmer must complete integration for each new supplier. Over time, this can become quite costly."

"“These standards are the primary building blocks to intelligent and integrated manufacturing,” says Pat Snack, on executive loan from General Motors for AIAG. “We will see a growing need for semantic standards in coming months. The closer we get to end-to-end integrated manufacturing, the more we need to minimize our cost to complexity. It is really a delicate balancing act.”"

Sunday, November 14, 2004

Shortcomings of Spotlight

Tiger's Spotlight - Simplicity with Room for Improvement "While I'm not doubting the success of Apple marketing this as a world-class search tool, from what I know today, there are some shortcomings:
* Only one Importer registrable per File type (.extension) system-wide.
* No daisy-chaining of Importers.
* Very limited amount of data to be stored in dictionary per file.
* No built-in capability to index the content of compressed files like jars, zip, tar etc."

This follows a recent article, Apple details plans to Spotlight desktop search. If you want to try something similar, there's also Quicksilver, which runs on 10.3 and comes with a Spotlight plugin called Flashlight.
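On the last shortcoming, the contents of jars and zips are easy enough to reach from Java with java.util.zip. A minimal sketch of what an importer for these files might do, listing entry names so they could be indexed; how the names would be handed to Spotlight is not shown and would depend on Spotlight's own importer interface.

```java
import java.io.*;
import java.util.*;
import java.util.zip.*;

public class ZipEntryLister {

    /** Returns the names of all entries in a zip or jar stream, in order. */
    public static List<String> entryNames(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
                zip.closeEntry();
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ZipEntryLister some.jar
        try (InputStream in = new FileInputStream(args[0])) {
            for (String name : entryNames(in)) {
                System.out.println(name);
            }
        }
    }
}
```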

Annotations and EJB 3.0

EJB 3.0 Preview "The EJB 3.0 specification uses annotations so that you can declare your EJB metadata directly within the bean class.

import javax.ejb.*;

@Stateful
public class ShoppingCartBean implements ShoppingCart
{
    @Tx(TxType.REQUIRED) @MethodPermission({"customer"})
    public void purchase(Product product, int quantity) {...}

    @Remove void emptyCart() {...}
}

The @Stateful annotation marks the ShoppingCartBean as a stateful session bean. @Tx denotes transaction demarcation, while @MethodPermission defines role-based security for the bean method. EJB 3.0 provides annotations for every type of metadata so that no XML descriptor is needed and you can deploy your beans simply by deploying a plain old JAR into your application server."
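How does a container use this metadata? Roughly, it reads runtime-retained annotations reflectively when the bean is deployed. The sketch below defines its own stand-in annotations (StatefulBean and RemoveMethod are hypothetical, not the javax.ejb types) and finds the methods to call when a bean is removed:

```java
import java.lang.annotation.*;
import java.lang.reflect.Method;

public class AnnotationScan {

    // Hypothetical stand-ins for container annotations. RUNTIME retention
    // is what makes them visible to reflection.
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
    public @interface StatefulBean {}

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
    public @interface RemoveMethod {}

    @StatefulBean
    public static class Cart {
        public boolean emptied;
        public void purchase(String product, int quantity) { /* business logic */ }
        @RemoveMethod public void emptyCart() { emptied = true; }
    }

    public static boolean isStateful(Class<?> beanClass) {
        return beanClass.isAnnotationPresent(StatefulBean.class);
    }

    /** Invokes every public @RemoveMethod on the bean, as a container might on removal. */
    public static int remove(Object bean) throws Exception {
        int called = 0;
        for (Method m : bean.getClass().getMethods()) {
            if (m.isAnnotationPresent(RemoveMethod.class)) {
                m.invoke(bean);
                called++;
            }
        }
        return called;
    }
}
```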

Friday, November 12, 2004

Never trust a company with a fish tank

EA: The Human Story "...all along the way there were deceptions, there were promises, there were assurances -- there was a big fancy office building with an expensive fish tank -- all of which in the end look like an elaborate scheme to keep a crop of employees on the project just long enough to get it shipped. And then if they need to, they hire in a new batch, fresh and ready to hear more promises that will not be kept; EA's turnover rate in engineering is approximately 50%. This is how EA works. So now we know, now we can move on, right? That seems to be what happens to everyone else. But it's not enough. Because in the end, regardless of what happens with our particular situation, this kind of "business" isn't right, and people need to know about it, which is why I write this today."

That goes for virtual ones too.

The Floggings Will Continue Until Morale Improves "The list could be much longer, but it boils down to:
* Give them the tools to do their job efficiently
* Remove potential interruptions or distractions
* Make sure they’re motivated"

Always the last to know

NetKernel + Kowari "NetKernel has a great pipes/filters framework for composing services. Kowari is the only non-rdbms backed RDF triple-store with support for queries against datetime data types.

Sadly, building a web application or service in Kowari kinda sucks. But, theoretically, Kowari is embeddable. So I set out to verify this assertion and to gain more knowledge about the inner workings of NetKernel."

"Fourth, NetKernel is really just that, a kernel for managing the interaction and scheduling between components executing in the vm, called modules. But I haven't found support for explicit life-cycle management. That is, there is no init(), start() or stop() type methods on a module. It seems to rely on finalizers to clean up resources. Thus, shutting down a NetKernel instance corrupts the Kowari database."

As I said in the comments, previously committed data will be there whether you kill the process, turn off the power or whatever. However, if you stop it during a load, or while autocommit is off, the uncommitted data won't be there when you restart.
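The explicit life-cycle management the quoted post asks for might look something like the following: a hypothetical Module interface with start() and stop(), and a JVM shutdown hook that stops modules in reverse start order instead of relying on finalizers. None of these names are NetKernel or Kowari APIs.

```java
import java.util.*;

public class ModuleLifecycle {

    /** Hypothetical life-cycle contract for a hosted component. */
    public interface Module {
        void start() throws Exception;
        void stop() throws Exception;
    }

    private final Deque<Module> started = new ArrayDeque<>();

    /** Starts a module and remembers it so that it is stopped later. */
    public synchronized void start(Module m) throws Exception {
        m.start();
        started.push(m);
    }

    /** Stops modules in reverse start order, e.g. closing a store last. */
    public synchronized void stopAll() {
        while (!started.isEmpty()) {
            Module m = started.pop();
            try {
                m.stop();
            } catch (Exception e) {
                // Keep going so one failing module doesn't block the others.
                System.err.println("stop failed: " + e);
            }
        }
    }

    /** Registers stopAll() to run when the JVM shuts down normally. */
    public void installShutdownHook() {
        Runtime.getRuntime().addShutdownHook(new Thread(this::stopAll));
    }
}
```

A shutdown hook runs on normal exit and on signals such as Ctrl-C, but not on kill -9 or a power failure, so durability of committed data still has to come from the store itself.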

New from Google

Coincidentally, Google's Index Size Jumps and Search Engine Size Wars V Erupts "On the eve of Microsoft's long anticipated launch of MSN Search, Google is reporting on its home page that its index size has nearly doubled. Google now claims that it is now "Searching 8,058,044,651 web pages." Earlier today, a search for the word "the" returned nearly 11 billion results, a far larger number than officially reported on the home page. No matter which numbers you believe, it's a significant expansion of Google's web database."

"Microsoft had planned to seize the title of biggest search engine by announcing 5 billion pages indexed today."

And the people at Google "abusing" their power Tweaking the tiger's tail. It's still there. A search for kowari only brings up references to the RDF triple store and not the animal. Google brings up Kowari the animal as the second result.

Thursday, November 11, 2004

Too many or not enough

The Ontological Challenge "There are several big missing pieces right now in making the semantic web. Certainly the lack of ontologies is a major issue. There are, I guess Deborah would say thousands of ontologies. So there maybe isn't a lack; there may be too many from one perspective. When you start looking at these ontologies, what you find is that some of them are overly specialized; maybe they are focused, for example, on particular niches of interest to DARPA, not particularly of great use to consumers unless you live in New York (with the paranoia that we all experience there)."

"Currently, there is no good human-readable mid-level ontology that's covering common-sense concepts. Cycorp has probably the most impressive ontology. The only problem is it's so big and complex and requires such a high, steep learning curve to actually do anything with it that it's not really targeted at the needs of normal developers and regular end users. The lack of the good, open ontology that covers common-sense concepts is a big problem. That's something we're working on, too. I think that ultimately there ought to be at least something like that that comes out of the W3C or is handed to the W3C at some point to at least provide a basis for describing certain types of entities and relationships that we all have to use in our applications."

"So associating data with ontologies is a problem. Building ontologies, I come from the school of thought of top down. I've never seen a bottom-up ontology that I liked. There aren't many. Having built much of ontologies, I think that the amount of thinking that goes into it is just so intensive that to do it well, I just don't think that, at least without great AI, we'll be able to do it anytime in the next couple of decades."

Wednesday, November 10, 2004

Welkin

Welkin: A General-Purpose RDF Browser "Many consider the Semantic Web to be vaporware and others believe it's the next big thing. No matter where you stand, a question always pops up: Where is the RDF browser? The SIMILE Project, a joint project between W3C, MIT and HP to implement semantic interoperability of metadata in digital libraries, released today the first beta release of a general purpose graphic and interactive RDF browser named Welkin (see a screenshot), targetted to those who need to get a mental model of any RDF dataset, from a single RSS 1.0 news feed to a collection of digital data."

Welkin Homepage.

Creative RDF in Queensland

Creative Commons taking shape in Australia "The Australian branch of the Creative Commons is taking shape with the Queensland University of Technology being the lead agency, according to Professor Brian Fitzgerald, head of the university's law school.

In February this year, QUT became the Australian institutional affiliate for the project and over the last few months it has worked closely with the legal firm Blake Dawson Waldron to set up the platform for the project in Australia.

The University is holding a conference in January next year on Open content licensing, and has invited Stanford University Law Professor Lawrence Lessig, one of the directors of the Creative Commons, as its keynote speaker."

6 months with Java 5

"I have been writing JDK 5.0 code for over six months now, so I thought I would take some time to reflect on my experience and draw a few conclusions on the features that were introduced."

Surprisingly, the favourite is the enhanced for loop:
"The undisputed winner. I can't even begin to describe how good it feels to use the new for loop everywhere (well, almost everywhere). I mentally cringe the few times when I am forced to use the old for loop, typically when I need the index or that I want the Iterator to be visible outside the loop."

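For anyone who hasn't made the switch yet, the difference looks roughly like this; the last method shows the case mentioned in the quote, where the index is needed and the old loop is still the natural choice. The method names are just for illustration.

```java
import java.util.*;

public class ForLoops {

    // Old style: an explicit Iterator variable.
    public static int totalLengthOld(List<String> words) {
        int total = 0;
        for (Iterator<String> it = words.iterator(); it.hasNext(); ) {
            total += it.next().length();
        }
        return total;
    }

    // Java 5 enhanced for loop: the Iterator is hidden.
    public static int totalLength(List<String> words) {
        int total = 0;
        for (String w : words) {
            total += w.length();
        }
        return total;
    }

    // When the index matters, the classic loop is still needed.
    public static int indexOfLongest(List<String> words) {
        int best = -1;
        for (int i = 0; i < words.size(); i++) {
            if (best < 0 || words.get(i).length() > words.get(best).length()) {
                best = i;
            }
        }
        return best;
    }
}
```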
Unsurprisingly, annotations get a rave:
"Obviously, I am partial to annotations since they are at the heart of TestNG but I am a firm believer that annotations are going to change the way we build software in Java. We have been relying for far too long on reflection hacks to introduce meta-data in our programs, and annotations are finally going to provide an excellent solution to this problem.

Also, I haven't felt the need to use some of the predefined annotations such as @Override, so I haven't formed an opinion on them yet.

It seems inescapable to me that in a couple of years, most of the Java code that we will be reading and writing will contain annotations. "

And generics:
"In a nutshell, I have this to say about Java generics: my code feels more robust, but it's harder to read."

JDK 5 in Practice.
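A small example of the generics trade-off: the bounded type parameter below lets the compiler reject a list of items that can't be compared, which is the robustness, but the signature is noticeably harder to read than a raw List version would be. The method is hypothetical.

```java
import java.util.*;

public class GenericsTradeOff {

    /** Returns the largest element of a non-empty list of mutually comparable items. */
    public static <T extends Comparable<? super T>> T max(List<? extends T> items) {
        if (items.isEmpty()) {
            throw new NoSuchElementException("empty list");
        }
        T best = items.get(0);
        for (T item : items) {
            if (item.compareTo(best) > 0) {
                best = item;
            }
        }
        return best;
    }
}
```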

There's also an interview with one of the authors of Hibernate:
"Well, we are a bit stuck. We can't use many of the new features, because Hibernate needs to stay source-level compatible with older JDKs. The annotations stuff is okay, because we can provide it as an add-on package.

Certainly, annotations are the most significant new feature of Java 5, and it's very likely that they will completely change the way we write code."

The problem with SOAP

Web Services - The SCRAM Generation "SOAP - the bedrock of our industry's plans for universal interoperability. The cornerstone of enterprise mission-critical integration."

"SOAP - a communications protocol that does not guarantee delivery of messages, does not guarantee what order those messages will be delivered in, and does not guarantee not to throw in extra messages, just for the fun of it. Now to be fair, SOAP is only half of the problem. The other half is the communications channel itself, which is typically HTTP."

"SCRAM stands for Secure, Coordinated, Reliable Asynchronous Messaging"

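The missing guarantees are the usual job of a reliable messaging layer. A minimal sketch of the receiving side, assuming each message carries a sender-assigned sequence number (the class and method names are hypothetical, not from any WS-* specification): it drops duplicates, holds back messages that arrive early, and releases them in order. Guaranteed delivery would additionally need acknowledgements and retransmission from the sender, which are not shown.

```java
import java.util.*;

public class InOrderReceiver {

    private long nextExpected = 1;                      // sequence numbers start at 1
    private final SortedMap<Long, String> pending = new TreeMap<>();

    /**
     * Accepts a message that may be duplicated or out of order and returns
     * the messages that can now be delivered, in sequence order.
     */
    public List<String> receive(long seq, String body) {
        List<String> deliverable = new ArrayList<>();
        if (seq < nextExpected || pending.containsKey(seq)) {
            return deliverable;                         // duplicate: drop it
        }
        pending.put(seq, body);
        // Release the contiguous run starting at nextExpected.
        while (pending.containsKey(nextExpected)) {
            deliverable.add(pending.remove(nextExpected));
            nextExpected++;
        }
        return deliverable;
    }
}
```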
SPARQL Protocol

SPARQL Protocol for RDF "This document describes SPARQL, a protocol for accessing RDF data and for conveying RDF queries from query clients to query processors. The SPARQL Protocol has been designed for compatibility with the SPARQL Query Language for RDF but is designed to convey queries from other RDF query languages as well."

"The SPARQL abstract protocol has three parts: types, operations, and responses."

"The abstract protocol uses the following types, which fall into three categories: W3C XML Schema types, which are relevant to all abstract protocol operations and borrowed from W3C XML Schema Datatypes; protocol types, which are relevant to abstract protocol operations; and query types, which are relevant to query operations."

"The SPARQL Abstract Protocol defines three kinds of response: success, fault, and informational. Success responses indicate that the protocol operation was successfully executed; fault responses indicate that the protocol operation was not successfully executed; informational responses either provide additional information about an abstract protocol operation or describe specific conditions relevant to particular abstract protocol operations."

"The SPARQL Abstract Protocol is made up of seven orthogonal operations: query, getGraph, getOptions, makeGraph, dropGraph, addTriples, and deleteTriples."

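Read as Java, the abstract protocol amounts to a fixed set of operations plus a response that is always one of three kinds. The sketch below only illustrates that structure: the type names and the toy read-only handler, which answers queries and faults on everything else, are hypothetical, and the draft defines its own types and bindings.

```java
public class AbstractProtocol {

    /** The seven operations listed in the draft. */
    public enum Operation {
        QUERY, GET_GRAPH, GET_OPTIONS, MAKE_GRAPH, DROP_GRAPH, ADD_TRIPLES, DELETE_TRIPLES
    }

    /** The three kinds of response. */
    public enum Kind { SUCCESS, FAULT, INFORMATIONAL }

    public static final class Response {
        public final Kind kind;
        public final String message;
        public Response(Kind kind, String message) { this.kind = kind; this.message = message; }
    }

    /** A toy read-only service: it answers queries and faults on everything else. */
    public static Response handle(Operation op, String payload) {
        switch (op) {
            case QUERY:
                if (payload == null || payload.trim().isEmpty()) {
                    return new Response(Kind.FAULT, "malformed query");
                }
                return new Response(Kind.SUCCESS, "results for: " + payload);
            default:
                return new Response(Kind.FAULT, op + " not supported by a read-only service");
        }
    }
}
```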
Monday, November 08, 2004

The time for change

J2EE5, EJB3 and microcontainers "Today, java has support for annotation since JDK5.0. We are already using it to transform the way we think about, design, implement and indeed standardize middleware in the next generations of J2EE. Where J2EE has a big edge over .NET is first in a large installed base but most importantly in the quality of the services."

"We have been doing the right thing in the spec committees, namely simplifying the programming models to support POJO and annotations at the specification level by leveraging JDK5.0. Across the board in EJB3 for example, we have completely revamped and simplified the way developers interface with and program to middleware. Instead of complex API's and tons of XML, developers can tag their objects with annotations. Developers have already adopted this approach."

"EJB3 will have a long life, as it is the first to introduce this long awaited microcontainer, lightweight programming view of the world. EJB services can now be used on J2SE with these microcontainers."

"All of this is already used in production, today. Many vendors are on the market, many packages await branding standardization for further penetration of the market. J2EE will remain strong."

No-Nonsense Semantic Web Part II

A No-nonsense Guide to Semantic Web Specs for XML People "I was going to talk about RDQL in this article, but on Oct. 12, the Data Access Working Group (DAWG, pronounced so that it rhymes with 'dog') released the first working draft of Sparql, the query language for RDF."

"So, as a result, the query will return the title and the price of items where the price is less than 30.

Big deal, I hear you saying. I can do that today in SQL.

True, you can. If your data is local and you control it. But what if you want a software agent to do the queries for you? How are you going to find out across different databases how to adapt your query to their own internal logic, to their tables and to the way they modeled the information in their relational model?"
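What a query engine does with a query like the one described (titles and prices of items priced under 30) is match each triple pattern against the data, join the matches on the shared item variable, and then apply the filter. A minimal in-memory sketch, assuming made-up predicates dc:title and ex:price, plain string triples, and at most one title per item; a real store would use indexes rather than scanning:

```java
import java.util.*;

public class PatternJoin {

    public static final class Triple {
        public final String s, p, o;
        public Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    /** Titles and prices of items whose price is below the limit. */
    public static Map<String, Double> cheapTitles(List<Triple> data, double limit) {
        // Pattern 1: ?x dc:title ?title (assumes one title per item)
        Map<String, String> titleOf = new HashMap<>();
        for (Triple t : data) {
            if (t.p.equals("dc:title")) titleOf.put(t.s, t.o);
        }
        // Pattern 2: ?x ex:price ?price, joined with pattern 1 on ?x, then the filter
        Map<String, Double> result = new TreeMap<>();
        for (Triple t : data) {
            if (t.p.equals("ex:price") && titleOf.containsKey(t.s)) {
                double price = Double.parseDouble(t.o);
                if (price < limit) result.put(titleOf.get(t.s), price);
            }
        }
        return result;
    }
}
```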

"So, in short: should you care about RDF? For now, you are safe if you care about keeping your own data valid and coherent. The semantic web is trying hard to unlock the chicken-egg problem of "no killer app until data, no data until killer app" and automatic trasnformation of existing data into RDF is what I think is going to unlock it. Also, the fact that we are building tools that you can now use to operate on your RDF data, for example to browse and search it, will show you what you can gain by making those relationships explicit."

Thursday, November 04, 2004

Parallel Countries

The similarities between the Australian and American political situations were recently highlighted by "Bush & Howard: parallel lives".

"As John Howard ponders the likely return of George Bush, he will no doubt be struck by the parallels between the President's probable re-election and his own emphatic victory last month."

Similarities:
* Both control the upper and lower houses, so they can pass laws as long as they maintain party lines.
* Running trade deficits - Australian Trade Deficit Widens as Imports Increase
* Banning abortion. Sarah Maddison: Minister mistakes opinion for fact.
* Increasing surveillance powers. Ruddock relaunches terror laws.
* Tax cuts for the higher tax brackets. Howard flags new tax cuts
* Reducing gay rights. THE COALITION PLANS TO REINTRODUCE LEGISLATION BANNING SAME-SEX COUPLES FROM OVERSEAS ADOPTION NOW IT HAS CONTROL OF THE SENATE.
* Environment. We won't Sign Kyoto Treaty, Says Australia