More News: 12/01/2004

Friday, December 24, 2004

Happy Festivus

This should really annoy the people trying to keep the true meaning of Christmas.

'Seinfeld' Festivus display vies with Nativity

"When a Florida church group put a Nativity scene on public property, officials warned it might open the door to other religious -- and not-so-religious -- displays. They were right.

Since the Nativity was erected in Polk County, displays have gone up honoring Zoroastrianism and the fake holiday Festivus, featured on the TV sitcom "Seinfeld.""

It's real if it has a Wikipedia entry: http://en.wikipedia.org/wiki/Festivus. Oh, and a NY Times article, "Fooey to the World: Festivus Is Come". Danny has a very Britney Christmas (still the number one search term).

Thursday, December 23, 2004

Visual Browser

"Visual Browser is a Java application that can visualise the data in RDF scheme. The main principle of the visualisation is that:

* the triple (resource, resource, resource) is represented by two nodes connected by an edge
* the triple (resource, resource, literal) is represented by a hint (small window appearing on mouse over the subject node)

Visual Browser uses the Jena framework to obtain the data, since the RDF scheme can be saved in different forms (a single XML file or a relational database).

The visualisation engine is derived from TouchGraph LLC."

Only for Java 1.5.
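
The mapping rule quoted above is easy to sketch. The following is only an illustration of the idea, not Visual Browser's code: the class name, the `ex:` identifiers and the sample triples are all made up, and it uses plain collections rather than Jena or TouchGraph (and lambdas, so it needs a newer Java than 1.5).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the triple-to-graph mapping; not Visual Browser's actual code.
class TripleGraphSketch {

    // A triple whose object is either a resource or a literal.
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;
        final boolean objectIsLiteral;

        Triple(String subject, String predicate, String object, boolean objectIsLiteral) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
            this.objectIsLiteral = objectIsLiteral;
        }
    }

    final Set<String> nodes = new LinkedHashSet<>();
    final List<String[]> edges = new ArrayList<>();                 // {from, label, to}
    final Map<String, List<String>> hints = new LinkedHashMap<>();  // node -> hint lines

    void add(Triple t) {
        nodes.add(t.subject);
        if (t.objectIsLiteral) {
            // (resource, resource, literal): becomes a hint on the subject node.
            hints.computeIfAbsent(t.subject, k -> new ArrayList<>())
                 .add(t.predicate + " = " + t.object);
        } else {
            // (resource, resource, resource): two nodes connected by a labelled edge.
            nodes.add(t.object);
            edges.add(new String[] {t.subject, t.predicate, t.object});
        }
    }

    // Made-up sample data, for illustration only.
    static TripleGraphSketch demo() {
        TripleGraphSketch g = new TripleGraphSketch();
        g.add(new Triple("ex:kowari", "ex:writtenIn", "ex:java", false));
        g.add(new Triple("ex:kowari", "ex:name", "Kowari", true));
        return g;
    }

    public static void main(String[] args) {
        TripleGraphSketch g = demo();
        System.out.println("nodes: " + g.nodes);
        System.out.println("edges: " + g.edges.size());
        System.out.println("hints: " + g.hints);
    }
}
```

A real viewer would hand the nodes and edges to a layout engine and show the hints on mouse-over; here they are just collected.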

Wednesday, December 22, 2004

Pre-release 2 Available

See the Kowari project page. It includes a sample application, an example AudioManager, which uses resolvers to extract metadata from MP3s.

Gridbag

IBM Reflexive User Interface Builder "The IBM Reflexive User Interface Builder is an application that constructs and renders graphical user interfaces (GUIs) for Java Swing and Eclipse Standard Widget Toolkit (SWT) based upon a descriptive XML document. (Java Swing is a rich GUI toolkit included with Java that provides operating system-independent GUI components. Eclipse SWT is an add-on GUI toolkit that takes advantage of host operating system GUI components for maximum host integration.) IBM Reflexive User Interface Builder is both a specification for a mark-up language in which to describe GUIs and an engine for creating (and, if desired, rendering) them. This application can be used as a stand-alone application for testing and evaluating basic GUI layout and functionality, or it can be used as a library within the context of a Java application for creating and rendering GUIs for that application."

So you never have to go Totally Gridbag (via Tom).

Flink

"Flink is a visualization of the social networks of the Semantic Web
community. Information about researchers and their relationships is
extracted from the Web, FOAF profiles, emails and publications. Flink
itself uses Semantic Web technology to represent, store and reason with
metadata. For more information, please see the About section of the
website."

Results of a Query

SPARQL Variable Binding Results XML Format "This document describes an XML format for the variable binding results format provided by the SPARQL query language for RDF, developed by the W3C RDF Data Access Working Group (DAWG), part of the Semantic Web Activity as described in the activity statement"

Tuesday, December 21, 2004

Just Ask for Help

Came across Yet Another RDF Store: Perfect Index Structures for Storing Semantic Web Data With Contexts "We tried to install Kowari, but failed to get a running version. In one of our installations, inserting a 1 MB N-triples file via the Jena interface resulted in 30 minutes processing time before the process threw an exception because of a full disk (there were 200MB disk space available before starting the Kowari server). On another installation, we got a core dump of the JVM when running Kowari. Therefore we concentrated in our efforts on Sesame 1.1RC2 and Redland 0.9.18."

This is the first record of anyone having a problem with Kowari in this respect. Without any idea of which version of Kowari, Java or OS they used, I went and downloaded their example code. Using Kowari 1.0.5 (they mentioned Sesame 1.1RC2, which was released after 1.0.5) with the given code, you get an exception because NTriples is not supported as a parameter, so the code doesn't work as written. NTriples is a subset of N3, however, so if the code is modified so that the last lines become:

  // Read the N-Triples file as N3, since NTriples is a subset of N3.
  model.read(new FileReader(file), file.toURI().toString(), "N3");
  model.close();
  database.close();

and you then run it, you get:

...
 INFO [main] (AbstractDatabaseSession.java:699) - Loading file:/.../University1_0.nt into rmi://.../server1#camera
 INFO [main] (AbstractDatabaseSession.java:770) - Loaded 7304 statements
...

If people have problems running Kowari, maybe they should ask someone how to get it running before they write it up in a paper. Then again, this isn't the first time.

Type Safe Enums

Overloading int considered harmful: part "On the other hand, enums do not handle multi-classloader issues automatically. It is absolutely possible to have FontStyle.PLAIN != FontStyle.PLAIN when the FontStyle class is loaded twice by two different class loaders. If this is a real possibility in your code (and if you're writing a library, it's always a real possibility), the obvious solution is to override equals with a method that does work. Unfortunately, you can't do that. The equals method in java.lang.Enum is declared final, and all it does is compare objects for object identity with ==. So you really have no choice but to buckle down, write your own type-safe enum class, and warn client programmers to always use equals instead of == when comparing objects."
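
For what it's worth, a hand-written type-safe enum along the lines the article suggests might look like the sketch below. `FontStyle.PLAIN` comes from the quote; everything else (the `BOLD` constant, comparing by class name and constant name) is my assumption about one way to do it, not the article's code.

```java
// Sketch of a hand-written type-safe enum whose equals() works by value.
final class FontStyle {
    static final FontStyle PLAIN = new FontStyle("PLAIN");
    static final FontStyle BOLD = new FontStyle("BOLD");

    private final String name;

    // Private constructor: the only instances are the constants above.
    private FontStyle(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (other == null) {
            return false;
        }
        // Compare class names rather than Class objects, and use toString()
        // rather than a cast: a FontStyle loaded by another class loader is a
        // different Class, so instanceof and casts would fail for it.
        return other.getClass().getName().equals(getClass().getName())
                && name.equals(other.toString());
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }

    @Override
    public String toString() {
        return name;
    }

    public static void main(String[] args) {
        System.out.println(FontStyle.PLAIN.equals(FontStyle.PLAIN));
        System.out.println(FontStyle.PLAIN.equals(FontStyle.BOLD));
    }
}
```

Clients still have to remember to call equals() instead of ==, which is exactly the warning the article ends on.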

Monday, December 20, 2004

Eye on Search

Conquering the Digital Haystack "But clearly it's time to revisit the search engine. Google's IPO didn't end the search wars, it fanned the flames. Few fields are as rife with activity, and a slew of start-ups are angling for position. Some claim new and better technology than the PageRank algorithm made famous by Google. Others seek merely to be different -- filling voids left by the big players. And though the technologies, in most cases, are brand-new and untested, they promise to change the way consumers search the Web -- and the way advertisers reach those consumers. A look at three of the hottest search start-ups -- all planning services for small businesses by early 2005 -- shows how."

"San Francisco-based Blinkx launched last July and already claims more than a million users. What does Blinkx do differently? Its technology not only matches keywords but also locates related concepts...What's more, Blinkx searches everything -- not just the Web but also the contents of your computer, including e-mail messages and attachments and files on your hard drive, as well as weblogs and digital television content, which are currently ignored by most other search engines."

"Hence Dipsie, which searches based on semantic rules rather than keywords or even concepts. Wiener claims his semantic algorithm can sift through Web information and get you in one click what might take several with a conventional engine -- if it got you there at all. He also says the ability to map concepts will enable him to index some 10 billion webpages, more than double the four billion claimed by Google."

Proximity and MonetDB

"PROXIMITY incorporates major research findings from the Knowledge Discovery Laboratory, including model corrections for statistical biases inherent in relational data such as autocorrelation and degree disparity, as well as our graphical query language. PROXIMITY provides an open-source platform that can be used for both research into relational knowledge discovery and practical applications to real-world data."

"MonetDB achieves this goal using innovations at all layers of a DBMS: a storage model based on vertical fragmentation, a modern CPU-tuned vectorized query execution architecture that often gives MonetDB a more than 10-fold raw speed advantage on the same algorithm over a typical interpreter-based RDBMS. MonetDB is one of the first database systems to focus its query optimization effort on exploiting CPU caches. MonetDB also features automatic and self-tuning indexes, run-time query optimization, a modular software architecture, etc.. In-depth information on the technical innovations in the design and implementation of MonetDB can be found in our digital library."

Proximity is released under the Apache license and MonetDB under the MPL.

Another Agile Database User

Re: Non SemWeb uses of RDF "Of course this could also be done using a big relational database. A big benefit of RDF we've found is that you can dump the data together first, and then join it up with heuristics (e.g. switch port has same MAC entry as server NIC etc..). I suspect that this is an order-of-magnitude time saving compared with doing static schema design up front."
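
To make the "dump first, join later" idea concrete, here is a rough sketch of the MAC-address heuristic from the quote. The predicate names (`ex:seesMac`, `ex:nicMac`, `ex:connectedTo`) and the sample data are invented, and a real system would run this over an RDF store rather than a list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: join data dumped from two sources with a heuristic.
class DumpThenJoinSketch {

    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // Heuristic: a switch port that sees a MAC address is connected to the
    // server NIC that reports the same MAC address.
    static List<Triple> linkByMac(List<Triple> triples) {
        Map<String, String> nicByMac = new HashMap<>();
        for (Triple t : triples) {
            if (t.predicate.equals("ex:nicMac")) {
                nicByMac.put(t.object, t.subject);
            }
        }
        List<Triple> inferred = new ArrayList<>();
        for (Triple t : triples) {
            if (t.predicate.equals("ex:seesMac")) {
                String nic = nicByMac.get(t.object);
                if (nic != null) {
                    inferred.add(new Triple(t.subject, "ex:connectedTo", nic));
                }
            }
        }
        return inferred;
    }

    // Made-up data from two unrelated sources, dumped into one list.
    static List<Triple> demoData() {
        List<Triple> all = new ArrayList<>();
        all.add(new Triple("ex:switch1/port7", "ex:seesMac", "00:11:22:33:44:55"));
        all.add(new Triple("ex:server9/eth0", "ex:nicMac", "00:11:22:33:44:55"));
        all.add(new Triple("ex:server3/eth0", "ex:nicMac", "66:77:88:99:aa:bb"));
        return all;
    }

    public static void main(String[] args) {
        for (Triple t : linkByMac(demoData())) {
            System.out.println(t.subject + " " + t.predicate + " " + t.object);
        }
    }
}
```

Neither source needed to know about the other's schema up front; the join is just another rule added after the data is already in.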

StAX

StAX utility classes "The purpose of this project is to help facilitate the adoption of JSR-173: Streaming API for XML (StAX) by providing a set of utility classes that make it easy for developers to integrate StAX into their existing XML processing applications."
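
For anyone who hasn't seen the pull-parsing style, a minimal sketch using the plain JSR-173 API (javax.xml.stream) follows. It does not use the utility classes from the project above; the class name and the sample XML are made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Minimal StAX sketch: the application pulls events from the parser.
class StaxSketch {

    // Collect the local name of every start tag, in document order.
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        names.add(reader.getLocalName());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Wrapped so callers of this sketch need not handle a checked exception.
            throw new IllegalStateException("could not parse XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints [feed, entry, title]
        System.out.println(elementNames("<feed><entry><title>StAX</title></entry></feed>"));
    }
}
```

Unlike SAX, nothing calls back into your code: the loop decides when to ask for the next event, which makes it easy to stop early or hand the reader to another method.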

Jena 2.2 beta 1

"This release is primarily a maintenance release. The "beta" designation
means that not all work on Jena 2.2 is completed but also it is a
recognition that a pre-release to the growing Jena user community is
needed before a full release in order to smooth the transition between
versions."

"This release was built with Java 1.5.0 with source and target set to
level 1.4. It has been tested on WindowsXP, Linux and MacOS with a
mixture of Java 1.4.1, 1.4.2, 1.5.0."

Some good stuff from the release notes:
* "Reifier::getHiddenTriples() and getReificationTriples() REMOVED and replaced by iterator..."
* "added new API operation Model::removeAll() which removes "all" statements from a model"
* "a new memory-based Graph, SmallGraphMem, has been introduced."

Available for download from SourceForge.

Also released was NG4J 0.3.

Sunday, December 19, 2004

The Future of Everything

Future of Online Marketing: Semantic Ontology & Social Marketing "Despite these struggles, the Web, for all its flaws and bottlenecks and viruses and spam, is one of the crowning achievements of the last 50 years."

"As for the impact, the Semantic Web will make what we have more useful, and that has been Berners-Lee's mandate all along: "It's all about the people, not the technology," he said at W3C10.

"We made the Web to use the Web, to get other people to use the Web and to get things done," Berners-Lee said. "We are going into a new area ... that could change the rules of Web and enable us to do all that we on this planet can accomplish.""

Intelligent Enterprise also recently published an article, Beyond the World Wide Web.

Great Southern Land

A land of wasted web opportunity "Of the World Wide Web Consortium's (W3C) 368 members, only five are from Australia, says the head of W3C's international offices Ivan Herman, who was in Brisbane last week for a conference at the Distributed Systems Technology Centre (DSTC)."

""I often hear people in Australia say they are too small to do anything, they are happy to let the big guys in the US fight out the standards and they will just use them. I find this attitude strange because there is no reason for (it); the W3C tries to have a structure where any organisation can have their wishes heard," Herman says.

This attitude limits Australia's involvement in the development of the "semantic web", the next big e-commerce wave that promises to unleash a slew of new technologies, where computers on the internet can communicate data effortlessly with each other and therefore provide more useful applications. Whereas HTML was developed for humans to read websites, the semantic web puts computers in touch with each other so they can automatically exchange data, such as stock prices, weather forecasts, bus timetables, plane routes, and sports scores and GPS co-ordinates. It is envisaged that once these services are widely available that new applications and business opportunities will emerge."

It's basically correct: Australia is generally very conservative, sprinkled with a bunch of very active people. Brisbane is doing okay: my GP is working on ontologies, libferris is made here, and of course there's Kowari/TKS. DSTC, UQ and QUT are all doing semantic web work too.

Principles

W3C Delivers Web Architecture Overview ""In the architecture document, [TAG participants] emphasize what characteristics of the Web must be preserved when inventing new technology," said Tim Berners-Lee, the Web's creator who now serves as W3C director and co-chair of the TAG. "They notice where the current systems don't work well, and, as a result, show weakness. This document is a pithy summary of the wisdom of the community.""

Some of the highlights:
* 2. Identification "Global naming leads to global network effects."
* 2.5. URI Opacity "Agents making use of URIs SHOULD NOT attempt to infer properties of the referenced resource."
* 4.6. Future Directions for Data Formats "Data formats enable the creation of new applications to make use of the information space infrastructure. The Semantic Web is one such application, built on top of RDF [RDFXML]. This document does not discuss the Semantic Web in detail; the TAG expects that future volumes of this document will. See the related TAG issue httpRange-14."
* 5.4. Protocol-based Interoperability "Protocols designed to be resilient in the face of widely varying environments have helped the Web scale and have facilitated communication across multiple trust boundaries. Traditional application programming interfaces (APIs) do not always take these constraints into account, nor should they be required to. One effect of protocol-based design is that the technology shared among agents often lasts longer than the agents themselves."

Wednesday, December 15, 2004

Two Papers

* Web ontology reasoning with logic databases "Put together, the results developed in this thesis allow scalable and practical solutions to ABox-related reasoning problems for OWL-based applications."
* Reverse Engineering Ontology to Conceptual Data Models "One of the main problems facing genomics data integration is related to multiple representations of the data semantics within a set of sources. For example, the same gene may be represented as a genetic map locus in one data source, an aggregation of multiple individual exon entries in another data source or a set of EST sequences in yet another one. In this case, ontologies become an integral part of bioinformatics since they encourage a common vocabulary for describing complex and evolving biological knowledge [17, 18] and can be used as a common access to diverse information repositories."

Testing

* Unit Test Patterns "...two things are needed--a formalization of unit testing by establishing unit test patterns, and the early adoption of object oriented design patterns in the developing application to specifically target the needs of unit testing."
* FIT "Fit allows customers, testers, and programmers to learn what their software should do and what it does do. It automatically compares customers' expectations to actual results."
* SelfEsteem "SelfEsteem is a graphical presentation of Acceptance Test results."
* Marathon Similar to Abbot, JFCUnit, and GUITAR.
* XMLUnit.
* MockEJB "MockEJB is a lightweight framework for running EJBs. MockEJB implements javax.ejb APIs and creates Home and EJBObject implementation classes for your EJBs. Internally, MockEJB relies on dynamic proxies and interceptors...Additionally, MockEJB comes with the "mock" implementation of non-EJB APIs. Currently it provides in-memory JNDI and JMS implementations which can be used independently from MockEJB's EJB support."
* ThoughtWorks OSS.

Tuesday, December 14, 2004

I am the Amazon and the Zip Code

Suggested Google Alphabet All someone has to do now is design the letters like they did at primary school.

Mock Objects

Recently, I was linked to by The case for unit testing; Mock databases?; Single-day software development... which points to things such as I changed my mind - Mock objects are wrong for database unit testing and the very good How To Write Unmaintainable Code.

Zoom, Zoom, Zoom

A Structured 2D Graphics Framework "Why use Piccolo? It will allow you to build structured graphical applications without worrying so much about the low level details. The infrastructure provides efficient repainting of the screen, bounds management, event handling and dispatch, picking (determining which visual object the mouse is over), animation, layout, and more. Normally, you would have to write all of this code from scratch. Additionally, if you want to build an application with zooming, that’s built right into the framework too."

Used by Jambalaya.

Waiting for the Schema Fairy

The Semantic WinFS "Instead, people will simply release metadata. Bad metdata. Good metdata. Incompatible metadata. Heck, they're already doing it without Microsoft's filesystem as a catalyst. Rather than a Schema Fairy, we need a standard method for generalized structuring of metadata coupled with a language for semantic translation. The good people of the world can release all the metadata they want, and when critical mass has been reached, somebody will create a translation between the two. If only we had some knight in shining armor to rescue us!

Have you ever heard those Trojan commericals where the Trojan Man rides up on his horse and saves the, um, date? Well, imagine that same commercial, except that the horse is the W3C, the Trojan Man is Sir Tim Berners-Lee, and instead of free prophylactics he's handing out RDF and OWL."

Via On the WinFS delay "If WinFS was only about figuring out how to plug in an object store to the file system, I think it would have a definite ship date by now. The technical issues involved in doing that are tractable, and the PDC build showed that MS is making decent progress on that front. The problem of associating blobs of metadata with blobs of information in the filestore is most likely done (or getting there). However, even with that in place there are larger issues: what does this metadata look like, and where does it come from?"

"Even if the Schema Fairy came down today and blessed us with canonical schemas for everything, there would still be the problem of metadata extraction and annotation. WinFS will only work if it has metadata to process, and right now there aren’t very many good ways of marking up a document with semantic metadata. Manual annotation just doesn’t work – the time cost to the user is too high."

Monday, December 13, 2004

REST of the world

How I explained REST to my wife... "Ryan: Machines don't have a universal noun - that's why they suck. Every programming language, database, or other kind of system has a different way of talking about nouns. That's why the URL is so important. It let's all of these systems tell each other about each other's nouns.

Wife: But when I'm looking at a web page, I don't think of it like that.

Ryan: Nobody does. Except Fielding and handful of other people. That's why machines still suck.

Wife: What about verbs and pronouns and adjectives?

Ryan: Funny you asked because that's another big aspect of REST. Well, verbs are anyway."

Updated Kowari Javadoc

For all your API goodness Kowari Metadata Store 1.1.0 API.

Sunday, December 12, 2004

Another XML Accelerator

I was recently talking about Intel's old NetStructure™ 7210 XML Accelerator. A more recent (and different) piece of hardware is the DataPower XML Hardware Accelerator: "As Rich mentioned in his comment hardware acceleration is a pretty hot ticket at the moment, and for good reason. While some might argue all hardware acceleration does is bring XML applications to the same performance level of binary data-based applications my counter question/remark would be "You mean I can get all the power and benefit that a XML/XSLT-based solution brings AND get the same performance as an application that has been painstakingly handcoded in C or at best C++? WOOHOO!!!""

To create an RDF accelerator maybe you could use a programmable graphics card.

How to Create a Good Web UI

Or should that be Google Web UI: "Looking at the web applications Google is producing, I can comfortably say (and I’m almost never comfortable making technological predictions), that if you develop web applications and you aren’t looking today for ways to include dynamic interface techniques like those made practical by XmlHttpRequest, you’re going to end up losing to someone who is."

Friday, December 10, 2004

Kowari 1.1.0 Pre-Release 1

What's new:
* Resolvers allow developers to create components that expose data sources as RDF.
* Content handlers allow Kowari to extract metadata from different types of files.
* Improved datatype handling for most XSD datatypes and RDF's inbuilt XML Literal datatype. Allows for the storage and querying of unsupported datatypes.
* Improved performance, specifically on small queries and subqueries.
* AbstractDatabaseSession replaced to allow pluggable Session implementations. Jena, JRDF or iTQL are able to be used separately.

See http://www.kowari.org/.

Provenance to Unify Browsing

A Framework for Unified Information Browsing via RDFX "This Unified Information Service Browser (UISB) project is investigating the means of joining heterogeneous information from multiple sources into a combined and navigable structure...Attempting to unify disparate data sources illustrates that hierarchical and relational data models can place undesirable constraints on the stored data; however, the same data can be stored without constraint in a dynamic graph, which also provides significant potential for enhanced traversal and discovery capabilities."

"The solution that was adopted was to include provenance information for every statement that is added to the graph. Thereafter if a graph, or a statement or a resource is removed from the store, it can be done without the risk of losing information. In addition to identifying where a statement originated, the provenance objects can store metadata describing (among other things) when it was discovered, and when it should expire, therefore enabling the automated management of expired information, or potentially providing an event mechanism to instigate the refresh of the data. The benefit of this solution, is that it does not rely on the graph implementation to ensure that graph merging and un-merging happens as expected, however the added overhead of four extra statements for tracking provenance may negatively affect performance and increase the size of the data store. This will depend on the implementation of the graph store since some stores (e.g. Jena 6.4) can reify statements more efficiently."
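
The un-merging behaviour described above can be sketched without any RDF machinery. Note that this is not how the paper does it: they attach provenance with extra statements in the graph, whereas this sketch (class and method names are mine) just keeps a side table from each statement to the sources that asserted it.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: track which sources asserted each statement so that a
// source can be removed without losing statements that others also asserted.
class ProvenanceSketch {

    // Statement (as a string, for simplicity) -> sources that asserted it.
    private final Map<String, Set<String>> sourcesByStatement = new LinkedHashMap<>();

    void add(String statement, String source) {
        sourcesByStatement.computeIfAbsent(statement, k -> new HashSet<>()).add(source);
    }

    // Un-merge a source: a statement disappears only when no source is left for it.
    void removeSource(String source) {
        Iterator<Set<String>> it = sourcesByStatement.values().iterator();
        while (it.hasNext()) {
            Set<String> sources = it.next();
            sources.remove(source);
            if (sources.isEmpty()) {
                it.remove();
            }
        }
    }

    boolean contains(String statement) {
        return sourcesByStatement.containsKey(statement);
    }

    int size() {
        return sourcesByStatement.size();
    }

    public static void main(String[] args) {
        ProvenanceSketch store = new ProvenanceSketch();
        store.add("ex:a ex:knows ex:b", "sourceA");
        store.add("ex:a ex:knows ex:b", "sourceB");
        store.add("ex:a ex:age 42", "sourceA");
        store.removeSource("sourceA");
        System.out.println(store.contains("ex:a ex:knows ex:b"));
        System.out.println(store.contains("ex:a ex:age 42"));
    }
}
```

The same table could carry the discovery and expiry times the paper mentions, which is where the automated clean-up of stale statements would hook in.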

Bottom Up Tagging

Playing with Taxonomies "I spoke with Stewart Butterfield of Ludicorp, developer of Flickr, about this effort. (Latin scholars note: The Ludi in "Ludicorp" suggests the Latin words for play and game.) Stewart had many insights about this new approach to building taxonomies. "If you can hire enough excellent librarians, you will get better keyword results than with social approaches. However, as the content grows, tagging (and retagging) becomes an order of magnitude more difficult. In other words, social approaches are 80% as good as and 10 times easier than top-down approaches." As to whether Flickr's approach would work in the button-down corporate world, Butterfield had this to say: "Anticipate resistance in the CIO crowd who don't want to risk losing control in a social self-correcting process and do not want anything to get lost." Butterfield says that at least 55% of photos uploaded to Flickr have one or more tags, and 66% have both a tag and user-supplied metadata. As of early September, Flickr had 500,000 photos on its site and was growing at the rate of 15,000 to 20,000 more each day."

Thursday, December 09, 2004

Sometimes You Can't Make It On Your Own

On Folly "The unthinkable rapprochement between topic maps and RDF has occurred, signified by the formation of the W3C RDF/Topic Maps Interoperability Task Force. The task force is part of the Semantic Web Best Practices and Deployment Working Group. The last time I was with the majority of the people listed as members of the task force, it was in a very pleasant drinking establishment in Amsterdam. It's nice to think that the bonhomie of that evening has persisted into forming the basis of the task force."

"...and I think he mischaracterizes Berners-Lee's approach as calling for globally agreed ontologies. Nevertheless, he does note an increase in independent software activity concerning ontologies and the Semantic Web."

"Both Shirky and Udell seem to be pretty much convinced the Semantic Web requires, from the outset, globally agreed ontologies. It seems more that they've set up a straw man. I had always envisaged that in the same way user interface and other conventions have emerged from the messy web, so would ontological conventions. Messy, but good enough."

When different isn't better

Why Coding Standards? "Coding standards are being adopted by more development organizations as a means of ensuring the delivery of reliable and sound applications. Estimates suggest that 80 percent of the total cost of a piece of software go to maintenance, and not enough effort is being directed at ensuring that the quality goes in during the development process.

With the advent of coding standards, companies are saving money and time on the front end, reducing potential safety risks and safeguarding their reputation both internally and externally. Equally, development teams can reduce their time in code reviews, correcting coding standards violations that could be detected—and in some cases corrected—automatically."

IBM CICS J2EE

IBM Extends CICS for Web Services, J2EE "Announced last week, the CICS Transaction Server for z/OS 3.1 provides Web services capabilities, while the CICS Transaction Gateway 6.0 provides J2EE (Java 2 Platform, Enterprise Edition) connectivity. The software enables customers to more easily integrate business processes and extend existing mainframe-based applications with better Web services and integration capabilities."

This was a long time in coming: "IBM Corp. will bring its San Francisco project into line with Sun Microsystems's Enterprise JavaBeans object specification and announce intentions to add Enterprise Java application server functionality to a range of its middleware products, all at the Java strategy day at JavaOne this March in San Francisco...IBM may achieve this by building a separate application server, forming a partnership with EJB server vendors such as Gemstone or WebLogic, or developing an application container that will run inside or on top of current server offerings such as CICS, according to sources."

Metafilter on Metadata

Recently we've all been thinking about flat (or better, faceted) hierarchy web apps that organize email, photos, bookmarks, and general knowledge. The common threads are metadata (tags, categories, labels) that enrich relationships within and hence searchability of large collections. But besides marketroid hype (buzzwords, snark) and a computer that plays Twenty Questions what else can we do and study using faceted data structures: searchable culture references in The Simpsons, library science, computer filesystems, A.I. development, models for human memory and cognition?

FacetedClassification.
From http://www.metafilter.com/mefi/37515.

RDF, the ultimate agile database

Perspective on XML: Be humble, not imperial "The revolt against imperial modeling of code has already taken shape in the form of languages and agile methods. Agile programming emphasizes highly iterative development in close collaboration with the eventual users of the product. Even more important, it stresses the inevitability of change and evolution. In effect, agile developers pride themselves on being able to rapidly accommodate change."

"The same revolution is in the offing for data modeling. There have been some developments in agile databases which, literally, adapt the ideas of agile programming to the design of (usually relational) databases, but there is also progress occurring in semi-structured databases and, in particular, XML."

Is XML Zen in opposition to "strong" data modeling? "One thing I seem to share with so many of my colleagues in the XML world is a wary attitude towards traditional data modeling practices. It's an attitude that has also informed my thinking in related articles pondering data supermodels, coupling of distributed systems, OO encapsulation, and the like.

Some of us see XML as a bit of a refuge from established schools of data modeling. OO and Unified Process in my case, E-R and other relational based modeling in others'. Some just came from document-centric backgrounds where such extremely rationalized data modeling was not the mainstay. In my case, interest in XML was part of a general interest in data modeling as a vehicle for human expression rather than for robotic simulation of the real world."

As I've said before, XML is not relational enough and RDF is relational, or rather, "RDF provides a relational data model of the Web".

2-4 Players, Ages 10 and above

The Epistomat: Generating Consensus About RDF Ontologies and Rules "What happens when you take an ontology-driven interface generator, and feed it the ontology for OWL itself? A simple recursive technique allows for bootstrapping a world model and programmatically generating interfaces to it. Such a model can then serve as the platform for the evolution of collective consensus about ontologies and statements, by means of a curious game of Nomic."

From the Nomic site:
"Nomic is a game in which changing the rules is a move. In that respect it differs from almost every other game. The primary activity of Nomic is proposing changes in the rules, debating the wisdom of changing them in that way, voting on the changes, deciding what can and cannot be done afterwards, and doing it. Even this core of the game, of course, can be changed."

Protein Database

Constructing ontology-driven protein family databases. "The protein phosphatase resource, PhosphaBase, is freely available over the internet (http://www.bioinf.man.ac.uk/phosphabase). The DAML+OIL ontology for the protein phosphatases and the ABC transporters is available on request from the authors."

Ganesha

I previously mentioned Ganesha, but that was before it was available for download. Well, now you can download it.

Wednesday, December 08, 2004

OO DB goes OS

Object database goes open source "Startup company db4objects this week is releasing its object database, db4o, under an open source format, with the product now available either under the GPL via open source or commercially as embeddable software.

Built for Java and .Net development, db4o enables storage of objects, according to the company. An example of an object could be a vitamin in a biotech application or a brake configuration in an automotive application, according to Christof Wittig, CEO of db4objects."

Their homepage prominently displays the fact too.

Tuesday, December 07, 2004

Good-bye JUnit

TestNG "TestNG is a testing framework inspired from JUnit and NUnit but introducing some new functionalities that make it more powerful and easier to use, such as:

* JSR 175 Annotations (JDK 1.4 is also supported with JavaDoc annotations).
* Flexible test configuration.
* Default JDK functions for runtime and logging (no dependencies).
* Powerful execution model (no more TestSuite).
* Supports dependent methods.

I started TestNG out of frustration for some JUnit deficiencies which I have documented on my weblog here, here and in particular, here. Reading these entries might give you a better idea of the goal I am trying to achieve with TestNG. You can also check out a quick overview of the main features and an article describing a very concrete example where the combined use of several TestNG's features provides for a very intuitive and maintainable testing design."

A simple example via this blog entry.

Monday, December 06, 2004

Metadata as Self Defense

Bootstrapping the semantic Web "It's tempting to draw parallels between the careers of Albert Einstein and Tim Berners-Lee. Both men made world-transforming breakthroughs and then pursued even grander visions. Einstein, of course, never found the unified theory he sought for three decades. A lot of people think Berners-Lee's vision of a semantic Web will prove equally elusive."

"Semantic-Web naysayers think people and organizations can't be bothered to assert machine-readable facts about themselves. And, today, that is undoubtedly true. But when others assert facts about you -- as they increasingly will -- the tide could begin to turn. Individual acts of self-defense may ultimately combine to bootstrap the semantic Web."

Rice of the future

Berners-Lee Maps Vision of a Web Without Walls "To envision the Internet of the future, W3C director and WWW founding father Tim Berners-Lee suggested during the W3C's 10-year birthday bash here Wednesday, first envision groceries—say a box of rice.

On the box's side, in small, rice grain-sized type, you will find nutrition information. On its back, you will find directions on how to cook it. Somewhere else you may find a URL that you can use to research any number of rice-related things: recipes, country of agricultural origin, Uncle Ben company data or relevant information pertaining to the allergenic nature of rice, perhaps."

"Haystack knocks down the partitions that separate e-mail clients, file systems, calendars, address books, the Web and other repositories so that information can be worked with regardless of its origin.

Such applications will have a big impact on personal information management, Berners-Lee said, as users will be able to do things such as drop their bank statements onto their calendars and have items automatically populate given dates.

Such descriptions sound familiar to anybody who's been following IBM's work with its Information Integrator technology or Oracle Corp.'s upcoming Tsunami content management offering, which it plans to roll out at Oracle OpenWorld in San Francisco next week."

"The Semantic Web is going to be like a huge data bus, Berners-Lee said—a back-end bus that spans the planet. Comparing it to Tsunami or Information Integrator is like saying there used to be Hypercards before the Web. "Yes, there were innumerable Hypercard applications before the World Wide Web," he said. "They just didn't talk the same language.""

Books, Music and Humour

* The nation's 10 favourite books (there are actually 11). 1984 is an interesting inclusion, with a panelist noting that it's relevant because of the similarities between Osama and Goldstein (with recent news of him disappearing).
* Don't believe the hype "U2 are probably the most over-rated band in history. Their debut, Boy, was a classic and still sounds fresh and impassioned. Fatally though, they became a band that believed their own (fawning) press and whose egotism has devoured their talent. The Joshua Tree showed what a good guitar group/stadium rock band U2 could be. Sadly, they had the sort of pretensions that usually afflict mediocre American outfits like the Chili Peppers. On Achtung Baby and Zooropa they started plundering other bands' innovations and moving into "dance music" - though only the whitest, geekiest student could dance to them..." Also The Beatles, both Elvii (Elvis Costello and Elvis), Prince, Nirvana, etc.
* The 10 Least Successful Holiday Specials of All Time Includes: Ayn Rand's A Selfish Christmas (1951) and The Lost Star Trek Christmas Episode: "A Most Illogical Holiday" (1968).

Processing Java Annotations

Gram "Gram is a simple xdoclet-like tool for processing doclet tags or Java 5 annotations in source code or bytecode and auto-generating files, data or resources.

Gram = Groovy + JAM. JAM does all the hard work of abstracting away the details between annotations and doclet tags and handling Java 1.4 and 5 compliance. Groovy takes care of the scripting, code generation & templating. Gram is the little tidy bit of code in between."

To Southampton and beyond

Berners-Lee takes professorial chair at Southampton University "Berners-Lee will take up a chair in computer science at the University of Southampton's School of Electronics and Computer Science, holding this position as well as being senior research scientist at MIT, and director of W3C."

Friday, December 03, 2004

OWLchestra

OWLchestra "OWLchestra is a Web vocabulary management system (WVMS) that enables the collaborative development of small-scale OWL ontologies and RDF schemas." Includes screenshots and a brief paper. I saw this a while back, via BNode.

My Other Computer is Google

The magic that makes Google tick "The numbers alone are enough to make your eyes water.

* Over four billion Web pages, each an average of 10KB, all fully indexed.
* Up to 2,000 PCs in a cluster.
* Over 30 clusters.
* 104 interface languages including Klingon and Tagalog.
* One petabyte of data in a cluster -- so much that hard disk error rates of 10^-15 begin to be a real issue.
* Sustained transfer rates of 2Gbps in a cluster.
* An expectation that two machines will fail every day in each of the larger clusters.
* No complete system failure since February 2000."
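A back-of-the-envelope check of a few of those figures, as a minimal sketch. It assumes decimal units (1KB = 1,000 bytes), reads the disk error rate as one bad bit per 10^15 bits read, and takes the 2,000-PC figure as the size of a "larger" cluster. None of those assumptions come from the article, and the function names are my own.

```python
def corpus_size_bytes(pages: int, avg_page_bytes: int) -> int:
    """Total size of the indexed pages, in bytes."""
    return pages * avg_page_bytes


def expected_bit_errors(bytes_read: int, bits_per_error: int = 10**15) -> float:
    """Expected bad bits when reading bytes_read once.

    Assumes (not stated in the article) one error per bits_per_error bits read.
    """
    return (bytes_read * 8) / bits_per_error


def daily_failure_fraction(failures_per_day: int, machines: int) -> float:
    """Share of a cluster's machines expected to fail on a given day."""
    return failures_per_day / machines


if __name__ == "__main__":
    index_bytes = corpus_size_bytes(4 * 10**9, 10 * 1000)  # 4 billion pages at 10KB
    print(f"index size: {index_bytes / 10**12:.0f} TB")
    print(f"bad bits per petabyte read: {expected_bit_errors(10**15):.0f}")
    print(f"machines lost per day: {daily_failure_fraction(2, 2000):.1%}")
```

Under those assumptions the index comes to about 40TB, one full read of a petabyte turns up about eight bad bits, and a cluster loses about 0.1% of its machines a day.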

PISTA

Semantic Association Identification and Knowledge Discovery for National Security Applications "Our goal is to research new techniques and improving effectiveness of techniques to identify semantic associations and knowledge discovery by exploiting a large knowledge base. Specific objectives include (a) ontology driven lazy semantic metadata extraction (i.e., annotation) to complement traditional active metadata extraction techniques, and (c) formal modeling and high-performance computation of semantic association discovery including ontology-based contextual processing and relevancy ranking of interesting relationships."

From the project report: "The naïve algorithm to find all paths between 2 nodes in a directed graph [1] shown below is a recursive implementation of a depth-first search. Our first implementation of the ρ-path operator is based on this algorithm."
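The report's own listing isn't reproduced here, so the following is only a generic sketch of the technique it names: a recursive depth-first search that collects every simple (cycle-free) path between two nodes of a directed graph. The adjacency-list representation, the function name and the example graph are all hypothetical, not taken from the report.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]  # node -> list of successor nodes


def all_paths(graph: Graph, start: str, goal: str) -> List[List[str]]:
    """Return every simple path from start to goal, found by recursive DFS."""
    paths: List[List[str]] = []

    def visit(node: str, path: List[str]) -> None:
        path.append(node)
        if node == goal:
            paths.append(list(path))  # copy, because path is mutated on backtrack
        else:
            for neighbour in graph.get(node, []):
                if neighbour not in path:  # never revisit a node on the current path
                    visit(neighbour, path)
        path.pop()  # backtrack

    visit(start, [])
    return paths


# Hypothetical example graph, for illustration only.
EXAMPLE_GRAPH: Graph = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["d"],
    "d": [],
}

if __name__ == "__main__":
    for found in all_paths(EXAMPLE_GRAPH, "a", "d"):
        print(" -> ".join(found))
```

The number of simple paths can grow exponentially with the size of the graph, which is presumably why the report calls this approach naive.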

"Our initial implementation of the ?-Intersect operator is based on the ?-path operator. It searches for nodes where two ?-paths intersect (see Figure 5). We recognize the fact that there could be multiple intersection points for the ?-paths. Hence our implementation returns the sequence of nodes that are common between 2 ?-paths."

"The goal of ?-Iso is to take two resources as input and discover all paths that are “isomorphic” in both resources..."

Also, Semantic Association Identification and Knowledge Discovery for National Security Applications and Context-Aware Semantic Association Ranking.

Keeping Secrets

When Secrets Make Sense "Recently I wrote a short piece making a strong and general claim that the same forces that are pushing data towards XML are pushing software towards Open Source." From that article "These days, interoperation and integration are everything. You’d better have open interfaces, open networks, open services; that is, open data."

"’I'll put my finger on two pieces of Apple technology that benefit from being (for now) closed-source: Aqua and the video part of iChat. Both of them contain some magic that nobody else has figured out how to do yet, and if they can figure out how to make a few bucks in the gap before the world catches up, more power to ’em."

"Closed source isn’t over or anything like that. It’s just headed for a niche role, business-wise."

Not sure I believe that XML is the force to free you from vendor lock-in. RDF is of course - remember to put all your data in Kowari/TKS though, you know it makes sense.

A couple of quickies

* Via Matters EPOS "The objective of EPOS is to leverage a user's efforts for his personal knowledge management for his own benefit as well as to evolve this within the organization." The successor to Frodo.
* I'm sure everyone is now reading SWS IG scratchpad "Helpful paper tips: Repeat, "my god, we save X billion dollars with OWL-S! But we can't quite tell you how since that would reveal trade secrets! But trust us!" as many times as possible. Shrink margins if you must!" for the WWW2005 Tutorials and Workshops

Thursday, December 02, 2004

Space Colonies and Ontological Programs

What the Heck is an Ontology? (in PDF format) "There's no doubt about it; ontologies are on the up. The ideas are gaining wider acceptance and these days you're much less likely to be greeted by that familiar baffled expression of those who are too afraid to ask what the heck you are talking about. Tools like Protege (http://protege.stanford.edu/) continue to improve in robustness, and extend their functionality; and there are new tools on the scene too, such as SWOOP (http://www.mindswap.org/2004/SWOOP/) and Haystack (http://haystack.lcs.mit.edu/index.html)."

"Perhaps the hardest technical challenge is using ontolo-gies to enable the ad-hoc reuse of existing software com-ponents. In other words, it is much easier to design a suite of programs that all commit to the same ontology than to coerce existing programs to communicate effectively."

Wednesday, December 01, 2004

Coding for Failure

Automating Software Failure Reporting "Developing a failure reporting system requires an understanding of a product’s customer base, as well as usage profile. It is also important that you understand the failure profile of your product so that you can focus on events most annoying to your customers. Although product attributes are unique, there is a generic set of data that, if collected, will help in diagnosing failures. In collecting customer data, however, you must address all privacy concerns prior to rolling out your process. Along with a failure collection system, you also need a process that can distribute patches to address those failures.

At Microsoft, our experience has led us to develop a generic methodology to process, transmit, analyze, and respond to customer failure data. Differences exist in the way this process can be implemented; it is usually dependent upon the product type and its failure profile."

Java and LGPL

The LGPL and Java "FSF's position has remained constant throughout: the LGPL works as intended with all known programming languages, including Java. Applications which link to LGPL libraries need not be released under the LGPL. Applications need only follow the requirements in section 6 of the LGPL: allow new versions of the library to be linked with the application; and allow reverse engineering to debug this."

"If you distribute a Java application that imports LGPL libraries, it's easy to comply with the LGPL. Your application's license needs to allow users to modify the library, and reverse engineer your code to debug these modifications. This doesn't mean you need to provide source code or any details about the internals of your application. Of course, some changes the users may make to the library may break the interface, rendering the library unable to work with your application. You don't need to worry about that -- people who modify the library are responsible for making it work."