Monday, March 31, 2003

"The term 'hunk' is not only an English word, but also a cell type (human natural killer) and a gene (hormonally upregulated Neu-associated kinase). A search for the term 'hunk' in PubMed gives references to all three contexts. These semantic variables make information retrieval and analysis very inefficient and ineffective. Searching outside one's area of expertise, where one is unlikely to be familiar with the nomenclature, can be extremely challenging. Moreover, what we also need is a way to control the context of computational searches, so that concepts that have different meanings, in different contexts, can be retrieved appropriately. While human beings are able to decipher the context of different articles containing for example the term 'hunk', computers are much less able to do so.

The use of ontologies is a key step...It is a very effective way to structure biology or chemistry in a way that helps scientists to understand the context and relationships that exist between terms in a specialised area of interest."

BioWisdom on The use of ontologies in drug discovery.
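
As a rough illustration of the kind of context control the quote asks for, here is a minimal Python sketch. The tiny sense table, the context labels and the sample documents are all made up for this example; a real ontology would be far richer, and nothing here is BioWisdom's approach.

```python
# Hypothetical sense table: each sense of a term is tied to a context and a
# few related words that tend to appear alongside it.
SENSES = {
    "hunk": {
        "english": {"piece", "chunk", "bread"},
        "cell type": {"natural", "killer", "lymphocyte"},
        "gene": {"kinase", "neu", "expression"},
    }
}

# Made-up documents standing in for search results.
DOCS = [
    "HUNK kinase expression is upregulated in mammary tumours",
    "hunk cells are natural killer lymphocyte populations",
    "he tore off a hunk of bread",
]


def search(term, context):
    """Return documents that mention `term` and also share at least one
    word with the related words of the requested context."""
    related = SENSES[term.lower()][context]
    hits = []
    for doc in DOCS:
        words = set(doc.lower().split())
        if term.lower() in words and words & related:
            hits.append(doc)
    return hits
```

Calling `search("hunk", "gene")` returns only the kinase document, while a plain keyword search would return all three.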

Saturday, March 29, 2003

Cambridge first with electronic archive "Cambridge University is set to become the first UK university to launch an electronic super-archive that would make its academic material freely available.

The £1.7m Dspace project, being developed in collaboration with the Massachusetts Institute of Technology, in the US, will provide a digital repository of academic information that could potentially be accessed by anyone with a Google search engine."

An outline of the project at Cambridge is also available.

Friday, March 28, 2003

It's been a long wait but JAXB 1.0 is now available through the Java Web Services Developer Pack 1.1 Download.

Thursday, March 27, 2003

McDonald wins DoD semantic Web work "Users may include military intelligence, as well as agents from the Central Intelligence Agency, Defense Intelligence Agency, the National Security Agency and the Department of State, said Kenneth Bartee, president of McDonald Bradley.

McDonald Bradley will explore use of metadata tagging, data-content markup, taxonomies, ontologies and other extensible markup language and semantic Web techniques to help systems index and provide information."

Catch the RDF Wave

PC Forum and the Semantic Web "Leading into an example of how this vision could add value to today's version of the Web, Berners-Lee asked, "When you create a database with a concept like a zip code, why doesn't the computer automatically know what it is?" Continuing, he suggested that a Semantic Web could create a very distributed set of connections for data. The problem is complicated, acknowledges Berners-Lee, but he believes the applications could be as revolutionary as the Web was ten years ago and urged companies creating products to input and output RDF data, saying "Catch the RDF wave.""

What to Do With All That Information "He talked about the glorious future of something called the "semantic web," a concept that most people aside from programmers will probably never understand. His talk here didn't help. One hopeful journalist from the Economist asked Berners-Lee to give an example of how companies could make or save money using it, but he didn't have an answer. He doesn't have to. He's an academic. Businesspeople here assured me this is a big deal we'll hear more about."

Wednesday, March 26, 2003

Hotspot for Databases

New open source query system adapts to the database "“Traditional databases have two phases for a query: first a query ‘optimiser’ examines the query and picks a good execution plan, and second a query ‘executor’ runs the plan chosen by the optimiser from start to finish,” he said.

The static nature of these processes means there is no ability for the system to react to changes that occur while the query is running, such as another process chewing up memory or unexpected pauses while the related query data is streaming.

To combat this, “adaptive dataflow technology merges the optimiser and executor phases, so that while a query is running it is constantly being observed and re-optimised in an organic fashion,” he said. In other words, an adaptive system constantly re-evaluates its environment while processing a query to ensure it is working effectively – despite any runtime fluctuations"

I wonder what exactly their problem with Java was. Apparently it was memory management.
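
To make the adaptive idea in the quote a bit more concrete, here is a toy Python sketch. It is not the query system's code: the "plan" is just the order in which filter predicates are applied, and that order is revised while rows are still streaming through, based on the pass rates observed so far. The class name, the `reorder_every` parameter and the sample predicates are all invented for illustration.

```python
class AdaptiveFilter:
    """Apply several predicates to a stream, re-ordering them as it runs.

    The predicate with the lowest observed pass rate is tried first, so
    later rows are rejected after as few checks as possible.
    """

    def __init__(self, predicates, reorder_every=100):
        self.predicates = list(predicates)
        self.reorder_every = reorder_every
        self.seen = {p: 0 for p in self.predicates}
        self.passed = {p: 0 for p in self.predicates}

    def _pass_rate(self, pred):
        seen = self.seen[pred]
        return self.passed[pred] / seen if seen else 1.0

    def run(self, rows):
        for count, row in enumerate(rows, start=1):
            keep = True
            for pred in self.predicates:
                self.seen[pred] += 1
                if pred(row):
                    self.passed[pred] += 1
                else:
                    keep = False
                    break
            if keep:
                yield row
            if count % self.reorder_every == 0:
                # Re-optimise mid-query: most selective predicate first.
                self.predicates.sort(key=self._pass_rate)


def is_even(n):
    return n % 2 == 0


def is_multiple_of_50(n):
    return n % 50 == 0


# The initial order is deliberately poor; it is corrected after ten rows.
adaptive = AdaptiveFilter([is_even, is_multiple_of_50], reorder_every=10)
result = list(adaptive.run(range(1, 201)))  # [50, 100, 150, 200]
```

A static plan would keep checking `is_even` first for the whole run; here the more selective predicate moves to the front as soon as the statistics show it rejects more rows.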

Monkey's Heads not Wrenches

RDF and other monkey wrenches has caused responses such as: We the Monkeys, YARD: Yet another RDF diss and RDF is a monkey wrench?.

In some ways I think Sean has missed the point. Sean says we should write our XML vocabularies and convert them to RDF. But what about using tools to create an RDF ontology or taxonomy and converting down to XML? Tools like Protégé or KAON's ontology modeller allow you to model your information without realising you're using RDF. RDF, being more abstract, has the ability to be superior to XML. The UI can be just like any other XML tool around or better (more flexible, reusable, adaptive, etc.). You don't have to understand the RDF model. However, in the meantime I see nothing wrong with existing tools and applications producing XML that can be converted to a generic RDF ontology. Indeed, if Semantic Web Services is to take off this will be a necessity. See RDFT.

Please don't make the same mistakes as SQL (like NULL); that is, preserve the model when implementing.

To quote Fabian on XML: "The model they did come up with is the same hierarchic model which we discarded 30 years ago and replaced with SQL, because it was too complex, inflexible and lacked rigor." See also The Data Exchange Tail - Part 2.

Tuesday, March 25, 2003

From Gambling to Homeland Security

Homeland Security panel at PC Forum "The In-Q-Tel representative, Gilman Louie, is talking a good line, arguing that government shouldn't have access to commercial data, that there is no advantage to throwing more and more and more marketing and other commercial data into an already overloaded analyst's queue.

Esther Dyson says that she doesn't believe that the "Chinese walls" that keep the Feds from digging through personal info are really all that effective.

Gilman Louie from In-Q-Tel says that the real problem is local law-enforcement, who use techniques like sorting all records for Arabic last names -- technology makes bad policy worse. Unless we beef up the audit and accountability side of the house, it will get very scary.

I just put my questions to the panel: How can you square investigating everyone with the Constitution? How can an algorithm's oracular pronouncement stand in for due process? Gilman Louie's response was very good. He said that he believed that profiling is the wrong answer to the wrong question, that it should be a tool that's applied after human judgement, not before. He compared profiling to the internment of Japanese-Americans and averred that profiling is a dangerous tool for racial or ideological discrimination."

ASIS cover Semantic Web

The Semantic Web "Feeling insecure about the Semantic Web? This issue will solve your problem."

Richard Dawkins on the War

Bin Laden's victory "Whatever anyone may say about weapons of mass destruction, or about Saddam's savage brutality to his own people, the reason Bush can now get away with his war is that a sufficient number of Americans, including, apparently, Bush himself, see it as revenge for 9/11. This is worse than bizarre. It is pure racism and/or religious prejudice. Nobody has made even a faintly plausible case that Iraq had anything to do with the atrocity. It was Arabs that hit the World Trade Centre, right? So let's go and kick Arab ass. Those 9/11 terrorists were Muslims, right? And Eye-raqis are Muslims, right? That does it. We're gonna go in there and show them some hardware. Shock and awe? You bet. "

"Saddam Hussein has been a catastrophe for Iraq, but he never posed a threat outside his immediate neighbourhood. George Bush is a catastrophe for the world. And a dream for Bin Laden."

This is so dripping with vitriol. Richard Dawkins has previously written about how religion has caused so many of the world's problems and now he points out that American democracy may also be to blame.

Monday, March 24, 2003

Checked vs Unchecked

Another posting from Bruce Eckel describes his continued support of unchecked exceptions. His disturbing article, "Does Java need Checked Exceptions?", has bothered me for a very long time. I've found checked exceptions, especially typed ones (see Bill de hÓra's counter discussion), the most useful thing for tracking and handling errors. The thing with runtime exceptions is that they allow the responsibility for handling errors to be lost, and somewhere in a big framework the context can go missing (especially when RMI is involved).

Speaking of RMI, the best explanation I've seen as to why RemoteException is checked and not runtime came from a Sun engineer:

"3) checked exceptions foster more robust programs

There was a time when Oak and the earliest version of Java did not have checked exceptions. Exception handling was advisory, and it was an unsafe world out there. It was our group (Jim Waldo and me in particular :-) that recommended that there be exceptions checked by the compiler. Jim was quite persuasive in his arguments, telling of a world where robust code would reign. After some consideration, Java was retooled to have checked exceptions. Only those exceptions for which there was no recovery or reflect application errors would be unchecked (e.g., OutOfMemoryError, NullPointerException respectively). And the world was safe again.

Imagine the Java engineers' surprise when many exceptions in the Java API and compiler were changed from unchecked to checked, and the compiler enforced the distinction, they uncovered bugs in the implementations! So, the best efforts at handling error conditions, however good intentioned, was not good enough. That compiler is useful for something :-)"

Sunday, March 23, 2003

For Humanitarian Reasons not Disarmament

Bush is an idiot, but he was right about Saddam "So you think the way he's presenting this war to the world is really where he's gone wrong.

Yes, it has been wretched. He's presented his arguments for going to war partly mendaciously, which has been a disaster. He's certainly presented them in a confused way, so that people can't understand his reasoning. He's aroused a lot of suspicion. Even when he's made good arguments, he's made them in ways that are very difficult to understand and have completely failed to get through to the general public. All in all, his inarticulateness has become something of a national security threat for the United States.

In my interpretation, the basic thing that the United States wants to do -- overthrow Saddam and get rid of his weapons -- is sharply in the interest of almost everybody all over the world. And although the U.S. is proposing to act in the interest of the world, Bush has managed to terrify the entire world and to turn the world against him and us and to make our situation infinitely more dangerous than it otherwise would have been. It's a display of diplomatic and political incompetence on a colossal scale. We're going to pay for this. "

It's almost exactly a year since I wrote about Global morality and "The White Man's Burden". The original story makes very similar points to the above article, such as: "I think a lot of moral thought, as applied to politics, has evaded conflict, has tended to suppress or gloss over conflict, or to rest on the pious hope that conflict can always be reconciled or somehow transcended. I don't think that's true."

There are good arguments on both sides I guess. ;-)

Saturday, March 22, 2003


"In the wake of Apple's announcement late yesterday that former Vice President Al Gore was elected to the company's board of directors, President George Bush announced this morning that he would demand a recount.

"I am sure that when the votes are tallied again," Bush press secretary Ari Fleischer said, "Apple will find that President Bush was, in fact, named to the board, not Mr. Gore."

Apple CEO Steve Jobs weighed in on the developing controversy by saying "I'm not sure what the heck he's talking about. It was a unanimous vote. I can read it back to you. Jobs: yea. Campbell: yea. Drexler: yea...""

Bush Demands Recount In Gore's Board Election.

Efficient Labelling

On Labeling Schemes for the Semantic Web "...we are interested in labeling schemes for RDF/S class (or property) hierarchies allowing us to efficiently evaluate descendant/ancestor, adjacent/sibling queries, as well as, finding nearest common ancestors (nca) by using only the generated labels. Compared to the transitive closure evaluation reported in our previous work [14], the performance gains for these queries are of 3-4 orders of magnitude when using adequate labeling schemes! Then, starting from a topic somewhere in the taxonomy, a user can easily and efficiently access not only its father/children (as in existing Portals) but also the leaf topics underneath where most of the web resources are classified, discover sibling topics (where related web resources may be found) or even continue navigation from the nca of two topics in the hierarchy."
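
For a feel of how labels alone can answer descendant/ancestor queries, here is a small Python sketch of interval labelling, one well-known family of schemes. I'm not claiming it is the scheme the paper recommends, and it assumes a strict tree (one parent per class); the function names and the sample taxonomy are made up.

```python
def label_hierarchy(children, root):
    """Assign each class a (start, end) interval via depth-first traversal.

    A class's interval strictly encloses the intervals of all its descendants.
    """
    labels = {}
    counter = 0

    def visit(node):
        nonlocal counter
        start = counter
        counter += 1
        for child in children.get(node, []):
            visit(child)
        labels[node] = (start, counter)
        counter += 1

    visit(root)
    return labels


def is_descendant(labels, node, ancestor):
    """True if `node` lies strictly below `ancestor`, using the labels only."""
    n_start, n_end = labels[node]
    a_start, a_end = labels[ancestor]
    return a_start < n_start and n_end < a_end


# Hypothetical taxonomy for illustration.
taxonomy = {"Thing": ["Animal", "Plant"], "Animal": ["Dog", "Cat"]}
labels = label_hierarchy(taxonomy, "Thing")
```

Once the labels are computed, each query is two integer comparisons, with no need to walk the hierarchy or compute a transitive closure.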

3store 2.2.1

It can return "...500+ results from the 4.1 million triple KB in around 400ms, using subClassOf and subPropertyOf reasoning. NB this does not include the time taken to download the results to your client if using HTTP.

Interring and indexing the triples from RDF/XML files takes around 2 hours."

FOAF Browser

Social Networking utilizing Semaview's Friend of a Friend Browser "The application is built on top of a few existing application packages. The RDF API for PHP is used to process the FOAF files. Alex Shapiro’s GoogleBrowser, based on the TouchGraph project, is the basis for the visual display."

Friday, March 21, 2003


I was watching this thing on TV. It's sort of like a book I read that considered what the world would be like if Hitler had won the war, or Sliders or something. Interestingly, it posed the question: "What would happen if Bush won the 2000 election?". Jeez, that would just be wrong. It's lucky that Gore won.

What would he have done then? Maybe he would have moved away from politics and into IT (hasn't that taken off - Wired was so right). Maybe he'd join Apple's board. Nah, that's just ridiculous.

Thursday, March 20, 2003


"High-performance software for information retrieval research. Emphasis on semi-structured text retrieval, especially for HTML and XML. The goal is to facilitate information retrieval research by providing an interchangable toolkit of functions." Written in Perl and C/C++.

Telcordia Demo Machine "...By using statistical algorithms, LSI can retrieve relevant documents even when they do not share any words with a query. LSI uses these statistically derived "concepts" to improve search performance by up to 30%."

Wednesday, March 19, 2003

RDF and Topic Maps

A small document published by KTWeb reports that: "As early as October 2000, Gartner Group experts predicted that “50 percent of portal and search engine providers would be using technologies on a Topic Map basis in the next three years”. More recent forecasts have been less enthusiastic, but still consider Topic Maps as having a high probability to become a “mainstream technology” by 2003; whereas METAGroup estimations see the technology significantly hitting the market by 2005-2007, with a potential over $1 billion." Topic Map Constraint Language, which is analogous to OWL and RDF Schema, is something that I hadn't seen before. It's good to see another Australian organisation looking at RDF and Topic Maps too.

Monday, March 17, 2003


Technical trends bode well for KM "The same protections also apply to Tacit's forthcoming ActiveNet — which the company describes as a tool-agnostic, "active collaboration network." E-mail messages and other documents such as those in Notes, SharePoint, or Weblogs are fed into ActiveNet and are organized into topical "hotlists" that drive a supply-and-demand market for contacts and information. The topics in your public profile define a supply of expertise in those areas; other users who construct similar profiles create a demand; the system offers to connect those whose profiles intersect."

"Corporate blogs may be ushering in new era of voluntary communication critical for KM. Knowledge that people don't publish can, alternatively, be deduced from documents and message traffic -- if you take necessary privacy precautions."

Friday, March 14, 2003

Sun's Knowledge Management

A recent blog entry by Uche Ogbuji points to a document on how Sun is using swoRDFish to provide "...a controlled vocabulary, organizational classifications, and business rules. It also includes a core set of industry standard metadata tags, complete with SMI-specific extensions."

"Uche Ogbuji, founder of Fourthought, is one of the most respected thought leaders in RDF. He is currently working with Sun's Global Knowledge Engineering team to develop a system to integrate metadata from Sun's internal product and customer profile information. This integration will help amplify the power of the intranets and customer self-help sites at Sun. According to Mr. Ogbuji, Sun's efforts in knowledge management offer far more potential than commercially available ones, because the GKE team has chosen to adopt standard metadata and knowledge- management formats and techniques. Many other companies working with KM are using proprietary systems and methods. In my experience, this is very limiting because it restricts the value of the result to the amount of knowledge that can be gathered within the closed system. By adopting industry standards, Sun effectively opens up the system, allowing it to derive additional power from publicly available knowledge bases. This is analogous to the quantum leap in application development that developers gained once they began connecting their applications to the open Internet."


Semantic Web Services Interest Group Charter "The Semantic Web Services Interest Group will address both architectural/systems issues as well as application interests. Architectural principles are expected to be reported to the Web Services Architecture Working Group, for inclusion in the Web services architecture document."

Thursday, March 13, 2003

More Nukes

"The Bush administration is mounting a fresh push to add to the United States' nuclear-weapons arsenal.

In a report to go to Congress this week, the Air Force is expected to make the case for developing a "robust nuclear earth penetrator" that can destroy deeply buried targets. The earth penetrator, proposed last year by Bush-administration officials (see Nature 415, 945; 2002), would be a hardened version of an existing warhead.

The administration has also taken steps to develop new, low-yield nuclear weapons, which it says could be used to destroy chemical and biological weapon facilities. The president's 2004 budget asks for the repeal of a 1993 law that prohibits the development of weapons with yields of less than 5 kilotonnes as they would lower the threshold for using nuclear weapons."

"But scientists familiar with nuclear-weapon design questioned the usefulness of the proposal and said that it would encourage global nuclear proliferation. "'Penetrators can't penetrate very far," says Richard Garwin, a physicist who has advised the US government on nuclear-weapons policy. "And they will not destroy chemicals or even anthrax under likely conditions of storage."

Bush seeks to beef up US nuclear-weapons arsenal

Wednesday, March 12, 2003

The first thing you do...

Nforce 2 BIOS problems confirmed and fixed "Users of a new Shuttle Nforce 2 based machine had reported problems with the BIOS. They had been trying to overclock their machines only to find that, once they went past a certain point, the machine would stop working...Setting the FSB too high can stop Nforce 2 motherboards from POSTing. Once the motherboard has stopped POSTing, the only way to get it going again is to reset the FSB back down to 100MHz. Only most motherboards don't have the jumper so you can't do that." Looks like they have the Intel G2 now too. A review at SFF says that the Intel version has 2 PCI slots instead of an AGP and a PCI slot. There's also an impressive picture of a bank of servers at Los Alamos National Laboratories.

Tuesday, March 11, 2003

JDK 1.4.1 for OS X

Apple Releases Java 1.4.1 for Mac OS X "Apple had two tasks in moving from the 1.3.1 version of Java to 1.4.1. First, there were 60% more classes from Sun in the 1.4 release. Second, Apple decided to do a major rewrite and clean up the technology underneath the release. The GUI was moved from Carbon to Cocoa. This decision was costly in terms of time, but means that Java applications look and feel more like native applications on Mac OS X." Yay! And remember if all else fails: sudo /usr/local/bin/jsettestjdk 1.3.1

Semantic Applications

Semantic Applications, or Revenge of the Librarians "More broadly, the need for much more formal labeling and language suggests a new phase of IT industry focus. Whereas the PC and Internet industries were once described as the "revenge of the nerds," looking ahead, the emphasis on detailed classification and information management might well be described as the "revenge of the librarians."

Semantic systems present a different set of challenges and require a different set of business skills. The need for fancy words such as semantics, taxonomies and ontologies suggests a world of more rigorous thinking and information handling, a significant cultural change from today's mostly freewheeling Internet."

This is from Customer-Driven IT: How Users Are Shaping Technology Industry Growth

KAON 1.2.5 Released

KAON source and binaries have been made available for download. KAON supports "...ontology evolution, ontology modularization, concurrent ontology access and transactional processing. Storage mechanisms currently include RDF models or a relational database." The Ontology Registry is "...a tool that provides mechanisms for registering and searching ontologies in a distributed context. Ontologies can be searched based on their content, by matching ontology elements against WordNet". Other modules include: the KAON portal, an Ontology modeller and a text to ontology tool (a text mining tool).

The Ontology Modeller is interesting in that it supports search (using the Ontology Registry) including searching on instance, concept or property. From the manual.

Friday, March 07, 2003

Semantic Zooming

A Passion for Metadata "Semantic zooming changes the shape or context in which the information is being presented. An example of this type of technique is the use of a digital clock within an application. In a normal view, the clock may show the hour of the day and date. If the user zooms in then the clock may alter its appearance by adding the seconds and minutes. If the user then zooms out, information is discarded with only the date remaining. The actual information did not change, only the presentation method.

What does this have to do with Metadata? The answer is everything. If a user begins to review information at the system level and then drills down, we don’t just add more details about the system. We add interfaces, components and data. If the user drills into the data then we will add the logical, conceptual and physical views of the data. We are actually altering the view in hopes to increase the understanding of the information being presented."

There's also a lot on how they approach selling metadata management to corporate departments.
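
The clock example from the quote is easy to sketch in Python. The zoom levels (0, 1, 2) and the output formats are my own assumptions; the point is only that the underlying value stays the same while the presentation changes.

```python
from datetime import datetime


def render_clock(moment, zoom):
    """Render the same moment with more or less detail depending on zoom.

    zoom 0: date only; zoom 1: date and hour; zoom 2: adds minutes and seconds.
    """
    if zoom <= 0:
        return moment.strftime("%Y-%m-%d")
    if zoom == 1:
        return moment.strftime("%Y-%m-%d %H:00")
    return moment.strftime("%Y-%m-%d %H:%M:%S")


moment = datetime(2003, 3, 7, 14, 35, 9)
views = [render_clock(moment, zoom) for zoom in (0, 1, 2)]
# ['2003-03-07', '2003-03-07 14:00', '2003-03-07 14:35:09']
```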

Meaningful Web Services

Opinion: The Next Big Web Thing - Really "Into the airless and solipsistic void of the Web services debate, enter the World Wide Web Consortium, where the Web's founder, Tim Berners-Lee, has recruited respected professionals to steadily assemble something called Resource Description Framework, or RDF. This is not only the most interesting idea in computing since the Web, but also the only thing since shopping carts and dancing applets that seems in danger of actually being relevant to humans who use computers...As the debates between Microsoft and Sun seem more and more to fade into the partisan struggle for market share within IT shops, the open protocols that are reinventing applications -- WebDAV, XML, RDF, SOAP, WSDL and the rest of the alphabet -- more and more will be the only things accelerating more meaningful use of the Web..."

Web services in serious jeopardy: "When people ask me to define Web services, I tell them that it's primarily an idea hatched by IBM and Microsoft to steal customers from each other."

In the recent Social Meaning of RDF there's a link to an email by Tim Berners-Lee:
"> Perhaps we can get all we need by describing intended use.

That is where you start getting into questionable stuff, necessarily slanting the use of RDF some way."

Tuesday, March 04, 2003


Semantic Networks "A semantic network is fundamentally a system for capturing, storing and transferring information that works much the same as (and is, in fact, modeled after) the human brain. It is robust, efficient and flexible." The anatomy of the semantic network consists of concepts, relationships and instances.

"The semantic network implementation is further described:
Users will each have their own set of concepts: one for their login, one for their password (MD5 encoded).
They will have an instance relating them to the concept "Users" to define them as such. They will have instances
involving element permissions.
{Users, has member / member of, [UserName]}
{[UserName], has password / password for, [Password] *MD5 Encoded}
{[UserName], [permission] / [permission], [ElementGUID]}"

Security Whitepaper
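
The instance notation in the whitepaper quote maps quite naturally onto a list of tuples. The following Python sketch is my own reading of it, not the whitepaper's implementation: the class and method names are hypothetical, and MD5 appears only because the quote specifies it.

```python
import hashlib


class SemanticNetwork:
    """Store instances as (subject, relation, inverse relation, object)."""

    def __init__(self):
        self.instances = []

    def relate(self, subject, relation, inverse, obj):
        self.instances.append((subject, relation, inverse, obj))

    def add_user(self, name, password):
        # MD5 mirrors the quoted design; it is a weak choice for passwords.
        digest = hashlib.md5(password.encode("utf-8")).hexdigest()
        self.relate("Users", "has member", "member of", name)
        self.relate(name, "has password", "password for", digest)

    def grant(self, name, permission, element_guid):
        # The quote uses the same label in both directions for permissions.
        self.relate(name, permission, permission, element_guid)

    def objects(self, subject, relation):
        """Follow a relation forwards from `subject`."""
        return [o for s, r, _, o in self.instances
                if s == subject and r == relation]

    def subjects(self, obj, inverse):
        """Follow a relation backwards from `obj` via its inverse name."""
        return [s for s, _, i, o in self.instances
                if o == obj and i == inverse]
```

For example, after `net.add_user("alice", "secret")`, both `net.objects("Users", "has member")` and `net.subjects("alice", "member of")` traverse the same stored instance, just from opposite ends.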


On Prevayler: "As I have argued on many occasions, a major problem in the IT industry is the large and increasing number of practitioners -- particularly the younger, Web-oriented generation -- who have a programming/HTML background and no exposure to data fundamentals. They make no distinction between application programs and a DBMS and between files and databases -- they aren't even aware of such distinctions. Many were not around when databases and DBMSs were invented precisely because we learned the hard way that managing data in files with applications is not a cost-effective approach. Since the object approach originates in -- and was intended for -- programming, it is hardly surprising that, in the absence of such a distinction, programmers extend objects to database management, too - to those with a hammer, everything looks like nails. "

"Even if data reside in memory, there are still critical reasons to manage it independently of applications -- these are two separate issues. Indeed, data independence, including integrity independence, is a major objective in switching from application files to databases."

Oh, Oh not OO Again

OS Image Search Engine

imgSeek is a GNU/Linux photo collection manager and viewer with content-based search and many other features. The query is expressed either as a rough sketch painted by the user or as another image you supply (or an image in your collection). The searching algorithm makes use of multiresolution wavelet decomposition of the query and database images.

Similar to Eikon.
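
To give a flavour of what "multiresolution wavelet decomposition" means, here is a one-dimensional toy in Python using the Haar wavelet. imgSeek works on two-dimensional images and I'm not reproducing its implementation; the function names, the `keep` parameter and the matching rule (count shared significant coefficients) are assumptions for illustration.

```python
def haar_decompose(values):
    """Haar decomposition of a list whose length is a power of two.

    Returns [overall average, coarsest detail, ..., finest details].
    """
    data = list(values)
    details = []
    while len(data) > 1:
        pairs = list(zip(data[0::2], data[1::2]))
        averages = [(a + b) / 2 for a, b in pairs]
        diffs = [(a - b) / 2 for a, b in pairs]
        details = diffs + details  # coarser levels end up in front
        data = averages
    return data + details


def signature(values, keep=3):
    """Positions and signs of the `keep` largest-magnitude detail coefficients."""
    coeffs = haar_decompose(values)
    ranked = sorted(range(1, len(coeffs)), key=lambda i: -abs(coeffs[i]))
    return {(i, coeffs[i] > 0) for i in ranked[:keep]}


def similarity(a, b, keep=3):
    """Count how many significant coefficients two signals share."""
    return len(signature(a, keep) & signature(b, keep))
```

Because the overall average is left out of the signature, a uniformly brighter copy of a signal (say `[5, 7, 11, 13]` against `[4, 6, 10, 12]`) still matches on every retained coefficient, which is the sort of tolerance a rough sketch query needs.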

TeSSI and FreePharma

"TeSSI is a state-of-the-art tool that improves upon the existing search and retrieval tools by extracting the meaning out of medical free text and placing the resulting medical ‘concepts’ in the document index instead of terms. This creates a very powerful environment allowing healthcare knowledge seekers to query content stores in natural language and retrieve highly relevant information with great accuracy."

"FreePharma is a software plug-in that analyzes drug prescription information...Each year medication errors cause hundreds of thousands of patient deaths annually (140,000 deaths per year in the US alone) and cost billions of dollars.

Software is available with drug-drug interactions, drug-allergy interactions... but all of these databases / applications need structured input. Prescriptions however are made in human language. Hence, most applications use data entry forms combined with pick lists and point and click interfaces to obtain structured data input from the very beginning. This is time-consuming and quite often very impractical and resisted by many doctors."

Monday, March 03, 2003

From Complete Separation to Complete Integration

Jive "...provides a library of pre-coded animated data structures. These data structures are identical to the standard data structures they derive except for the interactive animation they provide." The gallery has some nice examples of the UI including a graph representation. This is almost the complete opposite of the direction that I've taken with RCOSjava. Instead of messaging between the things being modelled and the animation (in an MVC type fashion) they have integrated these two and render it based on the same values.

Can Software Startups Succeed?

"* Be willing to share your source code. Even if your software is the coolest code since the internet, CIOs will worry about your startup's survival and what will happen if your company dies. Putting your source code in escrow will mitigate that worry.

* Don't compete on price; compete on speed. Some startups are so hungry for customers they give their product away. The smart CIO knows this only increases the odds that the startup will run short of money and die. Speed--of implementation, of problem solving--shows IQ and commitment."


RACER is a knowledge representation system. It supports:
* the DIG standard, so that graphical ontology editors such as OilEd can, for instance, be used with RACER as a reasoning engine.
* directly reading knowledge bases specified w.r.t. the DAML+OIL, RDFS or RDF standards (although there are some restrictions on some DAML+OIL language expressions).
* an interactive graphical shell called RICE.

This came in response to a question on the RDF-Interest mailing list about inferencing tools. The other ones mentioned were:

That Certain Context

"Referred to as both entity- and fact-based extraction, the technology complements search engines, content management systems, and portals by helping improve relevancy of results, according to analyst Laura Ramos, director of research at Giga Information Group, in Cambridge, Mass.

"The key thing here is this ability to pull specific words and phrases out of documents and automate figuring out what they mean," Ramos said."

"Another unstructured data management player, Inxight Software, is ramping up fact extraction capabilities in its SmartDiscovery information retrieval product, which combines natural-language processing, linguistics, and classification technologies.

Inxight's technology has gained strong traction within government agencies where it is often used to further counter-terrorism and intelligence efforts. The Sunnyvale, Calif.-based company this month signed 10 contracts worth $3 million dollars with the U.S. Department of Defense."