Monday, June 30, 2003


"FLORA-2 is a sophisticated object-oriented knowledge base language and application development platform. The programming language of FLORA-2 is a dialect of F-logic with numerous extensions, including meta-programming in the style of HiLog and logical updates in the style of Transaction Logic. FLORA-2 was designed with extensibility and flexibility in mind, and it provides strong support for modular software design through its unique feature of dynamic modules.

Applications of FLORA-2 include intelligent agents, Semantic Web, ontology management, integration of information, and others. " goldy and bronzy only made of iron.

The final irony "Our age has not so much redefined irony, as focused on just one of its aspects. Irony has been manipulated to echo postmodernism. The postmodern, in art, architecture, literature, film, all that, is exclusively self-referential - its core implication is that art is used up, so it constantly recycles and quotes itself."

Google Fearing Americans

Is Google God? "The key point is not just whether people hate us," says Robert Wright, the author of "Nonzero," a highly original book on the integrated world. "The key point is that it matters more now whether people hate us, and will keep mattering more, for technological reasons. I don't mean just homemade W.M.D.'s. I am talking about the way information technology — everyone using e-mail, Wi-Fi and Google — will make it much easier for small groups to rally like-minded people, crystallize diffuse hatreds and mobilize lethal force. And wait until the whole world goes broadband. Broadband — a much richer Internet service that brings video on demand to your PC — will revolutionize recruiting, because video is such an emotionally powerful medium. Ever seen one of Osama bin Laden's recruiting videos? They're very effective, and they'll reach their targeted audience much more efficiently via broadband.""

Friday, June 27, 2003

Carrot Squared

"Carrot2 is a research framework for experimenting with automated querying of various data sources (such as search engines), processing search results and their visualization.

Under the term "research", we understand that the architecture of the system is oriented mostly toward flexibility, sometimes at a price of performance losses. Mechanisms such as data exchange via XML language, dynamically loaded components accessible via HTTP protocol, the use of Java as primary language of implementation -- they all make the system very easy to tailor to one's needs."

An open source, Java based, framework with some cool demos showing result clustering - what more could you ask for?

A couple of demos: Google search for "RDF" with Dynamic Trees and AllTheWeb search for "data mining" with Dynamic Trees. Carrot2 Homepage.

The Way to Preserve is to Share

Storing e-text for centuries "To solve this digital dilemma, Ms Reich and Mr Rosenthal have looked long and hard at what the great libraries of the world have done over the millennia. First, they acquire copies and make them available to their local readers, while seeking to preserve them to the best of their ability. But if copies get lost or destroyed, they also lend them to each other. It is these circulating collections—which in effect form a peer-to-peer network with no central authority—that LOCKSS seeks to mimic.

It works by getting libraries to install a piece of software on a PC with a large hard disk, turning it into a cache for web pages. The program then pulls down the content of various journals that the library in question has subscribed to. If the system detects that one of its copies is damaged or missing, it asks the original publisher, or the cache of another library, to send it a fresh copy."

This idea has much in common with others such as XML Catalogs, DSpace and the like. All are using distributed stores to solve the problem of continuous access to data.
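The repair step described in the quote can be sketched roughly as below. This is only an illustration, not LOCKSS's actual code: the article does not say how damage is detected, so the sketch assumes a table of known-good SHA-256 digests, and the names `repair_cache`, `digest` and the fetch callables are hypothetical.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Optional

# A "source" is anything that can be asked for a fresh copy of a URL:
# the original publisher, or another library's cache. (Assumed shape.)
Fetch = Callable[[str], Optional[bytes]]


def digest(content: bytes) -> str:
    """Fingerprint used to decide whether a cached copy is intact."""
    return hashlib.sha256(content).hexdigest()


def repair_cache(
    cache: Dict[str, bytes],
    expected_digests: Dict[str, str],
    sources: Iterable[Fetch],
) -> List[str]:
    """Replace missing or damaged copies; return the URLs that were repaired."""
    source_list = list(sources)
    repaired = []
    for url, wanted in expected_digests.items():
        current = cache.get(url)
        if current is not None and digest(current) == wanted:
            continue  # this copy is fine, nothing to do
        # Ask each source in turn (publisher first, then peers) for a good copy.
        for fetch in source_list:
            fresh = fetch(url)
            if fresh is not None and digest(fresh) == wanted:
                cache[url] = fresh
                repaired.append(url)
                break
    return repaired


if __name__ == "__main__":
    peer_library = {"journal/2": b"issue two", "journal/3": b"issue three"}
    publisher_offline = lambda url: None  # pretend the publisher is unreachable
    local = {"journal/1": b"issue one", "journal/2": b"damaged bytes"}
    expected = {
        "journal/1": digest(b"issue one"),
        "journal/2": digest(b"issue two"),
        "journal/3": digest(b"issue three"),
    }
    print(repair_cache(local, expected, [publisher_offline, peer_library.get]))
```

The point of trying several sources in order is the one the article makes: no single copy, and no central authority, has to survive for the content to survive.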

Thursday, June 26, 2003

Mysterious MSNBot

Microsoft, Google may go head-to-head "Microsoft could then connect the search engine of its MSN portal to new file technology planned for the next version of Windows, code-named Longhorn, which will make it easier to search e-mail, spreadsheets and documents on PCs, corporate networks and the Web. The result would be a powerful technology reaching from the desktop to the greater Internet that could displace Google as the Web's leading search engine."

Could it be something like Grub or some other decentralized metadata strategy with integration into the OS?

Wednesday, June 25, 2003

Stuck in the Middle

"The small companies offer me no visions. They can't build platforms; they can't challenge Microsoft, and if they keep squabbling with each other, they can't even create simple standards. The press and the business world won't even look at their technology until after it has been co-opted by the big players.

If you want my support, and the support of others like me, propose a vision. Show me you can co-operate, show me you can build platforms, and show me you can drive back Microsoft without becoming the next Microsoft. Tell me a tale of 2031, and what I'll be doing when I'm 55."

Bleak visions of the software future.

Tuesday, June 24, 2003

The Truth About Metadata

In a recent posting Ben Hammersley comments about truth in metadata: "The answer to the problem of false metadata is, in fact, more metadata. For the problem of false metadata is actually a subset of the much wider problem of false information in general. Although the semantic web's need for metadata allows for more things to be false, metadata is also the only way to denote the data as false once you know it to be so."

I replied with some comments (from the OWL specification). I still hope that everyone becomes a librarian.

All that Mac Hype

Two somewhat ignored announcements by Apple:
- XCode "Anyone could guess that bringing multiple processors to bear on a build would make it go much faster, but Xcode lets you act on the obvious solution. With the Rendezvous-enabled distributed build feature it’s easy to simply farm out your build by distributing compile workload across idle desktop machines or, better, deploy a dedicated Xserve build farm to do in minutes what would take hours on any single machine."
- No Mac Left Behind "It doesn't stop there. Apple is so intent on providing an upgrade path for almost every Mac ever made that they're planning G5 upgrades for some real antiques.

The G5/II upgrade kit is for the Macintosh II, IIx, and IIfx, three six-slot workhorses designed before System 7 even came into being. Again, these will include a new backplane and will not support legacy floppies, serial devices, or ADB peripherals. Since the original computers did not have onboard video, the G5/II motherboard will include an AGP 8x socket, leaving room for up to five PCI-X cards. SCSI support is included, and after this upgrade Mac II and IIx owners will no longer have to worry about oddball memory, low memory ceilings, or "dirty" ROMs.

Anticipated price of the 1.4 GHz G5/II upgrade is US$499 plus the cost of an AGP video card." ;-)

They released Safari and some hardware too.

Monday, June 23, 2003

"this" Considered Harmful

This is a favourite of mine, one that I would like others to consider adopting: "Don't Repeat Yourself. If you really insist upon adorning your object fields with some token, I will argue that "this." is a horrible way of doing so. Instead, consider prefixing your field with a consistent name or character...Adopting a prefix like "m_" or "_" is preferable to "this" because the compiler ensures consistency. It does not rely on human programmers to remember to repeat the "this" prefix over and over."

I would prefer using something a little more expressive to prefix my variable names instead of "m_" or "_". I prefer to name the variables being passed in with prefixes such as "new", "old", "existing", etc., whatever makes sense.

This is similar to my beef on checked vs unchecked exceptions.

EDGAR Online with XML

"All data and functionality is returned with XML calls. Fundamental data elements on all NYSE, Nasdaq, Amex and OTCBB companies are XBRL and XML tagged."

Various Semantic Web Articles

- The article "The semantic web" which describes using the Semantic Web for public transport and ARKive-ERA "..which aims to make the materials as useful as possible for distinct user groups, from schoolchildren to university lecturers."
- Yet another "What is RDF and the Semantic Web?" which summarizes some aspects of RDF.
- Javaworld reviews Protégé "...One of the MDA's basic assumptions is that UML diagrams can be better maintained and reused than Java code. AI technology suggests that knowledge models (ontologies) can be even better maintained and reused than UML diagrams. Protégé helps you rapidly define such models and their semantics, and automatically generates the necessary GUI elements so your domain experts can conveniently enter their knowledge."

Friday, June 20, 2003


"Features of InferEd include:

- native RDF editing environment
- loading and storing of rdf documents from the web or local file system
- class browser and graphical class diagrams
- resource list and resource detail views
- integration with RDF Gateway (view and edit RDF Gateway tables)
- search and replace (w/regular expression support)
- rule-based inference - add rules to ontologies
- explanation of inferences"

Behind the Scene

To bee or not to bee ""This all sounds wonderful, but five years on it has failed to materialise. According to John Davies, manager of next-generation web research for BT Exact, BT's research arm, that's because the web is simply too huge. "Whether it will make the step to the external web, the jury is out. It's unlikely that anyone will turn those five billion pages into RDF any time soon," he points out."

One way to make it happen would be to build artificially intelligent software robots that go out and automatically recode the existing information on the web in RDF form."

"The first implementations of Semantic Web technology are taking place behind the scenes, within companies that work in specific industries."

I would tend to agree with this last statement. In a commercial environment the production of good metadata, ontologies and the like is a competitive advantage - not something, necessarily, to be shared. The tools and probably some ontologies should be free, of course.

XML Tricks

From XML to RDF using Open Office's drawing program to draw box/ball and stick diagrams and then converting the XML to RDF using Python.

Relaxer RELAX schemas (NG and Core), XML or JDBC to Java (or XML Schema and DTDs).

Thursday, June 19, 2003

Exploring Data

Spectacle:Server is a "...tool that transforms and presents your data sources in an easy to use exploration space." Examples include exploring: Lucene, Google and RDF.

Using Test Driven Development for Teaching

"One of the differences I've seen between really smart beginners and average beginners is that really smart beginners teach themselves by writing little tests. Teaching them to capture that knowledge in running test code is a great idea.

It's a learning curve for everyone not coming from an XP background, and the classroom (especially in a five-days-straight course) isn't the real world, but once they get a feel for it, the lights go on. We're not XP evangelists or anything, and in fact, that's more or less the only part of XP that we really use in class (sometimes we do use pair-programming). But we do believe that learning how to think about things is just as important as the actual things you learn. This goes back to what we said earlier about always telling them the "why" and "who cares"--if they really understand or "get" something, then the exercises don't seem arbitrary, but are the natural outcome. I want people to feel that the right way to do things (or at least a good way) is the most natural way, and not an awkward approach. We know we have failed when someone says, "I don't understand the point of this exercise." That makes us cringe, but every time it happens, we learn something new about what we should or shouldn't do to help people learn. "
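A "little test" of the kind described might look like the following. It is only a sketch of the habit, using Python's built-in `str.split` as the thing being learned; the function names are made up for illustration. Each test records one fact a beginner might otherwise have to rediscover.

```python
def test_split_on_separator_keeps_empty_fields():
    # An explicit separator never merges adjacent delimiters.
    assert "a,b,,c".split(",") == ["a", "b", "", "c"]


def test_split_without_arguments_collapses_whitespace():
    # With no separator, runs of whitespace count as one delimiter
    # and leading/trailing whitespace is ignored.
    assert "  a  b ".split() == ["a", "b"]


def test_split_of_empty_string_depends_on_separator():
    # The surprising case that is worth writing down once.
    assert "".split(",") == [""]
    assert "".split() == []


if __name__ == "__main__":
    for learning_test in (
        test_split_on_separator_keeps_empty_fields,
        test_split_without_arguments_collapses_whitespace,
        test_split_of_empty_string_depends_on_separator,
    ):
        learning_test()
    print("all learning tests passed")
```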

Wednesday, June 18, 2003

Open Source RDF Portal

"Leeber is a system for building web portals based on RDF. Leeber provides a variety of ways for end-users to query the catalog contents, as well as letting users make suggested additions to the catalog. Editors review suggestions for approval and may edit the cataloging information for all resources in the catalog."

Uses XDoclet, JSTL, and Jena 1.5.0.

An example site is:

Monday, June 16, 2003

What about Data?

Semantic Web: Gripes and A Way Forward "So, Guha asked, why shouldn’t I be able to send a software agent to and pull all that stuff out without having to poke around their idiosyncratic web pages? This was in 1998, and I still wonder why I can’t do this, it seems like a no-brainer. Also, it seems like this is more or less exactly what they made RDF for; someone invents a basic bunch of property names, and then any company can invent their own special properties, and you need a little bit of RDF-schema-like machinery to make new property names a little more useful."


Treebeard "Often XSLT is used in a background process. For example, transforming an XML document into an HTML page. The process used to write an XSLT document, at least for me, was to write the XSLT in a text editor, save it to a server, and then run the page to see if it transformed correctly. This is a very tedious process if one is just learning XSLT. Most commercial XSLT editor / IDEs are rather expensive, especially if you are an individual just trying to learn XML / XSLT."

The Market is a Conversation

Dave Beckett finally notices, "Andrew Newman who seems to work on the Tucana Knowledge Store ("scheduled for open source release 2002-01-17")". Yeah as I said in May, it'd be nice to live up to your promise. BTW, I've spent most of my time on the other piece of software, TMex.

If you are interested in an Open Source (probably under MPL) version of TKS please send email to (the irony does not escape me). TKS is a Java 1.4 based triple store. It'd be nice to integrate it into Jena (or any of the other Java based RDF frameworks): just install the handful of JARs and go.

Friday, June 13, 2003

DSpace again

DATA PRESERVATION "A leading example of this is the DSpace project, which has produced the open-source digital repository system, DSpace. This stores intellectual output in multiple formats, from computer simulations to journal papers, and became operational at the end of last year following a two-year collaboration between The Massachusetts Institute of Technology (MIT) Libraries and Hewlett-Packard Laboratories. It is described as a specialised type of digital asset management system that manages and distributes digital items comprised of 'bitstreams' (thereby resolving most future hardware compatibility problems)...'The DSpace system at MIT has policies for what we promise to preserve (i.e. open, popular, standards-based formats like TIFF and ASCII), and those that we will try our best to preserve but can't promise (e.g. Microsoft, etc). Lots of formats fall somewhere in between (e.g. Adobe PDF) so there we make judgement calls...Some will be possible, cheap even, to mass-migrate forward with time. Others will have to be emulated because they aren't really formats (video games or simulations, for example). It's going to be years before we really understand how to preserve these things.'

The MIT Libraries have announced the 'DSpace Federation,' a collaboration with six other major North American research universities and also Cambridge University in the UK (which will be focusing on the regeneration problem), to take the technology further, for which it has received a $300,000 grant from the Andrew W. Mellon Foundation. The plan now is to extend the scope of DSpace by encouraging other organisations to install it, run repositories and help to further develop its adaptability and potential for 'federated collections' - distributed digital libraries held on DSpace repositories in different locations." Also talks about other projects at the Library of Congress, Harvard University and the Joint Information Systems Committee.

Thursday, June 12, 2003

Birthday Paradox

Someone told me that if there are 20 people in a room, there's a 50/50 chance that two of them will have the same birthday. How can that be? "When you put 20 people in a room, however, the thing that changes is the fact that each of the 20 people is now asking each of the other 19 people about their birthdays. Each individual person only has a small (less than 5%) chance of success, but each person is trying it 19 times. That increases the probability dramatically." A use for this is in crypto which has a fairly impractical problem with key exchange.
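A quick check of the arithmetic, as a minimal sketch assuming 365 equally likely birthdays and no leap years (the function names are mine). Under those assumptions 20 people give roughly a 41% chance of a shared birthday, and the chance first passes 50% at 23 people.

```python
def shared_birthday_probability(people: int, days: int = 365) -> float:
    """Probability that at least two of `people` share a birthday."""
    if people > days:
        return 1.0  # pigeonhole: a repeat is guaranteed
    all_distinct = 1.0
    for already_taken in range(people):
        # The next person must avoid every birthday already taken.
        all_distinct *= (days - already_taken) / days
    return 1.0 - all_distinct


def smallest_group_for(target: float, days: int = 365) -> int:
    """Smallest group size whose shared-birthday probability reaches `target`."""
    size = 1
    while shared_birthday_probability(size, days) < target:
        size += 1
    return size


if __name__ == "__main__":
    print(round(shared_birthday_probability(20), 3))  # 0.411
    print(smallest_group_for(0.5))                    # 23
```

The calculation works on the complement: it is easier to multiply up the chance that everyone's birthday is different and subtract from one than to count the ways a match can happen.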

Wednesday, June 11, 2003


A Wiki maintained by Dan Goessling contains a CVS to RDF servlet. "The servlet gives an RDF representation of the CVS history repository of one of the Olin servers. See CvsOnt for information on OWL ontologies for CVS data. The servlet uses the ontology from Mike Dean. I'd like to use the RDF output to answer software project engineering questions about the files in the CVS archive."

More on Java One 2003

Sun has finally decided to kill Personal Java. JDNC was also discussed: it's a high-level XML schema for configuring the UI of applications.

Tuesday, June 10, 2003

Christina loves Java

"What would Christina do? With ChristinaMobile, powered by Java technology, you'll know -- every day. ChristinaMobile, by Legend Mobile, keeps you on top of all things Aguilera with daily text message updates right to your mobile phone. Christina A. will tell you what she's up to, what she's wearing, and what's new in the world of music. From fashion and makeup tips to the skinny on her love life, ChristinaMobile keeps you in the loop."

This is what you do with $500 million in advertising? Rip-off Pepsi and Microsoft. ;-)

JavaOne News

Java (Intel), Java (Sun), Java Java (IBM and Palm) ching ching ching ($500 million in advertising).

Why can't Sun come up with its own marketing?

Java Agents

jfipa "jfipa is a set of java-based tools that supports parsing and routing of envelopes/messages of the FIPA Agent Communication Language (ACL) encoded as XML."

OS X 10.3 will be 64-bit

64-Bit Macs May Outpace 'Panther' Double good news, the machines will ship before 10.3 and 10.3 will be 64-bit.

Monday, June 09, 2003

JXTA 2.1

JavaOne 2003: Sun To Debut JXTA 2.1 " Sources close to the company said that Sun, for instance, quietly plans to integrate JXTA support into its future N1 platform and possibly Solaris/Sun ONE stack. They also claim that Sun is considering integrating JXTA collaboration features into its StarOffice suite. Sun, which has developed reference implementations of JXTA for J2SE and J2ME, would neither comment on JXTA 2.1 nor its plans for integrating JXTA into its product line.

In the Windows world, developers have access to Groove Networks' P2P APIs and platform and are awaiting integration of P2P technologies into Windows XP later this year and in the next Windows Longhorn release.. "

"Momentum, which allows users to create shared workspaces, share files, exchange messages and simultaneously edit shared documents, runs on Windows currently. At JavaOne next week, InView will unveil a Linux version that will allow Linux users and StarOffice users to have the same P2P capabilities."

Friday, June 06, 2003

Hoisting the Meta Framework

Keel - The Next Generation Metaframework "Keel could be described as a "Framework of Frameworks", or a meta-framework, a highly extensible backbone for integrating Java components and services. A meta-framework can be defined as a specialized framework that allows easy integration of other functional specific frameworks that work together seamlessly. Meta-frameworks act as a bridge and connect together multiple application frameworks and tend to extract the best out of them. Like the keel of a ship, the Keel framework provides direction for server-side Java projects. With Keel, every service is accessed through an interface, which insulates your application code from becoming dependent on an implementation of that particular service. With Keel, you can avoid tightly coupling your code to a particular service or component, giving you the freedom to "unplug and plug in" services without needing to change your business logic." Homepage.

What Happened to KISS?

Java becoming a LanguageForSmartPeople, not a LanguageForTheMasses? "I like all the little nuggets that are being added with JDK1.5, but there are a lot of little nuggets in there. The next time you look at a few lines of Java code, will you be able to remember even these 13 additions to the simple core language?"

I have enough problems writing bug free code and now they're going to make it harder. :-) I really hope varargs don't make it.

Wednesday, June 04, 2003

And so it begins (again)...

Iranian families raided "The homes of at least 10 Iranian families in Brisbane, Melbourne and Sydney were raided yesterday by the Australian Federal Police, which removed computers and documents.

Sources believe the raid was related to a call by local Iranians to remove Iran's opposition party, the People's Mojahedin of Iran, or PMOI, from the United States' official terrorist list.

Supporters of the PMOI argue the organisation is not a terrorist body, but rather the only opposition to the ruling regime. "


WebFountain: IBM Buzzes the Web for Intelligent Applications "WebFountain works in three stages: base mining, in which indexing and search technologies are used to systematically mine the Internet using focused crawling; an industry component, which requires industry-specific expertise to know the types of algorithms with high value (IBM plans to work with customers and consulting organizations to build these); and the application, which will be delivered to customers as an on-demand service."

""For every megabyte of data we read in," says Carlson, "we create about 10 megabytes of metadata. Our value proposition is the metadata. We extract all of this stuff—nouns, locations, entities—then it goes into an industry process." He says, "We construct higher-level value from this information."

There have been a number of research breakthroughs that have allowed IBM to create the WebFountain infrastructure; the technical challenge was to get to the scale. It is an operation made up of about a 1000-node Intel Linux cluster and half a Petabyte of storage, according to Carlson. While a "me-too solution" could be made by "cobbling together about 30 or so companies in the marketplace," according to Carlson, he doubts they could get past 10 million pages."

Pretty much confirms what I thought, more metadata than data, 10:1 metadata to data.

Metadata Miner

Catalogue "Catalogue is a file cataloging utility that enables quick viewing, managing and updating of metadata or document properties...[the export modules allows you to] apply XSL transformations to XML exports, transform to Dublin Core RDF". Its homepage is here. At the low, low price of US$64.

Hidden Windows

Windows Explorer secrets exposed ""During the development of this product, Whirling Dervishes revealed hidden Windows interfaces that are crucial to the development of such applications but which the existence of was denied by Microsoft.

"An example of such interfaces is the way these applications can manipulate the 'tasks pane', a section that is displayed on the left in Windows XP and contains items like 'Folder Tasks'."

A namespace extension is a virtual folder in Windows Explorer, such as Control Panel. "

Tuesday, June 03, 2003

New Search Engine (with Topic Clustering)

Make way for the contender to Google's crown "Turbo10 let's you select up to 10 search engines to run a search through. But unlike AskJeeves where all the results are irritatingly put in different boxes, Turbo10's genius to combine them all in one weighted listing - it's the search engine of search engines. So if you wanted, you could select, and to search in."

"When we first started trying out the engine yesterday, it had 1102 search engines available. At the moment, it links to 1108. But the time you've reached the end of this article, it will be probably be more."

You know what's really annoying, you can't link to the search results. The results aren't the real URLs either, they are redirections through Turbo10. The idea of a "deep search" is good (the choice of search engines is impressive but restricting it to 10 is not). I can't help wishing that there was more.
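The article does not say how Turbo10 weights its combined listing. As a rough sketch of what merging several engines' results into one ranked list can look like, here is reciprocal rank fusion, a standard technique swapped in purely for illustration; the engine names, URLs and the constant `k` are assumptions, not anything Turbo10 is known to use.

```python
from collections import defaultdict
from typing import Dict, List


def merge_rankings(results_by_engine: Dict[str, List[str]], k: int = 60) -> List[str]:
    """Combine per-engine ranked URL lists into one list, best first.

    Each URL earns 1 / (k + rank) from every engine that returned it, so a
    page that several engines rank highly rises above a page that only one
    engine returned.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranked_urls in results_by_engine.values():
        for rank, url in enumerate(ranked_urls, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically so output is stable.
    return sorted(scores, key=lambda url: (-scores[url], url))


if __name__ == "__main__":
    hypothetical_results = {
        "engine_a": ["http://example.org/x", "http://example.org/y", "http://example.org/z"],
        "engine_b": ["http://example.org/y", "http://example.org/x"],
        "engine_c": ["http://example.org/y", "http://example.org/w"],
    }
    for url in merge_rankings(hypothetical_results):
        print(url)
```

Because the merged list holds the original URLs, each result stays linkable, unlike the redirections complained about above.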

Monday, June 02, 2003

Haystack Released

Preliminary Haystack Download Can't wait to have a look at Adenine. Screenshot. I've had problems getting it going on the Mac. ::Manageability:: has a review, Finding Open Source Java in a Haystack.

Smokey Trousers

Revealed: How Blair used discredited WMD 'evidence' "British intelligence sources said the defector, recruited by Ahmed Chalabi's Iraqi National Congress, told his story to American officials. It was passed on to London as part of regular information-sharing with Washington, but British intelligence chiefs considered the "45 minutes" claim to be unreliable and uncorroborated by any other evidence."

Sunday, June 01, 2003

IE is Dead (the Web too?)

WinInfo Short Takes: Week of June 2 "When asked in a recent online chat about the next version of IE, Brian Countryman, an IE Program Manager, said, "As part of the OS, IE will continue to evolve, but there will be no future standalone installations. IE6 SP1 is the final standalone installation." The reason? "Legacy OSes have reached their zenith with the addition of IE 6 SP1," he said. "Further improvements to IE will require enhancements to the underlying OS." Sadly, this perspective is skewed, and suggests Microsoft believes IE is somehow at the "zenith" of the Web browser heap."

Sad, although not at all unexpected (I think there was a better link but that's the best I could find). It's not about a zenith of anything, it's about rights management (Palladium). It also indicates just how deep the changes will be for rights management (not just a plugin for IE, an extra window in Word or integration into Windows Media Player). Also mentioned on Slashdot, Tim Bray, Ryan Lowe's Blog and