Thursday, July 31, 2003

Is RDF Ready?

"Not long ago, Marc Canter, one of the early founders of Macromedia, talked about RDF and the Semantic Web in his weblog. Specifically, he wrote:

"I've been spending more and more time trying to grok the RDF folks. I have to say I like what I see and hear, but what I don't see are many apps and services actually up and running and working.


We have a saying over here: "put up or shut up." I'm still looking for two different RDF apps or services to work together in some meaningful way. Then bring on the books.""

"Plugged In Software's Tucana Knowledge Store provides sophisticated knowledge-based querying of large stores of data, again based on RDF.

These companies are just the first to start looking at RDF and the RDF data model for use in large-scale, sophisticated applications. And then there's the Semantic Web."

Wednesday, July 30, 2003

Resources on Resources

* Vicious Circle
* On Resources
* Re: httpRange-14
* Semantic Web for Poets: FOAF, Flocking, and the Semantics of Starlings
* On Resources
* Some ramblings on URIs and identity


The Future of Human Knowledge: The Semantic Web "For some, the move to a smarter Web is taking far too long. "The Semantic Web might be a cure for our e-commerce ills, but for many businesses, waiting around for it isn't an option...Some e-commerce companies -- impatient at the progress of the Semantic Web project -- are looking into other stop-gap technologies to meet their needs today."

Which Web? "Berners-Lee's latest claim seems to forget that a huge group of people finds the Traditional Web more than adequate for their needs. Some of those folks even do Web Services-like things with REST-based approaches that more closely follow the patterns laid down by the traditional Web. Some of them use XML (and even at times RDF) to exchange semantically-rich information between computers without needing the full power of the Semantic Web. The Traditional Web may be no good to the W3C's director or its members any longer, but it's still good for a lot of us."

Blog Mapper

RDF Mapper "RDFMapper is a web service that searches an RDF file for resources with geographic locations, and returns a map overlayed with dots representing located resources."

Tuesday, July 29, 2003

DVDs the Rosetta Stones of the 21st Century

From Slashdot Romancing the Rosetta Stone "Och's method uses matched bilingual texts, the computer-encoded equivalents of the famous Rosetta Stone inscriptions. Or, rather, gigabytes and gigabytes of Rosetta Stones.

"Our approach uses statistical models to find the most likely translation for a given input," Och explained

"It is quite different from the older, symbolic approaches to machine translation used in most existing commercial systems, which try to encode the grammar and the lexicon of a foreign language in a computer program that analyzes the grammatical structure of the foreign text, and then produces English based on hard rules," he continued.

"Instead of telling the computer how to translate, we let it figure it out by itself. First, we feed the system it with a parallel corpus, that is, a collection of texts in the foreign language and their translations into English."

Cool, and I know just where to get them from: DVDs. Websites provide subtitles for DivX (like the Princess Bride) - they're great because they're indexed. For the record, the Spanish version (see the DVDs link) is: "El más famoso es no involucrarse en una guerra terrestre en Asia."

Monday, July 28, 2003

WWW2 - The Search for Shared Meaning

How the Semantic Web will scale: "Actually, I expect the web to have different order at different scales. A fractal system has similar amounts of organization showing up in a similar way at different scales...Every time a message sent by one is inconsistent with some interpretation the other had been considering the second agent throws the interpretation away. So we end up with a concept of "means the same thing".

When more than two people do it, then we call it a community, or a movement, or whatever."

Sounds a lot like this: "In the Semantic Web, as in life, you don't have to believe everything everybody says forever. If you are deeply cynical, then don't believe what anyone says. Still, the Semantic Web technologies could be of use. You could use your own text mining tool to extract metadata (unlike the document's own data in the meta tag). You can then combine this with other document's metadata and create your own ontology. You can use ontologies to further classify other documents or sets of documents.

If you can trust other people and their metadata extraction then things get even better or even exponentially better as you trust more (people, groups, companies, etc.) and/or use more tools..."

I hope I've got it...

New version of 3Store

3store is a scalable RDF triplestore and query engine, made available under the GNU General Public Licence and funded by the AKT Consortium.

It is known to scale well to KBs of >10 million triples and this version is running as the production RDF KB for the AKT project.

New features:
* Produces all RDF entailments except RDFS 10 (ContainerMembershipProperty)
* Supports model queries via a 4th RDQL pattern member
* Interactive and batch mode RDQL query tool (tstore_rdql)
* Perl RDQL interface module
* Supports multiple RDF databases on one machine
* Comparison operators in RDQL queries (> and <)

Several parsing and crash bugs have been fixed in the engine.

The Periodic Table

Tom Lehrer's The Elements in Flash format.

Sunday, July 27, 2003

Various Articles

Computer Visions: A Conversation with David Gelernter "The Linda-Jini technology and Linda-JavaSpaces technology connections seem straightforward. In many ways, Sun is the leading intellectual force in the software world; both Jini technology and JavaSpaces technology are systems I admire highly, and I'm proud that our ideas were useful."

An Open Source Strategy for the Open Group "But sorry, vendors! The easy times are over. A market that embraces Open Source is a highly competitive market, and you'll have to work harder for the customer."

RDF tools are beginning to come of age "Redland is a C-based toolkit with many language bindings, including Python, Perl, and Java. It's comprised of an RDF parser, raptor, and a data store. The store currently uses Berkeley DB files, but support for underlying SQL stores is underway. For my example, I'll use the Python bindings to Redland."

RDF is a Web technology

Missing isn't broken: data validation and freedom on the Semantic Web "Missing isn't broken. In the general case, you are free to say as much, or as little, in your RDF document as you like. RDF vocabularies such as FOAF, Dublin Core, MusicBrainz, RDF-Wordnet don't get to tell you what to do, what to write, what to say. Instead, they serve as an interconnected dictionary documenting the meaning of the terms you're using in your RDF documents."

"If the stark notion of 'valid -vs- invalid' document checking doesn't make sense in the decentralised Semantic Web environment, how can we make things easier for developers who are trying to work with this free-flowing mix of RDF markup? If nothing is mandatory, then how can they write code that knows what to expect?

There are several answers here. The first is that, if we want this to scale to the planet, we have to accept that one size won't fit all, that different parties will want to say quite varying things in their FOAF documents, and that our ability to impose our views on their documents is limited."

Friday, July 25, 2003

The Cult of Tim

Social Meaning and the Cult of Tim "Perhaps the most disturbing aspect of the entire social meaning debate is the degree to which people uncritically defer to Berners-Lee's "intuition" and "vision", that is, to his admittedly incompletely expressed idea about the Semantic Web. Few people think that Berners-Lee's ideas about the Semantic Web are perfectly or completely formed. Everyone, including Berners-Lee himself, agrees that they are intuitions, which implies the idea that he can see further than he can say, that he can reach further than he can grasp, at least for now.

One obvious point to make is that there are a lot of people trying to help Berners-Lee realize his intuited vision and that he wields more influence and authority over this complex process than any other single person. Perhaps that is perfectly appropriate. However, the problem arises when other people, who have less moral authority, disagree with Berners-Lee. I have heard it said several times, although few people seem willing to commit to this view publicly, that Berners-Lee should be exempt from public criticism because the realizability of the Semantic Web rests upon Berners-Lee's reputation more than upon any other single factor. "

As Berners-Lee says:
" ...we are not analyzing a world, we are building it. We are not experimental philosophers, we are philosophical engineers. We declare "this is the protocol". When people break the protocol, we lament, sue, and so on. But they tend to stick to it because we show that the system has very interesting and useful properties (emphasis added).

The architecture...defines an "authoritative" or "definitive" meaning, to which "meaning" in wittgensteinian sense and "intended meaning" in [an] ethical or legal sense generally approach as closely as they can, and close enough for the system to work and be unbelievably useful to millions of people. "

I'm of the, obviously ignorant, opinion that this is an engineering problem that will be fixed or continue to evolve to a better overall solution. Instead of arguing about this I'm thinking "killer app".

Tim Bray: "Whereas I’m repeatedly on the record as being more than a bit baffled by what the argument is about, and have hinted unsubtly that maybe it’s at the angels-on-the-head-of-a-pin level, a lot of people who are demonstrably smart care a whole lot about it. Anyhow, I can’t ignore it as long as I’m on the TAG, which is at least until we get that damn Architecture document shipped or it’s clear we’re not going to."

The ESW Wiki's entry on the Social Meaning Group has some background too.


A SIMULATOR SUPPORTING LECTURES ON OPERATING SYSTEMS "The SOsim has been used at the CS department of Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Brazil. There it was able to clearly demonstrate how a simple OS lecture can be improved by showing those aspects demanded in the complete understanding of the subject. In this way, the teaching and learning process becomes far more efficient."

Homepage (English translation). Developed using Delphi, it runs on Windows.

Wednesday, July 23, 2003

Various Updates

Classifier4J 0.3 "I now have Classifier4J and nntp//rss working together to do Bayesian classification of RSS feeds."

RDF extraction from HTML "Currently it extracts a set of blurbs of RDF and a list of URI's it believes contain RDF. Currently the only documents parsed are HTML.

There are a great many ways to include RDF in HTML files. I located one summary of possible methods. I am also aware that Creative Commons advocates storing the RDF that describes the license for a page in a comment tag."

It's official - we are not stupid! is another commentary on Mark Butler's paper.

Even Primates use Macs

Primate Programming: The Evolution of IT "Primate Programming Inc, has received 10 million in 1st round financing from five Menlo Park venture capital firms. That first chimp, named Brainerd, is now leading several primate-only software development teams at the company’s headquarters...Bajek said he first got the primate programming (PP) idea after learning about Koko, a female gorilla that used an Apple Mac to communicate with her handlers. Koko held regular webchats on the Web and became famous for her utter mastery of language. One surprise is that Koko is now demanding to move to Maui, Hawaii. “Prima donna” behaviors in programmers are well known, but no one saw this one coming from a primate just starting out in the business. "

It's Integration

Enterprise Integration Patterns "Very few business applications can live in isolation. More often than not, applications have to be integrated with other applications inside and outside the enterprise. This integration is usually achieved through the use of some form of "middleware". Middleware provides the "plumbing" such as data transport, data transformation, routing etc."

ROI analyses key factor in web services take-off, report "The results of the poll taken in Germany show that 81 per cent of the companies interviewed judged security as the main restriction on web services taking off, while 74 per cent evaluated a missing (or at least a not properly declared) ROI as a real handicap in regard to web service implementation. Further barriers to success that were named included missing budgets, non-integration of existing applications, and an absence of key skills and know-how within organisations."

Thursday, July 17, 2003

Back to Business

According to Gartner, the Semantic Web will be one of four technologies to drive the service-driven IT recovery.

"...four new application/technologies (the semantic Web, the grid, enterprise performance management (EPM) and "net fabric") as being significant areas of innovation and opportunity taking shape behind the high-profile push toward Web services...Service and utility will be the overarching themes for new opportunity in the next five years as the IT industry undergoes its next paradigm shift."

Business Model for the Semantic Web "All you need to do is move up a thin layer of interoperability. Just as database systems suddenly became compatible by adopting a consistent relational model, so your unstructured data can also adopt a relational model, and get all the benefits you need to solve these problems.

The relational language for data on the Net is called RDF. When information from two sources in RDF needs to be merged, you basically concatenate the files into one big file. When you want to extend a query on an RDF file to include constraints from another, you just write it in."
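
The "concatenate the files" claim can be sketched in a few lines of Python. This is only an illustration, not anything from the quoted article: it assumes both sources are serialized as N-Triples (one triple per line) and that neither uses blank nodes, which would need renaming before a merge. The example.org URIs, the sample data and the `merge_ntriples` helper are all made up.

```python
def merge_ntriples(*documents):
    """Merge N-Triples documents by concatenating them.

    Assumes one triple per line and no blank nodes. Duplicate
    statements collapse because an RDF graph is a set of triples.
    """
    merged = set()
    for doc in documents:
        for line in doc.splitlines():
            line = line.strip()
            # Skip empty lines and N-Triples comments.
            if line and not line.startswith("#"):
                merged.add(line)
    return sorted(merged)


# Hypothetical data from two independent sources.
cast_source = """
<http://example.org/film/1> <http://example.org/vocab#title> "The Princess Bride" .
<http://example.org/film/1> <http://example.org/vocab#actor> "Cary Elwes" .
"""

review_source = """
<http://example.org/film/1> <http://example.org/vocab#title> "The Princess Bride" .
<http://example.org/film/1> <http://example.org/vocab#rating> "5" .
"""

merged = merge_ntriples(cast_source, review_source)
for triple in merged:
    print(triple)
```

Because both sources use the same URI for the film, the merged graph describes a single resource with a title, an actor and a rating, and the repeated title statement appears only once.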

Lists of Semantic Web companies: Semantic Web Business SIG and WS Index.

Modeling the Web

Modeling the Internet and the Web The TOC lists chapters on text analysis, link analysis, crawling, and others.

Tuesday, July 15, 2003

I.T. does Matter

Harvard Business Review has published some of the better responses as to whether IT matters: "Viewed more broadly, transaction costs encompass such challenging business issues as the creation of meaning, the building of trust, and the development and dissemination of knowledge."

"Easy availability of information technology makes it increasingly valuable. E-mail, fax, and cell phones gain in utility as they become more widely used, because they can be acquired on attractive terms."

Saturday, July 12, 2003

Mozilla Wonder GUI

Topicalla "Topicalla allows one to view information using a UI that is generated based on the kind of data available. The user can augment, customize and replace both the data and UI as they see fit. For instance, the user might want to view information about a movie. Data could be provided from different sources, for instance, cast info from one source, reviews from another and showtimes from yet another. The UI, or even parts of it may be provided by yet other sources.

Topicalla uses a custom template language, which is kind of like XUL templates, but uses an XPath-like language for gathering info from RDF sources." Screenshots.

All I can say is "Wow!" and "About time!". All good software deserves a developer blog and this is no different.

Thursday, July 10, 2003

GNOME Dashboard

GNOME Dashboard "The Dashboard is a persistent rectangle of screen real estate on the right-hand-side of the user's screen. The goal of the dashboard is to display links to relevant objects as the user goes through his day-to-day activities." Screenshots here, and before graphing and after. The Dashboard developer blog is here.

Instrumenting Phone Manager for Dashboard "A recently implemented feature in Dashboard is cluepacket-rewriting. Cluepackets are the short XML messages applications send to Dashboard. Rewriting them enables backends to augment clues to help other backends. For instance, when an instant message comes in, the clue contains the remote user's nick and the content of the message. Your addressbook contains a mapping from an IM nick to an email address and home page URL. So, when the addressbook backend receives the clue, it augments it with the URL and email address. Other applications are then able to show more relevant information."
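
The addressbook example in that quote is easy to picture in code. The sketch below is a guess at the shape of the idea, not Dashboard's actual implementation: real cluepackets are XML messages, while here a clue is simplified to a `(type, value)` tuple, and the clue type names, the `ADDRESSBOOK` data and the `addressbook_backend` function are all hypothetical.

```python
# Hypothetical addressbook: IM nick -> contact details.
ADDRESSBOOK = {
    "jdoe42": {
        "email": "jdoe@example.org",
        "homepage": "http://example.org/~jdoe",
    },
}


def addressbook_backend(clues):
    """Return the original clues plus any the addressbook can add."""
    augmented = list(clues)
    for clue_type, value in clues:
        if clue_type == "im_nick" and value in ADDRESSBOOK:
            contact = ADDRESSBOOK[value]
            # Rewrite the packet: append what we know about this nick
            # so that later backends can match on email or URL.
            augmented.append(("email", contact["email"]))
            augmented.append(("url", contact["homepage"]))
    return augmented


# An incoming instant message carries only the nick and the content.
incoming = [("im_nick", "jdoe42"), ("content", "lunch at noon?")]
rewritten = addressbook_backend(incoming)
print(rewritten)
```

A mail backend that only understands email addresses could now find related messages even though the original clue carried nothing but an IM nick.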

Convera Converts Customers

New Enterprise-Wide Deal for Convera in the Netherlands Banking Sector "The Rabobank Group comprises Rabobank, the Netherlands largest retail bank, investment specialist Robeco, insurance company Interpolis and finance product group De Lage Landen... Initially the solution will be rolled-out to 15,000 employees nationally. The second phase will see the system extended to 43,000 employees worldwide. As a result, Rabobank employees will have the ability to accurately search across corporate knowledge, regardless of the data format, location or language."

A previous press release is also interesting: "Convera is cited as a leader in extracting and exploiting structure from unstructured data, an essential requirement within the search market as XML and RDF become used more pervasively...Following his evaluation, the report noted that Convera can access a wide array of repositories and file systems. It has working installations that support over a petabyte of content."

Wednesday, July 09, 2003

The Power of Low-End Users

AI Startup? Strategic Planning for a Successful Start-up "Historic product performance data shows technology advancements occur at a faster rate than users can absorb the performance improvements. Consequently, product performance improvements migrate towards the high-end users. The natural tendency of well-run businesses is to design for their best customers. Financial pressures migrate products to the high-end where profit margins are greater — leaving the low-end users disenfranchised."

"These low-end users, disenfranchised by having to pay premiums for product performance beyond their requirements, are willing to switch products. For example, the first transistors did not have the performance to replace vacuum tubes in the large stand-alone television sets. They did however, offer individuals with poor hearing the first opportunity for a small hearing aid. Hearing impaired users jumped at the chance to use products with this technology, despite the inferior audio performance when compared to the existing cumbersome products."

"Many new product ideas will come from these experts working with potential end-users."

As they mention, in the era of plentiful computing (they say PDAs) and the Internet, the disenfranchised can be given access to these technologies and drive them in unexpected ways.

Disruptive technologies were one of the first things I blogged about.

GPLed Coverage Tool

"jcoverage identifies how many times each line of code in your application has been executed. This is particularly useful in the area of testing, because you can see which parts of your software remain untested."

There's a comparison of it versus Clover and JProbe. Instrumenting the code at the byte-code level is especially interesting.
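
To make "how many times each line of code has been executed" concrete, here is a rough Python sketch. It is not how jcoverage works: jcoverage instruments Java byte-code, whereas this swaps in Python's `sys.settrace` hook to count line events. The `count_line_executions` and `classify` names are invented for the example.

```python
import sys
from collections import Counter


def count_line_executions(func, *args, **kwargs):
    """Run func and return (result, Counter of line number -> hits)."""
    hits = Counter()
    target_code = func.__code__

    def tracer(frame, event, arg):
        # Only trace frames that are executing the target function.
        if frame.f_code is not target_code:
            return None
        if event == "line":
            hits[frame.f_lineno] += 1
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        # Always restore whatever trace function was installed before.
        sys.settrace(previous)
    return result, hits


def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"


result, hits = count_line_executions(classify, 5)
first_line = classify.__code__.co_firstlineno
# Offsets relative to the def line: 1 is the `if`, 2 is the
# "negative" branch, 3 is the final return.
executed = {line - first_line for line in hits}
print(result, sorted(executed))
```

Any offset missing from `executed` is a line the call never reached, which is exactly the "remains untested" signal the quote describes.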

Monday, July 07, 2003

Semantic Web Hype

Via this posting, New, improved semantic web - now with added meaning, I read this by Dr Mark Butler. He lists the following issues with the Semantic Web:
* Produce a better XML format,
* Demonstrate RDF/XML, RDFS and OWL solving practical problems,
* Research: an XML version of OWL, a more efficient in-memory and persistent version of RDF (with locking, transactions, etc.), a simple API for RDF, a query interface above triples, determining whether context-based models (quads) are better, devising a methodology for using Semantic Web tools, and comparing and contrasting with other existing work.

Most of these questions already have answers. Commercial companies offer transactional triple stores with the ability to query above the triple layer. Native transactional stores are scarce in the Open Source world (though there are many that work on top of SQL databases), which is probably stifling the adoption of RDF. However, such stores are definitely possible and give substantial, necessary improvements to the usability of RDF. The whole layered approach of RDF is similar to the abstract relational model in databases. The separation of datatypes, for example, gives positive benefits over most SQL database implementations.

Programming APIs for RDF such as Jena and others (available in tools like Redland, KAON, Sesame, swoRDFish, etc.) are quite suitable and simple to use. I would say these are less complicated than the DOM API for XML, for example.

There are already many companies using RDF, schemas, and ontologies (maybe OWL is a little too new) to solve practical problems. Much of this work is done without much publicity. Companies like Sun or the work done by the Ontoknowledge group have not only used Semantic Web technologies but also developed methodologies on how to use these tools.

I think the only things missing are a better XML format for RDF and possibly an XML version of OWL. I'm unsure if an XML version of OWL is necessary, though.

Sunday, July 06, 2003


Tim O'Reilly: Software licenses don't work "Nobody is pointing out something that I think is way more significant: all of the killer apps of the Internet era: Amazon (.com, Inc), Google (Inc.), and They run on Linux or FreeBSD, but they're not apps in the way that people have traditionally thought of applications, so they just don't get considered."

"Let's stop thinking about licenses for a little bit. Let's stop thinking that that's the core of what matters about open source...Open source is a contributor to the commoditization of software, but it's not the only contributor. Open standards lead to commoditization. The Web browser is proprietary, but it's a commodity."

People also seem to ignore that Semantic Web technologies are perfect for services and integration work. Open source, Semantic Web enabled technology for system integrators...seems to make sense.

Saturday, July 05, 2003

Learning Programming

Programming Environments for Novices "Each novice programming environment (or family of environments) is attempting to answer the question, "What makes programming hard?" Each answer to that question implies a family of environments that address the concern with a set of solutions. Each environment discussed in this chapter attempts to use several of these answers to make programming easier for novices."

Discusses environments/languages such as Logo, Moose, Smalltalk, Squeak, Prolog and the very good ToonTalk.

There was also a recent interview with Alan Kay in HP's Business View magazine.

Still Clueless over Free Software

The End of Idealism "...a lot of the intellectual property in Linux is actually owned by companies that never officially agreed to make it available under an open-source license. Most obvious here is The SCO Group...But there are others, including Microsoft, that could do the same if they chose. "

"The most successful open-source movement prior to Linux was the hacker movement—not exactly the kind of folks that corporate decision-makers want associated with their platform software."

This proves that there is still a lot of ignorance (or, if you're paranoid, maybe corporate-sponsored fear, uncertainty and doubt) about free software.

Friday, July 04, 2003

Google Changes Everything

Information Foraging: Why Google Makes People Leave Your Site Faster "In the last few years, Google has reversed this equation by emphasizing quality in its sorting of search results. It is now extremely easy for users to find other good sites.

Information foraging predicts that the easier it is to find good patches, the quicker users will leave a patch. Thus, the better search engines get at highlighting quality sites, the less time users will spend on any one site."

"The patch-leaving model thus predicts that visits will become ever shorter. Google and always-on connections have changed the most fruitful design strategy to one with three components:

* Support short visits; be a snack
* Encourage users to return; use mechanisms such as newsletters as a reminder
* Emphasize search engine visibility and other ways of increasing frequent visits by addressing users' immediate needs"

RDF Extractor for Office

More, More, MORE "The tool is a simple command-line utility that generates an RDF document from one or more Office documents. Access to the embedded properties is made possible by the POI HPSF API, while the RDF manipulations are performed by Jena." MORE Schema

Group Enemy

A Group Is Its Own Worst Enemy "To the question: Do you view groups of people as aggregations of individuals or as a cohesive group, his answer was: "Hopelessly committed to both.""

Unifying Information

The Right Solution: Federated Search Tools "I believe that federated or cross-database search tools now available on the market are the correct solution for unifying access to a variety of information resources...Representative players in this market space are MuseGlobal, Endeavor, Ex Libris, WebFeat, and Fretwell-Downing. All have products that unify searching of a variety of databases (known as "targets"). They also provide additional services such as authentication, merging, and de-duping."

"In the case of a large academic library, an effective discovery service may require searching a number of institutional repositories...While most such repositories support a standard that allows metadata to be harvested, it is not a searching protocol like Z39.50. In order to point a federated search tool at repositories such as these, a library would first need to harvest the metadata from the repositories of interest and make that metadata available to the federated search tool as a searchable target. In such a case, the library would be engaged in target creation."

Thursday, July 03, 2003

Tamino Updated

Software AG enhances XML document storage "Version 4.1.4 enables metadata searches of non-XML documents via the Tamino Non-XML Indexer, which can search on documents such as those in Microsoft Office or Sun StarOffice.

Also featured in Non-XML Indexer are phonetic document searches and retrieval text highlighting to enable better access to XML documents and other types of business and multimedia content.

Non-XML Indexer is a plug-in module for Version 4.1.4 which works with Tamino XML Server to extend the set of criteria for searchable metadata, such as author, creation date, date last modified or file size. This will enable faster, more intelligent searches of non-XML documents. Indices can be created for standard document formats such as Microsoft Office."

Learning Java

IBM pushes game as learning tool for Linux admins "CodeRally is a Java-based, real-time programming game based on the Eclipse platform. In the game, players program race cars to compete for points on a simulated racetrack. The game makes it easy for learning developers and system administrators to learn Java and learn about the Eclipse framework. It allows users unfamiliar with Java to easily compete while they learn the Java language.

The Eclipse platform is a framework or IDE [integrated development environment] built for Windows, Linux and other platforms. The framework helps developers build their applications and plug-ins."

Tuesday, July 01, 2003


"Another very strong argument that has hardly been used is the huge success built on top of common, patent and royalty-free standards. If ever there was a counter-argument to big business' innovation claim it is that the Internet is so successful because there weren't constraints or patents on it. Due to this, it has grown hugely, created new markets and so benefited everyone."

Open source prepares to kiss EU patent ass goodbye

"Berners-Lee laid out a new and somewhat controversial plan for keeping the Web working, the W3C Patent Policy, which he said would reduce the threat of patents blocking future Web infrastructure developments...."While the policy necessarily involved choices that could be perceived as threatening certain business models, I believe that this policy is the right one, from a revenue perspective, for all who seek to contribute to the development of the Web and who ultimately seek to profit from its growth," Berners-Lee said.

"However, it does not preclude licensing activity for all technologies on the Web. Indeed, by supporting the continued growth of the underlying Web infrastructure and by growing the overall market for the Web, this policy increases the opportunity for financial gain (including from patent licenses) on applications that depend upon the Web," he said. "

Web 'Shaman' Fights His Demons

"Who owns your Sidekick? T-Mobile does, apparently, even if you spent full retail on it (I dropped $250 on mine). You need T-Mobile's permission to install software on their device. T-Mobile will, from time to time, decide to erase software from your device. And when you stop subscribing to their service, T-Mobile will delete all your data forever, without giving you any mechanism for moving it off the device (and without giving you the ability to design a tool that would let you do this)."

T-Mobile drives a nail into the Sidekick's coffin