Friday, February 28, 2003

Sony vs Sony

This isn't very new, but I enjoyed it just the same:
"As a member of the Consumer Electronics Association, Sony joined the chorus of support for Napster against the legal onslaught from Sony and the other music giants seeking to shut it down. As a member of the RIAA, Sony railed against companies like Sony that manufacture CD burners. And it isn't just through trade associations that Sony is acting out its schizophrenia. Sony shipped a Celine Dion CD with a copy-protection mechanism that kept it from being played on Sony PCs. Sony even joined the music industry's suit against Launch Media, an Internet radio service that was part-owned by - you guessed it - Sony. Two other labels have since resolved their differences with Launch, but Sony Music continues the fight, even though Sony Electronics has been one of Launch's biggest advertisers and Launch is now part of Yahoo!, with which Sony has formed a major online partnership. It's as if hardware and entertainment have lashed two legs together and set off on a three-legged race, stumbling headlong into the future."

Wednesday, February 26, 2003

Semantic Publishing

An Easy Semantic Web The 4 principles from existing Semantic Web applications:
* Guarantee the independence of the content creator.
* Let the content creator choose the license.
* Freely importable ontologies in your publishing system.
* Let machines communicate without human intervention.

A great quote: "Everyone agrees that a content writer is not necessarily a librarian who possesses the knowledge to classify his work, and it is true that most of the time, the metadata provided in webpages are relatively poor because they have not been established according to a systematic, shared method."

JemBlog is blogging software based on Jena. Danny Ayers explains, "The idea is to build the skeleton of a server-side blogging system along the lines of the core of Movable Type etc. The difference being that the model used behind the scenes is RDF, so true semantic blogging is possible." I was considering a similar thing with SnipSnap. SnipSnap mentions RDF and DAML+OIL as well as the possibly RDF based IM Expert Search.

Tuesday, February 25, 2003

Review of "A New Kind of Science"

Reflections on Stephen Wolfram's "A New Kind of Science" by Ray Kurzweil "So how do we get from these interesting but limited patterns of Class 4 automata to those of insects, or humans or Chopin preludes? One concept we need to add is conflict, i.e., evolution. If we add another simple concept to that of Wolfram's simple cellular automata, i.e., an evolutionary algorithm, we start to get far more interesting, and more intelligent results. Wolfram would say that the Class 4 automata and an evolutionary algorithm are "computationally equivalent." But that is only true on what I could regard as the "hardware" level. On the software level, the order of the patterns produced are clearly different, and of a different order of complexity." From Psybertron.

Monday, February 24, 2003


Lawrence Lessig: Exclusive rights to stagnate "American software developers will continue to choke on software patents, especially as more and more get enforced in massively expensive litigation. These patents will in turn inhibit the work of independent developers and protect large developers over small. As Mr Gates rightly concluded in his 1991 memo, "established companies have an interest in excluding competitors." Patents give them just one more tool."

In the document that Bill Gates cites, "Against Software Patents", they note:
"...Apple was sued because the Hypercard program allegedly violates patent number 4,736,308, a patent that covers displaying portions of two or more strings together on the screen--effectively, scrolling with multiple subwindows. Scrolling and subwindows are well-known techniques, but combining them is now apparently illegal...the technique of using exclusive-or to write a cursor onto a screen is both well known and obvious...But it is covered by patent number 4,197,590, which has been upheld twice in court..."
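To show how small the exclusive-or cursor trick is, here is a minimal sketch in Python. The one-bit framebuffer, the `xor_cursor` helper and the pixel values are my own illustrative assumptions, not anything taken from the patent text. XOR-ing the cursor mask onto the screen draws it, and XOR-ing the same mask again restores the original pixels, so nothing underneath has to be saved.

```python
def xor_cursor(screen, mask, top, left):
    """XOR a cursor mask into a screen buffer in place (hypothetical helper)."""
    for dy, mask_row in enumerate(mask):
        for dx, bit in enumerate(mask_row):
            screen[top + dy][left + dx] ^= bit
    return screen


# A tiny 4x4 one-bit "screen" with some existing content (made-up values).
screen = [
    [0, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
]
original = [row[:] for row in screen]

# A 2x2 cursor mask: 1 means "flip this pixel".
cursor = [
    [1, 1],
    [1, 0],
]

xor_cursor(screen, cursor, 1, 1)      # first XOR draws the cursor
drawn = [row[:] for row in screen]
xor_cursor(screen, cursor, 1, 1)      # second XOR erases it again
```

After the second call `screen` is identical to `original`, which is the whole point of the technique.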

Friday, February 21, 2003

Intra-Cranial Knowledge Unnecessary

Knowledge-on-demand "When employees, consultants, and executives represent the most important capital in a corporation, and they keep it in their own brains, then metainformation becomes essential for an organisation to survive. Strategies, decisions, historical evaluations, constructions and work-flows must be documented, or else the whole business stands or falls with a certain person's leaving or staying...Everything will become metainformation.

All kinds of traditional knowledge - especially good old basics from elementary school - will become checkpoints, beacons that guide us to other kinds of knowledge, and this knowledge will help us to find a context for what we retrieve from different computer networks, it will help us evaluate veracity and relevance."

This article has some pretty good descriptions of the types of information, some of the problems with the distribution of it over the Internet and how it can all be fixed with a healthy dose of skepticism.

"At the same time, I am glad that I am somewhat well-rounded. For the same reasons that I would not dare trust a calculator without having some idea of multiplication tables, I would not trust the Britannica without having at least a cursory overview of history, geography and other basic facts. That is why I think it is incredibly important that our schools do not make it their primary goal to turn our children into full blown multimedia producers, but rather to teach basic subjects."

The extraction of text from any source resonates greatly with the work that we do. Just about everything we do is based on getting to the extracted text to obtain meaning.

"Thomas Jefferson is often quoted in the discussion of freedom of information. He said that one can share information without losing anything, in the same way that someone “who lights his taper at mine, receives light without darkening me.” It is, of course, a very appealing thought. Still, one has to wonder if that was not an attitude that was easily taken and more affordable when, at the time, the material world was still the focal point for commerce and trade."

Readings from Semantic Web Courses

Here are a few good sources of reading on the Semantic Web from university course material:
Semantic Web Course Material, CS594 "Semantic Web: Models and Query Languages" and The Semantic Web.

Jeff Heflin (one of the lecturers of the previously mentioned courses) has written Semantic Web Technologies for Aerospace (for the IEEE Aerospace Conference in March):
"This paper will cover promising aerospace applications and significant challenges for Semantic Web technologies. Potential applications include higher-level information fusion, collaboration in both operational and engineering environments and rapid systems integration. The challenges that will be discussed include the complexity of ontology development, automation of markup, semantic mismatch between current object-oriented models and Semantic Web ontologies, scalability issues related to reasoning with large knowledge bases and technology transition issues."

Wednesday, February 19, 2003


Overview of SIMILE "SIMILE is a joint project conducted by the W3C, HP, MIT Libraries, and MIT's Lab for Computer Science. SIMILE seeks to enhance inter-operability among digital assets, schemas, metadata, and services. A key challenge is that the collections which must inter-operate are often distributed across individual, community, and institutional stores. We seek to be able to provide end-user services by drawing upon the assets, schemas, and metadata held in such stores.

Simile will leverage and extend DSpace, enhancing its support for arbitrary schemas and metadata, primarily through the application of RDF and semantic web techniques. The project also aims to implement a digital asset dissemination architecture based upon web standards. The dissemination architecture will provide a mechanism to add useful "views" to a particular digital artifact (i.e. asset, schema, or metadata instance), and bind those views to consuming services."

The list of challenges and opportunities includes items such as "In this new digital domain, libraries would ideally offer “One Stop Shopping” to all information resources of interest to its constituents, and act as a flexible “information clearinghouse”."

Monday, February 17, 2003

SnipSnap Expert Search

IM Expert Search "Ben is a software developer who uses the IM P2P network to find experts. Because he has some questions Ben searches for an expert for Java and Oracle. He enters "Search: Java Oracle" into his IM client and sends this to his knowledge agent. The agent searches the P2P network of the company for skills of experts. Every local SnipSnap of employees, teams or departments stores expert skills and offers them to the outside and the P2P network as RDF. The P2P network finds without a central server the experts and returns the result to Ben's IM client. Ben then can directly contact the expert or surf to the website."

Sounds similar to Tacit. They are appearing at Demo 2003 which is also showing Socratic Learning (collaboration), Meaningful Machines (concept analyser), MagigTech (quantum information processing) and Terraplayer (wireless MP3 player).

More Applications for the Semantic Web

COMMUNICATION: Enhanced: Science and the Semantic Web (if it doesn't work go to: and click on "Full Text of Article"). "Current Web technology is clearly insufficient for the needs of interdisciplinary science and comes up short when it comes to supporting the needs of the collaborative and interdisciplinary "e-Science." Fortunately, new Web technologies are emerging with the potential to revolutionize the ability of scientists to do collaborative work...The Center for Bioinformatics of the U.S. National Cancer Institute (NCI), as part of the Metathesaurus [HN8] project (3), is turning a large vocabulary of cancer research terms into a machine-readable "ontology" [HN9]--essentially an expanded thesaurus that delineates precise relationships between the vocabulary items and that is available in the RDF-based Web Ontology Language, OWL [HN10] (4)."

Another article written by Professor Hendler.

Friday, February 14, 2003

XP ported to iPod

""It wasn't easy," said Mason, 28. "Obviously, I had to use the 20 GB iPod, and even then I had to take out a whole mess of things. File navigation, control panels, applications... And, of course, there's no room for any MP3s anymore. And it crashes a lot. And it was hard getting the product activation code in using the scroll wheel and those buttons."

Thursday, February 13, 2003

Papa Smurf is a Communist

"It is just now that I have realized what I was really tuning into each and every Saturday morning was in actuality Communist Propaganda!! Yes that is correct, Papa Smurf and all of his little smurf minions are not the happy little characters Hanna Barbara would have us believe! "

"The most disturbing but solid proof that the Smurfs are communist propaganda is the striking resemblance that Brainy Smurf bears to Trotsky." from the Belgium doesn't exist page.

Moving Towards Intelligence

All by Themselves (Registration Required) "“The Semantic Web is a step toward AI, a step toward representing the meaning of the content on a Web page. So it doesn’t utilize AI, nor enable AI; but it is moving in the latter direction.”

By design, the Semantic Web relies on a number of assertions about data, the combinations of which allow software to infer additional facts. Yet the initial number of assertions in the chain places a mathematical constraint on the number of combinations possible. “Distinctions that are unimportant in one context—and hence not tagged separately—are critical in another,” explains Lenat."

Wednesday, February 12, 2003

XML Entropy

Heaven or XM-hell? "No one intended for our XML data to grow unwieldy over the past few years, but it did. It takes a lot of hard work and attention to maintain the semantic integrity of the data represented in your XML, as your business morphs and changes and new people come along to touch and manipulate the data in different ways. It’s particularly difficult when you’re converting data created by people, ensconced in the daily ebb and flow of messy human life, into a machine-readable format intended for the ages."

As mentioned by Jon Udell

ClearForest Product of 2002

Search and Categorization " ClearTags 4.0 drills deeply into content to tag documents along semantic, statistical and structural parameters. As a result, you can search documents very specifically for events, locations, people and facts as well as for words and concepts. A new user control panel allows the definition of different tagging schemes for any type of document stream, and at the same time monitoring of the entire tagging process.

ClearTags allows companies to precisely identify and automatically tag multiple relevant entities, facts and events buried within large textual repositories. The process produces richly-tagged XML files, which facilitates data re-use and the manipulation of content with other applications, enhancing return on investment. When combined with ClearForest's ClearSight module, this product produces visual search maps that depict relationships among people, locations or events, providing insight into their associations. Pricing starts at $100,000. "

Also Enterprise Content & Collaboration Technologies.

Relational is all you need

Database pioneers ponder future ""But Chris Date, an associate of Codd and author of the book An Introduction to Database Systems, cautioned panelists not to count out relational technology.

"I would just like to interject a note of caution: Anybody that is starting to think about these new models must understand the relational model thoroughly. First, it may turn out that we already have the model that we need," Date said."

""The mantra of the day is Web services, and I'd like to put in a highly cautionary note [about] Web services taking over the world," Stonebraker said. He added that issues such as semantics over records will hinder Web services. For example, a salary record in one enterprise may be defined differently at another, he said.

"These sort of semantic issues are going to plague Web services the minute you get beyond things like e-mail, which are just text-based services," Stonebraker said."
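Stonebraker's salary example is easy to make concrete. The sketch below is purely illustrative: the `SalaryRecord` fields, the two records and the `annual_amount` helper are all assumptions of mine, not anything from the panel. It shows two "salary" values that look comparable as plain numbers but only make sense once the period they are defined over is stated.

```python
from dataclasses import dataclass


@dataclass
class SalaryRecord:
    employee: str
    amount: float
    period: str      # "year" or "month": the part a bare number leaves out
    currency: str


def annual_amount(record):
    """Normalize a salary to a yearly figure using the record's declared period."""
    periods_per_year = {"year": 1, "month": 12}
    return record.amount * periods_per_year[record.period]


# Two hypothetical enterprises exposing "the same" salary field.
from_a = SalaryRecord("alice", 60000.0, "year", "USD")
from_b = SalaryRecord("bob", 5000.0, "month", "USD")

# Comparing raw numbers silently mixes yearly and monthly figures.
naive_comparison = from_a.amount > from_b.amount

# Comparing after normalization uses the declared meaning of each record.
informed_comparison = annual_amount(from_a) > annual_amount(from_b)
```

The naive comparison says the first salary is larger; once both are annualized they turn out to be equal. A service that passes only the number across has no way to tell.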

Tuesday, February 11, 2003

Make you famous

Chapter 16 of Practical RDF covers Plugged In Software's Tucana KnowledgeStore. She has even included two diagrams from our marketing material. Cool! There's also some mention of other companies like Adobe and Intellidimension.

"Bottom line: the power of TKS is just that -- power. By combining simple and intuitive interface with an architecture that's built from the ground up for large-scale data queries, the application is meant to get you up and running, quickly, and to you running, quickly." It is kinda embarrassing to get such a positive review.

Sunday, February 09, 2003

There are no lies

Web Ontology Language (OWL) Guide Version 1.0 Finally got around to reading the latest version. Reading it reminded me of the metadata lies. "Consequently, OWL generally makes an open world assumption. That is, descriptions of resources are not confined to a single file or scope. While class C1 may be defined originally in ontology O1, it can be extended in other ontologies. The consequences of these additional propositions about C1 are monotonic. New information cannot retract previous information. New information can be contradictory, but facts and entailments can only be added, never deleted."

The "open world" view holds that a statement missing from the system cannot be taken as false, since we only ever have partial knowledge of the system. The closed world (XML, databases, etc.) assumes that what is in the system is complete, so anything not stated is treated as false. Seth Russell's readings on monotonic reasoning provided some more useful information. Now which one is more naive?
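A small Python sketch of the difference, under loud assumptions: facts are plain tuples in a set, the two `ask` functions and the `subClassOf` statements are made up for illustration, and no real OWL reasoning is happening. It only shows how the same missing statement is answered differently, and that merging ontologies adds statements without retracting any.

```python
def closed_world_ask(facts, triple):
    """Closed world: anything not stated is taken to be false."""
    return triple in facts


def open_world_ask(facts, triple):
    """Open world: a missing statement is unknown (None), not false."""
    return True if triple in facts else None


# Ontology O1 defines class C1 (illustrative statements only).
o1 = {("C1", "subClassOf", "Thing")}
query = ("C1", "subClassOf", "Agent")

before_closed = closed_world_ask(o1, query)   # stated nowhere, so "false"
before_open = open_world_ask(o1, query)       # stated nowhere, so "unknown"

# A second ontology extends C1; merging only ever adds statements.
o2 = {("C1", "subClassOf", "Agent")}
merged = o1 | o2

after_open = open_world_ask(merged, query)
still_holds = o1 <= merged                    # nothing from O1 was retracted
```

The closed-world answer would have to flip from false to true after the merge, while the open-world answer only moves from unknown to true, which is what monotonic means here.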

From the Semantic to the Pedantic

Where to place Agora? "Moreover, in light of my semantic web involvement, I'm getting more and more uncomfortable with RDF (see my semantic web fight club pictures in boston in the gallery) and I'm more and more heading myself into the concept of 'data emergence' where you don't go around bothering people to markup their data as *you* like it, but *you* make an effort to collect their data and make a sense out of it. I'm starting to call it 'pedantic web' myself :)

Google showed how much value can be gained out of harvesting of simple information (hyperlink) that locally has no apparent global meaning. As do email replies or IP logs for CVS logins.

There is potentially a huge value in fostering research on data emergence, especially if related to reasonable-sized and well logged communities like ours."

After looking at companies such as Convera, Endeca, FAST and others, centralization is a valid concern and strategy. I still think that RDF gives you the best model. Faceted or not, XML is still a hierarchy hamburger.

This comes in various references such as: Zen, flow and emergence in information models. XML is still limited by its model.

There is a Happy Ending:
"Nevertheless, there are some features of RDF that may fit well with the emerging Infoset-centric XML processing model.

* RDF gives us a model for namespace mixing and data merging
There is no algorithm for merging two XML Infosets, to enable us to pool knowledge acquired from diverse sources. The RDF information model, by contrast, was designed with data aggregation (rather than structured documents) in mind. Merging RDF data is trivial: add the triples extracted from two RDF/XML documents, and store them in a new one.
* RDF views of the Infoset are explicit about the information we can throw away
Transforming Infosets into their RDF graph allows us to throw away irrelevant information, such as the aspects of the Infoset concerned with preserving a representation of document ordering. When we define transformations from an XML Infoset into RDF, we show XML processors which parts of the Infoset can be discarded without losing the essence of the message encoded in that XML."
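The "merging is trivial" point can be sketched in a few lines of Python. This is a simplification under stated assumptions: triples are plain tuples, the `merge_graphs` helper and the example.org statements are invented for illustration, and it ignores blank nodes, which a real merge would have to keep distinct between the two sources.

```python
def merge_graphs(*graphs):
    """Merge RDF-style graphs by taking the union of their triples."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


# Two hypothetical sources describing the same resource.
source_a = {
    ("http://example.org/post/1", "dc:title", "Semantic Publishing"),
    ("http://example.org/post/1", "dc:creator", "example author"),
}
source_b = {
    ("http://example.org/post/1", "dc:title", "Semantic Publishing"),
    ("http://example.org/post/1", "dc:date", "2003-02-26"),
}

merged = merge_graphs(source_a, source_b)
```

The duplicate title statement collapses into one triple and the order of the sources does not matter, which is exactly the document-ordering information the quote says can be thrown away.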

Saturday, February 08, 2003

Semantic Web Applications

Ontolog "OntoLog is a tool for annotating (describing and indexing) video and audio using ontologies – structured sets of terms or concepts." A Java 1.4 application built on top of MySQL and Jena 1.5.

Semantic Blogging and Semantic Portals are two SWAD-E projects. They also performed a study of semantic web applications (no mention of Tucana).

Semantic Blogging suggests 7 ways to upgrade blogging to become semantically aware, including semantic linking, using RSS 1.0, upgrading aggregators and allowing users to run distributed searches across blogs.

For the Semantic Portal they are proposing using the "...Arkive media repository. This is a long term archive of rich multimedia about worldwide endangered species and a large cross-section of non-endangered UK species."

Friday, February 07, 2003

Universal Inbox

Tom recently pointed me to Spaces which is calling itself a Java based Outlook replacement. Not only does it do email, calendaring, tasks and notes but it has a built in RSS aggregator. The news items appear in your space/Inbox. This is similar to News2Mail but there doesn't appear to be a conversion. I've downloaded and played with it and it's very cool (cool meaning fast, seemingly stable, works with my mail settings, easily configurable, etc.) but not quite good enough for me to switch to using it. A recent blog by the author of Spaces mentions that " the future meta-structured storage integrated within applications will become the norm, rather than the exception".

Wednesday, February 05, 2003

Digging Deep

Paid Content Trend Is Dangerous "...all that "good stuff" that publishers think is worth charging for will, for the most part, be visible only to users of those individual publishers' Web sites. News search engines, which send plenty of user traffic to news sites, won't see or refer their users to such material. And that has potentially dire implications for both hybrid-revenue-model news sites and for the search publishers offering premium paid content should start thinking about how to let search engines know about it. His offers a mix of free and paid content, so it's an issue dear to his heart. He points to Inceptor's eLuminator "content optimization service," which automatically extracts key words and phrases from hidden content (paid-access or hidden behind firewalls) and creates optimized pages that can be crawled by search engines. Paid-content publishers should be employing such solutions to let the rest of the world know what they've got."

This sounds like the start of a SW search engine where you could give it constraints such as a series of concepts, how much you'd be willing to pay, and how long you're willing to wait for a result.

Monday, February 03, 2003

Jena 2 Prerelease

"This version of the Jena toolkit is a preview release of Jena 2. It has considerably less functionality than the current Jena 1 release (no DAML support, no persistence support, no reification support). It is newer and thus less well tested in the field. However, it does have fully compliant support for datatyped literals, parsers and writers that are fully conformant with the RDFCore WG last call drafts and a preview of the proposed support for RDFS inference. The RDFCore team intend to complete the functionality missing from Jena 1 as soon as possible."

The tests seem to work fine. This is great: I finally get to play with datatyping and inferencing.

Microsoft 2003 is Atari 1984

Gates addresses Italian Senate amid protest "Bill Gates outlined his optimistic vision of the coming digital decade in a speech to the Italian Senate Friday as open source advocates in penguin suits protested his visit and called on the Italian government to legislate in favor of the use of open-source software by the state administration as an alternative to Microsoft Corp.'s ubiquitous operating systems...Gates showed off "smart" wristwatches recently at the Consumer Electronics Show in Las Vegas. The technology behind such smart devices is called Smart Personal Object Technology, or SPOT, and was developed by Microsoft's research group, building on advances in a variety of different technologies."

Microsoft's SPOT: The Atari Connection "About this time, Atari asked SCA's Karr to design an alternative method of distributing games. The answer? FM subcarrier transmission, the pre-Internet's broadband technology. Karr said FM subcarrier transmission could transmit up to 12 Kbits/s per radio station, and that the technology could be multiplexed to increase bandwidth further. New 2,400-baud (2.4-Kbit/s) modems, meanwhile, cost over $500."

Java Groove

Momentum is a new peer to peer (P2P) collaboration application that's a showcase for Swing and for what's possible using the JXTA support for P2P networking. The Momentum application supports sharing and collaborating on documents and drawings using email, chat, and direct manipulation. And there's no need for a special centralized server - all communication is peer to peer.

It supports both P2P and client/server architectures. Also has WebDAV support.

Saturday, February 01, 2003

Metadata and relational databases

Metadata Clarification "There is an important difference and the reason why Xperanto is not about metadata integration is because it is based on DB2, and relational databases are not good places for storing metadata. The reason for this is that relational databases store tables which define entities. That is all they store. They also contain details about relationships that are defined by means of such things as foreign keys. However, these relationships are not explicitly stored in the database, which is why they are called relational, because that information is implicit."
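One way to read the "implicit relationship" claim is sketched below in Python. Everything here is an assumption for illustration: the two toy tables, the `foreign_key_to_triples` helper and the `worksIn` predicate are made up, and this is not how Xperanto or DB2 work. The table rows only carry a `dept_id` number; the relationship becomes an explicit, named statement once it is rewritten as a triple.

```python
# Relational view: the relationship is implied by a matching key value.
employees = [
    {"id": 1, "name": "Ben", "dept_id": 10},
    {"id": 2, "name": "Ann", "dept_id": 20},
]
departments = [
    {"id": 10, "name": "Engineering"},
    {"id": 20, "name": "Research"},
]


def foreign_key_to_triples(rows, key, predicate, subject_prefix, object_prefix):
    """Turn a foreign-key column into explicit (subject, predicate, object) triples."""
    return {
        (f"{subject_prefix}/{row['id']}", predicate, f"{object_prefix}/{row[key]}")
        for row in rows
    }


# Explicit view: each relationship is now a statement with its own name.
triples = foreign_key_to_triples(
    employees, "dept_id", "worksIn", "employee", "department"
)
```

In the tables, nothing says what `dept_id` means; in the triples, `worksIn` is data that can itself be described and queried.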