Saturday, January 31, 2004

Pragmatic Metadata

Content-aware searching "Brin’s pragmatic stance sharply opposes the idealistic view of the Web’s inventor, Tim Berners-Lee, who continues to evangelize his vision of a Semantic Web full of carefully encoded content that we can precisely search and fluidly recombine. My own humble contribution to this debate is a prototype search engine, now running on my Weblog, that tries to steer a middle course between the Scylla of simple fulltext search and the Charybdis of unwieldy tagging schemes and brittle ontologies."

"Remember, the pools of HTML content that your people routinely create, and the infinitely vaster pools to which they have access, are full of intrinsic metadata — including the links, tables, images, and other elements that occur naturally within HTML content. Mining that metadata may be more practical than you think."

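To make the "intrinsic metadata" point concrete, here is a minimal sketch in Python of pulling links, image descriptions and table headers out of an HTML page with only the standard library. It is not the prototype search engine described in the article; the class name, the function name and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class IntrinsicMetadataParser(HTMLParser):
    """Collects links, image descriptions, and table headers from HTML."""

    def __init__(self):
        super().__init__()
        self.links = []          # (href, anchor text) pairs
        self.images = []         # (src, alt) pairs
        self.table_headers = []  # text of <th> cells
        self._href = None        # href of the <a> currently open, if any
        self._in_th = False
        self._text = []          # text fragments for the open <a> or <th>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self._href = attrs["href"]
            self._text = []
        elif tag == "img":
            self.images.append((attrs.get("src", ""), attrs.get("alt", "")))
        elif tag == "th":
            self._in_th = True
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an open link or header cell.
        if self._href is not None or self._in_th:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None
        elif tag == "th" and self._in_th:
            self.table_headers.append("".join(self._text).strip())
            self._in_th = False


def extract_metadata(html):
    """Return the links, images and table headers found in an HTML string."""
    parser = IntrinsicMetadataParser()
    parser.feed(html)
    parser.close()
    return {
        "links": parser.links,
        "images": parser.images,
        "table_headers": parser.table_headers,
    }


# Hypothetical sample page, not taken from any real site.
SAMPLE = """
<p>See the <a href="http://example.org/rdf">RDF primer</a>.</p>
<img src="chart.png" alt="Query latency chart">
<table><tr><th>Product</th><th>Price</th></tr><tr><td>Widget</td><td>9</td></tr></table>
"""

metadata = extract_metadata(SAMPLE)
```

Anchor text, alt text and column headers are all things authors write anyway, which is why this kind of mining costs so little compared with a separate tagging scheme.
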
Web Services then the Semantic Web

The Web within the Web "All these new protocols, SOAP in particular, took years to develop. Indeed, they're still works in progress, in part because contributing companies want to receive patent royalties or just don't want a competitor to control a standard. Those same concerns sabotaged two earlier transport mechanisms, one from the Unix world and one invented by Microsoft. "

"Although Web services allow a machine to publish its data, making it available to another machine, the two have to agree on the structure of the data they are publishing. In the semantic Web, this sort of agreement will be largely unnecessary. "


Bossam rule engine v0.7b42 is available for download. Only binary form of the engine is available, and you need a Java runtime (J2SE 1.4 or later) to run the engine. The engine is not feature rich, has many problems, and is buggy. But still, you can process RDF queries, and perform reasoning over OWL ontologies with Bossam. Currently, Bossam supports only one rule language, Buchingae.

Friday, January 30, 2004

Datacentric Web - Microcontent déjà vu?

The Data Centric Web "This shift in focus from documents to data and from humans to computers is simple and yet profound. Just imagine a world in which every piece of data is immediately and automatically accessible from any computer via the web using a simple, universal set of protocols and formats. Indeed, such a vision has long represented the “holy grail” of Enterprise Application Integration (EAI) and yet attempts to realize this vision have been woefully inadequate to date."

Very similar to an older article about the microcontent client and of course very similar to the Semantic Web too.

Introducing the Microcontent Client
"The microcontent client is an extensible desktop application based around standard Internet protocols that leverages existing web technologies to find, navigate, collect, and author chunks of content for consumption by either the microcontent browser or a standard web browser. The primary advantage of the microcontent client over existing Internet technologies is that it will enable the sharing of meme-sized chunks of information using a consistent set of navigation, user interface, storage, and networking technologies. In short, a better user interface for task-based activities, and a more powerful system for reading, searching, annotating, reviewing, and other information-based activities on the Internet."

The good and the bad

I'm in heaven "Today I'm in geek heaven in so many ways. I read Curtis Hovey's recent weblog entry. He writes:

I strongly feel that a GNOME metadata solution should be based on metadata standards: RDF, OWL, FOAF and use common grammars."

From the original posting: "I strongly feel that a GNOME metadata solution should be based on metadata standards: RDF, OWL, FOAF and use common grammars. I'm shopping for a new Medusa backend because I don't think Medusa should be in the DB business, and it needs an extensible schema."

"The bad

* BDB 4: will everything break when BDB 5 comes out
* Mysql: a bit of a nuisance to setup for single users
* query: applications and users need robust searching
* scalability: will this work at 100 megs, the size of my Medusa db"

Taxonomy Warehouse

Agency taxonomies are a tall order, experts say "An agency building an enterprisewide taxonomy should expect to see more than a million categories within their design, according to Claude Vogel, chief technology officer for search engine company Convera Corp. of Vienna, Va."

"“People underestimate the magnitude of how big their taxonomies will be,” Vogel said, adding that commercial software, such as Convera’s, can handle most, though not all, of the job. "

Tuesday, January 27, 2004

Two Interviews

Checking in with the Inventor: Tim Berners-Lee ""The general public is seizing on the Web as a way to have a conversation," he said in our own chat this week. "That for me is very inspiring. It doesn't tell me something about the Web. It tells me something about humanity. The hope for humanity is that people do want to work things out. They do want to come to common understandings, and they will do it by constantly refining the way they've expressed their own ideas--and occasionally, on a good day, listening to the way other people have expressed theirs.""

Under the Iron interview with Aaron Swartz "You’ve put a tremendous amount of work in, for example, RDF and RSS 1.0 (the latter using the former). People say this is the basis of the “Semantic Web”. Could you cue us in on what they hope to achieve with this, how they will make everyone start doing something to achieve it, and what exactly it is we’ll start doing? Do you believe this is possible?

So, uh, here’s the plan:

1. Collect data

2. ???????

3. PROFIT!!!

Uh, more specifically, the idea is to get everyone sharing their vast databases of information in RDF with each other. Then we can write programs that put this data together to answer questions and take actions to make our lives easier."

Thursday, January 22, 2004

What makes technology succeed?

What matters? "In many cases, it is much more important that a choice be made so that we as a society can benefit from the network effects, than it matters which choice is made...the choice between technologies is often of much smaller significance for us as a society than that there be a choice. Network effects have their role in this and provide many of the benefits.

But if we want to benefit not just from the network effect but also from the advantages of technology, it is in everyone's interest that the network effects cut the right way: that we choose as a society the technologies that work best.

Now, if network effects are the best predictor, then we must infer that the people who actually are responsible for making a good decision are the early adopters. In IT, that means you. You have a responsibility to judge what matters not by network effects but by technical merit. This is a special case of the Categorical Imperative of Immanuel Kant, which you may dimly remember is phrased something like this: “act only on that maxim by which you can at the same time will that it should become a universal law,” but which your mother may have expressed more colloquially as, “What would the world be like if everyone did that?”"

"But the reasons that it is a good idea for them to be widely adopted have nothing to do with the differences between SGML and XML, and everything to do with the essential characteristics of the languages...But the choice of any technology is a cost/benefit calculation. And the only changes XML made to that calculation were in lowering the costs of deployment, not in adding any benefits — unless you count the benefits of the network effect, which are, as I have suggested, considerable."

OWL "Tiny"

Discussion on Owl Tiny.

Wednesday, January 21, 2004

Cluster Graphing

"Anyone who has ever had to complete a what doesn't belong question on a test has an interest in clustering technology. How close are the terms "slime mold", "skunk odor removal" and "luxury bathroom" anyway? Zoom in here and find out. (Clue: They are all green)."

Okay, that looks cool...then put your mouse over the map...

Kowari Update

* iTQL will be changing to be RDQL with proprietary extensions. This is based on the recent RDQL submission and previous discussions we've had.
* The Kowari lite development (just the minimum number of jars to get it going) is continuing. The iTQL command line UI has been improved so it's at least as good as the web iTQL UI.
* On Eric van der Vlist's question in RDF Query Languages, "Where are the triples?": we've often thought the same thing. The iTQL interpreter has long been planned to support emitting RDF/XML results as well as its existing XML and ResultSet-based answers.
* CVS is going to stay internal until we get significant external development. It will be updated infrequently (as bugs are fixed) and with each release.
* Started looking at Aquamarine's API as far as JRDF is concerned. JRDF has had some minor updates (only in CVS at the moment).

Actually, after thinking about querying, what you really want is to define a query and have it return all the RDF/XML related to each resource that matches the query - it's something that Guha wrote about (as given by Dan Brickley). This would require OWL and schema support, but it's something an existing SQL database doesn't offer. You could also hack this up by creating a large WHERE clause that defines all the properties of the resource, but that's a lot of effort to do each time.
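Here's a rough sketch of that idea in Python, using plain tuples rather than Kowari or iTQL: find the resources that satisfy a query, then return every statement about them. The prefixed names and sample data are invented, and it ignores the OWL and schema inference a real version would need.

```python
# Triples as (subject, predicate, object) tuples; all names here are made up.
TRIPLES = [
    ("ex:post1", "dc:creator", "Example Author"),
    ("ex:post1", "dc:title", "Atom notes"),
    ("ex:post1", "rdf:type", "rss:item"),
    ("ex:post2", "dc:creator", "Someone Else"),
    ("ex:post2", "dc:title", "Other post"),
    ("ex:post2", "rdf:type", "rss:item"),
]


def matching_subjects(triples, constraints):
    """Subjects that have every (predicate, object) pair in constraints."""
    facts = set(triples)
    subjects = {s for s, _, _ in triples}
    return {s for s in subjects if all((s, p, o) in facts for p, o in constraints)}


def describe(triples, constraints):
    """All triples whose subject matches the constraints, in original order."""
    wanted = matching_subjects(triples, constraints)
    return [t for t in triples if t[0] in wanted]


# "Everything known about items created by Example Author."
result = describe(
    TRIPLES, [("dc:creator", "Example Author"), ("rdf:type", "rss:item")]
)
```

The caller only states the two constraints, yet gets the title back as well, without listing every property in a WHERE clause.
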

Semantic Google, Semantic Web

Reading this discussion about semantics (linked to by this blog entry), it's especially encouraging to see that when people talk about this stuff they talk about RDF and the Semantic Web.

Especially, this on the second page:
"Then I re-read G's analysis of Vijay's article (previous post in this thread) in which he points out that Google/Froogle is already extracting this semantic information from non-RDF documents and doing a pretty good job of it all things considered (even if they are trying to sell off a forum moderator, and pretty cheaply, too, I might add ).

So, if I'm understanding things correctly... we don't have to convert everything over to XML (at least not right away) in order for this to work. Which is a good thing, because there are a buncha individuals and mom & pops out there (and some companies who ought to know better and could certainly afford the upgrade) who haven't even started using CSS and HTML4.x, much less XHTML or XML."

"Think of RDF (Resource Description Framework) as the foundation. Think of XML (eXtensible Markup Language) as the formatting language used to deliver it."

Putting Ontologies to Work

Judging the likely Success of an Ontology "Clay Shirky is obviously right when he states that a single monolithic ontology will never work. His critics are equally right when they claim the Semantic web will only work if it is a mélange of multiple interoperable Ontologies. What is missing from the debate is a more detailed explanation of what ontologies are good at, how they interoperate, and why systems based on ontologies succeed or fail."

"Ontologies, far from being an unproven new concept, are already in practical daily use. They form the foundation of classification systems, databases, and object oriented software applications."

I enjoyed reading this article, especially as it touches on many areas (like the relational model) and previous articles. This is virtually the perfect anti-Shirky piece.

Tuesday, January 20, 2004

Apache XML

Not what you would initially think: "Within the Apache Longbow, eXtremeDB will manage secure, digitized battlefield data. eXtremeDB's XML interface will facilitate communications, both internally and between the attack helicopter and external (ground and air) systems. Embedded software including eXtremeDB will run on airborne PowerPC processors and a commercial real-time operating system (RTOS). The program is being developed by The Boeing Company's Phantom Works organization in Mesa, Ariz."

Combine Two Technologies

Like putting a clock in an existing product I keep thinking of combining an XML Swing library with JSF. It would be nice to use JSF as a way to provide the abstraction for differing rendering technologies (this was the first thing I thought when I saw JSF). This interview had some interesting bits of information on this:
"One of the unique things about Faces is that it allows you to have separate classes for rendering a UI component. So a simple text box can consist of a UIInput component, which represents the concept of collecting user input, and a Text renderer, which knows how to display a textbox in HTML. You can create separate renderers for different types of clients -- one for HTML, one for SVG, and one for WML, for example...the third-party component market will continue to grow, not just with HTML components (which will be first), but also components and renderers that support other devices and richer clients."

"There's a sample in the current Faces early access release of XUL instead of JSP, but I think more work needs to be done to prove that other display technologies can really be first-class citizens."

"JavaServer Faces is also a good technology for thin client applications that aren't HTML-based. I've mentioned WML, but you could also write a Java applet application, or some other non-browser client that works with JSF. We'll see these types of applications evolve over time.

Personally, I think fat clients are great for some applications, like RSS News Readers. But web applications are great for other things, and JSF is a good way to build those types of applications. "

The latest download includes XUL in the non-JSP examples.

Monday, January 19, 2004

Related To

Semantic Similarity in a Taxonomy "This article presents a measure of semantic similarity in an IS-A taxonomy based on the notion of shared information content. Experimental evaluation against a benchmark set of human similarity judgments demonstrates that the measure performs better than the traditional edge-counting approach."
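A minimal Python sketch of how a measure like this can work, assuming the usual information-content formulation: a concept's information content is -log p(concept), where p counts the concept and everything beneath it, and two concepts are as similar as their most informative shared ancestor. The abstract doesn't spell out the formula, so treat that as my assumption; the toy taxonomy and counts are invented.

```python
import math

# Hypothetical IS-A taxonomy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}

# Hypothetical corpus counts of direct mentions of each concept.
COUNTS = {"entity": 0, "animal": 10, "vehicle": 10, "dog": 30, "cat": 30, "car": 20}


def ancestors(concept):
    """The concept itself plus every concept above it, up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def information_content():
    """-log p(c), where p(c) includes mentions of all of c's descendants."""
    total = sum(COUNTS.values())
    cumulative = dict.fromkeys(PARENT, 0)
    for concept, n in COUNTS.items():
        for a in ancestors(concept):
            cumulative[a] += n
    return {c: -math.log(cumulative[c] / total) for c in PARENT}


IC = information_content()


def similarity(a, b):
    """Information content of the most informative shared ancestor."""
    shared = set(ancestors(a)) & set(ancestors(b))
    return max(IC[c] for c in shared)
```

With these numbers "dog" and "cat" share "animal", which carries some information, while "dog" and "car" share only the root, which carries none. Edge counting would see both pairs as two hops apart via a common parent or grandparent and could not weight them by how specific that ancestor is.
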


'Xen' programming language unites C#, XML and SQL programming languages. ""I am currently working on language and type-system support for bridging the worlds of object-oriented (CLR), relational (SQL), and hierarchical (XML) data, and of course first class functions," explains Meijer."

Meijer's paper, Unifying Tables, Objects, and Documents, explains this idea more fully. Beware the Haskell programmer. Some of the syntax reminded me of Groovy. ExtremeTech also have an article.

Which Schemas?

WinFS Is a Storage Platform "WinFS is an active storage platform for organizing, searching for, and sharing all kinds of information. This platform defines a rich data model that allows you to use and define rich data types that the storage platform can use. WinFS contains numerous schemas that describe real entities such as Images, Documents, People, Places, Events, Tasks, and Messages."

Saturday, January 17, 2004

No unstructured data

MORE ON “UNSTRUCTURED” THINKING "There is no such thing as "unstructured data". That means random noise, which has no structure whatsoever and, therefore, is meaningless. It is the structure that gives meaning/content and makes data.

It has nothing to do with scanning, or incompleteness, or missing, or anything. It is structured, whatever it is. Diagrams have one type of structure, partial documents different types of structure, but there is always some structure by definition.

The term "unstructured data" is a misnomer based on misconception: it essentially refers to data that is not structured in tables, or spreadsheets, or whatever; mainly text, graphics, etc. But that is not unstructured, it's just different structures than tables or spreadsheets, that's all.

And that's a core issue, because structure determines the integrity and manipulation of the data, which are different for each type of structure. The point of relational structure is that it is the simplest formal structure for integrity and manipulation. Any other structure adds complexity, but no power."

Thursday, January 15, 2004

Another Java RSS Parser

FeedParser "The main API is very similar to JAXP, TRaX, SAX, and is designed to be very flexible. Having been a veteran of the RSS wars, member of the RSS 1.0 working group, and Atom developer, I think this takes into consideration all major issues with RSS/Atom feed formats and integration with the Java language...RSS serialization support. Serializers for all RSS versions (1.0, 2.0, Atom, etc) with the same code."


Well, it's not quite there yet but it will be available at SF's Kowari Project page (a 45MB download and in CVS). I think I've mentioned this before, one of the future goals is to reduce the download to the minimum set of jars.

Tuesday, January 13, 2004

Why the Dock Sucks

Top Nine Reasons the Apple Dock Still Sucks "The Dock is like a brightly-colored set of children's blocks, ideal for your first words—dog, cat, run, Spot, run—but not too useful for displaying the contents of War and Peace."

Active Internet

A collection of articles about how computers and the Internet are turning people into content producers not just consumers:
* Weblogs, RSS and the Rise of the Active Web - "...we show how blogging – originally a cross between self-expression and journalism – and its tools have morphed to give users some of the power promised by the so-called Semantic Web...they can construct personal news or commerce portals for themselves or for third parties, track multi-person blog conversations across the Web, or figure out other ways to control their digital environment that we have not thought of yet."
* The New Economy Hack: Turning Consumers into Producers - "That industry lately has become vigilant about threats from its customers, which it still thinks of as consumers. Instead it should be watching how Apple transforms those consumers into producers."
* Democratizing the Media, and More - "Smarter folks will understand the enormous opportunity it represents. They can start listening, really listening, to what people are saying. And they can dip into the vast pool of creative talent that exists outside the usual channels."

I finally have an excuse to link to Bush In 30 Seconds. My favourites were: In My Country, What are we teaching our children?, Imagine, Human Cost of War and Bush's Repair Shop. I still think the quality, even in the top 14, was spotty but that's where you need good annotation and recommendation software.

Monday, January 12, 2004

The Importance of Ontology

Ontology and Integration - Managing Application Semantics Using Ontologies and Supporting W3C Standards " Ontologies are important to application integration solutions because they provide a shared and common understanding of data (and, in some cases, services and processes) that exists within an application integration problem domain, and how to facilitate communication between people and information systems. By leveraging this concept we can organize and share enterprise information, as well as manage content and knowledge, which allows better interoperability and integration of inter- and intra-company information systems. We can also layer common ontologies within verticals, or domains with repeatable patterns."

Sunday, January 11, 2004

The old-fashioned way of integration

Compare and Contrast JOLAP and XML for Analysis and Intelligent Business Strategies: OLAP in the Database. I've covered some of this previously, especially JMI.

"JOLAP is a J2EE object-oriented application programming interface (API) designed specifically to address the programming needs of Java developers by providing a standard set of object classes and methods for BI."

"XMLA is a linguistic interface with no preference for programming language or object model. This linguistic interface is implemented as a web service and also defines a standard query language (mdXML) for BI."

"Hyperion views JOLAP and XMLA as complementary rather than competing standards. Although you can implement XMLA without using JOLAP, the JOLAP specification supports the web services architecture that depends on J2EE application servers, XML, and SOAP messages.

In fact, Hyperion’s implementation of XMLA uses our Java API (which was developed based on our JOLAP specification work) to communicate with the Essbase Analytic Services (OLAP Server). Our XMLA web service accepts a SOAP message, takes the mdXML statement contained in the SOAP message, and passes it to the Analytic Services engine for processing through the Java API. The result set is passed back to the XMLA web service through the Java API, where it is wrapped in a SOAP message and sent to the requesting client."
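The request flow in that last quote (SOAP in, statement out, engine call, SOAP back) can be sketched in a few lines of Python. This is not Hyperion's code: the engine is a stand-in function, the query text is invented, and the response is simplified to a single text value, whereas a real XMLA response carries a structured result under the return element.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
XMLA_NS = "urn:schemas-microsoft-com:xml-analysis"
NS = {"soap": SOAP_NS, "x": XMLA_NS}


def extract_statement(soap_text):
    """Pull the query statement out of an XMLA Execute request."""
    root = ET.fromstring(soap_text)
    node = root.find("soap:Body/x:Execute/x:Command/x:Statement", NS)
    if node is None or not (node.text or "").strip():
        raise ValueError("no Statement element found in SOAP body")
    return node.text.strip()


def handle_request(soap_text, engine):
    """Run the statement through `engine` and wrap the result in SOAP."""
    result = engine(extract_statement(soap_text))
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    response = ET.SubElement(body, f"{{{XMLA_NS}}}ExecuteResponse")
    # Simplification: the result is stored as plain text, not a rowset.
    ET.SubElement(response, f"{{{XMLA_NS}}}return").text = result
    return ET.tostring(envelope, encoding="unicode")


def fake_engine(statement):
    """Stand-in for the OLAP engine behind the Java API (hypothetical)."""
    return f"rows for: {statement}"


# Hypothetical request; the statement text is made up.
REQUEST = f"""<soap:Envelope xmlns:soap="{SOAP_NS}"><soap:Body>
<Execute xmlns="{XMLA_NS}"><Command><Statement>SELECT Measures.MEMBERS ON COLUMNS FROM Sales</Statement></Command><Properties/></Execute>
</soap:Body></soap:Envelope>"""

response = handle_request(REQUEST, fake_engine)
```

The web service layer only unwraps and rewraps messages, and the query work sits behind the engine callable. That is the sense in which the quote calls the two standards complementary.
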

Friday, January 09, 2004

Blogs are bad, don't do blogs

Why I Fucking Hate Weblogs! "Weblogs suck ass. What the fuck is up with this shit? Fuck. Who the fuck cares what these people think about oatmeal or what the UN did last week? Nobody! Who reads these weblogs? Nobody! Maybe fellow weblog authors read each others weblogs out of a sense of desperation...the feeling that if they read someone else's weblog, someone will read theirs. It's kindof like cooperative advertising too, people will cross-post, linking weblog entries to each other's weblogs. How fucking pathetic is that? I hate weblogs. "

It's convinced me...this has been a waste of time.

Perfect Company

"And if you did create the perfect organisation – perfect in organisational terms, that is, one that would magically hoover up all of the money and destroy its competitors, as soon as you achieve that perfection you would also achieve destruction – because our society seems to thrive best where there are many ideas contending and where no one organisation/form of government/set of ideas has eliminated all the rest."

Interview with Martha Atwood

Polite Society

What you can't say "It seems to be a constant throughout history: In every period, people believed things that were just ridiculous, and believed them so strongly that you would have gotten in terrible trouble for saying otherwise.

Is our time any different? To anyone who has read any amount of history, the answer is almost certainly no. It would be a remarkable coincidence if ours were the first era to get everything just right."

What you can say.

Thursday, January 08, 2004

Relational Web Services

XQuery on the Web talks about XQuery: Meet the Web.

Dare quotes: "In fact, this separation of the private and more general query mechanism from the public facing constrained operations is the essence of the movement we made years ago to 3 tier architectures. SQL didn't allow us to constrain the queries (subset of the data model, subset of the data, authorization) so we had to create another tier to do this.

What would it take to bring the generic functionality of the first tier (database) into the 2nd tier, let's call this "WebXQuery" for now. Or will XQuery be hidden behind Web and WSDL endpoints?"

And responds with:
"Every way I try to interpret this it seems like a step back to me. It seems like in general the software industry decided that exposing your database & query language directly to client applications was the wrong way to build software and 2-tier client-server architectures giving way to N-tier architectures was an indication of this trend."

"Data model subsets" - don't you mean views?

Also Dare says, "All this indirection with WSDL files and SOAP headers yet functionality such as what Yahoo has done with their Yahoo! News Search RSS feeds isn't straightforward. I agree that WSDL annotations would do the trick but then you have to deal with the fact that WSDL's themselves are not discoverable."

On that point, I would refer anyone interested to the paper in the JOWS: Automated Discovery, Interaction and Composition of Semantic Web Services.

XML For You and Me, Your Mama and Your Cousin Too "At this point if you are like me you might suspect that defining that the web service endpoints return the results of performing canned queries which can then be post processed by the client may be more practical than expecting to be able to ship arbitrary SQL/XML, XQuery or XPath queries to web service end points.

The main problem with what I've described is that it takes a lot of effort. Coming up with standardized schema(s) and distributed computing architecture for a particular industry then driving adoption is hard even when there's lots of cooperation let alone in highly competitive markets."

Journal on Web Semantics

Journal on Web Semantics "The Journal on Web Semantics and also this Website is approaching scientific publishing from a different angle: our topic demands more than just the production and printing of papers, but also the distribution of ontologies and running code. An early slogan of W3C standardisation efforts was 'rough consensus and running code' - this applies also for the Semantic Web - maybe changed to 'rough consensus, running code, and ontologies'."

Wednesday, January 07, 2004

Followup on Relational RSS

RSS, old enough to be having relations? "Seb mentions several of the operators of Codd's relational algebra, and, it seems to me there are two general reasons why everyone isn't already operating on RSS as relational data: 1) it is distributed across many files, and 2) the hierarchic XML structure of RSS."

"The main issue I am dealing with now is what types of data structures and formats work best with the various combinations of uses between data interchange, data storage, and querying."

Okay, now I'm convinced that this really is replicating RDF and I would have to encourage anyone considering this to pick up an open source RDF library (like Jena or Redland) and use it to perform these operations on RDF based RSS.

The problems that are highlighted are the same ones that various implementations of RDF have had to solve. A problem with serializing a graph (relational data) in XML - that's RDF/XML and its use of striping. Distributed across many files and being able to search it - that's usually a problem for RDF data stores (like Kowari or other freely available ones).

For example, to get all the documents (blog entries, etc.) authored by Sam Ruby (this is from a previous post describing iTQL): "select $creator subquery( select $type from <rss_schemas> where $type <> <> ) from <rss_feeds> where $creator $type '';"

Where "<rss_feeds>" can be any number of URIs combined with logical operators.

Of course, you'll need to convert some feeds from XML to RDF. While I often link to RDFT, the more usual ways include XSLT and programmatically using an RDF API and an XML library. One of the quickest ways I've found is using a combination of Jena and Jakarta Commons Digester.
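For a feel of the programmatic route, here is a small Python sketch (standard library only, not Jena or Digester) that turns RSS 2.0 items into N-Triples lines. The mapping is my own assumption: the item link becomes the subject, the title maps to the RSS 1.0 title property and the author to Dublin Core creator. The sample feed is invented.

```python
import xml.etree.ElementTree as ET

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RSS_NS = "http://purl.org/rss/1.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Assumed mapping from RSS 2.0 child elements to RDF predicates.
MAPPING = (("title", RSS_NS + "title"), ("author", DC_NS + "creator"))


def escape(text):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def rss2_to_ntriples(xml_text):
    """Convert each <item> of an RSS 2.0 document into N-Triples lines."""
    root = ET.fromstring(xml_text)
    lines = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if not link:
            continue  # without a link there is no URI to use as the subject
        subject = f"<{link.strip()}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{RSS_NS}item> .")
        for tag, predicate in MAPPING:
            value = item.findtext(tag)
            if value:
                lines.append(f'{subject} <{predicate}> "{escape(value.strip())}" .')
    return lines


# Hypothetical feed used only to exercise the function.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Example</title>
<item><title>Set "theory" with RSS</title><link>http://example.org/posts/1</link><author>Example Author</author></item>
</channel></rss>"""

triples = rss2_to_ntriples(SAMPLE_FEED)
```

Once the items are triples, the file-by-file and hierarchy problems go away: any RDF store can load the output from many feeds into one graph and query across them.
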

Also related, Base data: relational, RDF, XML.

George W. vs Hitler

George Bush & Adolf Hitler "The internet is littered with pictures of George Bush with a swastika on his chest, a drawn-on mustache, and his arm raised in the Nazi salute. It is therefore no surprise that someone would choose this theme for their video. But why? Has George Bush done anything to justify the comparison? Consider these points:

* Hitler slaughtered six million. George Bush only killed nine thousand or so in Iraq and many fewer in Afghanistan. Hardly a fair comparison.
* Hitler rounded up and killed homosexuals. George Bush only denies them the right to marry. Again, no comparison.
* Hitler rounded up and killed those with physical or mental infirmities. George Bush only cut their medical benefits. No comparison.
* Hitler invaded his neighbors and overthrew their governments. George Bush only invaded and overthrew the governments of two countries, and they were not neighbors. No comparison.
* Adolph Hitler believed in a "master race." George Bush believes in a master religion. No comparison.
* Adolph Hitler was an eloquent and persuasive madman. George Bush is neither eloquent nor persuasive. No comparison
* Adolph Hitler's government was in tight control of its citizens. George Bush has only limited our right to privacy, free speech, and access to lawyers.
* Adolph Hitler demonized Jews. George Bush only demonized Osama bin Laden and Saddam Hussein (with the leaders of Syria, Iran, and North Korea held in abeyance). No comparison.

Obviously, George Bush is no Adolf Hitler. To make sure those videos are never seen, we suggest that he confiscate them and start arresting people. This kind of outrage should not be allowed in a free society."

Tuesday, January 06, 2004

Set Theory with RSS

The Algebra of RSS Feeds ""Taking a cue from the operations of set theory," Paquet writes, "we could for instance define the following:

1. Splicing (union): I want feed C to be the result of merging feeds A and B.
2. Intersecting: Given primary feeds A and B, I want feed C to consist of all items that appear in both primary feeds.
3. Subtracting (difference): I want to remove from feed A all of the items that also appear in feed B. Put the result in feed C.
4. Splitting (subset selection): I want to split feed D into feeds D1 and D2, according to some binary selection criterion on items.""

The original post.

While I wouldn't say that RDF has the monopoly on set theory (and operations like union and intersection) it does seem like reinventing the wheel.
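For what it's worth, the four operations in the quoted list are only a few lines each once feed items are treated as set members. A minimal Python sketch, assuming each item is a dict identified by its guid or, failing that, its link (that identity rule and the sample feeds are my assumptions, not part of the original post):

```python
def key(item):
    """Identity of a feed item: its guid if present, otherwise its link."""
    return item.get("guid") or item["link"]


def splice(a, b):
    """Union: items of A followed by items of B not already seen."""
    seen, merged = set(), []
    for item in a + b:
        if key(item) not in seen:
            seen.add(key(item))
            merged.append(item)
    return merged


def intersect(a, b):
    """Items of A that also appear in B."""
    keys_b = {key(i) for i in b}
    return [i for i in a if key(i) in keys_b]


def subtract(a, b):
    """Items of A that do not appear in B."""
    keys_b = {key(i) for i in b}
    return [i for i in a if key(i) not in keys_b]


def split(feed, predicate):
    """Partition a feed into (items passing predicate, items failing it)."""
    d1 = [i for i in feed if predicate(i)]
    d2 = [i for i in feed if not predicate(i)]
    return d1, d2


# Hypothetical feeds for illustration.
FEED_A = [
    {"link": "http://example.org/1", "title": "RDF"},
    {"link": "http://example.org/2", "title": "RSS"},
]
FEED_B = [
    {"link": "http://example.org/2", "title": "RSS"},
    {"link": "http://example.org/3", "title": "OWL"},
]

feed_c = splice(FEED_A, FEED_B)
```

The hard part is agreeing on what makes two items "the same", which is exactly what URIs give you for free in RDF.
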

Monday, January 05, 2004


Internet creator Berners-Lee knighted "British physicist Tim Berners-Lee, who invented the World Wide Web -- or at least better access to it -- has been awarded a knighthood in London.

Without his creation, there would be no computer addresses, no e-mail and the Internet might still be the exclusive domain of a handful of computer experts, the Independent reported."


"RSSOwl is a free RSS (0.91, 0.92, 1.0, 2.0) newsreader written in Java programming language using SWT as fast graphic library. Features of RSSOwl include reading RSS or RDF newsfeeds in a comfortable tab folder, save newsfeeds in categories, export them to PDF / HTML or OPML, and view news in an internal browser."

Even though it had some issues with freezing, laying out some of the UI, and other bugs it's easy to get running and it's not too bad. It'd be nice to have support for RSS autodiscovery. I think I still prefer NetNewsWire or NewsMonster although I should try the others.

Sunday, January 04, 2004

More Commercial RSS

* RSSAds "The ad engine for RSS feeds."
* k-collector "k-collector is an enterprise news aggregator that leverages the power of shared topics to present new ways of finding and combining the real knowledge in your organisation...The k-collector architecture combines clients for leading weblogging software with a server based aggregator and web application."


"Waypath makes use of Think Tank 23's unique information retrieval platform, Nav4, which automatically analyzes content, such as weblogs, and links documents that share common topics. Using Nav4, Waypath provides both keyword search and contextual navigation of individual weblog posts."

The SDK feature list includes concept-driven similarity browsing; handling, under load, over 600 queries per minute and 1 million documents on a typical installation; and no taxonomy to maintain.

Here are Morenews's related links (top link is Themes and metaphors in the semantic web discussion) and related books.

OS JavaServer Faces

Smile Now includes its own renderkit. Next release will be fully compliant, apparently.

More associated links: JavaServer Faces home page, more details in chapters 21 and 22 of the Web Services Tutorial or alternatively a tutorial for the impatient.

Saturday, January 03, 2004


Exclusive Insider Information: Apple iBox in production. "The iBox plugs into your TV and acts as a hub for your digital devices and computers. Unlike the EyeTV from Elgato, the iBox is a standalone machine, not something to plug into an existing computer. The iBox can be scheduled to record TV, but unlike TiVos it does not serve as a "what's on and when" service rather a hard drive / media based recording device (new aged VCR). With its built in 802.11b & 802.11g from its AirPort Extreme card, one can access the home folders of any user on any wirelessly networked Mac or PC. The iBox has its own version of the popular iPhoto and iTunes software which is a welcoming plus to Mac OS 10 veterans and easy for Windows users to adopt as well."

Time to sell my Shuttle. A picture you can eat with a spoon. Tasty Apple rumours.


Finding what you want on the web "Debbie is a trained librarian, and it shows. She understands that a single search engine is never going to do everything, no matter how good its indexing or how large its database."

"I do not think we will ever solve the search problem until we move away from the dumb web we have today towards something like the semantic web, a project that Sir Tim Berners-Lee has been pushing ever since the first web conference in 1994.

Once links carry meaning then it will become more of a distributed database than the vast heap of unstructured documents we have built so far.

And once we have a database then we can classify, index and search it properly."
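
A rough sketch of what "links that carry meaning" could look like in practice, in Python. The page names and relation labels below are made up for illustration and are not taken from any real vocabulary: each link is stored as a (source, relation, target) triple, and a search becomes a pattern match over those triples.

```python
# Hypothetical typed links, stored as (source, relation, target) triples.
# All identifiers here are invented for illustration.
LINKS = [
    ("post:42", "reviews", "book:weaving-the-web"),
    ("post:42", "writtenBy", "person:alice"),
    ("post:57", "cites", "post:42"),
    ("post:57", "writtenBy", "person:bob"),
]

def match(triples, source=None, relation=None, target=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, r, t)
        for (s, r, t) in triples
        if (source is None or s == source)
        and (relation is None or r == relation)
        and (target is None or t == target)
    ]

# "Which pages review this book?" becomes a lookup rather than a keyword guess.
reviewers = [s for (s, _, _) in match(LINKS, relation="reviews", target="book:weaving-the-web")]
print(reviewers)
```

A real system would presumably use RDF and a proper query language rather than a Python list, but the shape of the question is the same.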


Analysis Engine

A Fountain of Knowledge "...imagine a marketing researcher trying to find out the online attitude of consumers toward the popular rock singer Pink. The researcher would have to wade through an ocean of search results to sort out which Web pages were talking about Pink, the person, rather than pink, the color.

What such a researcher needs is not another search engine, but something beyond that—an analysis engine that can sniff out its own clues about a document’s meaning and then provide insight into what the search results mean in aggregate. And that’s just what IBM is about to deliver. In a few months, in partnership with Factiva, a New York City online news company, it will launch the first commercial test of WebFountain...Up to now this kind of aggregate analysis was possible only with so-called structured data, which is organized in such a way as to make its meaning clear. Originally, this required the data to be in some sort of rigidly organized database; if a field in a database is labeled “product color,” there is little chance that an entry reading “pink” refers to a musician."

"Although the pooled data is compressed to about one-third its original size to reduce storage demands, WebFountain still requires a whopping 160 terabytes plus of disk space. It uses a cluster of thirty 2.4-GHz Intel Xeon dual-processor computers running Linux to crawl as much of the general Web as it can find at least once a week."

"WebFountain’s builders admit it’s not always able to guess right, but they point out that humans can also be confused by ambiguous meanings."

"Because the data has been converted from an unstructured format to a structured XML-based format, IBM and its partners can fall back on the data-mining experience and methodologies already developed for analyzing databases. The structured format also provides an easy target for developing new analytic tools."

"This, perhaps more than anything else, is why WebFountain looks like a winner. By creating an open commercial platform for content providers and data miners, it will foster rapid innovation and commercialization in the realm of machine understanding, currently dominated by isolated research projects."

Friday, January 02, 2004

Reclaim the Semantic Web

Fight back "For the technologists among us I would recommend you read this piece by Mark Nottingham on reclaiming the Semantic Web from military purposes. We need to stop wasting time on bullshit like Friendster and start using this technology to do something useful like facilitating citizen oversight of the government."

The Semantic Web’s Dirty Little Secret

IBM Emerging Technologies Toolkit

1.2 Released "Version 1.2 contains Service Data Objects (SDO), Policy-Based IT Management Demo, Semantic Web Services, Autonomic Computing Toolset, WS-Manageability demo, WS-Trust, WS-Addressing, Web Services Failure Recovery, and Service Domain technology."

Serendipity Server

One of Danny Ayers' New Year resolutions:
"...text search, creation of triples using machine learning techniques...A server-side tool that combines Semantic Web and machine learning technologies to autodiscover connections between ideas."

It seems similar to parts of Tim Bray's search vision, the Basic Resource Finder:
"BRF will have built in most of the lore on result ranking I wrote up earlier in this series, with the possible exception of Latent Semantic Indexing. Crucially, it will have some facilities to make it easy to feed back popularity and usage counts into the ranking heuristics."
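
To make the feedback idea concrete, here is a minimal Python sketch of folding usage counts into a ranking score. The formula, the `usage_weight` parameter and the sample numbers are my own assumptions for illustration; the quote does not say how BRF would actually do it.

```python
import math

def rank(results, usage_counts, usage_weight=0.5):
    """Sort (doc_id, text_score) pairs by text score boosted by logged usage.

    Assumed formula: text_score * (1 + usage_weight * log(1 + clicks)).
    The log damps the boost so heavily clicked pages cannot swamp relevance.
    """
    def score(item):
        doc_id, text_score = item
        clicks = usage_counts.get(doc_id, 0)
        return text_score * (1 + usage_weight * math.log1p(clicks))
    return sorted(results, key=score, reverse=True)

# Hypothetical search results and click counts.
results = [("a", 0.9), ("b", 0.8), ("c", 0.2)]
usage = {"b": 50, "c": 3}
ranked = [doc_id for doc_id, _ in rank(results, usage)]
print(ranked)
```

With no usage data the order falls back to the plain text scores, which seems like the right default for a fresh index.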

eventSherpa, SW Killer App

eventSherpa - an RDF desktop application for Windows (at last!) "The desktop app is a good looking, user friendly and very functional calendar. Where it starts to get good is that I can publish my local calendars to the eventSherpa Calendar server. They will then be available as HTML and more importantly as RDF feeds using the RDF Calendar schema for anybody to subscribe to either using the eventSherpa client or any software that can consume this RDF vocab."

Thursday, January 01, 2004

OWL Implementations

OWL Implementations (the commercial implementations are Cerebra and Snobase); the page also includes OWL Test Results.