Integrating DL and XML "Semantic Web Service concepts like DAML-S or the MCM Semantic Web Service Matchmaker provide means for storage and retrieval of web services far beyond text or UDDI based search. These concepts use Description Logics as a means to express the semantics of services. However, these Description Logic formats (e.g. the "RDF/XML"-serialization of DAML+OIL) are not directly compatible with the XML formats used by today's Web Services. In other words, the lingua franca of the Web is still XML, the advantages of Description Logics notwithstanding.
In order to gain some level of interoperation between Semantic Web Service concepts and traditional XML based Web Service concepts (and between Semantic Web and Web in general!), it is necessary to provide bridges between the (RDF/DAML/OWL based) concept description community and the XML/SOAP style of message passing."
Download and more information.
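To see the mismatch concretely, here is a minimal sketch of such a bridge: a handful of triples (all names invented for illustration) describing a service get flattened into the plain XML payload an XML/SOAP-style consumer expects. A real bridge would also have to handle namespaces, datatypes, and nested structures.

```python
import xml.etree.ElementTree as ET

# Hypothetical triples describing a service, loosely in the spirit of
# DAML-S; the subject and predicate names are invented for illustration.
triples = [
    ("svc:BookFinder", "service:name", "BookFinder"),
    ("svc:BookFinder", "service:input", "isbn"),
    ("svc:BookFinder", "service:output", "price"),
]

def triples_to_xml(subject, triples):
    """Flatten the triples about one subject into a plain XML element,
    the kind of payload an XML/SOAP-style service could exchange."""
    root = ET.Element("service")
    for s, p, o in triples:
        if s != subject:
            continue
        # Use the local part of the predicate as the element name.
        child = ET.SubElement(root, p.split(":")[-1])
        child.text = o
    return root

xml_doc = ET.tostring(triples_to_xml("svc:BookFinder", triples),
                      encoding="unicode")
```

Going the other direction, from arbitrary XML back to triples, is the harder half of the problem, since plain XML carries no formal semantics to recover.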
Saturday, March 20, 2004
Friday, March 19, 2004
Album sales at record high
Forget the spin! It's a record record. "Total sales (in all formats) climbed to a record high in 2003: 65.6 million, easily topping the previous record of 63.9 million set in 2001.
And the sales of actual CD albums climbed above 50 million for the first time (well above 50 million actually).
It's a real cause for celebration. Back before the advent of Napster and home CD burning the industry was selling fewer than 40 million CD albums per year."
Other stories: ARIA's press release, Sound of cash registers is music to the ears, Radio interview with Peter Martin and MUSIC INDUSTRY NEAR COLLAPSE IN FILE-SHARE FRENZY - No, Wait, That's Not Right….
Or the Slashdot article.
Thursday, March 18, 2004
Practical Calendaring
Making a date with the Semantic Web "Chris Sukornyk has an answer to the question "Is there any practical application for the W3C Semantic Web concept?"
Semaview Inc., Sukornyk's Toronto-based start-up, is offering a Semantic Web-based event calendaring application called eventSherpa. It is designed to give individuals and businesses a shareable calendar with the semantic search capabilities envisioned by the W3C."
Wednesday, March 17, 2004
Vocabularies
RELATIONSHIP: A vocabulary for describing relationships between people "The RELATIONSHIP list should make it obvious that explicit linguistic clarity in human relations is a pipe dream. It probably won't though; the madness of the age is to assume that people can spell out, in explicit detail, the messiest aspects of their lives, and that they will eagerly do so, in order to provide better inputs to cool new software."
From the comments: "I have been lurking around the FOAF list for a while and know these guys have been at it for a long while. The questions raised by Clay have been brought up and debated to death...here comes a reputable guy, who probably has not been involved in the effort, who without really knowing the issues, "fires a rocket at it". This rocket causes a lot of waste of energy because it is just a basic instinct kind of a rocket. Not really thought through."
Clay Shirky on RELATIONSHIP "...he still managed to overlook the raison d'etre for the relationship. Indeed it's the raison d'etre for all vocabularies."
Most of the other comments are humorous and worth a read. Another case of the metadata being more interesting and thought provoking than the data.
Tuesday, March 16, 2004
Not FOAF
The Opposite of FOAF "One of my friends is breaking new ground with semantic web; he set up a directory of annoying salespeople, and he’s naming names."
HP iTunes
HP Music Complete with annoying background loop. Links to Apple's iTunes download page for HP.
RDF Schema Editor
Mark Choate has released an early version of RDF Schema Editor for download. Giving it a quick run, it seems fairly intuitive for those who are used to RDF.
Schema Editor update "There is nothing like a real project when it comes to learning something new. The most problematic part of RDF from a user's perspective is handling namespaces. It's a hassle. But I also think that it's a hassle that can be overcome, and not one that really needs to intrude on the user experience. Improving the usability of tools that create metadata is important if the Semantic Web is ever to grow beyond speculation."
Monday, March 15, 2004
The Prevalence of Prevayler
Continuing Prevalence "Brief run-down on two significant paradigms that have been becoming a lot more visible, presumably because they're good at...". As long as you define "it" in the strictest sense of what you're doing. If it's anything like persistently storing data (not objects, data) then Prevayler isn't it. That being said, it's a finalist in the Jolt Awards.
Here's a bunch of links:
* Prevayling Stupidity,
* The Prevayler,
* Object Prevalence: Get Rid of Your Database?, and
* The first message on the AJUG mailing list on Prevayler. It includes replies like, "Prevayler serializes incoming transactions but not the result of executing each transaction. You don't have change (before or after) images. Undoing a failing transaction cannot be accomplished."
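The pattern the links above argue over is small enough to sketch. This is a toy version of object prevalence, assuming nothing beyond the core idea: state lives in memory, and persistence comes from serializing each transaction to an append-only log that is replayed on restart (no snapshots, no real durability).

```python
import json

class PrevalentSystem:
    """Toy prevalence layer: state lives in memory; every transaction
    is appended to a log and replayed on startup to rebuild state."""
    def __init__(self):
        self.accounts = {}
        self.log = []  # in a real system, an append-only file on disk

    def execute(self, txn):
        self._apply(txn)
        # Serialize the command itself, not the result of executing it.
        self.log.append(json.dumps(txn))

    def _apply(self, txn):
        op, name, amount = txn["op"], txn["name"], txn["amount"]
        if op == "deposit":
            self.accounts[name] = self.accounts.get(name, 0) + amount

    @classmethod
    def recover(cls, log):
        """Rebuild state by replaying the command log in order."""
        system = cls()
        for entry in log:
            system._apply(json.loads(entry))
        return system

sys1 = PrevalentSystem()
sys1.execute({"op": "deposit", "name": "alice", "amount": 100})
sys1.execute({"op": "deposit", "name": "alice", "amount": 50})
sys2 = PrevalentSystem.recover(sys1.log)   # simulate a restart
```

This also makes the quoted criticism concrete: only commands are logged, never before- or after-images, so a transaction that fails halfway through has nothing to roll back against.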
Revisiting the DBDebunk web site:
* MORE ON THE STATE OF THE INDUSTRY AND SQL.
* To O OR to R: IS THIS A DATABASE QUESTION?.
We've been considering adding to iTQL the MAYBE operator, to reduce the complexity of writing joins when querying RDF data (more on the MAYBE operator by Date).
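A MAYBE (optional) pattern is essentially a left outer join over triple patterns: keep every solution to the required pattern, extend it with the optional pattern where one matches, and leave the variable unbound otherwise. A rough sketch over an in-memory triple list (the data, predicate names, and `?var` syntax are all illustrative, not iTQL):

```python
triples = [
    ("doc1", "dc:title", "Kowari Manual"),
    ("doc2", "dc:title", "iTQL Guide"),
    ("doc1", "dc:creator", "Paul"),
    # note: doc2 has no dc:creator
]

def match(pattern):
    """Yield variable bindings for one triple pattern;
    '?x'-style terms are variables, everything else is a constant."""
    for t in triples:
        binding = {}
        for p, v in zip(pattern, t):
            if p.startswith("?"):
                binding[p] = v
            elif p != v:
                break
        else:
            yield binding

def maybe_join(required, optional):
    """Left outer join: every required solution, optionally extended."""
    for r in match(required):
        extended = False
        for o in match(optional):
            if all(o.get(k, v) == v for k, v in r.items()):
                extended = True
                yield {**r, **o}
        if not extended:
            # A real engine would null-fill every optional variable;
            # hardcoded here for brevity.
            yield {**r, "?creator": None}

results = list(maybe_join(("?doc", "dc:title", "?title"),
                          ("?doc", "dc:creator", "?creator")))
```

Here `doc2` has no `dc:creator`, so it still appears in the results with `?creator` unbound, which is exactly the row a plain join would have thrown away.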
Saturday, March 13, 2004
Digging Against the Internet
Advice to Microsoft regarding commodity software "Digging in against open source commoditization won't work - it would be like digging in against the Internet, which Microsoft tried for a while before getting wise. Any move towards cutting off alternatives by limiting interoperability or integration options would be fraught with danger, since it would enrage customers, accelerate the divergence of the open source platform, and have other undesirable results. Despite this, Microsoft is at risk of following this path, due to the corporate delusion that goes by many names: "better together," "unified platform," and "integrated software." There is false hope in Redmond that these outmoded approaches to software integration will attract and keep international markets, governments, academics, and most importantly, innovators, safely within the Microsoft sphere of influence. But they won't."
Friday, March 12, 2004
EMail Classification
Xerox Scientists Invent Software That Automatically Indexes, Categorizes, Routes Electronic Documents "Scientists at Xerox Corporation have invented powerful software that's clever enough to "read" an electronic document, decide how it should be classified by subject, then route it to the right person's e-mail address or online document management system, all completely automatically.
The software, which is a categorizing tool, is intended to help businesses keep their e-document collections orderly and easily accessible, and it is available for licensing from Xerox.
"A misshelved book in a library might as well be lost. It's the same with documents that haven't been properly categorized; the document itself may have to be recreated," said Eric Gaussier, a research scientist at the Xerox Research Centre Europe in Grenoble, France. "Our new software can help save time and money and increase productivity. It will ensure that documents are properly classified for future retrieval and that the right information gets into the right hands as quickly as possible.""
Updates, Books and Referrers
Some quick updates:
* A Lot To Digest - A glowing summary of the Cannes Technical Plenary and WG Meeting Week.
* From duct tape to chewing gum and baling wire - A review of the RDF in XHTML proposal.
* A Semantic Web Primer. Also from MIT Press From Logic to Logic Programming and Dynamic Logic.
* Also closed the RFEs in Kowari for RDQL (which we implemented using SableCC and our own query layer) and Jena Support (still lots to optimize).
* In my referrer log: [Babelfish translation] "The semantic Web, you use it and you will use it more and more without the knowledge. And it is very well like that. RDF, OWL, Semantic Web" from La Grange.
Thursday, March 04, 2004
SWIG Summary
Semantic Web Interest Group "Just about every tool in the Adobe arsenal is now able to embed RDF into their primary artifacts. It's ironic, of course, that a core web technology, RDF, still isn't really embeddable into the core web data format, HTML; but it can easily be embedded into Adobe artifacts...there are two reasons why Adobe customers want metadata embedded in digital artifacts. The first, obvious, reason is to support automated workflow. If you've ever worked in a graphics shop or design house, you know that workflow management is absolutely critical to profitability. The second, less obvious, reason is intelligent syndication. What a real-time OWL-powered pub-sub application needs, of course, is rich RDF metadata attached or linked to artifacts of the Web: images, video and sound files, as well as XHTML documents."
"...we're at the point now where the basic knowledge representational mechanisms -- RDF and OWL -- are formalized; where mechanisms for the creation of RDF and OWL are coming online and into place; and where there is increasing awareness of and interest in the SW. What we need, then, is the "uniform means of interaction" for the SW that HTTP provides for the Web. What we need is a reasonable specification that we can give both to RDF tool makers and to Python and Perl and Java and C# and Ruby programmers; and we need to say to them, "implement this specification in your tool or your language; then your users and programmers will be able to uniformly access resource representations on the Semantic Web". That will be a happy day, indeed."
Also talks about how Boeing are using OWL in the battlefield.
The Other Tim on the SW
Bill Gates, Edd Dumbill, and the Semantic Web "This idea of making the data smarter is absolutely central. I have been speaking about this myself for some time. As we move to a network-based software platform, where applications don't live on the local machine but are distributed between rich client front ends and huge database back ends, "open source" alone won't really solve our problems. It's open data we're going to be fighting about."
We 0wnz0r Facts
Hands Off! That Fact Is Mine "Ostensibly, the Database and Collections of Information Misappropriation Act (HR3261) makes it a crime for anyone to copy and redistribute a substantial portion of data collected by commercial database companies and list publishers. But critics say the bill would give the companies ownership of facts -- stock quotes, historical health data, sports scores and voter lists. The bill would restrict the kinds of free exchange and shared resources that are essential to an informed citizenry, opponents say."
"Under the terms of the broadly written bill, a public-health website could be deemed in violation of the law for gathering a list of the latest health headlines and providing links to them on its home page."
"An encyclopedia site not only could own the historical facts contained in its online entries, but could do so long after the copyright on authorship of the written entries had expired. Unlike copyright, which expires 70 years after the death of a work's author, the Misappropriation Act doesn't designate an expiration date."
Free T-Shirt!
Trust on the Semantic Web "This project is designed to build and maintain a trust network on the semantic web. Using an ontology that extends FOAF, you can assign trust ratings to people you know.
The network generated in this project will serve as the foundation for a dissertation on the topic of Trust on the web. By becoming part of the network, your data will be used as the foundation for Trustbot, Trustmail, and some future applications."
Current top 10 and visualization of the graph.
Wednesday, March 03, 2004
Faceted Metadata
Faceted interfaces "Yes, the alternative is so-called faceted metadata. Like the facets on the diamond, faceted metadata allow you to look from different sides to the same information. Faceted metadata have the information organized on several dimensions (facets). The information has values on each of the facets. Readers find information by choosing values on each of the facets."
Faceted Metadata Search and Browse "Examples of faceted metadata include:
* Music store: songs have attributes such as artist, title, length, genre, date...
* Recipes: cuisine, main ingredients, cooking style, holiday...
* Travel site: articles have authors, dates, places, prices...
* Regulatory documents: product and part codes, machine types, expiration dates...
* Image collection: artist, date, style, type of image, major colors, theme..."
Both have screenshots of the faceted approach, although the second is a little more clear. Flamenco is often cited too. For example, Flamenco Fine Arts Search.
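The mechanics behind these interfaces are simple enough to sketch: a conjunctive filter over the selected facet values, plus counts of the values still available in each unselected facet (the catalog data below is invented):

```python
from collections import Counter

# A toy music-store catalog with facet attributes.
songs = [
    {"artist": "A", "genre": "rock", "year": 2001},
    {"artist": "B", "genre": "rock", "year": 2003},
    {"artist": "A", "genre": "jazz", "year": 2003},
]

def facet_browse(items, selections):
    """Keep items matching every selected facet value, then count the
    values still available in each remaining (unselected) facet."""
    matching = [i for i in items
                if all(i[f] == v for f, v in selections.items())]
    counts = {}
    for facet in ("artist", "genre", "year"):
        if facet not in selections:
            counts[facet] = Counter(i[facet] for i in matching)
    return matching, counts

# Reader picks "rock" on the genre facet; the other facets show what
# remains reachable, so a dead-end selection is never offered.
matching, counts = facet_browse(songs, {"genre": "rock"})
```

The counts are what distinguish faceted browsing from a plain search filter: the interface can grey out or hide values that would produce zero results.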
Tuesday, March 02, 2004
Balloons don't scale
Balloons and Ribbon as Social Networking Visualization "Artist/curator friend Mark Soo did a piece for one of the Infest openings where he visualized the curators' social network using balloons with people's names printed on them as the nodes and ribbons tying them together as the edges (the data comes from "invites" he got the curators to send to one another).
This was a great, inviting, tactile "graph manipulation interface". But the reason I liked it so much was that it really brought out the problems of social networks visualizations as a way of learning about the networks being visualized: too confusing!"
The Bleeding Semantic Edge
Semantics: a new beginning? "The reason for all this continual mapping is that the source data has no meaning in any real sense of the word. That is, it has no context. What the new interest in semantics is looking to do is to provide that meaning. Unfortunately, there is no indication that database vendors (the leading ones at least) are going to provide this contextual information any time soon. However, there are moves afoot within the semantic community to try to establish semantic rules and ontologies that can be used across the enterprise and then, using appropriate tools, you can start to automate the process of creating these transformations and mappings rather than having to do it manually via a mapping tool.
There are a number of vendors already in this space. For example, Unicorn is an Israeli company that has moved its headquarters to the States, Contivo is completely American, and Network Inference started in the UK but is currently following Unicorn to the US.
Now, although all of these companies have a number of installations already, this is really bleeding edge stuff and I confess to not having fully got to grips with it. However, I am in the process of arranging briefings with all of these companies and I will report back when I have more information and can explain it more simply. In the meantime the technology appears to have potential, not just for data integration but also other areas like EAI. Watch this space."
Friday, February 27, 2004
Time for Phase 2
I've been noticing the recent increase in activity and general buzz around the Semantic Web, with development work like RDF in XHTML and Nokia's Semantic Web Server. It's obvious now: we've reached Phase 2 - the fun phase.
"W3C announced the launch of Phase 2 of the Semantic Web Activity. Two new Working Groups have been formed; the Best Practices and Deployment WG (charter) and the RDF Data Access Working Group (charter). These join the RDF Core and Web Ontology WGs, the Semantic Web Interest Group, and the Semantic Web Coordination Group."
"The aim of this Semantic Web Best Practices and Deployment (SWBPD) Working Group is to provide hands-on support for developers of Semantic Web applications. With the publication of the revised RDF and the new OWL specification we expect a large number of new application developers. Some evidence of this could be seen at the last International Semantic Web Conference in Florida, which featured a wide range of applications, including 10 submissions to the Semantic Web Challenge (see http://challenge.semanticweb.org/). This working group will help application developers by providing them with "best practices" in various forms, ranging from engineering guidelines, ontology / vocabulary repositories to educational material and demo applications."
http://www.w3.org/2003/12/swa/swbpd-charter
"The RDF data model is a directed, labeled graph with edges labeled with URIs and nodes that are either unidentified, literals, or URIs (please see the RDF Primer for further explanation). The principal task of the RDF Data Access Working Group is to gather requirements and to define an HTTP and/or SOAP-based protocol for selecting instances of subgraphs from an RDF graph. The group's attention is drawn to the RDF Net API submission. This will involve a language for the query and the use of RDF in some serialization for the returned results. The query language may have aspects of a path language similar to XPath (used for XML in XSLT and XQuery) and various RDF experimental path syntaxes."
http://www.w3.org/2003/12/swa/dawg-charter.
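The charter's graph model is easy to picture in code: a set of labeled edges whose nodes are URIs, blank nodes, or literals, with subgraph selection as a walk out from a starting resource (roughly the "concise bounded description" idea; the data below is invented):

```python
# A graph as labeled edges; nodes are URIs, blank nodes ("_:b1"),
# or literals (quoted strings).
graph = [
    ("http://ex.org/alice", "foaf:knows", "_:b1"),
    ("_:b1", "foaf:name", '"Bob"'),
    ("http://ex.org/alice", "foaf:name", '"Alice"'),
]

def describe(node, graph, seen=None):
    """Select the subgraph about one node: its own triples, plus the
    triples of any blank nodes it points to (which have no name a
    client could use to ask a follow-up query)."""
    seen = seen or set()
    seen.add(node)
    sub = []
    for s, p, o in graph:
        if s == node:
            sub.append((s, p, o))
            if o.startswith("_:") and o not in seen:
                sub.extend(describe(o, graph, seen))
    return sub

sub = describe("http://ex.org/alice", graph)
```

A protocol like the one the charter asks for would serve the returned subgraph in some RDF serialization over HTTP; the selection step itself is the part the eventual query language has to standardize.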
Wednesday, February 25, 2004
RDQLPlus
"RDQLPlus is a tool for querying RDF graphs, featuring graphical results in a zoomable user interface (ZUI). It can work with existing RDF files, Jena2 RDF databases, and a native-Java database called Mckoi (included)."
The UI has similar scaling problems to IsaViz when displaying hundreds of statements. Good to see a pure Java implementation.
Sunday, February 22, 2004
Metadata Mac
Next Mac OS X to be Metadata-driven? "The (unconfirmed) info says that Mac OS X 10.4 will go "further than anticipated", introducing not only a "database-driven" new Finder (possibly similar to BeOS' Tracker) --although the file system itself will still be HFS+-- but also a wide support for file metadata. Please note that both the BeFS (and quite possibly this Apple implementation) is not similar to Longhorn's WinFS (apples & oranges). All this is not a surprise for us, as the people who were behind the same realization on BeOS --Dominic Giampaolo and Pavel Cisler-- today work at key positions at Apple Computer in the file system and Finder areas respectively."
I had previously mentioned another suggestion for metadata in OS X.
Marusha
Marusha: Using semantic web concepts to create your own private DJ and a more recent update.
"I ran into a number of serious scaling issues on Friday. Adding the 16 million tracks from freedb.org to the parser's database really was the straw that broke the camel's back. Lesson learnt: There are types of queries that a relational database will never be able to run in acceptable time."
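A classic instance of the problem is an unbounded path query: "everything similar to this artist, at any number of hops". In SQL (at least without recursive extensions) each extra hop costs another self-join, while over an in-memory edge list it is a plain reachability walk. The artist data below is invented:

```python
from collections import deque

# Invented "similar artist" edges.
edges = {
    "Marusha": ["Westbam"],
    "Westbam": ["Sven Väth"],
    "Sven Väth": [],
}

def reachable(start, edges):
    """All artists reachable by following 'similar to' links for any
    number of hops - a breadth-first walk over the edge list."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

similar = reachable("Marusha", edges)
```

The walk does not know its depth in advance, which is exactly what a fixed chain of relational joins cannot express.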
Saturday, February 21, 2004
Mangrove
Mangrove: An Evolutionary Approach to the Semantic Web "The Mangrove project seeks to create an environment in which users are motivated to create semantic content because of the existence of useful semantic services that exploit that content and because the process of generating such content is as close as possible to existing techniques for generating HTML documents. Our goal is to facilitate the simple annotation and subsequent extraction and querying of the enormous amount of information that already exists within the WWW's billions of HTML pages, rather than requiring the creation of new content from scratch. Our approach thus seeks to facilitate the gradual transformation of the current web into the semantic web."
Very similar to the ideas expressed recently in "interview followup". Semantic Tagger is an interesting example of this (the schema is an XML schema).
Future of MS Search
Robert Scoble On The Future Of Search Engine Technology "If we're talking about your local hard drive, searching for files on your local hard drive is still awful and getting worse...It's easier to create files now than it is to find them. "
"...to really make search work well search engines need metadata and metadata that's added by the system keeping track of your usage of files, as well as letting application developers add metadata into the system itself. In a lot of ways, weblogs are adding metadata to websites. When a weblog like mine links to a web site, we usually add some more details about that site. We might say it's a "cool site" for instance. Well, Google puts those words into its engine. That's metadata."
"Developers distrust Microsoft's intentions here. They also don't want to open up their own applications to their competitors. If you were a developer at AOL, for instance, do you see opening up your contact system with, say, Yahoo or Google or Microsoft? That's scary stuff for all of us.
But, if the industry works together on common WinFS schemas (not just for contacts either, but other types of data too), we'll come away with some really great new capabilities. It really will take getting developers excited about WinFS's promise and getting them to lose their fears about opening up their data types. "
AUIML
AUIML "AUIML is an XML dialect that is a platform and a technology-neutral representation of panels, wizards, property sheets, etc. AUIML captures relative positioning information of user interface components and delegates their display to a platform-specific renderer. Depending on the platform or device being used, the renderer decides the best way to present the user interface to the user and receive user input.
The AUIML XML is created using the Eclipse-based AUIML VisualBuilder, which allows a developer to quickly build and preview user interfaces in the Java Swing and HTML Renderers. The AUIML VisualBuilder can also automatically create data beans, event handlers, and help system skeletons for the user interface. Since it plugs into Eclipse, building the user interface and application code is an integrated proces"
Interesting that IBM is supporting Swing here and not SWT.
Ontological Programming
Semantic Integration "Basically semantic integration seems to involve using RDF/OWL to define mappings between XML vocabularies.
Recently I've been involved in co-ordinating a number of large scale data migrations to help integrate several systems. This inevitably involved exploring the business models in the affected applications to define a mapping layer to allow the data to be cleanly migrated. Also inevitably this mapping ends up being expressed as a bunch of (hairy!) procedural code. It would have been nice to have been able to express that mapping in a more declarative way, if no other reason than it's easier to understand and debug possible problems. An example of not being able to see the wood for the trees."
There are many ways to declaratively map different schemas - SQL, OO and XML all have their own tools. The value that RDF has is that you can program to a single ontology and map all of these different data sources to it.
With RDF there's not only the ability to map between XML vocabularies but to program to an ontology rather than a schema. In this case, the problem goes from MxN to M+N (or even just M).
One of the articles that highlights this issue is The Impedance Imperative Tuples + Objects + Infosets =Too Much Stuff!.
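As a toy illustration of the M+N point (entirely my own example - the source vocabularies and the "ont:name" term are made up), mapping each source's field names onto one shared ontology term lets application code program against the ontology alone, instead of against every pairwise combination:

```java
import java.util.*;

// Each source vocabulary gets ONE mapping to the shared ontology (M+N),
// instead of a mapping to every other vocabulary (MxN).
class VocabularyMapper {
    // per-source mapping: source field name -> ontology term
    static final Map<String, String> CRM = Map.of("cust_name", "ont:name");
    static final Map<String, String> ERP = Map.of("clientName", "ont:name");

    // translate one record into ontology terms; unmapped fields are dropped
    static Map<String, String> toOntology(Map<String, String> record,
                                          Map<String, String> mapping) {
        Map<String, String> out = new HashMap<>();
        record.forEach((field, value) -> {
            String term = mapping.get(field);
            if (term != null) out.put(term, value);
        });
        return out;
    }
}
```

Application code then only ever sees "ont:name", whichever system the data came from.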
Friday, February 20, 2004
US Government Understands the Semantic Web
Taxonomy’s not just design, it’s an art "If anyone understands the acronym soup of Web services, it’s Michael C. Daconta. He’s director of Web and technology services for systems integrator APG McDonald Bradley Inc. of McLean, Va. As part of that job, Daconta is chief architect of the Defense Intelligence Agency’s Virtual Knowledge Base, a project to compile a directory of Defense Department data through Extensible Markup Language ontologies. "
"GCN: Will everyone use a single taxonomy for one big semantic Web, or will organizations build their own semantic Webs?
DACONTA: There clearly will not be just one semantic Web. A lot of people are looking at taxonomies, so they have to be careful. "
Thursday, February 19, 2004
Google Won
Search For Tomorrow ""For a lot of kids today, the world started in 1996," says librarian and author Gary Price.
And yet Berkeley professor Peter Lyman points out that traditional sources of information, such as textbooks, are heavily filtered by committees, and are full of "compromised information." He's not so sure that the robotic Web crawlers give results any worse than those from more traditional sources. "There's been a culture war between librarians and computer scientists," Lyman says.
And the war is over, he adds.
"Google won.""
""A generation ago, reference librarians -- flesh-and-blood creatures -- were the most powerful search engines on the planet. But the rise of robotic search engines in the mid-1990s has removed the human mediators between researchers and information. Librarians are not so sure they approve. Much of the material on the World Wide Web is wrong, or crazy, or of questionable provenance, or simply out of date (odd to say this about a new technology, but the Web is full of stale information).
"How do you authenticate what you're looking at? How do you know this isn't some kind of fly-by-night operation that's put up this Web site?" asks librarian Patricia Wand of American University.""
""He needs one that knows that he's a big-brain tech guru and not an eighth-grader with a paper due.
"The field is called user modeling," says Dan Gruhl of IBM. "It's all about computers watching interactions with people to try to understand their interests and something about them."
Imagine a version of Google that's got a bit of TiVo in it: It doesn't require you to pose a query. It already knows! It's one step ahead of you. It has learned your habits and thought processes and interests. It's your secretary, your colleague, your counselor, your own graduate student doing research for which you'll get all the credit.
To put it in computer terminology, it is your intelligent agent.""
Tuesday, February 17, 2004
Monday, February 16, 2004
XSLT for everything
On Semantic Integration and XML "In conclusion, it is clear that Semantic Web can be used to map between XML vocabularies however in non-trivial situations the extra work that must be layered on top of such approaches tends to favor using XML-centric techniques such as XSLT to map between the vocabularies instead. "
Radar Networks Triple Store
Okay, not much sleuthing here, it's the 4th hit on Google or something, still interesting.
"2 November, 2003. A short and somewhat formal description of the Radar Networks Triple Store, which is the system that handles the semantic metadata for this website. An essay by Jack Rusher.
Introduction
A triple store is designed to store and retrieve identities that are constructed from triplex collections of strings (sequences of letters). These triplex collections represent a subject-predicate-object relationship that more or less corresponds to the definition put forth by the RDF standard.
The problem space of storing this sort of data has been explored by the graph database, object database, PROLOG language and, more recently, semantic web communities. A more thorough backgrounder is provided by the work on Datalog, Jena, Dave Beckett’s Redland, the AT&T Research Communities of Interest project, Ora Lassila’s Wilbur, and the activities of the W3C Semantic Web project, among others."
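As a rough illustration of what "store and retrieve" means for subject-predicate-object data, here's a minimal in-memory sketch. The Triple/TripleStore names and the null-as-wildcard matching convention are my own assumptions for illustration, not Radar Networks' design:

```java
import java.util.*;

// A toy triple store: add subject-predicate-object statements,
// then query with a triple pattern where null matches anything.
class TripleStore {
    public record Triple(String subject, String predicate, String object) {}

    private final List<Triple> triples = new ArrayList<>();

    public void add(String s, String p, String o) {
        triples.add(new Triple(s, p, o));
    }

    // null acts as a wildcard, mirroring triple-pattern matching
    public List<Triple> match(String s, String p, String o) {
        List<Triple> out = new ArrayList<>();
        for (Triple t : triples) {
            if ((s == null || t.subject().equals(s))
             && (p == null || t.predicate().equals(p))
             && (o == null || t.object().equals(o))) {
                out.add(t);
            }
        }
        return out;
    }
}
```

A real store would of course index each position rather than scan a list, but the query model is the same.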
Triple Store
Oh and then there's this blog (which I've read before without seeing the Radar Network postings).
The Dangers of Caffeine
Coffee-breaks sabotage employees' abilities "St Claire and Rogers decided to investigate caffeine's effects on work stress after hearing an anecdote at a stress workshop. A man described how he and a group of normally cohesive colleagues went on a business trip to the US.
Unlike in the UK, coffee was freely available and the team over-indulged. Within days their stress levels had escalated and they believe the extra caffeine had disrupted their working relationships, and impaired their working ability.
The Bristol team tested caffeine's effects on 32 coffee-drinkers. They told them they would be given a caffeinated coffee that would boost their performance, or a caffeinated coffee which causes stress-like side-effects, or decaffeinated coffee. However, unknown to the volunteers, only half the drinks contained 200 mg of caffeine and the other half contained none. The subjects then carried out two stressful tasks."
""Certainly in our experience of people drinking coffee there's a tendency for all sorts of personal interactions to get a little more intense. If there was a stressful situation there would be more shouting, yelling, louder talking," he told New Scientist. "This is very interesting confirmation.""
Beware of taking Bristolites to cafes with bottomless cups of coffee - violence, mayhem and talking louder could ensue.
Sunday, February 15, 2004
GiST and MTrees
MTree Project "The M-tree is an index structure that can be used for the efficient resolution of similarity queries on complex objects to be compared using an arbitrary metric, i.e. a distance function d that satisfies the positivity, symmetry, and triangle inequality postulates. For instance, with the M-tree you can index a set of strings and organize them according to their edit distances (the minimal number of character changes needed to transform one string into another)."
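The edit distance the M-tree quote mentions is the classic dynamic-programming computation; here's a minimal sketch (class and method names are mine, not the MTree project's). As a metric it satisfies positivity, symmetry, and the triangle inequality, which is exactly what the M-tree requires of its distance function:

```java
// Levenshtein edit distance: the minimal number of single-character
// insertions, deletions, or substitutions transforming a into b.
class EditDistance {
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        // transforming to/from the empty string costs the string's length
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(
                    Math.min(d[i - 1][j] + 1,      // delete from a
                             d[i][j - 1] + 1),     // insert into a
                    d[i - 1][j - 1] + cost);       // substitute (or match)
            }
        }
        return d[a.length()][b.length()];
    }
}
```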
MTree Applet (also part of XXL).
GiST: A Generalized Search Tree for Secondary Storage "The GiST is a balanced tree structure like a B-tree, containing pairs. But keys in the GiST are not integers like the keys in a B-tree. Instead, a GiST key is a member of a user-defined class, and represents some property that is true of all data items reachable from the pointer associated with the key. For example, keys in a B+-tree-like GiST are ranges of numbers ("all data items below this pointer are between 4 and 6"); keys in an R-tree-like GiST are bounding boxes, ("all data items below this pointer are in California"); keys in an RD-tree-like GiST are sets ("all data items below this pointer are subsets of {1,6,7,9,11,12,13,72}"); etc. To make a GiST work, you just have to figure out what to represent in the keys, and then write 4 methods for the key class that help the tree do insertion, deletion, and search."
Report on implementing GiST in Java (framed).
Other trees (or multi-dimensional access methods).
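To make the "user-defined key class" idea concrete, here's a sketch of what such a key might look like in Java; the GistKey interface and RangeKey class are illustrative inventions of mine, not the Berkeley GiST API (which is in C and defines more methods than these two):

```java
// A GiST-style key answers: might any item below this pointer match the
// query? Different key classes give different tree behaviours.
interface GistKey<Q> {
    boolean consistent(Q query);        // may the subtree contain matches?
    GistKey<Q> union(GistKey<Q> other); // combine keys when entries merge
}

// A B+-tree-like key: a numeric range covering all items beneath it,
// as in the quote's "all data items below this pointer are between 4 and 6".
class RangeKey implements GistKey<Integer> {
    final int lo, hi;
    RangeKey(int lo, int hi) { this.lo = lo; this.hi = hi; }

    public boolean consistent(Integer q) { return q >= lo && q <= hi; }

    public GistKey<Integer> union(GistKey<Integer> other) {
        RangeKey r = (RangeKey) other;
        return new RangeKey(Math.min(lo, r.lo), Math.max(hi, r.hi));
    }
}
```

An R-tree flavour would swap RangeKey for a bounding-box key with the same two operations; the tree machinery itself doesn't change.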
Saturday, February 14, 2004
Problems with Java Generics
Generics in C#, Java, and C++ "...with Java generics, you don't actually get any of the execution efficiency that I talked about, because when you compile a generic class in Java, the compiler takes away the type parameter and substitutes Object everywhere. So the compiled image for List is like a List where you use the type Object everywhere...Of course, if you now try to make a List, you get boxing of all the ints. So there's a bunch of overhead there. Furthermore, to keep the VM happy, the compiler actually has to insert all of the type casts you didn't write."
"When you apply reflection to a generic List in Java, you can't tell what the List is a List of. It's just a List. Because you've lost the type information, any type of dynamic code-generation scenario, or reflection-based scenario, simply doesn't work."
With 1.3, casting used to be a fairly big overhead; with 1.4 this isn't an issue. Running the trove4j benchmarks in 1.4 you can still see the performance difference of objects vs primitives.
The second issue, and it's the same with autoboxing, seems to highlight how the syntax gives the illusion of consistency when there is none.
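The erasure behaviour described above is easy to demonstrate; this sketch (my own, not from the interview) shows that a List<String> and a List<Integer> share a single runtime class, which is exactly why reflection can't tell you what the List is a List of:

```java
import java.util.*;

// After compilation both lists are plain ArrayList: the type parameter
// is erased, so the runtime sees one class for both.
class ErasureDemo {
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        return strings.getClass() == ints.getClass(); // true under erasure
    }
}
```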
Trust Visualization
Trust on the Semantic Web has some examples of visualization. I recently found, again, the visualization of newsgroups by Microsoft Research called Netscan. I was reminded about this after reading unstruct.org's "Ouch! This is a red hot topic...".
New Sesame Home
"The openRDF.org site is a community site that is the center for all Sesame-related development. Here, developers and users can meet and discuss, ask questions and submit problem reports. The latest news about Sesame will be posted here."
Friday, February 13, 2004
Kowari 1.0.1
After some feedback about problems with building Kowari we've decided to release a new version that compiles and builds successfully - the problems were due to Barracuda. The upshot, though, is that all the bugs we fixed and the features we've been working on - Jena and JRDF support, RDQL, improvements in resource allocation and handling, and a Swing based iTQL UI - get released.
We're also having fun loading millions of triples on an Opteron system running Linux. It gets about 4,000 triples/second after 13 million triples - which isn't quite fast enough, of course it's never fast enough.
Kowari project page.
Brain Dead Farting - I Don't Think in Triples
There looks to be a nice response on TriX. When I wrote that piece I'd hoped it was something people would've thought of before - that someone would have a simple "here's the XSLT you boob" reply and be done with it. I guess it's time to read those 39 odd emails.
Okay, that doesn't seem to have been enlightening - especially when there's no response to your mail. It's not supposed to be human readable, but someone has to write the XSLT for the "user".
Eric Jain has had a similar experience with the RDF/XML syntax that I have : "Interestingly, when presented with the choice of working with an XML or an RDF/XML representation of the same data, our developers (somewhat familiar with XML, not RDF) choose to use the RDF version (to my great relief :-). The data is relatively complex, with lots of cross-referencing, which the RDF/XML syntax can handle in a simple and consistent way."
Alberto Reggiori said: "but the real point here is how much work a user/programmer has to put into writing and managing RDF descriptions - even though RDF is supposed to be for machines, the poor users will have most of time mark-up their data into their templates of scripts, JSP, ASP and so on. TriX is definitively a step ahead compared to RDF/XML - added DTD, XMLSchema (then kinda "deterministic" markup) and named graphs are very cool - but still too complicated for the average human being to use, due he/she has to think in terms of statements, subjects, predicates, objects, collections, reification and so on. Definitively, such an "assembly like language" is very good for general purpose RDF toolkits and frameworks, and more experienced users. But the major part of XML folks out there do not have a clue (or few) why they have to "denormalize" their data all the time into RDF constructs."
After reading the TriX paper again, I still wonder exactly what problem it's trying to solve. I don't think it makes converting RDF to XML any easier, and it is significantly more verbose than RDF/XML or N3; some of the small things like named graphs are useful, but not by themselves. Using XQuery to query it would be slow and rather cumbersome - it's much better to use a triple store.
Thursday, February 12, 2004
13 Things to Fix about EJBs
"In particular developers appear to be most interested in the following:
1. Ditch CMP or make it simpler like Hibernate or JDO
2. Use a POJO programming model
3. Add support for Interceptors/filters/AOP.
4. Eliminate/Consolidate the component interfaces (remote, local and endpoint).
5. Make deployment descriptors more like XDoclet
6. Instance-level authentication/entitlement.
7. Replace the JNDI ENC with Dependency Injection (a.k.a. IoC).
8. Add a Mutable Application Level Context
9. Support development and testing outside the container system.
10. Define cluster-wide singletons for EJB.
11. Clearly specify and standardize class loading requirements.
12. Standardize the deployment directory structure for EJBs.
13. Support Inheritance."
13 improvements for EJB.
Wednesday, February 11, 2004
IoC
I was looking for more information about this today and found this (Martin Fowler has an earlier article and so does Michael Yuan):
"The best way to describe what IoC is about, and what benefits it can provide, is to look at a simple example. The following JDBCDataManger class is used to manage our application's accessing of the database. This application is currently using raw JDBC for persistence. To access the persistence store via JDBC, the JDBCDataManger will need a DataSource object. The standard approach would be to hard code this DataSource object into the class, like this:
public class JDBCDataManger {
    public void accessData() {
        DataSource dataSource = new DataSource();
        // access data
        ...
    }
}
Given that JDBCDataManger is handling all data access for our application, hard coding the DataSource isn't that bad, but we may want to further abstract the DataSource, perhaps getting it via some system-wide property object:
public class JDBCDataManger {
    public void accessData() {
        DataSource dataSource =
            ApplicationResources.getDataSource();
    }
}
In either case, the JDBCDataManger has to fetch the DataSource itself.
IoC takes a different approach — with IoC, the JDBCDataManger would declare its need for a DataSource and have one provided to it by an IoC framework. This means that the component would no longer need to know how to get the dependency, resulting in cleaner, more focused, and more flexible code."
A Brief Introduction to IoC
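For contrast with the quoted code, here's a minimal sketch of the injected version the article argues for. The DataSource interface below is a hypothetical stand-in (not javax.sql.DataSource), and constructor injection is just one common flavour of IoC:

```java
// A stand-in dependency: one abstract method, so a lambda can supply it.
interface DataSource {
    String query(String sql);
}

// The manager declares its need for a DataSource; a container (or a
// test) hands one in. The class no longer knows how to obtain it.
class JDBCDataManger {
    private final DataSource dataSource;

    JDBCDataManger(DataSource dataSource) { // dependency injected here
        this.dataSource = dataSource;
    }

    String accessData() {
        return dataSource.query("SELECT 1");
    }
}
```

Because the dependency arrives from outside, a unit test can pass in a fake DataSource without any framework at all - which is most of the practical payoff.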
Tuesday, February 10, 2004
RDF in GPL and LGPL
Creative Commons Includes GPL And LGPL Metadata "I was looking at the Creative Commons site this weekend, and was surprised to find, on their license generation page, entries (translated into Portuguese) in a sidebar for the GNU General Public License and GNU Lesser General Public License, including RDF blocks. Since CC is pushing for projects that can generate, validate, display and search for CC license metadata, how cool would it be to be able to do a Google search for GPL-licensed material, or a P2P network for MP3s released under the CC Attribution-ShareAlike license? As an example, Nathan Yergler has released mozCC, a plugin for Mozilla and Firebird that allows you to view CC license information embedded in a webpage, and provides icons on the status bar displaying the CC license options."
Time to do one for MPL I guess.
RAD XUL
Building RAD Forms and Menus in Mozilla "In Rapid Application Development with Mozilla, Web, XML, and open standards expert Nigel McFarlane explores Mozilla's revolutionary XML User Interface Language (XUL) and its library of well over 1,000 pre-built objects. Using clear and concise instruction, McFarlane explains what companies such as AOL, IBM, Hewlett-Packard, and others already know—that Mozilla and XUL are the keys to quickly and easily creating cross-platform, Web-enabled applications. The Mozilla Platform encourages a particular style of software development: rapid application development (RAD). RAD occurs when programmers base their applications-to-be on a powerful development tool that contains much pre-existing functionality. With such a tool, a great deal can be done very quickly. The Mozilla Platform is such a tool."
Monday, February 09, 2004
Thursday, February 05, 2004
SemEarth
The Semantic Earth "Thanks to the constellation of technology that enables digital networks to be laid over the places of the earth, wherever we are we will be able to hear the human conversation that has occurred about that place - the history that occurred there, the aesthetics to be savored, the commerce transpiring at that very moment, recommendations offered by strangers and friends."
More comments: "Predictably, I'm curious because this is an area that my research group has been interested in for some time. For example, our work in creating digital representations for people, places and things led us to semantic location and the Websign project...will a proprietary model be operative?"
Planetarium
XML Watch: Planet Blog "As the number of Planet-style aggregators grows (while I'm writing this, Planets Apache and SuSE are under active development), so grows a variety of software for creating the aggregated sites. There are now at least three codebases for creating such sites, originating with Monologue, Planet GNOME, and Planet RDF. It would be good if each of these codebases could interoperate at least on the basis of configuration files, such as the RDF blog listing from Listing 1. Additionally, we may want a more advanced way of describing each of the planets, perhaps so an über-aggregator -- the Planetarium! -- can be made. (Actually, Jeff Waugh, who created Planet GNOME, has just registered "planetplanet.org", so watch this space!)
I'll leave you with the code in Listing 5, which is a suggestion of how multiple planets could be described; processors follow the seeAlso links to retrieve a list of contributors for each planet. If the choice is made to use RDF/XML, creating the über-configuration file is as easy as aggregating the various RDF blog lists."
A great article and all, especially Figure 2.
Quick Links
* The Semantic Web Made Easy - about a startup using RDF called Radar Networks.
* Protege 2.0
* SemWeb Central - OS Semantic Web tools.
* OntoJava (linked to previously)
* Mac G5 Cube - Instructions on how to build a Mac cube with the G5's grill look.
* Why your Movable Type blog must die - Another anti-blog rant.
* What if there was a data format, and nobody cared? - About social networking and RSS.
Wednesday, February 04, 2004
Docco
Tockit is an open source project written by people from DSTC. It includes Docco, a great interface that uses lattices: a representation of Lucene results combined with metadata extracted by plugins like POI. It seems to be using only the text found in documents, not actual concepts.
It helps to remove the empty nodes to better understand the results.
Screenshots are also available.
Other FCA tools written using Java are available at the ToscanaJ site.
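For the curious, the lattices Docco draws come from Formal Concept Analysis. Here is a minimal sketch of the core idea (nothing to do with Tockit's actual code): from a context mapping documents to the terms they contain, derive every formal concept, i.e. every pair of documents and terms that exactly determine each other.

```python
from itertools import chain, combinations

# Minimal Formal Concept Analysis sketch: a "context" maps documents to the
# attributes (terms) they contain. A formal concept is a pair (extent, intent)
# where the extent is exactly the set of docs sharing all terms in the intent,
# and the intent is exactly the set of terms shared by all docs in the extent.

def concepts(context):
    objects = set(context)
    attrs = set(chain.from_iterable(context.values()))
    found = set()
    # Naive approach: for every attribute subset, compute its extent,
    # then re-derive the full intent of that extent.
    for r in range(len(attrs) + 1):
        for subset in combinations(sorted(attrs), r):
            extent = {o for o in objects if set(subset) <= context[o]}
            intent = attrs.copy()
            for o in extent:
                intent &= context[o]
            found.add((frozenset(extent), frozenset(intent)))
    return found

docs = {
    "doc1": {"xml", "rdf"},
    "doc2": {"rdf", "owl"},
    "doc3": {"xml"},
}
lattice = concepts(docs)
```

Real FCA tools use much smarter algorithms than this exhaustive enumeration, but the output is the same kind of lattice Docco visualizes.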
SWOOP
SWOOP (Semantic Web Ontology Overview and Perusal) "SWOOP is a simple and elegant utility to browse Ontologies written in OWL in a hyperlinked thesaurus-style format...Allows users to add OWL ontologies to the Knowledge Base and browse terms listed in them (sorted alphabetically). These ontologies can be saved locally for faster retrieval at a later stage (uses Jena 2.0)"
EII
A new view on data "Composite Software is at the forefront of this trend. Here's what EII is not, according to Jim Green, Composite's CEO and chairman: EII is not EAI (enterprise application integration). EAI pushes data around; EII is a pull system. If an address changes in your CRM application, EAI will push that information out to your ERP (enterprise resource planning) system, for example. Conversely, EII, pulls only the data you need out of ERP and CRM systems and offers it up in a single view for analysis.
Composite Software's Information Server stores the meta data (the data about the data), the fields, and the relationships between them. When the user executes the query, it fetches the data from the underlying systems to present a synthetic view.
"Composite Software's Composite Information Server joins data from different types of resources and creates an alias so it looks different than when it was stored," Green says.
EII is also not BPM (business process management). It has nothing to do with changing business processes. Composite Software's Murthy Nukala, vice president of marketing, pointed out some of the benefits of EII.
"Data takes up most of the cost of an integration effort. Increased understanding of that data is mission-critical, and you need to have a strategy about how you handle data," said Nukala. "
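The push-versus-pull distinction Green draws can be sketched in a few lines. This is my own illustration with made-up CRM/ERP records, not Composite's product: nothing is copied between systems; a virtual view fetches just the fields a query needs, at query time.

```python
# Hypothetical sketch of the EII "pull" model: the underlying systems keep
# their own data; a combined view pulls only the needed fields on demand.

CRM = {"cust-1": {"name": "Acme", "address": "12 Main St", "segment": "enterprise"}}
ERP = {"cust-1": {"open_orders": 3, "credit_limit": 50000}}

def customer_view(cust_id):
    """Pull a single combined view from the underlying systems at query time."""
    crm, erp = CRM[cust_id], ERP[cust_id]
    return {
        "name": crm["name"],                # pulled from CRM
        "address": crm["address"],          # pulled from CRM
        "open_orders": erp["open_orders"],  # pulled from ERP
    }

view = customer_view("cust-1")
```

Contrast with EAI, where a changed address in the CRM would be pushed out and stored again in the ERP.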
Tuesday, February 03, 2004
MOFman Prophecies
"If done correctly, a meta data-based approach can allow for reuse of interface definitions, messages and other pieces of the integration puzzle. For reuse to happen, though, developers must be able to quickly and easily find previously developed pieces. A centralized repository approach is one way to let that happen...The Holy Grail for this technology is creating, managing and reusing meta data models that are then redeployed to execution engines to reduce the amount of code that needs to be developed manually."
"Vendors, including Redwood City, Calif.-based Informatica Corp. and Bournemouth, U.K.-based Adaptive, have implemented MOF for their warehouse tools and engines. Informatica’s SuperGlue helps to put context around information used in traditional enterprise application integration (EAI)...[it] collects, stores and helps to analyze meta data, including providing audit trails. “The value is in being able to see the dependencies and linkages between data,” Poonen explained. Six customers, including one federal government agency that Poonen could not name, currently use SuperGlue."
"Beyond technology, successful meta data-based integration will depend on corporate culture and practices. Not everyone agrees that a centralized “center of excellence” approach to integration meta data is an absolute given to make it work, but some methodology or approach clearly is -- especially if reuse is ever going to happen."
"By 2005, predict Gartner analysts, more than half of large organizations will have multiple sources of integration technology. “As that proliferation occurs, being able to recognize the use of meta data and have consistent use across all the different deployments becomes important,” said Thompson."
The next step for meta data: Application integration
Monday, February 02, 2004
Lego as an Analogy
We were just talking about this the other day; I recently came across this Lego diagram describing the trade-offs of business objects.
Saturday, January 31, 2004
Pragmatic Metadata
Content-aware searching "Brin’s pragmatic stance sharply opposes the idealistic view of the Web’s inventor, Tim Berners-Lee, who continues to evangelize his vision of a Semantic Web full of carefully encoded content that we can precisely search and fluidly recombine. My own humble contribution to this debate is a prototype search engine, now running on my Weblog, that tries to steer a middle course between the Scylla of simple fulltext search and the Charybdis of unwieldy tagging schemes and brittle ontologies."
"Remember, the pools of HTML content that your people routinely create, and the infinitely vaster pools to which they have access, are full of intrinsic metadata — including the links, tables, images, and other elements that occur naturally within HTML content. Mining that metadata may be more practical than you think."
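Mining that intrinsic metadata needs nothing exotic. A minimal sketch (my example page, not the prototype from the article) using only the standard library:

```python
from html.parser import HTMLParser

# Sketch of mining the "intrinsic metadata" already present in HTML:
# the links and images occurring naturally in a page.

class IntrinsicMetadata(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links, self.images = [], []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "img" and "src" in attrs:
            self.images.append(attrs["src"])

page = '<p>See <a href="http://example.org/">this</a> <img src="chart.png"></p>'
miner = IntrinsicMetadata()
miner.feed(page)
```

From there a search engine can index link targets and image references alongside full text, without anyone hand-tagging a thing.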
Web Services then the Semantic Web
The Web within the Web "All these new protocols, SOAP in particular, took years to develop. Indeed, they're still works in progress, in part because contributing companies want to receive patent royalties or just don't want a competitor to control a standard. Those same concerns sabotaged two earlier transport mechanisms, one from the Unix world and one invented by Microsoft. "
"Although Web services allow a machine to publish its data, making it available to another machine, the two have to agree on the structure of the data they are publishing. In the semantic Web, this sort of agreement will be largely unnecessary. "
Bossam
Bossam rule engine v0.7b42 is available for download. Only the binary form of the engine is available, and you need a Java runtime (J2SE 1.4 or later) to run it. The engine is not feature-rich, has many problems, and is buggy. Still, you can process RDF queries and perform reasoning over OWL ontologies with Bossam. Currently, Bossam supports only one rule language, Buchingae.
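To give a flavour of the kind of reasoning a rule engine performs over an ontology, here is a generic forward-chaining sketch (this is not Bossam's API or Buchingae syntax, just the underlying idea): saturate a set of triples under the transitivity of a predicate such as rdfs:subClassOf.

```python
# Generic forward-chaining sketch: repeatedly apply the transitivity rule
# for one predicate until no new facts appear (a fixpoint).

def transitive_closure(facts, pred):
    """Saturate `facts` with the transitivity of `pred` (e.g. subClassOf)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for (s1, p1, o1) in list(facts):
            for (s2, p2, o2) in list(facts):
                if p1 == p2 == pred and o1 == s2:
                    new = (s1, pred, o2)
                    if new not in facts:
                        facts.add(new)
                        changed = True
    return facts

kb = {("Dog", "subClassOf", "Mammal"), ("Mammal", "subClassOf", "Animal")}
closed = transitive_closure(kb, "subClassOf")
```

A real engine handles many rules at once and indexes its facts; the fixpoint loop is the same in spirit.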
Friday, January 30, 2004
Datacentric Web - Microcontent déjà vu?
The Data Centric Web "This shift in focus from documents to data and from humans to computers is simple and yet profound. Just imagine a world in which every piece of data is immediately and automatically accessible from any computer via the web using a simple, universal set of protocols and formats. Indeed, such a vision has long represented the “holy grail” of Enterprise Application Integration (EAI) and yet attempts to realize this vision have been woefully inadequate to date."
Very similar to an older article about the microcontent client and of course very similar to the Semantic Web too.
Introducing the Microcontent Client
"The microcontent client is an extensible desktop application based around standard Internet protocols that leverages existing web technologies to find, navigate, collect, and author chunks of content for consumption by either the microcontent browser or a standard web browser. The primary advantage of the microcontent client over existing Internet technologies is that it will enable the sharing of meme-sized chunks of information using a consistent set of navigation, user interface, storage, and networking technologies. In short, a better user interface for task-based activities, and a more powerful system for reading, searching, annotating, reviewing, and other information-based activities on the Internet."
The good and the bad
I'm in heaven "Today I'm in geek heaven in so many ways. I read Curtis Hovey's recent weblog entry. He writes:
I strongly feel that a GNOME metadata solution should be based on metadata standards: RDF, OWL, FOAF and use common grammars."
From the original posting: "I strongly feel that a GNOME metadata solution should be based on metadata standards: RDF, OWL, FOAF and use common grammars. I'm shopping for a new Medusa backend because I don't think Medusa should be in the DB business, and it needs an extensible schema."
"The bad
* BDB 4: will everything break when BDB 5 comes out
* Mysql: a bit of a nuisance to setup for single users
* query: applications and users need robust searching
* scalability: will this work at 100 megs, the size of my Medusa db"
Taxonomy Warehouse
Agency taxonomies are a tall order, experts say "An agency building an enterprisewide taxonomy should expect to see more than a million categories within their design, according to Claude Vogel, chief technology officer for search engine company Convera Corp. of Vienna, Va."
"“People underestimate the magnitude of how big their taxonomies will be,” Vogel said, adding that commercial software, such as Convera’s, can handle most, though not all, of the job. "
http://www.taxonomywarehouse.com/
Tuesday, January 27, 2004
Two Interviews
Checking in with the Inventor: Tim Berners-Lee ""The general public is seizing on the Web as a way to have a conversation," he said in our own chat this week. "That for me is very inspiring. It doesn't tell me something about the Web. It tells me something about humanity. The hope for humanity is that people do want to work things out. They do want to come to common understandings, and they will do it by constantly refining the way they've expressed their own ideas--and occasionally, on a good day, listening to the way other people have expressed theirs.""
Under the Iron interview with Aaron Swartz "You’ve put a tremendous amount of work in, for example, RDF and RSS 1.0 (the latter using the former). People say this is the basis of the “Semantic Web”. Could you cue us in on what they hope to achieve with this, how they will make everyone start doing something to achieve it, and what exactly it is we’ll start doing? Do you believe this is possible?
So, uh, here’s the plan:
1. Collect data
2. ???????
3. PROFIT!!!
Uh, more specifically, the idea is to get everyone sharing their vast databases of information in RDF with each other. Then we can write programs that put this data together to answer questions and take actions to make our lives easier."
Thursday, January 22, 2004
What makes technology succeed?
What matters? "In many cases, it is much more important that a choice be made so that we as a society can benefit from the network effects, than it matters which choice is made...the choice between technologies is often of much smaller significance for us as a society than that there be a choice. Networks effects have their role in this and provide many of the benefits.
But if we want to benefit not just from the network effect but also from the advantages of technology, it is in everyone's interest that the network effects cut the right way: that we choose as a society the technologies that work best.
Now, if network effects are the best predictor, then we must infer that the people who actually are responsible for making a good decision are the early adopters. In IT, that means you. You have a responsibility to judge what matters not by network effects but by technical merit. This is a special case of the Categorical Imperative of Immanuel Kant, which you may dimly remember is phrased something like this: “act only on that maxim by which you can at the same time will that it should become a universal law,”1 but which your mother may have expressed more colloquially as, “What would the world be like if everyone did that?”"
"But the reasons that it is a good idea for them to be widely adopted have nothing to do with the differences between SGML and XML, and everything to do with the essential characteristics of the languages...But the choice of any technology is a cost/benefit calculation. And the only changes XML made to that calculation were in lowering the costs of deployment, not in adding any benefits — unless you count the the benefits of the network effect, which are, as I have suggested, considerable."
Wednesday, January 21, 2004
Cluster Graphing
"Anyone who has ever had to complete a "what doesn't belong" question on a test has an interest in clustering technology. How close are the terms "slime mold", "skunk odor removal" and "luxury bathroom" anyway? Zoom in here and find out. (Clue: They are all green)."
http://labs.yahoo.com/demo/clustergraph/top_level#img
Okay, that looks cool...then put your mouse over the map...
Kowari Update
* iTQL will be changing to be RDQL with proprietary extensions. This is based on the recent RDQL submission and previous discussions we've had.
* The Kowari lite development (just the minimum number of jars to get it going) is continuing. The iTQL command line UI has been improved so it's at least as good as the web iTQL UI.
* RDF Query Languages Eric van der Vlist asks, "Where are the triples?" - we've often thought the same thing. The iTQL interpreter has long planned to support spitting out RDF/XML results as well as its existing XML and ResultSet based answers.
* CVS is going to stay internal until we get significant external development. It will be updated infrequently (as bugs are fixed) and with each release.
* Started looking at Aquamarine's API as far as JRDF is concerned. JRDF has had some minor updates (only in CVS at the moment).
Actually, after thinking about querying, what you really want is to define a query and have it return all the RDF/XML related to a resource that matches the query - it's something that Guha wrote about (as pointed to by Dan Brickley). This would require OWL and schema support, but it's something unique over an existing SQL database. You could also hack this up by creating a large WHERE clause that defines all the properties of the resource, but that's a lot of effort to do each time.
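The shape of that query is easy to sketch. This is my own toy illustration (made-up triples, not iTQL): first find the resources matching a constraint, then return every triple about them, rather than enumerating each property by hand in a WHERE clause.

```python
# Sketch of "describe the matching resource": match subjects by one
# property/value constraint, then gather every triple whose subject matched.

TRIPLES = [
    ("doc1", "dc:creator", "Andrew"),
    ("doc1", "dc:title", "Kowari Update"),
    ("doc2", "dc:creator", "Someone"),
]

def describe(triples, predicate, value):
    """Return all triples about every resource with the given property value."""
    subjects = {s for (s, p, o) in triples if p == predicate and o == value}
    return [t for t in triples if t[0] in subjects]

about = describe(TRIPLES, "dc:creator", "Andrew")
```

With OWL and schema support, the engine could widen the match to inferred properties as well, which is where this gets interesting.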
Semantic Google, Semantic Web
Reading this discussion about semantics (linked to by this blog entry), it's especially encouraging to see people, when talking about this stuff, talk about RDF and the Semantic Web.
Especially, this on the second page:
"Then I re-read G's analysis of Vijay's article (previous post in this thread) in which he points out that Google/Froogle is already extracting this semantic information from non-RDF documents and doing a pretty good job of it all things considered (even if they are trying to sell off a forum moderator, and pretty cheaply, too, I might add ).
So, if I'm understanding things correctly... we don't have to convert everything over to XML (at least not right away) in order for this to work. Which is a good thing, because there are a buncha individuals and mom & pops out there (and some companies who ought to know better and could certainly afford the upgrade) who haven't even started using CSS and HTML4.x, much less XHTML or XML."
"Think of RDF (Resource Description Framework) as the foundation. Think of XML (eXtensible Markup Language) as the formatting language used to deliver it."
Putting Ontologies to Work
Judging the likely Success of an Ontology "Clay Shirky is obviously right when he states that a single monolithic ontology will never work. His critics are equally right when they claim the Semantic web will only work if it is a mélange of multiple interoperable Ontologies. What is missing from the debate is a more detailed explanation of what ontologies are good at, how they interoperate, and why systems based on ontologies succeed or fail."
"Ontologies, far from being an unproven new concept, are already in practical daily use. They form the foundation of classification systems, databases, and object oriented software applications."
I enjoyed reading this article, especially as it touches on many areas (like the relational model) and previous articles. This is virtually the perfect anti-Shirky piece.
Tuesday, January 20, 2004
Apache XML
Not what you would initially think: "Within the Apache Longbow, eXtremeDB will manage secure, digitized battlefield data. eXtremeDB's XML interface will facilitate communications, both internally and between the attack helicopter and external (ground and air) systems. Embedded software including eXtremeDB will run on airborne PowerPC processors and a commercial real-time operating system (RTOS). The program is being developed by The Boeing Company's Phantom Works organization in Mesa, Ariz."
http://www.xmlmania.com/news_article_846.php
Combine Two Technologies
Like putting a clock in an existing product, I keep thinking of combining an XML Swing library with JSF. It would be nice to use JSF as a way to provide an abstraction over differing rendering technologies (this was the first thing I thought of when I saw JSF). This interview had some interesting bits of information on this:
"One of the unique things about Faces is that it allows you to have separate classes for rendering a UI component. So a simple text box can consist of a UIInput component, which represents the concept of collecting user input, and a Text renderer, which knows how to display a textbox in HTML. You can create separate renderers for different types of clients -- one for HTML, one for SVG, and one for WML, for example...the third-party component market will continue to grow, not just with HTML components (which will be first), but also components and renderers that support other devices and richer clients."
"There's a sample in the current Faces early access release of XUL instead of JSP, but I think more work needs to be done to prove that other display technologies can really be first-class citizens."
"JavaServer Faces is also a good technology for thin client applications that aren't HTML-based. I've mentioned WML, but you could also write a Java applet application, or some other non-browser client that works with JSF. We'll see these types of applications evolve over time.
Personally, I think fat clients are great for some applications, like RSS News Readers. But web applications are great for other things, and JSF is a good way to build those types of applications. "
The latest download includes XUL in the non-JSP examples.
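The component-versus-renderer split described above is easy to sketch. JSF itself is Java, of course; this is just a language-neutral illustration with invented class names, showing one input component rendered by two client-specific renderers.

```python
# Sketch of the component/renderer split: one input component representing
# the concept of collecting user input, plus per-client renderers.

class UIInput:
    """Represents collecting user input, independent of any markup."""
    def __init__(self, name, value=""):
        self.name, self.value = name, value

class HtmlTextRenderer:
    """Knows how to display the component as an HTML textbox."""
    def render(self, component):
        return f'<input type="text" name="{component.name}" value="{component.value}">'

class WmlTextRenderer:
    """Renders the same component for a WML (mobile) client."""
    def render(self, component):
        return f'<input name="{component.name}" value="{component.value}"/>'

field = UIInput("email", "a@example.org")
html = HtmlTextRenderer().render(field)
wml = WmlTextRenderer().render(field)
```

The component never changes; only the renderer does, which is exactly what makes SVG, XUL, or applet front-ends plausible.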
Monday, January 19, 2004
Related To
Semantic Similarity in a Taxonomy "This article presents a measure of semantic similarity in an IS-A taxonomy based on the notion of shared information content. Experimental evaluation against a benchmark set of human similarity judgments demonstrates that the measure performs better than the traditional edge-counting approach."
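The measure is simple to sketch: score two terms by the information content, -log p(c), of their most informative common ancestor in the IS-A taxonomy. The taxonomy and probabilities below are toy values of my own, not data from the paper.

```python
import math

# Shared-information-content similarity, after the article's idea:
# sim(a, b) = max over common ancestors c of IC(c), where IC(c) = -log p(c).

PARENT = {"dog": "mammal", "cat": "mammal", "mammal": "animal", "animal": None}
P = {"dog": 0.05, "cat": 0.05, "mammal": 0.2, "animal": 1.0}  # toy corpus probabilities

def ancestors(term):
    """The term itself plus everything above it in the IS-A hierarchy."""
    out = set()
    while term is not None:
        out.add(term)
        term = PARENT[term]
    return out

def similarity(a, b):
    common = ancestors(a) & ancestors(b)
    return max(-math.log(P[c]) for c in common)

score = similarity("dog", "cat")
```

Note why this beats edge counting: "dog" and "cat" meet at the informative "mammal" (IC ≈ 1.61), whereas two terms whose only shared ancestor is the root "animal" (p = 1, IC = 0) score zero, regardless of how few edges separate them.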
Xen
'Xen' programming language unites C#, XML and SQL. "I am currently working on language and type-system support for bridging the worlds of object-oriented (CLR), relational (SQL), and hierarchical (XML) data, and of course first class functions," explains Meijer.
Meijer's paper, Unifying Tables, Objects, and Documents, explains this idea more fully. Beware the Haskell programmer. Some of the syntax reminded me of Groovy. ExtremeTech also has an article.
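To see what the bridging amounts to, here is a hand-rolled round-trip from a plain object to XML. Xen's point is that the language's type system does this for you; the class and field names below are my own invention.

```python
import xml.etree.ElementTree as ET

# Minimal sketch of object/XML bridging, done by hand with a plain class:
# every field of the object becomes a child element.

class Customer:
    def __init__(self, name, city):
        self.name, self.city = name, city

def to_xml(obj, tag):
    elem = ET.Element(tag)
    for field, value in vars(obj).items():
        child = ET.SubElement(elem, field)
        child.text = str(value)
    return ET.tostring(elem, encoding="unicode")

xml = to_xml(Customer("Acme", "Brisbane"), "customer")
```

In Xen this mapping (and the SQL one) is not library glue but part of the static type system, which is the interesting bit.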
Which Schemas?
WinFS Is a Storage Platform "WinFS is an active storage platform for organizing, searching for, and sharing all kinds of information. This platform defines a rich data model that allows you to use and define rich data types that the storage platform can use. WinFS contains numerous schemas that describe real entities such as Images, Documents, People, Places, Events, Tasks, and Messages."
Saturday, January 17, 2004
No unstructured data
MORE ON “UNSTRUCTURED” THINKING "There is no such thing as "unstructured data". That means random noise, which has no structure whatsoever and, therefore, is meaningless. It is the structure that gives meaning/content and makes data.
It has nothing to do with scanning, or incompleteness, or missing, or anything. It is structured, whatever it is. Diagrams have one type of structure, partial documents different types of structure, but there is always some structure by definition.
The term "unstructured data" is a misnomer based on misconception: it essentially refers to data that is not structured in tables, or spreadsheets, or whatever; mainly text, graphics, etc. But that is not unstructured, it's just different structures than tables or spreadsheets, that's all.
And that's a core issue, because structure determines the integrity and manipulation of the data, which are different for each type of structure. The point of relational structure is that it is the simplest formal structure for integrity and manipulation. Any other structure adds complexity, but no power."
It has nothing to do with scanning, or incompleteness, or missing, or anything. It is structured, whatever it is. Diagrams have one type of structure, partial documents different types of structure, but there is always some structure by definition.
The term "unstructured data" is a misnomer based on misconception: it essentially refers to data that is not structured in tables, or spreadsheets, or whatever; mainly text, graphics, etc. But that is not unstructured, it's just different structures than tables or spreadsheets, that's all.
And that's a core issue, because structure determines the integrity and manipulation of the data, which are different for each type of structure. The point of relational structure is that it is the simplest formal structure for integrity and manipulation. Any other structure adds complexity, but no power."
Thursday, January 15, 2004
Another Java RSS Parser
FeedParser "The main API is very similar to JAXP, TRaX, SAX, and is designed to be very flexible. Having been a veteran of the RSS wars, member of the RSS 1.0 working group, and Atom developer, I think this takes into consideration all major issues with RSS/Atom feed formats and integration with the Java language...RSS serialization support. Serializers for all RSS versions (1.0, 2.0, Atom, etc) with the same code."
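This is not FeedParser's API, but for comparison, here is roughly what any such parser does under the hood for RSS 2.0, using nothing but the standard library (feed contents invented for the example):

```python
import xml.etree.ElementTree as ET

# Minimal RSS 2.0 item extraction: pull (title, link) pairs out of a feed.

RSS = """<rss version="2.0"><channel>
  <title>Example Feed</title>
  <item><title>First post</title><link>http://example.org/1</link></item>
  <item><title>Second post</title><link>http://example.org/2</link></item>
</channel></rss>"""

def parse_items(rss_text):
    root = ET.fromstring(rss_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]

items = parse_items(RSS)
```

The hard part a real library earns its keep on is the mess this sketch ignores: the many incompatible RSS dialects, namespaces, and malformed feeds in the wild.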
Kowari
Well, it's not quite there yet, but it will be available at SF's Kowari Project page (a 45MB download and in CVS). I think I've mentioned this before: one of the future goals is to reduce the download to the minimum set of jars.
Tuesday, January 13, 2004
Why the Dock Sucks
Top Nine Reasons the Apple Dock Still Sucks "The Dock is like a brightly-colored set of children's blocks, ideal for your first words—dog, cat, run, Spot, run—but not too useful for displaying the contents of War and Peace."
Active Internet
A collection of articles about how computers and the Internet are turning people into content producers not just consumers:
* Weblogs, RSS and the Rise of the Active Web - "...we show how blogging – originally a cross between self-expression and journalism – and its tools have morphed to give users some of the power promised by the so-called Semantic Web...they can construct personal news or commerce portals for themselves or for third parties, track multi-person blog conversations across the Web, or figure out other ways to control their digital environment that we have not thought of yet."
* The New Economy Hack: Turning Consumers into Producers - "That industry lately has become vigilant about threats from its customers, which it still thinks of as consumers. Instead it should be watching how Apple transforms those consumers into producers."
* Democratizing the Media, and More - "Smarter folks will understand the enormous opportunity it represents. They can start listening, really listening, to what people are saying. And they can dip into the vast pool of creative talent that exists outside the usual channels."
I finally have an excuse to link to Bush In 30 Seconds. My favourites were: In My Country, What are we teaching our children?, Imagine, Human Cost of War and Bush's Repair Shop. I still think the quality, even in the top 14, was spotty but that's where you need good annotation and recommendation software.
Monday, January 12, 2004
The Importance of Ontology
Ontology and Integration - Managing Application Semantics Using Ontologies and Supporting W3C Standards " Ontologies are important to application integration solutions because they provide a shared and common understanding of data (and, in some cases, services and processes) that exists within an application integration problem domain, and how to facilitate communication between people and information systems. By leveraging this concept we can organize and share enterprise information, as well as manage content and knowledge, which allows better interoperability and integration of inter- and intra-company information systems. We can also layer common ontologies within verticals, or domains with repeatable patterns."
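As a concrete (and entirely hypothetical) illustration of the "shared and common understanding" the article describes, OWL lets two applications' vocabularies be declared equivalent, so an integration layer can treat them as one concept. The namespaces and terms below are invented for illustration:

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix appA: <http://example.org/appA#> .
@prefix appB: <http://example.org/appB#> .

# Hypothetical mapping: two applications' terms declared equivalent,
# so integration code can query either vocabulary and get both.
appA:Customer    owl:equivalentClass    appB:Client .
appA:phoneNumber owl:equivalentProperty appB:telephone .
```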
Sunday, January 11, 2004
The old-fashioned way of integration
Compare and Contrast JOLAP and XML for Analysis and Intelligent Business Strategies: OLAP in the Database. I've covered some of this previously, especially JMI.
"JOLAP is a J2EE object-oriented application programming interface (API) designed specifically to address the programming needs of Java developers by providing a standard set of object classes and methods for BI."
"XMLA (www.xmla.org) is a linguistic interface with no preference for programming language or object model. This linguistic interface is implemented as a web service and also defines a standard query language (mdXML) for BI."
"Hyperion views JOLAP and XMLA as complementary rather than competing standards. Although you can implement XMLA without using JOLAP, the JOLAP specification supports the web services architecture that depends on J2EE application servers, XML, and SOAP messages.
In fact, Hyperion’s implementation of XMLA uses our Java API (which was developed based on our JOLAP specification work) to communicate with the Essbase Analytic Services (OLAP Server). Our XMLA web service accepts a SOAP message, takes the mdXML statement contained in the SOAP message, and passes it to the Analytic Services engine for processing through the Java API. The result set is passed back to the XMLA web service through the Java API, where it is wrapped in a SOAP message and sent to the requesting client."
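To make that flow concrete, here is a sketch of the kind of SOAP request an XMLA client sends. The Execute method and namespace come from the XMLA spec; the statement, cube, and property values are made up for illustration:

```xml
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <Execute xmlns="urn:schemas-microsoft-com:xml-analysis">
      <Command>
        <!-- The mdXML/MDX statement the engine will process -->
        <Statement>SELECT {[Measures].[Sales]} ON COLUMNS FROM [Budget]</Statement>
      </Command>
      <Properties>
        <PropertyList>
          <DataSourceInfo>Provider=Essbase</DataSourceInfo>
          <Format>Multidimensional</Format>
        </PropertyList>
      </Properties>
    </Execute>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
```

The response comes back the same way: a result set wrapped in a SOAP envelope, which is why no particular programming language or object model is required on the client.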
Friday, January 09, 2004
Blogs are bad, don't do blogs
Why I Fucking Hate Weblogs! "Weblogs suck ass. What the fuck is up with this shit? Fuck. Who the fuck cares what these people think about oatmeal or what the UN did last week? Nobody! Who reads these weblogs? Nobody! Maybe fellow weblog authors read each other's weblogs out of a sense of desperation...the feeling that if they read someone else's weblog, someone will read theirs. It's kind of like cooperative advertising too, people will cross-post, linking weblog entries to each other's weblogs. How fucking pathetic is that? I hate weblogs."
It's convinced me...this has been a waste of time.
Perfect Company
"And if you did create the perfect organisation – perfect in organisational terms, that is, one that would magically hoover up all of the money and destroy its competitors, as soon as you achieve that perfection you would also achieve destruction – because our society seems to thrive best where there are many ideas contending and where no one organisation/form of government/set of ideas has eliminated all the rest."
Interview with Martha Atwood
Polite Society
What you can't say "It seems to be a constant throughout history: In every period, people believed things that were just ridiculous, and believed them so strongly that you would have gotten in terrible trouble for saying otherwise.
Is our time any different? To anyone who has read any amount of history, the answer is almost certainly no. It would be a remarkable coincidence if ours were the first era to get everything just right."
What you can say.
Thursday, January 08, 2004
Relational Web Services
XQuery on the Web talks about XQuery: Meet the Web.
Dare quotes: "In fact, this separation of the private and more general query mechanism from the public facing constrained operations is the essence of the movement we made years ago to 3 tier architectures. SQL didn't allow us to constrain the queries (subset of the data model, subset of the data, authorization) so we had to create another tier to do this.
What would it take to bring the generic functionality of the first tier (database) into the 2nd tier, let's call this "WebXQuery" for now. Or will XQuery be hidden behind Web and WSDL endpoints?"
And responds with:
"Every way I try to interpret this it seems like a step back to me. It seems like in general the software industry decided that exposing your database & query language directly to client applications was the wrong way to build software and 2-tier client-server architectures giving way to N-tier architectures was an indication of this trend."
"Data model subsets" - don't you mean views?
Also Dare says, "All this indirection with WSDL files and SOAP headers yet functionality such as what Yahoo has done with their Yahoo! News Search RSS feeds isn't straightforward. I agree that WSDL annotations would do the trick but then you have to deal with the fact that WSDL's themselves are not discoverable."
Which I would refer anyone interested to the paper in the JOWS: Automated Discovery, Interaction and Composition of Semantic Web Services.
XML For You and Me, Your Mama and Your Cousin Too "At this point if you are like me you might suspect that defining that the web service endpoints return the results of performing canned queries which can then be post-processed by the client may be more practical than expecting to be able to ship arbitrary SQL/XML, XQuery or XPath queries to web service end points.
The main problem with what I've described is that it takes a lot of effort. Coming up with standardized schema(s) and distributed computing architecture for a particular industry then driving adoption is hard even when there's lots of cooperation let alone in highly competitive markets."
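The "constrained query" idea being circled here - subset of the data model, subset of the data, authorization - is exactly what SQL views plus grants already give you inside the database tier, which is what the "don't you mean views?" aside is getting at. A hypothetical sketch (table, column, and role names invented):

```sql
-- A "canned query" surface: clients see only this view, never the base table.
CREATE VIEW public_orders AS
SELECT order_id, product_name, order_date   -- subset of the data model:
FROM orders                                 --   no customer or cost columns
WHERE status = 'SHIPPED';                   -- subset of the data

-- Authorization: the web-facing role can read the view, nothing else.
GRANT SELECT ON public_orders TO web_client;
```

The N-tier argument is that even this isn't enough in practice, which is why the constrained operations migrated into an application tier with its own API.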
Journal on Web Semantics
Journal on Web Semantics "The Journal on Web Semantics and also this Website is approaching scientific publishing from a different angle: our topic demands more than just the production and printing of papers, but also the distribution of ontologies and running code. An early slogan of W3C standardisation efforts was 'rough consensus and running code' - this applies also for the Semantic Web - maybe changed to 'rough consensus, running code, and ontologies'."
Wednesday, January 07, 2004
Followup on Relational RSS
RSS, old enough to be having relations? "Seb mentions several of the operators of Codd's relational algebra, and it seems to me there are two general reasons why everyone isn't already operating on RSS as relational data: 1) it is distributed across many files, and 2) the hierarchic XML structure of RSS."
"The main issue I am dealing with now is what types of data structures and formats work best with the various combinations of uses between data interchange, data storage, and querying."
Okay, now I'm convinced that this really is replicating RDF, and I would encourage anyone considering this to pick up an open source RDF library (like Jena or Redland) and use it to perform these operations on RDF-based RSS.
The problems highlighted are the same ones that various implementations of RDF have had to solve. Serializing a graph (relational data) in XML - that's RDF/XML and its use of striping. Being distributed across many files yet still searchable - that's usually a problem for RDF data stores (like Kowari or other freely available ones).
For example, to get all the documents (blog entries, etc.) authored by Sam Ruby (this is from a previous post describing iTQL):
"select $creator
  subquery(
    select $type from <rss_schemas>
    where $type <http://www.w3.org/2002/07/owl#sameAs> <http://purl.org/dc/elements/1.1/creator>
  )
from <rss_feeds>
where $creator $type 'rubys@intertwingly.net';"
Where "<rss_feeds>" can be any number of URIs combined with logical operators.
Of course, you'll need to convert some feeds from XML to RDF. While I often link to RDFT, the more usual ways include XSLT and programmatically using an RDF API and an XML library. One of the quickest ways I've found is using a combination of Jena and Jakarta Apache's XML Commons Digester.
Also related, Base data: relational, RDF, XML.
George W. vs Hitler
George Bush & Adolf Hitler "The internet is littered with pictures of George Bush with a swastika on his chest, a drawn-on mustache, and his arm raised in the Nazi salute. It is therefore no surprise that someone would choose this theme for their video. But why? Has George Bush done anything to justify the comparison? Consider these points:
* Hitler slaughtered six million. George Bush only killed nine thousand or so in Iraq and many fewer in Afghanistan. Hardly a fair comparison.
* Hitler rounded up and killed homosexuals. George Bush only denies them the right to marry. Again, no comparison.
* Hitler rounded up and killed those with physical or mental infirmities. George Bush only cut their medical benefits. No comparison.
* Hitler invaded his neighbors and overthrew their governments. George Bush only invaded and overthrew the governments of two countries, and they were not neighbors. No comparison.
* Adolf Hitler believed in a "master race." George Bush believes in a master religion. No comparison.
* Adolf Hitler was an eloquent and persuasive madman. George Bush is neither eloquent nor persuasive. No comparison.
* Adolf Hitler's government was in tight control of its citizens. George Bush has only limited our right to privacy, free speech, and access to lawyers. No comparison.
* Adolf Hitler demonized Jews. George Bush only demonized Osama bin Laden and Saddam Hussein (with the leaders of Syria, Iran, and North Korea held in abeyance). No comparison.
Obviously, George Bush is no Adolf Hitler. To make sure those videos are never seen, we suggest that he confiscate them and start arresting people. This kind of outrage should not be allowed in a free society."
Tuesday, January 06, 2004
Set Theory with RSS
The Algebra of RSS Feeds "Taking a cue from the operations of set theory," Paquet writes, "we could for instance define the following:
1. Splicing (union): I want feed C to be the result of merging feeds A and B.
2. Intersecting: Given primary feeds A and B, I want feed C to consist of all items that appear in both primary feeds.
3. Subtracting (difference): I want to remove from feed A all of the items that also appear in feed B. Put the result in feed C.
4. Splitting (subset selection): I want to split feed D into feeds D1 and D2, according to some binary selection criterion on items."
The original post.
While I wouldn't say that RDF has the monopoly on set theory (and operations like union and intersection) it does seem like reinventing the wheel.
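The four operations above map directly onto standard set operations once you pick an identity for items (their link or GUID, say). A sketch with invented names, over plain item identifiers:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Feed algebra over item identifiers (links/GUIDs), as plain set operations.
public class FeedAlgebra {
    public static Set<String> splice(Set<String> a, Set<String> b) {    // union
        Set<String> c = new LinkedHashSet<>(a); c.addAll(b); return c;
    }
    public static Set<String> intersect(Set<String> a, Set<String> b) { // intersection
        Set<String> c = new LinkedHashSet<>(a); c.retainAll(b); return c;
    }
    public static Set<String> subtract(Set<String> a, Set<String> b) {  // difference
        Set<String> c = new LinkedHashSet<>(a); c.removeAll(b); return c;
    }
}
```

Splitting is just two subtractions (or one filter and its complement); the hard part in practice is deciding when two feed items are "the same", which is where RDF's use of URIs as identity already does the work.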
Monday, January 05, 2004
Misinformation
Internet creator Berners-Lee knighted "British physicist Tim Berners-Lee, who invented the World Wide Web -- or at least better access to it -- has been awarded a knighthood in London.
Without his creation, there would be no computer addresses, no e-mail and the Internet might still be the exclusive domain of a handful of computer experts, the Independent reported."
RSSOwl
"RSSOwl is a free RSS (0.91, 0.92, 1.0, 2.0) newsreader written in the Java programming language, using SWT as a fast graphics library. Features of RSSOwl include reading RSS or RDF newsfeeds in a comfortable tab folder, saving newsfeeds in categories, exporting them to PDF / HTML or OPML, and viewing news in an internal browser."
Even though it had some issues with freezing, UI layout, and other bugs, it's easy to get running and it's not too bad. It'd be nice to have support for RSS autodiscovery. I think I still prefer NetNewsWire or NewsMonster, although I should try the others.
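RSS autodiscovery, for reference, is just a link element in a page's head that aggregators look for; the href here is a placeholder:

```html
<link rel="alternate" type="application/rss+xml"
      title="RSS feed" href="http://example.org/index.xml" />
```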
Sunday, January 04, 2004
More Commercial RSS
* RSSAds "The ad engine for RSS feeds."
* k-collector "k-collector is an enterprise news aggregator that leverages the power of shared topics to present new ways of finding and combining the real knowledge in your organisation...The k-collector architecture combines clients for leading weblogging software with a server based aggregator and web application."
Waypath
"Waypath makes use of Think Tank 23's unique information retrieval platform, Nav4, which automatically analyzes content, such as weblogs, and links documents that share common topics. Using Nav4, Waypath provides both keyword search and contextual navigation of individual weblog posts."
The SDK feature list includes concept-driven similarity browsing, handling over 600 queries/minute and 1 million documents on a typical installation under load, and no taxonomy to maintain.
Here are Morenews's related links (top link is Themes and metaphors in the semantic web discussion) and related books.
OS JavaServer Faces
Smile now includes its own render kit. The next release will be fully compliant, apparently.
More associated links: the JavaServer Faces home page, more details in chapters 21 and 22 of the Web Services Tutorial, or alternatively a tutorial for the impatient.
Saturday, January 03, 2004
iBox
Exclusive Insider Information: Apple iBox in production. "The iBox plugs into your TV and acts as a hub for your digital devices and computers. Unlike the EyeTV from Elgato, the iBox is a standalone machine, not something to plug into an existing computer. The iBox can be scheduled to record TV, but unlike TiVos it does not serve as a "what's on and when" service rather a hard drive / media based recording device (new aged VCR). With its built in 802.11b & 802.11g from its AirPort Extreme card, one can access the home folders of any user on any wirelessly networked Mac or PC. The iBox has its own version of the popular iPhoto and iTunes software which is a welcoming plus to Mac OS 10 veterans and easy for Windows users to adopt as well."
Time to sell my Shuttle. A picture you can eat with a spoon. Tasty Apple rumours.
NoodleTools
Finding what you want on the web "Debbie is a trained librarian, and it shows. She understands that a single search engine is never going to do everything, no matter how good its indexing or how large its database."
"I do not think we will ever solve the search problem until we move away from the dumb web we have today towards something like the semantic web, a project that Sir Tim Berners-Lee has been pushing ever since the first web conference in 1994.
Once links carry meaning then it will become more of a distributed database than the vast heap of unstructured documents we have built so far.
And once we have a database then we can classify, index and search it properly."
NoodleTools.
Analysis Engine
A Fountain of Knowledge "...imagine a marketing researcher trying to find out the online attitude of consumers toward the popular rock singer Pink. The researcher would have to wade through an ocean of search results to sort out which Web pages were talking about Pink, the person, rather than pink, the color.
What such a researcher needs is not another search engine, but something beyond that—an analysis engine that can sniff out its own clues about a document’s meaning and then provide insight into what the search results mean in aggregate. And that’s just what IBM is about to deliver. In a few months, in partnership with Factiva, a New York City online news company, it will launch the first commercial test of WebFountain...Up to now this kind of aggregate analysis was possible only with so-called structured data, which is organized in such a way as to make its meaning clear. Originally, this required the data to be in some sort of rigidly organized database; if a field in a database is labeled “product color,” there is little chance that an entry reading “pink” refers to a musician."
"Although the pooled data is compressed to about one-third its original size to reduce storage demands, WebFountain still requires a whopping 160 terabytes plus of disk space. It uses a cluster of thirty 2.4-GHz Intel Xeon dual-processor computers running Linux to crawl as much of the general Web as it can find at least once a week."
"WebFountain’s builders admit it’s not always able to guess right, but they point out that humans can also be confused by ambiguous meanings."
"Because the data has been converted from an unstructured format to a structured XML-based format, IBM and its partners can fall back on the data-mining experience and methodologies already developed for analyzing databases. The structured format also provides an easy target for developing new analytic tools."
"This, perhaps more than anything else, is why WebFountain looks like a winner. By creating an open commercial platform for content providers and data miners, it will foster rapid innovation and commercialization in the realm of machine understanding, currently dominated by isolated research projects."
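The "structured XML-based format" the article alludes to isn't published in detail, but the general shape of such entity annotation is easy to imagine. This markup is invented for illustration and is not WebFountain's actual format:

```xml
<sentence>
  The new video by <entity type="person" canonical="Pink (musician)">Pink</entity>
  features a <entity type="color">pink</entity> backdrop.
</sentence>
```

Once the ambiguity is resolved into attributes like these, ordinary database-style aggregation ("count mentions of Pink the musician by sentiment") becomes possible over what started as free text.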
Friday, January 02, 2004
Reclaim the Semantic Web
Fight back "For the technologists among us I would recommend you read this piece by Mark Nottingham on reclaiming the Semantic Web from military purposes. We need to stop wasting time on bullshit like Friendster and start using this technology to do something useful like facilitating citizen oversight of the government."
The Semantic Web’s Dirty Little Secret
IBM Emerging Technologies Toolkit
1.2 Released "Version 1.2 contains Service Data Objects (SDO), Policy-Based IT Management Demo, Semantic Web Services, Autonomic Computing Toolset, WS-Manageability demo, WS-Trust, WS-Addressing, Web Services Failure Recovery, and Service Domain technology."
Serendipity Server
One of Danny Ayers' New Year resolutions:
"...text search, creation of triples using machine learning techniques...A server-side tool that combines Semantic Web and machine learning technologies to autodiscover connections between ideas."
It seems similar to some of Tim Bray's search vision Basic Resource Finder:
"BRF will have built in most of the lore on result ranking I wrote up earlier in this series, with the possible exception of Latent Semantic Indexing. Crucially, it will have some facilities to make it easy to feed back popularity and usage counts into the ranking heuristics."
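Feeding usage counts back into ranking can be as simple as blending a base text-relevance score with a log-damped popularity signal. A hypothetical sketch - the formula and the weight alpha are my own, not Bray's:

```java
public class Rank {
    // Blend text relevance with popularity: log1p damps runaway click counts,
    // and alpha in [0,1] controls how much usage feedback matters.
    public static double score(double textScore, long clicks, double alpha) {
        return (1 - alpha) * textScore + alpha * Math.log1p(clicks);
    }
}
```

With alpha at 0 this degenerates to pure text ranking; raising it lets observed popularity reorder otherwise-equal matches, which is the feedback loop the quote describes.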
eventSherpa, SW Killer App
eventSherpa - an RDF desktop application for Windows (at last!) "The desktop app is a good looking, user friendly and very functional calendar. Where it starts to get good is that I can publish my local calendars to the eventSherpa Calendar server. They will then be available as HTML and more importantly as RDF feeds using the RDF Calendar schema for anybody to subscribe to either using the eventSherpa client or any software that can consume this RDF vocab."
Thursday, January 01, 2004
OWL Implementations
OWL Implementations (commercial implementations are Cerebra and Snobase) also includes OWL Test Results.