Wednesday, March 31, 2004

D2RQ

RE: mapping a relational database as RDF "I'm currently working on an extended version of D2R MAP called D2RQ, which will allow the general rewriting of RDQL and find(s,p,o) queries to database specific SQL queries.

I have attached the current language specification and an example mapping which shows how the ISWC ontology (http://www.wiwiss.fu-berlin.de/suhl/bizer/d2rmap/iswc.daml.xml) is mapped to a corresponding application specific relational schema (http://www.wiwiss.fu-berlin.de/suhl/bizer/d2rmap/iswc.sql.txt). The example also shows how D2RQ deals with taxonomical class structures in the ontology and many-to-many relationships in the database.

D2RQ is currently implemented as a plug-in for Jena 2.1. The first prototype will be released sometime in May under an open source license."
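To get a feel for what rewriting a find(s,p,o) query into SQL involves, here is a minimal sketch in Python. It is not the D2RQ mapping language or its implementation: the `papers` table, the `MAPPING` dictionary, the property names, the subject URI pattern and the `find` function are all hypothetical, made up for illustration.

```python
import sqlite3

# Hypothetical mapping: RDF property -> (table, key column, value column).
# An illustrative stand-in only, not the D2RQ mapping language.
MAPPING = {
    "dc:title": ("papers", "id", "title"),
    "iswc:year": ("papers", "id", "year"),
}
URI_PREFIX = "http://example.org/paper/"  # assumed URI pattern for subjects


def find(conn, s=None, p=None, o=None):
    """Answer a find(s, p, o) pattern (None = wildcard) by generating SQL."""
    triples = []
    for prop, (table, key, col) in MAPPING.items():
        if p is not None and p != prop:
            continue  # this property cannot match the pattern
        sql = f"SELECT {key}, {col} FROM {table}"
        where, params = [], []
        if s is not None:
            if not s.startswith(URI_PREFIX):
                continue  # subject cannot come from this table
            where.append(f"{key} = ?")
            params.append(int(s[len(URI_PREFIX):]))
        if o is not None:
            where.append(f"{col} = ?")
            params.append(o)
        if where:
            sql += " WHERE " + " AND ".join(where)
        for row_id, value in conn.execute(sql, params):
            triples.append((f"{URI_PREFIX}{row_id}", prop, value))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE papers (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO papers VALUES (?, ?, ?)",
    [(1, "Mapping Databases to RDF", 2003), (2, "Querying RDF", 2004)],
)

by_year = find(conn, p="iswc:year", o=2004)   # bound predicate and object
about_one = find(conn, s=URI_PREFIX + "1")    # bound subject only
```

Each bound position in the pattern becomes a WHERE condition and each unbound position becomes a selected column, so the database does the filtering instead of the RDF layer.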

Tuesday, March 30, 2004

Microsoft Goes Blogging

Microsoft to launch news, blog search services "The new services, called MSN Newsbot and MSN Blogbot, up the ante in the battle for Internet search market share. Microsoft already said that it is working on its own general Internet search engine, expected to be launched this year, to go head-to-head with Google. The Friday announcements were made at a Microsoft meeting for online advertisers, Microsoft said in a statement."

I hadn't looked at Newsbot; I'm sure any similarities (even the "Beta") to Google News are purely coincidental.

Wallop and FOAF

The truth about why I hate Friendster "Look, I want to say to the Friendsters of the world, we already invented a social network for friends and strangers. It's called the Internet. Why are you privatizing it? Why do we need a proprietary sub-network to do what the Internet has already done in an open way?"

"FOAF and LOAF add value to the Net, enriching it with voluntarily disclosed information about who we are and who we know. In this they are unlike Artificial Social Networks that capture the conversations between us but make them inaccessible to other applications.

The trade-off is high, however. Just take a look at Wallop and you'll see what I mean."

Wallop was announced last year.

Monday, March 29, 2004

Jabberwocky and Adenosine

Jabberwocky "Jabberwocky represents a shift towards interoperability of RDF-based systems. Up until now, for all RDF was promoted as the basis of the new, semantic, web, information has remained locked behind proprietary, incompatible storage and query systems. Jabberwocky aims to set a standard for the distributed sharing and search and use of RDF-based semantics."

Adenosine "Adenosine is a language designed both to work on, and be distributed over, the semantic web. The language exploits the expressiveness of RDF whilst adopting a clean syntax based on a combination of Notation3 and ECMAScript."

Snippets, security and contexts

A pragmatic approach to storing and distributing RDF in context using Snippets "Snippets provide the means to represent a set of statements made about a particular subject and in a particular context. They may be stored conventionally using a bag of reified statements, or in a non-standard compact form using a much smaller number of triples. This compact representation is somewhat analogous to the use of quads to represent individual reified statements. The difference is that, while quads fall completely outside the specification, snippets may still be queried and transported in their compact form."

Sunday, March 28, 2004

Friday, March 26, 2004

Upcoming Papers for the WWW2004 Conference

Selected Search and Search-Related Papers Authors include researchers from UCLA, Intel, IBM and the University of Washington.

Semantic Web Dream

A maiden's dream "The most sophisticated (and the most bleeding edge) is to use the Web Ontology Language, which allows you to define an ontology. Literally, "ontology" is the science or study of being, particularly in a metaphysical sense. It is not therefore, numerable and should not be preceded by the indefinite article...Now, the Web Ontology Language is a W3C (World Wide Web Consortium) standard and, in practice, it is an extension to UML. This gives the clue to how you create an ontology: you model it.

This also gives an indication of Unicorn's approach and why it tries not to talk about semantics too much. Unicorn aims to provide your computers with the ability to understand that customers and clients are the same thing but does so without forcing you to adopt a formal semantic approach. Thus, for example, you can do the relevant modelling using E-R (entity relationship) diagrams if you wish."

The second article covers Network Inference: "Network Inference is much more committed to new developments in the area of semantic information architectures, whereas Unicorn takes a more evolutionary approach. In particular, Network Inference is strongly committed to OWL, the Web Ontology Language (whose letters don't match because that is how Owl spelled his name - WOL - in Winnie the Pooh)...The big argument in favour of this approach is that OWL is a standard. So are XQuery, SOAP, UDDI, WSDL and the other standards that Network Inference supports. In principle, this is a significant step forward in the integration space (whether EAI or EII) because, while existing solutions make use of standards, they are not, with the arguable exception of pure XML solutions, standards-based. That is, there is no standard way of expressing a virtual schema or database to underpin these integrations."

Java OWL-S API

"OWL-S API provides a Java API for programmatic access to read, execute and write OWL-S (formerly known as DAML-S) service descriptions. The API supports to read different versions of OWL-S (OWL-S 1.0, OWL-S 0.9, DAML-S 0.7) descriptions. The API provides an ExecutionEngine that can invoke AtomicProcesses that has WSDL or UPnP groundings, and CompositeProcesses that uses control constructs Sequence, Unordered, and Split. Executing processes that relies on conditionals such as If-Then-else and RepeatUntil is not supported in the default implementation. But this implementation can be extended to handle these constructs if the application that uses the OWL-S descriptions has a custom syntax and evaluation procedure for the conditions."

AI Fun and Relationship Fun

RELATIONSHIP: Two Worldviews Something I can finally agree with. Except that it's really a problem.

Fun with Topic Maps

Metadata? Thesauri? Taxonomies? Topic Maps! "Topic maps are organized around topics, which represent subjects. That is, in a topic map you find topics. Every topic you find represents a subject out in the real world that it is a symbol or stand-in for in the topic map. The definition of subject is essentially "anything whatsoever". What this means is that from the point of view of a topic map, objects are just a special kind of subject.

What this means is that the whole machinery we have created to describe the concepts in a subject-based classification is also available to describe the objects being classified. Thus, we can create a topic type for objects ("document", perhaps), and express the metadata using names, occurrences, and associations. The topic types let us keep track of what is a document and what is a concept, but we no longer need different technologies for metadata and classification.

And once we have everything in a single representation we can start to cross the boundaries by for example describing the authors of a paper further, and connecting them with the terms from the subject-based classification. Effectively what has happened is that the straightjacket has been removed, and we can now say anything we want to. The technology is no longer the limiting factor; instead the limits are set by our imagination and by how much information we are able to maintain within the economic constraints of our projects."

Java Fun

* JBoss Cache Released "Supports local, synchronous replication, and asynchronous replication transaction. Works with JTA transaction manager. Support transaction isolation levels." Check out the TreeCacheAop documentation.
* FishEye "Fisheye delivers a unified view of your repository that provides easy navigation, powerful search, historical reporting, configurable file annotation and diff views, changeset analysis, RSS feeds, and integration with your issue tracker. Many more features and enhancements are also under development." And Australian too.
* Joott - Use OpenOffice to convert to different document types and the like.
* Blocks In Java - Trying to hack Java to do closures.

Thursday, March 25, 2004

IBM announces use of Aspects

IBM touts new 'aspect' for software coding "The public commitment to aspect-oriented programming is meant to indicate that IBM believes the technology is ready for use in business development, rather than academic scenarios. Only a few commercial software companies, such as Intentional Software and JBoss Group, use aspect-oriented technologies.

"We believe these concepts are viable; they can deliver real value and help us transform ourselves to be more flexible and improve quality," Berry said.

Before aspect orientation can become mainstream, programmers need to be trained in the techniques and development tools, according to analysts. IBM, for one, will be working on "wizards" that can walk people through the process of creating aspects.

"The tooling support is absolutely essential for this to take off," Berry said.

Berry said he expected that in the next two or three years, development tools will commonly have aspects integrated within them. Further along, IBM is looking at melding its work on model-driven software development with aspect orientation, he said.

Also at the conference, IBM will discuss another research project, called Concern Manipulation Environment. The project, which uses work developed in IBM research, is designed to provide a path to aspect-oriented programming with tools that work with existing software written in different languages."

The different languages included C++ and C#.

Another monopoly case

Newly Released Documents Shed Light on Microsoft Tactics "Even as Microsoft prepares to face penalties from the European Union, which accuses the company of abusing the Windows monopoly, new details about the tactics Microsoft used to secure a dominant position in software markets for nearly two decades are emerging in a state courthouse in Minneapolis...Among the documents introduced in court this week was a letter from June 1990 in which Bill Gates, Microsoft's chairman, told Andrew S. Grove, the chief executive of Intel at the time, that any support given to the Go Corporation, a Silicon Valley software company, would be considered an aggressive move against Microsoft."

The book Start-up: A Silicon Valley Adventure is definitely a good, if one-sided, read. Just in time for the 10 year anniversary.

Users make bad Security Decisions

Disappearing .Net Brand Invites Assimilation "Importantly, she then noted, "If you remember 'Hailstorm,' Microsoft's personal Web services technologies, a number of them are set to manifest in Indigo." Why position something as a pervasive set of user services that might frighten off privacy-sensitive consumers, when you can position it instead as a developer productivity tool that will be comfortably buried inside end-user applications and tasks?"

"Personally, I'm in a pretty grouchy mood at the moment about end users' apparent willingness to live with bad choices that developers make: specifically, choices that favor developer convenience over security and reliability and other boring issues. For example, I'll soon be sharing with eWEEK readers my comments on Greg Hoglund's and Gary McGraw's new book, "Exploiting Software: How to Break Code"; one comment from that book seems apropos. The specific subject is PHP, which the book calls "a study in bad security. … The mantra 'don't make the developer go to any extra work to get stuff done' applies in all cases." And yet, PHP is widely used, creating widespread vulnerabilities.

Likewise the developer and user convenience features of Internet Explorer and the Windows platform, which still pave the way for costly attacks."

Wednesday, March 24, 2004

Aspects can be slow

Measuring the Dynamic Behaviour of AspectJ Programs "The conventional wisdom that AspectJ does not introduce overheads seems to be explained by typical aspect usage. First, advice generally applies to user code, yet typical Java programs spend most of their time in library calls. As a percentage of the total execution time, the cost of advice is therefore insignificant in such applications. The Tetris benchmark illustrates this phenomenon. Some of our benchmarks (in particular DCM) show the opposite behaviour, where the advice is so expensive that the overheads of applying it are dwarfed."

"Contrary to popular belief, we did however also find significant overheads. This has led to the following guidelines for AspectJ usage, as well as promising areas for future compiler research:

* Loose pointcuts. It is easy to write a pointcut that matches too many join points...

* Advice that is too generic. When using the very generic form of around, this causes a significant amount of boxing and unboxing to convert arguments to the right form...

* Unwarranted use of around. Because of the above, it is generally preferable to eschew around in favour of after returning when possible...

* Pertarget. The use of per clauses to control aspect creation carries a non-negligible overhead..."

Monday, March 22, 2004

Interesting Query Use Cases

There is definitely a large variety of use cases being kicked around, including some you wouldn't expect people to want.

* Federated Searching: "Actor/User Agent needs to seamlessly query/access/integrate related chunks/pieces of data coming from a set of decentralized heterogeneous sources, and get presented a unified view over the whole result-set/data-set."
* Temporal querying.
* Regular Expressions.
* Discover functionality of server: "Abelard's client software connects to each RDF storage server and determines whether it supports one of the three query languages it knows about. Abelard's client software chooses, based on priorities set by Abelard, to send different queries to different servers."

They seem to be missing security - probably required in the federated searching use case.
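The "discover functionality of server" case comes down to matching what a server supports against what the client prefers. Here is a minimal sketch in Python, with everything in it assumed for illustration: the server URLs, the capability table (a real client would have to ask each server), and the priority order are all made up.

```python
# Hypothetical capability table: which query languages each server reports.
# In practice this would come from asking each server; here it is hard-coded.
SERVER_LANGUAGES = {
    "http://a.example/rdf": {"RDQL", "iTQL"},
    "http://b.example/rdf": {"SeRQL"},
    "http://c.example/rdf": {"XPath"},  # nothing this client understands
}

# The three languages the client knows, most preferred first (assumed order).
CLIENT_PRIORITY = ["iTQL", "RDQL", "SeRQL"]


def choose_language(supported, priority=CLIENT_PRIORITY):
    """Return the highest-priority language the server supports, or None."""
    for language in priority:
        if language in supported:
            return language
    return None


def plan_queries(servers, priority=CLIENT_PRIORITY):
    """Map each server to the language to use; skip servers with no match."""
    plan = {}
    for url, supported in servers.items():
        language = choose_language(supported, priority)
        if language is not None:
            plan[url] = language
    return plan


plan = plan_queries(SERVER_LANGUAGES)
```

The sketch says nothing about how the capabilities are discovered or secured, which is where the federated case gets hard.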

Saturday, March 20, 2004

OASIS goes Semantic

Network Inference Joins OASIS Consortium "Network Inference will participate in ebXML and other OASIS Technical Committees to help advance the application of OWL, OWL-DL, OWL-S, and RDF into business environments. The Cerebra Server™ product suite from Network Inference provides market-leading support for the Web Ontology Language and is the basis of the company's deep expertise with the emerging semantic web languages."

Hairy Computers

Web ontologies: problems, benefits "Computers are insanely, hair-tearingly, stupid - they have to be told everything in precise detail...using ontology with computers can't possibly be worse than the in-code truth functions we use today. Legions of programmers are writing down things like isBossOf all the time (myself included). It's their job. Except they don't call them relations, or predicates, they call them methods and those methods capture what most of us call business logic. [Nor do they call themselves ontologists.] So it's pretty far from logic but good enough for business - until the time comes to change the logic where the cost of using all that code becomes apparent. It's a long held truism that we'd be all much better off if we could get such logic out of code."

"For my part, the raison d'etre of declaring such relations is reducing code complexity while increasing flexibility."
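As a rough illustration of getting a relation like isBossOf out of code, here is a small Python sketch that declares the facts as data and answers questions with one generic function. The names (`FACTS`, `holds`, `transitive`) and the people are invented for the example; it is a toy, not an ontology language.

```python
# Facts declared as data (subject, relation, object) rather than as methods.
FACTS = {
    ("alice", "isBossOf", "bob"),
    ("bob", "isBossOf", "carol"),
}


def holds(relation, subject, obj, facts=FACTS):
    """Generic check that replaces hand-written methods such as is_boss_of()."""
    return (subject, relation, obj) in facts


def transitive(relation, subject, obj, facts=FACTS):
    """True if obj is reachable from subject by following `relation`."""
    seen, frontier = set(), [subject]
    while frontier:
        current = frontier.pop()
        for s, r, o in facts:
            if s == current and r == relation and o not in seen:
                if o == obj:
                    return True
                seen.add(o)
                frontier.append(o)
    return False


direct = holds("isBossOf", "alice", "bob")
indirect = transitive("isBossOf", "alice", "carol")
reverse = transitive("isBossOf", "carol", "alice")
```

Changing who reports to whom means editing `FACTS`, not the functions, which is the flexibility the quote is after.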

Bill also suggests a reading list: Data and Reality (Bill Kent), Knowledge Representation (John Sowa), Programming in Prolog (Clocksin and Mellish), and Philosophy of Artificial Intelligence (ed. Margaret Boden).

For those wanting to really look at vocabularies, there's "Vocabulary Mapping for Terminology Services", which describes how combining different vocabularies can increase usefulness.

DL-XML Mapper

Integrating DL and XML "Semantic Web Service concepts like DAML-S or the MCM Semantic Web Service Matchmaker provide means for storage and retrieval of web services far beyond text or UDDI based search. These concepts use Description Logics as a means to express the semantics of services. However, these Description Logic formats (e.g. the "RDF/XML"-serialization of DAML+OIL) are not directly compatible with the XML formats used by today's Web Services. In other words, the lingua franca of the Web is still XML, the advantages of Description Logics notwithstanding.

In order to gain some level of interoperation between Semantic Web Service concepts and traditional XML based Web Service concepts (and between Semantic Web and Web in general!), it is necessary to provide bridges between the (RDF/DAML/OWL based) concept description community and the XML/SOAP style of message passing."

Download and more information.

Friday, March 19, 2004

SWOOP 1.3 Released

SWOOP I've covered this before; it has since changed to use Jena 2.1.

Album sales at record high

Forget the spin! It's a record record. "Total sales (in all formats) climbed to a record high in 2003: 65.6 million, easily topping the previous record of 63.9 million set in 2001.

And the sales of actual CD albums climbed above 50 million for the first time (well above 50 million actually).

It's a real cause for celebration. Back before the advent of Napster and home CD burning the industry was selling fewer than 40 million CD albums per year."

Other stories: ARIA's press release, Sound of cash registers is music to the ears, Radio interview with Peter Martin and MUSIC INDUSTRY NEAR COLLAPSE IN FILE-SHARE FRENZY - No, Wait, That's Not Right….

Or the Slashdot article.

Thursday, March 18, 2004

Practical Calendaring

Making a date with the Semantic Web "Chris Sukornyk has an answer to the question "Is there any practical application for the W3C Semantic Web concept?"

Semaview Inc., Sukornyk's Toronto-based start-up, is offering a Semantic Web-based event calendaring application called eventSherpa. It is designed to give individuals and businesses a shareable calendar with the semantic search capabilities envisioned by the W3C."

Wednesday, March 17, 2004

Vocabularies

RELATIONSHIP: A vocabulary for describing relationships between people "The RELATIONSHIP list should make it obvious that explicit linguistic clarity in human relations is a pipe dream. It probably won't, though. The madness of the age is to assume that people can spell out, in explicit detail, the messiest aspects of their lives, and that they will eagerly do so, in order to provide better inputs to cool new software."

From the comments: "I have been lurking around the FOAF list for a while and know these guys have been at it for a long while. The questions raised by Clay have been brought up and debated to death...here comes a reputable guy, who probably has not been involved in the effort, who without really knowing the issues, "fires a rocket at it". This rocket causes a lot of waste of energy because it is just a basic instinct kind of a rocket. Not really thought through."

Clay Shirky on RELATIONSHIP "...he still managed to overlook the raison d'etre for the relationship. Indeed it's the raison d'etre for all vocabularies."

Most of the other comments are humorous and worth a read. Another case of the metadata being more interesting and thought provoking than the data.

Tuesday, March 16, 2004

Not FOAF

The Opposite of FOAF "One of my friends is breaking new ground with semantic web; he set up a directory of annoying salespeople, and he’s naming names."

HP iTunes

HP Music Complete with annoying background loop. Links to Apple's iTunes download page for HP.

RDF Schema Editor

Mark Choate has released an early version of RDF Schema Editor for download. Giving it a quick run, it seems fairly intuitive for those who are used to RDF.

Schema Editor update "There is nothing like a real project when it comes to learning something new. The most problematic part of RDF from a user's perspective is handling namespaces. It's a hassle. But I also think that it's a hassle that can be overcome, and not one that really needs to intrude on the user experience. Improving the usability of tools that create metadata is important if the Semantic Web is ever to grow beyond speculation."

Monday, March 15, 2004

The Prevalence of Prevayler

Continuing Prevalence "Brief run-down on two significant paradigms that have been becoming a lot more visible, presumably because they're good at...". As long as you define "it" in the strictest sense of what you're doing. If it's anything like persistently storing data (not objects, data) then Prevayler isn't it. That being said, it's a finalist in the Jolt Awards.

Here's a bunch of links:
* Prevayling Stupidity,
* The Prevayler,
* Object Prevalence: Get Rid of Your Database?, and
* The first message on the AJUG mailing list on Prevayler. It includes replies like, "Prevayler serializes incoming transactions but not the result of executing each transaction. You don't have change (before or after) images. Undoing a failing transaction cannot be accomplished."
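
To make the objection in that last reply concrete, here is a toy sketch of the prevalence idea in Python: state lives in memory, each incoming command is journaled before it is applied, and recovery replays the journal. This is not Prevayler's API; the class `TinyPrevalence`, the JSON journal format and the `deposit` command are assumptions made up for the example.

```python
import json
import os
import tempfile


class TinyPrevalence:
    """Keeps state in memory and journals each command before applying it."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.accounts = {}
        if os.path.exists(journal_path):
            with open(journal_path) as journal:
                for line in journal:
                    self._apply(json.loads(line))  # replay on start-up

    def execute(self, command):
        # Journal the incoming command, not the resulting state.
        with open(self.journal_path, "a") as journal:
            journal.write(json.dumps(command) + "\n")
        self._apply(command)

    def _apply(self, command):
        if command["op"] == "deposit":
            name = command["account"]
            self.accounts[name] = self.accounts.get(name, 0) + command["amount"]
        else:
            raise ValueError(f"unknown command: {command['op']}")


path = os.path.join(tempfile.mkdtemp(), "journal.log")
system = TinyPrevalence(path)
system.execute({"op": "deposit", "account": "alice", "amount": 10})
system.execute({"op": "deposit", "account": "alice", "amount": 5})

# A fresh instance rebuilds the same state purely by replaying the journal.
recovered = TinyPrevalence(path)
```

The journal holds only the commands, with no before or after images, so a command that fails part-way through leaves nothing to roll back to.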

Revisiting the DBDebunk web site:
* MORE ON THE STATE OF THE INDUSTRY AND SQL.
* To O OR to R: IS THIS A DATABASE QUESTION?.

We've been considering adding to iTQL the MAYBE operator, to reduce the complexity of writing joins when querying RDF data (more on the MAYBE operator by Date).

Saturday, March 13, 2004

Digging Against the Internet

Advice to Microsoft regarding commodity software "Digging in against open source commoditization won't work - it would be like digging in against the Internet, which Microsoft tried for a while before getting wise. Any move towards cutting off alternatives by limiting interoperability or integration options would be fraught with danger, since it would enrage customers, accelerate the divergence of the open source platform, and have other undesirable results. Despite this, Microsoft is at risk of following this path, due to the corporate delusion that goes by many names: "better together," "unified platform," and "integrated software." There is false hope in Redmond that these outmoded approaches to software integration will attract and keep international markets, governments, academics, and most importantly, innovators, safely within the Microsoft sphere of influence. But they won't."

Friday, March 12, 2004

EMail Classification

Xerox Scientists Invent Software That Automatically Indexes, Categorizes, Routes Electronic Documents "Scientists at Xerox Corporation have invented powerful software that's clever enough to "read" an electronic document, decide how it should be classified by subject, then route it to the right person's e-mail address or online document management system - all completely automatically.

The software, which is a categorizing tool, is intended to help businesses keep their e-document collections orderly and easily accessible, and it is available for licensing from Xerox.

"A misshelved book in a library might as well be lost. It's the same with documents that haven't been properly categorized; the document itself may have to be recreated," said Eric Gaussier, a research scientist at the Xerox Research Centre Europe in Grenoble, France. "Our new software can help save time and money and increase productivity. It will ensure that documents are properly classified for future retrieval and that the right information gets into the right hands as quickly as possible.""

Updates, Books and Referrers

Some quick updates:
* A Lot To Digest - A glowing summary of the Cannes Technical Plenary and WG Meeting Week.
* From duct tape to chewing gum and baling wire - A review of the RDF in XHTML proposal.
* A Semantic Web Primer. Also from MIT Press: From Logic to Logic Programming and Dynamic Logic.
* Also closed the RFEs in Kowari for RDQL (which we implemented using SableCC and our own query layer) and Jena Support (still lots to optimize).
* In my referrer log: [Babelfish translation] "The semantic Web, you use it and you will use it more and more without the knowledge. And it is very well like that. RDF, OWL, Semantic Web" from La Grange.

Thursday, March 04, 2004

SWIG Summary

Semantic Web Interest Group "Just about every tool in the Adobe arsenal is now able to embed RDF into their primary artifacts. It's ironic, of course, that a core web technology, RDF, still isn't really embeddable into the core web data format, HTML; but it can easily be embedded into Adobe artifacts...there are two reasons why Adobe customers want metadata embedded in digital artifacts. The first, obvious, reason is to support automated workflow. If you've ever worked in a graphics shop or design house, you know that workflow management is absolutely critical to profitability. The second, less obvious, reason is intelligent syndication. What a real-time OWL-powered pub-sub application needs, of course, is rich RDF metadata attached or linked to artifacts of the Web: images, video and sound files, as well as XHTML documents."

"...we're at the point now where the basic knowledge representational mechanisms -- RDF and OWL -- are formalized; where mechanisms for the creation of RDF and OWL are coming online and into place; and where there is increasing awareness of and interest in the SW. What we need, then, is the "uniform means of interaction" for the SW that HTTP provides for the Web. What we need is a reasonable specification that we can give both to RDF tool makers and to Python and Perl and Java and C# and Ruby programmers; and we need to say to them, "implement this specification in your tool or your language; then your users and programmers will be able to uniformly access resource representations on the Semantic Web". That will be a happy day, indeed."

Also talks about how Boeing are using OWL in the battlefield.

The Other Tim on the SW

Bill Gates, Edd Dumbill, and the Semantic Web "This idea of making the data smarter is absolutely central. I have been speaking about this myself for some time. As we move to a network-based software platform, where applications don't live on the local machine but are distributed between rich client front ends and huge database back ends, "open source" alone won't really solve our problems. It's open data we're going to be fighting about."

We 0wnz0r Facts

Hands Off! That Fact Is Mine "Ostensibly, the Database and Collections of Information Misappropriation Act (HR3261) makes it a crime for anyone to copy and redistribute a substantial portion of data collected by commercial database companies and list publishers. But critics say the bill would give the companies ownership of facts -- stock quotes, historical health data, sports scores and voter lists. The bill would restrict the kinds of free exchange and shared resources that are essential to an informed citizenry, opponents say."

"Under the terms of the broadly written bill, a public-health website could be deemed in violation of the law for gathering a list of the latest health headlines and providing links to them on its home page."

"An encyclopedia site not only could own the historical facts contained in its online entries, but could do so long after the copyright on authorship of the written entries had expired. Unlike copyright, which expires 70 years after the death of a work's author, the Misappropriation Act doesn't designate an expiration date."

Free T-Shirt!

Trust on the Semantic Web "This project is designed to build and maintain a trust network on the semantic web. Using an ontology that extends FOAF, you can assign trust ratings to people you know.

The network generated in this project will serve as the foundation for a dissertation on the topic of Trust on the web. By becoming part of the network, your data will be used as the foundation for Trustbot, Trustmail, and some future applications."

Current top 10 and visualization of the graph.

Wednesday, March 03, 2004

Faceted Metadata

Faceted interfaces "Yes, the alternative is so-called faceted metadata. Like the facets on the diamond, faceted metadata allow you to look from different sides to the same information. Faceted metadata have the information organized on several dimensions (facets). The information has values on each of the facets. Readers find information by choosing values on each of the facets."

Faceted Metadata Search and Browse "Examples of faceted metadata include:

* Music store: songs have attributes such as artist, title, length, genre, date...
* Recipes: cuisine, main ingredients, cooking style, holiday...
* Travel site: articles have authors, dates, places, prices...
* Regulatory documents: product and part codes, machine types, expiration dates...
* Image collection: artist, date, style, type of image, major colors, theme..."

Both have screenshots of the faceted approach, although the second is a little more clear. Flamenco is often cited too. For example, Flamenco Fine Arts Search.
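
The mechanics behind a faceted interface are simple enough to sketch. The Python below follows the music store example from the list above, but the records, the `select` and `facet_counts` functions and the facet names are invented for illustration; it is not how Flamenco or either of the linked sites is implemented.

```python
from collections import Counter

# Invented sample records; each key is a facet, each value a facet value.
SONGS = [
    {"title": "Song A", "artist": "Artist 1", "genre": "jazz", "year": 1959},
    {"title": "Song B", "artist": "Artist 2", "genre": "rock", "year": 1971},
    {"title": "Song C", "artist": "Artist 1", "genre": "jazz", "year": 1971},
]


def select(items, **chosen):
    """Keep the items whose facets match every chosen value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in chosen.items())
    ]


def facet_counts(items, facet):
    """Count the values of one facet, e.g. to label the remaining choices."""
    return Counter(item[facet] for item in items if facet in item)


jazz = select(SONGS, genre="jazz")
jazz_1971 = select(SONGS, genre="jazz", year=1971)
years_within_jazz = facet_counts(jazz, "year")
```

Choosing a value on one facet narrows the set, and the counts on the other facets show what is still reachable, so the reader never ends up at an empty result.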

Tuesday, March 02, 2004

Balloons don't scale

Balloons and Ribbon as Social Networking Visualization "Artist/curator friend Mark Soo did a piece for one of the Infest openings where he visualized the curators' social network using balloons with people's names printed on them as the nodes and ribbons tying them together as the edges (the data comes from "invites" he got the curators to send to one another).

This was a great, inviting, tactile "graph manipulation interface". But the reason I liked it so much was that it really brought out the problems of social networks visualizations as a way of learning about the networks being visualized: too confusing!"

JRDF

JRDF got a mention at the SWIG meeting. It would be good to work in the org.w3c namespace. The next thing, once the bug fixes and modifications to the Vocabulary package are done, is transactions.

The Bleeding Semantic Edge

Semantics: a new beginning? "The reason for all this continual mapping is that the source data has no meaning in any real sense of the word. That is, it has no context. What the new interest in semantics is looking to do is to provide that meaning. Unfortunately, there is no indication that database vendors (the leading ones at least) are going to provide this contextual information any time soon. However, there are moves afoot within the semantic community to try to establish semantic rules and ontologies that can be used across the enterprise and then, using appropriate tools, you can start to automate the process of creating these transformations and mappings rather than having to do it manually via a mapping tool.

There are a number of vendors already in this space. For example, Unicorn is an Israeli company that has moved its headquarters to the States, Contivo is completely American, and Network Inference started in the UK but is currently following Unicorn to the US.

Now, although all of these companies have a number of installations already, this is really bleeding edge stuff and I confess to not having fully got to grips with it. However, I am in the process of arranging briefings with all of these companies and I will report back when I have more information and can explain it more simply. In the meantime the technology appears to have potential, not just for data integration but also other areas like EAI. Watch this space."