Friday, February 27, 2004

Time for Phase 2

I've been noticing the recent increase in activity and general buzz around the Semantic Web, with development work like RDF in XHTML and Nokia's Semantic Web Server. It's obvious now: we've reached Phase 2 - the fun phase.

"W3C announced the launch of Phase 2 of the Semantic Web Activity. Two new Working Groups have been formed; the Best Practices and Deployment WG (charter) and the RDF Data Access Working Group (charter). These join the RDF Core and Web Ontology WGs, the Semantic Web Interest Group, and the Semantic Web Coordination Group."

"The aim of this Semantic Web Best Practices and Deployment (SWBPD) Working Group is to provide hands-on support for developers of Semantic Web applications. With the publication of the revised RDF and the new OWL specification we expect a large number of new application developers. Some evidence of this could be seen at the last International Semantic Web Conference in Florida, which featured a wide range of applications, including 10 submissions to the Semantic Web Challenge (see http://challenge.semanticweb.org/).This working group will help application developers by providing them with "best practices" in various forms, ranging from engineering guidelines, ontology / vocabulary repositories to educational material and demo applications."

http://www.w3.org/2003/12/swa/swbpd-charter

"The RDF data model is a directed, labeled graph with edges labeled with URIs and nodes that are either unidentified, literals, or URIs (please see the RDF Primer for further explanation). The principal task of the RDF Data Access Working Group is to gather requirements and to define an HTTP and/or SOAP-based protocol for selecting instances of subgraphs from an RDF graph. The group's attention is drawn to the RDF Net API submission. This will involve a language for the query and the use of RDF in some serialization for the returned results. The query langauge may have aspects of a path language similar to XPath (used for XML in XSLT and XQuery) and various RDF experimental path syntaxes."

http://www.w3.org/2003/12/swa/dawg-charter
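For a sense of what such a query language can look like, Jena 2 already ships with RDQL (as does Kowari). A rough sketch of selecting a subgraph, written from memory of the Jena 2 RDQL API (the data URL and vocabulary are made up):

import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdql.Query;
import com.hp.hpl.jena.rdql.QueryEngine;
import com.hp.hpl.jena.rdql.QueryExecution;
import com.hp.hpl.jena.rdql.QueryResults;
import com.hp.hpl.jena.rdql.ResultBinding;

public class RdqlSketch {
    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        model.read("http://example.org/data.rdf"); // hypothetical data

        // Select every resource that has a Dublin Core title.
        Query query = new Query(
            "SELECT ?doc, ?title " +
            "WHERE (?doc, <dc:title>, ?title) " +
            "USING dc FOR <http://purl.org/dc/elements/1.1/>");
        query.setSource(model);

        QueryExecution qe = new QueryEngine(query);
        QueryResults results = qe.exec();
        while (results.hasNext()) {
            ResultBinding binding = (ResultBinding) results.next();
            System.out.println(binding.get("doc") + " - " + binding.get("title"));
        }
        results.close();
    }
}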

Wednesday, February 25, 2004

RDQLPlus

"RDQLPlus is a tool for querying RDF graphs, featuring graphical results in a zoomable user interface (ZUI). It can work with existing RDF files, Jena2 RDF databases, and a native-Java database called Mckoi (included)."

The UI has scaling problems similar to IsaViz's when displaying hundreds of statements. Good to see a pure Java implementation, though.

Sunday, February 22, 2004

Profium RDF Parser

Profium RDF Parser V2 in Java. Also of interest is their API documentation.

Metadata Mac

Next Mac OS X to be Metadata-driven? "The (unconfirmed) info says that Mac OS X 10.4 will go "further than anticipated", introducing not only a "database-driven" new Finder (possibly similar to BeOS' Tracker) --although the file system itself will still be HFS+-- but also a wide support for file metadata. Please note that both the BeFS (and quite possibly this Apple implementation) is not similar to Longhorn's WinFS (apples & oranges). All this is not a surprise for us, as the people who were behind the same realization on BeOS --Dominic Giampaolo and Pavel Cisler-- today work at key positions at Apple Computer in the file system and Finder areas respectively."

I had previously mentioned another suggestion for metadata in OS X.

Marusha

Marusha: Using semantic web concepts to create your own private DJ and a more recent update.

"I ran into a number of serious scaling issues on Friday. Adding the 16 million tracks from freedb.org to the parser's database really was the straw that broke the camel's back. Lesson learnt: There are types of queries that a relational database will never be able to run in acceptable time."

Saturday, February 21, 2004

Mangrove

Mangrove: An Evolutionary Approach to the Semantic Web "The Mangrove project seeks to create an environment in which users are motivated to create semantic content because of the existence of useful semantic services that exploit that content and because the process of generating such content is as close as possible to existing techniques for generating HTML documents. Our goal is to facilitate the simple annotation and subsequent extraction and querying of the enormous amount of information that already exists within the WWW's billions of HTML pages, rather than requiring the creation of new content from scratch. Our approach thus seeks to facilitate the gradual transformation of the current web into the semantic web."

Very similar to the ideas expressed recently in "interview followup". Semantic Tagger is an interesting example of this (the schema is an XML schema).

Future of MS Search

Robert Scoble On The Future Of Search Engine Technology "If we're talking about your local hard drive, searching for files on your local hard drive is still awful and getting worse...It's easier to create files now than it is to find them. "

"...to really make search work well search engines need metadata and metadata that's added by the system keeping track of your usage of files, as well as letting application developers add metadata into the system itself. In a lot of ways, weblogs are adding metadata to websites. When a weblog like mine links to a web site, we usually add some more details about that site. We might say it's a "cool site" for instance. Well, Google puts those words into its engine. That's metadata."

"Developers distrust Microsoft's intentions here. They also don't want to open up their own applications to their competitors. If you were a developer at AOL, for instance, do you see opening up your contact system with, say, Yahoo or Google or Microsoft? That's scary stuff for all of us.

But, if the industry works together on common WinFS schemas (not just for contacts either, but other types of data too), we'll come away with some really great new capabilities. It really will take getting developers excited about WinFS's promise and getting them to lose their fears about opening up their data types. "

AUIML

AUIML "AUIML is an XML dialect that is a platform and a technology-neutral representation of panels, wizards, property sheets, etc. AUIML captures relative positioning information of user interface components and delegates their display to a platform-specific renderer. Depending on the platform or device being used, the renderer decides the best way to present the user interface to the user and receive user input.

The AUIML XML is created using the Eclipse-based AUIML VisualBuilder, which allows a developer to quickly build and preview user interfaces in the Java Swing and HTML Renderers. The AUIML VisualBuilder can also automatically create data beans, event handlers, and help system skeletons for the user interface. Since it plugs into Eclipse, building the user interface and application code is an integrated proces"

Interesting that IBM are supporting Swing here and not SWT.

Ontological Programming

Semantic Integration "Basically semantic integration seems to involve using RDF/OWL to define mappings between XML vocabularies.

Recently I've been involved in co-ordinating a number of large scale data migrations to help integrate several systems. This inevitably involved exploring the business models in the affected applications to define a mapping layer to allow the data to be cleanly migrated. Also inevitably this mapping ends up being expressed as a bunch of (hairy!) procedural code. It would have been nice to have been able to express that mapping in a more declarative way, if no other reason than it's easier to understand and debug possible problems. An example of not being able to see the wood for the trees."

There are many ways to declaratively map between different schemas - SQL, OO and XML all have their own tools. The value RDF adds is that you can program to a single ontology and map all of these different data sources to it.

With RDF there's not only the ability to map between XML vocabularies but the ability to program to an ontology rather than a schema. That takes the problem from MxN down to M+N (or even just M): instead of writing a converter for every source/target pair of schemas, each schema needs only a single mapping to the shared ontology.
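To sketch the M+N idea in plain Java (every name below is made up for illustration): each source format gets one adapter onto the shared model, and application code only ever sees that model.

// The shared ontology the application programs against.
interface Person {
    String name();
    String email();
}

// Hypothetical source records - stand-ins for real LDAP/CRM data.
class LdapEntry {
    String cn, mail;
    LdapEntry(String cn, String mail) { this.cn = cn; this.mail = mail; }
}

class CrmRecord {
    String fullName, emailAddress;
    CrmRecord(String f, String e) { fullName = f; emailAddress = e; }
}

// One adapter per source (M adapters in total), instead of one
// converter per source/target pair (MxN converters).
class LdapPerson implements Person {
    private final LdapEntry entry;
    LdapPerson(LdapEntry entry) { this.entry = entry; }
    public String name() { return entry.cn; }
    public String email() { return entry.mail; }
}

class CrmPerson implements Person {
    private final CrmRecord record;
    CrmPerson(CrmRecord record) { this.record = record; }
    public String name() { return record.fullName; }
    public String email() { return record.emailAddress; }
}

Adding a new data source then means writing one new adapter, not a new converter for every existing format.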

One of the articles that highlights this issue is "The Impedance Imperative: Tuples + Objects + Infosets = Too Much Stuff!".

Friday, February 20, 2004

US Government Understands the Semantic Web

Taxonomy’s not just design, it’s an art "If anyone understands the acronym soup of Web services, it’s Michael C. Daconta. He’s director of Web and technology services for systems integrator APG McDonald Bradley Inc. of McLean, Va. As part of that job, Daconta is chief architect of the Defense Intelligence Agency’s Virtual Knowledge Base, a project to compile a directory of Defense Department data through Extensible Markup Language ontologies. "

"GCN: Will everyone use a single taxonomy for one big semantic Web, or will organizations build their own semantic Webs?

DACONTA: There clearly will not be just one semantic Web. A lot of people are looking at taxonomies, so they have to be careful. "

Thursday, February 19, 2004

Google Won

Search For Tomorrow ""For a lot of kids today, the world started in 1996," says librarian and author Gary Price.

And yet Berkeley professor Peter Lyman points out that traditional sources of information, such as textbooks, are heavily filtered by committees, and are full of "compromised information." He's not so sure that the robotic Web crawlers give results any worse than those from more traditional sources. "There's been a culture war between librarians and computer scientists," Lyman says.

And the war is over, he adds.

"Google won.""

""A generation ago, reference librarians -- flesh-and-blood creatures -- were the most powerful search engines on the planet. But the rise of robotic search engines in the mid-1990s has removed the human mediators between researchers and information. Librarians are not so sure they approve. Much of the material on the World Wide Web is wrong, or crazy, or of questionable provenance, or simply out of date (odd to say this about a new technology, but the Web is full of stale information).

"How do you authenticate what you're looking at? How do you know this isn't some kind of fly-by-night operation that's put up this Web site?" asks librarian Patricia Wand of American University.""

""He needs one that knows that he's a big-brain tech guru and not an eighth-grader with a paper due.

"The field is called user modeling," says Dan Gruhl of IBM. "It's all about computers watching interactions with people to try to understand their interests and something about them."

Imagine a version of Google that's got a bit of TiVo in it: It doesn't require you to pose a query. It already knows! It's one step ahead of you. It has learned your habits and thought processes and interests. It's your secretary, your colleague, your counselor, your own graduate student doing research for which you'll get all the credit.

To put it in computer terminology, it is your intelligent agent."

Monday, February 16, 2004

XSLT for everything

On Semantic Integration and XML "In conclusion, it is clear that Semantic Web can be used to map between XML vocabularies however in non-trivial situations the extra work that must be layered on top of such approaches tends to favor using XML-centric techniques such as XSLT to map between the vocabularies instead. "

Radar Networks Triple Store

Okay, not much sleuthing here - it's the 4th hit on Google or something - but still interesting.

"2 November, 2003. A short and somewhat formal description of the Radar Networks Triple Store, which is the system that handles the semantic metadata for this website. An essay by Jack Rusher.

A triple store is designed to store and retrieve identities that are constructed from triplex collections of strings (sequences of letters). These triplex collections represent a subject-predicate-object relationship that more or less corresponds to the definition put forth by the RDF standard.

The problem space of storing this sort of data has been explored by the graph database, object database, PROLOG language and, more recently, semantic web communities. A more thorough backgrounder is provided by the work on Datalog, Jena, Dave Beckett’s Redland, the AT&T Research Communities of Interest project, Ora Lassila’s Wilbur, and the activities of the W3C Semantic Web project, among others."
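The core idea is small enough to sketch as a toy in Java - a real triple store like the one described adds indexes, persistence and a query engine on top of something like this:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// A toy triple store: statements are (subject, predicate, object)
// string triples, and a query is a pattern where null means "any".
class Triple {
    final String subject, predicate, object;
    Triple(String s, String p, String o) {
        subject = s; predicate = p; object = o;
    }
    public String toString() {
        return "(" + subject + ", " + predicate + ", " + object + ")";
    }
}

class ToyTripleStore {
    private final List triples = new ArrayList();

    void add(String s, String p, String o) {
        triples.add(new Triple(s, p, o));
    }

    // Return all triples matching the pattern; null is a wildcard.
    List find(String s, String p, String o) {
        List matches = new ArrayList();
        for (Iterator i = triples.iterator(); i.hasNext();) {
            Triple t = (Triple) i.next();
            if ((s == null || s.equals(t.subject))
                    && (p == null || p.equals(t.predicate))
                    && (o == null || o.equals(t.object))) {
                matches.add(t);
            }
        }
        return matches;
    }
}

With that, find(null, "dc:title", null) pulls back every title statement - the essence of the subject-predicate-object query model.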

Triple Store

Oh and then there's this blog (which I've read before without seeing the Radar Network postings).

The Dangers of Caffeine

Coffee-breaks sabotage employees' abilities "St Claire and Rogers decided to investigate caffeine's effects on work stress after hearing an anecdote at a stress workshop. A man described how he and a group of normally cohesive colleagues went on a business trip to the US.

Unlike in the UK, coffee was freely available and the team over-indulged. Within days their stress levels had escalated and they believe the extra caffeine had disrupted their working relationships, and impaired their working ability.

The Bristol team tested caffeine's effects on 32 coffee-drinkers. They told them they would be given a caffeinated coffee that would boost their performance, or a caffeinated coffee which causes stress-like side-effects, or decaffeinated coffee. However, unknown to the volunteers, only half the drinks contained 200 mg of caffeine and the other half contained none. The subjects then carried out two stressful tasks."

""Certainly in our experience of people drinking coffee there's a tendency for all sorts of personal interactions to get a little more intense. If there was a stressful situation there would be more shouting, yelling, louder talking," he told New Scientist. "This is very interesting confirmation.""

Beware of taking Bristolites to cafes with bottomless cups of coffee - violence, mayhem and talking louder could ensue.

Sunday, February 15, 2004

GiST and MTrees

MTree Project "The M-tree is an index structure that can be used for the efficient resolution of similarity queries on complex objects to be compared using an arbitrary metric, i.e. a distance function d that satisfies the positivity, symmetry, and triangle inequality postulates. For instance, with the M-tree you can index a set of strings and organize them according to their edit distances (the minimal number of character changes needed to transform one string into another)."

MTree Applet (also part of XXL).

GiST: A Generalized Search Tree for Secondary Storage "The GiST is a balanced tree structure like a B-tree, containing (key, pointer) pairs. But keys in the GiST are not integers like the keys in a B-tree. Instead, a GiST key is a member of a user-defined class, and represents some property that is true of all data items reachable from the pointer associated with the key. For example, keys in a B+-tree-like GiST are ranges of numbers ("all data items below this pointer are between 4 and 6"); keys in an R-tree-like GiST are bounding boxes, ("all data items below this pointer are in California"); keys in an RD-tree-like GiST are sets ("all data items below this pointer are subsets of {1,6,7,9,11,12,13,72}"); etc. To make a GiST work, you just have to figure out what to represent in the keys, and then write 4 methods for the key class that help the tree do insertion, deletion, and search."
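From the GiST paper, those four methods are Consistent, Union, Penalty and PickSplit. A Java paraphrase of the contract might look like this (my own sketch, not the actual libgist API):

// A paraphrase of the four user-supplied GiST key methods. A key
// describes everything reachable below a pointer (a number range,
// a bounding box, a set, ...).
interface GistKey {
    // Could anything under this key match the query? If not,
    // search prunes the whole subtree.
    boolean consistent(Object query);

    // The smallest key covering both this key and another one;
    // used to adjust keys on the path after an insertion.
    GistKey union(GistKey other);

    // How much this key must grow to cover the new entry;
    // insertion descends where the penalty is smallest.
    double penalty(GistKey newEntry);

    // Divide an overflowing node's keys into two groups when
    // the node is split.
    GistKey[][] pickSplit(GistKey[] keys);
}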

Report on implementing GiST in Java (framed).

Other trees (or multi-dimensional access methods).

Saturday, February 14, 2004

Problems with Java Generics

Generics in C#, Java, and C++ "...with Java generics, you don't actually get any of the execution efficiency that I talked about, because when you compile a generic class in Java, the compiler takes away the type parameter and substitutes Object everywhere. So the compiled image for List<T> is like a List where you use the type Object everywhere...Of course, if you now try to make a List<int>, you get boxing of all the ints. So there's a bunch of overhead there. Furthermore, to keep the VM happy, the compiler actually has to insert all of the type casts you didn't write."

"When you apply reflection to a generic List in Java, you can't tell what the List is a List of. It's just a List. Because you've lost the type information, any type of dynamic code-generation scenario, or reflection-based scenario, simply doesn't work."

With 1.3, casting used to be a fairly big overhead; with 1.4 it isn't really an issue. Running the trove4j benchmarks on 1.4, though, you can still see the performance difference between objects and primitives.

The second issue, and it's the same with autoboxing, seems to highlight how the syntax gives the illusion of consistency when there is none.
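The reflection point is easy to demonstrate (this uses the generics syntax from the 1.5 beta):

import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    public static void main(String[] args) {
        List<String> strings = new ArrayList<String>();
        List<Integer> ints = new ArrayList<Integer>();

        // Both print "class java.util.ArrayList" - the type
        // parameter is gone at runtime, so reflection can't tell
        // a List<String> from a List<Integer>.
        System.out.println(strings.getClass());
        System.out.println(ints.getClass());
        System.out.println(strings.getClass() == ints.getClass()); // true

        // And every int added is boxed to an Integer behind the scenes.
        ints.add(42);
    }
}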

Trust Visualization

Trust on the Semantic Web has some examples of visualization. I recently found, again, the visualization of newsgroups by Microsoft Research called Netscan. I was reminded about this after reading unstruct.org's "Ouch! This is a red hot topic...".

New Sesame Home

"The openRDF.org site is a community site that is the center for all Sesame-related development. Here, developers and users can meet and discuss, ask questions and submit problem reports. The latest news about Sesame will be posted here."

Friday, February 13, 2004

Kowari 1.0.1

After some feedback about problems building Kowari (caused by Barracuda), we've decided to release a new version that compiles and builds successfully. The upshot, though, is that all the bugs we've fixed and the features we've been working on - Jena and JRDF support, RDQL, improvements in resource allocation and handling, and a Swing-based iTQL UI - get released as well.

We're also having fun loading millions of triples on an Opteron system running Linux. It sustains about 4,000 triples/second after 13 million triples - which isn't quite fast enough; of course, it's never fast enough.

Kowari project page.

Brain Dead Farting - I Don't Think in Triples

There looks to be a nice response on TriX. When I wrote what I did, I'd hoped it was something people had already thought of - that someone would reply with a simple "here's the XSLT, you boob" and be done with it. I guess it's time to read those 39-odd emails.

Okay, that doesn't seem to have been enlightening - especially when there's no response to your mail. TriX isn't supposed to be human readable, but someone still has to write the XSLT for the "user".

Eric Jain has had a similar experience with the RDF/XML syntax to mine: "Interestingly, when presented with the choice of working with an XML or an RDF/XML representation of the same data, our developers (somewhat familiar with XML, not RDF) choose to use the RDF version (to my great relief :-). The data is relatively complex, with lots of cross-referencing, which the RDF/XML syntax can handle in a simple and consistent way."

Alberto Reggiori said: "but the real point here is how much work a user/programmer has to put into writing and managing RDF descriptions - even though RDF is supposed to be for machines, the poor users will have most of time mark-up their data into their templates of scripts, JSP, ASP and so on. TriX is definitively a step ahead compared to RDF/XML - added DTD, XMLSchema (then kinda "deterministic" markup) and named graphs are very cool - but still too complicated for the average human being to use, due he/she has to think in terms of statements, subjects, predicates, objects, collections, reification and so on. Definitively, such an "assembly like language" is very good for general purpose RDF toolkits and frameworks, and more experienced users. But the major part of XML folks out there do not have a clue (or few) why they have to "denormalize" their data all the time into RDF constructs."

After reading the TriX paper again, I still wonder exactly what problem it's trying to solve. I don't think it makes converting RDF to XML any easier; it is significantly more verbose than RDF/XML or N3; and some of the small things, like named graphs, are useful but not by themselves. Using XQuery to query it would be slow and rather cumbersome - it's much better to use a triple store.

Thursday, February 12, 2004

13 Things to Fix about EJBs

"In particular developers appear to be most interested in the following:

1. Ditch CMP or make it simpler like Hibernate or JDO
2. Use a POJO programming model
3. Add support for Interceptors/filters/AOP.
4. Eliminate/Consolidate the component interfaces (remote, local and endpoint).
5. Make deployment descriptors more like XDoclet
6. Instance-level authentication/entitlement.
7. Replace the JNDI ENC with Dependency Injection (a.k.a. IoC).
8. Add a Mutable Application Level Context
9. Support development and testing outside the container system.
10. Define cluster-wide singletons for EJB.
11. Clearly specify and standardize class loading requirements.
12. Standardize the deployment directory structure for EJBs.
13. Support Inheritance."

13 improvements for EJB.

Wednesday, February 11, 2004

IoC

I was looking for more information about this today and found this (Martin Fowler has an earlier article and so does Michael Yuan):
"The best way to describe what IoC is about, and what benefits it can provide, is to look at a simple example. The following JDBCDataManger class is used to manage our application's accessing of the database. This application is currently using raw JDBC for persistence. To access the persistence store via JDBC, the JDBCDataManger will need a DataSource object. The standard approach would be to hard code this DataSource object into the class, like this:

public class JDBCDataManger {
    public void accessData() {
        DataSource dataSource = new DataSource();
        // access data
        ...
    }
}

Given that JDBCDataManger is handling all data access for our application, hard coding the DataSource isn't that bad, but we may want to further abstract the DataSource, perhaps getting it via some system-wide property object:

public class JDBCDataManger {
    public void accessData() {
        DataSource dataSource =
            ApplicationResources.getDataSource();
    }
}

In either case, the JDBCDataManger has to fetch the DataSource itself.

IoC takes a different approach — with IoC, the JDBCDataManger would declare its need for a DataSource and have one provided to it by an IoC framework. This means that the component would no longer need to know how to get the dependency, resulting in cleaner, more focused, and more flexible code."
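The article stops before showing the inverted version; a minimal sketch of constructor injection - no container required - would be something like:

import javax.sql.DataSource;

public class JDBCDataManger {
    private final DataSource dataSource;

    // The dependency is declared in the constructor; an IoC
    // container (or a unit test) supplies it - the class never
    // looks it up itself.
    public JDBCDataManger(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public void accessData() {
        // use this.dataSource to access data
    }
}

Because the DataSource arrives from outside, swapping in a mock for testing is trivial.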

A Brief Introduction to IoC

Tuesday, February 10, 2004

RDF in GPL and LGPL

Creative Commons Includes GPL And LGPL Metadata "I was looking at the Creative Commons site this weekend, and was surprised to find, on their license generation page, entries (translated into Portuguese) in a sidebar for the GNU General Public License and GNU Lesser General Public License, including RDF blocks. Since CC is pushing for projects that can generate, validate, display and search for CC license metadata, how cool would it be to be able to do a Google search for GPL-licensed material, or a P2P network for MP3s released under the CC Attribution-ShareAlike license? As an example, Nathan Yergler has released mozCC, a plugin for Mozilla and Firebird that allows you to view CC license information embedded in a webpage, and provides icons on the status bar displaying the CC license options."

Time to do one for MPL I guess.
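For anyone who hasn't seen one, the embedded blocks look roughly like this - reconstructed from memory, so treat the exact element names and URIs as approximate:

<!-- sketch only: the cc namespace and license URI are from memory -->
<rdf:RDF xmlns="http://web.resource.org/cc/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Work rdf:about="">
    <dc:title>Some GPL-licensed work</dc:title>
    <license rdf:resource="http://creativecommons.org/licenses/GPL/2.0/"/>
  </Work>
</rdf:RDF>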

RAD XUL

Building RAD Forms and Menus in Mozilla "In Rapid Application Development with Mozilla, Web, XML, and open standards expert Nigel McFarlane explores Mozilla's revolutionary XML User interface Language (XUL) and its library of well over 1,000 pre-built objects. Using clear and concise instruction, McFarlane explains what companies such as AOL, IBM, Hewlett-Packard, and others already know—that Mozilla and XUL are the keys to quickly and easily creating cross-platform, Web-enabled applications. The Mozilla Platform encourages a particular style of software development: rapid application development (RAD). RAD occurs when programmers base their applications-to-be on a powerful development tool that contains much pre-existing functionality. With such a tool, a great deal can be done very quickly. The Mozilla Platform is such a tool."

Thursday, February 05, 2004

SemEarth

The Semantic Earth "Thanks to the constellation of technology that enables digital networks to be laid over the places of the earth, wherever we are we will be able to hear the human conversation that has occurred about that place - the history that occurred there, the aesthetics to be savored, the commerce transpiring at that very moment, recommendations offered by strangers and friends."

More comments: "Predictably, I'm curious because this is an area that my research group has been interested in for some time. For example, our work in creating digital representations for people, places and things led us to semantic location and the Websign project...will a proprietary model be operative?"

Planetarium

XML Watch: Planet Blog "As the number of Planet-style aggregators grows (while I'm writing this, Planets Apache and SuSE are under active development), so grows a variety of software for creating the aggregated sites. There are now at least three codebases for creating such sites, originating with Monologue, Planet GNOME, and Planet RDF. It would be good if each of these codebases could interoperate at least on the basis of configuration files, such as the RDF blog listing from Listing 1. Additionally, we may want a more advanced way of describing each of the planets, perhaps so an über-aggregator -- the Planetarium! -- can be made. (Actually, Jeff Waugh, who created Planet GNOME, has just registered "planetplanet.org", so watch this space!)

I'll leave you with the code in Listing 5, which is a suggestion of how multiple planets could be described; processors follow the seeAlso links to retrieve a list of contributors for each planet. If the choice is made to use RDF/XML, creating the über-configuration file is as easy as aggregating the various RDF blog lists."

A great article and all, especially Figure 2.

Quick Links

* The Semantic Web Made Easy - about a startup using RDF called Radar Networks.
* Protege 2.0
* SemWeb Central - OS Semantic Web tools.
* OntoJava (linked to previously)
* Mac G5 Cube - Instructions on how to build a Mac cube with the G5's grill look.
* Why your Movable Type blog must die - Another anti-blog rant.
* What if there was a data format, and nobody cared? - About social networking and RSS.

Wednesday, February 04, 2004

Docco

Tockit is an open source project written by people from the DSTC. It includes Docco, which has a great interface that uses lattices: it presents Lucene results combined with metadata extracted by plugins like POI. It seems to only use the text found in documents, not actual concepts.

It helps to remove the empty nodes to better understand the results.

Screenshots are also available.

Other FCA tools written using Java are available at the ToscanaJ site.

SWOOP

SWOOP (Semantic Web Ontology Overview and Perusal) "SWOOP is a simple and elegant utility to browse Ontologies written in OWL in a hyperlinked thesaurus-style format...Allows users to add OWL ontologies to the Knowledge Base and browse terms listed in them (sorted alphabetically). These ontologies can be saved locally for faster retrieval at a later stage (uses Jena 2.0)"

EII

A new view on data "Composite Software is at the forefront of this trend. Here's what EII is not, according to Jim Green, Composite's CEO and chairman: EII is not EAI (enterprise application integration). EAI pushes data around; EII is a pull system. If an address changes in your CRM application, EAI will push that information out to your ERP (enterprise resource planning) system, for example. Conversely, EII, pulls only the data you need out of ERP and CRM systems and offers it up in a single view for analysis.

Composite Software's Information Server stores the meta data (the data about the data), the fields, and the relationships between them. When the user executes the query, it fetches the data from the underlying systems to present a synthetic view.

"Composite Software's Composite Information Server joins data from different types of resources and creates an alias so it looks different than when it was stored," Green says.

EII is also not BPM (business process management). It has nothing to do with changing business processes. Composite Software's Murthy Nukala, vice president of marketing, pointed out some of the benefits of EII.

"Data takes up most of the cost of an integration effort. Increased understanding of that data is mission-critical, and you need to have a strategy about how you handle data," said Nukala. "

Tuesday, February 03, 2004

MOFman Prophecies

"If done correctly, a meta data-based approach can allow for reuse of interface definitions, messages and other pieces of the integration puzzle. For reuse to happen, though, developers must be able to quickly and easily find previously developed pieces. A centralized repository approach is one way to let that happen...The Holy Grail for this technology is creating, managing and reusing meta data models that are then redeployed to execution engines to reduce the amount of code that needs to be developed manually."

"Vendors, including Redwood City, Calif.-based Informatica Corp. and Bournemouth, U.K.-based Adaptive, have implemented MOF for their warehouse tools and engines. Informatica’s SuperGlue helps to put context around information used in traditional enterprise application integration (EAI)...[it] collects, stores and helps to analyze meta data, including providing audit trails. “The value is in being able to see the dependencies and linkages between data,” Poonen explained. Six customers, including one federal government agency that Poonen could not name, currently use SuperGlue."

"Beyond technology, successful meta data-based integration will depend on corporate culture and practices. Not everyone agrees that a centralized “center of excellence” approach to integration meta data is an absolute given to make it work, but some methodology or approach clearly is -- especially if reuse is ever going to happen."

"By 2005, predict Gartner analysts, more than half of large organizations will have multiple sources of integration technology. “As that proliferation occurs, being able to recognize the use of meta data and have consistent use across all the different deployments becomes important,” said Thompson."

The next step for meta data: Application integration

Monday, February 02, 2004

Lego as an Analogy

We were just talking about this the other day - I recently came across this Lego diagram describing the trade-offs of business objects.