More News: 2003

Wednesday, December 31, 2003

Statement vs Stating

This comes up from time to time (at work and on mailing lists), so here is a useful summary of the difference between an RDF statement and a stating:
Statements/Statings. From the ILRT Semantic Web technical reports at a glance. Two other references: Does the model allow different statements with the same subject/predicate/object? and also part of Reification in the RDF Semantics document.
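The distinction can be sketched in Java. This is only an illustration, not code from any RDF toolkit; the `Statement` and `Stating` classes and the `source` field are hypothetical names. A statement is identified purely by its subject, predicate and object, so two equal triples are the same statement. A stating is one particular occurrence of a statement, so two statings of the same statement stay distinct.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch: a statement is a value, identified only by its three parts.
final class Statement {
    final String subject, predicate, object;

    Statement(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Statement)) return false;
        Statement s = (Statement) o;
        return subject.equals(s.subject)
            && predicate.equals(s.predicate)
            && object.equals(s.object);
    }

    @Override public int hashCode() {
        return Objects.hash(subject, predicate, object);
    }
}

// Hypothetical sketch: a stating is one occurrence of a statement (here, in a named source).
// It keeps the identity-based equals/hashCode inherited from Object on purpose.
final class Stating {
    final Statement statement;
    final String source;

    Stating(Statement statement, String source) {
        this.statement = statement;
        this.source = source;
    }
}

class StatingDemo {
    public static void main(String[] args) {
        Statement a = new Statement("ex:doc", "dc:creator", "ex:alice");
        Statement b = new Statement("ex:doc", "dc:creator", "ex:alice");

        Set<Statement> graph = new HashSet<>();
        graph.add(a);
        graph.add(b);
        System.out.println(graph.size());    // 1: the same statement added twice

        Set<Stating> statings = new HashSet<>();
        statings.add(new Stating(a, "ex:page1"));
        statings.add(new Stating(b, "ex:page2"));
        System.out.println(statings.size()); // 2: two separate occurrences
    }
}
```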

TeraView, Level 5 and BioWisdom starting 2004 in style "Ontology specialist BioWisdom also plans to make a “big announcement” early in the New Year.

Ontology is a branch of science that deals with knowledge capture and representation. BioWisdom’s approach involves the development of specialised knowledgebases and the software tools for managing them.

Chief executive Gordon Smith Baxter said: “2003 has been a great year for BioWisdom. In January we secured a £2.5m investment from MB Venture Capital II and Merlin Ventures Fund IV."

BioWisdom and Network Inference on using ontologies for drug discovery.

Friday, December 26, 2003

Some Holiday Links

* Improved Topicalla Screenshot
* Weedshare
* XML 2003 Conference Diary - Notes the continual rise in interest in the Semantic Web.
* SnipSnap 0.5 - Now with the snips available as RDF.

Friday, December 19, 2003

Quintuples

Trust, Context and Justification. I'm not sure about using 5-tuples (we use 4-tuples and make statements about the fourth element in order to do things like security), but it's still an interesting paper with some good references.
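One way to read the 4-tuple approach is sketched below. It is not taken from the paper or from any particular store; the `Quad` and `QuadStore` classes, the `ex:readableBy` predicate and the `ex:security` graph name are all assumptions made for illustration. The fourth element names the graph a statement lives in, and access rules are ordinary statements whose subject is that graph name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical 4-tuple: a triple plus the name of the graph it belongs to.
final class Quad {
    final String subject, predicate, object, graph;

    Quad(String subject, String predicate, String object, String graph) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
        this.graph = graph;
    }
}

class QuadStore {
    // Assumed convention: only statements held in this graph can grant access.
    static final String SECURITY_GRAPH = "ex:security";

    private final List<Quad> quads = new ArrayList<>();

    void add(String s, String p, String o, String g) {
        quads.add(new Quad(s, p, o, g));
    }

    // A user may read a graph if the security graph contains a statement
    // about that graph (the fourth element of other quads) saying so.
    boolean canRead(String user, String graph) {
        for (Quad q : quads) {
            if (q.graph.equals(SECURITY_GRAPH)
                    && q.subject.equals(graph)
                    && q.predicate.equals("ex:readableBy")
                    && q.object.equals(user)) {
                return true;
            }
        }
        return false;
    }

    // Returns only the quads whose graph the user is allowed to read.
    List<Quad> visibleTo(String user) {
        List<Quad> result = new ArrayList<>();
        for (Quad q : quads) {
            if (canRead(user, q.graph)) result.add(q);
        }
        return result;
    }
}
```

Because the rules are themselves statements, they can be queried and updated with the same machinery as the data they protect.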

Google Searching for Relevance

A Quantum Theory of Internet Value "When "the Internet" was unveiled to a doughnut-eating public a decade ago, we were promised unlimited access to vistas of encyclopedic knowledge. Every body would be connected to every thing, and we would never be short of an answer. What with the abundance of information, and the costs of transporting information approaching zero, the world would never be the same again.

Of course, a decade on, we know that real economics have prevailed. Information costs money. Those transport costs certainly aren't zero. And faced with a choice of a million experts, people gravitate towards experts with a good track record: i.e., for better or worse, paid journalists, qualified doctors or other centers of expertise.

Taxonomies also have been proved to have value: archivists can justify a smirk as manual directory projects dmoz floundered - true archivists have a far better sense of meta-data than any computerized system can conjure. If you're in doubt, befriend a librarian, and from the resulting dialog, you'll learn to start asking good questions. Your results, we strongly suspect, will be much more fruitful than any iterative Google searches. "

"At a convivial dinner recently, John Perry Barlow asked me why no one had written a story about how the most powerful organisations in the world were dependent on the most awful, antiquated and dysfunctional technology. Well, I ventured (to a deafening silence), maybe they were making ruthless choices, and really weren't too slavish about following techno-fads. Maybe the answer is in the question."

Wednesday, December 17, 2003

Commercial RSS

How to make RSS commercially viable "Without full content no aggregator can add much value by categorizing and filtering information, so no purely RSS based aggregator can make much money.

Despite all of the interest around web based syndication, people like Lexis Nexis will still make all the money unless this problem is solved."

Does it? Will it? Must it?

Interview: David Weinberger "What Shelley calls "the semantic web" is the Web itself. She puts it beautifully. And I agree 100% that the Web consists of meaning; it has to because we created this new world for ourselves out of language and music and other signifiers. But that meaning is as hard to systematize and capture as is the meaning of the offline world and for precisely the same reasons. The Semantic Web, it seems to me, often underplays not only the difficulty of systematizing human meaning (= the world) but also ignores the price we pay for doing so: making metadata explicit often is an act of aggression. Human meaning is only possible because of its gnarly, tangly, implicit, unlit, messy context. That's the real reason the Semantic Web can't scale, IMO.

If by "The Semantic Web" you merely mean "A set of domain-specific taxonomies some of which can be knit together to provide a greater degree of automation and improved searching," then I've got no problem with it. It's the more ambitious plans -- and the use of the definite article in its name -- that ticks me off when it comes to The Semantic Web."

Exceptions (again)

13 Exceptional Exception Handling Techniques notes "Declare Unchecked Exceptions in the Throws Clause" and "Soften Checked Exceptions" (always use RuntimeExceptions). This led me to JDO and its Transaction class, which uses runtime exceptions (although it does document them) instead of JDBC's checked exceptions. The Spring Framework takes a similar approach, and in Chapter 4 of Expert One-on-One J2EE Design and Development the author discusses the usual reasons given to avoid checked exceptions:
"Checked exceptions are much superior to error return codes...However, I don't recommend using checked exceptions unless callers are likely to be able to handle them. In particular, checked exceptions shouldn't be used to indicate that something went horribly wrong, which the caller can't be expected to handle...Use an unchecked exception if the exception is fatal."

With both JDO and Spring, the contract offered by the framework tells the client what they can and cannot handle. In my experience, this is not an either/or situation. For example, JDO has JDOCanRetryException and JDOFatalException, yet an exception that can be retried could actually be fatal depending on the context, and vice versa. This often occurs when large frameworks are used in conjunction with one another, at the system integration level. Denying the developer the choice of which exceptions can and cannot be caught, when integrating into larger frameworks, often leads to unexpected exceptions tunneling through layers.
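A small sketch of why the retry/fatal split depends on context. The `RetryableException`, `FatalException` and `IntegrationLayer` names are hypothetical stand-ins, not the JDO classes. The integration layer, rather than the framework, decides when a "retryable" failure has become fatal for its caller.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-ins for a framework's "can retry" and "fatal" runtime exceptions.
class RetryableException extends RuntimeException {
    RetryableException(String message) { super(message); }
}

class FatalException extends RuntimeException {
    FatalException(String message, Throwable cause) { super(message, cause); }
}

class IntegrationLayer {
    // Retries work that fails with RetryableException, up to maxAttempts times.
    // Once the attempts run out, the same failure is reported as fatal.
    static <T> T call(Callable<T> work, int maxAttempts) {
        RetryableException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return work.call();
            } catch (RetryableException e) {
                last = e;                 // remember the failure and try again
            } catch (RuntimeException e) {
                throw e;                  // anything else unchecked passes through unchanged
            } catch (Exception e) {
                throw new FatalException("unexpected checked exception", e);
            }
        }
        throw new FatalException("gave up after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        String result = IntegrationLayer.<String>call(() -> {
            if (++calls[0] < 3) throw new RetryableException("resource busy");
            return "ok";
        }, 5);
        System.out.println(result + " after " + calls[0] + " calls"); // ok after 3 calls
    }
}
```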

Tuesday, December 16, 2003

Drools in Groovy

Drools (an augmented implementation of Forgy's Rete algorithm) is now available in Groovy.

RDF Matures

Resource Description Framework (RDF) Is a W3C Proposed Recommendation and OWL Web Ontology Language Is a W3C Proposed Recommendation; the next step for both is Recommendation.

More On Practical RDF

Practical RDF Town Hall "Next, xmlhack editor Edd Dumbill explored how he applies RDF to his personal data integration problems, running personal information through the Friend-of-a-Friend (FOAF) RDF vocabulary, using the Redland framework as a foundation for processing...which has sprouted context features and Python bindings to support this work. "

"In the last presentation, Norm Walsh explained how he was using RDF to make better use of information he already had. Walsh explained that he had lots of data in various devices about a lot of people and projects, but no means of integrating it. Thanks to various RDF toolkits - "just by dumping it into RDF, it just kind of happens for free." Aggregation and inference are easy - and Walsh can get convenient notifications of people's birthdays without duplicating information between a file on a person and a calendar entry noting that."

Monday, December 15, 2003

Corporate Taxonomies

Verity provides standard ways to categorise content "Traditionally, taxonomies have been time-consuming and expensive to set up. A Taxonomy needs to be unambiguous and cover all topics of interest to the organisation. In other words, it has to be Collectively Exhaustive and Mutually Exclusive. Few individuals, not even the company librarian, have the breadth of knowledge of the organisation and its information assets to construct a set of categories that encompasses all information and meets all needs."

"Because a taxonomy reflects the most important knowledge categories of an organisation, organisations that carry out the same business activities need similar taxonomies. (In the same way that such organisations share similar core business processes). This fact and the rising importance of taxonomies to organisations has led Verity to make six tailorable taxonomies available to jump-start the development of an organisation's taxonomy. Verity's six taxonomies suit a range of business activities covering Pharmaceuticals, Defence, Homeland Security, Human Resources, Sales and Marketing, and Information Technology. Organisations that start with these predefined taxonomies can then tailor them to their specific needs. "

Sunday, December 14, 2003

The Winner Takes It All

Power Laws, Discourse, and Democracy "Well, inevitable inequality is one way to characterize the effects of power laws in social networks. But is it the most useful way? Drawing on the same body of research on power laws in social networks, and using similar methods, Jakob Nielsen chose to emphasize instead that, as he put it in a piece published on AlertBox (03.06.16): Diversity is Power for Specialized Sites:"

"Winner-takes-all networks may follow Pareto's Law (the 80/20 rule) with regard to the cumulative distribution of links. But, according to Barabasi in Linked, the distinctive distribution hierarchy of scale free networks will have been broken. Instead, the network takes on what Barabasi describes as a "star topology," in which a single hub snarfs nearly all the links, dwarfing its competitors. "

"It's the dynamics of emergent systems being formalized in open source. It's the fragile and turbulent architecture of democracy.

By contrast, winner-takes-all networks wipe out the middle ground connecting leaders to the network's other players. With this, winner-takes-all networks strip away the architecture that supports the productivity of local niches."

Saturday, December 13, 2003

More Practical RDF

Practical RDF "There are two features of RDF that I find particularly practical: Aggregation [and] Inference".

Not Influential or Famous

Myths Open Source Developers Tell Ourselves

Friday, December 12, 2003

Groovy is Out

"Groovy is a powerful new high level dynamic language for the JVM combining lots of great features from languages like Python, Ruby and Smalltalk and making them available to the Java developers using a Java-like syntax."

GPath "When working with deeply nested object hierarchies or data structures, a path expression language like XPath or Jexl absolutely rocks."

The SQL and Markup example also looks interesting.

New Java Tools

Algernon-J is a rule-based reasoning engine written in Java. It allows forward and backward chaining across Protege knowledge bases. In addition to traversing the KB, rules can call Java functions and LISP functions (from an embedded LISP interpreter).

JRDF "A project designed to create a standard mapping of RDF to Java."

Google 2005

Searching With Invisible Tabs "Doesn't the future of search look great? Whatever type of information you're after, Google and other major search engines will have a tab for it!"

Highlights that people can suffer from "tab blindness" and why one UI doesn't suit all (fairly obvious).

Greed is Good for Data Emergence

The Age of Reason: The Perfect Knowing Machine Meets the Reality of Content "In brief, the concept of "data emergence" that is central to this knowledge Nirvana is best summed up by James Snell as "the incidental creation of personal information through the selfish pursuit of individual goals." From Snell's perspective, content value is shackled by dumb Web browsers that are used to share information about individuals with Web sites that then try to "personalize" their content - an experience that must be repeated at each and every Web site visited, since this knowledge about individual interests and preferences is not shared site-to-site. Instead of this, the perfect world would have a "smart" content service, probably on one's PC, that would retain knowledge of all of one's personal profile and interests in accessing content; content providers would then be "dumb" sources pumping information into the smart service, not having any detailed knowledge of who is using their services and how. No more nasty Web site publishers, just one perfect tacit machine that knows exactly what you're thinking and allows you to obtain and share thoughts with others."

"Aggregation can happen anywhere to the satisfaction of many."

Kowari Already Out There

Kowari for RDF developers - Early Release. It's already out there - I found this when Googling Kowari. The real site will be Kowari.org, but with open source the source is the real thing, I guess.

RSS for the Knowledge Worker

From the Metaweb to the Semantic Web: A Roadmap "At Radar Networks we have been working to define this ontology -- which we call "The Infoworker Ontology" -- with a goal of eventually contributing it to a standards body in the future. The Infoworker Ontology is a mid-level horizontal ontology that defines the semantics of common entities and relationships in the domain of knowledge work -- things like documents, events, projects, tasks, people, groups, etc. The development and adoption of an open, extensible, and widely-used Infoworker ontology is a necessary step towards making the Semantic Web useful to ordinary mortals (as opposed to academic researchers).

By connecting microcontent objects to the Infoworker Ontology a new generation of semantic-microcontent (what we call "metacontent") is enabled. With the right tools even non-technical consumers will be able to author and use metacontent. "

Thursday, December 11, 2003

The Early Days...of the Semantic Web

Early Days Of a Data-Sharing Revolution " And next week, a Chicago company plans to start selling a $36 mini-scanner dubbed "iPilot" that shoppers can use to scan bar codes on products in stores, then upload the data to a computer and compare prices at Amazon.com.

All are examples of how Web sites, relying on a new generation of Internet software, are licensing their databases to business partners and outside developers in an attempt to spark innovation and reach more customers.

"In the past six to nine months, we have started ramping up the program to license eBay's data," eBay Vice President Randy Ching said."

iPilot, hey? You'd think that with millions in venture capital they could do better than combining Apple's and Palm's product names.

Saturday, December 06, 2003

Metaweb

The Birth of "The Metaweb" -- The Next Big Thing -- What We are All Really Building "But RSS is just the first step in the evolution of the Metaweb. The next step will be the Semantic Web...The Semantic Web transforms data and metadata from "dumb data" to "smart data." When I say "smart data" I mean data that carries increased amounts of information about its own meaning, structure, purpose, context, policies, etc. The data is "smart" because the knowledge about the data moves with the data, instead of being locked in an application...The Semantic Web is already evolving naturally from the emerging confluence of Blogs, Wikis, RSS feeds, RDF tools, ontology languages such as OWL, rich ontologies, inferencing engines, triplestores, and a growing range of new tools and services for working with metadata. But the key is that we don't have to wait for the Semantic Web for metadata to be useful. The Metaweb is already happening."

Friday, December 05, 2003

More Visualization of RDF

Meta-Model Management based on RDFs Revision Reflection. Breaking up the different types of RDF (schema and properties) is interesting, although it probably still has scaling problems (as with most visualizations of this type).

Styling RDF Graphs with GSS "One such solution is GSS (Graph Style Sheets), an RDF vocabulary for describing rule-based style sheets used to modify the visual representation of RDF models represented as node-link diagrams."

I would imagine that somehow deriving histogram data from graphs may be more interesting and would scale better, much like how some image search engines work.

Al Gore - How I would've done it different

FREEDOM AND SECURITY
""I want to challenge the Bush Administration’s implicit assumption that we have to give up many of our traditional freedoms in order to be safe from terrorists...In both cases they have recklessly put our country in grave and unnecessary danger, while avoiding and neglecting obvious and much more important challenges that would actually help to protect the country...In both cases, they have used unprecedented secrecy and deception in order to avoid accountability to the Congress, the Courts, the press and the people." "

"In other words, the mass collecting of personal data on hundreds of millions of people actually makes it more difficult to protect the nation against terrorists, so they ought to cut most of it out.""

Thursday, December 04, 2003

Semantic Merging

Skip This Rant and Read Shirky "Shirky sums up many metadata challenges with a concise statement: "it's easy to get broad agreement in a narrow group of users, or vice-versa, but not both." Hey, if you don't make your metadata structurally interoperable, you can't have semantic merging."

I found the diagram, "Content Enterprise Metadata: Structural Interoperability & Semantic Merging" to be quite instructive.

Wednesday, December 03, 2003

Zeitgeist Mining

On RSS, Blogs, and Search "I've been thinking lately about the role of blogs and RSS in search, and that, of course, has led me to both the Semantic Web and to Technorati, Feedster, and many others. Along those lines, I recently finished a column for 2.0 on blogs and business information. I can't reveal my conclusions yet (my Editor'd kill me) but suffice to say, I find the intersection of blogging, search, and the business information market to be pretty darn interesting.
I'm certainly not alone. Moreover has created "Enterprise-Grade Weblog Search" - essentially, a zeitgeist mining tool for corporations. One can imagine similar products from any of the RSS search engines, or even from the major marketing agencies of the world."

JRDF

Well, it's not an impressive name but after talking to the Jena and Sesame people it seems important to have a consistent binding to RDF written in Java.

JRDF is going to be based on the best bits from Sesame, Jena, Kowari and the RDF API Draft. Any other contributions would be good. Currently I've got the start of BlankNode, Graph, Literal, Node, NodeFactory, Statement and URIReference.

One of the annoying things is that the W3C specs for RDF don't talk about models anymore but graphs. It will be odd for a Model to implement Graph - maybe. Or there might be a lot of renaming to be done.

I've removed our own implementation from Kowari and started using JRDF instead. Still lots to do. Once I have Kowari done then it's Jena's turn.

Java Code

Abstract classes are not types "Have you ever seen code that declared a variable of type java.util.AbstractList? No? Why not? It's there, along with HashMap and TreeSet, etc. Because AbstractList is not a type. List is. AbstractList provides a convenient base from which to implement custom Lists. But I still have the option of implementing my own from scratch if I so choose. Maybe because I need some behaviour for my list such as lazy loading or dynamic expansion, etc. that wouldn't be satisfied by the default implementation."
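The point in the quote can be shown with a short sketch. `RangeList` and `ListTypeDemo` are hypothetical names used only for illustration. The custom list extends `AbstractList` as a convenient base, while every variable and parameter is declared as `List`.

```java
import java.util.AbstractList;
import java.util.List;

// A read-only list only needs get() and size(); AbstractList supplies the rest.
// Elements are computed on demand rather than stored.
final class RangeList extends AbstractList<Integer> {
    private final int start;
    private final int size;

    RangeList(int start, int size) {
        this.start = start;
        this.size = size;
    }

    @Override public Integer get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return start + index;
    }

    @Override public int size() {
        return size;
    }
}

class ListTypeDemo {
    // Callers depend on the List type, never on AbstractList.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) total += v;
        return total;
    }

    public static void main(String[] args) {
        List<Integer> numbers = new RangeList(1, 4); // declared as List, not AbstractList
        System.out.println(sum(numbers));            // 1 + 2 + 3 + 4 = 10
    }
}
```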

"cglib is a powerful, high performance and quality Code Generation Library, It is used to extend JAVA classes and implements interfaces at runtime."

Tuesday, December 02, 2003

Blank Nodes

In RDF there are nodes. Nodes are either resources (with or without URIs) or literals. Resources without URIs are blank nodes. These blank nodes can either be given a name (nodeID) or not. Statements are made of a subject (a resource), a predicate (a resource with a URI) and an object (anything). Simple enough.

Now I've been looking across three separate Java implementations. I found 8 classes designed to represent blank nodes (resources without URIs):
* Two in Kowari,
* BNode, BNodeImpl and BNodeNode in Sesame, and
* AResource, Node_Blank and RDFNode in Jena.

What is maddening is that they each (well, 7 of them) have a different way to get their name. What's even more maddening is that this is right. Getting their name isn't a part of the RDF model - the most that all of these different blank node implementations probably should have in common is equality and being the same type (just sharing a marker interface, like Serializable).
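A minimal sketch of that idea follows. The names are hypothetical and this is not the API of Kowari, Sesame, Jena or JRDF. The shared type is an empty marker interface, each implementation keeps its own notion of identity private, and client code relies only on the type and on equality.

```java
import java.util.HashSet;
import java.util.Set;

// Marker interface: deliberately no getName() or getId(),
// since naming is not part of the RDF model.
interface BlankNode { }

// One hypothetical implementation, keyed by an internal number that is never exposed.
final class NumberedBlankNode implements BlankNode {
    private final long internalId;

    NumberedBlankNode(long internalId) {
        this.internalId = internalId;
    }

    @Override public boolean equals(Object o) {
        return o instanceof NumberedBlankNode
            && ((NumberedBlankNode) o).internalId == internalId;
    }

    @Override public int hashCode() {
        return Long.hashCode(internalId);
    }
}

// Another hypothetical implementation that relies on object identity alone.
final class IdentityBlankNode implements BlankNode { }

class BlankNodeDemo {
    // Client code needs only the shared type and equality.
    static int countDistinct(BlankNode... nodes) {
        Set<BlankNode> distinct = new HashSet<>();
        for (BlankNode n : nodes) distinct.add(n);
        return distinct.size();
    }

    public static void main(String[] args) {
        BlankNode shared = new IdentityBlankNode();
        System.out.println(countDistinct(
            new NumberedBlankNode(1), new NumberedBlankNode(1), shared, shared)); // 2
    }
}
```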

Harpers.org brought to you by the Semantic Web

A New Website for Harper's Magazine "We cut up the Weekly Review into individual events (6000 of them, going back to the year 2000), and tagged them by date, using XML and a bit of programming. We did the same with the Harper's Index, except instead of events, we marked things up as “facts.”

Then we added links inside the events and facts to items in the taxonomy. Magic occurred: on the Satan page, for instance, is a list of all the events and facts related to Satan, sorted by time. Where do these facts come from? From the Weekly Review and the Index. On the opposite side, as you read the Weekly Review in its narrative form, all of the links in the site's content take you to timelines. Take a look at a recent Harper's Index and click around a bit—you'll see what I mean.

The best way to think about this is as a remix: the taxonomy is an automated remix of the narrative content on the site, except instead of chopping up a ballad to turn it into house music, we're turning narrative content into an annotated timeline. The content doesn't change, just the way it's presented."

"A small team of Java coders and I are planning to take the work done on Harper's, and in other places like Rhetorical Device, and create an open-sourced content management system based on RDF storage. This will allow much larger content bases (the current system will start to get gimpy at around 30 megs of XML content—fine for Harper's, but not for larger sites), and for different kinds of content to be merged."

I'll have to look at Samizdat, which "is a generic RDF-based engine for building collaboration and open publishing web sites." Seems to be the way things are heading.

Thursday, November 27, 2003

Simulated OS for teaching Assembly

Apoo is very similar to one of the uses of RCOSjava.

Jena2 Manager

"Briefly stated, I needed a means by which I could quickly hack models and ontologies to learn how to use Jena2. There remain many things to learn, and many things to finish coding in the program. I'm turning it loose so that others can contribute to its development. The JOSL license requires that those who fix things in the code or otherwise improve it return their code to the public. JOSL does not require that users use an open source license on new code that extends the licensed code."

Jena2 Manager

On Ontologies and Gnomes

The AI gnomes of Zurich "McDermott ends with a zinger:

It's annoying that Shirky indulges in the usual practice of blaming AI for every attempt by someone to tackle a very hard problem. The image, I suppose, is of AI gnomes huddled in Zurich plotting the next attempt to --- what? inflict hype on the world? AI tantalizes people all by itself; no gnomes are required. Researchers in the field try as hard as they can to work on narrow problems, with technical definitions. Reading papers by AI people can be a pretty boring experience. Nonetheless, journalists, military funding agencies, and recently the World-Wide Web Consortium, are routinely gripped by visions of what computers should be able to do with just a tiny advance beyond today's technology, and off we go again. Perhaps Mr. Shirky has a proposal for stopping such visions from sweeping through the population."

The entry links to a paper which lists the things that the Semantic Web "violates" with respect to traditional assumptions about AI, including lack of referential integrity, variety in quality, diversity and no single authority. As noted, human intelligence has the same problems.

Wednesday, November 26, 2003

webMethods going Semantic

Interview: webMethods CEO eyes Web services innovation "Secondly, there’s a whole other layer to deal with, what I call the semantic integration problem. Web services are great but they standardize pure connectivity between applications. The applications still have highly varied data models, extremely different ideas of what business processes should look like. Yet for most large organizations, a business process is going to span many applications. So you’re always going to need in the middleware stack something that can do wrapping, transformation, and, more than that, can actually keep the model of how the business processes are implemented across all of the infrastructure pieces. So [you need] something that’s technology-neutral underneath, like our Fabric product, and then on top have the ability to orchestrate business processes across all of these nodes in the fabric. Our customers now want to get real-time intelligence about what’s happening with the business and with the business processes, and they want to see it in dashboards, they want alerts. So we can put real-time monitoring around [IT infrastructure] at the business process level."

"We’re also able to offer enterprise event management, [injecting] business events into some kind of AI [artificial intelligence]-based rules engine."

Web Services in RDF

The question of how Web Services and the Semantic Web fit together came up again recently. Here are a few links to current work in the area:
* Semantic Web enabled Web Services,
* Government Semantic XML Web Services Community of Practice,
* BPEL2DAML-S,
* Esperanto,
* Meteor-S,
* Supercharging WSDL with RDF, and
* SWAD-Europe Thesaurus Activity.

Tuesday, November 25, 2003

Don't Panic

Neil Gaiman hitchhikes through Douglas Adams' hilarious galaxy

Only 3

Three Uses for the Semantic Web. They include:
* Sideline Semantics (or how to cut down on those darn post-planning columns),
* The Policy Ontology, and
* Cross Domain Searching for Calendar Concerns.

The Policy Ontology was perhaps the most interesting:
"I decided to tackle this with an interface to Jena for Apache Cocoon, or to use Cocoon parlance, a Jena-based transformer. I had no idea what kind of systems sat behind virtual reference applications, but I did know the protocol used underneath the queries was based on SOAP, and Cocoon excels at inserting itself in between any XML stream and adding value to the contents. So my approach was to use Jena's inference capabilities to map different classification schemes based on relationships defined in either RDF Schema or OWL. Yes, you could do the same thing with a table or two, and a thousand other ways, but the ontology approach provides a formal syntax for defining relationships."

Seems similar in idea to Sherpa Calendar.

The application is WIBS.

Fractally Yours

Openness & Interconnection "Big Fractal Tangle would be the name of a blog...A decade into the first Web, we've now got way too much information available, too much for any of us to sift through easily, which is why we need Round Two: the annotated, interconnected, Web. This new organic, evolving, maintainable, improvement will do more than simply increase the accuracy of our Google searches. It'll help real people understand and visualize interconnection, which in my opinion will alter our society profoundly for the better.

This was to be the point of my paper. Driving home that night, my brain frazzled and my voice hoarse from too much talk, I realized the topic was too big for a single paper. It'll have to be a blog."

See also: The Fractal nature of the Web.

Monday, November 24, 2003

CEUR Workshop Proceedings

CEUR Workshop Proceedings includes Semantic Integration, ICSW 2003 and Practical and Scalable Semantic Systems.

Sunday, November 23, 2003

Random RDF Tools

MusicBrainz Java API, RDFical (English version), MnM and FOAF Explorer - some of these I've covered before.

One Stop Schema Shop

SchemaWeb " SchemaWeb is a repository for RDF schemas expressed in the RDFS, OWL and DAML+OIL schema languages.
SchemaWeb is a place for developers and designers working with RDF. It provides a comprehensive directory of RDF schemas to be browsed and searched by human agents and also an extensive set of web services to be used by RDF agents and reasoning software applications that wish to obtain real-time schema information whilst processing RDF data. "

Who are you?

Gillmor Takes On Dvorak's Anti-Blog Stance ""Perseus thinks that most blogs have an audience of about 12 readers," Dvorak argues. Yes, John, but who are those 12? If one of them is Bill Gates, and another is Tony Scott, CTO of General Motors, and another is John Cleese, well you get the idea. Sometimes it's who you know as much as what. RSS only amplifies this, allowing a Ray Ozzie to post only when it's valuable to him and his readers. It's "You've got blog.""

Bill, Tony, John, give me some feedback then.

Saturday, November 22, 2003

Exceptional Exceptions

Best Practices for Exception Handling "When deciding on checked exceptions vs. unchecked exceptions, ask yourself, "What action can the client code take when the exception occurs?"

If the client can take some alternate action to recover from the exception, make it a checked exception. If the client cannot do anything useful, then make the exception unchecked. By useful, I mean taking steps to recover from the exception and not just logging the exception."

"Preserve encapsulation.

Never let implementation-specific checked exceptions escalate to the higher layers. For example, do not propagate SQLException from data access code to the business objects layer."

The only one I slightly disagree with is:
"Log exceptions just once

Logging the same exception stack trace more than once can confuse the programmer examining the stack trace about the original source of exception. So just log it once. "

I think I'm right in that there have been cases where logging at different levels of abstraction has required logging the same base exception more than once. I've also found it helpful rather than harmful.
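The "preserve encapsulation" advice quoted above can be sketched as follows. All class names here are hypothetical and the `query` method merely simulates JDBC work. A failure the client can recover from stays a checked exception, while the implementation-specific `SQLException` is wrapped in an unchecked exception before it leaves the data access layer, with the original kept as the cause.

```java
import java.sql.SQLException;

// Checked: the caller can take an alternate action (e.g. offer to create the customer).
class CustomerNotFoundException extends Exception {
    CustomerNotFoundException(String message) { super(message); }
}

// Unchecked: the caller cannot do anything useful about a broken database.
class DataAccessException extends RuntimeException {
    DataAccessException(String message, Throwable cause) { super(message, cause); }
}

// The interface seen by the business layer mentions no JDBC types at all.
interface CustomerDao {
    String findName(int id) throws CustomerNotFoundException;
}

class JdbcCustomerDao implements CustomerDao {
    @Override public String findName(int id) throws CustomerNotFoundException {
        try {
            return query(id);
        } catch (SQLException e) {
            // Keep the cause so that any layer that logs still sees the original failure.
            throw new DataAccessException("could not load customer " + id, e);
        }
    }

    // Stand-in for real JDBC code: negative ids simulate a database failure,
    // id 0 simulates a missing row.
    private String query(int id) throws SQLException, CustomerNotFoundException {
        if (id < 0) throw new SQLException("simulated connection failure");
        if (id == 0) throw new CustomerNotFoundException("no customer with id " + id);
        return "customer-" + id;
    }
}
```

Since the wrapped cause travels with the exception, the data access layer and the business layer can each log it at their own level of abstraction.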

Ontologies

"OntoBuilder started as a tool (with a user interface) developed in Java. Later, one of the requirements was to implement OntoBuilder as an agent, removing any user interaction required. Therefore, OntoBuilder was implemented as a TCP agent. OntoBuilder can be accessed as a graphical tool, as a command line tool, as an applet, as a java WebStart application, as a TCP server, and using an HTML interface."

Includes ontologies in domains including: car rental, job finding, news, and others. Shouldn't forget about Metalog either.

Also, a new book called Ontological Engineering has just been published (well November 1) just in time for Christmas, the perfect stocking filler.

Pluggable Data Types

Curiosity Killed the Cat "This argument cements my suspicions that the using RDF and Semantic Web technologies are a losing proposition when compared to using XML-centric technologies for information interchange on the World Wide Web. It is quite telling that none of the participants who tried to counter my arguments gave a cogent response besides "use an xsd library" when in fact anyone with a passing knowledge of XSD would inform them that XSD only supports ISO 8601 dates and would barf on RFC 822 if asked to treat them as dates. In fact, this is a common complaint about them from our customers w.r.t internationalization [that and the fact decimals use a period as a delimiter instead of a comma for fractional digits]. "

One of the things on the Kowari Roadmap (I wish I could point to the real one) is pluggable data type handling. The basic idea is that you can describe the data handler and have data type processors register themselves with Kowari, much like RELAX NG's pluggable datatype libraries.
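A rough sketch of what pluggable data type handling could look like. This is not Kowari's design; `DatatypeHandler`, `DatatypeRegistry` and the `http://example.org/datatypes#rfc822Date` URI are hypothetical. Handlers register themselves under a datatype URI, and the store asks the registry to turn a literal's lexical form into a value. The date handler uses Java's built-in RFC 1123 formatter, which covers the common four-digit-year form of RFC 822 dates.

```java
import java.math.BigInteger;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;

// Hypothetical plug-in contract: one handler per datatype URI.
interface DatatypeHandler {
    String datatypeUri();
    Object parse(String lexicalForm);
}

class DatatypeRegistry {
    private final Map<String, DatatypeHandler> handlers = new HashMap<>();

    void register(DatatypeHandler handler) {
        handlers.put(handler.datatypeUri(), handler);
    }

    // Unknown datatypes are left as their raw lexical form.
    Object parse(String datatypeUri, String lexicalForm) {
        DatatypeHandler handler = handlers.get(datatypeUri);
        return handler == null ? lexicalForm : handler.parse(lexicalForm);
    }
}

class IntegerHandler implements DatatypeHandler {
    @Override public String datatypeUri() {
        return "http://www.w3.org/2001/XMLSchema#integer";
    }

    @Override public Object parse(String lexicalForm) {
        return new BigInteger(lexicalForm); // xsd:integer has no fixed size
    }
}

class Rfc822DateHandler implements DatatypeHandler {
    @Override public String datatypeUri() {
        return "http://example.org/datatypes#rfc822Date"; // made-up URI for this sketch
    }

    @Override public Object parse(String lexicalForm) {
        return ZonedDateTime.parse(lexicalForm, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant();
    }
}
```

With this shape, supporting a new date format means registering one more handler rather than changing the store.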

SARS, SOTA and the Semantic Web

October's issue of Computer had two interesting articles on the Semantic Web:
Fighting Epidemics in the Information and Knowledge Age "We have simulated the spread of SARS and shown that isolation control measures had no significant effect on containing the epidemic's outbreaks...Information and knowledge sharing profoundly influenced the extent and duration of the SARS epidemic. At first, lack of information and knowledge sharing hampered China's efforts to research the virus and control the epidemic. SARS appeared initially in Guangdong province, but during the outbreak's early stages, the obtained experience with controlling and curing the disease was not available to health workers in other affected regions...Currently, however, the Web cannot guarantee the accuracy and reliability of the data it holds. To overcome these limitations, scientists are exploring ways to reshape the Web. The Semantic Web and the Grid represent just two of these efforts."

I'm not sure why I liked this article so much. Is it the old man vs microbe battle? Is it the positive reaction and can-do attitude about these problems?

Ontology-Mediated Integration of Intranet Web Services "Dealing with this flood of options will require sweeping automation. To meet this challenge, the authors built their smart office task automation framework—SOTA—using Web services, an ontology, and agent components."

Updates

Notes from SWAD-E "Building a triple store based on non-relational technology was represented by several participants such as those using BDB (not in Jena2 at present) and more sophisticated indexing such as @semantics. These have the advantages of smaller system dependencies than RDBs (slightly different SQLs, optimising needs, fetures) but are more "bare metal". The indexing is done using hashes (content digests) or using triple identiifers. A brief discussion showed that there were a variety of content digests used across relational and non-relational, MD5, SHA1 and using the top/lower 32/64 bits. As disks are still much slower than processors, there is little difference on modern systems (but MD5 is seen as more common). The non-relational triplestores tend to have better intimate knowledge and use of the RDF details such as schema information but still need to have query optimising, text searching and so on added by hand rather than reusing relational work."

libferris Release 1.1.12 "New fnamespace for setting XML like namespaces to refer to EA, added support for mounting RDF/bdb and RDF/XML files with list, rename, remove, create support, can now save EA in a user's local RDF/bdb file, new as-rdf EA to export all EA for a file as RDF/XML, new myrdf:// URL for personal RDF storage, can now handle directory names that are URIs properly, isCompressedContext() no longer tries to read a context to find out if its compressed or not, commented g_io_channel_set_encoding() because it was causing errors setting encoding on a fifo to null." Note of libferris.

Semantic Web the comic Less said the better.

Saturday, November 15, 2003

Java Goodies

Stocking up on the Internet before I spend a week on holidays:
JmDNS is a Java implementation of multi-cast DNS and can be used for service registration and discovery in local area networks. Part of JTunes.
JXTA 2: A high-performance, massively scalable P2P network "JXTA 2 introduces the concept of a rendezvous super-peer network, greatly improving scalability by reducing propagation traffic...Implementation of the shared resource distributed index (SRDI) within the rendezvous super-peer network, creating a loosely consistent, fault-resilient, distributed hash table".

Both are ripe for the impending release of the Kowari project on SF.

Friday, November 14, 2003

Authors, Librarians and More Metadata

It's been nearly a week since Clay's posting. It was with some weariness that I began reading this response, but it was well worth it. Paul Ford has been a long-time supporter of the Semantic Web.

A Response to Clay Shirky's “The Semantic Web, Syllogism, and Worldview” "But logical reasoning does work well in the real world—it's just not identified as such, because it often appears in mundane places, like library card catalogs and book indices, and because we've been trained to automatically deduce certain assumptions from signifiers which do not much represent the (S,P,O) form."

"I am a writer by avocation and trade, and I am finding real pleasure in using Semantic Web technologies to mark up my ideas, creating pages that link together. What I do is not math done with words. It's links done with semantics, and it forces me to think in new ways about the things I'm writing."

"For every quote he presents that shows the Semantic Web community as glassy-eyed, out-of-touch individuals suffering from “cluelessness,” I could give a list of many other individuals doing work that is relevant to real-world issues, who have pinned their successful careers on the concepts of the Semantic Web, sometimes because they feel it is going to be the next big thing, but also because of sheer intellectual excitement...My money's on them. They know what they're talking about, and aren't afraid to admit what they don't know."

The best is definitely last:
"Postscript: on December 1, on this site, I'll describe a site I've built for a major national magazine of literature, politics, and culture. The site is built entirely on a primitive, but useful, Semantic Web framework, and I'll explain why using this framework was in the best interests of both the magazine and the readers, and how its code base allows it to re-use content in hundreds of interesting ways."

The Semantic Web looks to become a little bigger and a little better.

Wednesday, November 12, 2003

Fabl

Describing Computation within RDF "A programming language is described which is built within RDF. Its code, functions, and classes are formalized as RDF resources. Programs may be written using standard RDF syntax, or in a conventional JavaScript–based syntax which is translated to RDF."

Somehow I missed this being announced on the rdf-interest list. Interestingly, he's used DAML and not OWL; I can't see any obvious reason for that.

Metadata and Librarians

the semantic web, metacrap and libraries "Well, I may be biased, but a lot of it probably has to do with who exactly is creating the metadata. Librarians are probably as third-party and objective as you're going to get, when it comes to analyzing resources. Doctorow's concerns of overt misrepresentation for personal gain are lessened when the party creating the metadata really only has a stake in its correct representation. Of course, librarians do make judgement calls: they're unlikely to create metadata for just any resource and do apply some rules (certain subject specificities, etc.) that may not actually provide the most useful metadata."

"I do agree that a global ontology is probably hopeless, but I do think that applying an assortment of ontologies where necessary (or possible) might go a ways towards creating data that can be globally related."

"But anyway, even if a global ontology, whether made up of related ontologies or not, can't happen, or doesn't happen, it's not like its subsets don't remain valuable."

Tuesday, November 11, 2003

30TB

Survey: Biggest Databases Approach 30 Terabytes "For its Top Ten Program, Winter Corp. gathers voluntary submissions from companies worldwide that are running large databases. The program requires that the databases must be in production and contain at least 1 terabyte of data (or 500 megabytes of data if running on Windows). The results, divided into 24 categories, are based on the amount of online data running on the database."

"The largest decision-support database in this year's survey is from France Telecom and handles 29.2 terabytes of data, triple the size of the top database in that category in Winter's last survey in 2001."

I wonder what the rules are regarding distributed databases. Could you submit the Semantic Web as a runner in this competition?

Sick of Carping on about the Semantic Web

LOOKS FISHY TO ME "Leave it to the fashion world to make bikinis out of leather. But these sexy little swimsuits are cut from a material that's more apropos for water wear than you might think: tanned, dyed salmon skin. Soft, smooth and lightweight, this particular form of "sea leather" has a natural elasticity that won't sag after a dip."

Now that's making asturgeons. That's the kind of entailment most guys would relate to. Fish Eye for the Straight Guy. Ergh. I haven't even watched that show. Alright, back to work.

Semantic Web for Workflow

Why an ontology for workflow? "The purpose is clear and simple - workflow orchestration at any level requires both the Pull and Push model...the Push model happens when a workflow has to report to its superior workflow - a natural strategy in many current solutions - this is easy when both the workflows are implemented using the same platform. Things get botchy when they are implemented differently - and thats where the ontology comes in. Now this person also asked my why an ontology and not a simple xml - and my answer was the requirements forced me to. One of the requirements was to ask a given instance to show me all 'user entry form' kind of activities which were completed recently. Now if you look at the last sentence carefully you see I use the clause 'kind of' - a good ontology fit !."
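The "kind of" requirement in that quote is essentially a subclass query. A minimal sketch, assuming a made-up class hierarchy and activity list (none of these names come from the workflow ontology being described, and the "recently" part is left out):

```python
# Hypothetical class hierarchy: child class -> parent class.
SUBCLASS_OF = {
    "UserEntryForm": "Activity",
    "ExpenseClaimForm": "UserEntryForm",
    "LeaveRequestForm": "UserEntryForm",
    "EmailNotification": "Activity",
}

# Hypothetical workflow instance: (activity id, class, status).
ACTIVITIES = [
    ("a1", "ExpenseClaimForm", "completed"),
    ("a2", "EmailNotification", "completed"),
    ("a3", "LeaveRequestForm", "running"),
    ("a4", "UserEntryForm", "completed"),
]


def is_kind_of(cls, ancestor):
    """Walk up the hierarchy until the ancestor is found or the chain ends."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def completed_of_kind(kind):
    """Ids of completed activities whose class is the kind or a subclass of it."""
    return [
        name
        for name, cls, status in ACTIVITIES
        if status == "completed" and is_kind_of(cls, kind)
    ]
```

With plain XML you would have to list every form type in the query; here asking for `"UserEntryForm"` picks up the expense claim as well, because the hierarchy says it is one.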

A Little More Positive

Shirky on SemWeb, a Case Study "What bothers me about Shirky's essay is that the building blocks of the Semantic Web are useful even if we never achieve nirvana, whereas Shirky's critique makes it seem like they are an academic exercise of no utility to anyone now or in the future."

XML vs. RDF :: N × M vs. N + M "The RDF model along with the logic and equivalency languages, like OWL (nee DAML+OIL), altogether referred to as "the Semantic Web", is the current W3C effort to address that problem. Factoring those equivalencies into a common layer allows application developers to work with an already-integrated model, and the libraries to do the work of mapping each schema to the integrated model using a shared schema definition: N + M"

shirky touches off a storm of semantic web posts "My inner librarian has a response brewing, as well, but it will have to wait a bit. It’s the last week of the quarter, and I’ve got exams and project to give and grade. Next week is blogging and reading catch-up time."

Clay Shirky Predicts the Demise of The Semantic Web "However, his arguments "syllogisms are not very useful" and "we describe the world in generalities" are extremely bitter pills to accept. (Darn! I haven't thought about these two, he hits a homer!) However, he is correct, we should be digging deeper into probability networks rather than logic inferencing engines. Semantic technology still has its place in the world (maybe the enterprise though), however I agree with him in that it isn't going to scale for the web."

Monday, November 10, 2003

Business Web

Metadata, Semantics and All That "I’m closer to the Semantic Web project than most, and remain significantly unconvinced, but I don’t think dismissing it is as easy as shooting fish in a barrel, and anyhow shooting fish in a barrel is unsportsmanlike and generally sucks...It’s in the Metadata Right at the moment, a lot of the Semantic Web theory is doomed to remain just that—theory—because it relies on the existence of a critical mass of metadata, and if we’ve learned one thing in recent decades it’s that there is no such thing as cheap metadata. I’m pretty convinced that if you could build up a lot more metadata you could make the Web a more useful place, and I’ve thought a whole lot about this problem over the years, and really haven’t made much progress...However, if all of a sudden there were a million machine-readable business facts there for anyone to read, I think that quite a few software-savvy and accounting-savvy entrepreneurs would retreat into their garages and there would be some considerable surprises in store."

Web Rules, Okay?

DRAFT: Semantic Web Rules Working Group Charter "The W3C Team, with input from the Semantic Web Coordination Group, is presently involved in drafting a (Member Confidential) proposal to its Membership for a "Semantic Web Phase 2 Activity"...The group is chartered to develop a practical and useful rules language for the Semantic Web, along with a corresponding language for expressing justifications. The rules language will allow the expression of the knowledge that when certain things are true, certain other things must also be true; the justification language will allow the expression of the knowledge of how certain rules were used to reach a conclusion."

Dan Brickley had some interesting comments:
"What you might end up with here is an RDF *description* of a query-related data structure. Maybe handy for testcase-style interop, but pretty ugly to read and think about. I believe DAML Query works this way. My understanding of XQuery btw is that they started out with an XML syntax but now mostly focus on the non-XML syntax, since it is vastly more usable. My hunch is that RDF Query might go the same way...

Closest you can get and still be pretty is a kind of query-by-example, with bNodes for variables, perhaps decorated with variable names in a well-known namespace. Such RDF/XML would never be taken assertionally but used to ask questions. I think Edutella have something in this vein. Sorry I'm in a rush or I'd do the googling for links. Also this approach doesn't allow blanks for property names, since RDF/XML doesn't allow that."
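Query-by-example is easy to sketch outside of RDF/XML. The following assumes triples are plain tuples and that any term starting with "?" is a variable standing in for a bNode; both conventions, and the sample data, are mine and not from DAML Query or Edutella. Since this is not RDF/XML, nothing here stops a variable appearing in the property position.

```python
def is_variable(term):
    # Assumed convention: variables are strings that start with "?".
    return isinstance(term, str) and term.startswith("?")


def match(pattern, triples):
    """Return one dict of variable bindings per triple that fits the pattern."""
    results = []
    for triple in triples:
        bindings = {}
        for p_term, t_term in zip(pattern, triple):
            if is_variable(p_term):
                # A variable used twice must bind to the same value both times.
                if bindings.setdefault(p_term, t_term) != t_term:
                    break
            elif p_term != t_term:
                break
        else:
            results.append(bindings)
    return results


# Made-up sample data.
TRIPLES = [
    ("ex:andrew", "foaf:knows", "ex:dan"),
    ("ex:andrew", "foaf:name", "Andrew"),
    ("ex:dan", "foaf:name", "Dan"),
]

names = match(("?who", "foaf:name", "?name"), TRIPLES)
```

The example triple `("?who", "foaf:name", "?name")` is never asserted; it is only used to ask the question.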

Review Vocabulary

RDF Review Vocabulary "Review and rate blogs, CDs, books, software, whatever. Suitable for inclusion in any RDF-based language : FOAF, RSS etc."

Sunday, November 09, 2003

Wallop Wallop

Will Microsoft Wallop Friendster? "In fact, Wallop is Microsoft's venture into the red-hot social-networking arena, using the common Microsoft tack of piecing together existing technologies and packaging them for the novice user. Those technologies include Friendster-style social-networking capabilities, super-simplistic blogging tools, moblogging, wikis and RSS feeds, all based on Microsoft's Instant Messenger functionality."

Reiterate, Reiterate, Reiterate

Shirky's Men of Straw and Comments: Shirky misses both say pretty much the same things I did.

Shirky on the Semantic Web:

"Shirky nonetheless bases his argument on a central fallacy: the Semantic Web as monolith, as a "thing" to be supported or opposed."

Deconstructing The Syllogistic Shirky:

"There never was a suggestion that all metadata work cease and desist as we sit down on some mountaintop somewhere and fully derive the model before allowing the world to proceed."

Totality:

"Sure, there is certainly a segment of the semantic web community who think we'll be able to do the "strong semantic web", which can somehow make inferences without much human work in the metadata space, but the bulk of the folks I've talked to about it are well aware of the difficulty of that kind of problem - and they are much more focused on, as Ken Macleod puts it, the "relational model of RDF and its ability to integrate decentralized data models"."

Most of these can be linked from either Syllogism or Semantic Examinations.

Other links (most positive towards the article): Clay Shirky smacks syllogism around. (Metafilter), Semantic web systemantics, Clay Cements the Semantic, Shirky: The Semantic Web, Syllogism and Worldview, Shirky on SemWeb and Pass It Around.

Saturday, November 08, 2003

Why, Why, Why?

"Despite their appealing simplicity, syllogisms don't work well in the real world, because most of the data we use is not amenable to such effortless recombination. As a result, the Semantic Web will not be very useful either...This sentiment is attractive precisely because it describes a world simpler than our own. In the real world, we are usually operating with partial, inconclusive or context-sensitive information."

Hmm, relational databases are used every day and they make assertions - even incorrect ones. When you create a row in a table you're saying "Andrew lives at 12 Mulberry Lane" or "Andrew has $123 in his account". Relational databases have the same problems with respect to partial, inconclusive or context-sensitive information.

RDF by itself has no enforced schema, so you could quite easily store data that you would not be able to put in a relational database. However, by adding schemas you can then start querying for, or preventing the storage of, data that does not fit your schema. It's a lot more flexible in that manner.
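To make that concrete, here is a small sketch that treats a row as a set of assertions and applies a schema only afterwards. The property names and the dictionary-based "schema" are illustrative assumptions, not RDF Schema itself.

```python
def row_to_triples(subject, row):
    """Each column of a row becomes one (subject, property, value) assertion."""
    return [(subject, column, value) for column, value in row.items()]


store = []
store += row_to_triples("ex:andrew", {"livesAt": "12 Mulberry Lane", "balance": 123})
# No enforced schema: an assertion with an unforeseen property is stored as-is.
store.append(("ex:andrew", "favouriteFish", "salmon"))

# Hypothetical schema added later: property -> expected Python type.
SCHEMA = {"livesAt": str, "balance": int}


def violations(triples, schema):
    """Triples whose property is unknown to the schema or whose value has the wrong type."""
    return [
        t for t in triples
        if t[1] not in schema or not isinstance(t[2], schema[t[1]])
    ]
```

The same `violations` check could be run as a query over existing data or used as a gate before storing, which is the flexibility a fixed table definition does not give you.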

"Each of those statements is true, in other words, but each is true in a different way. It is tempting to note that the second statement is a generalization that can only be understood in context, but that way madness lies. Any requirement that a given statement be cross-checked against a library of context-giving statements, which would have still further context, would doom the system to death by scale."

To quote Tim Berners-Lee in "Weaving the Web":
"Databases are continually produced by different groups and companies, without knowledge of each other. Rarely does anyone stop the process to try to define globally consistent terms for each of the columns in the database tables...If HTML and the Web made all online documents look like one huge book, RDF, schema and inference languages will make all the data in the world look like one huge database"

"If a reasoning engine had pulled in all the data and figured the taxes, I could have asked it why it did what it did, and corrected the source of the problem.

Being able to ask 'Why?' is important. It allows the user to trace back to the assumptions that were made, and the rules and data used. Reasoning engines will allow us to manipulate, figure, find and provide logical and numeric things over a wide-open field of applications."
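Asking "Why?" only needs the engine to remember which rule and which premises produced each conclusion. A toy sketch under that assumption; the facts, rule names and the tax figures are invented, and a real reasoning engine would work over structured statements rather than strings.

```python
def forward_chain(facts, rules):
    """Apply rules until nothing new is derived, recording how each fact arose.

    rules is a list of (rule name, premises, conclusion) tuples.
    Returns a dict: fact -> (rule name, premises used).
    """
    justification = {fact: ("given", ()) for fact in facts}
    changed = True
    while changed:
        changed = False
        for name, premises, conclusion in rules:
            if conclusion not in justification and all(
                p in justification for p in premises
            ):
                justification[conclusion] = (name, tuple(premises))
                changed = True
    return justification


def why(fact, justification, depth=0):
    """Trace a conclusion back to the rules and given facts behind it."""
    rule, premises = justification[fact]
    lines = ["  " * depth + f"{fact} [{rule}]"]
    for premise in premises:
        lines.extend(why(premise, justification, depth + 1))
    return lines


# Invented example data.
FACTS = ["income is 50000", "resident"]
RULES = [
    ("rate-rule", ["income is 50000", "resident"], "tax rate is 30%"),
    ("tax-rule", ["tax rate is 30%"], "tax due is 15000"),
]

trace = why("tax due is 15000", forward_chain(FACTS, RULES))
```

The resulting trace lists the conclusion first, then the rule that produced it, then the given facts underneath, which is exactly the path you would follow to find and correct a wrong assumption.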

"Is your "Person Name = John Smith" the same person as my "Name = John Q. Smith"? Who knows? Not the Semantic Web. The processor could "think" about this til the silicon smokes without arriving at an answer."

Defining equality is exactly what Semantic Web technology allows you to do - by using things like OWL. You can say "Person Name = Name" if Person Name's First Name is equal to Name's Christian Name and Person Name's Last Name is equal to Name's Surname. In fact, defining equality also includes different languages and differing literals (like II, 10, 2, "two", etc.).
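As a sketch of the idea only (this is plain Python, not OWL, and the property names and mapping tables are assumptions made for the example): declare which properties and which literal forms are equivalent, normalise, and then compare.

```python
# Assumed equivalences between two vocabularies: their property -> our property.
EQUIVALENT_PROPERTY = {"christianName": "firstName", "surname": "lastName"}

# Assumed table of different literal forms for the same value.
SAME_LITERAL = {"II": 2, "two": 2, 2: 2}


def normalise(record):
    """Rewrite a record so that equivalent properties share one name."""
    return {EQUIVALENT_PROPERTY.get(key, key): value for key, value in record.items()}


def same_name(a, b):
    """Two name records match if first and last names agree after normalising."""
    a, b = normalise(a), normalise(b)
    return (a["firstName"], a["lastName"]) == (b["firstName"], b["lastName"])


def same_literal(x, y):
    """Two literals match if they map to the same canonical value."""
    return SAME_LITERAL.get(x, x) == SAME_LITERAL.get(y, y)


person_name = {"firstName": "John", "lastName": "Smith"}
name = {"christianName": "John", "middleInitial": "Q", "surname": "Smith"}
```

Under the rule stated above, `same_name(person_name, name)` holds even though one record carries a middle initial. Whether that rule is the right one is a modelling decision, but it is one you get to state rather than something the processor has to guess.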

Again, we get Cory with the Metacrap and Mark with his tag soup.

The sections "Worldviews Differ For Good Reasons" and "Worse is Better" read very much like the sections in "Weaving the Web" about the Semantic Web and using databases, except of course Berners-Lee says the Semantic Web takes these things into consideration.

Clearly, Clay has missed the point of RDF and Semantic Web technologies. Maybe people should stop concentrating so much on the "Semantic" part of the name and more on the "Web" part. Saying that you can't use the Semantic Web technologies "a bit at a time, out of self-interest and without regard for global ontology" is a lot like saying you can't use TCP/IP, HTML, or HTTP for the same reasons.

Most of these arguments are just common Semantic Web misunderstandings. Maybe the Tuple or Relational Web would've been a better name.

The Semantic Web, Syllogism, and Worldview

Nokia Knowledge Navigator

I can't help being impressed by the Nokia 7700, well the Flash demos anyway. Comes with handwriting recognition, PDA functionality and, of course, it is a phone. CNN Money has a review. Also, Gartner released a report saying that Symbian will lose smartphone battle.

Friday, November 07, 2003

The Blue Matrix

Haven't been overly fussed with the Matrix series but here is an interesting read about the blue matrix ([1],[2]).

Storing RDF

Workshop on Semantic Web Storage and Retrieval - Position Papers.

The Indexing and retrieving Semantic Web resources: the RDFStore model from ASemantics:

"The number of 'sub queries' needed to satisfy a single RDF query is often several orders of magnitude larger than commonly seen in RDBMS applications. Also, very often to save space DBAs design tables using significant number of indirect references, deferring to the application, or a stored procedure layer, for expanding the operation into large numbers of additional join operations just to store, retrieve or delete a single atomic statement from the database and maintaining consistency."

"Such indexes map the RDF nodes, contexts and free-text words contained into the literals to statements. There are several advantages to this approach. First, the use of a hybrid run-length and variable-length encoding to compress the indexes makes the resulting data store much more compact. Second, the use of bitmaps and Boolean operations allows matching arbitrary complicated queries with conjunction, disjunction and free-text words without using backtracking and recursion techniques. Third, this technique gives fine-grained control over the actual database content."
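The bitmap part of that description can be sketched with Python integers as bitsets. This is an illustration of the general technique under my own assumptions, not RDFStore's implementation: there is no compression and no free-text indexing here, and the class and sample data are made up.

```python
class BitmapIndex:
    """One bitmap per term; bit i is set when statement i mentions the term."""

    def __init__(self):
        self.statements = []
        self.bitmaps = {}

    def add(self, statement):
        position = len(self.statements)
        self.statements.append(statement)
        for term in statement:
            self.bitmaps[term] = self.bitmaps.get(term, 0) | (1 << position)

    def bits(self, term):
        # Unknown terms match no statements.
        return self.bitmaps.get(term, 0)

    def lookup(self, bitmap):
        """Turn a result bitmap back into the statements it selects."""
        return [
            statement
            for i, statement in enumerate(self.statements)
            if (bitmap >> i) & 1
        ]


index = BitmapIndex()
index.add(("ex:andrew", "foaf:name", "Andrew"))
index.add(("ex:andrew", "foaf:knows", "ex:dan"))
index.add(("ex:dan", "foaf:name", "Dan"))

# Conjunction is a bitwise AND, disjunction a bitwise OR; no backtracking needed.
named_dan = index.lookup(index.bits("foaf:name") & index.bits("ex:dan"))
either = index.lookup(index.bits("foaf:knows") | index.bits("Dan"))
```

Arbitrarily nested AND/OR queries reduce to one integer expression, which is where the "without using backtracking and recursion" claim comes from.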

I'm fairly sure that using hashing for databases is the wrong approach. Using something that guarantees unique identifiers, like node pools and string pools, seems much more sensible:
"For efficient storage and retrieval of statements and their components we assume there exist some hash functions which generates a unique CRC64 integer number for a given MD5 or SHA-1 cryptographic digest representation of statements and nodes of the graph."
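To show why I'm wary, here is a sketch comparing the two approaches. The digest function below takes the top bits of a SHA-1 digest, which stands in for the CRC64 scheme in the quote rather than reproducing it, and the `NodePool` class is a toy of my own, not Kowari's node pool.

```python
import hashlib


def digest_id(node, bits=64):
    """Identifier taken from the top `bits` bits of the node's SHA-1 digest."""
    digest = hashlib.sha1(node.encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")


class NodePool:
    """Hands out sequential identifiers, so two nodes can never share one."""

    def __init__(self):
        self._to_id = {}
        self._to_node = []

    def localize(self, node):
        # Resource -> numeric value; a new node gets the next free number.
        if node not in self._to_id:
            self._to_id[node] = len(self._to_node)
            self._to_node.append(node)
        return self._to_id[node]

    def globalize(self, node_id):
        # Numeric value -> resource.
        return self._to_node[node_id]


nodes = [f"ex:node{i}" for i in range(300)]

# With only 8 bits there are 256 possible ids, so 300 nodes must collide.
hashed = {digest_id(node, bits=8) for node in nodes}

pool = NodePool()
pooled = {pool.localize(node) for node in nodes}
```

At 64 bits a collision is very unlikely rather than impossible, and a digest cannot be reversed without a lookup table anyway. The pool costs a lookup on the way in, but uniqueness is guaranteed by construction and turning an identifier back into its node is a simple index.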

Also of interest: Prolog-based RDF storage and retrieval (optimising a store for the rdfs:subPropertyOf relation) and the Jena position paper.

Thursday, November 06, 2003

Sabre use MySQL and Linux

Sabre Holdings Air Shopping Products Leverage MySQL Database "Sabre Holdings has migrated its Air Shopping products from a mainframe platform to a combination of HP NonStop servers and a cluster of 45 HP rx5670 servers with MySQL running on the 64-bit Intel Itanium 2 processor with the Linux operating system. Over 100 MySQL database tables are replicated on a 7 X 24 basis through GoldenGate's data synchronization software. The total amount of data held in the clustered MySQL databases is approximately 75 gigabytes.

"We benchmarked our application on several databases, including open source, commercial and a specialized in-memory relational database, and MySQL was the best performing database,” said Alan Walker, vice president of Sabre Labs."

via They use MySQL? Yes. I told you so...

Wednesday, November 05, 2003

Why Longhorn will be Slow (apparently)

The Emperor's New Code "It is interesting that in all the demos and discussions at the PDC, nobody worried about performance. I have to believe XAML imposes substantial overhead on the GUI (look at what XUL did to Mozilla). And vector graphics? Hopefully it can all be pawned off on GPUs and everything will work okay. WinFS is going to be a big time resource hog. I'm guessing it is painfully slow now and that there’s a bunch of people working hard trying desperately to make it fast enough (not to be confused with fast, period). Indigo isn't far enough along for performance to be assessed, but because SOA is simpler than object proxying Indigo has a great chance to be faster than COM+ or DCOM or .NET remoting (none of which were fast enough to be useful in “real” applications). Let's hope the security wrappers don't kill the basic speed of ASMX; in the real world people still use sockets with no security whatsoever, because they're fast."

Performance, Microsoft? As the author states, Microsoft usually gets performance right, except maybe on the Mac.

IBM Releases SW Technology - Snobase

"A new systems management technology known as SNOBASE (Semantic Network Ontology Base) or the Ontology Management System has been released by IBM alphaWorks. The Java-based application provides a "framework for loading ontologies from files and via the Internet and for locally creating, modifying, querying, and storing ontologies. It provides a mechanism for querying ontologies and an easy-to-use programming interface for interacting with vocabularies of standard ontology specification languages such as RDF, RDF Schema, DAML+OIL, and W3C OWL. Internally, the SNOBASE system uses an inference engine, an ontology persistent store, an ontology directory, and ontology source connectors. Applications can query against the created ontology models and the inference engine deduces the answers and returns results sets similar to JDBC (Java Data Base Connectivity) result sets."

""The Query Optimizer allows applications to query large knowledge bases, whose entire set cannot be loaded into the working memory, by querying the ontology source for appropriate pieces as they are needed. In addition, the task of the Query Optimizer is to not only optimize the retrieval of information from ontology sources, but also coordinate queries that span multiple sources. This component is still under construction, and will be added to future editions of IBM Ontology Management System..."

"The Ontology Source Connectors provide a mechanism for reading and writing ontology information to persistent storage. The simplest connector is the file connector that is used to persist ontologies to the local file system. In addition, there will be connectors for storing ontological information in remote servers. Also, the connectors are used to implement caching of remote information to improve performance and reliability...""

Their query language is based on DQL (DAML Query Language).

http://www.alphaworks.ibm.com/tech/snobase

Yet Another Learning Environment

YALE "63 different operators for training, validation, feature selection and generation, and preprocessing plus 50 classifiers, clusterers and association learners available from the Weka package" Written in Java and comes with a Swing UI.

Monday, November 03, 2003

R is for RDF

RxPath is a language for querying a RDF model. It is syntactically identical to XPath 1.0 and behaves very similarly.

RxSLT is a language for transforming RDF to XML.

RxUpdate is a language for updating an RDF model.

RxML is an alternative XML serialization for RDF that is designed for easy and simple authoring, particularly in conjunction with RhizML.

Racoon is a simple application server that uses an RDF model for its data store, roughly analogous to RDF as Apache Cocoon is to XML.

Rhizome is a simple content management and delivery system that is similar to a Wiki except that you can author arbitrary XML and RDF metadata and the structure of the website is stored as RDF. This allows both the content and structure to be easily repurposed and complex web application rapidly developed.

Built on top of 4suite.

Sunday, November 02, 2003

Similar but not standard

I was hoping that Microsoft would implement Avalon and the like in Longhorn using standards; alas, it's not to be. To quote the quoted:
"I think the bottom-line of XAML is that it is equally useful for creating both desktop applications, web pages, and printable documents. This means that Microsoft may be attempting to simultaneously obsolete HTML, CSS, DOM, XUL, SVG, SMIL, Flash, PDF. At this point, the SDK documentation is too incomplete to firmly judge how well XAML compares with these formats, but I hope this lights a fire under the collective butt of the W3C, Macromedia, and Adobe. 2006 is going to be a fun year."

Saturday, November 01, 2003

Feedback on Sherpa

Now that's Semantic Web(?) "Having said this, though, some of what I read leads me to think this isn't as open as I thought at first glance. First, if I read this correctly, the Sherpa calendar information is centralized on the Sherpa servers. I'm assuming by this, again with just a first glance, that Semaview is providing the P2P cloud through which all of the clients interact in a manner extremely similiar to how Groove works. If this is true, I've said it before and will again -- any hint of centralization within a distributed application is a point of weakness and vulnerability, the iron mountain hidden within the cloud.

Second, I can't find the calendar RDF/XML out at the sites that use the product. There are no buttons at these sites that give me the RDF/XML directly. Additionally, trying variations of calendar.rdf isn't returning anything either. Again, this is a fast preliminary read and I'll correct my assumptions if I'm wrong -- but is the only way to access the RDF/XML calendar information through SherpaFind? How do bots find this data? "

More WinFS

Extended Blog Conversation "Metadata needs to be standardized, Metadata needs to be trusted / Metadata needs to be error-correcting, When performed on a large scale, moderation detects and FIXES "lies" in metadata, When a user first uses an aggregator, it should still be able to intelligently search. under repeat usage, it should get smarter, the more sites the user adds to their list, the "smarter" the search will get."

And the answer to these questions is .NET and WinFS.

Along similar lines, Can GNOME Storage Keep Up with Longhorn's WinFS?: "Storage is the only possible competitor to WinFS for the Linux community. It needs to be adopted, pushed, and developed NOW. It's easy to make fun of Longhorn for being so hyped up when it won't be released until 2005/2006, but the fact is that developers are going to start working on Longhorn software very soon. And developers create the market. If you want Linux to succeed on the desktop, you need apps. And apps need developers. And developers aren't going to care about Linux if they can make use of far superior technology on Windows. Which, I'm afraid, is starting to be the case. Just when open source was beginning to gain ground in the desktop arena, it is now starting to loose its edge."

Microsoft's New WinFS Gets the PDC Buzz "The next SQL Server edition, which is slated for beta release in the first half of 2004, features advancements such as a Common Language Runtime (CLR) that will be hosted in the database engine in order to give developers the ability to choose from a variety of development languages for building applications.

Mangione said the enhancements with XML and Web services in the next SQL Server will "provide developers with increased flexibility, simplify the integration of internal and external systems, and provide more efficient development and debugging of line-of-business and business intelligence applications.""

The end of stand-alone databases? It'll be interesting to see the effect that relational file systems glued together by web services will have on the Semantic Web.

LingPipe

LingPipe "Version 1.0 tools include a statistical named-entity detector, a heuristic sentence boundary detector, and a heuristic within-document coreference resolution engine. Named entity extraction models are included for English news and English genomics domains, and can be trained for other languages and genres."

Thursday, October 30, 2003

Semaview Release Sherpa

"Sherpa Calendar is a Windows / PC Platform (based) intelligent calendar application (iCal compatible) that allows anyone to easily publish and consume RDF based calendars and their HTML representation. Users create semantic content without even knowing it!"

"SherpaFind is a calendar search engine for machine understandable RDF and ICS representations of intelligent calendars and events. SherpaFind allows anyone to quickly and easily search for calendars of interest and preview or subscribe to their iCal or RDF representation. An event search capability will be added very soon. "

Their technology overview shows they're using Apache, MySQL, PHP and the like:
http://www.semaview.com/developers/documentation.html

Wednesday, October 29, 2003

Mooting Mooter

Graphical Web searching gets Mooted “We have experienced triple the level of traffic of our most ambitious target,” Cappell said. “It’s really a lovely problem to have.”

The Mooter search engine, named after “a question that can have more than one answer”, is an Australian-made Web search tool which uses intelligent algorithms to group search results into themes or “clusters” of information. “In a traditional search a user might put in ‘travel’,” Cappell said. From the range of results brought up by this search, users may then narrow down what they’re looking for to “car hire” or “accommodation”, resulting in another list of search results, she said.

No such thing as bad publicity. The results for "windows" are all about the operating system; likewise, the results for "java" are all about the programming language.

"How mooter technology works:
We push the results through our proprietary nodal structure. Phrases and chunks of meaning accumulate in nodes, our intelligent algorithms then analyse those nodes and present the user with a series of categories that encapsulate the content of the search results."

http://www.mooter.com/corp/

Kowari Update

In the current Developer Beta Release there are a couple of things that are being changed:
* The process of localization (resources to numeric values) and globalization (numeric values to resources) hasn't been totally moved over to the new way of doing things. Basically, globalization should only occur when we present the user with an Answer object.
* There are object allocation leaks (with anonymous resources, due to changes in ARP), which lead to memory usage above that of the existing commercial version and to behaviour well short of what should be expected.
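
For anyone unfamiliar with the localization/globalization idea, here is a rough sketch in Python. It is not Kowari's code: the class and method names (StringPool, TripleStore, answer) are made up for illustration, and it assumes everything fits in memory. The point is only that matching happens on numeric values and resources are looked up again at the last moment, when the answer is handed to the user.

```python
class StringPool:
    """Hypothetical two-way map between resources and numeric values."""

    def __init__(self):
        self._to_id = {}
        self._to_resource = []

    def localize(self, resource):
        """Resource -> numeric value, allocating a new value on first sight."""
        if resource not in self._to_id:
            self._to_id[resource] = len(self._to_resource)
            self._to_resource.append(resource)
        return self._to_id[resource]

    def find(self, resource):
        """Resource -> numeric value, or None if it was never stored."""
        return self._to_id.get(resource)

    def globalize(self, node_id):
        """Numeric value -> resource."""
        return self._to_resource[node_id]


class TripleStore:
    """Illustrative store that keeps triples as tuples of numeric values."""

    def __init__(self):
        self.pool = StringPool()
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add(tuple(self.pool.localize(x) for x in (s, p, o)))

    def match(self, s=None, p=None, o=None):
        """Match in numeric space only; None is a wildcard."""
        pattern = []
        for term in (s, p, o):
            if term is None:
                pattern.append(None)
                continue
            node_id = self.pool.find(term)
            if node_id is None:
                return []  # unknown resource, so nothing can match
            pattern.append(node_id)
        return [t for t in self.triples
                if all(want is None or want == got
                       for want, got in zip(pattern, t))]

    def answer(self, rows):
        """Globalize only here, when results are presented to the user."""
        return [tuple(self.pool.globalize(n) for n in row) for row in rows]


store = TripleStore()
store.add("ex:alice", "foaf:knows", "ex:bob")
store.add("ex:bob", "foaf:knows", "ex:carol")
rows = store.match(p="foaf:knows")   # tuples of integers
print(sorted(store.answer(rows)))    # resources again
```

Joins and intermediate results in this sketch never touch the strings, which is the reason for pushing globalization out to the Answer.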

Tuesday, October 28, 2003

Commercial Software - Moral Failure

Kapor: Why the old development model is history ""No sustaining value was created during the boom years," said Kapor to an auditorium packed with a mix of open source and "traditional" developers representing virtually every mainline firm in Silicon Valley. Instead, a "lot of wealth was created and distributed among those who were either lucky or opportunistic." Kapor says that this GRQ -- Get Rich Quick -- development model has had a profoundly negative effect, amounting to "moral failure" that will be with us for years to come."

""Open source software, like flowing water, will go everywhere it can go," said Kapor. And that's not a bad thing; it may be harder to get ultra-rich developing software, he said, but it's easier to start a software company, thanks to the rich base of existing open source projects."

""Open source is a surprising success. No one predicted it in the '80s or '90s. It's completely crazy, it defies conventional wisdom, it has no central control, yet it produces products that are the equal of commercial software," he said.

Kapor ended his talk with a Bill Joy quote that seemed to sum up his take on open source software: "Sooner or later, you realize that most of the smart people in the world don't work for your company." "

Old is relative: open source software was used in the '50s, '60s and '70s, and most of the money seems to have been made through hardware and, later, consulting.

Stored Procedures for RDF

"My goal is to implement a fully-featured query system using just database stored procedures and plain ANSI-92 SQL statements."

"The prototype uses triples gathered from the FOAFnaut crawler and made available by Jim Ley. There are approximately 400,000 triples. This represents a small triple store but what is of interest is that some predicates are very popular (for example, the 'knows' predicate occurs in around 100,000 of the triples), so the work in evaluating 'least popular' terms first clearly pays dividends. Queries of 5 query terms are around 20ms on a laptop PC."

"An immediate goal is to import a much larger triple dataset, such as that used by 3store, which contains approximately 5 million triples. I also want to revisit the database schema, and ensure that existing RDF model structures are supported robustly. Once this has been achieved, I want to start to look at more sophisticated capabilities such as inference and subclasses."

Solving RDF triple queries with a relational database - another approach
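
To make the 'least popular' remark concrete, here is a minimal sketch of the idea in Python rather than stored procedures and SQL. It is my illustration, not the author's implementation: the function names and the "?x" convention for variables are assumptions. Query terms are sorted by how often their predicate occurs, so a rare predicate cuts the candidate bindings down before a popular one like 'knows' is touched.

```python
from collections import Counter


def is_var(term):
    """Variables are written with a leading '?', an assumed convention."""
    return isinstance(term, str) and term.startswith("?")


def order_terms(triples, terms):
    """Sort query terms so the rarest predicates are evaluated first."""
    counts = Counter(p for _, p, _ in triples)
    total = len(triples)

    def popularity(term):
        predicate = term[1]
        # A variable predicate could match every triple in the store.
        return total if is_var(predicate) else counts.get(predicate, 0)

    return sorted(terms, key=popularity)


def unify(term, triple, binding):
    """Extend a binding so term matches triple, or return None."""
    new = dict(binding)
    for pattern, value in zip(term, triple):
        if is_var(pattern):
            if new.setdefault(pattern, value) != value:
                return None
        elif pattern != value:
            return None
    return new


def solve(triples, terms):
    """Naive nested-loop evaluation of a conjunctive triple query."""
    bindings = [{}]
    for term in order_terms(triples, terms):
        extended = []
        for binding in bindings:
            for triple in triples:
                candidate = unify(term, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        bindings = extended
    return bindings


triples = [
    ("a", "knows", "b"),
    ("b", "knows", "c"),
    ("a", "knows", "c"),
    ("c", "nick", "cee"),
]
query = [("?x", "knows", "?y"), ("?y", "nick", "?n")]
results = solve(triples, query)
print(sorted(r["?x"] for r in results))
```

Here the 'nick' term is evaluated first and leaves a single binding, so the 'knows' term is only checked against that one candidate instead of seeding the search with every 'knows' triple.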

XCoder

"XCoder is an extensible model transformation and code generation framework. The framework is itself modelled with UML and generated using the standard UML to Java model transformation included in the distribution.

Currently supported input meta models: UML via XMI
Currently supported output meta models: Java, C# and C++"

From MDA code generator framework XCoder is now open source

Safari 1.1 begins XUL Support

XUL "The piece of XUL that Safari implements is "(b) implement some additional layout primitives." XUL basically introduces four new layout primitives to CSS: the flexible box, the grid (flexible boxes in 2 dimensions), rich popups/tooltips, and stacks. Safari in Panther has implemented the first (and most useful) of those layout primitives, the flexible box model. Since the box layout primitives are defined via CSS, you can even use them in HTML (in either Safari or Mozilla)."

See also, Safari 1.1 and XUL renderer for KDE.

IE Development: XAML "The operating system lock-in created and perpetuates the browser lock-in. Now the browser will give that extra boost to the OS. “Our application only works on Windows, using IE. No, sorry, extra development costs are too much. You’ll just have to use a Windows desktop.”

Apple has a tremendous amount of momentum right now, and about three years to innovate and compete against an OS that was released two years ago tomorrow. If they embrace a development platform like XUL and actually make inroads with the customers who will be deploying applications using it, will it be enough?

All this can be somewhat negated by the opportunity that exists for Microsoft to play fair. If, as Simon postulates, XAML can be transformed server-side to XUL (using XSLT), then we all win. But Eric remembers his history, which suggests we shouldn’t rely on that happening. I’ll throw this into the ring then: even if it’s possible, will it be cost-effective? For organizations to spend an extra 20% of a development budget to support a 3% share of the market is a bit of a stretch. But it’s obviously do-able, seeing as how many still bend over backward for NN4.x."

SW Misunderstandings

"* - One big web - trust everything
* - One inconsistency trips it all up
* - One big ontology
* - AI has promised us so much before
* SW points to make
* - Communities of all sizes"

Also, there's no Semantic Web killer application: "It's the integration, stupid!".

http://www.w3.org/2003/Talks/1023-iswc-tbl/

Sunday, October 26, 2003

Shaddap You Face

"Joe Dolce had a giant hit in the early 1980s with Shaddap You Face. Since then over 50 different artists have recorded covers of the song, including a hip hop version, and it's become the biggest-selling Australian single ever, surpassing Slim Dusty's Pub with No Beer and Mike Brady's Up There Cazaly."

From Shaddap You Face to Gift -- Joe Dolce talks and plays live in the Deep End

Kowari Developer Beta Release

At long last it's now available:

Kowari Developer Beta Release.

Features included:
* A transactional triple store capable of storing many millions of triples,
* iTQL - A Squish based query language that allows subqueries and operators for data types (greater than, less than, etc.),
* Web based and command line iTQL interpreter,
* Descriptors - A combination of XSLT and iTQL that can be used to generate renderings of RDF data (comes with a v-card example),
* Lucene integration - full text insertion and querying,
* Views - allows the combination of multiple models,
* Jena 2.0 support - currently only through the use of ARP, and
* Written in Java - 1.4.0 and above required.

Proposed future features include:
* Improved ARP integration (non-memory bound using disk based string pool).
* Move distributed queries to server - use server join code.
* Streaming end to end (mainly Driver work).
* Jena 2 Support: "store" and "model" integration, support of OWL at query time, and full support of Joseki and RDQL.
* Pluggable data types.
* Pluggable security.
* Pluggable data handlers: EXIF extraction, MP3 extraction, and XML RSS extraction.
* Streaming Descriptors.
* Back-end refactoring (Windows, OS X, 64-bit Unix).
* Small embeddable version - Jena lite plus Kowari lite (should be less than 5MB).
* J2EE Connector and MBean support.
* Non-RMI version of streaming of queries.
* Support all OWL entailments of models at the query layer.
* 64-bit testing and loading of large data sets (~150 million) including improving bulk loading support and 6 index support.
* Better iTQL command line processor.
* Review joins.
* Review subqueries vs ontologies.
* Upgrade of Lucene support.

Saturday, October 25, 2003

MS to Copy XUL

XUL and XAML at 4 PM "For those of you Mozilla folks who are interested in XUL, I suggest you pay close attention to what Microsoft is going to unveil next week at the PDC conference. The massive new Longhorn API will be revealed, including XAML, Microsoft's own markup language which is similar to XUL, but way more powerful. It's flattering to know that Microsoft is modelling the future of Windows UI development after a technology we all worked so hard to bring to life for Mozilla. We were truly ahead of our time.

Click here (http://msdn.microsoft.com/longhorn/) on Monday, October 27th and keep a napkin handy to wipe the drool off your face. And please, spare me complaints about how this is not a cross platform technology. Who cares, it's going to be so mind-blowing that it will make huge waves in the industry regardless."

See also, Microsoft will ship Longhorn Betas with built-in XUL motor this fall

If anyone asks

Google knows: Semantic Web, OWL, Ontology, Taxonomy and RDF.

Metalog 2.0b Released

"Metalog is a next-generation reasoning system for the Semantic Web. Historically, Metalog has been the first system to introduce reasoning within the Semantic Web infrastructure, by adding the query/logical layer on top of RDF."

Requires Python (at least version 2.2), and SWI-Prolog (at least version 5.0). 2.0b Download.

Revisiting Knowledge Navigator

In Apple's Knowledge Navigator revisited Jon Udell sees Google, iSight, WiFi and Powerbooks. What I found interesting was the type of interactions made available. As Jon says: "Presence, attention management, and multimodal communication are woven into the piece in ways that we can clearly imagine if not yet achieve. "Contact Jill," says Prof. Bradford at one point. Moments later the computer announces that Jill is available, and brings her onscreen. While they collaboratively create some data visualizations, other calls are held in the background and then announced when the call ends. I feel as if we ought to be further down this road than we are. A universal canvas on which we can blend data from different sources is going to require clever data preparation and serious transformation magic. The obstacles that keep data and voice/video networks apart seem more political and economic than technical."

Sculley's ACM paper The Relationship Between Business and Higher Education: A Perspective on the 21st Century, which has pictures of the Knowledge Navigator video throughout, is still an interesting read.

Booch Still Positive on the Semantic Web

Back in April, Booch said roughly the same thing as in this article: "One goal is to boost support of the semantic Web via modeling capabilities in the Rational Rose and Rose XDE (eXtended Development Environment) tools, said Grady Booch, an IBM Fellow"

IBM execs ponder technology plans

Open Workflows

Topicus open source workflow Reviews and news of Workflow Engines. Brief reviews include XFlow and OpenWFE.

New Version of JESS

Jess Inventor Opines About Rule Engines and Java "Jess is designed from the ground up for integration, and in Jess 7.0 it's going to get even better. Current versions of Jess can only reason about data in its own working memory (although you can use backward chaining to fetch data into working memory as needed.). Jess 7.0 is going to have the ability to reason about data that isn't in working memory, making it possible to efficiently make inferences about truly huge data sets.

Jess has been integrated with agent frameworks and other tools. It's also been integrated with the popular ontology editor Protégé 2000. This is a powerful combination that many people use to develop knowledge structures as well as code that acts on them...Third-party translators between Jess and RuleML and Jess and DAML exist. One of the features planned for Charlemagne (Jess 7.0) is native XML support."

Friday, October 24, 2003

Competition for DSpace?

The Fedora Project "The Fedora project was funded by the Andrew W. Mellon Foundation to build a digital object repository management system based on the Flexible Extensible Digital Object and Repository Architecture (Fedora). The new system, designed to be a foundation upon which interoperable web-based digital libraries, institutional repositories and other information management systems can be built, demonstrates how distributed digital library architecture can be deployed using web-based technologies, including XML and Web services."

Another way to grep dead trees (well, hopefully better than grepping).

Amazon Search Goes Live

"Amazon.com unveiled a massive new search engine Thursday called "Search Inside the Book", containing 33 million pages of a collection of 120,000 books."

http://www.washtimes.com/upi-breaking/20031023-040028-9510r.htm

"It's an eerie ability, sort of an extension of the omniscient feeling one gets when digging around in google or the Internet Archive. It extends easy search capabilities to printed material, which fights the old adage about grepping dead trees. Of course, you're limited to a subset of Amazon's catalog (and not every book ever printed), but it's still an insanely useful feature.

With Amazon's web services initiative, it could lead to all sorts of interesting implications. Imagine if your local library had the ability to search the entire contents of its store of books, quickly and free of charge, and not only told you instantly which books were relevant, but offered to deliver them to your door for a reasonable fee. Good heavens."

http://www.oreillynet.com/pub/wlg/3918

Wired has a piece too, called The Great Library of Amazonia.

Semantic Web as a Service

Tutorials " The program focuses on the use of semantic technology to enable the next generation of Enterprise Solutions for business and government. The first Tutorials in the series are listed below. They will be offered in the Washington, DC area on November 3-4 and December 3-4, 2003. Additional classes are planned, and any offering can be customized to the specific needs of groups or organizations on request."

Querying RDF and SQL

Two interesting technical pieces: Heterogeneous RDF Databases and RDF Access to Relational Databases.

A great summary of why you would create an RDF-specific database:
"Querying generic triple stores are inefficient as they do not gather all the properties of an entity together. This forces a self join for each additional attribute involved in the query. Migrating to a conventional relation provides the efficiency we are used to, and the brittleness we are used to enduring. The goal of the heterogeneous rdf databases is to provide an optimizable compromise between the two."

It's quite possible to optimize a database for joins. As I'm so fond of saying, RDF databases can be more faithful to the original relational model than current commercial SQL databases.
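
The self-join point is easy to see with a toy example. The sketch below uses Python's sqlite3 module and an invented three-column triples table (the table name, column names and sample data are all assumptions of mine, not taken from either paper). Asking for three attributes of an entity means joining the table to itself twice, one alias per attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("ex:alice", "name", "Alice"),
    ("ex:alice", "mbox", "alice@example.org"),
    ("ex:alice", "city", "Brisbane"),
    ("ex:bob", "name", "Bob"),
    ("ex:bob", "mbox", "bob@example.org"),
    ("ex:bob", "city", "Perth"),
])


def attribute_query(attributes):
    """Build a SELECT that uses one alias of triples per attribute."""
    aliases = [f"t{i}" for i in range(len(attributes))]
    columns = ", ".join(f"{a}.o" for a in aliases)
    sql = f"SELECT t0.s, {columns} FROM triples AS t0"
    # Every attribute after the first costs another self join on subject.
    for alias in aliases[1:]:
        sql += f" JOIN triples AS {alias} ON {alias}.s = t0.s"
    where = " AND ".join(f"{a}.p = ?" for a in aliases)
    return sql + f" WHERE {where} ORDER BY t0.s"


wanted = ["name", "mbox", "city"]
sql = attribute_query(wanted)
rows = conn.execute(sql, wanted).fetchall()
print(sql)
print(rows)
```

In a conventional relation with name, mbox and city columns the same question is a single-table SELECT, which is the efficiency (and the brittleness) the quote refers to. Indexing the triples table on subject and predicate is the kind of join optimization I have in mind above.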

Engaging the Hackers

How the Semantic Web Will Really Happen "What makes me think, you may be asking yourself, that the hackers and the LAMP crowd will ever work on the Semantic Web effort? After all, the open source world isn't exactly a hotbed of knowledge representation, formal reasoning, and logic programming. Ah, dear and gentle reader, I'm glad that I made you ask yourself that question, for now I can deploy my simplistic analogy with the Web itself. Before the Web, the free software world -- as it was called back then -- was, first, considerably smaller. As others have noted, the Web was an enabling technology for the hackers as much as the hackers (by creating the LAMP platform) enabled the Web itself. But, second, before the Web the free software world was hardly a hotbed of relational database, hypertext, and document markup technology. The Web was the motivation for an entire generation of hackers to learn SQL, SGML, XML, and so on.

It's not much of a leap to think that the promise of the Semantic Web may fuel a new generation of hackers to learn RDF, OWL, and rule systems. I anticipate that, at some point, we will talk about, say, an RORB (RDF, OWL, Rules, Bayes) platform for Semantic Web development."

"Aside from conference considerations, there are other things we can all do. Professors should encourage (or mandate?) their students to use open source software whenever possible, to participate in relevant open source projects and communities, to use open source resources like SourceForge in order to increase the visibility of research and increase the prospects for mutually fruitful collaboration. Finally, everyone in academia should think about the lesson of n3."

Finding Context

The Web: Search engines still evolving " "What's missing here is the context," Barak Pridor, CEO of ClearForest Corp., of New York City, a developer of data management software, told United Press International. "Information is only meaningful when it is in context."

Computer scientists around the globe, funded by private investors, and government agencies, such as the National Science Foundation and Science Foundation Ireland, are seeking to solve this vexing dilemma. The search problem is inherent in the Internet -- a technology already 30 years old that has been commercialized only within the last decade."

" Using a combination of statistical mathematics, heuristics, artificial intelligence and new computer languages, researchers are developing a "Semantic Web," as it is called, which responds to online queries more effectively. The new tools are enabling users -- now on internal corporate networks and, within a year, on the global Internet -- to search using more natural language queries."

" Investors -- including Greylock -- have given ClearForest $7.5 million in recent weeks to take its technology to the next level.

Computer scientists also are employing artificial intelligence for the Semantic Web, James Lester, chief scientist and chairman of LiveWire Logic in Research Triangle Park, N.C., a linguistic software agent developer, told UPI."

The End of the AI Winter?

Commercializing the Semantic Web "For reasons I don't entirely understand, the term "Semantic Web" tanks with corporate clients, with venture capitalists, and in other non-academic contexts. This may yet be a hangover from the AI Winter, but the interesting difference is that, as I discuss below, the reaction is mostly to the label and to its perceived implications, rather than to the technology itself. "Web Services" does much better, and one of the things Network Inference seems to have done, at least at the marketing level, is to hitch its semantic wagon to the web services star. (This is a move I suggested, though more in the research than marketing context, in an XML.com article last summer, "The True Meaning of Service".)

Given the problems with the various application spaces, Network Inference has apparently been working to define a new application space, one which the Gartner Group has coined as "semantic oriented business applications". That doesn't raise the hackles that "Semantic Web" raises; it's different than EAI, and it's nicely distinguished from "Web Services"."

Network Inference made news recently when they announced "...a strategic partnership with Rightscom, the digital strategy consultancy, and ioko, an enterprise technology services specialist" at ISWC 2003.

Monday, October 20, 2003

Metadata: The future of storage?

Metadata: The future of storage? "When vendors discuss metadata-driven storage, the phrase "storage virtualisation" invariably comes up. Vendors will tell you that, depending on what the goals of a particular metadata application are, the benefits of storage virtualisation can range from improvements in retrieval performance to searchability to ease of management to better allowance for heterogeneity at the hardware level (but usually not all of the above).

Return on investment theoretically comes in the form of increased productivity to both end users and those tasked with planning and managing enterprise storage. Storage virtualisation can result in capacity optimisations that bring hardware savings."

"Databases consist of structured data, which means relational records that are usually fairly dynamic and that have highly relational characteristics. Unstructured data is a photograph. That's unstructured data, where you're storing a big object with a little bit of information around the object. It's usually what we call fixed content. It's a medical image or an e-mail record or a document that's been scanned in. It's not relational, but you still want it to be a record."