Wednesday, December 31, 2003

Statement vs Stating

This comes up from time to time (at work and on mailing lists), so here is a useful summary of the difference between an RDF statement and a stating:
Statements/Statings, from the ILRT Semantic Web technical reports at a glance. Two other references: Does the model allow different statements with the same subject/predicate/object? and the Reification section of the RDF Semantics document.

Commercial Ontologies

TeraView, Level 5 and BioWisdom are starting 2004 in style: "Ontology specialist BioWisdom also plans to make a 'big announcement' early in the New Year."

Ontology is a branch of science that deals with knowledge capture and representation. BioWisdom’s approach involves the development of specialised knowledgebases and the software tools for managing them.

Chief executive Gordon Smith Baxter said: “2003 has been a great year for BioWisdom. In January we secured a £2.5m investment from MB Venture Capital II and Merlin Ventures Fund IV."

BioWisdom and Network Inference on using ontologies for drug discovery.

Friday, December 26, 2003

Some Holiday Links

* Improved Topicalla Screenshot
* Weedshare
* XML 2003 Conference Diary - Notes the continual rise in interest in the Semantic Web.
* SnipSnap 0.5 - Now with the snips available as RDF.

Friday, December 19, 2003

Quintuples

Trust, Context and Justification. While I'm not sure about using 5-tuples (we use 4-tuples and make statements about the fourth element in order to do things like security), it's still an interesting paper with some good references.
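For what it's worth, here is a minimal sketch of the quad idea in Java - a triple plus a context element that other statements can describe. The class and URIs below are purely illustrative, not taken from the paper or from any particular toolkit.

```java
import java.net.URI;

// Illustrative only: a quad is a triple plus a fourth element naming the
// graph/context it belongs to. Trust or security statements can then be made
// about that context rather than about each individual triple.
public final class Quad {
    private final URI subject;
    private final URI predicate;
    private final Object object;  // in practice a URI, blank node or literal
    private final URI context;    // the "4th element" other statements describe

    public Quad(URI subject, URI predicate, Object object, URI context) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
        this.context = context;
    }

    public URI getContext() { return context; }

    public static void main(String[] args) {
        Quad q = new Quad(
                URI.create("http://example.org/doc"),
                URI.create("http://purl.org/dc/elements/1.1/creator"),
                "Alice",
                URI.create("http://example.org/graphs/trusted"));
        // A separate statement could now say who may read anything in the
        // "trusted" context, covering this quad and all others in it.
        System.out.println(q.getContext());
    }
}
```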

Google Searching for Relevance

A Quantum Theory of Internet Value "When "the Internet" was unveiled to a doughnut-eating public a decade ago, we were promised unlimited access to vistas of encyclopedic knowledge. Every body would be connected to every thing, and we would never be short of an answer. What with the abundance of information, and the costs of transporting information approaching zero, the world would never be the same again.

Of course, a decade on, we know that real economics have prevailed. Information costs money. Those transport costs certainly aren't zero. And faced with a choice of a million experts, people gravitate towards experts with a good track record: i.e., for better or worse, paid journalists, qualified doctors or other centers of expertise.

Taxonomies also have been proved to have value: archivists can justify a smirk as manual directory projects like dmoz floundered - true archivists have a far better sense of meta-data than any computerized system can conjure. If you're in doubt, befriend a librarian, and from the resulting dialog, you'll learn to start asking good questions. Your results, we strongly suspect, will be much more fruitful than any iterative Google searches."

"At a convivial dinner recently, John Perry Barlow asked me why no one had written a story about how the most powerful organisations in the world were dependent on the most awful, antiquated and dysfunctional technology. Well, I ventured (to a deafening silence), maybe they were making ruthless choices, and really weren't too slavish about following techno-fads. Maybe the answer is in the question."

Wednesday, December 17, 2003

Commercial RSS

How to make RSS commercially viable "Without full content no aggregator can add much value by categorizing and filtering information, so no purely RSS based aggregator can make much money.

Despite all of the interest around web based syndication, people like Lexis Nexis will still make all the money unless this problem is solved."

Does it? Will it? Must it?

Interview: David Weinberger "What Shelley calls "the semantic web" is the Web itself. She puts it beautifully. And I agree 100% that the Web consists of meaning; it has to because we created this new world for ourselves out of language and music and other signifiers. But that meaning is as hard to systematize and capture as is the meaning of the offline world and for precisely the same reasons. The Semantic Web, it seems to me, often underplays not only the difficulty of systematizing human meaning (= the world) but also ignores the price we pay for doing so: making metadata explicit often is an act of aggression. Human meaning is only possible because of its gnarly, tangly, implicit, unlit, messy context. That's the real reason the Semantic Web can't scale, IMO.

If by "The Semantic Web" you merely mean "A set of domain-specific taxonomies some of which can be knit together to provide a greater degree of automation and improved searching," then I've got no problem with it. It's the more ambitious plans -- and the use of the definite article in its name -- that ticks me off when it comes to The Semantic Web."

Exceptions (again)

13 Exceptional Exception Handling Techniques notes "Declare Unchecked Exceptions in the Throws Clause" and "Soften Checked Exceptions" (always use RuntimeExceptions). This led me to JDO and its Transaction class, which uses runtime exceptions (although it does document them) instead of JDBC's checked exceptions. The Spring Framework takes a similar approach, and in Chapter 4 of Expert One-on-One J2EE Design and Development the author discusses the usual reasons given for avoiding checked exceptions:
"Checked exceptions are much superior to error return codes...However, I don't recommend using checked exceptions unless callers are likely to be able to handle them. In particular, checked exceptions shouldn't be used to indicate that something went horribly wrong, which the caller can't be expected to handle...Use an unchecked exception if the exception is fatal."

With both JDO and Spring, the contract offered by the framework tells the client what they can and cannot handle. In my experience, this is not an either-or situation. For example, JDO uses "CanRetryException" and "FatalException", but an exception that can be retried could actually be fatal depending on the context, and vice-versa. This often occurs when large frameworks are used in conjunction with one another - at the system integration level. Denying developers the choice of which exceptions can and cannot be caught when integrating into larger frameworks often leads to unexpected exceptions tunneling through layers.
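As a concrete illustration of the "soften checked exceptions" technique, here is a minimal sketch. The wrapper class is hypothetical - it is not JDO's or Spring's actual exception hierarchy.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical unchecked wrapper: callers that can't sensibly recover from a
// SQLException aren't forced to catch or redeclare it, but the original cause
// is preserved for the layer that can actually handle it.
class DataAccessRuntimeException extends RuntimeException {
    DataAccessRuntimeException(String message, Throwable cause) {
        super(message, cause);
    }
}

public class ConnectionHelper {
    public static Connection open(String jdbcUrl) {
        try {
            return DriverManager.getConnection(jdbcUrl);
        } catch (SQLException e) {
            // "Soften" the checked exception into an unchecked one.
            throw new DataAccessRuntimeException("Could not connect to " + jdbcUrl, e);
        }
    }
}
```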

Tuesday, December 16, 2003

Drools in Groovy

Drools (an augmented implementation of Forgy's Rete algorithm) is now available in Groovy.

RDF Matures

Resource Description Framework (RDF) Is a W3C Proposed Recommendation and OWL Web Ontology Language Is a W3C Proposed Recommendation, the next step is Recommendation.

More On Practical RDF

Practical RDF Town Hall "Next, xmlhack editor Edd Dumbill explored how he applies RDF to his personal data integration problems, running personal information through the Friend-of-a-Friend (FOAF) RDF vocabulary, using the Redland framework as a foundation for processing...which has sprouted context features and Python bindings to support this work. "

"In the last presentation, Norm Walsh explained how he was using RDF to make better use of information he already had. Walsh explained that he had lots of data in various devices about a lot of people and projects, but no means of integrating it. Thanks to various RDF toolkits - "just by dumping it into RDF, it just kind of happens for free." Aggregation and inference are easy - and Walsh can get convenient notifications of people's birthdays without duplicating information between a file on a person and a calendar entry noting that."

Monday, December 15, 2003

Corporate Taxonomies

Verity provides standard ways to categorise content "Traditionally, taxonomies have been time-consuming and expensive to set up. A Taxonomy needs to be unambiguous and cover all topics of interest to the organisation. In other words, it has to be Collectively Exhaustive and Mutually Exclusive. Few individuals, not even the company librarian, have the breadth of knowledge of the organisation and its information assets to construct a set of categories that encompasses all information and meets all needs."

"Because a taxonomy reflects the most important knowledge categories of an organisation, organisations that carry out the same business activities need similar taxonomies. (In the same way that such organisations share similar core business processes). This fact and the rising importance of taxonomies to organisations has led Verity to make six tailorable taxonomies available to jump-start the development of an organisation's taxonomy. Verity's six taxonomies suit a range of business activities covering Pharmaceuticals, Defence, Homeland Security, Human Resources, Sales and Marketing, and Information Technology. Organisations that start with these predefined taxonomies can then tailor them to their specific needs. "

Sunday, December 14, 2003

The Winner Takes It All

Power Laws, Discourse, and Democracy "Well, inevitable inequality is one way to characterize the effects of power laws in social networks. But is it the most useful way? Drawing on the same body of research on power laws in social networks, and using similar methods, Jakob Nielsen chose to emphasize instead that, as he put it in a piece published on AlertBox (03.06.16): Diversity is Power for Specialized Sites:"

"Winner-takes-all networks may follow Pareto's Law (the 80/20 rule) with regard to the cumulative distribution of links. But, according to Barabasi in Linked, the distinctive distribution hierarchy of scale free networks will have been broken. Instead, the network takes on what Barabasi describes as a "star topology," in which a single hub snarfs nearly all the links, dwarfing its competitors. "

"It's the the dynamics of emergent systems being formalized in open source. It's the fragile and turbulent architecture of democracy.

By contrast, winner-takes-all networks wipe out the middle ground connecting leaders to the network's other players. With this, winner-takes-all networks strip away the architecture that supports the productivity of local niches."

Saturday, December 13, 2003

More Practical RDF

Practical RDF "There are two features of RDF that I find particularly practical: Aggregation [and] Inference".

Not Influential or Famous

Myths Open Source Developers Tell Ourselves

Friday, December 12, 2003

Groovy is Out

"Groovy is a powerful new high level dynamic language for the JVM combining lots of great features from languages like Python, Ruby and Smalltalk and making them available to the Java developers using a Java-like syntax."

GPath "When working with deeply nested object hierarchies or data structures, a path expression language like XPath or Jexl absolutely rocks."

The SQL and Markup example also looks interesting.

New Java Tools

Algernon-J is a rule-based reasoning engine written in Java. It allows forward and backward chaining across Protege knowledge bases. In addition to traversing the KB, rules can call Java functions and LISP functions (from an embedded LISP interpreter).

JRDF "A project designed to create a standard mapping of RDF to Java."

Google 2005

Searching With Invisible Tabs "Doesn't the future of search look great? Whatever type of information you're after, Google and other major search engines will have a tab for it!"

Highlights that people can suffer from "tab blindness" and why one UI doesn't suit all (fairly obvious).

Greed is Good for Data Emergence

The Age of Reason: The Perfect Knowing Machine Meets the Reality of Content "In brief, the concept of "data emergence" that is central to this knowledge Nirvana is best summed up by James Snell as "the incidental creation of personal information through the selfish pursuit of individual goals." From Snell's perspective, content value is shackled by dumb Web browsers that are used to share information about individuals with Web sites that then try to "personalize" their content - an experience that must be repeated at each and every Web site visited, since this knowledge about individual interests and preferences is not shared site-to-site. Instead of this, the perfect world would have a "smart" content service, probably on one's PC, that would retain knowledge of all of one's personal profile and interests in accessing content; content providers would then be "dumb" sources pumping information into the smart service, not having any detailed knowledge of who is using their services and how. No more nasty Web site publishers, just one perfect tacit machine that knows exactly what you're thinking and allows you to obtain and share thoughts with others."

"Aggregation can happen anywhere to the satisfaction of many."

Kowari Already Out There

Kowari for RDF developers - Early Release. It's already out there - I found this when doing a Google search on Kowari. The real site will be Kowari.org, but with open source the source is the real thing, I guess.

RSS for the Knowledge Worker

From the Metaweb to the Semantic Web: A Roadmap "At Radar Networks we have been working to define this ontology -- which we call "The Infoworker Ontology" -- with a goal of eventually contributing it to a standards body in the future. The Infoworker Ontology is a mid-level horizontal ontology that defines the semantics of common entities and relationships in the domain of knowledge work -- things like documents, events, projects, tasks, people, groups, etc. The development and adoption of an open, extensible, and widely-used Infoworker ontology is a necessary step towards making the Semantic Web useful to ordinary mortals (as opposed to academic researchers).

By connecting microcontent objects to the Infoworker Ontology a new generation of semantic-microcontent (what we call "metacontent") is enabled. With the right tools even non-technical consumers will be able to author and use metacontent. "

Thursday, December 11, 2003

The Early Days...of the Semantic Web

Early Days Of a Data-Sharing Revolution " And next week, a Chicago company plans to start selling a $36 mini-scanner dubbed "iPilot" that shoppers can use to scan bar codes on products in stores, then upload the data to a computer and compare prices at Amazon.com.

All are examples of how Web sites, relying on a new generation of Internet software, are licensing their databases to business partners and outside developers in an attempt to spark innovation and reach more customers.

"In the past six to nine months, we have started ramping up the program to license eBay's data," eBay Vice President Randy Ching said."

iPilot, hey? You'd think that with millions in venture capital, the least you could do is come up with something better than combining Apple's and Palm's product names.

Saturday, December 06, 2003

Metaweb

The Birth of "The Metaweb" -- The Next Big Thing -- What We are All Really Building "But RSS is just the first step in the evolution of the Metaweb. The next step will be the Semantic Web...The Semantic Web transforms data and metadata from "dumb data" to "smart data." When I say "smart data" I mean data that carries increased amounts of information about its own meaning, structure, purpose, context, policies, etc. The data is "smart" because the knowledge about the data moves with the data, instead of being locked in an application...The Semantic Web is already evolving naturally from the emerging confluence of Blogs, Wikis, RSS feeds, RDF tools, ontology languages such as OWL, rich ontologies, inferencing engines, triplestores, and a growing range of new tools and services for working with metadata. But the key is that we don't have to wait for the Semantic Web for metadata to be useful. The Metaweb is already happening."

Friday, December 05, 2003

More Visualization of RDF

Meta-Model Management based on RDFs Revision Reflection. Breaking up the different types of RDF (schema and properties) is interesting, although it probably still has scaling problems (as with most visualizations of this type).

Styling RDF Graphs with GSS "One such solution is GSS (Graph Style Sheets), an RDF vocabulary for describing rule-based style sheets used to modify the visual representation of RDF models represented as node-link diagrams."

I would imagine that somehow taking histogram data from the graphs and mapping that may be more interesting and would scale better - much like how some image search engines work.

Al Gore - How I would've done it different

FREEDOM AND SECURITY
""I want to challenge the Bush Administration’s implicit assumption that we have to give up many of our traditional freedoms in order to be safe from terrorists...In both cases they have recklessly put our country in grave and unnecessary danger, while avoiding and neglecting obvious and much more important challenges that would actually help to protect the country...In both cases, they have used unprecedented secrecy and deception in order to avoid accountability to the Congress, the Courts, the press and the people." "

"In other words, the mass collecting of personal data on hundreds of millions of people actually makes it more difficult to protect the nation against terrorists, so they ought to cut most of it out.""

Thursday, December 04, 2003

Semantic Merging

Skip This Rant and Read Shirky "Shirky sums up many metadata challenges with a concise statement: "it's easy to get broad agreement in a narrow group of users, or vice-versa, but not both." Hey, if you don't make your metadata structurally interoperable, you can't have semantic merging."

I found the diagram, "Content Enterprise Metadata: Structural Interoperability & Semantic Merging" to be quite instructive.

Wednesday, December 03, 2003

Zeitgeist Mining

On RSS, Blogs, and Search "I've been thinking lately about the role of blogs and RSS in search, and that, of course, has led me to both the Semantic Web and to Technorati, Feedster, and many others. Along those lines, I recently finished a column for 2.0 on blogs and business information. I can't reveal my conclusions yet (my Editor'd kill me) but suffice to say, I find the intersection of blogging, search, and the business information market to be pretty darn interesting.
I'm certainly not alone. Moreover has created "Enterprise-Grade Weblog Search" - essentially, a zeitgeist mining tool for corporations. One can imagine similar products from any of the RSS search engines, or even from the major marketing agencies of the world."

JRDF

Well, it's not an impressive name but after talking to the Jena and Sesame people it seems important to have a consistent binding to RDF written in Java.

JRDF is going to be based on the best bits from Sesame, Jena, Kowari and the RDF API Draft. Any other contributions would be good. Currently I've got the start of BlankNode, Graph, Literal, Node, NodeFactory, Statement and URIReference.

One of the annoying things is that the W3C specs for RDF don't talk about models anymore but graphs. It will be odd for a Model to implement Graph - maybe. Or there might be a lot of renaming to be done.
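For a rough idea of the shape these interfaces might take, here is a sketch. The method signatures are only indicative - they are not the actual JRDF code:

```java
import java.net.URI;

// Indicative sketch only - not the real JRDF interfaces. The W3C specs talk
// about a "graph" of statements, so the container is called Graph here
// rather than Model.
interface Node {}                                   // anything that can appear in a statement
interface Literal extends Node { String getLexicalForm(); }
interface URIReference extends Node { URI getURI(); }
interface BlankNode extends Node {}                 // deliberately has no getName()

interface Statement {
    Node getSubject();          // a URIReference or BlankNode
    URIReference getPredicate();
    Node getObject();           // a URIReference, BlankNode or Literal
}

interface Graph {
    void add(Statement statement);
    boolean contains(Statement statement);
    long size();
}

interface NodeFactory {
    URIReference createResource(URI uri);
    BlankNode createBlankNode();
    Literal createLiteral(String lexicalForm);
}
```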

I've removed our implementation from Kowari and started using it instead. Still lots to do. Once I have Kowari done then it's Jena's turn.

Java Code

Abstract classes are not types "Have you ever seen code that declared a variable of type java.util.AbstractList? No? Why not? It's there, along with HashMap and TreeSet, etc. Because AbstractList is not a type. List is. AbstractList provides a convenient base from which to implement custom Lists. But I still have the option of implementing my own from scratch if I so choose. Maybe because I need some behaviour for my list such as lazy loading or dynamic expansion, etc. that wouldn't be satisfied by the default implementation."
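A small example of the distinction being made, using nothing beyond the standard collections API (modern Java syntax for brevity):

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

public class TypesNotClasses {
    // Declare against the type (the interface)...
    private final List<String> names = new ArrayList<>();

    // ...while AbstractList is only a convenient base for *implementing* a List,
    // e.g. a fixed-size view that computes its elements lazily.
    static List<Integer> squares(final int n) {
        return new AbstractList<Integer>() {
            @Override public Integer get(int index) { return index * index; }
            @Override public int size() { return n; }
        };
    }

    public static void main(String[] args) {
        System.out.println(squares(5)); // [0, 1, 4, 9, 16]
    }
}
```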

"cglib is a powerful, high performance and quality Code Generation Library, It is used to extend JAVA classes and implements interfaces at runtime."

Tuesday, December 02, 2003

Blank Nodes

In RDF there are nodes: resources (with or without URIs) and literals. The resources without URIs are blank nodes, and they can either be given a name (a nodeID) or not. Statements are made of a subject (a resource), a predicate (a resource with a URI) and an object (any node). Simple enough.

Now I've been looking across three separate Java implementations. I found 8 implementations of classes designed to represent blank nodes (resources without URIs):
* two in Kowari,
* BNode, BNodeImpl and BNodeNode, and
* AResource, Node_Blank and RDFNode.

What is maddening is that they each (well, 7 of them) have a different way to get their name. What's even more maddening is that this is right. Getting their name isn't part of the RDF model - the most that all of these different blank node implementations should have in common is equality and being the same type (just sharing a marker interface, like Serializable).
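A sketch of what that minimal shared contract might look like - the types below are hypothetical, not taken from any of the toolkits above:

```java
import java.util.UUID;

// Hypothetical: the shared contract is just a marker type...
interface BlankNode {}

// ...and an implementation only needs a stable identity for equals/hashCode.
// How (or whether) a label is exposed stays an implementation detail.
final class BlankNodeImpl implements BlankNode {
    private final UUID id = UUID.randomUUID();

    @Override public boolean equals(Object o) {
        return o instanceof BlankNodeImpl && id.equals(((BlankNodeImpl) o).id);
    }

    @Override public int hashCode() { return id.hashCode(); }
}
```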

Harpers.org brought to you by the Semantic Web

A New Website for Harper's Magazine "We cut up the Weekly Review into individual events (6000 of them, going back to the year 2000), and tagged them by date, using XML and a bit of programming. We did the same with the Harper's Index, except instead of events, we marked things up as “facts.”

Then we added links inside the events and facts to items in the taxonomy. Magic occurred: on the Satan page, for instance, is a list of all the events and facts related to Satan, sorted by time. Where do these facts come from? From the Weekly Review and the Index. On the opposite side, as you read the Weekly Review in its narrative form, all of the links in the site's content take you to timelines. Take a look at a recent Harper's Index and click around a bit—you'll see what I mean.

The best way to think about this is as a remix: the taxonomy is an automated remix of the narrative content on the site, except instead of chopping up a ballad to turn it into house music, we're turning narrative content into an annotated timeline. The content doesn't change, just the way it's presented."

"A small team of Java coders and I are planning to take the work done on Harper's, and in other places like Rhetorical Device, and create an open-sourced content management system based on RDF storage. This will allow much larger content bases (the current system will start to get gimpy at around 30 megs of XML content—fine for Harper's, but not for larger sites), and for different kinds of content to be merged."

I'll have to look at Samizdat, which "is a generic RDF-based engine for building collaboration and open publishing web sites." This seems to be the way things are heading.