Friday, December 19, 2003

Quintuples

Trust, Context and Justification While I'm not sure about using 5-tuples (we use quads, and make statements about the fourth element to do things like security), it's still an interesting paper with some good references.
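To make the quad idea concrete, here is a minimal sketch, assuming a hypothetical `Quad` record whose fourth element names the context (graph) a statement belongs to. Security is expressed as ordinary statements about those contexts, and a reader only sees quads whose context it has been granted. The `ex:` vocabulary is made up for illustration and is not how any particular store does it.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class QuadSecurity {
    // Hypothetical quad: subject, predicate, object, plus the context (graph) it belongs to.
    record Quad(String subject, String predicate, String object, String context) {}

    // Hypothetical vocabulary for statements made about contexts.
    static final String CAN_READ = "ex:canRead";
    static final String SECURITY_CONTEXT = "ex:security";

    // A user sees only quads whose context they have been granted via a
    // (user, ex:canRead, context) statement stored in the security context.
    static List<Quad> visibleTo(String user, List<Quad> store) {
        Set<String> readable = store.stream()
                .filter(q -> q.context().equals(SECURITY_CONTEXT))
                .filter(q -> q.subject().equals(user) && q.predicate().equals(CAN_READ))
                .map(Quad::object)
                .collect(Collectors.toSet());
        return store.stream()
                .filter(q -> readable.contains(q.context()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Quad> store = List.of(
                new Quad("ex:alice", CAN_READ, "ex:public", SECURITY_CONTEXT),
                new Quad("ex:doc1", "dc:title", "\"Annual report\"", "ex:public"),
                new Quad("ex:doc2", "dc:title", "\"Salaries\"", "ex:hr"));
        System.out.println(visibleTo("ex:alice", store)); // only the ex:doc1 quad
    }
}
```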

Google Searching for Relevance

A Quantum Theory of Internet Value "When "the Internet" was unveiled to a doughnut-eating public a decade ago, we were promised unlimited access to vistas of encyclopedic knowledge. Every body would be connected to every thing, and we would never be short of an answer. What with the abundance of information, and the costs of transporting information approaching zero, the world would never be the same again.

Of course, a decade on, we know that real economics have prevailed. Information costs money. Those transport costs certainly aren't zero. And faced with a choice of a million experts, people gravitate towards experts with a good track record: i.e., for better or worse, paid journalists, qualified doctors or other centers of expertise.

Taxonomies also have been proved to have value: archivists can justify a smirk as manual directory projects dmoz floundered - true archivists have a far better sense of meta-data than any computerized system can conjure. If you're in doubt, befriend a librarian, and from the resulting dialog, you'll learn to start asking good questions. Your results, we strongly suspect, will be much more fruitful than any iterative Google searches. "

"At a convivial dinner recently, John Perry Barlow asked me why no one had written a story about how the most powerful organisations in the world were dependent on the most awful, antiquated and dysfunctional technology. Well, I ventured (to a deafening silence), maybe they were making ruthless choices, and really weren't too slavish about following techno-fads. Maybe the answer is in the question."

Wednesday, December 17, 2003

Commercial RSS

How to make RSS commercially viable "Without full content no aggregator can add much value by categorizing and filtering information, so no purely RSS based aggregator can make much money.

Despite all of the interest around web based syndication, people like Lexis Nexis will still make all the money unless this problem is solved."

Does it? Will it? Must it?

Interview: David Weinberger "What Shelley calls "the semantic web" is the Web itself. She puts it beautifully. And I agree 100% that the Web consists of meaning; it has to because we created this new world for ourselves out of language and music and other signifiers. But that meaning is as hard to systematize and capture as is the meaning of the offline world and for precisely the same reasons. The Semantic Web, it seems to me, often underplays not only the difficulty of systematizing human meaning (= the world) but also ignores the price we pay for doing so: making metadata explicit often is an act of aggression. Human meaning is only possible because of its gnarly, tangly, implicit, unlit, messy context. That's the real reason the Semantic Web can't scale, IMO.

If by "The Semantic Web" you merely mean "A set of domain-specific taxonomies some of which can be knit together to provide a greater degree of automation and improved searching," then I've got no problem with it. It's the more ambitious plans -- and the use of the definite article in its name -- that ticks me off when it comes to The Semantic Web."

Exceptions (again)

13 Exceptional Exception Handling Techniques notes "Declare Unchecked Exceptions in the Throws Clause" and "Soften Checked Exceptions" (always use RuntimeExceptions). This led to JDO and its JDO Transaction class, which uses runtime exceptions (although it does document them) instead of JDBC's checked exceptions. The Spring Framework takes a similar approach, and in Chapter 4 of Expert One-on-One J2EE Design and Development the author discusses the usual reasons given to avoid checked exceptions:
"Checked exceptions are much superior to error return codes...However, I don't recommend using checked exceptions unless callers are likely to be able to handle them. In particular, checked exceptions shouldn't be used to indicate that something went horribly wrong, which the caller can't be expected to handle...Use an unchecked exception if the exception is fatal."
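As a minimal sketch of what "softening" a checked exception looks like in plain Java (an illustration, not code from JDO, Spring or the book): the checked `IOException` is wrapped in the JDK's unchecked `UncheckedIOException`, keeping the original as its cause. `readConfig` is a hypothetical helper.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SoftenChecked {
    // Reads a file, "softening" the checked IOException into an unchecked
    // UncheckedIOException so callers are not forced to catch it.
    static String readConfig(Path path) {
        try {
            return Files.readString(path);
        } catch (IOException e) {
            // Keep the original exception as the cause so no information is lost.
            throw new UncheckedIOException("Could not read " + path, e);
        }
    }

    public static void main(String[] args) {
        try {
            readConfig(Path.of("does-not-exist.properties"));
        } catch (UncheckedIOException e) {
            // A caller that *can* handle the failure is still free to catch it.
            System.out.println("cause: " + e.getCause().getClass().getSimpleName());
        }
    }
}
```

The trade-off is that nothing in the signature of `readConfig` now tells the caller a failure is possible, which is exactly the contract question raised next.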

With both JDO and Spring, the contract offered by the framework tells the client what it can and cannot handle. In my experience, this is not an either/or situation. For example, JDO uses "CanRetryException" and "FatalException", but an exception that can be retried could actually be fatal depending on the context, and vice versa. This often occurs when large frameworks are used in conjunction with one another, at the system integration level. Denying the developer the choice of which exceptions can and cannot be caught when integrating into larger frameworks often leads to unexpected exceptions tunneling through layers.
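A minimal sketch of letting the caller, rather than the framework, decide whether a "retryable" failure is fatal. `RetryableException` and `withRetry` are hypothetical names, not JDO's API: the same exception is retried in a batch context and treated as fatal in an interactive one, simply by changing the number of attempts the caller allows.

```java
import java.util.function.Supplier;

public class ContextualRetry {
    // Hypothetical unchecked exception a framework might throw for transient failures.
    static class RetryableException extends RuntimeException {
        RetryableException(String msg) { super(msg); }
    }

    // The caller decides how many attempts are acceptable in its context
    // (maxAttempts should be at least 1). With maxAttempts == 1 the
    // "retryable" exception is effectively fatal.
    static <T> T withRetry(Supplier<T> action, int maxAttempts) {
        RetryableException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RetryableException e) {
                last = e; // remember it and try again if attempts remain
            }
        }
        throw new IllegalStateException("Giving up after " + maxAttempts + " attempt(s)", last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fails twice, then succeeds.
        Supplier<String> flaky = () -> {
            if (++calls[0] < 3) throw new RetryableException("lock timeout");
            return "committed";
        };
        System.out.println(withRetry(flaky, 5)); // batch job: retrying is fine, prints committed
        calls[0] = 0;
        try {
            withRetry(flaky, 1);                 // interactive request: treat as fatal
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());  // Giving up after 1 attempt(s)
        }
    }
}
```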

Tuesday, December 16, 2003

Drools in Groovy

Drools (an augmented implementation of Forgy's Rete algorithm) is now available in Groovy.

RDF Matures

Resource Description Framework (RDF) Is a W3C Proposed Recommendation and OWL Web Ontology Language Is a W3C Proposed Recommendation. The next step for both is full Recommendation.

More On Practical RDF

Practical RDF Town Hall "Next, xmlhack editor Edd Dumbill explored how he applies RDF to his personal data integration problems, running personal information through the Friend-of-a-Friend (FOAF) RDF vocabulary, using the Redland framework as a foundation for processing...which has sprouted context features and Python bindings to support this work. "

"In the last presentation, Norm Walsh explained how he was using RDF to make better use of information he already had. Walsh explained that he had lots of data in various devices about a lot of people and projects, but no means of integrating it. Thanks to various RDF toolkits - "just by dumping it into RDF, it just kind of happens for free." Aggregation and inference are easy - and Walsh can get convenient notifications of people's birthdays without duplicating information between a file on a person and a calendar entry noting that."

Monday, December 15, 2003

Corporate Taxonomies

Verity provides standard ways to categorise content "Traditionally, taxonomies have been time-consuming and expensive to set up. A Taxonomy needs to be unambiguous and cover all topics of interest to the organisation. In other words, it has to be Collectively Exhaustive and Mutually Exclusive. Few individuals, not even the company librarian, have the breadth of knowledge of the organisation and its information assets to construct a set of categories that encompasses all information and meets all needs."

"Because a taxonomy reflects the most important knowledge categories of an organisation, organisations that carry out the same business activities need similar taxonomies. (In the same way that such organisations share similar core business processes). This fact and the rising importance of taxonomies to organisations has led Verity to make six tailorable taxonomies available to jump-start the development of an organisation's taxonomy. Verity's six taxonomies suit a range of business activities covering Pharmaceuticals, Defence, Homeland Security, Human Resources, Sales and Marketing, and Information Technology. Organisations that start with these predefined taxonomies can then tailor them to their specific needs. "
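The "Collectively Exhaustive and Mutually Exclusive" requirement can be checked mechanically once categories are written as rules. A minimal sketch, assuming each category is a hypothetical keyword predicate over document names (real categorisers such as Verity's are far richer): every item must match exactly one category, so zero matches is a gap and more than one is an overlap.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class MeceCheck {
    // Reports items that match no category (not collectively exhaustive)
    // or more than one category (not mutually exclusive).
    static Map<String, List<String>> problems(List<String> items,
                                              Map<String, Predicate<String>> categories) {
        Map<String, List<String>> report = new LinkedHashMap<>();
        for (String item : items) {
            List<String> matches = new ArrayList<>();
            categories.forEach((name, rule) -> { if (rule.test(item)) matches.add(name); });
            if (matches.size() != 1) report.put(item, matches); // 0 = gap, >1 = overlap
        }
        return report;
    }

    public static void main(String[] args) {
        Map<String, Predicate<String>> cats = new LinkedHashMap<>();
        cats.put("Sales", s -> s.contains("invoice") || s.contains("quote"));
        cats.put("HR", s -> s.contains("payroll") || s.contains("leave"));
        List<String> docs = List.of("invoice-2003.doc", "payroll-dec.xls",
                                    "leave-quote.txt", "server-logs.txt");
        System.out.println(problems(docs, cats));
        // {leave-quote.txt=[Sales, HR], server-logs.txt=[]}
    }
}
```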

Sunday, December 14, 2003

The Winner Takes It All

Power Laws, Discourse, and Democracy "Well, inevitable inequality is one way to characterize the effects of power laws in social networks. But is it the most useful way? Drawing on the same body of research on power laws in social networks, and using similar methods, Jakob Nielsen chose to emphasize instead that, as he put it in a piece published on AlertBox (03.06.16): Diversity is Power for Specialized Sites:"

"Winner-takes-all networks may follow Pareto's Law (the 80/20 rule) with regard to the cumulative distribution of links. But, according to Barabasi in Linked, the distinctive distribution hierarchy of scale free networks will have been broken. Instead, the network takes on what Barabasi describes as a "star topology," in which a single hub snarfs nearly all the links, dwarfing its competitors. "
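The difference between an 80/20 distribution and a "star topology" is easy to see with a little arithmetic. A minimal sketch that computes the share of all inbound links held by the top fraction of nodes; the link counts are made up to show the calculation and are not data from Barabasi or Nielsen.

```java
import java.util.Arrays;

public class LinkShare {
    // Fraction of all inbound links held by the top `fraction` of nodes,
    // ranked by inbound link count.
    static double topShare(int[] inLinks, double fraction) {
        int[] sorted = inLinks.clone();
        Arrays.sort(sorted); // ascending, so the biggest hubs are at the end
        int k = Math.max(1, (int) Math.round(sorted.length * fraction));
        long total = Arrays.stream(sorted).asLongStream().sum();
        long top = 0;
        for (int i = sorted.length - k; i < sorted.length; i++) top += sorted[i];
        return total == 0 ? 0.0 : (double) top / total;
    }

    public static void main(String[] args) {
        // Illustrative, made-up inbound link counts for ten sites.
        int[] paretoLike = {400, 400, 100, 30, 20, 20, 10, 10, 5, 5};
        int[] star       = {950, 10, 10, 10, 5, 5, 5, 3, 1, 1};
        System.out.println("pareto-like, top 20%: " + topShare(paretoLike, 0.2)); // 0.8
        System.out.println("star, top 20%: " + topShare(star, 0.2));             // 0.96
        System.out.println("star, single hub: " + topShare(star, 0.1));          // 0.95
    }
}
```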

"It's the dynamics of emergent systems being formalized in open source. It's the fragile and turbulent architecture of democracy.

By contrast, winner-takes-all networks wipe out the middle ground connecting leaders to the network's other players. With this, winner-takes-all networks strip away the architecture that supports the productivity of local niches."

Saturday, December 13, 2003

More Practical RDF

Practical RDF "There are two features of RDF that I find particularly practical: Aggregation [and] Inference".
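Both features are easy to see at toy scale. A minimal sketch in plain Java, assuming a hypothetical `Triple` record with string-valued nodes rather than any real RDF toolkit (Redland, Jena and Kowari each have their own APIs): merging sources is just set union because nodes are identified by URI, and a single RDFS-style subclass rule, applied until nothing changes, supplies the inference. The `ex:` data is placeholder.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AggregateAndInfer {
    record Triple(String s, String p, String o) {}

    // Aggregation: nodes are identified by URI, so merging sources is set union.
    static Set<Triple> aggregate(List<List<Triple>> sources) {
        Set<Triple> graph = new HashSet<>();
        for (List<Triple> source : sources) graph.addAll(source);
        return graph;
    }

    // Inference: apply one RDFS-style rule until nothing new is added:
    // (x rdf:type C) and (C rdfs:subClassOf D)  =>  (x rdf:type D)
    static void inferTypes(Set<Triple> graph) {
        boolean changed = true;
        while (changed) {
            Set<Triple> inferred = new HashSet<>();
            for (Triple t : graph)
                if (t.p().equals("rdf:type"))
                    for (Triple sc : graph)
                        if (sc.p().equals("rdfs:subClassOf") && sc.s().equals(t.o()))
                            inferred.add(new Triple(t.s(), "rdf:type", sc.o()));
            changed = graph.addAll(inferred); // false once a pass adds nothing new
        }
    }

    public static void main(String[] args) {
        List<Triple> addressBook = List.of(
                new Triple("ex:someone", "foaf:name", "\"Someone\""),
                new Triple("ex:someone", "rdf:type", "ex:Colleague"));
        List<Triple> calendar = List.of(new Triple("ex:someone", "ex:birthday", "\"--01-01\""));
        List<Triple> schema = List.of(new Triple("ex:Colleague", "rdfs:subClassOf", "foaf:Person"));

        Set<Triple> graph = aggregate(List.of(addressBook, calendar, schema));
        inferTypes(graph);
        System.out.println(graph.contains(new Triple("ex:someone", "rdf:type", "foaf:Person"))); // true
        System.out.println(graph.size()); // 5
    }
}
```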

Not Influential or Famous

Myths Open Source Developers Tell Ourselves

Friday, December 12, 2003

Groovy is out

"Groovy is a powerful new high level dynamic language for the JVM combining lots of great features from languages like Python, Ruby and Smalltalk and making them available to the Java developers using a Java-like syntax."

GPath "When working with deeply nested object hierarchies or data structures, a path expression language like XPath or Jexl absolutely rocks."
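To show why path expressions help with nested data, here is a minimal sketch in Java. It is not GPath or Jexl; the `eval` helper and its dotted syntax are hypothetical. Each step looks up a key in a `Map`, and when a step meets a `List` it fans out over the elements and flattens the results, so one expression reaches every nested value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PathExpr {
    // Evaluates a dotted path such as "customers.orders.total" over nested
    // Maps and Lists, collecting every value the path reaches.
    static List<Object> eval(Object node, String path) {
        List<Object> current = List.of(node);
        for (String step : path.split("\\.")) {
            List<Object> next = new ArrayList<>();
            for (Object o : current) collect(o, step, next);
            current = next;
        }
        return current;
    }

    private static void collect(Object o, String step, List<Object> out) {
        if (o instanceof List<?> list) {
            for (Object e : list) collect(e, step, out); // fan out over lists
        } else if (o instanceof Map<?, ?> map && map.containsKey(step)) {
            Object v = map.get(step);
            if (v instanceof List<?> inner) out.addAll(inner); else out.add(v);
        }
        // Anything else (missing key, scalar) contributes nothing.
    }

    public static void main(String[] args) {
        Object data = Map.of("customers", List.of(
                Map.of("name", "Ann", "orders", List.of(Map.of("total", 10), Map.of("total", 25))),
                Map.of("name", "Bob", "orders", List.of(Map.of("total", 7)))));
        System.out.println(eval(data, "customers.orders.total")); // [10, 25, 7]
        System.out.println(eval(data, "customers.name"));         // [Ann, Bob]
    }
}
```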

The SQL and Markup example also looks interesting.

New Java Tools

Algernon-J is a rule-based reasoning engine written in Java. It allows forward and backward chaining across Protege knowledge bases. In addition to traversing the KB, rules can call Java functions and LISP functions (from an embedded LISP interpreter).
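Forward chaining, one of the two modes Algernon-J supports, can be sketched in a few lines. This is a toy illustration with hypothetical `Rule` records and variable-free string facts, not Algernon-J's (or Rete's) algorithm: it rescans every rule on each pass, which is exactly the repeated work Rete is designed to avoid.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ForwardChain {
    // A rule fires when all its premises are known facts, adding its conclusion.
    record Rule(Set<String> premises, String conclusion) {}

    // Naive forward chaining: keep firing rules until no new fact is produced.
    static Set<String> run(Set<String> facts, List<Rule> rules) {
        Set<String> known = new LinkedHashSet<>(facts);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Rule r : rules)
                if (known.containsAll(r.premises()) && known.add(r.conclusion()))
                    changed = true; // a new fact may enable other rules
        }
        return known;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
                new Rule(Set.of("bird(tweety)"), "canFly(tweety)"),
                new Rule(Set.of("canFly(tweety)", "hungry(tweety)"), "migrates(tweety)"));
        Set<String> result = run(Set.of("bird(tweety)", "hungry(tweety)"), rules);
        System.out.println(result.contains("migrates(tweety)")); // true
    }
}
```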

JRDF "A project designed to create a standard mapping of RDF to Java."

Google 2005

Searching With Invisible Tabs "Doesn't the future of search look great? Whatever type of information you're after, Google and other major search engines will have a tab for it!"

It highlights that people can suffer from "tab blindness" and why one UI doesn't suit all (fairly obvious).

Greed is Good for Data Emergence

The Age of Reason: The Perfect Knowing Machine Meets the Reality of Content "In brief, the concept of "data emergence" that is central to this knowledge Nirvana is best summed up by James Snell as "the incidental creation of personal information through the selfish pursuit of individual goals." From Snell's perspective, content value is shackled by dumb Web browsers that are used to share information about individuals with Web sites that then try to "personalize" their content - an experience that must be repeated at each and every Web site visited, since this knowledge about individual interests and preferences is not shared site-to-site. Instead of this, the perfect world would have a "smart" content service, probably on one's PC, that would retain knowledge of all of one's personal profile and interests in accessing content; content providers would then be "dumb" sources pumping information into the smart service, not having any detailed knowledge of who is using their services and how. No more nasty Web site publishers, just one perfect tacit machine that knows exactly what you're thinking and allows you to obtain and share thoughts with others."

"Aggregation can happen anywhere to the satisfaction of many."

Kowari Already Out There

Kowari for RDF developers - Early Release It's already out there; I found this when Googling Kowari. The real site will be Kowari.org, but with open source the source is the real thing, I guess.

RSS for the Knowledge Worker

From the Metaweb to the Semantic Web: A Roadmap "At Radar Networks we have been working to define this ontology -- which we call "The Infoworker Ontology" -- with a goal of eventually contributing it to a standards body in the future. The Infoworker Ontology is a mid-level horizontal ontology that defines the semantics of common entities and relationships in the domain of knowledge work -- things like documents, events, projects, tasks, people, groups, etc. The development and adoption of an open, extensible, and widely-used Infoworker ontology is a necessary step towards making the Semantic Web useful to ordinary mortals (as opposed to academic researchers).

By connecting microcontent objects to the Infoworker Ontology a new generation of semantic-microcontent (what we call "metacontent") is enabled. With the right tools even non-technical consumers will be able to author and use metacontent. "

Thursday, December 11, 2003

The Early Days...of the Semantic Web

Early Days Of a Data-Sharing Revolution " And next week, a Chicago company plans to start selling a $36 mini-scanner dubbed "iPilot" that shoppers can use to scan bar codes on products in stores, then upload the data to a computer and compare prices at Amazon.com.

All are examples of how Web sites, relying on a new generation of Internet software, are licensing their databases to business partners and outside developers in an attempt to spark innovation and reach more customers.

"In the past six to nine months, we have started ramping up the program to license eBay's data," eBay Vice President Randy Ching said."

iPilot, eh? You'd think that with millions in venture capital they could do better than combining Apple's and Palm's product names.

Saturday, December 06, 2003

Metaweb

The Birth of "The Metaweb" -- The Next Big Thing -- What We are All Really Building "But RSS is just the first step in the evolution of the Metaweb. The next step will be the Semantic Web...The Semantic Web transforms data and metadata from "dumb data" to "smart data." When I say "smart data" I mean data that carries increased amounts of information about its own meaning, structure, purpose, context, policies, etc. The data is "smart" because the knowledge about the data moves with the data, instead of being locked in an application...The Semantic Web is already evolving naturally from the emerging confluence of Blogs, Wikis, RSS feeds, RDF tools, ontology languages such as OWL, rich ontologies, inferencing engines, triplestores, and a growing range of new tools and services for working with metadata. But the key is that we don't have to wait for the Semantic Web for metadata to be useful. The Metaweb is already happening."

Friday, December 05, 2003

More Visualization of RDF

Meta-Model Management based on RDFs Revision Reflection Breaking up the different types of RDF (schema and properties) is interesting, although it probably still has scaling problems (as do most tools of this type).

Styling RDF Graphs with GSS "One such solution is GSS (Graph Style Sheets), an RDF vocabulary for describing rule-based style sheets used to modify the visual representation of RDF models represented as node-link diagrams."

I would imagine that deriving histogram data from the graphs, and mapping that instead, might be more interesting and would scale better, much like how some image search engines work.
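One possible reading of the histogram idea: rather than laying out every node, summarise the graph by its degree distribution, which stays the same size however many statements there are. A minimal sketch, assuming a hypothetical `Edge` record (subject to object, predicates ignored); this is an illustration of the idea, not an existing RDF visualization tool.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DegreeHistogram {
    record Edge(String from, String to) {}

    // Summarises a graph as "how many nodes have degree d" instead of drawing every node.
    static TreeMap<Integer, Integer> histogram(List<Edge> edges) {
        Map<String, Integer> degree = new HashMap<>();
        for (Edge e : edges) {
            degree.merge(e.from(), 1, Integer::sum);
            degree.merge(e.to(), 1, Integer::sum);
        }
        TreeMap<Integer, Integer> hist = new TreeMap<>();
        for (int d : degree.values()) hist.merge(d, 1, Integer::sum);
        return hist;
    }

    public static void main(String[] args) {
        List<Edge> edges = List.of(new Edge("a", "b"), new Edge("a", "c"),
                                   new Edge("a", "d"), new Edge("b", "c"));
        System.out.println(histogram(edges)); // {1=1, 2=2, 3=1}
    }
}
```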

Al Gore - How I would've done it differently

FREEDOM AND SECURITY
""I want to challenge the Bush Administration’s implicit assumption that we have to give up many of our traditional freedoms in order to be safe from terrorists...In both cases they have recklessly put our country in grave and unnecessary danger, while avoiding and neglecting obvious and much more important challenges that would actually help to protect the country...In both cases, they have used unprecedented secrecy and deception in order to avoid accountability to the Congress, the Courts, the press and the people." "

"In other words, the mass collecting of personal data on hundreds of millions of people actually makes it more difficult to protect the nation against terrorists, so they ought to cut most of it out.""

Thursday, December 04, 2003

Semantic Merging

Skip This Rant and Read Shirky "Shirky sums up many metadata challenges with a concise statement: "it's easy to get broad agreement in a narrow group of users, or vice-versa, but not both." Hey, if you don't make your metadata structurally interoperable, you can't have semantic merging."

I found the diagram, "Content Enterprise Metadata: Structural Interoperability & Semantic Merging" to be quite instructive.
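Shirky's point can be made concrete in a few lines. A minimal sketch, assuming two hypothetical systems that describe the same document with different field names: a crosswalk onto shared (Dublin Core-style) names provides the structural interoperability, and only then can the records be merged. The field names and crosswalks are invented for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class MetadataMerge {
    // Structural interoperability: map each source's field names onto a shared vocabulary.
    static Map<String, String> normalise(Map<String, String> record, Map<String, String> crosswalk) {
        Map<String, String> out = new HashMap<>();
        record.forEach((field, value) -> {
            String shared = crosswalk.get(field);
            if (shared != null) out.put(shared, value); // unmapped fields are dropped
        });
        return out;
    }

    // Semantic merging: once fields line up, records about the same resource
    // can be combined, collecting every distinct value per shared field.
    static Map<String, Set<String>> merge(List<Map<String, String>> records) {
        Map<String, Set<String>> merged = new TreeMap<>();
        for (Map<String, String> r : records)
            r.forEach((k, v) -> merged.computeIfAbsent(k, x -> new TreeSet<>()).add(v));
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> cms = Map.of("headline", "Q4 results", "author_name", "J. Smith");
        Map<String, String> dam = Map.of("Title", "Q4 results", "Creator", "Jane Smith");
        Map<String, String> a = normalise(cms, Map.of("headline", "dc:title", "author_name", "dc:creator"));
        Map<String, String> b = normalise(dam, Map.of("Title", "dc:title", "Creator", "dc:creator"));
        System.out.println(merge(List.of(a, b)));
        // {dc:creator=[J. Smith, Jane Smith], dc:title=[Q4 results]}
    }
}
```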

Wednesday, December 03, 2003

Zeitgeist Mining

On RSS, Blogs, and Search "I've been thinking lately about the role of blogs and RSS in search, and that, of course, has led me to both the Semantic Web and to Technorati, Feedster, and many others. Along those lines, I recently finished a column for 2.0 on blogs and business information. I can't reveal my conclusions yet (my Editor'd kill me) but suffice to say, I find the intersection of blogging, search, and the business information market to be pretty darn interesting.
I'm certainly not alone. Moreover has created "Enterprise-Grade Weblog Search" - essentially, a zeitgeist mining tool for corporations. One can imagine similar products from any of the RSS search engines, or even from the major marketing agencies of the world."

JRDF

Well, it's not an impressive name, but after talking to the Jena and Sesame people it seems important to have a consistent Java binding for RDF.

JRDF is going to be based on the best bits from Sesame, Jena, Kowari and the RDF API Draft. Any other contributions would be welcome. Currently I've got the start of BlankNode, Graph, Literal, Node, NodeFactory, Statement and URIReference.
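A rough sketch of how types with those names might fit together. Everything here is hypothetical and loosely named after the list above; it is not the actual JRDF API. The point is that clients create nodes through a `NodeFactory` and only see interfaces, so each store could supply its own node classes. The subject is kept as a plain `Node` for brevity.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

public class JrdfSketch {
    // Hypothetical interfaces, loosely named after the types listed above;
    // this is not the actual JRDF API.
    interface Node {}
    interface URIReference extends Node { URI getURI(); }
    interface Literal extends Node { String getLexicalForm(); }
    interface BlankNode extends Node {}
    record Statement(Node subject, URIReference predicate, Node object) {}

    interface Graph {
        void add(Statement statement);
        boolean contains(Statement statement);
        int size();
    }

    // Clients create nodes through a factory, so each store (Kowari, Jena, ...)
    // could plug in its own node classes behind the same interfaces.
    interface NodeFactory {
        URIReference createURIReference(URI uri);
        Literal createLiteral(String lexicalForm);
        BlankNode createBlankNode();
    }

    // Simple in-memory implementations, purely for illustration.
    record MemURI(URI uri) implements URIReference {
        public URI getURI() { return uri; }
    }
    record MemLiteral(String lexicalForm) implements Literal {
        public String getLexicalForm() { return lexicalForm; }
    }
    static final class MemBlankNode implements BlankNode {} // identity equality

    static final class MemGraph implements Graph {
        private final Set<Statement> statements = new HashSet<>();
        public void add(Statement statement) { statements.add(statement); }
        public boolean contains(Statement statement) { return statements.contains(statement); }
        public int size() { return statements.size(); }
    }

    static final NodeFactory FACTORY = new NodeFactory() {
        public URIReference createURIReference(URI uri) { return new MemURI(uri); }
        public Literal createLiteral(String lexicalForm) { return new MemLiteral(lexicalForm); }
        public BlankNode createBlankNode() { return new MemBlankNode(); }
    };

    public static void main(String[] args) {
        Graph graph = new MemGraph();
        URIReference name = FACTORY.createURIReference(URI.create("http://xmlns.com/foaf/0.1/name"));
        BlankNode someone = FACTORY.createBlankNode();
        graph.add(new Statement(someone, name, FACTORY.createLiteral("Alice")));
        // URIs and literals compare by value, so an equal statement is found...
        System.out.println(graph.contains(new Statement(someone, name, FACTORY.createLiteral("Alice")))); // true
        // ...but a fresh blank node is a different node.
        System.out.println(graph.contains(
                new Statement(FACTORY.createBlankNode(), name, FACTORY.createLiteral("Alice")))); // false
    }
}
```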

One annoying thing is that the W3C specs for RDF no longer talk about models but about graphs. It may be odd for a Model to implement Graph; otherwise there might be a lot of renaming to do.

I've removed our implementation from Kowari and started using it instead. Still lots to do. Once I have Kowari done then it's Jena's turn.

Java Code

Abstract classes are not types "Have you ever seen code that declared a variable of type java.util.AbstractList? No? Why not? It's there, along with HashMap and TreeSet, etc. Because AbstractList is not a type. List is. AbstractList provides a convenient base from which to implement custom Lists. But I still have the option of implementing my own from scratch if I so choose. Maybe because I need some behaviour for my list such as lazy loading or dynamic expansion, etc. that wouldn't be satisfied by the default implementation."
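The quoted advice fits in a few lines of Java. A minimal sketch of the lazy-loading case it mentions, assuming a hypothetical `LazyList` whose loader function stands in for whatever actually fetches an element (a database row, say): it extends `AbstractList` for convenience, but callers only ever see `List`.

```java
import java.util.AbstractList;
import java.util.List;
import java.util.function.IntFunction;

public class LazyListDemo {
    // A List that computes (e.g. loads) each element only when first asked for it.
    // AbstractList supplies iterator(), contains(), equals(), etc. from get() and size().
    static final class LazyList<E> extends AbstractList<E> {
        private final Object[] cache;
        private final boolean[] loaded;
        private final IntFunction<E> loader;
        int loads = 0; // counts calls to the loader, for demonstration

        LazyList(int size, IntFunction<E> loader) {
            this.cache = new Object[size];
            this.loaded = new boolean[size];
            this.loader = loader;
        }

        @Override
        @SuppressWarnings("unchecked")
        public E get(int index) {
            if (!loaded[index]) { // array access also enforces the index bounds
                cache[index] = loader.apply(index);
                loaded[index] = true;
                loads++;
            }
            return (E) cache[index];
        }

        @Override
        public int size() { return cache.length; }
    }

    public static void main(String[] args) {
        LazyList<String> impl = new LazyList<>(1000, i -> "row-" + i);
        List<String> rows = impl;         // clients depend on the List type, not the base class
        System.out.println(rows.get(42)); // row-42
        System.out.println(rows.get(42)); // served from the cache this time
        System.out.println(impl.loads);   // 1
    }
}
```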

"cglib is a powerful, high performance and quality Code Generation Library, It is used to extend JAVA classes and implements interfaces at runtime."
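cglib generates subclasses at runtime. The JDK's own `java.lang.reflect.Proxy` covers only the second half of that description: it implements interfaces at runtime. As a dependency-free sketch of the idea (this uses `Proxy`, not cglib's API), here is a logging proxy around any interface; the `Greeter` interface and `logging` helper are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class ProxyDemo {
    // Hypothetical interface to proxy.
    public interface Greeter { String greet(String name); }

    // Creates, at runtime, a class implementing `type` whose methods record
    // their name in `log` and then delegate to `target`.
    @SuppressWarnings("unchecked")
    static <T> T logging(Class<T> type, T target, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.add(method.getName());
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow what the target itself threw
            }
        };
        return (T) Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Greeter proxied = logging(Greeter.class, name -> "Hello, " + name, log);
        System.out.println(proxied.greet("world")); // Hello, world
        System.out.println(log);                    // [greet]
    }
}
```

Extending a concrete class, as cglib does, needs bytecode generation that the JDK does not expose through `Proxy`.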

Tuesday, December 02, 2003

Blank Nodes

In RDF there are nodes: resources (with or without URIs) and literals. Resources without URIs are blank nodes, which may or may not be given a name (nodeID). A statement is made of a subject (a resource), a predicate (a resource with a URI) and an object (any node). Simple enough.
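The node kinds and statement rules above map naturally onto Java types. A minimal sketch with hypothetical types (not Kowari's, Jena's or Sesame's): because a `Statement` takes a `Resource` subject and a `UriRef` predicate, a literal subject or a blank-node predicate is rejected by the compiler rather than at run time. Blank nodes use identity equality, so two unnamed ones are never equal.

```java
import java.util.Objects;

public class StatementRules {
    interface Node {}                  // anything that can be an object
    interface Resource extends Node {} // anything that can be a subject

    record UriRef(String uri) implements Resource {}
    record Literal(String lexicalForm) implements Node {}

    // Identity equality: every blank node is distinct, named or not.
    static final class BlankNode implements Resource {
        final String nodeId; // optional label (may be null)
        BlankNode(String nodeId) { this.nodeId = nodeId; }
    }

    // Subject must be a resource, predicate a URI reference, object any node.
    // The parameter types enforce those rules at compile time.
    record Statement(Resource subject, UriRef predicate, Node object) {
        Statement {
            Objects.requireNonNull(subject, "subject");
            Objects.requireNonNull(predicate, "predicate");
            Objects.requireNonNull(object, "object");
        }
    }

    public static void main(String[] args) {
        BlankNode someone = new BlankNode(null); // unnamed blank node
        UriRef name = new UriRef("http://xmlns.com/foaf/0.1/name");
        Statement ok = new Statement(someone, name, new Literal("Alice"));
        System.out.println(ok.predicate().uri());
        // new Statement(new Literal("x"), name, someone); // would not compile
        // new Statement(someone, someone, name);          // would not compile
    }
}
```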

Now I've been looking across three separate Java implementations. I found 8 implementations of classes designed to use blank nodes (resources without URIs):
* Two in Kowari,
* BNode, BNodeImpl and BNodeNode.
* AResource, Node_Blank and RDFNode.

What is maddening is that each of them (well, 7 of them) has a different way to get its name. What's even more maddening is that this is right. Getting the name isn't part of the RDF model; the most these different blank node implementations should probably have in common is equality and a shared type (just a marker interface, like Serializable).
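One way to read "share a marker interface": a sketch with two hypothetical blank node classes that name themselves differently (a numeric id versus a string `nodeId`) but both define `equals`/`hashCode`, so generic code can compare and collect them without ever asking for a name. None of these classes come from Kowari, Sesame or Jena.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BlankNodes {
    // Marker interface: all a generic RDF layer needs is "this is a blank node"
    // plus equals/hashCode. How (or whether) a node is named is left to each store.
    interface BlankNode {}

    // Hypothetical store A: identifies nodes by a numeric id.
    record NumberedBlankNode(long id) implements BlankNode {
        String label() { return "_:b" + id; } // store-specific naming method
    }

    // Hypothetical store B: identifies nodes by a string nodeId.
    record NamedBlankNode(String nodeId) implements BlankNode {}

    // Generic code relies only on equality, never on a naming method.
    static int distinct(List<? extends BlankNode> nodes) {
        Set<BlankNode> seen = new HashSet<>(nodes);
        return seen.size();
    }

    public static void main(String[] args) {
        List<BlankNode> nodes = List.of(
                new NumberedBlankNode(1), new NumberedBlankNode(1),
                new NamedBlankNode("genid1"), new NumberedBlankNode(2));
        System.out.println(distinct(nodes)); // 3
    }
}
```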

Harpers.org brought to you by the Semantic Web

A New Website for Harper's Magazine "We cut up the Weekly Review into individual events (6000 of them, going back to the year 2000), and tagged them by date, using XML and a bit of programming. We did the same with the Harper's Index, except instead of events, we marked things up as “facts.”

Then we added links inside the events and facts to items in the taxonomy. Magic occurred: on the Satan page, for instance, is a list of all the events and facts related to Satan, sorted by time. Where do these facts come from? From the Weekly Review and the Index. On the opposite side, as you read the Weekly Review in its narrative form, all of the links in the site's content take you to timelines. Take a look at a recent Harper's Index and click around a bit—you'll see what I mean.

The best way to think about this is as a remix: the taxonomy is an automated remix of the narrative content on the site, except instead of chopping up a ballad to turn it into house music, we're turning narrative content into an annotated timeline. The content doesn't change, just the way it's presented."
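The mechanism described (tag events with dates and taxonomy links, then invert those links into per-topic timelines) can be sketched briefly. This assumes a hypothetical `Event` record rather than Harper's actual XML markup, and the events are placeholders.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class TaxonomyTimeline {
    // A dated event (or "fact") cut out of the narrative, with links to taxonomy terms.
    record Event(LocalDate date, String text, Set<String> terms) {}

    // The "remix": invert event -> terms links into term -> events, each list sorted by date.
    static Map<String, List<Event>> timelines(List<Event> events) {
        Map<String, List<Event>> byTerm = new TreeMap<>();
        for (Event e : events)
            for (String term : e.terms())
                byTerm.computeIfAbsent(term, t -> new ArrayList<>()).add(e);
        byTerm.values().forEach(list -> list.sort(Comparator.comparing(Event::date)));
        return byTerm;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event(LocalDate.of(2003, 11, 20), "Event B", Set.of("Satan", "Texas")),
                new Event(LocalDate.of(2001, 3, 2), "Event A", Set.of("Satan")));
        timelines(events).forEach((term, list) ->
                System.out.println(term + ": " + list.stream().map(Event::text).toList()));
        // Satan: [Event A, Event B]
        // Texas: [Event B]
    }
}
```

The narrative content is untouched; only the index from terms to events is new, which matches the "remix" description.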

"A small team of Java coders and I are planning to take the work done on Harper's, and in other places like Rhetorical Device, and create an open-sourced content management system based on RDF storage. This will allow much larger content bases (the current system will start to get gimpy at around 30 megs of XML content—fine for Harper's, but not for larger sites), and for different kinds of content to be merged."

I'll have to look at Samizdat, which "is a generic RDF-based engine for building collaboration and open publishing web sites." It seems to be the way things are heading.

Thursday, November 27, 2003

Simulated OS for teaching Assembly

Apoo is very similar to one of the uses of RCOSjava.

Jena2 Manager

"Briefly stated, I needed a means by which I could quickly hack models and ontologies to learn how to use Jena2. There remain many things to learn, and many things to finish coding in the program. I'm turning it loose so that others can contribute to its development. The JOSL license requires that those who fix things in the code or otherwise improve it return their code to the public. JOSL does not require that users use an open source license on new code that extends the licensed code."


On Ontologies and Gnomes

The AI gnomes of Zurich "McDermott ends with a zinger:

It's annoying that Shirky indulges in the usual practice of blaming AI for every attempt by someone to tackle a very hard problem. The image, I suppose, is of AI gnomes huddled in Zurich plotting the next attempt to --- what? inflict hype on the world? AI tantalizes people all by itself; no gnomes are required. Researchers in the field try as hard as they can to work on narrow problems, with technical definitions. Reading papers by AI people can be a pretty boring experience. Nonetheless, journalists, military funding agencies, and recently the World-Wide Web Consortium, are routinely gripped by visions of what computers should be able to do with just a tiny advance beyond today's technology, and off we go again. Perhaps Mr. Shirky has a proposal for stopping such visions from sweeping through the population."

The entry links to a paper which lists the things that the Semantic Web "violates" with respect to traditional assumptions about AI, including lack of referential integrity, variety in quality, diversity and the absence of a single authority. As noted, human intelligence faces the same problems.

Wednesday, November 26, 2003

webMethods going Semantic

Interview: webMethods CEO eyes Web services innovation "Secondly, there’s a whole other layer to deal with, what I call the semantic integration problem. Web services are great but they standardize pure connectivity between applications. The applications still have highly varied data models, extremely different ideas of what business processes should look like. Yet for most large organizations, a business process is going to span many applications. So you’re always going to need in the middleware stack something that can do wrapping, transformation, and, more than that, can actually keep the model of how the business processes are implemented across all of the infrastructure pieces. So [you need] something that’s technology-neutral underneath, like our Fabric product, and then on top have the ability to orchestrate business processes across all of these nodes in the fabric. Our customers now want to get real-time intelligence about what’s happening with the business and with the business processes, and they want to see it in dashboards, they want alerts. So we can put real-time monitoring around [IT infrastructure] at the business process level."

"We’re also able to offer enterprise event management, [injecting] business events into some kind of AI [artificial intelligence]-based rules engine."

Web Services in RDF

The question of how Web Services and the Semantic Web fit together came up again recently. Here are a few links to current work in the area:
* Semantic Web enabled Web Services,
* Government Semantic XML Web Services Community of Practice,
* BPEL2DAML-S,
* Esperanto,
* Meteor-S,
* Supercharging WSDL with RDF, and
* SWAD-Europe Thesaurus Activity.

Tuesday, November 25, 2003

Don't Panic

Neil Gaiman hitchhikes through Douglas Adams' hilarious galaxy

Only 3

Three Uses for the Semantic Web They include:
* Sideline Semantics (or how to cut down on those darn post-planning columns),
* The Policy Ontology, and
* Cross Domain Searching for Calendar Concerns.

The Policy Ontology was perhaps the most interesting:
"I decided to tackle this with an interface to Jena for Apache Cocoon, or to use Cocoon parlance, a Jena-based transformer. I had no idea what kind of systems sat behind virtual reference applications, but I did know the protocol used underneath the queries was based on SOAP, and Cocoon excels at inserting itself in between any XML stream and adding value to the contents. So my approach was to use Jena's inference capabilities to map different classification schemes based on relationships defined in either RDF Schema or OWL. Yes, you could do the same thing with a table or two, and a thousand other ways, but the ontology approach provides a formal syntax for defining relationships."

Seems similar in idea to Sherpa Calendar.

The application is WIBS.
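The Jena-based transformer itself isn't shown, but the core idea of the Policy Ontology entry (declare a few relationships between classification schemes and let inference find the rest) can be sketched without Jena. This is a minimal plain-Java stand-in, assuming a hypothetical `SchemeMapping` class that treats each declared link like `owl:equivalentClass` (symmetric and transitive); a real OWL reasoner does considerably more, and the terms are invented.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SchemeMapping {
    // Declared equivalences between terms in different classification schemes.
    private final Map<String, Set<String>> links = new HashMap<>();

    void equivalent(String a, String b) {
        links.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        links.computeIfAbsent(b, k -> new HashSet<>()).add(a); // symmetric
    }

    // Every term reachable through equivalence links (transitive closure via BFS).
    Set<String> mappingsOf(String term) {
        Set<String> seen = new HashSet<>(List.of(term));
        Deque<String> queue = new ArrayDeque<>(List.of(term));
        while (!queue.isEmpty()) {
            for (String next : links.getOrDefault(queue.poll(), Set.of())) {
                if (seen.add(next)) queue.add(next); // visit each term once
            }
        }
        seen.remove(term);
        return seen;
    }

    public static void main(String[] args) {
        SchemeMapping m = new SchemeMapping();
        // Hypothetical terms from three schemes; only pairwise links are declared.
        m.equivalent("schemeA:Astronomy", "schemeB:520");
        m.equivalent("schemeB:520", "local:Stars-and-Space");
        // Prints both other terms (HashSet order may vary).
        System.out.println(m.mappingsOf("schemeA:Astronomy"));
    }
}
```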

Fractally Yours

Openness & Interconnection "Big Fractal Tangle would be the name of a blog...A decade into the first Web, we've now got way too much information available, too much for any of us to sift through easily, which is why we need Round Two: the annotated, interconnected, Web. This new organic, evolving, maintainable, improvement will do more than simply increase the accuracy of our Google searches. It'll help real people understand and visualize interconnection, which in my opinion will alter our society profoundly for the better.

This was to be the point of my paper. Driving home that night, my brain frazzled and my voice hoarse from too much talk, I realized the topic was too big for a single paper. It'll have to be a blog."

See also: The Fractal nature of the Web.

Monday, November 24, 2003

CEUR Workshop Proceedings

CEUR Workshop Proceedings includes Semantic Integration, ICSW 2003 and Practical and Scalable Semantic Systems.

Sunday, November 23, 2003

Random RDF Tools

MusicBrainz Java API, RDFical (English version), MnM and FOAF Explorer - some of these I've covered before.

One Stop Schema Shop

SchemaWeb " SchemaWeb is a repository for RDF schemas expressed in the RDFS, OWL and DAML+OIL schema languages.
SchemaWeb is a place for developers and designers working with RDF. It provides a comprehensive directory of RDF schemas to be browsed and searched by human agents and also an extensive set of web services to be used by RDF agents and reasoning software applications that wish to obtain real-time schema information whilst processing RDF data. "

Who are you?

Gillmor Takes On Dvorak's Anti-Blog Stance ""Perseus thinks that most blogs have an audience of about 12 readers," Dvorak argues. Yes, John, but who are those 12? If one of them is Bill Gates, and another is Tony Scott, CTO of General Motors, and another is John Cleese, well you get the idea. Sometimes it's who you know as much as what. RSS only amplifies this, allowing a Ray Ozzie to post only when it's valuable to him and his readers. It's "You've got blog.""

Bill, Tony, John, give me some feedback then.
