Tuesday, September 30, 2003

Enhydra's Workflow

It seems that the Enhydra group is still working hard on some novel products (I'm a long-time fan of XMLC):

"Shark is completely based on standards from WfMC and OMG using XPDL as its native workflow definition format. Storage of processes and activities is done using Enhydra DODS."

Shark.

"JaWE (Java Workflow Editor) is the first open source graphical Java workflow process editor fully according to WfMC specifications supporting XPDL as its native file format and LDAP connections." Workflow Application Declaration and Workflow Process Definition.

JaWE.

Latest XPDL specification (PDF). Kowari will be shipping with the latest version of Barracuda.

Taxonomies and WinFS

Longhorn To Get NUI Foundation Platform "In other words, if you want to tell your Longhorn computer to find a file, you will have the option of saying the word "Find," evoking a dialog box and/or drop-down menu on your screen.

This kind of capability is predicated on a new type of file system, Lee says.

"First, we need a file system that is more of a structured database," Lee explains. "You can't reason with everything being a file type. That's why we need WinFS," the Windows File System at the heart of SQL Server "Yukon" (and the data-store component of WinFS that will be embedded in Longhorn), Lee says.

"Then, we need to ask, if the richer (data) store lets you reason about nouns, why not verbs ... like 'format,' 'delete,' 'print'?" Lee continues. "If you think about 'find' as a verb, then everything fits into a reasonable taxonomy.""

Also interesting was the Google File System paper.

Friday, September 26, 2003

Web of Trust

Trust Metrics (idea) "Trust metrics are going to be one of the building blocks of future online communities. Advogato Trust Metric has the most visible example, but the concept of a trust metric is of great importance in computer security as well as in building community."

Semantic Web Trust and Security Resource Guide.

US Government not Afraid of Semantics

Marking Up Bureaucracy "Much other work is currently under discussion, but very few individuals were willing to discuss upcoming implementations; that said, it can be inferred that many projects lean toward making publicly available information (like that found on FirstGov and on THOMAS) available via a public, web-services based API. And the occasional tantalizing PDF shows that sites which present information to the public are thinking in terms of taxonomies, which, along with agenda proceedings from a September 8, 2003 conference on "Semantic Technologies for eGov" indicate that the US Government is not shying away from the promise of the Semantic Web."

Tim Berners-Lee and the Semantic Web

An Interview with Tim Berners-Lee
Whither the world wide web?

Thursday, September 25, 2003

Metaweb

Metaweb "The Metaweb is a collaborative structure for learning. In our first phase, we are annotating the ideas and historical period explored in Neal Stephenson's novel Quicksilver, seeding the Metaweb with an initial base of information. We are currently working on 109 articles, and hope you will expand and relate these and many other entries."

Wednesday, September 24, 2003

Spinning the iLife

Berners-Lee: Web inventor endorses Safari "During a lecture last night at the Royal Society in London, Tim Berners-Lee revealed that he invented the World Wide Web using a NeXT computer. He presented his lecture using Apple's OS X Web browser Safari on a PowerBook. He also referenced the Web's potential by talking about the possibilities of iCal, Apple's calendar program.

Berners-Lee was discussing one of the Web's many futures, in what he calls the Semantic Web. Devised by himself, this is due to become the next big thing, he told the audience. It will enhance the supply and exchange of information and data for the benefit of the Web user."

Berners-Lee Talks Up Semantic Web "What if the World Wide Web were one giant database, linking both human readable documents and machine readable data in a way useful to both mankind and machine?"

"By implementing products based on RDF as an EAI "hub," companies can link together documents, and data stored in disparate databases, and pull related concepts together when analyzing the information. That sort of thing can be done with XML Web services today, but it can be a laborious task, Berners-Lee explained."

Tuesday, September 23, 2003

Another Australian Company using RDF

Langdale Consultants offers consultation to enhance production and delivery systems. They have developed products such as I-Builder and they use an application of RDF called CIM/XML.

Semantic Web Economically Infeasible

Distributed Computing Economics and the Semantic Web "Now along comes Gray, making an argument that, when you think about it, implies that the semantic web, as currently conceived, might just be all wrong. His basic point is that it's far cheaper to vend high-level apis than give access to the data (because the cost of shipping large amounts of data around is prohibitive). Since the semantic web is basically a data web, one wonders: why doesn't Gray's argument apply?"

Early on, I came across Open Source Distributed Capabilities, Smart Contracts and the idea of Agoric Computation. It formed the basis of Mariposa and was mentioned in an article in Wired called "The New Economy of Computation". It mentions that agoric systems have been described since the late 80s.

Agoric Open Systems "Agoric systems should form an attractive knowledge medium. In a large, evolving system, where the participants have great but dispersed knowledge, an important principle is: "In the incentive structure lies the power". In particular, the incentives of a distributed, charge-per-use market can widen the knowledge engineering bottleneck by encouraging people to create chunks of knowledge and knowledge-based systems that work together."

IT is Rocket Science (or at least Rocket Management)

I've been reading bits and pieces of SP-4221 The Space Shuttle Decision. It's a great insight into the management of projects and also a source of technical details such as why aluminum was picked over titanium, why it has a large delta wing, etc.

Some interesting quotes:
"Troubleshooting, also, was hit-and-miss. We all have had the experience of taking a car to a garage for repair, having a mechanic replace a part, paying the bill - and finding that the problem remains unsolved. Such experiences were also common in the airline industry. The American Airlines managers wrote that...over a recent six-month period, 44 percent of the components replaced during maintenance of the air conditioning system did not eliminate the pilot's complaint. Fifty-two percent of the replacements in the autopilot system did not eliminate the pilot's complaint. "

And:
"An in-house review...showed that NASA's principal automated spacecraft programs had increased in price by more than threefold...Gemini had gone from an initial estimate of $529 million, late in 1961, to a final expenditure of $1.283 billion. Apollo, with a program cost estimated at $12.0 billion in mid-1963, ballooned to $21.35 billion by the time of the first moon landing in July 1969...What had caused these overruns? Here too, cost meant people. Major overruns resulted when large technical staffs drew salaries to little effect, as when projects encountered technical stumbling blocks, forcing major redesigns. Such difficulties brought delays and pushed up costs by wasting much of the earlier work."

James Gosling on Checked Exceptions

Failure and Exceptions "Having one big catch clause on the outside really only works if your exception handling philosophy is simply to die...But pretty much all that a try catch block like that can do is blow the request away. There's no ability to respond gracefully. There's no ability to take account of local context to cope and adapt, which is really one of the key hallmarks of truly reliable software.

Bill Venners: It adapts to problems?

James Gosling: Instead of just rolling over and dying. "
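
A minimal sketch of the contrast Gosling draws, written in Python rather than Java. The function names (`parse_port`, `handle_all`, `handle_locally`) and the port-number scenario are made up for illustration and do not come from the interview. The outer catch can only discard the whole request, while the local handler has enough context to substitute a default and carry on.

```python
def parse_port(raw, default=8080):
    """Parse one port value, coping locally with bad input."""
    try:
        return int(raw)
    except ValueError:
        # Local context: we know a sensible fallback for this one value.
        return default


def handle_all(raw_ports):
    """The 'one big catch' style: any bad value blows the request away."""
    try:
        return [int(raw) for raw in raw_ports]
    except ValueError:
        return None  # nothing left to do but give up


def handle_locally(raw_ports):
    """Each value is handled where the failure is understood."""
    return [parse_port(raw) for raw in raw_ports]


if __name__ == "__main__":
    print(handle_all(["80", "not-a-port"]))
    print(handle_locally(["80", "not-a-port"]))
```

Whether a default is the right response depends on the application; the point is only that the decision is made where the failure is understood.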

Monday, September 22, 2003

Standards moi?

Funky to Go "The power behind Google is that the company owns the algorithms used to find data from the featureless mess of HTML that exists today. The more sophisticated the data storage, the less important the algorithms, and the less edge that Google has. Microsoft, by controlling the origination of much of this data can build in the missing knowledge about the data and basically undercut the ground on which the House of Google is written...But I wouldn't be surprised if Microsoft isn't working on creating it's own metadata and ontology XML vocabulary and data model, one that it will share with others, of course, putting it at the center of knowledge-based query in the years to come."

Microsoft pushing its own standard over a W3C one? That just seems unlikely.

Saturday, September 20, 2003

Friday, September 19, 2003

Joseki 2.0 Released

Joseki: The Jena RDF Server Includes: new query languages HTTP GET, the Joseki distribution provides RDQL, "Fetch", and a minimal query language "SPO", Joseki hosts models provided by Jena2, including inferencing models, extensible client libraries (different models and different query languages), and client libraries for Java and Python.

Bayesian Network Classifier Toolbox

jBNC is a Java toolkit for training, testing, and applying Bayesian Network Classifiers. Implemented classifiers have been shown to perform well in a variety of artificial intelligence, machine learning, and data mining applications.

Water

While Australia is suffering from a drought vortex, America is suffering the opposite.

From the Catalyst story:
"The water supplies of Melbourne and Adelaide are well below 50% capacity and in Perth their reservoirs are less than a quarter full. If the next 18 months are as dry as the last, these cities and their six million residents face a water crisis."

"The heatwaves and fires that we experienced in Australia recently and in Europe currently are indeed a glimpse of the future. We would expect more heatwaves, more droughts and of course a greater stress on people living in cities."

"But while the rest of the world gets warmer due to greenhouse gases, Antarctica is cooling due to ozone loss. So there’s a bigger difference in temperature between the equator and poles. The combined effect of ozone and greenhouse, it seems, is making the polar vortex spin faster."

There have been some simple proposals, much more reasonable than an Armageddon-like ending for hurricanes.

Establishing the Loosely Coupled

Unweaving the tangled web of dumb data "Semantic encoding can be particularly useful in inference engines. Encouraging relationships between pieces of information enables you to analyse that information for new relationships...Using such technologies within the corporate firewall is one thing, but building a whole new web based on them is quite another. If we could create a second generation web using semantic technology, the benefits would be huge...The semantic web is not likely to hit your browser any time soon, but the semantic intranet just might. The underlying technology has been on the agenda since the mid-to-late 1990s, but it is now starting to move from theory into commercial products as companies begin to release RDF-capable knowledge management systems and inference engines. UK-based Inference Networks is one such firm, and in the US, Amblit Technologies has a semantic browser, and Intellidimension has an RDF data management system."

Thursday, September 18, 2003

Confluence Alpha 1

Some of the features include:
* Content is organised into discrete spaces.
* Permissioning per space, on a user or group basis.
* Textile-based text formatting.
* Page templating allowing rapid creation of boiler plate pages.
* Exporting of a whole space or single page to PDF or HTML.
* Dump/restore of the database to XML, with daily backup option.
* Full text searching across all pages visible to a user.
* Multiple RSS feeds for the application and each space.
* Importing of page content from plain text files.
* Attach arbitrary files to any page.
* Tracking of all internal and external links.
* Flexible user and group management.

http://atlassian.com/software/confluence/

Wednesday, September 17, 2003

Semantic Technologies for E-Government

This was a recent one day seminar held by TopQuadrant at the White House Conference Center that included "...solution stories from both vendors of semantic technologies and agencies that are already using them in applications."

The proceedings included two papers: Semantic Technology Briefing and Ontology Myth or Magic.

The applications for semantic technology listed were "Answer Engine", "Automated Content Tagger", "Concept-base Search", "Connection and Pattern Explorer", "Content Annotator", "Context-Aware Retriever", "Dynamic User Interface", "Enhanced Query Search", "Expert Locator", "Generative Documentation", "Interest-based Information Delivery", "Navigational Search", "Product Design Assistant", "Semantic Data Integrator", "Semantic Form Generator and Results Classifier", "Semantic Service Discovery and Choreography" and "Virtual Consultant".

Also mentioned were ontology uses, comparison of tools, lifecycle, etc. and vendors like Ontoprise, Network Inference, Semagix, Lockheed Martin, Plugged In Software and Unicorn.

Shelley Powers' RDF book and others (most of them by Dieter Fensel) were listed, many of which I seem to have sitting on my shelf.

Tuesday, September 16, 2003

ISWC 2003

"In the context of new work on distributed computation, Semantic Web Services (SWSs) go beyond current services by adding ontologies and formal knowledge to support description, discovery, negotiation, mediation and composition. This formal knowledge is often strongly related to informal materials. For example, a service for multi-media content delivery over broadband networks might incorporate conceptual indices of the content, so that a smart VCR (such as next generation TiVO) can reason about programmes to suggest to its owner. Alternatively, a service for B2B catalogue publication has to translate between existing semi-structured catalogues and the more formal catalogues required for SWS purposes. To make these types of services cost-effective we need automatic knowledge harvesting from all forms of content that contain natural language text or spoken data."

The winner of the Semantic Web Challenge will be announced, along with papers and demos. Related to the Semantic Web Enabled Web Services homepage.

Pen Computing Destroys the English Advantage

Why pen computing could put you out of business "Being born in an English-speaking nation has, for the last century or so, meant never having to say you're sorry. People all over the world are still dying to try out their English on you... At some level, I'm all for technology lowering the linguistic barriers to communication and commerce...Globalisation is happening, for good or ill, bringing with it something we English-speaking first-worlders will just have to deal with: Tomorrow, we'll be a lot less special than we are today."

Enough Meta?

Metanology Joins Open-Source Tools Group "Our development team was able to produce an advanced programming tool with features that exceed those provided by non-Eclipse based competitors. Instead of spending our programming effort creating infrastructure, we were able to focus on the features of MDE that would create value for our users."

Uche Ogbuji wrote a paper called XML, The Model Driven Architecture, and RDF which briefly outlines MDA and suggests how RDF and RDFS could be used instead of XML and XSLT. UML 2.0 and RDF lists several other papers that may be of interest.

Sunday, September 14, 2003

Semantic Web Integration Start-Up

Metatomix(TM), Inc. Closes $8.3 Million Venture Round to Bring Real-Time Visibility and Integration Software Technologies to Commercial and Government Markets "Metatomix, founded in December 2000, has developed its Real-Time Visibility and SMARTE(TM) (Surveillance Monitoring and Real-Time Events) Suites based on its innovative application of Semantic Web-based architectures. The Semantic Web extends the Internet by allowing data-driven interactions to enable greater access to all types of information. This evolution extends the original Web concept of people interacting with computers to the sea changing philosophy of machines interacting with machines. Metatomix has harnessed this data-driven network computing technique to deliver real-time information integration and visibility software for commercial applications."

Metatomix has a piece of software they call the Hologram Store which "...is stored in a form that allows the data to be queried and coalesced from a variety of perspectives. The data is captured and expressed in Resource Description Framework (RDF), a W3C (World Wide Web Consortium) standard that is a form of XML. By using this data format, new data sources are easily added without the requirement to redesign or reprogram the data model."
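
To make the "no redesign" claim concrete, here is a rough sketch of why statement-based data absorbs new sources easily. It is not Metatomix's implementation: the two sources, the identifiers and the `match` helper are all invented for illustration, and triples are modelled as plain Python tuples rather than with an RDF library.

```python
# Each statement is a (subject, predicate, object) triple; all names are made up.
crm = {
    ("cust:42", "name", "Acme Pty Ltd"),
    ("cust:42", "city", "Brisbane"),
}
billing = {
    ("cust:42", "balance", "1200"),  # a predicate the first source never used
    ("cust:43", "name", "Globex"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return {
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


# Adding a source is a set union: no table or schema has to change.
merged = crm | billing
about_42 = match(merged, s="cust:42")

if __name__ == "__main__":
    for triple in sorted(about_42):
        print(triple)
```

The new `balance` predicate simply shows up next to the existing statements about `cust:42`, which is the property the quote is pointing at.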

Saturday, September 13, 2003

There's Gold in them thar Semantics

"Beyond the great wall of data on the Internet lies a goldmine for enterprises called the Semantic Web.

Based on standards pioneered by the W3C, the Massachusetts Institute of Technology, Hewlett-Packard (Quote, Company Info) and a network of grassroots communities, the Semantic Web uses the Resource Description Framework (RDF) to piece together a variety of applications using XML for syntax and URLs for naming."

Teknowledge is featured. They hosted the AAAI-2002 and were mentioned before in relation to DARPA grants.

UDEF

e-Business & Web Services: The Missing Semantic Metadata Link "The Universal Data Element Framework (UDEF) is a cross-industry metadata identification strategy designed to facilitate convergence and interoperability among e-business and other standards. The objective of the UDEF is to provide a means of real-time identification for semantic equivalency, as an attribute to data elements within e-business document and integration formats."

RDF in Cellphones

A recent discussion on RDFIG "I work for a cellphone firmware company, where I have been pitching the idea that we would be smart to start putting some semweb code into our product. If anybody is curious, I'd like to briefly describe my strategy and hear any feedback you folks might have...The internal marketing involves finding pre-deployment internal apps. A few of these are PIM data, putting timestamps and text annotations on photos and sound bites, and metadata descriptors for internal applets. There is a lot of interest in tracking user preferences."

Should the RDF model be integrated into the File System?

We had a recent "discussion" at work about the BeOS file system, NTFS and the advantages that metadata integration in the operating system brings. Most modern file systems (in the last 10 years) have had some metadata support: HPFS, BFS, Reiser4 and NTFS (in 2000 and above) and the forthcoming WinFS. There's even a proposal to extend OS X's metadata support.

Interestingly, "...NTFS5 implements general indexing, which lets NTFS5 store arbitrary data in indexes and sort the data entries by something other than a name. NTFS5 uses general indexing to manage security descriptors, quota information, reparse points, and file object identifiers..."

For examples of the power that BeOS's implementation gave there's the very popular Tales of a BeOS Refugee. BeOS's file system and translators had a direct implication on how applications were developed.

Or is applying metadata to file systems similar to the definition of the Semantic Web: "An attempt to apply the Dewey Decimal system to an orgy."

Friday, September 12, 2003

Results from Querying RDF

Recording Query Results "This document describes a way to record the results of queries where the queries are from languages that return bound variables. The recording of the results is in RDF, enabling graph comparison to be used for testing whether two sets of query results are equivalent."
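
A simplified sketch of the testing idea, assuming each result row is a mapping from variable name to bound value. The helper names and sample data are hypothetical, and this compares plain sets of bindings rather than RDF graphs, so it sidesteps the blank-node matching that real graph comparison has to deal with.

```python
def as_solution_set(rows):
    """Turn a list of {variable: value} rows into an order-independent set."""
    return {frozenset(row.items()) for row in rows}


def same_results(rows_a, rows_b):
    """True when both result sets contain the same bindings, in any order."""
    return as_solution_set(rows_a) == as_solution_set(rows_b)


run_1 = [{"x": "ex:a", "y": "ex:b"}, {"x": "ex:c", "y": "ex:d"}]
run_2 = [{"y": "ex:d", "x": "ex:c"}, {"x": "ex:a", "y": "ex:b"}]  # reordered
run_3 = [{"x": "ex:a", "y": "ex:b"}]  # one row missing

if __name__ == "__main__":
    print(same_results(run_1, run_2))
    print(same_results(run_1, run_3))
```

Note that using a set collapses duplicate rows, so this sketch would not notice a query that returns the same solution twice.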

I'm not sure whether I'd read this before or not. It will be good to get a proper HTTP client in with Joseki support for Kowari.

Fractals (again)

Data visualisation: is it coming of age? "Fractal Edge's unique approach to data visualisation has enabled the power of the underlying data to be greatly enhanced. Data is better explained and understood within a defined vision or context. Huge volumes of rapidly changing information may be presented on screen in visual form without losing detail. Bottlenecks in the access and delivery mechanisms for data are eliminated. Data may be colour coded for ease of macro presentation and identification purposes and the structure of the data may be easily arranged depending on criteria important to the user.

Fractal Edge applications map data neutrally. They can aggregate multiple proprietary and third party data sources to bring a combined view of relevant sources of information. Basic adapters are available to CSV and Excel and more sophisticated adapters to Windows and Bloomberg, the financial data and market information providers. One-way and two-way links are available to data sources. Fractal applications allow launch of functionality from underlying applications."

WebFountain (again)

Here's another article on IBM's Web Fountain: " The result is a new online service called WebFountain. A big computer at IBM hoovers up web pages and information from other sources such as newsgroups, syndicated content and newswires. Each incoming page is analysed to determine what language it is in. The context—a news report, a page on a company's website, a web-log entry—is determined. Verbs, nouns, adjectives, proper nouns, place names and even entire phrases are extracted, and are analysed for positive or negative connotations. The page is also classified by category—is it about baseball, Iranian politics or global warming?"

Sunday, September 07, 2003

Querying RDF

A TALES syntax for RDF "It is becoming increasingly important to have the ability to link documents along more dimensions than just hyperlinks and to specify partial metadata. The standard model for this is the W3C's RDF. Using RDF, a cross-planet, distributed relational system can be set up, relating resources by predicates...We feel that TALES provides a system for a concise, yet explanatory format for RDF queries. With the current plans for an RDF layer in Plone, it is felt that developing a good syntax for this tool to use in formulating its queries would be beneficial."

Pondering RDF Path "Most RDF Path proposals to date are sketchy, and do not provide clear equivalents of the facilities in XPath, or do not account for the fact that selections of a resource occurring as a node and an arc causes problems."

RDF-Rules mailing list.

Saturday, September 06, 2003

RDFT

RDF Templates "RDF Templates (RDFT) are an XML format for creating representations of RDF graphs. In a similar way to XSLT, RDF Templates define template rules with patterns which are matched against nodes. Template rules specify output actions and further node selections which trigger further template operation. However, instead of acting on an XML tree, RDFT acts upon an RDF graph. Nodes are specified using a 'nodepath' syntax which defines conditional node/arc/node graph traversals. A macro definition facility is provided to reduce long nodepaths to easier to read strings."
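
The node/arc/node traversal behind a nodepath can be sketched roughly as below. This does not use RDFT's actual nodepath syntax, which isn't quoted here; the `follow` helper, the list-of-arcs representation and the sample graph are assumptions made purely to show the shape of the traversal.

```python
def follow(triples, start, arcs):
    """Follow a sequence of arc (predicate) names outward from a start node."""
    current = {start}
    for arc in arcs:
        # Step node -> arc -> node for every node reached so far.
        current = {o for (s, p, o) in triples if s in current and p == arc}
    return current


# A tiny made-up graph of (subject, predicate, object) triples.
graph = {
    ("ex:doc", "creator", "ex:alice"),
    ("ex:doc", "creator", "ex:bob"),
    ("ex:alice", "name", "Alice"),
    ("ex:bob", "name", "Bob"),
    ("ex:doc", "title", "RDF Templates"),
}

if __name__ == "__main__":
    print(sorted(follow(graph, "ex:doc", ["creator", "name"])))
```

Unlike a path over an XML tree, a step here can fan out to several nodes at once, which is why the helper tracks a set of current nodes rather than a single one.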

The Future is Open

IBM Pushes Mainstream Linux Acceptance With New Television Ad links to an advertisement by IBM on Linux.

The transcript includes quotes such as "I think you should see this...It’s just a kid.", "One little thing can solve an incredibly complex problem.", "Knowledge amplification. What he learns, we all learn. What he knows, we all benefit from.", "Plumbing, it’s all about the tools." and "THE FUTURE IS OPEN". Remind you of anything else?

Thursday, September 04, 2003

Scooping Yourself

"TKS has been available commercially for two years, but is in the final stages of release under the Mozilla Public License, version 1.1. Release under the MPL is anticipated in October, 2003."

Cool, this isn't the currently distributed version of TKS either. This is the new, you-beaut 64-bit version, with streaming results and disk based queries. In the next few days there should be some more information about this. I'd like to get some sort of paper for WWW2004 done - maybe based on the data structures used to store RDF.

The current name I like is ocelot.

Reading

Practical Artificial Intelligence Programming in Java is one of a few Open Content books written by Mark Watson that are available for download. Of interest were the short sections on finding paths in graphs and WEKA.

3store: Efficient Bulk RDF Storage.

HT03 Complete List of Papers includes: The Connectivity Sonar: Detecting Site Functionality by Structural Patterns, Which Semantic Web?, Finding the Story - Broader Applicability of Semantics and Discourse for Hypermedia Generation, Collage, Composites, Construction, and Refinement of TF-IDF Schemes for Web Pages using their Hyperlinked Neighboring Pages.

AAMAS'03 Workshop on Web Services and Agent-based Engineering includes: Semantic web services interaction "For this purpose, we propose in this paper a new paradigm for cooperation and coordination between web services disguised as software agents.", KAoS Semantic Policy and Domain Services: An Application of DAML to Web Services-Based Grid Architectures "KAoS is a collection of componentized services compatible with several popular agent and distributed computing frameworks, including Nomads [21; 22], the DARPA CoABS Grid [17], the DARPA ALP/Ultra*Log Cougaar framework [29], Brahms [20], Voyager [30], CORBA [31]—and now GT3." (they plan to move to OWL), Brokerage for Mathematical Services in MONET "We describe here the approach to service brokerage being explored as part of the MONET project, whose larger aim is to demonstrate the applicability of the semantic web to the domain of mathematical software." and Composing Workflows of Semantic Web Services.

Tuesday, September 02, 2003

Streaming XML

"The Streaming API for XML (StAX) parsing will specify a Java-based, pull-parsing API for XML. The streaming API gives parsing control to the programmer by exposing a simple iterator based API. This allows the programmer to ask for the next event (pull the event) and allows state to be stored in a procedural fashion."

More information: JSR 173 Streaming API for XML, Specification and Reference Implementation and an article from Oracle comparing DOM, SAX and StAX Parsing.
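
StAX itself is a Java API, so the following is only an analogue of the pull style described in the quote above, using Python's standard `xml.dom.pulldom` module. The sample document and the `item_texts` helper are made up for illustration. The calling code asks for each event and decides what to do with it, instead of registering callbacks as with SAX.

```python
from xml.dom import pulldom

# A made-up sample document.
XML = "<feed><item>StAX</item><item>SAX</item><other>skip</other></feed>"


def item_texts(xml_text):
    """Pull events one at a time and collect the text of each <item>."""
    events = pulldom.parseString(xml_text)
    texts = []
    for event, node in events:  # each iteration pulls the next event
        if event == pulldom.START_ELEMENT and node.tagName == "item":
            # Ask the parser to read the rest of this element into the node.
            events.expandNode(node)
            text = "".join(
                child.data
                for child in node.childNodes
                if child.nodeType == child.TEXT_NODE
            )
            texts.append(text)
    return texts


if __name__ == "__main__":
    print(item_texts(XML))
```

Because the loop owns the control flow, any state (here just the `texts` list) lives in ordinary local variables, which is the "procedural fashion" the specification text refers to.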

Will the Role of the Relational Model Expand or Contract?

"If you're reading this and thinking "all he's saying is that lucene is useful for indexing xml fragments," you're halfway there. And If you're an XML lunatic who then says "Hey! Wow! And the world, in its entirety, is entirely composed of XML (or possibly RDF) fragments," then you've gone way too far. What I know is that the world is mostly made up of semi-structured data and I know that database schemas often evolve at a ferocious rate because, when we impose more structure, we often get it wrong.

And so now what I'm wondering is if I was completely off base in 1997. That is, I'm wondering if Moore's law really says that relational databases are going to become vastly less important over time, because for most applications there's a less-structured (and less efficient) way to do things that's more convenient for the programmers."

Whither (wither) the Relational Database

Well, being a Semantic Web blogger I'd have to say that the relational model, as it is applied in RDF, is only going to increase in usage not decrease. Things like JDBM (used for the backend of LDAPd) and hopefully TKS will become increasingly commoditized into being just something you throw in like logging.

Monday, September 01, 2003

What's in a name?

I'm trying to think of what to call an open source project. I started thinking of Australian animals, especially ones that start with "j" like jumbuck or jabiru. Jabiru was good because it's the biggest stork, stork -> store, wading in data, etc. I got all excited over "tyuk" which sounds like "chook" as both mean chicken. Lots of chicken references. Pretty sad. Any ideas? Taipan? Cod? Krill?