Thursday, November 27, 2003

Simulated OS for teaching Assembly

Apoo is very similar to one of the uses of RCOSjava.

Jena2 Manager

"Briefly stated, I needed a means by which I could quickly hack models and ontologies to learn how to use Jena2. There remain many things to learn, and many things to finish coding in the program. I'm turning it loose so that others can contribute to its development. The JOSL license requires that those who fix things in the code or otherwise improve it return their code to the public. JOSL does not require that users use an open source license on new code that extends the licensed code."

Jena2 Manager

On Ontologies and Gnomes

The AI gnomes of Zurich "McDermott ends with a zinger:

It's annoying that Shirky indulges in the usual practice of blaming AI for every attempt by someone to tackle a very hard problem. The image, I suppose, is of AI gnomes huddled in Zurich plotting the next attempt to --- what? inflict hype on the world? AI tantalizes people all by itself; no gnomes are required. Researchers in the field try as hard as they can to work on narrow problems, with technical definitions. Reading papers by AI people can be a pretty boring experience. Nonetheless, journalists, military funding agencies, and recently the World-Wide Web Consortium, are routinely gripped by visions of what computers should be able to do with just a tiny advance beyond today's technology, and off we go again. Perhaps Mr. Shirky has a proposal for stopping such visions from sweeping through the population."

The entry links to a paper that lists the ways the Semantic Web "violates" traditional assumptions about AI, including lack of referential integrity, variety in quality, diversity and no single authority. As noted, human intelligence has the same problems.

Wednesday, November 26, 2003

webMethods going Semantic

Interview: webMethods CEO eyes Web services innovation "Secondly, there’s a whole other layer to deal with, what I call the semantic integration problem. Web services are great but they standardize pure connectivity between applications. The applications still have highly varied data models, extremely different ideas of what business processes should look like. Yet for most large organizations, a business process is going to span many applications. So you’re always going to need in the middleware stack something that can do wrapping, transformation, and, more than that, can actually keep the model of how the business processes are implemented across all of the infrastructure pieces. So [you need] something that’s technology-neutral underneath, like our Fabric product, and then on top have the ability to orchestrate business processes across all of these nodes in the fabric. Our customers now want to get real-time intelligence about what’s happening with the business and with the business processes, and they want to see it in dashboards, they want alerts. So we can put real-time monitoring around [IT infrastructure] at the business process level."

"We’re also able to offer enterprise event management, [injecting] business events into some kind of AI [artificial intelligence]-based rules engine."

Web Services in RDF

The question of how Web Services and the Semantic Web fit together came up again recently. Here are a few links to current work in the area:
* Semantic Web enabled Web Services,
* Government Semantic XML Web Services Community of Practice,
* Esperanto,
* Meteor-S,
* Supercharging WSDL with RDF, and
* SWAD-Europe Thesaurus Activity.

Tuesday, November 25, 2003

Don't Panic

Neil Gaiman hitchhikes through Douglas Adams' hilarious galaxy

Only 3

Three Uses for the Semantic Web They include:
* Sideline Semantics (or how to cut down on those darn post-planning columns),
* The Policy Ontology, and
* Cross Domain Searching for Calendar Concerns.

The Policy Ontology was perhaps the most interesting:
"I decided to tackle this with an interface to Jena for Apache Cocoon, or to use Cocoon parlance, a Jena-based transformer. I had no idea what kind of systems sat behind virtual reference applications, but I did know the protocol used underneath the queries was based on SOAP, and Cocoon excels at inserting itself in between any XML stream and adding value to the contents. So my approach was to use Jena's inference capabilities to map different classification schemes based on relationships defined in either RDF Schema or OWL. Yes, you could do the same thing with a table or two, and a thousand other ways, but the ontology approach provides a formal syntax for defining relationships."

Seems similar in idea to Sherpa Calendar.

The application is WIBS.

Fractally Yours

Openness & Interconnection "Big Fractal Tangle would be the name of a blog...A decade into the first Web, we've now got way too much information available, too much for any of us to sift through easily, which is why we need Round Two: the annotated, interconnected, Web. This new organic, evolving, maintainable, improvement will do more than simply increase the accuracy of our Google searches. It'll help real people understand and visualize interconnection, which in my opinion will alter our society profoundly for the better.

This was to be the point of my paper. Driving home that night, my brain frazzled and my voice hoarse from too much talk, I realized the topic was too big for a single paper. It'll have to be a blog."

See also: The Fractal nature of the Web.

Sunday, November 23, 2003

Random RDF Tools

MusicBrainz Java API, RDFical (English version), MnM and FOAF Explorer - some of these I've covered before.

One Stop Schema Shop

SchemaWeb "SchemaWeb is a repository for RDF schemas expressed in the RDFS, OWL and DAML+OIL schema languages.
SchemaWeb is a place for developers and designers working with RDF. It provides a comprehensive directory of RDF schemas to be browsed and searched by human agents and also an extensive set of web services to be used by RDF agents and reasoning software applications that wish to obtain real-time schema information whilst processing RDF data."

Who are you?

Gillmor Takes On Dvorak's Anti-Blog Stance ""Perseus thinks that most blogs have an audience of about 12 readers," Dvorak argues. Yes, John, but who are those 12? If one of them is Bill Gates, and another is Tony Scott, CTO of General Motors, and another is John Cleese, well you get the idea. Sometimes it's who you know as much as what. RSS only amplifies this, allowing a Ray Ozzie to post only when it's valuable to him and his readers. It's "You've got blog.""

Bill, Tony, John, give me some feedback then.

Saturday, November 22, 2003

Exceptional Exceptions

Best Practices for Exception Handling "When deciding on checked exceptions vs. unchecked exceptions, ask yourself, "What action can the client code take when the exception occurs?"

If the client can take some alternate action to recover from the exception, make it a checked exception. If the client cannot do anything useful, then make the exception unchecked. By useful, I mean taking steps to recover from the exception and not just logging the exception."

"Preserve encapsulation.

Never let implementation-specific checked exceptions escalate to the higher layers. For example, do not propagate SQLException from data access code to the business objects layer."
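The article is about Java, where the checked/unchecked distinction exists. Python has no checked exceptions, but the "preserve encapsulation" advice carries over. The following is a minimal sketch, assuming a made-up `customers` table and a hypothetical `CustomerLookupError`: the data access function catches the driver's exception and raises a domain-level one, chaining the original so the root cause is not lost.

```python
import sqlite3


class CustomerLookupError(Exception):
    """Domain-level error (hypothetical); hides which storage technology failed."""


def find_customer_name(conn, customer_id):
    """Data access function: translate driver errors at the layer boundary."""
    try:
        row = conn.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
    except sqlite3.Error as exc:
        # "from exc" keeps the original exception attached as __cause__.
        raise CustomerLookupError(
            f"could not look up customer {customer_id}"
        ) from exc
    return row[0] if row else None


# An empty in-memory database has no "customers" table, so the query fails.
conn = sqlite3.connect(":memory:")
try:
    find_customer_name(conn, 42)
except CustomerLookupError as err:
    caught = err
```

The business layer only ever sees `CustomerLookupError`; the `sqlite3` error is still reachable through `caught.__cause__` for whoever reads the log.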

The only one I slightly disagree with is:
"Log exceptions just once

Logging the same exception stack trace more than once can confuse the programmer examining the stack trace about the original source of exception. So just log it once. "

I think I'm right in that there have been cases where logging at different levels of abstraction has required logging the same base exception more than once. I've also found it helpful rather than harmful.


"OntoBuilder started as a tool (with a user interface) developed in Java. Later, one of the requirements was to implement OntoBuilder as an agent, removing any user interaction required. Therefore, OntoBuilder was implemented as a TCP agent. OntoBuilder can be accessed as a graphical tool, as a command line tool, as an applet, as a java WebStart application, as a TCP server, and using an HTML interface."

Includes ontologies in domains including: car rental, job finding, news, and others. Shouldn't forget about Metalog either.

Also, a new book called Ontological Engineering has just been published (well November 1) just in time for Christmas, the perfect stocking filler.

Pluggable Data Types

Curiosity Killed the Cat "This argument cements my suspicions that the using RDF and Semantic Web technologies are a losing proposition when compared to using XML-centric technologies for information interchange on the World Wide Web. It is quite telling that none of the participants who tried to counter my arguments gave a cogent response besides "use an xsd library" when in fact anyone with a passing knowledge of XSD would inform them that XSD only supports ISO 8601 dates and would barf on RFC 822 if asked to treat them as dates. In fact, this is a common complaint about them from our customers w.r.t internationalization [that and the fact decimals use a period as a delimiter instead of a comma for fractional digits]. "

One of the things on the Kowari Roadmap (I wish I could point to the real one) is pluggable data type handling. The basic idea is that you can describe the data handler and have data type processors register themselves with Kowari. Much like RelaxNG's.
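To make the idea concrete, here is a rough sketch of a datatype registry. It is not Kowari's API; `register_datatype`, `parse_literal` and the RFC 822 datatype URI are all invented for illustration. Each handler registers itself against a datatype URI, and the store asks the registry to parse a literal's lexical form.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime

# Hypothetical registry: datatype URI -> function that parses a lexical form.
_handlers = {}


def register_datatype(uri, parser):
    """Plug in a handler for one datatype URI."""
    _handlers[uri] = parser


def parse_literal(lexical, datatype_uri):
    """Parse a literal using whichever handler is registered for its datatype."""
    try:
        parser = _handlers[datatype_uri]
    except KeyError:
        raise ValueError(f"no handler registered for {datatype_uri}") from None
    return parser(lexical)


XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"
RFC822_DATE = "http://example.org/datatypes#rfc822Date"  # made-up URI

# datetime.fromisoformat covers only a subset of ISO 8601; good enough here.
register_datatype(XSD_DATETIME, datetime.fromisoformat)
register_datatype(RFC822_DATE, parsedate_to_datetime)

iso = parse_literal("2003-11-22T10:30:00+00:00", XSD_DATETIME)
rfc = parse_literal("Sat, 22 Nov 2003 10:30:00 +0000", RFC822_DATE)
```

Both literals end up as the same timezone-aware `datetime`, so an RFC 822 date can be compared with an ISO 8601 one once a handler for it has been plugged in.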

SARS, SOTA and the Semantic Web

October's issue of Computer had two interesting articles on the Semantic Web:
Fighting Epidemics in the Information and Knowledge Age "We have simulated the spread of SARS and shown that isolation control measures had no significant effect on containing the epidemic's outbreaks...Information and knowledge sharing profoundly influenced the extent and duration of the SARS epidemic. At first, lack of information and knowledge sharing hampered China's efforts to research the virus and control the epidemic. SARS appeared initially in Guangdong province, but during the outbreak's early stages, the obtained experience with controlling and curing the disease was not available to health workers in other affected regions...Currently, however, the Web cannot guarantee the accuracy and reliability of the data it holds. To overcome these limitations, scientists are exploring ways to reshape the Web. The Semantic Web and the Grid represent just two of these efforts."

I'm not sure why I liked this article so much. Is it the old man vs microbe battle? Is it the positive reaction and can-do attitude about these problems?

Ontology-Mediated Integration of Intranet Web Services "Dealing with this flood of options will require sweeping automation. To meet this challenge, the authors built their smart office task automation framework—SOTA—using Web services, an ontology, and agent components."


Notes from SWAD-E "Building a triple store based on non-relational technology was represented by several participants such as those using BDB (not in Jena2 at present) and more sophisticated indexing such as @semantics. These have the advantages of smaller system dependencies than RDBs (slightly different SQLs, optimising needs, features) but are more "bare metal". The indexing is done using hashes (content digests) or using triple identifiers. A brief discussion showed that there were a variety of content digests used across relational and non-relational, MD5, SHA1 and using the top/lower 32/64 bits. As disks are still much slower than processors, there is little difference on modern systems (but MD5 is seen as more common). The non-relational triplestores tend to have better intimate knowledge and use of the RDF details such as schema information but still need to have query optimising, text searching and so on added by hand rather than reusing relational work."

libferris Release 1.1.12 "New fnamespace for setting XML like namespaces to refer to EA, added support for mounting RDF/bdb and RDF/XML files with list, rename, remove, create support, can now save EA in a user's local RDF/bdb file, new as-rdf EA to export all EA for a file as RDF/XML, new myrdf:// URL for personal RDF storage, can now handle directory names that are URIs properly, isCompressedContext() no longer tries to read a context to find out if its compressed or not, commented g_io_channel_set_encoding() because it was causing errors setting encoding on a fifo to null." Note of libferris.

Semantic Web the comic Less said the better.

Saturday, November 15, 2003

Java Goodies

Stocking up on the Internet before I spend a week on holidays:
JmDNS is a Java implementation of multi-cast DNS and can be used for service registration and discovery in local area networks. Part of JTunes.
JXTA 2: A high-performance, massively scalable P2P network "JXTA 2 introduces the concept of a rendezvous super-peer network, greatly improving scalability by reducing propagation traffic...Implementation of the shared resource distributed index (SRDI) within the rendezvous super-peer network, creating a loosely consistent, fault-resilient, distributed hash table".

Both ripe for the impending release of the Kowari project on SF.

Friday, November 14, 2003

Authors, Librarians and More Metadata

It's been nearly a week since Clay's posting. It was with some weariness that I began reading this response, but it was well worth it. Paul Ford has been a longtime supporter of the Semantic Web.

A Response to Clay Shirky's “The Semantic Web, Syllogism, and Worldview” "But logical reasoning does work well in the real world—it's just not identified as such, because it often appears in mundane places, like library card catalogs and book indices, and because we've been trained to automatically deduce certain assumptions from signifiers which do not much represent the (S,P,O) form."

"I am a writer by avocation and trade, and I am finding real pleasure in using Semantic Web technologies to mark up my ideas, creating pages that link together. What I do is not math done with words. It's links done with semantics, and it forces me to think in new ways about the things I'm writing."

"For every quote he presents that shows the Semantic Web community as glassy-eyed, out-of-touch individuals suffering from “cluelessness,” I could give a list of many other individuals doing work that is relevant to real-world issues, who have pinned their successful careers on the concepts of the Semantic Web, sometimes because they feel it is going to be the next big thing, but also because of sheer intellectual excitement...My money's on them. They know what they're talking about, and aren't afraid to admit what they don't know."

The best is definitely last:
"Postscript: on December 1, on this site, I'll describe a site I've built for a major national magazine of literature, politics, and culture. The site is built entirely on a primitive, but useful, Semantic Web framework, and I'll explain why using this framework was in the best interests of both the magazine and the readers, and how its code base allows it to re-use content in hundreds of interesting ways."

The Semantic Web looks to become a little bigger and a little better.

Wednesday, November 12, 2003


Describing Computation within RDF "A programming language is described which is built within RDF. Its code, functions, and classes are formalized as RDF resources. Programs may be written using standard RDF syntax, or in a conventional JavaScript–based syntax which is translated to RDF."

Somehow I missed this being announced on the rdf-interest list. Interestingly, he's used DAML and not OWL; I can't see any obvious reason for that.

Metadata and Librarians

the semantic web, metacrap and libraries "Well, I may be biased, but a lot of it probably has to do with who exactly is creating the metadata. Librarians are probably as third-party and objective as you're going to get, when it comes to analyzing resources. Doctorow's concerns of overt misrepresentation for personal gain are lessened when the party creating the metadata really only has a stake in its correct representation. Of course, librarians do make judgement calls: they're unlikely to create metadata for just any resource and do apply some rules (certain subject specificities, etc.) that may not actually provide the most useful metadata."

"I do agree that a global ontology is probably hopeless, but I do think that applying an assortment of ontologies where necessary (or possible) might go a ways towards creating data that can be globally related."

"But anyway, even if a global ontology, whether made up of related ontologies or not, can't happen, or doesn't happen, it's not like its subsets don't remain valuable."

Tuesday, November 11, 2003


Survey: Biggest Databases Approach 30 Terabytes "For its Top Ten Program, Winter Corp. gathers voluntary submissions from companies worldwide that are running large databases. The program requires that the databases must be in production and contain at least 1 terabyte of data (or 500 megabytes of data if running on Windows). The results, divided into 24 categories, are based on the amount of online data running on the database."

"The largest decision-support database in this year's survey is from France Telecom and handles 29.2 terabytes of data, triple the size of the top database in that category in Winter's last survey in 2001."

I wonder what the rules are regarding distributed databases. Could you submit the Semantic Web as a runner in this competition?

Sick of Carping on about the Semantic Web

LOOKS FISHY TO ME "Leave it to the fashion world to make bikinis out of leather. But these sexy little swimsuits are cut from a material that's more apropos for water wear than you might think: tanned, dyed salmon skin. Soft, smooth and lightweight, this particular form of "sea leather" has a natural elasticity that won't sag after a dip."

Now that's making asturgeons. That's the kind of entailment most guys would relate to. Fish Eye for the Straight Guy. Ergh. I haven't even watched that show. Alright, back to work.

Semantic Web for Workflow

Why an ontology for workflow? "The purpose is clear and simple - workflow orchestration at any level requires both the Pull and Push model...the Push model happens when a workflow has to report to its superior workflow - a natural strategy in many current solutions - this is easy when both the workflows are implemented using the same platform. Things get botchy when they are implemented differently - and thats where the ontology comes in. Now this person also asked my why an ontology and not a simple xml - and my answer was the requirements forced me to. One of the requirements was to ask a given instance to show me all 'user entry form' kind of activities which were completed recently. Now if you look at the last sentence carefully you see I use the clause 'kind of' - a good ontology fit !."
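The "kind of" query is the part a class hierarchy buys you. Below is a small sketch of it in plain Python rather than in an ontology language; the activity classes, the single-parent hierarchy and the sample data are all assumptions made up for the example.

```python
# Hypothetical class hierarchy: child -> parent (single inheritance assumed).
SUBCLASS_OF = {
    "ApprovalForm": "UserEntryForm",
    "ExpenseClaimForm": "UserEntryForm",
    "UserEntryForm": "HumanActivity",
    "HumanActivity": "Activity",
    "EmailNotification": "AutomatedActivity",
    "AutomatedActivity": "Activity",
}


def is_kind_of(cls, ancestor):
    """Walk up the hierarchy until we hit the ancestor or run out of parents."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


activities = [
    {"id": "a1", "type": "ApprovalForm", "status": "completed"},
    {"id": "a2", "type": "EmailNotification", "status": "completed"},
    {"id": "a3", "type": "ExpenseClaimForm", "status": "running"},
    {"id": "a4", "type": "UserEntryForm", "status": "completed"},
]

# "Show me all 'user entry form' kind of activities which were completed."
done_forms = [
    a["id"]
    for a in activities
    if a["status"] == "completed" and is_kind_of(a["type"], "UserEntryForm")
]
```

Here `done_forms` comes out as `["a1", "a4"]`: the approval form counts because it is a kind of user entry form, even though no instance says so directly. A flat XML document would need that relationship spelled out by each consumer.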

A Little More Positive

Shirky on SemWeb, a Case Study "What bothers me about Shirky's essay is that the building blocks of the Semantic Web are useful even if we never achieve nirvana, whereas Shirky's critique makes it seem like they are an academic exercise of no utility to anyone now or in the future."

XML vs. RDF :: N × M vs. N + M "The RDF model along with the logic and equivalency languages, like OWL (nee DAML+OIL), altogether referred to as "the Semantic Web", is the current W3C effort to address that problem. Factoring those equivalencies into a common layer allows application developers to work with an already-integrated model, and the libraries to do the work of mapping each schema to the integrated model using a shared schema definition: N + M"

shirky touches off a storm of semantic web posts "My inner librarian has a response brewing, as well, but it will have to wait a bit. It’s the last week of the quarter, and I’ve got exams and project to give and grade. Next week is blogging and reading catch-up time."

Clay Shirky Predicts the Demise of The Semantic Web "However, his arguments "syllogisms are not very useful" and "we describe the world in generalities" are extremely bitter pills to accept. (Darn! I haven't thought about these two, he hits a homer!) However, he is correct, we should be digging deeper into probability networks rather than logic inferencing engines. Semantic technology still has its place in the world (maybe the enterprise though), however I agree with him in that it isn't going to scale for the web."

Monday, November 10, 2003

Business Web

Metadata, Semantics and All That "I’m closer to the Semantic Web project than most, and remain significantly unconvinced, but I don’t think dismissing it is as easy as shooting fish in a barrel, and anyhow shooting fish in a barrel is unsportsmanlike and generally sucks...It’s in the Metadata Right at the moment, a lot of the Semantic Web theory is doomed to remain just that—theory—because it relies on the existence of a critical mass of metadata, and if we’ve learned one thing in recent decades it’s that there is no such thing as cheap metadata. I’m pretty convinced that if you could build up a lot more metadata you could make the Web a more useful place, and I’ve thought a whole lot about this problem over the years, and really haven’t made much progress...However, if all of a sudden there were a million machine-readable business facts there for anyone to read, I think that quite a few software-savvy and accounting-savvy entrepreneurs would retreat into their garages and there would be some considerable surprises in store."

Web Rules, Okay?

DRAFT: Semantic Web Rules Working Group Charter "The W3C Team, with input from the Semantic Web Coordination Group, is presently involved in drafting a (Member Confidential) proposal to its Membership for a "Semantic Web Phase 2 Activity"...The group is chartered to develop a practical and useful rules language for the Semantic Web, along with a corresponding language for expressing justifications. The rules language will allow the expression of the knowledge that when certain things are true, certain other things must also be true; the justification language will allow the expression of the knowledge of how certain rules were used to reach a conclusion."
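The two halves of that charter, rules and justifications, can be illustrated with a toy forward chainer. This is only a sketch under my own assumptions (facts as plain strings, rules as name/premises/conclusion tuples), not anything the working group has proposed.

```python
def forward_chain(facts, rules):
    """Apply rules until nothing new can be derived.

    rules is a list of (name, premises, conclusion) tuples.
    Returns {fact: justification}; the justification is None for given
    facts and (rule_name, premises) for derived ones.
    """
    known = {fact: None for fact in facts}
    changed = True
    while changed:
        changed = False
        for name, premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known[conclusion] = (name, tuple(premises))
                changed = True
    return known


def explain(fact, known, depth=0):
    """Return the justification tree for a fact as indented lines."""
    indent = "  " * depth
    justification = known[fact]
    if justification is None:
        return [f"{indent}{fact} (given)"]
    name, premises = justification
    lines = [f"{indent}{fact} (by {name})"]
    for premise in premises:
        lines.extend(explain(premise, known, depth + 1))
    return lines


facts = ["order shipped", "payment received"]
rules = [
    ("close-order", ["order shipped", "payment received"], "order complete"),
    ("notify", ["order complete"], "send receipt"),
]
known = forward_chain(facts, rules)
why = explain("send receipt", known)
```

`why` lists "send receipt (by notify)", then "order complete (by close-order)" indented beneath it, then the two given facts: the rule part says what must also be true, and the justification part says how that conclusion was reached.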

Dan Brickley had some interesting comments:
"What you might end up with here is an RDF *description* of a query-related data structure. Maybe handy for testcase-style interop, but pretty ugly to read and think about. I believe DAML Query works this way. My understanding of XQuery btw is that they started out with an XML syntax but now mostly focus on the non-XML syntax, since it is vastly more usable. My hunch is that RDF Query might go the same way...

Closest you can get and still be pretty is a kind of query-by-example, with bNodes for variables, perhaps decorated with variable names in a well-known namespace. Such RDF/XML would never be taken assertionally but used to ask questions. I think Edutella have something in this vein. Sorry I'm in a rush or I'd do the googling for links. Also this approach doesn't allow blanks for property names, since RDF/XML doesn't allow that."
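Query-by-example is easy to sketch outside of RDF/XML. In the toy matcher below, which is my own illustration and not DAML Query or Edutella, a query is just a list of triples in which any term starting with "?" is a variable. Because it uses plain tuples, it is not bound by the RDF/XML restriction on blank property names.

```python
def is_var(term):
    """Variables are marked with a leading question mark."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, triple, bindings):
    """Extend bindings so that pattern equals triple; return None on failure."""
    new = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or check it against an earlier binding.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def query(patterns, triples):
    """Return every set of variable bindings that satisfies all patterns."""
    results = [{}]
    for pattern in patterns:
        next_results = []
        for bindings in results:
            for triple in triples:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_results.append(extended)
        results = next_results
    return results


triples = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", "Bob"),
]
patterns = [
    ("?who", "foaf:knows", "?friend"),
    ("?friend", "foaf:name", "?name"),
]
answers = query(patterns, triples)
```

The query reads like the data it is looking for, with holes in it. The single answer binds `?who` to `ex:alice`, `?friend` to `ex:bob` and `?name` to `"Bob"`.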

Review Vocabulary

RDF Review Vocabulary "Review and rate blogs, CDs, books, software, whatever. Suitable for inclusion in any RDF-based language : FOAF, RSS etc."

Sunday, November 09, 2003

Wallop Wallop

Will Microsoft Wallop Friendster? "In fact, Wallop is Microsoft's venture into the red-hot social-networking arena, using the common Microsoft tack of piecing together existing technologies and packaging them for the novice user. Those technologies include Friendster-style social-networking capabilities, super-simplistic blogging tools, moblogging, wikis and RSS feeds, all based on Microsoft's Instant Messenger functionality."

Reiterate, Reiterate, Reiterate

Shirky's Men of Straw and Comments: Shirky misses both say pretty much the same things I did.

Shirky on the Semantic Web:
"Shirky nonetheless bases his argument on a central fallacy: the Semantic Web as monolith, as a "thing" to be supported or opposed."
Deconstructing The Syllogistic Shirky:
"There never was a suggestion that all metadata work cease and desist as we sit down on some mountaintop somewhere and fully derive the model before allowing the world to proceed."
"Sure, there is certainly a segment of the semantic web community who think we'll be able to do the "strong semantic web", which can somehow make inferences without much human work in the metadata space, but the bulk of the folks I've talked to about it are well aware of the difficulty of that kind of problem - and they are much more focused on, as Ken Macleod puts it, the "relational model of RDF and its ability to integrate decentralized data models"."
Most of these can be linked from either Syllogism or Semantic Examinations.

Other links (mostly positive towards the article): Clay Shirky smacks syllogism around. (Metafilter), Semantic web systemantics, Clay Cements the Semantic, Shirky: The Semantic Web, Syllogism and Worldview, Shirky on SemWeb and Pass It Around.

Saturday, November 08, 2003

Why, Why, Why?

"Despite their appealing simplicity, syllogisms don't work well in the real world, because most of the data we use is not amenable to such effortless recombination. As a result, the Semantic Web will not be very useful either...This sentiment is attractive precisely because it describes a world simpler than our own. In the real world, we are usually operating with partial, inconclusive or context-sensitive information."

Hmm, relational databases are used every day and they make assertions - even incorrect ones. When you create a row in a table you're saying "Andrew lives at 12 Mulberry Lane" or "Andrew has $123 in his account". Relational databases have the same problems with respect to partial, inconclusive or context-sensitive information.

RDF by itself has no enforced schema, so you could quite easily store data that you would not be able to put in a relational database. However, by adding schemas you can then start querying the data, or preventing data that does not fit your schema from being stored. It's a lot more flexible in that manner.
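A toy illustration of that flexibility, with the caveat that everything in it is invented: the `TripleStore` class is not any real library, and the "schema" is just a dictionary from predicate to the Python type its object must have. Rejecting data is an application decision layered on top; RDF Schema itself describes data rather than refusing it.

```python
class TripleStore:
    """Toy store; `schema` maps a predicate to the type its object must have."""

    def __init__(self, schema=None):
        self.schema = schema  # None means "accept anything"
        self.triples = []

    def add(self, subject, predicate, obj):
        if self.schema is not None:
            expected = self.schema.get(predicate)
            if expected is None:
                raise ValueError(f"unknown predicate: {predicate}")
            if not isinstance(obj, expected):
                raise ValueError(f"{predicate} expects {expected.__name__}")
        if (subject, predicate, obj) not in self.triples:
            self.triples.append((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return triples matching the given terms; None is a wildcard."""
        wanted = (subject, predicate, obj)
        return [
            t for t in self.triples
            if all(w is None or w == got for w, got in zip(wanted, t))
        ]


# Without a schema, anything goes.
open_store = TripleStore()
open_store.add("ex:andrew", "ex:livesAt", "12 Mulberry Lane")
open_store.add("ex:andrew", "ex:favouriteFish", "salmon")

# With a schema, statements that do not fit are refused.
strict = TripleStore(schema={"ex:livesAt": str, "ex:balance": int})
strict.add("ex:andrew", "ex:balance", 123)
try:
    strict.add("ex:andrew", "ex:favouriteFish", "salmon")
    rejected = False
except ValueError:
    rejected = True
```

The same store class serves both cases, which is the point: the schema is something you add when you want it, not a precondition for storing anything at all.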

"Each of those statements is true, in other words, but each is true in a different way. It is tempting to note that the second statement is a generalization that can only be understood in context, but that way madness lies. Any requirement that a given statement be cross-checked against a library of context-giving statements, which would have still further context, would doom the system to death by scale."

To quote Tim Berners-Lee in "Weaving the Web":
"Databases are continually produced by different groups and companies, without knowledge of each other. Rarely does anyone stop the process to try to define globally consistent terms for each of the columns in the database tables...If HTML and the Web made all online documents look like one huge book, RDF, schema and inference languages will make all the data in the world look like one huge database"

"If a reasoning engine had pulled in all the data and figured the taxes, I could have asked it why it did what it did, and corrected the source of the problem.

Being able to ask 'Why?' is important. It allows the user to trace back to the assumptions that were made, and the rules and data used. Reasoning engines will allow us to manipulate, figure, find and provide logical and numeric things over a wide-open field of applications."

"Is your "Person Name = John Smith" the same person as my "Name = John Q. Smith"? Who knows? Not the Semantic Web. The processor could "think" about this til the silicon smokes without arriving at an answer."

Defining equality is exactly what Semantic Web technology allows you to do - by using things like OWL. You can say "Person Name = Name" if Person Name's First Name is equal to Name's Christian Name and Person Name's Last Name is equal to Name's Surname. In fact, defining equality also includes different languages and differing literals (like II, 10, 2, "two", etc.).
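As a rough sketch of that mapping in plain Python (not OWL, where something like owl:equivalentProperty would carry the same information), assume two schemas with the invented field names below. Once the equivalences are declared, comparing the two records is mechanical.

```python
# Hypothetical field names from two independently designed schemas,
# each mapped to a shared meaning.
EQUIVALENT_FIELDS = {
    "FirstName": "given",
    "ChristianName": "given",
    "LastName": "family",
    "Surname": "family",
}


def normalise(record):
    """Rewrite a record's fields into the shared vocabulary."""
    return {
        EQUIVALENT_FIELDS[field]: value.strip().lower()
        for field, value in record.items()
        if field in EQUIVALENT_FIELDS
    }


def same_name(a, b):
    """True when both records describe the same name under the mapping."""
    na, nb = normalise(a), normalise(b)
    return bool(na) and na == nb


person_name = {"FirstName": "John", "LastName": "Smith"}
name = {"ChristianName": "john", "Surname": "Smith"}
other = {"ChristianName": "John Q.", "Surname": "Smith"}

matches = same_name(person_name, name)
differs = same_name(person_name, other)
```

`matches` is true and `differs` is false. The rule for what counts as equal is a declared mapping that can be inspected and changed, which is rather different from a processor "thinking" until the silicon smokes.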

Again, we get Cory with the Metacrap and Mark with his tag soup.

The sections "Worldviews Differ For Good Reasons" and "Worse is Better" sound very similar to the sections in "Weaving the Web" about the Semantic Web and using databases, except of course Berners-Lee says the Semantic Web takes these things into consideration.

Clearly, Clay has missed the point of RDF and Semantic Web technologies. Maybe people should stop concentrating so much on the "Semantic" part of the name and more on the "Web" part. Saying that you can't use the Semantic Web technologies "a bit at a time, out of self-interest and without regard for global ontology" is a lot like saying you can't use TCP/IP, HTML, or HTTP for the same reasons.

Most of these arguments are just common Semantic Web misunderstandings. Maybe the Tuple or Relational Web would've been a better name.

The Semantic Web, Syllogism, and Worldview

Nokia Knowledge Navigator

I can't help being impressed by the Nokia 7700, well the Flash demos anyway. Comes with handwriting recognition, PDA functionality and, of course, it is a phone. CNN Money has a review. Also, Gartner released a report saying that Symbian will lose smartphone battle.

Friday, November 07, 2003

The Blue Matrix

Haven't been overly fussed with the Matrix series but here is an interesting read about the blue matrix ([1],[2]).

Storing RDF

Workshop on Semantic Web Storage and Retrieval - Position Papers.

From the paper "Indexing and retrieving Semantic Web resources: the RDFStore model" from ASemantics:

"The number of 'sub queries' needed to satisfy a single RDF query is often several orders of magnitude larger than commonly seen in RDBMS applications. Also, very often to save space DBAs design tables using significant number of indirect references, deferring to the application, or a stored procedure layer, for expanding the operation into large numbers of additional join operations just to store, retrieve or delete a single atomic statement from the database and maintaining consistency."

"Such indexes map the RDF nodes, contexts and free-text words contained into the literals to statements. There are several advantages to this approach. First, the use of a hybrid run-length and variable-length encoding to compress the indexes makes the resulting data store much more compact. Second, the use of bitmaps and Boolean operations allows matching arbitrary complicated queries with conjunction, disjunction and free-text words without using backtracking and recursion techniques. Third, this technique gives fine-grained control over the actual database content."
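The bitmap part of that is simple to sketch. In the toy index below, each term (node or free-text word) maps to an integer whose bit i is set when statement i mentions the term, so conjunction and disjunction become `&` and `|`. The sample statements are invented, and the run-length and variable-length compression the paper describes is left out entirely.

```python
from collections import defaultdict

statements = [
    ("ex:doc1", "dc:title", "Storing RDF"),
    ("ex:doc1", "dc:creator", "ex:alice"),
    ("ex:doc2", "dc:creator", "ex:alice"),
    ("ex:doc2", "dc:title", "Indexing RDF"),
]

# One integer per term; bit i is set when statement i mentions the term.
index = defaultdict(int)
for i, (s, p, o) in enumerate(statements):
    # Index the three nodes plus the lower-cased words of the object.
    for term in (s, p, o, *o.lower().split()):
        index[term] |= 1 << i


def ids(bitmap):
    """Expand a bitmap back into a list of statement numbers."""
    return [i for i in range(len(statements)) if (bitmap >> i) & 1]


# Conjunction: statements about ex:doc2 that contain the word "rdf".
both = ids(index["ex:doc2"] & index["rdf"])

# Disjunction: statements mentioning ex:alice or the word "storing".
either = ids(index["ex:alice"] | index["storing"])
```

`both` is `[3]` and `either` is `[0, 1, 2]`. No backtracking or recursion is involved; each extra clause in the query is one more bitwise operation.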

I'm fairly sure that using hashing for databases is the wrong approach; using something that guarantees unique identifiers, like node pools and string pools, seems much more sensible:
"For efficient storage and retrieval of statements and their components we assume there exist some hash functions which generates a unique CRC64 integer number for a given MD5 or SHA-1 cryptographic digest representation of statements and nodes of the graph."
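The difference between the two approaches can be sketched as follows. `StringPool` and `digest_id` are my own illustrative names, and the digest function simply truncates a SHA-1 digest rather than computing a CRC64 as the paper does. The pool hands out the next free integer, so identifiers are unique by construction; the digest is unique only with high probability, and that probability depends on how many bits are kept.

```python
import hashlib


class StringPool:
    """Gives each distinct string the next integer; ids cannot collide."""

    def __init__(self):
        self._ids = {}
        self._strings = []

    def intern(self, value):
        """Return the id for value, allocating a new one on first sight."""
        if value not in self._ids:
            self._ids[value] = len(self._strings)
            self._strings.append(value)
        return self._ids[value]

    def lookup(self, node_id):
        """Map an id back to its string, which a digest cannot do."""
        return self._strings[node_id]


def digest_id(value, bits=64):
    """Keep the top `bits` bits of the SHA-1 digest of value."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - bits)


pool = StringPool()
names = [f"http://example.org/node/{i}" for i in range(300)]

pool_ids = {pool.intern(n) for n in names}

# Deliberately tiny width so that collisions are guaranteed to show up:
# 300 strings cannot fit into 256 distinct 8-bit values.
tiny_ids = {digest_id(n, bits=8) for n in names}
```

At 64 bits a collision is very unlikely rather than impossible; the 8-bit width is used here only to make the effect visible. The pool costs a lookup table, but it also gives you the reverse mapping from id to string for free.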

Also of interest: Prolog-based RDF storage and retrieval (optimising a store for the rdfs:subPropertyOf relation) and the Jena position paper.

Thursday, November 06, 2003

Sabre use MySQL and Linux

Sabre Holdings Air Shopping Products Leverage MySQL Database "Sabre Holdings has migrated its Air Shopping products from a mainframe platform to a combination of HP NonStop servers and a cluster of 45 HP rx5670 servers with MySQL running on the 64-bit Intel Itanium 2 processor with the Linux operating system. Over 100 MySQL database tables are replicated on a 7 X 24 basis through GoldenGate's data synchronization software. The total amount of data held in the clustered MySQL databases is approximately 75 gigabytes.

"We benchmarked our application on several databases, including open source, commercial and a specialized in-memory relational database, and MySQL was the best performing database,” said Alan Walker, vice president of Sabre Labs."

via They use MySQL? Yes. I told you so...

Wednesday, November 05, 2003

Why Longhorn will be Slow (apparently)

The Emperor's New Code "It is interesting that in all the demos and discussions at the PDC, nobody worried about performance. I have to believe XAML imposes substantial overhead on the GUI (look at what XUL did to Mozilla). And vector graphics? Hopefully it can all be pawned off on GPUs and everything will work okay. WinFS is going to be a big time resource hog. I'm guessing it is painfully slow now and that there’s a bunch of people working hard trying desperately to make it fast enough (not to be confused with fast, period). Indigo isn't far enough along for performance to be assessed, but because SOA is simpler than object proxying Indigo has a great chance to be faster than COM+ or DCOM or .NET remoting (none of which were fast enough to be useful in “real” applications). Let's hope the security wrappers don't kill the basic speed of ASMX; in the real world people still use sockets with no security whatsoever, because they're fast."

Performance, Microsoft? As the author states, Microsoft usually gets performance right, except maybe on the Mac.

IBM Releases SW Technology - Snobase

"A new systems management technology known as SNOBASE (Semantic Network Ontology Base) or the Ontology Management System has been released by IBM alphaWorks. The Java-based application provides a "framework for loading ontologies from files and via the Internet and for locally creating, modifying, querying, and storing ontologies. It provides a mechanism for querying ontologies and an easy-to-use programming interface for interacting with vocabularies of standard ontology specification languages such as RDF, RDF Schema, DAML+OIL, and W3C OWL. Internally, the SNOBASE system uses an inference engine, an ontology persistent store, an ontology directory, and ontology source connectors. Applications can query against the created ontology models and the inference engine deduces the answers and returns results sets similar to JDBC (Java Data Base Connectivity) result sets."

""The Query Optimizer allows applications to query large knowledge bases, whose entire set cannot be loaded into the working memory, by querying the ontology source for appropriate pieces as they are needed. In addition, the task of the Query Optimizer is to not only optimize the retrieval of information from ontology sources, but also coordinate queries that span multiple sources. This component is still under construction, and will be added to future editions of IBM Ontology Management System..."

"The Ontology Source Connectors provide a mechanism for reading and writing ontology information to persistent storage. The simplest connector is the file connector that is used to persist ontologies to the local file system. In addition, there will be connectors for storing ontological information in remote servers. Also, the connectors are used to implement caching of remote information to improve performance and reliability...""

Their query language is based on DQL (DAML Query Language).

Yet Another Learning Environment

YALE "63 different operators for training, validation, feature selection and generation, and preprocessing plus 50 classifiers, clusterers and association learners available from the Weka package" It is written in Java and comes with a Swing UI.

Monday, November 03, 2003

R is for RDF

RxPath is a language for querying an RDF model. It is syntactically identical to XPath 1.0 and behaves very similarly.

RxSLT is a language for transforming RDF to XML.

RxUpdate is a language for updating an RDF model.

RxML is an alternative XML serialization for RDF that is designed for easy and simple authoring, particularly in conjunction with RhizML.

Racoon is a simple application server that uses an RDF model for its data store; it is roughly to RDF what Apache Cocoon is to XML.

Rhizome is a simple content management and delivery system that is similar to a Wiki except that you can author arbitrary XML and RDF metadata and the structure of the website is stored as RDF. This allows both the content and structure to be easily repurposed and complex web applications to be rapidly developed.

Built on top of 4Suite.

Sunday, November 02, 2003

Similar but not standard

I was hoping that Microsoft would implement Avalon and the like in Longhorn using standards. Alas, it's not to be. To quote the quoted:
"I think the bottom-line of XAML is that it is equally useful for creating both desktop applications, web pages, and printable documents. This means that Microsoft may be attempting to simultaneously obsolete HTML, CSS, DOM, XUL, SVG, SMIL, Flash, PDF. At this point, the SDK documentation is too incomplete to firmly judge how well XAML compares with these formats, but I hope this lights a fire under the collective butt of the W3C, Macromedia, and Adobe. 2006 is going to be a fun year."

Saturday, November 01, 2003

Feedback on Sherpa

Now that's Semantic Web(?) "Having said this, though, some of what I read leads me to think this isn't as open as I thought at first glance. First, if I read this correctly, the Sherpa calendar information is centralized on the Sherpa servers. I'm assuming by this, again with just a first glance, that Semaview is providing the P2P cloud through which all of the clients interact in a manner extremely similar to how Groove works. If this is true, I've said it before and will again -- any hint of centralization within a distributed application is a point of weakness and vulnerability, the iron mountain hidden within the cloud.

Second, I can't find the calendar RDF/XML out at the sites that use the product. There are no buttons at these sites that give me the RDF/XML directly. Additionally, trying variations of calendar.rdf isn't returning anything either. Again, this is a fast preliminary read and I'll correct my assumptions if I'm wrong -- but is the only way to access the RDF/XML calendar information through SherpaFind? How do bots find this data? "

More WinFS

Extended Blog Conversation "Metadata needs to be standardized, Metadata needs to be trusted / Metadata needs to be error-correcting, When performed on a large scale, moderation detects and FIXES "lies" in metadata, When a user first uses an aggregator, it should still be able to intelligently search. under repeat usage, it should get smarter, the more sites the user adds to their list, the "smarter" the search will get."

And the answer to these questions is .NET and WinFS.

Along similar lines, Can GNOME Storage Keep Up with Longhorn's WinFS?: "Storage is the only possible competitor to WinFS for the Linux community. It needs to be adopted, pushed, and developed NOW. It's easy to make fun of Longhorn for being so hyped up when it won't be released until 2005/2006, but the fact is that developers are going to start working on Longhorn software very soon. And developers create the market. If you want Linux to succeed on the desktop, you need apps. And apps need developers. And developers aren't going to care about Linux if they can make use of far superior technology on Windows. Which, I'm afraid, is starting to be the case. Just when open source was beginning to gain ground in the desktop arena, it is now starting to lose its edge."

Microsoft's New WinFS Gets the PDC Buzz "The next SQL Server edition, which is slated for beta release in the first half of 2004, features advancements such as a Common Language Runtime (CLR) that will be hosted in the database engine in order to give developers the ability to choose from a variety of development languages for building applications.

Mangione said the enhancements with XML and Web services in the next SQL Server will "provide developers with increased flexibility, simplify the integration of internal and external systems, and provide more efficient development and debugging of line-of-business and business intelligence applications.""

The end of stand-alone databases? It'll be interesting to see the effect that relational file systems glued together by web services will have on the Semantic Web.


LingPipe "Version 1.0 tools include a statistical named-entity detector, a heuristic sentence boundary detector, and a heuristic within-document coreference resolution engine. Named entity extraction models are included for English news and English genomics domains, and can be trained for other languages and genres."