Friday, April 30, 2004

Limiting Complexity

The Psychology of Ontology Harmonization "My personal experience seems to suggest the exact opposite: higher the abstraction, lower the objectivity with which people can argue and, therefore, come to an agreement.

I've spoken to people that spent several years of their lives coming up with an ontology and their perception is that the complexity over time of these models to cover a particular domain saturates, does not continue to grow.

This is a basic, but vital assumption for this entire approach to work: if the ontologies grow linearly with the amount of information they can describe, the ontology creation/maintenance process simply won't scale globally."

Intellidimension's SWS

Semantic Web Search "Semantic Web Search is a search engine for the Semantic Web. Our site can be used by both people and computers to precisely locate and gather information published on the Semantic Web."

From the FAQ:
"...using our standard search engine interface you can just type one or more keywords describing the information you are trying to locate. This is no more complicated than a traditional Web search engine. However like a traditional Web search engine this can lead to a large number of irrelevant results. To narrow your search you can restrict it to the specific type of resource that you are trying to locate such as a person (FOAF Person) or news article (RSS Item). If your search is still producing a large number of irrelevant results then you can refine it further by specifying one or more specific property values that the resource must have."

Browse the Semantic Web

How to Make a Semantic Web Browser "Two important architectural choices underlie the success of the Web: numerous, independently operated servers speak a common protocol, and a single type of client—the Web browser—provides point-and-click access to the content and services on these decentralized servers. However, because HTML marries content and presentation into a single representation, end users are often stuck with inappropriate choices made by the Web site designer of how to work with and view the content. RDF metadata on the Semantic Web does not have this limitation: users can access the underlying information and control how it is presented for themselves. This principle forms the basis for our Semantic Web browser—an end-user application that automatically locates metadata and assembles point-and-click interfaces from a combination of relevant information, ontological specifications, and presentation knowledge, all described in RDF and retrieved dynamically from the Semantic Web."

Some thoughts on RDF rendering had some ideas on visualizing the Semantic Web too.

The Passion of RDF

"SemanticBible is an emerging exploration of new applications of markup and computational linguistic technology to the study of Scripture, with an emphasis on practical tools that encourage understanding and personal transformation."

See also: The Vision of a Semantic New Testament: "Just as important as avoiding commercial barriers to sharing is the requirement that SemANT support existing and emerging standards that enable use across the Internet. To this end, SemANT will build on the Semantic Web Activity of the World Wide Web Consortium (W3C), including XML as a syntactic standard for data interchange, and RDF for ontology-based representation, and DAML/OWL for additional semantic expressiveness."

Another example of what happens when a technology has matured, as with PCs, CD-ROMs, hypertext, the Web, etc.


Mozilla, Gnome mull united front against Longhorn "So far, XUL has failed to catch on, and Microsoft questioned whether Mozilla's technology would do much to help Gnome ward off Longhorn's promised threat.

XAML, Microsoft warned, is more potent than XUL in its ability to reflect exactly what's in the operating system.

"XUL is not the multipurpose declarative language that Gnome probably wants," said Ed Kaim, product manager for the Windows developer platform. "People say that when all you've got is a hammer, everything looks like a nail. In the same way, people are trying to figure out how to crush XUL into an OS it really wasn't designed for. The browser is great for a lot of things, but when it comes to robust client side applications, it's not the best."

Another trick will be in reconciling XUL with Gnome's existing user interface technology.

"There are ways to marry them," said Bruce Perens, an open-source consultant who serves as executive director of the Desktop Linux Consortium, a marketing organization. "But it's very difficult to get the two teams working in the same direction. They both went on a several-year tour of technical creation where they sat down and created everything they needed to do GUI [graphical user interface] applications — and they didn't create the same thing. Now to get them together it would take some number of years to resolve the technical diversions.""

Query Use Cases

Query Languages Report "A report by AIFB and Sesame and Jeen Broekstra from the Sesame crew. The Authors know what they are talking about as they are SemWeb developers themselves.

Although a little self advertisement and some missing languages, it's a good thing to read. If you need info about RDF Query languages, read it.

My previous demand about "optional joins in queries" is answered by SeRQL."

The report is an excellent example of the current features required from a query language. From what I can tell iTQL implements 11 of the 14. Some of the others are fairly trivial to add support for (like the data type support).

Wednesday, April 28, 2004

Will the Semantic Web Scale?

Apparently, there's going to be a debate at WWW2004 about whether the Semantic Web will scale: "However, with only a few exceptions we noticed that current research and development is focusing on creating new technologies for facilitating the Semantic Web. Available technologies from other disciplines such as databases are rarely reused and adapted. Hence, most Semantic Web systems do not scale to Web-size problems.

Lately, several researchers doubted whether the Semantic Web idea will ever scale for numerous reasons technological [But03], theoretical [van02] and practical [MS03,Sow]. Dedicated workshops on that topic [CKDE03,VDC03] have been organized recently to promote research to improve scalability. We will pick up these three categories of doubt by organizing the panel in three parts discussing each aspect: theory, technology/implementation, and practise."

I'm not sure I agree: there are very few Semantic Web systems that don't reuse existing SQL databases - they just suck at storing triples. With Kowari, and I'm sure with other native stores, the data structures and techniques used are taken directly from databases. They mention "Is the semantic web hype?" (which I responded to) and a few others, although there are no links to syllogism, metacrap or gnomes. BTW, I'm still not sure why you'd want an XML version of OWL.

"Network round-trips are often considerably less costly than the time taken for a transactional database operation due to the need to forcibly log transactional operations which is very costly in terms of disk performance. i.e. network round-trips aren't always the performance bottleneck." From Martin Fowler's First Law of Distribution.

As long as you keep the Semantic Web like the Web there's no real reason why it shouldn't scale.

Google Watching

What can't you find on Google? Vital statistics "Google is famous for being a confident, open company. Its clean, uncluttered search page is supposed to be a metaphor for the organisation behind it. But when you start asking questions about its technology, then the water rapidly becomes murky...One university presentation, for example, claimed that Google handled 150 million queries a day, and 1,000 per second at peak times...If the system is handling a peak load of 1,000 queries per second, he reasoned, that translates to a peak rate of 86.4 million queries per day - or perhaps 40 million queries per day...They also claim to have '4+ petabytes' of disk storage, and have let slip that each server is fitted with two 80 gigabyte hard drives. Now a petabyte is 10 to the power of 15 bytes, so if Google had only 10,000 servers, that would come to 400 Gb per server. So again the numbers don't add up."
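
The arithmetic in that excerpt is easy to check. A minimal sketch, using only the figures quoted above and assuming decimal units (1 GB = 10^9 bytes); the function names are made up for illustration:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def queries_per_day(peak_qps):
    """Daily total if the peak rate were sustained around the clock."""
    return peak_qps * SECONDS_PER_DAY


def storage_per_server_gb(total_bytes, servers):
    """Average storage per server in decimal gigabytes."""
    return total_bytes / servers / 10**9


# 1,000 queries per second held for a whole day.
peak_daily = queries_per_day(1_000)

# "4+ petabytes" spread over a hypothetical 10,000 servers.
per_server = storage_per_server_gb(4 * 10**15, 10_000)

# Servers needed to reach 4 PB with two 80 GB drives each.
claimed_per_server_gb = 2 * 80
servers_needed = 4 * 10**15 / (claimed_per_server_gb * 10**9)

print(f"{peak_daily:,} queries/day at sustained peak")
print(f"{per_server:.0f} GB per server across 10,000 servers")
print(f"{servers_needed:,.0f} servers needed at 160 GB each")
```

Even a sustained peak gives 86.4 million queries a day, well short of the claimed 150 million, and 160 GB per server is well short of the 400 GB that 10,000 servers would each need.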

Google Goes Public? The Rich Get Richer "People speculate. People dream. And if the numbers are to be believed, people will drool. The current prediction is that Google, if it decides to sell shares to investors this year, would probably end up with a market value of $20 billion to $25 billion by the end of its first day as a publicly traded company."

Google's Brin Talks on Gmail Future "It was interesting to me that you did finally hit on the word conversation. It seems to me that there's a synergy between the elements of the conversation in the RSS space and what you're doing in the e-mail space.

I think that's very true. Part of the things we've seen why blogs and RSS feeds are such a success is that you can actually read it—you don't have to stop, click back and forth, collect bits and pieces here and there—but it is all presented to you as one. "

Mozilla to Upgrade RDF

RDF module owner "With Benjamin Smedberg, Chase Tingley and Ben Goodger as peers, I took over the module ownership on RDF. We gonna push for both standards conformance (there are new specs out there since early 2004) and scriptability for remote web applications. This will include some serious whacking of the RDF API in Mozilla, as that is not ready for the web by a fair amount."

Tuesday, April 27, 2004

Ant is now more useful

ANT's finally a real build tool "And I can finally call ANT a real build tool (and Maven can go play in its own cacca for all I care).

In a nutshell, the import task lets you reference other build files. This means that you can create common centralized libraries of build files that other people can use on their own projects - all without copy and paste. And believe it or not, the semantics all make sense too. You can provide default tasks and properties, and the importer can override tasks and properties to customize behaviors on a case by case basis if it's required. The end result is that individual project build files are smaller and easier to understand, and common behavior can be achieved across an entire large system in a natural and non-cut-n-pasty manner (I don't know about you, but I always found pasties rather unnatural)."

Also new is macrodef: "Macrodef is a way to define a new Ant task in an Ant build itself. Macrodef allows you to define standard tasks that have attributes and elements given to them when they are called."

Paul's Blog

Paul is a guy who sits next to me at work and is now putting his development notes on a blog. So if you're interested in the inner workings of Kowari take a look. I'll let people guess who DM, AM, and RMI are. :-)

Friday, April 23, 2004

Free Bits of Description Logic

DESCRIPTION LOGICS Includes a bunch of Postscript and PDF files (including the first couple of chapters of the DL Handbook).

Metaweb Graph Updated

New Version of My "Metaweb" Graph -- The Future of the Net "Many people have requested this graph and so I am posting my latest version of it. The Metaweb is the coming "intelligent Web" that is evolving from the convergence of the Web, Social Software and the Semantic Web. The Metaweb is starting to emerge as we shift from a Web focused on information to a Web focused on relationships between things --- what I call "The Relationship Web" or the "Relationship Revolution.""

Semaview Interview

What Is The Semantic Web? "The Semantic Web provides the foundation on which we can build more intelligent Internet applications. It will help everyone find, organize, collect, use and share information more easily."

"My company, Semaview has developed an application called eventSherpa. eventSherpa is making it simple to create and organize schedules and share them over the Internet. Our application automatically creates Semantic Web content transparently without the end user knowing it...Aside from reducing the complexity issue...I believe the largest challenge is convincing application developers to make their data available in semantic format. However it is "a chicken and egg problem" -- the more content available in a semantic format, the more applications that will be developed to take advantage of it; and vice versa."


Reto was trying to get Kowari going. Hopefully it will get used for this project. Documentation gives download links and installation instructions.

From Danny Ayers: Knobot PlanetRDF Demo.

XML 2004

The State of XML "As a software developer I feel increasingly unhappy with the development of a monolithic mass of technology building up, only reasonably accessible behind a Java or .NET API. In contrast, the REST model of composed, simple interactions seems more controllable and containable and you can still see the angle brackets in order to check that things are working. There is still plenty of work and experimentation to be done yet with the notion of more document-oriented web services."

"Consequently, even at the low level of operating systems vendors are seeing the need and advantages of implementing metadata storage and manipulation.

This is good. We have the tools to support this, whichever way you swing on the technology issues. RDF & OWL, Topic Maps, W3C XML Schema: all have the right machinery. Unfortunately that's not the biggest issue. The main problem is which terms, schemas, and ontologies to use. That's just not clear right now for most if not all metadata applications. At best, we'll get inconsistently classified information, which defeats the promise of interoperability. More typically, we'll end up with little tagged metadata and islands of de facto proprietary information."

"As an RDF fan, the realization of this truth causes me some pain. The way out is to stop thinking of RDF as an XML application, and look to easier syntaxes such as Turtle and N3."

Wednesday, April 21, 2004

Unix Job Ad

Unix Specialist "Ah, Unix. Its cheapass cousin, Linux, is what all Microsoft users turn to just as their sanity reaches a crossroads. Did you know that Microsoft Word still spellchecks ‘Unix’ as ‘UNIX’? Man, how 80’s does that look? I can imagine something like that flickering on the screen of a computer you assembled yourself from a crystal radio kit."

"But the point of all that is, Unix is basically a sort of secret society where you either know it, or you don’t. And since most people just really can’t be bothered going through the agonies of learning it, it’s why we have jobs like this: “Unix Specialist”. Of course that means nothing, or at least it means about as much as “Car Specialist” or “Bread Specialist”. Bread Specialist? What the hell is that? What kind of bread? White, multigrain, mixed grain, wholemeal, sourdough? Sliced or unsliced? If sliced, sliced for sandwiches or for toast? Crusty or soft? No matter! Just eat your bread!"

RDF Engine

RDF Engine "The program RDFEngine was developed as a part of the master thesis of Guido Naudts. It was built on the example of Euler, the program of Jos De Roo. The original version was made with Haskell. It was then rewritten in Python. Purpose of the program was to implement a logic program for the Semantic Web initiative. Concerning compatibility the program is meant to be compatible with CWM in the sense that sources that work with CWM will also work with RDFEngine but not vice versa. (I like to do some experiments of my own (-: ). For input and output Notation 3 is used. See also the Notation 3 tutorial."

Tuesday, April 20, 2004

80/20 REST to SOAP

XMLEurope, Monday "The keynotes were by Jeff Barr of Amazon and Steven Pemberton from W3C. Interesting to hear some detail about the Amazon Web Services, including the 80/20 split between developer use of REST and SOAP interfaces to the Amazon catalogue, and the business case for making the catalogue available in machine-processible format: essentially the value is not in their catalogue per se, and they have a business model for third party sellers and affiliates, so WS access just makes this relationship easier. The output example he gave looked to me very close to RDF; with Amazon's XSLT service it could be transformed to it very easily I think."


Netscape Desktop Navigator The thing about a circular has no point...

Monday, April 19, 2004


Kowari 1.0.2 is now released - much better than the last one. Even available in a bite sized 14MB version. Unfortunately, we didn't get the anonymous node bug in our Jena implementation fixed in time. The next release should be within the next 2-4 weeks depending on the progress of the currently outstanding bugs.

BTW, you can now do things like: "select $s $p $o from <> where $s $p $o ;". You can combine local and remote (via file or http) models by using "and" and "or" in the FROM clause.
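
One way to picture combining models in the FROM clause is as set operations over triples. This is a sketch in Python rather than Kowari code: it assumes that "or" behaves like a union of the models' statements and "and" like an intersection, and the model contents and the match_all helper are made up for illustration.

```python
# Two toy "models", each a set of (subject, predicate, object) triples.
local_model = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", "Alice"),
}
remote_model = {
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "foaf:name", "Bob"),
}


def match_all(model):
    """Stand-in for the unconstrained pattern $s $p $o: every triple matches."""
    return sorted(model)


# Assumed reading of "or" in the FROM clause: statements in either model.
either = match_all(local_model | remote_model)

# Assumed reading of "and" in the FROM clause: statements in both models.
both = match_all(local_model & remote_model)

print(len(either), "triples from either model")
print(both)
```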

Bloom Filters in Social Networks

Building a Bloom Filter in Perl "One drawback of existing social network schemes is that they require participants to either divulge their list of contacts to a central server (Orkut, Friendster) or publish it to the public Internet (FOAF), in both cases sacrificing a great deal of privacy. By exchanging Bloom filters instead of explicit lists of contacts, users can participate in social networking experiments without having to admit to the world who their friends are. A Bloom filter encoding someone's contact information can be checked to see whether it contains a given name or email address, but it can't be coerced into revealing the full list of keys that were used to build it. It's even possible to turn the false-positive rate, which may not sound like a feature, into a powerful tool."

"If any one of the filters is intercepted, it will register the full 50% false-positive rate. So I am able to hedge my privacy risk across several interactions, and have some control over how accurately other people can see my network. My friends can be sure with a high degree of certainty whether someone is on my contact list, but someone who manages to snag just one or two of my filters will learn almost nothing about me."

"Additionally, you can combine two Bloom filters that have the same length and hash functions with the bitwise OR operator to create a composite filter."

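To make the Bloom filter idea concrete, here is a minimal sketch in Python rather than the article's Perl. The filter size, the number of hashes and the salted SHA-1 scheme are assumptions chosen for illustration, not the article's implementation. It shows membership tests and the bitwise OR combination described in the last excerpt.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; size_bits and num_hashes are illustrative defaults."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-1 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(f"{i}:{key}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def __contains__(self, key):
        # True means "possibly present"; False means "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

    def union(self, other):
        # Only filters with the same length and hash functions can be combined.
        if (self.size_bits, self.num_hashes) != (other.size_bits, other.num_hashes):
            raise ValueError("filters must share length and hash functions")
        combined = BloomFilter(self.size_bits, self.num_hashes)
        combined.bits = self.bits | other.bits
        return combined


mine = BloomFilter()
mine.add("alice@example.org")
yours = BloomFilter()
yours.add("bob@example.org")
shared = mine.union(yours)
print("alice@example.org" in shared, "bob@example.org" in shared)
```

The filter only ever answers "possibly" or "definitely not", which is why it can be handed out without revealing the list of keys that went into it.
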
Sunday, April 18, 2004

Save Our Software

Save Our Software "During the Internet boom, the traditional companies most obviously affected were bricks-and-mortar bookstores and travel agents...But in 2004, the businesses under direct financial assault by the broadband consumer Internet are not toothless independent bookstores. Instead, they are the major music labels, movie studios, and broadcasters."

"First, the FCC is conducting a proceeding to set the guidelines for what it calls 'software defined radio'...In the near future, it will be possible to use spectrum more efficiently and to increase competition in a space that's now the sole domain of incumbent operators. But only if we keep the FCC from regulating these technologies."

"Second...the FCC generally approved a mandate called the 'broadcast flag,' which would require that digital TV broadcasts include an anti-theft code to prevent consumers from recording programs off the air."

Thursday, April 15, 2004

WebFountain posting

WebFountain, the Long Version "So how does WebFountain make answers to such complex and specific queries possible? Short answer: A lot of hardware and a shitload of metatags. Longer answer: WebFountain does more than index the web, then serve up results based on keyword matches and some clever algorithms. Sure, it indexes the web, but once the pages are crawled, WebFountain goes several steps beyond consumer search engines, classifying those pages across any number of crucial semantic categories. (Yes, IBM is active in the semantic web conversation, and has published several specs on this in the public domain). Using natural language and machine learning technology, along with a host of structured data cross-references (such as public company databases or, perhaps, a client’s proprietary database of industry terminology), WebFountain basically re-structures the web, making it accessible to a client’s queries."

"As I mentioned earlier, IBM’s model for WebFountain is platform-based. Assuming they can pay the freight, most anyone can develop for it, using a standard API that leverages simple web services. IBM won’t disclose most of its customers, but two it will mention are Semagix, which has a (pretty damn frightening) money laundering application, and Factiva, which has developed a “reputation manager” - think of it as Technorati on steroids for the serious corporate marketing or legal department. (Imagine being able to find any mention of your product or service anywhere on the web and create custom filters for the context, location, date, author, and relationships attached to those mentions, in near real time.)"

History of Treemaps

Treemaps for space-constrained visualization of hierarchies "During 1990, in response to the common problem of a filled hard disk, I became obsessed with the idea of producing a compact visualization of directory tree structures. Since the 80 Megabyte hard disk in the HCIL was shared by 14 users it was difficult to determine how and where space was used. Finding large files that could be deleted, or even determining which users consumed the largest shares of disk space were difficult tasks.

Tree structured node-link diagrams grew too large to be useful, so I explored ways to show a tree in a space-constrained layout. I rejected strategies that left blank spaces or those that dealt with only fixed levels or fixed branching factors. Showing file size by area coding seemed appealing, but various rectangular, triangular, and circular strategies all had problems. Then while puzzling about this in the faculty lounge, I had the Aha! experience of splitting the screen into rectangles in alternating horizontal and vertical directions as you traverse down the levels. This recursive algorithm seemed attractive, but it took me a few days to convince myself that it would always work and to write a six line algorithm."
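
The recursive idea of splitting rectangles in alternating directions can be sketched in a few lines of Python. This is not the author's six line algorithm, which the excerpt doesn't show; the dict-based tree format and the function names are assumptions for illustration.

```python
def size(node):
    """Size of a file, or the summed size of everything under a directory."""
    children = node.get("children")
    if children:
        return sum(size(child) for child in children)
    return node.get("size", 0)


def treemap(node, x, y, w, h, depth=0, out=None):
    """Return (name, x, y, w, h) rectangles for the leaves of the tree."""
    if out is None:
        out = []
    children = node.get("children")
    if not children:
        out.append((node["name"], x, y, w, h))
        return out
    total = size(node)
    offset = 0.0  # fraction of the parent rectangle already used
    for child in children:
        frac = size(child) / total if total else 0.0
        if depth % 2 == 0:
            # Even levels: slice the rectangle left to right.
            treemap(child, x + offset * w, y, frac * w, h, depth + 1, out)
        else:
            # Odd levels: slice the rectangle top to bottom.
            treemap(child, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out


tree = {"name": "root", "children": [
    {"name": "a.dat", "size": 50},
    {"name": "docs", "children": [
        {"name": "b.txt", "size": 30},
        {"name": "c.txt", "size": 20},
    ]},
]}
for rect in treemap(tree, 0, 0, 100, 100):
    print(rect)
```

Each leaf's area is proportional to its size, and the rectangles tile the whole space with no gaps, which is the property the excerpt was after.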

I especially liked PhotoMesa.

Wednesday, April 14, 2004

RSS on the TV

NewsGator Media Center Edition Provides Access to Syndicated Content on TV Sets; 'Living Room' Interface Allows Users to Read Selected Content, or Watch On-Demand Video Content "NewsGator Technologies launched NewsGator Media Center Edition today, which allows users to read syndicated content feeds on their TV with Windows XP Media Center Edition. Both text and multimedia content is supported, with an interface designed to be used with a remote control from across the room. NewsGator Media Center Edition shows information that has not already been viewed on another device by synchronizing user subscriptions with NewsGator Online Services."

You know your technology has reached maturity when someone tries to put it on the television. A good followup posting with links to screenshots.

DSpace presentations

DSpace User Group Meeting 2004 Presentations A bunch of Powerpoint presentations on DSpace.

42...What was the question?

Abracadabra, 42, Curator "The Hitchhiker's Guide to the Galaxy, those afflicted with 42 fever argue that "UML" is actually the correct answer. The classical symptom of those afflicted with 42 fever in the sphere of software engineering is to have an a priori delusion that UML is the solution for all software-engineering problems."

Part of a larger set of articles on UML Fever.

"The entertaining tenor of "Death by UML Fever" should not be inferred to suggest that UML fever is an imaginary ailment. It is genuinely real, it is thriving, and its presence is causing cost and schedule trauma on many software programs right now. Furthermore, the root causes of this fever, in general, have nothing to do with the UML itself: Rather, this fever and its various manifestations are largely symptoms of deeper ills in an organization's software development practices. Software organizations should consider launching self-diagnosis campaigns to assess the presence or extent of UML fever on their programs and plan rehabilitation strategies as necessary. Developing good software is a difficult enough task without having to endure the preventable and often painful complications of the dreaded UML fever."

We all Agree

As We May Hack "I can walk into any meeting anywhere in the world with a piece of paper in hand, and I can be sure that people will be able to read it, mark it up, pass it around, and file it away. I can't say the same for electronic documents. I can't annotate a Web page or use the same filing system for both my email and my Word documents, at least not in a way that is guaranteed to be interoperable with applications on my own machine and on others. Why not? - A Manifesto for Collaborative Tools

Yawp after me - "RDF! RDF!". But seriously, the answer to "why not?""

From Yawping!. The collaborative tools manifesto has been mentioned several times recently; it's worth a look.

Monday, April 12, 2004

Another RSS library for Java

RSSLib4J is a set of Java APIs to parse and retrieve information from an RSS feed. It supports the RSS 0.9x, 1.0 and 2.0 specifications with the Dublin Core and Syndication namespaces.

Sunday, April 11, 2004

Querying Use Cases and Features

Data Access Working Group Use Cases: WORKING DRAFT "Because there are no formal standards in these areas, developers in industry and in open source projects have created a wide variety of query languages for RDF data...These languages lack both a common syntax and a common semantics; there is, in fact, a wide variety of semantics: from declarative, SQL-like languages, to path languages, to rule or production-like systems. The existing languages also exhibit a range of extensibility features and builtin capabilities, including inferencing, distributed query, and domain-specific semantics."

Some of the features being considered include: optional triples, disjunction, queries with paths two or more edges long, the ability to indicate whether the query response includes entailment from the graph or treats the graph as a fixed object, expressing arbitrary RDF datatypes, queries expressible as a URL, user-specifiable query result formats and query results in RDF (closure).
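
As an illustration of what "optional triples" means in practice, here is a small Python sketch. It is not taken from the working draft or from any particular query language: the pattern syntax (variables starting with "?") and the match and query helpers are made up. A row is kept when the required pattern matches, and it is extended with the optional pattern's bindings only when those are available.

```python
def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must agree with any value it is already bound to.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def query(graph, required, optional):
    """Match the required pattern, then add the optional one where possible."""
    results = []
    for triple in graph:
        base = match(required, triple, {})
        if base is None:
            continue
        extended = []
        for candidate in graph:
            row = match(optional, candidate, base)
            if row is not None:
                extended.append(row)
        # Keep the row even when the optional part has no match.
        results.extend(extended or [base])
    return results


graph = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", "foaf:name", "Bob"),
]
rows = query(graph, ("?p", "foaf:name", "?name"), ("?p", "foaf:mbox", "?mbox"))
for row in rows:
    print(row)
```

Without the optional behaviour, Bob would drop out of the results entirely because he has no mailbox statement.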

Saturday, April 10, 2004

Librarians and the Semantic Web

People at work think that I have a fetish for girls in glasses or something. But actually I think that librarians are going to be an important part of the Semantic Web. Oh and chicks in glasses are dead sexy...err...anyway, a recent posting, "thinking about the semantic web", links to a series of postings along these lines (librarians, not glasses):
* Extreme! "I want people to leave understanding that the Semantic Web didn’t come out of nowhere, that there’s a good century (at least!) of work coming to terms with infoglut that Semantic Web technologies build on. I want them to start thinking critically about website information architecture, about information management in general, about metadata, about problems with (and hopes for!) search engines.",
* More on metadata "Well, I have news. Everything Cory says is true. But it doesn’t matter. Hear me? It doesn’t matter. Because all those arguments aren’t arguments against a Semantic Web—they’re arguments against a populist Semantic Web.", and
* Metadata and authority "This means, quite simply, that a lot of current Semantic Web pipe dreaming will founder on authority and representation-convention problems. Heck, it’s already foundering."

One day, when I categorize this blog there will definitely be a "librarians" section.


Semantically Adapted Service Data Objects "a prototype extension to the base Service Data Objects implementation provided in the ETTK. Specifically, it adds the ability to use RDF and the W3C Web Ontology Language (OWL) as a means of attaching a semantic metamodel to an SDO DataGraph and use that model to navigate and query the DataGraph independent of its structural model, without requiring any changes to the SDO API or running any transformations on the data."



Gnowsis is a project led by Leo Sauermann. I've read about it before but apparently never blogged it. Many of the ideas like the Semantic Desktop are worth noting. This type of application is something that seems to be coming up again and again.

He gave a list of features to add: optional triples, update language, protocol on the web and a protocol on the desktop (ODBC like). The first two are handled by iTQL, although the second is more elegant than the first. Kowari/TKS has always had a driver. The Kowari Lite client, currently the executable iTQL jar, is only about 1MB and doesn't include things like Jena.

why I love Patrick Sticklers URIQA approach "Building RDF aggregators is not an easy task. I have tried it in many different ways, and had varying forms of success. If you want an easy approach that does it, think about concise bounded descriptions."

I do think that the resolver idea, using the FROM clause, views and defining caching and updates, is going to be an easy way to do RDF aggregation.

Thursday, April 08, 2004

Regime Change

U.S. Terrorism Policy Spawns Steady Staff Exodus "Since the Sept. 11 attacks, the Bush administration has faced a steady exodus of counterterrorism officials, many disappointed by a preoccupation with Iraq they said undermined the U.S. fight against terrorism."

""I'm kind of hoping for regime change," one official who quit told Reuters."

""Iraq has been a distraction from the whole counterterrorism effort," said the former official, adding the policy had frustrated many in the White House anti-terrorism office, about two-thirds of whom have left and been replaced since Sept. 11."

"Roger Cressey, who served under Clarke in the White House counterterrorism office, said: "Dick accurately reflects the frustration of many in the counterterrorism community in getting the new administration to take the al Qaeda issue seriously.""

Oh and for something cool with photo-mosaics:
War President "Below is a small version of an image I made. It's a mosaic composed of the photos of the American service men and women who have died in Iraq. No photograph is used more than three times." From here.

If it's broken:

An OS tool to do mosaics (includes Tux mosaic - I thought it was something else when I first looked at it):

Is WiX a trick?

MS Open-Source Move is Straight from Playbook "Looking more closely, WiX enables developers to translate programs from Windows Installer Databases (.msi/.msm) formats to a text-based, XML-file format. XML is an open standard, but to work with MSI/MSM, those XML files have a very specific format. Now, what company has already sought patent protection for specific expressions of XML code? The answer is, of course, Microsoft, with its Office XML formats.

Has Microsoft done this with WiX's XML formats yet? I don't know. But if the pros from Redmond haven't yet, they will. They did it for Office XML document formats; they'll do it for this. Thus, Microsoft's open-source code will work only on Microsoft-proprietary XML to produce Microsoft-proprietary installation programs. With open source like this, who needs proprietary programs?"

"Now, while it may look like Microsoft is doing something new, or perhaps even something helpful to the open-source community, it's not. What Microsoft is really doing is putting more of the Halloween memo's plans into action. Why shouldn't it? The Halloween plans are just an elaboration of Microsoft's time-tested embrace-and-extend technique. The only embrace Microsoft is really giving the open-source community is a stranglehold."

Tuesday, April 06, 2004

The Power of Free

The Profitability of Free Code "Some thought that open source meant non-commercial or even anti-commercial.' At the time, I wondered how far open source could go and in that posting I identified three criteria for success with open source:
• the user community must be vast
• the product scope must be well defined
• a viable business model must exist.

Today, I think that people are starting to understand that open source is more than just a development approach—it's a business model. What evidence is there to support this? Just follow the money: the OSBC conference was packed with venture capitalists and lawyers; techies were outnumbered ten-to-one. Even long time veterans of the closed source software world, like Ray Lane (previously of Oracle) and Chris Stone of Novell are focused on leveraging the power of open source in their businesses.

And why not? That's the nice thing about capitalism. It has a built-in Darwinian efficiency. If open source allows companies to deliver better products at a cheaper price, then it will be used to do just that. We see many of the most competitive companies in the world, like Amazon, Charles Schwab, Cisco, Corporate Express, FedEx, GE, Merrill Lynch, Motorola, Nokia, Sabre, SAP, and UPS using open source software as a platform for building new applications. Many companies that have been successful with Linux are now looking at deploying an entire open source stack, known as LAMP (Linux, Apache, MySQL, PHP/Perl/Python).

These days, it's easier than ever before for an ambitious developer to download all the software he or she needs to build a product—or even to build a business—on top of commodity hardware and open source software. Some of today's hottest pre-IPO companies, including billion-dollar-babies and Google, as well as emerging social networking companies like AlwaysOn's Zaibatsu, Friendster, and, are all using open source software."

Kowari 1.0.2 Soon

We've fixed all the known bugs; that just leaves the unknown ones. We've got JRDF, Jena, and our own RDQL implementation. The loading and query speed has also improved considerably, and the problems previously associated with 32-bit systems are gone. Kowari lite is down to 11MB, which is pretty good considering that Jena and related jars are over 2MB. Querying is now totally disk backed - you can do a completely unconstrained query over arbitrarily large data sets.

We're now at the stage of deciding what to do next. Some options are fairly obvious, like full data type support, scripting (possibly Groovy) and inferencing. But we've also got zeroconf and MAYBE. There's also a large refactoring underway to turn Lucene, the triple store, ontology models and others into plugins exposed via a resolver interface. This should allow just about anything to be queried in the FROM clause of iTQL. In the short term, this means extending the querying of file and http RDF sources to include other metadata formats such as MP3 tags, iCal, vCard and XML-based RSS. It will also allow models to be typed and configured. Using views you could easily create an RSS aggregator. Other, more difficult resolver implementations will be available in TKS.
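To make the resolver idea a bit more concrete, here is a minimal sketch in Python of how pluggable sources might sit behind one query entry point. It is not Kowari's actual API: the names `Resolver`, `can_resolve`, `resolve`, `TripleStoreResolver`, `TagResolver` and `query`, and the `model:` / `file:` source names, are all made up for illustration. The only idea taken from the text above is that each kind of source (a triple store, MP3 tags, and so on) answers the same triple-pattern question, so a FROM-style list of sources can mix them.

```python
from typing import Iterator, Optional, Protocol, Tuple

Triple = Tuple[str, str, str]
# A pattern is a triple in which None acts as a wildcard.
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


class Resolver(Protocol):
    """Hypothetical plugin interface; an assumption, not Kowari's real one."""

    def can_resolve(self, source: str) -> bool: ...

    def resolve(self, source: str, pattern: Pattern) -> Iterator[Triple]: ...


def matches(triple: Triple, pattern: Pattern) -> bool:
    """True if every non-wildcard position of the pattern equals the triple's."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


class TripleStoreResolver:
    """Resolves named models held as plain lists of triples."""

    def __init__(self, models):
        self.models = models  # dict: model name -> list of triples

    def can_resolve(self, source):
        return source in self.models

    def resolve(self, source, pattern):
        return (t for t in self.models[source] if matches(t, pattern))


class TagResolver:
    """Exposes key/value metadata (think MP3 tags) as triples about the file."""

    def __init__(self, files):
        self.files = files  # dict: file URI -> dict of tag name -> value

    def can_resolve(self, source):
        return source in self.files

    def resolve(self, source, pattern):
        triples = ((source, key, value) for key, value in self.files[source].items())
        return (t for t in triples if matches(t, pattern))


def query(resolvers, sources, pattern):
    """Collect matches from every source, using the first resolver that accepts it."""
    results = []
    for source in sources:
        for resolver in resolvers:
            if resolver.can_resolve(source):
                results.extend(resolver.resolve(source, pattern))
                break
        else:
            raise LookupError(f"no resolver for {source}")
    return results


resolvers = [
    TripleStoreResolver({"model:people": [("ex:alice", "ex:likes", "file:song.mp3")]}),
    TagResolver({"file:song.mp3": {"ex:title": "Example Song", "ex:artist": "Example Artist"}}),
]
rows = query(resolvers, ["model:people", "file:song.mp3"], (None, "ex:title", None))
```

The query code never needs to know what kind of source it is talking to, which is what would let new metadata formats be added as plugins without touching the query layer.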

JRDF has been a little neglected. The next things to be worked on are transactions and a ResultSet-like interface.

Your Life in a Wal-Mart Database

What to do with all that information "[Wal-Mart] the largest in the world, now also maintains the world's largest database, 280 Terabytes, as it seeks to better understand how to efficiently serve its customers. For instance, it increasingly uses Radio Frequency ID tags -- tiny 25-cent data-holders affixed to a box of razors and other goods. Wal-Mart uses RFID to track its inventory through much of the supply chain. Soon the tags will only cost pennies and start going on the razors themselves. Talk about increasing the amount of data to store and analyze! (I listened in the cool new $10 shirt I bought Sunday at a local Wal-Mart store.)"

"And in the not too distant future, data may be able to interact with other data without human intervention. That was the message of Web inventor Tim Berners-Lee. He talked about the glorious future of something called the "semantic web," a concept that most people aside from programmers will probably never understand. His talk here didn't help. One hopeful journalist from the Economist asked Berners-Lee to give an example of how companies could make or save money using it, but he didn't have an answer. He doesn't have to. He's an academic. Businesspeople here assured me this is a big deal we'll hear more about. "

I thought I'd read this before but couldn't find it in a previous entry, although I did find a previous claim that the biggest database was France Telecom's at 30 TB.

Monday, April 05, 2004

Two Quick Projects

* ruby-rdf Very timely. "A project to advance ruby in the semantic web realm. Specifically, this will be an attempt to port the Redland RDF framework to ruby in an object oriented manner."
* ROWL "The system enables users to frame rules in RDF/XML syntax using an ontology in OWL. Using XSLT stylesheets, the rules in RDF/XML are transformed into forward-chaining defrules in JESS. We make use of two more stylesheets to transform ontology and instance files into Jess unordered facts..."

Chimney Sweeps

The Next Job for the W3C "People will start to feel that when they access some data that they didn’t really think was relevant, they hadn’t thought was relevant to the problem at hand, and suddenly, their problem gets connected to some other things."

"To answer questions within a company, you’ve got to reach across boundaries. You’ve got stovepipes of information. You have HR information. You have totally separate information for bug reports and customer satisfaction.

You might have another, for example, for documentation. We find that we were tracking a customer’s problem, but most of those documents left the company, and when we left the company, we should have kept them. So let’s find out who else has written documents which are being used by people who are tracking bug reports for processing customers who have this problem.

At the moment, what we’re looking at for this is huge monster applications like the pre-Web, which are document databases. That didn’t work because everyone wanted to operate it a certain way. It’s much better to let each part of a company have a Web site, and they’re linked to each other. Let the financial folks organize things how they want, and let manufacturing control people organize their things how they want, and then merge them. Where they have things in common like people or times or places—they have to be able to connect them."


Better is better: improving productivity through programming languages "This is one reason why Lisp and functional programming folks are a bit cool on stuff like Object and Aspect orientation. They can extend to Aspects or Objects using the language constructs directly. To most of us working in popular languages these are paradigm shifting ways of thinking about programming. But we can't use Java directly to support something like aspects or generics. There's whole communities working on language and compiler extensions, open source libraries, as well as that all important vendor support - J2EE app servers are being reengineered just to support aspects. To a Lisp person these are just constructs expressed as macros (which are written in Lisp itself), something you do on a rainy weekend. What's the fuss about?"

"It's something of a running joke in the Lisp community that all language efforts are doomed to reinvent Lisp."

"It's corny to say the worst thing about Lisp is a Lisp programmer. But there's some truth in it. Work long enough in this industry and you'll run into a Lisp bigot...I think Lisp never traditionally emphasized libraries because the language is so flexible you can build one on your own, but then you have the twin nightmares of integration and reuse lying in wait for you."

I have been thinking about Lisp and aspect programming recently. The other thing on my mind is ontological programming, or programming against an ontology.

Blast from the past

Another catch up post:
* Ercim News posting in 2002 mentioned both Corese and Notio. Their API uses strings to define concepts, relations and attributes. Apparently, it's now available for download (via PlanetRDF). It uses Jena's ARP but has its own CG interface.
* Sleepycat for Java Released under the same licence as the C version.
* Platypus Wiki Again, using Jena and looks good.
* Not in RDF The difficulties of NOT in RDF. I think Kowari/TKS lacks NOT because it was initially hard to do in our implementation; there's now no reason not to do NOT.
* Fixing the Java Memory Model, Part 2 Includes double checked locking, changes to volatile and initialization safety.
* Blog-Bleary? Try (What Else?) a Blog on kinja. Another review.
* RSS Versions Syndic8 is reporting that RSS 1.0 is now 46.4% of all RSS feeds with RSS 2.0 and 0.91 (combined) taking up 49.9%. Will RSS 1.0 take the lead soon?
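On the "Not in RDF" item in the list above: one common way to read NOT over a finite set of triples is as a set difference, i.e. "everything matching A, minus everything matching B". The Python sketch below shows only that reading. It is not how Kowari/TKS does or would implement it, the function names and `ex:` data are invented for illustration, and it quietly assumes a closed world (a missing triple is treated as false), which is exactly the assumption that makes NOT awkward in RDF.

```python
def subjects_with(triples, predicate, obj=None):
    """Subjects that have the given predicate (and object, if one is given)."""
    return {s for s, p, o in triples if p == predicate and (obj is None or o == obj)}


def subjects_lacking(triples, type_uri, predicate):
    """Subjects of the given rdf:type that have NO triple with the predicate.

    Closed-world assumption: absence from `triples` is treated as "not true".
    """
    typed = subjects_with(triples, "rdf:type", type_uri)
    return typed - subjects_with(triples, predicate)


triples = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", "rdf:type", "foaf:Person"),
]

people_without_mbox = subjects_lacking(triples, "foaf:Person", "foaf:mbox")
```

Here `people_without_mbox` contains only `ex:bob`. If another source later asserts a mailbox for Bob, the answer changes, which is why negation over open-ended, distributed data needs care.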