Saturday, December 31, 2005

Smalltalk meets the Semantic Web

Smalltalk:::OWL-Project "OWL has emerged from the AI/semantic community and tends to be in the open-source community which appears to be a direction for Smalltalk (e.g. Smalltalk Solutions at Linux World) Much of the work to date has been implemented in Python and Ruby which, from a language perspective, is very close to Smalltalk. However, those languages become less appealing if you have ever worked in the IDE's supporting those languages. OWL can provide the Smalltalk community with a "market" that is a good fit for the features of the ST language and supporting IDE's."

"Agilense provides a product named EA WebModeler...is an implementation of the Adaptive Object Model pattern..."

A good summary of AOM: "We call these systems "Adaptive Object-Models", because the users' object model is interpreted at runtime and can be changed with immediate (but controlled) effects on the system interpreting it. The real power in Adaptive Object-Models is that the definition of a domain model and rules for its integrity can be configured by domain experts external to the execution of the program."...If you got just one reason why EJB is flawed, this has got to be the one, you can't build systems for Enterprises at the same time take away control form the business stakeholders."

RETE Rebuked

A blog I recently started reading after it criticized SPARQL. This time it's criticizing an entry Paul made about the scalability of the RETE algorithm: "I am a bit confused by the statement that RETE does not scale. This is contrary to a mountain of papers by researchers and developers around the world. From the paragraph, a couple of things come to mind. "(loading indexes) does not need to be done often" tells me the author doesn't understand the purpose and goal of RETE. RETE was designed to solve machine learning problems where data changes rapidly and reasoning is a continuous process. What the author wants is something closer to BitMap indexes used in OLAP products."

A lot of the points raised, like RETE being for changing data, are mentioned in a previous post under Meeting. Drools was chosen as a starting point to see what kind of system needed to be developed in Kowari.

Also mentioned, bitmap indexing: "Given Tucana is indexing everything, they might as well adapt Bitmap indexing and get better than linear performance. The problem described by the blog is a well understood problem in the OLAP world."

A previous entry, "Relational theory, RETE and Derby" points to some interesting articles about bitmap indexes (available in Oracle 9) and high scalability requirements: "In a large financial institution like a mutual fund company, they may have 1-20 million customers. If each customer has an average of 20-30 positions (aka specific holding of an equity) that means the potential dataset for firm wide compliance rule could involve 20million+ rows. Doing this within 2-5 seconds is rather hard, so it requires using lots of different techniques. In the extreme cases, a company might have 20 million accounts, which means the potential dataset is 600 million rows."

From the OTN article: "B-tree indexes are usually used when columns are unique or near unique; bitmap indexes should be used, or at least considered, in all other cases. While you would not generally use a b-tree index when retrieving 40 percent of the rows of a table, a bitmap index is often still faster than doing a full table scan. This is seemingly in violation of the 80/20 rule, which is to generally use an index when retrieving 20 percent or less of the rows and do a full table scan when retrieving more. Bitmap indexes are smaller and work differently from the 80/20 rule. You can effectively use bitmap indexes even when retrieving large percentages (20 to 80 percent) of a table. Bitmaps can also be used to retrieve conditions based on nulls (since nulls are also indexed) and for "not equal" conditions."

It would appear that bitmap indexes would suit indexing predicates, which have relatively few distinct values, but not indexing in general, as both subjects and objects are near unique.
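
To make the predicate case concrete, here is a minimal sketch of a bitmap index over a low-cardinality column. The function names and the sample predicate column are made up for illustration; this is not how Kowari or Oracle implement it, just the basic idea of one bit per row per distinct value.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return dict(index)


def rows_of(bitmap):
    """Expand a bitmap back into a sorted list of row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical predicate column of a small triple table: few distinct values.
predicates = ["foaf:name", "foaf:mbox", "foaf:name", "foaf:knows", "foaf:mbox"]
index = build_bitmap_index(predicates)

# "name OR mbox" is a bitwise OR; "not equal to name" is a masked complement.
all_rows = (1 << len(predicates)) - 1
name_or_mbox = index["foaf:name"] | index["foaf:mbox"]
not_name = all_rows & ~index["foaf:name"]
```

With near-unique subjects and objects there would be almost one bitmap per row, which is where the approach stops paying off.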

Thursday, December 29, 2005

Good API Design

Java API Design Guidelines "If your API is worth anything, it will evolve over time...decide what sort of compatibility you will guarantee between revisions...What should the design goals of your API be?...absolutely correct...easy to use...easy to learn...fast enough...small enough...it's much easier to put things in than to take them out."

This seems all well and good.

"Interfaces can be implemented by anybody. Suppose String were an interface. Then you could never be sure that a String you got from somewhere obeyed the semantics you expect: it is immutable; its hashCode() is computed in a certain way; its length is never negative; and so on."

Adding interfaces on top of String like CharSequence was a good thing. It meant that you could process a String or a StringBuffer the same way, as well as consistently treat something that may have been in memory or on disk (including being memory mapped via NIO). The way to ensure that String implementations do follow the correct semantics is test driving the interfaces, and this requires being able to create a mock object of that interface - pretty much sealing the deal as far as interfaces are concerned.

Actually, the whole section on interfaces is pretty much a wash as this can be solved by test driving 3 of the 4 points raised. As far as using an abstract class to help with the evolution of the API, you can have both - an interface and a default abstract class for some base implementation.

Exceptions get a going over too, "Use a checked exception "if the exceptional condition cannot be prevented by proper use of the API and the programmer using the API can take some useful action once confronted with the exception." In practice this usually means that a checked exception reflects a problem in interaction with the outside world, such as the network, filesystem, or windowing system."

Related: Evolution Not Creation, Reminder About Incremental and Test Driven Development and 10 Minute Commits for Better Code.

5 Minutes with Monad

I've recently spent a bit of time trying to solve some problems using Microsoft's new Monad shell. It's interesting that the creature for the O'Reilly book is the common toad. It was a different experience, although the pain is eased a little as the default installation comes with all the new commands (cmdlets) mapped to Unix ones (like ls and ps).

The default security setting prevents remotely signed objects from being executed and, with the documentation missing, there seems to be no obvious way to change it. To make it usable it's:
set-property HKLM:\SOFTWARE\Microsoft\Msh\Microsoft.Management.Automation.msh -property ExecutionPolicy -value RemoteSigned

Going through the tutorials it did show itself to be kind of cool. For example, being able to select the top 10 processes based on VirtualMemorySize:
get-process | sort-object VirtualMemorySize | select-object -last 10
You can whack on a "convert-HTML" or an "export-csv" to produce the result in a format you want or connect to Excel or SQL Server to retrieve data. A lot has been made of its native XML support and how it passes around strongly typed objects rather than just Unix's streams.

One of the problems was trying to do line by line processing. There were promises of pipelining via XML streams but according to "Replace lines in a text file?" (the first hit on Google) Monad doesn't support it. The lack of streaming appears to be a crucial omission in a toolset designed for system administrators - although it might not be fatal as log files and the like don't usually come close to the available memory of modern systems.

It does support accessing the .NET APIs which provides a loophole. For example, to read a file line by line and replace "xxx" with "yyy":
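$f = [System.IO.File]::OpenText("c:\file.txt")
# Compare against $null explicitly: an empty line is "" and would otherwise end the loop early.
while (($line = $f.ReadLine()) -ne $null)
{
    $line -replace "xxx","yyy"
}
$f.Close()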

It was all for nothing, as I later found out that it didn't support Windows 2000 and it needed to be deployed on that - it is supported by Windows XP, 2003 and Vista. Back to Windows Script Host (maybe using Ruby) I guess.

Sunday, December 25, 2005

Merry Bag Of Links

Political:

  • Support Creative Commons "We are down to the last $100,000, and really need your support — both for the very cool projects we’re launching (see, e.g., the license interoperability project, discussed recently in Technology Review, and the two new projects announced this week), and for the very uncool pressure we’re under from IRS regulations to demonstrate “public support” as a condition for keeping our (absolutely essential as in we can’t live with out it) tax exempt status." Via We've got 10 days, and we need $100,000. Please help

  • Passion of the Spaghetti Monster and Intelligent Design

  • Top 12 media myths and falsehoods on the Bush administration's spying scandal "...the Bush administration and its conservative allies in the media have defended the secret spying operation with false and misleading claims that have subsequently been reported without challenge across the media."

  • The Curious Section 126 of the Patriot Act "Congress is seeking assurances that "the privacy and due process rights of individuals" is protected in the course of the government using massive databases of non-publicly available data; both proprietary databases and its own compiled intelligence and law enforcement databases to "search" for terrorists and terrorist connections."


Agile:

  • Client vs. Developer Wars "This, to me, is another indictment of dysfunctional specifications. I learned long ago that clients won't listen to what you say, and they certainly won't read what you write. You're much better off putting that wasted effort into a working model and setting it in front of the client. Let them play with it for a while. Refine the working model based on that feedback, then keep turning the crank on this cycle until you run out of resources."

  • Continuous Testing - in spirals "I want tests to run ‘inside out’ Imagine a spiral, with the unit test for the bit of code you’re currently editing to be the focal point. Ideally, I’d want a test to run first for the method I changed last, then for the whole class, then for the suite the class is in, then further out to other dependencies. Tests run outward only when green bars are encountered. If there is a red bar somewhere, the spiraling stops, so we can examine the failure, fix it, and see again from the inside which tests run."

  • Essential Advice for Agile Coaches


Programming:
General Technical:

Saturday, December 24, 2005

Know When to Hold Them, Know When to Fold Them

So what is an RDF merge and when should you apply it in SPARQL?

Very succinctly, in "The Semantics of SPARQL" it says: "The RDF merge  U+ <G1...Gn> of a sequence of graphs <G1...Gn> (i.e., a dataset) is the ordered merge union of the graphs, where repeated bnodes are substituted with fresh ones, by keeping the names of the bnodes coming first in the sequence order."

In "SPARQL Query Language for RDF" it gives a simple example:

Graph 1:
_:a foaf:name "Bob" .
_:a foaf:mbox .

Graph 2:
_:a foaf:name "Alice" .
_:a foaf:mbox .

The result of the merge, upon which queries are made:
_:x foaf:name "Bob" .
_:x foaf:mbox .

_:y foaf:name "Alice" .
_:y foaf:mbox .

Section 9 details querying multiple graphs in SPARQL, including building a dataset where the default graph is the RDF merge of the graphs given in the FROM clauses.

In summary, when SPARQL operations are performed across graphs you get new blank nodes, which prevents you from, for example, JOINing across graphs on them.
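
A rough sketch of that renaming, to show why the join is lost. Triples are plain tuples of strings, `rdf_merge` and `_rename` are names I made up, and the mailto addresses are placeholders. Unlike the definition quoted above, this version relabels every blank node rather than keeping the names from the first graph, which is simpler and makes no difference to what can be joined.

```python
from itertools import count


def _rename(term, mapping, fresh):
    """Give a blank node a fresh label; leave URIs and literals alone."""
    if not term.startswith("_:"):
        return term
    if term not in mapping:
        mapping[term] = f"_:b{next(fresh)}"
    return mapping[term]


def rdf_merge(*graphs):
    """Union the graphs after standardising their blank nodes apart."""
    fresh = count()
    merged = set()
    for graph in graphs:
        mapping = {}  # one mapping per graph, so _:a in two graphs stays distinct
        for s, p, o in graph:
            merged.add((_rename(s, mapping, fresh), p, _rename(o, mapping, fresh)))
    return merged


# Placeholder data modelled on the example above; the mailto values are invented.
graph1 = [("_:a", "foaf:name", '"Bob"'), ("_:a", "foaf:mbox", "<mailto:bob@example.org>")]
graph2 = [("_:a", "foaf:name", '"Alice"'), ("_:a", "foaf:mbox", "<mailto:alice@example.org>")]

merged = rdf_merge(graph1, graph2)
subject_of = {o: s for s, p, o in merged if p == "foaf:name"}
```

Both source graphs used `_:a`, but after the merge Bob and Alice hang off different nodes, so nothing joins on the shared label.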

What is generally required by RDF applications is something like smushing. For example, an "...RDF spider (often known as a "scutter") can gather up FOAF files and "smush" them together into a single model that unifies the individual pieces of information into a network." (from "A Semantic Web Shoebox - Annotating Photos with RSS and RDF").

To actually achieve smushing, Leo has an example algorithm or it might be appropriate to adapt RDF graph isomorphism algorithms.

Friday, December 23, 2005

More on Disjunction

I think this is just going to happen time and time again, "RDF non-sense": "The W3C Working Group members argue SPARQL is a mixed mode language that does support OR, though they are calling it an "optional union". Frankly, I see no point in renaming something people understand to mean one thing. It gives me the impression the W3C want to be thought police and enforce a certain way of thinking.

On the practical side, many analysts happen to like OR disjunctions and would complain loudly. Of course, there are plenty of cases where users abuse the power and write deeply nested disjunctions. That is not a valid reason in my mind to avoid disjunction. It saves the user time and allows them to write simpler rules using disjunction. The W3C seems love RDF and wants the world to love it. Unfortunately, the current specification is a complete piece of junk. I hope RDF dies a quick and public death."

I declare...backward chaining suits me fine! "My two favorite declarative tools right now are Pellet and Prova, both of which are open source java SemWeb tools that are highly compatible with Jena, which recently got a bump to 2.3 with fairly complete SPARQL support. Pellet is an implementation of OWL-DL and some related description logic facilities by the Mindswap guys in Maryland, who have absorbed some of the l33t Kowari/Tucana guys, too (Tucana was recently picked up by Northrop, BTW)."

"Prova is a prolog-variant built on top of Mandarax. It is a very effective and fun medium for scripting of high-level relationships and operations. The integration of prolog unification, java types, java methods, and java exceptions is done very nicely, and yields fine code economy. There are some rough edges in the docs, but we are helping to get these worked out in the pretty soon."

I guess that means David W is l33t! :-)

I Hope Not

Ruby is to Perl what C++ was to C. He qualified this by saying, "Ruby improves and simplifies the Perl language" which isn't really what I saw C++ doing at all.

Quote: "These are the folks that assert that Java's verbosity is "just finger typing that Eclipse/IntelliJ will do for me," and it doesn't matter if the resulting code has 20 times the visual bulk of a simpler approach. One of the basic tenets of the Python language has been that code should be simple and clear to express and to read, and Ruby has followed this idea, although not as far as Python has because of the inherited Perlisms. But for someone who has invested Herculean effort to use EJBs just to baby-sit a database, Rails must seem like the essence of simplicity. The understandable reaction for such a person is that everything they did in Java was a waste of time, and that Ruby is the one true path."

Related: Rocking With Ruby and One way Java is better than Ruby

Wednesday, December 21, 2005

A Better PageRank

This paper gives an example of some of the flaws in Google's PageRank algorithm, and its authors suggest they have an algorithm that fixes them. Something is Wrong with Google’s Mathematical Model "In their original paper : ”The PageRank Citation Ranking: Bringing Order to the Web” [1], Page et al. suggest a new ranking algorithm named - PageRank. It is shown there that implementing the new algorithm boils down to solving a huge eigenvalues problem Ax = x (1) where A is a matrix which represents a graph related to the web. It is claimed that in order for the model to work properly, the graph should be strongly connected. In general, the graph is not strongly connected and we have ’sink’ set of pages."

"We have developed a new algorithm which can be considered as a modification of the original PageRank algorithm. The modified algorithm is stable and gives a correct ranking vector. The mathematical complexity of the suggested algorithm is the same as the complexity of the original one."

In Google's Librarian Center they've published a similar explanation of PageRank called "How does Google collect and rank results?".
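
For a feel of the sink problem, here is a small power-iteration sketch. To be clear, this is the usual textbook workaround (a damping factor, plus spreading the rank held by sink pages evenly over every page), not the modified algorithm from the paper. The three-page web and the 0.85 damping value are my own assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration; rank held by sink pages is spread evenly over all pages."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Sinks have no outgoing links, so their rank would otherwise leak away.
        sink_mass = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * sink_mass / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# Hypothetical web: "c" is a sink page.
web = {"a": ["b", "c"], "b": ["c"], "c": []}
ranks = pagerank(web)
```

Without the `sink_mass` term the ranks would sum to less than one after every iteration, which is one way of seeing the flaw the paper describes.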

LiveConnect, Lives as LAJAX

I ignored this when I first heard about it, as noted here: "I was disappointed when [Bray] said that he was going to make a product announcement, and I was unenthusiastic when the announcement turned out to be about a Sun version of Derby," Leung wrote.

Derby Demo hits a nerve "I think we hit a nerve with this demo. I think many of us within the Derby community recognized the potential for Derby within a web browser environment, but it's wonderful, great, fantastic to see how the community is "getting" it and running with it."

Derby ApacheCon demo and ApacheCon 2005: Ok, Tim, I'm not jaded anymore.

Monday, December 19, 2005

Minimum Union

The key operation to provide outer joins that are associative and commutative, as noted in Outer Joins Aren't Primitive, is called "minimum union". Minimum union (I've also seen "outer union") pads with nulls the tuples of two schemas and then unions them (without duplicate removal).

The original thesis "Algebraic Optimization of Outerjoin Queries" gives a variant of relational algebra that allows tuples defined by different sets of attributes (schemes) rather than padding with nulls. This actually removes the requirement for nulls. It also includes presenting the data in a nested relational form, where instead of one tuple per parent-child pair, each parent tuple holds its children as a set. This is just like the way Kowari presents its results (Figure 3.2 in the paper).
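
A small sketch of the padding version, with relations as lists of dicts and None standing in for null. The function names and the supplier data are made up. One caveat: as I read the thesis, minimum union also drops tuples that are subsumed by a more complete tuple, which is the step that distinguishes it from a plain outer union; treat `subsumes` below as my interpretation rather than a quote.

```python
def outer_union(r1, r2):
    """Pad every tuple with None for the attributes it lacks, then union."""
    rows = r1 + r2
    attrs = sorted({a for row in rows for a in row})
    return [{a: row.get(a) for a in attrs} for row in rows]


def subsumes(t, u):
    """t subsumes u when t differs from u but agrees with it wherever u is non-null."""
    return t != u and all(t[a] == v for a, v in u.items() if v is not None)


def minimum_union(r1, r2):
    """Outer union followed by removal of subsumed tuples."""
    padded = outer_union(r1, r2)
    return [u for u in padded if not any(subsumes(t, u) for t in padded)]


# Hypothetical data: the join of suppliers with shipments, and the suppliers alone.
joined = [{"sno": "S1", "pno": "P1"}]
suppliers = [{"sno": "S1"}, {"sno": "S2"}]
result = minimum_union(joined, suppliers)
```

Here the padded S1 tuple is subsumed by the joined one, so what remains is exactly the left outer join of suppliers with shipments.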

Sunday, December 18, 2005

A Link in Time

Event Horizon


  • Semantic Web, Here We Come "The “Structured Blogging Initiative” is an attempt to jump-start the “semantic web,” the idea of giving deeper meaning to the Internet advocated by World Wide Web creator Tim Berners-Lee. By incorporating descriptive information into the code of web pages, laypeople will be able to designate their content as a movie review, an event posting, or an item available for sale." StructureBlogging initiative and other entries: "Structured blogging initiative taking off", "More StructuredBlogging feedback" and Structured Blogging is a thing you do -- not a format.

  • Bill de hÓra discusses RDF and database schemas: "Using RDF storage provides flexibility at the domain level. Altering tables isn't needed because RDF, being a graph based, is naturally additive...My (somewhat anecdotal) experience with RDF is that datasets in the order of 10^6 and greater aren't uncommon and that you should budget for an order of magnitude increase in terms of the number of rows required for the domain storage compared to an entity relational approach...It's an interesting question whether using RDBMSes to store RDF counts as some form of abuse, or bad engineering."

Saturday, December 17, 2005

Ion Inside

First Mass Producible Quantum Computer Chip "Using the same semiconductor fabrication technology that is used in everyday computer chips, researchers were able to trap a single atom within an integrated semiconductor chip and control it using electrical signals, said Christopher Monroe, U-M physics professor and the principal investigator and co-author of the paper, "Ion Trap in a Semiconductor Chip." The paper appeared in the Dec. 11 issue of Nature Physics."

Thursday, December 15, 2005

Marks

The Man Who Wasn't There: problems of missing or partially missing data in geoscience databases "In the literature discussions between Codd and Date on the propriety or otherwise of NULLs in relational databases, there seems to have been some confusion on both sides, on one very important question. That is the distinction between database representation and function evaluation. NULLs are one approach to the problem of handling missing data within the database."

So in this respect RDF is great - you don't have to come up with a value or values to represent missing data. You only have to worry about function evaluation.

"In fact the Codd 'mark' solution does not in itself require, as unfortunately implied by Codd himself, and vigorously attacked by Date, the use of 3- or 4-valued logic, and therefore cannot be dismissed so easily. Relational database theory is based on first-order predicate logic, which uses two truth values TRUE and FALSE. If there is no value for a data item, then the logical statement corresponding to the tuple containing that item can simply omit any mention of that particular column. If the value of this data item is required in an operation, then there is only one truth value which can be returned: FALSE. This applies to database set operations such as JOINs and also to numerical operations such as totals and averages where the absence of any required data value prevents the computation from being carried out. If a total or average is required in such a situation, then the problem can be circumvented only by first selecting non-absent data. This is the correct treatment, to ensure that statistics are computed on a valid data set."

Another example of marks, tuple marks.

SH writes in about incomplete data in observational science databases, the open world assumption, 3VL and NULL, McGoveran responds saying: "In a scientific database such as the type to which you allude, a reasonable interpretation of True and False under CWA is "valid by experiment and consistent with hypotheses" and "not validated by experiment or inconsistent with hypotheses". If you give this differentiation up with CWA and nulls, you've given up scientific reasoning and the scientific method."

In Kowari, if you have the following triples: _b1, <urn:sno>, "S1"; _b2, <urn:sno>, "S2"; _b3, <urn:pno>, "P1".

And performed the following query:
select $s1
...
where $s1 <urn:sno> $o1 or $s2 <urn:pno> $o2;

It returns: _b1, _b2, null (really unconstrained).

However, if you select $s2 instead it returns: null, _b3. Using the above idea, where absent values are simply left out, it would return _b1, _b2 for the first query and _b3 for the second.

I'm not sure I really like this solution; preserving unknown seems to make more sense, as demonstrated in How FirstSQL Solves the EXISTS and Other Problems.

Outer Joins aren't Primitive

Optional data in SPARQL seems to be equivalent to left outer join in SQL. As it turns out, outer joins can be composed of disjunctions. This is similar to the original MAYBE function suggested to be added to Kowari (although that suggestion is a good deal simpler). The paper below outlines algorithms to do outer queries more efficiently. They require computing the anti-join of certain relations - an anti-join being the rows of one table that have no match in the other (much like a set difference or MINUS operation). Here is a good explanation of semi-joins and anti-joins.

Outerjoins as Disjunctions "The outerjoin operator is currently available in the query language of several major DBMSs, and it is included in the proposed SQL2 standard draft. However, “associativity problems” of the operator have been pointed out since its introduction. In this paper we propose a shift in the intuition behind outerjoin: Instead of computing the join while also preserving its arguments, outerjoin delivers tuples that come either from the join or from the arguments. Queries with joins and outerjoins deliver tuples that come from one out of several joins, where a single relation is a trivial join. An advantage of this view is that, in contrast to preservation, disjunction is commutative and associative, which is a significant property for intuition, formalisms, and generation of execution plans. Based on a disjunctive normal form, we show that some data merging queries cannot be evaluated by means of binary outerjoins, and give alternative procedures to evaluate those queries. We also explore several evaluation strategies for outerjoin queries, including the use of semijoin programs to reduce base relations."

Also related, Outer Join in Edutella where each part of the outer query is done individually.
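
To tie the pieces together, here is a sketch of a left outer join built from an inner join plus a null-padded anti-join, which is the "tuples come either from the join or from the arguments" reading. Relations are lists of dicts joined on a single key; the function names and the supplier/shipment data are invented, and this is not the paper's algorithm, only the basic composition.

```python
def semi_join(left, right, key):
    """Rows of left that have at least one match in right."""
    keys = {r[key] for r in right}
    return [l for l in left if l[key] in keys]


def anti_join(left, right, key):
    """Rows of left that have no match in right."""
    keys = {r[key] for r in right}
    return [l for l in left if l[key] not in keys]


def inner_join(left, right, key):
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def left_outer_join(left, right, key):
    """Inner join, plus the unmatched left rows padded with None."""
    right_attrs = {a for r in right for a in r} - {key}
    padding = dict.fromkeys(right_attrs)
    unmatched = [{**l, **padding} for l in anti_join(left, right, key)]
    return inner_join(left, right, key) + unmatched


# Hypothetical data.
suppliers = [{"sno": "S1", "city": "London"}, {"sno": "S2", "city": "Paris"}]
shipments = [{"sno": "S1", "pno": "P1"}]
result = left_outer_join(suppliers, shipments, "sno")
```

Each part can be computed on its own and the results concatenated, which seems to be roughly what the Edutella approach does.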

Tuesday, December 13, 2005

A Really Interactive Query Language

SQLBuilder "SQLBuilder uses clever overriding of operators to make Python expressions build SQL expressions -- so long as you start with a Magic Object that knows how to fake it."

An example:
>>> from SQLBuilder import *
>>> person = table.person
# person is now equivalent to the Person.q object from the SQLObject
# documentation
>>> person
person
>>> person.first_name
person.first_name
>>> person.first_name == 'John'
person.first_name = 'John'
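
The trick is easy enough to sketch. What follows is not SQLBuilder's actual code: `Table`, `Field` and `Expr` are names I made up to show how overriding operators lets an ordinary Python expression build a SQL string, and the quoting is naive (repr of the value), so it is nowhere near safe for real queries.

```python
class Expr:
    """A fragment of SQL text that can be combined with & and |."""

    def __init__(self, sql):
        self.sql = sql

    def __and__(self, other):
        return Expr(f"({self.sql} AND {other.sql})")

    def __or__(self, other):
        return Expr(f"({self.sql} OR {other.sql})")

    def __repr__(self):
        return self.sql


class Field:
    """A column reference; comparisons return SQL instead of booleans."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return Expr(f"{self.name} = {other!r}")

    def __gt__(self, other):
        return Expr(f"{self.name} > {other!r}")

    def __repr__(self):
        return self.name


class Table:
    """The 'magic object': any attribute access yields a Field."""

    def __init__(self, name):
        self._name = name

    def __getattr__(self, column):
        # Only reached for attributes that do not exist; refuse private names.
        if column.startswith("_"):
            raise AttributeError(column)
        return Field(f"{self._name}.{column}")

    def __repr__(self):
        return self._name


person = Table("person")
query = (person.first_name == "John") & (person.age > 30)
```

Printing `query` gives `(person.first_name = 'John' AND person.age > 30)`. Since the expression is a tree of objects before it is text, this is also where you could check that a query makes sense before it ever reaches the database.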

Via, SQL API "I'd rather SQLObject be built on some ORM-neutral layer, where you can move down to that layer when SQLObject doesn't fit your problem; as opposed to now, where you kind of have to work around SQLObject."

This is almost exactly like something I was thinking about, to prevent semantically incorrect SQL queries. Add an AJAX interface on this and it would be cool and useful.

DRY and Embedded Program Code

What if you could define a user interface once, surf it via a telephone or a browser, and have the data and state carry over from one to the other and be shared across multiple users?

Beyond interactive voice response "English’s hope, he tells Inskeep in the interview, is that companies will admit how infuriating their systems often are...the practice of automatically collecting customers’ account numbers, and then making those customers repeat the numbers to an agent when they finally connect to one -- my top IVR gripe"

"Voice calls must be able to recruit data channels, and vice versa. That way, an agent could attach an IM session to your voice call and push you the URL in real-time chat. It might even be appropriate to extend the data session with screen sharing, so the agent can watch and assist. If things still don’t work out and the whole matter must be referred to someone else, you’d like to be able to initiate voice or data communication -- or both -- in a context-preserving way."

Via, Rethinking customer service. This points to XBL2 and an effort to provide "...a declarative format for applications and user interfaces...based on an existing application/UI format, such as Mozilla's XUL, Microsoft's XAML, Macromedia's MXML or Laszlo Systems' LZX..."

Model Driven, Semantic Web Enabled, Science Commons

Semantic Web eyed for life sciences data "The Semantic Web involves a concept in which data from multiple sources and ontologies can be integrated into a single information space. Experiment design automation (XDA) software vendor Teranode, which focuses on software for life sciences, plans to collaborate with Science Commons to build a neurology repository for the Semantic Web."

NeuroCommons, part of the Science Commons project, is going to provide a database and annotations of scientific data in (presumably) RDF.

Teranode explains, with its XDA product, why model driven and why the Semantic Web.

A related post, via Etymon.com Federated Databases in Science "The astronomy, chemistry, and geospatial communities were active well over a decade ago in collaborating with information scientists on federated databases through various open standards. Molecular biology is a field that currently has considerable needs in this area, stimulated by the Human Genome Project. Developing common standards through consensus is of course not a technological solution. The Web is successful because it exploits the relationships among a huge number of people making individual judgements that only people can make. Even the Semantic Web, if it ever has a chance of working, would have to depend on a very large base of common metadata standards, and that can only result from the slow process of people coming together and agreeing. There are many things that information technology cannot do on its own. The semantic integration of knowledge still remains a human activity."

Friday, December 09, 2005

Graphical Batch Files

AutoMate: this is a pretty interesting application that provides functionality similar to OS X's Automator. While you can create customized tasks in VBA, it has lots of interesting built-in functionality for manipulating Excel, FTP, terminal emulation, and keyboard and window events.

Related, but much simpler, is a piece of software called AutoIt, which was "...initially designed for PC "roll out" situations to reliably configure thousands of PCs, but with the arrival of v3 it has become a powerful language able to cope with most scripting needs." It basically allows simple automation of window, mouse and keyboard events.

Thursday, December 08, 2005

One Way Java is Better than Ruby

The unbridled humanity of APIs "But I think the Java guy has a point: 78 methods on your list objects isn't good. Less methods is good. Unless the result is stupid. Now, let's be honest here, Java is stupid. Dumb, idiotic, maybe written by people who aren't programmers; I just don't know how else to make sense of it. list.get(list.size() - 1) should be embarrassing. list.last or list[-1]? I think [-1] reads well enough, and fits into a very elegant set of functionality involving slices and whatnot. But I also think list.last is entirely justifiable. OTOH, list.get(0) isn't embarrassing, so list.first isn't as compelling."

"Maybe an interesting parallel is 0 vs. 1 indexing. 1 clearly seems more humane. I personally count starting from 1. I'm naturally inclined to index from 1. Languages go both ways on the choice...Of course Smalltalk indexes from 1, so no one gets everything right."

Humane Interfaces "Part of the reason this argument could go on forever is that Ruby’s Array is both an example of arguments for Humane design, and arguments against it...java.util.List isn’t really a shining example of good interface design either...Having two otherwise equivalent ways to perform the same operation is bad user-interface design, and it’s bad library interface design, because the existence of the synonyms actually adds to your cognitive load by making you choose between them."

Also, Why Ruby Shouldn’t Be Your Next Programming Language (Maybe).

Time you enjoy wasting, was not wasted

Wednesday, December 07, 2005

Making AJAX cool

Why Ajax Sucks (Most of the Time) "For new or inexperienced Web designers, I stand by my original recommendation. Ajax: Just Say No."

"Ajax breaks the unified model of the Web and introduce a new way of looking at data that has not been well integrated into the other aspects of the Web. With ajax, the user's view of information on the screen is now determined by a sequence of navigation actions rather than a single navigation action.

Navigation does not work with ajax since the unit of navigation is different from the unit of view. If users create a bookmark in their browser they may not get the same view back when they follow the bookmark at a later date since the bookmark doesn't include a representation of the state of the content on the page.

Even worse, URLs stop working: the addressing information shown at the top of the browser no longer constitutes a complete specification of the information shown in the window."

Tuesday, December 06, 2005

A little light I/O

Comparing Two High-Performance I/O Design Patterns "It is clear from the charts that C++ is still the preferable approach for high performance communication solutions, but Java on Linux comes quite close. However, the overall Java performance was weakened by poor results on Windows. One reason for that may be that the Java 1.4 nio package is based on select()-style API. - It is true, Java NIO package is kind of Reactor pattern based on select()-style API (see [7, 8]). Java NIO allows to write your own select()-style provider (equivalent of TProactor waiting strategies). Looking at Java NIO implementation for Windows (to do this enough to examine import symbols in jdk1.5.0\jre\bin\nio.dll), we can make a conclusion that Java NIO 1.4.2 and 1.5.0 for Windows is based on WSAEventSelect() API. That is better than select(), but slower than IOCompletionPort's for significant number of connections. Should the 1.5 version of Java's nio be based on IOCompletionPorts, then that should improve performance. If Java NIO would use IOCompletionPorts, than conversion of Proactor pattern to Reactor pattern should be made inside nio.dll. Although such conversion is more complicated than Reactor->Proactor conversion, but it can be implemented in frames of Java NIO interfaces. (this the topic of next arcticle, but we can provide algorithm). At this time, no TProactor performance tests were done on JDK 1.5."

Available in Java and C++ at Terabit.
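For anyone who hasn't met the Reactor pattern the article keeps referring to, here is a minimal sketch using Python's standard `selectors` module (a select()-style API). It is not Java NIO and not TProactor; the names `run_reactor` and `on_readable` are mine. The point is only that a reactor tells you a socket is ready and you then do the read yourself, whereas a proactor would start the read for you and call back with the completed data.

```python
import selectors
import socket

def run_reactor(sel, max_events):
    """Dispatch up to max_events readiness notifications to their handlers."""
    handled = 0
    while handled < max_events:
        events = sel.select(timeout=1.0)
        if not events:
            break  # nothing became ready within the timeout
        for key, _mask in events:
            handler = key.data  # the callback registered for this socket
            handler(key.fileobj)
            handled += 1
    return handled

received = []

def on_readable(sock):
    # Reactor style: we are told the socket is ready, then do the read ourselves.
    received.append(sock.recv(1024))

left, right = socket.socketpair()
right.setblocking(False)
sel = selectors.DefaultSelector()
sel.register(right, selectors.EVENT_READ, on_readable)

left.sendall(b"ping")
run_reactor(sel, max_events=1)

sel.unregister(right)
sel.close()
left.close()
right.close()
```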

Sunday, December 04, 2005

Links to Share and Enjoy

Another list of links:

  • XML 2005: Tipping Sacred Cows "Which brings us to one of our sacred cows: for decades we've had SQL for relational databases, and soon we'll have XQuery for general XML, and SPARQL for RDF...what if it was possible to construct a generalized query language, loosely coupled enough to work with any underlying data model? The mathematical basis for this was monoids. The presentation didn't actually define this fairly abstract term, only skipping from trivial examples like or to a fully worked representation of a generalized query. Erik's dynamic presentation style is such that I was not able to copy down the full example before he had moved on to the next slide. Whatever the details, it's a valuable topic in that it gets listeners to question their assumptions and see in new ways."

  • Dabble combines the best of group spreadsheets, custom databases, and intranet web applications into a new way to manage and share your information online. A lot of the same conversations as in Agile databases seem to be occurring, related to functionality, data integrity, migration (string to first name, last name, for example), data types and the like. Includes a blog and demo (movie). Merging is coming in version 2.0 and it's RAM based. It just cries out for an RDF data model. Via Dabble is Bloody Brilliant.

  • Problems with the $100 laptop "The time will certainly come when the appropriate tool to promote economic development will be a laptop produced very inexpensively in large volume. Before that point it will be necessary to implement systems that provide infrastructure which the laptop will need, in addition to producing tangible economic benefits for their users. OLPC is to be commended for raising issues and focusing attention, and for posing some technological challenges in a highly visible way...large sums of money are to be committed to the project in advance to fund manufacturing in deals where the customers are government ministries and not the end users." Also, $100 laptop.

  • Two interesting articles: Breaking The Quality–Speed Compromise "The most important thing we can do to break the compromises we impose on customers is to move testing forward and put it in-line with (or prior to) coding. Build suites of automated unit and acceptance tests, integrate code frequently, run the tests as often as possible. In other words, find and fix the defects before they even count as defects." and Is Agile Software Development Sustainable? "So if agile practices are a “disruptive technology” compared to traditional software development processes, then it would be quite in character for them to start by addressing small systems."

  • Exploratory Testing on Agile Projects Can Be a Good Fit "Why should agile teams do exploratory testing?: "Because an agile development project can accept new and unanticipated functionality so fast, it is impossible to reason out the consequences of every decision ahead of time. In other words, agile programs are more subject to unintended consequences of choices simply because choices happen so much faster. This is where exploratory testing saves the day. Because the program always runs, it is always ready to be explored.""

  • Matthew De George on Cranky Middle Manager. Explaining how to apply market economies to management and more. Forgive Matthew for his hierarchical view, it's all about graphs of course. I'm a bit slow in finding this.

  • A different way to vomit in another yearly milestone: Tiger Moth Joy Flights.
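Going back to the XML 2005 item at the top of this list: the talk's worked example wasn't captured, so the following is not it. It is just a small sketch of the general idea that a query can be written once against a monoid (an associative merge plus an identity value) and reused for sums, lists and existence tests. The names `Monoid` and `comprehend` and the sample rows are invented for illustration.

```python
from functools import reduce

class Monoid:
    """An associative merge operation together with its identity element."""
    def __init__(self, zero, merge):
        self.zero = zero
        self.merge = merge

def comprehend(monoid, unit, source, predicate=lambda item: True):
    """Fold the matching items of source into a single monoid value."""
    matching = (unit(item) for item in source if predicate(item))
    return reduce(monoid.merge, matching, monoid.zero)

SUM = Monoid(0, lambda a, b: a + b)
LIST = Monoid([], lambda a, b: a + b)
ANY = Monoid(False, lambda a, b: a or b)

# Made-up data; it could equally be rows, XML nodes or RDF statements.
rows = [
    {"name": "a", "price": 5},
    {"name": "b", "price": 12},
    {"name": "c", "price": 20},
]

total = comprehend(SUM, lambda r: r["price"], rows)
expensive = comprehend(LIST, lambda r: [r["name"]], rows, lambda r: r["price"] > 10)
has_free = comprehend(ANY, lambda r: r["price"] == 0, rows)
```

The same `comprehend` function serves as an aggregate, a select-with-filter and an exists check; only the monoid changes.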

Saturday, December 03, 2005

Best things in development are free

One of the key differences between Java and .NET development is cost. To get the right Microsoft solution costs thousands of dollars. And what you get is something very different to an IntelliJ, NetBeans or Eclipse. You get vendor integration (or lock in if you prefer) and competition against the community. It may appear attractive to some but it seems odd to me to actively fight integration into existing, open solutions. It does seem though that the open side is winning (I wish Sun had done the same thing when choosing a logging API).

Unit testing and source control are obvious ones. It's amazing to see that their entry level IDE does not come with this. There are great free solutions in NUnit and MbUnit. MbUnit is especially cool (released recently) allowing all sorts of built in test fixtures. And there's always Ankh for Subversion integration. There are lots of alternatives.

A summary from a Microsoft developer: Hey, Shareholders! VS 2005 is *Fantastic* and our Developers Love Microsoft! "I might wander in early on Monday to meander through the crowds celebrating the big Visual Studio launch. But my heart is heavy that we shoveled what we could together and Won't Fix-ed this release out the door. Microsoft has just opened a very big door to competition in the IDE space. Or at least towards people jealously holding onto VS 2003 and saying, "CLR 2.0? Screw that! The last time I tried to use generics my machine locked up!" Big freakin' mistake. Microsoft should be ashamed."

Monday, November 28, 2005

Little Ink

  • Semantic Web as Webized Database "In the absense of the close coupling of designers, developers, users, and applications that is found in successful database implementations, what do the semantic web technologies offer in the way of establishing a shared view of the corespondence between the data and real world?" Links to Adam Bosworth's Learning from THE WEB.

  • "IRIS is a semantic desktop application framework that enables users to create a “personal map” across their office-related information objects. IRIS includes a machine-learning platform to help automate this process. It provides “dashboard” views, contextual navigation, and relationship-based structure across an extensible suite of office applications, including a calendar, web and file browser, e-mail client, and instant messaging client." Screenshots.

  • A relational algebra for SPARQL "Despite being in the Last Call stage of the W3C recommendation track, the SPARQL query language document currently lacks mathematical rigor and fails to accurately define the semantics for some cases...SPARQL doesn’t use a special value to indicate missing information, but simply leaves variables unbound. There’s no explicit heading. The SPARQL model does not, for example, distinguish between an OPTIONAL variable that is unbound in some solutions, and a variable that is not used in the query at all."
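To make the last point in that SPARQL quote concrete, here is a sketch (my own, not taken from the paper or the SPARQL draft) that treats each solution as a dictionary of bindings and OPTIONAL as a left join. The variable names and data are invented.

```python
def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(b[var] == value for var, value in a.items() if var in b)

def left_join(required, optional):
    """Extend each required solution with compatible optional ones, or keep it unchanged."""
    results = []
    for solution in required:
        extended = [{**solution, **extra} for extra in optional if compatible(solution, extra)]
        results.extend(extended if extended else [solution])
    return results

people = [{"person": "alice"}, {"person": "bob"}]
emails = [{"person": "alice", "email": "alice@example.org"}]

solutions = left_join(people, emails)
```

Bob's solution simply has no `email` key. It also has no `phone` key, even though `phone` never appeared in the query, and nothing in the result tells those two cases apart. A relational treatment would give the result an explicit heading and a NULL for the missing email.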

Faking It

A RATIONAL DESIGN PROCESS: HOW AND WHY TO FAKE IT "(1) In most cases the people who commission the building of a software system do not know exactly what they want and are unable to tell us all that they know.

(2) Even if we knew the requirements, there are many other facts that we need to know to design the software. Many of the details only become known to us as we progress in the implementation. Some of the things that we learn invalidate our design and we must backtrack. Because we try to minimize lost work, the resulting design may be one that would not result from a rational design process.

(3) Even if we knew all of the relevant facts before we started, experience shows that human beings are unable to comprehend fully the plethora of details that must be taken into account in order to design and build a correct system. The process of designing the software is one in which we attempt to separate concerns so that we are working with a manageable amount of information. However, until we have separated the concerns, we are bound to make errors.

(4) Even if we could master all of the detail needed, all but the most trivial projects are subject to change for external reasons. Some of those changes may invalidate previous design decisions. The resulting design is not one that would have been produced by a rational design process.

(5) Human errors can only be avoided if one can avoid the use of humans. Even after the concerns are separated, errors will be made.

(6) We are often burdened by preconceived design ideas, ideas that we invented, acquired on related projects, or heard about in a class. Sometimes we undertake a project in order to try out or use a favourite idea. Such ideas may not be derived from our requirements by a rational process.

(7) Often we are encouraged, for economic reasons, to use software that was developed for some other project. In other situations, we may be encouraged to share our software with another ongoing project. The resulting software may not be the ideal software for either project, i.e., not the software that we would develop based on its requirements alone, but it is good enough and will save effort."

Does writing software show a lot about human nature? Via Why There Is No Rational Software Design Process.

Saturday, November 26, 2005

JRDF 0.3.4.2 is out

Another bug fix release. This time it was problems in the parser with RDF/XML literals and a fix for EscapeUtil. The first was an interesting problem, not directly to do with fixing the problem per se, but trying to maneuver the code in such a way as to only use the standard Java SDK. The second was a simple regex change and finding out how Unicode support has changed from 1.4 to 1.5 (and how it breaks certain regexs from 1.4). It now means that JRDF 0.3.4.2 works under 1.4 and 1.5 as expected.

Both of these problems reinforce the idea that code without tests to prove it works almost always doesn't.
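As an illustration of the kind of escaping involved, and assuming the escaping in question is N-Triples style (this is a Python sketch, not JRDF's EscapeUtil, and `escape_ntriples` is a name I've made up), the interesting case is characters beyond the Basic Multilingual Plane. Java 1.5 added supplementary character support, which is the sort of change that can trip up a 1.4-era regex.

```python
def escape_ntriples(text):
    """Escape a literal's characters outside printable ASCII, N-Triples style."""
    simple = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    out = []
    for char in text:
        code = ord(char)
        if char in simple:
            out.append(simple[char])
        elif 0x20 <= code <= 0x7E:
            out.append(char)  # printable ASCII passes through unchanged
        elif code <= 0xFFFF:
            out.append("\\u%04X" % code)
        else:
            # Characters beyond the Basic Multilingual Plane need the long form.
            out.append("\\U%08X" % code)
    return "".join(out)

accented = escape_ntriples("caf\u00e9")
supplementary = escape_ntriples("\U0001D11E")
quoted = escape_ntriples('a"b')
```

Python iterates over code points, so the supplementary character arrives as one unit. In Java it is a surrogate pair of two chars, which is where the care is needed.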

Microsoft has some interesting and incorrect words on TDD (from here).

UPDATE: Microsoft seems to have pulled the above mentioned article, although it's still available in Google's cache.

UPDATE 2: More comments on the Artima developer forum.

UPDATE 3: Microsoft Gets TDD Completely Wrong "Microsoft has completely missed the point of TDD. They got it wrong. Do not follow their guidelines: they will decrease productivity. You'll find that the process they've described doesn't work. If you stick with it, you'll find yourself writing increasingly bad code to work around its problems."

Doomed to Repeat History

Detecting Semantic Errors in SQL Queries, for example: "SELECT * FROM EMP WHERE JOB = ’CLERK’ AND JOB = ’MANAGER’". Obviously, JOB cannot be two values - yet you are allowed to express it. This and other examples are queries that are simply wrong and should be detected as such. It seems that this entire area of research exists because it's possible to write semantically incorrect queries in the first place. While the first example above may require something like better feedback at the command line, other examples like HAVING and DISTINCT exist because of the language design. It would be neat to have an SQL interpreter that would prevent these incorrect queries from being submitted (similar to IntelliJ or Word). It also covers solving problems with subqueries (using Skolemization) and Null Values.

From the SQLLint page.
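The JOB example is simple enough to sketch a check for. This is a toy, not how the paper or SQLLint does it: it only understands a WHERE clause made of `column = 'constant'` tests joined by AND, it uses plain ASCII quotes, and the function name is mine.

```python
import re

# Matches  column = 'constant'  (equality against a string constant only).
EQUALITY = re.compile(r"(\w+)\s*=\s*'([^']*)'")

def contradictory_columns(where_clause):
    """Return columns that a pure AND clause requires to equal two different constants."""
    if re.search(r"\bOR\b", where_clause, re.IGNORECASE):
        raise ValueError("this sketch only handles conjunctions")
    required = {}
    conflicts = set()
    for column, value in EQUALITY.findall(where_clause):
        column = column.upper()
        if column in required and required[column] != value:
            conflicts.add(column)
        required.setdefault(column, value)
    return sorted(conflicts)

bad = contradictory_columns("JOB = 'CLERK' AND JOB = 'MANAGER'")
fine = contradictory_columns("JOB = 'CLERK' AND DEPT = 'SALES'")
```

A real checker would need a proper parser and would have to reason about ranges, NULLs and subqueries, which is what the paper is about.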

Military Intelligence and IQL

Smart Searching "Sources of intelligence in the field include feeds from UAVs, intelligence, surveillance and reconnaissance data from a vast array of sensors and overhead platforms, signal intelligence, satellites, film and video, not to mention all the data from the open source world."

"To conduct research and analysis effectively, DIA relies on a broad inventory of technology tools from such companies as Endeca Technologies, Basis Technology, Inxight Software, Insightful, Attensity, Convera, NetOwl and Clearforest."

"Many of the search engines familiar to consumers, such as Google, are based on Boolean logic to perform a search with complex, long queries. But the Boolean language is not as expressive as it could be in the kind of query one can pose. “The InFact Query Language (IQL) can express in three words what would take 20 lines using Boolean language,” said Marchisio.

Although Insightful offers a four-hour course on using IQL, it is currently working to increase usability in order to eliminate the need for the course and to reach a broader audience among those who are not all super users or might not have the time to learn how to use the technology."

Examples of the InFact query language can be performed on the live demo here. For example: "USA > invade > Iraq - returns links to all sentences mentioning USA invading Iraq", "[organization] > win> contract - returns links to sentences mentioning who won contracts" and "* > attack > 1st Infantry Division - returns links to sentences mentioning the division being attacked".

Also related to this area, and using Semantic Web technologies, is Ontology Works, which offers integration with legacy systems and terabyte-scale knowledge servers.

Friday, November 25, 2005

Google Base data belongs to us

"I’ve read the little background material on Google’s Base and still can’t see whether the material you put there can be found by other search engines. I also cannot find evidence of an API that shares any standards for tags and structure. Is Base open or closed? So far, closed."

"I wish I were hearing more noise from the microformats guys to act as competitors — or at least as pressure on Google for openness and standards."

Google Base seems like a roach motel for your data. It's not Google's data - it's your data. Danny has some thoughts on this.

Apparently, the answer is put your data on the web. The question remains: what will the response from eBay and the database companies be?

Friday, November 18, 2005

Old Skool

This arrived in my letter box (a graphical view of the included games is here). Another year not dying now has the added benefit of being able to relive part of my childhood in joystick form. For a mere $35, there are plenty of options for hacking: putting it in an old floppy disk drive, putting it in a case as well as hooking up a PS/2 keyboard and monitor. More information on Wikipedia, schematics and on a DTV hacking forum.

Sunday, November 13, 2005

Kiwi Country

As a warning, this is not a usual post about RDF, the semantic web, Java, software development, etc.

This is just a little list I made on holiday based on my limited interactions while touring the south island of New Zealand:

  • Money - Finally a country where the coins are the correct size - the $2 coin is bigger than the $1 coin. It annoys me that coin sizes aren't based on denomination - both the US and Australia are guilty of this. Another example of some sensible thinking, the emergency number is 111 rather than 000 (New Zealand was first of course). You're having a heart attack, you've got a rotary phone, which number would you prefer to call?

  • Lack of infrastructure - this is related to population I guess but the lack of guard rails, petrol stations, dual carriage ways, etc. especially around mountainous roads was very disconcerting.

  • No newsagents - I really can't imagine living in a place that doesn't have newsagents or good book stores. In the US and Australia there's usually a healthy selection of magazines and newspapers - the only magazines that I could find were in a bookstore and there was a Mac one, a PC one and that's about it.

  • Food - was pretty much the same with Coke tasting the same (although 20mls less in a can), more pie shops, good white wines (chardonnay that was drinkable), chocolate fish and alcohol in supermarkets.

  • Language - while I started truncating my vowels by the end of the trip I also noticed that most people used "wee" rather than "little" a wee bit more than I was used to.

  • Another country with a long history of stupidly introduced species. Somewhat of a surprise was that most of the really bad ones are from Australia. Possums, magpies, wallabies, stoats, weasels and rabbits seem to be the main culprits for a country whose native fauna consists mainly of birds. The idea of a ferret eating a penguin alive rids you of some of the cuteness associated with mammals. I knew about the possums beforehand (which are pretty awful anyway) but the damage is pretty hideous and it's good to see some productive use is made of them with possum fur combined with wool in clothing. The bird highlights were the kiwi, kakapo (introduced to me in Last Chance to See and recently highlighted in Kakapo Crisis), kea (also a concept extractor) and fiordland crested penguins (who have lost the fear of land based predation). The future does look good with many of them being introduced on islands where they are safe.

  • Real estate or how beautiful places are being overrun by rich foreigners. This is not unique to New Zealand or Australia but it was sad to hear that working class locals are quickly being out-priced by rich foreigners and may not be able to afford to live in the same place as where they have grown up.

  • It would appear that the Maori culture is vastly better integrated and embraced than native culture in Australia (and many other parts of the world I would guess). There is a dedicated Maori television channel, radio programs, Maori is an official language and there seems to be much greater cultural interaction. It's not without its problems and injustices but it seems New Zealand has a much better history of treating people decently. This includes it being the first country to give everyone over 21 the vote (in 1893) and generally being a progressive socialist state. Another example, universal voting wasn't granted until the 1960s in Australia whereas in New Zealand it was there from the start (it did require voters to have individual title of the land however).

  • Water, water everywhere. I'm fairly convinced that some places, like South Australia, are fairly silly places to live, especially compared with somewhere like New Zealand - which is basically paradise without the snakes. New Zealand is almost arrogantly wet and fertile. The sheer number of animals (mainly sheep) in a paddock was impressive.

  • I don't want to reinforce stereotypes but the most annoying tourists would have to be American. It wasn't the loud complaining per se but more the insistence to understand everything and it had to be within their own cultural context. There seemed to be a lack of adapting and accepting things and it's just annoying to always suppose that something is worse just because it's different. I know this isn't an unfamiliar sentiment - the converse is often said to be true - that others don't stand up for themselves enough. I'm trying to be even handed here, maybe everybody does it and sure not all Americans are the same, maybe it's just the ones that go on holidays, maybe I'm a cultural fascist, but it was really, really annoying.

  • And one last thing, the iPod compatible bed.

Monday, November 07, 2005

Lazy Links

* MKSearch Beta 1 Released - includes web crawler, HTML metadata extractor, and RDF storage using Sesame. Also, MG4J (Managing Gigabytes for Java).
* Open Source Java Application Management: BlueGlue and MyJavaPack. A little different to Ivy (also interesting IvyCruise).
* Concurrency JSR-166 Interest Site includes interesting posts like Java Memory Model versus dotnet Memory Model, which mentions the forthcoming problems with Java code when multi-core systems are rolled out. Also, Concurrent Skip List Map (coming to Java 6).

Sunday, October 30, 2005

Duplication City

Sgt Peppers Paradise

I'm being lazy (from a popular link) but it's kinda good...

Friday, October 21, 2005

Intelligent Design comes to Australia

Intelligent Design "Scientists across the country are outraged.

So what's all the fuss about Intelligent Design?

Intelligent Design or ID, is being put forward as a serious scientific theory. It's even found it's way into some high schools."

They made mention of The Wedge Document which outlines a way to conquer the world. Apparently you need more than a death ray these days.

Maybe some Gerin Oil "dry-out" clinics need to be created. Adam Bosworth recently decloaked an article related to it too.

Monday, October 17, 2005

Rocking with Ruby

"Beyond Java" This book is a light read, even I read it in a couple of hours, about the circumstances around Java's success and its possible successor. There's a lot of first hand interviews about what's wrong with Java, what's right with Ruby and other languages and why Rails may be Ruby's killer application (and why that's important). C# and .NET were not seen as a replacement to Java mainly because they don't add enough over Java and the reasons why Java became successful in the first place haven't changed.

Overall, I was a bit disappointed with this book especially with its lack of depth. The thing that the book struggles with the most, is that it's been almost impossible to guess where the next disruptive technology will come from - most of the comments in the book are feelings or from a subjective point of view.

Some points raised in the book that reinforced my existing prejudices:
* That future languages and platforms will probably be deployed on .NET and Java VMs. The competition between the two seems to have a positive impact on both - locking out any competitors. That means, there's something to look forward to in Java 7 and .NET 3.
* That VB developers were totally hosed by .NET. Rails and environments like it will bring back that kind of development. The first to harness the disaffected VB developer wins.
* Continuations are going to be a key piece of infrastructure.
* Sun is moving too slowly especially with the JSR process. For example, the solution to JAR hell won't be available until 2008.

Update: The ServerSide's article has some good comments and responses from Bruce Tate.

A few more things read over the last week or so on Ruby:
* Updates from the recent RubyConf 2005 JRuby Progress "...With the stackless interpreter, stack-depth was limited only by available memory. He showed a fibonacci calculation of 3000 (64MB), then 150000 (1G) with amazingly good performance..." and mocking in Ruby with Lafcadio and MockFS.
* Why's (poignant) Guide to Ruby "This is just a small Ruby book. It won’t crush you."
* What Is Ruby on Rails "Rails helps you achieve this new level of productivity by combining the right ingredients in the right amounts."
* You Can't Educate Pork Links to slides on "Dynamically Typed Languages on the Java Platform".

Wednesday, October 12, 2005

Serenity

There was a scene in the movie, with a mummified corpse in a vehicle, that reminded me of something I'd seen before in a book called "Spacewreck".

Spacewreck is part of a series of children's books, the Terran Trade Authority. It seems to elicit a strong response in people, including modelling them in PovRay and Homeworld mods.

A directory filled with the original artwork from the first book is here. Two paintings by Angus McKie look a bit like the design of Serenity without the engines (here and here). This is mentioned in a Firefly site thread.

More gallery links: Fred Gambino, Peter Elson, Chris Foss and Angus McKie.

Monday, October 10, 2005

SPARQLing

toward SPARQL CR, PR, REC, press releases... schedule review "...the valueTesting and sort issues continue to act like difficult-to-drain swamps...The objection from Network Inference saying that SPARQL should have used XQuery remains...I wanted to have a much smaller spec (without SOURCE/GRAPH etc.) and finish earlier... since we didn't meet the April CR milestone, I'm pretty much convinced that what I want and what this WG wants are pretty different..."

RDF DAWG Issues List: sort, "...re-opened 2005-08-03 following comment ORDER with IRIs." (also, extending < operator and lexical vs value space), valueTesting, disjunction, countAggregate , "these are complicated in RDF due to open world notions of equality and inequality.", and bnodeRef, "Users seem to find it useful to refer to bnodes given by the server, though the scope of a bnode is usually one lexical graph."

A look at implementing SPARQL using Lisp: "...parts of the grammar weren't designed to be easily parsed; it has a number of features that are clearly user-focused, such as optional dots. (If it were designed to be easily parsed, it would be s-expressions from the start...)...I have seen objections made to UNION, on the grounds that implementation is difficult. I'd like to nip that one in the bud; I wrote twinql in a few weeks, not full-time, inventing algorithms as I went, and UNION wasn't difficult." Alternatively, I guess it could've been XML.

A response to "evaluating SPARQL w.r.t an RDF query language survey" which lists how you can (or cannot) perform queries that: "Return the labels of all topics that are not titles of publications", "Count the number of authors of a publication" and "Return all publications where the page number is the integer value 8."

Just for Fun

-Ofun "One of the key realizations of modern Internet projects (the oft-quoted Web 2.0) is that on the whole, your users can be trusted. The key is that the users also need to have the tools needed to repair any damage the tiny minority may cause. For a development project, modern version control systems can give you "anarchy with an audit trail". If something does go wrong (intentionally or more likely accidentally), it's easy for any other developer to identify and fix or revert the problem. Having this safety net allows the project to run full-bore without time-wasting process getting in the way, and without undue worry that code quality will suffer...before atomic changesets and quality merge tools, it was extremely difficult to roll back a single change made at some point in the past; now it is much easier to do so. And without a proper test suite, it's hard to tell if a change broke the code in the first place...skill ladders are part of the very definition of fun. True passion and community-building rarely develop around a project that doesn't have such a ladder."

Via Slashdot.

Sunday, October 09, 2005

JRDF 0.3.4.1 Out

Now available for download.

Some thoughts related to this:
* To ensure valid refactorings and bug fixes it seems sensible to go back and do some things properly. It means that the next highest priority will have to be writing an NTriples parser to validate the RDF Test cases. The SableCC grammar for NTriples is already done.
* While this is a good start, it's not nearly fine grained enough to actually write the code and have fast running unit tests. Tom has done some of this test driving a parser with his SPARQL work although I suspect I may do it a little differently (using EasyMock).
* Graph equality and isomorphism on graphs is a pain - which is to say that it's complicated with blank nodes (see the test cases). "Matching RDF Graphs" lists possible ways to do mappings, including that a blank node may map to a labelled node. I know that in Kowari loading the sample FOAF files twice into the same graph adds the blank nodes twice. While this seems like a mistake it is the fastest way to load a graph. Maybe implementing different loading modes (no duplicate checking, blank node to blank node and blank node to labelled node) and signing graphs are two ways to help this.
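To show why blank nodes make graph equality painful, here is a brute-force sketch of the blank node to blank node mapping only. It is not what JRDF or Kowari do, and the `_:` prefix convention and function names are assumptions of the sketch. Triples are plain tuples of strings and every possible renaming of blank nodes is tried.

```python
from itertools import permutations

def is_blank(term):
    """Blank nodes are written with a leading "_:" in this sketch."""
    return term.startswith("_:")

def blank_nodes(graph):
    """Collect the distinct blank node labels used anywhere in the graph."""
    return sorted({term for triple in graph for term in triple if is_blank(term)})

def isomorphic(graph_a, graph_b):
    """Try every blank-node-to-blank-node mapping; exponential, so small graphs only."""
    blanks_a, blanks_b = blank_nodes(graph_a), blank_nodes(graph_b)
    if len(graph_a) != len(graph_b) or len(blanks_a) != len(blanks_b):
        return False
    for candidate in permutations(blanks_b):
        mapping = dict(zip(blanks_a, candidate))
        renamed = {tuple(mapping.get(term, term) for term in triple) for triple in graph_a}
        if renamed == graph_b:
            return True
    return False

first = {("_:a", "foaf:name", '"Alice"'), ("_:a", "foaf:knows", "_:b")}
second = {("_:x", "foaf:knows", "_:y"), ("_:x", "foaf:name", '"Alice"')}
third = {("_:x", "foaf:knows", "_:y"), ("_:y", "foaf:name", '"Alice"')}

same_shape = isomorphic(first, second)
different_shape = isomorphic(first, third)
```

The factorial number of candidate mappings is the reason loading without duplicate checking is so much faster, and why something smarter than this is needed for real graphs.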

Friday, October 07, 2005

JRDF 0.3.4.1 Soon

So the JRDF 0.3.4 was replaced last night due to a problem in the Ant build script creating it with source instead of classes - so much for QA.

Following that little discovery a few other problems related to RDF/XML parsing have been fixed: Does not resolve relative URIs correctly and Stacked xml:base directives with relative URIs not processed. The first is due to the difference between the way Java's URI class resolves and the way it's defined in RDF/XML. Section 5.2 of RFC 2396 says: "...any characters after the last (right-most) slash character, if any, are excluded." and the RDF/XML specification says: "These specifications do not specify an algorithm for resolving a fragment identifier alone...The empty string is transformed into an RDF URI reference by substituting the in-scope base URI."

The second is a fix from Sesame's RIO.
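For reference, these are the cases from the relative URI problem above, shown with Python's `urllib.parse.urljoin` as a stand-in resolver (it is not Java's URI class and the example.org base is made up). The sibling case shows the "exclude everything after the last slash" step; the other two are the fragment-only and empty-string references that the RDF/XML specification calls out.

```python
from urllib.parse import urljoin

base = "http://example.org/dir/file"

# Ordinary relative reference: the last path segment of the base is dropped.
sibling = urljoin(base, "other")

# Fragment-only reference: stays on the base document.
fragment_only = urljoin(base, "#frag")

# Empty reference: RDF/XML wants the in-scope base URI itself.
same_document = urljoin(base, "")
```

A resolver that applies the drop-the-last-segment step to the empty string as well would return the directory rather than the document, which is the kind of mismatch an RDF/XML parser has to guard against.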

Wednesday, October 05, 2005

Hypertime

Writeboard "Every time you save an edit a new version is created and linked in the sidebar. This allows you to write without fear of deleting something, overwriting something, or losing a better version of the document from last week."

Seems very similar to "Reasoning behind the OSMIC proposal: MODELS OF TIME, BACKTRACK AND GROUPWARE"

Out meta-ering the competition

Ning - a meta social app platform ""So after being offered a large number of social media applications, we are now into the meta-framework to build social media applications. The notion is interesting: as we have come to expect that any consumer application will include some element of social networking, collaborative filtering, tagging, etc., Ning has the first shot at claiming platform status in the social phenomena by offering building consistent building blocks (though wikis could probably claim anteriority)."...What it does do is make APIs an even more important feature for social apps to have, because Ning users will potentially have less need to visit the websites of Flickr, del.icio.us, 43Things and others. Ning will drive customers to those other services, but via the APIs. "

I always seem to quote people who quote other people, seems apt in this case.

Tuesday, October 04, 2005

Gun

Google and Sun together "Google and Sun Microsystems will hold a press conference on Tuesday at which they're expected to announce a collaboration to bring StarOffice productivity applications to Google users..."Imagine StarOffice running on the desktop, and Google perfecting the [file synchronization]," said Edwards. "Then you have your collaboration space carved out immediately for you, and Google is hosting it.""

Does it even make sense for a company that is selling open standard, network centric, highly scalable applications on Unix to combine with the company that invented Java?

Imagine a host of GApps (GMail, GCal, GIM, GWorld, etc.) working on your desktop.

Via Update: Sun Marries Google? and a smidge more Sun turns up volume against Windows Vista and Office 12 (links to The Value in Volume).

Monday, October 03, 2005

The Future is Google Ads Everywhere (powered by SPARQL)

Scoble ? RDF "SPARQL is an answer to the question “What if I want to do SQL-like querying when I know perfectly well that everybody will be using their own incompatible database schema?” I’ve been a SemWeb skeptic, but I look at SPARQL and I think: Suppose you could assemble a ton of property-value pairs about web sites, and suppose on the front end you could build a nice responsive query page that allowed you to compose queries like Scoble’s hotel search; well then, SPARQL would be more or less exactly what you need to bridge the gap. Hey, isn’t Guha’s Alpiri project more or less that back-end? And isn’t Guha working at Google now?"

And what would we do with this perfect engine and ubiquitous bandwidth - ads. Author: Google's Patents Reveal Strategy To Beat Microsoft "In Arnold’s analysis, he said some filings in the patent portfolio point to an accelerated use of high-speed fiber and wireless that could be used to deliver Google technology...An ultimate goal of the firm is to deliver completely individualized ads to users."

Friday, September 30, 2005

Evolution not creation

Another speech by Adam Bosworth. He talked mostly on a TDD/Agile theme where applications are developed in 2 week iterations based directly on customer usage. This offers a better alternative to software architects spending years in seclusion developing the be-all and end-all API and developers hoping that they will get what they need. He cited Windows as an example of this approach. Unsurprisingly, this leads to more tightly coupled code as reported in the recent article about Microsoft rewriting Windows by the Wall Street Journal.

Bosworth calls this approach intelligent reaction rather than intelligent design. Google, eBay and Salesforce were given as examples of this application driven rather than API driven development.

Relating this back to RDF and agile databases, the experience at Salesforce is that the most significant amount of effort is spent customising the data, not other areas such as the user interface or processing logic. While he doesn't give a reason for data over processing, he does say that customised user interfaces are too expensive because they require more user training.

A transcript of some of the talk is available at: Bosworth: The new model is, 'run like mad'.

JVM Dynamic Language Support

invokedynamic: New Java Bytecode for the Dynamics "The new byte code, invokedynamic , is coming to a JSR near you very soon....the verifier won’t insist that the type of the target of the method invocation (the receiver, in Smalltalk speak) be known to support the method being invoked, or that the types of the arguments be known to match the signature of that method. Instead, these checks will be done dynamically."
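The quote describes checks that happen at call time rather than at verification time. Dynamic languages already behave this way, so a rough Python analogy may help. This is not JVM bytecode and says nothing about how invokedynamic is implemented. The `invoke` helper and the two classes are hypothetical.

```python
class Duck:
    def speak(self) -> str:
        return "Quack"


class Robot:
    def speak(self) -> str:
        return "Beep"


def invoke(receiver: object, method_name: str, *args: object) -> object:
    """Look the method up on the receiver at call time.

    Nothing about the receiver's type is checked in advance. If the
    receiver does not support the method, the failure shows up here,
    when the call is made.
    """
    method = getattr(receiver, method_name, None)
    if not callable(method):
        raise TypeError(
            f"{type(receiver).__name__} does not understand {method_name!r}"
        )
    return method(*args)
```

`Duck` and `Robot` share no declared interface, yet `invoke(Duck(), "speak")` and `invoke(Robot(), "speak")` both work, while `invoke(42, "speak")` fails only when it is actually called.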

From a previous article, Pluggable Types "...Ruby (that's basically Smalltalk with a Perl style syntax)..."

Saturday, September 24, 2005

Supping from the Information Firehose

* Oracle 10g Support for RDF "There are procedures provided to search the RDF models with a language that looks a bit like SPARQL...The search is pretty nice because you can join your search on standard SQL tables, thus combining both your triple model and relational models together.

Oracle also has Rule support, in the form of Rule Indexes. A builtin set of rules for RDF Schema (RDFS) semantics is provided, which is a very nice touch. You are also free to create your own rules, both with the query pattern matching and filters."
* An Atom Store With links to people wanting to create a non-SQL store for Atom, including the very good Bosworth's Web of Data (original here) which is largely about why new databases should not be like Oracle but should distribute the processing (no views, triggers, etc).
* RDFAuthor does SPARQL "Damian is working on a new version of RDFAuthor that generates SPARQL queries (instead of the older Squish notation). It can also (not sure which protocol(s)) get results from a query service." I liked the original RDFAuthor - good to see an update is underway.
* Wrapping rdflib's Graph around a 4RDF Model "I wrote a 4Suite RDF model backend for rdflib, that allows the wrapping of Graph around a live 4Suite RDF model. Finally, I used this backend to execute a sparql-p query..."
* Secrets of lightweight development success, Part 7: Java alternatives Closures, Continuations, Metaprogramming and Reflection: "In short, the Java language just isn't a very productive applications language. The founders made some wise compromises to wrestle control away from C++, but we're starting to pay for those compromises."
* ONJava 2005 Reader Survey Results, Part 1 "Eclipse (76 percent), NetBeans (21 percent), None (17 percent), IntelliJ (13 percent)...JBoss (38 percent), None (28 percent), WebSphere (21 percent), WebLogic (20 percent)"
* Language Innovation: C# 3.0 explained "...most of the features of C# 3.0 are, arguably, nothing but syntactic sugar designed to make programming more productive..." One of the things that was (originally) good about Java was the lack of syntax (I thought anyway!).
* KVM over IP

Wednesday, September 21, 2005

What do you do all day?

Code Is Not An Asset "Since software code is not an asset, but rather a liability, the more we can reduce the deadwood, the better off we are...It is definitely possible to deliver high level of functionality, interactivity and sophistication by utilizing only a portion of code that would normally be used if we stick to the old school (morecode, or more LOC). And that’s a desirable thing."

Monday, September 19, 2005

What's a Unit Test?

A Set of Unit Testing Rules "A test is not a unit test if:
* It talks to the database
* It communicates across the network
* It touches the file system
* It can't run at the same time as any of your other unit tests
* You have to do special things to your environment (such as editing config files) to run it."

"If you write code in a way which separates your logic from OS and vendor services, you not only get faster unit tests, you get a ‘binary chop’ that allows you to discover whether the problem is in your logic or in the things you are interfacing with."
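A minimal sketch of that separation, in Python. The logic takes an iterable of lines rather than a file path, so its test needs no file system, database or network. The function names and the log format are made up for illustration.

```python
from typing import Iterable


def count_errors(lines: Iterable[str]) -> int:
    """Pure logic: count the lines that start with 'ERROR'. No I/O here."""
    return sum(1 for line in lines if line.startswith("ERROR"))


def count_errors_in_file(path: str) -> int:
    """Thin wrapper: does the file-system work, then delegates to the logic."""
    with open(path, encoding="utf-8") as handle:
        return count_errors(handle)


def test_count_errors() -> None:
    # A unit test by the rules above: in-memory data only.
    lines = ["INFO start", "ERROR disk full", "ERROR timeout", "INFO done"]
    assert count_errors(lines) == 2
```

If `test_count_errors` passes but `count_errors_in_file` misbehaves, the problem is on the I/O side of the split, which is the "binary chop" the quote refers to.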

Linkaholic

* Questioning RDF "'Now I hear the argument that one does not need to know hedge automata to use RELAX NG, and all that, but I don't think it applies in the case of RDF. In RDF, the model semantics are the primary reason for coming to the party. I don't see it as an optional formalization. Maybe I'm wrong about that and it's the need to write a query language for RDF (hardly typical for the Web punter) that is causing me to gurgle in the muck.'...I definitely think there is some merit to disconnecting RDF from the Semantic Web and seeing if it can hang on its own from that perspective...I've wondered if there is similar usefulness lurking within RDF once it loses its Semantic Web baggage."
* An early look at JUnit 4 If you don't like TestNG or JUnit 4 what do you do? I like the "failXXX() throws FooBarException" better. The annotations just don't do it for me.
* XML Virtual Machines "You can target XUL for deployment to the Mozilla platform, today. XULRunner now means you can develop and test away in double quick time . Yes, Mozilla is a platform (a XUL visual forms builder could be a game changer on the client, as well saving some folks I know a lot of typing ;). And if you need a thicker-than-that client, then consider the OpenOffice platform."
* The Future of Mobility is Linux "The latest news I’ve seen about the Nokia 770 is that it’s going to have a host of applications ready for it at launch, including VoIP software, streaming media, chat applications, Doom, etc. The thing that’s so amazing about this is that the 770 is essentially the *same exact hardware* that’s on my Nokia 6680, yet the development pace for the 770 is way more rapid. In addition, there’s at least a half a dozen blogs and bloggers dedicated to the device, and it hasn’t even launched yet. This shows the power of an open environment and the draw of Linux and its fans."

Adding more Layers

Crisis "What if we jilted the ugly sisters of rdf:Bag, rdf:Seq and rdf:Alt and took reification out back and shot it? How many tears would be shed?

What if we junked classes, domains and ranges? Would anyone notice? The key concept in RDF is the relationship, the property.

The result would be a subset of RDF, RDF-lite perhaps. All instances of RDF-lite would be valid RDF-full but the converse couldn’t be true. Sparql would still work and so, I suspect, would the OWL machinery despite the omission of classes. RDF diffs would be trivial without blank nodes allowing efficient synchronisation of triple stores. Signing of triples would also be possible without requiring the hoops of canonicalisation to be jumped through."
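The claim about trivial diffs can be sketched in a few lines of Python. Blank node labels are local to a graph, so comparing graphs that contain them means matching up nodes first. When every node is a URI or literal, triples compare by value and a diff is plain set difference. The `diff` function and the example data are illustrative only.

```python
from typing import FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def diff(old: Graph, new: Graph) -> Tuple[Graph, Graph]:
    """Return (added, removed) for two graphs that contain no blank nodes."""
    return new - old, old - new


# Hypothetical data: two versions of a small graph.
OLD: Graph = frozenset({
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/name", "Alice"),
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/nick", "al"),
})
NEW: Graph = frozenset({
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/name", "Alice"),
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/nick", "ally"),
})

ADDED, REMOVED = diff(OLD, NEW)
```

Synchronising two stores then amounts to shipping `ADDED` and `REMOVED` across.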

Sunday afternoon blather "So (ramble nearly over) basically I don’t think there’s any need to define a subset or simplification of RDF because it’s already layered. You don’t need reification, ontological inferences don’t use them. If all you want to do is publish HTML with a bit of explicit data embedded, or write an aggregator that understands “rel” then fine. If you’re doing good Web stuff, you’re still helping the Semantic Web. What SPARQL brings is a low barrier to entry into the ideas, and a low barrier to development of true Web applications. I do believe SPARQL has the potential to be explosive because it makes the Semantic Web that much more agile."

This got me thinking about a conversation with Tom about the Graph interface in JRDF. It allows you to get a TripleFactory (for collections, reification, etc.) and a GraphElementFactory for creating graph elements (nodes and, err, triples, though triple creation has moved to TripleFactory in 0.4). It makes sense to decouple these: each interface is tighter (and so easier to understand), and there is less to mock or stub out when testing.
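A rough sketch of why narrow, decoupled interfaces are easier to stub, written in Python rather than Java. None of this is JRDF's actual API: the method names, the `RecordingTripleFactory` stub and `describe_person` are all hypothetical.

```python
from typing import List, Protocol, Tuple

Triple = Tuple[str, str, str]


class GraphElementFactory(Protocol):
    """Hypothetical narrow interface: creates nodes only."""

    def create_resource(self, uri: str) -> str: ...


class TripleFactory(Protocol):
    """Hypothetical narrow interface: adds triples only."""

    def add_triple(self, subject: str, predicate: str, obj: str) -> None: ...


class RecordingTripleFactory:
    """Test stub: records triples instead of touching a real graph."""

    def __init__(self) -> None:
        self.added: List[Triple] = []

    def add_triple(self, subject: str, predicate: str, obj: str) -> None:
        self.added.append((subject, predicate, obj))


def describe_person(triples: TripleFactory, person: str, name: str) -> None:
    """Code under test depends on TripleFactory alone, not the whole graph."""
    triples.add_triple(person, "http://xmlns.com/foaf/0.1/name", name)
```

Because `describe_person` asks only for a `TripleFactory`, the stub needs one method. Had it taken a full graph object, the test would have to fake everything else the graph exposes.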