Friday, September 30, 2005

Evolution not creation

Another speech by Adam Bosworth. He talked mostly about a TDD/Agile theme, where applications are developed in two-week iterations based directly on customer usage. This is a better alternative to software architects spending years in seclusion developing the be-all and end-all API, with developers hoping that they will get what they need. He cited Windows as an example of the latter approach. Unsurprisingly, that approach leads to more tightly coupled code, as reported in the Wall Street Journal's recent article about Microsoft rewriting Windows.

Bosworth calls this approach intelligent reaction rather than intelligent design. Google, eBay and Salesforce were given as examples of this application-driven rather than API-driven development.

Relating this back to RDF and agile databases, the experience at Salesforce is that most of the customisation effort goes into the data rather than into other areas such as the user interface or processing logic. While he doesn't give a reason for data over processing, he does say that customised user interfaces are too expensive because they require more user training.

A transcript of some of the talk is available at: Bosworth: The new model is, 'run like mad'.

JVM Dynamic Language Support

invokedynamic: New Java Bytecode for the Dynamics "The new byte code, invokedynamic, is coming to a JSR near you very soon....the verifier won’t insist that the type of the target of the method invocation (the receiver, in Smalltalk speak) be known to support the method being invoked, or that the types of the arguments be known to match the signature of that method. Instead, these checks will be done dynamically."
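The quoted change moves the "does the receiver support this method?" check from verification time to run time. Without the new bytecode, one way to get similar behaviour on the JVM is reflection. What follows is a minimal sketch, assuming a hypothetical `DynamicCall.send` helper. It emulates the idea with `java.lang.reflect`; it is not invokedynamic itself, and it ignores overload ranking.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical helper: finds a method by name and argument count at run time,
// so the caller never needs to know the receiver's static type.
final class DynamicCall {
    private DynamicCall() {}

    static Object send(Object receiver, String name, Object... args) {
        for (Method m : receiver.getClass().getMethods()) {
            // The "does the receiver understand this message?" check happens here, at run time.
            if (m.getName().equals(name) && m.getParameterCount() == args.length) {
                try {
                    return m.invoke(receiver, args);
                } catch (IllegalAccessException | IllegalArgumentException e) {
                    continue; // arguments did not fit this overload; try the next candidate
                } catch (InvocationTargetException e) {
                    throw new RuntimeException(e.getCause());
                }
            }
        }
        throw new UnsupportedOperationException(
            receiver.getClass().getSimpleName() + " does not understand " + name);
    }
}
```

For example, `DynamicCall.send("hello", "charAt", 1)` finds `String.charAt(int)` only at run time, and `DynamicCall.send(42, "quack")` fails at run time rather than being rejected up front, which is roughly the Smalltalk/Ruby model.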

From a previous article, Pluggable Types "...Ruby (that's basically Smalltalk with a Perl style syntax)..."

Saturday, September 24, 2005

Supping from the Information Firehose

* Oracle 10g Support for RDF "There are procedures provided to search the RDF models with a language that looks like a bit like SPARQL...The search is pretty nice because you can join your search on standard SQL tables, thus combining both your triple model and relational models together.

Oracle also has Rule support, in the form of Rule Indexes. A builtin set of rules for RDF Schema (RDFS) semantics is provided, which is a very nice touch. You are also free to create your own rules, both with the query pattern matching and filters."
* An Atom Store, with links to people wanting to create a non-SQL store for Atom, including the very good Bosworth's Web of Data (original here), which is largely about why new databases should not be like Oracle but should distribute the processing (no views, triggers, etc.).
* RDFAuthor does SPARQL "Damian is working on a new version of RDFAuthor that generates SPARQL queries (instead of the older Squish notation). It can also (not sure which protocol(s)) get results from a query service." I liked the original RDFAuthor - good to see an update is underway.
* Wrapping rdflib's Graph around a 4RDF Model "I wrote a 4Suite RDF model backend for rdflib, that allows the wrapping of Graph around a live 4Suite RDF model. Finally, I used this backend to execute a sparql-p query..."
* Secrets of lightweight development success, Part 7: Java alternatives Closures, Continuations, Metaprogramming and Reflection: "In short, the Java language just isn't a very productive applications language. The founders made some wise compromises to wrestle control away from C++, but we're starting to pay for those compromises."
* ONJava 2005 Reader Survey Results, Part 1 "Eclipse (76 percent), NetBeans (21 percent), None (17 percent), IntelliJ (13 percent)...JBoss (38 percent), None (28 percent), WebSphere (21 percent), WebLogic (20 percent)"
* Language Innovation: C# 3.0 explained "...most of the features of C# 3.0 are, arguably, nothing but syntactic sugar designed to make programming more productive..." One of the things that was (originally) good about Java was the lack of syntax (I thought anyway!).
* KVM over IP

Wednesday, September 21, 2005

What do you do all day?

Code Is Not An Asset "Since software code is not an asset, but rather a liability, the more we can reduce the deadwood, the better off we are...It is definitely possible to deliver high level of functionality, interactivity and sophistication by utilizing only a portion of code that would normally be used if we stick to the old school (morecode, or more LOC). And that’s a desirable thing."

Monday, September 19, 2005

What's a Unit Test?

A Set of Unit Testing Rules "A test is not a unit test if:
* It talks to the database
* It communicates across the network
* It touches the file system
* It can't run at the same time as any of your other unit tests
* You have to do special things to your environment (such as editing config files) to run it."

"If you write code in a way which separates your logic from OS and vendor services, you not only get faster unit tests, you get a ‘binary chop’ that allows you to discover whether the problem is in your logic or in the things are you interfacing with."
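What follows is a minimal sketch of that separation, assuming a hypothetical `LineSource` interface. The logic depends only on the interface, so its unit test can use an in-memory fake, while the file-backed implementation is left to an integration test.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical seam between our logic and the file system.
interface LineSource {
    List<String> lines();
}

// Real implementation: touches the file system, so it belongs in integration tests.
final class FileLineSource implements LineSource {
    private final Path path;

    FileLineSource(Path path) { this.path = path; }

    @Override
    public List<String> lines() {
        try {
            return Files.readAllLines(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Pure logic: counts non-blank lines that are not '#' comments.
final class ConfigLineCounter {
    private final LineSource source;

    ConfigLineCounter(LineSource source) { this.source = source; }

    int countSettings() {
        int count = 0;
        for (String line : source.lines()) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
                count++;
            }
        }
        return count;
    }
}
```

Because `LineSource` has a single method, a unit test can pass a lambda such as `() -> List.of("# comment", "a=1")`. If that test fails, the bug is in the counting logic, not in the file system, which is the "binary chop" described above.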


* Questioning RDF "'Now I hear the argument that one does not need to know hedge automata to use RELAX NG, and all that, but I don't think it applies in the case of RDF. In RDF, the model semantics are the primary reason for coming to the party. I don't see it as an optional formalization. Maybe I'm wrong about that and it's the need to write a query language for RDF (hardly typical for the Web punter) that is causing me to gurgle in the muck.'...I definitely think there is some merit to disconnecting RDF from the Semantic Web and seeing if it can hang on its own from that perspective...I've wondered if there is similar usefulness lurking within RDF once it loses its Semantic Web baggage."
* An early look at JUnit 4 - if you don't like TestNG or JUnit 4, what do you do? I like the "failXXX() throws FooBarException" style better. The annotations just don't do it for me.
* XML Virtual Machines "You can target XUL for deployment to the Mozilla platform, today. XULRunner now means you can develop and test away in double quick time. Yes, Mozilla is a platform (a XUL visual forms builder could be a game changer on the client, as well saving some folks I know a lot of typing ;). And if you need a thicker-than-that client, then consider the OpenOffice platform."
* The Future of Mobility is Linux "The latest news I’ve seen about the Nokia 770 is that it’s going to have a host of applications ready for it at launch, including VoIP software, streaming media, chat applications, Doom, etc. The thing that’s so amazing about this is that the 770 is essentially the *same exact hardware* that’s on my Nokia 6680, yet the development pace for the 770 is way more rapid. In addition, there’s at least a half a dozen blogs and bloggers dedicated to the device, and it hasn’t even launched yet. This shows the power of an open environment and the draw of Linux and its fans."

Adding more Layers

Crisis "What if we jilted the ugly sisters of rdf:Bag, rdf:Seq and rdf:Alt and took reification out back and shot it? How many tears would be shed?

What if we junked classes, domains and ranges? Would anyone notice? The key concept in RDF is the relationship, the property.

The result would be a subset of RDF, RDF-lite perhaps. All instances of RDF-lite would be valid RDF-full but the converse couldn’t be true. Sparql would still work and so, I suspect, would the OWL machinery despite the omission of classes. RDF diffs would be trivial without blank nodes allowing efficient synchronisation of triple stores. Signing of triples would also be possible without requiring the hoops of canonicalisation to be jumped through."
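This illustrates why diffs become trivial. If every node is a URI or literal (no blank nodes), a triple is identified purely by its three values, so the difference between two graphs is plain set difference. With blank nodes, the same triple can carry different local labels in each graph, and matching them needs something closer to graph isomorphism. What follows is a minimal sketch with hypothetical `Triple` and `GraphDiff` records (Java 16+), not any particular triple store's API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical ground triple: every position holds a URI or literal, never a blank node.
record Triple(String subject, String predicate, String object) {}

// The additions and removals needed to turn one graph into another.
record GraphDiff(Set<Triple> added, Set<Triple> removed) {

    // With only ground triples, equality is by value, so set difference is enough.
    static GraphDiff between(Set<Triple> before, Set<Triple> after) {
        Set<Triple> added = new HashSet<>(after);
        added.removeAll(before);
        Set<Triple> removed = new HashSet<>(before);
        removed.removeAll(after);
        return new GraphDiff(added, removed);
    }

    // Applying the diff to the old graph reproduces the new one.
    Set<Triple> applyTo(Set<Triple> graph) {
        Set<Triple> result = new HashSet<>(graph);
        result.removeAll(removed);
        result.addAll(added);
        return result;
    }
}
```

Synchronising two stores then amounts to shipping `added` and `removed`.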

Sunday afternoon blather "So (ramble nearly over) basically I don’t think there’s any need to define a subset or simplification of RDF because it’s already layered. You don’t need reification, ontological inferences don’t use them. If all you want to do is publish HTML with a bit of explicit data embedded, or write an aggregator that understands “rel” then fine. If you’re doing good Web stuff, you’re still helping the Semantic Web. What SPARQL brings is a low barrier to entry into the ideas, and a low barrier to development of true Web applications. I do believe SPARQL has the potential to be explosive because it makes the Semantic Web that much more agile."

This got me thinking about a conversation with Tom about the Graph interface in JRDF. It lets you get a TripleFactory (for collections, reification, etc.) and a GraphElementFactory for creating graph elements (nodes and, err, triples, although triple creation has now moved to TripleFactory in 0.4). It makes sense to keep these decoupled: it gives a tighter interface (easier to understand) and leaves less to mock or stub out when testing.
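Here is a rough sketch of that decoupling. It uses hypothetical, simplified interfaces and records (Java 16+), not JRDF's actual signatures. The graph hands out two narrow factories, so code that only creates nodes never needs a stub for triples, collections or reification.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified types; JRDF's real interfaces differ.
record Node(String value) {}
record Triple(Node subject, Node predicate, Node object) {}

// Narrow factory: only creates graph elements (nodes).
interface GraphElementFactory {
    Node createResource(String uri);
}

// Separate factory for triples (and, in a fuller version, collections and reification).
interface TripleFactory {
    Triple createTriple(Node subject, Node predicate, Node object);
}

interface Graph {
    GraphElementFactory getElementFactory();
    TripleFactory getTripleFactory();
    void add(Triple triple);
    boolean contains(Triple triple);
}

// Minimal in-memory graph wiring the two factories together.
final class MemoryGraph implements Graph {
    private final Set<Triple> triples = new HashSet<>();

    @Override
    public GraphElementFactory getElementFactory() {
        return Node::new;
    }

    @Override
    public TripleFactory getTripleFactory() {
        return Triple::new;
    }

    @Override
    public void add(Triple triple) { triples.add(triple); }

    @Override
    public boolean contains(Triple triple) { return triples.contains(triple); }
}
```

A test that only exercises node creation can pass a one-line lambda such as `uri -> new Node("stub:" + uri)` as the `GraphElementFactory`, with nothing else to fake.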

Lack of Symmetry

Catching up on Journal of Web Semantics: Preprint Server:
* Completeness, decidability and complexity of entailment for RDF Schema and a semantic extension involving the OWL vocabulary "RDFS entailment is decidable, NP-complete, and in P if the target graph does not contain blank nodes...In this paper we extend the notion of RDF graph to allow blank nodes in predicate position and show that the standard entailment rules for RDFS become complete if rule rdfs7 is replaced by a suitably extended rule rdfs7x. In view of the fact that the semantics of RDFS allows blank nodes to refer to properties, ‘generalized RDF graphs’, which allow blank nodes in predicate position, seem to form a more direct abstract syntax for RDFS than RDF graphs."
* OWL-Eu: Adding Customised Datatypes into OWL "1. OWL does not support customised datatypes (except enumerated datatypes)...
2. OWL does not support negated datatypes. For example, ‘all integers but 0’...
3. An OWL DL datatype domain seriously restricts the interpretations of typed literals with unsupported datatype URIrefs..."

"OWL-Eu supports customised datatypes through unary datatype expressions based on unary datatype groups. Intuitively, an unary datatype group extends the OWL datatyping with a hierarchy of supported datatypes."

Thursday, September 15, 2005

Really Dynamic Framework

Semantic Rails, Semantic Django: Pushing RDF into MVC "...what Rails-Django would look like if instead of SQL and RDBMS... What if the "M" in MVC were composed of RDF, SPARQL, and a triplestore like Kowari?

Sure, Kowari is probably slower than MySQL, and you probably know SQL a lot better than SPARQL, but RDF is a schemaless data representation thingie. You can start with as little or as much schema as you want or need, and you can use the full expressive power of OWL (which is significantly more powerful than SQL's DDL) when you need it."

Four points are raised in "Semantic MVC".

nodel - HOWTO "Nodel is a kind of application generator. It is a halfway house between the ontomatic, which isn't yet stable enough in implementation or customisable enough in interface, and the nodedb, which was very hardcoded and application specific."

Related to: RDF, the ultimate agile database, Another Agile Database User and Scripting the Semantic Web.

Bitten by the Software Health Bug

SPARQL support Tom comments on a number of things we have both learned about making better software since working at Tucana: "The unit tests are not unit tests. Most of the tests in Kowari are integration tests, that is, while they may test only a single class, they test that class with the roles of its dependent class filled by real concrete instances, rather than mocks...Andrew and I have been bitten by the agile, quality, TDD and refactoring bug, and having our own small project to play with makes this much easier. It also allows us to enforce higher standards on the codebase as there's only two of us that need to agree at the moment."

"I hope none of this comes off as criticism for Kowari as this wasn't my intention. It's certainly the best triplestore there is, and the project is moving ahead at great guns now. It's basically a reflection that Kowari is too heavyweight to do what we want to do and that both projects have different goals."

The difference in what counts as a unit test is significant; Martin Fowler's "Mocks Aren't Stubs" is a good introduction, although he remains on the fence. We didn't: we used interaction-based testing throughout. We still had the usual integration tests, but they also covered the environment, wiring (Spring) and so on. Interaction-based testing seems to lead to better software. We even test-drove XML, properties, etc.
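To make the distinction concrete: a state-based test checks what the object under test ends up holding, while an interaction-based test checks which calls it makes on its collaborators. What follows is a minimal hand-rolled sketch with hypothetical `Notifier` and `OrderService` names. In practice a mock library such as jMock or EasyMock would typically provide the mock.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collaborator that the class under test talks to.
interface Notifier {
    void send(String recipient, String message);
}

// Class under test: its observable behaviour is the calls it makes on Notifier.
final class OrderService {
    private final Notifier notifier;

    OrderService(Notifier notifier) { this.notifier = notifier; }

    void placeOrder(String customer, int quantity) {
        if (quantity <= 0) {
            throw new IllegalArgumentException("quantity must be positive");
        }
        notifier.send(customer, "Order placed: " + quantity + " item(s)");
    }
}

// Hand-rolled mock: records each interaction so the test can verify it afterwards.
final class RecordingNotifier implements Notifier {
    final List<String> calls = new ArrayList<>();

    @Override
    public void send(String recipient, String message) {
        calls.add(recipient + ": " + message);
    }
}
```

The test asserts on `calls` (exactly one notification, with the expected text, and none for a rejected order) rather than on any state inside `OrderService`. No real mail server or database is involved.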

The improvement is not so much in code quality (although I do think the initial number of defects is also lower) as in the health of the code. A healthy code base can respond quickly to changes in architecture, requirements, etc., or, in other, trendier words, it's more agile. Kent Beck's talk on software health is well summarized here.

Paul O'Keeffe and Greg Davis gave a presentation on Sustainable Software Development - firmly on the interaction testing side. You really do seem to get benefits late in a project with a sustainable, healthy code base.

I think it's covered in "Using Software Maintainability Models to Track Code Health", but it doesn't seem to be available online. Another paper, Software Deterioration And Maintainability – A Model Proposal, contrasts a "fast hack" with a "controlled update", for example, and discusses the need for just the right amount of effort when maintaining software. It would be interesting to find research on interaction-based versus state-based testing.

JRDF still needs refactoring to reach the stage where it is test-driven. And test driving isn't always a good way to develop solutions to problems (to hack, to spike), so there's still a place for that too.

Monday, September 12, 2005

Indices and Refactoring JRDF

Staring back at me today is something that Paul probably saw last year. But it's something that I found pretty cool because initially it looked like a mistake. Almost as cool as Tom starting to get SPARQL going.

Basically, there are three indices in JRDF: (0,1,2), (1,2,0) and (2,0,1), where 0 is the subject, 1 the predicate and 2 the object.

Looking at a refactoring between these internal indices and triples (ordered by subject, predicate and object) there are these three mappings:
* (0,1,2) maps to (s,p,o) using (0,1,2),
* (1,2,0) maps to (s,p,o) using (2,0,1), and
* (2,0,1) maps to (s,p,o) using (1,2,0).

Using the second index as an example, it means that the internal representation maps the third element to the subject, the first element to predicate and the second element to the object.
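Here is a small sketch of that mapping, treating each index and each mapping as an array of positions. The `IndexOrder` class and its method names are hypothetical, not JRDF's code. Storing a triple in index (1,2,0) puts (p,o,s) in positions 0 to 2, and reading it back with (2,0,1) restores (s,p,o).

```java
final class IndexOrder {
    private IndexOrder() {}

    // Store an (s,p,o) triple in an index: position i holds triple element index[i].
    static String[] toIndexOrder(String[] spo, int[] index) {
        String[] stored = new String[3];
        for (int i = 0; i < 3; i++) {
            stored[i] = spo[index[i]];
        }
        return stored;
    }

    // Read a stored tuple back as (s,p,o): element j comes from stored[mapping[j]].
    static String[] toSpo(String[] stored, int[] mapping) {
        String[] spo = new String[3];
        for (int j = 0; j < 3; j++) {
            spo[j] = stored[mapping[j]];
        }
        return spo;
    }
}
```

With index (2,0,1) the stored tuple is (o,s,p), and the mapping (1,2,0) is what recovers (s,p,o).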

The surprising thing I noticed, besides the other two indices not being tested correctly, was that the other indices ((2,1,0), (1,0,2), (0,2,1), which are mentioned in Paul's post) don't seem to have this property. They all map back to (s,p,o) the way (0,1,2) does, i.e. using themselves. By contrast, (1,2,0) mapped through itself gives (2,0,1), and (2,0,1) mapped through itself gives (1,2,0).
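The pattern has a simple explanation. The mapping back to (s,p,o) is the inverse of the index permutation. The three orders that swap two positions ((2,1,0), (1,0,2), (0,2,1)) are their own inverses. (1,2,0) and (2,0,1) are rotations, and each is the inverse of the other. Applying a three-element rotation twice gives its inverse, so mapping either one through itself yields the other. Here is a sketch that checks this, with a hypothetical `Permutations` helper.

```java
final class Permutations {
    private Permutations() {}

    // Inverse of a permutation: if perm[i] == j then inverse[j] == i.
    static int[] inverse(int[] perm) {
        int[] inv = new int[perm.length];
        for (int i = 0; i < perm.length; i++) {
            inv[perm[i]] = i;
        }
        return inv;
    }

    // "a mapped through b": result[i] == a[b[i]].
    static int[] compose(int[] a, int[] b) {
        int[] result = new int[b.length];
        for (int i = 0; i < b.length; i++) {
            result[i] = a[b[i]];
        }
        return result;
    }
}
```

`inverse` reproduces the mapping list above for the three JRDF indices. `compose(x, x)` gives the self-mapping described in the previous paragraph.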

Thursday, September 08, 2005

Further Beyond Java

A little bit late but The JavaCast has an interview with Bruce Tate.

Update (15/11): Fixed the link to the MP3.

Wednesday, September 07, 2005

Stacks of StAX stacks compared

Streaming APIs for XML Parsers "The performance of the five parsers was measured over a large number of XML documents." BEA, Oracle, SJSXP, XPP3 and Woodstox compared. The last two come out on top, although "...XPP3 is based on XmlPullParser APIs and not JSR-173 compliant. XPP3 is a parsing API that will work with small devices (J2ME compatible)." SJSXP has the advantage that it supports, "...symmetrical bi-directional APIs that can both read and write XML documents using the same representation of XML."
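For reference, JSR-173 is a pull API: the application asks the parser for the next event rather than receiving callbacks. What follows is a minimal sketch using the standard `javax.xml.stream` interfaces, which ship with Java SE 6 and later. Which implementation `XMLInputFactory.newInstance()` and `XMLOutputFactory.newInstance()` return depends on the runtime's configuration, so in principle swapping in a different JSR-173 parser needs no code change. The `StaxSketch` class and its methods are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

final class StaxSketch {
    private StaxSketch() {}

    // Write <items><item>name</item>...</items> with the streaming writer API.
    static String writeItems(String... names) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartDocument();
            writer.writeStartElement("items");
            for (String name : names) {
                writer.writeStartElement("item");
                writer.writeCharacters(name);
                writer.writeEndElement();
            }
            writer.writeEndElement();
            writer.writeEndDocument();
            writer.flush();
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write XML", e);
        }
    }

    // Pull events one at a time and count start tags with the given local name.
    static int countElements(String xml, String localName) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            int count = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(localName)) {
                    count++;
                }
            }
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

Because the caller drives the loop, it can stop as soon as it has what it needs. This is part of why pull parsers tend to do well in benchmarks like the one above.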

Via StAX parsing performance paper.

Tuesday, September 06, 2005

RDF in the USA

America seems to be where it's at for the Semantic Web:
* Work in the USA "...I decided that I should take the job. This is a big deal...To start with, I'm a contractor working remotely with Herzum until I can get a visa to become a full time employee."
* Semantic Web Yahoo! "At the end of September I'll be leaving the Institute for Learning and Research Technology (ILRT) at the University of Bristol in the UK after more than five years of productive and interesting work with RDF and the Semantic Web including developing the Redland RDF libraries. It's been a great place to work at with super people who have been very supportive of the Semantic Web and innovation."

If you ignore those people at Maryland or MIT or whatever, there does seem to have been more European interest in the Semantic Web, but maybe this is set to change. This is probably what it takes for it to go mainstream.

Thursday, September 01, 2005

Some Updates

* Ajax Libraries - it seems DWR is the best; see the demos.
* Web 2.0 "Proponents of the Web 2.0 approach believe that web usage is increasingly oriented toward interaction and rudimentary social networks, which can serve content that exploits network effects with or without creating a visual, interactive web page. In one view, Web 2.0 sites act more as points of presence, or user-dependent web portals, than as traditional websites."
* Clear message for causality "Stenner and co-workers found that although the smooth pulse arrives noticeably earlier through the superluminal medium, the instant at which the "1s" and "0s" begin to differ does not seem to be accelerated."