Monday, July 31, 2006

Sunday, July 30, 2006

Look Twice

An interesting post, "Windows XP, Our New Favorite Legacy Operating System" discusses the ramifications of 5 years without a new version of Windows and subliminal logos (I'll never look at an "Ex" the same way).

Everyone has focused on the OS X releases during that time (and on how Vista is copying OS X), but I haven't found anything covering the progress of 5 years of Linux development.

Bandwagon

JRDF is now on Google Code; well, the name and the description are. The code isn't there yet. It seems quite minimal and there's no web hosting. Overall, I'm not sure what advantage it has over SourceForge - except for reliability.

What would be really good is if the Googleplex did continuous builds.

Wednesday, July 26, 2006

Relational SPARQL in JRDF

JRDF can now do the following relational operations on RDF: project, restrict, union, semi-difference, natural join, anti-join, full and left outer join. It does most of the good things like retaining relations all the way through so it can perform the operations without ordering restrictions. However, the query engine doesn't do any optimization yet. It also doesn't have to deal with duplicates so there's no need for DISTINCT and UNION is predictable. The SPARQL grammar is a subset of the current one and it only supports SELECT, join (.), UNION and OPTIONAL. There are also no null values.
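
To make one of these operations concrete, here is a minimal sketch of a natural join. It is not JRDF's code: the class name NaturalJoinSketch, the representation of a tuple as a Map from variable name to value, and the use of Java 9+ collections are all assumptions for illustration. Relations are sets, so duplicates never arise.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class NaturalJoinSketch {
    /** Joins two relations; each tuple maps a variable name to its value (hypothetical representation). */
    static Set<Map<String, String>> naturalJoin(Set<Map<String, String>> left,
                                                Set<Map<String, String>> right) {
        Set<Map<String, String>> result = new LinkedHashSet<>();
        for (Map<String, String> l : left) {
            for (Map<String, String> r : right) {
                if (compatible(l, r)) {
                    // Merge the two tuples; shared variables already agree.
                    Map<String, String> joined = new HashMap<>(l);
                    joined.putAll(r);
                    result.add(joined);
                }
            }
        }
        return result;
    }

    /** Two tuples are compatible when every variable they share has the same value. */
    private static boolean compatible(Map<String, String> l, Map<String, String> r) {
        for (Map.Entry<String, String> e : l.entrySet()) {
            String other = r.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) {
                return false;
            }
        }
        return true;
    }
}
```

When the two relations share no variables every pair is compatible, so the same method degenerates into a cartesian product.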

There's also a Swing based GUI (that's a bit like Twinkle) that allows the submission of SPARQL queries and renders them in tabular fashion.

I'm thinking that I might add a difference operator (-) and produce an OPTIONAL-like query that is not order dependent. I had initially planned to use minimum union (outer union followed by tuple subsumption, i.e. removing rows with more null values). However, it does seem that the existing full outer join will work with tuple subsumption too - although it does look like more work.

There are also some interesting aspects of the grammar.

Due to time constraints the level of code quality isn't quite up to what I'm happy with, so at the moment there's no release and there are probably still bugs left to squash. For the truly keen the current source code is available in Subversion.
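
As a rough illustration of minimum union (and not what JRDF does today), the sketch below takes the outer union of two relations and then drops every tuple that is subsumed by a more complete one. The class name MinimumUnionSketch is made up, and an unbound variable is modelled as a missing key rather than a null value, which is an assumption.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class MinimumUnionSketch {
    /** Outer union of both relations followed by tuple subsumption. */
    static Set<Map<String, String>> minimumUnion(Set<Map<String, String>> a,
                                                 Set<Map<String, String>> b) {
        // Outer union: every tuple from both sides, whatever variables it binds.
        Set<Map<String, String>> outer = new LinkedHashSet<>(a);
        outer.addAll(b);

        Set<Map<String, String>> result = new LinkedHashSet<>();
        for (Map<String, String> tuple : outer) {
            if (!isSubsumed(tuple, outer)) {
                result.add(tuple);
            }
        }
        return result;
    }

    /** A tuple is subsumed when another tuple has all of its bindings plus at least one more. */
    private static boolean isSubsumed(Map<String, String> tuple, Set<Map<String, String>> all) {
        for (Map<String, String> other : all) {
            if (other.size() > tuple.size() && other.entrySet().containsAll(tuple.entrySet())) {
                return true;
            }
        }
        return false;
    }
}
```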

Turnabout is not fair play

He Who Cast the First Stone Probably Didn’t "Although volunteers tried to respond to each other’s touches with equal force, they typically responded with about 40 percent more force than they had just experienced. Each time a volunteer was touched, he touched back harder, which led the other volunteer to touch back even harder. What began as a game of soft touches quickly became a game of moderate pokes and then hard prods, even though both volunteers were doing their level best to respond in kind...Neither realized that the escalation was the natural byproduct of a neurological quirk that causes the pain we receive to seem more painful than the pain we produce, so we usually give more pain than we have received."

Tuesday, July 25, 2006

Ambient orb build task

Finally it's possible to hook up an orb to your Ant build.

Thursday, July 20, 2006

Google vs Semantic Web

Google exec challenges Berners-Lee "At the end of the keynote, however, things took a different turn. Google Director of Search and AAAI Fellow Peter Norvig was the first to the microphone during the Q&A session, and he took the opportunity to raise a few points.

"What I get a lot is: 'Why are you against the Semantic Web?' I am not against the Semantic Web. But from Google's point of view, there are a few things you need to overcome, incompetence being the first," Norvig said. Norvig clarified that it was not Berners-Lee or his group that he was referring to as incompetent, but the general user."

Related: Google Base -- summing up.

Wednesday, July 19, 2006

Test Driving Interface Usage

One of the neat things (or drawbacks) with Generics in Java is that the left hand side of an assignment can't always be treated separately from the right hand side because of type inference.

A simple example is Collections.emptySet(). By itself it will produce an empty set of Objects. If the left hand side defines a Set of Integers (or you supply an explicit type argument, as in Collections.<Integer>emptySet()) then that's what it will return. A cast does not help here: a Set of Objects cannot be cast to a Set of Integers.
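
A small sketch of both forms, with a made-up class name (EmptySetInference). In each method the element type comes from somewhere other than the arguments, since emptySet() takes none.

```java
import java.util.Collections;
import java.util.Set;

final class EmptySetInference {
    /** T is inferred as Integer from the declared return type. */
    static Set<Integer> integers() {
        return Collections.emptySet();
    }

    /** T is given explicitly, so no target type is needed. */
    static Set<String> strings() {
        return Collections.<String>emptySet();
    }
}
```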

Unexpectedly, this makes it possible to enforce an interface on the left hand side of an assignment without using something like Checkstyle or other ways of enforcing coding conventions.

The answer seems to be to create a generic object factory which wraps the use of reflection to create an object. The simplest API would be something like:
<T> T create(Class<T> concreteClass, Object... values)

To create an empty ArrayList and an ArrayList with an initial capacity of 255 you would write:
List list = creator.create(ArrayList.class);
List list = creator.create(ArrayList.class, 255);

This allows you to create expectations such that a call to create must return an interface. Using EasyMock, the test code is something like (from memory):
List list = createMock(List.class);
creator.create(ArrayList.class);
creatorControl.andReturn(list);

The following throws an exception under test, because the variable is declared with the wrong type (an ArrayList, not a List) and the mock is only a List:
ArrayList list = creator.create(ArrayList.class);
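
The effect can be reproduced without EasyMock. In the sketch below everything is hypothetical: Creator mirrors the API above, StubCreator stands in for the mocked creator, and a java.lang.reflect.Proxy plays the part of createMock(List.class). It relies on javac casting the erased return value to the declared type of the variable, which is what I have seen in practice rather than something I would call guaranteed.

```java
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

interface Creator {
    <T> T create(Class<T> concreteClass, Object... values);
}

/** Hand-rolled stand-in for the mocked creator: always hands back the canned object. */
final class StubCreator implements Creator {
    private final Object canned;

    StubCreator(Object canned) {
        this.canned = canned;
    }

    @SuppressWarnings("unchecked")
    @Override
    public <T> T create(Class<T> concreteClass, Object... values) {
        return (T) canned; // unchecked: T is erased, so nothing is checked here
    }
}

final class InterfaceUsageSketch {
    /** A List that is definitely not an ArrayList; no method on it is ever called. */
    static List<?> mockList() {
        return (List<?>) Proxy.newProxyInstance(
                InterfaceUsageSketch.class.getClassLoader(),
                new Class<?>[] {List.class},
                (proxy, method, args) -> null);
    }

    /** Code under test that declares the interface: the cast is to List and succeeds. */
    static List<?> declaresInterface(Creator creator) {
        List<?> list = creator.create(ArrayList.class);
        return list;
    }

    /** Code under test that declares the implementation: the cast to ArrayList fails. */
    static List<?> declaresImplementation(Creator creator) {
        ArrayList<?> list = creator.create(ArrayList.class);
        return list;
    }
}
```

With a real creator both methods behave identically, so only the test notices the difference.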

When the API is this simple it can become quite difficult to work out which constructor should be called. The simplest approach, at least initially, is to fail when there are multiple constructors with the same number of parameters, or to create a more complicated API where you specify the constructor's parameter types. A more sophisticated version could call the most specific constructor where there are multiple matches for the given objects. Starting with the constructor parameters and matching them against the objects seems like a quicker implementation than trying to match the objects against the constructors.
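
Here is a minimal sketch of such a factory, assuming only public constructors matter. ReflectiveCreator and its helpers are made-up names. It is slightly more lenient than failing on equal parameter counts: it walks each constructor's parameters, checks them against the supplied objects, and fails only when more than one constructor accepts them.

```java
import java.lang.reflect.Constructor;

final class ReflectiveCreator {
    /** Creates an instance using the single public constructor that accepts the given values. */
    static <T> T create(Class<T> concreteClass, Object... values) {
        Constructor<?> match = null;
        for (Constructor<?> candidate : concreteClass.getConstructors()) {
            if (!accepts(candidate.getParameterTypes(), values)) {
                continue;
            }
            if (match != null) {
                throw new IllegalArgumentException("Ambiguous constructors for " + concreteClass.getName());
            }
            match = candidate;
        }
        if (match == null) {
            throw new IllegalArgumentException("No matching constructor for " + concreteClass.getName());
        }
        try {
            return concreteClass.cast(match.newInstance(values));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Starts from the constructor's parameters and checks each one against the supplied object. */
    private static boolean accepts(Class<?>[] parameters, Object[] values) {
        if (parameters.length != values.length) {
            return false;
        }
        for (int i = 0; i < parameters.length; i++) {
            if (values[i] == null) {
                if (parameters[i].isPrimitive()) {
                    return false; // null can never be passed for a primitive parameter
                }
            } else if (!box(parameters[i]).isInstance(values[i])) {
                return false;
            }
        }
        return true;
    }

    /** Maps a primitive parameter type to its wrapper, since the values arrive boxed. */
    private static Class<?> box(Class<?> type) {
        if (!type.isPrimitive()) return type;
        if (type == int.class) return Integer.class;
        if (type == long.class) return Long.class;
        if (type == double.class) return Double.class;
        if (type == float.class) return Float.class;
        if (type == boolean.class) return Boolean.class;
        if (type == char.class) return Character.class;
        if (type == byte.class) return Byte.class;
        return Short.class;
    }
}
```

ReflectiveCreator.create(ArrayList.class, 255) resolves to the int constructor because an Integer is not a Collection, while create(StringBuilder.class, "x") is rejected as ambiguous since both the String and CharSequence constructors accept it.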

Monday, July 17, 2006

Native Vice

"MacViceBuilder is a set of tools to build a Mac version of the famous VICE emulator suite...Universal Binaries for PowerPC and Intel Macs are created...For each emulator a bundled Application is created that starts up on single click...Launches required X11 automatically"

Achieving Quality

Technology: Can You Automate Software Quality? "A lot of the debate has been focused on testing. Total Quality, however, would suggest that although testing is necessary, it's not sufficient. Testing focuses on inspection, not on prevention. To over-simplify, you test in hopes of demonstrating that the software has no defects (because you have a good high-quality development process), not to detect the defects that are present, but should not be there (because you don't have a good high quality development process). After several years of significant effort, the code my client was developing (and testing) still isn't of the high quality they are looking for. So we decided to go back, start from some basics, and look again at the issue of software quality."

The article goes through the processes needed to ensure quality software: code quality (using TDD), functional quality (giving the customer what they want), non-functional quality (security, privacy, compliance, etc), deployment and production quality, and maintenance.

"This probably seems like a lot of additional work for the development organization. And it is. But the costs of defective code in the production environment (both direct, in terms of sustaining engineering, and indirect, in terms of lost revenue and reputation) was becoming significant. Management felt it had no choice but to focus on improved quality, and turn to the productivity issues later. So far, despite the added burden of all these quality–related activities (many of which already took place, but simply weren't very effective) we have not seen a slowdown in software availability. Some teams are actually moving faster than before, despite the new work they have to perform, because they spend much less time and effort on remediation and last-minute adjustments to code to fix issues that only show up as the code is transitioned to production."

Saturday, July 15, 2006

Frankly, my dear, I don't give a damn

Mapping Rete algorithm to FOL and then to RDF/N3 "There is already a well established precedent with Python/N3/RDF reasoners (Euler, CWM, and Pychinko). FuXi used to rely on Pychinko, but I decided to write a Rete implementation for N3/RDF from scratch - trying to leverage the host language idioms (hashing, mappings, containers, etc..) as much as possible for areas where it could make a difference in rule evaluation and compilation.

What I have so far is more Rete-based than a pure Rete implementation, but the difference comes mostly from the impedance between the representation components in the original algorithm (which are very influenced by FOL and Knowledge Representation in general) and those in the semantic web technology stack."

The post goes on to describe the mapping from Semantic Web concepts to concepts used in the Rete algorithm (tokens, object type, alpha, beta, and terminal nodes). Some comments: RDF a hole to no where.

Wednesday, July 12, 2006

In case this happens again

Getting a fresh checkout of JRDF caused the following error:
"svn: Can't open file '.../.svn/tmp/text-base/NadicJoinImpl.java.svn-base': No such file or directory"

Subversion then suggests running cleanup, which causes this message:
"svn: Can't copy '.../.svn/tmp/text-base/DyadicJoinImpl.java.svn-base' to '.../DyadicJoinImpl.java.1.tmp': No such file or directory"

So I'm unable to checkout or cleanup and everything is in a locked state.

The problem was that there were two files, NadicJoinImpl.java and NAdicJoinImpl.java, and OS X uses a case-insensitive filesystem by default. The solution is to remove one of the offending files.

In my case: "svn delete https://svn.sourceforge.net/svnroot/jrdf/.../NAdicJoinImpl.java -m "I hate subversion""
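
Finding the offending pair by eye is tedious, so here is a rough sketch that groups names differing only by case. The class CaseCollisionSketch is made up, and it assumes you already have the list of paths from somewhere, for example the output of svn list -R run against the repository URL.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

final class CaseCollisionSketch {
    /** Returns every group of paths that would collide on a case-insensitive filesystem. */
    static Map<String, List<String>> collisions(Collection<String> paths) {
        Map<String, List<String>> byFoldedName = new TreeMap<>();
        for (String path : paths) {
            byFoldedName.computeIfAbsent(path.toLowerCase(Locale.ROOT), k -> new ArrayList<>()).add(path);
        }
        // Keep only the names that more than one path folds to.
        byFoldedName.values().removeIf(group -> group.size() < 2);
        return byFoldedName;
    }

    public static void main(String[] args) {
        System.out.println(collisions(
                List.of("NadicJoinImpl.java", "NAdicJoinImpl.java", "DyadicJoinImpl.java")));
    }
}
```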

Sunday, July 09, 2006

Encapsulation for Design by Contract

The Three Reasons for Data Encapsulation "It’s not that data encapsulation does not separate interface from implementation, or that separating interface from implementation is unimportant. It’s just that this is by far the least important reason for data encapsulation...Data encapsulation allows programmers to enforce class invariants, preconditions, and postconditions."
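
A tiny sketch of what that means in Java, using a made-up BoundedCounter class. Because the field is private, the only way to change it is through a method that checks its precondition, so the invariant cannot be broken from outside.

```java
final class BoundedCounter {
    private final int max;
    private int value; // class invariant: 0 <= value <= max

    BoundedCounter(int max) {
        if (max < 0) {
            throw new IllegalArgumentException("max must be >= 0");
        }
        this.max = max;
    }

    /** Precondition: value() < max. Postcondition: value() has grown by exactly one. */
    void increment() {
        if (value >= max) {
            throw new IllegalStateException("counter is full");
        }
        value++;
    }

    int value() {
        return value;
    }
}
```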

"The I in API stands for interface, and interfaces are for people, not just machines. APIs can be complex, confusing things. The less there is of it, the better. The smaller and simpler the API is, the easier it is to learn and use; the more likely it is that the API will be used correctly."

Rationalizing Relational and OO

Moving forward with relational: looking for objects in the relational model, Chris Date finds they were there all the time. "The question is how to integrate the good ideas of object-oriented database with relational ideas. The wonderful thing is, it turns out you don't have to do anything to the relational model. Absolutely nothing. The relational model is so solid and so robust..."

"The key notion underlying The Manifesto is thus the equation: domain = object class. A domain, or an object class, is a data type that is encapsulated, which means that the only way you can operate on values of that type is through operators that are defined for the type. You don't actually see the way the data is represented. That's not relevant. You only know that there are certain functions you can perform. It might be a primitive system-defined data type. More generally, it's going to be a user-defined data type. The values of these data types can be arbitrarily complex."

"The values in row and column slots can then be anything you like. They can be simple integers. They can be strings. They can be arrays. They can be books. They can be engineering drawings. They can be videos... They can be anything you like - as long as you can define the data type. In fact, I do believe one of the reasons we're hearing so much hype about object-oriented is because of a failure on the part of the relational vendors to step up to the mark. They haven't supported the relational model. If they had, we wouldn't be having these silly arguments now."

"In my opinion, there are precisely two good ideas. One is the data type concept - user-defined data types of arbitrary complexity, with encapsulation and user-defined functions. The other is inheritance. For example, in a geometric database you might have an object class called polygons, and one called rectangles, where rectangles are a subclass of polygons because every rectangle is a polygon. Therefore it follows that everything that works for polygons automatically works for rectangles too."

"If under the covers, the representation changes - if polygons are represented by a sequence of points for the vertices, and rectangles are represented by just the bottom left and the top right corner or something like that - the code to implement the operator has to change too, but that's implementation. From the model's point of view, if you have a function called area that returns the area of a polygon, automatically it means that you can invoke the area function on a rectangle and get the right answer. Under the covers, it may be desirable to reimplement that function. I don't care. That's implementation."

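The polygon and rectangle example translates fairly directly into Java. The sketch below is my own illustration, not anything from the interview, and all of the names are made up. The general polygon is stored as a sequence of vertices and uses the shoelace formula, the rectangle is stored as two corners and reimplements area, and callers only ever see Polygon.

```java
import java.util.List;

interface Polygon {
    double area();
}

/** Represented by a sequence of vertices, each {x, y}, in order around the boundary. */
final class VertexPolygon implements Polygon {
    private final double[][] vertices;

    VertexPolygon(double[][] vertices) {
        this.vertices = vertices;
    }

    @Override
    public double area() {
        // Shoelace formula: half the absolute sum of the cross products of consecutive vertices.
        double twiceArea = 0;
        for (int i = 0; i < vertices.length; i++) {
            double[] a = vertices[i];
            double[] b = vertices[(i + 1) % vertices.length];
            twiceArea += a[0] * b[1] - b[0] * a[1];
        }
        return Math.abs(twiceArea) / 2;
    }
}

/** Represented by just the bottom left and top right corners. */
final class Rectangle implements Polygon {
    private final double left, bottom, right, top;

    Rectangle(double left, double bottom, double right, double top) {
        this.left = left;
        this.bottom = bottom;
        this.right = right;
        this.top = top;
    }

    @Override
    public double area() {
        return (right - left) * (top - bottom);
    }
}

final class PolygonSketch {
    /** Works for any polygon, whatever its representation. */
    static double totalArea(List<? extends Polygon> shapes) {
        double sum = 0;
        for (Polygon shape : shapes) {
            sum += shape.area();
        }
        return sum;
    }
}
```
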
Environment is Inherited


The Ghost in Your Genes
"At the heart of this new field is a simple but contentious idea – that genes have a 'memory'. That the lives of your grandparents – the air they breathed, the food they ate, even the things they saw – can directly affect you, decades later, despite your never experiencing these things yourself. And that what you do in your lifetime could in turn affect your grandchildren.

The conventional view is that DNA carries all our heritable information and that nothing an individual does in their lifetime will be biologically passed to their children. To many scientists, epigenetics amounts to a heresy, calling into question the accepted view of the DNA sequence – a cornerstone on which modern biology sits."

"And Reik's work has gone further, showing that these switches themselves can be inherited. This means that a 'memory' of an event could be passed through generations. A simple environmental effect could switch genes on or off – and this change could be inherited."

So you're fat because your grandparents were in a famine. A recent article in Nature suggests that vitamins taken during pregnancy have a permanent effect on subsequent offspring. It certainly should place more responsibility on future parents - doing drugs or whatever may affect your children even if you stopped before having them. Epigenetics on Wikipedia.

When Set isn't Enough

Why isn't there an interface for LinkedHashSet? There's one for SortedSet. It seems that only if there are more methods is there another interface. What about different semantics and performance characteristics like ordering? What makes one characteristic worthy of an interface and not another? Maybe interfaces aren't descriptive enough?
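
A short sketch of the gap, with a made-up class name (SetOrderingSketch). A method can demand a SortedSet and get first() in return, but a method that relies on insertion order can only ask for a plain Set and hope the caller passes a LinkedHashSet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;

final class SetOrderingSketch {
    /** The signature says nothing about ordering; the result depends on the implementation passed in. */
    static List<String> iterationOrder(Set<String> set) {
        return new ArrayList<>(set);
    }

    /** Sorted order is visible in the type because SortedSet adds methods such as first(). */
    static String smallest(SortedSet<String> set) {
        return set.first();
    }
}
```

Passing a LinkedHashSet built from "b", "a", "c" gives back that insertion order, a TreeSet gives sorted order, and a HashSet promises nothing, yet all three satisfy the same parameter type.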

Friday, July 07, 2006

The Future is a little Brighter

David links to a posting he forwarded from the Kowari developers list (original post here) from Amit Kapoor about the future of Kowari: "The Topaz Foundation (http://www.topazproject.org) is very pleased to forward the email, from Michael H. Wallach (Senior Counsel, Northrop Grumman) to Richard Fontana...We trust that this letter will end any confusion with respect to the future status of Kowari, which has been secured, and that the community will now be able to focus on making Kowari one of the most vibrant open source projects."

"Northrop Grumman respects the rights that users of open source Kowari software receive under the MPL. Northrop Grumman intends that open source Kowari software, licensed under the MPL, is and will remain free, open source software.

Moreover, Northrop Grumman has no objection to the continued appropriate use of the "Kowari" name by developers participating in the Kowari open source project."

Three More SW Applications


  1. FOAF to hCard via SPARQL "So I used the scutterplan from the FOAFBulletinBoard, going 2 steps, producing 17382 statements, containing I think it was 2035 foaf:Person nodes...To the results I applied sparql2hcard.xsl using xsltproc, script was sparql-hcard.sh (I do hope this was the latest SPARQL XML results format...), producing these hCards." XSLT, querying - sounds like descriptors.

  2. Timeline "...a DHTML-based AJAXy widget for visualizing time-based events. It is like Google Maps for time-based information."

  3. iris: open your eyes to the future of the desktop "Iris is a java open source integrated desktop environment including email, calendar, file browser, web browser, chat, and a data mining toolkit (clustering, indexing, and lots more)." Uses Jena.

JRuby now does Rails**

"** We are able to generate and run the cookbook demo from rolling with rails tutorial (http://www.onlamp.com/pub/a/onlamp/2005/01/20/rails.html) with what we have and all appears to work. With this said, it is likely that there are several aspects of rails that are not working correctly. See docs/README.rails for known issues/instructions in release."

JRuby "WEBrick runs...Ruby on Rails runs on top of WEBrick (and generation scripts work)**"

Time for a free lunch:

"I believe I've found a 'free lunch' in Ruby on rails. Oh, it's not always free. If I need to do two-phased commit or hardcore object relational mapping, this lunch may cost me more than I'm willing to pay. But often enough, it's for all practical purposes free.

* I can train a team of Rails developers faster than I can teach a new Java developer Spring plus Hibernate plus whatever web mvc you want plus all of the other frameworks and tools Java developers have to know.
* I can build my applications much faster than I could before.
* For many applications, the latency in the database is the overriding concern, so I don't even notice differences in performance.
* I can trivially expose web services, letting other applications, potentially written in other languages, quickly access my Rails services.

Now, I know that some will tell me that the lunch really isn't free. But you can tell that to my customers that pay a fraction of the price they'd pay for a Java application, and get something that's easier to maintain, just as fast, and on an earlier schedule. From that exec's perspective, the lunch is free."

Identifying with the Web

URIs and the Myth of Resource Identity "Another way to put it is that the authoritative descriptive information that I publish licenses the use of my URI in certain models. This is analogous to publishing some interfaces for an object in an object oriented system. You can never be sure that I won't (monotonically) publish an additional interface at some point in the future, just as you cannot be sure I won't publish more descriptive information about me..."

"On the other hand, even if it is not possible to completely describe a resource, it may be possible to unambiguously identify that resource, in the sense of conveying what resource it is, as distinct from all other possible resources.
For example, if I provide descriptive information telling you that the URI http://t-d-b.org?http://dbooth.org/2005/dbooth/ identifies all-and-only the actual, living person with email address dbooth@hp.com as of 1-Jan-2005, that is sufficient to unambiguously identify me, distinct from all other possible resources."

"The ability to uniquely identify a resource -- in the sense of conveying the distinction between this resource and all other resources -- is important because it enables others to publish additional descriptive information about the resource, beyond what the URI owner provides. The Semantic Web is all about the network effect created by the use of URIs as universal identifiers. When a URI's resource is uniquely identified, it enables "anyone to say anything" about that resource[4].

"If a URI's resource is not uniquely identified -- if others must rely solely on authoritative descriptive information about that resource -- then those who wish to make statements about it run the risk that they may have guessed wrong about what resource the URI owner was intending to identify. This hampers others' ability to make statements about that resource, thus diminishing the value of that URI. This is analogous to connecting, to the telephone network, a telephone that nobody wants to call: it consumes resources without contributing anything to the network effect."

Via, URIs and the Myth of Resource Identity.

Related to, Why Different Things are the Same and Architecture of the World Wide Web, Volume One.

Wednesday, July 05, 2006

At War with Ourselves

I read this a while ago but it seems somewhat relevant recently.

The Vietnam of Computer Science "Although it may seem trite to say it, Object/Relational Mapping is the Vietnam of Computer Science. It represents a quagmire which starts well, gets more complicated as time passes, and before long entraps its users in a commitment that has no clear demarcation point, no clear win conditions, and no clear exit strategy."

Solutions:
"Developers simply give up on objects entirely, and return to a programming model that doesn't create the object/relational impedance mismatch. While distasteful, in certain scenarios an object-oriented approach creates more overhead than it saves, and the ROI simply isn't there to justify the cost of creating a rich domain model. ([Fowler] talks about this to some depth.)"

Fowler's piece I believe is, "GetterEradicator" which links to "Tell, Don't Ask", which makes an important point about Design by Contract, "According to Design by Contract, as long as your methods (queries and commands) can be freely intermixed, and there is no way to violate the class invariant by doing so, then you are ok. But while you are maintaining the class invariant, you may have also dramatically increased the coupling between the caller and the callee depending on how much state you have exposed."

"Developers simply give up on relational storage entirely, and use a storage model that fits the way their languages of choice look at the world."

"Developers simply accept that it's not such a hard problem to solve manually after all, and write straight relational-access code to return relations to the language, access the tuples, and populate objects as necessary."

"Developers simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an O/R-M to solve 80% (or 50% or 95%, or whatever percentage seems appropriate)..."

"Developers simply accept that this is a problem that should be solved by the language, not by a library or framework...bring relational concepts (which, at heart, are set-based) into mainstream programming languages, making it easier to bridge the gap between "sets" and "objects". Work in this space has thus far been limited, constrained mostly to research projects and/or "fringe" languages, but several interesting efforts are gaining visibility within the community, such as functional/object hybrid languages like Scala or F#, as well as direct integration into traditional O-O languages, such as the LINQ project from Microsoft for C# and Visual Basic. One such effort that failed, unfortunately, was the SQL/J strategy; even there, the approach was limited, not seeking to incorporate sets into Java, but simply allow for embedded SQL calls to be preprocessed and translated into JDBC code by a translator."

"Developers simply accept that this problem is solvable, but only with a change of perspective. Instead of relying on language or library designers to solve this problem, developers take a different view of "objects" that is more relational in nature, building domain frameworks that are more directly built around relational constructs."

Tuesday, July 04, 2006

Sunday, July 02, 2006

Purity

A couple of Artima posts, "Pure and Impure Functions" and "In Defense of Pattern Matching".

And I quote: "A pure function is one which has no side-effects and does not depend on any state beyond its local scope. This means that it can be replaced with any other pure function which returns the same result given the same input. This property is often referred to as referential transparency...some impure functions, can become pure when they are passed constant values."

"I think pattern matching is extremely convenient and is a good fit with object-oriented programming. I wonder why not more mainstream languages have adopted it."

Links to Scala. I was recently looking at Clean. And if you really want to, "Functional Programming in Java: More fun with Generics and CGLib" "They are Java instance methods which are pure-functional, that is, without side-effects, and do not reference any object fields or non-static variables. Essentially, they are non-mutating static functions declared without a static qualifier."
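
In plain Java the distinction looks something like this. It is a minimal sketch with made-up names, not code from either article.

```java
final class PuritySketch {
    private int total; // mutable state that the impure method reads and writes

    /** Pure: the result depends only on the arguments and nothing is modified. */
    static int add(int a, int b) {
        return a + b;
    }

    /** Impure: updates the field, so the same argument gives a different result each time. */
    int addToTotal(int amount) {
        total += amount;
        return total;
    }
}
```

Any call to add(2, 3) can be replaced by its result, whereas two calls to addToTotal(2) on the same object return 2 and then 4.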

Searching for Links


  • Re: Implementations using "compositional semantics" Sesame looking at using relational algebra for implementing SPARQL. This may be similar to the relational RDF approach. More in the Sesame CVS repository and JRDF Subversion repository.

  • DARQ - Federated Queries with SPARQL "It provides transparent query access to multiple, distributed SPARQL endpoints as if querying a single RDF graph. DARQ enables the applications to see a single query interface, leaving the details of federation to the query engine."

  • Why the World is ready for the Semantic Web "Current web applications that could be cited as Semantic Web proofs of concept are…" Google, RSS, Tagging and Mash-ups. Somewhat a response to, "The 7 Flaws of the Semantic Web".

  • OpenRDF "I don't know the people at Aduna, but frankly if sesame is the best they can produce, it's sad. Looking at the code, it's very clear sesame will run slow and scale like crap. The code isn't as convoluted and crappy as Jena2, but it's not what I consider good code either. The design and implementation is flawed all over and will likely need to be thrown out once people try to use it. So far the semantic web community has managed to produce only crap and doesn't know how to implement a scalable, and fast rule engine."

  • The really easy way... "People form, and break...their own social circles, in their own social ways...not Pre-fabricated with all sorts of SPARQLy knobs, and levers, and release valves...Simple tools that make life just a little bit easier at a time is all the world both wants AND needs."

Project Constraints

Ages ago in "Code and other Laws of Cyberspace" the idea of "The New Chicago" is mentioned. It says that behaviour is regulated by four things: the market, the law, social norms and architecture (or nature, the world you are in). In the original context it is described as a way that behaviour is limited in cyberspace and society and that the law can affect the other three to cause indirect behavioural change.

It seems equally appropriate in the context of a project. There are ideas that you can apply in a project that are (or at least should be) constrained in the same ways: laws, markets, social norms and architecture. The constraints do not seem to have an equal effect on the smooth running of a project (though they certainly could).

Laws don't usually have an impact on the constraints of a project. While people may break the law during the project and may have to go to court, get arrested, etc., the impact, at least in the projects I've been in, is quite minimal. Laws may affect certain requirements on the project (compliance with certain laws for instance) but generally they are well known or at least able to be handled in a fairly straightforward manner. No one generally argues about whether a certain law should be abided by and usually people are available to ensure that the laws are adhered to.

Similarly, market forces have only a few effects on project cohesion as many decisions are already made for you - language, operating system and other infrastructural decisions are not usually up for grabs in a project. While some people may wish to do C# on Linux or Ruby instead of Java most people and projects limit scope in this way and resources are usually selected (or self-selected) on this basis. You don't usually find a .NET programmer amongst 10 Java programmers trying to convince them that Microsoft is the way to go.

The two key areas that do seem up for grabs and do cause problems are with norms and architecture.

The type of team really dictates the level at which these two things can change and the methodology you use to achieve it. A well run XP project seems to allow a more participatory effect (mainly through pairing and pair swapping). If new ideas are accepted and tried then overall the team seems to have more of a say. Certainly, the norms imposed on individual developers seem much stronger in an XP project.

The opposite effect, somewhat of a project anti-pattern, is where a developer (or worse, more than one) will go off and find a proposed solution and try to apply it to the entire project using a different norm or architecture. The rest of the team, quite rightly, does not feel ownership of the new technique or code. The effect of this is the opposite of what XP is trying to achieve - code ownership and cohesion. The key pushback to this is that no code goes into the project unless it has been paired on. If it is a project that does require a lot of investigation then each member of the team should get the ability to devise a solution to be considered.

There are of course limits placed on how far any project can change (moving languages or paradigms such as OO, procedural, functional, etc.) that XP has to abide by too. But in some ways XP seems much more honest (or from some perspectives limited), especially considering the "Extreme Programming Bill of Rights", where these kinds of changes will affect estimates, quality and progress.