Wednesday, June 28, 2006
Agile Anchors
XP2006 Patterns lists a summary of experiences in adopting agile development practices, including anti-patterns: Unpaid Debt and Lots of Bugs Found by Customer. It also offers some metrics to help drive agile software development, along with this not-so-controversial claim: "Code coverage metrics measure the percentage of methods and classes under test. Developers should strive for 100% test coverage on classes that are behaviorally intense. The CCN can be used as a baseline for the number of tests required per method. If a method has CCN of three, a good starting point to exercise all paths through code would be three tests. Additional tests may be required that exercise error conditions and alternate behaviors. A robust suite of unit tests provides the courage necessary to refactor." Also mentioned, in "Examining Test-Driven Development": "The reality is that your unit tests form a large part of your design specification, and similarly acceptance tests form a portion of your requirements specification..."
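The CCN heuristic quoted above is easy to sketch. Assuming a hypothetical method with a cyclomatic complexity of three (two decision points), the baseline is one test per independent path; the class and method names here are illustrative, not from the post:

```java
// Sketch: a method with cyclomatic complexity 3 (two decision points)
// and, per the quoted heuristic, one test per independent path as a
// starting point. Classifier/classify are hypothetical names.
public class Classifier {
    public static String classify(int n) {
        if (n < 0) return "negative";   // path 1
        if (n == 0) return "zero";      // path 2
        return "positive";              // path 3
    }

    public static void main(String[] args) {
        // CCN of 3 -> three tests as a baseline, one per path.
        assert classify(-5).equals("negative");
        assert classify(0).equals("zero");
        assert classify(7).equals("positive");
        System.out.println("all three paths exercised");
    }
}
```

Error conditions and alternate behaviours would, as the quote says, push the count higher than the CCN baseline.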
Tuesday, June 27, 2006
Revisiting Visitor
I was looking for another approach to implement parsing expressions and came across Visitors in JSR 269 and "The Expression Problem Revisited", "The expression problem (aka the extensibility problem) refers to a fundamental dilemma of programming: Can your application be structured in such a way that both the data model and the set of virtual operations over it can be extended without the need to modify existing code, without the need for code repetition and without runtime type errors."
"All proposals that we know of take as a starting point either a data-centered approach, making it hard to add new operations, or an operation-centered (visitor-based) approach, making it equally hard to add new data types."
"Despite its simplicity, this trick is one of the major contributions of this paper, because it finally moves the visitor-based approach to the expression problem into the realm of static type safety...[it] divides the node classes of the different phases into separate, incompatible families, each characterized, in the form of a type parameter, by the specific brand of Visitor they are capable of accepting."
To put it another way, this enables you to define what implementation of Visitor you wish to use when you construct your visitable classes (at compile time, not runtime) - a pretty neat trick.
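A simplified sketch of the idea in Java (Expr, Num, Add and the single Eval visitor are illustrative names, not taken from the paper): the node family and the visitor share a type parameter, so which visitor a node can accept is fixed when the node is constructed, at compile time, and adding an operation is just adding another Visitor implementation:

```java
// Simplified variant of the type-parameterized visitor: nodes and
// visitors share the parameter R, checked at compile time.
interface Visitor<R> {
    R visitNum(Num<R> n);
    R visitAdd(Add<R> a);
}

abstract class Expr<R> {
    abstract R accept(Visitor<R> v);
}

class Num<R> extends Expr<R> {
    final int value;
    Num(int value) { this.value = value; }
    R accept(Visitor<R> v) { return v.visitNum(this); }
}

class Add<R> extends Expr<R> {
    final Expr<R> left, right;
    Add(Expr<R> left, Expr<R> right) { this.left = left; this.right = right; }
    R accept(Visitor<R> v) { return v.visitAdd(this); }
}

public class VisitorDemo {
    // A new operation is a new Visitor implementation; the node
    // classes are untouched.
    static class Eval implements Visitor<Integer> {
        public Integer visitNum(Num<Integer> n) { return n.value; }
        public Integer visitAdd(Add<Integer> a) {
            return a.left.accept(this) + a.right.accept(this);
        }
    }

    public static int eval(Expr<Integer> e) { return e.accept(new Eval()); }

    public static void main(String[] args) {
        Expr<Integer> e = new Add<>(new Num<>(1), new Num<>(2));
        System.out.println(eval(e)); // prints 3
    }
}
```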
Via, why Visitors?.
Update: I've started using these ideas in JRDF to implement the SPARQL query layer and see what practical benefits are.
Wednesday, June 21, 2006
I Will Say This Only Once
Eight barriers to effective listening "By interrupting the speaker before letting her finish, you're essentially saying that you don't value what she's saying. Showing respect to the speaker is a crucial element of good listening."
"Many people have a "messiah complex" and try to fix or rescue other people as a way of feeling fulfilled...Trying to be helpful while listening also implies that you've made certain judgments about the speaker. That can raise emotional barriers to communication, as judgments can sometimes mean that the listener doesn't have complete respect for the speaker."
"Treating discussion as competition is one of the most serious barriers to good listening. It greatly inhibits the listener from stretching and seeing a different point of view. It can also be frustrating for the speaker."
Test First Saves Time
To obtain good code, writing tests and code is faster than code alone: "Ron Jeffries, one of the XP gurus, stated that "in order to obtain good code, writing tests and code is faster than just code". To find out if this is true or not, let's make a small experiment."
Too Much Class
This is a paper that offers a potentially positive argument for smaller methods, "Reengineering Analysis of Object-Oriented Systems via Duplication Analysis": "A manual approach to the reengineering has been compared against a tool-aided approach, highlighting the time saving (8 man days against 1 man month) and the amount of duplications undiscovered by the manual analysis. It has been stated that a tool for duplication detection can drastically reduce the time to identify the duplication in software systems at file, class and method level only if suitable metrics and duplication analysis support are available. On the other hand, the adoption of a tool only at file level does not guarantee a better identification with respect to what can be performed manually by an expert system engineer. The inspection at class and method levels reduces the time to perform the analysis and allows identifying several activities to be performed during the code manipulation phase."
Spring Best Practice
Leverage Application Framework Integration with Spring "Inversion of Control with Dependency Injection and support for Declarative Transaction are Spring framework features that can greatly reduce the complexity of integrating different application layers as well as fosters better OO design and greatly improves testability."
Monday, June 19, 2006
Feature or Property
A Review of Relational Concepts "The relational model includes an open-ended set of generic operators known collectively as the relational algebra [useful link] (the operators are generic because they apply to all possible relations, loosely speaking)...Each of the operators we discuss takes either one relation or two relations as operand(s) and returns another relation as result. Note: The—very important!—fact that the result is always another relation is referred to as the (relational) closure property. It is that property that, among other things, allows us to write nested relational expressions."
SQL doesn't support this and SPARQL has it as a feature (CONSTRUCT) but not a built-in property of SPARQL (results from a SELECT are not graphs). This is different to wanting transitive closure in SPARQL. It's interesting to see that OPTIONAL and UNION are still causing concerns for implementations.
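The closure property shows up directly in code: because every operator returns a relation, expressions nest. A toy sketch, with a relation modelled as a set of attribute maps and select/project as illustrative names (not a real API):

```java
import java.util.*;
import java.util.stream.*;

// Sketch of relational closure: select and project each take a
// relation (a set of tuples) and return another relation, so the
// calls compose. Names and data are illustrative.
public class Closure {
    public static Set<Map<String, String>> select(
            Set<Map<String, String>> r, String attr, String value) {
        return r.stream()
                .filter(t -> value.equals(t.get(attr)))
                .collect(Collectors.toSet());
    }

    public static Set<Map<String, String>> project(
            Set<Map<String, String>> r, String... attrs) {
        return r.stream().map(t -> {
            Map<String, String> out = new HashMap<>();
            for (String a : attrs) out.put(a, t.get(a));
            return out;
        }).collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Set<Map<String, String>> people = new HashSet<>();
        people.add(Map.of("name", "Ann", "city", "Brisbane"));
        people.add(Map.of("name", "Bob", "city", "Sydney"));
        // Closure is what allows this nesting:
        Set<Map<String, String>> result =
                project(select(people, "city", "Brisbane"), "name");
        System.out.println(result); // [{name=Ann}]
    }
}
```

A SPARQL SELECT has no such property: its result is a table of bindings, not a graph, so it can't be fed back in as another graph pattern.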
Saturday, June 17, 2006
Cannon Fodder
Commons Chain "Towards that end, the Chain API models a computation as a series of "commands" that can be combined into a "chain". The API for a command consists of a single method (execute()), which is passed a "context" parameter containing the dynamic state of the computation, and whose return value is a boolean that determines whether or not processing for the current chain has been completed (true), or whether processing should be delegated to the next command in the chain (false)."
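A minimal re-implementation of that contract (the shape the quote describes, not the Commons Chain API itself): execute() returns true to end processing, or false to delegate to the next command in the chain:

```java
import java.util.*;

// Sketch of the command/chain contract: each command's execute()
// returns true when it has handled the computation, false to pass
// processing along. Illustrative re-implementation, not Commons Chain.
public class ChainDemo {
    interface Command {
        boolean execute(Map<String, Object> context);
    }

    static class Chain implements Command {
        private final List<Command> commands = new ArrayList<>();
        Chain add(Command c) { commands.add(c); return this; }
        public boolean execute(Map<String, Object> context) {
            for (Command c : commands) {
                if (c.execute(context)) return true; // handled; stop here
            }
            return false; // no command handled it
        }
    }

    static String runDemo() {
        Chain chain = new Chain()
            .add(ctx -> false)  // delegates onward
            .add(ctx -> { ctx.put("handled", "second"); return true; })
            .add(ctx -> { ctx.put("handled", "third"); return true; }); // never runs
        Map<String, Object> ctx = new HashMap<>();
        chain.execute(ctx);
        return (String) ctx.get("handled");
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // second
    }
}
```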
Truth, Lies and Lines of Code
The ever-quoted "Broken Windows Theory": "Turns out they're actually great project managers. They knew months in advance that the schedule would never work. So they told their VP. And he, possibly influenced by one too many instances where engineering re-routes power to the warp core, thus completing the heretofore impossible six-hour task in a mere three, summarily sent the managers back to "figure out how to make it work." The managers re-estimated, nipped and tucked, liposuctioned, did everything short of a lobotomy -- and still did not have a schedule that fit. The VP was not pleased. "You're smart people. Find a way!" This went back and forth for weeks, whereupon the intrepid managers finally understood how to get past the dilemma. They simply stopped telling the truth. "Sure, everything fits. We cut and cut, and here we are. Vista by August or bust. You got it, boss.""
The wacky thing is that they're still measuring the productivity of programmers in lines of code, like numbers of cars produced. It's features delivered over time that counts: user-visible features.
And here's part of the answer "How to refactor": "Essentially, you decide how it is you're going to write code and that's it. Product management asks for a new feature, you estimate based on your best work. What you don't do is offer choices:
"Well, that's going to affect a part of our source code that's not really testable. Cleaning that up and adding the feature will take a week, but I suppose I could get it done quicker if I don't refactor right now."
Eeek. It doesn't matter how much you decorate offers like that with warnings about how you can't really promise them the hack won't come back to bite them, blah, blah, blah... The quick-and-dirty decisions will be chosen often enough and you'll start moving backwards.
So we only write code one way and there's only one estimate:
"It will take a week. The reason the estimate is longer than similar features is because that code doesn't have unit tests yet and I'll have to move some things around in order to write them. The good news is that the next time you ask for a change in this area, it will go faster."
That's it. No choices."
Update: Peter Kantz picked up on the same point (and a similar title) and added that Microsoft is supposed to be using Scrum, which highlights the need for transparency. Via, "Microsoft Vista: Scrum or Not-Scrum".
Friday, June 16, 2006
Explicit Semantics
Here is another bunch of JPA articles.
Refactoring the EJB APIs "Annotations are very useful for things that are semantically part of the application. As an example, consider dependencies that your application has on an environment resource. That's intrinsic to your application: If that dependency is not satisfied, then your application isn't going to run. So that's a declaration of a dependency. Dependency on another EJB with a particular interface type is another such dependency. Those dependencies will have to be satisfied in the deployment environment, but are also intrinsic to the semantics of the application."
An Introduction to Java Persistence for Client-Side Developers "Since it was also designed to be implementation independent, you can change your database or persistence provider as your needs change and not worry about anything breaking. Though Java Persistence was originally designed for server tasks, it performs beautifully in client applications, letting client-side developers spend less time on storage and more time on what they are good at: building killer GUIs."
Thursday, June 15, 2006
Mule with a Spinning Wheel
For some time now, I've been reading Jeff Langr's series on TDD (including testing equality, refactoring tests, and driving UIs). I now feel compelled to make sure I link to it as it's up to the 13th installment "Nine Reasons Why You Should Be Using TDD".
"A lot of the duplication in a system isn’t evident until you start refactoring. Once you begin refactoring, you start recognizing potential for reuse. You’re more likely to recognize that a small, newly extracted method could be called by another method, which in turn simplifies the other method. You start to build simpler classes that do fewer things. Often, having a smaller class increases the likelihood that other classes can reuse it."
I'd like to call this the "Tetris effect" of refactoring code. You get methods/classes/lines of code looking the same, all at the same level of abstraction and any extra complexity can then be easily removed.
"I used to think that I was a pretty good coder. I would spend an afternoon slamming out code. I’d integrate the code and run some manual tests late in the afternoon. Sometimes it wouldn’t work—imagine that! I’d then be forced to spend another few hours unraveling the mess I’d built."
"With TDD, every piece of code exists because it was required by a test. That rule, in turn, means that we have solid unit test coverage. (We don’t have exhaustive coverage, which would be virtually impossible to provide.) Defect rates should come down significantly. Martin Fowler and Ron Jeffries talk about shops that severely reduced their numbers of defects. Most teams working on moderate-sized applications (several hundred thousand lines of code) have defect rates in the hundreds or even thousands per year. Shops doing TDD can achieve rates of less than a dozen per year (but they probably need to be adhering to other proven disciplines as well)."
"The amount of time I feel compelled to go off into a dark cube and think about a problem has been minimal since I’ve been doing TDD."
Wednesday, June 14, 2006
Test Driving Java 5
I was wrong that you can't test drive generic types in Java: there are getGenericType and getDeclaredAnnotations. More information in "Reflecting generics". It is a pain, though, with the methods returning an interface that has to be cast and the implementations living in the sun.* package.
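A small sketch of what such a test drive looks like (GenericsDemo and typeArgumentOf are illustrative names): getGenericType() is declared to return the Type interface, so the awkward cast to ParameterizedType is unavoidable:

```java
import java.lang.reflect.*;
import java.util.*;

// Sketch of testing a generic declaration via reflection, along the
// lines of "Reflecting generics". Names are illustrative.
public class GenericsDemo {
    static List<String> names = new ArrayList<>();

    public static String typeArgumentOf(String fieldName) {
        try {
            Field f = GenericsDemo.class.getDeclaredField(fieldName);
            // getGenericType() returns the Type interface; this cast
            // is the awkward part mentioned in the post.
            ParameterizedType pt = (ParameterizedType) f.getGenericType();
            return pt.getActualTypeArguments()[0].getTypeName();
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(typeArgumentOf("names")); // java.lang.String
    }
}
```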
Good Design and Web Spreadsheets
I've been using Spring Rich Client recently and it made me think of what makes a good design pattern as I go and implement the MVC pattern.
So with object-oriented programming I have to say that the most benefit I've seen is the combination of data and behaviour which enables large parallel development. As Alan Kay has said on his history of Smalltalk, "..."doing OOP right" would be to handle the mechanics of invocations between modules without having to worry about the details of the modules themselves." This is what the naked objects guys call behavioural completeness. They quote Riel by saying: "'Keep related data and behaviour in one place.','Distribute system intelligence horizontally as uniformly as possible, that is, the top-level classes in a design should share the work uniformly.', 'Do not create god classes/objects in your system. Be very suspicious of a class whose name contains Driver, Manager, System or Subsystem.'"
As they also mention, the MVC pattern is frequently misunderstood and misused, as the controllers "...take on the role of task-scripts, incorporating not only the optimized sequence of activities, but business rules also - thereby usurping what ought to be responsibilities of the core business objects." This is usually extended to MVCP (or Model-View-Controller-Persistence), where persistence is not hidden from the other layers.
Taking this more procedural or subject-oriented view and clothing it in OOP and design patterns leads to an architecture that is brittle when subjected to change. This approach has been shown to be less able to adapt to changing business requirements than one that is behaviourally complete.
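A toy illustration of behavioural completeness (Account is a hypothetical example, not from the cited material): the data and the rules that govern it live together, instead of a dumb data holder manipulated by an AccountManager god class:

```java
// Sketch of "keep related data and behaviour in one place": the
// balance and the rules guarding it live in one class. Account and
// its rules are illustrative.
public class Account {
    private long balanceCents;

    public void deposit(long cents) {
        if (cents <= 0) throw new IllegalArgumentException("must be positive");
        balanceCents += cents;
    }

    public void withdraw(long cents) {
        if (cents > balanceCents) throw new IllegalStateException("insufficient funds");
        balanceCents -= cents;
    }

    public long balance() { return balanceCents; }

    public static void main(String[] args) {
        Account a = new Account();
        a.deposit(500);
        a.withdraw(200);
        System.out.println(a.balance()); // 300
    }
}
```

With an external manager class the invariants would have to be re-enforced at every call site; here they can't be bypassed.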
This also ties into the semantic web for object-oriented software developers where it says, "The key power of OWL is that classes can be defined by combining multiple restrictions and other classes. For that purpose, OWL provides logical operands to build intersections (and), unions (or) and complements (not) of other classes. For example, you could define "the class of all customers from France who have issued at least 3 purchase orders or at least one order consisting only of books, except those customers who have ordered a DVD". This is also an example of behaviour and data moving around together and being combined and types inferred - I don't see it as a contradiction between the two as long as you consider a dynamic typing system like Ruby or Java with AOP, reflection, proxies and annotations. I also see it as the killer application of things like Google's spreadsheet - in fact their spreadsheet is an almost ideal expression of this shared, global object that takes messages and performs operations.
As an aside, I went searching for where I first read about the "Design Patterns" book and it was from a review by Tal Cohen who said: "The concept of design patterns was first discussed by Christopher Alexander, an architect, back in 1977. If someone from the software industry had considered Alexander's words back then, perhaps the first book about design patterns in software would have dealt with patterns in FORTRAN programs. As it happens, software design patterns were first discussed in this era, when most software design is object-oriented in nature." And recently, although I don't have access to read it, someone has applied design patterns to Fortran 90/95 (might be as interesting as Object Oriented Javascript).
Sunday, June 11, 2006
Extra Spring Things
Spring Modules (SM) 0.4 has been released, with a Validation Module (based on Commons Validator), JavaSpaces and jBPM. These join the others: cache providers, HiveMind, Java Content Repository, rules (Drools, Jess, and JRules), Lucene and OSWorkflow.
The roadmap includes: Spring 2.0 support, JMX, and others.
Friday, June 09, 2006
Mission Control
Navigator 4.5 had the ability to synchronize its address book, bookmarks, cookies, history and passwords. Google has re-introduced this for Firefox 1.5 users with Browser Sync.
More Spring Things
"Design enterprise applications with the EJB 3.0 Java Persistence API" shows how to inject the collaborators of a service using annotations, such as applying "@EJB(beanName = "AccountDao")" to "AccountDao accountDao;".
This compares with Spring 2.0 examples that use the Spring configuration file to inject the collaborators: "JavaEE 5, Spring 2 and AspectJ 5: dependecy injection magics". Some more information, "7.7.1. Using AspectJ to dependency inject domain objects with Spring".
The other useful article is "Don't repeat the DAO!", which shows how to create type-safe, generic DAOs. It uses dependency injection to supply the class type to return, then uses Spring's AOP to add extra behaviour (like finder methods).
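A sketch of the generic-DAO shape (an in-memory map stands in for the real EntityManager, and all names are illustrative): the entity type is injected once, so no per-entity DAO classes need to be written:

```java
import java.util.*;

// Sketch of the "Don't repeat the DAO!" idea: one DAO parameterized
// by entity and id type, with the concrete class injected rather
// than hard-coded. The HashMap stands in for real persistence.
public class GenericDaoDemo {
    static class GenericDao<T, ID> {
        private final Class<T> type; // supplied by dependency injection
        private final Map<ID, T> store = new HashMap<>();

        GenericDao(Class<T> type) { this.type = type; }

        void save(ID id, T entity) { store.put(id, entity); }
        T findById(ID id) { return type.cast(store.get(id)); }
    }

    static class Account {
        final String owner;
        Account(String owner) { this.owner = owner; }
    }

    public static void main(String[] args) {
        // In Spring the constructor argument would come from the
        // configuration file; here it is wired by hand.
        GenericDao<Account, Long> dao = new GenericDao<>(Account.class);
        dao.save(1L, new Account("Ann"));
        System.out.println(dao.findById(1L).owner); // Ann
    }
}
```

The article's finder methods added via AOP are omitted here; the point is only the single parameterized DAO.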
Free America
Listening to the radio again (mp3 here). This time it was about Alfred McCoy's article called 'Outcast of Camp Echo: The Punishment of David Hicks' (magazine website here) with overtones of "The President Versus David Hicks" and mind control.
The second interview was with Bobby Kennedy Jr about his Rolling Stone article "Was the 2004 Election Stolen?" (moby mirror). The implication is that had the result occurred in a country like Ukraine or Georgia, it would've been overturned.
Sunday, June 04, 2006
Burst of Links
- As everyone knows, OS X's default behaviour is to replace a folder, not merge it. Luckily, there's Apple's free FileMerge utility.
- Peter Singer: The Ethics of What We Eat mentioning his books "Unsanctifying Human Life: Essays on Ethics" and "The Way We Eat : Why Our Food Choices Matter" - providing more than one example of the differences between moral and religious. His entry on animals also reinforces this. His essay "The Singer Solution to World Poverty" reminded me of the recent Richard Curtis movie "The Girl In The Café".
- The Asymmetric Web "Then began an awful period - which continues to this day, sadly -- of companies developing intranet applications and then concluding, erroneously, that the application can be deployed on the Web by just flicking the proverbial switch."
- Redux: Enterprise Software Licensing on Life Support "...by 2008 conventional software licenses will account for less than 10% of total revenues for all software companies, and less than 20% for leading enterprise search vendors..."
- Converting Between XML and JSON "In an ideal world, the resulting JSON structure can be converted back to its original XML document easily. Thus it seems worthwhile to discuss some common patterns as the foundation of a potentially bidirectional conversion process between XML and JSON."
- On First-Order-Logic Databases "We have demonstrated an equivalence between dependency statements (or functional dependencies) of a relational database on the one hand and of implicational statements of propositional logic on the other hand." Hopefully this means what I think it means.
Dressing Up SPARQL
Pérez et al.: Semantics and Complexity of SPARQL points to a new paper, "Semantics and Complexity of SPARQL", which offers a formal definition of most of the operations available in SPARQL.
"...I feel that this paper is a great formal foundation for SPARQL. It presents clear and reasonable answers to all of the weird edge cases we had to pussyfoot around until now. It makes me confident that SPARQL can be formally analyzed and there are answers to the hard optimization problems I and other people face when implementing SPARQL (e.g. in D2RQ)...I knew there were edge cases where my approach didn’t match SPARQL, and I suspected that they wouldn’t occur much in real queries – but now I know for sure, and I know that there are no further edge cases lurking."
From the paper: "...under the depth-first approach some natural properties of widely used operators do not hold, which may confuse some users. For example, it is not always the case that Eval_D((P1 AND P2)) = Eval_D((P2 AND P1)), violating the commutativity of the conjunction and making the result to depend on the order of the query."
"A graph pattern P is well designed if for every occurrence of a sub-pattern P′ = (P1 OPT P2) of P and for every variable ?X occurring in P, the following condition holds: if ?X occurs both in P2 and outside P′, then it also occurs in P1."
"...the assumption on predicates used for joining (outer joining) relations to be null-rejecting...[in SPARQL] those predicates are implicit in the variables that the graph patterns share and by the definition of compatible mappings they are never null-rejecting....queries are also enforced not to contain Cartesian products, situation that occurs often in SPARQL when joining graph patterns that do not share variables. Thus, specific techniques must be developed in the SPARQL context."
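The compatible-mappings semantics quoted above can be sketched in a few lines (illustrative names, with lists of maps standing in for solution sequences): AND joins compatible mappings, while OPT additionally keeps unmatched left mappings, which is where the never-null-rejecting behaviour comes from:

```java
import java.util.*;

// Sketch of the paper's algebra over solution mappings: two mappings
// are compatible when they agree on shared variables; AND merges
// compatible pairs, OPT also keeps unmatched left mappings.
// Illustrative names, not a real SPARQL engine.
public class SparqlAlgebra {
    static boolean compatible(Map<String, String> m1, Map<String, String> m2) {
        for (String var : m1.keySet()) {
            if (m2.containsKey(var) && !m1.get(var).equals(m2.get(var))) return false;
        }
        return true;
    }

    static Map<String, String> merge(Map<String, String> m1, Map<String, String> m2) {
        Map<String, String> out = new HashMap<>(m1);
        out.putAll(m2);
        return out;
    }

    static List<Map<String, String>> and(List<Map<String, String>> p1,
                                         List<Map<String, String>> p2) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> m1 : p1)
            for (Map<String, String> m2 : p2)
                if (compatible(m1, m2)) out.add(merge(m1, m2));
        return out;
    }

    static List<Map<String, String>> opt(List<Map<String, String>> p1,
                                         List<Map<String, String>> p2) {
        List<Map<String, String>> out = and(p1, p2);
        for (Map<String, String> m1 : p1) {
            boolean matched = false;
            for (Map<String, String> m2 : p2)
                if (compatible(m1, m2)) matched = true;
            if (!matched) out.add(m1); // keep the unmatched left mapping
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> p1 = List.of(Map.of("x", "a"), Map.of("x", "b"));
        List<Map<String, String>> p2 = List.of(Map.of("x", "a", "y", "1"));
        System.out.println(and(p1, p2)); // only the ?x=a mapping joins
        System.out.println(opt(p1, p2)); // ?x=b also survives, with ?y unbound
    }
}
```

Note there is no null to reject: an unbound variable is simply absent from the map, which is exactly the mismatch with relational outer-join theory the quote describes.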
While Galindo-Legaria and Rosenthal do limit their reordering work to queries without Cartesian products and with null-intolerant predicates, others, such as "Using EELs, a Practical Approach to Outerjoin and Antijoin Reordering", have extended their work to handle special cases of these.
From my first look, the whole thing does point to having to re-develop a lot of existing work. Reusing things like depth-first query evaluation seems pretty fundamental. I'll certainly be looking at this more.