Smalltalk:::OWL-Project "OWL has emerged from the AI/semantic community and tends to be in the open-source community which appears to be a direction for Smalltalk (e.g. Smalltalk Solutions at Linux World) Much of the work to date has been implemented in Python and Ruby which, from a language perspective, is very close to Smalltalk. However, those languages become less appealing if you have ever worked in the IDE's supporting those languages. OWL can provide the Smalltalk community with a "market" that is a good fit for the features of the ST language and supporting IDE's."
"Agilense provides a product named EA WebModeler...is an implementation of the Adaptive Object Model pattern..."
A good summary of AOM: "We call these systems "Adaptive Object-Models", because the users' object model is interpreted at runtime and can be changed with immediate (but controlled) effects on the system interpreting it. The real power in Adaptive Object-Models is that the definition of a domain model and rules for its integrity can be configured by domain experts external to the execution of the program."...If you got just one reason why EJB is flawed, this has got to be the one, you can't build systems for Enterprises at the same time take away control form the business stakeholders."
Saturday, December 31, 2005
RETE Rebuked
This comes from a blog I started reading recently after it criticized SPARQL. This time it's criticizing an entry Paul made about the scalability of the RETE algorithm: "I am a bit confused by the statement that RETE does not scale. This is contrary to a mountain of papers by researchers and developers around the world. From the paragraph, a couple of things come to mind. "(loading indexes) does not need to be done often" tells me the author doesn't understand the purpose and goal of RETE. RETE was designed to solve machine learning problems where data changes rapidly and reasoning is a continuous process. What the author wants is something closer to BitMap indexes used in OLAP products."
A lot of the points raised, like RETE being for changing data, are mentioned in a previous post under Meeting. Drools was chosen as a starting point to see what kind of system needed to be developed in Kowari.
Also mentioned, bitmap indexing: "Given Tucana is indexing everything, they might as well adapt Bitmap indexing and get better than linear performance. The problem described by the blog is a well understood problem in the OLAP world."
A previous entry, "Relational theory, RETE and Derby" points to some interesting articles about bitmap indexes (available in Oracle 9) and high scalability requirements: "In a large financial institution like a mutual fund company, they may have 1-20 million customers. If each customer has an average of 20-30 positions (aka specific holding of an equity) that means the potential dataset for firm wide compliance rule could involve 20million+ rows. Doing this within 2-5 seconds is rather hard, so it requires using lots of different techniques. In the extreme cases, a company might have 20 million accounts, which means the potential dataset is 600 million rows."
From the OTN article: "B-tree indexes are usually used when columns are unique or near unique; bitmap indexes should be used, or at least considered, in all other cases. While you would not generally use a b-tree index when retrieving 40 percent of the rows of a table, a bitmap index is often still faster than doing a full table scan. This is seemingly in violation of the 80/20 rule, which is to generally use an index when retrieving 20 percent or less of the rows and do a full table scan when retrieving more. Bitmap indexes are smaller and work differently from the 80/20 rule. You can effectively use bitmap indexes even when retrieving large percentages (20 to 80 percent) of a table. Bitmaps can also be used to retrieve conditions based on nulls (since nulls are also indexed) and for "not equal" conditions."
It would appear that bitmap indexes would suit indexing predicates, which have relatively few distinct values, but not triples in general, as both subjects and objects are near unique.
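To make the OTN description a bit more concrete, here is a minimal sketch of a bitmap index over a predicate column. It is only an illustration of the idea, not how Oracle or Kowari implement anything: the `BitmapIndex` class, the `rows` helper and the sample column are all made up for this example, and Python integers stand in for the compressed bitmaps a real database would use.

```python
from collections import defaultdict


class BitmapIndex:
    """One bitmap per distinct value; bit i is set when row i holds that value."""

    def __init__(self, column):
        self.size = len(column)
        self.bitmaps = defaultdict(int)  # value -> Python int used as a bit set
        for row, value in enumerate(column):
            self.bitmaps[value] |= 1 << row  # None (null) gets its own bitmap

    def equals(self, value):
        return self.bitmaps.get(value, 0)

    def is_null(self):
        return self.equals(None)

    def not_equals(self, value):
        # Complement within the table, then drop the null rows.
        all_rows = (1 << self.size) - 1
        return all_rows & ~self.equals(value) & ~self.is_null()


def rows(bitmap):
    """Expand a bitmap into the list of row numbers it selects."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical predicate column of a tiny triple table (few distinct values).
predicates = ["foaf:name", "foaf:mbox", "foaf:name", None, "foaf:knows", "foaf:name"]
index = BitmapIndex(predicates)

name_rows = rows(index.equals("foaf:name"))
other_rows = rows(index.not_equals("foaf:name"))
null_rows = rows(index.is_null())
# Combining conditions is a bitwise OR/AND over the bitmaps.
mbox_or_knows = rows(index.equals("foaf:mbox") | index.equals("foaf:knows"))
print(name_rows, other_rows, null_rows, mbox_or_knows)
```

With only a handful of distinct predicates there are only a handful of bitmaps, which is why this looks attractive for the predicate column. A subject or object column would need nearly one bitmap per row.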
Thursday, December 29, 2005
Good API Design
Java API Design Guidelines "If your API is worth anything, it will evolve over time...decide what sort of compatibility you will guarantee between revisions...What should the design goals of your API be?...absolutely correct...easy to use...easy to learn...fast enough...small enough...it's much easier to put things in than to take them out."
This seems all well and good.
"Interfaces can be implemented by anybody. Suppose String were an interface. Then you could never be sure that a String you got from somewhere obeyed the semantics you expect: it is immutable; its hashCode() is computed in a certain way; its length is never negative; and so on."
Adding interfaces on top of String like CharSequence was a good thing. It meant that you could process a String or a StringBuffer the same way, and consistently treat text whether it is in memory or on disk (including being memory mapped via NIO). The way to ensure that String implementations follow the correct semantics is to test drive the interfaces, and this requires being able to create a mock object of that interface, which pretty much seals the deal as far as interfaces are concerned.
Actually, the whole section on interfaces is pretty much a wash as this can be solved by test driving 3 of the 4 points raised. As far as using an abstract class to help with the evolution of the API, you can have both - an interface and a default abstract class for some base implementation.
Exceptions get a going over too, "Use a checked exception "if the exceptional condition cannot be prevented by proper use of the API and the programmer using the API can take some useful action once confronted with the exception." In practice this usually means that a checked exception reflects a problem in interaction with the outside world, such as the network, filesystem, or windowing system."
Related: Evolution Not Creation, Reminder About Incremental and Test Driven Development and 10 Minute Commits for Better Code.
5 Minutes with Monad
I've recently spent a bit of time trying to solve some problems using Microsoft's new Monad shell. It's interesting that the creature for the O'Reilly book is the common toad. It was a different experience, although the pain is eased a little as the default installation comes with all the new commands (cmdlets) mapped to Unix ones (like ls and ps).
The default security setting prevents remotely signed objects from being executed and there seems to be no way to turn it on. The documentation is missing. To make it usable it's:
set-property HKLM:\SOFTWARE\Microsoft\Msh\Microsoft.Management.Automation.msh -property ExecutionPolicy -value RemoteSigned
Going through the tutorials it did show itself to be kind of cool. For example, being able to select the top 10 processes based on VirtualMemorySize:
get-process | sort-object VirtualMemorySize | select-object -last 10
You can whack on a "convert-HTML" or an "export-csv" to produce the result in a format you want or connect to Excel or SQL Server to retrieve data. A lot has been made of its native XML support and how it passes around strongly typed objects rather than just Unix's streams.
One of the problems was trying to do line by line processing. There were promises of pipelining via XML streams but according to "Replace lines in a text file?" (the first hit on Google) Monad doesn't support it. The lack of streaming appears to be a crucial omission in a toolset designed for system administrators - although it might not be fatal as log files and the like don't usually come close to the available memory of modern systems.
It does support accessing the .NET APIs which provides a loophole. For example, to read a file line by line and replace "xxx" with "yyy":
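$f = [System.IO.File]::OpenText("c:\file.txt")
# Compare against $null so that a blank line does not end the loop early
while (($line = $f.ReadLine()) -ne $null)
{
    $line -replace "xxx","yyy"
}
# Release the file handle once every line has been read
$f.Close()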
It was all for nothing, as I later found out that it didn't support Windows 2000 and it needed to be deployed on that - it is supported by Windows XP, 2003 and Vista. Back to Windows Script Host (maybe using Ruby) I guess.
Sunday, December 25, 2005
Merry Bag Of Links
Political:
- Support Creative Commons "We are down to the last $100,000, and really need your support — both for the very cool projects we’re launching (see, e.g., the license interoperability project, discussed recently in Technology Review, and the two new projects announced this week), and for the very uncool pressure we’re under from IRS regulations to demonstrate “public support” as a condition for keeping our (absolutely essential as in we can’t live with out it) tax exempt status." Via We've got 10 days, and we need $100,000. Please help
- Passion of the Spaghetti Monster and Intelligent Design
- Top 12 media myths and falsehoods on the Bush administration's spying scandal "...the Bush administration and its conservative allies in the media have defended the secret spying operation with false and misleading claims that have subsequently been reported without challenge across the media."
- The Curious Section 126 of the Patriot Act "Congress is seeking assurances that "the privacy and due process rights of individuals" is protected in the course of the government using massive databases of non-publicly available data; both proprietary databases and its own compiled intelligence and law enforcement databases to "search" for terrorists and terrorist connections."
Agile:
- Client vs. Developer Wars "This, to me, is another indictment of dysfunctional specifications. I learned long ago that clients won't listen to what you say, and they certainly won't read what you write. You're much better off putting that wasted effort into a working model and setting it in front of the client. Let them play with it for a while. Refine the working model based on that feedback, then keep turning the crank on this cycle until you run out of resources."
- Continuous Testing - in spirals "I want tests to run ‘inside out’ Imagine a spiral, with the unit test for the bit of code you’re currently editing to be the focal point. Ideally, I’d want a test to run first for the method I changed last, then for the whole class, then for the suite the class is in, then further out to other dependencies. Tests run outward only when green bars are encountered. If there is a red bar somewhere, the spiraling stops, so we can examine the failure, fix it, and see again from the inside which tests run."
- Essential Advice for Agile Coaches
Programming:
- Ruby Off the Rails "Ruby's syntax is quite different from that of the Java language, but it's amazingly easy to pick up. Moreover, some things are just plain easier to do in Ruby than they are in the Java language."
- Seven Habits of Highly Effective Programmers "Sorry, there's no shortcut - you have to learn and practice and make some mistakes."
- Networking Libraries using NIO: EJOE, Coconut AIO and MINA. MINA's features include: "unit testability using mock objects".
- ONJava: 2005 Year in Review "Java is still by far the most widely used programming language, if book sales are any indication, about 2x C#, 2.5x PHP, 4x Perl, and 9x Ruby/Python."
- Automate acceptance tests with Selenium.
General Technical:
- What is Songbird? "Songbird is built atop the Mozilla Foundation's XULRunner platform also used by the Firefox browser, the Thunderbird email client and other desktop applications." The user preview has been delayed
- Mac Mini - Big Ideas
- Lack of focus and death march at Google
- Mac IE's Death: A Case for Microsoft Disbanding or Transfering the Windows IE Team ""Then why on earth did we pursue IE in the first place? Just so that the DOJ would sue us?""
- The Bubble Cycle is Replacing the Business Cycle Housing, bond, etc bubbles.
Saturday, December 24, 2005
Know When to Hold Them, Know When to Fold Them
So what is an RDF merge and when should you apply it in SPARQL?
Very succinctly, in "The Semantics of SPARQL" it says: "The RDF merge <G1...Gn> of a sequence of graphs <G1...Gn> (i.e., a dataset) is the ordered merge union of the graphs, where repeated bnodes are substituted with fresh ones, by keeping the names of the bnodes coming first in the sequence order."
In "SPARQL Query Language for RDF" it gives a simple example:
Graph 1:
_:a foaf:name "Bob" .
_:a foaf:mbox .
Graph 2:
_:a foaf:name "Alice" .
_:a foaf:mbox .
The result of the merge, upon which queries are made:
_:x foaf:name "Bob" .
_:x foaf:mbox .
_:y foaf:name "Alice" .
_:y foaf:mbox .
Section 9 details querying multiple graphs in SPARQL, including a new dataset where the default graph is a merge of the graphs in the FROM clause.
In summary, when SPARQL operations are performed across graphs you get new blank nodes which prevents, for example, being able to JOIN across graphs using them.
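To see why the blank nodes stop lining up, here is a rough sketch of a merge in Python. It is a simplification and not the spec's exact procedure: it gives every graph's blank nodes fresh labels rather than keeping the names from the first graph, triples are plain tuples of strings, and the mailbox addresses are placeholders I made up because the example above doesn't show them.

```python
from itertools import count


def is_bnode(term):
    return term.startswith("_:")


def rename_bnodes(graph, fresh):
    """Copy a graph, replacing each of its blank node labels with a fresh one."""
    renamed = {}  # old label -> fresh label, scoped to this one graph

    def rename(term):
        if not is_bnode(term):
            return term
        if term not in renamed:
            renamed[term] = f"_:b{next(fresh)}"
        return renamed[term]

    return [(rename(s), p, rename(o)) for s, p, o in graph]


def rdf_merge(graphs):
    """Union the graphs after standardising their blank nodes apart."""
    fresh = count()
    merged = []
    for graph in graphs:
        merged.extend(rename_bnodes(graph, fresh))
    return merged


# The mailto: addresses are hypothetical placeholders.
graph1 = [("_:a", "foaf:name", '"Bob"'),
          ("_:a", "foaf:mbox", "<mailto:bob@example.org>")]
graph2 = [("_:a", "foaf:name", '"Alice"'),
          ("_:a", "foaf:mbox", "<mailto:alice@example.org>")]

merged = rdf_merge([graph1, graph2])
for triple in merged:
    print(*triple, ".")
```

Both graphs used the label _:a, but after the merge Bob and Alice hang off different nodes, so a label can no longer be used to join the two graphs.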
What is generally required by RDF applications is something like smushing. For example, an "...RDF spider (often known as a "scutter") can gather up FOAF files and "smush" them together into a single model that unifies the individual pieces of information into a network." (from "A Semantic Web Shoebox - Annotating Photos with RSS and RDF").
To actually achieve smushing, Leo has an example algorithm or it might be appropriate to adapt RDF graph isomorphism algorithms.
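As a rough idea of what smushing involves, here is a small Python sketch that merges nodes sharing a value for an inverse functional property such as foaf:mbox. This is not Leo's algorithm and it is not a graph isomorphism approach. It is a single naive pass that assumes each node has at most one mailbox and does not follow chains of merges. The `smush` function and the sample data (including the mailbox address) are invented for illustration.

```python
def smush(triples, ifp="foaf:mbox"):
    """Merge subjects that share a value for an inverse functional property."""
    canonical = {}  # IFP value -> first subject seen with that value
    alias = {}      # subject -> the subject it should be replaced by
    for s, p, o in triples:
        if p == ifp:
            alias[s] = canonical.setdefault(o, s)
    # Rewrite every triple using the aliases; the set drops duplicates.
    return sorted({(alias.get(s, s), p, alias.get(o, o)) for s, p, o in triples})


# Two blank nodes from different files describe the same mailbox (hypothetical data).
triples = [
    ("_:b0", "foaf:name", '"Bob"'),
    ("_:b0", "foaf:mbox", "<mailto:bob@example.org>"),
    ("_:b1", "foaf:mbox", "<mailto:bob@example.org>"),
    ("_:b1", "foaf:knows", "_:b2"),
    ("_:b2", "foaf:name", '"Alice"'),
]

smushed = smush(triples)
for triple in smushed:
    print(*triple, ".")
```

After the pass, _:b1 has been folded into _:b0, so the name and the foaf:knows link end up on the same node.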
Friday, December 23, 2005
More on Disjunction
I think this is just going to happen time and time again, "RDF non-sense": "The W3C Working Group members argue SPARQL is a mixed mode language that does support OR, though they are calling it an "optional union". Frankly, I see no point in renaming something people understand to mean one thing. It gives me the impression the W3C want to be thought police and enforce a certain way of thinking.
On the practical side, many analysts happen to like OR disjunctions and would complain loudly. Of course, there are plenty of cases where users abuse the power and write deeply nested disjunctions. That is not a valid reason in my mind to avoid disjunction. It saves the user time and allows them to write simpler rules using disjunction. The W3C seems love RDF and wants the world to love it. Unfortunately, the current specification is a complete piece of junk. I hope RDF dies a quick and public death."
I declare...backward chaining suits me fine! "My two favorite declarative tools right now are Pellet and Prova, both of which are open source java SemWeb tools that are highly compatible with Jena, which recently got a bump to 2.3 with fairly complete SPARQL support. Pellet is an implementation of OWL-DL and some related description logic facilities by the Mindswap guys in Maryland, who have absorbed some of the l33t Kowari/Tucana guys, too (Tucana was recently picked up by Northrop, BTW)."
"Prova is a prolog-variant built on top of Mandarax. It is a very effective and fun medium for scripting of high-level relationships and operations. The integration of prolog unification, java types, java methods, and java exceptions is done very nicely, and yields fine code economy. There are some rough edges in the docs, but we are helping to get these worked out in the pretty soon."
I guess that means David W is l33t! :-)
I Hope Not
Ruby is to Perl what C++ was to C. He qualified this by saying, "Ruby improves and simplifies the Perl language" which isn't really what I saw C++ doing at all.
Quote: "These are the folks that assert that Java's verbosity is "just finger typing that Eclipse/IntelliJ will do for me," and it doesn't matter if the resulting code has 20 times the visual bulk of a simpler approach. One of the basic tenets of the Python language has been that code should be simple and clear to express and to read, and Ruby has followed this idea, although not as far as Python has because of the inherited Perlisms. But for someone who has invested Herculean effort to use EJBs just to baby-sit a database, Rails must seem like the essence of simplicity. The understandable reaction for such a person is that everything they did in Java was a waste of time, and that Ruby is the one true path."
Related: Rocking With Ruby and One way Java is better than Ruby
Wednesday, December 21, 2005
A Better PageRank
This paper gives an example of some of the flaws in Google's PageRank algorithm, and its authors suggest they have an algorithm that fixes them. Something is Wrong with Google’s Mathematical Model "In their original paper : ”The PageRank Citation Ranking: Bringing Order to the Web” [1], Page et al. suggest a new ranking algorithm named - PageRank. It is shown there that implementing the new algorithm boils down to solving a huge eigenvalues problem Ax = x (1) where A is a matrix which represents a graph related to the web. It is claimed that in order for the model to work properly, the graph should be strongly connected. In general, the graph is not strongly connected and we have ’sink’ set of pages."
"We have developed a new algorithm which can be considered as a modification of the original PageRank algorithm. The modified algorithm is stable and gives a correct ranking vector. The mathematical complexity of the suggested algorithm is the same as the complexity of the original one."
In Google's Librarian Center they've published a similar explanation of PageRank called "How does Google collect and rank results?".
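For anyone who wants to poke at the sink problem, here is a small power iteration sketch in Python. To be clear, this is not the modified algorithm from the paper, and I am not claiming it is what Google runs. It is the common textbook workaround: a damping factor (0.85 here, an assumed value) plus spreading the rank held by sink pages evenly over all pages. The `pagerank` function and the three page web are made up for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank sitting on sink pages (no out-links) is shared among all pages,
        # otherwise it would leak out of the system on every iteration.
        sink_mass = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * sink_mass / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            for q in targets:
                new_rank[q] += damping * rank[p] / len(targets)
        rank = new_rank
    return rank


# Hypothetical web: "c" is a sink because it links to nothing.
web = {"a": ["b", "c"], "b": ["c"], "c": []}
ranks = pagerank(web)
for page, score in sorted(ranks.items()):
    print(page, round(score, 4))
```

The ranks always sum to one because whatever the sink would have swallowed is handed back out, which is the property the unmodified iteration loses when the graph is not strongly connected.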
LiveConnect, Lives as LAJAX
I ignored this when I first heard about it, as noted here: "'I was disappointed when [Bray] said that he was going to make a product announcement, and I was unenthusiastic when the announcement turned out to be about a Sun version of Derby,' Leung wrote."
Derby Demo hits a nerve "I think we hit a nerve with this demo. I think many of us within the Derby community recognized the potential for Derby within a web browser environment, but it's wonderful, great, fantastic to see how the community is "getting" it and running with it."
Derby ApacheCon demo and ApacheCon 2005: Ok, Tim, I'm not jaded anymore.
Monday, December 19, 2005
Minimum Union
The key operation to provide outer joins that are associative and commutative, as noted in Outer Joins Aren't Primitive, is called "minimum union". Minimum union (I've also seen "outer union") pads the tuples of two schemas with nulls and then unions them (without duplicate removal).
The original thesis "Algebraic Optimization of Outerjoin Queries" gives a variant of relational algebra that allows tuples defined by different sets of attributes (schemes) rather than padding with nulls. This actually removes the requirement for nulls. It also includes presenting the data in a nested relational form where, instead of one tuple per parent-child pair, each parent holds its children as a set. This is just like the way Kowari presents its results (Figure 3.2 in the paper).
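A quick sketch of the padding operation as I've described it, in Python, with relations as lists of dicts. The `outer_union` name, the sample relations and the use of None for null are all my own choices for illustration, and this follows my description above rather than the thesis's formal definition.

```python
def outer_union(r, s):
    """Pad every tuple to the combined schema with None, then concatenate."""
    schema = sorted({attr for t in r + s for attr in t})
    return [{attr: t.get(attr) for attr in schema} for t in r + s]


# Hypothetical relations with overlapping but different schemas.
suppliers = [{"sno": "S1", "city": "London"}]
parts = [{"pno": "P1", "city": "Paris"}]

padded = outer_union(suppliers, parts)
for t in padded:
    print(t)

# The variant without nulls simply keeps tuples over different schemes together.
unpadded = suppliers + parts
```

The last line is the point of the thesis's variant: if tuples are allowed to carry different sets of attributes, the padding step (and the nulls) can be skipped altogether.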
Sunday, December 18, 2005
A Link in Time
- Getting more than what you pay for with open source databases. The tale of using a free database, using a commercial one and then putting the free one back. A quote: "There are a lot of companies out there paying dearly for commercial databases (and operating systems for that matter). As far as I'm concerned they might as well be flushing that money down the toilet. Actually, they might be better off. We certainly would have been."
- For the metrically minded, "Two Motivational Metrics for Agile Teams". That is, so long as you can convince people that "Time to Obstacle Removal" and "Obstacles Removed per Iteration" are good metrics.
- The, "I'm sick of the comparison between building and software" rant.
- Martin Fowler has recently updated his "The New Methodology" with half a dozen or so flavours of agile, which one are you?
- Don't Click It for those people with zero button mice. The first example of a gesture to replace a click just seems ridiculously hard.
- Examples of using Chickenfoot for Firefox: Rewrite the Web. The table sort and how to get a book from the library are simple and timely examples.
- Another REST framework - Restlet.
- Getting stuff done - the quick version
- How not to get a PhD
Event Horizon
- Semantic Web, Here We Come "The “Structured Blogging Initiative” is an attempt to jump-start the “semantic web,” the idea of giving deeper meaning to the Internet advocated by World Wide Web creator Tim Berners-Lee. By incorporating descriptive information into the code of web pages, laypeople will be able to designate their content as a movie review, an event posting, or an item available for sale." StructureBlogging initiative and other entries: "Structured blogging initiative taking off", "More StructuredBlogging feedback" and Structured Blogging is a thing you do -- not a format.
- Bill de hÓra discusses RDF and database schemas: "Using RDF storage provides flexibility at the domain level. Altering tables isn't needed because RDF, being a graph based, is naturally additive...My (somewhat anecdotal) experience with RDF is that datasets in the order of 10^6 and greater aren't uncommon and that you should budget for an order of magnitude increase in terms of the number of rows required for the domain storage compared to an entity relational approach...It's an interesting question whether using RDBMSes to store RDF counts as some form of abuse, or bad engineering."
Saturday, December 17, 2005
Ion Inside
First Mass Producible Quantum Computer Chip "Using the same semiconductor fabrication technology that is used in everyday computer chips, researchers were able to trap a single atom within an integrated semiconductor chip and control it using electrical signals, said Christopher Monroe, U-M physics professor and the principal investigator and co-author of the paper, "Ion Trap in a Semiconductor Chip." The paper appeared in the Dec. 11 issue of Nature Physics."
Thursday, December 15, 2005
Marks
The Man Who Wasn't There: problems of missing or partially missing data in geoscience databases "In the literature discussions between Codd and Date on the propriety or otherwise of NULLs in relational databases, there seems to have been some confusion on both sides, on one very important question. That is the distinction between database representation and function evaluation. NULLs are one approach to the problem of handling missing data within the database."
So in this respect RDF is great - you don't have to come up with a value or values to represent missing data. You only have to worry about function evaluation.
"In fact the Codd 'mark' solution does not in itself require, as unfortunately implied by Codd himself, and vigorously attacked by Date, the use of 3- or 4-valued logic, and therefore cannot be dismissed so easily. Relational database theory is based on first-order predicate logic, which uses two truth values TRUE and FALSE. If there is no value for a data item, then the logical statement corresponding to the tuple containing that item can simply omit any mention of that particular column. If the value of this data item is required in an operation, then there is only one truth value which can be returned: FALSE. This applies to database set operations such as JOINs and also to numerical operations such as totals and averages where the absence of any required data value prevents the computation from being carried out. If a total or average is required in such a situation, then the problem can be circumvented only by first selecting non-absent data. This is the correct treatment, to ensure that statistics are computed on a valid data set."
Another example of marks, tuple marks.
SH writes in about incomplete data in observational science databases, the open world assumption, 3VL and NULL, McGoveran responds saying: "In a scientific database such as the type to which you allude, a reasonable interpretation of True and False under CWA is "valid by experiment and consistent with hypotheses" and "not validated by experiment or inconsistent with hypotheses". If you give this differentiation up with CWA and nulls, you've given up scientific reasoning and the scientific method."
In Kowari, suppose you have the following triples: _b1, <urn:sno>, "S1"; _b2, <urn:sno>, "S2"; _b3, <urn:pno>, "P1".
And you perform the following query:
select $s1
...
where $s1 <urn:sno> $o1 or $s2 <urn:pno> $o2;
It returns: _b1, _b2, null (really unconstrained).
However, if you select $s2 instead it returns: null, _b3. Using the "mark" idea quoted above, it would return _b1, _b2 for the first query and _b3 for the second.
I'm not sure I really like this solution; preserving unknown seems to make more sense, as demonstrated in How FirstSQL Solves the EXISTS and Other Problems.
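The two behaviours can be compared with a small in-memory sketch. This is not Kowari's API or query engine: the `match`, `disjunction` and `select` helpers are hypothetical names, unbound variables are modelled as `None`, and the projection is assumed to remove duplicates.

```python
# Hypothetical in-memory model of the three triples; not Kowari's API.
TRIPLES = [
    ("_b1", "urn:sno", "S1"),
    ("_b2", "urn:sno", "S2"),
    ("_b3", "urn:pno", "P1"),
]


def match(predicate, s_var, o_var):
    """Bind the subject and object variables for every triple with this predicate."""
    return [{s_var: s, o_var: o} for s, p, o in TRIPLES if p == predicate]


def disjunction(left, right):
    """OR of two constraints: keep rows from both sides, leaving the
    variables a row never mentions unbound (None)."""
    variables = sorted({v for row in left + right for v in row})
    return [{v: row.get(v) for v in variables} for row in left + right]


def select(rows, var, drop_unbound=False):
    """Project one variable without duplicates. With drop_unbound=True,
    rows where the variable is missing are skipped (the 'mark' reading)."""
    seen = []
    for row in rows:
        value = row[var]
        if drop_unbound and value is None:
            continue
        if value not in seen:
            seen.append(value)
    return seen


rows = disjunction(match("urn:sno", "s1", "o1"), match("urn:pno", "s2", "o2"))

print(select(rows, "s1"))                     # unbound preserved
print(select(rows, "s2"))
print(select(rows, "s1", drop_unbound=True))  # unbound dropped
print(select(rows, "s2", drop_unbound=True))
```

The only difference between the two readings is the `drop_unbound` flag, which is roughly the choice being weighed here: filter missing values out before evaluating, or carry the unknown through to the result.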
Outer Joins aren't Primitive
Optional data in SPARQL seems to be equivalent to a left outer join in SQL. As it turns out, outer joins can be composed of disjunctions. This is similar to the original MAYBE function suggested as an addition to Kowari (although that suggestion is quite a bit simpler). The paper below outlines algorithms to do outer queries more efficiently. They require computing the anti-join of certain relations, an anti-join being the set difference between two tables (or MINUS operation). Here is a good explanation of semi-joins and anti-joins.
Outerjoins as Disjunctions "The outerjoin operator is currently available in the query language of several major DBMSs, and it is included in the proposed SQL2 standard draft. However, “associativity problems” of the operator have been pointed out since its introduction. In this paper we propose a shift in the intuition behind outerjoin: Instead of computing the join while also preserving its arguments, outerjoin delivers tuples that come either from the join or from the arguments. Queries with joins and outerjoins deliver tuples that come from one out of several joins, where a single relation is a trivial join. An advantage of this view is that, in contrast to preservation, disjunction is commutative and associative, which is a significant property for intuition, formalisms, and generation of execution plans. Based on a disjunctive normal form, we show that some data merging queries cannot be evaluated by means of binary outerjoins, and give alternative procedures to evaluate those queries. We also explore several evaluation strategies for outerjoin queries, including the use of semijoin programs to reduce base relations."
Also related, Outer Join in Edutella where each part of the outer query is done individually.
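To make the "join or arguments" reading concrete, here is a minimal sketch in which a left outer join is the union of the inner join and the anti-join padded with nulls. It is a naive illustration, not one of the paper's algorithms; the function names and the supplier/shipment tables are made up for the example.

```python
def inner_join(left, right, key):
    """Rows that match on `key`, merged into one row."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def anti_join(left, right, key):
    """Rows of `left` with no partner in `right` (the MINUS part)."""
    right_keys = {r[key] for r in right}
    return [l for l in left if l[key] not in right_keys]


def left_outer_join(left, right, key):
    """Tuples come either from the join or from the left argument alone."""
    right_columns = {c for r in right for c in r if c != key}
    padded = [
        {**l, **{c: None for c in right_columns}}
        for l in anti_join(left, right, key)
    ]
    return inner_join(left, right, key) + padded


# Hypothetical sample data.
suppliers = [{"sno": "S1", "name": "Smith"}, {"sno": "S2", "name": "Jones"}]
shipments = [{"sno": "S1", "pno": "P1"}]

for row in left_outer_join(suppliers, shipments, "sno"):
    print(row)
```

Because the result is a plain union of two independent pieces, each piece can be computed separately, which is the same shape as evaluating each part of the outer query individually.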
Tuesday, December 13, 2005
A Really Interactive Query Language
SQLBuilder "SQLBuilder uses clever overriding of operators to make Python expressions build SQL expressions -- so long as you start with a Magic Object that knows how to fake it."
An example:
>>> from SQLBuilder import *
>>> person = table.person
# person is now equivalent to the Person.q object from the SQLObject
# documentation
>>> person
person
>>> person.first_name
person.first_name
>>> person.first_name == 'John'
person.first_name = 'John'
Via, SQL API "I'd rather SQLObject be built on some ORM-neutral layer, where you can move down to that layer when SQLObject doesn't fit your problem; as opposed to now, where you kind of have to work around SQLObject."
This is almost exactly like something I was thinking about, to prevent semantically incorrect SQL queries. Add an AJAX interface on this and it would be cool and useful.
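For the curious, the operator-overriding trick can be sketched in a few lines. This is not SQLBuilder's actual implementation; `Expr`, `Table`, `TableSpace` and `quote` are hypothetical names chosen for the illustration, and only `==`, `>` and `&` are covered.

```python
def quote(value):
    """Render a Python value as a SQL fragment (illustrative, not complete)."""
    if isinstance(value, Expr):
        return value.sql
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


class Expr:
    """A piece of SQL text whose operators build bigger pieces of SQL text."""

    def __init__(self, sql):
        self.sql = sql

    def __repr__(self):
        return self.sql

    def __eq__(self, other):
        # Instead of comparing, return a new expression describing the comparison.
        return Expr(f"{self.sql} = {quote(other)}")

    def __gt__(self, other):
        return Expr(f"{self.sql} > {quote(other)}")

    def __and__(self, other):
        return Expr(f"({self.sql} AND {quote(other)})")


class Table(Expr):
    """The 'magic object': any attribute access becomes a column reference."""

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        return Expr(f"{self.sql}.{name}")


class TableSpace:
    """Any attribute access becomes a table, mimicking `table.person`."""

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        return Table(name)


table = TableSpace()
person = table.person
print(person.first_name == "John")
print((person.first_name == "John") & (person.age > 30))
```

Since every expression is built from known tables and columns rather than from strings, a layer like this is a natural place to reject semantically incorrect queries before they reach the database.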
DRY and Embedded Program Code
What if you could define a user interface once, surf it via a telephone or a browser, and share the data and state between the two across multiple users?
Beyond interactive voice response "English’s hope, he tells Inskeep in the interview, is that companies will admit how infuriating their systems often are...the practice of automatically collecting customers’ account numbers, and then making those customers repeat the numbers to an agent when they finally connect to one -- my top IVR gripe"
"Voice calls must be able to recruit data channels, and vice versa. That way, an agent could attach an IM session to your voice call and push you the URL in real-time chat. It might even be appropriate to extend the data session with screen sharing, so the agent can watch and assist. If things still don’t work out and the whole matter must be referred to someone else, you’d like to be able to initiate voice or data communication -- or both -- in a context-preserving way."
Via, Rethinking customer service. This points to XBL2 and an effort to provide "...a declarative format for applications and user interfaces...based on an existing application/UI format, such as Mozilla's XUL, Microsoft's XAML, Macromedia's MXML or Laszlo Systems' LZX..."
Model Driven, Semantic Web Enabled, Science Commons
Semantic Web eyed for life sciences data "The Semantic Web involves a concept in which data from multiple sources and ontologies can be integrated into a single information space. Experiment design automation (XDA) software vendor Teranode, which focuses on software for life sciences, plans to collaborate with Science Commons to build a neurology repository for the Semantic Web."
NeuroCommons is part of the ScienceCommons project. It is going to provide a database and annotations of scientific data in (presumably) RDF.
Teranode explains, using their XDA product, why model driven and why the Semantic Web.
A related post, via Etymon.com Federated Databases in Science "The astronomy, chemistry, and geospatial communities were active well over a decade ago in collaborating with information scientists on federated databases through various open standards. Molecular biology is a field that currently has considerable needs in this area, stimulated by the Human Genome Project. Developing common standards through consensus is of course not a technological solution. The Web is successful because it exploits the relationships among a huge number of people making individual judgements that only people can make. Even the Semantic Web, if it ever has a chance of working, would have to depend on a very large base of common metadata standards, and that can only result from the slow process of people coming together and agreeing. There are many things that information technology cannot do on its own. The semantic integration of knowledge still remains a human activity."
Friday, December 09, 2005
Graphical Batch Files
AutoMate. This is a pretty interesting application that provides functionality similar to OS X's Automator. While you can create customized tasks in VBA, it has lots of interesting built-in functionality for manipulating Excel, FTP, terminal emulation, the keyboard and windows.
Related, but much simpler, is a piece of software called AutoIt, which was "...initially designed for PC "roll out" situations to reliably configure thousands of PCs, but with the arrival of v3 it has become a powerful language able to cope with most scripting needs." Basically, it allows simple automation of window, mouse and keyboard events.
Thursday, December 08, 2005
One Way Java is Better than Ruby
The unbridled humanity of APIs "But I think the Java guy has a point: 78 methods on your list objects isn't good. Less methods is good. Unless the result is stupid. Now, let's be honest here, Java is stupid. Dumb, idiotic, maybe written by people who aren't programmers; I just don't know how else to make sense of it. list.get(list.size() - 1) should be embarrassing. list.last or list[-1]? I think [-1] reads well enough, and fits into a very elegant set of functionality involving slices and whatnot. But I also think list.last is entirely justifiable. OTOH, list.get(0) isn't embarrassing, so list.first isn't as compelling."
"Maybe an interesting parallel is 0 vs. 1 indexing. 1 clearly seems more humane. I personally count starting from 1. I'm naturally inclined to index from 1. Languages go both ways on the choice...Of course Smalltalk indexes from 1, so no one gets everything right."
Humane Interfaces "Part of the reason this argument could go on forever is that Ruby’s Array is both an example of arguments for Humane design, and arguments against it...java.util.List isn’t really a shining example of good interface design either...Having two otherwise equivalent ways to perform the same operation is bad user-interface design, and it’s bad library interface design, because the existence of the synonyms actually adds to your cognitive load by making you choose between them."
Also, Why Ruby Shouldn’t Be Your Next Programming Language (Maybe).
Time you enjoy wasting, was not wasted
- "TestDriven.NET makes it easy to run unit tests with a single click, anywhere in your Visual Studio solutions. It supports all versions of Microsoft Visual Studio .NET meaning you don't have to worry about compatibility issues and fully integrates with all major unit testing frameworks including NUnit, MbUnit, & MS Team System." Discussion on using TestDriven.NET with Express. Via James.
- Obie Fernandez from ThoughtWorks on Ruby and the Semantic Web "Java Doesn’t Work for Ontologies...Polymorphism in RDF and OWL is very different than in Java which makes for admittedly clumsy API...Why Deep Integration Enhances Programming with Ruby...Gives Ruby programmer powerful inferencing capabilities against the underlying RDF database". Related to Scripting the Semantic Web and Northrop Buys Tucana and Continues Kowari.
- OO in One Sentence: Keep It DRY, Shy, and Tell The Other Guy Available here. Covers how OO methods are not function calls but message passing.
- Beyond Story Cards: Agile Requirements Collaboration "Using this idea of "latest responsible moment," we postpone creating our detailed requirements specification. In fact, we can do the specification for a story at the same time as we actually implement the story. (I'll be talking about that more in a little bit.) We don't need to know every detail about a feature ahead of time. Going into the level of detail is expensive, and if (or when) our plans change, that's wasted effort."
Wednesday, December 07, 2005
Making AJAX cool
Why Ajax Sucks (Most of the Time) "For new or inexperienced Web designers, I stand by my original recommendation. Ajax: Just Say No."
"Ajax breaks the unified model of the Web and introduce a new way of looking at data that has not been well integrated into the other aspects of the Web. With ajax, the user's view of information on the screen is now determined by a sequence of navigation actions rather than a single navigation action.
Navigation does not work with ajax since the unit of navigation is different from the unit of view. If users create a bookmark in their browser they may not get the same view back when they follow the bookmark at a later date since the bookmark doesn't include a representation of the state of the content on the page.
Even worse, URLs stop working: the addressing information shown at the top of the browser no longer constitutes a complete specification of the information shown in the window."
Tuesday, December 06, 2005
A little light I/O
Comparing Two High-Performance I/O Design Patterns "It is clear from the charts that C++ is still the preferable approach for high performance communication solutions, but Java on Linux comes quite close. However, the overall Java performance was weakened by poor results on Windows. One reason for that may be that the Java 1.4 nio package is based on select()-style API. - It is true, Java NIO package is kind of Reactor pattern based on select()-style API (see [7, 8]). Java NIO allows to write your own select()-style provider (equivalent of TProactor waiting strategies). Looking at Java NIO implementation for Windows (to do this enough to examine import symbols in jdk1.5.0\jre\bin\nio.dll), we can make a conclusion that Java NIO 1.4.2 and 1.5.0 for Windows is based on WSAEventSelect() API. That is better than select(), but slower than IOCompletionPort's for significant number of connections. Should the 1.5 version of Java's nio be based on IOCompletionPorts, then that should improve performance. If Java NIO would use IOCompletionPorts, than conversion of Proactor pattern to Reactor pattern should be made inside nio.dll. Although such conversion is more complicated than Reactor->Proactor conversion, but it can be implemented in frames of Java NIO interfaces. (this the topic of next arcticle, but we can provide algorithm). At this time, no TProactor performance tests were done on JDK 1.5."
Available in Java and C++ at Terabit.
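As a reminder of what a select()-style Reactor looks like, here is a minimal sketch using Python's standard `selectors` module. It is not Terabit's TProactor or Java NIO; the `run_reactor_once` and `on_readable` names are made up, and a local socket pair stands in for a network connection.

```python
import selectors
import socket

received = []


def on_readable(sock):
    # Reactor style: the loop only reports readiness and the handler
    # performs the read itself. In a Proactor, the read would already have
    # been completed (by the OS or a helper) before the handler is called.
    received.append(sock.recv(1024))


def run_reactor_once(selector, timeout=1.0):
    """Wait for ready sockets, then dispatch to the handler stored with each."""
    for key, _events in selector.select(timeout):
        key.data(key.fileobj)


selector = selectors.DefaultSelector()
reader, writer = socket.socketpair()
reader.setblocking(False)
selector.register(reader, selectors.EVENT_READ, on_readable)

writer.sendall(b"ping")
run_reactor_once(selector)
print(received)

selector.unregister(reader)
reader.close()
writer.close()
selector.close()
```

The handler is stored as the registration's `data`, so the dispatch loop stays generic; converting this to a Proactor shape would mean moving the `recv` call out of the handler and passing the completed buffer in instead.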
Sunday, December 04, 2005
Links to Share and Enjoy
Another list of links:
- XML 2005: Tipping Sacred Cows "Which brings us to one of our sacred cows: for decades we've had SQL for relational databases, and soon we'll have XQuery for general XML, and SPARQL for RDF...what if it was possible to construct a generalized query language, loosely coupled enough to work with any underlying data model? The mathematical basis for this was monoids. The presentation didn't actually define this fairly abstract term, only skipping from trivial examples like
or to a fully worked representation of a generalized query. Erik's dynamic presentation style is such that I was not able to copy down the full example before he had moved on to the next slide. Whatever the details, it's a valuable topic in that it gets listeners to question their assumptions and see in new ways."
- Dabble combines the best of group spreadsheets, custom databases, and intranet web applications into a new way to manage and share your information online. A lot of the same conversations as in Agile databases seem to be occurring, relating to functionality, data integrity, migration (string to first name, last name, for example), data types and the like. Includes a blog and demo (movie). Merging is coming in version 2.0 and it's RAM based. It just cries out for an RDF data model. Via, Dabble is Bloody Brilliant.
- Problems with the $100 laptop "The time will certainly come when the appropriate tool to promote economic development will be a laptop produced very inexpensively in large volume. Before that point it will be necessary to implement systems that provide infrastructure which the laptop will need, in addition to producing tangible economic benefits for their users. OLPC is to be commended for raising issues and focusing attention, and for posing some technological challenges in a highly visible way...large sums of money are to be committed to the project in advance to fund manufacturing in deals where the customers are government ministries and not the end users." Also, $100 laptop.
- Two interesting articles: Breaking The Quality–Speed Compromise "The most important thing we can do to break the compromises we impose on customers is to move testing forward and put it in-line with (or prior to) coding. Build suites of automated unit and acceptance tests, integrate code frequently, run the tests as often as possible. In other words, find and fix the defects before they even count as defects." and Is Agile Software Development Sustainable? "So if agile practices are a “disruptive technology” compared to traditional software development processes, then it would be quite in character for them to start by addressing small systems."
- Exploratory Testing on Agile Projects Can Be a Good Fit "Why should agile teams do exploratory testing?: "Because an agile development project can accept new and unanticipated functionality so fast, it is impossible to reason out the consequences of every decision ahead of time. In other words, agile programs are more subject to unintended consequences of choices simply because choices happen so much faster. This is where exploratory testing saves the day. Because the program always runs, it is always ready to be explored.""
- Matthew De George on Cranky Middle Manager. Explaining how to apply market economies to management and more. Forgive Matthew for his hierarchical view, it's all about graphs of course. I'm a bit slow in finding this.
- A different way to vomit in another yearly milestone: Tiger Moth Joy Flights.
Saturday, December 03, 2005
Best things in development are free
One of the key differences between Java and .NET development is cost. To get the right Microsoft solution costs thousands of dollars. And what you get is something very different to an IntelliJ, NetBeans or Eclipse. You get vendor integration (or lock-in if you prefer) and competition against the community. It may appear attractive to some, but it seems odd to me to actively fight integration into existing, open solutions. It does seem, though, that the open side is winning (I wish Sun had done the same thing when choosing a logging API).
Unit testing and source control are the obvious examples. It's amazing to see that Microsoft's entry-level IDE does not come with them. There are great free solutions in NUnit and MbUnit. MbUnit is especially cool (released recently), allowing all sorts of built-in test fixtures. And there's always Ankh for Subversion integration. There are lots of alternatives.
A summary from a Microsoft developer: Hey, Shareholders! VS 2005 is *Fantastic* and our Developers Love Microsoft! "I might wander in early on Monday to meander through the crowds celebrating the big Visual Studio launch. But my heart is heavy that we shoveled what we could together and Won't Fix-ed this release out the door. Microsoft has just opened a very big door to competition in the IDE space. Or at least towards people jealously holding onto VS 2003 and saying, "CLR 2.0? Screw that! The last time I tried to use generics my machine locked up!" Big freakin' mistake. Microsoft should be ashamed."