More News: 05/01/2007

Tuesday, May 22, 2007

Oh the Waste

This new series looks at contemporary American culture through the austere lens of statistics. Each image portrays a specific quantity of something: fifteen million sheets of office paper (five minutes of paper use); 106,000 aluminum cans (thirty seconds of can consumption) and so on. My hope is that images representing these quantities might have a different effect than the raw numbers alone, such as we find daily in articles and books. Statistics can feel abstract and anesthetizing, making it difficult to connect with and make meaning of 3.6 million SUV sales in one year, for example, or 2.3 million Americans in prison, or 426,000 cell phones retired every day. This project visually examines these vast and bizarre measures of our society, in large intricately detailed prints assembled from thousands of smaller photographs.

It also includes, ".5 feet wide by 10.5 feet tall in three horizontal panels. Depicts 125,000 one-hundred dollar bills ($12.5 million), the amount our government spends every hour on the war in Iraq."

Via, Statistical visualization.

Sunday, May 20, 2007

MapReduce SPARQL

Compositional Evaluation of W3C SPARQL Algebra via Reduce/Map: "Tested against older DAWG testsuite. Implemented using functional programming idioms: fold (reduce) / unfold (map)

Does that suggest parallelizable execution?"

Thanks to SPARQL's adoption of a compositional semantics (an algebra), it's possible. From what I can remember, there is an implicit left-to-right evaluation of OPTIONAL, which limits how the top level can be executed, but the partial results are parallelizable. Executing OPTIONAL in an order-independent way while still getting the correct results is an open question (although feasible, I think).
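As a rough illustration of the fold idea (not the linked implementation), here is a minimal Python sketch. It assumes a heavily simplified algebra: required triple patterns are combined with a join, and OPTIONAL patterns are applied left to right with a left join. The function names and the sample data are hypothetical, and real SPARQL groups (nesting, FILTER, and so on) are ignored.

```python
from functools import reduce

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def match(pattern, triples):
    """Solution mappings (dicts) for one triple pattern."""
    solutions = []
    for triple in triples:
        binding = {}
        for p, t in zip(pattern, triple):
            if is_var(p):
                if binding.get(p, t) != t:  # same variable bound to two values
                    break
                binding[p] = t
            elif p != t:
                break
        else:
            solutions.append(binding)
    return solutions

def compatible(a, b):
    """Two mappings are compatible if they agree on shared variables."""
    return all(b[k] == v for k, v in a.items() if k in b)

def join(left, right):
    return [{**a, **b} for a in left for b in right if compatible(a, b)]

def left_join(left, right):
    """Keep every left solution, extended where the right side matches."""
    out = []
    for a in left:
        extended = [{**a, **b} for b in right if compatible(a, b)]
        out.extend(extended or [a])
    return out

def evaluate(required, optionals, triples):
    # Map step: each pattern is matched independently, so these partial
    # results are the part that could be computed in parallel.
    required_parts = [match(p, triples) for p in required]
    optional_parts = [match(p, triples) for p in optionals]
    # Reduce step: joins first, then OPTIONALs strictly left to right.
    base = reduce(join, required_parts, [{}])
    return reduce(left_join, optional_parts, base)

# Hypothetical sample data.
triples = [
    ("alice", "name", "Alice"),
    ("bob", "name", "Bob"),
    ("bob", "mbox", "bob@example.org"),
]
results = evaluate([("?p", "name", "?n")], [("?p", "mbox", "?m")], triples)
```

In this sketch the left-to-right constraint shows up only in the final `reduce(left_join, ...)`; the pattern matching that feeds it has no ordering requirement.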

Available from SVN here. More musings: Musings of a Semantic / Rich Web Architect: What's Next?

Friday, May 18, 2007

One More Pragmatic Language

Pragmatic Haskell

You see, I’ve been talking to the good folks over at Pragmatic Programmers about the possibility of doing a Haskell book. All of my writing effort has been going into that book, and as I didn’t know what the consequences would be of posting potential book content to the net, I elected to keep my mouth shut.

Well, they have agreed.

Via.

Where's the Evil?

I like bats much better than bureaucrats. I live in the Managerial Age, in a world of "Admin." The greatest evil is not now done in those sordid "dens of crime" that Dickens loved to paint. It is not done even in concentration camps and labour camps. In those we see its final result. But it is conceived and ordered (moved, seconded, carried, and minuted) in clean, carpeted, warmed, and well-lighted offices, by quiet men with white collars and cut fingernails and smooth-shaven cheeks who do not need to raise their voice. Hence, naturally enough, my symbol for Hell is something like the bureaucracy of a police state or the offices of a thoroughly nasty business concern.

From, Bullying of Academics in Higher Education. An interesting review, Notes on The Screwtape Letters.

Tuesday, May 15, 2007

The Health Benefits of the Semantic Web

A page about the HCLS Demo given in Banff at WWW2007. Some interesting demos, including one using OpenLink's Virtuoso to store and query about 350 million statements (with many more required). Part of the demo used Armed Bear Common Lisp, which is Lisp for the JVM (it compiles Lisp to bytecode).

Thursday, May 10, 2007

Defeasible Logic Plus Time

Temporal extensions to Defeasible Logic. Non-monotonic reasoning is about adding more information over time to reach different conclusions. Rather than adding information, temporal extensions restrict when that information applies.

Even the Banner Ads are More Interesting

I was reading Sun Tells Java Plans and got as far as the second paragraph before noticing an Apple ad (in Flash, with sound). I listened to the commercial and closed the tab; I only barely care what the article was about. So who does marketing better? (Actually, yesterday there was one with the PC guy banging his head against the banner, which was better.) Of course, I could just be responding in a Pavlovian (hmm, dessert) way to the background jingle.

Sunday, May 06, 2007

Romulus and Remus - C# and Java

Ted Neward talks about C# and Java. He also mentions the possible .NET backlash, Scala and LINQ. There's also a very impressive demo of LINQ (the code demo starts about a quarter of the way through) - the struggle from imperative to declarative.

Saturday, May 05, 2007

An Efficient Link Store

Another web-of-data store that produces a subset of RDF/XML, Astoria, is from an unlikely source: Microsoft. Instead of a proprietary Semantic Web, Danny sees it as going to town with URIs and REST.

Silverlight was the other surprising Microsoft development. Nothing beats running code - except maybe browser-based dynamic code running 2000 times faster. Applets are cool again.

Some ideas for static triple indexing: "Most mature triplestores also index a 4th query element ‘graph’ or ‘context’. I intend to support this query type without expanding the index by using a trick: In my triples format the fact that the subjects are auto-generated and local to the graph means I can choose them to be sequential and effectively re-use them as graph indexes..."

Plugged In/Invisible Worlds/Tucana/Northrop/TKS/TMex/Kowari/Mulgara podcast (links to the Talis page).

PAGE is a distributed triple store using a DHT and YARS (the original). It does seem to miss the DELIS work on P2P RDF, which scaled up to 64 nodes.

Haskell and the Faith of Programming Languages: Philip Wadler gives a rather brilliant talk on programming languages. It covers Haskell, Java generics, combining different typed languages (weak, strong, very strong) as well as monads and Links.

YARS Revenge

With little fanfare the folks at DERI have announced YARS2. I know of at least 4 next generation RDF stores (you know who you are) with a few others on the drawing board. Storing data is cool again.

To save disk space for the on-disk indices, we compress the individual blocks using Huffman coding. Depending on the data values and the sorting order of the index, we achieve a compression rate of ≈ 90%. Although compression has a marginal impact on performance, we deem that the benefits of saved disk space for large index files outweighs the slight performance dip.

Figure 4 shows the correspondence between block size and lookup time, and also shows the impact of Huffman coding on the lookup performance; block sizes are measured pre-compression. The average lookup time for a data file with 100k entries (random lookups for all subjects in the index) using a 64k block size is approximately 1.1 ms for the uncompressed and 1.4 ms for the compressed data file. For 90k random lookups over a 7 GB data file with 420 million synthetically generated triples (more on that dataset in Section 7), we achieve an average seek time of 23.5 ms.
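For readers who haven't met it, the per-block Huffman coding mentioned in the quote can be sketched roughly as follows. This is a generic textbook version in Python, not YARS2's actual on-disk format: the function names are hypothetical, the "block" is a toy byte string, and the output is kept as a string of bits rather than packed bytes.

```python
import heapq
from collections import Counter

def build_codes(data):
    """Map each byte value in data to a Huffman bit string."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

def encode(data, codes):
    return "".join(codes[b] for b in data)

def decode(bits, codes):
    inverse = {code: sym for sym, code in codes.items()}
    out = bytearray()
    current = ""
    for bit in bits:
        current += bit
        if current in inverse:  # codes are prefix-free, so the first hit is right
            out.append(inverse[current])
            current = ""
    return bytes(out)

# Toy "block" with a skewed byte distribution.
block = b"aaaaaaaabbbc"
codes = build_codes(block)
bits = encode(block, codes)
```

Frequent byte values get short codes, which is why sorted index blocks with many repeated values compress well; the decode step on every lookup is the cost the quoted figures are measuring.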

I still wonder how the DERI guys can claim it as their indexing scheme, especially when Kowari was open sourced before YARS or the original paper came out. Maybe it's who publishes first? See Paul's previous discussion about it in 2005 (under the title "Indexing"). I mind that this hasn't been properly attributed, as I'd like Paul and any others to get the attribution they deserve. On the other hand, I'm glad that people are taking this idea and running with it.

It's good to see that text searching on literals now seems like a standard feature too. They used a sparse index to create all 6 indices. They also hint at how reasoning is going to be performed by linking to "Unifying Reasoning and Search to Web Scale", which suggests a tradeoff over time and trust.
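The general idea of a sparse index is easy to sketch: keep the entries sorted in fixed-size blocks, hold only the first key of each block in memory, and binary search those keys to find where to start scanning. The Python below is a minimal sketch under those assumptions; the class name, block size, and sample triples are all hypothetical, and it models one ordering of the data rather than YARS2's six.

```python
from bisect import bisect_left

class SparseIndex:
    def __init__(self, entries, block_size=3):
        ordered = sorted(entries)
        # On disk these would be the (possibly compressed) blocks.
        self.blocks = [ordered[i:i + block_size]
                       for i in range(0, len(ordered), block_size)]
        # The sparse part: one in-memory key per block, not per entry.
        self.first_keys = [block[0] for block in self.blocks]

    def lookup_prefix(self, prefix):
        """All entries whose leading components equal prefix, in sorted order."""
        prefix = tuple(prefix)
        n = len(prefix)
        # Matches may begin in the block just before the first block
        # whose first key is >= prefix, so step back one block.
        start = max(bisect_left(self.first_keys, prefix) - 1, 0)
        found = []
        for block in self.blocks[start:]:
            for entry in block:
                head = entry[:n]
                if head == prefix:
                    found.append(entry)
                elif head > prefix:
                    return found  # sorted order: nothing further can match
        return found

# Hypothetical sample data in subject-predicate-object order.
index = SparseIndex([
    ("alice", "knows", "bob"),
    ("alice", "name", "Alice"),
    ("bob", "knows", "carol"),
    ("bob", "mbox", "bob@example.org"),
    ("bob", "name", "Bob"),
    ("carol", "name", "Carol"),
    ("dave", "name", "Dave"),
])
bob = index.lookup_prefix(("bob",))
```

A different query shape (say, lookup by object) would need the same structure built over a different ordering of the components, which is where the multiple indices come from.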