To save disk space for the on-disk indices, we compress the individual blocks using Huffman coding. Depending on the data values and the sorting order of the index, we achieve a compression rate of ≈ 90%. Although compression has a marginal impact on performance, we deem that the benefits of saved disk space for large index files outweigh the slight performance dip.
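To make the quoted setup a bit more concrete, here is a rough Python sketch of per-block Huffman coding. This is not the YARS2 implementation (the paper doesn't show one), and it glosses over storing the code table next to each block; it only illustrates the basic idea of compressing each index block independently so that a single block can be decoded on its own.

```python
import heapq
from collections import Counter
from itertools import count

def build_huffman_codes(block: bytes) -> dict[int, str]:
    """Build a Huffman code table for the byte values occurring in one block."""
    if not block:
        return {}
    freq = Counter(block)
    tie = count()  # tie-breaker so the heap never compares tree nodes
    heap = [(f, next(tie), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:          # degenerate block with a single distinct byte
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: recurse
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                # leaf: a byte value
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

def compress_block(block: bytes) -> str:
    """Huffman-encode one block; returns a bit string purely for illustration.

    A real on-disk format would also store the code table (or share one across
    blocks) so the block can be decoded later -- that part is omitted here.
    """
    codes = build_huffman_codes(block)
    return "".join(codes[b] for b in block)

# Toy check: skewed byte distributions (sorted, repetitive index entries)
# compress well; the ~90% figure in the paper depends on their data layout.
block = b"<http://example.org/s> <http://example.org/p> <http://example.org/o> .\n" * 100
bits = compress_block(block)
print(f"{len(block)} bytes -> about {len(bits) // 8} bytes of code bits")
```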
Figure 4 shows the correspondence between block size and lookup time, and the impact of Huffman coding on lookup performance; block sizes are measured pre-compression. The average lookup time for a data file with 100k entries (random lookups for all subjects in the index) using a 64k block size is approximately 1.1 ms for the uncompressed and 1.4 ms for the compressed data file. For 90k random lookups over a 7 GB data file with 420 million synthetically generated triples (more on that dataset in Section 7), we achieve an average seek time of 23.5 ms.
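And here is a hedged sketch of what a lookup through a sparse index over sorted, block-structured data might look like. The names (SparseBlockIndex, read_block, decompress_block) are mine, not anything from the paper or from Kowari; the point is just that the index keeps only the first key of each block, so a lookup touches one block, and the block size directly trades index size against the bytes read and decoded per lookup, which is the tradeoff the quoted figure measures.

```python
import bisect

class SparseBlockIndex:
    """Sparse index over a sorted, block-structured data file.

    Only the first key of each block (and the block's byte offset) is kept,
    so the in-memory index stays small even for very large files.
    """

    def __init__(self, block_keys, block_offsets):
        self.block_keys = block_keys        # first key stored in each block
        self.block_offsets = block_offsets  # byte offset of each block

    def locate_block(self, key):
        """Offset of the single block that may contain `key`."""
        # Rightmost block whose first key is <= key.
        i = bisect.bisect_right(self.block_keys, key) - 1
        return self.block_offsets[max(i, 0)]

def lookup(index, key, read_block, decompress_block):
    """One lookup = one index probe + one block read + one in-block scan.

    Bigger blocks mean fewer index entries but more bytes to read and (if
    the block is compressed) to decode per lookup -- the tradeoff behind
    the block-size/lookup-time curve described above.
    """
    offset = index.locate_block(key)
    entries = decompress_block(read_block(offset))  # e.g. Huffman-decode
    return [entry for entry in entries if entry[0] == key]
```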
I still wonder how the DERI guys can claim this as their indexing scheme, especially when Kowari was open sourced before YARS or the original paper came out. Maybe it's a matter of who publishes first? See Paul's previous discussion about it in 2005 (under the title "Indexing"). I do mind that this hasn't been properly attributed, as I'd like Paul and any others to get the attribution they deserve. On the other hand, I'm glad that people are taking this idea and running with it.
It's good to see that text searching on literals now seems like a standard feature too. They used a sparse index to create all 6 indices. They also hint at how reasoning is going to be performed by linking to "Unifying Reasoning and Search to Web Scale", which suggests a tradeoff between time and trust.
2 comments:
I haven't read the source article, but you, Dave Wood and I gave a poster at WWW 2004 on such things, and we also had a paper accepted in 2005: "Kowari: A Platform for Semantic Web Storage and Analysis".
Hello,
We didn't take anything from Kowari when developing YARS. It looks like the complete index was invented independently. However, we do not want to deprive anybody of due appreciation, so we'll add the XTech reference to the bibliography of the YARS2 paper. Please let us know if you'd like any other attribution.
Regards,
Andreas Harth.