Tuesday, August 03, 2004
Setting SAIL
Kowari is actually a svelte 2.4MB (or so) when Jena, Jetty, and everything else is removed. Without Jena, however, you can't parse RDF. If you are only interested in storing longs, that gets down to about 1.5MB. I mentioned Paul's earlier ideas on various persistent triple stores.
| 0 comments | Link me |
Monday, August 02, 2004
OS Cloudscape
"Cloudscape is a niche product in IBM's overall data information line and has tiny market share compared with its multibillion-dollar DB2 franchise. IBM has used Cloudscape as an embedded data store as part of its Workplace desktop application line."
"The decision to release Cloudscape into open source mimics moves by other proprietary software companies, which have created open-source projects around existing products in an effort to generate more interest in the product and make it easier for programmers to access it. At LinuxWorld next week, Computer Associates International will release its Ingres r3 database, a product with limited market share, into open source."
"Putting an existing product into open source is not a surefire recipe for stimulating usage or sales, said Michael Olson, president and CEO of Sleepycat Software, which offers its own open-source database."
| 0 comments | Link me |
Google and the Semantic Web
"He [Sergey Brin] basically said he doesn't believe in the semantic web as a set of linked RDF data-structures. His basic argument is that the structure of natural language and what it presents is much much richer than meta-data tagging schemes. Clearly, Google's understanding of natural language is unique, but there still is a need for machine readable APIs for data on the Internet."
The ideas that Paul Ford puts forward are pretty interesting and illustrate the usefulness of some of the ideas of the Semantic Web; except maybe the reliance on a central authority like Google. The Semantic Web is like the Web, not like Google.
Also, I (well Google) found a new link to "The Anatomy of a Large-Scale Hypertextual Web Search Engine" written by Sergey Brin and Lawrence Page for WWW7.
| 0 comments | Link me |
July 2004 issue of SIGSEMIS
Includes an interview with Eric Miller:
"Freeing the data from the applications that created them and managing this information directly relates to a strong return on investment. The predominant skepticism I hear is perhaps the most is 'if I have XML why do I need RDF'. It's interesting however to see some of the skepticism dissipates after organizations learn from experiences (often times painful ones) that agreement on syntactic conventions are often overly brittle and not adequate for the effective management of data."
"The WWW2004 conference had a similar impact on me with regards to the Semantic Web. The technologies and toolkits are maturing. Semantic Web applications are becoming far more prevalent. Novel ideas for how these technologies may be used are happening on a daily basis. It was quite a week!"
Danny Ayers' "The Missing Webs":
"...there is a lot missing from the current Web. Those gaps can be filled in part using a logic-based framework."
It also includes lots of KM-based articles. The ones I found interesting: "The Road Ahead to Competency-Based Learning Activity Selection: A Semantic Web Perspective", "Reflection on the future of knowledge portals", and "Methodologies for the Semantic Web: state-of-the-art of ontology methodology".
The book review, "Developing Semantic Web Services", has a link to Semantic Web Author, a "...Multi-Markup Language (XML, RDF, and OWL) Validating Parser, Editor, and Web Development Environment."
| 0 comments | Link me |
What real hackers do
Yeah, real programmers go home and boot up Linux and write some really neat regexes in Perl. Yeah ;-).
| 0 comments | Link me |
Friday, July 30, 2004
Which one?
| 0 comments | Link me |
That Song
"“This song is Copyrighted in U.S., under Seal of Copyright # 154085, for a period of 28 years, and anybody caught singin’ it without our permission, will be mighty good friends of ourn, cause we don’t give a dern. Publish it. Write it. Sing it. Swing to it. Yodel it. We wrote it, that’s all we wanted to do.”
I'll bet, therefore, that Woody would be horrified -- and angered -- by the behavior of an outfit called The Richmond Organization, which controls the copyright to his music. This humor-impaired crew has gone ballistic and has launched legal threats (CNN) at JibJab."
| 0 comments | Link me |
Thursday, July 29, 2004
Triple Store Bake-Off
Browsing and configuration times were the most pertinent figures to our future work. We don't believe the browsing times are really significant beyond the second granularity, so by that metric, it appears models with a performance between one and two seconds are potentially worth pursuing. All of our network models with caching appear to fall in that range, which is perhaps not a surprise since all of them implement caching in approximately the same fashion.
This leaves configuration time as the more interesting metric - how fast does a store return its results for creating the in-memory cache? For network models, the fastest were 3store and Sesame with files, though using files for the remote store is akin to using an in-memory model for our application, meaning it probably is not feasible for extremely large stores. So 3store and Sesame using MySQL 3 appear to be our best choices."
It would be nice to see the data and the queries that were used. Some of the code is here.
There do seem to be some slight errors in the code, like creating a new ItqlInterpreterBean every time, which effectively sets up a new RMI session. There are also large differences in the results: the local "Load Page" test is slower than the network one by two orders of magnitude, which may have to do with using the Jena API on top of Kowari.
The "configure" tests appear to be testing different things, because the results across the network and local tests vary from 2ms to 200,000ms. Kowari local takes 2166ms against 80304ms for Kowari over the network. That shows the network version is slower, but the only difference should be RMI, and 78 seconds seems excessive even for RMI.
And the use of "In-Memory" should probably be "In JVM".
Something not shown in the graphs is the time taken to load the triples:
* Jena w/ Postgres - 971784ms
* Jena w/ MySQL 4 - 844257ms
* Jena w/ MySQL 3 - 667138ms
* Kowari - 139092ms
* 3Store - 213088ms
Overall, it's pretty much what's expected: Kowari can achieve close to an order of magnitude improvement over SQL databases even on small datasets. Comparing Kowari against an SQL database with 5-10 million statements would show a greater margin of difference. Jena Fastpath and creating our own Model implementation should speed some of these results up.
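To put rough numbers on that, here is a small sketch that turns the load times listed above into ratios against Kowari, and also works out the "configure" gap mentioned earlier. The figures are copied from this post; the function and variable names are mine. On these numbers the Jena/SQL loads come out somewhere between five and seven times slower than Kowari, so "order of magnitude" is a loose reading at this dataset size.

```python
# Load times in milliseconds, copied from the list in this post.
load_times_ms = {
    "Jena w/ Postgres": 971784,
    "Jena w/ MySQL 4": 844257,
    "Jena w/ MySQL 3": 667138,
    "Kowari": 139092,
    "3Store": 213088,
}


def slowdown_vs(baseline, times):
    """Return how many times slower each store loaded than the baseline store."""
    base = times[baseline]
    return {name: ms / base for name, ms in times.items()}


ratios = slowdown_vs("Kowari", load_times_ms)
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:18s} {ratio:5.2f}x")

# The "configure" gap: Kowari over the network minus Kowari local, in ms.
configure_gap_ms = 80304 - 2166
print(f"configure gap: {configure_gap_ms / 1000:.1f}s")
```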
| 0 comments | Link me |
Wednesday, July 28, 2004
Is that curve a little steep?
First I'd heard of ARRESTED.
| 0 comments | Link me |
The Sound of IR
| 0 comments | Link me |
Tuesday, July 27, 2004
More Kowari References
* An approach to using the Resource Description Framework (RDF) for Life Science Data "Several open source solution were evaluated but did not meet our performance requirements. To be fair, few projects claim to support such large data sets and most focus on providing advanced features such as inference capabilities instead. Kowari [http://kowari.sourceforge.net/] was the most promising solution, but does not at the time fulfill the last two requirements." The last requirement was maintaining insertion order.
* del.icio.us / url
* RDF APIs (JRDF 0.3 should be out soon, btw).
| 0 comments | Link me |
First Kowari Kontribution
Chris is the author of RDQLPlus.
| 0 comments | Link me |
Monday, July 26, 2004
What to do with a 40 Petabyte iPod
Scanning books costs between $5 and $20. That's the mechanical cost if you just wanted to scan a book and end up with the images of the pages at high enough resolution that you could print it on a high-end laser printer so it would be a good facsimile at 600 DPI, color—a nice-looking book. So books are doable, in terms of technology.
Now let's take music. It's been estimated that there are about 2 to 3 million albums. In terms of salable units—things that were sold as either 78s, LPs, or CDs—that's the universe of commercial music. If you do the math again, it's a few more of your bookshelves. So you're still not talking about anything daunting.
If you take movies and video, Rick Prelinger [founder of a film collection known as the Prelinger Archives] estimated that the total number of theatrical releases of movies was between 100,000 and 200,000. Again if you do the math, based on DVD quality, you come up with low numbers of petabytes [one petabyte is 1 million gigabytes]."
You'd still have enough storage space left over for your address book, email, and every second of your life in video.
What was also interesting were the comments about printing library books rather than lending them:
"A 100-page black-and-white book with current toner and paper costs in the United States is $1, not figuring labor costs, rights costs, or depreciation of capital. That's an interesting number, because at a buck a book, it turns out that for a library, it could be less expensive to give books away than to loan them. In his book, Practical Digital Libraries, Michael Lesk reported that it cost Harvard incrementally $2 to loan a book out and bring it back and put it on the shelf. This is not figuring in the warehousing costs and all the building costs. This is just the incremental cost of loaning a book out."
| 0 comments | Link me |
D2RMap 0.3
| 0 comments | Link me |
Sunday, July 25, 2004
RDF Mapper 2.0
RSS is translated into RDF before processing (except for RSS 1.0, which is already RDF). For brevity, RSS is mentioned in what follows only when the non-RDF variants of RSS (RSS 0.9x and RSS 2.0) require explicit discussion."
| 0 comments | Link me |
Thursday, July 22, 2004
Tamino goes Semantic (sort of)
"In keeping with Champion's vision of Tamino evolving with semantic technology, he pointed out that the new version offers capabilities for a meta data repository containing definitions of business terms that can be used for 'semantic integration.'
The new version has a special developer's edition and includes improvements made for developers, including:
* expanded XQuery, XPath and text retrieval functions, including a thesaurus;
* additional indexing capabilities for rapid query execution;
* improved handling of standard XML schemas; and
* a redesigned and more intuitive online tutorial."
Also of interest is Perspective on XML: Steady steps spell success with Google.
| 0 comments | Link me |
Two for Thursday
Defining N-ary Relations on the Semantic Web: Use With Individuals "In Semantic Web languages, such as RDF and OWL, a property is a binary relation: it links two individuals or an individual and a value. How do we represent relations among more than two individuals? How do we represent properties of a relation, such as our certainty about it, severity or strength of a relation, relevance of a relation, and so on?"
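One common answer to those questions is to give the relation itself a node and hang the participants and the extra properties off it with ordinary binary properties. A minimal sketch of that idea, using plain tuples as triples; every name with an `ex:` prefix, the blank node label and the helper functions are made up for illustration and are not taken from the note itself:

```python
def nary_relation_triples(relation_node, relation_type, roles):
    """Express one n-ary relation as binary triples attached to a relation node."""
    triples = [(relation_node, "rdf:type", relation_type)]
    for role, value in roles.items():
        triples.append((relation_node, role, value))
    return triples


def objects(triples, subject, predicate):
    """Return every object of the triples matching (subject, predicate, ?)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# "Christine has a diagnosis of breast tumour, with high probability":
# the individual points at the relation node, which carries the rest.
triples = [("ex:Christine", "ex:has_diagnosis", "_:diagnosis1")]
triples += nary_relation_triples(
    "_:diagnosis1",
    "ex:Diagnosis",
    {
        "ex:diagnosis_value": "ex:BreastTumour",
        "ex:diagnosis_probability": "ex:High",
    },
)

diagnosis = objects(triples, "ex:Christine", "ex:has_diagnosis")[0]
print(objects(triples, diagnosis, "ex:diagnosis_probability"))  # ['ex:High']
```

The certainty of the relation ends up as just another property of the relation node, which is why no change to the binary data model is needed.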
| 0 comments | Link me |
If you look closely...
| 0 comments | Link me |
IBM Releases Semantics Toolkit
1. Integrated Ontology Development Toolkit (Orient), as a visual ontology management tool, is mainly used by domain experts who have limited computer knowledge but who are familiar with specific domain knowledge. It is designed as a set of loosely-coupled cooperative Eclipse plug-ins. Orient can now run on Eclipse 3.0 or compatiable software. Orient is a joint R&D project of IBM China Research Laboratory, Beijing, and APEX Data and Knowledge Management Lab, Shanghai Jiao Tong University.
2. Extended Ontolgy Definition Metamodel (EODM) and RDF Repository Star (RStar) provide a set of programming APIs for programmers and IT specialists. EODM is designed to provide a high performance OO interface for the programmer. Now, it is mainly used to manage ontology-level data with limited size.
3. RStar is used for storing and querying mass data, most of which belong to the instance level. In such a situation, the programmer will use SQL-like sentences to manipulate data."
"RStar provides a high-performance RDF storage and query system. It can takes RDF/XML files or RDF triples as input for loading ontology and instances. It accepts queries in the RStar Query Language and returns results as tables. It supports RDF(S) inference. Currently, RStar uses relational database as its back-end storage."
From the Semantic Web Interest Group IRC Scratchpad.
| 0 comments | Link me |
Wednesday, July 21, 2004
XQuery or SQL
"“With XPath calls, you go down one leg at a time,” he said, with manual coding required to traverse more than one leg at a time. “XQuery’s FLWR statement has loop statements. But you’d have to do your own correlation between paths and set up a different path call on each leg, and that gets complex.”"
"At the heart of SQLfX, which David expects to release in mid-2005, is SQL’s “outer join” operation. This brings two hierarchical structures together as a means of coping with XML’s nesting. “If you’ve ever looked at two legs of an org chart to see how they’re related, that’s what this does. The user doesn’t have to know the structure; they just need to say what data they need.”"
"“Because XML documents can and often do have a large maximum depth of nesting, with 10 or 15 levels not uncommon,” Melton continued, “a combination of 10 to 15 outer joins would be required to reassemble the data into a hierarchical representation,” which he said is enough to make many SQL engines bog down.
Ironically, David claims to address these inefficiencies with proprietary algorithms."
So, there is a similar debate in the database world about using XQuery over SQL to query XML.
The use case for multiple paths in a hierarchy is similar to the Optional Match requirement in the DAWG. With RDF, of course, it's graph matching, not multiple hierarchies.
With respect to querying RDF, I'm not sure that there should automatically be only one type of syntax. Currently, the DAWG is focused on the use cases and the operations required to meet them. After that, I'm sure the group can make a judgement as to how it could be expressed functionally (like XQuery does for XML) or declaratively (in a BRQL/iTQL way).
Another problem that was brought up in our discussions at work was with the return syntax in XQuery. Applying some of the syntax of XQuery to an RDF query language, it would have to describe returning either a graph or some sort of list of results. This seems to be mixing the binding of results with the presentation of the results.
Paul's most recent blog discusses some of the issues, especially as Network Inference continues to make the claim that RDF is "grounded" in XML.
| 0 comments | Link me |
Monday, July 19, 2004
Adaptive Information
"Semantic Interoperability Framework – A highly dynamic, adaptable, loosely-coupled, flexible, real-time, secure and open infrastructure service to facilitate a more automated information sharing framework among diverse organizational environments."
This was preceded by:
"One way to describe a system is with a set of buzzwords. A standard set of them has been used to describe the framework. The rest of this section is to explain what is meant by those buzzwords and the problems that are being addressed."
| 0 comments | Link me |
Everyone wants to Integrate
"He noted that adding integration capabilities to the JBoss application server mirrors what other Java server companies are already doing and could help make JBoss more competitive.
"Integration is a critical factor in many of the same projects that people are deploying application servers for," O'Grady said. "It's almost as if integration is a new checklist item for application server projects." "
And something I thought I'd never see: JBoss Application Server gets J2EE-certified.
| 0 comments | Link me |
Ontology Editors
"Reference to taxonomies and ontologies by vendors of mainstream enterprise-application-integration (EAI) solutions are becoming commonplace. Popularly tagged as semantic integration, vendors like Verity, Modulant, Unicorn, Semagix, and many more are offering platforms to interchange information among mutually heterogeneous resources including legacy databases, semi-structured repositories, industry-standard directories and vocabularies like ebXML, and streams of unstructured content as text and media."
"The ontology editor enhancement mentioned most often by respondents was a higher-level abstraction of ontology language constructs to allow more intuitive and more powerful knowledge modeling expressions."
And on the second page:
"While achieving full-range ontology editing functionality is a tall order for toolmakers, the capabilities called out above are not the only demands toolmakers face...Some see the gathering demands as an impending crisis for providing editing environments that can accommodate an expanding scope of ontology language responsibilities. Eventually, editors will have to address the ontology language and reasoner functions currently under development..."
| 0 comments | Link me |
XQuery, XDS and Oracle
"There's a lot of similarity between the technologies Andrew links to, together with what Oracle are trying to achieve with XDS and XQuery, and what we're trying to do with business intelligence, data warehousing and data mining. It wouldn't suprise me if we start to hear more about XML, XQuery, RDF and so on in a business intelligence context in the future, and I fully expect these sorts of technologies making their way into Oracle's BI & knowledge management products over the next few years."
I've mentioned Oracle's recent interest in RDF here and here.
| 0 comments | Link me |
Saturday, July 17, 2004
SW is Vietnam
So when did the French try the Semantic Web?
| 0 comments | Link me |
That was quick
We believe that the DAWG working group is making an egregious error by rejecting any level of commitment to XQuery at this critical juncture."
"Regardless of outcome, Network Inference will remain devoted to our customer feedback by continuing our XQuery support for query-driven inferencing across RDF and OWL data inside our Cerebra Server product family."
| 0 comments | Link me |
Friday, July 16, 2004
Semantic Web and MDA
"As ontologies move into industry they need to coexist with industrial metadata. We do not want ontologies to become yet another silo in a fragmented metadata landscape. Since much enterprise tooling is moving toward MOF-based metadata management, a minimal goal would be to make it possible for MOF-based tools to physically manage ontologies using the common MOF mechanisms."
"In order to achieve the goal of using MDA and the Semantic Web together, the OMG issued an RFP that calls for standardizing the following:
* A MOF metamodel for ontology definition
* A UML profile for ontology definition
* A mapping between the UML profile and the MOF metamodel".
Found by “Semantic Web” applied.
| 0 comments | Link me |
RDF Querying going to the DAWGs
"You show RDFS/OWL/Rule query langauges as somehow being more easy inXquery, but again I think that is because you are assuming these things will be kept in their RDF/XML documents, or in APIs that respect the "boundaries" of those. I already see many applications moving towards multiontologies w/linking, and that seems to me to argue that we simply don't know yet which of these models are better."
The original proposal suggests that we're going to need a single query language for OWL, Rules and RDF, which probably won't happen, and it suggests this without proof. It also suggests that because RDF can be serialized in XML it has something in common with XQuery, which is untrue. The standard RDF/XML serialization can express the same RDF graph in multiple forms. The same RDF query will work across different RDF/XML serializations because it operates on the same data model. This isn't true for XQuery.
It reminded me of the recent anti-XQuery article, "If You Liked SQL, You'll Love XQUERY".
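A rough illustration of the serialization point, as a sketch rather than a real parser: the two documents below are both legal RDF/XML for the same single statement, one writing the property as a child element and the other as a property attribute. The `triples` function is a toy of my own that only understands these two forms (a real application would use a proper RDF parser), and the example URI is made up. The triple-level view is identical for both documents, while a path that looks for a `dc:title` element only matches one of them.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
HEADER = f'<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">'

# Same statement, property written as a child element.
AS_ELEMENT = HEADER + """
  <rdf:Description rdf:about="http://example.org/doc">
    <dc:title>Kowari</dc:title>
  </rdf:Description>
</rdf:RDF>"""

# Same statement, property written as an attribute.
AS_ATTRIBUTE = HEADER + """
  <rdf:Description rdf:about="http://example.org/doc" dc:title="Kowari"/>
</rdf:RDF>"""


def triples(rdf_xml):
    """Toy extraction of (subject, predicate, literal) from the two forms above."""
    found = set()
    for node in ET.fromstring(rdf_xml).findall(f"{{{RDF}}}Description"):
        subject = node.get(f"{{{RDF}}}about")
        for name, value in node.attrib.items():
            if not name.startswith(f"{{{RDF}}}"):  # skip rdf:about itself
                found.add((subject, name, value))
        for child in node:
            found.add((subject, child.tag, (child.text or "").strip()))
    return found


def title_elements(rdf_xml):
    """Tree-level lookup: every dc:title element anywhere in the document."""
    return ET.fromstring(rdf_xml).findall(f".//{{{DC}}}title")


print(triples(AS_ELEMENT) == triples(AS_ATTRIBUTE))  # True
print(len(title_elements(AS_ELEMENT)), len(title_elements(AS_ATTRIBUTE)))  # 1 0
```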
Fabian is saying that the relational model was a simplification of graph theory. In this respect relational theory and RDF have much in common, much more in common than XML.
At a syntactic level, query languages like RDQL, iTQL and other SQL-like RDF query languages are leveraging off a legacy of SQL, Datalog and other similar languages. This is something that XQuery lacks as well. Do we really want FLWOR and Conditional Expressions in our query language?
Fabian also mentions NULLs, a continual pet peeve of the anti-SQL crowd. It's good to see XQuery avoids them, and I hope an RDF query language does as well.
Interestingly, Don Chamberlin's XQuery tutorial is quoted both by Fabian and in Jeff's proposal.
Andrae is blogging some of this as well, in "Jumping the gun".
| 0 comments | Link me |
Thursday, July 15, 2004
New co-Chair of SW Best Practices
Fwd: W3C Announcement: David Wood, New Co-Chair of the Semantic Web Best Practices and Deployment Working Group
| 0 comments | Link me |
Labels: david wood, semantic web, tucana
Querying with Rules
* Answering DL Queries using Deductive Database Techniques and
* Rules and Queries with Ontologies: a Unified Logical Framework.
This is appropriate for our current TKS work, see Paul's blog for more details.
| 0 comments | Link me |
Everything New is Old Again
There's also the older, yet still relevant, THTTP specification for encoding resolution into an HTTP request.
I was aware of the Handle System, which is for documents, and has its own RFCs including RFC 3650.
I think I got rid of all the times I tried to type RFC and my hand spat out RDF.
| 0 comments | Link me |
Wednesday, July 14, 2004
Supersonik
Zladko “Zlad” Vladcik was to perform his very popular techno-ballad, “Elektronik – Supersonik” - described as “a melodic fusion combining hot disco rhythms with cold war rhetoric”."
"Hey baby, wake up from your asleep
We have arrived on to the future,
And the whole world has become...
Electronic... Supersonic...
Supersonic... Electronic"
Champagne comedy indeed...by Working Dog.
Via Metafilter.
| 0 comments | Link me |
Discretization
And while a little unreadable, The unbearable inevitability of discretization is an interesting rant about the Semantic Web and all things in general:
"Evolutive efficiency also applies to the Semantic Web. Luckily for us, it benefits from two distinct evolutionary avenues. It indeed gains effectiveness both from cleverer agents and from semiotically-complete ontology representation formats (relational databases, XML/RDF, OWL, UML, etc.). Therefore, with some site correctly implementing the Semantic Web-enabling technologies, one is right to argue that the Web already shows some signs of semantic intelligence.
Discretization is the fundamental mechanism behind any form of cognition. Solve et coagula-based computing rules!"
| 0 comments | Link me |
For XML Users
They think it's an utterly complex way to write metadata that you can do with simple namespaces. The two worlds (despite being both hosted inside W3C) don't talk very much. Many (if not all) W3C folks are all in the RDF camp (and have been there for a while) and they see XML as a half-baked attempt to solve issues that RDF already solves. Unfortunately, not having been in the XML camp at all, they have no way to communicate with the other side.
The XML camp, on the other hand, thinks that they know how to build things that work, while the RDF people are all sitting in their ivory towers telling them that what they are doing is wrong, but without understanding their real-world needs.
As it normally happens in a debate, both are right and both are wrong. "
| 0 comments | Link me |
Tuesday, July 13, 2004
Nature nurturing Oracle
"NPG and Oracle investigating the suitability of the NDM to store and query RDF-encoded information
o Storage looks OK
o Can hold directed labelled graphs
o Allows URIs, literals and blank nodes
o Can include provenance information
Querying may need more development:
o Can extract sub-graphs but performance and scalability need to be tested
o RDF/XML import and export would be desirable
o Support for RDFS- and OWL-based inferencing"
Mentions Urchin.
| 0 comments | Link me |
Monday, July 12, 2004
Ontologies for the Web
| 0 comments | Link me |
Sunday, July 11, 2004
N3QL
Part of CWM.
| 0 comments | Link me |
WebMethods and RDF
Glass said WebMethods has been looking at specifications for publishing metadata of other types, such as the W3C’s Resource Definition Framework (RDF). “This looks promising as a way to represent a broader array of metadata than simply that of Web services. And in [the forthcoming] UDDI version 4, there’s a lot of work on leveraging RDF.”"
| 0 comments | Link me |
Friday, July 09, 2004
Pay me!
Many of us, no doubt, employ the wishlist facility on Amazon to communicate our birthday needs to distant relatives. The decentralization of this seems like a natural target for RDF-savvy developers."
| 0 comments | Link me |
More metadata than data (again)
"I would claim that there is more implied data (or inferable meta-data) than "raw" data on the web, and that we are barely scratching the surface of it. Today, all search engines are scraping for some simple forms of implied data: language, locality, etc. What's missing from this list is a nearly infinite collection of relationships that are obvious to most any human reader but extremely difficult to infer from a single document. The reason why implied data is so hard to identify is because, in the aggregate, it forms our collective cultural wisdom."
| 0 comments | Link me |
Thursday, July 08, 2004
Oracle's RDF Store
The attached document is based on articles available from Oracle. An in-depth description is available in "Network Data Model Overview" (free registration required).
There are various schemas defined for storing networks which includes:
"NODE_NAME VARCHAR2(32) Name of the node.
NODE_TYPE VARCHAR2(24) User-defined string to identify the node type."
The schema is obviously not designed to store RDF, unlike other RDBMS mappings.
One difference is their flexibility in storing different graphs and giving links a "cost".
Another difference is their nodes and links are typed as strings; this looks like it would limit the effectiveness of data type operations. Querying for all nodes that are numbers between two values or dates between two ranges is going to be costly compared to dedicated data type handling. That's apart from the obvious difficulty in trying to put everything into a VARCHAR2(24).
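A quick sketch of why string-typed values hurt range queries. This is plain Python rather than Oracle, and the values are invented, but the comparison rules are the same idea: text compares character by character, so a range filter on the raw strings lets "100" in between "10" and "30", and the only way to get the right answer is to convert every value at query time.

```python
# Values as they would sit in a string-typed column (invented sample data).
stored = ["9", "10", "25", "100"]


def between_as_text(values, low, high):
    """Range filter using string comparison, i.e. no data type handling."""
    return [v for v in values if low <= v <= high]


def between_as_numbers(values, low, high):
    """Range filter that converts each stored string before comparing."""
    return [v for v in values if low <= int(v) <= high]


print(between_as_text(stored, "10", "30"))   # ['10', '25', '100']
print(between_as_numbers(stored, 10, 30))    # ['10', '25']
```

The second version is correct but has to touch and convert every row, which is the cost I mean compared with a store that keeps typed values.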
Unless they have optimised the query layer specifically for the task, which might be the case, it will also incur the costs of joining against the same table many 10s or 100s of times.
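To make the self-join cost concrete, here is a small sketch using SQLite from Python's standard library. The single three-column table is a generic simplification of my own, not Oracle's network schema, and the data is invented. Following a path of N links through one table takes N-1 joins of that table against itself, so a query touching dozens of statements means dozens of self-joins.

```python
import sqlite3


def path_query(predicates):
    """Build SQL that follows a chain of predicates; each extra hop is a self-join."""
    joins = ["FROM triple t0"]
    where = ["t0.p = ?"]
    for i in range(1, len(predicates)):
        joins.append(f"JOIN triple t{i} ON t{i}.s = t{i - 1}.o")
        where.append(f"t{i}.p = ?")
    last = len(predicates) - 1
    return (
        f"SELECT t0.s, t{last}.o "
        + " ".join(joins)
        + " WHERE "
        + " AND ".join(where)
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triple (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triple VALUES (?, ?, ?)",
    [
        ("alice", "knows", "bob"),
        ("bob", "worksFor", "acme"),
        ("acme", "locatedIn", "brisbane"),
    ],
)

hops = ["knows", "worksFor", "locatedIn"]
sql = path_query(hops)
rows = conn.execute(sql, hops).fetchall()
print(sql.count("JOIN"), rows)  # 2 [('alice', 'brisbane')]
```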
It does have some neat operations (like shortest-path), a Java API, PL/SQL integration and of course it integrates well with existing Oracle databases.
| 0 comments | Link me |
Tuesday, July 06, 2004
Kowari 1.0.4 Released
* Walk and transitive constraints.
* Backup individual models.
* Automatic reconnect of the iTQL Swing UI when the server restarts.
* Constructing Jena and JRDF with sessions rather than databases to allow multiple access.
* Sub-queries and the greater-than/less-than constraints are much faster.
Download here.
| 0 comments | Link me |
Monday, July 05, 2004
A Simpler Time
In each of three sets of horizontal lines of random lengths, the demo sorted the collection by size, from shortest to longest, by actually moving them up and down in the browser. The audience had never seen anything but static images in a browser before this: The lines were moving, as if being sorted by unseen hands!
Suddenly, everyone in the room was rethinking the potential of the Internet. Far from the crash-and-burn scenario Gosling had first envisioned, his demo had jolted a very influential audience off their seats, and they were delivering enthusiastic applause. And within this technology-entertainment crowd, word would spread quickly."
That jaw-dropping demo still runs, too.
| 0 comments | Link me |
Sunday, July 04, 2004
Java Rules
The manual says: "The mandarax inference engine uses backward reasoning, and the reference implementation uses an object oriented version of backward reasoning similar to the algorithm used in Prolog. On the other hand, most commercial rule systems such as ILOG and popular open source solutions like CLIPS and JESS use forward reasoning, in particular an algorithm called RETE. This algorithm keeps the derivation structure in memory and propagates changes in the rule and fact base."
A description of the RETE algorithm is here.
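The difference between the two reasoning directions is easy to see on a toy example. The sketch below is neither Mandarax's algorithm nor RETE: it is a naive version of each direction over made-up propositional rules, with function names of my own. Forward reasoning starts from the facts and keeps firing rules until nothing new appears; backward reasoning starts from a goal and works back through the rules that could conclude it, Prolog style.

```python
# (premises, conclusion) pairs; rules and facts are invented for illustration.
RULES = [
    (("bird",), "has_wings"),
    (("has_wings", "healthy"), "can_fly"),
]
FACTS = {"bird", "healthy"}


def forward_chain(facts, rules):
    """Fire every applicable rule repeatedly until no new fact is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known


def backward_chain(goal, facts, rules, seen=frozenset()):
    """Prove a goal by recursively proving the premises of rules that conclude it."""
    if goal in facts:
        return True
    if goal in seen:  # already trying to prove this goal: avoid looping
        return False
    seen = seen | {goal}
    return any(
        all(backward_chain(p, facts, rules, seen) for p in premises)
        for premises, conclusion in rules
        if conclusion == goal
    )


print(sorted(forward_chain(FACTS, RULES)))
print(backward_chain("can_fly", FACTS, RULES))
print(backward_chain("can_swim", FACTS, RULES))
```

The naive forward version re-checks every rule on every pass; avoiding that repeated work by keeping partial matches in memory is the job RETE does in the systems the manual mentions.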
| 0 comments | Link me |
Desktop Metadata
Comments link to: WinFS is not filesystem, Spotlight, rdf semweb winfs, Haystack, Questions about Longhorn, Pike and libferris.
| 0 comments | Link me |