Sunday, May 02, 2004
Kowari linkers
DeliverableS4Simile "Naive use of persistent store in Jena decreases performance by 100x:
* Part of the problem is limited expressivity in RDQL
* Look at performance tuning on databases
* Ryan: Look at the performance of more specialized RDF databases, cf. Kowari"
Open Source Projects That Use Java NIO "The storage engine of Kowari is a transactional triplestore known as the XA Triplestore. All relevant fields of in-memory and on-disk data structures are 64 bits wide..."
Kowari for hundreds of millions of triples "Jim Hendler emailed me in response to my having mentioned on www-rdf-interest@w3c.org that I was surveying triple stores for use in data mining and machine learning. He mentioned a Java-based, non-relational, triple store called Kowari that is available in open source form..."
RDQLPlus "I discovered Kowari last week. The iTQL language is very similar to what I've come up with for RDQLPlus... having equally been inspired by SQL/DDL. Kowari looks like a nice database. I've downloaded it but haven't had a chance to play with it yet..."
Some Tools "Kowari is a layer on top of Jena with OWL reasoning, too"
I would say that Kowari is a layer beneath Jena, one that provides persistence. At the moment the OWL reasoning is really only offered through Jena. However, we will be getting some basic inferencing in our own query layer soon too.
[protege-discussion] Re: large data sets, bulk data acquisition "I had a really bad time with Kowari earlier this year, it wouldn't compile and then pass its own self-tests...."
These are basically problems with Windows. We develop on Linux and OS X and only do QA on Windows. Our initial release had known problems under Windows, which led to failing unit tests but were not fatal for data storage. Anyway, it is fixed now, although Windows does have some drawbacks when it comes to using NIO.
Ontological Software Development
"Considering that statement, it's also clear that application independence of ontological models makes these applications candidates for reference models. We do this by stripping the applications of the semantic divergences that were introduced to satisfy their requirements, thus creating a common application integration foundation for use as the basis for an application integration project."
"Once we define the ontologies, we must account for the semantic mismatches that occur during translations between the various terminologies. Therefore, we have the need for mapping.
Creating maps is significant work that leverages a great deal of reuse. The use of mapping requires the "ontology engineer" to modify and reuse mapping. Such mapping necessitates a mediator system that can interpret the mappings in order to translate between the different ontologies that exist in the problem domain. It is also logical to include a library of mapping and conversion functions, as there are many standards transformations employable from mapping to mapping."
Friday, April 30, 2004
Limiting Complexity
"I've spoken to people that spent several years of their lives coming up with an ontology and their perception is that the complexity over time of these models to cover a particular domain saturates, does not continue to grow.
This is a basic, but vital assumption for this entire approach to work: if the ontologies grow linearly with the amount of information they can describe, the ontology creation/maintenance process simply won't scale globally."
Intellidimension's SWS
From the FAQ:
"...using our standard search engine interface you can just type one or more keywords describing the information you are trying to locate. This is no more complicated than a traditional Web search engine. However like a traditional Web search engine this can lead to a large number of irrelevant results. To narrow your search you can restrict it to the specific type of resource that you are trying to locate such as a person (FOAF Person) or news article (RSS Item). If your search is still producing a large number of irrelevant results then you can refine it further by specifying one or more specific property values that the resource must have."
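The three narrowing steps in the FAQ (keywords, then resource type, then property values) can be pictured as successive filters over a set of triples. This is a minimal sketch, not Intellidimension's implementation: the `search` function, the sample data and prefixed names such as `ex:alice` are hypothetical, and the keyword match is a naive substring test over each subject's object values.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
RDF_TYPE = "rdf:type"


def search(triples: List[Triple], keywords: List[str],
           rdf_type: Optional[str] = None,
           props: Optional[Dict[str, str]] = None) -> Set[str]:
    """Hypothetical faceted search: keywords first, then type, then property values."""
    subjects = {s for s, _, _ in triples}
    # Step 1: keep subjects where every keyword appears in at least one object value.
    hits = {
        s for s in subjects
        if all(any(kw.lower() in o.lower() for s2, _, o in triples if s2 == s)
               for kw in keywords)
    }
    # Step 2: restrict to one kind of resource, e.g. foaf:Person or rss:item.
    if rdf_type is not None:
        hits = {s for s in hits if (s, RDF_TYPE, rdf_type) in triples}
    # Step 3: require specific property values on the remaining resources.
    for prop, value in (props or {}).items():
        hits = {s for s in hits if (s, prop, value) in triples}
    return hits


# Hypothetical sample data.
data = [
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:alice", "foaf:name", "Alice Semantic"),
    ("ex:alice", "foaf:nick", "ali"),
    ("ex:bob", RDF_TYPE, "foaf:Person"),
    ("ex:bob", "foaf:name", "Bob Semantic"),
    ("ex:post1", RDF_TYPE, "rss:item"),
    ("ex:post1", "rss:title", "Semantic Web news"),
]

print(sorted(search(data, ["semantic"])))
print(sorted(search(data, ["semantic"], rdf_type="foaf:Person")))
print(sorted(search(data, ["semantic"], rdf_type="foaf:Person",
                    props={"foaf:nick": "ali"})))
```

Each step only ever shrinks the result set, which is why adding a type or a property value is a cheap way to cut down irrelevant hits.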
Browse the Semantic Web
Some thoughts on RDF rendering had some ideas on visualizing the Semantic Web too.
The Passion of RDF
See also: The Vision of a Semantic New Testament: "Just as important as avoiding commercial barriers to sharing is the requirement that SemANT support existing and emerging standards that enable use across the Internet. To this end, SemANT will build on the Semantic Web Activity of the World Wide Web Consortium (W3C), including XML as a syntactic standard for data interchange, and RDF for ontology-based representation, and DAML/OWL for additional semantic expressiveness."
Another sign of a technology maturing, just as it happened with PCs, CD-ROMs, hypertext, the Web, etc.
XUL vs XAML
"XAML, Microsoft warned, is more potent than XUL in its ability to reflect exactly what's in the operating system.
"XUL is not the multipurpose declarative language that Gnome probably wants," said Ed Kaim, product manager for the Windows developer platform. "People say that when all you've got is a hammer, everything looks like a nail. In the same way, people are trying to figure out how to crush XUL into an OS it really wasn't designed for. The browser is great for a lot of things, but when it comes to robust client side applications, it's not the best."
Another trick will be in reconciling XUL with Gnome's existing user interface technology.
"There are ways to marry them," said Bruce Perens, an open-source consultant who serves as executive director of the Desktop Linux Consortium, a marketing organization. "But it's very difficult to get the two teams working in the same direction. They both went on a several-year tour of technical creation where they sat down and created everything they needed to do GUI [graphical user interface] applications — and they didn't create the same thing. Now to get them together it would take some number of years to resolve the technical diversions.""
Query Use Cases
"Although a little self advertisement and some missing languages, it's a good thing to read. If you need info about RDF Query languages, read it.
My previous demand about "optional joins in queries" is answered by SeRQL."
The report is an excellent example of the features currently required of a query language. From what I can tell, iTQL implements 11 of the 14. Some of the others are fairly trivial to add support for (like datatype support).
Wednesday, April 28, 2004
Will the Semantic Web Scale?
"Lately, several researchers doubted whether the Semantic Web idea will ever scale for numerous reasons technological [But03], theoretical [van02] and practical [MS03,Sow]. Dedicated workshops on that topic [CKDE03,VDC03] have been organized recently to promote research to improve scalability. We will pick up these three categories of doubt by organizing the panel in three parts discussing each aspect: theory, technology/implementation, and practise."
I'm not sure I agree. Very few Semantic Web systems don't reuse existing SQL databases, and those databases just suck at storing triples. With Kowari, and I'm sure with other native stores, the data structures and techniques used are taken directly from databases. They mention "Is the semantic web hype?" (which I responded to) and a few others, although there are no links to syllogism, metacrap or gnomes. BTW, I'm still not sure why you'd want an XML version of OWL.
"Network round-trips are often considerably less costly than the time taken for a transactional database operation due to the need to forcibly log transactional operations which is very costly in terms of disk performance. i.e. network round-trips aren't always the performance bottleneck." From Martin Fowler's First Law of Distribution.
As long as you keep the Semantic Web like the Web there's no real reason why it shouldn't scale.
Google Watching
Google Goes Public? The Rich Get Richer "People speculate. People dream. And if the numbers are to be believed, people will drool. The current prediction is that Google, if it decides to sell shares to investors this year, would probably end up with a market value of $20 billion to $25 billion by the end of its first day as a publicly traded company."
Google's Brin Talks on Gmail Future "It was interesting to me that you did finally hit on the word conversation. It seems to me that there's a synergy between the elements of the conversation in the RSS space and what you're doing in the e-mail space.
I think that's very true. Part of the things we've seen why blogs and RSS feeds are such a success is that you can actually read it—you don't have to stop, click back and forth, collect bits and pieces here and there—but it is all presented to you as one. "
Mozilla to Upgrade RDF
Tuesday, April 27, 2004
Ant is now more useful
In a nutshell, the
Also new is macrodef: "Macrodef is a way to define a new Ant task in an Ant build itself. Macrodef allows you to define standard tasks that have attributes and elements given to them when they are called."
Paul's Blog
Friday, April 23, 2004
Free Bits of Description Logic
Metaweb Graph Updated
Semaview Interview
"My company, Semaview has developed an application called eventSherpa. eventSherpa is making it simple to create and organize schedules and share them over the Internet. Our application automatically creates Semantic Web content transparently without the end user knowing it...Aside from reducing the complexity issue...I believe the largest challenge is convincing application developers to make their data available in semantic format. However it is "a chicken and egg problem" -- the more content available in a semantic format, the more applications that will be developed to take advantage of it; and vice versa."
Knobot
From Danny Ayers: Knobot PlanetRDF Demo.
XML 2004
"Consequently, even at the low level of operating systems vendors are seeing the need and advantages of implementing metadata storage and manipulation.
This is good. We have the tools to support this, whichever way you swing on the technology issues. RDF & OWL, Topic Maps, W3C XML Schema: all have the right machinery. Unfortunately that's not the biggest issue. The main problem is which terms, schemas, and ontologies to use. That's just not clear right now for most if not all metadata applications. At best, we'll get inconsistently classified information, which defeats the promise of interoperability. More typically, we'll end up with little tagged metadata and islands of de facto proprietary information."
"As an RDF fan, the realization of this truth causes me some pain. The way out is to stop thinking of RDF as an XML application, and look to easier syntaxes such as Turtle and N3."
Wednesday, April 21, 2004
Unix Job Ad
"But the point of all that is, Unix is basically a sort of secret society where you either know it, or you don’t. And since most people just really can’t be bothered going through the agonies of learning it, it’s why we have jobs like this: “Unix Specialist”. Of course that means nothing, or at least it means about as much as “Car Specialist” or “Bread Specialist”. Bread Specialist? What the hell is that? What kind of bread? White, multigrain, mixed grain, wholemeal, sourdough? Sliced or unsliced? If sliced, sliced for sandwiches or for toast? Crusty or soft? No matter! Just eat your bread!"
RDF Engine
Tuesday, April 20, 2004
80/20 REST to SOAP
Monday, April 19, 2004
Phew
BTW, you can now do things like: "select $s $p $o from <http://www.w3c.org/2000/08/w3c-synd/home.rss> where $s $p $o ;". You can combine local and remote (via file or http) models by using "and" and "or" in the FROM clause.
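One way to picture combining models in the FROM clause is as set operations over statements, reading `or` as union and `and` as intersection. The sketch below assumes those semantics and models each graph as a frozen set of triples; it does not parse iTQL, fetch anything over file or HTTP, or reflect how Kowari evaluates queries internally. The `local` and `remote` models are made-up stand-ins for the two kinds of source.

```python
from typing import FrozenSet, List, Tuple

Triple = Tuple[str, str, str]
Model = FrozenSet[Triple]


def model_or(a: Model, b: Model) -> Model:
    """'or' in FROM (assumed union): a statement is visible if either model has it."""
    return a | b


def model_and(a: Model, b: Model) -> Model:
    """'and' in FROM (assumed intersection): a statement must be in both models."""
    return a & b


def select_all(model: Model) -> List[Triple]:
    """Counterpart of 'select $s $p $o ... where $s $p $o': every statement, sorted."""
    return sorted(model)


# Hypothetical stand-ins for a local file model and a remote HTTP model.
local = frozenset({("ex:a", "ex:p", "1"), ("ex:b", "ex:p", "2")})
remote = frozenset({("ex:b", "ex:p", "2"), ("ex:c", "ex:p", "3")})

for row in select_all(model_or(local, remote)):
    print(row)
```

Because the operators return models, they nest naturally, so a longer FROM expression is just a tree of unions and intersections.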
Bloom Filters in Social Networks
"If any one of the filters is intercepted, it will register the full 50% false-positive rate. So I am able to hedge my privacy risk across several interactions, and have some control over how accurately other people can see my network. My friends can be sure with a high degree of certainty whether someone is on my contact list, but someone who manages to snag just one or two of my filters will learn almost nothing about me."
"Additionally, you can combine two Bloom filters that have the same length and hash functions with the bitwise OR operator to create a composite filter."
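For readers who haven't met Bloom filters, here is a minimal Python sketch of the structure and of the bitwise-OR combination the quote describes. The size, the number of hashes and the SHA-256-based hashing are arbitrary illustrative choices, not the article's parameters, and the sketch makes no attempt to tune for the deliberate 50% false-positive rate mentioned there.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Fixed-size Bloom filter; the bit array is stored in a Python int."""

    def __init__(self, size: int = 256, num_hashes: int = 3) -> None:
        self.size = size
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means present or a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other: "BloomFilter") -> "BloomFilter":
        # OR-ing is only meaningful when length and hash functions match.
        if (self.size, self.num_hashes) != (other.size, other.num_hashes):
            raise ValueError("filters must share size and hash functions")
        combined = BloomFilter(self.size, self.num_hashes)
        combined.bits = self.bits | other.bits
        return combined


# Hypothetical contact lists.
mine = BloomFilter()
for contact in ["bob@example.org", "carol@example.org"]:
    mine.add(contact)

friend = BloomFilter()
friend.add("dave@example.org")

both = mine.union(friend)
print(both.might_contain("bob@example.org"), both.might_contain("dave@example.org"))
```

The OR works because an item's bits are set in the composite exactly when they were set in either input, so the composite never reports a false negative for anything added to either filter.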
Sunday, April 18, 2004
Save Our Software
"First, the FCC is conducting a proceeding to set the guidelines for what it calls 'software defined radio...in the near future, it will be possible to use spectrum more efficiently and to increase competition in a space that's now the sole domain of incumbent operators. But only if we keep the FCC from regulating these technologies."
"Second...the FCC generally approved a mandate called the 'broadcast flag,' which would require that digital TV broadcasts include an anti-theft code to prevent consumers from recording programs off the air."
Thursday, April 15, 2004
Webfountain posting
"As I mentioned earlier, IBM’s model for WebFountain is platform-based. Assuming they can pay the freight, most anyone can develop for it, using a standard API that leverages simple web services. IBM won’t disclose most of its customers, but two it will mention are Semagix, which has a (pretty damn frightening) money laundering application, and Factiva, which has developed a “reputation manager” - think of it as Technorati on steroids for the serious corporate marketing or legal department. (Imagine being able to find any mention of your product or service anywhere on the web and create custom filters for the context, location, date, author, and relationships attached to those mentions, in near real time.)"
History of Treemaps
"Tree structured node-link diagrams grew too large to be useful, so I explored ways to show a tree in a space-constrained layout. I rejected strategies that left blank spaces or those that dealt with only fixed levels or fixed branching factors. Showing file size by area coding seemed appealing, but various rectangular, triangular, and circular strategies all had problems. Then while puzzling about this in the faculty lounge, I had the Aha! experience of splitting the screen into rectangles in alternating horizontal and vertical directions as you traverse down the levels. This recursive algorithm seemed attractive, but it took me a few days to convince myself that it would always work and to write a six line algorithm."
I especially liked PhotoMesa.
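The recursive idea in the quote, splitting a rectangle in proportion to subtree sizes and alternating the cut direction at each level, is usually called slice-and-dice. Below is a minimal Python sketch of that layout; it is not Shneiderman's six-line original, and the `Node` class and `slice_and_dice` function are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[str, float, float, float, float]  # (name, x, y, width, height)


@dataclass
class Node:
    name: str
    size: float = 0.0  # used only for leaves, e.g. a file size
    children: List["Node"] = field(default_factory=list)

    def total(self) -> float:
        # A directory's weight is the sum of everything beneath it.
        if not self.children:
            return self.size
        return sum(child.total() for child in self.children)


def slice_and_dice(node: Node, x: float, y: float, w: float, h: float,
                   depth: int = 0, out: Optional[List[Rect]] = None) -> List[Rect]:
    """Lay out node in the rectangle (x, y, w, h), recursing into its children."""
    if out is None:
        out = []
    out.append((node.name, x, y, w, h))
    total = node.total()
    offset = 0.0  # fraction of this rectangle already given to earlier children
    for child in node.children:
        frac = child.total() / total if total else 0.0
        if depth % 2 == 0:
            # Even depth: place children side by side along the x axis.
            cx, cy, cw, ch = x + offset * w, y, w * frac, h
        else:
            # Odd depth: stack children along the y axis.
            cx, cy, cw, ch = x, y + offset * h, w, h * frac
        slice_and_dice(child, cx, cy, cw, ch, depth + 1, out)
        offset += frac
    return out


# Hypothetical directory tree.
tree = Node("root", children=[
    Node("a.txt", 6),
    Node("docs", children=[Node("b.txt", 1), Node("c.txt", 3)]),
])
for name, x, y, w, h in slice_and_dice(tree, 0, 0, 100, 50):
    print(f"{name:6} x={x:5.1f} y={y:5.1f} w={w:5.1f} h={h:5.1f}")
```

Every leaf's area ends up proportional to its size and no space is left blank, which matches the two constraints the quote starts from.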
Wednesday, April 14, 2004
RSS on the TV
You know your technology has reached maturity when someone tries to put it on the television. A good followup posting with links to screenshots.
DSpace presentations
42...What was the question?
Part of a larger set of articles on UML Fever.
"The entertaining tenor of "Death by UML Fever" should not be inferred to suggest that UML fever is an imaginary ailment. It is genuinely real, it is thriving, and its presence is causing cost and schedule trauma on many software programs right now. Furthermore, the root causes of this fever, in general, have nothing to do with the UML itself: Rather, this fever and its various manifestations are largely symptoms of deeper ills in an organization's software development practices. Software organizations should consider launching self-diagnosis campaigns to assess the presence or extent of UML fever on their programs and plan rehabilitation strategies as necessary. Developing good software is a difficult enough task without having to endure the preventable and often painful complications of the dreaded UML fever."
We all Agree
"Yawp after me - 'RDF! RDF!'. But seriously, the answer to 'why not?'"
From Yawping! The collaborative tools manifesto has been mentioned several times recently; it's worth a look.
Monday, April 12, 2004
Another RSS library for Java
Sunday, April 11, 2004
Querying Use Cases and Features
Some of the features being considered include: optional triples, disjunction, queries over paths of two or more edges, the ability to indicate whether the query response includes entailments from the graph or treats the graph as a fixed object, expressing arbitrary RDF datatypes, queries expressible as a URL, user-specifiable query result formats and query results in RDF (closure).
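Of the features listed, optional triples are perhaps the least obvious. One common reading is a left outer join: required patterns filter the results, while an optional pattern adds bindings when it matches and leaves the row alone when it doesn't. The Python sketch below assumes that reading and uses iTQL-style `$variables`; the `match` and `query` helpers and the sample data are hypothetical, and details such as filters or nested optional groups are ignored.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(pattern: Triple, triples: List[Triple], binding: Binding) -> List[Binding]:
    """Extend one binding with every triple that matches the pattern."""
    results = []
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("$"):
                # Variable: bind it, or check it agrees with an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                # Constant: must match exactly.
                ok = False
                break
        if ok:
            results.append(new)
    return results


def query(required: List[Triple], optional: List[Triple],
          triples: List[Triple]) -> List[Binding]:
    bindings: List[Binding] = [{}]
    for pattern in required:
        # Inner join: a binding with no match is dropped.
        bindings = [b2 for b in bindings for b2 in match(pattern, triples, b)]
    for pattern in optional:
        # Left outer join: a binding with no match is kept unchanged.
        extended: List[Binding] = []
        for b in bindings:
            found = match(pattern, triples, b)
            extended.extend(found if found else [b])
        bindings = extended
    return bindings


data = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", "foaf:name", "Bob"),
]
for row in query([("$p", "foaf:name", "$name")], [("$p", "foaf:mbox", "$mbox")], data):
    print(row)
```

Without the optional pattern, making the mailbox a required pattern would silently drop Bob from the results, which is the gap optional triples are meant to close.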
Saturday, April 10, 2004
Librarians and the Semantic Web
* Extreme! "I want people to leave understanding that the Semantic Web didn’t come out of nowhere, that there’s a good century (at least!) of work coming to terms with infoglut that Semantic Web technologies build on. I want them to start thinking critically about website information architecture, about information management in general, about metadata, about problems with (and hopes for!) search engines.",
* More on metadata "Well, I have news. Everything Cory says is true. But it doesn’t matter. Hear me? It doesn’t matter. Because all those arguments aren’t arguments against a Semantic Web—they’re arguments against a populist Semantic Web.", and
* Metadata and authority "This means, quite simply, that a lot of current Semantic Web pipe dreaming will founder on authority and representation-convention problems. Heck, it’s already foundering."
One day, when I categorize this blog there will definitely be a "librarians" section.
SASDO
From GRDDL.
Gnowsis
He gave a list of features to add: optional triples, an update language, a protocol on the web and a protocol on the desktop (ODBC-like). The first two are handled by iTQL, although the second is handled more elegantly than the first. Kowari/TKS has always had a driver. The Kowari Lite client, currently the executable iTQL jar, is only about 1MB and doesn't include things like Jena.
why I love Patrick Stickler's URIQA approach "Building RDF aggregators is not an easy task. I have tried it in many different ways, and had varying forms of success. If you want an easy approach that does it, think about concise bounded descriptions."
I do think that the resolver idea, using the FROM clause, views and defining caching and updates, is going to be an easy way to do RDF aggregation.
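A concise bounded description, as proposed in Patrick Stickler's URIQA work, is roughly every statement about a resource plus, recursively, the statements about any blank nodes those statements point to. The Python sketch below is a simplified version under two assumptions: blank nodes are strings starting with `_:`, and the part of the definition that also pulls in reifications of the collected statements is left out. The function name and sample data are hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def is_blank(term: str) -> bool:
    # Assumption: blank nodes use the N-Triples style "_:" prefix.
    return term.startswith("_:")


def concise_bounded_description(resource: str, triples: List[Triple]) -> Set[Triple]:
    """Simplified CBD: statements about `resource`, plus, recursively, statements
    about blank nodes reached as objects. Reifications are ignored."""
    result: Set[Triple] = set()
    pending = [resource]
    seen = {resource}  # guards against cycles between blank nodes
    while pending:
        subject = pending.pop()
        for s, p, o in triples:
            if s == subject:
                result.add((s, p, o))
                if is_blank(o) and o not in seen:
                    seen.add(o)
                    pending.append(o)
    return result


data = [
    ("ex:doc", "dc:title", "Kowari notes"),
    ("ex:doc", "dc:creator", "_:b1"),
    ("_:b1", "foaf:name", "A. Author"),
    ("_:b1", "foaf:knows", "ex:someone"),
    ("ex:someone", "foaf:name", "Someone"),
    ("ex:other", "dc:title", "Unrelated"),
]
for triple in sorted(concise_bounded_description("ex:doc", data)):
    print(triple)
```

The description stops at named resources such as `ex:someone`, which is what keeps it bounded: anyone wanting more can ask for that resource's own description.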
Thursday, April 08, 2004
Regime Change
""I'm kind of hoping for regime change," one official who quit told Reuters."
""Iraq has been a distraction from the whole counterterrorism effort," said the former official, adding the policy had frustrated many in the White House anti-terrorism office, about two-thirds of whom have left and been replaced since Sept. 11."
"Roger Cressey, who served under Clarke in the White House counterterrorism office, said: "Dick accurately reflects the frustration of many in the counterterrorism community in getting the new administration to take the al Qaeda issue seriously.""
Oh and for something cool with photo-mosaics:
War President "Below is a small version of an image I made. It's a mosaic composed of the photos of the American service men and women who have died in Iraq. No photograph is used more than three times." From here.
If it's broken:
http://randomfoo.net/junk/200404/warpresident/warpres.jpg
An OS tool to do mosaics (includes Tux mosaic - I thought it was something else when I first looked at it):
http://www.stud.uni-hannover.de/~michaelt/juggle/
Is WiX a trick?
"Has Microsoft done this with WiX's XML formats yet? I don't know. But if the pros from Redmond haven't yet, they will. They did it for Office XML document formats; they'll do it for this. Thus, Microsoft's open-source code will work only on Microsoft-proprietary XML to produce Microsoft-proprietary installation programs. With open source like this, who needs proprietary programs?"
"Now, while it may look like Microsoft is doing something new, or perhaps even something helpful to the open-source community, it's not. What Microsoft is really doing is putting more of the Halloween memo's plans into action. Why shouldn't it? The Halloween plans are just an elaboration of Microsoft's time-tested embrace-and-extend technique. The only embrace Microsoft is really giving the open-source community is a stranglehold."
Tuesday, April 06, 2004
The Power of Free
• the user community must be vast
• the product scope must be well defined
• a viable business model must exist.
Today, I think that people are starting to understand that open source is more than just a development approach—it's a business model. What evidence is there to support this? Just follow the money: the OSBC conference was packed with venture capitalists and lawyers; techies were outnumbered ten-to-one. Even long time veterans of the closed source software world, like Ray Lane (previously of Oracle) and Chris Stone of Novell are focused on leveraging the power of open source in their businesses.
And why not? That's the nice thing about capitalism. It has a built-in Darwinian efficiency. If open source allows companies to deliver better products at a cheaper price, then it will be used to do just that. We see many of the most competitive companies in the world, like Amazon, Charles Schwab, Cisco, Corporate Express, FedEx, GE, Merrill Lynch, Motorola, Nokia, Sabre, SAP, and UPS using open source software as a platform for building new applications. Many companies that have been successful with Linux are now looking at deploying an entire open source stack, known as LAMP (Linux, Apache, MySQL, PHP/Perl/Python).
These days, it's easier than ever before for an ambitious developer to download all the software he or she needs to build a product—or even to build a business—on top of commodity hardware and open source software. Some of today's hottest pre-IPO companies, including billion-dollar-babies Salesforce.com and Google, as well as emerging social networking companies like AlwaysOn's Zaibatsu, Friendster, and Tribe.net, are all using open source software."
Kowari 1.0.2 Soon
It's now at the stage of deciding what to do next. Some of the next steps are fairly obvious, like full data type support, scripting (possibly Groovy) and inferencing. But we've also got zeroconf and MAYBE. There's also a large refactoring underway to turn Lucene, the triple store, ontology models and others into plugins exposed via a resolver interface. This should allow just about anything to be queried in the FROM clause of iTQL. In the short term, this means extending the querying of file and HTTP RDF sources to other metadata formats such as MP3 tags, iCal, vCard and XML-based RSS. It will also allow models to be typed and configured. Using views, you could easily create an RSS aggregator. Other, more difficult resolver implementations will be available in TKS.
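To make the resolver idea a little more concrete, here is a hypothetical Python sketch of a registry that picks a resolver for a FROM-clause URI and asks it for triples. None of these names come from Kowari's actual (Java) resolver interface: `ResolverRegistry`, `fake_mp3_resolver` and the `id3:artist` predicate are made up, and this sketch dispatches only on the URI scheme, whereas a real design would presumably also consider the content type (MP3, iCal, vCard, RSS).

```python
from typing import Callable, Dict, Iterable, List, Tuple
from urllib.parse import urlparse

Triple = Tuple[str, str, str]
Resolver = Callable[[str], Iterable[Triple]]


class ResolverRegistry:
    """Hypothetical registry: picks a resolver for a FROM-clause URI by its scheme."""

    def __init__(self) -> None:
        self._by_scheme: Dict[str, Resolver] = {}

    def register(self, scheme: str, resolver: Resolver) -> None:
        self._by_scheme[scheme] = resolver

    def resolve(self, uri: str) -> List[Triple]:
        scheme = urlparse(uri).scheme
        if scheme not in self._by_scheme:
            raise LookupError(f"no resolver registered for scheme {scheme!r}")
        return list(self._by_scheme[scheme](uri))


def fake_mp3_resolver(uri: str) -> List[Triple]:
    # Stand-in for reading MP3 tags; returns a fixed statement instead of doing I/O.
    return [(uri, "id3:artist", "Example Band")]


registry = ResolverRegistry()
registry.register("file", fake_mp3_resolver)
print(registry.resolve("file:///music/song.mp3"))
```

The query layer only ever sees triples, so adding a new source means registering one more resolver rather than changing the query engine.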
JRDF has been a little neglected. The next things to be worked on are transactions and a ResultSet-like interface.
Your Life in a Wal-Mart Database
"And in the not too distant future, data may be able to interact with other data without human intervention. That was the message of Web inventor Tim Berners-Lee. He talked about the glorious future of something called the "semantic web," a concept that most people aside from programmers will probably never understand. His talk here didn't help. One hopeful journalist from the Economist asked Berners-Lee to give an example of how companies could make or save money using it, but he didn't have an answer. He doesn't have to. He's an academic. Businesspeople here assured me this is a big deal we'll hear more about. "
I thought I'd read this before but couldn't find it in a previous entry, although I did find an earlier claim that the biggest database was France Telecom's, at 30 TB.
Monday, April 05, 2004
Two Quick Projects
* ROWL "The system enables users to frame rules in RDF/XML syntax using an ontology in OWL. Using XSLT stylesheets, the rules in RDF/XML are transformed into forward-chaining defrules in JESS. We make use of two more stylesheets to transform ontology and instance files into Jess unordered facts..."
Chimney Sweeps
"To answer questions within a company, you’ve got to reach across boundaries. You’ve got stovepipes of information. You have HR information. You have totally separate information for bug reports and customer satisfaction.
You might have another, for example, for documentation. We find that we were tracking a customer’s problem, but most of those documents left the company, and when we left the company, we should have kept them. So let’s find out who else has written documents which are being used by people who are tracking bug reports for processing customers who have this problem.
At the moment, what we’re looking at for this is huge monster applications like the pre-Web, which are document databases. That didn’t work because everyone wanted to operate it a certain way. It’s much better to let each part of a company have a Web site, and they’re linked to each other. Let the financial folks organize things how they want, and let manufacturing control people organize their things how they want, and then merge them. Where they have things in common like people or times or places—they have to be able to connect them."
Lisp
"It's something of a running joke in the Lisp community that all language efforts are doomed to reinvent Lisp."
"It's corny to say the worst thing about Lisp is a Lisp programmer. But there's some truth in it. Work long enough in this industry and you'll run into a Lisp bigot...I think Lisp never traditionally emphasized libraries because the language is so flexible you can build one on your own, but then you have the twin nightmares of integration and reuse lying in wait for you."
I have been thinking about Lisp and aspect-oriented programming recently. The other thing on my mind is ontological programming, or programming against an ontology.
Blast from the past
* Ercim News posting in 2002 mentioned both Corese and Notio. Their API uses strings to define concepts, relations and attributes. Apparently, it's now available for download (via PlanetRDF). It uses Jena's ARP but has its own CG interface.
* Sleepycat for Java Released under the same licence as the C version.
* Platypus Wiki Again, using Jena and looks good.
* Not in RDF The difficulties of NOT in RDF. I think Kowari/TKS lacks NOT because it was initially hard to do in our implementation; there's now no reason not to do NOT.
* Fixing the Java Memory Model, Part 2 Includes double checked locking, changes to volatile and initialization safety.
* Blog-Bleary? Try (What Else?) a Blog on kinja. Another review.
* RSS Versions Syndic8 is reporting that RSS 1.0 is now 46.4% of all RSS feeds with RSS 2.0 and 0.91 (combined) taking up 49.9%. Will RSS 1.0 take the lead soon?
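On the NOT item above: the usual way to get NOT working over a triple store is a closed-world set difference, treating "no matching statement in these models" as false, which is exactly the assumption RDF's open-world semantics does not make. A minimal Python sketch under that assumption, with hypothetical helper names and sample data:

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def subjects_with(triples: List[Triple], predicate: str,
                  obj: Optional[str] = None) -> Set[str]:
    """Subjects having the predicate (and, if given, that exact object)."""
    return {s for s, p, o in triples
            if p == predicate and (obj is None or o == obj)}


def without(candidates: Set[str], triples: List[Triple], predicate: str) -> Set[str]:
    # Closed-world NOT: a missing statement is read as "false". Under RDF's
    # open-world reading the answer only holds for the statements queried.
    return candidates - subjects_with(triples, predicate)


data = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", "rdf:type", "foaf:Person"),
]
people = subjects_with(data, "rdf:type", "foaf:Person")
print(without(people, data, "foaf:mbox"))
```

The result says "Bob has no mailbox in these models", not "Bob has no mailbox"; loading another model could change the answer, which is the core of the difficulty.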