The Third Web "Web services represent the third attempt to rebuild enterprise IT on the standards-based foundation conceived by Tim Berners-Lee. To realize that vision, vendors and IT pros must resist some familiar temptations and avoid some traditional mistakes...But it would be a sad repetition of past errors if IT vendors seized upon the success of Web services as their latest excuse for locking customers into product stacks—perversely building proprietary structures from standards-based components, for example, by devising proprietary XML schemas for their applications and documents."
Tuesday, June 22, 2004
Scaling Taxonomies
Verity Announces New Content Classifier (with screenshot) "VCC is based on roles and rules. Roles include taxonomy experts, subject matter experts within the organization (e.g., chemists, engineers, human resources staff, etc.), editors, and publishers. The company says the workflow feature allows taxonomy and classification management to be distributed to subject matter experts who know the content, as well as to knowledge engineers who know and understand taxonomy development. Different people who serve different roles are assigned different permissions to alter categories. The company claims that VCC is the only software that enables such real-time collaboration between knowledge workers and subject experts.
VCC uses rules to define how documents should be classified. Once taxonomies have been set up, VCC automatically classifies new documents as spiders discover them. The customer can control automatic classification by defining how well VCC thinks a document matches a category. For instance, if VCC’s confidence level for a candidate category exceeds 70 percent, then automatically publish it into that category; if not, route the decision to an assigned knowledge worker. Verity calls this “automated classification with manual oversight.”
Verity worked with DuPont to develop VCC. Whitney said the company uses VCC to manage a 25,000 node taxonomy. Internal users include “everyone from a bench chemist to a knowledge engineer—whoever.” He said VCC provides DuPont with “frictionless review between highly specialized knowledge workers, many with PhDs, and the knowledge engineering staff.”"
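The "automated classification with manual oversight" rule above is just a confidence threshold. Here is a minimal sketch of that routing decision, assuming a classifier that reports a confidence between 0 and 1. The `Candidate` type, the `route` function and its return values are hypothetical illustrations, not Verity's API; only the 70 percent figure comes from the article.

```python
from dataclasses import dataclass

# Threshold taken from the article's 70 percent example; everything else is hypothetical.
AUTO_PUBLISH_THRESHOLD = 0.70


@dataclass
class Candidate:
    document_id: str
    category: str
    confidence: float  # classifier confidence in [0.0, 1.0]


def route(candidate: Candidate, threshold: float = AUTO_PUBLISH_THRESHOLD) -> str:
    """Return 'publish' when confidence exceeds the threshold, else 'review'."""
    if candidate.confidence > threshold:
        return "publish"  # automatic classification into the category
    return "review"  # hand the decision to an assigned knowledge worker


if __name__ == "__main__":
    for c in [Candidate("doc-1", "Polymers", 0.92), Candidate("doc-2", "Polymers", 0.55)]:
        print(c.document_id, route(c))
```

Since the article says "exceeds", the comparison is strict: a document at exactly 70 percent still goes to a person.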
Monday, June 21, 2004
Semantic Web for 5th Graders
This was an interesting find using the Google Catalog to search for the Semantic Web.
It says: "Twenty-two lessons show students how to build a vocabulary...(KWL, SQ3R, semantic web making, outlining)...". Of course, it doesn't look like it's *the* Semantic Web.
Searching for Java by itself reveals some interesting results. Searching for Java in the Computers category reveals no hits; the category function seems to be broken.
Thursday, June 17, 2004
Global Scalability
Semantic Web: Hype-Bubble or an interesting research area? Links to a comment posted on the SUO list: "In 6 years (1998 to 2004) with ENORMOUS hype and funding, the semantic web has evolved from Tim BL's book to a few prototype applications, which are less advanced than technologies of the 1970s such as SQL, Prolog, and expert systems -- and they're doing it with XML, which is far less advanced than LISP, which was developed in the 1950s."
There are many interesting responses in the thread, including one from Frederick Kintanar: "...I do think the Web makes a big difference between earlier knowledge representation efforts and Semantic Web initiative. A big one is global scalability, where the key element is URI's (and their already deployed global acceptance). Hypertext was already a relatively mature technology in the research community, when Tim Berners-Lee hit upon what is needed to make it scalable: global identification..."
And no, this isn't about scaling a single agreed-upon ontology across the Web.
Spicy Searching
IBM expands search push with Masala "The computing giant, based in Armonk, N.Y., is gearing up to release Masala, a new version of its DB2 Information Integrator software that will let corporate employees retrieve information from databases, applications and the Web at the same time. Subsequent improvements will include a data-mining component code-named Criollo."
"Microsoft, though, isn't standing still. It is working on its own distributed search plan with Longhorn and a new release of its SQL Server database, code-named Yukon, and plans to build its own Internet search service. BEA Systems and others are working on similar technology."
"Information Integrator is a software layer than can pull data from different software--Oracle databases, Microsoft Excel, IBM's own DB2 and Lotus databases--with a single query. IBM and other companies are touting this "federated" database approach, in which searches tap into spread-out data sources, as a potentially cheaper alternative to shipping and storing large amounts of information in a single database."
"Microsoft, though, isn't standing still. It is working on its own distributed search plan with Longhorn and a new release of its SQL Server database, code-named Yukon, and plans to build its own Internet search service. BEA Systems and others are working on similar technology."
"Information Integrator is a software layer than can pull data from different software--Oracle databases, Microsoft Excel, IBM's own DB2 and Lotus databases--with a single query. IBM and other companies are touting this "federated" database approach, in which searches tap into spread-out data sources, as a potentially cheaper alternative to shipping and storing large amounts of information in a single database."
Ruined by Developers
How Microsoft Lost the API War Most of this is the standard "how the Web has won" story, but there were a couple of interesting snippets.
"WinFS, advertised as a way to make searching work by making the file system be a relational database, ignores the fact that the real way to make searching work is by making searching work. Don't make me type metadata for all my files that I can search using a query language. Just do me a favor and search the damned hard drive, quickly, for the string I typed, using full-text indexes and other technologies that were boring in 1973."
"RSS became fragmented with several different versions, inaccurate specs and lots of political fighting, and the attempt to clean everything up by creating yet another format called Atom has resulted in several different versions of RSS plus one version of Atom, inaccurate specs and lots of political fighting. When you try to unify two opposing forces by creating a third alternative, you just end up with three opposing forces. You haven't unified anything and you haven't really fixed anything."
"WinFS, advertised as a way to make searching work by making the file system be a relational database, ignores the fact that the real way to make searching work is by making searching work. Don't make me type metadata for all my files that I can search using a query language. Just do me a favor and search the damned hard drive, quickly, for the string I typed, using full-text indexes and other technologies that were boring in 1973."
"RSS became fragmented with several different versions, inaccurate specs and lots of political fighting, and the attempt to clean everything up by creating yet another format called Atom has resulted in several different versions of RSS plus one version of Atom, inaccurate specs and lots of political fighting. When you try to unify two opposing forces by creating a third alternative, you just end up with three opposing forces. You haven't unified anything and you haven't really fixed anything."
Sunday, June 13, 2004
Loom and Drools
Loom is now open source (a BSD-like license). "Loom is a language and environment for constructing intelligent applications. The heart of Loom is a knowledge representation system that is used to provide deductive support for the declarative portion of the Loom language. Declarative knowledge in Loom consists of definitions, rules, facts, and default rules. A deductive engine called a classifier utilizes forward-chaining, semantic unification and object-oriented truth maintainance technologies in order to compile the declarative knowledge into a network designed to efficiently support on-line deductive query processing."
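Forward chaining, one of the techniques Loom's classifier is described as using, means applying rules to the known facts until nothing new can be derived. The sketch below shows only that basic fixed-point loop, under the assumption that rules are plain functions over a set of facts; it is not Loom's classifier, and it does no semantic unification or truth maintenance. The rule names are hypothetical.

```python
from typing import Callable, Iterable, List, Set, Tuple

Fact = Tuple[str, str]  # (concept, individual)
Rule = Callable[[Set[Fact]], Iterable[Fact]]


def forward_chain(facts: Set[Fact], rules: List[Rule]) -> Set[Fact]:
    """Apply every rule repeatedly until no new facts are derived (a fixed point)."""
    known = set(facts)
    while True:
        new = {f for rule in rules for f in rule(known)} - known
        if not new:
            return known
        known |= new


# Hypothetical rules: every Chemist is a Scientist; every Scientist is a Person.
def chemist_is_scientist(kb: Set[Fact]) -> Iterable[Fact]:
    return [("Scientist", who) for concept, who in kb if concept == "Chemist"]


def scientist_is_person(kb: Set[Fact]) -> Iterable[Fact]:
    return [("Person", who) for concept, who in kb if concept == "Scientist"]
```

Rule order doesn't matter here: `forward_chain({("Chemist", "alice")}, [scientist_is_person, chemist_is_scientist])` still ends with alice as a Chemist, a Scientist and a Person, it just takes an extra pass.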
An article on theServerSide.com introduces "...the JSR-94 Rules Engine API and an Open Source product called Drools, the forerunner implementation of this up-and-coming technology...it can scale to incorporate and execute hundreds of thousands of rules in a manner which is an order of magnitude more efficient then the next best algorithm." Drools Homepage.
WSDL2OWL-S 1.1 Released
WSDL2OWL-S "...provides a partial translation between WSDL and OWL-S. The results of this translation are a complete specification of the Grounding, partial specification of the Process Model and Profile and Daml Class file, when at least one of the input and output messages are of XSD Complex type." Download it here.
Friday, June 11, 2004
TKS the RDF Database
The Semantic Web in the Enterprise ""RDF provides a simple model to represent logical statements in the form of subject, predicate, object," said Daconta. "While this follows a linguistic approach, that same model can also represent resource, property, and value. This triple model is a powerful underpinning of more powerful languages layered on top of it like OWL."
He referenced one of his non-RDF projects that uses explicit database tables to make certain data associations. This approach is neither scalable nor flexible. The unique nature of RDF, however, can provide a flexible mechanism that would allow far greater associative capabilities, thereby increasing the ability to query and make inferences on topic matters not explicitly hard-wired into tables.
Daconta added that good strides have been made in commercial products supporting Semantic Web technologies, including RDF databases (such as Tucana Knowledge Server), ontology editors and inference engines (such as Network Inference's Cerebra Server), and data source integration engines (such as Unicorn System)."
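To make the subject-predicate-object model concrete, here is a minimal sketch of a triple store with pattern matching, where `None` acts as a wildcard. The `ex:` names are placeholders rather than a real vocabulary, and a production RDF database indexes triples far more cleverly than a list scan.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def match(store: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return the triples matching the pattern; None matches anything."""
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Hypothetical example data using placeholder ex: names.
store: List[Triple] = [
    ("ex:report42", "ex:author", "ex:alice"),
    ("ex:report42", "ex:topic", "ex:polymers"),
    ("ex:alice", "ex:worksIn", "ex:chemistry"),
]
```

This is the flexibility Daconta contrasts with hard-wired tables: recording a new kind of association, say `store.append(("ex:report42", "ex:cites", "ex:report7"))`, needs no schema change, and `match(store, p="ex:cites")` can query it immediately.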
The author is Ken Fromm of Loomia, "a semantic technology software company building an RDF/FoaF-based identity and relationship stack".
Behind the Wall
Something Useful This Way Comes "Mike Champion offered an optimistic note, suggesting that Semantic Web technology may first flourish behind the enterprise firewall, in a way reminiscent of the earliest days of Netscape's corporate success:
The other previously missing ingredient is that real organizations have at least something approximating an implicit ontology in their database schema, standard operating procedures, official vocabularies, etc. It is at least arguable that the technologies that have emerged from the Semantic Web efforts allow all this diverse stuff to be pulled together in a useful way -- ontology editors, inferencing engines, semantic metadata repositories, etc. I'm seeing real success stories in my day job, and a coherent story is starting to be told by a number of vendors, analysts, etc.
Champion here makes a similar point to the one I argued in an article last fall ("Commercializing the Semantic Web"), namely, that there exist today several startups and fledgling ventures that are selling Semantic Web technologies to corporate clients, including Network Inference, Tucana Technologies, and others."
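One simple way to surface the "implicit ontology" in an existing database schema is to read each row as a set of statements: the row becomes a subject, each column a predicate. The sketch below assumes a hypothetical URI convention under `http://example.org/`; it is an illustration of the idea, not any vendor's mapping.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def row_to_triples(table: str, key: str, row: Dict[str, object],
                   base: str = "http://example.org/") -> List[Triple]:
    """Turn one database row into RDF-style triples. The subject URI is built
    from the table name and primary key, and every other column becomes a
    predicate. The URI scheme is a hypothetical convention."""
    subject = f"{base}{table}/{row[key]}"
    return [(subject, f"{base}{table}#{column}", str(value))
            for column, value in row.items() if column != key]
```

For example, `row_to_triples("employee", "id", {"id": 7, "name": "Ada", "dept": "R&D"})` yields two triples about `http://example.org/employee/7`, which can then be merged with triples from other sources without a shared table layout.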
"Honestly, I don't know whether to laugh, because with WinFS Microsoft seems to be buying into the Semantic Web idea, or cry, because with WinFS Microsoft seems to be embracing-and-extending the Semantic Web idea. Oh well -- outside of the realm of unenforced US antitrust legislation, Microsoft is like gravity. Eventually, you just learn to work around it."
Thursday, June 10, 2004
Semantic Web Tutorial
Tutorial on basic SW Technologies Nice to get mentioned in the same breath as Sesame, Redland, etc. (though not quite alongside Jena, which got two slides of its own). Links also included: graphical editors and the Semantic Web Application Survey.
WinFS and RDF to Square Off?
Questions about Longhorn, part 2: WinFS and semantics "Today's personal information systems are organized hierarchically. WinFS proposes that they be organized semantically. A number of observers have noted a family resemblance between RDF (Resource Description Framework) "triples" and WinFS relationships. An RDF triple, in geek-speak, is a subject-predicate-object relation. Sets of RDF triples can be (and Semantic Web people say must be) used to represent and organize knowledge."
Where do you get Longhorn today? RDF and schemas for RDF are available now. You can take both bottom-up and top-down schemas and deploy them locally and globally. Last year I asked, "Should the RDF model be integrated into the File System?" The answer still seems to be 'yes'.
Danny has a good summary of part 1 of this discussion.
Wednesday, June 09, 2004
Can't get enough of the Semantic Web?
The Special Interest Group on Semantic Web and Information Systems. The first bulletin (PDF) is 75 pages of Semantic Web goodness. It includes interviews with Jim Hendler and Amit Sheth, a look at where SIGSEMIS is positioned, and various papers on the Semantic Web. There are also related events, an industry column (about Semagix Freedom), and book reviews.
Tuesday, June 08, 2004
Cynical Development
A fresh look at the waterfall "Analysis -> Dream, Design -> Guess and Waffle...Support -> Duck and Deny"
Monday, June 07, 2004
Sesame Speed
The KIM Platform: Performance and Scale "The result: a repository of 15M explicit statements, describing 1.2M entities, is manageable with an indicative upload speed of 1300 statements/sec...The experiment was performed on a 2xOpteron 240 (1.4GHz) server with 6GB of RAM - a $3000-worth, brandless machine. 64-bit beta-versions for Amd64 of Windows 2003 Server and JDK 1.5.0 were used."
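As a rough back-of-the-envelope check on those figures, assuming the indicative 1300 statements/sec held for the whole load (the quote doesn't say it did), the upload time and the repository's density work out as:

```latex
t \approx \frac{15 \times 10^{6}\ \text{statements}}{1300\ \text{statements/s}} \approx 1.15 \times 10^{4}\ \text{s} \approx 3.2\ \text{h}
\qquad
\frac{15 \times 10^{6}\ \text{statements}}{1.2 \times 10^{6}\ \text{entities}} = 12.5\ \text{statements per entity}
```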
iTQL evaluated
iTQL evaluation We do support optional match through the ugly sub-query syntax, but apart from that I think Kendell has it right. Aliases allow you to shorten URIs.
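Aliases work by expanding a short prefix into a full namespace URI before the query runs. A minimal sketch of that expansion step, assuming an `alias:local` naming form; this is not iTQL's parser, and the `expand` function and `ALIASES` table are hypothetical (the two namespace URIs are the real RDF and FOAF ones).

```python
from typing import Dict


def expand(name: str, aliases: Dict[str, str]) -> str:
    """Expand 'alias:local' into a full URI using the alias table.
    Names whose prefix isn't a known alias (including full URIs) are
    returned unchanged."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in aliases:
        return aliases[prefix] + local
    return name


ALIASES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}
```

So `expand("foaf:knows", ALIASES)` gives `http://xmlns.com/foaf/0.1/knows`, while an already-full `http://...` URI passes through untouched because `http` isn't an alias.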
Sunday, June 06, 2004
Datalog and Inferencing
Implementing OWL Lite in rule-based systems and recursion-enabled relational DBs "The semantics of negation in stratified Datalog is not compatible with the semantics of negation in Description Logics...Alternative, more compatible semantics for negation in Datalog exist, namely well-founded semantics, that also relies on such a three-valued logics. However, this is not semantics available in SQL:1999 compliant databases. Well-founded semantics only regards the minimal model when computing negation, whereas Description Logics regard all possible models. Hence, they are also incompatible."
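The incompatibility is easiest to see on a single query. Datalog and SQL use negation as failure: whatever isn't derivable is false. Description Logics are open-world: an absent fact is merely unknown unless its negation follows. The deliberately simplified sketch below contrasts the two by treating "follows" as "explicitly asserted false"; it does not implement stratification, well-founded semantics, or real DL reasoning, and all names are hypothetical.

```python
from enum import Enum
from typing import Set, Tuple

Fact = Tuple[str, str]  # (predicate, individual)


class Truth(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


def not_closed_world(facts: Set[Fact], fact: Fact) -> Truth:
    """Negation as failure (Datalog/SQL style): anything not known is false,
    so its negation is true."""
    return Truth.FALSE if fact in facts else Truth.TRUE


def not_open_world(facts: Set[Fact], known_false: Set[Fact], fact: Fact) -> Truth:
    """Simplified open-world reading (Description Logic style): the negation
    holds only when the fact is known to be false; otherwise it's unknown."""
    if fact in facts:
        return Truth.FALSE
    if fact in known_false:
        return Truth.TRUE
    return Truth.UNKNOWN
```

With only `("Married", "bob")` recorded, asking whether carol is *not* married gives TRUE under the closed-world reading and UNKNOWN under the open-world one, which is the mismatch the paper describes.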
Two other related papers: Bubo - Implementing OWL in rule-based systems and Description Logic Programs: Combining Logic Programs with Description Logic.
This maps well to three-valued logic (3VL) and the recursive predicate functions trans and walk.
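The recursion that both Datalog and recursive SQL (SQL:1999) support is typically used for transitive relations such as class hierarchies. Below is a minimal sketch of transitive closure by semi-naive evaluation of the rule `path(X, Z) :- path(X, Y), edge(Y, Z)`; it is a generic illustration, not how trans or walk are implemented.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]


def transitive_closure(edges: Set[Edge]) -> Set[Edge]:
    """Compute the transitive closure of a binary relation, as the recursive
    rule  path(X, Z) :- path(X, Y), edge(Y, Z)  would, using semi-naive
    evaluation: each round only extends the pairs found in the previous round."""
    closure = set(edges)
    delta = set(edges)
    while delta:
        new = {(x, z) for (x, y) in delta for (y2, z) in edges if y == y2}
        delta = new - closure
        closure |= delta
    return closure
```

Because only newly found pairs are extended each round, the loop stops once a round adds nothing, so cycles such as `{("a", "b"), ("b", "a")}` terminate.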
Not Relational Enough
Two recent articles about XML and database technology. One of the frequently asked questions about Kowari concerns relational databases. Taking the pedantic view, the two databases mentioned are not relational but SQL databases. I've previously posted links about XML management systems too.
In "XML, the New Database Heresy": "One of the major benefits of using XML in relational databases is that it is a lot easier to deal with fluid schemas or data with sparse entries with XML. When the shape of the data tends to change or is not fixed the relational model is simply not designed to deal with this. Constantly changing your database schema is simply not feasible and there is no easy way to provide the extensibility of XML where one can say "after the X element, any element from any namespace can appear". How would one describe the capacity to store “any data” in a traditional relational database without resorting to an opaque blob?
I do tend to agree that some people are going overboard and trying to model their data hierarchically instead of relationally which experience has thought us is a bad idea."
Edd Dumbill wrote in "Ron Bourret on XML and databases": "My guess is that everything will pick up on this front in a year or two, with companies moving towards what I consider the holy grail of XML support in relational databases: native storage behind a first-class XML data type, XQuery support with extensions for (a) including relational data or SQL queries and (b) updates, SQL/XML support with extensions for embedded XQuery queries, and support for JSR 225 (see below)."
One thing we found in implementing TKS/Kowari is that it pays to stick closely to the relational model. Everything comes down to tuples, and I think the problem with putting RDF on top of SQL databases is that they aren't relational, or aren't "relational enough" (whatever that means).
Actually, the DAWG is asking for comments; perhaps it could ensure that queries are expressed in relational algebra. That should prevent things like NULL from getting in there.
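Here is what "everything comes down to tuples" looks like for query results: each solution is a set of variable bindings, and combining two patterns is a relational natural join. A minimal sketch, assuming a hypothetical binding format keyed by variable names such as `?person`; unmatched rows are simply dropped, so no NULLs can appear.

```python
from typing import Dict, List

Binding = Dict[str, str]  # variable name -> value, e.g. {"?person": "alice"}


def natural_join(left: List[Binding], right: List[Binding]) -> List[Binding]:
    """Relational natural join of two sets of variable bindings: rows combine
    only when they agree on every variable they share. Rows without a partner
    are dropped rather than padded with NULLs."""
    result: List[Binding] = []
    for l in left:
        for r in right:
            shared = l.keys() & r.keys()
            if all(l[v] == r[v] for v in shared):
                result.append({**l, **r})
    return result
```

Joining `[{"?person": "alice", "?name": "Alice"}, {"?person": "bob", "?name": "Bob"}]` with `[{"?person": "alice", "?email": "alice@example.org"}]` keeps only alice; an SQL LEFT OUTER JOIN would instead keep bob with a NULL email, which is exactly where NULL sneaks in.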
Why FOAF? Why XFN?
This is a fairly old blog entry (four months old), but I hadn't seen it before. It discusses another social software format, XFN.
Social software snippets "So Jonas poses a good question, why was FOAF invented? Perhaps an attempt to justify RDF? Is this an example of a solution looking for a problem? A clever acronym looking for a reason to exist?
Speaking of which, the biggest irony I see about FOAF is its name, which stands for "Friend of a Friend", and yet the technology has nothing to do with "friends". Like Jonas said, it's simply vCard recast in RDF, except for the "knows" relationship, which itself is quite meaningless (quite ironic for a Semantic Web effort), as it provides no more meaning than a plain hyperlink. Why work so hard for so little?
FOAF could be saved however, simply by adding an XFN module, thus enabling FOAF to finally fulfill its namesake, and represent friendship rather than just claiming to."
Friday, June 04, 2004
TCO of open source software
Weighing the costs of open source "While MySQL passed performance tests, the IT staff at the credit card processing company became concerned that MySQL didn't have enough formal support backing it up. What if the credit card processing company's databases failed, asked Tim Kelly, who serves as technical director of technology at TSYS.
"We were not prepared to deal with new support contracts and rely on an alternative database with our customer data." Tim Kelly director of technology, TSYS
"We have a procedure on how to roll back transactions when something goes wrong, but in the event that everything you try doesn't work and you look for support, there area million comments out there on the Web -- and newsgroup articles on MySQL -- but in production scenarios, you can't really rely on that," Kelly said.
Figuring out the total cost of ownership for a DBMS can be more complex than many companies expect at the outset, said Mike Schiff, vice president of data warehousing and business intelligence at Sterling, Va.-based Current Analysis.
"The cost of ownership isn't just the cost of acquisition or maintenance; it's also the vendor responsiveness when you've got a critical issue and downtime," Schiff said. "The cost of dollars to a company that has a database down can be staggering.""
"We were not prepared to deal with new support contracts and rely on an alternative database with our customer data." Tim Kelly director of technology, TSYS
"We have a procedure on how to roll back transactions when something goes wrong, but in the event that everything you try doesn't work and you look for support, there area million comments out there on the Web -- and newsgroup articles on MySQL -- but in production scenarios, you can't really rely on that," Kelly said.
Figuring out the total cost of ownership for a DBMS can be more complex than many companies expect at the outset, said Mike Schiff, vice president of data warehousing and business intelligence at Sterling, Va.-based Current Analysis.
"The cost of ownership isn't just the cost of acquisition or maintenance; it's also the vendor responsiveness when you've got a critical issue and downtime," Schiff said. "The cost of dollars to a company that has a database down can be staggering.""
Subscribe to:
Posts (Atom)