More News: 01/01/2003

Friday, January 31, 2003

Mapping RDF to Relational Stores

SWAD-Europe: Mapping Semantic Web Data with RDBMSes

"We've tried to address:
1. I want to implement an RDF triple store. How can I do this with a relational database?
2. I have a legacy relational database. How can it be exposed as RDF?"

They say that BDB is an order of magnitude faster than an RDBMS and attribute the gap to transaction support. They argue that a two-index approach is the best tradeoff (indexing subject+predicate and object). Also noted are the performance problems when changing the schema (they note Sesame's use of PostgreSQL). There's a mailing list and blog available too.
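
To make the two-index idea concrete, here's a minimal sketch of a single-table triple store in SQLite. The table layout, the function names (open_store, add, match) and the ex: identifiers are my own assumptions for illustration, not anything taken from the SWAD-Europe report.

```python
import sqlite3

def open_store(path=":memory:"):
    """Create one triples table plus the two indexes: (subject, predicate) and (object)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triples ("
        " subject TEXT NOT NULL, predicate TEXT NOT NULL, object TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sp ON triples (subject, predicate)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_o ON triples (object)")
    return conn

def add(conn, s, p, o):
    """Insert a single triple."""
    conn.execute("INSERT INTO triples VALUES (?, ?, ?)", (s, p, o))

def match(conn, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    clauses, params = [], []
    for column, value in (("subject", s), ("predicate", p), ("object", o)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    sql = "SELECT subject, predicate, object FROM triples" + where
    return conn.execute(sql, params).fetchall()

# Hypothetical data, purely for illustration.
store = open_store()
add(store, "ex:sesame", "ex:uses", "ex:PostgreSQL")
add(store, "ex:sesame", "rdf:type", "ex:TripleStore")
print(match(store, s="ex:sesame", p="ex:uses"))
```

Lookups by subject+predicate or by object can use an index; a lookup by predicate alone falls back to a table scan, which is the tradeoff being accepted.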

Thursday, January 30, 2003

PMI-IR

Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL "This paper presents a simple unsupervised learning algorithm for recognizing synonyms, based on statistical data acquired by querying a Web search engine. The algorithm, called PMI-IR, uses Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of pairs of words. PMI-IR is empirically evaluated using 80 synonym test questions from the Test of English as a Foreign Language (TOEFL) and 50 synonym test questions from a collection of tests for students of English as a Second Language (ESL). On both tests, the algorithm obtains a score of 74%. PMI-IR is contrasted with Latent Semantic Analysis (LSA), which achieves a score of 64% on the same 80 TOEFL questions. The paper discusses potential applications of the new unsupervised learning algorithm and some implications of the results for LSA and LSI (Latent Semantic Indexing)."
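
A rough sketch of the idea, assuming the simplest form of the score: search hits for the problem word together with a candidate, divided by hits for the candidate alone. The hit counts below are invented stand-ins for a search engine, and the names pmi_ir_score, best_synonym and toy_hits are mine, not the paper's.

```python
def pmi_ir_score(problem, choice, hits):
    """Co-occurrence hits divided by the choice's own hits (higher means more similar)."""
    both = hits(f"{problem} AND {choice}")
    alone = hits(choice)
    return both / alone if alone else 0.0

def best_synonym(problem, choices, hits):
    """Pick the candidate with the highest score for the problem word."""
    return max(choices, key=lambda c: pmi_ir_score(problem, c, hits))

# Invented hit counts standing in for a search engine; not real data.
TOY_HITS = {
    "large": 800, "small": 900, "green": 700,
    "big AND large": 400, "big AND small": 180, "big AND green": 35,
}

def toy_hits(query):
    """Hypothetical hit-count lookup; a real version would query a search engine."""
    return TOY_HITS.get(query, 0)

print(best_synonym("big", ["large", "small", "green"], toy_hits))
```

The problem word's own frequency is the same for every candidate, so it can be left out of the ratio without changing the ranking.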

Wednesday, January 29, 2003

Ted Nelson on the Semantic Web

Buy In "XML is not an improvement but a hierarchy hamburger. Everything, everything must be forced into hierarchical templates! And the "semantic web" means that tekkie committees will decide the world's true concepts for once and for all. Enforcement is going to be another problem :) It is a very strange way of thinking, but all too many people are buying in because they think that's how it must be.

There is an alternative.

Markup must not be embedded. Hierarchies and files must not be part of the mental structure of documents. Links must go both ways. All these fundamental errors of the Web must be repaired. But the geeks have tried to lock the door behind them to make nothing else possible.

We fight on."

And I've just been reading about all the things that Semantic Web developers have been doing to give users control and to adapt and evolve concepts.

Tuesday, January 28, 2003

Swing is just Better

Comparing SWT and Swing "SWT is really just a thin veneer over the native toolkit, so as one would expect, all application model data must be copied from application data structures to native toolkit components. Developers can't choose a representation that best matches their applications needs. Consider a spreadsheet application, in which the user is presented with a very large table but the table data is sparse. It is trivial in Swing to create such a model and bind it to a JTable. The time required to construct such a JTable is constant, whether 10,000 or 1,000,000 rows are shown to the user.[3]. The same is not true of SWT. It takes SWT 5 seconds to show a 40,000 row table [4], and I gave up waiting for SWT to display a 100,000 row table after one minute. Additionally, as SWT creates a native widget for each row, memory grows and grows with each additional row. The same is not true of Swing."

There and back again

I thought I'd written about this before, but this is RDF Transformation, not RDF Template. The process maps an XML model to RDF, maps that RDF to a common RDF ontology, and then converts the result to the desired XML model. Unsurprisingly, this is for WSDL or, to be more buzzword compliant, Semantic Web Services.

http://www.cs.vu.nl/~borys/RDFT/tutorial/index.html

Towards the Semantic Web

Well I've finally received this book and the first thing is that it's much smaller than I thought. Also, the content is not all that different from deliverables from the On-To-Knowledge group.

Software Archive

"Although a number of people online are (unofficially) archiving console software and game ROMs, nobody is making sure there are perfect digital copies and databases of the PC/Mac CD 'multimedia' boom and bust of the early and mid 90s. This is a _vital_ pre-broadband era where some of the first widely available ideas of 'virtual reality' and cinema-quality 3D graphics for the home were being explored (see 'Myst'!).

Although the Internet has now superceded a lot of the multimedia ideals the Macromedia collection stands for, that's precisely WHY the collection is important - as a document of what the era stands for. As an added impulse, the collection is stored on decayable CD media, and it's not strictly clear how long it will be until these discs will lose their reflective surfaces and become unplayable (some people claim 10 to 25 years!) "

http://www.archive.org/cdroms/cdroms.php

The Third Place

Some days you come across a bunch of links that all seem to be pointing at the same thing, the same big thing in flaming neon writing, 30 metres high.

A quote from a quote: "Benkler suggest that we are seeing the broad and deep emergence of a new, third mode of production in the digitally networked environment. He calls this mode “commons-based peer production,” to distinguish it from the property- and contract-based modes of firms and markets. Its central characteristic is that groups of individuals successfully collaborate on largescale projects following a diverse cluster of motivational drives and social signals, rather than either market prices or managerial commands." A followup article lists the problems with intellectual honesty and control in the field of Natural Language Processing.

The paper is "Coase’s Penguin, or, Linux and The Nature of the Firm". His other papers present very much the same views as Lessig and others (and predate many), such as open spectrum and building commons for intellectual property. I'd actually check the references to Lessig's book if they both weren't on loan (irony intended).

In Cory Doctorow's interview he claims that money is a form of Whuffie (brownie points, how much esteem you have garnered). Of course, having 50,000 downloads is quite a lot of Whuffie too.

Another person who lives off of Whuffie is Esther Dyson: "From the business point of view--not to overstate it--intellectual property is dead; long live intellectual process. Long live service; long live performance. The intellectual assets should be distributed for free, and then you should use them as advertising to charge for speaking, consulting, for software support--for T-shirts. The Lion King is great advertising for T-shirts, baseball caps, lunch boxes. To me Java [software] is advertising for Sun Microsystems."

Monday, January 27, 2003

Why they don't share

Five reasons people don't tell what they know and you only need one reason to share.

Saturday, January 25, 2003

Palm continues to lead

For Palm, a Successful Parting... "Estimates of the share of handhelds running PalmSource software range from 50% to 80%, depending on just how the market is defined and who's doing the counting. No one questions, however, that Palm retains a very hefty lead over second-place Microsoft and an assortment of also-rans that includes Research in Motion (RIMM ), Nokia (NOK ), and Sharp (SHCAY )."

How spam doesn't work

SMBmeta is different than the old Keywords meta tags "Some of the things we learn from the Keywords meta tag experience are the following:

* Showing the searcher data used in doing the search can help the searcher determine if a result is relevant and ignore inappropriate ones
* Find ways to have third party validation
* Make it easy to flag abusers
* Do not be biased by simple repetition
* Some detailed queries can only be answered using data provided by the web site author
* Authors will go to the trouble of providing meta data if they think it will help searchers
* If it's easy to be abused, it will"

Practical RDF Chapters

At Long Last chapters 1, 2, 3, 4, 5, 6 and 9 are now available. Looks like chapters 15 and 16 have been rejigged a little, with 15 becoming non-commercial and 16 becoming commercial uses of RDF.

Friday, January 24, 2003

Zoe 0.3.6

Zoe This continues to be cranked out and now supports a built-in FTP server so you can mount attachments directly on your file system.

Magic Roundabout

Round And About "...the most important work in reshaping the series was undertaken by the actor selected to provide the narration, Eric Thompson...He loathed the original French stories, which it is claimed he regarded as simplistic and dull, and refused to work with a straightforward translation. Instead, he watched the episodes with the soundtrack turned down, created new names and personalities for the characters, and invented completely new storylines to match the on-screen action. The resultant scripts were sharp and witty, and traded in language and humour that was far in advance of the level of sophistication that might usually have been expected in a programme of this nature." I just learnt that there's going to be a CGI version of it.

DSpace

An Open Source Dynamic Digital Repository "It is an attempt to address a problem that MIT faculty have been expressing to the Libraries for the past few years. As faculty and other researchers develop research materials and scholarly publications in increasingly complex digital formats, there is a need to collect, preserve, index and distribute them: a time-consuming and expensive chore for individual faculty and their departments, labs, and centers to manage themselves. The DSpace system provides a way to manage these research materials and publications in a professionally maintained repository to give them greater visibility and accessibility over time...Only three fields are required: title, language, and submission date, all other fields are optional...DSpace is the first open source digital repository system to tackle the complex problem of how to accommodate the differing submission workflows needed for a multidisciplinary system...All original code is in the Java programming language. Other pieces of the technology stack include a relational database management system (PostgreSQL), a Web server and Java servlet engine (Apache and Tomcat, both from the Apache Foundation), Jena (an RDF toolkit from HP Labs), OAICat from OCLC, and several other useful libraries."

Tuesday, January 21, 2003

Snapping with SnipSnap

Moblogging go! "SnipSnap DV now supports image attachments (or inlines) in mails. Need to test it with a mobile with a camera."

Right

Tech Predictions for the Decade "Take, for example, the concept of the "semantic" Web, a next-generation version of the Internet that will enable users to obtain more precise information by utilizing computerized "agents" that find exactly what they want online. Today, if users search for "books about Agatha Christie" on Google, they receive hundreds of search results leading to information on the books written by Christie. In contrast, semantic Web agents will be intelligent enough to decipher the word "about" and find biographies on the writer rather than her works, Gantz said."

I'm not even going to start with what's wrong with the statements above. Take a deep breath and start again.

Meanwhile, the Butler Group says about OLAP that: "It is no longer seen as a specialist solution, for business analysts and experienced line of business managers, but as an end user solution, supporting the information needs of a broad range of individuals. Furthermore, OLAP needs to be viewed not as a standalone tool, but as part of a larger integrated BI solution." It's all about ease of use rather than pure performance.

Joseki 1.0 Released

Joseki is an API to enable Jena models to be accessed over the web using POST and GET. I like this way of doing things, although it might not be so good for large queries. You could add page and page size values (requiring the server to keep state). They've taken the odd step of using JDK 1.4 for logging but not for regular expressions. Even the ORO group are considering using parts of JDK 1.4 (CharSequence) to get better performance.

http://www.joseki.org/

Andy replies "Jena itself runs with Java 1.3 (and Java 1.2, I think - we don't test that during the release cycle) so we include the ORO regular expression package (actually used in the query filters for RDQL).

It's the Joseki server that uses Java 1.4 logging. Too many logging points were arising; there is web server logging, Joseki logging, logging by any query language plugins, ..., so I decided to use Java 1.4.

I also realise that not everyone wants to or is able to move to Java 1.4 just yet. If this comes a problem for anyone I'll fix it." Where's that email and web integrator when you need it.

Sunday, January 19, 2003

Proguard

Proguard is a free Java class file shrinker and obfuscator. It can detect and remove unused classes, fields, methods, and attributes. It can then rename the remaining classes, fields, and methods using short meaningless names. The resulting jars are smaller and harder to reverse-engineer.

What's an Invoice?

OASIS to preview XML system for standard business documents ""An analyst, however, said he doubted OASIS would succeed in providing standard forms for business. "How can hospitals and manufacturing firms and aerospace industries all share the same notion of an invoice?" asked Ronald Schmelzer, analyst at ZapThink in Waltham, Mass.

"Even if they all adopt the core business language, they're going to have different extensions on it," Schmelzer said. "I don't think this is going to be any magic pill," for getting documents to agree with each other, he added.

Schmelzer said he favored a concept known as the semantic Web, which would have computers are more intelligent in understanding semantics.""

Saturday, January 18, 2003

Trackback to the Future

TrackBack in motion. Update on Trackback and related ideas.

Trackback Technologies contains some new Perl and PHP code for the Recent Trackbacks and Backtrack.

I'd like an Extended Copyright with that

I WANT TO BELIEVE Countries can agree to reciprocal protection. This effectively means that a published work has the rights of the country in which it was published. In some ways this is a much more sensible agreement. The sovereignty of a country is preserved. Of course, this could mean boatloads of American copyright refugees.

Anything More?

once more into the breach, my friends? "The easy answer is no. The Supreme Court has ruled that Congress has the power perpetually to extend the terms of existing copyrights. This brief “experiment with the public domain,” as the NYT eloquently put it, is over. In twenty years, we can expect terms will be extended again. There is no good reason to expect anything different.

The hard answer is, well, yes..."

I wonder if international agreements like TRIPS (Article 13) could be used against the US. Of course, something that treats a computer program, with or without source, the same as literary works should probably be ignored.

Also, an amusing interview with Mickey on the decision and comic strip.

Correcting Categorization

Cleaning iTunes "...it works poorly for show tunes -- people seem to really care about the performers, so they get really specific with the artist portion of the ID, to the extent that every song on the CD has a different artist string even though it is the same cast...Some of the things I want to clean up are bad data, and some are matters of Gracenote policy I disagree with, like inserting "(Disk N)" at the end of album names for multi-CD sets. In my iTunes Music Library, I get to have things my way...If I change the tag in iTunes, it updates the iTunes database and playlists, updates the ID tag in the actual MP3 file, and moves the file to a new artist directory if my music is in the iTunes Music Library. This is all very cool." The author writes some AppleScript to modify the ID tags to his taste.
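
The author does this in AppleScript against iTunes itself. As a rough illustration of the two clean-ups described (dropping the "(Disk N)" suffix and giving every track on a cast album the same artist string), here's a small sketch that works on plain dictionaries rather than real ID tags. The function names and the sample track data are made up.

```python
import re

# Matches a trailing "(Disk N)" or "(Disc N)" marker, case-insensitively.
DISK_SUFFIX = re.compile(r"\s*\((?:disk|disc)\s*\d+\)\s*$", re.IGNORECASE)

def clean_album(name):
    """Strip a trailing "(Disk N)" marker from an album name."""
    return DISK_SUFFIX.sub("", name)

def unify_artist(tracks, artist):
    """Return copies of the track dicts that all share one artist string."""
    return [{**track, "artist": artist} for track in tracks]

# Made-up track data, for illustration only.
tracks = [
    {"title": "Overture", "artist": "Orchestra", "album": "Some Musical (Disk 1)"},
    {"title": "Finale", "artist": "Full Company", "album": "Some Musical (Disk 2)"},
]
cleaned = [
    {**track, "album": clean_album(track["album"])}
    for track in unify_artist(tracks, "Original Cast")
]
print(cleaned)
```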

The Misuse of Metadata

Creative Comments: On the Uses and Abuses of Markup This pretty much says the same things I've been saying of late, such as "The Semantic Web isn't a replacement, it's a supplement." I'm not going to bother bringing a whole lot of context here as the article is actually about the Creative Commons' (and apparently Trackback's) misuse of the comment tag and embedding RDF into HTML.

"Surely, I wondered, they know that putting RDF into HTML comments is an inelegant way of relating human and machine-consumable resources? Creative Commons, which has taken on the laudable task of creating RDF descriptions of common licensing terms for intellectual property, suggests its users associate machine-consumable licensing terms...with the web resources to which they apply by embedding RDF directly in HTML comments...In short, markup language comments are for communicating with humans, not with machines. The problem with incoherent strategies is that it's not always possible to predict all the ways in which they will fail or go bad."

It's a shame that a really good use of RDF has decided to do it in such a non-standard, if simple, way. RSS is a really good example of how this separation is good. New proposed standards like SMBMeta (or Whoogle) propose the same thing. The author suggests the link tag to link to the licence. I guess you could also use the meta tag to embed the extra metadata.

Using this example:
<meta name="dc.title" content="Anti-war demonstrators">
<link rel="schema.dc" href="http://purl.org/metadata/dublin_core_elements#title">
<meta name="dc.date" content="2002-09-28T14:36:54">
<link rel="schema.dc" href="http://purl.org/metadata/dublin_core_elements#date">

This is using the Dublin Core Guide.
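
One point in favour of meta and link tags over comments is that an ordinary HTML parser can read them. Here's a minimal sketch using Python's standard html.parser; the DublinCoreParser class and its attribute names are my own, and it only looks at names starting with "dc." and rels starting with "schema.".

```python
from html.parser import HTMLParser

class DublinCoreParser(HTMLParser):
    """Collect <meta name="dc.*"> values and <link rel="schema.*"> targets."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self.schemas = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        rel = (attrs.get("rel") or "").lower()
        if tag == "meta" and name.startswith("dc."):
            self.metadata[name] = attrs.get("content") or ""
        elif tag == "link" and rel.startswith("schema."):
            self.schemas.setdefault(rel, []).append(attrs.get("href") or "")

PAGE = """
<meta name="dc.title" content="Anti-war demonstrators">
<link rel="schema.dc" href="http://purl.org/metadata/dublin_core_elements#title">
<meta name="dc.date" content="2002-09-28T14:36:54">
<link rel="schema.dc" href="http://purl.org/metadata/dublin_core_elements#date">
"""

parser = DublinCoreParser()
parser.feed(PAGE)
print(parser.metadata)
print(parser.schemas)
```

Doing the same for RDF hidden in comments would mean fishing text out of the comment and handing it to a second parser.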

Thursday, January 16, 2003

Buffy Blogs

The First Evil, Willow, Andrew, Spike, Faith, Dawn, and of course Buffy.

Wednesday, January 15, 2003

Nokia Tools for Linux

Nokia woos Linux programmers "Nokia, the top seller of mobile phones, has released software to let Linux programmers develop Java software for its mobile phones."

These are for JBuilder and NetBeans, targeting Nokia Series 40.

The download is available here.

Scopeware

Why Automatic Information Management is Doomed to Fail "I firmly believe the answer lies within each individual user. The user should always be in control of their information. You should be able to organize your information in a way that makes sense to you, rather than have to figure out how the computer has organized your information.

I propose an information management system where files can be stored in categories, and metadata attributes would be attached to each file such as description, keywords, context, and other information that users can enter."

The rest of this is a review of Scopeware and how it introduces a whole new set of problems by trying to make hard drives more searchable. I really haven't seen a 3D file system navigation that I like (Navigate, Throw up, 3DOSX) although most of them look impressive. The key to all of this is more feedback. With correct tools they can help not only in searching and automatically categorizing files but also in detecting mistakes.

Source Blogs

Apple snub stings Mozilla The interesting thing about this article is that almost all the source information came from Blogs.

RDF Buffy Anyone

RDF has ruined me. Every time I see a ball and stick diagram I think it must be done in RDF. Like the Buffy Sex Chart. Think of all the wonderful reifications that could be made and what interesting properties you could add.

Tuesday, January 14, 2003

I Did it My Way

"Back when Daisuke Inoue was a youngster banging drums with a local lounge band, he didn't think his invention for sing-along soundtracks and a portable microphone would amount to much.

He certainly had no idea of applying for a patent.

Three decades later, karaoke is a household word around the world and Inoue hardly sees a dime. His closest link to the business is selling cockroach killer for karaoke booths. "

Japanese inventor loses patent to songbox.

Quantum Leap

MIT makes quantum leap in graphics "MIT says that the major advance with this technology, called Quantum Dot Organic LED, or QD-OLED, is that the filling in the sandwich is just one quantum dot thick. This has the potential to allow 100 percent of the electrons moved into the holes to produce light, compared to a maximum efficiency in other systems of around 30 percent and an actual efficiency that can struggle to beat 5 percent. Previous QD-OLEDs had layers of 10 to 20 quantum dots; the single-dot layer is just three nanometers across."

Xperanto

New database war shapes up "Database companies have been tackling the idea of the "federated," or virtual, database for years, although many attempts have failed because of the poor performance of distributed queries, said Philip Russom, an analyst with Giga Information Group. System complexity and the lack of a universal data language such as Extensible Markup Language (XML) also sidetracked earlier efforts... IBM's Xperanto, which builds on XML, a standard for data exchange, is based on the concept of federated data management. Instead of creating a single, larger database --a model, in part, espoused by rival Oracle--a federated scheme creates a virtual database linked to all the relevant data. In this model, data sources are queried from their native locations and database management servers consolidate the results and make them available to users."

Their method not only allows XQuery to be processed over this data but also converts XQuery into SQL if required. Querying XML Views of Relational Data. In the RDF world this idea is similar to Intellidimension's data services. Much like our full text models or querying RSS feeds.

Sunday, January 12, 2003

Ignored Again

With my crappy tool still in hand, I'm finally back to getting those less than double digit hits a day.

I will be switching when the time comes to renew but that was already true. I've covered a few possibilities before like SnipSnap but that's probably too lame as well. For example, they don't seem to have PingBack or TrackBack but they do have other neat features. I can see that I'm going to need some of these things sooner rather than later and I'd like to contribute so Java is probably best.

NMF

Round-tripping Metadata "NMF is designed to allow both a natural description of metadata in NMF while also allowing as large a subset of RDF metadata as possible to be translated into NMF. This translation between the NMF representation and the RDF representation must be supported in both a type aware and type unaware environment. In practice, even the type-aware (schemas available) environment may not have access to the type information for all of the metadata that the mapping is being applied to."

Data Integrator 6.0

Business Objects upgrades integration platform "Business Objects will unveil the latest version of its data integration tool Monday designed to conquer the Holy Grail of business intelligence: integrating data from disparate systems in real time so that it can be leveraged to affect company operations...It also supports complex workflow processing that allows for complicated data structures while combining support for Web services standards such as XML, SOAP, and WSDL...By pushing a button, data analysts can create and update Business Objects universes, the semantic layer that presents a graphical business representation of underlying structures. As a result, users are insulated from underlying changes in source data by their ETL tool, eliminating the need to re-create scores of reports after the data is reorganized."

Taxomita

Well, continuing the feeling that there's just too much stuff being done to link to let alone download and try is Taxomita.

"Taxomita is a web-based authoring application that lets you create distributed, hierarchical, faceted metadata, and use it to index any page on the web.

The distributed bit is what makes Taxomita different. It means you won't have to do all metadata work yourself, you will be able to create a cloud of metadata sources you trust. You can indicate your trust on a topic-level, so you can indicate you'd like to incorporate Joe's indexing on "accessibility" in your map, but you don't care for his indexing of "babes on the web". (or the other way round)."

In version 2 they plan to have controlled vocabularies and thesauri. Would be nice to create your own from your own corpus. They are using eXchangeable Faceted Metadata Language.
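
The topic-level trust described in the quote could be sketched like this. It's only my guess at the shape of the idea: the merge_index function, the dictionary layout and the example.org URLs are all hypothetical and say nothing about how Taxomita actually stores things.

```python
def merge_index(own, sources, trust):
    """Merge other people's page indexing into your own, topic by topic.

    own:     {topic: set of URLs} that you indexed yourself
    sources: {person: {topic: set of URLs}}
    trust:   {person: set of topics you accept from that person}
    """
    merged = {topic: set(urls) for topic, urls in own.items()}
    for person, index in sources.items():
        for topic in trust.get(person, set()):
            merged.setdefault(topic, set()).update(index.get(topic, set()))
    return merged

# Hypothetical data: trust Joe on accessibility and on nothing else.
own = {"accessibility": {"http://example.org/a11y-basics"}}
sources = {
    "joe": {
        "accessibility": {"http://example.org/screen-readers"},
        "babes on the web": {"http://example.org/never-mind"},
    }
}
trust = {"joe": {"accessibility"}}

print(merge_index(own, sources, trust))
```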

Idle Semantics

Semant-O-Matic "The technique I have been working with, called latent semantic analysis, uses linear algebra to examine patterns of word use across many blog entries and make intelligent guesses about the topics those entries cover."

Together we can build the semantic web. How else? :-)

Gosh I'd love to do a Perl to Java conversion. Includes displaying near-by documents.

The demo page contains the ability to search by keyword or the extracted semantic index. This is a GPLed version of search engines you can pay for.

Oghma Project

"MindHive is a open-source tool for organising your personal information which allows you to selectively share your information with others and to run autonomous, rational agents capable of moving across a peer-to-peer (p2p) network to perform tasks for your benefit."

Requirements include a versioned RDF store. It's a shame that there's no source available yet.

http://www.oghma.org/

FoneBlog

Start-up marries blogs and camera phones "A Dublin-based start-up is to offer software to mobile operators that will enable mobile phone users to create and maintain Weblogs or "blogs" using only their phones."

NewBay Software Announces World's First Blogging System for Mobile Network Operators "NewBay Software was recently founded to develop this specialized market in anticipation of the massive availability of camera phones. According to Strategy Analytics, by 2004, 11% of all phones sold in 2004 will have a camera facility and camera phones will outsell digital cameras. "

Supports RSS (RDF Site Summary). Similar to Hiptop Nation. After seeing the quality, but more importantly the ease, with which the Nokia 7650 and 7210 with extra phone enable this kind of functionality, I wouldn't be surprised at how big this becomes.

Friday, January 10, 2003

Fink using RDF

This is kind of neat. Benjamin Reed has gone RSS crazy. Fink uses RDF (namely RSS) to list the updates to the various packages in stable and unstable.
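
Since RSS 1.0 is RDF/XML, a feed like this can be read with any namespace-aware XML parser. A minimal sketch with Python's standard ElementTree follows; the sample feed is invented (placeholder package names and example.org URLs, with the channel's item list left out for brevity) and is not Fink's actual feed.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
}

# Invented sample feed; package names and URLs are placeholders.
FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/stable.rdf">
    <title>Example package updates (stable)</title>
    <link>http://example.org/</link>
    <description>Placeholder channel</description>
  </channel>
  <item rdf:about="http://example.org/pkg/foo">
    <title>foo 1.2-1</title>
    <link>http://example.org/pkg/foo</link>
  </item>
  <item rdf:about="http://example.org/pkg/bar">
    <title>bar 0.9-3</title>
    <link>http://example.org/pkg/bar</link>
  </item>
</rdf:RDF>"""

def list_updates(feed_xml):
    """Return (title, link) pairs for every item in an RSS 1.0 document."""
    root = ET.fromstring(feed_xml)
    return [
        (
            item.findtext("rss:title", default="", namespaces=NS),
            item.findtext("rss:link", default="", namespaces=NS),
        )
        for item in root.findall("rss:item", NS)
    ]

for title, link in list_updates(FEED):
    print(title, link)
```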

Copyright

Same old song, different meaning for P2P says that the difficulty of blocking access is "...part of the reason the RIAA, along with other copyright holders, is pressing policy-makers in Europe and elsewhere to bring their copyright laws in line with those of the United States, Turkewitz said."

"So far those efforts have met with little success. The most recent Europe-wide copyright rules, which have yet to be adopted by several countries, maintain the 50-year limit. However, copyright holders remain hopeful that individual countries will address the expiration dates as they periodically re-examine digital copyright issues."

Other copyright holders have been successful. Recently, the limit in the European countries was extended to 70 years (life of the creator plus 70) just not for sound recordings. If you're an Australian artist you get life plus 50 in the UK and in Australia. Letting each country control its own laws seems much more sensible than complaining about other countries not doing it like us (or should that be the US). Other US groups, like the BSA, have been very successful getting their copyright laws applied to other countries.

More NIO

JSR 203 is basically what Sun didn't get finished for 1.4's version of NIO. This is to be released in Java 1.5.

"Its major components will be:

1. A new filesystem interface that supports bulk access to file attributes, escape to filesystem-specific APIs, and a service-provider interface for pluggable filesystem implementations;

2. An API for asynchronous (as opposed to polled, non-blocking) I/O operations on both sockets and files; and

3. The completion of the socket-channel functionality defined in JSR-51, including the addition of support for binding, option configuration, and multicast datagrams."

NLP Searching

Going beyond simple keywords—the next generation of Search Tools - The uses of NLP are at query time and at document extraction time.

"NLP technologies go beyond traditional information retrieval techniques, enabling a system to accomplish a human-like understanding of documents, and thus permitting the extraction of useful meaning from unstructured texts.

Search companies such as Ask Jeeves, Convera, Northern Light, Verity, SmartLogik, Q-Go, and Cognit among others, have incorporated NLP techniques in their commercial search solutions. "

Didn't know about EuroWordNet which is WordNet for French, German, Spanish, Dutch, Italian, Czech, Estonian and English languages. A list of all the Wordnets is available at Global WordNet Association.

Rendezvous

Mod_rendezvous points out that both the new Mac browser Safari and Chimera's Rendezvous support can now be a bit more useful with the new Apache module for rendezvous. Another step towards the P2P web.

As the Apple Turns reports that there were 300,000 downloads of Safari in the first day. They also link to a summary of Steve on CNBC.

Thursday, January 09, 2003

Technoelite

A little control "On a side note, remember that flap just before the holidays about the Coders-Only Club? Well, look at it this way: some people have strong opinions about the way computers should work. Other people have strong opinions about the way computers should work, and back them up with code. It seems obvious to me which people deserve more attention. Maybe that’s just me."

Hopefully it's just you. The implementation and the specification should be completely different. Controlling ideas has historically had very little success, and there's little reason to think it will work now. In fact, the opposite has consistently been shown to be true in recent information technology history. From assembly, to compiled code, to scripting languages, to markup, all have shown that increasing the number of people who can use and modify the technology is an advantage.

Although, I can imagine how seductive this might be. It would be good if users gave programmers offerings in the hope that the technology gods smile more favourably on them and provide them with more stable and appropriate applications.

Update 9th January 2003: Shelley noticed the same thing. From the comments: "I[n] my opinion, well-thought arguments carry more weight than stand-alone code...", ""Shut up and write some code" strikes me as about the least productive thing a developer can say. Being proud of that attitude strikes me as outright bizarre.". Also, it's more correct to say this is about controlling ideas than centralising power.

Wednesday, January 08, 2003

GPL Based Knowledge Discovery Tools

"The KDDML-MQL system is being developed using Java as programming language, WEKA as data mining library, XML (and related XSL, DOM technologies) as representation language, IBM Alphaworks XML4J as XML parser, IBM Lotus XSL as XSLT processor and XQuery as XML query language. Also, the system can access external data sources, including Microsoft SQL Server and Oracle databases."

I just thought this was interesting given recent news on XSLT, XQuery and Weka.

KDDML-MQL: Knowledge Discovery in Databases Markup Language

No Standards

"I think I'm getting the picture. North Korea breaks all its nuclear agreements with the United States, throws out UN inspectors and sets off to make a bomb a year, and President Bush says it's "a diplomatic issue". Iraq hands over a 12,000-page account of its weapons production and allows UN inspectors to roam all over the country, and – after they've found not a jam-jar of dangerous chemicals in 230 raids – President Bush announces that Iraq is a threat to America, has not disarmed and may have to be invaded. So that's it, then."

"Indeed, many Americans don't even know what the chilling acronym of the "US Patriot Act" even stands for. "Patriot" is not a reference to patriotism. The name stands for the "United and Strengthening America by Providing Appropriate Tools Required to Intercept and Obstruct Terrorism Act"."

That's the most alarming thing of all, law by marketing.

Robert Fisk: The double standards, dubious morality and duplicity of this fight against terror

Graph like Querying

RDF Objects "An RDF Object is a compound data structure that is extracted from a database...Extraction algorithms are represented by RDFS classes [21] that are subclasses of ExtractionAlgorithm. Instances of these classes represent a particular extraction, and have associated properties that are used as parameters to the algorithm. The input database and output object are included as parameters to all algorithms...The RDF Object is extracted by first locating the focus resource in the database, and then by recursing in all directions from this resource through the graph to the specified depth. In traversing arcs between resources, the direction of the arrow is unimportant; both forward and backward arrows are traversed."

"If a list of vocabularies to include is specified, an arc will only be traversed if it is a property from a vocabulary that is in the inclusion list. If a list of vocabularies to exclude is specified, an arc will only be traversed if it isn't a property from a vocabulary that is in the exclusion list. "

This seems more like it.
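
To make the extraction described above a bit more concrete, here is a rough sketch in Python. It is not the paper's implementation: the triples are plain tuples, the traversal is breadth-first (my choice, the paper only says it recurses to a depth), and deciding that a property "belongs to a vocabulary" by checking its namespace prefix is my assumption. The function name extract_object and the sample data are made up for illustration.

```python
from collections import deque

DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Tiny stand-in "database": (subject, predicate, object) tuples.
triples = [
    ("doc1", DC + "creator", "alice"),
    ("alice", FOAF + "knows", "bob"),
    ("bob", FOAF + "name", "Bob"),
    ("doc2", DC + "creator", "bob"),
]


def extract_object(triples, focus, depth, include=None, exclude=None):
    """Collect the triples reachable from `focus` within `depth` arcs.

    Arcs are followed forwards and backwards. `include` and `exclude`
    are optional lists of vocabulary namespace prefixes (an assumption
    about how vocabulary membership is tested).
    """
    def allowed(predicate):
        if include is not None and not any(predicate.startswith(ns) for ns in include):
            return False
        if exclude is not None and any(predicate.startswith(ns) for ns in exclude):
            return False
        return True

    seen = {focus}
    frontier = deque([(focus, 0)])
    result = set()
    while frontier:
        node, dist = frontier.popleft()
        if dist == depth:
            continue  # depth limit reached, do not expand this node
        for s, p, o in triples:
            if not allowed(p):
                continue
            if s == node:
                other = o      # forward arc
            elif o == node:
                other = s      # backward arc
            else:
                continue
            result.add((s, p, o))
            if other not in seen:
                seen.add(other)
                frontier.append((other, dist + 1))
    return result


one_hop = extract_object(triples, "alice", 1)
two_hops = extract_object(triples, "alice", 2)
no_foaf = extract_object(triples, "alice", 2, exclude=[FOAF])
```

Starting from "alice", one hop picks up both the forward foaf:knows arc and the backward dc:creator arc, and excluding the FOAF vocabulary leaves only the Dublin Core statement.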

RDF Templates

"The idea comes from a simple desire: what if there was a way to apply XSLT to an RDF document to produce an (X)HTML page? You can't do it in the general case because there are many, many ways to serialize an RDF graph and not all of them provide the nice tree structure that XSLT requires to operate...It consists of a pattern language which works like XPath and a set of elements that work like XSLT."

I'm not really sure why you'd want to convert RDF to (X)HTML in this way. There are query languages for RDF. I've seen people use XSLT to process RDF/XML. In XUL you use RDF queries to populate an XML template. This is like Semantic Schemas but that was going the other way from XML to RDF.

The problem is that you don't want to match just nodes or arcs but something higher up, based on a schema: being able to invert RDF statements, for example.

RDF Templates Draft 1

Safari

Safari Well it's based on Konqueror and it does seem faster than Mozilla (1.2.1). It supports Java well. The Springback feature seems a little gimmicky. I miss tabbed browsing. Having a text field separate to the address bar for searching seems like a fair enough thing. I'll have to put it through its paces as far as the HTML, CSS, Javascript and DOM is concerned. It seems to have some issue with cookies, sites saying that I need to turn session cookies on even when I changed it to always accept cookies. The download is around 3MB so it doesn't hurt to try it.

Apple Announces New "Safari" Browser links to the changelog. As one of the existing KDE developers wrote "Seems to me like a huge christmas gift. Thank you. Thanks a lot." It looks like KHTML and KJS were determined by Apple to be easiest to use, "The size of your code and ease of development within that code made it a better choice for us than other open source projects."

No new iPod.

Tuesday, January 07, 2003

Bake Off

Well there's lots of rumours and no facts on what Apple has in store. Just perfect hype. CNet's guessing consists of stringing together buzzwords in hope of getting a hit. Forbes doesn't promise too much and simply says they're guessing on an update of the iPod.

Only 5 and a half hours to go before the keynote.

Everything that's Wrong with XML

Standardizing an XML vocabulary is fraught with problems according to "Here's What's Wrong With XML-Defined Standards". There is no control of scope, there are too many participants, it takes too long, and the standards are just too big.

"It is impossible for one organization to orchestrate or supervise standards development in all vertical domains, and we don't expect any organization to announce such plans."

"Without some revolutionary change to the way in which XML-defined standards are developed, the maze of standards will continue to proliferate, and there will be no way to discover redundancies or identify conflicts and reconcile them. At the least, the proliferation of standards will result in millions of dollars of lost effort. At worst, it will corrupt data and compromise business-critical transactions and operations because different parts of the same company will process conflicting XML messages without knowing it. From 2001 through 2004, enterprises worldwide will spend more than $3 billion on XML modeling activities with no return on investment on $2 billion of it (0.8 probability)."

This is a pretty clear outline of why XML alone is not the answer. It doesn't matter how schemas are created and interoperability achieved. They have to work together in order for business to operate.

I know I'm not alone in disliking Gartner's use of "x.x probability". I heard some English politician (or someone apparently worth listening to) say that the chance of going to war with Iraq was "less than 50%". Now what does that mean? They're not making a statement of fact; whether it happens or not, they're still right.

Disruptive Technologies

"Blogs work differently. Instead of n-way authentication, there is only one-way authentication. Each author authenticates to a hosting service to post there. Each blog is bound to and emblematic of its author's identity. The popularity rankings that the blog indexers incessantly churn out are also measures of trust. As a longtime blogger, I've built trust that comes partly from speaking consistently over time with a credible voice and partly from peer validation expressed in the currency of links. Everyone can see that to violate that trust by spreading malicious code would be a self-destructive act.

If you're creating a Web service that you hope will have a disruptive impact, the lessons are clear. Support HTTP GET-style URLs. Design them carefully, matching de facto standards where they exist. Keep the URLs short, so people can easily understand, modify, and trade them. Establish a blog reputation. Use the blog network to promote the service and enable users of the service to self-organize. It all adds up to a recipe for recombinant growth."

The disruptive Web

True Lies

In Renmin Voice Joshua Allen defends the Semantic Web. Taking up another point in the argument, he says that whether statements are lies or not it's all knowledge. I'm not going to reiterate everything he's said. The initial points are well made. He then seems to be stretching when he claims that XML and RDF are almost the same thing. His use of English is Bard-like in comparison to my own so it's garnered much more respect.

The Year of Linking Dangerously, Joshua Allen on the semantic web and Tracking all the lies are both positive and critical responses to Joshua's blog. I laughed at, "It's in the air. It's viral. It's contagious. Hold your breath or you'll catch it." and I thought I just had a cold.

Jonathon notes in Track all the lies that MoveableType can free you from the tyranny of Google (through the use of Trackback). I've wanted Trackback or pingback for a while. I agree that a link doesn't give you enough information. As Joshua is a Radio user there's no Trackback or Pingback which means hunting down all the responses.

The discussion continues and the tools get better.

Monday, January 06, 2003

Pick Returns

Maverick is an open-source version of a Pick-like MultiValue database written in Java. I've seen old Pick systems but never could work out whether it was an alternative to an OS or an alternative to SQL databases.

""You can think in a relational manner, you can program it like a relational database, but it doesn't [have] all the relational constraints," Youngman said. For example, programming in SQL requires tables for each line on an invoice, while Pick needs just a single record per invoice that can be viewed relationally, he said." Open-source Pick-like database being developed

It currently backends onto some existing relational databases like MySQL.
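
The invoice example in the quote is easier to see with a small sketch. This is plain Python, not Maverick's or Pick's actual API: a dict stands in for a single MultiValue record, the field names are hypothetical, and relational_view is just my illustration of "viewing it relationally".

```python
# One MultiValue-style record: the invoice lines live inside the single
# invoice record as parallel multi-valued fields (hypothetical names).
invoice = {
    "id": "INV-1",
    "customer": "ACME",
    "item": ["widget", "gadget"],
    "qty": [2, 1],
    "price": [5.0, 12.5],
}


def relational_view(record):
    """Flatten the multi-valued fields into one row per invoice line,
    roughly what a separate invoice-line table would hold in SQL."""
    rows = []
    lines = zip(record["item"], record["qty"], record["price"])
    for line_no, (item, qty, price) in enumerate(lines, start=1):
        rows.append({
            "invoice_id": record["id"],
            "line": line_no,
            "item": item,
            "qty": qty,
            "price": price,
        })
    return rows


rows = relational_view(invoice)
total = sum(row["qty"] * row["price"] for row in rows)
```

The record is stored once, and the per-line rows only exist when you ask for the relational view.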

Semantic Links

More than one way to link a cat "The move from document-centered hypertext systems to map-based hypertext systems had some unforeseen but far-reaching implications: relationships between nodes could be expressed in more than one way. Maps showed interconnectedness explicitly, usually in the form of a directed graph. But also node proximity came into play; relationships among different nodes or documents could be indicated simply on the basis of their relative location. The use of these map-based hypertext systems to author new information spaces uncovered an interesting phenomenon. Users avoided the explicit linking mechanisms in favor of the more implicit expression of relationships through spatial proximity and visual attributes [14]. Further analysis showed that the use of these spatial and visual cues to imply relationships applied not only to map-based hypertext systems, but also to traditional hypertext systems and in the physical arrangement of paper and notecards [15]."

Spatial Hypertext: An Alternative to Navigational and Semantic Links

Sunday, January 05, 2003

Go TKS

It seems that TKS will go into Practical RDF. iTQL isn't mentioned but given the time constraints it's pretty good anyway. There's even a chapter with XUL and RDF in Mozilla.

It looks like it will be a good book. However there were a couple of things that I thought were missing:
* P2P and RDF - like Edutella.
* Annotation - like Annotea.
* Document management - like DSpace.
* IRIs - Can't really tell though from just the TOC.
* CC/PP.
* FIPA.

Saturday, January 04, 2003

How to be a philosopher

"Begin by making a spurious distinction. Befuddle the reader with your analytic wizardry. The reader will enter a logical trance, from which she will be unable to recall the initial spurious distinction and will feel strangely compelled to accept your conclusions." Techniques 7 and 10 aren't all that bad either. :-)

http://www.philosophersnet.com/article.php?id=540

Google is the Web, the Semantic Web

Shark jumping Google has rating good.
August 2009: How Google beat Amazon and Ebay to the Semantic Web is related to Shark jumping Google.

Tools will Save Us (again)

Tools Will Save Us "The person being scolded failed to realize why his post was so ironic and deserving of the scolding he got from Mark. I can't believe he failed to realize that the point Mark was making was that if we (i.e. he) can't even create properly machine readable versions of documents for simple formats like HTML and RSS, yet believs that we will somehow magically do this with more complex and confusing technologies like RDF/XML, DAML+OIL, OWL, etc. "

I haven't failed to understand that complexity will cause problems writing tools. In my original response, I linked to something I said in September, "It's a cop out to say that you should only use tools; until the machines write the tools that is."

The barrier to web publishing has never been lower than it is now, this is because of tools overcoming complexity.

Google is the obvious example of a tool overcoming massive complexity. Even Google doesn't always work. But then I end up using Vivisimo which lets me discriminate between the same lexical term by picking the correct semantic term.

Blogger is a bad tool, therefore tools can't save us. Engineers write tools, therefore engineers can't save us. These are very bad arguments, actually they're stupid and dangerous arguments. If that's the only argument that people have against the Semantic Web then please let me sell you some rocks to protect you from tigers. Things still work because of tolerance.

When I say we can categorize lies I was saying that this categorization does not have to depend on AI. All I was saying was that if you don't want to believe a person, site or data between the meta tag then that's what you can do. Maybe you don't need RDF but you'll need something that looks an awful lot like it.

I actually agree mostly with Mark, if not with the way he's argued his points. What is needed is semantics from the ground up, and using HTML is a simple way of doing it. You need semantic applications for everyone; indeed, this is what the semantic web challenge is about.

Take a look at the TAP demos, like the Eric Miller example. This TAP KB is broad and shallow. However, it could be used to bootstrap a larger shared vocabulary. As Eric has said, "It's not artificial, and it's not intelligent...The conceptual models behind RDF are predicated on work in the digital library community. You can think of this as a common framework that supports thesaurus, taxonomies and classification schemes."

I don't like ad hominem arguments, and this isn't one, but I just had to wonder about the author of metacrap. That article always struck me as satire and made me laugh when I read it. Section 3 is even positive towards the latent semantics used by Google. So I dug around and I came across the author's FOAF which is an RDF vocabulary. He also co-founded OpenCola which is coming close to the every man's semantic web application. His "Disreputable Conduct in a Reputation Network OpenCola Versus the Demons of the Popular Imagination" talks about the kind of categorization and filtering that works and can be applied to the Semantic Web (he even talks about automated bots).

Predictions for 2003

Most of the predictions for 2003 seem to focus on web services, mobile phones, J2EE, .Net, Microsoft, Apple, Sun, etc. I liked Bill de hÓra's predictions for 2003; here are some of them:

Semantic Web:
* Ontologies are not as useful and more difficult to design than first expected; the upfront costs of creation become a concern.
* Specialized metadata formats continue to be favoured over RDF.
* RDF is used for systems integration.
* Machine learning comes to the semantic web.

XML:
* Users of W3C Schema make all the same mistakes OO users did 10 years ago.
* XSLT/XPath are forked.
* W3C Schema is subsetted.
* Use of entity references for characters becomes an antipattern.

Some could say that XPath is already forked with XPath NG.

Friday, January 03, 2003

XML Class Warfare

"Strong data typing is mostly a false crutch that collapses very readily...I also received a handful of messages with no new insights, but just expressing support for the "bohemian" point of view and dismay at the layers of complexity the W3C appears to be molding over everything it produces lately." - More on XML class warfare. Largely that complexity stems from XML Schema.

"The gentry, however, would like more class consciousness from these workhorse technologies. They reason that if they have gone to the trouble of specifying in the schema that ''1.0'' represents a floating point number, the XPath and XSLT processors should make this information available, and the processor should use such type information far more broadly. The gentry take the view that such capabilities should be built into the foundations of XPath and XSLT...Not by coincidence, these specifications are several times larger and more complex than the 1.0 generation of specifications."

I would agree with the "bohemians" that as some of these specs get close to acceptance the W3C then adds one more layer of complexity. Uche seems to think that XML data types are all or nothing, especially with the new versions of XPath and XSLT. As everything is about RDF, I just have to note that even without a parser that understands RDF data types, the RDF is still useful and distinct. It's a shame, though, that types in RDF are not just another property but are embedded in the syntax of RDF/XML. This kind of separation is also mentioned in one of the xml-dev threads.

From the original article, "The bohemians insist that the next-generation XML technologies should not only learn from RELAX NG's isolation of class consciousness, but should avoid bias toward WXS, supporting RELAX NG and other alternatives as well. The battle rages on at present."

A Relax NG Schema for RDF/XML does exist but it is not part of the RDF specification.

JLense

This looks to be an Eclipse based framework that uses Swing instead of SWT.

"..the Eclipse plugin platform is the inspiration for JLense. JLense is a significant departure from Eclipse in some important ways, especially with regard to the UIWorks framework, but JLense also incorporates some parts of Eclipse unchanged."

"There was some speculation that the name of the Eclipse project was perhaps specifically meant to suggest eclipsing of the 'Sun'.

While looking for a name for this project I read about a phenomenon called gravitational microlensing. In short, gravitational microlensing is similar to an eclipse in that a large object passes between us and another star. But unlike an eclipse, which obscures the more distant star, gravitational microlensing will cause the more distant star to become more visible or brighter in appearance.

And that is what I hope JLense (the large object between the viewer and the star) will do for Swing (the star)."

http://jlense.sourceforge.net/

Dynamic Class Loader Using OWL

Mindswap's OWL Class Loader allows you to load and persist Java classes to RDF and is used in RIC. "So now to "compile" the Agenda-RIC program from the RIC code, all I need to do is put the correct OWL in the config file and make sure that its marked active. No recompiling is needed. Modules can be upgraded without changing code, just tell the program what the new module is to load, and voila, the program uses the new and improved part, rather than the old one you replaced.". By itself, I'm not sure it has much advantage over storing everything in XML (as in JSX, JAXB, Castor or KOML). However, as part of a larger system I can see how the configuration and autodeployment might be interesting. Your configuration files could depend upon version, security or any other piece of metadata you could query.

The next step is to have the functionality itself described in RDF, which is close to what I've seen with Adenine (which is more like Lisp but uses Java objects) and is part of the Haystack project that I've covered before.
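
The "no recompiling" idea can be sketched without any OWL at all. What follows is not Mindswap's loader: a plain Python list stands in for the OWL/RDF config file, the "role", "cls" and "active" keys are hypothetical, and standard-library classes stand in for the modules being swapped.

```python
import importlib

# Stand-in for the config file: which class fills which role, and
# whether that entry is marked active.
config = [
    {"role": "mapping", "cls": "collections.OrderedDict", "active": True},
    {"role": "mapping", "cls": "builtins.dict", "active": False},
    {"role": "queue", "cls": "collections.deque", "active": True},
]


def load_active(entries):
    """Import and return the class for every entry marked active."""
    loaded = {}
    for entry in entries:
        if not entry["active"]:
            continue
        module_name, _, class_name = entry["cls"].rpartition(".")
        module = importlib.import_module(module_name)
        loaded[entry["role"]] = getattr(module, class_name)
    return loaded


modules = load_active(config)
queue = modules["queue"]([1, 2, 3])
```

Swapping an implementation means flipping the active flags in the config; the calling code only ever asks for a role. Once the config is RDF you could pick entries by querying on version, security or any other metadata instead of a single flag.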

Thursday, January 02, 2003

J2EE Client Provisioning Servers

JSR 188 - "To enable interoperability between web servers and access mechanisms, and to facilitate development of device independent web applications, this specification will define a set of APIs for processing CC/PP information...This specification will depend on CC/PP [2] which uses RDF [4] (a framework for processing metadata) which uses XML [5]. This specification will not define a generalized API for RDF." Expert group includes HP, Nokia, Oracle, IBM, BT, and others.

Retsina

Semantic Web Calendaring Agent "The Retsina Semantic Web Calendar Agent provides interoperability between RDF based calendar descriptions on the web, and Personal Information Manager (PIM) Systems such as Microsoft's Outlook. Schedules and events can be described on the web in RDF, using existing ontologies such as the Hybrid iCal-like RDF Schema or the Dublin Core ontology, and linked to individual's contact information described, for example, at their home page."

Machine Learning Software in Java

Weka and Kea - these are useful for anyone wanting to have a look at some examples of extracting keyphrases from the text of a document and other data mining activities (data pre-processing, classification, regression, clustering, association rules, and visualization). Also, take a look at the book.

Tools will Save You

My points are invalid because my RSS, HTML and email link are broken. Condemn me for my argument not my broken tools! At least you didn't comment on my appearance, spelling or grammar. Anyway, unless you parsed them yourself you used tools to see that my RSS and HTML were invalid. But is HTML or RSS flawed because my RSS and HTML are flawed? These can be fixed and will. I'm sorry you missed the point.

Specifically:
- I've written about RDF's failure with RSS before, its great tragedy (this one references a previous Dive Into Mark), and a simple way to convert between the various formats. I also think that the RDF/XML syntax is a mess and is an obstacle to good tool writing.
- An example where competition and truth can work together is with Semantic Web Services. There's also the more indepth paper "The Web Service Modeling Framework (WSMF)" (which covers mediated P2P as a way to get around the schema problems in a P2P environment).
- Blogger annoys me a lot; the ampersand problem (it mangles ampersands in links) is just one of many. I really want to stop using it but I have to worry about not only moving to something else but taking this content with me. Again, a problem for a tool.

Wednesday, January 01, 2003

Tag Soup

In Tag Soup TNG Mark Pilgrim writes that the Semantic Web will never succeed because people lie. He also says that it's machine readable only which is just not so. As Tim Berners-Lee wrote in Nature: "The Web was designed as an information space, with the goal that it should be useful not only for human-human communication, but also that machines would be able to participate and help users communicate with each other." It's extra to the Web not a replacement.

In the Semantic Web, as in life, you don't have to believe everything everybody says forever. If you are deeply cynical, then don't believe what anyone says. Still, the Semantic Web technologies could be of use. You could use your own text mining tool to extract metadata (unlike the document's own data in the meta tag). You can then combine this with other documents' metadata and create your own ontology. You can use ontologies to further classify other documents or sets of documents.

If you can trust other people and their metadata extraction then things get even better or even exponentially better as you trust more (people, groups, companies, etc.) and/or use more tools (Semantic Web enabled P2P clients or Google, etc.).

RDF lets you create statements that the document/author made, that you or anyone else has made. The lies can be categorized and pruned to your or anyone else's whim.
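
A minimal sketch of what that pruning could look like, in Python. This is not RDF reification or any real RDF library: each statement simply carries a fourth field saying who made it, and the names Statement, prune and by_source, along with the sample data, are invented for the example.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str
    predicate: str
    obj: str
    source: str  # who made the statement (hypothetical provenance field)


statements = [
    Statement("page1", "topic", "semantic web", "author"),
    Statement("page1", "topic", "cheap pills", "spammer"),
    Statement("page1", "topic", "rdf", "me"),
]


def by_source(stmts):
    """Categorize statements by whoever made them."""
    groups = {}
    for s in stmts:
        groups.setdefault(s.source, []).append(s)
    return groups


def prune(stmts, trusted):
    """Keep only the statements made by sources you choose to trust."""
    return [s for s in stmts if s.source in trusted]


groups = by_source(statements)
believed = prune(statements, trusted={"author", "me"})
```

Nothing here needs AI: the trust list is entirely yours, and someone else can prune the same statements with a different list.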

I really hope no one or at least very few, will ever have to manually produce RDF/XML. If you're a programmer use a library, if you're not use an application. If you want to write it yourself it's just like code or English - if it doesn't parse it can't be understood.

The schema problem does not seem that hard either. People do it now for XML or database schemas. The conversion tools can either be as general or as specific as you want and can include as much or as little human intervention as you want. The OntoShare paper addresses some of these problems such as how do you evolve a shared ontology and the use of tools (in their case ViewSum) to consistently extract key concepts. Semantic Gossiping outlines the problems involved in doing this in a peer-to-peer environment.

XXL

XXL provides a powerful collection of easy-to-use index-structures, query operators and algorithms facilitating the performance evaluation of new query processing developments. Query algorithms in XXL use the same set of basic classes, like I/O routines and improved (de)serialization methods. Index structures include B-trees, R-trees, M-trees, AVL-trees, Red-Black-trees and bulk-loading-algorithms. Also includes VisualXXL which is a graphical query engine. Requires Java 3D.

JEDAS

JEDAS (Java EDucational Animation System). Allows: animations and simulations as Java applications and applets, integration of existing implemented algorithms and processes, recording, multicast transmission and annotation of JEDAS animations in online presentations and a playback facility.

JX

JX - The JX system architecture consists of a set of Java components executing on the JX core that is responsible for system initialization, CPU context switching and low-level domain management. The Java code is organized in components which are loaded into domains, verified, and translated to native code.