The Online Journalism Review has an article about Newsblaster, which is the result of a DARPA-funded project called TIDES.

TIDES is an ambitious technology development effort focused on the automated processing and understanding of a variety of human language data. The primary goal is to make it possible for English speakers to find and interpret needed information quickly and effectively regardless of language or medium.

Our objective is to develop a new generation of language analysis software that will multi-lingually and translingually analyze, categorize, and conceptually index gigabytes of text/hour, detecting topics, extracting entities and relations, automatically linking related documents and knowledge, and spotting important/novel information. The result will be multi-modal content-based analysis of a collection of documents using textual labels, running text, tables, geo-displays, timelines, and link displays. The primary target for demonstrations will be news.

A list of all the projects associated with TIDES:

A demo of the TIDES technology called Newsblaster:

Columbia University is also working on a project called PERSIVAL. Healthcare consumers and providers both need quick and easy access to a wide range of online resources. The goal of this project is to provide personalized access to a distributed patient care digital library through the development of a system, PERSIVAL.

All projects (which include things like automatic summarization, multimedia generation and corpus annotation).


Friday, March 15, 2002

RDF Visualization Tool

Curiously, this tool uses Jena and not the W3C RDF API even though it's a W3C tool.

''Serious software companies don't ship open source. They may start with it but they build products on it,'' Card added. ''You just have to be serious about the business and I don't think they are serious (about Netscape).'' - Jupiter Media Metrix analyst David Card.

Thursday, March 14, 2002

P4 vs G4

The Register article was wrong. I found a better article which objectively discusses the P4 vs G4.

Wednesday, March 13, 2002

G4s don't suck

"As for SSE2 vs. Altivec, SSE2 is a toy by comparison. Its architecture does not offer the range of generalized high precision capability that the altivec instruction set does. It is filled with bandwidth limitations, particularly its tiny number of harder to use registers that make it nearly impossible to keep the pipeline full, and it is capable of basically no parallelism whatsoever with the regular FP unit on the processor (which means it must start and stop each unit to switch back and forth, and the lack of generalization makes this an excruciating performance penalty). The small number of registers in particular makes the P3 a better scientific computing processor than the P4 for real world applications because the P4's pipe is too deep to keep it filled. This can be graphically demonstrated with fully optimized applications that force significant branching on real world data. "

Revenge of Mozilla

"Hebrew is now supported on Solaris. By itself, the sentence is weirdly enigmatic. But it's a heck of a lot more significant than the simple fact that some users of Sun hardware can now render the Hebrew alphabet accurately while Web surfing. It means that somewhere out there, someone right now is hacking a few lines of code that will make life better for someone else, and we all get to benefit. "

Monday, March 11, 2002

AOL goes Linux and Mozilla (at last!)

According to Newsforge, AOL is going Linux for its servers and has a client that's ready (boring). The client (version 8) is rumoured to be based on Mozilla's Gecko rendering engine. Of course, version 7 was supposed to be Mozilla as well. So the Netscape version will be the only 100% pure Mozilla-based browser. It's good to see that the reason is better standards support, something that IE sucks at (although give me good DHTML performance too!).

Sunday, March 10, 2002

.Net Gumbo

While I don't like posting stuff from other web logs, I found an interesting post that reinforces my distrust of multiple languages in .Net.

The example uses C# classes and just calls them from C# and VB.Net.

The difference in results arises because overloading behaviour is not standard. It's up to the compiler of each language. So VB.Net finds the "closest" method in the class hierarchy, while C# uses the direct parent's class (as I would think).
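
To make the point concrete, here is a toy model of two ways a compiler could pick between overloads spread over a class hierarchy. It is only a sketch: it does not reproduce the real C# or VB.Net rules, and the class names, the method name `f` and the widening table are all made up for illustration.

```python
# Toy model of two overload-resolution strategies.
# NOT the actual C# or VB.Net algorithm; all names here are hypothetical.

# Conversion cost: how far an argument type must be widened to fit a parameter.
WIDENING = {("int", "int"): 0, ("int", "long"): 1, ("long", "long"): 0}

# Hierarchy listed from most derived class to base class.
# Each class declares overloads of f as a list of parameter types.
HIERARCHY = [
    ("Derived", ["long"]),  # Derived.f(long)
    ("Base", ["int"]),      # Base.f(int)
]


def applicable(arg_type, param_type):
    """An overload is applicable if the argument can be converted to the parameter."""
    return (arg_type, param_type) in WIDENING


def nearest_class_first(arg_type):
    """Walk up from the most derived class and stop at the first applicable overload."""
    for cls, params in HIERARCHY:
        for param in params:
            if applicable(arg_type, param):
                return f"{cls}.f({param})"
    return None


def best_match_anywhere(arg_type):
    """Look at every overload in the hierarchy and pick the cheapest conversion."""
    candidates = [
        (WIDENING[(arg_type, param)], f"{cls}.f({param})")
        for cls, params in HIERARCHY
        for param in params
        if applicable(arg_type, param)
    ]
    return min(candidates)[1] if candidates else None


if __name__ == "__main__":
    print(nearest_class_first("int"))   # Derived.f(long)
    print(best_match_anywhere("int"))   # Base.f(int)
```

The same call with the same argument lands on different methods depending on the strategy, which is exactly the kind of surprise you get when two compilers share one class library.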

You can produce enough bugs using one language, forget about trying to figure out the problems caused by cross language integration.

Ontopia Knowledge Suite(TM).

Ontopia AS, providers of advanced topic map tools and services and co-creators of the topic mapping standards, today announced the release of version 1.3 of the Ontopia Knowledge Suite(TM).

Topic mapping is a new paradigm for organizing, retrieving, and navigating information resources. Through the provision of a 'knowledge layer' that is independent of the information resources themselves, topic maps help capture and manage corporate memory, improve indexing, and enable the integration of information that spans multiple, disparate repositories. Topic maps are based on an international standard, defined by the ISO, and are interchanged using the Extensible Markup Language (XML) defined by the W3C.

The Ontopia Knowledge Suite (OKS) comprises a full-featured Topic Map Engine written in 100% Java; the Ontopia Topic Map Navigator Framework, a framework for building J2EE compliant web applications; integration with full-text search engines; and an RDBMS backend for persistent storage of very large topic maps.

New functionality

Version 1.3 of the OKS adds significant new functionality in the form of a query engine and schema tools. Among other enhancements, all components now support Java2 version 1.4, memory consumption has been reduced by on average 30%, and support has been introduced for version 1.2 of LTM, the Linear Topic Map notation.

The query engine, which utilises the 'tolog' query language, enables complex queries to be performed on topic maps. For example, the following query against the well-known Italian Opera topic map, returns a list of cities in descending order of the number of premieres they hosted:

select $CITY, count($OPERA) from { premiere($OPERA : opera, $CITY : city) | premiere($OPERA : opera, $THEATRE : theatre), located-in($THEATRE : containee, $CITY : container) } order by $OPERA desc?
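
For readers who don't know tolog, the following sketch shows roughly what the query computes: an opera counts for a city if it premiered there directly, or in a theatre located in that city. It is plain Python, not the OKS API, and the sample data (opera names, theatres) is invented rather than taken from the Italian Opera topic map.

```python
from collections import Counter

# Hypothetical sample data, standing in for the topic map.
premiered_at = [                 # (opera, place); place is a city or a theatre
    ("Opera A", "Milano"),
    ("Opera B", "Teatro X"),
    ("Opera C", "Teatro X"),
    ("Opera D", "Teatro Y"),
]
located_in = {"Teatro X": "Milano", "Teatro Y": "Roma"}   # theatre -> city
cities = {"Milano", "Roma"}


def premieres_by_city(premiered_at, located_in, cities):
    """Count distinct operas per city, most premieres first."""
    pairs = set()  # distinct (city, opera) matches
    for opera, place in premiered_at:
        if place in cities:                # first branch: premiered in the city itself
            pairs.add((place, opera))
        elif place in located_in:          # second branch: premiered in a theatre in the city
            pairs.add((located_in[place], opera))
    counts = Counter(city for city, _ in pairs)
    # Sort by count descending; ties are broken by city name.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for city, count in premieres_by_city(premiered_at, located_in, cities):
        print(city, count)
```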

The schema tools allow the validation of semantic constraints expressed using the Ontopia Schema Language (OSL), and the development of more intelligent, schema-aware end-user applications. For example, it is possible to constrain associations of type "born in" to have exactly two roles, "person" and "place", each of which must be played by topics belonging to the superclasses "person" and "place" respectively.
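
As a rough illustration of the "born in" constraint described above, here is a minimal validator. It is an assumption-laden sketch: the data model, the topic ids and the function name are hypothetical, and none of it is OSL syntax or part of the OKS.

```python
# Hypothetical in-memory model: topic id -> all classes (including superclasses).
topic_classes = {
    "puccini": {"person", "composer"},
    "lucca": {"place", "city"},
}

# Constraint for associations of type "born in": role type -> required class.
BORN_IN_ROLES = {"person": "person", "place": "place"}


def validate_born_in(association, topic_classes):
    """Check a list of (role_type, topic_id) pairs; return a list of error strings."""
    errors = []
    role_types = sorted(role for role, _ in association)
    if role_types != sorted(BORN_IN_ROLES):
        errors.append(f"expected exactly roles {sorted(BORN_IN_ROLES)}, got {role_types}")
    for role_type, topic in association:
        required = BORN_IN_ROLES.get(role_type)
        if required and required not in topic_classes.get(topic, set()):
            errors.append(f"{topic} playing role '{role_type}' is not a {required}")
    return errors


if __name__ == "__main__":
    print(validate_born_in([("person", "puccini"), ("place", "lucca")], topic_classes))
    print(validate_born_in([("person", "lucca"), ("place", "puccini")], topic_classes))
```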

In addition to providing customers with much-needed additional power, 'tolog' and OSL are major contributions to the projects currently underway in ISO to develop a Topic Map Query Language (TMQL) and a Topic Map Constraint Language (TMCL).

Together with the Navigator Framework, these new features enable customers to build powerful topic map-driven web portals that integrate information from a variety of sources, provide much more intuitive navigational interfaces for end users, and greatly simplify web site development and maintenance.

New Omnigator

To coincide with the release of version 1.3, there is also a new version of Ontopia's popular free topic map browser, the Omnigator.

The Omnigator is a generic application built on top of the Ontopia Navigator Framework that allows users to load and browse any conforming topic map, including their own. Designed primarily as a teaching aid to help newcomers understand the topic map concepts, it is also an extremely useful tool for debugging topic maps and for building demo applications.

Some of the new features in the Omnigator 6 include plug-ins for performing querying and validation, the ability to display class hierarchies (in both text and graphics modes), better stylesheets, and an improved statistics printer. The Omnigator 6 can be downloaded for free from

New product structure

As of version 1.3, the Ontopia Knowledge Suite will be offered in three different editions, corresponding to the needs of different groups of users. Each edition is available under either a Development License (D) or a Runtime License (R). OEM licenses are also available at negotiable prices.

PERSONAL EDITION: $500 (D), $5,000 (R)
- Includes the Topic Map Engine and the Omnigator.

PROFESSIONAL EDITION: $2,000 (D), $20,000 (R)
- Includes the Topic Map Engine, the Navigator Framework, the full-text integration, query engine, and schema tools.

ENTERPRISE EDITION: $3,000 (D), $30,000 (R)
- Extends the Professional Edition with the RDBMS backend for persistent storage of very large topic maps in relational databases.

More Mono

* Mono now builds on OS X.
* Mono is now self hosting on Linux (although it still seems to crash).

Mono on Linux, Windows, OS X, Solaris and various BSD Unices before JDK 1.4. Following the thread, it seems so much like Java: 21% of the time is spent in Array.Copy, 20% in the various StringBuilder.Append calls and 15% in String.BoyerMoore (searching). However, the quality is still way behind that of Java for things like JITing and basic functions (as you would expect for a relatively new, smallish team).

Now how Steve got the company back from the brink:

1. He focused strongly on bringing cost in line with WinTel PCs. If I remember correctly, he hired away a top manager from Compaq to re-engineer Apple's supply chain (cost reduction).

2. He introduced radically new computers for the home and education markets (iMac). He did this by springing a surprise on competitors (surprise is very important to keep competitors off-balance and stunned, preventing them from regrouping and attacking).

3. He focused on satisfying customers in strongholds like printing and publishing, education, scientific and graphics. He made computers and developed technologies to satisfy these markets, e.g. beefed up QuickTime, robust AppleScript, cheaper computers for education markets and a dedicated sales force for the education market.

4. Apple's CFO, I guess, was instructed to start building a war chest for a couple of quarters' worth of losses. The target was $4 billion, as that amount has been stable for the last few quarters in the balance sheet.

How can a company gain advantages in these 4 arenas?

D'Aveni says that there are the new 7 S's

Speed: is related to the churning out of advantage after advantage in each arena. Apple has been regularly churning out product innovations like Airport, Mac OS X, iTunes, iMovie and novel form factors etc.

Strategic Soothsaying: Steve Jobs setting targets to become the center of the Digital Hub is an example of this. Neither Steve nor anybody else from the company, as far as I know, has postulated the next aim.

Superior Stakeholder Satisfaction: Satisfying stakeholders is very important. The most important stakeholders are the customers. Apple has obviously been doing very well as they won the best Computer Support Award from ZDNet. The lowliest stakeholders are the shareholders and company executives.

Surprise: Apple has consistently surprised customers and competitors alike by introducing the iMac, Airport, iMovie, Final Cut Pro and lean mean laptops like the iBook and PowerBook G4.

Shift the rules of competition: Apple shifted the rules of competition from MHz and RAM to aesthetics. All Apple computers look beautiful, while the competitors suck.

Signaling Strategic intent: In early 1997 interviews, Apple clearly signaled its intent to provide better products and services to its customers. When Dell started to make inroads into the education market, Apple signaled its intent not to lose this market by creating a focused in-house sales force, appointing a vice president for educational sales and aggressively providing discounts to education systems around the country to make sales. Recently it went to court to prevent Microsoft from donating used PCs and its software to schools.

Simultaneous & Strategic thrusts: Apple consistently made multiple thrusts in software, hardware and customer support (Knowledge Base).

A computer scientist looks at game theory

I consider issues in distributed computation that should be of relevance to game theory. In particular, I focus on (a) representing knowledge and uncertainty, (b) dealing with failures, and (c) specification of mechanisms.

Tuesday, March 05, 2002

Agoric Computing - Applying Market Forces to Computing (Getting Agents to Work?)

"For a variety of reasons, this work explores essentially pure markets as models of economic organization for computation, supported by a minimal "legal" framework of foundational constraints. A large body of economic theory and historical experience indicates that markets are, on the whole, remarkably effective in promoting efficient, cooperative interactions among entities with diverse knowledge, skills, and goals."

It has a rather full explanation of getting agents to work.

"The following argues that agents at a higher level can accomplish adaptive automatic data structure selection, guide sophisticated code transformation techniques, provide for competition among business agents, and maintain reputation information to guide competition."

Links to the papers:

Fabian Pascal and Relational Databases:

1A. Can you name the 'truly relational DBMS' that you have mentioned? 1B. Can you tell us anything about this new technology that makes use of true relational technology?

My reference was not to a relational DBMS per se, but rather to a recently developed technology which can be used to build truly relational DBMSs, as well as other software tools. Unfortunately, I am not at liberty to say much about it at this time. What I can say is that the tables in RDBMSs implemented using this technology would truly resemble mathematical sets and would not have the inherent physical ordering of rows and columns that current SQL products impose on tables. It is, of course, up to the industry to use this technology, but given the way in which it operates, I would not hold my breath (see question 4 below).,289483,sid13_gci804576,00.html
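
A small sketch of what "tables that truly resemble mathematical sets" means in practice: two relations are equal no matter how their rows or columns are ordered, and duplicate rows collapse. This only illustrates the set idea using Python's built-in `frozenset`; it says nothing about how the undisclosed technology works, and the sample rows are invented.

```python
def relation(*rows):
    """Model a relation as a set of rows; each row is a set of (attribute, value) pairs."""
    return frozenset(frozenset(row.items()) for row in rows)


# Same facts, written with different row order and different column order.
a = relation({"id": 1, "name": "Codd"}, {"id": 2, "name": "Date"})
b = relation({"name": "Date", "id": 2}, {"name": "Codd", "id": 1})

# A duplicate row adds nothing to a set.
c = relation({"id": 1, "name": "Codd"}, {"id": 1, "name": "Codd"}, {"id": 2, "name": "Date"})

if __name__ == "__main__":
    print(a == b)   # row and column order are irrelevant
    print(a == c)   # duplicates collapse
    print(len(c))
```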

eWeek recently published a set of performance figures for various databases, showing that MySQL and Oracle were the fastest and DB2 and SQL Server were the slowest.,3658,s=702&a=23115,00.asp

According to Fabian: "Oracle is not a truly and fully relational DBMS, only a SQL one. MySQL is essentially a file manager.",289483,sid13_gci799976,00.html?FromTaxonomy=%2Fpr%2F284872

Some interesting points:
* There is no standard VM, everything is JITed to an executable.
* The Mono project requires the MS implementation.

Another RDF Query language.

While I haven't worked out whether it is distinctly different or just a grammatical skin over other query languages, Versa is interesting nonetheless.

Smart brick hits Wall

Unable to get an R&D grant from the Australian government (too cutting edge, apparently), a QLD company called Ultimate Masonary is producing a brick from the waste of coal-fired power plants. It weighs half as much as a normal brick and requires no rendering.

Wednesday, February 27, 2002

Rosetta Stone of the 21st Century?

You can rip the subtitle information out of VOB files (that are on DVDs). Not only will you be able to search for "Never get involved in a land war in Asia" but also find out how to say it in Spanish and other languages.

I think (and I might be wrong here) that DVD subtitle information would provide the biggest human repository of translation information.

Page of subtitle software:

No doubt this is copy protected (and copyrighted) though. But it would probably be fair use to hold this in a peer-to-peer fashion. You only hold and index the DVDs you own. Then just return the results of the search.
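
Here is a minimal sketch of the lookup I have in mind: find a phrase in one language's subtitle track and return whatever the other track shows at the same time. The cue format `(start_seconds, end_seconds, text)`, the function name and the sample cues (including the Spanish wording) are all assumptions for illustration. DVD subtitles are stored as images in the VOB stream, so a real version would need extraction and OCR first.

```python
# Hypothetical cue format: (start_seconds, end_seconds, text).
english = [
    (10.0, 13.0, "Never get involved in a land war in Asia"),
    (20.0, 22.0, "Inconceivable!"),
]
spanish = [
    (10.2, 13.1, "Nunca te metas en una guerra terrestre en Asia"),
    (20.1, 22.0, "¡Inconcebible!"),
]


def find_translations(phrase, source_cues, target_cues):
    """Return target-language texts whose time span overlaps a source cue containing phrase."""
    phrase = phrase.lower()
    results = []
    for s_start, s_end, s_text in source_cues:
        if phrase not in s_text.lower():
            continue
        for t_start, t_end, t_text in target_cues:
            if t_start < s_end and s_start < t_end:   # the two time intervals overlap
                results.append(t_text)
    return results


if __name__ == "__main__":
    print(find_translations("land war", english, spanish))
```

In the peer-to-peer version each peer would run something like this over the discs it owns and send back only the matching lines.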

Now if we combine it with the graphical search engines (Eikon or eVision or something) you can put in a picture of an actress/actor and get all the movies they appear in.

This probably isn't a million miles away from my video store project that I did in High School. :-)

The Australian Capital Territory (ACT) Dept of Urban Services has
provided its police officers, parking inspectors, and on-road vehicle
inspectors with WAP-enabled mobile phones and PDAs which use the Cable
& Wireless Optus GPRS network to connect to the rego.ACT database system.

Police have always been mobile workers and instant availability of data
is crucial. rego.ACT recognise that, within 2 years, mobile access will be
indispensable for mobile workers. GPRS represents the first step, with ACT
police already envisaging further uses for mobile data services.

The system was launched on 1 Aug 2001 to provide real time access to the
database for driver and vehicle details in the ACT as well as NSW and
Victoria. By mid-October 2001, the police are expected to have access to
all Australian vehicles and licenses. The system is used to identify
stolen or written off vehicles on the spot and replaces the requirement
to radio back to base for information.

The previous system (TRIPS) was "slow & cumbersome, crashed all the time
and only operated about 70% of the time" (ACT statement in Computerworld
21 Aug 2001). GPRS was chosen as a delivery mechanism to address latency
issues involved with CS data over GSM.

Australia has one of the highest car theft rates in the world and by
using the system, police have the potential to curb the stolen vehicle
market in the ACT. The system enables officers to spend more time on the
beat and less time on administration. ACT police recognise that the
availability of such information "could make a police officer a very
efficient operator". (ACT statement in Computerworld 21 Aug 2001)

The key success factors of the systems are seen to include:

* Real time access (1-3 seconds response time compared to 30-60
seconds on GSM)
* Very simple user interface: the mobile devices have been
accepted well by officers
* Secure access (provided by 128 bit SSL and WTLS)
* Low cost of ownership & adaptability for future technology
* Performance & scalability to deliver fast response times 24/7
with over 99.8% availability
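
To put the figures quoted above in perspective, here is a quick back-of-the-envelope calculation. The function names are my own, and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 24 * 365   # assumes a 365-day year


def max_downtime_hours(availability_percent, hours=HOURS_PER_YEAR):
    """Hours per year a system may be down at the given availability."""
    return hours * (100 - availability_percent) / 100


def speedup_range(old_range, new_range):
    """Smallest and largest speedup between two (fastest, slowest) response-time ranges."""
    old_lo, old_hi = old_range
    new_lo, new_hi = new_range
    return old_lo / new_hi, old_hi / new_lo


if __name__ == "__main__":
    print(max_downtime_hours(99.8))          # about 17.5 hours a year
    print(max_downtime_hours(70))            # the old system at "about 70%"
    print(speedup_range((30, 60), (1, 3)))   # GSM (30-60 s) versus GPRS (1-3 s)
```

So 99.8% availability allows roughly 17.5 hours of downtime a year, against about 2,628 hours for a system that only runs 70% of the time, and the response times are between 10 and 60 times faster.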

The next stage of the initiative will be to provide officers with Intranet
access to the system over ACT's government WAN (early 2002). This will
be followed by the provision of Internet access for the public to
transact with the government (mid 2002).

The rego.ACT system was developed by Internet Solutions Australia and CSC
and delivered by C&W Optus. It is written entirely in Java, using an
n-tier enterprise architecture built on open standards and the J2EE
platform. The middle tier uses EJB and the link between
mobile devices and the system is secured using 128-bit WTLS.

XML Databases

There is an update about the available databases that support XML. Of those, there's quite a few open source ones:
* 4Suite,
* eXist,
* Ozone,
* XDB,
* Xindice.

log4j 1.2 beta 3:

In addition to many performance improvements, bug fixes, and other small enhancements, log4j 1.2 beta3 adds JMX support, Mapped Diagnostic Contexts, and buffered IO capability. One important change is the replacement of the Category class with Logger class and the Priority with the Level in order to facilitate migrating from the JDK 1.4 logging API to log4j.

All changes except the removal of deprecated methods are backward compatible such that log4j 1.2beta3 can be considered a drop in replacement for log4j 1.1.3.

Using the American model to solve all acts of terrorism:

"To prevent terrorism by dropping bombs on Iraq is such an obvious idea that I can't think why no one has thought of it before. It's so simple. If only the UK had done something similar in Northern Ireland, we wouldn't be in the mess we are in today."

"Having bombed Dublin and, perhaps, a few IRA training bogs in Tipperary, we could not have afforded to be complacent. We would have had to turn our attention to those states which had supported and funded the IRA terrorists through all these years. The main provider of funds was, of course, the USA, and this would have posed us with a bit of a problem. Where to bomb in America?",6903,651594,00.html

Microsoft's Metadata Management

The metadata services is part of MS SQL Server 2000.

Their model management technology uses an algorithm called CUPID to map two data sources together. It was developed by Phil Bernstein. The paper goes through the various ways to match two different models (or database/XML schemas) using linguistic, element, structure, referential and contextual matching.
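
To give a feel for the simplest of these, linguistic matching, here is a naive sketch that pairs up element names by string similarity. It is emphatically not the CUPID algorithm, just an illustration of the idea using Python's standard `difflib`; the schema element names and the 0.6 threshold are made up.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Crude linguistic score in [0, 1] based on shared character runs."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_elements(source, target, threshold=0.6):
    """Pair each source element with its most similar target element, if similar enough."""
    matches = {}
    for s in source:
        best = max(target, key=lambda t: name_similarity(s, t), default=None)
        if best is not None and name_similarity(s, best) >= threshold:
            matches[s] = best
    return matches


if __name__ == "__main__":
    # Hypothetical element names from two schemas.
    left = ["CustomerName", "OrderDate"]
    right = ["cust_name", "order_date", "total"]
    print(match_elements(left, right))
```

Name similarity alone is easily fooled, which is presumably why the paper combines it with the structural and other kinds of matching listed above.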

CUPID paper:

Another such tool is ARTEMIS; here's a paper called "A Schema Analysis and Reconciliation Tool Environment for Heterogeneous Databases":

The MOMIS Project is the parent project:

Global Crossing announced the fourth largest bankruptcy about two weeks ago. They had the same accounting firm, Andersen. The big accounting firms have gone from eight, to six, to four. Mary Sullivan from the DOJ released a report on the reasons why these mergers occur including: "across-the-board marginal cost reductions, marginal cost reductions for large clients, coordinated effects and unilateral anticompetitive effects".

Article on Global Crossing:
The DOJ report:

Fear Me!

It took Java about 7 years to acquire 40MB of bloat. MS .Net, which is really leveraging 20-odd years of bloat technology, is 131MB in size. If you want a compiler for C# (and VB, JScript, C++) try it out. Three OSes are supported: Windows NT 4.0, Windows 2000 and Windows XP. Watch out for those messy buffer overrun problems.

Mazda's new car

She's Miss California (uh) hottest thing in West LA (uh uh)
House down by the water
Sails her yacht across the bay
Drives a mp3 (oo oo oo ooo)
Hollywood's her favourite scene
Loves to be surrounded with superstars that know her name

