Semantic search


What's in a name - Part 1

There are tons of mistakes that may occur when writing down RDF statements. I will post a six-part series of blog entries, starting with this one, about what can go wrong in the course of naming resources, why it is wrong, and why you should care - if at all. I'll try to mix experience with pragmatics, usability with philosophy. And I surely hope that, if you disagree, you'll do so in the comments or in your own blog.

The first one is the easiest to spot. Here we go:

"Politeia" dc:creator "Plato".

If you don't know about the differences between Literals, QNames and URIs, please take a look at the RDF Primer. It's easy to read and absolutely essential. If you know about the differences, you already know that the statement above actually isn't a valid RDF statement: you can't have a literal as the subject of a statement. So, let's change this:

philo:Politeia dc:creator "Plato".

What's the difference between these two? In the first one you say that "Plato" is the creator of "Politeia" (we take the semantics of dc:creator for granted for now). But in the second you say that "Plato" is the creator of philo:Politeia. That's like in Dragonheart, where Bowen tries to find a name for the dragon because he can't just call him "dragon", and he decides on "draco". The dragon comments: "So, instead of calling me dragon in your own language, you decide to call me dragon in another language."

Yep, we decide to talk about Politeia in another language. Because RDF is another language. It tries to look like ours, it even has subjects, objects, predicates, but it is not the language of humans. It is (mostly) much easier, so easy, in fact, that even computers can cope with it (and that's about the whole point of the Semantic Web in the first place, so you shouldn't be too surprised here).

"Politeia" has a well defined meaning: it is a literal (the quotation marks tell you that) and thus it is interpreted as a value. "Politeia" actually is just a word, a symbol, a sign pointing to the meant string Politeia (a better example would be: "42" means the number 42. "101010b", "Fourty-Two" or "2Ah" would have been perfectly valid other signs denoting the number 42).

And what about philo:Politeia? How is it different from "Politeia", what does this point to?

philo:Politeia is a Qualified Name (QName), and thus ultimately a short-hand notation for a URI, a Uniform Resource Identifier. In RDF, everything has to be a resource (well, remember, RDF stands for Resource Description Framework), but that's not really a constraint, as you may simply consider everything a resource. Even you and me. And URIs are names for resources. Universally (well, at least globally) unique names. Like philo:Politeia.
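As a rough sketch of what "short-hand notation for a URI" means, the following Python snippet expands a QName with a prefix table and rejects statements whose subject is a literal. The dc namespace is the real Dublin Core one; the philo namespace, and the names URI, Literal, expand_qname and make_statement, are assumptions made up for this example. Blank nodes are ignored to keep it short.

```python
from dataclasses import dataclass

# Prefix table. The philo namespace is a hypothetical placeholder.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "philo": "http://example.org/philo#",
}


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


def expand_qname(qname: str) -> URI:
    """Replace the prefix of a QName with the namespace it stands for."""
    prefix, local = qname.split(":", 1)
    return URI(PREFIXES[prefix] + local)


def make_statement(subject, predicate, obj):
    """Build a (subject, predicate, object) triple; literals only as object."""
    if not isinstance(subject, URI):
        raise TypeError("the subject of a statement must not be a literal")
    if not isinstance(predicate, URI):
        raise TypeError("the predicate of a statement must be a URI")
    return (subject, predicate, obj)


# philo:Politeia dc:creator "Plato".
statement = make_statement(
    expand_qname("philo:Politeia"),
    expand_qname("dc:creator"),
    Literal("Plato"),
)
```

Calling make_statement with Literal("Politeia") as the subject raises a TypeError, which mirrors why the very first statement in this post is not valid RDF.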

You may wonder what your URI is, the one URI denoting you. Or what the URI of Plato is, or of the Politeia? How to choose good URIs, and what may go wrong? And what do URIs actually denote, and how? We'll discuss all of this in the next five parts of this series. Don't worry, just stay tuned.

Why we will win

People keep saying that the Semantic Web is just hype. That we are just an unholy chimaera of undead AI researchers talking about problems solved by the database guys 15 years ago. And that our work will never make any impact in the so-called real world out there.

As I stated before: I'm a believer. I'm even a Catholic, so this means I'm pretty good at ignoring hard facts about reality in order to stick to my beliefs, but it is different in this case: I slowly start to comprehend why Semantic Web technology will prevail and make life better for everyone out there. It's simply the next step in the IT RevoEvolution.

Let's remember the history of computing. Shortly after the invention of the abacus the obvious next step, the computer mainframe, appeared. Whoever wanted to work with it had to learn to use this one mainframe model (well, the very first ones were one-of-a-kind machines). Being able to use one didn't necessarily help you use another.

At first, the costs of software development were negligible. But slowly this changed, and Fred Brooks wrote down his experience with creating the legendary System/360 in The Mythical Man-Month (a must-read for software engineers), showing how much had changed.

Change was about to come, and it came twofold. Dennis Ritchie is to blame for both: together with Ken Thompson he made Unix, but in order to make that, he had to make a programming language to write Unix in. This was C, which he made together with Brian Kernighan (this account is overly simplified, look at the history of Unix for a better overview).

Things became much easier now. You could port programs in a simpler way than before: just recompile (and introduce a few hundred #ifdefs). Still, the masses used the Commodore 64, the Amiga, the Atari ST. Buying a compatible model was more important than looking at the stats. It was the achievement of the hardware development for the PC and of Microsoft to unify the operating systems for home computers.

Then came the dawning of the age of the World Wide Web. Suddenly the operating system became uninteresting; the browser you used was more important. Browser wars raged. And in parallel, Java emerged. Compile once, run everywhere. How cool was that? And after the browser wars ended, the W3C's cries for standards were finally heard.

That's the world as it is now. Working at the AIFB, I see how no one cares what operating system the other has, be it Linux, Mac or Windows, as long as you have a running Java Virtual Machine, a Python interpreter, a browser, a C++ compiler. Portability really isn't the problem anymore (like everything in this text, this is oversimplified).

But do you think being OS independent is enough? Are you content with having your programs run everywhere? If so, fine. But you shouldn't be. You should ask for more. You also want to be independent of applications! Take back your data. Data wants to be free, not locked inside an application. After you have written your text in Word, you want to be able to work with it in your LaTeX typesetter. After getting contact information via a Bluetooth connection to your mobile phone, you want to be able to send an email to the contact from your web mail account.

There are two ways to achieve this. One is with standard data formats. If everyone uses vCard files for contact information, the data should flow freely, shouldn't it? OpenOffice can read Word files, so there we see interoperability of data, don't we?

Yes, we do. And if it works, fine. But more often than not it doesn't. You need to export and import data explicitly. Tedious, boring, error-prone, unnerving. Standards don't happen that easily. Often enough interoperability is achieved with reverse engineering. That's not the way to go.

The other way is using a common data model with well defined semantics, solving tons of interoperability questions (charset, syntax, file transfer) and being able to declare semantic mappings with ontologies - just try to imagine that! Applications being aware of each other, speaking a common language - but without standards bodies discussing it for years, defining it statically, unmoving.

There is a common theme in IT history towards more freedom. I don't mean free like in free speech, I mean free like in free will.

That's why we will win.

I am weak

Basically I was working today, instead of doing some stuff I should have finished a week ago for some private activities.

The challenge I posed myself: how semantic can I already get? What tools can I already use? Firefox has some pretty neat extensions, like FOAFer, or the del.icio.us plugin. I'll see if I can work with them, if there's a real payoff. The coolest, somehow semantic plugin I installed is SearchStatus. It shows me the PageRank and the Alexa rating of the visited site. I think that's really great. It gives me just a first glimpse of how metadata can help me be an informed user. The Link Toolbar should be absolutely necessary, but sadly it isn't, as not enough people make use of HTML's link element the way it is supposed to be used.

Totally unsemantic is the mouse gestures plugin. Nevertheless, I loved those with Opera, and I'm happy to have them back.

Still, there are such neat things as an RDF editor and query engine. I installed it and now I want to see how to work with it... but actually I should go upstairs, clean my room, organise my bills and insurance and do all this real life stuff...

What's the short message? Get Firefox today and discover its extensions!

Imagine there's a revolution...

... and no one is going to it.

This notion sometimes scares me when I think about the semantic web. What if all these great ideas are just too complex to be implemented? What if it remains an ivory tower dream? But, on the other hand, how much pragmatism can we take without losing the vision?

And then, again, I see the semantic web working already: it's del.icio.us, it's flickr, it's julie, and there's so much more to come. The big time of the semantic web is yet to come, and I think none of us can really imagine the impact it is going to have. But it will definitely be interesting!

AcceLogiChip

Accelerated logic chips - that would be neat.

The problem with all this OWL stuff is that it is computationally expensive. Google beats you in speed easily, having some 60,000 PCs or so, while indexing some 8 billion web pages, each with maybe a thousand words. And if you ever tried Google's Desktop Search, you will see they can perform these miracles right on your PC too! (Never mind that there are a dozen tools doing exactly the same stuff Google's Desktop Search does, just better - but hey, they lack the name!)

What does the Semantic Web achieve? Well, ever tried to run a logic inferencing engine with a few million instances? With a highly axiomatized TBox of, let's say, just a few thousand terms? No? You really should.

Sure, our PCs do get faster all the time (thanks to Moore's Law!), but is that fast enough? We want to see the Semantic Web up and running not in a few more iterations of Moore's Law, but much, much earlier. Why not use the same trick the graphics magicians did? Highly specialized accelerated logic chips, things that can do your tableau reasoning in just a fraction of the time needed with your bloated all-purpose CPU.

World Wide Prolog

Today I had an idea - maybe this whole Semantic Web idea is nothing else than a big worldwide Prolog program. It's the AI researchers trying to enter the real world through the W3C's back door...

No, really, think about it: almost everything most people do with OWL is actually some logic programming. Declaring subsumptions, predicates, conjunctions, testing for entailment, getting answers out of this - but on a worldwide scale. And your browser does the inferencing for you (or maybe the server? Depends on your architecture).
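To give a feeling for what "declaring subsumptions and testing for entailment" looks like in practice, here is a tiny forward-chaining sketch in Python (not Prolog, and far from real OWL reasoning). The function name entail and the class names Philosopher, Person and Agent are made up for this illustration; the only rule it knows is that subsumption is transitive and that instances inherit all superclasses.

```python
def entail(subclass_of, instance_of):
    """Return every type assertion that follows from the given subsumptions."""
    # Step 1: compute the transitive closure of the subclass relation.
    closure = set(subclass_of)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True

    # Step 2: every instance of a class is also an instance of its superclasses.
    inferred = set(instance_of)
    for (individual, cls) in instance_of:
        for (sub, sup) in closure:
            if sub == cls:
                inferred.add((individual, sup))
    return inferred


# Declared knowledge: two subsumptions and a single type assertion.
subclass_of = {("Philosopher", "Person"), ("Person", "Agent")}
instance_of = {("Plato", "Philosopher")}

facts = entail(subclass_of, instance_of)
# Testing for entailment is now a simple membership check.
plato_is_agent = ("Plato", "Agent") in facts
```

Nobody stated that Plato is an Agent, yet the check succeeds, because the fact is entailed by what was declared. Doing this over a few million instances is exactly where it gets expensive.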

There are still a lot of open questions (and the actual semantic differences between Description Logics and Logic Programming surely aren't the smallest of them), like how to infer anything from contradicting data (something that surely will happen in the World Wide Semantic Web), how to treat dynamics (I'm not sure how to do that without reification in RDF), and much more. Looking forward to seeing these issues resolved...

Gnowsis and further

Today, Leo Sauermann of the DFKI was here, presenting his work on Gnowsis. It was really interesting, and though I don't agree with everything he said, I am totally impressed by the working system he presented. It's close to some ideas I had, about a Semantic Operating System Kernel, doing nothing but administrating your RDF data and offering it to any application around via HTTP. Well, I guess this idea was just a tad too obvious...

So I installed Gnowsis on my own desktop and am playing around with it now. I guess the problem is we don't really have roundtrip information yet - i.e., information I change in one place shall magically be changed everywhere. What Gnowsis does is integrate the data from various sources into one view, which makes a lot of applications easily accessible. Great idea. But roundtripping data integration is definitely what we need: if I change the phone number of a person, I want this change to get propagated to all applications.
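The roundtrip idea can be sketched in a few lines of Python. This is not how Gnowsis works; it is only a toy model under the assumption that all applications write through one shared store and get notified about every change. The names SharedStore and AppView, the identifier ex:alice and the phone number are made up for the example.

```python
class SharedStore:
    """One central place for the data; applications subscribe to changes."""

    def __init__(self):
        self._facts = {}        # (subject, property) -> value
        self._listeners = []    # callbacks of all subscribed applications

    def subscribe(self, callback):
        self._listeners.append(callback)

    def set(self, subject, prop, value):
        self._facts[(subject, prop)] = value
        # Propagate the change to every application, including the editor.
        for callback in self._listeners:
            callback(subject, prop, value)

    def get(self, subject, prop):
        return self._facts.get((subject, prop))


class AppView:
    """An application that is nothing but a view on the shared data."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.cache = {}         # what this application currently displays
        store.subscribe(self._on_change)

    def _on_change(self, subject, prop, value):
        self.cache[(subject, prop)] = value

    def edit(self, subject, prop, value):
        # Edits are never kept locally, they always go through the store.
        self.store.set(subject, prop, value)


store = SharedStore()
mail_client = AppView("mail client", store)
phone_book = AppView("phone book", store)

# Changing the number in one application ...
mail_client.edit("ex:alice", "phone", "555-0100")
# ... makes it visible in the other one as well.
number_in_phone_book = phone_book.cache[("ex:alice", "phone")]
```

The design choice that matters here is that no application owns the data: each one only holds a cache that the store keeps up to date.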

So again, in contrast to Gnowsis, I would prefer an RDF store that actually does the whole data housekeeping for all applications sitting on top of it. Applications are nought but a view on your data. Integrating from existing applications is done the Gnowsis way, but after that we leave the common trail. Oh well, as said, really interesting talk.

Mother philosophy

I should start to write some content on this blog soon, but actually I am still impressed with this technology I am learning here every day...

When FOIS2004 was approaching, an Italian newspaper published this under the heading "Philosophy - finally useful for something" (or so, my Italian is based on an autodidactic half-day course). I found this funny, and totally untrue.

Philosophy has always had the bad luck that every time a certain aspect of it provoked wider attention, this aspect became a discipline of its own. Physics, geometry and mathematics are the classical examples, later on theology, linguistics, anthropology, and then, in the 20th century, logic went this way too. It's like philosophy being the big incubator for new disciplines (you can still see that in the Anglo-American tradition of almost all doctors actually being Ph.D.s, doctors of philosophy).

Thus this misconception becomes understandable. Now, let's look around - what's the next discipline being born from philosophy? Will it be business ethics? Will it be the philosophy of science, being renamed as scientific management?

My guess is: due to the fast growing area of the Semantic Web, it will be ontology. Today, Wikipedia already has two articles on it, ontologies in philosophy and ontologies in computer science. This trend will gain momentum, and even though applied ontology will always feed on the fundamental work done from Socrates until today, it will become a full-fledged discipline of its own.

I'm a believer

The Semantic Web is promising quite a lot. Just take a look at the most cited description of the vision of the Semantic Web, written by Tim Berners-Lee and others. Many people are doing research on the various aspects of the SemWeb, but in personal discussions I often sense a lack of belief.

I believe in it. I believe it will change the world. It will be a huge step forward to the data integration problem. It will allow many people to have more time to spend on the things they really love to do. It will help people organize their lives. It will make computers seem more intelligent and helpful. It will make the world a better place to live in.

This doesn't mean it will save the world. It will offer only "nice to have" features, but then, so many of them that you will hardly be able to think of another world. I hardly remember how the world was before email came along (I'm not that old yet, mind you). I sometimes can't remember how we went out in the evening without a mobile. That's where I see the SemWeb in 10 years: no one will think it's essential, but you will be amazed when thinking back how you lived without it.

Who am I?

Well, this being a blog, it will turn out that what I write is more important than who I am. Just for the context, I nevertheless want to offer a short sketch of my bio.

I studied Computer Science and Philosophy at the University of Stuttgart, Germany. In Computer Science, I thought about Software Architectures, Programming Languages and User Interfaces, and my master thesis happened to be the first package to offer a validating XML parser for the programming language Ada 95.
In Philosophy I started out thinking a lot about Justice, especially John Rawls and Plato, but finally I made a strong move towards Constructivist Epistemology and the ontological status of neural networks (both papers are in German and available from my website).

It's a pretty funny thing that next week I will listen to a talk on neural networks and ontologies again, and nevertheless the paper I wrote back then and the talk won't have too much in common ;-)

Well, so how come I am working on Semantic Web technologies now? I have the incredible luck to work in the Knowledge Management Group of the AIFB in Karlsruhe, and there on the EU SEKT Project. I still have a lot to learn, but in the last few weeks I have acquired quite a good grasp of Ontology Engineering, RDF and OWL and some other fields. This is all pretty exciting and amazing and I am looking forward to seeing what's around the next triple.

Welcome!

Welcome to my new blog! Technology kindly provided by Blogger.com