Semantic search

Wikidatan in residence at Google

Over the last few years, more and more research teams all around the world have started to use Wikidata. Wikidata is becoming a fundamental resource. That is also true for research at Google. One advantage of using Wikidata as a research resource is that it is available to everyone. Results can be reproduced and validated externally. Yay!

I had used my 20% time to support such teams. The requests became more frequent, and now I am moving to a new role in Google Research, akin to a Wikimedian in Residence: my role is to promote understanding of the Wikimedia projects within Google, work with Googlers to share more resources with the Wikimedia communities, and to facilitate the improvement of Wikimedia content by the Wikimedia communities, all with a strong focus on Wikidata.

One deeply satisfying thing for me is that the goals of my new role and the goals of the communities are so well aligned: it is really about improving the coverage and quality of the content, and about pushing the projects closer towards letting everyone share in the sum of all knowledge.

Expect to see more from me again - there are already a number of fun ideas in the pipeline, and I am looking forward to seeing them get out of the gates! I am looking forward to hearing your ideas and suggestions, and to continuing to contribute to the Wikimedia goals.

Wikimania 2006 is over

And it sure was one of the hottest conferences ever! I don't mean just because of the 40°C/100°F that we had to endure in Boston, but also because of the speakers there.

Brewster Kahle, the man behind the Internet Archive, who also started Alexa and WAIS Inc., told us about his plans to digitize every book (just a few petabytes), every movie (just a few petabytes), every record (just a... well, you get the drill), and to take a snapshot of the web every few months and archive it. Wow.

Yochai Benkler spoke about The Wealth of Networks. You can download his book from his site, or go to a bookstore and get it there. The talk really made me want to read it: why does a network thingy like Wikipedia work and not suck? How does this change basically everything?

The next day, there was Mitch Kapor, president of the Open Source Applications Foundation -- and I am really sorry I had to miss his talk, because at the same time we were giving our workshop on how to reuse the knowledge within a Semantic MediaWiki in your own applications and websites. Markus Krötzsch, travel companion and fellow AIFB PhD student, and basically the wizard who programmed most of the Semantic MediaWiki extension, totally surprised me by being surprised about what you can do with this Semantic Web stuff. Yes, indeed, the idea is to be able to ask another website for data and put it up on yours. And to mash up data.

There was David Weinberger, whose talk made me laugh more than I had for a while (and I am quite merry, usually!). I still have to rethink what he actually said, contentwise, but it made a lot of sense, and I took some notes. It was on the structure of knowledge, and how it changes in the new world we are living in.

Ben Shneiderman, the pope of visualization and user interfaces, gave an interesting talk on visualizing Wikipedia. The two talks before his, by Fernanda Viegas and Martin Wattenberg, were really great, because they have visualized real Wikipedia data -- and showed us a lot of interesting results. I hope their tools will become available soon. (Ben's own talk was a bit disappointing, as he didn't seem to have had the time to take some real data, and only used fake data to show some generally possible visualizations. As I had the chance to see him in Darmstadt last year anyway, I didn't see much new stuff.)

The party at the MIT Museum was great! Even though I wasn't allowed to drink, because I forgot my ID. I'd never have thought anyone would consider me looking younger than 21. So I take this as the most sincere compliment. Don't bother explaining they had to check my ID even if I looked 110, I really don't want to hear :) I saw Kismet! Sadly, he was switched off.

Trust me. I was kinda tired after this week. It was lots of fun, it was enormously interesting. Thanks to all the Wikipedians, who made Wikipedia and Wikimania possible. Thanks to all these people for organizing this event and helping out! I am looking forward to Wikimania 2007, wherever it will be. The bidding for hosting Wikimania 2007 is open!

Wikimania is coming

Wikimania starts on Friday. I'm looking forward to it: I'll be there with a colleague, and we will present a paper on "Wikipedia and the Semantic Web - The Missing Links" on Friday. Should you be in Frankfurt, don't miss it!

Here's the abstract: "Wikipedia is the biggest collaboratively created source of encyclopaedic knowledge. Growing beyond the borders of any traditional encyclopaedia, it is facing new problems of knowledge management: The current excessive usage of article lists and categories witnesses the fact that 19th century content organization technologies like inter-article references and indices are no longer sufficient for today's needs.

Rather, it is necessary to allow knowledge processing in a computer assisted way, for example to intelligently query the knowledge base. To this end, we propose the introduction of typed links as an extremely simple and unintrusive way for rendering large parts of Wikipedia machine readable. We provide a detailed plan on how to achieve this goal in a way that hardly impacts usability and performance, propose an implementation plan, and discuss possible difficulties on Wikipedia's way to the semantic future of the World Wide Web. The possible gains of this endeavour are huge; we sketch them by considering some immediate applications that semantic technologies can provide to enhance browsing, searching, and editing Wikipedia."

Basically we suggest introducing typed links to Wikipedia, plus an RDF export of the articles in which these typed links are treated as relations. And suddenly, you get a huge ontology, created by thousands and thousands of editors, queryable and usable, a really big starting block and incubator for Semantic Web technologies - and all this, still scalable!

That is, if the Wikipedia community agrees that this is a nice idea, which I hope with all my heart. We'll see this weekend.
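
To make the typed-link idea a bit more concrete, here is a minimal sketch in Python. It assumes a `[[relation::Target]]` link syntax, which is only one possible notation and is not specified in the text above; the function names and the `http://example.org/wiki/` base URI are made up for illustration. It pulls typed links out of an article's wikitext and writes them as RDF triples in N-Triples form, with the article as the subject.

```python
import re

# Assumed typed-link syntax: [[relation::Target]] or [[relation::Target|label]].
# This notation is an assumption for the sketch, not taken from the post.
TYPED_LINK = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_triples(title, wikitext):
    """Return (subject, relation, object) triples for every typed link found."""
    return [
        (title, relation.strip(), target.strip())
        for relation, target in TYPED_LINK.findall(wikitext)
    ]


def to_ntriples(triples, base="http://example.org/wiki/"):
    """Serialize triples as N-Triples lines; the base URI is a placeholder."""
    def uri(name):
        # Simplistic URI minting: only spaces are replaced, no real escaping.
        return "<" + base + name.replace(" ", "_") + ">"

    return "\n".join(f"{uri(s)} {uri(p)} {uri(o)} ." for s, p, o in triples)


article = (
    "London is the [[capital of::England]] and lies on the "
    "[[located on::River Thames|Thames]]."
)
triples = extract_triples("London", article)
print(to_ntriples(triples))
```

Untyped links such as `[[England]]` are simply ignored by the pattern, which matches the spirit of the proposal: existing articles keep working, and relations only appear where an editor has added a type.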

Wired: "Wikipedia is the last best place on the Internet"

WIRED published a beautiful ode to Wikipedia, painting the history of the movement with broad strokes, aiming to capture its impact and ambition with beautiful prose. It is a long piece, but I found the writing exciting.

Here's my favorite paragraph:

"Pedantry this powerful is itself a kind of engine, and it is fueled by an enthusiasm that verges on love. Many early critiques of computer-assisted reference works feared a vital human quality would be stripped out in favor of bland fact-speak. That 1974 article in The Atlantic presaged this concern well: “Accuracy, of course, can better be won by a committee armed with computers than by a single intelligence. But while accuracy binds the trust between reader and contributor, eccentricity and elegance and surprise are the singular qualities that make learning an inviting transaction. And they are not qualities we associate with committees.” Yet Wikipedia has eccentricity, elegance, and surprise in abundance, especially in those moments when enthusiasm becomes excess and detail is rendered so finely (and pointlessly) that it becomes beautiful."

They also interviewed me and others for the piece, but the focus of the article is really on what the Wikipedia communities have achieved in our first two decades.

Two corrections:

- I cannot be blamed for Wikidata alone, I blame Markus Krötzsch as well.
- The article says that half of the 40 million entries in Wikidata have been created by humans. I don't know if that is correct - what I said is that half of the edits are made by human contributors.

Wordle is good and pure

The nice thing about Wordle - whether you play it or not, whether you like it or not - is that it is one of those good, pure things the Web was made for. A simple Website, without ads, popups, monetization, invasive tracking, etc.

You know, something that can chiefly be done by someone who already has a comfortable life and won't regret not having monetized this. The same way scientists have mainly been "gentleman scientists". Or tenured professors who spent years writing novels.

And that is why I think that we should have a Universal Basic Income. To unlock that creativity. To allow for ideas from people who are not already well off to see the light. To allow for a larger diversity of people to try more interesting things.

Thank you for coming to my TED talk.

P.S.: on January 31, five days after I wrote this text, Wordle was acquired by the New York Times for an undisclosed seven-digit sum. I think that is awesome for Wardle, the developer of Wordle, and I still think that what I said was true at that time and still mostly is, although I expect the Website now to slowly change to have more tracking, branding, and eventually a paywall.

World Wide Prolog

Today I had an idea - maybe this whole Semantic Web idea is nothing else than a big worldwide Prolog program. It's the AI researchers trying to enter the real world through the W3C's backdoor...

No, really, think about it: almost everything most people do with OWL is actually some kind of logic programming. Declaring subsumptions, predicates, conjunctions, testing for entailment, getting answers out of this - but on a worldwide scale. And your browser does the inferencing for you (or maybe the server? Depends on your architecture).
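
As a toy illustration of that analogy, here is a hedged sketch in Python of what "declaring subsumptions and testing for entailment" boils down to. In Prolog one might write a rule like `mammal(X) :- dog(X).`; below, the same idea is done by naive forward chaining over subclass declarations. The class names, the individual `rex`, and the function name are all invented for the example, and the sketch deliberately ignores everything that makes OWL harder than this.

```python
def entailed_facts(facts, subclass_of):
    """Forward-chain: if x is a C and C is a subclass of D, then x is a D."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot, since new facts are added inside the loop.
        for individual, cls in list(inferred):
            for sub, sup in subclass_of:
                if cls == sub and (individual, sup) not in inferred:
                    inferred.add((individual, sup))
                    changed = True
    return inferred


# Hypothetical subsumption declarations and one asserted fact.
subclass_of = {("Dog", "Mammal"), ("Mammal", "Animal")}
facts = {("rex", "Dog")}

closure = entailed_facts(facts, subclass_of)
print(("rex", "Animal") in closure)  # entailment test
```

The loop stops once a full pass adds nothing new, so the result is the complete set of class memberships that follow from the declarations.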

There are still a lot of open questions (and the actual semantic differences between Description Logics and Logic Programming surely aren't the smallest of them), like how to infer anything from contradicting data (something that surely will happen in the World Wide Semantic Web), how to treat dynamics (I'm not sure how to do that without reification in RDF), and much more. Looking forward to seeing these issues resolved...

Zen and the Art of Motorcycle Maintenance

13 May 2021

During my PhD, on the topic of ontology evaluation - figuring out what a good ontology is and what is not - I was running circles up and down trying to define what "good" means for an ontology (Benjamin Good, another researcher on that topic, had it easier, as he could call his metric "Good metric" and be done with it).

So while I was struggling with the definition in one of my academic essays, a kind anonymous reviewer (I think it was Aldo Gangemi) suggested I should read "Zen and the Art of Motorcycle Maintenance".

When I read the title of the suggested book, I first thought the reviewer was being mean or silly and suggesting a made-up book because I was so incoherent. It took me two days to actually check whether that book existed, as I wouldn't believe it.

It existed. And it really helped me, by allowing me to set boundaries on how far I can go in my own work, and by showing me that it is OK to have limitations, and that trying to solve EVERYTHING leads to madness.

(Thanks to Brandon Harris for triggering this memory)