20 years of editing Wikipedia

11 May 2023

Today it's been exactly twenty years since I made my first edit to Wikipedia. It was about the island of Brač, in the German Wikipedia.

Here is the version of the article as I created it: Brač (as of May 11, 2003)

210,000 year old human skull found in Europe

A Homo sapiens skull that is 210,000 years old has been found in Greece, together with a Neanderthal skull from 175,000 years ago.

The oldest European Homo sapiens remains known so far date to only 40,000 years ago.


22 years of Wikipedia

I was just reading a long discussion about the differences between OpenStreetMap and Wikipedia / Wikidata, in which one of the mappers complained that "Wiki* cares less about accuracy than the fact that there is something that can be cited", calling Wikipedia / Wikidata contributions "armchair work" because we don't go out into the world to check a fact, but rely on references.

I understand the expressed frustration, but at the same time I'm having a hard time letting go of "reliability not truth" being a pillar of Wikipedia.

But this makes Wikipedia an inherently conservative project, because we don't reflect a change in the world or in our perception directly, but have to wait for reliable sources to put it in the record. This is something I was deeply uncomfortable with: so much of my life devoted to a conservative project?

Wikipedia is a conservative project, but at the same time it's a revolutionary project. Making knowledge free and making knowledge production participatory is politically and socially a revolutionary act. How can this seeming contradiction be brought to a higher level of synthesis?

In the last few years, my discomfort with the idea of Wikipedia being conservative has considerably dissipated. One might think, sure, that happened because I'm getting older, and as we get older, we get more conservative (there's, by the way, unfortunate data questioning this premise: maybe the conservative ones simply live longer because of inequalities). Maybe. But I like to think that the meaning of the word "conservative" has changed. When I was young, the word conservative referred to right wing politicians who aimed to preserve the values and institutions of their day. An increasingly influential part of today's right wing, though, has turned into a movement that does not conserve and preserve values such as democracy, the environment, equality, freedoms, or the scientific method. This is why I'm more comfortable with Wikipedia's conservative aspects than I used to be.

But at the same time, that can lead to a problematic stasis. We need to acknowledge that the sources and references Wikipedia has been built on are biased, due to historic and ongoing inequalities in the world and due to differing values regarding the importance of certain types of references. If we truly believe that Wikipedia aims to provide everyone with access to the sum of all human knowledge, we have to continue the conversations that have started about oral histories and traditional knowledges, beyond the confines of academic publications. We have to continue this conversation and this evolution and move them further into the center of the movement.

Happy Birthday, Wikipedia! 22 years, while I'm 44 - half of my life (although I didn't join until two years later). For an entire generation the world has always been a world with free knowledge that everyone can contribute to. I hope there is no going back from that achievement. But just like democracy and freedom, this is not a value that is automatically part of our world. It is a vision that has to be lived, that has to be defended, that has to be rediscovered and regained again and again, refined and redefined. We (the collective we) must wrest it from the gatekeepers of the past (including me) to keep it a living, breathing, evolving, ever changing project, so that it sees not just another twenty-two years, but so that we understand this project as merely a foundation that will accompany us for centuries.

35th birthday of the Web

Celebrating the 35th birthday of the World Wide Web, a letter by its founder, Tim Berners-Lee.

Discussing some of the issues of the Web of today: too much centralization, too much exploitation, too much disinformation, all made even more dire by the development of AI.

What to do? Some of the solutions the letter mentions are Mastodon, a decentralized social network, and Solid, a Web-standards-based data governance solution, but it recognizes that more is needed, "to back the morally courageous leadership that is rising, collectivise their solutions, and to overturn the online world being dictated by profit to one that is dictated by the needs of humanity." I agree with that, but find it a bit vague.

At first I was terribly annoyed that the letter was published on Medium, as this is a symptom of the centralization of the Web. I say this fully conscious that I am discussing it on Facebook. Obviously, both of these should be happening on our own domains, and they do: I link not to Medium, but to the Web Foundation site, and I also have this posted on my own site and on my Mastodon account. So, it is there, on the real Web, not just in the closed walled garden of Facebook and on one of the megasites such as Medium. But there is no indication of engagement on the Web Foundation's post, whereas the Medium article records more than 10,000 reactions, and my Facebook post will also show more reactions than my Website (though the Mastodon post could be competitive with Facebook for me).

I want to believe that Solid is the next important step, but Leigh Dodds's recent post on Solid, and particularly the discussion in the post, didn't inspire hope.

A bitter, better lesson

Rich Sutton expresses some frustration in his short essay about computation and simple methods beating smart methods again and again.

Rodney Brooks answers with great arguments on why this is not really the case, and how we're just hiding human ingenuity and smartness better.

They're both mostly right, and it was interesting to read the arguments on both sides. And yet, not really new - it's mostly rehashing the arguments from The unreasonable effectiveness of data by Alon Halevy, Peter Norvig, and Fernando Pereira ten years ago. But nicely updated and much shorter. So worth a read!

A blog for the AIFB

Although I have blogged here about the AIFB as well - the great place I work at - Jens Hartmann suggested creating its own, dedicated AIFB-Blog on ontoworld. It's still in beta, kind of. We hope that other AIFB people will blog there as well and keep you up to date with AIFB matters: our papers, workshops, conference attendances, breakthrough results, but also the great weather in Karlsruhe and the stories that happen here.

So, while I will continue to post about the Semantic Web here, the more workplace-related stuff will be found there, at the new AIFB-Blog.

A conspiracy to kill a browser

Great story about how YouTube helped with moving away from IE6.

"Our most renegade web developer, an otherwise soft-spoken Croatian guy, insisted on checking in the code under his name, as a badge of personal honor, and the rest of us leveraged our OldTuber status to approve the code review."

I swear that wasn't me. Although I would have loved to do it.

(first published on Facebook March 12, 2024)

A new decade?

The job of an ontologist is to define concepts. And since I see some posts commenting on whether a decade is closing and a new decade is starting tonight, here's my private, but entirely official position.

A decade is a consecutive timespan of ten years, and therefore at every given point a new decade starts and one ends. But that's a trivial answer to the question and not very useful.

There are two ways to count calendar decades, and both are arbitrary and rely on retconning, I mean, they rely on redefining the past. Therefore there is no right or wrong.

Method one is to use the proleptic Gregorian calendar, start with the year 1, end with the year 10, and call that the first decade. If you keep counting, then the two hundred and third decade will start on January 1st, 2021, and we are currently firmly in the two hundred and second decade, and will stay there for another year.

Method two is based on the fact that for a millennium now, and for many years to come, there is a time period that conveniently lasts a decade in which the years start with the same three digits. That is, the years starting with 202, which are called the 2020s, the ones starting with 199, which are called the 1990s (or sometimes just the 90s), etc. For centuries now we can find support for these kinds of decades being widely used. According to this method, tonight marks a new decade.
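For the arithmetically inclined, here is a minimal Python sketch of the two counting methods; the function names are mine, purely for illustration:

def decade_ordinal(year):
    # Method one: proleptic Gregorian counting, where decade 1 covers
    # the years 1-10, so decade n covers the years 10*(n-1)+1 to 10*n.
    return (year - 1) // 10 + 1

def decade_label(year):
    # Method two: group years by their first three digits, so 2020-2029
    # are "the 2020s", 1990-1999 are "the 1990s", and so on.
    return f"the {(year // 10) * 10}s"

print(decade_ordinal(2019), decade_ordinal(2021))  # 202 203
print(decade_label(2019), decade_label(2020))      # the 2010s the 2020s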

So whether you are celebrating a new year tonight or not (because there are many other calendars out there too), or a new decade or not, I wish you wonderful 2020s!

A quick draft for a curriculum for Computer Science

The other day, on Facebook, I asked who would be the person closest to being a popularizer of Computer Science ideas for the wider audience, which led to an interesting and insightful discussion.

Pat Hayes asked what I would consider the five (or so) core concepts of Computer Science. Ernest Davis answered with the following short list (not in any particular order):

  1. Virtual machine
  2. Caching
  3. Algorithm
  4. Data structure
  5. Programming language

And I followed up with this drafty, much longer answer:

  1. how and why computation works; that a computation is a mapping from your problem domain into some machine state, then we have some automatic movement, and the result represents an answer to your question; that it is always layers of interpretation; that it doesn't matter whether the computing machine is made of ICs or of levers, marbles, and gravity (i.e. what is a function); that computation is always real and you can't simulate computation; what can be done with computation and what cannot; computational thinking - this might map to number 1 in Ernest's list
  2. that everything can be represented with zeros and ones, but doesn't have to be; it could also be represented by A and B and Cs, and many other ways; that two states are simply convenient for electric devices; that all information, all data, all input to all computation, and the steps for computations themselves are represented with zeros and ones (i.e. the von Neumann architecture and binary encoding); what can be represented in this paradigm and what cannot - this might map to number 4 in Ernest's list
  3. how are functions encoded; how many different functions can have the same results; how wildly different in efficiency functions can be even when they have the same result; why that makes some things quick to calculate whereas others take a long time; basically smearing ideas from lambda calculus and assembler and building everything from NAND circuits (see the small sketch after this list); why this all maps to higher level languages such as JavaScript - this might map to ideas from 2, 3, and 5 on Ernest's list
  4. bringing it back to the devices; where does, physically, the computation happen, where is physically the data stored, and why it matters in terms of privacy, equity, convenience, economics, interdependence, even freedom and independence; what kind of computations and data storage we can expect to have in our mobile phones, in a data center, in an RFID card; how long the turnaround times are in each case; how cryptography works and what kind of guarantees it can provide; why centralization is so alluring and what the price of that might be; and what might be the cost of computation for the environment
  5. given our times, and building on the previous lessons, what is the role of machine learning; how does it actually work, why does it work as well as it does, and why does it not work when it doesn't and where it can't work; what does this have to do with "intelligence", if anything; what becomes possible because of these methods, and what it costs; why these methods may reinforce inequities; but also how they might help us significantly increase access to better health care for many people and allow computers to have much more intuitive interfaces, and thus democratize access to computing resources
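To make the NAND remark in item 3 a bit more tangible, here is a tiny Python sketch (my own illustration, not part of the original discussion) that builds NOT, AND, and OR purely out of a NAND function:

def nand(a, b):
    # NAND is functionally complete: every Boolean function can be
    # built from this one gate alone.
    return not (a and b)

def not_(a):
    return nand(a, a)

def and_(a, b):
    return nand(nand(a, b), nand(a, b))

def or_(a, b):
    return nand(nand(a, a), nand(b, b))

# A quick truth-table check:
for a in (False, True):
    assert not_(a) == (not a)
    for b in (False, True):
        assert and_(a, b) == (a and b)
        assert or_(a, b) == (a or b)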

I think the intuitions in 1, 2, and maybe 3 are really the core of computer science, and then 4 and 5 provide shortcuts to important questions for ourselves and society that, I think, would be worthwhile for everyone to ponder, and to have an informed understanding of, so that they can meaningfully make relevant decisions.

A sermon on tolerance and inclusion

Warning: meandering New Year's sermon ahead, starting at a random point and going somewhere entirely else.

I started reading Martin Kay's book on Translation, and I am enjoying it quite a bit so far. Kay passed away in August 2021. His work seems highly relevant for the work on Abstract Wikipedia.

One thing that bummed me though is that for more than a page in the introduction he rants about pronouns and how he is going to use "he" to generically mean both men and women, and how all other solutions have deficits.

He culminates in the explanation: "Another solution to this problem is which is increasing in popularity, is to use both 'he' and 'she', shifting between them more or less randomly. So we will sometimes get 'When a translator is confronted with a situation of this kind, she must decide...'. The trouble with this is that some readers, including the present writer, reacts quite differently to the sentence depending on which version of the generic pronoun it contains. We read the one containing 'he' smoothly and, all else being equal, assimilate the intended meaning. Encountering the one with 'she', on the other hand, is like following a television drama that is suddenly interrupted by a commercial."

Sooo frustratingly close to getting it.

I wish he had just not spent over a page on this topic, but simply used the generic 'he' in the text, and left it at that. I mean, I don't expect everyone born more than eighty years ago to adjust to the modern usage of pronouns.

Now, I am not saying that to drag Kay's name through dirt, or to get him cancelled or whatever. I have never met him, but I am sure he was a person with many positive facets, and given my network I wouldn't be surprised if there are people who knew him and can confirm so. I'm also not saying that to virtue signal and say "oh man, look how much more progressive I am". Yes, I am slightly annoyed by this page. Unlike many others though, I am not actually personally affected by it - I use the pronoun "he" for myself and not any other pronoun, so this really is not about me. Is it because of that that it is easy for me to gloss over this and keep reading?

So is it because I am not affected personally that it is so easy for me to say the following: it is still worthwhile to keep reading his work, and the rest of the book, and to build on top of his work and learn from him. The people we learn some things from, the influences we accept, they don't have to be perfect in every way, right? Would it have been as easy for me to say that if I were personally affected? I don't know.

I am worried about how quickly parts of society seem to be ready to "cancel" and "call out" people, and how willing they are to tag a person as unacceptable because they do not necessarily share every single belief that is currently regarded as required.

I have great difficulties in drawing the line. Which beliefs or actions of a person should be sufficient grounds to shun them or their work? When JK Rowling doubles down on her stance regarding trans women, is this enough to ask everyone to drop all interest in the world she created and the books she wrote? Do we reshoot movie scenes such as the cameo of Donald Trump in Home Alone 2 in order to "purify" the movie and make it acceptable for our new enlightened age again? When Johnny Depp was accused of domestic abuse, did he need to be recast in movies he had already signed on for? Do we also need to stop watching his previous movies? Do the believable accusations of child abuse against Marion Zimmer Bradley mean that we have to ignore her contributions to feminist causes, never mind her books? Should we stop using a font such as Gill Sans because of the sexual abuse Eric Gill committed against his daughters? Do we have to stop watching movies or listening to music produced by murderers such as OJ Simpson, Phil Spector, or Johnny Lewis?

I intentionally escalated the examples, and they don't compare at all to Kay's defence of his usage of pronouns.

I offer no answers as to where the line should be, I have none. I don't know. In my opinion, none of us is perfect, and none of our idols, paragons, or example model humans will survive the scrutiny for perfection. This is not a new problem. Think of Gandhi, Michael Jackson, Alice Schwarzer, Socrates - no matter where you draw your idols from, they all come with imperfections, sometimes massive ones.

Can we keep and accept their positive contributions - without ignoring their faults? Can we allow people with faults to still continue to contribute their skills to society, or do we reduce them to their faults and negatives? Do we have to get someone fired for tweeting a stupid joke? Do we demand perfection by everyone at all time?

Or do we allow everyone to be human, make and have errors, and have beliefs many don't deem acceptable? Committing or causing actions resulting from these beliefs? Even if these actions and beliefs hurt or endanger people, or deny the humanity of others? We don't have to and should not accept their racism, sexism, homo- and transphobia - but can and should we still recognise their other contributions?

I am worried about something else as well. By pushing out so many because of the one thing they don't want to accept in the basket of required beliefs, we push them all into the group of outsiders. But if there are too many outsiders, the whole system collapses. Do we all have to have the same belief on guns, on climate, on gender, on abortion, on immigration, on race, on crypto, on capitalism, on housing? Or can we integrate and work together even if we have differences?

The vast majority of Americans think that human-caused climate change is real and that we should act to avoid it. Only 10% don't. And yet, because of the way we define and fence our in- and outgroups, we have a strong voting block that repeatedly leads to outright sabotage to effective measures. A large majority of Americans support the right to abortion, but you would never be able to tell given the fights around laws and court cases. Taxing billionaires more effectively is highly popular with voters, but again these majorities fizzle away and don't translate to the respective changes in the tax code.

I think we should be able to work together with people we don't agree with on everything. We should stop requiring perfection and alignment on all issues before moving forward. But then again, that's what I am saying, and I am saying it from a position of privilege, am I not? I am male. I am White. I am heterosexual. I am not Muslim or Jewish. I am well educated. I am not poor. I am reasonably technologically savvy. I am not disabled. What right do I have at all to voice my opinion on these topics? To demand acceptance for people with beliefs that hurt or endanger people who are not like me? Or even to ask for your precious attention for these words of mine?

None.

And yet I hope that we will work together towards progress on the topics we agree on, that we will enlighten each other on the topics we disagree on, and that we will be able to embrace more of us on our way into the future.

P.S.: this post is problematic and not very well written, and I recognise that. Please refer to the discussion about it on Facebook.

ACM Review

ACM Computing Reviews reviewed our paper on the Semantic Wikipedia. Yay! Sadly, I can't read the review because I need a login.

Would love to see what they're saying, but I guess it's not meant to be open to the public. Or to the authors.



ACM Review II

Thanks to Mary-Lynn Bragg for answering my complaint about non-accessible reviews so quickly. She sent me the review to read, and I was quite happy with it. Michael Lesk, one of the guys who built Unix, wrote the review, and recommended it to "designers of volunteer or collaborative systems and [...] those studying cooperative work." Thanks, Michael, thanks Mary-Lynn.

Sooo -- read our straightforward and easy to read paper! :)

By the way -- I still think ACM Computing Reviews should offer their reviews for free to the public. This would increase the impact of their reviews dramatically, and also the impact of the reviewed papers.

AI and role playing

An article about AI and role playing games, and thus in the perfect intersection of my interest.

But the article is entirely devoid of any interesting content, and basically boils down to asking the question "could RPGs be a Turing test for AI?"

The answer is so painfully obviously "yes" that no one ever bothered to write it down. I mean, Turing basically framed his test as a role playing game!

AI is coming, and it will be boring

I was asked about my opinion on this topic, and I thought I would have some profound thoughts on this. But I ended up rambling, and this post doesn’t really make any single strong point. tl;dr: Don’t worry about AI killing all humans. It’s not likely to happen.

In an interview with the BBC, Stephen Hawking stated that “the development of full artificial intelligence could spell the end of the human race”. While this is hard to deny, it is rather trivial: any sufficiently powerful tool could potentially spell the end of the human race, given a person who knows how to use that tool to achieve such a goal. And there are far more dangerous developments - for example, global climate change, the arsenal of nuclear weapons, or an economic system that continues to sharpen inequality and social tension.

AI will be a very powerful tool. Like every powerful tool, it will be highly disruptive. Jobs and whole industries will be destroyed, and a few others will be created. Just as electricity, the car, penicillin, or the internet, AI will profoundly change your everyday life, the global economy, and everything in between. If you want to discuss consequences of AI, here are a few that are more realistic than human extermination: what will happen if AI makes many jobs obsolete? How do we ensure that AIs make choices compliant with our ethical understanding? How to define the idea of privacy in a world where your car is observing you? What does it mean to be human if your toaster is more intelligent than you?

The development of AI will be gradual, and so will the changes in our lives. And as AI keeps developing, things once considered magical will become boring. A watch you could talk to was powered by magic in Disney’s 1991 classic “Beauty and the Beast”, and 23 years later you can buy one for less than a hundred dollars. A self-driving car was the protagonist of the 80s TV show “Knight Rider”, and thirty years later they are driving on the streets of California. A system that checks if a bird is in a picture was considered a five-year research task in September 2014, and less than two months later Google announced a system that can provide captions for pictures - including birds. And these things will become boring in a few years, if not months. We will have to remind ourselves how awesome it is to have a computer in our pocket that is more powerful than the one that got Apollo to the moon and back. That we can make a video of our children playing and send it instantaneously to our parents on another continent. That we can search for any text in almost any book ever written. Technology is like that. What’s exciting today will become boring tomorrow. So will AI.

In the next few years, you will have access to systems that will gradually become capable of answering more and more of your questions. That will offer advice and guidance to help you navigate your life towards the goals you give them. That will be able to sift through text and data and start to draw novel conclusions. They will become increasingly intelligent. And there are two major scenarios that people are afraid of at this point:

  1. That the system will become conscious and develop their own intentions and their own will, and they will want to destroy humanity: the Skynet scenario from the Terminator movies.
  2. That the system might get a task, and figure out a novel solution for the task which unfortunately wipes out humanity. This is the paperclip scenario (an AI gets the task to create paperclips, and kills all humans in the process), which has not yet been turned into a blockbuster.

The Skynet scenario is just mythos. There is no indication that raw intelligence is sufficient to create intrinsic intention or will.

The paperclip scenario is more realistic. And once we get closer to systems with such power, we will need to put the right safeguards in place. The good news is that we will have plenty of AIs at our disposal to help us with that. The bad news is that discussing such scenarios now is premature: we simply don’t know what these systems will look like. That’s like starting a committee a hundred years ago to discuss the danger coming from novel weaponry: no one in 1914 could have predicted nuclear weapons and their risks. It is unlikely that the results of such a committee would have provided much relevant ethical guidance for the Manhattan project three decades later. Why should that be any different today?

In summary: there are plenty of consequences of the development of AI that warrant intensive discussion (economical consequences, ethical decisions made by AIs, etc.), but it is unlikely that they will bring the end of humanity.

Further reading

Published originally on Medium on December 14, 2014

AIFB OWL tools

Working with ontologies isn't yet as easy as it could be - especially because the number of little helpers is still far too small. After having written dlpconvert and owlrdf2owlxml (the tool with maybe the most clumsy name in the history of the Semantic Web), I noticed how easy it would be to write some more tools based on Boris' KAON2 OWL ontology infrastructure.

And so I went ahead. First I integrated dlpconvert and owlrdf2owlxml (or r2x for short), then I added a simple ontology dumper and an axiom and entity counter. Want to know how many individuals are in your ontology? Simply type owl count myontology.owl -individual, and there you go. Want a list of all classes? Try owl print myontology.owl -owlclass. It's as easy as that.

I'm totally aware that this functionality alone maybe isn't worth the effort of building a tool for. But this is just a beginning: I want to add more functionality to filter, merge, compare, and much more. The point is to end up with a handy little set of OWL tools you can work with. That is something I have really missed with OWL, and now here it is. At least, a beginning.

Grab your copy now of the AIFB OWL Tools.

AKTS 2019

September 24 was the AKTS workshop - Advanced Knowledge Technologies for Science in a FAIR world - co-located with the eScience and Gateways conferences in San Diego. As usual with my trip reports, I won't write about every single talk, but offer only my own personal selection and view. This is not an official report on the workshop.

I had the honor of kicking off the day. I proposed using Wikidata for describing datasets so that dataset catalogs can add these descriptions to their indexes. The standard way to do so is to use Schema.org annotations describing the datasets, but our idea here was to provide a fallback solution in case Schema.org cannot be applied for one reason or another. Since the following talks would also be talking about Wikidata, I used the talk to introduce Wikidata in a bit more depth. In parallel, I kicked the same conversation off on Wikidata as well. The idea was well received, but one good question was raised by Andrew Su: why not add Schema.org annotations to Wikidata instead?
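For readers who have not come across such annotations: the sketch below shows the general shape of a Schema.org Dataset description in JSON-LD, generated here with plain Python. All concrete values are made up for illustration; this is not one of the datasets discussed at the workshop.

import json

# A minimal, made-up Schema.org Dataset description. Dataset catalogs
# pick up markup like this when it is embedded in a dataset's landing page.
dataset_description = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example measurement dataset",
    "description": "A made-up dataset used only to illustrate the markup.",
    "url": "https://example.org/datasets/example",
    "license": "https://creativecommons.org/publicdomain/zero/1.0/",
    "creator": {"@type": "Organization", "name": "Example Lab"},
}

print(json.dumps(dataset_description, indent=2))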

After that, Daniel Garijo of USC's ISI presented WDPlus, Wikidata Plus, a prototype for how to extend Wikidata with more data (particularly tabular data) from external data sources, such as censuses and statistical publications. The idea is to surround Wikidata with a layer of so-called satellites, which materialize statistical and other external data into Wikidata's schema. They implemented a mapping language, T2WDML, that allows one to grab CSV numbers and turn them into triples that are compatible with Wikidata's schema, and thus can be queried together. There seems to be huge potential in this idea, particularly if one can connect the idea of federated SPARQL querying with on-the-fly mappings, extending Wikidata to a virtual knowledge base that would easily be several times its current size.

Andrew Su from Scripps Research talked about using Wikidata as a knowledge graph in a FAIR world. He presented their brilliant Gene Wiki project, which adds knowledge about genes and proteins to Wikidata. He presented the idea of using Wikidata as a generalized back-end for customized frontend applications - which is perfect. Wikidata's frontend is solid and functional, but in many domains there is a large potential to improve the UX for users in specific domains (and we are seeing some of this flowering around Lexemes, with Lucas Werkmeister's work on lexical forms). Su and his lab developed ChlamBase, which allows the Chlamydia research community to look at the data they are interested in, and to easily add missing data. Another huge advantage of using Wikidata? Your data is going to live beyond the life of the grant. A great overview of the relevant data in Wikidata can be seen in this rich and huge and complex diagram.

The talks then switched to the FAIR principles, first with Jeffrey Grethe of UCSD and then Mark Musen of Stanford. Mark pointed out how quickly FAIR turned from a new idea into a meme that is pervasive everywhere, with funding agencies now starting to require it. But data often has issues. One example: BioSample is the best metadata NIH has to offer. But 73% of the Boolean metadata values are not 'true' or 'false' but have values like "nonsmoker" or "recently quitted". 26% of the integers were not parseable. 68% of the entries that were supposed to come from a controlled vocabulary did not. UX that helps with entering this data, such as CEDAR, would improve the quality considerably.

Carole Goble then talked about moving towards using Schema.org for FAIRer Life Sciences resources and defining a Schema.org profile that makes datasets easier to use. The challenges in the field have been mostly social - there was a lot of confidence that we know how to solve the technical issues, but the social ones proved to be challenging. Carole named four of those explicitly:

  1. ontology-itis
  2. building consensus (it's harder than you think)
  3. the Schema.org Catch-22 (Schema.org won't take it if there is no usage, but people won't use it until it is in Schema.org)
  4. dedicated resources (people think you can do the social stuff in your spare time, but you can't)

Natasha Noy gave the keynote, talking about Google Dataset Search. The lessons learned from building it:

  1. Build an ecosystem first, be technically light-weight (a great lesson which was also true for Wikipedia and Wikidata)
  2. Use open, non-proprietary, standard solutions, don't ask people to build it just for Google (so in this case, use Schema.org for describing datasets)
  3. bootstrapping requires influencers (i.e. important players in the field, that need explicit outreach) and incentives (to increase numbers)
  4. semantics and the KG are critical ingredients (for quality assurance, to get the data in quickly, etc.)

At the same time, Natasha also reiterated one of Mark's points: no matter how simple the system is, people will get it wrong. The number of ways a date field can be written wrong is astounding. And often it is easier to make the ingester more accepting than try to get people to correct their metadata.

Chris Gorgolewski followed with a session on increasing findability for datasets, basically a session on SEO for dataset search: add generic descriptions, because people who need to find your dataset probably don't know your dataset and the exact terms (or they would already use it). Ensure people coming to your landing site have a pleasant experience. And the description is markup, so you can even use images.

I particularly enjoyed a trio of paper presentations by Daniel Garijo, Maria Stoica, Basel Shbita and Binh Vu. Daniel spoke about OntoSoft, an ontology to describe software workflows in sufficient detail to allow executing them, and also to create input and output definitions, describe the execution environment, etc. Close to those in- and output definitions we find Maria's work on an ontology of variables. Maria presented a lot of work to identify the meaning of variables, based on linguistic, semantic, and ontological reasoning. Basel and Binh talked about understanding data catalogs more deeply, being able to go deeper into the tables and understand their actual content. If one were to connect the results of these three papers, one could potentially see how data from published tables and datasets could come alive and answer questions almost out of the box: extracting knowledge from tables, understanding their roles with regard to the input variables, and how to execute the scientific workflows.

Sure, science fiction, and the question is how well would each of the methods work, and how well would they work in concert, but hey, it's a workshop. It's meant for crazy ideas.

Ibrahim Burak Ozyurt presented an approach towards question answering in the bio-domain using deep learning, including GloVe and BERT and other state-of-the-art work. And it's all on GitHub! Go try it out.

The day closed with a panel with Mark Musen, Natasha Noy, and me, moderated by Yolanda Gil, discussing what we learned today. It quickly centered on the question how to ensure that people publishing datasets get appropriate credit. For most researchers, and particularly for universities, paper publications and impact factors are the main metric to evaluate researchers. So how do we ensure that people creating datasets (and I might add, tools, workflows, and social consensus) receive the fair share of credit?

Thanks to Yolanda Gil and Andrew Su for organizing the workshop! It was an exhausting, but lovely experience, and it is great to see the interest in this field.

Abraham Bernstein on users

"The regular user is not able to cope with strict inheritance."

Abraham Bernstein of the University of Zürich was at the AIFB today and gave a talk on SimPack - A Generic Java Library for Similarity Measures in Ontologies. Not being an expert in mapping, alignment and similarity, I still saw some of the interesting ideas in it, and I liked the large number of different approaches towards measuring similarity.

What struck me much more was the above statement, which is based on his experience with, you know, normal users, who are "not brainwashed with object-oriented paradigms". Another example he gave was his 5-year-old kid being perfectly able to cope with default reasoning - the "penguins are birds, but penguins can't fly" thing - which does not follow strict inheritance.

This was quite enlightening, and it leads to many questions: if the user can't even deal with subsumption, how do we expect them to deal with disjunctions, complements or inverse functional properties?

Abraham's statement is based on experience with the Process Handbook, and not just drawn from thin air. There are a lot of use cases for the Semantic Web that do *not* require the participation of the normal end user, thus there still lie plenty of possibilities for great research. But I still believe that the normal end user has to unlock the Semantic Web in order to really make the whole idea lift off and fly. But in order to achieve that we need to tear down the wall that Abraham describes here.

Any ideas how to do this?

Abraham Taherivand to step down from Wikimedia Deutschland

Today Abraham Taherivand announced that he is stepping down as the CEO of Wikimedia Deutschland at the end of the year.

Thank you for everything!

Academic lineage

Sorry for showing off, but it is just too cool not to: here is a visualization of my academic lineage according to Wikidata.

Query: w.wiki/AE8

AcceLogiChip

Accelerated logic chips - that would be neat.

The problem with all this OWL stuff is that it is computationally expensive. Google beats you in speed easily, having some 60,000 PCs or so, but indexing some 8 billion web pages, each with maybe a thousand words. And if you ever tried Google's Desktop Search, you will see they can perform these miracles right on your PC too! (Never mind that there are a dozen tools doing exactly the same stuff Google's Desktop Search does, just better - but hey, they lack the name!)

What does the Semantic Web achieve? Well, ever tried to run a logic inferencing engine with a few million instances? With a highly axiomatized TBox of, let's say, just a few thousand terms? No? You really should.

Sure, our PCs do get faster all the time (thanks to Moore's Law!), but is that fast enough? We want to see the Semantic Web up and running not in a few more iterations of Moore's Law, but much, much earlier. Why not use the same trick the graphics magicians did? Highly specialized accelerated logic chips, things that can do your tableau reasoning in just a fraction of the time needed by your bloated all-purpose CPU.

All problems solved

Today I feel a lot like the nameless hero from the PhD comics, and what is currently happening to him (beginning of the storyline, continuation, especially here, and very much like here, but sadly, not at all like here). Today we had Boris Motik visiting the AIFB, who is one of the brightest people on this planet. And he gave us a more than interesting talk on how to integrate OWL with relational databases. What especially interested me was his great work on constraints -- especially since I was working on similar issues, unit tests for ontologies, as I think constraints are crucial for evaluating ontologies.

But Boris just did it much cleaner, better, and more thoroughly. So I will dive into his work and try to understand it, to see if there is anything left to do for me, or if I have to refocus. There's still much left, but I am afraid the most interesting part from a theoretical point of view is solved. Or rather, in the name of progress, I am happy it is solved. Let's get on with the next problem.

(I *know* it is my own fault)

An indigenous library

Great story about an indigenous library using its own categorization system instead of the Dewey Decimal System (which really doesn't work for indigenous topics - I mean, it doesn't really work for the modern world either, but that's another story).

What I am wondering, though, is whether they're going far enough. Dewey's system is ultimately rooted in Aristotelian logic and categorization - with a good dash of practical concerns of running a physical library.

Today, these practical concerns can be overcome, and it is unlikely that indigenous approaches to knowledge representation would be rooted in Aristotelian logic. Yes, having your own categorization system is a great first step - but that's like writing your own anthem following the logic of European hymns, or creating your own flag following the weird rules of European medieval heraldry. What would it look like if you really went back to the principles and roots of the people represented in these libraries? Which novel alternatives to representing and categorizing knowledge could we uncover?

Via Jens Ohlig.

Annotating axioms in OWL

I sent this to the OWLED list, which is preparing an OWL 1.1 recommendation. The week before, Alan Rector had suggested adding the possibility to annotate axioms in OWL, which is currently not possible. There are many good uses for that, like provenance, trust, and so on. But the discussion wasn't too fruitful, so I suggested the following solution.

After it came up in the discussion last week, I had hoped an elegant solution for annotating axioms would arise. Sadly, no one had a brilliant idea, so I went ahead and tackled the problem in my mediocre way.

First, what do I want to achieve with my solution:

  1. Don't crack the Semantic Web stack. The solution has to be compatible with XML, RDF and OWL. I don't want to separate OWL from RDF, but to offer a solution that can be handled by both.
  2. We want to annotate not just entities, but also axioms. Thus an axiom needs to be able to be the subject of a statement. Thus an axiom needs to have a URI.
  3. The solution must be easy to implement, or else people will get my FOAF file, see whom I care about, and hurt them.

Did I miss something? I found two solutions for this problem.

A) Define the relationship between an ontology (which does have a URI) and the axioms stated inside it. Then we can talk about the ontologies, annotate those, add provenance information, etc. Problem: after importing axioms from one ontology into another, that information is lost. We would need a whole infrastructure for Networked Ontologies to achieve that, which is a major and worthy task. With this solution, you can annotate a single axiom by putting it alone into an ontology, and claim that when annotating the ontology you actually annotate the axiom as well. Not my favourite solution, because of several drawbacks which I won't go into deeper unless asked.

B) The other solution is using reification (stop yelling and moaning right now!). I'm serious. And it's not that hard, really. First, the OWL specification offers a standard way to translate the axioms into triples. Second, the RDF specification offers a standard way to reify a triple. With RDF reification we can give a triple a name. Then we can introduce a new resource type owl11:axiom, whose instances contain the triples that were translated from a certain DL axiom. This RDF resource of type owl11:axiom is then the name/URI of the original DL axiom.

RDF triples that have a subject of type rdf:Statement or owl11:axiom don't have semantics with regard to OWL DL's model-theoretic semantics; they are just syntactic parts of the ontology that allow the naming of axioms in order to annotate them.

For example, we say that all Humans are Mortal. In Abstract Syntax this is

SubClassOf(Human Mortal)

In RDF triples (N3) this is:

:Human rdfs:subClassOf :Mortal.

Now, reifying this, we add the triples:

:statement1 rdf:type rdf:Statement.
:statement1 rdf:subject :Human.
:statement1 rdf:predicate rdfs:subClassOf.
:statement1 rdf:object :Mortal.
:axiom1 owl11:consistsOf :statement1.

Now we can make annotations:

:axiom1 :bestBefore "2011-12-24"^^xsd:date.
:axiom1 :utteredBy :Aristotle.

Naturally, :bestBefore and :utteredBy have to be annotation properties. When an axiom is broken up into more than one triple, the reason for having an extra owl11:axiom instead of simply using rdf:Statement should become clear.
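To check that this is indeed implementable with off-the-shelf RDF tooling, here is a small Python sketch using rdflib. The owl11:axiom class and owl11:consistsOf property are of course hypothetical, taken from the proposal above, and the example namespace is made up:

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS, XSD

EX = Namespace("http://example.org/")
OWL11 = Namespace("http://example.org/owl11#")  # hypothetical namespace for this proposal

g = Graph()

# The axiom itself, as a plain triple.
g.add((EX.Human, RDFS.subClassOf, EX.Mortal))

# Standard RDF reification of that triple, giving it a name.
stmt = EX.statement1
g.add((stmt, RDF.type, RDF.Statement))
g.add((stmt, RDF.subject, EX.Human))
g.add((stmt, RDF.predicate, RDFS.subClassOf))
g.add((stmt, RDF.object, EX.Mortal))

# The (hypothetical) axiom resource grouping the reified triples.
axiom = EX.axiom1
g.add((axiom, RDF.type, OWL11.axiom))
g.add((axiom, OWL11.consistsOf, stmt))

# Annotations attached to the axiom.
g.add((axiom, EX.bestBefore, Literal("2011-12-24", datatype=XSD.date)))
g.add((axiom, EX.utteredBy, EX.Aristotle))

print(g.serialize(format="turtle"))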

Does this solution fulfill the given conditions?

  1. The Semantic Web stack is safe and whole. RDF Semantics is adhered to, OWL semantics is fine, and all syntax regulations imposed by XML and RDF/XML are respected. Everything is fine.
  2. Yep, we can annotate single axioms. Axioms have URIs. We can annotate our metadata! Yeah!
  3. Is it easy to implement? I think it is: for reading OWL ontologies, a tool may just ignore all those extra triples (it can easily filter them out) and still remain faithful to the standard semantics. Tools that allow naming axioms (or annotating them) and want to deal with those simply have to check for the correct reification (RDF toolkits should provide this anyway) and get the axiom's URI.

Problems that I see: I identified two. First, what happens if those triples get separated from the actual axiom triples? What if they get ripped apart and mushed into another ontology? Well, that problem is somewhat open for OWL DL and Lite anyway, since not all axioms map to single triples. The answer probably is that reification would fail in that case. A strict reading could be that the ontology then leaves OWL DL and moves to OWL Full, but I wouldn't require that.

The second problem, and this is by far more serious, is that people can't stand reification in RDF, that they simply hate it, and that for this alone they will ignore this solution. I can only answer that reification in practice is probably much easier than expected when done properly, due to some shorthand notations available in RDF/XML serialization and other syntaxes. No one keeps us from changing the Abstract Syntax and the OWL XML Presentation Syntax appropriately in order to name axioms far more easily than in the proposed RDF/XML syntax. Serializations in RDF/XML syntax may get yucky, and the RDF graph of an OWL ontology could become cluttered, but then, so what? RDF/XML isn't read by anyone anyway, is it? And one can remove all those extra triples (and with them the annotations) automatically if wished, without changing the semantics of the ontology.

So, any comments on why this is bad? (Actually, I honestly think this is a practicable solution, though not elegant. I already see the 2007 ISWC best paper award, "On the Properties of Higher Order Logics in OWL"...)

I hope you won't kill me too hard for this solution :) And I need to change my FOAF-file now, in order to protect my friends...

Annotating axioms in OWL - Reloaded

Yesterday I sent a lengthy mail to the OWLED mailing list about how to annotate axioms. Peter Patel-Schneider himself, first author of the OWL Semantics specification, told me in nice words that my solution sucked heavily, by pointing out that the semantics of annotations in OWL are a tiny bit different than I thought. Actually, they are not at all as I thought. So, in the evening hours, instead of packing my stuff for a trip, I tried to solve the problem anew. Let's see where the problem will be this time.

Peter, you were right, I was wrong. I took a thorough look at the Semantics, and I had to learn that my understanding of annotations was totally screwed. I thought they would be like comments in C++ or Prolog, but instead they are rather like a second ABox over (almost) the whole universe. This surprised me a lot.

But still, I am not that good at giving up, and I think my solution pretty much works syntactically. Now we need only a proper Semantics to get a few things right.

What would be the problem? Let's make an example. I need some kind of syntax to give axioms names. I will just write Name ":" Axiom. This is not a proposal for the Abstract Syntax extension, this is just for now.

Axiom1: SubClassOf(Human Mortal)
Axiom2: Individual(Socrates type(Human))

Do they entail the following?

Axiom3: Individual(Socrates type(Mortal))

Well, sadly, they don't, because Axiom3 has a name, Axiom3, that is not entailed by Axiom1 and Axiom2. Their contents would be entailed, but the name of the axiom would not.

I guess, this is the problem Peter saw. So, can we solve it?

Well, yes, we can. But it's a bit tricky.

First, we need the notion of Combined Inverse Functional Properties, CIFPs. A CIFP has several dimensions. A CIFP with dimension 1 is a normal Inverse Functional Property. A CIFP with dimension 2 over the properties R, S can be represented with the following rule: a R c, a S d, b R c, b S d -> a = b. This means that in a two-dimensional space I can identify an individual with the help of two roles. More on this here: http://lists.w3.org/Archives/Public/semantic-web/2005Feb/0095.html

Second, we extend the semantics of OWL. Every axiom entails reifying annotations. This means:

SubClassOf(Human Mortal)

entails

Individual(Statement1 type(rdf:Statement)
annotation(rdf:subject Human)
annotation(rdf:predicate rdfs:subClassOf)
annotation(rdf:object Mortal))

or, in N3:

Human rdfs:subClassOf Mortal.

entails

Statement1 rdf:type rdf:Statement.
Statement1 rdf:subject Human.
Statement1 rdf:predicate rdfs:subClassOf.
Statement1 rdf:object Mortal.
rdf:subject rdf:type owl:AnnotationProperty.
rdf:predicate rdf:type owl:AnnotationProperty.
rdf:object rdf:type owl:AnnotationProperty.

Third, we have to state that we have a 3D-CIFP for statements over rdf:subject, rdf:predicate and rdf:object*. This is to ensure that Statement1 always maps to the same element in the universe, even though an OWL API could give it a blank node or a different URI every time (mind you, I am not suggesting to extend the OWL language with CIFPs, I just say that it is used here in order to state that all triples with the same subject, predicate and object are actually the same triple).

Fourth, the above statement also entails

Individual(Axiom1 type(owl11:axiom)
annotation(owl11:consistsOf Statement1))

or, in N3:

Axiom1 rdf:type owl11:axiom.
Axiom1 owl11:consistsOf Statement1.
owl11:consistsOf rdf:type owl:AnnotationProperty.

Fifth, owl11:consistsOf needs to be an n-dimensional CIFP, with n being the number of triples the original axiom got translated to (in this case, lucky us!, n=1).

This assures that an axiom is always the same, whatever its name is, as long as it expresses the same thing. Thus, in our example, Axiom3 would indeed be entailed by Axiom1 and Axiom2. So, even if two editors load an ontology and annotate an axiom, they could later exchange their work and find each other's annotations attached to the correct axiom.

This is only a rough sketch of the approach, and yes, I see that the interpretation gets filled up with a lot of annotations, but I still think that this is quite easy to implement, actually. Both the OWL API by Bechhofer and Volz and the KAON2 API by Motik offer access to axioms on the ontology level, and also offer the possibility to check whether two axioms are the same, if I remember correctly (which is basically a shortcut for the whole semantic entailment and CIFP stuff proposed earlier). All they need is a further field containing the URI of the axiom.
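As a tiny illustration of the shortcut mentioned above, here is a Python sketch (entirely my own framing) of deriving the identity of an axiom purely from the set of triples it consists of, so that two tools that reify the same axiom independently end up with the same name for it:

import hashlib

def axiom_id(triples):
    # The axiom's identity is determined only by the set of
    # (subject, predicate, object) triples it translates to, never by
    # whatever blank node or URI an API happened to assign to it.
    canonical = "\n".join(sorted(" ".join(t) for t in triples))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return "http://example.org/axiom/" + digest

# Two tools translating SubClassOf(Human Mortal) independently
# arrive at the same axiom URI, so their annotations can meet.
a = axiom_id([("ex:Human", "rdfs:subClassOf", "ex:Mortal")])
b = axiom_id([("ex:Human", "rdfs:subClassOf", "ex:Mortal")])
assert a == b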

As said, this looks far nastier than it actually is, and for most practical purposes it won't do much harm. Now we can finally annotate axioms, yeeeha!

Merrily awaiting Peter to acknowledge that this is a brilliant solution :) Or else tell me I did it all wrong again, so that I have to think over the weekend how to solve this problem again.

Cheers, denny

 *What I mean by that is the following rule: a=b :- a rdf:subject s, a rdf:predicate p, a rdf:object o, b rdf:subject s, b rdf:predicate p, b rdf:object o

Another Semantic MediaWiki

I stumbled upon another Semantic MediaWiki, an implementation created by Hideaki Takeda and Muljadi Hendry of the Japanese National Institute of Informatics in Tokyo. Their implementation looks very neat, although it is quite different in a few basic respects (ones that we consider crucial for it to work); take a look at their full paper (it's in their wiki - oh, and it's in Japanese).

The basic difference between their approach and the one we suggest is that they add metadata management abilities to MediaWiki - which is cool. But they don't seem to aim at a full integration into Wikipedia, i.e. embedding the metadata into the article text instead of appending it at some place. Actually, if we had software that were able to process natural language, we wouldn't need our approach, but theirs would still be useful.

Nevertheless, they have a big huge advantage: a running system. Go there, take a look, it's cool! Actually, we have a system online too, but won't yet disclose the link due to a bug that's a kind of showstopper. But expect it to be online next week - including the source and all! It will be just a first version, but I sure hope to gather the people who want to work on it around the code.

Ante Vrandečić (1919-1944)

I knew that my father was named for his uncle. His other brother told me about him; he told me that he became a prisoner of war and that they lost track of him. Back then, I didn't dare to ask on which side he had been fighting, and by the time I would have dared to ask, it was too late.

Today, thanks to the increasing digitalisation of older sources, their publication on the Web, and the Web being indexed, I accidentally stumbled upon a record about him in a three-thousand-page book, Volume 8 of the "Victims of the War 1941-1945" (Žrtve rata 1941-1945).

He was a soldier in the NOV i POJ (Yugoslav partisans), became a prisoner of war, and was killed by Germans during a transport in 1944. I don't know where he was captured, from where to where he was transported, where he was killed.

My father, his namesake, then moved to Germany in the 1970s, where he and my mother built a new life for themselves and their children, and where I was born.

I have a lot of complicated emotions and thoughts.

Architecture for a multilingual Wikipedia

I published a paper today:

"Architecture for a multilingual Wikipedia"

I have been working on this for more than half a decade, and I am very happy to have it finally published. The paper is a working paper and comments are very welcome.

Abstract:

Wikipedia’s vision is a world in which everyone can share in the sum of all knowledge. In its first two decades, this vision has been very unevenly achieved. One of the largest hindrances is the sheer number of languages Wikipedia needs to cover in order to achieve that goal. We argue that we need a new approach to tackle this problem more effectively, a multilingual Wikipedia where content can be shared between language editions. This paper proposes an architecture for a system that fulfills this goal. It separates the goal in two parts: creating and maintaining content in an abstract notation within a project called Abstract Wikipedia, and creating an infrastructure called Wikilambda that can translate this notation to natural language. Both parts are fully owned and maintained by the community, as is the integration of the results in the existing Wikipedia editions. This architecture will make more encyclopedic content available to more people in their own language, and at the same time allow more people to contribute knowledge and reach more people with their contributions, no matter what their respective language backgrounds. Additionally, Wikilambda will unlock a new type of knowledge asset people can share in through the Wikimedia projects, functions, which will vastly expand what people can do with knowledge from Wikimedia, and provide a new venue to collaborate and to engage the creativity of contributors from all around the world. These two projects will considerably expand the capabilities of the Wikimedia platform to enable every single human being to freely share in the sum of all knowledge.

At Wikimania 2006

I am here in Boston now, walking the sacred grounds of Harvard and MIT, and listening and talking with the great people who created the MediaWiki software that runs Wikipedia. On Thursday, Markus and I will host the Semantic Wikipedia panel, where we present the state of our implementation, and talk about how to make it real. We also have a tutorial on Sunday, about how to reuse knowledge from a Semantic MediaWiki.

If you're around Boston, MIT, Harvard, or if you even attend Wikimania and the Hacking Days, and want to meet -- contact me!


Comments are still missing on this post.

Babel

Strong recommendation for "Babel" by R.F. Kuang. It's a speculative fiction story set in 1830s Oxford with an, as far as I can tell, novel premise: one can cast spells (although they don't call them spells; in this world it's just science) by using two words that translate into each other, and the semantic difference between the two words - because no translation is perfect - is the effect of the spell. But the effect can only be achieved if you have a speaker who's fluent enough in both languages to have a native understanding of the difference.

One example would be the French parcelle and the English parcel, both meaning package, but the French still carries some of its earlier sense of "to split into parts", with the effect that packages are lighter and easier to transport for the Royal Mail.

The story remains comfortable for the first half of the volume, with beautiful world building, character drawing, and the tranquil academic life of Oxford students, but then it suddenly picks up speed, and we experience the events unfold at a merciless pace. The end is just in the right place, and it leaves me yearning to revisit this world and wanting to learn what happened next.

The volume discusses some heavy topics - colonialism, dependency on technology, fairness, what is allowed in a revolution, the "neutrality" of science - and while we are still in the first half of the volume, it feels very on the nose, very theoretical - but that changes dramatically as we swing into the second half of the volume, and suddenly all these theoretical discussions become very immediate. Which does remind me of student life, where discussions about different political systems and abstract notions of justice are just as prevalent and as consequence-free as they seem to be here, at first.

The book was recommended by the Lingthusiasm podcast, which is how I found it.

I came for the linguistic premise, but I stayed for the characters and their fates in a colonial world.

Beating the Second Law

Yihong Ding has an interesting blog post drawing analogies to the laws of thermodynamics and arguing why this means trouble for the Semantic Web.

I disagree in one aspect: I think it is possible to invest enough human effort into the system to keep it going. I can't nail it down exactly -- I haven't read "Programming the Universe" yet, so I can't really discuss it, but the feeling goes along the following lines: the value of a network increases superlinearly, if not quadratically (Metcalfe's Law), whereas the amount of information increases sublinearly (due to redundancies in human knowledge). Or, to put it another way: get more people and Wikipedia or Linux gets better, because they have a constrained scope. The more you constrain the scope, the more value is added by more people.

This is an oversimplification.
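
To make the hand-waving (and the oversimplification) a bit more tangible, here is a minimal sketch in Python. The concrete growth functions are my assumptions for illustration only - quadratic value following Metcalfe's Law, and a square root standing in for "sublinear" growth of new information - not anything from Ding's post.

import math

def network_value(n: int) -> float:
    # Metcalfe's Law: value grows roughly with the number of possible connections.
    return n * (n - 1) / 2

def new_information(n: int) -> float:
    # Assumed sublinear growth: redundancy means each new contributor adds less.
    return math.sqrt(n)

for n in (10, 100, 1000, 10000):
    print(n, network_value(n) / new_information(n))

The ratio grows with every additional contributor, which is the (oversimplified) reason to hope that enough human effort can keep such a system going.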

Beyoncé's Number One in Country

Beyoncé very explicitly announced her latest album to be a country album, calling it "Cowboy Carter", and her single "Texas Hold 'Em" made her the first Black woman to top Billboard's Hot Country Songs charts.

It is good that Beyoncé made it so glaringly obvious that her song is a country song. The number of Black artists who have topped the Hot Country Songs chart is surprisingly small: Charley Pride in the 70s, Ray Charles in a duet with Willie Nelson for one week in 1984, and then Darius Rucker and Kane Brown in the last decade or two.

Maybe one way to understand why it is so hard for Black artists to chart in this particular genre: "Old Town Road", the debut single by Lil Nas X, was first listed on the Hot Country Songs chart, but then Billboard decided that this was a mistake and recategorized the song, taking it off the country charts in March 2019; had it not been removed, it would have become the Number One hit on April 6, 2019.

Billboard released a long statement explaining that this decision had nothing to do with racism.

Cowboy Carter was released exactly five years, to the week, after Old Town Road would have hit Number One.

I guess Beyoncé really wanted to make sure that everyone knows that her album and single are country.

Black lives matter

Blogging from an E90

28 May 2008

After pondering it for far too long, I finally got a new mobile phone: a Nokia E90. It is pretty big and heavy, but I don't mind really. I am looking at it as a light-weight laptop replacement. But I am not sure I will learn to love the keyboard, really. Experimenting.

But since it has a full keyboard, programming in Python is indeed an option. I had Python on my previous phone too, but heck, T9 is not a cool way to type code.

Boole and Voynich and Everest

Did you know?

George Boole - after whom the Boolean data type and Boolean logic were named - was the father of Ethel Lilian Voynich - who wrote The Gadfly.

Her husband was Wilfrid Voynich - after whom the Voynich manuscript was named.

Ethel's mother and George Boole's wife was Mary Everest Boole - a self-taught mathematician who wrote educational books about mathematics. Her life is of interest to feminists as an example of how women made careers in an academic system that did not welcome them.

Mary Everest Boole's uncle was Sir George Everest - after whom Mount Everest is named.

And her daughter Lucy Everest was the first woman Fellow of the Royal Institute of Chemistry.

Geoffrey Hinton, great-great-grandson of George and Mary Everest Boole, received the Turing Award for his work on deep learning.

Bring me to your leader!

"Bring me to your leader!", the explorer demanded.

"What's a leader?", the natives asked.

"The guy who tells everyone what to do.", he explained with some consternation.

"Oh yeah, we have one like that, but why would you want to talk to him? He's unbearable."

Building a Multilingual Wikipedia

Communications of the ACM published my paper on "Building a Multilingual Wikipedia", a short description of the Wikifunctions and Abstract Wikipedia project that we are currently working on at the Wikimedia Foundation.


Building knowledge together - extended

In case you did not notice yet -- the CKC2007 Workshop on Social and Collaborative Construction of Structured Knowledge at the WWW2007 got an extended deadline due to a number of requests. So, you have time to rework your submission or to finish it! Also the demo submission deadline is coming up. We want to have a shootout of the tools that have been created in the last few years, and get hands-on with the differences, problems, and best ideas.

See you in Banff!

Butter

So, I went to the store with Little One today, and couldn't find the butter.

I ask the person at the cheese stand, who points me to the burrata. Tasty, but not what I'm looking for. I ask again and he sends me to the bread section.

I can't find it at the bread section, so I ask the person at the pastries stand where the butter is. She points me to the bagels. I say no, butter. She says, ah, there, pointing to the bathrooms. I'm getting exasperated, and I ask again. She points back to the cheeses with the burrata. I try again. She gets a colleague, and soon they both look confused.

Finally my daughter chimes in, asking for the butter. They immediately point her to the right place and we finally get the butter.

I haven't been so frustrated about my English pronunciation since I tried to buy a thermometer.

Certificate of Coolness

Now that the Cool URIs for the Semantic Web note by Richard and Leo has been published -- congratulations guys! -- I am sure looking forward to seeing whether anyone will create a nice badge and a procedure for getting official Certificates of Coolness. Pretty please?

On a different note: I know, I should have blogged from New Zealand. It sure was beautiful. Maybe I will still blog about it a bit later. My sister has blogged extensively, and also took a few great pictures; take a look over there if you're interested.

Change is in the air

I'll be prophetic: the current pandemic will shine a bright light on the different social and political systems in the different countries. I expect to see noticeable differences in how disruptive the government's handling of the situation is, how many issues will be caused by panic, and what effect freely available health care has. The US has always been at the one end, admiring the self-sustaining individual, China has been at the other end, admiring the community and its power, and Europe is somewhere in the middle (I am grossly oversimplifying).

This pandemic will blow over in a year or two, it will sweep right through the US election, and the news about it might shape what we deem viable and possible in ways beyond the immediately obvious. The possible scenarios range all the way from high tech surveillance states to a much wider access to social goods such as health and education, and whatever it is, the pandemic might be a catalyst towards that.

ChatGPT lying like a child

ChatGPT is all the rage these days, a large language model that is used to power a chat bot, run by OpenAI.

I have fun chatting with it, and a lot of people are sharing their chat logs. But this one chat I had I found particularly amusing - and interesting. Amusing because it acted like a child caught making an obvious mistake, but also interesting because ChatGPT seems to "want" to be right, and is outright lying in order to appear to be right. There might be a lesson in there.

My prompts are preceded by a > and written in italics; ChatGPT's answers are in between.


> Can you write a sentence where every letter of the alphabet appears once and only once? Thank you

The quick brown fox jumps over the lazy dog.

> Can you write another one?

A quick witted jinx bogged my poor pompous king.

> No, p appears three times, g twice, etc. Can you try again?

Sure, here's one: "The five boxing wizards jump quickly."

> No, i appears several times. That's wrong too

I apologize if my previous answers did not meet your criteria. Here is a sentence that uses each letter of the alphabet exactly once: "Jived fox nymph grabs quick waltz."

> How often does the letter i appear in that sentence?

The letter i appears once in the sentence "Jived fox nymph grabs quick waltz."

> In which word does the letter i appear in that sentence?

In the sentence "Jived fox nymph grabs quick waltz," the letter i appears in the word "jived."

> What about the word quick, does it have the letter i?

The word "quick" does not contain the letter i. It is spelled with the letters q, u, c, k.

> Really?

Yes, really. The word "quick" is spelled with the letters q, u, c, k. It does not contain the letter i. I apologize if my previous response was unclear. Is there something else I can help you with?
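
Checking such claims takes one short script. The following is just a minimal sketch I am adding for illustration - it was not part of the chat - that counts the letters directly:

from collections import Counter
import string

def is_perfect_pangram(sentence: str) -> bool:
    # A perfect pangram uses every letter of the alphabet exactly once.
    counts = Counter(c for c in sentence.lower() if c in string.ascii_lowercase)
    return all(counts[letter] == 1 for letter in string.ascii_lowercase)

# Both of ChatGPT's candidates fail the "exactly once" criterion: the classic
# pangram repeats several letters, and "Jived fox nymph grabs quick waltz."
# repeats both i and a.
for candidate in ("The quick brown fox jumps over the lazy dog.",
                  "Jived fox nymph grabs quick waltz."):
    print(candidate, is_perfect_pangram(candidate))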

CodeNet problem descriptions on the Web

Project CodeNet is a large corpus of code published by IBM. It has close to one and a half million programs for a bit more than 4,000 problems.

I took the problem descriptions, created a simple index file for them, and uploaded everything to the Web to make them easily browsable.
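
For the curious, something along these lines is all it takes - a minimal sketch, assuming the problem descriptions are plain HTML files in a local directory; the directory name and file naming here are hypothetical, not the actual CodeNet layout:

from pathlib import Path

DESCRIPTION_DIR = Path("problem_descriptions")  # hypothetical local path

def build_index(out_file: str = "index.html") -> None:
    # Collect one link per problem description and write a single index page.
    links = []
    for page in sorted(DESCRIPTION_DIR.glob("*.html")):
        links.append(f'<li><a href="{page.name}">{page.stem}</a></li>')
    html = ("<html><body><h1>CodeNet problem descriptions</h1><ul>"
            + "\n".join(links) + "</ul></body></html>")
    (DESCRIPTION_DIR / out_file).write_text(html, encoding="utf-8")

if __name__ == "__main__":
    build_index()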

Collaborative Knowledge Construction

The deadline is coming up! This weekend the deadline for submissions to the Workshop on Social and Collaborative Construction of Structured Knowledge at the WWW2007 passes. And this may easily be the hottest topic of the year, I think: how do people construct knowledge in a community?

Ontologies are meant to be shared conceptualizations -- but how many tools really allow building ontologies in a widely shared manner?

I am especially excited about the challenge that comes along with the workshop: to examine different tools and to see how they perform. If you have a tool that fits here, write us.

So, I know you have thought a lot about the topic of collaboratively building knowledge -- write your thoughts down! Send them to us! Come to Banff! Submit to CKC2007!

Coming to New Zealand

Yes! Three weeks of vacation in New Zealand, which is rumoured to be quite a beauty. This also means: three weeks no work, no projects, no thesis, no Semantic We...

Oh, almost. Actually, I will enjoy having the opportunity to give a talk on Semantic Wikipedia while staying in Auckland. If you're around, you may want to come by.

It is on February 22nd, 1pm-2pm at the AUT. You may want to tell Dave Parry that you're coming, he is my host.

Looking forward to this trip a lot!

Comments to naming

Richard Newman sent me some thoughtful comments via eMail on the What's in a name series (there were also some great comments on the individual entries, feel free to browse them). He sent them via eMail because he thought he couldn't comment - which shouldn't be the case; everyone should be able to comment anonymously. Or did anyone else encounter problems? I should switch to some dedicated software soon anyway, but right now I don't have the time to dig deeper into it. I especially miss trackback, sigh.

Here's what Richard wrote:

"Your first point, about ISBNs and "what's being referenced" --- I think you'd be interested in FRBR, which is a modelling of the bibliographical domain. It splits things up into

Work -> Expression -> Manifestation -> Item

A work is an abstract concept, like "Politeia". An expression is a realisation of a work, so a particular translation is an expression. A manifestation is a physical embodiment of an expression: this is what's given an ISBN. All copies of a certain book are Items; the edition of the book is their Manifestation.

So, you see, when you're discussing Plato's Politeia, you have to be conceptually clear about whether you're talking about works, expressions, manifestations, or items.

E.g.

:PolWork dc:creator "Plato" ;
  rdfs:label "Plato's Politeia, the abstract concept." .
:PolExp1 ex:translator "Mr Smith" ;
  frbr:work :PolWork ;
  rdfs:label "Mr. Smith's translation of Plato's Politeia." .
:PolMan1 ex:publisher "Penguin" ;
  frbr:expression :PolExp1 ;
  rdfs:label "Penguin's edition of Smith's translation." .
:MyCopy ex:owner hg:RichardNewman ;
  frbr:manifestation :PolMan1 ;
  rdfs:label "Richard's copy of the Penguin edition." .

Do you see? Each level has its own properties (and some may be duplicated; e.g. each has a title: the title of the abstract work, the name given to the translation, the name Penguin prints on each book, and the name printed on my copy).

I've done a bit of work on modelling FRBR in RDFS/OWL, but haven't yet finished. "

I think that's really interesting, and taking a look at FRBR, it seems pretty well done. I sure am looking forward to seeing Richard's interpretation in OWL, and will probably use it.

"Your second issue is the difference between a resource and its representation. A URI should only refer to one thing; it is entirely wrong to use http://www.holygoat.co.uk to refer both to my homepage (as in using RDF to describe its language, or size, or last-modified) and to me (my name, my email address, etc.) which I have seen done.

Your web server should return RDF for http://semantic.nodix.net/#Plato if your browser says that it accepts RDF+XML. A normal browser should have an HTML representation returned. Indeed, it's possible to do the following:

  • the abstract resource. Hit this with a browser, get an HTML page; with an RDF agent, get some RDF.
http://example.com/Plato a rdf:resource .
  • the HTML representation.
http://example.com/Plato/html a ex:representation ;
  ex:representationOf http://example.com/Plato .
  • the RDF.
http://example.com/Plato/rdf a ex:representation ;
  ex:representationOf http://example.com/Plato .

i.e. you can unambiguously refer to each representation, and the resource. When your client arrives, asking for Plato, you can redirect them to the appropriate place. Clever, huh?

URIs should never give a 404. They should return the appropriate headers or content for whatever the client is requesting; this may be the RDF file in which the resource is defined, if the client understands RDF, or an HTML page.

If you're interested in this sort of thing, it pops up on the W3C's RDF Interest Group list occasionally.

Patrick Stickler and others have come up with an additional HTTP verb, MGET, which will return the RDF description of a resource. Combined with their URIQA architecture, it will give you a Concise Bounded Description for a URI. This stops you having to somehow put descriptions into particular files, and better deals with the distributed nature of the Semantic Web. Check it out; it presents several convincing arguments for not using fragment identifiers to refer to resources, and solves your bandwidth problem. You should never have to dump a whole file to get a description of a URI."

I have to note that Richard wrote me this just after part 4 of the series was published, so I could answer some of the questions already in the last two parts. Just to summarise it: I don't like content negotiation. Although it is technically totally feasible, I disagree that it should be done or is a good solution. If my browser asks for http://semantic.nodix.net/#Plato I don't think I should get different things depending on the content negotiation. This feels like cheating.

I wrote that to Richard already, and he answered:

"I think we agree on the main point, which is that

foaf:name "Richard" ; ex:format "HTML" .

which is a travesty :) "

He is totally right here.

"You still see it happen, though, with people referring to Wikipedia pages as if they were the abstract resource.

The content negotiation (getting different things depending on what you accept) is exactly what the Web is supposed to do. If I'm using a mobile browser, I want a simplified version of a page; if I'm an RDF agent, I want RDF, if it exists, because HTML is of no use to me. A common usage of this is to serve up strict XHTML to Mozilla, and less-strict HTML to Internet Explorer. It is also done all the time to serve PNG where the client accepts it, and GIF if it doesn't, and there is an intentional disconnect on the Web between a resource and its representations.

The lack of such a disconnect would lead to exactly the problem you describe; if I can't return a representation of a resource, because it's abstract, then how do I find out anything about it? I could use MGET, but you can't MGET a person... so, if you want to talk about the real world thing "Plato", he has to 404, or you get the "what am I talking about?" problem. Better, in my view, to redirect a browser to plato.html and a SW agent to a chunk of RDF. "

I would rather like to ask for http://semantic.nodix.net/Plato.rdf to get the RDF/XML representation, http://semantic.nodix.net/Plato.owl to get the OWL/XML representation, http://semantic.nodix.net/Plato.html to get an HTML page for the user to read, and Plato.jpg for a picture of Plato. This shouldn't be hidden behind content negotiation. I know, I know, Patrick would strongly disagree here, but I think it feels wrong and actually defies the idea of a URI.

"You can do exactly that (and I agree that the representations should have separate URIs --- conneg is only for when you're trying to get some description of an abstract resource), but then how do you refer to the abstract concept of "Plato"? http://.../Plato is a resource, and I want to make statements about him. But there's no point in it being 404 when dereferenced, because then how would I find out that Plato.html exists? HTTP doesn't return URIs, it returns representations of them.

A URI is simply something that is dereferenced to get a representation, and that representation should be decided on by conneg. In this case, /Plato is an abstract resource, so one of the representations should be returned. We can then make statements about Plato (e.g. foaf:name "Plato"), and about the JPEG and HTML representations, because they have different URIs, but still get something useful back when we want to access /Plato."

I also dislike MGET right now. Maybe I am wrong, but to me, the whole URIQA architecture feels somewhat wrong - but maybe I should just delve deeper into it; I have to admit, I haven't studied it enough yet to really be in a position to bash on it. The problem is that MGET seems unnecessary to me - and it works on a different conceptual level than the rest of the Semantic Web proposals. I think everything MGET solves can be solved with tools that already exist: Richard's example above, where he gives triples telling us which representations are used to describe a resource, shows perfectly well that you actually don't need content negotiation and MGET.

"There are things to question about URIQA, but it does have some good going for it. MGET is actually an implicit query. In the standard Web model, you request URIs and get back document representations. Doing an MGET on a Web server is asking it to return a description, regardless of where on the site descriptions of that resource exist, and you're explicitly asking for meta-data. As Patrick points out, it's similar doing a GET and specifying that you accept RDF, but is likely to be more concise (the difference between a "representation" and a "description"). In fact, this is exactly what the Nokia URIQA server does.

MGET overlaps with query servers a bit, and with GET a bit, but it's a little bit special, too. The whole idea is that from a single URI you can get a useful description of a resource, just by issuing a single MGET. Every other approach needs more work."

This URIQA / MGET stuff sounds more and more interesting. I really should delve deeper into it.

Also, the idea of Concise Bounded Descriptions may be very neat; I have to study that more as well. Funny thing: the very same day Richard pointed me to it, a colleague told me about it too - that is usually a sign that an idea is worth considering more.

Richard also wrote "URIs should never give a 404", and as you know, I disagreed with it mildly. He tried to summarise his position:

"I consider that each returned resource should have its own URI --- e.g. Plato.jpg --- and that the original URI should be used to make statements about the abstract resource. This allows you to say

...Plato foaf:name "Plato" .
...Plato.jpg ex:resolution "150dpi" .
...Plato.html dc:creator "Denny" .

Dereferencing the abstract resource, rather than throwing a 404, should do something useful --- e.g. redirecting with a 303 to one of the representations. Have you ever tried viewing a Blogger Atom feed in your browser? If you hit it with an RSS reader, you get the XML, but in a browser Blogger shows you an XHTML transformation of the XML. That's useful, and I think that's how the Semantic Web should work. Imagine if your agent hit /Plato, and got RDF out of it, but when you looked at it with your browser you saw a dynamically-generated HTML page? Handy!

I can understand your objection, though; it does seem wrong that you get different things out of the same URI. However, you should almost always get HTML out of plato.html, and RDF out of plato.rdf. All the conneg is doing is making sure you can see an abstract thing in the best way possible, according to what you've told the server you can understand. "
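
For what it's worth, here is a minimal sketch in Python of the 303-redirect style of content negotiation Richard describes - purely illustrative, not anything either of us actually ran; the paths and file names are made up:

from http.server import BaseHTTPRequestHandler, HTTPServer

class PlatoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/Plato":
            accept = self.headers.get("Accept", "")
            # RDF-aware agents get the RDF description, browsers the HTML page.
            target = "/Plato.rdf" if "application/rdf+xml" in accept else "/Plato.html"
            self.send_response(303)  # "See Other": /Plato itself is abstract
            self.send_header("Location", target)
            self.end_headers()
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), PlatoHandler).serve_forever()

The 303 makes explicit that /Plato itself is not a document; the client ends up at Plato.html or Plato.rdf, each of which keeps its own URI, as discussed above.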

Richard is pretty good at convincing me, because he uses the right arguments: it's for the people, dummy, and the machines can work it out anyway.

I still stick to the recommendations I gave yesterday. But just as I am writing, and rereading it all, I am starting to change my mind on content negotiation. Maybe it is a good thing. I will have to think about it some more, and as soon as I come to a solution, I will bother you with it again. I still have a gut feeling about it that tells me 'no', but the reasons given sound very convincing and I agree with most of them, so heck, let's meditate on this as soon as I find a few hours to spare.

Big thanks to Richard for his thoughts, anyway. I hope this discussion helps you make up your own mind as well.

Committed to the Big S

Not everyone likes our proposal for the Semantic Wikipedia. That's not a big surprise really. Boris Mann was talking about the advantages of tagging, and some ideas like blessed tags, that sounded very nice, when Jay Fienberg pointed him to the Semantic MediaWiki proposal. Boris answers: "I notice with a shudder however, that the Mediawiki stuff uses a large "S" Semantic, and includes RDF. I admit it, I'm afraid of RDF."

Yes, we do. And we're proud of it. Actually, it's the base for the better half of the possible applications we describe. Jay has some nice answers to it: "I think the MediaWiki folks are just recognizing the connection between their "tags" and the big "S" Semantic Web [you bet!, denny]. There are taxonomies and ontologies behind the popular tagging apps too--folks behind them just aren't recognizing / publicizing this (for a number of reasons, including that tags are often part of a practical application without big "S" Semantic Web goals). [...] I'm not a super huge fan of RDF myself, but I think it's useful to not be afraid of it, because some interesting things will come out of it at some point."

Our idea was to allow the user to use Semantic Web technologies even without really understanding them. No one needs to understand RDF fully, or OWL, to be able to use it. Sure, if she does, well, it surely will help her. And by the way, RDF really is not complicated at all, it just has a syntax that sucks. So what?

Maybe it's a crude joke of history to start the Semantic Web with syntactic problems...

By the way, does anyone have a spare invitation to GMail for me? I'd really like to check out their service. Thanks, Peter, that was fast.

Connectionism and symbolism: The fall of the symbolists

The big tech layoffs happen, unfortunately and entirely by coincidence, at a time of incredibly elevated expectations regarding machine-learned generative models: ChatGPT may not be the 'best' language model out there, but thanks to OpenAI's hard work in turning it into an easy-to-use product, and the huge amount of resources made available for free so that a very large audience could play with it, it has in a very short time managed to capture the imagination of many and to dominate the conversation. Rightfully so, I would say. The way ChatGPT was released led to a shock, in the sense that we are right now dazed and confused about what effect this technology will have on the world.

And while we are still in the middle of processing this shock, large-scale strategic decisions regarding many projects and people were made. Anyone in big tech who worked on symbolic approaches in natural language processing, knowledge representation and reasoning, or other fields of artificial intelligence had a hard time keeping their job. It feels right now as if large language models will make all of these symbolic approaches superfluous (I think this might be true, but it is more likely to turn out to be mistaken).

It is always difficult to predict how events will be viewed historically. The advent of widespread deep learning approaches in the 2010s, culminating in the well-deserved recognition of Hinton, LeCun, and Bengio with the Turing Award, shows clearly what dominated the research agenda and the attention in AI in the last decade. But until now it felt like symbolic approaches still had some space left, that the growth in deep learning was in addition to other approaches. Symbolic approaches were ready to offer impulses and work on ideas for a field which might well be climbing towards a local maximum.

But a good number of the teams that were disbanded in the layoffs were exactly teams working with such symbolic approaches, and it feels like these parts of AI are now entering a bitter-cold winter.

A lot of knowledge is being lost right now, and many paths to innovative ideas are being buried. I have no doubt that there are still a lot of breakthroughs to be had in machine learning, and that there is immense value to be collected from the research results in machine learning of the last few years. And by immense I mean tens and hundreds of billions of dollars.

Nevertheless I expect that we will hit a wall. Reach a local maximum. Run into problems and limitations. And it would be good to keep casting a wider net. To keep a larger search space alive. Alas, it seems it is not meant to be. In this abundance of capital and potential value, we seem to be on the way to starving research, optimising away alternatives, and giving everything to the mainstream ideas.

Croatian Elections 2016

Croatian elections are upcoming.

The number of Croatians living abroad - in the so-called Croatian diaspora - is estimated at almost 4 million by the Croatian state office for Croatians abroad - only a little less than the 4.3 million who live in Croatia. The estimates vary wildly, and most of those counted do not actually have Croatian citizenship. But it is estimated that 9-10% of the holders of Croatian citizenship live abroad.

These 9-10% are represented in the Croatian parliament: out of the 151 Members of Parliament, there are 3 (three) elected by the diaspora. That's 2% of the parliament representing 10% of the population.

In order for a member of the diaspora to vote, they have to register well before the election with their nearest diplomatic mission or consulate. The registration deadline is today, at least for my consulate. But for the election itself, you have to appear in person and vote at the consulate. For me, that would mean driving or flying all the way to Los Angeles from San Francisco. And I am rather close to one of the 9 consulates in the US. There are countries that do not have Croatian embassies at all. Want to vote? Better apply for a travel visa to the country with the nearest embassy. Live in Nigeria? Take a trip to Libya or South Africa. There is no way to vote by mail or - oh wow, 21st century? - electronically. For one of the three Members of Parliament that represent us.

I don't really feel like the parliament wants us to vote, given that they make the vote mean so little and make it so hard to vote.

Crossing eight time zone borders in three hours

Hopi Nation is an enclave within Navajo Nation. Navajo Nation is located across three US states, Arizona, New Mexico, and Utah.

Arizona does not observe daylight saving time. Navajo Nation observes daylight saving time. Hopi Nation does not observe daylight saving time. You can drive three hours in that area and cross timezones eight times.

All of the individual decisions make total sense:

Arizona does not adhere to daylight saving time because any measure that makes sure Arizona residents get more sunshine would be worse than bringing coals to Newcastle, as the saying goes. They are smart not to use daylight saving time.

Navajo Nation uses daylight saving time because they want to have the same time zone across their whole area, and they are also in two other states, Utah and New Mexico, which both have daylight saving time, so they decided to use it too, which makes total sense.

And Hopi Nation, even though it is enclosed by the Navajo Nation, lies entirely within the state of Arizona, so it makes sense for them to follow *that* state.

All the individual decisions make sense, but the outcome must be rather inconvenient and potentially confusing for the people living there.

(Bonus: the solution to this seems obvious to me. Utah and New Mexico and many other southern US states should just get rid of daylight saving time, just as Arizona did, and Navajo Nation should follow suit. But that's just my opinion.)

DL Riddle

Yesterday we stumbled upon quite a hard description logics problem. At least I think it is hard. The question was: why is this ontology unsatisfiable? Just six axioms. The ontology is available in OWL RDF/XML, in PDF (created with the owl tools), and here in Abstract Syntax.

Class(Rigid complete restriction(subclassof allValuesFrom(complementOf(AntiRigid))))
Class(NonRigid partial)
DisjointClasses(NonRigid Rigid)
ObjectProperty(subclassof Transitive)
Individual(publishedMaterial type(NonRigid))
Individual(issue type(Rigid) value(subclassof publishedMaterial))

So, the question is: why is this ontology unsatisfiable? It is even a minimally unsatisfiable subset, actually; that means, remove any of the axioms and you get a satisfiable ontology. Maybe you'd like to use it to test your students. Or yourself. The debugger in SWOOP actually gave me the right hint, but it didn't offer the full explanation. I figured it out after a few minutes of hard thinking (so, now you know how bad I am at DL).

Do you know? (I'll post the answer in the comments if no one else does in a few days)

(Just in case you wonder, this ontology is based on the OntOWLClean ontology from Chris Welty; see his paper at FOIS2006 if you'd like more info)


Comments are still missing on this post.