Ante Vrandečić (1919-1944)

I knew that my father was named for his uncle. His other brother told me about him: that he became a prisoner of war, and that his trace was then lost. Back then, I didn't dare to ask on which side he had been fighting, and by the time I would have dared to ask, it was too late.

Today, thanks to the increasing digitalisation of older sources, their publication on the Web, and the Web being indexed, I accidentally stumbled upon a record about him in a book three thousand pages long, Volume 8 of the "Victims of the War 1941-1945" (Žrtve rata 1941-1945).

He was a soldier in the NOV i POJ (the Yugoslav partisans), became a prisoner of war, and was killed by Germans during a transport in 1944. I don't know where he was captured, from where to where he was transported, or where he was killed.

My father, his namesake, then moved to Germany in the 1970s, where he and my mother built a new life for themselves and their children, and where I was born.

I have a lot of complicated emotions and thoughts.

A quick draft for a curriculum for Computer Science

The other day, on Facebook, I asked who would be the person closest to being a popularizer of Computer Science ideas for the wider audience, which led to an interesting and insightful discussion.

Pat Hayes asked what I would consider the five (or so) core concepts of Computer Science. Ernest Davis answered with the following short list (not in any particular order):

  1. Virtual machine
  2. Caching
  3. Algorithm
  4. Data structure
  5. Programming language

And I followed up with this drafty, much longer answer:

  1. how and why computation works; that a computation is a mapping from your problem domain into some machine state, then we have some automatic movement, and the result represents an answer to your question; that it is always layers of interpretation; that it doesn't matter whether the computing machine is made of ICs or of levers, marbles, and gravity (i.e. what is a function); that computation is always real and you can't simulate computation; what can be done with computation and what cannot; computational thinking - this might map to number 1 in Ernest's list
  2. that everything can be represented with zeros and ones, but doesn't have to be; it could also be represented by As and Bs and Cs, and many other ways; that two states are simply convenient for electric devices; that all information, all data, all input to all computation, and the steps for computations themselves are represented with zeros and ones (i.e. the von Neumann architecture and binary encoding); what can be represented in this paradigm and what cannot - this might map to number 4 in Ernest's list
  3. how functions are encoded; how many different functions can have the same results; how wildly different in efficiency functions can be even when they have the same result (see the sketch after this list); why that makes some things quick to calculate whereas others take a long time; basically smearing ideas from lambda calculus and assembler and building everything from NAND circuits; why all this maps to higher level languages such as JavaScript - this might map to ideas from 2, 3, and 5 on Ernest's list
  4. bringing it back to the devices; where does, physically, the computation happen, where is physically the data stored, and why it matters in terms of privacy, equity, convenience, economics, interdependence, even freedom and independence; what kind of computations and data storage we can expect to have in our mobile phones, in a data center, in an RFID card; how long the turnaround times are in each case; how cryptography works and what kind of guarantees it can provide; why centralization is so alluring and what the price of that might be; and what might be the cost of computation for the environment
  5. given our times, and building on the previous lessons, what is the role of machine learning; how does it actually work, why does it work as well as it does, and why does it not work when it doesn't and where can't it work; what does this have to do with "intelligence", if it does; what becomes possible because of these methods, and what it costs; why these methods may reinforce inequities; but also how they might help us with significantly increasing access to better health care for many people and allow computers to have much more intuitive interfaces and thus democratize access to computing resources
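
To make the efficiency point in number 3 concrete, here is a minimal sketch in Python (my example, not part of the original discussion): two functions that compute exactly the same result, one in exponential and one in linear time.

  # Two functions with the same result but wildly different efficiency.
  def fib_naive(n):
      # Recomputes the same subproblems over and over: roughly 2^n calls.
      if n < 2:
          return n
      return fib_naive(n - 1) + fib_naive(n - 2)

  def fib_iterative(n):
      # Walks up from 0 once: n steps, same result as fib_naive.
      a, b = 0, 1
      for _ in range(n):
          a, b = b, a + b
      return a

  assert fib_naive(20) == fib_iterative(20) == 6765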

I think the intuitions in 1, 2, and maybe 3 are really the core of computer science, and then 4 and 5 provide shortcuts to important questions for ourselves and society that, I think, would be worthwhile for everyone to ponder and to understand well enough to meaningfully make relevant decisions.

The Strange Case of Booker T. Washington’s Birthday

A lovely geeky essay about how much work a single edit to Wikipedia can be. I have gone down this kind of rabbit hole myself more than once, and so I very much enjoyed the essay.

Wordle is good and pure

The nice thing about Wordle - whether you play it or not, whether you like it or not - is that it is one of those good, pure things the Web was made for. A simple Website, without ads, popups, monetization, invasive tracking, etc.

You know, something that can chiefly be done by someone who already has a comfortable life and won't regret not having monetized this. The same way scientists long were mainly "gentleman scientists". Or tenured professors who spend years writing novels.

And that is why I think that we should have a Universal Basic Income. To unlock that creativity. To allow for ideas from people who are not already well off to see the light. To allow for a larger diversity of people to try more interesting things.

Thank you for coming to my TED talk.

P.S.: on January 31, five days after I wrote this text, Wordle was acquired by the New York Times for an undisclosed seven-digit sum. I think that is awesome for Wardle, the developer of Wordle, and I still think that what I said was true at that time and still mostly is, although I expect the Website now to slowly change to have more tracking, branding, and eventually a paywall.

Meat Loaf

"But it was long ago
And it was far away
Oh God, it seemed so very far
And if life is just a highway
Then the soul is just a car
And objects in the rear view mirror may appear closer than they are."

Bat out of Hell II: Back into Hell was the first album I really listened to, over and over again, and whose songs I translated in order to understand them better. Paradise by the Dashboard Light is just a fun song. He was in cult classic movies such as The Rocky Horror Picture Show, Fight Club, and Wayne's World.

Many of the words we should remember him for are by Jim Steinman, who died last year and who wrote many of the lyrics that became famous as Meat Loaf's songs. Some of Meat Loaf's own words are better left unremembered.

Rock in Peace, Meat Loaf! You have arrived at your destination.

Map of current Wikidata edits

It starts entirely black and then listens to Wikidata edits. Every time an item with a coordinate is edited, a blue dot appears in the corresponding place. So slowly, over time, you get a more and more complete map of Wikidata items.

If you open the developer console, you can get links and names of the items being displayed.

The whole page is less than a hundred lines of JavaScript and HTML, and it runs entirely in the browser. It uses the Wikimedia Stream API and the Wikidata API, and has no code dependencies. Might be fun to take a look if you're so inclined.

https://github.com/vrandezo/wikidata-edit-map/blob/main/index.html
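
For the curious, here is a rough sketch of the same idea in Python (the actual page is in JavaScript; the names and structure here are mine, not the repository's): listen to the Wikimedia recent-changes stream, filter for Wikidata edits, and ask the Wikidata API whether the edited item has a coordinate location (P625).

  import json
  import requests

  STREAM = "https://stream.wikimedia.org/v2/stream/recentchange"
  API = "https://www.wikidata.org/w/api.php"

  def coordinate(item_id):
      # Ask the Wikidata API for the item's coordinate location (P625).
      r = requests.get(API, params={
          "action": "wbgetclaims", "entity": item_id,
          "property": "P625", "format": "json"})
      claims = r.json().get("claims", {}).get("P625", [])
      if claims and "datavalue" in claims[0]["mainsnak"]:
          value = claims[0]["mainsnak"]["datavalue"]["value"]
          return value["latitude"], value["longitude"]
      return None

  # The stream consists of server-sent events: lines of "data: {json}".
  with requests.get(STREAM, stream=True) as response:
      for line in response.iter_lines():
          if not line.startswith(b"data: "):
              continue
          change = json.loads(line[len(b"data: "):])
          if change.get("wiki") != "wikidatawiki":
              continue
          title = change.get("title", "")
          if title.startswith("Q"):
              point = coordinate(title)
              if point:
                  print(title, point)  # this is where the map draws a dot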

White's illusion

I stumbled upon "White's illusion" and was wondering - was this named after a person called White, or was it named because, well, it is an illusion in which the colour white plays an important role?

As usual in such cases, I started at Wikipedia's article on White's illusion. But Wikipedia didn't answer that question. The references at the bottom also didn't list anyone named White. So I started guessing it was about the colour.

But wait! Skimming the article, there was a mention of "White and White (1985)" - but without any further citation information. So not only one White, but two of them!

Google Scholar and Semantic Scholar didn't help me resolve "White and White (1985)" to a proper paper, so I started suspecting that this was a prank that someone had entered into the article. I started checking the other references, but they do indeed reference papers by White! And with those more complete references I was able to find out that Michael White and Tony White wrote that 1985 paper, that they are both Australian, that Michael White wrote a number of other papers about this illusion and others, and that this is Tony White's only paper.

I added some of the info to the article, but that was a weird ride.

She likes music, but only when the music is loud

Original in German by Herbert Grönemeyer, 1983.

She sits on her windowsill all day
Her legs dangling to the music
The noise from her room
drives all the neighbours mad
She is content
smiles merrily

She doesn't know
that snow
falls
without a sound
to the ground

Doesn't notice
the knocking
on the wall

She likes music
but only
when the music is loud
When it hits her stomach
with the sound

She likes music
but only
when the music is loud
When her feet feel
the shaking ground

She then forgets
that she is deaf

The man of her dreams
must play the bass
the tickling in her stomach
drives her crazy

Her mouth seems
to scream
with happiness
silently
her gaze removed
from this world

Her hands don't know
with whom to talk
No one's there
to speak to her

She likes music
but only
when the music is loud
When it hits her stomach
with the sound

She likes music
but only
when the music is loud
When her feet feel
the shaking ground

A sermon on tolerance and inclusion

Warning: meandering New Year's sermon ahead, starting at a random point and going somewhere entirely else.

I started reading Martin Kay's book on Translation, and I am enjoying it quite a bit so far. Kay passed away in August 2021. His work seems highly relevant for the work on Abstract Wikipedia.

One thing that bummed me, though, is that for more than a page in the introduction he rants about pronouns, how he is going to use "he" to generically mean both men and women, and how all other solutions have deficits.

He culminates in the explanation: "Another solution to this problem, which is increasing in popularity, is to use both 'he' and 'she', shifting between them more or less randomly. So we will sometimes get 'When a translator is confronted with a situation of this kind, she must decide...'. The trouble with this is that some readers, including the present writer, react quite differently to the sentence depending on which version of the generic pronoun it contains. We read the one containing 'he' smoothly and, all else being equal, assimilate the intended meaning. Encountering the one with 'she', on the other hand, is like following a television drama that is suddenly interrupted by a commercial."

Sooo frustratingly close to getting it.

I wish he had just not spent over a page on this topic, but simply used the generic "he" in the text and left it at that. I mean, I don't expect everyone born more than eighty years ago to adjust to the modern usage of pronouns.

Now, I am not saying this to drag Kay's name through the dirt, or to get him cancelled or whatever. I have never met him, but I am sure he was a person with many positive facets, and given my network I wouldn't be surprised if there are people who knew him and can confirm so. I'm also not saying this to virtue-signal and say "oh man, look how much more progressive I am". Yes, I am slightly annoyed by this page. Unlike many others, though, I am not actually personally affected by it - I use the pronoun "he" for myself and no other pronoun, so this really is not about me. Is it because of that that it is so easy for me to gloss over this and keep reading?

So is it because I am not affected personally that it is so easy for me to say the following: it is still worthwhile to keep reading his work, and the rest of the book, and to build on top of his work and learn from him? The people we learn some things from, the influences we accept - they don't have to be perfect in every way, right? Would it have been as easy for me to say that if I were personally affected? I don't know.

I am worried about how quickly parts of society seem to be ready to "cancel" and "call out" people, and how willing they are to tag a person as unacceptable because that person does not share every single belief that is currently regarded as required.

I have great difficulties in drawing the line. Which beliefs or actions of a person should be sufficient grounds to shun them or their work? When JK Rowling doubles down on her stance regarding trans women, is this enough to ask everyone to drop all interest in the world she created and the books she wrote? Do we reshoot movie scenes such as the cameo of Donald Trump in Home Alone 2 in order to "purify" the movie and make it acceptable for our new enlightened age again? When Johnny Depp was accused of domestic abuse, did he need to be recast from movies he had already been signed on to? Do we also need to stop watching his previous movies? Do the believable accusations of child abuse against Marion Zimmer Bradley mean that we have to ignore her contributions to feminist causes, never mind her books? Should we stop using a font such as Gill Sans because of the sexual abuse Eric Gill committed against his daughters? Do we have to stop watching movies or listening to music produced by murderers such as OJ Simpson, Phil Spector, or Johnny Lewis?

I intentionally escalated the examples, and they don't compare at all to Kay's defence of his usage of pronouns.

I offer no answers as to where the line should be; I have none. I don't know. In my opinion, none of us is perfect, and none of our idols, paragons, or role models will survive the scrutiny for perfection. This is not a new problem. Think of Gandhi, Michael Jackson, Alice Schwarzer, Socrates - no matter where you draw your idols from, they all come with imperfections, sometimes massive ones.

Can we keep and accept their positive contributions - without ignoring their faults? Can we allow people with faults to still contribute their skills to society, or do we reduce them to their faults and negatives? Do we have to get someone fired for tweeting a stupid joke? Do we demand perfection from everyone at all times?

Or do we allow everyone to be human, to make and have errors, and to hold beliefs many don't deem acceptable? Even to commit or cause actions resulting from these beliefs? Even if these actions and beliefs hurt or endanger people, or deny the humanity of others? We don't have to and should not accept their racism, sexism, homo- and transphobia - but can and should we still recognise their other contributions?

I am worried about something else as well. By pushing out so many people because of the one thing in the basket of required beliefs that they don't accept, we push them all into the group of outsiders. But if there are too many outsiders, the whole system collapses. Do we all have to have the same belief on guns, on climate, on gender, on abortion, on immigration, on race, on crypto, on capitalism, on housing? Or can we integrate and work together even if we have differences?

The vast majority of Americans think that human-caused climate change is real and that we should act to avoid it. Only 10% don't. And yet, because of the way we define and fence our in- and outgroups, we have a strong voting bloc that repeatedly leads to outright sabotage of effective measures. A large majority of Americans support the right to abortion, but you would never be able to tell given the fights around laws and court cases. Taxing billionaires more effectively is highly popular with voters, but again these majorities fizzle away and don't translate into the respective changes in the tax code.

I think we should be able to work together with people we don't agree with on everything. We should stop requiring perfection and alignment on all issues before moving forward. But then again, that's what I am saying, and I am saying it from a position of privilege, am I not? I am male. I am White. I am heterosexual. I am not Muslim or Jewish. I am well educated. I am not poor. I am reasonably technologically savvy. I am not disabled. What right do I have at all to voice my opinion on these topics? To demand acceptance for people with beliefs that hurt or endanger people who are not like me? Or even to ask for your precious attention for these words of mine?

None.

And yet I hope that we will work together towards progress on the topics we agree on, that we will enlighten each other on the topics we disagree on, and that we will be able to embrace more of us on our way into the future.

P.S.: this post is problematic and not very well written, and I recognise that. Please refer to the discussion about it on Facebook.

Long John and Average Joe

You may know about Long John Silver. But who's the longest John? Here's the answer according to Wikidata: https://w.wiki/4dFL

What about your Average Joe? Here's the answer about the most average Joe, based on all the Joes in Wikidata: https://w.wiki/4dFR

Note that the average height of a Joe in Wikidata is 1.86m or 6'1", which is quite a bit taller than the average in the population. A data collection and coverage issue: Wikidata is much more likely to have the height of a basketball player than of an author.

Just two silly queries for Wikidata, which are nice ways to show off the data set and what one can do with the SPARQL query endpoint. Especially the latter one shows off a rather interesting and complex SPARQL query.
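
Since the queries themselves sit behind the short links, here is a hedged sketch of what a "longest John" style query can look like (my own reconstruction, not the actual query; it matches the given name by its English label rather than hard-coding an identifier, and it glosses over the fact that heights in Wikidata come in mixed units):

  import requests

  QUERY = """
  SELECT ?person ?personLabel ?height WHERE {
    ?person wdt:P735 ?givenName .       # given name ...
    ?givenName rdfs:label "John"@en .   # ... is "John"
    ?person wdt:P2048 ?height .         # height
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
  }
  ORDER BY DESC(?height)
  LIMIT 5
  """

  r = requests.get("https://query.wikidata.org/sparql",
                   params={"query": QUERY, "format": "json"})
  for row in r.json()["results"]["bindings"]:
      print(row["personLabel"]["value"], row["height"]["value"])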

Temperatures in California

It has been a bit chillier the last few days. I noticed that, after almost a decade in California, I feel pretty comfortable understanding temperatures in Fahrenheit - as long as they are over 60° F. If it is colder, I need to switch to Celsius in order to understand exactly how cold it is. I have no idea what 40° or 45° or 50° F are, but I still know what 5° C is!
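
(For reference, the conversion is °C = (°F - 32) × 5/9, so 40° F is about 4° C, 45° F about 7° C, and 50° F exactly 10° C.)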

The fact that I still haven't acclimatised to Fahrenheit for the cooler temperatures tells you a lot about the climate in California.

SWSA panel

Thursday, October 7, 2021, saw a panel of three founding members of the Semantic Web research community, each of whom has been a teacher and mentor of mine over the years: Rudi Studer, Natasha Noy, and Jim Hendler. I loved watching the panel and enjoyed it thoroughly, also because it was just great to see all of them again.

There were many interesting insights and thoughts in this panel, too many to write them all down, but I want to mention a few.

It was interesting how much all panelists talked about creating the Semantic Web community, and how much of an intentional effort that was. Deciding that it needs a conference, a journal, an organization, setting those up, and their interactions. Seeing and fostering a sustainable research community grow out of an idea is a formidable and amazing effort. They all mentioned positively the diversity in the community, and that it was a conscious effort to work towards that. Rudi mentioned that the future challenge will be to ensure that computer science students actually have Semantic Web technologies integrated into their standard curriculum.

They named a number of the successes that were influenced by the Semantic Web research work, such as Schema.org, the heavy use of SPARQL in supercomputing (I had no idea!), Wikidata (thanks for the shout out, Rudi!), and the development of scalable graph databases. Natasha raised the advantage of having common identifiers throughout an organization, i.e. that everyone refers to California the same way. They also named areas that remained elusive and that they expect to see progress in the coming years, Rudi in particular mentioned Agents and Common Sense, which was echoed by the other participants, and Jim mentioned Personal Knowledge Graphs. Jim mentioned he was surprised by the growing importance of unstructured data. Jim is also hoping for something akin to “procedural attachments” - you see some new data coming in, you perform this action (I would like to think that a little Wikifunctions goes a long way).

We need both open and closed knowledge graphs (think of your personal ones, but also the ones owned by companies).

The most important contribution, so far and also well into the future, is the idea of the decentralization of semantics: allowing different stakeholders to work asynchronously and separately on parts of the semantics and yet share data. This also includes the decentralization of knowledge graphs; in the future we will encounter a world where semantics are increasingly brought together and yet remain decentralized.

One interesting anecdote was shared by Natasha. She talked about a keynote by Guha (one of the few researchers namechecked in the panel, along with Tim Berners-Lee) at ISWC 2013 in Sydney: how Guha said how simple the technology needs to be, and how many in the audience were aghast and shocked by the talk. Now, eight years later and given her experience building Dataset Search, she appreciates the insight. If they have a discussion about a new property for longer than five minutes, they drop it: it’s too complicated, and people will use it wrong so often that the data cleanup will become expensive.

All of them shared the advice for researchers in their early career stage to work on topics that truly inspire them, on problems that are real and that they and others care about, and that if they do so, the results have the best chance to have impact. Think about problems you can explain to people not in your field, about “how can we use triples to save the world” - and not just about “hey, look, that problem that we solved with these other technologies previously, now we can also solve it with Semantic Web technologies”. This doesn’t really help anyone. Solve new problems. Solve real problems. And do what you are truly passionate about.

I enjoyed the panel, and can recommend that everyone in the Semantic Web research area, or any related, nearby field, check it out. Thanks to the organizers for this panel (which is the first session in a series of talks that will continue with Ora Lassila in early December).


Our four freedoms for our technology

(This is a draft. Comments are welcome. This is not meant as an attack on any person or company individually, but on certain practices that are becoming increasingly prevalent.)

We are not allowed to use the devices we paid for in the ways we want. We are not allowed to use our own data in the ways we want. We are only allowed to use them in the ways the companies who created the devices and services allow us to.

Sometimes these companies are nice and give us a lot of freedom in how to use the devices and data. But often they don’t. They close them down for all kinds of reasons. They may say it is for your protection and safety. They might admit it is for profit. They may say it is for legal reasons. But in the end, you are buying a device, or you are creating some data, and you are not allowed to use that device and that data in the way you want to, you are not allowed to be creative.

The companies don’t want you to think of the devices that you bought and the data that you created as your devices and your data. They want you to think of them as black boxes that offer you services they create for you. They don’t want you to think of a Ring doorbell as a camera, a microphone, a speaker, and a button, but they want you to think of it as providing safety. They don’t want you to think of the garage door opener as a motor and a bluetooth module and a wifi module, but as a garage door opening service, and the company wants to control how you are allowed to use that service. Companies like Chamberlain and SkyLink and Genie don’t allow you to write a tool to check on your garage door, and to close or open it, but they make deals with Google and Amazon and Apple in order to integrate these services into their digital assistants, so that you can use them in the ways these companies have agreed on together, through the few paths through which these digital assistants are available. The digital assistant that you buy is not a microphone and a speaker and maybe a camera and maybe a screen that you buy and use as you want; you buy a service that happens to have some technical ingredients. But you cannot use that screen to display what you want. Whether you can watch your Amazon Prime show on the screen of a Google Nest Hub depends on whether Amazon and Google have an agreement with each other, not on whether you have paid for access to Amazon Prime and you have paid for a Google Nest Hub. You cannot use that camera to take a picture. You cannot use that speaker to make it say something you want it to say. You cannot use the rich plethora of services on the Web, and you cannot use the many interesting services these digital assistants rely on, in novel and creative combinations.

These companies don’t want you to think of the data that you have created and that they have about you as your data. They don’t want you to think about this data at all. They just want you to use their services in the way they want you to use their services. On the devices they approve. They don’t want you to create other surfaces that are suited to the way you use your data. They don’t want you to decide what you want to see in your feed. They don’t want you to be able to take a list of your friends and do something with it. They will say it is to protect privacy. They will say that it is for safety. That is why you cannot use the data you and your friends have created. They want to control exactly what you can and cannot do with the data you and your friends have created. They want to control how many ads you must see in order to be allowed to see your friends’ posts. They don't want anyone else to have the ability to provide you with creative new interfaces to your feed. They don’t want you yourself to have the ability to look at your feed and do whatever you want with it.

Those are devices you paid for.

These are data you and your friends have created.

And more and more we are losing our freedom of using our devices and our data as we like.

It would be impossible to invent email today. It would be impossible to invent the telephone today. Both are protocols that allow anyone to communicate with anyone else, no matter which email provider or phone they use. Try reading your friend’s Facebook feed on Instagram, sending a direct message from your Twitter account to someone on WhatsApp, or calling your Skype contact on FaceTime.

It would be impossible to launch the Web today - many companies don’t want you browsing the Web. They want you to be inside of your Facebook feed and consume your content there. They want you to be on your Twitter feed. They don’t want you to go to the Website of the New York Times and read an article there; they don’t want you to visit the Website of your friend and read their blog there. They want you to stay in their apps. By default, they open Websites inside their app, and not in your browser, so you are always within their app. They don’t want you to experience the Web. The Web is dwindling, and all the good things on it are being recut and rebundled within the apps and services of tech companies.

Increasingly, we are seeing more and more walls in the world. Already, it is becoming impossible to pay for and watch certain movies and shows without buying a full subscription to a service. We will likely see the day where you will need a specific device to watch a specific movie. Where the only way to watch a Disney+ exclusive movie is on a Disney+ tablet. You don’t think so? Think about how easy it is to get your Kindle books onto another ebook reader. How do you enable a skill or capability available in Alexa on your Nest smart speaker? How can you search through the books that you bought and that are in your digital library, other than by using a service provided by the company that allows you to search your digital library? When you buy a movie today on YouTube or on iMovies, what do you own? What are you left with when the companies behind these services close that service, or go out of business altogether?

Devices and content we pay for, data we and our friends create, should be ours to use in empowering and creative ways. Services and content should not be locked in with a certain device or subscription service. The bundling of services, content, devices, and locking up user data creates monopolies that stifle innovation and creativity. I am not asking to give away services or content or devices for free, I am asking to be allowed to pay for them and then use them as I see fit.

What can we do?

As far as I can tell, the solution, unfortunately, seems to be to ask for regulation. The market won’t solve it. The market doesn’t solve monopolies and oligopolies.

But don’t ask to regulate the tech giants individually. We don’t need a law that regulates Google and a law that regulates Apple and a law that regulates Amazon and a law to regulate Microsoft. We need laws to regulate devices, laws to regulate services, laws to regulate content, laws that regulate AI.

Don’t ask for Facebook to be broken up because you think Mark Zuckerberg is too rich and powerful. Breaking up Facebook, creating Baby Books, will ultimately make him and other Facebook shareholders richer than ever before. But breaking up Facebook will require the successor companies to work together on a protocol to collaborate. To share data. To be able to move from one service to another.

We need laws that require that every device we buy can be made fully ours. Yes, sure, Apple must still be allowed to provide us with the wonderful smooth User Experience we value Apple for. But we must also be able to access and share the data from the sensors in our devices that we have bought from them. We must be able to install and run software we have written or bought on the devices we paid for.

We need laws that require that our data is ours. We should be able to download our data from a service provider and use it as we like. We must be allowed to share with a friend the parts of our data we want to share with that friend. In real time, not in a dump download hours later. We must be able to take our social graph from one social service and move to a new service. The data must be sufficiently complete to allow for such a transfer, and not crippled.

We need laws that require that published content can be bought and used by us as we like. We should be able to store content on our hard disks. To lend it to a friend. To sell it. Anything I can legally do with a book I bought, I must be able to legally do with a movie or piece of music I bought online. Just as with a book, I would still not be allowed to give away copies while the work enjoys copyright.

We need laws that require that services and capabilities are unbundled and made available to everyone. Particularly as technological progress with regards to AI, Quantum computing, and providing large amounts of compute becomes increasingly an exclusive domain of trillion-dollar companies, we must enable other organizations and people to access these capabilities, or run the risk that sooner or later all and any innovation will be happening only in these few trillion-dollar companies. Just because a company is really good at providing a specific service cheaply, it should not be allowed to unfairly gain advantage in all related areas and products and stifle competition and innovation. This company should still be allowed to use these capabilities in their products and services, but so should anyone else, fairly priced and accessible to everyone.

We want to unleash creativity and innovation. In our lifetimes we have seen the creation of technologies that would have been considered miracles and impossible just decades ago. These must belong to everybody. These must be available to everyone. There cannot be equity if all of these marvellous technologies can be wielded only by a few companies on the West coast of the United States. We must make them available to all the people of the world: the people of the Indian subcontinent, the people of Sub-Saharan Africa, the people of Latin America, and everyone else. They all should own the devices they paid for, the data they created, the content they paid for. They all should have access to the same digital services and capabilities that are available to the engineers at Amazon or Google or Microsoft. The universities and research centers of the world should be able to access the same devices and services and extend them with their novel and creative ideas. The scrappy engineers in Eastern Europe and India and Nigeria and Central Asia should be able to call the AI models trained by Google and Microsoft and use them in novel ways to run their devices and chip-powered cars and agricultural machines. We want a world of freedom and tinkering, where creativity and innovation are unleashed, where everyone can contribute their ideas and their creativity, and where everyone can build their fortune.


The Center of the Universe

The discovery of the center of the universe led to a series of unexpected consequences. It killed some, it enlightened others, but most people were just left utterly confused in the end.

When the results from the Total Radiating Universal Tessellation Hyperfield satellite measurements came in, it became depressingly clear that the universe was indeed contracting. Very slowly, but without any reasonable doubt — or, as the physicists said, they were five sigma sure about it. As the data from the measurements became available, physicists, cosmologists, topologists, even a few mathematically inclined philosophers, and a huge number of volunteers started to investigate it. And after a short period of time, they came to a whole set of staggering conclusions.

First, the Universe had a rather simple four-dimensional form. The only unfortunate blemishes in this theory were the black holes, but most of the volunteers, philosophers, and topologists decided to ignore these as accidental.

Second, the form was bounded. There was a beginning and an end in time, and there were boundaries in space, and those who understood that these were the same were enlightened about the form of the universe.

Third, since the form of the universe was bounded and simple, it had a center. While this was slightly surprising, it was a necessary consequence of the previous findings. What first seemed merely exciting, but would soon turn out to be not only the heart of this report but the heart of all humanity, was that the data collected by the satellites allowed calculating the position of the center of the universe.

Before that, let me recapitulate what we traditionally knew about how the universe is built. Our sun is a star, around which a few planets travel, one of them being our Earth. Our sun is one of a few tens of billions of stars that form a long curved thread which ties around a supermassive black hole. A small number of such threads are tangled together, forming the spiral arms of our galaxy, the Milky Way. Our galaxy consists of half a trillion stars like our sun.

Galaxies, like everything else in the universe, like to stick together and form groups. A few hundred thousand galaxies make up a supercluster. A few of these superclusters together build enormous walls of stars, filaments traversing the universe. The galaxies of such a wall are all in a single plane, more or less, and sometimes even in a single line.

Between these walls, walls made of superclusters and galaxies and stars and planets, there is, basically, nothing. The walls of stars are like gigantic honeycombs, and between them are enormous empty spaces, hundreds of millions of light years wide. When you look at a honeycomb, you will see that the empty spaces between the walls are much, much larger than the walls themselves. Such is the universe. You might think that the distance from here to the next grocery store is quite far, or that the ocean is quite big. But the distance from the earth to the sun is so much bigger, and the distance from the sun to the next star again so much more. And from our galaxy to the next, there is a huge empty space. Nevertheless, our galaxy is so close to the next group of galaxies that together they form a building block of a huge wall, separating two unimaginably large empty spaces from each other.

So when we figured out that we could calculate the center of the universe, it was widely expected that the center would be somewhere in one of those vast spaces of nothing. The chances that it would be in one of the filaments were tiny.

It turned out that this was not a question of chance.

The center of the universe was not only inside of a filament, but the first quick calculations (quick, though, has to be understood as taking three and a half years) suggested that the center is actually within our filament. And not only within our filament — but our galaxy. Within a one light year radius of our sun.

The team that made these calculations was working at a small research institute in rural Japan. They did not believe the results, and double- and triple-checked them. The head of the institute had graduated from Princeton, and called his former advisor there. Although it was deep in the night in Japan, they talked for many hours. In the end he learned that Princeton had made the same calculations, and had received their own results about eight months earlier. They didn’t dare to publish them. There must have been a mistake. These results had to be wrong.

Science has humiliated the whole of humanity again and again. And it was quite successful in doing so. A scientist would much more easily accept that the center of the universe is some mathematical construct pointing to nothing than what the infallible mathematics indicated. But the data was out. And the number of people making the above-mentioned realizations and calculations kept growing. It was only a matter of time. And when the Catholic University of Rio de Janeiro finally published the results — in a carefully written paper, without any accompanying press release, and formulated so cautiously and defensively — all the scientists who already knew the results held their breath.

The storm was unimaginable. Everyone demanded an explanation, but no one would listen to anyone offering one. The religions rejoiced, claiming they had known it all along, and many flocked to the mosques and churches and temples, as a proof of God was finally found. The irony of science leading humans into the embrace of religion was profoundly lost at that time, but later recognized as one of the largest jokes in history. Science had dealt its ultimate humiliation, not to humanity, but perversely to its most devout followers, the scientists. The scientists who, while trashing the superiority of humans over the world, were secretly inflating their own, and were now reminded that they were merely slaves to a most cruel mistress. Their bitter resistance did not stop the results from emerging.

The mathematics and calculations were soon made public. The mathematics were deceptively simple, once the required factorizations were done, and easy to check. High school courses went through the proofs, and desperate parents peeked over the shoulders of their daughters and sons who, sometimes for the first time, talked of integrals and imaginary numbers. Television and streaming platforms were explaining discriminants and complex numbers and roots of higher degrees. Websites offering math courses bent under the load and moral weight.

There is one weird thing about roots. The root of a number is the number that, multiplied with itself, gives you the original number. The weird thing is that there is usually not a single, unique result to that question. For example, the root of the number four is not just two, but also minus two, as minus two times minus two results in four, too. There are two roots of the second degree (which we usually call the square root). There are three roots of the third degree (sometimes called the cube root). There are four roots of the fourth degree. And so on. All of them are correct. Sometimes you can discard one or the other because the result has to fit certain constraints (say, you are looking only for the positive root of four), but sometimes, you can not.

As the calculations went public, the methods became more and more refined. The results became increasingly precise, and as the data from the satellites poured in, one of the last steps involved a root of the seventh degree. First, this was regarded as a minor curiosity, especially because these seven results led to basically the same point. Cosmologically speaking.

Earth is moving. Earth is moving around the sun at a speed of sixty-seven thousand miles per hour, or eighteen miles each second. The sun is moving as well, and the earth is moving with the sun, and our galaxy is moving, and with our galaxy the sun moves along, and with the sun our earth. We are racing at a speed of a thousand miles each second in some direction away from the center of the universe.

And it was realized: maybe we had just passed the center of the universe. Maybe it was just an accident; maybe all the planets and stars pass the center of the universe at some point. That we are so close to the center of the universe might be just a funny coincidence.

And maybe they are right. Maybe every star will at some point cross the center of the universe within the distance of a light year.

At some point though it was realized that, since the universe was bounded in all four dimensions, there was not only a center in space, but also a center in time, a midpoint between the beginning of the universe and its future end.

All human history is encompassed in the last hundred thousand years. From the mitochondrial Eve and the Y-chromosomal Adam who lived in Africa, the mother of our mother of our mother, and so on, whom we all share, and the father of our father of our father, and so on, whom we all share, their descendants, our ancestors, who crossed the then fertile jungle of the Sahara and who afterwards settled the whole planet, painted on the walls of caves and filled the air with music by blowing over grass blades and into hollow bones, wandered over the land bridge connecting Asia with the Americas and traveled over the vast Pacific to discover tiny islands, until the recent invention of the alphabet, all of this happened in the last hundred thousand years. The universe has an age of a hundred thousand times a hundred thousand years, roughly. And the fabled midpoint turned out to be within the last few thousand years.

The hope that our earth was just accidentally next to the center of the universe was shattered. As the precision of the calculations increased, it became clearer and clearer that earth was not merely close to the center of the universe: back at the midpoint of history, earth was right there in the center. In every single one of the seven possible results, Earth was right at the center of the universe. [1]

As the calculations continued over the years, a new class of mystic mathematicians emerged, and many walls between religion and science were shattered. On both sides the unshakeable ones remained: the scientists who would not admit that these results mean anything, that it all is merely a mathematical abstraction; and the priests who say that these results mean nothing, that they don’t tell us how to live a good life. That these parallels intersect is the only trace of infinity left.


[1] As the results were refined, it seemed that the seven mathematical solutions for the center of time and space turned out to be some very well known dates. So far the precision calculated was ten years here or there. The well known dates were: 3760 BC, 541 BC, 30 AD, and 610 AD. The other dates turned out to be rather less well known: 10909 BC, 3114 BC, and 1989 AD. The interpretation of the dates led to a well-known series of events all over the world, which we will not discuss here.


(This story was first published on Medium on February 2, 2014 under CC-BY 4.0).

CodeNet problem descriptions on the Web

Project CodeNet is a large corpus of code published by IBM. It contains close to one and a half million programs for a bit more than 4,000 problems.

I took the problem descriptions, created a simple index file for them, and uploaded everything to the Web to make the descriptions easily browseable.
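
The index generation could have looked roughly like this (a sketch; the directory layout and file names here are assumptions, not the actual CodeNet structure):

  from pathlib import Path

  # Collect all problem description files and link to them from one page.
  problems = sorted(Path("problem_descriptions").glob("*.html"))
  links = "\n".join(
      f'<li><a href="problem_descriptions/{p.name}">{p.stem}</a></li>'
      for p in problems)
  Path("index.html").write_text(
      "<html><body><ul>\n" + links + "\n</ul></body></html>")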

Wikidata or scraping Wikipedia

Yesterday I was pointed to a blog post describing how to answer an interesting question: how many generations lie between Alfred the Great and Elizabeth II? Alfred the Great was a king in England at the end of the 9th century, and Elizabeth II is the current Queen of England (and a bit more).

The author of the blog post, Bill P. Godfrey, describes in detail how he wrote a crawler that started by downloading the English Wikipedia article on Queen Elizabeth II, and then followed the links in the infobox to download all her ancestors, one after the other. He used a scraper to get the information from the Wikipedia infoboxes in the HTML pages. He invested quite a bit of work in cleaning the data, particularly doing entity reconciliation. This was then turned into a graph and the data analyzed, resulting in a number of paths from Elizabeth II to Alfred, the shortest being 31 generations.

I honestly love these kinds of projects, and I found Bill’s write-up interesting and read it with pleasure. It is totally something I would love to do myself. Congrats to Bill for doing it. Bill provided the dataset for further analysis on his Website. Thanks for that!

Everything I say in this post is not meant, in any way, as a criticism of Bill. As said, I think he did a fun project with interesting results, and he wrote a good write-up and published his data. All of this is great. I left a comment on the blog post sketching out how Wikidata could be used for similar results.

He submitted his blog post to Hacker News, where a discussion ensued that was, to me, extremely surprising. He was pointed rather naturally and swiftly to Wikidata and DBpedia. DBpedia is a project that started with, and invested heavily in, scraping the infoboxes from Wikipedia. Wikidata is a sibling project of Wikipedia where data can be directly maintained by contributors and accessed in a number of machine-readable ways. Asked why he didn’t use Wikidata, he said he didn’t know about it. All fair and good.

But some of the discussions and comments on Hacker News surprised me entirely.

Expressing my consternation, I started discussions on Twitter and on Facebook. And there were some very interesting stories about the pain of using Wikidata, and I very much expect us to learn from them and hopefully make things easier. The number of API queries one has to make in order to get data (although these numbers would be much smaller than with the scraping approach), the learning curve for SPARQL and RDF (although you can ignore both, unless you want to use them explicitly - you can just use JSON and the Wikidata API), and the opaqueness of the identifiers (wdt:P25 wd:Q9682 instead of “mother” and “Queen Elizabeth II”) were just a few. The documentation seems hard to find, and there seems to be a lack of libraries and APIs that are easy to use. And yet, comments like "if you've actually tried getting data from wikidata/wikipedia you very quickly learn the HTML is much easier to parse than the results wikidata gives you" surprised me a lot.
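
To illustrate the "just use JSON and the Wikidata API" point, here is a minimal sketch (my example, not from any of the discussions; error handling omitted) that fetches Elizabeth II (Q9682) and reads the identifier of her mother (P25):

  import requests

  API = "https://www.wikidata.org/w/api.php"
  r = requests.get(API, params={
      "action": "wbgetentities", "ids": "Q9682",
      "props": "claims", "format": "json"})
  entity = r.json()["entities"]["Q9682"]
  # P25 is the "mother" property; its value is another item's Q-id.
  mother = entity["claims"]["P25"][0]["mainsnak"]["datavalue"]["value"]["id"]
  print(mother)  # an opaque Q-id; a second, similar call fetches its label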

Others asked about the data quality of Wikidata, and complained about the huge amount of bad data, the duplicates, and the bad ontology in Wikidata (as if Wikipedia didn’t have these problems - I mean, how do you figure out what a Wikipedia article is about? How do you get a list of all bridges or events from Wikipedia?).

I am not here to fight. I am here to listen and to learn, in order to help figure out what needs to be made better. I did dive into the question of data quality. Thankfully, Bill provides his dataset on the Website, and downloading the result of the following query is just one click away:

  SELECT * {
    wd:Q9682 (wdt:P25|wdt:P22)* ?p .
    ?p wdt:P25|wdt:P22 ?q
  }

The result of this query is equivalent to what Bill was trying to achieve - a list of all ancestors of Elizabeth II. (The actual query is a little bit more complex, because we also fetch the names of the ancestors, and their Wikipedia articles, in order to help match the data to Bill’s data.)

I would claim that I invested far less work than Bill in creating my graph data. No data cleansing, no scraping, no crawling, no entity reconciliation, no manual checking. How about the quality of the two datasets?

Update: Note, this post is not a tutorial on SPARQL or Wikidata. You can find an explanation of the query in the discussion on Hacker News about this post. I really wanted to see how the quality of the data compares between the two approaches. Yes, it is an unfamiliar language for many, but I used to teach SPARQL, and the basics of the language seem not that hard to learn. Try out this tutorial, for example. Update over.

So, let’s look at the datasets. I will refer to the two datasets as the scrape (that’s Bill’s dataset) and Wikidata (that’s the query result from Wikidata, as of the morning of August 20 - in particular, none of the errors in Wikidata mentioned below have been fixed).

In the scrape, we find 2,584 ancestors of Elizabeth II (including herself). They are connected with 3,528 parenthood relationships.

In Wikidata, we find 20,068 ancestors of Elizabeth II (including herself). They are connected with 25,414 parenthood relationships.

So the scrape only found a bit less than 13% of the people that Wikidata knows about, and close to 14% of the relationships. If you ask me, that’s quite a bad recall - almost seven out of eight ancestors are missing.

Did the scrape find things that are missing in Wikidata? Yes. 43 ancestors are in the scrape which are missing in Wikidata, and 61 parenthood relationships are in the scrape which are missing from Wikidata. That’s about 1.8% of the data in the scrape, or 0.24% compared to the overall parent relationship data of Elizabeth II in Wikidata.

I evaluated the complete list of those relationships from the scrape missing from Wikidata. They fall into five categories:

  • Category 1: Errors that come from the scraper. 40 of the 61 relationships are errors introduced by the scraper. We have cities or countries being parents - which isn’t too terrible, as Bill says in the blog post, because they won’t have parents themselves and thus won’t participate in the original question of finding the lineage from Alfred to Elizabeth, so no problem. More problematic is when grandparents or great-grandparents are identified as the parent, because this directly messes up the counting of generations: Ügyek is thought to be a son, not a grandson, of Prince Csaba, Anna Dalassene is skipping two generations to Theophylact Dalassenos, etc. This means we have an error rate of at least 1.1% in the scraped dataset, besides the low recall rate mentioned above.
  • Category 2: Wikipedia has an error. Those are rare; it happened twice. Adelaide of Metz had the wrong father, and Sophie of Mecklenburg linked to the wrong mother in the infobox (although the text linked to the right one). The first one has been fixed since Bill ran his scraper (unlucky timing!), and I fixed the second one. Note that I am linking to the historic version of the article with the error.
  • Category 3: Wikidata was missing data. Jeanne de Fougères, Countess of La Marche and of Angoulême, and Albert Azzo II, Margrave of Milan, were missing one or both of their parents, and Bill’s scraping found them. So of the more than 3,500 scraped relationships, only 2 were missing! I added both. In addition, correct data was marked deprecated once; I fixed that, too.
  • Category 4: Wikidata has duplicates, and that breaks the chain. That happened five times, I think the following pairs are duplicates: Q28739301/Q106688884, Q105274433/Q40115489, Q56285134/Q354855, Q61578108/Q546165 and Q15730031/Q59578032. Duplicates were mentioned explicitly in one of the comments as a problem, and here we can see that they happen with quite a bit of frequency, particularly for non-central items. I merged all of these.
  • Category 5: the situation is complicated, and different Wikipedia versions disagree, because the sources seem to disagree. Sometimes Wikidata models that disagreement quite well - but often not. After all, we are talking about people who sometimes lived more than a millennium ago. Here are these cases: Albert II, Margrave of Brandenburg to Ada of Holland; Prince Álmos to Sophia to Emmo of Loon (complicated by a duplicate as well); Oldřich, Duke of Bohemia to Adiva; William III to Raymond III, both Counts of Toulouse; Thored to Oslac of York; Bermudo II of León to Ordoño III of León (Galician says IV); and Robert Fitzhamon to Hamo Dapifer. In total, eight cases. I didn't edit those as these require quite a bit of thought.

Note that there was not a single case of “Wikidata got it wrong” - unless you count the cases in Category 5 - which surprised me a lot; I totally expected errors to happen. I mean, even English Wikipedia had errors! This was a pleasant surprise. Also, the genuinely complicated cases are roughly as frequent as missing data, duplicates, and errors together. To be honest, that sounds like a pretty good result to me.

Also, the scraped data? Recall might be low, but the precision is pretty good: more than 98% of it is corroborated by Wikidata. Not all scraping jobs achieve such high correctness.

In general, these results are comparable to a comparison of Wikidata with DBpedia and Freebase I did two years ago.

Oh, and what about Bill’s original question?

Turns out that Wikidata knows of a path between Alfred and Elizabeth II that is even shorter than the shortest one Bill found: it takes only 30 generations instead of 31.

This is Bill’s path:

  • Alfred the Great
  • Ælfthryth, Countess of Flanders
  • Arnulf I, Count of Flanders
  • Baldwin III, Count of Flanders
  • Arnulf II, Count of Flanders
  • Baldwin IV, Count of Flanders
  • Judith of Flanders
  • Henry IX, Duke of Bavaria
  • Henry X, Duke of Bavaria
  • Henry the Lion
  • Henry V, Count Palatine of the Rhine
  • Agnes of the Palatinate
  • Louis II, Duke of Bavaria
  • Louis IV, Holy Roman Emperor
  • Albert I, Duke of Bavaria
  • Joanna Sophia of Bavaria
  • Albert II of Germany
  • Elizabeth of Austria
  • Barbara Jagiellon
  • Christine of Saxony
  • Christine of Hesse
  • Sophia of Holstein-Gottorp
  • Adolphus Frederick I, Duke of Mecklenburg-Schwerin
  • Adolphus Frederick II, Duke of Mecklenburg-Strelitz
  • Duke Charles Louis Frederick of Mecklenburg
  • Charlotte of Mecklenburg-Strelitz
  • Prince Adolphus, Duke of Cambridge
  • Princess Mary Adelaide of Cambridge
  • Mary of Teck
  • George VI
  • Elizabeth II

And this is the path that I found using the Wikidata data:

  • Alfred the Great
  • Edward the Elder (surprisingly, it deviates right at the beginning)
  • Eadgifu of Wessex
  • Louis IV of France
  • Matilda of France
  • Gerberga of Burgundy
  • Matilda of Swabia (this is a weak link in the chain, though, as there might possibly be two Matildas having been merged together. Ask your resident historian)
  • Adalbert II, Count of Ballenstedt
  • Otto, Count of Ballenstedt
  • Albert the Bear
  • Bernhard, Count of Anhalt
  • Albert I, Duke of Saxony
  • Albert II, Duke of Saxony
  • Rudolf I, Duke of Saxe-Wittenberg
  • Wenceslaus I, Duke of Saxe-Wittenberg
  • Rudolf III, Duke of Saxe-Wittenberg
  • Barbara of Saxe-Wittenberg (Barbara has no article in the English Wikipedia, but she does in German, Bulgarian, and Italian. Since the scraper only looks at the English Wikipedia, it would never have found this path)
  • Dorothea of Brandenburg
  • Frederick I of Denmark
  • Adolf, Duke of Holstein-Gottorp (husband to Christine of Hesse in Bill’s path)
  • Sophia of Holstein-Gottorp (and here the two lineages merge again)
  • Adolphus Frederick I, Duke of Mecklenburg-Schwerin
  • Adolphus Frederick II, Duke of Mecklenburg-Strelitz
  • Duke Charles Louis Frederick of Mecklenburg
  • Charlotte of Mecklenburg-Strelitz
  • Prince Adolphus, Duke of Cambridge
  • Princess Mary Adelaide of Cambridge
  • Mary of Teck
  • George VI
  • Elizabeth II
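
For the curious: below is a minimal sketch of how one could search for such a path in Wikidata's data, walking child statements (P40) breadth-first, one generation at a time, so the first hit is a shortest path. The QID for Alfred the Great is my assumption, and a serious run should work on a local dump rather than hammering the live query service, since the frontier of descendants grows very quickly:

  import requests

  WDQS = "https://query.wikidata.org/sparql"
  ALFRED = "Q83476"      # Alfred the Great (assumed QID)
  ELIZABETH = "Q9682"    # Elizabeth II

  def children_of(items):
      """One generation step: all P40 (child) statements of the given items."""
      values = " ".join(f"wd:{q}" for q in items)
      query = f"SELECT ?p ?c WHERE {{ VALUES ?p {{ {values} }} ?p wdt:P40 ?c . }}"
      reply = requests.get(WDQS, params={"query": query, "format": "json"})
      reply.raise_for_status()
      return [(row["p"]["value"].rsplit("/", 1)[1],
               row["c"]["value"].rsplit("/", 1)[1])
              for row in reply.json()["results"]["bindings"]]

  def find_path(start, goal, max_generations=35):
      parent = {start: None}              # how we first reached each person
      frontier = [start]
      for _ in range(max_generations):
          if not frontier:
              return None
          next_frontier = []
          for p, c in children_of(frontier):
              if c not in parent:
                  parent[c] = p
                  next_frontier.append(c)
          if goal in parent:              # reconstruct the path backwards
              path = [goal]
              while parent[path[-1]] is not None:
                  path.append(parent[path[-1]])
              return list(reversed(path))
          frontier = next_frontier
      return None

  print(find_path(ALFRED, ELIZABETH))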

I hope this is an interesting result for Bill to come out of this exercise.

I am super thankful to Bill for doing this work and describing it. It led to very interesting discussions and triggered insights into some shortcomings of Wikidata. I hope the above write-up is also helpful, particularly in providing some data regarding the quality of Wikidata, and I hope that it will lead to work on making Wikidata more easily accessible to explorers like Bill.

Update: there has been a discussion of this post on Hacker News.

Double copy in gravity

15 May 2021

When I was younger, I understood these theories much better. Today I read about them like a fascinated but somewhat distant bystander.

But it is terribly interesting. What does turning physics into math mean? When we find a mathematical shortcut that works but we don't understand - is this real? What is the relation between mathematical formulas and reality? And will we finally understand gravity some day?

It was an interesting article, but I am not sure I understood it all. I guess I'm getting old. Or just too specialized.

Zen and the Art of Motorcycle Maintenance

13 May 2021

During my PhD, on the topic of ontology evaluation - figuring out what makes an ontology good or bad - I was running in circles trying to define what "good" means for an ontology (Benjamin Good, another researcher on that topic, had it easier, as he could call his metric the "Good metric" and be done with it).

So while I was struggling with the definition in one of my academic essays, a kind anonymous reviewer (I think it was Aldo Gangemi) suggested I should read "Zen and the Art of Motorcycle Maintenance".

When I read the title of the suggested book, I first thought the reviewer was being mean or silly, suggesting a made-up book because my text was so incoherent. It took me two days to actually check whether the book existed, because I couldn't believe it did.

It existed. And it really helped me: it allowed me to set boundaries for how far I could go in my own work, to accept that it is OK to have limitations, and to see that trying to solve EVERYTHING leads to madness.

(Thanks to Brandon Harris for triggering this memory)

Keynote at Web Conference 2021

Today, I have the honor to give a keynote at the WWW Confe... sorry, the Web Conference 2021 in Ljubljana (and in the whole world). It's the 30th Web Conference!

Join Jure Leskovec, Evelyne Viegas, Marko Grobelnik, Stan Matwin and myself!

I am going to talk about how Abstract Wikipedia and Wikifunctions aim to contribute to Knowledge Equity. Register here for free:

Update: the talk can now be watched on VideoLectures:

Building a Multilingual Wikipedia

Communications of the ACM published my paper on "Building a Multilingual Wikipedia", a short description of the Wikifunctions and Abstract Wikipedia project that we are currently working on at the Wikimedia Foundation.


Jochen Witte

Jochen Witte was a friend from my school days. I learned a lot from him; he could do all those practical things I never had any knack for and often wished I could do. From him I learned what a good sound system needs, why subwoofers have to be big, and what subwoofers even are. Together we lugged heavy speakers around to make school discos and Abitur pranks and talks possible. From him I learned the virtues of gaffer tape, and that it is more than just silver duct tape. He was the first to introduce me a little to manga and anime; he had a particular passion for Akira. He was the first to play me the electronic music of Chris Hülsbeck and Jean-Michel Jarre. He read ASM, I read Power Play. For a while we played DSA together. He was the first person I knew who had a pager. He always seemed like he could fix anything, and it was good to know someone like that.

At the same time, some of my friends and I were not always kind to him - no, on the contrary, sometimes I was downright cruel. I made fun of his glasses or his weight, and could score points by making jokes at his expense. I knew it was wrong. We were already the outsiders in our class, and I was trying to make him the outsider among the outsiders. My only excuse is that we were children, and I did not yet have the strength to be better. I learned a lot from that, and never wanted to be like that again. With time, I came to understand myself better - where that cruelty came from, and that it was not about Jochen, but about me. I am ashamed of many things I did. I don't know whether I ever apologized to him.

And yet I believe we were friends.

After school we lost sight of each other. He studied chemistry in Esslingen; we met now and then at the Movie Dick for the sneak preview. He moved to Staig in the Alb-Donau district and found himself as a goth. But over the years we got in touch from time to time.

One of our shared memories was driving together to a talk by Erich von Däniken. It was my car. We had a flat tire, and while he got the car going again - as I said, he could fix anything - he asked me when I had last checked the oil. I must have looked at him so cluelessly that all he could do was laugh. The answer was "never", and he saw it in my face. Every time we met, he brought up that evening.

Jochen helped me move to Karlsruhe. The guest bed didn't quite fit together. He said he could tighten it, but then I would never get it apart again; it would be hard to move with it. I said that's OK, it's just a cheap IKEA guest-bed-couch thing. I have no intention of moving with it, I assured him.

I moved with it from Karlsruhe to Berlin. From Berlin to Alameda. Within Alameda. From Alameda to Berkeley. It gave the movers a headache every single time, just as Jochen had promised. Last week a piece broke off. I am sitting on it now, writing this. After almost a decade, I should probably finally replace it.

The last time we met was entirely by chance, in 2017 at the Stuttgart train station. I had been back in Germany only once in the whole last half decade. And there, at the station, I ran into him. It was good to see Jochen again, and we talked as if we still saw each other every day, just like twenty years earlier. As if the Abitur had been only yesterday.

This week I learned from Michael that Jochen has died. He died only a few months after our chance meeting, in April 2018. He was only forty years old.

I am sorry.

And even more than that: thank you.

Rest in peace, Jochen Witte.

The name Zdenko

Today I saw that the English Wikipedia article on Zdenko - my actual name - had been edited. Someone changed the meaning of the name from what I considered correct (a Slavic form of Sidonius) to something I had never heard before (a diminutive of Zdeslav), but left the reference untouched. I thought this would be a quick revert, but, doing my due diligence, I checked the given source - and, funnily enough, it said neither the one nor the other, but claimed that the name derives from the Slavic word zidati, to build, to erect.

That led me down a two-hour rabbit hole through various sources from the 19th and 20th centuries, where I found support for all three meanings - as well as sources claiming that the name derives from the Slavic word zdenac, a well, that the name Sidney also stems from Sidonius, and a Hessian source that vehemently insisted Zdenko and Sidonius have nothing to do with each other (the Slovenian Wikipedia likewise says that Zdenko and Sidonius share a name day, but are not the same name). The same Hessian source, though, explains that the name Denje, used in eastern Hesse, probably comes from Zdenka (so close to Denny!).

I like Denje as a name.

In short: if you think etymology is messy, be warned - anthroponymy is far worse!

Time on Mars

This is a fascinating and fun listen about the Mars mission. Because a day on Mars takes about 40 minutes longer than a day on Earth, and because the Mars rovers run on solar panels, the people working on that mission had to live on Mars time. So they have watches showing Mars time. They invent new words, speaking of sol instead of day and of yestersol, and they start calling themselves Martians. 11 minutes.
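
As a quick sanity check of those 40 minutes - a sketch where the length of a sol (about 24 hours, 39 minutes, and 35 seconds) is the only input:

  # A Mars sol is roughly 88775 seconds; an Earth day is 86400 seconds.
  SOL_SECONDS = 88775
  DAY_SECONDS = 86400

  drift_minutes = (SOL_SECONDS - DAY_SECONDS) / 60
  print(f"Mars time drifts {drift_minutes:.0f} minutes per Earth day")
  # -> about 40 minutes: after roughly 18 days, the Mars workday has
  #    wandered half a day away from an Earth schedule.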

Katherine Maher to step down from Wikimedia Foundation

Today Katherine Maher announced that she is stepping down as the CEO of the Wikimedia Foundation in April.

Thank you for everything!

Boole and Voynich and Everest

Did you know?

George Boole - after whom the Boolean data type and Boolean logic were named - was the father of Ethel Lilian Voynich - who wrote The Gadfly.

Her husband was Wilfrid Voynich - after whom the Voynich manuscript was named.

Ethel's mother and George Boole's wife was Mary Everest Boole - a self-taught mathematician who wrote educational books about mathematics. Her life is of interest to feminists as an example of how women made careers in an academic system that did not welcome them.

Mary Everest Boole's uncle was Sir George Everest - after whom Mount Everest is named.

And her daughter Lucy Everest Boole was the first woman Fellow of the Royal Institute of Chemistry.

Geoffrey Hinton, great-great-grandson of George and Mary Everest Boole, received the Turing Award for his work on deep learning.

Abraham Taherivand to step down from Wikimedia Deutschland

Today Abraham Taherivand announced that he is stepping down as the CEO of Wikimedia Deutschland at the end of the year.

Thank you for everything!

Twenty years

On this day twenty years ago, on January 15, 2001, I started my third Website, Nodix - and I have kept it up ever since (unlike my previous two Websites, which are lost to history, as the Internet Archive doesn't seem to have captured them). A few years later I renamed it to Simia.

Here is the first entry: Willkommen auf der Webseite von Denny Vrandecic! ("Welcome to the website of Denny Vrandecic!")

My Website never became particularly popular, although I meticulously kept track of how many hits I got and all of that. It was always a fun side project, for which I sometimes had more and sometimes less time.

The funniest thing is that it was - and that was completely incidental - exactly the same day that another Website was started, which I, over the years, spent much more time on: Wikipedia.

Wikipedia changed my life, not only once, but many times.

It is how I met Kamara.

It is how I met a lot of other very smart people, too. It became part of my research work and my PhD thesis. It became the motivation for many of the projects I have started, be it Semantic MediaWiki, Wikidata, or Abstract Wikipedia. It is the reason for my career trajectory over the last fifteen years. It is hard to overstate how influential Wikipedia has been on my life.

It is hard to overstate how important Wikipedia has become for modern AI and for the Web of today. For smaller language communities. For many, many people looking for knowledge. And for the many people who realised that they can contribute to it too.

Thanks to the Wikipedia community, thanks to this marvellous project, and happy anniversary and many returns to Wikipedia!

Happy New Year 2021!

2020 was a challenging year, particularly due to the pandemic. Some things were very different, some things were dangerous, and the pandemic exposed the fault lines in many societies in a most tragic way around the world.

Let's hope that 2021 will be better in that respect, and that we will have learned from how the events unfolded.

But I'm also amazed by how fast the vaccine was developed and made available to tens of millions.

I think there's some chance that the summer of '21 will become one to sing about for a generation.

Happy New Year 2021!

Keynote at SMWCon Fall 2020


I have the honor of being the invited keynote speaker for SMWCon Fall 2020. I am going to talk about "From Semantic MediaWiki to Abstract Wikipedia": fifteen years of Semantic MediaWiki, how it all started, where we are now - crossing Freebase, DBpedia, and Wikidata - and how it now leads to Wikifunctions and Abstract Wikipedia. But, more importantly, how Semantic MediaWiki still holds up after all these years, and what its unique value is.

Page about the talk on the official conference site: https://www.semantic-mediawiki.org/wiki/SMWCon_Fall_2020/Keynote:_From_Semantic_Wikipedia_to_Abstract_Wikipedia

Site went down

The site went down, again. The first time was in July, when Apache had issues; this time it was MySQL acting up and frying the database. I found a snapshot from July 2019 and am trying to recreate the entries from in between (thanks, Wayback Machine!).

Until then, at least the site is back up, even though there might be some losses in the content.

P.S.: it should all be back up. If something is missing, please email me.

Wikidata crossed Q100000000

Wikidata crossed Q100000000 (and, in fact, skipped it and got Q100000001 instead).

Here's a small post by Lydia Pintscher and me: https://diff.wikimedia.org/2020/10/06/wikidata-reaches-q100000000/

Mulan

I was surprised when Disney made the decision to sell Mulan on Disney+. If you want to watch Mulan, you not only have to buy it - so far, so good - but you have to join their subscription service first. The price for Mulan is $30 in the US, in addition to the monthly streaming fee of $7. So the $30 doesn't buy you Mulan; it allows you to watch it for as long as you keep up your subscription.

Additionally, on December 4 the movie becomes free for everyone with a Disney+ subscription.

I thought: that's a weird pricing model. Who'd pay that much money to stream the movie a few weeks earlier? I know, they will be very long weeks, the world being so 2020, but still. Money is tight for many people. Also, the movie got very mixed reviews and has a number of controversies attached to it.

According to the linked report, Disney really knew what they were doing: 30% of subscribers bought the early streaming privilege! Disney made hundreds of millions in extra profit within the first few days (money they will be really thankful for right now, given how their cruise ship, theme park, and movie businesses went this year).

The most interesting part is how this will affect the movie industry. Compare it to Tenet, which was reviewed much better and was hoped to revive the moribund US cinema industry, but made less than $30M - money that also has to be shared with the theaters, on top of much higher distribution costs. Disney keeps a far larger share of the $30 for Mulan than Tenet's production company sees from a ticket.

The lesson that Mulan and Trolls 2 (which also did much better than I would ever have predicted) teach production companies experimenting with novel pricing models could be disastrous for theaters.

I think we're going to see even more experimentation with pricing models. If the new Bond movie and/or the new Marvel movie were also to be pulled from cinemas, this might be the end of cinemas as we know them.

I don't know how the industry will change, but the swing is from AMC to Netflix, with the producers caught in between. The pandemic has massively accelerated this transition, as it has so many others.

https://finance.yahoo.com/amphtml/news/nearly-onethird-of-us-households-purchased-mulan-on-disney-for-30-fee-data-221410961.html

Gödel's naturalization interview

When Gödel went to his naturalization interview, his good friend Einstein accompanied him as a witness. On the way, Gödel told Einstein about a gap in the US constitution that would allow the country to be turned into a dictatorship. Einstein told him to not mention it during the interview.

The judge they came to was the same judge who had already naturalized Einstein. The interview went well until the judge asked whether Gödel thought that the US could face the same fate and slip into a dictatorship, as Germany and Austria had. Einstein became alarmed, but Gödel started discussing the issue. The judge noticed, quickly changed the topic, and the process came to the desired outcome.

I wonder what it was that Gödel had found, but it seems to be lost to history.

Gödel and Leibniz

Gödel in his later years became obsessed with the idea that Leibniz had written a much more detailed version of the Characteristica Universalis, and that this version was intentionally censored and hidden by a conspiracy: Leibniz, he believed, had discovered what he had hunted for his whole life, a way to calculate truth and end all disagreements.

I'm surprised that it was Gödel in particular who obsessed over this idea, because I'd think that someone with Leibniz's smarts would have benefitted tremendously from Gödel's proofs - they might have been a helpful antidote to his obsession with making truth a question of mathematics.

And wouldn't it have seemed likely to Gödel that, even if there were such a Characteristica Universalis by Leibniz, he himself - if no one else before him - would have been the one to find the fatal bug in it?

Starting Abstract Wikipedia

I am very happy about the Board of the Wikimedia Foundation having approved the proposal for the multilingual Wikipedia aka Abstract Wikipedia aka Wikilambda aka we'll need to find a name for it.

In order to make that project a reality, I will join the Foundation as of next week. We will start with a small, exploratory team, which will give us plenty of time to continue to socialize, discuss, and refine the idea. Being able to work on this full time and with a team should allow us to make significant progress. I am very excited about that.

I am sad to leave Google. It was a great time, and I learned a lot about running *large* projects, and I met so many brilliant people, and I ... seriously, it was a great six and a half years, and I will very much miss it.

There is so much more I want to write but right now I am just super happy and super excited. Thanks everyone!

Lexical masks in JSON

We have released lexical masks as ShEx files before, schemata for lexicographic forms that can be used to validate whether the data is complete.

We saw that it was quite challenging to turn these ShEx files into forms for entering the data, such as Lucas Werkmeister's Lexeme Forms. So we adapted our approach slightly: we now publish JSON files that keep the structures in a format that is easier to parse and understand, and we provide a script that translates these JSON files into ShEx Entity Schemas.
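
To give an idea of the shape of such a mask, here is a made-up example and a sketch of a completeness check; the field names and the feature QIDs are illustrative assumptions on my part, not necessarily what the published files use:

  # Hypothetical lexical mask for English nouns: a lexeme is complete
  # if it has a form covering each entry of the mask.
  mask = {
      "language": "en",
      "partOfSpeech": "noun",
      "forms": [
          {"label": "singular", "grammaticalFeatures": ["Q110786"]},
          {"label": "plural", "grammaticalFeatures": ["Q146786"]},
      ],
  }

  def is_complete(lexeme_forms: list) -> bool:
      """Each mask form must be covered by some form of the lexeme,
      where a lexeme form is given as a set of feature QIDs."""
      return all(
          any(set(required["grammaticalFeatures"]) <= features
              for features in lexeme_forms)
          for required in mask["forms"]
      )

  print(is_complete([{"Q110786"}, {"Q146786"}]))  # True: both forms present
  print(is_complete([{"Q110786"}]))               # False: plural is missing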

Furthermore, we published more masks for more languages and parts of speech than before.

Full documentation can be found on wiki: https://www.wikidata.org/wiki/Wikidata:Lexical_Masks#Paper

Background can be found in the paper: https://www.aclweb.org/anthology/2020.lrec-1.372/

Thanks Bruno, Saran, and Daniel for your great work!

Major bill for US National Parks passed

Good news: the US Senate has passed a large bipartisan Public Lands Bill, which will provide billions right now, and sustained funding for National Parks going forward.

There are a number of interesting and good things about this, besides the obvious one that National Parks are being funded better and more predictably:

  1. the main reason this passed was that the Evangelical movement in the US is increasingly reckoning that Pro-Life also means Pro-Environment, and this really helped make the bill a reality. This is major, as it could set the US on a path to becoming a more sane nation regarding environmental policies. If this could also extend to global warming, that would be wonderful, but for now let's be thankful for any momentum in this direction.
  2. the sustained funding comes from oil and gas operations, which has a certain satisfying irony to it. I expect this part to backfire a bit somehow, but I don't know how yet.
  3. Even though this is a political move by Republicans in order to save two of their Senators this fall, many Democrats supported it because the substance of the bill is good. Let's build on this momentum of bipartisanship.
  4. This has nothing to do with the pandemic, for once, but was in the works for a long time. So all of the reasons above hold true even without the pandemic.

Black lives matter

Fun in coding

16 May 2020

This article really was grinding my gears today. Coding is not fun, it claims, and everyone who says otherwise is lying for evil reasons, like luring more people into programming.

Programming requires almost superhuman capabilities, it says. And since other jobs that do, such as brain surgery, would never be described as fun, it is wrong to talk like this about coding.

That is all nonsense. The article not only misses the point, it denies many people their experience. What's the goal? To tell those "pretty uncommon" people that they are not only different from other people, but that their experience is plain wrong - that when they say they are having fun doing this, they are lying to others, to the normal people, for nefarious reasons? To "lure people to the field" to "keep wages under control"?

I feel offended by this article.

There are many highly complex jobs that some people have fun doing some of the time. Think of writing a novel. Painting. Playing music. Cooking. Raising a child. Teaching. And many more.

To put it straight: coding can be fun. I have enjoyed hours and days of coding since I was a kid. I will not allow anyone to deny me that experience I had, and I was not a kid with nefarious plans like getting others into coding to make tech billionaires even richer. And many people I know have expressed fun with coding.

Also: coding does not *have* to be fun. Coding can be terribly boring, or difficult, or frustrating, or tedious, or bordering on painful. And there are people who never have fun coding, and yet are excellent coders. Or good enough to get paid and have an income. There are coders who code to pay for their rent and bills. There is nothing wrong with that either. It is a decent job. And many people I know have expressed not having fun with coding.

Having fun coding doesn't mean you are a good coder. Not having fun coding doesn't mean you are not a good coder. Being a good coder doesn't mean you have to have fun doing it. Being a bad coder doesn't mean you won't have fun doing it. It's the same for singing, dancing, writing, playing the trombone.

Also, professional coding today is rarely the kind of activity portrayed in this article - a solitary activity where you type code in green letters in a monospace font on a black background, without having to answer to anyone, your code never being reviewed and scrutinized before it goes into production. For decades now, coding has been a highly social activity that requires negotiation, discussion, and social skills. I am not sure I know many senior coders who spend the majority of their work time actually coding. And it is at that level of activity where ethical decisions are made. Ethical decisions rarely happen at the moment the coder writes an if statement or declares a variable; they are made long in advance, documented in design docs and task descriptions, reviewed by a group of people.

So this article, although its heart is in the right place - trying to point out that coding, like any engineering, comes with many relevant ethical questions - goes about it entirely wrongly, and manages to offend me, and probably a lot of other people.

Sorry for my Saturday morning rant.

OK

11 May 2020

I often hear "don't go for the mediocre, go for the best!", or "I am the best, * the rest" and similar slogans. But striving for the best, for perfection, for excellence, is tiring in the best of times, never mind, forgive the cliché, in these unprecedented times.

Our brains are not wired for the best; we are not optimisers. We are natural satisficers: we have evolved for the good-enough. For this insight, Herbert Simon received a Nobel prize - the only Turing Award winner to ever get one.

And yes, there are exceptional situations where only the best is good enough. But if good enough was good enough for a Turing-Award-winning Nobel laureate, it is probably good enough for most of us too.

It is OK to strive for OK. OK can sometimes be hard enough, to be honest.

May is mental health awareness month. Be kind to each other. And, I know it is even harder, be kind to yourself.

Here is OK in different ways. I hope it is OK.

Oké ఓకే ਓਕੇ オーケー ओके 👌 ওকে או. קיי. Окей أوكي Օքեյ O.K.


Tim Bray leaving Amazon in protest

Tim Bray, co-author of XML, stepped down as an Amazon VP on May 1st over the company's handling of whistleblowers. His post on this decision is worth reading.

If life was one day

If the evolution of animals was one day... (600 million years)

  • From 1am to 4am, most of the modern types of animals have evolved (Cambrian explosion)
  • Animals get on land a bit at 3am. Early risers! It takes them until 7am to actually breathe air.
  • Around noon, first octopuses show up.
  • Dinosaurs arrive at 3pm, and stick around until quarter to ten.
  • Humans and chimpanzees split off about fifteen minutes ago, modern humans and Neanderthals lived in the last minute, and the pyramids were built around 23:59:59.2.

In that world, if that was a Sunday:

  • Saturday would have started with the introduction of sexual reproduction
  • Friday would have started by introducing the nucleus to the cell
  • Thursday recovering from Wednesday's catastrophe
  • Wednesday photosynthesis started, and led to a lot of oxygen, which killed a lot of beings just before midnight
  • Tuesday bacteria show up
  • Monday first forms of life show up
  • Sunday morning, planet Earth forms, pretty much at the same time as the Sun.
  • Our galaxy, the Milky Way, is about a week older
  • The Universe is about another week older - about 22 days.

There are several things that surprised me here.

  • That dinosaurs were around for such an incredibly long time. Dinosaurs were around for seven hours, and humans for a minute.
  • That life started so quickly after Earth was formed, but then took so long to get to animals.
  • That the Earth and the Sun started basically at the same time.
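
If you want to check the numbers, the conversion is simple arithmetic; the only assumption is mapping the 600 million years of animal evolution onto 24 hours:

  MYA_TOTAL = 600                          # million years mapped onto one day
  MINUTES_PER_MY = 24 * 60 / MYA_TOTAL     # 2.4 minutes per million years

  def clock(mya):
      """Time of day at which an event `mya` million years ago lands."""
      minutes = (MYA_TOTAL - mya) * MINUTES_PER_MY
      return f"{int(minutes // 60):02d}:{int(minutes % 60):02d}"

  print(clock(230))   # dinosaurs appear: 14:48, around 3pm
  print(clock(6.5))   # human-chimpanzee split: 23:44, minutes before midnight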

Addendum April 27: Álvaro Ortiz, a graphic designer from Madrid, turned this text into an infographic.

Architecture for a multilingual Wikipedia

I published a paper today:

"Architecture for a multilingual Wikipedia"

I have been working on this for more than half a decade, and I am very happy to have it finally published. The paper is a working paper and comments are very welcome.

Abstract:

Wikipedia’s vision is a world in which everyone can share in the sum of all knowledge. In its first two decades, this vision has been very unevenly achieved. One of the largest hindrances is the sheer number of languages Wikipedia needs to cover in order to achieve that goal. We argue that we need a new approach to tackle this problem more effectively, a multilingual Wikipedia where content can be shared between language editions. This paper proposes an architecture for a system that fulfills this goal. It separates the goal in two parts: creating and maintaining content in an abstract notation within a project called Abstract Wikipedia, and creating an infrastructure called Wikilambda that can translate this notation to natural language. Both parts are fully owned and maintained by the community, as is the integration of the results in the existing Wikipedia editions. This architecture will make more encyclopedic content available to more people in their own language, and at the same time allow more people to contribute knowledge and reach more people with their contributions, no matter what their respective language backgrounds. Additionally, Wikilambda will unlock a new type of knowledge asset people can share in through the Wikimedia projects, functions, which will vastly expand what people can do with knowledge from Wikimedia, and provide a new venue to collaborate and to engage the creativity of contributors from all around the world. These two projects will considerably expand the capabilities of the Wikimedia platform to enable every single human being to freely share in the sum of all knowledge.

Stanford seminar on Knowledge Graphs

My friend Vinay Chaudhri is organising a seminar on Knowledge Graphs with Naren Chittar and Michael Genesereth this semester at Stanford.

I have the honour of presenting the opening guest lecture, introducing what Knowledge Graphs are and what they are good for.

Due to the current COVID situation, the seminar was turned virtual, and opened for everyone to attend.

Other speakers during the semester include Juan Sequeda, Marie-Laure Mugnier, Héctor Pérez Urbina, Michael Uschold, Jure Leskovec, Luna Dong, Mark Musen, and many others.

Change is in the air

I'll be prophetic: the current pandemic will shine a bright light on the different social and political systems in the different countries. I expect to see noticeable differences in how disruptive each government's handling of the situation is, how many issues will be caused by panic, and what effect freely available health care has. The US has always been at the far end of admiring the self-sustained individual, China has been at the other end, admiring the community and its power, and Europe is somewhere in the middle (I am grossly oversimplifying).

This pandemic will blow over in a year or two - it will sweep right through the US election - and the news about it might shape what we deem viable and possible in ways beyond the immediately obvious. The possible scenarios range all the way from high-tech surveillance states to much wider access to social goods such as health care and education; whatever it is, the pandemic might be a catalyst towards it.

Wired: "Wikipedia is the last best place on the Internet"

WIRED published a beautiful ode to Wikipedia, painting the history of the movement with broad strokes, aiming to capture its impact and ambition with beautiful prose. It is a long piece, but I found the writing exciting.

Here's my favorite paragraph:

"Pedantry this powerful is itself a kind of engine, and it is fueled by an enthusiasm that verges on love. Many early critiques of computer-assisted reference works feared a vital human quality would be stripped out in favor of bland fact-speak. That 1974 article in The Atlantic presaged this concern well: “Accuracy, of course, can better be won by a committee armed with computers than by a single intelligence. But while accuracy binds the trust between reader and contributor, eccentricity and elegance and surprise are the singular qualities that make learning an inviting transaction. And they are not qualities we associate with committees.” Yet Wikipedia has eccentricity, elegance, and surprise in abundance, especially in those moments when enthusiasm becomes excess and detail is rendered so finely (and pointlessly) that it becomes beautiful."

They also interviewed me and others for the piece, but the focus of the article is really on what the Wikipedia communities have achieved in our first two decades.

Two corrections:

  • I cannot be blamed for Wikidata alone; I blame Markus Krötzsch as well.
  • The article says that half of the 40 million entries in Wikidata have been created by humans. I don't know whether that is correct - what I said is that half of the edits are made by human contributors.

Normbrunnenflasche

It's a pity there's no English Wikipedia article about this marvellous thing that exemplifies Germany so beautifully and quintessentially: the Normbrunnenflasche.

I was wondering the other day why in Germany sparkling water is being sold in 0.7l bottles and not in 1l or 2l or whatever, like in the US (when it's sold here at all, but that's another story).

Germany had a lot of small local producers and companies. To counter the advantages of the Coca-Cola Company pressing into the German market, in 1969 a conference of representatives of the local companies decided to introduce a bottle design they would all use. The decision followed half a year of competition and discussion about what this bottle should look like.

Every company would use the same bottle for sparkling water and other carbonated drinks, so that no matter which one you bought, the empty bottle would afterwards be routed to the closest participating company, not back home, thereby reducing transport costs and increasing competitiveness against Coca-Cola.

The bottle is full of smart features. The 0.7l were chosen to ensure that the drink stays carbonated until the last sip: larger bottles last longer once opened, and thus gradually lose their carbonation.

The shape and the little pearls on the outside were chosen for improved grip, but also to symbolize the sparkle of the carbonation.

The metal screw cap was the real innovation, useful for drinks that can build up pressure through carbonation.

And finally, two slightly thicker bands along the lower half of the bottle slowly turn more opaque under the mechanical pressure of reuse, indicating how well used an individual bottle is, so it can be taken out of service in time, before it breaks at the customer's.

The bottles were reused an average of fifty times, their crates an average of a hundred times. More than five billion of them have been brought into circulation in the fifty years since their adoption, for an estimated quarter of a trillion fillings.

A new decade?

The job of an ontologist is to define concepts. And since I see some posts commenting on whether a decade is closing and a new decade is starting tonight, here's my private, but entirely official position.

A decade is a consecutive timespan of ten years, and therefore at every given point a new decade starts and one ends. But that's a trivial answer to the question and not very useful.

There are two ways to count calendar decades, and both are arbitrary and rely on retconning - I mean, they rely on redefining the past. Therefore there is no right or wrong.

Method one uses the proleptic Gregorian calendar, starting with the year 1 and ending with the year 10, and calls that the first decade. If you keep counting, the twohundredandthird decade will start on January 1st, 2021; we are currently firmly in the twohundredandsecond decade, and will stay there for another year.
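
In code, method one is simple arithmetic (a sketch, just to make the counting concrete):

  def decade(year):
      """Ordinal decade under method one: year 1 opens the first decade."""
      return (year - 1) // 10 + 1

  print(decade(2020))  # 202 -> still the twohundredandsecond decade
  print(decade(2021))  # 203 -> the twohundredandthird decade begins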

Method two is based on the fact that, for a millennium now and for many years to come, there is a time period that conveniently lasts a decade in which the years start with the same three digits. That is, the years starting with 202 are called the 2020s, the ones starting with 199 are called the 1990s (or sometimes just the 90s), and so on. For centuries we can find evidence of this kind of decade being widely used. According to this method, tonight marks a new decade.

So whether you are celebrating a new year tonight or not (because there are many other calendars out there too), or a new decade or not, I wish you wonderful 2020s!

SWAT4HCLS trip report

This week saw the 12th SWAT4HCLS event in Edinburgh, Scotland. It started with a day of tutorials and workshops on Monday, December 10th, on topics such as SPARQL querying, ontology matching, and using Wikibase and Wikidata.

Conference presentations went on for two days, Tuesday and Wednesday. This included four keynotes, among them mine on Wikidata and how to move beyond it (presenting the ideas from my Abstract Wikipedia papers). The other three keynotes (as well as a number of the paper presentations) all centered on FAIR - the Findable, Accessible, Interoperable, and Reusable publication of data - a concept I had already seen so prominently at the eScience conference earlier this year. I am very happy to see these ideas spread so prominently!

Birgitta König-Ries talked about how to use semantic technologies to manage FAIR data. Dov Greenbaum talked about how licenses interplay with data and what that means for FAIR data - my personal favorite among the keynotes, because of my morbid fascination with licenses and intellectual property rights pertaining to data and knowledge. He confirmed my understanding of the area: you can't really use copyright for data, and thus the application of CC-BY or similar licenses to data would stand on shaky ground in court. The last keynote was by Helen Parkinson, who gave a great talk on the issues that come up when building vocabularies, including over-ontologizing (and the siren call of just keeping on modeling). She put the issues in parallel to the travels of Odysseus, which was delightful.

The conference talks and posters were spot on the topic of the conference: using Semantic Web technologies in the life sciences, health care, and related fields. It was very satisfying to see so many applications of the technologies that Semantic Web researchers and developers have been creating over the years. My personal favorite was MetaStanza, web components that visualize SPARQL results in many different ways (a much-needed update to SPARK, which Andreas Harth and I developed almost a decade ago).

On Thursday, the conference closed with a Hackathon day, which I couldn’t attend unfortunately.

Thanks to the organizers for the event, and thanks again for the invitation to beautiful Edinburgh!

Other trip reports (send me more if you have them):