Semantic search

Jump to navigation Jump to search

Normbrunnenflasche

It's a pity there's no English Wikipedia article about this marvellous thing that exemplifies Germany so beautifully and quintessentially: the Normbrunnenflasche.

I was wondering the other day why in Germany sparkling water is being sold in 0.7l bottles and not in 1l or 2l or whatever, like in the US (when it's sold here at all, but that's another story).

Germany had a lot of small local producers and companies. To counter the advantages of the Coca Cola Company pressing in the German market, in 1969 a conference of representatives of the local companies decided to introduce a bottle design they all would use. This decision followed a half year competition and discussion on what this bottle should look like.

Every company would use the same bottle for sparkling water and other carbonated drinks, and so no matter which one you bought, the empty bottle would afterwards be routed to the closest participating company, not back home, therefore reducing transport costs and increasing competitiveness against Coca Cola.

The bottle is full of smart features. The 0.7l were chosen to ensure that the drink remained carbonated until the last sip, because larger bottles would last longer and thus gradually loose carbonization.

The form and the little pearls outside were chosen for improved grip, but also to symbolize the sparkles of the carbonization.

The metal screw cap was the real innovation there, useful for drinks that could increase pressure due to the carbonization.

And finally two slightly thicker bands along the lower half of the bottle that would, while being rerouted for another usage, slowly get more opaque due to mechanical pressure, thus indicating how well used the individual bottle was, so they could be taken out of service in time before breaking at the customer.

The bottles were reused an average of fifty times, their boxes an average of hundred times. More than five billion of them have been brought into circulation in the fifty years since their adoption, for an estimated quarter of a trillion fillings.

A new decade?

The job of an ontologist is to define concepts. And since I see some posts commenting on whether a decade is closing and a new decade is starting tonight, here's my private, but entirely official position.

A decade is a consecutive timespan of ten years, and therefore at every given point a new decade starts and one ends. But that's a trivial answer to the question and not very useful.

There are two ways to count calendar decades, and both are arbitrary and rely on retconning, I mean, they really on redefining the past. Therefore there is no right or wrong.

Method one is by using the proleptic Gregorian calendar, and starting with the year 1 and ending with the year 10, and calling that the first decade. If you keep counting, then the twohundredandthird decade will start on January 1st 2021, and we are currently firmly in the twohundredandsecond decade, and will stay there for another year.

Method two is based on the fact that for a millennium now and for many years to come there's a time period that conveniently lasts a decade where the years start with the same three digits. That is, the years starting with 202, which are called the 2020s, the ones with 199 which are called the 1990s (or sometimes just the 90s), etc. For centuries now we can find support for these kind of decades being widely used. According to this method, tonight marks a new decade.

So whether you are celebrating a new year tonight or not (because there are many other calendars out there too), or a new decade or not, I wish you wonderful 2020s!

SWAT4HCLS trip report

This week saw the 12th SWAT4HCLS event in Edinburgh, Scotland. It started with a day of tutorials and workshops on Monday, December 10th, on topics such as SPARQL, querying, ontology matching, and using Wikibase and Wikidata.

Conference presentations went on for two days, Tuesday and Wednesday. This included four keynotes, including mine on Wikidata, and how to move beyond Wikidata (presenting the ideas from my Abstract Wikipedia papers). The other three keynotes (as well as a number of the paper presentation) were all centered on the FAIR concept which I already saw being so prominent at the eScience conference earlier this year. FAIR as in Findable, Accessible, Interoperable, and Reusable publication of data. I am very happy to see these ideas spread out so prominently!

Birgitta König-Ries talked about how to use semantic technologies to manage FAIR data. Dov Greenbaum talked about how licenses interplay with data and what it means for FAIR data - personally, my personal favorite of the keynotes, because of my morbid fascination regarding licenses and intellectual property rights pertaining to data and knowledge. He actually confirmed my understanding of the area - that you can’t really use copyright for data, and thus the application of CC-BY or similar licenses to data would stand on shaky grounds in a court. The last keynote was by Helen Parkinson, who gave a great talk on the issues that come up when building vocabularies, including issues around over-ontologizing (and the siren call of just keeping on modeling) and others. She put the issues in parallel to the travels of Odysseus, which was delightful.

The conference talks and posters were really on spot on the topic of the conference: using semantic web technologies in the life sciences, health care, and related fields. It was a very satisfying experience to see so many applications of the technologies that Semantic Web researchers and developers have been creating over the years. My personal favorite was MetaStanza, web components that visualize SPARQL results in many different ways (a much needed update to SPARK, that Andreas Harth and I had developed almost a decade ago).

On Thursday, the conference closed with a Hackathon day, which I couldn’t attend unfortunately.

Thanks to the organizers for the event, and thanks again for the invitation to beautiful Edinburgh!

Other trip reports (send me more if you have them):

Frozen II in Korea

This is a fascinating story, that just keeps getting better (and Hollywood Reporter is only scratching the surface here, unfortunately): an NGO in South Korea is suing Disney for "monopolizing" the movie screens of the country, because Frozen II is shown on 88% of all screens.

Now, South Korea has a rich and diverse number of movie theatres - they have the large cineplexes in big cities, but in the less populated areas they have many small theatres, often with a small number of screens (I reckon it is similar to the villages in Croatia, where there was only a single screen in the theater, and most movies were shown only once, and there were only one or two screenings per day, and not on every day). The theatres are often independent, so there is no central planning about which movies are being shown (and today, it rarely matters today how many copies of a movie are being made, as many projectors are digital and thus unlimited copies can be created on the fly - instead of waiting for the one copy to travel from one town to the next, which was the case in my childhood).

So how would you ensure that these independent movies don't show a movie too often? By having a centralized way that ensures that not too many screens show the same movie? (Preferably on the Blockchain, using an auction system?) Good luck with that, and allowing the local theatres to adapt their screenings to their audiences.

But as said, it gets better: the 88% number is being arrived at by counting how many of the screens in the country showed Frozen II on a given day. It doesn't mean that that screen was used solely for Frozen II! If the screen was used at noon for a showing of Frozen II, and at 10pm for a Korean horror movie, that screen counts for both. Which makes the percentage a pretty useless number if you want to show monopolistic dominance (also, because the numbers add up to far more than 100%). Again, remember that in small towns there is often a small number of screens, and they have to show several different movies on the same screen. If the ideas of the lawsuit would be enacted, you would need to keep off Frozen II from a certain number of screens! Which basically makes it impossible to allow kids and teens in less populated areas to participate in event movie-going such as Frozen II and trying to avoid spoilers in Social Media afterwards.

Now, if you look how many screenings, instead of screens, were occupied by Frozen II, the number drops down to 46% - which is still impressive, but far less dominant and monopolistic than the 88% cited above (and in fact below the 50% the Korean law requires to establish dominance).

And even more impressive: in the end it is up to the audience. And even though 'only' 46% of the screenings were on Frozen II, every single day since its release between 60% and 85% of all revenue was going to Frozen II. So one could argue that the theatres were actually underserving the audience (but then again, that's not how it really works, because screenings are usually in rooms with hundred or more seats, and they can be very differently filled - and showing a blockbuster three times with almost full capacity, and showing a less popular movie once with only a dozen or so tickets sold might still have served the local community better than only running the block buster).

I bet the NGO's goal is just to raise awareness about the dominance of the American entertainment industry, and for that, hey, it's certainly worth a shot! But would they really want to go back to a system where small local cinemas would not be able to show blockbusters for a long time, involving a complicated centralized planning component?

(Also, I wish there was a way to sign up for updates on a story, like this lawsuit. Let me know if anyone knows of such a system!)


Machine Learning and Metrology

There are many, many papers in machine learning these days. And this paper, taking a step back, and thinking about how researchers measure their results and how good a specific type of benchmarks even can be - crowdsourced golden sets. It brings a convincing example based on word similarity, using terminology and concepts from metrology, to show how many results that have been reported are actually not supported by the golden set, because the resolution of the golden set is actually insufficient. So there might be no improvement at all, and that new architecture might just be noise.

I think this paper is really worth the time of people in the research field. Written by Chris Welty, Lora Aroyo, and Praveen Paritosh.

The story of the Swedish calendar

Most of us are mostly aware how the calendar works. There’s twelve months in a year, each month has 30 or 31 days, and then there’s February, which usually has 28 days and sometimes, in what is called a leap year, 29. In general, years divisible by four are leap years.

This calendar was introduced by no one else then Julius Caesar, before he became busy conquering the known world and becoming the Emperor of Rome. Before that he used to have the job title “supreme bridge builder” - the bridge connecting the human world with the world of the gods. One of the responsibilities of this role was to decide how many days to add to the end of the calendar year, because the Romans noticed that their calendar was getting misaligned with the seasons, because it was simply a bit too short. So, for every year, the supreme bridge builder had to decide how many days to add to the calendar.

Since we are talking about the Roman Republic, this was unsurprisingly misused for political gain. If the supreme bridge builder liked the people in power, he might have granted a few extra weeks. If not, no extra days. Instead of ensuring that the calendar and the seasons aligned, the calendar got even more out of whack.

Julius Caesar spearheaded a reform of the calendar, and instead of letting the supreme bridge builder decide how many days to add, the reform devised rules founded in observation and mathematical rules - leading to the calendar we still have today: twelve months each year, each with 30 or 31 days, besides February, which had 28, but every four years would have 29. This is what we today call the Julian calendar. This calendar was not perfect, but pretty good.

Over the following centuries, the role of the supreme bridge builder - or, in latin, Pontifex Maximus - transferred from the Emperor of Rome to the Bishop of Rome, the Pope. And with continuing observations over centuries it was noticed that the calendar was again getting out of sync with the seasons. So it was the Pope - Gregory XIII, later called The Great - who, in his role as Pontifex Maximus, decided that the calendar should be fixed once again. The committee he set up to work on that came up with fabulous improvements, which would guarantee to keep the calendar in sync for a much longer time frame. In addition to the rules established by the Julian calendar, every hundred years we would drop a leap year. But every four hundred years, we would skip dropping the leap year (as we did in 2000, which not many people noticed). And in 1582, this calendar - called the Gregorian calendar - was introduced.

Imagine leading a committee that comes up with rules on what the whole world would need to do once every four hundred years - and mostly having these rules implemented. How would you lead and design such a committee? I find this idea mind-blowing.

Since the time of Caesar until 1582, about fifteen centuries have passed. And in this time, the calendar was getting slightly out of sync - by one day every century, skipping every fourth. In order to deal with that shift, they decided that ten calendar days need to be skipped. Following the 4th of October 1582 was the 15th of October 1582. In 1582, there was no 5th or 14th of October, nor any of the days in between, in the countries that had the Gregorian calendar adopted.

This lead to plenty of legal discussions, mostly about monthly rents and wages: is this still a full month, or should the rent or wage be paid prorated to the number of days? Should annual rents, interests, and taxes be prorated by these ten days, or not? What day of the week should the 15th of October be?


The Gregorian calendar was a marked improvement over the Julian calendar with regards to keeping the seasons in sync with the calendar. So one might think its adoption should be a no-brainer. But there was a slight complication: politics.

Now imagine that today the Pope gets out on his balcony, and declares that, starting in five years, January to November all have 30 days, and December has 35 or 36 days. How would the world react? Would they ponder the merits of the proposal, would they laugh, would they simply adopt it? Would a country such as Italy have a different public discourse about this topic than a country such as China?

In 1582, the situation was similarly difficult. Instead of pondering the benefits of the proposal, the source of the proposal and the relation to that source became the main deciding factor. Instead of adopting the idea because it is a good idea, the idea was adopted - or not - because the Pope of the Catholic Church declared it. The Papal state, the Spanish and French Kingdoms, were first to adopt it.

Queen Elizabeth wanted to adopt it in England, but the Anglican bishops were fiercely opposed to it because it was suggested by the Pope. Other Protestant and the Orthodox countries simply ignored it for centuries. And thus there was a 5th of October 1582 in England, but not in France, and that lead to a number of confusions over the following centuries.

Ever wondered why the October Revolution started November 7? There we go. There is even a story that Napoleon won an important battle (either the Battle of Austerlitz or the Battle of Ulm) because the Russian and Austrian forces coordinated badly as the Austrians were using the Gregorian and the Russians the Julian calendar. The story is false, but it makes for a great story.

Today, the International Day of the Book is on April 23 - the death date of both Miguel de Cervantes and William Shakespeare in 1616, the two giants of literature in their respective languages - with the amusing side-effect that they actually died about two weeks apart, even though they died on the same calendar day, but in different calendars.

It wasn’t until 1923 that for most purposes all countries had deprecated the Julian calendar, and for religious purposes some still follow it - which is why the Orthodox and the Amish celebrate Christmas on January 6. Starting 2101, that should shift by another day - and I would be very curious to see whether it will, or whether by then January 6th has solidified as the Christmas date.


Possibly the most confusing story about adopting the Gregorian calendar comes from Sweden. Like most protestant countries, Sweden did not initially adopt the Gregorian calendar, and was sticking with the Julian calendar, until in 1699 they decided to switch.

Now, the idea of skipping eleven or twelve days in one go did not sound appealing - remember all the chaos that occurred in the other countries for dropping these days. So in Sweden they decided that instead of dropping the days all at once, they would drop them one by one, by skipping the leap years from 1700 until 1740, when the two calendars would finally catch up.

In 1700, February 29 was skipped in Sweden. Which didn’t bring them any closer to Gregorian countries such as Spain, because they skipped the leap year in 1700 anyway. But it brought them out of alignment with Russia - by one day.

A war with Russia started (not about the calendar, but just a week before the calendars went out of sync, incidentally), and due to the war Sweden forgot to skip the leap days in 1704 and 1708 (they had other things on their mind). And as this was embarrassing, in 1711, King Charles XII of Sweden declared to abandon the plan, and added one extra day the following year to realign it back to Russia. And because 1712 was a leap year anyway, in Sweden there was not only a February 29, but also a February 30, 1712. The only legal February 30 in history so far.

It needed not only for Charles XII to die, but also for his sister (who succeeded him) and her husband (who succeeded her) in 1751, before Sweden could move beyond that embarrassing episode, and in 1752 Sweden switched from the Julian to the Gregorian calendar, by cutting February short and ending it after February 17, following that by March 1.


Somewhere on my To-Do list, I have the wish to write a book on Wikidata. How it came to be, how it works, what it means, the complications we encountered, and the ones we missed, etc. One section in this book is planned to be about calendar models. This is an early, self-contained draft of part of that section. Feedback and corrections are very welcome.


Erdös number, update

I just made an update to a post from 2006, because I learned that my Erdös number has went down from 4 to 3. I guess that's pretty much it - it is not likely I'll ever become a 2.

The Fourth Scream

Janie loved her research. It was at the intersection of so many interesting areas - genetics, linguistics, neuroscience. And the best thing about it - she could work the whole day with these adorable vervet monkeys.

One more time, she showed the video of the flying eagle to Kassandra. The MRI helmet on Kassandra’s little head measured the neuron activation, highlighting the same region on her computer screen as the other times, the same region as with the other monkeys. Kassandra let out the scream that Janie was able to understand herself by now, the scream meaning “Eagle!”, and the other monkeys behind the bars in the far end of the room, in a cage large as half the room, ran to cover in the bushes and small caves, if they were close enough. As they did every time.

That MRI helmet was a masterpiece. She could measure the activation of the neurons in unprecedented high resolution. And not only that, she could even send inferencing waves back, stimulating very fine grained regions in the monkey’s brain. The stimulation wasn’t very fast, but it was a modern miracle.

She slipped a raspberry to Kassandra, and Kassandra quickly snatched it and stuffed it in her mouth. The monkeys came from different populations from all over Southern and Eastern Africa, and yet they all understood the same three screams. Even when the baby monkeys were raised by mute parents, the baby monkeys understood the same three screams. One scream was to warn them from leopards, one scream was to warn them from snakes, and the third scream was to warn them from eagles. The screams were universally understood by everyone across the globe - by every vervet monkey, that is. A language encoded in the DNA of the species.

She called up the aggregated areas from the scream from her last few experiments. In the last five years, she was able to trace back the proteins that were responsible for the growth of these four areas, and thus the DNA encoding these calls. She could prove that these three different screams, the three different words of Vervetian, were all encoded in DNA. That was very different from human language, where every word is learned, arbitrary, and none of the words were encoded in our DNA. Some researchers believed that other parts of our language were encoded in our DNA: deep grammatical patterns, the ability to merge chunks into hierarchies of meaning when parsing sentences, or the categorical difference between hearing the syllable ba and the syllable ga. But she was the first one to provably connect three different concrete genes with three different words that an animal produces and understands.

She told the software to create an overlapping picture of the three different brain areas activated by the three screams. It was a three dimensional picture that she could turn, zoom, and slice freely, in real time. The strands of DNA were highlighted at the bottom of the screen, in the same colors as the three different areas in the brain. One gene, then a break, then the other two genes she had identified. Leopard, snake, eagle.

She started to turn the visualization of the brain areas, as Kassandra started squealing in pain. Her hand was stuck between the cage bars and the plate with raspberries. The little thief was trying to sneak out a raspberry or two! Janie laughed, and helped the monkey get the hand unstuck. Kassandra yanked it back into the cage, looked at Janie accusingly, knowing that the pain was Janie’s fault for not giving her enough raspberries. Janie snickered, took out another raspberry and gave it to the monkey. She snatched it out of Janie’s hand, without stopping the accusing stare, and Janie then put the plate to the other side of the table, in safe distance and out of sight of Kassandra.

She looked back at the screen. When Kassandra cried out, her hand had twitched, and turned the visualization to a weird angle. She just wanted to turn it back to a more common view, when she suddenly stopped.

From this angle, she could see the three different areas, connecting together with the audiovisual cortex at a common point, like the leaves of a clover. But that was just it. It really looked like three leaves of a four-leaf clover. The area where the fourth leaf would be - it looked a lot like the areas where the other three leaves were.

She zoomed into the audiovisual cortex. She marked the neurons that triggered each of the three leaves. And then she looked at the fourth leaf. The connection to the cortex was similar. A bit different, but similar enough. She was able to identify what probably are the trigger-neurons, just like she was able to find them for the other three areas.

She targeted the MRI helmet on the neurons connected to the eagle trigger neurons, and with a click she sent a stimulus. Kassandra looked up, a bit confused. Janie looked at the neurons, how they triggered, unrolled the activation patterns, and saw how the signal was suppressed. She reprogrammed the MRI helmet, refined the neurons to be stimulated, and sent off another stimulus.

Kassandra yanked her head up, looking around, surprised. She looked at her screen, but it showed nothing as well. She walked nervously around inside the little cage, looking worriedly to the ceiling of the lab, confused. Janie again analyzed the activation patterns, and saw how it almost went through. There seemed to be a single last gatekeeper to pass. She reprogrammed the stimulator again. Third time's the charm, they say. She just remembered a former boyfriend, who was going on and on about this proverb. How no one knew how old it was, where it began, and how many different cultures all over the world associate trying something three times with eventual success, or an eventual curse. How some people believed you need to call the devil's name three times to —

Kassandra screamed out the same scream as before, the scream saying “Eagle!”. The MRI helmet had sent the stimulus, and it worked. The other monkeys jumped for cover. Kassandra raised her own arms above her head, peeking through her fingers to find the eagle she had just sensed.

Janie was more than excited! This alone will make a great paper. She could get the monkeys to scream out one of the three words of their language by a simple stimulation of particular neurons! Sure, she expected this to work - why wouldn’t it? But the actual scream, the confirmation, was exhilarating. As expected, the neurons now had a heightened potential, were easier to activate, waiting for more input. They slowly cooled down as Kassandra didn’t see any eagles.

She looked at the neurons connected to the fourth leaf. The gap. Was there a secret, fourth word hidden? One that all the zoologists studying vervet monkeys have missed so far? What would that word be? She reprogrammed the MRI helmet, aiming at the neurons that would trigger the fourth leaf. If her theory was right. With another click she sent a stimulus to the —

Janie was crouching in the corner of the room, breathing heavily, cold sweat was covering her arms, her face, her whole body. Her clothes were clamp. Her arms were slung above her head. She didn’t remember how she got here. The office chair she was just sitting in a moment ago, laid on the floor. The monkeys were quiet. Eerily quiet. She couldn’t see them from where she was, she couldn’t even see Kassandra from here, who was in the cage next to her computer. One of the halogen lamps in the ceiling was flickering. It wasn’t doing that before, was it?

She slowly stood up. Her body was shivering. She felt dizzy. She almost stumbled, just standing up. She slowly lowered her arms, but her arms were shaking. She looked for Kassandra. Kassandra was completely quiet, rolled up in the very corner of her cage, her arms slung around herself, her eyes staring catatonically forward, into nothing.

Janie took a step towards the middle of the room. She could see a bit more of the cage. The monkeys were partly huddled together, shaking in fear. One of them laid in the middle of the cage, his face in a grimace of terror. He was dead. She thought it was Rambo, but she wasn’t sure. She stumbled to the computer, pulled the chair from the floor, slumped into it.

The MRI helmet had recorded the activation pattern. She stepped through it. It did behave partially the same: the neurons triggered the unknown leaf, as expected, and that lead to activate the muscles around the lungs, the throat, the tongue, the mouth - in short, that activated the scream. But, unlike with the eagle scream, the activation potential did not increase, it was now suppressed. Like if it was trying to avoid a second triggering. She checked the pattern: yes, the neuron triggered that suppression itself. That was different. How did this secret scream sound?

Oh no! No, no, no, no, NOO!! She had not recorded the experiment. How stupid!

She was excited. She was scared, too, but she tried to push that away. She needed to record that scream. She needed to record the fourth word, the secret word of vervet monkeys. She switched on all three cameras in the lab, one pointed at the large cage with the monkeys, the other two pointing at Kassandra - and then she changed her mind, and turned one onto herself. What has happened to herself? Why couldn’t she remember hearing the scream? Why was she been crouching on the floor like one of the monkeys?

She checked her computer. The MRI helmet was calibrated as before, pointing at the group of triggering neurons. The suppression was ebbing down, but not as fast as she wanted. She increased the stimulation power. She shouldn’t. She should follow protocol. But this all was crazy. This was a cover story for Nature. With her as first author. She checked the recording devices. All three were on. The streams were feeding back into her computer. She clicked to send the sti—

She felt the floor beneath her. It was dirty and cold. She was laying on the floor, face down. Her ears were ringing. She turned her head, opened her eyes. Her vision was blurred. Over the ringing in her ears she didn’t hear a single sound from the monkeys. She tried to move, and she felt her pants were wet. She tried to stand up, to push herself up.

She couldn’t.

She panicked. Shivered. And when she felt the tears running over her face, she clenched her teeth together. She tried to breath, consciously, to collect herself, to gain control. Again she tried to stand up, and this time her arms and legs moved. Slower than she wanted. Weaker than she hoped. She was shaking. But she moved. She grabbed the chair. Pulled herself up a bit. The computer screen was as before, as if nothing has happened. She looked to Kassandra.

Kassandra was dead. Her eyes were bloodshot. Her face was a mask of pure terror, staring at nothing in the middle of the room. Janie tried to look at the cage with the other monkeys, but she couldn’t focus her gaze. She tried to yank herself into the chair.

The chair rolled away, and she crashed to the floor.

She had went too far. She had made a mistake. She should have had followed protocol. She was too ambitious, her curiosity and her impatience took the best of her. She had to focus. She had to fix things. But first she needed to call for help. She crawled to the chair. She pulled herself up, tried to sit in the chair, and she did it. She was sitting. Success.

Slowly, she rolled back to the computer. Her office didn’t have a phone. She double-clicked on the security app on her desktop. She had no idea how it worked, she never had to call security before. She hoped it would just work. A screen opened, asking her for some input. She couldn’t read it. She tried to focus. She didn’t know what to do. After a few moments the app changed, and it said in big letters: HELP IS ON THE WAY. STAY CALM. She closed her eyes. Breathed. Good.

After a few moments she felt better. She opened her eyes. HELP IS ON THE WAY. STAY CALM. She read it, once, twice. She nodded, her gaze jumping over the rest of the screen.

The recording was still on.

She moved the mouse cursor to the recording app. She wanted to see what has happened. There was nothing to do anyway, until security came. She clicked on the play button.

The recording filled three windows, one for each of the cameras. One pointed at the large cage with the vervet monkeys, two at Kassandra. Then, one of the cameras pointing at Kassandra was moved, pointing at Janie, just moments ago - it was moments, was it? - sitting at the desk. She saw herself getting ready to send the second stimulus to Kassandra, to make her call the secret scream a second time.

And then, from the recording, Kassandra called for a third time.

The end

History of knowledge graphs

An overview on the history of ideas leading to knowledge graphs, with plenty of references. Useful for anyone who wants to understand the background of the field, and probably the best current such overview.

On the competence of conspiracists

“Look, I’ll be honest, if living in the US for the last five years has taught me anything is that any government assemblage large enough to try to control a big chunk of the human population would in no way be consistently competent enough to actually cover it up. Like, we would have found out in three months and it wouldn’t even have been because of some investigative reporter, it would have been because one of the lizards forgot to put on their human suit on day and accidentally went out to shop for a pint of milk and like, got caught in a tik-tok video.” -- Os Keyes, WikidataCon, Keynote "Questioning Wikidata"

Power in California

It is wonderful to live in the Bay Area, where the future is being invented.

Sure, we might not have a reliable power supply, but hey, we have an app that connects people with dogs who don't want to pick up their poop with people who are desperate enough to do this shit.

Another example how the capitalism that we currently live failed massively: last year, PG&E was found responsible for killing people and destroying a whole city. Now they really want to play it safe, and switch off the power for millions of people. And they say this will go on for a decade. So in 2029 when we're supposed to have AIs, self-driving cars, and self-tieing Nikes, there will be cities in California that will get their power shut off for days when there is a hot wind for an afternoon.

Why? Because the money that should have gone into, that was already earmarked for, making the power infrastructure more resilient and safe went into bonus payments for executives (that sounds so cliché!). They tried to externalize the cost of an aging power infrastructure - the cost being literally the life and homes of people. And when told not to, they put millions of people in the dark.

This is so awfully on the nose that there is no need for metaphors.

San Francisco offered to buy the local power grid, to put it into public hands. But PG&E refused that offer of several billion dollars.

So if you live in an area that has a well working power infrastructure, appreciate it.

Academic lineage

Sorry for showing off, but it is just too cool not to: here is a visualization of my academic lineage according to Wikidata.

Query: w.wiki/AE8

Bring me to your leader!

"Bring me to your leader!", the explorer demanded.

"What's a leader?", the natives asked.

"The guy who tells everyone what to do.", he explained with some consternation.

"Oh yeah, we have one like that, but why would you want to talk to him? He's unbearable."

AKTS 2019

September 24 was the AKTS workshop - Advanced Knowledge Technologies for Science in a FAIR world - co-located with the eScience and Gateways conferences in San Diego. As usual with my trip reports, I won't write about every single talk, but offer only my own personal selection and view. This is not an official report on the workshop.

I had the honor of kicking off the day. I made the proposal of using Wikidata for describing datasets so that dataset catalogs can add these descriptions to their indexes. The standard way to do so is to use Schema.org annotations describing the datasets, but our idea here was to provide a fallback solution in case Schema.org cannot be applied for one reason or the other. Since the following talks would also be talking about Wikidata I used the talk to introduce Wikidata in a bit more depth. In parallel, I kicked the same conversation off on Wikidata as well. The idea was well received, but one good question was raised by Andrew Su: why not add Schema.org annotations to Wikidata instead?

After that, Daniel Garijo of USC's ISI presented WDPlus, Wikidata Plus, which presented a prototype for how to extend Wikidata with more data (particularly tabular data) from external data sources, such as censuses and statistical publications. The idea is to surround Wikidata with a layer of so-called satellites, which materialize statistical and other external data into Wikidata's schema. They implemented a mapping languages, T2WDML, that allows to grab CSV numbers and turn them into triples that are compatible with Wikidata's schema, and thus can be queried together. There seems to be huge potential in this idea, particularly if one can connect the idea of federated SPARQL querying with on-the-fly mappings, extending Wikidata to a virtual knowledge base that would be easily several times its current size.

Andrew Su from Scripps Research talked about using Wikidata as a knowledge graph in a FAIR world. He presented their brilliant Gene Wiki project, about adding knowledge about genes and proteins to Wikidata. He presented the idea of using Wikidata as a generalized back-end for customized frontend-applications - which is perfect. Wikidata's frontend is solid and functional, but in many domains there is a large potential to improve the UX for users in specific domains (and we are seeing some if flowering more around Lexemes, with Lucas Werkmeister's work on lexical forms). Su and his lab developed ChlamBase which allows the Chlamydia research community to look at the data they are interested in, and to easily add missing data. Another huge advantage of using Wikidata? Your data is going to live beyond the life of the grant. A great overview of the relevant data in Wikidata can be seen in this rich and huge and complex diagram.

The talks switched more to FAIR principles, first by Jeffrey Grethe of UCSD and then Mark Musen of Stanford. Mark was pointing out how quickly FAIR turned from a new idea to a meme that was pervasive everywhere, and the funding agencies now starting to require it. But data often has issues. One example: BioSample is the best metadata NIH has to offer. But 73% of the Boolean metadata values are not 'true' or 'false' but have values like "nonsmoker" or "recently quitted". 26% of the integers were not parseable. 68% of the entries from a controlled vocabulary were not. Having UX that helped with entering this data would be improving the quality considerably, such as CEDAR.

Carole Goble then talked about moving towards using Schema.org for FAIRer Life Sciences resources and defining a Schema.org profile that make datasets easier to use. The challenges in the field have been mostly social - there was a lot of confidence that we know how to solve the technical issues, but the social ones provide to be challenging. Carol named four of those explicitly:

  1. ontology-itis
  2. building consensus (it's harder than you think)
  3. the Schema.org Catch-22 (Schema.org won't take it if there is no usage, but people won't use it until it is in Schema.org)
  4. dedicated resources (people think you can do the social stuff in your spare time, but you can't)

Natasha Noy gave the keynote, talking about Google Dataset Search. The lessons learned from building it:

  1. Build an ecosystem first, be technically light-weight (a great lesson which was also true for Wikipedia and Wikidata)
  2. Use open, non-proprietary, standard solutions, don't ask people to build it just for Google (so in this case, use Schema.org for describing datasets)
  3. bootstrapping requires influencers (i.e. important players in the field, that need explicit outreach) and incentives (to increase numbers)
  4. semantics and the KG are critical ingredients (for quality assurance, to get the data in quickly, etc.)

At the same time, Natasha also reiterated one of Mark's points: no matter how simple the system is, people will get it wrong. The number of ways a date field can be written wrong is astounding. And often it is easier to make the ingester more accepting than try to get people to correct their metadata.

Chris Gorgolewski followed with a session on increasing findability for datasets, basically a session on SEO for dataset search: add generic descriptions, because people who need to find your dataset probably don't know your dataset and the exact terms (or they would already use it). Ensure people coming to your landing site have a pleasant experience. And the description is markup, so you can even use images.

I particularly enjoyed a trio of paper presentations by Daniel Garijo, Maria Stoica, Basel Shbita and Binh Vu. Daniel spoke about OntoSoft, an ontology to describe software workflows in sufficient detail to allow executing them, and also to create input and output definitions, describe the execution environment, etc. Close to those in- and output definition we find Maria's work on an ontology of variables. Maria presented a lot of work to identify the meaning of variables, based on linguistic, semantic, and ontological reasoning. Basel and Binh talked about understanding data catalogs deepers, being able to go deeper into the tables and understand the actual content in them. If one would connect the results of these three papers, one could potentially see how data from published tables and datasets could become alive and answer questions almost out of the box: extracting knowledge from tables, understanding their roles with regards to the input variables, and how to execute the scientific workflows.

Sure, science fiction, and the question is how well would each of the methods work, and how well would they work in concert, but hey, it's a workshop. It's meant for crazy ideas.

Ibrahim Burak Ozyurt presented an approach towards question answering in the bio-domain using Deep Learning, including Glove and BERT and all the other state of the art work. And it's all on Github! Go try it out.

The day closed with a panel with Mark Musen, Natasha Noy, and me, moderated by Yolanda Gil, discussing what we learned today. It quickly centered on the question how to ensure that people publishing datasets get appropriate credit. For most researchers, and particularly for universities, paper publications and impact factors are the main metric to evaluate researchers. So how do we ensure that people creating datasets (and I might add, tools, workflows, and social consensus) receive the fair share of credit?

Thanks to Yolanda Gil and Andrew Su for organizing the workshop! It was an exhausting, but lovely experience, and it is great to see the interest in this field.

Illuminati and Wikibase

When I was a teenager I was far too much fascinated by the Illuminati. Much less about the actual historical order, and more about the memetic complex, the trilogy by Shea and Wilson, the card game by Steve Jackson, the secret society and esoteric knowledge, the Templar Story, Holy Blood of Jesus, the rule of 5, the secret of 23, all the literature and offsprings, etc etc...

Eventually I went to actual order meetings of the Rosicrucians, and learned about some of their "secret" teachings, and also read Eco's Foucault's Pendulum. That, and access to the Web and eventually Wikipedia, helped to "cure" me from this stuff: Wikipedia allowed me to put a lot of the bits and pieces into context, and the (fascinating) stories that people like Shea & Wilson or von Däniken or Baigent, Leigh & Lincoln tell, start falling apart. Eco's novel, by deconstructing the idea, helps to overcome it.

He probably doesn't remember it anymore, but it was Thomas Römer who, many years ago, told me that the trick of these authors is to tell ten implausible, but verifiable facts, and tie them together with one highly plausible, but made-up fact. The appeal of their stories is that all of it seems to check out (because back then it was hard to fact check stuff, so you would use your time to check the most implausible stuff).

I still understand the allure of these stories, and love to indulge in them from time to time. But it was the Web, and it was learning about knowledge representation, that clarified the view on the underlying facts, and when I tried to apply the methods I was learning to it, it fell apart quickly.

So it is rather fascinating to see that one of the largest and earliest applications of Wikibase, the software we developed for Wikidata, turned out to be actual bona fide historians (not the conspiracy theorists) using it to work on the Illuminati, to catalog the letters they sent to reach other, to visualize the flow of information through the order, etc. Thanks to Olaf Simons for heading this project, and for this write up of their current state.

It's amusing to see things go round and round and realize that, indeed, everything is connected.

Wikidatan in residence at Google

Over the last few years, more and more research teams all around the world have started to use Wikidata. Wikidata is becoming a fundamental resource. That is also true for research at Google. One advantage of using Wikidata as a research resource is that it is available to everyone. Results can be reproduced and validated externally. Yay!

I had used my 20% time to support such teams. The requests became more frequent, and now I am moving to a new role in Google Research, akin to a Wikimedian in Residence: my role is to promote understanding of the Wikimedia projects within Google, work with Googlers to share more resources with the Wikimedia communities, and to facilitate the improvement of Wikimedia content by the Wikimedia communities, all with a strong focus on Wikidata.

One deeply satisfying thing for me is that the goals of my new role and the goals of the communities are so well aligned: it is really about improving the coverage and quality of the content, and about pushing the projects closer towards letting everyone share in the sum of all knowledge.

Expect to see more from me again - there are already a number of fun ideas in the pipeline, and I am looking forward to see them get out of the gates! I am looking forward to hearing your ideas and suggestions, and to continue contributing to the Wikimedia goals.

Deep kick


Mark Stoneward accepted the invitation immediately. Then it took two weeks for his lawyers at the Football Association to check the contracts and non-disclosure agreements prepared by the AI research company. Stoneward arrived at the glass and steel building in London downtown. He signed in at a fully automated kiosk, and was then accompanied by a friendly security guard to the office of the CEO.

Denise Mirza and Stoneward had met at social events, but never had time to talk for a longer time. “Congratulations on the results of the World Cup!” Stoneward nodded, “Thank you.”

“You have performed better than most of our models have predicted. This was particularly due to your willingness to make strategic choices, where other associations would simply have told their players to do their best. I am very impressed.” She looked at Stoneward, trying to read his face.

Stoneward’s face didn’t move. He didn’t want to give away how much was planned, how much was luck. He knew these things travel fast, and every little bit he could keep secret gave his team an edge. Mirza smiled. She recognised that poker face. “We know how to develop a computer system that could help you with even better strategic decisions.”

Stoneward tried to keep his face unmoved, but his body turned to Mirza and his arms opened a bit wider. Mirza knew that he was interested.

“If our models are correct, we can develop an Artificial Intelligence that could help you discuss your plans, help you with making the right strategic decisions, and play through different scenarios. Such AIs are already used in board rooms, in medicine, to create new recipes for top restaurants, or training chess players.”

“What about the other teams?”

“Well, we were hoping to keep this exclusive for two or four years, to test and refine the methodology. We are not in a hurry. Our models give us an overwhelming probability to win both the European Championship and the World Cup in case you follow our advice.”

“Overwhelming probability?”

“About 96%.”

“For the European Championship?”

“No. To win both.”

Stoneward gasped. “That is… hard to believe.”

The CEO laughed. “It is good that you are sceptical. I also doubted these probabilities, but I had two teams double-check.”

“What is that advice?”

She shrugged. “I don’t know yet. We need to develop the AI first. But I wanted to be sure you are actually interested before we invest in it.”

“You already know how effective the system will be without even having developed it yet?”

She smiled. “Our own decision process is being guided by a similar AI. There are so many things we could be doing. So many possible things to work on and revolutionise. We have to decide how to spend our resources and our time wisely.”

“And you’d rather spend your time on football than on… I don’t know, healing cancer or making a product that makes tons of money?”

“Healing cancer is difficult and will take a long time. Regarding money… the biggest impediment to speeding up the impact of our work is currently not a lack of resources, but a lack of public and political goodwill. People are worried about what our technology can do, and parliament and the European Union are eager to throw more and more regulations at us. What we need is something that will make every voter in England fall in love with us. That will open up the room for us to move more freely.”

Stoneward smiled. “Winning the World Cup.”

She smiled. “Winning the World Cup.”


Three months later…

“So, how will this work? Do I, uhm, type something in a computer, or do we have to run some program and I enter possible players we are considering to select?”

Mirza laughed. “No, nothing that primitive. The AI already knows all of your players. In fact, it knows all professional players in the world. It has watched and analyzed every second of TV screening of any game around the world, every relevant online video, and everything written in local newspapers.”

Stoneward nodded. That sounded promising.

“Here comes a little complication, though. We have a protocol for using our AIs. The protocols are overcautious. Our AIs are still far away from human intelligence, but our Ethics and Safety boards insisted on implementing these protocols whenever we use some of the near-human intelligence systems. It is completely overblown, but we are basically preparing ourselves for the time we have actually intelligent systems, maybe even superhuman intelligent systems.”

“I am afraid I don’t understand.”

“Basically, instead of talking to the AI directly, we talk with them through an operator, or medium.”

“Talk to them? You simply talk with the AI? Like with Siri?”

Mirza scoffed. “Siri is just a set of hard coded scripts and triggers.”

Stoneward didn’t seem impressed by the rant.

“The medium talks with the AI, tries its best to understand it, and then relays the AI’s advice to us. The protocol is strict about not letting the AI interact with decision makers directly.”

“Why?”

“Ah, as said, it is just being overly cautious. The protocol is in place in case we ever develop a superhuman intelligence, in which case we want to ensure that the AI doesn’t have too much influence on actual decision makers. The fear is that a superhuman AI could possibly unduly influence the decision maker. But with the medium in between, we have a filter, a normal intelligence, so it won’t be able to invert the relationship between adviser and decision maker.”

Stoneward blinked. “Pardon me, but I didn’t entirely follow what you — ”

“It’s just a Science Fiction scenario, but in case the AI tries to gain control, the fear is that a superhuman intelligence could basically turn you into a mindless muppet. By putting a medium in between, well, even if the medium becomes enslaved, the medium can only use their own intelligence against you. And that will fail.”

The director took a sip of water, and was pondering what he just heard for a few moments. Denise Mirza was burning with frustration. Sometimes she forgets how it is to deal with people this slow. And this guy had more balls banged against his skull than is healthy, which isn’t expected to speed his brain up. After what felt like half an eternity, he nodded.

“Are you ready for me to call the medium in?”

“Yes.”

She tapped her phone.

“Wait, does this mean that these mediums are slaves to your AI?”

She rolled her eyes. “Let us not discuss this in front of the medium, but I can assure you that our systems have not yet reached the level to convince a four year old to give up a lollipop, never mind a grown up person to do anything. We can discuss this more afterwards. Oh, there he is!”

Stoneward looked up surprised.

It was an old acquaintance, Nigel Ramsay. Ramsay used to manage some smaller teams in Lancashire, where Stoneward grew up. Ramsay was more known for his passion than for his talents.

“I am surprised to see you here”

The medium smiled. “It was a great offer, and when I learned what we are aiming for, I was positively thrilled. If this works we are going to make history!”

They sat down. “So, what does the system recommend?”

“Well, it recommends to increase the pressure on the government for a second referendum on Brexit.”

Stoneward stared at Ramsay, stunned. “Pardon me?”

“It is quite clear that the Prime Minister is intentionally sabotaging any reasonable solution for Brexit, but is too afraid to yet call a second referendum. She has been a double agent for the remainers the whole time. Once it is clear how much of a disaster leaving the European Union would be, we should call for a second referendum, reversing the result of the first.”

“I… I am not sure I follow… I thought we are talking football?”

“Oh, but yes! We most certainly are. Being part of an invigorated European Union after Brexit gets cancelled, we should strongly support a stronger Union, even the founding of a proper state.”

Stoneward looked at Ramsay with exasperation. Mirza motioned with her hands, asking for patience.

“Then, when the national football associations merge, this will pave the way for a single, unified European team.”

“The associations… merge?”

“Yes, an EU-wide all stars team. Just imagine that. Also, most of the serious competition would already be wiped out. No German team, no French team, just one European team and — “

“This is ridiculous! Reversing Brexit? Just to get a single European team? Even if we did, a unified European team might kill any interest in international football.”

“Yeah, that is likely true, but our winning chances would go through the roof!”

“But even then, 96% winning chances?”

“Oh, yeah, I asked the same. So, that’s not all. We also need to cause a war between Argentina and Brazil, in order to get them disqualified. There are a number of ways to get to this — ”

“Stop! Stop right there.” Stoneward looked shocked, his hands raised like a goalie waiting for the penalty kick. “Look, this is ridiculous. We will not stop Brexit or cause a war between two countries just to win a game.”

The medium looked at Stoneward in surprise. “To ‘just’ win a game?” His eyes wandered to Mirza in support. “I thought this was the sole reason for our existence. What does he mean, ‘just’ win a game? He is a bloody director of the FA, and he doesn’t care to win?”

“Maybe we should listen to some of the other suggestions?”, the CEO asked, trying to soothe the tension in the room.

Stoneward was visibly agitated, but after a few moments, he nodded. “Please continue.”

“So even if we don’t merge the European associations due to Brexit, we should at least merge the English, Scottish, Welsh, and Northern Irish associations in — ”

“No, no, NO! Enough of this association merging nonsense. What else do you have?”

“Well, without mergers, and wars, we’re down to 44% probability to win both the European and World Cup within the next twenty years.” The medium sounded defeated.

“That’s OK, I’ll take that. Tell me more.” Stoneward has known that the probabilities given before were too good to be true. It was still a disappointment.

“England has some of the best schools in the world. We should use this asset to lure young talent to England, offer them scholarships in Oxford, in Cambridge.”

“But they wouldn’t be English? They can’t play for England.”

“We would need to make the path to citizenship easier for them, immigration laws should be more integrative for top talent. We need to give them the opportunity to become subjects of the Queen before they play their first international. And then offer them to play for England. There is so much talent out there, and if we can get them while they’re young, we could prep up our squad in just a few years.”

“Scholarships for Oxford? How much would that even cost?”

“20, 25 thousand per year and student? We can pay a hundred scholarships and it wouldn’t even show up in our budget.”

“We are cutting budgets left and right!”

“Since we’re not stopping Brexit, why not dip into those 350 million pounds per week that we will save.”

“That was a lie!”

“I was joking.”

“Well, the scholarship thing wasn’t bad. What else is on the table?”

“One idea was to hack the video stream and bribe the referee, and then we can safely gaslight everyone.”

“Next idea.”

“We could poison the other teams.”

“Just stop it.”

“Or give them substances that would mess up their drug tests.”

“Why not getting FIFA to change the rules so we always win?”

“Oh, we considered it, but given the existing corruption inside FIFA it seems that would be difficult to outbid.”

Stonward sighed. “Now I was joking.”

“One suggestion is to create a permanent national team, and have them play in the national league. So they would be constantly competing, playing with each other, be better used to each other. A proper team.”

“How would we even pay for the players?”

“It would be an honor to play for the national team. Also, it could be a new rule to require the best players to play in the national team.”

“I think we are done here. These suggestions were… rather interesting. But I think they were mostly unactionable.” He started standing up.

Mirza looked desperately from one to the other. This meeting did not go as she had intended. “I think we can acknowledge the breadth of the creative proposals that have been on the table today, and enjoy a tea before you leave?”, she said, forcing a smile.

Stoneward nodded politely. “We sure can appreciate the creativity.”

“Now imagine this creativity turned into strategies in the pitch. Tactical moves. Variations to set pieces.”, the medium started, his voice slightly shifting.

“Yes, well, that would certainly be more interesting than most of the suggestions so far.”

“Wouldn’t it? And not only that, but if we could talk to the players. If we could expand their own creativity. Their own willpower. Their focus. Their energy to power through, not to give up.”

“If you’re suggesting to give them drugs, I am out.”

Ramsay laughed. “No, not drugs. But a helmet that emits electromagnetic waves and allows the brain muscles to work in more interesting ways.”

Stoneward looked over to the CEO. “Is that a possibility?”

Mirza looked uncomfortable, but tried to hide it. “Yes, yes, it is. We had tested it a few times, and the results were quite astonishing. It is just not what I would have expected as a proposal.”

“Why? Anything wrong with that?”

“Well, we use it for our top engineers, to help them focus when developing and designing solutions. The results are nothing short of marvelous. It is just, I didn’t think football would benefit that much from improved focus.”

Stoneward chuckled, as he sat down again. “Yes, many people underestimate the role of a creative mind in the game. I think I would now like a tea.” He looked to Ramsay. “Tell me more.”

The medium smiled. The system will be satisfied with the outcome.

(Originally published July 28, 2018 on Medium)

Saturn the alligator

Today at work I learned about Saturn the alligator. Born to humble origins in 1936 in Mississippi, he moved to Berlin where he became acquainted with Hitler. After the bombing of the Berlin Zoo he wandered through the streets. British troops found him, gave him to the Soviets, where against all odds he survived a number of near death situations - among others he refused to eat for a year - and still lives today, in an enclosure sponsored by Lacoste.

I also went to Wikidata to improve the entry on Saturn. For that I needed to find the right property to express the connection between Saturn, and the Moscow Zoo, where he is held.

The following SPARQL query was helpful: https://w.wiki/7ga

It tells you which properties connect animals with zoos how often - and in the Query Helper UI it should be easy to change either types to figure out good candidates for the property you are looking for.

Wikidata reached a billion edits

As of today, Wikidata has reached a billion edits - 1,000,000,000.

This makes it the first Wikimedia project that has reached that number, and possibly the first wiki ever to have reached so many edits. Given that Wikidata was launched less than seven years ago, this means an average edit rate of 4-5 edits per second.

The billionth edit is the creation of an item for a 2006 physics article written in Chinese.

Congratulations to the community! This is a tremendous success.

In the beginning

"Let there be a planet with a hothouse effect, so that they can see what happens, as a warning."

"That is rather subtle, God", said the Archangel.

"Well, let it be the planet closest to them. That should do it. They're intelligent after all."

"If you say so."