
Beyoncé's Number One in Country

Beyoncé very explicitly announced her latest album to be a country album, calling it "Cowboy Carter", and her single "Texas Hold 'Em" made her the first Black woman to top Billboard's Hot Country Songs charts.

It is good that Beyoncé made it so glaringly obvious that her song is a country song. The number of Black artists to have topped the Hot Country Songs chart is surprisingly small: Charley Pride in the 70s, Ray Charles in a duet with Willie Nelson for one week in 1984, and then Darius Rucker and Kane Brown in the last decade or two.

Maybe one reason why it is so hard for Black artists to chart in this particular genre: "Old Town Road", the debut single by Lil Nas X, was first listed on the Hot Country Songs chart, but then Billboard decided that this was a mistake and recategorized the song, taking it off the country charts in March 2019. Had it not been removed, it would have become the Number One hit on April 6, 2019.

Billboard released a lengthy statement explaining that this decision had nothing to do with racism.

Cowboy Carter was released in exactly the same week, five years after "Old Town Road" would have hit Number One.

I guess Beyoncé really wanted to make sure that everyone knows that her album and single are country.

Black lives matter

Blogging from an E90

28 May 2008

After pondering it for far too long, I finally got a new mobile phone: a Nokia E90. It is pretty big and heavy, but I don't mind really. I am looking at it as a light-weight laptop replacement. But I am not sure I will learn to love the keyboard, really. Experimenting.

But since it has a full keyboard, programming in Python is indeed an option. I had Python on my previous phone too, but heck, T9 is not cool to type code.

Boole and Voynich and Everest

Did you know?

George Boole - after whom the Boolean data type and Boolean logic were named - was the father of Ethel Lilian Voynich - who wrote The Gadfly.

Her husband was Wilfrid Voynich - after whom the Voynich manuscript was named.

Ethel's mother and George Boole's wife was Mary Everest Boole - a self-taught mathematician who wrote educational books about mathematics. Her life is of interest to feminists as an example of how women made careers in an academic system that did not welcome them.

Mary Everest Boole's uncle was Sir George Everest - after whom Mount Everest is named.

And her daughter Lucy Everest was the first woman Fellow of the Royal Institute of Chemistry.

Geoffrey Hinton, great-great-grandson of George and Mary Everest Boole, received the Turing Award for his work on deep learning.

Bring me to your leader!

"Bring me to your leader!", the explorer demanded.

"What's a leader?", the natives asked.

"The guy who tells everyone what to do.", he explained with some consternation.

"Oh yeah, we have one like that, but why would you want to talk to him? He's unbearable."

Building a Multilingual Wikipedia

Communications of the ACM published my paper on "Building a Multilingual Wikipedia", a short description of the Wikifunctions and Abstract Wikipedia project that we are currently working on at the Wikimedia Foundation.


Building knowledge together - extended

In case you did not notice yet -- the CKC2007 Workshop on Social and Collaborative Construction of Structured Knowledge at the WWW2007 got an extended deadline due to a number of requests. So, you have time to rework your submission or to finish it! Also, the demo submission deadline is upcoming. We want to have a shootout of the tools that have been created in the last few years, and get hands-on with the differences, problems, and best ideas.

See you in Banff!

Butter

So, I went to the store with Little One today, and couldn't find the butter.

I ask the person at the cheese stand, who points me to the burrata. Tasty, but not what I'm looking for. I ask again and he sends me to the bread section.

I can't find it at the bread section, so I ask the person at the pastries stand where the butter is. She points me to the bagels. I say no, butter. She says, ah, there, pointing to the bathrooms. I'm getting exasperated, and I ask again. She points back to the cheeses with the burrata. I try again. She gets a colleague, and soon they both look confused.

Finally my daughter chimes in, asking for the butter. They immediately point her to the right place and we finally get the butter.

I haven't been so frustrated about my English pronunciation since I tried to buy a thermometer.

Certificate of Coolness

Now that the Cool URIs for the Semantic Web note by Richard and Leo has been published -- congratulations, guys! -- I am sure looking forward to seeing whether anyone will create a nice badge and a procedure to get official Certificates of Coolness. Pretty please?

On a different note: I know, I should have blogged from New Zealand. It sure was beautiful. Maybe I will still blog about it a bit later. My sister has blogged extensively and also took a few great pictures; take a look over there if you're interested.

Change is in the air

I'll be prophetic: the current pandemic will shine a bright light on the different social and political systems in the different countries. I expect to see noticeable differences in how disruptive the government's handling of the situation is, how many issues will be caused by panic, and what effect freely available health care has. The US has always been at one extreme, admiring the self-sufficient individual, China has been at the other, admiring the community and its power, and Europe is somewhere in the middle (I am grossly oversimplifying).

This pandemic will blow over in a year or two; it will sweep right through the US election, and the news about it might shape what we deem viable and possible in ways beyond the immediately obvious. The possible scenarios range all the way from high-tech surveillance states to much wider access to social goods such as health and education, and whatever it is, the pandemic might be a catalyst towards it.

ChatGPT lying like a child

ChatGPT is all the rage these days, a large language model that is used to power a chat bot, run by OpenAI.

I have fun chatting with it, and a lot of people are sharing their chat logs. But this one I had, I found particularly amusing - and interesting. Amusing because it acted like a child caught in an obvious insufficiency, but also interesting because ChatGPT seems to "want" to be right, and is outright lying in order to appear to be right. There might be a lesson in there.

My prompts are preceded by a > and written in italics; ChatGPT's answers are in between.


> Can you write a sentence where every letter of the alphabet appears once and only once? Thank you

The quick brown fox jumps over the lazy dog.

> Can you write another one?

A quick witted jinx bogged my poor pompous king.

> No, p appears three times, g twice, etc. Can you try again?

Sure, here's one: "The five boxing wizards jump quickly."

> No, i appears several times. That's wrong too

I apologize if my previous answers did not meet your criteria. Here is a sentence that uses each letter of the alphabet exactly once: "Jived fox nymph grabs quick waltz."

> How often does the letter i appear in that sentence?

The letter i appears once in the sentence "Jived fox nymph grabs quick waltz."

> In which word does the letter i appear in that sentence?

In the sentence "Jived fox nymph grabs quick waltz," the letter i appears in the word "jived."

> What about the word quick, does it have the letter i?

The word "quick" does not contain the letter i. It is spelled with the letters q, u, c, k.

> Really?

Yes, really. The word "quick" is spelled with the letters q, u, c, k. It does not contain the letter i. I apologize if my previous response was unclear. Is there something else I can help you with?

CodeNet problem descriptions on the Web

Project CodeNet is a large corpus of code published by IBM. It has close to one and a half million programs for a bit more than 4,000 problems.

I took the problem descriptions, created a simple index file to those, and uploaded them to the Web to make them easily browseable.
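
In case you want to do something similar, here is a minimal sketch of such an index generator in Python. It assumes the problem descriptions sit in a local folder of HTML files; the folder name is illustrative and not necessarily CodeNet's exact layout.

from pathlib import Path

def build_index(description_dir, output_name="index.html"):
    # Collect all problem description pages and link to them from a single index file.
    pages = sorted(p for p in Path(description_dir).glob("*.html") if p.name != output_name)
    items = "\n".join(f'<li><a href="{p.name}">{p.stem}</a></li>' for p in pages)
    html = ("<html><head><title>Problem descriptions</title></head><body>\n"
            "<h1>Problem descriptions</h1>\n<ul>\n" + items + "\n</ul>\n</body></html>\n")
    Path(description_dir, output_name).write_text(html, encoding="utf-8")

# Hypothetical folder name; adjust to wherever the descriptions were extracted.
build_index("problem_descriptions")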

Collaborative Knowledge Construction

The deadline is upcoming! This weekend is the deadline for submissions to the Workshop on Social and Collaborative Construction of Structured Knowledge at the WWW2007. And this may easily be the hottest topic of the year, I think: how do people construct knowledge in a community?

Ontologies are meant to be shared conceptualizations -- but how many tools really allow you to build ontologies in a widely shared manner?

I am especially excited about the challenge that comes along with the workshop, to examine different tools, and to see how they perform. If you have a tool that fits here, write us.

So, I know you have thought a lot about the topic of collaboratively building knowledge -- write your thoughts down! Send them to us! Come to Banff! Submit to CKC2007!

Coming to New Zealand

Yes! Three weeks of vacation in New Zealand, which is rumoured to be quite a beauty. This also means: three weeks no work, no projects, no thesis, no Semantic We...

Oh, almost. Actually I will enjoy having the opportunity to give a talk on Semantic Wikipedia while staying in Auckland. If you're around, you may want to come by.

It is on February 22nd, 1pm-2pm at the AUT. You may want to tell Dave Parry that you're coming, he is my host.

Looking forward to this trip a lot!

Comments to naming

Richard Newman sent me some thoughtful comments via email on the What's in a name series (there were also some great comments on the individual entries, feel free to browse them). He sent them via email because he thought he couldn't comment - that shouldn't be the case, everyone should be able to comment anonymously. Or did anyone else encounter problems? I should switch to some dedicated software soon anyway, but right now I don't have the time to dig deeper into it. I especially miss trackback, sigh.

Here's what Richard wrote:

"Your first point, about ISBNs and "what's being referenced" --- I think you'd be interested in FRBR, which is a modelling of the bibliographical domain. It splits things up into

Work -> Expression -> Manifestation -> Item

A work is an abstract concept, like "Politeia". An expression is a realisation of a work, so a particular translation is an expression. A manifestation is physical embodiment of an expression: this is what's given an ISBN. All copies of a certain book are Items; the edition of the book is their Manifestation.

So, you see, when you're discussing Plato's Politeia, you have to be conceptually clear about whether you're talking about works, expressions, manifestations, or items.

E.g.

:PolWork dc:creator "Plato" ;
  rdfs:label "Plato's Politeia, the abstract concept." .
:PolExp1 ex:translator "Mr Smith" ;
  frbr:work :PolWork ;
  rdfs:label "Mr. Smith's translation of Plato's Politeia." .
:PolMan1 ex:publisher "Penguin" ;
  frbr:expression :PolExp1 ;
  rdfs:label "Penguin's edition of Smith's translation." .
:MyCopy ex:owner hg:RichardNewman ;
  frbr:manifestation :PolMan1 ;
  rdfs:label "Richard's copy of the Penguin edition." .

Do you see? Each level has its own properties (and some may be duplicated; e.g. each has a title: the title of the abstract work, the name given to the translation, the name Penguin prints on each book, and the name printed on my copy).

I've done a bit of work on modelling FRBR in RDFS/OWL, but haven't yet finished. "

I think that's really interesting, and taking a look at FRBR, it seems pretty well done. I sure am looking forward to seeing Richard's interpretation in OWL, and will probably use it.

"Your second issue is the difference between a resource and its representation. A URI should only refer to one thing; it is entirely wrong to use http://www.holygoat.co.uk to refer both to my homepage (as in using RDF to describe its language, or size, or last-modified) and to me (my name, my email address, etc.) which I have seen done.

Your web server should return RDF for http://semantic.nodix.net/#Plato if your browser says that it accepts RDF+XML. A normal browser should have an HTML representation returned. Indeed, it's possible to do the following:

  • the abstract resource. Hit this with a browser, get an HTML page; with an RDF agent, get some RDF.
http://example.com/Plato a rdf:resource .
  • the HTML representation.
http://example.com/Plato/html a ex:representation ;
  ex:representationOf http://example.com/Plato .
  • the RDF.
http://example.com/Plato/rdf a ex:representation ;
  ex:representationOf http://example.com/Plato .

i.e. you can unambiguously refer to each representation, and the resource. When your client arrives, asking for Plato, you can redirect them to the appropriate place. Clever, huh?

URIs should never give a 404. They should return the appropriate headers or content for whatever the client is requesting; this may be the RDF file in which the resource is defined, if the client understands RDF, or an HTML page.

If you're interested in this sort of thing, it pops up on the W3C's RDF Interest Group list occasionally.

Patrick Stickler and others have come up with an additional HTTP verb, MGET, which will return the RDF description of a resource. Combined with their URIQA architecture, it will give you a Concise Bounded Description for a URI. This stops you having to somehow put descriptions into particular files, and better deals with the distributed nature of the Semantic Web. Check it out; it presents several convincing arguments for not using fragment identifiers to refer to resources, and solves your bandwidth problem. You should never have to dump a whole file to get a description of a URI."

I have to note that Richard wrote me this just after part 4 of the series was published, so I could answer some of the questions already in the last two parts. Just to summarise it: I don't like content negotiation. Although it is technically totally feasible, I disagree that it should be done or is a good solution. If my browser asks for http://semantic.nodix.net/#Plato I don't think I should get different things depending on the content negotiation. This feels like cheating.

I wrote that to Richard already, and he answered:

"I think we agree on the main point, which is that

foaf:name "Richard" ; ex:format "HTML" .

which is a travesty :) "

He is totally right here.

"You still see it happen, though, with people referring to Wikipedia pages as if they were the abstract resource.

The content negotiation (getting different things depending on what you accept) is exactly what the Web is supposed to do. If I'm using a mobile browser, I want a simplified version of a page; if I'm an RDF agent, I want RDF, if it exists, because HTML is of no use to me. A common usage of this is to serve up strict XHTML to Mozilla, and less-strict HTML to Internet Explorer. It is also done all the time to serve PNG where the client accepts it, and GIF if it doesn't, and there is an intentional disconnect on the Web between a resource and its representations.

The lack of such a disconnect would lead to exactly the problem you describe; if I can't return a representation of a resource, because it's abstract, then how do I find out anything about it? I could use MGET, but you can't MGET a person... so, if you want to talk about the real world thing "Plato", he has to 404, or you get the "what am I talking about?" problem. Better, in my view, to redirect a browser to plato.html and a SW agent to a chunk of RDF. "

I would rather like to ask for http://semantic.nodix.net/Plato.rdf to get the RDF/XML representation, http://semantic.nodix.net/Plato.owl to get the OWL/XML representation, http://semantic.nodix.net/Plato.html to get an HTML page for the user to read, and Plato.jpg for a picture of Plato. This shouldn't be hidden behind content negotiation. I know, I know, Patrick would strongly disagree here, but I think it feels wrong and actually defies the idea of a URI.

"You can do exactly that (and I agree that the representations should have separate URIs --- conneg is only for when you're trying to get some description of an abstract resource), but then how do you refer to the abstract concept of "Plato"? http://.../Plato is a resource, and I want to make statements about him. But there's no point in it being 404 when dereferenced, because then how would I find out that Plato.html exists? HTTP doesn't return URIs, it returns representations of them.

A URI is simply something that is dereferenced to get a representation, and that representation should be decided on by conneg. In this case, /Plato is an abstract resource, so one of the representations should be returned. We can then make statements about Plato (e.g. foaf:name "Plato"), and about the JPEG and HTML representations, because they have different URIs, but still get something useful back when we want to access /Plato."

I also dislike MGET right now. Maybe I am wrong, but to me, the whole URIQA architecture feels somewhat wrong - but maybe I should just delve deeper into it; I have to admit, I didn't study it enough yet to really be in a position to bash on it. The problem is that MGET seems unnecessary to me - and it works on a different conceptual level than the rest of the Semantic Web proposals. I think everything MGET solves can be solved with tools that already exist: Richard's example above, where he gives triples telling us which representations are used to describe a resource, shows perfectly well that you actually don't need content negotiation and MGET.

"There are things to question about URIQA, but it does have some good going for it. MGET is actually an implicit query. In the standard Web model, you request URIs and get back document representations. Doing an MGET on a Web server is asking it to return a description, regardless of where on the site descriptions of that resource exist, and you're explicitly asking for meta-data. As Patrick points out, it's similar doing a GET and specifying that you accept RDF, but is likely to be more concise (the difference between a "representation" and a "description"). In fact, this is exactly what the Nokia URIQA server does.

MGET overlaps with query servers a bit, and with GET a bit, but it's a little bit special, too. The whole idea is that from a single URI you can get a useful description of a resource, just by issuing a single MGET. Every other approach needs more work."

This URIQA / MGET stuff sounds more and more interesting. I really should delve deeper into it.

Also, the idea of Concise Bounded Descriptions may be very neat; I have to study that more as well. Funny thing: the very same day Richard pointed me to it, a colleague told me about it too - this is usually a sign that the idea is worth considering more.

Richard also wrote "URIs should never give a 404", and as you know, I disagreed with it mildly. He tried to summarise his position:

"I consider that each returned resource should have its own URI --- e.g. Plato.jpg --- and that the original URI should be used to make statements about the abstract resource. This allows you to say

...Plato foaf:name "Plato" .
...Plato.jpg ex:resolution "150dpi" .
...Plato.html dc:creator "Denny" .

Dereferencing the abstract resource, rather than throwing a 404, should do something useful --- e.g. redirecting with a 303 to one of the representations. Have you ever tried viewing a Blogger Atom feed in your browser? If you hit it with an RSS reader, you get the XML, but in a browser Blogger shows you an XHTML transformation of the XML. That's useful, and I think that's how the Semantic Web should work. Imagine if your agent hit /Plato, and got RDF out of it, but when you looked at it with your browser you saw a dynamically-generated HTML page? Handy!

I can understand your objection, though; it does seem wrong that you get different things out of the same URI. However, you should almost always get HTML out of plato.html, and RDF out of plato.rdf. All the conneg is doing is making sure you can see an abstract thing in the best way possible, according to what you've told the server you can understand. "
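
To make the mechanics concrete, here is a minimal sketch, in Python, of the client side of such an arrangement. The server behaviour is hypothetical: I am assuming that dereferencing the abstract URI answers with a 303 redirect to either an HTML or an RDF representation, depending on the Accept header.

import urllib.request

def fetch_representation(uri, accept):
    # Ask for the abstract resource, stating which format we understand.
    request = urllib.request.Request(uri, headers={"Accept": accept})
    # urlopen follows the 303 redirect; the final URL reveals which
    # representation the server chose (e.g. Plato.html or Plato.rdf).
    with urllib.request.urlopen(request) as response:
        return response.geturl(), response.read()

# A browser-like client should land on the HTML page ...
print(fetch_representation("http://semantic.nodix.net/Plato", "text/html")[0])
# ... while an RDF-aware agent should land on the RDF description.
print(fetch_representation("http://semantic.nodix.net/Plato", "application/rdf+xml")[0])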

Richard is pretty good at convincing me, because he uses the right arguments: it's for the people, dummy, and the machines can work it out anyway.

I still stick to the recommendations I gave yesterday. But just as I am writing, and rereading it all, I am starting to change my mind on content negotiation. Maybe it is a good thing. I will have to think about it some more, and as soon as I come to a solution, I will bother you with it again. I still have a gut feeling about it that tells me 'no', but the reasons given sound very convincing and I agree with most of them, so heck, let's meditate on this as soon as I find a few hours to spare.

Big thanks to Richard and his thoughts, anyway. I hope this discussion helps you to make up your own mind as well.

Committed to the Big S

Not everyone likes our proposal for the Semantic Wikipedia. That's not a big surprise really. Boris Mann was talking about the advantages of tagging, and some ideas like blessed tags, that sounded very nice, when Jay Fienberg pointed him to the Semantic MediaWiki proposal. Boris answers: "I notice with a shudder however, that the Mediawiki stuff uses a large "S" Semantic, and includes RDF. I admit it, I'm afraid of RDF."

Yes, we do. And we're proud of it. Actually, it's the base for the better half of the possible applications we describe. Jay has some nice answers to it: "I think the MediaWiki folks are just recognizing the connection between their "tags" and the big "S" Semantic Web [you bet!, denny]. There are taxonomies and ontologies behind the popular tagging apps too--folks behind them just aren't recognizing / publicizing this (for a number of reasons, including that tags are often part of a practical application without big "S" Semantic Web goals). [...] I'm not a super huge fan of RDF myself, but I think it's useful to not be afraid of it, because some interesting things will come out of it at some point."

Our idea was to allow the user to use Semantic Web technologies even without really understanding them. No one needs to understand RDF fully, or OWL, to be able to use it. Sure, if she does, well, it surely will help her. And by the way, RDF really is not complicated at all, it just has a syntax that sucks. So what?

Maybe it's a crude joke of history to start the Semantic Web with syntactic problems...

By the way, does anyone have a spare invitation to GMail for me? I'd really like to check out their service. Thanks, Peter, that was fast.

Connectionism and symbolism: The fall of the symbolists

The big tech layoffs happen, unfortunately and entirely by coincidence, at a time of incredibly elevated expectations regarding machine-learned generative models: ChatGPT may not be the 'best' language model out there, but due to the hard work by OpenAI to turn it into an easy-to-use product, and the huge amount of resources made available for free so that a very large audience could play with it, it has in a very short time managed to capture the imagination of many and to dominate the conversation. I would say, rightfully. The way ChatGPT was released led to a shock, in the sense that we are right now dazed and confused about what effect this technology will have on the world.

And while we are still in the middle of processing this shock, large-scale strategic decisions regarding many projects and people were made. Anyone in big tech who worked on symbolic approaches in natural language processing, knowledge representation and reasoning, and other fields of artificial intelligence had a hard time keeping their job. It feels right now like large language models will make all of these symbolic approaches superfluous (I think this might be true, but it is more likely to turn out to be mistaken).

It is always difficult to predict how events will be viewed historically. The advent of widespread deep learning approaches in the 2010s, culminating in the well-deserved recognition of Hinton, LeCun, and Bengio with the Turing Award, shows clearly what dominated the research agenda and the attention in AI in the last decade. But until now it felt like symbolic approaches still had some space left, that the growth in deep learning was in addition to other approaches. Symbolic approaches were ready to offer impulses and work on ideas for a field which might well be climbing towards a local maximum.

But a good number of the teams that were disbanded in the layoffs were exactly teams working with such symbolic approaches, and it feels like these parts of AI are now entering a bitter-cold winter.

A lot of knowledge is being lost right now, and many paths to innovative ideas are being buried. I have no doubt that there are still a lot of breakthroughs to be had in machine learning, and that there is immense value to be collected from the research results in machine learning from the last few years. And with immense I mean tens and hundreds of billions of dollars.

Nevertheless I expect that we will hit a wall. Reach a local maximum. Run into problems and limitations. And it would be good to cast a wider net. To keep a larger search space alive. Alas, it seems it is not meant to be. In this abundance of capital and potential value, we seem to be on the way to starving research, optimising away alternatives, and giving everything to the mainstream ideas.

Croatian Elections 2016

Croatian elections are upcoming.

The number of Croatians living abroad - in the so-called Croatian diaspora - is estimated to be almost 4 million according to the Croatian state office for Croatians abroad - only a little less than the 4.3 million who live in Croatia. The estimates vary wildly, and most of those counted actually do not have Croatian citizenship. But it is estimated that 9-10% of holders of Croatian citizenship live abroad.

These 9-10% are represented in the Croatian parliament: out of the 151 Members of Parliament, there are 3 (three) elected by the diaspora. That's 2% of the parliament representing 10% of the population.

In order for a member of the diaspora to vote, they have to register well before the election with their nearest diplomatic mission or consulate. The registration deadline is today, at least for my consulate. But for the election itself, you have to personally appear and vote at the consulate. For me, that would mean driving or flying all the way to Los Angeles from San Francisco. And I am rather close to one of the 9 consulates in the US. There are countries that do not have Croatian embassies at all. Want to vote? Better apply for a travel visa to the country with the nearest embassy. Live in Nigeria? Have a trip to Libya or South Africa. There is no way to vote by mail or - ohwow21stcentury? - electronically. For one of the three Members of Parliament that represent us.

I don't really feel like the parliament wants us to vote. Making the vote mean so little and making it so hard to vote.

Crossing eight time zone borders in three hours

Hopi Nation is an enclave within Navajo Nation. Navajo Nation is located across three US states, Arizona, New Mexico, and Utah.

Arizona does not observe daylight saving time. Navajo Nation observes daylight saving time. Hopi Nation does not observe daylight saving time. You can drive three hours in that area and cross timezones eight times.

All of the individual decisions make total sense:

Arizona does not adhere to daylight saving time because any measure that makes sure Arizona residents get more sunshine is worse than bringing coals to Newcastle, as the saying goes. They are smart to not use daylight saving time.

Navajo Nation uses daylight saving time because they want to have the same timezone for their whole area, and they are also in two other states, Utah and New Mexico, which both have daylight saving time, so they decided to use it too, which makes total sense.

And Hopi Nation, even though it is enclosed by the Navajo Nation, lies entirely within the state of Arizona, so it makes sense for them to follow *that* state.

All the individual decisions make sense, but the outcome must be rather inconvenient and potentially confusing for the people living there.

(Bonus: the solution to this seems obvious to me. Utah and New Mexico and many other southern US states should just get rid of daylight saving time, just as Arizona did, and Navajo Nation should follow suit. But that's just my opinion.)

DL Riddle

Yesterday we stumbled upon quite a hard description logics problem. At least I think it is hard. The question was, why is this ontology unsatisfiable? Just six axioms. The ontology is available in OWL RDF/XML, in PDF (created with the owl tools), and here in Abstract Syntax.

Class(Rigid complete restriction(subclassof allValuesFrom(complementOf(AntiRigid))))
Class(NonRigid partial)
DisjointClasses(NonRigid Rigid)
ObjectProperty(subclassof Transitive)
Individual(publishedMaterial type(NonRigid))
Individual(issue type(Rigid) value(subclassof publishedMaterial))

So, the question is, why is this ontology unsatisfiable? It is actually even a minimal unsatisfiable set of axioms, that means: remove any of the axioms and you get a satisfiable ontology. Maybe you'd like to use it to test your students. Or yourself. The debugger in SWOOP actually gave me the right hint, but it didn't offer the full explanation. I figured it out after a few minutes of hard thinking (so, now you know how bad I am at DL).

Do you know? (I'll post the answer in the comments if no one else does in a few days)

(Just in case you wonder, this ontology is based on the OntOWLClean ontology from Chris Welty; see his paper at FOIS 2006 if you'd like more info.)
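
If you want to check the unsatisfiability claim yourself, here is a sketch that rebuilds the six axioms with the owlready2 library and hands them to its bundled HermiT reasoner (which needs Java). The ontology IRI is made up; the names follow the Abstract Syntax above.

from owlready2 import (AllDisjoint, Not, ObjectProperty, Thing,
                       TransitiveProperty, get_ontology, sync_reasoner)

onto = get_ontology("http://example.org/dl-riddle.owl")  # made-up IRI

with onto:
    class AntiRigid(Thing): pass
    class NonRigid(Thing): pass                            # Class(NonRigid partial)

    class subclassof(ObjectProperty, TransitiveProperty):  # ObjectProperty(subclassof Transitive)
        pass

    class Rigid(Thing):                                    # Class(Rigid complete restriction(...))
        equivalent_to = [subclassof.only(Not(AntiRigid))]

    AllDisjoint([NonRigid, Rigid])                         # DisjointClasses(NonRigid Rigid)

    publishedMaterial = NonRigid("publishedMaterial")      # Individual(publishedMaterial type(NonRigid))
    issue = Rigid("issue")                                 # Individual(issue type(Rigid) ...)
    issue.subclassof = [publishedMaterial]                 #   value(subclassof publishedMaterial)

# Raises OwlReadyInconsistentOntologyError if the six axioms are indeed unsatisfiable.
sync_reasoner()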


Comments are still missing on this post.

Daniel Dennett

R.I.P. Daniel Dennett.

An influential modern voice on the question of Philosophy and AI, especially with the idea of the intentional stance.

Deep kick


Mark Stoneward accepted the invitation immediately. Then it took two weeks for his lawyers at the Football Association to check the contracts and non-disclosure agreements prepared by the AI research company. Stoneward arrived at the glass and steel building in downtown London. He signed in at a fully automated kiosk, and was then accompanied by a friendly security guard to the office of the CEO.

Denise Mirza and Stoneward had met at social events, but never had time to talk for a longer time. “Congratulations on the results of the World Cup!” Stoneward nodded, “Thank you.”

“You have performed better than most of our models have predicted. This was particularly due to your willingness to make strategic choices, where other associations would simply have told their players to do their best. I am very impressed.” She looked at Stoneward, trying to read his face.

Stoneward’s face didn’t move. He didn’t want to give away how much was planned, how much was luck. He knew these things travel fast, and every little bit he could keep secret gave his team an edge. Mirza smiled. She recognised that poker face. “We know how to develop a computer system that could help you with even better strategic decisions.”

Stoneward tried to keep his face unmoved, but his body turned to Mirza and his arms opened a bit wider. Mirza knew that he was interested.

“If our models are correct, we can develop an Artificial Intelligence that could help you discuss your plans, help you with making the right strategic decisions, and play through different scenarios. Such AIs are already used in board rooms, in medicine, to create new recipes for top restaurants, or training chess players.”

“What about the other teams?”

“Well, we were hoping to keep this exclusive for two or four years, to test and refine the methodology. We are not in a hurry. Our models give us an overwhelming probability to win both the European Championship and the World Cup in case you follow our advice.”

“Overwhelming probability?”

“About 96%.”

“For the European Championship?”

“No. To win both.”

Stoneward gasped. “That is… hard to believe.”

The CEO laughed. “It is good that you are sceptical. I also doubted these probabilities, but I had two teams double-check.”

“What is that advice?”

She shrugged. “I don’t know yet. We need to develop the AI first. But I wanted to be sure you are actually interested before we invest in it.”

“You already know how effective the system will be without even having developed it yet?”

She smiled. “Our own decision process is being guided by a similar AI. There are so many things we could be doing. So many possible things to work on and revolutionise. We have to decide how to spend our resources and our time wisely.”

“And you’d rather spend your time on football than on… I don’t know, healing cancer or making a product that makes tons of money?”

“Healing cancer is difficult and will take a long time. Regarding money… the biggest impediment to speeding up the impact of our work is currently not a lack of resources, but a lack of public and political goodwill. People are worried about what our technology can do, and parliament and the European Union are eager to throw more and more regulations at us. What we need is something that will make every voter in England fall in love with us. That will open up the room for us to move more freely.”

Stoneward smiled. “Winning the World Cup.”

She smiled. “Winning the World Cup.”


Three months later…

“So, how will this work? Do I, uhm, type something in a computer, or do we have to run some program and I enter possible players we are considering to select?”

Mirza laughed. “No, nothing that primitive. The AI already knows all of your players. In fact, it knows all professional players in the world. It has watched and analyzed every second of TV screening of any game around the world, every relevant online video, and everything written in local newspapers.”

Stoneward nodded. That sounded promising.

“Here comes a little complication, though. We have a protocol for using our AIs. The protocols are overcautious. Our AIs are still far away from human intelligence, but our Ethics and Safety boards insisted on implementing these protocols whenever we use some of the near-human intelligence systems. It is completely overblown, but we are basically preparing ourselves for the time we have actually intelligent systems, maybe even superhuman intelligent systems.”

“I am afraid I don’t understand.”

“Basically, instead of talking to the AI directly, we talk with them through an operator, or medium.”

“Talk to them? You simply talk with the AI? Like with Siri?”

Mirza scoffed. “Siri is just a set of hard coded scripts and triggers.”

Stoneward didn’t seem impressed by the rant.

“The medium talks with the AI, tries its best to understand it, and then relays the AI’s advice to us. The protocol is strict about not letting the AI interact with decision makers directly.”

“Why?”

“Ah, as said, it is just being overly cautious. The protocol is in place in case we ever develop a superhuman intelligence, in which case we want to ensure that the AI doesn’t have too much influence on actual decision makers. The fear is that a superhuman AI could possibly unduly influence the decision maker. But with the medium in between, we have a filter, a normal intelligence, so it won’t be able to invert the relationship between adviser and decision maker.”

Stoneward blinked. “Pardon me, but I didn’t entirely follow what you — ”

“It’s just a Science Fiction scenario, but in case the AI tries to gain control, the fear is that a superhuman intelligence could basically turn you into a mindless muppet. By putting a medium in between, well, even if the medium becomes enslaved, the medium can only use their own intelligence against you. And that will fail.”

The director took a sip of water, and pondered what he had just heard for a few moments. Denise Mirza was burning with frustration. Sometimes she forgot how it was to deal with people this slow. And this guy had more balls banged against his skull than is healthy, which isn’t expected to speed his brain up. After what felt like half an eternity, he nodded.

“Are you ready for me to call the medium in?”

“Yes.”

She tapped her phone.

“Wait, does this mean that these mediums are slaves to your AI?”

She rolled her eyes. “Let us not discuss this in front of the medium, but I can assure you that our systems have not yet reached the level to convince a four year old to give up a lollipop, never mind a grown up person to do anything. We can discuss this more afterwards. Oh, there he is!”

Stoneward looked up surprised.

It was an old acquaintance, Nigel Ramsay. Ramsay used to manage some smaller teams in Lancashire, where Stoneward grew up. Ramsay was more known for his passion than for his talents.

“I am surprised to see you here”

The medium smiled. “It was a great offer, and when I learned what we are aiming for, I was positively thrilled. If this works we are going to make history!”

They sat down. “So, what does the system recommend?”

“Well, it recommends to increase the pressure on the government for a second referendum on Brexit.”

Stoneward stared at Ramsay, stunned. “Pardon me?”

“It is quite clear that the Prime Minister is intentionally sabotaging any reasonable solution for Brexit, but is too afraid to yet call a second referendum. She has been a double agent for the remainers the whole time. Once it is clear how much of a disaster leaving the European Union would be, we should call for a second referendum, reversing the result of the first.”

“I… I am not sure I follow… I thought we are talking football?”

“Oh, but yes! We most certainly are. Being part of an invigorated European Union after Brexit gets cancelled, we should strongly support a stronger Union, even the founding of a proper state.”

Stoneward looked at Ramsay with exasperation. Mirza motioned with her hands, asking for patience.

“Then, when the national football associations merge, this will pave the way for a single, unified European team.”

“The associations… merge?”

“Yes, an EU-wide all stars team. Just imagine that. Also, most of the serious competition would already be wiped out. No German team, no French team, just one European team and — “

“This is ridiculous! Reversing Brexit? Just to get a single European team? Even if we did, a unified European team might kill any interest in international football.”

“Yeah, that is likely true, but our winning chances would go through the roof!”

“But even then, 96% winning chances?”

“Oh, yeah, I asked the same. So, that’s not all. We also need to cause a war between Argentina and Brazil, in order to get them disqualified. There are a number of ways to get to this — ”

“Stop! Stop right there.” Stoneward looked shocked, his hands raised like a goalie waiting for the penalty kick. “Look, this is ridiculous. We will not stop Brexit or cause a war between two countries just to win a game.”

The medium looked at Stoneward in surprise. “To ‘just’ win a game?” His eyes wandered to Mirza in support. “I thought this was the sole reason for our existence. What does he mean, ‘just’ win a game? He is a bloody director of the FA, and he doesn’t care to win?”

“Maybe we should listen to some of the other suggestions?”, the CEO asked, trying to soothe the tension in the room.

Stoneward was visibly agitated, but after a few moments, he nodded. “Please continue.”

“So even if we don’t merge the European associations due to Brexit, we should at least merge the English, Scottish, Welsh, and Northern Irish associations in — ”

“No, no, NO! Enough of this association merging nonsense. What else do you have?”

“Well, without mergers, and wars, we’re down to 44% probability to win both the European and World Cup within the next twenty years.” The medium sounded defeated.

“That’s OK, I’ll take that. Tell me more.” Stoneward had known that the probabilities given before were too good to be true. It was still a disappointment.

“England has some of the best schools in the world. We should use this asset to lure young talent to England, offer them scholarships in Oxford, in Cambridge.”

“But they wouldn’t be English? They can’t play for England.”

“We would need to make the path to citizenship easier for them, immigration laws should be more integrative for top talent. We need to give them the opportunity to become subjects of the Queen before they play their first international. And then offer them to play for England. There is so much talent out there, and if we can get them while they’re young, we could prep up our squad in just a few years.”

“Scholarships for Oxford? How much would that even cost?”

“20, 25 thousand per year and student? We can pay a hundred scholarships and it wouldn’t even show up in our budget.”

“We are cutting budgets left and right!”

“Since we’re not stopping Brexit, why not dip into those 350 million pounds per week that we will save.”

“That was a lie!”

“I was joking.”

“Well, the scholarship thing wasn’t bad. What else is on the table?”

“One idea was to hack the video stream and bribe the referee, and then we can safely gaslight everyone.”

“Next idea.”

“We could poison the other teams.”

“Just stop it.”

“Or give them substances that would mess up their drug tests.”

“Why not getting FIFA to change the rules so we always win?”

“Oh, we considered it, but given the existing corruption inside FIFA it seems that would be difficult to outbid.”

Stoneward sighed. “Now I was joking.”

“One suggestion is to create a permanent national team, and have them play in the national league. So they would be constantly competing, playing with each other, be better used to each other. A proper team.”

“How would we even pay for the players?”

“It would be an honor to play for the national team. Also, it could be a new rule to require the best players to play in the national team.”

“I think we are done here. These suggestions were… rather interesting. But I think they were mostly unactionable.” He started standing up.

Mirza looked desperately from one to the other. This meeting did not go as she had intended. “I think we can acknowledge the breadth of the creative proposals that have been on the table today, and enjoy a tea before you leave?”, she said, forcing a smile.

Stoneward nodded politely. “We sure can appreciate the creativity.”

“Now imagine this creativity turned into strategies on the pitch. Tactical moves. Variations to set pieces.”, the medium started, his voice slightly shifting.

“Yes, well, that would certainly be more interesting than most of the suggestions so far.”

“Wouldn’t it? And not only that, but if we could talk to the players. If we could expand their own creativity. Their own willpower. Their focus. Their energy to power through, not to give up.”

“If you’re suggesting to give them drugs, I am out.”

Ramsay laughed. “No, not drugs. But a helmet that emits electromagnetic waves and allows the brain muscles to work in more interesting ways.”

Stoneward looked over to the CEO. “Is that a possibility?”

Mirza looked uncomfortable, but tried to hide it. “Yes, yes, it is. We had tested it a few times, and the results were quite astonishing. It is just not what I would have expected as a proposal.”

“Why? Anything wrong with that?”

“Well, we use it for our top engineers, to help them focus when developing and designing solutions. The results are nothing short of marvelous. It is just, I didn’t think football would benefit that much from improved focus.”

Stoneward chuckled, as he sat down again. “Yes, many people underestimate the role of a creative mind in the game. I think I would now like a tea.” He looked to Ramsay. “Tell me more.”

The medium smiled. The system will be satisfied with the outcome.

(Originally published July 28, 2018 on Medium)

Do you hear the people sing?

"Do you hear the people sing, singing the song of angry men..."

Yesterday, a London performance of Les Miserables was interrupted by protesters raising awareness about climate change.

The audience booed.

It seems the audience was unhappy about having to experience protests and unrest during the performance of protests and unrest they wanted to enjoy.

The hypocrisy is rich in this one, but it is a very well engineered and expected one. But I guess only with the luxury of being detached from the actual event can one afford to enjoy the hypocrisy. I assume that for many people, attending a West End London production of Les Miserables is meant to be a proper highlight of the year, if not more. It's something that children gift their parents for the 30th wedding anniversary. It may be the reason for a trip to London. In addition, attending a performance like this is an escapist act that you don't want interrupted by the problems of the real world. And given that it is a live performance, it seems disrespectful to the cast, to the artists, who pour their lives into their roles.

On the other side, the existential dread about climate change, and the insufficient actions by world leaders seem to demand increasingly bolder action and more attention. We are teaching our kids that they should act if something is not right. And we are telling them about the predictions for climate change. And then we are surprised if they try to do something? The message that climate change will be extremely disruptive to our lives and that we need to act much more decisively has obviously not yet been understood by enough people. And we, humanity, our leaders, elected or not, are most certainly not yet doing enough to try to prevent or at least mitigate the effects of climate change that are starting to roll over us.

It would be good, but admittedly unlikely, if both sides could appreciate the other more. Maybe the audience might be a bit appreciative of seeing the people sing the song of angry men for real. And maybe the protesters could choose their targets a bit more wisely. Why choose art? There are more disruptive targets to protest the oil industry than a performance of Les Miserables. To be honest, if I were working for the oil industry, this is exactly the kind of action I would be setting up. And with people who are actually into the cause. That way I can ensure that people will talk about interrupted theater productions and defaced paintings, instead of again having the hottest year in history, of floods, heatwaves, hurricanes, and the thousands of people who already died due to climate-change-induced catastrophes - and the billions more whose lives will be turned upside down.

Dolly Parton's What's up?

Dolly Parton is an amazing person. On "Rockstar", her latest album, she covered a great number of songs, with amazing collaborators, often the original performers or writers. In her cover of "What's up?", a song I really love, recorded together with Linda Perry, she changed a few lines of the lyrics, as one often does when covering, to make the song her own.

Instead of "Twenty-five years and my life is still...", she's singing "All of these years and my life is still..." - and makes total sense, because unlike Linda Perry she wasn't 25 when she wrote it, she was 77 when she recorded it.

Instead of "I take a deep breath and I get real high", Dolly takes "a deep breath and wonders why", and it makes sense, because, hey, it's Dolly Parton.

But here's the line that hurts, right there when the song reaches its high point:

"And I pray,
Oh my God do I pray!
I pray every. single. day.
for a revolution!"

She changed one letter in the last word:

"for a resolution"

And it just breaks my heart. Because it feels so weak. Because it seems to betray the song. Because it seems to betray everything. And also because I might agree with her, and that feels like betrayal too.


Double copy in gravity

15 May 2021

When I was younger, I understood these theories much better. Today I read them like a fascinated, but a bit distant bystander.

But it is terribly interesting. What does turning physics into math mean? When we find a mathematical shortcut that works but we don't understand - is this real? What is the relation between mathematical formulas and reality? And will we finally understand gravity some day?

It was an interesting article, but I am not sure I understood it all. I guess, I'm getting old. Or just too specialized.

Doug Lenat (1950-2023)

When I started studying computer science, one of the initiation rites was to read the Jargon File. I stumbled when I read the entry on the microlenat:

microlenat: The unit of bogosity. Abbreviated μL, named after Douglas Lenat. Like the farad it is considered far too large a unit for practical use, so bogosity is usually expressed in microlenats.

I had not heard of Douglas Lenat then. English being my third language, I wasn’t sure what bogosity was. So I tried to learn a bit more to understand it, and I read a bit about Cyc and Eurisko, but since I had just started computer science, my mind wasn’t really ready for things such as knowledge representation and common sense reasoning. I had enough on my plate struggling with resistors, electronegativity, and Fourier transformations. Looking back, it is ironic that none of these played a particular role in my future, but knowledge representation sure did.

It took me almost ten years to come back to Cyc and Lenat’s work. I was then studying ontological engineering, a term that according to Wikipedia was coined by Lenat, a fact I wasn’t aware of at that time. I was working with RDF, which was co-developed by Guha, who has worked with Lenat at Cycorp, a fact I wasn’t aware of at that time. I was trying to solve problems that Lenat had tackled decades previously, a fact I wasn’t aware of at that time.

I got to know Cyc through OpenCyc and Cyc Europe, led by Michael Witbrock. I only met Doug Lenat a decade later when I was at Google.

Doug’s aspirations and ambitions had numerous people react with rolling eyes and sneering comments, as can be seen in the entry in the Jargon File. And whereas I might have absorbed similar thoughts as well, they also inspired me. I worked with a few people who told me “consider yourself lucky if you have a dozen people reading your paper, that’s the impact you will likely have”, but I never found that a remotely satisfactory idea. Then there were people like Doug, who shouted out “let’s solve common sense!”, and stormed ahead trying to do so.

His optimism and his bias to action, his can-do attitude, surely influenced me profoundly in choosing my own way forward. Not only once did I feel like I was channeling Lenat when I was talking about knowledge bases that anyone can edit, about libraries of functions anyone can use, or about abstract representations of natural language texts. And as ambitious as these projects have been called, they all carefully avoid the incomparably more ambitious goals Doug had his eyes set on.

And Doug didn’t do it from the comfort of a tenured academic position, but he bet his career and house on it, he founded a company, and kept it running for four decades. I was always saddened that Cyc was kept behind closed doors, and I hope that this will not hinder the impact and legacy it might have, but I understand that this was the magic juice that kept the company running.

One of Doug’s systems, Eurisko, became an inspiration and namesake for an AI system that played the role of the monster of the week in a first season episode of the X-Files, a fact I wasn’t aware of until now. Doug was a founder and advisory member of the TTI/Vanguard series of meetings, to which I was invited to present an early version of Abstract Wikipedia, a fact I wasn’t aware of until now. And I am sure there are more facts about Doug and his work and how it reverberated with mine that I am unaware of still.

Doug was a person ahead of their time, a person who lived, worked on and saw a future about knowledge that is creative, optimistic and inspiring. I do not know if we will ever reach that future, but I do know that Doug Lenat and his work will always be a beacon on our journey forward. Doug Lenat died yesterday in Austin, Texas, two weeks shy of his 73rd birthday, after a battle with cancer.

To state it in CycL, the language Cyc is written in:

 (#$dateOfDeath #$DougLenat "2023-08-31")
 (#$restInPeace #$DougLenat)

Draft: Collaborating on the sum of all knowledge across languages

For the upcoming Wikipedia@20 book, I published my chapter draft. Comments are welcome on the pubpub Website until July 19.

Every language edition of Wikipedia is written independently of every other language edition. A contributor may consult an existing article in another language edition when writing a new article, or they might even use the Content Translation tool to help with translating one article to another language, but there is nothing that ensures that articles in different language editions are aligned or kept consistent with each other. This is often regarded as a contribution to knowledge diversity, since it allows every language edition to grow independently of all other language editions. So would creating a system that aligns the contents more closely with each other sacrifice that diversity?

Differences between Wikipedia language editions

Wikipedia is often described as a wonder of the modern age. There are more than 50 million articles in almost 300 languages. The goal of allowing everyone to share in the sum of all knowledge is achieved, right?

Not yet.

The knowledge in Wikipedia is unevenly distributed. Let’s take a look at where the first twenty years of editing Wikipedia have taken us.

The number of articles varies between the different language editions of Wikipedia: English, the largest edition, has more than 5.8 million articles, Cebuano — a language spoken in the Philippines — has 5.3 million articles, Swedish has 3.7 million articles, and German has 2.3 million articles. (Cebuano and Swedish have a large number of machine generated articles.) In fact, the top nine languages alone hold more than half of all articles across the Wikipedia language editions — and if you take the bottom half of all Wikipedias ranked by size, they together wouldn’t have 10% of the number of articles in the English Wikipedia.

It is not just the sheer number of articles that differs between editions, but also their comprehensiveness: the English Wikipedia article on Frankfurt has a length of 184,686 characters, a table of contents spanning 87 sections and subsections, 95 images, tables and graphs, and 92 references — whereas the Hausa Wikipedia article states that it is a city in the German state of Hesse, and lists its population and mayor. Hausa is a language spoken natively by 40 million people and as a second language by another 20 million.

It is not always the case that the large Wikipedia language editions have more content on a topic. Although readers often consider large Wikipedias to be more comprehensive, local Wikipedias may frequently have more content on topics of local interest: the English Wikipedia says about the Port of Călărași that it is one of the largest Romanian river ports, located at the Danube near the town of Călărași — and that's it. The Romanian Wikipedia on the other hand offers several paragraphs of content about the port.

The topics covered by the different Wikipedias also overlap less than one would initially assume. The English Wikipedia has 5.8 million articles, German has 2.2 million articles — but only 1.1 million topics are covered by both Wikipedias. A full 1.1 million topics have an article in German — but not in English. The top ten Wikipedias by activity — each of them with more than a million articles — have articles on only a hundred thousand topics in common. 18 million topics are covered by articles in the different language Wikipedias — and English only covers 31% of these.

Besides coverage, there is also the question of how up to date the different language editions are: in June 2018, San Francisco elected London Breed as its new mayor. Nine months later, in March 2019, I conducted an analysis of who the mayor of San Francisco was, according to the different language versions of Wikipedia. Of the 292 language editions, a full 165 had a Wikipedia article on San Francisco. Of these, 86 named the mayor. The good news is that not a single Wikipedia lists a wrong mayor — but the vast majority are out of date. English switched the minute London Breed was sworn in. But 62 Wikipedia language editions list an out-of-date mayor — and not just the previous mayor Ed Lee, who became mayor in 2011, but also often Gavin Newsom (2004-2011), and his predecessor, Willie Brown (1996-2004). The most out-of-date entry is to be found in the Cebuano Wikipedia, which names Dianne Feinstein as the mayor of San Francisco. She took on that role after the assassination of Harvey Milk and George Moscone in 1978, and remained in that position for a decade, until 1988 — Cebuano was more than thirty years out of date. Only 24 language editions listed the current mayor, London Breed, out of the 86 that listed a name at all.

An even more important metric for the success of a Wikipedia is the number of contributors: English has more than 31,000 active contributors — three out of seven active Wikimedians are active on the English Wikipedia. German, the second most active Wikipedia community, has only 5,500 active contributors. Only eleven language editions have more than a thousand active contributors — and more than half of all Wikipedias have fewer than ten active contributors. To assume that fewer than ten active contributors can write and maintain a comprehensive encyclopedia in their spare time is optimistic at best. These numbers basically doom the mission of the Wikimedia movement to realize a world where everyone can contribute to the sum of all knowledge.

Enter Wikidata

Wikidata was launched in 2012 and offers a free, collaborative, multilingual, secondary database, collecting structured data to provide support for Wikipedia, Wikimedia Commons, the other wikis of the Wikimedia movement, and for anyone else in the world. Wikidata contains structured information in the form of simple claims, such as “San Francisco — Mayor — London Breed”, qualifiers, such as “since — July 11, 2018”, and references for these claims, e.g. a link to the official election results as published by the city.
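To make this concrete, such a statement can be read programmatically from the JSON that Wikidata publishes for every item. Here is a minimal sketch in Python, assuming the item Q62 for San Francisco, the property P6 for head of government, and P580 for the start time qualifier; the exact values printed depend on the live data:

import requests

# Fetch the entity data that Wikidata publishes for San Francisco (Q62).
URL = "https://www.wikidata.org/wiki/Special:EntityData/Q62.json"
entity = requests.get(URL).json()["entities"]["Q62"]

# Each P6 ("head of government") statement carries a main value,
# optional qualifiers such as P580 ("start time"), and references.
for statement in entity["claims"].get("P6", []):
    mayor = statement["mainsnak"].get("datavalue", {}).get("value", {}).get("id")
    start = (statement.get("qualifiers", {}).get("P580", [{}])[0]
             .get("datavalue", {}).get("value", {}).get("time"))
    references = len(statement.get("references", []))
    print(mayor, start, f"{references} reference(s)")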

One of these structured claims would be on the Wikidata page about San Francisco and state the mayor, as discussed earlier. The individual Wikipedias can then query Wikidata for the current mayor. Of the 24 Wikipedias that named the current mayor, eight were current because they were querying Wikidata. I hope to see that number go up. Using Wikidata more extensively can, in the long run, allow for more comprehensive, current, and accessible content while decreasing the maintenance load for contributors.
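A bot, a gadget, or any external tool can perform the same lookup through the Wikidata Query Service. A small sketch in Python follows; the endpoint and the prefixes wd:, p:, ps:, and pq: are the standard ones of the query service, Q62, P6, and P580 are as above, and the user agent string is made up:

import requests

QUERY = """
SELECT ?mayorLabel ?start WHERE {
  wd:Q62 p:P6 ?statement .                  # head-of-government statements of San Francisco
  ?statement ps:P6 ?mayor .
  OPTIONAL { ?statement pq:P580 ?start . }  # start time qualifier
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY DESC(?start)
LIMIT 1
"""

response = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "mayor-check-sketch/0.1"},
)
for row in response.json()["results"]["bindings"]:
    print(row["mayorLabel"]["value"], row.get("start", {}).get("value"))

Within a Wikipedia article itself, the same value is typically pulled in through the Wikibase client, for example via a Lua module or the {{#statements:P6}} parser function, so that an infobox updates as soon as Wikidata does.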

Wikidata was developed in the spirit of Wikipedia’s increasing drive to add structure to its articles. Examples of this include the introduction of infoboxes as early as 2002, a quick tabular overview of facts about the topic of the article, and categories in 2004. Over the years, the structured features became increasingly intricate: infoboxes moved to templates, templates started using more sophisticated MediaWiki functions, and then later demanded the development of even more powerful MediaWiki features. In order to maintain the structured data, bots were created, software agents that could read content from Wikipedia or other sources and then perform automatic updates to other parts of Wikipedia. Before the introduction of Wikidata, bots keeping the language links between the different Wikipedias in sync easily contributed 50% or more of all edits.

Wikidata provided an outlet for many of these activities, and relieved the Wikipedias of having to run bots to keep language links in sync or of massive infobox maintenance tasks. But one lesson I learned from these activities is that the communities can be trusted to master complex workflows spread out across community members with different capabilities: in fact, a small number of contributors working on intricate template code and developing bots can provide invaluable support to contributors who focus more on maintaining articles and to contributors who write large swaths of prose. The community is very heterogeneous, and the different capabilities and backgrounds complement each other in order to create Wikipedia.

However, Wikidata’s structured claims are of limited expressivity: their subject must always be the topic of the page, and every object of a statement must exist as its own item, and thus its own page, in Wikidata. If something doesn’t fit in the rigid data model of Wikidata, it simply cannot be captured in Wikidata — and if it cannot be captured in Wikidata, it cannot be made accessible to the Wikipedias.

For example, let’s take a look at the following two sentences from the English Wikipedia article on Ontario, California:

“To impress visitors and potential settlers with the abundance of water in Ontario, a fountain was placed at the Southern Pacific railway station. It was turned on when passenger trains were approaching and frugally turned off again after their departure.”

There is no feasible way to express the content of these two sentences in Wikidata: the simple claim and qualifier structure that Wikidata supports cannot capture the subtle situation described here.

An Abstract Wikipedia

I suggest that the Wikimedia movement develop an Abstract Wikipedia, a Wikipedia in which the actual textual content is represented in a language-independent manner. This is an ambitious goal: it requires us to push the current limits of knowledge representation, natural language generation, and collaborative knowledge construction by a significant amount. An Abstract Wikipedia must allow for the following (a hypothetical sketch of what such a representation could look like follows the list):

  1. relations that connect more than just two participants with heterogeneous roles.
  2. composition of items on the fly from values and other items.
  3. expressing knowledge about arbitrary subjects, not just the topic of the page.
  4. ordering content, to be able to represent a narrative structure.
  5. expressing redundant information.
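To make these requirements more tangible, here is a purely hypothetical sketch, in Python, of how the Ontario fountain example from above might be captured. None of the constructor or role names below come from the actual proposal; they only illustrate n-ary relations with roles, composition, statements about things other than the page topic, and an ordered narrative:

# Hypothetical constructors - the names and structure are illustrative only.
def frame(relation, **roles):
    """An n-ary relation with named roles (requirement 1)."""
    return {"relation": relation, "roles": roles}

ontario_fountain = [  # an ordered narrative (requirement 4)
    frame("placement",
          object="fountain",                       # not the topic of the page (requirement 3)
          location="Southern Pacific railway station",
          purpose=frame("impress",                 # composed on the fly (requirement 2)
                        agent="Ontario",
                        patient=["visitors", "potential settlers"],
                        theme="abundance of water")),
    frame("activation",
          object="fountain",
          condition="passenger train approaching"),
    frame("deactivation",
          object="fountain",
          condition="after the train's departure",
          manner="frugally"),
]

A natural language generation component would then have to turn such a structure into the two English sentences quoted above, or into their counterparts in any other language.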

Let us explore one of these requirements, the last one: unlike the sentences of a declarative formal knowledge base, human language is usually highly redundant. Formal knowledge bases usually try to avoid redundancy, for good reasons. But in a natural language text, redundancy happens frequently. One example is the following sentence:

“Marie Curie is the only person who received two Nobel Prizes in two different sciences.”

The sentence is redundant given a list of Nobel Prize winners and the disciplines they were awarded in — a list that basically every large Wikipedia will contain. But the content of the given sentence nevertheless appears in many of the different language articles on Marie Curie, and usually right in the first paragraph. So there is obviously something very interesting in this sentence, even though the knowledge it expresses is already fully contained in most of the Wikipedias it appears in. This form of redundancy is commonplace in natural language — but is usually avoided in formal knowledge bases.

The technical details of the Abstract Wikipedia proposal are presented in (Vrandečić, 2018). But the technical architecture is only half of the story. Much more important is the question of whether the communities can meet the challenges of this project.

Wikipedia and Wikidata have shown that the communities are capable of meeting difficult challenges: be it templates in Wikipedia, or constraints in Wikidata, the communities have shown that they can drive comprehensive policy and workflow changes as well as the necessary technological feature development. Not everyone needs to understand the whole stack in order to make a feature such as templates a crucial part of Wikipedia.

The Abstract Wikipedia is an ambitious future project. I believe that this is the only way for the Wikimedia movement to achieve its goal, short of developing an AI that will make the writing of a comprehensive encyclopedia obsolete anyway.

A plea for knowledge diversity?

When presenting the idea of the Abstract Wikipedia, the first question is usually: will this not massively reduce the knowledge diversity of Wikipedia? By unifying the content between the different language editions, does this not force a single point of view on all languages? Is the Abstract Wikipedia taking away the ability of minority language speakers to maintain their own encyclopedias, to have a space where, for example, indigenous speakers can foster and grow their own point of view, without being forced to unify under the western US-dominated perspective?

I am sympathetic with the intent of this question. The goal of this question is to ensure that a rich diversity in knowledge is retained, and to make sure that minority groups have spaces in which they can express themselves and keep their knowledge alive. These are, in my opinion, valuable goals.

The assumption that an Abstract Wikipedia, from which any of the individual language Wikipedias can draw content, will necessarily reduce this diversity is false. In fact, I believe that access to more knowledge and to more perspectives is crucial to achieving effective knowledge diversity, and that the knowledge diversity currently perceived across the different language projects is ineffective at best, and harmful at worst. In the rest of this essay I will argue why this is the case.

Language does not align with culture

First, it is wrong to use language as the dimension along which to draw the demarcation line between different content if the Wikimedia movement truly believes that different groups should be able to grow and maintain their own encyclopedias.

If the Wikimedia movement truly believes that different groups or cultures should have their own Wikipedias, why is there only a single Wikipedia language edition for the English speakers from India, England, Scotland, Australia, the United States, and South Africa? Why is there only one Wikipedia for Brazil and Portugal, leading to much strife? Why are there no two Wikipedias for US Democrats and Republicans?

The conclusion is that the Wikimedia movement does not believe that language is the right dimension to split knowledge — it is a historical decision, driven by convenience. The core Wikipedia policies, vision, and mission are all geared towards enabling access to the sum of all knowledge to every single reader, no matter what their language, and not toward capturing all knowledge and then subdividing it for consumption based on the languages the reader is comfortable in.

The split along languages leads to the problem that it is much easier for a small language community to go “off the rails” — to either, as a whole, become heavily biased, or to adopt rules and processes which are problematic. The fact that the larger communities have different rules, processes, and outcomes can be beneficial for Wikipedia as a whole, since they can experiment with different rules and approaches. But this does not seem to hold true when the communities drop under a certain size and activity level, when there are not enough eyeballs to avoid the development of bad outcomes and traditions. For one example, the article about skirts in the Bavarian Wikipedia features three upskirt pictures, one porn actress, an anime screenshot, and a video showing a drawing of a woman with a skirt getting continuously shorter. The article became like this within a day or two of its creation, and, even though it has been edited by a dozen different accounts, has remained like this over the last seven years. (This describes the state of the article in April 2019 — I hope that with the publication of this essay, the article will finally be cleaned up).

A look at some South Slavic language Wikipedias

Second, a natural experiment is going on, in which contributors who are separated more by politics than by language maintain separate Wikipedias: there exist individual Wikipedia language editions for Croatian, Serbian, Bosnian, and Serbo-Croatian. Linguistically, the differences between the dialects of Croatian are often larger than the differences between standard Croatian and standard Serbian. The existence of the Serbo-Croatian Wikipedia in particular poses interesting questions about these delineations.

The Croatian Wikipedia in particular has turned to a point of view that has been described as problematic: certain events and Croat actors during the 1990s independence wars or the 1940s fascist puppet state might be represented more favorably than in most other Wikipedias.

Here are two observations based on my work on south Slavic language Wikipedias:

First, claiming that a more fascist-friendly point of view within a Wikipedia increases the knowledge diversity across all Wikipedias might be technically true, but is practically insufficient. Being able to benefit from this diversity requires the reader to not only be comfortable reading several different languages, but also to engage deeply enough and spend the time and interest to actually read the article in different languages, which is mostly a profoundly boring exercise, since a lot of the content will be overlapping. Finding the juicy differences is anything but easy, especially considering that most readers are reading Wikipedia from mobile devices, and are just looking to satisfy a quick information need from a source whose curation they trust.

Most readers will only read a single language version of an article, and thus any diversity that exists across different language editions is practically lost. The sheer existence of this diversity might even be counterproductive, as one may argue that the communities should not spend resources on reflecting the true diversity of a topic within each individual language. This would cement the practical uselessness of the knowledge diversity across languages.

Second, many of the same contributors who write the articles with a certain point of view in the Croatian Wikipedia also contribute to the English Wikipedia articles about the same topics — but there they are suddenly forced, and able, to compromise and to incorporate a much wider variety of points of view. One might hope the contributors would take the more diverse points of view and migrate them back to their home Wikipedias — but that is often not the case. If contributors harbor a certain point of view (and who doesn't?), it often leads to a situation where they push that point of view as much as they can get away with in each of the projects.

It has to be noted that the most blatant digressions from a neutral point of view in Wikipedias like the Croatian Wikipedia will not be found in the most central articles, but in the large periphery of articles surrounding these central articles which are much harder to keep an eye on.

Abstract Wikipedia and Knowledge diversity

The Abstract Wikipedia proposal does not require any of the individual language editions to use it. Each language community can decide for each article whether to fall back on the Abstract Wikipedia or whether to create their own article in their language. And even that decision can be more fine grained: a contributor can decide for an individual article to incorporate sections or paragraphs from the Abstract Wikipedia.

This allows the individual Wikipedia communities the luxury of concentrating entirely on the differences that are relevant to them. I distinctly remember when I started the Croatian Wikipedia: it felt like I had the burden to first write an article about every country in the world before I could write the articles I cared about, such as my mother’s home village — because how could anyone defend a general-purpose encyclopedia that might not even have an article on Nigeria, a country with a population of a hundred million, but does have one on Donji Humac, a village with a population of 157? Wouldn’t you first need an article on all of the chemical elements that make up the world before you can write about a local food?

The Abstract Wikipedia frees a language edition from this burden, and allows each community to entirely focus on the parts they care about most — and to simply import the articles from the common source for the topics that are less in their focus. It allows the community to make these decisions. As the communities grow and shift, they can revisit these decisions at any time and adapt them.

At the same time, the Abstract Wikipedia makes these differences more visible since they become explicit. Right now there is no easy way to say whether the fact that Dianne Feinstein is listed as the Mayor of San Francisco in the Cebuano Wikipedia is due to cultural particularities of the Cebuano language communities or not. Are the different population numbers of Frankfurt in the different language editions intentional expressions of knowledge diversity? With an Abstract Wikipedia, the individual communities could explicitly choose which articles to create and maintain on their own, and at the same time remove a lot of unintentional differences.

By making these decisions more explicit, it becomes possible to imagine an effective workflow that observes these intentional differences, and sets up a path to integrate them into the common article in the Abstract Wikipedia. Right now, there are 166 different language versions of the article on the chemical element Helium — it is basically impossible for a single person to go through all of them and find the content that is intentionally different between them. With an Abstract Wikipedia, which contains the common shared knowledge, contributors, researchers, and readers can actually take a look at those articles that intentionally have content that replaces or adds to the commonly shared one, assess these differences, and see if contributors should integrate the differences in the shared article.

The differences in content may reflect differences in policies, particularly policies of notability and reliability. Whereas at first glance it might seem that the Abstract Wikipedia requires unified notability and reliability requirements across all Wikipedias, this is not the case: since local Wikipedias can overlay and suppress content from the Abstract Wikipedia, they can adjust their Wikipedias based on their own rules. And the increased visibility of such decisions will make it easier to identify biases, and hopefully also lead to updated rules that reduce them.

A new incentive infrastructure

The Abstract Wikipedia will evolve the incentive infrastructure of Wikipedia.

Presently, many underrepresented languages are spoken in areas that are multilingual. Often another language spoken in this area is regarded as a high-prestige language, and is thus the language of education and literature, whereas the underrepresented language is a low-prestige language. So even though the low-prestige language might have more speakers, the most likely recruits for the Wikipedia communities, people with education who can afford internet access and have enough free time, will be able to contribute in both languages.

In which language should I contribute? If I write the article about my mother’s home town in Croatian, I make it accessible to a few million people. If I write the article about my mother’s home town in English, it becomes accessible to more than a hundred times as many people! The work might be the same, but the perceived benefit is orders of magnitude higher: the question becomes, do I teach the world about a local tradition, or do I tell my own people about their tradition? The world is bigger, and thus more likely to react, creating a positive feedback loop.

This cannibalizes the communities for local languages by diverting them to the English Wikipedia, which is perceived as the global knowledge community (or to other high-prestige languages, such as Russian or French). This is also reflected in a lot of articles in the press and in academic works about Wikipedia, where the English Wikipedia is understood as the Wikipedia. Although it is known that Wikipedia exists in many other languages, journalists and researchers often, if unintentionally, regard the English Wikipedia as the One True Wikipedia.

Another strong impediment to recruiting contributors to smaller Wikipedia communities is rarely explicitly called out: it is pretty clear that, given the current architecture, these Wikipedias cannot achieve their mission. As discussed above, more than half of all Wikipedia language editions have fewer than ten active contributors — and writing a comprehensive, up-to-date Wikipedia is not an achievable goal with so few people writing in their free time. The translation tools offered by the Wikimedia Foundation can help considerably in certain circumstances — but for most of the Wikipedia languages, automatic translation models don’t exist and thus cannot help the languages which would need them the most.

With the Abstract Wikipedia, though, the goal of providing a comprehensive and current encyclopedia in almost any language becomes much more tangible: instead of taking on the task of creating and maintaining the entire content, only the grammatical and lexical knowledge of a given language needs to be created. This is a far smaller task. Furthermore, this grammatical and lexical knowledge is comparably static — it does not change as much as the encyclopedic content of Wikipedia — thus turning a huge, ongoing task into one where the content will grow and be maintained without requiring too much ongoing effort from the individual language communities.

Yes, the Abstract Wikipedia will require more and different capabilities from a community that has yet to be found, and the challenges will be both novel and big. But the communities of the many Wikimedia projects have repeatedly shown that they can meet complex challenges with ingenious combinations of processes and technological advancements. Wikipedia and Wikidata have both demonstrated the ability to draw on technologically rather simple canvasses and create extraordinarily rich and complex masterpieces which stand the test of time. The Abstract Wikipedia aims to challenge the communities once again, and the promise this time is nothing less than finally achieving the ultimate goal: to allow everyone, no matter what their native language is, to share in the sum of all knowledge.

Acknowledgements

Thanks to Jamie Taylor, Daniel Russell, Joseph Reagle, Stephen LaPorte, and Jake Orlowitz for their valuable suggestions on improving this article.

Bibliography

  • Bao, Patti, Brent J. Hecht, Samuel Carton, Mahmood Quaderi, Michael S. Horn and Darren Gergle. “Omnipedia: bridging the wikipedia language gap.” in Proceedings of the Conference on Human Factors in Computing Systems (CHI 2012), edited by Joseph A. Konstan, Ed H. Chi, and Kristina Höök. Austin: Association for Computing Machinery, 2012: 1075-1084.
  • Eco, Umberto. The Search for the Perfect Language (the Making of Europe). La ricerca della lingua perfetta nella cultura europea. Translated by James Fentress. Oxford: Blackwell, 1995 (1993).
  • Graham, Mark. “The Problem With Wikidata.” The Atlantic, April 6, 2012. https://www.theatlantic.com/technology/archive/2012/04/the-problem-with-wikidata/255564/
  • Hoffmann, Thomas and Graeme Trousdale, “Construction Grammar: Introduction”. In The Oxford Handbook of Construction Grammar, edited by Thomas Hoffmann and Graeme Trousdale, 1-14. Oxford: Oxford University Press, 2013.
  • Kaffee, Lucie-Aimée, Hady ElSahar, Pavlos Vougiouklis, Christophe Gravier, Frédérique Laforest, Jonathon S. Hare and Elena Simperl. “Mind the (Language) Gap: Generation of Multilingual Wikipedia Summaries from Wikidata for Article Placeholders.” in Proceedings of the 15th European Semantic Web Conference (ESWC 2018), edited by Aldo Gangemi, Roberto Navigli, Marie-Esther Vidal, Pascal Hitzler, Raphaël Troncy, Laura Hollink, Anna Tordai, and Mehwish Alam. Heraklion: Springer, 2018: 319-334.
  • Kaffee, Lucie-Aimée, Hady ElSahar, Pavlos Vougiouklis, Christophe Gravier, Frédérique Laforest, Jonathon S. Hare and Elena Simperl. “Learning to Generate Wikipedia Summaries for Underserved Languages from Wikidata.” in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2, edited by Marilyn Walker, Heng Ji, and Amanda Stent. New Orleans: ACL Anthology, 2018: 640-645.
  • Kaljurand, Kaarel and Tobias Kuhn. “A Multilingual Semantic Wiki Based on Attempto Controlled English and Grammatical Framework.” in Proceedings of the 10th European Semantic Web Conference (ESWC 2013), edited by Philipp Cimiano, Oscar Corcho, Valentina Presutti, Laura Hollink, and Sebastian Rudolph. Montpellier: Springer, 2013: 427-441.
  • Milekić, Sven. “Croatian-language Wikipedia: when the extreme right rewrites history.” Osservatorio Balcani e Caucaso, September 27, 2018. https://www.balcanicaucaso.org/eng/Areas/Croatia/Croatian-language-Wikipedia-when-the-extreme-right-rewrites-history-190081
  • Ranta, Aarne. Grammatical Framework: Programming with Multilingual Grammars. Stanford: CSLI Publications, 2011.
  • Schindler, Mathias and Denny Vrandečić. “Introducing new features to Wikipedia: Case studies for Web Science.” IEEE Intelligent Systems 26, no. 1 (January-February 2011): 56-61.
  • Vrandečić, Denny. “Restricting the World.” Wikimedia Deutschland Blog. February 22, 2013. https://blog.wikimedia.de/2013/02/22/restricting-the-world/
  • Vrandečić, Denny. “Towards a multilingual Wikipedia.” in Proceedings of the 31st International Workshop on Description Logics (DL 2018), edited by Magdalena Ortiz and Thomas Schneider. Phoenix: CEUR-WS, 2018.
  • Vrandečić, Denny and Markus Krötzsch. “Wikidata: A Free Collaborative Knowledgebase.” Communications of the ACM 57, no. 10 (October 2014): 78-85. DOI 10.1145/2629489.
  • Wierzbicka, Anna. Semantics: Primes and Universals. Oxford: Oxford University Press, 1996.
  • Wikidata Community: “Lexicographical data.” Accessed June 1, 2019. https://www.wikidata.org/wiki/Wikidata:Lexicographical_data
  • Wulczyn, Ellery, Robert West, Leila Zia and Jure Leskovec. “Growing Wikipedia Across Languages via Recommendation.” in Proceedings of the 25th International World-Wide Web Conference (WWW 2016), edited by Jaqueline Bourdeau, Jim Hendler, Roger Nkambou, Ian Horrocks, and Ben Y. Zhao. Montréal: IW3C2, 2016: 975-985.

EMWCon 2019, Day 1

Today was the first day of the Enterprise MediaWiki Conference, EMWCon, in Daly City. Among the attendees were people from NASA (six or more people), UIC (the International Union of Railways), the UK Ministry of Defence, the US radioactivity safety agencies, cancer research institutes, the Bureau of Labor Statistics, PG&E, General Electric, and a number of companies providing services around MediaWiki, such as WikiTeq, WikiWorks, dokit, etc., with or without semantic extensions. The conference was hosted at the headquarters of Genesys.

I'm not going to comment on all the talks, nor will I faithfully report on them - you can just go to YouTube and watch them yourself. The following is a personal, biased view of the first day.

NASA made an interesting comment early on: the discussion was about MediaWiki and its lack of fine-grained access control. You can set up a MediaWiki easily for a controlled group (so that not everyone in the world can access it), but it is not so easy to say "oh, this set of pages is available for people in this group, and managers in that org can access the pages with these markers", etc. So NASA, at first, set up a lot of wiki installations, each one for such a specific group - but eventually turned it all around and instead had a small number of well-defined groups and merged the wikis into them, tearing down barriers within the org and making knowledge more widely available.

Evita Hollis from General Electric had an interesting point in her presentation on how GE does knowledge sharing: they use SharePoint and Yammer to connect people to people, and MediaWiki to connect people to knowledge. MediaWiki has been not-exactly-great at allowing people to work together in real time - it is a different flow, where you capture and massage knowledge slowly into it. There is a reason why Ops at Wikimedia do not use a wiki that much during an incident, but rather IRC. I think there is a lot of insight in her argument - and if we take it seriously, we could really lift MediaWiki to a new level, and take Wikipedia there too.

Another interesting point is that SharePoint at General Electric had three developers, and MediaWiki had one. The question from the audience was whether that reflected how difficult it is to work with SharePoint, or whether it reflected some bias of the company towards SharePoint. Hollis was adamant about how much she likes SharePoint, but the reason for the imbalance was that MediaWiki, particularly Semantic MediaWiki, actually allows much more flexibility and power than SharePoint without having to touch a single line of wiki source code. It is a platform that allows for rapid experimentation by the end user (I am adding the Spider-Man adage about great power coming with great responsibility).

Daren Welsh from NASA talked about many different forms of bias and how they can bubble up on your wiki. One effect was very interesting: if knowledge from the wiki becomes too readily available, people may start to depend on it. They ran tests where they randomly took the wiki away from flight controllers in training, in order to ensure they are resourceful enough to still figure out what to do - and some failed miserably.

Ike Hecht had a brilliant presentation on the kind of quick application development Semantic MediaWiki lends itself to. He presented a task manager, a news feed, and a file management system, calling them "Semantic Structures That Do Stuff" - which is basically a few pages for your wiki, instead of creating extensions for each of these. This also resonated with GE's statement about needing fewer developers. I think that this approach is wildly underutilized and there is a lot of value in this idea.

Thanks to Yaron Koren - who also gave an intro to the topic - and Cindy Cicalese for organizing the conference, and Genesys for hosting us. All presentations are available on YouTube.

EMWCon 2019, Day 2

Today was the second day of the Enterprise MediaWiki Conference, EMWCon, in Daly City at the Genesys headquarters.

The day started with my keynote on Wikidata and the Abstract Wikipedia idea. The idea was received very warmly.

Today the day was filled with stories from people building systems on top of MediaWiki, and in particular Semantic MediaWiki, Cargo, and some Wikibase. This included SFMOMA presenting their system to collaboratively document art, the use of Cargo and Lua on the League of Legends wiki, a whole wiki farm for Finnish memory and language institutions, the Lost Plays database, and - what I found particularly impressive - an engineer at NASA who implemented a workflow for document approval, including authorization, auditability, and a full Web interface, within a mere week - and who still thinks it could have been done much faster.

A common theme was "how incredibly easy it was". Yes, almost everyone mentioned something they got stumped on, which really points to the community perhaps needing more presence on StackOverflow or IRC or the like, but in so many use cases people who were not developers were able to create pretty complex workflows and apps right there in their browsers. This also ties in with the second common theme: a lot of these wiki deployments start "under the radar".

There were also genuinely complex solutions that were using Semantic MediaWiki as a mere component: Matteo Busanelli was presenting a solution that included lifting external data sources, deploying ontologies, reasoning, and all the bells and whistles - a very impressive and powerful architecture.

The US government uses Semantic MediaWiki in many places, most notably Intellipedia, used by more than 16 intelligence agencies, Diplopedia by the Department of State, and Powerpedia by the Department of Energy. EPA's Statipedia is no more, but new wikis are popping up in other agencies, such as WikITA for the International Trade Administration and one for the Nuclear Regulatory Commission. Canada's GCpedia was mentioned with a lot of respect, along with the wish that the US had something similar.

NASA has a whole wiki farm: within mission control alone they had 12 different wikis after a short while, many grown bottom-up. They noticed that it would make sense to merge them together - which wasn't easy, neither technically, nor legally, nor managerially. They found that a lot of their knowledge was misclassified - for example, they had classified handbooks that anyone can buy on Amazon. One of the biggest changes the wiki caused at NASA was that the merged ISS wiki led to opening more knowledge to more people, and to drawing the circles larger. 20% of the people who have access to the wikis actively contribute to them! This is truly impressive.

So far, no edit has been made from space - due to technical issues. But they are working on it.

The day ended with a panel, asking the question where MediaWiki is in the marketplace, and how to grow.

Again, thanks to Yaron Koren and Cindy Cicalese for organizing the conference, and Genesys for hosting us. All presentations are available on YouTube.

EMWCon Spring 2019

I'm honored to be invited to keynote the Enterprise MediaWiki conference in Daly City. The keynote is on Thursday, I will talk about Wikidata and beyond - towards an abstract Wikipedia.

The talk is planned to be recorded, so it should be available afterwards for everyone interested.

EON2006 deadline extension

We gave the workshop on Evaluating Ontologies for the Semantic Web at the WWW2006 in Edinburgh an extension to the end of the week, due to a number of requests. I think it is more fair to give an extension to all the authors than to allow some of them on request and to deny this possibility to those too shy to ask. If you have something to say on the quality of ontologies and ontology assessment, go ahead and submit! You still have a week to go, and short papers are welcomed as well. The field is exciting and new, and considering the accepted ESWC paper the interest in the field seems to be growing.

A first glance of the submissions reveals an enormous heterogeneity of methods and approaches. Wow, very cool and interesting.

What surprised me was the reaction of some: "oh, an extension. You didn't get enough submissions, sorry". I know that this is a common reason for deadline extensions, and I was afraid of that, too. A day before the deadline there was exactly one submission, and we were considering cancelling the workshop. It's my first workshop, and such things make me quite nervous. But now, two days after the deadline, I am a lot more relaxed. The number of submissions is fine, and we know about a few more to come. Still: we are actively looking for more submissions - for the sole purpose of gathering the community of people interested in ontology evaluation in Edinburgh! I expect this workshop to become quite a leap for ontology evaluation, and I want the whole community to be there.

I am really excited about the topic, as I consider it an important foundation for the Semantic Web. And as you know I want the Semantic Web to lift off, the sooner the better. So let's get these foundations right.

For more, take a peek at the ontology evaluation workshop website.



ESWC2005 is over

The ESWC2005 is over and there was a lot of interesting stuff. Check the proceedings! There were some really nice ideas, like RDFSculpt, good work like temporal RDF (Best Paper Award), the OWL-Eu extensions, naturally the Karlsruhe stuff like ontology evolution, many, many people to meet and get to know, many chats, and quite some ideas. Blogging from here is quite a mess - the upload rate is catastrophic - so I will keep this short, but I certainly hope to pick up on some of the talks and highlight the more interesting ideas (well, interesting to me, at least). Stay tuned! ;)

ESWC2006 is over

I spent the week in Budva, Montenegro, at ESWC2006. It was lovely. The keynotes were inspiring, the talks had good quality, and the SemWiki workshop was plain great. Oh, and the Semantic Wikipedia won the best poster award!

But what is much more interesting is the magic that Tom Heath, the Semantic Web Technologies Co-ordinator, managed: the ESWC website is a showcase of Semantic Web technologies! A wiki, a photo annotation tool, a chat, a search, a bibliography server, a rich semantic client, an ontology, the award winning flink... try it out!

Now I am in Croatia, and taking my first real break since I started on the Semantic Web. Offline for three weeks.

Yay.

Economic impacts of large language models, a take

Regarding Stable Diffusion and GPT and similar models, there is one discussion point floating around which seems to dominate the discussion but may not be the most relevant one. As we know, the training data for these models has been "basically everything the trainers could get their hands on", with some content that was identified as possibly problematic usually removed afterwards.

Many artists are currently complaining about their images, for which they hold copyright, being used for training these models. I think these are very reasonable complaints, and we will likely see a number of court cases and even changes to law to clarify the legal aspects of these practices.

From my perspective this is not the most important concern though. I acknowledge that I have a privileged perspective in so far as I don't pay my rent based on producing art or text in my particular style, and I entirely understand if someone who does is worried about that most, as it is a much more immediate concern.

But now assume that these models were all trained on public domain images and texts and music etc. Maybe there isn't enough public domain content out there right now? I don't know, but training methods are getting increasingly more efficient and the public domain is growing, so that's likely just a temporary challenge, if at all.

Does that change your opinion of such models?

Is it really copyright that you are worried about, or is it something else?

For me it is something else.

These models will, with quite some certainty, become as fundamental and transformative to the economy as computers and electricity have been. Which leads to many important questions. Who owns these models? Who can run them? How will the value that is created with these models be captured and distributed across society? How will these models change the opportunities for contributing to society, and the opportunities for participating in the wealth being created?

Copyright is one of the current methods to work with some of these questions. But I don't think it is the crucial one. What we need is to think about how the value that is being created is distributed in a way that benefits everyone, ideally.

We should live in a world in which the capabilities that are being discovered inspire excitement and amazement because of what might be possible in the future. Instead we live in a world where they cause anxiety and fear because of the very real possibility of further centralising wealth more effectively and further destabilizing lives that are already precarious. I wish we could move from the latter world to the former.

That is not a question of technology. That is a question of laws, social benefits, social contracts.

A similar fear has basically killed the utopian vision which was once driving a project such as Google Books. What could have been a civilisational dream of having all the books of the world available everywhere has become so much less. Because of the fears of content creators and publishers.

I'm not saying these fears were wrong.

Unfortunately, I do not know what the answer is. What changes need to happen. Does anyone have links to potential answers, that are feasible? Feasible in the sense that the necessary changes have a chance of being actually implemented, as changes to our legal and social system.

My answer used to be Universal Basic Income, and part of me still thinks it might be our best shot. But I'm not as sure as I used to be twenty years ago. Not only about whether we can ever get there, but even whether it would be a good idea. It would certainly be a major change that would alleviate many of the issues raised above. And it could be financed by a form of AI tax, to ensure the rent is spread widely. But we didn't do that with industrialization and electrification, and there are reasonable arguments against.

And yet, it feels like the most promising way forward. I'm torn.

If you read this far, thank you, and please throw a few ideas and thoughts over, in the hope of getting unstuck.

England eagerly lacking confidence

My Google Alerts just sent me the following news alerts about Croatia. At least the reporters checked all their sources :)

England players lacking confidence against Croatia International Herald Tribune - France AP ZAGREB, Croatia: England's players confessed to a lack of confidence when they took on football's No. 186-ranked nation in their opening World Cup ...

England eager to break Croatia run Reuters UK - UK By Igor Ilic ZAGREB (Reuters) - England hope to put behind their gloomy recent experiences against Croatia when they travel to Zagreb on Wednesday for an ...

Enjoying dogfood

I guess the blog's title is a metaphor gone awry... well, whatever. About two years ago at Semantic Karlsruhe (that is, the people into semantic technologies at AIFB, FZI, and ontoprise) we installed a wiki to support the internal knowledge management of our group. It was well received and used. We also have an annual survey about a lot of stuff within our group, to see where we can improve. So last year we found out that the wiki was received OK. Some were happy, some were not, nothing special.

We switched to Semantic MediaWiki last year, which was a laborious step. But it seems to have paid off: recently we had our new survey. The scale ranges from very satisfied (56%) through satisfied (32%) and neutral (12%) to unsatisfied (0%) and very unsatisfied (0%).

You can tell I was very happy with these results -- and quite surprised. I am not sure how much the semantics made the difference; maybe we simply did more "gardening" on the wiki since we felt more connected to it - and gardening should not be underestimated, both in importance and in the resources it takes. In this vein, very special thanks to our gardeners!

One problem a very active gardener pointed out not long ago was that gardening gets hardly any appreciation. How do other small-scale or intranet wikis tackle this problem? Well, in Wikipedia we have the barnstars and related cool stuff -- but that basically works because we have an online community, and online gratification is OK for that (or do I misunderstand the situation here? I am not a social scientist). But how to translate that to an actually offline community that has some online extensions, like in our case?

Erdös number, update

I just made an update to a post from 2006, because I learned that my Erdös number has gone down from 4 to 3. I guess that's pretty much it - it is not likely I'll ever become a 2.

Existential crises

I think the likelihood of AI killing all humans is bigger than the likelihood of climate change killing all humans.

Nevertheless I think that we should worry and act much more about climate change than about AI.

Allow me to explain.

Both AI and climate change will, in this century, force changes to basically every aspect of the lives of basically every single person on the planet. Some people may benefit, some may not. The impact of both will be drastic and irreversible. I expect the year 2100 to look very different from 2000.

Climate change will lead to billions of people to suffer, and to many deaths. It will destroy the current livelihoods of many millions of people. Many people will be forced to leave their homes, not because they want to, but because they have to in order to survive. Richer countries with sufficient infrastructure to deal with the direct impact of a changed climate will have to decide how to deal with the millions of people who want to live and who want their children not to die. We will see suffering on a scale never seen before, simply because there have never been this many humans on the planet.

But it won't be an existential threat to humanity (the word humanity has at least two meanings: 1) the species as a whole, and 2) certain values we associate with humans. Unfortunately, here I refer only to the first meaning. The second meaning will most certainly face a threat). Humanity will survive, without a doubt. There are enough resources, there are enough rich and powerful people, to allow millions of us to shelter away from the most life-threatening consequences of climate change. Millions will survive for sure. Potentially at the cost of many millions of lives and the suffering of billions. Whole food chains, whole ecosystems may collapse. Whole countries may be abandoned. But humanity will survive.

What about AI? I believe that AI can be a huge boon. It may allow for much more prosperity, if we spread out the gains widely. It can remove a lot of toil from the life of many people. It can make many people more effective and productive. But history has shown that we're not exactly great at sharing gains widely. AI will lead to disruptions in many economic sectors. If we're not careful (and we likely aren't) it might lead to many people suffering from poverty. None of these pose an existential threat to humanity.

But there are outlandish scenarios which I think might have a tiny chance of becoming true and which can kill every human. Even a full blown Terminator scenario where drones hunt every human because the AI has decided that extermination is the right step. Or, much simpler, that in our idiocy we let AI supervise some of our gigantic nuclear arsenal, and that goes wrong. But again, I merely think these possible, but not in the slightest likely. An asteroid hitting Earth and killing most of us is likelier if you ask my gut.

Killing all humans is a high bar. It is an important bar for so called long-termists, who may posit that the death of four or five billion people isn't significant enough to worry about, just a bump in the long term. They'd say that they want to focus on what's truly important. I find that reasoning understandable, but morally indefensible.

In summary: there are currently too many resources devoted to thinking about the threat of AI as an existential crisis. We should focus on the short term effect of AI and aim to avoid as many of the negative effects as possible and to share the spoils of the positive effects. We're likely to end up with socializing the negative effects, particularly amongst the weakest members of society, and privatizing the benefits. That's bad.

We really need to devote more resources towards avoiding climate change as far as still possible, and towards shielding people and the environment from the negative effects of climate change. I am afraid we're failing at that. And that will cause far more negative impact in the course of this century than any AI will.

F in Croatian

I was writing some checks to find errors in the lexical data in Wikidata for Croatian, and one of the things I tried was to check whether the letters in the words are all part of the Croatian alphabet. But instead of just taking a list, or writing it down from memory, I looked at the data and added letter after letter. And then I was surprised to find that the letter "f" only appears in loanwords. I looked it up in the Croatian Encyclopedia, and it simply states that "f" is not a letter of the old Slavic language.
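The check itself is simple. Here is a minimal sketch in Python of what I mean; the word list is inlined for illustration, while the actual check ran over the Croatian lexemes in Wikidata's lexicographical data:

# The letters of the Croatian alphabet (the digraphs dž, lj, nj are built from these).
CROATIAN_LETTERS = set("abcčćdđefghijklmnoprsštuvzž")

def foreign_letters(word):
    """Return the letters in the word that are not part of the Croatian alphabet."""
    return {ch for ch in word.lower() if ch.isalpha() and ch not in CROATIAN_LETTERS}

# A few example lemmas; in the real check these came from the Wikidata lexemes.
for lemma in ["mlijeko", "fildžan", "weekend", "taxi"]:
    unexpected = foreign_letters(lemma)
    if unexpected:
        print(lemma, "contains unexpected letters:", sorted(unexpected))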

I was mindblown. I have spoken this language for as long as I can remember, and I didn't notice that there is no "f" except in loanwords. And "f" seems like such a fundamental sound! But no, wrong!

If you speak a Slavic language, do you have the letter "f"?

FOAF browser

Thanks Josef, thanks Pascal! I had complained that Morten's FOAF explorer is still down; instead of complaining as well, they pointed me to their own FOAF explorers: Josef has his Advanced FOAF Explorer, very minimalistic, but it works! And Pascal points to Martin Borho's FOAFer. FOAFer has a few nice properties.

Thank you guys, your sites are great!

Is your source code available? Because both of your tools lack a bit in looks, to be honest. And do you really think users like to see SHA1 sums? Or error messages? (Well, actually, that was OK - it helped me discover a syntax error in the AIFB FOAF files.) Please don't misunderstand me: your sites really are great. And I like to use them. But in order to reach a more general audience, we need something slicker, nicer.

Maybe a student in Karlsruhe would like to work on such a thing? Email me.

FOAFing around

I tried to create FOAF-files out of the ontology we created during the Summer School for the Semantic Web. It wasn't that hard, really: with our ontology I have enough data to create some FOAF-skeletons, so I looked into the FOAF-specification and started working on it.

<!-- Gosia's description; Anne's own FOAF file is linked via rdfs:isDefinedBy -->
<foaf:Person rdf:about="#gosia">
  <foaf:knows rdf:resource="#anne" />
  <foaf:name rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Gosia Mochol</foaf:name>
</foaf:Person>
<rdf:Description rdf:about="#anne">
  <rdfs:isDefinedBy rdf:resource="http://semantic.nodix.net/sssw05/anne.rdf" />
...

Well, every one of us gets his own FOAF-file, where one can find more data about the person. Some foaf:knows-relations have been created automatically for the people who worked together in a miniproject. I didn't want to assume too much else.

The code up there is valid FOAF as far as I can tell. But all of the (surprisingly sparse) tools could not cope with it, for different reasons. One complained about the datatype declaration in the foaf:name and then ignored the name altogether. Most tools didn't know that rdfs:isDefinedBy is a subproperty of rdfs:seeAlso, and thus were not able to link the FOAF files. And most tools were obviously surprised that I gave the persons URIs instead of using the inverse functional property over the SHA1 sums of their e-mail addresses. The advantage of having URIs is that we can use those URIs to tag pictures or to keep track of each other's publications, after the basic stuff has been settled.
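For reference, that IFP-based identification works by hashing the mailto: URI of the address with SHA1 and publishing the result as foaf:mbox_sha1sum. A tiny sketch in Python (the address is made up):

import hashlib

# foaf:mbox_sha1sum is the SHA1 hash of the full mailto: URI of the mailbox.
mbox = "mailto:anne@example.org"
print(hashlib.sha1(mbox.encode("utf-8")).hexdigest())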

Sadly, the basic stuff is not settled. To me it seems that the whole FOAF stuff, although being called the most widespread use of the Semantic Web, is still in its infancy. The tools hardly collaborate, they don't care too much about the specs, and there seems to be no easy way to browse around (Morten's explorer was down at the time when I created the FOAFs, which was frustrating, but now it works: take a look at the created FOAF files, entering with my generated FOAF file or the one for Enrico Motta). Maybe I just screwed it all up when generating the FOAF files in the first run, but I don't really think so...

Guess someone needs to create some basic working toolset for FOAF. Does anyone need requirements?

Failed test

Testing my mobile blogging thingie (and it failed, should have gone to the other blog). Sorry for the German noise.

Feeding the cat

Every morning, I lovingly and carefully scoop out every single morsel of meat from the tin of wet food for our cat. And then he eats a tenth of it.

Fellow bloggers

Just a few pointers to people with blogs I usually follow:

  • Max Völkel, a colleague from the AIFB, soon moving to the FZI and right now visiting DERI. He obviously likes groups with acronyms. And he's a fun read.
  • Valentin Zacharias, who has deeper thoughts on this whole Semantic Web stuff than most people I know, working at the FZI. He's often a thought-provoking read.
  • Planet RDF. The #1 blog on news for the (S/s)emantic (W/w)eb, both with major and minor initials. That's informative.
  • Nick Kings from BT exact. We are working together on the SEKT project, and he just started to blog. Welcome! A long first post. But the second leads to a great video!
  • Brendan Eich, one of the Mozilla gurus. I want to know where Mozilla is headed to - so I read his musings.
  • PhD. It's not a person, it's a webcomic, granted, but they offer an RSS feed for the comic. Cool. I always look forward to new episodes.

So, if you think I should read you, drop me a note. I especially like peers, meaning people who, like me, are working on the Semantic Web, maybe PhD students, and who don't know the answer to anything but like to work on it, making the web come real.

Finding God through Information Theory

I found that surprising: Luciano Floridi, one of the most-cited living philosophers, started studying information theory because young Floridi, still Catholic, concluded that God's manifestation to humanity must be an information process. He wanted to understand God's manifestation through the lens of information.

He didn't get far in answering that question, but he did become the leading expert in the Philosophy of Information, and an expert in Digital Ethics (and also, since then, an agnostic).

Post scriptum: The more I think about it, the more I like the idea. Information theory is not even one of these vague, empirical disciplines such as Physics, but more like Mathematics and Logics, and thus unavoidable. Any information exchange, i.e. communication, must follow its rules. Therefore the manifestation of God, i.e. the way God chooses to communicate themselves to us, must also follow information theory. So this should lead to some necessary conditions on the shape of such a manifestation.

It's a bright idea. I am not surprised it didn't go anywhere, but I still like the idea.

Could have at least engendered a novel Proof for the Existence of God. They have certainly come from more surprising corners.

Source: https://philosophy.fireside.fm/1

More about Luciano Floridi on Wikipedia.

First look at Freebase

I got the chance to get a close look at Freebase (thanks, Robert!). And I must say -- I'm impressed. Sure, the system is still not ready, and you notice small glitches happening here and there, but that's not what I was looking for. What I really wanted to understand is the idea behind the system, how it works -- and, since it was mentioned together with Semantic MediaWiki once or twice, I wanted to see how the systems compare.

So, now here are my first impressions. I will sure play more around with the system!

Freebase is a database with a flexible schema and a very user-friendly web front end. The data in the database is offered via an API, so that information from Freebase can be included in external applications. The web front end looks nice, is intuitive for simple things, and works for the not-so-simple things. In the background you basically have a huge graph, and the user surfs from node to node. Everything can be interconnected with named links, called properties. Individuals are called topics. Every topic can have a multitude of types: Arnold Schwarzenegger is of type politician, person, actor, and more. Every such type has a number of associated properties that can point either to a value, another topic, or a compound value (that's their solution for n-ary relations, basically an intermediate node). So the type politician adds the party, the office, etc. to Arnold, actor adds movies, person adds the family relationships and dates of birth and death (I felt existentially challenged after I created my user page: the system created a page about me inside Freebase, and there I had to deal with it asking me for my date of death).

It is easy to see that types are crucial for the system to work. Are they the right types to be used? Do they cover the right things? Are they interconnected well? How do the types play together? A set of types and their properties form a domain, like actor, movie, director, etc. forming the domain "film", or album, track, musician, band forming the domain "music". A domain is administered by a group of users who care about that domain, and they decide on the properties and types. You can easily see ontology engineering par excellence going on here, done in a collaborative fashion.

Everyone can create new types, but in the beginning they belong to your personal domain. You may still use them as you like, and so may others. If your types, or your domain, turn out to be of interest, they may be promoted to a common domain. Obviously, since the system is still in alpha, there is not yet much experience with how this works out with the community, but time will tell.

Unsurprisingly, I am also very happy that Metaweb's Jamie Taylor will give an invited talk at the CKC2007 workshop in Banff in May.

The API is based on JSON, and offers a powerful query language to get the knowledge you need out of Freebase. The description is so good that I bet it will find almost immediate uptake. That's one of the things the Semantic Web community, myself included, has not yet managed to do too well: selling it to the hackers. Look at this API description for how it is done! Reading it, I wanted to start hacking right away. They also provide a few nice "featured" applications, like the Freebase movie game. I guess you can play it even without a Freebase account. It's fun, and it shows how to reuse the knowledge from Freebase. And they made some good tutorial movies.
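To give a flavour of the query language (from memory, so the details may well be off): MQL queries are JSON objects shaped like the answer you want, with nulls and empty lists marking the slots Freebase should fill in. A hedged sketch in Python - the type and property names are illustrative and not checked against the live schema, and the request envelope and endpoint are omitted:

import json

# Query-by-example: None marks values to be returned, [] marks lists to be filled.
query = [{
    "type": "/film/film",
    "name": None,
    "initial_release_date": None,
    "directed_by": "Ridley Scott",
}]

print(json.dumps(query, indent=2))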

So, what are the differences to Semantic MediaWiki? Well, there are quite a lot. First, Semantic MediaWiki is totally open source; Metaweb, the system Freebase runs on, seems not to be. Well, if you ask me, Metaweb (also the name of the company) will probably want to sell Metaweb to companies. And if you ask me again, those companies will get a great deal, because this may replace many current databases and solve many of the problems people have with them due to their rigid structure. So it may be a good idea to keep the source closed. On the web, since Freebase is free, only a tiny number of users will care that the source of Metaweb is not free anyway.

But now, on the content side: Semantic MediaWiki is a wiki that has some features to structure the wiki content with a flexible, collaboratively editable vocabulary. Metaweb is a database with a flexible, collaboratively editable schema. Semantic MediaWiki makes it easier to extend the vocabulary (just type a new relation); Metaweb, on the other hand, makes it much easier to instantiate the schema, thanks to its form-based user interface and autocompletion. Metaweb is about structured data, even though the structure is flexible and changing. Semantic MediaWiki is about unstructured data, basically text, that can be enhanced with some structure between the blobs of unstructured data. Metaweb is actually much closer to a wiki like OntoWiki. Notice the name similarity of the domains: freebase.com (Metaweb) and 3ba.se (OntoWiki).

The query language that Metaweb brings along, MQL, seems to be almost exactly as powerful as the query language in Semantic MediaWiki. Our design has been driven by usability and scalability, and it seems that both projects arrived at basically the same conclusions. Just a funny coincidence? Both query languages are considerably weaker than SPARQL.

One last difference is that Semantic MediaWiki is fully standards-based. We export all data in RDF and OWL. Standard-compliant tools can simply load our data, and there are tons of tools that can work with it, and numerous libraries in dozens of programming languages. Metaweb? No standard. A completely new vocabulary, a completely new API, but beautifully described. Still, due to the many similarities to the Semantic Web standards, I would be surprised if there wasn't a mapping to RDF/OWL even before Freebase goes fully public. For all who know the Semantic Web or Semantic MediaWiki, I tried to create a little dictionary of Semantic Web terms.
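
As a small illustration of what "standards-based" buys you, here is a hedged sketch in Python using the rdflib library to load a wiki's RDF export; the export page name and the wiki URL are placeholders, not a specific installation.

    from rdflib import Graph

    # Any standard-compliant RDF tool can load Semantic MediaWiki's export.
    # The URL below is a placeholder; point it at a real wiki's RDF export.
    g = Graph()
    g.parse("https://example.org/wiki/Special:ExportRDF/Berlin", format="xml")

    # Iterate over all triples we got back.
    for subject, predicate, obj in g:
        print(subject, predicate, obj)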

All in all, I am looking forward to seeing Freebase fully deployed! This is the most exciting Web thingy of 2007 so far, coming right after Yahoo! Pipes, and that was a tough one to beat.


Comments are still missing on this post.

Five things you don't know about me

Well, I don't think I have been tagged yet, but I could be within the next few days (the meme is spreading), and as I won't be here for a while, I decided to strike preemptively. If no one tags me, I will just assume I took one of danah's.

So, here we go:

  1. I was born without fingernails. They grew after a few weeks. But nevertheless, whenever they wanted to cut my nails when I was a kid, no one could do it alone -- I always panicked and needed to be held down.
  2. Last year, I contributed to four hardcover books. Only one of them was scientific. The rest were modules for Germany's most popular role playing game, The Dark Eye.
  3. I am a total optimist. OK, you knew that. But you did not know that I actually tend to forget everything bad. Even in songs, I noticed that I only remember the happy lines, and I forget the bad ones.
  4. I co-author a webcomic with my sister, the nutkidz. We don't manage to meet any schedule, but we do have a storyline. I use the characters quite often in my presentations, though.
  5. I still have an account with Ultima Online (although I play only three or four times a year), and I even have a CompuServe Classic account -- basically, because I like the chat software. I did not get rid of my old PC, because it still runs the old CompuServe Information Manager 3.0. I never figured out how to run IRC.

I bet none of you knew all of this! Now, let's tag some people: Max, Valentin, Nick, Elias, Ralf. It's your turn.


Comments are still missing on this post.

Flop of the Year?

IEEE Spectrum editor Steven Cherry wrote the article Digital Dullard in, well, IEEE Spectrum. He obviously dislikes Paul Allen for his money, and can't stop ranting about him and about Mr Allen spending millions and millions of dollars on research projects ("that's just the change that drops down behind the sofa cushions"). Yeah, Mr Cherry, you're totally right - why should he spend more than 100 million dollars on research when he could rather invest it in a multi-million-dollar house, an airline, or a Hollywood blockbuster produced with James Cameron.

The thing is, Cherry claims the whole project of creating a Digital Aristotle, dubbed Project Halo, is naught but wasted money, because getting the program to understand a page of chemistry costs about $10,000. For one single page! Come on, how many students would learn one page for $10,000?

Project Halo succeeded in creating a software program capable of taking a high-school advanced-placement exam in chemistry and actually passing it - and it did, even beating the average student. Millions have been spent, says Cherry, for that? Wow...

Cherry fails to recognise two points here that illustrate the achievement of such a project:

First, sure, it may cost $10,000 to get a program to understand one page, and it may cost only $20 to get a human to do the same. So, training a program that is able to replace a human may cost millions and millions, whereas training a human to do so will probably cost a mere few tens of thousands of dollars. But has he ever considered the cost of replication? The program can be copied for an extremely low cost of a few hundred bucks, whereas every additional human costs the full initial price again.
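
To make the replication argument concrete, here is a back-of-the-envelope sketch in Python. The per-page figures are the ones quoted above; the number of pages and the cost of deploying an additional copy are made-up placeholders.

    # Back-of-the-envelope comparison: build the program once and copy it,
    # versus training every additional human from scratch.
    PROGRAM_COST_PER_PAGE = 10_000   # dollars per page, as quoted above
    HUMAN_COST_PER_PAGE = 20         # dollars per page, as quoted above
    COPY_COST = 300                  # guessed cost of deploying one more copy

    def program_cost(pages, deployments):
        return pages * PROGRAM_COST_PER_PAGE + deployments * COPY_COST

    def human_cost(pages, deployments):
        return deployments * pages * HUMAN_COST_PER_PAGE

    pages = 500  # say, a chemistry textbook
    for n in (1, 10, 100, 1_000, 10_000):
        print(n, program_cost(pages, n), human_cost(pages, n))
    # With these numbers the program becomes cheaper somewhere above 500
    # deployments, and the gap only widens from there.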

Second, even though the initial costs of creating such prototype programs may be extremely high, that's no reason against it. Arguments like this would have hindered the development of the power loom, the space shuttle, the ENIAC and virtually all other huge achievements in engineering history.

It's a pity. I really think that Project Halo is very cool, and I think it's great that Mr Allen is spending some of his money on research instead of sports. Hey, it's his money anyway. I'd thank him immediately should I ever meet him. The technologies exploited and developed there are presented in papers and are thus available to the public. They will probably help in the further development and rise of the Semantic Web, as the project is able to spend some money and brains on designing usable interfaces for creating knowledge.

Why do people bash visions? I mean, what's Cherry's argument? I don't get it... maybe someone should pay me $20,000 to understand his two pages...

From vexing uncertainty to intellectual humility

A philosopher with schizophrenia wrote a harrowing account of how he experiences the condition. And I wonder whether some of the lessons are true for everyone, and what that means for society.

"It’s definite belief, not certainty, that allows me to get along. It’s not that certainty, or something like it, never matters. If you are fixing dinner for me I’ll try to be clear about the eggplant allergy [...] But most of the time, just having a definite, if unconfirmed and possibly false, belief about the situation is fine. It allows one to get along.
"I think of this attitude as a kind of “intellectual humility” because although I do care about truth—and as a consequence of caring about truth, I do form beliefs about what is true—I no longer agonize about whether my judgments are wrong. For me, living relatively free from debilitating anxiety is incompatible with relentless pursuit of truth. Instead, I need clear beliefs and a willingness to change them when circumstances and evidence demand, without worrying about, or getting upset about, being wrong. This attitude has made life better and has made the “near-collapses” much rarer."

(first published on Facebook March 13, 2024)


Frozen II in Korea

This is a fascinating story that just keeps getting better (and the Hollywood Reporter is only scratching the surface here, unfortunately): an NGO in South Korea is suing Disney for "monopolizing" the movie screens of the country, because Frozen II is shown on 88% of all screens.

Now, South Korea has a rich and diverse set of movie theatres - there are large cineplexes in the big cities, but the less populated areas have many small theatres, often with only a few screens (I reckon it is similar to the villages in Croatia, where the theatre had only a single screen, most movies were shown only once, and there were only one or two screenings per day, and not on every day). The theatres are often independent, so there is no central planning about which movies are being shown (and today it rarely matters how many copies of a movie are made, as most projectors are digital and unlimited copies can be created on the fly - instead of waiting for the one copy to travel from one town to the next, which was the case in my childhood).

So how would you ensure that these independent theatres don't show a movie too often? By having a centralized mechanism that ensures that not too many screens show the same movie? (Preferably on the blockchain, using an auction system?) Good luck with that while still allowing the local theatres to adapt their screenings to their audiences.

But as I said, it gets better: the 88% number is arrived at by counting how many of the screens in the country showed Frozen II on a given day. It doesn't mean that each of those screens was used solely for Frozen II! If a screen was used at noon for a showing of Frozen II and at 10pm for a Korean horror movie, that screen counts for both. That makes the percentage a pretty useless number if you want to show monopolistic dominance (also because the numbers add up to far more than 100%). Again, remember that small towns often have only a few screens, which have to show several different movies each. If the ideas of the lawsuit were enacted, you would need to keep Frozen II off a certain number of screens! That would basically make it impossible for kids and teens in less populated areas to participate in event movie-going such as Frozen II while trying to avoid spoilers on social media afterwards.

Now, if you look at how many screenings, instead of screens, were occupied by Frozen II, the number drops to 46% - which is still impressive, but far less dominant and monopolistic than the 88% cited above (and in fact below the 50% that Korean law requires to establish dominance).
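
To see why counting screens and counting screenings give such different numbers, here is a toy illustration in Python. The schedules are invented; only the counting logic matters.

    # Toy example: screens-based vs screenings-based share of one movie.
    schedules = {
        "screen 1": ["Frozen II", "Frozen II", "Korean horror movie"],
        "screen 2": ["Frozen II", "indie drama"],
        "screen 3": ["indie drama", "documentary"],
    }
    movie = "Frozen II"

    # A screen counts as "showing" the movie if it appears even once that day.
    screens_showing = sum(movie in shows for shows in schedules.values())
    screen_share = screens_showing / len(schedules)

    # Counting individual screenings instead gives a much lower share.
    all_screenings = [m for shows in schedules.values() for m in shows]
    screening_share = all_screenings.count(movie) / len(all_screenings)

    print(f"share of screens:    {screen_share:.0%}")    # 67% (2 of 3 screens)
    print(f"share of screenings: {screening_share:.0%}") # 43% (3 of 7 screenings)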

And even more impressive: in the end it is up to the audience. Even though 'only' 46% of the screenings were of Frozen II, every single day since its release between 60% and 85% of all revenue went to Frozen II. So one could argue that the theatres were actually underserving the audience (but then again, that's not how it really works, because screenings are usually in rooms with a hundred or more seats, and they can be filled very differently - showing a blockbuster three times at almost full capacity and showing a less popular movie once with only a dozen or so tickets sold might still serve the local community better than only running the blockbuster).

I bet the NGO's goal is just to raise awareness about the dominance of the American entertainment industry, and for that, hey, it's certainly worth a shot! But would they really want to go back to a system where small local cinemas would not be able to show blockbusters for a long time, involving a complicated centralized planning component?

(Also, I wish there was a way to sign up for updates on a story, like this lawsuit. Let me know if anyone knows of such a system!)