Denny: Created page with "{{pubdate|3|December|2019}} There are many, many papers in machine learning these days. And this paper, taking a step back, and thinking about how researchers measure their r..."

2020-11-15T23:39:55Z

{{pubdate|3|December|2019}}

There are many, many papers in machine learning these days. This paper takes a step back and thinks about how researchers measure their results, and how good a specific type of benchmark, the crowdsourced golden set, can even be. It presents a convincing example based on word similarity, using terminology and concepts from metrology, to show that many reported results are not actually supported by the golden set, because the resolution of the golden set is insufficient. So there might be no improvement at all, and that new architecture might just be noise.
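To make the resolution argument a bit more concrete, here is a minimal sketch in Python. It is not the method from the paper, just my own illustration under stated assumptions: the ratings and model scores are synthetic, all names (<code>benchmark_spread</code>, <code>is_supported</code>, the threshold <code>k</code>) are hypothetical, and plain Pearson correlation is used for simplicity. The idea is to rebuild the golden set many times from resampled crowd raters, see how much a model's score moves because of that alone, and only trust a difference between two models if it is clearly larger than that spread.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation of two equally long lists of numbers."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def golden_scores(ratings, rater_ids):
    """Average the chosen raters' ratings per item; ratings[i][r] is rater r on item i."""
    return [statistics.fmean(item[r] for r in rater_ids) for item in ratings]


def benchmark_spread(ratings, model_scores, n_resamples=200, seed=0):
    """Resample raters with replacement and return (mean, stdev) of the model's score."""
    rng = random.Random(seed)
    n_raters = len(ratings[0])
    results = []
    for _ in range(n_resamples):
        sample = [rng.randrange(n_raters) for _ in range(n_raters)]
        results.append(pearson(golden_scores(ratings, sample), model_scores))
    return statistics.fmean(results), statistics.stdev(results)


def is_supported(score_a, score_b, spread, k=2.0):
    """Assumed rule of thumb: a difference counts only if it exceeds k times the spread."""
    return abs(score_a - score_b) > k * spread


# Synthetic data (an assumption for illustration, not data from the paper):
# 50 word pairs, 5 noisy crowd raters, and two models of almost equal quality.
rng = random.Random(42)
truth = [rng.uniform(0, 10) for _ in range(50)]
ratings = [[t + rng.gauss(0, 2.0) for _ in range(5)] for t in truth]
model_a = [t + rng.gauss(0, 3.0) for t in truth]
model_b = [t + rng.gauss(0, 2.9) for t in truth]

mean_a, spread_a = benchmark_spread(ratings, model_a)
mean_b, spread_b = benchmark_spread(ratings, model_b)
spread = max(spread_a, spread_b)

print(f"model A: {mean_a:.3f} +/- {spread_a:.3f}")
print(f"model B: {mean_b:.3f} +/- {spread_b:.3f}")
print("difference supported by the golden set:", is_supported(mean_a, mean_b, spread))
```

Resampling raters is only one possible way to estimate how much the instrument itself wobbles; resampling items, or collecting the golden set a second time, would be other options, and the factor <code>k</code> is an arbitrary choice here.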

I think this paper is really worth the time of people in the research field. It was written by [https://en.wikipedia.org/wiki/Chris_Welty Chris Welty], [https://en.wikipedia.org/wiki/Lora_Aroyo Lora Aroyo], and [https://ai.google/research/people/PraveenParitosh Praveen Paritosh].

* [https://www.arxiv-vanity.com/papers/1911.01875/ Metrology for AI: From Benchmarks to Instruments]

{{tag|Simia}}
<noinclude>{{simiapost|english}}</noinclude>

Machine Learning and Metrology - Revision history
