Why do many researchers choose not to publish code and data alongside their articles?

Question

Why do many researchers choose not to publish code and data alongside their articles?

commented Aug 17, 2015 by Gavin Simpson (820 points)

commented Aug 17, 2015 by Simon W (215 points)

commented Aug 17, 2015 by tomp (250 points)

commented Aug 17, 2015 by Gram (240 points)

commented Aug 17, 2015 by tomp (250 points)

8 Answers

commented Aug 17, 2015 by tomp (250 points)

commented Aug 17, 2015 by Richard Smith-Unna (0 points)

commented Aug 17, 2015 by Neil Chue Hong (185 points)

Gram · Answer 1 · 2015-08-04T18:12:13+0000

I am a big fan of open science here is a list of fears I have seen at my workplace:

Trade Secrets (Someone will put us out of business if they know this)
Hosting Costs (It is cheaper to keep this internal, then pay to have it exposed)
If it works why fix it (Fear of changes is certainly not a new thing)
Licencing Inexperience (It will take a lot of work and possible experience to figure this out)
Too many bosses (The amount of red tape to get this approved is not worth the effort)
Plagiarism (We don't want our work "Stolen" and if we have unwittingly used someone else's work, at least a limited set of eyes will see it so we should be safe.)

Edit: My answer was meant as a point-form summary if you would like I can go into detail on any of the topics listed.

This post has been migrated from the Open Science private beta at StackExchange (A51.SE)

Guido Jorg · Answer 2 · 2015-08-04T19:13:02+0000

Hebb wrote, not all issues are psychological, but most are. I agree with Gram's answer. Simon's answer is true not only for code but for mathematical proofs too. However there is one other major issue and it is psychological I suggest.

Many researchers tenured or not falsely think: open, therefore free, therefore worthless. Too many imagine they lose face in front of colleagues who publish in journals that are not open... (For the apparently valid reason that because if somebody pays to read an article it means they value it more and it has more value...)

This argument isn't applicable here. Most paywalled journals are read because a university subscribes to them. The target readers don't usually pay for them. Furthermore in science value is not judged by willingness to pay anyway.

Indeed, tenured colleagues really have no reason to prefer publications with more prestige. Some would even reply they don't know what that even means. For at that point, articles are looked up and read, as they become aware of their possible utility, not journals.

So the reason is often irrational one can suggest. PNAS for instance becomes free after a year but that does not mean its somehow less important or valuable.

(For illustration, my university library subscribes to virtually all journals in any field. But that means they spent the money, not any scientific peers of the authors publishing there. The administrators who allocated the funds don't read the journal, although they did a great favor for us researchers. At another university, this one in Europe, there were very few journals accessible so people cited mostly books or cited a paper citing another paper if they couldn't find the other online for ...)

This also answers this question: Why do tenured professors still publish in pay-walled venues?

This post has been migrated from the Open Science private beta at StackExchange (A51.SE)

Daniel Standage · Answer 3 · 2015-08-10T19:04:47+0000

The other answers provide some good insight into why scientists might publish the way they do today, but I think all of them miss a pretty obvious and important point: the history of publishing.

Scientific findings have been published in print for hundreds of years, even if the concept of peer review is more recent¹. Over the majority of this time period, scientists did not work with large data sets with the frequency and ease that we do today, and publishing "code" was certainly not common. A small data set could simply be published within an article, probably in the form of a table or figure, which could be distributed by photocopy or transcribed by hand.

Fast-forward to the 2010s: software is a critical intellectual and technical component in most areas of scientific research, and huge data sets can be disseminated openly, with ease, and with little to no cost. Distributing data and code inside a print article is rarely realistic these days, and even though journals typically publish online versions of all articles now, how to integrate supporting code and data is a big challenge—or at least publishers make it out to be a big challenge.

I would attribute the rapid advance in computing and networking technology as a primary cause in many (most) cases of "closed" thinking when it comes to publication. Many publishers and senior scientists are simply struggling to find their feet in this brave new world, and holding on to decades-old practices and values: the practices and values under which they were trained, and their mentors were trained.

¹Baldwin M (2015) Credibility, peer review, and Nature, 1945–1990. Notes and Records of the Royal Society, 69, 337-352, doi:10.1098/rsnr.2015.0029.

This post has been migrated from the Open Science private beta at StackExchange (A51.SE)

Michael · Answer 4 · 2015-08-11T09:30:24+0000

One two main reasons I can think of are:

Laziness or lack of time. Most of the time code (in science) is in a form that is not readable by someone who did not write it. Making it publishable in a proper form would take quite a lot of time and scientists have better thing to do. It is possible to write the code "readable" from the beginning, but this requires planning in advance. This rarely happens in science, since most scientists were not trained in programming and whatever they know is learned on the job.
Not realising it's important enough. This is not restricted to code. How many times have you read a paper with a poorly written Materials and Methods section? On the one hand, it is quite boring to read "we used a 1 ml pipette..." in papers, but on the other hand this information is crucial when trying to reproduce the work. Code sits in the same spot as pipettes. It is a method, and scientists don't really care about it. They care about the results and the conclusions. How the results were actually achieved is usually less important, in the eyes of most authors.

This post has been migrated from the Open Science private beta at StackExchange (A51.SE)

Franck Dernoncourt · Answer 5 · 2015-08-13T21:36:48+0000

An interesting list of reasons explaining why authors refused to send code when asked can be found in section 4.3 So, What Were Their Excuses? (Or, The Dog Ate My Program) of the paper "Measuring Reproducibility in Computer Systems Research." Christian Collberg, Todd Proebsting, Gina Moraila, Akash Shankaran, Zuoming Shi, Alex M Warren. March 21, 2014.

(The paper was mentioned to me when I asked for a reference on availability of source code used in computer science research articles).

This post has been migrated from the Open Science private beta at StackExchange (A51.SE)

Carlisle Rainey · Answer 6 · 2015-08-16T12:13:09+0000

In a field experiment on sociologists, Cristobal Young investigated a slightly different question. It is not the norm in sociology to publicly post reproduction files, so he wanted to know how often these would be provided upon request. The results are summarized in this blog post.

Only 28% of the 53 researchers contacted released their data. 73% did not release their data, even upon request. Here are the justifications of the 38 researchers that did not provide their data:

32% - IRB/legal/confidentiality issue
26% - no response
16% - don't have data
14% - don't have time/too complicated
5% - still using the data
5% - "see the article and figure it out"

Perhaps most informatively, one researcher acknowledges his/her true feeling about making data publicly available:

I don’t keep or produce "replication packages"… Data takes a significant amount of human capital and financial resources, and serves as a barrier-to-entry against other researchers… they can do it themselves.

It would seem that this barrier-to-entry sentiment would be highly underreported in most measures of why researchers don't provide their data. It might explain a lot of the resistance to making data publicly available.

This post has been migrated from the Open Science private beta at StackExchange (A51.SE)

Simon W · Answer 7 · 2015-08-04T18:17:24+0000

One factor is that many people will be comfortable publishing results based on messy, hurridly-written code, but will be embarrassed about sharing that same code, and it can be a considerable time investment to clean it up. There is often little incentive for putting in that time.

This post has been migrated from the Open Science private beta at StackExchange (A51.SE)

Why do many researchers choose not to publish code and data alongside their articles?

Please log in or register to add a comment.

Please log in or register to answer this question.

8 Answers

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Categories

Most popular tags

Why do many researchers choose not to publish code and data alongside their articles?

Please log in or register to add a comment.

Please log in or register to answer this question.

8 Answers

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Related questions

Categories

Most popular tags