God, people, this is not that hard.. If you are working in academia, in the US, you are very likely to be funded by taxpayer money. The data you produce is enabled by taxpayers – all (most) of us.. So why is it such a hard friggin’ concept that you should be required to share your data freely, upon publication? Had you asked most scientists this question – ‘Should you release data on publication?’ – most would agree. Unfortunately we live in a fucking world where talk is cheap, and when somebody actually tries to enforce this tenet, even if the tenet is not water-tight, everybody backs down.. everybody thinks of reasons why they could not possibly ever do it..
This is basically what happened.. Today (or was it yesterday) PLOS released a policy statement that essentially requires people to deposit data – read the post here. Now you would have thought that the Viking apocalypse itself had occurred, judging by the magnitude of the shit-storm response, including a myopic blog post by DrugMonkey and a laughable article in ‘The Scientist’. I get it that there are issues, especially with the sharing of massive amounts of data, but look – these are corner cases. Look at the last issue of your favorite journal.. How many papers contain data that could not be shared on Dryad/Figshare/NCBI/SRA or whatever? The number is VERY small.. That is not to say that there are no exceptions, but again, these are corner cases..
Another ‘major’ objection is the definition of ‘raw data’, which is what is to be released.. Again, these are corner cases.. What proportion of PLOS papers critically depend on 2839642983 hours of high-res video, or some other more obscure data type? A few probably, but not many. There is a grey area here – I see that.. Do I submit raw output from fancy machine X, or its slightly more useful compiled format? Whatever, people.. whatever.. Do what you’ve always done, and see where it takes you.. How about this: make an honest effort to make the data accessible and useful to others, and chances are you’re good to go. Many people do this currently, and for them this policy change should be no problem..
There are other objections – one type is the ‘my raw data are so damn special that nobody can ever make sense of them’, while another is ‘I use special software and stuff, so they are probably not useful to anybody else’. I call BS on both of these arguments. Maybe you have the world’s most complicated data, but why not release them and not worry about whether or not people find them useful – that is not your concern (though it should be). Remember, the policy is not ‘make the data available so that everybody can use it easily and with minimal effort’, but simply ‘share’. This does require extra effort, but data curation is part of the job.
Look, change is hard, and there will be challenges in implementing this policy. PLOS is clearly trying to blaze their own trail, something which they have done previously with great success. Do I think this is a perfect policy – absolutely not. Do I think it’s better than what we have now – yes. How many of us have been hindered by ‘data not available’? I know I have, and I don’t think I’m unique in that regard.
So, I challenge you, dear reader: go to any of the PLOS journals and look at the last month of publications.. How many of these contain data so super-special or large that they could not be posted? Who knows, maybe I’m wrong.