Data Mining as an Industry
Frank T. Denton
Review of Economics and Statistics, 67(1):124-127, February 1985.

Readers familiar with statistics, and particularly with the concept of Type I errors, will know that a researcher who keeps searching a data set until he or she finds a statistically significant result can all too easily find one that appears in the data just by chance. A researcher who then chooses to report this significant result is likely to make a claim that is false. Denton shows that, given the bias towards statistically significant results that academic journals typically exhibit, the mistake made by an individual researcher who inappropriately mines data is generally replicated by many researchers working on the same data set, even if none of them individually engages in data mining.
This paper is a formal counterpart to Steven Landsburg's hypothesis, "too true to be good".
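
The arithmetic behind this argument can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions and is not taken from Denton's paper: every null hypothesis is true, each test is independent of the others (tests run on a shared data set would in practice be correlated), and a "journal" publishes whenever at least one test is significant. The function names and the 5% level are hypothetical choices made for the example.

```python
import random


def prob_any_significant(alpha: float, n_tests: int) -> float:
    """Chance that at least one of n_tests independent tests of a true
    null hypothesis is significant at level alpha (assumed independence)."""
    return 1.0 - (1.0 - alpha) ** n_tests


def simulate(alpha: float, n_tests: int, n_trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo check of the formula above.

    Under a true null hypothesis a p-value is uniform on [0, 1), so each
    test is 'significant' with probability alpha.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # One trial = n_tests tests, whether run by one data miner
        # or by n_tests researchers who each run a single test.
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    for n in (1, 5, 20):
        exact = prob_any_significant(0.05, n)
        approx = simulate(0.05, n)
        print(f"{n:2d} tests: exact {exact:.3f}, simulated {approx:.3f}")
```

With a 5% significance level, the exact probability of at least one spurious "finding" is about 0.05 for one test, 0.23 for five, and 0.64 for twenty. The calculation is the same whether one researcher runs twenty tests or twenty researchers each run one and only the significant result reaches print, which is the sense in which honest individuals can collectively reproduce the data miner's mistake.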
