KDnuggets : Polls : Data preparation (Oct 2003) 64% of respondents said that 60% or more of the time was spent cleaning the data. Interestingly, 8% said less than 20%.
"Karl Brazier, The blip at the bottom
Suspect there may be a small peak at the bottom end caused by model induction researchers like myself. Doesn't mean we don't think cleaning is important, just that our remit is to focus elsewhere. So we'll just have one or two new data sets to clean at the start of our work and probably supplement these with some of the cleaner sets from the UCI Repository. At the time of writing, I think I see this blip beginning to form. Well anyway, there's the offer of an explanation if it does."