Have you ever participated in a survey or research study? It's hard to imagine that anyone would not have. Survey results cover the pages of most newspapers, Web sites, and just about any other medium. Most of them make me sputter in horror. Why? No, they're not scary. They are merely scarily bad - meaning that there are fatal flaws in the design that render the results DOA. Alternatively, the data are "analyzed" by people who don't know the difference between good and bad quality data, or do not realize that there is more to data analysis than knowing how to do the math.
Here's an example.
In December 2011, a study was published in the Proceedings of the National Academy of Sciences, purporting to show that the sound made by valuable, old Stradivarius and Guarneri violins was no better than that produced by brand-new instruments. Articles subsequently appeared, first in The New York Times ("In Classic vs. Modern Violins, Beauty Is in Ear of the Beholder") and then in Ars Technica ("Old, million-dollar violins don't play better than the new models"). The story was picked up and featured everywhere from NPR to The DW Academie.
There was just one small problem: the study was so badly done, and the statistical methods so inappropriate for the tiny, non-random sample (21 people attending one conference), that there was no basis for drawing any conclusions at all.
As I wrote to Ars Technica at the time:
.... the Big Mistake was to publish This Article under This Title (Old, million-dollar violins don't play better than the new models) without knowing anything about research design or inferential vs descriptive statistics. As a result, Ars Technica has shared with the public a piece of research that fails utterly to support the claim made in the title.
The number of subjects in the study is 21. This is too small a number of cases to merit computation of anything other than very simple descriptive statistics (e.g., the number of people in the group, their ages, and so on). You may note that the authors made no attempt to generalize beyond these 21 individuals, and neither should you.
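To make concrete how little a sample of 21 can support, here is a rough back-of-the-envelope sketch (mine, not anything from the study): even under ideal random sampling, which this study did not have, a proportion estimated from 21 people carries a very wide margin of error.

```python
import math

# Illustrative sketch only: 95% margin of error for a proportion
# estimated from n = 21 respondents, at the worst case p = 0.5.
# (These numbers are my hypothetical, not values from the study.)
n = 21
p = 0.5
margin = 1.96 * math.sqrt(p * (1 - p) / n)
print(f"+/- {margin:.1%}")  # roughly +/- 21 percentage points
```

In other words, even a perfectly drawn sample of this size could barely distinguish a 50/50 split from a 70/30 one, and this sample was not perfectly drawn.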
This is not to suggest that the investigator-authors did a fabulous job, either. For instance, I see no mention in the technical abstract (which I did download from the publisher's site) of 'inter-rater reliability,' which is a necessary computation in any research that asks people to rate and/or compare anything to anything else. Unless one has established, empirically, a reason to believe that, for each subject, there is consistency in ratings from one episode to another, one cannot tell whether or not one is receiving reliable reports - period. Moreover, the elaborate reporting of statistics like effect size, F, and p value may convey the appearance of probity. However, that appearance is highly misleading since the fundamental requirements for acceptable study design, implementation, and sample size have not been met.
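For readers unfamiliar with the term: one common way to quantify inter-rater reliability for categorical judgments is Cohen's kappa, which corrects raw agreement between two raters for the agreement expected by chance. The function and the two listeners' ratings below are purely illustrative; nothing here comes from the actual study.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Proportion of items on which the raters actually agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b.get(c, 0) for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical verdicts ("old" vs. "new" violin preferred) from two listeners:
a = ["old", "old", "new", "new", "old", "new", "old", "new"]
b = ["old", "new", "new", "new", "old", "new", "old", "old"]
print(round(cohens_kappa(a, b), 3))  # 0.5: moderate agreement at best
```

A kappa near 1 indicates consistent ratings; a kappa near 0 means the raters agree no more often than chance, in which case their reports tell you nothing.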
All in all, this "research" does not merit coverage in your fine publication, and it certainly does not merit the conclusions drawn either by this story's author or by the original authors.
In short, this kind of research is a classic case of "fools rush in where angels fear to tread." There's a lot of rushing going on these days, so watch your step!