Hi all
Over the past couple of months I have been discussing two important issues: extraordinary claims and flawed data sets. In the latter case, it seems pretty obvious that if the data set is garbage, then the analysis which flows therefrom is also garbage. You can't make a silk purse out of a sow's ear, &c. Here is an interesting example of flawed data and the flawed conclusion that follows from it:
> The idea is they’re going to take in a bunch of crime data and try to predict where the next crime will happen. But, the problem is, we don’t really have crime data. What we have are proxies for crime data, and they’re all terrible. Some cities use arrest data as proxies for crime, and some use reported crime. New York City uses reported crime. Other cities use arrest data.
> They say, “As a proxy for where the crime is, we’ll look at where people have been arrested for crimes in the past.” The problem, though, is that most crimes never get reported, and most crimes never lead to arrests. So when you’re trying to use the best proxy that you have to actually locate and predict crime, you’re just using a very flawed data set.
> And when you’re using that kind of data set with overpoliced neighborhoods to look for future crime, you’re going to end up reinforcing this pattern of overpolicing, so, for example, even though white people use marijuana at higher rates than Blacks or Latinos, 86% of all pot busts in New York this past year were of people of color. That’s a statistical farce.
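To make the feedback loop in that passage concrete, here is a minimal Python sketch. It is a toy model of my own, not anything taken from a real predictive-policing system: the function names, the two neighborhoods, the 60/40 starting split, and the 0.1 crime rate are all made-up assumptions. Both neighborhoods have the same true crime rate, but patrols are sent wherever past arrests were recorded.

```python
def simulate_arrest_feedback(true_crime_rate, initial_arrests,
                             rounds=10, patrols_per_round=100.0):
    """Return cumulative recorded arrests per neighborhood after each round.

    Toy model (assumption): patrols follow past arrests, and new arrests
    are patrol presence times the true crime rate.
    """
    arrests = [float(a) for a in initial_arrests]
    history = [list(arrests)]
    for _ in range(rounds):
        total = sum(arrests)
        # Patrols are allocated in proportion to past recorded arrests (the proxy).
        patrols = [patrols_per_round * a / total for a in arrests]
        # You only record an arrest where you have a patrol to make it.
        new_arrests = [p * r for p, r in zip(patrols, true_crime_rate)]
        arrests = [a + n for a, n in zip(arrests, new_arrests)]
        history.append(list(arrests))
    return history


def arrest_share(arrests):
    """Fraction of all recorded arrests that falls in each neighborhood."""
    total = sum(arrests)
    return [a / total for a in arrests]


if __name__ == "__main__":
    # Identical true crime rates; only the starting records differ.
    history = simulate_arrest_feedback([0.1, 0.1], [60, 40])
    for round_no, arrests in enumerate(history):
        print(round_no, [round(s, 3) for s in arrest_share(arrests)])
```

In this sketch the recorded share never drifts back toward 50/50, and the absolute gap in arrests keeps growing, even though the underlying rates are equal. The records reflect where the patrols were sent, not where the crime is.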
How does this relate to beekeeping? Well, for starters, BIP data. I remember when the idea of collecting beekeeper data was first presented. The PI stated that there were many beekeepers "out there" who were succeeding and just as many who were failing -- all we needed to do was get the information from group A to group B. Ten years later, piles upon piles of data have accumulated, but what do we know? About the same, only more so.
Just as unsupported claims can be dismissed without any counterevidence, conclusions based upon flawed data sets can be dismissed without any further analysis once it is shown that the data are skewed, biased, or otherwise compromised. The key to statistics is to design data collection in a manner that avoids these pitfalls. Otherwise, what you have is propaganda.
PLB
***********************************************
The BEE-L mailing list is powered by L-Soft's renowned
LISTSERV(R) list management software. For more information, go to:
http://www.lsoft.com/LISTSERV-powered.html