BEE-L Archives

Informed Discussion of Beekeeping Issues and Bee Biology

BEE-L@COMMUNITY.LSOFT.COM

Subject:
From: Blair Christian <[log in to unmask]>
Reply To: Informed Discussion of Beekeeping Issues and Bee Biology <[log in to unmask]>
Date: Mon, 15 Jul 2013 12:50:33 -0400
Content-Type: text/plain
Parts/Attachments: text/plain (147 lines)
Re: Jerry:
I'm happy to do any peer reviewing, or better, to try to explain the
methodology in layman's terms (or in Lehmann's terms,
http://www.stat.berkeley.edu/obituaries/ErichLehmannLegRevised.pdf
sorry, I couldn't resist the bad statistics wordplay).  I find
that Wikipedia, while not perfect, does a better job of explaining the
boilerplate concepts than I can.  I provided links to some advanced
texts for the more cutting-edge material.  It's always hard to send
technical information to a diverse audience, so I'm happy to
entertain questions and clarify anything I wrote.

Re: Pete:

First, a note.  P-values are everywhere.  Historically, there are
three statistical traditions: frequentists, Fisherians, and Bayesians.
These days the frequentists and Fisherians are pretty much one group
(you'd be hard pressed to find a frequentist who isn't also a
Fisherian, but there was a big Fisher vs. Pearson feud a hundred years
ago).  Each of these traditions has a different "default" way of
testing hypotheses (likelihood ratio tests, p-values, and Bayes
factors, respectively).  P-values are more common than the other
methods for hypothesis testing, but they are not necessarily better.
I find some of the Bayesian approaches to hypothesis testing, which
report quantities like P(hypothesis k is true | data), to be much more
intuitive and meaningful...
http://stats.org.uk/statistical-inference/Lenhard2006.pdf
http://www2.fiu.edu/~blissl/PearsonFisher.pdf
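
(To make that concrete, here is a tiny sketch of my own, not taken from
the linked papers, of how a Bayes factor turns into the posterior
probability that a hypothesis is true.  The Bayes factor and the 50/50
prior below are made-up numbers, purely for illustration.)

# Converting a Bayes factor into P(H0 true | data), the kind of
# quantity described above.  Both inputs are hypothetical.

def posterior_prob_h0(bayes_factor_01, prior_h0=0.5):
    """Posterior probability of H0, given BF_01 = P(data|H0) / P(data|H1)."""
    prior_odds = prior_h0 / (1.0 - prior_h0)
    posterior_odds = bayes_factor_01 * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

# e.g. a Bayes factor of 1/3 against H0, with 50/50 prior odds:
print(posterior_prob_h0(bayes_factor_01=1/3))   # -> about 0.25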

With that in mind, there are two main ways to approach meta-analysis:
the frequentist/Fisherian approach (which we can just call frequentist
for now) and the Bayesian approach.  Jim Berger has given a great set
of lectures for at least 10 years now on how p-values and false
positive rates (alpha) are not the same thing.
http://www.stat.duke.edu/~berger/p-values.html
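
To see Berger's point numerically, here is a rough simulation sketch of
my own (not his applet).  The effect size, the sample sizes, and the
assumption that half the tested treatments do nothing are all invented
for illustration:

# Simulate many two-group experiments, half with no real effect, and ask:
# of the results that land just under p = .05, how many came from true nulls?
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments, n_per_group = 50_000, 30
true_effect = 0.5                                # assumed effect size when a treatment works
null_is_true = rng.random(n_experiments) < 0.5   # assume half the treatments do nothing

means = np.where(null_is_true, 0.0, true_effect)
control = rng.normal(0.0, 1.0, size=(n_per_group, n_experiments))
treated = rng.normal(means, 1.0, size=(n_per_group, n_experiments))

p_values = stats.ttest_ind(treated, control, axis=0).pvalue

barely_significant = (p_values > 0.04) & (p_values < 0.05)
frac_null = null_is_true[barely_significant].mean()
print(f"Of p-values landing between .04 and .05, {frac_null:.0%} came from true nulls")
# With these made-up settings that fraction comes out well above 5%
# (roughly 20-25%), which is Berger's point: p = .05 is not a 5% chance
# that the null hypothesis is true.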

Also, this is a great discussion of why more scientific studies don't
turn out to be reproducible:
http://www.stat.purdue.edu/symp2012/slides/Purdue_Symposium_2012_Jim_Berger_Slides.pdf
Bayesian hypothesis testing information:
http://cbms-mum.soe.ucsc.edu/lecture2.pdf



Meta-analysis.

Light background:

In statistics, we have this idea of controlling type I errors.  (What
follows is a frequentist take based on p-values, since that should be
the most familiar framing for this audience.)  Suppose you test a
hypothesis about varroa control: you have a treatment called A, you
set up a control group in a designed experiment, you spend 3 years
looking at outcomes, and you end up with data on differences in
3-year mortality between the hives treated for varroa and the
untreated hives (the control group).  You compute a p-value and get
.04.  Suppose your friend has an entirely different method for varroa
control (treatment B), does a hypothesis test, and gets a p-value of
.06.
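
For the curious, the calculation behind such a p-value might look like
the sketch below.  All of the colony counts are invented, and the exact
number it prints is not the point; the story only needs "a small
p-value."

# Hypothetical 3-year mortality counts: hives treated for varroa vs. the
# untreated control group, compared with Fisher's exact test on a 2x2 table.
from scipy import stats

#            dead  alive
treated_a = [  7,   23 ]     # invented counts
control   = [ 16,   14 ]     # invented counts

odds_ratio, p_value = stats.fisher_exact([treated_a, control])
print(f"treatment A vs. control: p = {p_value:.3f}")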

If you start with a false positive rate of .05, you would say that
your treatment showed a significantly different response (at the
alpha=.05 level), while your friend's method did not.  But in reality,
the statistical evidence for each varroa treatment was very similar.
[effects: Treatment A is approved and a paper about it gets published;
Treatment B gets shelved, and its paper is rejected because the
treatment was not significant at the alpha=0.05 level]

So Jerry comes along and says he's going to replicate both of those
studies.  He finds a p-value of .65 for your study and .07 for your
friend's study, so he says neither treatment shows a statistically
significant difference [at the alpha=.05 level].
[effects: Jerry doesn't think either one does any good]


Now, Pete comes along, and he has the results of 2 studies for each of
2 varroa treatments.  Pete performs a meta-analysis.  Pete carefully
checks the statistical assumptions and uses Fisher's method.  He finds
an entirely different outcome WITHOUT PERFORMING ANY NEW EXPERIMENTS.
Pete found that Treatment B had a statistically significant effect on
reducing mortality due to varroa, while Treatment A did not.
[effects: Pete is confused and is not sure if he should trust statistics]
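
Here is a sketch of Pete's combining step, using the p-values from the
story above and Fisher's method as implemented in scipy.  It assumes
the two studies of each treatment are independent:

# Fisher's method: X = -2 * sum(log p_i) ~ chi-squared with 2k degrees of freedom.
from scipy import stats

for name, pvals in [("Treatment A", [0.04, 0.65]),
                    ("Treatment B", [0.06, 0.07])]:
    statistic, combined_p = stats.combine_pvalues(pvals, method='fisher')
    print(f"{name}: combined p = {combined_p:.3f}")

# Treatment A: combined p is about 0.12 (no longer significant at alpha=.05)
# Treatment B: combined p is about 0.03 (now significant at alpha=.05)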


What has happened here?
If you think reporting about CCD gets confused by the media, you
should see how they deal with statistics.  The above scenario is
common and plays out in journals over and over again.  The fundamental
flaw is that users of statistics typically treat statistical evidence
as binary (accept/reject the hypothesis), when in reality there is a
continuous quantity underneath it all (let's just call it "statistical
evidence" or "likelihood of the outcome" or "likelihood").  This
scenario has been discussed over and over again in the statistics
community, so I'll refer you all to some papers instead of writing
something more off the cuff.

In short, a well done meta-analysis is much more valuable than a well
done single study, while a poorly done meta-analysis is misleading at
best.  The most important thing is to understand the statistical
assumptions and details, so that you can interpret the results of any
statistical tool correctly.


Here are some cut and pastes from some papers:
http://www.stat.purdue.edu/symp2012/slides/Purdue_Symposium_2012_Jim_Berger_Slides.pdf

"The incorrect way in which p-values are used:
“To p, or not to p, that is the question?”
  Few non-statisticians understand p-values, most erroneously thinking
they are some type of error probability (Bayesian or frequentist).
  A survey 30 years ago:
  “What would you conclude if a properly conducted, randomized
clinical trial of a treatment was reported to have resulted in a
beneficial response (p < 0.05)?
1. Having obtained the observed response, the chances are less than 5%
that the therapy is not effective.
2. The chances are less than 5% of not having obtained the observed
response if the therapy is effective.
3. The chances are less than 5% of having obtained the observed
response if the therapy is not effective.
4. None of the above
  We asked this question of 24 physicians ... Half ... answered
incorrectly, and all had difficulty distinguishing the subtle differences...
  The correct answer to our test question, then, is 3.”


http://www.stanford.edu/~gavish/documents/note_on_p_values.pdf
"Fisher's p-values are everywhere in empirical science.  Quoting a
recent provocative paper [1] which addressed the (irr)reproducibility
crisis in medical science, "Research is not most appropriately
represented and summarized by p-values, but, unforturately, there is a
widespread notion that medical research articles should be interpreted
based only on p-values".  Somewhata more belligerent is [2]: "And we,
as teachers, consultants, authors, and otherwise perpetrators of
quantitative methods, are responsible for the ritualization of null
hypothesis significance testing [...] to the point of meaninglessness
and beyond."

"This is a short review of the part philosophical, part statistical
part scientific discussion within the statistical community about the"

"Big Question: What is the correct way to quantify and weight
empirical evidence against a point null hypothesis"

             ***********************************************
The BEE-L mailing list is powered by L-Soft's renowned
LISTSERV(R) list management software.  For more information, go to:
http://www.lsoft.com/LISTSERV-powered.html
