BEE-L Archives

Informed Discussion of Beekeeping Issues and Bee Biology

BEE-L@COMMUNITY.LSOFT.COM

From: James Fischer
Reply To: Informed Discussion of Beekeeping Issues and Bee Biology
Date: Fri, 15 May 2015 16:24:11 -0400

> Are any of us experts at statistical analysis?
> Maybe someone who is could explain to us how our thinking is wrong or
> correct.

As Michael Franti and Spearhead sang, "The more I see the less I know", but
I can explain some.

I think that it is unfair and a little reckless to diss the Bee Informed
gang without first asking what sort of statistical analysis they are using.

Of course there is selection bias - it is inherent in the science of
surveying!

There is a very rich set of tools for dealing with selection bias.
The social sciences use a lot of surveys, and they like math even less than
beekeepers.
(I dunno why beekeepers don't like statistics, as managing hundreds of
beehives is a natural fit for statistical methods.)

First, it is crucial to look at systematic differences between survey
participants and non-participants on selected variables (the selection
model).  
But wait, how do we find out anything about non-participants? 
We go bug people - we call a few hundred people who DID NOT respond to the
survey, and beg them to answer a few quick simple questions.
That's enough of a sample of non-participants for the work we want to do.

For fun, let's pick these variables:

a)  Rural vs Suburban/urban
b)  Under a dozen hives vs over 100 hives
c)  Have never taken any beekeeping course,  have taken classes, or are
"certified" by one group or another at one level or another

We want to distinguish between the binary dependent variable in the
substantive model (i.e., serious losses or not) and the dependent variable
in the selection model (i.e., participation or non-participation),
which is also a binary dependent variable.  

Just off the top of my head, I could use a Pearson or other form of chi
square for (a) and (b), and for (c) which has multiple possible values,
rather than being binary, I'd pick something like a Mann-Whitney test.
Then if we get a significant result, we can say that one demographic or
another was "under-represented" in our survey, or make other statements
about participants vs non-participants.
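
To make the chi-square step concrete, here is a minimal sketch in plain Python for variable (a). The counts are invented purely for illustration (they are not from any real survey), and the function name chi_square_2x2 is my own. It computes the Pearson statistic without a continuity correction and uses the fact that, with one degree of freedom, the upper-tail p-value equals erfc(sqrt(chi2 / 2)).

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts (made up for illustration only):
#                     rural   suburban/urban
# participants          120        180
# non-participants      110         90
chi2, p = chi_square_2x2(120, 180, 110, 90)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

A small p-value here would suggest that rural and suburban/urban beekeepers did not respond at the same rate, which is the "under-represented" kind of statement described above.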

But these "bivariate comparisons" have limits. Most times, one detects
selection bias using one form or another of a "binary probit regression"
model.
In my own experience, everyone runs a logit model first, and then uses the
resulting likelihood value to decide between using a logit or probit model,
but I've yet to find a dataset where one curve was massively different from
the other, so I just run a probit and am done with it.
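
A quick way to see why the two curves rarely disagree much is to compare them directly. The sketch below uses only the Python standard library. The 1.702 scaling factor is the commonly quoted constant for lining up the logistic curve with the standard normal curve; it is my assumption here, not something estimated from survey data, and the function names are my own.

```python
import math
from statistics import NormalDist

def probit_curve(x):
    """Standard normal CDF, the link used by a probit model."""
    return NormalDist().cdf(x)

def logit_curve(x, scale=1.702):
    """Logistic CDF, rescaled so its spread roughly matches the normal curve."""
    return 1.0 / (1.0 + math.exp(-scale * x))

# Compare the two curves on a fine grid from -6 to +6.
grid = [i / 100 for i in range(-600, 601)]
max_gap = max(abs(probit_curve(x) - logit_curve(x)) for x in grid)
print(f"largest gap between the two curves: {max_gap:.4f}")
```

The gap stays under one percentage point of predicted probability across the whole range, so the choice between the two mostly changes the scale of the coefficients rather than the fitted probabilities.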

Then once one detects some selection bias, one can correct for it with
two-stage Heckmans (I am likely admitting to being a dinosaur just by
calling them Heckmans), and the more sophisticated current versions thereof.
These are great for working with the simultaneous estimation of two multiple
regression models (e.g., "Are losses higher among those with under a dozen
hives vs those with over 100 hives?").
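
The piece that links the two stages in the textbook two-step version is the inverse Mills ratio: stage one is a probit for participation, and the ratio computed from each respondent's fitted probit index is added as an extra regressor in stage two. The sketch below shows only that correction term, not a full estimator; the index values are hypothetical and the function name is my own.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def inverse_mills_ratio(z):
    """Return phi(z) / Phi(z) for a stage-one probit index z."""
    return _STD_NORMAL.pdf(z) / _STD_NORMAL.cdf(z)

# Hypothetical probit index values for three respondents (illustration only).
for z in (-1.0, 0.0, 1.5):
    print(f"index {z:+.1f} -> correction term {inverse_mills_ratio(z):.3f}")
```

Respondents who were unlikely to participate (a low index) get a large correction term, and those who were nearly certain to participate get one close to zero.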

But these days, one uses software like SAS, SPSS, or R.
No one sits down with a calculator and does this sort of work by hand any
more.
All the tests are hidden by so much abstraction, most research teams get
themselves a "stats person" just to crunch the numbers, and to keep everyone
honest as an "auditor".
I have to admit that SAS is worth every penny of their astronomical license
fees.

A funny story:  a good friend and researcher who reported to me was honored
with a highly prestigious award, and was asked to give an after-lunch talk
to explain his breakthrough. 
I came to the talk of course.
He had a lot of statistics in his slides, so he started off with a joke:
"Did you hear the one about the statistician?"
I couldn't help myself.  I said in a stage whisper "Probably!"
Everyone kept giggling through the whole talk, and he didn't accept my
apology for a solid week.
