ISEN-ASTC-L Archives

Informal Science Education Network

ISEN-ASTC-L@COMMUNITY.LSOFT.COM

Subject:
From:     Alan Friedman <[log in to unmask]>
Reply To: Informal Science Education Network <[log in to unmask]>
Date:     Sun, 24 Mar 2013 15:48:58 -0400
ISEN-ASTC-L is a service of the Association of Science-Technology Centers
Incorporated, a worldwide network of science museums and related institutions.
*****************************************************************************

I appreciate all the thoughtful and passionate discussion on assessment
and testing that Charlie Carlson began by pointing out the Stanford work.
Obviously I am a supporter of, and a participant in, assessment
development, so hearing these views and having the chance to hone my own
line of argument is very helpful.

Here are a few observations and comments on recent contributions to the
thread by Charlie, Leonie Rennie, Paul Orselli, Eric Siegel, Steve Uzzo,
Sarah Gruber, and Chuck Howarth.

Chuck and Paul describe horrible tests, which damage the education system
and do nobody any good except those who make money from the tests.  These
tests are often politically motivated, and are often based on the idea
that punishing poor performers, including students, teachers, and schools,
will somehow force them to get better.  Many of the tests Chuck and Paul
are describing were the result of the No Child Left Behind legislation.
(Harold Chapnick of the NY Hall of Science said the legislation mandating
these long annual tests should have been called "No Child Left with a
Behind.")

I agree entirely that there are horrible, useless, damaging standardized
tests.  But I don't agree that because rotten apples exist in a bushel we
should throw out the entire bushel, as well as all other bushels, stop
eating apples altogether, and campaign to rid the planet of apple trees.
We need to work hard, through politics, economics, PR, and threads like
this one on ISEN, to figure out how to keep apples from going bad and to
promote healthy apple-eating.  To see that it can be done
right, look at the state of Massachusetts, which has excellent assessments
it created to meet its No Child Left Behind mandates.  Massachusetts is
often at the top of US states in performance,  rivaling the best countries
in the world.  And yes, this is true even if you disaggregate the
Massachusetts data by socio-economic status (going back to the first
thread in our series).  Many states found ways to do this right.  Many
others did not, and some deliberately gamed the system for various
reasons, doing a disservice to students, teachers, and all of us.  Those
are the bad apples.

One language usage issue makes legitimate criticisms of particular
tests, like those Chuck and Paul have made, too easy to overgeneralize.
"Standardized testing" sounds as if the tests themselves are standard, so
that one standard test must be essentially the same as every other.  You
see one bad test, and deduce that all others must be bad too.  But the
"standard" refers to frameworks which define the content, and a consistent
manner in which a given test is administered and scored, not to the test
itself.  That means each test we encounter can be based on a different set
of standards from the next test we see.  And two tests can be quite
different despite both being based on the same set of standards.  In the
trade we talk about "standards-based tests," rather than "standardized
tests."  It is a fine distinction, perhaps, but one that we need to pay
attention to as we find there are better and poorer standards, and better
and poorer tests based on those standards.

There are also very valuable tests which are not high stakes and are not
used beyond an individual classroom.  These include daily or weekly
diagnostic tests teachers use to help themselves understand what they are
doing well and what they need to work on; what individual students are
getting and what they are not getting; and when to proceed to the next
stage of the syllabus.  Sometimes these tests are embedded in the
curriculum itself and are not disruptive of learning time in any way.

The assessments we have been discussing, NAEP, TIMSS, and PISA, are not
"high-stakes" tests like NCLB-mandated tests.  In fact NAEP, TIMSS, and
PISA cannot be high-stakes, because the design of these tests is based on
sampling students, and on each student taking only a sample of the test
itself.  That means NAEP, TIMSS, and PISA cannot and do not report
anything about how individual students, teachers, or schools are doing.
They can only report on large jurisdictions, such as nations, states, and
very large districts.  Unlike NCLB assessments, there is no legislation
assigning penalties and rewards for performance on NAEP, TIMSS, and PISA.
There may be psychic and indirect consequences for a nation's or big
jurisdiction's performance, but nothing like the rigid mechanical
structure of the high-stakes tests.  High-stakes testing can have good
uses, as Massachusetts and other states which created good tests have
shown.
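
For those who want to see the mechanics, here is a minimal sketch in
Python of how matrix sampling of this kind works.  The numbers are
invented for illustration and are nothing like the actual NAEP design:

    import random

    random.seed(42)

    # Hypothetical setup: a 120-item framework split into 10 blocks of
    # 12 items; each sampled student is assigned only 2 of the blocks.
    NUM_ITEMS = 120
    BLOCK_SIZE = 12
    BLOCKS = [list(range(i, i + BLOCK_SIZE))
              for i in range(0, NUM_ITEMS, BLOCK_SIZE)]
    BLOCKS_PER_STUDENT = 2

    def assign_booklet():
        """Give a sampled student a small random subset of the blocks."""
        chosen = random.sample(BLOCKS, BLOCKS_PER_STUDENT)
        return [item for block in chosen for item in block]

    # Simulate a sample of 1,000 students from a large jurisdiction.
    booklets = [assign_booklet() for _ in range(1000)]

    # No single student sees enough of the framework for a meaningful
    # individual score...
    print(len(booklets[0]), "of", NUM_ITEMS, "items per student")

    # ...but across the whole sample every item is covered, so the
    # jurisdiction as a whole can be reported on.
    covered = {item for booklet in booklets for item in booklet}
    print(len(covered), "of", NUM_ITEMS, "items covered in aggregate")

In this toy version each student answers only a fifth of the item pool,
which is plenty for estimating how a large group is doing and useless
for grading that one student, teacher, or school.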

Steve and Sarah note the weakness in searching for something only where
the light is good--we tend to build assessments of things which are easy
to assess, and this limits the tests' utility and omits important things
which are harder to measure.  This weakness explains why some tests look
only at rote memory and stress vocabulary, distorting education through
"teaching to the test" when the test itself is so limited.  Countering
these weaknesses is one reason why tests like NAEP, TIMSS, and PISA take
so long to develop and are so expensive, and why they must use sampling to
keep the tests affordable at all.  I can say that for NAEP we really do
try to look where the light is poor as well as where it is bright.  Please
peruse the newer science assessment and the even newer technology and
engineering literacy assessment, including the hands-on and interactive
computer tasks, and I'd love to hear if you think we are indeed searching
for student performance in vital but harder to measure areas
(http://www.nagb.org/content/nagb/assets/documents/publications/frameworks/science-2011.pdf;
http://www.nagb.org/content/nagb/assets/documents/publications/frameworks/prepub_naep_tel_framework_2014.pdf).

Thanks to Leonie for observations and the link to the actual PISA items.
Yes, take a 10-minute look and see if the items are as dumb as the
damaging cheap tests we all hate (hint:  they are not, IMHO).  I gave
links earlier so you can do the same for NAEP test items.  Thanks to
Charlie for remarks on the dangers of blaming the test when the bigger
problems may be rote curriculum standardization and teacher-bashing.

Eric reminds us not to make the mistake of interpreting test correlations
as causality, a fallacy I've often fallen into myself.  It is so tempting,
especially when the correlations are in the same direction as your own
theory of what is going on.  But correlations can be very valuable in
pointing to areas where a bit of rigorous causal research could be highly
fruitful.  I'm working on an analysis with Alan Ginsburg of the
correlations between NAEP science scores and background question responses
on items of student interest, out-of-school activities, and attitudes
towards science courses.  Definitely not causal information, but really
encouraging for stimulating research on the potential benefits and impacts
of informal science learning.
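
To make that distinction concrete, here is a minimal sketch in Python,
with invented numbers rather than actual NAEP data, of the kind of
correlation involved and of what it does and does not tell us:

    # Hypothetical paired data: a test score for each student, and that
    # student's reported hours per week of out-of-school science
    # activities (all numbers invented for illustration).
    scores = [265, 270, 281, 255, 290, 275, 268, 284, 260, 278]
    hours = [1, 2, 4, 0, 5, 3, 2, 4, 1, 3]

    def pearson_r(xs, ys):
        """Pearson correlation coefficient, from first principles."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy)

    print(f"r = {pearson_r(scores, hours):.2f}")
    # A strong positive r, but by itself it cannot say whether the
    # activities raise scores, whether higher scorers seek out the
    # activities, or whether a third factor drives both.  It only
    # points to where rigorous causal research might be fruitful.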

Charlie, I haven't looked at what's been happening to PISA scores for
upper SES groups. I know that in NAEP scores have been slowly rising for
nearly all groups for two decades.  Performance gaps between lower SES
groups and everybody else have been generally decreasing.  Best of all,
this gap-closing has not been because the
higher-performing groups have been going down, but because the previously
lower-performing groups have been rising faster, which is certainly good
news.  But the gaps are closing too slowly, not uniformly, and we have a
long way to go. NAEP can't tell us what is causing these trends (see the
paragraph above) but analyzing NAEP, PISA, and TIMSS data can tell us some
really good places to look for causes and for potential improvements in
the education system.  That is the bottom line for why I am defending good
tests (and joining the attack on bad ones).
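
The arithmetic of that pattern is worth a toy illustration (again, these
numbers are invented, not NAEP results):

    # Both groups rise; the lower-scoring group rises faster; so the
    # gap shrinks even though nobody's scores went down.
    years = [1996, 2005, 2011]
    higher_group = [160, 163, 165]  # hypothetical average scale scores
    lower_group = [130, 138, 144]

    for year, hi, lo in zip(years, higher_group, lower_group):
        print(f"{year}: higher={hi}, lower={lo}, gap={hi - lo}")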

Finally, I agree with all the commentary that there is just too much
testing.  But we should get rid of the bad tests, and keep the good ones.
Otherwise we would be relinquishing our best tools for knowing if our
total learning ecology is doing better, worse, or stagnating, for whom,
and in which circumstances.  Finland, by the way, has no mandated national
test, so they don't "teach to the test" as a nation, as Charlie points
out.  The Finns do participate in PISA and TIMSS sampling because they
find doing so valuable in order to learn how they are doing year to year,
and in relation to other nations.  Each school is free to do whatever
additional testing it chooses, and as much of it as it chooses.

Thanks for listening, everyone.

Alan    
________________________________________
Alan J. Friedman, Ph.D.
Consultant for Museum Development and Science Communication
29 West 10th Street
New York, New York 10011 USA
T  +1 917 882-6671
E   [log in to unmask]
W   www.FriedmanConsults.com
 
a member of The Museum Group
www.museumgroup.com


***********************************************************************
For information about the Association of Science-Technology Centers and the Informal Science Education Network please visit www.astc.org.

Check out the latest case studies and reviews on ExhibitFiles at www.exhibitfiles.org.

The ISEN-ASTC-L email list is powered by LISTSERV software from L-Soft. To learn more, visit
http://www.lsoft.com/LISTSERV-powered.html.

To remove your e-mail address from the ISEN-ASTC-L list, send the
message  SIGNOFF ISEN-ASTC-L in the BODY of a message to
[log in to unmask]
