ISEN-ASTC-L Archives

Informal Science Education Network

ISEN-ASTC-L@COMMUNITY.LSOFT.COM

Subject:
From: Alan Friedman <[log in to unmask]>
Reply To: Informal Science Education Network <[log in to unmask]>
Date: Fri, 22 Mar 2013 11:13:28 -0400
Content-Type: text/plain

ISEN-ASTC-L is a service of the Association of Science-Technology Centers
Incorporated, a worldwide network of science museums and related institutions.
*****************************************************************************

I usually agree with Dennis, but in this case I've got to disagree
strongly with his all-out attacks on national and international
standards-based testing.  Oh yes, these tests are fair game for rigorous
scrutiny.  If you don't like the message, it is entirely appropriate to
question whether the messenger has distorted the message, or if the message
itself is devoid of meaning.

Before I dispute Dennis' assertions, first my personal disclaimers:
Eight years ago I was asked to join the National
Assessment Governing Board (NAGB), a 25-year-old, Congressionally mandated
body that sets policy and oversees the National Assessment of Educational
Progress (NAEP).  I have learned a tremendous amount about
NAEP, and something about PISA and TIMSS.  I must also be one of the
villains in Dennis' version of the testing story, because for a couple of
years I have been the chair of the Assessment Development Committee of
NAGB.  My committee reviews every item on every NAEP assessment, and we
also oversee the creation of the Frameworks which define what's in the
tests.  My comments reflect my personal views, not necessarily those of
NAGB.  But now that I've described my own involvement and potential biases,
I beg to differ with nearly everything my friend Dennis says in his posts.

"Who decides what a 3rd grader should know?  Why some psychometrician
somewhere, of course!"

OK, I didn't know what a psychometrician was 8 years ago, but now some of
my best friends…  Actually, there are many psychometricians involved: in
NAGB, in the US Department of Education, which operates NAEP, and working
for the contractors whom NAGB hires to draft the assessments.  But
psychometricians are a tiny minority of the (literally) thousands of
people who are involved with creating the Frameworks, which describe what
NAEP is intended to measure.  Those thousands include teachers,
principals, superintendents, parents, pedagogy experts, curriculum
developers, business people, informal science educators, and in the case
of STEM, practicing scientists, engineers, and mathematicians.  The
process of creating a framework and the actual assessment which implements
it takes several years, and also involves thousands of students who
participate in iterative probe and pilot tests of every test.  This
includes interviews with students about what they think a given question
means, and why they responded the way they did to each item.  All the
Frameworks are public documents, open to review and critique
(http://www.nagb.org/publications/frameworks.html).

Part of creating a Framework involves carefully looking at what is in the
various individual state curricula, and what is in the various published
curricula and lab activities.  Certainly this is easier in countries which
have uniform national curricula, but we do look for common, grade-appropriate
content for each NAEP Framework, at the 4th, 8th, and 12th
grades.  As Dennis notes, it is impossible to get an exact correspondence
with what is taught in each classroom in the US, but I'll argue we do
produce tests which substantially reflect what students should and could
know and be able to do from their classes.  The big assessments
like NAEP can't and don't try to report results on the level of individual
students, classrooms, or schools, but rather they are designed to examine
large-scale performance and trends over time.  The Frameworks are products
of a large, painstakingly coordinated, open, multi-year effort of
stakeholders who are well aware of the issues.

"The more important question is why anyone feels like these standardized
tests are useful measures of anything BUT SES."

I think the best rebuttal is to look at those many, many major
results, on both national and international tests, which show dramatic
changes in student performance which are NOT related to SES.  Take for
example the famous case of Finland.  Finland has a highly homogeneous
population, at least in comparison to the US and many other countries.  A
decade ago it was well in the middle of the PISA rankings.  Now it is at
or near the very top every time.  Is that because the SES distributions in
Finland or in every other country it is compared with have drastically
changed?  Nope.  A better explanation is that Finland embarked on a
deliberate process to improve its educational system for all its students.
Change they did, and the assessments all tracked the
enormous improvements in their results, despite a stable SES distribution.

 We could also look just within the US:  we were at the top of
international assessments some decades ago, and now we are somewhere in
the middle of the pack.  Did our SES distribution change?  Yes, but
nevertheless our scores have actually remained pretty steady for several
decades, with modest, gradual improvements for almost every segment of our
population.  It is just that other countries improved even more.  Neither
Finland's rise nor our fall in the rankings is attributable primarily to
changes in SES distributions.

"That's why I like the 'how many legs does a spider have' example.  If you
were never taught it in school, and you get the question right, how do you
suppose you knew the answer?…"

If that were a representative item on NAEP, TIMSS, or PISA, I'd certainly
agree with Dennis that these assessments are grossly unfair for students
who did not get the chance to learn that specific content in their
classes.  But that isn't a representative item at all.  Take a look at an
assortment of the thousands of released items from NAEP or the other
assessments (http://nces.ed.gov/nationsreportcard/itmrlsx/default.aspx).
First you might be surprised to see that they are not all multiple choice,
but many are short or long constructed responses (students have to explain
their answers), and in science there are now hands-on items and
interactive computer scenarios.  Then you'll see that very few questions
are about recall of specific facts, and many more involve applying almost
universally taught basic principles and practices to new situations.
Some of my favorite items are about control of variables in scientific
experiments, which involve principles and practices common to nearly all
curricula in the US and other countries.  The particular assessment items
are almost never about a specific example students will have encountered,
but rather present an original experiment and ask students to critique
what its weaknesses might be, and how to fix them.  For an example of
this type of item and how students did with it, see page 40 of the NAEP
Science 2009 Report Card
(http://nces.ed.gov/nationsreportcard/pdf/main2009/2011451.pdf).  There
are dozens of examples of this kind of item in the released item bank.
And there are tools on the website which allow you to do your own
analysis of how students did, disaggregated by SES or dozens of other
characteristics (http://nces.ed.gov/nationsreportcard/naepdata/).

These assessments are imperfect, of course.  But if we are to improve
large scale education systems, we need to work hard to create the best
possible assessments, and to know their strengths and limitations.  I've
concluded that NAEP, TIMSS, and PISA are pretty darn useful, warts and
all.  I hope ISEN readers will avail themselves of all the resources
freely available to decide for themselves if these tools are as
simple-minded and corrupt as some of our colleagues are contending, or if
they are among the best tools available to help improve education.

Alan
 
________________________________________
Alan J. Friedman
T  +1 917 882-6671
E  [log in to unmask]

On Mar 21, 2013, Dennis Bartels <[log in to unmask]> wrote:
>Hi Charlie,
>In fact the findings and research are even more insidious and nuanced.
>There is a great deal of research on the subject going back 50 years,
>exemplified by the great debates of the 1960s, in the Great Society era,
>over a series of government studies done by Dr. Jim Coleman (i.e., the
>Coleman Studies), which essentially established that the factor most
>strongly linked to test scores, more than teachers or schools, was SES.
>It's been hotly debated ever since.  Here is the insidious part, and
>then the nuanced part.
>The insidious part is that for many years people treated these infamous
>studies as if demography were destiny.  In other words, as if children
>from poorer backgrounds cannot learn as well or as fast as kids from
>more affluent backgrounds.  However, later cognitive research and
>empirical studies demonstrating many counterexamples renewed our
>optimism and our belief that every human, no matter their background,
>has similar learning potential, excluding serious cognitive disorders.
>So SES can be and has been overcome many times over.
>Now the nuanced part.  Why the tight correlation between SES and
>standardized test scores (which, I note here, is NOT the same thing as
>learning or learning potential)?  Subsequent analysis of standardized
>testing has revealed that external-reference standardized tests (i.e.,
>tests that are not directly tied to the curriculum taught but created
>somewhere outside of schools) always have an SES bias, including PISA.
>Why?  Well, if the tests are not curriculum dependent (unlike, say,
>end-of-course exams or AP exams, which are highly curriculum
>dependent), who decides what a 3rd grader should know?
>Why some psychometrician somewhere, of course!  And if the tests are
>tied not to classroom learning but to external references, such as how
>many legs a spider has--if I was never taught that in school, who has a
>better chance of answering that question correctly: an affluent kid
>with access to all kinds of media and TV, or a poor kid who might not
>even have a TV or ever been to a summer camp or the Exploratorium?
>Of course we all know the answer to that question.  So ALL that
>standardized tests (read non-curriculum or non-school dependent tests)
>measure is a perfect proxy FOR SES!  Ta-da!
>It's amazingly circular and I'm stunned more people don't know about this
>consistent result and that researchers still feel a need to verify it!
>The more important question is why anyone feels like these standardized
>tests are useful measures of anything BUT SES.
>
>>> Hi Charlie,
>>> 
>>> My bigger point is that no external-reference exam can ever measure
>>>anything but SES, by definition.  That is to say, if the test is
>>>constructed by others outside of the direct learning environment
>>>(e.g., the classroom), there is no way to verify that the content of
>>>the questions was ever taught in the first place.  Therefore it is
>>>measuring something else.  This problem is only exacerbated with
>>>international tests.  Only those exams that are directly tied to the
>>>taught curriculum, such as the NY Regents end-of-course exams or the
>>>AP tests, come to mind--as the test items come 100 percent from the
>>>prescribed curriculum and presumably all students had the same
>>>opportunities to learn the same material.  If you are being tested on
>>>material you never were taught in school, and you get the question
>>>right, you either guessed right OR were exposed to the material
>>>someplace else, e.g., an out-of-school environment.  Perhaps you
>>>learned it from your parents, or from TV, or from something you read
>>>at your leisure, or from the internet, etc.; you get the point.  So
>>>these tests often better measure your outside exposure and
>>>experiences, which covary considerably with SES.  That's why I like
>>>the "how many legs does a spider have" example.  If you were never
>>>taught it in school, and you get the question right, how do you
>>>suppose you knew the answer?  There's good evidence elsewhere that the
>>>variation isn't as strongly correlated even with factors such as
>>>health, nutrition, or safety as it is with other factors in kids'
>>>backgrounds, very often the education level of their immediate family.
>>>Which, of course, is highly correlated with SES!  Not teachers or
>>>schools.
>>> 
>>> So the point is that no external-reference exam--no matter how well
>>>constructed--can rid itself of an SES bias, nor is it a good proxy for
>>>true learning, let alone teacher "effectiveness."  If you really want
>>>to measure learning, the instrument has to be constructed around and
>>>tied directly to the course or set of learning experiences actually
>>>engaged in by the learners.  When this is true, most SES differences
>>>decrease dramatically.

***********************************************************************
For information about the Association of Science-Technology Centers and the Informal Science Education Network please visit www.astc.org.

Check out the latest case studies and reviews on ExhibitFiles at www.exhibitfiles.org.

The ISEN-ASTC-L email list is powered by LISTSERV(R) software from L-Soft. To learn more, visit
http://www.lsoft.com/LISTSERV-powered.html.

To remove your e-mail address from the ISEN-ASTC-L list, send the
message  SIGNOFF ISEN-ASTC-L in the BODY of a message to
[log in to unmask]
