Tuesday, February 28, 2012

How do teachers measure up?

There is a big push to improve teacher accountability and rating systems, and with good reason.  Many teachers get almost no feedback, rated either "satisfactory" or "unsatisfactory."  Feedback is often not continuous and does not feed continuous improvement: the feedback may occur once a year, it may be sporadic, and it may be reduced in frequency as one becomes a more experienced teacher.  There is often a very limited amount of "data" that goes into evaluations, and there may often be a limited number of evaluators feeding into the assessment.

Michelle Rhee's The New Teacher Project proposed a new framework for teacher evaluations (pdf) that is largely based upon her very pro-charter, pro-standardized testing agenda.  It has a lot of great points, but it also is based upon some pretty egregious flaws.

The framework is based upon the idea that every teacher should be excellent, that several years of excellent teaching can bridge the gap between poor and wealthy students' performance, and that teachers need to be evaluated in a more rigorous way that focuses most heavily on the improvements made by their students.  On its face, this sounds very reasonable.

The framework opens by citing a variety of studies that show how important great teachers are, which is certainly true.  Two of the studies it cites, however, paint an incomplete picture:
  1. A 2006 study by Gordon, Kane and Staiger found that "having a top-quartile teacher rather than a bottom-quartile teacher four years in a row could be enough to close the black-white test score gap."
  2. A 2002 study by Rivkin, Hanushek and Kain found that "having a high-quality teacher throughout elementary school can substantially offset or even eliminate the disadvantage of low socio-economic background."
These two studies, which are perfectly valid on their face, rest on some important assumptions that go unstated:
  1.  Teacher quality is static - Northwestern University's Helen Ladd (pdf) evaluated teachers (2008) in the highest and lowest quintiles according to student assessments and found that most highly-effective teachers one year were not highly effective the next and many ineffective teachers were no longer ineffective the next.
  2. An effective teacher is effective for all groups of students s/he teaches - If you are given a class that is significantly less prepared than the last one you taught, your performance may not improve and even a great teacher might fail to produce the desired gains.
  3. Student gains and failures can be universally attached to the performance of a given teacher - Economist Jesse Rothstein surveyed data for 99,000 5th graders in NC and performed a statistical test asking "What effect do fifth grade teachers have on their students' 4th grade performance?"  Obviously, the effect should be 0, as they had yet to teach the kids.  Nevertheless, he tested three different "value added" measures and in all cases found that fifth grade teachers had an enormous impact on their students' test scores before they had even taught them for a day. There is obviously a flaw with the use of such value-added tests if such preposterous results are statistically significant. 
  4. The results of a good teacher can be added up as the two studies suggest - Diane Ravitch notes in her The Death and Life of the Great American School System that "nowhere was there a real-life demonstration in which a district had identified the top quintile of teacher, assigned low-performing students to their classes, and improved the test scores of low-performing students so dramatically in three, four, or five years that the black-white test score gap closed."
  5. Bridging the socio-economic gap is sufficient and means that our students are receiving a quality education:  plenty of white students from higher income backgrounds are doing terribly.  Bridging the achievement gap between rich and poor is a start, but certainly not the "destination" if a quality, world-class system is our goal. 
Now, the framework then continues with six key characteristics of a "good evaluation system":
  1. All teachers should be evaluated at least annually;
  2. Evaluations should be based on clear standards that prioritize student learning;
  3. Multiple sources of data should be considered, especially those measuring students' academic growth;
  4. Multiple rating levels should be used to better differentiate teacher effectiveness;
  5. Ratings should encourage regular, ongoing, and constructive feedback; and,
  6. Evaluation outcomes must have teeth, that is, they should feed into teacher employment, bonuses, and pay.
I agree in general with the first five, though I see some constraints to the sixth characteristic.

All teachers should be evaluated annually
I think the more feedback a teacher can get from different evaluators during different types of lessons over the course of a year, the more useful a tool the evaluation can be.  This seems like a great basis for an improved evaluation system that all teachers can use to improve.

Base evaluation on clear standards, emphasize student learning
Any evaluation, to be fair, should be based upon very clear standards with limited room for interpretation.  I agree also that they should be based on student learning, but I would urge caution in operationalizing the concept.  I think having impartial master teachers and principals observing or conducting a pop quiz to see if lesson plans are having an effect on a student's learning would get at this a lot better than using standardized tests.  Further, it would give teachers the freedom to teach a diversity of lessons that cover materials that are of extreme import but not necessarily on a standardized test.  The document does identify some opportunities like having a master teacher come in and note how many kids raise their hand or seem to "get" the material presented, though it does express a lot of support for the use of standardized tests.

Multiple sources of data should be used, focused on student growth measures
Diverse data--both in type and person evaluating--is critical to getting a more balanced assessment of a teacher's performance.  The focus on standardized tests is problematic as student performance on tests can vary and these tests may not test material that is all that worthwhile to know (or they may not test many subjects).  There is a further issue: a successful goal, according to the Harvard Business Review, is one that is concrete, that you can identify clearly when you have fulfilled it, and that is not dependent upon others.  Setting a goal for teachers that is dependent upon someone else (their students) is somewhat unfair.  Worse yet, these tests are not designed to test teacher performance.  They lack the external validity to be repurposed in this manner.  A standardized test designed for teachers would at least have the validity necessary to make it an appropriate measure.  Additionally, if a test is administered mid-year, are the gains (or lack thereof) attributable to the current teacher or the previous ones? This is not clear. As I noted, I would prefer multiple observations and student and parent feedback.

Multiple rating levels and ongoing feedback
This is indeed preferable because it turns evaluations into a tool for teacher encouragement and feedback rather than a narrow filter to remove only the worst teachers.  Further, if the ratings are meaningful and accompanied by concrete feedback, they give teachers the actual tools to look at how they might improve, and they let the school perhaps pair up that teacher with resources to help with their weaknesses. The more regular, the better.

Tie teacher ratings to their pay and employment 
All of the studies that I noted earlier should make us very cautious about this.  If teachers drop in and out of the highly-effective category (and the ineffective category) between years, then you need to be cautious about wantonly firing or punishing someone for doing poorly one year or rewarding someone who anomalously does brilliantly one year.  I think a more appropriate sixth metric would be to use student data and teacher performance data to try to determine what kinds of students a teacher teaches most effectively for future class assignments to try to set up student and teacher alike for success.  I recognize that this may not be realistic or might be logistically quite challenging, but it might be interesting to see what limits there are to this idea in practice.


In the end, my preference is for a rating system that looks a little different:
  1. Monthly evaluations by an independent master teacher (15%)
  2. Quarterly evaluations by school administrators/principals with experience in the classroom (15%)
  3. Semester evaluations by external education evaluation experts (15%)
  4. Round robin evaluations in which teachers evaluate their peers (10%)
  5. Amount and quality of efforts made by the teacher to improve on areas identified in previous observations (15%)
  6. Evaluations of student portfolios that look at growth on the subjects taught (10%)
  7. Use of interviews to get randomized student and parent opinions of teacher (10%)
  8. Performance on testing that can be attributed to that specific teacher, is included in that school's curriculum for the year, and that is value-added in nature (10%)
The weightings are my personal preference, but I think they reflect the importance of regular, directed feedback being used by a teacher for continuous improvement.  Ultimately, consistently bad teachers should be counseled out, but those teachers who have a decent record of doing well should be retained and efforts should be made to match them with students with whom they will succeed (insofar as is possible while still giving the most difficult to teach students access to quality teaching).
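
To make the arithmetic behind those weightings concrete, here is a minimal sketch in Python of how the eight components might be rolled up into a single rating. It assumes, purely for illustration, that each component has already been scored on a 0-100 scale; the component names, the `composite_rating` function, and the example scores are all hypothetical and are not part of any real evaluation system.

```python
# Weights from the list above, expressed as fractions of the final rating.
# The dictionary keys are hypothetical labels for the eight components.
WEIGHTS = {
    "master_teacher_monthly": 0.15,
    "administrator_quarterly": 0.15,
    "external_expert_semester": 0.15,
    "peer_round_robin": 0.10,
    "improvement_effort": 0.15,
    "student_portfolios": 0.10,
    "student_parent_interviews": 0.10,
    "attributable_value_added_tests": 0.10,
}


def composite_rating(scores):
    """Return the weighted average of the component scores.

    Assumes every component is scored on the same 0-100 scale
    (an assumption of this sketch, not something the post specifies).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up scores for one teacher, for illustration only.
example_scores = {
    "master_teacher_monthly": 80,
    "administrator_quarterly": 70,
    "external_expert_semester": 75,
    "peer_round_robin": 90,
    "improvement_effort": 85,
    "student_portfolios": 60,
    "student_parent_interviews": 95,
    "attributable_value_added_tests": 50,
}

print(round(composite_rating(example_scores), 1))
```

Because test results carry only 10% of the weight in this sketch, the low test score in the example pulls the composite down by a few points rather than dominating it, which is the point of spreading the weight across observations and feedback.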

Speak out!

How would you evaluate teachers?

What are some interesting evaluation methodologies or criteria that you have seen/experienced?  

2 comments:

  1. I agree with practically everything you say and think it gives some great insight. My best friend is a middle school teacher in a small district in Tucson. While most of the schools fall under the Tucson Unified School District, her small district is separate, low income, emphasizes their highly involved teacher evaluation/observation system and consequently has a high performance rating quite unusual for its socioeconomic status. I can try to get some information for you about the model they use, as I am curious myself to find out more about it.

    I am interested to hear an expansion on your thoughts about one of the things you mentioned - pairing teachers and students more effectively. I've been facilitating budget forums over the last few weeks within various school districts around the area and one of the major complaints of the teachers is increasing class sizes. They communicated that they struggle with the class they already have due to the varying level of each student's education. This led me to wonder why grade-levels in school are entirely age based. Children start their education at home with their parents and we all know there are varying levels of involvement by said parents. This seems to influence what age children are school-ready. One teacher I talked to fervently expressed that she spends far too much class time attending to kids who need higher or lower level attention. Now, if kids were evaluated before entering the district (or school year) to be placed in a class at their level while completely ignoring the age component, each class would have a better balance of learning stages and the teacher could adapt better. It seems like teachers would be able to teach a like-minded and equal group more effectively in addition to the peer to peer learning improvement students would experience with kids at their level.

    There is also a district around here that used to group kids with similar test scores within their grade into the same class under one teacher so that most of the students in a classroom were at the same level. I guess with budget cuts, this was done away with since it cost extra resources and wasn't mandated by the state. The teachers I talked to in this district seemed to miss it greatly.

    Do you know of any schools that do any of the above?

    Replies
    1. Sabre,

      I would love to hear more about that model! I feel like half the battle is a constructive approach. Of course there are some teachers, unions, administrators, and civil society organizations that will be unhappy regardless of what you put out there. That said, it isn't clear what a perfect system is, and even if it were, I don't think we'd get it 100% of the time. The idea that a person is going to develop a great working relationship with 30 new people each year is absurd, and would not be expected in the private sector. To expect it of teachers seems wrong-headed to me. We want to maximize their effectiveness and ability to bring out the best in students, but sometimes people just have very different styles. I had several "good" teachers that just did not teach in the right way for me. I also did well with some bad teachers.

      As for my suggestion for matching teachers and kids...that is a tough one. It would be quite resource intensive, and it's not clear what criteria would be best. I think removing age and social promotion is probably a good start for matching student with learning level, though we'd need to be careful to watch for students that are not advancing. I am less comfortable with grouping kids by test scores for a few reasons: (1) it inflates the importance of material covered on tests, (2) it makes tests seem like a measure of worth, (3) not all subjects are tested and it's not clear how those would factor in, (4) how would you place inconsistent performers who do amazing on math but poorly on reading, and (5) tests do not test readiness to learn. I also think tracking is really problematic in general, especially at a young age when your skills and interests are limitless. I think general readiness to learn might be the best way, but it's not clear how to operationalize that idea.
