In a 5-4 decision on Monday, the Supreme Court, splitting along ideological lines, ruled that using a promotion procedure known to resegregate fire departments, even when other procedures are available, is not only reasonable but cannot be reversed by a city, as long as the rules were set out in advance and bear any relationship, however distant, to the requirements of the job. Overturning the opinions of all the lower courts that had reviewed the case, along with decades of legislation and case law since the Civil Rights Act of 1964, the court opined that the city was concerned about the results only because it was worried about lawsuits from black firefighters, and in so doing “turned a blind eye to evidence supporting the exams’ validity.” Thus, the underlying assumption of the majority decision is that an exam that produces racial differences that other tests do not produce is a valid exam.

If someone had demonstrated that a multiple-choice test is the best way to predict successful leadership in a fire department, no one could argue with the New Haven exam results. But no one has ever demonstrated that how well a firefighter answers questions on a bubble sheet predicts how well he wields a fire hose when confronted with a fire, let alone how well he can lead his fellow firefighters into burning buildings. Indeed, if Woody Allen studied a couple of books on firefighting and took the exam, I suspect he’d do a lot better than virtually all of the firefighters who passed it, regardless of their color. But frankly, I’d feel a lot more confident with firefighters of any race or color leading the charge into my burning home than with Woody Allen.

It’s one thing to know the “rules” of firefighting. It’s another to be able to move into action at a moment’s notice and actually perform the actions required to save lives and homes in a crisis. The distinction is between what cognitive scientists sometimes call declarative and procedural knowledge: that is, between knowledge you can consciously “declare” and knowledge that automatically “comes out” in your behavior. Imagine asking a tennis player and a physicist how a tennis player should adjust her feet and swing her racquet if the ball comes 35 percent faster and at a 40-degree angle to the right of what she had anticipated. I’d bet on the physicist in the oral exam to explain how she should adjust, but on the tennis player on the court to know how to adjust in real time.

There’s one other factor oddly missing from the procedure the city established to promote firefighters: actual performance. The best predictor of future behavior is generally past behavior. So it might have made sense to crack open the files of the firefighters who had applied for promotion and perhaps just take a peek to see who had blemished, unblemished, or distinguished careers, or to ask their supervisors and peers to evaluate them. None of these factors was even considered by the fire department in narrowing down the candidates for promotion.

The problem with looking at past performance, of course, is its subjectivity, and like any single measure, it would be vulnerable to bias. That’s why when scientists (and competent psychologists) want to understand something well enough to predict it, they take multiple measurements with the best instruments at their disposal, to see whether they converge on the same results. The best way to reduce error, particularly systematic error that can bias the results in one direction or another, is to use a combination of measures that differ substantially from each other (such as the field test, the written test, the oral test and an assessment of past performance for firefighters). Each of these is likely to get at some aspect of what’s needed for optimal performance, but they aren’t likely to share the same errors or the same direction of bias.