Accidental Bracketology

May 29, 2014

I have been doing some form of the Pairwise Comparison-based presentation of the Nitty Gritty data ever since SEBaseball.com poster SetonHallPirate suggested that we borrow the idea from D1 Hockey. I was never one who tried to predict the committee's field - Mark Etheridge was the D1 Baseball Bracketologist - so the main intent of the report was to provide an alternative to the mundane "by RPI rank" sort sequence.

Since the report was always an input to the process, I never actually bothered to compare its final regular-season results to the selections and seedings until this year. I never expected it to do all that well at prediction since, as I said in my description of the report, the weights for the 15 criteria used are rather arbitrary: after head-to-head, record vs common opponents, and record vs top 50, they are mostly set to reduce the possibility of a pairwise tie.
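For readers who have not seen a pairwise report, here is a minimal sketch of the general idea, not the report's actual code. The three criterion names and their weights are hypothetical stand-ins for the 15 real ones, and head-to-head is left out because it needs data specific to each pair of teams. Each pair is compared criterion by criterion, the team with the larger weighted total gets a pairwise win, and the list is sorted by pairwise wins.

```python
from itertools import combinations

# Hypothetical criteria and weights, for illustration only.
# The real report uses 15 criteria with its own weights.
WEIGHTS = {"top50_wins": 3.0, "road_pct": 2.0, "conf_series_pct": 1.0}


def pairwise_winner(a, b, weights=WEIGHTS):
    """Return the name of the team winning the weighted comparison, or None on a tie.

    Each team is a dict with a "name" and one numeric value per criterion,
    where a larger value is better.
    """
    score_a = sum(w for c, w in weights.items() if a[c] > b[c])
    score_b = sum(w for c, w in weights.items() if b[c] > a[c])
    if score_a == score_b:
        return None
    return a["name"] if score_a > score_b else b["name"]


def rank_by_pairwise_wins(teams, weights=WEIGHTS):
    """Order team names by number of pairwise wins, most wins first."""
    wins = {t["name"]: 0 for t in teams}
    for a, b in combinations(teams, 2):
        winner = pairwise_winner(a, b, weights)
        if winner is not None:
            wins[winner] += 1
    # sorted() is stable, so teams tied on wins keep their input order.
    return sorted(wins, key=lambda name: wins[name], reverse=True)
```

Giving every criterion a distinct weight, as in this sketch, is one simple way to make an exact tie between two teams less likely, which is the stated purpose of most of the real weights.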

I did check this year, and to my surprise it did quite well. If the sort sequence had been used to select at-large teams and assign seeds, it would have picked 32 of the committee's 33 at-large teams and seeded 50 of the 64 teams the same as the committee (including the same national seeds).

To do the comparison I "unwrapped" the regionals to produce what would have been an S-curve, were that to actually exist. This is a pure fiction in baseball, but you can use it to order the teams and mentally replace 17-32 with 2, 33-48 with 3 and 49-64 with 4.

  Team                    Comm  PWM (RPI)  RPI  ISR
  Oregon State               1          4    6    1
  Florida                    2          1    3   13
  Virginia                   3          2    1    9
  Indiana                    4          7    2    4
  Florida State              5          3    4    7
  Louisiana-Lafayette        6          8    5    2
  Texas Christian            7          6   11    5
  Louisiana State            8          5    9    6

† Rice                       9         18    7   16
  Cal Poly                  10         15   16    3
  Mississippi               11         13   13   14
† Louisville                12         22   20   21
  Vanderbilt                13          9    8   10
  South Carolina            14         14   14   19
  Miami Florida             15         16   15   22
  Oklahoma State            16         11   18   11

  Nebraska                  17         31   26   29
  Texas Tech                18         19   17   18
  Maryland                  19         26   25   43
  Oregon                    20         27   23   12
  Kentucky                  21         17   19   25
  Washington                22         28   24   15
† Arizona State             23         33   38   24
‡ Texas                     24         12   12    8
‡ Houston                   25         10   10   17
† Dallas Baptist            26         45   28   37
  Mississippi State         27         32   32   35
  Alabama                   28         23   22   36
† Indiana State             29         34   21   39
  Arkansas                  30         25   33   27
  Long Beach State          31         29   29   26
† Nevada-Las Vegas          32         39   27   33

  UC Irvine                 33         43   43   31
  North Carolina            34         42   41   51
∗ Liberty                   35         -3   30   47
  Stanford                  36         44   44   32
  Kennesaw State            37         46   57   64
‡ San Diego State           38         30   40   23
‡ Sam Houston State         39         20   37   34
  Bryant                    40         48   47   59
  Texas A&M                 41         36   42   40
‡ Pepperdine                42         24   34   20
‡ Georgia Tech              43         21   31   46
  Kansas                    44         37   46   42
  Clemson                   45         40   49   50
  Old Dominion              46         41   36   52
† Columbia                  47         50   35   71
  Cal State Fullerton       48         35   54   28

  Binghamton                49         59  157  184
  Bethune-Cookman           50         60  208  216
  Campbell                  51         49   66   83
  Xavier                    52         54   99  136
  Kent State                53         57  126  124
  Jacksonville State        54         53   90  106
  Cal State Sacramento      55         55  133   74
  George Mason              56         56  121  138
  Southeastern Louisiana    57         52   75   79
  Siena                     58         61  196  227
  Jackson State             59         63  268  235
  Georgia Southern          60         51   89   88
  Youngstown State          61         64  272  273
  Bucknell                  62         58  112  155
‡ College of Charleston     63         47   53   58
  North Dakota State        64         62  248  234

† — teams seeded higher by the committee
‡ — teams seeded higher by the program
∗ — team not selected by the program
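The rank-to-seed replacement and the "seeded higher" categories can be sketched in a few lines. This is only an illustration: the function names are mine, and it assumes both the Comm and PWM values are positions from 1 to 64, so it does not apply to a team the program left out.

```python
def seed_line(rank):
    """Map a 1-64 position on the unwrapped S-curve to a regional seed (1-4)."""
    if not 1 <= rank <= 64:
        raise ValueError("rank must be between 1 and 64")
    # 1-16 -> 1, 17-32 -> 2, 33-48 -> 3, 49-64 -> 4
    return (rank - 1) // 16 + 1


def compare_seeds(comm_rank, pwm_rank):
    """Classify a team by whose ordering gives it the better seed line."""
    comm, pwm = seed_line(comm_rank), seed_line(pwm_rank)
    if comm < pwm:
        return "committee higher"
    if pwm < comm:
        return "program higher"
    return "same"


# A few (Comm, PWM) pairs taken from the table above.
rows = {"Rice": (9, 18), "Texas": (24, 12), "Vanderbilt": (13, 9), "Columbia": (47, 50)}
for team, (comm, pwm) in rows.items():
    print(team, compare_seeds(comm, pwm))
```

Note that a team can move several places between the two orderings and still count as seeded the same, as Vanderbilt does here, because only the seed line is compared.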

The at-large selections

The at-large cutoff fell after team #44 in the list, and Tennessee was 36th, so the algorithm would've had the Vols as a three-seed. That is not to argue that Tennessee should've gotten a bid - the sort order of the report does not override good judgement. In this case, the #30 non-conference RPI and 13 top-50 wins just don't overcome the 3-7 conference series record. (Idle thought - if Tennessee had won the home series against SEC #13 Auburn and a game in Hoover, might the SEC have gotten 10 at-large bids?)

The committee chose Liberty, whereas the algorithm might've chosen West Virginia. I say "might have" because humans using the report might've scratched the Mountaineers for the same reason as Tennessee, and San Diego for its poor finish to the season. Personally, I'd have taken a long look at UCSB, who had the same number of pairwise wins as Liberty.

The last four in would have been Old Dominion, North Carolina, UC Irvine and Stanford. Next out were West Virginia, San Diego, Liberty, UCSB, Mercer, UCF, Illinois, USC, East Carolina and NC State.
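Mechanically, reading the at-large field off the sorted list works something like the sketch below. It is an assumption about the procedure rather than the report's code: the function name is mine, the 33 at-large slots come from the count quoted earlier, and the team names in the usage example are placeholders.

```python
def select_at_large(ranked_teams, auto_bids, n_at_large=33):
    """Walk the sorted list, skipping automatic qualifiers, until the slots fill.

    Returns (selected, next_out): the at-large picks in order, and the
    remaining teams in the order they would have been considered.
    """
    candidates = [t for t in ranked_teams if t not in auto_bids]
    return candidates[:n_at_large], candidates[n_at_large:]


# Placeholder data: five teams, one automatic qualifier, two at-large slots.
ranked = ["Team A", "Team B", "Team C", "Team D", "Team E"]
selected, next_out = select_at_large(ranked, auto_bids={"Team B"}, n_at_large=2)
print(selected)   # the last entries are the "last in"
print(next_out)   # the first entries are the "next out"
```

This is only the mechanical step. As the Tennessee case shows, a human using the report would still strike teams from the list for reasons the sort order does not capture.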

One Seeds

As mentioned above, all of the National seeds were listed in the top eight of the final report. The algorithm would've had two differences in the remaining one seeds. It, like many of the bracketologists, would've had Texas and Houston instead of Rice and Louisville. In a show of the committee's sense of humour, Rice winds up with the toughest two seed in the tournament. Houston winding up at a National seed's regional seems cruel punishment for losing the series to the Cardinals.

Twos and Threes

Four of the committee's two seeds would've been threes had my computer been doing the work. Arizona State and Indiana State were the top threes according to the algorithm, so there's hardly a difference there. Dallas Baptist's and UNLV's resumes look much more like three seeds to me.

Those would've been replaced by three teams that look like top-half twos in the computer's eyes: Sam Houston (#20), Georgia Tech (#21) and Pepperdine (#24). San Diego State, at #30, doesn't look different enough from Arizona State for there to be too much of a complaint.

A little more worrisome is the committee's three seed that the computer would've had as a four. Fully sixty of the teams on the report had better resumes than Columbia, despite its being 35th according to the RPI. One would think a three seed would do better than oh-fer against the top 50, and winning the 24th-best conference is something a lot of those 60 would've done.

Fours

Which leads us to College of Charleston's seeding. The Cougars were 35th in non-conference RPI to the Lions' 73rd, the road records were nearly the same (.565 to .571) and winning the 13th-best conference counts more to me than the 24th. It is only Columbia's RPI that separates it from the other four-seeds, and one wonders if it should count so much.

That said, by pairwise wins CofC would only be the next-to-last three seed, so not all that much damage was done if these teams were mis-seeded.

If the standard to which the committee is held is the degree to which it applies the criteria stated in the handbook, this year's work compares very favorably with the job an unbiased automaton would do. Obviously anyone could give different weights to the criteria than my quasi-random arbitrary ones and get a different list, but then we're talking more a matter of taste, and I can't measure that.

In memory of
SEBaseball.com

Paul Kislanko