Consolidating Computer Rankings

March 16, 2015
With the games of 14 March, the field was finally connected, with a games-graph diameter of six. 18.4 percent of the 45,150 team-pairs were either opponents or opponents' opponents, and two-thirds were separated by a path length of three.

Kenneth Massey (http://www.masseyratings.com) compiles a College Baseball Ranking Composite report that includes human and computer rating-based rankings. I publish a consolidated version of the human rankings, and beginning March 17 will do the same for the computer ratings, with a few differences.

Actually, because computer ratings rank all of the teams, there are consolidation methods that can be brought to bear that aren't available for the human polls. Mapping multiple ordered lists into a single representative one is a voting problem, and there are some relatively simple voting methods that I apply to Dr. Massey's lists.

Borda
For each ranking, assign each team points equal to the number of teams ranked below the team. For lists 301 teams long this amounts to giving 300 points for first, 299 for second, and so on down to zero for last. A slight twist to this familiar approach is how ties are handled. If there's a two-way tie for second, say, the rankings begin 1, 3, 3 instead of 1, 2, 2. Ties are rare in advanced computer ratings, but can be common when their ordinal rankings are combined.

To calculate the Borda result, add the Borda counts for each team from all rankings, assign 1st to the team with the most points, 2nd to the team with the next-most points and so on, taking into account the convention for ties. Note that a ranking by descending Borda Count will be identical to ranking by increasing average rank.
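As a sketch (illustrative code, not the author's), the Borda consolidation might look like this in Python, with each ranking represented as a dict mapping team to ordinal rank:

```python
from collections import defaultdict

def borda_points(ranking):
    # Points = number of teams ranked strictly below this one. With the
    # "1, 3, 3" tie convention this is simply (number of teams) - rank.
    return {team: sum(1 for r in ranking.values() if r > rank)
            for team, rank in ranking.items()}

def borda_consolidate(rankings):
    # Sum each team's points over all rankings, then order by descending total.
    totals = defaultdict(int)
    for ranking in rankings:
        for team, pts in borda_points(ranking).items():
            totals[team] += pts
    return sorted(totals, key=lambda t: -totals[t])
```

Because the point totals are a monotone transform of average rank, this ordering matches the ranking by increasing average rank noted above.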

Bucklin
The Bucklin Majority rank for a team is the one for which a majority of the rankings have the team ranked at least that highly. For the human polls I use a "weak" definition of majority: "at least 50% of the rankings." For computer rankings I use the stronger "50%+1" definition.

This is the only summary that actually uses one of the rankings as its value. If there is an odd number of rankings, Bucklin is identically the median rank (for an even number of voters it is the higher/worse rank in the median calculation). For all the other composites, some arithmetic function is applied to the rankings and then the teams are ordered by the values of that function. Bucklin is my preferred ranking for human polls with their truncated ballots, but it doesn't work as well for computer rankings unless there are many of them. (It works fine for FBS football, for instance.)
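A minimal sketch of the Bucklin majority rank, with the choice of majority definition as a parameter (the function name and data layout are illustrative, not from the source):

```python
import math

def bucklin_rank(ranks, strict=True):
    # ranks: one team's rank on each ballot.
    # strict=True uses the "50%+1" majority (floor(n/2)+1 ballots);
    # strict=False uses the weaker "at least 50%" (ceil(n/2) ballots).
    n = len(ranks)
    need = n // 2 + 1 if strict else math.ceil(n / 2)
    for r in sorted(set(ranks)):
        # Smallest rank at which the team is ranked at least that highly
        # on a majority of ballots.
        if sum(1 for x in ranks if x <= r) >= need:
            return r
```

With an odd number of ballots this returns the median rank; with an even number and the strict majority it returns the worse of the two middle ranks, as described above.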

Instant Runoff Voting
IRV is properly a method for picking a "single winner" (the #1 team), using an algorithm that counts votes for #2, #3, etc. in successive rounds until some alternative has 50%+1 of the accumulated first-place votes. There isn't an easy way to turn the "true" IRV algorithm into a consolidation of ranked ballots, but a tweak to the Bucklin calculation that includes the right kind of tiebreaker produces an ordered list that is about the same as what one would get by running IRV for 1st, then IRV for 2nd among the remaining teams, and so on.

The actual calculation is just Bucklin with tiebreakers. The first tiebreaker for teams with the same majority ranking is the number of votes contributing to the majority. With a small number of "voters" this is not likely to break the tie, so the second tiebreaker is a "truncated" Borda Count - it only considers votes up to and including the majority rank.
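A sketch of that calculation, sorting by majority rank with the two tiebreakers (function name and representation are illustrative):

```python
def irv_like_order(ballots, n_teams):
    # ballots: list of {team: rank}; every ballot ranks every team.
    n = len(ballots)
    need = n // 2 + 1                      # "50%+1" majority
    def key(team):
        ranks = sorted(b[team] for b in ballots)
        maj = ranks[need - 1]              # Bucklin majority rank
        support = sum(1 for x in ranks if x <= maj)        # votes in the majority
        trunc_borda = sum(n_teams - x for x in ranks if x <= maj)
        return (maj, -support, -trunc_borda)
    return sorted(ballots[0], key=key)
```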

Pairwise Winning Percentage
Pwise is based upon the number of other teams that a given team is ranked better than. It derives from Condorcet methods that use a pairwise matrix in which the entry for row team A and column team B is the number of voters (ratings) that have team A ranked better than team B. The ranking is based upon a "pairwise winning percentage." For each (team A, team B) pair, count a pairwise win for team A (and a pairwise loss for team B) if #votes("team A is better than team B") is strictly greater than #votes("team B is better than team A"), and a pairwise loss for team A (win for team B) if the reverse holds. A pairwise tie for both A and B occurs when those vote counts are equal.

To rank the teams, calculate "pairwise winning percentage" in the usual way: for each team, (#pairwise wins + ½ × #pairwise ties) divided by the number of other teams. Note that for any particular pair of teams A and B, team B may have a "pairwise win" over team A yet team A may be ranked higher, because team A has more pairwise wins against the other N-2 teams than team B does.
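A sketch of the pairwise tabulation, assuming every rating ranks every team (names are illustrative):

```python
from itertools import combinations

def pairwise_pct(rankings):
    # rankings: list of {team: rank}. Returns each team's pairwise winning
    # percentage: (#pairwise wins + half of #pairwise ties) / (N - 1).
    teams = list(rankings[0])
    score = {t: 0.0 for t in teams}
    for a, b in combinations(teams, 2):
        a_better = sum(1 for r in rankings if r[a] < r[b])
        b_better = sum(1 for r in rankings if r[b] < r[a])
        if a_better > b_better:
            score[a] += 1.0        # pairwise win for A
        elif b_better > a_better:
            score[b] += 1.0        # pairwise win for B
        else:
            score[a] += 0.5        # pairwise tie: half a win for each
            score[b] += 0.5
    n = len(teams)
    return {t: score[t] / (n - 1) for t in teams}
```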

Geometric Mean
No one has ever suggested this as a voting method, but it is included because Dr. Massey and I independently observed that it has some nice properties with regard to consolidating rankings. Specifically it gives more weight to better ranks by any rating compared to the equal weights given by the arithmetic average (Borda).

For each team, take the nth root of the product of its ranks from the n rankings (equivalently, the antilog of the average logarithm of its ranks), and sort the teams in ascending order of that value.
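A sketch of that calculation (function name is illustrative). It also shows the weighting effect: a team ranked 1st and 10th has geometric mean √10 ≈ 3.16, placing it ahead of a team ranked 5th by everyone, even though its arithmetic average rank (5.5) is worse:

```python
import math

def gmean_order(rankings):
    # rankings: list of {team: rank}. Order teams by the geometric mean of
    # their ranks, i.e. the antilog of the average log-rank.
    k = len(rankings)
    gmean = {t: math.exp(sum(math.log(r[t]) for r in rankings) / k)
             for t in rankings[0]}
    return sorted(gmean, key=gmean.get)
```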

Correlating Rankings

There is no "right" computer rating compilation any more than there is a "right" computer rating, but there are measurements of how similar two rankings are (this is one of the benefits of considering only rankings that include all of the teams).

The simplest, and to me the most intuitive, is to just count how many of the ½×n×(n-1) pairs of teams are in the same relative order (concordant pairs) in the rankings being compared and how many are in reverse order (discordant pairs). The number of discordant pairs for two rankings is called the distance between them; it is the number of swaps of adjacent teams that would be required in either ranking to make it exactly match the other.

These can be used in a number of ways to form a ranking correlation with a value ranging from -1 (teams are ranked in exactly opposite order) to +1 (the lists are identical). One such correlation is Kendall's tau (τ) defined by

τ(rankA, rankB) = (#concordant pairs(rankA, rankB) - #discordant pairs(rankA, rankB)) / (½×n×(n-1))

where n is the number of teams ranked.
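A direct sketch of the pair-counting and τ (illustrative code, not the author's):

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    # Count concordant and discordant team-pairs between two full rankings,
    # each given as {team: rank}.
    concordant = discordant = 0
    for s, t in combinations(rank_a, 2):
        prod = (rank_a[s] - rank_a[t]) * (rank_b[s] - rank_b[t])
        if prod > 0:
            concordant += 1      # same relative order in both rankings
        elif prod < 0:
            discordant += 1      # reverse order
    pairs = len(rank_a) * (len(rank_a) - 1) // 2
    return (concordant - discordant) / pairs
```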
In week five there are six computers, and their τ-correlations are:

AvgDist  Comp     SAG      RT      MAS     KLK     MOR     NOL
3875.2   SAG              0.8721  0.8520  0.8467  0.8007  0.7645
4020.4   RT      0.8721           0.8529  0.8155  0.7865  0.7765
4080.4   MAS     0.8520  0.8529           0.8189  0.7849  0.7815
4716.8   KLK     0.8467  0.8155  0.8189           0.7738  0.6935
5404.8   MOR     0.8007  0.7865  0.7849  0.7738           0.6490
5987.2   NOL     0.7645  0.7765  0.7815  0.6935  0.6490
Obviously τ(rankB, rankA) = τ(rankA, rankB), since it doesn't matter which of the two rankings is sorted to match the other. Using average distance between rankings, in week 5 the Sagarin ranking is the most representative of the computer rankings.

When the size of the ranked list (n) is large the distances may seem "big," but for D1 baseball you have to compare them to a ½×n×(n-1) that is a lot bigger: 45,150 team-pairs for 2015's field of 301 teams. To make that easier to take into account, instead of looking at the distances one can look at the percentage of concordant team-pairs.

The percentage of concordant pairs is related to τ by %concordant = ½×(τ+1).
½×(τ+1) serves the same role in ordinal ranking correlations as r² does in correlations of real-valued lists: it maps values from the [-1, 1] interval to [0, 1] ([0, 100] when expressed as a percent).
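The mapping, and its inverse back to a distance, as a small sketch (names are illustrative):

```python
def concordant_pct(tau):
    # Map tau in [-1, 1] to the fraction of concordant pairs in [0, 1].
    return 0.5 * (tau + 1)

def distance_from_tau(tau, n):
    # Number of discordant pairs implied by tau for n teams (assumes no
    # tied pairs, so concordant + discordant = all pairs).
    pairs = n * (n - 1) // 2
    return round(pairs * (1 - concordant_pct(tau)))
```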

Correlating Composites

I do not know whether there is a theorem proving that a "composite" ranking always correlates more closely with its constituent lists than any one of those lists does with the others, but it is true for all of the composite rankings listed above. Every one of the composite rankings has a lower average distance from the six computer rankings than any of the computer rankings has:
                          Distance                               %Concordant
AvgDist  Comp    SAG   RT    MAS   KLK   MOR   NOL      SAG    RT     MAS    KLK    MOR    NOL
2906.50  PWISE   1659  1916  2090  2991  3978  4805     96.26  95.69  95.30  93.29  91.09  89.24
2930.33  BUCK    1791  1947  2049  2973  3916  4906     95.76  95.42  95.19  93.13  91.03  88.82
2950.50  BORDA   1800  1995  2121  3185  3991  4611     95.93  95.49  95.21  92.84  91.04  89.66
2979.00  IRV     1842  1988  2094  3028  3971  4951     95.89  95.57  95.33  93.25  91.15  88.96
2984.33  GMEAN   1815  2031  2201  3229  4068  4562     95.95  95.47  95.09  92.80  90.93  89.83
3875.20  SAG           2868  3320  3438  4469  5281            93.61  92.60  92.33  90.04  88.23
4020.40  RT      2868        3298  4138  4787  5011     93.61         92.65  90.77  89.33  88.83
4080.40  MAS     3320  3298        4062  4823  4899     92.60  92.65         90.94  89.25  89.08
4716.80  KLK     3438  4138  4062        5073  6873     92.33  90.77  90.94         88.69  84.68
5404.80  MOR     4469  4787  4823  5073        7872     90.04  89.33  89.25  88.69         82.45
5987.20  NOL     5281  5011  4899  6873  7872           88.23  88.83  89.08  84.68  82.45

For the week 5 rankings the pairwise percentage is the most representative, which is consistent with its construction. It is no surprise that the geometric mean is the least representative (though it's still a very good consolidation), given the reason we include it in the first place: it exaggerates the differences among the better ranks.

In memory of
SEBaseball.com

© Copyright 2015 Paul Kislanko