Quantitative investing

Shapley values

Shapley values are a concept borrowed from game theory. In game theory, a game is a setting in which players contribute to an overall outcome (payoff). Shapley values are used to distribute the resulting gains or costs fairly among players who cooperate in coalitions. Effectively, a player's Shapley value is that player's average marginal contribution (gain or cost), taken across all possible coalitions.

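The Shapley value of player i is defined as:

φi = Σ over all S ⊆ N\{i} of [ |S|! (n − |S| − 1)! / n! ] × [ v(S ∪ {i}) − v(S) ]

Where

φi is the Shapley value for player i
N is the set of all players and N\{i} is the set of all players excluding player i
n is the number of players in set N, i.e., the total number of players
S is any subset of the set of all players excluding player i, and |S| is the number of players in S
v(S) is the payoff for a coalition consisting of the players in set S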
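
The definitions above can be turned into a short calculation. The following is a minimal Python sketch of the standard subset form of the Shapley value, not a production implementation. The function name shapley_values and the three-player toy game are assumptions made purely for illustration. It enumerates every coalition, so it is only practical for a handful of players.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, v):
    """Exact Shapley values via the subset formula.

    players: list of hashable player labels (the set N).
    v: function mapping a frozenset of players to that coalition's payoff.
    """
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]  # N \ {i}
        total = 0.0
        for size in range(n):  # |S| runs from 0 to n - 1
            # Weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of player i to coalition S
                total += weight * (v(s | {i}) - v(s))
        phi[i] = total
    return phi


# Hypothetical toy game: the payoff is the square of the coalition size.
def toy_payoff(coalition):
    return len(coalition) ** 2


values = shapley_values(["a", "b", "c"], toy_payoff)
print(values)
```

Because the three players in this toy game are interchangeable, each receives an equal share of the full coalition's payoff of 9, and the shares add up to that payoff.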

For example, take two individuals (A and B) who would like to take an Uber home after work. If they travel separately, the fare is EUR 10 for A and EUR 15 for B, because the distances differ. However, if they share the ride, the fare amounts to EUR 20. With Shapley values, you can calculate how this fare should be split fairly between the two individuals. In this scenario, the players (A and B) form a coalition (carpool) and play a game (Uber ride) to obtain a payoff/cost (EUR 20 fare).

In this setting, the Shapley values amount to the players' average marginal contributions (or costs) across all possible orders in which the players can join the carpool, i.e., whether A hails the ride first or B does. If A books the Uber, her marginal contribution will be EUR 10. B then joins the carpool as the second individual, and his contribution will also be EUR 10, because the total cost of the ride is EUR 20. But if B hails the ride, his marginal contribution will be EUR 15 and A will contribute the remaining EUR 5. Averaging over the two orders, the Shapley values amount to EUR 7.5 for A, (10+5)/2, and EUR 12.5 for B, (10+15)/2. Together they add up to the full EUR 20 fare.
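
The ride-sharing arithmetic can be checked with a few lines of Python. This is a small sketch that averages each rider's marginal cost over every joining order, using the fares from the example. The names FARES and shapley_by_orderings are illustrative assumptions.

```python
from itertools import permutations

# Fares from the example, in EUR, keyed by the set of riders.
FARES = {
    frozenset(): 0.0,
    frozenset({"A"}): 10.0,
    frozenset({"B"}): 15.0,
    frozenset({"A", "B"}): 20.0,
}


def shapley_by_orderings(players, cost):
    """Average each player's marginal cost over all joining orders."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        riders = frozenset()  # nobody has joined the ride yet
        for p in order:
            joined = riders | {p}
            # Extra cost caused by p joining the riders already in the car
            totals[p] += cost[joined] - cost[riders]
            riders = joined
    return {p: totals[p] / len(orderings) for p in players}


split = shapley_by_orderings(["A", "B"], FARES)
print(split)  # {'A': 7.5, 'B': 12.5}
```

With two riders there are only two orders to average over. The number of orders grows factorially with the number of players, which is why larger problems are usually approximated rather than enumerated.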

Shapley values have many applications, including machine learning (ML). On our quant investing platform, we use them to interpret our ML models. For example, they help us determine which variables (features) have the greatest impact in our models and rank them by importance. They also shed light on the outputs our models generate by breaking each prediction down into the marginal contributions of the individual features, allowing us to better understand the economic rationale behind our model predictions.
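
To illustrate the idea of treating features as players, here is a toy Python sketch. It is not the method used on the platform described above. The model, the feature names (momentum, value, quality) and the all-zero baseline are hypothetical. It also assumes one common convention: a feature that is "absent" from a coalition is set to its baseline value.

```python
from itertools import permutations


def toy_model(features):
    # Hypothetical scoring model, invented for illustration only.
    return (2.0 * features["momentum"]
            + 1.0 * features["value"]
            + 0.5 * features["momentum"] * features["quality"])


def feature_attributions(model, x, baseline):
    """Exact Shapley attributions, treating each feature as a player."""
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)  # start with every feature "absent"
        previous = model(current)
        for name in order:
            current[name] = x[name]  # the feature joins the coalition
            output = model(current)
            totals[name] += output - previous  # its marginal contribution
            previous = output
    return {name: totals[name] / len(orderings) for name in names}


# Hypothetical observation and baseline (assumed values, not real data).
x = {"momentum": 1.0, "value": 2.0, "quality": 4.0}
baseline = {"momentum": 0.0, "value": 0.0, "quality": 0.0}

attributions = feature_attributions(toy_model, x, baseline)
# Rank features by the absolute size of their contribution.
ranking = sorted(attributions, key=lambda name: abs(attributions[name]), reverse=True)
print(attributions, ranking)
```

The attributions add up to the difference between the model's output for the observation and its output for the baseline. The interaction term is shared equally between the two features involved. Real models have far more features, so exact enumeration like this is replaced by approximation in practice.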