Using the Gini Coefficient to Evaluate Deep Neural Network Layer Representations
- GitHub repositories: gini, brain-imaging-and-the-neural-code.
- Guest, O., Love, B. C. (2017). What the Success of Brain Imaging Implies about the Neural Code. eLife. doi: 10.7554/eLife.21397.
Sparsity is an issue in neural representation, and we think it should be measured in artificial neural networks to understand how they represent information at each layer. For example, are a few units doing the work, or is there a distributed pattern across all units (i.e., overlapping units taking part in the representations of cat, car, etc.)? So in What the Success of Brain Imaging Implies about the Neural Code we decided to use the Gini coefficient, inspired by its use in evaluating voxel activations, to uncover the degree of sparsity within each of the layers of Inception-v3 GoogLeNet.
The Gini coefficient is primarily used to give an idea of how wealth is distributed within a group of people, usually a whole nation. But it can also be used more generally on a vector of numbers, a distribution, to describe how distributed values are (more on this below).
We were dealing with relatively big data, as Inception-v3 GoogLeNet has quite a few layers, so I needed something with relatively low space and time complexity. In terms of speed, my Gini calculator is quite a lot faster than (the current implementation of) PySAL’s Gini coefficient function (see the documentation), and the two outputs agree to approximately 6 decimal places. It is also slightly faster than the Gini coefficient function by David on Ellipsix.
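The function implements the following formula:

$$G = \frac{\sum_{i=1}^{n} (2i - n - 1)\, x_i}{n \sum_{i=1}^{n} x_i}$$

where $x_i$ is the $i$-th value once the data are sorted in ascending order, $i$ is the index for each data point, and $n$ is the total number of data points.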
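As a quick sanity check, the sorted-index formula can be compared against the more familiar definition of the Gini coefficient as half the relative mean absolute difference. The sketch below is only an illustration, not the code from the repository: `gini_sorted` and `gini_pairwise` are names made up here, and the input is assumed to be non-negative with a positive sum.

```python
import numpy as np


def gini_sorted(x):
    """Gini via the sorted-index formula: sum((2i - n - 1) * x_i) / (n * sum(x))."""
    x = np.sort(np.asarray(x, dtype=float).ravel())
    n = x.shape[0]
    index = np.arange(1, n + 1)  # ranks 1..n of the ascending-sorted values
    return np.sum((2 * index - n - 1) * x) / (n * np.sum(x))


def gini_pairwise(x):
    """Gini as half the relative mean absolute difference (O(n^2) memory)."""
    x = np.asarray(x, dtype=float).ravel()
    n = x.shape[0]
    abs_diffs = np.abs(x[:, None] - x[None, :])  # all pairwise |x_i - x_j|
    return np.sum(abs_diffs) / (2 * n * n * np.mean(x))


rng = np.random.default_rng(0)  # arbitrary seed
sample = rng.random(500)
print(gini_sorted(sample), gini_pairwise(sample))
```

The two values should agree up to floating-point error. The sorted version needs only a sort and one pass over the data, which is why it scales better to large layers than the pairwise version.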
For a very unequal sample, e.g., with 999 zeros and a single one, the Gini coefficient is very high (close to 1). For uniformly distributed random numbers, it will be low, around 0.33. For a homogeneous sample, the Gini coefficient is 0. In other words, the lower the coefficient, the more equal the distribution of wealth/numbers. Check out the readme file for examples of what can be passed to the gini function.
The Gini calculation by definition requires non-zero, positive, sorted (ascending-order) values within a 1-dimensional vector. All of this is dealt with inside the gini function, so these four assumptions can be violated by the input, as they are controlled for:
```python
import numpy as np


def gini(array):
    """Calculate the Gini coefficient of a numpy array."""
    # All values are treated equally, arrays must be 1d
    # (cast to float so the in-place operations below also work on integer input):
    array = array.flatten().astype(float)
    if np.amin(array) < 0:
        # Values cannot be negative:
        array -= np.amin(array)
    # Values cannot be 0:
    array += 0.0000001
    # Values must be sorted:
    array = np.sort(array)
    # Index per array element:
    index = np.arange(1, array.shape[0] + 1)
    # Number of array elements:
    n = array.shape[0]
    # Gini coefficient:
    return np.sum((2 * index - n - 1) * array) / (n * np.sum(array))
```
And that is all there is to it! The only two inviolable assumptions it makes are that you have numpy installed and that you send it something like a numpy array (use np.asarray() to convert array-like input, such as a list, into an array first).
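The cases described earlier (a very unequal sample, uniform random numbers, a homogeneous sample) can be checked directly. The snippet repeats a compact version of the function so that it runs on its own; the random seed and the sample sizes are arbitrary choices made for this illustration.

```python
import numpy as np


def gini(array):
    """Compact version of the gini function, repeated so this snippet runs alone."""
    array = array.flatten().astype(float)
    if np.amin(array) < 0:
        array -= np.amin(array)
    array += 0.0000001
    array = np.sort(array)
    n = array.shape[0]
    index = np.arange(1, n + 1)
    return np.sum((2 * index - n - 1) * array) / (n * np.sum(array))


rng = np.random.default_rng(42)  # arbitrary seed

unequal = np.zeros(1000)
unequal[0] = 1.0                 # 999 zeros and a single one
uniform = rng.random(100_000)    # uniformly distributed on [0, 1)
homogeneous = np.ones(1000)      # every value identical
matrix = rng.random((10, 10))    # 2-d input is flattened internally

print(gini(unequal))       # close to 1
print(gini(uniform))       # roughly 1/3
print(gini(homogeneous))   # 0 up to floating-point error
print(gini(matrix))
```

Because the function works on a flattened copy, the arrays passed in are left unchanged.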
But what does this have to do with artificial neural networks? Well, instead of people within a nation, we can consider the units within a layer. And instead of people’s wealth we can look at units’ activations after we have propagated input to the layer. So given an input to a layer, we can measure how sparse (unequal) the distribution of activations is. A single number can give us an idea of how localist or distributed the representation the layer has learned is. Averaging over the Gini coefficients for all the possible inputs to a layer, we can calculate how localist or distributed the representations within a layer are in general.
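A minimal sketch of that averaging step follows. It assumes the activations of one layer have already been collected into a matrix with one row per input and one column per unit; the helper name `mean_layer_gini`, the matrix sizes, and the random stand-in data are all made up for illustration and are not activations from Inception-v3.

```python
import numpy as np


def gini(array):
    """Compact version of the gini function, repeated so this snippet runs alone."""
    array = array.flatten().astype(float)
    if np.amin(array) < 0:
        array -= np.amin(array)
    array += 0.0000001
    array = np.sort(array)
    n = array.shape[0]
    index = np.arange(1, n + 1)
    return np.sum((2 * index - n - 1) * array) / (n * np.sum(array))


def mean_layer_gini(activations):
    """Average the per-input Gini over an (n_inputs, n_units) activation matrix."""
    return float(np.mean([gini(row) for row in activations]))


rng = np.random.default_rng(0)   # arbitrary seed
n_inputs, n_units = 50, 2048     # hypothetical sizes

# Distributed stand-in: every unit responds to every input.
distributed = rng.random((n_inputs, n_units))

# Sparse stand-in: only about 2% of units are active for a given input.
sparse = distributed * (rng.random((n_inputs, n_units)) < 0.02)

print(mean_layer_gini(distributed))  # low: activation is spread over all units
print(mean_layer_gini(sparse))       # high: a few units carry the activation
```

The Gini is computed per input and then averaged, rather than computed once over the whole matrix, so that the number describes how sparse the layer is for a typical single input.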
Inception-v3 GoogLeNet has output that is trained to be completely sparse/localist, since it uses one-hot coding for the classes. Representing the output classes using one-hot coding ensures that outputs are trained to be both orthogonal and localist (two properties which are not by definition mutually inclusive). In terms of the targets it learns per input image, the network’s output will have a Gini coefficient of approximately 1. And in general, we can expect the output’s Gini to be close to 1, except in the very rare cases where the network is completely unsure of what we have shown it.
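To make the two extremes concrete, the sketch below compares a one-hot target with a completely uncertain output in which every class gets the same probability. The number of classes (1000) is an assumption chosen for illustration, and the vectors are constructed by hand rather than taken from the network.

```python
import numpy as np


def gini(array):
    """Compact version of the gini function, repeated so this snippet runs alone."""
    array = array.flatten().astype(float)
    if np.amin(array) < 0:
        array -= np.amin(array)
    array += 0.0000001
    array = np.sort(array)
    n = array.shape[0]
    index = np.arange(1, n + 1)
    return np.sum((2 * index - n - 1) * array) / (n * np.sum(array))


n_classes = 1000  # assumed number of output classes

# One-hot target: a single class is on, every other class is off.
one_hot = np.zeros(n_classes)
one_hot[3] = 1.0

# Completely unsure output: the same probability for every class.
unsure = np.full(n_classes, 1.0 / n_classes)

print(gini(one_hot))  # close to 1
print(gini(unsure))   # 0 up to floating-point error
```

For a one-hot vector of length n the formula gives (n - 1) / n before the small offset is added, so the value approaches 1 as the number of classes grows.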
On the other hand, on other/lower layers, we find that the Gini coefficient can be high or low. It decreases and increases non-monotonically as a function of layer depth. Although it does show a rough trend of becoming higher as we move deeper, it is by no means a given. What this implies is that the network is not representing things by definition in a more localist way as we move towards deeper/later layers. In the two layers we talked about in the aforementioned Guest and Love (2017), the network has a Gini coefficient of 0.579 for the penultimate layer and 0.947 for the shallower layer (on the specific stimuli we used). At the end the average Gini for the output is, as expected given the training regime, 0.941. These and other points with respect to the representational contents of each layer are discussed in depth in Guest and Love (2017).
See here for a translation of this article by Daniel Morales into Spanish: El coeficiente de Gini como herramienta para evaluar las representaciones de las capas en redes neuronales profundas.