QUANTIFYING THE EXTENT TO WHICH POPULAR PRE-TRAINED CONVOLUTIONAL NEURAL NETWORKS IMPLICITLY LEARN HIGH-LEVEL PROTECTED ATTRIBUTES
In transfer learning, a technique used widely in computer vision, researchers adapt publicly released models pre-trained on millions of images to new tasks, for example, determining whether a person speaking in a video is telling the truth. But could the resulting classifier be biased? Has the pre-trained neural network model learned high-level features that correspond to protected attributes such as race, gender, religion, or disability status?
Understanding the high-level features encoded in deep neural network representations is pivotal to understanding the kinds of biases that may be introduced in a broad range of applications during transfer learning. In this paper, we quantify the extent to which three popular pre-trained convolutional neural networks are implicitly learning and encoding age, gender, and race information during the transfer learning process.
Results indicate that these widely used pre-trained models encode information that can be used to infer protected attributes such as race, gender, and age with very high accuracy, even when very little labeled data is available.
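The kind of analysis described above is commonly implemented as a linear probe: a simple classifier trained on a small labeled set of pre-trained embeddings to predict the attribute. The sketch below illustrates this with synthetic vectors standing in for CNN features (the feature dimensionality, attribute encoding, and leakage strength are all assumptions for illustration, not the paper's actual setup):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for 512-d pre-trained CNN embeddings: a hypothetical
# binary protected attribute slightly shifts a handful of feature dimensions,
# mimicking an attribute "leaking" into the learned representation.
n, d = 1000, 512
attr = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, d))
X[:, :10] += attr[:, None] * 1.5

# "Very limited labeled data": train the probe on only 100 labeled examples.
X_tr, X_te, y_tr, y_te = train_test_split(X, attr, train_size=100, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc = probe.score(X_te, y_te)
print(f"probe accuracy: {acc:.2f}")
```

If the probe's accuracy on held-out embeddings is well above chance, the representation encodes the attribute, whether or not the original model was ever trained to predict it.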