Guidance for people building or training systems with AI components
Generative AI is a specific application of an approach to AI called machine learning (ML). ML is a method of teaching computers using data sets: instead of humans writing explicit instructions for computers to follow, each piece of training data can adjust the system's internal values. A trained system can then make decisions about new data, or produce output in response to it, based on the patterns it has learned. Because training is a fundamentally different way of instructing computer systems than conventional programming, it introduces new risk areas that people building ML systems need to consider.
Privacy and machine learning training data
ML systems learn from data during the training process, and information in that training data may later be revealed by the system in ways that can't be predicted. For this reason, do not train ML systems with level 2-4 data unless all output from that ML system can be kept equally private.
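One way to enforce this rule is to filter records by classification level before they ever reach a training pipeline. The sketch below is illustrative only: the "level" field name and the numeric thresholds are assumptions, not a fixed schema, and a missing level is treated as most restrictive.

```python
# Hypothetical pre-training filter: only records at or below the allowed
# sensitivity level may be used for ML training. The "level" field and the
# threshold value are assumptions for illustration.

MAX_ALLOWED_LEVEL = 1  # assume level 1 = lowest-sensitivity data


def filter_training_records(records):
    """Keep only records whose classification level permits ML training.

    Records with no "level" field default to level 4 (most restrictive),
    so unclassified data is excluded rather than silently included.
    """
    return [r for r in records if r.get("level", 4) <= MAX_ALLOWED_LEVEL]


records = [
    {"id": 1, "level": 1, "text": "public product description"},
    {"id": 2, "level": 3, "text": "internal customer note"},
]
safe = filter_training_records(records)  # only the level 1 record survives
```

Defaulting an unlabelled record to the most restrictive level is the important design choice here: the filter fails closed rather than open.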
Those building their own applications should ensure that the platforms they use are configured with the strongest available security and safety features. Likewise, any source code used to develop such applications should be assessed to confirm it comes from a trustworthy source, to protect users and data. Attackers may attempt to compromise the integrity of such applications in a variety of ways, including accessing end-user data, altering source code, or, in the case of AI applications, altering or subverting training data to introduce specific bias into the application's output. To mitigate this last risk, developers should regularly examine, control and/or sanitize the data used for training so that adversaries can't subvert the system's behaviour through careful manipulation of the training data.
Beware of adversarial training data
Examine, control and/or sanitize data used for ML training to ensure that the system’s behaviour can’t be subverted through adversary-provided training data.
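One common sanitization step is to flag training samples whose feature values are statistical outliers relative to the rest of the set, since poisoned samples are often injected far from the benign distribution. The sketch below uses a robust z-score (median and median absolute deviation, which a single extreme value cannot inflate the way a mean and standard deviation can); it is one illustrative check, not a complete defence, and real pipelines combine it with provenance checks and human review.

```python
# A minimal sketch of one sanitization step: flag samples whose robust
# z-score exceeds a threshold. Values and threshold are illustrative.
import statistics


def flag_outliers(values, threshold=3.5):
    """Return indices of values whose robust z-score exceeds threshold.

    Uses the median and median absolute deviation (MAD), which are not
    skewed by the very outliers being hunted; 0.6745 rescales the MAD to
    be comparable to a standard deviation for normal data.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:  # all values (nearly) identical: nothing to flag
        return []
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


feature = [0.9, 1.1, 1.0, 0.95, 1.05, 50.0]  # last sample looks injected
suspect = flag_outliers(feature)  # -> [5]
```

Flagged samples should be reviewed rather than silently dropped: an adversary who learns the filter can also abuse it to remove legitimate data.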
Recognize the possibility of unintended bias in training data
An ML system's behaviour is shaped by the training data it learns from. That data may contain patterns, including unexpected or undesirable ones, and the system will learn them all the same. Examples:
- An ML system asked to distinguish between pictures of dogs and wolves may learn to tell them apart using incidental cues in the images, such as snow or trees, rather than the animals themselves.
- A system asked to distinguish between people visually may make an unwarranted association between race and a trait if, in the training data, people of a particular race appear disproportionately often with that trait.
- If a training data set has been collected mainly from higher socioeconomic status individuals, it may carry biases with respect to neighbourhood, lifespan, education and many other characteristics or factors that correlate with socioeconomic status.
In general, if there are biases in the training data, the system will learn those biases along with all the other patterns in the data, and may then produce prejudiced or inequitable results. For this reason, carefully examine data used for ML training for biases and address them when found.
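One simple examination is to compare how often a label of interest occurs across groups defined by a sensitive attribute; a large gap is a signal worth investigating before training. The field names ("group", "label") in this sketch are illustrative assumptions, not a fixed schema, and a rate gap alone does not prove bias: it only tells you where to look.

```python
# A minimal sketch of one bias check: compute the positive-label rate per
# group of a sensitive attribute. Field names are illustrative assumptions.
from collections import defaultdict


def positive_rate_by_group(samples, group_key="group", label_key="label"):
    """Return {group: fraction of samples in that group with a truthy label}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for s in samples:
        totals[s[group_key]] += 1
        if s[label_key]:
            positives[s[group_key]] += 1
    return {g: positives[g] / totals[g] for g in totals}


data = [
    {"group": "A", "label": 1}, {"group": "A", "label": 1},
    {"group": "A", "label": 0}, {"group": "B", "label": 0},
    {"group": "B", "label": 0}, {"group": "B", "label": 1},
]
rates = positive_rate_by_group(data)  # group A: 2/3, group B: 1/3
```

In practice this check would be run for each sensitive attribute in the data set, and for proxies of those attributes (such as neighbourhood standing in for socioeconomic status), since the examples above show that the system can learn from proxies just as readily.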