Most companies depend on real-world data to train artificial intelligence models that identify anomalies and generate valuable insights. Real data alone is no longer enough, though, as fraud grows more common by the day. It is difficult for researchers to detect fraud when there is too little data to support an algorithm’s training. For this reason, companies are turning to synthetic data sets, created to augment the training data for AI-based fraud detection models.
American Express Co.’s machine learning and data science teams have been experimenting with this so-called synthetic data for almost two years, with the aim of improving the company’s AI-based fraud-detection models. American Express, a credit card company, uses an advanced form of AI to generate fake fraud patterns aimed at strengthening the real training data. The company’s AI-based fraud-detection models can overlook rare types of fraud if the algorithms have not had enough training data.
“There are various kinds of patterns; the number of fraud patterns in real life is huge. And some fraud patterns happen more often and some are rare,” said Dmitry Efimov, head of the company’s Machine Learning Center of Excellence.
American Express is working to improve these models by experimenting with generative adversarial networks, a method for creating synthetic data on rare fraud patterns. The generated data is then used to augment the company's existing data set of fraud behaviors and improve its overall fraud detection models.
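The augmentation step can be illustrated with a minimal Python sketch. It is not American Express's method: the fixed per-pattern floor, the function names, the row layout, and the `make_synthetic` callable (a stand-in for a trained generator) are all assumptions made for illustration.

```python
from collections import Counter


def synthetic_counts_needed(labels, min_per_pattern):
    """Return how many synthetic rows each pattern needs to reach the floor."""
    counts = Counter(labels)
    return {pattern: max(0, min_per_pattern - n) for pattern, n in counts.items()}


def augment(real_rows, make_synthetic, min_per_pattern):
    """Top up rare patterns with synthetic rows.

    real_rows: list of (pattern, features) tuples.
    make_synthetic: hypothetical callable(pattern) -> features, standing in
        for a trained generator.
    Returns (pattern, features, is_synthetic) tuples, so synthetic rows stay
    distinguishable from real ones.
    """
    needed = synthetic_counts_needed([p for p, _ in real_rows], min_per_pattern)
    augmented = [(p, f, False) for p, f in real_rows]
    for pattern, k in needed.items():
        augmented.extend((pattern, make_synthetic(pattern), True) for _ in range(k))
    return augmented
```

Keeping an `is_synthetic` flag on every row is a design choice in this sketch: it lets an evaluation set be restricted to real data, so that any gain from the synthetic rows is measured against real fraud only.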
Efimov says the team arrived at generative adversarial networks while thinking about how to balance the presence of various fraud patterns. A generative adversarial network is an AI method frequently used to create simulation data for training the AI models that power self-driving cars. The technique is also used to create deepfakes that are hard to distinguish from reality.
In a generative adversarial network, one AI model acts as a “generator” that produces new data, while the other model tries to determine whether that data is fake or real. The ideal generative adversarial network is one in which the second model can no longer distinguish generated data from real data. Personally identifiable information is not used at any stage of the process, Efimov added.
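The two-model setup can be shown with a toy example in pure Python. This is only a sketch under simplifying assumptions, not the company's system: the data is a single number drawn from a made-up normal distribution, the generator is a linear function of random noise, the discriminator is a logistic function, and all names and hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(t):
    # Numerically safe logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(steps=2000, lr=0.01, seed=0):
    """Train a 1-D GAN with hand-written gradients; returns (generator, discriminator)."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator: x_fake = a * z + b
    w, c = 0.1, 0.0  # discriminator: D(x) = sigmoid(w * x + c)
    for _ in range(steps):
        x_real = rng.gauss(4.0, 0.5)  # assumed "real" data distribution
        z = rng.gauss(0.0, 1.0)       # noise fed to the generator
        x_fake = a * z + b

        # Discriminator step: ascend log D(x_real) + log(1 - D(x_fake)).
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        w += lr * ((1.0 - d_real) * x_real - d_fake * x_fake)
        c += lr * ((1.0 - d_real) - d_fake)

        # Generator step: ascend log D(x_fake), the non-saturating GAN loss.
        d_fake = sigmoid(w * x_fake + c)
        a += lr * (1.0 - d_fake) * w * z
        b += lr * (1.0 - d_fake) * w
    return (a, b), (w, c)


def sample_generator(generator, n, seed=0):
    """Draw n synthetic values from the trained generator."""
    a, b = generator
    rng = random.Random(seed)
    return [a * rng.gauss(0.0, 1.0) + b for _ in range(n)]


def discriminator_score(discriminator, x):
    """Discriminator's estimated probability that x is real."""
    w, c = discriminator
    return sigmoid(w * x + c)
```

After training, `sample_generator` produces synthetic values and `discriminator_score` reports how real each one looks to the discriminator. Scores near 0.5 mean the discriminator can no longer tell the two apart, although a toy model like this is not guaranteed to reach that point.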
The effort is still in the early stages, as it is difficult to determine how much of each unique fake fraud pattern the AI model should generate. Although the results so far are promising, the experiments have shown that for specific types of fraud, fake data cannot improve the AI-based fraud detection model.
Synthetic data is already in use in other industries, for example in hospitals to support medical decisions. The startup Moveworks Inc. generates synthetic data to aid its AI-based chatbots, which corporate customers use to answer employee questions about information technology, human resources, and finance, says Vaibhav Nivargi, co-founder and chief technology officer. Moveworks built a machine-learning model that generates questions humans might ask, based on technical documents.