Credit: Fly View Productions/ E+/ Getty Images.

Recent banking crises highlight the need for new and better tools to monitor and manage financial risk, and artificial intelligence (AI) can be part of the answer. The adoption of AI in finance and banking has long been a matter of discussion. In 2017, the bank J.P. Morgan presented COIN (COntract INtelligence), the first disruptive AI-based software for processing financial documents. A few years later, the Organisation for Economic Co-operation and Development (OECD) opened the AI Observatory on Fintech (AIFinanceOECD 2021), focusing on opportunities and risks. Europe and Italy have also gone in this direction: AI for banking, finance and insurance is one of the 11 Italian priorities in the National Strategic Program on Artificial Intelligence, launched in November 2021. It is also a subject of FAIR, the large new national research project on AI.

AI is affecting finance in several ways. Deep learning models can support customer interactions with digital platforms, biometric identification of clients, and chatbots or other AI-based apps that improve user experience. Machine learning has also often been applied successfully to financial time series, both for macroeconomic analysis¹ and for stock exchange prediction, thanks to the large amount of stock exchange data available.

However, the use of deep learning for analysing data on bank transactions is still under-explored. Transactional data represent the largest source of information for banks, because they allow profiling of clients, detection of fraud, and dynamic predictions that can help prevent the loss of clients. But the nature of the data and the lack of large public annotated datasets (for privacy and commercial reasons) make transactional data extremely difficult to handle for the current state-of-the-art AI models.

These “foundation models” were initially developed for natural language processing. They are large neural architectures pre-trained on huge amounts of data, such as Wikipedia documents or billions of web-collected images. They can be used in simple ways, as the worldwide success of ChatGPT shows, or fine-tuned for specific tasks. But it is more complex to redefine their architecture for new types of data, such as transactional bank data. These data are multimodal, meaning that they can include numerical information (the amount of the transaction), categorical information (its type), textual information (the description of a bank transfer) and, in some cases, fields with a specific structure (the date). The structure changes according to the type of transaction (a card payment, an ATM withdrawal, a direct debit or a bank transfer). There are important correlations within a series of transactions, for example in periodic payments, and among different series, because each client can hold several bank products and several accounts, and some accounts have more than one owner. Finally, some transactions are correlated with external but unknown conditions, such as holidays or the lockdowns of the pandemic period.
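To make the multimodal structure concrete, a single transaction could be modelled as a record whose populated fields depend on its type. The sketch below is purely illustrative: the `Transaction` class, its field names and the sample values are assumptions made for this example, not the schema of any real bank dataset.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Transaction:
    """Hypothetical record for one bank transaction (illustrative schema)."""
    account_id: str                           # links transactions into per-account series
    booked_on: date                           # structured field: the date
    amount: float                             # numerical field
    kind: str                                 # categorical field, e.g. "card_payment"
    description: Optional[str] = None         # free text, here only set for transfers
    merchant_category: Optional[str] = None   # here only set for card payments

    def modalities(self) -> dict:
        """Group the populated fields by modality."""
        present = {
            "numerical": {"amount": self.amount},
            "categorical": {"kind": self.kind},
            "structured": {"booked_on": self.booked_on},
            "textual": {},
        }
        if self.merchant_category is not None:
            present["categorical"]["merchant_category"] = self.merchant_category
        if self.description is not None:
            present["textual"]["description"] = self.description
        return present


# Two transactions on the same account with different populated fields.
card = Transaction("acc-001", date(2021, 3, 5), -42.90, "card_payment",
                   merchant_category="groceries")
transfer = Transaction("acc-001", date(2021, 3, 27), 1500.00, "bank_transfer",
                       description="Salary March 2021")

print(card.modalities()["textual"])      # no free text on this card payment
print(transfer.modalities()["textual"])  # the transfer carries a description
```

The optional fields are what make the data awkward for models that expect one fixed input format: two rows of the same series do not necessarily share the same columns.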

The new Italian project UniTTab, a research collaboration between the independent economic research group Associazione Prometeia and the University of Modena and Reggio Emilia, is trying to overcome these difficulties and explore the use of deep learning for transactional bank data. The project relies on a large dataset provided by a major Italian bank, with about 1.5 billion transactions from about three million anonymized clients, spanning 2020 to 2022. Also crucial are the availability of large GPU facilities and new neural architectures designed specifically for bank transactional data.

The project has achieved preliminary results in the creation of a new foundation model for finance², based on an evolution of the ‘Transformer’ architecture used by BERT, GPT and many other models. The model takes sequences of bank transactions as input and transforms the different numerical, textual and categorical data formats into a uniform representation. It then learns in a self-supervised way to reconstruct the initial sequences, similar to what GPT does with text. This makes it possible to perform many tasks on new series of transactions that differ from the original training set.
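The two ideas in this paragraph, a uniform representation and self-supervised reconstruction, can be illustrated with a toy example. The sketch below is not the project's architecture: it simply turns each field into a discrete token and hides some tokens at random, so that the hidden values become training targets. The vocabulary, the log-scale bucketing of amounts, the chosen date features and the use of random masking are all assumptions made for illustration; a real model would map such tokens to learned embeddings inside a Transformer.

```python
import math
import random
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical vocabulary of transaction types (categorical field).
KIND_VOCAB = {"card_payment": 0, "atm_withdrawal": 1,
              "direct_debit": 2, "bank_transfer": 3}


def amount_bucket(amount: float, n_buckets: int = 8) -> int:
    """Map an amount to a coarse log-scale bucket so it becomes a discrete token."""
    magnitude = math.log10(1.0 + abs(amount))
    return min(int(magnitude * 2), n_buckets - 1)


def tokenize(kind: str, amount: float, booked_on: date) -> List[str]:
    """Turn one transaction into a fixed-length list of field tokens."""
    return [
        f"kind={KIND_VOCAB[kind]}",
        f"sign={'out' if amount < 0 else 'in'}",
        f"amount_bucket={amount_bucket(amount)}",
        f"weekday={booked_on.weekday()}",
        f"month={booked_on.month}",
    ]


def mask_tokens(tokens: List[str], mask_prob: float,
                rng: random.Random) -> Tuple[List[str], Dict[int, str]]:
    """Hide random tokens; return the masked sequence and the reconstruction targets."""
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append("[MASK]")
            targets[i] = tok  # what a model would be trained to predict
        else:
            masked.append(tok)
    return masked, targets


# A short, made-up series of transactions for one account.
series = [
    ("card_payment", -42.90, date(2021, 3, 5)),
    ("direct_debit", -65.00, date(2021, 3, 15)),
    ("bank_transfer", 1500.00, date(2021, 3, 27)),
]
sequence = [tok for tx in series for tok in tokenize(*tx)]
masked, targets = mask_tokens(sequence, mask_prob=0.3, rng=random.Random(0))

print(masked)
print(targets)
```

Because the targets come from the data itself, no manual annotation is needed for this pre-training step; labels are only required later, when the model is adapted to a task such as fraud detection.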

The model can classify the behavior of clients, detect anomalies and fraud, and predict product churn (clients leaving the bank) over the next few months. These data could also be used to generate new transaction series that would be very similar to the original ones but not identical, creating a new synthetic dataset of transactions that could be used for analysis while preserving the privacy of clients.

After training on the dataset provided to Associazione Prometeia, the model was tested on a fraud detection task using a dataset of about 450,000 synthetic card transactions. It reached an accuracy of 95.5%, outperforming competing models. A loan default prediction task was tested on an open-source transaction dataset and achieved an accuracy of 94.5%. A churn prediction task was tested on a different version of the original Prometeia dataset, and the results were compared with the real annotations of accounts closed in 2022. The model reached an accuracy of 90.8%, again better than competing models.
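For readers unfamiliar with the metric, accuracy is the share of cases in which the predicted label matches the real one. The short sketch below shows the calculation on ten made-up accounts; the labels are invented for illustration and have no connection to the project's data or results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the real labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label and prediction lists must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels for ten accounts: 1 = closed during the period, 0 = still open.
y_true = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
y_pred = [0, 0, 1, 0, 0, 0, 0, 1, 0, 1]

print(f"accuracy = {accuracy(y_true, y_pred):.1%}")  # prints "accuracy = 80.0%"
```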

The architecture is only a first prototype, but the project shows the feasibility of designing specific AI models adapted to the financial domain.