We're releasing an API for accessing the latest version of our Features library: Features v2.0. The new feature generation library is designed to extract information from raw bank account data (transactions and account metadata) and produce variables for building predictive models from open banking data. Features v2.0 is now available for developers and data science teams to test and apply in their next open banking data project. To get early access to the API, please request access here.
In data science, features are used to build predictive models. A feature is an observable event or characteristic that can be quantified and recorded; in other words, a piece of information about each observation in the data set. “Feature” is the machine learning term for what other disciplines might call an “attribute”, a “factor”, a “predictor”, an “independent variable” or simply a “variable”. The quality of a predictive model depends on the quality of its features, which is why data scientists invest resources in feature engineering. Feature engineering often requires extensive research, so data science teams often acquire external libraries of pre-engineered features that let them build and test predictive models in less time.
The new features library can generate up to 1 million unique features for every bank statement. The features are grouped by their characteristics, including features derived from descriptive statistics, from end-user financial behaviour patterns, and from high-level bank account information.
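To illustrate what a descriptive-statistics feature group can look like, here is a minimal sketch that turns a list of signed transaction amounts into a handful of summary features. The input format (positive amounts as inflows, negative as outflows) and every feature name are hypothetical assumptions for illustration, not the actual schema or feature names of Features v2.0.

```python
from statistics import mean, median, pstdev


def descriptive_features(amounts: list[float]) -> dict[str, float]:
    """Compute simple descriptive-statistics features from signed amounts.

    Assumption (hypothetical format): positive amounts are inflows,
    negative amounts are outflows. Feature names are illustrative only.
    """
    inflows = [a for a in amounts if a > 0]
    outflows = [-a for a in amounts if a < 0]  # store outflows as positive sizes
    features: dict[str, float] = {"txn_count": float(len(amounts))}
    for name, values in (("inflow", inflows), ("outflow", outflows)):
        features[f"{name}_count"] = float(len(values))
        features[f"{name}_sum"] = float(sum(values))
        # Use 0.0 for empty groups so every statement yields the same keys.
        features[f"{name}_mean"] = float(mean(values)) if values else 0.0
        features[f"{name}_median"] = float(median(values)) if values else 0.0
        features[f"{name}_std"] = float(pstdev(values)) if values else 0.0
    return features


statement = [2500.0, -1200.0, -300.0, -150.0, 50.0]
features = descriptive_features(statement)
```

Returning the same keys for every statement, even an empty one, keeps the resulting feature table rectangular, which most model-training tools expect.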
Most features in the library are numerical, with a few categorical exceptions. Every categorical feature has a small, finite set of values, so it can be handled with simple techniques such as one-hot encoding. Feature complexity ranges from simple binary features (e.g. indicating whether an event occurred) to complex behavioural features that capture information across multiple patterns.
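Because each categorical feature takes values from a small, fixed set, one-hot encoding produces a vector of known, constant width. The sketch below assumes a hypothetical categorical feature with three illustrative values; the category names are not taken from the library.

```python
# Hypothetical categorical feature with a small, finite value set.
# These category names are illustrative assumptions, not the library's values.
ACCOUNT_TYPES = ("current", "savings", "credit_card")


def one_hot(value: str, categories: tuple[str, ...] = ACCOUNT_TYPES) -> list[int]:
    """Return a 0/1 indicator vector with a single 1 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


rows = ["savings", "current", "credit_card"]
encoded = [one_hot(v) for v in rows]
# encoded == [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
```

Rejecting unknown values, rather than silently mapping them to all zeros, is a design choice that surfaces data issues early; a production pipeline might prefer a dedicated "other" column instead.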
The library contains features that support the most popular use cases for transaction data, including credit risk assessment, credit scoring, credit application fraud detection, income verification, automated loan application screening, customer segmentation for marketing and personal finance management.
Features v2.0 covers the most popular open banking use cases and also supports other sources of bank account information, including user-submitted bank statements, card transactions, transactions from mobile wallets and more.
While the library can generate a substantial number of features for every bank statement, not all of them will be predictive or carry valuable information for every use case and model. To find the most valuable features, data science teams run a feature selection process, in which a larger set of candidate features is tested and a smaller subset is picked for the final model. For the current version of the API, the Nordigen data science team assists all API users with feature selection.
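One of the simplest feature selection approaches is a univariate filter: score each candidate feature against the target on its own and keep the top k. The sketch below uses absolute Pearson correlation as the score; it is a generic illustration, not the selection process the Nordigen team applies, and the feature names and data are made up.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant feature carries no information here
    return cov / (sx * sy)


def select_top_k(features: dict[str, list[float]], target: list[float], k: int) -> list[str]:
    """Rank candidate features by |correlation| with the target and keep the k strongest."""
    scores = {name: abs(pearson(values, target)) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Made-up data: 1 = the applicant later defaulted, 0 = did not.
target = [0, 0, 1, 1, 0, 1]
candidates = {
    "outflow_std": [10, 12, 40, 45, 11, 38],
    "txn_count": [30, 28, 31, 29, 30, 32],
    "constant_flag": [1, 1, 1, 1, 1, 1],
}
selected = select_top_k(candidates, target, k=2)
```

A univariate filter is cheap but ignores interactions and redundancy between features, so in practice the selected subset would still be validated on held-out data.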
Features v2.0 has been tested in credit scoring projects, with initial results showing a Gini uplift of 7 to 14 percentage points. The new features library can be used in all 19 countries where Nordigen provides its Transaction Categorisation service.
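In credit scoring, Gini is usually defined from the ROC AUC as Gini = 2 × AUC − 1, and an uplift is the difference in Gini between a baseline model and a model with added features, expressed in percentage points. Assuming that standard definition, the sketch below shows the arithmetic; the scores and labels are made-up toy values and do not reproduce the test results above.

```python
def auc(scores: list[float], labels: list[int]) -> float:
    """ROC AUC via pairwise comparison: the share of (positive, negative)
    pairs where the positive gets the higher score; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(scores: list[float], labels: list[int]) -> float:
    """Gini coefficient as commonly used in credit scoring: 2 * AUC - 1."""
    return 2 * auc(scores, labels) - 1


def gini_uplift_pp(baseline: list[float], candidate: list[float], labels: list[int]) -> float:
    """Difference in Gini between two models, in percentage points."""
    return 100 * (gini(candidate, labels) - gini(baseline, labels))


# Toy example: 1 = default. Higher score = higher predicted risk.
labels = [1, 1, 0, 0, 0]
baseline = [0.60, 0.30, 0.50, 0.20, 0.40]   # e.g. a model without the new features
candidate = [0.70, 0.35, 0.40, 0.20, 0.30]  # e.g. the same model with new features
uplift = gini_uplift_pp(baseline, candidate, labels)
```

Here the baseline ranks 4 of 6 default/non-default pairs correctly (AUC ≈ 0.667, Gini ≈ 0.333) and the candidate ranks 5 of 6 (AUC ≈ 0.833, Gini ≈ 0.667), an uplift of about 33 percentage points on this toy data. The pairwise AUC is quadratic in the number of observations; for real portfolios a rank-based computation is preferable.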
Features v2.0 is the result of 9 months of research and testing with external partners to develop a complete set of open banking data features. Some of the newly generated features will be available as part of our existing self-service platform in products like Income, Loans, Risk and Marketing starting this year. The findings from the research will also be used to improve the performance of Simple Score.
We frequently share industry news and Nordigen product updates with our closest friends, fintech innovators and industry experts. Sign up to our newsletter to hear more from us.