Preface
The general health of a country’s credit economy is always of great concern. No country can flourish if the majority of its people remain trapped in a cycle of poverty. Lack of access to formal credit is known to play a significant role in perpetuating systemic economic barriers, intergenerational poverty, and class immobility.
In the financial world, “credit” usually refers to an agreement to receive something of value now in exchange for an explicit promise to repay it in the future. Credit gives entities (individuals and businesses) immediate access to the tools they need (such as education or machinery) to produce better outputs in the future (such as creating jobs for others).
Access to credit is considered one of the most important pillars of a country’s economic development: it increases competitiveness, creates job opportunities, helps reduce poverty, builds wealth, generates assets, promotes flexibility, and fosters inclusive economic growth. Most importantly, it enables investment in human capital and businesses and has the potential to reduce inequality in society.
Understanding the Credit Process
The credit life cycle begins when a potential borrower approaches a lender for a credit advance. From the lender’s perspective, the decision whether to grant the loan depends on the potential profitability and risk of the transaction: the lower the risk and the greater the profitability, the more willing the lender is to extend the loan.
Before deciding on lending viability, lenders seek to gauge the credibility and repayment capacity of the borrower. Every lender (bank or lending institution) has its own internal norms and procedures for underwriting loans and scrutinizing the applicant’s details and credentials. This process of assessing lending viability is known as “credit appraisal”.
To understand a borrower’s credit history, the lender considers the credit score and a detailed credit report obtained from an established credit bureau. A credit bureau is an agency that collects and maintains the credit history of borrowers and shares it with lenders.
The credit score forms an important component of the lender’s credit appraisal. If the appraisal results are favorable, the lender grants the loan; if not, it refuses the application.
Limitations of the Traditional Credit Scoring System
Traditional credit scoring agencies have provided lenders with good predictive results for many years. The underlying factors they use to determine credit scores include the borrower’s repayment history, current debt, amounts owed, type of debt, length of credit history, frequency of new credit, and interest payments. Each credit bureau has its own proprietary algorithm for assigning credit scores to borrowers.
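Each bureau’s scoring algorithm is proprietary, but a common generic convention shows how a model’s estimated probability of default (PD) becomes a points-based score: “points to double the odds” (PDO) scaling. The sketch below is a minimal illustration under that assumption; the base score of 600 at 50:1 good:bad odds and the PDO of 20 are hypothetical parameters, not any bureau’s actual settings.

```python
import math

# Hypothetical scaling parameters (illustrative only, not any bureau's values).
BASE_SCORE = 600.0  # score assigned at the base odds
BASE_ODDS = 50.0    # good:bad odds of 50:1 at the base score
PDO = 20.0          # points needed to double the good:bad odds

FACTOR = PDO / math.log(2)
OFFSET = BASE_SCORE - FACTOR * math.log(BASE_ODDS)


def score_from_pd(pd: float) -> float:
    """Map a probability of default (PD) to a points-based credit score."""
    if not 0.0 < pd < 1.0:
        raise ValueError("pd must be strictly between 0 and 1")
    good_bad_odds = (1.0 - pd) / pd
    return OFFSET + FACTOR * math.log(good_bad_odds)


# A lower PD means higher odds of repayment and therefore a higher score.
for pd in (0.01, 0.02, 0.10):
    print(f"PD {pd:.2f} -> score {score_from_pd(pd):.0f}")
```

With these parameters every 20 points doubles the good:bad odds, so the scale stays interpretable regardless of which model produced the PD.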
However, the traditional credit scoring process followed by most bureaus has historically favored affluent borrowers, leaving less well-off borrowers with no way to obtain loans. Traditional lenders have limited ways of assessing the creditworthiness of vulnerable groups such as the poor, women, and small businesses, because these groups often lack tangible data or a credit history. This creates a vicious cycle: without a loan, they cannot build a credit history, and without a credit history, they cannot obtain a loan. It is the classic chicken-and-egg problem.
This has widened the already large gap between the supply of and demand for credit products. On the one hand, lenders could not advance loans because they lacked visibility into applicants’ creditworthiness, and their loan approval ratios and profitability dipped substantially as a result. On the other hand, many applicants faced rejection because they were new to the credit market and had little background data with which to assess their true repayment capacity.
This explains why lack of data has historically prevented banks and financial institutions from extending credit to the unbanked, and thus from tapping opportunities at the bottom of the pyramid and achieving financial inclusion.
Also, with the changing times, the way people look at credit and financing has undergone a sea change over the last five years or so. Existing credit scoring techniques have not kept pace with this radical shift in how the creditworthiness of a potential loan applicant is assessed. Hence, the older credit scoring process is slowly becoming outdated, making way for new and innovative ways of scoring applicants.
The Rise of Alternative Credit Scoring
To overcome the limitations of traditional credit scoring described above, observers have proposed that lenders cast their nets wider and look for more data points that could give them a holistic view of their borrowers. This methodology is referred to as “alternative credit scoring” in the contemporary financial world.
Alternative credit scoring is a more inclusive credit scoring mechanism that goes well beyond the traditional parameters employed by credit bureaus and scoring companies such as Experian, FICO, and CIBIL. It leverages many more data sources to assess borrowers’ current financial standing and willingness to repay, yielding a more holistic credit risk assessment.
The biggest beneficiaries of alternative credit scoring are borrowers who are new to the credit and financing ecosystem. For such borrowers, insufficient centralized data is available, but this does not mean they cannot obtain credit. New-age alternative credit scoring companies use other factors, such as a borrower’s digital footprint, to determine the creditworthiness of a new customer.
This provides benefits at both ends. By democratizing access to credit, alternative scoring allows borrowers who are new to the credit ecosystem to obtain loans despite lacking credit scoring data in traditional channels. Lenders, in turn, can use alternative credit scoring to increase their penetration of previously unexplored markets while keeping their risk to a minimum.
Data for Credit Scoring
Building Blocks of Alternate Data
Multiple components are needed to bring alternate data models to life. Consumer consent and collaboration-based models are expected to become the de facto standard.
Financing companies that want to reach “thin-file” consumers need to make faster, more reliable decisions based on deeper insight and more data points. Formal credit scoring models are estimated to use about eight to 10 variables, whereas alternative credit scoring can use more than 500 data points.
a. Systems: Systems should be fully integrated and online, and should provide data in real time. Many businesses are investing heavily in machine learning and advanced analytics, yet many still struggle to turn results into insights. Platform capabilities are needed to build, manage, and deploy a combination of real-time and batch data processing.
b. Data Collection: Data (structured/unstructured) should be captured in real time, be fully integrated, and be managed by automated processes.
c. IT/Analytics: Resources should support reporting, MIS, BI, predictive analytics, and scoring models. Advanced analytics and machine learning, combined with a wide range of accessible data, can aid problem solving and improve customer insights.
d. Data analysis: Data should be used to predict behaviours, segment customers, manage sales funnels, assess risk, and so on, with models running in real time.
e. Decision Making: Information should be used across the organization to drive business strategy and support operations, marketing, and risk management.
f. Proactive Management: Products should be marketed proactively, targeting customers according to demographic, geographic, and behavioral segmentation models.
Banks hoping to use alternative credit scores to make lending decisions need to perform a series of steps. First, the shortlisted alternative data points are collected from various third-party data providers. The collated data is then pre-processed to extract the relevant data points that the ML models will use. The ML models are then run to produce the alternative credit scoring predictions. Finally, decisions on loan applications are made, for which a workflow needs to be developed.
a. Development of machine learning models for default prediction: Machine learning models make it possible to calculate concise, effective, and dynamic risk indicators that facilitate rapid, informed decision-making in a changing and increasingly competitive environment. Machine learning models are strong candidates for efficiently extracting value from data. They are constantly evolving in the search to cut analysis time, reduce discretion in the process, improve model development and validation, and produce high-performance risk indicators. Alternative credit scoring using machine learning requires the right procedures and processes to be in place. This section introduces the common pipelines and tools required to apply AI and machine learning to alternative credit scoring.
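The four steps can be sketched as a small pipeline. This is a minimal illustration, not a production design: the provider functions, feature names, toy model, and `max_pd` approval cutoff are all hypothetical stand-ins, and imputing missing values as 0.0 is a deliberately simple placeholder for real pre-processing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[float]]     # data points from one provider
Provider = Callable[[str], Record]      # hypothetical provider lookup
Model = Callable[[List[float]], float]  # returns a probability of default


@dataclass
class Decision:
    applicant_id: str
    pd: float
    approved: bool


def collect(applicant_id: str, providers: List[Provider]) -> Record:
    """Step 1: merge the data points returned by each provider."""
    merged: Record = {}
    for fetch in providers:
        merged.update(fetch(applicant_id))
    return merged


def preprocess(raw: Record, features: List[str]) -> List[float]:
    """Step 2: keep only the model's features, imputing missing values as 0.0."""
    values = []
    for name in features:
        value = raw.get(name)
        values.append(0.0 if value is None else float(value))
    return values


def decide(applicant_id: str, providers: List[Provider], features: List[str],
           model: Model, max_pd: float) -> Decision:
    """Steps 3 and 4: score the applicant, then apply the approval rule."""
    pd = model(preprocess(collect(applicant_id, providers), features))
    return Decision(applicant_id, pd, approved=pd <= max_pd)


# Illustrative stand-ins for third-party providers and a trained model.
def telco_provider(applicant_id: str) -> Record:
    return {"months_on_network": 24.0, "late_bill_payments": 1.0}


def bank_provider(applicant_id: str) -> Record:
    return {"avg_monthly_inflow": None}  # not available for this applicant


FEATURES = ["months_on_network", "late_bill_payments", "avg_monthly_inflow"]


def toy_model(x: List[float]) -> float:
    return min(1.0, 0.05 + 0.10 * x[1])


print(decide("A-001", [telco_provider, bank_provider], FEATURES, toy_model, max_pd=0.20))
```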
Source: Data for alternative credit scoring – HK Applied Science and Technology Research Institute (ASTRI)
To support decision-making about loan applications and satisfy risk management requirements, there are three phases involved in developing a machine learning model for predicting default.
b. Model Selection: There is no single answer to the question of which machine learning algorithm is best to use for alternative credit scoring because default prediction is highly dependent on the type of alternative data available. Model selection, therefore, needs to be an exploratory process involving the continuous evaluation of multiple machine learning models.
Broadening the data universe can be useful, but it also adds to model complexity. When the model must handle variables made up of numbers and characters (alphanumeric), with discrete or continuous distributions, it becomes important to decide which model generates the most accurate predictions of the probability of default.
In practice, the best machine learning algorithm depends on the problem to be solved: every algorithm has its own pros and cons when used for alternative credit scoring.
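Treating model selection as an exploratory process usually means comparing candidates on the same out-of-sample metric. The sketch below assumes k-fold cross-validation with log loss as that metric and uses two deliberately simple candidates (a base-rate benchmark and k-nearest neighbours) so that it runs with the Python standard library alone; candidates such as logistic regression, random forests, and boosted trees would be compared in the same way. The toy data is invented.

```python
import math
import random
from typing import Callable, List, Sequence

Row = List[float]
Predictor = Callable[[Row], float]                   # row -> estimated PD
FitFn = Callable[[List[Row], List[int]], Predictor]  # training data -> predictor


def base_rate_model(X: List[Row], y: List[int]) -> Predictor:
    """Benchmark: predict the training default rate for everyone."""
    rate = sum(y) / len(y)
    return lambda row: rate


def knn_model(X: List[Row], y: List[int], k: int = 3) -> Predictor:
    """k-nearest neighbours: PD = share of defaulters among the k closest rows."""
    def predict(row: Row) -> float:
        nearest = sorted(
            (sum((a - b) ** 2 for a, b in zip(row, other)), label)
            for other, label in zip(X, y)
        )[:k]
        return sum(label for _, label in nearest) / len(nearest)
    return predict


def log_loss(y_true: Sequence[int], p_pred: Sequence[float], eps: float = 1e-6) -> float:
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def cross_validate(fit: FitFn, X: List[Row], y: List[int], folds: int = 5, seed: int = 0) -> float:
    """Mean out-of-fold log loss (lower is better)."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    losses = []
    for f in range(folds):
        held_out = set(order[f::folds])
        train = [i for i in order if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        test = sorted(held_out)
        losses.append(log_loss([y[i] for i in test], [predict(X[i]) for i in test]))
    return sum(losses) / folds


# Toy data: two well-separated clusters of repaid (0) and defaulted (1) loans.
X = [[float(i)] for i in range(10)] + [[float(i + 100)] for i in range(10)]
y = [0] * 10 + [1] * 10
for name, fit in [("base rate", base_rate_model), ("3-NN", knn_model)]:
    print(name, round(cross_validate(fit, X, y), 3))
```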
c. Data Exploration: In alternative credit scoring, the available data dictates what type of model to employ and what information can be gained from the model’s output. Not all data is created equal: nowadays, in-house data may range from public financial information to temporal data. Credit scoring modelling therefore requires rigorous and detailed exploratory data analysis to identify the relevant variables and optimize the model.
The input dataset is usually split into a training set and a testing set. A common rule of thumb is 70% for training and 30% for testing, or 80% and 20% (the latter split is often loosely associated with the Pareto principle).
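Because defaults are usually a small minority of loans, a purely random 70/30 split can leave the test set with too few or too many defaulters. A common refinement, assumed here rather than taken from the text, is a stratified split that keeps the default rate roughly equal in both sets. The labels are a hypothetical toy portfolio.

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(labels: Sequence[int], test_fraction: float = 0.3,
                     seed: int = 42) -> Tuple[List[int], List[int]]:
    """Split row indices into (train, test), keeping each class's share similar."""
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for cls in sorted(set(labels)):
        members = [i for i, label in enumerate(labels) if label == cls]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


# Toy portfolio: 90 repaid loans (0) and 10 defaults (1).
labels = [0] * 90 + [1] * 10
train_idx, test_idx = stratified_split(labels, test_fraction=0.3)
print(len(train_idx), len(test_idx))     # 70 30
print(sum(labels[i] for i in test_idx))  # 3 defaults in the test set
```

scikit-learn’s `train_test_split` offers the same behaviour through its `stratify` parameter.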
d. Model Training: ML algorithms build statistical/mathematical models that can make inferences. The inputs of a machine learning algorithm are the predictors (also known as features or independent variables), and its outputs are the responses (also known as dependent variables), which the trained model learns to predict. Inputs and outputs can be either quantitative (numerical) or qualitative (categorical). A numerical output corresponds to a regression problem, such as predicting the future price of a stock. A categorical output corresponds to a classification problem, such as predicting whether a client will fail to repay a loan. In practice, the raw data first needs to be processed into meaningful, good-quality data. Next, this data is divided into a training dataset and a testing dataset. The training dataset is used to fit the model, and the testing dataset is used to evaluate its performance, for example the accuracy of its predictions.
Source: Data for alternative credit scoring – HK Applied Science and Technology Research Institute (ASTRI)
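Default prediction is the classification case described above. As a minimal sketch, logistic regression (a common baseline for probability-of-default models, chosen here as an assumption) can be fitted by gradient descent on the training set and checked on a held-out test set. The two features and all data values are hypothetical; a real project would typically use a library implementation such as scikit-learn’s `LogisticRegression`.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_pd(w: Sequence[float], b: float, row: Sequence[float]) -> float:
    """Estimated probability of default for one applicant."""
    return sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, row)) + b)


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.1, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit logistic regression weights and bias by batch gradient descent on log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            error = predict_pd(w, b, row) - target  # d(log loss)/d(logit)
            for j, x_j in enumerate(row):
                grad_w[j] += error * x_j
            grad_b += error
        w = [w_j - lr * g / n for w_j, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy data: [missed payments in last year, credit utilization] -> default flag.
X_train = [[0, 0.2], [1, 0.3], [0, 0.5], [4, 0.9], [3, 0.8], [5, 0.7]]
y_train = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X_train, y_train)

X_test, y_test = [[0, 0.4], [4, 0.8]], [0, 1]
predictions = [int(predict_pd(w, b, row) >= 0.5) for row in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(predictions, accuracy)
```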
e. Model Assessment: In the banking industry, alternative credit scoring models provide answers to questions such as “will this entity default in paying?” and “how much should be loaned to this entity?” The quality of the answers resides in the model output and its interpretation. On the one hand, the model needs to be as accurate as possible to avoid the bank incurring losses; on the other hand, the model needs to align with the bank’s working capacity and liquidity. For example, a financial lender employing a model that approves all of its customers for a loan may end up exceeding its limits. To avoid these kinds of situations, it is important not only to consider the accuracy of the model, but also to align it with the lender’s actual business operations.
A machine learning model that rejects too many loan applicants may, for example, prevent the bank from selling enough of its products. Conversely, if the number of true positives is large, the bank may not have enough staff to handle each case individually. In short, an alternative credit scoring model needs to perform well both quantitatively and qualitatively, and the right decision threshold should be set by taking the perspectives of both data scientists and business managers into account.
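The trade-off between accuracy and business capacity becomes concrete once the model’s probability of default (PD) is turned into decisions at a chosen threshold. This sketch assumes that a “positive” means “predicted to default” and that flagged applicants are declined or routed for manual handling; the scores and thresholds are made up for illustration.

```python
from typing import Dict, Sequence


def confusion_at_threshold(y_true: Sequence[int], pd_scores: Sequence[float],
                           threshold: float) -> Dict[str, int]:
    """Confusion counts, treating 'PD >= threshold' as a predicted default (positive)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, score in zip(y_true, pd_scores):
        if score >= threshold:
            counts["tp" if actual == 1 else "fp"] += 1
        else:
            counts["fn" if actual == 1 else "tn"] += 1
    return counts


def business_view(y_true: Sequence[int], pd_scores: Sequence[float],
                  threshold: float) -> Dict[str, float]:
    """Translate confusion counts into lender-facing quantities."""
    c = confusion_at_threshold(y_true, pd_scores, threshold)
    approved = c["tn"] + c["fn"]  # applicants not flagged are approved
    return {
        "approval_rate": approved / len(y_true),
        "bad_rate_among_approved": c["fn"] / approved if approved else 0.0,
        "flagged_cases": c["tp"] + c["fp"],  # workload for manual handling
    }


y_true = [0, 0, 0, 1, 1]
pd_scores = [0.10, 0.20, 0.60, 0.40, 0.90]
for threshold in (0.3, 0.5, 0.7):
    print(threshold, business_view(y_true, pd_scores, threshold))
```

Sweeping the threshold shows how approval volume, the bad rate among approved loans, and the manual workload move together, which is the conversation between data scientists and business managers described above.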
f. Feature Importance: Once the input data has been explored and the model has been trained and tested, an additional step is to analyze the importance of the variables within the model before outputting any results. Analyzing feature importance involves inspecting the variables and determining whether a change in a variable (e.g., a change in its distribution) would change the model output. Feature importance can be assessed with tools such as partial dependence plots or the ELI5 and SHAP Python libraries, some of which provide specialized support for algorithms such as random forests and boosted trees. The purpose of investigating feature importance is to further improve the model. The more important a feature is, the more information it contributes to the model’s predictions. Conversely, a feature with low importance may be irrelevant for model training and could be discarded with little or no loss of model accuracy.
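One model-agnostic way to measure feature importance is permutation importance: shuffle one feature at a time and measure how much the model’s accuracy drops. The sketch below is a standard-library illustration with a hypothetical toy model and data; scikit-learn exposes the same idea as `sklearn.inspection.permutation_importance`.

```python
import random
from typing import Callable, List, Sequence

Predictor = Callable[[List[float]], float]  # row -> estimated PD


def accuracy(predict: Predictor, X: List[List[float]], y: Sequence[int],
             threshold: float = 0.5) -> float:
    hits = sum((predict(row) >= threshold) == bool(label) for row, label in zip(X, y))
    return hits / len(y)


def permutation_importance(predict: Predictor, X: List[List[float]], y: Sequence[int],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when one feature column is shuffled at a time."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(predict, shuffled, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy model that only looks at feature 0; feature 1 is pure noise.
def model(row: List[float]) -> float:
    return 0.9 if row[0] > 0.5 else 0.1


data_rng = random.Random(1)
X = [[float(i % 2), data_rng.random()] for i in range(20)]
y = [int(row[0]) for row in X]
print(permutation_importance(model, X, y))
```

A feature whose importance stays near zero, like the noise feature here, is a candidate for removal.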
By comparing the importance of all the features, a subset of features can be selected to replace the original training dataset. Applying feature selection in this way can shorten training time, reduce the risk of overfitting, and make the model easier to interpret.
g. Model Interpretability: According to Miller, “Interpretability is the degree to which a human can understand the cause of a decision. The greater the interpretability of a machine learning model, the easier it is for humans to understand why a certain decision or prediction has been made.” Lack of interpretability has proven to be a barrier to the adoption of machine learning in the financial industry: if a model is not sufficiently interpretable, a bank may not be permitted to apply its insights to its business. To help humans interpret the outcomes of machine learning models, a number of model interpretation tools have been developed, including SHAP, ELI5, LIME, Microsoft InterpretML, XAI (eXplainable AI), Alibi, TreeInterpreter, Skater, FairML, and fairness.
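The listed libraries handle complex models. For a linear model such as logistic regression, a simple per-applicant explanation (“reason codes”) can be computed directly: each feature contributes its weight times its deviation from the portfolio average to the log-odds of default, which for a linear model with independent features matches its SHAP value on the log-odds scale. The weights, averages, and feature names below are hypothetical.

```python
from typing import Dict, List, Tuple


def reason_codes(weights: Dict[str, float], means: Dict[str, float],
                 applicant: Dict[str, float], top_n: int = 2) -> List[Tuple[str, float]]:
    """Features that raise this applicant's log-odds of default the most.

    In a logistic regression, each feature adds weight * (value - mean) to the
    log-odds relative to an average applicant.
    """
    contributions = {
        name: weight * (applicant[name] - means[name])
        for name, weight in weights.items()
    }
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    return [(name, value) for name, value in ranked[:top_n] if value > 0]


# Hypothetical fitted weights (log-odds per unit) and portfolio averages.
weights = {"late_payments": 0.8, "months_on_network": -0.05, "utility_on_time_ratio": -1.5}
means = {"late_payments": 1.0, "months_on_network": 24.0, "utility_on_time_ratio": 0.9}
applicant = {"late_payments": 4.0, "months_on_network": 6.0, "utility_on_time_ratio": 0.5}

for name, contribution in reason_codes(weights, means, applicant):
    print(f"{name}: +{contribution:.2f} log-odds")
```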
To facilitate the automation of the workflow for alternative credit analysis, an online lending platform is needed to manage the steps involved in the process. These steps include the structuring and categorization of data fields, analysis by machine learning, decision making, and continuous monitoring. An online lending platform can achieve shorter turnaround times for loan approvals, which, in stormy economic times, can be critical in helping entities to survive. It can also help lenders’ operations become more cost-effective in the processing of loan applications.
Human discretion is involved only when the final decision on a loan application is made. The advantage of this approach is its flexibility: it can take into account the results generated by challenger models that use a wide range of alternative data (both transactional and non-transactional).
Running machine learning algorithms on borrowers’ data points can enable lenders to make a variety of decisions and gain enhanced insights, such as:
a. Interest rate determination: Lenders often base their interest rates on the borrower’s perceived intent and ability to repay. The higher the risk, the higher the interest rate is.
b. Pre-approved loan amount: Based on the borrower’s overall financial standing (ability to repay) and perceived behavior (intent to repay), lenders can offer pre-approved loans up to a certain amount, which helps make their products attractive and keeps them competitive.
c. Customized repayment terms: Repayment terms can be customized for each borrower based on their specific needs. Flexibility can be offered on loan tenure, installment amounts, repayment frequency, repayment holidays, or any other parameter.
d. Collection scoring: This estimates how well the lender is likely to be able to recover a loan in case of default.
e. Attrition scoring: This estimates the likelihood that a borrower will prepay the loan.
f. Likelihood of re-borrowing: This estimates the likelihood that a borrower will borrow again from the same lender.
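The first two items can be illustrated with simple formulas. This sketch assumes a one-year loan, a PD measured over the same horizon, and a loss given default (LGD) on the principal; it prices the loan so that repaid loans cover the shortfall on defaulted ones plus the cost of funds and a margin, and it sizes the pre-approved amount from a hypothetical 40% debt-to-income cap using the standard annuity formula. All parameter values are illustrative, and real pricing also reflects operating costs, capital, competition, and regulation.

```python
def risk_based_rate(pd: float, lgd: float, cost_of_funds: float, margin: float) -> float:
    """Break-even rate for a one-period loan, given the expected credit loss.

    Solves (1 - pd) * (1 + r) + pd * (1 - lgd) = 1 + cost_of_funds + margin for r,
    i.e. repaid loans must also cover the shortfall on defaulted ones.
    """
    if not 0.0 <= pd < 1.0:
        raise ValueError("pd must be in [0, 1)")
    return (1.0 + cost_of_funds + margin - pd * (1.0 - lgd)) / (1.0 - pd) - 1.0


def pre_approved_amount(monthly_income: float, existing_monthly_debt: float,
                        annual_rate: float, months: int, max_dti: float = 0.4) -> float:
    """Largest amortizing loan whose installment keeps debt service within max_dti."""
    affordable = max(0.0, monthly_income * max_dti - existing_monthly_debt)
    i = annual_rate / 12.0
    if i == 0.0:
        return affordable * months
    # Present value of an annuity paying `affordable` for `months` periods.
    return affordable * (1.0 - (1.0 + i) ** -months) / i


rate = risk_based_rate(pd=0.05, lgd=0.6, cost_of_funds=0.04, margin=0.02)
limit = pre_approved_amount(monthly_income=2000.0, existing_monthly_debt=300.0,
                            annual_rate=rate, months=12)
print(f"rate {rate:.2%}, pre-approved up to {limit:,.0f}")
```

A higher PD raises the rate, which in turn lowers the amount the same affordable installment can support.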
At the core of the alternate credit scoring companies’ competencies are three key factors – the ability, intent, and stability of the customer to repay the loan measured on the basis of these innovative scoring systems.
Alternative credit scoring demonstrates the potential strength of combining data from multiple sources like airtime usage, mobile money usage, geolocation, bills payment history, and social media usage.
Alternative data can take various forms, ranging from data based on observations of the borrower’s operations to data relating to the business principal’s personal risk characteristics and credibility.
Transactional data refers to the records of business activities between a company and its customers. These records usually include revenue-related information (cash flow data) and non-monetary information (non-cashflow data). Behavior trends generated by analyzing revenue-related information can be used to assess a company’s latest financial status, while non-monetary information can produce insights that are useful in predicting its creditworthiness.
As machine learning techniques are increasingly used for trend analysis and default prediction, transactional data is becoming a promising type of alternative data for credit scoring. Open banking and open API initiatives are creating opportunities to acquire transactional data, and financial institutions can also source transaction-based data from third-party data providers.
i. Bank transactional data refers to all of a bank account’s cash inflows and outflows. Bank accounts are typically used by entities for receiving revenue and settling payments. Generally, bank transactional data is held by banks in large quantities and is of credible quality. Only recently did banks realize that they could take advantage of this huge amount of data. Transactional data is also relatively easy to retrieve as its recording is fully automated and it can be transferred between banks with the consent of the bank account’s owner. With access to an entity’s bank accounts, banks can formulate a new type of credit scoring model using their bank transactional data.
ii. Payment transactional data: In the retail industry, customer payment transactional data represents an entity’s sales activities. Alternative credit scoring can be performed on this data to determine whether an entity’s revenue for a given year, season, or month is good or not. Although payment transactional data is not directly related to loan defaults, it can be used to predict trends and patterns in an entity’s revenues. For example, with access to an entity’s payment transaction history, providers of online payment service platforms in the U.S. (such as PayPal, Amazon, and Square) can offer alternative lending services based on the entity’s sales data. In the trading and logistics industries, supply-chain payment profiles capture the ability of entities to pay their suppliers on time. Similarly, invoice records can be used to analyze the status of and trends in an entity’s revenue streams.
iii. Miscellaneous: Other excellent sources that could provide relevant information about an entity include, but are not limited to:
1. Supply-chain payment data.
2. Utility transaction profiles: electricity consumption.
3. Telco transaction profiles.
4. Shipping records and logistics data.
5. Account records.
6. ERP database: invoice records, A/R records.
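Cash flow data from bank transactions is usually condensed into a handful of monthly indicators before it reaches a model. The sketch below assumes a simple (date, signed amount) record format and computes a few illustrative features; the record layout and feature names are hypothetical, not a standard schema.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Iterable, Tuple

Transaction = Tuple[str, float]  # (ISO date "YYYY-MM-DD", signed amount: + inflow, - outflow)


def cash_flow_features(transactions: Iterable[Transaction]) -> Dict[str, float]:
    inflow: Dict[str, float] = defaultdict(float)
    outflow: Dict[str, float] = defaultdict(float)
    for date, amount in transactions:
        month = date[:7]  # "YYYY-MM"
        if amount >= 0:
            inflow[month] += amount
        else:
            outflow[month] -= amount  # store outflows as positive totals
    months = sorted(set(inflow) | set(outflow))
    if not months:
        raise ValueError("no transactions")
    monthly_in = [inflow[m] for m in months]
    monthly_net = [inflow[m] - outflow[m] for m in months]
    avg_in = mean(monthly_in)
    return {
        "months_observed": len(months),
        "avg_monthly_inflow": avg_in,
        # Coefficient of variation: revenue stability relative to its level.
        "inflow_volatility": pstdev(monthly_in) / avg_in if avg_in else 0.0,
        "avg_monthly_net_flow": mean(monthly_net),
        "months_with_net_outflow": sum(1 for net in monthly_net if net < 0),
    }


history = [("2024-01-05", 1000.0), ("2024-01-20", -400.0),
           ("2024-02-03", 1200.0), ("2024-02-25", -1500.0)]
print(cash_flow_features(history))
```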
b. Non-cashflow data: Besides cash flow data, transactional data records carry information that is not related to cash flow, but can be used to determine the quality of an entity’s business, i.e., non-cashflow data.
i. Target customer profile: The transaction values and patterns in the transaction records can be used to categorize the spending profiles of customers.
ii. Quality of customers: The payment identifications of the transaction records can be analyzed to identify the profiles of customers and ascertain if they are recurring or one-time customers.
iii. Quality of transactions: The percentages of cancelled, reversed, and voided transactions can reveal the quality of engagement with customers.
iv. Risk of fraud: Non-cashflow data provides credible information about risk factors and problematic transactions for fraud detection and credit-related analysis.
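Two of the non-cashflow indicators above, quality of customers and quality of transactions, reduce to simple ratios over transaction records. The record layout and status labels in this sketch are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

PROBLEM_STATUSES = {"cancelled", "reversed", "voided"}  # hypothetical status labels


def non_cashflow_indicators(records: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """records: (customer_id, status) pairs taken from transaction logs."""
    records = list(records)
    if not records:
        raise ValueError("no transaction records")
    problem = sum(1 for _, status in records if status in PROBLEM_STATUSES)
    completed = Counter(cid for cid, status in records if status == "completed")
    repeat = sum(1 for count in completed.values() if count > 1)
    return {
        # Quality of transactions: share cancelled, reversed, or voided.
        "problem_transaction_rate": problem / len(records),
        # Quality of customers: share of paying customers who came back.
        "repeat_customer_share": repeat / len(completed) if completed else 0.0,
    }


sample = [("c1", "completed"), ("c1", "completed"), ("c2", "completed"), ("c3", "voided")]
print(non_cashflow_indicators(sample))
```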
i. Company’s credit analysis reports: Quantitative data from credit bureau reports include reference data that can be very useful for assessing the creditworthiness of entities. The Commercial Credit Reference Agency (CCRA) was established in Hong Kong under an industry initiative supported by the Hong Kong Monetary Authority. The CCRA in Hong Kong is an organization that collects information about the indebtedness and credit history of business enterprises and makes this information available to lending institutions. After receiving consent from the entity, lending institutions can check with the CCRA about the entity’s credit record to help them assess a loan application. The CCRA increases lending institutions’ knowledge of borrowers’ credit records, expedites the loan approval process, and helps strengthen lending institutions’ credit risk management. The fact that information about borrowers can be exchanged by lending institutions also incentivizes borrowers to repay their loans and helps to reduce the overall default rate.
ii. Personal credit reports: The personal credit history of a borrower’s key personnel can provide lenders with a view into the lending and repayment behaviour of these individuals. As these people are the decision-makers in the company, their credit behavior is likely to have an influence on the company’s credit behaviour. For example, if the proprietor is repaying debt regularly without default, then it is very likely that the company will do the same.
iii. Data from business lending partnerships (Google, Alibaba, and Sam’s Club): This data can give lenders a holistic view of the concerned business’ overall finances.
iv. Third-party business/product/service reviews (e.g., Alexa Global Rank, Yelp, Foursquare, Amazon, and eBay): For businesses, lenders can check what people are saying about the business in question to gauge its reputation and reliability.
i. Psychometric test: A psychometric test is a standardized tool used to objectively assess traits that are not visible on the physical level (such as personality, intelligence, motivations, and needs). It is a way to understand an individual by better understanding their personality, achievement orientation, intelligence, needs, and motivations. These constructs are used for the psychometric test as they are usually found to be consistent. They can be mapped in an individual, and they can be used for profiling. Psychometric tests can provide valuable insights into an individual’s ability and willingness to repay loans. They can also provide valuable insights into an entity’s future by assessing individual personalities. Depending on the legal structure of the entity, the key person whose personality influences the entity’s prospects will be different.
1. Sole Proprietorship: The sole proprietor is responsible for all acts performed in the capacity of the business owner. Thus, the relevance of personality assessment is very high.
2. Partnership: Although certain forms of partnership (e.g., Limited Liability Partnerships) make the owner responsible for only certain types of debts, the majority of the decisions are taken by the partners. Therefore, a personality assessment of the partners can provide major insights into the prospects of the enterprise.
3. Limited company: A limited company has its own corporate identity, and the company’s liability is not the liability of its shareholders. Thus, personality assessment is not especially relevant and is not needed for this class of entities.
ii. Sentiment analysis: Sentiment analysis (or opinion mining) is a natural language processing technique used to determine whether data is positive, negative, or neutral. It is often performed on textual data to help businesses monitor brand and product sentiment in customer feedback and understand customers’ needs.
iii. Remittance history: A borrower’s remittance history is a good indicator of creditworthiness.
iv. SIM data: Information about SIM activation and customer tenure, alongside other details provided in the application form, postpaid defaults, credit, churn and payment information, and mobile wallet information.
v. Smartphone usage: Smartphones reveal a host of mobile usage parameters that are now being used to build customer risk profiles. For example, if a prospective borrower makes or receives calls from many different people throughout the day, it might be inferred that the borrower has a large social circle and would want to protect his or her social standing by meeting financial obligations on time.
vi. Social media mining: Social media data can be used to build customer profiles based on online presence. For example, if a defaulting bank customer is part of a prospective borrower’s social network, the prospective borrower may be regarded as high-risk. The assumption here is that prospective customers have traits and behaviors similar to those of their online contacts.
vii. Utility bills analysis: Utility bill payments can give interesting insights into a customer’s creditworthiness. Regular, on-time payments indicate a sense of personal responsibility, which can drive up the person’s credit score.
viii. Calling/SMS patterns: Statistics on call duration and count, cell towers, SMS sent and received, time-of-day calling, inactivity, and calling consistency.
ix. Other reference data: Other non-transactional reference data points include, but are not limited to:
1. Intellectual properties: patents, trademarks, etc.
2. Physical asset value.
3. Industry recognitions: awards.
4. Size of customer base.
Machine learning can be used to pick up borrowers’ micro-patterns. For example, raw call detail records can be transformed into behavioural patterns that correlate with risk, ultimately generating leads for financing companies. Machine learning can give banks better insights, increase their sales through improved credit approval rates, reduce bad debt through better exposure management, and minimize processing time through automated decisions.
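Turning raw call detail records into behavioural features can be sketched as follows. The record layout (timestamp, counterparty, duration, event type) and the feature definitions are assumptions that loosely mirror the calling/SMS pattern statistics listed earlier; they are not an industry-standard feature set.

```python
from datetime import datetime
from typing import Dict, Iterable, Tuple

# Hypothetical CDR row: (ISO timestamp, counterparty, duration in seconds, "call" or "sms").
CDR = Tuple[str, str, int, str]


def cdr_features(records: Iterable[CDR]) -> Dict[str, float]:
    """Summarize one subscriber's call detail records into behavioural features."""
    rows = list(records)
    if not rows:
        raise ValueError("no call detail records")
    times = [datetime.fromisoformat(ts) for ts, _, _, _ in rows]
    call_secs = [secs for _, _, secs, kind in rows if kind == "call"]
    span_days = (max(times).date() - min(times).date()).days + 1
    return {
        "distinct_contacts": len({party for _, party, _, _ in rows}),
        "avg_call_seconds": sum(call_secs) / len(call_secs) if call_secs else 0.0,
        "sms_count": sum(1 for _, _, _, kind in rows if kind == "sms"),
        # Calling consistency: share of days in the observed window with any activity.
        "active_day_ratio": len({t.date() for t in times}) / span_days,
        # Time-of-day pattern: share of events between 22:00 and 06:00.
        "night_activity_share": sum(1 for t in times if t.hour >= 22 or t.hour < 6) / len(rows),
    }


sample_cdrs = [
    ("2024-03-01T09:15:00", "+100", 120, "call"),
    ("2024-03-01T23:30:00", "+200", 0, "sms"),
    ("2024-03-03T12:00:00", "+100", 60, "call"),
]
print(cdr_features(sample_cdrs))
```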
Advantages and Limitations of Alternative Credit Scoring
Like any other scoring system, alternative credit scoring methods have their own advantages and limitations.
Advantages
1. Holistic information: Alternative data provides more insight into an entity’s creditworthiness, which banks can use to make better-informed decisions.
2. Credibility: Data from third-party sources is more credible and harder to manipulate, whereas traditional financial data is subject to accounting manipulation.
3. Uncovering credit invisibles: Early movers can tap into traditionally excluded but creditworthy borrowers, including thin-file and no-file applicants.
4. Fraud detection: Digital automation and machine learning help detect abnormal patterns in business operations, allowing lenders to detect fraud and implement risk mitigation measures.
5. Uncovering new revenue streams: A lender’s typical decline traffic contains a significant number of qualified applicants. With accurate, reliable alternative data for credit scoring, lenders can turn decline-traffic losses into revenue wins.
6. Continuous monitoring: With alternative data, the lender can monitor the borrower’s actual business situation and get a more complete and comprehensive view of the borrower’s creditworthiness.
7. Gaining a competitive advantage: The quest to expand credit access to the next billion people is becoming increasingly competitive. Being among the first movers to leverage alternative credit data and penetrate emerging markets can be a huge advantage.
Limitations
1. Data quantity: There must be enough alternative data available to build the machine learning models. By nature, alternative data is more difficult to process than financial data because it is often unstructured.
2. Data quality: It is extremely important to guarantee the quality of alternative data when creating a reliable risk assessment model, as data points with missing values or large variance can compromise the model’s output.
3. Data privacy: Personal data, or aggregated data that can be used to piece together an individual’s identity, has become a lightning rod for regulators. Data protection and privacy laws will likely apply if alternative data sources contain personal data.
4. Model fairness: Using the correct data in the machine learning model is crucial for ensuring the appropriateness of that model. Indeed, the dataset is often the first place where bias is introduced into a model, and this situation also applies to alternative data.
5. Special engineering efforts: The adoption of alternative data requires engineering efforts in the areas of data science and machine learning. Lack of relevant human resources will hinder the development of alternative credit scoring. Banks need talent in these areas to adopt this new approach.
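Two of these limitations lend themselves to routine checks. The sketch below flags columns with too many missing values (data quality) and computes a disparate impact ratio, the approval rate of one group divided by that of a reference group (model fairness). The 30% missing-value cutoff, the group labels, and the data are hypothetical; the 0.8 benchmark mentioned in the comment is the “four-fifths” rule of thumb from US employment-selection guidelines, often borrowed as a screening heuristic rather than a definitive fairness test.

```python
from typing import Dict, List, Optional, Sequence


def missing_rate_report(columns: Dict[str, List[Optional[float]]],
                        max_missing: float = 0.3) -> Dict[str, bool]:
    """Flag columns whose share of missing values exceeds max_missing."""
    return {
        name: sum(v is None for v in values) / len(values) > max_missing
        for name, values in columns.items()
    }


def disparate_impact(approved: Sequence[int], group: Sequence[str],
                     protected: str, reference: str) -> float:
    """Approval rate of the protected group divided by that of the reference group."""
    def approval_rate(g: str) -> float:
        decisions = [a for a, grp in zip(approved, group) if grp == g]
        if not decisions:
            raise ValueError(f"no applicants in group {g!r}")
        return sum(decisions) / len(decisions)

    reference_rate = approval_rate(reference)
    if reference_rate == 0:
        raise ValueError("reference group has no approvals")
    return approval_rate(protected) / reference_rate


columns = {"monthly_income": [2100.0, None, 1800.0, None],
           "months_on_network": [12.0, 30.0, 8.0, 24.0]}
print(missing_rate_report(columns))

approved = [1, 1, 0, 1, 1, 0, 0, 1]
group = ["A", "A", "A", "A", "B", "B", "B", "B"]
ratio = disparate_impact(approved, group, protected="B", reference="A")
# Under the four-fifths heuristic, a ratio below 0.8 is treated as a warning sign.
print(f"disparate impact ratio: {ratio:.2f}")
```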
What is the Future of Alternate Data?
The advantages and limitations of alternative credit scoring (ACS) coexist. The financial services industry, and the credit industry in particular, has mixed opinions about the viability of alternative data, and there are still many uncertainties about its future adoption in credit scoring. A key question is how best to combine alternative data with conventional financial data to improve credit scoring performance.
On the one hand, adoption of alternative data is expected to keep rising with the explosion of digital touchpoints with consumers. Another factor likely to bolster its adoption is the significant reduction in the cost of computing power and data storage.
On the other hand, there are concerns about its accuracy and dependability. Because some potential alternative data points are subjective in nature, it is unclear whether they can be measured reliably.
Incubated in Harvard Innovation Lab, Experfy specializes in pipelining and deploying the world's best AI and engineering talent at breakneck speed, with exceptional focus on quality and compliance. Enterprises and governments also leverage our award-winning SaaS platform to build their own customized future of work solutions such as talent clouds.