Alternative Data for Credit Scoring

The general health of a country’s credit economy is always of great concern. No country can flourish if the majority of its people remain trapped in a cycle of poverty. Lack of accessible formal credit is known to play a big role in perpetuating systemic economic barriers, inter-generational poverty, and class immobility.

August 6, 2021 Fintech



In the financial world, “credit” usually refers to an agreement to receive something of value now with an explicit promise to repay it in the future. Credit gives entities (individuals and businesses) immediate access to the tools they need (like education, machinery, etc.) so that they can produce better outcomes in the future (like providing jobs to others).

Access to credit is considered one of the most important pillars of a country’s economic development: it increases competitiveness, creates job opportunities, helps reduce poverty, builds wealth, generates assets, promotes flexibility, and fosters inclusive economic growth. Most importantly, it enables investment in human capital and businesses, which has the potential to reduce inequality in society.

Understanding the Credit Process

The credit life cycle begins when a potential borrower approaches a lender for a credit advance. From the lender’s perspective, the decision on whether to grant the loan or not is determined by the potential profitability and risk of the transaction. The lower the risk and greater the profitability to the lender, the more they would be willing to extend the loan.

Before deciding on the lending viability, lenders seek to gauge the credibility and repayment capacity of the borrower. Every lender (bank or lending institution) has its own internal norms and procedures for underwriting the loan and scrutinizing the applicant’s details and credentials. This process of assessing the lending viability is better known as “credit appraisal”.

To know the previous credit profile of a borrower, the lender considers the credit score and a detailed credit report procured from an established credit bureau. A credit bureau is an agency that collects and researches the lifetime credit information of the borrower and shares it with the lender.

This credit score forms an important component of the credit appraisal process performed by the lender. If the credit appraisal results are favorable, the lender will lend to the borrower, and if not, the lender will refuse to grant the loan.

Limitations of the Traditional Credit Scoring System

Traditional credit scoring agencies have provided lenders with strong predictive outcomes for many years. The underlying factors they use to determine credit scores include the borrower’s repayment history, current debt, amounts owed, type of debt, credit history, frequency of borrowing, and interest payments. Each credit bureau has its own proprietary algorithm for assigning credit scores to borrowers.

However, the traditional credit scoring process followed by most bureaus has historically favored affluent borrowers while leaving less well-off borrowers with few ways to obtain loans. Traditional lenders have limited ways of assessing the creditworthiness of vulnerable groups like the poor, women, and small businesses, as these groups often lack tangible data or a credit history. This leads to a vicious cycle: without a loan, they cannot build a credit score, and without a credit history, they cannot obtain a loan. It is the classic chicken-and-egg problem.

This has widened the already large gap between the supply of and demand for credit products. On the one hand, lenders cannot advance loans when they have no visibility into the applicant’s credit score, so their loan approval ratios and profitability suffer. On the other hand, many applicants face rejection because they are new to the credit market and have little background data with which to assess their true repayment capacity.

This explains why lack of data has historically been an obstacle for banks and financial institutions in extending credit to the unbanked, and thus an impediment to tapping opportunities at the bottom of the pyramid and achieving financial inclusion.

Also, the way people look at credit and financing has undergone a sea change over the last five years or so. Existing credit scoring techniques have not kept pace with this shift in how the creditworthiness of a potential loan applicant needs to be assessed. Hence, the older credit scoring process is slowly becoming outdated, making way for new and innovative ways of scoring applicants.

The Rise of Alternative Credit Scoring

In order to overcome the aforementioned limitations of the traditional credit scoring approaches, observers have proposed that lenders could possibly cast their nets wider and look for more data points, which could give them a holistic view of their borrowers. This methodology is referred to as “Alternative credit scoring” in the contemporary financial world.

Alternative credit scoring is a more inclusive credit scoring mechanism that goes well beyond the traditional parameters employed by credit bureaus and scoring companies like Experian, FICO, and CIBIL. It leverages many more data sources to assess borrowers’ current financial standing and willingness to repay, giving a more holistic credit risk assessment.

The biggest beneficiaries of alternative credit scoring mechanisms are borrowers who are new to the credit and financing ecosystem. For such borrowers, there is not enough centralized data available, but this does not mean that they cannot obtain credit. New-age alternative credit scoring companies use other tangible factors, like a digital footprint, to determine the creditworthiness of a new customer.

This provides benefits at both ends. Because access to credit is democratized, borrowers who are new to the credit ecosystem can still obtain loans despite their lack of credit scoring data on traditional channels. Lenders, in turn, can use alternative credit scoring to boost their penetration in previously unexplored territories while still keeping their risk to a minimum.

Data for Credit Scoring

1. Building Blocks of Alternate Data

Multiple components are needed to bring alternate data models to life. Consumer consent and collaboration-based models will be the de facto standard in the new world.

Financing companies that want to reach “thin file” consumers need to make faster, more reliable decisions with deeper insight and more data points. It is estimated that formal credit scoring models generally use about eight to ten variables. Meanwhile, alternative data credit scoring has the capacity to use more than 500 data points.

The major building blocks of alternate data include:

a. Systems: Systems should be fully integrated, online, and should provide data in real time. Many businesses are putting huge investments in machine learning and advanced analytics. However, many are still struggling to transform results into insights. Platform capabilities are needed to build, manage, and deploy the combination of real-time and batch data capabilities.

b. Data Collection: Data (structured/unstructured) should be captured in real time, be fully integrated, and be managed by automated processes. 

c. IT/Analytics: Resources should support reporting, MIS, BI, predictive analytics, and scoring models. Advanced analytics and machine learning combined with a wide range of accessible data can aid with problem solving as well as with improving customer insights.

d. Data analysis: Data should be used to predict behaviours, segment customers, funnel sales, assess risk, etc. Models should run in real time.

e. Decision Making: Information should be used across the organization to drive business strategy and support operations, marketing, and risk management.

f. Proactive Management: Products should be marketed proactively and target customers according to segmented demographic, geographic, and behavioral models.

2. Solution design 

Banks hoping to use alternative credit scores to make lending decisions need to perform a series of steps. First, the shortlisted alternative data points need to be collected from various third-party data providers. The collated data then needs to be pre-processed so that the relevant data points can be extracted and used to run the ML models. The outputs of the ML models then determine the alternative credit score. Finally, a workflow needs to be developed for taking decisions about loan application approval.

 a. Development of machine learning models for default prediction: Machine learning models enable synthetic, effective, and dynamic indexes to be calculated that can facilitate rapid, informed decision-making in a changing and increasingly competitive environment. Machine learning models are the best candidates for the efficient extraction of value from data. They are constantly evolving as part of the search to cut analysis time, reduce discretionary power in the process, improve model development and validation, and produce high-performance risk indicators. Alternative credit scoring using machine learning requires the right procedures and processes to be in place. This section introduces the common pipelines and tools required to empower AI/machine learning for alternative credit scoring.

Source: Data for alternative credit scoring – HK Applied Science and Technology Research Institute (ASTRI) 

To support decision-making about loan applications and satisfy risk management requirements, there are three phases involved in developing a machine learning model for predicting default.

  • Preparation: suitable ML algorithms are selected for model development, and the alternative data available for credit scoring is explored
  • Model training: the model is trained using the selected algorithms and the pre-processed data
  • Evaluation: once the model has been executed against the pre-processed data, the result must be evaluated in terms of model assessment, output consideration, and model interpretability


b. Model Selection: There is no single answer to the question of which machine learning algorithm is best to use for alternative credit scoring because default prediction is highly dependent on the type of alternative data available. Model selection, therefore, needs to be an exploratory process involving the continuous evaluation of multiple machine learning models. 

Broadening the data universe can be useful, but it also adds to model complexity. Once the model needs to compare variables consisting of numbers and characters (alphanumeric), which may have discrete or continuous distributions, it becomes important to decide which model generates the most accurate predictions of the probability of default. 

In practice, the best machine learning algorithm will depend on the problem that needs to be solved. All machine learning algorithms have their respective pros and cons as alternative credit scoring models. 

c. Data Exploration: In alternative credit scoring, the data available will dictate what type of model to employ and what information can be gained from the model’s output. Not all data is created equal. Nowadays, in-house data may range from public financial information to temporal data. Credit scoring modelling requires rigorous and detailed exploratory data analysis to distinguish the variables and optimize the model.

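The input dataset is usually split into a training set and a testing set. A common rule of thumb is to use 70% of the data for training and 30% for testing, or 80% and 20%, respectively. The 80/20 ratio is often loosely associated with the Pareto Principle, but the split is a convention rather than a strict rule.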
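As a minimal sketch of the split described above, the following Python snippet shuffles a list of records and holds out 30% of them for testing. The function name `split_train_test`, the seed, and the `applicants` records are illustrative assumptions, not part of any particular lender's pipeline. In practice, a library helper such as scikit-learn's `train_test_split` is commonly used instead.

```python
import random


def split_train_test(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical applicant records: (applicant_id, defaulted)
applicants = [(i, i % 5 == 0) for i in range(100)]

train, test = split_train_test(applicants, test_fraction=0.3)
print(len(train), len(test))
```

When defaults are rare, a stratified split (keeping the same default rate in both sets) is usually preferable to the plain shuffle shown here.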

d. Model Training: ML algorithms establish statistical/mathematical models that can make inferences. The inputs of machine learning algorithms are the training data, also known as predictors or independent variables, and the outputs of machine learning algorithms are the responses, also known as predictions or dependent variables. The inputs and outputs of machine learning algorithms can be defined as either quantitative (numerical) or qualitative (categorical). A numerical output corresponds to regression problems, such as the future price of a stock. A categorical output corresponds to classification problems, such as whether a client will fail to repay a loan. In practice, the raw data needs to be processed into meaningful data of good quality. Next, the qualifying data is divided into a training dataset and a testing dataset. The training dataset is the input of the model built by machine learning algorithms, and the testing dataset is used to evaluate the performance of the model, such as the accuracy of its predictions in response to a certain question. 

Source: Data for alternative credit scoring – HK Applied Science and Technology Research Institute (ASTRI) 
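To make the training step described above more concrete, here is a small sketch of a classification model (logistic regression fitted by gradient descent) written with the Python standard library only. The two features (debt-to-income ratio and share of late bill payments), the toy data, and all function names are assumptions made for illustration. A real scoring model would be trained on far more data with an established ML library.

```python
import math


def sigmoid(z):
    """Map a real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_default_probability(weights, bias, x):
    """Estimated probability of default for one feature vector."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic_regression(features, labels, learning_rate=0.5, epochs=3000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_samples = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = predict_default_probability(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


# Hypothetical training data: [debt_to_income, late_payment_share]
train_x = [[0.1, 0.0], [0.2, 0.1], [0.3, 0.0], [0.2, 0.2],
           [0.7, 0.5], [0.8, 0.6], [0.9, 0.4], [0.6, 0.7]]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = defaulted (categorical output)

weights, bias = train_logistic_regression(train_x, train_y)

# Score two unseen (testing) applicants.
for applicant in ([0.85, 0.6], [0.15, 0.05]):
    p = predict_default_probability(weights, bias, applicant)
    print(applicant, f"estimated default probability: {p:.2f}")
```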

e. Model Assessment: In the banking industry, alternative credit scoring models provide answers to questions such as “will this entity default in paying?” and “how much should be loaned to this entity?” The quality of the answers resides in the model output and its interpretation. On the one hand, the model needs to be as accurate as possible to avoid the bank incurring losses; on the other hand, the model needs to align with the bank’s working capacity and liquidity. For example, a financial lender employing a model that approves all of its customers for a loan may end up exceeding its limits. To avoid these kinds of situations, it is important not only to consider the accuracy of the model, but also to align it with the lender’s actual business operations. 

A machine learning model that rejects too many loan applicants may, for example, prevent the bank from delivering enough of its products. On the other hand, if the number of True Positives is large, the bank may not have enough staff to handle the cases individually. In conclusion, an alternative credit scoring model needs to perform well both quantitatively and qualitatively. The right threshold needs to be determined by taking the perspectives of both data scientists and business managers into account.
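The trade-off above can be sketched as a simple threshold search. The snippet below counts true and false positives at several candidate cut-offs and keeps the one that catches the most defaulters without flagging more applicants than the lender can handle. Here a "positive" means an applicant the model flags as a likely defaulter. The `max_flagged` capacity limit, the candidate thresholds, and the sample scores are hypothetical values chosen for illustration.

```python
def confusion_counts(probabilities, actual_defaults, threshold):
    """Count outcomes when applicants with p >= threshold are flagged."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for p, defaulted in zip(probabilities, actual_defaults):
        flagged = p >= threshold
        if flagged and defaulted:
            counts["tp"] += 1
        elif flagged and not defaulted:
            counts["fp"] += 1
        elif not flagged and defaulted:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def pick_threshold(probabilities, actual_defaults, candidates, max_flagged):
    """Return (threshold, counts) catching the most defaulters within capacity."""
    best = None
    for threshold in candidates:
        counts = confusion_counts(probabilities, actual_defaults, threshold)
        if counts["tp"] + counts["fp"] > max_flagged:
            continue  # more flagged cases than staff can review
        if best is None or counts["tp"] > best[1]["tp"]:
            best = (threshold, counts)
    return best


# Hypothetical model outputs and observed outcomes (1 = defaulted).
probs = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.75, 0.85, 0.95]
actual = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

threshold, counts = pick_threshold(probs, actual, [0.3, 0.5, 0.7], max_flagged=5)
print(threshold, counts)
```

With these sample values, the 0.3 cut-off is discarded because it flags seven applicants, which exceeds the assumed capacity of five.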

f. Feature Importance: Once the input data has been explored and the model has been trained and tested, an additional step is to analyze the importance of the variables within the model before outputting any results. Analyzing feature importance involves inspecting the variables and deciding whether a change in a variable (e.g., a change of distribution) would change the model output. Feature importance can be assessed with software tools such as Partial Dependence Plots, ELI5, or the SHAP Python library for specific algorithms, such as Random Forests and Boosted Trees. The reason for investigating feature importance is to further improve the model. The more important a feature is, the greater the amount of information it contains. Conversely, if a feature has low importance, it could be irrelevant for model training, and there may be little or no loss of model accuracy if it is discarded.

By comparing the importance of all the features, a subset of features can be selected to replace the original training dataset. There are three advantages to applying feature selection:

  • The training time is shortened
  • Reducing the number of features can simplify the learning model and improve model interpretability
  • This process can help prevent overfitting and improve how well the model generalizes
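One simple, model-agnostic way to estimate feature importance is permutation importance: shuffle one feature at a time and measure how much the model's accuracy drops. The sketch below uses this technique in place of the libraries named above, because it can be written with the standard library alone. The stand-in `model`, the two features, and the data are assumptions made for illustration.

```python
import random


def accuracy(model, rows, labels):
    """Share of rows for which the model's prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_features, repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild the rows with only column j replaced by shuffled values.
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances.append(sum(drops) / repeats)
    return importances


def model(row):
    """Hypothetical stand-in for a trained model: uses only feature 0."""
    return int(row[0] > 0.3)


# Hypothetical data: [late_payment_share, unrelated_noise]
rows = [[0.05, 0.9], [0.10, 0.1], [0.15, 0.5], [0.20, 0.3], [0.25, 0.7],
        [0.40, 0.2], [0.50, 0.8], [0.60, 0.4], [0.70, 0.6], [0.80, 0.0]]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

importances = permutation_importance(model, rows, labels, n_features=2)
print(importances)
```

A feature whose importance stays at zero, like the noise column here, is a candidate for removal during feature selection.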


g. Model Interpretability: According to Miller, “Interpretability is the degree to which a human can understand the cause of a decision. The greater the interpretability of a machine learning model, the easier it is for humans to understand why a certain decision or prediction has been made.” Model interpretability has proven to be a barrier to the adoption of machine learning in the financial industry. If a model is not highly interpretable, a bank may not be permitted to apply its insights to its business. To help humans interpret the outcomes of machine learning models, a number of model interpretation technologies have been developed. These technologies include SHAP, ELI5, LIME, Microsoft InterpretML, XAI (explainableAI), Alibi, TreeInterpreter, Skater, FairML, and fairness.

3. Workflow for Alternative Credit Scoring


To facilitate the automation of the workflow for alternative credit analysis, an online lending platform is needed to manage the steps involved in the process. These steps include the structuring and categorization of data fields, analysis by machine learning, decision making, and continuous monitoring. An online lending platform can achieve shorter turnaround times for loan approvals, which, in stormy economic times, can be critical in helping entities to survive. It can also help lenders’ operations become more cost-effective in the processing of loan applications.

  • Step I – Structuring data from API channels: To streamline and automate the credit underwriting workflow, an online lending platform will collect alternative data related to the credit history of the entities directly from third-party data providers (with due consent). A straight-through transfer of this alternative data can be achieved through new open banking API interfaces that are being made possible by Open API initiatives. In compliance with the requirements of Open API initiatives, an online lending platform needs to maintain the status of the consent of entities. For example, the platform should treat the consent as revoked if its validity period has expired or if the entity decides to withdraw it. A notable example of an Open API initiative is the revised Payment Services Directive (PSD2), which has applied in the European Union (EU) since January 2018. All regulated payment service providers in the EU need to comply with PSD2 and the Regulatory Technical Standards set out by the European Banking Authority. The Access to Account (XS2A) requirements under PSD2 give financial institutions and regulated third parties access to the bank accounts of consumers, subject to the account holder’s consent. Structuring and categorization of input data from APIs and other direct access methods are critical pre-processing steps required before the structured data moves to the next step. Data fields coming from the API channel are pre-defined and well-structured.
  • Step II – Structuring data from bank statements through OCR and XS2A: Entities can also upload their own bank statements and financial statement documents to the online lending platform. OCR technology can then be applied to locate, pull, and capture the data fields from the uploaded documents. Alternatively, entities can authorize access to their bank statement information via access channels that comply with the requirements of XS2A. 
  • Step III – Categorization of data variables using NLP and machine learning: The data fields captured by OCR and other direct access pipelines are unstructured and not pre-defined, and so they need to be structured. The next step is to perform the categorization of data fields by transaction text analysis. Natural Language Processing (NLP) technology is required to determine the meaning of data fields and categorize them into the data variables that are required by the machine learning model.
  • Step IV – Continuous monitoring: The online lending platform can perform a reassessment of the entity’s creditworthiness based on any up-to-date alternative data received. Continuous monitoring of changes in the entity’s creditworthiness can help lenders control and minimize their risk exposure. Compared with the conventional approach to credit scoring, another major benefit to lenders deploying an online lending platform is that it provides them with the ability to perform continuous monitoring of the ongoing financial risk associated with the entity in their lending portfolios. With continuous monitoring, an online lending platform can evaluate smaller loan credit lines more often and detect the following situations:
    • Tendency for delinquency
    • Change in risk profile
    • Potential loan application fraud
    • Signs of risky credit conditions  
  • Step V – Champion and challenger models: In the final step, the results of the alternative credit scoring using a ML model need to be combined with the results of the conventional credit scoring model. Using both conventional and alternative credit scoring models is a prudent strategy for financial lenders. The combining of results supports both the final decision-making and risk management for continuous monitoring. Due to the nature of ML algorithms, alternative credit scoring models require historical data and iterative fine-tuning to improve their accuracy. The insights generated by conventional models should, therefore, always be used as a basic reference for creditworthiness assessments. One common approach to managing the coexistence of conventional and alternative credit scoring is known as the champion–challenger approach. 
  • The champion–challenger approach involves comparing the results of a conventional credit scoring model (champion) with the results of different alternative credit scoring models (challengers). Financial lenders can adopt this approach to compare credit score outputs from the existing champion with those by a number of challengers, which are dynamically created by adjusting different rule sets. Reliable ways are required for comparing the effectiveness of champions and challengers and measuring and combining the results. 
  • To support decision making, the individual champion and challenger credit scores should be used to generate a combined scoring result. The combined scoring result of the champion and challenger models can be visualized by using three separate methods. 
    • Score Matrices: The risk scores of the champion and individual challengers can be compared by a matrix representation in which different bands of risk level can be identified.
    • Decision tree: Loan applications can go through different assessments using the champion and challenger models. The risk scores of the champion and individual challengers can then be assessed phase-by-phase in a specific sequence. Based on the sequence of assessment, loan applicants may be granted a second chance.
    • Composite score: Decisions can also be made based on a composite score generated from multiple individual scores. Machine learning techniques, such as logistic regression and stacking, can be deployed if a composite risk score is the result of combining the scores of the champion and multiple challengers.

Human discretion is involved only when the final decision for a loan application is made. The advantage of this approach is its flexibility in considering the results generated by challengers by using a wide range of alternative data (both transactional data and non-transactional data).
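The score-matrix method described above can be sketched as a lookup table. Each model's default probability is mapped to a risk band, and the pair of bands selects an outcome, with disagreements between the models referred to a human reviewer. The band cut-offs, the outcome labels, and the contents of `DECISION_MATRIX` are hypothetical; each lender would set its own.

```python
def risk_band(default_probability, cutoffs=(0.2, 0.5)):
    """Map a default probability to a 'low', 'medium', or 'high' risk band."""
    if default_probability < cutoffs[0]:
        return "low"
    if default_probability < cutoffs[1]:
        return "medium"
    return "high"


# Keys are (champion band, challenger band). All outcomes are assumptions.
DECISION_MATRIX = {
    ("low", "low"): "approve",
    ("low", "medium"): "approve",
    ("low", "high"): "refer",
    ("medium", "low"): "approve",
    ("medium", "medium"): "refer",
    ("medium", "high"): "decline",
    ("high", "low"): "refer",  # the challenger gives a second chance
    ("high", "medium"): "decline",
    ("high", "high"): "decline",
}


def decide(champion_probability, challenger_probability):
    """Combine the champion and challenger scores into one outcome."""
    bands = (risk_band(champion_probability), risk_band(challenger_probability))
    return DECISION_MATRIX[bands]


print(decide(0.10, 0.10))  # both models see low risk
print(decide(0.60, 0.10))  # models disagree, so a person reviews the case
print(decide(0.60, 0.70))  # both models see high risk
```

A composite score, by contrast, would feed the individual scores into a second model rather than a fixed table.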

4. Possible decisions 

Running machine learning algorithms on borrowers’ data points can give lenders enhanced insights and the ability to make a variety of decisions, such as:

a. Interest rate determination: Lenders often base their interest rates on the borrower’s perceived intent and ability to repay. The higher the risk, the higher the interest rate is.

b. Pre-approved loan amount: Based on the borrower’s overall financial standing (ability to repay) and perceived behavior (intent to repay), lenders can offer pre-approved loans up to a certain amount, which can help lenders make their products attractive and remain competitive.

c. Customized repayment terms: Repayment terms can be customized for every borrower based on their specific needs. Flexibility can be given either on loan tenure, repayment installments, frequency of repayments, repayment holidays, or any other parameter.

d. Collection scoring: This refers to the degree to which the lenders are able to recover their loans in case of defaults.

e. Attrition scoring: This refers to the likelihood that borrowers will prepay their loans.

f. Likelihood of re-borrowing: This refers to the likelihood that borrowers will borrow again from the same lender.

At the core of the alternate credit scoring companies’ competencies are three key factors – the ability, intent, and stability of the customer to repay the loan measured on the basis of these innovative scoring systems.

Alternative credit scoring demonstrates the potential strength of combining data from multiple sources like airtime usage, mobile money usage, geolocation, bills payment history, and social media usage.

Alternative Data for Credit Scoring

Alternative data can take various forms, ranging from data based on observations of the borrower’s operations to data relating to the business principal’s personal risk characteristics and credibility.

The various sets of data that can be potentially used to determine an entity’s alternate credit score (ACS) include:

1. Transactional Data:

Transactional data refers to the records of business activities between a company and its customers. These records usually include revenue-related information (cash flow data) and non-monetary information (non-cashflow data). The behavior trends generated by an analysis of revenue-related information can be used to assess the latest financial status of a company. At the same time, non-monetary information can produce insights that are useful in predicting a company’s creditworthiness.

With machine learning techniques now being used to perform trend analysis and default prediction, transactional data is becoming a promising type of alternative data for credit scoring. Opportunities to acquire transactional data are being created by open banking and Open API initiatives, while financial institutions can also source transaction-based data from third-party data providers.

a. Cash Flow data:

i. Bank transactional data refers to all of a bank account’s cash inflows and outflows. Bank accounts are typically used by entities for receiving revenue and settling payments. Generally, bank transactional data is held by banks in large quantities and is of credible quality. Only recently did banks realize that they could take advantage of this huge amount of data. Transactional data is also relatively easy to retrieve as its recording is fully automated and it can be transferred between banks with the consent of the bank account’s owner. With access to an entity’s bank accounts, banks can formulate a new type of credit scoring model using their bank transactional data.

ii. Payment transactional data: In the retail industry, customer payment transactional data represents the sales activities of an entity. Alternative credit scoring can be performed on transactional data to identify whether the revenue of an entity for any given year, season, or month is good or not. Although payment transactional data is not directly related to defaults on loans, it can be used to predict trends and patterns in an entity’s revenues. For example, with access to an entity’s payment transaction history, providers of online payment service platforms in the U.S. (such as PayPal, Amazon, and Square) can offer alternative lending services based on the entity’s sales data. In the trading and logistics industries, supply-chain payment profiles capture the ability of entities to pay their suppliers on time. Similarly, invoice information records can be used to analyze the status of and trends in an entity’s revenue streams.

iii. Miscellaneous: Other excellent sources that could provide relevant information about an entity include, but are not limited to:

1. Supply-chain payment data.

2. Utility transaction profiles: electricity consumption.

3. Telco transaction profiles.

4. Shipping records and logistics data.

5. Account records.

6. ERP database: Invoice records, A/R records.
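As an illustration of how the cash flow data described above might be turned into model inputs, the sketch below aggregates a list of bank transactions into monthly inflow, outflow, and net cash flow features. The `(date, amount)` record layout, the sign convention, and the feature names are assumptions; real bank data feeds will differ.

```python
from collections import defaultdict
from statistics import mean, pstdev


def monthly_cash_flow_features(transactions):
    """Summarize (iso_date, amount) records; positive amounts are inflows."""
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for iso_date, amount in transactions:
        month = iso_date[:7]  # 'YYYY-MM'
        if amount >= 0:
            inflow[month] += amount
        else:
            outflow[month] += -amount
    months = sorted(set(inflow) | set(outflow))
    if not months:
        raise ValueError("at least one transaction is required")
    net = [inflow[m] - outflow[m] for m in months]
    return {
        "months": len(months),
        "avg_monthly_inflow": mean(inflow[m] for m in months),
        "avg_monthly_outflow": mean(outflow[m] for m in months),
        "avg_net_cash_flow": mean(net),
        "net_cash_flow_volatility": pstdev(net),
        "negative_months": sum(1 for value in net if value < 0),
    }


# Hypothetical bank statement lines for one entity.
statement = [
    ("2021-01-05", 1000.0), ("2021-01-20", -400.0),
    ("2021-02-03", 800.0), ("2021-02-15", -900.0),
    ("2021-03-10", 1200.0), ("2021-03-25", -600.0),
]

features = monthly_cash_flow_features(statement)
for name, value in features.items():
    print(name, value)
```

Features like these could then be fed, alongside other variables, into a default prediction model.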

b. Non-cashflow data: Besides cash flow data, transactional data records carry information that is not related to cash flow, but can be used to determine the quality of an entity’s business, i.e., non-cashflow data.

i. Target customer profile: The transaction values and patterns in the transaction records can be used to categorize the spending profiles of customers.

ii. Quality of customers: The payment identifications of the transaction records can be analyzed to identify the profiles of customers and ascertain if they are recurring or one-time customers.

iii. Quality of transactions: The percentages of cancelled, reversed, and voided transactions can reveal the quality of engagement with customers.

iv. Risk of fraud: Non-cashflow data provides credible sources of information relating to risk factors and problematic transactions for fraud detection and credit-related analysis.
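Two of the non-cashflow indicators above, the quality of transactions and the quality of customers, can be sketched as simple ratios. The record layout (`customer_id` and `status` fields) and the status values are hypothetical.

```python
from collections import Counter

PROBLEM_STATUSES = {"cancelled", "reversed", "voided"}


def non_cashflow_metrics(transactions):
    """Compute problem-transaction and recurring-customer shares."""
    if not transactions:
        raise ValueError("at least one transaction is required")
    problems = sum(1 for t in transactions if t["status"] in PROBLEM_STATUSES)
    # Count completed purchases per customer to spot repeat buyers.
    purchases = Counter(
        t["customer_id"] for t in transactions if t["status"] == "completed"
    )
    recurring = sum(1 for count in purchases.values() if count > 1)
    return {
        "problem_transaction_share": problems / len(transactions),
        "recurring_customer_share":
            recurring / len(purchases) if purchases else 0.0,
    }


# Hypothetical payment records for one merchant.
records = [
    {"customer_id": "A", "status": "completed"},
    {"customer_id": "A", "status": "completed"},
    {"customer_id": "B", "status": "completed"},
    {"customer_id": "C", "status": "completed"},
    {"customer_id": "C", "status": "cancelled"},
    {"customer_id": "D", "status": "reversed"},
    {"customer_id": "A", "status": "completed"},
    {"customer_id": "E", "status": "completed"},
]

metrics = non_cashflow_metrics(records)
print(metrics)
```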

2. Non-transactional data:

a. External reports:

i. Company credit analysis reports: Quantitative data from credit bureau reports include reference data that can be very useful for assessing the creditworthiness of entities. The Commercial Credit Reference Agency (CCRA) was established in Hong Kong under an industry initiative supported by the Hong Kong Monetary Authority. The CCRA in Hong Kong is an organization that collects information about the indebtedness and credit history of business enterprises and makes this information available to lending institutions. After receiving consent from the entity, lending institutions can check with the CCRA about the entity’s credit record to help them assess a loan application. The CCRA increases lending institutions’ knowledge of borrowers’ credit records, expedites the loan approval process, and helps strengthen lending institutions’ credit risk management. The fact that information about borrowers can be exchanged by lending institutions also incentivizes borrowers to repay their loans and helps to reduce the overall default rate.

ii. Personal credit reports: The personal credit history of a borrower’s key personnel can provide lenders with a view into the lending and repayment behavior of these individuals. As these people are the decision-makers in the company, their credit behavior is likely to influence the company’s credit behavior. For example, if the proprietor repays debt regularly without default, then the company is likely to do the same.

iii. Data from business lending partnerships (Google, Alibaba, and Sam’s Club): This data can give lenders a holistic view of the overall finances of the business concerned.

iv. Third-party business/products/services review (e.g., Alexa Global Rank, Yelp, Foursquare, Amazon, and eBay): For businesses, lenders can check what people are saying about the business in question in order to gauge its reputation and reliability.

b. Behavioural traits:

i. Psychometric test: A psychometric test is a standardized tool used to objectively assess traits that are not visible on the physical level (such as personality, intelligence, motivations, and needs). It is a way to understand an individual by better understanding their personality, achievement orientation, intelligence, needs, and motivations. These constructs are used for the psychometric test as they are usually found to be consistent. They can be mapped in an individual, and they can be used for profiling. Psychometric tests can provide valuable insights into an individual’s ability and willingness to repay loans. They can also provide valuable insights into an entity’s future by assessing individual personalities. Depending on the legal structure of the entity, the key person whose personality influences the entity’s prospects will be different.

1. Sole Proprietorship: The sole proprietor is responsible for all acts performed in the capacity of the business owner. Thus, the relevance of personality assessment is very high.

2. Partnership: Although certain forms of partnership (e.g., Limited Liability Partnerships) make a partner responsible for only certain types of debts, the majority of the decisions are taken by the partners. Therefore, a personality assessment of the partners can provide major insights into the prospects of the enterprise.

3. Limited company: A limited company has its own corporate identity, and the company’s liability is not the liability of its shareholders. Thus, personality assessment is not especially relevant and is not needed for this class of entities.

ii. Sentiment analysis: Sentiment analysis (or opinion mining) is a natural language processing technique used to determine whether data is positive, negative, or neutral. Sentiment analysis is often performed on textual data to help businesses monitor brand and product sentiment in customer feedback and understand customers’ needs.
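To make the idea of labelling text as positive, negative, or neutral concrete, here is a minimal sketch of a lexicon-based classifier. The word lists, the function name classify_sentiment, and the sample reviews are all hypothetical and chosen only for illustration; a production system would more likely rely on a trained language model or a curated lexicon.

```python
import re

# Hypothetical, tiny word lists for illustration only.
POSITIVE = {"good", "great", "reliable", "fast", "excellent", "friendly"}
NEGATIVE = {"bad", "late", "rude", "slow", "poor", "broken"}


def classify_sentiment(text: str) -> str:
    """Label text as positive, negative, or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up customer reviews of a business applying for credit.
reviews = [
    "Great service and reliable delivery",
    "Rude staff and slow delivery",
    "The shop is on the main road",
]
labels = [classify_sentiment(r) for r in reviews]
print(labels)
```

A lender could aggregate such labels over many reviews of a business, for instance as the share of positive reviews, and treat that share as one input among many.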

c. Other relevant data points:

i. Geolocation: Mobility from call detail record data, roaming transactions, day-and-night presence, density of location, prominent location with attributes from census, and publicly available data.

ii. Remittance history is a good indicator of borrower creditworthiness.

iii. SIM data: Information about SIM activation, tenure of customers alongside other details indicated in the application form, postpaid defaults, credit, churn and payment information, and mobile wallet information.

iv. Smartphones reveal a host of mobile usage parameters that are now being used for building risk profiles of customers. For example, if a prospective borrower makes or receives calls from many different people throughout the day, it can probably be inferred that the borrower has a large social circle and would want to protect their social status by repaying their financial obligations on time.

v. Social media mining can be used to build profiles of customers based on their online presence. For example, if a defaulting customer of a bank is part of the social network of a prospective borrower, then that prospective borrower can be regarded as high-risk. The assumption here is that prospective customers will have traits and behaviours similar to their online contacts.

vi. Top-up information: Type, size, and frequency of top-ups, alongside the channel of top-up, the channel of bill payment, the invoiced amount, the payment terms, and the mode of payment (bank, credit, e-wallet, etc.).

vii. Utility bills analysis can give interesting insights into the creditworthiness of a customer. Regular, on-time payments indicate a user’s sense of personal responsibility, which can drive up the person’s credit score.

viii. Calling/SMS Patterns: Statistics on call duration/count, towers, SMS sent and received, time-of-day calling, inactivity, and calling consistency.

ix. Data Usage: Data used, revenue generated from data, hourly usage of data, data-related value-added services, applications used, and websites browsed during the day and night.

x. Demographics: Demographic information recorded when a customer applies for a SIM card, as well as inferred data.

xi. Other reference data: Other reference non-transactional data points include, but are not limited to:

1. Intellectual properties: patents, trademarks, etc.

2. Physical asset value.

3. Industry recognitions: awards.

4. Size of customer base.

Machine learning can be used to pick up borrowers’ micro-patterns. For example, raw call detail records can be transformed into behavioural patterns that correlate with risk, ultimately generating leads for financing companies. The use of machine learning can give banks better insights, increase their sales through improved credit approval rates, reduce bad debt through better exposure management, and minimize processing time through automated decisions.
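As a rough illustration of how raw call detail records might be turned into behavioural features before any model sees them, consider the sketch below. The record layout, the feature names, and the definition of "night" (22:00 to 05:59) are assumptions made for this example, not a description of any particular lender's pipeline.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical call detail records:
# (subscriber_id, contact_id, start_time, duration_seconds)
RECORDS = [
    ("A", "x", "2021-08-01 09:15:00", 120),
    ("A", "y", "2021-08-01 23:40:00", 60),
    ("A", "x", "2021-08-02 14:05:00", 180),
    ("B", "z", "2021-08-01 02:30:00", 30),
]


def is_night(hour: int) -> bool:
    # Assumption: "night" means 22:00-05:59.
    return hour >= 22 or hour < 6


def behavioural_features(records):
    """Aggregate raw records into one feature dictionary per subscriber."""
    grouped = defaultdict(list)
    for subscriber, contact, start, duration in records:
        hour = datetime.strptime(start, "%Y-%m-%d %H:%M:%S").hour
        grouped[subscriber].append((contact, hour, duration))

    features = {}
    for subscriber, calls in grouped.items():
        n = len(calls)
        features[subscriber] = {
            "call_count": n,
            "distinct_contacts": len({c for c, _, _ in calls}),
            "avg_duration_s": sum(d for _, _, d in calls) / n,
            "night_call_share": sum(is_night(h) for _, h, _ in calls) / n,
        }
    return features


features = behavioural_features(RECORDS)
for subscriber, row in features.items():
    print(subscriber, row)
```

Feature rows like these, joined with repayment outcomes, are the kind of input a risk model could then be trained on; whether any given feature actually correlates with risk has to be established on real data.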

Advantages and Limitations of Alternative Credit Scoring

Like any other scoring system, alternative credit scoring methods have their own advantages and limitations.


Advantages:

1. Holistic information: Alternative data presents more insights into an entity’s creditworthiness, which the banks can use to make better-informed decisions.

2. Credibility: Data from third-party sources is more credible and harder to manipulate, whereas traditional financial data is subject to accounting manipulation.

3. Uncovering credit invisibles: Early movers can tap into traditionally excluded but creditworthy borrowers, including thin-file and no-file applicants.

4. Fraud detection: Digital automation and machine learning help detect abnormal patterns of business operations, from which lenders can detect fraud and implement risk mitigation measures.

5. Uncovering new revenue streams: A lender’s typical decline traffic has a significant number of qualified applicants hidden in it. With accurate, reliable alternative data for credit scoring, lenders can turn decline traffic losses into revenue wins.

6. Continuous monitoring: With alternative data, the lender can monitor the borrower’s actual business situation and get a more complete and comprehensive view of the borrower’s creditworthiness.

7. Gaining a competitive advantage: The quest to expand credit access among the next billion population is becoming increasingly competitive. Being among the first movers to leverage alternative credit data and penetrate emerging markets can be a huge advantage.


Limitations:

1. Data quantity: There must be enough alternative data available to build the machine learning models. By nature, alternative data is more difficult to process than financial data because the data format is often unstructured.

2. Data quality: It is extremely important to guarantee the quality of the alternative data when creating a reliable risk assessment model as data points containing no value or having a large variance can compromise the output of the model.

3. Data privacy: Personal data, or data that in aggregate can be used to piece together an individual’s identity, has become a lightning rod for regulators. Data protection and privacy laws will likely be applicable if alternative data sources contain personal data.

4. Model fairness: Using the correct data in the machine learning model is crucial for ensuring the appropriateness of that model. Indeed, the dataset is often the first place where bias is introduced into a model, and this situation also applies to alternative data.

5. Special engineering efforts: The adoption of alternative data requires engineering efforts in the areas of data science and machine learning. Lack of relevant human resources will hinder the development of alternative credit scoring. Banks need talent in these areas to adopt this new approach.
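The data quality concern in the list above can be sketched as a simple screening step that flags feature columns with too many missing values or an unusually large relative variance before they reach a model. The table contents, the function name screen_columns, and both thresholds are hypothetical; in practice the cut-offs would be tuned to the dataset, and the coefficient of variation is only one possible way to measure "large variance".

```python
from statistics import mean, pstdev

# Hypothetical feature table: column name -> one value per applicant.
# None marks a missing value.
TABLE = {
    "monthly_topup": [10.0, 12.0, 11.0, 9.0],
    "data_usage_gb": [None, None, None, 2.0],
    "remittance": [1.0, 500.0, 2.0, 1.0],
}


def screen_columns(table, max_missing=0.5, max_cv=1.5):
    """Return {column: reason} for columns that fail either check.

    Assumes every column holds at least one entry.
    """
    flagged = {}
    for name, values in table.items():
        present = [v for v in values if v is not None]
        missing_rate = 1 - len(present) / len(values)
        if missing_rate > max_missing:
            flagged[name] = "too many missing values"
            continue
        avg = mean(present)
        # Coefficient of variation: standard deviation relative to the mean.
        cv = pstdev(present) / abs(avg) if avg else float("inf")
        if cv > max_cv:
            flagged[name] = "high relative variance"
    return flagged


flagged = screen_columns(TABLE)
print(flagged)
```

Flagged columns are not necessarily useless; the point of the sketch is only that they deserve a closer look (imputation, outlier handling, or exclusion) before being trusted in a risk assessment model.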

What is the Future of Alternate Data?

ACS’ advantages and limitations coexist. The financial services industry, specifically the credit industry, has mixed opinions about the viability of alternative data. There are still many uncertainties regarding the future adoption of alternative data for use in credit scoring. A key question is how to best combine the use of alternative data with conventional financial data in order to improve credit scoring performance.

On one hand, it is believed that the adoption of alternative data will continue to rise with the explosion of digital interfaces, or digital touchpoints, with consumers. Another factor that is likely to bolster its adoption is the significant reduction in the cost of computing power and data storage.

On the other hand, there are concerns about its accuracy and dependability. As some of the possible alternative data points are subjective in nature, there are concerns as to whether they can be measured reliably.

Summary: A brief summary of the blog can be found in this presentation