Machine learning, which tech folks call “ML,” is simply the use of powerful computers and software tools that work on large sets of data to discover patterns and make predictions. The reason you hear about “big data” is that you need a large enough set of data to find statistically significant correlations that let you make predictions with confidence.
For example, when New York City released a year’s worth of taxi fare data, it became clear where people were hailing cabs at each time of day and day of the week. Much of what was discovered was intuitively true for any experienced cab driver, but machine learning analyses could also predict the tip amount from location, time, fare, and other factors, which is an example of finding correlations. Researchers using the same data also discovered that variations in the taxi cab software were causing differences in the amounts tipped.
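To make the idea of "finding correlations" concrete, here is a minimal sketch of fitting a straight line that predicts tip from fare. It does not use the actual New York City data: the trips are synthetic, and the 18% tipping rate, the noise level, and the function names are all assumptions made up for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def make_synthetic_trips(n, seed=0):
    """Generate made-up (fare, tip) pairs; NOT real taxi data.

    Assumption: riders tip about 18% of the fare, plus random noise.
    """
    rng = random.Random(seed)
    fares = [rng.uniform(5.0, 60.0) for _ in range(n)]
    tips = [0.18 * fare + rng.gauss(0.0, 1.0) for fare in fares]
    return fares, tips


fares, tips = make_synthetic_trips(5000)
slope, intercept = fit_line(fares, tips)

# Predict the tip on a hypothetical $20 fare from the fitted line.
predicted_tip = slope * 20.0 + intercept
print(f"estimated tip rate: {slope:.3f}, predicted tip on $20: {predicted_tip:.2f}")
```

With thousands of trips the fitted slope lands close to the rate that generated the data; with only a handful of trips it would bounce around, which is the practical reason a large data set matters. A real analysis would also bring in location and time of day rather than fare alone.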
But the reason ML is really taking off now is a combination of factors: cheap and powerful computing resources, mature algorithms and tools (many of which are free to use), and an abundance of data. This last point is important to fully appreciate, because our lives are increasingly throwing off tons of data about us and what we’re doing. Every Facebook login, every cell phone call, credit card swipe, freeway toll, even viewing this web page generates a stream of data about who looked, when, and where they came from. As more of our lives and work migrate to digital information (such as accounting systems online, point of sale terminals connected to the internet, order tracking, even customer reviews and feedback) there’s simply more data for ML to chew on. So it’s feasible, it’s getting cheaper, and it’s getting more accurate at a rapid rate.
The idea of using ML to predict who’s not going to repay their loan is not new. Back in 2011 (ancient history in tech) Kaggle, a machine learning web site, ran a competition for the most accurate algorithm to predict financial distress. IBM is now using its big data and ML tech to tackle credit card fraud detection. PayPal is using ML to detect fraud too. And with publicly available Lending Club data, people are sharing ways to analyze the data and predict bad loans. Increasingly, tech-savvy companies do this wherever there is plenty of data and a compelling economic benefit from analyzing it at scale.
Not surprisingly, applying ML to business lending makes sense where the data is available. But the challenge with business lending is, of course, that it’s a lot more complicated than consumer lending. While a FICO score and some demographics on the borrower may be sufficient to drive consumer credit predictions, the set of data points about a business is going to encompass available financial data (accounts receivable, accounts payable, general ledger, inventory) along with third-party data sources and any “secret sauce” calculated values that a lender develops as part of its custom model.
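As a rough illustration of how financial data points could feed a credit prediction, here is a minimal sketch of a logistic regression trained on synthetic businesses. Everything in it is an assumption for illustration: the two features (share of receivables overdue, and leverage), the made-up relationship between those features and default, and the function names. It is not any lender's actual model, which would draw on far more data.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=200, lr=0.5):
    """Fit logistic regression weights with plain batch gradient descent."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def default_probability(weights, bias, features):
    """Score one business: estimated probability of default."""
    return sigmoid(bias + sum(w * f for w, f in zip(weights, features)))


def make_synthetic_businesses(n, seed=1):
    """Made-up data. Features are hypothetical and scaled to [0, 1]:
    [share of receivables overdue, leverage]. Higher values mean more risk
    under the assumed relationship below.
    """
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        overdue, leverage = rng.random(), rng.random()
        p_default = sigmoid(-3.0 + 3.0 * overdue + 3.0 * leverage)
        rows.append([overdue, leverage])
        labels.append(1 if rng.random() < p_default else 0)
    return rows, labels


rows, labels = make_synthetic_businesses(500)
weights, bias = train_logistic(rows, labels)

healthy_score = default_probability(weights, bias, [0.1, 0.1])
risky_score = default_probability(weights, bias, [0.9, 0.9])
print(f"healthy: {healthy_score:.2f}, risky: {risky_score:.2f}")
```

The point of the sketch is the shape of the pipeline: structured financial inputs go in, and a probability comes out that a lender can rank and price against. Third-party data and a lender's own calculated values would simply be additional columns in each row.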
What this means for business borrowers is that lenders will be asking for their data. Hungry algorithms need to feed their models, and a business that can provide the right data is going to be a valued borrower. Lenders will seek out and even reward borrowers who have their accounting in good shape and can share that data directly (e.g. through cloud-connected accounting systems), as well as authorize integrations with other parts of their ecosystem such as banking, legal records, and more. It’s always been the case that a good borrower was rewarded with access to capital, but with the advent of bigger data and machine learning, lenders can now process non-traditional and more complex data sources, giving them a more detailed view of each business in an efficient and timely way. This is a good development for borrowers and lenders alike.
Daniel is the Chief Technology Officer of The Credit Junction.