Is #bigdata useful for credit risk analysis?
In the last few months, there has been growing discussion about Big Data and how people can exploit it efficiently to improve their own models. We at modeFinance can say that we have Big Data (through an agreement with Bureau van Dijk, all financial data of more than 70 million corporates in over 200 countries … let’s say hundreds and thousands of digitized financial statements!) and we deal with credit risk analysis: in practice, we assign a credit rating and a commercial credit limit to companies all around the world. But do we use Big Data in our work?
The answer is yes, but with caution! Let me briefly explain why.
I taught Fundamentals of Numerical Design for a while at the University of Trieste (now I am teaching Fluid Dynamics) and I also developed numerical models that help engineers improve their products. Among the methods that I taught (and continue to develop) are machine learning methods (although, as an academic, I should not be giving it, here is the link: Wiki).
In a nutshell, and in a simplistic way: these are methods that learn from the given data how to divide it into categories (see Figure 1). Basically, the Machine Learning model creates the blue line that separates the green circles from the red squares (the circles and squares are our data).
If you think of the squares as companies that failed historically and the circles as healthy businesses, one could say that Machine Learning is just the right model for you: you have practically created a mathematical model that separates healthy companies from unhealthy ones! (Figure 2).
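To make the idea of the separating line concrete, here is a minimal sketch using a perceptron, one of the simplest learning methods that draws a straight line between two groups. It is only an illustration: the two features (leverage and profitability), the six toy companies and the function names are all hypothetical, and this is not the model used at modeFinance.

```python
# Toy data (hypothetical): each company is described by two made-up ratios,
# (leverage, profitability). Label +1 = healthy (the green circles),
# label -1 = failed (the red squares).
companies = [
    ((0.2, 0.8), 1),
    ((0.3, 0.6), 1),
    ((0.4, 0.9), 1),
    ((0.8, 0.2), -1),
    ((0.9, 0.4), -1),
    ((0.7, 0.1), -1),
]


def train_perceptron(data, epochs=100, lr=0.1):
    """Learn a line w[0]*x1 + w[1]*x2 + b = 0 that separates the two labels."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), label in data:
            score = w[0] * x1 + w[1] * x2 + b
            # A point is misclassified when its score has the wrong sign
            # (or is exactly zero): nudge the line towards that point's side.
            if label * score <= 0:
                w[0] += lr * label * x1
                w[1] += lr * label * x2
                b += lr * label
                mistakes += 1
        if mistakes == 0:
            # Every training company is on the correct side of the line.
            break
    return w, b


def predict(w, b, point):
    """Return +1 (healthy side) or -1 (failed side) for a new company."""
    score = w[0] * point[0] + w[1] * point[1] + b
    return 1 if score > 0 else -1


w, b = train_perceptron(companies)
print(predict(w, b, (0.25, 0.7)))  # low leverage, high profitability
print(predict(w, b, (0.85, 0.3)))  # high leverage, low profitability
```

Note that the line can only be learned because the toy data contains both kinds of companies, the healthy ones and the failed ones.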
But there is one big problem: you must have a COMPLETE database in your hands; that is, you have to know the whole universe, both the healthy companies and the companies that went bankrupt. And here the big problems begin: in Italy, thanks to the very efficient work of the Chambers of Commerce and the Companies’ Registry, we have this database, and the same holds in some other European countries (such as France, Spain and the UK). But outside of Europe? It is absolutely missing. Machine Learning will therefore be fine for Italy and a few other countries, but it is absolutely not suitable for the whole world! So that is a problem!
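A small sketch can show why an incomplete database is dangerous. Everything here is an assumption made for illustration: the registry of 100 companies, the 5 bankruptcies, and the rule that only one bankruptcy in five gets recorded are invented numbers, not real statistics for any country.

```python
# Hypothetical complete registry: True = went bankrupt, False = healthy.
complete_registry = [False] * 95 + [True] * 5


def observed(registry, keep_every=5):
    """Simulate an incomplete database (assumed gap): only every
    `keep_every`-th bankruptcy is recorded, the others simply vanish."""
    seen = []
    failures = 0
    for bankrupt in registry:
        if bankrupt:
            failures += 1
            if failures % keep_every != 0:
                continue  # this bankruptcy never reaches the database
        seen.append(bankrupt)
    return seen


def failure_rate(registry):
    """Share of companies in the registry that are marked as bankrupt."""
    return sum(registry) / len(registry)


incomplete_registry = observed(complete_registry)
print(failure_rate(complete_registry))    # the true share of failures
print(failure_rate(incomplete_registry))  # the share a model would "see"
```

In this toy case the incomplete database shows far fewer failures than really happened, so any model trained on it would learn an overly optimistic picture of risk.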
Can we then say that Big Data is of no use in our field, and that modeFinance should not use the huge amount of data it holds? No: Big Data is fundamental and modeFinance uses it daily, but with caution. Using it to understand the economic environment in which a company operates (a service company in India operates in a completely different environment from an Italian industrial enterprise) is one thing; exploiting Big Data DIRECTLY to build the evaluation model is quite another, and one that we at modeFinance find particularly dangerous.