GiniMachine takes a data-driven approach: it first analyses your historical data to build a model tailored to your business needs. That is why the first step is data preparation.

GiniMachine works with raw data, so no data cleansing is needed beforehand.
Supported data types: numerical, categorical, ordinal, and time series data (e.g. dates).
The minimum number of records needed to create a model depends on your business case. Practice shows that the smaller class (e.g. unpaid debts) should contain at least 1000 records (not duplicates, but different, unique scenarios). The number of records in the larger class (e.g. paid debts) depends on the overall class ratio in your dataset.
You can also experiment with different sets of attributes: how you evaluate your debts/debtors is up to you. However, we recommend adding the results of your collection/recovery activities. This will contribute to the subsequent analysis and benefit future scoring and decision-making.
Your dataset should be an Excel or CSV file and contain a target column named “Paid”. Its values must be binary and record the historical outcome per record (debt or debtor), e.g. 1 (one) if the debt was paid and 0 (zero) if it was not. You can define “paid” as payment in full (if you work with bad debts or recoveries) or as a partial payment (mostly suitable for routine collection activities).
This target column is required to train and test the system’s machine learning algorithm.
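As a rough illustration of the “Paid” target described above, the sketch below derives the binary flag from two hypothetical fields, `amount_due` and `amount_paid`, and writes a small CSV. The field names, the sample rows, and the helper functions are assumptions made for this example only; they are not part of GiniMachine.

```python
import csv
import io


def paid_flag(amount_due, amount_paid, full_payment=True):
    """Return 1 if the debt counts as paid, otherwise 0.

    full_payment=True  -> paid only when the whole amount was recovered
    full_payment=False -> any partial payment counts as paid
    """
    if full_payment:
        return 1 if amount_paid >= amount_due else 0
    return 1 if amount_paid > 0 else 0


# Hypothetical historical records: (contact_type, amount_due, amount_paid)
history = [
    ("phone", 500.0, 500.0),
    ("email", 300.0, 120.0),
    ("phone", 800.0, 0.0),
]


def to_csv(rows, full_payment=True):
    """Serialise the records to CSV text with a binary "Paid" column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["contact_type", "amount_due", "Paid"])
    for contact_type, due, paid in rows:
        writer.writerow([contact_type, due, paid_flag(due, paid, full_payment)])
    return buffer.getvalue()


print(to_csv(history))                      # strict definition: payment in full
print(to_csv(history, full_payment=False))  # lenient definition: partial payment
```

Whichever definition you choose, apply it consistently to every row, since the model learns whatever “Paid” means in your file.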
The list of metrics to use in your collection scoring dataset may differ depending on variables such as your business region, business sphere, credit product specification, customer type, etc. For example, the method of contact with the debtor is a significant factor (this field is required). The metrics used by GiniMachine clients can also be applied to your dataset. Feel free to draw on this experience and add the evaluation criteria your own team relies on.
- Categories: contact type (online, offline), contact source (email, phone, face-to-face), outcomes of outbound activities, collection agent ID, number of successful/failed calls, late payments in previous months, length of relationship with the company, amount spent by the client with the company in previous transactions, previous contacts with the client, etc. Keep in mind that collection scoring is tied to the fields (criteria) by which you want to score your borrowers later. Your collection questionnaire should therefore contain straightforward answers that can be entered as a criterion result in your dataset.
- Activity metrics: number of phones in the registration, number of e-mails in the registration, number of previous contacts by telephone, number of previous contacts by e-mail, etc.
Different KPIs, such as:
- Days Sales Outstanding (DSO)
- Collection Effectiveness Index (CEI)
- Right Party Contacts (RPC) rate
- Percentage of Outbound Calls Resulting in Promise to Pay (PTP)
- Profit per Account (PPA)
- Bad Debt to Sales
- Active Customer Accounts per Credit and Collection Employee
- Cost Per Sales Dollar
- The cost of collection
- High-Risk Accounts
You can upload your dataset by:
- browsing for the file on your computer,
- dragging and dropping the file into the user interface,
- or simply pasting the file’s URL.
Model building takes seconds, or sometimes minutes if you upload a very large dataset. The time taken is shown on the predictive model’s screen.
You can perform additional actions on your model, such as renaming it, adding it to favourites, exporting, deleting, importing, etc.

The import model button is located on the Collection models tab.

A model cannot be deleted once it has been used for collection scorings.
A model’s predictive power can be checked through the Gini index. According to industry standards, a Gini index of around 0.55 or higher signifies high predictive power.
In the Gini index box, the system presents your sample statistics: the total record count, the number of paid and unpaid debts in your data file, and the number of records used for training. The test set is randomly separated from your historical file and used by GiniMachine to validate the model. By default, the split is 70/30 or 90/10, where the first number is the share of the training set and the second is the share of data randomly taken for validation.
Tip: if you want GiniMachine to use a different train/test split, add a column named “Validation” to your historical dataset. In this column, mark each row: 1 or true for test/validation rows, 0 or false for training rows. Then rebuild the predictive model.
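A minimal sketch of preparing such a “Validation” column before upload is shown below. The function name, the 20% test share, the fixed seed, and the sample rows are assumptions for illustration; only the column name and the 1/0 convention come from the description above.

```python
import random


def add_validation_column(rows, test_share=0.2, seed=42):
    """Return copies of the rows with a "Validation" flag added.

    1 marks a row reserved for test/validation, 0 marks a training row.
    """
    rng = random.Random(seed)                 # fixed seed keeps the split reproducible
    n_test = round(len(rows) * test_share)    # number of rows to hold out
    test_indexes = set(rng.sample(range(len(rows)), n_test))
    return [
        dict(row, Validation=1 if index in test_indexes else 0)
        for index, row in enumerate(rows)
    ]


# Hypothetical historical rows; real files would carry many more attributes.
rows = [{"debt_id": i, "Paid": i % 2} for i in range(10)]
marked = add_validation_column(rows, test_share=0.2)
for row in marked:
    print(row)
```

This sketch splits purely at random. If one class is rare in your data, you may prefer to pick validation rows separately within each class so that both classes appear in both parts.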
The higher the model performance, the more trustworthy its predictions.
For outsourcing collection agencies, the model’s performance results could become part of a future presentation to a customer: a preliminary analysis based on your client’s data increases your chances of signing an agreement.
The More details view on the Gini index dashboard lets you investigate the main model performance indicators.
NPA. This stands for non-performing assets and refers to a classification of loans or advances that are in default or in arrears. NPA is a good signal for evaluating your debt portfolio and checking whether your current collection/recovery strategies are working. The lower the NPA, the better.
Density distribution by classes. Using this graph, you can check how well the model can predict the desired results and how often it makes mistakes. The blue curve represents all the positive records, and the grey one shows negative records.
ROC-AUC chart. The Receiver Operating Characteristic curve, also known as the ROC curve, plots the True Positive Rate (TPR) against the False Positive Rate (FPR). AUC stands for the area under the ROC curve and can take any value between 0 and 1.
- AUC 0: the model’s predictions are 100% wrong;
- AUC 0.5: the model makes random predictions and has almost no predictive power;
- AUC 1.0: an ideal model with perfect separability. A result this perfect, however, is probably too good to be true.
We suggest relying on a ROC-AUC between 0.75 and 1, which indicates trustworthy predictive capability.
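To make these AUC values concrete, here is a minimal sketch that computes AUC as the share of (paid, unpaid) record pairs in which the paid record gets the higher score, and then converts it to a Gini index using the commonly cited relation Gini = 2 × AUC − 1. Both the sample data and the assumption that GiniMachine reports Gini in exactly this way are illustrative, not confirmed by this guide.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive record outscores a negative one."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # correctly ordered pair
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


def gini_from_auc(auc):
    """Commonly used conversion between AUC and the Gini index."""
    return 2.0 * auc - 1.0


# Hypothetical test-set outcomes (1 = paid, 0 = unpaid) and model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]

auc = roc_auc(labels, scores)
print(f"AUC:  {auc:.3f}")
print(f"Gini: {gini_from_auc(auc):.3f}")
```

Under this relation, an AUC of 0.5 corresponds to a Gini of 0 and an AUC of 1.0 to a Gini of 1. The pairwise loop is quadratic and meant for readability, not for large files.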
Kolmogorov-Smirnov score. Another way to evaluate the predictive power of a machine learning model is to run a Kolmogorov-Smirnov (K-S) test. This test is more complicated but more powerful, and in combination with the ROC curve it allows model performance to be evaluated more precisely.
Nevertheless, when you evaluate the predictive power of a model built by GiniMachine, it’s important to analyse the scores from all of the above indicators together.
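As a rough sketch, the K-S score can be read as the largest gap between the cumulative score distributions of the two classes. The code below computes that gap for a small hypothetical sample; it illustrates the general statistic and is not a description of GiniMachine’s internal calculation.

```python
from bisect import bisect_right


def ks_statistic(labels, scores):
    """Largest gap between the cumulative score distributions of the two classes."""
    positives = sorted(s for y, s in zip(labels, scores) if y == 1)
    negatives = sorted(s for y, s in zip(labels, scores) if y == 0)
    if not positives or not negatives:
        raise ValueError("both classes are required to compute K-S")
    largest_gap = 0.0
    for threshold in sorted(set(scores)):
        # Share of each class with a score at or below the threshold.
        cdf_pos = bisect_right(positives, threshold) / len(positives)
        cdf_neg = bisect_right(negatives, threshold) / len(negatives)
        largest_gap = max(largest_gap, abs(cdf_pos - cdf_neg))
    return largest_gap


# Hypothetical test-set outcomes (1 = paid, 0 = unpaid) and model scores.
ks_labels = [1, 1, 1, 0, 0, 0]
ks_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]

print(f"K-S: {ks_statistic(ks_labels, ks_scores):.3f}")
```

A value near 0 means the two classes receive almost the same score distribution, while a value of 1 means the scores separate them completely.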
More details on model performance in GiniMachine: Machine Learning Model Evaluation: Gini Index | ROC-AUC | Kolmogorov – Smirnov Score

In the attribute importance report, colour contrast makes the parameters with the greatest impact on decisions stand out from the debtor’s profile. That’s how GiniMachine reveals hidden dependencies and patterns for valuable insights, e.g. by showing the average profile of your good or bad debtor. Such pivot tables show simple dependencies that are easy to detect and can be used to validate business hypotheses on the go.


The cutoff is a threshold for your future decision-making process. Cutoff selection in GiniMachine helps to find a balance between increased risks and missed revenues.
Click the How it works button to calibrate the cutoff mark. By moving the slider, you can a) choose the payback amount, b) adjust the volume of right and wrong predictions, and c) plan your NPA.

For the simulation in the Cutoff selection window, the system uses data from the test set (the randomly selected records used for model validation during the train and test process).
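The effect of moving the cutoff can be sketched as counting right and wrong predictions on the test set at a given threshold. The 0 to 1 score scale, the sample data, and the function below are assumptions for illustration and do not reproduce GiniMachine’s own simulation.

```python
def simulate_cutoff(labels, scores, cutoff):
    """Count right and wrong predictions when scores >= cutoff are predicted as paid."""
    counts = {
        "paid_predicted_paid": 0,      # right: debt was paid and predicted as paid
        "unpaid_predicted_paid": 0,    # wrong: increased risk
        "paid_predicted_unpaid": 0,    # wrong: missed revenue
        "unpaid_predicted_unpaid": 0,  # right: debt was unpaid and predicted as unpaid
    }
    for actual, score in zip(labels, scores):
        predicted_paid = score >= cutoff
        if actual == 1 and predicted_paid:
            counts["paid_predicted_paid"] += 1
        elif actual == 0 and predicted_paid:
            counts["unpaid_predicted_paid"] += 1
        elif actual == 1:
            counts["paid_predicted_unpaid"] += 1
        else:
            counts["unpaid_predicted_unpaid"] += 1
    return counts


# Hypothetical test-set outcomes (1 = paid, 0 = unpaid) and model scores.
test_labels = [1, 1, 1, 0, 0, 0]
test_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]

for cutoff in (0.25, 0.45, 0.65):
    print(cutoff, simulate_cutoff(test_labels, test_scores, cutoff))
```

Raising the cutoff reduces the number of unpaid debts predicted as paid, at the cost of more paid debts predicted as unpaid, which is the balance between risk and missed revenue that the slider lets you explore.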
The value of collection scoring is:
- to present the predicted outcomes in table-like format,
- to explain them through scoring interpretation (helps to avoid biases),
- to indicate the best collection strategies per debt/debtor.
Scoring in GiniMachine can be done through the GiniMachine user interface or via the API.
All collection scorings are grouped in the Collection history tab. To check the scorings for a specific model, open that predictive model and find the field named “Collection history”.

You can start a scoring:
- from the predictive model dashboard,
- from the main dashboard, by selecting the appropriate model from the list and uploading a scoring file.
The attributes in the scoring file should match the attributes of the historical dataset used to create your predictive model. Do not add the “Paid” column to your scoring file.
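Before uploading, a quick check along these lines can catch mismatched columns. It is only a sketch: the function, the sample column names, and the decision to treat “Validation” as a helper column that is not expected in the scoring file are assumptions made for this example.

```python
def check_scoring_columns(training_columns, scoring_columns,
                          target="Paid", helper_columns=("Validation",)):
    """Return a list of problems found in the scoring file's header."""
    ignored = {target, *helper_columns}
    expected = [c for c in training_columns if c not in ignored]
    problems = []
    if target in scoring_columns:
        problems.append(f'remove the "{target}" column')
    for column in expected:
        if column not in scoring_columns:
            problems.append(f"missing attribute: {column}")
    for column in scoring_columns:
        if column not in expected and column != target:
            problems.append(f"unexpected attribute: {column}")
    return problems


# Hypothetical headers of the historical dataset and of two scoring files.
training = ["contact_type", "amount_due", "Paid"]
good_file = ["contact_type", "amount_due"]
bad_file = ["contact_type", "Paid"]

print(check_scoring_columns(training, good_file))
print(check_scoring_columns(training, bad_file))
```

An empty list means the header of the scoring file lines up with the attributes the model was trained on.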
When you upload your file with new debts for scoring, the system will ask you to tick the focus parameters for your collection strategy. At this step, choose the column or columns around which to build your collection/recovery strategy, so as to recover as much debt revenue as possible.
- If you select columns when uploading debts/debtors for scoring, the system uses Cartesian product logic to present all possible combinations of dependencies and their effectiveness. Combined with the model’s predictive power, this method helps to find better collection strategies per debt/debtor.
- If you’re not sure which columns to select for the Cartesian product, click the Next button. In this case, the scoring predicts which debts/debtors are likely to perform.
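The Cartesian product idea can be sketched as follows: for the selected columns, every combination of their possible values is generated for a debtor, and each combination would then be scored by the model. The column names, values, and helper function are hypothetical, and how GiniMachine builds the combinations internally is an assumption.

```python
from itertools import product


def strategy_combinations(debtor, options):
    """Yield one copy of the debtor per combination of the selected column values."""
    names = list(options)
    for values in product(*(options[name] for name in names)):
        # Overwrite the selected columns with this combination of values.
        yield {**debtor, **dict(zip(names, values))}


# Hypothetical debtor and the columns selected as focus parameters.
debtor = {"debt_id": 17, "amount_due": 500.0}
options = {
    "contact_type": ["online", "offline"],
    "contact_source": ["email", "phone", "face-to-face"],
}

combinations = list(strategy_combinations(debtor, options))
for combination in combinations:
    print(combination)
```

Two columns with 2 and 3 possible values give 6 combinations per debtor. The number grows multiplicatively with every extra column, so selecting only the columns that matter for your strategy keeps the results readable.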

All incomplete scorings are saved in the Collection history tab and highlighted in red. You may resume them at any time.
The scoring results are displayed in a table, where each possible Cartesian combination has a score and a prediction. It’s up to the GiniMachine user to pick the most efficient collection/recovery approach, for example by relying on the highest score. If the highest score is too low, it is probably the right moment to score your debts/debtors with another model built for a different collection strategy.

The scoring interpretation can be viewed by clicking on a record in the table. The number next to each attribute represents the share of the score that the attribute subtracted (negative impact, shown in red) or added (positive impact, shown in blue) to the final score.

The scoring results can be obtained:
- in the GiniMachine UI,
- via the API, in your own system’s UI,
- or by downloading an Excel file. In the Excel file, as well as via the API, it’s possible to additionally filter the results. This is especially helpful when you score large sets of debts.

At some point, predictive models may gradually lose their predictive power and need to be updated. This may happen due to data drift or concept drift.
Data drift takes place when the input data has changed and the distribution of the variables is meaningfully different. For example, the business started working with borrowers from a different region, so a new column appeared in the dataset. However, the model will still perform well on data similar to the “old” data.
Concept drift occurs when the patterns the model has learned are no longer relevant. In contrast to data drift, the structure of the applicant’s data may even remain the same, but the relationships between the model inputs and outputs change, for example during an economic crisis.
How do you find the right time for a model update?
Check out the dynamics and the rejection rate in the Monitoring tab of the GiniMachine interface. In case of sudden pattern changes, some investigation and model updates may be required.

GiniMachine is a no-code AI-based platform that can be used for scoring, predictions, and decision-making.
The system uses historical data to build high-accuracy predictive models in minutes.
It is ready-to-use and has an intuitive interface with explanations. GiniMachine brings the power of AI/ML to serve your business and, at the same time, it requires no data scientists or machine learning engineers on board to operate it.
Got a question?
Contact us: it@ginimachine.com