How can I interpret the evaluation results for my classification model?
For classification models, the evaluation results are based on the following metrics or measures:
- Accuracy: the number of correct predictions divided by the total number of instances evaluated.
- Precision: the higher this number is, the more you can trust that a predicted positive is really a positive. A low score implies that you detected positives where there were none (false positives).
- Recall: if this score is high, you haven’t missed many positives. If it is low, you are failing to predict positives that are actually present (false negatives).
- F-Measure or F1-score: the harmonic mean of Precision and Recall, giving both metrics equal weight. The higher the F-Measure is, the better.
- Phi Coefficient: a measure of association between the predicted and actual classes that, unlike the metrics above, explicitly takes the True Negatives into account.
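To make the definitions above concrete, here is a minimal sketch (using hypothetical toy labels, not data from any real model) that computes all five metrics from the confusion-matrix counts in plain Python:

```python
import math

# Hypothetical ground-truth labels and model predictions (1 = positive class).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# Confusion-matrix counts.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

accuracy = (tp + tn) / len(y_true)                  # correct predictions / total
precision = tp / (tp + fp)                          # how trustworthy predicted positives are
recall = tp / (tp + fn)                             # how many actual positives were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of Precision and Recall
phi = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))  # explicitly involves true negatives

print(accuracy, precision, recall, f1, phi)  # 0.75 0.75 0.75 0.75 0.5
```

Note how swapping one prediction changes Precision and Recall in opposite directions, which is exactly the trade-off the F1-score balances.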
Knowing whether these measures are good enough to start making predictions on unseen data is quite subjective. You will need to decide which level of accuracy you are willing to accept before trusting your trained model. For instance, imagine you own a supermarket and purchase two kinds of products: fresh ones that need to be sold quickly, and other products you do not mind having in stock (while still maintaining optimal stock levels). In the second case, a model with 85% accuracy may be acceptable for you. For the fresh products, however, a higher accuracy might be required.