
How do I know which rules are the most relevant ones? Are there any metrics to summarize all results?

Each discovered rule comes with a set of pre-computed measures, and each measure highlights a different characteristic of the rule. For example, high support means that the rule occurs very frequently, while high confidence indicates that the rule has high predictive power (whenever the antecedent occurs, the consequent tends to occur as well). Unfortunately, there are no magic thresholds or rules of thumb that tell us whether an association is good enough in absolute terms. Is a support of 1%, 10% or 20% good enough? That depends on many things, including the subject domain of the analysis, but above all on your main goals and what you are looking for.

For example, you may be looking for rules that occur very frequently, or you may care more about the strength of the relationship itself. A typical example is the caviar and champagne analogy: the two items are usually bought together, but they are not part of the daily shopping. The association rule is therefore expected to show a strong relationship between the items (high lift) but quite a low frequency in the dataset (low support). Sifting through the patterns to identify the most interesting ones is not a trivial task, as it can be subjective. Do not forget that one person’s trash might be another person’s treasure.
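To make the contrast between frequency and strength concrete, here is a minimal Python sketch that computes support, confidence and lift using their standard textbook definitions. The transactions are invented purely for illustration, and the helper names `count` and `rule_measures` are hypothetical; this is not how the tool computes its measures internally.

```python
# Toy basket data, invented for illustration only (100 transactions in total).
transactions = (
    [{"bread", "milk"}] * 40
    + [{"bread"}] * 30
    + [{"milk"}] * 26
    + [{"caviar", "champagne"}] * 2
    + [{"champagne"}] * 1
    + [{"bread", "champagne"}] * 1
)


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(items <= t for t in transactions)


def rule_measures(antecedent, consequent, transactions):
    """Support, confidence and lift of the rule antecedent -> consequent.

    Assumes both sides of the rule occur at least once in the data;
    otherwise the divisions below would fail.
    """
    n = len(transactions)
    n_a = count(antecedent, transactions)
    n_c = count(consequent, transactions)
    n_ac = count(set(antecedent) | set(consequent), transactions)
    return {
        "support": n_ac / n,               # how often the whole rule occurs
        "confidence": n_ac / n_a,          # P(consequent | antecedent)
        "lift": (n_ac * n) / (n_a * n_c),  # confidence / P(consequent)
    }


frequent = rule_measures({"bread"}, {"milk"}, transactions)
rare = rule_measures({"caviar"}, {"champagne"}, transactions)

for name, m in [("bread -> milk", frequent), ("caviar -> champagne", rare)]:
    print(f"{name}: support={m['support']:.2f}, "
          f"confidence={m['confidence']:.2f}, lift={m['lift']:.2f}")
# bread -> milk: support=0.40, confidence=0.56, lift=0.85
# caviar -> champagne: support=0.02, confidence=1.00, lift=25.00
```

In this toy data, the bread and milk rule is very frequent but its lift is below 1, so the items appear together no more often than chance would suggest. The caviar and champagne rule occurs in only 2% of transactions, yet its lift is 25. Which of the two is "better" depends entirely on whether you are looking for frequency or for strength.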

Pre-computed association measures give a more objective estimate of how strong those associations are, but in the end it is the domain expert who needs to decide whether a rule is interesting: it may reveal unexpected information about the data and provide useful knowledge that leads to profitable new actions. Incorporating subjective knowledge into pattern evaluation and iterating on the model accordingly is an essential part of getting meaningful associations.
