Labeling data is the process of assigning a class label to each data item in a dataset. Common approaches include:
Hand labeling
Natural labeling
Weakly supervised labeling (see the sketch below)
Semi-supervised labeling
Active learning
Labeling is a key part of many ML system workflows.
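Of these approaches, weak supervision replaces human annotation with programmatic heuristics. As a minimal sketch (the keyword lists and the function name label_by_keywords are hypothetical, chosen only for illustration), a hand-written "labeling function" might assign noisy sentiment labels like this:

POSITIVE_WORDS = {"great", "excellent", "love"}
NEGATIVE_WORDS = {"terrible", "awful", "hate"}

def label_by_keywords(text):
    # Assign a noisy label from simple keyword matches; fall back to
    # "neutral" when no keyword fires. Real systems (e.g. Snorkel)
    # combine many such functions and model their conflicts.
    words = set(text.lower().split())
    if words & POSITIVE_WORDS:
        return "positive"
    if words & NEGATIVE_WORDS:
        return "negative"
    return "neutral"

labels = [label_by_keywords(t) for t in ["I love this!", "Awful service."]]
# labels == ["positive", "negative"]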
The Kappa Statistic
Kappa measures inter-rater agreement for categorical labels.
Percent Agreement
\[
p_a = \frac{a}{n}
\]
where \(a\) is the number of times the raters agree, and \(n\) is the number of ratings.
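For example, if two raters agree on 4 out of 5 items, as in the label sets used later in this section, then \(p_a = 4/5 = 0.8\).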
Kappa Statistic
\[
\kappa = \frac{p_a - p_e}{1 - p_e}
\]
where \(p_e\) is the probability of agreement by chance.
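To make both formulas concrete, here is a from-scratch sketch of Cohen's Kappa (the function name kappa and the use of collections.Counter are our own choices for illustration); \(p_e\) is estimated from each rater's label frequencies, as in Cohen's formulation:

from collections import Counter

def kappa(rater1, rater2):
    n = len(rater1)
    # p_a: fraction of items on which the two raters agree.
    p_a = sum(a == b for a, b in zip(rater1, rater2)) / n
    # p_e: for each label, the probability that both raters pick it
    # independently, summed over all labels seen by either rater.
    counts1, counts2 = Counter(rater1), Counter(rater2)
    p_e = sum((counts1[label] / n) * (counts2[label] / n)
              for label in counts1.keys() | counts2.keys())
    return (p_a - p_e) / (1 - p_e)

Applied to the label sets in the example below, this returns the same values as scikit-learn's cohen_kappa_score.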
A labeled dataset whose labels show high inter-rater agreement (a high Kappa) is often called a gold standard dataset.
The Kappa Statistic - Example
Here we compare percent agreement and Kappa for two pairs of label sets.
from sklearn.metrics import accuracy_score, cohen_kappa_score

# The original test data from the example: three possible labels
t11 = ["negative", "positive", "negative", "neutral", "positive"]
t12 = ["negative", "positive", "negative", "neutral", "negative"]

# Modified: same agreement count, only two possible answers
t21 = ["negative", "positive", "negative", "positive", "positive"]
t22 = ["negative", "positive", "negative", "positive", "negative"]

print(f"Agmt: {accuracy_score(t11, t12)} Kappa: {cohen_kappa_score(t11, t12)}")
print(f"Agmt: {accuracy_score(t21, t22)} Kappa: {cohen_kappa_score(t21, t22)}")
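Both pairs agree on 4 of the 5 items, so percent agreement is 0.8 in both cases; Kappa, however, drops from 0.6875 for the three-label set to about 0.615 for the two-label set, because with only two possible labels the chance agreement \(p_e\) is higher (0.48 versus 0.36).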