Class Similarity
public class Similarity extends Object
Similarity uses a client-provided definition of label similarities, where 0 is least similar and 1 is most similar.
The similarity between two nonempty multi-interval sets is the ratio:
(sum of piecewise-matching between the sets) / (span of the sets)
where the span is the length of the smallest interval that contains all the intervals from both sets, and the amount of piecewise-matching for any unit interval [i, i+1) is:
- 0 if neither set has a label on that interval
- 0 if only one set has a label on that interval
- otherwise, the similarity between the labels as defined for this Similarity instance
For example, suppose you have multi-interval sets that use labels "happy", "sad", and "meh"; and similarity between labels is defined as:
- 1 if both are "happy", both "sad", or both "meh"
- 0.5 if one is "meh" and the other is "happy" or "sad"
- 0 otherwise
Then the similarity between these two sets:
- { "happy" = [[0, 1), [2,4)], "sad" = [[1,2)] }
- { "sad" = [[1, 2)], "meh" = [[2,3)], "happy" = [[3,4)] }
would be: (0 + 1 + 0.5 + 1) / (4 - 0) = 0.625
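The arithmetic above can be reproduced with a minimal sketch. It does not use the real MultiIntervalSet type: the Labeled class, the list-of-intervals representation, and the hard-coded labelSimilarity function are hypothetical stand-ins chosen for illustration. It also assumes integer endpoints and at most one label on any unit interval.

```java
import java.util.List;

class PiecewiseSimilaritySketch {

    /** Hypothetical stand-in for one labeled interval [start, end) with integer endpoints. */
    static final class Labeled {
        final String label;
        final long start;
        final long end;

        Labeled(String label, long start, long end) {
            this.label = label;
            this.start = start;
            this.end = end;
        }
    }

    /** Label similarity from the example: 1 for equal labels, 0.5 for "meh" vs "happy"/"sad", else 0. */
    static double labelSimilarity(String x, String y) {
        List<String> known = List.of("happy", "sad", "meh");
        if (!known.contains(x) || !known.contains(y)) return 0.0;
        if (x.equals(y)) return 1.0;
        return (x.equals("meh") || y.equals("meh")) ? 0.5 : 0.0;
    }

    /** Returns the label covering the unit interval [i, i+1), or null if there is none. */
    static String labelAt(List<Labeled> set, long i) {
        for (Labeled interval : set) {
            if (interval.start <= i && i < interval.end) return interval.label;
        }
        return null;
    }

    /** Sums the piecewise matching over every unit interval in the span, then divides by the span. */
    static double similarity(List<Labeled> a, List<Labeled> b) {
        long lo = Long.MAX_VALUE;
        long hi = Long.MIN_VALUE;
        for (List<Labeled> set : List.of(a, b)) {
            for (Labeled interval : set) {
                lo = Math.min(lo, interval.start);
                hi = Math.max(hi, interval.end);
            }
        }
        double matched = 0.0;
        for (long i = lo; i < hi; i++) {
            String labelA = labelAt(a, i);
            String labelB = labelAt(b, i);
            // A unit interval contributes only when both sets have a label on it.
            if (labelA != null && labelB != null) {
                matched += labelSimilarity(labelA, labelB);
            }
        }
        return matched / (hi - lo);
    }

    public static void main(String[] args) {
        List<Labeled> a = List.of(
                new Labeled("happy", 0, 1), new Labeled("happy", 2, 4), new Labeled("sad", 1, 2));
        List<Labeled> b = List.of(
                new Labeled("sad", 1, 2), new Labeled("meh", 2, 3), new Labeled("happy", 3, 4));
        System.out.println(similarity(a, b)); // prints 0.625
    }
}
```

Scanning one unit interval at a time keeps the sketch close to the definition, but its running time grows with the span. An implementation for large spans would more likely walk the interval endpoints instead.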
Label similarities are provided as a list of definition strings, where each
one must contain exactly three pieces, separated by one or more spaces. The
first two pieces give a pair of labels, and the third piece gives the decimal
similarity between them, in a format allowed by Double.valueOf(String),
between 0 and 1 inclusive. The definition strings may not contain newlines.
Similarity between labels is symmetric, so the order of labels in each pair is
irrelevant. A pair may not appear more than once. The similarity between all
other pairs of labels is 0. This format cannot define non-zero similarity for
labels that contain newlines or spaces, or for the empty string label.
For example, the following 5 definitions give the similarity values used above:
happy happy 1
sad sad 1
meh meh 1
meh happy 0.5
meh sad 0.5
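One way to read such definitions into a lookup table is sketched below. The class name, pairKey, parse, and lookup are hypothetical helpers and not part of the required Similarity API. Rejecting malformed input with IllegalArgumentException, and rejecting leading or trailing spaces, are assumptions of this sketch; the specification above does not say how invalid definitions are handled.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SimilarityDefinitionsSketch {

    /** Order-independent key for a pair of labels; labels contain no spaces, so a space separates them safely. */
    static String pairKey(String x, String y) {
        return x.compareTo(y) <= 0 ? x + " " + y : y + " " + x;
    }

    /** Parses definition strings of the form "label1 label2 value" into a symmetric lookup table. */
    static Map<String, Double> parse(List<String> definitions) {
        Map<String, Double> table = new HashMap<>();
        for (String definition : definitions) {
            if (definition.contains("\n")) {
                throw new IllegalArgumentException("definition contains a newline");
            }
            // Limit -1 keeps trailing empty pieces, so trailing spaces are rejected like leading ones.
            String[] pieces = definition.split(" +", -1);
            if (pieces.length != 3 || pieces[0].isEmpty()) {
                throw new IllegalArgumentException("expected exactly three pieces: " + definition);
            }
            // Double.valueOf throws NumberFormatException (an IllegalArgumentException) on bad input.
            double value = Double.valueOf(pieces[2]);
            if (!(value >= 0.0 && value <= 1.0)) { // written this way so NaN is rejected too
                throw new IllegalArgumentException("similarity out of range: " + definition);
            }
            if (table.put(pairKey(pieces[0], pieces[1]), value) != null) {
                throw new IllegalArgumentException("pair defined more than once: " + definition);
            }
        }
        return table;
    }

    /** Looks up the similarity of two labels; pairs that were never defined have similarity 0. */
    static double lookup(Map<String, Double> table, String x, String y) {
        return table.getOrDefault(pairKey(x, y), 0.0);
    }
}
```

Because pairKey sorts the two labels before joining them, "meh happy 0.5" and a later "happy meh 0.5" map to the same key, which is how the sketch detects a pair that appears more than once.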
PS2 instructions: this is a required class, and you MUST NOT weaken the required specification. You MAY strengthen it, add additional methods, etc.
Method Summary
Modifier and Type: static double
Method: similarity(List<String> similarities, MultiIntervalSet<String> a, MultiIntervalSet<String> b)
Description: Compute similarity between two multi-interval sets under the given definition.
Method Details
similarity
public static double similarity(List<String> similarities, MultiIntervalSet<String> a, MultiIntervalSet<String> b)
Compute similarity between two multi-interval sets under the given definition. Returns a value between 0 and 1 inclusive.
Parameters:
- similarities - label similarity definition as described above
- a - non-empty multi-interval set with string labels
- b - non-empty multi-interval set with string labels
Returns:
- similarity between a and b as defined above (or as close as possible within the precision of a double)