| Copyright | © 2016–present Mark Karpov | 
|---|---|
| License | BSD 3 clause | 
| Maintainer | Mark Karpov <markkarpov92@gmail.com> | 
| Stability | experimental | 
| Portability | portable | 
| Safe Haskell | None | 
| Language | GHC2021 | 
Data.Text.Metrics
Description
The module provides efficient implementations of various strings metric
 algorithms. It works with strict Text values.
Note: before version 0.3.0 the package used C implementations of the algorithms under the hood. Beginning from version 0.3.0, the implementations are written in Haskell while staying almost as fast, see:
Synopsis
- levenshtein :: Text -> Text -> Int
- levenshteinNorm :: Text -> Text -> Ratio Int
- damerauLevenshtein :: Text -> Text -> Int
- damerauLevenshteinNorm :: Text -> Text -> Ratio Int
- overlap :: Text -> Text -> Ratio Int
- jaccard :: Text -> Text -> Ratio Int
- hamming :: Text -> Text -> Maybe Int
- jaro :: Text -> Text -> Ratio Int
- jaroWinkler :: Text -> Text -> Ratio Int
Levenshtein variants
levenshtein :: Text -> Text -> Int #
Return the Levenshtein distance between two Text values. The
 Levenshtein distance between two strings is the minimal number of
 operations necessary to transform one string into another. For the
 Levenshtein distance allowed operations are: deletion, insertion, and
 substitution.
See also: https://en.wikipedia.org/wiki/Levenshtein_distance.
Heads up, before version 0.3.0 this function returned
 Natural.
levenshteinNorm :: Text -> Text -> Ratio Int #
Return the normalized Levenshtein distance between two Text values.
 Result is a non-negative rational number (represented as Ratio
 Natural
See also: https://en.wikipedia.org/wiki/Levenshtein_distance.
Heads up, before version 0.3.0 this function returned Ratio
 Natural
damerauLevenshtein :: Text -> Text -> Int #
Return the Damerau-Levenshtein distance between two Text values. The
 function works like levenshtein, but the collection of allowed
 operations also includes transposition of two adjacent characters.
See also: https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance.
Heads up, before version 0.3.0 this function returned
 Natural.
damerauLevenshteinNorm :: Text -> Text -> Ratio Int #
Return the normalized Damerau-Levenshtein distance between two Text
 values. 0 signifies no similarity between the strings, while 1 means
 exact match.
See also: https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance.
Heads up, before version 0.3.0 this function returned Ratio
 Natural
Treating inputs like sets
overlap :: Text -> Text -> Ratio Int #
Return the overlap coefficient for two Text values. Returned value is
 in the range from 0 (no similarity) to 1 (exact match). Return 1 if both
 Text values are empty.
See also: https://en.wikipedia.org/wiki/Overlap_coefficient.
Since: 0.3.0
jaccard :: Text -> Text -> Ratio Int #
Return the Jaccard similarity coefficient for two Text values.
 Returned value is in the range from 0 (no similarity) to 1 (exact match).
 Return 1 if both
See also: https://en.wikipedia.org/wiki/Jaccard_index
Since: 0.3.0
Other
hamming :: Text -> Text -> Maybe Int #
O(n) Return the Hamming distance between two Text values. Hamming
 distance is defined as the number of positions at which the corresponding
 symbols are different. The input Text values should be of equal length
 or Nothing will be returned.
See also: https://en.wikipedia.org/wiki/Hamming_distance.
Heads up, before version 0.3.0 this function returned Maybe
 Natural
jaro :: Text -> Text -> Ratio Int #
Return the Jaro distance between two Text values. Returned value is
 in the range from 0 (no similarity) to 1 (exact match).
While the algorithm is pretty clear for artificial examples (like those from the linked Wikipedia article), for arbitrary strings, it may be hard to decide which of two strings should be considered as one having “reference” order of characters (order of matching characters in an essential part of the definition of the algorithm). This makes us consider the first string the “reference” string (with correct order of characters). Thus generally,
jaro a b ≠ jaro b a
This asymmetry can be found in all implementations of the algorithm on the internet, AFAIK.
See also: https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
Heads up, before version 0.3.0 this function returned Ratio
 Natural
Since: 0.2.0
jaroWinkler :: Text -> Text -> Ratio Int #
Return the Jaro-Winkler distance between two Text values. Returned
 value is in range from 0 (no similarity) to 1 (exact match).
See also: https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
Heads up, before version 0.3.0 this function returned Ratio
 Natural
Since: 0.2.0