Difference between revisions of "HyperLogLog"

From wikieduonline
Jump to navigation Jump to search
(10 intermediate revisions by one other user not shown)
Line 1: Line 1:
[[wikipedia:HyperLogLog]] is an algorithm for the [[count-distinct]] problem.
+
[[wikipedia:HyperLogLog]] is an [[algorithm]] for the [[count-distinct]] problem.
 +
 
 +
Calculating the exact [[cardinality]] of a [[multiset]] requires an amount of memory proportional to the [[cardinality]], which is impractical for very large data sets. The HyperLogLog algorithm is able to estimate cardinalities of > 109 with a typical accuracy (standard error) of 2%, using 1.5 kB of memory.
 +
 
 
  [[/etc/redis.conf]]
 
  [[/etc/redis.conf]]
 +
 +
[[HyperLogLog++]] counts based on the hashes of the values with some properties:
 +
* Configurable precision, which decides on how to trade memory for accuracy
 +
* Excellent accuracy on low-cardinality sets
 +
* Fixed memory usage: no matter if there are tens or billions of unique values, memory usage only depends on the configured precision.
 +
 +
 +
 +
== See also ==
 +
* [[LogLog]]
 +
* {{Redis}}
 +
 +
 +
[[Category:Probabilistic data structures]]

Revision as of 14:59, 3 October 2022

wikipedia:HyperLogLog is an algorithm for the count-distinct problem.

Calculating the exact cardinality of a multiset requires an amount of memory proportional to the cardinality, which is impractical for very large data sets. The HyperLogLog algorithm is able to estimate cardinalities of > 109 with a typical accuracy (standard error) of 2%, using 1.5 kB of memory.

/etc/redis.conf

HyperLogLog++ counts based on the hashes of the values with some properties:

  • Configurable precision, which decides on how to trade memory for accuracy
  • Excellent accuracy on low-cardinality sets
  • Fixed memory usage: no matter if there are tens or billions of unique values, memory usage only depends on the configured precision.


See also

Advertising: