Bloom Filter — What is it ?
Bloom filter is memory efficient probabilistic data structure, the main idea behind is to check the availability of an element in a given set of data, it either provides false positive or a negative occurrence in a set, so in simple words it will never deny an element if it is available in a set but it can claim the presence of an element which may or may not present in a set of data.
It is imperative to define a size of filter while creating it otherwise it will provide false-positive results for each and every search it is due to the fact that bloom filter uses a hashing algorithm if the size of the bloom filter is not big enough for the elements the hashing algorithm will go in deception and give false-positive results each time. So in a nutshell, one should always be wise while defining a size of filter in correspondence with the size of set. We can also define error tolerance in it but its value will definitely affect the space in memory, so it is in fact a trade-off.
There are numerous scenarios where bloom filter is used:
- Cassandra uses bloom filter in its SSTable for effective read operations.
- Bitcoin blockchain uses bloom filter for payment verification, this process is used by SPV (Simplified Payment Verification), so instead of checking the entire chain of blocks only specific block is monitored by using bloom filter technique. (Author Note: Simplified payment verification was developed to collaborate bitcoin and Ethereum smart contracts)
- Google web browser chrome uses it to check malicious Urls.
Thus any activity that involves to find the possibility of occurrence of an element in a large set of data can use bloom filter.