Bloom Filter — What is it ?

Bloom filter is memory efficient probabilistic data structure, the main idea behind is to check the availability of an element in a given set of data, it either provides false positive or a negative occurrence in a set, so in simple words it will never deny an element if it is available in a set but it can claim the presence of an element which may or may not present in a set of data.

It is imperative to define a size of filter while creating it otherwise it will provide false-positive results for each and every search it is due to the fact that bloom filter uses a hashing algorithm if the size of the bloom filter is not big enough for the elements the hashing algorithm will go in deception and give false-positive results each time. So in a nutshell, one should always be wise while defining a size of filter in correspondence with the size of set. We can also define error tolerance in it but its value will definitely affect the space in memory, so it is in fact a trade-off.

There are numerous scenarios where bloom filter is used:

  1. Cassandra uses bloom filter in its SSTable for effective read operations.

Thus any activity that involves to find the possibility of occurrence of an element in a large set of data can use bloom filter.



Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Waqar Mansoor

I am a tech enthusiast and a visionary person and been in the software industry for the last 3 years, I believe in sharing knowledge with each other.