Some time ago I happened upon a class of algorithms called streaming algorithms. They piqued my curiosity greatly, and I implemented several, initially in C++. I have since started to implement some of them in other languages (currently only python and go). I decided that I'd next implement a bloom filter in python. A generic bloom filter is incredibly easy to write up, provided you have a few underlying libraries. It turns out that there is not a standard bit array library for python. I was kind of surprised, since python has a lot of built in libraries. I had to write one for C++ (ok, I didn't have to, but I'd rather write my own that force a dependency on something like Boost), and so to get bloom filter working, I'd need one for python as well. I spend about 10 minutes writing this up, so don't be surprised if there are some lingering issues. If you find one, I'd love to hear about it. I based this off of the one I wrote in C++, but this is much simpler. The C++ variant allows for each element to be more than 1 bit. You can have elements of size n-bits, which is nice for some of the other forms of bloom filters (like counting bloom filters). Without further ado, here is the code: code:

import array
from math import log

class BitArray(object):
    def __init__(self, bits):
        # L = 4 bytes minimum, but could be 8 bytes. No need to waste all that space
        # so lets look up the size of L on this machine
        self.elem_size = int(log(array.array('L').itemsize * 8, 2))
        self.elem_mask =  int(pow(2, self.elem_size) - 1)
        size = bits >> self.elem_size;
        if bits & self.elem_mask:
            size += 1

        self.bits = array.array('L', (0,) * size)



    def set(self, bit):
        # first look up the entry in the array
        array_index = bit >> self.elem_size
        # find position of bit in the individual element
        bit_position = bit & self.elem_mask
        # generate a mask to apply to the array element
        mask = 1 << bit_position

        # apply the mask
        self.bits[array_index] |= mask

    def clear(self, bit):
        # first look up the entry in the array
        array_index = bit >> self.elem_size
        # find position of bit in the individual element
        bit_position = bit & self.elem_mask
        # generate a mask to apply to the array element,
        # but invert it since we are clearing
        mask = 1 << bit_position
        mask = ~mask

        self.bits[array_index] &= mask


    def bit(self, bit):
         # first look up the entry in the array
        array_index = bit >> self.elem_size
        # find position of bit in the individual element
        bit_position = bit & self.elem_mask
        # generate a mask to apply to the array element
        mask = 1 << bit_position

        return 1 if self.bits[array_index] & mask > 0 else 0

You can find the rest of the source (like test files and the license) in the repo on github.