org.apache.lucene.codecs.bloom

Class BloomFilteringPostingsFormat

  • All Implemented Interfaces:
    NamedSPILoader.NamedSPI


    public final class BloomFilteringPostingsFormat
    extends PostingsFormat

    A PostingsFormat useful for low doc-frequency fields such as primary keys. Bloom filters are maintained in a ".blm" file, which offers "fast-fail" for reads in segments known to have no record of the key. A delegate PostingsFormat of the caller's choice records all other postings data.

    A BloomFilterFactory can be passed to tailor Bloom filter settings on a per-field basis. The default configuration is DefaultBloomFilterFactory, which allocates a ~8 MB bitset and hashes values using MurmurHash2. This should be suitable for most purposes.
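
    The "fast-fail" behaviour described above can be sketched with a toy Bloom filter. This is a minimal illustration under stated assumptions, not Lucene's FuzzySet: the class name ToyBloomFilter, the two FNV-1a style probes, and the bitset size are all hypothetical choices made for the example (the default factory in the text uses MurmurHash2 and a much larger bitset).

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Toy Bloom filter for illustration only; hypothetical, not Lucene's FuzzySet. */
final class ToyBloomFilter {
    private static final int SEED_A = 0;
    private static final int SEED_B = 0x9E3779B9; // arbitrary second seed (assumption)

    private final BitSet bits;
    private final int size;

    public ToyBloomFilter(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    // FNV-1a style hash, used here only as a simple stand-in hash function.
    private static int hash(byte[] data, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : data) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    // Map a key to a bit position in [0, size).
    private int index(String key, int seed) {
        return Math.floorMod(hash(key.getBytes(StandardCharsets.UTF_8), seed), size);
    }

    /** Record a key by setting both of its probe bits. */
    public void add(String key) {
        bits.set(index(key, SEED_A));
        bits.set(index(key, SEED_B));
    }

    /**
     * Returns false when the key was definitely never added (the fast-fail case),
     * true when it may have been added (false positives are possible).
     */
    public boolean mightContain(String key) {
        return bits.get(index(key, SEED_A)) && bits.get(index(key, SEED_B));
    }
}
```

    In this sketch, a reader holding one filter per segment would call mightContain(key) first and consult the delegate postings data only when it returns true. A false result means the segment can be skipped outright.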

    The format of the .blm file is as follows:

    • BloomFilter (.blm) --> Header, DelegatePostingsFormatName, NumFilteredFields, Filter^NumFilteredFields, Footer
    • Filter --> FieldNumber, FuzzySet
    • FuzzySet --> See FuzzySet.serialize(DataOutput)
    • Header --> IndexHeader
    • DelegatePostingsFormatName --> String (the name of a ServiceProvider-registered PostingsFormat)
    • NumFilteredFields --> Uint32
    • FieldNumber --> Uint32 (the number of the field in this segment)
    • Footer --> CodecFooter
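
    The ordering of the fields listed above can be sketched as a round trip over a byte array. This is only an illustration of the layout, with several stand-ins that are assumptions rather than Lucene's encoding: the class BlmLayoutSketch is hypothetical, Header and Footer are omitted, the String is written with java.io.DataOutputStream.writeUTF, and each FuzzySet is treated as an opaque byte array behind a hypothetical length prefix (the real payload is whatever FuzzySet.serialize(DataOutput) writes).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the .blm body ordering; Header and Footer are omitted. */
final class BlmLayoutSketch {

    /** Writes DelegatePostingsFormatName, NumFilteredFields, then one Filter per field. */
    public static byte[] write(String delegateName, Map<Integer, byte[]> filters) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(delegateName);           // DelegatePostingsFormatName (stand-in encoding)
            out.writeInt(filters.size());         // NumFilteredFields
            for (Map.Entry<Integer, byte[]> e : filters.entrySet()) {
                out.writeInt(e.getKey());          // FieldNumber
                out.writeInt(e.getValue().length); // hypothetical length prefix, not in the real format
                out.write(e.getValue());           // opaque stand-in for the FuzzySet payload
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads back only the delegate name, which comes first in this sketch. */
    public static String readDelegateName(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads NumFilteredFields and then that many (FieldNumber, payload) pairs. */
    public static Map<Integer, byte[]> readFilters(byte[] data) {
        Map<Integer, byte[]> filters = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            in.readUTF();                          // skip DelegatePostingsFormatName
            int numFilteredFields = in.readInt();
            for (int i = 0; i < numFilteredFields; i++) {
                int fieldNumber = in.readInt();
                byte[] payload = new byte[in.readInt()];
                in.readFully(payload);
                filters.put(fieldNumber, payload);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return filters;
    }
}
```

    The point of the sketch is the count-then-repeat shape: NumFilteredFields is written once, and the reader loops exactly that many times over Filter entries, each keyed by its FieldNumber.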