public final class Lucene50TermVectorsFormat extends CompressingTermVectorsFormat
Lucene 5.0 term vectors format.
Very similarly to Lucene50StoredFieldsFormat, this format is based
on compressed chunks of data, with document-level granularity so that a
document can never span across distinct chunks. Moreover, data is made as
compact as possible: terms and payloads are compressed using LZ4, and
numeric data such as positions is written using blocks of packed ints.
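As an illustration of why packed ints keep numeric data compact, here is a minimal sketch of bit-packing one block of values. It is not Lucene's PackedInts implementation: the class PackedBlockSketch and its methods are hypothetical names, and the layout (values packed back to back into 64-bit words, lowest bits first) is an assumption chosen for simplicity.

```java
final class PackedBlockSketch {
    /** Bits needed for the largest value in the block (at least 1). */
    static int bitsRequired(long[] values) {
        long max = 0;
        for (long v : values) {
            if (v < 0) {
                throw new IllegalArgumentException("values must be non-negative");
            }
            max = Math.max(max, v);
        }
        return Math.max(1, 64 - Long.numberOfLeadingZeros(max));
    }

    /** Packs the values back to back, bitsPerValue bits each, into 64-bit words. */
    static long[] pack(long[] values, int bitsPerValue) {
        int totalBits = values.length * bitsPerValue;
        long[] words = new long[(totalBits + 63) / 64];
        for (int i = 0; i < values.length; i++) {
            int bitPos = i * bitsPerValue;
            int word = bitPos >>> 6;   // index of the word holding the first bit
            int shift = bitPos & 63;   // bit offset inside that word
            words[word] |= values[i] << shift;
            if (shift + bitsPerValue > 64) {
                // The value straddles two words: store the remaining high bits.
                words[word + 1] |= values[i] >>> (64 - shift);
            }
        }
        return words;
    }

    /** Reads back the value stored at the given index. */
    static long get(long[] words, int bitsPerValue, int index) {
        int bitPos = index * bitsPerValue;
        int word = bitPos >>> 6;
        int shift = bitPos & 63;
        long value = words[word] >>> shift;
        if (shift + bitsPerValue > 64) {
            value |= words[word + 1] << (64 - shift);
        }
        long mask = bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
        return value & mask;
    }
}
```

With this layout, a block of 64 values that are all below 32 needs 5 bits per value, that is 40 bytes in total instead of 256 bytes for 64 plain 32-bit ints.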
Term vectors are stored using two files: a vector data file (.tvd) and an index file (.tvx).
File formats
A vector data file (extension .tvd). This file stores terms,
frequencies, positions, offsets and payloads for every document. Upon writing
a new segment, the writer accumulates data in memory until the buffer used to
store terms and payloads grows beyond 4KB. Then it flushes all metadata, terms
and positions to disk, using LZ4 compression for terms and payloads and
blocks of packed ints for positions.
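The buffering rule described above can be sketched as follows. This is a simplified model, not the actual writer: ChunkBufferSketch and its members are hypothetical names, it only counts term bytes (payloads are left out), and it records a {docBase, numDocs} pair where a real writer would compress and write the chunk.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ChunkBufferSketch {
    /** The 4KB threshold mentioned in the text. */
    static final int CHUNK_SIZE = 4 * 1024;

    /** One {docBase, numDocs} pair per flushed chunk. */
    private final List<int[]> flushedChunks = new ArrayList<>();
    private int docBase = 0;        // ID of the first document of the pending chunk
    private int pendingDocs = 0;    // documents buffered but not flushed yet
    private int bufferedBytes = 0;  // term bytes buffered so far

    /** Buffers one whole document, then flushes if the buffer grew beyond the threshold. */
    void addDocument(List<String> terms) {
        for (String term : terms) {
            bufferedBytes += term.getBytes(StandardCharsets.UTF_8).length;
        }
        pendingDocs++;
        // The check runs only between documents, so a document never spans two chunks.
        if (bufferedBytes > CHUNK_SIZE) {
            flush();
        }
    }

    /** Flushes whatever is still buffered at the end of the segment. */
    void finish() {
        if (pendingDocs > 0) {
            flush();
        }
    }

    private void flush() {
        // A real writer would compress terms and payloads and write packed ints here.
        flushedChunks.add(new int[] {docBase, pendingDocs});
        docBase += pendingDocs;
        pendingDocs = 0;
        bufferedBytes = 0;
    }

    List<int[]> chunks() {
        return flushedChunks;
    }
}
```

Because the threshold is only tested after a complete document has been buffered, a chunk can exceed 4KB by up to the size of its last document, which is the price of document-level granularity.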
Here is a more detailed description of the vector data file format:
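Header --> IndexHeader
PackedIntsVersion --> PackedInts.VERSION_CURRENT as a VInt
ChunkSize: the number of bytes of terms to accumulate before flushing, as a VInt
DocBase: the ID of the first document of the chunk, as a VInt
ChunkDocs: the number of documents in the chunk
DocNumFields: the number of fields of each document, written as a VInt if ChunkDocs==1 and as a PackedInts array otherwise
FieldNumOffs: the offset of each field number in the chunk's list of field numbers, as a PackedInts array
FieldNumTerms: the number of terms of each field, using blocks of 64 packed ints
PrefixLength: 0 for the first term of a field, the length of the common prefix with the previous term otherwise, using blocks of 64 packed ints
SuffixLength: the length of the term minus PrefixLength, using blocks of 64 packed ints
TermFreqMinus1: (frequency - 1) for each term, using blocks of 64 packed ints
PositionDelta: the absolute position for the first position of a term, and the difference from the previous position otherwise, using blocks of 64 packed ints
StartOffsetDelta: the start offsets, delta-encoded, using blocks of 64 packed ints
LengthMinusTermLength: (endOffset - startOffset - termLength), using blocks of 64 packed ints
PayloadLength: the length of each payload, using blocks of 64 packed ints
Footer --> CodecFooter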
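Several of the values above are written as a VInt, a variable-length integer that uses seven bits per byte, least significant group first, with the high bit of each byte set when more bytes follow. The sketch below illustrates that encoding with a hypothetical VIntSketch class; it writes to a plain byte array and is not the Lucene DataOutput API.

```java
import java.io.ByteArrayOutputStream;

final class VIntSketch {
    /** Encodes a non-negative int using 7 bits per byte, low-order bits first. */
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            // More than 7 bits remain: emit the low 7 bits with the continuation bit set.
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value); // last byte, continuation bit clear
        return out.toByteArray();
    }

    /** Decodes a value produced by encode. */
    static int decode(byte[] bytes) {
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // continuation bit clear: this was the last byte
            }
            shift += 7;
        }
        return value;
    }
}
```

Small values therefore cost a single byte: anything up to 127 fits in one byte, while a value such as 4096 takes two.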
An index file (extension .tvx).
Header --> IndexHeader
ChunkIndex: see CompressingStoredFieldsIndexWriter
Footer --> CodecFooter
| Constructor and Description |
|---|
| Lucene50TermVectorsFormat() Sole constructor. |
Methods inherited from class CompressingTermVectorsFormat: toString, vectorsReader, vectorsWriter
public Lucene50TermVectorsFormat()