org.apache.lucene.analysis.custom

Class CustomAnalyzer

  • All Implemented Interfaces:
    Closeable, AutoCloseable


    public final class CustomAnalyzer
    extends Analyzer
    A general-purpose Analyzer that can be created with a builder-style API. Under the hood it uses the factory classes TokenizerFactory, TokenFilterFactory, and CharFilterFactory.

    You can create an instance of this Analyzer using the builder:

     Analyzer ana = CustomAnalyzer.builder(Paths.get("/path/to/config/dir"))
       .withTokenizer(StandardTokenizerFactory.class)
       .addTokenFilter(StandardFilterFactory.class)
       .addTokenFilter(LowerCaseFilterFactory.class)
       .addTokenFilter(StopFilterFactory.class, "ignoreCase", "false", "words", "stopwords.txt", "format", "wordset")
       .build();
     
    The parameters passed to components are the same ones used by Apache Solr and are documented on the corresponding factory classes. Refer to the documentation of the subclasses of TokenizerFactory, TokenFilterFactory, and CharFilterFactory.
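
    In the StopFilterFactory call above, the trailing string arguments read as alternating key/value pairs ("ignoreCase" -> "false", "words" -> "stopwords.txt", and so on). The following is a minimal sketch of that pairing using only the JDK. The class ParamPairs and the method pairsToMap are hypothetical names introduced for illustration and are not part of the Lucene API; rejecting an odd number of arguments is likewise an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: ParamPairs is a hypothetical helper, not a Lucene class.
class ParamPairs {

    // Pairs up "key1", "value1", "key2", "value2", ... into an insertion-ordered map.
    static Map<String, String> pairsToMap(String... params) {
        if (params.length % 2 != 0) {
            // Assumption of this sketch: an unpaired trailing key is treated as an error.
            throw new IllegalArgumentException(
                "Expected key/value pairs, but got " + params.length + " arguments");
        }
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i < params.length; i += 2) {
            map.put(params[i], params[i + 1]);
        }
        return map;
    }

    public static void main(String[] args) {
        // Same arguments as the StopFilterFactory example.
        Map<String, String> stopArgs = pairsToMap(
            "ignoreCase", "false", "words", "stopwords.txt", "format", "wordset");
        System.out.println(stopArgs);
    }
}
```

    Reading the arguments this way makes it easier to check them against the parameter names documented on each factory class.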

    You can also refer to components by their SPI names (as registered through the ServiceLoader mechanism):

     Analyzer ana = CustomAnalyzer.builder(Paths.get("/path/to/config/dir"))
       .withTokenizer("standard")
       .addTokenFilter("standard")
       .addTokenFilter("lowercase")
       .addTokenFilter("stop", "ignoreCase", "false", "words", "stopwords.txt", "format", "wordset")
       .build();
     

    The names available for each component type can be looked up through TokenizerFactory.availableTokenizers(), TokenFilterFactory.availableTokenFilters(), and CharFilterFactory.availableCharFilters().
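
    Comparing the two builder examples above, each SPI name matches its factory class name with the factory suffix removed and the remainder lowercased (StandardTokenizerFactory -> "standard", LowerCaseFilterFactory -> "lowercase", StopFilterFactory -> "stop"). The sketch below reproduces that correspondence with plain JDK code. SpiNameSketch and guessSpiName are hypothetical names, and the suffix rule is an assumption inferred from these examples, so treat the lookup methods listed above as the authoritative source of names.

```java
import java.util.Locale;

// Illustrative only: SpiNameSketch is a hypothetical helper, not a Lucene class.
class SpiNameSketch {

    // Strips the first matching suffix from the simple class name and lowercases the rest.
    static String guessSpiName(String simpleClassName, String... suffixes) {
        for (String suffix : suffixes) {
            if (simpleClassName.endsWith(suffix)) {
                String stem = simpleClassName.substring(
                    0, simpleClassName.length() - suffix.length());
                // Locale.ROOT keeps lowercasing independent of the default locale.
                return stem.toLowerCase(Locale.ROOT);
            }
        }
        throw new IllegalArgumentException(
            "No known suffix on class name: " + simpleClassName);
    }

    public static void main(String[] args) {
        // Class names taken from the builder examples in this page.
        System.out.println(guessSpiName("StandardTokenizerFactory", "TokenizerFactory"));
        System.out.println(guessSpiName("LowerCaseFilterFactory", "FilterFactory"));
        System.out.println(guessSpiName("StopFilterFactory", "FilterFactory"));
    }
}
```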