org.apache.lucene.analysis.icu

Class ICUFoldingFilter

  • All Implemented Interfaces:
    Closeable, AutoCloseable


    public final class ICUFoldingFilter
    extends ICUNormalizer2Filter

    A TokenFilter that applies search term folding to Unicode text, using the foldings defined in UTR#30 Character Foldings.

    This filter applies the following foldings from UTR#30 to Unicode text:

    • Accent removal
    • Case folding
    • Canonical duplicates folding
    • Dashes folding
    • Diacritic removal (including stroke, hook, descender)
    • Greek letterforms folding
    • Han Radical folding
    • Hebrew Alternates folding
    • Jamo folding
    • Letterforms folding
    • Math symbol folding
    • Multigraph Expansions: All
    • Native digit folding
    • No-break folding
    • Overline folding
    • Positional forms folding
    • Small forms folding
    • Space folding
    • Spacing Accents folding
    • Subscript folding
    • Superscript folding
    • Suzhou Numeral folding
    • Symbol folding
    • Underline folding
    • Vertical forms folding
    • Width folding
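    To make a few of these foldings concrete, here is a minimal JDK-only sketch. It is an approximation built on java.text.Normalizer, not the filter's implementation. ICUFoldingFilter relies on ICU's folding data, and this sketch covers only some of the listed foldings: width, superscript/subscript, ligature (letterform) folding via NFKD, accent removal, native digit folding, and an approximate case folding. The class name FoldingSketch and the method approximateFold are hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

public class FoldingSketch {

    // Hypothetical helper: approximates a few UTR#30 foldings using only the JDK.
    static String approximateFold(String input) {
        // NFKD (compatibility decomposition) covers width folding (fullwidth Latin),
        // superscript/subscript folding, and ligatures such as U+FB01,
        // and splits precomposed accented letters into base letter + combining mark.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFKD);

        StringBuilder sb = new StringBuilder(decomposed.length());
        decomposed.codePoints().forEach(cp -> {
            if (Character.getType(cp) == Character.NON_SPACING_MARK) {
                return; // accent removal: drop combining marks (general category Mn)
            }
            if (Character.isDigit(cp)) {
                // Native digit folding: map any decimal digit (e.g. Arabic-Indic) to ASCII.
                sb.append((char) ('0' + Character.digit(cp, 10)));
            } else {
                sb.appendCodePoint(cp);
            }
        });

        // Locale.ROOT lowercasing only approximates case folding
        // (for example, it leaves U+00DF sharp s unchanged instead of folding it to "ss").
        String lowered = sb.toString().toLowerCase(Locale.ROOT);
        return Normalizer.normalize(lowered, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Fullwidth R, s, u, m mixed with accented e's, the "fi" ligature, and a superscript two.
        System.out.println(approximateFold("\uFF32\u00E9\uFF53\uFF55\uFF4D\u00E9 \uFB01le\u00B2")); // prints: resume file2
        // Arabic-Indic digits three and four.
        System.out.println(approximateFold("\u0663\u0664")); // prints: 34
    }
}
```

    The sketch is deliberately cruder than the report. Dropping every Mn character also removes marks that are not accents in some scripts. Letters whose diacritic is a stroke, such as U+00F8, have no canonical decomposition, so they pass through unchanged even though UTR#30 diacritic removal covers strokes.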

    Additionally, Default Ignorables are removed, and text is normalized to NFKC. All foldings, case folding, and normalization mappings are applied recursively to ensure a fully folded and normalized result.
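    The removal of Default Ignorables and the recursive application of mappings can be illustrated with a small fixed-point loop. This is a sketch under stated assumptions. The JDK does not expose the Default_Ignorable_Code_Point property, so general category Cf (format characters such as U+00AD SOFT HYPHEN and U+200B ZERO WIDTH SPACE) stands in for it. The two sets overlap but are not identical; variation selectors, for instance, are default ignorable but not Cf. "Recursively" is also modeled here by repeating one pass until the output stops changing. The real filter, as a subclass of ICUNormalizer2Filter, presumably performs a single normalization pass over ICU data instead. FixedPointFoldSketch, foldOnce, and foldFully are hypothetical names.

```java
import java.text.Normalizer;
import java.util.Locale;

public class FixedPointFoldSketch {

    // One hypothetical folding pass: drop format characters, lowercase, then NFKC.
    static String foldOnce(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
            // Approximation: general category Cf stands in for Default_Ignorable_Code_Point.
            .filter(cp -> Character.getType(cp) != Character.FORMAT)
            .forEach(sb::appendCodePoint);
        String lowered = sb.toString().toLowerCase(Locale.ROOT);
        return Normalizer.normalize(lowered, Normalizer.Form.NFKC);
    }

    // Repeat the pass until the output equals its input (a fixed point).
    // Assumes the pass converges, which holds for the mappings used here.
    static String foldFully(String s) {
        String previous;
        String current = s;
        do {
            previous = current;
            current = foldOnce(previous);
        } while (!current.equals(previous));
        return current;
    }

    public static void main(String[] args) {
        // Fullwidth C and E, a soft hyphen, and a zero width space inside "cooperate".
        System.out.println(foldFully("\uFF23o\u00ADop\u200B\uFF25rate")); // prints: cooperate
    }
}
```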