org.apache.lucene.analysis.in

Class IndicNormalizer



  • public class IndicNormalizer
    extends Object
    Normalizes the Unicode representation of text in Indian languages.

    Follows the guidelines in Unicode 5.2, chapter 6 (South Asian Scripts I), and the graphical decompositions listed at http://ldc.upenn.edu/myl/IndianScriptsUnicode.html
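    Graphical decompositions go beyond standard Unicode normalization. The sketch below uses only the JDK's java.text.Normalizer, not Lucene. It shows that the sequence U+0905 DEVANAGARI LETTER A + U+093E VOWEL SIGN AA, which can render like the single letter U+0906 AA, is not canonically equivalent to it, so NFC leaves the two characters unchanged. The class name NfcVsGraphicalDemo is hypothetical.

```java
import java.text.Normalizer;

public class NfcVsGraphicalDemo {
    public static void main(String[] args) {
        // U+0905 (LETTER A) + U+093E (VOWEL SIGN AA) can look like U+0906 (LETTER AA),
        // but Unicode defines no canonical equivalence between them.
        String spelledOut = "\u0905\u093E";
        String precomposed = "\u0906";

        // NFC only applies canonical (de)compositions, so the pair survives as-is.
        String nfc = Normalizer.normalize(spelledOut, Normalizer.Form.NFC);
        System.out.println(nfc.length());            // 2
        System.out.println(nfc.equals(precomposed)); // false
    }
}
```

    Handling such visually identical spellings is the gap that the graphical decompositions are meant to fill.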

    • Method Detail

      • normalize

        public int normalize(char[] text,
                             int len)
        Normalizes the input text in place and returns the new length. The new length is always less than or equal to the original length.
        Parameters:
        text - input text, rewritten in place
        len - number of valid characters at the start of text
        Returns:
        normalized length
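    The contract of normalize(char[], int) can be illustrated with a self-contained stand-in. This is a hypothetical sketch and not Lucene's implementation: IndicNormalizerSketch implements a single illustrative rule (U+0905 + U+093E becomes U+0906) rather than the real rule set. It shows how a normalizer reads only text[0..len), rewrites the same array with a write index that never passes the read index, and returns a length no greater than len. Callers such as a token filter would presumably pass a term buffer that has spare capacity beyond len, so the example does the same.

```java
public class IndicNormalizerSketch {

    /**
     * Hypothetical stand-in for normalize(char[] text, int len).
     * Reads text[0..len), rewrites it in place, returns the new length (<= len).
     * Only one illustrative rule is applied: U+0905 + U+093E -> U+0906.
     */
    public static int normalize(char[] text, int len) {
        int out = 0; // write index; never passes the read index, so in-place is safe
        for (int in = 0; in < len; in++) {
            if (text[in] == '\u0905' && in + 1 < len && text[in + 1] == '\u093E') {
                text[out++] = '\u0906'; // two chars collapse into one
                in++;                   // skip the consumed vowel sign
            } else {
                text[out++] = text[in]; // copy unchanged character
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Buffer larger than the valid text; only the first 3 chars are valid.
        char[] buf = new char[8];
        "\u0905\u093E\u092E".getChars(0, 3, buf, 0);

        int newLen = normalize(buf, 3);
        System.out.println(newLen); // 2
        System.out.println(new String(buf, 0, newLen).equals("\u0906\u092E")); // true
    }
}
```

    Because normalization can only shrink the text, the caller can keep using the same buffer and simply adopt the returned length.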