Class URLClassifyProcessor

  • public class URLClassifyProcessor
    extends UpdateRequestProcessor
    Update processor which examines a URL and writes characteristics of that URL to various other fields, including its length, its number of path levels, whether it is a top-level URL (levels==0), whether it looks like a landing/index page, a canonical representation of the URL (e.g. with index.html stripped), and the domain and path parts of the URL.

    This processor is intended to be used in connection with processing web resources, helping to produce values which may be used later for boosting or filtering.
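
    To make the listed characteristics concrete, the following is a minimal standalone sketch of how such values could be derived from a URL using only java.net.URI. It is not the processor's actual code and does not use the Solr API: the class name UrlFeaturesSketch, the Features holder, the INDEX_NAMES list and the exact rules (levels counted as non-empty path segments after stripping an index file name, any URL with a query string treated as neither top level nor landing page, fragments dropped from the canonical form) are all assumptions made for illustration.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Illustrative sketch only; this is not the Solr implementation. */
final class UrlFeaturesSketch {

    // Assumption: file names treated as index pages (hypothetical list).
    private static final List<String> INDEX_NAMES =
            Arrays.asList("index.html", "index.htm", "index.php", "default.htm");

    /** Plain value holder for the derived characteristics. */
    static final class Features {
        final int length;
        final int levels;
        final boolean topLevel;
        final boolean landingPage;
        final String canonicalUrl;
        final String domain;
        final String path;

        Features(int length, int levels, boolean topLevel, boolean landingPage,
                 String canonicalUrl, String domain, String path) {
            this.length = length;
            this.levels = levels;
            this.topLevel = topLevel;
            this.landingPage = landingPage;
            this.canonicalUrl = canonicalUrl;
            this.domain = domain;
            this.path = path;
        }
    }

    static Features classify(String url) {
        // normalize() resolves "." and ".." path segments.
        URI uri = URI.create(url).normalize();
        String rawPath = uri.getRawPath() == null ? "" : uri.getRawPath();
        String query = uri.getRawQuery();

        // Strip a trailing index file name to obtain the canonical path.
        int lastSlash = rawPath.lastIndexOf('/');
        String lastSegment = rawPath.substring(lastSlash + 1).toLowerCase(Locale.ROOT);
        String path = INDEX_NAMES.contains(lastSegment)
                ? rawPath.substring(0, lastSlash + 1)
                : rawPath;

        // Assumed definition: levels = number of non-empty path segments.
        int levels = 0;
        for (String segment : path.split("/")) {
            if (!segment.isEmpty()) {
                levels++;
            }
        }

        // Assumed rules: a query string rules out both flags.
        boolean topLevel = levels == 0 && query == null;
        boolean landingPage = query == null && (path.isEmpty() || path.endsWith("/"));

        // The fragment, if any, is deliberately left out of the canonical form.
        String canonicalUrl = uri.getScheme() + "://" + uri.getRawAuthority() + path
                + (query == null ? "" : "?" + query);

        return new Features(url.length(), levels, topLevel, landingPage,
                canonicalUrl, uri.getHost(), path);
    }

    public static void main(String[] args) {
        Features f = classify("http://www.example.com/docs/index.html");
        System.out.println("levels=" + f.levels + " landingPage=" + f.landingPage
                + " canonical=" + f.canonicalUrl + " domain=" + f.domain);
    }
}
```

    In an indexing pipeline, values like these would be written to separate fields of the document so that they can be used for boosting or filtering at query time; the field names and configuration options of the real processor are not shown here.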