org.apache.solr.handler.extraction

Class ExtractingDocumentLoader.MostlyPassthroughHtmlMapper

  • java.lang.Object
    • org.apache.solr.handler.extraction.ExtractingDocumentLoader.MostlyPassthroughHtmlMapper
    • Field Detail

      • INSTANCE

        public static final HtmlMapper INSTANCE
    • Method Detail

      • isDiscardElement

        public boolean isDiscardElement(String name)
        Keeps all elements and their content. <SCRIPT> and <STYLE> elements are apparently blocked elsewhere.
      • mapSafeElement

        public String mapSafeElement(String name)
        Lowercases the element name, but returns null for <BR>, which suppresses the start-element event for <BR> tags. This also suppresses the <BODY> tags because those are handled internally by Tika's XHTMLContentHandler.
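
        The behavior described for isDiscardElement and mapSafeElement can be illustrated with a minimal standalone sketch. This is not the Solr source: the class name PassthroughMapperSketch is hypothetical, it deliberately does not implement Tika's HtmlMapper interface so that it runs without Tika on the classpath, and the use of Locale.ROOT for lowercasing is an assumption.

```java
import java.util.Locale;

// Hypothetical stand-in for illustration only; not the actual Solr class.
final class PassthroughMapperSketch {

    // Never discards an element, so every element and its content is kept.
    static boolean isDiscardElement(String name) {
        return false;
    }

    // Lowercases the element name. Returning null for <BR> and <BODY>
    // mirrors the documented suppression of those tags.
    static String mapSafeElement(String name) {
        String lower = name.toLowerCase(Locale.ROOT); // Locale.ROOT is an assumption
        return (lower.equals("br") || lower.equals("body")) ? null : lower;
    }

    public static void main(String[] args) {
        System.out.println(mapSafeElement("DIV"));      // prints "div"
        System.out.println(mapSafeElement("BR"));       // prints "null"
        System.out.println(isDiscardElement("SCRIPT")); // prints "false"
    }
}
```

        In this sketch a null return from mapSafeElement stands for "emit no start-element event", which is how the description above characterizes the handling of <BR> and <BODY>.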