Bump tika-core from 2.6.0 to 2.7.0

Bumps tika-core from 2.6.0 to 2.7.0.

Changelog

Sourced from tika-core's changelog.

Release 2.7.1 - ???

  • Normalize author, title, subject and description to their Dublin Core properties in the HTMLParser (TIKA-3963).

Release 2.7.0 - 1/31/2023

  • Add SVG detection for svg files that lack the xml header (TIKA-3308).

  • Migrate to a live fork of Universal Charset Detector (TIKA-3213).

  • Improve handling of text-based attachments inside .eml files (TIKA-3959).

  • Add tika-parser-nlp-package to release artifacts (TIKA-3958).

  • Remove need for element in classes that extend ConfigBase (TIKA-3946).

  • Add X-TIKA:embedded_id_path to ensure unique embedded file paths (TIKA-3942).

  • Fix bug that prevented digests when the fallback/EmptyParser was called (TIKA-3939).

  • Remove log4j 1.2.x (and slf4j-log4j12 which now redirects to slf4j-reload4j) from all modules (TIKA-3935).

  • Upgrade mime4j to 0.8.9 (TIKA-3950).

  • Refactor date parsing for emails (TIKA-3957).

  • Upgrade to Bouncy Castle 1.71 and jdk18on jars (TIKA-3933).

  • Add a JDBCPipesReporter (TIKA-3931).

  • Add multivalued field strategy option in jdbc-emitter (TIKA-3930). Default is now 'concatenate' with ', ' as the delimiter.

  • Downgrade logging in PipesClient for each parse from info to debug.
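
To make the new jdbc-emitter default (TIKA-3930) concrete, the following is a minimal sketch of what 'concatenate' with ', ' as the delimiter means for a multivalued metadata field. It is not taken from the Tika source: the class name `ConcatenateSketch` and the method `concatenate` are hypothetical, and the real emitter may treat nulls, escaping, or column sizes differently.

```java
import java.util.List;

public class ConcatenateSketch {

    // Assumption: the delimiter matches the default stated in the changelog entry.
    private static final String DELIMITER = ", ";

    // Hypothetical helper, not a Tika API: collapses all values of a
    // multivalued field into one string suitable for a single column.
    static String concatenate(List<String> values) {
        return String.join(DELIMITER, values);
    }

    public static void main(String[] args) {
        // A field such as dc:creator may carry several values.
        List<String> creators = List.of("Alice", "Bob");
        System.out.println(concatenate(creators)); // prints "Alice, Bob"
    }
}
```

Under this reading, consumers that previously received only one value for such a field should expect a single delimited string after the upgrade.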

Release 2.6.0 - 11/3/2022

  • Add optional Siegfried detector (TIKA-3901).

  • Move OverrideDetector's functionality to the CompositeDetector (TIKA-3904).

  • The FileCommandDetector has been refactored to have the same behavior as the Siegfried detector; see setUseMime in the javadoc (TIKA-3902).

  • Fix bug in OpenSearch emitter that prevented upserts on documents with embedded files (TIKA-3882).

... (truncated)

Dependabot commands
You can trigger Dependabot actions by commenting on this MR:
  • $dependabot rebase will rebase this MR.
  • $dependabot recreate will recreate this MR, overwriting any manual changes and resolving conflicts.