Bump tika-core from 2.4.1 to 2.5.0

Bumps tika-core from 2.4.1 to 2.5.0.

Changelog

Sourced from tika-core's changelog.

Release 2.5.0 - 09/30/2022

  • Improved extraction of PDF subset info for PDF/UA, PDF/VT, and PDF/X. NOTE: we no longer append PDF/A information, e.g. 'version="A-1b"' to the 'dc:format'. Users must now get that information from the 'pdfa:PDFVersion' key or from 'pdfaid:conformance' and 'pdfaid:part' (TIKA-3844).

  • Avoid infinite loop in bookmark extraction from PDFs (TIKA-3832).

  • Upgraded to slf4j 2.0.1 (TIKA-3842).

  • Added upsert option for the OpenSearch emitter (TIKA-3855).

  • Extract PDF signature information at the document level into the metadata (TIKA-3852).

  • Enable configuration of digests via AutoDetectParserConfig (TIKA-3853).

  • Use commons-io byte array streams via PJ Fanning (TIKA-3843).

  • Upgrade to PDFBox 2.0.27 (TIKA-3866).

  • Upgrade to JempBox 1.8.17 (TIKA-3856).

  • Add extraction of ODF version from ODF files (TIKA-3840).

  • tika-parser-html-commons (BoilerPipeHandler) is no longer a dependency of tika-parser-html-module. tika-app and tika-server-standard have added a dependency on tika-parser-html-commons. However, users who manage custom dependencies and want the BoilerPipeHandler will now have to include the tika-parser-html-commons dependency themselves (TIKA-1484).

  • Add unrar as an optional parser (TIKA-3800).

  • Refactor FuzzingCLI to use PipesParser (TIKA-3799).

  • ServiceLoader's loadServiceProviders() now guarantees unique classes (TIKA-3797).

  • Fix bug that prevented setting of includeHeadersAndFooters for xls, xlsx, doc and docx via tika-config (TIKA-3796).

  • Fix bug that prevented specification of rendered image type via http header in the PDFParser (TIKA-3794).

  • Fix bug that caused some Exif dates to be decoded incorrectly in time zones other than UTC (TIKA-3815).

... (truncated)
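
Reviewer note on TIKA-3844: code that used to parse the PDF/A level out of 'dc:format' needs to read the dedicated keys instead. The following is a minimal sketch of such a lookup, not Tika's own API. The class name PdfaVersionLookup and the method resolve are hypothetical, a plain Map stands in for Tika's Metadata object, and the "A-" + part + lowercase conformance format for the fallback is an assumption based on the 'A-1b' example in the changelog.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: the Map stands in for the metadata Tika returns after parsing.
final class PdfaVersionLookup {

    private PdfaVersionLookup() {
    }

    // Returns the PDF/A version string if the metadata carries one.
    static Optional<String> resolve(Map<String, String> metadata) {
        // Preferred key named in the 2.5.0 changelog.
        String version = metadata.get("pdfa:PDFVersion");
        if (version != null && !version.isBlank()) {
            return Optional.of(version.trim());
        }

        // Fallback: combine the two pdfaid keys named in the changelog.
        String part = metadata.get("pdfaid:part");
        String conformance = metadata.get("pdfaid:conformance");
        if (part != null && conformance != null) {
            // Assumed output format, modelled on the 'A-1b' example.
            String level = conformance.trim().toLowerCase(Locale.ROOT);
            return Optional.of("A-" + part.trim() + level);
        }

        // 'dc:format' is deliberately not consulted: as of 2.5.0 it no longer carries PDF/A info.
        return Optional.empty();
    }
}
```

With real Tika metadata, the same keys would be read through the Metadata object's string getter; callers that still inspect 'dc:format' for PDF/A data should get an empty result from this helper and handle it explicitly.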

Commits
  • 1f4169b [maven-release-plugin] prepare release 2.5.0-rc1
  • 15aa09f Fix rat-check problems
  • 5f43255 prep for 2.5.0 rc1
  • cf5896c Merge pull request #722 from apache/dependabot/maven/com.google.protobuf-prot...
  • 1ddc907 Merge pull request #721 from apache/dependabot/maven/aws.version-1.12.313
  • c6e7c8e Merge pull request #720 from apache/dependabot/maven/test.containers.version-...
  • d3279e9 Bump protobuf-java from 3.21.6 to 3.21.7
  • 488c885 Bump aws.version from 1.12.312 to 1.12.313
  • cfa3acf Bump test.containers.version from 1.17.3 to 1.17.4
  • 87730cb TIKA-3864 -- disable unit tests because Apache's Hudson doesn't like utf-8 in...
  • Additional commits viewable in compare view


Dependabot commands
You can trigger Dependabot actions by commenting on this MR:
  • $dependabot rebase will rebase this MR
  • $dependabot recreate will recreate this MR, overwriting any manual changes and resolving conflicts
