
Bump tika-core from 2.7.0 to 2.8.0

Bumps tika-core from 2.7.0 to 2.8.0.

Changelog

Sourced from tika-core's changelog.

Release 2.8.0 - 5/11/2023

  • Enable counting and/or parsing of incremental updates in PDFs. This is an experimental feature and may change in later releases (TIKA-4017).

  • Fixed bug that prevented the loading of CompositeExternalParser in tika-app and tika-server-standard. This parser will call exiftool and ffmpeg if those are installed, as was the behavior in Tika 1.x. Exclude org.apache.tika.parser.external.CompositeExternalParser if you do not want this behavior (TIKA-4022).

  • Removed the shading of tika-parsers-standard-module (TIKA-4038).

  • Enable optional extraction of file system metadata in FileSystemFetcher (TIKA-4035).

  • Allow pretty printing in FileSystemEmitter (TIKA-4034).

  • Add detection for and a new mime type for older PostScript-based Adobe Illustrator "application/illustrator+ps" files (TIKA-3971).

  • Add magic detection for Canon raw file types: crw, cr2 and cr3 (TIKA-3991).

  • Add detection for ONIX message files (TIKA-4011).

  • Add detection and a parser for ActiveMime files (TIKA-3987).

  • Add extraction of rendition layout value and version from Epub (TIKA-4013).

  • Improve embedded file extraction from PDFs (TIKA-4012).

  • Improve metadata extraction from WARCs (TIKA-4018).

  • Update to PDFBox 2.0.28 (TIKA-4016).

  • Users may now avoid the ZeroByteFileException via a setting on the AutoDetectParserConfig (TIKA-3976).

  • Fix bug in closing elements in the presence of elements in RTF files (TIKA-3972).

  • Improve extraction of embedded file names in .docx (TIKA-3968).

  • Normalize author, title, subject and description to their Dublin Core properties in the HTMLParser (TIKA-3963).
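The incremental-update counting in TIKA-4017 is experimental, and this changelog does not say how Tika implements it. As a rough illustration only, the hypothetical `PdfRevisionCounter` class below estimates the number of incremental updates by counting `%%EOF` markers, assuming each saved revision of a PDF ends with its own `%%EOF`. This is a heuristic, not Tika's algorithm: linearized files carry an extra marker, and the byte sequence can also occur inside stream data.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch, not Tika's implementation: estimates how many
 * incremental updates a PDF has by counting "%%EOF" markers in the raw bytes.
 * Assumption: every revision ends with its own "%%EOF", so updates = markers - 1.
 */
final class PdfRevisionCounter {

    private static final byte[] EOF_MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    private PdfRevisionCounter() {}

    /** Counts non-overlapping occurrences of "%%EOF". */
    static int countEofMarkers(byte[] pdf) {
        int count = 0;
        int i = 0;
        while (i <= pdf.length - EOF_MARKER.length) {
            if (matchesAt(pdf, i)) {
                count++;
                i += EOF_MARKER.length; // resume scanning after this marker
            } else {
                i++;
            }
        }
        return count;
    }

    /** Estimated number of incremental updates (0 for a single-revision file). */
    static int estimateIncrementalUpdates(byte[] pdf) {
        return Math.max(0, countEofMarkers(pdf) - 1);
    }

    private static boolean matchesAt(byte[] data, int offset) {
        for (int j = 0; j < EOF_MARKER.length; j++) {
            if (data[offset + j] != EOF_MARKER[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A toy "file" with an original revision plus one appended update.
        String twoRevisions = "%PDF-1.7\n% body\nstartxref\n9\n%%EOF\n"
                + "% update\nstartxref\n42\n%%EOF\n";
        byte[] bytes = twoRevisions.getBytes(StandardCharsets.US_ASCII);
        System.out.println(estimateIncrementalUpdates(bytes)); // prints 1
    }
}
```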
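For the Canon raw detection in TIKA-3991, the sketch below checks the commonly documented leading signatures of each format. The class name `CanonRawSniffer` and its string return values are hypothetical; Tika's actual magic rules live in its MIME type registry and may use different offsets or priorities.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical magic-byte sniffer for Canon raw formats (not Tika's detector).
 */
final class CanonRawSniffer {

    private CanonRawSniffer() {}

    /** Returns "crw", "cr2", "cr3", or null if no signature matches the header bytes. */
    static String sniff(byte[] h) {
        // CRW (CIFF): byte order "II", a 4-byte header length, then "HEAPCCDR" at offset 6.
        if (matchesAt(h, 0, "II") && matchesAt(h, 6, "HEAPCCDR")) {
            return "crw";
        }
        // CR2: little-endian TIFF header "II*\0", then "CR" and major version 2 at offset 8.
        if (matchesAt(h, 0, "II*\0") && matchesAt(h, 8, "CR") && h.length > 10 && h[10] == 2) {
            return "cr2";
        }
        // CR3: ISO base media file with an "ftyp" box at offset 4 and major brand "crx ".
        if (matchesAt(h, 4, "ftypcrx ")) {
            return "cr3";
        }
        return null;
    }

    /** True if the ASCII signature appears at the given offset. */
    private static boolean matchesAt(byte[] data, int offset, String ascii) {
        byte[] sig = ascii.getBytes(StandardCharsets.ISO_8859_1);
        if (data.length < offset + sig.length) {
            return false;
        }
        return Arrays.equals(data, offset, offset + sig.length, sig, 0, sig.length);
    }
}
```

Only a short header (a few dozen bytes) is needed for these checks, which is why magic-based detection is cheap compared with parsing.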
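TIKA-4035 makes extraction of file system metadata optional in FileSystemFetcher. The changelog does not list which fields are extracted, so the sketch below only shows the kind of attributes the JDK exposes through `java.nio.file.Files.readAttributes`. The class name `FsMetadataSketch` and the `fs:*` keys are illustrative assumptions and may not match the keys Tika emits.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of reading basic file system metadata with NIO.
 * The "fs:*" keys are illustrative, not Tika property names.
 */
final class FsMetadataSketch {

    private FsMetadataSketch() {}

    static Map<String, String> read(Path path) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(path, BasicFileAttributes.class);
        Map<String, String> md = new LinkedHashMap<>();
        md.put("fs:size", Long.toString(attrs.size()));
        // Where the file system lacks a creation time, the JDK returns an
        // implementation-specific default (often the last-modified time).
        md.put("fs:created", attrs.creationTime().toString());
        md.put("fs:modified", attrs.lastModifiedTime().toString());
        md.put("fs:accessed", attrs.lastAccessTime().toString());
        return md;
    }
}
```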

Release 2.7.0 - 1/31/2023

  • Add SVG detection for svg files that lack the xml header (TIKA-3308).

  • Migrate to a live fork of Universal Charset Detector (TIKA-3213).

... (truncated)
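TIKA-3308 (in 2.7.0) adds detection for SVG files that lack the XML declaration. One simplified, hypothetical way to do this is to skip leading whitespace, comments, processing instructions and a DOCTYPE, then check whether the first element is `<svg`. The `SvgSniffer` class below sketches that idea; it is not Tika's detector, and it ignores edge cases such as a DOCTYPE with an internal subset or a namespace-prefixed root like `<svg:svg>`.

```java
/**
 * Hypothetical sketch: decides whether a text prefix starts with an <svg> root
 * element, with or without an XML declaration (not Tika's detector).
 */
final class SvgSniffer {

    private SvgSniffer() {}

    static boolean looksLikeSvg(String head) {
        int i = head.startsWith("\uFEFF") ? 1 : 0; // skip a byte-order mark
        while (true) {
            i = skipWhitespace(head, i);
            if (head.startsWith("<!--", i)) {
                int end = head.indexOf("-->", i);
                if (end < 0) {
                    return false;
                }
                i = end + 3; // skip the comment
            } else if (head.startsWith("<?", i) || head.startsWith("<!DOCTYPE", i)) {
                int end = head.indexOf('>', i);
                if (end < 0) {
                    return false;
                }
                i = end + 1; // skip the declaration, PI or simple DOCTYPE
            } else {
                break;
            }
        }
        if (!head.startsWith("<svg", i) || i + 4 >= head.length()) {
            return false;
        }
        char next = head.charAt(i + 4); // must end the element name, e.g. not "<svgx"
        return next == '>' || next == '/' || Character.isWhitespace(next);
    }

    private static int skipWhitespace(String s, int i) {
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        return i;
    }
}
```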

Commits
  • 656971f [maven-release-plugin] prepare release 2.8.0-rc2
  • fd27103 Update CHANGES.txt and rollback dev version for 2.8.0-rc2
  • ef8c8ff Remove shading of tika-parsers-standard-package (#1130)
  • 6a93b54 Merge pull request #1127 from apache/dependabot/maven/test.containers.version...
  • 93d824a Merge pull request #1128 from apache/dependabot/maven/com.google.cloud-google...
  • 49e5970 Bump google-cloud-storage from 2.22.1 to 2.22.2
  • 4b6d797 Merge pull request #1129 from apache/dependabot/maven/aws.version-1.12.467
  • fab540d Bump aws.version from 1.12.466 to 1.12.467
  • c12e825 Bump test.containers.version from 1.18.0 to 1.18.1
  • 5323f9e TIKA-4037 -- add detection for os2 bitmap arrays.
  • Additional commits viewable in compare view


Dependabot commands
You can trigger Dependabot actions by commenting on this MR
  • $dependabot rebase will rebase this MR
  • $dependabot recreate will recreate this MR rewriting all the manual changes and resolving conflicts
