optimize content streams #385

maxiride · 2021-10-04T06:01:16Z

I've been using pdfcpu optimize for a while now with very good results. A peer recently pointed me to another tool called cpdfsqueeze (source and binaries linked below). I'm not opening this issue to promote cpdfsqueeze, but because its source is available, which makes it a potential source of learning and of improvements to pdfcpu.

cpdfsqueeze managed to compress a 280MB PDF down to 19MB, while pdfcpu compressed it to 153MB and 156MB (with the xrefstream option set to true and false, respectively).

In a purely side-by-side comparison the resulting PDFs look identical (they are all meant for printing, so forms and other content are not important). However, with my very limited knowledge of PDF internals I can't really tell what has been done.

If the contributors find the source code useful and "clean" in its PDF manipulation, I am willing to put up a bounty to improve the optimization process.

cpdfsqueeze
Source: https://github.com/johnwhitington/cpdfsqueeze
Binaries: https://github.com/coherentgraphics/cpdfsqueeze-binaries

Because it is difficult to produce large enough PDFs, I've sent demo production files via email.

hhrutter · 2021-10-04T06:19:48Z

Checking for duplicate content streams is something I noticed cpdfsqueeze is doing and pdfcpu is not doing right now.
I am further investigating this.
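To illustrate the general idea of detecting duplicate content streams, here is a minimal sketch in Go. It is not taken from pdfcpu or cpdfsqueeze. The `stream` type and the `dedupStreams` function are hypothetical names invented for this example, and it assumes the stream bytes have already been decoded. It hashes each stream and maps every duplicate to the first object seen with the same bytes.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// stream is a hypothetical stand-in for a PDF content stream object:
// an object number plus its (already decoded) bytes.
type stream struct {
	objNr int
	data  []byte
}

// dedupStreams maps each object number to the object number of the first
// stream seen with an identical SHA-256 hash. Unique streams map to
// themselves. A careful implementation might also compare the raw bytes
// before treating two streams as equal.
func dedupStreams(streams []stream) map[int]int {
	firstSeen := make(map[[sha256.Size]byte]int) // content hash -> canonical object number
	redirect := make(map[int]int, len(streams))
	for _, s := range streams {
		h := sha256.Sum256(s.data)
		if canon, ok := firstSeen[h]; ok {
			redirect[s.objNr] = canon // duplicate: point at the earlier object
			continue
		}
		firstSeen[h] = s.objNr
		redirect[s.objNr] = s.objNr
	}
	return redirect
}

func main() {
	streams := []stream{
		{objNr: 4, data: []byte("BT /F1 12 Tf (Hello) Tj ET")},
		{objNr: 7, data: []byte("BT /F1 12 Tf (Hello) Tj ET")},
		{objNr: 9, data: []byte("BT /F1 12 Tf (World) Tj ET")},
	}
	redirect := dedupStreams(streams)
	for _, s := range streams {
		fmt.Printf("object %d -> object %d\n", s.objNr, redirect[s.objNr])
	}
}
```

In a real optimizer, the redirect map would then be used to rewrite page references so that the duplicate objects can be dropped from the file.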

hhrutter · 2022-01-18T12:23:26Z

Optimization now takes care of redundant content streams and forms.

This reverts commit a002745.

This reverts commit bbe8e25.

* Fix pdfcpu#442, pdfcpu#443
* Fix pdfcpu#437
* Fix pdfcpu#434
* Fix pdfcpu#429
* Fix pdfcpu#438
* Fix pdfcpu#440
* Fix pdfcpu#380
* Fix pdfcpu#446
* Add Fedora instructions (pdfcpu#439)
* Fix pdfcpu#389
* Fix pdfcpu#357, pdfcpu#451
* Fix free list validation
* Cleanup
* Fix pdfcpu#453
* Fix pdfcpu#457
* Revert "Revert "Fix pdfcpu#385""

This reverts commit bbe8e25.

Co-authored-by: Horst Rutter <hhrutter@gmail.com>
Co-authored-by: Fabio Alessandro Locati <77888+Fale@users.noreply.github.com>

hhrutter self-assigned this Oct 5, 2021

hhrutter added the investigate label Oct 5, 2021

hhrutter changed the title ~~[Feedback] optimize performances~~ optimize content streams Nov 30, 2021

hhrutter closed this as completed in a002745 Jan 18, 2022

adamgreenhall added a commit to adamgreenhall/pdfcpu that referenced this issue Feb 3, 2022

Revert "Fix pdfcpu#385"

bbe8e25

This reverts commit a002745.

adamgreenhall mentioned this issue Feb 8, 2022

pdf create + stamp + optimization problem #429

Closed

adamgreenhall added a commit to adamgreenhall/pdfcpu that referenced this issue Apr 28, 2022

Revert "Revert "Fix pdfcpu#385""

df90179

This reverts commit bbe8e25.
