## tex4ht - Bugs: bug #343, Package pdfpages

## bug #343: Package pdfpages

- Submitted by: Michal Hoftich
- Submitted on: Mon 05 Dec 2016 02:20:24 PM EET
- Category: None
- Priority: 5 - Normal
- Severity: 5 - Normal
- Status: None
- Privacy: Public
- Assigned to: None
- Open/Closed: Open

Sun 22 Jan 2017 08:55:03 PM EET, comment #5:

The bottom line seems to be that post-processing the png from gs can indeed reduce the file size, but I doubt it is worth the trouble of invoking another external program.

Perhaps Ghostscript itself has options to control its png output and achieve smaller sizes that way, but I didn't look.

Meanwhile, any pdfpages support would be better than none :).

Thanks,
Karl

Karl Berry <karl>

Sun 22 Jan 2017 08:52:44 PM EET, comment #4:

Regarding pdf to png conversion, I finally took a few minutes to try to
get to the bottom of it. (Additional discussion on mailing list,
http://tug.org/pipermail/tex4ht/2016q4/001682.html)

I started with pdflatex small2e.tex. The resulting PDF is 60587 bytes.
I saw the same basic results you did: convert small2e.pdf convert.png
resulted in a smaller file than your rungs invocation:

-rw-rw-r-- 1 karl root 9262 Jan 22 10:26 convert.png
-rw-rw-r-- 1 karl root 19189 Jan 22 10:15 rungs.png

I wondered if the precise gs invocation would make a difference.
So I ran
strace -vfs 9999 convert small2e.pdf convert.png >&/tmp/str
where the options to strace make it display everything.
The (voluminous) output shows gs being invoked this way,
except with temporary filenames:

gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT \
-dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 \
-sDEVICE=pngalpha -dTextAlphaBits=4 -dGraphicsAlphaBits=4 \
-r72x72 -sOutputFile=gsmagick.png \
small2e.pdf

But, running this results in the same file size as rungs; I was surprised:
-rw-rw-r-- 1 karl root 19189 Jan 22 10:29 gsmagick.png
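
As a rough sketch (not part of the original comment), the invocation above could be wrapped in a small shell function that builds the command for a single page, which is roughly what a `pages={1,2,3}` style of inclusion would need. The function name `gs_png_cmd`, its argument order, and the output file names are hypothetical; `-dFirstPage` and `-dLastPage` are standard Ghostscript switches for selecting a page range. The function only prints the command, so it can be inspected before anything is run.

```shell
# Hypothetical helper: print the gs command that renders one page of a PDF
# to a PNG, reusing the switches ImageMagick was observed to pass.
# Usage: gs_png_cmd INPUT.pdf PAGE OUTPUT.png
gs_png_cmd() {
  printf 'gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT'
  printf ' -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2'
  # Assumption: restrict rendering to a single page of the input.
  printf ' -dFirstPage=%s -dLastPage=%s' "$2" "$2"
  printf ' -sDEVICE=pngalpha -dTextAlphaBits=4 -dGraphicsAlphaBits=4'
  printf ' -r72x72 -sOutputFile=%s %s\n' "$3" "$1"
}

# Print the commands for pages 1, 2 and 3 of small2e.pdf.
for p in 1 2 3; do
  gs_png_cmd small2e.pdf "$p" "small2e-$p.png"
done
```

Piping a printed line to `sh` would actually run it. The sketch does no quoting, so it assumes file names without spaces or shell metacharacters.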

Ok, so then I ran
convert -debug all small2e.pdf convert.png >&/tmp/deb
to get a sense of what convert thought it was doing.

And indeed, I see it running gs as we expected and getting the png, but then
doing postprocessing on the png file:
Searching for module "PNG" using filename "png.la"
...
...

Ok, so I am led to believe that convert is smarter than gs about how to
use png compression features (or whatever), and this seems plausible.

Finally, running it through netpbm results in an even smaller file:
pngtopnm convert.png | pnmtopng >pngto.png; ls -l pngto.png
-rw-rw-r-- 1 karl root 4185 Jan 22 10:34 pngto.png
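
To repeat this kind of comparison across several candidate files, a short function can print the byte counts side by side instead of reading them off `ls -l`. This is a minimal sketch; the name `png_sizes` is hypothetical, and it relies only on POSIX `wc` and `sort`, so it assumes nothing about which converter produced the files.

```shell
# Hypothetical helper: print "SIZE NAME" for each existing file, smallest first.
png_sizes() {
  for f in "$@"; do
    [ -f "$f" ] || continue   # silently skip names that are not regular files
    # wc -c < file prints only the byte count; $(( )) strips any padding.
    printf '%s %s\n' "$(( $(wc -c < "$f") ))" "$f"
  done | sort -n
}
```

For example, `png_sizes rungs.png convert.png pngto.png` would list the three files from this comment in order of increasing size.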

Meanwhile, identify shows that the netpbm output is "PseudoClass" (uses a color
table) rather than "DirectClass" (separate color per pixel):

$ identify pngto.png convert.png
pngto.png PNG 612x792 612x792+0+0 8-bit PseudoClass 2c 4.18KB 0.000u 0:00.000
convert.png[1] PNG 612x792 612x792+0+0 8-bit DirectClass 9.26KB 0.000u 0:00.000

Some discussion at http://www.imagemagick.org/discourse-server/viewtopic.php?t=16706.

And no doubt with additional options one could get imagemagick to do
that too, or netpbm not to, or whatever, but it doesn't matter :).

Karl Berry <karl>

Wed 14 Dec 2016 05:34:03 PM EET, comment #3:

Thanks Karl.

pdfpages support isn't ready yet; I should put it together before I forget it.

Michal Hoftich <michal_h21>

Wed 14 Dec 2016 02:16:59 AM EET, comment #2:

I committed the reordered tex4ht.env to TL, r42704.
(In Master/texmf-dist/tex4ht/base/unix)

You have some pdfpages support to commit, Michal?

Thanks ...

Karl Berry <karl>

Sat 10 Dec 2016 02:42:48 AM EET, comment #1:

if i'm understanding correctly, it would be fine (good) to commit the new tex4ht.env[-unix], independent of the actual pdfpages stuff you've done?

for the record, regarding tex4ht.env-win32: it bears no resemblance to the texmf-dist/tex4ht/base/win32/tex4ht.env that is used in TeX Live. Many (10-15?) years ago, the TL version was hacked (by Staszek W of GUST, as I recall) to be somewhat more portable, and use Unix-style paths. There has been no effort to get back in sync.

Karl Berry <karl>

Mon 05 Dec 2016 02:20:24 PM EET, original submission:

Package pdfpages is not supported by tex4ht. That is not really a surprise, as we operate in DVI mode, but I was able to implement some basic support; see this answer of mine on TeX.sx [1].

It is really basic support: it handles only the `\includepdf[pages={1,2,3}]{filename.pdf}` form. The more advanced forms, which include several pdf files, impose them, etc., are not supported. I also added support for the `page` option of `\includegraphics`, so it is also possible to use `\includegraphics[page=number]{filename.pdf}`.

Before I add this to the sources, what is the best and most portable way of converting pdf to bitmap formats? My solution uses ImageMagick, but I guess that it is not bundled with TL on Windows, is it?

Michal Hoftich <michal_h21>
