bugtex4ht - Bugs: bug #618, Incomplete XML Document, domfilter...

 
 

bug #618: Incomplete XML Document, domfilter error, truncated build on large file.

Submitted by:  Nasser M. Abbasi <nma123>
Submitted on:  Tue Dec 12 01:04:12 2023  
 
Category: None
Priority: 5 - Normal
Severity: 7 - Important
Status: Wont Fix
Privacy: Public
Assigned to: None
Open/Closed: Closed

Mon Dec 18 22:03:11 2023, comment #3:

For the record:

1) dvitype reads the entire DVI file page by page, merely storing away the
total_pages value from the postamble so it can report any discrepancy,
not using it to actually parse the DVI file.

In theory it might be possible to change the tex4ht and t4ht programs to
do the same, but this is not something I'm going to spend time on. Aside
from the work of rearranging the basic logic of the programs, I think
it's likely that some other capacity problem will arise.

2) Here is the tiny plain TeX file I wrote to generate a document with
65600 DVI pages:

\count255=0
\loop\ifnum\count255 < 65600
\advance\count255 by 1
x\vfil\eject
\repeat
\end
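
To make the two points above concrete, here is a minimal Python sketch of why a 65600-page file is a problem for anything that trusts the postamble. It assumes the standard DVI layout, in which the postamble stores total_pages in two bytes and each bop carries a pointer to the previous bop. The helper names (build_dvi, read_postamble, count_pages) are hypothetical, and the assumption that a writer keeps only the low 16 bits on overflow is mine, not something stated in this report. The page count is recovered here by walking the bop back-pointers, which is a different technique from dvitype's forward scan and is not how tex4ht or t4ht work.

```python
import struct

PRE, BOP, EOP, POST, POST_POST, PAD = 247, 139, 140, 248, 249, 223

def build_dvi(n_pages):
    """Build a synthetic DVI byte string containing n_pages empty pages."""
    out = bytearray()
    # Preamble: pre, id=2, num, den, mag, comment length 0.
    out += struct.pack(">BBiiiB", PRE, 2, 25400000, 473628672, 1000, 0)
    prev_bop = -1  # the first bop points to -1
    for page in range(1, n_pages + 1):
        here = len(out)
        # bop: ten count registers, then a pointer to the previous bop.
        out += struct.pack(">B10ii", BOP, page, *([0] * 9), prev_bop)
        out.append(EOP)
        prev_bop = here
    post_at = len(out)
    # Postamble: pointer to last bop, num, den, mag, l, u, s[2], t[2].
    # ASSUMPTION: on overflow the writer keeps only the low 16 bits of t.
    out += struct.pack(">BiiiiiiHH", POST, prev_bop, 25400000, 473628672,
                       1000, 0, 0, 0, n_pages & 0xFFFF)
    out += struct.pack(">BiB", POST_POST, post_at, 2)
    out += bytes([PAD]) * 4
    while len(out) % 4:
        out.append(PAD)
    return bytes(out)

def read_postamble(dvi):
    """Return (offset of last bop, total_pages as stored in the postamble)."""
    end = len(dvi) - 1
    while dvi[end] == PAD:  # skip trailing padding; stops at the id byte
        end -= 1
    (post_at,) = struct.unpack_from(">i", dvi, end - 4)
    assert dvi[post_at] == POST
    (last_bop,) = struct.unpack_from(">i", dvi, post_at + 1)
    (total_pages,) = struct.unpack_from(">H", dvi, post_at + 27)
    return last_bop, total_pages

def count_pages(dvi, last_bop):
    """Count pages by following bop back-pointers, ignoring total_pages."""
    n, p = 0, last_bop
    while p != -1:
        assert dvi[p] == BOP
        n += 1
        (p,) = struct.unpack_from(">i", dvi, p + 41)  # skip opcode + 10 counts
    return n

if __name__ == "__main__":
    dvi = build_dvi(65600)
    last_bop, stored = read_postamble(dvi)
    print(stored, count_pages(dvi, last_bop))
```

Under these assumptions the two-byte field reports 64 pages while the back-pointer walk finds all 65600, which is the kind of discrepancy dvitype can report and a program relying on total_pages cannot see.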

Karl Berry <karl>
Project Administrator

Sun Dec 17 21:25:56 2023, comment #2:

As I wrote at the end of the long thread on the mailing list:
https://tug.org/pipermail/tex4ht/2023q4/003508.html

(thread starts at https://tug.org/pipermail/tex4ht/2023q4/003489.html)

DVI format is limited to 2^16 pages, and I see no feasible way to change that. Sorry.

Karl Berry <karl>
Project Administrator

Tue Dec 12 14:45:11 2023, comment #1:

It seems that something causes the tex4ht command to stop processing your DVI file abruptly, maybe a wrong \special command. There are no errors in the log file, and it seems that LaTeX compiled the whole file, so the problem is with the DVI file.

But I cannot compile the sample you sent me; it is too large for my computer. I've made a smaller example that contains text up to the point where it fails on your system, and the compilation succeeded. So it is even more mysterious than I expected.

Michal Hoftich <michal_h21>
Project Member

Tue Dec 12 01:04:12 2023, original submission:

I have been working with Michal on this via private email, but thought I would enter a bug report for tracking and documentation.

I have one large file (57,000 PDF pages). When it is compiled with tex4ht (which takes 14 hours), it gets an XML error and stops about 10% of the way through generating the final HTML pages.

That is, the remaining 90% of the sections are missing from the final web pages.

-------------------------------------------------------

[INFO] make4ht-lib: parse_lg process file: reportsubsection1100.htm
[WARNING] domfilter: DOM parsing of reportsubsection1100.htm failed:
[WARNING] domfilter: ...ive/2023/texmf-dist/tex/luatex/luaxml/luaxml-mod-xml.lua:175: Incomplete XML Document [char=33675]

[INFO] make4ht-lib: parse_lg process file: reportsubsection1100.htm
[WARNING] domfilter: DOM parsing of reportsubsection1100.htm failed:
[WARNING] domfilter: ...ive/2023/texmf-dist/tex/luatex/luaxml/luaxml-mod-xml.lua:175: Incomplete XML Document [char=33675]

[INFO] make4ht-lib: parse_lg process file: reportsubsection1100.htm

----------------------------------

I've just sent Michal a link to a complete, self-contained ZIP file (450 MB) with instructions on how to run it standalone in order to see these errors on his end.

I tried this on the latest TeX Live 2023 on a new Linux installation.

I will work with Michal to provide any additional information he needs from me, to hopefully find the cause of this problem.

This happens only with this file. I think it may be due to its large size, since the LaTeX code is all generated by the same program and only this file gives this error.

--Nasser

Nasser M. Abbasi <nma123>

 

2 latest changes follow.

Date                      Changed By  Updated Field  Previous Value => Replaced By
Sun Dec 17 21:25:56 2023  karl        Status         None => Wont Fix
                                      Open/Closed    Open => Closed