bug #597: tex4ht + biblatex + non-ascii chars = mixed encoding in html file

Submitted by:  Matteo Gamboz <gamboz>
Submitted on:  Sat Mar 4 14:01:23 2023  
Category: None
Priority: 5 - Normal
Severity: 5 - Normal
Status: None
Privacy: Public
Assigned to: None
Open/Closed: Closed

Mon Mar 6 12:01:36 2023, comment #4:

I forgot to close it :)

Michal Hoftich <michal_h21>
Project Member
Mon Mar 6 12:00:13 2023, comment #3:

The trick with a space is a bit confusing, but I'm afraid it had some purpose originally. It actually allows you to select different translation tables for fonts. I don't know if anyone uses this functionality, but it's probably better not to change it. It's shown in the original documentation. See:


htlatex filename "" "dbcs/!"


htlatex filename "" " -ciso2htf" "" "-translate-file=il2-pl"
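
One way to picture why the leading space matters is the sketch below. It rests on an assumption that is not stated in this thread: that the htlatex wrapper pastes its third argument directly onto a font-table path option before calling tex4ht. The `PREFIX` value and the `tex4ht_args` helper are hypothetical placeholders for illustration only, not the real script.

```python
import shlex

# Hypothetical prefix: a stand-in for the font-table path option that the
# wrapper script is assumed to emit right before the third argument.
PREFIX = "-i/path/to/ht-fonts/"


def tex4ht_args(third_arg: str) -> list:
    """Split the assumed command line the way a POSIX shell would."""
    return shlex.split("tex4ht -f/mwe " + PREFIX + third_arg)


# Without the leading space, -cunihtf is glued onto the path option.
print(tex4ht_args("-cunihtf -utf8"))
# With the leading space, -cunihtf becomes an option of its own.
print(tex4ht_args(" -cunihtf -utf8"))
# A value without a leading space extends the path, as in "dbcs/!".
print(tex4ht_args("dbcs/!"))
```

Under that assumption, a third argument without a leading space selects a subdirectory of translation tables, while one with a leading space passes ordinary options.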

Michal Hoftich <michal_h21>
Project Member
Mon Mar 6 07:33:19 2023, comment #2:

Thank you Michal, the space before "-cunihtf" did the trick :)

For me the issue can be closed (it was a configuration error on my part anyway).

Matteo Gamboz <gamboz>
Sat Mar 4 14:29:38 2023, comment #1:

It is OK to cross-post, don't worry. I've already provided my answer on TeX.sx, so I will post only a shorter version here:

The correct call for htlatex if you want Unicode output is this:

$ htlatex mwe.tex "xhtml,charset=utf-8" " -cunihtf -utf8"

The important parts are the space between the opening " and -cunihtf, and the charset=utf-8 option.

But it is better to use make4ht, because it outputs correct Unicode by default.

Michal Hoftich <michal_h21>
Project Member
Sat Mar 4 14:01:23 2023, original submission:

NB: this is the same as https://tex.stackexchange.com/q/678200/56076
(sorry for cross-posting) :)

I have this situation
- a LaTeX file with a macro that tex4ht usually translates into a Unicode char (e.g. `\ldots`, which becomes `…`)
- a citation with non-ascii char in the name of the author (e.g. the `í` in `Albarracín`)
- I would like to generate an xhtml file with htlatex

The procedure works, but the resulting file has one char encoded in utf-8 (the latex macro) and the non-ascii char in the author's name encoded in latin-1. AFAICT, htlatex includes the bbl file reading it as if it was in latin-1.
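
A minimal sketch of how such a mix can be located, assuming Python 3 is available. The `mixed` byte string is an illustrative sample built by hand, not copied from the real mwe.html, and `find_invalid_utf8` is a hypothetical helper name.

```python
# Illustrative sample: the ellipsis is UTF-8 encoded, the "í" is latin-1.
ellipsis_utf8 = "…".encode("utf-8")      # three bytes: E2 80 A6
i_acute_latin1 = "í".encode("latin-1")   # one byte: ED
mixed = b"Scientist" + ellipsis_utf8 + b" Albarrac" + i_acute_latin1 + b"n"


def find_invalid_utf8(data: bytes) -> list:
    """Return the offsets of bytes that cannot be decoded as UTF-8."""
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the buffer is valid UTF-8
        except UnicodeDecodeError as err:
            offsets.append(pos + err.start)
            pos += err.start + 1  # skip the offending byte and continue
    return offsets


bad = find_invalid_utf8(mixed)
print(bad)
# Show what each offending byte would be if read as latin-1.
print([mixed[i:i + 1].decode("latin-1") for i in bad])
```

To check a real file, the same helper could be applied to the result of `open("mwe.html", "rb").read()`.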

Is there anything that I could do to fix this behavior? :)
(I'm working on `pdfTeX, Version 3.141592653-2.6-1.40.24 (TeX Live 2022/Arch Linux)`)

Here is a mwe, and below the commands that I run:

%% File mwe.tex


year = {2000},
volume = {1},
issue = {2},
pages = {3},
author = {Anyone Albarracín},
title = {A beautiful paper.},
journaltitle = {Some Journal}



I Am a Scientist\ldots\ Ask Me Anything



htlatex mwe.tex "xhtml" "-cunihtf -utf8" "" ""
biber mwe
htlatex mwe.tex "xhtml" "-cunihtf -utf8" "" ""

And the result:

$ file mwe.html
mwe.html: XML 1.0 document, Non-ISO extended-ASCII text
$ grep -a -e 'Anyone Albarra' -e Scientist --color mwe.html
<!--l. 22--><p class="noindent" >I Am a Scientist… Ask Me Anything [<a
<!--l. 26--><p class="noindent" >Anyone Albarrac�n. &#8220;A beautiful paper.&#8221; In: <span


Matteo Gamboz <gamboz>


No files currently attached


Depends on the following items: None found

Items that depend on this one: None found


Carbon-Copy List
  • -unavailable- added by michal_h21 (Posted a comment)
  • -unavailable- added by gamboz (Submitted the item)

    1 latest change follows.

    Date                      Changed By   Updated Field   Previous Value => Replaced By
    Mon Mar 6 12:01:36 2023   michal_h21   Open/Closed     Open => Closed