Information Concerning Character Encodings

The new default character encoding at the Mathematical Institute is UTF-8.

What are the advantages of Unicode compared to 8-bit encodings (e.g. Latin1)?

An 8-bit encoding can represent at most 256 different characters. As a result, every region of the world has its own encoding so that the characters used in that region can be displayed.

This leads to the following problems:

To avoid these problems, a character set was created that aims to contain the characters of all writing systems: Unicode. On computers, the encoding most commonly used for these characters is UTF-8.
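
The difference between the two kinds of encoding can be made visible on the command line. The following is a minimal sketch, assuming that od and iconv are available; hex_bytes is a helper name chosen only for this illustration. It prints the bytes of the character "ü" once as UTF-8 and once as Latin1.

```shell
# Print the bytes read from standard input as hexadecimal digits.
# hex_bytes is a hypothetical helper name used only in this sketch.
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
}

# The octal escapes \303\274 are the two bytes of "ü" in UTF-8.
printf '\303\274' | hex_bytes; echo

# The same character converted to Latin1 is a single byte.
printf '\303\274' | iconv -f UTF-8 -t ISO-8859-1 | hex_bytes; echo
```

The first command prints c3bc and the second prints fc. A program that assumes the wrong encoding interprets these bytes differently, which is why umlauts are then displayed incorrectly.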

What are the advantages of UTF-8?

Obviously, the problems of 8-bit encodings are avoided. Still, there are other advantages:

For which types of files does the character encoding matter?

Generally, for text files. These include LaTeX documents, HTML documents and program source files (e.g. C code). OpenOffice, LibreOffice and MS Office files as well as PDF files are not text files.

Commands for manual conversion of files

Show the current encoding:
file -i filename
Example outputs are:
UTF-8: filename: text/plain; charset=utf-8
Latin1: filename: text/plain; charset=iso-8859-1
If the output contains charset=us-ascii, no conversion is necessary because the file contains no special characters.
If the output contains charset=unknown-8bit, the file very likely contains a mixture of Latin1 and UTF-8 encoded characters. If you are not able to solve this problem on your own, please do not hesitate to contact us.
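
As a cross-check of the detected encoding, one can test whether a file decodes cleanly as UTF-8. The following is a minimal sketch, not an official procedure; is_valid_utf8 is a helper name chosen for this illustration. It relies on iconv exiting with an error when it meets a byte sequence that is not valid in the source encoding.

```shell
# Succeed if the file decodes cleanly as UTF-8, fail otherwise.
# is_valid_utf8 is a hypothetical helper name, not an existing command.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Report the result for every file given on the command line.
for f in "$@"; do
    if is_valid_utf8 "$f"; then
        echo "$f: valid UTF-8 (or plain ASCII)"
    else
        echo "$f: not valid UTF-8, possibly Latin1 or mixed"
    fi
done
```

Plain ASCII files pass this check as well, since ASCII is a subset of UTF-8.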

Conversion of file content
Latin1 to UTF-8: iconv -f latin1 -t utf8 file > newfile
UTF-8 to Latin1: iconv -f utf8 -t latin1 file > newfile
Please make sure that the input (file) and output (newfile) file names are different; otherwise your file may be corrupted.
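
If the converted content should end up under the original name, the conversion can go through a temporary file. The following is a minimal sketch under stated assumptions: latin1_to_utf8 is a helper name chosen for this illustration, and ISO-8859-1 and UTF-8 are used as the standard names for the latin1 and utf8 encodings above. Files that already decode as UTF-8 are skipped, because converting them a second time would garble them.

```shell
# Convert a Latin1 file to UTF-8 in place via a temporary file.
# latin1_to_utf8 is a hypothetical helper name, not an existing command.
latin1_to_utf8() {
    src=$1
    # A file that already decodes as UTF-8 (including plain ASCII) is left
    # alone, because a second conversion would garble its content.
    if iconv -f UTF-8 -t UTF-8 "$src" > /dev/null 2>&1; then
        echo "$src: already valid UTF-8, skipped" >&2
        return 1
    fi
    tmp=$(mktemp) || return 1
    # iconv writes to the temporary file, never to the file it is reading.
    if iconv -f ISO-8859-1 -t UTF-8 "$src" > "$tmp"; then
        # Overwrite the content only after a successful conversion;
        # writing with cat keeps the permissions of the original file.
        cat "$tmp" > "$src"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

It is still advisable to keep a backup copy of important files before converting them.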

Conversion of file and folder names:
Latin1 to UTF-8: convmv -f latin1 -t utf8 --notest file(s)
UTF-8 to Latin1: convmv -f utf8 -t latin1 --notest file(s)
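
Before renaming anything, it can help to find out which names are affected at all. The following is a minimal sketch using only find and iconv; list_non_utf8_names is a helper name chosen for this illustration, and names containing newline characters are not handled.

```shell
# List all names below a directory that are not valid UTF-8.
# list_non_utf8_names is a hypothetical helper name; names that contain
# newline characters are not handled by this sketch.
list_non_utf8_names() {
    find "${1:-.}" | while IFS= read -r name; do
        # iconv fails on byte sequences that are not valid UTF-8.
        if ! printf '%s' "$name" | iconv -f UTF-8 -t UTF-8 > /dev/null 2>&1; then
            printf '%s\n' "$name"
        fi
    done
}
```

Calling list_non_utf8_names without an argument searches the current directory. Note also that convmv itself runs in test mode when --notest is omitted: it then only prints the renames it would perform, which is a convenient preview before the actual conversion.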

Conversion of LaTeX files:
First, please convert the file content as explained above.
Then replace \usepackage[latin1]{inputenc} with \usepackage[utf8]{inputenc}.
If this is not sufficient, please let us know.
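
The replacement of the inputenc option can also be done with sed instead of an editor. The following is a minimal sketch, assuming the line is written exactly as shown above, without extra spaces or further options; switch_inputenc is a helper name chosen for this illustration.

```shell
# Switch the inputenc option from latin1 to utf8 in a LaTeX source file.
# switch_inputenc is a hypothetical helper name; the -i.bak option makes
# sed keep the unmodified file as a backup with the suffix .bak.
switch_inputenc() {
    sed -i.bak 's/\\usepackage\[latin1]{inputenc}/\\usepackage[utf8]{inputenc}/' "$1"
}

# Usage (thesis.tex is a placeholder file name):
#   switch_inputenc thesis.tex
```

If the package is loaded with a different spelling, the pattern does not match and the file is left unchanged.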

Umlauts are not displayed correctly in a file. What should I do?

Please do not edit the file, and above all do not save it, because this could corrupt it.
Many editors use heuristics to detect the encoding, but these do not always work, and some editors lack this ability entirely. In that case you have to set the encoding manually; in most cases it is sufficient to select UTF-8 or Latin1 (ISO-8859-1, or the closely related ISO-8859-15). If you are not able to solve this problem on your own, please do not hesitate to contact us.
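
A file can also be inspected under an assumed encoding without opening it in an editor. The following is a minimal sketch, assuming a terminal that displays UTF-8; view_as is a helper name chosen for this illustration. Because iconv only reads the file and writes to standard output, the file itself is not modified.

```shell
# Print a file to the terminal as if it were stored in the given encoding.
# view_as is a hypothetical helper name; the file itself is never modified.
view_as() {
    iconv -f "$2" -t UTF-8 "$1"
}

# Usage (notes.txt is a placeholder file name):
#   view_as notes.txt ISO-8859-1 | less
#   view_as notes.txt ISO-8859-15 | less
```

If the umlauts look right with one of the candidate encodings, that is the encoding to select in the editor.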

Umlauts are not displayed correctly in the file manager Dolphin, and the file cannot be opened or edited. What should I do?

Unfortunately, many KDE applications have a bug that makes it impossible to open, edit or rename files whose names use a different character encoding.
On the command line, you can rename such files automatically with convmv as described above. After that, it should be possible to work with these files again. Alternatively, the graphical file manager Thunar is installed, which allows you to manually rename files whose names use a different encoding.
The KDE bug also affects unpacking archives that contain file names in a different encoding. Please use the program file-roller to extract files from these archives.
If you are not able to solve this problem on your own, please do not hesitate to contact us.