Weird locale behavior

Joined: 18 Jan 2013
Posts: 279

Posted: Sat Jan 28, 2017 1:08 am    Post subject: Weird locale behavior

Even when I switch to utf-8 using eselect locale set, not only do I still get ISO-8859-1 text, but the keyboard starts behaving strangely when I switch to utf-8 and type special Norwegian characters.

I will demonstrate this by showing two sessions where I create a trivial file in each of them, containing some Norwegian characters, and then check the type of the file using the file program.

First I'm going to show what happens when I'm in ISO-8859-1:


# eselect locale list
Available targets for the LANG variable:
  [1]   bokmal
  [2]   C
  [3]   en_US *
  [4]   en_US.iso88591
  [5]   en_US.utf8

I have already executed . /etc/profile. I create the file as follows:


# echo "test æøå test" > test_iso88591
# file test_iso88591
test_iso88591: ISO-8859 text

This is the expected result, since en_US is defined as en_US ISO-8859-1 in my locale.gen file.

Now I'll show what happens when I switch to en_US.utf8:


# eselect locale set 5
Setting LANG to en_US.utf8 ...
Run ". /etc/profile" to update the variable in your shell.
# . /etc/profile
# echo "test æøå test" > test_utf8
# file test_utf8
test_utf8: ISO-8859 text

This is not the expected result. I would expect it to be UTF-8, as en_US.utf8 is defined as UTF-8 in my locale.gen file.

If I compare the two files using hexdump, they are byte-for-byte identical. So the second file, which should be UTF-8, is in fact ISO-8859.
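For reference, here is a minimal sketch of what the two encodings should look like at the byte level. The bytes are written as explicit octal escapes so the result does not depend on the terminal or the current locale. The helper names (utf8_bytes, latin1_bytes, hex) are made up for this illustration.

```shell
#!/bin/sh
# "æøå" as explicit bytes, independent of terminal and locale.
utf8_bytes()   { printf '\303\246\303\270\303\245'; }   # two bytes per character
latin1_bytes() { printf '\346\370\345'; }               # one byte per character

# Print the bytes read from stdin as space-separated hex.
hex() { od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'; }

echo "UTF-8:      $(utf8_bytes | hex)"
echo "ISO-8859-1: $(latin1_bytes | hex)"
```

If hexdump on a file shows single bytes e6 f8 e5 for the three characters, the file is ISO-8859-1 no matter what LANG says. A UTF-8 file would show each character as a c3 xx pair.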

The "weird" part is that when I type the characters "aøa", what is actually printed on screen is "aø": the last character is swallowed. However, if I then type a space, the last character appears again and I get "aøa " (the space is also printed, and I included it in this example intentionally to illustrate).

What am I missing here?

I'm using OpenRC, in case it matters, and this is my /etc/locale.gen:


# cat /etc/locale.gen
# /etc/locale.gen: list all of the locales you want to have on your system
# The format of each line:
# <locale> <charmap>
# Where <locale> is a locale located in /usr/share/i18n/locales/ and
# where <charmap> is a charmap located in /usr/share/i18n/charmaps/.
# All blank lines and lines starting with # are ignored.
# For the default list of supported combinations, see the file:
# /usr/share/i18n/SUPPORTED
# Whenever glibc is emerged, the locales listed here will be automatically
# rebuilt for you.  After updating this file, you can simply run `locale-gen`
# yourself instead of re-emerging glibc.

nb_NO.UTF-8 UTF-8
nb_NO ISO-8859-1
en_US ISO-8859-1
en_US.UTF-8 UTF-8

Joined: 07 Jun 2012
Posts: 6227
Location: Room 101

Posted: Sat Jan 28, 2017 12:22 pm    Post subject:

deltamalloc ...

it may not be the locale but the terminal (perhaps even shell/command) used to generate/display the output:

% echo $SHELL
% echo $LANG
% which echo
echo: shell built-in command
% echo "test æøå test" > test
% file test
test: UTF-8 Unicode text
% eix '-Ic' 'x11-terms/*'
[I] x11-terms/rxvt-unicode (9.21@2016-09-30): rxvt clone with xft and unicode support
% eix '-Ice#' --installed-with-use unicode
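
One way to separate "what the locale claims" from "what the terminal actually wrote" is to validate the bytes directly. This is only a sketch: is_utf8 and classify are hypothetical helper names, and it assumes iconv (shipped with glibc) is available. Note that a pure ASCII file also counts as valid UTF-8.

```shell
#!/bin/sh
# Succeeds if stdin is valid UTF-8; iconv exits non-zero on an invalid sequence.
is_utf8() { iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; }

# Hypothetical helper: label a file by what its bytes actually are.
classify() {
    if is_utf8 < "$1"; then
        echo "valid UTF-8"
    else
        echo "not UTF-8"
    fi
}

# What the current locale asks for, to compare against the file contents.
echo "locale charmap: $(locale charmap 2>/dev/null)"
```

Running classify on the file written with echo shows whether the terminal really sent UTF-8. If locale charmap says UTF-8 but the file is "not UTF-8", the terminal or console is the place to look.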

HTH & best ... khay

Joined: 02 May 2003
Posts: 7430

Posted: Sun Jan 29, 2017 3:43 am    Post subject:

Pretty sure you forgot to run locale-gen, no?

What you define in /etc/locale.gen is only created when you rebuild glibc or run locale-gen.
Until you do that, the previous definitions remain.

A clue that you forgot to run it: the eselect output should list en_US.UTF-8, and yours doesn't. You should also have nb_NO.UTF-8...
Look at khayyam's output: he uses "en_GB.UTF-8" too, not "en_gb.utf8". And look at mine:
echo $LANG
grep FR /etc/locale.gen
fr_FR.UTF-8 UTF-8
 * Generating 1 locales (this might take a while) with 1 jobs
 *  (1/1) Generating fr_FR.UTF-8 ...                                      [ ok ]
 * Generation complete
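
A rough way to check whether a locale from /etc/locale.gen has actually been generated is to look for it in locale -a. The sketch below assumes glibc's habit of printing the codeset in lower case without dashes (so en_US.UTF-8 shows up as en_US.utf8). The function names are made up for this example, and locale names with an @modifier are not handled.

```shell
#!/bin/sh
# Rewrite the codeset part the way `locale -a` tends to print it on glibc:
# lower-case, dashes removed. Names without a codeset pass through unchanged.
normalize_locale() {
    case $1 in
        *.*)
            name=${1%%.*}
            codeset=$(printf '%s' "${1#*.}" | tr 'A-Z' 'a-z' | tr -d '-')
            printf '%s.%s\n' "$name" "$codeset"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

# Succeeds if the locale named in $1 is listed by `locale -a`.
locale_is_generated() {
    locale -a 2>/dev/null | grep -qxF "$(normalize_locale "$1")"
}
```

For example, locale_is_generated nb_NO.UTF-8 || echo "run locale-gen" would flag an entry that is in locale.gen but has not been built yet.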
