Using UTF-8 in mrxvt, even though mrxvt doesn't support it

2014-09-14

I use mrxvt (see also the SourceForge page) as my terminal emulator. I contributed a lot of code to it and took over its development 10 years ago, but I ran out of free time and no longer code for it. I still use it every day, and it has no bugs that affect me. It does, however, lack UTF-8 support. (We have an experimental UTF-8 branch, but never managed to make it usable.)

Even though I mainly use an ISO 8859-1 encoding, I’ve run into enough UTF-8 files that I needed to do something about it. Here are a few things I’ve done that allow me to keep using mrxvt and interact correctly with UTF-8.

Setting the LANG environment variable.

Put the following in ~/.mrxvtrc:

Mrxvt.profile0.command: \!LANG=en_US exec $SHELL

Now all terminal processes in mrxvt will have LANG set to something mrxvt supports. (I know this isn’t UTF-8. That comes later.)

Getting Zsh to set LANG correctly.

If you log in to remote systems over ssh, or sometimes use Unicode-aware terminal emulators, you might want your shell to tweak LANG a little. The following ensures LANG is set to a locale supported by the system you’re currently logged into. It prefers LANG=en_US if it guesses you’re coming from mrxvt, and LANG=en_US.UTF-8 otherwise. This requires zsh, and won’t work in bash.

locales=($(locale -a 2>/dev/null | sed -e 's/utf8/UTF-8/'))
typeset -U langs

parent=$(ps -o comm= $PPID)
case $parent in
    (mrxvt)
        # Coming from mrxvt.
        langs=(en_US $LANG en_US.UTF-8 C POSIX)
        ;;

    (sshd)
        if [[ $TERM == rxvt* ]]; then
            # Probably coming from mrxvt
            langs=(en_US C POSIX $LANG en_US.UTF-8)
        else
            # Probably coming from a Unicode aware emulator
            langs=($LANG en_US.UTF-8 en_US C POSIX)
        fi
        ;;

    (*)
        # Most things support UTF-8. Try that first.
        langs=($LANG en_US.UTF-8 en_US C POSIX)
        ;;
esac
unset parent

for l in $langs; do
    [[ -n $locales[(r)$l] ]] && break
done

[[ $l != $LANG ]] && \
    echo "\e[31mWarning: \e[m Using LANG=$l (instead of ${LANG:-being unset})."

export LANG=$l
unset l langs locales
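For shells other than zsh, the core of the snippet above (pick the first preferred locale that the system actually has) can be sketched in POSIX sh. This is only a rough sketch, not a drop-in replacement: the function name pick_locale and its calling convention are made up for illustration, and the parent-process detection is left out.

```shell
# pick_locale AVAILABLE CANDIDATE...   (hypothetical helper, not part of any tool)
# AVAILABLE is a newline-separated list such as the output of `locale -a`.
# Prints the first non-empty CANDIDATE found in that list and returns 0,
# or prints nothing and returns 1 if none of them is available.
pick_locale() {
    # `locale -a` often spells the codeset "utf8"; rewrite it to "UTF-8",
    # as the zsh snippet does with sed.
    _pl_avail=$(printf '%s\n' "$1" | sed -e 's/utf8/UTF-8/')
    shift
    for _pl_cand in "$@"; do
        # Skip empty candidates (for example an unset $LANG).
        [ -n "$_pl_cand" ] || continue
        # -x: match the whole line, -F: fixed string, -q: no output.
        if printf '%s\n' "$_pl_avail" | grep -qxF -- "$_pl_cand"; then
            printf '%s\n' "$_pl_cand"
            return 0
        fi
    done
    return 1
}

# Possible use in a startup file, with the mrxvt preference order:
#   if chosen=$(pick_locale "$(locale -a 2>/dev/null)" en_US "$LANG" en_US.UTF-8 C POSIX); then
#       export LANG="$chosen"
#   fi
```

POSIX sh has no local variables, so the helper uses prefixed names (_pl_avail, _pl_cand) to reduce the chance of clobbering anything in your environment.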

Edit UTF-8 files in Vim in a non-UTF-8 environment.

I’m “old-school”. I still use vim on a vanilla console (and never gvim etc.). I recently found out that Vim can edit UTF-8 files, but translate the display into your current locale using the tenc setting. If you only ever use vim under mrxvt, then put

set enc=utf-8 tenc=latin1

in your ~/.vimrc, and you should be good to go.

If you use multiple terminal emulators, ssh and mutt, then your setup will be more complicated. Basically you need to set enc=utf-8 tenc=latin1 when run in mrxvt, except when run by mutt. Mutt does the encoding conversion for you, so when Vim is run by mutt you need to use fencs= enc= (see the mutt wiki entry for more info).

Here’s what I use in my ~/.vimrc:

if $LANG !~ '\v\cutf-?8$'
    " Not a UTF-8 locale.
    set tenc=latin1

    if argv(0) =~ '/mutt-.*-[0-9]\{8,}'
        " Probably running mutt. Use language detected encoding, and disable
        " buffer encoding detection. See
        "   http://dev.mutt.org/trac/wiki/MuttFaq/Charset
        set fencs=
    else
        " Probably not running mutt.
        set enc=utf-8
    endif
endif

You can of course replace latin1 with whatever encoding you currently use.
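To check from the shell which branch of the vimrc above a given LANG would take, the pattern '\v\cutf-?8$' (ends in "utf8" or "utf-8", ignoring case) can be approximated in POSIX sh. This is a sketch for debugging only; the function name is_utf8_locale is hypothetical.

```shell
# is_utf8_locale [VALUE]   (hypothetical helper)
# Succeeds if VALUE (default: $LANG) ends in "utf8" or "utf-8", ignoring
# case. This mirrors the test on $LANG in the vimrc above.
is_utf8_locale() {
    # Lowercase the value so that a plain case pattern is enough.
    case $(printf '%s' "${1-$LANG}" | tr '[:upper:]' '[:lower:]') in
        *utf-8|*utf8) return 0 ;;
        *)            return 1 ;;
    esac
}

# Example:
#   is_utf8_locale || echo "Vim will use tenc=latin1 here"
```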

Note. You will only be able to see characters that can be displayed in the currently supported encoding (latin1). This includes many common accents and symbols, so should suffice for most purposes (i.e. WorksForMe™). Characters that can’t be displayed will be shown as an upside-down question mark. Most importantly, you won’t corrupt files that are UTF-8 encoded by (perhaps unknowingly) editing them in the wrong encoding.

Setting the configuration file encoding in Mutt

If you use mutt I’d also recommend setting the configuration file encoding explicitly. (This way, if you run mutt in different locales, your aliases aren’t messed up.) Typically you’ll only need this in your alias files. Write your .mutt/aliases as follows:

set config_charset =  'utf-8'

# Your aliases (in UTF-8) go here.
alias Námè Wïtħ Ãccênts <email@domain.com>
...

# Put this at the end of the file.
set config_charset=''

You can use config_charset = 'latin1' if you prefer.
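Declaring a config_charset only helps if the file really is in that encoding. One way to check is to ask iconv to decode it. The sketch below assumes an iconv that exits with a non-zero status on invalid input (glibc's does), and the function name is_valid_utf8 is made up for illustration.

```shell
# is_valid_utf8 FILE   (hypothetical helper)
# Succeeds if FILE decodes cleanly as UTF-8. Relies on iconv failing when
# it meets a byte sequence that is not valid UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Example:
#   is_valid_utf8 ~/.mutt/aliases || echo "aliases file is not valid UTF-8"
```

A file saved as latin1 with accented characters will normally fail this check, since a lone high byte followed by ASCII is not a valid UTF-8 sequence.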

Using Screen to translate UTF-8 into your current encoding

The above is sufficient for 99% of my needs. I never have filenames with non-ASCII characters. But if you do, or you have other terminal-based programs that require UTF-8, you can run them in screen. Put the following in your ~/.screenrc:

defencoding     UTF-8
setenv LANG     en_US.UTF-8

You can further run screen automatically in every tab in mrxvt using the following in your ~/.mrxvtrc:

Mrxvt.profile0.command: env LANG=en_US screen

In this case using startup_message off and altscreen on in ~/.screenrc might also help.

Using xterm / lxterminal temporarily

As an absolute last resort, I run a different terminal emulator briefly. Both xterm and lxterminal have good UTF-8 support (but suck for a variety of other reasons). For convenience, I have the following zsh alias:

alias utf8='LANG=en_US.UTF-8 '

Now I can launch xterm (or any other program) in a UTF-8 locale using:

utf8 xterm &
utf8 firefox &
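If you want the same thing outside zsh, or in scripts where aliases aren’t expanded, a small function is a rough equivalent. This is a sketch: the name with_utf8 is hypothetical (chosen so it doesn’t clash with the utf8 alias), and unlike the alias it can only run real programs, not shell aliases or functions.

```shell
# with_utf8 COMMAND [ARG...]   (hypothetical helper)
# Runs COMMAND with LANG=en_US.UTF-8 in its environment, leaving the
# calling shell's LANG untouched.
with_utf8() {
    env LANG=en_US.UTF-8 "$@"
}

# Example:
#   with_utf8 xterm &
```

Note that if LC_ALL is set in your environment it takes precedence over LANG, so the program may still not end up in a UTF-8 locale.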
