This is a companion article to my talk at NeovimConf 2023.
I have been using Vim/Neovim as my full time text editor for close to 10 years.
I’ve spent a lot of time in the terminal and have become very aware of the
many flaws and idiosyncrasies of this bizarre platform. But I also think it
gets a lot of things right! And I’m not alone in this belief: terminal based
tools are still widely popular even in the presence of many alternatives
(the StackOverflow developer survey shows that Neovim is the “most loved”
editor 3 years in a row).
It’s only been in the last couple of years that I’ve begun to dig deep into
the inner workings of how terminal emulators, and the applications that run
inside of them, really work. I’ve learned that there is a lot of innovation
and creative problem solving happening in this space, even though the
underlying technology is over half a century old.
I’ve also found that many people who use terminal based tools (including
shells like Bash and editors like Vim) know very little about terminals
themselves, or some of the modern features and capabilities they can support.
In this article, we’ll discuss some of the problems that terminal based
applications have historically had to deal with (and what the modern solutions
are) as well as some features that modern terminal emulators support that you
may not be aware of.
But first, some (very) brief history.
Background & History
Most terminal emulators today can directly trace their roots back to the DEC
VT100. The VT100 was not the first video terminal, nor was it the last, but it
was the most popular (at the time). And as we’ve learned from history many
times since, what becomes popular creates the de facto standard for
everything that comes after.
DEC VT100
Jason Scott, CC BY 2.0 via Wikimedia Commons
Video terminals were an improvement on the teletype machines that preceded
them. They could move the cursor around the screen to create interactive
interfaces. They could use color, and clear and redraw their displays quickly
without feeding out reams of paper.
Different video terminals had their own unique way of doing things using
unique, proprietary escape codes (a sequence of bytes beginning with the
escape character, 0x1b). This made life difficult for applications because
they had to know which of these sequences to use. Libraries and helper
programs (e.g. termcap) were created to help ameliorate these issues (we
still live with the descendant of these early libraries,
terminfo).
Eventually, formal standards were created, such as ECMA-48 and ANSI X3.64
(from which the term “ANSI escape codes” derives), which defined a set
of standard escape sequences. The DEC VT100 was the first video terminal to
support these new standards. Its popularity, combined with the new standards,
meant that programs now had a set of known good escape
sequences they could reliably use. Its popularity spawned many clones, which
in turn supported the same sequences for compatibility with applications.
Graphical window systems eventually replaced hardware video terminals, but
users still wanted to use the terminal based programs they were accustomed to
(you know how those vi people are). In 1984, work began on a software
terminal emulator at MIT. This emulator became part of the X project and was
named Xterm. Xterm implemented its own features which did not exist
on the video terminals it emulated, such as mouse tracking and a configurable
color palette. These features were in turn copied by Xterm clones, until
eventually Xterm itself became the new de facto standard.
Terminal Emulator Basics
Terminal based applications write two kinds of data to the terminal emulator:
printable text that is displayed to the user, and control codes, which modify
the terminal emulator’s state. Control codes are either single bytes in the C0
character set (bytes 0x00 through 0x1f) or sequences of bytes that begin
with the escape character (0x1b). These sequences are most commonly referred
to as “escape sequences”, and it is these sequences that do the bulk of the
heavy lifting in terminal applications.
Most control codes from the C0 character set are not used today, but
regardless of experience with terminals or terminal applications, most
developers are likely familiar with control codes such as \r (carriage
return), which moves the cursor to the beginning of the current line, and \n
(line feed), which moves the cursor to the next line.
Escape sequences are varied and numerous, but the vast majority used in
practice fall into one of three categories: Control Sequence Introducer (CSI),
Device Control String (DCS), and Operating System Command (OSC).
CSI sequences are those which begin with the prefix ESC [ (0x1b 0x5b).
Escape sequences in this category are those which reposition the cursor,
change the cursor style, clear the screen, set foreground and background
colors, and more.
OSC sequences are those which begin with the prefix ESC ] and are typically
used for things that modify or interact with the user’s environment outside of
the terminal emulator itself (hence the name “Operating System Command”).
Examples are reading from or writing to the system clipboard, changing the
title of the terminal emulator’s window, or sending desktop notifications.
Xterm maintains a list of all of the control sequences it supports on its
website, which, along with vt100.net, forms an
informal pseudo-specification for VT100 emulators. Note that this list may not
contain some control sequences used by other, modern terminal emulators for
features which Xterm does not support (e.g. the Kitty keyboard protocol, which
we’ll discuss later).
Escape sequences are actually quite easy to use, and you can even do it
straight from your shell. Try running the following command from any shell:
printf '\e[1;32mHello \e[0;4;31mworld!\n\e[0m'
This command will print the text “Hello world!”, with “Hello” in green, bold
text and “world!” in red, underlined text.
The escape sequences used here are of the form CSI <parameters> m, which is
so common it has its own name: Select Graphic Rendition (SGR). An SGR escape
sequence sets display attributes, such as foreground and background colors,
for the text printed after it. The first escape sequence in the example,
\e[1;32m, enables the bold attribute (1) and sets the foreground color to
green (32). The second escape sequence, \e[0;4;31m, first clears any existing
styles (0), then enables the underline attribute (4), and finally sets the
foreground text color to red (31). Finally, the last escape sequence, \e[0m,
resets all styles back to their defaults.
Another use case for simple CSI sequences is redrawing text on the screen on
an already existing line (e.g. for a progress bar or text that updates itself
over time). Hint: look at \r, CSI A, and CSI K.
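As a rough sketch of that hint, the snippet below redraws a progress bar on a single line using only \r and CSI K (it does not need CSI A, which moves the cursor up a line). The function name draw_progress and its arguments are made up for illustration, and the escape byte is written as \033 rather than \e because not every shell's printf understands \e.

```shell
# Redraw one line in place: \r returns the cursor to column 0, and CSI K
# (ESC [ K) erases from the cursor to the end of the line so that leftovers
# from a longer previous frame do not linger.
draw_progress() {
    # $1 = percent complete (0-100), $2 = bar width in columns
    filled=$(( $1 * $2 / 100 ))
    bar=$(printf '%*s' "$filled" '' | tr ' ' '#')
    printf '\r\033[K[%-*s] %3d%%' "$2" "$bar" "$1"
}

for pct in 0 25 50 75 100; do
    draw_progress "$pct" 20
    sleep 0.2
done
printf '\n'
```

Each call overwrites the previous frame instead of printing a new line, so the bar appears to fill up in place.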
Most escape sequences are sent from the application to the terminal emulator,
but occasionally the terminal emulator sends escape sequences to the
application. Usually this is done in response to a query from the application
(for instance, to determine if a certain mode is set).
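One easy query to experiment with is the cursor position report: the application sends CSI 6 n and the terminal answers with CSI row ; column R. The sketch below is only an illustration. The helper name parse_cpr is hypothetical, the interactive part relies on bash-specific options to read, and it only runs when both stdin and stdout are a terminal.

```shell
# Extract "row col" from a cursor position report such as ESC [ 12 ; 40 R.
parse_cpr() {
    body=${1#*\[}       # drop everything up to and including the "["
    body=${body%R*}     # drop the trailing "R" if it is still there
    printf '%s %s\n' "${body%%;*}" "${body##*;}"
}

# Interactive part (bash only): ask the terminal where the cursor is.
if [ -t 0 ] && [ -t 1 ]; then
    printf '\033[6n'
    # -s: do not echo the reply, -d R: stop at the final "R", -t 1: give up
    # after one second if the terminal never answers.
    IFS= read -r -s -d R -t 1 reply
    parse_cpr "$reply"
fi
```

The reply arrives on the application's input stream, mixed in with whatever the user types, which is why applications have to parse their input carefully.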
Problems & Solutions
Terminal emulators are descended from old, legacy technologies, which brings
with it its fair share of problems. Many of these problems have been (mostly)
solved, or at least ameliorated, while others are still active areas of
innovation and research.
Key Encoding
Terminal emulators and terminal applications communicate through a stream of
bytes. When a user presses a key the terminal sends the byte representation of
the character associated with that key. The old video terminals only supported
ASCII so this was, generally, fairly straightforward.
Modifier keys like Ctrl and Alt complicate this situation. Alt modified
keys are encoded by prefixing the character with an Esc. But this has a
problem: including an extra Esc byte for the Alt modifier introduces
ambiguity between Alt modified key presses and two separate key presses.
When an application sees Esc C, should it interpret it as Alt-C or did the
user press Esc and then press C? Applications usually solve this by
measuring the amount of time between Esc and the next character. If the time
is less than some defined interval, it is considered an Alt modified key
press (Vim uses the ttimeoutlen option, tmux uses the escape-time option).
Ctrl modified keys are an even bigger problem. When Ctrl is used as a
modifier, the shifted version of the key has the 7th bit masked off (for
example, C is 0x43 and after masking the 7th bit the byte becomes 0x03).
This means that not only can the Shift modifier not be used in conjunction
with Ctrl, but that certain Ctrl modified keys are completely
indistinguishable from other control codes.
For instance, when you press the Return key the terminal emulator sends the
byte \r (0x0d) to the application. But if you press Ctrl-M then the
terminal emulator also sends the byte 0x0d to the application (M is 0x4d
in ASCII, so when the 7th bit is masked out, it becomes 0x0d). From the
application’s perspective, there is literally no way to distinguish these two
events.
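The masking arithmetic is easy to reproduce. The sketch below uses a made-up helper, ctrl_byte, that clears the 7th bit (0x40) of an uppercase ASCII letter and prints the resulting byte:

```shell
# Print the byte a terminal sends for Ctrl plus the given uppercase letter
# under the legacy encoding: the letter's ASCII value with bit 0x40 cleared.
ctrl_byte() {
    code=$(printf '%d' "'$1")            # numeric value of the character
    printf '0x%02x\n' $(( code & ~0x40 ))
}

ctrl_byte M   # 0x0d, the same byte as carriage return (\r)
ctrl_byte I   # 0x09, the same byte as horizontal tab (\t)
ctrl_byte J   # 0x0a, the same byte as line feed (\n)
```

Every result collides with an existing C0 control code, which is exactly the ambiguity described above.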
For a long time this meant that certain modified keys like Ctrl-I, Ctrl-J,
and Ctrl-M could not be used in terminal applications like Vim. There have
been a few attempts to solve this problem: the first came from Xterm in 2006
through the modifyOtherKeys option. Paul Evans (author of libvterm and
libtickit) introduced an alternate key encoding using the CSI u escape
sequence in an essay which is sometimes colloquially referred to
as “fixterms”. The CSI u encoding proposed by Evans was extended by Kovid
Goyal, the author of the kitty terminal emulator, in what has become known as
the kitty keyboard protocol.
What all of these solutions have in common is that key presses are sent to the
terminal application encoded as escape sequences. This eliminates any
ambiguity for modified keys and enables certain modifier combinations (such as
Ctrl + Shift) that are not possible using “legacy” encoding. The CSI u
encoding proposed by Evans and adapted by kitty encodes a modified key press
like Ctrl-M as \e[109;5u. The encoding of unmodified key presses like
Return depends on which “level” of the kitty keyboard protocol is enabled.
Applications can opt in to different levels to ease adoption (for instance,
Neovim uses only the first level, “Disambiguate escape keys”). See the kitty
documentation for more details.
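To make the shape of the CSI u encoding concrete, here is a small sketch that builds the sequence for a key and a set of modifiers. It assumes the modifier numbering from the kitty documentation (Shift = 1, Alt = 2, Ctrl = 4, with the transmitted value being one plus the sum) and covers only the simplest form of the encoding. The function name csi_u is invented, and it prints the text "ESC" instead of the raw escape byte so the result is readable.

```shell
# Build the CSI u form of a key press: ESC [ <codepoint> ; <1 + modifiers> u
csi_u() {
    # $1 = the unshifted key as a single character, remaining args = modifiers
    key=$1
    shift
    mask=0
    for mod in "$@"; do
        case $mod in
            shift) mask=$(( mask | 1 )) ;;
            alt)   mask=$(( mask | 2 )) ;;
            ctrl)  mask=$(( mask | 4 )) ;;
        esac
    done
    code=$(printf '%d' "'$key")
    if [ "$mask" -eq 0 ]; then
        printf 'ESC[%du\n' "$code"
    else
        printf 'ESC[%d;%du\n' "$code" "$(( mask + 1 ))"
    fi
}

csi_u m ctrl         # ESC[109;5u
csi_u m ctrl shift   # ESC[109;6u
```

Unlike the legacy encoding, Ctrl-M and Ctrl-Shift-M produce different sequences, and neither can be confused with Return.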
Sending key presses as escape sequences requires that terminal applications
are able to recognize and parse those sequences, so it is not something that
“just works” out of the box. However, the kitty keyboard protocol has been
widely adopted by both modern terminal emulators and terminal applications.
Terminals which support the kitty keyboard protocol (to some degree) include
Wezterm, Alacritty, kitty, foot, Ghostty, and iTerm2. Applications which
support the kitty keyboard protocol (to some degree) include Vim, Neovim,
Helix, kakoune, and nushell. This means that when using one of these
applications in one of these terminals, all of the key encoding problems
discussed above (as well as some others which were not discussed…) are
solved.
Decorations
Xterm has supported 256 user specified colors since 1999. These
colors could be changed at runtime using an escape sequence (OSC 4), which can
be used to great effect (see “8 Bit & ‘8 Bitish’ Graphics-Outside the
Box” by Mark Ferrari for an incredible demonstration, or install
notcurses and run notcurses-demo j in your terminal).
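For a feel of what OSC 4 looks like on the wire, the sketch below redefines a single palette entry and then restores it with OSC 104. The helper names are hypothetical, the color is passed as six hex digits, and the rgb:RR/GG/BB color syntax follows Xterm's control sequence documentation; other terminals may be more or less lenient about what they accept.

```shell
# Redefine one entry of the 256 color palette at runtime.
set_palette_color() {
    # $1 = palette index (0-255), $2 = color as six hex digits, RRGGBB
    rest=${2#??}
    printf '\033]4;%d;rgb:%s/%s/%s\a' "$1" "${2%????}" "${rest%??}" "${2#????}"
}

# Restore one palette entry to the terminal's configured default.
reset_palette_color() {
    printf '\033]104;%d\a' "$1"
}

# Example (left commented out because it changes your terminal's colors):
# set_palette_color 1 ff8800    # palette entry 1 (normally red) becomes orange
# reset_palette_color 1
```

Anything already on screen that was drawn with that palette entry changes color immediately, which is what makes palette cycling effects possible.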
Within the last decade or so, 24 bit color (sometimes referred to as
“truecolor” or “RGB color”) has become widely supported by terminal emulators
which allows terminal applications to use whatever arbitrary colors they want.
This provides terminal UIs a much greater degree of flexibility and creative
freedom.
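24 bit colors are selected with an extended form of the SGR sequence: 38;2;R;G;B for the foreground and 48;2;R;G;B for the background. As a quick sketch (the helper names are made up), the following draws a blue-to-red gradient out of space characters:

```shell
# Select an arbitrary RGB foreground or background color via SGR.
fg_rgb() { printf '\033[38;2;%d;%d;%dm' "$1" "$2" "$3"; }
bg_rgb() { printf '\033[48;2;%d;%d;%dm' "$1" "$2" "$3"; }

# Draw 32 cells whose background fades from blue to red.
i=0
while [ "$i" -le 255 ]; do
    bg_rgb "$i" 0 $(( 255 - i ))
    printf ' '
    i=$(( i + 8 ))
done
printf '\033[0m\n'   # reset all attributes
```

On a terminal without 24 bit color support the gradient will look banded or wrong, which makes this a handy smoke test.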
Modern terminals also support other kinds of “rich” text markup, such as
strikethrough and various types of underlines. For instance, text editors like
Vim and Neovim can add a
red squiggly line
under misspelled words (as seen in many graphical rich text editors).
Examples of markup styles supported by modern terminal emulators
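You can try some of these styles yourself. The sequences below are the ones I understand to be most commonly implemented: 9 for strikethrough, 4:2 and 4:3 for double and curly underlines, and 58:2::R:G:B for the underline color. Treat them as assumptions about your terminal; unsupported styles are usually ignored or fall back to a plain underline. The squiggle helper is a hypothetical name.

```shell
printf '\033[9mstrikethrough\033[0m\n'
printf '\033[4:2mdouble underline\033[0m\n'
printf '\033[4:3mcurly underline\033[0m\n'

# Curly underline in red, in the style of a spell checker.
squiggle() {
    printf '\033[4:3m\033[58:2::255:0:0m%s\033[0m' "$1"
}

squiggle 'mispelled'
printf '\n'
```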
It is also possible to display images and even videos inline inside of
terminal emulators. There are (at least) three different ways to do this
(sixels, the iTerm2 image protocol, and the kitty
graphics protocol) and support among terminal emulators
varies. Unfortunately this means that terminal applications are in a bit of an
awkward situation, as they must either implement support for all of the image
protocols, or only support a subset of terminals. For this reason, use of
images in terminal applications is still relatively uncommon.
It is important to note that advances in terminal based UIs are not only due
to the efforts of terminal emulators, but also to the creativity and talent of
terminal application and library authors. For example, see some of the
fantastic work that charm.sh has done creating delightful,
interactive terminal based user interfaces that rival (and in some cases,
surpass!) graphical UIs for similar tools.
Capability Determination
Terminal emulators do not all support the same features. In some cases, the
same feature is implemented in different ways. Terminal applications need some
way to know which features the terminal they’re running in supports and how to
properly use those features.
Today this is primarily done using a distributed database of “terminfo” files.
The terminal emulator uses the $TERM environment variable to communicate to
terminal applications which terminfo file to use to look up which capabilities
the terminal supports.
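You can poke at the terminfo database from the shell with tput, which ships alongside ncurses on most systems. The sketch below wraps it in a made-up helper, term_colors, that reports how many colors a given terminfo entry claims to support, or "unknown" when the entry (or tput itself) is missing:

```shell
# Look up the "colors" capability for a terminfo entry by name.
term_colors() {
    tput -T "$1" colors 2>/dev/null || echo unknown
}

echo "TERM=${TERM:-unset}"
term_colors "${TERM:-dumb}"      # what your current terminal claims
term_colors xterm-256color       # what Xterm's entry claims
```

Running infocmp with no arguments prints the complete entry for the current $TERM if you want to see everything the database says about your terminal.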
This has a multitude of problems, however. The terminfo database is part of
the ncurses library, and different operating systems and distributions package
different versions of ncurses. This was a problem for tmux users on
macOS for many years because the version of ncurses packaged with macOS
was so old that it did not even include the tmux-256color terminfo entry at
all!
This is also a problem for newer terminals which have not yet been added to
the ncurses terminfo database. Terminal emulators can (and often do) ship
their own terminfo entries which are used by applications running on the same
system as the terminal emulator itself. But when connecting to a remote system
(e.g. with SSH), the terminfo database on the remote system will not have the
terminfo entry and the user is met with cryptic warnings like “WARNING:
terminal is not fully functional” and applications that do not function
properly.
To circumvent this issue, many terminals use xterm-256color as their $TERM
value, essentially claiming to be Xterm even though they are not, piggybacking
on Xterm’s ubiquity. This creates a vicious cycle, as terminal applications
often hardcode special cases for xterm-256color, which incentivizes
terminals to claim to be xterm-256color, which incentivizes applications to
special case xterm-256color, which… and so on. The problem is
exacerbated by common (bad) advice to users facing problems with terminal
applications to simply override $TERM to be xterm-256color (the
Xterm FAQ itself warns against this).
Unfortunately there are no easy fixes for these problems, but there is hope.
The vast majority of escape sequences used by applications today are common
across most (if not all) modern terminal emulators. This makes terminfo less
necessary since applications can usually safely assume that a given escape
sequence will “just work”.
In addition, terminal emulators increasingly support applications querying
support for certain capabilities. For instance, applications can query the
terminal for support of the kitty keyboard protocol mentioned above and only
enable it if the terminal responds that it is supported. A nice property of
escape sequence queries is they still work even over remote login connections
like SSH.
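As an illustration of such a query, the kitty documentation describes sending CSI ? u followed by a primary device attributes request (CSI c). Nearly every terminal answers the second request, so if that answer arrives without a preceding CSI ? flags u reply, the protocol is not supported. The sketch below follows that description. The function name is hypothetical, and the interactive part uses bash-specific read options and only runs on a real terminal.

```shell
# Decide from the raw reply bytes whether the terminal answered the
# kitty keyboard protocol query (a reply of the form ESC [ ? <flags> u).
supports_kitty_keyboard() {
    esc=$(printf '\033')
    case $1 in
        *"${esc}[?"*u*) return 0 ;;
        *) return 1 ;;
    esac
}

# Interactive part (bash only): send both queries and read up to the "c"
# that ends the device attributes reply.
if [ -t 0 ] && [ -t 1 ]; then
    printf '\033[?u\033[c'
    IFS= read -r -s -d c -t 1 reply
    if supports_kitty_keyboard "$reply"; then
        echo 'kitty keyboard protocol: supported'
    else
        echo 'kitty keyboard protocol: not supported'
    fi
fi
```

Because the question and the answer both travel over the same byte stream as everything else, this works identically in a local shell and over SSH.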
Some new TUI libraries, such as vaxis, are designed specifically to avoid
using terminfo at all and exclusively use queries to determine feature
capabilities. As more applications, libraries, and terminal emulators move in
this direction, terminfo will become increasingly unnecessary.
System Integration
One of the many advantages of software terminal emulators over hardware video
terminals is that they are one piece of a larger, integrated computing system.
Modern terminal emulators support many escape sequences to interact with their
broader environment. These sequences are generally known as Operating System
Commands (OSCs) and are often referred to by the number which appears
after the OSC prefix.
Some of the more popular OSC sequences are OSC 2 for setting the title of the
terminal emulator’s window (used frequently by shells and text editors), OSC 8
for creating clickable hyperlinks, OSC 9 for sending desktop
notifications, and OSC 52 for interacting with the system clipboard.
You can test these sequences out for yourself. Try running the following in
your shell:
printf '\e]9;This is a notification!\a'
If your terminal emulator supports OSC 9, you will see a desktop notification
appear with the text, “This is a notification!” (some terminals or operating
systems may not display a notification for the focused application; in that
case, add a sleep 2 before the printf command and quickly change focus to
another window).
Terminals which support OSC 8 can create clickable hyperlinks. For instance,
try running the below command:
printf '\e]8;;https://www.youtube.com/watch?v=dQw4w9WgXcQ\aClick me for an awesome video!\n\e]8;;\a'
You will see the text “Click me for an awesome video!”. If your terminal
emulator supports OSC 8, the text will be clickable (perhaps requiring a
modifier key like Shift or Command to be held) and might be styled with an
underline or some other visual affordance to indicate that the text is a
hyperlink. Clicking on the text will open your web browser to the (perfectly
innocuous) embedded URL.
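The structure of the sequence is easier to see when wrapped in a small function: one OSC 8 with the URL opens the link, the visible text follows, and a second OSC 8 with an empty URL closes it. This is only a sketch; the name hyperlink and the example URL are placeholders.

```shell
# Print text as a clickable hyperlink using OSC 8.
hyperlink() {
    # $1 = URL, $2 = visible text
    printf '\033]8;;%s\a%s\033]8;;\a' "$1" "$2"
}

hyperlink 'https://example.com' 'An example link'
printf '\n'
```

Terminals that do not understand OSC 8 generally ignore the sequences and show just the text, so the output stays readable either way.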
A long standing issue for terminal based text editors like Vim is clipboard
management in remote sessions. A strength of Vim is that it can be run just as
easily in a remote SSH session as it can locally; however, the remote SSH
session is not able to communicate with the clipboard on your local system, so
it is not possible to copy text inside of Vim on the remote session to your
clipboard.
Vim addresses this by (optionally) linking against X11 and allowing users to
forward their X connection to the remote server, allowing Vim on the remote
server to copy text to the X clipboard on the local system. And while this
does work, it has its own problems (users must use a version of Vim compiled
against X11, with the optional +clipboard feature enabled, and use X11 as
their display server, and remember to forward the X connection to the remote
system).
A better solution is to copy data to the clipboard through the terminal
emulator directly. An application running in the terminal can use the OSC 52
escape sequence to write a Base64 encoded string to the terminal emulator. The
terminal then decodes the string and copies the data into the system
clipboard. The terminal emulator does not know or care whether the application
that sent the sequence is running remotely or not, which means this works on
any system with zero dependencies.
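A minimal sketch of the copy direction looks like this. It assumes a base64 command is available (the newlines some implementations insert are stripped) and uses "c" to select the system clipboard. The function name osc52_copy is invented for illustration.

```shell
# Copy a string to the system clipboard by way of the terminal emulator.
osc52_copy() {
    # $1 = text to copy
    encoded=$(printf '%s' "$1" | base64 | tr -d '\n')
    printf '\033]52;c;%s\a' "$encoded"
}

# Example (left commented out because it overwrites your clipboard):
# osc52_copy 'Hello from the other side of an SSH connection'
```

Some terminals require this feature to be enabled in their configuration, and multiplexers like tmux sit between the application and the terminal and may need their own settings before the sequence gets through.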
Pasting (reading) from the clipboard has serious security implications,
because any program in the terminal (even ones on remote servers) can request
the clipboard contents of the user’s system. For this reason, most terminal
emulators disable reading from the clipboard by default, or require the user
to explicitly allow it with a prompt.
Neovim recently added builtin support for using OSC 52 and it will be enabled
for users by default (if the terminal emulator supports it) in the forthcoming
0.10 release.
Conclusion
While it’s true that terminals, as an application platform, are idiosyncratic
and quirky, their portability, ubiquity, and relative ease of use (for
application authors) make them increasingly popular for many developers, even
in the face of an increasing number of alternatives.
This article is not exhaustive, but it is not meant to be. There are other
challenges that both terminal emulator and terminal application authors face
that are not discussed here, as well as other areas of innovation and creative
exploration. Some examples: better grapheme clustering,
synchronized output to avoid “flickering” in redraw-heavy UIs, and
custom shaders to create arbitrary visual effects.
Terminal emulators are not static: they continue to evolve and innovate to
solve users’ problems and improve users’ experience. The underlying technology
is old: downright ancient by the standards of modern tech. But, instead of a
flaw, I consider this a strength: it gives me confidence that while individual
terminal emulators may come and go, the underlying platform will endure.
References & Further Reading
The TTY demystified
What happens when you press a key in your terminal?
A history of the tty
Understanding ASCII (and terminals)
Comprehensive keyboard handling in terminals
Fix Keyboard Input on Terminals – Please
Grapheme Clusters and Terminal Emulators
Copyright for syndicated content belongs to the linked Source : Hacker News – https://gpanders.com/blog/state-of-the-terminal/