Change Case of Word at Point in Emacs, But for Real This Time

At the moment, I’m proof-reading and editing the book manuscript of my pal Sascha for the new edition of the Zettelkasten Method book. As with most things text these days, I’m doing that with Emacs.

Something that continually drives me bonkers is how Emacs handles upcasing, downcasing, and capitalization of words by default. The functions that are called for the default key bindings are upcase-word, downcase-word, and capitalize-word. These sounds super useful to fix typos. The default behavior is odd, though: They only change the case of the whole word when you select the word first. Otherwise they change the case of the remainder of the word beginning at the character at the insertion point. The docstrings say as much: “Capitalize from point to the end of word, moving over.” Why?

Anything before the insertion point is ignored; and when capitalizing, only 1 character is actually changed.

So the functions are aware of my intention to change the word. Why don’t they start at its beginning?

I can understand the underlying functions involved here that act on the region aka selection of the user. They usually expect 2 parameters, the start and end of the region where the effect should be applied. That’s super useful to compose effects with other functions because of its general nature.

The “convenient” behavior of the key-bound functions to change the case for the remainder of the word puzzles me, though. Is it because you can pass numerical parameters to it to continue from point onward N words forward or backward? I dont’ know. Even then, why not start at the beginning while we’re at it? I would understand not acting on words at all without a selection, and just changing the case of character at the insertion point’s location then. But this?!

Xah Lee, who seemingly has done every conceivable thing you can do to Emacs in the past 20 years, implemented his own ‘toggle letter case’ function that does what maybe not every Emacs Lisp programmer, but any writer would expect: to act on the whole word.

He opted to figure out word boundaries via the [:alpha:] regular expression. That’s maybe not always enough, but it’s good enough for typing text. And, unless “thing at point”, it is consistent. Every mode can redefine what a “word” means in its context. (Which is useful on its own, but not helping to keep downcasing predictable.)

I changed Xah’s code a bit, because I don’t want to cycle through cases interactively. I’d rather hit M-u for ALL CAPS upcasing once.

Imagine a helper function ct/word-boundary-at-point-or-region that returns 2 character locations: the start and end of either the current region (i.e. selection in emacs) or the word below the insertion point.

The return value could be (100 110) for a 10 character word that starts at offset 100. The position of the insertion point notably doesn’t matter.

You can upcase a word like this, using car to get the first element of the returned list value, and cadr (aka (car (cdr x)))) to get the last element.1

Here’s a function that would utilize this to fetch both points and then capitalize the region:

(defun ct/capitalize-word-at-point ()
  (let* (($bounds (ct/word-boundary-at-point-or-region))
         ($p1 (car $bounds))
         ($p2 (cadr $bounds)))
    (upcase-initials-region $p1 $p2)))

I’d have to copy and paste that for all three case-changing functions I need. I’d rather extract the common theme here and change the approach to an adapter of sorts, if you pardon the OOP terminology when using Lisp.

The actual implementation of ct/word-boundary-at-point-or-region thus is implemented in a way to figure out the start and end points of a word, return these, but then also forward these to a callback, if that is provided.

Here’s the implementation, mostly copies from Xah’s excellent code:

(defun ct/word-boundary-at-point-or-region (&optional callback)
  "Return the boundary (beginning and end) of the word at point, or region, if any.
  Forwards the points to CALLBACK as (CALLBACK p1 p2), if present.

  (let ((deactivate-mark nil)
        $p1 $p2)
    (if (use-region-p)
        (setq $p1 (region-beginning)
              $p2 (region-end))
        (skip-chars-backward "[:alpha:]")
        (setq $p1 (point))
        (skip-chars-forward "[:alpha:]")
        (setq $p2 (point))))
    (when callback
      (funcall callback $p1 $p2))
    (list $p1 $p2)))

Let me walk you through the parts here in case Lisp is odd for you to read:

  • When a region is active, use the region beginning and end points for $p1 and $p2.
  • When no region is active, move the insertion point to the beginning of the word, save that as $p1, skip to the end of the word, save that offset as $p2. (And restore the original position thanks to the save-excursion decorator.)
  • If a callback function is given, pass the two points.
  • Always return a tuple of points via (list $p1 $p2).

Now I can get the tuple of points if I need, or I can tell the function to call another function and forward these points.

The implementation thus shrinks down to one-liners:

(defun ct/capitalize-word-at-point ()
  (ct/word-boundary-at-point-or-region #'upcase-initials-region))
(defun ct/downcase-word-at-point ()
  (ct/word-boundary-at-point-or-region #'downcase-region))
(defun ct/upcase-word-at-point ()
  (ct/word-boundary-at-point-or-region #'upcase-region))

;; Set global shortcuts
(global-set-key (kbd "M-c") #'ct/capitalize-word-at-point)
(global-set-key (kbd "M-u") #'ct/upcase-word-at-point)
(global-set-key (kbd "M-d") #'ct/downcase-word-at-point)

I prefer these one-liners over repeatedly unpacking 2 points frm a tuple that was returned.

The actual capitalization should maybe be implemented a bit different, though: upcase-initials-region only changes the case of the initials and leaves the remainder untouched, unlike capitalize-word which lowercases the rest. "fizzBUZZ" thus becomes "FizzBUZZ". My expectation is for the whole word to change, not just the initial characters, so I prefer to downcase the whole word first and then capitalize the initials for my current task:

(defun ct/capitalize-region (p1 p2)
  (downcase-region p1 p2)
  (upcase-initials-region p1 p2))
(defun ct/capitalize-word-at-point ()
  (ct/word-boundary-at-point-or-region #'ct/capitalize-region))

I have to say I really like function composition.

By the way, I also evaluated to train myself to expand the selection to the current word first and then call the built-in case changing functions. There’s tools for that. But that sucks, and the default behavior of the built-in functions still is odd.

  1. If you’re new to Lisp, using only cdr is like a dropFirst call on an array, still returning a list, but with 1 element in this case. car then fetches this element. And cadr is a shorthand for this common combination to fetch the butt of a list, so to speak.