Parse Boolean Search Expressions in Swift

Ever wanted to implement a full-text search in your app? Didn’t find a boolean search expression parser that works? Look no further!

For the initial release of my note-taking app The Archive I had to create a couple of open source libraries that I have yet to talk about. Today, I want to show you my search expression parser. It powers the app’s Omnibar to find notes quickly using simple boolean expressions.

Searching in a Pile of Notes

A prerequisite for me was to have an interface that’d work well with C Strings. strstr is just the fastest way to look for a needle in a haystack. It’s crazy. To utilize the full potential of full-text C String search, though, you should keep an all-caps or all-lowercase version of the note around so you can do a case-insensitive search. While case-insensitive searches are slow when you perform them on the fly, the results are stunning on a pre-built index that no longer distinguishes between uppercase and lowercase.

The actual objects in my app have a bit more information, but this is a good-enough approximation of what a note representation in the in-memory index looks like:

import Foundation

struct IndexableNote {
    private let cString: [CChar]

    init(text: String) {
        self.cString = text
            // Favor simple over grapheme cluster characters
            .precomposedStringWithCanonicalMapping
            // Keep an all-lowercase copy for case-insensitive matching
            .lowercased()
            .cString(using: .utf8)!
    }
}

The original String doesn’t matter for the purpose of indexing. All we want to do is perform fast searches.

My first naive implementation of a full-text search separated the search string at every space. So foo bar baz would search for foo, bar, and baz in every note. To make the parts easier to search for, convert them to C Strings as well.

Here’s the implementation that served me well through the beta and into the v1.0 release:

struct IndexableSearchString {
    private let cStringWords: [[CChar]]

    init(_ string: String) {
        self.cStringWords = string
            // Favor simple over grapheme cluster characters
            .precomposedStringWithCanonicalMapping
            .lowercased()
            .split(separator: " ")
            // compactMap drops words that can't be converted
            .compactMap { String($0).cString(using: .utf8) }
    }

    func matchesAll(in haystack: [CChar]) -> Bool {
        for needle in cStringWords {
            if strstr(haystack, needle) == nil { return false }
        }

        return true
    }
}

extension IndexableNote {
    func matches(searchString: IndexableSearchString) -> Bool {
        return searchString.matchesAll(in: self.cString)
    }
}

You prepare the search once, then match it against each note in the index as a filter. That’s it. It’s super simple, but it works well for 99% of all use cases. This is pretty close to a Google search already: You enter a list of terms and want search results that contain all of them.
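To make the “prepare once, filter many” flow concrete, here is a condensed, self-contained sketch of the two types above plus a filter over a tiny index. It assumes the note text is lowercased when it is indexed (so it matches the lowercased search words); the sample notes are made up.

```swift
import Foundation

struct IndexableNote {
    let cString: [CChar]

    init(text: String) {
        // Assumption: the index stores an all-lowercase copy of the note.
        self.cString = text
            .precomposedStringWithCanonicalMapping
            .lowercased()
            .cString(using: .utf8)!
    }

    func matches(searchString: IndexableSearchString) -> Bool {
        return searchString.matchesAll(in: cString)
    }
}

struct IndexableSearchString {
    private let cStringWords: [[CChar]]

    init(_ string: String) {
        self.cStringWords = string
            .precomposedStringWithCanonicalMapping
            .lowercased()
            .split(separator: " ")
            .compactMap { String($0).cString(using: .utf8) }
    }

    func matchesAll(in haystack: [CChar]) -> Bool {
        // Every word has to occur somewhere in the note (implicit AND).
        return cStringWords.allSatisfy { strstr(haystack, $0) != nil }
    }
}

// Build the index once ...
let index = [
    IndexableNote(text: "Banana bread recipe"),
    IndexableNote(text: "Apple pie recipe"),
    IndexableNote(text: "Meeting notes"),
]

// ... prepare the search once, then use it as a filter over the index.
let search = IndexableSearchString("RECIPE banana")
let results = index.filter { $0.matches(searchString: search) }
print(results.count) // 1
```

Because both sides are lowercased up front, the per-note work is nothing but a handful of strstr calls.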

Then there are tech-savvy people, though, who want to exclude search terms and use boolean OR to search for variants, like “banana OR apple OR fruit”. To cater to their needs, I wrote a search expression parser that does just that, and which provides the C String matching that proved to be so useful.

The Boolean Search Expression Parser

I wrote the SearchExpressionParser library with note-taking apps in mind. Search terms had to be human-readable enough for a layperson to understand what’s going on. That’s why operators are all caps: AND, OR, and NOT/!.

The library behaves as follows:

  • foo bar baz is equivalent to foo AND bar AND baz
  • NOT b equals !b
  • ! b (note the space) is ! AND b
  • "!b" is a phrase search for “!b”, matching the literal exclamation mark
  • Escaping works as an alternative to phrase search, too: \!b also searches for “!b”.
  • Escaping inside phrase searches also works: hello "you \"lovely\" specimen"
  • Escaping operator keywords turns them into literal search terms: \AND.

Note that a lowercase “and” will not be treated as an operator; only an all-caps “AND” will. So there’s no need to escape a lowercase \and, for example.
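The rules above can be sketched as a small tokenizer. This is a hypothetical illustration of the listed behavior, not the library’s actual implementation; the Token type and tokenize(_:) are names made up for this example.

```swift
// Hypothetical token type for illustration; not the library's own API.
enum Token: Equatable {
    case term(String)          // a plain word or a quoted phrase
    case and, or, not
    case openParen, closeParen
}

/// Splits a search string into tokens following the rules listed above.
func tokenize(_ input: String) -> [Token] {
    let chars = Array(input)
    var tokens: [Token] = []
    var i = 0

    while i < chars.count {
        let c = chars[i]
        if c == " " {
            i += 1
        } else if c == "(" {
            tokens.append(.openParen)
            i += 1
        } else if c == ")" {
            tokens.append(.closeParen)
            i += 1
        } else if c == "!", i + 1 < chars.count, chars[i + 1] != " " {
            // "!" glued to what follows negates it; a lone "!" is a term.
            tokens.append(.not)
            i += 1
        } else if c == "\"" {
            // Phrase: everything up to the closing quote; \x yields x.
            var phrase = ""
            i += 1
            while i < chars.count, chars[i] != "\"" {
                if chars[i] == "\\", i + 1 < chars.count { i += 1 }
                phrase.append(chars[i])
                i += 1
            }
            i += 1 // skip the closing quote
            tokens.append(.term(phrase))
        } else {
            // Word: up to the next space or ")"; \x yields a literal x.
            var word = ""
            var escaped = false
            while i < chars.count, chars[i] != " ", chars[i] != ")" {
                if chars[i] == "\\", i + 1 < chars.count {
                    escaped = true
                    i += 1
                }
                word.append(chars[i])
                i += 1
            }
            // Only unescaped, all-caps keywords are operators.
            switch (word, escaped) {
            case ("AND", false): tokens.append(.and)
            case ("OR", false): tokens.append(.or)
            case ("NOT", false): tokens.append(.not)
            default: tokens.append(.term(word))
            }
        }
    }
    return tokens
}
```

For example, tokenize("! b") yields a term “!” followed by a term “b”, which a parser would then join with an implicit AND.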

You can parenthesize expressions:

!(foo OR (baz AND !bar))

That evaluates to an equivalent of:

!foo AND (!baz OR bar)

As of yet, there is no real operator precedence implementation. I didn’t need that, and I discovered not every full-text search implements this correctly at all. So instead of operator precedence that satisfies math-nerds, logicians, and programmers, I roll with a strict left-to-right approach.
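Here is a minimal sketch of what strict left-to-right evaluation means, with Bool values standing in for term matches. BinaryOperator and evaluateLeftToRight are hypothetical names for illustration only.

```swift
// Hypothetical illustration: each operator applies to the result
// accumulated so far, with no operator precedence at all.
enum BinaryOperator { case and, or }

func evaluateLeftToRight(_ first: Bool, _ rest: [(BinaryOperator, Bool)]) -> Bool {
    var result = first
    for (op, value) in rest {
        switch op {
        case .and: result = result && value
        case .or:  result = result || value
        }
    }
    return result
}

// "true OR false AND false" read as "(true OR false) AND false"
let leftToRight = evaluateLeftToRight(true, [(.or, false), (.and, false)])
// Swift itself binds && tighter than ||: true || (false && false)
let withPrecedence = true || false && false
print(leftToRight, withPrecedence) // false true
```

So foo OR bar AND baz behaves like (foo OR bar) AND baz; use parentheses when you mean something else.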

The Expression object tree of the nested term above looks like this, by the way:

// !(foo OR (baz AND !bar))
NotNode(
    OrNode(lhs: ContainsNode("foo"),
           rhs: AndNode(lhs: ContainsNode("baz"),
                        rhs: NotNode(ContainsNode("bar")))))

That’s what you’ll get from the parser. It’s a self-evaluating expression node tree.

Expressions do not optimize themselves to abort quickly; instead, the expression tree is traversed as far as the operators permit. Since an AndNode simply combines the result of the left-hand side with the right-hand side using Swift’s && operator, the right-hand side is skipped if the left-hand side already evaluates to false. In the best-case scenario, this is just as efficient as regular boolean expressions in Swift.
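To see the short-circuiting in action, here is a hypothetical stand-in for a self-evaluating node tree. The Expr cases and the LookupLog helper are made up for this sketch (they are not the library’s NotNode/OrNode/AndNode/ContainsNode types), and plain String.contains replaces the C String matching.

```swift
import Foundation

// Hypothetical stand-in for the parser's node tree; names are illustrative.
indirect enum Expr {
    case contains(String)
    case not(Expr)
    case and(Expr, Expr)
    case or(Expr, Expr)

    func isSatisfied(by text: String, log: LookupLog? = nil) -> Bool {
        switch self {
        case .contains(let needle):
            log?.needles.append(needle)   // record which terms were checked
            return text.contains(needle)
        case .not(let expr):
            return !expr.isSatisfied(by: text, log: log)
        case .and(let lhs, let rhs):
            // && short-circuits: rhs is only evaluated when lhs is true.
            return lhs.isSatisfied(by: text, log: log) && rhs.isSatisfied(by: text, log: log)
        case .or(let lhs, let rhs):
            // || short-circuits: rhs is only evaluated when lhs is false.
            return lhs.isSatisfied(by: text, log: log) || rhs.isSatisfied(by: text, log: log)
        }
    }
}

/// Collects the needles looked up during one evaluation.
final class LookupLog {
    var needles: [String] = []
}

// !(foo OR (baz AND !bar))
let expr = Expr.not(.or(.contains("foo"),
                        .and(.contains("baz"), .not(.contains("bar")))))

let log = LookupLog()
print(expr.isSatisfied(by: "foo fighters", log: log)) // false
print(log.needles) // ["foo"]
```

Once “foo” matches, the OR is already decided, so “baz” and “bar” are never looked up. The tree also agrees with the De Morgan rewrite !foo AND (!baz OR bar) for every combination of the three terms.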

This also means that !(a OR b) will result in:

NotNode(OrNode(lhs: ContainsNode("a"), rhs: ContainsNode("b")))

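The tree is not normalized into the equivalent term !a AND !b; it mirrors what you typed. Swift’s || operator short-circuits, too, so in this simple case both forms only check b when a doesn’t match. For longer terms, though, the parser won’t reorder or simplify anything on your behalf.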

But does it matter in your case?

If so, pull requests with boolean expression normalization are welcome, of course!

It didn’t matter to me. When people compose intricate expressions, well, then I think they’re using their note archive in a very peculiar way. I don’t see the benefit of spending extra work on a normalizer when I could be adding features that benefit a much wider audience.

Using the SearchExpressionParser API

The SearchExpressionParser API exposes Parser.parse(searchString:) that you’ll be using:

import SearchExpressionParser
guard let expr = try? Parser.parse(searchString: "Hello") else { fatalError() }
expr.isSatisfied(by: "Hello World!") // true
expr.isSatisfied(by: "hello world!") // false

The parser can potentially throw an error, but all errors you’ll get are programmer errors on my side. There are no regular error conditions. When an error gets thrown here, please report it, because it’s a bug.

The library provides a CStringExpressionSatisfiable protocol to perform my beloved strstr search instead of the more literal and much slower String.contains. It will also make the search case-insensitive.

To implement this, take the IndexableNote from above and modify it to meet the API criteria:

import Foundation
import SearchExpressionParser

struct IndexableNote {
    private let cString: [CChar]

    init(text: String) {
        self.cString = text
            // Favor simple over grapheme cluster characters
            .precomposedStringWithCanonicalMapping
            // Keep an all-lowercase copy for case-insensitive matching
            .lowercased()
            .cString(using: .utf8)!
    }
}

extension IndexableNote: CStringExpressionSatisfiable {
    func matches(needle: [CChar]) -> Bool {
        return strstr(self.cString, needle) != nil
    }
}

That’s all you need to get lightning-fast case-insensitive search:

guard let expr = try? Parser.parse(searchString: "Hello") else { fatalError() }
expr.isSatisfied(by: IndexableNote(text: "Hello World!")) // true
expr.isSatisfied(by: IndexableNote(text: "hello world!")) // true

Let the boolean expression parser do its magic for you:

let warAndPeace = IndexableNote(text: try! String(contentsOfFile: "books/Tolstoy/War-and-Peace.txt", encoding: .utf8))
let protagonist = try! Parser.parse(searchString: "\"Pierre Bezukhov\" OR \"Pyotr Kirillovich\"")
protagonist.isSatisfied(by: warAndPeace) // true

That’s all there is to the power of The Archive’s search expressions! They were pretty fun to implement and make searching for relevant notes in your note archive much easier, e.g. hyperlink (#zettelkasten OR #note-taking). With The Archive’s saved search feature, you can compose boolean queries once and then get back to an accurately reduced subset of your thousands of notes in a split-second.

Find SearchExpressionParser on GitHub and feel free to open issues, pull requests, and ask questions anytime!

Fixing Ruby ncurses Unicode Character Display on Linux Terminals

A little side project of mine is a role-playing game written in Ruby that runs in the terminal and uses Unicode/ASCII characters instead of bitmap pixel graphics. In my personal tradition of these kinds of side projects, this is called TermQuickRPG. It’s a work in progress, so there’s not a lot to do in the sample game at the moment.

How I found out why special characters wouldn’t draw under Linux

After I finished a little scenario with some custom scripting on the maps, I wanted to share the game with friends. Some have Linux machines running, and since I use the curses gem, I thought I was good to go. But no such luck: the game behaves totally differently on Linux than on macOS. On Linux, the Unicode Box Drawing characters cannot be printed. I get garbage output instead.

macOS and Linux Terminal output compared

So the Unicode characters don’t get printed. Bummer. I am using Linux Mint 19 (an Ubuntu derivative) in a VM to test this, by the way. Here are my initial test results:

  • The font was capable of showing these characters.
  • The Terminal was capable of showing these characters.
  • Python 3 was capable of displaying these characters in a curses window.

The last one made me curious. The Python 3 docs say:

Note: Since version 5.4, the ncurses library decides how to interpret non-ASCII data using the nl_langinfo function. That means that you have to call locale.setlocale() in the application and encode Unicode strings using one of the system’s available encodings. This example uses the system’s default encoding:

import locale
locale.setlocale(locale.LC_ALL, '')
code = locale.getpreferredencoding()

Then use code as the encoding for str.encode() calls.

And sure enough! When I call setlocale(locale.LC_ALL, ''), the Python sample displays the box drawing characters; without that call, it doesn’t. There was no such setting in Ruby, though, and no amount of tinkering with the LC_ALL environment variable and file encodings helped.

That’s when I tried a quick sample in plain C.

  • Plain C was not capable of displaying these characters.

Wait, what? Python is, C isn’t? (Not even with setlocale called from C.)

So I dug into the Python code. The Python implementation of addstr, the curses function that will eventually print a string on screen, reveals that for some environments, mvwaddwstr is used. That’s part of ncursesw.

Once I installed ncursesw (sudo apt install libncursesw5-dev), compiled the C code with the -lncursesw option, and called mvwprintw, sure enough, it output the characters just fine. (The “w” in mvwprintw stands for “window”; the function exists in plain ncurses, too. What made the difference was linking against ncursesw, the wide-character build.)

The string contents I was passing in were rendered correctly by the ncursesw library, but not by the curses or ncurses libraries.

There’s an ncursesw Ruby gem, too, and it works just as well once you change the code to use that gem’s API.

Well, the ruby/curses gem says it in the README, too, once I looked a second time:

Requires ncurses or ncursesw (with wide character support).

Wide character support is what I was looking for all the time. I just didn’t pay attention to this stuff after I settled for the ruby/curses gem because its API was so nice. Sheesh!

Adjusting the game to ncursesw

The ruby/curses gem actually supports ncursesw. It just loads the older libraries first, if they are available; since 2016, ncursesw has come third in that lookup order. Switching the order of the #if defined preprocessor checks to load ncursesw first instantly made the nice ruby/curses gem’s API do the job just as well as the ncursesw gem I mentioned above. No need to adjust even a single line of code!

Naturally, I created a pull request to incorporate the changes after local testing.

It even works on Linux! Sort of. My Linux terminal displays the box drawing characters as wide Unicode characters (which was my problem in the first place), so each box drawing character takes up the same screen space as 2 regular characters.

All the space characters are only half as wide as the boxes, the house walls, and the smiling faces.

Up next, I’ll have to figure out if I can enforce double character width on macOS (why doesn’t macOS have to use wide-character support at all?) and adjust everything to these new width constraints. Or the other way around, get Linux terminals to display more narrow characters instead.

Or maybe I’ll switch to either a graphics-based renderer or PDCurses, which can use SDL to draw characters, it seems. We’ll see about that.

How to Fix fileReferenceURL() to Work with NSURL in Swift 3 and Swift 4

I am upgrading the code base of the Word Counter to Swift 3. Yeah, you read that right. I hadn’t touched the Swift code base in almost 2 years. Horrible, I know – and I’m punished for deferring this so long in every module I try to convert and build.

One very interesting problem was runtime crashes in a submodule I build where URLs were nil all of a sudden. This code from 2015 (!!) used to work:

public struct LocalURL {
    public let URL: NSURL

    public init(URL: NSURL) {
        assert(URL.fileURL)

        if URL.isFileReferenceURL() {
            self.URL = URL
        } else {
            self.URL = URL.fileReferenceURL()!
        }
    }
    // ...
}

Yeah, some unicorns are dying from my code thanks to force unwrapping. This is not the only place I did that. I was a terrible Swift citizen in 2015, it turns out.

Apart from my low Swift !-standards, this used to work. Now it doesn’t. Even for an existing file path, URL.fileReferenceURL() just returns the same URL you put in. That’s clearly not what I was aiming for: fileReferenceURL() is supposed to convert an existing file URL to a path-independent pointer to a file, aka a “file reference”. These look something like file:///.file/id=6571367.437879/ instead of file:///tmp/test.txt.

After casting from URL to NSURL and back for a while, I discovered Swift Bug SR-2728: apparently this is a known problem since Swift 3. Seems to be related to the bridging between NSURL, which is an NSObject subclass, and URL, which is a Swift struct.

An annoyingly verbose workaround by Charles Srstka for Swift 3.1 is to perform all the work in the Objective-C runtime:

if let refURL = (url as NSURL).perform(#selector(NSURL.fileReferenceURL))?.takeUnretainedValue() as? NSURL {
    print(refURL) // will print something along the lines of 'file:///.file/id=01234546.789012345'
}

That does indeed work! So if you have to work on a Swift 3.1 codebase and encounter this, there you go.

Swift 4.1 has a simpler mechanism to ensure you get an NSURL instead of a URL – NSURL still being the only type that supports file reference URLs as of September 2018.

if let fileRefURL = (url as NSURL).fileReferenceURL() as NSURL? { 
    print(fileRefURL)
}

Try that in the Swift 4.1 REPL to see that it works. Whew.

I do understand that there may be reasons to remove fileReferenceURL from URL and leave it on NSURL, but when you do invoke it on NSURL, I think it should at least return another NSURL object that works as expected instead of bridging to Swift’s URL struct that, for some reason, won’t work.

Interestingly enough, if you do know the file reference, e.g. file:///.file/id=6571367.437879/, you can work with Swift’s plain URL just fine:

if let url = URL(string: "file:///.file/id=6571367.437879/") {
    print(url.path)
}
// Output:
//   /private/etc/tmp/test.txt

So when Swift’s Foundation URL type supports getting a path from a file reference URL, why does the fileReferenceURL() stuff not work?

Beats me!

If you happen to know more, I’d be happy to learn about the secret in the comments.

React to Programmatic Changes to NSControl.state in RxCocoa

Say you have a collection of radio buttons. They’re NSButton instances, and NSButton inherits from NSControl. Radio buttons’ mutual exclusivity is implemented by …

  1. Grouping radio buttons by their target and action properties, even if the action doesn’t do anything;
  2. Allowing only one control’s state property to be NSControl.StateValue.on, thus switching all others in the group to .off for you.

With the correct setup, you can set 1 out of 100 radio buttons to .on and have the previous selection turned off for you automatically. That’s neat.

You cannot rely on programmatic changes to the state property, though, when you work with RxSwift’s RxCocoa wrapper for NSControl. Programmatic changes do not trigger AppKit’s target/action mechanism, so the callbacks are not invoked. Just as anywhere else in RxCocoa land, when you perform programmatic changes, your Observable will not receive an event. Depending on the mechanism that’s used to provide the reactive extension, you can trigger state updates using Key-Value Observing or notifications instead. That won’t work for the target/action mechanism used for NSControl.rx.state, though, unless you invoke the selector:

_ = theRadioButton.target?.perform(theRadioButton.action)

That’s pretty ugly and force-unwraps the action for you, potentially causing runtime exceptions, unless you unwrap things safely yourself:

if let action = theRadioButton.action, let target = theRadioButton.target {
    _ = target.perform(action)
}

Meh. Just to trigger side effects that depend on the internal implementation of RxCocoa’s NSControl reactive extension, which is prone to change over time without you noticing. Not good.

If you’re curious how the .rx.state property is implemented, have a look at the current NSButton+Rx.swift code exposing state: it uses the controlProperty factory method from NSControl+Rx.swift. Upon close inspection, you’ll see that a ControlTarget is responsible for generating the events there. The implementation of ControlTarget, finally, shows that it changes the target/action of the observed control to itself and its eventHandler method, which invokes the callback that NSControl+Rx set up as the event forwarder. (By the way, I’ll have to test whether this breaks multiple radio button groups, because they all share the same action afterwards.)

I don’t expect this to stay the same forever. RxSwift and RxCocoa have a history of huge leaps forward with major version changes, introducing new wrapper mechanisms for the delegate pattern, for example. That’s why I won’t bet on these internals.

So what I do instead: provide a wrapper observable stream!

import Cocoa
import RxSwift
import RxCocoa

class ViewController: NSViewController {
    @IBOutlet var radioButtonA: NSButton!
    let radioButtonAStateChange = PublishRelay<NSControl.StateValue>()
    @IBOutlet var radioButtonB: NSButton!
    let radioButtonBStateChange = PublishRelay<NSControl.StateValue>()
    @IBOutlet var radioButtonC: NSButton!
    let radioButtonCStateChange = PublishRelay<NSControl.StateValue>()

    private let disposeBag = DisposeBag()

    override func awakeFromNib() {
        super.awakeFromNib()
        wireRadioStates()
    }

    /// Forward user-initiated state changes into the relays.
    private func wireRadioStates() {
        radioButtonA.rx.state.bind(to: radioButtonAStateChange).disposed(by: disposeBag)
        radioButtonB.rx.state.bind(to: radioButtonBStateChange).disposed(by: disposeBag)
        radioButtonC.rx.state.bind(to: radioButtonCStateChange).disposed(by: disposeBag)
    }

    // MARK: - Incoming events

    func updateControlsProgrammatically(whichRadioButton: Int) {
        if whichRadioButton == 1 {
            // ...
        } else if whichRadioButton == 2 {
            // Setting .state doesn't fire target/action, so push the
            // resulting states into the relays manually.
            radioButtonB.state = .on
            radioButtonAStateChange.accept(.off)
            radioButtonBStateChange.accept(.on)
            radioButtonCStateChange.accept(.off)
        }
        // ...
    }
}

Then you bind your event handlers not to radioButtonB.rx.state directly, but to the relay that can also be triggered programmatically.

“But doesn’t this imply I have to keep a ton of relays around when I have a lot of radio buttons?” – Yes, sure it does!

Depending on your use of radio buttons, you may be lucky: maybe you can put knowledge about which radio button is active in a single observable stream. Set each radio button’s tag property to a number and lump the individual button state changes together into a single PublishRelay<Int> that tells you which button is active based on its tag value.

    let activeRadioTag = PublishRelay<Int>()

    func updateControlsProgrammatically(whichRadioButton: Int) {
        if whichRadioButton == 1 {
            // ...
        } else if whichRadioButton == 2 {
            radioButtonB.state = .on
            activeRadioTag.accept(radioButtonB.tag)
        }
        // ...
    }

If you then want to react to “button C is selected” events, you’ll end up with something like activeRadioTag.filter { $0 == radioButtonC.tag }, which isn’t too bad. At least you don’t have to copy and paste during programmatic setting of the state.
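The routing idea itself doesn’t depend on RxSwift. Here is a minimal plain-Swift sketch of it: ActiveTagRelay and subscribe(tag:_:) are hypothetical stand-ins for PublishRelay<Int> and the filter above, not RxCocoa API. Both user clicks and programmatic changes go through accept(_:), so subscribers hear about either.

```swift
// Hypothetical, dependency-free stand-in for PublishRelay<Int>.
final class ActiveTagRelay {
    private var observers: [(Int) -> Void] = []

    func subscribe(_ observer: @escaping (Int) -> Void) {
        observers.append(observer)
    }

    /// Called for user actions and programmatic changes alike.
    func accept(_ tag: Int) {
        observers.forEach { $0(tag) }
    }
}

extension ActiveTagRelay {
    /// Rough equivalent of activeRadioTag.filter { $0 == expected }.
    func subscribe(tag expected: Int, _ handler: @escaping () -> Void) {
        subscribe { tag in
            if tag == expected { handler() }
        }
    }
}

let activeRadioTag = ActiveTagRelay()
var selectionsOfC = 0
activeRadioTag.subscribe(tag: 3) { selectionsOfC += 1 }

activeRadioTag.accept(2) // e.g. code switched to button B
activeRadioTag.accept(3) // e.g. the user clicked button C
print(selectionsOfC) // 1
```

In the actual app you would keep PublishRelay; the point is only that a single entry point for selection changes removes the copy-and-paste when setting state programmatically.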

So this was another day of writing “”“interesting”“” RxSwift/RxCocoa event handlers that also work when you set up your views programmatically with initial display values.

Remove Trailing Whitespace in TextMate 2 Code Files

I still use TextMate for some things: editing documents quickly, scripting in Ruby, navigating project folders of foreign code bases (especially when they’re not written in my main language, so Xcode isn’t much help, e.g. Java projects), and finding and replacing text.

But it always bugged me that when I move code around, indent and outdent and whatnot, lines with nothing but whitespace would sometimes be saved. Or I’d combine stuff and have 10 trailing spaces all of a sudden. I do show invisible characters, but I don’t want to pay attention to that kind of stuff when I’m coding.

Xcode can be set up to remove trailing whitespace while you edit. I want that.

Turns out TextMate has a “Text” bundle with the “Remove Trailing Spaces in Document / Selection” command. You can launch it from the Bundles menu, but then you still have to do it manually.
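For illustration, here is roughly what such a command does to a document, sketched in Swift. This is not TextMate’s implementation; removingTrailingWhitespace(_:) is a hypothetical helper, and it assumes "\n" line endings.

```swift
/// Strips spaces and tabs from the end of every line.
/// Sketch only; assumes "\n" line endings.
func removingTrailingWhitespace(_ text: String) -> String {
    return text
        .split(separator: "\n", omittingEmptySubsequences: false)
        .map { line -> Substring in
            var trimmed = line
            while let last = trimmed.last, last == " " || last == "\t" {
                trimmed.removeLast()
            }
            return trimmed
        }
        .joined(separator: "\n")
}

print(removingTrailingWhitespace("let a = 1   \n\t\nfoo"))
```

Note that this also eats the two trailing spaces Markdown uses for a hard line break, which is exactly why you may want to limit the command’s scope.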

Turns out TextMate also has callbacks! You can hook any command to callback.document.will-save and it’ll be executed before saving the file.

To set this up:

  • Open the bundle editor (from the main menu, select “Bundles > Edit bundles …”, or hit ⌃⌥⌘B)
  • Select “Text” from the leftmost pane (that’s the pane listing all installed bundles)
  • Select “Menu Actions” from the 2nd pane
  • Select “Converting / Stripping” submenu from the 3rd pane
  • Select “Remove Trailing Spaces in Document / Selection” from the 4th pane
  • In the item drawer to the right of the bundle editor, you’ll see a swath of settings; in there …
  • set the Semantic Class attribute to callback.document.will-save, and then
  • set the Scope Selector attribute to source.

I included the last setting because I do not want to trim trailing whitespace from Markdown documents: sometimes, empty lines with indentation do have meaning. And every time, 2 trailing spaces signify a line break. I don’t want to lose these. You can leave the limitation out if you want.

Here’s a depiction of the settings:

TextMate 2 Bundle Editor settings to trim whitespace in code

For the curious: text instead of source would apply the command to non-source code files like plain text or Markdown or Pandoc – or HTML. You can combine selectors to apply to specific types, like source, text.html. Be aware that Markdown documents report their base scope as text.html.markdown, though, so you’d end up removing trailing whitespace from Markdown again. Instead, you might want to use text.html.basic if you use the plain HTML language from the bundle, or text.html.erb if you use the “HTML (Ruby - ERB)” language setting. You can put as many language scopes in the list as you like, as far as I know, so source.ruby, text.html.basic, source.swift would work, too. You can go crazy and restrict this down to the scope of individual blocks, like meta.tag.inline.span.start.html to only remove trailing whitespace inside the <span> tag itself, before the closing >.

If you don’t know which scope the language you’re using reports to the bundle engine, invoke the “Show Scope” command from the command palette (“Bundles > Select Bundle Item …”, or ⌃⌘T).

Works beautifully and reduced my git diff noise a ton already. Have fun hacking away!

ReSwift Custom Diffs and Enqueued State Updates

Vinh Nguyen found that his ReSwift state updates became slow for two reasons:

  1. There were too many subscribers.
  2. Objects would react to state updates by dispatching a new action immediately. (ReSwift action dispatching happens synchronously.)

His app state ends up containing a lot of objects in a 3-level hierarchy that mimics the hierarchy of view components on screen; it seems to be a drawing or otherwise canvas-based graphics app. It doesn’t make sense to have each object on the canvas respond to state updates when one other object updates on screen. Instead, you’ll want to at least minimize the amount of updates that get passed through.

Vinh implemented a custom diff or “delta update” for the 2nd level in his 3-level hierarchy of objects because they were few enough to be performant during state updates, and could easily manage their child objects.
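The idea of a delta update can be sketched in plain Swift. Vinh’s actual code isn’t shown here, so ObjectState, Delta, and delta(from:to:) are hypothetical: a 2nd-level node compares its children’s old and new states by id and only forwards what changed.

```swift
/// Hypothetical state of one object on the canvas.
struct ObjectState: Equatable {
    let id: Int
    var position: Int
}

/// What changed between two snapshots of a node's children.
struct Delta: Equatable {
    var inserted: [Int] = []
    var removed: [Int] = []
    var updated: [Int] = []
}

func delta(from before: [ObjectState], to after: [ObjectState]) -> Delta {
    let beforeByID = Dictionary(uniqueKeysWithValues: before.map { ($0.id, $0) })
    let afterIDs = Set(after.map { $0.id })
    var result = Delta()

    for object in after {
        if let previous = beforeByID[object.id] {
            // Only objects whose state actually differs need an update.
            if previous != object { result.updated.append(object.id) }
        } else {
            result.inserted.append(object.id)
        }
    }
    result.removed = before.map { $0.id }.filter { !afterIDs.contains($0) }
    return result
}

let before = [ObjectState(id: 1, position: 0), ObjectState(id: 2, position: 5)]
let after = [ObjectState(id: 2, position: 6), ObjectState(id: 3, position: 1)]
let change = delta(from: before, to: after)
print(change.updated) // [2]
```

The parent then notifies only the child views listed in the delta instead of having every object subscribe to the store.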

Read about his discovery of state update bottlenecks on his blog.

He solved the second problem, newState callbacks triggering the dispatch of another action, by enqueuing the dispatch in an asynchronous block on the main queue, which is the queue ReSwift uses:

class ObjectView {
    func newState(state: ObjectState) {
        // ...
        if conditionThatTriggersAnAction == true {
            DispatchQueue.main.async {
                store.dispatch(Action())
            }
        }
    }
}

Sure, this defers the action dispatch until the current execution is finished. But you have to take care of other actions being dispatched in between now and then, and check whether that is a problem (e.g. another subscriber type reacting to the same state update with another action).

I had preferred another solution initially: subscribe to updates in the top-level Canvas object, then delegate down the view hierarchy as needed. Every sub-component that wants to fire an action tells the Canvas about it; the Canvas enqueues the actions and processes the queue after all sub-component updates are finished. A bit like in game development, where the game loop ensures there is just 1 point of action handling per run. But then again, Vinh’s approach does exactly that: it defers action dispatching until later, ensuring the current run loop iteration isn’t interrupted. Also, my approach to delegation would make everything just so much more complicated in the app code.
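The “collect, then flush once” alternative can be sketched without ReSwift. ActionQueue is a hypothetical helper, not ReSwift API; in an app, the dispatch closure would forward to store.dispatch.

```swift
// Hypothetical helper: sub-components enqueue actions while a state
// update is being delivered; the Canvas flushes once all have seen it.
final class ActionQueue<Action> {
    private var pending: [Action] = []
    private let dispatch: (Action) -> Void

    init(dispatch: @escaping (Action) -> Void) {
        self.dispatch = dispatch
    }

    func enqueue(_ action: Action) {
        pending.append(action)
    }

    /// Call after every sub-component has processed the current state.
    /// Actions enqueued during the flush wait for the next one.
    func flush() {
        let actions = pending
        pending.removeAll()
        actions.forEach(dispatch)
    }
}

var dispatched: [String] = []
let queue = ActionQueue<String>(dispatch: { dispatched.append($0) })

// During the update pass, children only enqueue:
queue.enqueue("select object 1")
queue.enqueue("redraw")
print(dispatched.count) // 0

// After the pass, the Canvas dispatches everything in order:
queue.flush()
print(dispatched) // ["select object 1", "redraw"]
```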

I wonder if it’d be beneficial if the ReSwift store operated on a high-priority queue other than the main queue. Then you could dispatch actions from view components on the main queue either synchronously, waiting for the result, or asynchronously.

I will have to think more about the consequences of an approach like this before I suggest anything to anybody, though. I don’t do a lot of concurrent programming in my apps, and when I do, I contain it very strictly; on the downside, I haven’t developed any instinct regarding the implications of using multiple queues.

Synchronize Scrolling of Two (or More) NSScrollViews

You can make two NSScrollViews scroll in concert quite easily because every scrolled pixel is broadcast to interested parties.

Rows in TableFlip

In TableFlip, the main table is an NSTableView contained in an NSScrollView. You can show and hide row numbers in TableFlip, but I didn’t want to reload the whole table and mess with the table model to insert and remove the first column. Instead, I use a second table view with a single column. The upside of this approach: I can easily animate hiding the whole scroll view with the row numbers inside without affecting the main table.

Synchronizing two or more scroll views is pretty simple: upon scrolling, the NSScrollView’s NSClipView can post a NSView.boundsDidChangeNotification. Simply subscribe to that.

Note that you need to enable posting the notification first: set NSView.postsBoundsChangedNotifications = true for the NSClipView that you want to observe.

I put the logic for this into a NSScrollView subclass with an @IBOutlet to the scroll view that the current one should be synced to. This way, I can wire them in Interface Builder and don’t have to write code for that.

class SynchronizedScrollView: NSScrollView {

    @IBOutlet weak var sourceScrollView: NSScrollView!
    lazy var notificationCenter: NotificationCenter = NotificationCenter.default

    deinit {
        notificationCenter.removeObserver(self)
    }

    override func awakeFromNib() {

        super.awakeFromNib()

        let scrollingView = sourceScrollView.contentView
        scrollingView.postsBoundsChangedNotifications = true

        notificationCenter.addObserver(self, 
            selector: #selector(scrollViewContentBoundsDidChange(_:)), 
            name: NSView.boundsDidChangeNotification, 
            object: scrollingView)
    }

    @objc func scrollViewContentBoundsDidChange(_ notification: Notification) {

        guard let scrolledView = notification.object as? NSClipView else { return }

        let viewToScroll = self.contentView
        let currentOffset = viewToScroll.bounds.origin        
        var newOffset = currentOffset
        newOffset.y = scrolledView.documentVisibleRect.origin.y

        guard newOffset != currentOffset else { return }

        viewToScroll.scroll(to: newOffset)
        self.reflectScrolledClipView(viewToScroll)
    }
}

NSTextField usesSingleLineMode Stops Working When You Implement NSTextViewDelegate Methods

Today I learned why my NSTextField permits pasting of newline characters even though I set usesSingleLineMode properly. It’s because I made it conform to NSTextViewDelegate to cache changes.

When you edit text inside of an NSTextField, you actually type inside a field editor of the window. That’s a shared NSTextView instance. Most of the hard work of an NSTextField is done by its cell, which is an NSTextFieldCell. The cell implements at least the delegate method NSTextViewDelegate.textView(_:shouldChangeTextIn:replacementString:) – and when you set usesSingleLineMode, this is actually set for the cell, not the view itself. You can use textView(_:shouldChangeTextIn:replacementString:) to sanitize input text, and I suspect that’s where the usesSingleLineMode implementation happens. If your NSTextField subclass implements this method, the cell’s implementation isn’t called. And since that one isn’t public (it was called “implicit protocol conformance” back in the day), you cannot call up to it from Swift: as far as the compiler knows, it isn’t there.

NSTextFields register as the delegate of their field editor and seemingly forward some delegate calls to their cells. That’s good to know and can be exploited for all kinds of things – you don’t have to mess around with the field editor’s delegate on your own at all. You always know it’s the text field being edited.

Since I cannot delegate back to NSTextFieldCell and just decorate what the framework’s doing anyway, I have to find a different solution.

I was using the delegate method to record the text before and after the change so I could cache both and compute a diff later. Since there is no “will change” notification for NSText, NSTextView, NSTextField, or NSControl, that sounded like a good idea. But without the ability to merely decorate the default behavior, I’m looking for alternatives. Here’s what I think one could do:

  • Recreate the usesSingleLineMode functionality myself. While that’s doable, who knows what else happens there!
  • Leverage implicit protocol conformance from Objective-C. That introduces Objective-C to the library.
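For the first option, the “doable” part might look like this sketch. singleLineReplacement(for:) is a hypothetical helper; what AppKit’s single-line mode does internally isn’t public, so replacing line breaks with spaces is only an assumption about the desired behavior.

```swift
/// Turns every line break in the replacement text into a single space.
/// "\r\n" is a single Character in Swift, so it becomes one space, too.
func singleLineReplacement(for replacement: String) -> String {
    return String(replacement.map { $0.isNewline ? " " : $0 })
}

print(singleLineReplacement(for: "foo\nbar\r\nbaz")) // foo bar baz
```

Inside the delegate method you would then presumably have to insert the sanitized text yourself and return false, which is where the unknowns start piling up.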

The Objective-C adapter I wrote first checks whether the cell responds to the method and looks like this:

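// DelegatableTextField.h
#import <Cocoa/Cocoa.h>
NS_ASSUME_NONNULL_BEGIN
@interface DelegatableTextField : NSTextField
- (BOOL)del_textView:(NSTextView *)textView shouldChangeTextInRange:(NSRange)affectedCharRange replacementString:(nullable NSString *)replacementString;
@end
NS_ASSUME_NONNULL_END

// DelegatableTextField.m
#import "DelegatableTextField.h"

@implementation DelegatableTextField

- (BOOL)del_textView:(NSTextView *)textView shouldChangeTextInRange:(NSRange)affectedCharRange replacementString:(nullable NSString *)replacementString {
    // The cell implements the method without declaring NSTextViewDelegate conformance
    if (![self.cell respondsToSelector:@selector(textView:shouldChangeTextInRange:replacementString:)]) {
        NSAssert(NO, @"NSTextField's cell should respond to NSTextViewDelegate methods");
        return YES;
    }
    return [((id<NSTextViewDelegate>)self.cell) textView:textView
                                 shouldChangeTextInRange:affectedCharRange
                                       replacementString:replacementString];
}

@end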

To make this available in a Swift framework target, you sadly need to include the header file in the framework’s public umbrella header, because framework targets have no project-internal bridging header. That exposes the del_-prefixed method publicly, but I can live with that. That’s how you handled implicit protocol conformance back in the day: check with respondsToSelector:, then send the message.

It’s used from the Swift class as you’d expect:

func textView(_ textView: NSTextView, shouldChangeTextIn affectedCharRange: NSRange, replacementString: String?) -> Bool {

    // `replacementString` is `nil` for attribute changes; there's no text change to cache
    guard let replacementString = replacementString else {
        return super.del_textView(textView, shouldChangeTextIn: affectedCharRange, replacementString: nil)
    }

    // Record the text before the change so both versions can be diffed later
    let oldText = textView.string
    cacheTextChange(original: oldText,
                    replacement: replacementString,
                    affectedRange: affectedCharRange)

    // Forward to the cell's implementation so usesSingleLineMode keeps working
    return super.del_textView(textView, shouldChangeTextIn: affectedCharRange, replacementString: replacementString)
}
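The post doesn’t show cacheTextChange(original:replacement:affectedRange:). Here’s a minimal sketch of what such a helper could store, assuming a simple in-memory array; TextChange and cachedChanges are hypothetical names. The one detail worth getting right is that NSRange counts UTF-16 code units, so the sketch applies the range through NSString instead of Swift’s Character-based String indices.

```swift
import Foundation

// Hypothetical record of one edit; the real helper in the app isn't shown.
struct TextChange {
    let original: String
    let replacement: String
    let affectedRange: NSRange

    // The text after the edit. NSRange is measured in UTF-16 units, like
    // NSTextView's ranges, so apply it through NSString.
    var result: String {
        return NSString(string: original)
            .replacingCharacters(in: affectedRange, with: replacement)
    }
}

// Hypothetical in-memory cache; before and after can be diffed later.
var cachedChanges: [TextChange] = []

func cacheTextChange(original: String, replacement: String, affectedRange: NSRange) {
    cachedChanges.append(TextChange(original: original,
                                    replacement: replacement,
                                    affectedRange: affectedRange))
}

cacheTextChange(original: "Hello world",
                replacement: "there",
                affectedRange: NSRange(location: 6, length: 5))
print(cachedChanges[0].result) // Hello there
```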

Works. I’m happy. Still, it’s an ugly solution.

“But couldn’t you do the same from your Swift delegate method?” Sadly, no. In Swift’s type system, you cannot cast the NSCell to NSTextViewDelegate, because the cell never declares that conformance. In Objective-C, protocol conformance casts won’t fail; only message sending will.
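Here’s a minimal, AppKit-free sketch of that difference in plain Swift. ChangeValidating and QuietCell are hypothetical stand-ins for NSTextViewDelegate and the cell; the point is only that a Swift conditional cast checks declared conformance, not which methods happen to exist.

```swift
// Hypothetical protocol standing in for NSTextViewDelegate.
protocol ChangeValidating {
    func shouldChange(replacement: String) -> Bool
}

// Implements the method but never declares conformance,
// like the cell's undeclared delegate method.
final class QuietCell {
    func shouldChange(replacement: String) -> Bool {
        return !replacement.contains("\n")
    }
}

let cell: Any = QuietCell()

// The cast fails although the method exists: Swift only looks at
// declared conformance.
let delegate = cell as? ChangeValidating
print(delegate == nil) // true
```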

So this is how I do it. You may do it in a similar way. Please tell me if you find a better solution. Or even better, create a pull request with a fix!

What was actually going on in usesSingleLineMode, after all?

While I’m at it, let’s log what’s happening. To my surprise, the sanitization is more like a post-hoc fix:

NSTextDidChange notification: "a\nb"
NSTextDidChange notification: "a b"
NSTextField change: "a b"

So when I paste text containing newline characters, it is inserted verbatim first and only then replaced. Notice how the NSTextField delegate won’t know about the initial paste.

The stack trace tells a more complete story when I break in the NSTextDidChange notification handler:

#0  0x0000000100004007 in closure #1 in AppDelegate.applicationDidFinishLaunching(_:)
#1  0x0000000100004372 in thunk for @escaping @callee_guaranteed (@in Notification) -> () ()
#2  0x00007fff545dc640 in -[__NSObserver _doit:] ()
#3  0x00007fff524b461c in __CFNOTIFICATIONCENTER_IS_CALLING_OUT_TO_AN_OBSERVER__ ()
#4  0x00007fff524b44ea in _CFXRegistrationPost ()
#5  0x00007fff524b4221 in ___CFXNotificationPost_block_invoke ()
#6  0x00007fff52472d72 in -[_CFXNotificationRegistrar find:object:observer:enumerator:] ()
#7  0x00007fff52471e03 in _CFXNotificationPost ()
#8  0x00007fff5459b8c7 in -[NSNotificationCenter postNotificationName:object:userInfo:] ()
#9  0x00007fff4fbaf761 in -[NSTextView(NSSharing) didChangeText] ()
#10 0x00007fff4fbb00e6 in -[NSCell textDidChange:] ()
#11 0x00007fff4fbafe64 in -[NSTextField textDidChange:] ()
#12 0x00007fff524b461c in __CFNOTIFICATIONCENTER_IS_CALLING_OUT_TO_AN_OBSERVER__ ()
#13 0x00007fff524b44ea in _CFXRegistrationPost ()
#14 0x00007fff524b4221 in ___CFXNotificationPost_block_invoke ()
#15 0x00007fff52472d72 in -[_CFXNotificationRegistrar find:object:observer:enumerator:] ()
#16 0x00007fff52471e03 in _CFXNotificationPost ()
#17 0x00007fff5459b8c7 in -[NSNotificationCenter postNotificationName:object:userInfo:] ()
#18 0x00007fff4fbaf761 in -[NSTextView(NSSharing) didChangeText] ()

Frame #18 is what happens when you paste. Frames #17 through #12 dispatch the notification; #11 through #9 show that the sanitization produces another round of changes, which eventually reaches my subscriber at #0.

That’s because NSTextView.didChangeText triggers the notification, NSTextField responds to its field editor’s textDidChange, hands this down to the NSTextCell, then that cell changes the text in the field editor after the fact again, and that triggers a new notification. That was unexpected.
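To make the sequence easier to follow, here’s a heavily simplified model in plain Swift. FieldEditorModel and its changeObservers are hypothetical stand-ins for the field editor and its NSTextDidChange observers, not AppKit’s actual implementation; the model only reproduces the two-rounds-of-changes pattern from the log above.

```swift
import Foundation

// Hypothetical model of the field editor; not AppKit code.
final class FieldEditorModel {
    private(set) var string = ""
    // Stand-ins for NSTextDidChange observers, called in registration order.
    var changeObservers: [(FieldEditorModel) -> Void] = []

    func replaceAll(with newString: String) {
        string = newString
        // Plays the role of didChangeText(): notify every observer.
        for observer in changeObservers {
            observer(self)
        }
    }
}

var changeLog: [String] = []
let editor = FieldEditorModel()

// My subscriber, which only logs.
editor.changeObservers.append { changeLog.append($0.string) }

// The cell's after-the-fact fix-up: if the change left newlines behind,
// replace them, which triggers a second round of notifications.
editor.changeObservers.append { fieldEditor in
    guard fieldEditor.string.contains("\n") else { return }
    fieldEditor.replaceAll(with: fieldEditor.string.replacingOccurrences(of: "\n", with: " "))
}

editor.replaceAll(with: "a\nb") // simulates the paste
print(changeLog) // ["a\nb", "a b"]
```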

You cannot fake that behavior from within the delegate method:

// Warning, does not work:
func textView(_ textView: NSTextView, shouldChangeTextIn affectedCharRange: NSRange, replacementString: String?) -> Bool {

    if let replacementString = replacementString,
       replacementString.contains("\n") {
        textView.insertText(replacementString.replacingOccurrences(of: "\n", with: " "), replacementRange: affectedCharRange)
    }

    return true
}

The actual replacement happens from within the original NSCell.textDidChange, not the delegate method, and I have no clue why it doesn’t happen when my own delegate method implementation doesn’t forward the call to the cell. Maybe it’s a private state toggle that’s set in the delegate method when you paste \n and then processed later in textDidChange. In any case, NSTextDidChange is posted by the field editor as usual; only the fix-up won’t happen if you implement textView(_:shouldChangeTextIn:replacementString:) yourself.

Better Form Model Validation

Earlier this month, I wrote about validating temporary models for forms. The validation returned .complete or .incomplete, which doesn’t help much when you want to show what went wrong.

So I came up with a richer validation syntax.

You can see the code as a whole without my comments as a Gist.

Example Model, its Partial, and Validation

If this is our model:

struct User {
    let firstName: String
    let lastName: String
    let age: Int?
}

extension User: PartialInitializable {
    init(from partial: Partial<User>) throws {
        self.firstName = try partial.value(for: \.firstName)
        self.lastName = try partial.value(for: \.lastName)
        self.age = partial.value(for: \.age)
    }
}

Then I want validations to look like this:

// Specify a non-empty list of validation requirements
let validation = Partial<User>.Validation(
    .required(\User.firstName),
    .valueValidation(keyPath: \User.firstName, { !$0.isEmpty }),
    .required(\User.lastName),
    .valueValidation(keyPath: \User.lastName, { $0.count > 5 }),
    // `age` is an `Int?`, so the closure receives an optional
    .valueValidation(keyPath: \User.age, { ($0 ?? 0) >= 18 }))

Given this validation and a “partial” which contains data provided by the user, executing and evaluating the validation will look like this:

var partial = Partial<User>()
partial.update(\.firstName, to: "foo")
partial.update(\.lastName, to: "bar")
partial.update(\.age, to: 12) // <- too young!

switch validation.validate(partial) {
case .valid(let user): 
    print("Is valid: \(user)")

case .invalid(let reasons):
    for reason in reasons {
        switch reason {
        case .missing(let keyPath):
            if keyPath == \User.firstName { print("Missing first name") }
            if keyPath == \User.lastName  { print("Missing last name") }
        case .invalidValue(let keyPath):
            if keyPath == \User.firstName { print("Invalid first name value") }
            if keyPath == \User.lastName  { print("Invalid last name value") }
            if keyPath == \User.age       { print("User is too young") }
        }
    }
}

Thanks to nested generic types inside generic types, this is pretty easy to accomplish!

The Code, Explained in Much Detail

protocol PartialInitializable {
    init(from partial: Partial<Self>) throws
}

This is new: it’s a type constraint so that a successful Partial<T>.Validation can create an instance right away.

Except for the T: PartialInitializable constraint, this is pretty much Ian’s original Partial<T>:

struct Partial<T> where T: PartialInitializable {
    enum Error: Swift.Error {
        case valueNotFound
    }

    private var data: [PartialKeyPath<T>: Any] = [:]

    mutating func update<U>(_ keyPath: KeyPath<T, U>, to newValue: U?) {
        data[keyPath] = newValue
    }

    func value<U>(for keyPath: KeyPath<T, U>) throws -> U {
        guard let value = data[keyPath] as? U else { throw Error.valueNotFound }
        return value
    }

    func value<U>(for keyPath: KeyPath<T, U?>) -> U? {
        return data[keyPath] as? U
    }
}

Here comes the validation extension:

extension Partial {
    struct Validation {
        enum Strategy {
            case required(PartialKeyPath<T>)
            case value(AnyValueValidation)

            static func valueValidation<V>(keyPath: KeyPath<T, V>, _ block: @escaping (V) -> Bool) -> Strategy {
                let validation = ValueValidation(keyPath: keyPath, block)
                return .value(AnyValueValidation(validation))
            }

            struct AnyValueValidation {
                let keyPath: PartialKeyPath<T>
                private let _isValid: (Any) -> Bool

                init<V>(_ base: ValueValidation<V>) {
                    keyPath = base.keyPath
                    _isValid = {
                        guard let value = $0 as? V else { return false }
                        return base.isValid(value)
                    }
                }

                func isValid(partial: Partial<T>) -> Bool {
                    guard let value = partial.data[keyPath] else { return false }
                    return _isValid(value)
                }
            }

            struct ValueValidation<V> {
                let keyPath: KeyPath<T, V>
                let isValid: (V) -> Bool

                init(keyPath: KeyPath<T, V>, _ isValid: @escaping (V) -> Bool) {
                    self.keyPath = keyPath
                    self.isValid = isValid
                }
            }

        }

Partial<T>.Validation.Strategy offers two modes to validate key paths at the moment:

  1. required, which only checks for the presence of any value, and
  2. value, which performs a custom check via a closure. Use this to validate that a string is non-empty, for example.

The ValueValidation<V> type accepts a key path of type KeyPath<T, V>. The root type T is already fixed by the enclosing Partial<T>, so only the value type V is left open: the property you point to determines which type ValueValidation works on.

Example: The key path \User.firstName has the type KeyPath<User, String>. If you pass this in, you’ll get a Partial<User>.Validation.Strategy.ValueValidation<String>. The second parameter in its initializer is the actual boolean validity check, and it receives a String. Thanks to the power of generics and nested types within generic types, we end up with a very specialized ValueValidation, so the compiler can help us: we cannot accidentally treat string values as numbers, unlike with a dictionary of type [String : Any], where the cast from Any may fail for all the wrong reasons.
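Here’s a stripped-down sketch of that inference outside of Partial. Validator<Root> is a hypothetical stand-in for Partial<T>, and User is cut down to two properties. Note how an optional property like age makes V an Int?, so the closure has to deal with nil.

```swift
// Cut-down model: just enough properties for the example.
struct User {
    let firstName: String
    let age: Int?
}

// Hypothetical stand-in for Partial<T>: the outer generic type fixes the
// root, so the nested ValueValidation only adds the value type V.
struct Validator<Root> {
    struct ValueValidation<V> {
        let keyPath: KeyPath<Root, V>
        let isValid: (V) -> Bool
    }
}

// V is inferred as String from the key path, so the closure receives a String.
let nameCheck = Validator<User>.ValueValidation(keyPath: \User.firstName, isValid: { !$0.isEmpty })

// For an optional property, V is Int?, and the closure must handle nil.
let ageCheck = Validator<User>.ValueValidation(keyPath: \User.age, isValid: { ($0 ?? 0) >= 18 })

let user = User(firstName: "foo", age: 12)
print(nameCheck.isValid(user[keyPath: nameCheck.keyPath])) // true
print(ageCheck.isValid(user[keyPath: ageCheck.keyPath]))   // false
```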

While creating ValueValidation objects with a key path and a fitting closure is then very straightforward, you cannot lump validations of different value types together. [ValueValidation<Int>(...), ValueValidation<String>(...)] can at best be an Array<Any> (and Swift insists on an explicit type annotation for that), since the specialized types have nothing in common, although you and I know they’re supposed to do a similar thing. That sameness has to be expressed in code. In other words, we have to erase the generic information and thus make all generic specializations the same. There’s no communism(_:) function in the Swift standard library that does this for us. We need to provide another type of our own that type-erases the generic V from ValueValidation<V>. I went with the AnyValueValidation approach that hides the generic constraint from the outside and wraps the isValid check accordingly.
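The same boxing technique, reduced to the bare minimum so it runs on its own. Check<V> and AnyCheck are hypothetical stand-ins for ValueValidation<V> and AnyValueValidation: the generic initializer captures V inside a closure over Any, so every specialization ends up with the same outer type.

```swift
// Hypothetical stand-in for ValueValidation<V>: a typed check.
struct Check<V> {
    let isValid: (V) -> Bool
}

// Hypothetical stand-in for AnyValueValidation: erases V behind (Any) -> Bool.
struct AnyCheck {
    private let _isValid: (Any) -> Bool

    init<V>(_ base: Check<V>) {
        _isValid = { value in
            // A value of the wrong type counts as invalid, like in AnyValueValidation.
            guard let typed = value as? V else { return false }
            return base.isValid(typed)
        }
    }

    func isValid(_ value: Any) -> Bool {
        return _isValid(value)
    }
}

// Different specializations, one element type: no [Any] annotation needed.
let checks: [AnyCheck] = [
    AnyCheck(Check<Int>(isValid: { $0 >= 18 })),
    AnyCheck(Check<String>(isValid: { !$0.isEmpty }))
]

print(checks[0].isValid(21))   // true
print(checks[0].isValid("21")) // false: a String is not an Int
print(checks[1].isValid(""))   // false
```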

That’s why the Strategy case is called value(AnyValueValidation) and not value(ValueValidation) – because the latter won’t compile.

Initializing the type-erased validation sucks big time, though: AnyValueValidation(ValueValidation(keyPath: \User.firstName, { !$0.isEmpty })).

That’s why I added a static factory called valueValidation. When you call it, it looks like an enum case, but really isn’t. That’s what lets you write .valueValidation(keyPath: \User.lastName, { $0.count > 5 }) in the validation example at the top.
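A minimal illustration of the trick, using a hypothetical Rule enum rather than the real Strategy: nonEmpty(_:) is a static function that returns one of the real cases, so it can sit in a list of implicit member expressions right next to actual cases.

```swift
// Hypothetical stand-in for Strategy.
enum Rule {
    case required(String)
    case check(String, (String) -> Bool)

    // Reads like a case at the call site, but it's a static function
    // that returns one of the real cases.
    static func nonEmpty(_ field: String) -> Rule {
        return .check(field, { !$0.isEmpty })
    }
}

// Real case and factory side by side, both written with a leading dot.
let rules: [Rule] = [.required("firstName"), .nonEmpty("firstName")]

func passes(_ rule: Rule, _ value: String?) -> Bool {
    switch rule {
    case .required:
        return value != nil
    case let .check(_, isValid):
        return value.map(isValid) ?? false
    }
}

print(rules.map { passes($0, "") }) // [true, false]
```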

Now that the Strategy type is declared, here’s how it’s used when computing a Partial<T>.Validation.Result:

        let validations: [Strategy]

        // Prevent creating an empty validation collection by making 1 parameter
        // required, and then add a variadic list afterwards.
        init(_ first: Strategy, _ rest: Strategy...) {
            var all = [first]
            all.append(contentsOf: rest)
            self.validations = all
        }

        enum Result {
            case valid(T)
            case invalid([Reason])
            enum Reason {
                case missing(PartialKeyPath<T>)
                case invalidValue(PartialKeyPath<T>)
            }
        }

        func validate(_ partial: Partial<T>) -> Result {
            var failureReasons: [Result.Reason] = []
            for validation in validations {
                switch validation {
                case .required(let keyPath):
                    if !partial.data.keys.contains(keyPath) {
                        failureReasons.append(.missing(keyPath))
                    }

                case .value(let valueValidation):
                    if !valueValidation.isValid(partial: partial) {
                        failureReasons.append(.invalidValue(valueValidation.keyPath))
                    }
                }
            }

            guard failureReasons.isEmpty else { return .invalid(failureReasons) }

            return .valid(try! T.init(from: partial))
        }
    }
}

The validate method runs every strategy in the list and collects all failures instead of stopping at the first one. The failure reasons match the strategy cases: you get missing for required, and invalidValue for value. Both pass the PartialKeyPath<T> along.

It’d be nice to have the fully qualified KeyPath<T, V> here, but again you cannot lump these together. PartialKeyPath<T> is a type-erased variant already – but using a different approach, namely being the parent class instead of boxing the specialized type in.
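To see the parent-class approach in action, here’s a small sketch with a hypothetical two-property User: key paths to properties of different types fit into one [PartialKeyPath<User>] array, reading through them yields Any, and they stay Hashable, which is what lets Partial use them as dictionary keys.

```swift
// Hypothetical model with properties of two different types.
struct User {
    let firstName: String
    let age: Int
}

// KeyPath<User, String> and KeyPath<User, Int> share the superclass
// PartialKeyPath<User>, so one array holds both without any boxing.
let keyPaths: [PartialKeyPath<User>] = [\User.firstName, \User.age]

let user = User(firstName: "foo", age: 12)

// Reading through a PartialKeyPath yields Any: the value type is erased.
let values: [Any] = keyPaths.map { user[keyPath: $0] }

// Key paths are Hashable, so they work as dictionary keys, like in Partial.
var data: [PartialKeyPath<User>: Any] = [:]
for (keyPath, value) in zip(keyPaths, values) {
    data[keyPath] = value
}
print(data.count) // 2
```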

After validation, you get a Result; if nothing failed, you will get a fully initialized object. Hopefully. The initializer could still throw an error for reasons not expressed in the validation constraints, and because of the try!, that would crash the app. I’d argue that whoever throws an error that’s not covered by the validation mechanism didn’t adhere to the contract of PartialInitializable.

Improvements

There’s room for improvement. For example, I think I’ll rename Strategy to Constraint and then provide the constraint’s user-facing message upon initialization. That way you can have multiple value validations for the same property, like “age is above 18” and “age is below 30” for cheap student insurance rates in Germany, each with its own explanation for the validation failure.
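The Constraint idea could look roughly like this. It’s a hypothetical sketch, not code from the Gist: Constraint, Applicant, and failureMessage(in:) are made-up names, and the sketch checks a plain [PartialKeyPath<Root>: Any] dictionary (like Partial’s private data storage) so it runs on its own.

```swift
// Hypothetical Constraint: a value check plus its user-facing message.
struct Constraint<Root> {
    let keyPath: PartialKeyPath<Root>
    let message: String
    private let check: (Any) -> Bool

    init<V>(_ keyPath: KeyPath<Root, V>, _ message: String, _ check: @escaping (V) -> Bool) {
        self.keyPath = keyPath
        self.message = message
        // Erase V like AnyValueValidation does; a value of the wrong type fails.
        self.check = { value in (value as? V).map(check) ?? false }
    }

    /// Returns the message if the constraint fails for the collected data, else nil.
    func failureMessage(in data: [PartialKeyPath<Root>: Any]) -> String? {
        guard let value = data[keyPath], check(value) else { return message }
        return nil
    }
}

struct Applicant {
    let age: Int
}

// Two constraints on the same property, each with its own explanation.
let constraints: [Constraint<Applicant>] = [
    Constraint(\Applicant.age, "Age must be above 18") { $0 > 18 },
    Constraint(\Applicant.age, "Age must be below 30") { $0 < 30 },
]

let collected: [PartialKeyPath<Applicant>: Any] = [\Applicant.age: 35]
let messages = constraints.compactMap { $0.failureMessage(in: collected) }
print(messages) // ["Age must be below 30"]
```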

Also, I don’t like the try! a lot. I still think it’s a programmer error if a validation passes but object initialization isn’t possible, because that’s the whole point of all this. But there could be a better way.
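One possible alternative, sketched under the assumption that you’re fine with a third result case: catch the initializer’s error and return it instead of crashing. ValidationResult, makeResult(failureReasons:initialize:), and MissingValue are hypothetical names, and the reasons are plain strings here instead of the key-path-based Reason enum to keep the sketch short.

```swift
// Hypothetical variant of Partial<T>.Validation.Result with a third case.
enum ValidationResult<T> {
    case valid(T)
    case invalid([String])
    // Validation passed, but the initializer threw anyway.
    case initializationFailed(Error)
}

// Hypothetical replacement for `return .valid(try! T.init(from: partial))`.
func makeResult<T>(failureReasons: [String], initialize: () throws -> T) -> ValidationResult<T> {
    guard failureReasons.isEmpty else { return .invalid(failureReasons) }
    do {
        return .valid(try initialize())
    } catch {
        // Report the broken PartialInitializable contract instead of crashing.
        return .initializationFailed(error)
    }
}

struct MissingValue: Error {}

let ok: ValidationResult<Int> = makeResult(failureReasons: []) { 42 }
let broken: ValidationResult<Int> = makeResult(failureReasons: []) { throw MissingValue() }
```

The trade-off: callers now have to handle a case that, by the contract of PartialInitializable, should never happen.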

Maybe there’s room for more Constraints? The value validation is very powerful already, but maybe it’s too generic and someone could use more specialized cases instead.

Again, if you want to have a look at the whole code, it’s available as a Gist on GitHub. Feedback is very welcome!

