Working with Process Pipe and Its 64KiB Limit

Helge Heß pointed out that naive usage of Pipe in child Processes can break your program if you pipe too much data.

I wasn’t aware of this, followed his references, and here are my findings.

Pipe Buffer Size

Older Mac OS X versions had a pipe buffer size of 16KiB by default, offering 64KiB on demand; in my N=1 test on an M1 with macOS 14, I always get 64KiB buffers, even if I only send 1 Byte. Run pipe buffer size discovery tests yourself to check.
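One way to run such a discovery test is a minimal sketch like the following: switch the write end of a pipe to non-blocking mode, then write single bytes until the kernel refuses more. The exact number is platform-dependent; this is not an official API for querying the buffer size, just an observation technique.

```swift
import Foundation

// Sketch of a discovery test: make the write end non-blocking, then
// write single bytes until the kernel refuses more. The total is the
// effective pipe buffer size on this machine.
let pipe = Pipe()
let fd = pipe.fileHandleForWriting.fileDescriptor
let flags = fcntl(fd, F_GETFL)
_ = fcntl(fd, F_SETFL, flags | O_NONBLOCK)

var total = 0
var byte: UInt8 = 0
while write(fd, &byte, 1) == 1 {
	total += 1
}
print("Pipe buffer size: \(total) Bytes")
```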

So the upper limit is 64KiB (65536 Bytes) for all intents and purposes.

Want to send even 1 Byte extra?

  • If the data is larger than the pipe buffer, you need to drain the corresponding FileHandle with repeated read calls. (Or provide data larger than 64KiB with repeated write calls, respectively.)
  • If you try to send/receive the whole buffer in one go, your program will freeze from a user’s perspective: the read call never returns, and a CLI app will never terminate.
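To make the draining concrete, here’s a minimal sketch of reading from a pipe with repeated calls until EOF. It assumes the writer runs elsewhere and eventually closes its end; you’d run this off the thread that waits for the child.

```swift
import Foundation

// Sketch: drain a pipe with repeated reads instead of one big read.
// availableData blocks until data arrives and returns empty Data at EOF.
func drain(_ handle: FileHandle) -> Data {
	var collected = Data()
	while true {
		let chunk = handle.availableData
		if chunk.isEmpty { break }  // empty chunk signals end-of-file
		collected.append(chunk)
	}
	return collected
}
```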

Naive reading from the FileHandle means using the deprecated FileHandle.readDataToEndOfFile() you will see in many examples online, or the somewhat newer readToEnd() API.

Instead, you’re supposed to use readabilityHandler with FileHandle.availableData for reading, and writeabilityHandler with FileHandle.write(_:) to stream data from/to a pipe.

(That implies that Paul Hudson’s “How to run an external program using Process” needs to be changed to be safe.)

Sending Data via STDIN to Another Process

To be clear, this is not about reading your process’s standard input.

From the perspective of the child Process you’re about to spawn, the data you send is its standard input pipe. The following helper will allow you to write something like:

childProcess.standardInput = try Pipe.stdin(string: ...)!

Here’s an example helper to create a Pipe that you can use as standard input to another process to send an arbitrarily long string:

extension Pipe {
	static func stdin(string: String) throws -> Pipe? {
		guard let data = string.data(using: .utf8) else { return nil }
		let stdin = Pipe()
		stdin.fileHandleForWriting.writeabilityHandler = { handle in
			handle.write(data)
			try! handle.close()  // Without closing, it'll never finish, but
                                 // what to do with the error except crash
                                 // is not clear to me :)
			handle.writeabilityHandler = nil
		}
		return stdin
	}
}

As you can see, I don’t need to compute 64KiB chunks to make this work; I just need to use the writeabilityHandler.
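To see the helper in action, here’s a sketch that feeds a string well past the 64KiB limit into /bin/cat, with cat’s echo discarded. It assumes the Pipe.stdin(string:) extension from the listing above; because that helper uses a writeabilityHandler, this does not deadlock.

```swift
import Foundation

// Sketch: pipe ~100KB of data into /bin/cat via the stdin helper above.
let process = Process()
process.executableURL = URL(fileURLWithPath: "/bin/cat")
process.standardInput = try Pipe.stdin(string: String(repeating: "x", count: 100_000))
process.standardOutput = FileHandle.nullDevice  // discard cat's echo
try process.run()
process.waitUntilExit()
print("exit status: \(process.terminationStatus)")
```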

The similar-looking naive example of writing the data directly won’t work for strings larger than the 64KiB limit:

extension Pipe {
	static func broken_stdin(string: String) throws -> Pipe? {
		guard let data = string.data(using: .utf8) else { return nil }
		let stdin = Pipe()
		try stdin.fileHandleForWriting.write(contentsOf: data)
		try stdin.fileHandleForWriting.close()
		return stdin
	}
}

Reading Data from a Child Process’s STDOUT (or Your STDIN)

Helge shares a ProcessHelper implementation that shows how to use a readabilityHandler and collect the data.

A simplified example is:

let stdoutPipe = Pipe()
var outputData = Data()
let outputDataQueue = DispatchQueue(label: "outputDataQueue")
stdoutPipe.fileHandleForReading.readabilityHandler = { handle in
	let data = handle.availableData
	outputDataQueue.async { outputData.append(data) }
}

// run the child process, wait for it to finish, then use outputData

stdoutPipe.fileHandleForReading.readabilityHandler = nil
stdoutPipe.fileHandleForReading.closeFile()  // Helge runs this on outputDataQueue, but I'm not certain it's necessary

Appending the data on another queue is hinted at in the docs:

Assigning a valid Block object to this property creates a dispatch source for reading the contents of the file or socket. Your block is submitted to the file handle’s dispatch queue when there is data to read.

So the block may not run on your calling thread or the main queue at all. Assume it doesn’t, and put your operations on a queue under your control.
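The accumulation pattern boils down to confining all access to one serial queue. A standalone sketch of just that pattern, with no child process involved:

```swift
import Foundation

// Sketch: all mutations of outputData go through one serial queue,
// and the final read uses sync to get a consistent snapshot.
var outputData = Data()
let outputDataQueue = DispatchQueue(label: "outputDataQueue")

for i in 0..<3 {
	outputDataQueue.async { outputData.append(Data("chunk\(i) ".utf8)) }
}

// sync runs after all previously enqueued blocks on a serial queue.
let snapshot = outputDataQueue.sync { outputData }
print(String(decoding: snapshot, as: UTF8.self))  // "chunk0 chunk1 chunk2 "
```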

Run the Experiments

To see for yourself, I uploaded a .sh receiver and .swift sender as a Gist.

  • The sender will pipe 64KiB of string data, in 1KiB chunks, to the receiver.
  • The receiver echoes whatever it gets.
  • Once you exceed 65536 Bytes by even 1 Byte with the naive approach, the receiver won’t echo anything and your program won’t terminate (at all).

Takeaway

Unless you know what you’ll be sending, and that it won’t exceed 64KiB, avoid e.g. write(contentsOf:) and use the block-based write/read handlers. If you don’t want to use the block-based handler for some reason, make sure to add an assert or precondition to codify your expectation of maximum data size.
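A sketch of such a codified expectation (boundedStdin is a hypothetical name I made up for this example; 65536 is the limit discussed above):

```swift
import Foundation

extension Pipe {
	// Sketch: one-shot stdin helper that crashes loudly instead of
	// deadlocking when the payload would overflow the pipe buffer.
	static func boundedStdin(data: Data) throws -> Pipe {
		precondition(data.count <= 65_536,
		             "Payload exceeds pipe buffer; use writeabilityHandler instead")
		let stdin = Pipe()
		try stdin.fileHandleForWriting.write(contentsOf: data)
		try stdin.fileHandleForWriting.close()
		return stdin
	}
}
```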

Crashing there is better than the process never finishing.

My use case was about reading a user-provided text file from disk, performing some transformations, and then piping the result to another program. The mere presence of user-provided data from the file is reason enough to use the safer methods: you can’t make meaningful assumptions about the data size.

Closing With a Warning

  • Sven Schmidt from the Swift Package Index mentioned that they ran into hard-to-debug problems with Swift’s standard Process implementation, so they migrated the SPI code to use Swift Tools Support Core (TSC) Process. It sounds like they could be migrating to swift-testing’s take on process spawning in the future, though (especially since TSC is being deprecated, adopting TSC’s code nowadays puts the burden of maintenance on you in the long run).
  • Matt Massicotte also recalls running into weird bugs that were hard to reproduce when using a readabilityHandler and handle.availableData.