From the deadwax
Your Audio Is Never Re-encoded
The first question anyone with a lossless collection asks about Private Press is: "Will this touch my audio?"
No. Here's exactly why not, at the binary level.
Audio files are not blobs
An audio file isn't a single stream of data. It's a structured container with distinct regions for different kinds of information. Metadata lives in one region. Audio samples live in another. The format defines the boundary.
MP3: ID3v2 tags sit in a header before the audio frames. The audio payload starts at a known offset after the tag block. Writing new artwork means rewriting the ID3 header. The audio frames, every sample, every bit, remain byte-identical.
M4A / ALAC: The MP4 container organizes data into atoms (also called boxes). Artwork goes in the covr atom inside the ilst metadata atom, which lives inside moov. Audio samples live in mdat. These are separate atoms at separate positions in the file. Changing covr doesn't touch mdat.
FLAC: Uses a header of metadata blocks followed by audio frames. The PICTURE block holds artwork. The STREAMINFO block describes the audio. Audio frames follow the metadata blocks. Replacing the PICTURE block means rewriting the metadata header. The audio frames stay untouched.
AIFF: Organizes data into chunks. Audio samples live in the SSND chunk. Metadata lives in an "ID3 " chunk (note the trailing space, that's part of the spec). Different chunks, different byte ranges.
In every format, the architectural guarantee is the same: artwork and audio are structurally separate. You can rewrite one without touching the other. But you have to do it correctly.
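The MP3 case is the simplest to illustrate. Here's a minimal sketch (not Private Press's actual code) of locating the audio payload by parsing the ID3v2 header, whose tag size is stored as a 28-bit syncsafe integer, four bytes with the high bit of each always zero:

```swift
import Foundation

// Sketch: find where MP3 audio frames begin by parsing the ID3v2 header.
// Illustrative only; real code must also handle ID3v2 footers and
// files with appended ID3v1 tags.
func audioPayloadOffset(header: [UInt8]) -> Int? {
    // ID3v2 header: "ID3", version (2 bytes), flags (1 byte), size (4 bytes)
    guard header.count >= 10 else { return nil }
    guard header[0] == 0x49, header[1] == 0x44, header[2] == 0x33 else {
        return 0 // no ID3v2 tag: audio frames start at byte 0
    }
    // Syncsafe decode: 7 useful bits per byte, big-endian
    let tagSize = header[6...9].reduce(0) { ($0 << 7) | Int($1 & 0x7F) }
    return 10 + tagSize // audio begins after the 10-byte header + tag body
}
```

Everything before the returned offset can be rewritten freely; everything after it is the audio payload and never changes.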
What "correctly" requires
You can't just overwrite bytes in place. When artwork changes size (and it almost always does, since you're replacing a 200px thumbnail with a 1400px high-resolution image), the metadata region grows. Everything after it shifts. In some formats, that shift breaks playback unless you fix the pointers.
MP4 is the hardest. The mdat atom contains raw audio samples, and the stco (chunk offset) table tells the decoder where each chunk of samples starts. Those offsets are absolute byte positions in the file. If you insert a larger covr atom before mdat, every chunk offset in stco is now wrong by the number of bytes you added. Miss this and the file plays silence, or crashes the decoder, or plays at the wrong position.
Private Press recalculates every entry in stco (32-bit offsets) and co64 (64-bit offsets for large files). The recalculation walks the moov atom recursively, finds every offset table, and adjusts each entry by the exact delta. If a 32-bit offset would overflow after adjustment, meaning the file is too large for 32-bit chunk offsets, the operation fails cleanly rather than producing a corrupt file.
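The core of that fixup can be sketched in a few lines. This is an illustrative shape, not the app's actual API; it assumes the 32-bit stco entries have already been decoded:

```swift
enum OffsetError: Error {
    case overflow32Bit // file too large for 32-bit chunk offsets
}

// Shift every decoded stco entry by the number of bytes the metadata
// region grew (or shrank). Fails cleanly on overflow rather than
// writing a truncated, corrupt offset.
func adjustChunkOffsets(_ offsets: [UInt32], by delta: Int) throws -> [UInt32] {
    try offsets.map { old in
        let adjusted = Int(old) + delta
        guard adjusted >= 0, adjusted <= Int(UInt32.max) else {
            throw OffsetError.overflow32Bit
        }
        return UInt32(adjusted)
    }
}
```

The co64 case is the same logic over 64-bit entries, where overflow is not a practical concern.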
FLAC is strict. Metadata blocks form a sequential chain. Each block header declares its type, length, and whether it's the last block before audio frames. Replace the PICTURE block with a larger one and every subsequent block shifts, so the whole metadata region must be rewritten. The is-last flag on the final metadata block must be correct, or the decoder won't find where audio starts.
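That block header is only four bytes, which is part of why the format is so unforgiving. A sketch of decoding one, per the FLAC spec (names are illustrative):

```swift
// One FLAC metadata block header: 1 byte (is-last flag + type),
// then a 24-bit big-endian length of the block body that follows.
struct FLACBlockHeader {
    let isLast: Bool
    let type: UInt8   // 0 = STREAMINFO, 4 = VORBIS_COMMENT, 6 = PICTURE, ...
    let length: Int
}

func parseBlockHeader(_ bytes: [UInt8]) -> FLACBlockHeader? {
    guard bytes.count >= 4 else { return nil }
    return FLACBlockHeader(
        isLast: bytes[0] & 0x80 != 0,   // high bit: last block before audio
        type: bytes[0] & 0x7F,
        length: Int(bytes[1]) << 16 | Int(bytes[2]) << 8 | Int(bytes[3])
    )
}
```

Get that is-last bit wrong on the final block and the decoder reads audio frames as if they were another metadata header.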
AIFF requires chunk boundary precision. AIFF chunks must be word-aligned (2-byte boundaries). An odd-length ID3 chunk needs a padding byte. Get the alignment wrong and chunk parsing fails for everything after the ID3 data, including the SSND audio chunk.
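The alignment rule is small but easy to get wrong: a chunk's declared size counts only its data, and an odd-sized chunk is followed by one pad byte that the size field does not include. A sketch of the arithmetic:

```swift
// Sketch: compute where the next AIFF chunk header starts.
// 8 bytes of header (4-byte ID + 4-byte big-endian size), the data,
// plus one pad byte if the declared size is odd.
func nextChunkOffset(current: Int, declaredSize: Int) -> Int {
    current + 8 + declaredSize + (declaredSize & 1)
}
```

Skip that pad byte and every chunk after the ID3 data, including SSND, parses as garbage.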
Why we don't load the whole file
A hi-res 24-bit/96kHz ALAC file can be 900MB. A hi-res FLAC can be 400MB. Loading that into memory to change 50KB of metadata is wasteful and dangerous. On a machine processing thousands of files, you'll exhaust memory.
Private Press uses readMetadataStreaming for M4A files. Instead of Data(contentsOf: url) (which loads the entire file), it opens a FileHandle, scans the top-level atom headers to find moov, reads only the moov atom's bytes, and parses metadata from that. The audio in mdat, which is 99.9% of the file, is never read into memory at all.
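The shape of that scan looks roughly like the following. This is an assumed sketch of the approach, not the app's actual readMetadataStreaming; it walks top-level atom headers with a FileHandle, seeking past each payload, and reads only moov:

```swift
import Foundation

// Sketch: return the moov atom's bytes without ever reading mdat.
// Simplified; real code must also handle size == 0 ("to end of file").
func readMoovAtom(at url: URL) throws -> Data? {
    let handle = try FileHandle(forReadingFrom: url)
    defer { try? handle.close() }
    var offset: UInt64 = 0
    while let header = try handle.read(upToCount: 8), header.count == 8 {
        // Atom header: 32-bit big-endian size, then 4-byte type
        var size = header.prefix(4).reduce(UInt64(0)) { $0 << 8 | UInt64($1) }
        let type = String(decoding: header.suffix(4), as: UTF8.self)
        if size == 1 { // 64-bit extended size follows the type field
            guard let ext = try handle.read(upToCount: 8), ext.count == 8 else { return nil }
            size = ext.reduce(UInt64(0)) { $0 << 8 | UInt64($1) }
        }
        guard size >= 8 else { return nil } // malformed atom
        if type == "moov" {
            try handle.seek(toOffset: offset)          // rewind to atom start
            return try handle.read(upToCount: Int(size)) // metadata only
        }
        offset += size
        try handle.seek(toOffset: offset) // skip payload (e.g. all of mdat)
    }
    return nil
}
```

A 900MB file with a 2MB moov atom costs 2MB of reads, regardless of how large mdat is.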
FLAC uses readMetadataOnly, which reads metadata blocks via FileHandle and stops at the audio frame boundary. MP3 reads only the ID3 tag data. AIFF parses chunk headers to locate the ID3 chunk without reading SSND.
The result: scanning 3,000 albums reads megabytes of metadata, not terabytes of audio.
Atomic writes
The actual write follows a strict sequence:
Backup. If backups are enabled, copy the original file to ~/Library/Application Support/PrivatePress/Backups/. This directory is app-owned, so no security-scoped bookmark is required.
Write to temp file. Construct the new file contents (updated metadata + unchanged audio) and write them to a temporary file in the same directory as the original. Same filesystem, so the OS guarantees the swap is atomic.
Verify. The temp file exists and has the expected size.
Atomic swap. Replace the original with the temp file using FileManager.replaceItemAt. This is a rename operation at the filesystem level. It either fully succeeds or the original is untouched. No partial writes. No half-updated files. No corrupt state.
If anything fails at any step (disk full, permissions error, corrupt metadata parse) the original file is still exactly where it was, byte-for-byte identical. The operation is all-or-nothing.
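The swap itself reduces to very little code. A minimal sketch, assuming the new contents are already assembled (backup and verification steps omitted; the temp filename is illustrative):

```swift
import Foundation

// Sketch: all-or-nothing replacement. The temp file lives in the same
// directory as the original so replaceItemAt is a same-filesystem rename.
func atomicallyReplace(original: URL, with newContents: Data) throws {
    let temp = original
        .deletingLastPathComponent()
        .appendingPathComponent(".\(original.lastPathComponent).tmp-\(UUID().uuidString)")
    try newContents.write(to: temp)
    // Either fully succeeds or leaves the original byte-for-byte intact.
    _ = try FileManager.default.replaceItemAt(original, withItemAt: temp)
}
```

If the write to the temp file fails partway, the original has not been touched at all; the half-written temp file is the only casualty.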
Why not use Apple's frameworks?
macOS ships with AVFoundation, a comprehensive media framework. It can read metadata from audio files. It can write metadata. So why build custom editors from scratch?
Because AVFoundation was designed for playback and media pipelines, not for surgical metadata editing across thousands of files.
When reading metadata from large M4A files through AVFoundation, memory usage scales with file size rather than metadata size. For a 5MB AAC track, that's fine. For a 900MB 24-bit/96kHz ALAC file, it's a problem. For 3,000 of them in a batch scan, it's untenable. Our editors read only the metadata regions via file handles, never touching the audio payload. The difference isn't marginal. It's the difference between scanning a large collection in seconds and running out of memory.
There's also no single Apple API that handles metadata uniformly across MP3, M4A, FLAC, and AIFF. Each format has different container structures, different metadata conventions, and different edge cases. AVFoundation handles some of these well; others, less so. We found scenarios during development where the framework produced incorrect results or couldn't handle file configurations that are common in real-world collections. Building our own editors meant we could handle every edge case we encountered, rather than working around framework limitations we couldn't control.
The issues aren't limited to metadata. Apple's audio processing APIs handle basic operations like stereo-to-mono channel mixing and sample rate conversion differently than you'd expect. In some cases, what should be a simple average of left and right channels produces something closer to one channel. Sample rate converters use different internal filter parameters than reference implementations. For playback, these differences are inaudible. For audio analysis that needs sample-level accuracy, they produce wrong results. We had to reimplement these operations ourselves to get correct output.
And there's a deeper issue: Apple's media APIs are designed around display and playback. They're excellent at providing metadata for a player UI. But writing metadata back into files at scale, with atomic guarantees, format-specific binary correctness, and offset recalculation, isn't what they were built for. We needed write-path code we could fully control, fully test, and fully audit.
Zero dependencies
So we wrote every editor ourselves. Pure Swift. No FFmpeg. No TagLib. No SFBAudioEngine. No vendored C libraries of any kind.
This isn't philosophical minimalism. It's a security decision. Private Press touches your irreplaceable audio files. Every line of code in the write path is code we wrote, code we test, and code we can audit. There's no third-party library that might introduce a regression in a version bump. No C dependency with its own memory management that might overflow a buffer into your audio data.
The entire supply chain is Apple frameworks (for everything except the write path) and Keynell code. 37 Apple frameworks for UI, networking, system integration. Zero third-party code anywhere near your files.
Hardware acceleration
There's another reason to write everything in Swift: it lets us use the hardware properly.
Every Mac with Apple Silicon has dedicated acceleration hardware that most audio tools never touch. The AMX coprocessor handles vector math. The Neural Engine runs machine learning models. They're sitting right there on the die, waiting for code that knows how to use them.
Private Press does. Audio fingerprinting runs through Accelerate's vDSP, the same signal processing framework Apple uses internally. The FFT that powers our Chromaprint implementation, the autocorrelation in our FLAC encoder's LPC stage, the vector operations in our perceptual quality analyzer: all of it dispatches to the AMX coprocessor automatically. No configuration. No GPU kernel tuning. Just native Swift calling Accelerate, running on silicon designed for exactly this kind of math.
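To make that concrete, here's the kind of vDSP call the text is describing, an illustrative autocorrelation (not the app's actual LPC code), which Accelerate dispatches to the AMX units on Apple Silicon with no configuration:

```swift
import Accelerate

// Sketch: autocorrelation as a series of vDSP dot products.
// Assumes maxLag < samples.count.
func autocorrelation(_ samples: [Float], maxLag: Int) -> [Float] {
    var result = [Float](repeating: 0, count: maxLag)
    for lag in 0..<maxLag {
        // Dot product of the signal with itself shifted by `lag`
        vDSP_dotpr(samples, 1,
                   Array(samples[lag...]), 1,
                   &result[lag],
                   vDSP_Length(samples.count - lag))
    }
    return result
}
```

The same code, unchanged, runs vectorized on every Mac; on Apple Silicon the heavy lifting lands on the matrix coprocessor rather than scalar CPU loops.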
For artwork quality analysis, a Core ML model runs on the Neural Engine to detect upscaled images, compression artifacts, watermarks, and placeholders. The Neural Engine processes these classifications in milliseconds per image, across an entire collection, without competing with the CPU for cycles.
On macOS 26, Apple Intelligence adds another layer. Private Press uses Foundation Models for on-device metadata cleanup: extracting featured artists from mangled title fields, normalizing edition names, cleaning up tag inconsistencies that accumulated over years of ripping and importing. It runs entirely on-device. No cloud API. No data leaving your Mac.
This is what "zero dependencies" actually buys you. A vendored C library like FFmpeg or TagLib can't use the Neural Engine. It can't call Accelerate's vDSP without a bridging layer. It brings its own math, compiled for a generic target, running on the CPU while dedicated hardware sits idle. Writing everything in Swift means every computation goes through the path Apple designed for it, on hardware Apple built for it.
The result: every compute-intensive operation in Private Press runs on the hardware Apple built for it.
Provenance
Every press is recorded: which provider supplied the artwork, its resolution, the confidence score, which file was modified, and a timestamp. The provenance record is HMAC-signed so it can't be tampered with after the fact.
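The signing step is the standard CryptoKit pattern. A minimal sketch, assuming the provenance record has been serialized to bytes (the function names are illustrative, not the app's API):

```swift
import CryptoKit
import Foundation

// Sketch: HMAC-SHA256 signing and verification of a serialized
// provenance record.
func sign(record: Data, key: SymmetricKey) -> Data {
    Data(HMAC<SHA256>.authenticationCode(for: record, using: key))
}

func verify(record: Data, signature: Data, key: SymmetricKey) -> Bool {
    HMAC<SHA256>.isValidAuthenticationCode(signature,
                                           authenticating: record,
                                           using: key)
}
```

Any edit to a recorded field after the fact invalidates the signature, which is what makes the trail tamper-evident.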
If you ever need to know what changed, when, and why, it's there. If you want to re-press with different artwork later, you can see exactly what you're replacing.
This is what "earned, not aspirational" means in practice. "Your audio is never re-encoded" isn't a marketing promise. It's an architectural guarantee, enforced by the binary structure of every format we support, verified by the atomic write pattern, and recorded in a signed provenance trail.
Every claim is literally true. That's the only kind we make.