Cairn is a macOS screenshot tool that captures the screen and your voice at the exact same moment. That two-second mumble — "the Postgres lock thing Ivan sent me" — becomes the search key that beats every clever AI tag, because it’s how you would actually look for it.
Every other tool in this category is racing to understand your screenshot: object detection, OCR, captioning, embeddings. It’s impressive. It’s also wrong, because three weeks from now you don’t want "a code editor with a Markdown file." You want the Postgres lock thing Ivan sent you.
We also generate AI tags (we’re not luddites), but at search time we weight your voice tag heaviest. Because you wrote the query. You spoke the tag. They will rhyme.
A capture is a deliberate act, not an ambient surveillance tape. We do not record passively.
Press and hold ⌥⇧2, or whatever shortcut you like. As long as you hold, the mic is open. Let go and the screenshot fires.
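For the curious, the hold-and-release flow can be pictured as a tiny state machine. This is a minimal sketch, not Cairn's actual code: the `HoldToCapture` class and its method names are hypothetical, and the audio and screenshot are stand-in bytes rather than real recordings.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HoldToCapture:
    """Toy model of the hold-to-talk flow (hypothetical, for illustration only)."""
    mic_open: bool = False
    audio_chunks: List[bytes] = field(default_factory=list)
    captures: List[dict] = field(default_factory=list)

    def key_down(self) -> None:
        # Shortcut pressed: open the mic and start a fresh recording.
        self.mic_open = True
        self.audio_chunks = []

    def audio(self, chunk: bytes) -> None:
        # Audio is kept only while the shortcut is held; nothing is recorded passively.
        if self.mic_open:
            self.audio_chunks.append(chunk)

    def key_up(self, screenshot: bytes) -> Optional[dict]:
        # Shortcut released: close the mic and pair the audio with the screenshot.
        if not self.mic_open:
            return None
        self.mic_open = False
        capture = {"screenshot": screenshot, "voice": b"".join(self.audio_chunks)}
        self.captures.append(capture)
        return capture
```

Audio that arrives before the key goes down is dropped, and a release without a matching press produces no capture.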
Two seconds. A whisper. A swear. A name. "That pricing slide for the Tuesday meeting." Your voice, the screen, and the meaning all become one memory.
You’re done. The capture lives in a single file on your disk. No cloud round-trip, no “syncing…” spinner, no second thoughts about what you just said into a microphone.
Search listens for three things at once: what you said (loudest), what was on the screen, and what the moment was about. You don’t need the right words — you need your words.
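One way to picture "loudest" is a weighted score across the three signals. The sketch below is an assumption-laden illustration, not the shipped ranking: the `Capture` fields, the `WEIGHTS` values, and the simple word-overlap scoring are all hypothetical stand-ins.

```python
import re
from dataclasses import dataclass
from typing import List, Set, Tuple

# Hypothetical weights: voice counts most, as described above; the numbers are made up.
WEIGHTS = {"voice": 3.0, "screen": 1.0, "context": 1.0}


@dataclass
class Capture:
    name: str
    voice: str    # transcript of what you said
    screen: str   # text that was on the screen
    context: str  # AI-generated description of the moment


def tokens(text: str) -> Set[str]:
    # Lowercased alphanumeric words, so "Postgres" matches "postgres".
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query: str, cap: Capture) -> float:
    # Weighted fraction of query words found in each signal.
    q = tokens(query)
    if not q:
        return 0.0
    total = 0.0
    for field_name, weight in WEIGHTS.items():
        overlap = len(q & tokens(getattr(cap, field_name)))
        total += weight * overlap / len(q)
    return total


def search(query: str, captures: List[Capture]) -> List[Tuple[float, str]]:
    # Best match first; captures with no matching words are left out.
    ranked = sorted(((score(query, c), c.name) for c in captures), reverse=True)
    return [(s, n) for s, n in ranked if s > 0]
```

With these made-up weights, a capture whose voice tag contains your query words outranks one that only matches on screen text or the AI description.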
“The thing my mom sent about the wedding venue” — and there it is.
No fine print, no asterisks, no “by accepting these terms.”