This would be a perfect fit for an LLM (albeit a bit complicated by byte-pair tokenization and decoder-only models rather than bidirectional ones). You can clamp the grille words, and just generate samples with those as requirements until they reach a good likelihood and are highly fluent and indistinguishable from non-message-carrying text.
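Concretely, the loop I have in mind is something like the following Python sketch; `generate_with_clamped_words` and `avg_logprob` are hypothetical stand-ins for whatever constrained-decoding and scoring machinery you'd actually wire up to the model:

```python
# Hypothetical helpers (not a real library API): assume some constrained-decoding
# routine that forces the grille words to appear at the given character offsets,
# and a scorer returning the model's average per-token log-probability of a text.
def generate_with_clamped_words(secret_words, positions):
    raise NotImplementedError  # placeholder for constrained decoding

def avg_logprob(text):
    raise NotImplementedError  # placeholder for model scoring

def sample_cover_text(secret_words, positions,
                      min_avg_logprob=-3.0, max_tries=1000):
    """Rejection-sample cover texts until one is fluent enough to pass."""
    best = None
    for _ in range(max_tries):
        candidate = generate_with_clamped_words(secret_words, positions)
        score = avg_logprob(candidate)
        if best is None or score > best[0]:
            best = (score, candidate)
        if score >= min_avg_logprob:
            return candidate
    return best[1]  # fall back to the most fluent candidate seen
```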
That reminds me of this suckerpinch/tom7 video: https://www.youtube.com/watch?v=Y65FRxE7uMc
> (albeit a bit complicated by byte-pair tokenization and decoder models rather than bidirectional)
You can clamp tokens (instead of letters / words) in the grille, I guess?
No, it's the other tokens that are the problem, the ones you're trying to find. You need to maintain a fixed width of characters. You know what characters the secret words tokenize from, so there's no problem there (you just record the number of characters, turn them into tokens, and you're fine), but all of the possible chaff text can come out with the wrong character counts. You sample a candidate text of tokens, the likelihood is good, it reads fluently, and the secret words don't stick out, but then the first line has 15 characters too many, the second has 5 too few, and so on, simply because of the vagaries of BPEs.
(And you can't replace the grille positions with relative indexes like 'the ith, mth, and nth tokens' because the tokenizations will all change... Maybe a more complicated relative index like 'whitespace-separated character ranges'... Well, whatever you do, it'll be annoying when it's a BPE LLM instead of a character-level model.)
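To make the width problem concrete, here's a toy check in Python; the 40-character grid width is just an assumed example, but it shows why most otherwise-fluent samples get thrown away:

```python
# Toy illustration of the width problem: BPE tokens decode to strings of
# varying length, so lines assembled from sampled tokens drift off the grid.
GRID_WIDTH = 40  # assumed fixed line width of the grille (example value)

def line_width_errors(cover_text, width=GRID_WIDTH):
    """Per line, how many characters the line is over (+) or under (-)."""
    return [len(line) - width for line in cover_text.splitlines()]

def fits_grid(cover_text, width=GRID_WIDTH):
    """A candidate is only usable if every line hits the width exactly."""
    return all(err == 0 for err in line_width_errors(cover_text, width))

sample = (
    "The quiet market reopened after the long winter, and\n"
    "traders argued about prices nobody could remember."
)
print(line_width_errors(sample))  # [12, 10]: both lines overshoot the grid
print(fits_grid(sample))          # False: this sample would be rejected
```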
Even more impressive is reading about the life of Cardano himself.
“Cars…
Cars on…
Carson City.” -John Cusack in Con Air