[2023]
Previously:
2 years ago: https://news.ycombinator.com/item?id=37794996
1 year ago: https://news.ycombinator.com/item?id=40329675
Extremely cool!
It's interesting and honestly encouraging that this kind of thing can be discovered and understood using just "simple linear methods" and high-level analysis of patterns in layer activations.
So basically multiple CLS tokens.
Fwiw, I tried multiple global tokens in my chess neural net and didn't see any uplift compared to my baseline of just having one.
Note that it's not done for performance reasons, but rather to produce clean feature maps.
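To make the mechanism being discussed concrete: registers are just extra learnable tokens prepended alongside the [CLS] token, attended to like any other token, and then discarded after the forward pass. A minimal shape-level sketch in NumPy (illustrative only, not DINOv2's actual implementation; the transformer blocks are elided):

```python
import numpy as np

def forward_with_registers(patch_tokens, cls_token, registers):
    """Prepend [CLS] + register tokens to the patch sequence,
    run the backbone, then drop the registers from the output.

    patch_tokens: (batch, n_patches, dim)
    cls_token:    (batch, 1, dim)
    registers:    (batch, n_reg, dim)
    """
    # Sequence fed to the transformer: [CLS] + registers + patches.
    seq = np.concatenate([cls_token, registers, patch_tokens], axis=1)

    # ... transformer blocks would process `seq` here ...

    # Registers served as scratch space during attention; at output
    # time they are simply thrown away.
    n_reg = registers.shape[1]
    cls_out = seq[:, :1]
    patch_out = seq[:, 1 + n_reg:]
    return cls_out, patch_out
```

The point of the trick is exactly this last step: the model gets global "scratch" slots to dump high-norm information into, so the patch tokens stay clean for dense tasks.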
Has this been used widely since?
I ran a comparison of DINOv2 with and without registers on some image embedding tasks for work; DINOv2+registers saw a performance metric bump of 2-3%. Not nothing, not transformative, but worth using when the only difference at inference time is the model name string you're loading.
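For DINOv2 specifically, the register variants are published under separate torch.hub entrypoints that differ only by a `_reg` suffix (e.g. `dinov2_vits14` vs `dinov2_vits14_reg`), which is why switching is just a string change. A minimal sketch, assuming torch is installed and the first call can download weights:

```python
def backbone_name(use_registers: bool) -> str:
    # DINOv2 hub entrypoints: the "_reg" suffix selects the
    # registers variant; the API is otherwise identical.
    return "dinov2_vits14_reg" if use_registers else "dinov2_vits14"

def load_backbone(use_registers: bool = True):
    import torch  # downloads weights on first call
    return torch.hub.load("facebookresearch/dinov2", backbone_name(use_registers))
```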
I can't speak for the whole industry, but we used it in older UForm <https://github.com/unum-cloud/uform> and saw good adoption, especially among those deploying on the edge, where every little trick counts. It's hard to pin down exact numbers since most deployments didn't go through Hugging Face, but at the time, these models were likely among the more widely deployed by device count.
For example, it is used here https://github.com/facebookresearch/vggt/
yes