Firth Mode: 🍤/2

A quarter-waveplate can change the polarization of an electromagnetic wave passing through a piece of fried shrimp. (This figure is interactive; try dragging the fried shrimp!)

As capabilities advance, we may need to explore … alternative access mechanisms.

Gemma: Open Models Based on Gemini Research and Technology

Natural language is inherently complex. LLMs might struggle to grasp subtle nuances, sarcasm, or figurative language.

Gemma Prohibited Use Policy

You can be sure LLMs will struggle with more elusive aspects of meaning because they struggle with fundamental aspects of meaning, such as negation, compositionality, and coreference. A vector space representation of language as character sequences doesn’t equip them to handle such phenomena. Rotating 2D planes to encode positional information for a “performance boost” on a machine translation task from one high-resource language to another high-resource language, or normalizing over the embedding dimension for faster convergence (yet another signal-processing optimization technique), will not remedy this situation. Reasoning is not a domain. You can’t get “better performance” on reasoning. You’re either equipped for it or you’re not. These models cannot reliably apply common-sense reasoning, or even less sophisticated forms of reasoning, in every situation.
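For readers wondering what “rotating 2D planes” refers to: it is the rotary positional embedding scheme, in which consecutive pairs of embedding dimensions are rotated by a position-dependent angle so that attention dot products depend only on relative offsets. A minimal sketch of the rotation itself, on a toy vector (function name and dimensions are mine, for illustration):

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Rotate consecutive dimension pairs of x by a position-dependent
    angle: one frequency per 2D plane, rotary-embedding style."""
    d = x.shape[-1]
    assert d % 2 == 0, "needs an even number of dimensions"
    freqs = base ** (-np.arange(0, d, 2) / d)   # frequency per plane
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]                   # the paired dimensions
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin             # standard 2D rotation
    out[1::2] = x1 * sin + x2 * cos
    return out

# The advertised property: the dot product of two rotated vectors
# depends only on their relative offset, not their absolute positions.
q = np.random.default_rng(0).normal(size=8)
k = np.random.default_rng(1).normal(size=8)
a = rope_rotate(q, 5) @ rope_rotate(k, 3)   # offset 2
b = rope_rotate(q, 7) @ rope_rotate(k, 5)   # offset 2 again
assert np.allclose(a, b)
```

Whether that relative-offset property amounts to understanding negation or coreference is, of course, the point under dispute.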

The main problem with these models is that they begin with the assumption that tiktokens2vec is a good idea. It’s described here as a “successful research innovation” that will enable “downstream developers” and “the next wave of innovations”. word2vec, GloVe, WordPiece, SentencePiece, and GPT-2’s BPE are iterations of the same thing. The field of natural language processing doesn’t even seem interested in language anymore. If it was adjacent to information retrieval before, it’s starting to look more like a subfield of information retrieval. Its TPUs are the stream and everyone is a downstream developer. Google’s message is that nothing is more important than scale and nobody can scale like Google.
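For what it’s worth, the mechanical core shared by the subword tokenizers in that lineage is small. A sketch of a single byte-pair-encoding merge step (toy vocabulary and helper name are mine), the operation that, repeated, produces the whole merge table:

```python
from collections import Counter

def bpe_merge_step(words):
    """One BPE merge: find the most frequent adjacent symbol pair
    across the (word -> frequency) vocabulary and fuse it everywhere."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    if not pairs:
        return words, None
    best = max(pairs, key=pairs.get)        # most frequent pair
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                out.append(symbols[i] + symbols[i + 1])  # fuse the pair
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged, best

vocab = {("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2,
         ("l", "o", "g"): 2, ("n", "e", "w"): 3}
vocab, pair = bpe_merge_step(vocab)   # ("l", "o") is the winning pair
```

Run it to exhaustion and you get a vocabulary of character sequences; none of the steps ever consults meaning, which is the author’s complaint.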

What’s more boring than a rotating 2D plane for encoding the positional information of character sequences? Why wait until 2034 for Ra-, Ri-, and Ruformers trained on XPUs and evaluated against the same data sets? I’m here to tell you you don’t have to. Introducing Rollformer 3D. It’s a new model for encoding the positional information of elements in a sequence by following a traversal path along the surface of a sphere. This path may be in the shape of a question mark, a shrug emoji, or a capital D, as in Downstream Developer, Dunno, or Captain D’s. If you don’t get a performance boost on your first few rolls, don’t be discouraged. Just keep rolling until you do. Then publish your results on arXiv.
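In the spirit of reproducibility, here is a joke-faithful sketch of the core of Rollformer 3D: positions mapped to points along one possible traversal path over the unit sphere (a spiral roll; question mark and capital D paths are left as an exercise, and every name here is invented):

```python
import numpy as np

def rollformer_positions(n, turns=3.0):
    """Map n sequence positions to points on a spiral path that
    rolls over the unit sphere from pole to pole."""
    t = np.linspace(0.0, 1.0, n)      # normalized position in [0, 1]
    theta = np.arccos(1 - 2 * t)      # polar angle, pole to pole
    phi = 2 * np.pi * turns * t       # azimuth winds around the sphere
    return np.stack([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)], axis=-1)

pts = rollformer_positions(16)
# Every positional encoding lands exactly on the unit sphere.
assert np.allclose(np.linalg.norm(pts, axis=-1), 1.0)
```

Concatenate the three coordinates onto your token embeddings, roll, and publish.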