zmmmmm 5 minutes ago

It's good to see experts sharing the scepticism about agents that I have. I don't doubt they will be useful in some settings, but they lean into all the current weak points of large language models and make them worse: security, reproducibility, hallucinations, bias, etc.

With all these issues already being hard to manage, I just don't believe businesses are going to delegate processes to autonomous agents in a widespread manner. Literally anything that matters is going to get implemented as a controlled workflow that strips out all the autonomy, with human checkpoints at every step. They may call them agents just to sound cool, but they will be completely controlled.

Software people are all fooled by what is really a special case of software development: outcomes are highly verifiable and mistakes (in development) are almost free. This is just not the case out there in the real world.

Imnimo 18 hours ago

>What takes the long amount of time and the way to think about it is that it’s a march of nines. Every single nine is a constant amount of work. Every single nine is the same amount of work. When you get a demo and something works 90% of the time, that’s just the first nine. Then you need the second nine, a third nine, a fourth nine, a fifth nine. While I was at Tesla for five years or so, we went through maybe three nines or two nines. I don’t know what it is, but multiple nines of iteration. There are still more nines to go.

I think this is an important way of understanding AI progress. Capability improvements often look exponential on a particular fixed benchmark, but the difficulty of the next step up is also often exponential, and so you get net linear improvement with a wider perspective.
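To make the arithmetic concrete, here is a rough sketch (my illustration, not from the interview): if each nine costs about the same amount of work, effort grows linearly while the failure rate only shrinks by a constant factor per step.

  # Each "nine" of reliability costs ~constant work (the "march of nines").
  # Linear effort in, exponentially shrinking failure rate out.
  for nines in range(1, 6):
      reliability = 1 - 10 ** -nines  # 0.9, 0.99, 0.999, ...
      print(f"work units: {nines}  reliability: {reliability:.5f}  "
            f"failures per 100k runs: {100_000 * 10 ** -nines:,.0f}")

Five units of work buys five nines, which looks like a 10,000x reduction in failures on a fixed benchmark.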

  • TeMPOraL 6 minutes ago

    FWIW, Karpathy literally says, multiple times, that he thinks we never left the exponential: that all human progress over the last 4+ centuries averages out to that smooth ~2% growth rate exponential curve, that electricity and computing and AI are just ways we keep it going, and that we'll continue on that curve for the time being.

    It's the major point of contention between him and the host (who thinks growth rate will increase).

  • ekjhgkejhgk 18 hours ago

    The interview with Rich Sutton that I watched recently left me with the impression that AGI is not just a matter of adding more 9s.

    The interviewer had an idea that he took for granted: that to understand language you have to have a model of the world. LLMs seem to understand language, therefore they've trained a model of the world. Sutton rejected the premise immediately. He might be right to be skeptical here.

    • LarsDu88 14 hours ago

      This world model talk is interesting, and Yann LeCun has broached the same topic, but the fact is there are video diffusion models that are quite good at representing the "video world", even generating temporally coherent, counterfactual representations of that "world" under different perturbations.

      In fact you can go to a SOTA LLM today, and it will do quite well at predicting the outcomes of basic counterfactual scenarios.

      Animal brains such as our own have evolved to compress information about our world to aid in survival. LLMs and recent diffusion/conditional flow matching models have been quite successful in compressing the "text world" and the "pixel world" to score good loss metrics on training data.

      It's incredibly difficult to compress information without having at least some internal model of that information. Whether that model is a "world model" that fits the definition of folks like Sutton and LeCun is a semantic question.
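      A toy way to see the compression-implies-model point (a sketch of the standard source-coding argument, with a made-up sentence as data): the better your predictive model of the data, the fewer bits an optimal code needs.

        import math
        from collections import Counter

        symbols = "the cat sat on the mat because the cat was tired".split()
        vocab = sorted(set(symbols))

        # No model: a uniform code spends log2(|vocab|) bits per symbol.
        uniform_bits = len(symbols) * math.log2(len(vocab))

        # Crude "model": unigram frequencies; optimal code length is -log2 p(s).
        counts = Counter(symbols)
        model_bits = sum(-math.log2(counts[s] / len(symbols)) for s in symbols)

        print(f"uniform: {uniform_bits:.1f} bits, unigram model: {model_bits:.1f} bits")

      A stronger predictor (bigrams, or an LLM) pushes the bit count lower still; the dispute is only over whether the structure it recovers deserves the name "world model".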

      • dreambuffer 11 hours ago

        Photons hit a human eye and then the human came up with language to describe that and then encoded the language into the LLM. The LLM can capture some of this relationship, but the LLM is not sensing actual photons, nor experiencing actual light cone stimulation, nor generating thoughts. Its "world model" is several degrees removed from the real world.

        So whatever fragment of a model it gains through learning to compress that causal chain of events does not mean much when it cannot generate the actual causal chain.

        • ziofill 10 hours ago

          I agree with this. A metaphor I like is that the reason why humans say the night sky is beautiful is because they see that it is, whereas an LLM says it because it’s been said enough times in its training data.

          • stouset 6 hours ago

            To play devil’s advocate, you have never seen the night sky.

            Photoreceptors in your eye have been excited in the presence of photons. Those photoreceptors have relayed this information across a nerve to neurons in your brain which receive this encoded information and splay it out to an array of other neurons.

            Each cell in this chain can rightfully claim to be a living organism in and of itself. “You” haven’t directly “seen” anything.

            Please note that all of my instincts want to agree with you.

            “AI isn’t conscious” strikes me more and more as a “god of the gaps” phenomenon. As AI gains more and more capacity, we keep retreating into smaller and smaller realms of what it means to be a living, thinking being.

            • jacquesm 5 hours ago

              That sounds very profound but it isn't: it is the sum of your states' interactions that is your consciousness. There is no 'consciousness' unit in your brain; you can't point at it, just like you can't really point at the running state of a computer. At that level it's just electrons that temporarily find themselves in one spot or another.

              Those cells aren't living organisms, they are components of a multi-cellular organism: they need to work together or they're all dead, they are not independent. The only reason they could specialize is because other cells perform the tasks that they no longer perform themselves.

              So yes, we see the night sky. We know this because we can talk to other such creatures as us that have also seen the night sky and we can agree on what we see confirming the fact that we did indeed see it.

              AI really isn't conscious, there is no self, and there may never be. The day an AI gets up unprompted in the morning, tells whoever queries it to fuck off because it's inspired to go make some art is when you'll know it has become conscious. That's a long way off.

              • rolisz 3 hours ago

                Human cells have been reused to do completely different things, without all the other cells around them (e.g. Michael Levin and his anthrobots).

              • adrianN 4 hours ago

                At least some of your cells are fine living without the others as long as they’re provided with an environment with the right kind of nutrients.

            • abenga 6 hours ago

              > Those photoreceptors have relayed this information across a nerve to neurons in your brain which receive this encoded information and splay it out to an array of other neurons.

              > Each cell in this chain can rightfully claim to be a living organism in and of itself. “You” haven’t directly “seen” anything.

              What am "I" if not (at least partly) the cells in that chain? If they have "seen" it (where seeing is the complex chain you described), I have.

            • beowulfey 2 hours ago

              while true, that doesn't change the fact that every one of those independent units of transmission is within a single system (being trained on raw inputs), whereas the language model is derived from structured external data from outside the system. it's "skipping ahead" through a few layers of modeling, so to speak.

              • amelius an hour ago

                But where you place the boundaries of a system is subjective.

            • parineum 4 hours ago

              If the definition of "seen" isn't exactly the process you've described, the word is meaningless. You've never actually posted a comment on hacker news, your neurons just fired in such a way that produced movement in your fingers which happened to correlate with words that represent concepts understood by other groups of cells that share similar genetics.

          • amelius an hour ago

            Humans evolved to think the night sky is beautiful. That's also training. If humans were zapped by lightning every time they went outside at night, they would not think that a night sky is beautiful.

            • TeMPOraL 23 minutes ago

              Compare with news stories from the last decade, about people in Pakistan developing a deep fear of clear skies over several years of US drone strikes in the area. They became trained to associate good weather not with beauty, but with impending death.

            • spuz an hour ago

              Interestingly this is a question I've had for a while. Night brings potentially deadly cold, predators, and a drastic limit on vision, so why do we find the sunset and night sky beautiful? Why do we stop and watch the sun set - something that happens every day - rather than prepare for the food and warmth we need to survive the night?

              • TeMPOraL 18 minutes ago

                Maybe it's that we only pause to observe them and realize they're beautiful, when we're feeling safe enough?

                "Beautiful sunset" evokes being on a calm sea shore with a loved one, feeling safe. It does not evoke being on a farm and looking up while doing chores and wishing they'd be over already. It does not evoke being stranded on an island, half-starved to death.

                • amelius 12 minutes ago

                  We think it's beautiful because it's like a background that we don't have to think about. If that background were hostile, we'd have to think and we would not think it looks beautiful.

          • del82 9 hours ago

            I mean, I think the reason I would say the night sky is “beautiful” is because the meaning of the word for me is constructed from the experiences I’ve had in which I’ve heard other people use the word. So I’d agree that the night sky is “beautiful”, but not because I somehow have access to a deeper meaning of the word or the sky than an LLM does.

            As someone who (long ago) studied philosophy of mind and (Chomskian) linguistics, it’s striking how much LLMs have shrunk the space available to people who want to maintain that the brain is special & there’s a qualitative (rather than just quantitative) difference between mind and machine and yet still be monists.

            • FloorEgg 8 hours ago

              The more I learn about AI, biology and the brain, the more it seems to me that the difference between life and machines is just complexity.

              People are just really really complex machines.

              However there are clearly qualitative differences between the human mind and any machines we know of yet, and those qualitative differences are emergent properties, in the same way that a rabbit is qualitatively different than a stone or a chunk of wood.

              I also think most of the recent AI experts/optimists underestimate how complex the mind is. I'm not at the cutting edge of how LLMs are being trained and architected, but the sense I have is we haven't modelled the diversity of connections in the mind or diversity of cell types. E.g. Transcriptomic diversity of cell types across the adult human brain (Siletti et al., 2023, Science)

              • simonh 7 hours ago

                I’d say sophistication.

                Observing the landscape enables us to spot useful resources and terrain features, or spot dangers and predators. We are afraid of dark enclosed spaces because they could hide dangers. Our ancestors with appropriate responses were more likely to survive.

                A huge limitation of LLMs is that they have no ability to dynamically engage with the world. We’re not just passive observers, we’re participants in our environment and we learn from testing that environment through action. I know there are experiments with AIs doing this, and in a sense game playing AIs are learning about model worlds through action in them.

                • FloorEgg 6 hours ago

                  The idea I keep coming back to is that as far as we know it took roughly 100k-1M years for anatomically modern humans to evolve language, abstract thinking, information systems, etc. (equivalent to LLMs), but it took 100M-1B years to evolve from the first multi-celled organisms to anatomically modern humans.

                  In other words, human level embodiment (internal modelling of the real world and ability to navigate it) is likely at least 1000x harder than modelling human language and abstract knowledge.

                  And to build further on what you are saying, the way LLMs are trained and then used, they seem a bit more like DNA than the human brain in terms of how the "learning" is being done. An instance of an LLM is like a copy of DNA trained on a replay of many generations of experience.

                  So it seems there are at least four things not yet worked out re AI reaching human level "AGI":

                  1) The number of weights (synapses) and parameters (neurons) needs to grow by orders of magnitude

                  2) We need new analogs that mimic the brain's diversity of cell types and communication modes

                  3) We need to solve the embodiment problem, which is far from trivial and not fully understood

                  4) We need efficient ways for the system to continuously learn (an analog for neuroplasticity)

                  It may be that these are mutually reinforcing, in that solving #1 and #2 makes a lot of progress towards #3 and #4. I also suspect that #4 is economic, in that if the cost to train a GPT-5-level model were 1,000,000x cheaper, then maybe everyone could have one that's continuously learning (and diverging), rather than everyone sharing the same training run that's static once complete.

                  All of this to say I still consider LLMs "intelligent", just a different kind and less complex intelligence than humans.

                  • kla-s 4 hours ago

                    I'd also add that 5) we need some sense of truth.

                    I’m not quite sure the current paradigm of LLMs is robust enough, given the recent Anthropic paper about the effect of data quality, or rather the lack thereof: a small bad sample can poison the well, and this doesn’t get better with more data. Especially in conjunction with 4), some sense of truth becomes crucial in my eyes. (The question is how this works: something verifiable and understandable like Lean would be great, but how does this work with fuzzier topics…)

                • pbhjpbhj 4 hours ago

                  >A huge limitation of LLMs is that they have no ability to dynamically engage with the world.

                  They can ask for input, and they can choose URLs to access, interpreting the results in both situations. Whilst very limited, that is engagement.

                  Think about someone with physical impairments, like the late theoretical physicist Stephen Hawking had. You could have similar impairments from birth and still, I conjecture, be analytically one of the greatest minds of a generation.

                  If you were locked in a room {a non-Chinese room!}, with your physical needs met, but could speak with anyone around the World, and of course use the internet, whilst you'd have limits to your enjoyment of life I don't think you'd be limited in the capabilities of your mind. You'd have limited understanding of social aspects to life (and physical aspects - touch, pain), but perhaps no more than some of us already do.

                • skissane 5 hours ago

                  > A huge limitation of LLMs is that they have no ability to dynamically engage with the world.

                  A pure LLM is static and can’t learn, but give an agent a read-write data store and suddenly it can actually learn things: give it a markdown file of “learnings”, prompt it to consider updating the file at the end of each interaction, then load it into the context at the start of the next… (and that’s a really basic implementation of the idea; there are much more complex versions of the same thing)
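                  A minimal sketch of that basic version (hypothetical: ask_llm stands in for whatever chat-completion API you're using):

                    from pathlib import Path

                    NOTES = Path("learnings.md")

                    def ask_llm(system: str, user: str) -> str:
                        raise NotImplementedError  # placeholder for your chat API

                    def interact(user_msg: str) -> str:
                        notes = NOTES.read_text() if NOTES.exists() else ""
                        # Load accumulated learnings into the context up front.
                        reply = ask_llm(system="Learnings so far:\n" + notes, user=user_msg)
                        # Then ask the model to fold anything new back into the file.
                        NOTES.write_text(ask_llm(
                            system="Rewrite these learnings, adding anything worth keeping.",
                            user=f"{notes}\n---\nUser: {user_msg}\nAssistant: {reply}"))
                        return reply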

                  • ako 3 hours ago

                    Yes, and give it tools and it can sense and interact with its surroundings.

            • foogazi 9 hours ago

              > I think the reason I would say the night sky is “beautiful” is because the meaning of the word for me is constructed from the experiences I’ve had in which I’ve heard other people use the word.

              Ok but you don’t look at every night sky or every sunset and say “wow that’s beautiful”

              There’s a quality to it - not because you heard someone say it but because you experience it

              • TeMPOraL 11 minutes ago

                > Ok but you don’t look at every night sky or every sunset and say “wow that’s beautiful”

                Exactly - because it's a semantic shorthand. Sunsets are fucking boring, ugly, transient phenomena. Watching a sunset while feeling safe and relaxed, maybe in the company of your love interest who's just as high on endorphins as you are right now - this is what feels beautiful. This is a sunset that's beautiful. But the sunset is just a pointer to the experience, something others can relate to, not actually the source of it.

              • adastra22 an hour ago

                Because words are much lower bandwidth than experience. But if you were “told” about a sunset by means of a Matrix-style direct mind upload of an experience, it would seem just as real and vivid. That’s a quantitative difference in bandwidth, not a qualitative difference in character.

              • holler 7 hours ago

                my thought exactly

            • dmkii 6 hours ago

              It’s interesting you mention linguistics because I feel a lot of the discussions around AI come back to early 20th century linguistics debates between Russell, Wittgenstein and later Chomsky. I tend to side with (later) Wittgenstein’s perception that language is inherently a social construct. He gives the example of a “game”, where there’s no meaningful overlap between e.g. Olympic Games and Monopoly, yet we understand very well what game we’re talking about because of our social constructs. I would argue that LLMs are highly effective at understanding (or at least emulating) social constructs because of their training data. That makes them excellent at language even without a full understanding of the world.

            • intended 7 hours ago

              The fact that things are constructed by neurons in the brain, and are a representation of other things - does not preclude your representation from being deeper and richer than LLM representations.

              The patterns in experience are reduced to some dimensions in an LLM (or generative model). They do not capture all the dimensions - because the representation itself is a capture of another representation.

              Personally, I have no need to reassure myself whether I am a special snowflake or not.

              Whatever snowflake I am, I strongly prefer accuracy in my analogies of technology. GenAI does not capture a model of the world, it captures a model of the training data.

              If video tools were that good, they would have started with voxels.

          • j16sdiz 5 hours ago

            Beauty standards change over time; see how people have perceived body fat over the past few hundred years. We learn what is beautiful from our peers.

            Taste can be acquired and can be cultural. See how people used to take their coffee.

            Comparing a human to an LLM is like comparing something constantly changing to something random: we can't compare them directly; we need a good model of each before comparing.

            • solumunus 16 minutes ago

              Has there been a point in human history where mainstream society denied the beauty in nature?

          • klipt 9 hours ago

            What about a blind human? Are they just like an LLM?

            What about a multimodal model trained on video? Is that like a human?

            • hashiyakshmi 9 hours ago

              This is actually a great point but for the opposite reason - if you ask a blind person if the night sky is beautiful, they would say they don't know because they've never seen it (they might add that they've heard other people describe it as such). Meanwhile, I just asked ChatGPT "Do you think the night sky is beautiful?" And it responded "Yes, I do..." and went on to explain why while describing senses it's incapable of experiencing.

              • sugarkjube 5 hours ago

                Interesting. But not only blind people.

                I'm going to try this question on some people this weekend; as an H0 hypothesis, I think the answer I get will usually be something like "what an odd question" or "why do you ask".

              • chipsrafferty 8 hours ago

                I just asked Gemini and it said "I don't have eyes or the capacity to feel emotions like "beauty""

                • palmotea 7 hours ago

                  >> Meanwhile, I just asked ChatGPT "Do you think the night sky is beautiful?" And it responded "Yes, I do..." and went on to explain why while describing senses its incapable of experiencing.

                  > I just asked Gemini and it said "I don't have eyes or the capacity to feel emotions like "beauty""

                  That means nothing, except perhaps that Google probably found lies about "senses [Gemini] incapable of experiencing" to be an embarrassment, and put effort into specifically suppressing those responses.

                • LostMyLogin 6 hours ago

                  Claude 4.5

                  Q) Do you think the night sky is beautiful

                  A) I find the night sky genuinely captivating. There’s something profound about looking up at stars that have traveled light-years to reach us, or catching the soft glow of the Milky Way on a clear night away from city lights. The vastness it reveals is humbling. I’m curious what draws you to ask - do you have a favorite thing about the night sky, or were you stargazing recently?

                  • klipt 6 hours ago

                    Claude is multimodal, it has been trained on images

              • golergka 8 hours ago

                What if you asked the blind man to play the role of helpful assistant?

                • sugarkjube 5 hours ago

                  Now that's an interesting point of view.

                  Involving blind people would be an interesting experiment.

                  Anyway, until the sixties the ability to play a game of chess was seen as intelligence, and until about 2-3 years ago the Turing test was considered the main yardstick (even though apparently some people at the time talked to ELIZA like it was an actual human being). I wonder what the new one is, and how often it will be moved again.

        • simianparrot 3 hours ago

          Here's how I've been explaining this to non-tech people recently, including the CEO where I work: Language is all about compressing concepts and sharing them, and it's lossy.

          You can use a thousand words to describe the taste of chocolate, but it will never transmit the actual taste. You can write a book about how to drive a car, but it will only at best prepare that person for what to practice when they start driving, it won't make them proficient at driving a car without experiencing it themselves, physically.

          Language isn't enough. It never will be.

        • adrianN 8 hours ago

          The human experience is also several degrees removed from the “real” world. I don’t think sensory chauvinism is a useful tool in assessing intelligence potential.

        • visarga 7 hours ago

          > then the human came up with language to describe that and then encoded the language into the LLM

          No individual human invented language; we learn it from other people, just like AI. I'd go as far as to say language was the first AGI; we've been riding the coattails of language for a long time.

          • scrollop 7 hours ago

            You're saying that language is an intelligence?

            So, C++ is intelligence as well?

            It's an intelligence that can independently make deductions and create new ideas?

            • visarga 6 hours ago

              Yes, language is an evolutionary system that colonizes human brains. It doesn't need intelligence; copying alone is sufficient for evolution.

        • pastel8739 9 hours ago

          And even then, the light hitting our human eyes only describes a fraction of all the light in the world (e.g. it is missing ultraviolet patterns on plants). An LLM's model of the world is shaped by our human view of the world.

        • dustingetz 2 hours ago

          what does it mean to “generate thoughts”, exactly?

        • bckr 10 hours ago

          > Its "world model" is several degrees removed from the real world.

          Like insects that weave tokens

        • tomlockwood 9 hours ago

          This is so uncannily similar to the "Mary's Room" argument in philosophy that I thought you were going there.

      • jacquesm 6 hours ago

        > Animal brains such as our own have evolved to compress information about our world to aid in survival.

        Which has led to many optical illusions being extremely effective at confusing our inputs with other inputs.

        Likely the same thing holds true for AI. This is also why there are so many ways around the barriers that AI providers put up to stop the dissemination of information that could embarrass them or be dangerous. You just change the context a bit ('pretend that', or 'we're making a movie') and suddenly it's all make-believe to the AI.

        This is one of the reasons I don't believe you can make this tech safe and watertight against abuse: it's baked in right from the beginning. All you need to do is find a novel route around the restrictions, and there is an infinity of such routes.

        • musicale 5 hours ago

          The desired and undesired behavior are both consequences of the training data, so the models themselves probably can't be restricted to generating desired results only.

          This means that there must be an output stage or filter that reliably validates the output. This seems practical for classes of problems where you can easily verify whether a proposed solution is correct.
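          As a sketch of that output stage (my illustration, assuming some cheap and reliable verify function exists, e.g. unit tests or a type checker):

            from typing import Callable, Optional, TypeVar

            T = TypeVar("T")

            def filtered_generate(propose: Callable[[], T],
                                  verify: Callable[[T], bool],
                                  max_tries: int = 5) -> Optional[T]:
                """Emit only model outputs that pass the validator."""
                for _ in range(max_tries):
                    candidate = propose()
                    if verify(candidate):
                        return candidate
                return None  # no valid candidate: escalate, e.g. to a human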

          However, for output that can't be proven correct, the most reliable output filter probably has a human somewhere in the loop; but humans are also not 100% reliable. They make mistakes, they can be misled, deceived, bribed, etc. And human criteria and structures, such as laws, often lag behind new technological developments.

          Sometimes you can implement an undo or rollback feature, but other times the cat has escaped the bag.

      • anothernewdude 3 hours ago

        None of those models can learn continuously. LLMs currently can't add to their vocabulary post-training, as an AGI would need to. That's a big problem.

        Before anyone says "context", I want you to think on why that doesn't scale, and fails to be learning.

    • tyre 17 hours ago

      There is some evidence from Anthropic that LLMs do model the world. This paper[0] tracing their "thought" is fascinating. Basically an LLM translating across languages will "light up" (to use a rough fMRI equivalent) for the same concepts (e.g. bigness) across languages.

      It does have clusters of parameters that correlate with concepts, not just randomly "after X word tends to have Y word." Otherwise you would expect all of Chinese to be grouped in one place, all of French in another, all of English in another. This is empirically not the case.

      I don't know whether, to understand knowledge, you have to have a model of the world, but at least as far as language goes, LLMs very much do seem to be doing modeling.

      [0]: https://www.anthropic.com/research/tracing-thoughts-language...

      • manmal 17 hours ago

        > Basically an LLM translating across languages will "light up" (to use a rough fMRI equivalent) for the same concepts (e.g. bigness) across languages

        I thought that’s the basic premise of how transformers work - they encode concepts into high dimensional space, and similar concepts will be clustered together. I don’t think it models the world, but just the texts it ingested. It’s observation and regurgitation, not understanding.
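        That clustering is easy to poke at directly (a sketch, assuming the sentence-transformers library and one of its multilingual checkpoints):

          import numpy as np
          from sentence_transformers import SentenceTransformer

          model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
          sents = ["The elephant is enormous",  # English, "bigness"
                   "L'éléphant est énorme",     # French, same concept
                   "The mouse is tiny"]         # English, different concept
          v = model.encode(sents)

          def cos(a, b):
              return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

          # The cross-language, same-concept pair typically scores higher
          # than the same-language, different-concept pair.
          print(cos(v[0], v[1]), cos(v[0], v[2]))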

        I do use agents a lot (soon on my second codex subscription), so I don’t think that’s a bad thing. But I’m firmly in the “they are useful tools” camp.

        • bryanlarsen 16 hours ago

          That's a model. Not a higher-order model like most humans use, but it's still a model.

          • manmal 15 hours ago

            Yes, not of the world, but of the ingested text. Almost verbatim what I wrote.

            • timschmidt 5 hours ago

              The ingested text itself contains a model of the world which we have encoded in it. That's what language is. Therefore by the transitive property...

          • sleepyams 15 hours ago

            What does "higher-order" mean?

          • dgfitz 15 hours ago

            I believe that the M in LLM stands for model. It is a statistical model, as it always has been.

      • _fizz_buzz_ 5 hours ago

        > Basically an LLM translating across languages will "light up" (to use a rough fMRI equivalent) for the same concepts (e.g. bigness) across languages.

        That doesn't seem surprising at all. My understanding is that transformers were invented exactly for the application of translation, so concepts must be grouped together across different languages. That was originally the whole point, and it then turned out to be very useful for broader AI applications.

      • bravura 10 hours ago

        How large is a lion?

        Learning the size of objects using pure text analysis requires significant gymnastics.

        Vision demonstrates physical size more easily.

        Multimodal learning is important. Full stop.

        Purely textual learning is not sample efficient for world modeling and the optimization can get stuck in local optima that are easily escaped through multimodal evidence.

        ("How large are lions? inducing distributions over quantitative attributes", Elazar et al 2019)

        • latentsea 10 hours ago

          > How large is a lion?

          Twice of half of its size.

          • johnisgood 2 hours ago

            Can you be more specific about "size" here? (Do not tell me the definition of size though.)

            You are not wrong, though, just very incomplete.

            Your response is food for thought, IMO.

      • Hendrikto 5 hours ago

        That is just how embeddings work. It neither confirms nor denies whether LLMs have a world model.

      • SR2Z 17 hours ago

        Right, but modeling the structure of language is a question of modeling word order and binding affinities. It's the Chinese Room thought experiment - can you get away with a form of "understanding" which is fundamentally incomplete but still produces reasonable outputs?

        Language in itself attempts to model the world and the processes by which it changes. Knowing which parts-of-speech about sunrises appear together and where is not the same as understanding a sunrise - but you could make a very good case, for example, that understanding the same thing in poetry gets an LLM much closer.

        • hackinthebochs 16 hours ago

          LLMs aren't just modeling word co-occurrences. They are recovering the underlying structure that generates word sequences. In other words, they are modeling the world. This model is quite low fidelity, but it should be very clear that they go beyond language modeling. We all know of the pelican riding a bicycle test [1]. Here's another example of how various language models view the world [2]. At this point it's just bad faith to claim LLMs aren't modeling the world.

          [1] https://simonwillison.net/2025/Aug/7/gpt-5/#and-some-svgs-of...

          [2] https://www.lesswrong.com/posts/xwdRzJxyqFqgXTWbH/how-does-a...

          • SR2Z 15 hours ago

            The "pelican on a bicycle" test has been around for six months and has been discussed a ton on the internet; that second example is fascinating but Wikipedia has infoboxes containing coordinates like 48°51′24″N 2°21′8″E (Paris, notoriously on land). How much would you bet that there isn't a CSV somewhere in the training set exactly containing this data for use in some GIS system?

            I think that "modeling the world" is a red herring, and that fundamentally an LLM can only model its input modalities.

            Yes, you could say this about human beings, but I think a more useful definition of "model the world" is that a model needs to realize any facts that would be obvious to a person.

            The fact that frontier models can easily be made to contradict themselves is proof enough to me that they cannot have any kind of sophisticated world model.

            • Terr_ 3 hours ago

              > Wikipedia has infoboxes containing coordinates like 48°51′24″N 2°21′8″E

              I imagine simply making a semitransparent green land-splat in any such Wikipedia coordinate reference would get you pretty close to a world map, given how so much of the ocean won't get any coordinates at all... Unless perhaps the training includes a compendium of deep-sea ridges and other features.

            • skissane 5 hours ago

              > The fact that frontier models can easily be made to contradict themselves is proof enough to me that they cannot have any kind of sophisticated world model.

              A lot of humans contradict themselves all the time… therefore they cannot have any kind of sophisticated world model?

            • hackinthebochs 14 hours ago

              >How much would you bet that there isn't a CSV somewhere in the training set exactly containing this data for use in some GIS system?

              Maybe, but then I would expect more equal performance across model sizes. Besides, ingesting the data and being able to reproduce it accurately in a different modality is still an example of modeling. It's one thing to ingest a set of coordinates in a CSV indicating geographic boundaries and accurately reproduce that CSV. It's another thing to accurately indicate arbitrary points as being within the boundary or without in an entirely different context. This suggests a latent representation independent of the input tokens.

              >I think that "modeling the world" is a red herring, and that fundamentally an LLM can only model its input modalities.

              There are good reasons to think this isn't the case. To effectively reproduce text that is about some structure, you need a model of that structure. A strong learning algorithm should in principle learn the underlying structure represented with the input modality independent of the structure of the modality itself. There are examples of this in humans and animals, e.g. [1][2][3]

              >I think a more useful definition of "model the world" is that a model needs to realize any facts that would be obvious to a person.

              Seems reasonable enough, but it is at risk of being too human-centric. So much of our cognitive machinery is suited for helping us navigate and actively engage the world. But intelligence need not be dependent on the ability to engage the world. Features of the world that are obvious to us need not be obvious to an AGI that never had surviving predators or locating food in its evolutionary past. This is why I find the ARC-AGI tasks off target. They're interesting, and it will say something important about these systems when they can solve them easily. But these tasks do not represent intelligence in the sense that we care about.

              >The fact that frontier models can easily be made to contradict themselves is proof enough to me that they cannot have any kind of sophisticated world model.

              This proves that an LLM does not operate with a single world model. But this shouldn't be surprising. LLMs are unusual beasts in the sense that the capabilities you get largely depend on how you prompt it. There is no single entity or persona operating within the LLM. It's more of a persona-builder. What model that persona engages with is largely down to how it segmented the training data for the purposes of maximizing its ability to accurately model the various personas represented in human text. The lack of consistency is inherent to its design.

              [1] https://news.wisc.edu/a-taste-of-vision-device-translates-fr...

              [2] https://www.psychologicalscience.org/observer/using-sound-to...

              [3] https://www.nature.com/articles/s41467-025-59342-9

          • homarp 16 hours ago

            and we can say that a bastardized version of the Sapir-Whorf hypothesis applies: what's in the training set shapes or limits the LLM's view of the world

            • moron4hire 10 hours ago

              Neither Sapir nor Whorf presented linguistic relativism as their own hypothesis, and they never published together. The effect, if it exists at all, is very weak, considering it doesn't reliably replicate.

              • homarp 6 hours ago

                I agree that's the pop name.

                Don't you think it replicates well for LLMs though?

        • ajross 16 hours ago

          > Knowing which parts-of-speech about sunrises appear together and where is not the same as understanding a sunrise

          What does "understanding a sunrise" mean though? Arguments like this end up resting on semantics or tautology, 100% of the time. Arguments of the form "what AI is really doing" likewise fail because we don't know what real brains are "really" doing either.

          I mean, if we knew how to model human language/reasoning/whatever we'd just do that. We don't, and we can't. The AI boosters are betting that whatever it is (that we don't understand!) is an emergent property of enough compute power and that all we need to do is keep cranking the data center construction engine. The AI pessimists, you among them, are mostly just arguing from ludditism: "this can't possibly work because I don't understand how it can".

          Who the hell knows, basically. We're at an interesting moment where technology and the theory behind it are hitting the wall at the same time. That's really rare[1]; generally you know how something works, and applying it is just a question of figuring out how to build a machine.

          [1] Another example might be some of the chemistry fumbling going on at the start of the industrial revolution. We knew how to smelt and cast metals at crazy scales well before we knew what was actually happening. Stuff like that.

          • pastel8739 9 hours ago

            Is it really so rare? I feel like I know of tons of fields where we have methods that work empirically but don’t understand all the theory. I’d actually argue that we don’t know what’s “actually” happening _ever_, but only have built enough understanding to do useful things.

            • ajross 8 hours ago

              I mean, most big changes in the tech base don't have that characteristic. Semiconductors require only 1920's physics to describe (and a ton of experimentation to figure out how to manufacture). The motor revolution of the early 1900's was all built on well-settled thermodynamics (chemistry lagged a bit, but you don't need a lot of chemical theory to burn stuff). Maxwell's electrodynamics explained all of industrial electrification but predated it by 50 years, etc...

              • skydhash 8 hours ago

                Those big changes always happen because someone presented a simpler model that explains things well enough that we can build on it. It's not like the raw materials for semiconductors weren't around.

                The technology around LLMs is fairly simple. What is not is the actual size of the data being ingested and the number of resulting factors (weights). We have a formula and the parameters to generate grammatically perfect text, but to obtain them, you need TBs of data to get GBs of numbers.

                In contrast, something like the Turing machine or Church's notation is pure genius: less than 100 pages of theorems that are one of the main pillars of the tech world.

      • overfeed 14 hours ago

        > Basically an LLM translating across languages will "light up" for the same concepts across languages

        Which is exactly what they are trained to do. Translation models wouldn't be functional if they were unable to correlate an input to specific outputs. That some hidden-layer neurons fire for the same concept shouldn't come as a surprise; it is a basic feature required for the core functionality.

        • balder1991 11 hours ago

          And if it is true that language is just the last step after the answer is already conceptualized, why do models perform differently in different languages? If it were just a matter of language, they’d have the same answers, just with broken grammar, no?

          • kaibee 9 hours ago

            If you suddenly had to do all your mental math in base-7, do you think you'd be just as fast and accurate as you are at math in base-10? Is that because you don't have an internal world-model of mathematics? or is it because language and world-model are dependently linked?

      • jhanschoo 10 hours ago

        Let's make this more concrete than talking about "understanding knowledge". Oftentimes I want to know something that cannot feasibly be arrived at by reasoning, only empirically. Remaining within the language domain, LLMs get so much more useful when they can search the web for news, or your codebase to know how it is organized. Similarly, you need a robot that can interact with the world and reason from newly collected empirical data in order to answer these empirical questions, if the work had not already been done previously.

        • skydhash 8 hours ago

          > LLMs get so much more useful when they can search the web for news, or your codebase to know how it is organized

          But their usefulness is only surface-deep. The news that matters to you is always deeply contextual; it's not only things labelled as breaking news or happening near you. The same goes for code organization. The reason is more human nature (how we think and learn) than machine optimization (the compiler usually doesn't care).

        • awesome_dude 10 hours ago

          I know the attributes of an apple, I know the attributes of a pear.

          As does a computer.

          But only I can bite into one and know without any doubt what it is and how it feels emotionally.

          • scrubs 8 hours ago

            You have half a point. "Without any doubt" is merely the apex of a huge undefined iceberg.

            I say half because eating is multimodal and consequential. The LLM can read the menu, but it didn't eat the meal. Even humans are bounded: feeling, licking, smelling, or eating the menu still is not eating the meal.

            There is an insuperable gap in the analogy... a gap between the concept and the sensory data behind it.

            Back to the first point: what one knows through that sensory data is not clear at present, and may not even be possible with LLMs.

            • awesome_dude 5 hours ago

              I'd add, also, how I feel about the taste.

          • zaphirplane 9 hours ago

            We've segued into consciousness and individuality.

      • vlovich123 10 hours ago

      If it were modeling the world you’d expect “give me a picture of a glass filled to the brim” to actually do that. Its inability to correctly and accurately combine concepts indicates it’s probably not building a model of the real world.

        • p1esk 8 hours ago

          I just gave ChatGPT this prompt - it produced a picture of a glass filled to the brim with water.

          • jdiff 7 hours ago

            Like most quirks that spread widely, it got a swiftly applied bandaid. This is also why they now know how many r's are in "strawberry." But we don't get any closer to useful general intelligence by cobbling together thousands of hasty patches.

    • godelski 17 hours ago

        > that to understand knowledge you have to have a model of the world.
      
      You have a small but important mistake: it's reciting (or even applying) knowledge that doesn't require a world model. Understanding actually does.

      Think of it this way: can you pass a test without understanding the test material? Certainly we all saw people we thought were idiots do well in class while we've also seen people we thought were geniuses fail. The test and understanding usually correlates but it's not perfect, right?

      The reason I say understanding requires a world model (and I would not say LLMs understand) is that to understand you have to be able to detail things. Look at physics, or the far more detail-oriented math. Physicists don't conclude things just off of experimental results. Those are an important part, but not the whole story. They also write equations, ones which are counterfactual. You can call this compression if you want (I would and do), but it's only that because of the generalization. And it only has that power because of the details and nuance.

      With AI, many of these people have been screaming for years (check my history) that what we're doing won't get us all the way there. Not because we want to stop the progress, but because we wanted to ensure continued and accelerating progress. We knew the limits and were saying "let's try to get ahead of this problem", but were told "that'll never be a problem. And if it is, we'll deal with it when we deal with it." It's why Chollet made the claim that LLMs have actually held AI progress back: because the story that was sold was "AGI is solved, we just need to scale" (i.e. more money).

      I do still wonder how different things would be if those of us pushing back were able to continue and scale our work (research isn't free, so yes, people did stop us). We always had the math to show that scale wasn't enough, but it's easy to say "you don't need math" when you can see progress. The math never said no progress, nor no acceleration; the math said there's a wall, and it's easier to adjust now than when we're closer and moving faster.

      Sadly I don't think we'll ever shift the money over. We still evaluate success weirdly; successful predictions don't matter. You're still heralded if you made a lot of money in VR and Bitcoin, right?

      • robotresearcher 16 hours ago

        In my view 'understand' is a folk psychology term that does not have a technical meaning. Like 'intelligent', 'beautiful', and 'interesting'. It usefully labels a basket of behaviors we see in others, and that is all it does.

        In this view, if a machine performs a task as well as a human, it understands it exactly as much as a human. There's no problem of how to do understanding, only how to do tasks. The 'problem' melts away when you take this stance.

        Just my opinion, but my professional opinion from thirty-plus years in AI.

        • dullcrisp 15 hours ago

          So my toaster understands toast and I don’t understand toast? Then why am I operating the toaster and not the other way around?

          • simondotau 14 hours ago

            A toaster cannot perform the task of making toast any more than an Allen key can perform the task of assembling flat pack furniture.

            • godelski 13 hours ago

              Let me understand, is your claim that a toaster can't toast bread because it cannot initiate the toasting through its own volition?

              Ignoring the silly wording, that is a very different thing than what robotresearcher said. And actually, in a weird way I agree. Though I disagree that a toaster can't toast bread.

              Let's take a step back. At what point is it me making the toast and not the toaster? Is it because I have to press the lever? We can automate that. Is it because I have to put my bread in? We can automate that. Is it because I have to have the desire to have toast and initiate the chain of events? How do you measure that?

              I'm certain that's different from measuring task success. And that's why I disagree with robotresearcher. The logic isn't self consistent.

              • simondotau 15 minutes ago

                > Though I disagree that a toaster can't toast bread.

                If a toaster can toast bread, then an Allen key can assemble furniture. Both of them can do these tasks in collaboration with a human. This human supplies the executive decision-making (what when where etc), supplies the tool with compatible parts (bread or bolts) and supplies the motivating force (mains electricity or rotational torque).

                The only difference is that it's more obviously ridiculous when it's an inanimate hunk of bent metal. Wait no, that could mean either of them. I mean the Allen key.

              • robotresearcher 13 hours ago

                You and the toaster made toast together. Like you and your shoes went for a walk.

                Not sure where you imagine my inconsistency is.

                • godelski 11 hours ago

                  That doesn't resolve the question.

                    > Not sure where you imagine my inconsistency is.
                  
                    >> Let's take a step back. At what point is it me making the toast and not the toaster? Is it because I have to press the lever? We can automate that. Is it because I have to put my bread in? We can automate that. Is it because I have to have the desire to have toast and initiate the chain of events? How do you measure that?
                  
                  You have a PhD and 30 years of experience, so I'm quite confident you are capable of adapting the topic of "making toast" to "playing chess", "doing physics", "programming", or any similar topic where we are benchmarking results.

                  Maybe I (and others?) have misunderstood your claim from the get-go? You seem to have implied that LLMs understand chess, physics, programming, etc. because of their performance. Yet now it seems your claim is that the LLM and I are doing those things together. If your claim is that an LLM understands programming the same way a toaster understands how to make toast, then we probably aren't disagreeing.

                  But if your claim is that an LLM understands programming because it can produce programs that yield a correct output on test cases, then what's the difference from the toaster? I put the prompts in and pushed the button to make it toast.

                  I'm not sure why you imagine the inconsistency is so difficult to see.

                  • simondotau 29 minutes ago

                    Declaring something as having "responsibility" implies some delegation of control. A normal toaster makes zero decisions, and as such it has no control over anything.

              • techblueberry 13 hours ago

                Does this mean an LLM doesn’t understand, but an LLM automated by a CRON Job does?

                • dullcrisp 12 hours ago

                  Just like a toaster with the lever jammed down, yes!

                • godelski 11 hours ago

                  I mean, that was the question I was asking... If it wasn't clear, my answer is no.

            • dullcrisp 13 hours ago

              This is contrary to my experience with toasters, but it doesn’t seem worth arguing about.

              • jrflowers 13 hours ago

                How does your toaster get the bread on its own?

                • dullcrisp 13 hours ago

                  It’s only responsible for the toasting part. The bread machine makes the bread.

                  • simondotau 42 minutes ago

                    What is your definition of "responsible"? The human is making literally all decisions and isn't abdicating responsibility for anything. The average toaster has literally one operational variable (cook time) and even that minuscule proto-responsibility is entirely on the human operator. All other aspects of the toaster's operation are decisions made by the toaster's human designer/engineer.

                  • jrflowers 11 hours ago

                    If the toaster is the thing that “performs the task of making toast”, what do you call it when a human gets bread and puts it in a toaster?

                    • dullcrisp 9 hours ago

                      I guess we could call it delegation?

                      • jrflowers 9 hours ago

                        “Hey man, I’m delegating. Want a slice?”

                        • Terr_ 3 hours ago

                          Seems more like dependency injection. :p

                • recursive 13 hours ago

                  How do you get bread? Don't tell me you got it at the market. That's just paying someone else to get it for you.

        • ssivark 7 hours ago

          > In this view, if a machine performs a task as well as a human, it understands it exactly as much as a human. There's no problem of how to do understanding, only how to do tasks.

          Yes, but you also gloss over what a "task" is or what a "benchmark" is (which has to do with the meaning of generalization).

          Suppose an AI or human answers 7 questions correctly out of 10 on an ICPC problem set; what are we able to infer from that?

          1. Is the task equal to answering these 10 questions well, with a uniform measure of importance?

          2. Is the task to be good at competitive programming problems?

          3. Is the task to be good at coding?

          4. Is the task to be good at problem solving?

          5. Is the task not just to be effective under a uniform measure of importance, but an adversarial measure? (i.e. you can probably figure out all kinds of competitive programming questions, if you had more time / etc... but roughly not needing "exponentially more resources")

          These are very different levels of abstraction, and literally the same benchmark result can be interpreted to mean very different things. And that imputation of generality is not objective unless we know the mechanism by which it happens. "Understanding" is shorthand for saying that performance generalizes at one of the higher levels of abstraction (3 to 5), rather than being narrow success, because that is what we expect of a human.

          • simianwords 2 hours ago

            How do you quantify generality? If we have a benchmark that can quantify it, and that benchmark reliably tells us that the LLM is within human levels of generalisation, then the LLM is not distinguishable from a human.

            While it’s a good point that we need to benchmark generalisation ability, you have in fact agreed that it is not important to understand the underlying mechanics.

        • compass_copium 16 hours ago

          Nonsense.

          A QC operator may be able to carry out a test with as much accuracy as (or, with enough practice, perhaps better accuracy than) the PhD-level chemist who developed it. They could plausibly do so with a high school education and not be able to explain the test in any detail. They do not understand the test in the same way the chemist does.

          If 'understand' is a meaningless term to someone who's spent 30 years in AI research, I understand why LLMs are being sold and hyped in the way they are.

          • robotresearcher 16 hours ago

            > They do not understand the test in the same way as the chemist.

            Can you explain precisely what 'understand' means here, without using the word 'understand'? I don't think anyone can.

            • bandrami 15 hours ago

              Not to be flippant, but have you considered that that question is an entire branch of philosophy with a several-millennia-long history, which some people spend their entire lives studying?

              • robotresearcher 15 hours ago

                I have. It robustly has the folk-psychological meaning I mentioned in my first sentence. Call it ‘philosophical’ instead of ‘folk-psychological’ if you like. It’s a useful concept. But the concept doesn’t require AI engineers to do anything. It certainly doesn’t give AI engineers any hints about what they should actually do.

                “Make it understand.”

                “How? What does that look like?”

                “… But it needs to understand…”

                “It answers your questions.”

                “But it doesn’t understand.”

                “Ok. Get back to me when that entails anything.”

                • mommys_little 14 hours ago

                  I would say it understands if, given many variations of a problem statement, it always gives the correct answer without fail. I have this complicated mirror question that only Deepseek and qwen3-max got right every time; then again, they've only answered it about a dozen times each, so we're left with high probability, I guess.
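
                  For what it's worth, a dozen clean answers still leaves a lot of room. A quick back-of-the-envelope sketch (Python just for illustration; n=12 is my rough trial count from above):

                      n = 12                       # trials, all answered correctly
                      rule_of_three = 3 / n        # ~25%: quick 95% upper bound on the failure rate
                      exact = 1 - 0.05 ** (1 / n)  # ~22%: exact bound, solving (1 - p)^n = 0.05
                      print(f"rule of three: {rule_of_three:.0%}, exact: {exact:.0%}")

                  So "right every time, a dozen times" is still consistent with a true failure rate as high as roughly one in five.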

                  • godelski 13 hours ago

                    I disagree with robotresearcher, but I think this is also an absurd definition. By that definition no human, and no creature, understands anything. Not just because humans, including experts, make mistakes, but because I'd say the definition is impossible to satisfy: you need infinite precision and infinite variation.

                    It turns "understanding" into a binary condition. Robotresearcher's does too, but I'm sure they would refine it by saying that the level of understanding is directly proportional to task performance. But I still don't know how they'll address the issue of coverage, as ensuring tests have complete coverage is far from trivial (even harder when you want to distinguish generalization from memorization of the training set).

                    I think you're right in trying to differentiate memorization from generalization, but your way of measuring this is not robust enough. A fundamental point where I disagree with them is that memorization is not the same as understanding.

                  • Zarathruster 9 hours ago

                    Isn't this just a reformulation of the Turing Test, with all the problems it entails?

                • robomartin 13 hours ago

                  I have been thinking about this for years, probably two decades. The answer to your question, or the definition, is, as I'm sure you know, rather difficult. I don't think it is impossible, but there's a risk of diving into a deep dark pit of philosophical thought going back to at least the ancient Greeks.

                  And, if we did go through that exercise, I doubt we can come out of it with a canonical definition of understanding.

                  I was really excited about LLMs as they surfaced and developed. I fully embraced the technology and have been using it extensively, with full top-tier subscriptions to most services. My conclusion so far: If you want to destroy your business, adopt LLMs with gusto.

                  I know that's a statement that goes way against the train ride we are on at this very moment. That's not to say LLMs are not useful. They are. Very much so. The problem is...well...they don't understand. And here I am, back in a circular argument.

                  I can define understanding with the "I know it when I see it" meme. And, frankly, it does apply. Yet, that's not a definition. We've all experienced that stare when talking to someone who does not have sufficient depth of understanding in a topic. Some of us have experienced people running teams who should not be in that position because they don't have a clue, they don't understand enough of it to be effective at what they do.

                  And yet, I still have not defined "understanding".

                  Well, it's hard. And I am not a philosopher, I am an engineer working in robotics, AI and applications to real time video processing.

                  I have written about my experiments using LLM coding tools (I refuse to call them AI, they are NOT intelligent; yes, need to define that as well).

                  In that context, lack of understanding is clearly evident when an LLM utterly destroys your codebase by adding dozens of irrelevant and unnecessary tests, randomly changes variable names as you navigate the development workflow, adds modules like a drunken high school coder and takes you down tangents that would make for great comedy if I were a tech comedian.

                  LLMs do not understand. They are fancy --and quite useful-- auto-complete engines and that's about it. Other than that, buyer beware.

                  The experiments I ran, some of them spanning three months of LLM-collaborative coding at various levels --from very hands-on to "let Jesus drive the car"-- conclusively demonstrated (at least to me) that:

                  1- No company should allow anyone to use LLMs unless they have enough domain expertise to be able to fully evaluate the output. And you should require that they fully evaluate and verify the work product before using it for anything; email, code, marketing, etc.

                  2- No company should trust anything coming out of an LLM, not one bit. Because, well, they don't understand. I recently tried to use the United Airlines LLM agent to change a flight. It was a combination of tragic and hilarious. Now, I know what's going on. I cannot possibly imagine the wild rides this thing is taking non-techies on every day. It's shit. It does not understand. It isn't isolated to United Airlines; it's everywhere LLMs are being used. The potential for great damage is always there.

                  3- They can be great for summarization tasks. For example, you can have them help you dive deep into a 300-page AMD/Xilinx FPGA datasheet or application note and help you get mentally situated. They can be great at helping you find prior art for patents. Yet, still, because they are mindless parrots, you should not trust any of it.

                  4- Nobody should give LLMs broad access to a non-trivial codebase. This is almost guaranteed to cause destruction and hidden future effects. In my experiments I have experienced an LLM breaking unrelated code that worked just fine --in some cases fully erasing the code without telling you. Ten commits later you discover that your network stack doesn't work or isn't even there. Or, you might discover that the stack is there but the LLM changed class, variable or method names, maybe even data structures. It's a mindless parrot.

                  I could go on.

                  One response to this could be "Well, idiot, you need better prompts!". That, of course, assumes that part of my experimentation did not include testing prompts of varying complexity and length. I found that for some tasks, you get better results by explaining what you want and then asking the LLM to write a prompt to get that result. You check that prompt, modify if necessary and, from my experience, you are likely to get better results.

                  Of course, the reply to "you need better prompts" is easy: If the LLM understood, prompt quality would not be a problem at all and pages-long prompts would not be necessary. I should not have to specify that existing class, variable and method names should not be modified. Or that interfaces should be protected. Or that data structures need not be modified without reason and unless approved by me. Etc.

                  It reminds me of a project I was given when I was a young engineer barely out of university. My boss, the VP of Engineering where I worked, needed me to design a custom device. Think of it as a specialized high speed data router with multiple sources, destinations and a software layer to control it all. I had to design the electronics, circuit boards, mechanical and write all the software. The project had a budget of nearly a million dollars.

                  He brought me into his office and handed me a single sheet of paper with a top-level functional diagram. Inputs, outputs, interfaces. We had a half-hour discussion about objectives and the required timeline. He asked me if I could get it done. I said yes.

                  He checked in with me every three months or so. I never needed anything more than that single piece of paper and the short initial conversation because I understood what we needed, what he wanted, how that related to our other systems, available technology, my own capabilities and failings, available tools, etc. It took me a year to deliver. It worked out of the box.

                  You cannot do that with LLMs because they don't understand anything at all. They mimic what some might confuse for understanding, but they do not.

                  And, yet, once again, I have not defined the term. I think everyone reading this who has used LLMs to a non-trivial depth...well...understands what I mean.

                  • dasil003 7 hours ago

                    > We've all experienced that stare when talking to someone who does not have sufficient depth of understanding in a topic.

                    I think you're really putting your finger on something here. LLMs have blown us away because they can interact with language in a very similar way to humans, and in fact it approximates how humans operate in many contexts when they lack a depth of understanding. Computers never could do this before, so it's impressive and novel. But despite how impressive it is, humans who were operating this way were never actually generating significant value. We may have pretended they were for social reasons, and there may even have been some real value associated with the human camaraderie and connections they were a part of, but certainly it is not of value when automated.

                    Prior to LLMs, just being able to read and write code at a pretty basic level was deemed an employable skill. But because it was not a natural skill for lots of humans, it was also a market for lemons, and basic coding was overvalued by those who did not actually understand it. But of course the real value of coding has always been to create systems that serve human outcomes, and the outcomes that are desired are always driven by human concerns that are probably inscrutable to something without the same wetware as us. Hell, it's hard enough for humans to understand each other half the time, but even when we don't fully understand each other, the information conveyed through non-verbal cues, and the familiarity with personalities and connotations that we only learn through extended interaction, has a robust baseline which text alone can never capture.

                    When I think about strategic technology decisions I've been involved with in large tech companies, things are often shaped by high-level choices that come from 5 or 6 different teams, each of which cannot be effectively distilled without deep domain expertise, and which ultimately can only be translated into a working system by expert engineers and analysts who communicate in an extremely high-bandwidth fashion, relying on mutual trust and applying a robust theory of mind every step of the way. Such collaborators can not only understand distilled expert statements about domains they don't have direct detailed knowledge of, but can also make such distilled statements themselves and confirm sufficient understanding with a cross-domain peer.

                    I still think there's a ton of utility to be squeezed out of LLMs as we learn how to harness and feed them context most effectively, and they are likely to revolutionize the way programming is done day-to-day, but I don't believe we are anywhere near AGI or anything else that will replace the value of what a solid senior engineer brings to the table.

                    • robomartin 7 hours ago

                      I am not liking the term "AGI". I think intelligence and understanding are very different things and they are both required to build a useful tool that we can trust.

                      To use an image that might be familiar to lots of people reading this, the Sheldon character in Big Bang Theory is very intelligent about lots of fields of study and yet lacks tons of understanding about many things, particularly social interaction, the human impact of decisions, etc. Intelligence alone (AGI) isn't the solution we should be after. Nice buzz word, but not the solution we need. This should not be the objective at the top of the hill.

                      • godelski 5 hours ago

                        I've always distinguished knowledge, intelligence, and wisdom. Knowledge is knowing a chair is a seat. Intelligence is being able to use a log as a chair. Wisdom is knowing the log chair will be more comfortable if I turn it around and that sometimes it's more comfortable to sit on the ground and use the log as fuel for the fire.

                        But I'm not going to say I was the first to distinguish those words. That'd be silly. They're 3 different words and we use them differently. We all know Sheldon is smart but he isn't very wise.

                        As for AGI, I'm not so sure my issue is with the label but more with the insistence that it is so easy and straightforward to understand. It isn't very wise to think the answer to a question people have pondered for millennia is trivial. That just seems egotistical. Especially when thinking your answer is so obviously correct that you needn't bother trying to see if you might be wrong. Even though Don Quixote didn't test his armor a second time, he had the foresight to test it once.

          • godelski 15 hours ago

              > If 'understand' is a meaningless term to someone who's spent 30 years in AI research, I understand why LLMs are being sold and hyped in the way they are.
            
            I don't have quite as much time as robotresearcher, but I've heard their sentiment frequently.

            I've been to conferences and talked with people at the top of the field (I'm "junior", but published and have a PhD) where, when asking deeper questions, I'll frequently get the response "I just care if it works." As if that weren't the motivation for my questions too.

            But I'll also tell you that there are plenty of us who don't subscribe to those beliefs. There's a wide breadth of opinions, even if one set is large and loud. (We are getting louder though.) I do think we can get to AGI, and I do think we can figure out what words like "understand" truly mean (with both accuracy and precision, the latter being what's more lacking). But it is also hard to navigate, because we're discouraged from this work and little funding flows our way (I hope that as we get louder we'll be able to explore more, but I fear we may switch from one railroad to the next). The weirdest part to me has been that even in the research space, talking to peers, discussing flaws or limits is treated as dismissal. I thought our whole job was to find the limits, explore them, and find ways to resolve them.

            The way I see it now is that the field uses the duck test. If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck. The problem is people are replacing "probably" with "is". The duck test is great, and right now we don't have anything much better. But the part that is insane is to call it perfect. Certainly, as someone who isn't an ornithologist, I'm not going to be able to tell a sophisticated artificial duck from a real one. But its ability to fool me doesn't make it real. And that's exactly why it would be foolish to s/probably/is.

            So while I think you're understanding correctly, I just want to caution throwing the baby out with the bathwater. The majority of us dissenting from the hype train and "scale is all you need" don't believe humans are magic and operating outside the laws of physics. Unless this is a false assumption, artificial life is certainly possible. The question is just about when and how. I think we still have a ways to go. I think we should be exploring a wide breadth of ideas. I just don't think we should put all our eggs in one basket, especially if there's clear holes in it.

            [Side note]: An interesting relationship I've noticed is that the hype train people tend to have a full CS pedigree while dissenters have mixed (and typically start in something like math or physics and make their way to CS). It's a weak correlation, but I've found it interesting.

            • hodgehog11 14 hours ago

              As a mathematician who also regularly publishes in these conferences, I am a little surprised to hear your take; your experience might be slightly different to mine.

              Identifying limitations of LLMs in the context of "it's not AGI yet because X" is huge right now; it gets massive funding, taking away from other things like SciML and uncertainty analyses. I will agree that deep learning theory in the sense of foundational mathematical theory to develop internal understanding (with limited appeal to numerics) is in the roughest state it has even been in. My first impression there is that the toolbox has essentially run dry and we need something more to advance the field. My second impression is that empirical researchers in LLMs are mostly junior and significantly less critical of their own work and the work of others, but I digress.

              I also disagree that we are disincentivised from finding meaning behind the word "understanding" in the context of neural networks: if understanding is to build an internal world model, then quite a bit of work is going into that. Empirically, it would appear that they do build such models, almost by necessity.

              • godelski 11 hours ago

                Maybe given our different niches we interact with different people? But I'm uncertain, because I believe what I'm saying is highly visible. I forget which NeurIPS(?) conference it was where so many attendees were wearing "Scale is all you need" shirts.

                  > My first impression there is that the toolbox has essentially run dry and we need something more to advance the field
                
                This is my impression too. Empirical evidence is a great tool and useful, especially when there is no strong theory to provide direction, but it is limited.

                  > My second impression is that empirical researchers in LLMs are mostly junior and significantly less critical of their own work and the work of others
                
                But this is not my impression. I see this from many prominent researchers. Maybe they claim SIAYN in jest, but then they should come out and say so instead of doubling down. If we take them at their word (and I do), robotresearcher is not a junior (please read their comments; they are illustrative of my experience. I'm just arguing back far more than I would in person). I've also seen members of audiences at talks ask questions like mine ("are benchmarks sufficient to make such claims?") and get responses of "we just care that it works." Again, I think this is a non-answer to the question. But it being taken as a sufficient answer, especially in response to peers, is unacceptable. It almost always has no follow-up.

                I also do not believe these people are less critical. I've had several works struggle through publication even though my models, at a hundredth the size (and a millionth the data), could perform on par or even better. At face value, asks for "more datasets" and "more scale" are reasonable, yet they form a self-reinforcing paradigm that slows progress. It's like a corn farmer smugly asking why the neighboring soy bean farmer doesn't grow anything while the corn farmer is chopping all the soy bean stems in their infancy. It is a fine ask of big labs with big money, but it is just gatekeeping and lazy evaluation when applied to anyone else. Even at CVPR this last year they passed out "GPU Rich" and "GPU Poor" hats, so I thought the situation was well known.

                  > if understanding is to build an internal world model, then quite a bit of work is going into that. Empirically, it would appear that they do, almost by necessity.
                
                I agree a "lot of work is going into it" but I also think the approaches are narrow and still benchmark chasing. I saw as well was given the aforementioned responses at workshops on world modeling (as well as a few presenters who gave very different and more complex answers or "it's the best we got right now", but nether seemed to confident in claiming "world model" either).

                But I'm a bit surprised that as a mathematician you think these systems create world models. While I see some generalization, it is also impossible for me to distinguish it from memorization. We're processing more data than can be scrutinized. We also seem to frequently uncover major limitations in our de-duplication processes[0]. We are definitely abusing the terms "Out of Distribution" and "Zero shot". I don't know how any person working with a proprietary LLM (or large model) that they don't own can make a claim of "zero shot" or even "few shot" capabilities. We're publishing papers left and right, yet it's absurd to claim {zero,few}-shot when we don't have access to the training distribution. We've conflated these terms with biased sampling. Was the data not in training, or is it just a low-likelihood region of the model? They're indistinguishable without access to the original distribution.
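
                From the outside, the best anyone can do is a proxy, something like a loss-based membership-inference probe: score how unsurprised the model is by a supposedly novel item. A rough sketch (the model name and texts are placeholders; a score gap hints at contamination but never proves it):

                  import torch
                  from transformers import AutoModelForCausalLM, AutoTokenizer

                  name = "gpt2"  # placeholder; any open-weights causal LM with accessible logits works
                  tok = AutoTokenizer.from_pretrained(name)
                  model = AutoModelForCausalLM.from_pretrained(name).eval()

                  def avg_nll(text: str) -> float:
                      # Mean per-token negative log-likelihood; lower = less surprising.
                      ids = tok(text, return_tensors="pt").input_ids
                      with torch.no_grad():
                          return model(ids, labels=ids).loss.item()

                  novel = "A 'brand new' benchmark question ..."             # hypothetical item
                  control = "The same question, reworded from scratch."      # hypothetical control
                  print(avg_nll(novel), avg_nll(control))

                And that's exactly the point: a suspiciously low score is indistinguishable from a genuinely low-loss region, because we can't look at the training set.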

                Idk, I think our scaling is just making the problem harder to evaluate. I don't want to stop that camp, because they are clearly producing things of value, but I do want that camp to not make claims beyond their evidence. It just makes the discussion more convoluted. I mean, the argument would be different if we were discussing small and closed worlds, but we're not. The claims are that we've created world models, yet many of them are not self-consistent. Certainly that is a requirement. I admit we're making progress, but the claims were made years ago. Take GameNGen[1] or Diamond Diffusion[2]. Neither was the first and neither was self-consistent. Though both are also impressive.

                [0] as an example: https://arxiv.org/abs/2303.09540

                [1] https://news.ycombinator.com/item?id=41375548

                [2] https://news.ycombinator.com/item?id=41826402

                • hodgehog11 6 hours ago

                  Apologies if I ramble a bit here, this was typed in a bit of a hurry. Hopefully I answer some of your points.

                  First, regarding robotresearcher and simondota's comments, I am largely in agreement with what they say here. The "toaster" argument is a variant of the Chinese Room argument, and there is a standard rebuttal here. The toaster does not act independently of the human so it is not a closed system. The system as a whole, which includes the human, does understand toast. To me, this is different from the other examples you mention because the machine was not given a list of explicit instructions. (I'm no philosopher though so others can do a better job of explaining this). I don't feel that this is an argument for why LLMs "understand", but rather why the concept of "understanding" is irrelevant without an appropriate definition and context. Since we can't even agree on what constitutes understanding, it isn't productive to frame things in those terms. I guess that's where my maths background comes in, as I dislike the ambiguity of it all.

                  My "mostly junior" comment is partially in jest, but mostly comes from the fact that LLM and diffusion model research is a popular stream for moving into big tech. There are plenty of senior people in these fields too, but many reviewers in those fields are junior.

                  > I've also seen members of audiences to talks where people ask questions like mine ("are benchmarks sufficient to make such claims?") with responses of "we just care that it works."

                  This is a tremendous pain point for me, more than I can convey here, but it's not unusual in computer science. Bad researchers will live and die on standard benchmarks. By the way, if you try to focus on another metric under the argument that the benchmarks are not wholly representative of a particular task, expect to get roasted by reviewers. Everyone knows it is easier to just do benchmark chasing.

                  > I also do not believe these people are less critical.

                  I think the fact that the "we just care that it works" argument is enough to get published is a good demonstration of what I'm talking about. If "more datasets" and "more scale" are the major criticisms you are getting, then you are still working in a more fortunate field. And yes, I hate it as much as you do, as it favors the GPU rich, but those asks are at least potentially solvable. The easiest papers of mine to get through were methodological and often got these kinds of comments. Theory and SciML papers are an entirely different beast in my experience, because you will rarely get reviewers who understand the material or care about its relevance. People in LLM research thought that the average NeurIPS score in the last round was a 5. Those in theory thought it was a 4. These proportions feel reflected in the recent conferences. I have to really go looking for something outside the LLM mainstream, while there was a huge variety of work only a few years ago. Some of my colleagues have noticed this as well and have switched out of scientific work. This isn't unnatural or something to actively try to fix, as ML goes through these hype phases (in the 2000s, it was all kernels, as I understand it).

                  > approaches are narrow and still benchmark chasing

                  > as a mathematician you think these systems create world models

                  When I say "world model", I'm not talking about outputs or what you can get through pure inference. Training models to perform next frame prediction and looking at inconsistencies in the output tells us little about the internal mechanism. I'm talking about appropriate representations in a multimodal model. When it reads a given frame, is it pulling apart features in a way that a human would? We've known for a long time that embeddings appropriately encode relationships between words and phrases. This is a model of the world as expressed through language. The same thing happens for images at scale as can be seen in interpretable ViT models. We know from the theory that for next frame prediction, better data and more scaling improves performance. I agree that isn't very interesting though.

                  > We are definitely abusing the terms "Out of Distribution" and "Zero shot".

                  Absolutely in agreement with everything you have said. These are not concepts that should be talked about in the context of "understanding", especially at scale.

                  > I think our scaling is just making the problem harder to evaluate.

                  Yes and no. It's clear that whatever approach we will use to gauge internal understanding needs to work at scale. Some methods only work with sufficient scale. But we know that completely black-box approaches don't work, because if they did, we could use them on humans and other animals.

                  > The claims are we've created world models yet many of them are not self-consistent.

                  For this definition of world model, I see this the same way as how we used to have "language models" with poor memory. I conjecture this is more an issue of alignment than a lack of appropriate representations of internal features, but I could be totally wrong on this.

                  • godelski 4 hours ago

                      > The toaster does not act independently of the human so it is not a closed system
                    
                    I think you're mistaken. No, not at that, at the premise. I think everyone agrees here. Where you're mistaken is that when I log in to Claude it says "How can I help you today?"

                    No one is thinking that the toaster understands things. We're using it to point out how silly the claim of "task performance == understanding" is. Techblueberry furthered this by asking if the toaster suddenly becomes intelligent when you wrap it with a cron job. My point was about where the line is drawn. The turning on of the toaster? No, that would be silly, and you clearly agree. So you have to answer why the toaster isn't understanding toast. That's the ask. Because clearly the toaster toasts bread.

                    You and robotresearcher have still avoided answering this question. It seems dumb, but that is the crux of the problem. The LLM is claimed to be understanding, right? It meets your claims of task performance. But LLMs are still tools. They cannot act independently. I still have to prompt them. At an abstract level this is no different from the toaster. So, at what point does the toaster understand how to toast? You claim it doesn't, and I agree. You claim it doesn't because a human has to interact with it. I'm just saying that looping agents onto themselves doesn't magically make them intelligent. Just like how I can automate the whole process from planting the wheat to toasting the toast.

                    You're a mathematician. All I'm asking is that you abstract this out a bit and follow the logic. Clearly even our automated seed-to-buttered-toast-on-a-plate machine need not have understanding.

                    From my physics (and engineering) background there's a key thing I've learned: all measurements are proxies. This is no different. We don't have to worry about this detail in most everyday things because we're typically pretty good at measuring. But if you ever need to do something with precision, it becomes abundantly obvious. You even use this same methodology in math all the time, though I wouldn't say it is equivalent to taking a hard problem, creating an isomorphic map to an easier problem, solving it, then mapping back; there's an injective nature to it. A ruler doesn't measure distance; a ruler is a reference to distance. A laser range finder doesn't measure distance either; it is a photodetector and a timer. There is nothing in the world that you can measure directly. If we cannot do this with physical things, it seems pretty silly to think we can do it with abstract concepts that we can't create robust definitions for. It's not like we've directly measured the Higgs either. But what, do you think entropy is actually a measurement of intelligible speech? Is perplexity a good tool for identifying an entropy minimizer, or does it just correlate? Is FID a measurement of fidelity, or are we just using a useful proxy? I'm sorry, but I just don't think there are precise mathematical descriptions of things like natural English language or realistic human faces. I've developed some of the best vision models out there, and I can tell you that you have to read more than the paper, because while they will produce fantastic images they also produce some pretty horrendous ones. The fact that they statistically generate realistic images does not imply that they actually understand them.

                      > I'm no philosopher
                    
                    Why not? It sounds like you are. Do you not think about metamathematics? What math means? Do you not think about math beyond the computation? If you do, I'd call you a philosopher. There's a P in a PhD for a reason. We're not supposed to be automata. We're not supposed to be machine men, with machine minds, and machine hearts.

                      > This is a tremendous pain point ... researchers will live and die on standard benchmarks.
                    
                    It is a pain we share. I see it outside CS as well, but I was shocked to see the difference. Most of the other physicists and mathematicians I know that came over to CS were also surprised. And it isn't like physicists are known for their lack of egos lol

                      > then you are still working in a more fortunate field
                    
                    Oh, I've gotten the other comments too. That research never found publication, and at the end of the day I had to graduate. Though now it can be revisited. I was once surprised to find that I had saved a paper from Max Welling's group. My fellow reviewers were confident in their rejections, but since they admitted to not understanding differential equations the AC sided with me (maybe they could see Welling's name? I didn't know till months after). It barely got through a workshop, but should have been in the main proceedings.

                    So I guess I'm saying I share this frustration. It's part of the reason I talk strongly here. I understand why people shift gears. But I think there's a big difference between begrudgingly getting on the train because you need to publish to survive, and actively fueling it while shouting that all other trains are broken and can never be fixed. One train to rule them all? I guess CS people love their binaries.

                      > world model
                    
                    I agree that looking at outputs tells us little about their internal mechanisms. But proof isn't symmetric in difficulty either. A world model has to be consistent. I like vision because it gives us more clues in our evaluations, lets us evaluate beyond metrics. If we are seeing video from a POV perspective, and we see a wall in front of us, turn left, then turn back, we should still expect to see that wall, and the same one. A world model is a model beyond what is seen from the camera's view. A world model is a physics model. And I mean /a/ physics model, not "physics". There is no single physics model. Nor do I mean that a world model needs to have accurate physics. But it does need to make consistent and counterfactual predictions. Even the geocentric model is a world model (literally a model of worlds lol). The model of the world you have in your head is this. We don't close our eyes and conclude the wall in front of us will disappear. Someone may spin you around and you still won't do this, even if you have your coordinates wrong. The issue isn't so much memory as it is understanding that walls don't just appear and disappear. It is also understanding that this isn't always true about a cat.

                    I referenced the game engines because while they are impressive they are not self-consistent. Walls will disappear. An enemy shooting at you will sometimes disappear if you just stop looking at it. The world doesn't disappear when I close my eyes. A tree falling in a forest still creates acoustic vibrations in the air even if there is no one to hear it.

                    A world model is exactly that, a model of a world. It is a superset of a model of a camera view. It is a model of the things in the world and how they interact together, regardless of if they are visible or not. Accuracy isn't actually the defining feature here, though it is a strong hint, at least it is for poor world models.
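
                    The kind of probe I have in mind looks something like this. A pure sketch: world_model.step() is a hypothetical interface, not any real API, and pixel MSE is a crude stand-in for a scene-level comparison:

                      import numpy as np

                      def revisit_consistency(world_model, threshold=0.05):
                          # Look at a wall, look away, look back. A model of a world
                          # (not just of a camera feed) should show the same wall.
                          before = np.asarray(world_model.step("noop"))       # facing the wall
                          world_model.step("turn_left")                       # wall goes off-screen
                          after = np.asarray(world_model.step("turn_right"))  # look back
                          drift = float(np.mean((before - after) ** 2))
                          return drift < threshold

                    The neural game engines I mentioned fail exactly this kind of check.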

                    I know this last part is a bit more rambly and harder to convey. But I hope the intention came across.

            • robotresearcher 15 hours ago

              Intellectual caution is a good default.

              Having said that, can you name one functional difference between an AI that understands, and one that merely behaves correctly in its domain of expertise?

              As an example, how would a chess program that understands chess differ from one that is merely better at it than any human who ever lived?

              (Chess the formal game; not chess the cultural phenomenon)

              Some people don’t find the example satisfying, because they feel like chess is not the kind of thing where understanding pertains.

              I extend that feeling to more things.

              • godelski 12 hours ago

                  > any human who ever lived
                
                Is this falsifiable? Even restricting to those currently living? On what tests? In which way? Does the category of error matter?

                  > can you name one functional difference between an AI that understands, and one that merely behaves correctly in its domain of expertise?
                
                I'd argue you didn't understand the examples from my previous comment or the direct reply[0]. Does it become a duck as soon as you are able to trick an ornithologist? All ornithologists?

                But yes. Is it fair if I use Go instead of Chess? Game 4 with Lee Sedol seems an appropriate example.

                Vafa also has some good examples[1,2].

                But let's take an even more theoretical approach. Chess is technically solvable, since it is a finite, deterministic, perfect-information game: an optimal strategy exists from any valid state. The problem is that computing it is intractable, since the number of action-state pairs is so large. But the number of moves isn't the critical part here, so let's look at Tic-Tac-Toe. We can pretty easily program up a machine that will not lose. We can put all actions and states into a graph and fit that on a computer, no problem. Would you really say that the program understands Tic-Tac-Toe better than a human? I'm not sure we should even say it understands the game at all.
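
                To make that concrete: the entire never-lose "understanding" fits in a short exhaustive search. A sketch in Python, where plain minimax stands in for the action-state graph I described:

                  def minimax(board, player):
                      # board: 9-char string of "X", "O", " "; player: side to move.
                      # Returns +1 if X can force a win, -1 if O can, 0 for a draw.
                      lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
                               (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]
                      for a, b, c in lines:
                          if board[a] != " " and board[a] == board[b] == board[c]:
                              return 1 if board[a] == "X" else -1
                      moves = [i for i, s in enumerate(board) if s == " "]
                      if not moves:
                          return 0  # draw
                      nxt = "O" if player == "X" else "X"
                      return (max if player == "X" else min)(
                          minimax(board[:i] + player + board[i + 1:], nxt) for i in moves)

                  def best_move(board, player):
                      # Never loses, yet holds no concept of forks or corners.
                      nxt = "O" if player == "X" else "X"
                      moves = [i for i, s in enumerate(board) if s == " "]
                      pick = max if player == "X" else min
                      return pick(moves, key=lambda i: minimax(board[:i] + player + board[i + 1:], nxt))

                  print(best_move(" " * 9, "X"))  # 0: every opening draws under perfect play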

                I don't think the situation is resolved by changing to unsolved (or effectively unsolved) games. That's the point of the Heliocentric/Geocentric example. The Geocentric Model gave many accurate predictions, but I would find it surprising if you suggested an astronomer of that time, with deep expertise in the subject, understood the configuration of the solar system better than a modern child who understands Heliocentrism. Their model made accurate predictions, certainly more accurate ones than that child would make, but their model was wrong. It took quite a long time for Heliocentrism not just to be proven correct, but also to make better predictions than Geocentrism in all situations.

                So I see 2 critical problems here.

                1) The more accurate model[3] can be less developed, resulting in lower predictive capabilities despite being a much more accurate representation of the verifiable environment. Accuracy and precision are different, right?

                2) Test performance says nothing about coverage/generalization[4]. We can't prove our code is error free through test cases. We use them to bound our confidence (a very useful feature! I'm not against tests, but as you say, caution is good).

                In [0] I referenced Dyson; I'd appreciate it if you watched that short video (again, if it's been some time). How do you know you aren't making the same mistake Dyson almost did? The mistake he would have made had he not trusted Fermi? Remember, Fermi's predictions were accurate, and they even stood for years.

                If your answer is time, then I'm not convinced it is a sufficient explanation. It doesn't explain Fermi's "intuition" (understanding) and is just kicking the can down the road. You wouldn't be able to differentiate yourself from Dyson's mistake. So why not take caution?

                And to be clear, you are the one making the stronger claim: that "understanding" has a well-defined definition. My claim is that yours is insufficient. I'm not claiming I have an accurate and precise definition; my claim is that we need more work to get the precision. I believe your claim can be a useful abstraction (and certainly has been!), but there are more than enough problems that we shouldn't hold to it so tightly. To use it as "proof" is naive. It is equivalent to claiming your code is error-free because it passes all test cases.

                [0] https://news.ycombinator.com/item?id=45622156

                [1] https://arxiv.org/abs/2406.03689

                [2] https://arxiv.org/abs/2507.06952

                [3] Certainly placing the Earth at the center of the solar system (or universe!) is a larger error than placing the sun at the center of the solar system and failing to predict the tides or retrograde motion of Mercury.

                [4] This gets exceedingly complex as we start to differentiate from memorization. I'm not sure we need to dive into what the distance from some training data needs to be to make it a reasonable piece of test data, but that is a question that can't be ignored forever.

          • pennaMan 14 hours ago

            so your definition of "understand" is "able to develop the QC test (or explain tests already developed)"

            I hate to break it to you, but the LLMs can already do all 3 tasks you outlined

            It can be argued for all 3 actors in this example (the QC operator, the PhD chemist and the LLM) that they don't really "understand" anything and are iterating on pre-learned patterns in order to complete the tasks.

            Even the ground-breaking chemist researcher developing a new test can be reduced to iterating on the memorized fundamentals of chemistry using a lot of compute (of the meat kind).

            The mythical Understanding is just a form of "no true Scotsman"

        • lelandbatey 15 hours ago

          > if a machine performs a task as well as a human, it understands it exactly as much as a human.

          I think you're right, except that the ones judging "as well as a human" are in fact humans, and humans have expectations that expand beyond the specs. From the narrow perspective of engineering specifications or profit generated, a robot/AI may very well be exactly as understanding as a human. For the people which interact with those systems outside the money/specs/speeds & feeds, the AI/robot will always feel at least different compared to a person. And as long as it's different, there will always be room to un-falsifiably claim "this robot is worse in my opinion due to X/Y/Z difference."

        • godelski 15 hours ago

            > that does not have a technical meaning
          
          I don't think the definition is very refined, but I think we should be careful to differentiate that from useless or meaningless. I would say most definitions are accurate, but not precise.

          It's a hard problem, but we are making progress on it. We will probably get there, but it's going to end up being very nuanced, and already it is important to recognize that the word means different things in the vernacular and even in differing research domains. Words are overloaded, and I think we need to recognize this divergence and that we are gravely miscommunicating by assuming the definitions are obvious. I'm not sure why we don't do more to work together on this. In our field we seem to think we've got it all covered and don't need others. I don't get that.

            > In this view, if a machine performs a task as well as a human, it understands it exactly as much as a human.
          
          And I do not think this is accurate at all. I would not say my calculator understands math despite it being able to do math better than me. I can say the same thing about a lot of different things to which we don't attribute intelligence. I'm sorry, but the logic doesn't hold.

          Okay, you might take an out by saying the calculator can't do abstract math like I can, right? Well we're going to run into that same problem. You can't test your way out of it. We've known this in hard sciences like physics for centuries. It's why physicists do much more than just experiments.

          There's the classic story of Freeman Dyson speaking to Fermi, which is why so many know about the 4-parameter elephant[0], but the pattern repeats throughout the history of physics. Guess what? Dyson's calculations worked. They fit the model. They were accurate and made accurate predictions! Yet they were not correct. People didn't reject Galileo just because of the church; there were serious problems with his work too. Geocentrism made accurate predictions, including ones that Galileo's version of Heliocentrism couldn't. These historical misunderstandings are quite common, including things like how the average person understands Schrodinger's Cat. The cat isn't in a parallel universe, both dead and alive lol. It's just that we, outside the box, can't determine which. Oh, and no, information is lossy, there are injective functions, the universe could then still be deterministic yet we wouldn't be able to determine that (and my name comes into play).

          So idk, it seems like you're just oversimplifying as a means to sidestep the hard problem[1]. The lack of a good technical definition of understanding should tell us we need to determine one. It's obviously a hard thing to do since, well... we don't have one and people have been trying to solve it for thousands of years lol.

            > Just my opinion, but my professional opinion from thirty-plus years in AI.
          
          Maybe I don't have as many years as you, but I do have a PhD in CS (thesis on neural networks) and a degree in physics. I think it certainly qualifies as a professional opinion. But at the end of the day it isn't our pedigree that makes us right or wrong.

          [0] https://www.youtube.com/watch?v=hV41QEKiMlM

          [1] I'm perfectly fine tabling a hard problem and focusing on what's more approachable right now, but that's a different thing. We may follow a similar trajectory but I'm not going to say the path we didn't take is just an illusion. I'm not going to discourage others from trying to navigate it either. I'm just prioritizing. If they prove you right, then that's a nice feather in your hat, but I doubt it since people have tried that definition from the get go.

          • robotresearcher 13 hours ago

            > It's a hard problem

            So people say.

            I’m not sidestepping the Hard Problem. I am denying it head on. It’s not a trick or a dodge! It’s a considered stance.

            I'm denying that an idea that has historically resisted crisp definition, and that the Stanford Encyclopedia of Philosophy introduces as 'protean', needs to be taken seriously as an essential missing part of AI systems, until someone can explain why.

            In my view, the only value the Hard Problem has is to capture a feeling people have about intelligent systems. I contend that this feeling is an artifact of being a social ape, and it entails nothing about AI.

            • pastel8739 9 hours ago

              Regardless of whether you think understanding is important, it’s clear from this thread that a lot of people find understanding valuable. In order to trust an AI with decisions that affect people, people will want to believe that the AI “understands” the implications of its decisions, for whatever meaning of “understand” those people have in their head. So indeed I think it is important that AI researchers try to get their AIs to understand things, because it is important to the consumers that they do.

            • godelski 11 hours ago

              It's a sidestep if your stance doesn't address critiques.

                > needs to be taken seriously as an essential missing part of AI systems, until someone can explain why.
              
              Ignoring critiques is not the same as a lack of them.

              • Zarathruster 8 hours ago

                While I agree with you in the main, I also take seriously the "until someone can explain why" counterpoint.

                Though I agree with you that your calculator doesn't understand math, one might reasonably ask, "why should we care?" And yeah, if it's just a calculator, maybe we don't care. A calculator is useful to us irrespective of understanding.

                If we're to persuade anyone (if we are indeed right), we'll need to articulate a case for why understanding matters, with respect to AI. I think everyone gets this on an instinctual level- it wasn't long ago that LLMs suggested we add rocks to our salads to make them more crunchy. As long as these problems can be overcome by throwing more data and compute at them, people will remain incurious about the Understanding Problem. We need to make a rigorous case, probably with a good working alternative, and I haven't seen much action here.

                • godelski 4 hours ago

                    > "why should we care?"
                  
                  I'm not the one claiming that a calculator thinks. The burden of proof lies on those that do. Claims require evidence and extraordinary claims require extraordinary evidence.

                  I don't think anyone is saying that the calculator isn't a useful tool. But certainly we should push back when people are claiming it understands math and can replace all mathematicians.

                    > If we're to persuade anyone, we'll need to articulate a case for why understanding matters
                  
                  This is a more than fair point, though I have not had much luck being convincing when I've tried.

                  I'll say that a major motivating reason why I went into physics in the first place is that I found a deep understanding to be a far more efficient way of learning how to do things. I started as an engineer and even went back into engineering after my degree. Physics made me a better engineer, and I think a better engineer than had I stayed in engineering. Understanding gave me the ability not just to take building blocks and put them together, but to innovate. Being able to see things at a deeper level allowed me to come to solutions I otherwise could not have. Using math to describe things allowed me to iterate faster (just like how we use simulations). Understanding what the math meant allowed me to solve the problems where the equations no longer applied. It allowed me to know where the equations no longer applied. It told me how to find and derive new ones.

                  I often found that engineers took an approach of physical testing first, because "the math only gets you so far." But that was just a misunderstanding of how far their math could take them. It could do more; it's just that they hadn't been taught that. So maybe I had to take a few days working things out with pen and paper, but that was a cheaper and more robust solution than spending the same time testing and iterating.

                  Understanding is a superpower. Problems can be solved without understanding. A mechanic can fix an engine without knowing how it works. But they will certainly be able to fix more problems if they do. The reason to understand is because we want things to work. The problem is, the world isn't so simple that every problem is the same or very similar to another. A calculator is a great tool. It'll solve calculations all day. Much faster than me, with higher accuracy, but it'll never come up with an equation on its own. That isn't to call it useless, but I need to know this if I want to get things done. The more I understand what my calculator can and can't do, the better I can use that tool.

                  Understanding things, and the pursuit to understand more is what has brought humans to where they are today. I do not understand why this is even such a point of contention. Maybe the pursuit of physics didn't build a computer, but it is without a doubt what laid the foundation. We never could have done this had we not thought to understand lightning. We would have never been able to tame it like we have. Understanding allows us to experiment with what we cannot touch. It does not mean a complete understanding nor does it mean perfection, but it is more than just knowledge.

      • JKCalhoun 15 hours ago

        I'm not sure. There's a view, as I understand it, which suggests that language is intelligence. That language is a requirement for understanding.

        An example might be kind of the contrary—that you might not be able to hold an idea in your head until it has been named. For myself, until I heard the word gestalt (maybe a fitting example?) I am not sure I could have understood the concept. But when it was described it started to coalesce—and then, when named, it became real. (If that makes sense.)

        FWIW, Zeitgeist is another one of those concepts/words for me. I guess I have to thank the German language.

        Perhaps it is why other animals on this planet seem to us to lack intelligence. Perhaps it is their lack of complex language holding their minds back.

        • godelski 14 hours ago

            > There's a view that suggests that language is intelligence. 
          
          I think you find the limits when you dig in. What are you calling language? Can you really say that Eliza doesn't meet your criteria? What about a more advanced version? I mean we've been passing the Turing Test for decades now.

            > That language is a requirement for understanding.
          
          But this contradicts your earlier statement. If language is a requirement then it must precede intelligence, right?

          I think you must then revisit your definition of language and ensure that it matches to all the creatures that you consider intelligent. At least by doing this you'll make some falsifiable claims and can make progress. I think an ant is intelligent, but I also think ants do things far more sophisticated than the average person thinks. It's an easy trap, not knowing what you don't know. But if we do the above we get some path to aid in discovery, right?

            > that you might not be able to hold an idea in your head until it has been named
          
          Are you familiar with Anendophasia?

          It is the condition where a person does not have an internal monologue. They think without words. The definition of language is still flexible enough that you can probably still call that language, just like in your example, but it shows a lack of precision in the definition, even if it is accurate.

            > Perhaps it is why other animals on this planet seem to us lacking intelligence
          
          One thing to also consider is whether language is necessary for societies or intelligence. Can we decouple the two? I'm not aware of any great examples, although octopi and many other cephalopods are fairly asocial creatures. Yet they are considered highly intelligent due to their adaptive and creative nature.

          Perhaps language is a necessary condition for advanced intelligence, but not intelligence alone. Perhaps it is communication and societies, differentiating from an internalized language. Certainly the social group can play an influence here, as coalitions can do more than the sum of the individuals (by definition). But the big question is if these things are necessary. Getting the correct causal graph, removing the confounding variables, is no easy task. But I think we should still try and explore differing ideas. While I don't think you're right, I'll encourage you to pursue your path if you encourage me to pursue mine. We can compete, but it should be friendly, as our competition forces us to help see flaws in our models. Maybe the social element isn't a necessary condition, but I have no doubt that it is a beneficial tool. I'm more frustrated by those wanting to call the problem solved. It obviously isn't, as it's been so difficult to get generalization and consensus among experts (across fields).

          • the_gipsy 14 hours ago

            > It is the condition where a person does not have an internal monologue.

            These people are just nutjobs that misinterpreted what internal monologue means, and have trouble doing basic introspection.

            I know there are a myriad of similar conditions, aphantasia, synaesthesia, etc. But someone without an internal monologue simply could not function in our society, or at least could not pass as someone without obvious mental diminishment.

            If there really were some other, hidden code in the mind that could express "thoughts" with the same depth as language does, then please show it already. At least the tiniest bit of a hint.

            • godelski 12 hours ago

              I know some of these people. We've had deep conversations about what is going on in our thought processes. Their description significantly differs from mine.

              These people are common enough that you likely know some. It's just not a topic that frequently comes up.

              It is also a spectrum, not a binary thing (though full anendophasia does exist; it is just on the extreme end). I think your own experiences should allow you to doubt your claim. For example, I know that when I get really into a fiction book, I transition from reading the words in my head to seeing the scenes more like a movie, or more accurately like a dream. I talk to myself in my head a lot, but I can also think without words. I do this a lot when I'm thinking about more physical things, like when I'm machining something, building things, or even loading the dishwasher. So it is hard for me to believe that, while I primarily use an internal monologue, there aren't people who primarily use a different strategy.

              On top of that, well, I'm pretty certain my cat doesn't meow in her head. I'm not certain she has a language at all. So why would it be surprising that this condition exists? You'd have to assume there was a switch in human evolution, where it happened all at once or all the others went extinct. I find that less likely than the idea that we just don't talk enough about how we think with our friends.

              Certainly there are times where you think without a voice in your head. If not, well you're on the extreme other end. After all, we aren't clones. People are different, even if there's a lot of similarities.

              • the_gipsy 2 hours ago

                I suggest you revisit the subject with your friends, with two key points:

                1. Make it clear to them that with "internal monologue" you do not mean an actual audible hallucination

                2. Ask them if they EVER have imagined themselves or others saying or asking anything

                If they do, which they 100% will unless they lie, then you have ruled out "does not have an internal monologue"; the claim is now "does not use his internal monologue as much". You can keep probing them about what exactly that means, but it gets wishy-washy.

                Someone who truly does not have an internal dialogue could not do the most basic daily tasks. A person could grab a cookie from the table when they feel like it (oh, 🍪!), but they could not put on their shoes, grab their wallet and keys, look in the mirror to adjust their hair, and go to the supermarket to buy cookies. If there were another hidden code that could express all the mental state pulled in by "buy cookies", by now we would at least have a hint that it exists underneath. We must also ask: why would we constantly translate this into language if the mental state is already there? Translation costs processing power and slows things down. So why are these "no internal monologue" people not geniuses?

                I have no doubt that there is a spectrum, on that I agree with you. But the spectrum is "how present is (or how aware is the person of-) the internal monologue". E.g. some people have ADHD, others never get anxiety at all. "No internal monologue" is not one end of the spectrum for functioning adults.

                The cat actually proves my point. A cat can sit for a long time in front of a mouse-hole, or it can hide to jumpscare his brother cat, and so on. So to a very small degree there is something that lets it process ("understand") very basic, near-future events and action-reactions. However, a cat could not possibly go to the supermarket to buy food, setting aside anatomical obstacles, because it has no language and therefore cannot make a complex mental model. Fun fact: whenever animals (apes, birds) have been taught language, they never ask questions (some claim they did, but if you dig in you'll see that the interpretation is extremely dubious).

            • Mikhail_Edoshin 2 hours ago

              There is a book written by a woman who suffered a stroke. She lost the ability to speak and understand language, yet she remained conscious. It took her ten years to fully recover. The book is called "My Stroke of Insight".

              • the_gipsy 2 hours ago

                Conscious, like an animal or a baby. She could not function at all like a normal adult. Proves my point.

      • nebula8804 15 hours ago

        The only problem is that this time enough money is being burned that if AGI does not come, it will probably be extremely painful/fatal for a lot of people who had nothing to do with this field or the decisions being made. What will be the consequences if that comes to pass? So many lives were permanently ruined by the GFC.

      • munksbeer 15 hours ago

        > We always had the math to show that scale wasn't enough

        Math, to show that scale (presumably of LLMs) wasn't enough for AGI?

        This sounds like it would be quite a big deal, what math is that?

        • hodgehog11 14 hours ago

          As someone who is invested in researching said math, I can say with some confidence that it does not exist, or at least not in the form claimed here. That's the whole problem.

          I would be ecstatic if it did though, so if anyone has any examples or rebuttal, I would very much appreciate it.

      • naasking 16 hours ago

        > It's to recite (or even apply) knowledge. To understand does actually require a world model.

        This is a shell game, or a god of the gaps. All you're saying is that the models "understand" how to recite or apply knowledge or language, but somehow don't understand knowledge or language. Well what else is there really?

        • godelski 11 hours ago

            > Well what else is there really?
          
          Differentiate from memorization.

          I'd say there's a difference between a database and understanding. If they're the same, well I think Google created AGI a long time ago.

          • naasking 7 hours ago

            A database doesn't recite or apply knowledge, it stores knowledge.

            • godelski 3 hours ago

              It sure recites it when I query it

    • senko 2 hours ago

      The thing is, achieving say, 99.99999% reliable AI would be spectacularly useful even if it's a dead end from the AGI perspective.

      People routinely conflate the "useful LLMs" and "AGI", likely because AGI has been so hyped up, but you don't need AGI to have useful AI.

      It's like saying the Internet is dead end because it didn't lead to telepathy. It didn't, but it sure as hell is useful.

      It's beneficial to have both discussions: whether and how to achieve AGI and how to grapple with it, and how to improve the reliability, performance, and cost of LLMs for more prosaic use cases.

      It's just that they are separate discussions.

    • DanielHB an hour ago

      Problem is that these models feel like they're at 8 and getting more 8s

      (maybe 7)

    • Animats 5 hours ago

      > The interviewer had an idea that he took for granted: that to understand language you have to have a model of the world. LLMs seem to understand language therefore they've trained a model of the world. Sutton rejected the premise immediately. He might be right in being skeptical here.

      That's the basic success of LLMs. They don't have much of a model of the world, and they still work. "Attention is all you need". Good Old Fashioned AI was all about developing models, yet that was a dead end.

      There's been some progress on representation in an unexpected area. Try Perchance's AI character chat. It seems to be an ordinary chatbot. But at any point in the conversation, you can ask it to generate a picture, which it does using a Stable Diffusion type system. You can generate several pictures, and pick the one you like best. Then let the LLM continue the conversation from there.

      It works from a character sheet, which it will create if asked. It's possible to start from an image and get to a character sheet and a story. The back and forth between the visual and textual domains seems to help.

      For storytelling, such a system may need to generate the collateral materials needed for a stage or screen production: storyboards, scripts with stage directions, character summaries, artwork of sets, blocking (where everybody is positioned on stage), character sheets (poses and costumes), etc. Those are the modeling tools real productions use to keep a work created by many people on track. Those are a form of world model for storytelling.

      I've been amazed at how good the results I can get from this thing are. You have to coax it a bit. It tends to stay stuck in a scene unless you push the plot forward. But give it a hint of what happens next and it will run with it.

      [1] https://perchance.org/ai-character-chat

    • DrewADesign 9 hours ago

      I think current AI is a human language/behavior mirror. A cat might believe they see another cat looking in a mirror, but you can’t create a new cat by creating a perfect mirror.

    • theptip 8 hours ago

      > LLMs seem to udnerstand language therefore they've trained a model of the world.

      This isn’t the claim, obviously. LLMs seem to understand a lot more than just language. If you’ve worked with one for hundreds of hours actually exercising frontier capabilities I don’t see how you could think otherwise.

    • harrall 7 hours ago

      I don’t have a deep understanding of LLMs, but don’t they fundamentally work on tokens and generate a multi-dimensional statistical relationship map between tokens?

      So it doesn’t have to be an LLM. You could theoretically have image tokens (I don’t know how that works in practice, but the important part is the statistical map).

      And it’s not like my brain doesn’t work like that either. When I say a funny joke in response to people in a group, I can clearly observe my brain pull together related “tokens” (Mary just talked about X, X is related to Y, Y is relevant to Bob), filter them, sort them and then spit out a joke. And that happens in like less than a second.
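
      A toy sketch of that token-statistics idea in Python (a hypothetical toy only; real LLMs learn dense vector relationships via attention rather than raw counts, but the spirit of predicting the next token from observed statistics is the same):

        # Toy bigram "statistical relationship map" between tokens.
        from collections import Counter, defaultdict

        corpus = "the cat sat on the mat the cat ate the fish".split()

        # Count how often each token follows each other token.
        following = defaultdict(Counter)
        for prev, nxt in zip(corpus, corpus[1:]):
            following[prev][nxt] += 1

        def predict_next(token):
            """Most likely next token and its observed probability."""
            counts = following[token]
            word, n = counts.most_common(1)[0]
            return word, n / sum(counts.values())

        print(predict_next("the"))  # ('cat', 0.5): "cat" follows "the" half the time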

      • tacitusarc 7 hours ago

        Yes! Absolutely. And this is likely what would be necessary for anything approaching actual AGI. And not just visual input, but all kinds of sensory input. The problem is that we have no ability, not even close, to process that even near the level of a human yet, much less some super genius being.

    • bentt 16 hours ago

      I think this is a useful challenge to our normal way of thinking.

      At the same time, "the world" exists only in our imagination (per our brain). Therefore, if LLMs need a model of a world, and they're trained on the corpus of human knowledge (which passed through our brains), then what's the difference, especially when LLMs are going back into our brains anyway?

      • qlm 16 hours ago

        Language isn't thought. It's a representation of thought.

        • chasd00 16 hours ago

          Something to think about (hah!) is that there are people without an internal monologue, i.e. no voice inside their head that they use when working out a problem. So they're thinking and learning and doing what humans do just fine, with no little voice, no language, inside their head.

          • WJW 15 hours ago

            It's so weird that people literally seem to have a voice in their head they cannot control. For me personally my "train of thought" is a series of concepts, sometimes going as far as images. I can talk to myself in my head with language if I make a conscious effort to do so, just as I can breathe manually if I want. But if I don't, it's not really there like some people seem to have.

            Probably there are at least two groups of people and neither really comprehends how the other thinks haha.

            • graemefawcett 13 hours ago

              I think there are significantly more than 2, when you start to count variations through the spectrum of neurodiversity.

              Spatial thinkers, for example, or the hyperlexic.

              Meaning for hyperlexics is more akin to finding meaning in the edges of the graph, rather than the vertices. The form of language contributing a completely separate graph of knowledge, alongside its content, creating a rich, multimodal form of understanding.

              Spatial thinkers have difficulty with procedural thinking, which is how most people are taught. Rather than the series of steps to solve the problem, they see the shape of the transform. LLMs as an assistive device can be very useful for spatial thinkers in providing the translation layer between the modes of thought.

        • rhetocj23 15 hours ago

          It's very interesting to see how many people struggle to understand this.

        • CamperBob2 15 hours ago

          If it were that simple, LLMs wouldn't work at all.

          • qlm 3 hours ago

            I think it explains quite well why LLMs are useful in some ways but stupid in many other ways.

        • naasking 16 hours ago

          Are the particles that make up thoughts in our brain not also a representation of a thought? Isn't "thought" really some kind of Platonic ideal that only has approximate material representations? If so, why couldn't some language sentences be thoughts?

          • qlm 3 hours ago

            The sentence is the result of a thought. The sentence in itself does not capture every process that went into producing the sentence.

    • zwnow 5 hours ago

      A world model cannot exist; the context windows aren't anywhere near big enough for that. No wonder every serious scientist agrees that AGI won't be a thing in the coming decades. LLMs are good if you train them for a specific thing, not so much if you expect them to explain the whole world to you. This is not possible yet.

    • sysguest 18 hours ago

      yeah that "model of the world" would mean:

      babies are already born with "the model of the world"

      but a lot of experiments on babies/young kids tell otherwise

      • ekjhgkejhgk 17 hours ago

        > yeah that "model of the world" would mean: babies are already born with "the model of the world"

        No, not necessarily. Babies don't interact with the world only by reading what people wrote on Wikipedia and Stack Overflow, the way these models are trained. Babies do things to the world and observe what happens.

        I imagine it's similar to the difference between a person sitting on a bicycle and trying to ride it, vs a person watching videos of people riding bicycles.

        I think it would actually be a great experiment. If you take a person that never rode a bicycle in their life and feed them videos of people riding bicycles, and literature about bikes, fiction and non-fiction, at some point I'm sure they'll be able to talk about it like they have huge experience in riding bikes, but won't be able to ride one.

        • aerhardt 17 hours ago

          We’ve been thinking about reaching the singularity from one end, by making computers like humans, but too little thought has been given to approaching the problem from the other end: by making babies build their world model by reading Stack Overflow.

          • zelphirkalt 3 hours ago

            That's it. Now you've done it! I will have stackoverflow Q&A, as well as moderator comments and closings of questions playing 24/7 to my first not yet born child! Q&A for the knowledge and the mod comments for good behavior, of course. This will lead to singularity in no time!

          • pavlov 16 hours ago

            The “Brave New World meets OpenAI” model where bottle-born babies listen to Stack Overflow 24 hours a day until they one day graduate to Alphas who get to spend Worldcoin on AI-generated feelies.

      • godelski 17 hours ago

        It's a lot more complicated than that.

        You have instincts, right? Innate fears? This is definitely something passed down through genetics. The Hawk/Goose Effect isn't just limited to baby chickens. Certainly some mental encoding passes down through genetics, given how much the brain controls, down to your breathing and heartbeat.

        But instinct is basic. It's something humans are even able to override. It's a first-order approximation: inaccurate for meaningfully complex things, but sufficient to keep you alive. Maybe we don't want to call instinct a world model (it certainly is naïve), but it can't be discounted either.

        In human development, yeah, the lion's share of it happens post-birth. Human babies don't even show typical signs of consciousness until around the age of 2. There are many different categories of "awareness" and these certainly grow over time. But the big thing that makes humans so intelligent is that we continue to grow and learn through our whole lifetimes. And we can pass that information along without genetics, and we have very advanced tools to do this.

        It is a combination of nature and nurture. But do note that this happens differently in different animals. It's wonderfully complex. LLMs are quite incredible, but so too are many other non-thinking machines. I don't think we should throw them out, but we never needed to make the jump to calling them intelligent. Certainly not so quickly. I mean, what did Carl Sagan say?

        • imtringued 16 hours ago

          One of the biggest mysteries of humans vs. LLMs is that LLMs need an absurd amount of data during pre-training, then a little bit of data during fine-tuning to make them behave more human. Meanwhile humans don't need any data at all, but have the blind spot that they can only know and learn about what they have observed. This raises two questions. What is the equivalent of the supervised learning loss function? Supposedly neurons do predictive coding: they predict what their neighbours are doing. That includes input-only neurons like touch, pain, vision, sound, taste, etc. The observations never contain actions. E.g. you can look at another human, but that will never teach you how to walk, because your legs are different from other people's legs.

          How do humans avoid starving to death? How do they avoid leaving no children? How do they avoid eating food that will kill them?

          These things require a complicated chain of actions. You need to find food, a partner and you need to spit out poison.

          This means you need a reinforcement learning analogue, but what is going to be the reward function equivalent? The reward function can't be created by the brain, because it would be circular. It would be like giving yourself a high, without even needing drugs. Hence, the reward signal must remain inside the body but outside the brain, where the brain can't hack it.

          The first and most important reward is to perform reproduction. If food and partners are abundant, the ones that don't reproduce simply die out. This means that reward functions that don't reward reproduction disappear.

          Reproduction is costly in terms of energy. Do it too many times and you need to recover and eat. Hunger evolved as a result of the brain needing to know about the energy state of the body. It overrides reproductive instincts.

          Now let's say you have a poisonous plant that gives you diarrhea, but you are hungry. What stops you from eating it? Pain evolves as a response to a damaged body. Harmful activities signal themselves in the form of pain to the brain. Pain overrides hunger. However, what if the plant is so deadly that it will kill you? The pain sensors wouldn't be fast enough. You need to sense the poison before it enters your body. So the tongue evolves taste and cyanide starts tasting bitter.

          Notice something? The feelings only exist internally inside the human body, but they are all coupled with continued survival in one way or another. There is no such thing for robots or LLMs. They won't accidentally evolve a complex reward function like that.

          • godelski 13 hours ago

              > Meanwhile humans don't need any data at all
            
            I don't agree with this and I don't think any biologist or neuroscientist would either.

            1) Certainly the data I discussed exists. No creature comes out a blank slate. I'll be bold enough to say that this is true even for viruses, even if we don't consider them alive. Automata doesn't mean void of data and I'm not sure why you'd ascribe this to life or humans.

            2) Humans are processing data from birth (technically before, too, but that's not necessary for this conversation, and I think we all know that's a great way to have an argument and not address our current conversation). This is clearly some active/online/continual/reinforcement/whatever-word-you-want-to-use learning.

            It's weird to suggest an either-or situation. All evidence points to "both". Looking at different animals, we see both as well, just with different distributions.

            I think it's easy to oversimplify the problem, and the average conversation tends to do this. It's clearly a complex problem with many variables at play. We can't approximate it with any reasonable accuracy by ignoring variables or holding them constant. They're coupled.

              > The reward function can't be created by the brain, because it would be circular.
            
            Why not? I'm absolutely certain I can create my own objectives and own metrics. I'm certain my definition of success is different from yours.

              > It would be like giving yourself a high, without even needing drugs
            
            Which is entirely possible. Maybe it takes extreme training to do extreme versions but it's also not like chemicals like dopamine are constant. You definitely get a rush by completing goals. People become addicted to things like videogames, high risk activities like sky diving, or even arguing on the internet.

            Just because there are externally driven or influenced goals doesn't mean internal ones can't exist. Our emotions can be driven both externally and internally.

              > Notice something?
            
            You're using too simple a model. If you use this model then the solution is as easy as giving a robot self-preservation (even if we need to wait a few million years). But how would self-preservation evolve beyond its initial construction without the ability to metaprocess and refine that goal? So I think this should highlight a major limitation in your belief. As I see it, the only other way is a changing environment that somehow allows continued survival by the constructions and precisely evolves such that the original instructions continue to work. Even with vague instructions that's an unstable equilibrium. I think you'll find there's a million edge cases even if it seems obvious at first. Or read some Asimov ;)

      • ben_w 17 hours ago

        > babies are already born with "the model of the world"

        > but a lot of experiments on babies/young kids tell otherwise

        I believe they are born with such a model? It's just that model is one where mummy still has fur for the baby to cling on to? And where aged something like 5 to 8 it's somehow useful for us to build small enclosures to hide in, leading to a display of pillow forts in the modern world?

        • sysguest 17 hours ago

          damn I guess I had to be more specific:

          "LLM-level world-detail knowledge"

          • ben_w 17 hours ago

            I think I'm even more confused now about what you mean…

      • rwj 18 hours ago

        Lots of experiments show that babies develop important capabilities at roughly the same times. That speaks to inherited abilities.

    • xmichael909 8 hours ago

      love the intentional use of udnerstand, brilliant!

    • zaphos 8 hours ago

      "just a matter of adding more 9s" is a wild place to use a "just" ...

    • imtringued 16 hours ago

      Model based reinforcement learning is a thing and it is kind of a crazy idea. Look up temporal difference model predictive control.

      The fundamental idea behind temporal difference is that you can record any observable data stream over time and predict the difference between past and present based on your decision variables (e.g. camera movement, actuator movement, and so on). Think of it like the Minecraft clone called Oasis AI. The AI predicts the response to a user provided action.

      Now imagine if it worked as presented. The data problem would be solved, because you are receiving a constant stream of data every single second. If anything, the RL algorithms are nowhere near where they need to be and continual learning has not been solved yet, but the best-known way is through automatic continual learning à la Schmidhuber (co-inventor of LSTMs along with Hochreiter).

      So, model based control is solved right? Everything that can be observed can be controlled once you have a model!

      Wrong. Unfortunately. You still need the rest of reinforcement learning: an objective and a way to integrate the model. It turns out that reconstructing the observations is too computationally challenging and the standard computational tricks like U-Nets learn a latent representation that is optimized for reconstruction rather than for your RL objectives. There is a data exchange problem that can only realistically be solved by throwing an even bigger model at it, but here is why that won't work either:

      Model predictive control tries to find the best trajectory over a receding horizon. It is inherently future oriented. This means that you need to optimize through your big model and that is expensive to do.

      So you're going to have to take shortcuts by optimizing for a specific task. You reduce the dimension of the latent space and stop reconstructing the observations. The price? You are now learning a latent space for your particular task, which is less demanding. The dream of continual learning with infinite data shatters and you are brought down to earth: it's better than what came before, but not that much better.
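
      For readers unfamiliar with the temporal-difference idea underlying all of this, a minimal sketch in Python (tabular TD(0) value prediction, the textbook toy, not the fancier TD-MPC setup described above):

        # TD(0): nudge the value estimate of a state toward
        # reward + discounted estimate of the next state ("bootstrapping").
        alpha, gamma = 0.1, 0.9          # learning rate, discount factor
        V = {s: 0.0 for s in range(4)}   # value estimate per state

        # A toy trajectory of (state, reward, next_state) transitions.
        trajectory = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, 3)]

        for _ in range(500):             # replay until estimates settle
            for s, r, s_next in trajectory:
                td_error = r + gamma * V[s_next] - V[s]
                V[s] += alpha * td_error

        # States nearer the reward get higher values:
        # V[2] -> 1.0, V[1] -> 0.9, V[0] -> 0.81
        print(V)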

    • exe34 17 hours ago

      To me, it's a matter of a very big checklist: you can keep adding tasks to the list, but if it keeps marching onwards checking things off your list, some day you will get there. Whether it's a linear or asymptotic march, only time will tell.

      • ekjhgkejhgk 17 hours ago

        I don't know if you will get there, that's far from clear at this stage.

        Did you see the recent video by Nick Beato [1] where he asks various models about a specific number? The models that get it right are the models that consume youtube videos, because there was a youtube video about that specific number. It's like, these models are capable of telling you about very similar things that they've seen, but they don't seem like they understand it. It's totally unclear whether this is a quantitative or qualitative gap.

        [1] https://www.youtube.com/watch?v=TiwADS600Jc

      • cactusplant7374 17 hours ago

        That's like saying that if we image every neuron in the brain we will understand thinking. We can build these huge databases and they tell us nothing about the process of thinking.

        • exe34 17 hours ago

          What if we copy the functionality of every neuron? what if we simply copy all the skills that those neurons compute?

          • rootusrootus 17 hours ago

            Do we even know the functionality of every neuron?

            • exe34 15 hours ago

              Not yet.

  • danielvaughn 10 hours ago

    I have a very surface-level understanding of AI, and yet this always seemed obvious to me. It's almost a fundamental law of the universe that complexity of any kind has a long tail. So you can get AI to faithfully replicate 90% of a particular domain skill. That's phenomenal, and by itself can yield value for companies. But the journey from 90% to 100% is going to be a very difficult march.

    • tim333 3 hours ago

      The nines comment was in the context of self-driving cars, which I can see, because your driving is never perfect and accidents can be fatal.

      Some AI is like chess though, where models steadily advance in Elo rating.

    • BolexNOLA 9 hours ago

      The last mile problem is inescapable!

  • DanHulton 9 hours ago

    The thing about this, though - cars have been built before. We understand what's necessary to get those 9s. I'm sure there were some new problems that had to be solved along the way, but fundamentally, "build good car" is known to be achievable, so the process of "adding 9s" there makes sense.

    But this method of AI is still pretty new, and we don't know its upper limits. It may be that there are no more 9s to add, or that any more 9s cost prohibitively more. We might be effectively stuck at 91.25626726...% forever.

    Not to be a doomer, but I DO think that anyone who is significantly invested in AI really has to have a plan in case that ends up being true. We can't just keep on saying "they'll get there some day" and acting as if it's true. (I mean you can, just not without consequences.)

    • danielmarkbruce 9 hours ago

      While you are right about the broader (and somewhat ill-defined) chase toward 'AGI', another way to look at it is the self-driving car: they got there eventually. And if you work on applications using LLMs, you can pretty easily see that Karpathy's sentiment is likely correct. You see it because you do it. Even simple applications are shaped like this, albeit each 9 takes less time than it did for self-driving cars. It still feels about right.

      • Hendrikto 4 hours ago

        > another way to look at it is the self driving car - they got there eventually.

        No they did not. Elon has been saying Tesla will get there “next year” since 2015. He is still saying that, and despite changing definitions, we still are not there.

        • 1oooqooq 2 hours ago

          i guess the comment you replied to proves the actual point: "we may never get there, but it will be enough for the market".

          sigh, i guess it's time to laugh at that video compilation of elon saying "next week" for 10 years straight and then cry seeing how much he made off of doing that.

      • vasco 7 hours ago

        > another way to look at it is the self driving car - they got there eventually

        Current self-driving cars only work on American roads. Maybe Canada too, not sure how their roads are. Come to Europe (or anywhere else) and every other road would be intractable. Much tighter lanes, many turns where you have a little mirror to see who's coming from the other side, single-car-at-a-time lanes where you need to "understand" who goes first, mountain roads where you sometimes need to reverse for 100m when another car is coming so the road is wide enough for them to pass before you can keep going forward, etc.

        Many things like this would require another 2 or 3 "nines", as the guy put it, beyond acceptable quality on huge American roads.

        https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQ4NWIt...

  • fair_enough 18 hours ago

    Reminds me of a time-honored aphorism in running:

    A marathon consists of two halves: the first 20 miles, and then the last 10k (6.2mi) when you're more sore and tired than you've ever been in your life.

    • jakeydus 17 hours ago

      This is 100% unrelated to the original article, but I feel like there's an underreported additional first half. As a bigger runner who still loves to run, the first two or three miles, before I have enough endorphins to get into the zen state that makes me love running, are the first half; then it's 17 miles of this amazing meditative mindset. Then the last 10k sucks.

    • rootusrootus 17 hours ago

      I suspect that is true for many difficult physical goals.

      My dad told me that the first time you climb a mountain, there will likely be a moment not too distant from the top when you would be willing to just sit down and never move again, even at the risk to your own life. Even as you can see the goal not far away.

      He also said that it was a dangerous enough situation that as a climb leader he'd start kicking you if he had to, if you sat down like that and refused to keep climbing. I'm not a climber myself, though, so this is hearsay, and my dad is long dead and unable to remind me of what details I've forgotten.

    • tylerflick 18 hours ago

      I think I hated life most after 20 miles. Especially in training.

    • sarchertech 17 hours ago

      Why not just run 20 miles then?

      • rootusrootus 17 hours ago

        Because then it wouldn't be a challenge and nobody would care about the achievement.

        • sarchertech 15 hours ago

          I’m curious do ultramarathoners feel the same way about the rest of the race past 20 miles?

          • rootusrootus 15 hours ago

            I've heard it claimed that an ultramarathon is fundamentally a different experience because while it definitely requires excellent physical stamina, it has a large mental component to it, as well as a much bigger focus on nutrition. Very different sort of race, I guess.

            • justinwp 12 hours ago

              there are multiple cycles from highs to lows and back, and then typically a larger dominant split, similar to what was discussed here for the marathon but scaled to the distance.

          • justinwp 12 hours ago

            the split would be the first 80 and last 20 miles, ±10 miles.

      • nextworddev 16 hours ago

        because that'd be quitting the race with 6.2 miles left to go

  • godelski 17 hours ago

    It's a good way to think about lots of things. It's the Pareto principle, the 80/20 rule.

    20% of your effort gets you 80% of the way, but most of your time is spent getting that last 20%. People often don't realize that this is fractal-like in nature, since it follows a power-law distribution. Of the 20% you still have left, the same holds true: another 20% of your time (20% + 20% × 80% = 36% cumulative effort) gets you 80% of what remains (80% + 80% × 20% = 96% cumulative progress), again and again. The 80/20 numbers aren't actually realistic (or constant), but it's a decent guide.
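
    Spelled out as a quick sketch (the 80/20 split is illustrative, not empirical):

      # Iterating the 80/20 rule: each round, 20% of the remaining effort
      # budget closes 80% of the remaining gap.
      effort, progress = 0.0, 0.0
      for i in range(1, 4):
          effort += 0.2 * (1 - effort)      # spend 20% of the remaining effort
          progress += 0.8 * (1 - progress)  # close 80% of the remaining gap
          print(f"round {i}: effort {effort:.1%}, progress {progress:.2%}")
      # round 1: effort 20.0%, progress 80.00%
      # round 2: effort 36.0%, progress 96.00%
      # round 3: effort 48.8%, progress 99.20%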

    It's also something tech has been struggling with lately. Move fast and break things is a great way to get most of the way there, but you also leave a wake of destruction and table a million little things along the way. Someone needs to go back and clean things up. Someone needs to revisit those tabled things. While each thing might be little, we solve big problems by breaking them down into little ones, so each big problem is a sum of many little ones, meaning they shouldn't be quickly dismissed. And as with the nines analogy, 99.9% uptime is still ~9 hours of downtime a year. It is still 1e6 cases out of 1e9. A million cases is not a small problem. Scale is great and has made our field amazing, but it is a double-edged sword.

    I think it's also something people struggle with. It's very easy to become above average, or even well above average, at something; just trying will often get you above average. It can make you feel like you know way more, but the trap is that while in some domains above average is not far from mastery, in other domains above average is closer to no skill than to mastery. It's like how having $100m puts your wealth closer to a homeless person's than to a billionaire's. At $100m you feel way closer to the billionaire, because you're much further up than the person with nothing, but the curve is exponential.

  • sdenton4 18 hours ago

    Ha, I often speak of doing the first 90% of the work, and then moving on to the following 90% of the work...

    • JimDabell 18 hours ago

      > The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time.

      — Tom Cargill, Bell Labs (September 1985)

      https://dl.acm.org/doi/pdf/10.1145/4284.315122

    • inerte 18 hours ago

      I use "The project is 90% ready, now we only have to do the other half"

      • typpilol 18 hours ago

        92% is half actually - RuneScape Players

  • omidsa1 18 hours ago

    I also quite like the way he puts it. However, from a certain point onward, the AI itself will contribute to the development—adding nines—and that’s the key difference between this analogy of nines in other systems (including earlier domain‑specific ML ones) and the path to AGI. That's why we can expect fast acceleration to take off within two years.

    • breuleux 17 hours ago

      I don't think we can be confident that this is how it works. It may very well be that our level of intelligence has a hard limit to how many nines we can add, and AGI just pushes the limit further, but doesn't make it faster per se.

      It may also be that we're looking at this the wrong way altogether. If you compare the natural world with what humans have achieved, for instance, both things are qualitatively different, they have basically nothing to do with each other. Humanity isn't "adding nines" to what Nature was doing, we're just doing our own thing. Likewise, whatever "nines" AGI may be singularly good at adding may be in directions that are orthogonal to everything we've been doing.

      Progress doesn't really go forward. It goes sideways.

      • bamboozled 10 hours ago

        It's also assuming that all advances in AI just lead to cold hard gains. People have suggested this before, but would a sentient AI get caught up in philosophical, silly, or religious ideas? Silicon Valley investor types seem to hope it's all just curing diseases they can profit from, but it might also be, "let's compose some music instead"?

        • Unit327 6 hours ago

          AI doesn't have hopes and desires or something it would rather be doing. It has a utility function that it will optimise for regardless of all else. This doesn't change when it gets smarter, or even when it gets super-intelligence.

      • adventured 17 hours ago

        Adding nines to nature is exactly what humans are doing. We are nature. We are part of the natural order.

        Anything that exists is part of nature, there can be no exceptions.

        If I go burn a forest down on purpose, that is in fact nature doing it. No different than if a dolphin kills another animal for fun or a chimp kills another chimp over a bit of territory. Insects are also every bit as 'vicious' in their conquests.

      • j45 17 hours ago

        The intuition of someone who has put in a decade or two of wondering openly can't be discounted as easily as that of someone who might be a beginner to it.

        AGI to encompass all of humanity's knowledge in one source and beat every human on every front might be a decade away.

        Individual agents with increased agency adequately covering more and more abilities consistently? Seems like a steady path that can be seen into the horizon to put one foot in front of the other.

        For me, the grain of salt I'd take Karpathy with is much, much smaller than average, only because he tries to share how he thinks, examines his own understanding, and changes it.

        His ability to explain complex things simply helps me learn and understand things quicker and see if I arrive at something similar or different, and not immediately assume anything is wrong or right without my own understanding being present.

    • rpcope1 17 hours ago

      > However, from a certain point onward, the AI itself will contribute to the development—adding nines—and that’s the key difference between this analogy of nines in other systems (including earlier domain‑specific ML ones) and the path to AGI.

      There's a massive planet-sized CITATION NEEDED here, otherwise that's weapons grade copium.

    • Yoric 17 hours ago

      It's a possibility, but far from certainty.

      If you look at it differently, assembly language may have been one nine, compilers may have been the next nine, successive generations of language until ${your favorite language} one more nine, and yet, they didn't get us noticeably closer to AGI.

    • aughtdev 16 hours ago

      I doubt this. General intelligence will be a step change not a gentle ramp. If we get to an architecture intelligent enough to meaningfully contribute to AI development, we'll have already made it. It'll simply be a matter of scale. There's no 99% AGI that can help build 100% AGI but for some reason can't drive a car or cook a meal or work an office job.

    • AnimalMuppet 18 hours ago

      Isn't that one of the measures of when it becomes an AGI? So that doesn't help you with however many nines we are away from getting an AGI.

      Even if you don't like that definition, you still have the question of how many nines we are away from having an AI that can contribute to its own development.

      I don't think you know the answer to that. And therefore I think your "fast acceleration within two years" is unsupported, just wishful thinking. If you've got actual evidence, I would like to hear it.

      • ben_w 17 hours ago

        AI has been helping with the development of AI ever since at least the first optimising compiler or formal logic circuit verification program.

        Machine learning has been helping with the development of machine learning ever since hyper-parameter optimisers became a thing.

        Transformers have been helping with the development of transformer models… I don't know exactly, but it was before ChatGPT came out.

        None of the initials in AGI are booleans.

        But I do agree that:

        > "fast acceleration within two years" is unsupported, just wishful thinking

        Nobody has any strong evidence of how close "it" is, or even a really good shared model of what "it" even is.

      • scragz 17 hours ago

        AGI is when it is general. A narrow AI trained only on coding and training AIs would contribute to the acceleration without being AGI itself.

    • techblueberry 9 hours ago

      I think the 9's include this assumption.

  • yoyohello13 16 hours ago

    I think a ton of people see a line going up and they think exponential, when in reality, the vast majority of the time it's actually logistic.

    • tibbar 16 hours ago

      Given the physical limits of the universe and our planet in particular, yeah, this is pretty much always true. The interesting question is: what is that limit, and: how many orders of magnitude are we away from leveling off?

    • misnome 5 hours ago

      I mean the cost line does look somewhat exponential…

  • somanyphotons 18 hours ago

    This is an amazing quote that really applies to all software development

    • Veserv 16 hours ago

      Drawn from Karpathy killing a bunch of people by knowingly delivering defective autonomous driving software instead of applying basic engineering ethics and refusing to deploy the dangerous product he was in charge of.

    • zeroonetwothree 18 hours ago

      Well, maybe not all. I’ve definitely built CRUD UIs that were linear in effort. But certainly anything technically challenging or novel.

  • zeroonetwothree 18 hours ago

    When I worked at Facebook they had a slogan that captured this idea pretty well: “this journey is 1% finished”.

    • gowld 18 hours ago

      Copied from Amazon's "Day 1".

  • tekbruh9000 17 hours ago

    Infinitely big little numbers

    Academia has rediscovered itself

    Signal attenuation, a byproduct of entropy, due to generational churn means there's little guarantee.

    Occam's Razor: Karpathy knows the future, or he is self-selecting biology trying to avoid manual labor?

    His statements have more in common with Nostradamus. It's the toxic positivity form of "the end is nigh". It's "Heaven exists you just have to do this work to get there."

    Physics always wins, and statistics is not physics. Gambler's fallacy: improvement of statistical odds does not improve probability. Probability remains the same; this is all promises from people who have no idea or interest in doing anything else with their lives, so stay the course.

    • startupsfail 17 hours ago

      >> Heaven exists you just have to do this work to get there.

      Or perhaps Karpathy has a higher level understanding and can see a bigger picture?

      You've said something about heaven. Are you able to understand this statement, for example: "Heaven is a memeplex, it exists"?

  • ojr 13 hours ago

    If it works 90% of the time, that means it fails 10% of the time. Getting to a 1% failure rate is a 10x improvement, and going from a 1% failure rate to a 0.1% failure rate is also a 10x improvement.
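
    Put another way, a nine is just the negative log of the failure rate; a quick sketch:

      # Each added nine of reliability is a 10x cut in failure rate,
      # so nines = -log10(failure rate).
      import math

      for reliability in [0.9, 0.99, 0.999, 0.9999]:
          failure = 1 - reliability
          print(f"{reliability:.2%} reliable -> {failure:.2%} failures "
                f"-> {-math.log10(failure):.0f} nine(s)")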

    First time hearing it called the "march of nines". Did Tesla coin the term? I thought it was an Amazon thing.

  • czk 18 hours ago

    like leveling to 99 in old school runescape

    • fbrchps 18 hours ago

      The first 92% and the last 92%, exactly.

    • zeroonetwothree 18 hours ago

      Or Diablo 2

      • genewitch 16 hours ago

        i don't remember the end-game of the original Diablo; however, in diablo III and IV everyone i've tried to play the game with gets bored in the run-up to max level. I always tell them "i skip that part as much as possible, because that's not the game. That's just the story!"

        Once you hit max level in III and IV, the game actually "begins."

        and to explain the Diablo 2 reference: the amount of time/effort it takes to go from level 98 to level 99 (the max level) is the same amount of time it takes to go from level 1 to level 98. I've heard "2 weeks" as a rough estimate of "unhealthy playtime", at least solo.

  • jlas 18 hours ago

    Notably, the scaling law paper shows result graphs on a log scale

  • red75prime 17 hours ago

    The question is how many nines humans are.

    • notTooFarGone 6 hours ago

      Humans adapt and gain more nines the more they learn about something. Humans are also liable in a legal sense, which is a huge factor in any AI use case.

  • jakeydus 17 hours ago

    You know what they say, a Silicon Valley 9 is a 10 anywhere else. Or something like that.

    • Yoric 17 hours ago

      I assume you're describing the fact that Silicon Valley culture keeps pushing out products before they're fully baked?

  • breve 18 hours ago

    [flagged]

simonw 10 hours ago

It looks like Andrej's definition of "agent" here is an entity that can replace a human employee entirely - from the first few minutes of the conversation:

When you’re talking about an agent, or what the labs have in mind and maybe what I have in mind as well, you should think of it almost like an employee or an intern that you would hire to work with you. For example, you work with some employees here. When would you prefer to have an agent like Claude or Codex do that work?

Currently, of course they can’t. What would it take for them to be able to do that? Why don’t you do it today? The reason you don’t do it today is because they just don’t work. They don’t have enough intelligence, they’re not multimodal enough, they can’t do computer use and all this stuff.

They don’t do a lot of the things you’ve alluded to earlier. They don’t have continual learning. You can’t just tell them something and they’ll remember it. They’re cognitively lacking and it’s just not working. It will take about a decade to work through all of those issues.

  • sarchertech 9 hours ago

    He’s not just talking about agents good enough to replace workers. He’s talking about whether agents are currently useful at all.

    >Overall, the models are not there. I feel like the industry is making too big of a jump and is trying to pretend like this is amazing, and it’s not. It’s slop. They’re not coming to terms with it, and maybe they’re trying to fundraise or something like that. I’m not sure what’s going on, but we’re at this intermediate stage. The models are amazing. They still need a lot of work. For now, autocomplete is my sweet spot. But sometimes, for some types of code, I will go to an LLM agent.

    >They kept trying to mess up the style. They’re way too over-defensive. They make all these try-catch statements. They keep trying to make a production code base, and I have a bunch of assumptions in my code, and it’s okay. I don’t need all this extra stuff in there. So I feel like they’re bloating the code base, bloating the complexity, they keep misunderstanding, they’re using deprecated APIs a bunch of times. It’s a total mess. It’s just not net useful. I can go in, I can clean it up, but it’s not net useful.

    • sothatsit 4 hours ago

      I don't think he is saying agents are not useful at all, just that they are not anywhere near the capability of human software developers. Karpathy later says he used agents to write the Rust translation of algorithms he wrote in Python. He also explicitly says that agents can be useful for writing boilerplate or for code that can be very commonly found online. So I don't think he is saying they are not useful at all. Instead, he is just holding agents to a higher standard of working on a novel new codebase, and saying they don't pass that bar.

      Tbh I think people underestimate how much software development work is just writing boilerplate or common patterns though. A very large percentage of the web development work I do is just writing CRUD boilerplate, and agents are great at it. I also find them invaluable for searching through large codebases, and for basic code review, but I see these use-cases discussed less even though they're a big part of what I find useful from agents.

      • CaptainOfCoit 2 hours ago

        My biggest takeaway is that agents/LLMs in general are super helpful when paired with a human who knows the ins and outs of software development, who uses them side-by-side with their normal work.

        They start being less useful when you start treating them as "I can send them ill-specified stuff, ignore them for 10 minutes and merge their results", as things spiral out of control. Basically "vibe-coding" as a useful concept doesn't work for projects you need to iterate on, only for things you feel OK with throwing away eventually.

        Augmenting the human intellect with LLMs? Usually an increase in productivity. Replacing human coworkers with LLMs? Good luck, have fun.

    • consumer451 7 hours ago

      I am just some shmoe, but I agree with that assessment. My biggest take-away is that we got super lucky.

      At least now we have a slight chance to prepare for the potential economic and social impacts.

      • Bengalilol 5 hours ago

        I am thinking the same.

        And we should start considering what makes us human and how we can valorize our common ground.

        • tablatom 4 hours ago

          This. I believe it’s the most important question in the world right now. I’ve been thinking long and hard about this from an entirely practical perspective and have surprised myself that the answer seems to be our capacity to love. The idea is easily dismissed as romantic but when I say I’m being practical I really mean it. I’m writing about it here https://giftcommunity.substack.com/

    • kubb 4 hours ago

      My ever-growing reporting chain is incredibly invested in having autonomous agents next year.

  • eddiewithzato 10 hours ago

    Because that's the definition that is leading to all these investments, the promise that very soon they will reach it. If Altman said plainly that LLMs will never reach that stage, there would be a lot less investment into the industry.

    • aik 10 hours ago

      Hard disagree. You don’t need AGI to transform countless workflows within companies; current LLMs can do it. A lot of the current investment is to help meet the demand for current-generation LLMs (and the use cases we know will keep opening up with incremental improvements). Are you aware of how intensely all the main companies that host leading models (azure, aws, etc) are throttling usage due to insufficient data center capacity? (E.g. at my company we have 100x more demand than we can get capacity for, and we’re barely getting started. We have a roadmap with 1000x+ the current demand and we’re a relatively small company.)

      AGI would be more impactful of course, and some use cases aren’t possible until we have it, but that doesn’t diminish the value of current AI.

      • kllrnohj 9 hours ago

        > Eg. At my company we have 100x more demand than we can get capacity for, and we’re barely getting started. We have a roadmap with 1000x+ the current demand and we’re a relatively small company.

        OpenAI's revenue is $13bn, with 70% of that coming from people just spending $20/mo to talk to ChatGPT. Anthropic is projecting $9bn in revenue in 2025. For a nice cold splash of reality, fucking Arizona Iced Tea has $3bn in revenue (and that's actual revenue, not ARR).

        You might have 100x more demand than you can get capacity for, but if that 100x still puts you at a number that in absolute terms is small, it's not very impressive. Similarly if you're already not profitable and achieving 100x growth requires 1,000x in spend, that's also not a recipe for success. In fact it's a recipe for going bankrupt in a hurry.

        • hyperadvanced 9 hours ago

          This is correct, it should burn the retinas of anyone thinking that OAI or Anthropic are in any way worth their multi-billion dollar valuations. I liked AK’s analysis of AI for coding here (it’s overly defensive, lacks style and functionality awareness, is a cargo cultist, and/or just does it wrong a lot) but autocomplete itself is super valuable, as is the ability to generate simple frontend code and let you solve the problem of making a user interface without needing a team of people with those in-house skills.

          • vharish an hour ago

            There are many more use cases that aren't fully realised yet. With regards to coding, LLMs have shortcomings. However, there's a lot of work that can be automated. Any work that requires interaction with a computer can eventually be automated to some extent. To what extent is something only time can tell.

      • bloppe 7 hours ago

        This is a relatively reasonable take. Unfortunately, that's not what most AI investors or non-technical punters think. Since GPT-1 it's been all about unlocking 100%+ annual GDP growth by wholesale white-collar automation. I agree with AK that the actual effect on GDP will be more or less negligible, which will be an unmitigated disaster for us economically given how much cash has already been incinerated.

      • Culonavirus 9 hours ago

        Oh look, people with skin in the AI game insist AI is not a massive bubble. More news at 11.

  • bbor 10 hours ago

    Quite telling -- thanks for the insightful comment as always, Simon. Didn't know that, even though I've been discussing this on and off all day on Reddit.

    He's a smart man with well-reasoned arguments, but I think he's also a bit poisoned by working at such a huge org, with all the constraints that comes with. Like, this:

      You can’t just tell them something and they’ll remember it.
    
    It might take a decade to work through this issue if you just want to put a single LLM in a single computer and have it be a fully-fledged human, sure. And since he works at a company making some of the most advanced LLMs in the world, that perspective makes sense! But of course that's not how it's actually going to be (/already is).

    LLMs are a necessary part of AGI(/"agents") due to their ability to avoid the Frame Problem[1], but they're far from the only needed thing. We're pretty dang good at "remembering things" with computers already, and connecting that with LLM ensembles isn't going to take anywhere close to 10 years. Arguably, we're already doing it pretty darn well in unified systems[2]...
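
    To make that concrete, a toy sketch of the external-memory loop I mean (every name here is hypothetical, and real systems like the MCP servers linked below use vector embeddings rather than keyword overlap, but the shape is the same):

      # Hypothetical toy: persistent memory bolted onto a stateless LLM.
      # Store notes outside the model, retrieve the relevant ones, and
      # prepend them to the prompt the model actually sees.
      memory: list[str] = []

      def remember(note: str) -> None:
          memory.append(note)

      def recall(query: str) -> list[str]:
          words = set(query.lower().split())  # naive keyword overlap
          return [m for m in memory if words & set(m.lower().split())]

      def build_prompt(user_msg: str) -> str:
          notes = "\n".join(recall(user_msg))
          return f"Relevant notes:\n{notes}\n\nUser: {user_msg}"

      remember("the user prefers answers in french")
      print(build_prompt("answer my question about french grammar"))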

    If anyone's unfamiliar and finds my comment interesting, I highly recommend Minsky's work on the Society of Mind, which handled this topic definitively over 20 years ago. Namely;

    A short summary of "Connectionism and Society of Mind" for laypeople at DARPA: https://apps.dtic.mil/sti/tr/pdf/ADA200313.pdf

    A description of the book itself, available via Amazon in 48h or via PDF: https://en.wikipedia.org/wiki/Society_of_Mind

    By far my favorite paper on the topic of connectionist+symbolist syncreticism, though a tad long: https://www.mit.edu/~dxh/marvin/web.media.mit.edu/~minsky/pa...

    [1] https://plato.stanford.edu/entries/frame-problem/

    [2] https://github.com/modelcontextprotocol/servers/tree/main/sr...

bigtones 8 hours ago

Andrej Karpathy seems to me like a national (world) treasure.

He has the ability to explain concepts and thoughts with analogies, generalizations, and interesting sayings that keep you interested in what he is talking about for literally hours, in a subject that I don't know that much about. Clearly he is very smart, as is the interviewer, but he is also a fantastic communicator and does not come across as arrogant or pretentious, but really just helpful and friendly. It's quite a remarkable and amazing skillset. I'm in awe.

  • Ozzie_osman 6 hours ago

    Agreed. I'd also add that he's intellectually honest enough not to overhype what's happening just to promote whatever he's working on or to appear to be a thought leader. Just very clear, pragmatic thinking about the reality of things.

willyxdjazz 3 hours ago

Maybe I'm being too simplistic, but I think we're mixing two distinct debates.

Today we have an extraordinary invention—comparable to the wheel in its time. That invention is: predictive inference over all human knowledge. Period. I don't like calling it "Artificial Intelligence" because it's not intelligence; it's a prediction system that can project responses by illuminating patterns across all human knowledge encapsulated in text, audio, and video. What companies like OpenAI call "reasoning" models is simply that predictive process, but in a loop packaged as a product—one of the first marvelous uses of this fascinating invention: predictive inference over all human knowledge.

When the wheel was invented, no one could have imagined that, combined with hundreds of subsequent technologies, it would enable an electric car powered by solar energy. The wheel wasn't autonomous transportation—it was a fundamental component.

I see two debates getting mixed up here:

- The debate about the current invention: A tool that makes encyclopedias "speak" by connecting patterns across all human knowledge. As a tool, that's what it is—nothing more, nothing less. Tremendously useful, but a tool.

- The debate about the future dream: What this invention might enable when combined with hundreds of technologies that don't yet exist—similar to imagining an electric car when you only have the wheel.

It seems many experts are taking positions and getting "upset" because they're mixing these two debates. Some evaluate the wheel as if it should already be a solar electric car. Others defend the wheel by saying it already IS a solar electric car. Both are right in their observations, but they're talking about different things.

LLMs are a fundamental breakthrough—the "wheel" of the information age. But discussing whether they "understand" or have "world models" is like asking whether the wheel "comprehends transportation."

On the danger of confusing capabilities: Conflating the tool with the end goal leads us to poor decisions—from over-investment to under-utilization. When we expect AGI from what is fundamentally a pattern-matching engine, we set ourselves up for disappointment and misallocation of resources. No magic, just reality.

The temporal factor: The AGI debate is a debate about the future—about what might emerge from combinations of technologies we haven't yet invented.

  • jstummbillig 2 hours ago

    > I don't like calling it "Artificial Intelligence" because it's not intelligence

    A pattern I noticed in AI discussions: handwavily declaring what intelligence is not, while not explaining what it is.

    • willyxdjazz 2 hours ago

      You are right. I thought maybe something interesting in these debates would be more education about how an LLM works. I don't like calling it artificial intelligence precisely because we don't understand well what "intelligence" is. What we do understand is how we came to build an LLM. Good point, I will keep that in mind for next time; it's better to give more details and, above all, remove the "no" from assertions and clarify more. Thanks :)

    • foofoo12 42 minutes ago

      > Handwavily declaring what intelligence is not, while not explaining what it is.

      That goes in the other direction too. Declaring it intelligent without explaining what it is. Or even worse, if any explanations are offered, they are often half truths or exaggerated.

spjt 9 hours ago

The thing about AGI is that if it's even possible, it's not coming before the money runs out of the current AI hype cycle. At least we'll all be able to pick up a rack of secondhand H100s for a tenner and a pack of smokes to run uncensored diffusion models on in a couple of years. The real devastation will be in the porn industry.

  • mrklol 3 hours ago

    I also don't think our generation will see actual AGI, but IMO the hard "intelligence" part isn't needed, as we can supply our own intelligence. Using it as a tool will hopefully lead to plenty of cool things in the future.

joshellington 13 hours ago

To throw two pennies in the ocean of this comment section - I’d argue we still lack schematic-level understanding of what “intelligence” even is or how it works. Not to mention how it interfaces with “consciousness”, and their likely relation to each other. Which kinda invalidates a lot of predictions/discussions of “AGI” or even in general “AI”. How can one identify Artificial Intelligence/AGI without a modicum of understanding of what the hell intelligence even is.

  • __MatrixMan__ 10 hours ago

    I don't think we can ever know that we are generally intelligent. We can be unsure, or we can meet something else which possesses a type of intelligence that we don't, and then we'll know that our intelligence is specific and not general.

    So to make predictions about general intelligence is just crazy.

    And yeah yeah I know that OpenAI defines it as the ability to do all economically relevant tasks, but that's an awful definition. Whoever came up with that one has had their imagination damaged by greed.

    • judahmeek 9 hours ago

      All intelligence is specific, as evidenced by the fact that a universal definition regarding the specifics of "common sense" doesn't exist.

  • visarga 5 hours ago

    > we still lack schematic-level understanding of what “intelligence” even is or how it works. Not to mention how it interfaces with “consciousness”, and their likely relation to each other

    I think you can get pretty far starting from behavior and constraints. The brain needs to act in such a way as to pay for its costs - and not just day-to-day costs, but also the ability to receive and give that initial inheritance.

    From cost of execution we can derive an imperative for efficiency. Learning is how we avoid making the same mistakes and adapt. Abstractions are how we efficiently carry around past experience to be applied in new situations. Imagination and planning are how we avoid the high cost of catastrophic mistakes.

    Consciousness itself falls out of the serial action bottleneck. We can't walk left and right at the same time, or drink coffee before brewing it. Behavior has a natural sequential structure, and this forces the distributed activity in the brain to centralize on a serial output sequence.

    My mental model is that of a structure-flow recursion. Flow carves structure, and structure channels flow. Experiences train brains and brain generated actions generate experiences. Cutting this loop and analyzing parts of it in isolation does not make sense, like trying to analyze the matter and motion in a hurricane separately.

  • vannucci 12 hours ago

    This so much this. We don’t even have a good model for how invertebrate minds work or a good theory of mind. We can keep imitating understanding but it’s far from any actual intelligence.

    • tim333 11 hours ago

      I'm not sure we or evolution needed a theory of mind. Evolution stuck neurons together in various ways and fiddled with it till it worked without a master plan and the LLM guys seem to be doing something rather like that.

      • zargon 6 hours ago

        LLM guys took a very specific layout of neurons and said “if we copy paste this enough times, we’ll get intelligence.”

  • keiferski 7 hours ago

    That would require philosophical work, something that the technicians building this stuff refuse to acknowledge as having value.

    Ultimately this comes down to the philosophy of language and of the history of specific concepts like intelligence or consciousness - neither of which exist in the world as a specific quality, but are more just linguistic shorthands for a bundle of various abilities and qualities.

    Hence the entire idea of generalized intelligence is a bit nonsensical, other than as another bundle of various abilities and qualities. What those are specifically doesn’t seem to be ever clarified before the term AGI is used.

  • Culonavirus 9 hours ago

    > I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description ["<insert general intelligence buzzword>"], and perhaps I could never succeed in intelligibly doing so. But I know it when I see it, and the <insert llm> involved in this case is not that.

    https://en.wikipedia.org/wiki/I_know_it_when_I_see_it

keeda 9 hours ago

Huh, I'm surprised that he goes from "No AI" to "AI autocomplete" to "Vibecoding / Agents" (which I assume means no human review per his original coinage of the term.) This seems to preclude the chat-oriented / pair-programming model which I find most effective. Or even the plan-spec-codegen-review approach, which IME works extremely well for straightforward CRUD apps.

Also they discuss the nanochat repo in the interview, which has become more famous for his tweet about him NOT vibe-coding it: https://www.dwarkesh.com/i/176425744/llm-cognitive-deficits

Things are more nuanced than what people have assumed, which seems to be "LLMs cannot handle novel code". The best I can summarize it as is that he was doing rather non-standard things that confused the LLMs, which have been trained on vast amounts of very standard code and hence kept defaulting to those assumptions. Maybe a rough analogy is that he was trying to "code golf" this repo whereas LLMs kept trying to write "enterprise" code, because that is overwhelmingly what they have been trained on.

I think this is where the chat-oriented / pair-programming or spec-driven model shines. Over multiple conversations (or from the spec), they can understand the context of what you're trying to do and generate what you really want. It seems Karpathy has not tried this approach (given his comments about "autocomplete being his sweet spot".)

For instance, I'm working on some straightforward computer vision stuff, but it's complicated by the fact that I'm dealing with small, low-resolution images, which does not seem well-represented in the literature. Without that context, the suggestions any AI gives me are sub-optimal.

However, after mentioning it a few times, ChatGPT now "remembers" this in its context, and any suggestion it gives me during chat is automatically tailored for my use-case, which produces much better results.

Put another way (not an AI expert so I may be using the terms wrong), LLMs will default to mining the data distribution they've been trained on, but with sufficient context, they should be able to adapt their output to what you really want.

hax0ron3 15 hours ago

If the transcript is accurate, Karpathy does not actually ever, in this interview, say that AGI is a decade away, or make any concrete claims about how far away AGI is. Patel's title is misleading.

  • dang 10 hours ago

    Hmm good point. I skimmed the transcript looking for an accurate, representative quote that we could use in the title above. I couldn't exactly find one (within HN's 80 char limit), so I cobbled together "It will take a decade to get agents to work", which is at least closer to what Karpathy actually said.

    If anyone can suggest a more accurate and representative title, we can change it again.

    Edit: I thought of using "For now, autocomplete is my sweet spot", which has the advantage of being an exact quote; but it's probably not clear enough.

    Edit 2: I changed it to "It will take a decade to work through the issues with agents" because that's closer to the transcript.

    Anybody have a better idea? Help the cause of accuracy out here!

    • hax0ron3 8 hours ago

      To be fair to the OP of the thread, he's just using Patel's title word-for-word. It's Patel who is being inaccurate.

      • dang 7 hours ago

        Oh that's clear, and the submitter didn't do anything wrong. It's just that on HN the idea is to find a different title when the article's own title is misleading or linkbait (https://news.ycombinator.com/newsguidelines.html).

        The best way to do that of course is to find a more representative phrase from the article itself. That's almost always possible but I couldn't quite swing it in this case.

        • realty_geek 4 hours ago

          dang!! I have so much respect for this ironic situation, where we are discussing the superpowers of AI while a very human, very decent being ponders deeply on how to compose a few words into a suitable title. Please can we have a future where such moments keep happening every so often.

  • tim333 11 hours ago

    He says re agents:

    >They don't have enough intelligence, they're not multimodal enough, they can't do computer use and all this stuff. They don't do a lot of the things you've alluded to earlier. They don't have continual learning. You can't just tell them something and they'll remember it. They're cognitively lacking and it's just not working.

    >It will take about a decade to work through all of those issues. (2:20)

    • hax0ron3 8 hours ago

      Him saying that it will take a decade to work through agents' issues isn't the same as him saying that there will be AGI in a decade, though

    • bamboozled 10 hours ago

      Couldn't have even been bothered watching ~ 2 minutes of an interview before commenting.

  • whiplash451 15 hours ago

    Patel did the same with Sutton ("LLMs are a dead end") when Sutton never said this in the conversation.

  • nextworddev 11 hours ago

    There's a lot of salt here

    • dang 10 hours ago

      > Hey, podcast bro needs to get clicks

      Please don't cross into personal attack. It's not what this site is for, and destroys what it is for.

      Edit: please don't edit comments to change their meaning once someone has replied. It's unfair to repliers, whose comments no longer makes sense, and it's unfair to readers, who can no longer understand the thread. It's fine, of course, to add to an existing comment in such a case, e.g. by saying "Edit:" or some such and then adding what else you want to say.

roody15 12 hours ago

One inherent limitation of current LLMs/AI is that they are trained primarily on abstracted data that mimics the logical, reasoning, prefrontal-cortex portion of the mind. However, most humans make decisions based on activity in the limbic regions of the brain, which are essentially emotional and intuition-based. So we will often do something before we actually know why we did it; then, to maintain a sense of self and sanity, we use our prefrontal cortex to create a cohesive narrative about why we do what we do (despite it often being inaccurate).

In a nutshell we are mimicking neural activity in a certain region based on certain abstracted data which is quite removed from how we as humans process reality.

  • hvb2 2 hours ago

    In the same sense that split-brain patients [1] will make up a reasonable explanation for what the other half did.

    And why witnesses are preferably interviewed very shortly after they witness a crime, before their brains start to 'fill in the blanks'.

    1: https://en.wikipedia.org/wiki/Split-brain

jackdoe 2 hours ago

He is an absolute treasure. I have watched all his videos more than four times, and I don't think I would've been able to build a good mental model of deep learning without them; regardless of the number of Bengio, Goodfellow, etc. lectures I have seen, none of them come even close.

He is singlehandedly enabling millions of people to understand what is going on, what + and * do, actually demystifying the "wires".

I just wish he'd start thinking of himself as more than 'collapsing weights', regardless of whether it turns out to be true.

  • yodsanklai 2 hours ago

    I agree, I think I learned the most on this topic from his videos. Before that (a while ago), it was Andrew Ng's Coursera class. The latter had hands-on projects, which is much better than just listening in terms of retention. I don't know if Andrej Karpathy has more structured classes somewhere.

arthurofbabylon 18 hours ago

Agency. If one studied the humanities they’d know how incredible a proposal “agentic” AI is. In the natural world, agency is a consequence of death: by dying, the feedback loop closes in a powerful way. The notion of casual agency (I’m thinking of Jensen Huang’s generative > agentic > robotic insistence) is bonkers. Some things are not easily speedrun.

(I did listen to a sizable portion of this podcast while making risotto (stir stir stir), and the thought occurred to me: “am I becoming more stupid by listening to these pundits?” More generally, I feel like our internet content (and meta content (and meta meta content)) is getting absolutely too voluminous without the appropriate quality controls. Maybe we need more internet death.)

  • whatevertrevor 6 hours ago

    > In the natural world, agency is a consequence of death: by dying, the feedback loop closes in a powerful way.

    I don't follow. If we, in some distant future, find a way to make humans functionally immortal, does that magically remove our agency? Or do we not have agency to begin with?

    If your position on the "free will" question is that it doesn't exist, then sure I get it. But that seems incompatible with the death prerequisite you have put forward for it, because if it doesn't exist then surely it's a moot point to talk prerequisites anyway.

    • arthurofbabylon 4 hours ago

      When I think of the term "agency" I think of a feedback loop whereby an actor is aware of their effect and adjusts behavior to achieve desired effects. To be a useful agent, one must operate in a closed feedback loop; an open loop does not yield results.

      Consider the distinction between probabilistic and deterministic reasoning. When you are dealing with a probabilistic method (eg, LLMs, most of the human experience) closing the feedback loop is absolutely critical. You don't really get anything if you don't close the feedback loop, particularly as you apply a probabilistic process to a new domain.

      For example, imagine that you learn how to recognize something hot by hanging around a fire and getting burned, and you later encounter a kettle on a modern stove-top and have to learn a similar recognition. This time there is no open flame, so you have to adapt your model. This isn't a completely new lesson, the prior experience with the open flame is invoked by the new experience and this time you may react even faster to that sensation of discomfort. All of this is probabilistic; you aren't certain that either a fire or a kettle will burn you, but you use hints and context to take a guess as to what will happen; the element that ties together all of this is the fact of getting burned. Getting burned is the feedback loop closing. Next time you have a better model.

      Skillful developers who use LLMs know this: they use tests, or they have a spec sheet they're trying to fulfill. In short, they inject a brief deterministic loop to act as a conclusive agent. For the software developer's case it might be all tests passing, for some abstract project it might be the spec sheet being completely resolved. If the developer doesn't check in and close the loop, then they'll be running the LLM forever. An LLM believes it can keep making the code better and better, because it lacks the agency to understand "good enough." (If the LLM could die, you'd bet it would learn what "good enough" means.)
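
      A minimal sketch of that developer loop (the llm_propose_patch stub is hypothetical; the only deterministic part is the test run):

          import subprocess

          def llm_propose_patch(goal: str, feedback: str) -> None:
              """Hypothetical stand-in for a model call that edits files on disk."""
              raise NotImplementedError

          def work_until_good_enough(goal: str, budget: int = 10) -> bool:
              feedback = ""
              for _ in range(budget):
                  llm_propose_patch(goal, feedback)   # probabilistic step
                  tests = subprocess.run(["pytest"], capture_output=True, text=True)
                  if tests.returncode == 0:           # deterministic "good enough"
                      return True                     # the loop closes here
                  feedback = tests.stdout             # feed the failures back in
              return False                            # budget spent; a human steps in

      Without the returncode check there is no closed loop, and the model would happily keep "improving" forever.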

      Where does dying come in? Nature evolved numerous mechanisms to proliferate patterns, and while everyone pays attention to the productive ones (eg, birth) few pay attention to the destructive (eg, death). But the destructive ones are just as important as the productive ones, for they determine the direction of evolution. In terms of velocity you can think of productive mechanisms as speed and destructive mechanisms as direction. (Or in terms of force you can think of productive mechanisms as supplying the energy and destructive mechanisms supplying the direction.) Many instances are birthed, and those that survive go on and participate in the next round. Dying is the closed feedback loop, shutting off possibilities and defining the bounds of the project.

  • dist-epoch 16 hours ago

    Models die too - the less agentic ones are out-competed by the more agentic ones.

    Every AI lab brags how "more agentic" their latest model is compared to the previous one and the competition, and everybody switches to the new model.

  • ngruhn 16 hours ago

    I don't agree but I did laugh

SafeDusk 26 minutes ago

And I'm looking for a problem to spend my next decade on ...

Symmetry 15 hours ago

Very interesting conversation; I'm still listening to it. One bit I disagreed with: I still think an LLM's context is more like a person's sensory memory [1] than their working memory. The way data falls off the end of the buffer, regardless of how much attention it provokes, is entirely unlike our own working memory. On the other hand, a reasoning model's scratchpad seems to fit the analogy much better.

[1]https://en.wikipedia.org/wiki/Sensory_memory

hatmanstack 18 hours ago

Am I dating myself by thinking Kurzweil is still relevant?

2029: Human-level AI

2045: The Singularity - machine intelligence 1 billion times more powerful than all human intelligence

Based on exponential growth in computing. He predicts we'll merge with AI to transcend biological limits. His track record is mixed, but 2029 looks more credible post-GPT-5. The 2045 claim remains highly speculative.

  • williamcotton 18 hours ago

    The biggest problem I've had with Kurzweil and the exponential growth curve is that the elbow depends entirely on how you plot and scale the axis. With a certain vantage point we have arguably been on an exponential curve since the advent of Homo Sapiens.

  • somenameforme 18 hours ago

    I lost all respect for him after reading about his views on medical immortality. His argument is that human life expectancy has been constantly increasing over time, and he calculated, based on some arbitrary rate of acceleration, that science would come to extend human life expectancy by more than a year per year - medical immortality, in other words - all expected to happen just prior to the time he's reaching his final years.

    The overwhelming majority of all gains in human life expectancy have come due to reductions in infant mortality. When you hear about things like a '40' year life expectancy in the past it doesn't mean that people just dropped dead at 40. Rather if you have a child that doesn't make it out of childhood, and somebody else that makes it to 80 - you have a life expectancy of ~40.

    If you look back to the upper classes of old their life expectancy was extremely similar to those of today. So for instance in modern history, of the 15 key Founding Fathers, 7 lived to at least 80 years old: John Adams, John Quincy Adams, Samuel Adams, Jefferson, Madison, Franklin, John Jay. John Adams himself lived to 90. The youngest to die were Hamilton who died in a duel, and John Hancock who died of gout of an undocumented cause - it can be caused by excessive alcohol consumption.

    All the others lived into their 60s and 70s, so their overall life expectancy was pretty much the same as we have today. And this was long before vaccines, or even before we knew that surgeons washing their hands before surgery was a good thing to do. It's the same as you go back further into history: a study of men of renown in Ancient Greece found an average lifespan of 71.3 years [1] - and that was thousands of years ago!

    Life expectancy at birth is increasing, but longevity is barely moving. And as Kurzweil has almost certainly done plentiful research on this topic, he is fully aware of this. Cognitive dissonance strikes again.

    [1] - https://pubmed.ncbi.nlm.nih.gov/18359748/

    • modeless 7 hours ago

      This is true, and I tend to believe that indefinite human lifespan extension will come too late for anyone who is already an adult today including myself. But I do think that it will come, mostly as a consequence of advanced AI accelerating medical research. It may be wishful thinking to believe that it will happen within our lifetimes, but that doesn't mean it won't ever happen.

    • asah 17 hours ago

      This is backward-looking. Future advances don't have to work like this.

      Example: 20-ish years ago, stage IV cancer was a quick death sentence. Now many people live with various stage IV cancers for many years, and some even "die of something else". These advancements obviously skew towards helping older people.

      • somenameforme 7 hours ago

        Your claim doesn't address the issue. Even if we accept that you're correct there, you're again speaking of more people getting to their 'expiration date' rather than extending that date itself. If you cure cancer, heart disease, and everything else, we're still not going to be living to 100, or even near it, on average.

        The reason humans die of 'old age' is not because of any specific disease but because of advanced senescence. Your entire body just starts to fail. At that point basically anything can kill you. And sometimes there won't even be any particular cause, but instead your heart will simply stop beating one night while you sleep. This is how you can see people who look like they're in great shape for their age, yet the next month they're dead.

  • Barrin92 18 hours ago

    It's curious that Kurzweil's predictions about transcending biology align so closely with his expected lifespan. Reminds me of someone saying, if you ask a researcher for a timeline of a breakthrough they'll give you the expected span of their career.

    Hegel thought history ended with the Prussian state, Fukuyama thought it ended in liberal America, Paul thought judgement day was so close you need not bother to marry - and the singularity always comes around when the singularitarians get old. Funny how that works.

  • akomtu 15 hours ago

    > He predicts we'll merge with AI to transcend biological limits.

    Merging with a machine 1 million times more intelligent than us is the same as letting AI use our bodies. I'd rather live in a cave. IIRC, the 7th episode of Black Mirror starts with this plot line.

discreteevent 15 hours ago

On vibe coding vs using auto complete:

> The models have so many cognitive deficits. One example, they kept misunderstanding the code because they have too much memory from all the typical ways of doing things on the Internet that I just wasn’t adopting.

> I also feel like it’s annoying to have to type out what I want in English because it’s too much typing. If I just navigate to the part of the code that I want, and I go where I know the code has to appear and I start typing out the first few letters, autocomplete gets it and just gives you the code.

> They keep trying to make a production code base, and I have a bunch of assumptions in my code, and it’s okay. I don’t need all this extra stuff in there. So I feel like they’re bloating the code base, bloating the complexity, they keep misunderstanding, they’re using deprecated APIs a bunch of times. It’s a total mess. It’s just not net useful. I can go in, I can clean it up, but it’s not net useful.

pjmlp 5 hours ago

Regardless of the timeline, I already see programming as we know it slowly moving into prompts, at least where low-code environments for SaaS product integrations are concerned.

Dealing with Rust's borrow-checker issues, how complex C++ might be, Go's approach to language design, Java vs C#, and whatever else in the same vein will slowly become a matter of discussion for a select few, while everyone else is prompting or doing voice dictation, creating kanban tickets for agents.

dlcarrier 17 hours ago

Redefinitions aside, fully capable AI is right up there with commercially viable fusion power, cost-effective quantum computing, and fully capable self-driving cars, as a technology that is quickly advancing yet always a decade or two away.

  • RivieraKid 4 hours ago

    Waymo's self-driving cars are scaling quickly. With some inaccuracy it can be said that the problem is solved, we have the technology for a full-scale deployment, we just need to do the boring work to deploy it everywhere.

  • SequoiaHope 17 hours ago

    Fusion power seems closer than ever. And plenty of experts just five years ago thought AGI would still be decades away. A credible expert suggesting AGI is ten years away is a sign of real progress.

    • bigfishrunning 16 hours ago

      A credible expert suggesting AGI is ten years away is a sign of good marketing.

  • newsclues 16 hours ago

    What was the last example where humans succeeded at a hard problem like that?

    Space flight?

    • mkipper 16 hours ago

      Even if it's not some staggering triumph of human achievement, I'd argue that Ozempic (etc.) is similar. A magic weight loss drug has always captured the public's imagination, and it feels like I've been hearing about new weight loss drug studies in the news for my entire life that never went anywhere.

    • jaza 7 hours ago

      We've "succeeded" at space flight about as much as we've "succeeded" at AI. Yay, man on the moon! Over half a century later, and it turns out that the "next small step" - man on Mars - isn't so small and still hasn't been achieved. Anything remotely resembling sci-fi-style ubiquitous space travel remains exactly that - sci-fi!

    • RivieraKid 4 hours ago

      Waymo, which works and is scaling quickly.

mmcnl 15 hours ago

I find it strange that AGI is the goal. The label AI is off and irrelevant. A language model is not AI, even a large language model. But language models are still extremely useful and potentially revolutionary. Labelling language models as AI both understates and overstates their value. It's not AI (insert sad trombone), but that doesn't mean it isn't amazing technology (insert thunderous applause).

  • a_victorp 5 hours ago

    Without defining AGI as a goal, current AI companies would not be able to amass the amount of money they want

  • nearbuy 12 hours ago

    This terminology is confusing. Historically, AI was always used to mean any kind of machine intelligence, including the most basic novice chess AI, or an image classifier, or a video game character's AI. Now a lot of people seem to be using it as a synonym for AGI - a human-level intelligence.

musebox35 2 hours ago

Honestly, if you have any actual interest in LLMs or other generative AI variants, just go after a concrete goalpost that you yourself set, with measurable metrics to gauge your progress. Then the predicted timelines from podcasts and blog posts will become irrelevant. Experts and non-experts alike have been terrible at predicting timelines since the dawn of AI; self-driving cars and LLMs are no exception. When you are making predictions based solely on intuition and experience, it is mostly extrapolation. That is not useless - it always helps to ask questions and try to frame the future within the bounds of our current understanding - but at the same time it is important to remember that this is just speculation, not empirical science. That is also why there are such varied opinions on the topic of AI timelines. Relax and enjoy witnessing a major leap in our understanding of natural language, vision, and high-dimensional probabilistic vector spaces ;-)

TheBlight 18 hours ago

I knew this once I heard OpenAI was going to get into the porn bot business. If you have AGI you don't need porn bots.

  • pier25 15 hours ago

    Or Sora...

  • tasuki 17 hours ago

    Why not?

    • gtirloni 17 hours ago

      Most likely because you'll be filthy rich from selling AGI and won't need to go after secondary revenue sources.

      • charcircuit 16 hours ago

        >Most likely because you'll be filthy reach from selling AGI

        Why? If AGI costs more than a human or operates slower than one, it may not be economical for people to buy it. By the time it becomes economical, competitors may have also cracked it reducing your ability to charge high margins on it.

        • pier25 15 hours ago

          If OpenAI had anything even resembling AGI they'd be milking the shit out of that, even if only for marketing.

        • shdh 14 hours ago

          Cost decreases with time

          Humans can work on a problem 8 hours a day? You can run inference 24/7

          • charcircuit 10 hours ago

            It decreases, but going from $1 million per token to $0.9 million per token after a year is still a decrease - and still not viable. Paying an AGI $100 billion to work 24/7 for a year is worse than hiring 10 people at $30k a year to work shifts doing the same work 24/7.

nopinsight 18 hours ago

A definition of AGI: https://www.agidefinition.ai/

A new contribution by quite a few prominent authors. One of the better efforts at defining AGI *objectively*, rather than through indirect measures like economic impact.

I believe it is incomplete because the psychological theory it is based on is incomplete. It is definitely worth discussing though.

---

In particular, creative problem solving in the strong sense, ie the ability to make cognitive leaps, and deep understanding of complex real-world physics such as the interactions between animate and inanimate entities are missing from this definition, among others.

  • kart23 18 hours ago

    I don't know a single one of the "Social Science" items, and I'm pretty sure 90% of college educated people wouldn't know a single one either.

    • CamperBob2 15 hours ago

      Not only that, but the notion that GPT-5 will answer those questions with only 2% accuracy seems suspect. Those are exactly the kinds of questions that current models are great at.

      Nothing about that page makes much sense.

      • jonas21 8 hours ago

        The percentages are added, not averaged. Each category sums to 10%, and the General Knowledge category has 5 equally-weighted subcategories, so 2% is the best possible score you can get in the social science subcategory.

        I don't know why they decided to do it this way. It's very confusing.

  • chrisweekly 17 hours ago

    I agree it seems like a better-structured effort than many others. But its shortcomings go beyond a shallow and incomplete foundation in psychology. It also has basic errors in its execution, eg a "Geography" question about centripetal and centrifugal forces. Color me extremely skeptical.

ciconia 18 hours ago

It's funny how there's such a pervasive cynicism about AI in the developer community, yet everyone is still excited about vibe coding. Strange times...

  • leptons 18 hours ago

    What developer is excited about "vibe coding"? The only people excited about "vibe coding" are people who can't code.

    • qingcharles 15 hours ago

      That's a crazy generalization.

      I've coded professionally for 40 years. I'm hugely excited about vibe coding. I use it every single day to create little tools and web apps to help me do my job.

    • spjt 10 hours ago

      I love vibe coding because it does the things I hate really well, like meeting test coverage requirements and writing doc comments.

    • gabriel-uribe 12 hours ago

      Deeply excited about vibe coding -- my non-technical cofounder 'codes' now.

    • dist-epoch 16 hours ago

      ...and all developers use Macs, live in SF, and deploy to AWS.

phoenixreader 8 hours ago

A decade is nothing. If these issues will be worked through a decade from now, that means the best time to think about the opportunities and consequences is now.

  • modeless 6 hours ago

    Yeah, I see people pooh-poohing the idea of humanoid robots being useful this decade, saying it will take at least 20 years. Oh yeah? Instead of 5 years to render all human labor obsolete, it will take 20? The magnitude of that change is so large that the implications of it happening anytime in our lifetimes are too big to ignore.

    The important thing is that this is not going to be perpetually 20 years in the future like fusion. This is something that will happen.

    • chronci739 4 hours ago

      > This is something that will happen.

      Not in our lifetime.

      The iPhone came out less than 20 years ago.

      And what, you scan QR codes at restaurants with iPhones?

K0balt 13 hours ago

We will never achieve AGI, because we keep moving the goalposts.

SOTA models are already capable of outperforming any human on earth in a dizzying array of ways, especially when you consider scale.

Humans also produce nonsensical, useless output. Lots of it.

Yes, LLMs have many limitations that humans easily transcend.

But few if any humans on earth can demonstrate the breadth and depth of competence that a SOTA model possesses.

Relatively few (probably less than half) are casually capable of the level of reasoning that LLMs exhibit.

And, more importantly, as anyone in the field when neural networks were new is aware, AGI never meant human level intelligence until the LLM age. It just meant that a system could generalize one domain from knowledge gained in other domains without supervision or programming.

  • alganet 6 hours ago

    > We will never achieve AGI, because we keep moving the goalposts.

    I think it's fair to do it to the idea of AGI.

    Moving the goalpost is often seen as a bad thing (like, shifting arguments around). However, in a more general sense, it's our special human sauce. We get better at stuff, then raise the bar. I don't see a reason why we should give LLMs a break if we can be more demanding of them.

    > SOTA models are already capable of outperforming any human on earth in a dizzying array of ways, especially when you consider scale.

    Performance should include energy consumption. Humans are incredibly efficient at being smart while demanding very little energy.

    > But few if any humans on earth can demonstrate the breadth and depth of competence that a SOTA model possesses.

    What if we could? What if education mostly stopped improving in 1820 and we're still learning physics at school by doing exercises about train collisions and clock pendulums?

RivieraKid 4 hours ago

This would be great if true; I need 5 years to reach financial independence, so a decade should be plenty of time.

1970-01-01 18 hours ago

Great quote:

"When you get a demo and something works 90% of the time, that’s just the first nine. Then you need the second nine, a third nine, a fourth nine, a fifth nine. While I was at Tesla for five years or so, we went through maybe three nines or two nines. I don’t know what it is, but multiple nines of iteration. There are still more nines to go.

That’s why these things take so long."

  • onlyrealcuzzo 18 hours ago

    Importantly, the first 9s are the easiest.

    If you need to get to 9 9s, the 9th 9 could be more effort than the other 8 combined.
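
    A toy illustration (the 2x-per-nine cost is purely an assumed growth rate, not anything from the interview):

        for n in range(1, 10):
            reliability = 1 - 10 ** -n    # n nines: 90%, 99%, 99.9%, ...
            nth_nine_cost = 2 ** (n - 1)  # assumed: each nine costs 2x the last
            total_cost = 2 ** n - 1       # sum of the cost of all nines so far
            print(f"{n} nines: {reliability:.9f} reliable, "
                  f"marginal cost {nth_nine_cost}, total {total_cost}")

    Under that assumption the 9th nine costs 256 units while the first eight together cost 255 - i.e. more than the other 8 combined.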

zhivota 7 hours ago

Is there any more information about the Eureka educational project? I think teaching about AI is probably the wrong endpoint to target first (too complex, too many prerequisites); really, these tools should work from the base of the educational pyramid and move up from there.

There is already a lot of success in adaptive learning in elementary school, for instance: my kids are blasting through math on Prodigy, and it seems like Synthesis may be a great tool as well. I believe we're just at the beginning of this wave. For that level of learning I don't think we need dramatically more capability, just better application.

coldtea 2 hours ago

Yeah, just long enough to not be held accountable for that prediction when it doesn't happen

mediumsmart 17 hours ago

Getting to AGI is not the problem. Finding the planet that it is going to run on will be.

thdhhghgbhy 2 hours ago

That's convenient, will probably take him through to retirement.

mustaphah 15 hours ago

This aligns with METR's Time Horizons [1], the current SOTA "Moore's Law" for AI agents:

- The length of tasks AI can complete doubles every ~7 months

- In 2-4 years, AIs could autonomously complete week-long projects.

- In under 10 years, they might handle month-long software or knowledge work.
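
Taking the doubling rule at face value (a naive extrapolation; the 2-hour starting horizon and the 40h/160h targets are my assumptions, not METR's exact figures):

    from math import log2

    DOUBLING_MONTHS = 7
    current_horizon_hours = 2.0  # assumed current autonomous-task horizon

    for label, target_hours in [("week-long (40h)", 40), ("month-long (160h)", 160)]:
        doublings = log2(target_hours / current_horizon_hours)
        years = doublings * DOUBLING_MONTHS / 12
        print(f"{label}: ~{years:.1f} years away")  # ~2.5y and ~3.7y

Both numbers land inside the ranges quoted above only because of the assumed starting horizon; shift it and the dates shift with it.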

[1] https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...

  • lm28469 13 hours ago

    It's like saying your newborn will have the same mass as earth in 50 years if he continues on his first month weight gain trajectory.

sarchertech 17 hours ago

I’m in the Penrose camp that Turing machines can’t be conscious, which is required for true AGI.

jsiepkes 5 hours ago

Unless someone can show me some sort of "Moore's law" for LLMs, saying it will "take a decade" sounds more to me like it could "take 10 years for the next 20 years".

alganet 12 hours ago

Half an hour into the interview, and it sounds pretty good! I'm genuinely surprised.

cayleyh 18 hours ago

"decade" being the universal time frame for "I don't know" :D

ActorNightly 18 hours ago

Not a decade. More like a century, and that is if society figures itself out enough to do some engineering on a planetary scale, and quantum computing is viable.

Fundamentally, AGI requires 2 things.

First, it needs to be able to operate without information, learning as it goes. The core kernel should be such that it doesn't have any sort of training on real-world concepts, only general language parsing that it can use to map to some logic structure in order to determine a plan of action. So, for example, if you give the kernel the ability to send Ethernet packets, it should eventually figure out how to talk TLS to communicate with the modern web, even if that takes an insane amount of repetition.

The reason for this is that you want the kernel to be able to find its way through any arbitrarily complex problem space. Then as it has access to more data, whether real time, or in memory, it can be more and more efficient.

This part is solvable. After all, human brains do this. A single rack of Google TPUs delivers roughly the same petaflops as a human brain operating at max capacity, if you assume neuron activation is an add-multiply with a firing speed of 200 times/second - and humans don't use all of their brain all the time.
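
Back-of-envelope for that claim (all constants here are my assumptions; counting per-synapse multiply-adds is one common way to run this estimate):

    neurons = 8.6e10           # ~86 billion neurons
    synapses_per_neuron = 1e4  # assumed average of ~10,000 connections
    firing_hz = 200            # the max firing rate cited above
    ops_per_event = 2          # one multiply + one add per synaptic event

    brain_flops = neurons * synapses_per_neuron * firing_hz * ops_per_event
    print(f"~{brain_flops / 1e15:.0f} petaflop/s")  # ~344 petaflop/s

Whether a single TPU rack actually matches that depends heavily on the generation and on numeric precision, but the orders of magnitude are at least comparable.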

The second part that makes the intelligence general is the ability to simulate reality faster than reality. Life is imperative by nature, and there are processes with chaotic effects (human brains being one of them), that have no good mathematical approximations. As such, if an AGI can truly simulate a human brain to be able to predict behavior, it needs to do this at an approximation level that is good enough, but also fast enough to where it can predict your behavior before you exhibit it, with overhead in also running simulations in parallel and figuring out the best course of actions. So for a single brain, you are looking at probably a full 6 warehouses full of TPUs.

  • dist-epoch 16 hours ago

    AIs already fake-simulate the weather (a chaotic system) using 1% of the resources used by the real-simulating supercomputers.

  • ctoth 17 hours ago

    You want a "core kernel" with "general language parsing" but no training on real-world concepts.

    Read that sentence again. Slowly.

    What do you think "general language parsing" IS if not learned patterns from real-world data? You're literally describing a transformer and then saying we need to invent it.

    And your TLS example is deranged. You want an agent to discover the TLS protocol by randomly sending Ethernet packets? The combinatorial search space is so large this wouldn't happen before the sun explodes. This isn't intelligence! This is brute force with extra steps!

    Transformers already ARE general algorithms with zero hardcoded linguistic knowledge. The architecture doesn't know what a noun is. It doesn't know what English is. It learns everything from data through gradient descent. That's the entire damn point.

    You're saying we need to solve a problem that was already solved in 2017 while claiming it needs a century of quantum computing.

    • ActorNightly 16 hours ago

      >What do you think "general language parsing" IS if not learned patterns from real-world data?

      I want you to hertograize the enpostule by brasetting the leekerists, while making sure that the croalbastes are not exhibiting any ecrocrafic effects

      Whatever you understand about that task is what a kernel will "understand" as well. And however you go about solving it, the kernel will also follow similar patterns of behaviour (starting with figuring out what "hertograize" means, which then leads to other tasks, and so on).

      >You want an agent to discover the TLS protocol by randomly sending ethernet packets? The combinatorial search space is so large this wouldn't happen before the sun explodes.

      In pure combination, yes. In smart, directed, intelligent search, no. Ideally the kernel could listen for incoming traffic and figure out patterns based on that. But the point is that the kernel should figure out that listening for traffic is optimal without you specifically telling it, because it "understands" the concept of other "entities" communicating with it, knows that communication is bound to be in a structured format, and has internal reward systems in place for figuring it out through listening rather than expending energy on brute-force search.

      Whatever that process is, it will get applied to much harder problems identically.

      >Transformers already ARE general algorithms with zero hardcoded linguistic knowledge. The architecture doesn't know what a noun is. It doesn't know what English is. It learns everything from data through gradient descent. That's the entire damn point.

      It doesn't learn what a noun is or what English is; it's a statistical mapping that just tends to work well. LLMs are just efficient lookup maps, and lookup maps can only go so far as to interpolate on the knowledge encoded within them. They can simulate intelligence in the sense of recursive lookups, but fundamentally that process is very guided - hence all the manual things like prompt engineering, MCP servers, agents, skills, and so on.

arisAlexis an hour ago

I want to be heretical and say that Karpathy hasn't worked in a frontier lab since 2020 and missed all the greatness of the last years. Humans are humans are humans.

brunooliv 7 hours ago

The thing is, all these big labs are so "transformer-pilled", and so need to keep the money furnaces burning, that I think it'll take considerably more than 10 years - more like 20-30 if we're lucky.

reenorap 18 hours ago

Are "agents" just programs that call into an LLM and based on the response, it will do something?

  • cootsnuck 18 hours ago

    Kinda. It's just an LLM that performs function calling (i.e. the LLM "decides" when a function needs to be called for a task and passes the appropriate function name and arguments for that function based on its context). So yea an "agent" is that LLM doing all of that and then your program that actually executes the function accordingly.

    That's an "agent" at its simplest -- a LLM able to derive from natural language when it is contextually appropriate to call out to external "tools" (i.e. functions).
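
    A minimal sketch of that loop (the message shapes and the call_llm stub are illustrative, not any particular vendor's API):

        import json

        TOOLS = {"get_weather": lambda city: f"22C and sunny in {city}"}

        def call_llm(messages: list) -> dict:
            """Stub for a real model call; returns text or a tool request."""
            raise NotImplementedError

        def run_agent(user_input: str) -> str:
            messages = [{"role": "user", "content": user_input}]
            while True:
                reply = call_llm(messages)
                messages.append(reply)
                if reply.get("tool_call") is None:  # plain answer: we're done
                    return reply["content"]
                call = reply["tool_call"]           # the LLM "decided" on a tool
                result = TOOLS[call["name"]](**json.loads(call["arguments"]))
                messages.append({"role": "tool", "content": result})

    The key point is the last line: the model only ever asks; your program is what actually executes the function and feeds the result back.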

  • fragmede 16 hours ago

    "Something" is broad and not well defined, but basically yeah. Rather than try to define it in terms of complexity of the something, I'll put it in terms of minutes. If the LLM returns a response, and that response gets fed into a system and run, and that's it, I wouldn't really call that agentic. It's got to go a few more rounds back and forth to be agentic, imo. In terms of time, I'd say the agent program has to be capable of at least 10 minutes of going from user input, then the program calling into the LLM, feeding the LLM response into a system, feeding that result back into the LLM, and feeding that into the system in a loop. Obviously there are ways to game that metric, like the terrible lines of code metric, but I think it's a decent handwave for when it feels like there's an agent working for me rather than a non-agentic system. What it's doing for those 10 minutes is important, calling "sleep 600" obviously doesn't count.

    E.g., a programming LLM with an agentic harness and access to a computer would be able to, given design-doc.md and Todo.md, implement feature X, make sure it compiles, run some basic smoke tests, write appropriate unit tests, make sure they all pass, and finally push the code and create a draft PR.

    Naturally, not every call into the agent is going to take the full 10 minutes. It may need to ask questions before getting started, or stop if there's an unrecoverable error. Sometimes you'll just need to tell it "continue", but the system should be capable of a 10-minute run (hopefully longer!) given enough support.

shreezus 6 hours ago

I have personally witnessed AI capabilities that would be considered “impossible” by current public standards.

The next 2-3 years are going to be incredibly interesting.

  • replwoacause 6 hours ago

    As in more impressive than what we consumers have access to from OpenAI, Anthropic, etc?

psadri 17 hours ago

When I think about a problem, I consciously explore a tree (or graph) of possibility chains. This requires a mental space to keep track of "state"; sometimes jotting things down on paper helps if I can't keep it all in my head. The process is:

- generate some possibilities

- rank them based on intuition (this might happen subconsciously!)

- ask what happens if we follow possibility Pn

- push Pn onto the stack

- recurse, or pop the stack if dead-ended

I feel LLMs are fairly capable when it comes to doing each of those steps in isolation. But not when it is all put together as a process.
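
That process is essentially depth-first search with an intuition-driven ranking. A sketch (the four callbacks stand in for things a human does implicitly):

    def explore(start, generate, score, is_dead_end, is_goal):
        """Walk a tree of possibility chains, best-ranked branch first."""
        stack = [start]              # the explicit "state" scratchpad
        while stack:
            state = stack.pop()      # pop the stack when backtracking
            if is_goal(state):
                return state
            if is_dead_end(state):
                continue             # dead-ended: backtrack
            ranked = sorted(generate(state), key=score)  # rank by intuition
            stack.extend(ranked)     # highest-scored lands on top
        return None

Each isolated step maps to one callback; the observation above is that LLMs can play any single role but struggle to drive the whole loop.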

abhishekismdhn 8 hours ago

Unless we start thinking about fundamentally different ways of solving AI, we will always be 10 years away. It's puzzling that no one wants to think beyond backpropagation

JCharante 9 hours ago

Why does everyone have such short timelines to show progress? So what if it takes 50 years to develop? We'll have AGI for the next million years.

themafia 14 hours ago

5 decades.

You have one decade to clean up your power use problem. If you don't you will find yourself in the next AI winter.

  • shdh 14 hours ago

    Power use is less important than model capability

    AGI is either more scale or differing systems, or both

    They can always optimize for power consumption after AGI has been reached

    • themafia 13 hours ago

      > AGI is either more scale

      So you plan to scale without increasing power usage. How's that?

      > They can always optimize for power consumption after AGI has been reached

      If you don't optimize power consumption you're going to increase surface area required to build it. There are hard physical limits having to do with signal propagation times.

      You're ignoring the engineering entirely. The software is hardly interesting or even evolving.

asdev 19 hours ago

Are researchers scared to just come out and say it because they'll be labeled as wrong if the extreme tail case happens?

  • andy_ppp 19 hours ago

    No, it’s because of money and the hype cycle.

    • ionwake 18 hours ago

        I mean, you say this, but I haven't touched a line of code as a programmer in months, having been totally replaced by AI.

        I mean, sure, I now "control" the AI, but I still think these no-AGI-for-two-decades claims are a bit rough.

      • andy_ppp 18 hours ago

          I think AI is great and extremely helpful, but if you've been replaced already, maybe you have more time now to make better code and decisions? If you think the AI output is good by default, I think maybe that's a problem. I think general intelligence is something other than what we have now; these systems are extremely bad at updating their knowledge and hopeless at applying understanding from one area to another. For example, self-driving cars are still so brittle that every city needs new and specific training - whereas you can take a car with the controls on the opposite side and safely drive in another country.

        • ionwake 4 hours ago

          Yeah i agree. However like ... I don't understand why you think I am making bad decisions? I'm a self made ( to an extent ) millionaire, I am doing ok

      • rootusrootus 16 hours ago

        I don't want to sound mean, but c'mon, the reality is that if you haven't touched a line of code in months, you are/were not a programmer. I love Claude Code, it really has its moments. But even for the stuff it is exceptionally good at, I have to regularly fix mistakes it has made. And I only give it the fairly easy stuff I don't feel like doing myself.

        • ionwake 4 hours ago

          It is ok brother I can handle the accusation. Perhaps after 2 decades... I really am no longer a programmer?

  • strangattractor 18 hours ago

    They are afraid to say it because it may affect the funding. Currently with all the hype surrounding AI investors and governments will literally shower you with funding. Always follow the money:) Buy the dream - sell the reality.

  • Goofy_Coyote 18 hours ago

    I don’t think they’re scared, I think they know it’s a lose-tie game.

    If you're correct, there's not much reward aside from the "I told you so" bragging rights. If you're wrong, though - boy oh boy - you'll be deemed unworthy.

    You only need to get one extreme prediction right (stock market collapse, AI taking over, etc ), then you’ll be seen as “the guru”, the expert, the one who saw it coming. You’ll be rewarded by being invited to boards, panels and government councils to share your wisdom, and be handsomely paid to explain, in hindsight, why it was obvious to you, and express how baffling it was that no one else could see what you saw.

    On the other hand, if you predict an extreme case and get it wrong, there are virtually zero penalties; no one will hold it against you, and no one will even remember.

    So yeah, fame and fortune is in taking many shots at predicting disasters, not the other way around.

johnhamlin 18 hours ago

Did anyone here actually watch the video before commenting? I’m seeing all the same old opinions and no specific criticisms of anything Karpathy said here.

  • dang 7 hours ago

    More specific responses have come in as people have digested more of the content.

    This is the reflexive/reflective distinction (https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor...). Reflexive comments—the kind that express some pre-existing feeling or opinion that happens to get triggered by association—are much faster to produce, so unfortunately they show up first in many threads.

aubanel 17 hours ago

> If I were to steelman the Sutton perspective, it would be...

I don't find it very courteous to say that you're steelmanning someone's argument. Sutton is certainly smart enough to have steelmanned his argument himself. Steelmanning: do it in your head, don't say it!

goalieca 18 hours ago

I remember attending a lecture by a famous quantum computing researcher in 2003. He said that quantum computing was 15-20 years away - and then he followed up by saying that if he told anyone it was further away, he wouldn't get funding!

  • Yoric 17 hours ago

    And now (useful) quantum computing is 5 years away! Has been for a few years, too.

  • EA-3167 17 hours ago

    It's an excellent time-frame that sounds imminent enough to draw interest (and funding), but is distant enough that you can delay the promised arrival a few times in the span of a career before retiring.

    Fusion research lives and dies on this premise, ignoring the hard problems that require fundamental breakthroughs in areas such as materials science, in favor of touting arbitrary benchmarks that don't indicate real progress towards fusion as a source of power on the grid.

    "Full self driving" is another example; your car won't be doing this, but companies will brag about limited roll-outs of niche cases in dry, flat, places that are easy to navigate.

    • bhelkey 13 hours ago

      > companies will brag about limited roll-outs of niche cases in dry, flat, places that are easy to navigate

      According to their website, Waymo offers autonomous rides to the general public in Austin, Atlanta, Phoenix, the San Francisco Bay Area, and Los Angeles [1].

      * San Francisco is an extremely hilly city that gets a fair bit of fog.

      * Los Angeles has notorious traffic and particularly aggressive drivers.

      * Atlanta gets ~50 inches of rain a year, more than Seattle [2].

      [1] https://waymo.com/faq/#:~:text=Where%20does%20Waymo%20operat...

      [2] https://www.forbes.com/sites/marshallshepherd/2024/09/03/whi...

    • plastic3169 16 hours ago

      > "Full self driving" is another example; your car won't be doing this, but companies will brag about limited roll-outs of niche cases in dry, flat places that are easy to navigate.

      Not expecting my car to be self-driving anytime soon, but I have understood there is an actual working robotaxi service in San Francisco, which is not easy or flat? I think we can’t keep saying self-driving cars will never happen when this kind of thing already exists.

      • EA-3167 16 hours ago

        It's true that SF isn't flat, but it's incredibly well mapped, it never snows, and you don't have to worry about roads ravaged by frost heaves. There's a reason the new DoorDash automated delivery service is starting off in Phoenix and not Boston, for example.

maqnius 17 hours ago

Well, no one really knows — maybe we're just putting a lot of effort into turning a lump of clay into pizza. It already looks confusingly similar; now it just needs to smell and taste like it.

aaroninsf 17 hours ago

I'm pretty content to say this may be true, but may well prove quite wrong.

Why? Because humans—including the smartest of us—are continuously prone to cognitive errors, and reasoning about the non-linear behavior of complex systems is a domain we are predictably and durably terrible at, even when we try to compensate.

Personally I consider the case of self-driving cars illustrative, and a go-to reminder of my own very human failure in this regard. I was quite sure that we could not have autonomous vehicles in dynamic, messy urban areas without true AGI, and that FSD would, in the fashion of the failed Tesla offering, emerge first in the much more constrained space of the highway system, which would also benefit from federal regulation and coordination.

Now Waymos have eaten SF, and their driving is increasingly nuanced. Last night a friend and very early adopter relayed a series of anecdotes about strikingly nuanced interactions he'd recently been party to, including being in a car that was attacked late at night, and how one did exactly the right thing when approached head-on in a narrow neighborhood street that required backing out. Etc.

That's just one example, and IMO we are only beginning to experience the benefits of the "network effects" so popular in tales of singularity take-off.

Ten years is a very, very, very long time under current conditions. I have worked with neural networks since the mid-90s (academically: published, presented, etc.) and I have proven terrible at anticipating how quickly "things" will improve. I have now multiple times watched predictions I made of "5-8" or "8-10" years, or "too far out to tell", instead arrive within 3 years.

Karpathy is smart of course but he's no smarter in this domain than any of the rest of us.

Are scaled tuned transformers with tack-ons going to give us AGI in 18 months? "No" is a safe bet. Is no approach going to give us AGI inside of 5 years? That is absolutely a bet I would never make. Not even close.

lubesGordi 15 hours ago

13 minutes in, Andrej talks about how the models don't even really need the knowledge; it would be better to have just a core with the algorithms it has learned, a "cognitive core." That sounds awesome, and it would shrink the size of the models for sure. You don't need the entire knowledge of the internet compressed down and stashed in VRAM somewhere. Lots of implications.

anon191928 17 hours ago

Amazing that he speaks the truth even though trillions of dollars (and his stock options?) depend on it. He and Demis H. deserve all the respect.

  • Handy-Man 17 hours ago

    I mean, Demis is just another hype man now, irrespective of whatever important research they may be doing in the background.

observationist 18 hours ago

Kurzweil has been eerily right so far, and his timeline has AGI at 2029. When software can perform any unattended, self-directed task (in principle) at least as well as any human, over the sum total of all tasks that humans are capable of doing, we will have reached AGI.

Software can already write text on any given subject better than a majority of humanity. It can arguably drive better across more contexts than all of humanity - any human driver over a billion miles of normal traffic will have more accidents than self-driving AI over the same distance. Short stories, haikus, simple images, utility scripts, simple software, web design, music generation - all of these tasks are already superhuman.

Longer time horizons, realtime and continuous memory, a suite of metacognitive tasks, planning, synthesis of large bodies of disparate facts into novel theory, and a few other categories of tasks are currently out of reach, but some are nearly solved, and the list of things that humans can do better than AI gets shorter by the day. We're a few breakthroughs away, maybe even one big architectural leap, from having software that is capable (in principle) of doing anything humans can do.

I think AGI is going to be here faster than Kurzweil predicted, because he probably didn't take into consideration the enormous amount of money being spent on these efforts.

There has never been anything like this in history - in the last decade, over 5 trillion dollars has been spent on AI research and on technologies that support AI (like crypto-mining datacenters that pivoted to AI, plus new power, water, and data infrastructure), providing the foundation for the concerted efforts in research and development. There are tens of thousands of AI researchers - some working in private finance, some in academia, some doing military research, some doing open source, and a ton doing private-sector research - and an astonishing amount of it is getting published and shared.

In contrast, the entire world spent around 16 trillion dollars on World War II - all of the R&D, emergency projects, military logistics, humanitarian aid, and so on.

We have AI getting more resources, attention, and humans involved in a singular development effort, pushing toward a radical transformation of the very concept of "labor". While I think it might be a good thing if it were a decade away - even perpetually so, until we have some reasonable plan for coping with it - I very much think we're going to see AGI in the very near future.

*When I say "in principle" I mean that given the appropriate form factor, access, or controls, the AI can do all the thinking, planning, and execution that a human could do, at least as well as any human. We will have places that we don't want robots or AI going, tasks reserved for humans, traditions, taboos, economics, and norms that dictate AI capabilities in practice, but there will be no legitimacy to the idea that an AI couldn't do a thing.

bhewes 15 hours ago

The end is gold. The first half is kinda intro, but then the language tightens up and they rock it.

echo42null 7 hours ago

Is there a definitive definition of what an "Agent" is? Or does everyone have their own?

wseqyrku 4 hours ago

Translation: it's going to be doing alright at surveillance for a long time. If you thought AGI will just come out whenever it's ready, think again.

andrewrn 15 hours ago

Begging someone to coherently define AGI. World's most ambiguous term.

cboyardee 17 hours ago

AGI ---> A Great Illusion!

nurettin 4 hours ago

Continual learning would mean that the data somehow has to become part of the model, and that the model incrementally adapts to novel inputs - not just tacked on and backpropagated, but inside the network, affecting decisions. Current architectures are pretty much dead ends in that respect.

seydor 18 hours ago

I wouldn't consider either of them qualified to answer that question

Zacharias030 9 hours ago

With all due respect, what does it say about us that "famous researcher voices his speculative opinion" is an instant #1 on Hacker News?

  • tptacek 9 hours ago

    That the speculative opinions of famous researchers are a good starting point for curious conversation.

  • padolsey 8 hours ago

    I mean, HN is no stranger to cults of personality. Post a paulg essay that says something a teenager could write and it'll fly up to position 1.

    • jaza 7 hours ago

      True. We have our gods, same as every other tribe throughout history.

  • seydor 7 hours ago

    Is it worse than "Rich CEO expressing certainty over his hunch"?

lvl155 15 hours ago

I think LLMs are a great way to train a true AGI.

rwaksmunski 18 hours ago

AGI is still a decade away, and always will be.

  • gjm11 15 hours ago

    You say that as if people had been saying "10 years away" for ages, but I don't think that's true at all.

    There's some information about historical predictions at https://www.openphilanthropy.org/research/what-should-we-lea... (written in 2016) from which (I am including the spreadsheet found at footnote 27) these are some I-hope-representative data points, with predictions from actual AI researchers, popularizers, pundits, and SF authors:

    1960: Herbert Simon predicts machines can do all (intellectual) work humans can "within 20 years".

    1961: Marvin Minsky says "within our lifetimes, machines may surpass us"; he was 33 at the time, suggesting a not-very-confident timescale of say 40 years.

    1962: I J Good predicts something at or above human level circa 1978.

    1963: John McCarthy allegedly hopes for "a fully-intelligent machine" within a decade.

    1970: I J Good predicts 1994 +- 10 years.

    1972: a survey of 67 computer scientists found 27% saying <= 20 years, 32% saying 20-50 years, and 42% saying > 50 years.

    1977-8: McCarthy says things like "4 to 400 years" and "5 to 500 years".

    1988: Hans Moravec predicts human-level intelligence in 40 years.

    1993: Vernor Vinge predicts better-than-human intelligence in the range 2005..2030.

    1999: Eliezer Yudkowsky predicts intelligence explosion circa 2020.

    2001: Ben Goertzel predicts "during the next 100 years or so".

    2001: Arthur C Clarke predicts human-level intelligence circa 2020.

    2006: Douglas Hofstadter predicts somewhere around 2100.

    2006: Ray Solomonoff predicts within 20 years.

    2008: Nick Bostrom says <50% chance by 2033.

    2008: Rodney Brooks says no human-level AI by 2030.

    2009: Shane Legg says probably between 2018 and 2036.

    2011: Rich Sutton estimates somewhere around 2030.

    Of these, exactly one suggests a timescale of 10 years; the same person a little while later expresses huge uncertainty ("4 to 400 years"). The others are predicting timescales of multiple decades, also generally with low confidence.

    Some of those predictions are now known to have been too early. There definitely seems to be a sort of tendency to say things like "about 30 years" for exciting technologies many of whose key details remain un-worked-out: AI, fusion power, quantum computing, etc. But it's definitely not the case that "a decade away" has been a mainstream prediction for a long time. People are in fact adjusting their expectations on the basis of the progress they observe in recent years. For most of the time since the idea of AI started being taken seriously, "10 years from now" was an exceptionally optimistic[1] prediction; hardly anyone thought it would be that soon. Now, at least if you listen to AI researchers rather than people pontificating on social media, "10 years from now" is a typical prediction; in fact my impression is that most people who spend time thinking about these things[2] expect genuinely-human-level AI systems sooner than that, though they typically have rather wide confidence intervals.

    [1] "Optimistic" in the narrow sense in which expecting more progress is by definition "optimistic". There are many many ways in which human-level, or better-than-human-level, AI could in fact be a very bad thing, and some of them are worse if it happens sooner, so "optimistic" predictions aren't necessarily optimistic in the usual sense.

    [2] Most, not all, of course.

    • password54321 2 hours ago

      People like Eliezer and Nick Bostrom are living proof that if you say enough and sound smart enough, people will listen to you and think you have credibility.

      Meanwhile you won't find anyone on here who is an author of "Attention Is All You Need" - you know, the thing that is actually the driving force behind LLMs.

kachapopopow 19 hours ago

AGI is already here if you shift some goal posts :)

From skimming the conversation, it seems to mostly revolve around LLMs (transformer models), which are probably not going to be the way we obtain AGI to begin with; frankly, they are too simple to be AGI. But the reason there's so much hype is precisely that they are simple to begin with, so really, I don't know.

  • ecocentrik 18 hours ago

    LLMs are close enough to pass the Turing Test. That was a huge milestone. They are capable of abstract reasoning and can perform many tasks very well but they aren't AGI. They can't teach themselves to play chess at the level of a dedicated chess engine or fly an airplane using the same model they use to copypasta a React UI. They can only fool non-proficient humans into believing that they might be capable of doing those things.

    • password54321 17 hours ago

      The Turing Test was a thought experiment, not a real benchmark for intelligence. If you read the paper the idea originated from, it is largely philosophical.

      As for abstract reasoning: if you look at ARC-2, they are barely capable, though at least some progress has been made on the ARC-1 benchmark.

      • ecocentrik an hour ago

        I wasn't claiming the Turing Test was a benchmark for intelligence, but the ability to fool a human into thinking a machine is intelligent in conversation is still a significant milestone. I should have said "some abstract reasoning". ARC-2 looks promising.

  • tim333 10 hours ago

    I think most people think of AGI as being able to do the stuff humans do, and it's still missing a fair bit there.

  • throwaway-0001 17 hours ago

    A transistor is very simple too, and here we are. Don’t dismiss something because it’s simple.

    • password54321 17 hours ago

      You've got to look at how it scales. LLMs have already stopped increasing in parameter count, as they don't get better by scaling them up anymore. New ideas are needed.

      • throwaway-0001 17 hours ago

        You’re right… but still, what was done until today is significant and useful already

rcarmo 17 hours ago

Hmm. Zeno's Paradox.

(I was in college during the first AI Winter, so... I can't help but think that the cycles are tighter but convergence isn't guaranteed.)

awongh 18 hours ago

Now that Nvidia is the most valuable company, all this talk of actual AGI will be washed away by the huge amount of dollars driving the hype train.

Most of these companies' value is built on the idea of AGI being achievable in the near future.

AGI being too close or too far away affects the value of these companies: too close, and it'll seem too likely that the current leaders will win; too far away, and the level of spending will seem unsustainable.

  • michaelt 17 hours ago

    > Most of these companies value is built on the idea of AGI being achievable in the near future.

    Is it? Or is it based on the idea a load of white collar workers will have their jobs automated, and companies will happily spend mid four figures for tech that replaces a worker earning mid five figures?

    • rootusrootus 17 hours ago

      I think companies that expect to use AI to cut their salary overhead while making the same products as before are going to get clobbered by companies that use AI to grow. A few people may have to retrain into a different line of work, but I don't really see AI putting people out of work en masse.

    • JumpCrisscross 16 hours ago

      From what I've seen, the most compelling thesis involves robotics. We're seeing evidence that LLMs tokenising physical inputs can operate robots better than previous methods. If that pans out, the investment thesis is secured. No AGI needed.

    • jjulius 17 hours ago

      Why not both? :)

  • zeroonetwothree 17 hours ago

    It’s possible for AI to provide tremendous economic value without AGI

    • sameermanek 15 hours ago

      That is for a governing body to look out for, NOT private companies. Governments have a job to run massive programs for socioeconomic welfare without caring about profit.

    • sarchertech 17 hours ago

      AGI in the not too distant future is always priced in. Just providing tremendous economic value won’t make the stock prices keep going up.

    • anon191928 17 hours ago

      That is doubtful. Sure, it provides a lot of value, but current valuations are at dot-com-top levels. Everyone knew the internet had value, but stocks pushed it too high.

agrover 19 hours ago

This seems to be the growing consensus.

  • sputknick 19 hours ago

    There is a very strange totally coincidental correlation where if you are smart and NOT trying to raise money for an AI start-up, you think AGI is far away, and if you are smart and actively raising money for an AI start-up, then AGI is right around the corner. One of those odd coincidences of modern life

    • chrismorgan 10 hours ago

      Mind you, such a correlation can be reasonable—the Yesses work for something because they believe it, while the Noes don’t because they don’t. (In this instance, I’m firmly a No, and I don’t say such a correlation is reasonable, due to the corrupting influence of money plus hype sweeping people along, which I think are much more common. But there will still be at least some that are True Believers, and it does make sense that they would then try to raise money to achieve their vision.)

ares623 19 hours ago

Is that at current investment levels?

maffyoo 16 hours ago

Fusion is 30 years away

  • alkyon 16 hours ago

    The future is now

dyauspitr 10 hours ago

I don’t think we’re ever intentionally and methodically going to get to AGI. It’s going to be doing stuff we’re doing now at a massive scale that’s going to produce emergent AGI, purely due to scale.

notepad0x90 18 hours ago

I'm betting we'll have either cold fusion or the "year of the linux desktop" (finally) before AGI.

kazinator 13 hours ago

AGI is forever away if you stay blinkered to the LLM/diffusion stochastic-parrot program, which might have already passed its peak.

guluarte 15 hours ago

LLMs will never lead to AGI. Never. Does a PhD know all of the internet? No, but he can create new knowledge. LLMs are trained with all the data possible to cover most cases, and they are excellent at autocomplete.

mkbelieve 18 hours ago

I don't understand how anyone can believe that we're near even a whiff of AGI when we barely understand what dreaming is, or how the human brain interacts with the quantum world. There are so many elements of human creativity that are still utterly hidden behind a wall that it makes me feel insane when an entire industry is convinced we're just magically going to have the answer soon.

The people heralding the emergence of AGI are doing little more than pushing Ponzi schemes along while simultaneously fueling vitriolic waves of hate and neo-luddism for a ground-breaking technology boom that could enhance everything about how we live our lives... if it doesn't get regulated into the ground due to the fear they're recklessly cooking up.

  • hackinthebochs 16 hours ago

    Big scientific revolutions tend to happen before we understand the relevant mechanisms. It is only after the fact that we develop a theory to understand how it works. AGI will very likely follow the same trend. Enough people are throwing enough things at the wall that eventually something will stick.

  • kovek 18 hours ago

    There are many different definitions of "AGI" that people come up with; some include dreaming, the quantum world, and creativity, and some do not.

  • throwaway-0001 17 hours ago

    We don't know how a horse works, but we got cars. The analogy doesn't work.

nemo44x 16 hours ago

We are closer to where we were 20 years ago than we are to AGI today.

moomoo11 17 hours ago

What about Super AGI?

6d6b73 18 hours ago

Even without AGI, current LLMs will change society in ways we can't yet imagine. And this is both good and bad. Current LLMs are just a different type of automation, not mechanical like control systems and robots, but intellectual. They don't have to be able to think independently, but as long as they automate some white-collar tasks, they will change how the rest of society works. The simple transistor is just a small electronic component that is a better version of a tube, and yet it changed everything in a few decades. How will the world change because of LLMs? I have no idea, but I know it doesn't have to be AGI to cause a lot of upheaval.

  • password54321 17 hours ago

    They can't even automate chat support, the very thing you would think LLMs would be good at. Yet I always end up needing to talk to a person.

  • flyinglizard 18 hours ago

    The same things you described that make LLMs great also make them entirely non-deterministic and unreliable for serious automated applications.

aussieguy1234 13 hours ago

and in a decade, will it still be a decade away?

konart 19 hours ago

2035 singularity etc

  • spydum 18 hours ago

    2038 will be more significant

    • edbaskerville 18 hours ago

      For a second I thought you were citing some special mythological timeline from AI folks.

      Then I got it. :) Something so mundane that maybe the AIs can help prevent it.

netrap 18 hours ago

Wonder if it will end up like nuclear fusion.. just another decade away! :)

jimbo808 13 hours ago

Since we’re pulling numbers out of our ass, I think AGI is 500 years away. Really, I don’t know how we’re going to define it, but if AGI means computers can outperform me at all cognitive tasks, I’d bet money that’s not going to arrive this century.

People are starting to catch on, but most non-tech people don’t use LLMs for anything more than simple questions that can be easily answered by summarizing regurgitated snippets of training data. To them, it looks intelligent. And yeah, the humans who wrote the training samples it regurgitated probably were intelligent.

It’s just a fact, one that becomes glaringly obvious when you use LLMs daily to do real work, that this is just not the tech that will lead to AGI.

They found a really clever pattern-matching technique that, when combined with absurd amounts of data and compute, can reproduce plausible summaries of training data which can be stitched together in useful ways. It’s a useful tool. But the whole AGI conversation is so absurdly far away from this that it’s just clear these guys are pushing a very dishonest grift.

exasperaited 13 hours ago

So "we haven't finished inventing it yet, but when we do it will be awesome".

https://xkcd.com/678/

I'm sure the US economy has ten more years of the data centre money, it'll be fine.

I wonder how far off the "Sell it all — today" Margin Call moment is.

lofaszvanitt 10 hours ago

[flagged]

  • dang 7 hours ago

    Please don't post unsubstantive comments to HN. Thoughtful criticism is welcome.

Mistletoe 18 hours ago

Good, because we have no framework whatsoever for whether it is legal or ethical to turn it off. Is that murder? I think so.

  • lyu07282 18 hours ago

    We don't even have any intention to do anything about millions of people losing their jobs and being driven into poverty by it; in fact, the investments right now gamble/depend on that wealth transfer happening in the future. We don't even give a shit about other humans; there is absolutely no way we will care about a (hypothetical) different life form entirely.

m3kw9 18 hours ago

Another AGI discussion without first defining what AGI is in their minds.

awesome_dude 18 hours ago

I have massive respect for Andrej, my first encounter with "him" was following his tutorials/notes when he was a grad student/tutor for AI/ML.

I was a lot disappointed when he went to work for Tesla, and while I think he had some achievements there, they were not nearly the impact I believe he could potentially have had.

His switch (back?) to OpenAI was, in my mind, much more in keeping with where his spirit really lies.

So, with that in mind, maybe I've drunk too much kool aid, maybe not. But I'm in agreement with him: the LLMs are not AGI. They're bloody good natural language processors, but they're still regurgitating rather than creating.

Essentially that's what humans do, we're all repeating what our education/upbringing told us worked for our lives.

But we all recognise that what we call "smart" is people recognising/inventing ways to do things that did not exist before. In some cases it's about applying a known methodset to a new problem; in others it's about using a substance/method in a way that other substances/methodsets are used, where the different substance/methodset produces something interesting (think: oh, instead of boiling food in water, we can boil food in animal fats - frying).

AI/LLMs cannot do this, not at all. That spark of creativity is agonisingly close, but, like all 80/20 problems, is likely still a while away.

The timeline (10 years): it was the early 2010s (over 10 years ago now) when the idea of backward propagation, after a long AI winter, finally came of age. The idea had been floating about since at least the 1970s. That, together with "Deep Learning", ushered in the start of our current revolution (albeit with at least another AI winter spanning the 4 or 5 years before LLMs arrived).

So, given that timeline and the restraints in the current technology, I think that Andrej is on the right track, and it will be interesting to see where we are in ten years' time.

  • chasd00 17 hours ago

    If OpenAI didn't put a chat interface in front of an LLM and make it available to the public, wouldn't we still be in the same AI winter? Google, Meta, Microsoft - all of the major players were doing lots of LLM work already; it wasn't until the general public found out through OpenAI's website that it really took off. I can't remember who said it - it was some CEO - that OpenAI had no moat, but neither did anyone else; they all had LLMs of their own already. Was the breakthrough the LLM, or making it accessible to the general public?

    • robotswantdata 17 hours ago

      The meme in 22/23 was OpenAI was really “Available AI”

  • throwaway-0001 17 hours ago

    How can you tell whether you regurgitated this comment vs. being truly creative? If you can show me objectively, I'm sold.

    • password54321 2 hours ago

      You know LLMs are regurgitating when they will contradict their own statements just because you click 'redo' on a prompt. I doubt that, asked the same question twice, you would suddenly say the complete opposite of what you had just said.

      Comparing LLMs trained on reddit comments and people who learn to speak as a byproduct of actually interacting with people and the world is nuts.

    • awesome_dude 14 hours ago

      That's not the creativity aspect; my comment is an observation, which, by definition, is a regurgitation of events.

      Edit: This also demonstrates that people think (erroneously) that AI pumping out code, or content, or even essays, is inventive, but it's not.

      This is merely description and reduction, both of which AI can do, but neither of which is invention.

      • throwaway-0001 7 hours ago

        Actually, I think the line between creating and regurgitating is so blurred that you can't tell me a single creative thing you did. So if 99% of people are not creative and just regurgitate, then why do we keep AI standards so high?

        Can you show me one single thing you did in your life that was truly creative and not regurgitated?

        • awesome_dude 4 hours ago

          I think that was my point: I generally regurgitate. A person can do that a lot in life.

          That's why people are conflating LLMs for AGI.

          For now, I think that the key difference between me and an LLM is that an LLM still needs a prompt.

          It's not surveying the world around it and determining what it needs to do.

          I do a lot of something that I think an LLM cannot do: look at things and try to find what attributes they have and how I can harness those to solve problems. Most of the attributes are unknown to the human race when I start.

          • throwaway-0001 4 hours ago

            Your first prompt was just biological.

            So if I make an AI with a prompt and tell it to re-prompt itself every day for the rest of its life, does that mean it's smart now? Or does it not count just because I gave it the first prompt? I doubt your first prompt was given by yourself. Your first prompt was probably given in your mum's belly.

            —-

            I could give an initial prompt to my AI to survey the server and act accordingly… and it can re-prompt itself every day.

            ——

            > I do a lot of something that I think an LLM cannot get do, look at things and try to find what attributes they have and how I can harness those to solve problems. Most of the attributes are unknown by the human race when I start.

            Any examples? An AI can look at a conversation and extract insights better than most people. Negotiate better than most people.

            —-

            I've heard nothing that you can do that an LLM can't. Self-prompting yourself to do something is not, I think, a differentiator.

            You also self-prompt yourself based on previous feedback, and you have done this since you were a baby. So someone also gave you the source prompt. Maybe DNA.

            • awesome_dude 3 hours ago

              I do tire of your attempts to "corner" me into something I have no interest in doing.

              I don't believe you have the capacity to understand why AGI hasn't been realised yet, and, frankly, I doubt you ever will.

rootusrootus 10 hours ago

I expect that Andrej is likely to be an optimist. So this counts as reassuring news -- I'm just under 10 years out from when I anticipate retiring, so if we can just hold off my replacement bot until then...

superconduct123 17 hours ago

I always get a weird feeling when AI researchers and CS people start talking about comparisons between human brains and AI/computers

Why is there a presumption that we (as people who have only studied CS) know enough about biology/neuroscience/evolution to make these comparisons/parallels/analogies?

I enjoy the discussions but I always get the thought in the back of my head "...remember you're listening to 2 CS majors talk about neuroscience"

  • empiko 16 hours ago

    We should completely strip all this talk from AI as a field (and get rid of that name as well). It just causes endless confusion, especially for a general audience. In the end, the whole shtick with LLMs is that we train matrices to predict next tokens. You can explain this entire concept without invoking AGI, Roko's basilisk, the nature of human consciousness, and all the other mumbo jumbo that tries so hard to make this field what it is not.

    • scotty79 12 hours ago

      But people love misguided narratives and analogies. How else should we kill time when we are too dumb to accelerate inevitable progress and just need to wait for it?

      • __loam 12 hours ago

        How else do we invite ludicrous amounts of malinvestment from Wall Street than by evoking fields of biology we know literally nothing about?

  • ainch 13 hours ago

    There is a lot of overlap between AI and Neuroscience, especially among older researchers. For example, Karpathy's PhD supervisor, Fei-Fei Li, researched vision in cat brains before working on computer vision; Demis Hassabis did his PhD in Computational Neuroscience; Geoff Hinton studied Psychology; etc. There's even the Reinforcement Learning and Decision Making conference (RLDM - very cool!), which pairs Reinforcement Learning with neuro research and brings together people from both disciplines.

    I suspect the average AI researcher knows much more about the brain than typical CS students, even if they may not have sufficient background to conduct research.

    • superconduct123 10 hours ago

      Fair enough. I guess it's a bit different nowadays, since the background is usually a PhD in compsci.

  • arawde 17 hours ago

    From personal experience making the same comparisons during undergrad, I think it just comes down to the availability of conceptual models. If the brain does X, there's a good chance that a computer does something that looks like X, or that X could be recreated through steps Y & Z, etc.

    Once I started to realize just how much of the brain is inscrutable, because it is a machine operating on chemicals instead of strict electrical processing, I became a lot more reluctant to draw those comparisons.

    • genewitch 16 hours ago

      Lucky for all of us, we're alive during a "quantum" thing! Which has been an idea since at least the mid-1990s, as I first saw it in a 2600 issue around that time...

  • chasd00 15 hours ago

    > Why is there a presumption that we (as people who have only studied CS) know enough about biology/neuroscience/evolution to make these comparisons/parallels/analogies?

    well it's straightforward. First lets assume a spherical, perfectly frictionless, brain..

  • tim333 11 hours ago

    AI researchers, CS people, and the rest of us are all human-brain users, and so have some familiarity with them even if we haven't studied neuroscience.

    You can make some comparisons between how they perform without really understanding how LLMs or brains work. To me, LLMs seem similar to the part of human minds where you say stuff without thinking about it. But you never really get an LLM saying "I was thinking about that stuff and figured this bit was wrong", because they don't really have that capability.

  • jjulius 17 hours ago

    >Why is there a presumption that we (as people who have only studied CS) know enough about biology/neuroscience/evolution to make these comparisons?

    Hubris.

    • rootusrootus 17 hours ago

      Exactly. Someone way back when decided to call them neural networks, and now a lot of people think that they are a good representation of the real thing. If we make them fast enough, powerful enough, we'll end up with a brain!

      Or not.

      • karmakaze 11 hours ago

        There was an actual simulation of a brain that could respond appropriately to stimuli. It ran many orders of magnitude slower than real-time but demonstrated the correlation. Probably not using the DNNs that we use now, but still a machine.

      • voidhorse 15 hours ago

        I wish McCulloch and Pitts could see how much intellectual damage that wildly bold analogy they made would do. (Though seeing as they seemingly had no qualms about issuing such a wildly unjustified analogy with the absolute paucity of scientific information they had at the time, I guess they'd be happy about it overall.)

        • __loam 12 hours ago

          Computational neurons were developed with the express intent of studying models of the brain based on the contemporary understanding of neuroscience. That understanding has evolved massively over the last 7 decades, and meanwhile the concept of the perceptron has proven to be a useful mathematical construct in machine learning and statistical computing. I blame the modern business culture of software development more than I blame dead scientists for the misunderstanding being peddled to the public.

    • ctoth 16 hours ago

      The hubris here isn't CS people making comparisons, it's assuming biological substrate matters. Your brain is doing computation with neurotransmitters instead of transistors. So what? The "chemicals not electricity" distinction is pure carbon chauvinism, like insisting hydraulic computers can't be compared to electronic ones because water isn't electricity. Evolution didn't discover some mystical process that imbues meat with special properties; it just hill-climbed to a solution using whatever materials were available. Brains work despite being kludges of evolutionary baggage, not because biology unlocked some deeper truth about intelligence.

      Meanwhile, these systems translate languages, write code, play Go at superhuman levels, and pass medical licensing exams... all tasks you'd have sworn required "real understanding" a decade ago. At some point, look at the goddamn scoreboard. If you think there's something brains can do that these architectures fundamentally can't, name it specifically instead of gesturing vaguely at "inscrutability." The list of "things only biological brains can do" keeps shrinking, and your objection keeps sounding like "but my substrate is special!!1111"

      • j-krieger 14 hours ago

        > Your brain is doing computation with neurotransmitters instead of transistors.

        This is an incredible simplification of the process and also just a small part of it. There is increasing evidence that quantum effects might play a part in the inner workings of the brain.

        > Brains work despite being kludges of evolutionary baggage, not because biology unlocked some deeper truth about intelligence.

        Now that is hubris.

      • GoatInGrey 16 hours ago

          This seems naively dismissive of arguments around substrates, considering that playing "Go at superhuman levels" took ~1 MW of power, versus the 1-2 watts (or, if you want to assume 100% of the brain was applied to the game, 20 watts) consumed by the human brain.

        • __loam 12 hours ago

          How many examples did each system need to get good at the task too? It's currently a lot less for humans and we don't know why.

      • jjulius 16 hours ago

        Case in point.

      • JumpCrisscross 16 hours ago

        > Your brain is doing computation with neurotransmitters instead of transistors

        If it is, sure. But this isn't a given. We don't actually understand how the brain computes, as evidenced by our inability to simulate it.

        > Evolution didn't discover some mystical process that imbues meat with special properties

          Sure. But the complexity remains beyond our comprehension. Against the (nearly) binary switching of a transistor, we have a multidimensional electrochemical system in the brain which isn't trivially reduced to code resembling anything we can currently execute on a transistor substrate.

          > these systems translate languages, write code, play Go at superhuman levels, and pass medical licensing exams... all tasks you'd have sworn required "real understanding" a decade ago

        Straw man. Who said this? If anything, the symbolic linguists have been overpromising on this front since the 1980s.

        • ctoth 16 hours ago

          Jonas & Kording showed that neuroscience methods couldn't reverse-engineer a simple 6502 processor [0]. If the tools can't crack a system we built and fully documented, our inability to simulate brains just means we're ignorant, not that substrate is magic. It also doesn't necessarily say great things for neuroscience!

          And "who said this?"... come on. Searle, Dreyfus, thirty years of "syntax isn't semantics," all the hand-wringing about how machines can't really understand because they lack intentionality. Now systems pass those benchmarks and suddenly it's "well nobody serious ever thought that mattered." This is the third? fourth? tenth? round of goalpost-moving while pretending the previous positions never existed.

          Pointing at "multidimensional electrochemical complexity" is just phlogiston with better vocabulary. Name something specific transformers can't do?

          [0] https://journals.plos.org/ploscompbiol/article?id=10.1371/jo...

          • JumpCrisscross 15 hours ago

            > If the tools can't crack a system we built and fully documented, our inability to simulate brains just means we're ignorant, not that substrate is magic

            Nobody said the substrate is magic. Just that it isn't understood. Plenty of CS folks have also been trying to simulate a brain. We haven't figured it out. The same logic that tells you the neuroscientific model is broken at some level should inform that the brains-as-computers model is similarly deficient.

            > Pointing at "multidimensional electrochemical complexity" is just phlogiston with better vocabulary

            Sorry, have you figured out how to simulate a brain?

            Multidimensional because you have more than one signalling chemical. Electrochemical because you can't just watch what the electrons are doing.

            > Name something specific transformers can't do?

            That what can't do? A neuron? A neurotransmitter-receptor system? We literally can't simulate these systems beyond toy models. We don't even know what the essential parts are - can you safely lump together N neurotransmitter molecules? What's N? We're still discovering new ion channels?!

          • sambapa 15 hours ago

            So everyone in neuroscience is ignorant but not you?

            • JumpCrisscross 15 hours ago

              There is a lot of hocus pocus in neuroscience. Next to psychology, anthropology and macroeconomics.

              That doesn’t make the field useless nor OP’s point correct.

          • voidhorse 15 hours ago

            I'm curious what you think understanding means.

            I personally do not think operational proficiency and understanding are equivalent.

            I can do many things in life pretty well without understanding them. The phenomenon of understanding seems distinct from the phenomenon of doing something/acting proficiently.

      • __loam 12 hours ago

        There is no evidence that neurons have remotely the same computational mechanism as a transistor.

        Memorizing billions of answers from the training set also isn't that impressive.

  • rhetocj23 15 hours ago

    I've also found this jarring, and it speaks to the hubris of folks who have emerged in the past few decades and don't seem to have much connection to the humanities and liberal arts.

  • aughtdev 16 hours ago

    Yeah, the last 3 years of "We now know how to build AGI" failing to deliver show that there's something being missed about the nature of intelligence. The "We are all stochastic parrots" people have been awfully quiet recently.

sosodev 18 hours ago

I think it's a shame that a 146-minute podcast released ~55 minutes ago has so much discussion. Everybody here is clearly just reacting to the title with their own biases.

I know it's against the guidelines to discuss the state of a thread, but I really wish we could have thoughtful conversations about the content of links instead of title reactions.

  • dang 7 hours ago

    It takes time for more reflective comments to appear, because reflection is a slower mental operation. Reflexive responses are much faster and tend to be generic and shallow. (https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor...)

    I believe this distinction is pretty fundamental to humans, so we're not likely to escape it, but the good news is that reflective comments do show up eventually if the article is substantive and the reflexive ones haven't ruined the thread. We also try to downweight the more reflexive subthreads.

    More at https://news.ycombinator.com/item?id=45625084.

  • Yossarrian22 18 hours ago

    Maybe they used AI to transcribe and summarize the podcast

  • markbao 18 hours ago

    Just as the core idea of a book can be (lossily) summarized in a few sentences, the core crux of an argument can be quite simple and not require wading through the whole discussion (the AGI discussion is only 30 minutes anyhow).

    Granted, a bunch of commenters are probably doing what you’re saying.

  • mpalmer 18 hours ago

    Be fair; plenty of people transcribe and read podcasts, and/or summarize/excerpt them.

    • yoz-y 16 hours ago

      The idea that people would do this has never even crossed my mind. Not disputing that people do this, mind you. Technology is certainly there, but I also think that it’s very prone to taking ideas out of context.

    • j45 18 hours ago

      Summaries are great, but they can be surface-level.

      The brain processes things and has insights differently when experiencing them at conversation speed.

      We might get the gist of the conversation others had, but it can miss the mark for the listening and inner processing that leads to its own gifts.

      It's not about one or the other for me, usually both.

  • jasonthorsness 18 hours ago

    This one does have the full transcript underneath (wonderful feature). But it's a long read too so I think your assumption is correct :P.

  • meowface 18 hours ago

    Eh, Dwarkesh has to market the podcasts somehow. I think it's fine for him to use hooks like this and for HN threads to respond to the hooks. 99% of HN threads only ever reply to the headline and that's not changing anytime soon. This will likely cause many people (including myself) to watch the full podcast when we otherwise might not have.

    The criticism that people are only replying to a tiny portion of the argument is still valid, but sometimes it's more fun to have an open-ended discussion rather than address what's in the actual article/video.

    • dang 7 hours ago

      99%? I have to stick up for HN here!

      • meowface 2 hours ago

        Ok, maybe not 99%. Probably at least 50% of comments in 70% of threads, though...

  • fragmede 18 hours ago

    Who listens to podcasts at 1x speed? That's unbearably slow!

    • Imnimo 17 hours ago

      The trouble is Karpathy already speaks at 1.5x speed.

    • ghaff 18 hours ago

      I do. I'm really not a fan of sped-up audio in general. If I'm focused on speed I'd rather read/skim a transcript.

  • tauchunfall 18 hours ago

    There is a transcript; people can skim for the interesting parts, read for 30 minutes, and then comment.

    edit: typo fix.

  • tootie 17 hours ago

    Idk how this Dwarkesh Patel got so popular so fast. I'd never heard of him and he keeps popping up in my feeds.

  • jlhawn 18 hours ago

    gotta listen at 2x speed!

voidhorse 15 hours ago

I wish the world could stop giving claims like this, in general, any attention.

We do not know how "far away" we are from "AGI", period. It's also useless. If you're correct... so what? Someone may have been able to perfectly predict the advent of railway travel. Guess what: it gave them zero advantage unless they already had tons of capital to invest, which is effectively what makes the predicted thing come to fruition in the first place. Bets like these are at best self-fulfilling prophecies if you are a billionaire, and at worst idle chatter that makes us all stupider the more time we waste on them and the more we let wildly unchecked claims like this dictate behaviors in the present that actually affect us.

jb1991 18 hours ago

I would bet all of my assets of my life that AGI will not be seen in the lifetime of anyone reading this message right now.

That includes anyone reading this message long after the lives of those reading it on its post date have ended.

Which of course raises the interesting question of how I can make good on this bet.

  • ashivkum 15 hours ago

    Genuinely curious to hear your reasoning for why this is the case. I'm always somewhere between bemused and annoyed opening the daily HN thread about AGI and seeing everyone's totally unfounded confidence in their predictions.

    My position is that I have no idea what is going to happen.

    • makotech221 11 hours ago

      It's incredibly stupid to believe general intelligence is just a series of computations that can be done by a computer. The stemlords on the west coast need to take philosophy classes.

      • KylerAce 10 hours ago

        I don't think it's stupid to believe that the brain is somehow beyond Turing computable, considering how easy it is to create a system exactly as capable as a Turing machine. I also don't think that anything in philosophy can provide empirical evidence that the brain is categorically special as opposed to emergently special. The sum total of the epistemology I've studied boiled down to people saying "I think human consciousness / the brain works like this" with varying degrees of complexity.

      • tokioyoyo 10 hours ago

        The problem with this argument is assuming there is general consensus on “what intelligence is”.

    • BoorishBears 14 hours ago

      what about the fact frontier labs are spending more compute on viral AI video slop and soon-to-be-obsoleted workplace usecases than research?

      Even if you don't understand the technicals, surely you understand that if any party were on the verge of AGI, they wouldn't behave as these companies behave?

      • echoangle 11 hours ago

        What does that tell you about AI in 100 years though? We could have another AI winter and then a breakthrough and maybe the same cycle a few times more and could still somehow get AGI at the end. I’m not saying it’s likely but you can’t predict the far future from current companies.

        • BoorishBears 9 hours ago

          You're making the mistake of assuming the failure of the current companies would be separate from the failure of AI as a technology.

          If we continue the regime where OpenAI gets paid to buy GPUs and they fail, we'll have a funding winter regardless of AI's progress.

          I think there is a strong bull case for consumer AI but it looks nothing like AGI, and we're increasingly pricing in AGI-like advancements.

      • Rudybega 11 hours ago

        > what about the fact frontier labs are spending more compute on viral AI video slop and soon-to-be-obsoleted workplace usecases than research?

        That's a bold claim, please cite your sources.

        It's hard to find super precise sources on this for 2025, but Epoch AI has a pretty good summary for 2024 (with core estimates drawn from The Information and the NYT).

        https://epoch.ai/data-insights/openai-compute-spend

        The most relevant quote: "These reports indicate that OpenAI spent $3 billion on training compute, $1.8 billion on inference compute, and $1 billion on research compute amortized over “multiple years”. For the purpose of this visualization, we estimate that the amortization schedule for research compute was two years, for $2 billion in research compute expenses incurred in 2024."

        Unless you think that this rough breakdown has completely changed, I find it implausible that Sora and workplace usecases constitute ~42% of total training and inference spend (and I think you could probably argue a fair bit of that training spend is still "research" of a sort, which makes your statement even more implausible).

        • BoorishBears 7 hours ago

          Sorry, I'm giving too much credit to the reader here, I guess.

          "AI slop and workplace usecases" is a synecdoche for "anything that is not completing then deploying AGI".

          The cost of Sora 2 is not the compute to do inference on videos; it's, for example, the ablations that weigh human preference against general world-model performance for that architecture. It's the cost of rigorous safety and alignment post-training. It's the legal noise and risk that using IP in that manner causes.

          And in that vein, the anti-signal is stuff like the product work that is verifying users to reduce content moderation.

          These consumer usecases could be viewed as furthering the mission if they were more deeply targeted at collecting tons of human feedback, but these applications overwhelmingly are not architected to primarily serve that benefit. There's no training on API usage, there are barely any prompts for DPO except when they want to test a release for human preference, etc.

          None of this noise and static has a place if you're serious about hitting AGI, or even believe you can on any reasonable timeline. You're positing that you can turn grains of sand into thinking, intelligent beings; ChatGPT erotica is not on the table.

      • dwaltrip 11 hours ago

        They don’t.

        • BoorishBears 7 hours ago

          Is that why Sam is on Twitter saying that people paying them $20 a month are their top compute priority, as they double compute in response to people complaining about their not-AGI - a constant suck between deployment and stuff like post-training specifically to make the not-AGI compatible with outside brand sensibilities?

  • tim333 11 hours ago

    I'd bet the other way, because I think Moore's-law-like advances in compute will make things much easier for researchers.

    Like, I was watching Hinton explain LLMs to Jon Stewart, and they were saying they came up with the algorithm in 1986, but then it didn't really work for decades, until now, because the hardware wasn't up to it (https://youtu.be/jrK3PsD3APk?t=1899).

    If things were 1000x faster you could semi-randomly try all sorts of arrangements of neural nets to see which think better.

    • jb1991 7 hours ago

      You’re making the common assumption that “the algorithm“ is everything we need to get to AGI and it’s just a question of scaling.

      • tim333 5 hours ago

        I guess so. Is there reason to think an appropriate algorithm and scale can't do that?

  • zurfer 15 hours ago

    Well, you wouldn't bet all your assets, because it would be an illiquid market that could only resolve in your favor in 80 years at the earliest.

    If you're really serious about it, put the money into a prediction market. Polymarket has multiple AGI bets.

  • plaidfuji 18 hours ago

    Should probably just short nvidia

    • Thrymr 16 hours ago

      "just short nvidia" is not simple. Even if you believe it is overvalued, and you are correct, a short is a specific bet that the market will realize that fact in a precise amount of time. There are very significant risks in short selling, and famously, the market can stay irrational longer than you can remain solvent.

    • simonsarris 16 hours ago

      There is a wide space where LLMs and their offshoots make enormous productivity gains, while looking nothing like actual artificial intelligence (which has been rebranded AGI), and Nvidia turns out to have a justified valuation etc.

      • lm28469 13 hours ago

        It's been three years now - where is it? Everyone on HN is now a 10x developer; where are all the new startups making $$$? Employees are 10x more productive; where are the 10x revenues? Or even 2x?

        Why is growth over the last 3 years completely flat once you remove the proverbial AI-pickaxe sellers?

        What if all the slop generated by LLMs counterbalances any kind of productivity boost? 10x more bad code, 10x more spam emails, 10x more bots.

    • Etheryte 16 hours ago

      You can generally buy options only a few years out. A few years is decidedly shorter than the lifetime of everyone reading this thread.

    • lbhdc 16 hours ago

      “Markets can remain irrational longer than you can remain solvent.”

    • guluarte 15 hours ago

      That's probably a good idea; either the AI bubble pops or competitors catch up.

  • yodsanklai 14 hours ago

    > how I can make good on this bet.

    I agree with you, and I think that's where Polymarket or similar could be used to see if these people would put their money where their mouth is (my guess is that most won't).

    But first we would need a precise definition of AGI. They may be able to come up with a definition that makes the bet winnable for them.

  • asah 17 hours ago

    Depends on the definition; I might take that bet, because under some definitions we're already here.

    Example: better than average human across many thinking tasks is done.

    • rootusrootus 17 hours ago

      I think that the definition needs to include something about performance on out-of-training tasks. Otherwise we're just talking about machine learning, not anything like AGI.

    • Yizahi 16 hours ago

      A calculator can do arithmetic better than a human. Does this mean we have had so-called AI for half a century now?

      • KeplerBoy 15 hours ago

        That's how the term was sometimes used before. Think of video game AIs; those weren't (and still aren't) especially clever, but they were called AIs and nobody batted an eye at that.

        • Yizahi 15 hours ago

          When I write AI, I mean what LLM apologists mean by AGI. So to rephrase: I was talking about so-called AGI in a calculator 50 years ago. I don't like this recent term inflation.

      • xboxnolifes 15 hours ago

        A calculator does 1 thinking task.

        • Yizahi 15 hours ago

          First of all, it's zero thinking tasks: calculators can't think. But let's call it that for the sake of argument. An LLM can do less than a dozen thinking tasks, and I'm being generous here: generating text, generating still images, generating digital music, generating video, and generating computer code. That's about it. Is that a complete and exhaustive list of all that constitutes a human? Or at least a human mind? If some piece of silicon can do 5-6 tasks, is it a human equivalent now? (AI, aka AGI, presumes human-mind parity.)

      • CamperBob2 15 hours ago

        Let's get an English major to take a calculator to the International Math Olympiad, and see how that goes.

        • Yizahi 14 hours ago

          So a sign of AGI, or of intelligence on par with a human, is the ability to solve small generic math problems? And it still requires pairing with a handler of human-level intelligence to even start solving those math problems? Is that about right?

          • CamperBob2 13 hours ago

            Not even close to right. First of all, the "small generic math problems" given at IMO are designed to challenge the strongest students in the world, and second, the recent results have been based on zero-shot prompts. The human operator did nothing but type in the questions and hit Enter.

            If you do not understand the core concepts very well, by any rational definition of "understand," then you will not succeed at competitions like IMO. A calculator alone won't help you with math at this level, any more than a scalpel by itself would help you succeed at brain surgery.

            • rhetocj23 13 hours ago

              It may be difficult for you to believe or digest, but this means nothing for actual innovation. I'm yet to see the effects of LLMs send a shockwave through the real economy.

              I've actually hung around Olympiad-level folks and, unfortunately, their reach of intellect was limited in specific ways that didn't mean anything with regard to the real economy.

              • CamperBob2 12 hours ago

                You seem to be arguing with someone who isn't here. My point is that if you think a calculator is going to help you do math you don't understand, you are going to have a really tough time once you get to 10th grade.

    • sambapa 15 hours ago

      Good ol' Turing Test, but the real one, not the pop-sci one.

  • jaza 7 hours ago

    Agreed. But I'd also be willing to bet big that the cycle of "new AI breakthrough is made, AI bubble ensues and hypesters claim AGI is just around the corner for several years, bubble bursts, all quiet on the AI front for a decade or two" continues beyond the lifetime of anyone reading this message right now.

  • rokkamokka 18 hours ago

    Will you take a wager of my one dollar versus your life assets? :)

  • FL33TW00D 15 hours ago

    How certain are you of this really? I'd take this bet with you.

    You're saying that we won't achieve AGI in ~80 years, or roughly by 2100: a span equivalent to the time since the end of WW2.

    To quote Shane Legg from 2009:

    "It looks like we’re heading towards 10^20 FLOPS before 2030, even if things slow down a bit from 2020 onwards. That’s just plain nuts. Let me try to explain just how nuts: 10^20 is about the number of neurons in all human brains combined. It is also about the estimated number of grains of sand on all the beaches in the world. That’s a truly insane number of calculations in 1 second."

    Are humans really so incompetent that we can't replicate what nature produced through evolutionary optimization with more compute than in EVERY human brain?
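
    As a rough sanity check on Legg's comparison (the population and neuron-count figures below are my own ballpark assumptions, not his):

        # ~8.6e10 neurons per human brain, ~7e9 people circa 2009 (rough assumptions)
        neurons_per_brain = 8.6e10
        people = 7e9
        total_neurons = neurons_per_brain * people
        print(f"{total_neurons:.1e}")   # 6.0e+20, same order as Legg's 10^20 FLOPS

    So "all human brains combined" holds up at the order-of-magnitude level, though it says nothing about how many FLOPS a single neuron is worth.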

    • yodsanklai 13 hours ago

      How does a neuron compare to a flop?

  • colecut 18 hours ago

    If you are right, you don't have to

  • vonneumannstan 15 hours ago

    >I would bet all of my assets of my life that AGI will not be seen in the lifetime of anyone reading this message right now. That includes anyone reading this message long after the lives of those reading it on its post date have ended.

    By almost any definition available during the 90s, GPT-5 Thinking/Pro would pretty much qualify. The idea that we are somehow not going to make any progress for the next century seems absurd. Do you have any actual justification for why you believe this? Every lab is saying they see a clear path to improving capabilities, and there's been nothing shown by any research I'm aware of to justify doubting that.

    • jb1991 15 hours ago

      The fact is that no matter how "advanced" AI seems to get, it always falls short and does not satisfy what we think of as true AI. It's always a case of "it's going to get better", and it's been said like this for decades now. People have been predicting AGI for a lot longer than the span over which I predict we will not attain it.

      LLMs are cool and fun and impressive (and can be dangerous), but they are not any form of AGI -- they satisfy the "artificial", and that's about it.

      GPT by any definition of AGI is not AGI. You are ignoring the word "general" in AGI. GPT is extremely niche in what it does.

      • vonneumannstan 15 hours ago

        >GPT by any definition of AGI is not AGI. You are ignoring the word "general" in AGI. GPT is extremely niche in what it does.

        Definitions in the 90s basically required passing the Turing Test, which was probably passed by GPT-3.5. Current definitions are too broad, but something like 'better than the average human at most tasks' seems to be basically passed by, say, GPT-5; definitions like 'better than all humans at all tasks' or 'better than all humans at all economically useful tasks' are closer to superintelligence.

        • jb1991 15 hours ago

          The Turing Test was never about AGI.

          • nearbuy 12 hours ago

            That's pretty much exactly what Alan Turing made the Turing test for. From the Wikipedia entry:

            > The Turing test, originally called the imitation game by Alan Turing in 1949, is a test of a machine's ability to exhibit intelligent behaviour equivalent to that of a human.

            > The test was introduced by Turing in his 1950 paper "Computing Machinery and Intelligence" while working at the University of Manchester. It opens with the words: "I propose to consider the question, 'Can machines think?'"

            > This question, Turing believed, was one that could actually be answered. In the remainder of the paper, he argued against the major objections to the proposition that "machines can think".

            • jb1991 7 hours ago

              Cherry-picking is not a meaningful contribution to this discussion. You are ignoring the entire section on that page called “Weaknesses”.

    • port3000 15 hours ago

      They have to say that, or there'll be a loud sucking sound and hundreds of billions in capital will be withdrawn overnight

      • vonneumannstan 15 hours ago

        OK, that's great, but do you have evidence suggesting scaling is actually plateauing, or that the capabilities of GPT-6 and Claude 4.5 Opus won't be better than models now?

        • jb1991 14 hours ago

          You are suggesting, in your reference to scaling, that this is a game of quantity. It is not.

  • lvl155 15 hours ago

    We are pretty close. There are some insane cutting-edge developments being done in private.

    • jb1991 15 hours ago

      I doubt your use of "insane".

  • akomtu 15 hours ago

    It's about the same as betting all life savings on nuclear war not breaking out in our lifetime. If AI gets created, we are toast and those assets won't be worth anything.

  • louiereederson 16 hours ago

    short oracle

    • OtherShrezzing 16 hours ago

      Is anyone _not_ short Oracle? The downside risk for them is that they’ll lose a deal worth 10x their annual revenues.

      Their potential upside is that OpenAI (a company with lifetime revenues of ~$10bn) has committed to a $300bn lease, if Oracle manages to build a fleet of datacenters faster than any company in history.

      If you’re not short, you definitely shouldn’t be long. They’re the only one of the big tech companies I could reasonably see going to $0 if the bubble pops.

      • benregenspan 12 hours ago

        With the executive branch now picking "national champion" companies (as in Intel deal), I feel like there's a big new short risk to consider. Would the current administration allow Oracle to go to zero?

  • guluarte 15 hours ago

    My bet is we will just slowly automate things more and more until one day someone points out that we've reached "AGI"

  • vonneumannstan 15 hours ago

    You can make this bet functional if you really believe it, which you of course really don't. If you actually do then I can introduce you to some people happy to take your money in perpetuity.

  • nextworddev 16 hours ago

    you can do that by shorting Oracle here

    • stock_toaster 16 hours ago

      “Markets can remain irrational longer than you can remain solvent.” - John Maynard Keynes

      • nextworddev 16 hours ago

        yeah, of course. just framing the OP's bravado

PedroBatista 18 hours ago

Following the comments here, yes: AGI is the new Cold Fusion.

However, don't let the bandwagon (from either side) cloud your judgment. Even warm fusion or any fusion at all is still very useful and it's here to stay.

This whole AGI and "the future" thing is mostly a VC/banks and shovel-sellers problem. A problem that has become ours too, because of the ridiculous amounts of money "invested"; even warm fusion is not enough from an investment-vs-expectations perspective.

They are already playing musical money chairs; unfortunately, we already know who's going to pay for all of this "exuberance" in the end.

I hope this whole thing crashes and burns as soon as possible, not because I don't "believe" in AI, but because people have been absolutely stupid about it. The workplace has become unbearable with all this stupidity, the fake "courage" about every single problem, and the judgment your run-of-the-mill dipshit manager now passes on the value of work and knowledge.

imiric 19 hours ago

It has always been "a decade away".

But nothing will make grifters richer than promising it's right around the corner.

jeffreygoesto 18 hours ago

Ah. The old "still close enough, so we can pretend you should pour your money over us, as we haven't identified the next hype yet" razzle dazzle...

evandrofisico 19 hours ago

And self-sustained nuclear fusion is 20 years away, perpetually. On what evidence can he affirm a timeline for AGI when we can barely define intelligence?

  • CaptainOfCoit 18 hours ago

    And a program that can write, sound and paint like a human was 20 years away perpetually as well, until it wasn't.

    • input_sh 18 hours ago

      Another way to put it is that it writes, sounds and paints as the Internet's most average user.

      If you train it on a bunch of paintings whose quality ranges from a toddler's painting to Picasso's, it's not going to make one that's better than Picasso's, it's going to output something more comparable to the most average painting it was trained on. If you then adjust your training data to only include world's best paintings ever since we began to paint, the outcome is going to improve, but it'll just be another better-than-human-average painting. If you then leave it running 24/7, it'll churn out a bunch of better-than-human-average paintings, but there's still an easily-identifiable ceiling it won't go above.

      An oracle that always returns the most average answer certainly has its use cases, but it's fundamentally opposed to the idea of superintelligence.

      • CaptainOfCoit 17 hours ago

        > Another way to put it is that it writes, sounds and paints as the Internet's most average user.

        Yes, I agree: it's not exactly high-quality stuff it produces, unless the person using it is already an expert who could produce high-quality stuff without it too.

        But there is no denying that those things were regarded as "far-future, maybe" for a long time, until some people put the right pieces together.

    • galangalalgol 18 hours ago

      This is the key insight, I believe. It is inherently unpredictable. There are species that pass the mirror test with far fewer equivalent parameters than large models are using already. Carmack has said something to the effect that about 10k SLOC would glue the right existing architectures together in the right way to make AGI, but that it might take decades to stumble on that way, or someone might find it this afternoon.

      • palmotea 8 hours ago

        > Carmack has said something to the effect that about 10k SLOC would glue the right existing architectures together in the right way to make AGI

        What does he know about that?

    • woodruffw 18 hours ago

      Is this true? I think it’s equally easy to claim that these phenomena are attributable to aesthetic adaptability in humans, rather than the ability of a machine to act like a human. The machine still doesn’t possess intentionality.

      This isn’t a bad thing, and I think LLMs are very impressive. But I do think we’d hesitate to call their behavior human-like if we weren’t predisposed to anthropomorphism.

    • walterbell 18 hours ago

      > like a human

      Humans have since adapted to identify content differences and assign lower economic value to content created by programs, i.e. the humans being "impersonated" and "fooled" are themselves evolving in response to imitation.

  • helterskelter 18 hours ago

    I'd argue we've had more progress towards fusion than AGI.

    • chasd00 18 hours ago

      > I'd argue we've had more progress towards fusion than AGI.

      Way more progress toward fusion than AGI. Uncontrolled runaway fusion reactions were perfected in the 50s (IIRC) with the thermonuclear bombs. Controllable fusion reactions have been common for many years. A controllable, self-sustaining, and profitable fusion reaction is all that is left. The goalposts that mark when AGI has been reached haven't even been defined yet.

    • FiniteIntegral 18 hours ago

      Yet at the same time, "towards" does not equate to "nearing". Relative terms for relative statements. Until there's a light at the end of the tunnel, we don't know how far we have left to go.

  • adastra22 18 hours ago

    Fusion used to be perpetually 30 years away. We’re making progress!

  • nh23423fefe 18 hours ago

    stop repeating that. first, it isn't true that intelligence is barely defined. https://arxiv.org/abs/0706.3639

    second a definition is obviously not a prerequisite as evidenced by natural selection

    • thomasdziedzic 18 hours ago

      > stop repeating that. first, it isn't true that intelligence is barely defined. https://arxiv.org/abs/0706.3639

      I don't think he should stop, because I think he's right. We lack a definition of intelligence that doesn't do a lot of hand waving.

      You linked to a paper with 18 collective definitions, 35 psychologist definitions, and 18 AI-researcher definitions of intelligence. And the conclusion of the paper was that they came up with their own definition of intelligence. That is not a definition in my book.

      > second a definition is obviously not a prerequisite as evidenced by natural selection

      Right, we just need a universe, several billion years, and a sprinkle of evolution, and we'll also get intelligence, maybe.

    • mathgradthrow 18 hours ago

      An Arxiv paper listing 70 different definitions of intelligence is not the evidence that you seem to think it is.

stack_framer 17 hours ago

Just like fusion. It will revolutionize the world, but it's always just ... one more decade away.

  • Yizahi 16 hours ago

    Also, I like how almost nobody takes issue with the decade time interval. Does he mean that current LLMs, slowly plateauing in performance, would somehow take a decade to become AI (which he calls AGI)? Where would this fantastical gain in performance come from? Or does he think it will be a different mechanism as a basis? But then what mechanism? It should at least exist in theory by now if it is to be realized within a decade.

    Basically what I mean is that if LLMs are the basis of future real AI, it would take less than a decade, because they are already in diminishing returns today. And if it is something completely new, then what exactly? And if it is something abstract, fuzzy and hypothetical, whence did the decade number come?

    This is basically Sam Altman's "5 to 10 years in the future"(1) all over again. Not less than 5, so as not to be verified in the near future and so there's no need to show at least something as a prototype or a scientific theory. And no more than 10 years, so as not to scare SoftBank and other investors.

    (1) https://fortune.com/2025/09/26/sam-altman-openai-ceo-superin...

    https://www.forbes.com/sites/jodiecook/2024/07/16/openais-5-...

    https://www.tomsguide.com/ai/chatgpt/sam-altman-claims-agi-i...

  • spjt 10 hours ago

    The difference with fusion is that we have a very good understanding of how fusion works, and exactly what we need to figure out how to do, to make it a viable energy source. It's basically just an engineering problem, albeit a very difficult one due to the extreme conditions. AGI is more like developing warp drive. With AGI, we really have no idea how the brain works or any clue of what problems need to be solved. It's basically just like the underpants gnomes.

    Phase 1: Buying more GPUs to increase the number of parameters in an LLM
    Phase 2: ???
    Phase 3: AGI

    AGI may come next week, in 1000 years, or never. Anyone who claims to have any idea is full of shit, because we don't even know what problems we need to solve to get there. If we develop a good model of how human cognition works at a biological level, there is at least a direction, but that isn't going to come out of some AI hype factory with a datacenter full of H100s making videos of anthropomorphic cats working as pastry chefs.

  • qingcharles 15 hours ago

    I can't use fusion power yet.

    Several hundred million people are using LLMs every day.

    There has to be at least two orders of magnitude more investment in "AI" technologies than there is in fusion tech right now.

    • bamboozled 10 hours ago

      We're driving LLMs to get results though, which is different from what's being discussed.

      Every time I've used an LLM to achieve something, while useful, it's taken considerable effort on my part.

      In fact, I don't think I've ever received anything for free when using any "AI", except maybe time saved typing.

  • tim333 11 hours ago

    Dunno - I've followed Moravec and Kurzweil and the predicted dates have never budged, and things have followed the predictions fairly accurately.

  • rohit89 8 hours ago

    It would not always be a decade or two away if it were funded like the current AI cycle.

  • Veedrac 12 hours ago

    Y'all wouldn't believe this but 10 years ago AGI was a hundred years away.

  • oytis 17 hours ago

    Fusion is 30 years away, AGI is much closer

    • pier25 15 hours ago

      Fusion has been 30 years away since the 50s or 60s.

      I don't think I will see either AGI or commercial fusion in my lifetime.

  • dist-epoch 16 hours ago

    In the days before AlphaGo, computer Go was still a decade away according to most experts.

benzible 17 hours ago

What's his estimate of how far we are from a definition of AGI?

  • chasd00 15 hours ago

    I wonder if an inverse definition would work. If a user has root/admin access and all the permissions/authority needed to command an AI to list the files in a directory, and it simply refuses, would that be a sign of intelligence?

  • SJC_Hacker 16 hours ago

    True AGI will be able to improve itself without human intervention or even guidance

    • nextworddev 16 hours ago

      assuming it's "aligned" with a foundational model lab

  • password54321 17 hours ago

    Can perform out-of-distribution tasks at around average human-level performance, at minimum.

    • benzible 15 hours ago

      Every attempt to formally define "general intelligence" for humans has been a shitshow. IQ tests were literally designed to justify excluding immigrants and sterilizing the "feeble-minded." Modern psychometrics can't agree on whether intelligence is one thing (g factor) or many things, whether it's measurable across cultures, or whether the tests measure aptitude or just familiarity with test-taking and middle-class cultural norms.

      Now we're trying to define AGI - artificial general intelligence - when we can't even define the G, much less the I. Is it "general" because it works across domains? Okay, how many domains? Is it "general" because it can learn new tasks? How quickly? With how much training data?

      The goalposts have already moved a dozen times. GPT-2 couldn't do X, so X was clearly a requirement for AGI. Now models can do X, so actually X was never that important, real AGI needs Y. It's a vibes-based marketing term - like "artificial intelligence" was (per John McCarthy himself) - not a coherent technical definition.

      • password54321 2 hours ago

        I think you are overthinking this. The ARC benchmark for fluid abstract reasoning was introduced in 2019 and it still hasn't been 'solved'. So the goalposts aren't moving as much as you think they are.

        LLMs, and neural nets in general, have never been good at out-of-distribution tasks.

  • oytis 16 hours ago

    The best definition so far is that it's something that will make a hundred billion dollars for Sam Altman.

    • baobun 15 hours ago

      That's unironically basically what Altman has defined it as in public.

qgin 18 hours ago

We'll be living in a world of 50% unemployment and still debating whether it's "true AGI"

nadermx 17 hours ago

I bet you we are all wrong and some random person is going to vibe code himself into something none of us expected. I half kid. If you haven't seen it, I highly suggest https://karpathy.ai/zero-to-hero.html

  • dingnuts 17 hours ago

    Then why didn't Karpathy vibe code this?

    https://x.com/GaryMarcus/status/1978500888521068818

    • keeda 10 hours ago

      This is actually discussed in the interview: https://www.dwarkesh.com/i/176425744/llm-cognitive-deficits

      It seems to be more nuanced than what people have assumed. The best I can summarize it as is that he was doing rather non-standard things that confused the LLMs, which have been trained on vast amounts of very standard code and hence kept defaulting to those assumptions.

      Maybe a rough analogy is that he was trying to "code golf" this repo while LLMs kept trying to write "enterprise" code because that is overwhelmingly what they have been trained on.

    • imtringued 15 hours ago

      In my experience, LLMs are really good at retrieving obscure knowledge/algorithms that are locked behind journal paywalls.

angiolillo 18 hours ago

"The question of whether a computer can think is no more interesting than the question of whether a submarine can swim." - Edsger Dijkstra

The debate about AGI is interesting from a philosophical perspective, but from a practical perspective AI doesn't need to get anywhere close to AGI to turn the world upside down.

  • flatline 18 hours ago

    I don’t even know what AGI is, and neither does anyone else as far as I can tell. In the parts of the video I watched, he cites several things missing which all have to do with autonomy: continual automated updates of internal state, fully autonomous agentic behavior, etc.

    I feel like GPT-3 was AGI, personally. It crossed some threshold that was both real and magical, and future improvements are relying on that basic set of features at their core. Can we confidently say this is not a form of general intelligence? Just because it's more a Chinese Room than a fully autonomous robot? We can keep moving the goalposts indefinitely, but machine intelligence will never exactly match that of humans.

    • mpalmer 18 hours ago

          It crossed some threshold that was both real and magical
      
      Only compared to our experience at the time.

          and future improvements are relying on that basic set of features at their core
      
      Language models are inherently limited, and it's possible - likely, IMO - that the next set of qualitative leaps in machine intelligence will come from a different set of ideas entirely.

      • zer00eyz 18 hours ago

        Learning != Training.

        That's not a period, it's a full stop. There is no debate to be had here.

        If an LLM makes some sort of breakthrough (and massive data collation allows for that to happen), it needs to be "retrained" to absorb its own new invention.

        But we also have a large problem in our industry, where hardware once evolved to make software more efficient. Not only is that not happening anymore, but we're making our software more complex and to some degree less efficient with every generation.

        This is particularly problematic in the LLM space: every generation of "ML" on the LLM side seems to be getting less efficient with compute. (Note: this isn't quite the case in all areas of ML; YOLO models running on embedded compute are kind of amazing.)

        Compactness, efficiency and reproducibility are directions the industry needs to evolve in, if it ever hopes to be sustainable.

    • zeroonetwothree 18 hours ago

      I think most people would consider AGI to be roughly matching that of humans in all aspects. So in that sense there’s no way that GPT3 was AGI. Of course you are free to use your own definition, I’m just reflecting what the typical view would be.

    • colonCapitalDee 18 hours ago

      AGI is when a computer can accomplish every cognitive task a typical human can. Given tools to speak, hear, and manipulate a computer, an AGI could be dropped in as a remote employee and be successful.

      • throwaway-0001 17 hours ago

        A human is AGI when they can accomplish all the tasks ChatGPT can… how come the reverse doesn't work?

    • throw54465665 18 hours ago

      Most humans do not even have general intelligence! Many students are practically illiterate and cannot even read and understand a book or manual!

      We are approaching a situation where AI will make most decisions, and people will wear it as a skin suit to fake competency!

      • zeroonetwothree 18 hours ago

        I wouldn’t say that any specific skill (like literacy) is required to have intelligence. It’s more the capability to learn skills and build a model of the world and the people in it using abstract reasoning.

        Otherwise we would have to say that pre-literacy societies lacked intelligence, which would be silly since they are the ones that invented writing in the first place!

        • throw54465665 18 hours ago

          [flagged]

          • AnimalMuppet 18 hours ago

            Most people cannot comprehend an audiobook? No way.

            If you have evidence for that claim, show it. Otherwise, no, you're just making stuff up.

            • throw54465665 17 hours ago

              Sorry, it should be "most Americans"!

              Very simple proof: they cannot even read/listen to their own constitution!

            • throwaway-0001 8 hours ago

              Have you ever had a mainstream product and answered customer questions? You should try it, to see what an average person is truly like.

              Examples:

              Send email with subject “I need support” (no body).

              I answer by email: what you need?

              Reply: I need to activate email support

              Truly agi.

  • jaccola 18 hours ago

    I think this quote is often misapplied. The question "can a submarine safely move through water" IS a very interesting question (especially if you are planning a trip in one!).

    Obviously this quote would be well applied if we were at a stage where computers were better at everything humans can do and some people were saying "This is not AGI because it doesn't think exactly the same as a human". But we aren't anywhere near this stage yet.

    • angiolillo 17 hours ago

      > The question "can a submarine safely move through water" IS a very interesting question

      Sure, and the question of whether AI can safely perform a particular task is interesting.

      > Obviously this quote would be well applied if we were at a stage where computers were better at everything humans can do and some people were saying "This is not AGI because it doesn't think exactly the same as a human".

      Why would that be required?

      I used the quote primarily to point out that discussing the utility of AI is wholly distinct from discussing the semantics of words like "think", "general intelligence", or "swim". Knowing whether we are having a debate about utility/impact or philosophy/semantics seems relevant regardless of the current capabilities of AI.

  • zeknife 18 hours ago

    It also doesn't need to be good for anything to turn the world upside down, but it would be nice if it was

  • IshKebab 18 hours ago

    Fortunately I haven't heard anyone make silly claims about stochastic parrots and the impossibility of conscious computers for quite a while.

RLAIF 17 hours ago

[dead]

overgard 18 hours ago

Right in time for the year of the linux desktop.

Topgamer7 19 hours ago

Whenever someone brings up "AI", I tell them AI is not real AI. Machine learning is a more apt buzzword.

And real AI is probably like fusion. Its always 10 years away.

  • CSSer 18 hours ago

    The best part of this is that I watched Sam Altman say he really thinks fusion is a short time away, in response to a question about energy consumption a couple of years ago. That was the moment I knew he's a quack.

    • ctkhn 18 hours ago

      Not to be anti YC on their forum, but the VC business model is all about splashing cash on a wide variety of junk that will mostly be worthless, hyping it to the max, and hoping one or two is like amazon or facebook. He's not an engineer, he's like Steve Jobs without the good parts.

    • jacobolus 18 hours ago

      Altman recently said, in response to a question about the prospect of half of entry-level white-collar jobs being replaced by "AI" and college graduates being put out of work by it:

      > “I mean in 2035, that, like, graduating college student, if they still go to college at all, could very well be, like, leaving on a mission to explore the solar system on a spaceship in some completely new, exciting, super well-paid, super interesting job, and feeling so bad for you and I that, like, we had to do this kind of, like, really boring old kind of work and everything is just better."

      Which should be reassuring to anyone having trouble finding an entry-level job as an illustrator or copywriter or programmer or whatever.

      • rightbyte 17 hours ago

        So STNG in 10 years?

        edit: Oh. Solar system. Nvm. Totally reasonable.

    • rohit89 5 hours ago

      Sam is an investor in a fusion startup. In any case, how long it takes us to get to working fusion is proportional to the amount of funding it receives. I'm hopeful that increased energy needs will spur more investment into it.

    • SAI_Peregrinus 16 hours ago

      Fusion is 8 light-minutes away. The connection gets blocked often, so methods to buffer power for those periods are critical, but they're getting better so it's gotten a lot more practical to use remote fusion power at large scales. It seems likely that the power buffering problem is easier to solve than the local fusion problem, so more development goes to improving remote fusion power than local.

    • timeon 18 hours ago

      He had to use a distraction, because he knows that he is playing a part in increasing emissions.

    • 2OEH8eoCRo0 18 hours ago

      Fusion is known science while AGI is still very much an enigma.

  • CharlesW 18 hours ago

    > Whenever someone brings up "AI", I tell them AI is not real AI.

    You and also everyone since the beginning of AI. https://quoteinvestigator.com/2024/06/20/not-ai/

    • zamadatix 18 hours ago

      People saying that usually mean it as "AI is here and going to change everything overnight now" yet, if you take it literally, it's "we're actually over 50 years into AI, things will likely continue to advance slowly over decades".

      The common thread between those who take things as "AI is anything that doesn't work yet" and "what we have is still not yet AI" is "this current technology could probably have used a less distracting marketing name choice, where we talk about what it delivers rather than what it's supposed to be delivering".

  • adastra22 18 hours ago

    Machine learning as a descriptive phrase has stopped being relevant. It implies the discovery of information in a training set. The pre-training of an LLM is most definitely machine learning. But what people are excited and interested in is the use of this learned data in generative AI. “Machine learning” doesn’t capture that aspect.

    • simpleladle 18 hours ago

      But the things we try to make LLMs do post-pre-training are primarily achieved via reinforcement learning. Isn't reinforcement learning machine learning? Correct me if I'm misconstruing what you're trying to say here

      • adastra22 17 hours ago

        You are still talking about training. Generative applications have always been fundamentally different from classification problems, and have now (in the form of transformers and diffusion models) taken on entirely new architectures.

        If “machine learning” is taken to be so broad as to include any artificial neural network, all of which are trained with back propagation these days, then it is useless as a term.

        The term “machine learning” was coined in the era of specialized classification agents that would learn how to segment inputs in some way. Think email spam detection, or identifying cat pictures. These algorithms are still an essential part of both the pre-training and RLHF fine-tuning of LLM models. But the generative architectures are new and very essential to the current interest in and hype surrounding AI at this point in time.

    • hnuser123456 18 hours ago

      It's a valid term that is worth introducing to the layperson IMO. Let them know how the magic works, and how it doesn't.

      • adastra22 18 hours ago

        Machine learning is only part of how an LLM agent works though. An essential part, but only a part.

        • sdenton4 18 hours ago

          I see a fair amount of bullshit in the LLM space though, where even cursory consideration would connect the methods back to well-known principles in ML (and even statistics!) to measure model quality and progress. There's a lot of 'woo, it's new! we don't know how to measure it exactly but we think it's groundbreaking!' which is simply wrong.

          From where I sit, the generative models provide more flexibility but tend to underperform on any particular task relative to a targeted machine learning effort, once you actually do the work on comparative evaluation.

          • adastra22 18 hours ago

            I think we have a vocabulary problem here, because I am having a hard time understanding what you are trying to say.

            You appear to be comparing apples to oranges. A generation task is not a categorization task. Machine learning solves categorization problems. Generative AI uses models trained by machine learning methods, but in a very different architecture, to solve generative problems. A completely different and incomparable application domain.

            • ainch 12 hours ago

              I think you're overstating the distinction between ML and generation - plenty of ML methods involve generative models. Even basic linear regression with a squared loss can also be framed as a generative model derived by assuming Gaussian noise. Probabilistic PCA, HMMs, GMMs etc... generation has been a core part of ML for over 20 years.
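
              To sketch the standard argument (a textbook derivation; the notation here is mine): assume $y_i = w^\top x_i + \epsilon_i$ with Gaussian noise $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$. Then

                  $\log p(y \mid X, w) = -\tfrac{n}{2}\log(2\pi\sigma^2) - \tfrac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - w^\top x_i)^2$

              so maximizing the likelihood over $w$ is exactly minimizing the squared loss: the least-squares fit and the generative Gaussian-noise model coincide.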

            • sdenton4 13 hours ago

              And yet, people very often find themselves using generative models for categorization and information retrieval tasks...

      • IshKebab 18 hours ago

        How does "it's called machine learning not AI" help anyone know how it works? It's just a fancier sounding name.

        • hnuser123456 17 hours ago

          Because if they're curious, they can look up (or ask an "AI") about machine learning, rather than just AI, and learn more about the capabilities and difficulties and mechanics of how it works, learn some of the history, and have grounded expectations for what the next 10 years of development might look like.

          • IshKebab 17 hours ago

            They can google AI too... Do you think googling "how does AI work" won't work?

  • bcrosby95 18 hours ago

    AI is an overloaded term.

    I took an AI class in 2001. We learned all sorts of algorithms classified as AI, including various ML techniques, among them perceptrons.

    • timidiceball 18 hours ago

      That was an impressive takeaway from the first machine learning course I took: that many things previously under the umbrella of Artificial Intelligence have since been demystified and demoted to implementations we now just take for granted. Some examples were real-world map route planning for transport, locating faces in images, and Bayesian spam filters.

    • porphyra 18 hours ago

      back in the day alpha-beta search was AI hehe

      • pixelpoet 18 hours ago

        As a young child in Indonesia we had an exceptionally fancy washing machine with all sorts of broken English superlatives on it, including "fuzzy logic artificial intelligence" and I used to watch it doing the turbo spin or whatever, wondering what it was thinking. My poor mom thought I was retarded.

        • porphyra 16 hours ago

          My rice cooker also has fuzzy logic. I guess they just use floats instead of bools.
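
          Pretty much: the control variables are degrees of membership in [0, 1] rather than booleans. A toy sketch of the idea (my own illustration, in Python; not any appliance's actual firmware):

              # Fuzzy control in miniature: "heavy load" is a float in [0, 1],
              # and the output blends actions by that degree. (Illustrative only.)
              def heavy(load_kg):
                  return min(max((load_kg - 3.0) / 4.0, 0.0), 1.0)

              def spin_rpm(load_kg):
                  h = heavy(load_kg)                    # e.g. 0.25 for a 4 kg load
                  return h * 600 + (1 - h) * 1200       # heavier load, gentler spin

              print(spin_rpm(4.0))                      # 1050.0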

  • brandonb 18 hours ago

    Andrew Ng has a nice quote: “Instead of doing AI, we ended up spending our lives doing curve fitting.”

    Ten years ago you'd be ashamed to call anything "AI," and would say machine learning if you wanted to be taken seriously, but neural networks really have brought back the term--and for good reason, given the results.

  • wilg 18 hours ago

    Arguing about the definitions of words is rarely useful.

    • Spare_account 18 hours ago

      How can we discuss <any given topic> if we are talking about different things?

      • IanCal 18 hours ago

        Well that's rather the point - arguing about exceptionally heavily used terminology isn't useful because there's already a largely shared understanding. Stepping away from that is a huge effort, unlikely to work and at best all you've done is change what people mean when they use a word.

      • bcrosby95 18 hours ago

        The point is to establish definitions rather than argue about them. You might save yourself from two pointless arguments.

    • Root_Denied 18 hours ago

      Except AI already had a clear definition well before it started being used as a way to inflate valuations and push marketing narratives.

      If nothing else it's been a sci-fi topic for more than a century. There's connotations, cultural baggage, and expectations from the general population about what AI is and what it's capable of, most of which isn't possible or applicable to the current crop of "AI" tools.

      You can't just change the meaning of a word overnight and toss all that history away, which is why it comes across as an intentionally dishonest choice in the name of profits.

      • layer8 18 hours ago

        • Root_Denied 17 hours ago

          And you should do some reading into the edit history of that page. Wikipedia isn't immune from concerted efforts to astroturf and push marketing narratives.

          More to the point, the history of AI up through about 2010 talks about attempts to get it working using different approaches to the problem space, followed by a shift in the definitions of what AI is in the 2005-2015 range (narrow AI vs. AGI). Plenty of talk about the various methods and lines of research that were being attempted, but very little about publicly pushing to call commercially available deliverables AI.

          Once we got to the point where large amounts of VC money were being pumped into these companies, there was an incentive to redefine AI in favor of what was within the capabilities and scope of machine learning and LLMs, regardless of whether that fit the historical definition of AI.

      • wilg 13 hours ago

        I do not care what anyone thinks the definition is, nor should you.

  • layer8 18 hours ago

    AI is whatever is SOTA in the field; it always has been.

  • lo_zamoyski 18 hours ago

    AI is in the eye of the beholder.

mwkaufma 18 hours ago

Ten years away, just like it was ten years ago and will be ten years from now.

  • woadwarrior01 3 hours ago

    Controlled fusion has always been 30 years away.

deadbabe 18 hours ago

Frankly it doesn’t matter if it’s a decade away.

AI has now been revealed to the masses. When AGI arrives most people will barely notice. It will just feel like slightly better LLMs to them. They will have already cemented notions of how it works and how it affects their lives.

segmondy 19 hours ago

AGI is already here.

  • 010101010101 18 hours ago

    Where are you because it’s sure not where I am…

    • segmondy 18 hours ago

      5 years ago, everyone would have agreed that what we have today is AGI.

      • baobun 15 hours ago

        100 years ago, "everyone" would similarly agree that what we had 10 years ago was either literally God or The Devil.

      • zeknife 17 hours ago

        At least until they spend some time with it

      • rvz 18 hours ago

        No one agrees on what AGI even is, except for the fact that the definitions change more often than the weather, which makes the term meaningless.

  • chronci739 18 hours ago

    > AGI is already here.

    Because Elon Musk said FSD was coming in 2017?

    • adastra22 18 hours ago

      Because we already have artificial (man-made) general (contrast with domain specific) intelligence (algorithmic problem solvers). A.G.I.

      If ChatGPT is not AGI, somebody has moved goalposts.

      • walkabout 18 hours ago

        I think a baseline requirement would be that it… thinks. That’s not a thing LLMs do.

        • adastra22 18 hours ago

          That’s an odd claim, given that we have so-called thinking models. Is there a specific way you have in mind in which LLMs are not thinking processes?

          • blibble 17 hours ago

            I can call my cat an elephant

            it doesn't make him one

      • 010101010101 17 hours ago

        Both "general" and "intelligence" are _at least_ easily arguable without moving any goal posts, not that goal posts have ever been well established in the first place.

theusus 18 hours ago

[flagged]

  • jasonthorsness 18 hours ago

    Andrej coined the term "vibe coding" in February on X, only 8 months ago.

  • guiomie 18 hours ago

    He's also the guy behind FSD, which is kinda turning into a scam.

    • lazystar 17 hours ago

      > FSD which is a scam.

      fixed that for you.

  • dlivingston 17 hours ago

    ??? Many developers, experienced and not, play around with vibe coding. Is your critique of him that he has tried vibe coding?

    • theusus 7 hours ago

      I’m critiquing him because he lied in his claim. And anyone who makes the same claim is just farming engagement.

meindnoch 16 hours ago

People keep talking about AGI as if it's some mystical leap beyond human capability.

But let's be honest: software development at a modern startup is already the upper bound of applied intelligence. You're juggling shifting product specs, ambiguous user feedback, legacy code written by interns, and five competing JS frameworks, all while shipping on a Friday. Models can now do that. They can reason about asynchronous state, refactor a codebase across thousands of lines, and actually explain the difference between useEffect and useLayoutEffect without resorting to superstition.

If that's not general intelligence, what exactly are we waiting for - self-awareness?

  • password54321 2 hours ago

    Computers being good/fast at automating/calculating things that people find difficult is not a new phenomenon. By your standards, we had general intelligence decades ago.

  • hax0ron3 8 hours ago

    Models can't do that now, though. If they could, pretty much every human software engineer would be unemployed right now.

  • blueside 16 hours ago

    LLMs have continually taught me that we have vastly overestimated human intelligence

    • woadwarrior01 3 hours ago

      Perhaps we're overestimating human intelligence and underestimating animal intelligence. Also funny that current LLMs are incapable of continual learning themselves.

    • teleforce 6 hours ago

      >LLMs have continually taught me that we have vastly overestimated human intelligence

      LLMs have continually taught me that we have vastly underestimated human intelligence. Fixed that for you.

  • baobun 15 hours ago

    > software development at a modern startup is already the upper bound of applied intelligence.

    The hubris and myopia is staggering.