Today’s most advanced AI models have many flaws, but decades from now, they will be recognized as the first true examples of artificial general intelligence.
While large language models (LLMs) have demonstrated impressive performance on a range of decision-making tasks, they rely on simple acting processes and fall short of broad deployment as autonomous agents. We introduce LATS (Language Agent Tree Search), a general framework that synergizes the capabilities of LLMs in planning, acting, and reasoning. Drawing inspiration from Monte Carlo tree search in model-based reinforcement learning, LATS employs LLMs as agents, value functions, and optimizers, repurposing their latent strengths for enhanced decision-making. What is crucial in this method is the use of an environment for external feedback, which offers a more deliberate and adaptive problem-solving mechanism that moves beyond the limitations of existing techniques. Our experimental evaluation across diverse domains, such as programming, HotPotQA, and WebShop, illustrates the applicability of LATS for both reasoning and acting. In particular, LATS achieves 94.4% for programming on HumanEval with GPT-4 and an average score of 75.9 for web browsing on WebShop with GPT-3.5, demonstrating the effectiveness and generality of our method.
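For anyone who hasn't met the Monte Carlo tree search the abstract mentions, here is a minimal sketch of the generic UCT selection and backpropagation steps, assuming each node just carries a plain numeric value. It is not the LATS implementation (there the values would come from an LLM and from environment feedback), and `Node`, `uct_score`, `select_child` and `backpropagate` are hypothetical names picked for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One state in the search tree (hypothetical structure for illustration)."""
    name: str
    visits: int = 0
    total_value: float = 0.0
    children: list[Node] = field(default_factory=list)


def uct_score(parent: Node, child: Node, c: float = 1.0) -> float:
    """Generic UCT rule: average value plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    exploit = child.total_value / child.visits
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore


def select_child(parent: Node, c: float = 1.0) -> Node:
    """Pick the child with the highest UCT score."""
    return max(parent.children, key=lambda ch: uct_score(parent, ch, c))


def backpropagate(path: list[Node], value: float) -> None:
    """Add an evaluation result to every node on the path from root to leaf."""
    for node in path:
        node.visits += 1
        node.total_value += value
```

The exploration constant `c` is an assumption too: larger values make the search try rarely visited branches more often, smaller values make it stick with what has scored well so far.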
I think we can’t really get the most out of current LLMs because of how much they cost to run. Once we can get speeds up and costs down, they’ll be able to do more impressive things.
If it needs me to pre-chew and check every single step, then it can still be a smart tool, but it’s definitely not intelligent.
If this is the standard for AGI, I’m not 100% convinced that every human meets the standard for intelligence either. Anyone who’s ever done team projects will have experience of someone who cannot complete a simple task without extensive pre-chewing and checking in on every step.
It’s going to end up like that thing with the bear-proof bins, isn’t it? The overlap between the smartest LLMs and dumbest humans is going to be bigger than one might think, even if the LLM never achieves true general intelligence or self-awareness. Bears aren’t sapient either, but it doesn’t stop them being more intelligent than some tourists.
That’s why I said low-level, and it does not need to be perfect either. Not all my colleagues are on the bright side, but all of them remain employed without someone else sitting next to them every single minute. Also, a huge part of human intelligence lies in personal strengths. They may be bad at task A, but when it comes to talking in an empathetic way or analyzing sports, they’re suddenly a pro. This ability is what defines “general” intelligence versus narrow intelligence, which is supposed to do one job only.
I work with GPT-4 for my job, and while it is very useful, the moment you poke it with deeper questions it becomes clear it has absolutely no idea what it’s doing or what is going on. You cannot trust anything it says, and often it’s a frustrating experience re-rolling with the regenerate button until it gets a valid answer.
In such a context, intelligence is much more relative. Same thing with animals. There is also a big difference with AI not having a proper body yet.
There are a number of low-level jobs that can be done by both children and animals, for instance a service dog. They are both capable, intelligent creatures.
In the past, children started to work in a factory the moment they could stand on their legs.
It would be near impossible for a 2-year-old or a dog to do a work-from-home assignment, but for AI this should by far be an advantageous situation, because it’s trained on computer data and does not need to spend so much of its “brain” learning to move and go potty.
It is relevant because “intelligence” is a collection of multiple things. The first kinds of intelligence a living creature learns are all physical. If you instinctively pull your hand away when it touches fire, that’s already a kind of intelligence. Learning to understand and act on bodily needs in order to survive is a bigger example.
The first steps towards emotional intelligence start with the physical comfort of the womb and hugs received as a baby.
Every sentient creature we have ever known starts as an autonomous body. A child without a body does not exist.
(Wow. That’s really a bad article. And even though the author managed to ramble on for quite some pages, they somehow completely failed to address the interesting and well discussed arguments.)
[Edit: I disagree -strongly- with the article]
We discussed this in June 2022 after the Google engineer Blake Lemoine claimed his company’s artificial intelligence chatbot LaMDA was a self-aware person. We discussed both intelligence and consciousness.
And my -personal- impression is: If you use ChatGPT for the first time, it blows you away. It’s been a ‘living in the future’ moment for me. And I see how you’d write an excited article about it. But once you’ve used it for a few days, you’ll see every 6th grade teacher can distinguish if homework assignments were done by a sentient being or an LLM. And ChatGPT isn’t really useful for too many tasks. Drafting things, coming up with creative ideas or giving something the final touch, yes. But definitely limited and not something ‘general’. I’d say it does some of my tasks so badly, it’s going to be years before we can talk about ‘general’ intelligence.
Sorry, it was probably more me having a bad day. I was a bit grumpy that day, because I didn’t get that much sleep.
I’m seeing lots of …let’s say… uninformed articles about AI. People usually anthropomorphise language models. (Because they do the thing they’re supposed to do very well. That is: write text that sounds like text.) People bring in other (unrelated) concepts. But generally, evidence doesn’t support their claims. Like with the ‘consciousness’ in that case with Lemoine, last year. Maybe I get annoyed too easily. But my claim is, it is very important not to spread confusion about AI.
I didn’t see the article was written by two high profile AI researchers. I’m going to bookmark it because it has lots of good references to papers and articles in it.
But I have to disagree on almost every conclusion in it:
They begin by claiming that fixing (all) current flaws like hallucinations would mean superintelligence, without backing it up at all.
The next paragraph is titled like they’d now define AGI, but they just broaden the tasks narrow AI can do. I’d agree. It’s impressive what can be done with current AI tech. But you’d need to show me the distinguishing factors and prove AI is past that. The way they do it just makes it a wide variant of narrow AI. (And I’d argue it’s not that wide at all, compared to the things a human does every day.)
I think their example showcasing emergent abilities of ML is flawed. When doing arithmetic, there is a sharp threshold where you don’t just memorize numbers and the result but get a grasp of numbers and how the decimal system works, and you understand the concept of addition and push past memorizing multiplication tables. I’d argue it’s not gradual like they claim. I get that this couldn’t be backed up by studying current models. But it could well be the case that they’re still too small, or that you’d need to teach them maths in a more effective way than just feeding them the words of every book on earth and Wikipedia.
The story on AI history is fascinating. How people first tried to build AI with formal reasoning and semantic networks, constructed vast connected knowledge databases, got through two “AI winters”, and nowadays we just dump the internet into an LLM and that’s the approach that works.
What I would like to have been part of that article:
How can we make sure we’re not anthropomorphizing, but it’s really the thing itself that has general intelligence?
What are some quantitative and qualitative measurements for AGI? How does the current state-of-the-art AI perform on these metrics? They address that in the section “Metrics”. But they just criticise current metrics and say it passed the bar exam etc. What are the implications? What are some proper metrics to back up a claim like theirs? They just did away with the metrics. What are they basing their conclusion on, then?
If defining general intelligence is difficult: What is the lower bound for AGI? What is considered the upper bound at which we’re sure it’s AGI?
What about the lack of a state of mind in transformer models? It is trained and then it is the way it is, until OpenAI improves it a few months later and incorporates new information into the next iteration. But it’s unable to transition into a new state while running. It gets some tokens as input, calculates, and then produces output. There is no internal state that could save something or change. This is one of the main points ruling out consciousness. But it also limits the tasks it can do at all, doesn’t it? It now needs prior knowledge, or to fit every bit of information into the context window, or to retrieve it somehow, for example with a vector database. The authors mention “in-context learning” early on. But it’s not clear if that does it for every task and at what scale. Without more information or new scientific advancements, I doubt it.
Most importantly:
It can’t learn anything while running. It can’t ‘remember’. This is a requirement per definition of AGI. Aren’t intelligent entities supposed to be able to learn?
Are there tasks that can’t be done by transformer models? One example I read about is: They are feed-forward models. There is nothing recurrent in them. The example task is you want to write a joke. Now you first need to come up with a pun and then write the build-up to the pun. But once you tell it, you need to tell the build-up first, then the pun. A transformer model starts writing at the beginning and then comes up with the pun afterwards, once it gets to that point in the text. Are there many real-world tasks for intelligence that inherently require you to think the other way round / backwards? Can you map them so you can tackle them with forwards-thinking? If not, transformer models are unable to do that task. Hence they’re not AGI. But still there are tasks similar to the joke example that the LLMs obviously do better than you’d expect.
Are we talking about LLMs or agents? An LLM embedded in a larger project can do more. For example, it can have its text fed back in, do reasoning and then give a final answer. It can store/remember information in a vector database. It can be instructed to fact-check its output and rephrase it after providing its own critique. But from the article it’s completely unclear what they’re talking about. It seems like they only refer to a plain LLM like ChatGPT.
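To make the distinction between a plain LLM and an agent a bit more concrete, here is a rough sketch, under heavy assumptions, of a stateless model wrapped with an external memory. `stateless_model` is a hypothetical stand-in for an LLM call, `embed` is a toy bag-of-words vector rather than a real embedding model, and `Memory` is a tiny in-process substitute for a vector database. None of these names come from the article or from any real library.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Memory:
    """Minimal in-process substitute for a vector database."""

    def __init__(self) -> None:
        self.notes: list[tuple[Counter, str]] = []

    def store(self, text: str) -> None:
        self.notes.append((embed(text), text))

    def recall(self, query: str, k: int = 1) -> list[str]:
        q = embed(query)
        ranked = sorted(self.notes, key=lambda n: cosine(q, n[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def stateless_model(prompt: str) -> str:
    """Hypothetical LLM stand-in: it sees only the prompt and keeps no state."""
    return f"[answer based on {len(prompt.split())} prompt words]"


def agent_step(memory: Memory, user_input: str) -> str:
    """One turn: retrieve earlier notes, put them in the prompt, then remember the input."""
    context = memory.recall(user_input, k=1)
    prompt = "\n".join(context + [user_input])
    reply = stateless_model(prompt)
    memory.store(user_input)
    return reply
```

The model function itself never changes between calls. Anything that looks like remembering happens in the wrapper, and it only works as far as the retrieved notes fit into the prompt.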
And my personal experience doesn’t align with the premise either. The article wants to tell me we’re already at AGI. I’ve fooled around with ChatGPT and had lots of fun with the smaller Llama models at home. But I completely fail to have them do really useful tasks from my everyday life. It does constrained and narrowed-down tasks like drafting an email or text. Or doing the final touches. Exactly like I’d expect from narrow AI. And I always need to intervene and give it the correct direction. It’s my intelligence and me guiding ChatGPT that’s making the result usable. And still it gets facts wrong often while wording them in a way that sounds good. I sometimes see people use summary bots here. Or use an LLM to summarize a paper for a Lemmy post. More often than not, the result is riddled with inaccuracies and false information. Like someone who didn’t understand the paper but had to hand in something for their assignment.
That’s why I don’t understand the conclusion of the article. I don’t see AGI around me.
I really don’t like confusion being spread about AI. I think it is going to have a large impact on our lives. But people need to know the facts. Currently some people fear for their jobs, some are afraid of an impending doom… the robot apocalypse. Other people hype it to quite some levels, and investors eagerly throw hundreds of millions of dollars at anything that has ‘AI’ in its name. And yet other people aren’t aware of the limitations and the false information they spread by using it as a tool. I don’t think this is healthy.
To end on a positive note: Current LLMs are very useful and I’m glad we have them. I can make them do useful stuff. But I need to constrain them and have them work on a well-defined and specific task to make them useful. Exactly like I’d expect from narrow AI. Emergent abilities are a thing. An LLM isn’t just autocompleting text. There are concepts and models of real-world facts inside. I think researchers will tackle issues like the ‘hallucinations’ and make AI way smarter and more useful. Some people predict AGI to be in reach within the next decade or so.
Have you seen this paper?
Abstract:
https://www.youtube.com/watch?v=Zlgkzjndpak
https://www.youtube.com/watch?v=NfGcWGaO1E4
My standard for AGI is that it’s able to do a low-level human work-from-home job.
But that would also exclude little children from intelligence.
How are small children different from smart animals?
It takes humans a while to develop our thinking goo; before that, we’re barely able to survive.
“You can use them for all kinds of tasks” - so would you say they’re generally intelligent? As in they aren’t an expert system?
More references: