1 comment

  • vaishak2future 2 hours ago
    Author here. We’ve reached the point where frontier models (along with agent harnesses) can perform verbal reinforcement learning over multimodal data: reasoning over failures and making targeted corrections to policies expressed as code. This allows generalizing across tasks and embodiments without collecting massive teleoperation datasets, updating model parameters, or engineering rewards. I believe this opens up a path for general models to be more powerful than “robotics-first” foundation models. AMA.
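
    To make the idea concrete, here is a minimal toy sketch of what such a loop could look like. This is not the author's system or API: `verbal_rl_loop`, `toy_critic`, and the task format are all hypothetical, and the critic is a hard-coded stand-in for the frontier-model call that would actually read the failure traces.

    ```python
    # Toy sketch of a verbal-RL-style loop over a policy expressed as code:
    # execute the policy, collect failure traces, and have a critic (a
    # stand-in here for a frontier model) emit a corrected policy.
    # All names are illustrative assumptions, not the author's API.

    def verbal_rl_loop(policy_src, tasks, critic, max_iters=5):
        """Iterate: run the policy code, gather failures, ask the critic to rewrite it."""
        for _ in range(max_iters):
            ns = {}
            exec(policy_src, ns)                       # load current policy code
            policy = ns["policy"]
            failures = [(t, policy(t["x"]))
                        for t in tasks if policy(t["x"]) != t["goal"]]
            if not failures:
                return policy_src, True                # every task solved
            policy_src = critic(policy_src, failures)  # targeted code edit
        return policy_src, False

    # Demo: the tasks require doubling, but the initial policy adds 1.
    tasks = [{"x": 2, "goal": 4}, {"x": 5, "goal": 10}]
    buggy_src = "def policy(x):\n    return x + 1\n"

    def toy_critic(src, failures):
        # A real system would prompt a multimodal model with the traces;
        # here the "reasoned" fix is hard-coded for the demo.
        return src.replace("x + 1", "x * 2")

    fixed_src, solved = verbal_rl_loop(buggy_src, tasks, toy_critic)
    print(solved)  # True: the critic's one edit fixes both failures
    ```

    The point of the shape above is that the "learning signal" lands in the policy's source code rather than in model weights, which is why no gradient updates or reward functions appear anywhere in the loop.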