This is how o1 works, OpenAI's model that 'thinks' before responding
Project Strawberry finally unveiled. The new AI model is available to ChatGPT Plus and Team users
2 min read
OpenAI today announced o1, a new series of models designed to tackle increasingly complicated problems, reasoning through complex tasks and solving harder problems than previous versions of the technology. This release is an early preview of the series, comprising o1-preview and o1-mini, available in ChatGPT and in the API.
"We trained these models to spend more time thinking about problems before responding, just as a person would. Through training, they learn to refine their thinking process, try different strategies and recognise their mistakes".
Known by the code name Strawberry, o1-preview is, as OpenAI's blog post describes it, a new large language model trained with reinforcement learning to perform complex reasoning. As they write on the blog: 'o1 thinks before it responds: it can produce a long internal chain of thought before responding to the user'.
'In essence,' the post says, 'through the training process, these models learn to refine their thinking, try different possibilities and recognise their errors.' In practice, this means the model can answer more advanced questions, sometimes faster than a human being would.
Similar to how a human being may think for a long time before answering a difficult question, o1 uses a chain of thought when trying to solve a problem. Through reinforcement learning, o1 learns to refine its chain of thought and the strategies it uses. It learns to recognise and correct its mistakes. It learns to break down difficult steps into simpler ones. It learns to try a different approach when its current one does not work.
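The loop the article describes, trying a strategy, checking the result, and falling back to another approach when it fails, can be sketched in miniature. The toy solver below is purely illustrative: the problem, strategies, and checker are invented for this example, and none of this is OpenAI's actual training or inference code.

```python
# Illustrative only: a miniature "try, check, revise" loop in the spirit of
# the behaviour the article describes. The toy task is finding the integer
# square root of a number; the strategies and checker are made-up examples.

def check(candidate: int, target: int) -> bool:
    """Verify a proposed answer: is candidate the integer square root of target?"""
    return candidate * candidate <= target < (candidate + 1) * (candidate + 1)

def strategy_crude_guess(target: int) -> int:
    """A quick first attempt: just guess half the target."""
    return target // 2

def strategy_binary_search(target: int) -> int:
    """A more careful fallback: binary-search for the integer square root."""
    lo, hi = 0, target
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid * mid <= target:
            lo = mid
        else:
            hi = mid - 1
    return lo

def solve(target: int) -> int:
    """Try each strategy in turn, keeping the first answer that passes the check."""
    for strategy in (strategy_crude_guess, strategy_binary_search):
        answer = strategy(target)
        if check(answer, target):
            return answer  # this line of reasoning held up
        # otherwise: recognise the mistake and try a different approach
    raise ValueError("no strategy produced a verified answer")

print(solve(10))  # the crude guess (5) fails the check; binary search returns 3
```

The key idea mirrored here is that the model does not commit to its first answer: each intermediate result is checked, and a failed check triggers a different approach, just as the article says o1 'learns to try a different approach when its current one does not work'.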