Nothing Special   »   [go: up one dir, main page]

What is an AI accelerator? What is agentic AI? Complete guide
X

OpenAI o1 explained: Everything you need to know

OpenAI's o1 models, launched in September 2024, enhance reasoning in AI and excel in complex tasks, such as generating and debugging code.

OpenAI has emerged to be one of the primary leaders of the generative AI era. The company's ChatGPT is among the most popular and widely used instances of generative AI, powered by its GPT family of large language models, or LLMs. As of September 2024, the primary models used by ChatGPT are GPT-4o and GPT-3.5.

For multiple weeks in August and into September 2024, reports circulated about a new model from OpenAI -- codenamed "Strawberry." Initially, it was not clear whether Strawberry was the successor to GPT-4o or something else.

On Sept. 12, 2024, the suspense behind Strawberry lifted with the official launch of OpenAI o1 models, including o1-preview and o1-mini.

What is OpenAI o1?

OpenAI o1 is a family of LLMs from OpenAI that have been optimized with enhanced reasoning functionality.

The o1 models are initially intended to be preview models, designed to provide users -- as well as OpenAI -- with a different type of LLM experience than the GPT-4o model. As is the case with all OpenAI's LLMs, o1 is a transformer model. It can be used to summarize content, generate new content, answer questions and write application code.

As opposed to OpenAI's prior models, the o1 models are designed to reason better. That is, instead of just providing a response as quickly as possible and using the basic transformer approach of weights and understanding what word or words belong together, o1 "thinks" about what the right approach is to solve a problem. The process of reasoning about a given problem in response to a user query is intended to provide a potentially more accurate response to certain types of complex queries. Unlike previous models, the o1 series spends more time processing information before responding. The o1 models are targeted at tackling hard problems that require multistep reasoning and complex problem-solving strategies.

The basic strategy taken by OpenAI for reasoning is chain-of-thought prompting, where a model reasons step by step through a problem in an iterative approach. The development of o1 involved advanced training techniques, such as reinforcement learning.

The initial launch in September 2024 included two models:

  • OpenAI o1-preview -- excels at tackling sophisticated problems.
  • OpenAI o1-mini -- provides a smaller, more cost-efficient version of o1.

What can OpenAI o1 do?

OpenAI o1 can perform many tasks like any of OpenAI's other GPT models -- such as answering questions, summarizing content and generating new content.

As an advanced reasoning model, o1 is particularly well-suited for certain tasks and use cases, including the following:

  • Enhanced reasoning. The o1 models are optimized for complex reasoning tasks, especially in STEM (science, technology, engineering and mathematics).
  • Brainstorming and ideation. The model's advanced reasoning abilities make it useful for generating creative ideas and solutions in various contexts.
  • Scientific research. The o1 models are ideal for different types of scientific research tasks. For example, o1 can annotate cell sequencing data and handle complex mathematical formulas needed in fields such as quantum optics.
  • Coding. The o1 models are effective at generating and debugging code, performing well in coding benchmarks such as HumanEval and Codeforces, according to OpenAI. The models are also effective in helping build and execute multi-step workflows for developers.
  • Mathematics. According to OpenAI, o1 excels in math-related benchmarks, outscoring the company's prior models. In a qualifying exam for the International Mathematics Olympiad (IMO), o1 scored 83% accuracy, compared to GPT-4o's 13%. The o1's mathematical power was tested with strong results in other advanced mathematics competitions, including the American Invitational Mathematics Examination (AIME). The model's math capabilities could potentially be used to help generate complex mathematical formulas for physicists.
  • Self-fact-checking. The o1 models can self-fact-check, improving the accuracy of its responses.

How to use OpenAI o1

There are several ways users and organizations can use the o1 models.

  • ChatGPT Plus and Team users. The o1-preview and o1-mini models are available directly for users of ChatGPT Plus and Team as of Sept. 12. Users can select the model manually in the model picker.
  • ChatGPT Enterprise and Education users. OpenAI has pledged to provide access to both models as of Sept. 19, 2024.
  • ChatGPT Free users. At launch, free users of ChatGPT do not have access to the o1 models. OpenAI plans to bring o1-mini access to all free users in the future.
  • API developers. Developers can access o1-preview and o1-mini through OpenAI's API.
  • Third-party services. Multiple third-party services have made the models available, including Microsoft Azure AI Studio and GitHub Models.

What are the limitations of OpenAI o1

As a preview set of models for an early iteration of a new type of LLM, there are several limitations, including the following:

  • Feature gaps. At launch, the o1 models lack web browsing, image processing and file uploading capabilities.
  • API restrictions. At launch, there are a variety of restrictions on the API limiting the models. Function calling and streaming are not supported initially. There is also limited access to chat completion parameters during the preview phase.
  • Response time. OpenAI users have come to expect rapid responses with little delay. But the o1 models are initially slower than previous models due to more thorough reasoning processes.
  • Rate limits. For ChatGPT Plus or Team users, OpenAI initially limited o1-preview usage to 30 messages a week, rising to 50 messages a week for 01-mini. On Sept. 16, 2024, OpenAI raised the limit for o1-preview to 50 messages a week and increased o1-mini to 50 messages per day.
  • Cost. For API users OpenAI o1 is more expensive than previous models -- including GPT-4o.

How OpenAI o1 improves safety

As part of the o1 models release, OpenAI also publicly released a System Card, which is a document that describes the safety evaluations and risk assessments that were done during model development. It details how the models were evaluated using OpenAI's framework for assessing risks in areas such as cybersecurity, persuasion and model autonomy.

  • Chain-of-thought reasoning. The o1 models use large-scale reinforcement learning to perform complex reasoning before responding. This lets them refine the generation process and recognize mistakes. As a result, they can better follow specific guidelines and model policies, improving their ability to provide safe and appropriate content.
  • Advanced jailbreak resistance. The o1 models demonstrate significant improvements in resisting jailbreaks. On the Strong Reject benchmark, which tests resistance against common attacks from literature, o1-preview and o1-mini achieve better scores than GPT-4o.
  • Improved content policy adherence. On the Challenging Refusal Evaluation, which tests the model's ability to refuse unsafe content across categories such as harassment, hate speech and illicit activities, o1-preview achieves a not-unsafe score of 0.934, which is superior to GPT-4o's 0.713.
  • Enhanced bias mitigation. On the Bias Benchmark for QA evaluation, which tests for demographic fairness, o1-preview selects the correct answer 94% of the time on unambiguous questions, compared to GPT-4o's 72%. The models also show improved performance on evaluations measuring the use of race, gender and age in decision-making, with o1-preview generally outperforming GPT-4o.
  • Legible safety monitoring. The chain-of-thought summaries provided by o1 models offer a new approach for safety monitoring. In an analysis of 100,000 synthetic prompts, only 0.79% of o1-preview's responses were flagged as potentially deceptive, with most of these being forms of hallucination rather than intentional deception.

GPT-4o vs. OpenAI o1

The following chart provides a comparison of OpenAI's GPT-4o and o1 models, showing a number of differences across them.

Feature GPT-4o o1 models
Release date May 13, 2024 Sept. 12, 2024
Model variants Single model Two variants: o1-preview and o1-mini
Reasoning capabilities Good performance Enhanced reasoning, especially in STEM fields
Performance benchmarks 13% on Mathematics Olympiad 83% on Mathematics Olympiad, PhD-level accuracy in STEM
Multimodal capabilities Handles text, images, audio and video Primarily text-focused with developing image capabilities
Context window 128K tokens 128K tokens
Speed Twice as fast as previous models Slower due to more reasoning processes
Cost (per million tokens) Input: $5; Output: $15 o1-preview: $15 input, $60 output; o1-mini: $3 input, $12 output
Availability Widely available across OpenAI products Limited access for specific users
Features Includes web browsing, file uploads Lacks some features from GPT-4o, such as web browsing
Safety and alignment Focused on safety measures Improved safety measures, higher resistance to jailbreaking

Sean Michael Kerner is an IT consultant, technology enthusiast and tinkerer. He has pulled Token Ring, configured NetWare and been known to compile his own Linux kernel. He consults with industry and media organizations on technology issues.

Dig Deeper on Data analytics and AI

Search Networking
Search Security
Search CIO
Search HRSoftware
Search Customer Experience
Close