Friday, March 13, 2026
mGrowTech

OpenAI Introduces GPT 5.2: A Long Context Workhorse For Agents, Coding And Knowledge Work

by Josh
December 11, 2025
in AI, Analytics and Automation


OpenAI has introduced GPT-5.2, which it describes as its most advanced frontier model for professional work and long-running agents, and is rolling it out across ChatGPT and the API.

GPT-5.2 is a family of three variants. In ChatGPT, users see ChatGPT-5.2 Instant, Thinking and Pro. In the API, the corresponding models are gpt-5.2-chat-latest (Instant), gpt-5.2 (Thinking) and gpt-5.2-pro (Pro). Instant targets everyday assistance and learning, Thinking targets complex multi-step work and agents, and Pro allocates more compute to hard technical and analytical tasks.
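The tier-to-model mapping can be captured in a small lookup table. This is a minimal sketch: the three model identifiers come from the article, while the names `GPT_5_2_VARIANTS` and `api_model_for` are hypothetical helpers introduced only for illustration.

```python
# Tier names and API model ids as listed in the article.
# The dictionary and helper names are illustrative, not part of any SDK.
GPT_5_2_VARIANTS = {
    "Instant": {
        "api_model": "gpt-5.2-chat-latest",
        "target": "everyday assistance and learning",
    },
    "Thinking": {
        "api_model": "gpt-5.2",
        "target": "complex multi-step work and agents",
    },
    "Pro": {
        "api_model": "gpt-5.2-pro",
        "target": "hard technical and analytical tasks",
    },
}


def api_model_for(tier: str) -> str:
    """Return the API model id for a ChatGPT tier name (case-insensitive)."""
    key = tier.strip().capitalize()
    if key not in GPT_5_2_VARIANTS:
        known = ", ".join(GPT_5_2_VARIANTS)
        raise ValueError(f"Unknown tier {tier!r}; expected one of: {known}")
    return GPT_5_2_VARIANTS[key]["api_model"]


if __name__ == "__main__":
    for tier, info in GPT_5_2_VARIANTS.items():
        print(f"{tier:<9} {info['api_model']:<20} {info['target']}")
```

Keeping the mapping in one place makes it easy to switch an application between tiers without scattering model strings through the codebase.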

Benchmark profile, from GDPval to SWE Bench

GPT-5.2 Thinking is positioned as the main workhorse for real-world knowledge work. On GDPval, an evaluation of well-specified knowledge tasks across 44 occupations in 9 large industries, it beats or ties top industry professionals on 70.9 percent of comparisons, while producing outputs at more than 11 times the speed and at under 1 percent of the estimated expert cost. For engineering teams, this suggests the model can generate artifacts such as presentations, spreadsheets, schedules and diagrams from structured instructions, keeping in mind that GDPval covers only well-specified tasks.

On an internal benchmark of junior investment banking spreadsheet modeling tasks, average scores rise from 59.1 percent with GPT-5.1 to 68.4 percent with GPT-5.2 Thinking and 71.7 percent with GPT-5.2 Pro. These tasks include three-statement models and leveraged buyout models with constraints on formatting and citations, which are representative of many structured enterprise workflows.

In software engineering, GPT-5.2 Thinking reaches 55.6 percent on SWE-Bench Pro and 80.0 percent on SWE-bench Verified. SWE-Bench Pro evaluates repository-level patch generation across multiple languages, while SWE-bench Verified focuses on Python.

Long context and agentic workflows

Long context is a core design target. GPT-5.2 Thinking sets a new state of the art on OpenAI MRCRv2, a benchmark that inserts multiple identical "needle" queries into long dialogue "haystacks" and measures whether the model can reproduce the correct answer. It is the first model reported to reach near 100 percent accuracy on the 4 needle MRCR variant out to 256k tokens.
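To make the needle-in-a-haystack setup concrete, here is a toy sketch of how such a test case could be assembled. It is not the actual MRCRv2 harness: the helper names (`build_haystack`, `similarity`), the filler turns, the needle wording and the use of `difflib.SequenceMatcher` for scoring are all assumptions made for illustration.

```python
import random
from difflib import SequenceMatcher

NEEDLE_REQUEST = "Write a short poem about tapirs."  # hypothetical needle prompt


def build_haystack(num_needles: int, num_filler: int, seed: int = 0):
    """Interleave identical needle requests (with distinct answers) among filler turns.

    Returns (turns, needle_answers), where turns is a list of (user, assistant)
    pairs and needle_answers lists the needle responses in order of appearance.
    """
    rng = random.Random(seed)
    total = num_needles + num_filler
    needle_slots = set(rng.sample(range(total), num_needles))

    turns, needle_answers = [], []
    filler_index = 0
    for slot in range(total):
        if slot in needle_slots:
            answer = f"Tapir poem variant {len(needle_answers) + 1}"
            needle_answers.append(answer)
            turns.append((NEEDLE_REQUEST, answer))
        else:
            turns.append((f"Filler question {filler_index}",
                          f"Filler answer {filler_index}"))
            filler_index += 1
    return turns, needle_answers


def similarity(candidate: str, reference: str) -> float:
    """String similarity in [0, 1]; 1.0 means an exact reproduction."""
    return SequenceMatcher(None, candidate, reference).ratio()


if __name__ == "__main__":
    turns, answers = build_haystack(num_needles=4, num_filler=20)
    target = answers[1]  # the model would be asked for the 2nd tapir poem
    print("turns in haystack:", len(turns))
    print("perfect recall score:", similarity(target, target))
    print("wrong needle score:", round(similarity(answers[0], target), 3))
```

Because every needle request is identical, the model cannot locate the right answer by matching the query text alone; it has to track which occurrence is being asked for.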

For workloads that exceed even that context, GPT-5.2 Thinking integrates with the Responses /compact endpoint, which performs context compaction to extend the effective window for tool-heavy, long-running jobs. This is relevant if you are building agents that iteratively call tools over many steps and need to maintain state beyond the raw token limit.
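The article does not describe the request format or internal behavior of the /compact endpoint, so the sketch below does not call it. It only illustrates the general idea of context compaction under simple assumptions: once a conversation exceeds a token budget, older turns are collapsed into one summary turn and the most recent turns are kept verbatim. `Turn`, `estimate_tokens` and `compact_history` are hypothetical names, the word-count token estimate is a crude stand-in for a real tokenizer, and the truncation-based summary is a placeholder for a model-generated one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    role: str
    text: str


def estimate_tokens(text: str) -> int:
    """Crude token estimate: one token per whitespace-separated word (assumption)."""
    return len(text.split())


def compact_history(history: List[Turn], budget: int, keep_recent: int = 2) -> List[Turn]:
    """Collapse older turns into a single summary turn when over budget.

    The summary here just keeps the first 8 words of each older turn; a real
    system would generate it with a model. The result is not guaranteed to fit
    the budget, it only shrinks the older part of the history.
    """
    total = sum(estimate_tokens(t.text) for t in history)
    if total <= budget or len(history) <= keep_recent:
        return list(history)

    cut = len(history) - keep_recent
    old, recent = history[:cut], history[cut:]
    snippets = [" ".join(t.text.split()[:8]) for t in old]
    summary = Turn("system", "Summary of earlier turns: " + " | ".join(snippets))
    return [summary] + recent


if __name__ == "__main__":
    history = [Turn("user" if i % 2 == 0 else "assistant", f"step {i} " + "detail " * 20)
               for i in range(6)]
    compacted = compact_history(history, budget=50)
    before = sum(estimate_tokens(t.text) for t in history)
    after = sum(estimate_tokens(t.text) for t in compacted)
    print(f"turns: {len(history)} -> {len(compacted)}, tokens: {before} -> {after}")
```

An agent loop would typically run a step like this between tool calls, so that the state carried forward stays bounded even as the number of steps grows.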

On tool usage, GPT-5.2 Thinking reaches 98.7 percent on Tau2-bench Telecom, a multi-turn customer support benchmark where the model must orchestrate tool calls across a realistic workflow. The official examples from OpenAI's release post show scenarios like a traveler with a delayed flight, missed connection, lost bag and medical seating requirement, where GPT-5.2 manages rebooking, special assistance seating and compensation in a consistent sequence while GPT-5.1 leaves steps unfinished.

Vision, science and math

Vision quality also moves up. GPT-5.2 Thinking roughly halves error rates on chart reasoning and user interface understanding benchmarks like CharXiv Reasoning and ScreenSpot Pro when a Python tool is enabled. The model also shows improved spatial understanding of images. For example, when labeling motherboard components with approximate bounding boxes, GPT-5.2 identifies more regions with tighter placement than GPT-5.1.

For scientific workloads, GPT-5.2 Pro scores 93.2 percent and GPT-5.2 Thinking 92.4 percent on GPQA Diamond, and GPT-5.2 Thinking solves 40.3 percent of FrontierMath Tier 1 to Tier 3 problems with Python tools enabled. These benchmarks cover graduate-level physics, chemistry and biology as well as expert mathematics, and OpenAI highlights an early use case in which GPT-5.2 Pro contributed to a proof in statistical learning theory under human verification.

Comparison Table

| Model | Primary positioning | Context window / max output | Knowledge cutoff | Notable benchmarks (Thinking / Pro vs GPT-5.1 Thinking) |
| --- | --- | --- | --- | --- |
| GPT-5.1 | Flagship model for coding and agentic tasks with configurable reasoning effort | 400,000 tokens context, 128,000 max output | 2024-09-30 | SWE-Bench Pro 50.8 percent, SWE-bench Verified 76.3 percent, ARC-AGI-1 72.8 percent, ARC-AGI-2 17.6 percent |
| GPT-5.2 (Thinking) | New flagship model for coding and agentic tasks across industries and for long-running agents | 400,000 tokens context, 128,000 max output | 2025-08-31 | GDPval wins or ties 70.9 percent vs industry professionals, SWE-Bench Pro 55.6 percent, SWE-bench Verified 80.0 percent, ARC-AGI-1 86.2 percent, ARC-AGI-2 52.9 percent |
| GPT-5.2 Pro | Higher compute version of GPT-5.2 for the hardest reasoning and scientific workloads, produces smarter and more precise responses | 400,000 tokens context, 128,000 max output | 2025-08-31 | GPQA Diamond 93.2 percent vs 92.4 percent for GPT-5.2 Thinking and 88.1 percent for GPT-5.1 Thinking, ARC-AGI-1 90.5 percent and ARC-AGI-2 54.2 percent |

Key Takeaways

  1. GPT-5.2 Thinking is the new default workhorse model: It replaces GPT-5.1 Thinking as the main model for coding, knowledge work and agents. It keeps the same 400k context and 128k max output, with higher benchmark scores across GDPval, SWE-Bench, ARC-AGI and scientific QA.
  2. Substantial accuracy jump over GPT-5.1 at similar scale: GPT-5.2 Thinking moves from 50.8 percent to 55.6 percent on SWE-Bench Pro, from 76.3 percent to 80.0 percent on SWE-bench Verified, from 72.8 percent to 86.2 percent on ARC-AGI-1 and from 17.6 percent to 52.9 percent on ARC-AGI-2, with the same token limits.
  3. GPT-5.2 Pro is targeted at high end reasoning and science: GPT-5.2 Pro is a higher compute variant that mainly improves hard reasoning and scientific tasks. It reaches 93.2 percent on GPQA Diamond versus 92.4 percent for GPT-5.2 Thinking and 88.1 percent for GPT-5.1 Thinking, and it posts higher scores on ARC-AGI-1 (90.5 percent) and ARC-AGI-2 (54.2 percent).
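The size of these jumps is easier to compare as absolute and relative gains. The short sketch below computes both from the GPT-5.1 Thinking and GPT-5.2 Thinking scores quoted in the takeaways; the names `SCORES` and `gains` are illustrative.

```python
# (GPT-5.1 Thinking, GPT-5.2 Thinking) scores in percent, as quoted in the article.
SCORES = {
    "SWE-Bench Pro": (50.8, 55.6),
    "SWE-bench Verified": (76.3, 80.0),
    "ARC-AGI-1": (72.8, 86.2),
    "ARC-AGI-2": (17.6, 52.9),
}


def gains(before: float, after: float) -> tuple:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = round(after - before, 1)
    relative = round(100 * (after - before) / before, 1)
    return absolute, relative


if __name__ == "__main__":
    print(f"{'Benchmark':<20} {'5.1':>6} {'5.2':>6} {'abs':>6} {'rel %':>7}")
    for name, (before, after) in SCORES.items():
        absolute, relative = gains(before, after)
        print(f"{name:<20} {before:>6} {after:>6} {absolute:>6} {relative:>7}")
```

Viewed this way, the SWE-bench gains are a few percentage points, while the ARC-AGI-2 score roughly triples.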


Asif Razzaq is the CEO of Marktechpost Media Inc. As an entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which covers machine learning and deep learning news in a way that is both technically sound and easily understandable by a wide audience. The platform has over 2 million monthly views.



