If you asked a room full of math majors to name their nightmares, one word would likely float to the top: Putnam.
The William Lowell Putnam Mathematical Competition isn’t just a test; it is a humbling ritual, widely considered the most difficult undergraduate math competition in the world. To give you an idea of the brutality: the median score is often zero. Out of 120 possible points, some of the brightest young minds at universities like MIT and Harvard struggle to scrape together even a few points.
For years, this exam was the “moat.” It was the place where human reasoning stood tall and where AI, for all its flashy poetry and coding tricks, fell flat on its face. AI models could memorize, sure, but they couldn’t invent the kind of novel, multi-step logic required to solve a Putnam problem.
That changed this week.
Nous Research, an open-source AI collective, just dropped a bombshell called Nomos-1. It didn’t just pass the exam; it scored an 87 out of 120. To put that in perspective, that score would have placed it roughly 2nd overall among thousands of human competitors in last year’s competition.
And the craziest part? It’s not a trillion-dollar model from Google or OpenAI. It’s an open-source model you can download.
Here is why this changes everything.
It’s Not About the Size of the Brain, It’s How You Use It
For the last few years, the AI narrative has been simple: “Bigger is Better.” If you want a smarter AI, you need more data centers, more electricity, and trillions of parameters.
Nomos-1 completely flips the script. It is a 30-billion-parameter model. In the world of LLMs (Large Language Models), that is surprisingly petite. For context, GPT-4 is rumored to have over a trillion parameters.
So, how did David beat Goliath on the hardest math test in history?
The secret sauce isn’t in how much the AI knows; it’s in how it thinks. Nous Research didn’t just train the model to spit out an answer. They built a system that mimics the human “scratchpad.”
When you or I solve a hard math problem, we don’t just blurt out “42.” We try a method, realize it’s a dead end, scratch it out, try a new angle, verify our steps, and then conclude. Nomos-1 uses a similar two-phase architecture:
- The Workers: Several AI “agents” attempt to solve the problem simultaneously.
- The Tournament: The system critiques its own attempts, effectively running a bracket-style tournament to find the strongest, most logical chain of thought.
It’s “Metacognition”—thinking about thinking. And it turns out, when you give a smaller AI the time to pause and reflect, it can outsmart the giants.
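Nous hasn’t published the exact decision procedure here, but the worker/tournament loop described above can be sketched in a few lines of Python. Everything in this sketch is a stand-in: `worker_solve` and `critique` fake the LLM calls with a random quality score, purely to show the shape of the bracket-style elimination.

```python
import random

def worker_solve(problem: str, seed: int) -> str:
    # Stand-in for an LLM "worker": each seed represents an
    # independent solution attempt with its own chain of thought.
    random.seed(seed)
    quality = random.random()
    return f"attempt-{seed} (quality={quality:.2f})"

def critique(a: str, b: str) -> str:
    # Stand-in for the self-critique step: compare two candidate
    # chains of thought and keep the stronger one. Here we just
    # read back the fake quality score; a real system would prompt
    # a model to judge the two reasoning chains.
    score = lambda s: float(s.split("quality=")[1].rstrip(")"))
    return a if score(a) >= score(b) else b

def tournament(candidates: list[str]) -> str:
    # Bracket-style elimination: pair candidates, keep the winners,
    # repeat until one chain of thought remains.
    while len(candidates) > 1:
        next_round = []
        for i in range(0, len(candidates) - 1, 2):
            next_round.append(critique(candidates[i], candidates[i + 1]))
        if len(candidates) % 2 == 1:  # odd one out gets a bye
            next_round.append(candidates[-1])
        candidates = next_round
    return candidates[0]

problem = "Putnam-style problem statement"
attempts = [worker_solve(problem, seed) for seed in range(8)]
best = tournament(attempts)
print(best)
```

The design point is the comparator: because the critique step only ever has to answer “which of these two chains is stronger?”, the system never needs an absolute measure of proof quality, which is the hard part.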
Breaking the “Black Box” of Elite Math
Why does the Putnam matter to anyone outside of a math department?
Because the Putnam is a proxy for Reasoning.
Most AI benchmarks are getting stale. “Can it write a poem?” Yes. “Can it code a basic website?” Easy. But those tasks often rely on pattern matching. You’ve seen one Python script; you’ve seen them all.
Putnam problems, however, are designed to be unique. You cannot memorize your way through them. They require true, fluid intelligence—the ability to take tools you know (calculus, algebra, combinatorics) and apply them in a way you’ve never seen before.
When Nomos-1 scored an 87/120, it proved that open-source AI is no longer just “copying” human text. It is beginning to reason through novel situations.
This has massive implications for fields like scientific discovery and software engineering. If an AI can self-correct through a complex math proof, it can potentially self-correct through a complex codebase, finding bugs that a human (or a standard LLM) would miss because they’re just “autocompleting” code rather than understanding the logic behind it.
The “Open Source” Revolution Just Got Real
This is the part that should make the big tech giants sweat.
For a long time, the assumption was that the best AI would always be behind a paid API (like ChatGPT or Claude). The cost to train these “reasoning models” was just too high for the open-source community.
Nous Research just shattered that ceiling. By releasing the weights of Nomos-1 (and the “harness” code used to run it), they have democratized genius-level reasoning.
Imagine a student in a developing nation who can’t afford a $20/month subscription to a premium AI but has a decent local computer. They now have access to a math tutor that, statistically speaking, outscores 99.9% of math undergrads.
This isn’t just a tech upgrade; it’s a leveling of the global playing field.
Is Math “Solved”? (The Human Element)
Now, before we throw away our textbooks, let’s take a breath.
Does this mean the AI “understands” math the way Euler or Ramanujan did? Probably not. It doesn’t have intuition or joy. It doesn’t feel the “beauty” of a proof. It is running a very sophisticated search algorithm through a probability tree.
There is also the fear factor. If a 30B parameter model can do this now, what happens when this “reasoning architecture” is applied to a 400B parameter model next year? The curve is pointing almost vertically upward.
But I choose to look at it differently.
Calculators didn’t kill math; they killed arithmetic. They freed up mathematicians to think about higher-level abstract problems because they didn’t have to spend hours doing long division.
Nomos-1 and its successors are the “Calculators of Logic.” They will handle the grunt work of verifying proofs and checking reasoning chains, freeing up human scientists to ask the big, creative questions that AI—no matter how smart—still can’t formulate.
So, What Now?
The genie is out of the bottle. We now know that you don’t need a trillion dollars of compute to achieve elite reasoning. You just need better architecture.
As we move into 2025, we are going to see a shift. The race won’t just be about who has the biggest model, but who has the most thoughtful one. And thanks to Nous Research, that race is now open to everyone.

