I have always had an interest in history and politics, but with the culture being what it has been over the last couple of years, I have largely retreated into staying quiet. That is especially true with family, where political conversations often feel less like discussions and more like parallel monologues.
When those conversations do happen, they tend to follow a familiar pattern. We talk about presidents. Who was good. Who was bad. Who we admire. Who we blame. And almost immediately, the discussion collapses into opinion. Vibes. Cable news talking points. Hand-me-down reputations that everyone seems to accept but few can clearly defend.
That was the moment I realized how strange this is. We are perfectly comfortable quantifying things that are far less important. We debate who the greatest second baseman of all time was. We argue endlessly about whether the Michael Jordan Bulls were the greatest team ever. We build advanced metrics, normalize eras, and adjust for context. Surely we can do something similar for presidents. We have the data.
Yet when it comes to evaluating the most powerful executive role in the world, we still rely on something that looks like a popularity contest dressed up as historical consensus. Which is often just a polite way of saying collective subjective judgment.
Over the holiday break, I decided to see what it would look like to move beyond that. I opened up Google Gemini Deep Research and used it as a validation and synthesis engine to help build a more rigorous, quantifiable framework for evaluating presidential performance. The result is the Presidential Effectiveness Rating, or PER.
Why We Need a Quantitative Model
The core problem with existing rankings is historian bias. Presidents who were compelling speakers or symbolically important tend to be over-ranked. Presidents who managed complex, unglamorous economic or institutional shifts tend to be under-ranked. We forgive social instability if a president looked presidential while it was happening.
A quantitative approach removes the halo effect. It does not care about charisma. It cares about normalization. It asks a harder question: how did this leader move the needle relative to the baseline they inherited?
Show Your Work
Subjectivity thrives when the methodology is opaque. To counter that, the PER model is explicit. It breaks presidential effectiveness into four weighted pillars, each grounded in verifiable data.
Economic Weight (35%)
This goes beyond the traditional Misery Index. It looks at GDP velocity, meaning growth relative to economic potential, and fiscal health measured through shifts in the debt-to-GDP ratio. Using Bureau of Economic Analysis (BEA) and Bureau of Labor Statistics (BLS) data, the goal is to identify who actually managed the economic engine rather than who simply benefited from timing.
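To make the pillar concrete, here is a minimal sketch of how GDP velocity and fiscal health might combine into a raw economic sub-score. The function name, inputs, and the equal weighting of the two components are my illustrative assumptions, not the published model.

```python
# Hypothetical sketch of the Economic Weight pillar.
# The equal 50/50 split between the two components is an assumption.

def economic_score(gdp_growth: float, potential_growth: float,
                   debt_gdp_start: float, debt_gdp_end: float) -> float:
    """Combine GDP velocity and fiscal health into one raw sub-score."""
    # GDP velocity: realized growth relative to economic potential.
    velocity = gdp_growth - potential_growth
    # Fiscal health: positive when the debt-to-GDP ratio improved over the term.
    fiscal_shift = debt_gdp_start - debt_gdp_end
    return 0.5 * velocity + 0.5 * fiscal_shift
```

A term with 3% growth against 2% potential and a debt-to-GDP ratio that fell from 100 to 98 would score 1.5 raw points before normalization.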
Social Stability (30%)
This is the most overlooked dimension. Using Census Bureau Gini coefficients and FBI Uniform Crime Reporting (UCR) data, this pillar measures how much friction was introduced into the social fabric during a presidency. If inequality widened or crime increased, the score reflects that regardless of rhetorical skill.
Geopolitical Standing (25%)
This tracks conflict involvement versus resolution using data from the Stockholm International Peace Research Institute (SIPRI) and State Department treaty records. The focus is not on winning wars, but on maintaining global stability and the United States’ share of global GDP.
Crisis Leadership (10%)
This is where the model attempts to quantify what is usually left to narrative. It measures response lag, the time between a national crisis and a meaningful policy or executive action.
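The four pillars above reduce to a weighted sum. A minimal sketch, assuming each pillar has already been collapsed into a single normalized sub-score (the dictionary keys are my shorthand for the pillar names):

```python
# Weighted composite of the four pillars, using the weights from the post.
# The assumption that each pillar arrives as one normalized value is mine.
PILLAR_WEIGHTS = {
    "economic": 0.35,
    "social_stability": 0.30,
    "geopolitical": 0.25,
    "crisis_leadership": 0.10,
}

def per_score(pillar_scores: dict[str, float]) -> float:
    """Presidential Effectiveness Rating as a weighted sum of pillar scores."""
    assert set(pillar_scores) == set(PILLAR_WEIGHTS), "all four pillars required"
    return sum(PILLAR_WEIGHTS[p] * pillar_scores[p] for p in PILLAR_WEIGHTS)
```

Note that the weights sum to 1.0, so a president who is exactly average (zero) on every pillar scores zero overall, which is what makes arguing about the weights themselves a well-posed debate.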
The Normalization Protocol
The core of the model is normalization. A three percent growth rate in 1890 does not mean the same thing as a three percent growth rate in 2024.
Every data point is converted into a Z-score against a rolling 100-year baseline. That creates a level playing field where Abraham Lincoln and Joe Biden are evaluated using the same mathematical rigor rather than the same mythology.
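The normalization step can be sketched in a few lines. How the model handles early years with fewer than 100 prior observations is not stated in the post, so the shrinking-window behavior here is an assumption:

```python
# Rolling 100-year Z-score normalization, as described above.
# Early years with fewer than `window` observations fall back to a
# shorter baseline (an assumption about edge handling).
import statistics

def rolling_z(series: list[float], index: int, window: int = 100) -> float:
    """Z-score of series[index] against up to `window` values ending at index."""
    start = max(0, index - window + 1)
    baseline = series[start:index + 1]
    mean = statistics.fmean(baseline)
    sd = statistics.pstdev(baseline)
    return 0.0 if sd == 0 else (series[index] - mean) / sd
```

This is what lets a 3% growth rate score very differently in 1890 and 2024: each value is judged against the mean and spread of its own era's baseline, not against a single all-time average.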
How This Was Built
Gemini Deep Research played a critical role in validation. Sources were cross-checked. Assumptions were pressure-tested. Alternative metrics were explored and discarded where they added noise rather than clarity.
Importantly, the results passed a basic sanity check: the highest-performing presidents clustered around the traditional Mount Rushmore group. That was reassuring. It suggested the model was not contrarian for its own sake, but capable of both confirming consensus and challenging it where the data diverged.
The Goal Is Better Conversations
This project is not meant to end debate. It is meant to improve it.
I want us to stop arguing about whether a president was good or bad based on party affiliation or personality. I want us to argue about weighting. About normalization windows. About whether a metric belongs in the model at all.
If you do not like a ranking, do not tell me I am biased. Tell me which assumption is wrong. Tell me which data source is incomplete. Tell me what outcome you believe matters more and why.
In a country with this much data, we should not still be debating purely on vibes.
Let’s look at the work.
Comprehensive Presidency Audit
QUANTITATIVE EFFECTIVENESS ALGORITHM • FULL 45-SUBJECT REGISTRY