IVy League Thinking: The right instrument for every job (or so we pretend).

I once presented a paper on YouTube content creators, and it was one of those talks where you walk away thinking you have at least managed to communicate the main idea without setting off a small methodological incident. Then an assistant professor came up to me afterwards and said, with great confidence, that the analysis could not work because videos are not released at random, and that what I therefore needed was an instrumental variable (IV).
To be clear, the observation that videos are not released at random is not incorrect, and it is also not particularly informative by itself. Nobody releases videos at random, or at least I sincerely hope that they do not, because content creation is a purposive activity that responds to audience feedback, platform incentives, personal schedules, trends, fatigue, inspiration, and sometimes the mundane reality of rent being due at the end of the month. In other words, the non-randomness is not some embarrassing defect in the data that should be hurriedly scrubbed away; it is very often the phenomenon itself, and it should push us to think more carefully about the process that generated what we are observing rather than about which methodological incantation we need to perform to make the discomfort go away.
When I asked what kind of instrument the assistant professor had in mind, the answer was vague hand-waving in the direction of “something exogenous,” which is academic shorthand for “I do not know, but I am absolutely sure that what you did is not sufficient.” This is what I have started to think of as Ivy League Thinking, not because it is actually confined to the Ivy League, but because it mirrors a broader academic habit in which invoking the right high-status methods grants a kind of methodological carte blanche, allowing confidence and ritual to stand in for explicit reasoning about what would have to be true in the world for the proposed fix to be meaningful — and, of course, because I needed the IV pun.
One Method to Rule Them All
The deeper temptation behind Ivy League Thinking is not really about instrumental variables in particular, but about the fantasy that there exists one method that can solve any empirical problem, a kind of Sauron-style approach to research design in which the world is messy and strategic, and our task is simply to pick the correct technique to restore order. In that worldview, serious scholarship becomes a sequence of familiar moves: name a threat to identification, apply the approved remedy, and proceed as though the method itself has converted a complicated social process into a clean causal estimate.
The problem is that no method, instrumental variables included, is a magic wand, and certainly not a universal antidote to the fact that the social world is populated by decision-making agents. Methods can be powerful, but they do not eliminate the underlying structure of the data generating process, and they do not replace the need to defend assumptions that are aligned with the mechanisms we claim to study. At best, a method helps us formalize a particular compromise, because every design is a trade-off that exchanges one set of risks for another set of risks.
This is also where it helps to be precise about language. Instrumental variables are not estimators in the usual sense; they are an identification strategy that is implemented through estimators such as two-stage least squares. The strategy does not solve non-randomness, and it does not eliminate endogeneity by decree. Instead, it replaces one set of untestable assumptions about the data generating process with another set of untestable assumptions, and mistaking that trade-off for a solution is one of the easiest ways for methods to turn into rituals.
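To make that distinction concrete, here is a minimal sketch of two-stage least squares on simulated data, assuming nothing beyond numpy; the names, coefficients, and data generating process are invented for illustration. The estimation is the easy, mechanical part, and nothing in the code checks relevance or exclusion, because those are claims about the world rather than about the arithmetic.

```python
# Minimal 2SLS sketch on invented data; purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

z = rng.normal(size=n)                          # instrument
u = rng.normal(size=n)                          # unobserved confounder
x = 0.8 * z + 0.5 * u + rng.normal(size=n)      # endogenous regressor, correlated with u
y = 2.0 * x + 0.5 * u + rng.normal(size=n)      # outcome; true effect of x is 2.0

# Naive OLS of y on x is biased upward because x and u are correlated.
X = np.column_stack([np.ones(n), x])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Stage 1: project x on the instrument. Stage 2: regress y on the fitted values.
Z = np.column_stack([np.ones(n), z])
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
X_hat = np.column_stack([np.ones(n), x_hat])
beta_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]

print(f"OLS slope:  {beta_ols[1]:.2f}")    # noticeably above 2.0
print(f"2SLS slope: {beta_2sls[1]:.2f}")   # close to 2.0, but only if the assumptions hold
```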
The World Is Not Random — And That’s Not the Problem
One reason Ivy League Thinking is so tempting is that it starts from an imaginary version of the world in which treatments are randomized, timing is exogenous, agents are passive, and causality politely reveals itself if only we choose the correct estimator. When the real world fails to cooperate, which it always does, the instinct is to correct reality rather than to understand it, and the methodological conversation drifts away from the data generating process toward an idealized checklist of techniques that promise redemption.
Yet the fact that something is not random does not tell us which method we should use, and it certainly does not justify jumping straight from a general concern about realism to a general demand for an instrument. What it should do is force us to take the data generating process seriously, because people make decisions, firms strategize, and creators respond to feedback, and those processes generate structure in the data that we need to model, approximate, or at least acknowledge. The relevant question is not whether our setting violates the assumptions of some idealized statistical world, but whether our data and design meaningfully reflect the mechanisms that actually produced the outcomes we are studying.
Virtue Dressing with Instruments
Nowhere is this more visible, and more persistently absurd, than in my own research field of optimal distinctiveness. If you read enough papers in this area, you quickly notice a pattern in which instrumental variable regressions appear with striking regularity, often using variations of average market distinctiveness, lagged category distinctiveness, or past-period means as instruments for a firm’s current distinctiveness.
The difficulty is that, in optimal distinctiveness research, the entire theoretical mechanism revolves around audience expectations, reference points, and comparative evaluation. Market-level distinctiveness is not an external background variable that floats outside the causal system; it sets the baseline, shapes what is considered normal, determines what is perceived as atypical, and influences which signals are even legible to audiences. Using market averages as instruments in this context is therefore not merely questionable but conceptually incoherent, because the instrument is not just correlated with the endogenous regressor; it is part of the mechanism through which the outcome is generated.
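A tiny simulation makes the point sharper than any abstract complaint could. Everything in it is invented, but the structure mirrors the concern: the moment the market average enters the outcome equation directly, the IV estimate stops recovering the firm-level effect, no matter how large the sample.

```python
# Toy illustration of an exclusion violation: the would-be instrument (market-average
# distinctiveness) also enters the outcome directly because it shapes audience
# expectations. All names, coefficients, and the data generating process are invented.
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

market_avg = rng.normal(size=n)                                 # candidate "instrument"
u = rng.normal(size=n)                                          # unobserved firm-level confounder
firm_dist = 0.7 * market_avg + 0.5 * u + rng.normal(size=n)     # firm's distinctiveness

# True model: the firm-level effect is 1.0, but the market baseline also enters the
# audience's evaluation directly (coefficient 0.8), which is the exclusion violation.
performance = 1.0 * firm_dist + 0.8 * market_avg + 0.5 * u + rng.normal(size=n)

def iv_estimate(y, x, z):
    """Just-identified IV estimator: cov(z, y) / cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print(f"IV estimate: {iv_estimate(performance, firm_dist, market_avg):.2f}")
# Roughly 1.0 + 0.8 / 0.7, i.e. about 2.1, far from the firm-level effect of 1.0,
# and the bias does not shrink with more data.
```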
These instruments persist not because they are convincing, but because they look correct. Authors can say that they “also estimated IV models,” reviewers can say that endogeneity was “addressed,” and the paper can move forward with a reassuring aura of seriousness, even though the exclusion restriction has not been defended in a way that engages with the field’s own theory. This is what I mean by virtue dressing, because the technique functions as a moral credential rather than as a coherent commitment about how the data were generated.
When Instruments Are Actually Beautiful
None of this is meant as shade on instrumental variables in general, because I genuinely like instrumental variables when they are well designed, conceptually grounded, and honest about what they do and do not identify. The best IV papers do not treat the instrument as a decorative add-on that magically converts observational data into causal truth; they treat it as a serious claim about the data generating process, usually anchored in variation that is experimentally produced or plausibly “as good as random” through some design-like event such as a policy discontinuity, lottery, timing rule, or institutional shock.
The phrase “as good as random” is also one of those expressions that should trigger a small internal alarm, because it often functions as an invitation to stop thinking precisely at the moment when we should start thinking harder. Most of us have a childhood memory of being told that the new console our parents bought was “as good as the PlayStation,” which was technically a sentence in the English language, but not one that survived contact with reality for more than about twelve seconds. “As good as random” can be exactly the same kind of claim: it can be true in a narrow and defensible sense, but it can also be wishful thinking wrapped in a familiar phrase that sounds methodological enough to pass unchallenged, especially when it appears in a methods section with the confidence of a theorem.
This is also why many of the most credible and useful instruments tend to come from settings that resemble experiments, even if they are not experiments in the narrow sense. When an instrument is generated by design rather than convenience, it becomes much easier to reason about the exclusion restriction and to have an honest debate about alternative channels. By contrast, instruments that are simply convenient exogenous-looking variables that “everyone uses,” such as lags, averages, and the weather, often ask the reader to suspend disbelief in precisely the places where the theory tells us disbelief is warranted, because their seeming exogeneity is frequently purchased by ignoring the many mundane ways in which they can be connected to behavior, timing, attention, and opportunity. Bastardoz et al. (2023) provide a great overview of the topic.
A classic example of an instrument that is both clever and defensible is the paper by Oberholzer-Gee and Strumpf (2007) on file sharing and music sales, where German school holidays were used as an instrument for file-sharing activity to estimate its effect on music sales in the United States. What makes that instrument appealing is not its novelty, but the clarity of its logic: holidays plausibly affect the ability to engage in file sharing, and the exclusion restriction is at least coherent enough to invite serious scrutiny rather than ritual acceptance.
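For completeness, here is roughly what such a design-shaped instrument looks like once it is wired into a standard 2SLS routine, assuming the linearmodels package and an entirely synthetic panel; the column names, coefficients, and data are placeholders I made up and bear no relation to the actual specification in the paper. The point is simply that the estimation is trivial once the exclusion argument has been made in prose.

```python
# Sketch of a design-shaped instrument in a standard 2SLS routine, assuming the
# linearmodels package. The panel below is synthetic and is not the data or
# specification from Oberholzer-Gee and Strumpf (2007).
import numpy as np
import pandas as pd
from linearmodels.iv import IV2SLS

rng = np.random.default_rng(2)
n = 5_000
holidays = rng.integers(0, 2, size=n)                            # German school holidays (0/1)
buzz = rng.normal(size=n)                                        # unobserved popularity shock
downloads = 1.5 * holidays + 0.8 * buzz + rng.normal(size=n)     # file-sharing activity
us_sales = -0.3 * downloads + 1.0 * buzz + rng.normal(size=n)    # true effect of downloads: -0.3

df = pd.DataFrame({"us_sales": us_sales, "downloads": downloads, "holidays": holidays})

# Bracket syntax: [endogenous ~ instruments]. Nothing here enforces the exclusion
# restriction; that argument has to be made in prose, not in the estimator.
results = IV2SLS.from_formula("us_sales ~ 1 + [downloads ~ holidays]", data=df).fit()
print(results.params["downloads"])   # close to -0.3 under these invented assumptions
```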
Methods as Status Signals
What worries me about Ivy League Thinking is not methodological rigor, but the substitution of rigor with ritual. When methods become status signals rather than analytical tools, the conversation shifts away from substantive plausibility and toward checklist compliance, and we stop asking which assumptions we are actually willing to defend and why.
Statistics does not offer a Sauron-style method to rule them all, because every design is a compromise that relocates risk rather than eliminating it. Fixed effects, matching, and instrumental variables all have their place, but none of them absolve us of the obligation to think carefully about how our data came to look the way they do.
Causal-ish, Revisited
If this sounds familiar, it is because this post is meant as a continuation of the earlier argument about being “causal-ish,” which is the uncomfortable but honest recognition that most social science designs approximate causality rather than possessing it outright. Randomized experiments do this, natural experiments do this, and instrumental variables do this as well, and none of them allow us to avoid judgment, because judgment is the entire substance of the exercise.
What Ivy League Thinking does is pretend that judgment can be outsourced to method choice, as though selecting the right estimator could absolve us of defending assumptions about the data generating process. In reality, methods do not remove assumptions; they rearrange them, and instrumental variables are a particularly vivid example because they demand a small number of very strong assumptions that are often harder to justify than the ones they replace.
So What Do You Say After the Talk?
Which brings me back to the assistant professor, because this conversation will happen again. Someone will tell you that your setting is not random and that you therefore need an instrument, and they may be well-intentioned, and they may even be correct that endogeneity is a concern, but they are skipping the crucial step of explaining which assumptions an instrument would relax in this specific context and which new assumptions it would force you to defend.
So here is the reply I now wish I had given, because it shifts the conversation from ritual to substance without being confrontational: “I agree that the treatment is not random, and I also agree that this creates identification challenges. What I am struggling with is which assumptions an instrument would let me relax here, and which new ones it would require me to defend. What kind of exclusion restriction do you think would actually be plausible in this context?” If a good instrument exists, this is where it should reveal itself.
And if the answer is still “something exogenous,” then at least you have learned something too, not about your paper, but about Ivy League Thinking, and about the temptation to treat methods as moral credentials rather than as conditional tools. That, to me, is progress, and if it makes me an accidental data scientist rather than a high priest of identification, I can live with that.