The motherhood penalty isn’t as large as we think
And event-study designs aren’t the be-all and end-all of empirical economics
One of the most influential findings in the past ten years of labor economics is that having children imposes a massive and persistent penalty on women’s earnings. In a landmark series of papers published in 2019, H. Kleven and coauthors (hereafter Kleven et al.) showed that the birth of a first child causes women’s earnings to drop sharply and never fully recover, while men’s earnings are barely affected. The resulting child or motherhood penalty – the percentage by which women fall behind men economically due to children – ranges from about 20 percent in Denmark to over 60 percent in Germany. It has become a central explanation for gender inequality in high-income economies and has fueled wide-ranging policy debates.
However, these estimates are now being challenged. A new paper by Bensnes, Huitfeldt, and Leuven (hereafter BHL 2026) argues that the event-study method used to estimate the motherhood penalty overstates the earnings losses from motherhood – possibly by a factor of two. Their critique isn’t just nit-picking; it strikes at the core assumption that makes the entire design work, and it has implications that reach well beyond the already large and influential motherhood penalty literature.
How we got there: event studies and causality
The approach by Kleven et al. is elegantly simple. Take all parents in administrative data, index time relative to the year of their first child’s birth, and track their earnings trajectories. The key regression includes event-time dummies (years relative to childbirth), age dummies (to control for life-cycle trends), and calendar year dummies (to control for macro conditions like business cycles). Identification, as economists call it when your regression spits out a causal effect, comes from variation in the age at which people have their first child: two 30-year-olds in the same calendar year can be at different event times if one had a child at 25 (event time 5) and the other at 28 (event time 2). Figure 1 below sketches the ideal situation for this approach: all women’s earnings follow the same trend before some of them have babies. This can be used to justify the assumption that they would have continued to follow this same trend if no woman had become a mother, which is the well-known parallel-trends assumption. Once some of them have become mothers, the motherhood penalty is simply the gap between their earnings and the earnings of the women who have not yet become mothers. By assumption, observing the earnings of the later mothers is as good as observing the hypothetical earnings of the earlier mothers had they not yet had a child.
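For readers who prefer to see the regression written out, the specification described above can be sketched as follows. The notation is my own shorthand rather than a quotation from the papers: i indexes individuals, s calendar years, and t event time; the choice of t = -1 as the omitted reference period is an assumption of this sketch, as is the idea of estimating the equation separately for women and men.

```latex
% Event-study regression with event-time, age, and calendar-year dummies.
% Notation is illustrative; j = -1 is assumed to be the omitted reference period.
\[
Y_{ist} \;=\; \sum_{j \neq -1} \alpha_j \,\mathbf{1}[\,j = t\,]
\;+\; \sum_{k} \beta_k \,\mathbf{1}[\,k = \mathrm{age}_{is}\,]
\;+\; \sum_{y} \gamma_y \,\mathbf{1}[\,y = s\,]
\;+\; \nu_{ist}
\]
```

The coefficients on the event-time dummies trace out earnings relative to the year before the first birth, net of age and calendar-year effects. They only have a causal interpretation if, conditional on age and year, the timing of the first birth is unrelated to the earnings path a woman would have followed without a child.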
The results are striking. Before the first child, men’s and women’s earnings evolve on nearly identical paths, after adjusting for age and year effects. At the moment of the first birth, women’s earnings plummet, but men’s don’t. The gap never closes in up to 20 years of post-childbirth data. In Denmark, women’s earnings are about 20 percent lower relative to the counterfactual ten years after the first child. Across six countries, the long-run penalty ranges from 21 percent in Denmark to 61 percent in Germany. The pattern is remarkably consistent: parallel pre-trends, a sharp downturn for women at childbirth, and persistent divergence between men and women thereafter.
Kleven et al. use the motherhood penalty to perform a dynamic decomposition of the overall gender earnings gap. Bottom line: children can explain the vast majority of remaining gender inequality in earnings. This finding was enormously influential. The motherhood penalty framework has become the default way that economists, policymakers, and journalists think about gender inequality in the labor market. Simultaneously, the proposed event-study design resulted in a proliferation of papers that are able to claim causality, the most important publication prerequisite in empirical economics, by building on the accepted validity of the Kleven et al. identification strategy, applying it to numerous other treatments and outcomes whenever a treatment timing was recognizable in the data.
Challenging the crucial assumption
As highlighted above, the causal event-study framework relies on the crucial assumption that the earnings of women who give birth later can serve as a valid counterfactual for the earnings of women who have already given birth. This assumption can be challenged by a simple question: Why do some women become mothers earlier than others? The question matters because, as soon as we had solid reason to believe that the timing of motherhood was systematically related to what women expected to earn in the future, we would immediately question the causal nature of the event-study results. BHL (2026) summarize it compactly as follows:
“The validity of the estimates produced by the event-study model […] depends on the assumption that women do not time fertility to their unobserved counterfactual earnings trajectory conditional on observed age and time.” (BHL 2026, p. 30)
Intuitively, this assumption never made a lot of sense to me. It always carried an air of mechanistic instead of economic decision-making. Economic theory tells us that women would indeed incorporate available information both about their current and their future potential economic prospects into their timing decisions. Childbearing definitely carries significant economic implications, probably even more so in the modern world of high female educational investments and career ambitions. Yet we are to believe that the timing of fertility is not a choice variable, conditional on a minimal set of control variables.
Insert IVF – both figuratively and literally
As it turns out, people do act quite a bit like economic agents. At least if we follow the evidence and argument presented by BHL (2026), women notice when their earnings profiles start to flatten over the course of their careers; when this happens, their likelihood of giving birth increases. This is in line with basic economic theory: childbearing becomes more attractive as the opportunity cost in the form of career progression decreases. This means that the counterfactual for the women who give birth is wrongly chosen: the women who do not give birth (yet) are still moving ahead on steeper earnings profiles, while the women who give birth would have followed a flatter earnings profile even without kids. As a result, part of the widening gap between the earnings trajectories of the two groups is incorrectly attributed to motherhood. Figure 2 below sketches this problem: because the earnings profile of later mothers is still progressing steeply (in red) while earlier mothers already see their earnings profile flatten before their first birth (in black), comparing the two after earlier mothers have given birth overstates the true effect of motherhood on earnings.
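A toy calculation can make the mechanism above concrete. The following is a minimal sketch with made-up numbers, not a replication of BHL (2026): it assumes every woman's earnings grow linearly until the age at which she has her first child and would have flattened from that age onward even without a child, and that motherhood itself lowers earnings by a fixed 7 units. All function names and parameter values are hypothetical.

```python
# Toy illustration of selection on earnings profiles (all numbers hypothetical).
# Assumption: a woman's no-child earnings grow until her chosen birth age and
# are flat afterwards, i.e. she times the birth to the flattening of her profile.

START_AGE = 25       # first age we observe
BASE = 100.0         # earnings at START_AGE
GROWTH = 3.0         # yearly earnings growth while the profile is still steep
TRUE_PENALTY = 7.0   # assumed true causal effect of motherhood on earnings

EARLY_BIRTH_AGE = 28  # earlier mothers
LATE_BIRTH_AGE = 32   # later mothers, used as the "not yet treated" comparison


def counterfactual_earnings(age, birth_age):
    """Earnings without a child: growth stops at the (chosen) birth age."""
    years_of_growth = min(age, birth_age) - START_AGE
    return BASE + GROWTH * years_of_growth


def observed_earnings(age, birth_age):
    """Observed earnings: the counterfactual minus the penalty once a mother."""
    earnings = counterfactual_earnings(age, birth_age)
    if age >= birth_age:
        earnings -= TRUE_PENALTY
    return earnings


def naive_gap(age):
    """Not-yet-mothers minus earlier mothers at a given age."""
    return (observed_earnings(age, LATE_BIRTH_AGE)
            - observed_earnings(age, EARLY_BIRTH_AGE))


# Ages before the early group gives birth: the comparison looks perfectly clean.
pre_gaps = {age: naive_gap(age) for age in range(START_AGE, EARLY_BIRTH_AGE)}

# Ages at which only the early group has a child: the naive "penalty".
post_gaps = {age: naive_gap(age) for age in range(EARLY_BIRTH_AGE, LATE_BIRTH_AGE)}

print("pre-birth gaps: ", pre_gaps)
print("post-birth gaps:", post_gaps)
print("true penalty:   ", TRUE_PENALTY)
```

In this stylized setup the two groups are indistinguishable before the first birth, yet the naive gap starts at the true penalty of 7 and grows by 3 every year, reaching 16 by age 31. The extra 9 units come purely from the later mothers' profile still being steep and have nothing to do with motherhood.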
BHL (2026) are able to show this by exploiting data on in-vitro fertilization (IVF) treatments in Norway. Their data show not only when an IVF treatment resulted in a live birth, but also when women began their IVF treatments. Given that not all IVF treatments are successful, the procedure introduces randomness in the timing of births among women who successfully conceive a child via IVF. Consequently, BHL (2026) control for the timing of the IVF treatment. Doing so does not affect the pre-birth trends, but it considerably reduces the magnitudes of the post-birth penalties for women, in particular in the “long run” when children are at least six years old: whereas Kleven et al. found a motherhood penalty of 20 percent in Denmark, BHL (2026) arrive at a penalty of only 7 percent in Norway. In addition, the corrected estimation technique by BHL (2026) reveals that the earnings of the women’s partners actually increase considerably, by 9 percent. This contributes to the earnings gap between women and their partners without directly harming women.
“But pre-trends”
To dispel doubts about achieving identification in an event-study design, researchers resort to (over-)emphasizing the statistical insignificance of pre-trends, i.e. the differences between treatment groups in the periods leading up to treatment. This is then taken as evidence in support of the parallel-trends assumption: if it doesn’t fail prior to treatment, that’s an indication that it might also hold post-treatment. Kleven et al. rely on this, too, as every research team using this framework would. However, a particular feature of the motherhood penalty setting is that the pre-trends, while insignificant, are uninformative, because women select into treatment based on what they expect to happen after treatment. Since treatment timing is endogenous for all women, earlier- and later-treated women need not show any differential trends before treatment, so flat pre-trends cannot rule out a violation of the parallel-trends assumption afterwards.
The new state of facts
Importantly, the results by BHL (2026) do not say there is no motherhood penalty. Having children still reduces women’s earnings. Even the most conservative estimates from BHL (2026) show a long-run earnings loss of around 7 percent for mothers in Norway. What is disputed is the magnitude and the interpretation of the earlier results. The standard event study may overstate the maternal earnings penalty by a factor of two. And the gap between mothers’ and fathers’ earnings trajectories after childbirth may be driven more by the fathers’ earnings going up than by the mothers’ earnings going down. Using a different source of random variation in the timing of pregnancies, Gallen et al. (2025) find that unintended pregnancies have large negative effects on women’s incomes, especially if they occur during women’s education or early career stage. This supports the argument presented by BHL (2026): the motherhood penalty will be larger if childbirth occurs farther away from the optimum along women’s earnings profiles.
This matters directly for how we think about the problem and what policies we design to address it: If the motherhood penalty is primarily about women’s earnings collapsing, the policy response should focus on things like subsidized childcare, parental leave, and workplace flexibility – measures to increase mothers’ attachment to the labor market. If part of the “penalty” also reflects fathers investing more in their careers after having children, the picture becomes more complex. Policies that encourage a more equal division of childcare – like earmarked paternity leave – become relatively more important.
The way forward
As indicated above, the initial Kleven et al. papers not only changed the discourse surrounding gender equality in the labor market but also caused a “gold rush” of event-study designs. Suddenly, it wasn’t necessary to worry that much about identification anymore, as long as there was a fresh population registry and a research question that could be studied by exploiting the timing of an event – and there are many settings that fit this prerequisite. For years, papers written in this fashion have been dominating conferences on health, labor, and population economics. I’ve attended sessions in which every paper executed the event-study design, with the blunt execution accounting for a great deal of the slides. Nobody in the audience would get up and fundamentally challenge the repetitive yet widely accepted identification strategy, nor would anyone inject some deeper economic thinking into what was being presented. That said, the problems that event studies run into when estimating the effect of motherhood do not imply that the design is similarly questionable in the many other contexts in which it has been used. But this seems like a good opportunity to start thinking hard again about how it could be challenged.
References
Bensnes, S., Huitfeldt, I., & Leuven, E. (2026). Reconciling estimates of the long-term earnings effect of fertility. Discussion Paper 043/26. Rockwool Foundation Berlin. https://www.rfberlin.com/network-paper/reconciling-estimates-of-the-long-term-earnings-effect-of-fertility/
Gallen, Y., Joensen, J. S., Johansen, E. R., & Veramendi, G. F. (2025). The labor market returns to delaying pregnancy. Working Paper. https://yanagallen.com/UnplannedPregnancy.pdf
Kleven, H., Landais, C., Posch, J., Steinhauer, A., & Zweimüller, J. (2019). Child penalties across countries: Evidence and explanations. AEA Papers and Proceedings, 109, 122-126. https://www.aeaweb.org/articles?id=10.1257/pandp.20191078
Kleven, H., Landais, C., & Søgaard, J. E. (2019). Children and gender inequality: Evidence from Denmark. American Economic Journal: Applied Economics, 11(4), 181-209. https://www.aeaweb.org/articles?id=10.1257/app.20180010




I wrote about this last year as well when BHL was still a working paper!
https://open.substack.com/pub/illuminatingfertility/p/how-much-would-mothers-earn-if-they?utm_campaign=post&utm_medium=web
I think I’ve come around more to believing the original event studies, also with the new paper on infertile women that shows that they basically have the same outcomes as men.
Do you know of any studies that use spontaneous twins as an instrument? That would be an interesting source of randomness (controlling for age, since likelihood of twins increases with maternal age) to identify the effect of >1 child. It could also potentially identify the effects of maternity leave, since the vast majority of employers only give one leave per pregnancy, not per child.