Mapping the Murky Waters: The Promise of Integrative Experiment Design

by Abdullah Almaatouq (Guest Author)

My PhD journey began with a clear vision: to unravel the interplay between social network structures and their collective outcomes. I was particularly interested in the collective intelligence arising in those structures. With several projects already underway on this topic, I felt prepared. Perhaps optimistically, or some might think naively, I chose to tackle the literature review of my dissertation, often considered the “easy part,” during the first year of my PhD.

However, the deeper I waded into the literature, the murkier the waters became. The sheer volume of studies was overwhelming, but quantity wasn’t the only issue. Contradictory findings didn’t just pepper the landscape; they seemed to dominate it. On one side, I read studies that sang the praises of social interactions, emphasizing their role in fostering social learning and the emergence of collective intelligence. Counterbalancing those studies was a chorus of papers cautioning against these very principles, warning of the homogenization of thought and the dilution of diverse ideas. It felt like navigating a labyrinth where every paper added another layer of confusion. This apparent incoherence (Watts, 2017) transformed what I initially thought would be an “easy” part of doing a PhD into a seemingly impossible challenge.

I found myself asking: was I simply navigating the fallout of the well-known replication crisis in research? Or were the inconsistencies merely reflections of known challenges in empirical research, such as small-N samples, p-hacking, HARKing, researcher degrees of freedom, and publication bias? This wasn’t an abstract, academic conundrum; it was a tangible obstacle to writing a coherent literature review. The more I reflected, the more I realized that the issue hinted at broader methodological challenges in social science research. This realization, combined with a fortunate internship with Duncan Watts, led me to pivot my research focus. Instead of studying social networks, my dissertation became an exploration of the apparent lack of cumulativeness in the social and behavioral sciences: a result of what Allen Newell termed “playing twenty questions with nature.”

Last year, my colleagues and I collected our thoughts into a target article soon to be published in Behavioral and Brain Sciences (BBS). We dive deep into the lack of cumulativeness in the experimental social and behavioral sciences and argue that it stems from the problem of incommensurability: individual experiments often operate in theoretical silos, making it difficult, if not impossible, to compare findings across studies. To address this challenge, we introduce the idea of an “integrative experiment design.” In general terms, the traditional approach, which we call the “one-at-a-time” approach to experimentation, starts with a single, often very specific, theoretically informed hypothesis. In contrast, the integrative approach starts from the position of embracing many potentially relevant theories. All sources of measurable experimental-design variation are treated as potentially relevant, and questions about which parameters matter more or less are answered empirically. The integrative approach proceeds in three phases:

  1. Define a comprehensive, multi-dimensional design space for the phenomenon of interest. 
  2. Sample strategically from this space, aligned with the objectives. 
  3. Integrate the results to develop theories that can address the observed outcome variations.  

But what does this mean? 

The integrative approach begins by clearly defining the design space of all possible experiments in a particular domain of interest. Experiments that have already been done can be placed at specific coordinates along axes representing the degrees of freedom in the experimental design, while those not yet undertaken represent areas to explore. The important takeaway here is the method’s inherent ability to pinpoint both the differences and the similarities between any pair of experiments focused on a shared outcome. Put simply, this method ensures commensurability from the get-go.
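To make this concrete, here is a minimal sketch, in Python, of how one might encode such a design space and place individual studies within it. The dimensions, levels, and study coordinates below are purely illustrative assumptions of mine, not taken from the target article.

```python
from itertools import product

# Illustrative (hypothetical) dimensions of a design space for
# group problem-solving experiments. Each dimension lists the
# levels a single experiment could take.
design_space = {
    "group_size":        [1, 4, 16, 50],
    "network_structure": ["none", "fully_connected", "small_world"],
    "task_complexity":   ["low", "medium", "high"],
    "incentive":         ["flat_fee", "performance_pay"],
    "communication":     ["none", "chat", "face_to_face"],
}

# Every cell in the full factorial space is one possible experiment.
all_cells = list(product(*design_space.values()))
print(f"{len(all_cells)} possible experimental designs")  # 4*3*3*2*3 = 216

# A previously published experiment is just a coordinate in this space,
# which makes comparisons between any two studies explicit.
study_a = {"group_size": 4, "network_structure": "fully_connected",
           "task_complexity": "low", "incentive": "flat_fee",
           "communication": "chat"}
study_b = {"group_size": 16, "network_structure": "small_world",
           "task_complexity": "high", "incentive": "flat_fee",
           "communication": "chat"}

# Dimensions on which the two studies differ (and agree) are immediate.
differs = [d for d in design_space if study_a[d] != study_b[d]]
print("Studies differ on:", differs)
```

Even this toy version makes the commensurability point: once two studies are expressed as coordinates in the same space, what they share and where they diverge is no longer a matter of interpretation.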

One practical issue with the integrative approach is that the design space grows combinatorially as more dimensions are identified. Thankfully, several existing methods can help researchers navigate these high-dimensional spaces effectively.
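For illustration, one such family of methods is space-filling designs, such as Latin hypercube sampling, which spread a fixed experimental budget evenly across the space; adaptive, active-learning schemes can go further by concentrating samples where results are most uncertain. The sketch below, my own example rather than a prescription from the article, assumes the illustrative dimensions from the previous snippet and uses SciPy’s quasi-Monte Carlo module to allocate a modest budget of experiments.

```python
import numpy as np
from scipy.stats import qmc  # requires SciPy >= 1.7

# Same illustrative (hypothetical) dimensions as above.
levels = {
    "group_size":        [1, 4, 16, 50],
    "network_structure": ["none", "fully_connected", "small_world"],
    "task_complexity":   ["low", "medium", "high"],
    "incentive":         ["flat_fee", "performance_pay"],
    "communication":     ["none", "chat", "face_to_face"],
}

budget = 20  # number of experiments we can actually afford to run
sampler = qmc.LatinHypercube(d=len(levels), seed=42)
unit_points = sampler.random(n=budget)  # points in the unit hypercube

# Map each unit-cube coordinate onto a discrete level of its dimension,
# yielding a budget-sized set of designs that covers the space evenly.
designs = []
for point in unit_points:
    design = {
        dim: vals[min(int(u * len(vals)), len(vals) - 1)]
        for (dim, vals), u in zip(levels.items(), point)
    }
    designs.append(design)

for d in designs[:3]:
    print(d)
```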

Finally, just like the traditional one-at-a-time method, the end goal of the integrative approach remains the formulation of solid, cohesive, and cumulative theoretical explanations. Yet the process differs notably. Instead of always seeking new, distinct theories, the emphasis shifts to identifying the scope and limits of current theories, which often involves understanding complex interactions among existing constructs.
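One way to picture this integration phase, under my own simplifying assumptions rather than as the article’s method, is to fit a surrogate model with interaction terms over the sampled design points and ask where in the space an effect holds, reverses, or vanishes. The simulated data and coefficients below are entirely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated results from sampled design points (purely illustrative):
# x1 = group size (rescaled to [0, 1]), x2 = task complexity (rescaled).
n = 200
x1 = rng.uniform(0, 1, n)
x2 = rng.uniform(0, 1, n)

# Hypothetical "true" surface: an interaction flips the sign of the
# group-size effect as task complexity grows (a boundary condition).
y = 0.8 * x1 - 1.6 * x1 * x2 + 0.2 * x2 + rng.normal(0, 0.1, n)

# Fit a surrogate with an interaction term via ordinary least squares.
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2, b12 = coef

# The effect of x1 depends on x2: d(y)/d(x1) = b1 + b12 * x2.
# The sign flips where x2 crosses -b1 / b12, i.e. the theory's boundary.
print(f"group-size effect at low complexity:  {b1 + b12 * 0.1:+.2f}")
print(f"group-size effect at high complexity: {b1 + b12 * 0.9:+.2f}")
print(f"estimated boundary at complexity of about {-b1 / b12:.2f}")
```

In this toy setup, a theory stated as “larger groups perform better” is neither confirmed nor refuted; instead, the fitted surface reveals the region of the design space where it holds and where it reverses.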

But how do we determine the dimensions of the design space? Given resource constraints, how can we best devise sampling strategies? What implications does this approach have for the nature of theory in our fields? And could it inadvertently concentrate research power among a few, potentially intensifying research disparities? For an in-depth discussion of these questions and more, I encourage you to read our target article, the accompanying commentaries, and our [response to the commentaries]. As our field continues to discuss these approaches, we will more fully realize our collective impact.

One comment on “Mapping the Murky Waters: The Promise of Integrative Experiment Design”

  1. I’ve argued in another post here (Response to “From Quasi-Replication to Generalization” – Mack Institute for Innovation Management (upenn.edu)) that there are four possible explanations for non-replication of empirical results: variations in operationalization, variations in methodological competence across researchers, sampling error (p-hacking and HARKing are specific versions of this), and context dependence due to omitted moderator variables. Integrative experimental design is a proposed solution to the last.
    Context dependence of an empirical result arises when there exist omitted variables that moderate the key relationships and whose values vary across contexts. This is formally equivalent to the problem of unobserved moderators that vary by context (see, for instance, Bareinboim and Pearl, 2016). Meta-analyses attempt to correct for this by finding and coding study-level moderators (Hunter and Schmidt, 2003). Integrative experimental design adheres to this spirit, as it is essentially a pre-planned meta-analysis across designs that systematically vary contextual variables that would otherwise have remained unobserved.
    This is a very useful idea, and I discussed their paper’s pre-print with some excitement with my colleagues (and experimentalists) Carsten Bergenholtz (Aarhus) and Tianyu He (NUS) last year when I first encountered it. However, in the spirit of useful pushback, I’ll channel here a few concerns that came up in our conversations. First, the authors assume that the size of the design space is tractable. Second, they assume that the surface of results over the design space is not too rugged (i.e., local generalization is possible). Third, they assume the design space and its ruggedness, taken together, are such as to make intuitive explanations possible. It’s not a priori clear why any of these are valid assumptions.
    An alternative we thought we should seriously consider: give up on the objective of globally generalizable and explainable theories and rely instead on experiments to build local (i.e., context-specific) theories. Therefore, rather than attempting to experimentally cover the vast design space associated with the general version of a question, a more realistic objective is to systematically generate multiple smaller, context-dependent design spaces, accepting that these may remain imperfectly linked through human-comprehensible theories.
    A different way to put it is that once we allow for “causal density,” in Meehl’s memorable phrase (1967) quoted by the authors, we believe that its counterpart, our own bounded rationality as researchers, cannot be set aside either in favor of utopian dreams of generalizability while retaining simple explanations. General, accurate, and simple: Thorngate (1976) famously argued that we’re lucky if we can get two out of three most of the time with our theories about organizations.
    If we don’t like this, perhaps we should study simpler systems, like billiard balls and subatomic particles.
