This article presents the Perception-Process-Product (hereafter “Triple P”) conceptual framework to expand the scope of experimental archaeology. The field has long tended to adopt the principle of Occam's razor (e.g., Blessing and Schmidt 2021; Domínguez-Rodrigo 2008; Reeves et al. 2009; Schmidt et al. 2019), whether explicitly or implicitly. This assumption acts to center inquiry around the reverse engineering of a past technology in a minimal or least-effort manner while ignoring the rich contextual information experimentation affords. When applied to the experimental study of ancient craftsmanship, Occam's razor—or the law of parsimony—implies that a technological solution that is simpler to reproduce is more likely to be the one used in the archaeological context. This is insufficient to infer the preferences of “irrational” agents possessing incomplete information (Mindermann and Armstrong 2018) in tool design and use. These two conditions, irrationality and incomplete information, provide a better approximation of past humans displaying extensive cultural variation than the assumption of an omniscient Homo economicus (i.e., the idea that humans are consistently rational and narrowly self-interested agents pursuing optimality), which has been rejected by many anthropologists (Apicella et al. 2020; Henrich et al. 2001). Heyes (2012) similarly questioned the abuse of parsimony in animal behavioral research and proposed that new observational and experimental studies that allow testing of differential predictions become necessary when both a simple and a complex mechanism can explain the phenomenon of interest.
In fact, there are several reasons why past technologies may violate “parsimonious” assumptions of minimal manufacture complexity and optimal functional efficiency. In the evolution of technology, it is rather common that opaque causal perception and the resulting tendency toward overimitation lead to the widespread and long-lasting reproduction of technological solutions that are neither minimal in manufacture complexity nor optimal in functional efficiency. Overimitation means the copying of actions that are causally irrelevant in a goal-directed action sequence (Lyons et al. 2007). It is a psychological propensity that was suggested to be uniquely prevalent among humans when compared with nonhuman primates, including chimpanzees (Horner and Whiten 2005), bonobos (Clay and Tennie 2018), and orangutans (Nielsen and Susianto 2010). Subsequent research further suggested that within human societies overimitation has been commonly observed among children across various cultural contexts (Berl and Hewlett 2015; Nielsen and Tomaselli 2010; Nielsen et al. 2014; Stengelin et al. 2020; Subiaul et al. 2016). Gergely and Csibra (2006) introduced “Sylvia's Recipe,” which vividly illustrates this cognitive process in the transmission of technical skills. Sylvia is an education researcher who developed a unique way of cooking ham roast, cutting off both ends of the ham, after observing her mother do so throughout her childhood. Later in life, her mother happened to watch her cooking, noticed this step of the preparation, and asked Sylvia about its purpose.
When Sylvia could not answer the question, her mother explained that she herself had cut the ends off only because she had not had a pan that was large enough to cook a full-sized ham. The commonality of this opaque causal perception has also been demonstrated in a recent study of Hadza bowmakers. Harris et alia (2021) found that even experienced bowmakers possess only limited causal knowledge regarding the design and construction of bows according to modern engineering principles—that is, they cannot spell out the mechanical (dis)advantages of many morphological features.
On the other hand, path dependence also constrains the pursuit of functional optimization or simplification of manufacturing procedures. In this case, people are implicitly or explicitly aware of the existence of a more efficient solution, but they still stick to the older one due to the cost of learning, cultural conservatism (Acerbi et al. 2009; Ghirlanda et al. 2006; Morin 2022), or other reasons. One such example in the evolution of technology is the longevity of the QWERTY keyboard design (Kafaee et al. 2022). This deliberately unergonomic solution was invented in the era of typewriters in order to disperse commonly used letters, preventing the most frequently struck “hammers” from clashing. Yet it is still the most common keyboard design today even though such a constraint no longer exists on modern computer hardware. In short, we should acknowledge the existence and variation of many “good enough” technological solutions featuring various degrees of “redundancy” in real-world contexts, which often represent locally adaptive peaks instead of a global optimum in a multimodal fitness landscape due to multiple constraints and trade-off factors (Bettinger and Baumhoff 1982; Mesoudi and O'Brien 2008).
Building on this critique of Homo economicus and the four strategies (Footnote 1) of behavioral archaeology (Reid et al. 1975; Schiffer 2010), here I propose the Triple P framework, which aims to (1) amplify the expression of variation in experimental replicas (product) and their associated behavioral channels (process) as well as sensory experiences (perception) by experiments in diverse contexts and (2) better identify the complex interacting relationships across these three levels of variations in real-world conditions. To accomplish these two objectives, I advocate the following three principles as integral components of the Triple P framework: (a) acknowledging the inherent trade-off between control and generalizability in the experimental research design, (b) encouraging collaborative projects that involve geographically diverse and nontraditional research participants such as hobbyists and novices, and (c) adopting a workflow that normalizes the collection and curation of ethological and ethnographic data in experimental projects. These principles are developed to advocate for a pluralistic approach to the explanation of complex variation, which has received more attention in evolutionary anthropology (Antón and Kuzawa 2017) and cognitive science (Barrett 2020), instead of treating the optimization-based research agenda as a panacea. The second principle particularly allows researchers to develop research questions that are also meaningful to descendant communities through respectful conversation and collaboration (Montgomery and Fryer 2023).
Regarding the third principle of the Triple P framework, it is acknowledged that strategies of data collection and analysis of a given experimental project should be primarily derived from the research question, but the awareness of the rich tool kit available can sometimes inspire researchers to ask questions that are bold and transformative (Schmidt and Marwick 2020). Here, I will leverage the extensive corpus of experimental designs and inferences revolving around stone artifacts to demonstrate the necessity and potential of this framework.
What Good Is Less-Controlled Experimentation?
The trade-off between causal inference (aka “internal validity”) and generalization (aka “external validity”) forms a central issue in experimental design across different disciplines (Degtiar and Rose 2023; Roe and Just 2009:1266–1267) (Footnote 2). Even in fields known for their development of rigorous and well-controlled experimental methods such as cognitive psychology and neuroscience, researchers have started to use relatively naturalistic stimuli more frequently and advocate a paradigm shift to semicontrolled experiments due to the generalizability crisis—namely, the prevailing mismatch between the phenomenon of interest and measured variables in psychological science (Nastase et al. 2020; Shamay-Tsoory and Mendelsohn 2019; Sonkusare et al. 2019; Yarkoni 2022). In contrast, the past decades have witnessed experimental archaeology's growing research interests focusing on the robust inference of causal mechanisms while compromising generalizability in the explanation of material culture variation (Eren and Meltzer 2024; Eren et al. 2016; Lin et al. 2018; Marreiros et al. 2020). In the context of stone artifact replication, one typical research design emphasizing causality over generalizability is the use of knapping machines/robots (Li et al. 2022; Pfleging et al. 2019), which has helped map out the physical constraints of stone artifact manufacture and use through the identification of causal relationships between input (force, exterior platform angle, platform depth, etc.)
and outcome variables (flake size, flake shape, wear formation, etc.). All variables of interest in this setting are relatively easy to measure, quantify, and control, but this type of design can be insufficient in inferring how context-generic principles interact in a particular context as reflected in real-world conditions. This research orientation prioritizes the material science aspect over the social science aspect of experimental archaeology. Similarly, standardized artificial materials such as bricks (Lombao et al. 2017) or foam blocks (Schillinger et al. 2016) have been used to standardize materials and/or reduce learning demands in experimental studies focusing on the transmission of lithic technologies, producing results with limited generalizability (Liu et al. 2023). In real-world knapping, each rock has a different shape and often different physical properties such as inner cracks and inclusions, and this heterogeneity itself represents a critical variable in cultural transmission and skill development (Proffitt et al. 2022).
In contrast, less-controlled experiments, which have been traditionally known as naturalistic or actualistic experiments (for detailed terminological critiques, see Conrad et al. 2023; Eren and Meltzer 2024), pay more attention to how experimental insights can be generalized to archaeological samples by incorporating authentic materials and plausible social settings with a certain degree of compromised control (Outram 2008). With regard to the cases of cultural transmission experiments, a less-controlled experiment would involve the use of (1) natural rocks with varied morphology instead of standardized artificial materials and (2) human demonstrators instead of videos of knapping instruction, despite the fact that the latter will remain consistent across individuals. Unlike strictly controlled experiments testing one variable of interest each time (Almaatouq et al. 2024), less-controlled experiments are designed to produce variations and their interactions. This feature is crucial and cannot be simply replaced by ethnographic records or ethnoarchaeology, because many Paleolithic technological components do not have analogs in contemporary nonindustrial societies (e.g., Arthur 2018; Stout 2002). Although uncontrolled variation has traditionally been viewed as highly problematic, statistical techniques for developing causal inference from observational data—of the kind produced by less-controlled experiments—have also been greatly boosted in epidemiology and economics in recent years (Cunningham 2021; Hernan and Robins 2023).
Despite the fact that one should not interpret any experiment as a direct representation of an actual past event (Eren and Meltzer 2024), less-controlled experiments can serve a heuristic role in hypothesis generation, aligning with the perspective of Lin et alia (2018:680–681), who proposed that the interaction between less-controlled and strictly controlled experiments “operates in a cyclical form of induction and deduction.”
Many Places, Many Voices
Traditional practices in experimental archaeology, as manifested by the fact that a majority of scholarly publications are produced as results of experiments conducted by a single knapper with the dual identity of also being a researcher (Whittaker 2004), tend to be restrained by the cognitive bias known as the “curse of knowledge” or “curse of expertise.” This psychological term originally refers to the phenomenon that it is extremely challenging for experts to ignore the information that is held by them but not others—particularly novices—when communicating with others (Hinds 1999), but it has further implications for the sample representativeness in experimental archaeology. When the knapping expertise is gradually formed through multiple years of observations and trial-and-error learning, an expert knapper develops some specific ways of strategic planning, motor habits (and their associated impacts on anatomical forms such as wrist and elbow), preferences of percussor and raw material types, and familiarity with various techniques that become unforgettable (Moore 2020:654). The existence of this cognitive bias is not inherently bad, and these many years of experience should be appreciated and celebrated by experimental archaeologists. However, what is problematic is that the results of replication experiments conducted by these experienced practitioners, often in settings of a single knapper, have been constantly framed as generalizations regarding the evolution of technology and cognition that mask a vast range of technological diversity.
Modern flintknapping techniques, as a research subject and a scientific method, originated from hobbyists’ individualistic trials of reverse engineering during the nineteenth century (Johnson 1978; Whittaker 1994:54–61). Hobbyist knappers represent a huge repertoire of technological knowledge that does not fully overlap with what is acquired by academic knappers. They tend to generate ideas that may appear to be counterintuitive at first glance for academics. One such example is the utility of obtuse edge angles, as demonstrated by Don Crabtree (1977)—a mostly self-educated flintknapper, yet one of the most important figures in experimental archaeology. In his experiment, Crabtree demonstrated the excellent performance of blade dorsal ridges on tasks such as shaving and cutting hard materials, which challenged the traditional perspective on producing sharp lateral edges as the sole purpose of stone toolmaking and shed light on future functional reconstruction through use-wear analysis. It is rather unfortunate that collaborations between academics and hobbyists are less common than expected due to their complicated and uneasy relationships, as detailed in Whittaker's (2004) ethnography. Likewise, novices’ lack of flintknapping expertise also helps to mitigate the “curse of knowledge” bias that may hinder expert knappers. Their involvement can potentially lead to the discovery of alternative methods, techniques, and interpretations that may have been overlooked by experts. Several researchers have also pointed out that literature-informed archaeologists sometimes get lost in reconstructing previous archaeologists’ reconstructions instead of searching for diverse solutions to better understand the actual archaeological phenomenon (Bell 2014; Currie 2022), which is another reason why we need the presence of hobbyists and novices in the community of experimental archaeology.
Emphasizing variation at its core, the Triple P conceptual framework recognizes that experimental archaeology can greatly benefit from diverse perspectives (Pargeter et al. 2023:164) and thereby inherently adopts a collaborative mode of knowledge production, which has been recently advocated in experimental studies (Liu and Stout 2023; Ranhorn et al. 2020) and museum collection studies (Timbrell 2023) of stone artifacts. Furthermore, the Triple P framework acknowledges that communities living in specific geographical areas possess unique insights and understanding of their cultural heritage (Arthur et al. 2024). This emphasis on team efforts and inclusivity allows for a more complete understanding of the nonutilitarian or unexpected aspects of raw material procurement (Batalla 2016) and selection (Arthur 2021), pretreatment (Maloney and Street 2020), production (Griffin et al. 2013), and use (Martellotta et al. 2022; Milks et al. 2023) across different regions. Through ethical collaborations with those knapping practitioners in nonindustrial societies in the research process, the framework allows their voices to be heard and their contributions to be acknowledged. This not only enhances the quality of research outcomes but also fosters a sense of ownership and pride within these communities, strengthening the connection between archaeological research and the people it directly affects (Douglass 2020; Marshall 2002; Montgomery and Fryer 2023).
However, the facilitation of large-scale collaborations faces challenges within the current system of research evaluation. The prevailing practice of attributing credit primarily to the first author and senior (last/corresponding) author in peer-reviewed journal articles hampers the recognition of multiple contributors. This system often overlooks the valuable input of collaborators who may not fit into the traditional authorship structure but have made significant intellectual and practical contributions to the research. To truly embrace the principles of collaboration and inclusivity, there is a need for a reevaluation of the research evaluation system, allowing for proper acknowledgment of the diverse voices and contributions involved in large-scale collaborations (Ouzman 2023). Moreover, considering the checkered disciplinary history of archaeology/anthropology featuring colonial exploitation, the changes in the evaluation system alone are not enough. This further highlights the need for adopting a community-based approach to fundamentally transform the power dynamics in archaeological knowledge production and distribution (Atalay 2012; La Salle 2010; Schneider and Hayes 2020).
The Triple P Framework in Action
The implementation of the Triple P framework involves the collection of process-level (ethological) and perception-level (ethnographic) data (Figure 1), which is critical to address equifinality and multifinality (Eren et al. 2024; Hiscock 2004; Nami 2010; Premo 2010)—two daunting challenges in archaeological inference. Equifinality refers to situations in which a similar state or consequence can be achieved through different paths, whereas multifinality emerges when a similar process can lead to multiple ends. Although we cannot fully solve these two problems and accurately reconstruct the past behavioral processes simply based on material remains, context-rich experiments involving the collection of ethological and ethnographic data can help us better document an enlarged range of possible combinations of variation and draw a more informed inference (Reynolds 1999). The importance of specifying and documenting the context information of both the experiment and the phenomenon of interest has also been recently highlighted in psychological sciences (Holleman et al. 2020).

Figure 1. A schematic diagram demonstrating how to operationalize the Perception-Process-Product conceptual framework.
Product-Level Data
Traditionally speaking, the product-level data—namely, the documentation and analysis of replicas—form the sole research subject of experimental archaeology and serve as the tangible foundation for analogical inference in the interpretation of archaeological materials. These data can exist in the form of spreadsheets containing detailed technological attributes, photos and illustrations, or high-resolution 3D scans of individual artifacts or a whole assemblage. No particular modification regarding the collection procedure of product-level data is required in the context of the Triple P framework, although the definition of variables measured and the documentation techniques (models of camera/scanners, light setting, processing software version and workflow, etc.) should always be available in the relevant metadata. I also strongly recommend adopting good habits in spreadsheet data organization (Broman and Woo 2018).
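As a minimal sketch of what such metadata might look like in practice, the following Python snippet builds a machine-readable record for one scanned replica and serializes it to JSON. The field names, the identifier EXP-001, and all values are hypothetical placeholders chosen for illustration; they are not a schema proposed in this article or required by any repository.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ProductMetadata:
    """Hypothetical metadata record accompanying one documented replica."""
    artifact_id: str
    device_model: str          # camera or scanner model
    light_setting: str
    software: str              # processing software name
    software_version: str
    workflow_steps: list = field(default_factory=list)
    variable_definitions: dict = field(default_factory=dict)


# All values below are illustrative placeholders.
record = ProductMetadata(
    artifact_id="EXP-001",
    device_model="placeholder scanner model",
    light_setting="diffuse, two softboxes",
    software="placeholder processing software",
    software_version="1.0.0",
    workflow_steps=["align scans", "merge mesh", "export PLY"],
    variable_definitions={
        "max_length_mm": "maximum dimension along the technological axis",
        "platform_depth_mm": "distance from the point of percussion "
                             "to the dorsal edge of the platform",
    },
)

# Serialize to JSON so the record can travel with the spreadsheet or scan.
sidecar = json.dumps(asdict(record), indent=2, sort_keys=True)
print(sidecar)
```

Keeping the variable definitions next to the measurements means that a later reader of the spreadsheet does not have to guess how an attribute was taken.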
Process-Level Data
Although systematic behavioral coding methods widely used in the study of nonhuman animal behavior (Fragaszy and Mangalam 2018) are still largely neglected among archaeologists, attempts to reconstruct behavioral sequences involved in the manufacture of material remains are not infrequent, ranging from the well-established chaîne opératoire approach (Audouze et al. 2017; Delage 2017; Dobres 1999; Porqueddu et al. 2023; Soressi and Geneste 2011) to the more recent cognigram method. To illustrate the benefits and drawbacks of existing analytical frameworks, I will use the cognigram as an example. First systematically developed and applied in archaeological research by Haidle (2009, 2010, 2014, 2023), a cognigram is a graphical representation of the reconstructed behavior behind archaeological artifacts in chronological order of appearance (Haidle 2014), which essentially represents an abstracting process of a series of action sequences achieving a similar goal. This approach provides an elegant descriptive methodology, but it is limited by its normative and analytical orientation—meaning it cannot handle variation very well. To some extent, it describes the minimal steps to achieve a goal from the perspective of reverse engineering and reflects the analyst's own causal understanding. However, this may be biased because (1) certain causal insights in stone fracture mechanics remained opaque to academic knappers until they were revealed through controlled experiments by Dibble and his colleagues (Li et al. 2022) and (2) ethnographic studies demonstrated that expert nonacademic practitioners can have a different set of causal understanding (Harris et al. 2021).
Consequently, we need to accumulate more real-world data by recording a large number of toolmaking videos and conducting systematic ethogram analysis. With the emergence of new software platforms such as BORIS (Friard and Gamba 2016), the difficulty of coding has decreased significantly in recent years (Figure 2). Here, I use a modified version of action grammar developed by Stout et alia (2021) as an example, among multiple coding schemes featuring different research foci (Muller et al. 2023) or granularity (Cueva-Temprana et al. 2019; Mahaney 2014; Roux and David 2005). The knapping action recorded in videos can be coded following the ethogram presented in Table 1. Depending on the original research question, sequences of coded actions can then be used in further analysis, such as the measurement of the complexity of various technological systems—a classical topic in Paleolithic archaeology and the evolution of human cognition (Muller et al. 2017; Perreault et al. 2013). Unlike the traditional approaches resorting to the extraction and comparison of lithic attributes, Stout and colleagues (2021) recorded the videos of expert flintknappers reproducing Oldowan and Acheulean technologies and then manually parsed their knapping activities using action grammar, generating multiple sequences of actions. Borrowing tools from computational linguistics, they then calculated the transition probability between each action category across two technological systems, which provided an objective and quantifiable proxy for measuring technological complexity.
Another scenario of its application is the measurement of behavioral similarity across individuals (Cristino et al. 2010; Mobbs et al. 2021), which is particularly relevant in the abovementioned cultural transmission experiments. Again, given that the existing works on this topic focus mainly on the sole analysis of experimental replicas, many aspects of knapping skill-learning processes remain unclear. For example, how do different individual learning strategies (high-fidelity action copying vs. predominantly trial-and-error learning) affect the morphological variation of their final products? Or will learning behavioral conformity within a community of practice necessarily lead to homogeneity in the formation of lithic assemblages? To answer these questions, the quantitative analysis of process-level data associated with the product-level data becomes necessary. Behatrix (https://www.boris.unito.it/behatrix/; Figure 3), a sister software of BORIS, allows us to calculate the action sequence (dis)similarity between novice learners and expert demonstrators / fellow novice learners using established algorithms (for an application of analyzing play behavior similarity among gorillas, see Cordoni et al. 2022).
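The two analyses described above, transition probabilities between action categories and sequence (dis)similarity between knappers, can be illustrated with a short Python sketch. It rests on several assumptions: the action labels are hypothetical and are not the categories of Table 1, the transition probabilities are simple first-order estimates, and Levenshtein edit distance stands in for whichever algorithm Behatrix applies, which the text does not specify.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequence):
    """Estimate first-order transition probabilities P(next | current)."""
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    probabilities = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probabilities[current] = {
            action: n / total for action, n in followers.items()
        }
    return probabilities


def levenshtein(a, b):
    """Edit distance between two sequences of action labels."""
    previous = list(range(len(b) + 1))
    for i, action_a in enumerate(a, start=1):
        current = [i]
        for j, action_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (action_a != action_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def normalized_similarity(a, b):
    """Similarity in [0, 1]: 1 means identical sequences."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical coded sequences; labels are illustrative only.
expert = ["percussion", "rotate", "percussion", "flip", "percussion"]
novice = ["percussion", "percussion", "rotate", "percussion"]

print(transition_probabilities(expert))
print(levenshtein(expert, novice))
print(normalized_similarity(expert, novice))
```

Normalizing by the longer sequence keeps the similarity score comparable across knapping sessions of different lengths, although other normalizations are equally defensible.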

Figure 2. An example of coding a handaxe knapping session using the BORIS software.
Table 1. A Modified Version (Liu et al. 2024) of the Original Action Grammar Presented in Stout and Colleagues (2021).


Figure 3. The user interface of Behatrix displaying the transition probability between each action category in a handaxe knapping session.
Perception-Level Data
Direct applications of ethnographic research in experimental archaeology as a field (Reeves Flores 2012) and practices of specific technologies such as flintknapping—including contemporary US hobbyists (Whittaker 2004) and knapping practitioners in various nonindustrial societies (Arthur 2018; Stout 2002)—are far from novel. However, ethnography has never been formally recognized as a legitimate research method in mainstream experimental archaeology. Echoing the recent trends of adopting embodied cognition (Varela et al. 2017) in archaeological research (Malafouris 2013), ethnographic data and methods can reveal hidden information (e.g., intention, phenomenology) that is otherwise irretrievable. Consequently, ethnography should occupy a unique niche in experimental archaeology. Within the broader context of burgeoning interest in mixed-method research in contemporary social science (Creswell and Plano Clark 2017), this also echoes the post-positivist turn in psychology in the past decades, particularly the emphasis on the value of incorporating qualitative research (Stout 2021; Syed and McLean 2022; Weger et al. 2019).
Through participant observation, interviews, and detailed field notes, ethnography can capture the subtle nuances of perception, such as cognitive affordances (Hussain and Will 2021; Roepstorff 2008), sensory experiences (Day 2013; O'Neill and O'Sullivan 2019; Skeates and Day 2019), social interactions, and cultural meanings associated with the experimental activities (Gowlland 2019). Compared with the ethological methods, the interview questions and participant observation in ethnographic methods feature an even higher degree of freedom and rely more heavily on both the research question and ad hoc interaction. One potential application of ethnographic methods in the experimental archaeology of stone artifacts is asking knappers about the intentions of each action and seeing how they match with the results as revealed by lithic analysis of replicas, which can provide crucial contextual information that addresses the issues of equifinality and multifinality in the formation of lithic assemblages. For example, serial formation of step fracture on the debitage surface is commonly interpreted as unintentional mistakes indicative of novice knappers, whereas in some cases, researchers treat it as evidence of deliberate core rejuvenation (Akerman 1993:126). The accumulation of testimonies by participating knappers in terms of their intended outcome becomes useful in this scenario, although these materials should be examined in combination with the relevant product- and process-level data in a careful manner.
Instead of seeing intention as something abstruse or unapproachable in archaeology (David 2004; Russell 2004), the Triple P framework adopts a novel definition proposed by Quillien and German (2021:1) from the perspective of causal perception—that is, “an agent did X intentionally to the extent that X was causally dependent on how much the agent wanted X to happen (or not to happen).” In this sense, the mismatch between how different individuals perceive cause-and-effect relationships and how they are organized according to physical laws is exactly where interesting variation emerges and where ethnography becomes necessary.
Multilevel Sample and Data Curation
The comparative study and large-scale synthesis of variation data require building a centralized, open-access, and carefully curated data infrastructure (Marwick and Pilaar Birch 2018), which unfortunately still does not exist in experimental archaeology. The accessibility and availability of experimental data can foster collaboration and enhance the reproducibility and transparency of research findings, because others can verify and validate the results by examining the original data. A centralized database also promotes data preservation and long-term accessibility. Storing experimental data in a structured and organized manner safeguards valuable information from potential loss or degradation over time. This preservation ensures that the data remain accessible for future researchers, avoiding the loss of valuable insights and preventing unnecessary and costly repetitions of experiments. It also allows for the reanalysis of existing data, facilitating discoveries and insights that may not have been initially anticipated. However, it has been widely acknowledged that the reuse of archaeological data has not received enough attention among researchers in our discipline (Faniel et al. 2018; Huggett 2018; Moody et al. 2021).
Among the three dimensions of the Triple P framework, the product-level data are usually stored as spreadsheets, photos, and 3D models; the perception-level data mainly consist of audio files and their transcribed texts; and videos are the main vector of process-level data, a rather nontraditional data format in archaeological research and the one with the largest file sizes of the three. Consequently, following the FAIR (Nicholson et al. 2023; Wilkinson et al. 2016) and CARE (Carroll et al. 2020; Gupta et al. 2023) data-sharing principles, the Triple P framework recommends Databrary (Gilmore et al. 2015; Simon et al. 2015), a web-based library originally designed for developmental scientists, as the main data curation platform, where researchers can freely upload video files with no size limit along with related metadata that can connect them with different types of data within the same project. Databrary has three advantages compared with other data storage solutions: (1) no cost to researchers, (2) long-term data security monitored by a specialized maintenance team, and (3) the fostering of potential collaborations between experimental archaeologists and developmental psychologists.
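The linkage among the three data levels described above can be illustrated with a minimal Python sketch of a per-session manifest. It is only an illustration under stated assumptions: the file extensions, the `Session` class, and the `session_id` and `knapper_id` fields are hypothetical and do not correspond to any Databrary schema or interface.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to Triple P level, loosely
# following the formats named in the text (spreadsheets, photos, 3D models;
# videos; audio files and transcripts). The extensions are assumptions.
LEVEL_BY_EXTENSION = {
    ".csv": "product", ".jpg": "product", ".ply": "product",
    ".mp4": "process",
    ".wav": "perception", ".txt": "perception",
}


@dataclass
class Session:
    """One experimental session; all field names are illustrative."""
    session_id: str
    knapper_id: str
    files: list = field(default_factory=list)

    def by_level(self):
        """Group file names by Triple P level; unknown extensions are flagged."""
        groups = {"product": [], "process": [], "perception": [], "unclassified": []}
        for name in self.files:
            ext = PurePosixPath(name).suffix.lower()
            groups[LEVEL_BY_EXTENSION.get(ext, "unclassified")].append(name)
        return groups

    def missing_levels(self):
        """Return the Triple P levels for which no file has been recorded."""
        groups = self.by_level()
        return [lvl for lvl in ("perception", "process", "product") if not groups[lvl]]


# Example: a session with product- and process-level files but no interview data.
session = Session("S01", "K03", ["S01_flakes.csv", "S01_core.ply", "S01_knapping.mp4"])
print(session.missing_levels())
```

A check of this kind could be run before upload to confirm that each session carries data at all three levels, or to document deliberately which levels a given research question leaves out.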
Beyond digital data curation, an easily ignored but crucial aspect of the integrity and reliability of research in experimental archaeology is the proper long-term curation of physical specimens produced during experiments, which is particularly relevant to the product-level data. Haythorn et alia (2018) sharply pointed out that rerunning statistical analyses on a publicly available spreadsheet containing incorrect lithic projectile attribute data would be meaningless. In this case, a reexamination of the actual experimental samples becomes necessary. Moreover, it is likely that some new research questions can only be answered through direct observation and measurement of the experimental assemblages themselves rather than through data readily available in previously compiled spreadsheets (Eren et al. 2016). These possibilities therefore require that an experimental assemblage of interest be curated in a public institution with easy access and detailed contextual information (Haythorn et al. 2018).
Conclusion
By broadening the traditional data types and recording methods that revolve around experimental replicas per se, the Triple P conceptual framework allows the amplified multiscale expression of material cultural variation. It is also compatible with many theoretical orientations, ranging from behavioral archaeology (emphasis on video recording of behavioral processes) through evolutionary archaeology (emphasis on the amplification of variation) to postprocessual archaeology (emphasis on perception through ethnography). In terms of research practice, it embraces a collaborative mode of knowledge production by involving a more diverse pool of stakeholders. It should be noted that this alternative mode of knowledge production is not necessarily a bundle sale: each component is independent and can be detached according to the individual research question. The framework can thus serve as a heuristic tool to inspire readers to explore a broader range of data collection and analysis strategies. To summarize, the innovativeness, flexibility, and inclusiveness of the Triple P conceptual framework have enormous potential to redefine what can be and what should be studied by experimental archaeology as a field and thereby to contribute to a better understanding of our deep past.
Acknowledgments
I thank Dietrich Stout for his helpful comments on earlier drafts of this article, and Mark Moore and Shaya Jannati for inspiring discussions. I am also grateful to Metin Eren and two anonymous reviewers for their insightful feedback. I would also like to express my sincere gratitude to Javier Baena Preysler, who helped proofread the Spanish translation of the abstract and keywords.
Funding Statement
This study was supported by the Leakey Foundation research grant titled “Inferring skill reproduction from stone artifacts: A middle-range approach” and the International Society of Human Ethology's Owen Aldis Award project titled “Sealed in stones: The computational ethology of stone toolmaking and its implications to the evolution of cultural transmission.”
Data Availability Statement
No original data were used.
Competing Interests
The author declares none.