Introduction
Investigating speech fluency has, for a long time, been at the core of second language (L2) studies, as fluency is believed to epitomise successful acquisition of L2, characterise effective communication, elucidate the complex process of acquisition, and predict L2 speakers' proficiency. The significance attributed to fluency in these areas explains the research attention paid to it over the past decades. An important area of development in this regard is L2 assessment, in which fluency is recognised as a key underlying construct of spoken language ability by international language tests (e.g., IELTS, TEEP, APTIS) and language benchmarks (e.g., CEFR). Many high-stakes tests of English and other languages include fluency in their rating scales, with the earliest on record tracing back to the 1930s – the College Board's English Competence Examination (1930) in America. Including fluency as a fundamental aspect of speaking ability in the rating scales, rating descriptors, and rater training materials, either as an independent criterion or combined with others (e.g., delivery), has become common practice in language testing over the past decades. What has made assessment of fluency even more appealing to researchers and test providers in recent years is the objectivity and reliability of its measurement and its compatibility with the technological developments in automated assessment of speaking. Fluency is now largely recognised as a construct that can be efficiently and reliably assessed in automated assessment of spoken language ability and used to predict proficiency (de Jong, 2018*; Ginther et al., 2010*; Kang & Johnson, 2021*; Tavakoli et al., 2023).
Fluency is commonly regarded as a complex and multidimensional construct, often reported as difficult to define with a degree of openness to interpretation in assessment contexts (de Jong, 2018*). Lennon (1990*) provided one of the earliest L2 definitions of fluency, considering it as the ‘rapid, smooth, accurate, lucid, and efficient translation of thought or communicative intention into language’ (Lennon, 1990, p. 26). Since its publication, this definition has been widely cited and has become the basis for other conceptualisations of fluency to emerge. Lennon's (1990*) work was also pioneering as it distinguished between a broad versus narrow sense of fluency with the former referring to the general concept of proficiency and the latter characterising the speakers’ fluidity of speech. A decade later, Koponen and Riggenbach (2000, p. 6) offered their own interpretation of fluency as ‘flow, continuity, automaticity, or smoothness of speech’. What these definitions have in common is that flow and fluidity are central to fluency and largely embody efficiency of language processing. Segalowitz's (2010*) work was a turning point in the process of developing a better understanding of the construct of fluency as it provided a more overarching and structured approach to its conceptualisation and examination. In his triadic model, Segalowitz (2010*) argued that fluency should be understood as a multidimensional construct with at least three distinct but interrelated aspects: cognitive fluency (i.e., efficiency of the operations underlying speech production), utterance fluency (i.e., the measurable aspects of speech performance such as speed and silence), and perceived fluency (i.e., inferences listeners make based on their perceptions of the speaker's fluidity of speech). The relationship between the three aspects has been examined in the literature and the findings, so far, provide an in-depth understanding of the three aspects and how they interact with one another during the speech production processes.
To complement the predominantly cognitive perspective on fluency, more recently researchers have emphasised the significance of conceptualising fluency from a sociolinguistic and interactional perspective (Segalowitz, 2010*, 2016*). Tavakoli and Wright (2020, p. 3), for example, underlined the social and interactional nature of fluency, in both L1 and L2, and maintained that a more complete understanding of fluency will only be achieved when it is examined in relation to ‘the context, purpose and audience’. They argued that a speaker considered fluent in one context addressing a specific audience may not be as fluent in another context, discussing a different topic, or interacting with a different audience. Similarly, speakers may project different fluency behaviour when speaking for different purposes (e.g., bad news is usually delivered more slowly than good news). This new perspective is being adopted in recent studies (e.g., Morrison & Tavakoli, 2023), offering emerging evidence about the interactional and sociolinguistic dimensions of fluency.
The earliest L2 studies in this timeline that have aimed to develop a systematic and objective framework for measuring fluency date back to the 1980s (e.g., Raupach, 1980*). In such studies, a native speaker baseline was usually considered and measures such as number of words or pauses per minute were calculated. Considerable developments have emerged since then. Skehan (2003*) and Tavakoli and Skehan (2005*) provided evidence that utterance fluency consists of three distinct factors: speed, breakdown, and repair. Other studies (Suzuki, Kormos & Uchihara, 2020*) have supported the three-factor model and argued that speed, breakdown, and repair constitute the underlying construct, consolidating the validity of the triadic approach to measuring fluency. Adopting this measurement framework, other researchers have complemented it by including more nuanced measures. Skehan (2009*) and Kahng (2014*), for example, have made a distinction between pure (e.g., speed only) and composite (e.g., speed plus pauses) measures; de Jong and Bosker (2013*) have shown that a threshold of 250–300 milliseconds (ms) is optimal for measuring the number of pauses when examining proficiency; and Hunter (2017) has argued that pauses should be examined in terms of duration, frequency, location, and character (e.g., filled or unfilled). Tavakoli's (2011*) findings highlighting the significance of pause location in distinguishing L1 from L2 speech were followed by a growing trend in fluency studies that systematically examined the location of pauses (e.g., final vs mid-clause position).
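The measures described above can be illustrated with a minimal sketch. It is not drawn from any of the cited studies: the `Interval` type, the `fluency_measures` function, the syllable-based rates, and the treatment of sub-threshold silences as speaking time are all assumptions made for illustration. It covers a composite speed measure (speech rate), a pure speed measure (articulation rate), and two breakdown measures based on a 250 ms silent-pause threshold; repair measures are omitted because they require annotated disfluencies.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """One labelled stretch of a recording (hypothetical annotation format)."""
    start: float  # seconds
    end: float    # seconds
    label: str    # "speech" or "silence"


def fluency_measures(intervals, n_syllables, pause_threshold=0.25):
    """Compute illustrative speed and breakdown measures.

    Assumption: silences shorter than pause_threshold (in seconds) are not
    counted as pauses and are treated as part of speaking time.
    """
    total_time = sum(i.end - i.start for i in intervals)
    pauses = [
        i.end - i.start
        for i in intervals
        if i.label == "silence" and (i.end - i.start) >= pause_threshold
    ]
    pause_time = sum(pauses)
    speaking_time = total_time - pause_time
    return {
        # Composite measure: syllables per minute of total time, pauses included.
        "speech_rate": n_syllables / total_time * 60,
        # Pure speed measure: syllables per minute of speaking time only.
        "articulation_rate": n_syllables / speaking_time * 60,
        # Breakdown measures: frequency and mean length of silent pauses.
        "pauses_per_minute": len(pauses) / total_time * 60,
        "mean_pause_duration": pause_time / len(pauses) if pauses else 0.0,
    }


# Invented six-second sample containing 22 syllables.
sample = [
    Interval(0.0, 2.0, "speech"),
    Interval(2.0, 2.5, "silence"),      # 500 ms: counted as a pause
    Interval(2.5, 4.5, "speech"),
    Interval(4.5, 4.625, "silence"),    # 125 ms: below the threshold
    Interval(4.625, 6.0, "speech"),
]
measures = fluency_measures(sample, n_syllables=22)
```

Raising `pause_threshold` to 0.3 leaves this sample unchanged, since its only counted pause lasts 500 ms; with real data the choice of threshold alters both the pause count and the articulation rate.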
The development of technology has had two main influences on the assessment of fluency. First, the use of digital technology has changed the way fluency is measured. While earlier studies measured fluency more manually (e.g., using a watch or chronometer to measure pauses), often in longer time units (e.g., one second), the introduction of digital technology in the 1990s (e.g., GoldWave Digital Audio Editor, 2024) made it possible to measure the temporal aspects of fluency more objectively, in smaller units, and with more precision. Subsequently, the free availability of technical software specifically developed to analyse speech (e.g., PRAAT; Boersma & Weenink, 2013) further helped spread the use of this technology. Second, the rapid development of the use of Artificial Intelligence (e.g., speech recognition and auto scoring technology) has changed assessment of fluency in automated assessment of speaking ability. Using such technological developments, some key test providers (e.g., Pearson Test of English) have started measuring fluency automatically or using it to predict proficiency for other aspects of speech (e.g., comprehensibility).
Since the 1980s, L2 research has additionally made remarkable progress in understanding a range of factors that affect L2 fluency. Studies in task-based language teaching (TBLT) have been particularly important in delineating the impact of task design, task characteristics, and performance conditions on fluency (Foster & Skehan, 1996; Michel, 2011). As can be seen in the timeline below, the findings of cognitive and psycholinguistic research in fluency have also helped foster a more in-depth understanding of the relationship between fluency and processing and production demands. Factors such as L1 fluency, personal style, social and pragmatic aspects of fluency, and task type are now recognised as variables with a significant impact on L2 fluency.
The complex nature of fluency and the range of factors affecting it have made its assessment difficult. Tavakoli and Wright (2020) noted the disparity between the rating descriptors, rating criteria, and the common practice in the assessment of fluency in different high-stakes tests. Despite the widespread agreement about including fluency as a key criterion in the assessment of speaking ability, little agreement is observed about how fluency is characterised in the tests’ rating scales and descriptors or how it is measured. Tavakoli and Wright (2020, p. 110) reported that the rating descriptors in these tests lack an unequivocal and specific definition to the extent that they can lead to ‘a degree of personal interpretation of fluency’. In addition, rating descriptors are often not based on research evidence (Fulcher, 1996*). What makes assessment of fluency even more difficult is the paucity of research evidence about the extent to which fluency is expected to develop in relation to proficiency. So far there is little research in this area to show whether a linear relationship is expected between fluency and proficiency, and whether the different aspects of fluency (speed, breakdown, repair) develop consistently as L2 ability progresses. Strikingly, most language testing fluency descriptors assume fluency develops as a whole construct (i.e., in its different aspects of speed, breakdown, and repair) as proficiency develops, but this assumption is not supported by solid empirical evidence (e.g., see Tavakoli et al., 2020*). Previous research has reported that most rating scales and rating descriptors are not based on empirical evidence (de Jong, 2018*; Fulcher, 1996*) and, as can be seen below, there are few studies that have examined fluency in a language testing context or made a contribution to the development of rating descriptors and rating scales.
The purpose of this article is to provide a timeline of studies (1979–2022) that have aimed at helping develop an accurate, evidence-based, and reliable understanding of fluency and its measurement and assessment. The timeline demonstrates a selection of seminal work and primary studies by both established and emerging researchers that have made an impactful contribution to the development of the current fluency frameworks (i.e., conceptual, measurement, and assessment frameworks) over the past four decades.
The timeline presented below has categorised these studies based on the following themes:
A. Understanding and defining the construct of fluency
1. Understanding the construct of fluency
2. Cognitive perspectives to conceptualising fluency
3. Interactional perspectives to conceptualising fluency
4. Emerging models of fluency

B. Fluency in language assessment
1. Fluency in international tests of L2 proficiency
2. Developing fluency rating scales, rating descriptors, rater training
3. Relationship between fluency and levels of proficiency
4. Automated assessment of fluency in international tests of English

C. Factors affecting L2 fluency
1. Individual speaker factors (e.g., personal styles)
2. Cross-linguistic factors
3. Social and contextual factors (e.g., study abroad)
4. Task related factors
5. Fluency in relation to other aspects of linguistic knowledge (e.g., lexis)

D. Measuring fluency
1. Subjective approaches to measuring fluency and perceived fluency
2. Objective approaches to measuring utterance fluency
3. Technological advancement in assessing L2 fluency

Parvaneh Tavakoli is Professor of Applied Linguistics at the University of Reading. Parvaneh's main research interest lies in the interface of second language acquisition, language teaching, and language testing. She is specifically interested in task-based language teaching, task design, and development of oral fluency in second language acquisition. Parvaneh has led several international research projects investigating second language performance, acquisition, assessment, and policy in different contexts. She has disseminated her research in the form of articles in prestigious peer-reviewed journals (e.g., Modern Language Journal, SSLA, and Language Learning), policy reports (e.g., Report to Welsh Government), and books by key publishers (e.g., Cambridge University Press and TESOL Press). Her latest monograph ‘Comprehensibility in Language Testing’ has been published Open Access by Equinox.