Corpus Flashcards

Question

Binomial Sequencing Analysis [Corpus Linguistics]

Answer 1

Utilizing concordance lines to empirically prove the fixed, most frequent word order of paired items (e.g., confirming "bread and butter" while showing zero occurrences of "butter and bread").
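
As a rough illustration of the counting behind this kind of check, the sketch below tallies both orders of a pair in a text. The function name and the sample sentences are invented for the example; a real study would run the search over a full corpus in a concordancer.

```python
import re

def binomial_counts(text, first, second):
    """Count 'first and second' versus 'second and first' in a text."""
    lowered = text.lower()
    a, b = re.escape(first.lower()), re.escape(second.lower())
    forward = len(re.findall(rf"\b{a} and {b}\b", lowered))
    reverse = len(re.findall(rf"\b{b} and {a}\b", lowered))
    return forward, reverse

# Invented sample text, standing in for concordance lines.
sample = (
    "Teaching is my bread and butter. "
    "She served bread and butter with tea. "
    "Bread and butter issues decide elections."
)
print(binomial_counts(sample, "bread", "butter"))
```

A large forward count next to a reverse count of zero is the pattern the flashcard describes.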

Answer 2

A base linguistic form that groups together all morphological inflections for frequency counting (e.g., tallying "go," "goes," "going," and "went" as a single unit).

Answer 3

The ability to generate frequency lists strictly restricted to specific grammatical categories, allowing researchers to isolate all prepositions or adverbs in a corpus.

Answer 4

A web-based platform (lextutor.ca) utilizing corpora like the BNC and COCA to facilitate vocabulary assessment, testing, and materials development.

Answer 5

Using LexTutor to automatically create interactive gap-fill exercises by selectively removing vocabulary based on corpus frequency bands (e.g., post-1000 words) or specific parts of speech.
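
The idea of blanking words by frequency band can be sketched in a few lines. This is not LexTutor's implementation; the tiny `HIGH_FREQUENCY` set is a hypothetical stand-in for a real first-1,000-words list, and the function name is invented.

```python
import re

# Hypothetical stand-in for a "first 1,000 words" band; a real list comes from a corpus.
HIGH_FREQUENCY = {"the", "a", "of", "to", "is", "was", "and", "in", "she", "her", "with"}

def make_gap_fill(text, high_frequency=HIGH_FREQUENCY):
    """Blank out every word outside the high-frequency band; return (exercise, answers)."""
    answers = []

    def blank(match):
        word = match.group(0)
        if word.lower() in high_frequency:
            return word          # keep high-frequency words visible
        answers.append(word)     # record the removed word for the answer key
        return "_____"

    exercise = re.sub(r"[A-Za-z']+", blank, text)
    return exercise, answers

exercise, answers = make_gap_fill("She filled the kettle with boiling water.")
print(exercise)
print(answers)
```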

Answer 6

To write a 2,000–3,000-word case study analyzing a learner's writing to empirically determine if they are accurately placed at the C1 CEFR level.

Answer 7

They must be systematically tagged using custom codes enclosed within angle brackets (e.g., greens ).

Answer 8

A clear key that explicitly defines what each custom error code represents.

Answer 9

To identify and evaluate what the learner does correctly, focusing on their successful command of advanced vocabulary, syntax, and pragmatic structures.

Answer 10

A 2.9-million-word subset of student exam responses used to cross-reference a learner's output against peers with similar demographic or linguistic backgrounds.

Answer 11

A web-based analytical tool that estimates the CEFR level of a submitted text by calculating metrics such as lexical diversity, density, and academic word percentage.

Answer 12

A reference database that profiles specific words, idioms, and phrases to indicate the typical CEFR proficiency level at which a learner can use them accurately.

Answer 13

A reference database that maps syntactic and grammatical structures to specific CEFR levels using descriptive "can-do" statements.

Answer 14

90% of the grade; writing a case study report profiling a learner named Joseph to determine if C1 is his correct proficiency level using corpus tools.

Answer 15

A 50-year-old from the Democratic Republic of Congo whose first language is French, placed in a C1 English class.

Answer 16

A 2,000–3,000-word case study report that utilizes screenshots of corpus tools to substantiate findings, rather than a traditional academic essay.

Answer 17

They must be enclosed in angle brackets with a custom code and a closing tag with a forward slash (e.g., error ).

Answer 18

It explicitly indicates to the corpus software exactly where the erroneous text ends so frequencies can be automatically calculated.

Answer 19

A clear, tabular key that explicitly defines what each custom error code represents (e.g., = missing article).

Answer 20

To ensure the evaluation empirically identifies what the learner does successfully (e.g., advanced syntax, pragmatics) rather than solely focusing on their mistakes.

Answer 21

The accurate use of de-lexical verbs, collocations, idiomatic language, discourse markers, or academic distancing phrases.

Answer 22

Lexical diversity, density, and the percentage of academic vocabulary and pragmatic discourse markers used.

Answer 23

It categorizes the entire text by specific parts of speech (e.g., isolating all coordinating conjunctions) to evaluate structural complexity.

Answer 24

To cross-reference and verify the specific CEFR proficiency level of individual words, idioms, or syntactic structures used by a learner.

Answer 25

To compare the frequency of a specific learner's lexical choices against a vast dataset of other learners segmented by age, first language, and CEFR level.

Answer 26

The concept that "words hunt in packs" and do not occur alone. [J.R. Firth, 1957]

Answer 27

A verb that carries little independent meaning but forms a meaningful semantic phrase when combined with a specific noun (e.g., "make an effort," "have a party").

Answer 28

A verb consisting of a main verb and one or more particles that together create a distinct, often non-literal meaning (e.g., "take after").

Answer 29

"Of course." [John Sinclair, 1991]

Answer 30

To sue someone and legally take all of their money.

Answer 31

Because making new errors, rather than repeating fossilized ones, is essential for language acquisition and provides teachers with clear instructional focus. [Liz Regan, 2003]

Answer 32

The accurate and frequent use of figurative language strings, de-lexical verbs, collocations, and idioms.

Answer 33

The duck-rabbit illusion, illustrating that it is often easier to see the "duck" (errors) than the "rabbit" (competence).

Answer 34

A 1930s–1950s behaviorist approach that compared a learner's L1 and L2 to predict difficulties based on structural similarities and differences. [Robert Lado]

Answer 35

Positive transfer. [Robert Lado]

Answer 36

Negative transfer (or interference). [Robert Lado]

Answer 37

The Audiolingual method.

Answer 38

Stephen Pit Corder.

Answer 39

Intralingual errors that are common to both first and second language acquisition (e.g., overgeneralizing the past tense to say "eated"). [Stephen Pit Corder]

Answer 40

Errors caused by the direct interference or negative transfer from the learner's first language. [Stephen Pit Corder]

Answer 41

The concept that learner language is not just an imperfect copy of the target language, but a developing, rule-governed linguistic system in its own right.

Answer 42

Sylviane Granger.

Answer 43

The systematic use of learner corpora to describe and identify what learners are successfully able to do at different proficiency levels, rather than just what they do wrong.

Answer 44

A database developed by Cambridge that identifies the specific CEFR level at which learners typically acquire certain words and their varying senses (e.g., "live in" at A1 vs. "live on" at B2). [Annette Capel]

Answer 45

A database containing over 1,200 descriptors that map specific grammatical structures and competencies to their corresponding CEFR levels. [Anne O'Keeffe and Geraldine Mark]

Answer 46

Text Inspector.

Answer 47

The Cambridge Learner Corpus.

Answer 48

The phenomenon where learners taking risks to produce more complex, higher-level language structures temporarily experience an increase in their error rate. [Jennifer Thewissen]

Answer 49

Because continued errors at advanced levels often indicate ongoing risk-taking and organic language development rather than a cessation of learning. [Diane Larsen-Freeman]

Answer 50

A dataset containing over 55 million words from 200,000 Cambridge exam scripts, featuring 32 million manually error-coded words across 143 first languages.

Answer 51

A publicly accessible subset containing nearly 3 million words that is aligned with CEFR levels but lacks manual error coding.

Answer 52

Because a learner might sit for an exam at one CEFR level (e.g., B2) but actually perform at a higher or lower true proficiency level (e.g., C1 or B1).

Answer 53

A specialized syntax used in software like Sketch Engine to conduct advanced searches using wildcards, regular expressions, and Part-of-Speech tags.

Answer 54

[word="wh.*"]

Answer 55

[word=".*ing"]

Answer 56

[tag="RB"] [tag="J.*"]
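
To see what the three queries in the preceding answers select, the sketch below imitates them in Python over a small hand-tagged sample. It is only a rough emulation, not Sketch Engine's engine: the token list, the Penn Treebank-style tags, and the `match_sequence` helper are all assumptions made for illustration. Each constraint is treated as a regular expression that must match the whole word or tag.

```python
import re

# Hypothetical hand-tagged sample (Penn Treebank-style tags).
TOKENS = [
    ("what", "WP"), ("makes", "VBZ"), ("learning", "VBG"),
    ("really", "RB"), ("hard", "JJ"), ("when", "WRB"),
    ("feedback", "NN"), ("is", "VBZ"), ("very", "RB"), ("late", "JJ"),
]

def match_sequence(tokens, pattern):
    """Return every token window matching a list of (attribute, regex) constraints."""
    index = {"word": 0, "tag": 1}
    hits = []
    for start in range(len(tokens) - len(pattern) + 1):
        window = tokens[start:start + len(pattern)]
        if all(re.fullmatch(rx, tok[index[attr]])
               for (attr, rx), tok in zip(pattern, window)):
            hits.append([tok[0] for tok in window])
    return hits

print(match_sequence(TOKENS, [("word", "wh.*")]))              # words starting with "wh"
print(match_sequence(TOKENS, [("word", ".*ing")]))             # words ending in "ing"
print(match_sequence(TOKENS, [("tag", "RB"), ("tag", "J.*")])) # adverb followed by adjective
```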

Answer 57

B1 learners rely heavily on basic intensifiers (e.g., "very," "really"), whereas C2 learners utilize greater lexical variety, hedging, and discourse organization (e.g., "especially," "moreover").

Answer 58

The systematic process of identifying and marking errors within a text using a specific bracketed syntax so corpus software can automatically tabulate them.

Answer 59

Insert angle brackets containing a chosen symbol representing the error type immediately before the erroneous text (e.g., ).

Answer 60

Insert angle brackets containing a forward slash followed by the same error symbol immediately after the erroneous text (e.g., ).

Answer 61

It explicitly indicates to the corpus software exactly where the error sequence ends, allowing for accurate automated counting and frequency analysis.
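
A minimal sketch of why the closing tag matters for automated counting: the pattern below only counts a span when an opening code is matched by a closing tag with the same code. The codes `art` and `prep` are hypothetical examples, not codes prescribed by the course, and the regular expression is a stand-in for whatever the corpus software does internally.

```python
import re
from collections import Counter

def count_error_tags(text):
    """Count spans wrapped in matching <code>...</code> tags, keyed by code."""
    pattern = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)
    return Counter(code for code, _span in pattern.findall(text))

# Hypothetical codes: "art" = article error, "prep" = preposition error.
sample = "I went <prep>in</prep> school and saw <art>a</art> sun. He is <prep>on</prep> home."
print(count_error_tags(sample))
```

An opening tag with no closing partner is not counted at all, which is one practical reason to close every tag.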

Answer 62

A comprehensive key or table that explicitly defines what each custom error code represents, ideally including an example sentence for each.

Answer 63

Base the decision on the recurring patterns observed in the specific learner's text; use granular codes if you want to zoom in on specific, frequently repeated sub-category struggles.

Answer 64

A reflective rationale explaining the analytical decisions behind your specific coding design, justified by the patterns you observed in the data.

Answer 65

The foundational base form of a word (e.g., "run") that serves as the root for all of its grammatical and morphological variants.

Answer 66

Grammatical markers or affixes added to a word to indicate tense, aspect, number, or comparison (e.g., the "-s" in "runs" or the "-er" in "happier").

Answer 67

A modal verb error.

Answer 68

The grammatical error of incorrectly connecting two fully independent clauses with only a comma, rather than separating them with a period, semicolon, or coordinating conjunction.

Answer 69

A capitalization or proper noun error.

Answer 70

A morphological mistake where the incorrect grammatical variant of a word is used, such as writing the adjective "economical" when the adverb "economically" is required.

Answer 71

A preposition error.

Answer 72

1. Scan the text for recurring patterns. 2. Design a draft coding key. 3. Systematically code the texts. 4. Compute the totals using corpus software. 5. Analyze the resulting data to draft the final report.

Answer 73

An unintentional performance-related lapse or slip where the learner possesses the underlying linguistic competence and can typically self-correct.

Answer 74

A systematic manifestation of a deficit in interlanguage competence requiring pedagogical intervention, as the learner lacks the underlying knowledge to self-correct.

Answer 75

Stephen Krashen's theory stating that students learn best in low-anxiety environments where motivation is high and errors are treated as a natural part of acquisition.

Answer 76

Failing to capitalize the first word of a sentence or a proper noun.

Answer 77

Failing to match the subject and verb in number, such as omitting the third-person singular "s".

Answer 78

Incorrectly pluralizing non-count nouns, such as writing "furnitures" or "evidences".

Answer 79

Lacking structural consistency within a series or list, such as mixing gerunds and infinitives.

Answer 80

Punctuating a dependent subordinate clause as if it were a complete, independent sentence.

Answer 81

Confusing animate pronouns (who/that) with inanimate pronouns (which).

Answer 82

Using exclusively male pronouns to refer to generic nouns, rather than using gender-inclusive forms.

Answer 83

A section reflecting on the challenges faced while designing the error coding system and justifying the chosen codes.

Answer 84

Broad introductory statements characterizing a specific error type before presenting detailed analysis.

Answer 85

Presenting quantitative data drawn from corpus software, such as tables showing error frequencies among different students.

Answer 86

Providing qualitative analysis by breaking down errors into specific patterns observed in concordance lines, evidenced by screenshots.

Answer 87

Suggesting specific instructional interventions to help the learner improve based on the corpus analysis findings.

Answer 88

Questions regarding the cognitive storage of vocabulary (the mental lexicon) and the psychological motivations of learners. [McCarthy, 2026]

Answer 89

Because it will be overwhelmingly dominated by grammatical function words (e.g., "the," "of," "and") rather than lexical content words. [McCarthy, 2026]

Answer 90

A finite list of known grammatical function words that the computer is instructed to ignore when generating a lexical frequency list. [McCarthy, 2026]
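
The effect of a stop list can be sketched as follows. The six-word `STOP_LIST` is deliberately tiny and hypothetical; real stop lists are much longer.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop list.
STOP_LIST = {"the", "of", "and", "a", "to", "in"}

def lexical_frequency_list(tokens, stop_list=STOP_LIST):
    """Frequency list that skips every token found in the stop list."""
    lowered = (tok.lower() for tok in tokens)
    return Counter(tok for tok in lowered if tok not in stop_list)

tokens = "the cat and the dog sat in the shade of the tree and the cat slept".split()
print(Counter(tokens).most_common(3))               # raw list, function words included
print(lexical_frequency_list(tokens).most_common(3))  # stop-listed words ignored
```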

Answer 91

The process of grouping words by their base form (lemma) so that inflected variants (e.g., take, took, taken) are counted as a single lexical unit. [McCarthy, 2026]
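
A minimal sketch of lemmatized counting, assuming a hand-written lookup table. The `LEMMA_OF` dictionary is hypothetical; real lemmatizers rely on large dictionaries or part-of-speech taggers.

```python
from collections import Counter

# Hypothetical lookup table mapping inflected forms to their lemma.
LEMMA_OF = {
    "take": "take", "takes": "take", "took": "take", "taken": "take", "taking": "take",
    "go": "go", "goes": "go", "going": "go", "went": "go",
}

def lemma_counts(tokens, lemma_of=LEMMA_OF):
    """Count tokens under their lemma; unknown forms count as themselves."""
    return Counter(lemma_of.get(tok.lower(), tok.lower()) for tok in tokens)

tokens = "She took the bus , he takes the train , and they went home after going out".split()
counts = lemma_counts(tokens)
print(counts["take"], counts["go"])
```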

Answer 92

A word family includes not only the base form and its inflections but also its derivations (prefixes and suffixes), such as grouping "war," "pre-war," and "post-war." [McCarthy, 2026]

Answer 93

A highly frequent verb (e.g., get, go, make, take) that possesses very little independent lexical content and instead derives its meaning from the words it collocates with. [McCarthy, 2026]

Answer 94

Words used to create and maintain interpersonal social relations, and words that form cemented multi-word chunks. [McCarthy, 2026]

Answer 95

A frequent pairing of words that maintain their associative strength even when separated by other intervening words (e.g., "blonde, beautiful, long hair"). [McCarthy, 2026]

Answer 96

A multi-word item that is firmly cemented together in a specific sequence, where altering or interrupting the sequence destroys its meaning (e.g., "by the way"). [McCarthy, 2026]

Answer 97

The percentage of a given text or corpus that is accounted for by a specific number of the most frequent words. [McCarthy, 2026]
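
The calculation can be sketched directly from the definition. The function name and the sample sentence are invented; with a real corpus, `top_n` would be something like 2,000 rather than 2.

```python
from collections import Counter

def coverage(tokens, top_n):
    """Percentage of running words accounted for by the top_n most frequent word forms."""
    counts = Counter(tok.lower() for tok in tokens)
    covered = sum(freq for _word, freq in counts.most_common(top_n))
    return 100 * covered / len(tokens)

tokens = "the cat saw the dog and the dog saw the bird".split()
print(round(coverage(tokens, 2), 1))
```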

Answer 98

Approximately 83%. [McCarthy, 2026]

Answer 99

Approximately 95%, which requires knowledge of roughly the top 10,000 words. [McCarthy, 2026]

Answer 100

Equipping students with independent vocabulary learning strategies, particularly training them to learn words in multi-word chunks rather than as single isolated units. [McCarthy, 2026]

Answer 101

Because the mental lexicon stores chunks rather than just single words, and retrieving chunks allows for instantaneous access and greater speaking fluency. [McCarthy, 2026]

Answer 102

A solid foundational understanding of corpus linguistics, which is necessary to write precise prompts for specific outputs like "keyword lists" or "collocations." [McCarthy, 2026]

Answer 103

Publishers commission books based on commercial market research and teacher expectations rather than academic linguistic research. [Burton, 2026]

Answer 104

Unlike novelists, ELT authors are commissioned by publishers to write to a specific concept under massive time pressure (often <1 year). [Burton, 2026]

Answer 105

The standard ELT 4-type system accounts for only 44% of real-world "if" clauses found in the BNC. [Gabrielatos; Burton, 2026]

Answer 106

Coursebooks teach "if + past simple" as hypothetical; however, 1/3 of corpus examples refer to actual past events (e.g., "If I offended you, I'm sorry"). [Burton, 2026]

Answer 107

Coursebooks over-rely on "since" and "for" while ignoring the highly frequent co-occurrence of the adverb "now" with the present perfect. [Shortall; Burton, 2026]

Answer 108

Coursebooks focus on "said/asked," while corpora show high frequencies of spoken introducers like "be like," "be all," or "go." [Burton, 2026]

Answer 109

The natural omission of subjects or auxiliary verbs (e.g., "[I] didn't know you used boiling water") commonly found in corpora but ignored in "clean" coursebook dialogues. [McCarthy & Carter; Burton, 2026]

Answer 110

Placing an extra element at the start of a clause for focus (e.g., "This friend of mine, her son..."), a feature common in speech but absent from materials. [McCarthy & Carter; Burton, 2026]

Answer 111

Using "which" to add a comment on a previous statement (e.g., "...which is nice"); these make up 70% of spoken relative clauses but are treated as "advanced" or secondary in books. [Tao & McCarthy; Burton, 2026]

Answer 112

Authentic conversation follows a "Question-Answer-Comment" pattern; coursebook dialogues often skip the third "Comment" part, making them sound robotic. [McCarthy & Carter; Burton, 2026]

Answer 113

Real-world transactions contain "messy" features like false starts, overlaps, and back-channeling, which are typically removed from "clean" coursebook recordings. [Gilmore; Burton, 2026]

Answer 114

Studies show as little as 1% overlap in lexical phrases across different coursebooks, suggesting authors rely on intuition rather than principled corpus data. [Koprowski; Burton, 2026]

Answer 115

Surveyed authors cited a lack of time, lack of access to software, and a perceived lack of technical expertise as reasons for not using corpora. [Burton, 2012/2026]

Answer 116

The gap between academic linguists (who suggest changes based on data) and publishers (who only change if the market demands it). [Burton, 2026]

Answer 117

Newer series like "Cambridge English Empower" use EVP/EGP data to select level-appropriate meanings of words (e.g., "just") based on actual learner performance. [Burton, 2026]

Answer 118

A lack of consensus on scope and sequence; grammar often becomes a "rag bag" of obscure items rather than a clear progression. [McCarthy, 2026]

Answer 119

After the core 2,000–3,000 words, additional words are statistically rare and provide very little additional text coverage. [McCarthy, 2026]

Answer 120

The vertical axis of choice within a set (e.g., choosing "red" instead of "blue"); these meaning contrasts dominate lower-level learning. [Halliday; McCarthy, 2026]

Answer 121

The horizontal axis of structure and combination (e.g., collocation and word order); this becomes the primary focus for advanced learners. [Halliday; McCarthy, 2026]

Answer 122

Michael Halliday’s concept that grammar and lexis are not separate but exist on a single continuum. [Halliday; McCarthy, 2026]

Answer 123

The process where a grammatical structure freezes into a routine, formulaic lexical chunk (e.g., "you know" becoming a discourse marker). [McCarthy, 2026]

Answer 124

Two-word phrases connected by "and" with a fixed, unchangeable order (e.g., "pros and cons," "wear and tear"). [McCarthy, 2026]

Answer 125

Incorrect word order, such as saying "white and black" instead of "black and white." [McCarthy, 2026]

Answer 126

Mastering the 570 word families in the AWL can increase text coverage by 10%, a massive gain compared to learning random rare words. [McCarthy, 2026]

Answer 127

The phenomenon where older, core knowledge is not lost permanently but becomes "pushed out" or less available as the brain is crowded with new, advanced items. [McCarthy, 2026]

Answer 128

Using the form to make assumptions about the present or past knowledge (e.g., "You will have heard the news") rather than projecting into the future. [McCarthy, 2026]

Answer 129

A non-standard conditional structure that uses a command to express a condition (e.g., "Go into any shop and you will see..."). [McCarthy, 2026]

Answer 130

1. Verb patterns (insist that he go); 2. Noun patterns (requirement that he wear); 3. Adjective patterns (crucial that he be). [McCarthy, 2026]

Answer 131

The process of transforming verbs or adjectives into nouns (e.g., "I decided" → "My decision"), which is statistically linked to higher grades and academic success. [Halliday; McCarthy, 2026]

Answer 132

The pairing of a modal verb with a modal adverb (e.g., "could possibly," "might well") to achieve greater stylistic sophistication in writing. [McCarthy, 2026]

Answer 133

A marketing reality where teachers reject advanced textbooks if they don't see traditional (though infrequent) items like the subjunctive when flipping through the pages. [McCarthy, 2026]

Answer 134

A universal scale (A1 to C2) that defines what learners "can do" at each language proficiency level.

Answer 135

Introducing oneself and talking about daily routines using the present simple tense and frequency adverbs.

Answer 136

Describing experiences, dreams, and plans, as well as discussing abstract topics like politics or technology.

Answer 137

Expressing complex ideas clearly in professional or academic settings and understanding nuances, slang, and jokes.

Answer 138

The English Vocabulary Profile (EVP).

Answer 139

Because its CEFR level changes depending on whether it is used as a basic noun (A1), an abstract noun (B1), a verb (B2), or an idiom (C2).

Answer 140

By generating topic-specific word lists restricted to a specific CEFR level (e.g., retrieving only C1 words related to politics).

Answer 141

Text Inspector.

Answer 142

Tokens are all the running words in a text, counting every repetition, whereas types are the distinct words, each counted once.
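
A short sketch of the distinction, assuming simple whitespace tokenization and lowercasing. The type-token ratio returned as the third value is a common lexical diversity measure added here for illustration; it is not named on the card.

```python
def type_token_counts(text):
    """Return (tokens, types, type-token ratio) for a whitespace-tokenized text."""
    tokens = text.lower().split()
    types = set(tokens)            # each distinct word form counted once
    return len(tokens), len(types), len(types) / len(tokens)

print(type_token_counts("the cat sat on the mat because the mat was warm"))
```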

Answer 143

It indicates a very basic vocabulary with low word coverage of specialized or lower-frequency terms.

Answer 144

Lexical Tutor (LexTutor).

Answer 145

Words that carry meaning, specifically nouns, verbs, adjectives, and adverbs.

Answer 146

Words that provide grammatical structure, such as prepositions, pronouns, and conjunctions.

Answer 147

The ratio of content words to the total number of words in a text.

Answer 148

It indicates a more advanced, sophisticated vocabulary compared to an over-reliance on basic function words.

Answer 149

The English Grammar Profile (EGP).

Answer 150

By finding specific grammatical constructions in the learner's text via Sketch Engine and consulting the EGP to verify what CEFR level those constructions represent.

Answer 151

Screenshots of concordance lines generated from Sketch Engine.

Answer 152

Pedagogical recommendations for the teacher on how to practically enhance the learner's competences.

Answer 153

A pedagogical and analytical shift focusing on what learners can achieve linguistically (the "glass half full") rather than focusing on learner errors.

Answer 154

The count of distinctive, unique words in a text excluding all repetitions.

Answer 155

The total number of words in a text, including every instance of repeated functional and content words.

Answer 156

The proportion of content words (nouns, verbs, adjectives, adverbs) relative to the total number of words in a sample.
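
As a rough sketch of the calculation, the code below treats every word that is not in a small function-word list as a content word. That list is hypothetical and incomplete; a part-of-speech tagger would be a more reliable way to identify nouns, verbs, adjectives, and adverbs.

```python
# Hypothetical, incomplete function-word list used as a proxy for POS tagging.
FUNCTION_WORDS = {
    "the", "a", "an", "of", "in", "on", "and", "but", "or", "to",
    "he", "she", "it", "they", "is", "was", "because", "with",
}

def lexical_density(text, function_words=FUNCTION_WORDS):
    """Share of tokens that are not in the function-word list."""
    tokens = text.lower().split()
    content = [tok for tok in tokens if tok not in function_words]
    return len(content) / len(tokens)

print(lexical_density("the tired student finished the essay in the library"))
```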

Answer 157

An online resource that identifies which specific words and meanings are typically mastered at each CEFR level.

Answer 158

A resource that maps grammatical structures and learner usage patterns to the corresponding CEFR proficiency levels.

Answer 159

A tool used to analyze learner data by providing a percentage breakdown of vocabulary across CEFR levels (A1–C2).

Answer 160

A platform used to measure lexical density and provide vocabulary profiles using tools like VP Classic.

Answer 161

A systematic planning document defining variables and specifications (e.g., L1, medium of instruction, age) for building a research corpus.

Answer 162

The measure of how much of a learner's text falls within a specific frequency benchmark, such as the BNC 2,000 most frequent words.

Answer 163

A diachronic analysis tool tracking the frequency of words or phrases across digitized books from the 1500s to 2022.

Answer 164

They must be typed sequentially, separated by commas, with no spaces between the terms.

Answer 165

Students can investigate the emergence of idioms, compare usage timelines of semantically related terms, and click specific chronological periods to reveal actual source texts.

Answer 166

A free, streamlined interface derived from Sketch Engine, designed to allow language learners to extract and analyze corpus data without the complexity of the full software suite.

Answer 167

They display target phrases within authentic sentence contexts, allowing students to independently deduce meaning and identify strict syntactical positioning.

Answer 168

Word Sketches.

Answer 169

It generates visual clusters of synonyms and near-synonyms based purely on their structural and distributional behavior within the corpus.

Answer 170

An automated, corpus-driven writing evaluation platform powered by the Cambridge Learner Corpus designed for real-time formative assessment.

Answer 171

It calculates and assigns a CEFR proficiency level to submitted text, highlights syntactic and lexical errors, and plots longitudinal progress on a visual graph.

Answer 172

The British National Corpus (BNC) and the Corpus of Contemporary American English (COCA).

Answer 173

To assess a learner's receptive vocabulary knowledge across distinct frequency bands, advancing from the first 2,000 words up to the 10,000-word level or the University Word List.

Answer 174

The Vocabulary Size Test.

Answer 175

It parses submitted text to extract and categorize multi-word units, cross-referencing them against established databases such as the Academic Collocations List and specific idiomatic inventories.

Answer 176

The study and tracking of how language, specifically the frequency and usage of words or phrases, evolves over a timeline.

Answer 177

A discovery-based approach where learners are exposed to raw data and must independently deduce the underlying linguistic rules, meanings, or patterns.

Answer 178

The lexical items that a learner can recognize and comprehend during reading or listening tasks, distinct from the vocabulary they can actively produce.

Answer 179

A diachronic analysis tool tracking the frequency of words or phrases across digitized books from the 1500s to 2022.

Answer 180

They must be typed sequentially, separated by commas, with no spaces between the terms (e.g., Albert Einstein,Sherlock Holmes).
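
The formatting rule can be expressed as a one-line helper. The function name is invented, and the sketch only builds the query string as described on this card; it does not contact any service.

```python
def ngram_query(terms):
    """Join search terms with commas, with no space after each comma."""
    return ",".join(term.strip() for term in terms)

print(ngram_query(["Albert Einstein", " Sherlock Holmes "]))
```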

Answer 181

Students can query terms from required reading and inspect the chronological timeline to reveal the exact source texts, publications, and page numbers from specific historical eras.

Answer 182

The phrase had near-zero historical frequency but spiked exponentially between 1980 and 2000, allowing students to analyze modern discourse on corruption.

Answer 183

"Sympathy" maintained consistent historical usage, whereas "empathy" emerged almost entirely post-1960.

Answer 184

"Raining cats and dogs."

Answer 185

A free, streamlined interface derived from Sketch Engine, designed for language learners to extract corpus data without complex querying syntax.

Answer 186

They display target phrases within authentic sentence contexts, allowing learners to independently deduce semantic meaning and syntactical positioning without explicit teacher definition.

Answer 187

The concordance data proved the idiom functions almost exclusively in the sentence-final position.

Answer 188

Concordance lines highlighted gendered and descriptive collocations, associating "purse" with women and specific colors, and "wallet" with men.

Answer 189

A summary of a word's collocational behavior, categorizing the specific verbs, objects, and modifiers (e.g., "short-lived" for "happiness") that typically co-occur with the target term.

Answer 190

It generates visual clusters of synonyms and near-synonyms based purely on their structural and distributional behavior within the corpus.

Answer 191

An automated, corpus-driven writing evaluation platform powered by the Cambridge Learner Corpus designed for real-time formative assessment.

Answer 192

It instantly calculates and assigns a CEFR proficiency level (A1–C2) to a submitted text, highlights errors as "suspicious words," and plots longitudinal progress on a visual graph.

Answer 193

IELTS Academic, IELTS General, B2 First certificate, Business English, and English for Healthcare.

Answer 194

Instructors can establish digital workbooks, assign timed writing tasks, and monitor individual or class-wide progression and error frequency metrics.

Answer 195

The British National Corpus (BNC) and the Corpus of Contemporary American English (COCA).

Answer 196

Receptive vocabulary knowledge across distinct frequency bands: the 2,000-, 3,000-, 5,000-, and 10,000-word levels, plus the University Word List.

Answer 197

The Vocabulary Size Test.

Answer 198

Comprehension of phrasal verbs based on empirical frequency data from the British National Corpus (BNC).

Answer 199

It parses a submitted text to extract and categorize multi-word units by cross-referencing them against established academic, structural, and idiomatic databases.

Answer 200

The Academic Word List.

Answer 201

An inventory of 505 multiword expressions (MWEs) that occur within the 5,000 most frequent word families in the BNC, specifically selected for being semantically non-transparent for L2 learners (Martinez & Schmitt, 2012).

Answer 202

The Oxford Placement Test lexicon.

Answer 203

The Cambridge and Nottingham Corpus of Discourse in English, a five-million-word spoken corpus used to study conversational language. [McCarthy & Carter]

Answer 204

The accurate use of chunks, a repertoire of small interactive words, and confluence. [McCarthy]

Answer 205

Listener monitoring, used to continuously check if the speaker and listener are on the same wavelength. [McCarthy]

Answer 206

It projects a shared worldview and assumes shared knowledge between speakers, preventing listener fatigue. [McCarthy]

Answer 207

A phrase used to cap off a list, signaling an assumed shared category without explicitly naming every item (e.g., "and things like that"). [McCarthy]

Answer 208

The specific noun phrase or clause (e.g., "bone development") provided immediately before a marker to tune the listener into the correct category. [McCarthy]

Answer 209

There is a direct statistical correlation between high fluency grades and the deployment of small interactive words like "just," "so," "then," and "actually." [Hasselgreen, 2004]

Answer 210

"Small words with big meanings." [Sinclair]

Answer 211

To politely correct a listener's assumption without causing a face-threat. [McCarthy]

Answer 212

The B2 or C1 levels. [McCarthy]

Answer 213

Confluence. [McCarthy]

Answer 214

The first words of a speaker's turn explicitly demonstrate their reaction to what they have just heard, rather than transmitting new content. [Hongyin Tao, 2003]

Answer 215

"Oh" and "Well" strongly prefer position 1, while "Basically" overwhelmingly prefers position 2. [McCarthy]

Answer 216

They refer to objects and fail to provide the vital interactive or retroactive link required to maintain conversational flow. [McCarthy]

Answer 217

The practice of using grammatically dependent items (like "which" or "if" clauses) as freestanding turns to hook directly onto the previous speaker's main clause. [McCarthy]

Answer 218

2.5 seconds. [Riga, 2003]

Answer 219

1. Noticing (via listening), 2. Input (explaining the function), and 3. Drilling/Practice. [McCarthy]

Answer 220

Because it is the type of discourse people spend the vast majority of their lives engaging in, providing a standard measure against which specialized corpora can be compared. [McCarthy]

Answer 221

64%, closely mirroring its 66% frequency in casual social conversation. [McCarthy]

Answer 222

It demonstrates that knowledge transmission relies heavily on projecting shared worlds and assumptions to ensure the speaker and listener are on the same wavelength. [McCarthy]

Answer 223

By taking a general spoken corpus and asking the software to measure the extent to which specific words are significantly more or less frequent in a target corpus. [McCarthy]
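
The card does not say which statistic the software uses. As one common option, the sketch below computes a log-likelihood keyness score for a single word from its frequency in the target corpus and in the general reference corpus. The function name and the counts are hypothetical.

```python
import math

def log_likelihood(freq_target, size_target, freq_reference, size_reference):
    """Log-likelihood keyness score for one word across two corpora."""
    total = freq_target + freq_reference
    combined_size = size_target + size_reference
    # Frequencies expected if the word were equally common in both corpora.
    expected_target = size_target * total / combined_size
    expected_reference = size_reference * total / combined_size
    score = 0.0
    if freq_target:
        score += freq_target * math.log(freq_target / expected_target)
    if freq_reference:
        score += freq_reference * math.log(freq_reference / expected_reference)
    return 2 * score

# Hypothetical counts: 120 hits in a 100,000-word target corpus,
# 40 hits in a 100,000-word reference corpus.
print(round(log_likelihood(120, 100_000, 40, 100_000), 2))
```

The score measures how strongly the two frequencies differ but not the direction. Comparing the two relative frequencies shows whether the word is over- or under-represented in the target corpus.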

Answer 224

It functions structurally as a sectional paragraph marker to signal the stages of progression in the development of knowledge. [McCarthy]

Answer 225

It highlights the unequal power dynamics and the subtle control teachers exert over academic interactions, despite the appearance of a shared conversation. [McCarthy]

Answer 226

They identify the words that do not distinguish the two corpora, revealing areas where the specialized language and the baseline are similar or identical. [McCarthy]

Answer 227

Pronouns and hesitations such as "well", "er", "I", and "you". [McCarthy]

Answer 228

"In terms of". [McCarthy]

Answer 229

It reveals that academic knowledge is structured and taught by explicitly relating ideas to one another, rather than presenting them as isolated facts. [McCarthy]

Answer 230

It is highly frequent in the social sciences and education, but considerably less frequent in the physical sciences and engineering. [McCarthy]

Answer 231

"In the sense that". [McCarthy]

Answer 232

It narrows down a general idea by explicitly focusing on a more precise, specific meaning. [McCarthy]

Answer 233

It operates strictly as a masked directive or command. [McCarthy]

Answer 234

To foster a cooperative interaction and subtly invite the student into membership within a discourse community or community of practice. [McCarthy]

Answer 235

An exemplar (or multiple exemplars) followed by a general extender or marker. [McCarthy]

Answer 236

It begins with the word "and" (e.g., "and that sort of thing"). [McCarthy]

Answer 237

It signals the assumption that the listener knows the parameters and can more or less completely fill the members of the referenced category. [McCarthy]

Answer 238

It begins with the word "or" (e.g., "or whatever"). [McCarthy]

Answer 239

It leaves the category deliberately open-ended, serving as an invitation for the listener to creatively expand the parameters. [McCarthy]

Answer 240

Consistency analysis. [McCarthy]

Answer 241

In conversation they are distributed equally, whereas in academic speaking "you" achieves much wider distribution while "I" becomes narrower and dependent on the specific event type. [McCarthy]

Answer 242

The lecturer is explicitly framing the academic concepts as belonging to the student in order to pull them into an inclusive, shared world. [McCarthy]

Answer 243

The Michigan Corpus of Academic Spoken English, a 1.7-million-word corpus developed by John Swales and his team. [McCarthy]

Answer 244

The Cambridge Limerick and Shannon corpus, containing academic data collected from a hotel management college. [McCarthy]

Answer 245

The British Academic Spoken English corpus, functioning as a parallel corpus to the American MICASE. [McCarthy]

Answer 246

The nature of the mental lexicon and the psychology of learning vocabulary, although artificial intelligence may change this limitation. (McCarthy, 2025)

Answer 247

Frequency lists, keywords (statistically significant words), collocations and chunks, coverage measures, and concordances. (McCarthy, 2025)

Answer 248

Stop lists and lemmatisation. (McCarthy, 2025)

Answer 249

A filtering process that removes specific categories of words, such as all grammar words, from corpus data before analysis. (McCarthy, 2025)
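
As a rough illustration of the filtering described above, the following sketch removes stop-listed grammar words from a token list. The stop list and the sample sentence are hypothetical; real stop lists are much longer and depend on the tool in use.

```python
# Hypothetical stop list of grammar words (an assumption for illustration only).
STOP_LIST = {"the", "a", "of", "and", "to", "in", "is"}

def apply_stop_list(tokens, stop_list=STOP_LIST):
    """Return the tokens that are not on the stop list (case-insensitive)."""
    return [t for t in tokens if t.lower() not in stop_list]

# Invented sample sentence, split on whitespace as a crude tokeniser.
tokens = "The strategy of the company is to grow in Asia".split()
content_tokens = apply_stop_list(tokens)
print(content_tokens)  # ['strategy', 'company', 'grow', 'Asia']
```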

Answer 250

The amalgamation of word-forms to include both the base form and its inflected forms, which is distinct from 'word families' that also include derivations. (McCarthy, 2025)

Answer 251

Be, do, and have. (McCarthy, 2025)

Answer 252

Words which occur with significantly high or low frequency in a given corpus when statistically compared to a reference corpus. (McCarthy, 2025)
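
One common way to run this statistical comparison is the log-likelihood keyness score. The sketch below assumes that measure (the card does not say which statistic is used), and all frequencies in the usage lines are invented. A positive score marks a word that is relatively more frequent in the target corpus, a negative score one that is relatively less frequent.

```python
import math

def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Signed log-likelihood keyness of a word in a target vs. a reference corpus."""
    total = size_target + size_ref
    combined = freq_target + freq_ref
    # Expected frequencies if the word were spread evenly across both corpora.
    expected_target = size_target * combined / total
    expected_ref = size_ref * combined / total
    ll = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    ll *= 2
    # Attach a sign: positive = overused in target, negative = underused.
    overused = freq_target / size_target >= freq_ref / size_ref
    return ll if overused else -ll

# Invented counts: 50 hits per 1,000 words in the target, 10 per 1,000 in the reference.
score = log_likelihood(50, 1000, 10, 1000)
print(round(score, 2))
```

An absolute score above 3.84 corresponds to p < 0.05 on a chi-square distribution with one degree of freedom, which is a commonly used cut-off for treating a word as key.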

Answer 253

High-frequency content words related to operational concepts, such as management, strategy, and performance. (McCarthy, 2025)

Answer 254

"Actually" is the most frequent adverb by sheer occurrence, whereas "effectively" possesses the highest statistical keyness. (McCarthy, 2025)

Answer 255

2,500 to 3,000 words. (Nation and Waring, 1997; Schmitt and Schmitt, 2014; Szudarski, 2018; McCarthy, 2023)

Answer 256

Items of unusually high frequency that are implicated in the creation and maintenance of social relations, which are often involved in chunks. (McCarthy, 2025)

Answer 257

Collocation and chunks. (McCarthy, 2025)

Answer 258

A type of word combination characterized by the frequent and expected co-occurrence of specific words, such as "business strategy" or "market share". (McCarthy, 2025)

Answer 259

Spoken Business English features a markedly high frequency of discourse markers and hedges, such as "you know" and "I mean", as well as phrasal verb collocations. (McCarthy, 2025)

Answer 260

"I don't know" is the most frequent chunk in spoken Business English, while "a lot of" is the most frequent in spoken Academic English. (McCarthy, 2025)

Answer 261

Because special vocabularies develop pragmatic specialisations and specific meanings for words and chunks that frequency lists cannot reveal. (McCarthy, 2025)

Answer 262

It has shifted from denoting physical movement to signifying progression "into the future". (McCarthy, 2025)

Answer 263

They allow for the automatic gathering of data according to defined curation parameters to rapidly assemble and grow corpora. [O'Keeffe and McCarthy, 2021]

Answer 264

Online corpus interfaces. [O'Keeffe and McCarthy, 2021]

Answer 265

Rapid-response corpora. [O'Keeffe and McCarthy, 2021]

Answer 266

Whether the rapid curation accurately reflects or refracts our shared social reality. [O'Keeffe and McCarthy, 2021]

Answer 267

Crowdsourcing. [O'Keeffe and McCarthy, 2021]

Answer 268

Characterized by the combination of various communicative modes, such as speech, body language, and text, within a single dataset. [O'Keeffe and McCarthy, 2021]

Answer 269

Reducing rich communication into one-dimensional, impoverished orthographic transcripts. [O'Keeffe and McCarthy, 2021]

Answer 270

The principles of careful sampling, corpus design, and representativeness. [O'Keeffe and McCarthy, 2021]

Answer 271

Pertaining to the traditional method of error analysis in learner corpora that isolates L1 versus L2 linguistic interference. [O'Keeffe and McCarthy, 2021]

Answer 272

To empirically test calibrated proficiency scales to demonstrate what language learners can do. [O'Keeffe and McCarthy, 2021]

Answer 273

They serve as training data for algorithms to process learner performances for automated assessment and feedback. [O'Keeffe and McCarthy, 2021]

Answer 274

Pertaining to a data cubing technique in machine learning that models learner data across three axes: the learner, their academic discipline, and change over time. [O'Keeffe and McCarthy, 2021]

Answer 275

Chatroom corpora of teacher-learner interactions. [O'Keeffe and McCarthy, 2021]

Answer 276

Pertaining to learner corpora that gather data over extended periods, requiring partnerships between SLA and CL experts to fully utilize. [O'Keeffe and McCarthy, 2021]

Answer 277

Usage-based perspectives. [O'Keeffe and McCarthy, 2021]

Answer 278

The manual, line-by-line indexing of words and their citations by biblical scholars. [O'Keeffe and McCarthy, 2021]

Answer 279

An early automated concordancing format developed by library and information scientists in the 1970s. [O'Keeffe and McCarthy, 2021]

Answer 280

The need to collect reliable, empirical samples of language usage to compile dictionaries and grammatical descriptions. [O'Keeffe and McCarthy, 2021]

Answer 281

They allow for the automatic gathering of data according to defined curation parameters to rapidly assemble and grow corpora. [O'Keeffe and McCarthy, 2021]

Answer 282

Online corpus interfaces. [O'Keeffe and McCarthy, 2021]

Answer 283

Rapid-response corpora. [O'Keeffe and McCarthy, 2021]

Answer 284

Whether the rapid curation accurately reflects or refracts our shared social reality. [O'Keeffe and McCarthy, 2021]

Answer 285

Crowdsourcing. [O'Keeffe and McCarthy, 2021]

Answer 286

Characterized by the combination of various communicative modes, such as speech, body language, and text, within a single dataset. [O'Keeffe and McCarthy, 2021]

Answer 287

Reducing rich communication into one-dimensional, impoverished orthographic transcripts. [O'Keeffe and McCarthy, 2021]

Answer 288

The principles of careful sampling, corpus design, and representativeness. [O'Keeffe and McCarthy, 2021]

Answer 289

Pertaining to the traditional method of error analysis in learner corpora that isolates L1 versus L2 linguistic interference. [O'Keeffe and McCarthy, 2021]

Answer 290

To empirically test calibrated proficiency scales to demonstrate what language learners can do. [O'Keeffe and McCarthy, 2021]

Answer 291

They serve as training data for algorithms to process learner performances for automated assessment and feedback. [O'Keeffe and McCarthy, 2021]

Answer 292

Pertaining to a data cubing technique in machine learning that models learner data across three axes: the learner, their academic discipline, and change over time. [O'Keeffe and McCarthy, 2021]

Answer 293

Chatroom corpora of teacher-learner interactions. [O'Keeffe and McCarthy, 2021]

Answer 294

Pertaining to learner corpora that gather data over extended periods, requiring partnerships between SLA and CL experts to fully utilize. [O'Keeffe and McCarthy, 2021]

Answer 295

Usage-based perspectives. [O'Keeffe and McCarthy, 2021]

Answer 296

The manual, line-by-line indexing of words and their citations by biblical scholars. [O'Keeffe and McCarthy, 2021]

Answer 297

An early automated concordancing format developed by library and information scientists in the 1970s. [O'Keeffe and McCarthy, 2021]

Answer 298

The need to collect reliable, empirical samples of language usage to compile dictionaries and grammatical descriptions. [O'Keeffe and McCarthy, 2021]

Answer 299

An established, language-neutral benchmark for language competence comprising six levels (A1 to C2) defined by intuitively derived "can-do statements". (O’Keeffe & Mark, 2017)

Answer 300

Its descriptors are generic, intuitively derived, and lack empirical evidence, leading to arbitrary and inconsistent interpretations. (O’Keeffe & Mark, 2017)

Answer 301

To provide empirical detail about learner English competence, supplementing and adapting the generic, language-neutral CEFR. (O’Keeffe & Mark, 2017)

Answer 302

A database of over 1,200 empirically derived statements detailing what learners can do with English grammar at each CEFR level, based on the Cambridge Learner Corpus. (O’Keeffe & Mark, 2017)

Answer 303

They posit that frequently occurring form-meaning pairings become entrenched in a learner's mind through repeated use and experience. (O’Keeffe & Mark, 2017)

Answer 304

A phenomenon where advanced learners attempt to use more complex language, which inherently increases their risk of error and hinders measurable improvements in accuracy. (O’Keeffe & Mark, 2017)

Answer 305

They may rely on language they are comfortable with or engage in exam display to exhibit knowledge of structures perceived as complex. (O’Keeffe & Mark, 2017)

Answer 306

There is rarely an L1 corpus that perfectly matches the learner corpus in terms of representativeness, task constraints, and context. (O’Keeffe & Mark, 2017)

Answer 307

Performance-based "can-do statements" derived by experts to define the minimum requirements for each stage within a proficiency framework. (O’Keeffe & Mark, 2017)

Answer 308

A stabilization phase in language development, typically between B2 and C2 levels, where error rates stop decreasing significantly despite continued learning. (O’Keeffe & Mark, 2017)

Answer 309

The state of a form-meaning pairing becoming firmly established as grammatical knowledge in a learner's mind due to frequency of use. (O’Keeffe & Mark, 2017)

Answer 310

A task effect where a learner deliberately uses specific linguistic features to demonstrate their knowledge during a test. (O’Keeffe & Mark, 2017)

Answer 311

A theoretical, homogeneous L1 target based on a consensus of native-speaker success, which rarely exists in reality. (O’Keeffe & Mark, 2017)

Answer 312

The developing linguistic system of an L2 learner, which can be observed and compared across different proficiency levels. (O’Keeffe & Mark, 2017)

Answer 313

The development of grammar competence across proficiency levels, rather than tracking patterns of error decline, plateau, or regression. (O’Keeffe & Mark, 2017)

Answer 314

Knowing more lexis allows learners to expand the repertoire of grammatical and pragmatic uses for a specific syntactic form. (O’Keeffe & Mark, 2017)

Answer 315

At A1, the syntactic pattern is stable but used with a limited range of verbs; by B1, the identical morphosyntactic pattern is applied to a wide range of verbs. (O’Keeffe & Mark, 2017)

Answer 316

The phenomenon where a single grammatical pattern is progressively deployed across a wider range of meanings and for greater pragmatic effect as a learner's lexical repertoire grows. (O’Keeffe & Mark, 2017)

Answer 317

Just as learners acquire multiple meanings for a single vocabulary word over time, they acquire multiple pragmatic and semantic functions for a single, stabilized grammatical structure. (O’Keeffe & Mark, 2017)

Answer 318

At A1, the pattern is restricted to simple combinations like "very + adjective," whereas at C1, it utilizes advanced lexis to add pragmatic force and function as a focusing device. (O’Keeffe & Mark, 2017)

Answer 319

They become increasingly aware of the collocational and colligational limitations of the pattern, understanding which specific lexical items are primed to fill syntactic slots. (O’Keeffe & Mark, 2017)

Answer 320

The stage at which a grammatical form reaches syntactic stabilization, typically at lower proficiency levels, before later being deployed with greater semantic complexity. (O’Keeffe & Mark, 2017)

Answer 321

Rather than indicating an end to learning, the stabilization of form at A and B levels serves as a foundational baseline for deploying those forms with greater meaning complexity and dexterity at higher levels. (O’Keeffe & Mark, 2017)

Answer 322

It is in constant development and is never fully complete, continually evolving in complexity even after specific syntactic forms appear to have stabilized. (O’Keeffe & Mark, 2017)

Answer 323

The ability to skillfully manipulate an acquired syntactic form for subtlety of meaning, focus, or social functions. (O’Keeffe & Mark, 2017)

Answer 324

The form is deployed for pragmatic effect, utilizing verbs like "wondered" or "wanted" as politeness structures for requesting or thanking rather than indicating past time. (O’Keeffe & Mark, 2017)

Answer 325

The data is strictly limited to written examinations, and proficiency calibration relies exclusively on the assessment criteria of a single examination board. (O’Keeffe & Mark, 2017)

Answer 326

To expose anomalies in competence calibration, compare findings across different testing frameworks, and examine learner competence beyond the constraints of written exams. (O’Keeffe & Mark, 2017)

Answer 327

The axis of linguistic choice where a learner selects specific, contextually primed lexical items to fill a stable syntactic slot. (O’Keeffe & Mark, 2017)

Answer 328

Expressing plans or intentions, and making predictions based on present evidence. [Burton, 2021]

Answer 329

The rule asserting that speakers typically avoid using the verb "go", the verb "come", or verbs of movement generally, immediately following the future marker "going to". [Burton, 2021]

Answer 330

They utilize hedging language, framing rules in relative terms as tendencies (e.g., stating that speakers "tend not to" or "avoid" a form rather than strictly prohibiting it). [Burton, 2021]

Answer 331

It implies that if native speakers use a specific structure infrequently, learners should also avoid it because high frequency of that structure would be considered erroneous. [Burton, 2021]

Answer 332

It first appeared explicitly in Harold Palmer's 1924 pedagogical text, "A Grammar of Spoken English on a Strictly Phonetic Basis". [Burton, 2021]

Answer 333

The British National Corpus (BNC) and the Corpus of Contemporary American English (COCA). [Burton, 2021]

Answer 334

Because their overall frequencies as lemmas in the BNC are highly similar to "go" and "come", providing a baseline to test if "go" and "come" appear unexpectedly less often after "going to". [Burton, 2021]

Answer 335

Both strings are highly attested; "going to go" is the eighth most frequent "going to + infinitive" chunk in the BNC, directly refuting the claim that the combination is avoided. [Burton, 2021]

Answer 336

Analyzing "will go" versus "will say" provided a baseline for how often speakers talk about future travel versus future speech, demonstrating that the lower frequency of "going to go" compared to "going to get" contradicts expected usage patterns if the rule were true. [Burton, 2021]

Answer 337

The calculations yielded neutral or positive scores (e.g., Odds Ratio above 1), providing no statistical evidence that the structure "going to" repels or avoids the verbs "go" and "come". [Burton, 2021]

Answer 338

The rule is part of an unexamined "canon" of ELT grammar points; coursebook writers often lack the time or training to consult corpora, and publishers hesitate to update descriptions for fear of alienating traditional teachers. [Burton, 2021]

Answer 339

Grammatical descriptions designed specifically for second language learners, which frequently compromise absolute descriptive truth in favor of clarity, simplicity, or conceptual parsimony. [Burton, 2021]

Answer 340

A large, structured database of machine-readable spoken and written text used by researchers to identify empirical linguistic frequencies and usage patterns. [Burton, 2021]

Answer 341

The dictionary base form of a word that encompasses all its inflected variations (e.g., the lemma "know" includes "know," "knows," "knowing," "knew," and "known"). [Burton, 2021]
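
A minimal sketch of lemma-based frequency counting, using the "know" example above. The lookup table is a hand-built assumption that covers only this one lemma; corpus tools rely on full lemma lists or a lemmatiser.

```python
from collections import Counter

# Hypothetical word-form-to-lemma lookup covering a single lemma.
LEMMA_OF = {
    "know": "know", "knows": "know", "knowing": "know",
    "knew": "know", "known": "know",
}

def lemma_frequencies(tokens, lemma_of=LEMMA_OF):
    """Count tokens by lemma; forms not in the lookup count as themselves."""
    return Counter(lemma_of.get(t.lower(), t.lower()) for t in tokens)

# Invented sample sentence, split on whitespace as a crude tokeniser.
tokens = "She knew what he knows and I know it is known".split()
freqs = lemma_frequencies(tokens)
print(freqs["know"])  # 4
```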

Answer 342

A sequential string of words that operate together as a single grammatical or semantic entity, such as "going to go". [Burton, 2021]
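
To show how such chunks might be counted, the sketch below tallies every three-word string that starts with "going to". The sample text is invented, and the function is a simplification: it does not check that the third word is an infinitive, so a string like "going to London" would also be counted.

```python
from collections import Counter

def going_to_chunks(tokens):
    """Count three-word strings of the form 'going to X' in a token list."""
    lowered = [t.lower() for t in tokens]
    counts = Counter()
    for i in range(len(lowered) - 2):
        if lowered[i] == "going" and lowered[i + 1] == "to":
            counts[" ".join(lowered[i:i + 3])] += 1
    return counts

# Invented sample text, split on whitespace as a crude tokeniser.
text = ("I am going to go home and she is going to get coffee "
        "then we are going to go out")
chunk_counts = going_to_chunks(text.split())
print(chunk_counts.most_common())  # [('going to go', 2), ('going to get', 1)]
```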

Answer 343

A linguistic device used to express a lack of categorical commitment to the truth of a statement, often seen in rules phrased as "we usually avoid" rather than "we never use." [Burton, 2021]

Answer 344

A statistical measure of association that calculates whether two linguistic elements co-occur more or less often than would be expected by chance; a score below zero indicates the elements repel each other. [Burton, 2021]
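
The description above matches a mutual-information-type score, so the sketch below assumes pointwise mutual information: the base-2 logarithm of observed over expected co-occurrence. That choice of measure is an assumption, since the card does not name it here, and the counts are invented.

```python
import math

def mutual_information(pair_freq, freq_x, freq_y, corpus_size):
    """Pointwise mutual information (base 2) for a word pair.

    pair_freq must be greater than zero, because log2(0) is undefined.
    """
    # Co-occurrences expected by chance if x and y were independent.
    expected = freq_x * freq_y / corpus_size
    return math.log2(pair_freq / expected)

# Invented counts: the pair occurs 10 times, but chance predicts only 1.
attracted = mutual_information(10, 100, 100, 10_000)
# Invented counts: the pair occurs once, but chance predicts 10.
repelled = mutual_information(1, 1_000, 1_000, 100_000)
print(round(attracted, 2), round(repelled, 2))  # 3.32 -3.32
```

A score of zero means the pair co-occurs exactly as often as chance predicts.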

Answer 345

A measure of association between two events; in corpus statistics, a score below 1 indicates that two linguistic features are observed together less frequently than expected, suggesting avoidance. [Burton, 2021]
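
A worked sketch of the odds ratio for a 2x2 contingency table. The cell layout and every count are hypothetical and are not figures from the BNC or COCA.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a: target verb after "going to"     b: other verbs after "going to"
    c: target verb in other contexts    d: other verbs in other contexts
    All four cells are assumed to be non-zero.
    """
    return (a * d) / (b * c)

# Invented counts in which the verb is attracted to "going to".
attracted = odds_ratio(20, 80, 10, 90)
# Invented counts in which the verb is avoided after "going to".
avoided = odds_ratio(5, 95, 20, 80)
print(attracted, round(avoided, 2))  # 2.25 0.21
```

A result of exactly 1 means the verb is as likely after "going to" as anywhere else.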

Answer 346

A well-established, collective agreement within the language teaching profession regarding which specific grammatical structures must be taught, resulting in rules being repeated uncritically across materials. [Burton, 2021]

Answer 347

Approximately 46 percent, as the traditional system only accounts for 54 percent of actual usage. [Burton, 2022]

Answer 348

It offers a "pedagogic convenience" because the four conditionals act as discrete teaching points that fit easily into a structural syllabus and are simple to test. [Burton, 2022]

Answer 349

It abandons treating the conditional sentence as a single unified structure and focuses primarily on the verb form in the if-clause. [Burton, 2022]

Answer 350

Because tense choice in the main clause functions the same as in other contexts, meaning learners can simply apply existing knowledge of modal verbs and future forms rather than learning them as part of a conditional pair. [Burton, 2022]

Answer 351

The term "zero conditional" only appeared relatively recently, coined likely because the numbers 1 to 3 were already assigned, and was absent from major studies of conditionals even in the late 1980s and early 1990s. [Burton, 2022]

Answer 352

It appears to have originated in W.S. Allen's 1947 learner's grammar, "Living English Structure". [Burton, 2022]

Answer 353

It uses the EGP's empirical data on learner competence at each CEFR level to organize conditional instruction into "core" and "non-core" information for multi-level syllabuses. [Burton, 2022]

Answer 354

Core information represents the basic structural knowledge and earliest competences demonstrated by learners, while non-core information includes variations and expanding repertoires typically acquired at higher proficiency levels. [Burton, 2022]

Answer 355

He categorized them using a four-way matrix distinguishing between past and non-past reference, and real and unreal (counterfactual) situations. [Burton, 2022]

Answer 356

The "real, past" conditional (e.g., "If it rained, the streets always flooded"), which Gabrielatos found to make up over one-third of the uses of the past simple in if-clauses. [Burton, 2022]

Answer 357

The core use of a present tense in the if-clause to refer to actions or states in the present or future. [Burton, 2022]

Answer 358

The core use of "if" plus the past simple to talk about repeated events in the past or events that may or may not have occurred on a specific past occasion. [Burton, 2022]

Answer 359

The core use of the past simple or past continuous in the if-clause to talk about the hypothetical present or future. [Burton, 2022]

Answer 360

The core use of the past perfect (simple or continuous) in the if-clause, accompanied by "would have" in the main clause. [Burton, 2022]

Answer 361

They are very infrequent, and by shifting the grammatical focus to individual clauses rather than the whole sentence, mixed structures no longer require a separate, dedicated analysis for learners to produce them. [Burton, 2022]

Answer 362

It accounts for 86 percent of the data, which increases to 89 percent if specific modal and continuous forms are categorized under the past simple or continuous. [Burton, 2022]

Answer 363

A categorization used in semantics and logic that conflates the traditional zero and first conditionals. [Burton, 2022]

Answer 364

A categorization used in semantics and logic that corresponds to the traditional second and third conditionals, describing unreal or hypothetical situations. [Burton, 2022]

Answer 365

Sentences where the if-clause shows the relevance of the main clause rather than setting up a condition for its truth, a form rarely covered in ELT. [Burton, 2022]

Answer 366

A philosophical term synonymous with relevance conditionals, derived from J. L. Austin's example: "There are biscuits on the sideboard if you want them". [Burton, 2022]

Answer 367

A database of over 1,000 grammar competency statements mapped to CEFR levels based on an analysis of the Cambridge Learner Corpus. [Burton, 2022]

(395 cards)