What are running records? The reading assessment being phased out

A definition you can quote

A running record is a one-on-one oral reading assessment in which a teacher listens to a student read a leveled text aloud and codes every word read — correct reads, substitutions, omissions, insertions, repetitions, and self-corrections — using a standardized notation. The teacher then computes an accuracy rate, a self-correction ratio, and an MSV (Meaning/Syntax/Visual) breakdown of the errors.

Developed by New Zealand literacy researcher Marie Clay in the 1960s, the running record became the central diagnostic instrument of Reading Recovery and a near-universal practice in balanced-literacy and Fountas & Pinnell Guided Reading classrooms. It is now being actively phased out in Science-of-Reading-aligned districts because the MSV framework it depends on conflicts with cognitive-science findings about how skilled reading works.

How a running record is taken

The procedure is consistent across implementations:

  1. The teacher selects a passage at the student’s expected instructional level (often a Fountas & Pinnell or Reading Recovery leveled text).
  2. The student reads the passage aloud while the teacher sits beside them with a copy of the text.
  3. The teacher codes each word in real time using standard marks:
    • Check mark above a word = read correctly.
    • Word written above the text = substitution (what the student said).
    • A line through the word = omission.
    • Caret with an inserted word = insertion.
    • “SC” = self-correction (student noticed the error and fixed it).
    • “R” = repetition (re-reading the same word or phrase).
  4. After the reading, the teacher counts errors and calculates:
    • Accuracy rate = (words read correctly / total words) × 100.
    • Self-correction ratio = (errors + self-corrections) / self-corrections, reported as 1:N (one self-correction for every N errors and self-corrections combined).
    • MSV breakdown = the cueing-system code applied to each substitution.

Accuracy thresholds drive placement: 95%+ is independent level, 90-94% is instructional level, and below 90% is frustration level.
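The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not part of any published running-record protocol: the class and function names are hypothetical, the sample counts are invented, and it assumes that "words read correctly" means total words minus uncorrected errors (self-corrections are not counted as errors).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunningRecordTally:
    """Counts taken from one coded reading (hypothetical structure)."""
    total_words: int       # words in the passage
    errors: int            # uncorrected errors
    self_corrections: int  # errors the student fixed on their own


def accuracy_rate(t: RunningRecordTally) -> float:
    """Percent of words read correctly: (total - errors) / total * 100."""
    if t.total_words <= 0:
        raise ValueError("total_words must be positive")
    return (t.total_words - t.errors) / t.total_words * 100


def self_correction_ratio(t: RunningRecordTally) -> Optional[float]:
    """N in the 1:N ratio: (errors + self-corrections) / self-corrections."""
    if t.self_corrections == 0:
        return None  # ratio is undefined when nothing was self-corrected
    return (t.errors + t.self_corrections) / t.self_corrections


def placement_level(accuracy: float) -> str:
    """Map an accuracy percentage to the traditional placement bands."""
    if accuracy >= 95:
        return "independent"
    if accuracy >= 90:
        return "instructional"
    return "frustration"


# Invented sample counts, for illustration only.
tally = RunningRecordTally(total_words=120, errors=8, self_corrections=4)
acc = accuracy_rate(tally)
print(f"accuracy: {acc:.1f}%")
print(f"self-correction ratio: 1:{self_correction_ratio(tally):.0f}")
print(f"placement: {placement_level(acc)}")
```

With 120 words, 8 errors, and 4 self-corrections, the sketch gives roughly 93.3% accuracy, a 1:3 self-correction ratio, and an instructional-level placement.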

The MSV miscue framework

The MSV portion is where running records depart from neutral assessment and become an interpretation of reading behavior built on a specific theory of reading. For each substitution, the teacher decides which “cueing systems” the student was using:

  • M (Meaning): Does the substitution preserve the meaning of the sentence? “Pony” for “horse” — yes, M.
  • S (Syntax): Is the substitution grammatically plausible in context? “Pony” for “horse” — yes, S (same part of speech).
  • V (Visual): Does the substitution look like the printed word? “Pony” for “horse” — no, V is not used. “House” for “horse” — yes, V.

A student who consistently makes M+S substitutions but ignores V is read as “neglecting print,” that is, relying on context and guessing. A student whose substitutions match the print but break the meaning (“house” for “horse”) is read as “over-relying on visual cues without monitoring meaning.” Both diagnoses come straight from Ken Goodman’s “psycholinguistic guessing game” model and Marie Clay’s reading-process theory.
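To make the bookkeeping concrete, here is a small sketch of how MSV codes might be tallied across one reading. It only shows the counting step that the framework's interpretation rests on. The data layout and function names are hypothetical, the "kitten"/"cat" row is invented, and the codes themselves are teacher judgment calls that the code simply takes as given.

```python
from collections import Counter

# Each entry: (printed word, what the student said, cues the teacher credited).
# The first two rows use this article's examples; the third is invented.
substitutions = [
    ("horse", "pony", {"M", "S"}),
    ("horse", "house", {"V"}),
    ("kitten", "cat", {"M", "S"}),
]


def msv_tally(subs):
    """Count how many substitutions were credited with each cue."""
    counts = Counter()
    for _printed, _said, cues in subs:
        counts.update(cues)  # adds 1 for each cue in the set
    return counts


def visual_share(subs):
    """Fraction of substitutions where the teacher credited the V cue."""
    if not subs:
        return 0.0
    return sum(1 for _p, _s, cues in subs if "V" in cues) / len(subs)


tally = msv_tally(substitutions)
print(dict(tally))
print(f"V credited on {visual_share(substitutions):.0%} of substitutions")
```

For these three substitutions the tally is M = 2, S = 2, V = 1. A low V share like this is the pattern the framework would read as "neglecting print."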

Why the Science-of-Reading community pushed back

The MSV framework rests on a theory of reading that cognitive science has rejected. Three concrete objections:

  • Meaning and syntax are not equivalent to visual decoding. Skilled readers identify words by automatic orthographic recognition — they don’t guess from context. Treating M, S, and V as three coequal cueing systems misrepresents how the reading brain actually works.
  • “Meaningful” miscues are word-recognition failures. When a student reads “pony” for “horse” and the teacher praises the substitution as a sign of good reader behavior, they are celebrating that the student couldn’t read the word. The student needed to decode “horse” and didn’t. Coding the error as M+S obscures that failure.
  • Running records don’t measure orthographic mapping. They don’t probe whether a student can decode unfamiliar regularly-spelled words, sound out nonsense words, or store new word forms. The instrument is silent on the most important early-reading skill in the Science-of-Reading model.

A practical objection also looms: running records are slow (one student at a time, often 15-20 minutes), expensive in teacher time, and unreliable across raters. Two trained teachers given the same audio routinely disagree on error counts and MSV codes.

What’s replacing them

Districts moving to Science-of-Reading-aligned assessment have largely shifted to a different stack:

  • ORF with WCPM scoring — a timed one-minute oral reading passage scored on accuracy, words correct per minute, and prosody. DIBELS, Acadience Reading, and aimswebPlus are the standard instruments.
  • DIBELS Nonsense Word Fluency (NWF) — a measure of pure decoding in isolation, immune to vocabulary or context guessing.
  • Decodable phrase fluency — short decodable phrases that probe whether the student can apply taught phonics patterns to connected text.
  • Comprehension assessed separately — through standardized passage-based assessments rather than inferred from oral reading miscues.

The result is a cleaner diagnostic picture: decoding and comprehension are measured on separate instruments that are faster to administer, normed, reliable across raters, and silent on disputed theoretical claims about cueing systems.
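For readers new to WCPM, the core calculation can be sketched as follows. This is a simplified illustration with hypothetical function names and invented numbers. Published instruments define their own rules for what counts as an error and when to stop a reading, and this sketch does not reproduce any of them.

```python
def wcpm(words_attempted: int, errors: int, seconds: float = 60.0) -> float:
    """Words correct per minute, prorated if the reading was not exactly 60 s."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    words_correct = words_attempted - errors
    return words_correct * 60 / seconds


def orf_accuracy(words_attempted: int, errors: int) -> float:
    """Percent of attempted words read correctly."""
    if words_attempted <= 0:
        raise ValueError("words_attempted must be positive")
    return (words_attempted - errors) / words_attempted * 100


# Invented example: 87 words attempted in one minute, 5 of them errors.
print(wcpm(87, 5))                    # 82.0
print(round(orf_accuracy(87, 5), 1))  # 94.3
```

Unlike the running-record calculation, nothing here asks why an error happened. An error only lowers the words-correct count.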

Several state literacy laws (in Florida, Mississippi, and other states) now explicitly prohibit or de-emphasize running records and require districts to use evidence-based screening instruments. Some districts still administer running records for administrative purposes or for parent communication, but most no longer use them to drive instructional decisions.

How Storytime relates to running records

Storytime ORF challenges score oral reading on WCPM, accuracy, and prosody with no MSV miscue analysis. Audio recordings are transcribed, compared to the source text, and scored on the three measures separately — there is no “cueing systems” interpretation layered on top of the error count.
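As a rough illustration of the "compare the transcript to the source text" step, the sketch below aligns two word sequences with Python's standard-library difflib and counts substitutions, omissions, and insertions. It is not Storytime's implementation. The function names are hypothetical, the transcription step is assumed to have already happened, and a production scorer would need to handle self-corrections, repetitions, and transcription mistakes that this sketch ignores.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def count_errors(source_text, transcript):
    """Align transcript words against source words and count error types."""
    source = tokenize(source_text)
    spoken = tokenize(transcript)
    counts = {"substitutions": 0, "omissions": 0, "insertions": 0}
    matcher = SequenceMatcher(a=source, b=spoken, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            # Pair words one-to-one as substitutions; any leftover words
            # on either side count as omissions or insertions.
            paired = min(i2 - i1, j2 - j1)
            counts["substitutions"] += paired
            counts["omissions"] += (i2 - i1) - paired
            counts["insertions"] += (j2 - j1) - paired
        elif tag == "delete":
            counts["omissions"] += i2 - i1
        elif tag == "insert":
            counts["insertions"] += j2 - j1
    return counts


result = count_errors("The horse ran to the barn.", "the pony ran to barn")
print(result)
```

In this example "pony" for "horse" is counted as one substitution and the skipped "the" as one omission. No cue code is attached to either error.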

Decoding and comprehension are presented to teachers as separate signals in the Skill Tree dashboard. A student who reads “pony” for “horse” loses an accuracy point, exactly as they would on DIBELS or Acadience. There is no version of the error where it is praised as a meaningful substitution. Word-recognition failure is reported as word-recognition failure.

Comprehension is assessed through separate comprehension quizzes tied to each decodable book, so teachers see decoding-vs-comprehension patterns clearly rather than merged into a single “cueing” picture.

Where to start

If your district is migrating off running records, the practical sequence is straightforward: pick an ORF screener (DIBELS, Acadience, or a built-in option such as Storytime ORF), establish three benchmark windows per year, add DIBELS NWF for diagnostic data on decoding in isolation, and assess comprehension separately on a normed instrument. Decoding and comprehension belong on different rulers.

Frequently asked questions

What is a running record?
A running record is a one-on-one oral reading assessment in which the teacher sits with a student, listens to them read a leveled text aloud, and codes every word (correct reads, substitutions, omissions, insertions, repetitions, and self-corrections) using a standardized notation. The teacher then calculates an accuracy rate, a self-correction ratio, and an MSV (Meaning/Syntax/Visual) breakdown of the student’s errors.

How do you take a running record?
Choose a passage at the student’s instructional level. Sit beside them with a copy of the text. As they read aloud, mark each word: a check for a correct read, the student’s substitution written above the printed word, “SC” for self-corrections, “R” for repetitions, lines through omitted words, and carets for inserted words. After the reading, count errors, calculate the accuracy percentage, and code each substitution as M, S, V, or some combination.

What does MSV coding mean?
MSV stands for Meaning, Syntax, and Visual. For each substitution the student makes, the teacher decides which cueing systems they used. “Pony” for “horse” is coded M and S (it preserves meaning and grammar) but not V (the letters don’t match). “House” for “horse” is coded V (visually similar) but not M (the sentence no longer makes sense). The MSV breakdown is meant to diagnose which cueing systems the student is relying on.

What accuracy rate counts as instructional level?
The traditional thresholds: 95% or higher is independent level (the student can read the text alone), 90 to 94% is instructional level (appropriate for teaching), and below 90% is frustration level (the text is too hard). These thresholds drive placement decisions in Guided Reading and Reading Recovery.

How is a running record different from an ORF assessment?
ORF scores oral reading on three measures: accuracy, words correct per minute (WCPM), and prosody. It is timed (one minute), produces normed scores, and does not interpret why errors happened. A running record is untimed, produces an accuracy percentage plus an MSV interpretation of errors, and is built on the three-cueing model that cognitive science has rejected.

Why do Science-of-Reading advocates object to running records?
The MSV framework treats meaning and syntax as equally valid ways to identify a word, alongside visual decoding. Cognitive-science research shows skilled readers identify words by automatic orthographic recognition, not by guessing from context. When a teacher praises a student for reading “pony” instead of “horse” because it “made sense,” they’re celebrating a failure of word recognition. Running records institutionalize that error.

What’s replacing running records?
Most districts moving to Science-of-Reading-aligned assessment use ORF passages scored on WCPM (DIBELS, Acadience, aimswebPlus), DIBELS Nonsense Word Fluency for decoding in isolation, and separate standardized comprehension assessments. Decoding and comprehension are measured separately rather than merged into a single “cueing” picture. The result is a clearer diagnostic signal, faster administration, and more consistent scoring across raters.