Transcription Guide


Annotation guidelines

General principles

  • Transcription should represent all words as spoken – including hesitations, filler words, and false starts.
  • Transcription must be orthographic, not phonetic. Refer to the American Heritage Dictionary for spelling.
  • Transcription should include only uppercase and lowercase letters, apostrophes, tildes, hyphens, periods, question marks, commas, and spaces. Do not use numbers or other special characters.
  • If the speech is unintelligible, use the footnote [INAUDIBLE hh:mm:ss] as described in this article.
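The character rule above can be checked mechanically. The following is a minimal sketch, not part of the guide itself: the function name `disallowed_characters` is hypothetical, "letters" is assumed to mean unaccented A-Z and a-z, and bracketed tags such as [INAUDIBLE hh:mm:ss] are assumed to be exempt from the rule, since they necessarily contain digits and colons.

```python
import re
import string

# Assumption: anything in square brackets is a tag and is exempt from the
# character rule, so it is removed before the check.
BRACKETED_TAG = re.compile(r"\[[^\[\]]*\]")

# Letters, apostrophe, tilde, hyphen, period, question mark, comma, space.
# Assumption: "letters" means ASCII letters only.
ALLOWED = set(string.ascii_letters + "'~-.?, ")


def disallowed_characters(line: str) -> list[str]:
    """Return the distinct characters in `line` that the rule does not permit."""
    text = BRACKETED_TAG.sub("", line)
    return sorted({ch for ch in text if ch not in ALLOWED})


print(disallowed_characters("um, I wanna go. [INAUDIBLE 00:01:23]"))  # []
print(disallowed_characters("I have 2 dogs!"))  # ['!', '2']
```

An empty result means the line uses only permitted characters. A non-empty result lists what to spell out or remove, for example writing "two" instead of "2".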

Speech event transcription

Use orthographic spelling


  • gimme
  • gonna
  • gotta
  • lemme
  • wanna
  • watcha
  • kinda


Stumbled speech and corrections

Filler words

  • uh
  • um
  • ah
  • er
  • hm


  • eee
  • ew
  • huh
  • hm
  • jeez
  • mm
  • mhm
  • nah
  • oh
  • uh-huh
  • uh-oh
  • whoa
  • whew
  • yay
  • yep

Overlapping speech


  • Use end punctuation (full stop, question mark) to indicate the end of a complete sentence.
  • Use punctuation symbols that are an essential part of the word, such as apostrophes and hyphens.
  • Use commas to break up long stretches of speech and make the transcript easier to read.
  • AVOID semicolons.

Acronyms or Spelled Out Words


  • To separate items in a list of three or more, using the serial (aka Oxford) comma (i.e., the comma before the conjunction that joins the last two elements):
  • To set off a direct address:
  • To break up compound and complex sentences:
  • To set off introductory words and phrases:
  • Around parenthetical phrases:

Exclamation marks



  • a-line 
  • d-day
  • ex-boyfriend, ex-drummer, ex-girlfriend, ex-husband, ex-wife
  • extra-loud
  • self-aware
  • t-shirt
  • u-turn
  • v-neck
  • x-ray

Truncated words

Special symbols




Unintelligible words and phrases

Speaker Labelling

Non-speech (acoustic event) transcription

Non-speech sound inventory

  • [lipsmack] -- Lipsmacks, tongue-clicks.
  • [breath] -- Inhalation and exhalation between words, yawning.
  • [cough] -- Coughing, throat clearing, sneezing.
  • [laugh] -- Laughing, chuckling.
  • [click] -- Machine or phone click. 
  • [ring] -- Telephone ring.
  • [dtmf] -- Noise made by pressing a telephone keypad.
  • [sta] -- At the start of continuous background noise (static).
  • [cry] -- Crying, sobbing.
  • [applause] -- Applause, clapping, cheering.
  • [prompt] -- IVR prompts or voice recordings commonly found at the beginning of calls.
  • [music] -- Music or singing.

For any other foreground noise, use [noise hh:mm:ss].
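The tag inventory above lends itself to an automated check. The sketch below is illustrative only: the function name `unknown_tags` is hypothetical, and the timestamp is assumed to be exactly two digits each for hours, minutes, and seconds, which the guide does not state explicitly. The [INAUDIBLE hh:mm:ss] footnote from the general principles is accepted alongside [noise hh:mm:ss].

```python
import re

# Tags taken from the non-speech sound inventory.
NON_SPEECH_TAGS = {
    "lipsmack", "breath", "cough", "laugh", "click", "ring",
    "dtmf", "sta", "cry", "applause", "prompt", "music",
}

# Assumption: hh:mm:ss is always written as two digits per field.
TIMESTAMPED_TAG = re.compile(r"(noise|INAUDIBLE) \d{2}:\d{2}:\d{2}")

# Captures the text between a pair of square brackets.
BRACKETED = re.compile(r"\[([^\[\]]*)\]")


def unknown_tags(line: str) -> list[str]:
    """Return bracketed tags in `line` that are not in the inventory."""
    return [
        tag
        for tag in BRACKETED.findall(line)
        if tag not in NON_SPEECH_TAGS and not TIMESTAMPED_TAG.fullmatch(tag)
    ]


print(unknown_tags("[laugh] yeah [noise 00:02:15] okay [sneeze]"))  # ['sneeze']
print(unknown_tags("[noise] hello"))  # ['noise']
```

In the second call, [noise] is reported because it lacks the timestamp that the guide requires for other foreground noise.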