
Transcription Guide


Transcription is the conversion of an audio signal into a textual representation. This can include speech data, such as conversation, as well as non-verbal sounds, such as phones ringing.
The transcription data must be of high quality. In this case, “high quality” means annotating in a consistent manner, in strict accordance with the parameters outlined in these guidelines.

Annotation guidelines:

General principles.

These general principles are expanded below in each relevant section. 
  • Transcription should represent all words as spoken – including hesitations, filler words, and false starts.
  • Transcriptions must be orthographic, not phonetic. Use the American Heritage Dictionary as the reference: https://ahdictionary.com/
  • Transcription should include only upper and lowercase letters, apostrophes, periods, question marks, commas, and spaces. No numbers or other special characters. 
  • If you cannot understand what the speaker says because the speech is unintelligible, use the footnote [INAUDIBLE hh:mm:ss] as described in this article.
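
The character rule above lends itself to an automated check. The following is a minimal sketch rather than an official tool: the name `find_disallowed_characters` is hypothetical, it covers English letters only, and it assumes that footnote tags such as [INAUDIBLE hh:mm:ss] are removed before checking and that the * truncation marker and the ellipsis seen in this guide's examples are tolerated. It needs Python 3.9 or later.

```python
import re

# Footnote tags such as [INAUDIBLE 00:22:13] or [sic 00:00:07].
# Assumption: they are stripped before the character check.
FOOTNOTE_TAG = re.compile(r"\[[A-Za-z][A-Za-z -]* \d{2}:\d{2}:\d{2}\]")

# Letters, apostrophes (straight or curly), periods, question marks, commas, spaces.
# Assumption: the truncation marker * and the ellipsis character are also tolerated.
ALLOWED = re.compile(r"[A-Za-z'’.?, *…]")


def find_disallowed_characters(transcript: str) -> list[str]:
    """Return the distinct disallowed characters, in order of first appearance."""
    text = FOOTNOTE_TAG.sub("", transcript)
    found: list[str] = []
    for ch in text:
        if not ALLOWED.fullmatch(ch) and ch not in found:
            found.append(ch)
    return found


print(find_disallowed_characters("He's 6 ft 2!"))
print(find_disallowed_characters("He's six foot two."))
```

An empty list means that no disallowed character was found. The check says nothing about whether the words themselves are transcribed correctly.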

Speech event transcription.

Use orthographic spelling:

Transcriptions must be orthographic, not phonetic. Represent mispronunciations in their correct orthographic form. If a word is deliberately mispronounced, such as for comedic effect, keep the variation in the transcription. However, dialectal variations such as “darlin'” should be given their standard orthographic form, such as “darling”.
“Call your representive.” = “Call your representative.”
“Issall well n’ good darlin’.” = “It’s all well and good darling.”
“The volcano said: I lava you.” = “The volcano said I lava you.”
If the spelling of a word is unclear, use the American Heritage Dictionary as a standard reference: https://ahdictionary.com/. For the names of songs, movies, TV shows, brands, etc., research the correct spelling.


Standard contractions must be transcribed as pronounced, including the apostrophe, such as “isn’t”, “where’s”, “you’re”, “y’all”.
The following informal contractions are also allowed when transcribing in English:
  • gimme
  • gonna
  • gotta
  • lemme
  • wanna
  • watcha
  • kinda


Never introduce abbreviations in the transcription; always spell out the full word when it is pronounced in full. Transcribe an abbreviation only if the speaker explicitly articulates it. Do not add a period after an abbreviated word unless it falls at the end of a sentence.
“He’s 6 ft 2!” = “He’s six foot two.”
“I live in Cambridge, Mass.” = “I live in Cambridge, Mass.”
“Talk to Dr. Smith at Cal.” = “Talk to Doctor Smith at Cal.” 

Stumbled speech and corrections.

Represent all speech, including false starts and corrections. Truncated words are marked with a *, as described in the “Truncated words” section below.
“Directions to the… to the… the hotel” = “Directions to the… To the… The hotel.”
“Ale… Alexa play Janet Jackson… no wait…” = “Ale*… Alexa play Janet Jackson… No, wait.”

Overlapping speech.

If two or more speakers are talking at the same time, transcribe what each person says on a separate line. If you cannot understand what one of the speakers is saying, use the footnote [INAUDIBLE hh:mm:ss] as described in this article.


Use punctuation as required by the rules of grammar. When transcribing a language other than English, use the punctuation symbols and rules that are appropriate for that language. For example, in Spanish, ¿? is used as in standard orthography.
  • Use end punctuation (period, question mark) to indicate the end of a complete sentence.
  • Use punctuation symbols that are an essential part of the word, such as apostrophes.
  • Use commas to break up long stretches of speech and make the transcript easier to read.
  • Avoid semicolons.
Of the permitted punctuation marks, we expect commas to be the most difficult to apply. You will have to make some relatively subjective, stylistic decisions about comma use, and disagreements are not necessarily errors.

Acronyms or Spelled Out Words.

Do not use periods following a letter spoken as a letter, such as when a name is spelled out or an acronym is spelled out. 
“My name is John – jay, oh, aitch, en.” = “My name is John J O H N.”
“I work at IBM” = “I work at I B M.”
“I work at NASA” = “I work at NASA.” 


Use a comma when it is necessary to make a transcript more readable. Below are some suggestions of when a comma should be used:
  • To separate items in a list of three or more, using the serial (Oxford) comma, i.e., the comma before the conjunction that joins the last two elements:
I enjoy skydiving, snowboarding, and mountain biking.
  • To set off a direct address:
Maryam, listen to me carefully.
I'm not calling you, my friends, just to whine about my life.
  • To break up compound and complex sentences:
I would like to join you, but I'm afraid I have class at that time.
Marcos and I couldn't go to the jazz concert, so we watched it on TV instead.
  • To set off introductory words and phrases:
Therefore, they cancelled their trip.
After taking a break, the team resumed their meeting.
  • Around parenthetical phrases:
That report in the New York Times was, to say the least, a bombshell.
Getting a hotel by the sea, like the one we stayed at last year, would be superb.

Exclamation marks.

Never use exclamation marks.


Use apostrophes in contractions, possessives of individual letters, possessive “s”, or as part of a person’s name. 
“That’s where it’s at” = “That’s where it’s at.”
“Project Q’s timeline” = “Project Q’s timeline.”
“Sinead O’Connor” = “Sinead O’Connor.”
“Eleven o’clock” = “Eleven o’clock.”
“Read Jess’ email” = “Read Jess’ email.”


Do not use hyphens while transcribing. Replace each hyphen with a space, for example: twenty seven, x ray, t shirt.

Truncated words.

Use * to indicate truncated words, whether the truncation is at the beginning or the end of the word. Also use * to represent false starts.
“…exa, stop the mu…” = “*Exa stop the mu*…”
“Ale… alexa … stop the mu… the music.” = “Ale*… Alexa stop the mu*… The music.”

Special symbols.

Special symbols should never be used in the transcription. The only ones allowed as part of the transcription convention are apostrophes and spaces; everything else should be spelled out. When a symbol is spoken aloud, transcribe it as it was pronounced.
“I have like $0” = “I have like zero dollars.”
“It was great/weird” = “It was great slash weird.”
“… and +, she didn’t know!” = “and plus she didn’t know.”
“My email is m-golden@...” = “My email is M dash golden at.”
“http://www.amazon.com” = “H T T P colon slash slash W W W dot Amazon dot com”
“http://www.wikipedia.org” = “H T T P colon slash slash W W W dot Wikipedia dot O R G”


Capitalization should follow orthographic conventions. Capitalize the first word of a sentence and all proper names. Proper names include human names (Jeff Bezos), place names (France), product names (iPad, Xbox), company names (eBay, Amazon), acronyms (NASA), and so on.
“I want to visit Oregon” = “I want to visit Oregon.”
“I work at IBM” = “I work at I B M.”
“George W Bush paints now” = “George W Bush paints now.”
“I’m going to Mexico on Thursday” = “I’m going to Mexico on Thursday.”


Numbers should never be represented numerically; always write them out in words. Ordinal numbers should be represented as pronounced.
“5” = “five”
“5th” = “fifth”
“306” = “three hundred and six”, “three O six”, or “three zero six”, depending on how it was pronounced.
“Play radio 109.4 FM” = “Play radio one O nine point four F M.”
“Beverly Hills, 90210” = “Beverly Hills nine O two one O.”
Compound numbers that are typically written with a hyphen should be transcribed without the hyphen.
“25” = “twenty five”
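
Because the same digits can be read aloud in several ways, it can help to generate the candidate spellings and then pick the one that matches the audio. The sketch below is illustrative only: the helper names `cardinal_words` and `digit_words` are hypothetical, the cardinal reading is limited to 0 through 999, and the “hundred and” wording simply follows the “306” example above.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def cardinal_words(n: int) -> str:
    """Spell out 0..999 as a cardinal number, with spaces instead of hyphens."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only covers 0 to 999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"
    hundreds, rest = divmod(n, 100)
    head = f"{ONES[hundreds]} hundred"
    return head if rest == 0 else f"{head} and {cardinal_words(rest)}"


def digit_words(digits: str, zero: str = "zero") -> str:
    """Spell out a digit string one digit at a time; `zero` is the word used for 0."""
    return " ".join(zero if d == "0" else ONES[int(d)] for d in digits)


print(cardinal_words(306))
print(digit_words("306", zero="O"))
print(digit_words("306"))
```

The three printed lines are the three candidate readings of “306”. The code cannot tell which one the speaker used, so that choice stays with the transcriber.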


Acronyms spoken as words should be transcribed as words in upper case, without spaces between the letters. Initialisms (spoken as individual letters) should be written as upper case letters with a space between each letter and no periods.
“I work for IBM.” = “I work for I B M.”
“I work for NASA.” = “I work for NASA.”
“Check it out on IMDB” = “Check it out on I M D B.”
“I like ZZ Top.” = “I like Z Z Top.”
Transcribe a plural acronym with an “s” directly after the last letter. Transcribe a possessive acronym with an apostrophe and an “s”.
“The SATs are nerve-wracking.” = “The S A Ts are nerve wracking.”
“He’s from Washington DC’s downtown.” = “He’s from Washington D C’s downtown.”
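
The spacing convention for initialisms, including the plural and possessive endings, can be sketched as a small helper. This is an illustration under assumptions, not part of any official tooling: the name `space_initialism` is hypothetical, and the caller must already know that the token was spoken letter by letter (so “IBM” qualifies, “NASA” does not).

```python
def space_initialism(token: str) -> str:
    """Space out an initialism, keeping a plural "s" or possessive "'s" attached."""
    suffix = ""
    # Check the longer possessive endings first, then the bare plural "s".
    for ending in ("'s", "’s", "s"):
        if token.endswith(ending) and token[: -len(ending)].isupper():
            token, suffix = token[: -len(ending)], ending
            break
    return " ".join(token) + suffix


print(space_initialism("IBM"))
print(space_initialism("SATs"))
print(space_initialism("DC's"))
```

Only a lowercase “s” is treated as a plural ending, so an initialism that ends in a capital S is spaced out in full.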

Unintelligible words and phrases.

If a word cannot be understood within a larger phrase, transcribe all segments that are understandable, and use the footnote [INAUDIBLE hh:mm:ss] as described in this article to mark the unintelligible word or phrase.   
“Alexa play ???? on spotify” = “Alexa play [INAUDIBLE 00:22:13] on Spotify.”
If you have a guess at what the word or phrase might be but are not sure, use the footnote [sic hh:mm:ss].
“Alexa read ????? from audible.” = “Alexa read Cat In The Hat [sic 00:00:07] from Audible.”
For entire segments that are unintelligible use the footnote [INAUDIBLE hh:mm:ss] as described in this article.   
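
A footnote in the hh:mm:ss format can be built from a position in the audio. The sketch below assumes that the timestamp is the offset in whole seconds from the start of the recording; the guide does not state this, and the helper name `format_footnote` is hypothetical.

```python
def format_footnote(label: str, offset_seconds: int) -> str:
    """Build a footnote such as [INAUDIBLE 00:22:13] from an offset in whole seconds."""
    if offset_seconds < 0:
        raise ValueError("offset must not be negative")
    hours, remainder = divmod(offset_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    # Zero-pad each field to two digits to match the hh:mm:ss pattern.
    return f"[{label} {hours:02d}:{minutes:02d}:{seconds:02d}]"


print(format_footnote("INAUDIBLE", 22 * 60 + 13))
print(format_footnote("sic", 7))
```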

Speaker labeling.

General projects do not require speaker labeling. Please refer to specific project requirements or consult the Support Service for further information.

Non-speech sound inventory.

When a nonverbal sound occurs, such as a yawn, applause, music, an interjection, or a filler word, always register it by using the F8 key.

These are the nonverbal sounds you must register, by category:

Interjections and filler words:


  • ah -- Expression of surprise, pain, etc.
  • eh -- For example: "Eh, you"
  • er -- The speaker is thinking
  • ew -- Expression of disgust
  • jeez -- Expression of surprise or annoyance
  • mm -- For example: "Mm, it's delicious"
  • nah -- The speaker is saying no
  • oh -- Expression of surprise, pain, etc.
  • uh -- Expression of questioning or confusion
  • uh-huh -- The speaker is agreeing
  • uh-oh -- The speaker is saying no
  • um -- The speaker is thinking
  • whew -- Expression of relief
  • whoa -- Expression of surprise
  • yay -- Expression of happiness
  • yep -- The speaker is agreeing

Other human sounds:

  • Applause
  • Yawn, breath, or sigh
  • Lip smack
  • Sneeze, cough, or throat clearing
  • Hiccup
  • Crying or sobbing
  • Laugh
  • Rage/Fury
  • Cheers -- Hip hip hooray
  • Other human sounds -- different from all those listed

Ambient sounds:

  • Beep -- Answering machine
  • Telephone dialing
  • Static -- Continuous background noise
  • Voice recorder or answering machine
  • Music or singing
  • Keyboard sounds
  • Telephone ring
  • Other ambient sounds -- different from all those listed

If a sound occurs repeatedly, represent it only once. Never register two identical sounds without at least one word between them, no matter how many times the sound repeats. For example:
“Wait … *click* *click click* *click* there.” = “Wait [click hh:mm:ss] there.”
Do not split words to insert a non-speech sound tag, even if it occurs this way in the audio.
“I will abso-*ring*-lutely open it!” = “I will [ring hh:mm:ss] absolutely open it.”
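
The “represent it only once” rule can be sketched as a pass over a tokenized transcript. This is a hedged illustration: the function name `collapse_repeated_sounds`, the token-list representation, and the exact tag pattern are assumptions made for the example, not part of the transcription tool. It needs Python 3.9 or later.

```python
import re

# A sound tag such as [click 00:00:03]; the label is captured for comparison.
SOUND_TAG = re.compile(r"\[([A-Za-z][A-Za-z -]*) \d{2}:\d{2}:\d{2}\]")


def collapse_repeated_sounds(tokens: list[str]) -> list[str]:
    """Drop a sound tag when the previous kept token is a tag with the same label."""
    kept: list[str] = []
    for token in tokens:
        current = SOUND_TAG.fullmatch(token)
        previous = SOUND_TAG.fullmatch(kept[-1]) if kept else None
        if current and previous and current.group(1) == previous.group(1):
            continue  # same sound again with no word in between
        kept.append(token)
    return kept


tokens = ["Wait", "[click 00:00:03]", "[click 00:00:04]", "[click 00:00:05]", "there."]
print(" ".join(collapse_repeated_sounds(tokens)))
```

Only the first tag in a run is kept, so the surviving timestamp marks where the sound starts. Two different sounds in a row are both kept.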

Check this article for full details.