Written by Rob Carr, Oklahoma ABLE Tech IT Accessibility Coordinator
Captioning audio and video files makes it possible for people who are deaf or hard of hearing to perceive spoken dialogue. There are many other advantages to providing transcripts for audio and captions for video.
The challenge for most organizations, though, is not in understanding that they need to create transcripts or captions. The challenge is usually in doing it.
Different kinds of multimedia need different kinds of text equivalents. But they all start with an accurate transcript. An accurate transcript will capture everything that’s spoken, along with punctuation and identification of speakers if there is more than one person in the track.
If you have an audio-only track, then the transcript is all you need. Publish it on the same webpage as the audio track so that people can get to your content even if they cannot hear it.
If your track is audio and video, then you need to add timing information to your transcript so that the words appear on screen at the same time that they are spoken. Combining the accurate transcript with the timing information gives you captions.
So, if you have an audio/video track, then you need to be sure that you have captions so that the text appears in sync with the action on screen.
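To make the idea of "transcript plus timing" concrete, here is a minimal Python sketch that turns timed transcript lines into a caption file. It assumes WebVTT as the output format, which is only one of several common caption formats, and the cue times and text are made up for illustration. The helper names `format_timestamp` and `build_webvtt` are hypothetical, not part of any captioning tool mentioned in this article.

```python
def format_timestamp(seconds: float) -> str:
    """Convert a time in seconds to WebVTT's hh:mm:ss.mmm form."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_webvtt(cues) -> str:
    """Build WebVTT text from (start_seconds, end_seconds, text) tuples."""
    lines = ["WEBVTT", ""]  # required header, then a blank line
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


# Illustrative cues: a sound description and one line of identified speech.
example_cues = [
    (0.0, 2.5, "[upbeat music]"),
    (2.5, 6.0, "HOST: Welcome to the training."),
]
print(build_webvtt(example_cues))
```

In practice a captioning tool produces the timings for you; the sketch only shows that a caption file is the accurate transcript with a start and end time attached to each piece of text.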
I cannot stress enough that your transcripts need to be accurate. They need to capture the text, word for word (in most cases). They also need to note when music is playing and, where possible, describe important sounds. Transcripts and captions serve as a text equivalent, not a text approximation, of what is said.
First and foremost, ABLE Tech does not endorse vendors. I mention several below, but you should do your own research and determine which vendor will work best for you. If your content uses specialized language or jargon, make sure that the vendor can handle it. They may need you to provide a dictionary of terms so that they don’t misspell them.
There are two ends of a spectrum here, and you can also land somewhere in between.
At one end of the spectrum, you can do the captions yourself. Software such as CaptionMaker and MAGpie lets you create captions from scratch, manually. The benefit of creating the captions yourself is that the accuracy is generally higher after you make the first pass through. The downside is that it can be time-consuming, especially for longer tracks.
You can also use automatic speech recognition tools to create your captions. There are both free and paid tools that will convert the spoken words in a track into text. Docsoft, for example, is an appliance that creates caption files in several common formats. It uses technology that learns a speaker’s voice. YouTube also has automatic captioning available, and tools like Camtasia have begun to integrate speech-to-text features. Generally speaking, though, you cannot just run a track through an automated tool and get accurate captions out. The available tools offer transcript editing features, and you must use them to correct text, punctuation, and timing as needed. Don’t publish a video on YouTube and expect its automatically created captions to be accurate enough to be useful. It’s critical to go in and make appropriate edits.
Another approach to creating your own captions is referred to as “re-voicing.” This combines a person, a microphone and speech recognition software like Dragon NaturallySpeaking. With this approach, you watch the video or listen to the audio and dictate everything that you hear to Dragon. It creates a transcript based on what you dictate. This can be quicker than using tools like CaptionMaker, since your voice creates the transcript. And Dragon NaturallySpeaking does a very good job of learning individuals’ voices. If you are the one who dictates to Dragon, then you typically end up with a very accurate transcript that requires less editing than the output of an automatic captioning tool. Then, just use a captioning tool like CaptionMaker or even YouTube to sync the text up with the timing of the video.
At the other end of the spectrum, you can outsource the creation of your captions to companies like 3Play Media, Automatic Sync Technologies or Alternative Communication Services. You may also be able to find a service that is more local. The benefit of outsourcing the captioning is guaranteed accuracy and more control over the timing. These vendors typically guarantee 99% or better accuracy, and their usual turnaround is 2 days. You can usually pay more and shorten the turnaround to 24 hours or sometimes less. The downside is that there is an expense associated with the services. The costs vary, and you should contact captioning companies to see what their fees are.
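If you want to spot-check a delivered transcript against a passage that you have verified by hand, one option is to compute word-level accuracy. The Python sketch below assumes a simple definition: 1 minus the number of word insertions, deletions and substitutions divided by the number of words in the verified text. Vendors may define and measure accuracy differently, so treat this as an illustration rather than the method behind any vendor's guarantee. The function names are hypothetical.

```python
def word_edit_distance(reference, hypothesis):
    """Minimum word insertions, deletions and substitutions (Levenshtein)."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from captions
                            curr[j - 1] + 1,      # extra word in captions
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(verified_text: str, caption_text: str) -> float:
    """Word-level accuracy of caption_text measured against verified_text.

    Assumption: comparison is case-insensitive and splits on whitespace,
    so punctuation attached to a word counts as part of that word.
    """
    reference = verified_text.lower().split()
    hypothesis = caption_text.lower().split()
    if not reference:
        raise ValueError("verified_text must contain at least one word")
    errors = word_edit_distance(reference, hypothesis)
    return max(0.0, 1 - errors / len(reference))


print(word_accuracy("the quick brown fox", "the quick brown box"))  # 0.75
```

Under this definition, 99% accuracy still allows up to 10 wrong words in a 1,000-word transcript, which is worth keeping in mind for content with names or jargon.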
And then you can come in somewhere between doing it yourself and outsourcing. You can have the captioning company just create a transcript for you. The cost of this is usually less than the cost of creating captions, but the accuracy is still guaranteed to be very high. Turnaround times may be shorter as well, depending on the vendor you work with. Then, you can take your transcript and use tools like CaptionMaker or YouTube to apply the timing and sync everything up. This hybrid approach can save you some money, still give you a very accurate transcript, and possibly speed the process up as well.
There is not a hard and fast set of rules that will tell you which approach you should use. It depends on the resources that you have available, how quickly you need the transcript or captions, and the duration of the clip. Other factors may also point you in the right direction.
Generally speaking, it is easier to create transcripts or captions for shorter media clips. A 5-minute video may take anywhere from 30 minutes to an hour to caption, depending on the tools you use and your experience creating captions. A 1-hour video may take 4 to 6 hours to caption, if not longer.
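The figures above work out to roughly 6 to 12 minutes of effort per minute of media for the 5-minute example, and 4 to 6 minutes per minute for the 1-hour example. A rough planning sketch in Python, assuming you supply your own low and high ratios based on your tools and experience (the function name and parameters are hypothetical):

```python
def estimate_captioning_minutes(media_minutes, low_ratio, high_ratio):
    """Return a (low, high) estimate of captioning effort in minutes.

    low_ratio and high_ratio are minutes of work per minute of media.
    They are assumptions you choose, not fixed industry values.
    """
    if media_minutes < 0 or low_ratio < 0 or high_ratio < low_ratio:
        raise ValueError("expected media_minutes >= 0 and 0 <= low_ratio <= high_ratio")
    return media_minutes * low_ratio, media_minutes * high_ratio


# Ratios taken from the two examples in the paragraph above.
print(estimate_captioning_minutes(5, 6, 12))   # (30, 60)
print(estimate_captioning_minutes(60, 4, 6))   # (240, 360)
```

Tracking how long your own first few projects take will give you better ratios than any general estimate.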
Even when you begin with a transcript or caption file that comes from an automatic speech-to-text system, you should expect to invest a fair amount of time in correcting text, punctuation and timing.
The best advice is to find the approach or approaches that fit your needs. You can always call ABLE Tech for advice using our toll-free number, 888-885-5588.
Access to multimedia is one area where we consistently see gaps. There is more and more video on the web, in classrooms and in training rooms. But all too often it is not captioned, or the publisher just lets YouTube guess and doesn’t correct the result. It is vital to provide accurate transcripts for your audio and accurate captions for your video. Build the cost for transcription and captioning into your production budget and process so that it’s not something that you think about after a video is published and someone calls to tell you that they can’t use the video without captions.
Captioning Key, with guidelines for creating high quality captions (Note that these are guidelines and not legal standards)
Oklahoma ABLE Tech
Oklahoma State University | Department of Wellness
1514 W. Hall of Fame | Stillwater, OK 74078
888.885.5588 (V/TTY) | Email: email@example.com