HOW TO: Create Captions in YouTube for Ensemble Video
Joel Steinfeldt, Media Communications Specialist at the University of Illinois Urbana-Champaign, recently shared with me the details of how to use YouTube transcription and closed caption functions with Ensemble Video (Thanks, Joel!).
Joel has experimented with YouTube’s new captioning functions and is incorporating them into his Ensemble Video captioning workflow. He routinely uploads videos to the YouTube University account for Illinois, and the automatic conversion of a transcript to a time-coded closed caption file has proven a real time saver.
According to Joel:
“The ability to use YouTube and Google Voice’s algorithms to convert any .txt file to a format [for] Ensemble may be a potentially huge development here and possibly at other schools who are also required to caption…”
In an earlier blog entry, we documented how to create and upload closed caption files to Ensemble Video. YouTube has integrated Google Voice speech-to-text technology to offer two new capabilities that may be of particular interest to Ensemble Video users who want or need to caption their content:
- Auto-captioning – automatically converts a text-based transcript into a time-coded caption file.
- Machine transcription – automatically converts the speech in a video into a text transcript.
With these capabilities, you may be able to shave some time and effort off of your captioning workflow.
YouTube can automatically break a text transcript into chunks and create a time-synchronized caption file. First, create a transcript by entering the caption text in a plain text file, preferably with blank lines separating the text into the bite-sized chunks that will form your captions.
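If your transcript starts out as long paragraphs, the chunking step can be roughed out with a short script. The following is a minimal Python sketch, not part of YouTube or Ensemble Video. The function name chunk_transcript and the 60-character limit are assumptions chosen for illustration; pick whatever chunk length reads well in your player.

```python
import re
import textwrap


def chunk_transcript(text: str, max_chars: int = 60) -> str:
    """Split a transcript into short chunks separated by blank lines.

    max_chars is an illustrative limit, not a YouTube requirement.
    """
    chunks = []
    # Treat each blank-line-separated paragraph on its own so that
    # existing paragraph breaks are never merged away.
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        # Collapse internal newlines and runs of spaces.
        flat = " ".join(paragraph.split())
        # Break only at whitespace; long tokens such as URLs stay whole.
        chunks.extend(
            textwrap.wrap(
                flat,
                width=max_chars,
                break_long_words=False,
                break_on_hyphens=False,
            )
        )
    # One blank line between chunks, as suggested above.
    return "\n\n".join(chunks)


if __name__ == "__main__":
    sample = (
        "Hi, I'm Jamie Hudson from The Local Life. We operate a platform "
        "that gives small businesses publishing and reputation management tools."
    )
    print(chunk_transcript(sample))
```

This only breaks on length, so it will still split sentences in awkward places. Treat its output as a first pass and adjust the breaks by hand before uploading.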
Once you’ve created the transcript, go to the Captions tab of your YouTube video, click on the “Add New Captions or Transcript” button, and upload your transcript. Once uploaded, the system will process the file, which can take a little while, or quite a while for long-form videos. Using speech-to-text technology, YouTube will automatically create and synchronize the captions and produce a .sbv formatted caption file.
You can download the .sbv file, then convert it to a Timed Text XML file using a caption converter like the one provided by the Web Accessibility Center at the Ohio State University. The resulting XML file can be tweaked and uploaded to Ensemble Video to caption your Ensemble Video content. Here is an example.
The Local Life (with automatic YouTube transcript->caption conversion)
Here is the sample transcript file that was uploaded to YouTube:
Hi, I’m Jamie Hudson from The Local Life. We operate a platform that gives small businesses publishing and reputation management tools they need to manage their business and get new customers. We provide this tool as a white label solution to local media companies, and we partner with those media companies to bring these tools to local businesses.
You can reach our Web site at www.thelocallife.com, and for pretty much any information about the Web site you can find there.
Here is the .sbv caption file generated by YouTube:
Hi, I’m Jamie Hudson from The Local Life.
We operate a platform that gives small businesses
publishing and reputation management tools
they need to manage their businesses and get
new customers. We provide this tool as a white
label solution to local media companies, and
we partner with those media companies
to bring these tools to local businesses.
You can reach our Web site at www.thelocallife.com,
and for pretty much any information about
the Web site you can find there.
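To make the .sbv to Timed Text conversion step more concrete, here is a minimal Python sketch of the idea. It is not the Ohio State University converter, and the function name sbv_to_paragraphs is made up for illustration. It assumes the usual .sbv layout, in which each caption block begins with a "start,end" timing line (for example 0:00:00.300,0:00:07.180) followed by one or two lines of caption text, with a blank line between blocks. It produces only the <p> elements, not a complete XML document.

```python
import re
from xml.sax.saxutils import escape

# Timing line assumed to look like: 0:00:00.300,0:00:07.180
TIMING = re.compile(r"^(\d+:\d{2}:\d{2}\.\d{3}),(\d+:\d{2}:\d{2}\.\d{3})$")


def sbv_to_paragraphs(sbv_text: str) -> list[str]:
    """Turn .sbv caption blocks into Timed Text style <p> elements."""
    paragraphs = []
    # Caption blocks are separated by blank lines.
    for block in re.split(r"\n\s*\n", sbv_text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        match = TIMING.match(lines[0])
        if match is None:
            # Skip anything that does not start with a timing line.
            continue
        begin, end = match.groups()
        # Join two-line captions with a space instead of a <br /> tag.
        caption = " ".join(lines[1:])
        # escape() protects &, < and > in the caption text.
        paragraphs.append(f'<p begin="{begin}" end="{end}">{escape(caption)}</p>')
    return paragraphs


if __name__ == "__main__":
    sample = (
        "0:00:00.300,0:00:07.180\n"
        "Hi, I'm Jamie Hudson from The Local Life.\n"
        "We operate a platform that gives small businesses\n"
    )
    for p in sbv_to_paragraphs(sample):
        print(p)
```

Joining two-line captions with a space sidesteps the line-break tags altogether, at the cost of losing the original line breaks within a caption.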
The XML Timed Text file generated by Ohio State University’s caption converter is generally compatible with Ensemble Video’s captioning mechanism. Before you upload it to Ensemble, however, you’ll need to change the namespace URL (change the “04” to “10”) and remove the styling code. If you have caption chunks that break across two lines in the .sbv file, you will also want to remove the “<br />” tags that are inserted, because the Flash-based player used by Ensemble Video does not recognize them. You may also need to clean up special characters that aren’t converted properly. You can do all of this with a text editor like TextWrangler or Notepad. The result should be a file that looks something like this:
<?xml version="1.0" encoding="UTF-8"?>
<p begin="0:00:00.300" end="0:00:07.180">Hi, I'm Jamie Hudson from The Local Life. We operate a platform that gives small businesses</p>
<p begin="0:00:07.180" end="0:00:11.410">publishing and reputation management tools they need to manage their businesses and get</p>
<p begin="0:00:11.410" end="0:00:17.849">new customers. We provide this tool as a white label solution to local media companies, and</p>
<p begin="0:00:17.849" end="0:00:23.159">we partner with those media companies to bring these tools to local businesses.</p>
<p begin="0:00:23.159" end="0:00:30.159">You can reach our Web site at www.thelocallife.com, and for pretty much any information about</p>
<p begin="0:00:30.929" end="0:00:31.879">the Web site you can find there.</p>
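If you caption many videos, the hand edits described above (swapping the namespace URL and dropping the “<br />” tags) could be scripted. The sketch below is one hedged way to do it in Python. The function name clean_timed_text is hypothetical, and the two namespace URLs are assumptions based on the “04” to “10” change mentioned above, so check them against what your converted file actually contains. Removing the styling code is left as a manual step, since the exact markup the converter emits is not shown here.

```python
import re

# Assumed namespace URLs; verify against your own converted file.
OLD_NS = "http://www.w3.org/2006/04/ttaf1"
NEW_NS = "http://www.w3.org/2006/10/ttaf1"


def clean_timed_text(xml_text: str, old_ns: str = OLD_NS, new_ns: str = NEW_NS) -> str:
    """Apply two of the manual cleanup steps to a Timed Text XML string."""
    # 1. Change the "04" in the namespace URL to "10".
    cleaned = xml_text.replace(old_ns, new_ns)
    # 2. Replace each <br />, <br/> or <br> tag (and any whitespace
    #    around it) with a single space so the words do not run together.
    cleaned = re.sub(r"\s*<br\s*/?>\s*", " ", cleaned)
    return cleaned


if __name__ == "__main__":
    sample = (
        '<tt xmlns="http://www.w3.org/2006/04/ttaf1">'
        '<p begin="0:00:00.300" end="0:00:07.180">'
        "Hi, I'm Jamie Hudson from The Local Life.<br />"
        "We operate a platform that gives small businesses</p></tt>"
    )
    print(clean_timed_text(sample))
```

Plain string and regular-expression replacement is used instead of an XML parser so the rest of the file is left byte-for-byte as the converter wrote it. Special characters that were not converted properly still need a manual check.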
To add captions to your Ensemble Video content, upload this file through the “Captions” dropdown in the Manage Content form in the Ensemble Video Add/Edit wizard. See Creating and Synchronizing Closed Captions elsewhere on this blog for more details.
While the transcript uploaded above was captioned properly from a technical standpoint, it could have been made more readable with better chunking. I edited the transcript to manually “chunk” it before upload, to improve readability (see Chunking the Transcript for more on this). Here is the result.
The Local Life (transcript chunked prior to upload to YouTube)
Beyond the basics: best practices for captioning
The method outlined above works for video where there is only one speaker and there are no sound effects, music, or offstage voices. But more work usually needs to be done to make the captions fully accessible. This includes breaking sentences correctly, indicating who is speaking, and correctly identifying music and lyrics. For more information about best practices for captioning online video see this PDF from the Described and Captioned Media Program and the WGBH Media Access Group’s Web page on captioning styles and conventions.
YouTube machine transcription
Normally, when you want to create closed captions for a lecture, speech, or some other video, you need to have someone manually create a complete text transcript that can be processed further to create a closed caption file.
You can now generate a transcript automatically in YouTube. Just upload your video file, and after it’s been processed, click on the Captions and Subtitles tab, and then click on the “Request Processing” button as shown here.
YouTube speech-to-text technology can automatically generate a transcript from the audio track of your video. Unfortunately, the conversion success rate can be very disappointing. I tried it on several videos, and to be honest the results were less than impressive from an accuracy standpoint. This means a lot of editing before the transcript is ready to be converted to a closed caption file. Here is an example of automatic machine transcription using the video captioned above.
The Local Life (with YouTube machine-generated transcript)
You can try your own content to see if you get better results. If you do, you can automate the transcription step and combine it with the YouTube automatic captioning process outlined earlier in this blog entry to create a very smooth process for captioning your videos.
YouTube’s Google speech-to-text technology enables auto-captioning and machine transcription, both of which may prove quite useful for many Ensemble Video users.