When Marvin Gaye Met Amazon Transcribe & PowerShell – Automating Subtitle Creation – Part I

“I Heard it Through the Grape Van”

It’s been a while since my last blog, so I had to try and think of something a bit more eye-catching than the previous ones for the title. 🙂  That said, the title and heading are actually very accurate… #mysterious

This set of posts covers how one of the AWS services, Transcribe, can be used, in this case in combination with PowerShell, to create a subtitle file for any video, which can then be used when viewing it. There’s quite a bit of content so, as the title suggests, it’s being split across several posts.

Today’s post provides a background to the main parts that will be used. These are two AWS services (Amazon S3 and Amazon Transcribe), a subtitle file, AWS Tools for PowerShell Core Edition, and a video of the legend himself, Marvin Gaye.

Amazon Transcribe/S3

Amongst the plethora of services AWS offer is Transcribe, or to be more precise, Amazon Transcribe. Part of AWS’s group of Machine Learning offerings, Transcribe’s role is fairly straightforward. Feed it a supported media file (FLAC, MP3, MP4 or WAV) from a bucket on S3 and it will process the file, endeavoring to provide the best transcription of it that it can. Upon successful completion of a job, a JSON-formatted file becomes available for download.

The file itself begins with a summary of the conversion.

This is followed by a breakdown of the job, item by item. Each item is either data about the next word identified (start and end time, the word, a ‘confidence’ rating from the service that it has correctly identified the word, and its classification), or a punctuation mark where the service has found an appropriate place for one.
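To make that layout a bit more concrete, here’s a rough sketch of reading such a file. It’s in Python purely for illustration (the script built in the later parts is PowerShell), and the sample is hand-written with invented values rather than real job output. The function name summarise_items is my own and not part of any AWS tooling.

```python
import json

# Trimmed, hand-written sample in the shape of a Transcribe result file.
# All values are invented for illustration; a real file is much longer.
SAMPLE = """
{
  "jobName": "example-job",
  "accountId": "123456789012",
  "results": {
    "transcripts": [{"transcript": "So we announced,"}],
    "items": [
      {"start_time": "150.268", "end_time": "150.45",
       "alternatives": [{"confidence": "1.0", "content": "So"}],
       "type": "pronunciation"},
      {"start_time": "150.45", "end_time": "150.6",
       "alternatives": [{"confidence": "0.98", "content": "we"}],
       "type": "pronunciation"},
      {"start_time": "150.6", "end_time": "151.2",
       "alternatives": [{"confidence": "0.95", "content": "announced"}],
       "type": "pronunciation"},
      {"alternatives": [{"confidence": "0.0", "content": ","}],
       "type": "punctuation"}
    ]
  },
  "status": "COMPLETED"
}
"""


def summarise_items(raw_json):
    """Return (type, text, start, end) tuples, one per item.

    Words carry start/end times in seconds (stored as strings in the file);
    punctuation items have no timing, so their start/end come back as None.
    """
    rows = []
    for item in json.loads(raw_json)["results"]["items"]:
        best = item["alternatives"][0]  # first alternative offered for the item
        start = float(item["start_time"]) if "start_time" in item else None
        end = float(item["end_time"]) if "end_time" in item else None
        rows.append((item["type"], best["content"], start, end))
    return rows


if __name__ == "__main__":
    for row in summarise_items(SAMPLE):
        print(row)
```

The thing to take away is that times are only present on the word items, which matters later when working out when a subtitle should start and stop.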

Unlike the other formats supported, MP4 can (and usually does) contain one or more streams in addition to the audio. Typically this will be video content, but it might also include additional streams for other audio (think different languages, or director’s and producer’s commentary, for example) or subtitles.

Subtitle Files

At their core, subtitle files simply contain textual descriptions of the content of their accompanying video file. This is typically dialogue, but can also include other notifications, such as the type of music being played, or other intonations. Accompanying these are timespan indicators, which are used to match this information up with the video content.

The most common file format in use is the SubRip format, better recognised by its extension of SRT. These files are arranged in a manner similar to below:

00:02:30,268 --> 00:02:41,958
So we announced, transcribed and translated
reinvent in december last year in las vegas.

Line by line, each entry consists of:
  • The numeric counter identifying each sequential subtitle (not shown in the excerpt above)
  • Start and end time for the subtitle to be visible, separated by the --> marker
  • The text itself, typically between one and two lines, and ideally restricted to a set number of characters per line
  • A blank line indicating the end of this sequence.
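As a small illustration of that layout, here’s a sketch (in Python, just for illustration) that builds a single entry from a start and end time given in seconds. SubRip timestamps take the form HH:MM:SS,mmm, with a comma before the milliseconds. The function names srt_timestamp and srt_entry are made up for this example.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def srt_entry(counter, start, end, lines):
    """Build one entry: counter, time range, text lines, then a blank line."""
    time_range = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
    return "\n".join([str(counter), time_range, *lines]) + "\n\n"


if __name__ == "__main__":
    # The counter value here is arbitrary; in a real file it counts up from 1.
    print(srt_entry(1, 150.268, 161.958, [
        "So we announced, transcribed and translated",
        "reinvent in december last year in las vegas.",
    ]), end="")
```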

Looking at the two different forms of text data, in Transcribe and SRT format respectively, you’ll probably have already noticed that the former contains enough information to allow, with a bit of transformation, the output to be in the latter’s format.
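To give a feel for the kind of transformation involved, here’s one possible way of grouping Transcribe-style items into timed chunks of text. It’s a Python sketch for illustration only, not the approach the later parts take. The grouping rule (start a new chunk once the text or its duration passes a limit) and the names group_items, max_chars and max_seconds are assumptions of mine.

```python
def group_items(items, max_chars=64, max_seconds=5.0):
    """Group Transcribe-style items into (start, end, text) tuples.

    Assumed rule: a new group starts when adding the next word would push
    the text past max_chars or the group's duration past max_seconds.
    """
    cues = []
    words, start, end = [], None, None
    for item in items:
        content = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # Punctuation has no timing, so attach it to the previous word.
            if words:
                words[-1] += content
            continue
        item_start = float(item["start_time"])
        item_end = float(item["end_time"])
        if words:
            text_len = len(" ".join(words + [content]))
            if text_len > max_chars or item_end - start > max_seconds:
                # Close the current group before starting a new one.
                cues.append((start, end, " ".join(words)))
                words, start = [], None
        if start is None:
            start = item_start
        words.append(content)
        end = item_end
    if words:
        cues.append((start, end, " ".join(words)))
    return cues
```

Each tuple then has what a subtitle entry needs: when to appear, when to disappear, and the text to show. Writing them out in SRT layout and splitting the text across lines are separate steps.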

AWS Tools for PowerShell Core Edition

PowerShell Core is Microsoft’s cross-platform implementation of PowerShell and, as such, can run on pretty much any platform that has .NET Core installed. AWS provide a module for it, AWS Tools for PowerShell Core Edition. Consisting of, at present, 4136 cmdlets, it covers pretty much the whole spectrum of services available from the provider. Amongst these is the set for the Transcribe service, ironically numbering only three.

Marvin Gaye

Needing no introduction whatsoever, the posts over the next day or so make use of an MP4 file of the legend singing I Heard it Through the Grapevine a cappella. If you really feel the need to follow along exactly, then it’s fairly straightforward to find and download. It’s most definitely worth a listen in any case if you’ve not heard it already.

With all the background set, part II will kick in properly with getting set up for the script and the beginning of its implementation.


