In today’s post, we’ll do the fun part: putting everything together to get our project into action. We’ll upload our code to S3, create our Lambda function, and then create an event subscription that triggers the function whenever an mp4 file is uploaded to the bucket. As we’re already aware, this should then begin processing the file and create a transcription using Transcribe.
Uploading our Code to S3
When creating a Lambda function that runs Go code, we need to provide a zipped file of the Go code. This can either be carried out via an upload of the file on your development system, or alternatively by uploading the zip file to S3 and providing the location information.
Of the two options, the latter is the most flexible for us since we can update our code and upload a new zip file to this location without needing to change any configuration.
Note that I’m doing these steps on macOS using Bash. On Windows systems you may need slightly different commands.
Ensure you’re using the current GitHub release
Change the current directory to ./src/transcribe within the project folder.
The Lambda code runs on Linux, so when compiling we need to tell the compiler this (via GOOS=linux). We also specify the output file name, main:
GOOS=linux go build -o main
Now, we can create the zip file
zip main.zip main
An inspection of the directory contents should show a new file, main, and its zip archive, main.zip.
Uploading the File
../upload/upload -bucket "aws-lambda-go-transcribe2srt" -filename main.zip
Now, if you login to the AWS console, and take a look at the S3 bucket, main.zip will be there. Click on the file to get its properties and copy the URL under Link into the clipboard. We’re going to be using this in the next step.
Creating the Lambda Function
From the AWS console:
Click Services on the black bar, and then Lambda
Click Create Function
Now, in the Author from scratch section, enter the following values.
- Name : transcribe
- Runtime: Go 1.x
- Role: Create new role from template(s)
- Role name: transcribe_role
Click Create function
The Designer window appears, featuring the transcribe function, with a role already defined to allow access to CloudWatch.
Next, we tell Lambda where to get the code.
In the Function code section, select:
- Code entry type : Upload a file from Amazon S3
- Runtime: Go 1.x
- S3 link URL : <paste the link that you copied into the clipboard in the previous steps>
- Handler : main
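The Handler value names the compiled binary that Lambda runs. Inside that binary, the function registered via lambda.Start receives an S3 event describing the uploaded object. The shape of that event can be sketched with stdlib JSON parsing alone; the trimmed struct and the firstObject helper below are mine, not part of the project (the full types live in the aws-lambda-go events package):

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// Minimal subset of the S3 event notification that Lambda hands to the
// function (the full types live in the aws-lambda-go "events" package).
type s3Event struct {
	Records []struct {
		S3 struct {
			Bucket struct {
				Name string `json:"name"`
			} `json:"bucket"`
			Object struct {
				Key string `json:"key"`
			} `json:"object"`
		} `json:"s3"`
	} `json:"Records"`
}

// firstObject pulls the bucket and key of the first record out of a raw
// S3 event payload. Helper name is mine, for illustration only.
func firstObject(payload []byte) (bucket, key string, err error) {
	var ev s3Event
	if err = json.Unmarshal(payload, &ev); err != nil {
		return "", "", err
	}
	if len(ev.Records) == 0 {
		return "", "", fmt.Errorf("no records in event")
	}
	return ev.Records[0].S3.Bucket.Name, ev.Records[0].S3.Object.Key, nil
}

func main() {
	// A sample payload, trimmed to the fields we care about.
	payload := []byte(`{"Records":[{"s3":{"bucket":{"name":"aws-lambda-go-transcribe2srt"},"object":{"key":"movie.mp4"}}}]}`)
	b, k, err := firstObject(payload)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("bucket=%s key=%s\n", b, k)
}
```

This bucket/key pair is exactly what the function needs to hand to Transcribe when starting a job.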
Create an S3 Trigger
Now that the basic function is in place, we need to configure it for our specific needs. As previously mentioned, we want it to run when a new .mp4 file is created in our S3 bucket.
On the left hand side, click S3
This will add it onto the console, and a Configure triggers dialog will appear.
Change Suffix to .mp4 and select Add
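Behind the console dialog, S3 stores this as a bucket notification configuration. The equivalent JSON looks roughly like the sketch below (the region and account-id placeholders would be your own):

```json
{
  "LambdaFunctionConfigurations": [
    {
      "LambdaFunctionArn": "arn:aws:lambda:<region>:<account-id>:function:transcribe",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {
        "Key": {
          "FilterRules": [
            { "Name": "suffix", "Value": ".mp4" }
          ]
        }
      }
    }
  ]
}
```

The suffix filter is what stops the function firing on every upload; without it, the Transcribe output files landing in the bucket could themselves trigger further invocations.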
Give the Lambda function access to Transcribe
We need to give transcribe_role permission to access Amazon Transcribe in addition to S3 and CloudWatch.
- Click Services, IAM
- Click Roles
- Click transcribe_role
- Click Attach policies
- Find and put a check next to AmazonTranscribeFullAccess
- Attach Policy
With this complete, the policy is now attached to the role.
A return to the Lambda function will also now show Amazon Transcribe on the right hand side, indicating it has permission to access this service.
Upload our movie
We’re ready to test the functionality out! Let’s find an mp4 video file and upload it. In my case, I’ve a file, movie.mp4, which is going to be used.
As before, you can choose either to manually upload the file via the AWS console, via the AWS CLI, or using the Upload program we created earlier.
../upload/upload -bucket "aws-lambda-go-transcribe2srt" -filename /movie.mp4
The function should kick in pretty much as soon as the file has finished copying to S3. Let’s have a look from the console by going to Machine Learning, Amazon Transcribe.
And it’s there. We’ve now got an end-to-end mp4 to transcript file.
In this post, we’ve created our Go package, uploaded it to S3, set up the Lambda function, made an event subscription for when an MP4 file is uploaded to our bucket, configured the role associated with the function to allow it to use Transcribe, and verified its operation.
At this point, there are further steps we could consider.
- During the time of putting this series of blogs together, AWS added CloudWatch events for Transcribe. We could write another Lambda function that runs when a Transcribe job-completed or job-failed event occurs. Using this, we could notify ourselves when a job has completed, or even do something like converting the output to .srt format.
- Add an endpoint and some code to allow us to query the status of one of the jobs.
- We could even look into using a completely different way of getting a file into S3, such as passing a link to an MP4 file on a website and getting the Lambda function to download the file and store directly in S3 prior to creating the job.
- Via the above, we could also look at adding in additional event sources, such as via API Gateway.
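On the .srt idea above: Transcribe’s output JSON gives word timings as offsets in seconds, while SRT wants HH:MM:SS,mmm timestamps. The core of that conversion can be sketched in a few lines of Go (function name is mine, for illustration only):

```go
package main

import (
	"fmt"
	"math"
)

// srtTimestamp converts a time offset in seconds (as found in
// Transcribe's output JSON) to the HH:MM:SS,mmm form used by .srt files.
func srtTimestamp(seconds float64) string {
	ms := int(math.Round(seconds * 1000)) // work in whole milliseconds
	h := ms / 3600000
	m := ms / 60000 % 60
	s := ms / 1000 % 60
	mil := ms % 1000
	return fmt.Sprintf("%02d:%02d:%02d,%03d", h, m, s, mil)
}

func main() {
	// A single subtitle cue as it would appear in a .srt file.
	fmt.Println(1)
	fmt.Printf("%s --> %s\n", srtTimestamp(3.42), srtTimestamp(5.9))
	fmt.Println("Hello from Transcribe")
}
```

Grouping words into cues and numbering them is the rest of the job, but the timestamp format tends to be where off-by-a-millisecond bugs creep in.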
There’s lots and lots of possibilities, and a forthcoming blog series will cover one or more of these.
Thanks for reading! Feedback always welcome. 🙂