Using AWS Transcribe in CFML: Checking for Job Completion
Posted 28 September 2018
We’ve started a Transcribe job; now we need to see whether it’s done. Transcribe jobs can take anywhere from seconds to hours, depending on the length of the source media, and they are never completed immediately. As such, you’ll need to approach this asynchronously.
Given that ColdFusion lacks the niceties of Promises found in a language like JavaScript, and only recently added support for Java Futures in ColdFusion 2018, we need to think of ColdFusion as the executor and checker of asynchronous jobs in AWS. In the previous post, we started a job and put a reference to that job name in the application scope. (In a real production application, you’d want to save the job name to a database or some other form of persistence.)
Now we can check on the status of the job.
The Code to Retrieve Job Status for a Transcribe Job
We’ll again follow our basic pattern when working with the AWS Java SDK:
- Get a copy of the client that’s making a connection to the service you want to use.
- Create a “request” object.
- Fill the “request” object with the parameters (or other objects) you need to supply.
- Tell the client to make the request.
- Get back a “response” object.
Here are the steps to checking the status of a Transcribe job via the AWS Java SDK:
- Get a copy of the Transcribe client we created in the first part of this series.
- Create a “GetTranscriptionJobRequest” object.
- Set the job name in the GetTranscriptionJobRequest object.
- Tell the client to run the GetTranscriptionJobRequest.
- Get back a GetTranscriptionJobResult object.
Here’s how we do this in the AWSPlaybox app:
If you haven’t already read the entry on the basic setup needed to access AWS from CFML, please do so now.
As there are three basic actions when working with Transcribe jobs, I’ve broken each of them out into a separate code block in /transcribe.cfm. The second block, containing the code to check on a Transcribe job, runs when a URL.checkTranscribeJob value is passed to the page.
Note that you can only pass a URL.checkTranscribeJob value if you’ve already started a Transcribe job in the AWSPlaybox application.
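A minimal sketch of that kind of guard, assuming the job names you started were persisted somewhere you can read back as an array. The function name isKnownTranscribeJob and the startedJobs argument are hypothetical, not part of the AWSPlaybox code. The idea is simply to avoid asking AWS about a job name this application never started.

```cfml
<cfscript>
// Returns true only when jobName is one this application started.
// Hypothetical: startedJobs stands in for wherever you persisted job names
// (for example, an array kept in the application scope).
function isKnownTranscribeJob(required string jobName, required array startedJobs) {
    // Reject blank or whitespace-only values outright
    if (!len(trim(arguments.jobName))) {
        return false;
    }
    // arrayFind is case-sensitive and returns 0 when there is no match
    return arrayFind(arguments.startedJobs, arguments.jobName) > 0;
}
</cfscript>
```

In /transcribe.cfm this might be combined with structKeyExists(URL, "checkTranscribeJob") before any call to AWS is made.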
If you continue on in that code block in /transcribe.cfm, you’ll see the five steps listed above translated into code.
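Here is a minimal sketch of those five steps in cfscript, assuming the AWS SDK for Java (v1) jars are on the ColdFusion classpath. It is not the AWSPlaybox code itself: the function name getTranscribeJob is hypothetical, and step 1 (getting the Transcribe client created earlier in the series) is left to the caller.

```cfml
<cfscript>
// Steps 2 to 5 of checking a Transcribe job, assuming the AWS SDK for Java (v1).
// Step 1 is the caller's job: pass in the AmazonTranscribe client built earlier.
function getTranscribeJob(required any transcribeClient, required string jobName) {
    // Step 2: create a GetTranscriptionJobRequest object
    var getJobRequest = createObject(
        "java",
        "com.amazonaws.services.transcribe.model.GetTranscriptionJobRequest"
    ).init();

    // Step 3: set the job name on the request
    getJobRequest.setTranscriptionJobName(arguments.jobName);

    // Steps 4 and 5: the client makes the request and hands back a GetTranscriptionJobResult
    return arguments.transcribeClient.getTranscriptionJob(getJobRequest);
}
</cfscript>
```

A call might look like getJobResult = getTranscribeJob(application.transcribeClient, URL.checkTranscribeJob), where application.transcribeClient is a hypothetical place you might have cached the client.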
Processing the GetTranscriptionJobResult
The GetTranscriptionJobResult contains a property called, confusingly, TranscriptionJob. It’s this object that holds the actual status information about the job in question. Key properties of this object are the TranscriptionJobName, CreationTime, Media (which holds the MediaFileUri), and TranscriptionJobStatus.
The appropriate code is in /transcribe.cfm.
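As a sketch of reading those properties, the hypothetical helper below copies them out of a GetTranscriptionJobResult into a plain CFML struct. It assumes the AWS SDK for Java (v1) getter names; the struct keys are my own choice.

```cfml
<cfscript>
// Copies the key properties of a GetTranscriptionJobResult into a plain struct.
// Assumes the AWS SDK for Java (v1) getter names.
function summarizeTranscriptionJob(required any getJobResult) {
    // The result wraps the job details in a TranscriptionJob object
    var job = arguments.getJobResult.getTranscriptionJob();
    // The media file URI lives on the nested Media object;
    // the status is one of IN_PROGRESS, COMPLETED or FAILED
    return {
        jobName = job.getTranscriptionJobName(),
        creationTime = job.getCreationTime(),
        mediaFileUri = job.getMedia().getMediaFileUri(),
        status = job.getTranscriptionJobStatus()
    };
}
</cfscript>
```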
The job status, returned by the TranscriptionJob object’s getTranscriptionJobStatus() method, can be one of three values: COMPLETED, FAILED, or IN_PROGRESS. If the job is still in progress, there’s nothing we can do but wait and make another GetTranscriptionJobRequest later.
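The decision to keep waiting can be isolated in a tiny helper. This is a sketch with a hypothetical name: COMPLETED and FAILED are treated as final, and any other status (including IN_PROGRESS) means "check again later".

```cfml
<cfscript>
// Returns true when a Transcribe job has reached a final state.
// Anything other than COMPLETED or FAILED is treated as "still running".
function isTranscribeJobFinished(required string status) {
    return listFindNoCase("COMPLETED,FAILED", arguments.status) > 0;
}
</cfscript>
```

Treating unrecognized values as unfinished errs on the side of polling again rather than discarding a job name too early.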
If the status is COMPLETED, we can get more information, including the job completion time and, most critically, the URI of the transcript itself.
Note the setting of the deleteJob flag. This is only done to remove completed jobs from our list of currently running jobs in the application. It doesn’t do anything to the actual job on AWS.
If the status is FAILED, Transcribe does give us a failure reason. If the job has failed, we should also remove it from our list of currently running jobs in the application.
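A sketch of handling both final states, assuming AWS SDK for Java (v1) getter names. The helper name processTranscriptionJob and the outcome keys are hypothetical; deleteJob mirrors the flag described above. In Adobe ColdFusion, arrays passed to a function are copied by default, so the helper only returns the flag and leaves removing the job name from the shared list to the caller.

```cfml
<cfscript>
// Reads what we need from a TranscriptionJob once it has reached a final state.
// Hypothetical helper; assumes AWS SDK for Java (v1) getter names.
function processTranscriptionJob(required any transcriptionJob) {
    var outcome = {
        deleteJob = false,
        completionTime = "",
        transcriptUri = "",
        failureReason = ""
    };
    var status = arguments.transcriptionJob.getTranscriptionJobStatus();

    if (status == "COMPLETED") {
        outcome.completionTime = arguments.transcriptionJob.getCompletionTime();
        // Most critically, where the finished transcript can be fetched from
        outcome.transcriptUri = arguments.transcriptionJob.getTranscript().getTranscriptFileUri();
        outcome.deleteJob = true;
    } else if (status == "FAILED") {
        outcome.failureReason = arguments.transcriptionJob.getFailureReason();
        outcome.deleteJob = true;
    }
    // IN_PROGRESS (or anything else) leaves deleteJob false, so we check again later
    return outcome;
}

// Usage sketch (application.transcribeJobs is a hypothetical name). Removing the
// name only affects our local list, not the job on AWS. Lock the shared scope:
// outcome = processTranscriptionJob(getJobResult.getTranscriptionJob());
// if (outcome.deleteJob) {
//     lock scope="application" type="exclusive" timeout="5" {
//         arrayDelete(application.transcribeJobs, URL.checkTranscribeJob);
//     }
// }
</cfscript>
```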
Finally, we output data on the completed (or failed) job.
If the job is completed, we construct a URL pointing directly to the full transcript job output and give a warning about the five-minute expiry of that URL. Why is there a time limit on that URL? We’ll cover that in the next blog post.
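A sketch of that output step, assuming an outcome struct with hypothetical transcriptUri and failureReason keys (empty strings when not applicable). Both values come from AWS, so they are encoded before being written into HTML.

```cfml
<cfscript>
// Builds HTML describing a finished job.
// Hypothetical: expects a struct with transcriptUri and failureReason keys.
function renderJobOutcome(required struct outcome) {
    var html = "";
    if (len(arguments.outcome.transcriptUri)) {
        // Encode the transcript URL before placing it in an HTML attribute
        html &= '<p><a href="' & encodeForHTMLAttribute(arguments.outcome.transcriptUri)
            & '">View the full transcript</a></p>';
        html &= "<p>Note: this link expires after five minutes.</p>";
    } else if (len(arguments.outcome.failureReason)) {
        html &= "<p>Job failed: " & encodeForHTML(arguments.outcome.failureReason) & "</p>";
    }
    return html;
}
</cfscript>
```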