Audio to Text using OpenAI's Whisper
Integrating OpenAI models with PROCESIO is something anyone can do due to the simplicity of using the Call API action. If you don't believe us, we're going to prove how easy it is with this brand-new use case.
At the bottom of this article you'll find:
- an import file containing this ready-to-use use case.
- a reminder that, after importing, you'll need to configure your own credentials.
Scenario:
Let's say you work for a company that specializes in providing accessibility services for people with hearing impairments. You want to generate transcripts from audio files such as podcasts or radio theater so that anyone who suffers from hearing loss or similar impairments can enjoy this type of content.
We're going to use the Whisper model from OpenAI to generate transcripts from audio files. First, we retrieve the audio file from our Google Drive and then we generate the transcript.
The audio file that will be used in our example is an MP3 recording from Marian.
We'll start by creating the Credential to access Google Drive.
Notice that the file is publicly accessible, so no authentication is required to download it.
We define some variables to be used in our first flow:
- FileURL – String containing the audio file location.
- Endpoint – String representing the Google Drive endpoint to download from.
- DownloadStatusCode – Integer representing the status code of the download request.
- AudioFile – File representing the downloaded audio file.
Then we build the actual flow:
- We extract the Endpoint by removing the base URL from FileURL using String Replace.
- We download the file using Call API with our Google Drive credential (see the sketch after the header list below).
For Call API we will have to use the following headers:
- Content-Type – application/force-download
- Content-Disposition – attachment
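Outside PROCESIO, these two steps roughly amount to the following minimal Python sketch using the requests library. The FILE_ID placeholder, the base URL split, and the local file name are illustrative assumptions; substitute your own publicly shared Google Drive link.

```python
import requests

# Illustrative values -- replace with your own publicly shared file link.
FileURL = "https://drive.google.com/uc?export=download&id=<FILE_ID>"
BaseURL = "https://drive.google.com"

# String Replace step: strip the base URL from FileURL to obtain Endpoint.
Endpoint = FileURL.replace(BaseURL, "")

# Call API step: download the file using the headers listed above.
response = requests.get(
    BaseURL + Endpoint,
    headers={
        "Content-Type": "application/force-download",
        "Content-Disposition": "attachment",
    },
)
DownloadStatusCode = response.status_code
response.raise_for_status()

# AudioFile: persist the downloaded bytes locally.
with open("audio.mp3", "wb") as audio_file:
    audio_file.write(response.content)
```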
We create the Credential for OpenAI.
Make sure to use your own API Key instead of $OPENAI_API_KEY when configuring the credential.
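For reference, the OpenAI credential ultimately resolves to a Bearer token sent in the Authorization header. Here is a minimal sketch, assuming the key is exported as the OPENAI_API_KEY environment variable:

```python
import os
import requests

# The credential boils down to an Authorization: Bearer <API key> header.
headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

# Quick sanity check that the key is valid: list the models you have access to.
response = requests.get("https://api.openai.com/v1/models", headers=headers)
response.raise_for_status()
print([m["id"] for m in response.json()["data"]])
```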
We define some variables to be used in our second flow:
- FileURL – String containing the audio file location.
- AudioFile – File representing the downloaded audio file.
- queryResponse – Json holding the model's response.
- queryStatusCode – Integer representing the model response status code.
- Transcript – String containing the transcript for the audio file.
Then we build the actual flow:
- We download the audio file by using Call Subprocess with our first process.
- We query the Whisper model using Call API with a form-data body.
- We extract the transcript from the model's response using Json Mapper (both steps are sketched in code below).
For Call API we will use form-data:
- file [File] – insert AudioFile variable
- model [Text] – whisper-1
- response_format [Text] – json
Make sure to select the right type (File or Text) when using Call API with form data.
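For reference, here is a minimal Python sketch of the Call API and Json Mapper steps, assuming the audio file produced by the first process was saved locally as audio.mp3 (the file name is illustrative) and the API key is exported as OPENAI_API_KEY:

```python
import os
import requests

headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

# Call API step: POST the audio as multipart/form-data to the Whisper endpoint.
with open("audio.mp3", "rb") as audio_file:
    response = requests.post(
        "https://api.openai.com/v1/audio/transcriptions",
        headers=headers,
        files={"file": ("audio.mp3", audio_file, "audio/mpeg")},   # file [File]
        data={"model": "whisper-1", "response_format": "json"},    # model / response_format [Text]
    )

queryStatusCode = response.status_code
response.raise_for_status()
queryResponse = response.json()

# Json Mapper step: the transcript lives in the "text" field of the JSON response.
Transcript = queryResponse["text"]
print(Transcript)
```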
Transcript
Action Pool
Import File
Use the .procesio file below to import this use case directly into one of your workspaces (feel free to create a new workspace dedicated to this example).