Audio to Text using OpenAI's Whisper

Thanks to the simplicity of the Call API action, integrating OpenAI models with PROCESIO is something anyone can do. If you don't believe us, we're going to prove just how easy it is with this brand-new use case.

At the bottom of this article you'll find:

  • an import file containing this ready-to-use use case.
  • a note that, following the import, the credentials will need to be configured.

Scenario:

Let's say you work for a company that specializes in providing accessibility services for people with hearing impairments. You want to generate transcripts from audio files such as podcasts or radio theater so that anyone who suffers from hearing loss or similar impairments can enjoy this type of content.

We're going to use the Whisper model from OpenAI to generate transcripts from audio files. First, we retrieve the audio file from our Google Drive and then we generate the transcript.

The audio file that will be used in our example is an MP3 recording from Marian.

Retrieve the file from Google Drive

We'll start by creating the Credential to access Google Drive.

Notice that the file is publicly accessible, so no authentication is required to download it.

We define some variables to be used in our flow:

  • FileURL [String] containing the audio file location.
    • To create the file URL for your own files, you can follow this tutorial.
  • Endpoint [String] representing the Google Drive endpoint to download from.
  • DownloadStatusCode [Integer] representing the status code of the download request.
  • AudioFile [File] representing the downloaded audio file.

Then we build the actual flow:

  • We extract the Endpoint by removing the base URL from FileURL using String Replace.
  • We download the file using Call API with our Google Drive credential.

For Call API we will have to use the following headers:

  • Content-Type ➜ application/force-download
  • Content-Disposition ➜ attachment
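
Outside of PROCESIO, these two steps boil down to a plain HTTP GET with those headers. The sketch below (Python, using the requests library) only illustrates what the String Replace and Call API actions do here; the base URL and file ID are hypothetical placeholders, not values taken from this use case.

    import requests

    # Hypothetical values -- build your own file URL as described in the tutorial linked above.
    BASE_URL = "https://drive.google.com"
    file_url = "https://drive.google.com/uc?export=download&id=YOUR_FILE_ID"

    # "String Replace": strip the base URL to obtain the relative endpoint.
    endpoint = file_url.replace(BASE_URL, "")

    # "Call API": download the file with the headers used in the flow.
    response = requests.get(
        BASE_URL + endpoint,
        headers={
            "Content-Type": "application/force-download",
            "Content-Disposition": "attachment",
        },
    )

    download_status_code = response.status_code  # maps to DownloadStatusCode
    audio_bytes = response.content               # maps to AudioFile (raw MP3 content)

    with open("recording.mp3", "wb") as f:
        f.write(audio_bytes)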

Generate the transcript

We create the Credential for OpenAI.

Make sure to use your own API Key instead of $OPENAI_API_KEY when configuring the credential.
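
For reference, the credential does nothing more than attach your key as a bearer token to every request it is used with; in plain HTTP terms the header looks roughly like this (the value shown is a placeholder, not a real key):

    # Header added by the OpenAI credential on each Call API request.
    headers = {"Authorization": "Bearer $OPENAI_API_KEY"}  # substitute your own API key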

We define some variables to be used in our flow:

  • FileURL [String] containing the audio file location.
  • AudioFile [File] representing the downloaded audio file.
  • queryResponse [Json] holding the model's response.
  • queryStatusCode [Integer] representing the model response status code.
  • Transcript [String] containing the transcript for the audio file.

Then we build the actual flow:

  • We download the audio file by using Call Subprocess with our first process.
  • We query the Whisper model using Call API with a form-data body.
  • We extract the transcript from the model's response using Json Mapper.

For Call API we will use form-data:

  • file [File] ➜ insert AudioFile variable
  • model [Text] ➜ whisper-1
  • response_format [Text] ➜ json



Make sure to select the right type (File or Text) when using Call API with form data.
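
For comparison, here is roughly what this transcription step looks like as a direct multipart request to OpenAI's /v1/audio/transcriptions endpoint, followed by the Json Mapper step of pulling the text field out of the JSON response. This is only a sketch of what the flow does behind the scenes; the file name and API key are placeholders.

    import requests

    OPENAI_API_KEY = "YOUR_API_KEY"  # placeholder -- use your own key

    # "Call API" with form-data: the audio file plus the model and response_format fields.
    with open("recording.mp3", "rb") as audio_file:
        response = requests.post(
            "https://api.openai.com/v1/audio/transcriptions",
            headers={"Authorization": f"Bearer {OPENAI_API_KEY}"},
            files={"file": ("recording.mp3", audio_file, "audio/mpeg")},
            data={"model": "whisper-1", "response_format": "json"},
        )

    query_status_code = response.status_code  # maps to queryStatusCode
    query_response = response.json()          # maps to queryResponse

    # "Json Mapper": with response_format=json, the payload contains a single "text" field.
    transcript = query_response["text"]       # maps to Transcript
    print(transcript)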

Transcript

Hello, hello, smalllings. This is Marian from Procesio, the technology that uncomplicates your automation life. I want to say you rock! And thank you for trusting and being with us for two days already. Don't forget, Procesio is a proven technology with use cases at enterprise level. So, if you have use cases that you want to discuss or just need help, join our Discord community and we will be more than happy to help. Happy automation with Procesio!

Import File

Use the .procesio file below to import this use case directly into one of your workspaces (feel free to create a new workspace dedicated to this example).