Demos

Audio to Text using OpenAI's Whisper

Integrating OpenAI models with PROCESIO is something anyone can do, thanks to the simplicity of the Call API action. If you don't believe us, we're going to prove how easy it is with this brand-new use case.

At the bottom of this article you'll be able to find:

  • an import file containing this ready-to-use use case.
  • Note that, following the import, the credentials will need to be configured.

Scenario:

Let's say you work for a company that specializes in providing accessibility services for people with hearing impairments. You want to generate transcripts from audio files such as podcasts or radio theater so that anyone who suffers from hearing loss or similar impairments can enjoy this type of content.

We're going to use the Whisper model from OpenAI to generate transcripts from audio files. First, we retrieve the audio file from our Google Drive and then we generate the transcript.

The audio file that will be used in our example is an MP3 recording from Marian.


Retrieve the file from Google Drive

We'll start by creating the Credential to access Google Drive.


Notice that the file is publicly accessible, so no authentication is required to download it.

We define some variables to be used in our flow:

  • FileURL ➜ String containing the audio file location.
    • To create the file URL for your own files, you can follow this tutorial.
  • Endpoint ➜ String representing the Google Drive endpoint to download from.
  • DownloadStatusCode ➜ Integer representing the status code of the download request.
  • AudioFile ➜ File representing the downloaded audio file.

Then we build the actual flow:

  • We extract the Endpoint by removing the base URL from FileURL using String Replace.
  • We download the file using Call API with our Google Drive credential.

For Call API we will have to use the following headers (a rough equivalent request is sketched after this list):

  • Content-Type ➜ application/force-download
  • Content-Disposition ➜ attachment
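
Outside of PROCESIO, the same download step can be sketched in a few lines of Python. This is only an illustration of the logic described above: the base URL and file ID below are hypothetical placeholders, not values taken from the actual flow or credential.

    import requests

    # Hypothetical values -- replace with your own FileURL and base URL.
    base_url = "https://drive.google.com"
    file_url = base_url + "/uc?export=download&id=YOUR_FILE_ID"   # FileURL

    # "String Replace" step: remove the base URL from FileURL to obtain the Endpoint.
    endpoint = file_url.replace(base_url, "")

    # "Call API" step: download the file using the two headers listed above.
    headers = {
        "Content-Type": "application/force-download",
        "Content-Disposition": "attachment",
    }
    response = requests.get(base_url + endpoint, headers=headers)

    download_status_code = response.status_code   # DownloadStatusCode
    audio_file = response.content                 # AudioFile (raw bytes)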

Generate the transcript

We create the Credential for OpenAI.


Make sure to use your own API Key instead of $OPENAI_API_KEY when configuring the credential.

We define some variables to be used in our flow:

  • FileURL ➜ String containing the audio file location.
  • AudioFile ➜ File representing the downloaded audio file.
  • queryResponse ➜ Json holding the model's response.
  • queryStatusCode ➜ Integer representing the model response status code.
  • Transcript ➜ String containing the transcript for the audio file.

Then we build the actual flow:

  • We download the audio file by using Call Subprocess with our first process.
  • We query the Whisper model using Call API with a form-data body.
  • We extract the transcript from the model's response using Json Mapper.

For Call API we will use form-data (see the sketch below):

  • file [File] ➜ insert AudioFile variable
  • model [Text] ➜ whisper-1
  • response_format [Text] ➜ json

Make sure to select the right type (File or Text) when using Call API with form data.
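
For reference, here is a minimal sketch of the equivalent request made directly against OpenAI's transcription endpoint in Python. The local file name stands in for the AudioFile variable downloaded in the first process, and the API key placeholder must be replaced with your own key, as noted above.

    import requests

    OPENAI_API_KEY = "YOUR_API_KEY"   # use your own key instead of $OPENAI_API_KEY

    # The local MP3 stands in for the AudioFile variable from the first process.
    with open("recording.mp3", "rb") as audio:
        response = requests.post(
            "https://api.openai.com/v1/audio/transcriptions",
            headers={"Authorization": f"Bearer {OPENAI_API_KEY}"},
            files={"file": ("recording.mp3", audio, "audio/mpeg")},   # file [File]
            data={"model": "whisper-1", "response_format": "json"},   # model, response_format [Text]
        )

    query_status_code = response.status_code   # queryStatusCode
    query_response = response.json()           # queryResponse, e.g. {"text": "..."}

    # "Json Mapper" step: extract the transcript from the model's response.
    transcript = query_response["text"]        # Transcript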

Transcript

Hello, hello, smalllings. This is Marian from Procesio, the technology that uncomplicates your automation life. I want to say you rock! And thank you for trusting and being with us for two days already. Don't forget, Procesio is a proven technology with use cases at enterprise level. So, if you have use cases that you want to discuss or just need help, join our Discord community and we will be more than happy to help. Happy automation with Procesio!

Action Pool

The flows in this use case rely on the following actions: String Replace, Call API, Call Subprocess, and Json Mapper.

Import File

Use the .procesio file below to import this use case directly into one of your workspaces (feel free to create a new workspace dedicated to this example).
