Using PowerShell and Azure OpenAI Service for Text-to-Speech
In today's digital age, transforming text into natural-sounding speech has a multitude of applications, from enhancing accessibility to creating engaging user experiences. Azure's Text-to-Speech (TTS) service, part of the Azure OpenAI Service, offers robust capabilities to convert text into lifelike speech using advanced neural voice models.
In this tutorial, we will walk through a simple yet powerful PowerShell script that leverages Azure's TTS service to synthesize speech from text. The script will save the output as a WAV file in a configured folder, with each file named using the current date and time to ensure uniqueness.
Prerequisites
Before diving into the script, ensure you have the following:
- Azure Subscription: Access to an Azure subscription with the Azure OpenAI Service enabled.
- OpenAI Resource: An Azure OpenAI resource created in either the North Central US or Sweden Central regions with the tts-1 or tts-1-hd model deployed.
- PowerShell: The Az PowerShell module installed on your system.
This step-by-step guide will help you set up and run the script, enabling you to harness the power of Azure's TTS service effortlessly.
Script Overview
The provided PowerShell script performs the following tasks:
- Authentication: Obtains an access token using your Azure subscription key.
- Text Synthesis: Sends the input text to the Azure TTS API.
- File Management: Saves the synthesized speech as a WAV file, named with the current date and time, in a specified folder.
With this script, you can convert any text into speech and store the results systematically, which makes it a handy tool for presentations and short videos.
# Ensure you have the Az module installed
if (-not (Get-Module -ListAvailable -Name Az)) {
    Install-Module -Name Az -AllowClobber -Force
}
# Set your Azure OpenAI service details
$subscriptionKey = "YOUR_AZURE_SUBSCRIPTION_KEY"
$region = "YOUR_AZURE_REGION" # e.g., "northcentralus" or "swedencentral"
$endpoint = "https://YOUR_ENDPOINT_NAME.api.cognitive.microsoft.com/sts/v1.0/issuetoken"
# Get an access token
$headers = @{
    "Ocp-Apim-Subscription-Key" = $subscriptionKey
}
# The token endpoint returns the token itself as plain text, so the response is used directly
$accessToken = Invoke-RestMethod -Method Post -Uri $endpoint -Headers $headers
# Define the text you want to synthesize
$text = "Hello, this is a sample text using Azure Text-to-Speech."
# Define the TTS API endpoint
$ttsEndpoint = "https://$region.tts.speech.microsoft.com/cognitiveservices/v1"
# Define the request headers and body
$headers = @{
    "Authorization" = "Bearer $accessToken"
    "Content-Type" = "application/ssml+xml"
    "X-Microsoft-OutputFormat" = "riff-24khz-16bit-mono-pcm"
}
$body = @"
<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'>
<voice name='en-US-GuyNeural'>$text</voice>
</speak>
"@
# Set the output folder
$outputFolder = "C:\Path\To\Your\Desired\Folder"
# Ensure the output folder exists
if (-not (Test-Path -Path $outputFolder)) {
    # Out-Null keeps the new directory object out of the script's output
    New-Item -ItemType Directory -Path $outputFolder | Out-Null
}
# Generate the filename with current date and time
$currentDateTime = Get-Date -Format "yyyy-MM-dd_HH-mm-ss"
$outputWav = Join-Path -Path $outputFolder -ChildPath "$currentDateTime.wav"
# Send the request to the TTS API; -OutFile writes the audio to disk and returns nothing
Invoke-RestMethod -Method Post -Uri $ttsEndpoint -Headers $headers -Body $body -OutFile $outputWav
Write-Output "Speech synthesis complete. Output saved to $outputWav."
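The script places $text directly inside the SSML document, so input containing characters such as & or < would produce invalid XML. The following is a minimal sketch of one way to guard against that, assuming the same voice and SSML layout as the script above. The function name New-SsmlBody and its parameters are hypothetical and not part of the original script.

```powershell
# Hypothetical helper: builds the SSML request body with the text XML-escaped.
function New-SsmlBody {
    param(
        [Parameter(Mandatory)] [string] $Text,
        [string] $VoiceName = 'en-US-GuyNeural',
        [string] $Language  = 'en-US'
    )

    # SecurityElement.Escape replaces <, >, &, " and ' with XML entities.
    $escapedText = [System.Security.SecurityElement]::Escape($Text)

    # The closing "@ of a here-string must start at the beginning of its line.
    @"
<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='$Language'>
<voice name='$VoiceName'>$escapedText</voice>
</speak>
"@
}

# Example: the ampersand and the less-than sign are escaped before sending.
$body = New-SsmlBody -Text 'Terms & conditions apply when x < 10.'
```

The resulting $body can be passed to Invoke-RestMethod in place of the here-string used in the script.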
Instructions:
- Replace placeholders:
  - Replace "YOUR_AZURE_SUBSCRIPTION_KEY" with your actual Azure subscription key.
  - Replace "YOUR_AZURE_REGION" with the region where your Azure OpenAI resource is deployed (e.g., northcentralus or swedencentral).
  - Replace "https://YOUR_ENDPOINT_NAME.api.cognitive.microsoft.com/sts/v1.0/issuetoken" with the correct endpoint URL for your Azure OpenAI resource.
  - Replace C:\Path\To\Your\Desired\Folder with the path to the folder where you want to save the WAV files.
- Run the script: Save the script to a .ps1 file and run it in PowerShell.
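Instead of pasting the key into the .ps1 file, the placeholder values can be read from environment variables, which keeps the secret out of the saved script. This is only a sketch: the helper name Get-RequiredSetting and the variable names AZURE_TTS_KEY and AZURE_TTS_REGION are assumptions chosen for illustration, not names defined by Azure.

```powershell
# Hypothetical helper: returns the value of an environment variable,
# or throws a descriptive error when it is missing or empty.
function Get-RequiredSetting {
    param([Parameter(Mandatory)] [string] $Name)

    $value = [Environment]::GetEnvironmentVariable($Name)
    if ([string]::IsNullOrWhiteSpace($value)) {
        throw "Environment variable '$Name' is not set."
    }
    return $value
}

# Example usage (assumed variable names; set them in your session first):
#   $env:AZURE_TTS_KEY    = '<your key>'
#   $env:AZURE_TTS_REGION = 'swedencentral'
#   $subscriptionKey = Get-RequiredSetting -Name 'AZURE_TTS_KEY'
#   $region          = Get-RequiredSetting -Name 'AZURE_TTS_REGION'
```

Failing early with a clear message is usually easier to diagnose than an authentication error returned later by the service.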
Notes:
- The script ensures the output folder exists and creates it if it does not.
- The filename is generated from the current date and time (to the second) when the script is run, so filenames are unique as long as two runs do not start within the same second.
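- Ensure you have the necessary permissions and correct configurations in your Azure subscription. The script calls the sts/v1.0/issuetoken and tts.speech.microsoft.com endpoints, so the key and region must belong to a resource that those endpoints accept.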
- The script uses the neural voice model en-US-GuyNeural. You can replace this with another voice if needed. Check the Azure documentation for available voices.
- Make sure your network allows outbound calls to Azure services. If you encounter connectivity issues, verify your firewall and network settings.
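Because the timestamp in the script has one-second resolution, two runs started within the same second would write to the same file. One possible way to make the name more robust is sketched below. The function name Get-UniqueWavPath is hypothetical, and adding milliseconds plus a counter suffix is an illustrative choice rather than part of the original script.

```powershell
# Hypothetical helper: returns a .wav path in $Folder that does not exist yet.
function Get-UniqueWavPath {
    param(
        [Parameter(Mandatory)] [string] $Folder,
        [datetime] $Timestamp = (Get-Date)
    )

    # Millisecond precision makes collisions between quick runs unlikely.
    $baseName  = $Timestamp.ToString('yyyy-MM-dd_HH-mm-ss-fff')
    $candidate = Join-Path -Path $Folder -ChildPath "$baseName.wav"

    # If the name is still taken, append a counter until a free name is found.
    $counter = 1
    while (Test-Path -LiteralPath $candidate) {
        $candidate = Join-Path -Path $Folder -ChildPath "${baseName}_$counter.wav"
        $counter++
    }
    return $candidate
}

# Example usage, replacing the $outputWav assignment in the script:
#   $outputWav = Get-UniqueWavPath -Folder $outputFolder
```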