AI Agent Powered by OpenAI ChatGPT, Whisper And Microsoft Azure AI Speech (TTS)!


Introduction

In the ever-evolving realm of telecommunication, Asterisk stands as a testament to the power of open-source solutions, offering flexibility and robustness to developers worldwide. As the demand for smarter, more responsive systems grows, integrating artificial intelligence with platforms like Asterisk becomes not just desirable, but essential. Asterisk exposes this integration point through the AGI (Asterisk Gateway Interface), which lets external scripts control the dialplan; an AGI script backed by AI services that can understand, transcribe, and answer a caller's questions has the potential to elevate the capabilities of a conventional PBX system to unprecedented heights.

This technical blog will delve deep into the intricacies of developing an AGI tailored for Asterisk. From establishing the initial interface to optimizing real-time communication processes, we’ll explore the challenges, solutions, and breakthroughs in intertwining these two powerful entities. For developers, telecommunication professionals, or anyone intrigued by the fusion of AI and telephony, this is a journey into the next frontier of communication technology.

Necessary Resources For an AI Agent

  1. OpenAI API key. Check our blog post on creating an OpenAI API key.
  2. Microsoft Azure Speech (TTS) API key. Check our blog post on creating a Microsoft Azure API key. (A quick sketch to verify both keys is shown after this list.)
  3. VitalPBX 4
  4. Python 3 and some dependencies
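
Before going further, it helps to confirm that both keys actually work. The following is a minimal sketch, not part of the project; it assumes the openai 0.27.x and azure-cognitiveservices-speech packages from section 1.1 are installed, and the key values shown are placeholders you must replace.

#!/usr/bin/env python3
# check-keys.py -- hypothetical helper to verify both API keys before wiring them into Asterisk
import openai
import azure.cognitiveservices.speech as speechsdk

openai.api_key = "sk-..."                      # placeholder: your OpenAI key
AZURE_SPEECH_KEY = "your-azure-speech-key"     # placeholder: your Azure Speech key
AZURE_SERVICE_REGION = "eastus"                # placeholder: your Azure region

# A successful model listing confirms the OpenAI key is valid (openai==0.27.x API).
models = openai.Model.list()
print("OpenAI OK, models available:", len(models["data"]))

# A short synthesis confirms the Azure Speech key and region are valid.
speech_config = speechsdk.SpeechConfig(subscription=AZURE_SPEECH_KEY, region=AZURE_SERVICE_REGION)
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)
result = synthesizer.speak_text_async("test").get()
print("Azure TTS OK:", result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted)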

Source Code

At the following link you will find the quick-installation procedure, all the source code, and the latest updates.

https://github.com/VitalPBX/vitalpbx_agent_ai_chatgpt/tree/main

1.- Create Asterisk AGI and Dial-Plan

Now we are going to show you how to create the AGI that will do all the magic.

1.1.- Installing dependencies

First, we install the required system packages.

				
					root@vitalpbx:~# apt update
root@vitalpbx:~# apt install python3 python3-pip
root@vitalpbx:~# pip install azure-cognitiveservices-speech

				
			

Create the requirements.txt file.

				
					root@vitalpbx:~# nano requirements.txt
				
			

Paste this list of dependencies into it:

pyst2
pydub
python-dotenv==0.21.0
langchain==0.0.161
pypdf==3.8.1
docx2txt==0.8
openai==0.27.6
chromadb==0.3.22
tiktoken==0.4.0

Now install the requirements:

					root@vitalpbx:~# pip install -r requirements.txt
				
			

Now go to the /var/lib/asterisk/agi-bin folder on our VitalPBX server and create the .env file.

				
					root@vitalpbx:~# cd /var/lib/asterisk/agi-bin
root@vitalpbx:/var/lib/asterisk/agi-bin# nano .env

				
			

1.2.- Creating .env file to save global variables

Copy and paste the following content into your file. Replace the OPENAI_API_KEY, AZURE_SPEECH_KEY, AZURE_SERVICE_REGION, PATH_TO_DOCUMENTS and PATH_TO_DATABASE with your own values.

				
					OPENAI_API_KEY = "sk- "
AZURE_SPEECH_KEY = ""
AZURE_SERVICE_REGION = "eastus"
PATH_TO_DOCUMENTS = "/var/lib/asterisk/agi-bin/docs/"
PATH_TO_DATABASE = "/var/lib/asterisk/agi-bin/data/"

				
			

1.3.- Using Script (Optional)

Alternatively, we can use a script that creates all the files for us, which lets us skip steps 1.4, 1.5, 1.6 and 1.7.

				
					wget https://raw.githubusercontent.com/VitalPBX/vitalpbx_agent_ai_chatgpt/main/vpbx-agent-ai.sh
				
			
				
					chmod +x vpbx-agent-ai.sh
				
			
				
					./vpbx-agent-ai.sh
				
			

Now we will proceed to create the Voice Guides.

				
					root@vitalpbx:~# cd /var/lib/asterisk/agi-bin
				
			

The format to record a prompt is as follows: ./record-prompt.py file-name “Text to record” language

  • file-name –> the file name without extension (the audio is saved as .mp3); remember that in the Agent AI script, the welcome audio is: welcome-en (English), welcome-es (Spanish), and the wait audio is: wait-en (English) and wait-es (Spanish).
  • language –> either “en-US” or “es-ES”

If you want to add more languages, you must modify the scripts

Below we show an example of how you should use the script to record the prompt.

				
					./record-prompt.py wait-en "Just a moment, please. We're fetching the information for you." "en-US"
./record-prompt.py welcome-en "Hello! Welcome to the AI Agent. Please ask your question after the tone." "en-US"
./record-prompt.py short-message-en "Your question is too short. Please provide more details." "en-US"
./record-prompt.py anything-else-en "Can I assist you with anything else?" "en-US"  
./record-prompt.py wait-es "Un momento, por favor. Estamos buscando la información para ti." "es-ES"
./record-prompt.py welcome-es "¡Hola! Bienvenido al Agente de IA. Haga su pregunta después del tono." "es-ES"
./record-prompt.py short-message-es "Tu pregunta es demasiado corta. Por favor, proporciona más detalles." "es-ES"
./record-prompt.py anything-else-es "¿Hay algo más en lo que pueda ayudarte?" "es-ES"  
				
			

1.4.- Creating AI Agent with ChatGPT information

Now we must go to the /var/lib/asterisk/agi-bin folder on our VitalPBX server.

				
					root@vitalpbx:~# cd /var/lib/asterisk/agi-bin
root@vitalpbx:/var/lib/asterisk/agi-bin# nano vpbx-agent-ai.py

				
			

Copy and paste the following content into your file.

				
					#!/usr/bin/env python3
import sys
import os
import openai
import time
# Uncomment if you are going to use sending information to a web page
#import websockets
#import asyncio
import azure.cognitiveservices.speech as speechsdk
from dotenv import load_dotenv

# Uncomment if you are using a valid domain with ssl
#import  ssl
#import logging

# Uncomment if you are using a valid domain with ssl
#logging.basicConfig()

#ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)

#ssl_cert = "/usr/share/vitalpbx/certificates/yourdomain.com/bundle.pem"
#ssl_key = "/usr/share/vitalpbx/certificates/yourdomain.com/private.pem"

#ssl_context.load_cert_chain(ssl_cert, keyfile=ssl_key)

# For Asterisk AGI
from asterisk.agi import *

load_dotenv("/var/lib/asterisk/agi-bin/.env")
AZURE_SPEECH_KEY = os.environ.get('AZURE_SPEECH_KEY')
AZURE_SERVICE_REGION = os.environ.get('AZURE_SERVICE_REGION')
OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY')

# Uncomment if you are going to use sending information to a web page
# For valid domains with SSL:
#HOST_PORT = 'wss://yourdomain.com:3001'
# For environments without a valid domain:
#HOST_PORT = 'ws://IP:3001'

agi = AGI()

# Check if a file name was provided
uniquedid = sys.argv[1] if len(sys.argv) > 1 else None
language = sys.argv[2] if len(sys.argv) > 2 else None

if uniquedid is None:
    print("No filename provided for the recording.")
    sys.exit(1)

# Build the temporary file paths from the unique call id
recording_path = f"/tmp/rec{uniquedid}"
answer_path = f"/tmp/ans{uniquedid}.mp3"
pq_file = f"/tmp/pq{uniquedid}.txt"
pa_file = f"/tmp/pa{uniquedid}.txt"

if language == "es-ES":
    azure_language = "es-ES" 
    azure_voice_name = "es-ES-ElviraNeural"
    wait_message = "/var/lib/asterisk/sounds/wait-es.mp3"
    short_message = "/var/lib/asterisk/sounds/short-message-es.mp3"
else:
    azure_language = "en-US" 
    azure_voice_name = "en-US-JennyNeural"
    wait_message = "/var/lib/asterisk/sounds/wait-en.mp3"
    short_message = "/var/lib/asterisk/sounds/short-message-en.mp3"

# Uncomment if you are going to use sending information to a web page
#async def send_message_to_websocket(message):
#    async with websockets.connect(HOST_PORT) as websocket:
#        await websocket.send(message)

def main():

    try:

        # Record the caller's audio: end after 3 seconds of silence, 30 seconds maximum, 'y' terminates on any DTMF digit
        sys.stdout.write('EXEC Record ' + recording_path + '.wav,3,30,y\n')
        sys.stdout.flush()
        # We await Asterisk's response
        result = sys.stdin.readline().strip()

        if result.startswith("200 result="):

            # Play wait message.
            agi.appexec('MP3Player', wait_message)
           
            #DEBUG
            agi.verbose("Successful Recording",2)

            # Once everything is fine, we send the audio to OpenAI Whisper to convert it to Text
            openai.api_key = OPENAI_API_KEY
            audio_file = open(recording_path + ".wav", "rb")
            transcript = openai.Audio.transcribe("whisper-1", audio_file)
            chatgpt_question = transcript.text
            chatgpt_question_agi = chatgpt_question.replace('\n', ' ') 

            # If nothing is recorded, Whisper returns "you", so you have to ask again.
            if transcript.text == "you":
                agi.appexec('MP3Player', short_message)
                agi.verbose("Message too short",2)
                sys.exit(1)

            #DEBUG
            agi.verbose("AUDIO TRANSCRIPT: " + chatgpt_question_agi,2)

            # It is used to send the question via WebSocket, to be displayed on a web page. 
            # Uncomment if you want to use this functionality with the chatserver.py script
            # If the chatserver.py program is not running the AGI will not work.
            #try:
            #    chatgpt_question_tv = "USER: " + chatgpt_question
            #    asyncio.get_event_loop().run_until_complete(send_message_to_websocket(chatgpt_question_tv))
            #    agi.verbose("MESSAGE SENT TO WEBSOCKET")
            #except AGIException as e:
            #    agi.verbose("MESSAGE SENT TO WEBSOCKET ERROR:" + str(e))

            # Load the previous answer, with the idea of keeping the conversation going
            if os.path.exists(pa_file):
                with open(pa_file, 'r') as previous_file:
                    previous_answer = previous_file.readline().strip()
            else:
                previous_answer = ""

            # Send the question to ChatGPT
            # Low "Temperature": More deterministic and predictable responses. High "Temperature": More diverse and creative responses, but less predictable.

            messages = []
            if previous_answer:
                messages.append({"role": "assistant", "content": previous_answer})
            messages.append({"role": "user", "content": chatgpt_question})
            response = openai.ChatCompletion.create(
                       model="gpt-3.5-turbo",
                       messages=messages,
                       temperature=0.2 
                       )
            chatgpt_answer = response['choices'][0]['message']['content']
            chatgpt_answer_agi = chatgpt_answer.replace('\n', ' ') 

            # save current question
            with open(pq_file, "w") as current_question:
                current_question.write(chatgpt_question + "\n")

            # save current answer
            with open(pa_file, "w") as current_answer:
                current_answer.write(chatgpt_answer + "\n")

            #DEBUG
            agi.verbose("ChatGPT ANSWER: " + chatgpt_answer_agi,2)

            # It is used to send the answer via WebSocket, to be displayed on a web page. 
            # Uncomment if you want to use this functionality with the chatserver.py script
            # If the chatserver.py program is not running the AGI will not work.
            #try:
            #    chatgpt_answer_tv = "ASSISTANT: " + chatgpt_answer 
            #    asyncio.get_event_loop().run_until_complete(send_message_to_websocket(chatgpt_answer_tv)) 
            #    agi.verbose("MESSAGE SENT TO WEBSOCKET")     
            #except AGIException as e:
            #    agi.verbose("MESSAGE SENT TO WEBSOCKET ERROR:" + str(e))


            # Sets API Key and Region
            speech_config = speechsdk.SpeechConfig(subscription=AZURE_SPEECH_KEY, region=AZURE_SERVICE_REGION)

            # Sets the synthesis output format.
            # The full list of supported formats can be found here:
            # https://docs.microsoft.com/azure/cognitive-services/speech-service/rest-text-to-speech#audio-outputs
            speech_config.set_speech_synthesis_output_format(speechsdk.SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3)

            # Select synthesis language and voice
            # Set either the `SpeechSynthesisVoiceName` or `SpeechSynthesisLanguage`.
            speech_config.speech_synthesis_language = azure_language 
            speech_config.speech_synthesis_voice_name = azure_voice_name

            # Creates a speech synthesizer with no default audio output;
            # the synthesized audio is kept in memory and saved to a file below.
            speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)
            result = speech_synthesizer.speak_text_async(chatgpt_answer).get()
 
            stream = speechsdk.AudioDataStream(result)
            stream.save_to_wav_file(answer_path)

            # Play the recorded audio.
            agi.appexec('MP3Player', answer_path)

        else:
            agi.verbose("Error while recording: %s" % result)

    except AGIException as e:
        agi.verbose(str(e))

if __name__ == "__main__":
    main()
				
			

Once the vpbx-agent-ai.py file has been saved, we must give it execution permissions.

				
					root@vitalpbx:/var/lib/asterisk/agi-bin# chmod +x vpbx-agent-ai.py
				
			

1.5.- Creating AI Agent with information provided to ChatGPT (Embedding)

In order to have ChatGPT answer from information we provide (our own documents), it is necessary to use the embedding method, for which we provide a script that converts the documents into embeddings.

This process uses chromadb as a local vector database.

 

Now we must go to the /var/lib/asterisk/agi-bin folder on our VitalPBX server.

				
					root@vitalpbx:~# cd /var/lib/asterisk/agi-bin
root@vitalpbx:/var/lib/asterisk/agi-bin# nano vpbx-embedded-docs.py

				
			

Copy and paste the following content into your file.

				
					#!/usr/bin/env python3

import os
import sys
from dotenv import load_dotenv
from langchain.document_loaders import PyPDFLoader
from langchain.document_loaders import Docx2txtLoader
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter

# Load environment variables from a .env file
load_dotenv("/var/lib/asterisk/agi-bin/.env")

# Retrieve paths and API keys from environment variables
PATH_TO_DOCUMENTS = os.environ.get('PATH_TO_DOCUMENTS')
PATH_TO_DATABASE = os.environ.get('PATH_TO_DATABASE')
OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY')

# Create an empty list to store documents
documents = []

# Create a List of Documents from all files in the PATH_TO_DOCUMENTS folder
for file in os.listdir(PATH_TO_DOCUMENTS):
    if file.endswith(".pdf"):
        pdf_path = PATH_TO_DOCUMENTS + file
        loader = PyPDFLoader(pdf_path)
        documents.extend(loader.load())
    elif file.endswith('.docx') or file.endswith('.doc'):
        doc_path = PATH_TO_DOCUMENTS + file
        loader = Docx2txtLoader(doc_path)
        documents.extend(loader.load())
    elif file.endswith('.txt'):
        text_path = PATH_TO_DOCUMENTS + file
        loader = TextLoader(text_path)
        documents.extend(loader.load())

# Split the documents into smaller chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=10)
documents = text_splitter.split_documents(documents)

# Convert the document chunks to embeddings and save them to the vector store
vectordb = Chroma.from_documents(documents, embedding=OpenAIEmbeddings(), persist_directory=PATH_TO_DATABASE)
vectordb.persist()
				
			

Before running the script, create the docs and data directories, and copy the documents to be converted (preferably PDF) into the docs folder.


				
					root@vitalpbx:~# cd /var/lib/asterisk/agi-bin
root@vitalpbx:/var/lib/asterisk/agi-bin# mkdir docs
root@vitalpbx:/var/lib/asterisk/agi-bin# mkdir data

				
			

Once the vpbx-embedded-docs.py file has been saved, we must give it execution permissions.

				
					root@vitalpbx:/var/lib/asterisk/agi-bin# chmod +x vpbx-embedded-docs.py
				
			

Upload the document to the /var/lib/asterisk/agi-bin/docs folder with the information to use for the query with ChatGPT-Embedded

Now let’s run the script.

				
					root@vitalpbx:/var/lib/asterisk/agi-bin# ./vpbx-embedded-docs.py
				
			

Once the database is created, we can optionally verify it from a Python shell (see the sketch below) and then proceed to create the script that performs the query.
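
The following is a minimal verification sketch, assuming the same .env paths and the langchain/chromadb versions installed earlier; the question text is just an illustrative placeholder.

#!/usr/bin/env python3
# verify-db.py -- hypothetical helper to check that the Chroma vector store was built
import os
from dotenv import load_dotenv
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

load_dotenv("/var/lib/asterisk/agi-bin/.env")
PATH_TO_DATABASE = os.environ.get('PATH_TO_DATABASE')

# Open the persisted database created by vpbx-embedded-docs.py
vectordb = Chroma(persist_directory=PATH_TO_DATABASE, embedding_function=OpenAIEmbeddings())

# Retrieve the three chunks most similar to a sample question and show where they came from
docs = vectordb.similarity_search("What services does the company offer?", k=3)
for doc in docs:
    print(doc.metadata.get("source"), "->", doc.page_content[:120])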

 

Now we must go to the /var/lib/asterisk/agi-bin folder on our VitalPBX server.

				
					root@vitalpbx:~# cd /var/lib/asterisk/agi-bin
root@vitalpbx:/var/lib/asterisk/agi-bin# nano vpbx-agent-ai-embedded.py

				
			

Copy and paste the following content into your file.

				
					#!/usr/bin/env python3
import sys
import os
import openai
import time
# Uncomment if you are going to use sending information to a web page
#import websockets
#import asyncio
import azure.cognitiveservices.speech as speechsdk
from dotenv import load_dotenv 
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain

# Uncomment if you are using a valid domain with ssl
#import  ssl
#import logging

# Uncomment if you are using a valid domain with ssl
#logging.basicConfig()

#ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)

#ssl_cert = "/usr/share/vitalpbx/certificates/yourdomain.com/bundle.pem"
#ssl_key = "/usr/share/vitalpbx/certificates/yourdomain.com/private.pem"

#ssl_context.load_cert_chain(ssl_cert, keyfile=ssl_key)

# For Asterisk AGI
from asterisk.agi import *

load_dotenv("/var/lib/asterisk/agi-bin/.env")
PATH_TO_DATABASE = os.environ.get('PATH_TO_DATABASE')
AZURE_SPEECH_KEY = os.environ.get('AZURE_SPEECH_KEY')
AZURE_SERVICE_REGION = os.environ.get('AZURE_SERVICE_REGION')
OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY')

# Uncomment if you are going to use sending information to a web page
# For valid domains with SSL:
#HOST_PORT = 'wss://yourdomain.com:3001'
# For environments without a valid domain:
#HOST_PORT = 'ws://IP:3001'

agi = AGI()

# Check if a file name was provided
uniquedid = sys.argv[1] if len(sys.argv) > 1 else None
language = sys.argv[2] if len(sys.argv) > 2 else None

if uniquedid is None:
    print("No filename provided for the recording.")
    sys.exit(1)

# Build the temporary file paths from the unique call id
recording_path = f"/tmp/rec{uniquedid}"
answer_path = f"/tmp/ans{uniquedid}.mp3"
pq_file = f"/tmp/pq{uniquedid}.txt"
pa_file = f"/tmp/pa{uniquedid}.txt"

if language == "es-ES":
    azure_language = "es-ES" 
    azure_voice_name = "es-ES-ElviraNeural"
    wait_message = "/var/lib/asterisk/sounds/wait-es.mp3"
    short_message = "/var/lib/asterisk/sounds/short-message-es.mp3"
else:
    azure_language = "en-US" 
    azure_voice_name = "en-US-JennyNeural"
    wait_message = "/var/lib/asterisk/sounds/wait-en.mp3"
    short_message = "/var/lib/asterisk/sounds/short-message-en.mp3"

# Uncomment if you are going to use sending information to a web page
#async def send_message_to_websocket(message):
#    async with websockets.connect(HOST_PORT) as websocket:
#        await websocket.send(message)

def main():

    try:

        # Record the caller's audio: 'q' for no beep, 2 seconds of silence ends the recording, '30' seconds maximum, 'y' terminates on any DTMF digit
        sys.stdout.write('EXEC Record ' + recording_path + '.wav,2,30,yq\n')
        sys.stdout.flush()
        # We await Asterisk's response
        result = sys.stdin.readline().strip()

        if result.startswith("200 result="):

            # Play wait message.
            agi.appexec('MP3Player', wait_message)
           
            #DEBUG
            agi.verbose("Successful Recording",2)

            # Once everything is fine, we send the audio to OpenAI Whisper to convert it to Text
            openai.api_key = OPENAI_API_KEY
            audio_file = open(recording_path + ".wav", "rb")
            transcript = openai.Audio.transcribe("whisper-1", audio_file)
            chatgpt_question = transcript.text
            chatgpt_question_agi = chatgpt_question.replace('\n', ' ') 

            # If nothing is recorded, Whisper returns "you", so you have to ask again.
            if chatgpt_question == "you":
                agi.appexec('MP3Player', short_message)
                agi.verbose("Message too short",2)
                sys.exit(1)

            #DEBUG
            agi.verbose("AUDIO TRANSCRIPT: " + chatgpt_question_agi,2)

            # It is used to send the question via WebSocket, to be displayed on a web page. 
            # Uncomment if you want to use this functionality with the chatserver.py script
            # If the chatserver.py program is not running the AGI will not work.
            #try:
            #    chatgpt_question_tv = "USER" + chatgpt_question
            #    asyncio.get_event_loop().run_until_complete(send_message_to_websocket(chatgpt_question_tv))
            #    agi.verbose("MESSAGE SENT TO WEBSOCKET")
            #except AGIException as e:
            #    agi.verbose("MESSAGE SENT TO WEBSOCKET ERROR:" + str(e))

            # Find the previous question, with the idea of keeping the conversation
            if os.path.exists(pq_file):
                with open(pq_file, 'r') as previous_question_file:
                    previous_question = previous_question_file.readline().strip()
            else:
                previous_question = ""

            # Find the previous answer, with the idea of keeping the conversation
            if os.path.exists(pa_file):
                with open(pa_file, 'r') as previous_answer_file:
                    previous_answer = previous_answer_file.readline().strip()
            else:
                previous_answer = ""

            embeddings = OpenAIEmbeddings()
            vectordb = Chroma(persist_directory=PATH_TO_DATABASE, embedding_function=embeddings)

            #DEBUG
            agi.verbose("PREVIOUS QUESTION: " + previous_question,2)
            #DEBUG
            agi.verbose("PREVIOUS ANSWER: " + previous_answer,2)

            # Add the previous question and answer to the chat history
            chat_history = []

            if len(previous_question) >= 2:
                chat_history.append((previous_question, previous_answer))
                #DEBUG
                agi.verbose("PREVIOUS QUESTION OK",2)

            resp_qa = ConversationalRetrievalChain.from_llm(
                ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo'),
                retriever=vectordb.as_retriever(search_kwargs={'k': 6}),
                return_source_documents=True,
                verbose=False
            )

            response = resp_qa(
                {"question": chatgpt_question, "chat_history": chat_history})

            chatgpt_answer = response["answer"]
            chatgpt_answer_agi = chatgpt_answer.replace('\n', ' ')

            #DEBUG
            agi.verbose("ChatGPT ANSWER: " + chatgpt_answer_agi,2)
      
            # It is used to send the answer via WebSocket, to be displayed on a web page. 
            # Uncomment if you want to use this functionality with the chatserver.py script
            # If the chatserver.py program is not running the AGI will not work.
            #try:
            #    chatgpt_answer_tv = "ASSISTANT" + chatgpt_answer 
            #    asyncio.get_event_loop().run_until_complete(send_message_to_websocket(chatgpt_answer_tv)) 
            #    agi.verbose("MESSAGE SENT TO WEBSOCKET")     
            #except AGIException as e:
            #    agi.verbose("MESSAGE SENT TO WEBSOCKET ERROR:" + str(e))

            # save current question and current answer
            with open(pq_file, "w") as current_question:
                current_question.write(chatgpt_question + "\n")

            with open(pa_file, "w") as current_answer:
                current_answer.write(chatgpt_answer + "\n")

            # Sets API Key and Region
            speech_config = speechsdk.SpeechConfig(subscription=AZURE_SPEECH_KEY, region=AZURE_SERVICE_REGION)

            # Sets the synthesis output format.
            # The full list of supported formats can be found here:
            # https://docs.microsoft.com/azure/cognitive-services/speech-service/rest-text-to-speech#audio-outputs
            speech_config.set_speech_synthesis_output_format(speechsdk.SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3)

            # Select synthesis language and voice
            # Set either the `SpeechSynthesisVoiceName` or `SpeechSynthesisLanguage`.
            speech_config.speech_synthesis_language = azure_language 
            speech_config.speech_synthesis_voice_name = azure_voice_name

            # Creates a speech synthesizer with no default audio output;
            # the synthesized audio is kept in memory and saved to a file below.
            speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)
            result = speech_synthesizer.speak_text_async(chatgpt_answer).get()
 
            stream = speechsdk.AudioDataStream(result)
            stream.save_to_wav_file(answer_path)

            # Play the recorded audio.
            agi.appexec('MP3Player', answer_path)

        else:
            agi.verbose("Error while recording: %s" % result)

    except AGIException as e:
        agi.verbose("ERROR:" + str(e))

if __name__ == "__main__":
    main()
				
			

Once the vpbx-agent-ai-embedded.py file has been saved, we must give it execution permissions.

				
					root@vitalpbx:/var/lib/asterisk/agi-bin# chmod +x vpbx-agent-ai-embedded.py
				
			

Other very important parameters that we can configure are those of the Record() application:

 

Record(filename.format,[silence,[maxduration,[options]]])

 

  • filename – The name of the file in which to store the recording.
  • format – The format of the file type to be recorded (wav, gsm, etc.).
  • silence – The number of seconds of silence to allow before returning.
  • maxduration – The maximum recording duration in seconds. If missing or 0 there is no maximum.
  • options
  • a – Append to existing recording rather than replacing.
  • n – Do not answer, but record anyway if line not yet answered.
  • o – Exit when 0 is pressed, setting the variable RECORD_STATUS to ‘OPERATOR’ instead of ‘DTMF’.
  • q – quiet (do not play a beep tone).
  • s – Skip recording if the line is not yet answered.
  • t – Use alternate ‘*’ terminator key (DTMF) instead of default ‘#’.
  • u – Don’t truncate recorded silence.
  • x – Ignore all terminator keys (DTMF) and keep recording until hangup.
  • k – Keep recorded file upon hangup.
  • y – Terminate recording if any DTMF digit is received.

 

The “q” option is very important: if we add it, no beep is played and the caller will not know when to start asking their question.

Also, the “silence” parameter is very important, since it is the pause Asterisk waits for before considering the question finished; it should be kept between 2 and 3 seconds. A minimal sketch of how these parameters map onto the AGI call used in our scripts is shown below.
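
The following sketch simply mirrors the pattern already used in vpbx-agent-ai.py; the silence, maxduration, and options values (and the recording path) are illustrative placeholders you can adjust.

import sys

# Illustrative values: 3 s of silence ends the question, 30 s maximum, 'y' stops on any DTMF key.
# Add 'q' to the options string only if you do not want the beep to be played.
silence = 3
maxduration = 30
options = "y"

recording_path = "/tmp/rec12345"  # hypothetical unique id

# Same pattern the AGI scripts use: send the Record application to Asterisk and read its reply.
sys.stdout.write('EXEC Record %s.wav,%d,%d,%s\n' % (recording_path, silence, maxduration, options))
sys.stdout.flush()
result = sys.stdin.readline().strip()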

 

Since OpenAI Whisper automatically detects the language we speak, it is not necessary to define it. However, for Azure TTS we must define the language and voice to use; you can pick whichever voice you like best by changing the language/voice selection block in the scripts.

 

In this example, two Azure voices are defined: Spanish (es-ES-ElviraNeural) and English (en-US-JennyNeural). The sketch below shows how a third language could be added.
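
The only change needed in the agent scripts is an extra branch in the language/voice selection block (record-prompt.py would need the same kind of branch to record the new prompts). This sketch assumes French; fr-FR-DeniseNeural is one of Azure's published neural voices, but check the Azure voice gallery for the exact name you want.

# Extended language selection block (sketch): pass "fr-FR" as the second AGI argument to use it.
if language == "es-ES":
    azure_language = "es-ES"
    azure_voice_name = "es-ES-ElviraNeural"
    wait_message = "/var/lib/asterisk/sounds/wait-es.mp3"
    short_message = "/var/lib/asterisk/sounds/short-message-es.mp3"
elif language == "fr-FR":
    azure_language = "fr-FR"
    azure_voice_name = "fr-FR-DeniseNeural"
    wait_message = "/var/lib/asterisk/sounds/wait-fr.mp3"       # record with: ./record-prompt.py wait-fr "..." "fr-FR"
    short_message = "/var/lib/asterisk/sounds/short-message-fr.mp3"
else:
    azure_language = "en-US"
    azure_voice_name = "en-US-JennyNeural"
    wait_message = "/var/lib/asterisk/sounds/wait-en.mp3"
    short_message = "/var/lib/asterisk/sounds/short-message-en.mp3"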

1.6.- Creating the Dial Plan

Now we will create the dial-plan to access the AGI.

				
					root@vitalpbx:~# cd /etc/asterisk/vitalpbx
root@vitalpbx:/etc/asterisk/vitalpbx# nano extensions__70-agent-ai.conf

				
			

Copy and paste the following content into your file.

				
					;This is an example of how to use the AI Agent

[cos-all](+)
;For English
exten => *778,1,Answer()
 same => n,MP3Player(/var/lib/asterisk/sounds/welcome-en.mp3)
 same => n(AskAgain),AGI(vpbx-agent-ai.py,${UNIQUEID},"en-US")
 same => n,MP3Player(/var/lib/asterisk/sounds/anything-else-en.mp3)
 same => n,Goto(AskAgain)
 same => n,Hangup()

;For Spanish
exten => *779,1,Answer()
 same => n,MP3Player(/var/lib/asterisk/sounds/welcome-es.mp3)
 same => n(AskAgain),AGI(vpbx-agent-ai.py,${UNIQUEID},"es-ES")
 same => n,MP3Player(/var/lib/asterisk/sounds/anything-else-es.mp3)
 same => n,Goto(AskAgain)
 same => n,Hangup()

;For English
exten => *888,1,Answer()
 same => n,MP3Player(/var/lib/asterisk/sounds/welcome-en.mp3)
 same => n(AskAgain),AGI(vpbx-agent-ai-embedded.py,${UNIQUEID},"en-US")
 same => n,MP3Player(/var/lib/asterisk/sounds/anything-else-en.mp3)
 same => n,Goto(AskAgain)
 same => n,Hangup()

;For Spanish
exten => *889,1,Answer()
 same => n,MP3Player(/var/lib/asterisk/sounds/welcome-es.mp3)
 same => n(AskAgain),AGI(vpbx-agent-ai-embedded.py,${UNIQUEID},"es-ES")
 same => n,MP3Player(/var/lib/asterisk/sounds/anything-else-es.mp3)
 same => n,Goto(AskAgain)
 same => n,Hangup()
				
			

Now reload the Asterisk dialplan, and you can call *778 or *888 to chat in English, or *779 or *889 to chat in Spanish.

				
					root@vitalpbx:~# asterisk -rx "dialplan reload"
				
			

1.7.- Creating voice guides

To record our own prompt, we are going to create the following script.

				
					root@vitalpbx:/etc/asterisk/vitalpbx# nano record-prompt.py
				
			
				
					#!/usr/bin/env python3
import sys
import os
import time
from dotenv import load_dotenv 
from pydub import AudioSegment
import azure.cognitiveservices.speech as speechsdk

load_dotenv("/var/lib/asterisk/agi-bin/.env")
AZURE_SPEECH_KEY = os.environ.get('AZURE_SPEECH_KEY')
AZURE_SERVICE_REGION = os.environ.get('AZURE_SERVICE_REGION')

# The format to record a prompt is as follows:
# ./record-prompt.py file-name "Text to record" language
# file-name --> the file name without extension (the audio is saved as .mp3); remember that in the Agent AI script, the welcome audio is: welcome-en (English), welcome-es (Spanish), and the wait audio is: wait-en (English) and wait-es (Spanish).
# language --> either "en-US" or "es-ES"
# If you want to add more languages you must modify the scripts

# Check if a file name was provided
audio_name = sys.argv[1] if len(sys.argv) > 1 else None
audio_text = sys.argv[2] if len(sys.argv) > 2 else None
language = sys.argv[3] if len(sys.argv) > 3 else None

if audio_name is None:
    print("No filename provided for the recording.")
    sys.exit(1)

if audio_text is None:
    print("No text to record audio.")
    sys.exit(1)

if language == "es-ES":
    azure_language = "es-ES" 
    azure_voice_name = "es-ES-ElviraNeural"
else:
    azure_language = "en-US" 
    azure_voice_name = "en-US-JennyNeural"

audio_path = f"/var/lib/asterisk/sounds/{audio_name}.mp3"
print(audio_path)

def main():

    # Sets API Key and Region
    speech_config = speechsdk.SpeechConfig(subscription=AZURE_SPEECH_KEY, region=AZURE_SERVICE_REGION)

    # Sets the synthesis output format.
    # The full list of supported formats can be found here:
    # https://docs.microsoft.com/azure/cognitive-services/speech-service/rest-text-to-speech#audio-outputs
    speech_config.set_speech_synthesis_output_format(speechsdk.SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3)

    # Select synthesis language and voice
    # Set either the `SpeechSynthesisVoiceName` or `SpeechSynthesisLanguage`.
    speech_config.speech_synthesis_language = azure_language 
    speech_config.speech_synthesis_voice_name = azure_voice_name

    # Creates a speech synthesizer with no default audio output;
    # the synthesized audio is kept in memory and saved to a file below.
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)
    result = speech_synthesizer.speak_text_async(audio_text).get()
 
    stream = speechsdk.AudioDataStream(result)
    stream.save_to_wav_file(audio_path)

    # Path to the original MP3 file and path for the trimmed file
    original_file = audio_path
    trimmed_file = "/tmp/tmp.mp3"

    # Load the original audio file in MP3 format
    audio = AudioSegment.from_mp3(original_file)

    # Get the total duration of the file in milliseconds
    total_duration = len(audio)

    # Calculate the new duration without the trailing silence
    new_duration = total_duration - 750  # Subtract 750 milliseconds

    # Trim the audio file
    trimmed_audio = audio[:new_duration]

    # Save the trimmed file as MP3
    trimmed_audio.export(trimmed_file, format="mp3")

    # Remove the original file
    os.remove(original_file)

    # Rename the trimmed file to the original file name
    os.rename(trimmed_file, original_file)

if __name__ == "__main__":
    main()
				
			

We proceed to give execution permissions.

				
					root@vitalpbx:/etc/asterisk/vitalpbx# chmod +x record-prompt.py
				
			

The format to record a prompt is as follows:

./record-prompt.py file-name “Text to record” language

  • file-name –> the file name without extension (the audio is saved as .mp3); remember that in the Agent AI script, the welcome audio is: welcome-en (English), welcome-es (Spanish), and the wait audio is: wait-en (English) and wait-es (Spanish).
  • language –> either “en-US” or “es-ES”
  • If you want to add more languages, you must modify the scripts.

 

Below we show an example of how you should use the script to record the prompt.

				
					./record-prompt.py wait-en "Just a moment, please. We're fetching the information for you." "en-US"
./record-prompt.py welcome-en "Hello! Welcome to the AI Agent. Please ask your question after the tone." "en-US"
./record-prompt.py short-message-en "Your question is too short. Please provide more details." "en-US"
./record-prompt.py anything-else-en "Can I assist you with anything else?" "en-US"  
./record-prompt.py wait-es "Un momento, por favor. Estamos buscando la información para ti." "es-ES"
./record-prompt.py welcome-es "¡Hola! Bienvenido al Agente de IA. Haga su pregunta después del tono." "es-ES"
./record-prompt.py short-message-es "Tu pregunta es demasiado corta. Por favor, proporciona más detalles." "es-ES"
./record-prompt.py anything-else-es "¿Hay algo más en lo que pueda ayudarte?" "es-ES"  
				
			

Testing Embedding

To test from the command line, we will proceed to create the following Script.

				
					root@vitalpbx:/etc/asterisk/vitalpbx# nano chatbot.py 
				
			

We copy and paste the following content:

				
					#!/usr/bin/env python3
import os
import sys
from dotenv import load_dotenv
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain

# Load environment variables from the .env file (absolute path, so the script works from any directory)
load_dotenv('/var/lib/asterisk/agi-bin/.env')

# Define the path to the database
PATH_TO_DATABASE = os.environ.get('PATH_TO_DATABASE')

# Initialize OpenAI embeddings
embeddings = OpenAIEmbeddings()

# Create a Chroma vector store
vectordb = Chroma(persist_directory=PATH_TO_DATABASE, embedding_function=embeddings)

# Create a Q&A chat chain
pdf_qa = ConversationalRetrievalChain.from_llm(
    ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo'),
    retriever=vectordb.as_retriever(search_kwargs={'k': 6}),
    return_source_documents=True,
    verbose=False
)

# Define text colors for console output
yellow = "\033[0;33m"
green = "\033[0;32m"
white = "\033[0;39m"

# Initialize chat history
chat_history = []

# Print welcome message and instructions
print(f"{yellow}--------------------------------------------------------------------------------------------")
print('Welcome to the VitalPBX Agent AI. You are now ready to start interacting with your documents')
print('                            Type exit, quit, q or f to finish                               ')
print('--------------------------------------------------------------------------------------------')

# Start the interactive chat loop
while True:
    query = input(f"{green}Prompt: ")
    
    # Check for exit commands
    if query == "exit" or query == "quit" or query == "q" or query == "f":
        print('Exiting')
        sys.exit()
    
    # Skip empty queries
    if query == '':
        continue
    
    # Perform document retrieval and answer generation
    result = pdf_qa(
        {"question": query, "chat_history": chat_history})
    
    # Display the answer
    print(f"{white}Answer: " + result["answer"])
    
    # Append the query and answer to chat history
    chat_history.append((query, result["answer"]))
				
			

We proceed to give execution permissions.

				
					root@vitalpbx:/etc/asterisk/vitalpbx# chmod +x chatbot.py
				
			

To test the functionality of our AI Agent with the Embedding option, run the following script:

				
					root@vitalpbx:/etc/asterisk/vitalpbx#  ./chatbot.py
				
			

Limitations of AI Agents in Telephony

  1. Data Dependency: The effectiveness of an AI agent heavily relies on the quality and volume of data it has been trained on. If the data set is limited or biased, the agent might malfunction or make erroneous decisions.
  2. Complexity and Cost: Implementing AI solutions in telephony systems may require significant investments in terms of hardware, development, and training.
  3. Privacy Concerns: AI agents processing calls or messages might have access to personal or sensitive information. This raises concerns about data privacy and security.
  4. Limited Human Interaction: While AI agents can handle many tasks autonomously, there are still situations that require the human touch. Over-reliance on AI can lead to customer frustration if they can’t connect with a real person when needed.
  5. Adaptability and Learning: While AI can learn and adapt over time, it may initially not be prepared to handle atypical or emerging situations that weren’t present in its training data.
  6. Resource Consumption: Some AI solutions, especially the more advanced ones, might require a significant amount of computational resources, influencing infrastructure and operational costs.
  7. Errors and Misunderstandings: AI agents might misinterpret verbal commands or contexts, especially in noisy environments or with varied accents and dialects.
  8. Updates and Maintenance: AI technology evolves quickly. This means implemented solutions might require frequent updates, implying a constant commitment to resources.
  9. Ethical Considerations: Using AI in telephony might lead to ethical debates, especially concerning recordings, sentiment analysis, and other aspects that might be perceived as intrusive.
  10. Delay in response: Sometimes you will notice a delay in the response, since the question must be converted from audio to text (Whisper), sent to ChatGPT, and the answer then converted back into audio. A simple way to see where the time goes is sketched after this list.
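
The sketch below is self-contained and uses sleep() calls as placeholders for the real Whisper, ChatGPT, and Azure TTS stages; in the AGI scripts you would wrap the actual calls the same way and report the values with agi.verbose().

import time
from contextlib import contextmanager

@contextmanager
def timed(stage):
    # Measure and print how long one stage of the pipeline takes.
    start = time.perf_counter()
    yield
    print(f"{stage}: {time.perf_counter() - start:.2f}s")

# Placeholders for the real calls (Whisper transcription, ChatGPT completion, Azure TTS synthesis).
with timed("speech-to-text"):
    time.sleep(0.5)
with timed("chat completion"):
    time.sleep(1.0)
with timed("text-to-speech"):
    time.sleep(0.5)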

 

Despite these limitations, it’s undeniable that AI holds the potential to transform the telephony industry, offering significant improvements in efficiency and customer experience. However, it’s crucial to address these limitations to ensure successful and user-centric implementation.

Conclusion

The integration of artificial intelligence into the telephony sector has marked a clear divide in how we communicate and how businesses engage with their customers. AI agents in telephony not only enable the automation of repetitive tasks and the enhancement of efficiency but also pave the way for a more personalized and agile user experience.

With their ability to learn and adapt, these systems deliver swift and accurate responses, reducing errors and improving customer satisfaction. Additionally, they allow businesses to scale their operations without proportionately increasing their workforce.

However, it’s vital to remember that technology alone isn’t the entire solution. It’s essential for businesses to pair these tools with a well-thought-out strategy, taking into account user needs and emotions. While AI can handle a vast amount of interactions, the human touch remains irreplaceable in specific situations.

Looking to the future, we can anticipate continued AI advancements revolutionizing the telephony industry. However, it’s our responsibility as a society to ensure these changes are carried out ethically and thoughtfully, always prioritizing the needs and well-being of individuals.

 
