sandra_henrystocker
Unix Dweeb

Digging into voice AI platform Deepgram

Opinion
Feb 06, 2025 · 3 mins

Developers can save time, improve accessibility, and automate tasks with Deepgram's suite of tools for speech-to-text, text-to-speech, and full speech-to-speech voice agents.

Image of a person typing on a keyboard. Text, speech, typing.
Credit: Tero Vesalainen/Shutterstock

Deepgram is a leading voice AI platform, used by more than 200,000 developers to build speech-to-text, text-to-speech, and full speech-to-speech tools. The speech-to-speech capability can, among other things, enable individuals with speech disabilities to be clearly understood. The platform is aimed at businesses and developers who need accurate and scalable transcriptions for applications like call centers, video captioning, voice assistants and more.

Deepgram aims to address problems like poor customer experience, and the financial risk that comes with it ($3.7 trillion annually, according to Qualtrics XM Institute), by providing services that are accurate, efficient and scalable.

As call centers migrate to cloud-based solutions, embrace hybrid workforces, and look to leverage AI for automation and compliance, voice technology is becoming essential for delivering better customer experiences. In fact, the role of voice technology in call centers is exploding.

And it’s not only for call centers. Jack in the Box is using Deepgram to implement automated AI voice agents that take customer orders at its drive-through locations. The goal is to shorten the time customers have to wait in the drive-through and to improve order accuracy. Deepgram’s technology, including its ability to adapt models to specific business needs like Jack in the Box’s menu items, was a key factor in the company’s decision to choose Deepgram.

Deepgram’s features include:

  • Deep neural networks that improve transcription accuracy
  • Support for both live audio streaming and file-based transcription
  • Multi-language and multi-speaker support (it can transcribe multiple languages and distinguish between speakers)
  • The ability to fine-tune models to accommodate industry-specific jargon
  • High accuracy for enterprise-grade performance
  • Security and scalability, with on-premise, cloud and hybrid deployment options
  • API access for integration with various applications
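
To give a sense of what "API access" and file-based transcription with speaker separation might look like in practice, here is a minimal Python sketch. It is not taken from Deepgram's documentation: the endpoint URL, the `Token` authorization scheme, the query parameters (`language`, `diarize`, `punctuate`) and the response fields it reads are assumptions about Deepgram's REST API that should be checked against the current API reference. The function names are hypothetical and exist only for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint for file-based (pre-recorded) transcription.
# Verify against Deepgram's current API reference before use.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_request(audio_url, api_key, language="en", diarize=True):
    """Build (but do not send) a POST request for a hosted audio file."""
    # Query parameter names are assumptions about the API.
    query = urllib.parse.urlencode({
        "language": language,
        "diarize": str(diarize).lower(),  # ask for speaker labels
        "punctuate": "true",
    })
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"{DEEPGRAM_LISTEN_URL}?{query}",
        data=body,
        headers={
            "Authorization": f"Token {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def _first_alternative(response):
    """Return the first alternative of the first channel, or {} if missing."""
    try:
        return response["results"]["channels"][0]["alternatives"][0]
    except (KeyError, IndexError, TypeError):
        return {}


def extract_transcript(response):
    """Return the transcript text, or an empty string if it is absent."""
    return _first_alternative(response).get("transcript", "")


def speaker_turns(response):
    """Group consecutive words by speaker label into (speaker, text) turns."""
    turns = []
    for word in _first_alternative(response).get("words", []):
        speaker = word.get("speaker", 0)
        text = word.get("punctuated_word", word.get("word", ""))
        if turns and turns[-1][0] == speaker:
            # Same speaker as the previous word: extend the current turn.
            turns[-1] = (speaker, turns[-1][1] + " " + text)
        else:
            turns.append((speaker, text))
    return turns


def transcribe(audio_url, api_key):
    """Send the request and return the parsed JSON response."""
    request = build_listen_request(audio_url, api_key)
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.load(resp)
```

With a real API key, `transcribe("https://example.com/call.wav", api_key)` would send the request (the URL here is a placeholder), and `extract_transcript` and `speaker_turns` would then work on the JSON that comes back. Building the request separately from sending it makes the sketch easy to inspect without using any account credit.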

Developers are attracted to Deepgram for its voice-native foundational models, accuracy, low latency and affordable pricing. The rest of us might want to use it to convert voice to text instead of typing, and to automate workflows. It’s available for personal use as well as business use with fairly modest pay-as-you-go pricing. A “free” account provides you with a $200 credit that seems to cover a few thousand minutes of transcription.

Sandra Henry-Stocker has been administering Unix systems for more than 30 years. She describes herself as "USL" (Unix as a second language) but remembers enough English to write books and buy groceries. She lives in the mountains in Virginia where, when not working with or writing about Unix, she's chasing the bears away from her bird feeders.

The opinions expressed in this blog are those of Sandra Henry-Stocker and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.
