AI trained on YouTube and podcasts speaks with ums and ahs


An AI can generate more natural-sounding synthetic speech by including pauses

Shutterstock/Prince of Love

Generating speech with varied rhythms and pauses makes it sound more human, according to an assessment of an artificial intelligence trained on speech gleaned from YouTube and podcasts.

Most AI text-to-speech systems are trained on data sets of acted speech, which can make the output sound stilted and one-dimensional. Natural speech, by contrast, uses a wide range of rhythms and patterns to convey different meanings and emotions.

Now, Alexander Rudnicky at Carnegie Mellon University in Pittsburgh, Pennsylvania.

