AI trained on YouTube and podcasts talks with ums and ahs


An AI can generate more natural-sounding synthetic speech by including pauses

Shutterstock/Prince of Love

An artificial intelligence trained on speech gleaned from YouTube and podcasts can generate speech with varied rhythms and pauses, which makes it sound more human.

Most AI text-to-speech systems are trained on data sets of voice-acted speech, which can make the output sound stilted and one-dimensional. Natural speech, by contrast, often displays a wide range of rhythms and patterns that convey different meanings and emotions.

Now, Alexander Rudnicky at Carnegie Mellon University in Pittsburgh, Pennsylvania.


