Google’s DeepMind unit, which is working to develop super-intelligent computers, has created a system for machine-generated speech that it says outperforms existing technology by 50 percent.
“But thou, O Daniel, shut up the words, and seal the book, even to the time of the end: many shall run to and fro, and knowledge shall be increased.” Daniel 12:4 (KJV)
U.K.-based DeepMind, which Google acquired for about 400 million pounds ($533 million) in 2014, developed an artificial intelligence called WaveNet that can mimic human speech by learning how to form the individual sound waves a human voice creates, it said in a blog post Friday. In blind tests for U.S. English and Mandarin Chinese, human listeners found WaveNet-generated speech sounded more natural than that created with any of Google’s existing text-to-speech programs, which are based on different technologies. WaveNet still underperformed recordings of actual human speech.
Many computer-generated speech programs work by using a large data set of short recordings of a single human speaker and then combining these speech fragments to form new words. The result is intelligible and sounds human, if not completely natural. The drawback is that the sound of the voice cannot be easily modified. Other systems form the voice completely electronically, usually based on rules about how certain letter combinations are pronounced. These systems allow the sound of the voice to be manipulated easily, but they have tended to sound less natural than computer-generated speech based on recordings of human speakers, DeepMind said.
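The fragment-combining approach can be illustrated with a minimal Python sketch. It is not any vendor's actual system: the `FRAGMENTS` table, its unit names and its sample values are all hypothetical placeholders standing in for a real library of short recordings.

```python
# Hypothetical fragment inventory: each speech unit maps to recorded samples.
# A real system would hold a large set of short recordings of one speaker.
FRAGMENTS = {
    "HH": [0.0, 0.1],
    "EH": [0.3, 0.4, 0.3],
    "L": [0.2, 0.1],
    "OW": [0.4, 0.5, 0.2],
}


def synthesize(units):
    """Join pre-recorded fragments end to end to form a new word."""
    waveform = []
    for unit in units:
        waveform.extend(FRAGMENTS[unit])
    return waveform


hello = synthesize(["HH", "EH", "L", "OW"])
print(len(hello))  # 10 samples: 2 + 3 + 2 + 3
```

Because the output is only ever a rearrangement of stored recordings, changing the voice itself would mean recording a new set of fragments, which is the drawback described above.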
WaveNet is a type of AI called a neural network that is designed to mimic how parts of the human brain function. Such networks need to be trained with large data sets.
WaveNet won’t have immediate commercial applications because the system requires too much computational power: it has to sample the audio signal it is being trained on 16,000 times per second or more, DeepMind said. And then for each of those samples it has to form a prediction about what the sound wave should look like based on each of the prior samples. Even the DeepMind researchers acknowledged in their blog post that this “is a clearly challenging task.”
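To give a sense of why sample-by-sample prediction is expensive, here is a rough Python sketch of that kind of generation loop. It is not DeepMind's model: `toy_predict_next` is a hypothetical stand-in that simply continues a 440 Hz tone, where a trained neural network would do far more work per sample. Only the 16,000-samples-per-second figure comes from the article.

```python
import math

SAMPLE_RATE = 16_000  # samples per second, the figure cited in the article


def toy_predict_next(history):
    """Hypothetical stand-in for a trained model.

    Predicts the next sample from the prior samples. This toy version only
    uses how many samples came before and continues a 440 Hz sine wave.
    """
    n = len(history)
    return math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)


def generate(seconds, predict=toy_predict_next):
    """Generate audio one sample at a time, feeding each output back in."""
    samples = []
    for _ in range(round(seconds * SAMPLE_RATE)):
        samples.append(predict(samples))
    return samples


audio = generate(0.01)
print(len(audio))  # 160 predictions for just 10 milliseconds of sound
```

Each second of output needs 16,000 separate predictions, and each one depends on the samples produced before it, so the steps cannot simply be run all at once.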