Nov 09 2018
 

A Chinese news agency has introduced an English-language, computer-simulated “news anchor.” I’m honestly a bit befuddled. What’s the point? What does an AI anchor bring that simply reading text does not? Shrug. Anyway, here it is:

The interesting thing here is that the video is more convincing than the audio. If you just glance at the video with the audio off, you might not notice that this isn’t a human (though you also might, as the movement of the mouth is still well within the Uncanny Valley… the “anchor” seems to be talking with his teeth almost clenched). But if you just hear the audio, it’s blisteringly obvious that it’s a synthesized voice. At first you’d likely think that the video should be far harder to do convincingly than the audio. But really, all the “anchor” needs to do is sit there and move its mouth, while providing a few other seemingly randomized movements to simulate the living. Human speech, on the other hand, is *all* over the place. An actual human would put pauses and inflections all over in a way that would sound natural to another human, but a voice synthesizer would have to be specifically programmed to produce those. And apparently that’s hard.

 

Posted at 8:56 am