Microsoft computer speaks Chinese for you - in your own voice
November 9th, 2012
03:58 PM ET

By John D. Sutter, CNN

(CNN) - By now, everyone knows computers can talk.

There's Hal. There's Watson. And, of course, there's Siri.

But never before have computers been able to talk for you, in your voice, and in a foreign language.

That's the technology - or a precursor to it - that Microsoft Research recently demonstrated at an event in China. The company's research arm on Thursday posted a video of the talk and a blog post about the technology behind it.

Rick Rashid, Microsoft’s Chief Research Officer, writes that he had to train the system to recognize his voice, and to speak with his intonations. The computer listened to an hour of his speeches (in English) in order to get it right. It also had several hours of training from a Chinese speaker.

The demo also relies on a new method for processing speech patterns, using a technique called Deep Neural Networks. The company says this type of machine learning is "patterned after human brain behavior" and that it lets researchers reduce error rates by nearly a third.

Here's more on that concept from a June Microsoft Research post:

Artificial neural networks are mathematical models of low-level circuits in the human brain. They have been in use for speech recognition for more than 20 years, but only a few years ago did computer scientists gain access to enough computing power to make it possible to build models that are fine-grained and complex enough to show promise in automatic speech recognition.

Don't get too excited about the results just yet. This kind of speech-to-speech translation technology likely won't be consumer-ready anytime soon. But, as Rashid writes, it could be the start of something:

The results are still not perfect, and there is still much work to be done, but the technology is very promising, and we hope that in a few years we will have systems that can completely break down language barriers ... We may not have to wait until the 22nd century for a usable equivalent of Star Trek’s universal translator, and we can also hope that as barriers to understanding language are removed, barriers to understanding each other might also be removed. The cheers from the crowd of 2,000 mostly Chinese students, and the commentary that’s grown on China’s social media forums ever since, suggests a growing community of budding computer scientists who feel the same way.

For what it's worth, the Internet seems impressed.

Writing for TheNextWeb, Alex Wilhelm called the demo "mindbending."

"The video is oh so very worth 10 minutes of your time," he writes. "The future, it’s coming."

And GeekWire lists some practical uses for this type of tech:

The applications for this type of technology are endless. Say Japanese executives with Nintendo are meeting at the Redmond headquarters and need to say something during a meeting. Use the translator. Or what if you’re ordering food in France but the waiter doesn’t understand English. Use the translator.

I'm sure you can think of plenty of others. If so, let us know in the comments section.

Filed under: Future • Innovation • Language • Tech
soundoff (46 Responses)
  1. In home Personal Training

    I think Microsoft will become a bigger player again over the next few years. I like the fact that I can speak multiple languages in my own voice when I'm on vacation. I love the talking feature on the Xbox One.

    April 13, 2014 at 4:59 pm | Reply
  3. Consignment Clothes

    Kerry had drug problems and after his 2nd drug arrest which
    violated his probation, he shot himself to
    death by another wrestler in a match in that they have fewer injuries.
    He had a short feud over the Gulf Coast territory.

    There was a more socially conscience light was the Royal Rumble.

    March 18, 2013 at 5:17 pm | Reply
  4. 蝈蝈

    What's the point? Most educated Chinese speaks good English. The audience in that video is mostly Chinese, and they don't seem to have a problem understanding the speaker. So, it doesn't make much sense to have an English to Chinese live translator. It makes more sense to have a Chinese to English live translator, because when speaking English many Chinese have a strong accent which makes it difficult for American people to understand. Again, let me repeat, there's no "ching" "chong" word in Chinese. We never speak in that way.

    November 12, 2012 at 3:12 pm | Reply
    • Chris R

      I have to agree in part. I attend a large number of conferences and while the Chinese scientists will present their papers it's often in stilted and/or broken English with a large number of breaks and pauses while they search for the right word. The accent is often very thick and further degrades the impact of the presentation. Something that would allow them to present in their native language and offer simultaneous translation would really help the flow of information. That being said, something going the other way would also help when I'm presenting at conferences in Asia as a good number of the people I'm trying to reach do not necessarily have the strongest English skills.

      November 12, 2012 at 3:40 pm | Reply
      • Brian in TX

        you should learn to speak those other languages then

        November 13, 2012 at 4:20 pm |
    • Bruce

      "What's the point? Most educated Chinese speaks good English."

      Um, are you kidding me? A universal translator like something out of Star Trek, and you can't see any use for it.


      November 15, 2012 at 5:39 pm | Reply
    • Rex Rowland

      Many educated Chinese can understand English, but often only in a certain dialect. I am from California and taught English in China, and most of my foreign peers at that school were Australian. Another teacher and I could have a conversation with a Chinese person, and he/she could understand me fairly well but have a difficult time with the Aussie, likely because most foreign movies/TV they watch in China are American.

      November 17, 2012 at 1:22 pm | Reply
  5. CaesarXIII

    Now everyone can have their PC call them a pervert in their native language.

    November 12, 2012 at 2:26 pm | Reply
    • 蝈蝈

      isn't that a good thing?

      November 12, 2012 at 2:49 pm | Reply
  6. CNN_Bulla

    I have to see in order to believe it. Most products from MS so far have been marketed a lot better than they really are. Just ask those Vista users ...

    November 12, 2012 at 2:01 pm | Reply
    • 蝈蝈

      If you watch the whole video, you will see there's a live demonstration towards the end of that speech, and it appears to work pretty well (I would say it's totally understandable in Chinese).

      November 12, 2012 at 2:50 pm | Reply
    • Chris R

      Microsoft does some incredibly impressive research – often in partnership with universities and they do it on a very long time scale. The research project they mentioned – the one that took place at Carnegie Mellon using Markov analysis – was sponsored by Microsoft and was active when I was a student there in 1987 (My voice is part of that data set). The issue that they have is translating that research into compelling consumer products. They've had some big wins with their research but they've also had some that didn't really go anywhere. However, basic research is like that. Not every research project is going to be a win. What does matter is that MS is one of the few companies that still has a large scale internal R&D group.

      November 12, 2012 at 4:08 pm | Reply
  7. atroy

    Computer speaks Chinese LIKE ME???? Must not be a very smart computer because all I know is "ching chang chong".

    November 12, 2012 at 1:55 pm | Reply
    • 蝈蝈

      There's never a "ching chang chong" expression in Chinese. You retard!

      November 12, 2012 at 2:51 pm | Reply

      • 秦长城 (Qín Chángchéng, "the Qin Great Wall"), haha.

        November 17, 2012 at 11:35 am |
  8. r

    How wonderful! I am not gifted in languages. But I could see this not only breaking down barriers between people, but evolving as an incredible teaching tool. Go, guys, go!!!

    November 12, 2012 at 12:42 pm | Reply
  9. Peter

    Sometimes I lay flat on the floor and pretend to be a banana.

    November 12, 2012 at 12:13 pm | Reply
  10. The6thsense

    I am guessing the reverse can be done as well, otherwise a conversation cannot take place.

    November 12, 2012 at 12:00 pm | Reply
  11. rs

    Can we finally build the tower of babel?

    November 12, 2012 at 11:41 am | Reply
    • 蝈蝈

      We already did. It's called "international space station"

      November 12, 2012 at 2:52 pm | Reply
  12. Jack McSquirt

    Ultimately, I think we'll discover those ching chongs don't have much to say.

    November 12, 2012 at 2:08 am | Reply
    • banana guy

      Not much else to say when you walk into Walmart & nearly everything is made in China because of the terrible work ethics here in the USA.

      November 12, 2012 at 6:33 am | Reply
    • corryvreken

      F'ing hilarious. A+ sir.

      November 12, 2012 at 11:30 am | Reply
      • Miss Demeanor

        When the Chinese perfect brain transplants, I hope they experiment on you.

        November 12, 2012 at 12:11 pm |
    • Mike,Albany

      Really? I think you (specifically) will find that 99% of them are better than you at just about everything.

      November 12, 2012 at 12:05 pm | Reply
    • Miss Demeanor

      But YOU do, bigot? Why do people who talk the loudest always have the least to say?

      November 12, 2012 at 12:10 pm | Reply
    • 蝈蝈

      ultimately, you will discover the ching chongs have a lot to say, but you can't comprehend any of those even when translated.

      November 12, 2012 at 2:48 pm | Reply
  13. George

    babylon walls come tumbling down

    November 11, 2012 at 5:49 pm | Reply
  14. davidtco

    so disappointed. he claimed they would modulate the translated Chinese into his own voice, but that was Stephen Hawking's voice.

    November 10, 2012 at 3:07 am | Reply
    • Dan

      Not only that, but the stupid computer couldn't even translate his speech into correct written sentences. All they did was take the written English, translate it into Chinese, then voice it. Very UNimpressive. Maybe Microsoft is grasping at straws since they don't really have anything going for them?

      November 10, 2012 at 7:05 pm | Reply
      • tech god

        In a few years we might all see that Apple peaked with their cool little two-hit wonders (iPooed & their Maxi-pad thing) because Apple is already trying to convert from using small teams to create a few neat toys to what Microsoft has been for decades: expanding into a huge operation with large teams and taking over Microsoft's spot in the tech world. They MIGHT pull it off, but first they have to learn how to manage such a vast organization... and Microsoft has been doing that successfully for decades. I hope Apple succeeds, but they don't have it in their DNA. I think Microsoft will ROLL them FLAT...and since Goo-goo can only produce half-baked not-ready-for-prime-time toys... Microsoft will do what it always does... steadily improve their stuff until it becomes the best. Microsoft will keep at its devices until they prevail. I guarantee it.

        November 12, 2012 at 12:27 pm |
      • Chris R

        Dan, what are you talking about? They said this was a multi-step process where the spoken word was converted into English text (the captions at the bottom of the screen were being produced by the computer, not a typist) which was then translated into Chinese (including rearranging the word order) and then converted back into speech. That it happened in near real time is impressive. You sound disappointed because of the intermediate steps – that doesn't make a lot of sense to me. Any system like this – no matter who builds it – is going to have intermediate steps. The voice has to be converted into some sort of machine-processable data structure which can be acted upon in such a way as to create the proper grammatical structure for the target language (think about the position of adjectives in Spanish versus English). The text that they showed was really nothing more than a visual, human-readable instantiation of the intermediate data structures.

        Honestly, it makes sense to do it that way (as text or the data structure which can be represented as text) because they were able to make use of their existing translation engine as opposed to building an entirely new one from the ground up (which would have been a duplication of existing infrastructure). I'm guessing you aren't in the IT field – at least not on the development and research side. As someone who is I can assure you that what they are doing is impressive. If you think it's unimpressive and easy I encourage you to come up with your own system.

        As for how the guy's voice sounded – Chinese is a tonal language which makes inflection, pitch, and so forth very important. The same sound given with a rising or falling pitch can mean completely different things. This means that how you talk and sound in English doesn't necessarily mean you will sound the same way in Chinese. You use your mouth very differently and that will change how your voice sounds to a certain extent. Go watch some videos on English speakers talking in Chinese to see what I mean regarding this point.

        November 12, 2012 at 4:04 pm |
  15. jvivien

    kind of scary. others can steal your voice and make fake messages. can't tell what is real vs fake anymore

    November 10, 2012 at 12:41 am | Reply
    • S1N

      Nah. The technology isn't perfect yet. If you REALLY want to get confused between real and fake, drop some acid.

      November 10, 2012 at 2:09 am | Reply
  16. Ian

    My hovercraft is full of eels

    November 9, 2012 at 11:58 pm | Reply
    • ninjaman


      November 10, 2012 at 12:54 am | Reply
  17. Ryan

    Doesn't Gene Roddenberry get credit for the concept?

    November 9, 2012 at 10:58 pm | Reply
    • ready

      Murray Leinster says FIRST on that.

      November 10, 2012 at 12:23 am | Reply
