Microsoft conversational speech in EN, FR and DE

tommus
Blue Belt
Posts: 957
Joined: Sat Jul 04, 2015 3:59 pm
Location: Kingston, ON, Canada
Languages: English (N), French (B2), Dutch (B2)
x 1937

Postby tommus » Tue Feb 14, 2017 2:57 pm

Microsoft has released a huge corpus of audio and text for conversational speech in German, French and English. It is free to download for personal use. I have just now downloaded it and had a quick look and listen to the contents.

There are many short dialogues in each language. For each audio file (.wav), there is an exact transcript/translation in plain text in all three languages.

Microsoft says "It can serve as a standardized data set for testing bilingual conversational speech translation systems". To me, it looks like a wonderful resource of conversational EN, FR and DE for language learners. I hope they will develop/release such a resource in other languages.

Microsoft corpus
Dutch: 01 September -> 31 December 2020
Watch 1000 Dutch TV Series Videos : 40 / 1000

Henkkles
Green Belt
Posts: 276
Joined: Thu Apr 07, 2016 2:13 pm
Languages: N FI | A EN SV | I EE RU | B FR LN
x 795

Postby Henkkles » Tue Feb 14, 2017 6:20 pm

What kind of interface is this meant to work with?

Postby tommus » Tue Feb 14, 2017 8:13 pm

Henkkles wrote:What kind of interface is this meant to work with?

As the website explains, this is a large dataset of audio and matching text that can be used to test "bilingual conversational speech translation systems such as the Microsoft Translator live feature and Skype Translator." So the interface would be whatever test setup the developers decide to use to test the accuracy of speech-to-text and text-to-speech in a bilingual conversation. It was not developed as an interface for language learners; it is just a dataset for testing.

The main page links to a PDF article that describes how the corpus was produced and how Microsoft will use it.

http://workshop2016.iwslt.org/downloads/IWSLT_2016_paper_12.pdf

For use by language learners, it is simply conversational audio and accurate transcripts and translations. There could be various ways to make an automated system. But it is very usable, as is. It is rare to see so much conversational material available with audio and text, let alone in three languages.

Postby Henkkles » Tue Feb 14, 2017 8:24 pm

tommus wrote:There could be various ways to make an automated system.

Ah, of course. I wish this existed; I'll have to learn to program first, though.

Postby tommus » Tue Feb 14, 2017 8:35 pm

Henkkles wrote:Ah, of course. I wish this existed, I'll have to learn to program first though.

But don't wait for that. Play the audio files and follow along (LR, i.e. Listening-Reading) in EN, FR and/or DE in the text files.

Postby Henkkles » Tue Feb 14, 2017 8:45 pm

tommus wrote:But don't wait for that. Play the audio files and follow (LR) in EN, FR and/or DE in the text files.

Sadly neither of those is a current target for me!

Wait, could these be compiled into an Anki deck?
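
For anyone who wants to try the Anki idea, here is a rough Python sketch of one way it might be done. It pairs each audio clip with its transcripts and writes a tab-separated file that Anki can import. The file layout is purely an assumption for illustration (clips named like 0001.wav with transcripts 0001.fr.txt and 0001.en.txt); the real corpus may be organised differently, so the naming pattern would need adjusting. The function names build_anki_rows and write_anki_tsv are made up for this example.

```python
import csv
from pathlib import Path


def build_anki_rows(corpus_dir, front_lang="fr", back_lang="en"):
    """Pair each .wav clip with two transcripts and return Anki note rows.

    ASSUMED layout (hypothetical, not checked against the real corpus):
        <stem>.wav, <stem>.<lang>.txt   e.g. 0001.wav, 0001.fr.txt, 0001.en.txt
    """
    rows = []
    for wav in sorted(Path(corpus_dir).glob("*.wav")):
        front = wav.with_name(f"{wav.stem}.{front_lang}.txt")
        back = wav.with_name(f"{wav.stem}.{back_lang}.txt")
        if not (front.exists() and back.exists()):
            continue  # skip clips that lack one of the two transcripts
        rows.append([
            f"[sound:{wav.name}]",  # Anki's tag for playing a media file
            front.read_text(encoding="utf-8").strip(),
            back.read_text(encoding="utf-8").strip(),
        ])
    return rows


def write_anki_tsv(rows, out_path):
    """Write the rows as tab-separated text, one note per line."""
    with open(out_path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f, delimiter="\t").writerows(rows)
```

Something like write_anki_tsv(build_anki_rows("corpus_folder"), "deck.txt") would then give a file for Anki's File > Import, with a note type that has three fields (audio, target-language text, translation). The .wav files themselves would also have to be copied into the Anki profile's collection.media folder for the [sound:...] tags to play.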