Microsoft has released a huge corpus of audio and text for conversational speech in German, French and English. It is free to download for personal use. I have just now downloaded it and had a quick look and listen to the contents.
There are many short dialogues in each language. For each audio file (.wav), there is an exact transcript/translation in plain text in all three languages.
Microsoft says "It can serve as a standardized data set for testing bilingual conversational speech translation systems". To me, it looks like a wonderful resource of conversational EN, FR and DE for language learners. I hope they will develop/release such a resource in other languages.
Microsoft corpus
Microsoft conversational speech in EN, FR and DE
- tommus
- Blue Belt
- Posts: 957
- Joined: Sat Jul 04, 2015 3:59 pm
- Location: Kingston, ON, Canada
- Languages: English (N), French (B2), Dutch (B2)
- x 1937
Dutch: 01 September -> 31 December 2020
● Watch 1000 Dutch TV Series Videos | : |
- Henkkles
- Green Belt
- Posts: 276
- Joined: Thu Apr 07, 2016 2:13 pm
- Languages: N FI | A EN SV | I EE RU | B FR LN
- x 795
What kind of interface is this meant to work with?
- tommus
Henkkles wrote: What kind of interface is this meant to work with?
As the website explains, this is a large dataset of audio and matching text that can be used to test "bilingual conversational speech translation systems such as the Microsoft Translator live feature and Skype Translator." So the interface would be whatever test setup the developers decided to use to test the accuracy of speech-to-text and text-to-speech in a bilingual conversational speech situation. It was not developed specifically as an interface for language learners, just a dataset for testing.
The main page links to a PDF article that describes how the corpus was produced and how Microsoft will use it.
http://workshop2016.iwslt.org/downloads/IWSLT_2016_paper_12.pdf
For use by language learners, it is simply conversational audio and accurate transcripts and translations. There could be various ways to make an automated system. But it is very usable, as is. It is rare to see so much conversational material available with audio and text, let alone in three languages.
- Henkkles
Ah, of course. I wish this existed, I'll have to learn to program first though.
- tommus
Henkkles wrote: Ah, of course. I wish this existed, I'll have to learn to program first though.
But don't wait for that. Play the audio files and follow (LR) in EN, FR and/or DE in the text files.
- Henkkles
Sadly neither of those is a current target for me!
Wait, could these be compiled into an Anki deck?
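A rough idea of how that might look in Python, offered only as a sketch. It assumes each utterance has already been paired up as (audio file name, transcript, translation); how the corpus actually names and lays out its files is not described in this thread, so that pairing step is left out. The function names `build_anki_rows` and `rows_to_tsv` are made up for illustration. The output is tab-separated text, which Anki can import as notes, with the audio referenced through Anki's `[sound:...]` field syntax.

```python
import csv
import io


def build_anki_rows(utterances):
    """Turn (wav_name, source_text, translation_text) tuples into card rows.

    The tuple layout is an assumption for this sketch, not the corpus format.
    Front of the card: audio tag plus transcript. Back: the translation.
    """
    rows = []
    for wav_name, source_text, translation_text in utterances:
        front = f"[sound:{wav_name}] {source_text.strip()}"
        rows.append((front, translation_text.strip()))
    return rows


def rows_to_tsv(rows):
    """Serialize rows as tab-separated text, one note per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical sample data; real file names and sentences will differ.
    sample = [
        ("dialog_001.wav", "Bonjour, comment ça va ?", "Hello, how are you?"),
        ("dialog_002.wav", "Très bien, merci.", "Very well, thanks."),
    ]
    with open("anki_import.txt", "w", encoding="utf-8", newline="") as f:
        f.write(rows_to_tsv(build_anki_rows(sample)))
```

The resulting `anki_import.txt` could then be loaded through Anki's file import. For the audio to play, the .wav files would presumably also need to be copied into the Anki collection's media folder.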