I've been studying Chinese for 2 years now, and I have put together several vocabulary lists (the HSK lists and other frequency lists).
I converted them to a common format so they are easier to work with (importing into Anki, writing programs that use them).
I generated 3 data formats: CSV, XML and JSON.
Here is an example of each:
JSON format:
{
"hanzi": "桌子 ",
"traditional": "桌子 ",
"pinyin": "zhuōzi",
"translation": "table / bureau ",
"classifier": null,
"lesson": "HSK1",
"sound": "[sound:cmn-60f2dada.ogg]",
"origin": "Chinwa"
}
CSV format:
爱 愛 ài aimer / affection / apprécier HSK1 [sound:cmn-2d9d12c4.ogg] Chinwa
八 八 bā huit / 8 HSK1 [sound:cmn-5b366cae.ogg] Chinwa
XML format:
<enregistrement>
<hanzi>爱 </hanzi>
<traditional>愛 </traditional>
<pinyin>ài</pinyin>
<translation>aimer / affection / apprécier </translation>
<classifier/>
<lesson>HSK1</lesson>
<sound>[sound:cmn-2d9d12c4.ogg]</sound>
<origin>Chinwa</origin>
</enregistrement>
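For anyone who wants to read these records from a script, here is a minimal Python sketch (not the actual formatting programs). It loads the JSON and XML samples above into the same dictionary shape and writes one tab-separated line of the kind Anki can import. The function names are made up for illustration, and the field order is an assumption taken from the JSON sample. It also strips the trailing spaces that some of the sample values carry.

```python
import json
import xml.etree.ElementTree as ET

# Field order taken from the JSON sample; assumed to match the CSV columns.
FIELDS = ["hanzi", "traditional", "pinyin", "translation",
          "classifier", "lesson", "sound", "origin"]

def clean(value):
    """Strip stray whitespace; None (JSON null or missing tag) becomes ""."""
    return (value or "").strip()

def record_from_json(text):
    """Parse one JSON record into a dict with cleaned string values."""
    data = json.loads(text)
    return {name: clean(data.get(name)) for name in FIELDS}

def record_from_xml(text):
    """Parse one <enregistrement> element into the same dict shape."""
    root = ET.fromstring(text)
    return {name: clean(root.findtext(name)) for name in FIELDS}

def to_tsv_line(record):
    """Join the fields with tabs (fields containing tabs would need quoting)."""
    return "\t".join(record[name] for name in FIELDS)

JSON_SAMPLE = """{
  "hanzi": "桌子 ", "traditional": "桌子 ", "pinyin": "zhuōzi",
  "translation": "table / bureau ", "classifier": null, "lesson": "HSK1",
  "sound": "[sound:cmn-60f2dada.ogg]", "origin": "Chinwa"
}"""

XML_SAMPLE = """<enregistrement>
  <hanzi>爱 </hanzi>
  <traditional>愛 </traditional>
  <pinyin>ài</pinyin>
  <translation>aimer / affection / apprécier </translation>
  <classifier/>
  <lesson>HSK1</lesson>
  <sound>[sound:cmn-2d9d12c4.ogg]</sound>
  <origin>Chinwa</origin>
</enregistrement>"""

print(to_tsv_line(record_from_json(JSON_SAMPLE)))
print(to_tsv_line(record_from_xml(XML_SAMPLE)))
```

Keeping one field list for both parsers means a record looks the same whichever file it came from, so the Anki export only has to be written once.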
The lists are available with French, English and German translations (though not every list has all three).
It would be nice to have a sound for every record. I used the ones in the Shtooka databases, but many sounds are missing. I also did some audio splitting on the FSI Chinese recordings, but there is still a lot of work to do.
If someone knows a free source of recordings of Chinese words, it would be great to link them in the files.
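To see how big the gap is, a short script can list the records that have no sound yet. The sketch below is only an illustration in Python: it assumes a list of records shaped like the JSON sample, the function name is hypothetical, and the second record has its sound blanked on purpose to show the missing case.

```python
def missing_sound(records):
    """Return the hanzi of every record whose "sound" field is empty or absent."""
    return [
        record["hanzi"].strip()
        for record in records
        if not (record.get("sound") or "").strip()
    ]

# Illustrative data only: the sound of the second record is removed on purpose.
records = [
    {"hanzi": "爱 ", "lesson": "HSK1", "sound": "[sound:cmn-2d9d12c4.ogg]"},
    {"hanzi": "八 ", "lesson": "HSK1", "sound": None},
]

todo = missing_sound(records)
print(f"{len(todo)} of {len(records)} records have no sound:", ", ".join(todo))
```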
So, if anyone is interested in these lists, I can put them on the website (for private use only: I gathered the original lists a long time ago and don't know their copyright status, and some of the source websites no longer exist).
The origin is embedded in the files (when available).
Some of the lists available:
wikidictionary
Chinwa
HSK academy
Official HSK
...
The MaineEdu website has a ton of sentences recorded by different speakers, with English translations. I scraped the site and created big CSV, XML and JSON files with all the content (one large list of sentences with audio and translations).
An example of a JSON record:
{
"phrase": {
"topic": "Talking with Children 孩子 - Character Review",
"hanzi": {
"simplified": "你 耳 朵 疼 吗 、使 劲 儿 咽 几 下 儿 。",
"traditional": "你 耳 朵 疼 嗎 、使 勁 兒 咽 幾 下 兒 。"
},
"pinyin": "nǐ ěrduō téng ma? shǐjìnr yàn jǐxiàr.",
"translations": [
{
"translation": {
"langue": "en",
"texte": "Do your ears hurt? Swallow very hard."
}
}
],
"recordings": [
{
"recording": {
"langue": "zh",
"locuteur": "Cao Lihong",
"audio": "../../Language/Sound19a/19103cao.wav"
}
},
{
"recording": {
"langue": "zh",
"locuteur": "Shao Jingxian",
"audio": "../../Language/Sound19b/19103sjx.wav"
}
},
{
"recording": {
"langue": "zh",
"locuteur": "Ren Shuang",
"audio": "../../Language/Sound19c/19103rs.wav"
}
},
{
"recording": {
"langue": "zh",
"locuteur": "Zhao Mo",
"audio": "../../Language/Sound19d/19103zm.wav"
}
},
{
"recording": {
"langue": "zh",
"locuteur": "Li Xinzhou",
"audio": "../../Language/Sound19e/19103lxz.wav"
}
},
{
"recording": {
"langue": "en",
"locuteur": "Cashmeira",
"audio": "../../Language/Sounde19a/19103csh.wav"
}
},
{
"recording": {
"langue": "en",
"locuteur": "Cherie",
"audio": "../../Language/Sounde19b/19103ca.wav"
}
}
]
}
},
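The nesting in this record (a wrapper object around each recording) makes it a bit awkward to query, so here is a minimal Python sketch that groups the audio files by language. The key names come from the record above; the function name is hypothetical, and the sample is trimmed to two of the seven recordings.

```python
from collections import defaultdict

def recordings_by_language(entry):
    """Group (speaker, audio path) pairs by the "langue" code of each recording."""
    grouped = defaultdict(list)
    for item in entry["phrase"]["recordings"]:
        rec = item["recording"]
        grouped[rec["langue"]].append((rec["locuteur"], rec["audio"]))
    return dict(grouped)

# Trimmed copy of the record above: one Chinese and one English recording.
sample = {
    "phrase": {
        "pinyin": "nǐ ěrduō téng ma? shǐjìnr yàn jǐxiàr.",
        "recordings": [
            {"recording": {"langue": "zh", "locuteur": "Cao Lihong",
                           "audio": "../../Language/Sound19a/19103cao.wav"}},
            {"recording": {"langue": "en", "locuteur": "Cashmeira",
                           "audio": "../../Language/Sounde19a/19103csh.wav"}},
        ],
    }
}

for language, items in recordings_by_language(sample).items():
    for speaker, audio in items:
        print(language, speaker, audio)
```

A training program could use this to pick a random Chinese speaker for each sentence, for example.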
I also converted the following dictionaries:
cedict
cfdict
handedict
chdict
I forgot to mention some audio EPUBs I made from websites like grammar-wiki, rudger's chinese, etc. Some of them still need work.
http://divers.yojik.eu/
I'm writing my own training programs to use the lists.
Here are some screenshots (the programs are not finished yet, but they are in good shape!):
http://divers.yojik.eu/c1.png
http://divers.yojik.eu/c2.png
http://divers.yojik.eu/c3.png
Nothing original, just my own versions of programs found on the web, which I can customize with my own lists and exercises.
I also finished the first 2 modules of FSI-Chinese (in PDF format only for now).
http://divers.yojik.eu/FSI-Chinese.pdf
I forgot to mention that I wrote all the formatting programs in Node.js. They are available if someone wants them (as source code, of course).
As usual, send me any comments or updates.
Eric