Lingopods: a podcast search engine for language learners

ryanheise
Green Belt
Posts: 459
Joined: Tue Jun 04, 2019 3:13 pm
Location: Australia
Languages: English (N), Japanese (beginner)

Postby ryanheise » Tue Aug 23, 2022 7:40 am

Link ▸ https://lingopods.com

Lingopods is a search engine that helps you find podcasts in a foreign language on a desired topic and at a desired difficulty level, all with transcripts.

Supported languages: English, German, French, Italian, Spanish, Portuguese, Dutch, Japanese

The search engine is constantly crawling the web to pick up new podcasts and episodes with transcripts, and the index currently contains over 200,000 episodes.

I can add more languages; just request below which ones you would like me to add.

The way to get more podcasts into the search index is to contact the podcast authors and see if they would be interested in publishing transcripts. If they do publish them, the search engine will automatically pick them up with no further action. The one exception, where I have some control over adding a podcast myself, is the podcasts in the Librivox catalogue. So if there is a particular audiobook on Librivox that you would like indexed, you can also suggest it below and I'll consider adding it. In cases where Librivox publishes multiple versions of the same audiobook with different readers, I would rather index only a single version, so it would help to suggest the one version that you think has the highest quality recording and the best reading.




(Original post below)

The idea: a search engine that will collect all of the world's podcasts across different languages, rank them by difficulty, and then allow you to search for a podcast in a desired language that matches your interests and difficulty level. It has taken 3 years to get from the initial idea, through figuring out how it could actually be done, to finally implementing the first prototype, which I can now share below:

Link: https://lingopods.com/

Languages: English, German, French, Italian (next will be Japanese)

How to use it: Enter search phrases in the target language. E.g. if you're interested in French podcasts about history, you won't find many matches by searching for the English word "history"; you'll want to search for the French word "histoire". Also, use double quotes to enclose exact phrases, e.g. "an exact phrase"; otherwise each word will be treated as an independent keyword.

Todo

  • Embed a play button directly within the search results
  • Provide a link to the transcript directly within the search results
  • For convenience, provide subscribe links to various popular podcast players
  • Search by category (e.g. "Arts")
  • Search by media type (e.g. "Audio books")
  • Improve UI design
  • Increase transcript availability and accuracy in non-English languages

Feedback

As a prototype, I know there are many things to improve, so I would welcome any suggestions. Some ideas may be too difficult to implement in practice at a large scale, but you never know where an idea may lead either.

The state of language support

  • English: 66923 episodes
  • German: 14706 episodes
  • French: 1789 episodes
  • Italian: 2079 episodes

Although Japanese is the language I was primarily interested in, its stat would have looked something like this:

  • Japanese: 7 episodes

In English, transcripts have already become a popular practice. This is mainly because the hosting platforms for English provide the tools to publish transcripts, but also because there is a general awareness that transcripts help improve accessibility for the deaf and hard of hearing, and there is of course interest in SEO since transcripts allow keyword searches within each episode's content. There was also a motivating legal case where a podcast host was sued for not providing transcripts to deaf users.

But in terms of Japanese (and many other non-English languages, for that matter), the idea of podcast transcripts is far less established, and there is less access to and awareness of tools. So making the above search engine more helpful in those languages goes hand in hand with a second cause: increasing the accessibility of podcasts worldwide by encouraging podcasters to publish transcripts (or to switch to a podcast host that provides the option to publish transcripts).

In my case, I decided to create transcripts myself for one of my favourite Japanese podcasts (Nihongo con Teppei) and donate them to the podcast creator. That's not necessarily a scalable solution, I know, but I do not mind continuing to do this for more Japanese podcasts until it catches up somewhat to the more popular languages.

There is still an issue of the transcript quality. Since the Nihongo con Teppei example is actually intended for learners of Japanese, it would not look good to publish transcripts with any errors. I think in general, transcript accuracy is important on many fronts: for the deaf, for foreigners, and even for this search engine, because the quality of the search results is only going to be as good as the quality of the transcripts that you're searching through.

So at the moment, these are the problems I'm thinking about. 1) How to encourage more Japanese podcasters (and other languages for that matter) to publish transcripts, and 2) How to improve transcript quality.


History

* August 2019 - idea conception
* December 2019 - the first algorithm
* January 2020 - published the first results in this forum post
* July 2021 - published stats on larger scale results (40,000 English episodes) in this forum post
* August 2022 - released the first working prototype in the post you are reading now
Last edited by ryanheise on Sat Sep 03, 2022 5:45 pm, edited 3 times in total.

rdearman
Site Admin
Posts: 7231
Joined: Thu May 14, 2015 4:18 pm
Location: United Kingdom
Languages: English (N)
Language Log: viewtopic.php?f=15&t=1836

Postby rdearman » Tue Aug 23, 2022 10:37 am

Had a play around with Italian. (Would love to see Korean). Some improvements I can think of are boolean operators in the search, although the implementation could use either symbols or keywords. I'm assuming symbols as operators, e.g. ||, &&, etc., because they would be the same regardless of language.

cucinare && cuocere (AND)
cucinare !! cuocere (NOT)
"cucinare locale" || "frutti di mare" (OR) (quoted phrases as single match)

Postby ryanheise » Tue Aug 23, 2022 10:58 am

You can try these in the search query:

word1 word2 ---- search for word1 AND word2
word1 | word2 ---- search for word1 OR word2
word1 -word2 ---- search for word1 and NOT(word2)
"word1 word2" ----- word1 followed by word2
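
To make the query syntax above concrete, here is a minimal Python sketch of how such queries could be interpreted. It is not how Lingopods actually implements search: the names `parse` and `matches`, the simple word tokenisation, and the choice to give `|` the lowest precedence are all assumptions made for illustration.

```python
import re

# Tokens: an optionally negated quoted phrase, the OR symbol, or a bare word.
TOKEN = re.compile(r'-?"[^"]+"|\||[^\s|]+')
WORD = re.compile(r"\w+")


def parse(query):
    """Split a query into OR-alternatives, each a list of (negated, words) terms."""
    groups = [[]]
    for tok in TOKEN.findall(query):
        if tok == "|":
            groups.append([])  # assumption: "|" separates whole alternatives
            continue
        negated = tok.startswith("-")
        words = WORD.findall(tok.lower())  # a quoted phrase yields several words
        if words:
            groups[-1].append((negated, words))
    return [g for g in groups if g]


def contains_phrase(doc_words, words):
    """True if `words` occurs as a consecutive run inside `doc_words`."""
    n = len(words)
    return any(doc_words[i:i + n] == words for i in range(len(doc_words) - n + 1))


def matches(query, text):
    """True if any alternative has all its terms satisfied by the text."""
    doc_words = WORD.findall(text.lower())
    return any(
        all(contains_phrase(doc_words, words) != negated for negated, words in group)
        for group in parse(query)
    )


transcript = "Oggi voglio cucinare e cuocere il pane"
print(matches("cucinare cuocere", transcript))    # both words required
print(matches("cucinare -cuocere", transcript))   # second word excluded
print(matches('"cuocere il pane"', transcript))   # exact phrase
```

A real search engine would evaluate these operators against an inverted index rather than scanning each transcript, but the matching rules would be the same.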

Yes I would love to also add Korean since it's one of the languages I've studied, and I have a lot of Korean friends who could help me to tailor it to Korean. I think the reason I didn't do it for now is that podcasting isn't actually as popular in Korea as it is in Japan, so even if I wanted to add Korean, the number of podcasts would still be smaller. And if I did support Korean, I would want to ensure that there are actually good tools available to Korean podcasters to create transcripts, and at this stage, I think Korean speech recognition technology is not that great. If we can get both the speech recognition technology to improve and we can also somehow encourage more Korean podcasters to publish transcripts, then yes I can make it happen.

Postby ryanheise » Wed Aug 24, 2022 8:09 am

Here are two podcast apps that will display transcripts:

* Podcast Addict (Android)
* Podverse (Android, iOS)

I've added links to them on the site. In doing so I may have messed up the caching. Let me know if you encounter issues (force reloading the page should fix it, though).

I personally use Podcast Addict on Android, but Podverse also supports iOS and I believe there is also a web version if that is your preferred platform.

So once you find a podcast, the next step would be to add it to your subscriptions. I don't yet have any good integration with these apps, so what you would currently have to do after finding a podcast is to then go to your preferred app and copy across the same podcast title into the search bar.

Hopefully this is one of the things I can make easier in the future.

Here is a demo of what things look like in Podcast Addict for a Japanese podcast created by SBS:



In this particular case, I created the transcripts myself and have emailed SBS offering to donate them but (probably due to the bureaucracy of SBS) they haven't replied yet. Hopefully I can get through to them, because SBS is a government organisation and they are usually the first to care about things such as this. I'll keep trying or see if I know anybody who knows anybody who can get me in touch with someone there directly (degrees of separation or Bacon or such).

Stefan
Green Belt
Posts: 379
Joined: Sun Dec 20, 2015 9:59 pm
Location: Sweden
Languages: -

Postby Stefan » Wed Aug 24, 2022 12:03 pm

I love the idea but find it a bit difficult to navigate. The major issue is that it focuses on episodes and not podcasts. So when you do a search for, say, "buch", you end up with "Römische Geschichte Buch X" covering page after page. There are 93 episodes just in Buch 8. It would be more beneficial to get the podcast average with the outliers filtered out.

Postby ryanheise » Wed Aug 24, 2022 12:44 pm

Thanks Stefan, I think that's a great idea, although I think there are benefits to having both. Many discussion-based podcasts will talk about a different topic each week, or have a completely different guest each week, and so topic-based searches would be less effective the more episodes a podcast has. Additionally, it could be that a podcast is on average too difficult for you, but there just happen to be 2 episodes out of 400 that might be perfect for your interests and difficulty level, so if you're hunting for some listening practice, you might value those 2 individual episodes more than the podcast as a whole. I don't personally listen to Joe Rogan, but if he has an interesting guest on regarding a topic I'm interested in, I will probably listen to that one episode, just for the guest rather than the host. On the other hand, there are certainly podcasts that are all about a specific topic and hence I would subscribe to them in order to be notified of every new episode. So I think to help you find those subscribable podcasts it would be really helpful to have a search mode where you could search based on the average of all episodes in a podcast.
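
As a rough illustration of the "podcast average with the outliers filtered out" idea, here is a small Python sketch. It assumes each episode already has a numeric difficulty score (the scores below are made up), and it uses the common 1.5 × IQR rule to drop outliers; neither the function name `podcast_difficulty` nor the outlier rule comes from Lingopods itself.

```python
from collections import defaultdict
from statistics import mean, quantiles


def podcast_difficulty(episodes, k=1.5):
    """Average episode difficulty per podcast, ignoring outlier episodes.

    `episodes` is an iterable of (podcast_title, difficulty_score) pairs.
    """
    by_podcast = defaultdict(list)
    for podcast, score in episodes:
        by_podcast[podcast].append(score)

    result = {}
    for podcast, scores in by_podcast.items():
        if len(scores) >= 4:
            # Keep scores within k * IQR of the first and third quartiles.
            q1, _, q3 = quantiles(scores, n=4)
            spread = k * (q3 - q1)
            kept = [s for s in scores if q1 - spread <= s <= q3 + spread]
        else:
            kept = scores  # too few episodes to judge what an outlier is
        result[podcast] = mean(kept)
    return result


# Hypothetical scores: one unusually hard episode in an otherwise easy podcast.
episodes = [
    ("Easy Talk", 10), ("Easy Talk", 11), ("Easy Talk", 12),
    ("Easy Talk", 11), ("Easy Talk", 10), ("Easy Talk", 50),
    ("News Hour", 30), ("News Hour", 34),
]
print(podcast_difficulty(episodes))
```

Episode-level results would still be needed for the "2 episodes out of 400" case, so a per-podcast score like this would sit alongside the existing search rather than replace it.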

Postby ryanheise » Thu Sep 01, 2022 6:14 am

UPDATES

Note: Use SHIFT-reload to force the new updates to load

Japanese support has been added. Note that searching in Japanese is not as accurate as for other languages due to the lack of spaces, but hopefully this will improve in the future. There are currently 2,308 Japanese episodes in the search index. At this stage, there are 64 Japanese podcasts with transcripts; however, the bulk of the episodes come from Teppei's 7 podcasts and Japanese audiobooks on Librivox that have also been published as podcasts. The remaining Japanese podcasts contribute a smaller number of episodes overall because they have only provided transcripts for one or two of their episodes.
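
On the point about Japanese having no spaces: one common workaround, shown here purely as an illustration and not as what Lingopods does, is to index overlapping character n-grams instead of words. The helper names `char_ngrams` and `ngram_match` are hypothetical.

```python
def char_ngrams(text, n=2):
    """Overlapping character n-grams of the text, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def ngram_match(query, text, n=2):
    """True if every n-gram of the query also occurs somewhere in the text."""
    grams = set(char_ngrams(text, n))
    query_grams = char_ngrams(query, n)
    return bool(query_grams) and all(g in grams for g in query_grams)


sentence = "私は日本語を勉強します"
print(char_ngrams("日本語"))           # the two bigrams of the query
print(ngram_match("日本語", sentence))  # query bigrams all occur in the sentence
print(ngram_match("歴史", sentence))    # query bigram does not occur
```

This approach can produce false positives, because the query's n-grams may all be present without being adjacent. That is one reason results can be less accurate than with a proper word segmenter.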

I would be happy to create more transcripts for Japanese podcasts (and perhaps other languages) and donate them to the podcaster, but only if the podcaster is themselves interested in publishing transcripts. To make it easier, I will create some sort of automated tool that podcasters can use to transcribe their podcasts, and then if anyone wishes that a certain podcast had a transcript, you can contact the podcaster yourself and send them the link to the tool. Once a podcast is transcribed, the search index should automatically pick it up on the next crawl.

I'm also investigating again what the options are to improve automated transcription quality for Korean, since the last time I checked it was not good enough.

• The search index has expanded to 225,425 episodes.

• I have also fixed a bug in recognising input outside of the Roman alphabet (oops!). I only noticed this when adding Japanese, but it also would have affected languages with umlauts etc.

UPCOMING

Now that I've built up such a large database, I am thinking there are many things that could be done with this, but I'm not sure what to work on next.

One idea is a feature to find the optimal selection and sequence of episodes across all podcasts for vocabulary acquisition, such that each new episode would introduce at most a certain number of new words, and at least a certain amount of repetition of those new words, and would also ensure that the new words you're learning are repeated frequently enough in future episodes before you'd be likely to forget them, sort of like a natural SRS where there are no cards, just more podcasts to enjoy.

Basically, you'd pick a vocabulary domain, like economics, and then find the optimal way to absorb the economics vocabulary.
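
To give a feel for the sequencing idea, here is a simple greedy sketch in Python. It is only a heuristic under stated assumptions, not an optimal solution and not an actual Lingopods feature: each episode is reduced to a set of words, the function name `sequence_episodes` and the `max_new` limit are hypothetical, and the timing of repetitions (the forgetting part) is ignored apart from preferring episodes that repeat the most recently learned words.

```python
def sequence_episodes(episodes, known, max_new=3):
    """Greedily order episodes so each adds between 1 and `max_new` new words.

    `episodes` maps an episode title to the set of words it contains.
    `known` is the set of words the learner already knows.
    """
    known = set(known)
    remaining = dict(episodes)
    recent = set()  # words introduced by the previously chosen episode
    order = []
    while remaining:
        candidates = []
        for title, words in remaining.items():
            new = words - known
            if 1 <= len(new) <= max_new:
                # Prefer more repetition of recent words, then fewer new words.
                candidates.append((-len(words & recent), len(new), title))
        if not candidates:
            break  # every remaining episode is too hard or teaches nothing new
        _, _, title = min(candidates)
        recent = remaining.pop(title) - known
        known |= recent
        order.append(title)
    return order


# Hypothetical economics vocabulary, with a learner who only knows "money".
episodes = {
    "a": {"money", "bank"},
    "b": {"money", "bank", "loan", "interest"},
    "c": {"loan", "interest", "inflation", "tax", "budget", "debt"},
    "d": {"interest", "inflation", "tax"},
}
print(sequence_episodes(episodes, known={"money"}, max_new=2))
```

With these inputs, episode "c" is too hard at the start (five unknown words) but becomes reachable once "b" and "d" have introduced most of its vocabulary.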

Related to this, I have been researching better ways to perform search, and I think the best way to support the above feature may also coincide with a different (and maybe better) way of doing regular searches. This may also mean giving up the boolean query feature and instead trying to understand the meaning of what you're typing. This would allow it to search for podcasts that don't necessarily contain the exact same word that you searched for, but maybe podcasts that contain similar words that mean the same thing.

Postby ryanheise » Sat Sep 03, 2022 11:09 am

Added a Spanish selection.

Right now, there are 958 episodes across 106 Spanish podcasts.

Spanish has fewer "easier" podcasts than the other languages, but if there is a Spanish podcast out there you know of that is easier but doesn't have a transcript, feel free to share it below. I'll try contacting the podcast author to see about getting it transcribed.

Postby ryanheise » Sat Sep 03, 2022 5:48 pm

Added Dutch. Stats: 53 podcasts and 274 episodes have transcripts. (Coinciding with Max Verstappen's home race in F1 this weekend.)
Added Portuguese. Stats: 49 podcasts and 471 episodes have transcripts.

I'm working my way through the languages from largest to smallest podcast numbers, so each new language from here will have a smaller number of podcasts. Next in my list are the Chinese languages, Arabic, Polish, Finnish, Russian, Slovak, Danish, Swedish, Vietnamese, Norwegian, Hungarian, Hebrew, Turkish, Czech and Romanian (although the tail end has significantly smaller numbers, so I'm not sure where the cutoff point should be for usefulness).