Link ▸ https://lingopods.com
Lingopods is a search engine that helps you find podcasts in a foreign language on a desired topic and at a desired difficulty level, all with transcripts.
Supported languages: English, German, French, Italian, Spanish, Portuguese, Dutch, Japanese
The search engine is constantly crawling the web to pick up new podcasts and episodes with transcripts, and the index currently contains over 200,000 episodes.
I can add more languages; just request below which ones you would like me to add.
The way to get more podcasts into the search index is to contact the podcast authors and see if they would be interested in publishing transcripts. If they do publish them, the search engine will automatically pick them up with no further action. The one exception, where I have some control over adding a podcast myself, is the LibriVox catalogue. So if there is a particular audiobook on LibriVox that you would like indexed, you can also suggest it below and I'll consider adding it. In cases where LibriVox publishes multiple versions of the same audiobook with different readers, I would rather index only a single version, so it would help to suggest the one version that you think has the highest quality recording and the best reading.
(Original post below)
The idea: a search engine that will collect all of the world's podcasts across different languages, rank them by difficulty, and then allow you to search for a podcast in a desired language that matches your interests and difficulty level. It has taken 3 years to go from the initial idea, to figuring out how it could actually be done, to finally implementing the first prototype, which I can now share below:
Link: https://lingopods.com/
Languages: English, German, French, Italian (next will be Japanese)
How to use it: Enter search phrases in the target language. E.g. if you're interested in French podcasts about history, you won't find many matches by searching for the English word "history"; you'll want to search for the French word "histoire". Also, use double quotes to enclose exact phrases, e.g. "an exact phrase"; otherwise each word will be treated as an independent keyword.
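To illustrate the quoting rule above, here is a minimal Python sketch of how a query could be split into exact phrases and independent keywords. This is only an illustration and not how Lingopods actually parses queries; the function name `parse_query` and the "phrase"/"keyword" labels are made up for this example.

```python
import re


def parse_query(query: str) -> list[tuple[str, str]]:
    """Split a search query into ("phrase", text) and ("keyword", text) terms.

    Hypothetical helper for illustration only: text inside double quotes is
    kept together as one exact phrase, and every other word becomes its own
    independent keyword.
    """
    terms = []
    # Try a double-quoted phrase first, then fall back to a run of
    # non-whitespace characters (a single word).
    for match in re.finditer(r'"([^"]+)"|(\S+)', query):
        phrase, word = match.groups()
        if phrase is not None:
            text, kind = phrase.strip(), "phrase"
        else:
            # A stray, unmatched quote is dropped rather than searched for.
            text, kind = word.strip('"'), "keyword"
        if text:
            terms.append((kind, text))
    return terms


if __name__ == "__main__":
    for kind, text in parse_query('histoire "seconde guerre mondiale" Paris'):
        print(kind, text)
```

With this sketch, the unquoted words "histoire" and "Paris" become two separate keywords, while the quoted text is kept together as a single phrase.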
Todo
- Embed a play button directly within the search results
- Provide a link to the transcript directly within the search results
- For convenience, provide subscribe links to various popular podcast players
- Search by category (e.g. "Arts")
- Search by media type (e.g. "Audio books")
- Improve UI design
- Increase transcript availability and accuracy in non-English languages
Feedback
As a prototype, I know there are many things to improve, so I would welcome any suggestions. Some ideas may be too difficult to implement in practice at a large scale, but you never know where an idea may lead either.
The state of language support
- English: 66923 episodes
- German: 14706 episodes
- French: 1789 episodes
- Italian: 2079 episodes
Although Japanese is the language I was primarily interested in, its stat would have looked something like this:
- Japanese: 7 episodes
In English, transcripts have already become a popular practice. This is mainly because the hosting platforms for English provide the tools to publish transcripts, but also because there is a general awareness that transcripts help improve accessibility for the deaf and hard of hearing, and there is of course interest in SEO since transcripts allow keyword searches within each episode's content. There was also a motivating legal case where a podcast host was sued for not providing transcripts to deaf users.
But for Japanese (and many other non-English languages, for that matter), the idea of podcast transcripts is far less established, and there is less access to and awareness of the tools. So in order to make the above search engine more helpful in those languages, there is a dual cause: to increase the accessibility of podcasts worldwide, and to promote to podcasters the idea of publishing transcripts (or switching to a podcast host that provides the option to publish them).
In my case, I decided to create transcripts myself for one of my favourite Japanese podcasts (Nihongo con Teppei) and donate them to the podcast creator. That's not necessarily a scalable solution, I know, but I do not mind continuing to do this for more Japanese podcasts until it catches up somewhat to the more popular languages.
There is still the issue of transcript quality. Since the Nihongo con Teppei example is actually intended for learners of Japanese, it would not look good to publish transcripts with any errors. I think in general, transcript accuracy is important on many fronts: for the deaf, for foreigners, and even for this search engine, because the quality of the search results is only going to be as good as the quality of the transcripts that you're searching through.
So at the moment, these are the problems I'm thinking about: 1) how to encourage more Japanese podcasters (and podcasters in other languages, for that matter) to publish transcripts, and 2) how to improve transcript quality.
History
* August 2019 - idea conception
* December 2019 - the first algorithm
* January 2020 - published the first results in this forum post
* July 2021 - published stats on larger scale results (40,000 English episodes) in this forum post
* August 2022 - released the first working prototype in the post you are reading now