Since many subtitles are not encoded as UTF-8, I've slapped together a very simple command-line tool that converts non-UTF-8 subtitle files to UTF-8.
To use it, unzip srt2utf8.exe into the folder that contains the subtitles and drag and drop the subtitle file(s) onto srt2utf8.exe. You can also double-click srt2utf8.exe to process all .srt files in the folder, or run it from a command prompt window.
For example, if you've downloaded an Arabic subtitle file named subtitle.srt and process it with srt2utf8.exe, you should end up with:
subtitle.ar.srt (the converted utf-8 file)
subtitle.windows-1256.bak (the original subtitle file)
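For illustration only, here is a rough shell sketch of the conversion and naming scheme described above. It is not srt2utf8.exe: it assumes you already know the code page and the language code and pass them in yourself, whereas the tool detects both. The function name convert_srt is hypothetical.

```shell
# Hypothetical helper (NOT srt2utf8.exe): converts one subtitle file to UTF-8
# and mimics the naming scheme above. The code page and language code are
# supplied by the caller; nothing is detected automatically.
# Usage: convert_srt FILE CODEPAGE LANGCODE
#   e.g. convert_srt subtitle.srt windows-1256 ar
convert_srt() {
    local file="$1" codepage="$2" langcode="$3"
    local base="${file%.srt}"
    local out="$base.$langcode.srt"
    # Write the UTF-8 version first; rename the original only if iconv succeeded
    if iconv -f "$codepage" -t UTF-8 "$file" > "$out"; then
        mv "$file" "$base.$codepage.bak"
    else
        # Remove the partial output so a failed run leaves only the original
        rm -f "$out"
        return 1
    fi
}
```

With subtitle.srt in the current folder, convert_srt subtitle.srt windows-1256 ar should leave subtitle.ar.srt and subtitle.windows-1256.bak, matching the example above.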
If the subtitle file is already a UTF-8 file, the tool will try to detect the language and insert the language code before the file extension. In other words, if nothing happens when you use the tool, the file either is already UTF-8 or its code page couldn't be converted.
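To check by hand whether a file is already UTF-8, one option (a sketch, not what srt2utf8.exe does internally) is to ask iconv to decode it as UTF-8 and see whether that fails. This only checks that the bytes are valid UTF-8; it does not detect the language, and plain ASCII files also pass. The function name is_utf8 is made up for this example.

```shell
# Hypothetical helper: succeeds (exit status 0) if FILE is valid UTF-8.
# iconv exits with an error on the first byte sequence that is not valid UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Example: report every .srt file in the current folder
for f in *.srt; do
    [ -e "$f" ] || continue   # no .srt files: the pattern stays unexpanded
    if is_utf8 "$f"; then
        echo "$f: already UTF-8"
    else
        echo "$f: needs conversion"
    fi
done
```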
Note that the code page/language detection libraries that I've used aren't 100% reliable.
If you're interested in this tool, PM me for the download link.
Simple SRT code page converter
- Doitsujin
- Green Belt
- Posts: 404
- Joined: Sat Jul 18, 2015 6:21 pm
- Languages: German (N)
- x 807
Last edited by Doitsujin on Sun Nov 10, 2019 6:39 am, edited 1 time in total.
- rdearman
- Site Admin
- Posts: 7260
- Joined: Thu May 14, 2015 4:18 pm
- Location: United Kingdom
- Languages: English (N)
- Language Log: viewtopic.php?f=15&t=1836
- x 23316
Re: Simple SRT code page converter
On Linux you could just run this command:
Code:
vim +'set nobomb | set fenc=utf8 | x' <filename>
On Windows you can just open it in Notepad++ and change the encoding to UTF-8 and save it.
: Read 150 books in 2024
My YouTube Channel
The Autodidactic Podcast
My Author's Newsletter
I post on this forum with mobile devices, so excuse short msgs and typos.
- Doitsujin
Re: Simple SRT code page converter
rdearman wrote: On Linux you could just run this command:
Code:
vim +'set nobomb | set fenc=utf8 | x'
I only have limited Linux skills, but what would be the complete command line with the file name? Or is this a command that I'd have to enter in vim?
rdearman wrote: On Windows you can just open it in Notepad++ and change the encoding to UTF-8 and save it.
Of course, this can be easily done for one file, but it's much more convenient to use my tool to convert a whole folder with .srt files.
- rdearman
Re: Simple SRT code page converter
Doitsujin wrote: I only have limited Linux skills, but what would be the complete command line with the file name? Or is this a command that I'd have to enter in vim?
The vim command runs from the command line; you don't need to open the file in vim first. There are also two other commands you can use on Linux. The best known is iconv, but you have to know the encoding of the original file in order to use it. Below is a script that re-encodes all matching files once you have entered that encoding.
Code:
#!/bin/bash
# Enter the input encoding of the original files here
FROM_ENCODING="value_here"
# Output encoding
TO_ENCODING="UTF-8"
# Convert every matching file; the originals are left untouched
for file in *.txt; do
    # Skip the literal pattern if no .txt files exist
    [ -e "$file" ] || continue
    iconv -f "$FROM_ENCODING" -t "$TO_ENCODING" "$file" -o "${file%.txt}.utf8.converted"
done
exit 0
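A possible variant of the script above, sketched under the assumption that you'd rather pass the encoding and the file extension as arguments than edit the script (this also covers switching from *.txt to *.srt). The function name convert_all is hypothetical; like the original, it writes *.utf8.converted copies and leaves the originals alone.

```shell
# Hypothetical variant of the script above.
# Usage: convert_all FROM_ENCODING [EXTENSION]   e.g. convert_all windows-1256 srt
convert_all() {
    local from="$1" ext="${2:-srt}" file out
    # Stop early if iconv does not recognise the encoding name
    if ! iconv -f "$from" -t UTF-8 < /dev/null > /dev/null 2>&1; then
        echo "unknown encoding: $from" >&2
        return 1
    fi
    for file in *."$ext"; do
        [ -e "$file" ] || continue   # no matching files
        out="${file%."$ext"}.utf8.converted"
        if ! iconv -f "$from" -t UTF-8 "$file" > "$out"; then
            echo "could not convert: $file" >&2
            rm -f "$out"             # don't leave a half-written copy behind
        fi
    done
}
```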
To convert multiple files, you can combine find with iconv as below. This detects the encoding of each original with the file command and then overwrites the original with the converted version:
Code:
find . -type f -iname '*.txt' -exec sh -c 'iconv -f "$(file -b --mime-encoding "$1")" -t UTF-8 -o converted "$1" && mv converted "$1"' sh {} \;
You can also use recode, which converts the file in place (the syntax is FROM..TO):
Code:
recode ISO-8859-15..UTF-8 in.txt
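recode rewrites the file in place, while iconv normally writes to standard output. If recode isn't installed, a similar in-place conversion to UTF-8 can be sketched with iconv; the function name to_utf8_in_place and its argument order are made up for this example.

```shell
# Hypothetical helper: convert FILE from FROM_ENCODING to UTF-8 in place,
# roughly like `recode FROM_ENCODING..UTF-8 FILE`.
# Usage: to_utf8_in_place FROM_ENCODING FILE   e.g. to_utf8_in_place ISO-8859-15 in.txt
to_utf8_in_place() {
    local from="$1" file="$2"
    local tmp="$file.tmp.$$"
    # Convert into a temporary file next to the original...
    if iconv -f "$from" -t UTF-8 "$file" > "$tmp"; then
        # ...and replace the original only if the conversion succeeded
        # (the result gets the temporary file's permissions)
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}
```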
On Windows you can install vim (which is cross-platform) and run the same command. Another option on Windows is PowerShell:
Code:
Get-Content .\test.txt | Set-Content -Encoding utf8 test-utf8.txt
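To convert multiple files, you can use a PowerShell loop:
Code:
foreach ($file in Get-ChildItem *.txt) {
    # Print the name of the file being converted
    Write-Output $file.Name
    # Write a UTF-8 copy next to the original, e.g. test.txt -> test.utf8.converted
    Get-Content $file.FullName | Set-Content -Encoding utf8 (Join-Path $file.DirectoryName ($file.BaseName + ".utf8.converted"))
}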
I HAVE NOT TESTED ALL THIS! No warranty, your mileage may vary.
- Doitsujin
Re: Simple SRT code page converter
rdearman wrote: The vim is from the command line, you don't need to open the file.
Unfortunately, this method doesn't work for non-Latin .srt files. Moreover, vim is very much an acquired taste...
rdearman wrote: There are also two other commands you can use in Linux. The most well known is iconv [...]
Your shell script worked fine. (Obviously, users will need to change *.txt to *.srt.)
rdearman wrote: In Windows you can install vim (which is cross platform) and run the same command.
IMHO, it'd be much easier to use the Windows port of iconv or my tool.
- rdearman
Re: Simple SRT code page converter
I don't use vim myself; I'm an Emacs man. I don't use Windows much, which is why I gave the warnings about the PowerShell scripts.