Simple SRT code page converter

Doitsujin
Orange Belt
Posts: 179
Joined: Sat Jul 18, 2015 6:21 pm
Languages: German (N)
x 331

Simple SRT code page converter

Postby Doitsujin » Thu Sep 05, 2019 1:30 pm

Since many subtitles are not encoded as UTF-8, I've slapped together a very simple command-line tool that'll convert non-UTF-8 files to UTF-8.

To use it, unzip srt2utf8.exe to the folder that contains the subtitles and drag & drop the subtitle file(s) onto srt2utf8.exe. You can also double-click srt2utf8.exe to process all .srt files in the folder. Of course, you can also use it in a command prompt window.

For example, if you've downloaded an Arabic subtitle file named subtitle.srt and process it with srt2utf8.exe, you should end up with:

subtitle.ar.srt (the converted UTF-8 file)
subtitle.windows-1256.bak (the original subtitle file)

If the subtitle file is already UTF-8, the tool will try to detect the language and insert the language code before the file extension. If nothing happens when you use the tool, the file is either already UTF-8 or its code page couldn't be converted.

Note that the code page/language detection libraries that I've used aren't 100% reliable.

Here's the download link.

rdearman
Site Admin
Posts: 4672
Joined: Thu May 14, 2015 4:18 pm
Location: United Kingdom
Languages: English (N)
Language Log: viewtopic.php?f=15&t=1836
x 11034

Re: Simple SRT code page converter

Postby rdearman » Thu Sep 05, 2019 5:21 pm

On Linux you could just run this command:

Code: Select all

vim +'set nobomb | set fenc=utf8 | x' <filename>


On Windows you can just open it in Notepad++ and change the encoding to UTF-8 and save it.


Re: Simple SRT code page converter

Postby Doitsujin » Thu Sep 05, 2019 8:46 pm

rdearman wrote:On Linux you could just run this command:

Code: Select all

vim +'set nobomb | set fenc=utf8 | x'

I only have limited Linux skills, but what would be the complete command line with the file name? Or is this a command that I'd have to enter in vim?

rdearman wrote:On Windows you can just open it in Notepad++ and change the encoding to UTF-8 and save it.
Of course, this can be easily done for one file, but it's much more convenient to use my tool to convert a whole folder with .srt files.

Re: Simple SRT code page converter

Postby rdearman » Thu Sep 05, 2019 10:44 pm

The vim command runs from the command line; you don't need to open the file in vim yourself. There are also two other commands you can use on Linux. The best known is iconv, but you have to know the encoding of the original file in order to use it. Below is a script that converts every .txt file in the current folder once you've entered the source encoding:

Code: Select all

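#!/bin/bash
# Enter the input encoding here (run "iconv -l" to list the valid names)
FROM_ENCODING="value_here"
# Output encoding (UTF-8)
TO_ENCODING="UTF-8"
# Loop to convert multiple files
for file in *.txt; do
    # Skip the unexpanded pattern if no file matches
    [ -e "$file" ] || continue
    # Write the converted copy next to the original
    iconv -f "$FROM_ENCODING" -t "$TO_ENCODING" "$file" > "${file%.txt}.utf8.converted"
done
exit 0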
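
One thing the script above doesn't guard against is a file that is already UTF-8: converting it a second time would garble it. Here is a rough sketch of a guard, assuming GNU iconv (or a compatible one) is installed. The function names is_utf8 and convert_if_needed are made up for this example and are not part of iconv or of srt2utf8.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only.

# Succeeds if the file decodes cleanly as UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Convert FILE from ENCODING to UTF-8 in place, keeping the original
# as FILE.bak. Files that are already valid UTF-8 are left untouched.
convert_if_needed() {
    local file=$1 from=$2
    if is_utf8 "$file"; then
        echo "skipped: $file"
        return 0
    fi
    cp -- "$file" "$file.bak" &&
        iconv -f "$from" -t UTF-8 "$file.bak" > "$file" &&
        echo "converted: $file"
}
```

To run it over a folder of Arabic subtitles, something like `for f in *.srt; do convert_if_needed "$f" WINDOWS-1256; done` should do. Plain ASCII files count as valid UTF-8 and are skipped, which is harmless.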


To do multiple files you can use the find command with iconv like below (this will detect the encoding of the original):

Code: Select all

find . -type f -iname "*.txt" -exec sh -c 'iconv -f "$(file -bi "$1" | sed -e "s/.*[ ]charset=//")" -t utf-8 -o converted "$1" && mv converted "$1"' -- {} \;
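
The one-liner above is fairly dense, so here is the same idea spread out into a small function. This is only a sketch: it assumes a version of the file utility that supports --mime-encoding (the usual Linux one does), and detect_and_convert is a made-up name. The detection is a guess, and it tends to be unreliable for non-Latin code pages.

```shell
#!/bin/bash
# Hypothetical helper; the name detect_and_convert is made up for this sketch.
# Guesses the encoding with `file`, then rewrites FILE as UTF-8 in place.
detect_and_convert() {
    local file=$1 charset tmp
    charset=$(file -b --mime-encoding -- "$file") || return 1
    case $charset in
        utf-8|us-ascii)
            # ASCII is a subset of UTF-8, so there is nothing to do
            echo "already UTF-8: $file"
            return 0 ;;
        binary|unknown-8bit)
            echo "could not detect encoding: $file" >&2
            return 1 ;;
    esac
    tmp=$(mktemp) || return 1
    if iconv -f "$charset" -t UTF-8 "$file" > "$tmp"; then
        # Overwrite the contents rather than the file itself,
        # so the original permissions are kept
        cat "$tmp" > "$file"
        rm -f -- "$tmp"
        echo "converted from $charset: $file"
    else
        rm -f -- "$tmp"
        return 1
    fi
}
```

A loop such as `for f in *.srt; do detect_and_convert "$f"; done` would then handle a whole folder. Unlike the one-liner, a file whose encoding can't be detected is reported and left alone.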



You can also use recode, which converts the file in place. For example, if the original is ISO-8859-15:

Code: Select all

recode ISO-8859-15..UTF-8 in.txt

On Windows you can install vim (which is cross-platform) and run the same command. Another option on Windows is to use PowerShell:

Code: Select all

Get-Content .\test.txt | Set-Content -Encoding utf8 test-utf8.txt


To convert multiple files, you can use a PowerShell loop:

Code: Select all

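foreach ($file in Get-ChildItem *.txt) {
    # Show which file is being processed
    Write-Output $file.Name
    # Write a UTF-8 copy next to the original, e.g. test.txt -> test.utf8.txt
    Get-Content $file.FullName | Set-Content -Encoding utf8 ($file.BaseName + ".utf8.txt")
}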


I HAVE NOT TESTED ALL THIS! No warranty, your mileage may vary.

Re: Simple SRT code page converter

Postby Doitsujin » Fri Sep 06, 2019 7:33 am

rdearman wrote:The vim is from the command line, you don't need to open the file.
Unfortunately, this method doesn't work for non-Latin .srt files. Moreover, vim is very much an acquired taste...

rdearman wrote:There are also two other commands you can use in Linux. The most well known is iconv [...]
Your shell script worked fine. (Obviously, users will need to change *.txt to *.srt.)

rdearman wrote:In Windows you can install vim (which is cross platform) and run the same command.
IMHO, it'd be much easier to use the Windows port of iconv or my tool.

Re: Simple SRT code page converter

Postby rdearman » Fri Sep 06, 2019 10:20 am

I don't use vim myself; I'm an Emacs man. I don't use Windows much either, which is why I gave the warning about the PowerShell scripts. :)