Perhaps not a proper topic for this forum, but I am having a hard time converting a pdf copy of a 19th-century work about the Iliad. The work lists, in order, each word in the first book of the Iliad and gives its definition, its part of speech and sometimes other information about it. Reading Eleanor Dickey's Learning Latin the Ancient Way (first mentioned by Tommus in this forum: see below) gave me the idea that the vocabulary of other ancient works could be listed in order of appearance in the same way, especially if all the hard work had been done by a dead white man of a previous century. The work in question is Parsing Lessons to Homer's Iliad Book I, 4th edition, whose Google Books URL I give below (I couldn't find it at archive.org).
Extracting the Greek from the text would be a bonus, but the English text interests me far more. After trying a number of different paths, I finally resorted to the doyen of pdfs, Adobe. Rather than mortgage my house to buy a copy, I downloaded a trial copy of Adobe and put it to work. The results are not spectacular. Only the HTML export is of any use: it gives a fair rendition of the English text, but the Greek comes out as nonsense, which makes the rest of the text hard to read.
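A quick way to tell whether a converter kept the Greek or turned it into nonsense is to count how many of the letters in its output are actually Greek script. Below is a minimal Python sketch using only the standard library; the function name `greek_ratio` is hypothetical, and it checks only the Unicode script of each letter, not whether the Greek is spelled correctly. The second sample string is a made-up example of Greek that came out as Latin letters.

```python
import unicodedata


def greek_ratio(text: str) -> float:
    """Return the fraction of alphabetic characters that are Greek letters.

    Uses each character's Unicode name, so both the basic Greek block
    (e.g. "GREEK SMALL LETTER MU") and polytonic Greek Extended letters
    (e.g. "GREEK SMALL LETTER ETA WITH PERISPOMENI") are counted.
    Non-letters (spaces, punctuation, combining accents) are ignored.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    greek = sum(1 for ch in letters if unicodedata.name(ch, "").startswith("GREEK"))
    return greek / len(letters)


if __name__ == "__main__":
    # Greek that survived extraction vs. (hypothetical) Greek that became Latin letters.
    print(greek_ratio("μῆνιν ἄειδε θεά, wrath"))
    print(greek_ratio("mhnin aeide qea, wrath"))
```

Running it over a page of converter output gives a rough number to compare tools by: if a page that should be half Greek scores near zero, the Greek was lost.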
I suppose beggars can't be choosers, and what came out is better than nothing (and better than retyping the text by hand (maybe)), but I would like to know of a better method.
BTW, if I finish this project [a big if], it will be free for the asking, not a print-on-demand thing costing $15 on the Internet.
Finally, I don't see why this method of presenting elementary material can't work for any language.
The first post about Dickey's book was by Tommus http://forum.language-learners.org/viewtopic.php?f=14&t=3369&p=64459&hilit=dickey#p64459.
The pdf I am using can be located here: https://books.google.com/books?id=wj9WAAAAcAAJ&pg=PA27&lpg=PA27&dq=parsing+lessons+to+homer%27s+Iliad&source=bl&ots=wz3FJ1mRWZ&sig=jkO6v2PxF0jz_d-9r5NMahvUPd0&hl=en&sa=X&ved=0ahUKEwiA-eP5oazSAhVmw1QKHdH5A60Q6AEIJjAD#v=onepage&q=parsing%20lessons%20to%20homer%27s%20Iliad&f=false
Converting pdf to text
- MorkTheFiddle
- Black Belt - 2nd Dan
- Posts: 2142
- Joined: Sat Jul 18, 2015 8:59 pm
- Location: North Texas USA
- Languages: English (N). Read (only) French and Spanish. Studying Ancient Greek. Studying a bit of Latin. Once studied Old Norse. Dabbled in Catalan, Provençal and Italian.
- Language Log: https://forum.language-learners.org/vie ... 11#p133911
Many things which are false are transmitted from book to book, and gain credit in the world. -- attributed to Samuel Johnson
- rdearman
- Site Admin
- Posts: 7260
- Joined: Thu May 14, 2015 4:18 pm
- Location: United Kingdom
- Languages: English (N)
- Language Log: viewtopic.php?f=15&t=1836
Re: Converting pdf to text
You can use Calibre to convert pdf to text: http://calibre-ebook.com/
There is an online version here you can try. http://ebook.online-convert.com/
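For anyone who prefers the command line, Calibre also ships a converter called `ebook-convert`, which takes an input file and an output file and picks the output format from the output file's extension. A minimal Python sketch wrapping it might look like this; it assumes Calibre is installed with `ebook-convert` on the PATH, and the helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def calibre_command(pdf: Path, txt: Path) -> list[str]:
    """Build the ebook-convert call; the .txt extension selects plain-text output."""
    return ["ebook-convert", str(pdf), str(txt)]


def convert_with_calibre(pdf: Path, txt: Path) -> None:
    """Run Calibre's converter, failing loudly if it is missing or errors out."""
    if shutil.which("ebook-convert") is None:
        raise FileNotFoundError("Calibre's ebook-convert is not on the PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(calibre_command(pdf, txt), check=True)
```

Usage would be something like `convert_with_calibre(Path("parsing_lessons.pdf"), Path("parsing_lessons.txt"))`, where both file names are placeholders. Checking the size of the resulting text file afterwards is worthwhile, since a converter can exit normally and still write little or nothing.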
Read 150 books in 2024
My YouTube Channel
The Autodidactic Podcast
My Author's Newsletter
I post on this forum with mobile devices, so excuse short msgs and typos.
-
- Orange Belt
- Posts: 170
- Joined: Sat Nov 26, 2016 9:34 pm
- Languages: English (N), German (heritage)
Learning: Russian, French, German, Mandarin, Arabic, Spanish.
Mostly forgotten: Italian, Welsh.
Re: Converting pdf to text
I find that Calibre doesn't always process punctuation accurately, but maybe there are some settings I should be tweaking.
I've used Poppler quite a bit for basic conversions of English text. I tried a basic conversion of p. 11 of that pdf, and it output the Greek as Latin characters, but maybe some of its options would produce better results? (Oh, and the version I have installed is pretty old.)
EDIT: oops, wrong link!
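Poppler's `pdftotext` does have options worth trying here: `-f` and `-l` set the first and last page, `-layout` tries to keep the physical column layout, and `-enc UTF-8` sets the output encoding. A minimal Python sketch, assuming `pdftotext` is on the PATH; the helper names and file names are hypothetical, and page numbers are the PDF's own page indices, which may not match the book's printed page numbers. If the Greek font in the PDF lacks a proper Unicode mapping, these options probably won't bring the Greek back, but they rule out the easy cases.

```python
import shutil
import subprocess
from pathlib import Path


def pdftotext_command(pdf: Path, txt: Path, first: int, last: int) -> list[str]:
    """Build a pdftotext call for pages first..last (1-based, inclusive)."""
    if first < 1 or last < first:
        raise ValueError("page range must satisfy 1 <= first <= last")
    return [
        "pdftotext",
        "-f", str(first),   # first page to convert
        "-l", str(last),    # last page to convert
        "-layout",          # keep the original physical layout
        "-enc", "UTF-8",    # write UTF-8 so any real Greek survives
        str(pdf),
        str(txt),
    ]


def extract_pages(pdf: Path, txt: Path, first: int, last: int) -> str:
    """Run pdftotext and return the extracted text."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext (poppler-utils) is not on the PATH")
    subprocess.run(pdftotext_command(pdf, txt, first, last), check=True)
    return txt.read_text(encoding="utf-8")
```

To repeat the p. 11 experiment, something like `extract_pages(Path("parsing_lessons.pdf"), Path("p11.txt"), 11, 11)` would do, with the file names again being placeholders.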
- Adrianslont
- Blue Belt
- Posts: 827
- Joined: Sun Aug 16, 2015 10:39 am
- Location: Australia
- Languages: English (N), Learning Indonesian and French
Re: Converting pdf to text
A very good question for this forum, I think!
As already mentioned - Calibre. I don't know about a web version but I downloaded the Windows app a while back and have converted just a couple of documents with success.
-
- Black Belt - 3rd Dan
- Posts: 3536
- Joined: Thu Jul 30, 2015 11:04 am
- Location: Scotland
- Languages: English(N)
Advanced: French, Spanish, Scottish Gaelic
Intermediate: Italian, Catalan, Corsican
Basic: Welsh
Dabbling: Polish, Russian etc.
Re: Converting pdf to text
PDF is a nightmare. There really isn't a single tool on the planet that does a good job of converting PDFs to any other format, because that's not what the format was designed for -- it's just about making something printable, not editable.
- MorkTheFiddle
- Black Belt - 2nd Dan
- Posts: 2142
- Joined: Sat Jul 18, 2015 8:59 pm
- Location: North Texas USA
- Languages: English (N). Read (only) French and Spanish. Studying Ancient Greek. Studying a bit of Latin. Once studied Old Norse. Dabbled in Catalan, Provençal and Italian.
- Language Log: https://forum.language-learners.org/vie ... 11#p133911
Re: Converting pdf to text
Thanks to everyone who replied. I've decided to let this marinate for a while, but I can report some findings.
1. Calibre just plain failed. It took a while to work on the pdf and produced a log showing that all pages were converted, but apart from the log it produced no output.
2. I tried Adobe's export options one more time, but the results were all below par.
3. The Linux package (Poppler) is intriguing, but my last Linux box perished and I don't trust Bash on Windows 10 to do the trick.
4. I could pay someone to type up the pdf as a text file.
5. Or, I could stop using the Iliad for this project and instead use the Latin of Tacitus, whom I prefer to Homer anyway (gasp!!!).
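A converter that writes a log but no text is consistent with a pdf whose pages are mostly images with little or no text layer, which is common for scanned books, though that is only a guess about this particular file. One way to check, assuming Poppler's `pdftotext` is available: by default it ends each page with a form-feed character (`\f`), so splitting its output on `\f` gives one chunk per page, and pages with no extractable text show up as empty chunks. The helper below is a hypothetical sketch of that check; feed it the output of running `pdftotext` on the whole file.

```python
def pages_without_text(extracted: str) -> list[int]:
    """Return 1-based numbers of pages that contain no non-whitespace text.

    Assumes pdftotext's default output, where every page is followed by a
    form-feed character (U+000C); the empty piece after the final form feed
    is therefore dropped before counting.
    """
    pages = extracted.split("\f")
    if extracted.endswith("\f"):
        pages = pages[:-1]  # trailing piece after the last page's form feed
    return [number for number, page in enumerate(pages, start=1) if not page.strip()]
```

If most page numbers come back in the list, the pdf probably has no usable text layer, and no text extractor (Calibre, Poppler or Adobe's export) has anything to work with; the alternatives are OCR or retyping.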
- tommus
- Blue Belt
- Posts: 957
- Joined: Sat Jul 04, 2015 3:59 pm
- Location: Kingston, ON, Canada
- Languages: English (N), French (B2), Dutch (B2)
Re: Converting pdf to text
I use Google Drive (GDr) + Google Docs (GDoc), both free.
1. Open GDr in a browser (I use Chrome. I don't know if it works in other browsers.)
2. Drag a pdf file into GDr.
3. Right click on that file in GDr and select "Open in Google Docs".
4. Text is in GDoc.
For good quality PDFs, it is very, very good.
Dutch: 01 September -> 31 December 2020
● Watch 1000 Dutch TV Series Videos
- Adrianslont
- Blue Belt
- Posts: 827
- Joined: Sun Aug 16, 2015 10:39 am
- Location: Australia
- Languages: English (N), Learning Indonesian and French
Re: Converting pdf to text
MorkTheFiddle wrote: [...]
The pdf I am using can be located here: https://books.google.com/books?id=wj9WAAAAcAAJ&pg=PA27&lpg=PA27&dq=parsing+lessons+to+homer%27s+Iliad&source=bl&ots=wz3FJ1mRWZ&sig=jkO6v2PxF0jz_d-9r5NMahvUPd0&hl=en&sa=X&ved=0ahUKEwiA-eP5oazSAhVmw1QKHdH5A60Q6AEIJjAD#v=onepage&q=parsing%20lessons%20to%20homer%27s%20Iliad&f=false
Unless I'm missing something, you have only provided a link to Google Books and not a PDF. If so, can you provide the PDF? I'm in the mood for a challenge. No guarantees on results, of course!
- MorkTheFiddle
- Black Belt - 2nd Dan
- Posts: 2142
- Joined: Sat Jul 18, 2015 8:59 pm
- Location: North Texas USA
- Languages: English (N). Read (only) French and Spanish. Studying Ancient Greek. Studying a bit of Latin. Once studied Old Norse. Dabbled in Catalan, Provençal and Italian.
- Language Log: https://forum.language-learners.org/vie ... 11#p133911
Re: Converting pdf to text
Adrianslont wrote: Unless I'm missing something you have only provided a link to google books and not a PDF. If so, can you provide the PDF? I'm feeling like a challenge. No guarantees on results of course!
Thanks for the offer of help. Here is a link to the pdf. https://drive.google.com/open?id=0ByymqjYSyIAJVjk4bk5CZElCelk
- MorkTheFiddle
- Black Belt - 2nd Dan
- Posts: 2142
- Joined: Sat Jul 18, 2015 8:59 pm
- Location: North Texas USA
- Languages: English (N). Read (only) French and Spanish. Studying Ancient Greek. Studying a bit of Latin. Once studied Old Norse. Dabbled in Catalan, Provençal and Italian.
- Language Log: https://forum.language-learners.org/vie ... 11#p133911
Re: Converting pdf to text
tommus wrote: I use Google Drive (GDr) + Google Docs (GDoc), both free.
1. Open GDr in a browser (I use Chrome. I don't know if it works in other browsers.)
2. Drag a pdf file into GDr.
3. Right click on that file in GDr and select "Open in Google Docs".
4. Text is in GDoc.
For good quality PDFs, it is very, very good.
I appreciate the suggestion, Tommus. Unfortunately, when I try this, Google tells me, "Unable to Convert Document", without further explanation.