rdearman wrote:
OK, I've modified it to deal with words which contain ' or - so it can now successfully deal with: l'orange and part-time.
Code: Select all
#!/usr/bin/perl
use strict;
use warnings;
my $filename = $ARGV[0];
my $outfile = $ARGV[1];
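open(my $fh, '<:encoding(UTF-8)', $filename) or die "Cannot open $filename: $!";
open(my $ofh, '>:encoding(UTF-8)', $outfile) or die "Cannot open $outfile: $!";
while (my $row = <$fh>)
{
    my $wordcount;
    chomp $row;
    # A word is either two \w+ runs joined by ' or - (l'orange, part-time) or a plain \w+ run
    my @words = $row =~ /\w+[\'\-]\w+|\w+/g ;
    $wordcount = scalar( @words );
    # Wrap each word as {{cN::word}}, numbering from 0
    for (my $a=0;$a<$wordcount;$a++)
    {
        print $ofh "{{c" . $a . "::" . $words[$a] . "}} " ;
    }
    print $ofh "\n";
}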
print "done\n";
Thanks for that, Rick.
In the process of fixing your code you seem to have accidentally dropped two lines of code.
# my $filename = "input.txt";
# my $outfile = "output.csv";
I have added them back in as comments, with input.txt as the input file name and output.csv as the output file name.
I have remade my 84k cards and the apostrophes and hyphens seem to be in order. I haven't spotted any other issues, though I have only looked at about 50 cards.
Cheers
Code: Select all
#!/usr/bin/perl
use strict;
use warnings;
my $filename = $ARGV[0];
my $outfile = $ARGV[1];
# my $filename = "input.txt";
# my $outfile = "output.csv";
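open(my $fh, '<:encoding(UTF-8)', $filename) or die "Cannot open $filename: $!";
open(my $ofh, '>:encoding(UTF-8)', $outfile) or die "Cannot open $outfile: $!";
while (my $row = <$fh>)
{
    my $wordcount;
    chomp $row;
    # A word is either two \w+ runs joined by ' or - (l'orange, part-time) or a plain \w+ run
    my @words = $row =~ /\w+[\'\-]\w+|\w+/g ;
    $wordcount = scalar( @words );
    # Wrap each word as {{cN::word}}, numbering from 0
    for (my $a=0;$a<$wordcount;$a++)
    {
        print $ofh "{{c" . $a . "::" . $words[$a] . "}} " ;
    }
    print $ofh "\n";
}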
print "done\n";
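For anyone who wants to check the word-matching pattern without rebuilding a whole deck, here is a minimal sketch that runs it on a couple of sample strings. The sub names split_words and split_words_multi are made up for this illustration, and the second pattern is only a possible variant, not something from the script above. It is there because the original pattern allows just one ' or - per word, so a word like arc-en-ciel would come out as two pieces.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same pattern as in the script above: at most one ' or - joiner per word.
sub split_words {
    my ($row) = @_;
    return $row =~ /\w+[\'\-]\w+|\w+/g;
}

# Hypothetical variant (an assumption, not from the thread):
# allows any number of ' or - joiners inside one word.
sub split_words_multi {
    my ($row) = @_;
    return $row =~ /\w+(?:['\-]\w+)*/g;
}

for my $sample ("l'orange part-time", "arc-en-ciel") {
    print join('|', split_words($sample)), "\n";
    print join('|', split_words_multi($sample)), "\n";
}
# Prints:
# l'orange|part-time
# l'orange|part-time
# arc-en|ciel
# arc-en-ciel
```

Whether the variant is worth using depends on the source text. If the input has no words with two or more hyphens, both patterns give the same cards.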