How can I change spaces to underscores and lowercase everything? - perl

I have a text file which contains:
Cycle code
Cycle month
Cycle year
Event type ID
Event ID
Network start time
I want to change this text so that every space is replaced with an underscore (_), and then all the characters are converted to lower case, like below:
cycle_code
cycle_month
cycle_year
event_type_id
event_id
network_start_time
How could I accomplish this?

Another Perl method:
perl -pe 'y/A-Z /a-z_/' file

tr alone works:
tr ' [:upper:]' '_[:lower:]' < file

Looking into sed documentation some more and following advice from the comments the following command should work.
sed -r {filehere} -e 's/[A-Z]/\L&/g;s/ /_/g' -i

There is a perl tag in your question as well. So:
#!/usr/bin/perl
use strict; use warnings;
while (<DATA>) {
print join('_', split ' ', lc), "\n";
}
__DATA__
Cycle code
Cycle month
Cycle year
Event type ID
Event ID
Network start time
Or:
perl -i.bak -wple '$_ = join("_", split " ", lc)' test.txt

sed "y/ABCDEFGHIJKLMNOPQRSTUVWXYZ /abcdefghijklmnopqrstuvwxyz_/" filename

Just use your shell, if you have Bash 4
while read -r line
do
    line=${line,,} # change to lowercase
    echo "${line// /_}"
done < "file" > newfile
mv newfile file
With gawk:
awk '{$0=tolower($0);$1=$1}1' OFS="_" file
With Perl:
perl -ne 's/ +/_/g;print lc' file
With Python:
>>> f = open("file")
>>> for line in f:
...     print('_'.join(line.split()).lower())
...
>>> f.close()

Related

Normalize column fill with space on right

Working with the example log file below:
1;000117;20190529;055529;9521;0988388019
1;000015;20190529;071944;2222;2231
1;000012;20190529;072734;4258;4252
1;000006;20190529;073336;2226;1000
3;000005;20190529;073715;1000;037760967
3;000004;20190529;073751;1000;037760967
I need to normalize the last column by padding it with spaces on the right until its length is 25.
I tried this Perl code, without success:
perl -F';' -lane '$F[5] = $F[5], sprintf "% 25d"; $" = ";"; print "@F"'
I need the output below:
1;000117;20190529;055529;9521;0988388019
1;000015;20190529;071944;2222;2231
1;000012;20190529;072734;4258;4252
1;000006;20190529;073336;2226;1000
3;000005;20190529;073715;1000;037760967
3;000004;20190529;073751;1000;037760967
$ awk 'BEGIN{FS=OFS=";"} {$NF=sprintf("%-25s",$NF)}1' file
1;000117;20190529;055529;9521;0988388019
1;000015;20190529;071944;2222;2231
1;000012;20190529;072734;4258;4252
1;000006;20190529;073336;2226;1000
3;000005;20190529;073715;1000;037760967
3;000004;20190529;073751;1000;037760967
So you can see the blanks:
$ awk 'BEGIN{FS=OFS=";"} {$NF=sprintf("%-25s",$NF)}1' file | tr ' ' '#'
1;000117;20190529;055529;9521;0988388019###############
1;000015;20190529;071944;2222;2231#####################
1;000012;20190529;072734;4258;4252#####################
1;000006;20190529;073336;2226;1000#####################
3;000005;20190529;073715;1000;037760967################
3;000004;20190529;073751;1000;037760967################
You were on the right track. Perl versions that work:
perl -F';' -lane '$F[5]=sprintf("%-25s",$F[5]);print join ";",@F'
perl -F';' -lpane '$F[5]=sprintf("%-25s",$F[5]);$_=join ";",@F'
This might work for you (GNU sed):
sed -i ':a;/;[^;]\{25\}$/!s/$/ /;ta' file
If the last field is not 25 characters long, add a space until it is.
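For comparison, the same padding can be sketched in Python. This is only an illustration: the function name `pad_last_field` and its `width` and `sep` parameters are made up here, and it assumes one record per line with `;` as the separator, as in the sample log.

```python
def pad_last_field(line, width=25, sep=";"):
    """Left-justify the last sep-separated field of line to width characters."""
    fields = line.rstrip("\n").split(sep)
    # ljust pads with spaces on the right and never truncates longer values
    fields[-1] = fields[-1].ljust(width)
    return sep.join(fields)


if __name__ == "__main__":
    sample = "1;000015;20190529;071944;2222;2231"
    # Replace spaces with '#' to make the padding visible, as in the awk answer
    print(pad_last_field(sample).replace(" ", "#"))
```

Like the awk `%-25s` format, `ljust` leaves a field that is already longer than 25 characters unchanged.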

Extracting fasta ids after string match

I have a list of fasta sequences as following:
>Product_1_001:299:H377WBGXB:1:11101
TGATCATCTCACCTACTAATAGGACGATGACCCAGTGACGATGA
>Product_2_001:299:H377WBGXB:2:11101
CATCGATGATCATTGATAAGGGGCCCATACCCATCAAAACCGTT
The original fasta sequence is much longer than the subset posted here. I wanted to extract the 10 characters after the pattern "TCAT" into a separate file and did this
grep -oP "(?<=TCAT).{10}"
I do get the needed result as:
CTCACCTACT
TGATAAGGGG
I would like their corresponding fasta ids as one column and the extracted pattern as second column like:
>Product_1_001:299:H377WBGXB:1:11101 CTCACCTACT
>Product_2_001:299:H377WBGXB:2:11101 TGATAAGGGG
Try this one-liner
perl -lne ' /^[^>].+?(?<=TCAT)(.{10})/ and print $p,"\t",$1; $p=$_ ' file
with your given inputs
$ cat fasta.txt
>Product_1_001:299:H377WBGXB:1:11101
TGATCATCTCACCTACTAATAGGACGATGACCCAGTGACGATGA
>Product_2_001:299:H377WBGXB:2:11101
CATCGATGATCATTGATAAGGGGCCCATACCCATCAAAACCGTT
$ perl -lne ' /^[^>].+?(?<=TCAT)(.{10})/ and print $p,"\t",$1; $p=$_ ' fasta.txt
>Product_1_001:299:H377WBGXB:1:11101 CTCACCTACT
>Product_2_001:299:H377WBGXB:2:11101 TGATAAGGGG
$
Another way is to use awk like this:
awk '/Product/{printf "%s", $0; next} 1' <your_file> | awk -F"TCAT" '{ print substr($1,1,36) "\t" substr($2,1,10) }'
Here 36 is the length of the ID lines in this sample. The output:
>Product_1_001:299:H377WBGXB:1:11101 CTCACCTACT
>Product_2_001:299:H377WBGXB:2:11101 TGATAAGGGG
Hope it helps.
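The same pairing of ID and extracted bases can be sketched in Python without hard-coding the ID length. The function name `ids_with_flank` is hypothetical, and the sketch assumes, like the one-liners above, that each sequence sits on a single line directly under its `>` header. It reports only the first `TCAT` on each sequence line, whereas `grep -o` would report every match.

```python
import re


def ids_with_flank(lines, motif="TCAT", length=10):
    """Yield (header, flank) pairs, where flank is the `length` characters
    following the first occurrence of motif on the sequence line."""
    pattern = re.compile(re.escape(motif) + "(.{%d})" % length)
    header = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            header = line  # remember the most recent FASTA ID
            continue
        match = pattern.search(line)
        if header is not None and match:
            yield header, match.group(1)


if __name__ == "__main__":
    sample = [
        ">Product_1_001:299:H377WBGXB:1:11101",
        "TGATCATCTCACCTACTAATAGGACGATGACCCAGTGACGATGA",
    ]
    for header, flank in ids_with_flank(sample):
        print(header, flank, sep="\t")
```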

How to add new number into each line?

I have this line about 500 times in my file backup.xml
my-company-review/</link>
Is there a way through command line, perl, etc. to add a number into the line after the word review. For example, something like this:
my-company-review1/</link>
my-company-review2/</link>
my-company-review3/</link>
Thanks in advance for the help!
Why not use Perl, as I suggested with your last problem. Once again, this is a sort of hack solution, that only works if there's a maximum of one replacement per line... But it's a quick throw-away program.
perl -e '$count=1; foreach (<>) { s/(my-company-review)(\/<\/link>)/$1$count$2/ && $count++; print; }'
An extra loop will do multiple substitutions on a line:
perl -e '$count=1; foreach (<>) { while(s/(my-company-review)(\/<\/link>)/$1$count$2/) {$count++;} print; }'
That awk solution looks way nicer =)
Here's one way:
perl -i -wpe ' BEGIN { $count = 1; }
++$count
if s{(my-company-review)(/</link>)}{$1$count$2};
' backup.xml
(Disclaimer: not tested.)
You can use awk:
awk 'gsub("/</link>", NR "/</link>")' infile
or perl:
perl -ne 's:/</link>:$./</link>:; print' infile
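Note that the last two commands number by input line (`NR` and `$.`), so the numbers follow line positions rather than a running count of matches. If a running count is wanted, a minimal Python sketch could look like this. The function name `number_links` and its parameters are illustrative only, and the sketch assumes the whole file fits in memory as one string.

```python
import re


def number_links(text, marker="my-company-review", tail="/</link>"):
    """Insert a running counter between every marker and tail found in text."""
    counter = 0

    def add_number(match):
        nonlocal counter
        counter += 1
        return f"{match.group(1)}{counter}{match.group(2)}"

    pattern = re.compile("(" + re.escape(marker) + ")(" + re.escape(tail) + ")")
    # re.sub calls add_number once per match, left to right
    return pattern.sub(add_number, text)


if __name__ == "__main__":
    sample = "my-company-review/</link>\nother line\nmy-company-review/</link>\n"
    print(number_links(sample), end="")
```

Because the callback runs once per match, this also handles several occurrences on one line and leaves non-matching lines untouched.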

Bash, Perl or Sed, Insert on New Line after found phrase

Ok I guess I need something that will do the following:
search for this line of code in /var/lib/asterisk/bin/retrieve_conf:
$engineinfo = engine_getinfo();
insert these two lines immediately following:
$engineinfo['engine']="asterisk";
$engineinfo['version']="1.6.2.11";
Thanks in advance,
Joe
You could do it like this
sed -ne '/$engineinfo = engine_getinfo();/a\'$'\n''$engineinfo['\''engine'\'']="asterisk";\'$'\n''$engineinfo['\''version'\'']="1.6.2.11";'$'\n'';p' /var/lib/asterisk/bin/retrieve_conf
Add -i for modification in place once you confirm that it works.
What does it do and how does it work?
First we tell sed to match a line containing your string. On that matched line we then will perform an a command, which is "append text".
The syntax of a sed a command is
a\
line of text\
another line
;
Note that the literal newlines are part of this syntax. To make it all one line (and preserve copy-paste ability) in place of literal newlines I used $'\n' which will tell bash or zsh to insert a real newline in place. The quoting necessary to make this work is a little complex: You have to exit single-quotes so that you can have the $'\n' be interpreted by bash, then you have to re-enter a single-quoted string to prevent bash from interpreting the rest of your input.
EDIT: Updated to append both lines in one append command.
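The append-after-match logic itself, without the shell quoting, can be sketched in Python. The helper name `insert_after` is made up for illustration. It matches by plain substring rather than by regular expression, and it works on a list of lines already read from the file.

```python
def insert_after(lines, needle, new_lines):
    """Return a copy of lines with new_lines inserted after every line
    that contains needle (plain substring match)."""
    result = []
    for line in lines:
        result.append(line)
        if needle in line:
            result.extend(new_lines)
    return result


if __name__ == "__main__":
    original = ["<?php", "$engineinfo = engine_getinfo();", "// rest of script"]
    extra = [
        "$engineinfo['engine']=\"asterisk\";",
        "$engineinfo['version']=\"1.6.2.11\";",
    ]
    for line in insert_after(original, "$engineinfo = engine_getinfo();", extra):
        print(line)
```

To apply it to a real file you would read the lines, call the helper, and write the result back, ideally to a temporary file first.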
You can use Perl and Tie::File (included in the Perl distribution):
use Tie::File;
tie my @array, 'Tie::File', "/var/lib/asterisk/bin/retrieve_conf" or die $!;
for (0..$#array) {
    if ($array[$_] =~ /\$engineinfo = engine_getinfo\(\);/) {
        # insert the two new lines right after the matched line
        splice @array, $_+1, 0, q{$engineinfo['engine']="asterisk";}, q{$engineinfo['version']="1.6.2.11";};
        last;
    }
}
Just for the sake of symmetry here's an answer using awk.
awk '{ if(/\$engineinfo = engine_getinfo\(\);/) print $0"\n$engineinfo['\''engine'\'']=\"asterisk\";\n$engineinfo['\''version'\'']=\"1.6.2.11\"" ; else print $0 }' in.txt
You may also use ed:
# cf. http://wiki.bash-hackers.org/howto/edit-ed
cat <<-'EOF' | ed -s /var/lib/asterisk/bin/retrieve_conf
H
/\$engineinfo = engine_getinfo();/a
$engineinfo['engine']="asterisk";
$engineinfo['version']="1.6.2.11";
.
wq
EOF
A Perl one-liner:
perl -pE 's|(\$engineinfo) = engine_getinfo\(\);.*\K|\n${1}['\''engine'\'']="asterisk";\n${1}['\''version'\'']="1.6.2.11";|' file
sed -i 's/$engineinfo = engine_getinfo();/$engineinfo = engine_getinfo();<CTRL V><CTRL M>$engineinfo['\''engine'\'']="asterisk"; $engineinfo['\''version'\'']="1.6.2.11";/' /var/lib/asterisk/bin/retrieve_conf

How can I remove the timestamp from a filename in Perl?

I have a file which has a line in it as:
/hosting/logs/U01-ecom-SIT01/CU01-DC05-IFIO_SIT01_NU01-nc3sz1ecmas11/waslogs/SystemOut_10.01.21_16.54.18.log
I need a script which would read this line and remove the time stamp from it, that is:
10.01.21_16.54.18
The script should print the filename without the timestamp and holding the full path, that is:
/hosting/logs/U01-ecom-SIT01/CU01-DC05-IFIO_SIT01_NU01-nc3sz1ecmas11/waslogs/SystemOut.log
Please help as I'm unable to pattern match and output the file path without the timestamp.
echo "/hosting/logs/U01-ecom-SIT01/CU01-DC05-IFIO_SIT01_NU01-nc3sz1ecmas11/waslogs/SystemOut_10.01.21_16.54.18.log" |
perl -pe "s/_\d\d\.\d\d\.\d\d_\d\d\.\d\d\.\d\d//;"
$ perl -le 's{_\d{2}\.\d{2}\.\d{2}_\d{2}\.\d{2}\.\d{2}}{} and print for @ARGV' /hosting/logs/U01-ecom-SIT01/CU01-DC05-IFIO_SIT01_NU01-nc3sz1ecmas11/waslogs/SystemOut_10.01.21_16.54.18.log
Path shortened to prevent scrolling:
$ cat paths
CU01-DC05-IFIO_SIT01_NU01-nc3sz1ecmas11/waslogs/SystemOut_10.01.21_16.54.18.log
$ perl -pe 's/(_(\d\d(\.\d\d){2})){2}\.log$/.log/' paths
CU01-DC05-IFIO_SIT01_NU01-nc3sz1ecmas11/waslogs/SystemOut.log
The timestamp is made up of 2 sequences that look like _##.##.##. The subsequences end with 2 sequences of .##. These are the roles of the {2} quantifiers.
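The same nested-quantifier idea can be sketched in Python. The names `TIMESTAMP` and `strip_timestamp` are illustrative, and the pattern assumes the timestamp always sits immediately before a final `.log` extension, as in the example path.

```python
import re

# Two groups of the form _##.##.## directly before the ".log" extension
TIMESTAMP = re.compile(r"(?:_\d\d(?:\.\d\d){2}){2}(?=\.log$)")


def strip_timestamp(path):
    """Remove a trailing _DD.DD.DD_DD.DD.DD timestamp from a .log file name."""
    return TIMESTAMP.sub("", path)


if __name__ == "__main__":
    print(strip_timestamp("waslogs/SystemOut_10.01.21_16.54.18.log"))
```

The lookahead `(?=\.log$)` keeps the extension out of the match, so nothing has to be put back in the replacement.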
while (<>) {
    chomp;
    @s = split /\//;
    $fullpath = join("/", splice @s, 0, $#s);   # everything except the file name
    @a = split /[_.]/, $s[-1];                  # base name, timestamp parts, extension
    $newfile = "$fullpath/$a[0].$a[-1]";
    print $newfile."\n";
}
You can use the following code:
use strict;
use warnings;
my $var = "/hosting/logs/U01-ecom-SIT01/CU01-DC05-IFIO_SIT01_NU01-nc3sz1ecmas11/waslogs/SystemOut_10.01.21_16.54.18.log";
$var=~s/_\d\d\.\d\d\.\d\d//g;
# $var=~s/_10\.01\.21_16\.54\.18//g; # You can use this way also
print "$var\n";