how to put | between content lines of a text file? - sed

I have a file containing:
L1
L2
L3
.
.
.
L512
I want to change its content to:
L1 | L2 | L3 | ... | L512
It seems easy, but I have been trying for an hour now. I tried to do it with sed, but didn't get what I want: it seems that sed just inserts empty lines between the content. Any suggestions, please?

With sed, this requires reading the whole input into a buffer and then replacing all newlines with |, like this:
sed ':a;N;$!ba;s/\n/ | /g' input.txt
Part 1 - buffering input
:a defines a label called 'a'
N gets the next line from input and appends it to the pattern buffer
$!ba jumps to a unless the end of input is reached
Part 2 - replacing newlines by |
s/\n/ | /g executes the substitute command on the pattern buffer, replacing every newline with " | "
As you can see, this is inefficient, since it requires sed to:
read the complete input into memory
operate three times on the input: 1. reading, 2. substituting, 3. printing
Therefore I would suggest using awk, which can do it in one loop:
awk 'NR==1{printf "%s",$0;next}{printf " | %s",$0}END{print ""}' input.txt
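The one-pass awk idea can be wrapped in a small reusable function. This is only a sketch: the name join_lines and its argument order are made up for illustration, and it assumes a POSIX shell and awk. Passing the line through a "%s" format keeps any % characters in the data from being interpreted by printf.

```shell
# join_lines is a hypothetical helper name, not a standard tool.
# Usage: join_lines SEPARATOR [FILE...]   (reads stdin if no FILE is given)
join_lines() {
    sep=$1
    shift
    # Print the separator before every line except the first,
    # then terminate the output with a single newline.
    awk -v sep="$sep" '
        NR > 1 { printf "%s", sep }
               { printf "%s", $0 }
        END    { if (NR > 0) print "" }
    ' "$@"
}

printf 'L1\nL2\nL3\n' | join_lines ' | '
```

Because awk only ever holds one line at a time, memory use stays constant regardless of file size.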

Here is one sed
sed ':a;N;s/\n/ | /g;ta' file
L1 | L2 | L3 | ... | L512
And one awk
awk '{printf("%s%s",sep,$0);sep=" | "} END {print ""}' file
L1 | L2 | L3 | ... | L512

perl -pe 's/\n/ | /g unless(eof)' file

If the space around | is not mandatory (tr reads standard input, so the file has to be redirected):
tr "\n" '|' < YourFile

Several options, including those mentioned here:
paste -sd'|' file
sed ':a;N;s/\n/ | /g;ta' file
sed ':a;N;$!ba;s/\n/ | /g' file
perl -0pe 's/\n/ | /g;s/ \| $/\n/' file
perl -0nE 'say join " | ", split /\n/' file
perl -E 'chomp(@x=<>); say join " | ", @x' file
mapfile -t ary < file; (IFS="|"; echo "${ary[*]}")
awk '{printf("%s%s",sep,$0);sep=" | "} END {print ""}' file
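Note that paste treats the -d argument as a list of single-character delimiters, so the paste variant in the list gives L1|L2|L3 without spaces. A possible way to get the padded form is to post-process with sed, sketched below. The function name join_padded is hypothetical, and the sketch assumes the data itself contains no | characters.

```shell
# join_padded is a hypothetical helper name.
# Assumption: the input lines contain no "|" of their own.
join_padded() {
    # paste -s joins all lines of the file with the single character "|";
    # sed then pads every delimiter with one space on each side.
    paste -s -d '|' "$1" | sed 's/|/ | /g'
}

tmp=$(mktemp)
printf 'L1\nL2\nL3\n' > "$tmp"
join_padded "$tmp"
rm -f "$tmp"
```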

Related

base64 decode to csv file, sed script

I have the following script to extract the text inside the "reportBody" element, but I also need to decode this text from base64 into a new file. How can I do this?
Here's a script:
cat $1 | tr "\n" "|" | grep -o '<reportBody>.*</reportBody>' | sed 's/\(<reportBody>\|<\/reportBody>\)//g' | sed 's/|/\n/g' | sed '/^\s*$/d' > $2
I tried:
cat $1 | tr "\n" "|" | grep -o '<reportBody>.*</reportBody>' | sed 's/\(<reportBody>\|<\/reportBody>\)//g' | sed 's/|/\n/g' | sed '/^\s*$/d' | base64 -d $2 > $2
but it doesn't decode it.
Can I overwrite the same file, or at least save the decoded text in a new one, without calling additional modules from Python etc.?
Note: File contains 20k+ symbols to decode.
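One likely reason the attempt fails: in "base64 -d $2 > $2", base64 is told to read the file $2 instead of the pipe, and the shell truncates $2 before base64 even starts. A minimal sketch of a fix follows, keeping the same extraction steps and letting base64 read from the pipe. The function name decode_report_body is made up, and the sketch assumes GNU grep and a base64 that accepts -d and ignores newlines in its input (as GNU coreutils does). Base64 text never contains |, so the temporary | markers are safe here.

```shell
# decode_report_body is a hypothetical helper name.
# Usage: decode_report_body INPUT OUTPUT
decode_report_body() {
    src=$1
    dst=$2
    tmp=$(mktemp) || return 1
    # Join lines with "|", cut out the element, drop the tags,
    # restore the newlines, then decode what arrives on stdin.
    tr '\n' '|' < "$src" |
        grep -o '<reportBody>.*</reportBody>' |
        sed 's/<reportBody>//g; s/<\/reportBody>//g' |
        tr '|' '\n' |
        base64 -d > "$tmp" && cat "$tmp" > "$dst"
    status=$?
    rm -f "$tmp"
    return $status
}
```

Writing to a temporary file first means INPUT and OUTPUT may be the same path, since the input has been fully read before the output is written.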

Removing matching text from line

I have an example cut down from a log file.
112 172.172.172.1#50912 (ssl.bing.com):
I would like to somehow remove the # and the numbers after it, as well as the (): around the URL.
I would like this result:
112 172.172.172.1 ssl.bing.com
Here is the sed oneliner I have been working on.
cat newdns.log | sed -e 's/.*query: //' | cut -f 1 -d' ' | sort | uniq -c | sort -k2 > old.log
Thanks
Using sed, you could say:
sed 's/#[0-9]*//;s/(\(.*\)):$/\1/' filename
or, in a single substitution:
sed 's/#[0-9]* *(\(.*\)):$/ \1/' filename
Another sed:
sed -r 's/#[^ ]+|[():]//g'
$ echo '112 172.172.172.1#50912 (ssl.bing.com):' | sed -r 's/#[^ ]+|[():]//g'
112 172.172.172.1 ssl.bing.com

Insert comma after certain byte range

I'm trying to turn a big list of data into a CSV. It's basically a giant list with no spaces, and the rows are separated by newlines. I have made a bash script that loops through the document, awks out the line, cuts the byte range, and then adds a comma and appends it to the end of the line. It looks like this:
awk -v n=$x 'NR==n { print;exit}' PROP.txt | cut -c 1-12 | tr -d '\n' >> $x.tmp
awk -v n=$x 'NR==n { print;exit}' PROP.txt | cut -c 13-17 | tr -d '\n' | xargs -I {} sed -i '' -e 's~$~,{}~' $x.tmp
awk -v n=$x 'NR==n { print;exit}' PROP.txt | cut -c 18-22 | tr -d '\n' | xargs -I {} sed -i '' -e 's~$~,{}~' $x.tmp
awk -v n=$x 'NR==n { print;exit}' PROP.txt | cut -c 23-34 | tr -d '\n' | xargs -I {} sed -i '' -e 's~$~,{}~' $x.tmp
The problem is this is EXTREMELY slow, and the data has about 400k rows. I know there must be a better way to accomplish this. Essentially I just need to add a comma after every 12/17/22/34 etc character of a line.
Any help is appreciated, thank you!
There are many many ways to do this with Perl. Here is one way:
perl -pe 's/(.{12})(.{5})(.{5})(.{12})/$1,$2,$3,$4,/' < input-file > output-file
The matching pattern in the substitution captures four groups of text from the beginning of each line with 12, 5, 5, and 12 arbitrary characters. The replacement pattern places a comma after each group.
With GNU awk, you could write
gawk 'BEGIN {FIELDWIDTHS="12 5 5 12"; OFS=","} {$1=$1; print}'
The $1=$1 part is to force awk to rewrite the line, incorporating the output field separator, without changing anything.
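FIELDWIDTHS is a gawk extension. Where only a POSIX awk is available, roughly the same result can be sketched with substr. The function name split_fixed is hypothetical and the widths are the ones from the question. Like the gawk version, this sketch drops any characters beyond the last listed width.

```shell
# split_fixed is a hypothetical helper name.
# Usage: split_fixed [FILE...]   (reads stdin if no FILE is given)
split_fixed() {
    awk -v widths="12 5 5 12" '
        BEGIN { n = split(widths, w, " ") }
        {
            out = ""
            pos = 1
            # Cut each fixed-width field and join the pieces with commas.
            for (i = 1; i <= n; i++) {
                out = out (i > 1 ? "," : "") substr($0, pos, w[i])
                pos += w[i]
            }
            print out
        }
    ' "$@"
}

printf '1234567890123456789012345678901234567890\n' | split_fixed
```

The file is read once in a single awk process, which avoids starting several processes per row as the original loop does.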
This is very much a job for substr.
use strict;
use warnings;
my @widths = (12, 5, 5, 12);
while (my $line = <DATA>) {
    my $offset = 0;    # reset for every input line
    for my $width (@widths) {
        $offset += $width;
        substr $line, $offset, 0, ',';    # insert a comma at $offset
        ++$offset;                        # step over the inserted comma
    }
    print $line;
}
__DATA__
1234567890123456789012345678901234567890
output
123456789012,34567,89012,345678901234,567890

How to strip characters within a filename?

I am having trouble stripping characters from within a filename.
For example:
1326847080_MUNDO-Cinco-Cosas-Que-Aprendimos-Del-Debate-De-Los-Republicanos-1.xml
1326836220_PLANETACNN-Una-Granja-De-Mariposas-Ayuda-A-Reducir-La-Tala-De-Bosques-En-Tanzania-3.xml
This is the output I want:
1326847080_MUNDO-1.xml
1326836220_PLANETACNN-3.xml
for i in *.xml
do
    j=$(echo "$i" | sed -e 's/-.*-/-/')
    echo mv "$i" "$j"
done
or in one line:
for i in *.xml; do echo mv "$i" "$(echo "$i" | sed -e 's/-.*-/-/')"; done
Remove echo to actually perform the mv command.
Or, without sed, using bash builtin pattern replacement:
for i in *.xml; do echo mv "$i" "${i//-*-/-}"; done
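The same renaming can also be sketched with POSIX prefix and suffix stripping, which does not depend on bash. The function name short_name is made up for illustration. The sketch assumes every filename contains at least one dash.

```shell
# short_name is a hypothetical helper name.
# It keeps the text before the first dash and the text after the last dash.
short_name() {
    name=$1
    # ${name%%-*} removes the longest suffix starting at a dash;
    # ${name##*-} removes the longest prefix ending at a dash.
    printf '%s-%s\n' "${name%%-*}" "${name##*-}"
}

# Dry run: prints the mv commands instead of executing them.
for i in *.xml; do
    [ -e "$i" ] || continue    # skip the literal pattern if nothing matches
    echo mv "$i" "$(short_name "$i")"
done
```

As with the loops above, removing echo would perform the renames.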
rename to the rescue, with Perl regular expressions. This command will show which moves will be made; just remove -n to actually rename the files:
$ rename -n 's/([^-]+)-.*-([^-]+)/$1-$2/' *.xml
1326836220_PLANETACNN-Una-Granja-De-Mariposas-Ayuda-A-Reducir-La-Tala-De-Bosques-En-Tanzania-3.xml renamed as 1326836220_PLANETACNN-3.xml
1326847080_MUNDO-Cinco-Cosas-Que-Aprendimos-Del-Debate-De-Los-Republicanos-1.xml renamed as 1326847080_MUNDO-1.xml
The regular expression explained:
Save the part up to (but excluding) the first dash as match 1.
Save the part after the last dash as match 2.
Replace the part from the start of match 1 to the end of match 2 with match 1, a dash, and match 2.
Sorry for the late reply, but I only saw it today :(
I think you are looking for the following:
input file ::
cat > abc
1326847080_MUNDO-Cinco-Cosas-Que-Aprendimos-Del-Debate-De-Los-Republicanos-1.xml
1326836220_PLANETACNN-Una-Granja-De-Mariposas-Ayuda-A-Reducir-La-Tala-De-Bosques-En-Tanzania-3.xml
code: (it's a bit too basic, even for my liking)
while read line
do
echo $line ;
fname=`echo $line | cut -d"-" -f1`;
lfield=`echo $line | sed -n 's/\-/ /gp' | wc -w`;
lname=`echo $line | cut -d"-" -f${lfield}`;
new_name="${fname}-${lname}";
echo "new name is :: $new_name";
done < abc ;
output ::
1326847080_MUNDO-Cinco-Cosas-Que-Aprendimos-Del-Debate-De-Los-Republicanos-1.xml
new name is :: 1326847080_MUNDO-1.xml
1326836220_PLANETACNN-Una-Granja-De-Mariposas-Ayuda-A-Reducir-La-Tala-De-Bosques-En-Tanzania-3.xml
new name is :: 1326836220_PLANETACNN-3.xml

sed how to show only largest number of line containing "Page"

Currently, this shows only numbers.
sed 's/[^0-9]*//g')
How can I tell sed to display ONLY the largest number found, taking into account ONLY the line which contains the word "Page" ?
sed '/Page/!d; s/[^0-9]//g' | sort -n | tail -1
or
awk '/Page/ {gsub(/[^0-9]/,""); if ($0+0 > max) max = $0+0} END {print max}'
grep Page filename | awk '{print $2}' | sort -n | tail -n 1
This assumes the page number is the 2nd word of the line (if not, change the awk command as appropriate)
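If the position of the number on the line is not known, one possible variation is to scan every field of the matching lines and keep the largest purely numeric one. This is a sketch only: the name max_page is hypothetical, and it assumes the page number stands as its own whitespace-separated word. On a line such as "Page 12 of 30" it would report 30.

```shell
# max_page is a hypothetical helper name.
# Usage: max_page [FILE...]   (reads stdin if no FILE is given)
max_page() {
    awk '
        /Page/ {
            # Look at every field; "+ 0" forces a numeric comparison.
            for (i = 1; i <= NF; i++)
                if ($i ~ /^[0-9]+$/ && $i + 0 > max)
                    max = $i + 0
        }
        END { print max + 0 }
    ' "$@"
}

printf 'Page 3\nfoo 99\nPage 12\nPage 7\n' | max_page
```

Unlike stripping all non-digits from the line, this does not glue separate numbers on the same line into one.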