Add a few lines in the middle of a file with sed

I want to add a few lines to a file. Is it possible to use sed?
original file
test1
test2
test3
Expected output after adding the new line
test1
#
testing123
#
test3

If you don't mind using "while/read" instead of "sed", this is one solution:
[~]$ cat original.txt
test1
test2
test3
[~]$ cat new_content.txt
#
testing123
#
Then, process both files with the following script:
script.sh
#!/bin/bash
while IFS= read -r line
do
if [[ $line =~ ^test2.*$ ]]
then
cat new_content.txt
else
echo "$line"
fi
done < original.txt

sed '2 i\
\
#
2 a\
#\
' YourFile
This arbitrarily takes line 2 as the middle (counting lines or finding the middle is not easy in POSIX sed).
i is for insert (before)
a is for append (after)
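If line 2 should be replaced rather than surrounded, as in the expected output shown in the question, sed's c (change) command is one option. This is a minimal sketch: the scratch directory and the `result` variable are assumptions made only to keep the example self-contained.

```shell
#!/bin/bash
# Build the sample file from the question in a scratch directory.
tmp=$(mktemp -d)
printf 'test1\ntest2\ntest3\n' > "$tmp/original.txt"

# c\ replaces the addressed line (line 2 here) with the text that follows;
# every text line except the last ends with a backslash.
result=$(sed '2c\
#\
testing123\
#' "$tmp/original.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

The same layout works for i\ and a\, which keep line 2 and add the text before or after it.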

Related

Insert comma after certain byte range

I'm trying to turn a big list of data into a CSV. It's basically a giant list with no spaces, and the rows are separated by newlines. I have made a bash script that loops through the document, awks out the line, cuts the byte range, and then adds a comma and appends it to the end of the line. It looks like this:
awk -v n=$x 'NR==n { print;exit}' PROP.txt | cut -c 1-12 | tr -d '\n' >> $x.tmp
awk -v n=$x 'NR==n { print;exit}' PROP.txt | cut -c 13-17 | tr -d '\n' | xargs -I {} sed -i '' -e 's~$~,{}~' $x.tmp
awk -v n=$x 'NR==n { print;exit}' PROP.txt | cut -c 18-22 | tr -d '\n' | xargs -I {} sed -i '' -e 's~$~,{}~' $x.tmp
awk -v n=$x 'NR==n { print;exit}' PROP.txt | cut -c 23-34 | tr -d '\n' | xargs -I {} sed -i '' -e 's~$~,{}~' $x.tmp
The problem is this is EXTREMELY slow, and the data has about 400k rows. I know there must be a better way to accomplish this. Essentially I just need to add a comma after every 12/17/22/34 etc character of a line.
Any help is appreciated, thank you!
There are many many ways to do this with Perl. Here is one way:
perl -pe 's/(.{12})(.{5})(.{5})(.{12})/$1,$2,$3,$4,/' < input-file > output-file
The matching pattern in the substitution captures four groups of text from the beginning of each line with 12, 5, 5, and 12 arbitrary characters. The replacement pattern places a comma after each group.
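The same capture-group idea can be sketched with sed, assuming a sed that accepts -E for extended regular expressions (GNU and BSD sed both do). The sample line and the `result` variable are only there for illustration.

```shell
#!/bin/bash
line=1234567890123456789012345678901234567890

# Capture groups of 12, 5, 5 and 12 characters from the start of the line
# and put a comma after each one. Lines shorter than 34 characters do not
# match and pass through unchanged.
result=$(printf '%s\n' "$line" |
    sed -E 's/^(.{12})(.{5})(.{5})(.{12})/\1,\2,\3,\4,/')

printf '%s\n' "$result"
```

To process a whole file, the same sed expression can be given the file name directly instead of reading from a pipe.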
With GNU awk, you could write
gawk 'BEGIN {FIELDWIDTHS="12 5 5 12"; OFS=","} {$1=$1; print}'
The $1=$1 part is to force awk to rewrite the line, incorporating the output field separator, without changing anything.
This is very much a job for substr.
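use strict;
use warnings;
my @widths = (12, 5, 5, 12);
while (my $line = <DATA>) {
    my $offset = 0;                      # start again at column 0 for every line
    for my $width (@widths) {
        $offset += $width;
        substr $line, $offset, 0, ',';   # insert a comma at $offset
        ++$offset;                       # step over the comma just inserted
    }
    print $line;
}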
__DATA__
1234567890123456789012345678901234567890
output
123456789012,34567,89012,345678901234,567890
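For comparison, the substring approach can also be sketched in bash alone, using substring expansion. The function name `add_commas` and the `widths` array are hypothetical names chosen for this example, and a pure-shell loop like this is not expected to be fast on 400k rows.

```shell
#!/bin/bash
# Field widths taken from the question.
widths=(12 5 5 12)

# add_commas LINE: print LINE with a comma after each fixed-width field.
add_commas() {
    local line=$1 out= pos=0 w
    for w in "${widths[@]}"; do
        out+="${line:pos:w},"   # take w characters starting at pos
        pos=$((pos + w))
    done
    # Append whatever is left after the last field.
    printf '%s\n' "$out${line:pos}"
}

result=$(add_commas 1234567890123456789012345678901234567890)
printf '%s\n' "$result"
```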

How to strip characters within a filename?

I am having trouble stripping characters within a filename.
For example:
1326847080_MUNDO-Cinco-Cosas-Que-Aprendimos-Del-Debate-De-Los-Republicanos-1.xml
1326836220_PLANETACNN-Una-Granja-De-Mariposas-Ayuda-A-Reducir-La-Tala-De-Bosques-En-Tanzania-3.xml
This is the output I want:
1326847080_MUNDO-1.xml
1326836220_PLANETACNN-3.xml
for i in *.xml
do
    j=$(echo "$i" | sed -e 's/-.*-/-/')
    echo mv "$i" "$j"
done
or in one line:
for i in *.xml; do echo mv "$i" "$(echo "$i" | sed -e 's/-.*-/-/')"; done
Remove echo to actually perform the mv command.
Or, without sed, using bash's built-in pattern replacement:
for i in *.xml; do echo mv "$i" "${i//-*-/-}"; done
rename to the rescue, with Perl regular expressions. This command will show which moves will be made; just remove -n to actually rename the files:
$ rename -n 's/([^-]+)-.*-([^-]+)/$1-$2/' *.xml
1326836220_PLANETACNN-Una-Granja-De-Mariposas-Ayuda-A-Reducir-La-Tala-De-Bosques-En-Tanzania-3.xml renamed as 1326836220_PLANETACNN-3.xml
1326847080_MUNDO-Cinco-Cosas-Que-Aprendimos-Del-Debate-De-Los-Republicanos-1.xml renamed as 1326847080_MUNDO-1.xml
The regular expression explained:
Save the part up to (but excluding) the first dash as match 1.
Save the part after the last dash as match 2.
Replace the part from the start of match 1 to the end of match 2 with match 1, a dash, and match 2.
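The two pieces described above (everything before the first dash, everything after the last dash) can also be picked out with plain shell parameter expansion. This is a sketch only; the function name `shorten` is made up for the example.

```shell
#!/bin/bash
# shorten NAME: keep the part before the first dash and the part after the
# last dash, joined by a single dash.
shorten() {
    local name=$1
    # ${name%%-*} strips the longest suffix starting at a dash;
    # ${name##*-} strips the longest prefix ending at a dash.
    printf '%s-%s\n' "${name%%-*}" "${name##*-}"
}

result=$(shorten 1326847080_MUNDO-Cinco-Cosas-Que-Aprendimos-Del-Debate-De-Los-Republicanos-1.xml)
printf '%s\n' "$result"
```

In a rename loop this could be used as `mv "$i" "$(shorten "$i")"`, ideally after a dry run with echo.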
Sorry for the late reply, but I only saw this today.
I think you are looking for the following
input file ::
cat > abc
1326847080_MUNDO-Cinco-Cosas-Que-Aprendimos-Del-Debate-De-Los-Republicanos-1.xml
1326836220_PLANETACNN-Una-Granja-De-Mariposas-Ayuda-A-Reducir-La-Tala-De-Bosques-En-Tanzania-3.xml
code (it's a bit too basic, even for my liking):
while read line
do
echo $line ;
fname=`echo $line | cut -d"-" -f1`;
lfield=`echo $line | sed -n 's/\-/ /gp' | wc -w`;
lname=`echo $line | cut -d"-" -f${lfield}`;
new_name="${fname}-${lname}";
echo "new name is :: $new_name";
done < abc ;
output ::
1326847080_MUNDO-Cinco-Cosas-Que-Aprendimos-Del-Debate-De-Los-Republicanos-1.xml
new name is :: 1326847080_MUNDO-1.xml
1326836220_PLANETACNN-Una-Granja-De-Mariposas-Ayuda-A-Reducir-La-Tala-De-Bosques-En-Tanzania-3.xml
new name is :: 1326836220_PLANETACNN-3.xml

Counting lines ignored by grep

Let me try to explain this as clearly as I can...
I have a script that at some point does this:
grep -vf ignore.txt input.txt
This ignore.txt has a bunch of lines with things I want my grep to ignore, hence the -v (meaning I don't want to see them in the output of grep).
Now, what I want to do is I want to be able to know how many lines of input.txt have been ignored by each line of ignore.txt.
For example, if ignore.txt had these lines:
line1
line2
line3
I would like to know how many lines of input.txt were ignored by ignoring line1, how many by ignoring line2, and so on.
Any ideas on how I can do this?
I hope that made sense... Thanks!
Note that the sum of the ignored lines plus the shown lines may NOT add up to the total number of lines... "line1 and line2 are here" will be counted twice.
#!/usr/bin/perl
use warnings;
use strict;
local @ARGV = 'ignore.txt';
chomp(my @pats = <>);
foreach my $pat (@pats) {
    print "$pat: ", qx/grep -c $pat input.txt/;
}
According to unix.stackexchange,
grep -o pattern file | wc -l
counts the total number of occurrences of a given pattern in the file. Since you already use a script, one solution is to run a separate grep for each pattern you want to ignore and count its matches.
However, I'd try to build a more comfortable solution in a scripting language such as Python.
This script will count the matched lines by hash lookup and save the lines to be printed in @result, where you may process them as you will. To emulate grep, just print them.
I made the script so it can print out an example. To use it with the files, uncomment the commented-out code in the script, and comment out the lines marked # example line.
Code:
use strict;
use warnings;
use v5.10;
use Data::Dumper; # example line
# Example data.
my @ignore = ('line1' .. 'line9'); # example line
my @input = ('line2' .. 'line9', 'fo' .. 'fx', 'line2', 'line3'); # example line
#my $ignore = shift; # first argument is ignore.txt
#open my $fh, '<', $ignore or die $!;
#chomp(my @ignore = <$fh>);
#close $fh;
my @result;
my %lookup = map { $_ => 0 } @ignore;
my $rx = join '|', map quotemeta, @ignore;
#while (<>) { # This processes the remaining arguments, input.txt etc
for (@input) { # example line
    chomp; # Required to avoid bugs due to missing newline at eof
    if (/($rx)/) {
        $lookup{$1}++;
    } else {
        push @result, $_;
    }
}
#say for @result; # This will emulate grep
print Dumper \%lookup; # example line
Output:
$VAR1 = {
'line6' => 1,
'line1' => 0,
'line5' => 1,
'line2' => 2,
'line9' => 1,
'line3' => 2,
'line8' => 1,
'line4' => 1,
'line7' => 1
};
while IFS= read -r pattern ; do
printf '%s:' "$pattern"
grep -c -v "$pattern" input.txt
done < ignore.txt
grep with -c counts matching lines; with -v added it counts non-matching lines instead, that is, the lines each pattern lets through. So, simply loop over the patterns and count once for each pattern.
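To count the lines that each pattern causes to be ignored, the same loop can be run without -v. The sketch below uses made-up sample data in a scratch directory. It also shows the double counting mentioned earlier: a line containing both patterns is counted once per pattern.

```shell
#!/bin/bash
# Made-up sample data; the file names mirror those in the question.
tmp=$(mktemp -d)
printf 'line1 here\nline2 here\nline1 and line2 are here\nother\n' > "$tmp/input.txt"
printf 'line1\nline2\n' > "$tmp/ignore.txt"

# For each pattern, grep -c prints the number of input lines that match it.
# -e keeps patterns that start with a dash from being read as options.
result=$(while IFS= read -r pattern; do
    printf '%s: %s\n' "$pattern" "$(grep -c -e "$pattern" "$tmp/input.txt")"
done < "$tmp/ignore.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Here three input lines are ignored in total, but the per-pattern counts add up to four.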
This will print the number of ignored matches along with the matching pattern:
grep -of ignore.txt input.txt | sort | uniq -c
For example:
$ perl -le 'print "Coroline" . ++$s for 1 .. 21' > input.txt
$ perl -le 'print "line2\nline14"' > ignore.txt
$ grep -of ignore.txt input.txt | sort | uniq -c
1 line14
3 line2
That is, a line matching "line14" was ignored once, and lines matching "line2" were ignored 3 times.
If you just wanted to count the total ignored lines this would work:
grep -cof ignore.txt input.txt
Update: modified the example above to use strings so that the output is a little clearer.
This might work for you:
# seq 1 15 | sed '/^1/!d' | sed -n '$='
7
Explanation:
Delete all lines except those that match. Pipe these matching (ignored) lines to another sed command, which prints nothing but the line number of the last line. So in this example, out of 1 through 15, lines 1 and 10 through 15 are ignored: a total of 7 lines.
EDIT:
Sorry misread the question (still a little confused!):
sed 's,.*,sed "/&/!d;s/.*/matched &/" input.txt| uniq -c,' ignore.txt | sh
This shows the number of matches for each pattern in ignore.txt
sed 's,.*,sed "/&/d;s/.*/non-matched &/" input.txt | uniq -c,' ignore.txt | sh
This shows the number of non-matches for each pattern in ignore.txt
If using GNU sed, these should work too:
sed 's,.*,sed "/&/!d;s/.*/matched &/" input.txt | uniq -c,;e' ignore.txt
or
sed 's,.*,sed "/&/d;s/.*/non-matched &/" input.txt | uniq -c,;e' ignore.txt
N.B. Your success with patterns may vary i.e. check for meta characters beforehand.
On reflection I thought this can be improved to:
sed 's,.*,/&/i\\matched &,;$a\\d' ignore.txt | sed -f - input.txt | sort -k2n | uniq -c
or
sed 's,.*,/&/!i\\non-matched &,;$a\\d' ignore.txt | sed -f - input.txt | sort -k2n | uniq -c
But NO, on large files this is actually slower.
Are both ignore.txt and input.txt sorted?
If so, you can use the comm command!
$ comm -12 ignore.txt input.txt
How many lines are ignored?
$ comm -12 ignore.txt input.txt | wc -l
Or, if you want to do more processing, combine comm with awk:
$ comm ignore.txt input.txt | awk '
END {print "Ignored lines = " igtotal " Lines not ignored = "commtotal " Lines unique to Ignore file = " uniqtotal}
{
if ($0 !~ /^\t/) {uniqtotal+=1}
if ($0 ~ /^\t[^\t]/) {commtotal+=1}
if ($0 ~ /^\t\t/) {igtotal+=1}
}'
Here I'm taking advantage of the tabs that are placed in the output by the comm command:
* If there are no tabs, the line is in ignore.txt only.
* If there is a single tab, it is in input.txt only
* If there are two tabs, the line is in both files.
By the way, not all the lines in ignore.txt are ignored. If the line isn't also in input.txt, the line can't really be said to be ignored.
With Dennis Williamson's Suggestion
comm ignore.txt input.txt | awk '
!/^\t/ {uniqtotal++}
/^\t[^\t]/ {commtotal++}
/^\t\t/ {igtotal++}
END {print "Ignored lines = " igtotal " Lines not ignored = "commtotal " Lines unique to Ignore file = " uniqtotal}'
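If the files are not already sorted, bash process substitution can sort them on the fly. This is a sketch with made-up data. Note that comm compares whole lines, so it only applies when the entries in ignore.txt are complete lines rather than substrings or regular expressions.

```shell
#!/bin/bash
# Made-up, unsorted sample data in a scratch directory.
tmp=$(mktemp -d)
printf 'line3\nline1\nline2\n' > "$tmp/ignore.txt"
printf 'line2\nother\nline3\n' > "$tmp/input.txt"

# comm needs sorted input. -12 suppresses columns 1 and 2, leaving only
# the lines that appear in both files.
common=$(comm -12 <(sort "$tmp/ignore.txt") <(sort "$tmp/input.txt"))

printf '%s\n' "$common"
rm -rf "$tmp"
```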

Read a file line by line and cut each line on a delimiter

cat $INPUT_FILE| while read LINE
do
abc=cut -d ',' -f 4 $LINE
Perl:
cat $INPUT_FILE | perl -ne '{my @fields = split /,/; print $fields[3];}'
The key is to use command substitution if you want the output of a command saved in a variable.
POSIX shell (sh):
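while read -r LINE
do
    # cut reads standard input here; passing "$LINE" as an argument would make cut treat it as a file name
    abc=$(printf '%s\n' "$LINE" | cut -d ',' -f 4)
done < "$INPUT_FILE"
If you're using a legacy Bourne shell, use backticks instead of the preferred $():
abc=`echo "$LINE" | cut -d ',' -f 4`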
In some shells, you may not need to use an external utility.
Bash, ksh, zsh:
while read -r LINE
do
IFS=, read -r f1 f2 f3 abc remainder <<< "$LINE"
done < "$INPUT_FILE"
or
while read -r LINE
do
IFS=, read -r -a array <<< "$LINE"
abc=${array[3]}
done < "$INPUT_FILE"
or
saveIFS=$IFS
while read -r LINE
do
IFS=,
array=($LINE)
IFS=$saveIFS
abc=${array[3]}
done < "$INPUT_FILE"
Bash:
while IFS= read -r line ; do
    cut -d, -f4 <<<"$line"
done < "$INPUT_FILE"
Straight Perl:
open (INPUT_FILE, "<$INPUT_FILE") or die ("Could not open $INPUT_FILE");
while (<INPUT_FILE>) {
    @fields = split(/,/, $_);
$use_this_field_value = $fields[3];
# do something with field value here
}
close (INPUT_FILE);

Convert stdin to one file per line

I would like to write a small script using the standard linux shell scripting tools (sed, awk, etc.) to create one output file per line of input. For example, given the following input file 'input':
line1
line2
line3
I would like to run the command.
$ cat input | my_prog
Where 'my_prog' is a script that creates three output files, out.1, out.2, out.3
$ cat out.1
line1
$ cat out.2
line2
$ cat out.3
line3
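count=1    # the question numbers the files from out.1
for line in $(cat input)    # note: word splitting breaks lines that contain spaces
do
    echo "$line" > "out.$count"
    count=$((count + 1))
done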
perl -ne 'open(my $fh, ">", "out.$.") or die $!; print $fh $_' input.txt
Using awk:
awk '{f = "out_" NR ".txt"; print > f; close(f)}' file
#!/bin/bash
while IFS= read -r line; do
echo "$line" > out.$((++i))
done < /path/to/input
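Since the question pipes the input into my_prog, the same loop can read standard input instead of a fixed path. The sketch below wraps it in a function; the optional prefix argument is a hypothetical addition (it defaults to "out.", as in the question), and the scratch directory is only there to keep the demonstration tidy.

```shell
#!/bin/bash
# my_prog [PREFIX]: write each line of stdin to PREFIX1, PREFIX2, ...
my_prog() {
    local prefix=${1:-out.} i=0 line
    # The "|| [ -n ... ]" part also handles a final line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        i=$((i + 1))
        printf '%s\n' "$line" > "$prefix$i"
    done
}

# Demonstration in a scratch directory.
tmp=$(mktemp -d)
printf 'line1\nline2\nline3\n' | my_prog "$tmp/out."
result=$(cat "$tmp/out.2")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Saved as an executable script that ends by calling `my_prog "$@"`, this can be used as `cat input | my_prog`.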