Change numbering according to field values with a bash script / perl

I have a tab-delimited file like this (without the headers; in the example I use the pipe character as the delimiter for clarity):
ID1|ID2|VAL1
1|1|3
1|1|4
1|2|3
1|2|5
2|2|6
I want to add a new field to this file that changes whenever ID1 or ID2 changes, like this:
1|1|3|1
1|1|4|1
1|2|3|2
1|2|5|2
2|2|6|3
Is this possible with a one-liner in sed, awk, perl, etc., or should I use a standard programming language (Java) for this task? Thanks in advance for your time.

Here is an awk one-liner:
awk -F\| '$1$2!=a {f++} {print $0,f;a=$1$2}' OFS=\| file
1|1|3|1
1|1|4|1
1|2|3|2
1|2|5|2
2|2|6|3
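Note that the comparison key $1$2 concatenates the two fields directly, so rows like 1|12 and 11|2 would in principle collide. A variant of the same one-liner (a sketch along the same lines) that keeps the fields apart with an explicit separator:
awk -F\| '$1 FS $2 != a {f++} {print $0, f; a = $1 FS $2}' OFS=\| file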

Simple enough with bash, though I'm sure you could figure out a one-line awk:
#!/bin/bash
count=1
while IFS='|' read -r id1 id2 val1; do
    # Can remove the next 3 lines if you're sure you won't have extraneous whitespace
    id1="${id1//[[:space:]]/}"
    id2="${id2//[[:space:]]/}"
    val1="${val1//[[:space:]]/}"
    # String comparison (!=) rather than -ne, so non-numeric IDs also work
    [[ ( -n $old1 && $old1 != "$id1" ) || ( -n $old2 && $old2 != "$id2" ) ]] && ((count+=1))
    echo "$id1|$id2|$val1|$count"
    old1="$id1"
    old2="$id2"
done < file
For example
> cat file
1|1|3
1|1|4
1|2|3
1|2|5
2|2|6
> ./abovescript
1|1|3|1
1|1|4|1
1|2|3|2
1|2|5|2
2|2|6|3
Replace IFS='|' with IFS=$'\t' for tab-delimited input.
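For a real tab-delimited file, only the loop header changes (a minimal sketch):
while IFS=$'\t' read -r id1 id2 val1; do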

Using awk:
awk '{print $0 FS (++a[$1,$2] == 1 ? ++i : i)}' FS=\| file
The array a counts how many times each (ID1, ID2) pair has been seen, so i is incremented only on a pair's first occurrence. If the file really starts with a header line, add FNR==1 {print; next} in front to pass the header through unnumbered.

Related

Suggestions to optimize a piece of Unix ksh code

I'm new to shell scripting and am hoping for some guidance on how to optimize the following piece of code to avoid unnecessary loops.
The file "DD.$BUS_DT.dat" is a pipe-delimited file containing 4 columns. Sample data in DD.2015-05-19.dat is as follows:
cust portal|10|10|0
sys-b|10|10|0
Code
i=0
sed 's/|//g;s/[0-9]//g' ./DD.$BUS_DT.dat > ./temp-processed.dat
set -A sourceList
while read line
do
    #echo $line
    case $line in
        'cust portal') sourceList[$i]=custportal;;
        *) sourceList[$i]=${line};;
    esac
    (( i += 1 ))
done < ./temp-processed.dat
echo ${sourceList[@]}
i=0
while [[ $i -lt ${#sourceList[@]} ]]; do
    print ${sourceList[i]} >> ./processed-$BUS_DT.dat
    (( i += 1 ))
done
My goal is to read the data from the first column of the file without spaces, so that the output looks like this:
custportal
sys-b
Your help will be appreciated.
I haven't gone through all of your script, but if you just want to get the first of the |-separated columns, stripping any spaces it may contain, you can use awk like this:
$ awk -F"|" '{gsub(" ","",$1); print $1}' file
custportal
sys-b
This uses | as the field separator and replaces all spaces in the first field with an empty string, then prints it.
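If the first column may also contain tabs or other whitespace, a variant along the same lines (a sketch):
awk -F"|" '{gsub(/[[:space:]]/,"",$1); print $1}' file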

Insert a string/number into a specific cell of a csv file

Basically, right now I have a for loop that runs a series of tests. Once the tests pass, I write the results into a CSV file:
for (( some statement ))
do
    if [[ something ]]; then
        # input this value into a specific row and column
    fi
done
What I can't figure out right now is how to input a specific value into a specific cell in the csv file. I know in awk you can read a cell with this command:
awk -v "row=2" -F'#' 'NR == row { print $2 }' some.csv and this will print the cell in the 2nd row and 2nd column. I need something similar to this except it can input a value into a specific cell instead of read it. Is there a function that does this?
You can use the following:
awk -v value="$value" -v row="$row" -v col="$col" 'BEGIN{FS=OFS="#"} NR==row {$col=value}1' file
And set the bash values $value, $row and $col. Then you can redirect and move to the original:
awk ... file > new_file && mv new_file file
This && means that just if the first command (awk...) is executed successfully, then the second one will be performed.
Explanation
-v value="$value" -v row="$row" -v col="$col" passes the bash variables to awk. Note that value, row and col could be given other names; I just used the same names as in bash to make it easier to follow.
BEGIN{FS=OFS="#"} sets the field separator and the output field separator to #. The OFS="#" matters here: when awk assigns to a field, it rebuilds the record using OFS.
NR==row {$col=value} when the record number (here, the line number) equals row, sets column col to value.
1 performs the default awk action: {print $0}.
Example
$ cat a
hello#how#are#you
i#am#fine#thanks
hoho#haha#hehe
$ row=2
$ col=3
$ value="XXX"
$ awk -v value="$value" -v row="$row" -v col="$col" 'BEGIN{FS=OFS="#"} NR==row {$col=value}1' a
hello#how#are#you
i#am#XXX#thanks
hoho#haha#hehe
Your question has a 'perl' tag so here is a way to do it using Tie::Array::CSV which allows you to treat the CSV file as an array of arrays and use standard array operations:
use strict;
use warnings;
use Tie::Array::CSV;

my $row = 2;
my $col = 3;
my $value = 'value';
my $filename = '/path/to/file.csv';

tie my @file, 'Tie::Array::CSV', $filename, sep_char => '#';
$file[$row][$col] = $value;
untie @file;
Note that Perl array indices are 0-based, so $file[2][3] addresses the third row, fourth column.
Using sed:
row=2           # define the row number
col=3           # define the column number
value="value"   # define the replacement value
# Address the row first, then replace the col-th run of non-# characters on that line:
sed "$row s/[^#]\{1,\}/$value/$col" file.csv
# So the above expands to: sed "2 s/[^#]\{1,\}/value/3" file.csv
If the output looks right and your sed supports the -i option, run the command to change file.csv in place:
sed -i "$row s/[^#]\{1,\}/$value/$col" file.csv
Otherwise, write to a temp file and move it back:
sed "$row s/[^#]\{1,\}/$value/$col" file.csv > temp.csv
mv temp.csv file.csv
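For instance, with the sample file a from the awk answer above (same data, # as the separator), the sed sketch gives the same result:
$ row=2; col=3; value="XXX"
$ sed "$row s/[^#]\{1,\}/$value/$col" a
hello#how#are#you
i#am#XXX#thanks
hoho#haha#hehe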

Make some replacements on a bunch of files depending on the number of columns per line

I'm having a problem dealing with some files. I need to perform a column count for every line in a file and, depending on the number of columns, add several ',' at the end of each line. All lines should have 36 columns separated by ','.
This line solves my problem, but how do I run it over a folder with several files in an automated way?
awk 'BEGIN { FS = "," }
{ if (NF == 32) { print $0 ",,,," } else if (NF == 31) { print $0 ",,,,," } }
' <SOURCE_FILE> > <DESTINATION_FILE>
Thank you for all your support
R&P
The answer depends on your OS, which you haven't told us. On UNIX and assuming you want to modify each original file, it'd be:
for file in *
do
    awk '...' "$file" > tmp$$ && mv tmp$$ "$file"
done
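If temp-file collisions are a concern, a variant sketch using mktemp (assuming it is available on your system):
for file in *
do
    tmp=$(mktemp) || exit 1
    awk '...' "$file" > "$tmp" && mv "$tmp" "$file"
done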
Also, in general, to get all records in a file to have the same number of fields, you can do the following without needing to specify what that number of fields is (though you can if appropriate):
$ cat tst.awk
BEGIN { FS=OFS=","; ARGV[ARGC] = ARGV[ARGC-1]; ARGC++ }  # queue the input file a second time
NR==FNR { nf = (NF > nf ? NF : nf); next }               # first pass: track the maximum field count
{
    tail = sprintf("%*s", nf-NF, "")  # second pass: build nf-NF padding spaces
    gsub(/ /, OFS, tail)              # turn each padding space into a field separator
    print $0 tail
}
$
$ cat file
a,b,c
a,b
a,b,c,d,e
$
$ awk -f tst.awk file
a,b,c,,
a,b,,,
a,b,c,d,e
$
$ awk -v nf=10 -f tst.awk file
a,b,c,,,,,,,
a,b,,,,,,,,
a,b,c,d,e,,,,,
It's a short one-liner with Perl:
perl -i.bak -F, -alpe '$_ .= "," x (36-@F)' *
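If you would rather stay in awk and have GNU awk 4.1 or later, a roughly equivalent sketch using its inplace extension (assuming the directory contains only the data files; note there is no .bak backup by default):
gawk -i inplace 'BEGIN { FS = OFS = "," } { while (NF < 36) $(NF+1) = "" } 1' *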
If this is only a single folder without subfolders, use:
for oldfile in /path/to/files/*
do
    newfile="${oldfile}.new"
    awk '...' "${oldfile}" > "${newfile}"
done
If you also want to include subdirectories recursively, it's probably easiest to put the awk command and redirection into a small shell script, like this:
#!/bin/bash
oldfile=$1
newfile="${oldfile}.new"
awk '...' "${oldfile}" > "${newfile}"
and then run this script (let's call it runawk.sh, made executable with chmod +x runawk.sh) via find:
find /path/to/files/ -type f -not -name "*.new" -exec ./runawk.sh \{\} \;

Remove lines from AWK output

I would like to remove lines that have less than 2 columns from a file:
awk '{ if (NF < 2) print}' test
one two
Is there a way to store these lines in a variable and then remove them with xargs and sed, something like
awk '{ if (NF < 2) VARIABLE}' test | xargs sed -i /VARIABLE/d
GNU sed
I would like to remove lines that have less than 2 columns
Less than 2 means removing lines with only one column, i.e. deleting every line that does not contain at least two whitespace-separated fields:
sed -r '/^\s*\S+\s+\S+/!d' file
If you would like to split the input into two files (named "pass" and "fail"), based on condition:
awk '{if (NF > 1 ) print > "pass"; else print > "fail"}' input
If you simply want to filter/remove lines with NF < 2:
awk '(NF > 1){print}' input
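Since printing is awk's default action, this can be shortened to:
awk 'NF > 1' input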

Delete records in a file with Null value in certain fields through Unix

I have a pipe-delimited file (sample below) and I need to delete records which have a null value in fields 2 (email), 4 (mailing-id) or 6 (comm_id). In this sample, rows 2, 3 and 4 should be deleted. The output should be saved to another file. If awk is the best option, please let me know a way to achieve this.
id|email|date|mailing-id|seg_id|comm_id|oyb_id|method
|-fabianz-#yahoo.com|2010-06-23 11:47:00|0|1234|INCLO|1000002|unknown
||2010-06-23 11:47:00|0|3984|INCLO|1000002|unknown
|-maddog-#web.md|2010-06-23 11:47:00|0||INCLO|1000002|unknown
|-mse-#hanmail.net|2010-06-23 11:47:00|0||INCLO|1000002|unknown
|-maine-mei#web.md.net|2010-06-23 11:47:00|0|454|INCLO|1000002|unknown
Here is an awk solution that may help. However, to remove rows 2, 3 and 4, it is necessary to check for null values in fields 2 and 5 only (i.e. not fields 2, 4 and 6 as you have stated). Am I understanding things correctly? Here is the awk to do what you want:
awk -F "|" '{ if ($2 == "" || $5 == "") next; print $0 }' file.txt > results.txt
cat results.txt:
id|email|date|mailing-id|seg_id|comm_id|oyb_id|method
|-fabianz-#yahoo.com|2010-06-23 11:47:00|0|1234|INCLO|1000002|unknown
|-maine-mei#web.md.net|2010-06-23 11:47:00|0|454|INCLO|1000002|unknown
HTH
Steve is right: it is fields 2 and 5 that are missing in the sample given, the email missing for line two and the seg_id missing for lines three and four.
This is a slightly simplified version of steve's solution:
awk -F "|" '$2!="" && $5!=""' file.txt > results.txt
If columns 2, 4 and 6 are the important ones, the solution would be:
awk -F "|" '$2!="" && $4!="" && $6!=""' file.txt > results.txt
This might work for you (GNU sed; it marks fields 2, 4 and 6 and deletes the line when any of them is empty):
sed 'h;s/[^|]*/\n&/2;s/[^|]*/\n&/4;s/[^|]*/\n&/6;/\n|/d;x' file.txt > results.txt