awk/sed in script file - sed

I have an input file in this format:
time1 = 0.000000
time1 = 0.010000
time1 = 0.020000
time1 = 0.170000
I need to write a script to extract the values and compute the average. How do I do it?

If it follows that exact format through the entire file, you can use this command:
awk '{sum += $3} END {print sum/NR}' file
If there are other entries in the file that would throw the count off, you need to filter the matching lines and track how many there are:
awk '/time/ {sum += $3; total += 1} END {print sum/total}' file

You would use awk, not sed. To get you started, this command line will print only the numbers:
awk '{print $3}' FILENAME
The expr command would be a handy way to add the numbers up.
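For example, a minimal sketch of that approach (FILENAME as above; note that expr only does integer arithmetic, so bc is used here for the decimal sums while expr handles the integer count):
#!/bin/sh
# Sum the third column value by value, then divide by the count.
sum=0
count=0
for n in $(awk '{print $3}' FILENAME); do
    sum=$(echo "$sum + $n" | bc -l)   # bc, because the values are decimals
    count=$(expr $count + 1)          # expr is fine for the integer count
done
echo "average: $(echo "$sum / $count" | bc -l)"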

Required suggestions to optimize a piece of unix ksh code

I'm new to shell scripting and would appreciate some guidance on how to optimize the following piece of code to avoid unnecessary loops.
The file "DD.$BUS_DT.dat" is a pipe-delimited file and contains 4 columns. Sample data in DD.2015-05-19.dat is as follows:
cust portal|10|10|0
sys-b|10|10|0
Code
i=0;
sed 's/|//g;s/[0-9]//g' ./DD.$BUS_DT.dat > ./temp-processed.dat
set -A sourceList
while read line
do
    #echo $line
    case $line in
        'cust portal') sourceList[$i]=custportal;;
        *) sourceList[$i]=${line};;
    esac
    (( i += 1 ));
done < ./temp-processed.dat;
echo ${sourceList[@]};
i=0;
while [[ i -lt ${#sourceList[@]} ]]; do
    print ${sourceList[i]} >> ./processed-$BUS_DT.dat
    (( i += 1 ))
done
My goal is to read the data from the first column of the file, without spaces, so that the output looks like ...
custportal
sys-b
Your help will be appreciated.
I haven't gone through all your script, but if you just want to get the first column of the |-separated columns, stripping any spaces it may contain, you can use awk like this:
$ awk -F"|" '{gsub(" ","",$1); print $1}' file
custportal
sys-b
This uses | as the field separator and replaces all the spaces in the first field with an empty string. Then it prints it.
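If the goal is to replace the whole loop, here is a sketch of the end-to-end version under the same file names as in the question: one awk pass, no temp file and no while loops.
awk -F'|' '{ gsub(" ", "", $1); print $1 }' "./DD.$BUS_DT.dat" > "./processed-$BUS_DT.dat"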

Keeping first character in string, in a specific single field

I am trying to remove all but the first character of a specific field in a .tab file. I want to keep only the first character in fields 10 and 11.
Normally the fields have 35 characters in them, so I used:
awk '{gsub("..................................$","",$10); print}' file
however, some fields have fewer than 35 characters and were ignored by this replacement. I tried using substr, but I cannot figure out how to make it field-specific. I believe there is a way to use Perl inside awk so that I can use the function
perl -pe 's/(.).*/$1/g'
but I am not sure how to do that and use the field as the input value, so the file comes out identical except for the altered field.
Is there a way to do the Perl equivalent with gsub, or the awk equivalent with Perl?
Help is appreciated!
One way using awk:
awk '{ for (i=10;i<=11;i++) { $i = substr( $i, 1, 1) } } { print }' infile
Another way, using the gensub function of gawk:
gawk '{ for (i=10;i<=11;i++) { $i = gensub(/(.).*/, "\\1", "g", $i) } }1' infile
The shortest awk version I could figure out:
awk '($10=substr($10,1,1))&&$11=substr($11,1,1)' infile
If the 10th and/or 11th field does not exist, the line is not printed.
A similar version in Perl:
perl -ane '$F[9]=~s/(.).*/$1/;$F[10]=~s/(.).*/$1/;print "@F\n"' infile
This prints the line even if the 10th and/or 11th field is not defined.
Another way with perl:
perl -pe '$c=0; s/(\S+)/(++$c < 10 || $c > 11) ? $1 : substr($1,0,1)/eg' filename
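For a quick sanity check of any of these, a hypothetical one-line input works (fields 10 and 11 are truncated to their first character, the rest are untouched):
$ echo 'f1 f2 f3 f4 f5 f6 f7 f8 f9 alpha beta gamma' | awk '{ for (i=10;i<=11;i++) $i = substr($i,1,1) } 1'
f1 f2 f3 f4 f5 f6 f7 f8 f9 a b gamma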

Swap two columns - awk, sed, python, perl

I've got data in a large file (280 columns wide, 7 million lines long!) and I need to swap the first two columns. I think I could do this with some kind of awk for loop, to print $2, $1, then a range to the end of the file - but I don't know how to do the range part, and I can't print $2, $1, $3...$280! Most of the column swap answers I've seen here are specific to small files with a manageable number of columns, so I need something that doesn't depend on specifying every column number.
The file is tab delimited:
Affy-id chr 0 pos NA06984 NA06985 NA06986 NA06989
You can do this by swapping values of the first two fields:
awk ' { t = $1; $1 = $2; $2 = t; print; } ' input_file
I tried the answer of perreal with Cygwin on a Windows system, with a tab-separated file. It didn't work, because awk's default separator is whitespace.
If you encounter the same problem, try this instead:
awk -F $'\t' ' { t = $1; $1 = $2; $2 = t; print; } ' OFS=$'\t' input_file
The input separator is defined by -F $'\t' and the separator for the output by OFS=$'\t'. To write the result to a new file:
awk -F $'\t' ' { t = $1; $1 = $2; $2 = t; print; } ' OFS=$'\t' input_file > output_file
Or try this if you only need the two swapped columns (note that it drops the rest of the line):
awk '{printf("%s\t%s\n", $2, $1)}' inputfile
This might work for you (GNU sed):
sed -i 's/^\([^\t]*\t\)\([^\t]*\t\)/\2\1/' file
Have you tried using the cut command? E.g.
cut -c10-20,1-9,21- myhugefile > myrearrangedhugefile
(Note that cut always emits the selected ranges in input order, so the list order above does not reorder anything; cut alone cannot actually swap the columns.)
This is also easy in perl:
perl -pe 's/^(\S+)\t(\S+)/$2\t$1/;' file > outputfile
You could do this in Perl:
perl -F\\t -nlae 'print join("\t", @F[1,0,2..$#F])' inputfile
The -F specifies the delimiter. In most shells you need to precede a backslash with another one to escape it. In recent versions of Perl, -F automatically implies -n and -a, so they can be dropped.
For this problem you wouldn't need -l, because the last column appears last in the output. But in a different situation, if the last column needs to appear between other columns, the newline character must be removed. The -l switch takes care of this.
The "\t" in join can be changed to anything else to produce a different delimiter in the output.
2..$#F specifies a range from the third column (index 2) until the last column. As you might have guessed, inside the square brackets you can put any single column or range of columns in the desired order.
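For instance, a hypothetical variant that moves the fifth column to the front only needs a different slice:
perl -F\\t -nlae 'print join("\t", @F[4,0..3,5..$#F])' inputfile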
No need to call anything else but your shell:
bash> while read col1 col2 rest; do
    echo $col2 $col1 $rest
done <input_file
Test:
bash> echo "first second a c d e f g" |
while read col1 col2 rest; do
echo $col2 $col1 $rest
done
second first a b c d e f g
Maybe even with "inlined" Python - as in a Python script within a shell script - but only if you want to do some more scripting with Bash beforehand or afterwards... Otherwise it is unnecessarily complex.
Content of script file process.sh:
#!/bin/bash
# inline Python script
read -r -d '' PYSCR << EOSCR
from __future__ import print_function
import codecs
import sys
encoding = "utf-8"
fn_in = sys.argv[1]
fn_out = sys.argv[2]
# print("Input:", fn_in)
# print("Output:", fn_out)
with codecs.open(fn_in, "r", encoding) as fp_in, \
     codecs.open(fn_out, "w", encoding) as fp_out:
    for line in fp_in:
        # split into two columns and the rest
        col1, col2, rest = line.split("\t", 2)
        # swap the columns in the output
        fp_out.write("{}\t{}\t{}".format(col2, col1, rest))
EOSCR
# ---------------------
# do setup work?
# e. g. list files for processing
# call python script with params
python3 -c "$PYSCR" "$inputfile" "$outputfile"
# do some more processing
# e. g. rename outputfile to inputfile, ...
If you only need to swap the columns for a single file, then you can also just create a single Python script and statically define the filenames. Or just use an answer above.
awk swapping sans temp variable (the leading 1; prints the original line, then the swapped version follows):
echo '777777744444444464449: 317 647 14423 262927714037 : 0x2A29D5A1BAA7A95541' |
mawk '1; ($1 = $2 substr(_, ($2 = $1)^_))^_' FS=':' OFS=':'
777777744444444464449: 317 647 14423 262927714037 : 0x2A29D5A1BAA7A95541
317 647 14423 262927714037 :777777744444444464449: 0x2A29D5A1BAA7A95541

Add column to middle of tab-delimited file (sed/awk/whatever)

I'm trying to add a column (with the content '0') to the middle of a pre-existing tab-delimited text file. I imagine sed or awk will do what I want. I've seen various solutions online that do approximately this but they're not explained simply enough for me to modify!
I currently have this content:
Affx-11749850 1 555296 CC
I need this content
Affx-11749850 1 0 555296 CC
Using the command awk '{$3=0}1' filename messes up my formatting AND replaces column 3 with a 0, rather than adding a third column with a 0.
Any help (with explanation!) so I can solve this problem, and future similar problems, much appreciated.
Using the implicit { print } rule and appending the 0 to the second column:
awk '$2 = $2 FS "0"' file
Or with sed, assuming single space delimiters:
sed 's/ / 0 /2' file
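If the file really is tab-delimited, the same idea works with tabs (a sketch for GNU sed, which understands \t):
sed 's/\t/\t0\t/2' file
This replaces the second tab on each line with tab-0-tab, inserting the 0 as a new third column.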
Or perl:
perl -lane '$, = " "; $F[1] .= " 0"; print @F'
awk '{$2=$2" "0; print }' your_file
tested below:
> echo "Affx-11749850 1 555296 CC"|awk '{$2=$2" "0;print}'
Affx-11749850 1 0 555296 CC

How can I change spaces to underscores and lowercase everything?

I have a text file which contains:
Cycle code
Cycle month
Cycle year
Event type ID
Event ID
Network start time
I want to change this text so that wherever there is a space, it is replaced with a _, and after that I want all the characters converted to lowercase, like below:
cycle_code
cycle_month
cycle_year
event_type_id
event_id
network_start_time
How could I accomplish this?
Another Perl method:
perl -pe 'y/A-Z /a-z_/' file
tr alone works:
tr ' [:upper:]' '_[:lower:]' < file
Looking into the sed documentation some more, and following advice from the comments, the following command should work (\L is a GNU sed extension that lowercases the matched character &):
sed -r {filehere} -e 's/[A-Z]/\L&/g;s/ /_/g' -i
There is a perl tag in your question as well. So:
#!/usr/bin/perl
use strict; use warnings;
while (<DATA>) {
    print join('_', split ' ', lc), "\n";
}
__DATA__
Cycle code
Cycle month
Cycle year
Event type ID
Event ID
Network start time
Or:
perl -i.bak -wple '$_ = join("_", split " ", lc)' test.txt
sed "y/ABCDEFGHIJKLMNOPQRSTUVWXYZ /abcdefghijklmnopqrstuvwxyz_/" filename
Just use your shell, if you have Bash 4
while read -r line
do
    line=${line,,}    # change to lowercase
    echo "${line// /_}"
done < "file" > newfile
mv newfile file
With gawk (reassigning $1 makes awk rebuild the line with OFS between the fields):
awk '{$0=tolower($0);$1=$1}1' OFS="_" file
With Perl:
perl -ne 's/ +/_/g;print lc' file
With Python:
>>> f=open("file")
>>> for line in f:
...     print '_'.join(line.split()).lower()
>>> f.close()