awk or sed CSV file manipulation

"a004-1b","North","at006754"
"a004-1c","south","atytgh0"
"a004-1d","east","atrthh"
"a010-1a","midwest","atyu"
"a010-1b","south","rfg67"
I want to print the first and second columns without any extra characters: I want to eliminate all the double quotes, the commas, and the third column. Thanks in advance.

awk -F'^"|","|"$' '{print $2,$3}' ./infile.csv
The above script will even handle fields that have embedded double quotes or commas. The only downside (if you can call it that) is that the first field starts at $2: the leading ^" separator matches at the very start of the line, so $1 is always empty.
Proof of Concept
$ awk -F'^"|","|"$' '{print $2,$3}' ./infile.csv
a004-1b North
a004-1c south
a004-1d east
a010-1a midwest
a010-1b south
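To see why the first value lands in $2, here is a small Python analogue of the same split. It assumes Python's re.split behaves like awk's regex field splitting for this input; the two engines are not identical in general.

```python
import re

# Same separator regex as the awk -F above: a leading quote, a "," between fields, or a trailing quote.
SEP = re.compile(r'^"|","|"$')

def first_two(line):
    fields = SEP.split(line.rstrip("\n"))
    # fields[0] is the empty string before the leading quote; that is why awk's first value is $2.
    return fields[1], fields[2]

for row in ['"a004-1b","North","at006754"', '"a010-1b","south","rfg67"']:
    print(*first_two(row))
```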

You need GNU Awk 4 for this to work:
$ gawk -v FPAT='[^",]+' '{print $1,$2}' ./infile.csv
I love this new "field pattern" feature. It's my new hammer and everything is a nail. Read up on it at http://www.gnu.org/software/gawk/manual/html_node/Splitting-By-Content.html
(Written this way it doesn't account for embedded commas or quotes, because the question implies this is not needed.)
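The same "describe the field, not the separator" idea can be sketched in Python with re.findall; this is only an analogy to FPAT, not gawk's implementation.

```python
import re

# Like gawk's FPAT, describe what a field looks like (a run of characters that are neither
# quotes nor commas) instead of describing the separator.
FIELD = re.compile(r'[^",]+')

def fields(line):
    return FIELD.findall(line.rstrip("\n"))

first, second, *_ = fields('"a004-1d","east","atrthh"')
print(first, second)
```

One thing to watch in this Python version: an empty field such as "" produces no match at all, so the values after it shift left.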

If you're using awk for this, why put a Perl tag on it?
In Perl:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
# Make Data::Dumper pretty
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent = 1;
# Set maximum depth for Data::Dumper, zero means unlimited
local $Data::Dumper::Maxdepth = 0;
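use Text::CSV;
# binary => 1 allows non-ASCII bytes and embedded newlines inside quoted fields
my $csv = Text::CSV->new({ binary => 1 }) or die "" . Text::CSV->error_diag();
while ( my $row = $csv->getline( \*DATA ) ) {
    print 'row: ', Dumper $row;
    # To print just the first two columns instead: print "$row->[0] $row->[1]\n";
}
__DATA__
"a004-1b","North","at006754"
"a004-1c","south","atytgh0"
"a004-1d","east","atrthh"
"a010-1a","midwest","atyu"
"a010-1b","south","rfg67"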

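For comparison, Python's standard csv module plays the role of Text::CSV here. The third sample row is made up to show an embedded comma and doubled quotes being decoded correctly.

```python
import csv
import io

# Made-up sample: the last row has an embedded comma and doubled ("escaped") quotes.
SAMPLE = '''"a004-1b","North","at006754"
"a004-1c","south","atytgh0"
"a010-1a","mid, west","say ""hi"""
'''

def first_two_columns(text):
    # csv.reader understands quoting, so "mid, west" stays one field.
    reader = csv.reader(io.StringIO(text))
    return [(row[0], row[1]) for row in reader if len(row) >= 2]

for first, second in first_two_columns(SAMPLE):
    print(first, second)
```
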
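awk -F'"|,' '{print $2,$5}' sample
Because every quote and every comma is a separator, adjacent separators produce empty fields, so the first value ends up in $2 and the second in $5.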

Not handling embedded double quotes:
sed -e 's/^"\([^"]*\)","\([^"]*\)".*/\1 \2/'
To handle them:
sed -n -e 's/^"//;s/"$//;s/","/ /;s/","/\n/;P'
The above works even for a 1 or 2 field input.

If you want it "pure" awk or sed, this won't fit the bill, but otherwise it works:
awk -F, '{print $1 " " $2}' | tr -d '"'

Related

Perl script throws syntax error for awk command

I have a file which contains each user's userid and password. I need to fetch the userid and password from that file by passing the userid as a search element to an awk command.
user101,smith,smith#123
user102,jones,passj#007
user103,albert,albpass#01
I am using an awk command inside my Perl script like this:
...
...
my $userid = $ARGV[0];
my $user_report_file = "report_file.txt";
my $data = `awk -F, '$1 ~ /$userid/ {print $2, $3}' $user_report_file`;
my ($user,$pw) = split(" ",$data);
...
...
Here I am getting the error:
awk: ~ /user101/ {print , }
awk: ^ syntax error
But if I run the same command in a terminal window, it gives the result below:
$] awk -F, '$1 ~ /user101/ {print $2, $3}' report_file.txt
smith smith#123
What could be the issue here?

The backticks are a double-quoted context, so you need to escape any literal $ that you want awk to interpret.
my $data = `awk -F, '\$1 ~ /$userid/ {print \$2, \$3}' $user_report_file`;
If you don't do that, you're interpolating the capture variables from the last successful Perl match.
When I have these sorts of problems, I try the command as a string first to see if it is what I expect:
use feature 'say';
my $data = "awk -F, '\$1 ~ /$userid/ {print \$2, \$3}' $user_report_file";
say $data;
Here's the Perl equivalent of that command:
$ perl -F, -ane '$F[0] =~ /101/ && print "@F[1,2]"' report_file.txt
But, this is something you probably want to do in Perl instead of creating another process:
Interpolating data into external commands can go wrong: a filename such as "foo.txt; rm -rf /" would run a second command.
The awk you run is the first one in the path, so someone can make that a completely different program (so use the full path, like /usr/bin/awk).
Taint checking can tell you when you are passing unsanitized data to the shell.
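The two fixes above (no shell in between, full path to awk) can be sketched in Python with subprocess and an argument list; Perl's list forms of system and exec follow the same idea. This is an illustration under stated assumptions, not a drop-in replacement for the script in the question: build_awk_argv and lookup are hypothetical helpers, and /usr/bin/awk is an assumed location.

```python
import subprocess

# Hypothetical helper; "/usr/bin/awk" is an assumed location (check `command -v awk` on your system).
def build_awk_argv(userid, path):
    # userid travels as an awk variable (-v), not as program text, so no "$" escaping
    # is needed and its characters cannot change the awk program.
    # Note: awk still processes backslash escapes in -v values.
    program = '$1 == u { print $2, $3 }'  # exact match; the question used a regex match (~)
    return ["/usr/bin/awk", "-F,", "-v", "u=" + userid, program, path]

def lookup(userid, path):
    # No shell is involved: the list goes straight to the OS, so a path like
    # "foo.txt; rm -rf /" is treated as one (odd) filename, not two commands.
    result = subprocess.run(build_awk_argv(userid, path),
                            capture_output=True, text=True, check=True)
    return result.stdout.split()
```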
Inside a program you don't get all the shortcuts, but if this is the part of your program that is slow, you probably want to rethink how you are accessing this data because scanning the entire file with any tool isn't going to be that fast:
open my $fh, '<', $user_report_file or die "Cannot open $user_report_file: $!";
while ( <$fh> ) {
    chomp;
    my @F = split /,/;
    next unless $F[0] =~ /\Q$userid/;
    print "@F[1,2]\n";
    last; # if you only want the first one
}
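For comparison, the same in-process scan in Python, as a hypothetical find_user helper that assumes the comma-separated layout shown in the question:

```python
import csv

def find_user(path, userid):
    # Scan the file once, stop at the first match (like `last` in the Perl loop).
    # "userid in row[0]" is a literal substring test, like the \Q-quoted Perl regex;
    # use row[0] == userid if user10 should not match user101.
    with open(path, newline="") as fh:
        for row in csv.reader(fh):
            if len(row) >= 3 and userid in row[0]:
                return row[1], row[2]
    return None
```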

Sed - replace words

I have a problem with replacing a string.
|Stm=2|Seq=2|Num=2|Svc=101|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
I want to find the Svc field (from Svc up to the next |) and swap its position with the Stm field (from Stm up to the next |).
My attempts ended up replacing individual characters, which is not my goal.

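awk -F'|' -v OFS='|' '{
  a = b = 0
  for (i = 1; i <= NF; i++) { a = $i ~ /^Stm=/ ? i : a; b = $i ~ /^Svc=/ ? i : b }
  if (a && b) { t = $a; $a = $b; $b = t }   # swap only when both fields were found
  # the trailing 7 below is an always-true pattern, so every record is printed
} 7' file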
outputs:
|Svc=101|Seq=2|Num=2|Stm=2|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
The code swaps the Stm... and Svc... columns, no matter which one comes first.
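The same split, locate, swap, join approach sketched in Python (swap_fields is a hypothetical helper; like the answers here, it assumes each prefix appears in only one field):

```python
def swap_fields(line, p1="Stm=", p2="Svc=", sep="|"):
    # Split on the separator, find the first field starting with each prefix,
    # swap them, and join back. Order in the record does not matter.
    fields = line.split(sep)
    i = next((k for k, f in enumerate(fields) if f.startswith(p1)), None)
    j = next((k for k, f in enumerate(fields) if f.startswith(p2)), None)
    if i is not None and j is not None:
        fields[i], fields[j] = fields[j], fields[i]
    return sep.join(fields)

print(swap_fields("|Stm=2|Seq=2|Num=2|Svc=101|MsgSize(514)=514|MsgType=556|SymbolIndex=16631"))
```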

If a Perl solution is okay (this assumes only one column matches each of the search terms):
$ cat ip.txt
|Stm=2|Seq=2|Num=2|Svc=101|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
$ perl -F'\|' -lane '
@i = grep { $F[$_] =~ /Svc|Stm/ } 0..$#F;
$t=$F[$i[0]]; $F[$i[0]]=$F[$i[1]]; $F[$i[1]]=$t;
print join "|", @F;
' ip.txt
|Svc=101|Seq=2|Num=2|Stm=2|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
-F'\|' -lane splits each input line on | (see also the question "Perl flags -pe, -pi, -p, -w, -d, -i, -t?")
@i = grep { $F[$_] =~ /Svc|Stm/ } 0..$#F gets the indices of the columns matching Svc and Stm
$t=$F[$i[0]]; $F[$i[0]]=$F[$i[1]]; $F[$i[1]]=$t swaps the two columns
Or use ($F[$i[0]], $F[$i[1]]) = ($F[$i[1]], $F[$i[0]]); (courtesy of the question "How can I swap two Perl variables")
print join "|", @F prints the modified array

You need to use capture groups and backreferences in a substitution.
The following swaps the two fields; [^|]* keeps each captured group from running past the next |:
echo '|Stm=2|Seq=2|Num=2|Svc=101|MsgSize(514)=514|MsgType=556|SymbolIndex=16631' | sed 's/\(Stm=[^|]*\)\(.*\)\(Svc=[^|]*\)/\3\2\1/'
As pointed out in the comment from @Kent, this will not work if the strings are not in that order.
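One way around the ordering limitation is a second alternative in the regex. A Python sketch of that idea (PATTERN and swap are illustrative names), using [^|]* so each captured field stops at the next |:

```python
import re

# Capture groups and backreferences, with [^|]* so each captured field stops at the next "|".
# The second alternative handles the reverse order (Svc before Stm).
PATTERN = re.compile(r'(Stm=[^|]*)(.*)(Svc=[^|]*)|(Svc=[^|]*)(.*)(Stm=[^|]*)')

def swap(line):
    def repl(m):
        if m.group(1) is not None:                      # Stm came first
            return m.group(3) + m.group(2) + m.group(1)
        return m.group(6) + m.group(5) + m.group(4)     # Svc came first
    return PATTERN.sub(repl, line, count=1)
```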

How to display text between nested Parenthesis using sed or awk or grep?

I have a file in which one line contains nested parentheses, and I want to display only the words inside the inner parentheses.
Example:
(abc (defg) or hij(klmn)) and (opq(rstuv))
Expected Result:
defg
klmn
rstuv
I have tried awk: awk -F "[(())]" '{ for (i=2; i<NF; i+=2) print $i}'
I have tried sed: sed 's/.*(\([a-zA-Z0-9_]*\)).*/\1/'

Using perl global matching and lazy quantifiers:
#! /usr/bin/perl -n
use feature 'say';
while (/\((.*?\)[^(]*?)\)/g) {
$m=$1;
while ($m =~ /\((.*?)\)/g) {
say $1;
}
}
Output:
defg
klmn
rstuv

Maybe with grep?
$ echo "(abc (defg) or hij(klmn)) and (opq(rstuv))" | grep -o "([a-z]*)"
(defg)
(klmn)
(rstuv)
It catches the groups of ( + letters + ).
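I tried to get rid of the parentheses with lookarounds:
grep -Po '(?<=()[a-z]*(?=))'
but grep reports "grep: lookbehind assertion is not fixed length". The literal parentheses need escaping: the unescaped ( opens an empty group, so [a-z]* ends up inside the lookbehind, which must be fixed length. With the parentheses escaped it should work:
grep -Po '(?<=\()[a-z]*(?=\))'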

This might work for you (GNU sed):
sed -r 's/\(([^()]*)\)/\n\1\n/;s/[^\n]*\n//;/[^()]/P;D' file
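The GNU sed answer relies on [^()]*, which matches only text containing no parentheses. The same regex in Python picks out exactly the innermost groups without any nesting logic:

```python
import re

# Innermost groups are the "(...)" spans that contain no other parenthesis,
# so [^()]* between literal (escaped) parentheses finds them.
INNERMOST = re.compile(r'\(([^()]*)\)')

text = "(abc (defg) or hij(klmn)) and (opq(rstuv))"
for word in INNERMOST.findall(text):
    print(word)
```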

variable for field separator in perl

In awk I can write: awk -F: 'BEGIN {OFS = FS} ...'
In Perl, what's the equivalent of FS? I'd like to write
perl -F: -lane 'BEGIN {$, = [what?]} ...'
update with an example:
echo a:b:c:d | awk -F: 'BEGIN {OFS = FS} {$2 = 42; print}'
echo a:b:c:d | perl -F: -ane 'BEGIN {$, = ":"} $F[1] = 42; print @F'
Both output a:42:c:d
I would prefer not to hard-code the : in the Perl BEGIN block, but refer to wherever the -F option saves its argument.

To sum up, what I'm looking for does not exist:
there's no variable that holds the argument for -F, and more importantly
Perl's "FS" is fundamentally a different data type (regular expression) than the "OFS" (string) -- it does not make sense to join a list of strings using a regex.
Note that the same holds true in awk: FS is a string but acts as regex:
echo a:b,c:d | awk -F'[:,]' 'BEGIN {OFS=FS} {$2=42; print}'
outputs "a[:,]42[:,]c[:,]d"
Thanks for the insight and workarounds though.
You can use perl's -s (similar to awk's -v) to pass a "FS" variable, but the split becomes manual:
echo a:b:c:d | perl -sne '
BEGIN {$, = $FS}
@F = split $FS;
$F[1] = 42;
print @F;
' -- -FS=":"

If you know the exact length of input, you could do this:
echo a:b:c:d | perl -F'(:)' -ane '$, = $F[1]; @F = @F[0,2,4,6]; $F[1] = 42; print @F'
If the input is of variable length, you'll need something more sophisticated than @F[0,2,4,6].
EDIT: -F seems to simply provide input to an automatic split() call, which takes a complete RE as an expression. You may be able to find something more suitable by reading the perldoc entries for split, perlre, and perlvar.

You can sort of cheat it, because perl is actually using the split function with your -F argument, and you can tell split to preserve what it splits on by including capturing parens in the regex:
$ echo a:b:c:d | perl -F'(:)' -ane 'print join("/", @F);'
a/:/b/:/c/:/d
You can see what perl's doing with some of these "magic" command-line arguments by using -MO=Deparse, like this:
$ perl -MO=Deparse -F'(:)' -ane 'print join("/", @F);'
LINE: while (defined($_ = <ARGV>)) {
our(@F) = split(/(:)/, $_, 0);
print join('/', @F);
}
-e syntax OK
You'd have to double your @F subscripts from what they'd normally be ($F[2] = 42 instead of $F[1] = 42).
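The same capturing-parens trick works in Python's re.split, and it also sidesteps the "OFS can't be a regex" problem: the original separators are kept, so the line can be rebuilt exactly. A sketch with a hypothetical set_field helper (it assumes the separator regex has no capture groups of its own):

```python
import re

def set_field(line, index, value, fs=r"[:,]"):
    # Wrapping the separator regex in a capture group makes re.split keep each separator,
    # like perl -F'(:)'. Fields sit at even positions and separators at odd positions, so
    # field n (0-based) is parts[2 * n]. Rejoining with "" restores the original separators.
    parts = re.split(f"({fs})", line)
    parts[2 * index] = value
    return "".join(parts)

print(set_field("a:b:c:d", 1, "42"))
print(set_field("a:b,c:d", 1, "42"))
```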

Darnit...
The best I can do is:
echo a:b:c:d | perl -ne 'chomp; $v=":"; @F = split($v); $F[1] = 42; print join($v, @F) . "\n";'
You don't need -F this way, and you only state the colon once. I was hoping there was some way of setting variables on the command line like you can with Awk's -v switch.
For one liners, Perl is usually not as clean as Awk, but I remember using Awk before I knew of Perl and writing 1000+ line Awk scripts.
Trying things like this made people think Awk was either named after the sound someone made when they tried to decipher such a script, or stood for AWKward.

There is no input field separator variable in Perl. You're basically emulating awk by using the -a and -F flags. If you really don't want to hard-code the value, why not just use an environment variable?
$ export SPLIT=":"
$ perl -F$SPLIT -lane 'BEGIN { $, = $ENV{SPLIT}; } ...'

How can I change spaces to underscores and lowercase everything?

I have a text file which contains:
Cycle code
Cycle month
Cycle year
Event type ID
Event ID
Network start time
I want to change this text so that every space is replaced with an underscore (_) and all characters are converted to lowercase, like below:
cycle_code
cycle_month
cycle_year
event_type_id
event_id
network_start_time
How could I accomplish this?

Another Perl method:
perl -pe 'y/A-Z /a-z_/' file

tr alone works:
tr ' [:upper:]' '_[:lower:]' < file
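The tr and y/// answers are one-pass transliterations. The equivalent in Python is a translation table built with str.maketrans (to_snake is an illustrative name):

```python
import string

# One-pass transliteration, like tr or Perl's y///: each space maps to "_" and each
# ASCII uppercase letter maps to its lowercase counterpart; everything else is unchanged.
TABLE = str.maketrans(" " + string.ascii_uppercase, "_" + string.ascii_lowercase)

def to_snake(line):
    return line.translate(TABLE)

for line in ["Cycle code", "Event type ID", "Network start time"]:
    print(to_snake(line))
```

As with tr, two consecutive spaces become two underscores; the split/join answers below collapse them into one.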

Looking into the sed documentation some more and following advice from the comments, the following command should work (\L is a GNU sed extension):
sed -i -e 's/[A-Z]/\L&/g; s/ /_/g' {filehere}

There is a perl tag in your question as well. So:
#!/usr/bin/perl
use strict; use warnings;
while (<DATA>) {
print join('_', split ' ', lc), "\n";
}
__DATA__
Cycle code
Cycle month
Cycle year
Event type ID
Event ID
Network start time
Or:
perl -i.bak -wple '$_ = join("_", split " ", lc)' test.txt

sed "y/ABCDEFGHIJKLMNOPQRSTUVWXYZ /abcdefghijklmnopqrstuvwxyz_/" filename

Just use your shell, if you have Bash 4:
while read -r line
do
  line=${line,,}   # change to lowercase
  echo "${line// /_}"
done < "file" > newfile
mv newfile file
With gawk:
awk '{$0=tolower($0);$1=$1}1' OFS="_" file
With Perl:
perl -ne 's/ +/_/g;print lc' file
With Python:
>>> with open("file") as f:
...     for line in f:
...         print('_'.join(line.split()).lower())
...
