The following sed command works, but it replaces \N wherever it first appears on a line. I need to change the null marker (\N) to 0 only in the second column.
# cat nulltest.txt
1 abc
1 \N
\N xyz
# sed 's/\\N/0/' nulltest.txt
1 abc
1 0
0 xyz
Expected results :
1 abc
1 0
\N xyz
Data is separated by tab "\t"
kent$ echo "1 abc
1 \N
\N xyz"|awk '{gsub(/\\N/,"0",$2)}1'|column -t
1 abc
1 0
\N xyz
You could use the regex below in your sed expression; it ensures that \N is in the 2nd column.
^([^\t]+\t)\\N(\t)
So your sed expression will look something like this:
sed -r -i 's/^([^\t]+\t)\\N(\t)/\10\2/g' nulltest.txt
Explanation:
^([^\t]+\t): matches the first column (1 in your case) followed by \t; the () around it make it the first group.
\\N: matches a literal \N.
(\t): the tab after the second column, which is the second group. Because of this group, the expression only matches when another column follows the second one.
In the substitution part of the sed command, note the use of \1 and \2, which stand for the first and second groups of the regex; in your case these are 1 followed by \t, and \t, respectively. The command keeps groups one and two and replaces the rest of the matched string with 0.
In my testing I used the input file below:
abcdefgh
3 abc \N \N \N
123 \N \N \N
\N \Nxyz
and the output I get is
abcdefgh
3 abc \N \N \N
123 0 \N \N
\N \Nxyz
Notice that exactly \N from 2nd column is replaced. Even if there are any number of columns with \N this sed expression will replace \N only from 2nd column.
Try this:
sed -r 's/^([^\t]+\t)\\N/\10/' nulltest.txt
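If sed is not a hard requirement, the same idea can be sketched in Python by splitting on tabs. This is only an illustration: the function name and its parameters are made up here, and unlike the sed expressions above it assumes the second column must be exactly \N (the sed versions also match a second column that merely starts with \N).

```python
def zero_null_in_column(line, column=2, null=r"\N", replacement="0"):
    """Replace `null` with `replacement` only in the given 1-based column.

    Assumes tab-separated fields, as stated in the question.
    """
    fields = line.split("\t")
    idx = column - 1
    # Only touch the requested column, and only if it is exactly the null marker
    if idx < len(fields) and fields[idx] == null:
        fields[idx] = replacement
    return "\t".join(fields)


if __name__ == "__main__":
    for row in ["1\tabc", "1\t\\N", "\\N\txyz"]:
        print(zero_null_in_column(row))
```

Splitting on the separator avoids having to anchor a regex at the start of the line, and changing `column` is enough to target a different field.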
Hello sed/regexp experts, I need some help.
I have a file with the contents below and need to replace the tabs inside double quotes with spaces.
Note: \t is a tab.
1 \t 2 \t 3 \t "4 \t 5 \t 6" \t 7
Expected output:
1 \t 2 \t 3 \t "4 5 6" \t 7
I tried matching the quotes and replacing the tabs with spaces, but this replaces the whole content inside the quotes.
sed '/\s/s/".*"/" "/' 1.txt
Thanks
Here is a sed solution using label:
sed -E -e :a -e 's/("[^\t"]*)\t([^"]*")/\1 \2/; ta' file
1 2 3 "4 5 6" 7
However, it is easier to do this with awk, using " as the field delimiter and changing every even-numbered field (these are the fields inside the quotes):
awk '
BEGIN {FS=OFS="\""}
{
for (i=2; i<=NF; i+=2)
gsub(/\t/, " ", $i)
} 1' file
1 2 3 "4 5 6" 7
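The same "split on the quote character" trick carries over to other languages. Here is a small Python sketch of it; the function name is hypothetical, and it assumes quotes are balanced and never escaped.

```python
def tabs_to_spaces_in_quotes(line):
    """Replace tabs with spaces, but only between pairs of double quotes."""
    parts = line.split('"')
    # With 0-based indexing, the odd-numbered parts are the ones inside quotes
    # (awk counts fields from 1, so there they are the even-numbered fields).
    for i in range(1, len(parts), 2):
        parts[i] = parts[i].replace("\t", " ")
    return '"'.join(parts)


if __name__ == "__main__":
    print(tabs_to_spaces_in_quotes('1\t2\t3\t"4\t5\t6"\t7'))
```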
With your shown samples only, please try the following awk code. It was written and tested in GNU awk and uses awk's RT variable to deal with the values between "....".
awk -v RS='"[^"]*"' 'RT{gsub(/\t/,OFS,RT);ORS=RT;print};END{ORS="";print}' Input_file
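With Python, using indexes and a regex (re.sub):
import re
st = '1\t2\t3\t"4\t5\t6"\t7'  # \t marks the tabs from the question
l_ind = st.index('"')   # position of the first double quote
r_ind = st.rindex('"')  # position of the last double quote
new_st = st[:l_ind] + re.sub(r'\s+', r' ', st[l_ind:r_ind]) + st[r_ind:]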
1 2 3 "4 5 6" 7
Another version, using re.sub and re.findall:
re.sub(r'".*?"',re.sub(r'\s+', r' ', re.findall(r'".*?"', st)[0]), st)
1 2 3 "4 5 6" 7
re.findall(r'".*?"', st)[0] - finds the first string in double quotes
re.sub(r'\s+', r' ', - collapses each run of whitespace inside the double-quoted string into a single space
re.sub(r'".*?"', - substitutes the original double-quoted string with the new one.
This might work for you (GNU sed):
sed -E ':a;s/^([^"]*("[^"\t]*"[^"]*)*"[^"\t]*)\t/\1 /;ta' file
Replace the first tab within matched double quotes with a space and repeat until failure.
N.B. This solution caters for lines with multiple matching double quotes.
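For lines with several quoted sections, a Python regex with a replacement callback gives a similar result without a loop. This is a sketch under the assumption that quotes are balanced and unescaped; `QUOTED` and `fix_quoted_tabs` are names chosen for the example.

```python
import re

# A double-quoted span that contains no other double quote
QUOTED = re.compile(r'"[^"]*"')


def fix_quoted_tabs(line):
    """Replace tabs with spaces inside every double-quoted span of `line`."""
    # The callback runs once per quoted span, so each span is rewritten
    # from its own content.
    return QUOTED.sub(lambda m: m.group(0).replace("\t", " "), line)


if __name__ == "__main__":
    print(fix_quoted_tabs('a\t"b\tc"\td\t"e\tf\tg"'))
```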
I want to extract lines that have a particular pattern, in a certain column. For example, in my 'input.txt' file, I have many columns. I want to search the 25th column for 'foobar', and extract only those lines that have 'foobar' in the 25th column. I cannot do:
grep foobar input.txt
because other columns may also have 'foobar', and I don't want those lines. Also:
the 25th column will have 'foobar' as part of a string (i.e. it could be 'foobar ; muller' or 'max ; foobar ; john', or 'tom ; foobar35')
I would NOT want 'tom ; foobar35'
The word in column 25 must be an exact match for 'foobar', but the column also holds other words and ';' separators, so using awk $25=="foobar" is not an option.
In other words, if column 25 had the following lines:
foobar ; muller
max ; foobar ; john
tom ; foobar35
I would want only lines 1 & 2.
How do I use xargs and sed to extract these lines? I am stuck at:
cut -f25 input.txt | grep -nw foobar | xargs -I linenumbers sed ???
thanks!
Do not use xargs and sed; use awk, the other tool common on so many machines, and do this:
awk '{if($25=="foobar"){print NR" "$0}}' input.txt
print NR prints the line number of the current match so the first column of the output will be the line number.
print $0 prints the current line. Change it to print $25 if you only want the matching column. If you only want the output, use this:
awk '{if($25=="foobar"){print $0}}' input.txt
EDIT1 to match extended question:
Use what @shellter and @Jotne suggested, but add string delimiters.
awk -vFPAT="([^ ]*)|('[^']*')" -vOFS=' ' '$25~/foobar/' input.txt
[^ ]* matches all characters that are not a space.
'[^']*' matches everything inside single quotes.
EDIT2 to exclude everything but foobar:
awk -vFPAT="([^ ]*)|('[^']*')" -vOFS=' ' "\$25~/[;' ]foobar[;' ]/" input.txt
[;' ] only allows ;, ' and a space directly before and after foobar.
Tested with this file:
1 "1 ; 1" 4
2 'kom foobar' 33
3 "ll;3" 3
4 '1; foobar' asd
7 '5 ;foobar' 2
7 '5;foobar' 0
2 'kom foobar35' 33
2 'kom ; foobar' 33
2 'foobar ; john' 33
2 'foobar;paul' 33
2 'foobar1;paul' 33
2 'foobarli;paul' 33
2 'afoobar;paul' 33
and this command awk -vFPAT="([^ ]*)|('[^']*')" -vOFS=' ' "\$2~/[;' ]foobar[;' ]/" input.txt
To get the lines where the 25th field is exactly foobar:
awk '$25=="foobar"' input.txt
$25 is the 25th field
== means equal to
"foobar" is the string to compare against
Since no action is specified, awk performs the default action and prints the complete line, same as {print $0}.
Or
awk '$25~/^foobar$/' input.txt
This might work for you (GNU sed):
sed -En 's/\S+/\n&\n/25;s/\n(.*foobar.*)\n/\1/p' file
Surround the 25th field by newlines and pattern match for foobar between newlines.
If you only want to match the word foobar use:
sed -En 's/\S+/\n&\n/25;s/\n(.*\<foobar\>.*)\n/\1/p' file
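The whole-word match on a single column can also be sketched in Python. The example below assumes tab-separated columns (as implied by the cut -f25 in the question) and treats "word" the way the regex \b boundary does; the function name and its parameters are invented for the illustration.

```python
import re


def rows_with_word(lines, word="foobar", column=25, sep="\t"):
    """Yield the lines whose 1-based `column` contains `word` as a whole word."""
    # \b prevents matches inside longer tokens such as foobar35 or afoobar
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) >= column and pattern.search(fields[column - 1]):
            yield line


if __name__ == "__main__":
    sample = [
        "a\tfoobar ; muller\tx",
        "b\tmax ; foobar ; john\tx",
        "c\ttom ; foobar35\tfoobar",
    ]
    # column=2 here only because the sample rows are short
    for row in rows_with_word(sample, column=2):
        print(row)
```

Other columns are never inspected, so a stray foobar elsewhere on the line (as in the third sample row) does not cause a match.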
I would be happy if anyone could suggest a command (a sed or awk one-liner) to divide each line of a file into an equal number of parts. For example, divide each line into 4 parts.
Input:
ATGCATHLMNPHLNTPLML
Output:
ATGCA THLMN PHLNT PLML
This should work using GNU sed:
sed -r 's/(.{4})/\1 /g'
-r is needed to use extended regular expressions
.{4} matches four characters at a time, so the line is cut into chunks of four characters (not into four equal parts)
\1 refers to the captured group, which is surrounded by the parentheses ( ), and the replacement adds a space after this group
g makes sure that the replacement is done as many times as possible on each line
A test; this is the input and output in my terminal:
$ echo "ATGCATHLMNPHLNTPLML" | sed -r 's/(.{4})/\1 /g'
ATGC ATHL MNPH LNTP LML
I suspect awk is not the best tool for this, but:
gawk --posix '{ l = sprintf( "%d", 1 + (length()-1)/4);
gsub( ".{"l"}", "& " ) } 1' input-file
If you have a POSIX-compliant awk you can omit --posix. GNU awk before version 4.0 needs --posix (or --re-interval) to enable the {n} interval expression, and since gawk seems to be the most commonly used implementation I've given the solution in terms of gawk.
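To make the width calculation explicit, here is a rough Python sketch of the same approach: compute the chunk width as the line length divided by the number of parts, rounded up, then cut the line at that width. The function name is hypothetical, and for short lines it may return fewer than the requested number of parts.

```python
def split_into_parts(line, parts=4):
    """Split `line` into chunks of width ceil(len(line) / parts), space-separated."""
    if not line:
        return line
    # Ceiling division; matches 1 + (length - 1) / 4 truncated to an integer
    width = -(-len(line) // parts)
    chunks = [line[i:i + width] for i in range(0, len(line), width)]
    return " ".join(chunks)


if __name__ == "__main__":
    print(split_into_parts("ATGCATHLMNPHLNTPLML"))
```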
This might work for you (GNU sed):
sed 'h;s/./X/g;s/^\(.*\)\1\1\1/\1 \1 \1 \1/;G;s/\n/&&/;:a;/^\n/bb;/^ /s/ \(.*\n.*\)\n\(.\)/\1 \n\2/;ta;s/^.\(.*\n.*\)\n\(.\)/\1\2\n/;ta;:b;s/\n//g' file
Explanation:
h copy the pattern space (PS) to the hold space (HS)
s/./X/g replace every character in the PS with the same non-space character (in this case X)
s/^\(.*\)\1\1\1/\1 \1 \1 \1/ split the line into 4 parts (space separated)
G append a newline followed by the contents of the HS to the PS
s/\n/&&/ double the newline (to be later used as markers)
:a define the loop label a
/^\n/bb if we reach a newline we are done and branch to label b
/^ /s/ \(.*\n.*\)\n\(.\)/\1 \n\2/;ta; if the first character is a space add a space to the real line at this point and repeat
s/^.\(.*\n.*\)\n\(.\)/\1\2\n/;ta any other character just bump along and repeat
:b;s/\n//g all done just remove the markers and print out the result
This works for any length of line; however, if the length is not exactly divisible by 4, the last portion will contain the remainder as well.
perl
perl might be a better choice here:
export cols=4
perl -ne 'chomp; $fw = 1 + int length()/$ENV{cols}; while(/(.{1,$fw})/gm) { print $1 . " " } print "\n"'
This re-calculates field-width for every line.
coreutils
A GNU coreutils alternative, field-width is chosen based on the first line of infile:
cols=4
len=$(( $(head -n1 infile | wc -c) - 1 ))
fw=$(echo "scale=0; 1 + $len / $cols" | bc)
cut_arg=$(paste -d- <(seq 1 $fw $len) <(seq $fw $fw $len) | head -c-1 | tr '\n' ',')
Value of cut_arg is in the above case:
1-5,6-10,11-15,16-
Now cut the line into appropriate chunks:
cut --output-delimiter=' ' -c $cut_arg infile
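How the range list for cut is put together may be easier to see in a few lines of Python. This sketch mirrors the seq/paste pipeline under the same field-width formula (1 + length / cols, integer division); `cut_ranges` is a name made up for the example.

```python
def cut_ranges(length, cols=4):
    """Build a cut -c style range list such as 1-5,6-10,11-15,16-."""
    fw = 1 + length // cols  # same field width as the bc expression
    ranges = []
    for start in range(1, length + 1, fw):
        end = start + fw - 1
        # A range that would run past the end of the line is left open-ended
        ranges.append(f"{start}-{end}" if end <= length else f"{start}-")
    return ",".join(ranges)


if __name__ == "__main__":
    print(cut_ranges(19))
```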
I would like to get the multi-line text between horizontal delimiters and ignore anything before the first and after the last delimiter.
An example would be:
Some text here before any delimiter
----------
Line 1
Line 2
Line 3
Line 4
----------
Line 1
Line 2
Line 3
Line 4
----------
Some text here after last delimiter
And I would like to get
Line 1
Line 2
Line 3
Line 4
Line 1
Line 2
Line 3
Line 4
How do I do this with awk / sed with regex? Thanks.
You can try this (it needs GNU awk, since RT is a gawk extension).
file: a.awk:
BEGIN { RS = "-+" }
{
if ( NR > 1 && RT != "" )
{
print $0
}
}
run: awk -f a.awk data_file
If you can comfortably fit the entire file into memory, and if Perl is acceptable instead of awk or sed,
perl -0777 -pe 's/\A.*?\n-{10}\n//s;
s/(.*\n)-{10}\n.*?\Z/$1/s;
s/\n-{10}\n/\n\n\n/g' file >newfile
The main FAQs here are the -0777 option (slurp mode) and the /s (dot matches newlines) regex flag.
This might work for you:
sed '1,/^--*$/d;:a;$!{/\(^\|\n\)--*$/!N;//!ba;s///p};d' file
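For comparison, the logic is fairly short when written as a plain loop. The Python sketch below assumes a delimiter is any line made up only of dashes and that lines arrive without their trailing newline; the names are invented for the example.

```python
import re

# A delimiter line consists of dashes only
DELIM = re.compile(r"^-+$")


def between_delimiters(lines):
    """Return the blocks of lines found between consecutive delimiter lines."""
    blocks = []
    current = None  # stays None until the first delimiter has been seen
    for line in lines:
        if DELIM.match(line):
            if current is not None:
                blocks.append(current)  # a closing delimiter ends the block
            current = []
        elif current is not None:
            current.append(line)
    # Whatever follows the last delimiter is never appended, so it is ignored
    return blocks


if __name__ == "__main__":
    text = "before\n----------\nLine 1\nLine 2\n----------\nLine 3\n----------\nafter"
    for block in between_delimiters(text.splitlines()):
        print("\n".join(block))
```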
I have table structure:
ous.txt 1452 1793
out.txt 36796 14997
ouw.txt 478 4247
It has 3 columns and lots of rows.
I want to trim ".txt" (the last 4 characters) from column 1, with awk or sed.
I know that chopping the end of a line has been covered many times here, but I don't know how to access the end of the n-th column.
Based on your sample input, this would do it:
sed 's/\.txt//' filename
If I only wanted to operate on the 1st whitespace-delimited column, I'd use awk or just the shell:
while read -r col1 col2 col3; do
printf "%s %s %s\n" "${col1%.txt}" "$col2" "$col3"
done < filename
If you want to remove the last 4 characters of column 1:
awk '{sub(/....$/, "", $1)} 1' filename
If the columns are separated by spaces rather than tabs, this removes the 4 characters before the first space:
sed 's/.... / /' filename
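A Python sketch of the same idea, for cases where only a literal .txt suffix should be removed and nothing else in the column touched. The function name is hypothetical; like the awk version, it rejoins the fields with single spaces.

```python
def strip_suffix_in_first_column(line, suffix=".txt"):
    """Remove `suffix` from the end of the first whitespace-delimited column."""
    fields = line.split()
    # Only strip when the first column really ends with the suffix
    if fields and fields[0].endswith(suffix):
        fields[0] = fields[0][: -len(suffix)]
    return " ".join(fields)


if __name__ == "__main__":
    for row in ["ous.txt 1452 1793", "out.txt 36796 14997", "ouw.txt 478 4247"]:
        print(strip_suffix_in_first_column(row))
```

Checking for the suffix explicitly avoids cutting four arbitrary characters from a first column that does not end in .txt.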