I have the lines in text.txt as below:
blah blah..
blah abc blah..
blah abc blah
blah blah..
blah blah..
blah blah..
blah efg blah blah
blah blah..
blah abc blah
blah abc blah
blah abc blah
blah abc blah
blah abc blah
blah blah..
blah efg blah blah
blah blah..
blah blah..
I want to output the lines from each last occurrence of "abc" before an "efg" up to and including that "efg". For the example above, I want to output:
blah abc blah
blah blah..
blah blah..
blah blah..
blah efg blah blah
blah abc blah
blah blah..
blah efg blah blah
I know sed can select ranges using two patterns, like:
sed -n '/abc/,/efg/p' text.txt
However, the output begins at the first occurrence of "abc" instead of the last one; the output is as follows:
blah abc blah..
blah abc blah
blah blah..
blah blah..
blah blah..
blah efg blah blah
blah abc blah
blah abc blah
blah abc blah
blah abc blah
blah abc blah
blah blah..
blah efg blah blah
Is there any enhancement I can make to the command so that the output begins at the last occurrence of "abc"?
This might work for you (GNU sed):
sed -n '/\<abc\>/,/\<efg\>/{/\<abc\>/{h;d};H;/\<efg\>/{x;p}}' file
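Broken down (the hold space accumulates the current block, and each new abc restarts it):
/\<abc\>/,/\<efg\>/{   # within each abc..efg range
/\<abc\>/{h;d}         # an abc line overwrites the hold space, so only the last abc before efg survives
H                      # append every other line (including the efg line) to the hold space
/\<efg\>/{x;p}         # on efg, swap the collected block into the pattern space and print it
}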
Related
I need to create pivoted datetime columns in such a way that while the Order column value keeps increasing, the lowest datetime is taken as the start time and the highest as the end time, but once the counter resets, a new row with its own start and end time should be created (see the sketch after the expected output below).
Sample data
computername currentuser datetime order
abc xyz 7/5/2022 20:04:51 1
abc xyz 7/5/2022 20:04:51 1
abc xyz 7/6/2022 6:45:51 1
abc xyz 7/6/2022 6:45:51 1
abc xyz 7/6/2022 7:06:45 2
abc xyz 7/6/2022 7:06:45 3
abc xyz 7/6/2022 7:07:00 4
abc xyz 7/6/2022 7:59:12 2
abc xyz 7/6/2022 7:59:12 3
abc xyz 7/6/2022 7:59:19 4
abc xyz 7/6/2022 7:59:21 5
abc xyz 7/6/2022 21:28:19 1
abc xyz 7/6/2022 21:28:19 1
abc xyz 7/6/2022 21:28:24 2
abc xyz 7/6/2022 21:28:24 3
abc xyz 7/6/2022 21:28:24 4
Expected Output
computername currentuser starttime endtime
abc xyz 7/5/2022 20:04:51 7/5/2022 20:04:51
abc xyz 7/6/2022 6:45:51 7/6/2022 7:07:00
abc xyz 7/6/2022 7:59:12 7/6/2022 7:59:21
abc xyz 7/6/2022 21:28:19 7/6/2022 21:28:24
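A minimal awk sketch of the grouping described above, assuming the sample is plain whitespace-separated text with a header row as shown, and that a "reset" means the order value drops or repeats with a different datetime (which is what the expected output suggests); data.txt is a placeholder filename. Traced against the sample, it reproduces the four expected rows.
awk '
NR == 1 { print "computername currentuser starttime endtime"; next }   # print the new header, skip the input header
{
    dt = $3 " " $4                                    # date + time form the datetime
    if (NR > 2 && ($5 < ord || ($5 == ord && dt != prevdt))) {
        print comp, user, start, end                  # the counter reset: flush the finished group
        start = dt
    }
    if (start == "") start = dt                       # first data row opens the first group
    comp = $1; user = $2; end = dt; ord = $5; prevdt = dt
}
END { if (start != "") print comp, user, start, end } # flush the last group
' data.txt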
I have a CSV with around 15 columns. I would like to:
skip the first 2 lines and use a custom schema
remove the double quotes from the row values
The CSV is as below:
Header1 blah blah
Header2 blah blah
Name1;"1,456";"City1";"3";"pet"
Name2;"3,450";"City2";"4";"not pet"
delimiter = ";"
salesDF = spark.read.format("csv") \
    .option("quote", "") \
    .option("sep", delimiter) \
    .load("sales_2018.csv")
salesDF = salesDF.replace("\"","")
I tried the above to remove the quotes from the CSV. The delimiter works, but the quotes are not removed.
The results are as below: it has only added quotes rather than removing them.
Header1 blah blah
Header2 blah blah
"Name1;""1,456"";""City1"";""3"";""pet""
"Name2;""3,450"";""City2"";""4"";""not pet""
My idea is to remove the quotes and then remove the first 2 lines of the dataframe so I can add my custom schema. Thanks.
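One possible workaround, sketched here with sed rather than Spark options, is to pre-clean the raw file before loading it; this assumes the quotes never need to be preserved inside values and that preprocessing the file is acceptable:
sed '1,2d; s/"//g' sales_2018.csv > sales_2018_clean.csv   # drop the first 2 lines and strip every double quote
The cleaned file can then be read with the ";" separator and the custom schema.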
My input is split across multiple lines. I want the output on a single line.
For example, the input is:
1|23|ABC
DEF
GHI
newline
newline
2|24|PQR
STU
LMN
XYZ
newline
Output:
1|23|ABC DEF GHI
2|24|PQR STU LMN XYZ
Well, here is one for awk. An empty RS puts awk into paragraph mode, so each blank-line-separated block becomes one record; -F"\n" makes each line a field, $1=$1 rebuilds the record with the default output separator (a space), and the final 1 prints it:
$ awk -v RS="" -F"\n" '{$1=$1}1' file
Output:
1|23|ABC DEF GHI
2|24|PQR STU LMN XYZ
How can I print the lines between pattern1 and pattern2? I don't need the lines between pattern1 and pattern3, though.
Please suggest a solution in either sed or awk.
I have a case like this:
pattern1
blah blah blah
blah blah blah
blah blah blah
pattern2
pattern1
blah blah blah
blah blah blah
pattern3
pattern1
blah blah blah
blah blah blah
pattern2
pattern1
blah blah blah
blah blah blah
pattern3
Desired output:
pattern1
blah blah blah
blah blah blah
blah blah blah
pattern2
pattern1
blah blah blah
blah blah blah
pattern2
With sed:
sed -n '/pattern1/{:l N;/pattern3/b;/pattern2/!bl;p}' input
Description
/pattern1/{     # on a line matching pattern1 ...
:l N            # start a loop: append the next input line to the pattern space
/pattern3/b     # abandon the block (print nothing) if pattern3 appears
/pattern2/!bl   # keep looping until pattern2 appears
p}              # print the collected block
Output
pattern1
blah blah blah
blah blah blah
blah blah blah
pattern2
pattern1
blah blah blah
blah blah blah
pattern2
One method:
$ awk '/pattern1/{f=1;s=NR}f{p[NR]=$0}/pattern3/{s=0}/pattern2/&&s{f=0;for(i=s;i<=NR;i++)print p[i]}' file
pattern1
blah blah blah
blah blah blah
blah blah blah
pattern2
pattern1
blah blah blah
blah blah blah
pattern2
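For readability, here is the same one-liner spread across a few lines with comments (no change in behavior):
awk '
/pattern1/ {            # a block starts: set the flag and remember the starting line number
    f = 1
    s = NR
}
f {                     # while the flag is set, buffer each line by its line number
    p[NR] = $0
}
/pattern3/ {            # pattern3 invalidates the block, so it will never be printed
    s = 0
}
/pattern2/ && s {       # pattern2 ends a still-valid block: print it and clear the flag
    f = 0
    for (i = s; i <= NR; i++)
        print p[i]
}
' file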
$ awk '/pattern1/{f=!f;buf=""} f{buf = buf $0 ORS} /pattern2/{if(f)printf "%s",buf; f=0} /pattern3/{f=0}' file
pattern1
blah blah blah
blah blah blah
blah blah blah
pattern2
pattern1
blah blah blah
blah blah blah
pattern2
To possibly help with comprehension, here's the above spread across a few lines and with wordier variable names:
awk '
/pattern1/ {
found=!found
buffer=""
}
found {
buffer = buffer $0 ORS
}
/pattern2/ {
if (found) {
printf "%s",buffer
}
found=0
}
/pattern3/ {
found=0
}
' file
I got lost among my hold spaces in a pure sed solution, so here is an alternative: reverse the file with tac, delete the (now forward) pattern3-to-pattern1 ranges, and reverse it back.
$ tac input | sed '/pattern3/,/pattern1/d' | tac
pattern1
blah blah blah
blah blah blah
blah blah blah
pattern2
pattern1
blah blah blah
blah blah blah
pattern2
Let's imagine I have a file with some lines, including lines with this structure:
blah blah
YYYY :['aaa','ddd']
blah
XXXX :['member1', 'member2']
blah blah
I want a script that automatically adds member3 to the end of the XXXX array. I tried sed, but I do not know how to replace the last bracket of the lines starting with XXXX with ", 'member3']", so that it looks like this:
blah blah
YYYY :['aaa','ddd']
blah
XXXX :['member1', 'member2', 'member3']
blah blah
Any help?
sed "/^XXXX /s/\]\$/, 'member3']/" < input
This applies a substitution to the lines that start with XXXX, replacing the final ] with , 'member3'] (note the leading comma and space).
A bit unclear, but is this what you're after:
$ echo "XXXXX :['member1', 'member2']" | sed "s/]$/, 'member3']/"
XXXXX :['member1', 'member2', 'member3']
Update
$ cat file.txt
bla
bla bla
XXXXX :['member1', 'member2']
bla
bla bla
$ sed "s/]$/, 'member3']/" file.txt
bla
bla bla
XXXXX :['member1', 'member2', 'member3']
bla
bla bla