I'm looking for a pattern where Word1 is near AND before Word2. My basic query was:
(Word1 << Word2) (Word1 NEAR/2 Word2)
Which will naturally match
bla bla Word1 Word2
bla bla Word1 and Word2
but not match
bla bla Word2 Word1
bla bla Word1 bla bla. Bla bla bla. Word2 bla bla.
The problem is it DOES match
bla bla Word1. Bla bla bla Word2 Word1.
It matches both the NEAR and the << conditions, though not in the way I intended.
Is there any other operator/logic I can use to negate the match in the last example?
The closest I can think of is
"Word1 word2" | "word1 * word2"
I have a text file with the below content:
.....
Phone: 123-456-7899, 555-555-5555, 999-333-7890
Names: Bob Jones, Mary Smith, Bob McAlly,
Sally Fields, Tom Hanks, Jeffery Cook,
Betty White, Tom McDonald, Bruce Harris
Address: 1234 Main, 445 Westlake, 3332 Front Street
.....
I am looking to grab all of the names starting from Bob Jones and ending with Bruce Harris from the file. I have this Scala code, but it only gets the first line:
Bob Jones, Mary Smith, Bob McAlly,
Here is the code:
val addressBookRDD = sc.textFile(file);
val myRDD = addressBookRDD.filter(line => line.contains("Names: "))
I don’t know how to deal with the returns or newlines in the text file, so the code only grabs the first line of the names, but not the rest of the names which are separate lines. I am looking for this type of result:
Bob Jones, Mary Smith, Bob McAlly, Sally Fields, Tom Hanks, Jeffery
Cook, Betty White, Tom McDonald, Bruce Harris
As I pointed out in a comment, Spark is not really well suited to reading a file structured this way. If the file is not very large, plain Scala would probably be a better fit. Here is a Scala implementation:
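val lines = scala.io.Source.fromFile(file).getLines
// Keep only the lines from "Names: " up to (but not including) "Address: ".
val nameLines = lines
  .dropWhile(line => !line.startsWith("Names: "))
  .takeWhile(line => !line.startsWith("Address: "))
  .toSeq
// Strip the 7-character "Names: " prefix, then split everything on commas.
val names = (nameLines.head.drop(7) +: nameLines.tail)
  .mkString(",")
  .split(",")
  .map(_.trim)
  .filter(_.nonEmpty)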
Printing names using names foreach println will give you:
Bob Jones
Mary Smith
Bob McAlly
Sally Fields
Tom Hanks
Jeffery Cook
Betty White
Tom McDonald
Bruce Harris
val name = "Cory"
"""
|Hi! My name is " + name + " how are you?
""".stripMargin
The portion + name + doesn't get interpreted as code, but just as text. How can I print the value of a variable inside a multiline string?
If you're on 2.10 or later, you can use string interpolation:
scala> s"""
| |Hi! My name is $name how are you?
| """.stripMargin
res0: String =
"
Hi! My name is Cory how are you?
"
For 2.9 or earlier you're stuck with something like this:
scala> ("""
| |Hi! My name is """ + name + """ how are you?
| """).stripMargin
res1: String =
"
Hi! My name is Cory how are you?
"
Note that Scala has several flavors of string interpolation (s, f and raw); s"..." is the simplest.
How can I print the lines between pattern1 and pattern2? I don't need the lines between pattern1 and pattern3, though.
Please suggest a solution in either sed or awk.
I have a case like this:
pattern1
blah blah blah
blah blah blah
blah blah blah
pattern2
pattern1
blah blah blah
blah blah blah
pattern3
pattern1
blah blah blah
blah blah blah
pattern2
pattern1
blah blah blah
blah blah blah
pattern3
Desired output:
pattern1
blah blah blah
blah blah blah
blah blah blah
pattern2
pattern1
blah blah blah
blah blah blah
pattern2
With sed:
sed -n '/pattern1/{:l N;/pattern3/b;/pattern2/!bl;p}' input
Description
/pattern1/{      # Match pattern1 and ...
  :l N;          # start loop and append the next line
  /pattern3/b    # pattern3 seen: skip to the end of the script without printing
  /pattern2/!bl  # loop until pattern2 matches
  p              # print the collected lines
}
Output
pattern1
blah blah blah
blah blah blah
blah blah blah
pattern2
pattern1
blah blah blah
blah blah blah
pattern2
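A note on portability: the one-liner above puts the label :l and further commands on the same line. As far as I know, POSIX expects a label to be terminated by a newline, so other sed implementations may not accept that form. Below is a minimal sketch of the same loop split into -e fragments. The temporary file and its shortened sample content are assumptions made up for illustration, not the question's real input.

```shell
# Shortened stand-in for the question's input (hypothetical temp file).
input=$(mktemp)
cat > "$input" <<'EOF'
pattern1
keep me
pattern2
pattern1
drop me
pattern3
pattern1
keep me too
pattern2
EOF

# Same loop as the one-liner, one command per -e fragment so that each
# label and branch is terminated by the end of its fragment.
result=$(sed -n \
  -e '/pattern1/{' \
  -e ':l' \
  -e 'N' \
  -e '/pattern3/b' \
  -e '/pattern2/!bl' \
  -e 'p' \
  -e '}' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

Only the two blocks that end in pattern2 should be printed; the block ending in pattern3 is dropped because b jumps to the end of the script and -n suppresses the automatic print.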
One method:
$ awk '/pattern1/{f=1;s=NR}f{p[NR]=$0}/pattern3/{s=0}/pattern2/&&s{f=0;for(i=s;i<=NR;i++)print p[i]}' file
pattern1
blah blah blah
blah blah blah
blah blah blah
pattern2
pattern1
blah blah blah
blah blah blah
pattern2
$ awk '/pattern1/{f=!f;buf=""} f{buf = buf $0 ORS} /pattern2/{if(f)printf "%s",buf; f=0} /pattern3/{f=0}' file
pattern1
blah blah blah
blah blah blah
blah blah blah
pattern2
pattern1
blah blah blah
blah blah blah
pattern2
To possibly help with comprehension, here's the above spread across a few lines and with wordier variable names:
awk '
/pattern1/ {
found=!found
buffer=""
}
found {
buffer = buffer $0 ORS
}
/pattern2/ {
if (found) {
printf "%s",buffer
}
found=0
}
/pattern3/ {
found=0
}
' file
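If the three markers change from file to file, the same buffering idea can be driven by awk variables instead of hard-coded regexes. This is only a sketch: the variable names start, keep and drop and the temporary sample file are made up for illustration. It also deliberately sets found to 1 rather than toggling it, so a second pattern1 before any terminator simply restarts the buffer, which differs slightly from the script above.

```shell
# Shortened stand-in for the question's input (hypothetical temp file).
input=$(mktemp)
cat > "$input" <<'EOF'
pattern1
keep me
pattern2
pattern1
drop me
pattern3
pattern1
keep me too
pattern2
EOF

# start/keep/drop are hypothetical variable names holding the three regexes.
result=$(awk -v start='pattern1' -v keep='pattern2' -v drop='pattern3' '
  $0 ~ start { found = 1; buffer = "" }                      # a new block begins
  found      { buffer = buffer $0 ORS }                      # collect the block
  $0 ~ keep  { if (found) printf "%s", buffer; found = 0 }   # wanted ending: print
  $0 ~ drop  { found = 0 }                                   # unwanted ending: discard
' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

Passing the regexes with -v keeps the program text unchanged when the markers change; note that awk processes backslash escapes in -v values, so regexes containing backslashes need extra care.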
I got lost among my hold-spaces in a pure sed solution; so here is an alternative
$ tac input | sed '/pattern3/,/pattern1/d' | tac
pattern1
blah blah blah
blah blah blah
blah blah blah
pattern2
pattern1
blah blah blah
blah blah blah
pattern2
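Why this works: once the file is reversed, every unwanted block starts with pattern3 and ends with pattern1, so a plain range delete removes it, while the wanted blocks start with pattern2 and are left alone. tac comes with GNU coreutils; on a system without it, an awk reversal can stand in. The sketch below assumes a made-up helper named reverse and a shortened sample file, neither of which is from the original answer.

```shell
# Hypothetical helper: print stdin with the line order reversed (stand-in for tac).
reverse() {
  awk '{ line[NR] = $0 } END { for (i = NR; i > 0; i--) print line[i] }'
}

# Shortened stand-in for the question's input (hypothetical temp file).
input=$(mktemp)
cat > "$input" <<'EOF'
pattern1
keep me
pattern2
pattern1
drop me
pattern3
pattern1
keep me too
pattern2
EOF

# Reverse, delete every pattern3..pattern1 range, then restore the order.
result=$(reverse < "$input" | sed '/pattern3/,/pattern1/d' | reverse)

printf '%s\n' "$result"
rm -f "$input"
```

Unlike the buffering solutions, this approach keeps any lines that sit outside a pattern1 block, since it only deletes and never selects.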
Let's imagine I have a file with some lines, one of which has this structure:
blah blah
YYYY :['aaa','ddd']
blah
XXXX :['member1', 'member2']
blah blah
I want a script that adds member3 to the end of the XXXX array automatically. I tried to use sed, but I do not know how to replace the last bracket of the line starting with XXXX with ", 'member3']", so that it looks like this:
blah blah
YYYY :['aaa','ddd']
blah
XXXX :['member1', 'member2', 'member3']
blah blah
Any help?
sed "/^XXXX /s/\]\$/, 'member3']/" < input
This applies a substitution only to the lines that start with "XXXX ", replacing the final ] with , 'member3']
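To reuse the substitution above for other keys or members, it could be wrapped in a small shell function. This is a sketch only: add_member is a made-up name, and it assumes that neither the key nor the member contains characters that are special to sed (such as / or &) or a single quote.

```shell
# Hypothetical helper: append a quoted member to the list on the line that
# starts with "<key> ". Usage: add_member KEY MEMBER FILE
# Assumes KEY and MEMBER contain nothing special to sed and no single quote.
add_member() {
  sed "/^$1 /s/\]\$/, '$2']/" "$3"
}

# Sample file from the question, written to a hypothetical temp file.
input=$(mktemp)
cat > "$input" <<'EOF'
blah blah
YYYY :['aaa','ddd']
blah
XXXX :['member1', 'member2']
blah blah
EOF

result=$(add_member XXXX member3 "$input")

printf '%s\n' "$result"
rm -f "$input"
```

The YYYY line also ends in ], but it is left untouched because the address /^XXXX / restricts the substitution to the matching line.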
It's a bit unclear, but is this what you're after?
$ echo "XXXXX :['member1', 'member2']" | sed "s/]$/, 'member3']/"
XXXXX :['member1', 'member2', 'member3']
Update
$ cat file.txt
bla
bla bla
XXXXX :['member1', 'member2']
bla
bla bla
$ sed "s/]$/, 'member3']/" file.txt
bla
bla bla
XXXXX :['member1', 'member2', 'member3']
bla
bla bla
I have a file test.txt with the following lines:
blah blah..
blah abc blah..
blah abc blah
blah blah..
blah blah..
blah blah..
blah efg blah blah
blah blah..
blah abc blah
blah abc blah
blah abc blah
blah abc blah
blah abc blah
blah blah..
blah efg blah blah
blah blah..
blah blah..
For each "efg" line, I want to output the lines from the last occurrence of "abc" before it up to and including that "efg" line. For the above example, I want to output:
blah abc blah
blah blah..
blah blah..
blah blah..
blah efg blah blah
blah abc blah
blah blah..
blah efg blah blah
I know sed can select ranges using two patterns, like:
sed -n '/abc/,/efg/p' test.txt
However, the output begins from the first occurrence of "abc" instead of the last one. The output is as follows:
blah abc blah..
blah abc blah
blah blah..
blah blah..
blah blah..
blah efg blah blah
blah abc blah
blah abc blah
blah abc blah
blah abc blah
blah abc blah
blah blah..
blah efg blah blah
Is there any enhancement I can make to the command line so that the output begins from the last occurrence of "abc"?
This might work for you (GNU sed):
sed -n '/\<abc\>/,/\<efg\>/{/\<abc\>/{h;d};H;/\<efg\>/{x;p}}' file
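How the command above works, as I read it: inside each abc..efg range, every abc line overwrites the hold space (h), so earlier abc lines are forgotten; every other line is appended to it (H); and when the efg line arrives, the hold space is swapped in (x) and printed. The same idea can be sketched in awk. The names buf and have and the shortened sample file are assumptions for illustration, and this sketch matches abc and efg as plain substrings, whereas the sed version uses the GNU word-boundary markers \< and \>.

```shell
# Shortened stand-in for test.txt (hypothetical temp file).
input=$(mktemp)
cat > "$input" <<'EOF'
x
one abc
two abc
y
z efg
w
three abc
four abc
q
r efg
t
EOF

result=$(awk '
  /abc/         { buf = $0; have = 1; next }   # restart the buffer at every abc line
  have          { buf = buf ORS $0 }           # otherwise extend the current buffer
  have && /efg/ { print buf; have = 0 }        # efg closes the block: print and reset
' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

Each printed block should start at the last abc line before its efg line, so the earlier abc lines in a run are not printed.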