Sphinx NEAR and << issues

I'm looking for a pattern where Word1 is near AND before Word2. So my basic query was like
(Word1 << Word2) (Word1 NEAR/2 Word2)
which will naturally match
bla bla Word1 Word2
bla bla Word1 and Word2
but not match
bla bla Word2 Word1
bla bla Word1 bla bla. Bla bla bla. Word2 bla bla.
The problem is that it DOES match
bla bla Word1. Bla bla bla Word2 Word1.
because that text satisfies both the NEAR and the << conditions, though not in the way I intended: the Word1 that precedes Word2 is not the Word1 that is near it.
Is there any other operator/logic I can use to exclude the match in the last example?

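The closest I can think of is a pair of phrase queries, since a quoted phrase keeps the words in the given order and * stands in for one arbitrary word:
"Word1 Word2" | "Word1 * Word2"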

Related

How to grab text with newlines in a text file?

I have a text file with the below content:
.....
Phone: 123-456-7899, 555-555-5555, 999-333-7890
Names: Bob Jones, Mary Smith, Bob McAlly,
Sally Fields, Tom Hanks, Jeffery Cook,
Betty White, Tom McDonald, Bruce Harris
Address: 1234 Main, 445 Westlake, 3332 Front Street
.....
I am looking to grab all of the names starting from Bob Jones and ending with Bruce Harris from the file. I have this Scala code, but it only gets the first line:
Bob Jones, Mary Smith, Bob McAlly,
Here is the code:
val addressBookRDD = sc.textFile(file);
val myRDD = addressBookRDD.filter(line => line.contains("Names: "))
I don’t know how to deal with the returns or newlines in the text file, so the code only grabs the first line of the names, but not the rest of the names which are separate lines. I am looking for this type of result:
Bob Jones, Mary Smith, Bob McAlly, Sally Fields, Tom Hanks, Jeffery
Cook, Betty White, Tom McDonald, Bruce Harris

As I pointed out in a comment, Spark is not well suited to reading a file structured this way. If the file is not very large, plain Scala is probably a better fit. Here is a Scala implementation:
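val lines = scala.io.Source.fromFile(file).getLines()
// Keep only the lines from "Names: " up to (but not including) "Address: "
val nameLines = lines
  .dropWhile(line => !line.startsWith("Names: "))
  .takeWhile(line => !line.startsWith("Address: "))
  .toSeq
// Drop the 7-character "Names: " prefix, join the lines, then split on commas
val names = (nameLines.head.drop(7) +: nameLines.tail)
  .mkString(",")
  .split(",")
  .map(_.trim)
  .filter(_.nonEmpty)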
Printing names using names foreach println will give you:
Bob Jones
Mary Smith
Bob McAlly
Sally Fields
Tom Hanks
Jeffery Cook
Betty White
Tom McDonald
Bruce Harris

Print a variable as part of a multiline string?

val name = "Cory"
"""
|Hi! My name is " + name + " how are you?
""".stripMargin
The portion + name + doesn't get interpreted as code, but just as text. How can I print the value of a variable inside a multiline string?

If you're on 2.10 or later, you can use string interpolation:
scala> s"""
| |Hi! My name is $name how are you?
| """.stripMargin
res0: String =
"
Hi! My name is Cory how are you?
"
For 2.9 or earlier you're stuck with something like this:
scala> ("""
| |Hi! My name is """ + name + """ how are you?
| """).stripMargin
res1: String =
"
Hi! My name is Cory how are you?
"
Note that there are several flavors of string interpolation in Scala; s"..." is the simplest.

How can I print lines between 2 patterns using sed?

How can I print the lines between pattern1 and pattern2? I don't need the lines between pattern1 and pattern3, though.
Please suggest a solution in either sed or awk.
I have a case like this:
pattern1
blah blah blah
blah blah blah
blah blah blah
pattern2
pattern1
blah blah blah
blah blah blah
pattern3
pattern1
blah blah blah
blah blah blah
pattern2
pattern1
blah blah blah
blah blah blah
pattern3
Desired output:
pattern1
blah blah blah
blah blah blah
blah blah blah
pattern2
pattern1
blah blah blah
blah blah blah
pattern2

With sed:
sed -n '/pattern1/{:l N;/pattern3/b;/pattern2/!bl;p}' input
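Description
/pattern1/{ # Match pattern1 and ...
:l N; # start loop and append the next line to the pattern space
/pattern3/b # give up on this block if pattern3 matches (branch to the end of the script; -n means nothing is printed)
/pattern2/!bl # loop until pattern2 matches
p # print the accumulated lines
} # end of the pattern1 block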
Output
pattern1
blah blah blah
blah blah blah
blah blah blah
pattern2
pattern1
blah blah blah
blah blah blah
pattern2

One method:
$ awk '/pattern1/{s=1;f=1;s=NR}f{p[NR]=$0}/pattern3/{s=0}/pattern2/&&s{f=0;for(i=s;i<=NR;i++)print p[i]}' file
pattern1
blah blah blah
blah blah blah
blah blah blah
pattern2
pattern1
blah blah blah
blah blah blah
pattern2

$ awk '/pattern1/{f=!f;buf=""} f{buf = buf $0 ORS} /pattern2/{if(f)printf "%s",buf; f=0} /pattern3/{f=0}' file
pattern1
blah blah blah
blah blah blah
blah blah blah
pattern2
pattern1
blah blah blah
blah blah blah
pattern2
To possibly help with comprehension, here's the above spread across a few lines and with wordier variable names:
awk '
/pattern1/ {
found=!found
buffer=""
}
found {
buffer = buffer $0 ORS
}
/pattern2/ {
if (found) {
printf "%s",buffer
}
found=0
}
/pattern3/ {
found=0
}
' file

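I got lost among my hold spaces in a pure sed solution, so here is an alternative: reverse the file with tac, delete every block that now runs from pattern3 back to pattern1, and reverse the result again.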
$ tac input | sed '/pattern3/,/pattern1/d' | tac
pattern1
blah blah blah
blah blah blah
blah blah blah
pattern2
pattern1
blah blah blah
blah blah blah
pattern2

Sed: Add a word (or a new member) to a set line in a file

Let's imagine I have a file with some lines, and one of those lines has this structure:
blah blah
YYYY :['aaa','ddd']
blah
XXXX :['member1', 'member2']
blah blah
I want a script that automatically adds member3 to the end of the XXXX array. I tried to use sed, but I do not know how to replace the last bracket of the line starting with XXXX with ", 'member3']", so that the file looks like this:
blah blah
YYYY :['aaa','ddd']
blah
XXXX :['member1', 'member2', 'member3']
blah blah
Any help?

sed "/^XXXX /s/\]\$/, 'member3']/" < input
This applies a substitution only to the lines that start with XXXX followed by a space, replacing the final ] with , 'member3']

A bit unclear, is this what you're after:
$ echo "XXXXX :['member1', 'member2']" | sed "s/]$/, 'member3']/"
XXXXX :['member1', 'member2', 'member3']
Update
$ cat file.txt
bla
bla bla
XXXXX :['member1', 'member2']
bla
bla bla
$ sed "s/]$/, 'member3']/" file.txt
bla
bla bla
XXXXX :['member1', 'member2', 'member3']
bla
bla bla
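
Note that on the sample file from the question, the unanchored substitution above would also extend the YYYY line, because that line ends in ] as well. Here is a minimal sketch that restricts the change to the XXXX line, assuming the key is literally XXXX followed by a space and that the closing bracket is the last character on the line. The function name add_member is made up for illustration, and the member name is assumed not to contain characters that are special to sed, such as & or /.

```shell
#!/bin/sh
# add_member MEMBER: hypothetical helper for illustration, not a sed feature.
# Reads text on standard input and appends MEMBER to the array on the
# line that starts with "XXXX ".
add_member() {
    # The /^XXXX / address limits the substitution to that one line;
    # ]$ matches only a closing bracket at the very end of the line.
    sed "/^XXXX /s/]\$/, '$1']/"
}

# Sample input copied from the question.
result=$(printf '%s\n' \
    "blah blah" \
    "YYYY :['aaa','ddd']" \
    "blah" \
    "XXXX :['member1', 'member2']" \
    "blah blah" | add_member member3)

printf '%s\n' "$result"
```

With this input only the XXXX line should gain the new member; the YYYY line is left alone.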

find lines between two patterns using sed

I have the lines in text.txt as below:
blah blah..
blah abc blah..
blah abc blah
blah blah..
blah blah..
blah blah..
blah efg blah blah
blah blah..
blah abc blah
blah abc blah
blah abc blah
blah abc blah
blah abc blah
blah blah..
blah efg blah blah
blah blah..
blah blah..
For each "efg", I want to output the lines from the last occurrence of "abc" before it through the "efg" line itself. For the above example, I want to output:
blah abc blah
blah blah..
blah blah..
blah blah..
blah efg blah blah
blah abc blah
blah blah..
blah efg blah blah
I know sed can select ranges using two patterns, like:
sed -n '/abc/,/efg/p' text.txt
However, the output then begins at the first occurrence of "abc" instead of the last one:
blah abc blah..
blah abc blah
blah blah..
blah blah..
blah blah..
blah efg blah blah
blah abc blah
blah abc blah
blah abc blah
blah abc blah
blah abc blah
blah blah..
blah efg blah blah
Is there any enhancement I can make to the command line so that the output begins at the last occurrence of "abc"?

This might work for you (GNU sed):
sed -n '/\<abc\>/,/\<efg\>/{/\<abc\>/{h;d};H;/\<efg\>/{x;p}}' file
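
As far as I can tell, the one-liner works like this: inside each abc..efg range, an "abc" line overwrites the hold space (h) and is dropped from the pattern space (d), every other line is appended to the hold space (H), and on the "efg" line the hold space is swapped in (x) and printed (p). The \< and \> anchors are GNU word boundaries. The same idea can be sketched in awk for readers who find that easier to follow. This is only an illustration: it assumes a plain substring match on abc and efg is good enough (no word boundaries) and that no line contains both. The function name last_block is made up.

```shell
#!/bin/sh
# last_block: hypothetical helper name; reads text on standard input and
# prints, for each "efg" line, the block starting at the last "abc" before it.
last_block() {
    awk '
        /abc/ { buf = $0; keep = 1; next }      # a newer "abc" restarts the buffer
        keep  { buf = buf ORS $0 }              # collect the lines after that "abc"
        keep && /efg/ { print buf; keep = 0 }   # "efg" closes the block: print it
    '
}

# Sample input copied from the question.
result=$(printf '%s\n' \
    "blah blah.." \
    "blah abc blah.." \
    "blah abc blah" \
    "blah blah.." \
    "blah blah.." \
    "blah blah.." \
    "blah efg blah blah" \
    "blah blah.." \
    "blah abc blah" \
    "blah abc blah" \
    "blah abc blah" \
    "blah abc blah" \
    "blah abc blah" \
    "blah blah.." \
    "blah efg blah blah" \
    "blah blah.." \
    "blah blah.." | last_block)

printf '%s\n' "$result"
```

Because every "abc" line resets the buffer, only the last one before each "efg" survives, which is the behavior the question asks for.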
