With the SPLIT function, I'm trying to split a string of vertical-bar-delimited names (first name last name) and return a string of names (initial last name), each on a new line. Thanks for the assistance.
--data
Tom Smith | Tim Jones | Mary Adams
--output
T Smith
T Jones
M Adams
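The question doesn't say which tool's SPLIT function is meant, so here is a tool-neutral sketch of the same steps in Scala: split on the bar, trim each name, then keep the first letter of the first name plus the last word. The object name Initials and the method toInitials are hypothetical. Scala's String.split takes a regular expression, so the bar has to be escaped ("\\|" in a string literal).

```scala
object Initials {
  // Turn "Tom Smith | Tim Jones" into "T Smith\nT Jones"
  def toInitials(data: String): String =
    data
      .split("\\|")         // split takes a regex, so the bar is escaped
      .map(_.trim)          // drop the spaces around each name
      .filter(_.nonEmpty)
      .map { fullName =>
        val parts = fullName.split("\\s+")
        s"${parts.head.head} ${parts.last}" // first initial + last word
      }
      .mkString("\n")       // one name per line

  def main(args: Array[String]): Unit =
    println(toInitials("Tom Smith | Tim Jones | Mary Adams"))
}
```

Running main prints the three lines from the expected output above. A name with a middle name keeps only its last word as the surname.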
I have a table users with these columns:
name: varchar(20)
data: jsonb
Records look something like this
adam, {"car": "chevvy", "fruit": "apple"}
john, {"car": "toyota", "fruit": "orange"}
I want to extract all the fields like this
name | type  | value
adam | car   | chevvy
adam | fruit | apple
john | car   | toyota
john | fruit | orange
For your example you can do:
SELECT name, d.key AS type, d.value
FROM users u,
JSONB_EACH_TEXT(u.data) AS d
;
output:
name | type | value
------+-------+--------
adam | car | chevvy
adam | fruit | apple
john | car | toyota
john | fruit | orange
(4 rows)
There are good explanations here: PostgreSQL - jsonb_each
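To make the row expansion concrete, here is a small in-memory Scala model of what the query does: each (name, data) row yields one output row per key/value pair, which is what joining a row with JSONB_EACH_TEXT(u.data) produces. This is only a conceptual sketch, not how PostgreSQL implements it; the object name, the users sample, and the assumption that data is already parsed into a Map are all hypothetical.

```scala
object JsonbEachSketch {
  // Hypothetical stand-in for the users table, with data already parsed into a Map
  val users: Seq[(String, Map[String, String])] = Seq(
    "adam" -> Map("car" -> "chevvy", "fruit" -> "apple"),
    "john" -> Map("car" -> "toyota", "fruit" -> "orange")
  )

  // One output row per (row, key) pair, like a lateral join with jsonb_each_text;
  // keys are sorted here only to make the order deterministic
  def explode(rows: Seq[(String, Map[String, String])]): Seq[(String, String, String)] =
    for {
      (name, data) <- rows
      (key, value) <- data.toSeq.sortBy(_._1)
    } yield (name, key, value)

  def main(args: Array[String]): Unit =
    explode(users).foreach { case (n, t, v) => println(s"$n | $t | $v") }
}
```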
I have a DataFrame and I need to check whether each description contains any of the keywords from a list. If it does, I need to put all of the matching keywords, separated by #, in a new column called foundornot.
My df looks like this:
utid | description
123 | my name is harry and I live in newyork
234 | my neighbour is daniel and he plays hockey
The keyword list is something like list = {harry, daniel, hockey, newyork}.
The output should look like this:
utid | description | foundornot
123 | my name is harry and I live in newyork | harry#newyork
234 | my neighbour is daniel and he plays hockey | daniel#hockey
The real list is quite big (about 20k keywords). If no keyword is found, the column should contain NF.
You can use a UDF that checks, for each row of the description column, which elements of the list it contains, and returns them as a #-separated string, or the string NF if none are found:
import org.apache.spark.sql.functions._
val list = List("harry", "daniel", "hockey", "newyork")
// Collect every keyword contained in the description; return "NF" when none match
val checkUdf = udf { (strCol: String) =>
  val found = if (strCol == null) Nil else list.filter(k => strCol.contains(k))
  if (found.nonEmpty) found.mkString("#") else "NF"
}
df.withColumn("foundornot", checkUdf(col("description"))).show(false)
which should give you
+----+------------------------------------------+-------------+
|utid|description |foundornot |
+----+------------------------------------------+-------------+
|123 |my name is harry and I live in newyork |harry#newyork|
|234 |my neighbour is daniel and he plays hockey|daniel#hockey|
+----+------------------------------------------+-------------+
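Note that String.contains is a substring test, so "harry" would also match inside "harrys". If whole-word matches are wanted, one option (a swapped-in technique, not what the answer above does) is to split each description into words and look each word up in a Set, which also stays fast with around 20k keywords because every lookup is a hash check. The object and method names below are hypothetical, and the match is case-sensitive.

```scala
object KeywordMatch {
  // Whole-word lookup: split the text into words and keep those present in the keyword Set
  def findKeywords(description: String, keywords: Set[String]): String = {
    val found = description
      .split("\\W+")            // split on runs of non-word characters
      .filter(keywords.contains)
      .distinct                 // report each keyword once, in order of appearance
    if (found.isEmpty) "NF" else found.mkString("#")
  }

  def main(args: Array[String]): Unit = {
    val keywords = Set("harry", "daniel", "hockey", "newyork")
    println(findKeywords("my name is harry and I live in newyork", keywords))
    println(findKeywords("my neighbour is daniel and he plays hockey", keywords))
    println(findKeywords("nothing to see here", keywords))
  }
}
```

In Spark, the same function could presumably be wrapped as udf((s: String) => KeywordMatch.findKeywords(s, keywords)); for a large keyword set, broadcasting it is worth considering.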
I have a text file with the below content:
.....
Phone: 123-456-7899, 555-555-5555, 999-333-7890
Names: Bob Jones, Mary Smith, Bob McAlly,
Sally Fields, Tom Hanks, Jeffery Cook,
Betty White, Tom McDonald, Bruce Harris
Address: 1234 Main, 445 Westlake, 3332 Front Street
.....
I am looking to grab all of the names starting from Bob Jones and ending with Bruce Harris from the file. I have this Scala code, but it only gets the first line:
Bob Jones, Mary Smith, Bob McAlly,
Here is the code:
val addressBookRDD = sc.textFile(file)
val myRDD = addressBookRDD.filter(line => line.contains("Names: "))
I don’t know how to deal with the line breaks in the text file, so the code grabs only the first line of names and misses the rest, which are on separate lines. I am looking for this result:
Bob Jones, Mary Smith, Bob McAlly, Sally Fields, Tom Hanks, Jeffery Cook, Betty White, Tom McDonald, Bruce Harris
As I pointed out in a comment, reading a file structured this way is not something Spark is well suited for. If the file is not very large, plain Scala is probably a better fit. Here is a Scala implementation:
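import scala.io.Source
val source = Source.fromFile(file)
// Keep the lines from "Names: " up to (not including) "Address: ";
// toList reads them eagerly so the file can be closed safely
val nameLines =
  try {
    source.getLines()
      .dropWhile(line => !line.startsWith("Names: "))
      .takeWhile(line => !line.startsWith("Address: "))
      .toList
  } finally source.close()
// Remove the "Names: " prefix, join all lines, then split on commas
val names = (nameLines.head.stripPrefix("Names: ") +: nameLines.tail)
  .mkString(",")
  .split(",")
  .map(_.trim)
  .filter(_.nonEmpty)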
Printing the result with names foreach println gives:
Bob Jones
Mary Smith
Bob McAlly
Sally Fields
Tom Hanks
Jeffery Cook
Betty White
Tom McDonald
Bruce Harris
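If the data must stay in Spark, sc.wholeTextFiles(path) returns (path, content) pairs with each whole file as one string, so the line breaks stop being a problem. The sketch below shows only the plain-Scala part: pulling the names block out of such a content string with a regex. The object name, the method name, and the assumption that "Address:" always follows the names are hypothetical.

```scala
object NamesBlock {
  // (?s) lets . match newlines, so the block may span several lines
  private val Block = """(?s)Names:(.*?)Address:""".r

  def extractNames(content: String): Seq[String] =
    Block.findFirstMatchIn(content) match {
      case Some(m) => m.group(1).split(",").map(_.trim).filter(_.nonEmpty).toSeq
      case None    => Seq.empty
    }

  def main(args: Array[String]): Unit = {
    val text =
      """Phone: 123-456-7899, 555-555-5555, 999-333-7890
        |Names: Bob Jones, Mary Smith, Bob McAlly,
        |Sally Fields, Tom Hanks, Jeffery Cook,
        |Betty White, Tom McDonald, Bruce Harris
        |Address: 1234 Main, 445 Westlake, 3332 Front Street""".stripMargin
    extractNames(text).foreach(println)
  }
}
```

With Spark, something like sc.wholeTextFiles(file).flatMap { case (_, content) => NamesBlock.extractNames(content) } would apply it per file; like the answer above, this assumes each file fits comfortably in memory.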
I have a table like this:
ID | Name | Value
---+------+------
1  | Bob  | 4
2  | Mary | 3
3  | Bob  | 5
4  | Jane | 3
5  | Jane | 1
Is there any way to create a calculated field that, when the name is "Bob", sums up all the values that have the name "Bob"?
Thanks in advance!
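Create a calculated field such as:
IF Name = "Bob" THEN Value END
It returns Value for Bob's rows and NULL for everyone else, so summing it gives Bob's total: 4 + 5 = 9.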
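The same logic, written as a small in-memory Scala sketch for illustration: a conditional sum adds only the rows whose name matches, while a group-by gives every name's total at once. The object and method names are hypothetical, and the rows are the sample table from the question.

```scala
object ConditionalSum {
  // (ID, Name, Value) rows from the question's table
  val rows = Seq((1, "Bob", 4), (2, "Mary", 3), (3, "Bob", 5), (4, "Jane", 3), (5, "Jane", 1))

  // Like summing IF Name = name THEN Value END: non-matching rows contribute nothing
  def sumFor(name: String): Int =
    rows.collect { case (_, n, v) if n == name => v }.sum

  // Totals for every name at once (a group-by on Name)
  def totals: Map[String, Int] =
    rows.groupBy(_._2).map { case (n, rs) => n -> rs.map(_._3).sum }

  def main(args: Array[String]): Unit = {
    println(sumFor("Bob"))
    println(totals)
  }
}
```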
When running a query, I need a column number applied to each row so that, when I use the query to create a report in SSRS, I can tell the report which data goes in which column. Example:
Case 1 | Jane Doe | Col 1
Case 1 | John Doe | Col 2
Case 2 | Sally Smith | Col 1 (only name in case)
My current query uses:
DECLARE @NumOfCols int = 2;
And then this to tell it how to separate the columns:
(row_number() over (partition by case_num order by child_first) + @NumOfCols - 1) % @NumOfCols + 1 as DisplayCol
The problem is that when I run the query, my data gets duplicated even if a case has only one name (so only one column is needed). It seems to create both column 1 and column 2 even when there is no data for a second column, like this:
Case 1 | Jane Doe | Col 1
Case 1 | John Doe | Col 2
Case 2 | Sally Smith | Col 1 (only name in case)
Case 2 | Sally Smith | Col 2 (duplicating)
I hope this makes sense. Any ideas on how to eliminate the duplicated data?
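The DisplayCol expression can be checked outside the database with a small Scala sketch that mirrors it: number each case's names from 1 in sorted order (standing in for ORDER BY child_first), then apply (rowNumber + n - 1) % n + 1. The object and method names are hypothetical. A window function such as ROW_NUMBER() in the SELECT list only numbers rows that already exist and never adds rows, so the duplicates probably come from another part of the query (for example a join), which isn't shown here.

```scala
object DisplayCol {
  // rows are (case, name) pairs; returns (case, name, displayCol)
  def assign(rows: Seq[(String, String)], numOfCols: Int): Seq[(String, String, Int)] =
    rows.groupBy(_._1).toSeq.sortBy(_._1).flatMap { case (caseNum, children) =>
      children.map(_._2).sorted.zipWithIndex.map { case (child, idx) =>
        val rowNumber = idx + 1 // ROW_NUMBER() starts at 1 within each partition
        (caseNum, child, (rowNumber + numOfCols - 1) % numOfCols + 1)
      }
    }

  def main(args: Array[String]): Unit = {
    val rows = Seq(("Case 1", "Jane Doe"), ("Case 1", "John Doe"), ("Case 2", "Sally Smith"))
    assign(rows, 2).foreach { case (c, n, col) => println(s"$c | $n | Col $col") }
  }
}
```

Each input row comes out exactly once, with Sally Smith only in column 1, which matches the intended result rather than the duplicated one.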