How do you update multiple columns using Slick Lifted Embedding? - scala

How do you update multiple columns using Slick Lifted Embedding? The documentation doesn't say much. I expected it to be something like this:
Query(AbilitiesTable)
  .filter((ab: AbilitiesTable.type) => ab.id === ability_id)
  .map((ab: AbilitiesTable.type) => (ab.verb, ab.subject))
  .update("edit", "doc")

With recent versions of Slick, this way of writing it works:
Users.filter(_.id === filterId)
.map(x => (x.name, x.age))
.update(("john", 99))
Be careful to remember the extra parentheses when updating more than one column; otherwise the arguments may be auto-tupled and you can get a compiler warning.
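For context, here is a minimal self-contained sketch of that pattern against Slick 3.x. The Users table definition, the H2 profile, and the filter value are illustrative assumptions, not the asker's actual schema:
import slick.jdbc.H2Profile.api._

// Hypothetical table; replace with your own schema
class Users(tag: Tag) extends Table[(Long, String, Int)](tag, "users") {
  def id   = column[Long]("id", O.PrimaryKey)
  def name = column[String]("name")
  def age  = column[Int]("age")
  def *    = (id, name, age)
}
val users = TableQuery[Users]

// Map to exactly the columns you want to change, then pass a matching tuple
val updateAction = users
  .filter(_.id === 42L)
  .map(u => (u.name, u.age))
  .update(("john", 99))

// db.run(updateAction) yields a Future[Int] with the number of rows updated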

I figured it out. It should be like this:
val query = Query(AbilitiesTable)
  .filter(_.id === ability_id)
  .map(ab => ab.verb ~ ab.context)
query.update(("", ""))
Typesafe, why is your documentation so bad? I have to Google pretty much every silly thing or dig through unit tests for hours. Please improve it. Thanks.

Related

How to remove a header by using the filter function in Spark?

I want to remove the header from a file. But since the file will be split into partitions, I can't just drop the first item, so I was using a filter function to handle it. Below is the code I am using:
val noHeaderRDD = baseRDD.filter(line => !line.contains("REPORTDATETIME"))
The error I am getting says "error: not found: value line". What could be the issue with this code?
I don't think anybody pointed out the obvious, namely that line.contains is also possible:
val noHeaderRDD = baseRDD.filter(line => !(line contains("REPORTDATETIME")))
You were nearly there, just a syntax issue, but that is significant of course!
Using textFile as below:
val rdd = sc.textFile(<<path>>)
rdd.filter(x => !x.startsWith(<<"Header Text">>))
Or
In Spark 2.0:
spark.read.option("header","true").csv("filePath")
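To tie the pieces together, here is a minimal runnable sketch of the filter approach; the local master, path, and output step are placeholder assumptions, while the header token comes from the question:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("drop-header").getOrCreate()
val sc = spark.sparkContext

// Placeholder path
val baseRDD = sc.textFile("/path/to/report.txt")

// Every line is tested, so this works no matter how the file is partitioned
val noHeaderRDD = baseRDD.filter(line => !line.contains("REPORTDATETIME"))

noHeaderRDD.take(5).foreach(println)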

Performance issue with fluent query in EF vs SharpRepository

I was having some performance issues using SharpRepository, and after playing around with the SQL Query Profiler I found the reason.
With EF I can do stuff like this:
var books = db.Books.Where(item => item.Year == 2016);
if (!string.IsNullOrEmpty(search_author))
    books = books.Where(item => item.Author.Contains(search_author));
return books.ToList();
EF will not really do anything until books is consumed (on the last line), and only then will it compile a query that selects just the small set of rows matching year and author from the database.
But SharpRepository evaluates books immediately, so this:
var books = book_repo.Books.FindAll(item => item.Year == 2016);
if (!string.IsNullOrEmpty(search_author))
    books = books.Where(item => item.Author.Contains(search_author));
return books.ToList();
will compile a query like "SELECT * FROM Books WHERE Year = 2016" at the first line and fetch ALL those records from the database! Then, at the second line, it searches for the author in C# code... That behaviour can make a major difference in performance on large databases, and it explains why my queries timed out...
I tried using repo.GetAll().Where() instead of repo.FindAll(), but it behaved the same way.
Am I misunderstanding something here, and is there a way around this issue?
You can use repo.AsQueryable(), but by doing that you lose some of the functionality that SharpRepository provides, like caching or any aspects/hooks you are using. It basically takes you out of the generic repo layer and lets you use the underlying LINQ provider. That has its benefits for sure, but in your case you can just build the predicate conditionally and pass it in to the FindAll method.
You can do this by building an Expression predicate or by using Specifications. Working with LINQ expressions does not always feel clean, but you can do it. Or you can use the Specification pattern built into SharpRepository:
ISpecification<Book> spec = new Specification<Book>(x => x.Year == 2016);
if (!string.IsNullOrEmpty(search_author))
{
    spec = spec.And(x => x.Author.Contains(search_author));
}
return repo.FindAll(spec);
For more info on Specifications you can look here: https://github.com/SharpRepository/SharpRepository/blob/develop/SharpRepository.Samples/HowToUseSpecifications.cs
Ivan Stoev provided this answer:
"The problem is that most of the repository methods return IEnumerable. Try repo.AsQueryable()."

Using an Oracle LIKE-style feature between a Spark DataFrame and a list of words - Scala

My requirement is similar to the one in:
LINK
Instead of a direct match, I need a LIKE-type match against a list, i.e. I want to LIKE-match COMMENTS against a list:
ID,COMMENTS
1,bad is he
2,hell thats good
3,sick !thats hell
4,That was good
List = ('good','horrible','hell')
I want to get output like
ID, COMMENTS,MATCHED_WORD,NUM_OF_MATCHES
1,bad is he,,
2,hell thats good,(hell,good),2
3,sick !thats hell,hell,1
4,That was good,good,1
In simpler terms I need the following (rlike doesn't match values from a list; it expects one single string, as far as I know):
file.select($"COMMENTS",$"ID").filter($"COMMENTS".rlike(List_ :_*)).show()
I tried isin; that works, but it matches whole words only.
file.select($"COMMENTS",$"ID").filter($"COMMENTS".isin(List_ :_*)).show()
Kindly help, or please redirect me to any relevant links, as I have already tried a lot of searching!
With simple words I'd use a regex alternation:
val xs = Seq("good", "horrible", "hell")
df.filter($"COMMENTS".rlike(xs.mkString("|")))
otherwise:
import org.apache.spark.sql.functions.lit
df.filter(xs.foldLeft(lit(false))((acc, x) => acc || $"COMMENTS".rlike(x)))
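If you also need the MATCHED_WORD and NUM_OF_MATCHES columns from the expected output, one option is a small UDF. This is a minimal sketch in which the SparkSession setup and sample DataFrame are assumptions reconstructed from the question (note it reports an empty array and 0 rather than blanks for non-matching rows):
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").appName("like-list").getOrCreate()
import spark.implicits._

val df = Seq(
  (1, "bad is he"),
  (2, "hell thats good"),
  (3, "sick !thats hell"),
  (4, "That was good")
).toDF("ID", "COMMENTS")

val words = Seq("good", "horrible", "hell")

// Collect the subset of words that occur as substrings of each comment
val matchedWords = udf((comment: String) => words.filter(w => comment.contains(w)))

df.withColumn("MATCHED_WORD", matchedWords($"COMMENTS"))
  .withColumn("NUM_OF_MATCHES", size($"MATCHED_WORD"))
  .show(false)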

Is there a way to filter a field not containing something in a Spark DataFrame using Scala?

Hopefully I'm stupid and this will be easy.
I have a dataframe containing the columns 'url' and 'referrer'.
I want to extract all the referrers that contain the domains 'www.mydomain.com' and 'mydomain.co'.
I can use
val filteredDf = unfilteredDf.filter(($"referrer").contains("www.mydomain."))
However, for some reason this also pulls out www.google.co.uk search URLs that contain my web domain. Is there a way, using Scala in Spark, to filter out anything with google in it while keeping the correct results?
Thanks
Dean
You can negate a predicate using either not or !, so all that's left is to add another condition:
import org.apache.spark.sql.functions.not
df.where($"referrer".contains("www.mydomain.") &&
not($"referrer".contains("google")))
or use separate filters:
df
.where($"referrer".contains("www.mydomain."))
.where(!$"referrer".contains("google"))
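As a quick end-to-end illustration, here is a minimal sketch; the sample referrers and the session setup are made up for the demo:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.not

val spark = SparkSession.builder().master("local[*]").appName("filter-demo").getOrCreate()
import spark.implicits._

// Hypothetical sample data using the question's column name
val df = Seq(
  "http://www.mydomain.com/page",
  "https://www.google.co.uk/search?q=www.mydomain.com"
).toDF("referrer")

// Keep rows that mention the domain but drop anything containing google
df.where($"referrer".contains("www.mydomain.") && not($"referrer".contains("google")))
  .show(false)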
You may use a Regex. Here you can find a reference for the usage of regex in Scala. And here you can find some hints about how to create a proper regex for URLs.
Thus in your case you will have something like:
val regex = "PUT_YOUR_REGEX_HERE".r // something like (https?|ftp)://www\.mydomain\.com?(/[^\s]*)? should work
val filteredDf = unfilteredDf.filter { row =>
  regex.findFirstIn(row.getAs[String]("referrer")) match {
    case Some(_) => true
    case None    => false
  }
}
This solution requires a bit of work but is the safest one.

A better way of handling the below program (maybe with Take/Skip/TakeWhile or anything better)

I have a data table which has only one row, but it has 44 columns. My task is to get the columns from the 4th column till the end.
Hence, I have written the program below, which suits my requirement.
(Kindly note that dt is the DataTable.)
List<decimal> lstDr = new List<decimal>();
Enumerable.Range(0, dt.Columns.Count).ToList().ForEach(i =>
{
    if (i > 3)
        lstDr.Add(Convert.ToDecimal(dt.Rows[0][i]));
});
There is no harm in the program; it works fine. But I feel that there may be a better way of handling it, maybe with Skip or Take or TakeWhile or some other construct. I am looking for a better solution than the one I implemented. Is it possible?
I am using C# 3.0.
Thanks.
This should do it:
List<decimal> lstDr =
    dt.Rows[0].ItemArray               // the row's column values as object[]
        .Skip(3)                       // skip the first three columns
        .Select(o => Convert.ToDecimal(o))
        .ToList();