How to get values from several columns based on distinct values from one column?
I have a table with 3 columns (ID, Name, Date).
I need to retrieve all distinct names with their respective Date using Entity Framework. How can I do it?
I have this:
TabLogVisits = await _context.TabLogVisits
.Select( x => new { x.Name, x.Date })
.Distinct()
.ToListAsync();
but it retrieves all the names, not just the distinct ones.
If I use this:
TabLogVisits = await _context.TabLogVisits
.Select( x => x.Name)
.Distinct()
.ToListAsync();
I get only the distinct names but, of course, not the respective Date.
I have already tried everything I found on Google, but I can't figure it out.
What am I doing wrong here?
Related
I'm new to Scala and Slick. My use case is this: I have name, school, and gradeList. Name and school are string values. GradeList is a map with the subject as key and the grade as value. I need to write a search query for this scenario. The name and school columns are in one table and the subject and grade columns are in another, so I need to join them too. I can make the query work when a single string value is given for subject and grade, but I cannot figure out how to iterate through the map so that every subject/grade pair is considered in the query.
So far my code is as below:
val innerJoin = for {
  (a, _) <- nameQuery.join(gradeQuery).on(_.id === _.studentId)
    .filter(result =>
      result._1.name.toLowerCase.like(s"%$name%") &&
      result._1.school.toLowerCase.like(s"%$school%") &&
      result._2.subject.toLowerCase.like(s"%$subject%") &&
      result._2.grade.toLowerCase.like(s"%$grade%"))
} yield a
innerJoin.distinctOn(_.id).take(limit).result
name, school, and gradeList are the parameters of my function.
Can someone help me in finding a solution for this? Thanks in advance.
I have two RDDs: one has just one column and the other has two. To join the two RDDs on their keys, I added a dummy value of 0 to the first one. Is there a more efficient way of doing this using join?
val lines = sc.textFile("ml-100k/u.data")
val movienamesfile = sc.textFile("Cml-100k/u.item")
val moviesid = lines.map(x => x.split("\t")).map(x => (x(1),0))
val test = moviesid.map(x => x._1)
val movienames = movienamesfile.map(x => x.split("\\|")).map(x => (x(0),x(1)))
val shit = movienames.join(moviesid).distinct()
Edit:
Let me restate this question in SQL. Say, for example, I have table1 (movieid) and table2 (movieid, moviename). In SQL we would write something like:
select table2.moviename, table2.movieid, count(1)
from table2 inner join table1 on table1.movieid = table2.movieid
group by ....
Here table1 has only one column whereas table2 has two, and the join still works. Can Spark join on the keys of both RDDs in the same way?
The join operation is defined only on PairwiseRDDs, which are quite different from a relation or table in SQL. Each element of a PairwiseRDD is a Tuple2 where the first element is the key and the second is the value. Both can contain complex objects as long as the key provides a meaningful hashCode.
If you want to think about this in a SQL-ish way, you can consider the key to be everything that goes into the ON clause, while the value contains the selected columns.
SELECT table1.value, table2.value
FROM table1 JOIN table2 ON table1.key = table2.key
While these approaches look similar at first glance, and you can express one using the other, there is one fundamental difference. When you look at a SQL table and ignore constraints, all columns belong to the same class of objects, while the key and the value in a PairwiseRDD have clearly distinct roles.
Going back to your problem: to use join you need both a key and a value. Arguably, using a null singleton would be much cleaner than using 0 as a placeholder, but there is really no way around it.
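To make the key/value picture concrete, here is a minimal sketch in plain Python (not the Spark API) of what an inner join over collections of pairs does. The helper `pair_join` and the sample rows are hypothetical and exist only for illustration; `None` plays the role of the dummy `0` placeholder from the question.

```python
from collections import defaultdict

def pair_join(left, right):
    """Inner join of two lists of (key, value) pairs on the key."""
    # Index the right side by key so each left element finds its matches quickly.
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    # Emit one (key, (left_value, right_value)) element per matching combination.
    return [(key, (lv, rv)) for key, lv in left for rv in index.get(key, [])]

# Hypothetical sample data: (id, name) pairs and (id, placeholder) pairs.
movienames = [("1", "Toy Story"), ("2", "GoldenEye"), ("3", "Four Rooms")]
moviesid = [("1", None), ("3", None), ("1", None)]

joined = pair_join(movienames, moviesid)
# Repeated ids on the right side produce repeated results, hence the distinct step.
distinct_joined = sorted(set(joined))
print(distinct_joined)
```

The value on the right-hand side carries no information here, which is why any placeholder works: only the key takes part in the match.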
For small data you can use filter in a similar way to broadcast join:
val moviesidBD = sc.broadcast(
lines.map(x => x.split("\t")).map(_.head).collect.toSet)
movienames.filter{case (id, _) => moviesidBD.value contains id}
but if you really want SQL-ish joins then you should simply use SparkSQL.
val movieIdsDf = lines
.map(x => x.split("\t"))
.map(a => Tuple1(a.head))
.toDF("id")
val movienamesDf = movienames.toDF("id", "name")
// Add optional join type qualifier
movienamesDf.join(movieIdsDf, movieIdsDf("id") <=> movienamesDf("id"))
On RDDs, the join operation is only defined for pair RDDs, so you need to map each RDD to key-value pairs first. Below is a sample:
val rdd1=sc.textFile("/data-001/part/")
val rdd_1=rdd1.map(x=>x.split('|')).map(x=>(x(0),x(1)))
val rdd2=sc.textFile("/data-001/partsupp/")
val rdd_2=rdd2.map(x=>x.split('|')).map(x=>(x(0),x(1)))
rdd_1.join(rdd_2).take(2).foreach(println)
I have many records that have a datetime field:
MyTable(ID, StartDate,...)
I receive a startDate as a parameter. I would like to get all the records whose StartDate is >= that date, plus the record whose ID is one less than the ID of the first such record (first when the result is ordered by date).
Something like that:
dbContext.MyTable.Where(x => x.ID >= dbContext.MyTable.OrderBy(y => y.StartDate).Where(y => y.StartDate >= myDate).First()).ToList();
But I get an error because I can't use First() in this place.
Also, even if I could use it, First() would execute a query against the database immediately. I don't want that at this point, because I am constructing a dynamic query and I want only one trip to the database.
So I would like to know whether it is possible to use the first element of a result as a condition.
Thanks.
You can use Take(1) as a replacement for First():
dbContext.MyTable
.Where(x =>
dbContext.MyTable
.OrderBy(y => y.StartDate)
.Where(y => y.StartDate >= myDate)
.Take(1)
.Any(y => x.ID >= y.ID))
.ToList();
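The logic of that query can be mirrored in memory. The following is a minimal Python sketch, not Entity Framework code: the rows, the `my_date` value and all variable names are hypothetical, and it only illustrates why a one-element sequence combined with an "any" test behaves like a comparison against the first element.

```python
from datetime import date

# Hypothetical rows standing in for MyTable(ID, StartDate).
rows = [
    {"id": 1, "start": date(2020, 1, 1)},
    {"id": 2, "start": date(2020, 2, 1)},
    {"id": 3, "start": date(2020, 3, 1)},
    {"id": 4, "start": date(2020, 4, 1)},
]
my_date = date(2020, 2, 15)

# Mirrors OrderBy(StartDate).Where(StartDate >= myDate).Take(1):
# a list holding at most one row instead of a single materialized value.
first_match = sorted(
    (r for r in rows if r["start"] >= my_date),
    key=lambda r: r["start"],
)[:1]

# Mirrors Where(x => <subquery>.Any(y => x.ID >= y.ID)).
result = [x for x in rows if any(x["id"] >= y["id"] for y in first_match)]
print([x["id"] for x in result])
```

If no row satisfies the date condition, `first_match` is empty and the "any" test is false for every row, so the result is empty rather than an error.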
I have a few tables I want to join together:
Users
UserRoles
WorkflowRoles
Role
Workflow
The equivalent SQL I want to generate is something like:
select * from Users u
inner join UserRoles ur on u.UserId = ur.UserId
inner join WorkflowRoles wr on wr.RoleId = ur.RoleId
inner join Workflow w on wr.WorkflowId = w.Id
where u.Id = x
I want to get all the workflows a user is part of based on their roles in one query. I've found that you can get the results like this:
user.Roles.SelectMany(r => r.Workflows)
but this generates a query for each role which is obviously less than ideal.
Is there a proper way to do this without resorting to hacks like generating a view or writing raw SQL?
You could try the following two queries:
This one is more readable, I think:
var workflows = context.Users
.Where(u => u.UserId == givenUserId)
.SelectMany(u => u.Roles.SelectMany(r => r.Workflows))
.Distinct()
.ToList();
(Distinct is needed because a user could have two roles, and these roles may contain the same workflow. Without Distinct, duplicate workflows would be returned.)
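The duplication can be seen with a small in-memory sketch. This is plain Python, not Entity Framework; the user, role and workflow names are made up for illustration.

```python
# Hypothetical data: one user with two roles that share the "publish" workflow.
roles_by_user = {"alice": ["editor", "reviewer"]}
workflows_by_role = {"editor": ["publish", "draft"], "reviewer": ["publish"]}

# Mirrors SelectMany over roles and then over workflows: a flat list with repeats.
flattened = [w for role in roles_by_user["alice"] for w in workflows_by_role[role]]

# Mirrors Distinct(): drop repeats while keeping first-seen order.
distinct = list(dict.fromkeys(flattened))

print(flattened)
print(distinct)
```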
But this one performs better, I believe:
var workflows = context.WorkFlows
.Where(w => w.Roles.Any(r => r.Users.Any(u => u.UserId == givenUserId)))
.ToList();
So it turns out the order in which you select makes the difference:
user.Select(x => x.Roles.SelectMany(y => y.Workflows)).FirstOrDefault()
I have not had a chance to test this, but it should work:
Users.Include(user => user.UserRoles).Include(user => user.UserRole.WorkflowRoles.Workflow)
If the above is not correct, could you post your class structure?
I have a table to which I have added a new column, and I want to write a SQL statement to update that column based on existing information. Here are the two tables and the relevant columns:
'leagues'
=> id
=> league_key
=> league_id (this is the new column)
'permissions'
=> id
=> league_key
Now, what I want to do, in plain English, is this:
Set leagues.league_id to be permissions.id for each value of permissions.league_key
I had tried SQL like this:
UPDATE leagues
SET league_id =
(SELECT id FROM permissions WHERE league_key =
(SELECT distinct(league_key) FROM leagues))
WHERE league_key = (SELECT distinct(league_key) FROM leagues)
but I am getting an error message that says
ERROR: more than one row returned by a subquery used as an expression
Any help for this would be greatly appreciated
Based on your requirement:
Set leagues.league_id to be permissions.id for each value of permissions.league_key
the following does that:
UPDATE leagues
SET league_id = permissions.id
FROM permissions
WHERE permissions.league_key = leagues.league_key;
When you do a subquery as an expression, it can't return a result set. Your subquery must evaluate to a single result. The error that you are seeing is because one of your subqueries returns more than one value.
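A subquery can still be used as an expression if it is correlated with the outer row so that it yields one value per row. The sketch below shows that form using Python's built-in `sqlite3` module as a stand-in for PostgreSQL; the sample rows are hypothetical, and it assumes `league_key` is unique in `permissions`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE permissions (id INTEGER PRIMARY KEY, league_key TEXT UNIQUE);
    CREATE TABLE leagues (id INTEGER PRIMARY KEY, league_key TEXT, league_id INTEGER);
    INSERT INTO permissions (id, league_key) VALUES (10, 'nfl'), (20, 'nba');
    INSERT INTO leagues (id, league_key) VALUES (1, 'nfl'), (2, 'nba'), (3, 'nfl');
""")

# Correlated subquery: evaluated once per leagues row. It yields a single value
# because league_key is assumed to be unique in permissions.
conn.execute("""
    UPDATE leagues
    SET league_id = (SELECT p.id
                     FROM permissions p
                     WHERE p.league_key = leagues.league_key)
""")

rows = conn.execute(
    "SELECT id, league_key, league_id FROM leagues ORDER BY id"
).fetchall()
print(rows)
```

The original attempt failed because its inner `SELECT distinct(league_key) FROM leagues` was not tied to the row being updated, so it returned every key at once.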
Here is the relevant documentation for PostgreSQL 8.4: