A question about join queries - RDBMS

True or false: the result of any join query is a permanent table.
I got this question in one of my exams and marked it as false, but it was graded as wrong. Please clarify my doubt about this.
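For what it's worth, a plain SELECT with a join normally yields a derived result set that is not stored anywhere; a table only appears if you ask for one explicitly, for example with CREATE TABLE ... AS SELECT. The sketch below checks this with Python's built-in sqlite3 module. The emp, dept and emp_dept tables are hypothetical, and other database systems may expose their catalog differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp  (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    INSERT INTO dept VALUES (1, 'Sales'), (2, 'IT');
    INSERT INTO emp  VALUES (1, 'Ann', 1), (2, 'Bob', 2);
""")

def table_names(conn):
    """Return the names of all tables currently stored in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [r[0] for r in rows]

join_sql = (
    "SELECT e.name AS emp_name, d.name AS dept_name "
    "FROM emp e JOIN dept d ON e.dept_id = d.id ORDER BY e.id"
)

# Running the join only returns rows; nothing new is stored.
joined_rows = conn.execute(join_sql).fetchall()
tables_after_join = table_names(conn)

# Only an explicit CREATE TABLE ... AS makes the result permanent.
conn.execute("CREATE TABLE emp_dept AS " + join_sql)
tables_after_ctas = table_names(conn)
```

After the plain join the catalog still lists only dept and emp; emp_dept shows up only after the explicit CREATE TABLE ... AS.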


Postgres query to make faster [closed]

Any opinion on which query will be faster (or whether there is no difference), and why?
select * from A a left join B b on a.id = b.cid where b.cl1 = 's1' or b.c1 ='s2' or b.c1 ='s3' ;
or
select * from A a left join B b on a.id = b.cid where b.cl1 in ('s1','s2','s3');
The above are just examples to give an idea of what I am trying to do; they may not be the exact syntax.
The reason for asking is that I built a Spring/Hibernate/JPA query similar to the first one, and its performance is currently poor, so I am looking for any possible way to improve it. The problem may or may not be the query itself, but since I am not an expert on the DB side, I am looking for information.
The second query is always faster, because it can use a single index scan if you have an index on b.cl1.
But since the first query has the conditions on different columns, the queries are quite different, and it makes little sense to compare them.
In such scenarios, you can simply run both queries under PostgreSQL's EXPLAIN ANALYZE and see which one is faster.
Update: As @MatBailie said in his comment, LEFT JOIN requires an ON clause.
EXPLAIN ANALYSE
SELECT *
FROM A a
LEFT JOIN B b ON a.id = b.cid
WHERE b.cl1 = 's1'
OR b.c2 = 's2'
OR b.c3 = 's3';
EXPLAIN ANALYSE
SELECT *
FROM A a
LEFT JOIN B b ON a.id = b.cid
WHERE b.cl1 IN ('s1', 's2', 's3');
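As a side check, the sketch below only confirms that, when all three conditions are on the same column, the OR form and the IN form return the same rows. It uses Python's built-in sqlite3 module with made-up data, so it says nothing about PostgreSQL timings; for those, EXPLAIN ANALYZE on the real tables is the thing to trust. It also shows that a WHERE condition on columns of B drops the rows of A that have no match, so the LEFT JOIN behaves like an inner join here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY);
    CREATE TABLE B (cid INTEGER, cl1 TEXT);
    INSERT INTO A VALUES (1), (2), (3), (4);
    -- id 3 matches a row that fails the filter; id 4 has no match at all
    INSERT INTO B VALUES (1, 's1'), (2, 's2'), (3, 'other');
""")

or_sql = """
    SELECT a.id, b.cl1
    FROM A a LEFT JOIN B b ON a.id = b.cid
    WHERE b.cl1 = 's1' OR b.cl1 = 's2' OR b.cl1 = 's3'
    ORDER BY a.id
"""
in_sql = """
    SELECT a.id, b.cl1
    FROM A a LEFT JOIN B b ON a.id = b.cid
    WHERE b.cl1 IN ('s1', 's2', 's3')
    ORDER BY a.id
"""

or_rows = conn.execute(or_sql).fetchall()
in_rows = conn.execute(in_sql).fetchall()
```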

How do I correctly use LINQ groupby for two tables joined via an intermediate table?

I have three tables.
Interviews
Interviewers
InterviewSchedule.
An interviewer can be scheduled for multiple interviews
An interview can have multiple interviewers.
So,
InterviewSchedule table has columns interviewid, interviewerid. (Many to many relationship)
Interview table has columns - InterviewId, InterviewLocation, InterviewSubject.
Interviewer table has columns - InterviewerId, InterviewerName, InterviewerTitle.
Now, I want to generate a report of interviews with the interviewer details.
I created a dataobject as InterviewId, InterviewLocation, InterviewSubject, List<Interviewer>;
I am trying to write a single LINQ query to get my output. I use Entity Framework and already have the context created.
I am fairly new to LINQ, but I see this should be possible, and I saw multiple posts from people using groupby on an Id.
I think my problem is I want to select multiple fields from both the tables via the intermediate table.
var output = (from i in Interview
join ia in InterviewSchedule on i.interviewid equals ia.interviewid
join iw in Interviewers on ia.interviewerid equals iw.interviewerid)
group i by i.interviewid into g
select new {i, interviewers = new {interviewername, interviewertitle} };
I am lost at this point. Is this not the right approach? Do I have to make a 'for' loop to add all the interviewers to the list, one by one?
Please, try this
var output = (from i in Interview
group i by i.InterviewId into g
join ia in InterviewSchedule on g.Key equals ia.InterviewId
join iw in Interviewers on ia.InterviewerId equals iw.InterviewerId
select new { g.Key, interviewers = iw }).ToList();
see LINQ: combining join and group by
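To make the target shape concrete (one record per interview, carrying a list of its interviewers), here is a small language-neutral sketch in plain Python rather than LINQ. The sample rows are invented, and the field names simply mirror the columns from the question. The idea is to walk the intermediate table once, collect interviewers per interview id, and then attach each list to its interview; in LINQ the corresponding move would be to group the joined interviewer rows by the interview key.

```python
from collections import defaultdict

# Hypothetical sample data mirroring the three tables in the question.
interviews = [
    {"InterviewId": 1, "InterviewLocation": "Room A", "InterviewSubject": "Backend"},
    {"InterviewId": 2, "InterviewLocation": "Room B", "InterviewSubject": "Frontend"},
]
interviewers = [
    {"InterviewerId": 10, "InterviewerName": "Ann", "InterviewerTitle": "Lead"},
    {"InterviewerId": 11, "InterviewerName": "Bob", "InterviewerTitle": "Manager"},
]
# InterviewSchedule rows as (interviewid, interviewerid) pairs.
schedule = [(1, 10), (1, 11), (2, 10)]

# Index interviewers by id so each schedule row is a direct lookup.
interviewer_by_id = {iw["InterviewerId"]: iw for iw in interviewers}

# Group interviewers under the interview they are scheduled for.
grouped = defaultdict(list)
for interview_id, interviewer_id in schedule:
    grouped[interview_id].append(interviewer_by_id[interviewer_id])

# One report row per interview, with the list of interviewers attached.
report = [
    {**interview, "Interviewers": grouped[interview["InterviewId"]]}
    for interview in interviews
]
```

No explicit per-interview loop over all interviewers is needed; the grouping pass over the schedule does the work.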

using talend to output data not matched in lookup table

I know how to use Talend's tMap component to output the rows that match the lookup data, but I don't know how to output the rows that do not match anything in the lookup table. This may be a simple question for experienced users. Thanks in advance.
Regards,
Joe
Two steps are required to gather rejected rows:
On the left-hand side, set Join Model to Inner Join on the join for which you want to find rejected rows.
On the right-hand side, set Catch lookup inner join reject to true on one output. That output will receive all rejected entries, so you can have one output that gets all matched entries and another that delivers only the rejected rows.
Usually this leads to a tMap with two output rows in your job.
In the tMap output table there is a settings option. Open it and you will see a couple of options such as "Catch lookup inner join reject" and "Catch output reject", which you can set to true or false as needed. My guess is that you are looking for "Catch lookup inner join reject".
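Conceptually, an inner join with a reject output just splits the main flow in two. The plain-Python sketch below is only an illustration of that logic, not Talend code; the rows, the lookup contents and the label field are all made up.

```python
# Hypothetical main flow and lookup table (keyed by id).
main_rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
]
lookup = {1: "X", 3: "Z"}

matched = []   # what the normal inner-join output would receive
rejected = []  # what the "inner join reject" output would receive

for row in main_rows:
    if row["id"] in lookup:
        # Row found in the lookup: enrich it and send it to the matched output.
        matched.append({**row, "label": lookup[row["id"]]})
    else:
        # No lookup match: send the row, unchanged, to the reject output.
        rejected.append(row)
```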

Where clause versus join clause in Spark SQL

I am writing a query to get records from Table A which satisfies a condition from records in Table B. For example:
Table A is:
Name  Profession  City
John  Engineer    Palo Alto
Jack  Doctor      SF
Table B is:
Profession  City  NewJobOffer
Engineer    SF    Yes
and I'm interested in getting Table C:
Name  Profession  City  NewJobOffer
Jack  Engineer    SF    Yes
I can do this in two ways, using a WHERE clause or a join. Which one is faster in Spark SQL, and why?
Is it better to use a WHERE clause to compare the columns and select those records, or to join on the columns themselves?
It's better to put the filter in the WHERE clause. These two expressions are not equivalent.
When you put the filtering in the JOIN clause, both data sources are retrieved in full and then joined on the specified condition. Since a join is done by first shuffling the data (redistributing it between executors), you are going to shuffle a lot of data.
When you put the filter in the WHERE clause, Spark can recognize it, so the two data sources are filtered first and then joined, and less data is shuffled. What might be even more important is that Spark may also be able to do a filter pushdown, filtering the data at the data source level, which means even less network pressure.
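The effect described above can be illustrated with a toy model in plain Python. This is not Spark code: the inner_join helper and the sample rows are hypothetical, and the "rows into join" counters merely stand in for the amount of data that would be shuffled. Spark's optimizer can often apply this kind of reordering on its own, so inspecting the actual plan with explain() is the reliable way to see what happens for a given query.

```python
def inner_join(left, right, key):
    """Naive nested-loop inner join on a single shared column."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

# Hypothetical data, loosely modelled on the tables in the question.
people = [
    {"Name": "John", "City": "Palo Alto"},
    {"Name": "Jack", "City": "SF"},
    {"Name": "Jill", "City": "SF"},
    {"Name": "Joan", "City": "NYC"},
]
offers = [
    {"City": "SF", "NewJobOffer": "Yes"},
    {"City": "NYC", "NewJobOffer": "No"},
]

# Plan 1: join everything, then filter the joined rows.
join_then_filter = [
    row for row in inner_join(people, offers, "City") if row["City"] == "SF"
]
rows_into_join_plan1 = len(people) + len(offers)

# Plan 2: filter each input first, then join the smaller inputs.
people_sf = [p for p in people if p["City"] == "SF"]
offers_sf = [o for o in offers if o["City"] == "SF"]
filter_then_join = inner_join(people_sf, offers_sf, "City")
rows_into_join_plan2 = len(people_sf) + len(offers_sf)
```

Both plans produce the same rows; the second simply feeds fewer rows into the join.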

Rails PG GroupingError column must appear in the GROUP BY clause

There are a few topics about this already with accepted answers, but I couldn't figure out a solution based on those.
E.g.:
Ruby on Rails: must appear in the GROUP BY clause or be used in an aggregate function
GroupingError: ERROR: column must appear in the GROUP BY clause or be used in an aggregate function
PGError: ERROR: column "p.name" must appear in the GROUP BY clause or be used in an aggregate function
My query is:
Idea.unscoped.joins('inner join likes on ideas.id = likes.likeable_id').
  select('likes.id, COUNT(*) AS like_count, ideas.id, ideas.title, ideas.intro, likeable_id').
  group('likeable_id').
  order('like_count DESC')
This works fine in development with SQLite but breaks on Heroku with PostgreSQL.
The error is:
PG::GroupingError: ERROR: column "likes.id" must appear in the GROUP BY clause or be used in an aggregate function
If I put likes.id in my GROUP BY, the results make no sense. I tried putting group before select, but that doesn't help. I even tried splitting the query into two parts. No joy. :(
Any suggestions appreciated. TIA!
I don't know why you want to select likes.id in the first place. I see that you basically want the like_count for each Idea; I don't see the point in selecting likes.id. Also, when you already have the ideas.id, I don't see why you would want to get the value of likes.likeable_id since they'll both be equal. :/
Anyway, the problem is that, since you're grouping by likeable_id (basically ideas.id), you can't "select" likes.id: the individual values would be "lost" by the grouping.
I suppose SQLite is lax about this. I imagine it wouldn't group things properly.
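The SQLite side of this can be checked with Python's built-in sqlite3 module. SQLite accepts a non-aggregated, non-grouped column ("bare column") in a GROUP BY query and fills it from some row of the group, which is why the original query runs there while PostgreSQL rejects it. The sketch below uses a made-up miniature of the ideas and likes tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ideas (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE likes (id INTEGER PRIMARY KEY, likeable_id INTEGER);
    INSERT INTO ideas VALUES (1, 'first'), (2, 'second');
    INSERT INTO likes VALUES (100, 1), (101, 1), (102, 2);
""")

# likes.id is neither grouped nor aggregated; SQLite still runs this and
# picks the value from an arbitrary row of each group.
rows = conn.execute("""
    SELECT likes.id, COUNT(*) AS like_count, ideas.id
    FROM ideas INNER JOIN likes ON ideas.id = likes.likeable_id
    GROUP BY likeable_id
    ORDER BY like_count DESC
""").fetchall()
```

The counts come out correctly grouped; it is only the likes.id value in each row that is arbitrary, so it should not be relied on.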
ANYWAY(2) =>
Let me propose a cleaner solution.
# model
class Idea < ActiveRecord::Base
  # to save you the effort of specifying the join-conditions
  has_many :likes, foreign_key: :likeable_id
end
# in your code elsewhere
ideas = \
  Idea.
  joins(:likes).
  group("ideas.id").
  select("COUNT(likes.id) AS like_count, ideas.id, ideas.title, ideas.intro").
  order("like_count DESC")
If you still want to get the IDs of likes for each item, then after the above, here's what you could do:
grouped_like_ids = \
  Like.
  select(:id, :likeable_id).
  each_with_object({}) do |like, hash|
    (hash[like.likeable_id] ||= []) << like.id
  end
ideas.each do |idea|
  # selected previously:
  idea.like_count
  idea.id
  idea.title
  idea.intro
  # from the hash
  like_ids = grouped_like_ids[idea.id] || []
end
Other readers: I'd be very interested in a "clean" one-query non-sub-query solution. Let me know in the comments if you leave a response. Thanks.
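One possible direction for a single query without a sub-query is to aggregate the like IDs instead of selecting them as a bare column. PostgreSQL has array_agg for this; the sketch below uses SQLite's GROUP_CONCAT as a stand-in so that it runs with Python's built-in sqlite3 module. The data is made up, and how this would be wired into the ActiveRecord select call is left open here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ideas (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE likes (id INTEGER PRIMARY KEY, likeable_id INTEGER);
    INSERT INTO ideas VALUES (1, 'first'), (2, 'second');
    INSERT INTO likes VALUES (100, 1), (101, 1), (102, 2);
""")

# Every selected column is either grouped (via the ideas primary key) or
# aggregated, so no like ID is lost and nothing is picked arbitrarily.
rows = conn.execute("""
    SELECT ideas.id, ideas.title,
           COUNT(likes.id)        AS like_count,
           GROUP_CONCAT(likes.id) AS like_ids
    FROM ideas INNER JOIN likes ON ideas.id = likes.likeable_id
    GROUP BY ideas.id
    ORDER BY like_count DESC
""").fetchall()

# GROUP_CONCAT returns a comma-separated string; turn it into sorted ints.
like_ids_by_idea = {
    idea_id: sorted(int(x) for x in like_ids.split(","))
    for idea_id, _title, _count, like_ids in rows
}
```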