Question: Find the top 3 highest rated ‘TOOLS’ apps that have more than 225,000 reviews. Also, include at least 1 positive review for each of the top 3 apps in the visualization.
So far I have:
SELECT A.app_name, R.reviews
FROM apps AS A JOIN apps_reviews AS R
LIMIT 3;
Table structure:
Good Morning/Afternoon,
I'm fairly new to SQL in general. I've been able to put together a few queries that give me the output I need for a certain business case, but I have a new requirement that I am struggling with.
In a nutshell, I am extracting a list of "sites" (physical locations) for a given customer along with the lat/lon, and then feeding each location to Google Maps to plot a point for each site. What I am trying to do is query for unique sites. If there are multiple sites that are within 1/2 mile of each other, they are lumped as a single site. For example, if a customer has 10 sites within 1/2 mile of each other, they technically have 1 site, not 10.
Here is an example of what I am doing:
select c.id, i.site_id, s.name, max(i.captured_at), s.center_lat, s.center_lng, CONCAT(s.center_lat, ',' , s.center_lng) AS LOCATION
from images i
inner join sites s on s.id = i.site_id
inner join customers c on c.id = s.customer_id
where i.hidden = 'false' and i.copied_from_id is null and i.status = 'complete' and c.id = '353'
group by c.id, i.site_id, s.name, s.center_lat, s.center_lng
order by max DESC
Here is an example of the output:
As it stands now, it returns a count of 4 sites (I am rendering the results in Google Data Studio displaying the count of records returned), which works fine for another scenario. However, since these sites are within 1/2 mile of each other, they are technically 1 site, not 4. I am trying to determine how to come up with a count of 1 site vs. 4 in this scenario. If there was another entry where the lat/lon (location) was more than 1/2 mile away, I would be looking for a count of 2 sites. I hope this all makes sense.
Currently trying to research where to start so if there are any references, or a push in the right direction, that would be awesome. Thanks very much.
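One possible starting point, sketched in Python rather than SQL: compute the great-circle (haversine) distance between every pair of sites and merge any pair that is within 1/2 mile into the same group, then count the groups. The function names and the coordinates below are made up for illustration. Note the assumption that grouping is transitive: if A is near B and B is near C, all three count as one site even if A and C are slightly more than 1/2 mile apart.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def count_unique_sites(points, max_miles=0.5):
    """Count groups of points chained together by links of at most max_miles."""
    parent = list(range(len(points)))  # union-find forest, one node per point

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if haversine_miles(*points[i], *points[j]) <= max_miles:
                parent[find(i)] = find(j)  # merge the two groups

    return len({find(i) for i in range(len(points))})


# Hypothetical (lat, lng) pairs: four close together, one far away.
sites = [
    (40.0000, -105.0000),
    (40.0010, -105.0010),
    (40.0020, -105.0000),
    (40.0005, -105.0020),
    (41.0000, -105.0000),
]
print(count_unique_sites(sites))
```

The pairwise loop is quadratic, which should be fine for the sites of a single customer. If the grouping has to happen inside the database instead, a spatial extension such as PostGIS would be the thing to look into.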
Firstly, this is part of my college homework.
Now that's out of the way: I need to write a query that will get the number of free apps in a DB as a percentage of the total number of apps, sorted by what category the app is in.
I can get the number of free apps and also the number of total apps by category. Now I need to find the percentage, this is where it goes a bit pear-shaped.
Here is what I have so far:
-- find total number of apps per category
select @totalAppsPerCategory := count(*), category_primary
from apps
group by category_primary;
-- find number of free apps per category
select @freeAppsPerCategory := count(*), category_primary
from apps
where (price = 0.0)
group by category_primary;
-- find percentage of free apps per category
set @totals = @freeAppsPerCategory / @totalAppsPerCategory * 100;
select @totals, category_primary
from apps
group by category_primary;
It then lists the categories but the percentage listed in each category is the exact same value.
I had initially thought to use an array, but from what I have read MySQL does not seem to support arrays.
I'm a bit lost as to how to proceed from here.
Finally figured it out. Since I had been saving the previous results in variables, each variable held only a single value rather than one value per category, so nothing was calculated on a row-by-row basis, which is why all the percentages were identical. So the calculation needed to be part of the query.
Here's what I came up with:
SELECT
    category_primary,
    CONCAT(FORMAT(COUNT(CASE WHEN price = 0 THEN 1 END) / COUNT(*) * 100, 1), '%') AS FreeAppSharePercent
FROM
    apps
GROUP BY category_primary
-- sort on the numeric share; the formatted string would sort as text
ORDER BY COUNT(CASE WHEN price = 0 THEN 1 END) / COUNT(*) DESC;
Then the query result is:
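For anyone who wants to try the idea without a MySQL server, here is a minimal sketch of the same conditional-aggregation approach using SQLite through Python's standard sqlite3 module. The rows are made up for illustration, and the SQL is adapted to SQLite (ROUND and SUM(CASE ...) instead of FORMAT/CONCAT), so it is not the exact query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE apps (name TEXT, category_primary TEXT, price REAL)")

# Made-up sample rows: 3 of 4 Games are free, 1 of 2 Tools is free.
conn.executemany(
    "INSERT INTO apps VALUES (?, ?, ?)",
    [
        ("a1", "Games", 0.0),
        ("a2", "Games", 0.0),
        ("a3", "Games", 1.99),
        ("a4", "Games", 0.0),
        ("b1", "Tools", 0.0),
        ("b2", "Tools", 2.99),
    ],
)

# Count the free apps and all apps in the same grouped query, so the
# percentage is computed once per category instead of once overall.
rows = conn.execute(
    """
    SELECT category_primary,
           ROUND(100.0 * SUM(CASE WHEN price = 0 THEN 1 ELSE 0 END) / COUNT(*), 1)
               AS free_share
    FROM apps
    GROUP BY category_primary
    ORDER BY free_share DESC
    """
).fetchall()

for category, share in rows:
    print(f"{category}: {share}%")
```

Keeping the share numeric until the final print also means the ORDER BY compares numbers rather than formatted strings.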
I am confused about when to use user-user collaborative filtering and when to use item-item collaborative filtering.
Please help!!
If you have more users than items in your dataset, which is generally the case, item-item collaborative filtering tends to be more effective. E.g., Amazon has a huge customer base compared to its number of products.
Moreover, user preferences and tastes change over time, which is difficult to handle in user-user collaborative filtering, whereas an item's ratings generally do not change much over time.
Item-Item: Looks at the items user X has already rated and recommends the items most similar to them. Here similarity means how people treat two items in terms of ratings: if two items get the same kind of ratings from the same users, they are similar. For example:
        Per1  Per2  Per3
Item1   5     3     1
Item2   2     3     3
Item vector_1 = 5P1 + 3P2 + 1P3
Item vector_2 = 2P1 + 3P2 + 3P3
If we calculate the cosine similarity of two vectors:
Cos_sim = (5*2 + 3*3 + 1*3) / sqrt((25+9+1)*(4+9+9))
Cos_sim ≈ 0.793
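The same calculation as a short Python sketch, in case it helps to see it spelled out. The function name cosine_similarity is just a label chosen here; the vectors are the two item rows from the table above.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


item1 = [5, 3, 1]  # ratings of Item1 by Per1, Per2, Per3
item2 = [2, 3, 3]  # ratings of Item2 by Per1, Per2, Per3

print(round(cosine_similarity(item1, item2), 3))
```

This comes out at roughly 0.79. For user-user filtering the same function applies unchanged; you would just pass in two users' rating rows instead of two items' rating rows.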
User-User:- Find similarity between users by assessing the rating pattern of two users.
For example:-
        Item1  Item2  Item3  Item4
Per_x   5      2      5      2
Per_y   5      2      5      2
Here two users are pretty similar. This might be you and your friend.
Hope that helps !!!
I have a table of jobs, with columns showing which engineer the job is assigned to and the area of the country, which can be simplified like this:
JobNumber Area Engineer
1 A 3
2 D 1
3 E 2
4 B 2
5 A 1
I have a table of engineers, and a table of areas of the country.
I have a final table which shows the areas of the country each engineer is assigned to, like this:
Area Engineer
A 1
A 2
A 3
B 2
B 3
What I need to do in Crystal Reports is create a formula field (or similar) to show whether or not the engineer was in one of their assigned areas.
I think the reason my efforts thus far have failed is that the join is effectively circular. I have played around with SQL Commands and join options to no avail.
Can anybody offer advice on how I can solve this problem?
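One way to think about it, as a rough sketch: left join the jobs to the assignment table on both Area and Engineer at once, so a job either finds its matching (area, engineer) pair or gets NULLs. The example below runs the idea in SQLite through Python using the sample rows from the question. The table and column names (jobs, engineer_areas, job_number, in_assigned_area) are invented for the illustration, and whether the same SQL can be dropped into a Crystal Reports SQL Command unchanged depends on the underlying database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE jobs (job_number INTEGER, area TEXT, engineer INTEGER);
    CREATE TABLE engineer_areas (area TEXT, engineer INTEGER);
    INSERT INTO jobs VALUES (1,'A',3),(2,'D',1),(3,'E',2),(4,'B',2),(5,'A',1);
    INSERT INTO engineer_areas VALUES ('A',1),('A',2),('A',3),('B',2),('B',3);
    """
)

# Join on BOTH columns: a job matches only if its exact (area, engineer)
# pair exists in the assignment table; otherwise ea.engineer is NULL.
rows = conn.execute(
    """
    SELECT j.job_number,
           CASE WHEN ea.engineer IS NULL THEN 'No' ELSE 'Yes' END AS in_assigned_area
    FROM jobs AS j
    LEFT JOIN engineer_areas AS ea
           ON ea.area = j.area AND ea.engineer = j.engineer
    ORDER BY j.job_number
    """
).fetchall()

for job_number, flag in rows:
    print(job_number, flag)
```

Because the assignment table is joined directly to the jobs on the pair of columns, it does not have to go through the engineers or areas tables, which may be what avoids the circular join.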
Imagine I have an MSSQL 2005 table (bbstats) that updates weekly, showing various cumulative categories of baseball accomplishments for a team
week 1
Player H SO HR
Sammy 7 11 2
Ted 14 3 0
Arthur 2 15 0
Zach 9 14 3
week 2
Player H SO HR
Sammy 12 16 4
Ted 21 7 1
Arthur 3 18 0
Zach 12 18 3
I wish to highlight textually where there has been a change in leader for each category. So after week 2 there would be nothing to report on hits (H); Zach has joined Arthur with the most strikeouts (SO) at 18; and Sammy is the new leader in home runs (HR) with 4.
So I would want to set up a process something like
a) save the past data (week 1) as table bbstatsPrior,
b) update bbstats with the new results - I do not need assistance with this
c) compare between the tables the player (or players, with ties) with the max value for each column, and output only where they differ
d) move on to the next column and repeat
In any real world example there would be significantly more columns to calculate for
Thanks
Responding to Brent's comments, I am really after any changes in the leaders for each category
So I would have something like
select top 1 with ties player
from bbstatsPrior
order by H desc
and
select top 1 with ties player,H
from bbstats
order by H desc
I then want to compare the player from each query (do I need temp tables?). If they differ I want to output the second select statement. For the H category Ted is the leader in both tables, but for other categories there are changes between the weeks
I can then loop through the columns using
select name from sys.all_columns sc
where sc.object_id=object_id('bbstats') and name <>'player'
If the number of stats doesn't change often, you could easily just write a single query to get this data. Join bbStats to bbStatsPrior where bbstatsprior.week < bbstats.week and bbstats.week = @weekNumber. Then just do a simple comparison between bbstats.Hits and bbstatsPrior.Hits to get your difference.
If the stats change often, you could use dynamic SQL to do this for all columns that match a certain pattern, or that are in a list of columns based on sys.columns for that table.
You could add a column for each stat column to designate the leader using a correlated subquery to find the max value for that column and see if it's equal to the current record.
This might get you started, but I'd recommend posting what you currently have to achieve this and the community can help you from there.
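To make the comparison step concrete, here is a minimal sketch that runs against SQLite through Python with the week 1 and week 2 numbers from the question. It is not T-SQL: `WHERE col = (SELECT MAX(col) ...)` stands in for `TOP 1 WITH TIES`, the helper name leaders is made up, and the column list is hard-coded where the question would read it from sys.all_columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE bbstatsPrior (player TEXT, H INTEGER, SO INTEGER, HR INTEGER);
    CREATE TABLE bbstats (player TEXT, H INTEGER, SO INTEGER, HR INTEGER);
    INSERT INTO bbstatsPrior VALUES
        ('Sammy',7,11,2),('Ted',14,3,0),('Arthur',2,15,0),('Zach',9,14,3);
    INSERT INTO bbstats VALUES
        ('Sammy',12,16,4),('Ted',21,7,1),('Arthur',3,18,0),('Zach',12,18,3);
    """
)


def leaders(table, column):
    """Return the set of players tied for the highest value in one column."""
    # Identifiers are interpolated here; they come from fixed lists in this
    # sketch and must never be taken from user input.
    sql = (f"SELECT player FROM {table} "
           f"WHERE {column} = (SELECT MAX({column}) FROM {table})")
    return {row[0] for row in conn.execute(sql)}


changes = {}
for column in ("H", "SO", "HR"):
    before = leaders("bbstatsPrior", column)
    after = leaders("bbstats", column)
    if before != after:  # report only categories whose leader set changed
        changes[column] = sorted(after)

print(changes)
```

Comparing the leader sets rather than single names is what lets a new tie (Zach joining Arthur on SO) show up as a change, while H is skipped because Ted leads in both weeks.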