Randomizing SQL data with data from a table function - T-SQL

I am trying to make a SQL script that will randomize the city, state, and zip code in a "members" table. I have made a table-valued function that returns a single row with "city", "state", and "zip" columns taken at random from another database (via a view). This ensures that I get a city, state, and zip that actually correlate to each other in the real world.
From there I am trying to do something like this:
update t
set
t.City = citystate.city,
t.State = citystate.state,
t.PostalCode = citystate.zip
from
(select
City,
State,
PostalCode from DATABASE.dbo.Member) t,
DATABASE.dbo.getRandomCityState() citystate
The problem is that this only calls my function once and puts the same city, state, and zip into every row of the table. Is there some way to call my function once for every row in the table?

Use a CROSS APPLY:
update t
set
t.City = citystate.city,
t.State = citystate.state,
t.PostalCode = citystate.zip
from
(select
City,
State,
PostalCode from DATABASE.dbo.Member) t
CROSS APPLY
DATABASE.dbo.getRandomCityState() citystate

OK, so thanks to one of my co-workers, we found a solution. It would seem that since the function didn't take any parameters, SQL Server decided that the result would never change and evaluated it only once. So we tricked the server into treating every call as potentially different by passing the function a parameter that changes from row to row: the id of each row. Since a different argument is passed each time, the function is called for every row.
update t
set
t.City = citystate.city,
t.State = citystate.state,
t.PostalCode = citystate.zip
from
(select top 10
City,
State,
PostalCode from TrajectoryServicesTest.dbo.Member) t
cross apply SanitizePhi.dbo.getRandomCityState(t.MemberID) citystate
Kinda hacky, but it worked. Thanks to Joe for the help.
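For reference, the reworked function presumably just accepts the per-row value and never uses it. A minimal sketch of what it might look like, assuming the random row still comes from a view as described in the question (the view name, the @MemberID type, and the NEWID() ordering mentioned in the comments are assumptions):
CREATE FUNCTION dbo.getRandomCityState (@MemberID int)
RETURNS TABLE
AS
RETURN
(
    -- @MemberID is intentionally unused; it only makes each call look distinct
    -- so SQL Server re-evaluates the function for every row.
    -- dbo.RandomCityStateZip is assumed to be the existing view that returns
    -- one random (city, state, zip) row, e.g. via TOP (1) ... ORDER BY NEWID().
    SELECT city, state, zip
    FROM dbo.RandomCityStateZip
);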

How to map a calculated value to a property?

I'm using Entity Framework to map a database table to a class in my application.
The class is automatically generated as a partial class by EF, and I have also added some other properties to the same partial class in my own file.
I use these queries to get a list of entities from the table (they are equivalent as far as I can tell):
db.DailyEntry.ToList
db.DailyEntry.SqlQuery("SELECT * FROM DailyEntry").ToList
db.Database.SqlQuery(Of DailyEntry)("SELECT DailyEntry.DailyEntryId, DailyEntry.Driver, DailyEntry.Billing, DailyEntry.EntryDate FROM DailyEntry").ToList
I then added a property to this class:
Public Property IsHoliday As Boolean = False
I then iterated through the list and calculated whether each date falls on a bank holiday, using a database table of holidays:
For Each entry As DailyEntry in _myList_
entry.IsHoliday = db.Database.SqlQuery(Of Boolean)("SELECT IsHoliday FROM Holiday WHERE HolidayDate = {0}", entry.EntryDate).FirstOrDefault
Next
This all works fine, but as the number of records increased, so did the number of database calls, and I need to speed the application up by merging them into a single call.
I modified the query to include the holiday info:
SELECT DailyEntry.*, Holiday.IsHoliday AS IsHoliday FROM DailyEntry LEFT OUTER JOIN Holiday ON DailyEntry.EntryDate = Holiday.HolidayDate;
or
SELECT DailyEntry.DailyEntryId, DailyEntry.Driver, DailyEntry.Billing, DailyEntry.EntryDate, Holiday.IsHoliday AS IsHoliday FROM DailyEntry LEFT OUTER JOIN Holiday ON DailyEntry.EntryDate = Holiday.HolidayDate;
and a few similar attempts. They work well when I test them as queries against the database and return the expected data, but when I try to use them in my app:
db.Database.SqlQuery(Of DailyEntry)("SELECT DailyEntry.*, Holiday.IsHoliday AS IsHoliday FROM DailyEntry LEFT OUTER JOIN Holiday ON DailyEntry.EntryDate = Holiday.HolidayDate").ToList
the IsHoliday property is always left at its default value (False).
Is there a way to get these calculated columns that are not part of the original database table to map to my properties?
Thanks in advance,
Zdenek

Is it possible to concatenate one result set onto another in a single query?

I have a table of Verticals, each of which has a name; one of them is named 'Other'. My task is to return a list of all Verticals, sorted in alphabetical order, except with 'Other' at the end. I have done it with two queries, like this:
String sqlMost = "SELECT * from core.verticals WHERE name != 'Other' order by name";
String sqlOther = "SELECT * from core.verticals WHERE name = 'Other'";
and then appended the second result to the first in my code. Is there a way to do this in a single query, without modifying the table? I tried using UNION:
(select * from core.verticals where name != 'Other' order by name)
UNION (select * from core.verticals where name = 'Other');
but the result was not ordered at all. I don't think the second query is going to hurt my execution time all that much, but I'm kind of curious, if nothing else.
UNION ALL is the usual way to request a simple concatenation; without ALL an implicit DISTINCT is applied to the combined results, which often causes a sort. However, UNION ALL isn't required to preserve the order of the individual sub-results as a simple concatenation would; you'd need to ORDER the overall UNION ALL expression to lock down the order.
Another option would be to compute an integer order-override column like CASE WHEN name = 'Other' THEN 2 ELSE 1 END, and ORDER BY that column followed by name, avoiding the UNION entirely.
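For example, the order-override approach boils the whole thing down to one query (a sketch against the core.verticals table from the question):
-- 'Other' sorts after everything else; each group is then ordered by name
SELECT *
FROM core.verticals
ORDER BY CASE WHEN name = 'Other' THEN 2 ELSE 1 END, name;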

Postgres ORDER BY using double precision not properly sorting

I'm trying to execute what should be a simple query using Postgres 9.3.2.
SELECT
event.id AS event_id, event.created_at AS event_created_at, event.updated_at AS event_updated_at, event.title AS event_title, event.summary AS event_summary, event.image AS event_image, event.active AS event_active, event.raw_score AS event_raw_score, event._score AS event__score, user_1.id AS user_1_id, user_1.email AS user_1_email, user_1.image AS user_1_image, user_1.name AS user_1_name, user_1.password AS user_1_password, user_1.active AS user_1_active, user_1.confirmed_at AS user_1_confirmed_at, user_1.created_at AS user_1_created_at, user_1.updated_at AS user_1_updated_at
FROM event
LEFT OUTER JOIN (
users_events AS users_events_1
JOIN "user" AS user_1 ON user_1.id = users_events_1.user_id
) ON event.id = users_events_1.event_id
ORDER BY event._score DESC
I'm expecting the results to be ordered by event._score, descending, but they never are. They don't seem to be ordered in any way relating to event._score. The datatype for that column is double precision.
The order I expect the values to be in is:
5787.0428202
5787.0427916
5787.0427628
But they are returned as:
5787.0427916
5787.0427628
5787.0428202
I've been struggling with this for a couple hours now and can't figure out why something so simple isn't working correctly.
After a lot more digging, it turns out that this was not an issue with Postgres but with how the ORM I was using (SQLAlchemy) was set up. The _score property was not actually being set in the database, though it was set within the session, so objects in the session returned the proper values, but objects fetched from the database did not.

Select most reviewed courses starting from courses having at least 2 reviews

I'm using Flask-SQLAlchemy with PostgreSQL. I have the following two models:
class Course(db.Model):
id = db.Column(db.Integer, primary_key = True )
course_name =db.Column(db.String(120))
course_description = db.Column(db.Text)
course_reviews = db.relationship('Review', backref ='course', lazy ='dynamic')
class Review(db.Model):
__table_args__ = ( db.UniqueConstraint('course_id', 'user_id'), { } )
id = db.Column(db.Integer, primary_key = True )
review_date = db.Column(db.DateTime)#default=db.func.now()
review_comment = db.Column(db.Text)
rating = db.Column(db.SmallInteger)
course_id = db.Column(db.Integer, db.ForeignKey('course.id') )
user_id = db.Column(db.Integer, db.ForeignKey('user.id') )
I want to select the most-reviewed courses, starting with those having at least two reviews. The following SQLAlchemy query worked fine with SQLite:
most_rated_courses = db.session.query(models.Review, func.count(models.Review.course_id)).\
    group_by(models.Review.course_id).\
    having(func.count(models.Review.course_id) > 1).\
    order_by(func.count(models.Review.course_id).desc()).all()
But when I switched to PostgreSQL in production, it gave me the following error:
ProgrammingError: (ProgrammingError) column "review.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT review.id AS review_id, review.review_date AS review_...
^
'SELECT review.id AS review_id, review.review_date AS review_review_date, review.review_comment AS review_review_comment, review.rating AS review_rating, review.course_id AS review_course_id, review.user_id AS review_user_id, count(review.course_id) AS count_1 \nFROM review GROUP BY review.course_id \nHAVING count(review.course_id) > %(count_2)s ORDER BY count(review.course_id) DESC' {'count_2': 1}
I tried to fix the query by adding models.Review in the GROUP BY clause but it did not work:
most_rated_courses = db.session.query(models.Review, func.count(models.Review.course_id)).\
    group_by(models.Review.course_id).\
    having(func.count(models.Review.course_id) > 1).\
    order_by(func.count(models.Review.course_id).desc()).all()
Can anyone please help me with this issue? Thanks a lot.
SQLite and MySQL both have the behavior that they allow a query that has aggregates (like count()) without applying GROUP BY to all other columns - which in terms of standard SQL is invalid, because if more than one row is present in that aggregated group, it has to pick the first one it sees for return, which is essentially random.
So your query for Review basically returns to you the first "Review" row for each distinct course id - like for course id 3, if you had seven "Review" rows, it's just choosing an essentially random "Review" row within the group of "course_id=3". I gather the answer you really want, "Course", is available here because you can take that semi-randomly selected Review object and just call ".course" on it, giving you the correct Course, but this is a backwards way to go.
But once you get on a proper database like PostgreSQL, you need to use correct SQL. The data you need from the "review" table is just the course_id and the count, nothing else, so query just for that (first assume we don't actually need to display the counts, that's in a minute):
most_rated_course_ids = session.query(
Review.course_id,
).\
group_by(Review.course_id).\
having(func.count(Review.course_id) > 1).\
order_by(func.count(Review.course_id).desc()).\
all()
but that's not your Course object - you want to take that list of ids and apply it to the course table. We first need to keep our list of course ids as a SQL construct, instead of loading the data - that is, turn it into a derived table by converting the query into a subquery (change the word .all() to .subquery()):
most_rated_course_id_subquery = session.query(
Review.course_id,
).\
group_by(Review.course_id).\
having(func.count(Review.course_id) > 1).\
order_by(func.count(Review.course_id).desc()).\
subquery()
One simple way to link that to Course is to use an IN:
courses = session.query(Course).filter(
Course.id.in_(most_rated_course_id_subquery)).all()
but that's essentially going to throw away the "ORDER BY" you're looking for and also doesn't give us any nice way of actually reporting on those counts along with the course results. We need to have that count along with our Course so that we can report it and also order by it. For this we use a JOIN from the "course" table to our derived table. SQLAlchemy is smart enough to know to join on the "course_id" foreign key if we just call join():
courses = session.query(Course).join(most_rated_course_id_subquery).all()
Then, to get at the count, we need to add that to the columns returned by our subquery, along with a label so we can refer to it:
most_rated_course_id_subquery = session.query(
Review.course_id,
func.count(Review.course_id).label("count")
).\
group_by(Review.course_id).\
having(func.count(Review.course_id) > 1).\
subquery()
courses = session.query(
Course, most_rated_course_id_subquery.c.count
).join(
most_rated_course_id_subquery
).order_by(
most_rated_course_id_subquery.c.count.desc()
).all()
A great article I like to point out to people about GROUP BY and this kind of query is SQL GROUP BY techniques, which points out the common need for the "select from A join to (subquery of B with aggregate/GROUP BY)" pattern.
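For what it's worth, the SQL that final query compiles to roughly follows that pattern. This is only a sketch, not the exact statement SQLAlchemy emits, and the column list is abbreviated to course.*:
-- derived table: course ids with their review counts, filtered to counts > 1
SELECT course.*, counts.count
FROM course
JOIN (
    SELECT review.course_id AS course_id, count(review.course_id) AS count
    FROM review
    GROUP BY review.course_id
    HAVING count(review.course_id) > 1
) AS counts ON course.id = counts.course_id
ORDER BY counts.count DESC;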

Postgres - Get data from each alias

In my application I have a query that does multiple joins with a table position, just like this:
SELECT *
FROM (...) as trips
join trip as t on trips.trip_id = t.trip_id
left outer join vehicle as v on v.vehicle_id = t.trip_vehicle_id
left outer join position as start on trips.start_position_id = start.position_id and start.position_vehicle_id = v.vehicle_id
left outer join position as "end" on trips.end_position_id = "end".position_id and "end".position_vehicle_id = v.vehicle_id
left outer join position as last on trips.last_position_id = last.position_id and last.position_vehicle_id = v.vehicle_id;
My table position has 35 columns (for example, position_id).
When I run the query, the table position should appear three times in the result: start, end, and last. But Postgres cannot distinguish between, for example, start.position_id, end.position_id, and last.position_id, so these three columns are merged and appear as one, position_id.
Because the data from start.position_id and end.position_id are different, the single position_id column that appears in the result ends up empty.
How can I get each group of data separately, for example all the columns from the alias 'start', without having to rename every column like start.position_id AS start_position_id? In MySQL I can do this by calling fetch_fields and giving it an alias, like 'start'.
But how can I do this in Postgres?
Best Regards,
Nuno Oliveira
My understanding is that you can't (or find it difficult to) tell which table each column with a shared name (such as "position_id") belongs to, but you only need to see one of the sets of shared columns at any one time. If that is the case, use tablename.* in your SELECT, so SELECT trips.*, start.*... would show the columns from trips and start, but no columns from the other tables involved in the join.
SELECT [...,] start.* [,...] FROM [...] atable AS start [...]
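Applied to the query in the question, that would look roughly like this, returning only the trips and start columns; the (...) subquery placeholder is kept from the original, and you would rerun with "end".* or last.* in the SELECT list to see the other position aliases:
SELECT trips.*, start.*
FROM (...) as trips
join trip as t on trips.trip_id = t.trip_id
left outer join vehicle as v on v.vehicle_id = t.trip_vehicle_id
left outer join position as start on trips.start_position_id = start.position_id and start.position_vehicle_id = v.vehicle_id
left outer join position as "end" on trips.end_position_id = "end".position_id and "end".position_vehicle_id = v.vehicle_id
left outer join position as last on trips.last_position_id = last.position_id and last.position_vehicle_id = v.vehicle_id;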