I have a table that has duplicate email addresses, and I need to insert just one of each into a temp table along with two other fields. There are many examples here, but I can't get any of them to work.
I ended up looking into MERGE, but I get the same result: all the records are getting inserted, and I'm at a loss. I tried many different samples, but every one inserts all the records. I went back to make sure the email addresses really are duplicates, and they are. Below is where I am now:
MERGE #EmailTable2 AS Target
USING (SELECT EMAIL, NAME, JOB_TITLE FROM b2b_cmas_list$ WHERE EMAIL IS NOT NULL) AS Source
ON (Target.EMAIL = Source.EMAIL)
WHEN NOT MATCHED BY TARGET THEN
INSERT (EMAIL, NAME, JOB_TITLE)
VALUES (Source.EMAIL, Source.NAME, Source.JOB_TITLE)
OUTPUT $action, inserted.*, deleted.*;
So any help getting this right would be appreciated.
This is not working because SQL doesn't know which of the two rows containing the same e-mail you want to choose. In other words: if EMAIL is the same, which (NAME, JOB_TITLE) pair is important and which can be discarded?
Some hints:
If it doesn't matter which item is chosen, simply group by EMAIL, selecting MAX(NAME) and MAX(JOB_TITLE), e.g.:
select EMAIL, max(NAME), max(JOB_TITLE) from b2b_cmas_list$ group by EMAIL
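Be warned, however, that this can mangle NAME/JOB_TITLE pairs: each MAX is computed independently, so the two values may come from different rows.
Try using ROW_NUMBER() OVER (PARTITION BY EMAIL ORDER BY <some column>) to arbitrarily select the first row in each group.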
Use a CURSOR to iterate over rows and skip duplicates.
Use a .NET CLR aggregate to, for example, concatenate the names and job titles for the same e-mail.
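The difference between the first two hints shows up on a tiny data set. Below is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server; it assumes SQLite 3.25 or later (needed for ROW_NUMBER() OVER), and the sample rows and the EmailTable2 temp table name are made up for illustration. The GROUP BY/MAX query returns a NAME and JOB_TITLE that never appeared together, while the ROW_NUMBER query copies one real row per e-mail into a temp table. In T-SQL the same inner SELECT should work as the source of an INSERT INTO #EmailTable2.

```python
import sqlite3


def make_email_db():
    # Hypothetical stand-in for the question's b2b_cmas_list$ table.
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "b2b_cmas_list$" (EMAIL TEXT, NAME TEXT, JOB_TITLE TEXT)')
    conn.executemany(
        'INSERT INTO "b2b_cmas_list$" VALUES (?, ?, ?)',
        [
            ("a@a.com", "Ann", "CTO"),
            ("a@a.com", "Bob", "CEO"),  # duplicate e-mail with a different pair
            ("b@b.com", "Cat", "CFO"),
            (None, "Dan", "Dev"),       # dropped by EMAIL IS NOT NULL
        ],
    )
    return conn


def max_per_column(conn):
    # Hint 1: GROUP BY EMAIL with an independent MAX() per column,
    # so NAME and JOB_TITLE may come from different source rows.
    return conn.execute(
        'SELECT EMAIL, MAX(NAME), MAX(JOB_TITLE) FROM "b2b_cmas_list$" '
        "WHERE EMAIL IS NOT NULL GROUP BY EMAIL ORDER BY EMAIL"
    ).fetchall()


def insert_one_per_email(conn):
    # Hint 2: number the rows inside each EMAIL group and keep row 1,
    # so NAME and JOB_TITLE always come from the same source row.
    conn.execute("CREATE TEMP TABLE EmailTable2 (EMAIL TEXT, NAME TEXT, JOB_TITLE TEXT)")
    conn.execute(
        """
        INSERT INTO EmailTable2 (EMAIL, NAME, JOB_TITLE)
        SELECT EMAIL, NAME, JOB_TITLE
        FROM (
            SELECT EMAIL, NAME, JOB_TITLE,
                   ROW_NUMBER() OVER (PARTITION BY EMAIL ORDER BY NAME) AS rn
            FROM "b2b_cmas_list$"
            WHERE EMAIL IS NOT NULL
        ) AS ranked
        WHERE rn = 1
        """
    )
    return conn.execute(
        "SELECT EMAIL, NAME, JOB_TITLE FROM EmailTable2 ORDER BY EMAIL"
    ).fetchall()


if __name__ == "__main__":
    print(max_per_column(make_email_db()))        # Bob/CTO: a pair that never existed
    print(insert_one_per_email(make_email_db()))  # Ann/CTO: a real row
```

Which duplicate survives is decided by the ORDER BY inside OVER (here NAME, purely as an example); ordering by a date or ID column would keep the newest or oldest row instead.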
And a note on your MERGE statement: it doesn't work as expected because SQL evaluates all the source rows at once, not row by row. So it is not the case that once one e-mail, e.g. "a@a.com", has been inserted, the next row with the same e-mail is skipped. All that matters is whether "a@a.com" was already in the target table when the statement started.
There are a lot of answers about this problem, but none of them retrieves the entire record, only the ID, and I need the whole record.
So, I have a table status_changes that is composed of 5 columns:
issue_id : the issue the change refers to
id: the id of the change, just a SERIAL
status_from and status_to: the status the issue had before the change, and the status it got after it
when: a timestamp of when this happened
Nothing too crazy, but now, I would like to have the "most recent status_change" for each issue.
I tried something like:
select id
from change
group by issue_id
having when = max(when)
But this obviously has 2 big problems:
1. the SELECT contains fields that are not in the GROUP BY
2. HAVING can't compare a non-grouped column with an aggregate function like this
I thought of ordering every group by when and using something like TOP(1), but I can't figure out how to do it...
Use PostgreSQL's DISTINCT ON:
SELECT DISTINCT ON (issue_id)
       id, issue_id, status_from, status_to, "when"
FROM status_changes
ORDER BY issue_id, "when" DESC;
This will return the first result (the one with the greatest when) for each issue.
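DISTINCT ON is PostgreSQL-specific. A more portable way to get the whole row is to number each issue's changes newest-first with ROW_NUMBER() and keep row 1; PostgreSQL supports this as well. Below is a minimal sketch in Python's sqlite3 (SQLite has no DISTINCT ON), assuming the timestamps are stored as ISO-8601 text so that string order matches time order; the sample rows are invented, and the id DESC tie-breaker for changes sharing a timestamp is an added assumption.

```python
import sqlite3


def make_status_db():
    conn = sqlite3.connect(":memory:")
    # "when" is a reserved word, so it is double-quoted everywhere.
    conn.execute(
        'CREATE TABLE status_changes (id INTEGER PRIMARY KEY, issue_id INTEGER, '
        'status_from TEXT, status_to TEXT, "when" TEXT)'
    )
    conn.executemany(
        'INSERT INTO status_changes (id, issue_id, status_from, status_to, "when") '
        "VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, "new", "open", "2024-01-01 09:00"),
            (2, 1, "open", "closed", "2024-01-05 17:30"),
            (3, 2, "new", "open", "2024-01-03 12:00"),
        ],
    )
    return conn


def latest_change_per_issue(conn):
    # Rank each issue's changes newest-first (id breaks ties) and keep rank 1,
    # so the whole row comes back, not just the id.
    return conn.execute(
        """
        SELECT id, issue_id, status_from, status_to, "when"
        FROM (
            SELECT id, issue_id, status_from, status_to, "when",
                   ROW_NUMBER() OVER (
                       PARTITION BY issue_id ORDER BY "when" DESC, id DESC
                   ) AS rn
            FROM status_changes
        ) AS ranked
        WHERE rn = 1
        ORDER BY issue_id
        """
    ).fetchall()
```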
I am a first-time user of Stack Overflow here!
I have been trying to figure this out for two days and have come up short.
We have a form that displays every single Client / Customer that we have at the firm, in a continuous form view.
We want to be able to display on this form the date, for each client, when we last communicated, or called, the client. (We want to be sure that we prevent a situation where we have not called a client for more than 1.5 months).
I have a query on a table tracking our correspondence, and other activities, regarding our clients that, in SQL, looks like:
' Query qryContactCommunications
SELECT Context, ID, NoteDate, ContactID
FROM Comments
WHERE (((Context)="Communication with Client" Or (Context)="Phone Call with Client"));
(ContactID is a foreign key to the Contacts table; we are tracking not only clients but also opposing parties and such)
This is intended to show all the dates we called or communicated with our clients.
I have a second query that then gets the Last Date from this table, grouped by ContactID, which looks like:
' Query qryLastCommunicationAgg
SELECT ContactID, Last(CommentDate) AS LastOfCommentDate
FROM Comments INNER JOIN qryContactCommunications
ON Comments.ID = qryContactCommunications.ID
GROUP BY Comments.ContactID;
My question is how do I get the query result (When we last called each client) into a text field in our Continuous form list? At the moment there will be some null values as well.
I have tried the expression:
=DLookUp("CommentDate","qryLastCommunicationAgg",[ID]=[ContactID])
But it does not work, giving me #Name?
Not sure what I did wrong. :-(
I appreciate greatly any assistance or suggestions!
-Glenn
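First, use Max instead of Last. Last returns the value from whichever record happens to come last in the query's result, which is not necessarily the latest date:
SELECT Comments.ContactID, Max(CommentDate) AS LastOfCommentDate
FROM Comments INNER JOIN qryContactCommunications
ON Comments.ID = qryContactCommunications.ID
GROUP BY Comments.ContactID;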
Then try with:
=DLookUp("LastCommentDate","qryLastCommunicationAgg","[ID]=" & [ContactID] & "")
(Glenn: below is the corrected DLookUp expression.)
=DLookUp("LastOfCommentDate","qryLastCommunicationAgg","[ContactID]=" & [ID] & "")
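The query logic behind this fix can be checked outside Access. This sketch uses Python's sqlite3 as a stand-in; the Contacts/Comments layout and the sample rows are assumptions (the question's queries use both NoteDate and CommentDate; CommentDate is used here). It takes the latest qualifying date per contact with MAX and returns NULL for contacts with no qualifying comment, which matches the Null that DLookUp returns when its criteria match nothing. It does not reproduce DLookUp itself.

```python
import sqlite3


def make_comments_db():
    # Hypothetical tables modelled on the question's description.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Contacts (ID INTEGER PRIMARY KEY, Name TEXT)")
    conn.execute(
        "CREATE TABLE Comments (ID INTEGER PRIMARY KEY, ContactID INTEGER, "
        "Context TEXT, CommentDate TEXT)"
    )
    conn.executemany(
        "INSERT INTO Contacts VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex"), (3, "Initech")],
    )
    conn.executemany(
        "INSERT INTO Comments VALUES (?, ?, ?, ?)",
        [
            (1, 1, "Phone Call with Client", "2024-03-01"),
            (2, 1, "Communication with Client", "2024-04-15"),
            (3, 1, "Internal note", "2024-05-01"),  # not a client contact
            (4, 2, "Phone Call with Client", "2024-02-10"),
        ],
    )
    return conn


def last_contact_dates(conn):
    # MAX gives the latest qualifying date per contact; the LEFT JOIN keeps
    # contacts with no qualifying comment and yields NULL (None) for them.
    return conn.execute(
        """
        SELECT c.ID, agg.LastOfCommentDate
        FROM Contacts AS c
        LEFT JOIN (
            SELECT ContactID, MAX(CommentDate) AS LastOfCommentDate
            FROM Comments
            WHERE Context IN ('Communication with Client', 'Phone Call with Client')
            GROUP BY ContactID
        ) AS agg ON agg.ContactID = c.ID
        ORDER BY c.ID
        """
    ).fetchall()
```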
My question is a variation on one already asked and answered (TSQL Delete Using Inner Joins) but I have a different level of complexity and I couldn't see a solution to it.
My requirement is to delete Special Prices which haven't been accessed in 90 days. Special Prices are keyed on Customer ID and Product ID, and the products have to be matched to a Customer Order Detail table which also contains a Customer ID and a Product ID. I want to write one function that will look at the Special Price table for each Customer, compare each Product for that Customer with the Customer Order Detail table, and, if the Maximum Order Date is more than 90 days earlier than today, delete it from the Special Price table.
I know I can use a CURSOR (slow but effective) but would prefer to have a single query like the one in the TSQL Delete Using Inner Joins example. Any ideas and/or is more information required?
I can't dig further into your system, but if it's OK for you, have a look at the MERGE statement; it might help you avoid a cursor. See the documentation for the MERGE statement.
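As an alternative to both a cursor and MERGE, a single set-based DELETE with a correlated subquery can express the requirement directly. Minimal sketch in Python's sqlite3; SpecialPrice, CustomerOrderDetail and their columns are hypothetical names based on the question's description, dates are stored as ISO-8601 text, and the cutoff is computed in Python (in SQL Server it would typically be DATEADD(day, -90, GETDATE())). Special prices with no matching orders at all have a NULL maximum date and are kept; a NOT EXISTS test for a recent order would remove those too.

```python
import sqlite3
from datetime import date, timedelta


def make_price_db():
    # Hypothetical tables and sample rows.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SpecialPrice (CustomerID INTEGER, ProductID TEXT, Price REAL)")
    conn.execute(
        "CREATE TABLE CustomerOrderDetail (CustomerID INTEGER, ProductID TEXT, OrderDate TEXT)"
    )
    conn.executemany(
        "INSERT INTO SpecialPrice VALUES (?, ?, ?)",
        [(1, "A", 9.5), (1, "B", 4.0), (2, "A", 8.0)],
    )
    conn.executemany(
        "INSERT INTO CustomerOrderDetail VALUES (?, ?, ?)",
        [(1, "A", "2024-05-20"), (1, "A", "2023-12-01"), (1, "B", "2024-01-10")],
    )
    return conn


def delete_stale_special_prices(conn, today, days=90):
    cutoff = (today - timedelta(days=days)).isoformat()
    # One statement: for each special price, find the latest order for the
    # same customer and product, and delete the price if that date is too old.
    cur = conn.execute(
        """
        DELETE FROM SpecialPrice
        WHERE (
            SELECT MAX(d.OrderDate)
            FROM CustomerOrderDetail AS d
            WHERE d.CustomerID = SpecialPrice.CustomerID
              AND d.ProductID = SpecialPrice.ProductID
        ) < ?
        """,
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount  # number of special prices deleted
```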
I'm not sure if this can be achieved in Google Refine at all. But basically, I have data like this.
The first table contains all the users. The second table shows all the friends. However, not every id in the second table's "friends" column exists in the first table, and I want to get rid of those. So, how can I look up each id in the friends column of the second table and remove the ids that don't exist in Table 1?
Put the two tables in different projects (we'll call them Table1 and Table2).
In Table2, on the friends column:
use "split multi-valued cells" to get each value on a separate row
convert the friends column to numbers (or, conversely, user_id in Table1 to strings) so both columns have the same type
use "add a new column based on this column" with the expression cross(cell,'Table1','user_id').length() and call the new column validity
This will return 0 if there's no match, 1 if there's a match or N>1 if there are duplicates in Table1
If you want the data back in the original format, set up a facet to filter on the validity column, blank out all the bad values and then use "join multi-valued cells" to reverse the split operation you did up front.
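The same clean-up can be written in plain Python, which may help check the Refine result. A minimal sketch, assuming the friends cell is a comma-separated list of ids; clean_friends and the sample data are hypothetical, and each step mirrors one of the Refine operations above.

```python
def clean_friends(user_ids, friend_rows, sep=","):
    """Drop friend ids that do not appear in user_ids (hypothetical helper)."""
    # Compare everything as strings so 7 and "7" match
    # (the same idea as converting one column's type in Refine).
    valid = {str(u).strip() for u in user_ids}
    cleaned = []
    for user_id, friends in friend_rows:
        # "split multi-valued cells": one value per entry.
        values = [v.strip() for v in friends.split(sep) if v.strip()]
        # cross(...).length() > 0: keep only ids that exist in Table1.
        kept = [v for v in values if v in valid]
        # "join multi-valued cells": put the surviving ids back in one cell.
        cleaned.append((user_id, sep.join(kept)))
    return cleaned
```

For example, with users 1, 2 and 3, a friends cell of "2, 5,3" becomes "2,3" and a cell of "7" becomes empty.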
I fixed some caching bugs with cross() for OpenRefine 2.6, so if the cross doesn't work, try stopping and restarting the Refine server.
Ok, I have a question relating to an issue I've previously had. I know how to fix it, but we are having problems trying to reproduce the error.
We have a series of procedures that create records based on other records. The records are linked to the primary record by way of a link_id. In a procedure that grabs this link_id, the query is
select @p_link_id = id  -- of the parent
from table
where thingy_id = (blah)
Now, there are multiple rows in the table for the activity, and some can be cancelled. The code I have doesn't exclude cancelled rows in the select statement, so if there are previously cancelled rows, their ids will appear in the select. If I exclude cancelled rows (by appending and status != 'C' to the where clause), there is always exactly one 'open' record selected.
This solves this issue. However, I need to be able to reproduce the issue in our development environment.
I've gone through a process where I've entered a whole heap of data, opening, cancelling, etc., to try and get this select statement to return an invalid id. However, whenever I run the select, the ids come back in order (sequence generated), but in the case where this error occurred, the select statement put what seems to be the first value into the variable.
For example.
ID Status
1 Cancelled
2 Cancelled
3 Cancelled
4 Open
Given the above, if I do a select for the ID I want, I want to get '4'. In the error, the result is 1. However, even if I enter in 10 cancelled records, I still get the last one in the select.
In Oracle, I know that if you SELECT INTO a variable and more than one row is returned, you get an error (TOO_MANY_ROWS). Sybase apparently can assign multiple values to a variable without erroring.
I'm thinking that there's either something to do with how the data is selected from the table, where the ids without a sort order don't come back in ascending order, or there's a database option (dboption) that makes a select into a variable keep the first or last value queried.
Edit: it looks like we can reproduce this error by rolling back stored procedure changes. However, the procs don't go anywhere near this link_id column. Is it possible that changes to the database architecture could break an index or something?
If more than one row is returned, the value that is stored will be the last value in the list, according to this.
If you haven't specified an order for retrieval via ORDER BY, then the order returned will be at the convenience of the database engine. It may very well vary by the database instance. It may be in the order created, or even appear "random" because of where the data is placed within the database block structure.
The moral of the story:
1. Always make singleton SELECTs return a single row
2. When #1 can't be done, use an ORDER BY to make sure the one you care about comes last
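The behaviour described above, where the variable ends up holding the value from the last row returned, can be simulated to see why both rules matter. This is a minimal sketch in Python's sqlite3, not Sybase: assign_like_sybase only mimics the multi-row assignment described in the answer, the link table and its rows are hypothetical, and ORDER BY id DESC stands in for an engine that happens to return rows in a different order.

```python
import sqlite3


def make_link_db():
    # Hypothetical table matching the ID/Status example in the question.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE link (id INTEGER PRIMARY KEY, thingy_id INTEGER, status TEXT)")
    conn.executemany(
        "INSERT INTO link (id, thingy_id, status) VALUES (?, ?, ?)",
        [(1, 7, "C"), (2, 7, "C"), (3, 7, "C"), (4, 7, "O")],
    )
    return conn


def assign_like_sybase(rows):
    # Simulates `select @v = id from ...` as described in the answer:
    # the variable is overwritten once per row, so the last row wins.
    value = None
    for (col,) in rows:
        value = col
    return value


def unsafe_link_id(conn, thingy_id):
    # No status filter, and the row order is whatever the engine returns
    # (DESC here simulates an unlucky order).
    rows = conn.execute(
        "SELECT id FROM link WHERE thingy_id = ? ORDER BY id DESC", (thingy_id,)
    )
    return assign_like_sybase(rows)


def safe_link_id(conn, thingy_id):
    # Rule 1: exclude cancelled rows so only one row should match.
    # Rule 2: ORDER BY anyway, so the wanted row comes last if more slip through.
    rows = conn.execute(
        "SELECT id FROM link WHERE thingy_id = ? AND status != 'C' ORDER BY id",
        (thingy_id,),
    )
    return assign_like_sybase(rows)
```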