Error merging files: variable CLAIM_NUMBER does not uniquely identify observations in the using data. How do I fix that? - merge

In Stata, the error says that variable CLAIM_NUMBER does not uniquely identify observations in the using data. How do I fix that?
My code:
cd"abcd"
use NY2019_2021
merge m:1 CLAIM_NUMBER using FULLNY
keep if _merge==3
drop _merge```

I was able to fix it by dropping the observations that weren't unique from the using dataset:
code:
duplicates list CLAIM_NUMBER          // show which CLAIM_NUMBER values repeat
duplicates drop CLAIM_NUMBER, force   // keep one (arbitrary) observation per value
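A minimal sketch of the whole fix, assuming the duplicates live in FULLNY (the using dataset) and that keeping an arbitrary observation per CLAIM_NUMBER is acceptable; FULLNY_unique is a hypothetical filename:
code:
* drop duplicates from the using dataset first
use FULLNY, clear
duplicates drop CLAIM_NUMBER, force
save FULLNY_unique, replace
* now the m:1 merge succeeds
use NY2019_2021, clear
merge m:1 CLAIM_NUMBER using FULLNY_unique
keep if _merge == 3
drop _merge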

Related

Which Stage Is Used to Combine Two Data Streams Without a Common Key Field in DataStage (IBM)

I'm using DataStage version 11.7 and encountered the error message below from the Lookup stage while compiling the job:
"The supplied expression was empty."
In the Lookup stage, there are two links from two Transformers, and there is no common key column between the two datasets.
I googled how to merge or combine two datasets from two Transformers without a common key column, but I couldn't find a proper way to solve this issue or to implement my job in DataStage.
Does anyone know how to solve this problem? If so, please let me know which stage fits my job or how to fix the error. I would appreciate it.
If you need an n:m join, add a dummy column to each input link and fill it with a constant value like 1, then join over that column. Decide whether multiple matches should produce multiple output rows or whether the first match 'wins', which would effectively be a random n:1, since every row matches when joining over a constant value. A sketch of the idea follows below.
But if you need to join specific rows, that indicates there actually is a common key; it just isn't obvious or visible. Either transform the sources so that they get a common key, or use an anchor table that provides the relations: join it to the first source, then join the second source.
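DataStage itself is wired up in the designer, but the dummy-key idea is easy to see in plain code. A minimal Java sketch of what joining over a constant value means; the two lists are hypothetical stand-ins for the input links:

import java.util.List;

public class DummyKeyJoin {
    public static void main(String[] args) {
        List<String> left  = List.of("a1", "a2");
        List<String> right = List.of("b1", "b2", "b3");

        // Every row on both sides carries the same dummy key (say, 1),
        // so every left row matches every right row: a full n:m cross join.
        for (String l : left) {
            for (String r : right) {
                System.out.println(l + " | " + r);  // 2 x 3 = 6 output rows
            }
        }
        // "First match wins" (a pseudo n:1) would instead pair each
        // left row with right.get(0) only.
    }
}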

Spark - Update target data if primary keys match?

Is it possible to overwrite a record in the target if specific conditions are met, using Spark, without reading the target into a dataframe? I know we can do this if both sets of data are loaded into dataframes, but I would like to know if there is a way to perform this action without loading the target. Basically, a way to specify overwrite/update conditions.
I am guessing no, but I figured I would ask before I dive into this project. I know we have the write options of append and overwrite. What I really want is: if the values of a few specific columns already exist in the target, overwrite that record and fill in the other columns with the new data. For example:
File1:
id,name,date,score
1,John,"1-10-17",35
2,James,"1-11-17",43
File2:
id,name,date,score
3,Michael,"1-10-17",23
4,James,"1-11-17",56
5,James,"1-12-17",58
I would like the result to look like this:
id,name,date,score
1,John,"1-10-17",35
3,Michael,"1-10-17",23
4,James,"1-11-17",56
5,James,"1-12-17",58
Basically, the name and date columns act like primary keys in this scenario. I want an update to occur when those two columns match; otherwise a new record should be made. As you can see, id 4 overwrites id 2, but id 5 is appended because the date did not match. Thanks ahead, guys!
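The guess is essentially right for plain file sinks: the standard approach reads the target, anti-joins it against the new data on the key columns, and unions the new data back in. A sketch with Spark's Java API, assuming headered CSVs at the hypothetical paths file1.csv and file2.csv:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class UpsertByNameDate {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("upsert").getOrCreate();

        Dataset<Row> target  = spark.read().option("header", "true").csv("file1.csv");
        Dataset<Row> updates = spark.read().option("header", "true").csv("file2.csv");

        // Keep only target rows whose (name, date) has no match in the updates...
        Dataset<Row> kept = target.join(
                updates,
                target.col("name").equalTo(updates.col("name"))
                        .and(target.col("date").equalTo(updates.col("date"))),
                "left_anti");

        // ...then append every update row: matches overwrite, the rest are new.
        kept.union(updates).show();
    }
}

Table formats such as Delta Lake add a declarative MERGE for exactly this case, but the plain append/overwrite writer cannot express it.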

Talend tMap Set Default Value for Rejected Inner Joins and connect them with the main data flow

I've got the following problem.
I have several tMaps, each with a lookup, and at the end all the data is written to a DB. The following mockup shall illustrate it:
There can be values in the main data stream which are not found in the lookup tables. For these values there is a reject path which catches them from the specific tMap.
Requirements:
In case of a rejected inner join, the looked-up value shall be set to a default (for example 0, which could be done in the schema of the tMap), and these "corrected" records should then be added back to the "normal" main data flow and processed by the next lookup.
The tUnite component is not able to handle these cases because it cannot exist in a data flow loop.
Does anybody have an idea how to solve this problem?
Cheers.
The answer was so easy that I didn't see it at first. I just have to change the join model from inner join to left join, so all the formerly rejected values come through with a null value. Afterwards I can check the columns in the tMap and set them to a default value if they are null:
row1.id == null ? 0 : row1.id
Cheers.
If I understand correctly what you are trying to accomplish, you will have to use staging files or staging tables on the database. Once you get the rejected rows, write them to a file or table. The accepted rows also go to a staging table (a different one than the rejected rows). Then you can union both tables or files by reading them back. The key point is having a staging structure. I attach a picture of how it would look; in the picture the staging structure is a MySQL table.
Let me know if it helps!

DB2 ERRORCODE=-4229, SQLSTATE=null

I'm getting this error while executing a batch operation.
Use getNextException() to retrieve the exceptions for specific batched elements.ERRORCODE=-4229, SQLSTATE=null
I can't find any pointers for debugging this error.
I'd appreciate any help!
Search for the error on the IBM page:
http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=%2Fcom.ibm.db2z10.doc.java%2Fsrc%2Ftpc%2Fimjcc_rjvjcsqc.htm
-4229 Message text: text-from-getMessage
Explanation: An error occurred during a batch execution.
User response: Call SQLException.getMessage to retrieve specific information about the problem.
So it might be caused by any underlying error during the execution of your batch insert/update/delete.
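As the documentation says, -4229 is only a wrapper; the real cause sits on the chained exceptions. A minimal JDBC sketch of walking that chain (conn and sql are placeholders for your connection and batched statement):

import java.sql.BatchUpdateException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchErrorDump {
    static void runBatch(Connection conn, String sql) throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.addBatch();
            stmt.executeBatch();          // the batch insert/update/delete
        } catch (BatchUpdateException bue) {
            SQLException e = bue;
            while (e != null) {           // -4229 wraps the real per-row errors
                System.err.println("SQLCODE=" + e.getErrorCode()
                        + " SQLSTATE=" + e.getSQLState()
                        + ": " + e.getMessage());
                e = e.getNextException(); // e.g. -530 (foreign key), -803 (duplicate key)
            }
        }
    }
}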
For those who are looking for a solution to this error:
For me this was due to
THE INSERT OR UPDATE VALUE OF FOREIGN KEY constraint-name IS INVALID.
DB2 SQL Error: SQLCODE=-530, SQLSTATE=23503
In my case, this occurred because I had a unique covering index defined on two columns, and the combination of those two values was not unique when I was inserting the records.
For anyone who is still wondering: try inserting a unique record and check whether the error persists.
For me it was because of a duplicate entry of a foreign key.
In my case, this was due to having rows in the database with the same PK IDs that the sequence was generating. The solution can be to fix these "future" row IDs or to adapt the sequence so that it skips those numbers.

Get duplicate records in a large file using MapReduce

I have a large file containing more than 10 million lines. I want to find the duplicate lines using MapReduce.
How can I solve this problem?
Thanks for the help.
You need to make use of the fact that the default behaviour of MapReduce is to group values based on a common key.
So the basic steps required are:
Read each line of your file into your mapper, probably using something like TextInputFormat.
Set the output key (a Text object) to the value of each line. The content of the output value doesn't really matter; you can just use a NullWritable if you want.
In the reducer, check the number of values grouped for each key. If you have more than one value, you know you have a duplicate.
If you just want the duplicate lines, write out the keys that have multiple values. A sketch of these steps follows below.
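A sketch of those steps with Hadoop's Java API (class names are mine; job/driver setup omitted):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map: the whole line becomes the key; the value carries no information.
public class DuplicateLineMapper
        extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        context.write(line, NullWritable.get());
    }
}

// Reduce: identical lines arrive grouped; more than one value means a duplicate.
class DuplicateLineReducer
        extends Reducer<Text, NullWritable, Text, NullWritable> {
    @Override
    protected void reduce(Text line, Iterable<NullWritable> values, Context context)
            throws IOException, InterruptedException {
        int count = 0;
        for (NullWritable ignored : values) {
            count++;
        }
        if (count > 1) {
            context.write(line, NullWritable.get());  // emit only duplicated lines
        }
    }
}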