Databricks Error: Cannot perform Merge as multiple source rows matched and attempted to modify the same target row in the Delta table in possibly conflicting ways - PySpark

I am attempting to carry out the following merge statement with PySpark on the table below.
try:
    # Perform a merge into the existing table
    # (deltadf is a delta.tables.DeltaTable, partdf is the source DataFrame)
    if allowDuplicates == "true":
        (deltadf.alias("t")
            .merge(
                partdf.alias("s"),
                "s.primary_key_hash = t.primary_key_hash")
            .whenNotMatchedInsertAll()
            .execute()
        )
    else:
        (deltadf.alias("t")
            .merge(
                partdf.alias("s"),
                "s.primary_key_hash = t.primary_key_hash")
            .whenMatchedUpdateAll("s.change_key_hash <> t.change_key_hash")
            .whenNotMatchedInsertAll()
            .execute()
        )
except Exception:
    raise  # the original error handling is not shown in the question
However, I keep on getting the error:
Cannot perform Merge as multiple source rows matched and attempted to
modify the same target row in the Delta table in possibly conflicting
ways. By SQL semantics of Merge, when multiple source rows match on
the same target row, the result may be ambiguous as it is unclear
which source row should be used to update or delete the matching
target row.
Could someone look at my code and tell me why I keep getting this error?
Id                                      SinkCreatedOn          SinkModifiedOn
AC28CA8A-80B6-EC11-983F-0022480078D3    15/12/2022 14:02:51    15/12/2022 14:02:51
AC28CA8A-80B6-EC11-983F-0022480078D3    16/12/2022 18:30:59    16/12/2022 18:30:59
AC28CA8A-80B6-EC11-983F-0022480078D3    16/12/2022 18:55:04    16/12/2022 18:55:04
AC28CA8A-80B6-EC11-983F-0022480078D3    20/12/2022 16:26:45    20/12/2022 16:26:45
AC28CA8A-80B6-EC11-983F-0022480078D3    22/12/2022 17:27:45    22/12/2022 17:27:45
AC28CA8A-80B6-EC11-983F-0022480078D3    22/12/2022 17:57:48    22/12/2022 17:57:48
I am going to use the following dedup code as suggested:
from pyspark.sql.window import Window
from pyspark.sql.functions import row_number
df2 = partdf.withColumn("rn", row_number().over(Window.partitionBy("primary_key_hash").orderBy("Id")))
df3 = df2.filter("rn = 1").drop("rn")
display(df3)
To make this work with my merge statement, would the code need to look like the following?
try:
    # Perform a merge into the existing table, using the de-duplicated df3
    if allowDuplicates == "true":
        (deltadf.alias("t")
            .merge(
                df3.alias("s"),
                "s.primary_key_hash = t.primary_key_hash")
            .whenNotMatchedInsertAll()
            .execute()
        )
    else:
        (deltadf.alias("t")
            .merge(
                df3.alias("s"),
                "s.primary_key_hash = t.primary_key_hash")
            .whenMatchedUpdateAll("s.change_key_hash <> t.change_key_hash")
            .whenNotMatchedInsertAll()
            .execute()
        )
except Exception:
    raise  # the original error handling is not shown in the question
You will notice that I have replaced partdf with df3 in the merge statement.

I tried to reproduce the scenario and got the same error.
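As the error says, when a MERGE updates (or deletes) matched rows, no target row may be matched by more than one source row on the merge condition. Here that means the source must not contain duplicate primary_key_hash values for keys that already exist in the target. Delta Lake performs this check automatically to prevent ambiguous updates and inconsistent data. In the data shown, all six rows share the same Id, so if primary_key_hash is derived from Id they all match the same target row (the insert-only allowDuplicates branch has no whenMatched clause, so the error presumably comes from the else branch).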
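To make the rule concrete, here is a toy, pure-Python model of the upsert branch (whenMatchedUpdateAll with the change_key_hash condition plus whenNotMatchedInsertAll) working on lists of dicts. It is only an illustration, not Delta Lake's implementation: merge_upsert and AmbiguousMergeError are hypothetical names, and the model simply rejects the merge whenever a target key occurs more than once in the source, which is the situation the error message describes (the real engine may apply further rules, for example around clause conditions). On an actual cluster you can list the offending keys before merging with partdf.groupBy("primary_key_hash").count().filter("count > 1").

```python
from collections import defaultdict


class AmbiguousMergeError(Exception):
    """Hypothetical stand-in for Delta's 'multiple source rows matched' error."""


def merge_upsert(target, source, key="primary_key_hash", change="change_key_hash"):
    """Toy model of
        t.merge(s, "s.<key> = t.<key>")
         .whenMatchedUpdateAll("s.<change> <> t.<change>")
         .whenNotMatchedInsertAll()
    on lists of dicts. Returns new target rows; the inputs are not modified.
    """
    source_by_key = defaultdict(list)
    for row in source:
        source_by_key[row[key]].append(row)

    target_keys = {row[key] for row in target}

    # Each target row may be matched by at most one source row; otherwise it is
    # unclear which source row should win, so the whole merge is rejected.
    for k in target_keys:
        if len(source_by_key.get(k, [])) > 1:
            raise AmbiguousMergeError(
                f"{len(source_by_key[k])} source rows match target key {k!r}")

    merged = []
    for row in target:
        matches = source_by_key.get(row[key], [])
        if matches and matches[0][change] != row[change]:
            merged.append(dict(matches[0]))  # whenMatchedUpdateAll(condition)
        else:
            merged.append(dict(row))         # unmatched or unchanged: keep as is

    # whenNotMatchedInsertAll: every source row whose key is not in the target
    merged.extend(dict(row) for row in source if row[key] not in target_keys)
    return merged


target = [{"primary_key_hash": "h1", "change_key_hash": "c0"}]
source = [
    {"primary_key_hash": "h1", "change_key_hash": "c1"},
    {"primary_key_hash": "h1", "change_key_hash": "c2"},
]
try:
    merge_upsert(target, source)
except AmbiguousMergeError as err:
    print("merge rejected:", err)  # merge rejected: 2 source rows match target key 'h1'

# With one source row per key the same merge goes through
print(merge_upsert(target, source[-1:]))  # [{'primary_key_hash': 'h1', 'change_key_hash': 'c2'}]
```

Note that duplicate source keys that match no target row are not a problem in this model: they are simply inserted, once per source row.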
The simple solution is to de-duplicate the source before the MERGE. You can eliminate duplicates with a window function, with dropDuplicates(), or with any other logic that suits your needs:
from pyspark.sql.window import Window
from pyspark.sql.functions import row_number
df2 = partdf.withColumn("rn", row_number().over(Window.partitionBy("P_key").orderBy("Id")))
df3 = df2.filter("rn = 1").drop("rn")
display(df3)
This executed successfully with the de-duplicated DataFrame created above.
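One detail in the window-function de-duplication is worth checking: ordering by Id inside a partition on the key picks an arbitrary row when Id is the same for every row in the partition, as it presumably is in the question's data. To keep the most recent version instead, order by the modification time descending. Because SinkModifiedOn is shown as a dd/MM/yyyy string, it should be parsed before sorting, e.g. in PySpark with F.to_timestamp("SinkModifiedOn", "dd/MM/yyyy HH:mm:ss").desc() in the orderBy. The plain-Python sketch below (not PySpark, so it runs without a cluster) mimics row_number() over (partition by key order by parsed timestamp desc) followed by rn = 1; dedup_latest is a hypothetical helper, and using Id as the partition key in place of primary_key_hash is an assumption.

```python
from collections import defaultdict
from datetime import datetime

# Spark pattern dd/MM/yyyy HH:mm:ss written as a strptime format
TS_FORMAT = "%d/%m/%Y %H:%M:%S"

# Rows from the question's table. Id stands in for primary_key_hash here
# (assumption: the hash is derived from Id, so equal Ids mean equal hashes).
rows = [
    {"Id": "AC28CA8A-80B6-EC11-983F-0022480078D3", "SinkModifiedOn": "15/12/2022 14:02:51"},
    {"Id": "AC28CA8A-80B6-EC11-983F-0022480078D3", "SinkModifiedOn": "16/12/2022 18:30:59"},
    {"Id": "AC28CA8A-80B6-EC11-983F-0022480078D3", "SinkModifiedOn": "16/12/2022 18:55:04"},
    {"Id": "AC28CA8A-80B6-EC11-983F-0022480078D3", "SinkModifiedOn": "20/12/2022 16:26:45"},
    {"Id": "AC28CA8A-80B6-EC11-983F-0022480078D3", "SinkModifiedOn": "22/12/2022 17:27:45"},
    {"Id": "AC28CA8A-80B6-EC11-983F-0022480078D3", "SinkModifiedOn": "22/12/2022 17:57:48"},
]


def dedup_latest(rows, key, ts_col):
    """Keep one row per key: the one with the latest parsed timestamp.

    Mirrors row_number() OVER (PARTITION BY key ORDER BY ts DESC) ... WHERE rn = 1.
    """
    partitions = defaultdict(list)  # PARTITION BY key
    for row in rows:
        partitions[row[key]].append(row)

    kept = []
    for part in partitions.values():
        # ORDER BY ts DESC on the parsed value: comparing the raw dd/MM/yyyy
        # strings would order by day of month first, not by actual date
        ordered = sorted(part, key=lambda r: datetime.strptime(r[ts_col], TS_FORMAT), reverse=True)
        kept.append(ordered[0])  # rn = 1
    return kept


for row in dedup_latest(rows, key="Id", ts_col="SinkModifiedOn"):
    print(row["Id"], row["SinkModifiedOn"])
```

For this data the only row kept is the 22/12/2022 17:57:48 version, so the merge source then holds a single row for that key.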

Related

Html email and fluid table with changing borders

I have some design requirements for an email template where I have two "challenges":
two columns need to flip to one column
some visible border lines need to be switched from vertical to horizontal
The following shows how it should look (2 columns on the left for desktop, 1 column on the right for mobile):
The whole email is based on responsive tables and the two-column part is implemented as follows right now:
<table cellpadding="5" cellspacing="0"
style="background-color:#F6F6F6; font-size: 14px; color:#58595b; width:100%; border-collapse:collapse;">
<tr><td align="center" valign="top" height="10" colspan=2 style="line-height: 10px; font-size: 10px;"> </td></tr>
<tr>
<td valign="top" style="border-right: 1.5px solid; border-color: #d0d0d0; padding-right:40px; text-align:right; width:42%; vertical-align:top;">
Start point
</td>
<td valign="top" style="padding-left:40px; vertical-align:top;">
<strong>Fri, January 12, 2023 12:00</strong>
<br />
Harbour, Seatown
</td>
</tr>
<tr><td align="center" valign="top" height="10" colspan=2 style="line-height: 10px; font-size: 10px;"> </td></tr>
<tr>
<td valign="top" style="border-right: 1.5px solid; border-color: #d0d0d0; padding-right:40px; text-align:right; width:42%; vertical-align:top;">
End point
</td>
<td valign="top" style="padding-left:40px; vertical-align:top;">
<strong>Sun, January 18, 2023 10:00</strong>
<br />
Central Station, Capital
</td>
</tr>
<tr><td align="center" valign="top" height="10" colspan=2 style="line-height: 10px; font-size: 10px;"> </td></tr>
</table>
I tried the approach with having a left and a right table (explained here) but the problem is that I do not use fixed widths.
How could I achieve the required design with a responsive mail template?
You will need to use the technique as outlined in the link, but if you want to use percentages instead of fixed widths, then use width="50%".
This works because of a basic HTML behaviour: if a block doesn't fit in the available space, it automatically wraps underneath.
Two tables at 50% always fit side by side, so to get the stacking without a fixed pixel width you will need to add an @media query that forces it (otherwise the tables would never stack).
e.g.
@media (max-width: 620px) {
  .table-left, .table-right {
    width: 100% !important;
  }
}
(The article you link to is a bit outdated: don't use [class=...] attribute selectors, just use normal class selectors. Gmail may strip the entire <style> section if it doesn't like something in it, and this is one of those things it doesn't like.)
I prefer an override (max-width and !important) because you want as much as possible inline, using embedded styles only where strictly necessary.
But that's also why it's best to use a fixed pixel width: some email clients do not respect embedded styles (styles in the <head>). GANGA emails (one form of Gmail account) fall into this category. If you rely entirely on the @media query for stacking, those clients will not stack even when they need to.
To override the border, put a class on the <td> and reference it in the @media query, e.g.
@media (max-width: 620px) {
  .border {
    border-right: none !important;
  }
  .border-top {
    border-bottom: 1.5px solid #d0d0d0 !important;
  }
}
As the two cells don't share the same border structure (they don't both need border-bottom), one of the <td>s will need a different class. Here I'm assuming it's the first one, i.e. <td class="border border-top">.

grouping tr but browser closes tbody prematurely

When using LitElement to render data dynamically, the browser keeps inserting a <tbody>, which defeats any effort to "group" table rows.
render() {
  return html`
    <table style="border-collapse:collapse;border:solid 1px #666;">
      ${this.rows.map((row) => {
        var disp = html`
          ${(row == "FOUR") ? html`</tbody>` : ''}
          ${(row == "TWO") ? html`
            <tbody style="border:solid 2px #F00; border-collapse:separate;">
              <tr>
                <td>SOME HEADER</td>
              </tr>
          ` : ''}
          <tr>
            <td>${row}</td>
          </tr>
        `;
        return disp;
      })}
    </table>
  `;
} // render()

constructor() {
  super();
  this.rows = ['ONE', 'TWO', 'THREE', 'FOUR'];
}
The result is that the tbody is closed immediately after the closing tr of "SOME HEADER", instead of after the "FOUR" tr, which is what we want in this case.
It looks like the browser does this by itself because it always wants to insert a tbody whenever a tr is written to the DOM?
So I suspect this would happen to any framework that renders data dynamically - but have not verified this.
I assume this is not going to be "fixed" anytime soon.
That being the case, does anyone have a suggestion for how to group tr's together in a block?
Thanks!
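If you have an unclosed <tbody> in a document fragment, the browser will close it for you. Each html`...` template is parsed as its own fragment, so the <tbody> opened in the "TWO" template is closed at the end of that template, and the stray </tbody> in the "FOUR" template is ignored.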
Instead nest the <tr> you want to group inside a template that holds both the <tbody> and closing </tbody>:
render() {
  // groupRows is a hypothetical helper: it should turn this.rows into
  // [{ name: 'SOME HEADER', rows: ['TWO', 'THREE', 'FOUR'] }, ...]
  const groups = this.groupRows(this.rows);
  return html`
    <table style="border-collapse:collapse;border:solid 1px #666;">
      ${groups.map(g => html`
        <tbody style="border:solid 2px #F00; border-collapse:separate;">
          <tr><th>${g.name}</th></tr>
          ${g.rows.map(r => html`
            <tr><td>${r}</td></tr>`)}
        </tbody>`)}
    </table>`;
}

iCheck checkbox in FooTable header <th> not working

In FooTable 3.1.4 I want to use a prettified iCheck checkbox for check-all functionality in the header of the table.
This is the HTML without i-Check:
<th data-type="html" data-sortable="false"
data-filterable="false" style="display: table-cell;"
class="footable-last-visible">Choose
<input name="check_all" class="all" type="checkbox">
</th>
When we run this script without iCheck, it runs fine. However, applying iCheck makes the prettified checkbox unclickable: we are unable to check or uncheck it.
This is the HTML with i-Check applied:
<th class="footable-last-visible" data-type="html" data-sortable="false"
data-filterable="false" style="display: table-cell;">Kies
<div class="icheckbox_square-green" style="position: relative;">
<input type="checkbox" name="check_all" class="all"
style="position: absolute; opacity: 0;">
<ins style="position: absolute; top: 0%; left: 0%; display: block; width: 100%; height: 100%; margin: 0px; padding: 0px; background: rgb(255, 255, 255) none repeat scroll 0% 0%; border: 0px none; opacity: 0;"
class="iCheck-helper">
</ins>
</div>
</th>
So it seems FooTable does not accept the iCheck-modified HTML in the table header. I did find a (closed) GitHub issue addressing the problem:
"the issue was that the sorting component worked off of a click on the
entire TH element and had a call to e.preventDefault() in the handler.
This was basically killing the default click behavior of elements
placed within the header element. I've since removed this limitation
and it will be released in the next version shortly."
But that post does not say from which version of FooTable the problem is fixed.
Or have I made a mistake in the code? Any input is much appreciated.
Your script should be:
// Initialise FooTable first...
$('.table').footable();
// ...then bind the iCheck ifChecked/ifUnchecked events
$('#checkall').on('ifChecked ifUnchecked', function (event) {
  if (event.type == 'ifChecked')
    $('.check').iCheck('check');
  else
    $('.check').iCheck('uncheck');
});

iTextSharp XMLWorker does not work on css border-collapse: collapse;

The HTML code is:
<html>
<head>
<title>test</title>
<style type="text/css">
table {
border-collapse: collapse;
}
table tr td {
border: 2px solid black;
}
</style>
</head>
<body>
<table>
<tr>
<td>row 1 cell 1</td>
<td>row 1 cell 2</td>
</tr>
<tr>
<td>row 2 cell 1</td>
<td>row 2 cell 2</td>
</tr>
</table>
</body>
</html>
But in the output PDF file, the inner borders are double the width.
I'm using the latest iTextSharp 5.5.6 & XML Worker 5.5.6.
Does anyone have any idea why?
Thanks!
Leo
In XML Worker, border-collapse: collapse; does not appear to actually collapse the borders; it just moves them very close to each other. If you look carefully, you can see a fine line in the middle of the fat border. I could only see it when the PDF was opened in Chrome, not in my PDF reader.
So I ended up setting the top and left borders on the table, and the right and bottom borders on the cells in the table, which gave me the desired thin line of only 1px instead of 2px (i.e. only one line instead of two).
.tableborder {
border-collapse: collapse;
border-spacing: 0;
border-top-color: black;
border-top-width: 1px;
border-top-style: solid;
border-left-color: black;
border-left-width: 1px;
border-left-style: solid;
}
.tableborder th, .tableborder td {
border-collapse: collapse;
border-spacing: 0;
border-right-color: black;
border-right-width: 1px;
border-right-style: solid;
border-bottom-color: black;
border-bottom-width: 1px;
border-bottom-style: solid;
}
Not pretty but it does the job ;-)
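A rough width count shows why this workaround helps. It assumes (an inference, not something stated in the answers) that XML Worker draws each cell's own border in full and never merges the borders of neighbouring cells. With a border of width $w$ on all four sides of every cell, an inner edge then carries two borders while an outer edge carries one; with the workaround, every edge carries exactly one border (the table's top/left, or a cell's right/bottom):

$$
w_{\text{inner}} = w_{\text{right of left cell}} + w_{\text{left of right cell}} = 2w,
\qquad
w_{\text{outer}} = w,
\qquad
w^{\text{workaround}}_{\text{inner}} = w_{\text{right of left cell}} = w .
$$

For the question's CSS ($w = 2\,\text{px}$) that gives 4 px inner borders next to 2 px outer ones, which matches the doubled inner borders reported; with the workaround's 1 px borders every line is 1 px wide.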
@user538220, you mentioned a simple workaround in your comment. Was it something like this, or was it a better solution?
The code below worked for me.
table th,td {
border-right-width: 0px;
border-bottom-width: 0px;
border-left-width: 0px;
border-top-width: 0px;
}

How can I add cellspacing to a PDF table when parsing HTML using XMLWorker and iText?

I am using XMLWorker and iText to convert HTML to PDF.
My HTML has a table, and I need to set its cellspacing=0 and cellpadding=0.
Does anyone know how to do it?
In HTML, I saw that I can replace these attributes with the following style:
border-collapse: collapse;
border-spacing: 0px ;
border : 0;
padding : 0;
thanks
Tami
I've tried what you're doing using the CSS you propose, and it works for me.
You can find my test here: ParseHtmlTable5
This is my HTML (including the CSS): table3_css.html
<html>
<head>
<style>
table, td {
border: 1px solid green;
border-spacing: 0px;
padding: 0px;
}
</style>
</head>
<body>
<table class='test'>
<tr>
<th>Firstname</th>
<th>Lastname</th>
<th>Savings</th>
</tr>
<tr>
<td>Peter</td>
<td>Griffin</td>
<td>$100</td>
</tr>
<tr>
<td>Lois</td>
<td>Griffin</td>
<td>$150</td>
</tr>
<tr>
<td>Joe</td>
<td>Swanson</td>
<td>$300</td>
</tr>
<tr>
<td>Cleveland</td>
<td>Brown</td>
<td>$250</td>
</tr>
</table>
</body>
</html>
I suggest that you compare your HTML with mine to find out what you're doing wrong.
You should also use the latest version of XML Worker and iText(Sharp) as we've improved HTML parsing significantly in the latest releases.
Note that I've defined a solid, green border of 1px to prove that there is no padding and no spacing between the cells. If you change the CSS like this:
<style>
table, td {
border: 0px;
border-spacing: 0px;
padding: 0px;
}
</style>
You'll get the (ugly) version of a table without borders, without spacing between the cells and without padding inside the cells.