Pyspark: comparing elements of RDD - pyspark

Similar to the question I posted here about working with DFs, how can I retrieve the first element in each sequence, but in this situation using RDDs? I want to compare each item to the one immediately before it. Items that repeat later in the sequence are acceptable, i.e. (67, 375, 14:20:14) might appear later in the RDD and should be kept.
Input
(67, 312, 12:09:00)
(67, 375, 12:23:00)
(67, 375, 12:25:00)
(67, 650, 12:26:00)
(75, 650, 12:27:00)
(75, 650, 12:29:00)
(75, 800, 12:30:00)
(67, 375, 14:20:14)
Output
(67, 312, 12:09:00)
(67, 375, 12:23:00)
(67, 650, 12:26:00)
(75, 650, 12:27:00)
(75, 800, 12:30:00)
(67, 375, 14:20:14)

This would work, but my only concern is that you cannot rely on the order of the output that a transformation on an RDD produces. To retain the order, I strongly suggest sorting by a column; fortunately, you have the timestamp column here.
If you are not planning to sort by timestamp, go with the DataFrame windowing approach. Even there, you might need sorting :)
rdd = sc.parallelize([(67, 312, "12:09:00"),
(67, 375, "12:23:00"),
(67, 375, "12:25:00"),
(67, 650, "12:26:00"),
(75, 650, "12:27:00"),
(75, 650, "12:29:00"),
(75, 800, "12:30:00") ])
# Use the first two columns as the key and the timestamp as the value.
rdd_fix_keys = rdd.map(lambda x: ((x[0], x[1]), x[2]))
# Keep the earliest timestamp for each key. Zero-padded HH:MM:SS strings
# compare correctly as text, so min() picks the first occurrence.
# (Reducing with lambda x, y: (x, y) nests tuples once a key has more than two values.)
rdd_pick_first_occurrence = rdd_fix_keys.reduceByKey(min)
# Flatten back to (col1, col2, timestamp) and sort by timestamp.
rdd_pick_first_occurrence.map(lambda x: (x[0][0], x[0][1], x[1])).sortBy(lambda x: x[2]).collect()
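Note: The grouping changes the order of the rows, which is why the last step sorts by timestamp. Also, because rows are grouped by key across the whole RDD, a key that reappears later, such as (67, 375, 14:20:14) in the question, is merged with its earlier occurrence rather than kept.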
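For comparison, here is a minimal sketch in plain Python of the "compare each item to the previous one" rule from the question. It assumes the rows are already ordered by timestamp and small enough to hold on the driver (for example after a sortBy followed by collect()); the helper name first_of_each_run is made up for illustration and is not part of PySpark.

```python
from itertools import groupby


def first_of_each_run(rows):
    """Keep the first row of every run of consecutive rows that share
    the same first two columns."""
    # groupby only merges *adjacent* rows with equal keys, so a key that
    # shows up again later starts a new run and is kept.
    return [next(group) for _, group in groupby(rows, key=lambda r: (r[0], r[1]))]


rows = [
    (67, 312, "12:09:00"),
    (67, 375, "12:23:00"),
    (67, 375, "12:25:00"),
    (67, 650, "12:26:00"),
    (75, 650, "12:27:00"),
    (75, 650, "12:29:00"),
    (75, 800, "12:30:00"),
    (67, 375, "14:20:14"),
]

result = first_of_each_run(rows)
for row in result:
    print(row)
```

Unlike grouping by key, this keeps (67, 375, "14:20:14") because it only compares neighbouring rows.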

Interrupt a line in Chart.js lines

I have a weird situation in Chart.js (see the picture).
Basically, I have a dataset with 4 dates and 4 numbers. All 4 values are 1 (the value doesn't matter).
But the real data needs to show just 2 intervals, (1/1/2020 -> 2/2/2020) and (3/4/2021 -> 6/6/2021), without the segment in the middle.
In this case there is no way for Chart.js to know that it should not draw that segment, since all values are 1 on all 4 dates.
The only solution I can think of is to subdivide all the intervals so I can place a NaN in the middle and use something like stepped: true for the line. But with a lot of data, that basically doubles the number of dates and makes the graph more confusing.
So the question is: is there any way to specify whether a given point is a start or an end?
Or maybe there is a better approach than a single line dataset?
Thank you.
Just pass 2 datasets:
const labels = Utils.months({count: 7});
const data = {
labels: labels,
datasets: [{
label: 'My First Dataset',
data: [65, 59, 80],
fill: false,
borderColor: 'rgb(75, 192, 192)',
tension: 0.1
},{
label: 'My 2nd Dataset',
data: [null, null, null, 81, 56, 55, 40],
fill: false,
borderColor: 'rgb(75, 40, 192)',
tension: 0.1
}]
};
If you pass objects instead of arrays as data, you do not even have to pad with nulls.
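As a rough sketch of the object-style data, assuming the chart configuration is generated in Python and sent to the page as JSON: each interval becomes its own dataset with two {x, y} points, so no null padding is needed. The helper name interval_datasets and the labels are made up for illustration, and the x values are simply the date strings from the question (a time axis would additionally need parseable dates and a date adapter).

```python
import json


def interval_datasets(intervals, y=1, color="rgb(75, 192, 192)"):
    """Build one line dataset per (start, end) interval."""
    datasets = []
    for number, (start, end) in enumerate(intervals, start=1):
        datasets.append({
            "label": f"Interval {number}",
            # Object-style points: only the two endpoints are listed.
            "data": [{"x": start, "y": y}, {"x": end, "y": y}],
            "fill": False,
            "borderColor": color,
        })
    return datasets


intervals = [("1/1/2020", "2/2/2020"), ("3/4/2021", "6/6/2021")]
chart_data = {"datasets": interval_datasets(intervals)}
print(json.dumps(chart_data, indent=2))
```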

How to draw dynamic line path on imageView?

I want to draw a line on an image view. An example is below.
I have a path from A to F in the image view.
If I select A and then D, the path should be drawn like the green line, and if I select A and F, it should be drawn like the red line.
How can I do this on an image view?
All points (A, B, C, D, E and F) are UIButtons.
First of all I must say your question was not very clear, it took me a while to figure out what you actually want to achieve. I hope I understand correctly. Without simply typing out the actual code you need, this is how I would approach it:
I would create an array with dictionaries containing the letter as key and the point as value, looking something like this:
[
["A": CGPoint(x: 0, y: 0)],
["B": CGPoint(x: 0, y: 0)],
["C": CGPoint(x: 0, y: 0)],
["D": CGPoint(x: 0, y: 0)],
["E": CGPoint(x: 0, y: 0)],
["F": CGPoint(x: 0, y: 0)],
]
(or nicer, this array could have custom models containing similar info)
Then, use this array to plot the UIButtons on the view, giving each UIButton a reference to the item in the array by subclassing it, e.g. the key of the dictionary.
class CustomButton: UIButton {
var key: String?
}
Then when tapping the first button, store that key in a local variable named start and when tapping the second button store that key in a local variable named end. Subsequently, loop through the array and find the matching start key and use the values in the array to draw the lines until you find the matching end key.
Since I was in the mood, I created a simple playground demo, but this is usually not the way it works. You don't just ask for sample code, you ask for help to solve a problem.
Anyway, you can find the demo here: https://gist.github.com/pvroosendaal/f1617fe7e164bc94f0d37d3175252e2f
By the way, this is by far the crappiest piece of code I wrote, but it does what you want.
I hope it makes sense, otherwise, please let me know.
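The lookup step described above (find the start key, then collect points until the end key) can be sketched independently of UIKit. This is a plain Python illustration, not the code from the gist; the coordinates and the helper name points_between are hypothetical, and handling a reversed selection is an assumption of mine.

```python
def points_between(path, start, end):
    """Return the points from `start` to `end`, following the order of `path`."""
    keys = [key for key, _ in path]
    i, j = keys.index(start), keys.index(end)
    if i <= j:
        segment = path[i:j + 1]
    else:
        # Buttons tapped in reverse order: walk the same stretch backwards.
        segment = path[j:i + 1][::-1]
    return [point for _, point in segment]


# Hypothetical button positions, listed in path order.
path = [
    ("A", (10, 10)),
    ("B", (60, 10)),
    ("C", (60, 80)),
    ("D", (120, 80)),
    ("E", (120, 150)),
    ("F", (200, 150)),
]

green = points_between(path, "A", "D")
red = points_between(path, "A", "F")
print(green)
print(red)
```

The returned points are what you would then connect with line segments when drawing on the view.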

Clustering a sequence of numbers

I have a small clustering problem. I have this sequence:
349, 1496, 348, 1497, 347, 1503, 1502, 1495, 353, 352, 351, 1501, 354, 1504, 1498, 1500
I want to detect that there are two clusters, one around 350 and the other around 1500. Is there any straightforward solution to this? So far I have tried rounding to the nearest 100, e.g. int(round(x1 / 100.0)) * 100, which does not always work because the numbers may vary. The other option is the silhouette method, which seems like too much for this small problem.
Sort the data.
Split at the largest gap between adjacent sorted values.
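A minimal sketch of those two steps in Python, assuming exactly two clusters and at least two values; the function name split_at_largest_gap is just for illustration.

```python
def split_at_largest_gap(values):
    """Sort the values and split them into two clusters at the largest gap."""
    ordered = sorted(values)
    # Differences between each pair of neighbouring sorted values.
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    # The cut goes right after the element that precedes the largest gap.
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]


data = [349, 1496, 348, 1497, 347, 1503, 1502, 1495,
        353, 352, 351, 1501, 354, 1504, 1498, 1500]

low, high = split_at_largest_gap(data)
print(low)
print(high)
```

With the sequence from the question, the largest gap lies between 354 and 1495, so the split separates the values around 350 from those around 1500.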

FPDF align values on right

I am using FPDF to generate a report, and I need to right-align the numbers, using the number at the top (Year) as the reference.
But I am having a problem with this alignment.
If I call the Cell function like this:
$pdf->Cell(0,5,$alue,'B',1,'D');
all values end up on the right, overlapping each other.
I tried using SetX, but it did not change anything.
how it is now
Take a look at the documentation. It states that:
Cell width. If 0, the cell extends up to the right margin.
With a width of 0, every cell stretches to the right margin of the page, so your right-aligned values all end up at the same position and overlap.
Try specifying a width for your cell, and use 'R' for right alignment (FPDF's align values are 'L', 'C' and 'R'; 'D' is not one of them). For example, replace your example line of code with:
$pdf->Cell(50,5,$alue,'B',1,'R');
<?php
//right align
$pdf->Cell(50, 5, $alue, 0, 0, 'R' );
//Left Align
$pdf->Cell(50, 5, $alue, 0, 0, 'L' );
//Center Align
$pdf->Cell(50, 5, $alue, 0, 0, 'C' );
?>

How do I sort lines semi-lexicographically in emacs -- i.e., lexicographically, except that 3 gets sorted above 11?

How do I sort lines semi-lexicographically in emacs -- i.e., lexicographically, except that 3 gets sorted above 11? For example, I have a large collection of data, each entry of which looks like
[ 5, 3, 21, 1600000 ],
[ 3, 11, 21, 6400000 ],
[ 3, 3, 102, 1600000 ],
etc...
M-x sort-lines sorts this as
[ 3, 11, 21, 6400000 ],
[ 3, 3, 102, 1600000 ],
[ 5, 3, 21, 1600000 ],
but I would really like this sorted as
[ 3, 3, 102, 1600000 ],
[ 3, 11, 21, 6400000 ],
[ 5, 3, 21, 1600000 ],
Thanks!
sehe gives a good solution. Here it is in Emacs:
C-u M-| sort -k2n -k3n
Run that with your region selected and it will be replaced with the sort output!
I don't use emacs, but in vim I'd do:
%!sort -k2n -k3n
(possibly using the other key columns as well, I can't tell from the sample)
I'm not starting the editor war here... I'm just pretty sure that emacs allows you to filter through a shell command as well, so this will help!
With your data, the following will do what you want, though it is a bit laborious:
C-u 3 M-x sort-numeric-fields
C-u 2 M-x sort-numeric-fields
I don't know for sure that sort-numeric-fields is a stable sort, so it may not always work. And, obviously the above only sorts 2 numbers "deep" and you'll need to add C-u 4 M-x sort... if you want to sort by the 3rd number. The prefix argument starts with 2 because the first field is the [, and counting begins with 1.
You could also roll your own by calling sort-subr with the appropriate lexicographic predicate. See the documentation for sort-subr for more details.
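To show what such a numeric-aware ordering does, here is a small sketch in Python rather than Emacs Lisp. It assumes each line contains only non-negative integers and compares lines by those numbers in order; the function name numeric_key is made up for illustration.

```python
import re


def numeric_key(line):
    """Turn a line into the list of integers it contains, for comparison."""
    return [int(number) for number in re.findall(r"\d+", line)]


lines = [
    "[ 5, 3, 21, 1600000 ],",
    "[ 3, 11, 21, 6400000 ],",
    "[ 3, 3, 102, 1600000 ],",
]

# Lists of ints compare element by element, so 3 sorts before 11.
sorted_lines = sorted(lines, key=numeric_key)
for line in sorted_lines:
    print(line)
```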