PySpark - mean function

I have a problem with the mean function in PySpark.
I tried applying it as follows:
mse = rdd.map(lambda x: (x[2] - predictedRating(x, P, Q))**2).mean()
where predictedRating is a function I declared earlier, but the job never finishes and appears to be stuck at mean(). I don't understand why it doesn't work.
Thank you for your help!
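For context, here is a minimal, runnable sketch of the same map(...).mean() pattern; the data, the factor matrices P and Q, and this version of predictedRating are all made up for illustration and are not from the question:

from pyspark import SparkContext

sc = SparkContext(appName="mse-sketch")
# toy ratings (user, item, rating) and toy factor vectors for users (P) and items (Q)
ratings = sc.parallelize([(0, 0, 4.0), (0, 1, 3.0), (1, 0, 5.0)])
P = {0: [0.5, 1.0], 1: [1.2, 0.3]}
Q = {0: [1.0, 0.8], 1: [0.4, 1.5]}

def predictedRating(x, P, Q):
    # dot product of the user and item factor vectors
    return sum(p * q for p, q in zip(P[x[0]], Q[x[1]]))

mse = ratings.map(lambda x: (x[2] - predictedRating(x, P, Q)) ** 2).mean()
print(mse)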


TypeError: when() missing 1 required positional argument: 'value' in PySpark

I got the above-mentioned error while running a DataFrame query in Databricks using PySpark. I don't know how to solve this or where I have gone wrong. The code is as follows:
df_inner_select = (
    df_promodata_daypart
    .select(df_promodata_daypart.sub_master_id, df_promodata_daypart.sub_campaign_id,
            df_promodata_daypart.resolved_network, df_promodata_daypart.hh_id,
            df_promodata_daypart.type, df_promodata_daypart.localpromoadviewstarttime_min)
    .alias("viewerbytype")
    .groupby(df_promodata_daypart.sub_master_id, df_promodata_daypart.sub_campaign_id,
             df_promodata_daypart.resolved_network, df_promodata_daypart.hh_id,
             df_promodata_daypart.localpromoadviewstarttime_min)
    .agg(F.sum(F.when(df_promodata_daypart.type == "NonTargeted", 1).otherwise(0).alias("NonTargeted_count")),
         F.sum(F.when(df_promodata_daypart.type == "Targeted").alias("Targeted_count")))
)
I also need to get the count of the type column as mentioned in the DataFrame. Can anyone help me solve this as quickly as possible?
Thanks a lot in advance.
Look at the very end of your line:
F.when(df_promodata_daypart.type=="Targeted")
The when function requires a condition and a value, but you only passed a condition.
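A sketch of a corrected aggregation, assuming df_promodata_daypart from the question; it passes both a condition and a value to when(), and moves .alias() onto the summed column, which is presumably what was intended:

from pyspark.sql import functions as F

df_counts = (
    df_promodata_daypart
    .groupby("sub_master_id", "sub_campaign_id", "resolved_network",
             "hh_id", "localpromoadviewstarttime_min")
    .agg(F.sum(F.when(df_promodata_daypart.type == "NonTargeted", 1).otherwise(0)).alias("NonTargeted_count"),
         F.sum(F.when(df_promodata_daypart.type == "Targeted", 1).otherwise(0)).alias("Targeted_count"))
)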

Solve nonlinear equation

I am having some problems with my function code. The main idea is to find the parameter SI (the main unknown) for which Q_cal - Q = 0. Can somebody help me?
Thanks a lot.
P=1.94;
Q=1.09;
P5=1.08;
fc=0;
lambda=0.2;
Ts=24;
[SI]=singhandyu(P,Q,P5,lambda,Ts,fc);
function [SI] =singhandyu(P,Q,P5,lambda,Ts,fc)
Fc=fc.*Ts;
f=@(SI)((P5-0.2*SI)*SI)./(P5+0.8*SI);
M=@(SI)max(f,0);
S=@(SI)(SI-M(SI));
Ia=@(SI)lambda.*S(SI);
Q_cal=@(SI)((P-Ia(SI)-Fc).*(P-Ia(SI)-Fc+M(SI)))./(P-Ia(SI)-Fc+M(SI)+S(SI));
H=@(SI)Q_cal(SI)-Q;
S0=0;
SI_sol=fsolve(H,S0)
end
Almost all your anonymous functions need SI as an input, but you are not passing the argument when you call a previously defined function.
To clarify, f requires one input argument,
f=@(SI)((P5-0.2*SI)*SI)./(P5+0.8*SI);
but when calling f in the next line, you are not providing it:
M=@(SI)max(f,0);
So make sure you pass the argument to each function call:
M=@(SI)max(f(SI),0);
S=@(SI)max(SI-M(SI),0);
etc.
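For readers working in Python rather than MATLAB, the same fix (passing SI through every nested call) looks roughly like this with scipy.optimize.fsolve, using the parameter values from the question; np.maximum stands in for max so the functions also work on the arrays fsolve passes in:

import numpy as np
from scipy.optimize import fsolve

P, Q, P5, lam, Ts, fc = 1.94, 1.09, 1.08, 0.2, 24.0, 0.0
Fc = fc * Ts

f = lambda SI: ((P5 - 0.2 * SI) * SI) / (P5 + 0.8 * SI)
M = lambda SI: np.maximum(f(SI), 0.0)   # note: f(SI), not f
S = lambda SI: np.maximum(SI - M(SI), 0.0)
Ia = lambda SI: lam * S(SI)
Q_cal = lambda SI: ((P - Ia(SI) - Fc) * (P - Ia(SI) - Fc + M(SI))) / (P - Ia(SI) - Fc + M(SI) + S(SI))
H = lambda SI: Q_cal(SI) - Q

SI_sol = fsolve(H, x0=0.5)
print(SI_sol)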

Spark (Python) - explain the difference between user-defined functions and simple functions

I am a Spark beginner. I am using Python and Spark DataFrames. I just learned about user-defined functions (UDFs), which one has to register first in order to use them.
Question: in what situations would you create a UDF rather than just use a simple Python function?
Thank you so much!
Your code will be neater if you use UDFs: a UDF takes a function and a return type (defaulting to string if none is given) and creates a column expression, which means you can write things like:
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType
my_function_udf = udf(my_function, DoubleType())
myDf.withColumn("function_output_column", my_function_udf("some_input_column"))
This is just one example of how you can use a UDF to treat a function as a column expression. UDFs also make it easy to bring things like lists or maps into your function logic via a closure, which is explained very well here.
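As a rough illustration of the closure point (the data, names, and mapping below are made up), a Python dict can be captured when the UDF is created and used inside it:

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("US",), ("DE",), ("FR",)], ["country_code"])

country_names = {"US": "United States", "DE": "Germany"}

def make_lookup_udf(mapping):
    # the returned UDF closes over `mapping`, so the dict travels with the function
    return udf(lambda code: mapping.get(code, "unknown"), StringType())

df.withColumn("country_name", make_lookup_udf(country_names)("country_code")).show()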

Weird behavior of reduceByKeyAndWindow function in Spark

I am using Spark 1.6 and came across the function reduceByKeyAndWindow, which I am using to perform a word count over data transmitted on a Kafka topic.
Following is the list of alternatives that reduceByKeyAndWindow provides. As we can see, all the alternatives have similar signatures with extra parameters.
When I just use reduceByKeyAndWindow with my reduce function, or with my reduce function and a duration, it works and doesn't give me any errors, as shown below.
But when I use the alternative that takes a reduce function, a duration and a sliding window time, it starts giving me the following error; the same happens with the other alternatives, as shown below.
I am not really sure what is happening here or how I can fix the problem.
Any help is appreciated.
If you comment out the line words.map(x => (x, 1L)), you should be able to use reduceByWindow(_ + _, Seconds(2), Seconds(2)) from DStream.
If you transform the words into (word, count) pairs, then you should use the method below:
reduceByKeyAndWindow(_ + _, _ - _, Minutes(10), Seconds(2), 2)
Please see the documentation for more details on the reduce function and the inverse reduce function: https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala
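For reference, a rough PySpark Streaming sketch of the same windowed word count (the question uses Scala; this version assumes a socket text source rather than Kafka and made-up durations):

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="WindowedWordCount")
ssc = StreamingContext(sc, batchDuration=2)
ssc.checkpoint("/tmp/wordcount-checkpoint")  # checkpointing is required when an inverse reduce function is used

lines = ssc.socketTextStream("localhost", 9999)
pairs = lines.flatMap(lambda line: line.split(" ")).map(lambda word: (word, 1))

# the reduce function adds counts entering the window; the inverse function subtracts counts leaving it
counts = pairs.reduceByKeyAndWindow(lambda a, b: a + b,
                                    lambda a, b: a - b,
                                    windowDuration=600,  # 10 minutes
                                    slideDuration=2)
counts.pprint()

ssc.start()
ssc.awaitTermination()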

How to pass a structure into a function in MATLAB

I am writing a MATLAB function as:
function [resultNorm]= explorEffort (n, loop, step)
...
Somelines
...
M=bench(a,b).Y ;
end
but it seems that MATLAB doesn't let me use the structure inside the function; the error is:
Error: File: explorEffort.m Line: 20 Column: 15
Functions cannot be indexed using {} or . indexing.
P.S: the bench definition
bench =
24x5 struct array with fields:
application
dataset
mica
micaNorm
DB
Y
Could anyone tell me how I can fix that?
Method-1
One possibility is to define the structure (here, bench) as global outside the function and call global bench; right before the first use of bench.
Method-2
The safer choice is to pass the structure as an input argument of the function:
function [resultNorm] = explorEffort(n, loop, step, bench)
In this case there is no need for the global declaration.
I believe the main issue is that bench is a MATLAB built-in function (depending on your version of MATLAB), so the built-in may be conflicting with your variable.
Try renaming your variable.
To check, you can run:
X = bench; or help bench
I have run into a similar issue before.