Dataframe empty check pyspark - pyspark

I am trying to check whether a DataFrame is empty in PySpark using the code below.
print(df.head(1).isEmpty)
But I am getting an error:
AttributeError: 'list' object has no attribute 'isEmpty'
I checked that my object really is a DataFrame using type(df), and it is class 'pyspark.sql.dataframe.DataFrame'.

I used df.first() is None to check whether my Spark DataFrame is empty.

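When you call head(1), it returns a list, not a DataFrame, and a Python list has no isEmpty attribute.
That is the reason for your error.
Call the method on the DataFrame itself: df.isEmpty() (available in PySpark 3.3.0 and later).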

df.head(1) returns a list containing at most one row: the first row of df, or nothing if df has no rows.
You can check whether this list is empty ([]) by using it directly as a condition:
    if df.head(1):
        print("there is something")
    else:
        print("df is empty")
For an empty df this prints:
    df is empty
Empty lists are implicitly falsy in Python.
For a fuller explanation, see the Python docs on truth value testing.
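To make the truthiness point concrete, here is a minimal plain-Python sketch that needs no Spark session. The describe function and the tuple standing in for a Row are made up for illustration; the lists only imitate what head(1) hands back.

```python
def describe(rows):
    """Return a message based on the truthiness of a list of rows."""
    # An empty list is falsy, a list with at least one element is truthy.
    return "there is something" if rows else "df is empty"


empty_result = []               # shape of head(1) on an empty DataFrame
nonempty_result = [("cat", 1)]  # hypothetical stand-in for a list holding one Row

print(describe(empty_result))
print(describe(nonempty_result))

# A plain list has no isEmpty attribute, which is what the AttributeError reports.
print(hasattr(empty_result, "isEmpty"))
```

The last line prints False, which matches the error in the question: isEmpty was looked up on the list returned by head(1), not on the DataFrame.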

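Another way to do this is to check whether df.count() == 0. Note that count() has to process the whole DataFrame, so it can be much slower than head(1) on large data.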
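The three checks suggested in this thread can be put side by side. This is only a sketch: FakeDataFrame is a hypothetical stand-in backed by a plain list so the snippet runs without Spark, and emptiness_checks is an illustrative helper, not part of PySpark. It mimics only the return shapes of head(n), first() and count().

```python
class FakeDataFrame:
    """Hypothetical stand-in that mimics three DataFrame methods on a plain list."""

    def __init__(self, rows):
        self._rows = list(rows)

    def head(self, n):
        # Like DataFrame.head(n): a list with at most n rows.
        return self._rows[:n]

    def first(self):
        # Like DataFrame.first(): the first row, or None when there are no rows.
        return self._rows[0] if self._rows else None

    def count(self):
        # Like DataFrame.count(): the total number of rows.
        return len(self._rows)


def emptiness_checks(df):
    """Evaluate the three emptiness tests from the answers above."""
    return {
        "head": not df.head(1),
        "first": df.first() is None,
        "count": df.count() == 0,
    }


print(emptiness_checks(FakeDataFrame([])))
print(emptiness_checks(FakeDataFrame([("cat", 1)])))
```

All three tests agree on the answer; on a real DataFrame they differ only in how much work Spark has to do to produce it.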

Related

Removing empty entry in List<dynamic> that only shows while inspecting

I have a list whose values I am getting from an API.
It is of type List<dynamic>. When I print it I get this output (for example): [cat, female], but when I inspect it, it has three values: "cat", "female", "". The trailing empty value is causing problems in my code, so I want to remove it, but I don't know how.
Since it is a List<dynamic>, I tried removeLast() and also toString(), but neither worked for me. I would appreciate any help with this.
The solution is to filter items and get the new list:
final newList = list.where((e) => e != null && e != '').toList();
removeLast() does not work the way you expected. Its documentation comment says: "Removes and returns the last object in this list." It modifies the list in place and returns the removed element, not the shortened list.
PS: I recommend dealing with lists in a functional way, which means not modifying the original list and getting a new list instead.

Access protobuf field name dynamically with scala

I am new to Scala and writing my first application.
I have defined my proto file with the fields email_id and phone_number, which is the request definition for a gRPC call.
I can access values with the dot operator, like params.emailId.
Now I have an array of mandatory fields, and I want to check the values of the fields named in that array against the input request parameters.
How can I access params.{field name from array} to check for non-empty values?
I get an error with the code below:
val mandatoryFields = Array("emailId", "phoneNumber")
println(params.emailId) // works
for (fields <- mandatoryFields) {
  println(fields)
  println(params.fields) // getting error
}
It has a function 'in.getFieldByNumber()' with which you can fetch a value by index. Is there any function like getFieldByName() available?
Although it has been a long time since the question was asked, I don't think it has to stay unanswered. Here is one approach:
Using the toPMessage method, you get your protobuf case class as a PMessage instance. From it you can retrieve a Map[FieldDescriptor, PValue]. Finding field values by name then looks like this:
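val fieldDescriptorPValueMap: Map[FieldDescriptor, PValue] = params.toPMessage.value
mandatoryFields.foreach(fieldName => {
  println(
    fieldDescriptorPValueMap
      .filter(entry => fieldName == entry._1.name)
      .values
      .head.as[String]
  )
})
Note that the descriptor's name is the field name as declared in the .proto file (email_id rather than emailId), so the entries in mandatoryFields may need to use that form; otherwise the filter matches nothing and .head fails.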

Why does FluentIterable.first() throw NPE when the first element is null

Here is the method description for FluentIterable.first().
It throws an NPE rather than returning Optional.absent() when the first element is null.
I think this makes FluentIterable less fluent, and having Optional as the return type gives the impression that the call is totally safe.
I wonder why it was designed this way. Thanks.
It's a documented behavior:
public final Optional<E> first()
Returns an Optional containing the first element in this fluent iterable. If the iterable is empty, Optional.absent() is returned.
Throws:
NullPointerException - if the first element is null; if this is a possibility, use iterator().next() or Iterables.getFirst(java.lang.Iterable<? extends T>, T) instead.
As a workaround, follow the suggestions in the Throws clause above. Generally, Guava's collections are null-hostile.
In this particular case, the NPE is thrown to avoid ambiguity, because Optional.absent() is returned when the iterable is empty, not when the first element is null.
As you can see from method code below, FluentIterable.first is not implemented to handle sequences with null values. The Optional in the returned type is to handle the possibility that the sequence is empty.
The following is the code for the method:
public final Optional<E> first() {
  Iterator<E> iterator = iterable.iterator();
  return iterator.hasNext()
      ? Optional.of(iterator.next())
      : Optional.<E>absent();
}
Note that Optional.of will throw an NPE if the provided object is null.
P.S. Returning Optional.absent() when the first element is null would make the result indistinguishable from the empty-sequence case, which could be a problem.

using unset in CakePHP MongoDB

I am using the Ichikawa CakePHP MongoDB plugin. I have a problem using $unset with it. I have tried these commands in the shell:
db.patents.update({}, {$unset : {"lv.2" : 1 }},{'multi':true})
db.patents.update({},{$pull:{pid:"2"}},{'multi':true})
These work fine.
But when I convert them to a CakePHP call as follows:
$this->Detail->updateAll(array('$unset'=>array('lv.2'=>1,array('multi'=>true))));
it doesn't work and gives this error:
MongoCollection::update(): expects parameter 1 to be an array or object, boolean given
Can anyone help me figure out the problem?
Thanks.
There are no conditions
The error message means that the query being generated is the equivalent of:
db.details.update(true
This can be confirmed by checking the query log (easy if you're using debug kit).
How is that happening
The second parameter for model updateAll is missing, which means it will have the default:
public function updateAll($fields, $conditions = true) {
                                   ^
    return $this->getDataSource()->update($this, $fields, null, $conditions);
}
Therefore, in the MongoDB datasource class, the conditions passed are true:
public function updateAll(&$Model, $fields = null, $conditions = null) {
                                                   ^
As a consequence, the resultant update statement has true as the first parameter, not an array.
Correct syntax
The correct syntax for such a query is:
$this->Detail->updateAll(
    array('$unset' => array('lv.2' => 1)),
    array() # <- do not omit this
);
Note that it's not necessary to specify 'multi'=>true as the datasource does that for you, especially not in the fields argument.

Empty list not detected in scala

In my REST API controller, I receive a list of strings; if the input list is empty I should return a bad request.
The problem is that the input is empty and the list contains no items, but the check:
if(productIdsList.isEmpty)
returns false.
How could that be?
It is not empty; it contains an empty String.
It seems Eclipse shows an empty String as an empty value (and not as "" the way the Scala REPL does), which is confusing.
Try debugging this; it looks exactly the same.
object A extends App {
  val a = List("")
  // any other code here
}
The empty list is Nil, and it looks exactly like productIdsList.tl in your debug view.