custom column name but get error List object is not callable - python-polars

I have csv and no header in it, I want provide header from list
import polars as pl
header = ["col1", "col2"]
df = pl.scan_csv(file_path, has_header = False, with_column_names = header,)
But get error list object is not callable
I try
with_column_names=header[:]
with_column_names=list(header)
Make def function which return list
with_column_names=def_fun(header)
All still error list not Callable
And
with_column_names=list[header]
With this no error list not callable but the header still auto generate with column_1 and so on

According to the doc, with_column_names is a Callable[[list[str]], list[str]].
You can try passing with_column_names = lambda _: header

Related

Mock inner class's attributes using MagicMock

Apologies for a length post. I have been trying to beat my head around reading about mock, MagicMock, and all the time getting confused. Hence, decided to write this post.
I know several questions, and pages have been written on this. But, still not able to wrap my head around this.
My Setup:
All the test code, and the 2 module files come under one "folder" mymodule
my_module_1.py file contains
class MyOuterClass(object):
MyInnerClass(object):
attribute1: str
attribute2: str
attribute3: str
def get(self) -> MyInnerClass:
'''
pseudocode
1. a call to AWS's service is made
2. the output from call in step 1 is used to set attributes of this InnerClass
3. return innerclass instance
'''
I use the OuterClass in another file(my_module_2.py), to set some values and return a string as follows:
class MyModule2():
def get_foo(self, some_boolean_predicate):
if some_boolean_predicate:
temp = my_module_1.OuterClass().get()
statement = f'''
WITH (
BAR (
FIELD_1 = '{temp.attribute1}',
FIELD_2 = '{temp.attribute2}',
FIELD_3 = '{temp.attribute3}'
)
)
'''
else:
statement = ''
return statement
I want to write the unit tests for the file my_module_2.py file, and test the function get_foo
How I am writing the tests(or planning on)
a test file by name test_my_module2.py
I started with creating a pytest.fixture for the MyOuterClass's get function as follows since I will be reusing this piece of info again in my other tests
#pytest.fixture
def mock_get(mocker: MockerFixture) -> MagicMock:
return mocker.patch.object(MyOuterClass, 'get')
Finally,
Then I proceeded to use this fixture in my test as follows:
from unittest import mock
from unittest.mock import MagicMock, Mock, patch, PropertyMock
import pytest
from pytest_mock import MockerFixture
from my_module.my_module_1 import myOuterClass
def test_should_get_from_inner_class(self, mock_get):
# mock call to get are made
output = mock_get.get
#update the values for the InnerClass's attributes here
output.attribute1.side_effect = 'attr1'
output.attribute2.side_effect = 'attr2'
output.attribute3.side_effect = 'attr3'
mock_output_str = '''
WITH (
BAR (
FIELD_1 = 'attr1',
FIELD_2 = 'attr2',
FIELD_3 = 'attr3'
)
)
'''
module2Obj = MyModule2()
response = module2Obj.get_foo(some_boolean_predicate=True)
# the following assertion passes
assert mock_get.get.called_once()
# I would like match `response to that with mock_output_str instance above
assert response == mock_output_str
But, the assertion as you might have guessed failed, and I know I am comparing completely different types, since I see
errors such as
FAILED [100%]
WITH (
BAR (
FIELD1 = '<MagicMock name='get().attr1' id='4937943120'>',
FIELD3 = '<MagicMock name='get().attr2' id='4937962976'>',
FIELD3 = '<MagicMock name='get().attr3' id='4937982928'>'
)
)
Thank you for being patient with me till here, i know its a really lengthy post, but stuck on this for a few days, ended up creating a post here.
How do i get to validate the mock's value with the mock_output_str?
yess! the hint was in the #gold_cy's answer. I was only calling my mock and never setting its values
this is what my test case ended up looking
mock_obj = OuterClass.InnerClass()
mock_obj.attribute1='some-1'
mock_obj.attribute2='some-2'
mock_obj.attribute3='some-3'
mock_get.return_value = mock_obj
once my mock was setup properly, then the validation became easy! Thank you!

AnalysisException: cannot resolve '`item1`' given input columns: [];

I used AWS Glue Transform-custom code, and wrote the following code.
def MyTransform (glueContext, dfc) -> DynamicFrameCollection:
from pyspark.sql.functions import col, lit
selected = dfc.select(list(dfc.keys())[0]).toDF()
selected = selected.withColumn('item2',lit(col('item1')))
results = DynamicFrame.fromDF(selected, glueContext, "results")
return DynamicFrameCollection({"results": results}, glueContext)
I got the following error.
AnalysisException: cannot resolve '`item1`' given input columns: [];
You don't need to wrap lit(col('item1')). lit is used to create a literal static value, while col is to refer to a specific column. Depends on what exactly you're trying to achieve, but this change will at least fix your error
selected = selected.withColumn('item2', col('item1'))

Save custom transformers in pyspark

When I implement this part of this python code in Azure Databricks:
class clustomTransformations(Transformer):
<code>
custom_transformer = customTransformations()
....
pipeline = Pipeline(stages=[custom_transformer, assembler, scaler, rf])
pipeline_model = pipeline.fit(sample_data)
pipeline_model.save(<your path>)
When I attempt to save the pipeline, I get this:
AttributeError: 'customTransformations' object has no attribute '_to_java'
Any work arounds?
It seems like there is no easy workaround but to try and implement the _to_java method, as is suggested here for StopWordsRemover:
Serialize a custom transformer using python to be used within a Pyspark ML pipeline
def _to_java(self):
"""
Convert this instance to a dill dump, then to a list of strings with the unicode integer values of each character.
Use this list as a set of dumby stopwords and store in a StopWordsRemover instance
:return: Java object equivalent to this instance.
"""
dmp = dill.dumps(self)
pylist = [str(ord(d)) for d in dmp] # convert byes to string integer list
pylist.append(PysparkObjId._getPyObjId()) # add our id so PysparkPipelineWrapper can id us.
sc = SparkContext._active_spark_context
java_class = sc._gateway.jvm.java.lang.String
java_array = sc._gateway.new_array(java_class, len(pylist))
for i in xrange(len(pylist)):
java_array[i] = pylist[i]
_java_obj = JavaParams._new_java_obj(PysparkObjId._getCarrierClass(javaName=True), self.uid)
_java_obj.setStopWords(java_array)
return _java_obj

How to use RDD collect method to process each row of RDD in array format?

//making an RDD
val logData = sc.textFile(sampleData).cache()
//making logDataArray[String]
var logDataArray = logData.collect;
But its throwing me error:
java.lang.NullPointerException
at org.apache.spark.rdd.RDD.collect(RDD.scala:717)
at com.Travel$.com$Travel$$isConnected$1(Travel.scala:58)
Before using logData.collect I have check the size of logData as println(logData.count). It give 1168 record size.
If you are writing this line ( var logDataArray = logData.collect )in some function, it is possible sc is not in scope and is getting null. Moreover everytime function is invoked, you will execute collect method.
Try adding line outside your function ie. directly in main method.
it is working for me.

What am I doing wrong with this Python class? AttributeError: 'NoneType' object has no attribute 'usernames'

Hey there I am trying to make my first class my code is as follows:
class Twitt:
def __init__(self):
self.usernames = []
self.names = []
self.tweet = []
self.imageurl = []
def twitter_lookup(self, coordinents, radius):
twitter = Twitter(auth=auth)
coordinents = coordinents + "," + radius
print coordinents
query = twitter.search.tweets(q="", geocode='33.520661,-86.80249,50mi', rpp=10)
print query
for result in query["statuses"]:
self.usernames.append(result["user"]["screen_name"])
self.names.append(result['user']["name"])
self.tweet.append(h.unescape(result["text"]))
self.imageurl.append(result['user']["profile_image_url_https"])
What I am trying to be able to do is then use my class like so:
test = Twitt()
hello = test.twitter_lookup("38.5815720,-121.4944000","1m")
print hello.usernames
This does not work and I keep getting: "AttributeError: 'NoneType' object has no attribute 'usernames'"
Maybe I just misunderstood the tutorial or am trying to use this wrong. Any help would be appreciated thanks.
I see the error is test.twitter_lookup("38.5815720,-121.4944000","1m") return nothing. If you want the usernames, you need to do
test = Twitt()
test.twitter_lookup("38.5815720,-121.4944000","1m")
test.usernames
Your function twitter_lookup is modifying the Twitt object in-place. You didn't make it return any kind of value, so when you call hello = test.twitter_lookup(), there's no return value to assign to hello, and it ends up as None. Try test.usernames instead.
Alternatively, have the twitter_lookup function put its results in some new object (perhaps a dictionary?) and return it. This is probably the more sensible solution.
Also, the function accepts a coordinents (it's 'coordinates') argument, but then throws it away and uses a hard-coded value instead.