Reading jsonlines with odo

Reading jsonlines with odo - odo

I was expecting something like this to work:
from odo import odo
import pandas as pd
odo('jsonlines://offentliggoerelser.jsonl', pd.DataFrame)
However, I get an exception
NotImplementedError: Unable to parse uri to data resource: jsonlines://offentliggoerelser.jsonl
The line-delimited JSON file is a fairly straightforward file
{'regNummer': None, 'cvrNummer': 29443920, 'startDato': '2013-07-01', 'dokumentType': 'AARSRAPPORT', 'sidstOpdateret': '2015-01-01T23:00:00.000Z', 'omgoerelse': False, 'sagsNummer': '14-318.972', 'indlaesningsTidspunkt': '2015-04-11T19:22:58.117Z', 'indlaesningsId': None, 'dokumentUrl': 'http://regnskaber.virk.dk/02934492/eGJybHN0b3JlOi8vWC1CMzBFRjcxNi0yMDE1MDEwMV8xNjAwMDBfMjA2L3hicmw.xml', 'offentliggoerelsesTidspunkt': '2015-01-01T23:00:00.000Z', 'slutDato': '2014-06-30'}
{'regNummer': None, 'cvrNummer': 31785219, ...
...
The same error appears with
odo('offentliggoerelser.jsonl', pd.DataFrame)
and
odo('json://offentliggoerelser.jsonl', pd.DataFrame)
Here are a few other cases
from blaze.utils import example
odo(example('iris.csv'), pd.DataFrame) # works
odo(example('iris.csv'), "json://iris.json") # works
odo(example('iris.csv'), "jsonlines://iris.jsonl") # fails
My odo is 0.5.0

So apparently the file extension is important. This works:
odo(example('iris.csv'), "jsonlines://iris.json")
Renaming my offentliggoerelser.jsonl to offentliggoerelser.json gets me past the NotImplementedError.

Related

pytest-dependency is not working in my test

There are 2 files, the code in the first one is:
import pytest
class TestXdist2():
#pytest.mark.dependency(name="aa")
def test_t1(self):
print("\ntest_1")
assert True
the code in the second file is:
import pytest
import sys, os
sys.path.append(os.getcwd())
from testcases.test_xdist_2 import TestXdist2
class TestXdist1():
def setup_class(self):
self.x = TestXdist2()
#pytest.mark.dependency(depends=["aa"], scope="module")
def test_t2(self):
print("\ntest_t2")
assert 1==1
if __name__ == "__main__":
pytest.main(["-s", "-v", f"{os.path.abspath('testcases')}/test_xdist_1.py"])
when I run the senond file, I thought test case "test_t1" should be ran firstly, then "test_t2" ran secondly, but the result is like this, "test_t2" is skipped, I don'y know why,
PS D:\gitProjects\selenium_pytest_demo> & D:/Python38/python.exe d:/gitProjects/selenium_pytest_demo/testcases/test_xdist_1.py
Test session starts (platform: win32, Python 3.8.7, pytest 6.2.2, pytest-sugar 0.9.4)
cachedir: .pytest_cache
metadata: {'Python': '3.8.7rc1', 'Platform': 'Windows-10-10.0.18362-SP0', 'Packages': {'pytest': '6.2.2', 'py': '1.10.0', 'pluggy': '0.13.1'}, 'Plugins': {'allure-pytest': '2.8.35', 'dependency': '0.5.1', 'forked': '1.3.0', 'html': '3.1.1', 'metadata': '1.11.0', 'rerunfailures': '9.1.1', 'sugar': '0.9.4', 'xdist': '2.2.1'}, 'JAVA_HOME': 'D:\\Java\\jdk-15.0.1'}
rootdir: D:\gitProjects\selenium_pytest_demo, configfile: pytest.ini
plugins: allure-pytest-2.8.35, dependency-0.5.1, forked-1.3.0, html-3.1.1, metadata-1.11.0, rerunfailures-9.1.1, sugar-0.9.4, xdist-2.2.1
collecting ...
testcases\test_xdist_1.py::TestXdist1.test_t2 s 50% █████
test_1
testcases\test_xdist_2.py::TestXdist2.test_t1 ✓ 100% ██████████
Results (0.04s):
1 passed
1 skipped

This is the expected behavior - pytest-dependency does not order testcases, it only skips testcases if the testcase they depend on is skipped or failed. There exists a PR that would change that, but is not merged
Until that, you can use pytest-order. If you just want the ordering, you can use relative markers. If you also want to skip tests if the test they depend on failed, you can use pytest-dependency as before, but use the pytest-order option --order-dependencies to order the tests additionally.
Disclaimer:
I'm the author of pytest-order (which is a fork of pytest-ordering).

cloud-init: Can write_files be merged?

Can write_files be merged? I can't seem to get the merge_how syntax correct. Only files in the last write_files are being created.
Thanks

The answer is yes.
When working with multi-part MIME, you must add the following to the header of each part.
Merge-Type: list(append)+dict(no_replace,recurse_list)+str()
Below is a modified version of the helper script provided in the cloud-init documentation.
#!/usr/bin/env python3
import click
import sys
import gzip as gz
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
#click.command()
#click.option('--gzip', is_flag=True,
help='gzip MIME message to current directory')
#click.argument('message', nargs=-1)
def main(message: str, gzip: bool) -> None:
"""This script creates a multi part MIME suitable for cloud-init
A MESSAGE has following format <filename>:<type> where <filename> is the
file that contains the data to add to the MIME message. <type> is the
MIME content-type.
MIME content-type may be on of the following:
text/cloud-boothook
text/cloud-config
text/cloud-config-archive
text/jinja2
text/part-handler
text/upstart-job
text/x-include-once-url
text/x-include-url
text/x-shellscript
EXAMPLE:
write-mime-multipart afile.cfg:cloud-config bfile.sh:x-shellscript
"""
combined_message = MIMEMultipart()
for message_part in message:
(filename, format_type) = message_part.split(':', 1)
with open(filename) as fh:
contents = fh.read()
sub_message = MIMEText(contents, format_type, sys.getdefaultencoding())
sub_message.add_header('Content-Disposition',
'attachment; filename="{}"'.format(filename))
sub_message.add_header('Merge-Type',
'list(append)+dict(no_replace,recurse_list)+str()')
combined_message.attach(sub_message)
if gzip:
with gz.open('combined-userdata.txt.gz', 'wb') as fd:
fd.write(bytes(combined_message))
else:
print(combined_message)
if __name__ == '__main__':
main()

Hyperopt mongotrials issue with Pickle: AttributeError: 'module' object has no attribute

I'm trying to use Hyperopt parallel search with MongoDB, and encountered some issues with Mongotrials, which have been discussed here. I've tried all their methods, and I am still unable to find solutions to my specific problem. The specific model I'm trying to minimize is RadomForestRegressor from sklearn.
I've followed this tutorial. And I'm able to print out the calculated "fmin" with no issue.
Here are my steps so far:
1) Activate a virtual environment called "tensorflow" (I've installed all my libraries there)
2) Start MongoDB:
(tensorflow) bash-3.2$ mongod --dbpath . --port 1234 --directoryperdb --journal --nohttpinterface
3) Initiate workers:
(tensorflow) bash-3.2$ hyperopt-mongo-worker --mongo=localhost:1234/foo_db --poll-interval=0.1
4) Run my python code, and my python code is as follows:
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
from hyperopt import hp, fmin, tpe, STATUS_OK, Trials
from hyperopt.mongoexp import MongoTrials
# Preprocessing data
train_xg = pd.read_csv('train.csv')
n_train = len(train_xg)
print "Whole data set size: ", n_train
# Creating columns for features, and categorical features
features_col = [x for x in train_xg.columns if x not in ['id', 'loss', 'log_loss']]
cat_features_col = [x for x in train_xg.select_dtypes(include=['object']).columns if x not in ['id', 'loss', 'log_loss']]
for c in range(len(cat_features_col)):
train_xg[cat_features_col[c]] = train_xg[cat_features_col[c]].astype('category').cat.codes
# Use this to train random forest regressor
train_xg_x = np.array(train_xg[features_col])
train_xg_y = np.array(train_xg['loss'])
space_rf = { 'min_samples_leaf': hp.choice('min_samples_leaf', range(1,100)) }
trials = MongoTrials('mongo://localhost:1234/foo_db/jobs', exp_key='exp1')
def minMe(params):
# Hyperopt tuning for hyperparameters
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestRegressor
from hyperopt import STATUS_OK
try:
import dill as pickle
print('Went with dill')
except ImportError:
import pickle
def hyperopt_rf(params):
rf = RandomForestRegressor(**params)
return cross_val_score(rf, train_xg_x, train_xg_y).mean()
acc = hyperopt_rf(params)
print 'new acc:', acc, 'params: ', params
return {'loss': -acc, 'status': STATUS_OK}
best = fmin(fn=minMe, space=space_rf, trials=trials, algo=tpe.suggest, max_evals=100)
print "Best: ", best
5) After I run the above Python code, I get the following errors:
INFO:hyperopt.mongoexp:Error while unpickling. Try installing dill via "pip install dill" for enhanced pickling support.
INFO:hyperopt.mongoexp:job exception: 'module' object has no attribute 'minMe'
Traceback (most recent call last):
File "/Users/WernerChao/tensorflow/bin/hyperopt-mongo-worker", line 6, in <module>
sys.exit(hyperopt.mongoexp.main_worker())
File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1302, in main_worker
return main_worker_helper(options, args)
File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1249, in main_worker_helper
mworker.run_one(reserve_timeout=float(options.reserve_timeout))
File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1064, in run_one
domain = pickle.loads(blob)
AttributeError: 'module' object has no attribute 'minMe'
INFO:hyperopt.mongoexp:PROTOCOL mongo
INFO:hyperopt.mongoexp:USERNAME None
INFO:hyperopt.mongoexp:HOSTNAME localhost
INFO:hyperopt.mongoexp:PORT 1234
INFO:hyperopt.mongoexp:PATH /foo_db/jobs
INFO:hyperopt.mongoexp:DB foo_db
INFO:hyperopt.mongoexp:COLLECTION jobs
INFO:hyperopt.mongoexp:PASS None
INFO:hyperopt.mongoexp:Error while unpickling. Try installing dill via "pip install dill" for enhanced pickling support.
INFO:hyperopt.mongoexp:job exception: 'module' object has no attribute 'minMe'
Traceback (most recent call last):
File "/Users/WernerChao/tensorflow/bin/hyperopt-mongo-worker", line 6, in <module>
sys.exit(hyperopt.mongoexp.main_worker())
File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1302, in main_worker
return main_worker_helper(options, args)
File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1249, in main_worker_helper
mworker.run_one(reserve_timeout=float(options.reserve_timeout))
File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1064, in run_one
domain = pickle.loads(blob)
AttributeError: 'module' object has no attribute 'minMe'
INFO:hyperopt.mongoexp:PROTOCOL mongo
INFO:hyperopt.mongoexp:USERNAME None
INFO:hyperopt.mongoexp:HOSTNAME localhost
INFO:hyperopt.mongoexp:PORT 1234
INFO:hyperopt.mongoexp:PATH /foo_db/jobs
INFO:hyperopt.mongoexp:DB foo_db
INFO:hyperopt.mongoexp:COLLECTION jobs
INFO:hyperopt.mongoexp:PASS None
INFO:hyperopt.mongoexp:Error while unpickling. Try installing dill via "pip install dill" for enhanced pickling support.
INFO:hyperopt.mongoexp:job exception: 'module' object has no attribute 'minMe'
Traceback (most recent call last):
File "/Users/WernerChao/tensorflow/bin/hyperopt-mongo-worker", line 6, in <module>
sys.exit(hyperopt.mongoexp.main_worker())
File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1302, in main_worker
return main_worker_helper(options, args)
File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1249, in main_worker_helper
mworker.run_one(reserve_timeout=float(options.reserve_timeout))
File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1064, in run_one
domain = pickle.loads(blob)
AttributeError: 'module' object has no attribute 'minMe'
INFO:hyperopt.mongoexp:PROTOCOL mongo
INFO:hyperopt.mongoexp:USERNAME None
INFO:hyperopt.mongoexp:HOSTNAME localhost
INFO:hyperopt.mongoexp:PORT 1234
INFO:hyperopt.mongoexp:PATH /foo_db/jobs
INFO:hyperopt.mongoexp:DB foo_db
INFO:hyperopt.mongoexp:COLLECTION jobs
INFO:hyperopt.mongoexp:PASS None
INFO:hyperopt.mongoexp:no job found, sleeping for 0.7s
INFO:hyperopt.mongoexp:Error while unpickling. Try installing dill via "pip install dill" for enhanced pickling support.
INFO:hyperopt.mongoexp:job exception: 'module' object has no attribute 'minMe'
Traceback (most recent call last):
File "/Users/WernerChao/tensorflow/bin/hyperopt-mongo-worker", line 6, in <module>
sys.exit(hyperopt.mongoexp.main_worker())
File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1302, in main_worker
return main_worker_helper(options, args)
File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1249, in main_worker_helper
mworker.run_one(reserve_timeout=float(options.reserve_timeout))
File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1064, in run_one
domain = pickle.loads(blob)
AttributeError: 'module' object has no attribute 'minMe'
INFO:hyperopt.mongoexp:exiting with N=9223372036854775803 after 4 consecutive exceptions
6) Then Mongo workers would shut off.
Things I've tried:
install "dill" as the error suggested -> didn't work
Put global imports into the objective function so it can pickle -> didn't work
Put try except with "dill" or "pickle" as import -> didn't work
Does anyone have similar issues? I'm running out of ideas to try, and have been working on this for 2 days in vain. I think I am missing something really simple here, just can't seem to find it.
What am I missing?
Any suggestion is welcomed please!

Had the same problem in python 3.5. Installing Dill didn't help, nor dir setting workdir in MongoTrials or hyperopt-mongo-worker cli. hyperopt-mongo-worker doesn't seem to have access to __main__ where the function was defined:
AttributeError: Can't get attribute 'minMe' on <module '__main__' from ...hyperopt-mongo-worker
As #jaikumarm suggested, I circumvented the problem by writing a module file with all the required functions. However, instead of soft-linking it into the bin directory, I extended the PYTHONPATH before running hyperopt-mongo-worker:
export PYTHONPATH="${PYTHONPATH}:<dir_with_the_module.py>"
hyperopt-mongo-worker ...
That way, the hyperopt-monogo-worker is able to import the module containing minMe.

I fought with this for several days before coming up with a workable solution. there are two problems:
1. the mongo worker spawns off a separate process to run the optimizer so any context from your original python file is lost and unavailable for this new process.
2. the imports on this new process happen in the context of the hyperopt-mongo-worker scipy, which is in your case will be /Users/WernerChao/tensorflow/bin/.
So my solution is to make this new optimizer function completely self sufficient
optimizer.py
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
# Preprocessing data
train_xg = pd.read_csv('train.csv')
n_train = len(train_xg)
print "Whole data set size: ", n_train
# Creating columns for features, and categorical features
features_col = [x for x in train_xg.columns if x not in ['id', 'loss', 'log_loss']]
cat_features_col = [x for x in train_xg.select_dtypes(include=['object']).columns if x not in ['id', 'loss', 'log_loss']]
for c in range(len(cat_features_col)):
train_xg[cat_features_col[c]] = train_xg[cat_features_col[c]].astype('category').cat.codes
# Use this to train random forest regressor
train_xg_x = np.array(train_xg[features_col])
train_xg_y = np.array(train_xg['loss'])
def minMe(params):
# Hyperopt tuning for hyperparameters
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestRegressor
from hyperopt import STATUS_OK
try:
import dill as pickle
print('Went with dill')
except ImportError:
import pickle
def hyperopt_rf(params):
rf = RandomForestRegressor(**params)
return cross_val_score(rf, train_xg_x, train_xg_y).mean()
acc = hyperopt_rf(params)
print 'new acc:', acc, 'params: ', params
return {'loss': -acc, 'status': STATUS_OK}
wrapper.py
from hyperopt import hp, fmin, tpe, STATUS_OK, Trials
from hyperopt.mongoexp import MongoTrials
import optimizer
space_rf = { 'min_samples_leaf': hp.choice('min_samples_leaf', range(1,100)) }
best = fmin(fn=optimizer.minMe, space=space_rf, trials=trials, algo=tpe.suggest, max_evals=100)
print "Best: ", best
trials = MongoTrials('mongo://localhost:1234/foo_db/jobs', exp_key='exp1')
Once you have this code link the optimizer.py to the bin folder
ln -s /Users/WernerChao/Git/test/optimizer.py /Users/WernerChao/tensorflow/bin/
now run the wrapper.py and then the mongo worker it should be able to import the optimizer from its local context and run the minMe function.

Try to install Dill in the Python environment of your tensorflow (or possibly the worker):
/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt
Your aim is to get rid of the hyperopt error message:
hyperopt.mongoexp:Error while unpickling. Try installing dill via "pip install dill" for enhanced pickling support.
This is because the Python by default cannot marshal a function. It requires dill library to extend Python's pickling module for serialising/de-serialising Python objects. In your case, it failed to serialise your function minMe().

I made a separate file which calculates the loss and copied it to /anaconda2/bin/
and
/anaconda2/lib/python2.7/site-packages/hyperopt
it is working fine.
This was my Traceback
Traceback (most recent call last):
File "/home/greatskull/anaconda2/bin/hyperopt-mongo-worker", line 6, in <module>
sys.exit(hyperopt.mongoexp.main_worker())
File "/home/greatskull/anaconda2/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1302, in main_worker
return main_worker_helper(options, args)
File "/home/greatskull/anaconda2/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1249, in main_worker_helper
mworker.run_one(reserve_timeout=float(options.reserve_timeout))
File "/home/greatskull/anaconda2/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1073, in run_one
with temp_dir(workdir, erase_created_workdir), working_dir(workdir):
File "/home/greatskull/anaconda2/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/home/greatskull/anaconda2/lib/python2.7/site-packages/hyperopt/utils.py", line 229, in temp_dir
os.makedirs(dir)
File "/home/greatskull/anaconda2/lib/python2.7/os.py", line 150, in makedirs
makedirs(head, mode)
File "/home/greatskull/anaconda2/lib/python2.7/os.py", line 157, in makedirs
mkdir(name, mode)

Publishing messages to ActiveMQ using Gatling

I have been using Gatling to publish messages to ActiveMq server. I get "java.lang.SecurityException: Invalid username: null or empty" tho I use valid username and password. Here is my test code and the exception were thrown. Any inputs on how to fix this will be of help.
import io.gatling.core.Predef.Simulation
import io.gatling.core.Predef._
import io.gatling.jms.Predef._
import io.gatling.core.config.Credentials
import org.apache.activemq.ActiveMQConnectionFactory
import org.apache.activemq.jndi.ActiveMQInitialContextFactory
import javax.jms._
class WebProducer extends Simulation{
val jmsUsername:String="userName"
val jmsPwd:String="Password"
val jmsConfig = jms
.connectionFactoryName("ConnectionFactory")
.url("ssl://message01-dev.platform.net:61617")
.credentials(jmsUsername,jmsPwd)
.disableAnonymousConnect
.contextFactory(classOf[org.apache.activemq.jndi.ActiveMQInitialContextFactory].getName)
.listenerCount(1)
.usePersistentDeliveryMode
.receiveTimeout(6000)
val scn = scenario("JMS DSL test").repeat(1) {
exec(jms("req reply testing").
reqreply
.queue("YourJMSQueueName")
.replyQueue("YourJMSQueueName")
.textMessage("payload To be posted")
.property("company_id", "1234598776665")
.property("event_type","EntityCreate")
.property("event_target_entity_type","Account")
)
}
setUp(scn.inject(atOnceUsers(1)))
.protocols(jmsConfig)
}
Following is the exception was thrown :
java.lang.SecurityException: Invalid username: null or empty

Ok, I got this working adding two things :
- .disableAnonymousConnect after .credentials(jmsUsername,jmsPwd)
- .replyQueue(jmsQueueName) after .queue(jmsQueueName)
I edited the above code to reflect the same.
Happy Gatling !

Selenium find_element_by_id returns None

I am using selenium and i have properly installed the selenium module on redhat linux 6.
Below is my script:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
from selenium.common.exceptions import NoSuchElementException
import unittest, time, re
import zlib
class Sele1(unittest.TestCase):
def setUp(self):
self.driver = webdriver.Firefox()
self.driver.implicitly_wait(30)
self.base_url = "https://bugzilla.example.com/"
self.verificationErrors = []
def test_sele1(self):
driver = self.driver
driver.get(self.base_url + "/")
driver.find_element_by_id("Bugzilla_login").clear()
driver.find_element_by_id("Bugzilla_login").send_keys("username")
driver.find_element_by_id("Bugzilla_password").clear()
driver.find_element_by_id("Bugzilla_password").send_keys("password")
driver.find_element_by_id("log_in").click()
driver.find_element_by_id("quicksearch").clear()
driver.find_element_by_id("quicksearch").send_keys("new bugs is bugzilla tool")
driver.find_element_by_xpath("//input[#value='Find Bugs']").click()
def is_element_present(self, how, what):
try: self.driver.find_element(by=how, value=what)
except NoSuchElementException, e: return False
return True
def tearDown(self):
self.driver.quit()
self.assertEqual([], self.verificationErrors)
if __name__ == "__main__":
unittest.main()
When i am running this script it is showing errors:
ERROR: test_sele1 (main.Sele1)
Traceback (most recent call last):
File "sele1.py", line 19, in test_sele1
driver.find_element_by_id("Bugzilla_login").clear()
AttributeError: 'NoneType' object has no attribute 'clear'
Ran 1 test in 2.018s
FAILED (errors=1)
Plateform: Redhat
Python Version: 2.6
Note: while running the same script in windows7, it is running fine but not running in linux with python2.6
Please help me for this...
Thanks in advance!

Simple: Selenium can't find an element called "Bugzilla_login".
There could be hundreds of reasons this may be happening. I'd start with checking whether you're loading the correct page.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Reading jsonlines with odo - odo

So apparently the file extension is important. This works: odo(example('iris.csv'), "jsonlines://iris.json") Renaming my offentliggoerelser.jsonl to offentliggoerelser.json gets me past the NotImplementedError.

Related

pytest-dependency is not working in my test

cloud-init: Can write_files be merged?

Hyperopt mongotrials issue with Pickle: AttributeError: 'module' object has no attribute

Publishing messages to ActiveMQ using Gatling

Selenium find_element_by_id returns None

Categories

Resources