Pytest: How to display failed assertion only once, not twice

I run pytest via PyCharm, and execute a single test:
/home/guettli/projects/lala-env/bin/python /snap/pycharm-professional/230/plugins/python/helpers/pycharm/_jb_pytest_runner.py --target test_models.py::test_address_is_complete
Testing started at 11:53 ...
Launching pytest with arguments test_models.py::test_address_is_complete in /home/guettli/projects/lala-env/src/lala/lala/tests
============================= test session starts ==============================
platform linux -- Python 3.8.5, pytest-6.2.0, py-1.10.0, pluggy-0.13.1 -- /home/guettli/projects/lala-env/bin/python
cachedir: .pytest_cache
django: settings: mysite.settings (from ini)
rootdir: /home/guettli/projects/lala-env/src/lala, configfile: pytest.ini
plugins: django-4.1.0
collecting ... collected 1 item
test_models.py::test_address_is_complete Creating test database for alias 'default' ('test_lala')...
Operations to perform:
Synchronize unmigrated apps: allauth, colorfield, debug_toolbar, google, messages, staticfiles
Apply all migrations: account, admin, auth, contenttypes, lala, sessions, sites, socialaccount
Synchronizing apps without migrations:
Creating tables...
Running deferred SQL...
Running migrations:
Applying contenttypes.0001_initial... OK
Applying auth.0001_initial... OK
Applying account.0001_initial... OK
Applying account.0002_email_max_length... OK
Applying admin.0001_initial... OK
Applying admin.0002_logentry_remove_auto_add... OK
Applying admin.0003_logentry_add_action_flag_choices... OK
Applying contenttypes.0002_remove_content_type_name... OK
Applying auth.0002_alter_permission_name_max_length... OK
Applying auth.0003_alter_user_email_max_length... OK
Applying auth.0004_alter_user_username_opts... OK
Applying auth.0005_alter_user_last_login_null... OK
Applying auth.0006_require_contenttypes_0002... OK
Applying auth.0007_alter_validators_add_error_messages... OK
Applying auth.0008_alter_user_username_max_length... OK
Applying auth.0009_alter_user_last_name_max_length... OK
Applying auth.0010_alter_group_name_max_length... OK
Applying auth.0011_update_proxy_permissions... OK
Applying auth.0012_alter_user_first_name_max_length... OK
Applying lala.0001_initial... OK
Applying lala.0002_offer_price... OK
Applying lala.0003_order_amount... OK
Applying lala.0004_auto_20201215_2043... OK
Applying lala.0005_auto_20201229_2148... OK
Applying lala.0006_auto_20201229_2150... OK
Applying lala.0007_auto_20210117_1632... OK
Applying lala.0008_auto_20210117_1632... OK
Applying lala.0009_add_address... OK
Applying lala.0010_auto_20210117_2102... OK
Applying lala.0011_auto_20210119_1909... OK
Applying lala.0012_allergen_short... OK
Applying lala.0013_auto_20210119_1914... OK
Applying lala.0014_auto_20210120_0734... OK
Applying lala.0015_auto_20210120_0752... OK
Applying lala.0016_auto_20210120_1923... OK
Applying lala.0017_allergenuser... OK
Applying lala.0018_address_place... OK
Applying lala.0019_auto_20210126_2027... OK
Applying lala.0020_auto_20210126_2027... OK
Applying lala.0021_recurringoffer_days... OK
Applying lala.0022_auto_20210126_2129... OK
Applying lala.0023_auto_20210201_2056... OK
Applying lala.0024_globalconfig_navbar_title... OK
Applying lala.0025_activationstate... OK
Applying sessions.0001_initial... OK
Applying sites.0001_initial... OK
Applying sites.0002_alter_domain_unique... OK
Applying socialaccount.0001_initial... OK
Applying socialaccount.0002_token_max_lengths... OK
Applying socialaccount.0003_extra_data_default_dict... OK
Destroying test database for alias 'default' ('test_lala')...
FAILED
lala/tests/test_models.py:18 (test_address_is_complete)
user = <User: Dr. Foo>
    def test_address_is_complete(user):
        address = user.address
>       assert address.is_complete
E       assert False
E        +  where False = <Address: Address object (1)>.is_complete
test_models.py:21: AssertionError
Assertion failed
Assertion failed
=================================== FAILURES ===================================
___________________________ test_address_is_complete ___________________________
user = <User: Dr. Foo>
    def test_address_is_complete(user):
        address = user.address
>       assert address.is_complete
E       assert False
E        +  where False = <Address: Address object (1)>.is_complete
test_models.py:21: AssertionError
=========================== short test summary info ============================
FAILED test_models.py::test_address_is_complete - assert False
============================== 1 failed in 2.88s ===============================
Process finished with exit code 1
Assertion failed
Assertion failed
Why does the exception get displayed twice?

Related

How to solve "Step definition is not found" error: StepDefinitionNotFoundError

Here is my feature file, a.feature:
Scenario Outline: Some outline
    Given something
    When <thing> is moved to <position>
    Then something else
    Examples:
    | thing | position |
    | 1     | 1        |
I saved it as /tmp/a.feature.
Here is my pytest step file (/tmp/a.py):
from pytest_bdd import (
    given,
    scenario,
    then,
    when,
)

@scenario('./x.feature', 'Some outline')
def test_some_outline():
    """Some outline."""

@given('something')
def something():
    """something."""
    pass

@when('<thing> is moved to <position>')
def thing_is_moved_to_position(thing, position):
    assert isinstance(thing, int)
    assert isinstance(position, int)

@then('something else')
def something_else():
    """something else."""
    pass
When I run it:
$ pwd
/tmp
$ pytest ./a.py
............
............
E pytest_bdd.exceptions.StepDefinitionNotFoundError: Step definition is not found: When "1 is moved to 1". Line 3 in scenario "Some outline" in the feature "/tmp/x.feature"
/home/cyan/.local/lib/python3.10/site-packages/pytest_bdd/scenario.py:192: StepDefinitionNotFoundError
============= short test summary info =============
FAILED x.py::test_some_outline[1-1] - pytest_bdd.exceptions.StepDefinitionNotFoundError: Step definition is not found: When "1 is moved to 1". Line 3 in scenario "Some outli...
============ 1 failed in 0.09s ============

Xcode also includes succeeded tests in the failed tests section

I am using Xcode's test plans with the option 'Retry until failure' and some repetitions. When a test fails but then succeeds, it will still show up in the failed test section when another test is actually consistently failing. This is my code:
import XCTest
class SomeTests: XCTestCase {
    func testExampleA() throws {
        let random = Int.random(in: 0...10)
        // Will always fail
        if random != -1 {
            XCTFail("fail")
        }
    }
    func testExampleB() throws {
        let random = Int.random(in: 0...3)
        if random != 1 {
            XCTFail("fail")
        }
    }
}
And I run the test command:
xcodebuild test -project uitest.xcodeproj -scheme unittest -destination 'platform=iOS Simulator,name=iPhone 12,OS=15.5' -only-testing SomeTests
This is the output:
/Users/jaspervisser/Desktop/uitest/dfsdfsff/dfsdfsff.swift:16: error: -[dfsdfsff.SomeTests testExampleA] : failed - fail
Test Case '-[dfsdfsff.SomeTests testExampleA]' failed (0.005 seconds).
Test Case '-[dfsdfsff.SomeTests testExampleA]' started (Iteration 20 of 20).
/Users/jaspervisser/Desktop/uitest/dfsdfsff/dfsdfsff.swift:16: error: -[dfsdfsff.SomeTests testExampleA] : failed - fail
Test Case '-[dfsdfsff.SomeTests testExampleA]' failed (0.005 seconds).
Test Case '-[dfsdfsff.SomeTests testExampleB]' started (Iteration 1 of 20).
/Users/jaspervisser/Desktop/uitest/dfsdfsff/dfsdfsff.swift:24: error: -[dfsdfsff.SomeTests testExampleB] : failed - fail
Test Case '-[dfsdfsff.SomeTests testExampleB]' failed (0.005 seconds).
Test Case '-[dfsdfsff.SomeTests testExampleB]' started (Iteration 2 of 20).
/Users/jaspervisser/Desktop/uitest/dfsdfsff/dfsdfsff.swift:24: error: -[dfsdfsff.SomeTests testExampleB] : failed - fail
Test Case '-[dfsdfsff.SomeTests testExampleB]' failed (0.005 seconds).
Test Case '-[dfsdfsff.SomeTests testExampleB]' started (Iteration 3 of 20).
Test Case '-[dfsdfsff.SomeTests testExampleB]' passed (0.019 seconds).
Test Suite 'SomeTests' failed at 2022-07-04 17:55:03.367.
Executed 23 tests, with 22 failures (0 unexpected) in 0.184 (0.202) seconds
Test Suite 'dfsdfsff.xctest' failed at 2022-07-04 17:55:03.373.
Executed 23 tests, with 22 failures (0 unexpected) in 0.184 (0.209) seconds
Test Suite 'All tests' failed at 2022-07-04 17:55:03.377.
Executed 23 tests, with 22 failures (0 unexpected) in 0.184 (0.217) seconds
2022-07-04 17:55:28.752 xcodebuild[77552:4967879] [MT] IDETestOperationsObserverDebug: 31.789 elapsed -- Testing started completed.
2022-07-04 17:55:28.752 xcodebuild[77552:4967879] [MT] IDETestOperationsObserverDebug: 0.000 sec, +0.000 sec -- start
2022-07-04 17:55:28.752 xcodebuild[77552:4967879] [MT] IDETestOperationsObserverDebug: 31.789 sec, +31.789 sec -- end
Test session results, code coverage, and logs:
/Users/jaspervisser/Library/Developer/Xcode/DerivedData/uitest-dthpivieuzigdfgvtdrzgailclme/Logs/Test/Test-unittest-2022.07.04_17-54-56-+0200.xcresult
Failing tests:
dfsdfsff:
SomeTests.testExampleA()
SomeTests.testExampleB()
** TEST FAILED **
You can see that testExampleB eventually succeeded, but it still shows up in the failing tests section. In the 'real' CI it takes me a lot of time to find the tests that are actually failing consistently. I don't care about tests that fail and then succeed; I just need to find the consistently failing ones. Is there a way to identify them? Can I filter out tests that eventually succeed?
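Every iteration's result is already in the xcodebuild output, so one workaround is to post-process the log instead of relying on the "Failing tests" summary. The minimal sketch below assumes the result lines keep the `Test Case '-[Module.Class testName]' passed|failed (...)` format shown above. It reports only the tests that failed at least once and never passed. The script name `consistent_failures.py` and the log file name are hypothetical.

```python
import re
import sys
from collections import defaultdict

# Matches result lines such as:
# Test Case '-[dfsdfsff.SomeTests testExampleB]' passed (0.019 seconds).
RESULT_LINE = re.compile(r"Test Case '-\[(?P<name>[^\]]+)\]' (?P<result>passed|failed)")


def consistently_failing(lines):
    """Return the tests that failed at least once and never passed."""
    counts = defaultdict(lambda: {"passed": 0, "failed": 0})
    for line in lines:
        match = RESULT_LINE.search(line)
        if match:
            counts[match.group("name")][match.group("result")] += 1
    return sorted(
        name for name, c in counts.items() if c["failed"] > 0 and c["passed"] == 0
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: xcodebuild test ... 2>&1 | tee test.log
    #        python3 consistent_failures.py test.log
    with open(sys.argv[1]) as log:
        for name in consistently_failing(log):
            print(name)
```

Applied to the output above, only testExampleA would be reported. testExampleB has a "passed" line in iteration 3, so it is left out.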

AWS Glue job failing with OOM exception when changing column names

I have an ETL job where I load some data from S3 into a DynamicFrame, relationalize it, and iterate through the DynamicFrames returned. I want to query the result in Athena later, so I want to replace '.' with '_' in the column names and lowercase them. To do this transformation, I convert each DynamicFrame into a Spark DataFrame. I've also seen another SO question where it turned out there is a reported problem with the AWS Glue RenameField transform, so I've stayed away from that.
I've tried a couple of things, including setting a 50 MB load size limit, repartitioning the DataFrame, using both dataframe.schema.names and dataframe.columns, using reduce instead of loops, and using Spark SQL to change the names, and nothing has worked. I'm fairly certain that it's this transformation that's failing, because I've put some print statements in and the print right after this transformation never shows up. I used a UDF at one point, but that also failed. I've tried the actual renaming with df.toDF(new_column_names) and df.withColumnRenamed(), but the job never gets that far, because I've never seen it get past retrieving the column names. Here's the code I've been using. I've been changing the actual renaming as described above, but the rest of it has stayed pretty much the same.
I've seen some people try using spark.executor.memory, spark.driver.memory, spark.executor.memoryOverhead and spark.driver.memoryOverhead. I've used those and set them to the maximum AWS Glue allows, but to no avail.
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame
from pyspark.sql.functions import explode, col, lower, trim, regexp_replace
import copy
import json
import boto3
import botocore
import time
# ========================================================
# UTILITY FUNCTIONS
# ========================================================
def lower_and_pythonize(s=None):
    if s is not None:
        return s.replace('.', '_').lower()
    else:
        return None
# pyspark implementation of renaming
# exprs = [
#     regexp_replace(lower(trim(col(c))), '\.', '_').alias(c) if t == "string" else col(c)
#     for (c, t) in data_frame.dtypes
# ]
# ========================================================
# END UTILITY FUNCTIONS
# ========================================================
## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
# my params
bucket_name = '<my-s3-bucket>'  # name of the bucket. do not include 's3://', that's added later
output_key = '<my-output-path>'  # key where all of the output is saved
input_keys = ["<root-directory-i'm using>"]  # highest level key that holds all of the desired data
s3_exclusions = "[\"*.orc\"]"  # list of strings to exclude. Documentation: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-connect.html#aws-glue-programming-etl-connect-s3
s3_exclusions = s3_exclusions.replace('\n', '')
dfc_root_table_name = 'root'  # name of the root table generated in the relationalize process
input_paths = ['s3://' + bucket_name + '/' + x for x in input_keys]  # turn input keys into s3 paths
output_connection_opts = {"path": "s3://" + bucket_name + "/" + output_key}  # dict of options. Documentation link found above the write_dynamic_frame.from_options line
s3_client = boto3.client('s3', 'us-east-1')  # s3 client used for writing to s3
s3_resource = boto3.resource('s3', 'us-east-1')  # s3 resource used for checking if key exists
group_mb = 50  # NOTE: 75 has proven to be too much when running on all of the april data
group_size = str(group_mb * 1024 * 1024)
input_connection_opts = {'paths': input_paths,
                         'groupFiles': 'inPartition',
                         'groupSize': group_size,
                         'recurse': True,
                         'exclusions': s3_exclusions}  # dict of options. Documentation link found above the create_dynamic_frame_from_options line
print(sc._conf.get('spark.executor.cores'))
num_paritions = int(sc._conf.get('spark.executor.cores')) * 4
print('Loading all json files into DynamicFrame...')
loading_time = time.time()
df = glueContext.create_dynamic_frame_from_options(connection_type='s3', connection_options=input_connection_opts, format='json')
print('Done. Time to complete: {}s'.format(time.time() - loading_time))
# using the list of known null fields (at least on small sample size) remove them
#df = df.drop_fields(drop_paths)
# drop any remaining null fields. The above covers known problems that this step doesn't fix
print('Dropping null fields...')
dropping_time = time.time()
df_without_null = DropNullFields.apply(frame=df, transformation_ctx='df_without_null')
print('Done. Time to complete: {}s'.format(time.time() - dropping_time))
df = None
print('Relationalizing dynamic frame...')
relationalizing_time = time.time()
dfc = Relationalize.apply(frame=df_without_null, name=dfc_root_table_name, info="RELATIONALIZE", transformation_ctx='dfc', stageThreshold=3)
print('Done. Time to complete: {}s'.format(time.time() - relationalizing_time))
keys = sorted(dfc.keys(), key=len)
print('Writting all dynamic frames to s3...')
writting_time = time.time()
for key in keys:
    good_key = lower_and_pythonize(s=key)
    data_frame = dfc.select(key).toDF()
    # lowercase all the names and remove '.'
    print('Removing . and _ from names for {} frame...'.format(key))
    df_fix_names_time = time.time()
    print('Repartitioning data frame...')
    # repartition() returns a new DataFrame, so the result must be reassigned
    data_frame = data_frame.repartition(num_paritions)
    print('Done.')
    #
    print('Changing names...')
    for old_name in data_frame.schema.names:
        data_frame = data_frame.withColumnRenamed(old_name, old_name.replace('.', '_').lower())
    print('Done.')
    #
    df_now = DynamicFrame.fromDF(dataframe=data_frame, glue_ctx=glueContext, name='df_now')
    print('Done. Time to complete: {}'.format(time.time() - df_fix_names_time))
    # if a conflict of types appears, make it 2 columns
    # https://docs.aws.amazon.com/glue/latest/dg/built-in-transforms.html
    print('Fixing any type conflicts for {} frame...'.format(key))
    df_resolve_time = time.time()
    resolved = ResolveChoice.apply(frame=df_now, choice='make_cols', transformation_ctx='resolved')
    print('Done. Time to complete: {}'.format(time.time() - df_resolve_time))
    # check if key exists in s3. if not make one
    out_connect = copy.deepcopy(output_connection_opts)
    out_connect['path'] = out_connect['path'] + '/' + str(good_key)
    try:
        s3_resource.Object(bucket_name, output_key + '/' + good_key + '/').load()
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == '404' or 'NoSuchKey' in e.response['Error']['Code']:
            # object doesn't exist
            s3_client.put_object(Bucket=bucket_name, Key=output_key + '/' + good_key + '/')
        else:
            print(e)
    ## https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-glue-context.html
    print('Writing {} frame to S3...'.format(key))
    df_writing_time = time.time()
    datasink4 = glueContext.write_dynamic_frame.from_options(frame=df_now, connection_type="s3", connection_options=out_connect, format="orc", transformation_ctx="datasink4")
    out_connect = None
    datasink4 = None
    print('Done. Time to complete: {}'.format(time.time() - df_writing_time))
print('Done. Time to complete: {}s'.format(time.time() - writting_time))
job.commit()
Here is the error I'm getting
19/06/07 16:33:36 DEBUG Client:
client token: N/A
diagnostics: Application application_1559921043869_0001 failed 1 times due to AM Container for appattempt_1559921043869_0001_000001 exited with exitCode: -104
For more detailed output, check application tracking page:http://ip-172-32-9-38.ec2.internal:8088/cluster/app/application_1559921043869_0001Then, click on links to logs of each attempt.
Diagnostics: Container [pid=9630,containerID=container_1559921043869_0001_01_000001] is running beyond physical memory limits. Current usage: 5.6 GB of 5.5 GB physical memory used; 8.8 GB of 27.5 GB virtual memory used. Killing container.
Dump of the process-tree for container_1559921043869_0001_01_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 9630 9628 9630 9630 (bash) 0 0 115822592 675 /bin/bash -c LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native /usr/lib/jvm/java-openjdk/bin/java -server -Xmx5120m -Djava.io.tmpdir=/mnt/yarn/usercache/root/appcache/application_1559921043869_0001/container_1559921043869_0001_01_000001/tmp '-XX:+UseConcMarkSweepGC' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:MaxHeapFreeRatio=70' '-XX:+CMSClassUnloadingEnabled' '-XX:OnOutOfMemoryError=kill -9 %p' '-Djavax.net.ssl.trustStore=ExternalAndAWSTrustStore.jks' '-Djavax.net.ssl.trustStoreType=JKS' '-Djavax.net.ssl.trustStorePassword=amazon' '-DRDS_ROOT_CERT_PATH=rds-combined-ca-bundle.pem' '-DREDSHIFT_ROOT_CERT_PATH=redshift-ssl-ca-cert.pem' '-DRDS_TRUSTSTORE_URL=file:RDSTrustStore.jks' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1559921043869_0001/container_1559921043869_0001_01_000001 org.apache.spark.deploy.yarn.ApplicationMaster --class 'org.apache.spark.deploy.PythonRunner' --primary-py-file runscript.py --arg 'script_2019-06-07-15-29-50.py' --arg '--JOB_NAME' --arg 'tss-json-to-orc' --arg '--JOB_ID' --arg 'j_f9f7363e5d8afa20784bc83d7821493f481a78352641ad2165f8f68b88c8e5fe' --arg '--JOB_RUN_ID' --arg 'jr_a77087792dd74231be1f68c1eda2ed33200126b8952c5b1420cb6684759cf233' --arg '--job-bookmark-option' --arg 'job-bookmark-disable' --arg '--TempDir' --arg 's3://aws-glue-temporary-059866946490-us-east-1/zmcgrath' --properties-file /mnt/yarn/usercache/root/appcache/application_1559921043869_0001/container_1559921043869_0001_01_000001/__spark_conf__/__spark_conf__.properties 1> /var/log/hadoop-yarn/containers/application_1559921043869_0001/container_1559921043869_0001_01_000001/stdout 2> /var/log/hadoop-yarn/containers/application_1559921043869_0001/container_1559921043869_0001_01_000001/stderr
|- 9677 9648 9630 9630 (python) 12352 2628 1418354688 261364 python runscript.py script_2019-06-07-15-29-50.py --JOB_NAME tss-json-to-orc --JOB_ID j_f9f7363e5d8afa20784bc83d7821493f481a78352641ad2165f8f68b88c8e5fe --JOB_RUN_ID jr_a77087792dd74231be1f68c1eda2ed33200126b8952c5b1420cb6684759cf233 --job-bookmark-option job-bookmark-disable --TempDir s3://aws-glue-temporary-059866946490-us-east-1/zmcgrath
|- 9648 9630 9630 9630 (java) 265906 3083 7916974080 1207439 /usr/lib/jvm/java-openjdk/bin/java -server -Xmx5120m -Djava.io.tmpdir=/mnt/yarn/usercache/root/appcache/application_1559921043869_0001/container_1559921043869_0001_01_000001/tmp -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError=kill -9 %p -Djavax.net.ssl.trustStore=ExternalAndAWSTrustStore.jks -Djavax.net.ssl.trustStoreType=JKS -Djavax.net.ssl.trustStorePassword=amazon -DRDS_ROOT_CERT_PATH=rds-combined-ca-bundle.pem -DREDSHIFT_ROOT_CERT_PATH=redshift-ssl-ca-cert.pem -DRDS_TRUSTSTORE_URL=file:RDSTrustStore.jks -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1559921043869_0001/container_1559921043869_0001_01_000001 org.apache.spark.deploy.yarn.ApplicationMaster --class org.apache.spark.deploy.PythonRunner --primary-py-file runscript.py --arg script_2019-06-07-15-29-50.py --arg --JOB_NAME --arg tss-json-to-orc --arg --JOB_ID --arg j_f9f7363e5d8afa20784bc83d7821493f481a78352641ad2165f8f68b88c8e5fe --arg --JOB_RUN_ID --arg jr_a77087792dd74231be1f68c1eda2ed33200126b8952c5b1420cb6684759cf233 --arg --job-bookmark-option --arg job-bookmark-disable --arg --TempDir --arg s3://aws-glue-temporary-059866946490-us-east-1/zmcgrath --properties-file /mnt/yarn/usercache/root/appcache/application_1559921043869_0001/container_1559921043869_0001_01_000001/__spark_conf__/__spark_conf__.properties
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1559921462650
final status: FAILED
tracking URL: http://ip-172-32-9-38.ec2.internal:8088/cluster/app/application_1559921043869_0001
user: root
Here are the log contents from the job
LogType:stdout
Log Upload Time:Fri Jun 07 16:33:36 +0000 2019
LogLength:487
Log Contents:
4
Loading all json files into DynamicFrame...
Done. Time to complete: 59.5056920052s
Dropping null fields...
null_fields [<some fields that were dropped>]
Done. Time to complete: 529.95293808s
Relationalizing dynamic frame...
Done. Time to complete: 2773.11689401s
Writting all dynamic frames to s3...
Removing . and _ from names for root frame...
Repartitioning data frame...
Done.
Changing names...
End of LogType:stdout
As I said earlier, the "Done." print after changing the names never appears in the logs. I've seen plenty of people getting the same error I'm seeing, and I've tried a fair number of their solutions with no success. Any help you can provide would be much appreciated. Let me know if you need any more information. Thanks
Edit
Prabhakar's comment reminded me that I have tried the memory worker type in AWS Glue, and it still failed. As stated above, I have tried raising the memoryOverhead from 5 to 12, but to no avail. Neither of these made the job complete successfully.
Update
To make debugging easier, I replaced the column-renaming code above with the following:
print('Changing names...')
name_counter = 0
for old_name in data_frame.schema.names:
    print('Name number {}. name being changed: {}'.format(name_counter, old_name))
    data_frame = data_frame.withColumnRenamed(old_name, old_name.replace('.','_').lower())
    name_counter += 1
print('Done.')
And I got the following output
Removing . and _ from names for root frame...
Repartitioning data frame...
Done.
Changing names...
End of LogType:stdout
So it must be a problem with the data_frame.schema.names part. Could it be this line with my loop through all of the DynamicFrames? Am I looping through the DynamicFrames from the relationalize transformation correctly?
Update 2
Glue recently added more verbose logs and I found this
ERROR YarnClusterScheduler: Lost executor 396 on ip-172-32-78-221.ec2.internal: Container killed by YARN for exceeding memory limits. 5.5 GB of 5.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
This happens for more than just this executor too; it looks like almost all of them.
I can try to increase the executor memory overhead, but I would like to know why getting the column names results in an OOM error. I wouldn't expect something that trivial to take up that much memory.
Update 3
I attempted to run the job with both spark.driver.memoryOverhead=7g and spark.yarn.executor.memoryOverhead=7g, and I again got an OOM error.
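The renaming loop above calls withColumnRenamed once per column. An alternative is to compute all the new names first and apply them in a single call to PySpark's `DataFrame.toDF(*cols)`. That method takes the names as separate arguments, so a list such as the `new_column_names` mentioned in the question has to be unpacked with `*`. The minimal sketch below also checks for collisions, for example 'a.b' and 'A_b' both becoming 'a_b'. The helper names are hypothetical. According to the updates above, the job seems to die before the first rename, so this change alone may not fix the OOM.

```python
def pythonize_name(name):
    """Same transformation as lower_and_pythonize: '.' -> '_', then lowercase."""
    return name.replace(".", "_").lower()


def rename_all_columns(data_frame):
    """Rename every column of a Spark DataFrame in one toDF call.

    Raises ValueError if two original names map to the same new name.
    """
    old_names = data_frame.columns
    new_names = [pythonize_name(name) for name in old_names]
    seen = {}
    for old, new in zip(old_names, new_names):
        if new in seen:
            raise ValueError(
                "columns {!r} and {!r} both map to {!r}".format(seen[new], old, new)
            )
        seen[new] = old
    # toDF expects the new names as separate positional arguments
    return data_frame.toDF(*new_names)
```

Inside the loop, this would replace the two renaming steps: `data_frame = rename_all_columns(dfc.select(key).toDF())`.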

Moped: Running naked mongo command

I am getting a "no such command" error when running the stat command
(as in db.stat() in the mongo console) from Moped:
session.command(stat: 1)
fails with error "no such cmd: stat"
The mongo console command is "stats" (not "stat") and is documented here.
http://docs.mongodb.org/manual/reference/method/db.stats/
As detailed,
The db.stats() method is a wrapper around the dbStats database command.
http://docs.mongodb.org/manual/reference/command/dbStats/#dbcmd.dbStats
Here is a test showing how to run it with Moped:
test.rb
require 'moped'
require 'test/unit'
require 'pp'
class MyTest < Test::Unit::TestCase
  def setup
    @session = Moped::Session.new([ "127.0.0.1:27017" ])
    @session.use "test"
  end
  test "db stats" do
    puts "Moped::VERSION:#{Moped::VERSION}"
    dbstats = @session.command(dbstats: 1)
    assert_equal("test", dbstats["db"])
    pp dbstats
  end
end
ruby test.rb
Loaded suite test
Started
Moped::VERSION:1.5.2
{"db"=>"test",
"collections"=>3,
"objects"=>5,
"avgObjSize"=>99.2,
"dataSize"=>496,
"storageSize"=>24576,
"numExtents"=>3,
"indexes"=>1,
"indexSize"=>8176,
"fileSize"=>67108864,
"nsSizeMB"=>16,
"dataFileVersion"=>{"major"=>4, "minor"=>5},
"extentFreeList"=>{"num"=>0, "totalSize"=>0},
"ok"=>1.0}
.
Finished in 0.005335 seconds.
1 tests, 1 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed
187.44 tests/s, 187.44 assertions/s

WWW::Mechanize and strawberry perl

When I try to install the WWW::Mechanize distribution from CPAN with Strawberry Perl (v5.10.1) on
Windows 7, I get the failure below:
cpan> install JESSE/WWW-Mechanize-1.70.tar.gz
Running make for J/JE/JESSE/WWW-Mechanize-1.70.tar.gz
Checksum for C:\strawberry\cpan\sources\authors\id\J\JE\JESSE\WWW-Mechanize-1.70.tar.gz ok
Scanning cache C:\strawberry\cpan\build for sizes
............................................................................DONE
CPAN.pm: Going to build J/JE/JESSE/WWW-Mechanize-1.70.tar.gz
WWW::Mechanize likes to have a lot of test modules for some of its tests.
The following are modules that would be nice to have, but not required.
Test::Memory::Cycle
Test::Taint
Checking if your kit is complete...
Looks good
Writing Makefile for WWW::Mechanize
Could not read metadata file. Falling back to other methods to determine prerequisites
cp lib/WWW/Mechanize/Examples.pod blib\lib\WWW\Mechanize\Examples.pod
cp lib/WWW/Mechanize/Link.pm blib\lib\WWW\Mechanize\Link.pm
cp lib/WWW/Mechanize/Image.pm blib\lib\WWW\Mechanize\Image.pm
cp lib/WWW/Mechanize/Cookbook.pod blib\lib\WWW\Mechanize\Cookbook.pod
cp lib/WWW/Mechanize/FAQ.pod blib\lib\WWW\Mechanize\FAQ.pod
cp lib/WWW/Mechanize.pm blib\lib\WWW\Mechanize.pm
C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/mech-dump blib\script\mech-dump
pl2bat.bat blib\script\mech-dump
JESSE/WWW-Mechanize-1.70.tar.gz
C:\strawberry\c\bin\dmake.EXE -- OK
Running make test
C:\strawberry\perl\bin\perl.exe "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib\lib', 'blib\arch')"
t\00-load.t t\add_header.t t\aliases.t t\area_link.t t\autocheck.t t\clone.t
t\content.t t\cookies.t t\credentials-api.t t\credentials.t t\die.t t\field.t
t\find_frame.t t\find_image.t t\find_inputs.t t\find_link-warnings.t t\find_link.t
t\find_link_id.t t\form-parsing.t t\form_with_fields.t t\frames.t t\image-new.t
t\image-parse.t t\link-base.t t\link-relative.t t\link.t t\new.t t\pod-coverage.t
t\pod.t t\regex-error.t t\save_content.t t\select.t t\taint.t t\tick.t t\untaint.t
t\upload.t t\warn.t t\warnings.t t\local\back.t t\local\click.t
t\local\click_button.t t\local\failure.t t\local\follow.t t\local\form.t
t\local\get.t t\local\nonascii.t t\local\overload.t t\local\page_stack.t
t\local\referer.t t\local\reload.t t\local\submit.t t\mech-dump\mech-dump.t
t\00-load.t .............. 1/2 # Testing WWW::Mechanize 1.70, with LWP 5.837, Perl 5.010001, C:\STRAWB~1\perl\bin\perl.exe
# Test::Memory::Cycle is not installed.
t\00-load.t .............. ok
t\add_header.t ........... ok
t\aliases.t .............. ok
t\area_link.t ............ ok
t\autocheck.t ............ ok
t\clone.t ................ ok
t\content.t .............. ok
t\cookies.t .............. skipped: HTTP::Server::Simple does not support Windows yet.
t\credentials-api.t ...... ok
t\credentials.t .......... ok
t\die.t .................. ok
t\field.t ................ ok
t\find_frame.t ........... ok
t\find_image.t ........... ok
t\find_inputs.t .......... ok
t\find_link-warnings.t ... ok
t\find_link.t ............ ok
t\find_link_id.t ......... ok
t\form-parsing.t ......... ok
t\form_with_fields.t ..... 1/? There are 2 forms with the named fields. The first one was used. at t\form_with_fields.t line 27
t\form_with_fields.t ..... ok
t\frames.t ............... ok
t\image-new.t ............ ok
t\image-parse.t .......... ok
t\link-base.t ............ ok
t\link-relative.t ........ ok
t\link.t ................. ok
t\local\back.t ........... 31/47
# Failed test '404 check'
# at t\local\back.t line 151.
t\local\back.t ........... 33/47 # got: '500'
# expected: '404'
# $server404url=http://SM-15828.emea.hpqcorp.net:50345/
# $mech->content="500 Can't connect to SM-15828.emea.hpqcorp.net:50345 (connect: timeout)
# "
t\local\back.t ........... 44/47 # Looks like you failed 1 test of 47.
t\local\back.t ........... Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/47 subtests
(less 2 skipped subtests: 44 okay)
t\local\click.t .......... ok
t\local\click_button.t ... ok
t\local\failure.t ........ ok
t\local\follow.t ......... ok
t\local\form.t ........... ok
t\local\get.t ............ ok
t\local\nonascii.t ....... ok
t\local\overload.t ....... skipped: Mysteriously stopped passing, and I don't know why.
t\local\page_stack.t ..... ok
t\local\referer.t ........ ok
t\local\reload.t ......... ok
t\local\submit.t ......... ok
t\mech-dump\mech-dump.t .. ok
t\new.t .................. ok
t\pod-coverage.t ......... skipped: Test::Pod::Coverage 1.04 required for testing POD coverage
t\pod.t .................. ok
t\regex-error.t .......... ok
t\save_content.t ......... ok
t\select.t ............... ok
t\taint.t ................ skipped: Test::Taint required for checking taintedness
t\tick.t ................. ok
t\untaint.t .............. ok
t\upload.t ............... ok
t\warn.t ................. ok
t\warnings.t ............. ok
Test Summary Report
-------------------
t\local\back.t (Wstat: 256 Tests: 47 Failed: 1)
Failed test: 33
Non-zero exit status: 1
t\local\click_button.t (Wstat: 0 Tests: 19 Failed: 0)
TODO passed: 15-17, 19
Files=52, Tests=558, 490 wallclock secs ( 1.51 usr + 0.36 sys = 1.87 CPU)
Result: FAIL
Failed 1/52 test programs. 1/558 subtests failed.
dmake.EXE: Error code 255, while making 'test_dynamic'
JESSE/WWW-Mechanize-1.70.tar.gz
C:\strawberry\c\bin\dmake.EXE test -- NOT OK
//hint// to see the cpan-testers results for installing this module, try:
reports JESSE/WWW-Mechanize-1.70.tar.gz
Running make install
make test had returned bad status, won't install without force
Failed during this command:
JESSE/WWW-Mechanize-1.70.tar.gz : make_test NO
Could someone help me figure out what the problem is here and how to solve it, if possible?
In general WWW::Mechanize doesn't always pass all of its tests on MSWin32.
If you only got one failed test, I would count my blessings and force install it.
Do as instructed and force install ..., or install while skipping the tests: cpanp i WWW::Mechanize --skip-test
Failed 1/52 test programs. 1/558 subtests failed.
100 - (1/52 )*100 = 98.0769230769231 percent OK
100 - (1/558 )*100 = 99.8207885304659 percent OK