How to print arguments that I sent to an Airflow-EMR cluster? - pyspark

I am running spark-submit on EMR through Airflow 2.0, and I submit the steps as shown below.
My s3://dbook/ bucket holds all the files needed for spark-submit. First I copy all the files to EMR (the "Copy S3 to EMR" step) and then run the spark-submit command, but I get the error
"no module named config". I need to know which args are being sent to the EMR cluster. How can I achieve this?
SPARK_STEPS = [
    {
        'Name': 'Copy S3 to EMR',
        "ActionOnFailure": "CANCEL_AND_WAIT",
        'HadoopJarStep': {
            "Jar": "command-runner.jar",
            "Args": ['aws', 's3', 'cp', 's3://dbook/', '.', '--recursive'],
        },
    },
    {
        'Name': 'Spark-Submit Command',
        "ActionOnFailure": "CANCEL_AND_WAIT",
        'HadoopJarStep': {
            "Jar": "command-runner.jar",
            "Args": [
                'spark-submit',
                '--py-files',
                'config.zip,jobs.zip',
                'main.py'],
        },
    }
]
Thanks,
Xi
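One way to see exactly what each step will send is to print the Args lists in the DAG file before the steps are submitted. The following is a minimal sketch, not something from the original post: describe_steps is a hypothetical helper name, and SPARK_STEPS is repeated from the question only so the block runs on its own.

```python
import shlex

# Copied from the question so this sketch is self-contained.
SPARK_STEPS = [
    {
        "Name": "Copy S3 to EMR",
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["aws", "s3", "cp", "s3://dbook/", ".", "--recursive"],
        },
    },
    {
        "Name": "Spark-Submit Command",
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--py-files", "config.zip,jobs.zip", "main.py"],
        },
    },
]


def describe_steps(steps):
    """Return one line per step: the step name and its Args as a shell-style command.

    describe_steps is a hypothetical helper, not an Airflow or boto3 API.
    """
    lines = []
    for step in steps:
        args = step["HadoopJarStep"]["Args"]
        # shlex.join quotes any token that contains spaces or shell metacharacters.
        lines.append(f"{step['Name']}: {shlex.join(args)}")
    return lines


for line in describe_steps(SPARK_STEPS):
    print(line)
```

On the cluster side, printing sys.argv at the top of main.py should similarly show what the script itself received.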

Related

CI Using Azure Pipelines and Nx fails

UPDATE: I was able to get this working by setting "ProduceReferenceAssembly" to false in the .csproj files of the libs. Not sure if this is optimal or intended but that is what worked for me. See: Ref folder within .NET 5.0 bin folder
I'm trying to set up a proof of concept using nx-dotnet and Azure using this example .yml: https://nx.dev/recipes/ci/monorepo-ci-azure
I have 3 services (libs) and 3 apis (apps) ... I made a change to one of the apis to test caching and incremental builds.
The unchanged projects all say [remote cache] but then the build fails because it's looking for the .dlls in the /obj/Debug/ directory. Why use that when there are .dlls in the /dist directory?
How can I fix this? Is there something in the nx.json or project.json files I need to change?
(https://i.stack.imgur.com/IQhaO.png)
I tried using the same command locally on my machine and it completes as expected. I expect the build to complete. The build fails when remote caching is used.
{
  "name": "ShipmentService",
  "$schema": "../../node_modules/nx/schemas/project-schema.json",
  "projectType": "library",
  "sourceRoot": "libs/ShipmentService",
  "targets": {
    "build": {
      "executor": "@nx-dotnet/core:build",
      "outputs": [
        "{workspaceRoot}/dist/libs/ShipmentService",
        "{workspaceRoot}/libs/ShipmentService/obj"
      ],
      "options": {
        "configuration": "Debug",
        "noDependencies": true
      },
      "configurations": {
        "production": {
          "configuration": "Release"
        }
      }
    },
    "lint": {
      "executor": "@nx-dotnet/core:format"
    }
  },
  "tags": []
}
I tried the proposed workaround. Here's what I'm noticing: platformservice:build [remote cache]
It now sees the intermediates path, but the issue is basically the same and the build fails with the same error.
Updated project.json (all of them have been updated to look similar to this; I tried with and without the /obj portion):
"outputs": [
  "{workspaceRoot}/dist/libs/ShipmentService",
  "{workspaceRoot}/dist/intermediates/libs/ShipmentService/obj"
],
This is a bug on nx-dotnet's side, and we aren't quite capturing all of the outputs that are needed for the cache. If you add the path to the obj directory into the outputs array of the build target in project.json it should work. Here's the workaround which will eventually be migrated:
I've got a branch with this working, you do indeed need the obj directory as part of the cache. There are some weird intricacies with this though. I'll work on a migration + patch. In the meantime, the workaround that I used is:
Update Directory.Build.props adding these to the property group containing the output path manipulation:
<BaseIntermediateOutputPath>$(RepoRoot)dist/intermediates/$(ProjectRelativePath)/obj</BaseIntermediateOutputPath>
<IntermediateOutputPath>$(BaseIntermediateOutputPath)</IntermediateOutputPath>
As an example, the full file looks like this on the nx-dotnet repo now:
<Project>
  <PropertyGroup>
    <!-- Output path configuration -->
    <RepoRoot>$([System.IO.Path]::GetFullPath('$(MSBuildThisFileDirectory)'))</RepoRoot>
    <ProjectRelativePath>$([MSBuild]::MakeRelative($(RepoRoot), $(MSBuildProjectDirectory)))</ProjectRelativePath>
    <BaseOutputPath>$(RepoRoot)dist/$(ProjectRelativePath)</BaseOutputPath>
    <OutputPath>$(BaseOutputPath)</OutputPath>
    <BaseIntermediateOutputPath>$(RepoRoot)dist/intermediates/$(ProjectRelativePath)/obj</BaseIntermediateOutputPath>
    <IntermediateOutputPath>$(BaseIntermediateOutputPath)</IntermediateOutputPath>
    <AppendTargetFrameworkToOutputPath>true</AppendTargetFrameworkToOutputPath>
  </PropertyGroup>
  <PropertyGroup>
    <RestorePackagesWithLockFile>false</RestorePackagesWithLockFile>
  </PropertyGroup>
</Project>
Your project.json file should look something like this now:
{
  "name": "demo-webapi",
  "sourceRoot": "demo/apps/webapi",
  "targets": {
    "build": {
      "executor": "@nx-dotnet/core:build",
      "outputs": [
        "{workspaceRoot}/dist/demo/apps/webapi",
        "{workspaceRoot}/dist/intermediates/demo/apps/webapi"
      ],
      "options": {
        "configuration": "Debug",
        "noDependencies": true
      },
      "configurations": {
        "production": {
          "configuration": "Release"
        }
      }
    }
  }
}

Airflow Dataproc serverless job creator doesn't take Python parameters

I'm trying to set up a Dataproc Serverless batch job from Google Cloud Composer using the DataprocCreateBatchOperator operator, passing some arguments that affect the underlying Python code. However, I'm running into the following error:
error: unrecognized arguments: --run_timestamp "2022-06-17T13:22:51.800834+00:00" --temp_bucket "gs://pipeline/spark_temp_bucket/hourly/" --bucket "pipeline" --pipeline "hourly"
This is how my operator is set up:
import random
import string

from airflow.providers.google.cloud.operators.dataproc import DataprocCreateBatchOperator

create_batch = DataprocCreateBatchOperator(
    task_id="hourly_pipeline",
    project_id="dev",
    region="us-west1",
    # Random 40-character batch ID built from lowercase letters, digits and "-"
    batch_id="".join(random.choice(string.ascii_lowercase + string.digits + "-") for i in range(40)),
    batch={
        "environment_config": {
            "execution_config": {
                "service_account": "<service_account>",
                "subnetwork_uri": "<uri>"
            }
        },
        "pyspark_batch": {
            "main_python_file_uri": "gs://pipeline/code/pipeline_feat_creation.py",
            # Each list element is passed to the script as one argument
            "args": [
                '--run_timestamp "{{ ts }}"',
                '--temp_bucket "gs://pipeline/spark_temp_bucket/hourly/"',
                '--bucket "pipeline"',
                '--pipeline "hourly"'
            ],
            "jar_file_uris": [
                "gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.25.0.jar"
            ],
        }
    }
)
Regarding the args array: I tried setting the parameters both with and without wrapping them in "". I have also already run a gcloud submit that worked, like so:
gcloud dataproc batches submit pyspark "gs://pipeline/code/pipeline_feat_creation.py" \
--batch=jskdnkajsnd-test-10 --region=us-west1 --subnet="<uri>" \
-- --run_timestamp "2020-01-01" --temp_bucket gs://pipeline/spark_temp_bucket/hourly/ --bucket pipeline --pipeline hourly
The problem was that I wasn't putting an = between each parameter name and its value. I've also removed the " quoting around each value. This is how the args are now set up:
"args": [
'--run_timestamp={{ ts }}',
'--temp_bucket=gs://pipeline/spark_temp_bucket/hourly/',
'--bucket=pipeline',
'--pipeline=hourly'
]
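The difference between the two args lists can be reproduced locally. The sketch below assumes the script parses its arguments with argparse, which the "unrecognized arguments" message suggests but the post does not confirm; build_parser is a hypothetical stand-in for whatever pipeline_feat_creation.py actually does. It shows that a list element holding both the flag and its value separated by a space is not matched to the flag, while the = form is.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the parser inside pipeline_feat_creation.py.
    parser = argparse.ArgumentParser()
    for name in ("run_timestamp", "temp_bucket", "bucket", "pipeline"):
        parser.add_argument(f"--{name}")
    return parser


parser = build_parser()

# Flag and value glued into ONE element with a space: argparse does not
# match it to --bucket, so it lands in the list of unrecognized arguments.
known, unknown = parser.parse_known_args(['--bucket "pipeline"'])
print(known.bucket, unknown)

# Flag and value joined with "=" in one element: parsed as expected.
joined = parser.parse_args(["--bucket=pipeline", "--pipeline=hourly"])
print(joined.bucket, joined.pipeline)

# Flag and value as two SEPARATE elements: also accepted by argparse.
split = parser.parse_args(["--bucket", "pipeline"])
print(split.bucket)
```

With parse_args instead of parse_known_args, the first case would exit with the same kind of "unrecognized arguments" error shown in the question.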

TYPO3 CMS 8.7.27: Call to a member function getPackagePath() on null

After installing extensions in TYPO3 CMS 8.7.27, I got the following error. It seems the ExtensionManagementUtility can't load ah_contentapi.
This is my composer.json file in root (/var/www/html/typo3) for loading my extensions:
{
"repositories":[
{
"type":"composer",
"url":"https://composer.typo3.org/"
},
{
"type":"package",
"package":{
"name":"Bm/ah-content-api",
"version":"0.0.1",
"type":"typo3-cms-extension",
"source":{
"url":"https://user@bitbucket.org/company/ah_config_typo3.git",
"type":"git",
"reference":"master"
}
}
},
{
"type":"package",
"package":{
"name":"Bm/ah-contentelements",
"version":"0.0.1",
"type":"typo3-cms-extension",
"source":{
"url":"https://user@bitbucket.org/company/ah_contentelements_typo3.git",
"type":"git",
"reference":"master"
}
}
}
],
"name":"typo3/cms-base-distribution",
"description":"TYPO3 CMS Base Distribution",
"license":"GPL-2.0-or-later",
"require":{
"helhum/typo3-console":"^4.9.3 || ^5.2",
"typo3/cms-about":"^8.7.10",
"typo3/cms-belog":"^8.7.10",
"typo3/cms-beuser":"^8.7.10",
"typo3/cms-context-help":"^8.7.10",
"typo3/cms-documentation":"^8.7.10",
"typo3/cms-felogin":"^8.7.10",
"typo3/cms-fluid-styled-content":"^8.7.10",
"typo3/cms-form":"^8.7.10",
"typo3/cms-func":"^8.7.10",
"typo3/cms-impexp":"^8.7.10",
"typo3/cms-info":"^8.7.10",
"typo3/cms-info-pagetsconfig":"^8.7.10",
"typo3/cms-rte-ckeditor":"^8.7.10",
"typo3/cms-setup":"^8.7.10",
"typo3/cms-sys-note":"^8.7.10",
"typo3/cms-t3editor":"^8.7.10",
"typo3/cms-tstemplate":"^8.7.10",
"typo3/cms-viewpage":"^8.7.10",
"typo3/cms-wizard-crpages":"^8.7.10",
"typo3/cms-wizard-sortpages":"^8.7.10",
"typo3/cms":"^8.7",
"dmitryd/typo3-realurl":"2.*",
"GridElementsTeam/Gridelements":"8.2.*",
"clickstorm/cs_seo":"3.*",
"Bm/ah-content-api":"0.0.1",
"Bm/ah-contentelements":"0.0.1"
},
"scripts":{
"typo3-cms-scripts":[
"typo3cms install:fixfolderstructure",
"typo3cms install:generatepackagestates"
],
"post-autoload-dump":[
"@typo3-cms-scripts"
]
},
"extra":{
"typo3/cms":{
"web-dir":"public"
},
"helhum/typo3-console":{
"comment":"This option is not needed any more for helhum/typo3-console 5.x",
"install-extension-dummy":false
}
},
"autoload":{
"psr-4":{
"Bm\\AhContentelements\\":"public/typo3conf/ext/ah_contentelements/Classes",
"Bm\\AhContentapi\\":"public/typo3conf/ext/ah_content_api/Classes"
}
}
}
I already cleared the caches in the Install Tool:
1. Important actions -> Clear all cache
2. Clean up -> Clean typo3temp/ folder
Excerpt from composer.lock:
{
"_readme": [
"This file locks the dependencies of your project to a known state",
"Read more about it at https://getcomposer.org/doc/01-basic-usage.md#installing-dependencies",
"This file is @generated automatically"
],
"content-hash": "954afd2318d54ec9b1dd0e4d7f9b445b",
"packages": [
{
"name": "Bm/ah-content-api",
"version": "0.0.1",
"source": {
"type": "git",
"url": "https://stevenhippovibe@bitbucket.org/hippovibe/ah_config_typo3.git",
"reference": "master"
},
"type": "typo3-cms-extension"
},
{
"name": "Bm/ah-contentelements",
"version": "0.0.1",
"source": {
"type": "git",
"url": "https://stevenhippovibe@bitbucket.org/stevenhippovibe/ah_contentelements_typo3.git",
"reference": "master"
},
"type": "typo3-cms-extension"
},
The error occurs when the extension folder name under typo3conf/ext/<folder_name> doesn't match the extension key used elsewhere in the system (e.g. in the EXT:your_extension_key/... syntax in TypoScript).
Changing the folder name fixed a similar problem for me.
Check the PHP version and try changing it, e.g. from 7.4 to 7.3.
I once had this problem with an extension that was supposed to be compatible with PHP 7.4 but in practice wasn't. Switching the PHP version solved the problem for me.
Some questions:
How did you update to 8.7.27 (which composer command did you run)?
What does your composer.lock look like?
Do you use TYPO3 Console or any other special composer plugins / CLI commands, e.g. to generate PackageStates.php?
I just ran into the same error message under TYPO3 9.5.5.
Solution:
Uninstall one TYPO3 extension after the other and try again each time. This will lead you to the extension that contains the error. Most probably the error is in its ext_localconf.php or ext_tables.php file.
I got this error detail:
PHP Warning: Use of undefined constant FH_DEBUG_EXT - assumed 'FH_DEBUG_EXT' (this will throw an Error in a future version of PHP) in /var/www/html/global-extensions/ext/div2007/ext_localconf.php line 15
This warning is unrelated to your error, but you may likewise have an error in one of your installed extensions, or even in a backup copy of an extension, e.g. a folder named extensionname.bak.
Also these recommendations can help:
https://wiki.typo3.org/Exception/CMS/1476107295

How to pass multiple args to a VS Code task command?

In tasks.json I am using the "args" property to specify the arguments to pass to "command":"gulp". But when I run the task in VS Code, only the first argument is passed to gulp.
I want to run a gulp task against a single file. In gulpfile.js I use the process.argv array to retrieve the command line arguments. So, on the command line I enter "gulp copy3 --file abc.js" and the copy3 task runs. The code then reads the argv array to get the name of the file being copied.
This code works from the command line, but it does not work when I run it as a task in VS Code. How can I make that work?
The gulpfile.js code:
gulp.task('copy3', function () {
    console.log(process.argv);
    let pattern = '*.js';
    // single file to copy
    if ((process.argv.length >= 5) && (process.argv[3] == '--file')) {
        let fileName = process.argv[4];
        pattern = fileName;
    }
    console.log('pattern:' + pattern);
    return gulp.src(pattern).pipe(gulp.dest('dev'));
});
The tasks.json file:
{
    "version": "2.0.0",
    "tasks": [
        {
            "taskName": "copy3",
            "command": "gulp",
            "args": [ "copy3", "--file", "${fileBasename}" ],
            "problemMatcher": []
        }
    ]
}
Here is the terminal output:
[10:52:57] Using gulpfile C:\vscTest\rpgproj\gulpfile.js
[10:52:57] Starting 'copy3'...
[ 'C:\\Program Files\\nodejs\\node.exe',
'C:\\vscTest\\rpgproj\\node_modules\\gulp\\bin\\gulp.js',
'copy3' ]
pattern:*.js
[10:52:57] Finished 'copy3' after 16 ms
thanks,
I made a couple of small changes. Try:
{
    "label": "Tasks: copy3",
    "type": "shell",
    "command": "gulp",
    "args": [ "copy3", "--file", "${fileBasename}" ],
    "problemMatcher": []
}
With that, the rest of your code works perfectly as is. Make sure to reload VS Code after modifying tasks.json.
VSCode appears to have a built-in gulp extension. This seems to scan your gulpfile for tasks and list them for you. It also seems to ignore the args option.
The workaround is to use the full path to gulp as the command e.g. ./node_modules/.bin/gulp to bypass it.

Error running CoffeeScript in Sublime Text 2

I am new to both CoffeeScript and Sublime Text 2, so any help would be greatly appreciated.
When I try to compile a CoffeeScript test file in Sublime, I get the following error message:
[Error 2] The system cannot find the file specified
[cmd: [u'coffee', u'-c', u'C:\\Users\\username\\Desktop\\test.coffee']]
[dir: C:\Users\username\Desktop]
[path: $HOME/bin:/usr/local/bin:$HOME/bin:/usr/local/bin:C:\Program Files (x86)\ImageMagick-6.9.1-Q16;C:\ProgramData\Oracle\Java\javapath;C:\Program Files\Java\jdk1.8.0_25\bin;C:\Java\bin;C:\Program Files (x86)\Windows Live\Shared;C:\Program Files\nodejs\;C:\Program Files (x86)\GnuWin32\bin;C:\Program Files (x86)\Heroku\bin;C:\Program Files (x86)\git\cmd;C:\Program Files (x86)\Skype\Phone\;C:\RailsInstaller\Git\cmd;C:\RailsInstaller\Ruby2.1.0\bin;C:\Users\username\AppData\Roaming\npm]
[Finished]
Here are my build settings in commands:
{
    "path": "$HOME/bin:/usr/local/bin:$PATH",
    "cmd": ["coffee","-c","$file"],
    "file_regex": "^(...*?):([0-9]*):?([0-9]*)",
    "selector": "source.coffee, source.litcoffee, source.coffee.md"
}
Any idea how to fix this problem? Thanks.
I managed to get it working with the following build file, which drops the "path" override, adds the -m flag, and runs the command through the shell on Windows:
{
    "cmd": [ "coffee", "-m", "-c", "$file" ],
    "file_regex": "^(...*?):([0-9]*):?([0-9]*)",
    "selector": "source.coffee, source.litcoffee, source.coffee.md",
    "windows":
    {
        "shell": true
    }
}