Spark failing to write to HDFS because of a field with AVG - Scala

I'm running a Spark script written in Scala from an .sh file. When running the same code in a Zeppelin notebook I had no problem, but running it from the script returns the following:
ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 2032, Column 28: Redefinition of parameter "agg_expr_51"
The cause of this is a column for which an average is calculated. Why is this happening, and is there a solution?
Thanks.
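A workaround that is often suggested for this kind of generated-code compilation failure, offered here only as an assumption since the question does not confirm it, is to disable whole-stage code generation so the aggregation falls back to the interpreted path. A minimal Scala sketch, assuming a SparkSession named spark, a DataFrame df, and hypothetical column and output names:

import org.apache.spark.sql.functions.avg

// Assumption: turning off whole-stage codegen avoids compiling the failing
// generated class, at the cost of some aggregation performance.
spark.conf.set("spark.sql.codegen.wholeStage", "false")

// Hypothetical aggregation similar to the one described in the question.
val result = df.groupBy("key").agg(avg("value").as("avg_value"))
result.write.mode("overwrite").parquet("hdfs:///path/to/output")

If the job then completes, the failure was coming from the generated aggregation code rather than from the HDFS write itself.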

Related

syntax error line 1 at position 15 unexpected 'copy' - PySpark

I am trying to run a COPY command in PySpark, as below. How can I get rid of this error?
spark.write.format("snowflake") \
    .options(**config.sfparams) \
    .option("query", "copy into people_data from (select $1:Company_ID::varchar as Company_ID from @company_stage/pitchbook/"+config.todays_date+"/Company/")\
    .load()
Error:
py4j.protocol.Py4JJavaError: An error occurred while calling o49.load.
: net.snowflake.client.jdbc.SnowflakeSQLException: SQL compilation error:
syntax error line 1 at position 15 unexpected 'copy'.
COPY doesn't work in PySpark. I created an external table to read the data directly from S3 and enabled autorefresh=true.

How to use a bibliography in a different directory when knitting an RMarkdown document to a Beamer presentation?

I'm knitting some Beamer slides in an RMarkdown script in RStudio on a Windows 7 PC. The slides live at
C:/me/slides/myslides.Rmd
I have a master bibliography that lives in
C:/me/bib/masterbib.bib
I cannot figure out how to link to the bibliography file from the RMarkdown document. Here's the YAML from my attempt:
---
title: "Slides"
author: "me"
date: "2016-12-20"
bibliography: C:/me/bib/masterbib.bib
biblio-style: "apalike"
output:
  beamer_presentation:
    citation_package: natbib
---
Here's the error:
! Undefined control sequence.
<write> \string \bibdata {C:\me\bib\masterbib}
l.174 \end{frame}
Error: Failed to compile Slides.tex. See Slides.log for more info.
In addition: Warning message:
running command '"pdflatex" -halt-on-error -interaction=batchmode "Slides.tex"' had status 1
Execution halted
I've tried a couple other ways to specify the directory for masterbib.bib, but none have worked. I would prefer to keep the masterbib.bib file where it is, and not make an extra copy in the C:/me/slides/ directory. Thanks for your help!
Edit
When attempting to pass the following into YAML (quoted with forward slashes):
bibliography: "C:/LaTeXstuff/BibTexLibrary/BrianBib.bib"
I get a fatal error with log output:
! Undefined control sequence.
<write> \string \bibdata {C:\me
\bib\masterbib}
l.174 \end{frame}
Here is how much of TeX's memory you used:
18047 strings out of 494045
334241 string characters out of 3145937
424206 words of memory out of 3000000
20891 multiletter control sequences out of 15000+200000
31808 words of font info for 44 fonts, out of 3000000 for 9000
715 hyphenation exceptions out of 8191
56i,11n,55p,434b,376s stack positions out of 5000i,500n,10000p,200000b,50000s
! ==> Fatal error occurred, no output PDF file produced!
When passing the following into YAML (quoted with backslashes)
bibliography: "C:\me\bib\masterbib.bib"
I get the following error in the RStudio console:
Error in yaml::yaml.load(enc2utf8(string), ...) :
Scanner error: while parsing a quoted scalar at line 4, column 15
found unknown escape character at line 4, column 29
Calls: <Anonymous> ... yaml_load_utf8 -> mark_utf8 -> <Anonymous> -> .Call
Execution halted
When passing the following into YAML (unquoted with backslashes)
bibliography: C:\me\bib\masterbib.bib
I get the following error in the RStudio console:
! Undefined control sequence.
<write> \string \bibdata {C:\me
\bib\masterbib}
l.174 \end{frame}
Error: Failed to compile BibTest.tex. See BibTest.log for more info.
In addition: Warning message:
running command '"pdflatex" -halt-on-error -interaction=batchmode "BibTest.tex"' had status 1
Execution halted
Try unquoted with two backslashes:
...
bibliography: C:\\me\\bib\\masterbib.bib
...

iReport compile error when clicking on the preview

I'm new to iReport and tried to generate a report with it. The problem is that the jasper file does not compile.
This is the error I got. How can I fix it? I have already set the fonts, and I cannot preview the report at all.
Compiling to file... C:\Users\SDU\Desktop\ugc test\report1.jasper
Errors compiling C:\Users\SDU\Desktop\ugc test\report1.jasper!
Compilation exceptions: com.jaspersoft.ireport.designer.compiler.ErrorsCollector@1e398a0
net.sf.jasperreports.engine.JRException: Errors were encountered when compiling report expressions class file:
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
calculator_report1_1442307421307_832861: 192: expecting ''', found '\n' @ line 192, column 37.
1 error
at net.sf.jasperreports.compilers.JRGroovyCompiler.compileUnits(JRGroovyCompiler.java:113)
at net.sf.jasperreports.engine.design.JRAbstractCompiler.compileReport(JRAbstractCompiler.java:201)
at net.sf.jasperreports.engine.JasperCompileManager.compile(JasperCompileManager.java:354)
at net.sf.jasperreports.engine.JasperCompileManager.compileToFile(JasperCompileManager.java:270)
at net.sf.jasperreports.engine.JasperCompileManager.compileReportToFile(JasperCompileManager.java:563)
at com.jaspersoft.ireport.designer.compiler.IReportCompiler.run(IReportCompiler.java:528)
at org.openide.util.RequestProcessor$Task.run(RequestProcessor.java:572)
at org.openide.util.RequestProcessor$Processor.run(RequestProcessor.java:997)
Caused by: org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
calculator_report1_1442307421307_832861: 192: expecting ''', found '\n' @ line 192, column 37.
1 error
at org.codehaus.groovy.control.ErrorCollector.failIfErrors(ErrorCollector.java:302)
at org.codehaus.groovy.control.ErrorCollector.addFatalError(ErrorCollector.java:149)
at org.codehaus.groovy.control.ErrorCollector.addError(ErrorCollector.java:119)
at org.codehaus.groovy.control.ErrorCollector.addError(ErrorCollector.java:131)
at org.codehaus.groovy.control.SourceUnit.addError(SourceUnit.java:359)
at org.codehaus.groovy.antlr.AntlrParserPlugin.transformCSTIntoAST(AntlrParserPlugin.java:136)
at org.codehaus.groovy.antlr.AntlrParserPlugin.parseCST(AntlrParserPlugin.java:107)
at org.codehaus.groovy.control.SourceUnit.parse(SourceUnit.java:236)
at org.codehaus.groovy.control.CompilationUnit$1.call(CompilationUnit.java:161)
at org.codehaus.groovy.control.CompilationUnit.applyToSourceUnits(CompilationUnit.java:900)
at org.codehaus.groovy.control.CompilationUnit.doPhaseOperation(CompilationUnit.java:564)
at org.codehaus.groovy.control.CompilationUnit.processPhaseOperations(CompilationUnit.java:540)
at org.codehaus.groovy.control.CompilationUnit.compile(CompilationUnit.java:517)
at net.sf.jasperreports.compilers.JRGroovyCompiler.compileUnits(JRGroovyCompiler.java:109)
... 7 more
Compilation running time: 32!
Maybe you used a Text Field instead of Static Text somewhere. A Text Field accepts only expressions, so literal text that is not written as a valid expression can generate errors like this.

Spark Pipe example

I'm new to Spark and trying to figure out how the pipe method works. I have the following code in Scala
sc.textFile(hdfsLocation).pipe("preprocess.py").saveAsTextFile(hdfsPreprocessedLocation)
The values hdfsLocation and hdfsPreprocessedLocation are fine. As proof, the following command works from the command line:
hadoop fs -cat hdfsLocation/* | ./preprocess.py | head
When I run the above Spark code I get the following errors
14/11/25 09:41:50 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.io.IOException: Cannot run program "preprocess.py": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
at org.apache.spark.rdd.PipedRDD.compute(PipedRDD.scala:119)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:135)
at java.lang.ProcessImpl.start(ProcessImpl.java:130)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022)
... 12 more
In order to solve this for Hadoop streaming I would just use the --files attribute, so I tried the same thing for Spark. I start Spark with the following command
bin/spark-shell --files ./preprocess.py
but that gave the same error.
I couldn't find a good example of using Spark with an external process via pipe, so I'm not sure if I'm doing this correctly. Any help would be greatly appreciated.
Thanks
I'm not sure if this is the correct answer, so I won't finalize this, but it appears that the file paths are different when running Spark in local and cluster mode. When running Spark without --master, the paths to the pipe command are relative to the local machine. When running Spark with --master, the paths to the pipe command are relative to ./
UPDATE:
This actually isn't correct. I was using SparkFiles.get() to get the file name. It turns out that when calling .pipe() on an RDD the command string is evaluated on the driver and then passed to the workers. Because of this, SparkFiles.get() is not the appropriate way to get the file name. The file name should just be prefixed with ./, because SparkContext.addFile() puts the file in ./ relative to where each worker is run. But I'm so sour on .pipe now that I've taken it out of my code entirely in favor of .mapPartitions combined with a PipeUtils object that I wrote here. This is actually more efficient because I only incur the script startup cost once per partition instead of once per example.
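For reference, here is a minimal sketch of the addFile-plus-relative-path approach described in the update. It reuses hdfsLocation and hdfsPreprocessedLocation from the question and assumes preprocess.py is in the driver's working directory and is executable on the executors:

// Ship the script to every executor's working directory.
sc.addFile("preprocess.py")

// Reference the script with a path relative to the executor's working
// directory, as the update above recommends.
sc.textFile(hdfsLocation)
  .pipe("./preprocess.py")
  .saveAsTextFile(hdfsPreprocessedLocation)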

Opening grb2 files re-visited

I have downloaded and installed NCTOOLBOX into MATLAB (2013a) to read netCDF and grb files. As a test, I copied a netCDF, grb and grb2 file to a directory on my computer. These are referenced in my script as:
pathnc = 'c:\test\era40_moda_200205.nc'
pathgrb = 'c:\test\era40_moda_200205.grb'
pathgrb2 = 'c:\test\multi_1.at_4m.dp.200607.grb2'
I used the following code to read the *.nc file:
nc = ncdataset(pathnc);
nc.variables
The code works great, with no error messages and all variables listed, on netCDF files. However, when I run it for the grb files using:
nc = ncdataset(pathgrb);
nc.variables
I get this very long list of errors:
2014-03-05 08:40:15,744 [main] WARN ucar.nc2.grib.grib2.Grib2Index - Grib2Index bad size = -1 for c:/test/multi_1.at_4m.dp.200607.grb2 index = c:\test\multi_1.at_4m.dp.200607.grb2.gbx9
Warning: Escape sequence '\m' is not valid. See 'help
sprintf' for valid escape sequences.
> In ncdataset>ncdataset.ncdataset at 89
In GRIB_and_NC_Reader_Prog at 14
Error using ncdataset (line 91)
Failed to open c: est
Error in GRIB_and_NC_Reader_Prog (line 14)
nc = ncdataset(pathgrb2);
Caused by:
Error using ncdataset (line 75)
Java exception occurred:
java.lang.RuntimeException: java.lang.NoSuchFieldError:
alwaysUseFieldBuilders...............etc, etc....ad nauseum...............
In case it was just a bad file, I tried the code on a different grb file and got the same results. Yes, I have read the previous posts on reading grb files with NCTOOLBOX, but I'm still dead in the water. I would greatly appreciate any insight to get my script reading grb and grb2 files.
I was getting a similar Java error: java.lang.NoSuchFieldError: alwaysUseFieldBuilders. I tried running the same code in R2014a and it worked.