I am working with Elasticsearch in a Scala project, using elastic4s as the client. I am trying to add elements to a document one by one, from an iterator.
while (iterator.hasNext) {
  counter += 1
  client.execute {
    update id reportID in "reports/report" script "ctx._source.elasticData += output" params Map("output" -> iterator.next().toStringifiedJson)
  }.await
}
The above code does not work, yielding the following error:
[ERROR] [03/06/2015 14:44:23.515] [SparkActorSystem-akka.actor.default-dispatcher-5] [akka://SparkActorSystem/user/spark-actor] failed to execute script
org.elasticsearch.ElasticsearchIllegalArgumentException: failed to execute script
at org.elasticsearch.action.update.UpdateHelper.prepare(UpdateHelper.java:189)
at org.elasticsearch.action.update.TransportUpdateAction.shardOperation(TransportUpdateAction.java:176)
at org.elasticsearch.action.update.TransportUpdateAction.shardOperation(TransportUpdateAction.java:170)
at org.elasticsearch.action.support.single.instance.TransportInstanceSingleOperationAction$AsyncSingleAction$1.run(TransportInstanceSingleOperationAction.java:187)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException: script_lang not supported [groovy]
at org.elasticsearch.script.ScriptService.dynamicScriptEnabled(ScriptService.java:521)
at org.elasticsearch.script.ScriptService.verifyDynamicScripting(ScriptService.java:398)
at org.elasticsearch.script.ScriptService.compile(ScriptService.java:363)
at org.elasticsearch.script.ScriptService.executable(ScriptService.java:503)
at org.elasticsearch.action.update.UpdateHelper.prepare(UpdateHelper.java:183)
... 6 more
The problem is with the script, I assume, but I could not find any solution. Please help...
Does adding the Groovy dependency solve the problem? Please see this gist.
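In case it helps, here is a minimal sketch of what adding that dependency could look like in sbt; this is not taken from the gist, and the Groovy version shown is an assumption that should be matched to the one your Elasticsearch release expects. Note that the same "script_lang not supported [groovy]" error can also mean that dynamic scripting is disabled on the node, so the server-side script settings may need checking as well.

// build.sbt: a hedged sketch, the Groovy version is an assumption
libraryDependencies += "org.codehaus.groovy" % "groovy-all" % "2.3.2"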
I have a parquet file, say abc/A.parquet. A few records are filtered out based on a certain condition to create a DataFrame, and I am trying to overwrite the original file with the resulting filtered DataFrame using the overwrite save mode, but it throws the exception below:
Command used to overwrite:
filterDF.coalesce(1).write.mode("overwrite").parquet("file:/home/psub2/cls_parquet2/file:/home/psub7/abc/A.parquet")
failed while writing rows.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:285)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:197)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:196)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:381)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: File file:/home/psub7/abc/A.parquet does not exist
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
Please help. Thanks in advance.
Conceptually, you can't read a DataFrame from a file and write it back to the same file. An IOException is thrown when you read a DataFrame from file A and then try to write that same DataFrame into file A, because Spark evaluates lazily and the overwrite removes the source files before they have actually been read. You can overwrite parquet file A only if you did not read the DataFrame from file A.
For example, you can read a DataFrame from file A and overwrite file B.
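A minimal sketch of that, assuming hypothetical paths and a hypothetical filter condition; the filtered result is written to a separate location B instead of overwriting A in place:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("filter-and-write").getOrCreate()

// Read from location A
val df = spark.read.parquet("file:///home/psub7/abc/A.parquet")

// Hypothetical filter condition
val filtered = df.filter("some_column IS NOT NULL")

// Write to a different location B; A is left untouched while it is being read
filtered.coalesce(1)
  .write
  .mode("overwrite")
  .parquet("file:///home/psub7/abc/B.parquet")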
I'm trying to execute an HDFS-specific command from inside a Scala script being executed by Spark in cluster mode. Below is the command:
import scala.sys.process._  // provides the .!! extension on Seq[String]

val cmd = Seq("hdfs","dfs","-copyToLocal","/tmp/file.dat","/path/to/local")
val result = cmd.!!
The job fails at this stage with the error below:
java.io.FileNotFoundException: /var/run/cloudera-scm-agent/process/2087791-yarn-NODEMANAGER/log4j.properties (Permission denied)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:557)
at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
at org.apache.log4j.Logger.getLogger(Logger.java:104)
at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:262)
at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:108)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
However, when I run the same command separately in the Spark shell, it executes just fine and the file is copied as well.
scala> val cmd = Seq("hdfs","dfs","-copyToLocal","/tmp/file_landing_area/file.dat","/tmp/local_file_area")
cmd: Seq[String] = List(hdfs, dfs, -copyToLocal, /tmp/file_landing_area/file.dat, /tmp/local_file_area)
scala> val result = cmd.!!
result: String = ""
I don't understand the permission denied error, especially since it is reported as a FileNotFoundException. Totally confusing.
Any ideas?
As per the error, it is looking for data under the /var folder, which I suspect is a configuration issue, or it is not pointing to the correct location.
Using Seq to shell out to the hdfs command is not good practice. It is useful only in the spark-shell; using the same approach in application code is not advisable. Instead, try the Hadoop FileSystem API below to move data to or from HDFS. Please check the sample code below, which is just for reference but might help you.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()
val hdfspath = new Path("hdfs:///user/nikhil/test.csv")
val localpath = new Path("file:///home/cloudera/test/")

// Resolve the FileSystem for the HDFS path and copy the file to the local path
val fs: FileSystem = hdfspath.getFileSystem(conf)
fs.copyToLocalFile(hdfspath, localpath)
Please use the link below for more reference on the Hadoop FileSystem API.
https://hadoop.apache.org/docs/r2.9.0/api/org/apache/hadoop/fs/FileSystem.html#copyFromLocalFile(boolean,%20boolean,%20org.apache.hadoop.fs.Path,%20org.apache.hadoop.fs.Path)
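For the opposite direction (local to HDFS), a similarly hedged sketch with hypothetical paths:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical paths: upload a local file into an HDFS directory
val uploadConf = new Configuration()
val src = new Path("file:///home/cloudera/test/test.csv")
val dst = new Path("hdfs:///user/nikhil/uploaded/")

dst.getFileSystem(uploadConf).copyFromLocalFile(src, dst)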
I want to save my DataFrame in CSV format. This is a small data set, so I use coalesce(1):
df.coalesce(1).write.mode(SaveMode.Overwrite).csv(outputPath + "/test.csv")
I get this error:
Caused by: java.io.IOException: File already exists:s3://test/test.csv/part-00000-c9f8a000-2601-4b83-a6d6-a3f023937fdc-c000.csv
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.create(S3NativeFileSystem.java:617)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:915)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:896)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:793)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.create(EmrFileSystem.java:176)
at org.apache.spark.sql.execution.datasources.CodecStreams$.createOutputStream(CodecStreams.scala:81)
at org.apache.spark.sql.execution.datasources.CodecStreams$.createOutputStreamWriter(CodecStreams.scala:92)
at org.apache.spark.sql.execution.datasources.csv.CsvOutputWriter.<init>(CSVFileFormat.scala:135)
at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anon$1.newInstance(CSVFileFormat.scala:77)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.newOutputWriter(FileFormatWriter.scala:305)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:314)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:258)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:256)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1375)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:261)
However, I can save this DataFrame as a parquet file without any error:
df.write.mode(SaveMode.Overwrite).parquet(outputPath + "/test")
How can I solve this issue and save my DataFrame in CSV format?
I'm new to Scala, so there's probably something obvious I'm missing.
I've got a Scalatra web server running, with a CSV file in the same folder as the Scalatra servlet. The web server recognizes the file just fine, and the following action:
get("/dependencies") {
val variable = params.get("variable")
new java.io.File("/path/to/files/my_csv_file.csv")
}
works as intended and returns the CSV file in response to an HTTP GET request.
However, I want to use the Breeze library to do some general operations on the CSV data. When I try to load the CSV file with Breeze's CSV reader:
val matrix=csvread(new file("/path/to/files/my_csv_file.csv"), ',')
The following error is returned by the server:
[error] /path/to/files/MyScalatraServlet.scala:23: not found: type file
[error] val matrix=csvread(new file("/path/to/files/"), ',')
[error] ^
[error] one error found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 1 s, completed 18.5.2017 18:32:23
Although it is not obvious from the code above, Breeze's linalg module, which contains the CSV functions, has been imported at the beginning of the file:
import breeze.linalg._
Does anyone have any ideas on why this error is happening? Why can't Breeze find the CSV file?
That is a compilation error. I think the following code would work:
val matrix = csvread(new java.io.File("/path/to/files/"), ',')
or
val matrix = csvread("/path/to/files/", ',')
I am running a specs2 test suite from sbt, using the test command. When a ScalaCheck property fails, I just get to see the filename and line number in my code where the specs2 match fails - which is not very useful when that happens to be a utility method which does a common type of check that I am frequently doing. A stack trace would be better.
I've tried the last command in sbt, but that doesn't display the stack trace I'm looking for. The only stack trace last displays is this generic one:
java.lang.RuntimeException: Tests unsuccessful
at scala.sys.package$.error(package.scala:27)
at scala.Predef$.error(Predef.scala:66)
at sbt.Tests$.showResults(Tests.scala:168)
at sbt.Defaults$$anonfun$testTasks$5.apply(Defaults.scala:279)
at sbt.Defaults$$anonfun$testTasks$5.apply(Defaults.scala:279)
at sbt.Scoped$$anonfun$hf2$1.apply(Structure.scala:473)
at sbt.Scoped$$anonfun$hf2$1.apply(Structure.scala:473)
at scala.Function1$$anonfun$compose$1.apply(Function1.scala:41)
at sbt.Scoped$Reduced$$anonfun$combine$1$$anonfun$apply$11.apply(Structure.scala:295)
at sbt.Scoped$Reduced$$anonfun$combine$1$$anonfun$apply$11.apply(Structure.scala:295)
at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40)
at sbt.std.Transform$$anon$5.work(System.scala:67)
at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:221)
at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:221)
at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:18)
at sbt.Execute.work(Execute.scala:227)
at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:221)
at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:221)
at sbt.CompletionService$$anon$1$$anon$2.call(CompletionService.scala:26)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
I also have FINEST logging level enabled in the java.util.logging properties file.
For now I am working around this issue using the Eclipse debugger, but that's unnecessarily heavyweight in some cases.
You can display the stack traces of failures by passing the failtrace argument on the command line. This is documented in the Arguments section of the User Guide.
Also, note that this stack trace is filtered to avoid showing specs2's own stack, so if you want to see everything you need to add fullstacktrace, which is a shortcut for a TraceFilter that filters nothing.
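One way to pass these arguments to specs2 on every sbt test run is through testOptions; a hedged sketch for the build definition, using the argument names above:

// build.sbt: pass specs2 arguments on every test run
testOptions in Test += Tests.Argument(TestFrameworks.Specs2, "failtrace")

// or, to disable specs2's stack trace filtering entirely:
// testOptions in Test += Tests.Argument(TestFrameworks.Specs2, "fullstacktrace")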
I decided to take a different approach to identifying the failed match, using aka, like this:
def occurExactlyOnceInBody =
  be_===(1) ^^ { (s: String) =>
    body.tails.count(_.startsWith(s)) aka "No. of occurrences of " + s + " in body"
  }
The downside is that this needs to be applied manually, but the upside is that the failures are easier to understand.
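A hypothetical usage sketch inside a specification, assuming body is in scope as in the matcher above; the spec structure and the searched snippet are assumptions:

// Hypothetical example: applying the matcher to the substring being searched for
"the response body" should {
  "contain the snippet exactly once" in {
    "expected snippet" must occurExactlyOnceInBody
  }
}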