How to merge a local file with a file in HDFS? - scala

I would like to merge the local file /opt/one.txt with the file hdfs://localhost:54310/dummy/two.txt in my HDFS.
Contents of one.txt: f,g,h
Contents of two.txt: 2424244r
My code:
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

val cfg = new Configuration()
cfg.addResource(new Path("/usr/local/hadoop/etc/hadoop/core-site.xml"))
cfg.addResource(new Path("/usr/local/hadoop/etc/hadoop/hdfs-site.xml"))
cfg.addResource(new Path("/usr/local/hadoop/etc/hadoop/mapred-site.xml"))
try {
  val srcPath = "/opt/one.txt"
  val dstPath = "/dumCBF/two.txt"
  val srcFS = FileSystem.get(URI.create(srcPath), cfg)
  val dstFS = FileSystem.get(URI.create(dstPath), cfg)
  FileUtil.copyMerge(srcFS, new Path(srcPath), dstFS, new Path(dstPath), true, cfg, null)
  println("end process")
}
catch {
  case m: Exception => m.printStackTrace()
  case k: Throwable => k.printStackTrace()
}
I was following this tutorial: http://deploymentzone.com/2015/01/30/spark-and-merged-csv-files/
It's not working at all; the error is below:
java.io.FileNotFoundException: File does not exist: /opt/one.txt
I don't know why it says that; the file one.txt does exist.
I then added some code to check that the file exists:
if(new File(srcPath).exists()) println("file is exist")
Any ideas or references? Thanks!
EDIT 1, 2: fixed typos (extensions).
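One thing worth checking: both FileSystem.get(URI.create(...), cfg) calls above resolve against fs.defaultFS because the paths carry no scheme, so "/opt/one.txt" is looked up in HDFS rather than on the local disk, which would explain the FileNotFoundException even though the local file exists. Below is a rough sketch of forcing the source onto the local filesystem; it assumes a Hadoop 2.x copyMerge, and the source directory /opt/merge-input is only an illustration (copyMerge merges every file under a source directory):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

val conf = new Configuration()
conf.addResource(new Path("/usr/local/hadoop/etc/hadoop/core-site.xml"))

val localFS = FileSystem.getLocal(conf) // explicit file:// filesystem for the local source
val hdfs    = FileSystem.get(conf)      // whatever fs.defaultFS points at, e.g. hdfs://localhost:54310

println(localFS.getUri) // file:///
println(hdfs.getUri)    // prints the HDFS URI if core-site.xml was picked up

// Hadoop 2.x copyMerge concatenates every file under the source directory into one destination file.
FileUtil.copyMerge(localFS, new Path("/opt/merge-input"), hdfs, new Path("/dummy/two.txt"), false, conf, null)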

Related

Gatling Sequential scenarios

I'm trying to use Gatling and I have a problem.
1- I have one scenario that executes a POST request to get a list of tokens and saves them all to a CSV file.
2- I create another scenario that executes a GET request, but I need a token to authenticate each request.
My problem is that the file doesn't exist before my first scenario has run, and I get the following error:
Could not locate feeder file: Resource user-files/resources/token.csv not found
My code:
Scenario 1:
val auth_app = scenario("App authentication")
  .exec(http("App Authentication")
    .post("/token")
    .header("Content-Type", "application/x-www-form-urlencoded")
    .formParamSeq(Seq(("grant_type", "password"), ("client_id", clientID), ("client_secret", clientSecret)))
    .check(jsonPath("$.token").saveAs("token")))
  .exec(session => {
    val token_data = new File(token_file_path)
    if (token_data.exists()) {
      val writer = new PrintWriter(new FileOutputStream(new File(token_file_path), true))
      writer.write(session("token").as[String].trim) // must match the saveAs("token") above
      writer.write("\n")
      writer.close()
    } else {
      val writer = new PrintWriter(new FileOutputStream(new File(token_file_path), true))
      writer.println("AccessToken") // CSV header expected by the feeder
      writer.write(session("token").as[String].trim)
      writer.write("\n")
      writer.close()
    }
    session
  })
Scenario 2:
val load_catalog = scenario("Load catalog")
  .exec(http("Load catalog")
    .get("/list")
    .headers(Map("Content-Type" -> "application/json", "Authorization Bearer" -> "${AccessToken}")))
  .feed(csv(token_file_path).random)
My setup:
setUp(
  auth_app.inject(atOnceUsers(10)).protocols(httpProtocol),
  load_catalog.inject(nothingFor(120 seconds), atOnceUsers(10)).protocols(httpProtocol)
)
Is it possible to have a dynamic feeder with Gatling?
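For what it's worth, the CSV does not have to exist on disk at all: a Gatling feeder is just an Iterator[Map[String, Any]], so the first scenario can push tokens into a shared in-memory queue and the second scenario can pull from it lazily. A rough sketch of that idea follows; it reuses the scenario names above purely for illustration, names such as tokenQueue are invented, and it assumes the auth scenario has already produced tokens by the time the second one runs (e.g. thanks to the nothingFor delay):

import java.util.concurrent.ConcurrentLinkedQueue
import io.gatling.core.Predef._
import io.gatling.http.Predef._

// Shared in-memory token store instead of a CSV file on disk.
val tokenQueue = new ConcurrentLinkedQueue[String]()

val auth_app = scenario("App authentication")
  .exec(http("App Authentication")
    .post("/token")
    .check(jsonPath("$.token").saveAs("token")))
  .exec { session =>
    tokenQueue.add(session("token").as[String].trim) // store the token instead of writing a CSV
    session
  }

// The feeder is evaluated lazily at run time, so nothing needs to exist at simulation start-up.
val tokenFeeder = Iterator.continually(Map("AccessToken" -> tokenQueue.poll())) // assumes the queue is non-empty by then

val load_catalog = scenario("Load catalog")
  .feed(tokenFeeder)
  .exec(http("Load catalog")
    .get("/list")
    .header("Authorization", "Bearer ${AccessToken}"))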

FileUtil.copyMerge() in AWS S3

I have loaded a DataFrame into HDFS in text format using the code below. finalDataFrame is the DataFrame:
finalDataFrame.repartition(1).rdd.saveAsTextFile(targetFile)
After executing the above code, I found that a directory was created with the name I provided, and a file was created under that directory, but not as the single text file I wanted: the file is named something like part-00000.
I resolved this in HDFS using the code below:
val hadoopConfig = new Configuration()
val hdfs = FileSystem.get(hadoopConfig)
FileUtil.copyMerge(hdfs, new Path(srcPath), hdfs, new Path(dstPath), true, hadoopConfig, null)
Now I can get the text file at the mentioned path with the given file name.
But when I try to do the same operation on S3, it throws an exception:
FileUtil.copyMerge(hdfs, new Path(srcPath), hdfs, new Path(dstPath), true, hadoopConfig, null)
java.lang.IllegalArgumentException: Wrong FS: s3a://globalhadoop/data, expected: hdfs://*********.aws.*****.com:8050
It seems that the S3 path is not supported here. Can anyone please assist with how to do this?
I have solved the problem using the code below.
def createOutputTextFile(srcPath: String, dstPath: String, s3BucketPath: String): Unit = {
  var fileSystem: FileSystem = null
  var conf: Configuration = null
  if (srcPath.toLowerCase().contains("s3a") || srcPath.toLowerCase().contains("s3n")) {
    conf = sc.hadoopConfiguration
    fileSystem = FileSystem.get(new URI(s3BucketPath), conf)
  } else {
    conf = new Configuration()
    fileSystem = FileSystem.get(conf)
  }
  FileUtil.copyMerge(fileSystem, new Path(srcPath), fileSystem, new Path(dstPath), true, conf, null)
}
I have written the code to handle both the S3 and HDFS filesystems, and both work fine.
You are passing the HDFS filesystem as the destination FS in FileUtil.copyMerge. You need to get the real FS of the destination, which you can do by calling Path.getFileSystem(Configuration) on the destination path you have created.
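A minimal sketch of that suggestion, assuming a Hadoop 2.x copyMerge (the concrete paths are only placeholders):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

val conf = new Configuration()

val src = new Path("hdfs:///user/me/output")          // directory of part-* files (placeholder)
val dst = new Path("s3a://globalhadoop/data/out.txt") // merged destination file (placeholder)

// Let each Path resolve its own filesystem instead of reusing the HDFS one for both sides.
val srcFS = src.getFileSystem(conf)
val dstFS = dst.getFileSystem(conf)

FileUtil.copyMerge(srcFS, src, dstFS, dst, true, conf, null)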

com.typesafe.config.ConfigException$NotResolved: has not been resolved,

I am trying to read the following config file using Typesafe Config:
common = {
  jdbcDriver = "com.mysql.jdbc.Driver"
  slickDriver = "slick.driver.MySQLDriver"
  port = 3306
  db = "foo"
  user = "bar"
  password = "baz"
}
source = ${common} {server = "remoteserver"}
target = ${common} {server = "localserver"}
When I try to read my config using this code
val conf = ConfigFactory.parseFile(new File("src/main/resources/application.conf"))
val username = conf.getString("source.user")
I get an error
com.typesafe.config.ConfigException$NotResolved: source.user has not been resolved, you need to call Config#resolve(), see API docs for Config#resolve()
I don't get any error if I put everything directly inside the "source" or "target" blocks; I only get errors when I try to use "common".
I solved it myself.
ConfigFactory.parseFile(new File("src/main/resources/application.conf")).resolve()
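For completeness, a small self-contained sketch of that fix (same file path and keys as in the question):

import java.io.File
import com.typesafe.config.ConfigFactory

// parseFile does not resolve ${common} substitutions on its own; resolve() must be called explicitly.
val conf = ConfigFactory
  .parseFile(new File("src/main/resources/application.conf"))
  .resolve()

println(conf.getString("source.user"))   // "bar"
println(conf.getString("source.server")) // "remoteserver"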
I solved it like this:
Config confSwitchEnv = ConfigFactory.load("env.conf");
The env.conf file is in the resources dir. Unlike parseFile, ConfigFactory.load resolves substitutions automatically.
reference: https://nicedoc.io/lightbend/config

File upload error when no file is included - Scala

Why is it that, when I don't include the file, I get this:
[IOException: can not replace a non-empty directory: Path(./public/upload)]
request.body.file("resourceFile").map { k =>
  val t = new java.io.File(s"./public/upload/${k.filename}")
  k.ref.moveTo(t, true)
  println("Ok File Upload" + k.filename)
}
How do you stop this from happening?
Ta
I don't understand why it is happening.
You could add an ugly if statement to prevent the error:
request.body.file("resourceFile").map { k =>
  if (!k.filename.isEmpty) {
    val t = new java.io.File(s"./public/upload/${k.filename}")
    k.ref.moveTo(t, true)
    println("Ok File Upload" + k.filename)
  }
}
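A slightly tidier variant of the same guard, sketched under the same assumption (an empty filename means no file was selected), is to filter the Option before mapping:

request.body.file("resourceFile")
  .filter(_.filename.nonEmpty) // skip the whole block when no file was uploaded
  .map { k =>
    val t = new java.io.File(s"./public/upload/${k.filename}")
    k.ref.moveTo(t, true)
    println("Ok File Upload" + k.filename)
  }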

Gradle plugin copy file from plugin jar

I'm creating my first Gradle plugin. I'm trying to copy a file from the distribution jar into a directory I've created in the project. Although the file exists inside the jar, I can't copy it to that directory.
This is my task code:
import org.gradle.api.DefaultTask;
import org.gradle.api.tasks.TaskAction;

class InitTask extends DefaultTask {
    File baseDir;

    private void copyEnvironment(File environments) {
        String resource = getClass().getResource("/environments/development.properties").getFile();
        File input = new File(resource);
        File output = new File(environments, "development.properties");
        try {
            copyFile(input, output);
        }
        catch (IOException e) {
            e.printStackTrace();
        }
    }

    void copyFile(File sourceFile, File destFile) {
        destFile << sourceFile.text
    }

    @TaskAction
    void createDirectories() {
        logger.info "Creating directory."
        File environments = new File(baseDir, "environments");
        File scripts = new File(baseDir, "scripts");
        File drivers = new File(baseDir, "drivers");
        [environments, scripts, drivers].each {
            it.mkdirs();
        }
        copyEnvironment(environments);
        logger.info "Directory created at '${baseDir.absolutePath}'."
    }
}
And this is the error I'm getting:
:init
java.io.FileNotFoundException: file:/path-to-jar/MyJar.jar!/environments/development.properties (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:120)
at groovy.util.CharsetToolkit.<init>(CharsetToolkit.java:69)
at org.codehaus.groovy.runtime.DefaultGroovyMethods.newReader(DefaultGroovyMethods.java:15706)
at org.codehaus.groovy.runtime.DefaultGroovyMethods.getText(DefaultGroovyMethods.java:14754)
at org.codehaus.groovy.runtime.dgm$352.doMethodInvoke(Unknown Source)
at org.codehaus.groovy.reflection.GeneratedMetaMethod$Proxy.doMethodInvoke(GeneratedMetaMethod.java:70)
at groovy.lang.MetaClassImpl$GetBeanMethodMetaProperty.getProperty(MetaClassImpl.java:3465)
at org.codehaus.groovy.runtime.callsite.GetEffectivePojoPropertySite.getProperty(GetEffectivePojoPropertySite.java:61)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callGetProperty(AbstractCallSite.java:227)
at br.com.smartcoders.migration.tasks.InitTask.copyFile(InitTask.groovy:29)
Just to emphasize: development.properties is inside the environments directory inside MyJar.jar.
getClass().getResource() returns a URL. To access that URL, you'll have to read it directly (e.g. with url.text) rather than first converting it to a String/File. Or you can use getClass().getResourceAsStream().text, which is probably more accurate. In both cases you can optionally specify the file encoding.
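As a rough illustration of that stream-based idea (written in Scala for consistency with the rest of this page, although the original task is Groovy; copyResource is an invented helper name):

import java.io.{File, InputStream}
import java.nio.file.{Files, StandardCopyOption}

// Read the resource from the classpath as a stream and copy it straight to the destination file,
// instead of turning the jar URL into a java.io.File (which is what fails above).
def copyResource(resourcePath: String, destination: File): Unit = {
  val in: InputStream = getClass.getResourceAsStream(resourcePath)
  require(in != null, s"Resource not found on classpath: $resourcePath")
  try Files.copy(in, destination.toPath, StandardCopyOption.REPLACE_EXISTING)
  finally in.close()
}

// e.g. copyResource("/environments/development.properties", new File(environments, "development.properties"))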
Kotlin DSL answer!
For cases like this it is good to have extensions:
fun Any.getResource(filename: String): File? {
    val input = this::class.java.classLoader.getResourceAsStream(filename) ?: return null
    val tempFile = File.createTempFile(
        filename.substringBeforeLast('.'),
        "." + filename.substringAfterLast('.')
    )
    tempFile.deleteOnExit()
    tempFile.writer().use { output ->
        input.bufferedReader().use { input ->
            output.write(input.readText())
        }
    }
    return tempFile
}