Is there any way to create a CSV file in the Karate framework using dynamic data generated at runtime? [duplicate]

**Feature file code:**
Scenario: Create Route
* def num = '3513113555'
* def details = "NAN"
* text func =
"""
UserId,Details
num ,details
"""
* print func
Problem
In the code snippet above, my def variables are being treated as plain strings.
I also tried with " or ' or <>.
I want to generate a CSV at runtime with some dynamic data, where the data comes from a JSON / test file.

There is a way to convert JSON to CSV in Karate: https://github.com/karatelabs/karate#karate-tocsv
You already know how to create dynamic JSON in Karate. So it becomes simple:
* def num = '3513113555'
* def details = 'NAN'
* def users = []
* users.push({ num: num, details: details })
* def raw = karate.toCsv(users)
* print raw
For more advanced things, refer: https://stackoverflow.com/a/54593057/143475

Related

How to move files from one S3 bucket directory to another directory in the same bucket? Scala/Java

I want to move all files under a directory in my S3 bucket to another directory within the same bucket, using Scala.
Here is what I have:
def copyFromInputFilesToArchive(spark: SparkSession) : Unit = {
val sourcePath = new Path("s3a://path-to-source-directory/")
val destPath = new Path("s3a:/path-to-destination-directory/")
val fs = sourcePath.getFileSystem(spark.sparkContext.hadoopConfiguration)
fs.moveFromLocalFile(sourcePath,destPath)
}
I get this error:
fs.copyFromLocalFile returns Wrong FS: s3a:// expected file:///
Error explained
The error you are seeing is because the copyFromLocalFile method is really for moving files from a local filesystem to S3, whereas you are trying to "move" files that are both already in S3.
It is important to note that directories don't really exist in Amazon S3 buckets: the folder/file hierarchy you see is really just key-value metadata attached to each object. All objects really sit in the same big, single-level container, and the key name is there to give the illusion of files and folders.
To "move" files within a bucket, what you really need to do is rewrite the object's key with the new path, which really just means editing object metadata.
How to do a "move" within a bucket with Scala
To accomplish this, you need to copy the original object, assign the new key and metadata to the copy, and write it back to S3. (In practice you can also copy an object back onto the same key, which overwrites the old version and acts a lot like an update.)
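At its simplest, a single-object "move" is just a server-side copy under the new key followed by a delete of the old key. A minimal sketch, assuming s3client is an initialized AWS SDK v1 AmazonS3 client:
// Move a single object: copy it under the new key, then delete the original.
def moveKey(bucket: String, sourceKey: String, destKey: String): Unit = {
  s3client.copyObject(bucket, sourceKey, bucket, destKey) // server-side copy to the new key
  s3client.deleteObject(bucket, sourceKey)                // remove the old key
}
A single copyObject call only handles objects up to 5 GB; anything larger needs the multipart copy shown below.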
Try something like this (from datahackr):
/**
* Copy an object from one key to another in multiple parts
*
* @param sourceS3Path source S3 object key
* @param targetS3Path target S3 object key
* @param fromS3BucketName source bucket name
* @param toS3BucketName destination bucket name
*/
@throws(classOf[Exception])
@throws(classOf[AmazonServiceException])
def copyMultipart(sourceS3Path: String, targetS3Path: String, fromS3BucketName: String, toS3BucketName: String) {
// Create a list of ETag objects. You retrieve ETags for each object part uploaded,
// then, after each individual part has been uploaded, pass the list of ETags to
// the request to complete the upload.
var partETags = new ArrayList[PartETag]();
// Initiate the multipart upload.
val initRequest = new InitiateMultipartUploadRequest(toS3BucketName, targetS3Path);
val initResponse = s3client.initiateMultipartUpload(initRequest);
// Get the object size to track the end of the copy operation.
var metadataResult = getS3ObjectMetadata(sourceS3Path, fromS3BucketName);
var objectSize = metadataResult.getContentLength();
// Copy the object using 50 MB parts.
val partSize = (50 * 1024 * 1024) * 1L;
var bytePosition = 0L;
var partNum = 1;
var copyResponses = new ArrayList[CopyPartResult]();
while (bytePosition < objectSize) {
// The last part might be smaller than partSize, so check to make sure
// that lastByte isn't beyond the end of the object.
val lastByte = Math.min(bytePosition + partSize - 1, objectSize - 1);
// Copy this part.
val copyRequest = new CopyPartRequest()
.withSourceBucketName(fromS3BucketName)
.withSourceKey(sourceS3Path)
.withDestinationBucketName(toS3BucketName)
.withDestinationKey(targetS3Path)
.withUploadId(initResponse.getUploadId())
.withFirstByte(bytePosition)
.withLastByte(lastByte)
.withPartNumber(partNum);
partNum += 1;
copyResponses.add(s3client.copyPart(copyRequest));
bytePosition += partSize;
}
// Complete the upload request to concatenate all uploaded parts and make the copied object available.
val completeRequest = new CompleteMultipartUploadRequest(
toS3BucketName,
targetS3Path,
initResponse.getUploadId(),
getETags(copyResponses));
s3client.completeMultipartUpload(completeRequest);
logger.info("Multipart upload complete.");
}
// This is a helper function to construct a list of ETags.
def getETags(responses: java.util.List[CopyPartResult]): ArrayList[PartETag] = {
var etags = new ArrayList[PartETag]();
val it = responses.iterator();
while (it.hasNext()) {
val response = it.next();
etags.add(new PartETag(response.getPartNumber(), response.getETag()));
}
return etags;
}
def moveObject(sourceS3Path: String, targetS3Path: String, fromBucketName: String, toBucketName: String) {
logger.info(s"Moving S3 frile from $sourceS3Path ==> $targetS3Path")
// Get the object size to track the end of the copy operation.
var metadataResult = getS3ObjectMetadata(sourceS3Path, fromBucketName);
var objectSize = metadataResult.getContentLength();
if (objectSize > ALLOWED_OBJECT_SIZE) {
logger.info("Object size is greater than 1GB. Initiating multipart upload.");
copyMultipart(sourceS3Path, targetS3Path, fromBucketName, toBucketName);
} else {
s3client.copyObject(fromBucketName, sourceS3Path, toBucketName, targetS3Path);
}
// Delete source object after successful copy
s3client.deleteObject(fromBucketName, sourceS3Path);
}
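The snippet above calls a getS3ObjectMetadata helper that is not shown; a minimal sketch of it, assuming the same s3client, could be:
// Hypothetical helper used above: fetch the object's metadata to read its content length.
def getS3ObjectMetadata(key: String, bucketName: String): com.amazonaws.services.s3.model.ObjectMetadata =
  s3client.getObjectMetadata(bucketName, key)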
You will need the AWS SDK for this.
If you are using AWS SDK version 1:
libraryDependencies ++= Seq(
"com.amazonaws" % "aws-java-sdk-s3" % "1.12.248"
)
import com.amazonaws.services.s3.transfer.{ Copy, TransferManager, TransferManagerBuilder }
val transferManager: TransferManager =
TransferManagerBuilder.standard().build()
def copyFile(): Unit = {
val copy: Copy =
transferManager.copy(
"source-bucket-name", "source-file-key",
"destination-bucket-name", "destination-file-key"
)
copy.waitForCompletion()
}
If you are using AWS SDK version 2:
libraryDependencies ++= Seq(
"software.amazon.awssdk" % "s3" % "2.17.219",
"software.amazon.awssdk" % "s3-transfer-manager" % "2.17.219-PREVIEW"
)
import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.s3.model.CopyObjectRequest
import software.amazon.awssdk.transfer.s3.{Copy, CopyRequest, S3ClientConfiguration, S3TransferManager}
// change Region.US_WEST_2 to your required region
// or it might even work without the whole `.region(Region.US_WEST_2)` thing
val s3ClientConfig: S3ClientConfiguration =
S3ClientConfiguration
.builder()
.region(Region.US_WEST_2)
.build()
val s3TransferManager: S3TransferManager =
S3TransferManager.builder().s3ClientConfiguration(s3ClientConfig).build()
def copyFile(): Unit = {
val copyObjectRequest: CopyObjectRequest =
CopyObjectRequest
.builder()
.sourceBucket("source-bucket-name")
.sourceKey("source-file-key")
.destinationBucket("destination-bucket-name")
.destinationKey("destination-file-key")
.build()
val copyRequest: CopyRequest =
CopyRequest
.builder()
.copyObjectRequest(copyObjectRequest)
.build()
val copy: Copy =
s3TransferManager.copy(copyRequest)
copy.completionFuture().get()
}
Keep in mind that you will need AWS credentials with appropriate permissions for both the source and destination objects. For this, you just need to obtain the credentials and make them available as the following environment variables:
export AWS_ACCESS_KEY_ID=your_access_key_id
export AWS_SECRET_ACCESS_KEY=your_secret_access_key
export AWS_SESSION_TOKEN=your_session_token
Also, "source-file-key" and "destination-file-key" should be the full path of the file in the bucket.

Apache Spark Data Generator Function on Databricks Not working

I am trying to execute the Data Generator function provided by Microsoft to test streaming data to Event Hubs.
Unfortunately, I keep on getting the error
Processing failure: No such file or directory
When I try and execute the function:
%scala
DummyDataGenerator.start(15)
Can someone take a look at the code and help decipher why I'm getting the error:
class DummyDataGenerator:
streamDirectory = "/FileStore/tables/flight"
None # suppress output
I'm not sure how the cell above gets wired into the DummyDataGenerator object defined below.
%scala
import scala.util.Random
import java.io._
import java.time._
// Notebook #2 has to set this to 8, we are setting
// it to 200 to "restore" the default behavior.
spark.conf.set("spark.sql.shuffle.partitions", 200)
// Make the username available to all other languages.
// "WARNING: use of the "current" username is unpredictable
// when multiple users are collaborating and should be replaced
// with the notebook ID instead.
val username = com.databricks.logging.AttributionContext.current.tags(com.databricks.logging.BaseTagDefinitions.TAG_USER);
spark.conf.set("com.databricks.training.username", username)
object DummyDataGenerator extends Runnable {
var runner : Thread = null;
val className = getClass().getName()
val streamDirectory = s"dbfs:/tmp/$username/new-flights"
val airlines = Array( ("American", 0.17), ("Delta", 0.12), ("Frontier", 0.14), ("Hawaiian", 0.13), ("JetBlue", 0.15), ("United", 0.11), ("Southwest", 0.18) )
val reasons = Array("Air Carrier", "Extreme Weather", "National Aviation System", "Security", "Late Aircraft")
val rand = new Random(System.currentTimeMillis())
var maxDuration = 3 * 60 * 1000 // default to three minutes
def clean() {
System.out.println("Removing old files for dummy data generator.")
dbutils.fs.rm(streamDirectory, true)
if (dbutils.fs.mkdirs(streamDirectory) == false) {
throw new RuntimeException("Unable to create temp directory.")
}
}
def run() {
val date = LocalDate.now()
val start = System.currentTimeMillis()
while (System.currentTimeMillis() - start < maxDuration) {
try {
val dir = s"/dbfs/tmp/$username/new-flights"
val tempFile = File.createTempFile("flights-", "", new File(dir)).getAbsolutePath()+".csv"
val writer = new PrintWriter(tempFile)
for (airline <- airlines) {
val flightNumber = rand.nextInt(1000)+1000
val deptTime = rand.nextInt(10)+10
val departureTime = LocalDateTime.now().plusHours(-deptTime)
val (name, odds) = airline
val reason = Random.shuffle(reasons.toList).head
val test = rand.nextDouble()
val delay = if (test < odds)
rand.nextInt(60)+(30*odds)
else rand.nextInt(10)-5
println(s"- Flight #$flightNumber by $name at $departureTime delayed $delay minutes due to $reason")
writer.println(s""" "$flightNumber","$departureTime","$delay","$reason","$name" """.trim)
}
writer.close()
// wait a couple of seconds
//Thread.sleep(rand.nextInt(5000))
} catch {
case e: Exception => {
printf("* Processing failure: %s%n", e.getMessage())
return;
}
}
}
println("No more flights!")
}
def start(minutes:Int = 5) {
maxDuration = minutes * 60 * 1000
if (runner != null) {
println("Stopping dummy data generator.")
runner.interrupt();
runner.join();
}
println(s"Running dummy data generator for $minutes minutes.")
runner = new Thread(this);
runner.run();
}
def stop() {
start(0)
}
}
DummyDataGenerator.clean()
displayHTML("Imported streaming logic...") // suppress output
You should be able to use the Databricks Labs Data Generator on the Databricks community edition. I'm providing the instructions below:
Running Databricks Labs Data Generator on the community edition
The Databricks Labs Data Generator is a Pyspark library so the code to generate the data needs to be Python. But you should be able to create a view on the generated data and consume it from Scala if that's your preferred language.
You can install the framework on the Databricks community edition by creating a notebook with the cell
%pip install git+https://github.com/databrickslabs/dbldatagen
Once it is installed, you can use the library to define a data generation spec and, by calling build, generate a Spark dataframe from it.
The following example shows generation of batch data similar to the data set you are trying to generate; it should be placed in a separate notebook cell.
Note: here we generate 10 million records to illustrate the ability to create larger data sets, but the library can generate datasets much larger than that.
%python
import dbldatagen as dg  # the generator library installed above
num_rows = 10 * 1000000 # number of rows to generate
num_partitions = 8 # number of Spark dataframe partitions
delay_reasons = ["Air Carrier", "Extreme Weather", "National Aviation System", "Security", "Late Aircraft"]
# will have implied column `id` for ordinal of row
flightdata_defn = (dg.DataGenerator(spark, name="flight_delay_data", rows=num_rows, partitions=num_partitions)
.withColumn("flightNumber", "int", minValue=1000, uniqueValues=10000, random=True)
.withColumn("airline", "string", minValue=1, maxValue=500, prefix="airline", random=True, distribution="normal")
.withColumn("original_departure", "timestamp", begin="2020-01-01 01:00:00", end="2020-12-31 23:59:00", interval="1 minute", random=True)
.withColumn("delay_minutes", "int", minValue=20, maxValue=600, distribution=dg.distributions.Gamma(1.0, 2.0))
.withColumn("delayed_departure", "timestamp", expr="cast(original_departure as bigint) + (delay_minutes * 60) ", baseColumn=["original_departure", "delay_minutes"])
.withColumn("reason", "string", values=delay_reasons, random=True)
)
df_flight_data = flightdata_defn.build()
display(df_flight_data)
You can find information on how to generate streaming data in the online documentation at https://databrickslabs.github.io/dbldatagen/public_docs/using_streaming_data.html
You can create a named temporary view over the data so that you can access it from SQL or Scala using one of two methods:
1: use createOrReplaceTempView
df_flight_data.createOrReplaceTempView("delays")
2: use options for build. In this case the name passed to the DataGenerator initializer will be the name of the view, i.e.
df_flight_data = flightdata_defn.build(withTempView=True)
This code will not work on the community edition because of this line:
val dir = s"/dbfs/tmp/$username/new-flights"
as there is no DBFS fuse mount on the Databricks community edition (it is supported only on the full Databricks platform). It is potentially possible to make it work by:
1. changing that directory to a local directory, such as /tmp
2. adding code (after writer.close()) to list the flights-* files in that local directory and using dbutils.fs.mv to move them into streamDirectory (see the sketch below)
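A hedged sketch of that change inside run() (the local directory name is just an example; dbutils is available in Databricks notebooks):
// Write to the driver's local disk instead of the /dbfs fuse mount...
val localDir = "/tmp/new-flights"              // local driver directory (example name)
new java.io.File(localDir).mkdirs()
val tempFile = java.io.File.createTempFile("flights-", ".csv", new java.io.File(localDir))
val writer = new java.io.PrintWriter(tempFile)
// ... write the generated rows exactly as before ...
writer.close()

// ...then, after closing the writer, move the local files into the DBFS stream directory.
dbutils.fs.ls(s"file:$localDir")
  .filter(_.name.startsWith("flights-"))
  .foreach(f => dbutils.fs.mv(f.path, s"$streamDirectory/${f.name}"))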

How to convert an amount into words in NPR format in Odoo 12?

I want to convert an amount into words in NPR format, but it always shows Euro and Cents only. How do I change it to NPR format while converting to words?
I have tried every method, including setting lang, but Euro and Cent cannot be replaced. My company currency is NPR, but I am not able to convert it. I have a currency_id field relating to res.currency.
I have tried code as below:
@api.depends('amount')
def set_amt_in_words(self):
self.amt_inwords = num2words(self.amount, to = 'currency', lang = 'en_IN')
if self.currency_id == 'NPR':
amt_inwords = str(amt_inwords).replace('Euro', 'rupees')
amt_inwords = str(amt_inwords).replace('Cents', 'paise')
amt_inwords = str(amt_inwords).replace('Cent', 'paise')
self.amt_inwords += '\tonly'
self.amt_inwords = self.amt_inwords.title()
I want to output in Rupees and paise.
Try
self.env.ref('base.NPR').with_context({'lang': 'en_IN'}).amount_to_text(self.amount)
The following method belongs to the model res.currency and is responsible for translating a currency amount to text (<path_to_v12>/odoo/addons/base/models/res_currency.py):
@api.multi
def amount_to_text(self, amount):
self.ensure_one()
def _num2words(number, lang):
try:
return num2words(number, lang=lang).title()
except NotImplementedError:
return num2words(number, lang='en').title()
if num2words is None:
logging.getLogger(__name__).warning("The library 'num2words' is missing, cannot render textual amounts.")
return ""
formatted = "%.{0}f".format(self.decimal_places) % amount
parts = formatted.partition('.')
integer_value = int(parts[0])
fractional_value = int(parts[2] or 0)
lang_code = self.env.context.get('lang') or self.env.user.lang
lang = self.env['res.lang'].search([('code', '=', lang_code)])
amount_words = tools.ustr('{amt_value} {amt_word}').format(
amt_value=_num2words(integer_value, lang=lang.iso_code),
amt_word=self.currency_unit_label,
)
if not self.is_zero(amount - integer_value):
amount_words += ' ' + _('and') + tools.ustr(' {amt_value} {amt_word}').format(
amt_value=_num2words(fractional_value, lang=lang.iso_code),
amt_word=self.currency_subunit_label,
)
return amount_words

Using Spark Scala in EMR to get S3 Object size (folder, files)

I am trying to get the folder size for some S3 folders with Scala from the command line on EMR.
I have JSON data stored as GZ files in S3. I find I can count the number of JSON records within my files:
spark.read.json("s3://mybucket/subfolder/subsubfolder/").count
But now I need to know how much GB that data accounts for.
I am finding options to get the size for distinct files, but not for a whole folder all up.
I am finding options to get the size for distinct files, but not for a whole folder all up.
Solution:
Option 1: get filesystem access via FileSystem
val fs = FileSystem.get(new URI(ipPath), spark.sparkContext.hadoopConfiguration)
Note:
1) The new URI is important; otherwise it will connect to the default Hadoop filesystem path instead of the S3 filesystem (object store) path. By passing new URI you are supplying the s3:// scheme here.
2) org.apache.commons.io.FileUtils.byteCountToDisplaySize will render file sizes in GB, MB, etc.
/**
* Recursively print file sizes
*
* @param filePath
* @param fs
* @return
*/
@throws[FileNotFoundException]
@throws[IOException]
def getDisplaysizesOfS3Files(filePath: org.apache.hadoop.fs.Path, fs: org.apache.hadoop.fs.FileSystem): scala.collection.mutable.ListBuffer[String] = {
val fileList = new scala.collection.mutable.ListBuffer[String]
val fileStatus = fs.listStatus(filePath)
for (fileStat <- fileStatus) {
println(s"file path Name : ${fileStat.getPath.toString} length is ${fileStat.getLen}")
if (fileStat.isDirectory) fileList ++= (getDisplaysizesOfS3Files(fileStat.getPath, fs))
else if (fileStat.getLen > 0 && !fileStat.getPath.toString.isEmpty) {
println("fileStat.getPath.toString" + fileStat.getPath.toString)
fileList += fileStat.getPath.toString
val size = fileStat.getLen
val display = org.apache.commons.io.FileUtils.byteCountToDisplaySize(size)
println(" length zero files \n " + fileStat)
println("Name = " + fileStat.getPath().getName());
println("Size = " + size);
println("Display = " + display);
} else if (fileStat.getLen == 0) {
println(" length zero files \n " + fileStat)
}
}
fileList
}
Based on your requirement, you can modify the code, for example to sum up the sizes of all the individual files, as in the sketch below.
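For example, a hedged sketch that recursively sums the file lengths instead of collecting paths (same Hadoop FileSystem API as above; the path is illustrative):
import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}

// Recursively add up the byte sizes of all files under a path.
def totalSize(path: Path, fs: FileSystem): Long =
  fs.listStatus(path).map { status =>
    if (status.isDirectory) totalSize(status.getPath, fs)
    else status.getLen
  }.sum

val root = new Path("s3a://mybucket/subfolder/subsubfolder/")
val fs = FileSystem.get(new URI(root.toString), spark.sparkContext.hadoopConfiguration)
println(org.apache.commons.io.FileUtils.byteCountToDisplaySize(totalSize(root, fs)))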
Option 2: simple and concise, using getContentSummary
implicit val spark = SparkSession.builder().appName("ObjectSummary").getOrCreate()
/**
* getDisplaysizesOfS3Files
* @param path
* @param spark [[org.apache.spark.sql.SparkSession]]
*/
def getDisplaysizesOfS3Files(path: String)( implicit spark: org.apache.spark.sql.SparkSession): Unit = {
val filePath = new org.apache.hadoop.fs.Path(path)
val fileSystem = filePath.getFileSystem(spark.sparkContext.hadoopConfiguration)
val size = fileSystem.getContentSummary(filePath).getLength
val display = org.apache.commons.io.FileUtils.byteCountToDisplaySize(size)
println("path = " + path);
println("Size = " + size);
println("Display = " + display);
}
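For example, from a spark-shell on EMR (the path is illustrative):
getDisplaysizesOfS3Files("s3a://mybucket/subfolder/subsubfolder/")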
Note: either option shown above will work for local, HDFS, or S3 paths as well.

Inline parsing of IObservable<byte>

I have an observable query that produces an IObservable<byte> from a stream, which I want to parse inline. I want to be able to use different strategies, depending on the data source, to parse discrete messages from this sequence. Bear in mind that I am still on the upward learning curve of Rx. I have come up with a solution, but I am unsure whether there is a way to accomplish this using out-of-the-box operators.
First, I wrote the following extension method to IObservable:
public static IObservable<IList<T>> Parse<T>(
this IObservable<T> source,
Func<IObservable<T>, IObservable<IList<T>>> parsingFunction)
{
return parsingFunction(source);
}
This allows me to specify the message framing strategy in use by a particular data source. One data source might be delimited by one or more bytes, another might be delimited by both start and stop block patterns, and another might use a length-prefixing strategy. So here is an example of the Delimited strategy that I have defined:
public static class MessageParsingFunctions
{
public static Func<IObservable<T>, IObservable<IList<T>>> Delimited<T>(T[] delimiter)
{
if (delimiter == null) throw new ArgumentNullException("delimiter");
if (delimiter.Length < 1) throw new ArgumentException("delimiter must contain at least one element.");
Func<IObservable<T>, IObservable<IList<T>>> parser =
(source) =>
{
var shared = source.Publish().RefCount();
var windowOpen = shared.Buffer(delimiter.Length, 1)
.Where(buffer => buffer.SequenceEqual(delimiter))
.Publish()
.RefCount();
return shared.Buffer(windowOpen)
.Select(bytes =>
bytes
.Take(bytes.Count - delimiter.Length)
.ToList());
};
return parser;
}
}
So ultimately, as an example, I can use the code in the following fashion to parse discrete messages from the sequence whenever the byte pattern for the string '<EOF>' is encountered:
var messages = ...operators that surface an IObservable<byte>
.Parse(MessageParsingFunctions.Delimited(Encoding.ASCII.GetBytes("<EOF>")))
...further operators to package discrete messages along with additional metadata
Questions:
Is there a more straightforward way to accomplish this using just out-of-the-box operators?
If not, would it be preferable to just define the different parsing functions (i.e. ParseDelimited, ParseLengthPrefixed, etc.) as local extensions instead of having a more generic Parse extension method that accepts a parsing function?
Thanks in advance!
Take a look at Rxx Parsers. Here's a related lab. For example:
IObservable<byte> bytes = ...;
var parsed = bytes.ParseBinary(parser =>
from next in parser
let magicNumber = parser.String(Encoding.UTF8, 3).Where(value => value == "RXX")
let header = from headerLength in parser.Int32
from header in next.Exactly(headerLength)
from headerAsString in header.Aggregate(string.Empty, (s, b) => s + " " + b)
select headerAsString
let message = parser.String(Encoding.UTF8)
let entry = from length in parser.Int32
from data in next.Exactly(length)
from value in data.Aggregate(string.Empty, (s, b) => s + " " + b)
select value
let entries = from count in parser.Int32
from entries in entry.Exactly(count).ToList()
select entries
select from _ in magicNumber.Required("The file's magic number is invalid.")
from h in header.Required("The file's header is invalid.")
from m in message.Required("The file's message is invalid.")
from e in entries.Required("The file's data is invalid.")
select new
{
Header = h,
Message = m,
Entries = e.Aggregate(string.Empty, (acc, cur) => acc + cur + Environment.NewLine)
});