Read password in Scala in a console-agnostic way - scala

I have an easy task to accomplish: read a password from a command-line prompt without exposing it. I know that there is java.io.Console.readPassword; however, there are times when you cannot access the console, for example when running your app from an IDE (such as IntelliJ).
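For reference, the console-based approach looks roughly like the sketch below (a minimal sketch, not my actual code); System.console() returns null when no console is attached, for example under an IDE, which is exactly why a fallback is needed:
import java.io.Console
import scala.io.StdIn

// Minimal sketch: use Console.readPassword when a real console is attached,
// otherwise fall back to an echoing read (System.console() is null in an IDE).
val console: Console = System.console()
val password: String =
  if (console != null) new String(console.readPassword("Password: "))
  else StdIn.readLine("Password: ") // WARNING: input is echoed here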
I stumbled upon this Password Masking in the Java Programming Language tutorial, which looks nice, but I'm failing to implement it in Scala. So far my solution is:
import scala.io.StdIn

class EraserThread() extends Runnable {
  private var stop = false

  override def run(): Unit = {
    stop = true
    while (stop) {
      System.out.print("\010*")
      try
        Thread.sleep(1)
      catch {
        case ie: InterruptedException =>
          ie.printStackTrace()
      }
    }
  }

  def stopMasking(): Unit = {
    this.stop = false
  }
}

val et = new EraserThread()
val mask = new Thread(et)
mask.start()
val password = StdIn.readLine("Password: ")
et.stopMasking()
When I run this snippet I get continuous printing of asterisks on new lines, e.g.:
*
*
*
*
Is there anything specific to Scala that explains why this is not working? Or is there a better way to do this in Scala in general?

Related

Apache Spark Data Generator Function on Databricks Not working

I am trying to execute the Data Generator function provided by Microsoft to test streaming data to Event Hubs.
Unfortunately, I keep on getting the error
Processing failure: No such file or directory
when I try to execute the function:
%scala
DummyDataGenerator.start(15)
Can someone take a look at the code and help decipher why I'm getting the error:
class DummyDataGenerator:
  streamDirectory = "/FileStore/tables/flight"
None # suppress output
I'm also not sure how the above cell gets called into the DummyDataGenerator function.
%scala
import scala.util.Random
import java.io._
import java.time._

// Notebook #2 has to set this to 8, we are setting
// it to 200 to "restore" the default behavior.
spark.conf.set("spark.sql.shuffle.partitions", 200)

// Make the username available to all other languages.
// WARNING: use of the "current" username is unpredictable
// when multiple users are collaborating and should be replaced
// with the notebook ID instead.
val username = com.databricks.logging.AttributionContext.current.tags(com.databricks.logging.BaseTagDefinitions.TAG_USER);
spark.conf.set("com.databricks.training.username", username)

object DummyDataGenerator extends Runnable {
  var runner : Thread = null;
  val className = getClass().getName()
  val streamDirectory = s"dbfs:/tmp/$username/new-flights"
  val airlines = Array( ("American", 0.17), ("Delta", 0.12), ("Frontier", 0.14), ("Hawaiian", 0.13), ("JetBlue", 0.15), ("United", 0.11), ("Southwest", 0.18) )
  val reasons = Array("Air Carrier", "Extreme Weather", "National Aviation System", "Security", "Late Aircraft")
  val rand = new Random(System.currentTimeMillis())
  var maxDuration = 3 * 60 * 1000 // default to three minutes

  def clean() {
    System.out.println("Removing old files for dummy data generator.")
    dbutils.fs.rm(streamDirectory, true)
    if (dbutils.fs.mkdirs(streamDirectory) == false) {
      throw new RuntimeException("Unable to create temp directory.")
    }
  }

  def run() {
    val date = LocalDate.now()
    val start = System.currentTimeMillis()
    while (System.currentTimeMillis() - start < maxDuration) {
      try {
        val dir = s"/dbfs/tmp/$username/new-flights"
        val tempFile = File.createTempFile("flights-", "", new File(dir)).getAbsolutePath()+".csv"
        val writer = new PrintWriter(tempFile)
        for (airline <- airlines) {
          val flightNumber = rand.nextInt(1000)+1000
          val deptTime = rand.nextInt(10)+10
          val departureTime = LocalDateTime.now().plusHours(-deptTime)
          val (name, odds) = airline
          val reason = Random.shuffle(reasons.toList).head
          val test = rand.nextDouble()
          val delay = if (test < odds)
            rand.nextInt(60)+(30*odds)
          else rand.nextInt(10)-5
          println(s"- Flight #$flightNumber by $name at $departureTime delayed $delay minutes due to $reason")
          writer.println(s""" "$flightNumber","$departureTime","$delay","$reason","$name" """.trim)
        }
        writer.close()
        // wait a couple of seconds
        //Thread.sleep(rand.nextInt(5000))
      } catch {
        case e: Exception => {
          printf("* Processing failure: %s%n", e.getMessage())
          return;
        }
      }
    }
    println("No more flights!")
  }

  def start(minutes:Int = 5) {
    maxDuration = minutes * 60 * 1000
    if (runner != null) {
      println("Stopping dummy data generator.")
      runner.interrupt();
      runner.join();
    }
    println(s"Running dummy data generator for $minutes minutes.")
    runner = new Thread(this);
    runner.run();
  }

  def stop() {
    start(0)
  }
}
DummyDataGenerator.clean()
displayHTML("Imported streaming logic...") // suppress output
You should be able to use the Databricks Labs Data Generator on the Databricks community edition. I'm providing the instructions below.
Running Databricks Labs Data Generator on the community edition
The Databricks Labs Data Generator is a PySpark library, so the code to generate the data needs to be in Python. But you should be able to create a view over the generated data and consume it from Scala, if that's your preferred language.
You can install the framework on the Databricks community edition by creating a notebook with the cell
%pip install git+https://github.com/databrickslabs/dbldatagen
Once it's installed, you can use the library to define a data generation spec and, by calling build, generate a Spark DataFrame from it.
The following example shows generation of batch data similar to the data set you are trying to generate. It should be placed in a separate notebook cell.
Note: here we generate 10 million rows to illustrate the ability to create larger data sets; the library can be used to generate data sets much larger than that.
%python
import dbldatagen as dg

num_rows = 10 * 1000000  # number of rows to generate
num_partitions = 8       # number of Spark dataframe partitions

delay_reasons = ["Air Carrier", "Extreme Weather", "National Aviation System", "Security", "Late Aircraft"]

# will have implied column `id` for ordinal of row
flightdata_defn = (dg.DataGenerator(spark, name="flight_delay_data", rows=num_rows, partitions=num_partitions)
    .withColumn("flightNumber", "int", minValue=1000, uniqueValues=10000, random=True)
    .withColumn("airline", "string", minValue=1, maxValue=500, prefix="airline", random=True, distribution="normal")
    .withColumn("original_departure", "timestamp", begin="2020-01-01 01:00:00", end="2020-12-31 23:59:00", interval="1 minute", random=True)
    .withColumn("delay_minutes", "int", minValue=20, maxValue=600, distribution=dg.distributions.Gamma(1.0, 2.0))
    .withColumn("delayed_departure", "timestamp", expr="cast(original_departure as bigint) + (delay_minutes * 60) ", baseColumn=["original_departure", "delay_minutes"])
    .withColumn("reason", "string", values=delay_reasons, random=True)
)

df_flight_data = flightdata_defn.build()

display(df_flight_data)
You can find information on how to generate streaming data in the online documentation at https://databrickslabs.github.io/dbldatagen/public_docs/using_streaming_data.html
You can create a named temporary view over the data so that you can access it from SQL or Scala using one of two methods:
1: use createOrReplaceTempView
df_flight_data.createOrReplaceTempView("delays")
2: use options for build. In this case the name passed to the DataGenerator instance initializer will be the name of the view, i.e.:
df_flight_data = flightdata_defn.build(withTempView=True)
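With either method in place, a Scala cell can then read the generated data through the view. A minimal sketch, assuming the view was registered as delays via method 1 above:
%scala
// Read the generated data through the temporary view registered above.
val delays = spark.table("delays")
delays.printSchema()
display(delays.limit(10))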
This code will not work on the community edition because of this line:
val dir = s"/dbfs/tmp/$username/new-flights"
as there is no DBFS fuse mount on the Databricks community edition (it's supported only on the full Databricks platform). It's potentially possible to make it work by:
changing that directory to a local directory, such as /tmp or similar, and
adding code (after writer.close()) to list the flights-* files in that local directory and use dbutils.fs.mv to move them into streamDirectory, as sketched below.
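A rough Scala sketch of that second suggestion; the local directory name and file pattern are illustrative assumptions, not tested code:
// Hypothetical workaround: have the generator write to a local directory
// instead of /dbfs/..., then copy the results into DBFS.
val localDir = "/tmp/new-flights"   // assumed local directory
new java.io.File(localDir).mkdirs()

// ... after writer.close(): move the generated flights-* CSV files into the
// DBFS stream directory (streamDirectory is the object's existing val).
new java.io.File(localDir)
  .listFiles()
  .filter(f => f.getName.startsWith("flights-") && f.getName.endsWith(".csv"))
  .foreach { f =>
    dbutils.fs.mv(s"file:$localDir/${f.getName}", s"$streamDirectory/${f.getName}")
  }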

Multiple Gatling simulations in parallel with different rampUsers over different times

I have multiple Gatling simulations defined in this manner (imports removed).
class MySimulation1 extends Simulation {
  object SimulationObj1 {
    var feeder = ...
    var random = exec(...)
  }

  val httpProtocol = ...
  val myScenario = scenario("Scenario name").exec(SimulationObj1.random)

  setUp(
    myScenario.inject(
      rampUsers(10) over (180 seconds)
    )
  ).assert(...)
}
class MySimulation2 extends Simulation {
  object SimulationObj2 {
    var feeder = ...
    var random = exec(...)
  }

  val httpProtocol = ...
  val myScenario = scenario("Scenario name").exec(SimulationObj2.random)

  setUp(
    myScenario.inject(
      rampUsers(15) over (300 seconds)
    )
  ).assert(...)
}
And then there's another AllSimulations class that simply calls all the simulations so that the scenarios in them can be executed in parallel.
class AllSimulations extends Simulation {
  object AllSimulationsObj {
    var feeder = ...
    var random = exec(...)
  }

  val httpProtocol = ...
  val myScenario = scenario("All scenarios").exec(
    new MySimulation1().SimulationObj1.random,
    new MySimulation2().SimulationObj2.random)

  setUp(
    myScenario.inject(
      rampUsers(10) over (180 seconds)
    )
  ).assert(...)
}
The problem is that, in order to have different rampUsers counts over different durations, I'm removing the setUp block from the AllSimulations class, but that gives me the error "No scenario set up".
How do I possibly run all the simulation scenarios in parallel with the rampUsers and durations defined in the respective simulation classes?
EDIT: Here's what I tried, but I'm not sure if it makes sense.
class AllSimulations extends Simulation {
  setUp(
    new MySimulation1().myScenario.inject(rampUsers(10) over (180 seconds)),
    new MySimulation2().myScenario.inject(rampUsers(15) over (300 seconds))
  ).assert(...)
}
If you want to run two or more scenarios concurrently (simultaneously, in parallel), then let's say you have two files (EXAMPLE1.scala and EXAMPLE2.scala).
You have to make a separate file (Simulator.scala) as shown below.
EXAMPLE1.SCALA (FILE-1)
...
val Example1_scenario = scenario("EXAMPLE1").exec(RunningForAllTenants())
...
EXAMPLE2.SCALA (FILE-2)
...
val Example2_scenario = scenario("EXAMPLE2").exec(RunningForAllTenants())
...
Simulator.scala
class Simulator extends Simulation {
  setUp(
    new EXAMPLE1().Example1_scenario.inject(rampUsers(10) during (10)).protocols(httpConf1),
    new EXAMPLE2().Example2_scenario.inject(rampUsers(30) during (20)).protocols(httpConf1)
  )
}
Run Simulator.scala, which will automatically run EXAMPLE1.scala and EXAMPLE2.scala concurrently.
I don't think what you propose will work; it doesn't really make sense to execute simulations in parallel, as your results would no longer reflect the true number of concurrent users.
What would work is to define your scenarios (in different files, if that suits), then have a single simulation that injects users into each as desired.
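For illustration, a minimal sketch of that approach, using the Gatling 2.x syntax from the question; the base URL and requests are hypothetical placeholders:
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

// Scenarios defined as plain objects; these could live in separate files.
object Scenario1 {
  val scn = scenario("Scenario 1").exec(http("home").get("/"))         // hypothetical request
}

object Scenario2 {
  val scn = scenario("Scenario 2").exec(http("health").get("/health")) // hypothetical request
}

// One Simulation injects users into each scenario with its own profile.
class AllScenarios extends Simulation {
  val httpProtocol = http.baseURL("http://localhost:8080")             // hypothetical base URL

  setUp(
    Scenario1.scn.inject(rampUsers(10) over (180 seconds)),
    Scenario2.scn.inject(rampUsers(15) over (300 seconds))
  ).protocols(httpProtocol)
}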

neo4j 3.0 embedded - no nodes

There's something I must be missing about Neo4j 3.0 embedded. After creating a node, setting some properties, and marking the transaction as successful, I re-open the DB, but there are no nodes in it! What am I missing here? The Neo4j documentation is pretty poor.
val graph1 = {
  val graphDb = new GraphDatabaseFactory()
    .newEmbeddedDatabase(new File("/opt/neo4j/deviceGraphTest"))
  val tx = graphDb.beginTx()
  val node = graphDb.createNode()
  node.setProperty("name", "kitchen island")
  node.setProperty("bulbType", "incandescent")
  tx.success()
  graphDb.shutdown()
}

val graph2 = {
  val graphDb2 = new GraphDatabaseFactory()
    .newEmbeddedDatabase(new File("/opt/neo4j/deviceGraphTest"))
  val tx2 = graphDb2.beginTx()
  val allNodes = graphDb2.getAllNodes.iterator().toList
  allNodes.foreach(node => {
    printNode(node)
  })
}
The transaction that you have opened has to be closed with tx.close() after marking it as successful. I do not know the exact Scala syntax, but it would be good to put the full block into a try/catch and close the transaction in the finally block.
Here is the documentation for Java: https://neo4j.com/docs/java-reference/current/javadocs/org/neo4j/graphdb/Transaction.html
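In Scala, that pattern would look roughly like this (a minimal sketch, assuming the Neo4j 3.0 embedded API used in the question):
import java.io.File
import org.neo4j.graphdb.factory.GraphDatabaseFactory

val graphDb = new GraphDatabaseFactory()
  .newEmbeddedDatabase(new File("/opt/neo4j/deviceGraphTest"))

val tx = graphDb.beginTx()
try {
  val node = graphDb.createNode()
  node.setProperty("name", "kitchen island")
  node.setProperty("bulbType", "incandescent")
  tx.success()   // mark the transaction as successful
} finally {
  tx.close()     // the write is only committed when the transaction is closed
}

graphDb.shutdown()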

Difference between RoundRobinRouter and RoundRobinRoutingLogic

So I was reading a tutorial about Akka and came across this: http://manuel.bernhardt.io/2014/04/23/a-handful-akka-techniques/. I think he explained it pretty well; I just picked up Scala recently and am having difficulties with the tutorial above.
I wonder, what is the difference between RoundRobinRouter and the current RoundRobinRoutingLogic? Obviously the implementations are quite different.
Previously, the implementation using RoundRobinRouter was:
val workers = context.actorOf(Props[ItemProcessingWorker].withRouter(RoundRobinRouter(100)))
with processBatch
def processBatch(batch: List[BatchItem]) = {
  if (batch.isEmpty) {
    log.info(s"Done migrating all items for data set $dataSetId. $totalItems processed items, we had ${allProcessingErrors.size} errors in total")
  } else {
    // reset processing state for the current batch
    currentBatchSize = batch.size
    allProcessedItemsCount = currentProcessedItemsCount + allProcessedItemsCount
    currentProcessedItemsCount = 0
    allProcessingErrors = currentProcessingErrors ::: allProcessingErrors
    currentProcessingErrors = List.empty

    // distribute the work
    batch foreach { item =>
      workers ! item
    }
  }
}
Here's my implementation using RoundRobinRoutingLogic:
var mappings : Option[ActorRef] = None
var router = {
  val routees = Vector.fill(100) {
    mappings = Some(context.actorOf(Props[Application3]))
    context watch mappings.get
    ActorRefRoutee(mappings.get)
  }
  Router(RoundRobinRoutingLogic(), routees)
}
and I treated processBatch like this:
def processBatch(batch: List[BatchItem]) = {
  if (batch.isEmpty) {
    println(s"Done migrating all items for data set $dataSetId. $totalItems processed items, we had ${allProcessingErrors.size} errors in total")
  } else {
    // reset processing state for the current batch
    currentBatchSize = batch.size
    allProcessedItemsCount = currentProcessedItemsCount + allProcessedItemsCount
    currentProcessedItemsCount = 0
    allProcessingErrors = currentProcessingErrors ::: allProcessingErrors
    currentProcessingErrors = List.empty

    // distribute the work
    batch foreach { item =>
      // println(item.id)
      mappings.get ! item
    }
  }
}
I somehow cannot get this to run; it gets stuck at the point where it iterates over the batch list. I wonder what I did wrong.
Thanks
First of all, you have to distinguish the difference between them: RoundRobinRouter is a Router that uses round-robin to select a connection, while RoundRobinRoutingLogic uses round-robin to select a routee.
You can provide your own RoutingLogic (this helped me understand how Akka works under the hood):
import scala.collection.immutable
import akka.routing.{Routee, RoutingLogic, RoundRobinRoutingLogic, SeveralRoutees}

class RedundancyRoutingLogic(nbrCopies: Int) extends RoutingLogic {
  val roundRobin = RoundRobinRoutingLogic()
  def select(message: Any, routees: immutable.IndexedSeq[Routee]): Routee = {
    val targets = (1 to nbrCopies).map(_ => roundRobin.select(message, routees))
    SeveralRoutees(targets)
  }
}
Link to the docs: http://doc.akka.io/docs/akka/2.3.3/scala/routing.html
P.S. This doc is very clear and it has helped me the most.
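For context, here is a minimal sketch (following the pattern in the Akka routing docs) of how a Router built on RoundRobinRoutingLogic is normally driven: messages are passed through router.route(...) rather than sent to a single ActorRef:
import akka.actor.{Actor, Props, Terminated}
import akka.routing.{ActorRefRoutee, RoundRobinRoutingLogic, Router}

class Worker extends Actor {
  def receive = { case msg => println(s"processing $msg") }
}

class Master extends Actor {
  // Build a router that selects routees round-robin.
  var router: Router = {
    val routees = Vector.fill(5) {
      val worker = context.actorOf(Props[Worker])
      context.watch(worker)
      ActorRefRoutee(worker)
    }
    Router(RoundRobinRoutingLogic(), routees)
  }

  def receive = {
    case Terminated(ref) =>
      // Replace a terminated routee so the pool keeps its size.
      router = router.removeRoutee(ref)
      val worker = context.actorOf(Props[Worker])
      context.watch(worker)
      router = router.addRoutee(worker)

    case work =>
      // Every message goes through the routing logic.
      router.route(work, sender())
  }
}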
Actually, I misunderstood the approach and found out the solution was to use RoundRobinPool, as stated in http://doc.akka.io/docs/akka/2.3-M2/project/migration-guide-2.2.x-2.3.x.html:
For example RoundRobinRouter has been renamed to RoundRobinPool or
RoundRobinGroup depending on which type you are actually using.
from
val workers = context.actorOf(Props[ItemProcessingWorker].withRouter(RoundRobinRouter(100)))
to
val workers = context.actorOf(RoundRobinPool(100).props(Props[ItemProcessingWorker]), "router2")

Specify Variable Initialization Order in Scala

I have a special class Model that needs to have its methods called in a very specific order.
I tried doing something like this:
val model = new Model
new MyWrappingClass {
  val first = model.firstMethod()
  val second = model.secondMethod()
  val third = model.thirdMethod()
}
The methods should be called in the order listed; however, I am seeing an apparently random order.
Is there any way to get the variable initialization methods to be called in a particular order?
I doubt your methods are called in the wrong order. But to be sure, you can try something like this:
val (first, second, third) = (
  model.firstMethod(),
  model.secondMethod(),
  model.thirdMethod()
)
You likely have some other problem with your code.
I can run 100 million loops where it never gets the order wrong, as follows:
class Model {
  var done = Array(false, false, false);
  def firstMethod(): Boolean  = { done(0) = true; done(1) || done(2) };
  def secondMethod(): Boolean = { done(1) = true; !done(0) || done(2) };
  def thirdMethod(): Boolean  = { done(2) = true; !done(0) || !done(1) };
};
Notice that these methods return true if called out of order and false when called in order.
Here's your class:
class MyWrappingClass {
  val model = new Model;
  val first = model.firstMethod()
  val second = model.secondMethod()
  val third = model.thirdMethod()
};
Our function to check for bad behavior on each trial:
def isNaughty(w: MyWrappingClass):Boolean = { w.first || w.second || w.third };
A short program to test:
var i = 0
var b = false;
while ( (i < 100000000) && !b ) {
  b = isNaughty(new MyWrappingClass);
  i += 1;
}
if (b) {
  println("out-of-order behavior occurred");
  println(i);
} else {
  println("looks good");
}
Scala 2.11.7 on OpenJDK8 / Ubuntu 15.04
Of course this doesn't prove it impossible to have wrong order, only that correct behavior seems highly repeatable in a fairly simple case.
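For what it's worth, a simpler way to observe the initialization order directly is to add a side effect to each method; in an ordinary run, the vals in the anonymous class body are initialized top to bottom, in declaration order:
class Model {
  def firstMethod(): Int  = { println("first");  1 }
  def secondMethod(): Int = { println("second"); 2 }
  def thirdMethod(): Int  = { println("third");  3 }
}

val model = new Model

// Prints "first", "second", "third" -- vals in a template body are
// initialized in the order they are declared.
new AnyRef {
  val first  = model.firstMethod()
  val second = model.secondMethod()
  val third  = model.thirdMethod()
}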