scala : getwords similar to getlines - scala

I want to write a piece of code that looks like following :
var wordItr = Source.fromFile("myfile").getWords
while (wordItr.hasNext) {
val word = wordItr.next
process(word)
}
Reason behind this logic is "myfile" file is really big (around 10GB) and has no line breaks and writing a code like above really helps.
Can you please suggest how to code wordItr

Source.fromFile("myfile").getLines.flatMap(_.split(" "))
or
import java.io.File
import java.util.Scanner
var wordItr = new Scanner(new File("myFile")).useDelimiter(" ")

Related

CSV Feeders for gatling 3

I am using Gatling 3. I have a csv feeder with just one column titled accountIds. I need to pass this in the body of my POST request as JSON. I have tried a lot of different syntax but nothing seems to be working. I can also not print what is actually being sent in the body. It works if I remove the "$accountIds" and use an actual value instead. Below is my code:
val searchFeeder = csv("C://data/accountids.csv").random
val scn1 = scenario("Scenario 1")
.feed(searchFeeder)
.exec(http("Search")
.post("/v3/accounts/")
.body(StringBody("""{"accountIds": "${accountIds}"}""")).asJson)
setUp(scn1.inject(atOnceUsers(10)).protocols(httpConf))
Have you enabled trace level in logback.xml to see the details of post request?
Also, can you confirm if location you have mentioned "C://data/accountids.csv" is the right one. Generally, data folder resides in project location and within project you can access the data file as:
val searchFeeder = csv("data/stack.csv").random
I just created sample script and enabled logging.I am able to see that accountId is getting passed:
package basicpackage
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import io.gatling.core.scenario.Simulation
class StackFeeder extends Simulation {
val httpConf=http.baseUrl("http://example.com")
val searchFeeder = csv("data/stack.csv").random
val scn1 = scenario("Scenario 1")
.feed(searchFeeder)
.exec(http("Search")
.post("/v3/accounts/")
.body(StringBody("""{"accountIds": "${accountIds}"}""")).asJson)
setUp(scn1.inject(atOnceUsers(1)).protocols(httpConf))
csv file location

How do I print a dictionary vs a defaultdict based dictionary as a yaml file using ruamel.yaml?

Please refer to this trivial block of code shown below. My goal is to use defaultdict to come up with a relatively simple dictionary, and further print the results out as a yaml file.
When I manually define the dictionary, it seems to work just fine and the YAML is displayed exactly the way I want it, but when I use defaultdict to come up with a dictionary, I get an error message and unfortunately I am not able to decipher that.
When I print the dictionary as a JSON, it prints the exact same output. What I am missing?
import sys,ruamel.yaml
import json
from collections import defaultdict
def dict_maker():
return defaultdict(dict_maker)
S = ruamel.yaml.scalarstring.DoubleQuotedScalarString
app = "someapp"
d = {'beats':{'name':S(app), 'udp_address':S('239.1.1.1:10101')}}
foo = dict_maker()
foo["beats"]["name"] = S(app)
foo["beats"]["udp_address"] = S("239.1.1.1:10101")
print "Regular dictionary"
print json.dumps(d, indent=4)
print "defaultdict dictionary"
print json.dumps(foo, indent=4)
print "dictionary as a yaml\n"
ruamel.yaml.dump(d, sys.stdout, Dumper=ruamel.yaml.RoundTripDumper)
print "defaultdict dictionary as a yaml\n"
ruamel.yaml.dump(foo, sys.stdout, Dumper=ruamel.yaml.RoundTripDumper)
Error Message
raise RepresenterError("cannot represent an object: %s" % data)
ruamel.yaml.representer.RepresenterError: cannot represent an object: defaultdict(<function dict_maker at 0x7f1253725a28>, {'beats': defaultdict(<function dict_maker at 0x7f1253725a28>, {'name': u'someapp', 'udp_address': u'239.1.1.1:10101'})})
You seem to be using the word "dictionary" when refering to a Python dict. There is however no such thing as a "defaultdict based dictionary", that would imply that foo after
foo = dict_maker()
would be a dict, and of course it is not: foo is a defaultdict which is dict based (i.e. exactly the other way around from what you write).
That JSON dumps this, is not surprising, as it cannot do more than stupidly dump the key-value pairs as if it were a dict. But when you try to load that JSON back, you see how useless this is as, you cannot continue working with it (at least not in the way expected):
import sys
import json
from collections import defaultdict
import io
def dict_maker():
return defaultdict(dict_maker)
app = "someapp"
foo = dict_maker()
foo["beats"]["name"] = app
foo["beats"]["udp_address"] = "239.1.1.1:10101"
io = io.StringIO()
json.dump(foo, io, indent=4)
io.seek(0)
bar = json.load(io)
bar['otherapp']['name'] = 'some_alt_app'
print(bar['beats']['udp_address'])
The above throws: KeyError: 'otherapp'. And that is because JSON doesn't keep all the information needed.
However, if you use the unsafe YAML dumper, then ruamel.yaml can dump and load this fine:
import sys
from ruamel.yaml import YAML
from collections import defaultdict
import io
def dict_maker():
return defaultdict(dict_maker)
app = "someapp"
yaml = YAML(typ='unsafe')
foo = dict_maker()
foo["beats"]["name"] = app
foo["beats"]["udp_address"] = "239.1.1.1:10101"
io = io.StringIO()
yaml.dump(foo, io)
io.seek(0)
print(io.getvalue())
bar = yaml.load(io)
bar['otherapp']['name'] = 'some_alt_app'
print(bar['beats']['udp_address'])
this doesn't throw an error, as bar is again a defaultdict with dict_maker as the function it defaults to. The above prints
239.1.1.1:10101
as you would expect.
That the RoundTripDumper/Loader doesn't support this out-of-the-box, is because it is based on the SafeDumper/Loader, which cannot dump/load arbitrary Python instances like defaultdict and its dict_maker function reference. Enabling that would make the loading unsafe.
So if you need to use the RoundTripDumper you should add a representer for defaultdict or a subclass thereof (and possible one for dict_maker as well). To be able to load that, you need constructor(s) as well. How to do that is described in the documentation (Dumping Python classes)

How do I work with a Scala process interactively?

I'm writing a bot in Scala for a game that uses text input and output. So I want to work with a process interactively - that is, my code receives output from the process, works with it, and only then sends its next input to the process. So I want to give a function access to the inputStreams and the outputStream simultaneously.
This doesn't seem to fit into any of the factories in scala.sys.process.BasicIO or the constructor for scala.sys.process.ProcessIO (three functions, each of which has access to only one stream).
Here's how I'm doing it at the moment.
private var rogue_input: OutputStream = _
private var rogue_output: InputStream = _
private var rogue_error: InputStream = _
Process("python3 /home/robin/IdeaProjects/Rogomatic/python/rogue.py --rogomatic").run(
new ProcessIO(rogue_input = _, rogue_output = _, rogue_error = _)
)
try {
private val rogue_scanner = new Scanner(rogue_output)
private val rogue_writer = new PrintWriter(rogue_input, true)
// Play the game
} finally {
rogue_input.close()
rogue_output.close()
rogue_error.close()
}
This works, but it doesn't feel very Scala-like. Is there a more idiomatic way to do this?
So I want to work with a process interactively - that is, my code receives output from the process, works with it, and only then sends its next input to the process.
In general, this is traditionally solved by expect. There exist libraries and tools inspired by expect for various languages, including for Scala: https://github.com/Lasering/scala-expect.
The README of the project gives various examples. While I don't know exactly what your rouge.py expects in terms of stdin/stdout interactions, here's a quick "hello world" example showing how you could interact with a Python interpreter (using the Ammonite REPL, which has conveniently library importing capabilities):
import $ivy.`work.martins.simon::scala-expect:6.0.0`
import work.martins.simon.expect.core._
import work.martins.simon.expect.core.actions._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
val timeout = 5 seconds
val e = new Expect("python3 -i -", defaultValue = "?")(
new ExpectBlock(
new StringWhen(">>> ")(
Sendln("""print("hello, world")""")
)
),
new ExpectBlock(
new RegexWhen("""(.*)\n>>> """.r)(
ReturningWithRegex(_.group(1).toString)
)
)
)
e.run(timeout).onComplete(println)
What the code above does is it "expects" >>> to be sent to stdout, and when it finds that, it will send print("hello, world"), followed by a newline. From then, it reads and returns everything until the next prompt (>>>) using a regex.
Amongst other debug information, the above should result in Success(hello, world) being printed to your console.
The library has various other styles, and there may also exist other similar libraries out there. My main point is that an expect-inspired library is likely what you're looking for.

Angular-dart: "target of URI..."

I searched high and low for this issue and, while I found reports of the same error ubiquitously, I didn't find this specific error and can't seem to resolve it. I'm just now learning Dart and am following the Angular-Dart tutorial, and I keep getting this error early on, with no explanation that I can find. This is my badge_controller file:
library s1_basics.badge_controller;
import 'package:angular/angular.dart';
#NgController(
selector: '[badge-controller]',
publishAs: 'ctrl'
)
class BadgeController {
static const DEFAULT_NAME = 'Anne Bonney';
static const LABEL1 = 'Arrr! Write yer name!';
static const LABEL2 = 'Aye! Gimme a name!';
String name = '';
bool get inputIsNotEmpty => name.trim().isNotEmpty;
String get label => inputIsNotEmpty ? LABEL1 : LABEL2;
generateName() {
name = DEFAULT_NAME;
}
}
And this is my pirate_module file:
library s1_basics.pirate_module;
import 'package:angular/angular.dart';
import 'package:s1_basics/badge_controller.dart';
class PirateModule extends Module {
PirateModule() {
type(BadgeController);
}
}
I'm so early on in this tutorial that I don't think there's much else that I can show to debug this, but if anyone has an idea, I'd greatly appreciate it. Cheers!
Changing the import paths to be relative references for the imported files seems to fix it; i.e. initially this gave errors:
import 'package:s1_basics/pirate_module.dart';
but after changing it to
import '../lib/pirate_module.dart';
it worked...so that's the fix (for this) if anyone out there has issues.

Sed and awk application

I've read a little about sed and awk, and understand that both are text manipulators.
I plan to use one of these to edit groups of files (code in some programming language, js, python etc.) to make similar changes to large sets of files.
Primarily editing function definitions (parameters passed) and variable names for now, but the more I can do the better.
I'd like to know if someone's attempted something similar, and those who have, are there any obvious pitfalls that one should look out for? And which of sed and awk would be preferable/more suitable for such an application. (Or maybe something entirely else? )
Input
function(paramOne){
//Some code here
var variableOne = new ObjectType;
array[1] = "Some String";
instanceObj = new Something.something;
}
Output
function(ParamterOne){
//Some code here
var PartOfSomething.variableOne = new ObjectType;
sArray[1] = "Some String";
var instanceObj = new Something.something
}
Here's a GNU awk (for "gensub()" function) script that will transform your sample input file into your desired output file:
$ cat tst.awk
BEGIN{ sym = "[[:alnum:]_]+" }
{
$0 = gensub("^(" sym ")[(](" sym ")[)](.*)","\\1(ParameterOne)\\3","")
$0 = gensub("^(var )(" sym ")(.*)","\\1PartOfSomething.\\2\\3","")
$0 = gensub("^a(rray.*)","sA\\1","")
$0 = gensub("^(" sym " =.*)","var \\1","")
print
}
$ cat file
function(paramOne){
//Some code here
var variableOne = new ObjectType;
array[1] = "Some String";
instanceObj = new Something.something;
}
$ gawk -f tst.awk file
function(ParameterOne){
//Some code here
var PartOfSomething.variableOne = new ObjectType;
sArray[1] = "Some String";
var instanceObj = new Something.something;
}
BUT think about how your real input could vary from that - you could have more/less/different spacing between symbols. You could have assignments starting on one line and finishing on the next. You could have comments that contain similar-looking lines to the code that you don't want changed. You could have multiple statements on one line. etc., etc.
You can address every issue one at a time but it could take you a lot longer than just updating your files and chances are you still will not be able to get it completely right.
If your code is EXCEEDINGLY well structured and RIGOROUSLY follows a specific, highly restrictive coding format then you might be able to do what you want with a scripting language but your best bets are either:
change the files by hand if there's less than, say, 10,000 of them or
get a hold of a parser (e.g. the compiler) for the language your files are written in and modify that to spit out your updated code.
As soon as it starts to get slightly more complicated you will switch to a script language anyway. So why not start with python in the first place?
Walking directories:
walking along and processing files in directory in python
Replacing text in a file:
replacing text in a file with Python
Python regex howto:
http://docs.python.org/dev/howto/regex.html
I also recommend to install Eclipse + PyDev as this will make debugging a lot easier.
Here is an example of a simple automatic replacer
import os;
import sys;
import re;
import itertools;
folder = r"C:\Workspaces\Test\";
skip_extensions = ['.gif', '.png', '.jpg', '.mp4', ''];
substitutions = [("Test.Alpha.", "test.alpha."),
("Test.Beta.", "test.beta."),
("Test.Gamma.", "test.gamma.")];
for root, dirs, files in os.walk(folder):
for name in files:
(base, ext) = os.path.splitext(name);
file_path = os.path.join(root, name);
if ext in skip_extensions:
print "skipping", file_path;
else:
print "processing", file_path;
with open(file_path) as f:
s = f.read();
before = [[s[found.start()-5:found.end()+5] for found in re.finditer(old, s)] for old, new in substitutions];
for old, new in substitutions:
s = s.replace(old, new);
after = [[s[found.start()-5:found.end()+5] for found in re.finditer(new, s)] for old, new in substitutions];
for b, a in zip(itertools.chain(*before), itertools.chain(*after)):
print b, "-->", a;
with open(file_path, "w") as f:
f.write(s);