I've read a little about sed and awk, and understand that both are text manipulators.
I plan to use one of these to edit groups of files (code in some programming language, js, python etc.) to make similar changes to large sets of files.
Primarily editing function definitions (parameters passed) and variable names for now, but the more I can do the better.
I'd like to know if someone's attempted something similar, and those who have, are there any obvious pitfalls that one should look out for? And which of sed and awk would be preferable/more suitable for such an application. (Or maybe something entirely else? )
Input
function(paramOne){
//Some code here
var variableOne = new ObjectType;
array[1] = "Some String";
instanceObj = new Something.something;
}
Output
function(ParamterOne){
//Some code here
var PartOfSomething.variableOne = new ObjectType;
sArray[1] = "Some String";
var instanceObj = new Something.something
}
Here's a GNU awk (for "gensub()" function) script that will transform your sample input file into your desired output file:
$ cat tst.awk
BEGIN{ sym = "[[:alnum:]_]+" }
{
$0 = gensub("^(" sym ")[(](" sym ")[)](.*)","\\1(ParameterOne)\\3","")
$0 = gensub("^(var )(" sym ")(.*)","\\1PartOfSomething.\\2\\3","")
$0 = gensub("^a(rray.*)","sA\\1","")
$0 = gensub("^(" sym " =.*)","var \\1","")
print
}
$ cat file
function(paramOne){
//Some code here
var variableOne = new ObjectType;
array[1] = "Some String";
instanceObj = new Something.something;
}
$ gawk -f tst.awk file
function(ParameterOne){
//Some code here
var PartOfSomething.variableOne = new ObjectType;
sArray[1] = "Some String";
var instanceObj = new Something.something;
}
BUT think about how your real input could vary from that - you could have more/less/different spacing between symbols. You could have assignments starting on one line and finishing on the next. You could have comments that contain similar-looking lines to the code that you don't want changed. You could have multiple statements on one line. etc., etc.
You can address every issue one at a time but it could take you a lot longer than just updating your files and chances are you still will not be able to get it completely right.
If your code is EXCEEDINGLY well structured and RIGOROUSLY follows a specific, highly restrictive coding format then you might be able to do what you want with a scripting language but your best bets are either:
change the files by hand if there's less than, say, 10,000 of them or
get a hold of a parser (e.g. the compiler) for the language your files are written in and modify that to spit out your updated code.
As soon as it starts to get slightly more complicated you will switch to a script language anyway. So why not start with python in the first place?
Walking directories:
walking along and processing files in directory in python
Replacing text in a file:
replacing text in a file with Python
Python regex howto:
http://docs.python.org/dev/howto/regex.html
I also recommend to install Eclipse + PyDev as this will make debugging a lot easier.
Here is an example of a simple automatic replacer
import os;
import sys;
import re;
import itertools;
folder = r"C:\Workspaces\Test\";
skip_extensions = ['.gif', '.png', '.jpg', '.mp4', ''];
substitutions = [("Test.Alpha.", "test.alpha."),
("Test.Beta.", "test.beta."),
("Test.Gamma.", "test.gamma.")];
for root, dirs, files in os.walk(folder):
for name in files:
(base, ext) = os.path.splitext(name);
file_path = os.path.join(root, name);
if ext in skip_extensions:
print "skipping", file_path;
else:
print "processing", file_path;
with open(file_path) as f:
s = f.read();
before = [[s[found.start()-5:found.end()+5] for found in re.finditer(old, s)] for old, new in substitutions];
for old, new in substitutions:
s = s.replace(old, new);
after = [[s[found.start()-5:found.end()+5] for found in re.finditer(new, s)] for old, new in substitutions];
for b, a in zip(itertools.chain(*before), itertools.chain(*after)):
print b, "-->", a;
with open(file_path, "w") as f:
f.write(s);
Related
I'm writing a bot in Scala for a game that uses text input and output. So I want to work with a process interactively - that is, my code receives output from the process, works with it, and only then sends its next input to the process. So I want to give a function access to the inputStreams and the outputStream simultaneously.
This doesn't seem to fit into any of the factories in scala.sys.process.BasicIO or the constructor for scala.sys.process.ProcessIO (three functions, each of which has access to only one stream).
Here's how I'm doing it at the moment.
private var rogue_input: OutputStream = _
private var rogue_output: InputStream = _
private var rogue_error: InputStream = _
Process("python3 /home/robin/IdeaProjects/Rogomatic/python/rogue.py --rogomatic").run(
new ProcessIO(rogue_input = _, rogue_output = _, rogue_error = _)
)
try {
private val rogue_scanner = new Scanner(rogue_output)
private val rogue_writer = new PrintWriter(rogue_input, true)
// Play the game
} finally {
rogue_input.close()
rogue_output.close()
rogue_error.close()
}
This works, but it doesn't feel very Scala-like. Is there a more idiomatic way to do this?
So I want to work with a process interactively - that is, my code receives output from the process, works with it, and only then sends its next input to the process.
In general, this is traditionally solved by expect. There exist libraries and tools inspired by expect for various languages, including for Scala: https://github.com/Lasering/scala-expect.
The README of the project gives various examples. While I don't know exactly what your rouge.py expects in terms of stdin/stdout interactions, here's a quick "hello world" example showing how you could interact with a Python interpreter (using the Ammonite REPL, which has conveniently library importing capabilities):
import $ivy.`work.martins.simon::scala-expect:6.0.0`
import work.martins.simon.expect.core._
import work.martins.simon.expect.core.actions._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
val timeout = 5 seconds
val e = new Expect("python3 -i -", defaultValue = "?")(
new ExpectBlock(
new StringWhen(">>> ")(
Sendln("""print("hello, world")""")
)
),
new ExpectBlock(
new RegexWhen("""(.*)\n>>> """.r)(
ReturningWithRegex(_.group(1).toString)
)
)
)
e.run(timeout).onComplete(println)
What the code above does is it "expects" >>> to be sent to stdout, and when it finds that, it will send print("hello, world"), followed by a newline. From then, it reads and returns everything until the next prompt (>>>) using a regex.
Amongst other debug information, the above should result in Success(hello, world) being printed to your console.
The library has various other styles, and there may also exist other similar libraries out there. My main point is that an expect-inspired library is likely what you're looking for.
I am trying to write a macro for debug print in the Nim language.
Currently this macro adds filename andline to the output by instantiationInfo().
import macros
macro debugPrint(msg: untyped): typed =
result = quote do:
let pos = instantiationInfo()
echo pos.filename, ":", pos.line, ": ", `msg`
proc hello() =
debugPrint "foo bar"
hello()
currently output:
debug_print.nim:9: foo bar
I would like to add the name of the procedure (or iterator) of the place where the macro was called.
desired output:
debug_print.nim:9(proc hello): foo bar
How can I get the name of procedure (or iterator) in Nim, like __func__ in C?
At runtime you can do getFrame().procname, but it only works with stacktrace enabled (not in release builds).
At compile-time surprisingly I can't find a way to do it. There is callsite() in macros module, but it doesn't go far enough. It sounds like something that might fit into the macros.LineInfo object.
A hacky solution would be to also use __func__ and parse that back into the Nim proc name:
template procName: string =
var name: cstring
{.emit: "`name` = __func__;".}
($name).rsplit('_', 1)[0]
building on answer from #def- but making it more robust to handle edge cases of functions containing underscores, and hashes containing trailing _N or not
also using more unique names as otherwise macro would fail if proc defines a variable name
import strutils
proc procNameAux*(name:cstring): string =
let temp=($name).rsplit('_', 2)
#CHECKME: IMPROVE; the magic '4' chosen to be enough for most cases
# EG: bar_baz_9c8JPzPvtM9azO6OB23bjc3Q_3
if temp.len>=3 and temp[2].len < 4:
($name).rsplit('_', 2)[0]
else:
# EG: foo_9c8JPzPvtM9azO6OB23bjc3Q
($name).rsplit('_', 1)[0]
template procName*: string =
var name2: cstring
{.emit: "`name2` = __func__;".}
procNameAux(name2)
proc foo_bar()=
echo procName # prints foo_bar
foo_bar()
NOTE: this still has some issues that trigger in complex edge cases, see https://github.com/nim-lang/Nim/issues/8212
In our app, we are in need to compute file hash, so we can compare if the file was updated later.
The way I am doing it right now is with this little method:
protected[services] def computeMigrationHash(toVersion: Int): String = {
val migrationClassName = MigrationClassNameFormat.format(toVersion, toVersion)
val migrationClass = Class.forName(migrationClassName)
val fileName = migrationClass.getName.replace('.', '/') + ".class"
val resource = getClass.getClassLoader.getResource(fileName)
logger.debug("Migration file - " + resource.getFile)
val file = new File(resource.getFile)
val hc = Files.hash(file, Hashing.md5())
logger.debug("Calculated migration file hash - " + hc.toString)
hc.toString
}
It all works perfectly, until the code get's deployed into different environment and file file is located in a different absolute path. I guess, the hashing take the path into account as well.
What is the best way to calculate some sort of reliable hash of a file content that well produce the same result for as log as the content of a file stays the same?
Thanks,
Having perused the source code https://github.com/google/guava/blob/master/guava/src/com/google/common/io/Files.java - only the file contents are hashed - the path does not come into play.
public static HashCode hash(File file, HashFunction hashFunction) throws IOException {
return asByteSource(file).hash(hashFunction);
}
Therefore you need not worry about locality of the file. Now why you end up with a different hash on a different fs .. maybe you should compare the size/contents to ensure eg no compound eol's were introduced.
The NextRelease plugin for Dist::Zilla looks for {{$NEXT}} in the Changes file to put the release datetime information. However, I can't get this to be generated using my profile.ini. Here's what I have:
[GenerateFile / Generate-Changes ]
filename = Changes
is_template = 1
content = Revision history for {{$dist->name}}
content =
;todo: how can we get this to print correctly with a template?
content = {{$NEXT}}
content = initial release
{{$dist->name}} is correctly replaced with my distribution name, but {{$NEXT}} is, as-is, replaced with nothing (since it's not escaped and there is no $NEXT variable in). I've tried different combinations of slashes to escape the braces, but it either results in nothing or an error during generation with dzil new. How can I properly escape this string so that after dzil processes it with Text::Template it outputs {{$NEXT}}?
In the content, {{$NEXT}} is being interpreted as a template block and, as you say, wants to fill itself in as the contents of the missing $NEXT.
Instead, try:
content = {{'{{$NEXT}}'}}
Example program:
use 5.14.0;
use Text::Template 'fill_in_string';
my $result = fill_in_string(
q<{{'{{$NEXT}}'}}>,
DELIMITERS => [ '{{', '}}' ],
BROKEN => sub { die },
);
say $result;
In implementation file (.m) I have 30.. methods. How can I put their description (all of them) into .h file automatically?
Seams hard to do properly with a regex, but you can do it with awk:
https://gist.github.com/1771131
#!/usr/bin/env awk -f
# print class and instance methods declarations from implementation
# Usage: ./printmethods.awk class.m or awk -f printmethods.awk class.m
/^[[:space:]]*#implementation/ {
implementation = 1;
}
/^[[:space:]]*#end/ {
implementation = 0;
}
/^[[:space:]]*[\-\+]/ {
if(implementation) {
method = 1;
collect = "";
}
}
/[^[:space:]]/ {
if(implementation && method) {
p = index($0, "{");
if(p == 0) {
if(collect == "")
collect = $0
else
collect = collect $0 "\n";
} else {
method = 0;
# trim white space and "{" from line end
gsub("[\{[:space:]]*$", "", $0);
collect = collect $0;
# trim white space from start
gsub("^[[:space:]]*", "", collect);
print collect ";"
}
}
}
Write a piece of code that will extract all the methods definitions (Use regular expressions to detect them) and then just added it to the h file and "\; \n".
The program Accessorizer (on the Mac App Store for $5) is specifically intended for these obnoxious grunt work issues in Xcode. It can generate prototypes as well as property synthesizes, accessors, inits, etc.
Caveat: it's been a bit touchy and rough around the edges in my experience. It might, for example, not realize that a function is inside a multi-line comment, and thus provide an unwanted prototype for it. But even with those quirks in mind, it's saved me way more than $5 worth of time.
Their website: http://www.kevincallahan.org/software/accessorizer.html