Sample vcdiff-java client - diff

I am trying out vcdiff for creating a diff file from source and target files.
I will also be applying the diff to the source file to get the target file back.
I have achieved this use case with the xdelta Linux command-line tool,
but how can I achieve the same using the vcdiff-java APIs?
Any hints or directions to get started would be useful.
Thanks.

I think the basic use case of diff and apply can be handled as follows.
Here, a diff is generated from source.txt and target.txt. The diff is then applied to source.txt to produce result.txt, which is equal to target.txt.
// Write a small source file and a target file to diff against.
Path source = Files.createFile(Paths.get(basePath + "source.txt"));
Files.write(source, ("First line of the file.\n"
        + "Second line of the file.").getBytes());

Path target = Files.createFile(Paths.get(basePath + "target.txt"));
Files.write(target, "1. First line of the file!".getBytes());

// Encode: the source file serves as the dictionary; the delta is written to a stream.
ByteArrayOutputStream delta = new ByteArrayOutputStream();
VCDiffEncoder<OutputStream> encoder = VCDiffEncoderBuilder.builder()
        .withDictionary(Files.readAllBytes(source))
        .withTargetMatches(false)
        .withChecksum(true)
        .withInterleaving(true)
        .buildSimple();
encoder.encode(Files.readAllBytes(target), delta);

// Decode: apply the delta to the source bytes to reconstruct the target.
ByteArrayOutputStream resultOut = new ByteArrayOutputStream();
VCDiffDecoder decoder = VCDiffDecoderBuilder.builder().buildSimple();
decoder.decode(Files.readAllBytes(source), delta.toByteArray(), resultOut);

Path result = Files.createFile(Paths.get(basePath + "result.txt"));
Files.write(result, resultOut.toByteArray());
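As a quick, optional sanity check on top of the snippet above (plain JDK calls, not part of the vcdiff-java API), you can compare the decoded bytes with the original target file:
// result.txt should be byte-for-byte identical to target.txt
boolean same = java.util.Arrays.equals(
        Files.readAllBytes(result), Files.readAllBytes(target));
System.out.println("result matches target: " + same);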

Related

Read Forms from BAR file - Flowable in code

I need an example of using FormEngine. To be more specific:
I'm executing the code below, but no forms are found in my BAR file :(
The BAR file was exported from Flowable Modeler and it contains one form, one process and an app. Maybe there's another way to deploy and obtain the forms?
RepositoryService repositoryService = processEngine.getRepositoryService();
FormRepositoryService formRepositoryService = formEngine.getFormRepositoryService();

File file = new File(path);
ZipInputStream inputStream = new ZipInputStream(new FileInputStream(path));

String idDeployParent = repositoryService.createDeployment()
        .name(file.getName())
        .addZipInputStream(inputStream)
        .deploy()
        .getId();

DeploymentEntity deploymentEntity =
        (DeploymentEntity) repositoryService.createDeploymentQuery().list().get(0);

formRepositoryService.createDeployment()
        .name(file.getName())
        .parentDeploymentId(idDeployParent)
        .deploy();

System.out.println(" FORMS FOUND: " + formRepositoryService.createFormDefinitionQuery().list().size());

Unzip all files without their folders using Java

Is it possible to unzip all files from a zip archive without their folders?
Example:
zipfolder.zip has two subfolders, folder1 (containing files like 1.txt, 2.xlsx, 3.pdf) and folder2 (containing files like 4.txt, 5.pdf).
Note: the source can be any type of archive file, like .zip, .rar, .tar, .7-zip, etc.
This is my code:
String sevenZipLocation = "C:\\Program Files\\7-Zip\\7z.exe";
String src = ...;    // source file path (the archive)
String target = ...; // output path
String[] command = { sevenZipLocation, "x", src, "-o" + target, "-aou", "-y" };
ProcessBuilder p = new ProcessBuilder(command);
Process process = p.start();
InputStream is = process.getInputStream();
InputStreamReader isr = new InputStreamReader(is);
BufferedReader br = new BufferedReader(isr);
String line;
while ((line = br.readLine()) != null) {
    System.out.println("line1 " + line);
}
process.waitFor();
When I execute this code, the output is like
unzip folder ----- folder1 (containing files like 1.txt, 2.xlsx, 3.pdf) and folder2 (containing files like 4.txt, 5.pdf)
But I want to extract only the files from all folders, so the output is
1.txt, 2.xlsx, 3.pdf, 4.txt, 5.pdf in the output path.
Is there a command for that? Thanks.
All you need to change:
String[] command={sevenZipLocation,"e",src,"-o"+target,"-aou","-y","*.*","-r"};
PS: I don't think Java is the best choice for running OS commands; you'll waste a lot of time. But if you insist, don't forget there might be an error stream too.
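For what it's worth, here is a minimal Java sketch of one way to handle that error stream, reusing the command array from the question (just a sketch, not the only way to do it):
ProcessBuilder p = new ProcessBuilder(command);
p.redirectErrorStream(true); // merge 7-Zip's stderr into stdout so errors are not silently lost
Process process = p.start();
try (BufferedReader br = new BufferedReader(new InputStreamReader(process.getInputStream()))) {
    String line;
    while ((line = br.readLine()) != null) {
        System.out.println(line); // progress and error messages from 7-Zip
    }
}
int exitCode = process.waitFor(); // a non-zero exit code indicates a 7-Zip failure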

Compute file content hash with Scala

In our app, we need to compute a file hash so we can tell later whether the file was updated.
The way I am doing it right now is with this little method:
protected[services] def computeMigrationHash(toVersion: Int): String = {
  val migrationClassName = MigrationClassNameFormat.format(toVersion, toVersion)
  val migrationClass = Class.forName(migrationClassName)
  val fileName = migrationClass.getName.replace('.', '/') + ".class"
  val resource = getClass.getClassLoader.getResource(fileName)
  logger.debug("Migration file - " + resource.getFile)
  val file = new File(resource.getFile)
  val hc = Files.hash(file, Hashing.md5())
  logger.debug("Calculated migration file hash - " + hc.toString)
  hc.toString
}
It all works perfectly until the code gets deployed into a different environment and the file is located at a different absolute path. I guess the hashing takes the path into account as well.
What is the best way to calculate some sort of reliable hash of a file's content that will produce the same result for as long as the content of the file stays the same?
Thanks,
Having perused the source code (https://github.com/google/guava/blob/master/guava/src/com/google/common/io/Files.java), only the file contents are hashed; the path does not come into play.
public static HashCode hash(File file, HashFunction hashFunction) throws IOException {
    return asByteSource(file).hash(hashFunction);
}
Therefore you need not worry about the locality of the file. As for why you end up with a different hash on a different filesystem: you may want to compare sizes/contents to ensure that, e.g., no compound EOLs were introduced.
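Since the class file is loaded from the classpath anyway, one option is to hash the resource bytes directly instead of going through a File, so the result cannot depend on where the file sits on disk. A sketch in Java using Guava's Resources and ByteSource (the same two calls work from Scala; fileName is the value built in the question's method):
// hash the classpath resource's bytes directly, no File or absolute path involved
URL resource = getClass().getClassLoader().getResource(fileName);
HashCode hc = Resources.asByteSource(resource).hash(Hashing.md5());
This keeps the hash a pure function of the class file's bytes.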

Sed and awk application

I've read a little about sed and awk and understand that both are text manipulators.
I plan to use one of them to edit groups of files (code in some programming language: JS, Python, etc.) to make similar changes to large sets of files.
Primarily editing function definitions (parameters passed) and variable names for now, but the more I can do the better.
I'd like to know if anyone has attempted something similar, and if so, whether there are any obvious pitfalls to look out for. Also, which of sed and awk would be preferable/more suitable for such an application? (Or maybe something else entirely?)
Input
function(paramOne){
    //Some code here
    var variableOne = new ObjectType;
    array[1] = "Some String";
    instanceObj = new Something.something;
}
Output
function(ParameterOne){
    //Some code here
    var PartOfSomething.variableOne = new ObjectType;
    sArray[1] = "Some String";
    var instanceObj = new Something.something
}
Here's a GNU awk (for "gensub()" function) script that will transform your sample input file into your desired output file:
$ cat tst.awk
BEGIN { sym = "[[:alnum:]_]+" }
{
    $0 = gensub("^(" sym ")[(](" sym ")[)](.*)", "\\1(ParameterOne)\\3", "")
    $0 = gensub("^(var )(" sym ")(.*)", "\\1PartOfSomething.\\2\\3", "")
    $0 = gensub("^a(rray.*)", "sA\\1", "")
    $0 = gensub("^(" sym " =.*)", "var \\1", "")
    print
}
$ cat file
function(paramOne){
//Some code here
var variableOne = new ObjectType;
array[1] = "Some String";
instanceObj = new Something.something;
}
$ gawk -f tst.awk file
function(ParameterOne){
//Some code here
var PartOfSomething.variableOne = new ObjectType;
sArray[1] = "Some String";
var instanceObj = new Something.something;
}
BUT think about how your real input could vary from that - you could have more/less/different spacing between symbols. You could have assignments starting on one line and finishing on the next. You could have comments that contain similar-looking lines to the code that you don't want changed. You could have multiple statements on one line. etc., etc.
You can address every issue one at a time but it could take you a lot longer than just updating your files and chances are you still will not be able to get it completely right.
If your code is EXCEEDINGLY well structured and RIGOROUSLY follows a specific, highly restrictive coding format then you might be able to do what you want with a scripting language but your best bets are either:
change the files by hand if there are fewer than, say, 10,000 of them, or
get a hold of a parser (e.g. the compiler) for the language your files are written in and modify that to spit out your updated code.
As soon as it starts to get slightly more complicated you will switch to a scripting language anyway, so why not start with Python in the first place?
Walking directories:
walking along and processing files in directory in python
Replacing text in a file:
replacing text in a file with Python
Python regex howto:
http://docs.python.org/dev/howto/regex.html
I also recommend installing Eclipse + PyDev, as this will make debugging a lot easier.
Here is an example of a simple automatic replacer:
import os
import re
import itertools

folder = r"C:\Workspaces\Test"  # note: a raw string must not end with a backslash
skip_extensions = ['.gif', '.png', '.jpg', '.mp4', '']
substitutions = [("Test.Alpha.", "test.alpha."),
                 ("Test.Beta.", "test.beta."),
                 ("Test.Gamma.", "test.gamma.")]

for root, dirs, files in os.walk(folder):
    for name in files:
        (base, ext) = os.path.splitext(name)
        file_path = os.path.join(root, name)
        if ext in skip_extensions:
            print("skipping", file_path)
        else:
            print("processing", file_path)
            with open(file_path) as f:
                s = f.read()
            # capture a little context around each match, before and after the replacements
            # (re.finditer treats the patterns as regexes, so '.' matches any character)
            before = [[s[found.start() - 5:found.end() + 5] for found in re.finditer(old, s)]
                      for old, new in substitutions]
            for old, new in substitutions:
                s = s.replace(old, new)
            after = [[s[found.start() - 5:found.end() + 5] for found in re.finditer(new, s)]
                     for old, new in substitutions]
            for b, a in zip(itertools.chain(*before), itertools.chain(*after)):
                print(b, "-->", a)
            with open(file_path, "w") as f:
                f.write(s)

Custom clipboard data format across RDC (.NET)

I'm trying to copy a custom object from an RDC window onto the host (my local) machine. It fails.
Here's the code that I'm using to 1) copy and 2) paste:
1) Remote (client running on Windows XP, accessed via RDC):
//copy entry
IDataObject ido = new DataObject();
XmlSerializer x = new XmlSerializer(typeof(EntryForClipboard));
StringWriter sw = new StringWriter();
x.Serialize(sw, new EntryForClipboard(entry));
ido.SetData(typeof(EntryForClipboard).FullName, sw.ToString());
Clipboard.SetDataObject(ido, true);
2) Local (client running on local Windows XP x64 workstation):
//paste entry
IDataObject ido = Clipboard.GetDataObject();
DataFormats.Format cdf = DataFormats.GetFormat(typeof(EntryForClipboard).FullName);
if (ido.GetDataPresent(cdf.Name)) //<- this always returns false
{
    //can never get here!
    XmlSerializer x = new XmlSerializer(typeof(EntryForClipboard));
    string xml = (string)ido.GetData(cdf.Name);
    StringReader sr = new StringReader(xml);
    EntryForClipboard data = (EntryForClipboard)x.Deserialize(sr);
}
It works perfectly on the same machine though.
Any hints?
There are a couple of things you could look into:
Are you sure the serialization of the object truly converts it into XML? Perhaps the output XML has references to your memory space? Try looking at the text of the XML to see.
If you really have a serialized XML version of the object, why not store the value as plain-vanilla text instead of using typeof(EntryForClipboard)? Something like:
XmlSerializer x = new XmlSerializer(typeof(EntryForClipboard));
StringWriter sw = new StringWriter();
x.Serialize(sw, new EntryForClipboard(entry));
Clipboard.SetText(sw.ToString(), TextDataFormat.UnicodeText);
And then, all you'd have to do in the client program is check whether the text on the clipboard can be deserialized back into your object.
OK, found what the issue was.
Custom format names get truncated to 16 characters when copied over RDC using a custom format.
In the line
ido.SetData(typeof(EntryForClipboard).FullName, sw.ToString());
the format name was quite long.
When I received the copied data on the host machine, the available formats included my custom format, but truncated to 16 characters.
IDataObject ido = Clipboard.GetDataObject();
ido.GetFormats(); //used to see available formats.
So I just used a shorter format name:
//to copy
ido.SetData("MyFormat", sw.ToString());
...
//to paste
DataFormats.Format cdf = DataFormats.GetFormat("MyFormat");
if (ido.GetDataPresent(cdf.Name)) {
    //this now works
    ...