Why Groovy file write with UTF-16LE produce BOM char? - encoding

Do you have idea why first and secod lines below do not produce BOM to the file and third line does? I thought UTF-16LE is correct encoding name and that encoding does no create BOM automatically to beginning of the file.
new File("foo-wo-bom.txt").withPrintWriter("utf-16le") {it << "test"}
new File("foo-bom1.txt").withPrintWriter("UnicodeLittleUnmarked") {it << "test"}
new File("foo-bom.txt").withPrintWriter("UTF-16LE") {it << "test"}
Another samples
new File("foo-bom.txt").withPrintWriter("UTF-16LE") {it << "test"}
new File("foo-bom.txt").getBytes().each {System.out.format("%02x ", it)}
prints
ff fe 74 00 65 00 73 00 74 00
and with java
PrintWriter w = new PrintWriter("foo.txt","UTF-16LE");
w.print("test");
w.close();
FileInputStream r = new FileInputStream("foo.txt");
int c;
while ((c = r.read()) != -1) {
System.out.format("%02x ",c);
}
r.close();
prints
74 00 65 00 73 00 74 00
With Java is does not produce BOM and with Groovy there is BOM.

There appears to be a difference in behavior with withPrintWriter. Try this out in your GroovyConsole
File file = new File("tmp.txt")
try {
String text = " "
String charset = "UTF-16LE"
file.withPrintWriter(charset) { it << text }
println "withPrintWriter"
file.getBytes().each { System.out.format("%02x ", it) }
PrintWriter w = new PrintWriter(file, charset)
w.print(text)
w.close()
println "\n\nnew PrintWriter"
file.getBytes().each { System.out.format("%02x ", it) }
} finally {
file.delete()
}
It outputs
withPrintWriter
ff fe 20 00
new PrintWriter
20 00
This is because calling new PrintWriter calls the Java constructor, but calling withPrintWriter eventually calls org.codehaus.groovy.runtime.ResourceGroovyMethods.writeUTF16BomIfRequired(), which writes the BOM.
I'm uncertain whether this difference in behavior is intentional. I was curious about this, so I asked on the mailing list. Someone there should know the history behind the design.
Edit: GROOVY-7465 was created out of the aforementioned discussion.

Related

How swift know value of memory is address or actual value that I assigned

struct ValueType {
var member: Int
}
class ReferenceType {
var member: Int
init(member: Int) {
self.member = member
}
}
var valueTypeObject = ValueType(member: 3)
var referenceTypeObject = ReferenceType(member: 4)
withUnsafePointer(to: &referenceTypeObject) {
print("referenceTypeObject address: \($0)")
}
withUnsafePointer(to: &valueTypeObject) {
print("valueTypeObject address: \($0)")
}
When executing the above code, the address of each object appears like this.
valueTypeObject address: 0x0000000100008218
referenceTypeObject address: 0x0000000100008220
First, if I view memory of valueTypeObject address (0x0000000100008218), I can check the 03 value within 64 bits that I actually allocated (03 00 00 00 00 00 00 00 00 00 00 00 00 00. Maybe data is stored as little endian.)
Next, if I view memory of referenceTypeObject address (0x0000000100008220), I can check 0x000000010172f8b0 is stored in 64bit. (I don't know why right side of ..r..... is also highlighted, and what it is 🤔)
I know that the referenceTypeObject is reference type, so the actual value is in the heap area. So I can guess 0x000000010172f8b0 is an address that stores the actual value that I assigned (in this case, 4.)
But how does Swift know that this is the address that points to heap area instead of 0x000000010172f8b0 value that can be assied by me?
In addition, if I view memory of address 0x000000010172f8b0 where the actual value is stored, there are some 32 bytes values in front of the value that I allocated (in this case, 4). What are those?

Sending Midi Sysex messages in C#

I'm trying to figure out how to make a simple Winform app that sends a SysEx MIDI message when I click a button, so button 1 sends:-
F0 7F 22 02 50 01 31 00 31 00 31 F7
Button 2 sends:
F0 7F 22 02 50 01 32 00 31 00 31 F7
and so on...
I was able to find lots of info about sending notes and instrument data but nothing really about sysex. I have played with the Sanford reference which seemed to get me closer but still nothing about Sysex usage.
There are various MIDI libraries available, and most support SysEx in one form or another. I've mostly used the managed-midi package although I've used the Sanford package before.
In managed-midi at least, sending a SysEx message is as simple as obtaining an IMidiOutput (usually from a MidiAccessManager) and then calling the Send method. For example:
// You'd do this part just once, of course...
var accessManager = MidiAccessManager.Default;
var portId = access.Outputs.First().Id;
var port = await access.OpenOutputAsync(portId);
// You'd port this part in your button click handler.
// The data is copied from the question, so I'm assuming it's okay...
var message = new byte[] {
0xF0, 0x7F, 0x22, 0x02, 0x50, 0x01,
0x31, 0x00, 0x31, 0x00, 0x31, 0xF7 };
port.Send(message, 0, message.Length, timestamp: 0);

Utf8 encoding makes me confused

let buf1 = Buffer.from("3", "utf8");
let buf2 = Buffer.from("Здравствуйте", "utf8");
// <Buffer 33>
// <Buffer d0 97 d0 b4 d1 80 d0 b0 d0 b2 d1 81 d1 82 d0 b2 d1 83 d0 b9 d1 82 d0 b5>
Why does char '3' encode to '33' in buf1 but 'd0 97' in buf2?
Because 3 is not З, despite the similarity to the untrained eye. Look closer and you'll see the difference, however subtle.
The former is Unicode code point U+0033 - DIGIT THREE (see here), while the latter is U+0417 - CYRILLIC CAPITAL LETTER ZE (see here), encoded in UTF-8 as d0 97.
The Russian word is actually hello, pronounced (very roughly, since I only know hello and goodbye, taught by a Russian girlfriend many decades ago) "Strasvoytza", with no "three" anywhere in the concept.
The first character of the second buffer is the Cyrillic character "Ze" https://en.m.wikipedia.org/wiki/Ze_(Cyrillic) and not the Arabic numeral 3 https://en.m.wikipedia.org/wiki/3

windbg script causes memory access violation

I am using the following windbg script to break when a certain value is encountered in the buffer when reading a file
bp ReadFile
.while(1)
{
g
$$ Get parameters of ReadFile()
r $t0 = dwo(esp+4)
r $t1 = dwo(esp+8)
r $t2 = dwo(esp+0x0c)
$$ Execute until return is reached
pt
$$ Read magic value in the buffer
$$ CHANGE position in buffer here
r $t5 = dwo(#$t1+0x00)
$$ Check if magic value matches
$$ CHANGE constant here
.if(#$t5 == 0x70170000)
{
$$db #$t1
$$ break
.break
}
}
$$ Clear BP for ReadFile (assume it is the 0th one)
bc 0
I get the following memory access violation when I run this script.
Memory access error at ');; $$ Check if magic value matches; $$ CHANGE constant here; .if(#$t5 == 0x70170000); {; $$db #$t1;; $$ break; .break; };'
Why is this the case?
If you need to read the buffer contents at kernel32!ReadFile you need to save the buffer address and step out of the function using gu (goup or step out)
when broken on ReadFile esp+8 points to the buffer so save it and step out
r $t1 = poi(#esp+8);gu
the first Dword of the buffer is poi(#$t1) compare it with the required Dword
and take necessary action with .if .else
.if( poi(#$t1) != 636c6163 ) {gc} .else {db #$t1 l10;gc}
putting this all together in one line the script shoule be
bp k*32!ReadFile "r $t1 =poi(#esp+8);gu;.if((poi(#$t1))!=636c6163){gc}.else{db #$t1 l10;gc}"
here 636c6163 is 'clac' (calc reversed ) use the dword you want instead of this
a sample run on calc.exe xp sp3 32 bits
bl
bp k*32!ReadFile "r $t1=poi(#esp+8);gu;.if((poi(#$t1))!=636c6163){gc}.else{db #$t1 l10;gc}"
.bpcmds
bp0 0x7c801812 "r $t1 = poi(#esp+8);gu;.if( (poi(#$t1))!=636c6163){gc}.else{db #$t1 l10;gc}"
0:002> g
00b865b0 63 61 6c 63 5f 77 68 61-74 69 73 5f 69 6e 74 72 calc_whatis_intr
00374df0 63 61 6c 63 00 ab ab ab-ab ab ab ab ab fe ee fe calc............

openURL tel-scheme iPhone

When I try to dial a telephone number(using openURL method of UIApplication class) with e.g. +47 00 00 00 00 it does not work, however when I change to +4700000000 it works.
I don't need an example of how to convert from the first example to the last, but is there any methods that does this conversion?
Thanks in advance.
Try this:
NSString *withSpaces = #"+47 00 00 00 00";
NSString *withoutSpaces = [withSpaces
stringByReplacingOccurrencesOfString:#" "
withString:#""];