Unicode character transformation in SPSS - character

I have a string variable and need to convert all non-digit characters to spaces (" "). The problem is with Unicode characters: characters outside the basic character set are converted to invalid characters. See the code below for an example.
Is there another way to achieve the same result with a procedure that does not choke on special Unicode characters?
new file.
set unicode = yes.
show unicode.
data list free
/T (a10).
begin data
1234
5678
absd
12as
12(a
12(vi
12(vī
12āčž
end data.
string Z (a10).
comp Z = T.
loop #k = 1 to char.len(Z).
if ~range(char.sub(Z, #k, 1), "0", "9") sub(Z, #k, 1) = " ".
end loop.
comp Z = normalize(Z).
comp len = char.len(Z).
list var = all.
exe.
The result:
T Z len
1234 1234 4
5678 5678 4
absd 0
12as 12 2
12(a 12 2
12(vi 12 2
12(vī 12 � 6
>Warning # 649
>The first argument to the CHAR.SUBSTR function contains invalid characters.
>Command line: 1939 Current case: 8 Current splitfile group: 1
12āčž 12 �ž 7
Number of cases read: 8 Number of cases listed: 8

The substr function should not be used on the left hand side of an expression in Unicode mode, because the replacement character may not be the same number of bytes as the character(s) being replaced. Instead, use the replace function on the right hand side.
The corrupted characters you are seeing are due to this size mismatch.
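To make the size mismatch concrete, here is a minimal Python sketch (not SPSS; the helper name blank_non_digits is made up for illustration). It assumes the string is stored as UTF-8, where ī takes two bytes, and contrasts overwriting a single byte with replacing whole characters.

```python
import re

def blank_non_digits(text: str) -> str:
    """Replace every character that is not an ASCII digit with one space."""
    return re.sub(r"[^0-9]", " ", text)

text = "12(vī"
raw = bytearray(text.encode("utf-8"))  # 'ī' is stored as two bytes: C4 AB

# Overwrite only the first byte of 'ī', as a byte-based substring assignment would.
raw[4:5] = b" "
broken = bytes(raw).decode("utf-8", errors="replace")

print(len(text), len(text.encode("utf-8")))  # character count vs. byte count
print(repr(broken))                          # the orphaned byte decodes to U+FFFD
print(repr(blank_non_digits(text)))          # character-wise replacement stays valid
```

The byte count of 6 for this five-character value matches the length reported for the same case in the listing above.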

How about, instead of replacing non-numeric characters, you cycle through and pull out the numeric characters and rebuild Z? (Note that my version here uses the older string functions that predate the CHAR. functions.)
data list free
/T (a10).
begin data
1234
5678
absd
12as
12(a
12(vi
12(vī
12āčž
12as23
end data.
STRING Z (a10).
STRING #temp (A1).
COMPUTE #len = LENGTH(RTRIM(T)).
LOOP #i = 1 to #len.
COMPUTE #temp = SUBSTR(T,#i,1).
DO IF INDEX('0123456789',#temp) > 0.
COMPUTE Z = CONCAT(SUBSTR(Z,1,#i-1),#temp).
ELSE.
COMPUTE Z = CONCAT(SUBSTR(Z,1,#i-1)," ").
END IF.
END LOOP.
EXECUTE.

Related

Numbers change inside variables for powershell

I am just trying to convert a string with week and day information to numbers and store them in variables, yet something really funky is happening. So far I have tested this behavior on 4 PCs, in PowerShell 5 and 7, and it happens everywhere.
$UP_Down = "6w0d"
[int]$weeks = if ($Up_Down -match "w"){$Up_Down[$($Up_Down.IndexOf('w')-1)]}Else{0}
[int]$days = if ($Up_Down -match "d"){$Up_Down[$Up_Down.IndexOf('d')-1]}Else{0}
[int]$totaldays = (7 * $weeks) + $days
The initial variable obviously holds 6 weeks and 0 days, which I have to convert to 42 total days (this is just an example; it happens regardless of the combination).
However, these are the funky results I get, shown here using Write-Output:
Weeks If statement results by itself 6
Weeks variable results 54
days If statement results by itself 0
days variable results 48
totaldays variable results are 426
The problem occurs regardless of which numeric data type I use.
Ironically, the variables have the correct value if I do NOT assign a data type to them, BUT
the moment execution hits (7*$weeks), even if $weeks is correct, the value output is 426, and remember, there is no [int] etc. anywhere.
What am I doing wrong?
The problem is that you are not converting the number inside the string "6" to the number 6, but the character '6' to its value in the underlying character encoding, which is 54 in ASCII. The same goes for the day: '0' has a value of 48. 7 * 54 + 48 = 426.
See the difference:
PS C:\Users\name> [int]"6"[0]
54
PS C:\Users\name> [int]"6"
6
When extracting an element of the string through indexing with [0] you get a character instead of a string of length 1. A cast to int will then return the ASCII value of this character.
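The same distinction can be sketched in Python for comparison. Python has no separate char type, so here ord() plays the role of casting a [char] to [int], and int() plays the role of parsing a string; the variable names are only illustrative.

```python
week_char = "6w0d"[0]          # a one-character string: '6'

code_point = ord(week_char)    # the character's code point
digit = int(week_char)         # the number the character represents

# Reproduce the surprising total from the question, then the intended one.
wrong_total = 7 * ord("6") + ord("0")
right_total = 7 * int("6") + int("0")

print(code_point, digit, wrong_total, right_total)
```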
You're indexing ([...]) into string $Up_Down, which means you're returning a single character, i.e. a [char] (System.Char) instance.
Casting a [char] to [int] yields its Unicode code point ("ASCII value"), not the digit that the character happens to represent.
For instance, the character 6 is Unicode character DIGIT SIX with code point U+0036; 0036 is the hexadecimal form of the numeric code point, and the decimal form of hexadecimal 0x36 is 54.
PS> [int] "6w0d"[0]
54 # !! Same as: [int] [char] "6"
To interpret the character as a digit, you need an intermediate [string] cast:
PS> [int] [string] "6w0d"[0]
6 # OK - a string is parsed as expected; same as: [int] "6"
If you cast a string rather than char to [int], PowerShell effectively calls System.Int32.Parse behind the scenes as follows: [int]::Parse($string, [cultureinfo]::InvariantCulture).
Note that PowerShell has no char literals - unlike in C#, '...' quoting also produces strings (verbatim ones), and [int] '6' yields integer 6, just like [int] "6" does.
Conversely, you need an explicit [char] cast to convert a single-character string literal to a [char]; e.g., [char] '6'; a multi-character string would cause the cast to fail.
The solution in the context of your command:
[int]$weeks = if ($Up_Down -match "w"){[string] $Up_Down[$Up_Down.IndexOf('w')-1]} Else {0}
[int]$days = if ($Up_Down -match "d"){[string] $Up_Down[$Up_Down.IndexOf('d')-1]} Else {0}
However, I suggest solving the problem differently:
[int] $totalDays = 0
if ($UP_Down -match '^(?:(?<weeks>\d+)w)?(?:(?<days>\d+)d)?$') {
[int] $weeks, [int] $days = $Matches.weeks, $Matches.days
$totalDays = 7 * $weeks + $days
} # else: string wasn't in expected format.
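For readers who want to try the pattern outside PowerShell, here is a rough Python equivalent of the regex approach. The function name total_days and the choice to raise ValueError on a mismatch are assumptions made for this sketch, not part of the answer above.

```python
import re

# Same shape as the PowerShell pattern: optional "<n>w" followed by optional "<n>d".
PATTERN = re.compile(r"^(?:(?P<weeks>\d+)w)?(?:(?P<days>\d+)d)?$")

def total_days(code: str) -> int:
    """Convert a week/day code such as '6w0d' into a total number of days."""
    match = PATTERN.match(code)
    if match is None:
        raise ValueError(f"unexpected format: {code!r}")
    # A group that did not participate in the match is None; treat it as 0.
    weeks = int(match.group("weeks") or 0)
    days = int(match.group("days") or 0)
    return 7 * weeks + days

for code in ["6w0d", "3w3d", "66w6d", "5d", "2w"]:
    print(code, total_days(code))
```

Because both groups are optional, a code with only weeks or only days is still accepted, just as in the PowerShell version.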
others have shown you why the problem hit you, so this is just an alternate way to get the total day count. [grin]
what it does ...
fakes reading in a text file of Week/Day codes
when ready to use real data, remove the entire #region/#endregion block and use Get-Content.
iterates thru the list
splits on the w
trims away the trailing d
assigns the resulting strings to the two [int] variables on the left of the =
this forces the two number strings to become number objects.
calcs the total days
displays the week/day code, week count, day count, and total days
the code ...
#region >>> fake reading in a list of week/day codes
# in real life use Get-Content
$WD_List = @'
6w0d
3w3d
0w1d
66w6d
9w1d
'@ -split [System.Environment]::NewLine
#endregion >>> fake reading in a list of week/day codes
foreach ($WL_Item in $WD_List)
{
[int]$WeekCount, [int]$DayCount = $WL_Item.Split('w').TrimEnd('d')
$TotalDays = ($WeekCount * 7) + $DayCount
$WL_Item
$WeekCount
$DayCount
$TotalDays
'=' * 20
}
the output ...
6w0d
6
0
42
====================
3w3d
3
3
24
====================
0w1d
0
1
1
====================
66w6d
66
6
468
====================
9w1d
9
1
64
====================

Matlab: how to convert character array or string into a formatted output OR parse a string

Could someone please tell me how to convert character array into a formatted output using Matlab?
I am expecting data like this:
CHAR (1 x 29) : 0.050822999 3.141592979 ; (1)
OR
CELL (1 x 1) or string: '0.050822999 3.141592979 ; (1)'
I am looking for output like this:
d1 = 0.050822999; %double
d2 = 3.141592979; %double
index = 1; % integer
I tried transposing and then using str2num(Str'); but it returns a 0x0 double.
Any help would be appreciated.
Regards,
DK
you can use regexp to parse the string
c = { '0.050822999 3.141592979 ; (1)' };
p = regexp( c{1}, '^(\d+\.\d+)\s(\d+\.\d+)\s*;\s*\((\d+)\)$', 'tokens', 'once' ); %//parse the input string
numbers = str2double(p); %// convert extracted strings to numerical values
Example result
ans =
0.050822999
3.141592979
1
Explaining the regexp pattern:
^ - pattern starts at the beginning of the input string
(\d+\.\d+) - parentheses ('()') enclosing this sub-pattern indicates it as a single token
\d+ matches one or more digits, then expecting \. a dot (notice the \, since . alone in regexp acts as a wildcard) and after the dot \d+ one or more digits are expected.
This token should correspond to the first number, e.g., 0.050822999
\s expecting a single space
(\d+\.\d+) - again, expecting another decimal fraction as the second token.
\s* - expecting white space (zero or more).
; - match the literal ; (it is part of the pattern, but not a token).
\s* - expecting white space (zero or more).
\( - expecting an open parenthesis, note the \ since parentheses in regexp are used to denote tokens.
(\d+) - expecting one or more digits as the third token, only integer numbers are expected here. no decimal point.
\) - expecting a closing parenthesis.
$ - pattern should reach the end of the input string.
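The pattern itself is not MATLAB-specific. As a rough cross-check, the sketch below applies the same expression with Python's re module; parse_line is a hypothetical helper name, and raising ValueError on a mismatch is an assumption of this sketch.

```python
import re

# Identical pattern to the one explained above: two decimals, ';', then '(integer)'.
PATTERN = re.compile(r"^(\d+\.\d+)\s(\d+\.\d+)\s*;\s*\((\d+)\)$")

def parse_line(line: str):
    """Return (d1, d2, index) parsed from a line like '0.05 3.14 ; (1)'."""
    match = PATTERN.match(line)
    if match is None:
        raise ValueError(f"line does not match the expected layout: {line!r}")
    d1, d2, index = match.groups()  # the three tokens, still strings
    return float(d1), float(d2), int(index)

d1, d2, index = parse_line("0.050822999 3.141592979 ; (1)")
print(d1, d2, index)
```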
You can use something like this (if I understood you correctly)
function str_dump(var)
info = whos;
disp([info.class ' ' mat2str(info.size) ' : ' var]);
end
This just shows information about the string. If you want to parse it and convert to another Matlab's structure, you have to explain it more carefully.
%// Input
a = [0.050822999 3.141592979];
n = 1;
%// Output
str = [num2str(a,'%0.9f ') ' ; (' num2str(n) ')']
Result:
str =
0.050822999 3.141592979 ; (1)

Insert ASCII in LSB

Does anyone know an efficient method to insert the ASCII value of a character into the 8 least significant bits (LSB) of a 16-bit number?
The only idea that comes to mind is to convert both numbers to binary strings, then replace the last 8 characters of the 16-bit number with the 8-bit ASCII value. But as far as I know, string operations are very expensive in computation time.
Thanks
I don't know Matlab syntax, but in C, it would be something like this:
short x; // a 16-bit integer in many implementations
/* ... do whatever you need to do to x ... */
char a = 'a'; // some character
x = (x & 0xFF00) | (short)(a & 0x00FF);
The & operator is the bitwise "and" operator. The | operator is the bitwise "or" operator. Numbers beginning with 0x are in hexadecimal for easy readability.
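The same mask-and-combine step can be tried out quickly in Python. This is only a sketch of the idea, with set_low_byte as a made-up helper name; it assumes x already fits in 16 bits.

```python
def set_low_byte(x: int, ch: str) -> int:
    """Return the 16-bit value x with its low 8 bits replaced by ord(ch)."""
    # Keep the high byte of x, then OR in the character code limited to 8 bits.
    return (x & 0xFF00) | (ord(ch) & 0x00FF)

y = set_low_byte(30000, "a")
print(hex(30000), hex(y))  # compare the low byte before and after
```

No string conversion is involved: the mask clears the low byte and the OR fills it in.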
Here is a MATLAB implementation of @user1118321's idea:
%# 16-bit integer number
x = uint16(30000);
%# character
c = 'a';
%# replace lower 8-bit
y = bitand(x,hex2dec('FF00'),class(x)) + cast(c-0,class(x))

Convert numbers 1-26 to A-Z?

How can I convert the numbers in the range 1 through 26 to their respective letter position in the alphabet?
1 = A
2 = B
...
26 = Z
CHR(#) will give you the ASCII character, you just need to offset it based on the ASCII table:
e.g. A = 65, so you will need to add 64 to 1:
CHR(64 + #) = A if # is 1
An ASCII code is the numerical representation of a character such as 'a' or 'Z'. Looking at the ASCII table, capital A has a value of 65 and Z has a value of 90. Therefore adding 64 to each value in the range 1-26 gives the corresponding letter.
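As an illustration of the offset, here is the same idea in Python, where chr() is the built-in counterpart of CHR. The function name number_to_letter and the range check are additions for this sketch.

```python
def number_to_letter(n: int) -> str:
    """Map 1..26 to 'A'..'Z' using the ASCII offset of 64."""
    if not 1 <= n <= 26:
        raise ValueError("n must be between 1 and 26")
    return chr(64 + n)

print("".join(number_to_letter(n) for n in range(1, 27)))
```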

Code Golf - Word Scrambler

Please answer with the shortest possible source code for a program that converts an arbitrary plaintext to its corresponding ciphertext, following the sample input and output I have given below. Bonus points* for the least CPU time or the least amount of memory used.
Example 1:
Plaintext: The quick brown fox jumps over the lazy dog. Supercalifragilisticexpialidocious!
Ciphertext: eTh kiquc nobrw xfo smjup rvoe eth yalz .odg !uioiapeislgriarpSueclfaiitcxildcos
Example 2:
Plaintext: 123 1234 12345 123456 1234567 12345678 123456789
Ciphertext: 312 4213 53124 642135 7531246 86421357 975312468
Rules:
1. Punctuation is defined to be included with the word it is closest to.
2. The center of a word is defined to be ceiling((strlen(word)+1)/2).
3. Whitespace is ignored (or collapsed).
4. Odd words move to the right first. Even words move to the left first.
You can think of it as reading every other character backwards (starting from the end of the word), followed by the remaining characters forwards. Corporation => XoXpXrXtXoX => niaorCoprto.
Thank you to those who pointed out the inconsistency in my description. This has led many of you down the wrong path, for which I apologize. Rule #4 should clear things up.
*Bonus points will only be awarded if Jeff Atwood decides to do so. Since I haven't checked with him, the chances are slim. Sorry.
Python, 50 characters
For input in i:
' '.join(x[::-2]+x[len(x)%2::2]for x in i.split())
Alternate version that handles its own IO:
print ' '.join(x[::-2]+x[len(x)%2::2]for x in raw_input().split())
A total of 66 characters if including whitespace. (Technically, the print could be omitted if running from a command line, since the evaluated value of the code is displayed as output by default.)
Alternate version using reduce:
' '.join(reduce(lambda x,y:y+x[::-1],x) for x in i.split())
59 characters.
Original version (both even and odd go right first) for an input in i:
' '.join(x[::2][::-1]+x[1::2]for x in i.split())
48 characters including whitespace.
Another alternate version which (while slightly longer) is slightly more efficient:
' '.join(x[len(x)%2-2::-2]+x[1::2]for x in i.split())
(53 characters)
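For anyone who wants to see what the slices are doing, here is an ungolfed Python 3 sketch of the first version above. The function names are made up for readability and are obviously not a competitive entry.

```python
def scramble_word(word: str) -> str:
    """Every other character read backwards from the end, then the rest forwards."""
    backwards = word[::-2]             # last char, third from last, ...
    forwards = word[len(word) % 2::2]  # the characters skipped above, in order
    return backwards + forwards

def scramble(text: str) -> str:
    # split() with no arguments collapses runs of whitespace, as the rules allow.
    return " ".join(scramble_word(w) for w in text.split())

print(scramble("The quick brown fox"))
print(scramble("123 1234 12345"))
print(scramble("Corporation"))
```

The start offset len(word) % 2 is what makes odd and even length words differ: for an odd length the backwards slice already used index 0, so the forwards slice starts at index 1.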
J, 58 characters
>,&.>/({~(,~(>:#+:#i.#-#<.,+:#i.#>.)#-:)#<:##)&.><;.2,&' '
Haskell, 64 characters
unwords.map(map snd.sort.zip(zipWith(*)[0..]$cycle[-1,1])).words
Well, okay, 76 if you add in the requisite "import List".
Python - 69 chars
(including whitespace and linebreaks)
This handles all I/O.
for w in raw_input().split():
 o=""
 for c in w:o=c+o[::-1]
 print o,
Perl, 78 characters
For input in $_. If that's not acceptable, add six characters for either $_=<>; or $_=$s; at the beginning. The newline is for readability only.
for(split){$i=length;print substr$_,$i--,1,''while$i-->0;
print"$_ ";}print $/
C, 140 characters
Nicely formatted:
main(c, v)
    char **v;
{
    for( ; *++v; )
    {
        char *e = *v + strlen(*v), *x;
        for(x = e-1; x >= *v; x -= 2)
            putchar(*x);
        for(x = *v + (x < *v-1); x < e; x += 2)
            putchar(*x);
        putchar(' ');
    }
}
Compressed:
main(c,v)char**v;{for(;*++v;){char*e=*v+strlen(*v),*x;for(x=e-1;x>=*v;x-=2)putchar(*x);for(x=*v+(x<*v-1);x<e;x+=2)putchar(*x);putchar(32);}}
Lua
130 char function, 147 char functioning program
Lua doesn't get enough love in code golf -- maybe because it's hard to write a short program when you have long keywords like function/end, if/then/end, etc.
First I write the function in a verbose manner with explanations, then I rewrite it as a compressed, standalone function, then I call that function on the single argument specified at the command line.
I had to format the code with <pre></pre> tags because Markdown does a horrible job of formatting Lua.
Technically you could get a smaller running program by inlining the function, but it's more modular this way :)
t = "The quick brown fox jumps over the lazy dog. Supercalifragilisticexpialidocious!"
T = t:gsub("%S+", -- for each word in t...
function(w) -- argument: current word in t
W = "" -- initialize new Word
for i = 1,#w do -- iterate over each character in word
c = w:sub(i,i) -- extract current character
-- determine whether letter goes on right or left end
W = (#w % 2 ~= i % 2) and W .. c or c .. W
end
return W -- swap word in t with inverted Word
end)
-- code-golf unit test
assert(T == "eTh kiquc nobrw xfo smjup rvoe eth yalz .odg !uioiapeislgriarpSueclfaiitcxildcos")
-- need to assign to a variable and return it,
-- because gsub returns a pair and we only want the first element
f=function(s)c=s:gsub("%S+",function(w)W=""for i=1,#w do c=w:sub(i,i)W=(#w%2~=i%2)and W ..c or c ..W end return W end)return c end
-- 1 2 3 4 5 6 7 8 9 10 11 12 13
--34567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
-- 130 chars, compressed and written as a proper function
print(f(arg[1]))
--34567890123456
-- 16 (+1 whitespace needed) chars to make it a functioning Lua program,
-- operating on command line argument
Output:
$ lua insideout.lua 'The quick brown fox jumps over the lazy dog. Supercalifragilisticexpialidocious!'
eTh kiquc nobrw xfo smjup rvoe eth yalz .odg !uioiapeislgriarpSueclfaiitcxildcos
I'm still pretty new at Lua so I'd like to see a shorter solution if there is one.
For a minimal cipher that writes all args to stdout, we can do 111 chars:
for _,w in ipairs(arg)do W=""for i=1,#w do c=w:sub(i,i)W=(#w%2~=i%2)and W ..c or c ..W end io.write(W ..' ')end
But this approach does output a trailing space like some of the other solutions.
For an input in s:
f=lambda t,r="":t and f(t[1:],len(t)&1and t[0]+r or r+t[0])or r
" ".join(map(f,s.split()))
Python, 90 characters including whitespace.
TCL
125 characters
set s set f foreach l {}
$f w [gets stdin] {$s r {}
$f c [split $w {}] {$s r $c[string reverse $r]}
$s l "$l $r"}
puts $l
Bash - 133, assuming input is in $w variable
Pretty
for x in $w; do
    z="";
    for l in `echo $x|sed 's/\(.\)/ \1/g'`; do
        if ((${#z}%2)); then
            z=$z$l;
        else
            z=$l$z;
        fi;
    done;
    echo -n "$z ";
done;
echo
Compressed
for x in $w;do z="";for l in `echo $x|sed 's/\(.\)/ \1/g'`;do if ((${#z}%2));then z=$z$l;else z=$l$z;fi;done;echo -n "$z ";done;echo
Ok, so it outputs a trailing space.