J unicode index accessor

In J, I can do the following:
r=:'0123456'
m=:3 } r
echo m
and it prints 3, as it should.
However, Unicode seems not to work:
r=:'▁▂▃▄▅▆▇'
m=: 3 } r
echo m
prints nothing. My guess is that this is because } indexes by byte. What is the proper way to index by character position?

You are correct that the list you gave is indexed by byte. That is because its datatype is literal. If you want it to be treated as Unicode, the list needs to be converted to the unicode datatype:
datatype '①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳' NB. check datatype of list
literal
# '①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳' NB. count items in list
60
ucp '①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳' NB. convert to unicode point chars
①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳
datatype ucp '①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳' NB. check datatype
unicode
# ucp '①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳' NB. count items in unicode list
20
3} ucp '①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳' NB. index into the list
④
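For comparison, the same byte-versus-character distinction exists in other languages. Here is a minimal Rust sketch (the block characters are just an example string) showing that byte length, character count, and character-position indexing are three different things for multi-byte UTF-8 text:
fn main() {
    let r = "▁▂▃▄▅▆▇";                  // 7 characters, 21 bytes in UTF-8
    assert_eq!(r.len(), 21);            // len() counts bytes
    assert_eq!(r.chars().count(), 7);   // chars() iterates characters
    let m = r.chars().nth(3).unwrap();  // index by character position
    println!("{m}");                    // ▄
}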

Convert byte array (hex) to signed Int

I am trying to convert a (variable length) Hex String to Signed Integer (I need either positive or negative values).
[Int16], [int32], and [int64] seem to work fine with 2-byte, 4-byte, and longer hex strings, but I'm stuck with 3-byte strings; there is no [int24] type in PowerShell.
Here's what I have now (snippet):
$start = $mftdatarnbh.Substring($DataRunStringsOffset+$LengthBytes*2+2,$StartBytes*2) -split "(..)"
[array]::reverse($start)
$start = -join $start
if($StartBytes*8 -le 16){$startd =[int16]"0x$($start)"}
elseif($StartBytes*8 -in (17..48)){$startd =[int32]"0x$($start)"}
else{$startd =[int64]"0x$($start)"}
With the above code, a $start value of "D35A71" gives '13851249' instead of '-2925967'. I tried to figure out a way to implement two's complement but got lost. Any easy way to do this right?
Thank you in advance
Edit: Basically, I think I need to implement something like this:
int num = (sbyte)array[0] << 16 | array[1] << 8 | array[2];
as seen here.
Just tried this:
$start = "D35A71"
[sbyte]"0x$($start.Substring(0,2))" -shl 16 -bor "0x$($start.Substring(2,2))" -shl 8 -bor "0x$($start.Substring(4,2))"
but it doesn't seem to give the correct result :-/
To parse your hex-number string as a negative number, you can use [bigint] (System.Numerics.BigInteger):
# Since the most significant hex digit has a 1 as its most significant bit
# (is >= 0x8), it is parsed as a NEGATIVE number.
# To force unconditional interpretation as a positive number, prepend '0'
# to the input hex string.
PS> [bigint]::Parse('D35A71', 'AllowHexSpecifier')
-2925967
You can cast the resulting [bigint] instance back to an [int] (System.Int32).
Note:
The result is a negative number, because the most significant hex digit of the hex input string is >= 0x8, i.e. has its high bit set.
To force [bigint] to unconditionally interpret a hex input string as a positive number, prepend a 0.
The internal two's-complement representation of a negative result is built at byte boundaries, so a hex number with an odd number of digits (i.e. one whose first hex digit is only a "half byte") has the missing half byte filled with 1 bits.
Therefore, a hex-number string whose most significant digit is >= 0x8 (and therefore parses as a negative number) yields the same value as the same string with one or more Fs (0xF == 1111) prepended; e.g., the following calls all result in -2048:
[bigint]::Parse('800', 'AllowHexSpecifier'),
[bigint]::Parse('F800', 'AllowHexSpecifier'),
[bigint]::Parse('FF800', 'AllowHexSpecifier'), ...
See the docs for details about the parsing logic.
Examples:
# First digit (7) is < 8 (high bit NOT set) -> positive number
[bigint]::Parse('7FF', 'AllowHexSpecifier') # -> 2047
# First digit (8) is >= 8 (high bit IS SET) -> negative number
[bigint]::Parse('800', 'AllowHexSpecifier') # -> -2048
# Prepending additional 'F's to a number that parses as
# a negative number yields the *same* result
[bigint]::Parse('F800', 'AllowHexSpecifier') # -> -2048
[bigint]::Parse('FF800', 'AllowHexSpecifier') # -> -2048
# ...
# Starting the hex-number string with '0'
# *unconditionally* makes the result a *positive* number
[bigint]::Parse('0800', 'AllowHexSpecifier') # -> 2048
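The underlying arithmetic is ordinary two's complement over however many bits the hex string encodes. Purely as an illustration of that idea (not the [bigint] approach above), here is a minimal Rust sketch that produces the same signed value by hand:
fn main() {
    let hex = "D35A71";                              // 3 bytes; high bit of 0xD3 is set
    let bits = 4 * hex.len() as u32;                 // 24 bits
    let raw = i64::from_str_radix(hex, 16).unwrap(); // 13851249, the unsigned reading
    // Two's complement: if the high bit is set, subtract 2^bits.
    let signed = if raw >= 1 << (bits - 1) { raw - (1 << bits) } else { raw };
    println!("{signed}");                            // -2925967
}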

What is the difference between char::is_digit and char::is_numeric?

What is the difference between char::is_digit and char::is_numeric?
I notice that a general numeric character gives an invalid digit error when converting to a number; is it possible to get the numeric value of a numeric character? Is that a valid thing to do?
char::is_numeric checks whether a character is numeric according to Unicode (specifically, whether it falls under the Unicode general categories Nd, Nl, or No), while char::is_digit recognizes ordinary digits as well as digits in radices other than 10 (up to 36), e.g. hexadecimal a-f (radix 16).
Example difference:
assert!(char::is_numeric('a')); // fails: 'a' is not numeric
assert!(char::is_digit('a', 10)); // fails: 'a' is not a base-10 digit
assert!(char::is_digit('a', 16)); // passes: 'a' is a hexadecimal digit
It's ok to obtain numeric values of characters - you just need to provide the right radix:
println!("{}", 'a'.to_digit(16).unwrap()); // 10
println!("{}", 'z'.to_digit(36).unwrap()); // 35
According to the Rust docs, 'digit' is defined to cover only the characters 0-9, a-z, and A-Z.
The is_numeric function just checks whether the character is in fact a number according to Unicode; there are some useful examples in the docs.
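Putting the two together, here is a small self-contained sketch (the circled-digit character is just an illustrative choice) showing that a character can be numeric without having a digit value, and that to_digit recovers values for digits up to radix 36:
fn main() {
    // '①' is numeric per Unicode (category No), but it is not a digit
    // in any radix, so to_digit returns None.
    assert!('①'.is_numeric());
    assert_eq!('①'.to_digit(10), None);
    // ASCII digits and letters up to radix 36 do carry digit values.
    assert_eq!('7'.to_digit(10), Some(7));
    assert_eq!('a'.to_digit(16), Some(10));
    assert_eq!('z'.to_digit(36), Some(35));
    assert!(!'a'.is_numeric()); // letters are not numeric
}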

Matlab: Function that returns a string with the first n characters of the alphabet

I'd like to have a function generate(n) that produces the first n lowercase letters of the alphabet concatenated into a string (therefore: 1<=n<=26).
For example:
generate(3) --> 'abc'
generate(5) --> 'abcde'
generate(9) --> 'abcdefghi'
I'm new to Matlab and I'd be happy if someone could show me an approach to writing the function. I'm sure this will involve arithmetic with the characters' ASCII codes, but I have no idea how to do that or which types Matlab provides for it.
I would rely on ASCII codes for this. You can convert an integer to a character using char.
So for example if we want an "e", we could look up the ASCII code for "e" (101) and write:
char(101)
'e'
This also works for arrays:
char([101, 102])
'ef'
The nice thing in your case is that in ASCII, the lowercase letters are all the numbers between 97 ("a") and 122 ("z"). Thus the following code works by taking ASCII "a" (97) and creating an array of length n starting at 97. These numbers are then converted using char to strings. As an added bonus, the version below ensures that the array can only go to 122 (ASCII for "z").
function output = generate(n)
output = char(97:min(96 + n, 122));
end
Note: For the upper limit we use 96 + n because if n were 1, then we want 97:97 rather than 97:98 as the second would return "ab". This could be written as 97:(97 + n - 1) but the way I've written it, I've simply pulled the "-1" into the constant.
You could also make this a simple anonymous function.
generate = @(n)char(97:min(96 + n, 122));
generate(3)
'abc'
To write the most portable and robust code, I would probably not want those hard-coded ASCII codes, so I would use something like the following:
output = 'a':char(min('a' + n - 1, 'z'));
...or, you can just generate the entire alphabet and take the part you want:
function str = generate(n)
alphabet = 'a':'z';
str = alphabet(1:n);
end
Note that this will fail with an index out of bounds error for n > 26, so you might want to check for that.
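As a side note, the clamped-range idea above carries over directly to other languages; here is a minimal Rust sketch of the same arithmetic on character codes, assuming 1<=n<=26 as in the question:
fn generate(n: u8) -> String {
    let end = (b'a' + n - 1).min(b'z');  // clamp so the range never passes 'z'
    (b'a'..=end).map(char::from).collect()
}
fn main() {
    println!("{}", generate(3)); // abc
    println!("{}", generate(9)); // abcdefghi
}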
You can use the char built-in function, which converts an integer value (or array) into a character array.
EDIT
Bug fixed (ref. Suever's comment)
function [str]=generate(n)
a=97;
% str=char(a:a+n)
str=char(a:a+n-1)
Hope this helps.
Qapla'

How is the full substring different from using .text()?

I'm failing to see how taking the full substring is different from just using .text()?
This is a snippet of a larger code set that I'm trying to understand but failing:
$(this).text().substring(0, ($(this).text().length - 1))
Substring takes a portion of the full text/string, but in this case it is taking the whole string, correct?
No; here substring returns all but the last character: for a string of length n, it extracts the characters at indices 0 through n-2 (the end index, n-1, is excluded).
x = "hello";
>>> "hello"
x.substring(0, x.length - 1)
>>> "hell"
From the MDN documentation linked:
substring extracts characters from indexA up to but not including indexB. In particular:
If indexA equals indexB, substring returns an empty string.
If indexB is omitted, substring extracts characters to the end of the string.
If either argument is less than 0 or is NaN, it is treated as if it were 0.
If either argument is greater than stringName.length, it is treated as if it were stringName.length.
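As an aside, "everything but the last character" is a common enough operation that most languages have an idiom for it; purely for comparison, a minimal Rust sketch of the same result:
fn main() {
    let s = "hello";
    // Drop the final character (character-aware, so it also works for
    // multi-byte UTF-8 text).
    let mut chars = s.chars();
    chars.next_back();
    let trimmed = chars.as_str();
    println!("{trimmed}"); // hell
}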

Trying to understand this code (creating a [char]range)

I have code that works, but I have no idea WHY it works.
This will generate a list containing each letter of the English alphabet:
[char[]]([char]'a'..[char]'z')
However, this will not:
[char]([char]'a'..[char]'z')
and this will actually generate a list of numbers from 97 to 122:
([char]'a'..[char]'z')
Could any experts out there explain to me how this works (or doesn't)?
In your second example, you are trying to cast an array of characters to a single character [char]. That won't work. In the third example, the 'a' is considered a string by PowerShell. So casting it to [char] tells PowerShell it is a single char. The .. operator ranges over numbers. Fortunately, PowerShell can convert the character 'a' to its ASCII value 97 and 'z' to 122. So you effectively wind up with 97..122. Then in your first example, the [char[]] converts that array of ints back to an array of characters: a through z.
In PowerShell, 'a' is a [string] type. [char]'a' is, obviously, a [char] type. These are very different things.
$string = 'a'
$char = [char]$string
$string can be cast as a [char] because it is a string consisting of a single character. If there is more than one character in the string, e.g. 'ab', then you need an array of [char]s, which is the type [char[]]. The extra set of square brackets designates an array.
$string | get-member
$char | get-member
reveals quite different methods for the two types. The [char] type has .toint() methods. If you cast it to [int], you get the numeric ASCII code for that character.
[int]$char
returns 97, the ASCII code for the letter 'a'.
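The character/code-point relationship is not PowerShell-specific; as a point of comparison, here is a minimal Rust sketch of the same round trip, and of ranging over characters directly:
fn main() {
    // Character to code point and back, the round trip the casts perform:
    assert_eq!('a' as u32, 97);
    assert_eq!(char::from_u32(97), Some('a'));
    // Rust can range over chars directly, so no numeric detour is needed:
    let alphabet: String = ('a'..='z').collect();
    assert_eq!(alphabet, "abcdefghijklmnopqrstuvwxyz");
}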