Are there standardized translations of Unicode character names?


Every codepoint in the Unicode standard has a unique English name attached to it. I need translations for these names (for a small subset of codepoints) to languages like German, French, Japanese, ... I do have access to professional translators, so it is of course possible to have those names translated one by one, but the result is not necessarily a good representation of the intention of the Unicode standard. I wonder if the Unicode committee has already made an effort to standardize the codepoint names for languages other than English, so that I could simply refer to their translations? I could not find anything but English on unicode.org, but I still hope I missed something. Thanks in advance!

.NET / PowerShell example: [Microsofts.CharMap.UName]::Get('č')
On Windows, localized Unicode properties (the character name, at least) are stored in the localized library getuname.dll. Use the following script directly, or take inspiration from it:
<#
Origin by: http://poshcode.org/5234
Improved by: https://stackoverflow.com/users/3439404/josefz
Use this like this: "ábč",([char]'x'),0xBF | Get-CharInfo
Activate dot-sourced like this (apply a real path instead of .\):
. .\_get-CharInfo_1.1.ps1
#>
Set-StrictMode -Version latest
Add-Type -Name UName -Namespace Microsofts.CharMap -MemberDefinition $(
switch ("$([System.Environment]::SystemDirectory -replace
'\\', '\\')\\getuname.dll") {
{Test-Path -LiteralPath $_ -PathType Leaf} {@"
[DllImport("${_}", ExactSpelling=true, SetLastError=true)]
private static extern int GetUName(ushort wCharCode,
[MarshalAs(UnmanagedType.LPWStr)] System.Text.StringBuilder buf);
public static string Get(char ch) {
var sb = new System.Text.StringBuilder(300);
UName.GetUName(ch, sb);
return sb.ToString();
}
"@
}
default {'public static string Get(char ch) { return "???"; }'}
})
function Get-CharInfo {
[CmdletBinding()]
[OutputType([System.Management.Automation.PSCustomObject],[System.Array])]
param(
[Parameter(Position=0, Mandatory=$true, ValueFromPipeline=$true)]
$InputObject
)
begin {
function out {
param(
[Parameter(Position=0, Mandatory=$true )] $ch,
[Parameter(Position=1, Mandatory=$false)]$nil=''
)
if (0 -le $ch -and 0xFFFF -ge $ch) {
[pscustomobject]@{
Char = [char]$ch
CodePoint = 'U+{0:X4}' -f $ch
Category = [System.Globalization.CharUnicodeInfo]::GetUnicodeCategory($ch)
Description = [Microsofts.CharMap.UName]::Get($ch)
}
} elseif (0 -le $ch -and 0x10FFFF -ge $ch) {
$s = [char]::ConvertFromUtf32($ch)
[pscustomobject]@{
Char = $s
CodePoint = 'U+{0:X}' -f $ch
Category = [System.Globalization.CharUnicodeInfo]::GetUnicodeCategory($s, 0)
Description = '???' + $nil
}
} else {
Write-Warning ('Character U+{0:X} is out of range' -f $ch)
}
}
}
process {
if ($PSBoundParameters['Verbose']) {
Write-Warning "InputObject type = $($InputObject.GetType().Name)"}
if ($null -cne ($InputObject -as [char])) {
#Write-Verbose "A $([char]$InputObject) InputObject character"
out $([int][char]$InputObject) ''
} elseif ($InputObject -isnot [string] -and $null -cne ($InputObject -as [int])) {
#Write-Verbose "B $InputObject InputObject"
out $([int]$InputObject) ''
} else {
$InputObject = [string]$InputObject
#Write-Verbose "C $InputObject InputObject.Length $($InputObject.Length)"
for ($i = 0; $i -lt $InputObject.Length; ++$i) {
if ( [char]::IsHighSurrogate($InputObject[$i]) -and
(1+$i) -lt $InputObject.Length -and
[char]::IsLowSurrogate($InputObject[$i+1])) {
$aux = ' 0x{0:x4},0x{1:x4}' -f [int]$InputObject[$i],
[int]$InputObject[$i+1]
Write-Verbose "surrogate pair $aux at position $i"
out $([char]::ConvertToUtf32($InputObject[$i], $InputObject[1+$i])) $aux
$i++
} else {
out $([int][char]$InputObject[$i]) ''
}
}
}
}
}
Example:
PS D:\PShell> "ábč",([char]'x'),0xBF | Get-CharInfo
Char CodePoint Category Description
---- --------- -------- -----------
á U+00E1 LowercaseLetter Latin Small Letter A With Acute
b U+0062 LowercaseLetter Latin Small Letter B
č U+010D LowercaseLetter Latin Small Letter C With Caron
x U+0078 LowercaseLetter Latin Small Letter X
¿ U+00BF OtherPunctuation Inverted Question Mark
PS D:\PShell> Get-Content .\DataFiles\getcharinfoczech.txt
Char CodePoint Category Description
---- --------- -------- -----------
á U+00E1 LowercaseLetter Malé písmeno latinky a s čárkou nad vpravo
b U+0062 LowercaseLetter Malé písmeno latinky b
č U+010D LowercaseLetter Malé písmeno latinky c s háčkem
x U+0078 LowercaseLetter Malé písmeno latinky x
¿ U+00BF OtherPunctuation Znak obráceného otazníku
PS D:\PShell>
Note that the latter (semi-localized) output comes from the following code (run on the same computer under localized user):
"ábč",([char]'x'),0xBF | Get-CharInfo | Out-File .\DataFiles\getcharinfoczech.txt

According to this, there were official French translations for a time, but not anymore.

Related

How can I parse distinguished names from Active Directory using Powershell to determine parent OUs?

I'm using Microsoft's ActiveDirectory module to retrieve and manipulate our domain users, and I need to easily determine the parent OU of the objects I'm retrieving. I've tried using -split ',' or .Split(','), but I keep running into issues with certain objects that have commas in them.

There is no publicly exposed DN parser method or class built in to the .Net libraries. One must exist internally, given how some of the DirectoryServices classes seem to work, but I don't know how to call it from Powershell and it's not documented.
There is the fairly popular DNParser library on NuGet, which is a .Net library for parsing and manipulating distinguished names.
First, download the package file from NuGet. The package will be called "dnparser.1.3.3.nupkg" for example, but it's just a ZIP file. Extract the contents to a folder. The package is a single library, so all we need is .\dnparser.1.3.3\lib\net5.0\CPI.DirectoryServices.dll for Powershell v5 or .\dnparser.1.3.3\lib\netstandard1.1\CPI.DirectoryServices.dll for Powershell v6+. You only need that library. Nothing else in the package is strictly necessary.
# Load the library
Add-Type -Path 'C:\Path\To\dnparser.1.3.3\lib\netstandard1.1\CPI.DirectoryServices.dll'
Get-ADUser -Filter 'Enabled -eq "True"' |
Select-Object -First 10 |
ForEach-Object {
$DN = [CPI.DirectoryServices.DN]::new($_.DistinguishedName)
[PSCustomObject]@{
DistinguishedName = $DN.ToString()
ParentOU = $DN.Parent.ToString()
}
} |
Format-List *
You can also create the object with New-Object if you prefer that.
$DN = New-Object -TypeName CPI.DirectoryServices.DN -ArgumentList $_.DistinguishedName
There are other methods and properties in the class, but this is enough for what I need.
Warning: I have learned that DNParser, designed around RFC 2253, decodes hex-escaped characters as UTF-8, while I think at least some instances of Active Directory use ISO-8859-1 (Latin-1). In short, Active Directory may hold hex-escaped characters such as ü escaped as \FC. DNParser may turn these into the Unicode replacement character � (\EF\BF\BD), because a lone \FC byte is not valid UTF-8. The UTF-8 equivalent would be \C3\BC, but that reads as Ã¼ in ISO-8859-1. There does not appear to be a way to disable this behavior.
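To make the encoding mismatch concrete, here is a minimal sketch that decodes \XX escapes with an encoding of your choice. Convert-HexEscapedValue is a hypothetical helper written for this illustration; it is not part of DNParser or the ActiveDirectory module, and it assumes the escapes are plain two-digit hex bytes.

```powershell
# Hypothetical helper (not part of DNParser): decode \XX escapes using a chosen encoding.
function Convert-HexEscapedValue {
    param(
        [Parameter(Mandatory)] [string] $Value,
        [System.Text.Encoding] $Encoding = [System.Text.Encoding]::UTF8
    )
    # Match each run of consecutive \XX escapes so multi-byte sequences are decoded together
    [regex]::Replace($Value, '(?:\\[0-9A-Fa-f]{2})+', {
        param($m)
        $bytes = [byte[]]($m.Value -split '\\' | Where-Object { $_ } |
            ForEach-Object { [Convert]::ToByte($_, 16) })
        $Encoding.GetString($bytes)
    })
}

$latin1 = [System.Text.Encoding]::GetEncoding('iso-8859-1')
$asLatin1 = Convert-HexEscapedValue 'R\FChmann' -Encoding $latin1   # \FC read as one Latin-1 byte
$asUtf8   = Convert-HexEscapedValue 'R\FChmann'                     # \FC is not valid UTF-8 on its own
$utf8Form = Convert-HexEscapedValue 'R\C3\BChmann'                  # two-byte UTF-8 form of the same letter
```

Decoding with ISO-8859-1 recovers the intended letter, while the UTF-8 decode of the same escape yields the replacement character U+FFFD, which matches the behavior described in the warning.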

You could use this small helper function to parse out the RelativeDistinguishedName components in order:
function Parse-DistinghuishedName {
# See https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ldap/distinguished-names
[CmdletBinding()]
param (
[Parameter(Mandatory = $true, ValueFromPipeline = $true, ValueFromPipelineByPropertyName = $true)]
[string[]]$DistinguishedName
)
begin {
function _UnescapeSpecial([string]$value) {
# replace all special characters formatted as BackSlash-TwoDigitHexCode
$match = ([regex]'(?i)\\([0-9a-f]{2})').Match($value)
while ($match.Success) {
$value = $value -replace "\\$($match.Groups[1].Value)", [char][convert]::ToUInt16($match.Groups[1].Value, 16)
$match = $match.NextMatch()
}
# finally, replace all backslash escaped characters
$value -replace '\\(.)', '$1'
}
}
process {
foreach ($dn in $DistinguishedName) {
$hash = [ordered]@{}
# split the string into separate RDN (RelativeDistinguishedName) components
$dn -split ',\s*(?<!\\,\s*)' | ForEach-Object {
$name, $value = ($_ -split '=', 2).Trim()
if (![string]::IsNullOrWhiteSpace($value)) {
$value = _UnescapeSpecial $value
switch ($name) {
'O' { $hash['Organization'] = $value }
'L' { $hash['City'] = $value }
'S' { $hash['State'] = $value }
'C' { $hash['Country'] = $value }
'ST' { $hash['StateOrProvince'] = $value }
'UID' { $hash['UserId'] = $value }
'STREET' { $hash['Street'] = $value }
# these RDN's can occur multiple times, so add as arrays
'CN' { $hash['Name'] += @($value) }
'OU' { $hash['OrganizationalUnit'] += @($value) }
'DC' { $hash['DomainComponent'] += @($value) }
}
}
}
$hash
}
}
}
Usage:
$dnHash = Parse-DistinghuishedName 'CN=R\fchmann\, Heinz ,OU=Test,OU=SubOU,DC=North America,DC=Fabrikam,DC=COM'
would result in an ordered Hashtable:
Name Value
---- -----
Name {Rühmann, Heinz}
OrganizationalUnit {Test, SubOU}
DomainComponent {North America, Fabrikam, COM}
To get the parent OU name, you just index into the .OrganizationalUnit element:
$dnHash.OrganizationalUnit[0] # --> 'Test' (direct parent OU, closest to the object)
$dnHash.OrganizationalUnit[-1] # --> 'SubOU' (topmost OU, closest to the domain)
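If only the parent container's full DN is needed rather than the individual components, a smaller sketch may be enough: split once at the first comma that is not escaped with a backslash. Get-ParentDistinguishedName is a hypothetical name used for illustration, and the approach assumes a comma is only ever escaped as `\,` (a value ending in an escaped backslash, `\\`, directly before the separator would be split incorrectly).

```powershell
# Hypothetical helper: return everything after the first unescaped comma of a DN.
function Get-ParentDistinguishedName {
    param([Parameter(Mandatory)] [string] $DistinguishedName)
    # (?<!\\), matches a comma that is NOT preceded by a backslash;
    # the limit of 2 splits the string into the first RDN and the remainder only
    $parts = $DistinguishedName -split '(?<!\\),', 2
    if ($parts.Count -eq 2) { $parts[1].Trim() }
}

$parent = Get-ParentDistinguishedName 'CN=R\fchmann\, Heinz,OU=Test,OU=SubOU,DC=Fabrikam,DC=COM'
$parent   # OU=Test,OU=SubOU,DC=Fabrikam,DC=COM
```

A DN with a single component has no parent, so the function returns nothing in that case.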

How to display Unicode character names and their hexadecimal codes with PowerShell?

Since the default Windows PowerShell console fonts don't support Emojis, I'd like to display their surrogate pair hexadecimal codes and ideally also their Unicode character names for debugging purposes.
I know how to convert Emojis to a byte array, but I haven't figured out how to convert them to surrogate pair hexadecimal codes and Unicode character names.
$ThumbsUp = "👍"
$Bytes = [system.Text.Encoding]::UTF8.GetBytes($ThumbsUp)
# output
#240
#159
#145
#141
What I need is the following output:
$Hex = 0x1F44D
$CharName = "Thumbs Up Sign"
I.e., the following command should convert the hexadecimal value back to an Emoji:
[char]::ConvertFromUtf32($Hex)
# output
#👍

Maybe the following script (a part of my broader project) could help. The script defines a fairly sophisticated Get-CharInfo function.
Example: 'r Ř',0x1F44D|chr -OutUni -OutHex -OutStr -IgnoreWhiteSpace
r Ř👍
0x0072,0x0020,0x0158,0x0001F44D
\u0072\u0020\u0158\U0001F44D
Char CodePoint Category Description
---- --------- -------- -----------
r {U+0072, 0x72} LowercaseLetter Latin Small Letter R
Ř {U+0158, 0xC5,0x98} UppercaseLetter Latin Capital Letter R With Caron
👍 {U+1F44D, 0xF0,0x9F,0x91,0x8D} So THUMBS UP SIGN (0xd83d,0xdc4d)
# ↑ UTF-8 ↑ name ↑ surrogates
The code (comment-based help at the end of the function body):
# Get-CharInfo function. Activate dot-sourced
# . .\_get-CharInfo_2.1.ps1
# Comment-based help at the end of the function body
# History notes at the very end of the script
if ( -not ('Microsofts.CharMap.UName' -as [type]) ) {
Add-Type -Name UName -Namespace Microsofts.CharMap -MemberDefinition $(
switch ("$([System.Environment]::SystemDirectory -replace
'\\', '\\')\\getuname.dll") {
{Test-Path -LiteralPath $_ -PathType Leaf} {@"
[DllImport("${_}", ExactSpelling=true, SetLastError=true)]
private static extern int GetUName(ushort wCharCode,
[MarshalAs(UnmanagedType.LPWStr)] System.Text.StringBuilder buf);
public static string Get(char ch) {
var sb = new System.Text.StringBuilder(300);
UName.GetUName(ch, sb);
return sb.ToString();
}
"@
}
default {'public static string Get(char ch) { return "???"; }'}
})
}
function Get-CharInfo {
[CmdletBinding()]
[OutputType([System.Management.Automation.PSCustomObject],
[System.Array])]
param(
# named or positional: a string or a number e.g. 'r Ř👍'
# pipeline: an array of strings and numbers, e.g 'r Ř',0x1f44d
[Parameter(Position=0, Mandatory, ValueFromPipeline)]
$InputObject,
# + Write-Host Python-like Unicode literal e.g. \u0072\u0020\u0158\U0001F44D
[Parameter()]
[switch]$OutUni,
# + Write-Host array of hexadecimals e.g. 0x0072,0x0020,0x0158,0x0001F44D
[Parameter()]
[switch]$OutHex,
# + Write-Host concatenated string e.g. r Ř👍
[Parameter()]
[switch]$OutStr,
# choke down whitespaces ( $s -match '\s' ) from output
[Parameter()]
[switch]$IgnoreWhiteSpace,
# from https://www.unicode.org/Public/UNIDATA/UnicodeData.txt
[Parameter()]
[string]$UnicodeData = 'D:\Utils\CodePages\UnicodeData.txt'
)
begin {
Set-StrictMode -Version latest
if ( [string]::IsNullOrEmpty( $UnicodeData) ) { $UnicodeData = '::' }
Function ReadUnicodeRanges {
if ($Script:UnicodeFirstLast.Count -eq 0) {
$Script:UnicodeFirstLast = @'
First,Last,Category,Description
128,128,Cc-Control,Padding Character
129,129,Cc-Control,High Octet Preset
132,132,Cc-Control,Index
153,153,Cc-Control,Single Graphic Character Introducer
13312,19903,Lo-Other_Letter,CJK Ideograph Extension A
19968,40956,Lo-Other_Letter,CJK Ideograph
44032,55203,Lo-Other_Letter,Hangul Syllable
94208,100343,Lo-Other_Letter,Tangut Ideograph
101632,101640,Lo-Other_Letter,Tangut Ideograph Supplement
131072,173789,Lo-Other_Letter,CJK Ideograph Extension B
173824,177972,Lo-Other_Letter,CJK Ideograph Extension C
177984,178205,Lo-Other_Letter,CJK Ideograph Extension D
178208,183969,Lo-Other_Letter,CJK Ideograph Extension E
183984,191456,Lo-Other_Letter,CJK Ideograph Extension F
196608,201546,Lo-Other_Letter,CJK Ideograph Extension G
983040,1048573,Co-Private_Use,Plane 15 Private Use
1048576,1114109,Co-Private_Use,Plane 16 Private Use
'@ | ConvertFrom-Csv -Delimiter ',' |
ForEach-Object {
[PSCustomObject]@{
First = [int]$_.First
Last = [int]$_.Last
Category = $_.Category
Description= $_.Description
}
}
}
foreach ( $FirstLast in $Script:UnicodeFirstLast) {
if ( $FirstLast.First -le $ch -and $ch -le $FirstLast.Last ) {
$out.Category = $FirstLast.Category
$out.Description = $FirstLast.Description + $nil
break
}
}
}
$AuxHex = [System.Collections.ArrayList]::new()
$AuxStr = [System.Collections.ArrayList]::new()
$AuxUni = [System.Collections.ArrayList]::new()
$Script:UnicodeFirstLast = @()
$Script:UnicodeDataLines = @()
function ReadUnicodeData {
if ( $Script:UnicodeDataLines.Count -eq 0 -and (Test-Path $UnicodeData) ) {
$Script:UnicodeDataLines = @([System.IO.File]::ReadAllLines(
$UnicodeData, [System.Text.Encoding]::UTF8))
}
$DescrLine = $Script:UnicodeDataLines -match ('^{0:X4}\;' -f $ch)
if ( $DescrLine.Count -gt 0) {
$u0, $Descr, $Categ, $u3 = $DescrLine[0] -split ';'
$out.Category = $Categ
$out.Description = $Descr + $nil
}
}
function out {
param(
[Parameter(Position=0, Mandatory=$true )] $ch,
[Parameter(Position=1, Mandatory=$false)]$nil=''
)
if (0 -le $ch -and 0xFFFF -ge $ch) {
[void]$AuxHex.Add('0x{0:X4}' -f $ch)
$s = [char]$ch
[void]$AuxStr.Add($s)
[void]$AuxUni.Add('\u{0:X4}' -f $ch)
$out = [pscustomobject]@{
Char = $s
CodePoint = ('U+{0:X4}' -f $ch),
(([System.Text.UTF32Encoding]::UTF8.GetBytes($s) |
ForEach-Object { '0x{0:X2}' -f $_ }) -join ',')
Category = [System.Globalization.CharUnicodeInfo]::GetUnicodeCategory($ch)
Description = [Microsofts.CharMap.UName]::Get($ch)
}
if ( $out.Description -eq 'Undefined' ) { ReadUnicodeRanges }
if ( $out.Description -eq 'Undefined' ) { ReadUnicodeData }
} elseif (0x10000 -le $ch -and 0x10FFFF -ge $ch) {
[void]$AuxHex.Add('0x{0:X8}' -f $ch)
$s = [char]::ConvertFromUtf32($ch)
[void]$AuxStr.Add($s)
[void]$AuxUni.Add('\U{0:X8}' -f $ch)
$out = [pscustomobject]@{
Char = $s
CodePoint = ('U+{0:X}' -f $ch),
(([System.Text.UTF32Encoding]::UTF8.GetBytes($s) |
ForEach-Object { '0x{0:X2}' -f $_ }) -join ',')
Category = [System.Globalization.CharUnicodeInfo]::GetUnicodeCategory($s, 0)
Description = '???' + $nil
}
ReadUnicodeRanges
if ( $out.Description -eq ('???' + $nil) ) { ReadUnicodeData }
} else {
Write-Warning ('Character U+{0:X4} is out of range' -f $ch)
$s = $null
}
if (( $null -eq $s ) -or
( $IgnoreWhiteSpace.IsPresent -and ( $s -match '\s' ))
) {
} else {
$out
}
}
}
process {
#if ($PSBoundParameters['Verbose']) {
# Write-Warning "InputObject $InputObject, type = $($InputObject.GetType().Name)"
#}
if ( ($InputObject -as [int]) -gt 0xFFFF -and
($InputObject -as [int]) -le 0x10ffff ) {
$InputObject = [string][char]::ConvertFromUtf32($InputObject)
}
if ($null -cne ($InputObject -as [char])) {
#Write-Verbose "A $([char]$InputObject) InputObject character"
out $([int][char]$InputObject) ''
} elseif ( $InputObject -isnot [string] -and
$null -cne ($InputObject -as [int])) {
#Write-Verbose "B $InputObject InputObject"
out $([int]$InputObject) ''
} else {
$InputObject = [string]$InputObject
#Write-Verbose "C $InputObject InputObject.Length $($InputObject.Length)"
for ($i = 0; $i -lt $InputObject.Length; ++$i) {
if ( [char]::IsHighSurrogate($InputObject[$i]) -and
(1+$i) -lt $InputObject.Length -and
[char]::IsLowSurrogate($InputObject[$i+1])) {
$aux = ' (0x{0:x4},0x{1:x4})' -f [int]$InputObject[$i],
[int]$InputObject[$i+1]
# Write-Verbose "surrogate pair $aux at position $i"
out $([char]::ConvertToUtf32($InputObject[$i], $InputObject[1+$i])) $aux
$i++
} else {
out $([int][char]$InputObject[$i]) ''
}
}
}
}
end {
if ( $OutStr.IsPresent -or $PSBoundParameters['Verbose']) {
Write-Host -ForegroundColor Magenta -Object $($AuxStr -join '')
}
if ( $OutHex.IsPresent -or $PSBoundParameters['Verbose']) {
Write-Host -ForegroundColor Cyan -Object $($AuxHex -join ',')
}
if ( $OutUni.IsPresent -or $PSBoundParameters['Verbose']) {
Write-Host -ForegroundColor Yellow -Object $($AuxUni -join '')
}
}
<#
.SYNOPSIS
Return basic information about supplied Unicode characters.
.DESCRIPTION
Return information about supplied Unicode characters:
- as a PSCustomObject for programming purposes,
- in a human-readable form, and
- with optional additional output to the Information Stream.
Properties of the output PSCustomObject are as follows:
Char The character itself (if renderable)
CodePoint [string[]]Unicode CodePoint, its UTF-8 byte sequence
Category General Category (long name or abbreviation)
Description Name (and surrogate pair in parentheses if apply).
.INPUTS
An array of characters, strings and numbers (in any combination)
can be piped to the function as parameter $InputObject, e.g as
"ΧАB",[char]4301,191,0x1F3DE | Get-CharInfo
or (the same in terms of decimal numbers) as
935,1040,66,4301,191,127966 | Get-CharInfo
On the other side, the $InputObject parameter supplied named
or positionally must be of the only base type: either a number
or a character or a string.
The same input as a string:
Get-CharInfo -InputObject 'ΧАBჍ¿🏞'
-Verbose implies all -OutUni, -OutHex and -OutStr
.OUTPUTS
[System.Management.Automation.PSCustomObject]
[Object[]] (an array like [PSCustomObject[]])
.NOTES
The UnicodeData.txt file (if used) must be saved locally
from https://www.unicode.org/Public/UNIDATA/UnicodeData.txt
(currently Unicode 13.0.0)
The UnicodeData.txt file is not required; however, without it the
Get-CharInfo function could return inaccurate properties
Category and Description for characters above BMP, see Example-3.
.LINK
Unicode® Standard Annex #44: Unicode Character Database (UCD)
.LINK
https://www.unicode.org/reports/tr44/
.LINK
https://www.unicode.org/reports/tr44/#General_Category_Values
.EXAMPLE
# full (first three lines are in the Information Stream)
'r Ř👍'|Get-CharInfo -OutUni -OutHex -OutStr -IgnoreWhiteSpace
r Ř👍
0x0072,0x0020,0x0158,0x0001F44D
\u0072\u0020\u0158\U0001F44D
Char CodePoint Category Description
---- --------- -------- -----------
r {U+0072, 0x72} LowercaseLetter Latin Small Letter R
Ř {U+0158, 0xC5,0x98} UppercaseLetter Latin Capital Letter R W...
👍 {U+1F44D, 0xF0,0x9F,0x91,0x8D} So THUMBS UP SIGN (0xd83d,0...
.EXAMPLE
# shortened version of above (output is the same)
'r Ř👍'|chr -Verbose -IgnoreWhiteSpace
.EXAMPLE
# inaccurate (inexact) output above BMP if missing UnicodeData.txt
'r Ř👍'|chr -Verbose -IgnoreWhiteSpace -UnicodeData .\foo.bar
r Ř👍
0x0072,0x0020,0x0158,0x0001F44D
\u0072\u0020\u0158\U0001F44D
Char CodePoint Category Description
---- --------- -------- -----------
r {U+0072, 0x72} LowercaseLetter Latin Small Letter R
Ř {U+0158, 0xC5,0x98} UppercaseLetter Latin Capital Letter R W...
👍 {U+1F44D, 0xF0,0x9F,0x91,0x8D} OtherSymbol ??? (0xd83d,0xdc4d)
.FUNCTIONALITY
Tested: Windows 8.1/64bit, Powershell 4
Windows 10 /64bit, Powershell 5
Windows 10 /64bit, Powershell Core 6.2.0
Windows 10 /64bit, Powershell Core 7.1.0
#>
}
Set-Alias -Name chr -Value Get-CharInfo
<#
HISTORY NOTES
Origin by: http://poshcode.org/5234
http://fossil.include-once.org/poshcode/artifact/5757dbbd0bc26c84333e7cf4ccc330ab89447bf679e86ddd6fbd3589ca24027e
License: CC0
https://creativecommons.org/publicdomain/zero/1.0/legalcode
Activate dot-sourced like this (apply a real path instead of .\):
. .\Get-CharInfo.ps1
Improved by: https://stackoverflow.com/users/3439404/josefz
(to version 2)
#>

Partial answer - I only know how to get the UTF-32 code point:
$ThumbsUp = "👍"
$utf32bytes = [System.Text.Encoding]::UTF32.GetBytes( $ThumbsUp )
$codePoint = [System.BitConverter]::ToUInt32( $utf32bytes, 0 ) # start index 0 is required in Windows PowerShell
"0x{0:X}" -f $codePoint
Output:
0x1F44D
For the character names, you could possibly find an answer here:
Finding out Unicode character name in .Net
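The surrogate pair asked about in the question can be derived in a similar way. This is a minimal sketch that uses only [char]::ConvertFromUtf32 and [char]::ConvertToUtf32; the variable names are illustrative.

```powershell
$codePoint = 0x1F44D

# A .NET string stores a code point above U+FFFF as two UTF-16 code units (a surrogate pair)
$text = [char]::ConvertFromUtf32($codePoint)
$surrogates = $text.ToCharArray() | ForEach-Object { '0x{0:X4}' -f [int]$_ }
$surrogateList = $surrogates -join ','
$surrogateList   # 0xD83D,0xDC4D

# Round trip: combine the pair starting at index 0 back into the code point
$roundTrip = 'U+{0:X}' -f [char]::ConvertToUtf32($text, 0)
$roundTrip       # U+1F44D
```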

Here's a simple script to get the name. Note emojis are two surrogate chars. Making a hash is faster than using where-object, even for one search of a 34,626 line file.
# idChar.ps1
param($inputChar)
if (! (test-path $psscriptroot\UnicodeData.txt)) {
# download next to the script, which is where the file is read from below
Invoke-WebRequest http://www.unicode.org/Public/UNIDATA/UnicodeData.txt -OutFile $psscriptroot\UnicodeData.txt
}
$unicode = import-csv $psscriptroot\UnicodeData.txt -Delimiter ';' -Header hexcode,
name
# build a hashtable keyed by the integer code point
$unicode | % { $hash = @{} } { $hash[[int]('0x' + $_.hexcode)] = $_ }
$hash[[char]::ConvertToUtf32($inputChar, 0)]
Examples (control v to paste in console, not right click, to use psreadline's paste function):
.\idChar 👍
hexcode name
------- ----
1F44D THUMBS UP SIGN
.\idChar —
hexcode name
------- ----
2014 EM DASH

How can I increase the maximum number of characters read by Read-Host?

I need to get a very long string input (around 9,000 characters), but Read-Host will truncate after around 8,000 characters. How can I extend this limit?

The following are possible workarounds.
Workaround 1 has the advantage that it will work with PowerShell background jobs that require keyboard input. Note that if you are trying to paste clipboard content containing new lines, Read-HostLine will only read the first line, but Read-Host has this same behavior.
Workaround 1:
<#
.SYNOPSIS
Read a line of input from the host.
.DESCRIPTION
Read a line of input from the host.
.EXAMPLE
$s = Read-HostLine -prompt "Enter something"
.NOTES
Read-Host has a limitation of 1022 characters.
This approach is safe to use with background jobs that require input.
If pasting content with embedded newlines, only the first line will be read.
A downside to the ReadKey approach is that it is not possible to easily edit the input string before pressing Enter as with Read-Host.
#>
function Read-HostLine ($prompt = $null) {
if ($prompt) {
"${prompt}: " | Write-Host
}
$str = ""
while ($true) {
$key = $host.UI.RawUI.ReadKey("NoEcho, IncludeKeyDown");
# Paste the clipboard on CTRL-V
if (($key.VirtualKeyCode -eq 0x56) -and # 0x56 is V
(([int]$key.ControlKeyState -band [System.Management.Automation.Host.ControlKeyStates]::LeftCtrlPressed) -or
([int]$key.ControlKeyState -band [System.Management.Automation.Host.ControlKeyStates]::RightCtrlPressed))) {
$clipboard = Get-Clipboard
$str += $clipboard
Write-Host $clipboard -NoNewline
continue
}
elseif ($key.VirtualKeyCode -eq 0x08) { # 0x08 is Backspace
if ($str.Length -gt 0) {
$str = $str.Substring(0, $str.Length - 1)
Write-Host "`b `b" -NoNewline
}
}
elseif ($key.VirtualKeyCode -eq 13) { # 13 is Enter
Write-Host
break
}
elseif ($key.Character -ne 0) {
$str += $key.Character
Write-Host $key.Character -NoNewline
}
}
return $str
}
Workaround 2:
$maxLength = 65536
[System.Console]::SetIn([System.IO.StreamReader]::new([System.Console]::OpenStandardInput($maxLength), [System.Console]::InputEncoding, $false, $maxLength))
$s = [System.Console]::ReadLine()
Workaround 3:
function Read-Line($maxLength = 65536) {
$str = ""
$inputStream = [System.Console]::OpenStandardInput($maxLength);
$bytes = [byte[]]::new($maxLength);
while ($true) {
$len = $inputStream.Read($bytes, 0, $maxLength);
$str += [System.Console]::InputEncoding.GetString($bytes, 0, $len)
if ($str.EndsWith("`r`n")) {
$str = $str.Substring(0, $str.Length - 2)
return $str
}
}
}
$s = Read-Line
More discussion here:
Console.ReadLine() max length?
Why does Console.Readline() have a limit on the length of text it allows?
https://github.com/PowerShell/PowerShell/issues/16555

PowerShell - Password Generator - How to always include number in string?

I have the following PowerShell script that creates a random string of 15 characters, for use as an Active Directory password.
The trouble is, this works great most of the time, but on some occasions it doesn't use a number or symbol. I just get 15 letters. This is then not usable as an Active Directory password, as it must have at least one number or symbol in it.
$punc = 46..46
$digits = 48..57
$letters = 65..90 + 97..122
$YouShallNotPass = get-random -count 15 `
-input ($punc + $digits + $letters) |
% -begin { $aa = $null } `
-process {$aa += [char]$_} `
-end {$aa}
Write-Host "Password is $YouShallNotPass"
How would I amend the script to always have at least one random number or symbol in it?
Thank you.

You could invoke the Get-Random cmdlet three times, each time with a different input parameter (punc, digit and letters), concat the result strings and shuffle them using another Get-Random invoke:
(Get-Random -Count 15 -InputObject ([char[]]$yourPassword)) -join ''
However, why do you want to reinvent the wheel? Consider using the following GeneratePassword function:
[Reflection.Assembly]::LoadWithPartialName("System.Web")
[System.Web.Security.Membership]::GeneratePassword(15,2)
And to ensure, it contains at least one random number (you already specify the number of symbols):
do {
# use $password rather than $pwd, which is an automatic variable (current location)
$password = [System.Web.Security.Membership]::GeneratePassword(15,2)
} until ($password -match '\d')
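The first suggestion above (three Get-Random calls followed by a shuffle) could be sketched as follows, reusing the character ranges from the question. The split into 1 punctuation character, 2 digits and 12 letters is an assumption made for illustration; any split that adds up to 15 works the same way.

```powershell
# Character ranges taken from the question
$punc    = [char[]](46..46)                # '.'
$digits  = [char[]](48..57)                # 0-9
$letters = [char[]](65..90 + 97..122)      # A-Z, a-z

# One Get-Random call per character class guarantees each class is represented
$chars  = @(Get-Random -Count 1  -InputObject $punc)
$chars += @(Get-Random -Count 2  -InputObject $digits)
$chars += @(Get-Random -Count 12 -InputObject $letters)

# Shuffle so the mandatory characters do not always sit at the start
$password = (Get-Random -Count $chars.Count -InputObject $chars) -join ''
Write-Host "Password is $password"
```

As in the question's script, Get-Random -Count picks without repetition within each call, so no letter or digit appears twice.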

As suggested by jisaak, there is no 100% guarantee that the Membership.GeneratePassword Method generates a password that meets the AD complexity requirements.
That's why I reinvented the wheel:
Function Create-String([Int]$Size = 8, [Char[]]$CharSets = "ULNS", [Char[]]$Exclude) {
$Chars = @(); $TokensSet = @()
If (!$TokenSets) {$Global:TokenSets = @{
U = [Char[]]'ABCDEFGHIJKLMNOPQRSTUVWXYZ' #Upper case
L = [Char[]]'abcdefghijklmnopqrstuvwxyz' #Lower case
N = [Char[]]'0123456789' #Numerals
S = [Char[]]'!"#$%&''()*+,-./:;<=>?@[\]^_`{|}~' #Symbols
}}
$CharSets | ForEach {
$Tokens = $TokenSets."$_" | ForEach {If ($Exclude -cNotContains $_) {$_}}
If ($Tokens) {
$TokensSet += $Tokens
If ($_ -cle [Char]"Z") {$Chars += $Tokens | Get-Random} #Character sets defined in upper case are mandatory
}
}
While ($Chars.Count -lt $Size) {$Chars += $TokensSet | Get-Random}
($Chars | Sort-Object {Get-Random}) -Join "" #Mix the (mandatory) characters and output string
}; Set-Alias Create-Password Create-String -Description "Generate a random string (password)"
Usage:
The Size parameter defines the length of the password.
The CharSets parameter defines the complexity where the character U,
L, N and S stands for Uppercase, Lowercase, Numerals and Symbols.
If supplied in lowercase (u, l, n or s) the returned string
might contain any of character in the concerned character set, If
supplied in uppercase (U, L, N or S) the returned string will
contain at least one of the characters in the concerned character
set.
The Exclude parameter lets you exclude specific characters that might e.g.
lead to confusion like an alphanumeric O and a numeric 0 (zero).
Examples:
To create a password with a length of 8 characters that might contain any uppercase characters, lowercase characters and numbers:
Create-Password 8 uln
To create a password with a length of 12 characters that contains at least one uppercase character, one lowercase character, one number and one symbol and does not contain the characters OLIoli01:
Create-Password 12 ULNS "OLIoli01"
For the latest New-Password version, use:
Install-Script -Name PowerSnippets.New-Password

Command to generate random passwords by using an existing function:
[system.web.security.membership]::GeneratePassword(x,y)
x = Length of the password
y = Complexity
General Error:
Unable to find type [system.web.security.membership]. Make sure that the assembly that contains this type is loaded.
Solution:
Run the below command:
Add-Type -AssemblyName System.web;

Another solution:
function New-Password() {
param(
[int] $Length = 10,
[bool] $Upper = $true,
[bool] $Lower = $true,
[bool] $Numeric = $true,
[string] $Special
)
$upperChars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
$lowerChars = "abcdefghijklmnopqrstuvwxyz"
$numericChars = "0123456789"
$all = ""
if ($Upper) { $all = "$all$upperChars" }
if ($Lower) { $all = "$all$lowerChars" }
if ($Numeric) { $all = "$all$numericChars" }
if ($Special -and ($special.Length -gt 0)) { $all = "$all$Special" }
$password = ""
for ($i = 0; $i -lt $Length; $i++) {
Write-Host "password: [$password]"
$password = $password + $all[$(Get-Random -Minimum 0 -Maximum $all.Length)]
}
$valid = $true
if ($Upper -and ($password.IndexOfAny($upperChars.ToCharArray()) -eq -1)) { $valid = $false }
if ($Lower -and ($password.IndexOfAny($lowerChars.ToCharArray()) -eq -1)) { $valid = $false }
if ($Numeric -and ($password.IndexOfAny($numericChars.ToCharArray()) -eq -1)) { $valid = $false }
if ($Special -and $Special.Length -gt 0 -and ($password.IndexOfAny($Special.ToCharArray()) -eq -1)) { $valid = $false }
if (-not $valid) {
$password = New-Password `
-Length $Length `
-Upper $Upper `
-Lower $Lower `
-Numeric $Numeric `
-Special $Special
}
return $password
}
Flexible enough to set the length, turn upper, lower, and numeric characters on or off, and set the list of specials.

My take on generating passwords in PowerShell, based on what I've found here and in the Internets:
#Requires -Version 4.0
[CmdletBinding(PositionalBinding=$false)]
param (
[Parameter(
Mandatory = $false,
HelpMessage = "Minimum password length"
)]
[ValidateRange(1,[int]::MaxValue)]
[int]$MinimumLength = 24,
[Parameter(
Mandatory = $false,
HelpMessage = "Maximum password length"
)]
[ValidateRange(1,[int]::MaxValue)]
[int]$MaximumLength = 42,
[Parameter(
Mandatory = $false,
HelpMessage = "Characters which can be used in the password"
)]
[ValidateNotNullOrEmpty()]
[string]$Characters = '1234567890qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM@#%*-_+:,.'
)
(1..(Get-Random -Minimum $MinimumLength -Maximum $MaximumLength) `
| %{ `
$Characters.GetEnumerator() | Get-Random `
}) -join ''
I preferred this over using System.Web, not to introduce dependencies, which could change with .Net / .Net Core versions.
My variation also allows random password length (in specified range), is fairly concise (apart from the parameters section, which is quite verbose, to enforce some validations and provide defaults) and allows character repetitions (as opposite to the code in the question, which never repeats the same character).
I understand, that this does not guarantee a digit in the password. This however can be addressed in different ways. E.g. as was suggested, to repeat the generation until the password matches the requirements (contains a digit). My take would be:
Generate a random password.
If it does not contain a digit (or always):
Use a random function to get 1 random digit.
Add it to the random password.
Randomize the order of the result (so the digit is not necessarily always at the end).
Assuming, that the above script would be named "Get-RandomPassword.ps1", it could look like this:
$pass = .\Get-RandomPassword.ps1
$pass += (0..9 | Get-Random)
$pass = (($pass.GetEnumerator() | Get-Random -Count $pass.Length) -join '')
Write-Output $pass
This can be generalized, to enforce using any character category:
$sets = @('abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', '0123456789', '()-_=+[{]};:''",<.>/?`~')
$pass = .\Get-RandomPassword.ps1 -Characters ($sets -join '')
foreach ($set in $sets) {
$pass += ($set.GetEnumerator() | Get-Random)
}
$pass = (($pass.GetEnumerator() | Get-Random -Count $pass.Length) -join '')
Write-Output $pass
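For readers who would rather not depend on a separate script file, the same idea can be sketched as one self-contained function. The name New-PasswordFromSets, its parameters and its defaults are hypothetical and not part of the script above. Unlike the snippet above, it keeps the total length fixed instead of appending extra characters.

```powershell
# Hypothetical helper: guarantees one character from each set, fills the
# remainder from the combined pool, then shuffles the result.
function New-PasswordFromSets {
    param(
        [string[]] $Sets = @('abcdefghijklmnopqrstuvwxyz',
                             'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
                             '0123456789'),
        [ValidateRange(1, 1024)]
        [int] $Length = 16
    )
    if ($Length -lt $Sets.Count) {
        throw "Length must be at least $($Sets.Count)."
    }
    $all = ($Sets -join '').ToCharArray()

    # One guaranteed character per set.
    $picked = @(foreach ($set in $Sets) { $set.ToCharArray() | Get-Random })

    # Fill the rest with independent draws, so repetition is allowed.
    $fill = $Length - $picked.Count
    if ($fill -gt 0) {
        $picked += @(1..$fill | ForEach-Object { $all | Get-Random })
    }

    # Shuffle so the guaranteed characters do not sit in fixed positions.
    -join ($picked | Get-Random -Count $picked.Count)
}

New-PasswordFromSets -Length 12
```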

I wrote a secure password generator function in PowerShell; maybe it will be useful to someone.
Similar to the accepted answer, this script uses Get-Random (twice), plus regular expression matching to make sure the output meets the complexity requirements.
The difference in this script is that the password length can also be randomised.
(To hard-set the password length, just set MinimumPasswordLength and MaximumPasswordLength to the same value.)
It also has an easy-to-edit character set, and a regex that only accepts a password with all of the following characteristics:
(?=.*\d) must contain at least one numerical character
(?=.*[a-z]) must contain at least one lowercase character
(?=.*[A-Z]) must contain at least one uppercase character
(?=.*\W) must contain at least one non-word character
Your question about always including a number in the generated output can be solved by checking the output with a regex match. Just use the parts of the regex that you need, based on the explanations above. The example here checks for uppercase, lowercase, and numerical characters:
$Regex = "(?=.*\d)(?=.*[a-z])(?=.*[A-Z])"
do {
$Password = ([string]($AllowedPasswordCharacters |
Get-Random -Count $PasswordLength) -replace ' ')
} until ($Password -cmatch $Regex)
$Password
Here is the full script:
Function GeneratePassword
{
cls
$MinimumPasswordLength = 12
$MaximumPasswordLength = 16
$PasswordLength = Get-Random -InputObject ($MinimumPasswordLength..$MaximumPasswordLength)
$AllowedPasswordCharacters = [char[]]'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!?##£$%^&'
$Regex = "(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*\W)"
do {
$Password = ([string]($AllowedPasswordCharacters |
Get-Random -Count $PasswordLength) -replace ' ')
} until ($Password -cmatch $Regex)
$Password
}
GeneratePassword
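To see what each lookahead contributes, the regex can be tried against a few fixed strings. This is only a sketch; the sample strings are made up. It also shows why the script uses -cmatch: the plain -match operator is case-insensitive, so [a-z] and [A-Z] would both accept any letter.

```powershell
$Regex = '(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*\W)'

# Made-up samples, each one missing a different category (or none).
$samples = 'abcdefgh', 'Abcdefg1', 'Abcdef1!', 'ABCDEF1!'
foreach ($s in $samples) {
    '{0} -> {1}' -f $s, ($s -cmatch $Regex)
}
# abcdefgh -> False   (no digit, no uppercase, no non-word character)
# Abcdefg1 -> False   (no non-word character)
# Abcdef1! -> True
# ABCDEF1! -> False   (no lowercase)

# Case-insensitive matching lets the all-uppercase sample through:
'ABCDEF1!' -match  $Regex   # True
'ABCDEF1!' -cmatch $Regex   # False
```

Note that the underscore counts as a word character, so it does not satisfy (?=.*\W).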

I had the same issue. Here is the snippet I used to create an alphanumeric password. It is simple: all I do is use a regex replace over ASCII ranges to strip everything that is not a letter or a digit.
Function Password-Generator ([int]$Length)
{
# To generate a password, call: Password-Generator <length of password>
$Assembly = Add-Type -AssemblyName System.Web
$RandomComplexPassword = [System.Web.Security.Membership]::GeneratePassword($Length,2)
$AlphaNumericalPassword = $RandomComplexPassword -replace '[^\x30-\x39\x41-\x5A\x61-\x7A]+'
Write-Output $AlphaNumericalPassword
}
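One thing to keep in mind with this approach: because the non-alphanumeric characters are removed after generation, the result can be shorter than the requested length. A small sketch with a fixed, made-up input string shows what the replace does, without needing System.Web:

```powershell
# Made-up sample standing in for the output of GeneratePassword.
$sample = 'aB3$xY-9!'

# \x30-\x39 is 0-9, \x41-\x5A is A-Z, \x61-\x7A is a-z, so this pattern is
# equivalent to '[^0-9A-Za-z]+': it removes every run of other characters.
$filtered = $sample -replace '[^\x30-\x39\x41-\x5A\x61-\x7A]+'

$filtered          # aB3xY9
$sample.Length     # 9
$filtered.Length   # 6
```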

I've created this. You can choose how many passwords to create:
$howoften = Read-Host "How many would you like to create: "
$i = 0
do{
(-join(1..42 | ForEach {((65..90)+(97..122)+(".") | % {[char]$_})+(0..9)+(".") | Get-Random}))
$i++
} until ($i -ge [int]$howoften)
To change the length of the password, simply edit the "42" in line 4:
(-join(1..42 | ForEach ...

strange characters when opening a properties file

I have a requirement to update a properties file for a very old project. The properties file is supposed to display Arabic characters, but it displays something like "Êã ÊÓÌíá ØáÈßã". I wrote a simple program with which I was able to read the correct Arabic values from the file:
Reader r = new InputStreamReader(new FileInputStream("C:\\Labels_ar.properties"), "Windows-1256");
buffered = new BufferedReader(r);
String line;
while ((line = buffered.readLine()) != null) {
System.out.println("line" + line);
}
But do you have any idea how I can open the file, edit it, and save the changes?

If, as you seem to think, the encoding is Windows-1256, there are editors that will do the job, such as EditPadLite.
If it's not that, the first thing you need to find out is the encoding. Given it's a properties file, it may well be UTF-8, but the easiest way to find out is to get a hex dump of the file and post it here. Under Linux, I'd normally suggest using:
od -xcb Labels_ar.properties
but, given you're on Windows, that's not going to work so well (unless you have Cygwin installed).
So, if you have your own favourite hex dump program, just use that. Otherwise you can use the following PowerShell one:
function Pf-Dump-Hex-Item([byte[]] $data) {
$left = "+0 +1 +2 +3 +4 +5 +6 +7 +8 +9 +A +B +C +D +E +F"
$right = "0123456789ABCDEF"
Write-Output "======== $left +$right"
$addr = 0
$left = "{0:X8} " -f $addr
$right = ""
# Now go through the input bytes
foreach ($byte in $data) {
# Add 2-digit hex number then filtered character.
$left += "{0:x2} " -f $byte
if (($byte -lt 0x20) -or ($byte -gt 0x7e)) { $byte = "." }
$right += [char] $byte
# Increment address and start new line if needed.
$addr++;
if (($addr % 16) -eq 0) {
Write-Output "$left $right"
$left = "{0:X8} " -f $addr
$right = "";
}
}
# Flush last line if needed.
$lastLine = "{0:X8}" -f $addr
if (($addr % 16) -ne 0) {
while (($addr % 16) -ne 0) {
$left += " "
$addr++;
}
Write-Output "$left $right"
}
Write-Output $lastLine
Write-Output ""
}
function Pf-Dump-Hex {
param(
[Parameter (Mandatory = $false, Position = 0)]
[string] $Path,
[Parameter (Mandatory = $false, ValueFromPipeline = $true)]
[Object] $Object
)
begin {
Set-StrictMode -Version Latest
# Create the array to hold content then do path if given.
[byte[]] $bytes = $null
if ($Path) {
$bytes = [IO.File]::ReadAllBytes((Resolve-Path $Path))
Pf-Dump-Hex-Item $bytes
}
}
process {
# Process each object (input/pipe).
if ($object) {
foreach ($obj in $object) {
if ($obj -is [Byte]) {
$bytes = $obj
} else {
$inpStr = [string] $obj
$bytes = [Text.Encoding]::Unicode.GetBytes($inpStr)
}
Pf-Dump-Hex-Item $bytes
}
}
}
}
If you load that into a PowerShell session then run:
pf-dump-hex Labels_ar.properties
that should allow you to evaluate the file encoding.
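Once the dump suggests a candidate, one way to confirm it is to decode the same bytes under each candidate encoding and see which one gives readable text. The sketch below is only an illustration: the byte values are what the garbled sample from the question ("Êã ÊÓÌíá ØáÈßã") would be if it were Windows-1256 text shown as Windows-1252, and the list of candidates is an assumption. It also assumes the Windows code pages are available to .NET in your session, which should be the case on Windows PowerShell and, as far as I know, on PowerShell 7.

```powershell
# Assumed bytes of the sample line. For a real file you would use:
#   $bytes = [IO.File]::ReadAllBytes((Resolve-Path 'Labels_ar.properties'))
[byte[]] $bytes = 0xCA,0xE3,0x20,0xCA,0xD3,0xCC,0xED,0xE1,0x20,
                  0xD8,0xE1,0xC8,0xDF,0xE3

# Decode the same bytes under each candidate encoding.
$candidates = 'windows-1252', 'windows-1256', 'utf-8'
foreach ($name in $candidates) {
    $text = [Text.Encoding]::GetEncoding($name).GetString($bytes)
    '{0,-13} {1}' -f $name, $text
}

# Keep the Windows-1256 interpretation and show its code points, which is
# handy when the console font cannot render Arabic.
$asArabic = [Text.Encoding]::GetEncoding('windows-1256').GetString($bytes)
($asArabic.ToCharArray() | ForEach-Object { 'U+{0:X4}' -f [int]$_ }) -join ' '
```

If the Windows-1256 line consists of code points in the Arabic block (U+0600 to U+06FF) while the UTF-8 line is full of replacement characters, the file is most likely Windows-1256 as suspected.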

I think there are two possible problems:
1- I'm not sure whether System.out.println() can print Arabic characters, so try another way of displaying the text, such as a GUI dialog (for example JOptionPane.showMessageDialog()), to be sure the problem really is in reading the file.
2- If the dialog shows the same result, the problem should be the charset; you can try UTF-8 or something else.


Are there standardized translations of Unicode character names? - unicode

According to this, there were official French translations for a time, but not anymore.
