SonarQube 6.3 LDAP/SSO UTF-8 encoding

We're using LDAP/SSO at my company, and it provides the username to SonarQube in UTF-8.
However, while LDAP/SSO sends the username in UTF-8, SonarQube expects ISO-8859-1 (Latin-1). There is no way to change the encoding on either the LDAP/SSO side or in SonarQube.
The result is wrong umlauts:
Andrü Tingö = Andr«Ã Ting¼Ã OR äëüö = äëüÃ
Is there any workaround?

I wanted to post this as a comment, but I need 50 reputation to write comments.
We are using simpleSAMLphp for SSO as both IdP and SP. The IdP takes cn, givenName and sn from LDAP, which holds UTF-8 values. Login names/usernames are US-ASCII only.
When the user comes to Sonar, the non-US-ASCII characters are wrong: they look as if they were converted to UTF-8 a second time, even though they already are UTF-8.
If I use the attributes from the IdP in PHP, which serves the page in UTF-8, the characters are correct.
I just ran a test. In our Apache config we set the X-Forwarded-Name header to the MCAC_ATTR_CN attribute that the SP gets from the IdP. The original configuration is:
RequestHeader set X-Forwarded-Name "expr=%{reqenv:MCAC_ATTR_CN}"
Now I have added a fixed string in UTF-8:
RequestHeader set X-Forwarded-Name "expr=%{reqenv:MCAC_ATTR_CN} cäëöüc"
The "c" characters are only separators, to make the encoded text easier to see.
The hexdump of this configuration line is:
0000750: 09 0909 5265 7175 6573 7448 6561 ...RequestHea
0000760: 6465 7220 7365 7420 582d 466f 7277 6172 der set X-Forwar
0000770: 6465 642d 4e61 6d65 2022 6578 7072 3d25 ded-Name "expr=%
0000780: 7b72 6571 656e 763a 4d43 4143 5f41 5454 {reqenv:MCAC_ATT
0000790: 525f 434e 7d20 63c3 a4c3 abc3 b6c3 bc63 R_CN} c........c
00007a0: 220a ".
As you can see, the fixed UTF-8 characters are there: "ä" c3 a4, "ë" c3 ab, "ö" c3 b6, "ü" c3 bc.
From LDAP comes the following name:
xxxxxx xxxxx xxxx äëüö
In the Apache config, " cäëöüc" is appended, so the resulting name should be:
xxxxxx xxxxx xxxx äëüö cäëöüc
But in Sonar, the name is displayed as
xxxxxx xxxxx xxxx äëüö cäëöüc
You get a similar result if you convert the following text:
xxxxxx xxxxx xxxx äëüö cäëöüc
from ISO-8859-2 to UTF-8:
echo "xxxxxx xxxxx xxxx äëüö cäëöüc" | iconv -f iso-8859-2 -t utf-8
xxxxxx xxxxx xxxx äÍßÜ cäÍÜßc
The "¤" character is the UTF-8 sequence c2 a4:
00000000: c2a4 0a ...
I captured a tcpdump on loopback to see the communication from the Apache proxy module to SonarQube. Even there you can see the correct UTF-8 bytes c3 a4, c3 ab, c3 bc, c3 b6 coming from the IdP, and then, between the "c" separators, c3 a4, c3 ab, c3 b6, c3 bc coming directly from Apache:
00000000 47 45 54 20 2f 61 63 63 6f 75 6e 74 20 48 54 54 GET /acc ount HTT
...
00000390 58 2d 46 6f 72 77 61 72 64 65 64 2d 4e 61 6d 65 X-Forwar ded-Name
000003A0 3a 20 72 6f 62 65 72 74 20 74 65 73 74 32 20 77 : xxxxxx xxxxx x
000003B0 6f 6c 66 20 c3 a4 c3 ab c3 bc c3 b6 20 63 c3 a4 xxx .... .... c..
000003C0 c3 ab c3 b6 c3 bc 63 0d 0a ......c. .
...
The system locale is set to en_US.UTF-8, if that matters.
So Sonar really does receive UTF-8 text from Apache (both the hard-coded string and the value from the IdP), but then something apparently treats that UTF-8 text as if it were ISO-8859-1 and converts it to UTF-8 again, producing nonsense.
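The suspected double-encoding can be reproduced in a few lines (a sketch in Python, not Sonar's actual code): take correct UTF-8 bytes, misinterpret them as ISO-8859-1, and the familiar mojibake appears.

```python
# Reproduce the suspected double-encoding (sketch, not Sonar's actual code).
original = "äëüö"                      # what LDAP/the IdP sends, as text
utf8_bytes = original.encode("utf-8")  # b'\xc3\xa4\xc3\xab\xc3\xbc\xc3\xb6'

# A component that wrongly assumes ISO-8859-1 produces mojibake:
mojibake = utf8_bytes.decode("iso-8859-1")
print(mojibake)  # Ã¤Ã«Ã¼Ã¶

# As long as no characters were lost, the damage is reversible:
repaired = mojibake.encode("iso-8859-1").decode("utf-8")
print(repaired)  # äëüö
```

The round trip at the end also suggests a last-resort workaround: re-encode the garbled name with ISO-8859-1 and decode it as UTF-8 again.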
Do you have any idea? Could this be something in Sonar, in the wrapper, or some option set incorrectly somewhere?
Regards,
Robert.

Related

How to read currency symbol in a perl script

I have a Perl script that reads data from a .csv file containing some non-ASCII currency symbols. When we read the file and write the content out, it prints
Get <A3>50 or <80>50 daily
Actual value is
Get £50 or €50 daily
With the dollar sign it works fine; with any other currency symbol it does not.
I tried
open my $in, '<:encoding(UTF-8)', 'input-file-name' or die $!;
open my $out, '>:encoding(latin1)', 'output-file-name' or die $!;
while ( <$in> ) {
    print $out $_;
}
$ od -t x1 input-file-name
0000000 47 65 74 20 c2 a3 35 30 20 6f 72 20 e2 82 ac 35
0000020 30 20 64 61 69 6c 79 0a
0000030
od -t x1 output-file-name
0000000 47 65 74 20 a3 35 30 20 6f 72 20 5c 78 7b 32 30
0000020 61 63 7d 35 30 20 64 61 69 6c 79 0a
0000034
but that does not help either. The output I am getting is
Get \xA350 or \x8050 daily
Unicode Code Point   Glyph   UTF-8      Input File   ISO-8859-1   Output File
U+00A3 POUND SIGN    £       C2 A3      C2 A3        A3           A3
U+20AC EURO SIGN     €       E2 82 AC   E2 82 AC     N/A          5C 78 7B 32 30 61 63 7D
("LATIN1" is an alias for "ISO-8859-1".)
There are no problems with the input file.
£ is correctly encoded in your input file.
€ is correctly encoded in your input file.
As for the output file,
£ is correctly encoded in your output file.
€ isn't found in the latin1 charset, so \x{20ac} is used instead.
Your program is working as expected.
You say you see <A3> instead of £. That's probably because the program you are using is expecting a file encoded using UTF-8, but you provided a file encoded using ISO-8859-1.
You also say you see <80> instead of €. But there's no way you'd see that for the file you provided.
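The table above can be checked outside Perl as well (a sketch in Python; Perl's :encoding(latin1) layer writes the \x{20ac} escape, while Python raises or substitutes depending on the error handler you choose):

```python
pound = "£"   # U+00A3, exists in ISO-8859-1
euro = "€"    # U+20AC, does NOT exist in ISO-8859-1

print(pound.encode("latin-1"))   # b'\xa3'

# Encoding the euro sign to Latin-1 fails outright...
try:
    euro.encode("latin-1")
except UnicodeEncodeError as e:
    print("cannot encode:", e.reason)

# ...unless you pick a fallback, similar in spirit to Perl writing \x{20ac}:
print(euro.encode("latin-1", errors="backslashreplace"))  # b'\\u20ac'
```

Either way, the point stands: the euro sign simply has no Latin-1 byte, so any Latin-1 output file must contain a substitute.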

PowerShell Script does not Write Correct Umlauts - PowerShell itself does

I want to dump (and later work with) the paths of the locally changed files in my SVN repository. Problem is, there are umlauts in some filenames (like ä, ö, ü).
When I open a PowerShell window in my local trunk folder, I can do svn status and get the result with correct umlauts ("ü" in this case):
PS C:\trunk> svn status -q
M Std\ClientComponents\Prüfung.xaml
M Std\ClientComponents\Prüfung.xaml.cs
M Std\ClientComponents\PrüfungViewModel.cs
When I do the same in my powershell script, the results are different.
Script "DumpChangedFiles.ps1":
foreach ( $filename in svn status -q )
{
    Write-Host $filename
}
Results:
PS C:\trunk> .\DumpChangedFiles.ps1
M Std\ClientComponents\Pr³fung.xaml
M Std\ClientComponents\Pr³fung.xaml.cs
M Std\ClientComponents\Pr³fungViewModel.cs
Question: Why are the umlauts wrong? How do I get to the correct results?
Hex-Dump:
ef bb bf 4d 20 20 20 20 20 20 20 53 74 64 5c 43 6c 69 65 6e 74 43 6f 6d 70 6f 6e 65 6e 74 73 5c 50 72 c2 b3 66 75 6e 67 2e 78 61 6d 6c 0d 0a 4d 20 20 20 20 20 20 20 53 74 64 5c 43 6c 69 65 6e 74 43 6f 6d 70 6f 6e 65 6e 74 73 5c 50 72 c2 b3 66 75 6e 67 2e 78 61 6d 6c 2e 63 73 0d 0a 4d 20 20 20 20 20 20 20 53 74 64 5c 43 6c 69 65 6e 74 43 6f 6d 70 6f 6e 65 6e 74 73 5c 50 72 c2 b3 66 75 6e 67 56 69 65 77 4d 6f 64 65 6c 2e 63 73
Here's the output of the script DumpChangedFiles.ps1 compared to the output of your desired command:
PS C:\trunk> .\DumpChangedFiles.ps1
M Std\ClientComponents\Pr³fung.xaml
M Std\ClientComponents\Pr³fung.xaml.cs
M Std\ClientComponents\Pr³fungViewModel.cs
PS C:\trunk> $PSDefaultParameterValues['*:Encoding'] = 'utf8'; svn status -q
M Std\ClientComponents\Prüfung.xaml
M Std\ClientComponents\Prüfung.xaml.cs
M Std\ClientComponents\PrüfungViewModel.cs
The output of svn --version is:
PS C:\trunk> svn --version
svn, version 1.14.0 (r1876290)
compiled May 24 2020, 17:07:49 on x86-microsoft-windows
Copyright (C) 2020 The Apache Software Foundation.
This software consists of contributions made by many people;
see the NOTICE file for more information.
Subversion is open source software, see http://subversion.apache.org/
The following repository access (RA) modules are available:
* ra_svn : Module for accessing a repository using the svn network protocol.
- with Cyrus SASL authentication
- handles 'svn' scheme
* ra_local : Module for accessing a repository on local disk.
- handles 'file' scheme
* ra_serf : Module for accessing a repository via WebDAV protocol using serf.
- using serf 1.3.9 (compiled with 1.3.9)
- handles 'http' scheme
- handles 'https' scheme
The following authentication credential caches are available:
* Wincrypt cache in C:\Users\reichert\AppData\Roaming\Subversion
The problem comes from PowerShell ISE: the svn command in your script is executed through PowerShell ISE, which encodes its output with Windows-1252 (or your default Windows locale).
You can use the following to get correct output (check your Windows locale):
[Console]::OutputEncoding = [System.Text.Encoding]::GetEncoding(1252)
foreach ( $filename in svn status -q )
{
    Write-Host $filename
}
It seems a previous unanswered question relates to the same problem with ISE :
Powershell ISE has different codepage from Powershell and I can not change it
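The ü → ³ substitution itself is a classic codepage mismatch and can be reproduced directly (a Python sketch; the assumption, consistent with the hexdump above, is that svn emits the ANSI byte 0xFC for ü while the ISE pipeline decodes program output with the OEM codepage 850):

```python
# svn writes "ü" as the single ANSI (Windows-1252 / Latin-1) byte 0xFC...
ansi_byte = "ü".encode("latin-1")   # b'\xfc'

# ...but a console decoding program output with OEM codepage 850 maps
# byte 0xFC to the superscript-three character:
misread = ansi_byte.decode("cp850")
print(misread)  # ³
```

Setting [Console]::OutputEncoding to the codepage svn actually uses, as the answer does, makes the decode side match the encode side.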

WM-Bus extended layer decoding

I am trying to decrypt a wM-Bus telegram from a Kamstrup Multical21 in C1 mode with an Extended Link Layer (ELL).
The payload together with the ELL info is the following:
23 44 2D 2C 45 45 71 63 1B 16 8D 20 6A 31 FB 7C 20 39 A3 79 60 4B 90 BD FC BE 8D D8 CB 18 CE 77 DC 41 CE 8C
Analysing CI = 8D, I found that there is an ELL with the following layout:
CI (1 byte) CC(1 byte) ACC(1 byte) SN(4 bytes) CRC(2 bytes)
8D 20 6A 31 FB 7C 20 39 A3
The documentation says that the buffer to be decrypted shall include the CRC from the ELL, i.e.:
39 A3 79 60 4B 90 BD FC BE 8D D8 CB 18 CE 77 DC 41 CE 8C
I have got the AES key from the Manufacturer:
B9 7A 6D 4E C2 74 A4 6D 87 0E 31 27 D9 A0 AF 63
The initialization vector for the ELL shall be:
M-field A-field CC-field SN-field FN BC
2D 2C 45 45 71 63 1B 16 20 31 FB 7C 20 00 00 00
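For reference, the IV above can be assembled mechanically from the telegram fields (a Python sketch using the byte values from this question; the field widths are M-field 2, A-field 6, CC 1, SN 4, FN 2, BC 1 = 16 bytes):

```python
# Build the 16-byte ELL IV from the telegram fields (values from this question).
m_field  = bytes.fromhex("2D2C")          # manufacturer, as transmitted
a_field  = bytes.fromhex("454571631B16")  # address
cc_field = bytes.fromhex("20")            # communication control
sn_field = bytes.fromhex("31FB7C20")      # session number, as transmitted
fn_bc    = bytes(3)                       # frame number (2) + block counter (1), zeroed

iv = m_field + a_field + cc_field + sn_field + fn_bc
print(iv.hex())  # 2d2c454571631b162031fb7c20000000
```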
After decrypting, I get the following result:
08 3a 5f ce b2 8d 51 97 94 a2 5b fb 61 ab 2e c0
e4 20 c8 2a 43 ff 3a 75 6f 93 d0 ac 8c 79 b7 a1
Since there is no 2F 2F at the beginning, something is wrong!
Can somebody help me and tell me what I have done wrong?
Thanks in advance.
I had a look in the latest Kamstrup docs ("Wireless M-Bus Communication Kamstrup Water Meters - MULTICAL® 21 and flowIQ® water meters Mode C1 according to EN 13757-4:2013")
When I decrypt your packet I find:
25877968217E8E01000000000000000000
Firstly, it seems Kamstrup's decrypted packets do not start with 2F 2F.
The first 2 bytes of the decrypted packet are supposedly the PLCRC (I can't confirm that right now - I don't have immediate access to the standard that defines the CRC polynomial). The next byte is 79, which means it is a Compact Frame; the next 4 bytes are 2 more CRCs, and the next 2 bytes, 0100, are probably the Info field, which is manufacturer specific and I don't know how to interpret it yet.
This meter is probably R type 1, right? (On the face plate, the 3rd-last digit of the "Con.:" parameter should be a 1.) So its format would be [Info][Volume][Target Volume] - 2 bytes, 4 bytes, 4 bytes. I am partly assuming that: since this packet is a compact packet, I don't get the actual format the long packet would have, e.g. the number of decimals - which you'd normally need - but your values are zeroes, so the decimals don't matter. (The 'long' packet of course comes every 6th packet or so?)
The IV I get is:
2D2C454571631B162031FB7C20000000
which is exactly the same as yours.
The encrypted packet I use is:
39A379604B90BDFCBE8DD8CB18CE77DC41
so I exclude the CE and 8C that you had at the end of yours.
When I put them in, the decrypted packet becomes:
25877968217E8E01000000000000000000BB49
which is pretty much the same packet with some more CRC material at the back, I suspect. So I really do not understand what you do to decrypt, since your result is completely different.
Ok, maybe you use AES/CBC/NoPadding, as in OpenMUC.
Kamstrup uses AES/CTR/NoPadding; that way they don't have to decrypt in multiples of 16-byte blocks. In my Java code that looks as follows:
Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
The hints here are very helpful. There's one obstacle I stumbled across with the given message: the length field is wrong, and there are 2 bytes of garbage at the end.
I guess the original message was encoded in frame format B. That means the length field includes the frame CRCs and should be corrected after the CRCs are removed. After correcting the length to 0x21 (33 bytes + L-field), I get the correct message and can also verify that the first 2 bytes of the decoded message contain the CRC16 of the remaining message.
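The length correction can be verified mechanically (a sketch; all byte values are taken from this thread):

```python
# The full telegram from the question, including the 2 trailing garbage/CRC bytes.
frame = bytes.fromhex(
    "23442D2C454571631B168D206A31FB7C2039A3"
    "79604B90BDFCBE8DD8CB18CE77DC41CE8C"
)

# The L-field counts every byte after itself: 0x23 = 35.
assert frame[0] == len(frame) - 1

# Strip the 2 trailing bytes and correct the L-field accordingly.
fixed = bytes([frame[0] - 2]) + frame[1:-2]
print(hex(fixed[0]))  # 0x21, matching the corrected length in the answer
```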

Why does encoding, then decoding strings make Arabic characters lose their context?

I'm (belatedly) testing Unicode waters for the first time and am failing to understand why the process of encoding, then decoding, an Arabic string has the effect of separating out the individual characters the word is made of.
In the example below, the word "ﻟﻠﺒﻴﻊ" comprises 5 individual letters: "ع","ي","ب","ل","ل", written right to left. Depending on the surrounding context (adjacent letters), the letters change form.
use strict;
use warnings;
use utf8;
binmode( STDOUT, ':utf8' );
use Encode qw< encode decode >;
my $str = 'ﻟﻠﺒﻴﻊ'; # "For sale"
my $enc = encode( 'UTF-8', $str );
my $dec = decode( 'UTF-8', $enc );
my $decoded = pack 'U0W*', map +ord, split //, $enc;
print "Original string : $str\n"; # ل ل ب ي ع
print "Decoded string 1: $dec\n"; # ل ل ب ي ع
print "Decoded string 2: $decoded\n"; # ل ل ب ي ع
ADDITIONAL INFO
When pasting the string into this post, the rendering is reversed so it looks like "ﻊﻴﺒﻠﻟ". I'm reversing it manually to make it look 'right'. The correct hexdump is given below:
$ echo "ﻟﻠﺒﻴﻊ" | hexdump
0000000 bbef ef8a b4bb baef ef92 a0bb bbef 0a9f
0000010
The output of the Perl script (per ikegami's request):
$ perl unicode.pl | od -t x1
0000000 4f 72 69 67 69 6e 61 6c 20 73 74 72 69 6e 67 20
0000020 3a 20 d8 b9 d9 8a d8 a8 d9 84 d9 84 0a 44 65 63
0000040 6f 64 65 64 20 73 74 72 69 6e 67 20 31 3a 20 d8
0000060 b9 d9 8a d8 a8 d9 84 d9 84 0a 44 65 63 6f 64 65
0000100 64 20 73 74 72 69 6e 67 20 32 3a 20 d8 b9 d9 8a
0000120 d8 a8 d9 84 d9 84 0a
0000127
And if I just print $str:
$ perl unicode.pl | od -t x1
0000000 4f 72 69 67 69 6e 61 6c 20 73 74 72 69 6e 67 20
0000020 3a 20 d8 b9 d9 8a d8 a8 d9 84 d9 84 0a
0000035
Finally (per ikegami's request):
$ grep 'For sale' unicode.pl | od -t x1
0000000 6d 79 20 24 73 74 72 20 3d 20 27 d8 b9 d9 8a d8
0000020 a8 d9 84 d9 84 27 3b 20 20 23 20 22 46 6f 72 20
0000040 73 61 6c 65 22 20 0a
0000047
Perl details
$ perl -v
This is perl, v5.10.1 (*) built for x86_64-linux-gnu-thread-multi
(with 53 registered patches, see perl -V for more detail)
Outputting to file reverses the string: "ﻊﻴﺒﻠﻟ"
QUESTIONS
I have several:
How can I preserve the context of each character while printing?
Why is the original string printed out to screen as individual letters, even though it hasn't been 'processed'?
When printing to file, the word is reversed (I'm guessing this is due to the script's right-to-left nature). Is there a way I can prevent this from happening?
Why does the following not hold true: $str !~ /\P{Bidi_Class: Right_To_Left}/;
Source code returned by StackOverflow (as fetched using wget):
... ef bb 9f ef bb a0 ef ba 92 ef bb b4 ef bb 8a ...
U+FEDF ARABIC LETTER LAM INITIAL FORM
U+FEE0 ARABIC LETTER LAM MEDIAL FORM
U+FE92 ARABIC LETTER BEH MEDIAL FORM
U+FEF4 ARABIC LETTER YEH MEDIAL FORM
U+FECA ARABIC LETTER AIN FINAL FORM
perl output I get from the source code returned by StackOverflow:
... ef bb 9f ef bb a0 ef ba 92 ef bb b4 ef bb 8a 0a
... ef bb 9f ef bb a0 ef ba 92 ef bb b4 ef bb 8a 0a
... ef bb 9f ef bb a0 ef ba 92 ef bb b4 ef bb 8a 0a
U+FEDF ARABIC LETTER LAM INITIAL FORM
U+FEE0 ARABIC LETTER LAM MEDIAL FORM
U+FE92 ARABIC LETTER BEH MEDIAL FORM
U+FEF4 ARABIC LETTER YEH MEDIAL FORM
U+FECA ARABIC LETTER AIN FINAL FORM
U+000A LINE FEED
So I get exactly what's in the source, as I should.
perl output you got:
... d8 b9 d9 8a d8 a8 d9 84 d9 84 0a
... d8 b9 d9 8a d8 a8 d9 84 d9 84 0a
... d8 b9 d9 8a d8 a8 d9 84 d9 84 0a
U+0639 ARABIC LETTER AIN
U+064A ARABIC LETTER YEH
U+0628 ARABIC LETTER BEH
U+0644 ARABIC LETTER LAM
U+0644 ARABIC LETTER LAM
U+000A LINE FEED
Ok, so you could have a buggy Perl (one that reverses and changes Arabic characters, and only those), but it's far more likely that your source doesn't contain what you think it does. You need to check what bytes make up your source.
echo output you got:
ef bb 8a ef bb b4 ef ba 92 ef bb a0 ef bb 9f 0a
U+FECA ARABIC LETTER AIN FINAL FORM
U+FEF4 ARABIC LETTER YEH MEDIAL FORM
U+FE92 ARABIC LETTER BEH MEDIAL FORM
U+FEE0 ARABIC LETTER LAM MEDIAL FORM
U+FEDF ARABIC LETTER LAM INITIAL FORM
U+000A LINE FEED
There are significant differences in what you got from perl and from echo, so it's no surprise they show up differently.
Output inspected using:
$ perl -Mcharnames=:full -MEncode=decode_utf8 -E'
say sprintf("U+%04X %s", $_, charnames::viacode($_))
for unpack "C*", decode_utf8 pack "H*", $ARGV[0] =~ s/\s//gr;
' '...'
(Don't forget to swap the bytes of hexdump.)
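The difference between the two dumps is base letters versus Arabic presentation forms, and that relationship can be checked mechanically (a Python sketch; NFKC compatibility normalization folds presentation forms such as U+FEDF ARABIC LETTER LAM INITIAL FORM back to the base letter U+0644):

```python
import unicodedata

# What echo / the StackOverflow page produced: contextual presentation forms.
presentation = "\uFEDF\uFEE0\uFE92\uFEF4\uFECA"
# What the Perl source actually contained: plain base letters.
base = "\u0644\u0644\u0628\u064A\u0639"

# NFKC maps each presentation form to its base letter:
print(unicodedata.normalize("NFKC", presentation) == base)  # True
```

So both dumps encode the "same" word; one has the shaping baked into the code points, the other leaves shaping to the renderer.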
Maybe something is odd with your shell? If I redirect the output to a file, the result is the same. Please try this out:
use strict;
use warnings;
use utf8;
binmode( STDOUT, ':utf8' );
use Encode qw< encode decode >;
my $str = 'ﻟﻠﺒﻴﻊ'; # "For sale"
my $enc = encode( 'UTF-8', $str );
my $dec = decode( 'UTF-8', $enc );
my $decoded = pack 'U0W*', map +ord, split //, $enc;
open(F1, '>', "original.txt") or die;
open(F2, '>', "decoded.txt") or die;
open(F3, '>', "decoded2.txt") or die;
binmode(F1, ':utf8');
binmode(F2, ':utf8');
binmode(F3, ':utf8');
print F1 "$str\n"; # ل ل ب ي ع
print F2 "$dec\n"; # ل ل ب ي ع
print F3 "$decoded\n";

Insert shell code

I have a small question.
Say I have the following code inside a console application:
printf("Enter name: ");
scanf("%s", &name);
I would like to exploit this vulnerability and enter the following shell code (MessageboxA):
6A 00 68 04 21 2F 01 68 0C 21 2F 01 6A 00 FF 15 B0 20 2F 01
How can I enter my shell code (hex values) through the console?
If I enter the input as-is, it treats the digits as characters, not as raw bytes.
Thanks a lot.
You could redirect a file with the desired content to stdin, or use the echo command.
Suppose your shell code is AA BB CC DD (obviously not a valid shellcode):
echo -e "\xAA\xBB\xCC\xDD" | prog
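Equivalently, the raw bytes can be generated and piped from a small script (a sketch in Python; `cat` stands in here for the vulnerable program, and AA BB CC DD is still just a placeholder, not a real shellcode):

```python
import subprocess

# Turn the hex string into raw bytes and feed them to the target's stdin.
payload = bytes.fromhex("AA BB CC DD")   # placeholder, not a real shellcode
proc = subprocess.run(["cat"], input=payload, capture_output=True)
print(proc.stdout)  # b'\xaa\xbb\xcc\xdd' - the raw bytes reached the program
```

Note that, unlike `echo -e`, this does not append a trailing newline to the payload.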