crypt() returns null using a salt with a `+` - centos

Why does crypt have a problem with + in the salt as of CentOS 8??
CentOS 7:
Start by creating a salt on a CentOS 7 machine:
[root@localhost map]# /usr/bin/head -c18 /dev/urandom | /usr/bin/openssl base64
M9aZENv5Gm71Y+cwUyaUOcSp
Notice the + in the salt...
Updated salt calculation: cat /dev/urandom | tr -dc 'a-zA-Z0-9./' | head -c16 -> crypt() charset
[root@localhost map]# perl -e 'print crypt("testpwd","\$6\$M9aZENv5Gm71Y+cwUyaUOcSp\$") . "\n"'
$6$M9aZENv5Gm71Y+cw$LBRLY1IpcFLA.gNBe0V4nkbP8qg40fhgFv1rIW/U4wTvthNX/nvhEKsXziCxMKSoSOzlv3ukfeIcrXTP26363/
CentOS 8:
[root@localhost map]# perl -e 'print crypt("testpwd","\$6\$M9aZENv5Gm71Y+cwUyaUOcSp\$") . "\n"'
[root@localhost map]#
So, it's returning NULL on CentOS 8. My guess is that + is no longer valid in a salt using crypt()
It works if you remove the + from the salt. What is going on here and are there other bad salt chars??
UPDATE
I found this post https://serverfault.com/questions/88284/how-is-a-password-hash-encoded-in-the-shadow-password-file where it talks about:
If you only want to know how the password is encoded, crypt() uses a special Base64-type of encoding. Since I was using Base64 instead of crypt() charset, the '+' messed things up because it's not valid in the crypt charset. Still not sure why it's allowed in CentOS 7 vs. 8..
Base64 encoding uses the following charset: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
While the crypt() encoding uses this charset: ./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
So, I will use this now to create a proper salt for passwords using tr to include characters from the crypt() charset.
cat /dev/urandom | tr -dc 'a-zA-Z0-9./' | head -c16
ayHTWqcZIT1h.LLg
Thanks Zak!!
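For anyone who wants to check a salt before handing it to crypt(), here is a minimal Python sketch of the same idea as the tr pipeline above. The helper names make_salt and bad_salt_chars are made up for illustration; the only thing taken from the post is the 64-character crypt() alphabet.

```python
import secrets
import string

# The 64-character alphabet quoted above for crypt() salts.
CRYPT_ALPHABET = "./" + string.digits + string.ascii_uppercase + string.ascii_lowercase


def make_salt(length=16):
    """Return a random salt drawn only from the crypt() alphabet (hypothetical helper)."""
    return "".join(secrets.choice(CRYPT_ALPHABET) for _ in range(length))


def bad_salt_chars(salt):
    """List the characters in `salt` that fall outside the crypt() alphabet."""
    return [ch for ch in salt if ch not in CRYPT_ALPHABET]


if __name__ == "__main__":
    print(make_salt())
    # The Base64-generated salt from the question contains one offending character.
    print(bad_salt_chars("M9aZENv5Gm71Y+cwUyaUOcSp"))
```

Running bad_salt_chars on the original Base64 salt flags only the +, which matches what was observed on CentOS 8.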

Base64 encoding is adding a new line

I'm trying to encode a database string using base64 on the command line in Linux.
Once I do, I add the value to a secret in Kubernetes, but my application fails to connect to the database because the DB string is not accepted. A newline seems to get added when I check the value in Lens, and it is not there in the same secret on a similar cluster.
jdbc:postgresql://test.xxxxxxxx.eu-west-2.rds.amazonaws.com/test
deirdre$ echo jdbc:postgresql://test.xxxxxxxx.eu-west-2.rds.amazonaws.com/test | base64 | tr -d "\n"
amRiYzpwb3N0Z3Jlc3FsOi8vdGVzdC54eHh4eHh4eC5ldS13ZXN0LTIucmRzLmFtYXpvbmF3cy5jb20vdGVzdAo=
Is there something I am doing wrong? or is there an issue with the /?
You can fix this easily with
echo -n "string" | base64
"echo -n" removes the trailing newline character.
You can also see my last answer to the following question:
Kubernetes secrets as environment variable add space character
The problem is that base64 wraps its output at a maximum line width, for compatibility with older systems, which inserts newlines into long encoded values. You can add the -w 0 option to the base64 command so that it no longer wraps lines.
In your example this would be
echo "jdbc:postgresql://test.xxxxxxxx.eu-west-2.rds.amazonaws.com/test" | base64 -w 0
which results in
amRiYzpwb3N0Z3Jlc3FsOi8vdGVzdC54eHh4eHh4eC5ldS13ZXN0LTIucmRzLmFtYXpvbmF3cy5jb20vdGVzdAo=
edit:
printf "%s" jdbc:postgresql://test.xxxxxxxx.eu-west-2.rds.amazonaws.com/test | base64 -w 0
encodes the string without the trailing newline that echo appends, so the result ends in dA== instead of dAo=.
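To see which bytes actually get encoded, a small Python sketch can compare the two cases. This is only a check of the arithmetic, not part of either answer; the encode helper is hypothetical.

```python
import base64

URL = "jdbc:postgresql://test.xxxxxxxx.eu-west-2.rds.amazonaws.com/test"


def encode(text):
    """Base64-encode text as UTF-8 and return the result as a str (hypothetical helper)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


with_newline = encode(URL + "\n")  # the bytes a plain `echo` sends to base64
without_newline = encode(URL)      # the bytes `echo -n` or `printf "%s"` sends

# Only the tail of the encoded value differs.
print(with_newline[-4:], without_newline[-4:])
```

The tail Ao= in the outputs quoted above is the encoded form of the final "t" plus a newline, so a value ending that way still carries the newline inside the secret.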

Formatting file changes encoding on Redhat system

I have a bash script which extracts data from an Oracle database. I use spool to extract the data. After extraction I format the file by removing and replacing some characters. My problem is that after formatting, the files are in ANSI encoding instead of UTF-8.
Extraction with spool. The file is UTF-8.
Format with the cat and tr commands and redirect into another file. This file is ANSI.
The same process works fine on an AIX system. I tried iconv but it doesn't work. Do you have any idea why the encoding changes from UTF-8 to ANSI? How can I correct it?
You should consistently use either ISO-8859-1 or UTF-8. In the latter case, don't use tr as it doesn't (yet?) support multi-byte characters; use sed instead (e.g. sed 's/deletethis//g').
ISO-8859-1:
export LC_CTYPE=fr_FR.ISO-8859-1
export NLS_LANG=French_France.WE8ISO8859P1
# fetch data from Oracle, emulated by the following line
echo 'âêîôû' >test.latin1 # 5 bytes (+lineend)
# perform formatting, eg:
sed 's/ê/[e-circumflex]/g' test.latin1
# or the same with hex-codes:
sed $'s/\xea/[e-circumflex]/g' test.latin1
UTF-8:
export LC_CTYPE=fr_FR.UTF-8
export NLS_LANG=French_France.AL32UTF8
# fetch data from Oracle, emulated by the following line
echo 'âêîôû' >test.utf8 # 10 bytes (+lineend)
# perform formatting, eg:
sed 's/ê/[e-circumflex]/g' test.utf8
# or the same with hex-codes:
sed $'s/\xc3\xaa/[e-circumflex]/g' test.utf8
Note: no conversion (iconv, recode, etc) is required, just make sure NLS_LANG and LC_CTYPE are compatible. (Also, your terminal(emulator) should be set accordingly; for PuTTY it is Configuration/Category/Window/Translation/Remote-character-set.)
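The byte counts in the comments above (5 vs. 10) and the two hex forms of ê can be reproduced with a short Python sketch. It is only an illustration of why the search bytes must match the file's encoding; replace_e_circumflex is a made-up helper.

```python
TEXT = "âêîôû"

latin1 = TEXT.encode("iso-8859-1")  # one byte per character
utf8 = TEXT.encode("utf-8")         # two bytes per character for these letters


def replace_e_circumflex(data, encoding):
    """Replace ê in `data`, using the byte sequence ê has in `encoding`."""
    needle = "ê".encode(encoding)   # b"\xea" in Latin-1, b"\xc3\xaa" in UTF-8
    return data.replace(needle, b"[e-circumflex]")


print(len(latin1), len(utf8))
print(replace_e_circumflex(latin1, "iso-8859-1").decode("iso-8859-1"))
print(replace_e_circumflex(utf8, "utf-8").decode("utf-8"))
```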
Original answer:
I cannot tell what's wrong with the formatting you perform, but here is a method to damage the utf8-encoded text:
$ echo 'ÁRVÍZTŰRŐ TÜKÖRFÚRÓGÉP' | iconv -f iso-8859-2 -t utf-8 | xxd
00000000: c381 5256 c38d 5a54 c5b0 52c5 9020 54c3 ..RV..ZT..R.. T.
00000010: 9c4b c396 5246 c39a 52c3 9347 c389 500a .K..RF..R..G..P.
$ echo 'ÁRVÍZTŰRŐ TÜKÖRFÚRÓGÉP' | iconv -f iso-8859-2 -t utf-8 | tr -d $'\200-\237' | xxd
00000000: c352 56c3 5a54 c5b0 52c5 2054 c34b c352 .RV.ZT..R. T.K.R
00000010: 46c3 52c3 47c3 500a F.R.G.P.
Here the tr -d $'\200-\237' part deleted half of the utf8-sequences (c381 became c3, c590 became c5), rendering the text unusable.
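The same damage can be reproduced byte by byte in Python. This is a sketch that mimics the tr -d call above on the first word only; delete_byte_range is a hypothetical helper.

```python
def delete_byte_range(data, low, high):
    """Delete every byte whose value lies in [low, high], like tr -d on a byte range."""
    doomed = bytes(range(low, high + 1))
    return data.translate(None, doomed)


original = "ÁRVÍZTŰRŐ".encode("utf-8")
damaged = delete_byte_range(original, 0x80, 0x9F)  # octal \200-\237

print(original.hex())
print(damaged.hex())

try:
    damaged.decode("utf-8")
except UnicodeDecodeError as err:
    # Lead bytes such as c3 are now followed by plain ASCII, which is invalid UTF-8.
    print("no longer valid UTF-8:", err.reason)
```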

How to ignore newline in the end of openssl md5 command-line?

I know we can get the right output in below ways:
echo -n 123456 | openssl md5
e10adc3949ba59abbe56e057f20f883e
or
printf 123456 | openssl md5
e10adc3949ba59abbe56e057f20f883e
or
printf 123456 > file.txt
openssl md5 file.txt
e10adc3949ba59abbe56e057f20f883e
However, I want to know whether we can get the same result from the command line below by adding extra options:
openssl md5 <<< '123456'
f447b20a7fcbf53a5d5be013ea0b15af (this is incorrect)
bash (and ksh93, and zsh) will always append a newline to the content of the here-string. There is no way around this apart from filtering it out explicitly.
$ tr -d '\n' <<<'123456' | openssl md5
(stdin)= e10adc3949ba59abbe56e057f20f883e
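If it helps to see the effect of that appended newline in isolation, here is a small Python sketch using hashlib. It is not an openssl option, just a way to compare the two inputs; md5_hex is a made-up helper.

```python
import hashlib


def md5_hex(data):
    """Return the hex MD5 digest of a bytes value."""
    return hashlib.md5(data).hexdigest()


print(md5_hex(b"123456"))    # input as sent by `echo -n` or `printf`
print(md5_hex(b"123456\n"))  # input as sent by a here-string, newline included
```

The two digests differ, which is why the here-string has to be filtered before it reaches openssl.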

Default character encoding for perl file open api?

I am using Perl open to open a new file on Solaris 10 as follows:
open($fh, ">$filePath");
What is default file character encoding on my system with this call?
The output from locale command is given below
LANG=
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=
This was not as easy a question to answer as I thought it would be.
The default encoding is raw, which is suitable for binary data. Any character with an ordinal value under 256 is passed as is:
$ perl -e 'print chr(0xFF)' | od -c
00000000 377
00000001
The curious thing is what happens when you try to write a character above ordinal value 255. Then it looks like you get UTF-8 encoding.
$ perl -e 'print chr(0x100)' | od -c
00000000 304 200
00000002
I don't know where or if this behavior is documented.
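The octal values in the two od dumps are at least consistent with "one raw byte" for 0xFF and "UTF-8" for 0x100. A quick Python sketch of that byte arithmetic (it says nothing about Perl's internals; octal_bytes is a hypothetical helper):

```python
def octal_bytes(data):
    """Format each byte as three octal digits, the way od -c shows non-printables."""
    return " ".join(format(b, "03o") for b in data)


# chr(0xFF) written as a single byte, as in the first od dump.
print(octal_bytes(chr(0xFF).encode("latin-1")))
# chr(0x100) written as UTF-8, as in the second od dump.
print(octal_bytes(chr(0x100).encode("utf-8")))
```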

Finding non-Ascii character [duplicate]

I have several very large XML files and I'm trying to find the lines that contain non-ASCII characters. I've tried the following:
grep -e "[\x{00FF}-\x{FFFF}]" file.xml
But this returns every line in the file, regardless of whether the line contains a character in the range specified.
Do I have the syntax wrong or am I doing something else wrong? I've also tried:
egrep "[\x{00FF}-\x{FFFF}]" file.xml
(with both single and double quotes surrounding the pattern).
You can use the command:
grep --color='auto' -P -n "[\x80-\xFF]" file.xml
This will give you the line number, and will highlight non-ascii chars in red.
In some systems, depending on your settings, the above will not work, so you can grep by the inverse
grep --color='auto' -P -n "[^\x00-\x7F]" file.xml
Note also that the important bit is the -P flag, which equates to --perl-regexp, so grep will interpret your pattern as a Perl regular expression. The grep documentation also says that this is highly experimental and grep -P may warn of unimplemented features.
Instead of making assumptions about the byte range of non-ASCII characters, as most of the above solutions do, it's slightly better IMO to be explicit about the actual byte range of ASCII characters instead.
So the first solution for instance would become:
grep --color='auto' -P -n '[^\x00-\x7F]' file.xml
(which basically greps for any character outside of the hexadecimal ASCII range: from \x00 up to \x7F)
On Mountain Lion that won't work (due to the lack of PCRE support in BSD grep), but with pcre installed via Homebrew, the following will work just as well:
pcregrep --color='auto' -n '[^\x00-\x7F]' file.xml
Any pros or cons that anyone can think of?
The following works for me:
grep -P "[\x80-\xFF]" file.xml
Non-ASCII characters start at 0x80 and go to 0xFF when looking at bytes. Grep (and family) don't do Unicode processing to merge multi-byte characters into a single entity for regex matching as you seem to want. The -P option in my grep allows the use of \xdd escapes in character classes to accomplish what you want.
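The byte-level view described here is easy to sketch in Python, where a bytes pattern never merges multi-byte sequences. This is an illustration, not a replacement for grep; non_ascii_lines is a made-up name.

```python
import re

# Any single byte with the high bit set, i.e. outside 7-bit ASCII.
NON_ASCII = re.compile(rb"[\x80-\xFF]")


def non_ascii_lines(data):
    """Yield (line_number, line) for each line holding a byte >= 0x80."""
    for number, line in enumerate(data.splitlines(), start=1):
        if NON_ASCII.search(line):
            yield number, line


if __name__ == "__main__":
    sample = "<a>plain</a>\n<b>caf\u00e9</b>\n".encode("utf-8")
    for number, line in non_ascii_lines(sample):
        print(number, line)
```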
The easy way is to define a non-ASCII character... as a character that is not an ASCII character.
LC_ALL=C grep '[^ -~]' file.xml
Add a tab after the ^ if necessary.
Setting LC_COLLATE=C avoids nasty surprises about the meaning of character ranges in many locales. Setting LC_CTYPE=C is necessary to match single-byte characters — otherwise the command would miss invalid byte sequences in the current encoding. Setting LC_ALL=C avoids locale-dependent effects altogether.
In perl
perl -ane '{ if(m/[[:^ascii:]]/) { print } }' fileName > newFile
Here is another variant I found that produced completely different results from the grep search for [\x80-\xFF] in the accepted answer. Perhaps it will be useful to someone to find additional non-ascii characters:
grep --color='auto' -P -n "[^[:ascii:]]" myfile.txt
Note: my computer's grep (a Mac) did not have -P option, so I did brew install grep and started the call above with ggrep instead of grep.
Searching for non-printable chars. TLDR; Executive Summary
search for control chars AND extended unicode
locale setting e.g. LC_ALL=C needed to make grep do what you might expect with extended unicode
SO the preferred non-ascii char finders:
$ perl -ne 'print "$. $_" if m/[\x00-\x08\x0E-\x1F\x80-\xFF]/' notes_unicode_emoji_test
as in top answer, the inverse grep:
$ grep --color='auto' -P -n "[^\x00-\x7F]" notes_unicode_emoji_test
as in top answer but WITH LC_ALL=C:
$ LC_ALL=C grep --color='auto' -P -n "[\x80-\xFF]" notes_unicode_emoji_test
. . more . . excruciating detail on this: . . .
I agree with Harvey above buried in the comments, it is often more useful to search for non-printable characters OR it is easy to think non-ASCII when you really should be thinking non-printable. Harvey suggests "use this: "[^\n -~]". Add \r for DOS text files. That translates to "[^\x0A\x20-\x7E]" and add \x0D for CR"
Also, adding -c (show count of patterns matched) to grep is useful when searching for non-printable chars as the strings matched can mess up terminal.
I found adding range 0-8 and 0x0e-0x1f (to the 0x80-0xff range) is a useful pattern. This excludes the TAB, CR and LF and one or two more uncommon printable chars. So IMHO quite a useful (albeit crude) grep pattern is THIS one:
grep -c -P -n "[\x00-\x08\x0E-\x1F\x80-\xFF]" *
ACTUALLY, generally you will need to do this:
LC_ALL=C grep -c -P -n "[\x00-\x08\x0E-\x1F\x80-\xFF]" *
breakdown:
LC_ALL=C - set locale to C, otherwise many extended chars will not match (even though they look like they are encoded > 0x80)
\x00-\x08 - non-printable control chars 0 - 8 decimal
\x0E-\x1F - more non-printable control chars 14 - 31 decimal
\x80-\xFF - non-printable chars >= 128 decimal
-c - print count of matching lines instead of lines
-P - perl style regexps
Instead of -c you may prefer to use -n (and optionally -b) or -l
-n, --line-number
-b, --byte-offset
-l, --files-with-matches
E.g. practical example of use find to grep all files under current directory:
LC_ALL=C find . -type f -exec grep -c -P -n "[\x00-\x08\x0E-\x1F\x80-\xFF]" {} +
You may wish to adjust the grep at times. e.g. BS(0x08 - backspace) char used in some printable files or to exclude VT(0x0B - vertical tab). The BEL(0x07) and ESC(0x1B) chars can also be deemed printable in some cases.
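As a cross-check of what that character class does and does not catch, here is a rough Python equivalent of the grep -c call, working on bytes. count_matching_lines is a hypothetical helper, and the sample data is invented.

```python
import re

# Same class as the grep pattern above: control chars 0-8 and 14-31, plus bytes >= 0x80.
CONTROL_OR_HIGH = re.compile(rb"[\x00-\x08\x0E-\x1F\x80-\xFF]")


def count_matching_lines(data):
    """Count lines containing at least one matching byte, like grep -c."""
    return sum(1 for line in data.split(b"\n") if CONTROL_OR_HIGH.search(line))


if __name__ == "__main__":
    sample = b"plain\ttab\r\nbell\x07\nback\x08space\n\xc2\xa9\n"
    # TAB and CR are not in the class, so the first line does not count.
    print(count_matching_lines(sample))
```

To treat BS as printable, the first range would shrink to \x00-\x07, as suggested above.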
Non-Printable ASCII Chars
** marks PRINTABLE control chars that are sometimes useful to exclude
Dec Hex Ctrl Description                  Dec Hex Ctrl Description
  0 00  ^@   NULL                          16 10  ^P   DATA LINK ESCAPE (DLE)
  1 01  ^A   START OF HEADING (SOH)        17 11  ^Q   DEVICE CONTROL 1 (DC1)
  2 02  ^B   START OF TEXT (STX)           18 12  ^R   DEVICE CONTROL 2 (DC2)
  3 03  ^C   END OF TEXT (ETX)             19 13  ^S   DEVICE CONTROL 3 (DC3)
  4 04  ^D   END OF TRANSMISSION (EOT)     20 14  ^T   DEVICE CONTROL 4 (DC4)
  5 05  ^E   ENQUIRY (ENQ)                 21 15  ^U   NEGATIVE ACKNOWLEDGEMENT (NAK)
  6 06  ^F   ACKNOWLEDGE (ACK)             22 16  ^V   SYNCHRONIZE (SYN)
  7 07  ^G   BELL (BEL)                    23 17  ^W   END OF TRANSMISSION BLOCK (ETB)
  8 08  ^H   BACKSPACE (BS)**              24 18  ^X   CANCEL (CAN)
  9 09  ^I   HORIZONTAL TAB (HT)**         25 19  ^Y   END OF MEDIUM (EM)
 10 0A  ^J   LINE FEED (LF)**              26 1A  ^Z   SUBSTITUTE (SUB)
 11 0B  ^K   VERTICAL TAB (VT)**           27 1B  ^[   ESCAPE (ESC)
 12 0C  ^L   FORM FEED (FF)**              28 1C  ^\   FILE SEPARATOR (FS) RIGHT ARROW
 13 0D  ^M   CARRIAGE RETURN (CR)**        29 1D  ^]   GROUP SEPARATOR (GS) LEFT ARROW
 14 0E  ^N   SHIFT OUT (SO)                30 1E  ^^   RECORD SEPARATOR (RS) UP ARROW
 15 0F  ^O   SHIFT IN (SI)                 31 1F  ^_   UNIT SEPARATOR (US) DOWN ARROW
UPDATE: I had to revisit this recently. And, YMMV depending on terminal settings/solar weather forecast, BUT . . I noticed that grep was not finding many unicode or extended characters. Even though intuitively they should match the range 0x80 to 0xff, 3 and 4 byte unicode characters were not matched. ??? Can anyone explain this? YES. @frabjous asked and @calandoa explained that LC_ALL=C should be used to set locale for the command to make grep match.
e.g. my locale, with LC_ALL= empty:
$ locale
LANG=en_IE.UTF-8
LC_CTYPE="en_IE.UTF-8"
.
.
LC_ALL=
grep with LC_ALL= empty matches 2 byte encoded chars but not 3 and 4 byte encoded:
$ grep -P -n "[\x00-\x08\x0E-\x1F\x80-\xFF]" notes_unicode_emoji_test
5:© copyright c2a9
7:call underscore c2a0
9:CTRL
31:5 © copyright
32:7 call underscore
grep with LC_ALL=C does seem to match all extended characters that you would want:
$ LC_ALL=C grep --color='auto' -P -n "[\x80-\xFF]" notes_unicode_emoji_test
1:���� unicode dashes e28090
3:��� Heart With Arrow Emoji - Emojipedia == UTF8? f09f9298
5:� copyright c2a9
7:call� underscore c2a0
11:LIVE��E! ���������� ���� ���������� ���� �� �� ���� ���� YEOW, mix of japanese and chars from other e38182 e38184 . . e0a487
29:1 ���� unicode dashes
30:3 ��� Heart With Arrow Emoji - Emojipedia == UTF8 e28090
31:5 � copyright
32:7 call� underscore
33:11 LIVE��E! ���������� ���� ���������� ���� �� �� ���� ���� YEOW, mix of japanese and chars from other
34:52 LIVE��E! ���������� ���� ���������� ���� �� �� ���� ���� YEOW, mix of japanese and chars from other
81:LIVE��E! ���������� ���� ���������� ���� �� �� ���� ���� YEOW, mix of japanese and chars from other
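One way to read the difference between the two runs (my assumption about how a UTF-8 locale changes the meaning of \x80-\xFF, not something spelled out above) is that the range is compared against code points in one case and against raw bytes in the other. Python's re module shows the same split between str and bytes patterns:

```python
import re

CODE_POINTS = re.compile(r"[\x80-\xff]")  # str pattern: characters U+0080..U+00FF
RAW_BYTES = re.compile(rb"[\x80-\xff]")   # bytes pattern: any single byte >= 0x80

samples = {"copyright": "\u00a9", "hyphen": "\u2010", "emoji": "\U0001F498"}

for name, ch in samples.items():
    as_text = bool(CODE_POINTS.search(ch))
    as_bytes = bool(RAW_BYTES.search(ch.encode("utf-8")))
    print(name, as_text, as_bytes)
```

Only the copyright sign has a code point inside U+0080..U+00FF, while all three characters contain bytes >= 0x80 once encoded, which lines up with the 2-byte-only matches seen without LC_ALL=C.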
THIS perl match (partially found elsewhere on stackoverflow) OR the inverse grep on the top answer DO seem to find ALL the ~weird~ and ~wonderful~ "non-ascii" characters without setting locale:
$ grep --color='auto' -P -n "[^\x00-\x7F]" notes_unicode_emoji_test
$ perl -ne 'print "$. $_" if m/[\x00-\x08\x0E-\x1F\x80-\xFF]/' notes_unicode_emoji_test
1 ‐‐ unicode dashes e28090
3 💘 Heart With Arrow Emoji - Emojipedia == UTF8? f09f9298
5 © copyright c2a9
7 call underscore c2a0
9 CTRL-H CHARS URK URK URK
11 LIVE‐E! あいうえお かが アイウエオ カガ ᚊ ᚋ ซฌ आइ YEOW, mix of japanese and chars from other e38182 e38184 . . e0a487
29 1 ‐‐ unicode dashes
30 3 💘 Heart With Arrow Emoji - Emojipedia == UTF8 e28090
31 5 © copyright
32 7 call underscore
33 11 LIVE‐E! あいうえお かが アイウエオ カガ ᚊ ᚋ ซฌ आइ YEOW, mix of japanese and chars from other
34 52 LIVE‐E! あいうえお かが アイウエオ カガ ᚊ ᚋ ซฌ आइ YEOW, mix of japanese and chars from other
73 LIVE‐E! あいうえお かが アイウエオ カガ ᚊ ᚋ ซฌ आइ YEOW, mix of japanese and chars from other
The following code works:
find /tmp | perl -ne 'print if /[^[:ascii:]]/'
Replace /tmp with the name of the directory you want to search through.
This method should work with any POSIX-compliant version of awk and iconv.
We can take advantage of file and tr as well.
curl is not POSIX, of course.
Solutions above may be better in some cases, but they seem to depend on GNU/Linux implementations or additional tools.
Just get a sample file somehow:
$ curl -LOs http://gutenberg.org/files/84/84-0.txt
$ file 84-0.txt
84-0.txt: UTF-8 Unicode (with BOM) text, with CRLF line terminators
Search for UTF-8 characters:
$ awk '/[\x80-\xFF]/ { print }' 84-0.txt
or non-ASCII
$ awk '/[^[:ascii:]]/ { print }' 84-0.txt
Convert UTF-8 to ASCII, removing problematic characters (including BOM which should not be in UTF-8 anyway):
$ iconv -c -t ASCII 84-0.txt > 84-ascii.txt
Check it:
$ file 84-ascii.txt
84-ascii.txt: ASCII text, with CRLF line terminators
Tweak it to remove DOS line endings / ^M ("CRLF line terminators"):
$ tr -d '\015' < 84-ascii.txt > 84-tweaked.txt && file 84-tweaked.txt
84-tweaked.txt: ASCII text
This method discards any "bad" characters it cannot deal with, so you may need to sanitize / validate the output. YMMV
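For comparison, the two steps above (drop what ASCII cannot represent, then strip carriage returns) look roughly like this in Python. It assumes the input really is UTF-8, and to_ascii_unix is a made-up name.

```python
def to_ascii_unix(data):
    """Decode UTF-8, drop every non-ASCII character (BOM included), strip CRs."""
    text = data.decode("utf-8")
    ascii_only = text.encode("ascii", errors="ignore")  # similar in spirit to iconv -c -t ASCII
    return ascii_only.replace(b"\r", b"")               # similar to tr -d '\015'


if __name__ == "__main__":
    sample = "\ufeffcaf\u00e9 au lait\r\n".encode("utf-8")
    print(to_ascii_unix(sample))
```

As with the iconv version, characters are silently dropped, so the result may need checking.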
Strangely, I had to do this today! I ended up using Perl because I couldn't get grep/egrep to work (even in -P mode). Something like:
cat blah | perl -ne '/\xCA\xFE\xBA\xBE/ && print "found"'
For unicode characters (like \u2212 in example below) use this:
find . ... -exec perl -CA -e '$ARGV = $ARGV[0]; open IN, $ARGV; binmode(IN, ":utf8"); binmode(STDOUT, ":utf8"); while (<IN>) { next unless /\N{U+2212}/; print "$ARGV: $&: $_"; exit }' '{}' \;
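It could be interesting to know how to search for one unicode character. This command can help. You only need to know the character's code point (with -v, grep prints the lines that do not contain it; drop -v to print the lines that do).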
grep -v $'\u200d'
Finding all non-ascii characters gives the impression that one is either looking for unicode strings or intends to strip said characters individually.
For the former, try one of these (variable file is used for automation):
file=file.txt ; LC_ALL=C grep -Piao '[\x80-\xFF\x20]{7,}' $file | iconv -f $(uchardet $file) -t utf-8
file=file.txt ; pcregrep -iao '[\x80-\xFF\x20]{7,}' $file | iconv -f $(uchardet $file) -t utf-8
file=file.txt ; pcregrep -iao '[^\x00-\x19\x21-\x7F]{7,}' $file | iconv -f $(uchardet $file) -t utf-8
Vanilla grep doesn't work correctly without LC_ALL=C as noted in the previous answers.
ASCII range is x00-x7F, space is x20, since strings have spaces the negative range omits it.
Non-ASCII range is x80-xFF, since strings have spaces the positive range adds it.
String is presumed to be at least 7 consecutive characters within the range. {7,}.
For shell readable output, uchardet $file returns a guess of the file encoding which is passed to iconv for automatic interpolation.
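A rough Python counterpart of the first pattern, for anyone without pcregrep or uchardet at hand. It assumes the encoding is already known (UTF-8 here) rather than guessed; non_ascii_runs is a hypothetical helper.

```python
import re

# Runs of at least 7 bytes that are either >= 0x80 or a space.
RUN = re.compile(rb"[\x80-\xFF\x20]{7,}")


def non_ascii_runs(data, encoding="utf-8"):
    """Return each qualifying run decoded with the given (assumed) encoding."""
    return [run.decode(encoding, errors="replace") for run in RUN.findall(data)]


if __name__ == "__main__":
    sample = "id=42 あいうえお かが end".encode("utf-8")
    print(non_ascii_runs(sample))
```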
if you're trying to grab/grep UTF8-compliant multibyte-characters, use this :
( [\302-\337][\200-\277]|
[\340][\240-\277][\200-\277]|
[\355][\200-\237][\200-\277]|
[\341-\354\356-\357][\200-\277][\200-\277]|
[\360][\220-\277][\200-\277][\200-\277]|
[\361-\363][\200-\277][\200-\277][\200-\277]|
[\364][\200-\217][\200-\277][\200-\277] )
* please delete all newlines, spaces, or tabs in between (..)
* feel free to use interval expressions such as {1,3} to shorten
the repeated [\200-\277]. but don't change them to
[\200-\277]+, as that might result in invalid encodings
due to either insufficient or too many continuation bytes
* although some historical UTF-8 references consider 5- and
6-byte encodings to be valid, as of Unicode 13 only
encodings of up to 4 bytes are considered valid
I've tested this string even against random binary files, and it would report the same multi-byte character count as gnu-wc.
Add in another [\000-\177]| at the front just after ( of that if you need full UTF8 matching string.
This regex is truly hideous, yes, but it's also POSIX-compliant, cross-language and cross-platform compatible (doesn't depend on any special regex notation), (should be) fully UTF-8 compliant (Unicode 13), and completely independent of locale-setting.
if you're running grep with this, please use grep -P
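Here is the same set of alternatives transcribed from octal to hex as a Python bytes pattern, as a sketch for checking it against known inputs. The transcription and the count_multibyte helper are mine, so treat them as an assumption rather than the author's code.

```python
import re

# One alternative per line, in the same order as the octal listing above.
UTF8_MULTIBYTE = re.compile(
    rb"[\xC2-\xDF][\x80-\xBF]"
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"
    rb"|\xED[\x80-\x9F][\x80-\xBF]"
    rb"|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}"
    rb"|\xF0[\x90-\xBF][\x80-\xBF]{2}"
    rb"|[\xF1-\xF3][\x80-\xBF]{3}"
    rb"|\xF4[\x80-\x8F][\x80-\xBF]{2}"
)


def count_multibyte(data):
    """Count well-formed multi-byte UTF-8 sequences in a bytes value."""
    return len(UTF8_MULTIBYTE.findall(data))


if __name__ == "__main__":
    print(count_multibyte("a\u00a9\u2010\U0001F498z".encode("utf-8")))
    print(count_multibyte(b"\xed\xa0\x80"))  # encoded surrogate, not valid UTF-8
```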
If you just need the other bytes, then others have suggested already.
if you need the 11,172 characters of NFC-composed korean hangul it's
(([\352][\260-\277]|[\353\354][\200-\277]|
[\355][\200-\235])[\200-\277]|[\355][\236][\200-\243])
and if you need Japanese hiragana+katakana, it's
([\343]([\201-\203][\200-\277]|[\207][\260-\277]))
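The Hangul count is easy to verify by brute force. The sketch below transcribes both patterns to hex (my transcription, so an assumption) and checks every code point from U+AC00 to U+D7A3; is_hangul_syllable is a made-up helper.

```python
import re

HANGUL = re.compile(
    rb"(?:\xEA[\xB0-\xBF]|[\xEB\xEC][\x80-\xBF]|\xED[\x80-\x9D])[\x80-\xBF]"
    rb"|\xED\x9E[\x80-\xA3]"
)
KANA = re.compile(rb"\xE3(?:[\x81-\x83][\x80-\xBF]|\x87[\xB0-\xBF])")


def is_hangul_syllable(ch):
    """True if the UTF-8 encoding of `ch` is matched in full by the Hangul pattern."""
    return HANGUL.fullmatch(ch.encode("utf-8")) is not None


if __name__ == "__main__":
    matched = sum(is_hangul_syllable(chr(cp)) for cp in range(0xAC00, 0xD7A4))
    print(matched)
    print(bool(KANA.fullmatch("あ".encode("utf-8"))))
```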