How to pipe tail -f to iconv command? - encoding

I have a log file encoded in GBK, and I can read it like this:
tail -n 2000 nohup.out | iconv -f gbk -t utf-8
But when I use tail -f, it prints nothing:
tail -f nohup.out | iconv -f gbk -t utf-8

In a similar situation I use a script that reads each line and converts it. In your case:
tail -f nohup.out | iconv.sh
#!/bin/bash
# iconv.sh -- convert each input line from GBK to UTF-8
while IFS= read -r line
do
    echo "$line" | iconv -f gbk -t utf-8
done < "${1:-/dev/stdin}"
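The underlying cause is that iconv block-buffers its output when writing to a pipe instead of a terminal, so with a slow producer like tail -f nothing appears until the buffer fills. If GNU coreutils is available, stdbuf offers a simpler alternative to the per-line loop (a sketch, assuming stdbuf is installed):

```shell
# Force iconv to write unbuffered output (-o0); assumes GNU coreutils stdbuf.
tail -f nohup.out | stdbuf -o0 iconv -f gbk -t utf-8
```

This avoids spawning one iconv process per line, which matters on busy logs.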

Related

xargs lines containing -e and -n processed differently

When running the following command with xargs (GNU findutils) 4.7.0
xargs -n1 <<<"-d -e -n -o"
I get this output
-d
-o
Why are -e and -n not present in the output?
From man xargs:
[...] and executes the command (default is /bin/echo) [...]
So it runs:
echo -d
echo -e
echo -n
echo -o
But from man echo:
-n do not output the trailing newline
-e enable interpretation of backslash escapes
And echo -n outputs nothing, while echo -e outputs only a bare newline, i.e. the empty line you see in the output.
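Since the root cause is echo interpreting its own options, one way to sidestep it is to give xargs an explicit command whose option parsing is not triggered by the data, e.g. printf (a sketch):

```shell
# printf already has '%s\n' as its format string, so -e and -n
# arrive as plain arguments, not options, and are printed verbatim:
printf '%s' '-d -e -n -o' | xargs -n1 printf '%s\n'
```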

how to redirect output from sed command to a file

I have a log that produces lots of text on each line alongside the string I want to extract. Basically it contains something like:
bla bla bla packet 12 out of 432 bla bla
I have this big command:
tail -f log.txt | grep --line-buffered "packet" | sed -n 's/.*\(packet [0-9]* out of [0-9]*\).*/\1/p' | while read log; do echo "$(date +%F_%H:%M:%S:%N) $log"; done
and I want to redirect the output to file.
Why >> file does not work? What am I doing wrong?
The reason for this issue is that sed buffers its output. You already avoid the buffering problem for grep (second in the pipe chain) with the --line-buffered flag; adding the -u (unbuffered) flag to the sed command that follows fixes it there as well. The command should be:
tail -f log.txt | grep --line-buffered "packet" | sed -n -u 's/.*\(packet [0-9]* out of [0-9]*\).*/\1/p' | while read log; do echo "$(date +%F_%H:%M:%S:%N) $log" >> outputfile.log; done
sed -n -u 's/.*\(packet [0-9]* out of [0-9]*\).*/\1/p'
instead of
sed -n 's/.*\(packet [0-9]* out of [0-9]*\).*/\1/p'
With this change the redirect to file should work.
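An equivalent approach, if GNU coreutils is available, is to force line buffering on every stage with stdbuf instead of per-tool flags (a sketch under that assumption):

```shell
# stdbuf -oL makes each stage flush its output at every newline,
# so matches reach the file as soon as they appear in the log.
tail -f log.txt \
  | stdbuf -oL grep "packet" \
  | stdbuf -oL sed -n 's/.*\(packet [0-9]* out of [0-9]*\).*/\1/p' \
  | while read log; do echo "$(date +%F_%H:%M:%S:%N) $log" >> outputfile.log; done
```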

Encoding utf-8 doesn't recognize text

Terminal: screen in xterm on the latest Ubuntu LiveCD.
There is a file whose name displays as ��� �������.avi. When I list the directory, ls, ls -la, and Midnight Commander all show the name as question marks:
$ ls
??? ???????.avi
$ env | grep -i LANG
LANG=en_US.UTF-8
$ export | grep -i LANG
declare -x LANG="en_US.UTF-8"
Looks like these are UTF-16 surrogates, am I right? (en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Surrogates)
When I try to work around it with python3, I get this exception:
for i in os.listdir('.'):
    print(i)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc4' in position 0: surrogates not allowed
I've uploaded a file with an empty body, just the title (4.0K): https://mega.co.nz/#!roYUyQaB!AwOMDznj9DC_wSpAeWqjVj_Oqu2z8Kfk5VsSmFs0ybA
$ echo $'\xc4\xf3\xf5 \xe2\xf0\xe5\xec\xe5\xed\xed' | chardet
<stdin>: MacCyrillic (confidence: 0.92)
$ echo $'\xc4\xf3\xf5 \xe2\xf0\xe5\xec\xe5\xed\xed' | enca -L ru
MS-Windows code page 1251
LF line terminators
$ echo $'\xc4\xf3\xf5 \xe2\xf0\xe5\xec\xe5\xed\xe8' | iconv -f 'Windows-1251'
Дух времени
So the file name is encoded in Windows-1251: you need to set your terminal to Windows-1251 to display it correctly.
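Alternatively, rather than changing the terminal, you can re-encode the names themselves to UTF-8. A minimal sketch with iconv and mv (the dedicated tool convmv does the same more robustly; assumes the names contain no newlines):

```shell
# Re-encode each .avi file name from cp1251 bytes to UTF-8 and rename.
for f in *.avi; do
  new=$(printf '%s' "$f" | iconv -f cp1251 -t utf-8) \
    && mv -- "$f" "$new"
done
```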

How can I check whether a piped content is text with perl

I've written a svn-hook for text files. The content test looks like this:
svnlook cat -t $txn $repos $file 2>/dev/null | file - | egrep -q 'text$'
and I was wondering if this could be done with Perl. However something like this doesn't work:
svnlook cat -t $txn $repos $file 2>/dev/null | perl -wnl -e '-T' -
I'm testing the exit status of this invocation ($?) to see whether the given file was text or binary. Since I'm getting the content out of svn, I can't use perl's normal file test on a path.
I've done a simulation with the file program and perl with a text and binary file (text.txt, icon.png):
find -type f | xargs -i /bin/bash -c 'if $(cat {} | file - | egrep -q "text$"); then echo "{}: text"; else echo "{}: binary"; fi'
./text.txt: text
./icons.png: binary
find -type f | xargs -i /bin/bash -c 'if $(cat {} | perl -wln -e "-T;"); then echo "{}: text"; else echo "{}: binary"; fi'
./text.txt: text
./icons.png: text
You're testing perl's exit code, but you never set it. You need
perl -le'exit(-T STDIN ?0:1)' < file

Match escape sequence for "bold" in console output with grep

Hi, I have lots of logfiles with ^[[1m (as vim displays it) in them. I want to watch a logfile live via
tail -n 1000 -f logfile.log | grep <expression-for-escape-sequence>
and only get lines that have bold in them.
I am not sure which grep options I should use and have tried the following already:
tail -n 1000 -f logfile.log | grep "\033\0133\061\0155"
tail -n 1000 -f logfile.log | grep "\033\01331m"
tail -n 1000 -f logfile.log | grep "\033\[1m"
It does not work though... And yes, there are bold lines in the last 1000 lines of logfile.log; testing with
echo -e "\033\01331mTest\033\01330m" | grep ...
gives the same results... ;)
Appreciate any help!
Use single quotes with a dollar sign in front—as in $'...'—to have the shell convert the \033 escape sequence into an ESC character:
tail -n 1000 -f logfile.log | grep $'\033\[1m'
From man bash:
Words of the form $'string' are treated specially. The word expands to string, with backslash-escaped characters replaced as specified by the ANSI C standard.
This works (in a POSIX shell, not necessarily bash):
echo -e "\033\01331mTest\033\01330m" | grep "$(printf "\x1b\\[1m")"
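If your grep is GNU grep built with PCRE support, the -P flag also understands \x escapes directly, which avoids the shell quoting tricks entirely (assumes GNU grep with -P available):

```shell
# PCRE interprets \x1b itself; [ still needs escaping as a regex metachar.
tail -n 1000 -f logfile.log | grep -P '\x1b\[1m'
```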