Data losing original format - powershell

I am relatively new to PowerShell and am having a strange problem with a script. I have searched the forums and haven't been able to find anything that works.
The issue is that when I convert the output of commands to and from base64 for transport via a custom protocol we use in our environment, it loses its formatting. Commands are executed on the remote systems by passing the command string to IEX and storing the output in a variable. I convert the output to base64 using the following commands:
$Bytes = [System.Text.Encoding]::Unicode.GetBytes($str1)
$EncodedCmd = [Convert]::ToBase64String($Bytes)
At the other end, when we receive the output, we convert it back using the command:
[System.Text.Encoding]::Unicode.GetString([System.Convert]::FromBase64String($EncodedCmd))
The problem is that although the output is correct, its formatting has been lost. For example, if I run the ipconfig command:
Windows IP Configuration Ethernet adapter Local Area Connection 2: Media State . . . . . . . . . . . : Media disconnected Connection-specific DNS Suffix . : Ethernet
adapter Local Area Connection 3: Connection-specific DNS Suffix . : Link-local IPv6 Address . . . . . : fe80::3cd8:3c7f:c78b:a78f%14 IPv4 Address. . . . . . . . . . .
: 192.168.10.64 Subnet Mask . . . . . . . . . . . : 255.255.255.0 Default Gateway . . . . . . . . . : 192.168.10.100 Ethernet adapter Local Area Connection: Connection-sp
ecific DNS Suffix . : IPv4 Address. . . . . . . . . . . : 172.10.15.201 Subnet Mask . . . . . . . . . . . : 255.255.255.0 Default Gateway . . . . . . . . . : 172.10.15
1.200 Tunnel adapter isatap.{42EDCBE-8172-5478-AD67E-8A28273E95}: Media State . . . . . . . . . . . : Media disconnected Connection-specific DNS Suffix . : Tunnel ada
pter isatap.{42EDCBE-8172-5478-AD67E-8A28273E95}: Media State . . . . . . . . . . . : Media disconnected Connection-specific DNS Suffix . : Tunnel adapter isatap.{42EDCBE-8172-5478-AD67E-8A28273E95}: Media State . . . . . . . . . . . : Media disconnected Connection-specific DNS Suffix . : Tunnel adapter Teredo Tunneling Pseudo-Inter
face: Media State . . . . . . . . . . . : Media disconnected Connection-specific DNS Suffix . :
The formatting is all over the place and hard to read. I have played around with it a bit, but I can't find a good way of returning the command output in the correct format. I'd appreciate any ideas on how I can fix the formatting.

What happens here is that the $str1 variable is an array of strings. It contains no newline characters; each line of output is a separate element of the array.
When the array is passed to GetBytes, it is first converted to a single string, and all of its elements are concatenated with a space between them (PowerShell's default $OFS separator). This can be seen easily enough:
$Bytes[43..60] | % { "$_ -> " + [char] $_}
0 ->
105 -> i
0 ->
111 -> o
0 ->
110 -> n
0 ->
32 ->
0 ->
32 ->
0 ->
32 ->
0 ->
69 -> E
0 ->
116 -> t
0 ->
104 -> h
Here the 0 bytes are caused by the two-byte (UTF-16) Unicode encoding. Pay attention to the 32s, which are space characters. So there is only space padding and no line terminators, although the source string looks like this:
Windows IP Configuration
Ethernet
As a solution, either add line break characters or serialize the whole array as XML.
Adding line breaks is done by joining the array elements with -join, using [Environment]::NewLine as the separator. Like so:
$Bytes = [System.Text.Encoding]::Unicode.GetBytes( $($str1 -join [environment]::newline))
$Bytes[46..67] | % { "$_ -> " + [char] $_}
105 -> i
0 ->
111 -> o
0 ->
110 -> n
0 ->
13 ->
0 ->
10 ->
0 ->
13 ->
0 ->
10 ->
0 ->
13 ->
0 ->
10 ->
0 ->
69 -> E
0 ->
116 -> t
0 ->
Here, 13 and 10 are the CR and LF characters that Windows uses as a line break. After adding them, the resulting string looks like the source. Be aware that though it looks the same, it is not the same: the source is an array of strings, while the outcome is a single string containing line breaks.
If you must preserve the original, serialization is the way to go.

sed: delete lines that match a pattern in a given field

I have a tab-delimited file that looks like this:
##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square">
##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
53_344 2 . C G 999 . . GT:PL:DP:DPR
6_56775 67 . T A 999 . . GT:PL:DP:DPR
53_234 78 . CCG GAT 999 . . GT:PL:DP:DPR
45_569 5 . TCCG GTTA 999 . . GT:PL:DP:DPR
3_67687 2 . T G 999 . . GT:PL:DP:DPR
53_569 89 . T G 999 . . GT:PL:DP:DPR
I am trying to use sed to delete all the lines that contain more than one letter in the 4th field (in the case above, lines 7 and 8 from the top). I have tried the following regular expression, but there must be a glitch somewhere that I cannot find:
sed '5,${;/\([^.]*\t\)\{3\}\[A-Z][A-Z]\+\t/d;}' input.vcf>new.vcf
The syntax is as follows:
5,$ #start at line 5 until the end of the file ($)
([^.]*\t) #matching group is any single character followed by a zero or more characters followed by a tab.
{3} #previous block repeated 3 times (presumably for the 4th field)
[A-Z][A-Z]+\t #followed by any string of two letters or more followed by a tab.
Unfortunately, this doesn't work, but I know I am close to making it work. Any hints or help will make this a great teaching moment.
Thanks.
If awk is okay for you, you can use the command below:
awk '(FNR<5){print} (FNR>=5)&&length($4)<=1' input.vcf
The default field delimiter is whitespace; to switch it to tab, put -F"\t" right after awk, for instance awk -F"\t" ....
(FNR<5){print}: FNR is the record number within the current file; when it is less than 5, the whole line is printed.
(FNR>=5) && length($4)<=1 handles the remaining lines and keeps only those whose 4th field has one character or less.
Output:
##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square">
##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
53_344 2 . C G 999 . . GT:PL:DP:DPR
6_56775 67 . T A 999 . . GT:PL:DP:DPR
3_67687 2 . T G 999 . . GT:PL:DP:DPR
53_569 89 . T G 999 . . GT:PL:DP:DPR
You can redirect the output to an output file.
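A variant of the command above, sketched under two assumptions that are not stated in the question: every header line starts with #, and the file is strictly tab-separated. The helper name filter_single_ref and the sample rows are made up for illustration.

```shell
# Hypothetical helper: keep header lines (starting with '#') and data rows
# whose 4th tab-separated field is at most one character long.
filter_single_ref() {
    awk -F'\t' '/^#/ || length($4) <= 1' "$@"
}

# Sample input modelled on the question's file (illustrative only).
awk_filtered=$(
    {
        printf '##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square">\n'
        printf '#CHROM\tPOS\tID\tREF\tALT\n'
        printf '53_344\t2\t.\tC\tG\n'
        printf '53_234\t78\t.\tCCG\tGAT\n'
        printf '3_67687\t2\t.\tT\tG\n'
    } | filter_single_ref
)
printf '%s\n' "$awk_filtered"
```

Matching on the leading # instead of a fixed line count means the filter does not need to change if the number of header lines differs between files.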
$ awk 'NR<5 || $4~/^.$/' file
##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square">
##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
53_344 2 . C G 999 . . GT:PL:DP:DPR
6_56775 67 . T A 999 . . GT:PL:DP:DPR
3_67687 2 . T G 999 . . GT:PL:DP:DPR
53_569 89 . T G 999 . . GT:PL:DP:DPR
Fixed your sed filter (it took me a while; I almost went crazy over it):
5,${/^\([^\t]\+\t\)\{3\}[A-Z][A-Z]\+\t/d}
Your errors:
[^.]*: this matches everything but a dot.
Thanks to Ed, now I know that. I thought the dot had to be escaped, but that does not apply between brackets. Anyhow, since this class can also match a tab character, one group could swallow 2 or 3 fields instead of one, failing to match your line (regexes are greedy by default).
\[A-Z][A-Z]: bad backslash. \[ matches a literal [ character, so the pattern looked for the literal text [A-Z] followed by one or more uppercase letters.
test:
$ sed '5,${/^\([^\t]\+\t\)\{3\}[A-Z][A-Z]\+\t/d}' foo.Txt
##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square">
##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
53_344 2 . C G 999 . . GT:PL:DP:DPR
6_56775 67 . T A 999 . . GT:PL:DP:DPR
3_67687 2 . T G 999 . . GT:PL:DP:DPR
53_569 89 . T G 999 . . GT:PL:DP:DPR
conclusion: to process delimited fields, awk is better :)
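For anyone who still wants sed: \t inside a sed regular expression is an extension (GNU sed accepts it), so a more portable sketch passes a literal tab through a shell variable. It also assumes that header lines start with #, rather than relying on the 5,$ line range. The helper name drop_multi_letter_ref and the sample rows are made up for illustration.

```shell
# Literal tab character, so the pattern does not depend on sed's \t support.
tab=$(printf '\t')

# Hypothetical helper: leave header lines alone, and delete data rows whose
# 4th tab-separated field consists of two or more uppercase letters.
drop_multi_letter_ref() {
    sed -e '/^#/b' \
        -e "/^\([^$tab]*$tab\)\{3\}[A-Z][A-Z][A-Z]*$tab/d" "$@"
}

# Sample rows modelled on the question's file (illustrative only).
sed_filtered=$(
    {
        printf '#CHROM\tPOS\tID\tREF\tALT\n'
        printf '53_344\t2\t.\tC\tG\n'
        printf '45_569\t5\t.\tTCCG\tGTTA\n'
        printf '53_569\t89\t.\tT\tG\n'
    } | drop_multi_letter_ref
)
printf '%s\n' "$sed_filtered"
```

The ^ anchor together with [^tab]* keeps each of the three groups to exactly one field, so the letters being tested are always those of the 4th field.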

Postgres table with select distinct and multiple sub queries

I have a table in Postgres 9.2 with 38 variables and I need a selection of the "best" results.
What I need is:
distinct var1 and var2 then from that:
min var3 and also var4 from that same row
max var5 and if more than one result then where min var3, var6 to var12 from that same row
var13 sorted by conditions (3 first, 6 second 0 last) and also var14-var18 from that same row
v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13 v14 v15 v16 v17 v18 ...
1 1 2 a 2 a . . . . . . 0 . . . . .
1 1 1 b 1 b . . . . . . 3 . . . . .
1 2 4 c 3 c . . . . . . 3 . . . . .
1 2 3 d 4 d . . . . . . 6 . . . . .
2 1 1 a 3 a . . . . . . 3 . . . . .
3 1 3 a 2 a . . . . . . 6 . . . . .
3 1 2 b 4 b . . . . . . 0 . . . . .
4 1 3 a 4 a . . . . . . 3 . . . . .
4 1 6 b 2 b . . . . . . 0 . . . . .
4 2 2 c 2 c . . . . . . 0 . . . . .
4 3 5 d 3 d . . . . . . 3 . . . . .
4 3 4 e 4 e . . . . . . 6 . . . . .
4 3 7 f 4 f . . . . . . 3 . . . . .
...
The result should be:
v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13 v14 v15 v16 v17 v18
1 1 1 b 2 a . . . . . . 3 . . . . .
1 2 3 d 4 d . . . . . . 3 . . . . .
2 1 1 a 3 a . . . . . . 3 . . . . .
3 1 2 b 4 b . . . . . . 6 . . . . .
4 1 3 a 4 a . . . . . . 3 . . . . .
4 2 2 c 2 c . . . . . . 0 . . . . .
4 3 4 e 4 e . . . . . . 3 . . . . .
...
here is also an image of the table where the colored fields show what should be selected:
Hope this makes sense.
EDIT:
Got a pointer in another post to provide CREATE and INSERT for the table.
create table parent (
v1 character varying,
v2 character varying,
v3 character varying,
v4 character varying,
v5 character varying,
v6 character varying,
v7 character varying,
v8 character varying,
v9 character varying,
v10 character varying,
v11 character varying,
v12 character varying,
v13 character varying,
v14 character varying,
v15 character varying,
v16 character varying,
v17 character varying,
v18 character varying
);
insert into parent values('1','1','2','a','2','a','x1','x1','x1','x1','x1','x1','0','x1','x1','x1','x1','x1');
insert into parent values('1','1','1','b','1','b','x2','x2','x2','x2','x2','x2','3','x2','x2','x2','x2','x2');
insert into parent values('1','2','4','c','3','c','x3','x3','x3','x3','x3','x3','3','x3','x3','x3','x3','x3');
insert into parent values('1','2','3','d','4','d','x4','x4','x4','x4','x4','x4','6','x4','x4','x4','x4','x4');
insert into parent values('2','1','1','a','3','a','x1','x1','x1','x1','x1','x1','3','x1','x1','x1','x1','x1');
insert into parent values('3','1','3','a','2','a','x1','x1','x1','x1','x1','x1','6','x1','x1','x1','x1','x1');
insert into parent values('3','1','2','b','4','b','x2','x2','x2','x2','x2','x2','0','x2','x2','x2','x2','x2');
insert into parent values('4','1','3','a','4','a','x1','x1','x1','x1','x1','x1','3','x1','x1','x1','x1','x1');
insert into parent values('4','1','6','b','2','b','x2','x2','x2','x2','x2','x2','0','x2','x2','x2','x2','x2');
insert into parent values('4','2','2','c','2','c','x3','x3','x3','x3','x3','x3','0','x3','x3','x3','x3','x3');
insert into parent values('4','3','5','d','3','d','x4','x4','x4','x4','x4','x4','3','x4','x4','x4','x4','x4');
insert into parent values('4','3','4','e','4','e','x5','x5','x5','x5','x5','x5','6','x5','x5','x5','x5','x5');
insert into parent values('4','3','7','f','4','f','x6','x6','x6','x6','x6','x6','3','x6','x6','x6','x6','x6');

perltidy formatting multilines

I'm trying to get perltidy to format an if statement like this:
if ($self->image eq $_->[1]
and $self->extension eq $_->[2]
and $self->location eq $_->[3]
and $self->modified eq $_->[4]
and $self->accessed eq $_->[5]) {
but no matter what I try, it insists on formatting it like this:
if ( $self->image eq $_->[1]
and $self->extension eq $_->[2]
and $self->location eq $_->[3]
and $self->modified eq $_->[4]
and $self->accessed eq $_->[5]) {
Also, is there any way to get the last line of this block:
$dbh->do("INSERT INTO image VALUES(NULL, "
. $dbh->quote($self->image) . ", "
. $dbh->quote($self->extension) . ", "
. $dbh->quote($self->location) . ","
. $dbh->quote($self->modified) . ","
. $dbh->quote($self->accessed)
. ")");
to jump up to the previous line like the other lines:
$dbh->do("INSERT INTO image VALUES(NULL, "
. $dbh->quote($self->image) . ", "
. $dbh->quote($self->extension) . ", "
. $dbh->quote($self->location) . ","
. $dbh->quote($self->modified) . ","
. $dbh->quote($self->accessed) . ")");
Here is what I'm currently doing:
perltidy -ce -et=4 -l=100 -pt=2 -msc=1 -bar -ci=0 reporter.pm
Thanks.
I don't have much to offer on the 1st question, but for the 2nd, have you considered refactoring it to use placeholders? It would probably format better, automatically do the quoting for you, and give you (and the users of your module) a healthy barrier against SQL injection problems.
my $sth = $dbh->prepare('INSERT INTO image VALUES(NULL, ?, ?, ?, ?, ?)');
$sth->execute(
$self->image, $self->extension, $self->location,
$self->modified, $self->accessed
);
I've also found format skipping (-fs), which protects a specific segment of code from perltidy when it is enclosed between #<<< and #>>> comment lines. I'd put an example here, but the site seems to do a hatchet job on it...

Batch Conversion Domain to IP

I have a list of domains I would like to convert to their respective IP. Let's say:
domain1.com
domain2.com
domain3.com
I would like to get the IP of each of these domains, such as:
domain1.com -> 111.111.111.111
domain2.com -> 222.222.222.222
domain3.com -> 333.333.333.333
I'm using a Perl script I found online and adding my domain list where it says <insert_list>:
echo "
<insert_list>
" | perl -MSocket -lne'
my $address = ( split /:/ )[ 0 ] or next;
my $number = inet_aton $address;
my $ip = inet_ntoa $number;
print "$address -> $ip";
'
This works, but some of the domains on my list have expired and no longer have IPs, in which case I get the following error message:
Bad arg length for Socket::inet_ntoa, length is 0, should be 4 at -e line 4, <> line 9.
I'd like to have a printed list that also tells me if a domain is unassigned. Example:
domain1.com -> 111.111.111.111
domain2.com -> 222.222.222.222
domain3.com -> 333.333.333.333
domain4.com -> Unknown Host
My list is fairly long: I have about 500 domains to organize and clean-up. What would be the best way to get the IP of each domain?
Any help would be greatly appreciated. Thanks!
Try changing the $ip assignment line to:
my $ip = $number ? inet_ntoa $number : "Unknown Host";
This checks the $number variable to see whether it's empty or not, and only if it isn't does it call inet_ntoa to convert the address to a printable form. If $number is empty, it assigns the string "Unknown Host".

Perl LWP::UserAgent mishandling UTF-8 response

When I use LWP::UserAgent to retrieve content encoded in UTF-8 it seems LWP::UserAgent doesn't handle the encoding correctly.
Here's the output after setting the Command Prompt window to Unicode with the command chcp 65001. Note that this initially gives the appearance that all is well, but I think it's just the shell reassembling bytes and decoding UTF-8. From the other output you can see that Perl itself is not handling wide characters correctly.
C:\>perl getutf8.pl
======================================================================
HTTP/1.1 200 OK
Connection: close
Date: Fri, 31 Dec 2010 19:24:04 GMT
Accept-Ranges: bytes
Server: Apache/2.2.8 (Win32) PHP/5.2.6
Content-Length: 75
Content-Type: application/xml; charset=utf-8
Last-Modified: Fri, 31 Dec 2010 19:20:18 GMT
Client-Date: Fri, 31 Dec 2010 19:24:04 GMT
Client-Peer: 127.0.0.1:80
Client-Response-Num: 1
<?xml version="1.0" encoding="UTF-8"?>
<name>Budějovický Budvar</name>
======================================================================
response content length is 33
....v....1....v....2....v....3....v....4
<name>Budějovický Budvar</name>
. . . . v . . . . 1 . . . . v . . . . 2 . . . . v . . . . 3 . . . .
3c6e616d653e427564c49b6a6f7669636bc3bd204275647661723c2f6e616d653e
< n a m e > B u d � � j o v i c k � � B u d v a r < / n a m e >
Above you can see the payload length is 31 characters but Perl thinks it is 33.
For confirmation, in the hex, we can see that the UTF-8 sequences c49b and c3bd are being interpreted as four separate characters and not as two Unicode characters.
Here's the code
#!perl
use strict;
use warnings;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new();
my $response = $ua->get('http://localhost/Bud.xml');
if (! $response->is_success) { die $response->status_line; }
print '='x70,"\n",$response->as_string(), '='x70,"\n";
my $r = $response->decoded_content((charset => 'UTF-8'));
$/ = "\x0d\x0a"; # seems to be \x0a otherwise!
chomp($r);
# Remove any xml prologue
$r =~ s/^<\?.*\?>\x0d\x0a//;
print "Response content length is ", length($r), "\n\n";
print "....v....1....v....2....v....3....v....4\n";
print $r,"\n";
print ". . . . v . . . . 1 . . . . v . . . . 2 . . . . v . . . . 3 . . . . \n";
print unpack("H*", $r), "\n";
print join(" ", split("", $r)), "\n";
Note that Bud.xml is UTF-8 encoded without a BOM.
How can I persuade LWP::UserAgent to do the right thing?
P.S. Ultimately I want to translate the Unicode data into an ASCII encoding, even if it means replacing each non-ASCII character with one question mark or other marker.
Update 1
I have accepted Ysth's "upgrade" answer because I know it is the right thing to do when possible. However, there is a workaround that fixes up the data into a well-formed Perl Unicode string, using decode from the Encode module:
$r = decode("utf8", $r);
Update 2
My data gets fed to a non-Perl application that displays the data using Code Page 437 to Putty/Reflection/Teraterm terminals at many locations. The app is currently displaying something like:
Bud├ä┬øjovick├â┬¢ Budvar
I am going to use ($r = decode("UTF-8", $r)) =~ s/[\x80-\x{FFFF}]/\xFE/g; to get the app to display:
Bud■jovick■ Budvar
Moving away from CP437 would be a major job, so that is not going to happen in the short to medium term.
Update 3
CPAN has some interesting Unicode modules such as:
Text::Unidecode
Unicode::Map8
Unicode::Map
Unicode::Escape
Unicode::Transliterate
Text::Unidecode translated "Budějovický Budvar" into "Budejovicky Budvar" - which didn't seem to me a particularly impressive attempt at a phonetic transliteration but then I don't speak Czech. English speakers might prefer it to "Bud■jovick■ Budvar" though.
Upgrade to a newer libwww-perl. The old version you are using only honored the charset argument to decoded_content for text/* content types; the newer version also does so for application/xml or anything ending in +xml.