How to remove blank lines from a text file using PowerShell

I am building a parser in PowerShell that converts vmstat log dumps to CSV files, which serve as input to a graphing framework (Rickshaw). The file contains repeating 'header' lines that I would like to remove. A data sample is shown below:
Tue Sep 1 14:03:26 2015: procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
Tue Sep 1 14:03:26 2015: r b swpd free buff cache si so bi bo in cs us sy id wa st
Tue Sep 1 14:03:26 2015: 0 1 224412 358316 248772 63286912 0 0 388 267 1 1 8 0 91 1 0
Tue Sep 1 14:03:36 2015: 0 0 224412 357572 248796 63286916 0 0 0 8 220 261 0 0 100 0 0
Tue Sep 1 14:03:46 2015: 0 0 224412 357696 248808 63286916 0 0 0 14 276 293 0 0 100 0 0
Tue Sep 1 14:03:56 2015: 0 0 224412 357688 248808 63286916 0 0 0 13 231 269 0 0 100 0 0
Tue Sep 1 14:04:06 2015: 0 0 224412 357300 248812 63286920 0 0 0 17 266 283 0 0 100 0 0
Tue Sep 1 14:06:56 2015: procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
Tue Sep 1 14:06:56 2015: r b swpd free buff cache si so bi bo in cs us sy id wa st
Tue Sep 1 14:06:56 2015: 1 0 224412 357348 248976 63286928 0 0 0 1 182 231 0 0 100 0 0
Tue Sep 1 14:07:06 2015: 0 0 224412 357348 248980 63286928 0 0 0 9 211 251 0 0 100 0 0
Tue Sep 1 14:07:16 2015: 0 0 224412 357136 248988 63286928 0 0 0 19 287 279 0 0 100 0 0
Tue Sep 1 14:07:26 2015: 0 0 224412 357012 249004 63286928 0 0 0 9 199 244 0 0 100 0 0
Tue Sep 1 14:07:36 2015: 0 0 224412 357080 249012 63286928 0 0 0 7 235 258 0 0 100 0 0
Tue Sep 1 14:10:26 2015: procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
Tue Sep 1 14:10:26 2015: r b swpd free buff cache si so bi bo in cs us sy id wa st
Tue Sep 1 14:10:26 2015: 12 0 224400 351832 265992 62560000 6 0 15 25262 8579 617 96 4 0 0 0
Tue Sep 1 14:10:36 2015: 12 0 224400 379200 266064 62444728 0 0 2 16727 8418 761 97 3 0 0 0
I use this bit of code to get that done.
Get-Content "C:\Projects\Play\Garage\Data_Processing\Sampler.log" | select-string -pattern 'procs|swpd' -notmatch | Out-File "C:\Projects\Play\Garage\Data_Processing\Refined.log"
The resulting file has the desired lines removed, but blank lines are inserted at the beginning and towards the end. Because of this, I am unable to pass the file to the next parsing step. What could I be doing wrong?
Resulting file data:
> [BLANK LINE]
Tue Sep 1 14:03:26 2015: 0 1 224412 358316 248772 63286912 0 0 388 267 1 1 8 0 91 1 0
Tue Sep 1 14:03:36 2015: 0 0 224412 357572 248796 63286916 0 0 0 8 220 261 0 0 100 0 0
Tue Sep 1 14:03:46 2015: 0 0 224412 357696 248808 63286916 0 0 0 14 276 293 0 0 100 0 0
Tue Sep 1 14:03:56 2015: 0 0 224412 357688 248808 63286916 0 0 0 13 231 269 0 0 100 0 0
Tue Sep 1 14:04:06 2015: 0 0 224412 357300 248812 63286920 0 0 0 17 266 283 0 0 100 0 0
Tue Sep 1 14:06:56 2015: 1 0 224412 357348 248976 63286928 0 0 0 1 182 231 0 0 100 0 0
Tue Sep 1 14:07:06 2015: 0 0 224412 357348 248980 63286928 0 0 0 9 211 251 0 0 100 0 0
Tue Sep 1 14:07:16 2015: 0 0 224412 357136 248988 63286928 0 0 0 19 287 279 0 0 100 0 0
Tue Sep 1 14:07:26 2015: 0 0 224412 357012 249004 63286928 0 0 0 9 199 244 0 0 100 0 0
Tue Sep 1 14:07:36 2015: 0 0 224412 357080 249012 63286928 0 0 0 7 235 258 0 0 100 0 0
Tue Sep 1 14:10:26 2015: 12 0 224400 351832 265992 62560000 6 0 15 25262 8579 617 96 4 0 0 0
Tue Sep 1 14:10:36 2015: 12 0 224400 379200 266064 62444728 0 0 2 16727 8418 761 97 3 0 0 0
>[BLANK LINE]
>[BLANK LINE]
>[BLANK LINE]

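Select-String emits MatchInfo objects rather than plain strings, and Out-File renders objects through PowerShell's default formatting, which is most likely where the padding blank lines come from. Replacing Select-String with a simple Where-Object keeps the pipeline as plain strings, so no empty lines are added.
Here's how I would do it: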
Get-Content "C:\Projects\Play\Garage\Data_Processing\Sampler.log" | Where-Object -FilterScript {$_ -notmatch 'procs|swpd'} | Out-File "C:\Projects\Play\Garage\Data_Processing\Refined.log"
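For comparison, the same filtering step can be sketched outside PowerShell. This is a minimal Python illustration, not part of the original answer; the function name `drop_headers` and the inline sample are assumptions made for the example. It keeps every line that does not match `procs|swpd` and also skips blank lines.

```python
import re

# Same pattern the PowerShell command uses to recognise header rows.
HEADER = re.compile(r"procs|swpd")

def drop_headers(lines):
    """Yield lines that are neither vmstat header rows nor blank."""
    for line in lines:
        text = line.rstrip("\r\n")
        if text.strip() and not HEADER.search(text):
            yield text

sample = [
    "Tue Sep 1 14:03:26 2015: procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----",
    "Tue Sep 1 14:03:26 2015: r b swpd free buff cache si so bi bo in cs us sy id wa st",
    "Tue Sep 1 14:03:26 2015: 0 1 224412 358316 248772 63286912 0 0 388 267 1 1 8 0 91 1 0",
    "",
]
cleaned = list(drop_headers(sample))
print("\n".join(cleaned))
```

Only the data row survives; the two header rows and the trailing empty line are dropped.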

Related

HALog - Connect and response times percentiles

When I run the following command to parse haproxy logs, the output doesn't contain any headers, and I'm not able to understand the meanings of the numbers in each of the columns.
Command halog -pct < haproxy.log > percentiles.txt
the output that I see is:
0.1 3493 18 0 0 0
0.2 6986 25 0 0 0
0.3 10479 30 0 0 0
0.4 13972 33 0 0 0
0.5 17465 37 0 0 0
0.6 20958 40 0 0 0
0.7 24451 43 0 0 0
0.8 27944 46 0 0 0
0.9 31438 48 0 0 0
1.0 34931 49 0 0 0
1.1 38424 50 0 0 0
1.2 41917 51 0 0 0
1.3 45410 52 0 0 0
1.4 48903 53 0 0 0
1.5 52396 55 0 0 0
1.6 55889 56 0 0 0
1.7 59383 57 0 0 0
1.8 62876 58 0 0 0
1.9 66369 60 0 0 0
2.0 69862 61 0 0 0
3.0 104793 74 0 0 0
4.0 139724 80 0 1 0
5.0 174656 89 0 1 0
6.0 209587 94 0 1 0
7.0 244518 100 0 1 0
8.0 279449 106 0 1 0
9.0 314380 112 0 1 0
10.0 349312 118 0 1 0
15.0 523968 144 0 1 0
20.0 698624 168 0 1 0
25.0 873280 180 0 2 0
30.0 1047936 190 0 2 0
35.0 1222592 200 0 3 0
40.0 1397248 210 0 3 0
45.0 1571904 220 0 4 0
50.0 1746560 230 0 6 0
55.0 1921216 241 0 7 0
60.0 2095872 258 0 9 0
65.0 2270528 279 0 10 0
70.0 2445184 309 0 16 0
75.0 2619840 354 1 18 0
80.0 2794496 425 1 20 0
85.0 2969152 545 1 22 0
90.0 3143808 761 1 39 1
91.0 3178740 821 1 80 1
92.0 3213671 921 1 217 1
93.0 3248602 1026 1 457 1
94.0 3283533 1190 1 683 1
95.0 3318464 1408 1 889 1
96.0 3353396 1721 1 1107 1
97.0 3388327 2181 1 1328 1
98.0 3423258 2902 1 1555 1
98.1 3426751 3000 1 1580 1
98.2 3430244 3094 1 1607 1
98.3 3433737 3196 1 1635 1
98.4 3437231 3301 1 1666 1
98.5 3440724 3420 1 1697 1
98.6 3444217 3550 1 1731 1
98.7 3447710 3690 1 1770 1
98.8 3451203 3848 1 1815 1
98.9 3454696 4030 1 1864 1
99.0 3458189 4249 1 1923 2
99.1 3461682 4490 1 1993 2
99.2 3465176 4766 2 2089 2
99.3 3468669 5085 2 2195 2
99.4 3472162 5441 3 2317 97
99.5 3475655 5899 5 2440 365
99.6 3479148 6517 11 2567 817
99.7 3482641 7403 14 2719 1555
99.8 3486134 8785 16 2992 2779
99.9 3489627 11650 997 3421 4931
100.0 3493121 85004 4008 20914 71716
The first column looks to be the percentile (like P50, P90, P99, etc.), but what are the values in the 2nd, 3rd, 4th, 5th and 6th columns? Also, are they total values (halog reports total times when given other options), average values, or maximum values?
<percentile> <request count> <Request Time*> <Connect Time**> <Response Time***> <Data Time****>
* Referred to as TR in the documentation.
** Referred to as Tc in the documentation.
*** Referred to as Tr in the documentation.
**** Referred to as Td in the documentation.
The source provides some good pointers.
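Assuming the column order given in the answer above, each output line can be read into named fields. This is only a sketch: the class name `PctRow` and the field names are illustrative labels chosen for this example, not identifiers used by halog.

```python
from typing import NamedTuple

class PctRow(NamedTuple):
    # Field names are illustrative; the order follows the answer above.
    percentile: float
    request_count: int
    request_time: int
    connect_time: int
    response_time: int
    data_time: int

def parse_pct_line(line):
    """Split one whitespace-separated `halog -pct` line into named fields."""
    p, count, req, conn, resp, data = line.split()
    return PctRow(float(p), int(count), int(req), int(conn), int(resp), int(data))

row = parse_pct_line("99.0 3458189 4249 1 1923 2")
print(row.percentile, row.request_time, row.response_time)
```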

Changing index of matrix

I'm trying to change the following code so that the first matrix will become the second matrix:
function BellTri = matrix(n)
    BellTri = zeros(n);
    BellTri(1,1) = 1;
    for i = 2:n
        BellTri(i,1) = BellTri(i-1,i-1);
        for j = 2:i
            BellTri(i,j) = BellTri(i-1,j-1) + BellTri(i,j-1);
        end
    end
    BellTri
First matrix (when n = 7)
1 0 0 0 0 0 0
1 2 0 0 0 0 0
2 3 5 0 0 0 0
5 7 10 15 0 0 0
15 20 27 37 52 0 0
52 67 87 114 151 203 0
203 255 322 409 523 674 877
Second matrix
1 1 2 5 15 52 877
1 3 10 37 151 674 0
2 7 27 114 523 0 0
5 20 87 409 0 0 0
15 67 322 0 0 0 0
52 255 0 0 0 0 0
203 0 0 0 0 0 0
An option is to cyclically permute the columns using circshift.
function [BellTri, Second] = matrix(n)
    BellTri = zeros(n);
    BellTri(1,1) = 1;
    for i = 2:n
        BellTri(i,1) = BellTri(i-1,i-1);
        for j = 2:i
            BellTri(i,j) = BellTri(i-1,j-1) + BellTri(i,j-1);
        end
    end
    Second = BellTri;
    for i = 1:n
        Second(:, i) = circshift(Second(:,i), 1-i);
    end
    for i = n-1:-1:2
        Second(1, i) = Second(1, i-1);
    end
end
Input: [BellTri, Second] = matrix(7)
Output:
BellTri =
1 0 0 0 0 0 0
1 2 0 0 0 0 0
2 3 5 0 0 0 0
5 7 10 15 0 0 0
15 20 27 37 52 0 0
52 67 87 114 151 203 0
203 255 322 409 523 674 877
Second =
1 1 2 5 15 52 877
1 3 10 37 151 674 0
2 7 27 114 523 0 0
5 20 87 409 0 0 0
15 67 322 0 0 0 0
52 255 0 0 0 0 0
203 0 0 0 0 0 0
One approach:
out = zeros(size(A));
out(logical(fliplr(triu(ones(size(A,1)))))) = A(logical(tril(ones(size(A,1)))));
Note: As Divakar pointed out, there appears to be a typo in the first row of the expected second matrix. This method produces the corrected version.
Results:
A = [1 0 0 0 0 0 0;
1 2 0 0 0 0 0;
2 3 5 0 0 0 0;
5 7 10 15 0 0 0;
15 20 27 37 52 0 0;
52 67 87 114 151 203 0;
203 255 322 409 523 674 877];
>> out
out =
1 2 5 15 52 203 877
1 3 10 37 151 674 0
2 7 27 114 523 0 0
5 20 87 409 0 0 0
15 67 322 0 0 0 0
52 255 0 0 0 0 0
203 0 0 0 0 0 0
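The column-shifting idea behind both answers can be checked with a short sketch. The Python below is an illustration only (the function names `bell_triangle` and `shift_columns_up` are made up for this example). It rebuilds the triangle with the same recurrence as the MATLAB code, then moves each column up by its zero-based column index, which gives the corrected matrix shown above.

```python
def bell_triangle(n):
    """Build the n-by-n lower-triangular Bell triangle (zero-based indices)."""
    tri = [[0] * n for _ in range(n)]
    tri[0][0] = 1
    for i in range(1, n):
        # Each row starts with the last entry of the previous row.
        tri[i][0] = tri[i - 1][i - 1]
        for j in range(1, i + 1):
            tri[i][j] = tri[i - 1][j - 1] + tri[i][j - 1]
    return tri

def shift_columns_up(tri):
    """Move column j up by j rows, so the diagonal becomes the first row."""
    n = len(tri)
    out = [[0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n - j):
            out[i][j] = tri[i + j][j]
    return out

tri = bell_triangle(7)
second = shift_columns_up(tri)
for row in second:
    print(" ".join(str(v) for v in row))
```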

2D vs 3D FFT in Matlab/Octave

Say I have this matrix in memory and I want to calculate the 3D FFT
T =
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
16 17 18 19
20 21 22 23
24 25 26 27
28 29 30 31
32 33 34 35
36 37 38 39
40 41 42 43
44 45 46 47
44 45 46 47
52 53 54 55
56 57 58 59
60 61 62 63
real(fft2(T))
ans =
2000 -32 -32 -32
-128 0 0 0
-112 0 0 0
-128 0 0 0
-144 0 0 0
-128 0 0 0
-112 0 0 0
-128 0 0 0
-144 0 0 0
-128 0 0 0
-112 0 0 0
-128 0 0 0
-144 0 0 0
-128 0 0 0
-112 0 0 0
-128 0 0 0
real(fftn(T))
ans =
2000 -32 -32 -32
-128 0 0 0
-112 0 0 0
-128 0 0 0
-144 0 0 0
-128 0 0 0
-112 0 0 0
-128 0 0 0
-144 0 0 0
-128 0 0 0
-112 0 0 0
-128 0 0 0
-144 0 0 0
-128 0 0 0
-112 0 0 0
-128 0 0 0
Why am I getting the same result? How can 3D FFTs be done in Matlab/Octave?
A 3D FFT should be applied to a 3D array. If you apply it to a 2D array, you get the same result as a 2D FFT, because the array has no third dimension (its size along that dimension is 1). To get a true 3D FFT, pass fftn an array that actually has three dimensions.
Think about it this way: an N-dimensional FFT is just a sequence of 1-dimensional FFTs, applied along each dimension in turn. A 1-dimensional FFT of a single element returns that element unchanged, so the transform along the missing third dimension does nothing.
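The argument can be checked numerically with a naive DFT. The Python sketch below is illustrative only: it uses a small 4-by-4 matrix rather than the matrix from the question, the helper names `dft`, `dft2` and `dft3` are made up for this example, and the depth axis is placed first purely for convenience. Wrapping the 2D data as a volume of depth 1 and transforming along all three axes gives the same values as the 2D transform.

```python
import cmath

def dft(seq):
    """Naive 1-D discrete Fourier transform of a list of numbers."""
    n = len(seq)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * m / n) for m, x in enumerate(seq))
        for k in range(n)
    ]

def dft2(rows):
    """2-D DFT: transform every row, then every column."""
    by_row = [dft(r) for r in rows]
    by_col = [dft(list(col)) for col in zip(*by_row)]
    return [list(r) for r in zip(*by_col)]

def dft3(volume):
    """3-D DFT of volume[depth][row][col]: 2-D DFT per slice, then along depth."""
    slices = [dft2(s) for s in volume]
    depth, n_rows, n_cols = len(slices), len(slices[0]), len(slices[0][0])
    out = [[[0j] * n_cols for _ in range(n_rows)] for _ in range(depth)]
    for r in range(n_rows):
        for c in range(n_cols):
            line = dft([slices[d][r][c] for d in range(depth)])
            for d in range(depth):
                out[d][r][c] = line[d]
    return out

T = [[4 * r + c for c in range(4)] for r in range(4)]
two_d = dft2(T)
three_d = dft3([T])  # same data, wrapped as a volume with depth 1

max_diff = max(
    abs(three_d[0][r][c] - two_d[r][c]) for r in range(4) for c in range(4)
)
print(max_diff)
```

The length-1 transform along the depth axis is the identity, so the difference between the two results is zero up to floating-point noise.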

replace zero with text using sed or awk

I have a text file that looks like this:
0 chr23:54039 0 54039
0 chr23:103278 0 103278
0 chr22:174609 0 174609
0 chr22:54039 0 54039
0 chr25:103278 0 103278
0 chr25:174609 0 174609
26 chr26:174609 0 174609
If the first column is '0', I need to replace that 0 with the number after chr. So, the output should look like:
23 chr23:54039 0 54039
23 chr23:103278 0 103278
22 chr22:174609 0 174609
22 chr22:54039 0 54039
25 chr25:103278 0 103278
25 chr25:174609 0 174609
26 chr26:174609 0 174609
Can anyone provide a simple solution using sed, awk, or any other Linux tool?
If the number in column #1 is always the same as the chr number, you can do this with awk:
awk '{split($2,a,":|chr");$1=a[2]}1' file
23 chr23:54039 0 54039
23 chr23:103278 0 103278
22 chr22:174609 0 174609
22 chr22:54039 0 54039
25 chr25:103278 0 103278
25 chr25:174609 0 174609
26 chr26:174609 0 174609
With sed:
$ sed -r '/^0/s/0(\s*chr)([^:]*)/\2\1\2/g' file
23 chr23:54039 0 54039
23 chr23:103278 0 103278
22 chr22:174609 0 174609
22 chr22:54039 0 54039
25 chr25:103278 0 103278
25 chr25:174609 0 174609
26 chr26:174609 0 174609
Without -r:
$ sed '/^0/s/0\(\s*chr\)\([^:]*\)/\2\1\2/g' file
23 chr23:54039 0 54039
23 chr23:103278 0 103278
22 chr22:174609 0 174609
22 chr22:54039 0 54039
25 chr25:103278 0 103278
25 chr25:174609 0 174609
26 chr26:174609 0 174609
The idea is to replace lines starting with 0. In those, the 0...chrNUM:... is caught and printed back with desired format.
With awk:
$ awk '/^0/ {split($2,a,":"); gsub("chr", "", a[1]); $1=a[1]}1' file
23 chr23:54039 0 54039
23 chr23:103278 0 103278
22 chr22:174609 0 174609
22 chr22:54039 0 54039
25 chr25:103278 0 103278
25 chr25:174609 0 174609
26 chr26:174609 0 174609
For lines starting with 0, the 2nd field is split into pieces on the : delimiter and the chr text is then removed from the first piece. The remaining number is stored as the first field. The trailing 1 is an always-true condition, so every line, modified or not, is printed.
sed "s/^0[[:blank:]]\{1,\}chr\([0-9]\{1,\}\):/\1 chr\1:/"
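The same substitution can be sketched with a regular expression in Python. This is an illustration rather than one of the original answers; the names `PATTERN` and `fill_chromosome` are assumptions for the example. Unlike the `/^0/` address in the sed commands, this pattern only matches when the first field is exactly 0 followed by whitespace.

```python
import re

# "0", whitespace, "chr", then the chromosome number up to the colon.
PATTERN = re.compile(r"^0(\s+chr)(\d+):")

def fill_chromosome(line):
    """Replace a leading 0 with the number that follows 'chr' in field 2."""
    return PATTERN.sub(r"\2\1\2:", line)

lines = [
    "0 chr23:54039 0 54039",
    "0 chr25:174609 0 174609",
    "26 chr26:174609 0 174609",
]
fixed = [fill_chromosome(line) for line in lines]
print("\n".join(fixed))
```

Lines whose first column is already non-zero pass through unchanged.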

need to insert datetime in every row of vmstat output

I need to append the date and time to every vmstat line that contains values.
I can create a function like this:
function insert_datetime {
    while read line
    do
        printf "$line"
        date '+ %m-%d-%Y %H:%M:%S'
    done
}
then call vmstat as below:
vmstat 3 5 | insert_datetime
but this appends the date and time to every line, including the rows of dashes (--) and the rows that contain only text. How can I exclude the rows that have dashes and text?
kthr memory page faults cpu 04-23-2013 10:19:49
----- ----------- ------------------------ ------------ ----------------------- 04-23-2013 10:19:49
r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec 04-23-2013 10:19:49
0 0 45688088 4094129 0 0 0 0 0 0 45 12172 2840 1 1 99 0 0.35 2.2 04-23-2013 10:19:49
2 0 45694135 4088082 0 0 0 0 0 0 451 56350 21818 3 1 97 0 0.73 4.5 04-23-2013 10:19:52
1 0 45694137 4088061 0 0 0 0 0 0 303 24568 951 3 1 96 0 0.82 5.1 04-23-2013 10:19:55
1 0 45694138 4087739 0 0 0 0 0 0 445 9170 1504 2 0 98 0 0.64 4.0 04-23-2013 10:19:58
4 0 45703145 4078732 0 0 0 0 0 0 335 47175 1306 4 1 95 0 1.01 6.3 04-23-2013 10:20:01
I need it to look like this:
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------------------
r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec
0 0 45688088 4094129 0 0 0 0 0 0 45 12172 2840 1 1 99 0 0.35 2.2 04-23-2013 10:19:49
2 0 45694135 4088082 0 0 0 0 0 0 451 56350 21818 3 1 97 0 0.73 4.5 04-23-2013 10:19:52
1 0 45694137 4088061 0 0 0 0 0 0 303 24568 951 3 1 96 0 0.82 5.1 04-23-2013 10:19:55
1 0 45694138 4087739 0 0 0 0 0 0 445 9170 1504 2 0 98 0 0.64 4.0 04-23-2013 10:19:58
4 0 45703145 4078732 0 0 0 0 0 0 335 47175 1306 4 1 95 0 1.01 6.3 04-23-2013 10:20:01
Why not just use vmstat -t? It seems to be exactly what you are looking for. Here is some sample output:
[root@web5 vmstat]# vmstat -t 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ ---timestamp---
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 15704 193236 189628 595868 9 3 25 16 15 20 11 1 88 1 0 2013-05-22 13:32:36 JST
0 0 15704 193212 189628 595868 0 0 0 0 22 20 0 0 100 0 0 2013-05-22 13:32:37 JST
0 0 15704 193212 189628 595868 0 0 0 0 19 12 0 0 100 0 0 2013-05-22 13:32:38 JST
0 0 15704 193212 189628 595868 0 0 0 0 10 11 0 0 100 0 0 2013-05-22 13:32:39 JST
0 0 15704 193212 189628 595868 0 0 0 96 34 25 0 1 99 0 0 2013-05-22 13:32:40 JST
0 0 15704 193212 189628 595868 0 0 0 0 10 9 0 0 100 0 0 2013-05-22 13:32:41 JST
0 0 15704 193212 189628 595868 0 0 0 0 14 23 0 0 100 0 0 2013-05-22 13:32:42 JST
executed on CentOS6.3 with procps 3.2.8
[root@web5 uptime]# vmstat -V
procps version 3.2.8
Use awk (strftime is an extension provided by gawk and some other implementations, not part of POSIX awk):
vmstat 3 5 | awk '/^ *[0-9]/{$0=$0 " " strftime("%m-%d-%Y %T")};1'
Try:
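function insert_datetime {
    # IFS= and -r keep leading whitespace and backslashes intact
    while IFS= read -r line
    do
        # print the line as data, not as a printf format string
        printf '%s' "$line"
        # only rows that start with a number get a timestamp
        if [[ "$line" =~ ^[[:space:]]*[0-9] ]]; then
            date '+ %m-%d-%Y %H:%M:%S'
        else
            echo
        fi
    done
}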
sed can do this too, without depending on shell-specific syntax (note that the e flag used here is a GNU sed extension):
vmstat 3 5 | sed '/^ *[0-9].*/s/.*/printf "&";date "+ %m-%d-%Y %H:%M:%S"/e'
Every line starting with a number gets the date appended in the required format.
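The rule shared by the awk, bash and sed answers (stamp only rows that begin with a number) can also be sketched in Python. This is an illustration under assumptions: the names `DATA_ROW` and `stamp` are made up for the example, and the optional `now` argument exists only so the behaviour can be shown with a fixed time.

```python
import re
from datetime import datetime

# A data row starts with optional whitespace followed by a digit.
DATA_ROW = re.compile(r"^\s*\d")

def stamp(line, now=None):
    """Append a timestamp to data rows; return header rows unchanged."""
    text = line.rstrip("\n")
    if DATA_ROW.match(text):
        now = now or datetime.now()
        return f"{text} {now.strftime('%m-%d-%Y %H:%M:%S')}"
    return text

fixed_time = datetime(2013, 4, 23, 10, 19, 49)
data = "0 0 45688088 4094129 0 0 0 0 0 0 45 12172 2840 1 1 99 0 0.35 2.2"
header = "kthr memory page faults cpu"
print(stamp(header, fixed_time))
print(stamp(data, fixed_time))
```

To apply it to live output, each line read from standard input could be passed through `stamp` and printed as it arrives.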