Equivalent of *nix fold in PowerShell - powershell

Today I had a few hundred items (IDs from a SQL query) and needed to paste them into another query in a form an analyst could read. I needed something like the *nix fold command: I wanted to take the 300 lines and reformat them as multiple numbers per line, separated by a space. On *nix I would have used fold -w 100 -s.
Similar tools on *nix include fmt and par.
On Windows, is there an easy way to do this in PowerShell? I expected one of the Format-* cmdlets to do it, but I couldn't find one. I'm using PowerShell v4.
See https://unix.stackexchange.com/questions/25173/how-can-i-wrap-text-at-a-certain-column-size
# Input Data
# simulate a set of about 300 numeric IDs (100001 through 100330)
100001..100330 |
Out-File _sql.txt -Encoding ascii
# I want output like:
# 100001, 100002, 100003, 100004, 100005, ... 100010, 100011
# 100012, 100013, 100014, 100015, 100016, ... 100021, 100022
# each line less than 100 characters.

Depending on how big the file is, you could read it all into memory, join it with spaces, and then split after at least 100 characters at the next space:
(Get-Content C:\Temp\test.txt) -join " " -split '(.{100,}?[ |$])' | Where-Object{$_}
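That regex lazily matches at least 100 characters and then the first space after that. The match is what -split splits on, but since the pattern is wrapped in parentheses the matched text is returned instead of discarded. The Where-Object removes the empty entries that are created between the matches. Note that inside the character class [ |$] the | and $ are literal characters, so the class also matches a pipe or a dollar sign; the short final remainder is returned simply because it is the last split element.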
Small sample to prove theory
@"
134
124
1
225
234
4
34
2
42
342
5
5
2
6
"@.split("`n") -join " " -split '(.{10,}?[ |$])' | Where-Object{$_}
The above splits after 10 characters where possible. If it cannot, the numbers are still preserved. Sample is based on me banging on the keyboard with my head.
134 124 1
225 234 4
34 2 42
342 5 5
2 6
You could then make this into a function to get the simplicity back that you are most likely looking for. It can get better but this isn't really the focus of the answer.
Function Get-Folded {
    Param(
        [string[]]$Strings,
        [int]$Wrap = 50
    )
    # Join everything with spaces, then split at the first space after at least $Wrap characters
    $Strings -join " " -split "(.{$Wrap,}?[ |$])" | Where-Object { $_ }
}
Again with the samples
PS C:\Users\mcameron> Get-Folded -Strings (Get-Content C:\temp\test.txt) -wrap 40
"Lorem ipsum dolor sit amet, consectetur
adipiscing elit, sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua. Ut enim
ad minim veniam, quis nostrud exercitation
... output truncated...
You can see that it was supposed to split on 40 characters but the second line is longer. It split on the next space after 40 to preserve the word.
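For comparison outside PowerShell, the same join-then-split idea can be sketched in Python. This is only an illustration under assumptions: the helper name fold_after is made up here, and the pattern uses an explicit (?: |$) alternation instead of the character class from the answer, so the last chunk can also end at the end of the string.

```python
import re

def fold_after(items, width):
    """Join items with spaces, then break at the first space after at least `width` characters."""
    text = " ".join(str(item) for item in items)
    # A capturing group makes re.split return the matched chunks,
    # much like PowerShell's -split does with a parenthesised pattern.
    pattern = r"(.{%d,}?(?: |$))" % width
    chunks = re.split(pattern, text)
    # Drop the empty strings between matches and trim the trailing space.
    return [chunk.rstrip() for chunk in chunks if chunk.strip()]

# Hypothetical sample input, reusing the numbers from the small sample above.
sample = "134 124 1 225 234 4 34 2 42 342 5 5 2 6".split()
for line in fold_after(sample, 10):
    print(line)
```

As with the PowerShell version, a line may run past the requested width because the break only happens at the next space.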

If it's one item per line, and you want to join every 100 items onto a single line separated by a space you could put all the output into a text file then do this:
gc c:\count.txt -readcount 100 | % {$_ -join " "}
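The batching that -ReadCount performs can be sketched in Python as follows. The function name join_every is hypothetical, and the input is an in-memory list instead of a file.

```python
def join_every(items, count, sep=" "):
    """Group items into runs of `count` and join each run into one line."""
    return [sep.join(items[i:i + count]) for i in range(0, len(items), count)]

# Same ID range as the question's sample data.
ids = [str(n) for n in range(100001, 100331)]
lines = join_every(ids, 10)
print(lines[0])
```

This works well when every item has a similar length; it counts items per line, not characters.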

When I saw this, the first thing that came to my mind was abusing Format-Table to do this, mostly because it knows how to break the lines properly when you specify a width. After coming up with a function, it seems that the other solutions presented are shorter and probably easier to understand, but I figured I'd still go ahead and post this solution anyway:
function fold {
[CmdletBinding()]
param(
[Parameter(ValueFromPipeline)]
$InputObject,
[Alias('w')]
[int] $LineWidth = 100,
[int] $ElementWidth
)
begin {
$SB = New-Object System.Text.StringBuilder
if ($ElementWidth) {
$SBFormatter = "{0,$ElementWidth} "
}
else {
$SBFormatter = "{0} "
}
}
process {
foreach ($CurrentObject in $InputObject) {
[void] $SB.AppendFormat($SBFormatter, $CurrentObject)
}
}
end {
# Format-Table wanted some sort of an object assigned to it, so I
# picked the first static object that popped in my head:
([guid]::Empty | Format-Table -Property @{N="DoesntMatter"; E={$SB.ToString()}; Width = $LineWidth } -Wrap -HideTableHeaders |
Out-String).Trim("`r`n")
}
}
Using it gives output like this:
PS C:\> 0..99 | Get-Random -Count 100 | fold
1 73 81 47 54 41 17 87 2 55 30 91 19 50 64 70 51 29 49 46 39 20 85 69 74 43 68 82 76 22 12 35 59 92
13 3 88 6 72 67 96 31 11 26 80 58 16 60 89 62 27 36 37 18 97 90 40 65 42 15 33 24 23 99 0 32 83 14
21 8 94 48 10 4 84 78 52 28 63 7 34 86 75 71 53 5 45 66 44 57 77 56 38 79 25 93 9 61 98 95
PS C:\> 0..99 | Get-Random -Count 100 | fold -ElementWidth 2
74 89 10 42 46 99 21 80 81 82 4 60 33 45 25 57 49 9 86 84 83 44 3 77 34 40 75 50 2 18 6 66 13
64 78 51 27 71 97 48 58 0 65 36 47 19 31 79 55 56 59 15 53 69 85 26 20 73 52 68 35 93 17 5 54 95
23 92 90 96 24 22 37 91 87 7 38 39 11 41 14 62 12 32 94 29 67 98 76 70 28 30 16 1 61 88 43 8 63
72
PS C:\> 0..99 | Get-Random -Count 100 | fold -ElementWidth 2 -w 40
21 78 64 18 42 15 40 99 29 61 4 95 66
86 0 69 55 30 67 73 5 44 74 20 68 16
82 58 3 46 24 54 75 14 11 71 17 22 94
45 53 28 63 8 90 80 51 52 84 93 6 76
79 70 31 96 60 27 26 7 19 97 1 59 2
65 43 81 9 48 56 25 62 13 85 47 98 33
34 12 50 49 38 57 39 37 35 77 89 88 83
72 92 10 32 23 91 87 36 41
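For readers who want a strict width limit like fold -w N -s, here is a minimal Python sketch using the standard textwrap module. The wrapper name fold is chosen only to mirror the function above; it is not a port of the Format-Table approach.

```python
import textwrap

def fold(items, width=100):
    """Join items with spaces and wrap at word boundaries so no line exceeds `width`."""
    text = " ".join(str(item) for item in items)
    # break_long_words=False keeps an over-long item whole instead of cutting it.
    return textwrap.wrap(text, width=width,
                         break_long_words=False, break_on_hyphens=False)

for line in fold(range(100), width=40):
    print(line)
```

Unlike the regex approach, lines here are broken before the width is exceeded, not after it.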

This is what I ended up using.
# simulate a set of about 300 SQL IDs (100001 through 100330)
100001..100330 |
%{ "$_, " } | # I'll need this decoration in the SQL script
Out-File _sql.txt -Encoding ascii
gc .\_sql.txt -ReadCount 10 | %{ $_ -join ' ' }
Thanks everyone for the effort and the answers. I'm really surprised there wasn't a way to do this with Format-Table without the use of [guid]::Empty in Rohn Edward's answer.
My IDs are much more consistent than the example I gave, so Noah's use of gc -ReadCount is by far the simplest solution in this particular data set, but in the future I'd probably use Matt's answer or the answers linked to by Emperor in comments.

I came up with this:
$array =
(@'
1
2
3
10
11
100
101
'@).split("`n") |
foreach {$_.trim()}
$array = $array * 40
$SB = New-Object Text.StringBuilder(100,100)
foreach ($item in $array) {
Try { [void]$SB.Append("$item ") }
Catch {
$SB.ToString()
[void]$SB.Clear()
[Void]$SB.Append("$item ")
}
}
#don't forget the last line
$SB.ToString()
1 2 3 10 11 100 101 1 2 3 10 11 100 101 1 2 3 10 11 100 101 1 2 3 10 11 100 101 1 2 3 10 11 100 101
1 2 3 10 11 100 101 1 2 3 10 11 100 101 1 2 3 10 11 100 101 1 2 3 10 11 100 101 1 2 3 10 11 100 101
1 2 3 10 11 100 101 1 2 3 10 11 100 101 1 2 3 10 11 100 101 1 2 3 10 11 100 101 1 2 3 10 11 100 101
1 2 3 10 11 100 101 1 2 3 10 11 100 101 1 2 3 10 11 100 101 1 2 3 10 11 100 101 1 2 3 10 11 100 101
1 2 3 10 11 100 101 1 2 3 10 11 100 101 1 2 3 10 11 100 101 1 2 3 10 11 100 101 1 2 3 10 11 100 101
1 2 3 10 11 100 101 1 2 3 10 11 100 101 1 2 3 10 11 100 101 1 2 3 10 11 100 101 1 2 3 10 11 100 101
1 2 3 10 11 100 101 1 2 3 10 11 100 101 1 2 3 10 11 100 101 1 2 3 10 11 100 101 1 2 3 10 11 100 101
Maybe not as compact as you were hoping for, and there may be better ways to do it, but it seems to work.
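The same accumulate-until-full idea can be written without relying on an exception. Below is a rough Python sketch; fold_greedy is a hypothetical name, and the explicit length check stands in for the StringBuilder's maximum capacity.

```python
def fold_greedy(items, max_len=100):
    """Append 'item ' pieces to the current line; start a new line when the next piece would not fit."""
    lines, current = [], ""
    for item in items:
        piece = f"{item} "
        if current and len(current) + len(piece) > max_len:
            lines.append(current)  # flush the full line
            current = piece
        else:
            current += piece
    if current:
        lines.append(current)  # don't forget the last line
    return lines

array = ["1", "2", "3", "10", "11", "100", "101"] * 40
for line in fold_greedy(array):
    print(line)
```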

Related

Change base of whole matrix

I want to change the base of a multiplication table to another base.
If I use
disp(dec2base((1:10).*(1:10)',7))
the numbers come flowing out individually. However I want them to stay in the exact position in the given matrix.
The numerical base is a display issue; numbers are always stored and manipulated in base 2 internally. So all you need to do is write a loop that displays the numbers the way you want. For example:
for ii=1:10
for jj=1:10
fprintf('%6s',dec2base(ii*jj,7));
end
fprintf('\n');
end
Output:
     1     2     3     4     5     6    10    11    12    13
     2     4     6    11    13    15    20    22    24    26
     3     6    12    15    21    24    30    33    36    42
     4    11    15    22    26    33    40    44    51    55
     5    13    21    26    34    42    50    55    63   101
     6    15    24    33    42    51    60    66   105   114
    10    20    30    40    50    60   100   110   120   130
    11    22    33    44    55    66   110   121   132   143
    12    24    36    51    63   105   120   132   144   156
    13    26    42    55   101   114   130   143   156   202
Storing base-7 representation of numbers as string array:
M = (1:10).*(1:10)';
out = strings(size(M));
for jj = 1:size(M,2)
for ii = 1:size(M,1)
out(ii,jj) = dec2base(M(ii,jj) ,7);
end
end
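The point that the base only affects display can also be illustrated outside MATLAB. This Python sketch builds the same 10x10 multiplication table as strings in base 7 while keeping each entry in its matrix position; to_base is a hypothetical helper written here, not a library function.

```python
def to_base(n, base):
    """Return the representation of a non-negative integer in the given base (2..36)."""
    digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    if n == 0:
        return "0"
    out = []
    while n:
        n, remainder = divmod(n, base)
        out.append(digits[remainder])
    return "".join(reversed(out))

# Entry [i-1][j-1] holds i*j written in base 7.
table = [[to_base(i * j, 7) for j in range(1, 11)] for i in range(1, 11)]
for row in table:
    print("".join("%6s" % cell for cell in row))
```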

How to apply an "interface" method to a set of rows in kdb?

Sorry if this is a newbie question again.
I am trying to replicate the functionality of interfaces as seen in c++, rust etc. in kdb as is shown in a simple demonstration below:
q).iface.a.fun:{x*y+z}
q).iface.b.fun:{x*x+y+z}
q)ifaces:`a`b; // for demonstration purposes
q)tab:([]time:`datetime$();kind:`ifaces$();x:`long$();y:`long$();z:`long$());
q)n:10;
q)tab,:flip(n#.z.z;n?ifaces;n?10;n?10;n?10)
Now you would assume that the kind would be able to reference the `a`b fun methods of the iface interface as follows:
q)?[`tab;();0b;`max`ifaceval!((max;`x);(`.iface;`kind;`fun;`x;`y;`z))]
evaluation error:
fun
[0] ?[`tab;();0b;`max`ifaceval!((max;`x);(`.iface;`kind;`fun;`x;`y;`z))]
^
Obviously the functional nature of the select inhibits referencing the fun method on account of the symbol type field declarations.
You can avert this error by using enlist as follows:
q)?[`tab;();0b;`max`ifaceval!((max;`x);(`.iface;`kind;enlist`fun;`x;`y;`z))]
max ifaceval ..
-----------------------------------------------------------------------------..
9 77 154 95 65 0 128 153 126 60 49 77 154 95 65 0 128 153 126 60 49 77 154 ..
However this duplicates the result of fun for each row.
How might one effectively go about this without getting the above malformed responses?
Thanks again.
Selecting ifaceval first will ensure each row is returned. max x is a scalar, which forces all the ifaceval entries into one row. The scalar will be expanded across all rows if a vector column precedes it.
q)?[`tab;();0b;`ifaceval`max!((`.iface;`kind;enlist`fun;`x;`y;`z);(max;`x))]
ifaceval max
-------------------------------------
160 11 126 28 32 60 76 10 112 168 8
96 10 77 24 16 35 60 6 63 104 8
96 10 77 24 16 35 60 6 63 104 8
96 10 77 24 16 35 60 6 63 104 8
96 10 77 24 16 35 60 6 63 104 8
160 11 126 28 32 60 76 10 112 168 8
96 10 77 24 16 35 60 6 63 104 8
160 11 126 28 32 60 76 10 112 168 8
160 11 126 28 32 60 76 10 112 168 8
160 11 126 28 32 60 76 10 112 168 8
I'm not sure if this is exactly what you're looking for though. If you want to calculate ifaceval for each row in the table, this should work.
q)?[tab;();0b;`ifaceval`max!(((';(`.iface;::;enlist`fun));`kind;`x;`y;`z);(max;`x))]
ifaceval max
------------
160 8
10 8
77 8
24 8
16 8
60 8
60 8
10 8
112 8
168 8
One point to make is that it's probably best to avoid using kdb keywords for column names. Although it works in functional queries, it does not for qSQL ones.
q)select max:max x from tab
'assign
[0] select max:max x from tab
^
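The per-row dispatch in the second query can be pictured as a lookup of a function by the row's kind, followed by a call with that row's x, y and z. Here is a small Python sketch of that idea; the rows are made-up sample data, and the lambdas spell out q's right-to-left evaluation, so {x*y+z} is x*(y+z).

```python
# Hypothetical stand-ins for .iface.a.fun and .iface.b.fun.
iface = {
    "a": lambda x, y, z: x * (y + z),      # q: {x*y+z}
    "b": lambda x, y, z: x * (x + y + z),  # q: {x*x+y+z}
}

# Made-up rows; the real table is generated randomly in the question.
rows = [
    {"kind": "a", "x": 2, "y": 3, "z": 4},
    {"kind": "b", "x": 2, "y": 3, "z": 4},
    {"kind": "a", "x": 9, "y": 0, "z": 1},
]

def apply_iface(rows):
    """Call the function registered for each row's kind with that row's arguments."""
    return [iface[row["kind"]](row["x"], row["y"], row["z"]) for row in rows]

ifaceval = apply_iface(rows)
max_x = max(row["x"] for row in rows)  # a scalar, repeated for every row in the q result
print(ifaceval, max_x)
```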

KDB - Find duplicates or similar entries in one column

I'm trying to eliminate duplicate entries for customers in my contact list. Assume my table has three columns (FirstName, LastName, CustomerID).
Can somebody help me create a query that identifies different CustomerIDs with either the same or very similar First and Last Names? We end up with multiple entries due to sales people searching for a name and not finding it due to misspellings. They then create a new entry for the customer with a slightly different spelling of the name.
Thanks!
One approach is to manage a mapping of names to common (mis)spellings and then map all the various spellings back to the intended name. Then group them.
t:([] fn:100?(`John;`Mike;`Bob;`john;`Johnn;`Mick;`Bobby);ln:100?(`Doe;`Smith;`doe;`Do;`smith);id:til 100)
mapFN:exec similar!name from ungroup flip `name`similar!flip (
(`Bob; (`Bob;`bob;`Bobby;`bobby));
(`John; (`John;`Johnn;`john));
(`Mike; (`Mike;`mike;`Mick;`Michael))
);
mapLN:exec similar!name from ungroup flip `name`similar!flip (
(`Doe; (`Doe;`doe;`Do));
(`Smith; (`Smith;`smith;`Smyth))
);
Without mapping:
q)`fn`ln xgroup t
fn ln | id
-----------| ----------------
Mick Do | 0 25 26 50 68 71
Bobby Smith| 1 22 23 83
John Smith| 2 8 48 51 69 85
Mike Doe | 3 44
john doe | ,4
Mick Doe | 5 47 95
John Doe | 6 46 49 63
john Smith| 7 66 74
Johnn doe | 9 13 79 94
Mick doe | 10 20 55 67
Bobby smith| 11 17 18 53
john Doe | 12 21 56
...
With mapping:
q)`fn`ln xgroup update mapFN[fn],mapLN[ln] from t
fn ln | id
----------| -----------------------------------------------------------------
Mike Doe | 0 3 5 10 20 25 26 39 44 47 50 52 55 67 68 70 71 78 95 97
Bob Smith| 1 11 17 18 22 23 30 38 45 53 77 82 83
John Smith| 2 7 8 16 19 33 37 40 43 48 51 64 66 69 73 74 80 85 87
John Doe | 4 6 9 12 13 21 31 32 41 42 46 49 56 57 62 63 65 72 79 81 86 89 91
Bob Doe | 14 24 27 28 35 54 58 59 61 75 76 84
Mike Smith| 15 29 34 36 60 88 90 93 96 98
You could also do something more sophisticated with regex pattern matching.
The mapping would need to be pretty precise though, as otherwise you might end up with false groupings.
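The map-then-group approach is not specific to q. As a rough sketch in Python, with made-up rows and hypothetical helper names (invert, group_ids), it could look like this:

```python
from collections import defaultdict

# Canonical name -> known spellings, mirroring mapFN and mapLN above.
FIRST_NAMES = {
    "Bob": ["Bob", "bob", "Bobby", "bobby"],
    "John": ["John", "Johnn", "john"],
    "Mike": ["Mike", "mike", "Mick", "Michael"],
}
LAST_NAMES = {
    "Doe": ["Doe", "doe", "Do"],
    "Smith": ["Smith", "smith", "Smyth"],
}

def invert(canonical):
    """Turn {name: [spellings]} into {spelling: name}."""
    return {s: name for name, spellings in canonical.items() for s in spellings}

def group_ids(rows, first_map, last_map):
    """Group customer IDs by the canonical (first, last) name pair."""
    groups = defaultdict(list)
    for first, last, customer_id in rows:
        # Assumption: spellings missing from the map are kept as they are.
        key = (first_map.get(first, first), last_map.get(last, last))
        groups[key].append(customer_id)
    return dict(groups)

rows = [("John", "Doe", 0), ("john", "doe", 1), ("Johnn", "Do", 2), ("Mick", "Smith", 3)]
print(group_ids(rows, invert(FIRST_NAMES), invert(LAST_NAMES)))
```

Any group with more than one ID is a candidate set of duplicates to review.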

Matlab: How to replace certain elements of a matrix A by other values of A in both directions?

For a matrix A (10x100000) containing numbers between 1 and 100, how can I interchange some elements of A with other values of A in both directions?
Example:
replace numbers [5 7 9 18 55 4] with [47 78 41 1 99 98] and [47 78 41 1 99 98] with [5 7 9 18 55 4]
Use the two outputs of ismember:
n1 = [1 2 3]; %// first set of numbers
n2 = [4 5 6]; %// second set of numbers
[v1, i1] = ismember(A,n1);
[v2, i2] = ismember(A,n2);
A(v1) = n2(i1(v1));
A(v2) = n1(i2(v2));
Example:
>> A = randi(8,4,5)
A =
2 2 8 4 6
2 5 3 8 2
5 4 3 2 5
4 3 2 3 4
is transformed into
A =
5 5 8 1 3
5 2 6 8 5
2 1 6 5 2
1 6 5 6 1
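The key detail is that both replacements are looked up against the original values, so a swapped element is never swapped back. A minimal Python sketch of the same idea, using a single two-way mapping (swap_values is a hypothetical name):

```python
def swap_values(matrix, first, second):
    """Replace each value of `first` by its partner in `second`, and vice versa."""
    mapping = dict(zip(first, second))
    mapping.update(zip(second, first))
    # Every lookup uses the original value, so there is no double replacement.
    return [[mapping.get(value, value) for value in row] for row in matrix]

A = [
    [2, 2, 8, 4, 6],
    [2, 5, 3, 8, 2],
    [5, 4, 3, 2, 5],
    [4, 3, 2, 3, 4],
]
for row in swap_values(A, [1, 2, 3], [4, 5, 6]):
    print(row)
```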
bsxfun based approach -
%// Input matrix
A = randi(100,10,10)
vec1 = [5 7 9 18 55 4 , 47 78 41 1 99 98]; %// Numbers to be replaced
vec2 = [47 78 4 1 99 98, 5 7 9 18 55 4]; %// Numbers to be used as replacements
[v1,v2] = max(bsxfun(@eq,A(:),vec1),[],2);
A(find(v1)) = vec2(v2(v1))
Sample run -
Input A
A =
27 37 27 59 37 13 55 45 29 16
84 41 58 46 75 39 75 51 49 16
100 37 88 87 71 82 85 54 69 16
65 47 7 67 71 99 17 86 21 9
71 51 45 36 1 87 91 68 61 46
94 92 9 35 38 9 11 81 33 67
69 21 57 26 91 34 75 54 89 84
57 34 54 96 32 24 73 96 14 80
39 58 77 30 60 32 72 7 11 72
64 49 24 16 30 99 14 55 96 48
Output A
A =
27 37 27 59 37 13 99 45 29 16
84 9 58 46 75 39 75 51 49 16
100 37 88 87 71 82 85 54 69 16
65 5 78 67 71 55 17 86 21 4
71 51 45 36 18 87 91 68 61 46
94 92 4 35 38 4 11 81 33 67
69 21 57 26 91 34 75 54 89 84
57 34 54 96 32 24 73 96 14 80
39 58 77 30 60 32 72 78 11 72
64 49 24 16 30 55 14 99 96 48
As can be seen, the 7s from (4,3) and (9,8) in the original A are replaced by 78s and 47 in (4,2) by 5.
Matlab is a strange and mysterious place. Searching through the documentation I found a function called changem in the Mapping toolbox. I've never used it, but apparently if you have your original matrix A and two substitution vectors v1 and v2:
v1 = [ 5 7 9 18 55 4];
v2 = [47 78 41 1 99 98];
All you have to do is:
B = changem(A, [v1 v2], [v2 v1]);

Functional addition of Columns in kdb+q

I have a q table in which the number of non-keyed columns is variable. These column names also contain an integer. I want to apply a function to these columns without using their actual names.
How can I achieve this?
For example:
table:
a | col10 col20 col30
1 | 2     3     4
2 | 5     7     8
// Assume that I have the numbers 10, 20, 30 obtained from the column names
I want something like update NewCol:10*col10+20*col20+30*col30 from table
except that the number of columns is not fixed, and neither are the numbers included in their names.
We want to use a functional update (simple example shown here: http://www.timestored.com/kdb-guides/functional-queries-dynamic-sql#functional-update)
For this particular query we want to generate the computation tree of the select clause, i.e. the last part of the functional update statement. The easiest way to do that is to parse a similar statement then recreate that format:
q)/ create our table
q)t:([] c10:1 2 3; c20:10 20 30; c30:7 8 9; c40:0.1*4 5 6)
q)t
c10 c20 c30 c40
---------------
1 10 7 0.4
2 20 8 0.5
3 30 9 0.6
q)parse "update r:(10*c10)+(20*col20)+(30*col30) from t"
!
`t
()
0b
(,`r)!,(+;(*;10;`c10);(+;(*;20;`col20);(*;30;`col30)))
q)/ notice the last value, the parse tree
q)/ we want to recreate that using code
q){(*;x;`$"c",string x)} 10
*
10
`c10
q){(+;x;y)} over {(*;x;`$"c",string x)} each 10 20
+
(*;10;`c10)
(*;20;`c20)
q)makeTree:{{(+;x;y)} over {(*;x;`$"c",string x)} each x}
/ now write as functional update
q)![t;();0b; enlist[`res]!enlist makeTree 10 20 30]
c10 c20 c30 c40 res
-------------------
1 10 7 0.4 420
2 20 8 0.5 660
3 30 9 0.6 900
q)update r:(10*c10)+(20*c20)+(30*c30) from t
c10 c20 c30 c40 r
-------------------
1 10 7 0.4 420
2 20 8 0.5 660
3 30 9 0.6 900
I think a functional select (as suggested by @Ryan) is the way to go if the table is quite generic, i.e. the column names might vary and the number of columns is unknown.
Yet I prefer the way @JPC uses vectors to solve the multiplication and summation problem, i.e. update res:sum 10 20 30*(col10;col20;col30) from table
Let's combine both approaches, with some extreme cases:
q)show t:1!flip(`a,`$((10?2 3 4)?\:.Q.a),'string 10?10)!enlist[til 100],0N 100#1000?10
a | vltg4 pnwz8 mifz5 pesq7 fkcx4 bnkh7 qvdl5 tl5 lr2 lrtd8
--| -------------------------------------------------------
0 | 3 3 0 7 9 5 4 0 0 0
1 | 8 4 0 4 1 6 0 6 1 7
2 | 4 7 3 0 1 0 3 3 6 4
3 | 2 4 2 3 8 2 7 3 1 7
4 | 3 9 1 8 2 1 0 2 0 2
5 | 6 1 4 5 3 0 2 6 4 2
..
q)show n:"I"$string[cols get t]inter\:.Q.n
4 8 5 7 4 7 5 5 2 8i
q)show c:cols get t
`vltg4`pnwz8`mifz5`pesq7`fkcx4`bnkh7`qvdl5`tl5`lr2`lrtd8
q)![t;();0b;enlist[`res]!enlist({sum x*y};n;enlist,c)]
a | vltg4 pnwz8 mifz5 pesq7 fkcx4 bnkh7 qvdl5 tl5 lr2 lrtd8 res
--| -----------------------------------------------------------
0 | 3 3 0 7 9 5 4 0 0 0 176
1 | 8 4 0 4 1 6 0 6 1 7 226
2 | 4 7 3 0 1 0 3 3 6 4 165
3 | 2 4 2 3 8 2 7 3 1 7 225
4 | 3 9 1 8 2 1 0 2 0 2 186
5 | 6 1 4 5 3 0 2 6 4 2 163
..
You can create a functional form query as @Ryan Hamilton indicated, and overall that will be the best approach since it is very flexible. But if you're just looking to add these up, multiplied by some weight, I'm a fan of going through other avenues.
EDIT: missed that you said the number in the columns name could vary, in which case you can easily adjust this. If the column names are all prefaced by the same number of letters, just drop those and then parse the remaining into int or what have you. Otherwise if the numbers are embedded within text, check out this other question
//Create our table with a random number of columns (up to 9 value columns) and 1 key column
q)show t:1!flip (`$"c",/:string til n)!flip -1_(n:2+first 1?10) cut neg[100]?100
c0| c1 c2 c3 c4 c5 c6 c7 c8 c9
--| --------------------------
28| 3 18 66 31 25 76 9 44 97
60| 35 63 17 15 26 22 73 7 50
74| 64 51 62 54 1 11 69 32 61
8 | 49 75 68 83 40 80 81 89 67
5 | 4 92 45 39 57 87 16 85 56
48| 88 34 55 21 12 37 53 2 41
86| 52 91 79 33 42 10 98 20 82
30| 71 59 43 58 84 14 27 90 19
72| 0 99 47 38 65 96 29 78 13
q)update res:sum (1+til -1+count cols t)*flip value t from t
c0| c1 c2 c3 c4 c5 c6 c7 c8 c9 res
--| -------------------------------
28| 3 18 66 31 25 76 9 44 97 2230
60| 35 63 17 15 26 22 73 7 50 1551
74| 64 51 62 54 1 11 69 32 61 1927
8 | 49 75 68 83 40 80 81 89 67 3297
5 | 4 92 45 39 57 87 16 85 56 2582
48| 88 34 55 21 12 37 53 2 41 1443
86| 52 91 79 33 42 10 98 20 82 2457
30| 71 59 43 58 84 14 27 90 19 2134
72| 0 99 47 38 65 96 29 78 13 2336
q)![t;();0b; enlist[`res]!enlist makeTree 1+til -1+count cols t] ~ update res:sum (1+til -1+count cols t)*flip value t from t
1b
q)\ts do[`int$1e4;![t;();0b; enlist[`res]!enlist makeTree 1+til 9]]
232 3216j
q)\ts do[`int$1e4;update nc:sum (1+til -1+count cols t)*flip value t from t]
69 2832j
I haven't tested this on a large table, so caveat emptor.
Here is another solution which is also faster.
t,'([]res:(+/)("I"$(string tcols) inter\: .Q.n) *' (value t) tcols:(cols t) except keys t)
With some more time, the character count could be reduced as well. The logic goes like this:
a:"I"$(string tcols) inter\: .Q.n
This first extracts the integers from the column names and stores them in a vector. The variable tcols is assigned at the end of the query (q evaluates right to left) and holds the table's columns excluding the key columns.
b:(value t) tcols:(cols t) except keys t
This extracts each column vector.
c:(+/) a *' b
This multiplies each column vector (b) by its integer (a) and adds the corresponding values from each resulting list.
t,'([]res:c)
Finally, this stores the result in a temporary table and joins it to t.
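The three steps (extract the integers from the column names, take the column vectors, multiply and sum per row) can be sketched in Python as well. The table below is a plain dict of column lists using the question's sample values; weighted_sum and its key_cols parameter are hypothetical names used only for this illustration.

```python
import re

def weighted_sum(table, key_cols=("a",)):
    """For each row, sum column value * the integer embedded in the column's name."""
    value_cols = [col for col in table if col not in key_cols]
    # Keep only the digit characters of each name, as `inter\: .Q.n` does.
    weights = [int("".join(re.findall(r"\d", col))) for col in value_cols]
    n_rows = len(table[value_cols[0]])
    return [
        sum(weight * table[col][i] for weight, col in zip(weights, value_cols))
        for i in range(n_rows)
    ]

table = {"a": [1, 2], "col10": [2, 5], "col20": [3, 7], "col30": [4, 8]}
table["res"] = weighted_sum(table)
print(table["res"])
```

Like the q version, this assumes every non-key column name contains at least one digit.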