PowerShell: Incremental Counter in Foreach Loop

I have a foreach loop that iterates over an array and calls a function. That function contains another foreach loop with an incremental counter, but it doesn't seem to work as expected.
Array contents:
| Username | Username2 |
|----------|-----------|
| p1 | p2 |
| p3 | p4 |
Code:
function insertIntoLunchJobs($arrayOfRows) {
    $counter = 1
    foreach ($i in $arrayOfRows) {
        $i
        $counter++
        $counter
    }
}
Output:
| Username | Username2 |
|----------|-----------|
| p1 | p2 |
| 2 | |
| p3 | p4 |
| 2 | |
Desired result:
| Username | Username2 |
|----------|-----------|
| p1 | p2 |
| 2 | |
| p3 | p4 |
| 3 | |
Any ideas?
TIA

I'm literally copy-pasting your code, and I don't see any errors here:
$arr = @'
Username,Username2
p1,p2
p3,p4
'@ | ConvertFrom-Csv
function insertIntoLunchJobs($arrayOfRows) {
    $counter = 1
    foreach ($i in $arrayOfRows) {
        $i
        $counter++
        $counter
    }
}
insertIntoLunchJobs -arrayOfRows $arr
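For what it's worth, the symptom in the question (the counter printing 2 after every row) is what you would see if the function were called once per row instead of once for the whole array, since $counter is re-initialized on every call. A minimal sketch of that guess, reusing the $arr defined above; the outer loop here is hypothetical:
# Hypothetical caller: invoking the function inside an outer foreach
# passes a single row each time, so $counter restarts at 1 on every
# call and always prints 2.
foreach ($row in $arr) {
    insertIntoLunchJobs -arrayOfRows $row
}

# Passing the whole array in one call lets the counter accumulate
# across rows, printing 2 and then 3 as desired.
insertIntoLunchJobs -arrayOfRows $arr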

Related

Why can a memory leak caused by a cross-reference be solved by explicit reassignment in Perl?

A cross-reference causes a memory leak in Perl like this:
{
    my @a = qw(a b c);
    my @b = qw(a b c);
    # both reference counts are 1
    push @a, \@b;
    # @b's reference count is 2 (from @b and via @a)
    push @b, \@a;
}
# @b's reference count is 2 (via @a)
I understand the memory leak caused by the cross-reference in this situation.
But the leak can be resolved by explicit reassignment like this:
{
    my @a = qw(a b c);
    my @b = qw(a b c);
    # both reference counts are 1
    push @a, \@b;
    # @b's reference count is 2 (from @b and via @a)
    push @b, \@a;
    @a = ();
}
# why is @b's reference count 0?
@a is lexically scoped, so I would think its reference becomes invalid at the end of the block even without the reassignment; yet the former leaks memory and the latter does not. Why?
You start with
@a                         @b
|   ARRAY                  |   ARRAY
|   REFCNT=2               |   REFCNT=2
+-->+-----------+          +-->+-----------+
|   | +-------+ |          |   | +-------+ |
|   | | a     | |          |   | | a     | |
|   | +-------+ |          |   | +-------+ |
|   | | b     | |          |   | | b     | |
|   | +-------+ |          |   | +-------+ |
|   | | c     | |          |   | | c     | |
|   | +-------+ |          |   | +-------+ |
|   | | -------------------+   | | ----------+
|   | +-------+ |              | +-------+ | |
|   +-----------+              +-----------+ |
|                                            |
+--------------------------------------------+
If you were to exit the scope here, the reference counts would drop to one, and they would leak.
After @a = ();:
@a                         @b
|   ARRAY                  |   ARRAY
|   REFCNT=2               |   REFCNT=1
+-->+-----------+          +-->+-----------+
|   |           |              | +-------+ |
|   |           |              | | a     | |
|   |           |              | +-------+ |
|   |           |              | | b     | |
|   |           |              | +-------+ |
|   |           |              | | c     | |
|   |           |              | +-------+ |
|   |           |              | | ----------+
|   |           |              | +-------+ | |
|   +-----------+              +-----------+ |
|                                            |
+--------------------------------------------+
Note that @b's reference count went from two to one.
On scope exit, @a's reference count will drop to one, and @b's reference count will drop to zero.[1] This will free @b, which will cause @a's reference count to drop to zero. And that will free @a.
No cycle, so no memory leak.
At least in theory. In practice, what actually happens is a bit different as an optimization. But those are internal details that aren't relevant here.

PowerShell: Splitting an Array into two columns

I have an array with the following values:
| Firstname | Lastname | Username |
|-----------|------------------|----------|
| person1 | person1_lastname | p1 |
| person2 | person2_lastname | p2 |
| person3 | person3_lastname | p3 |
| person4 | person4_lastname | p4 |
This is the code that produces the above results:
$finalUsers = foreach ($person in $excludedUsers) {
    if ($person.Username -notin $ausLunchJobs.AssigneeUser -and $person.Username -notin $ausLunchJobs.AssigneeUser2) {
        $person | Select-Object Firstname, Lastname, Username
    }
}
I want to split that array into two columns and pair the Username data together.
Ideal output:
| Username | Username2 |
|----------|-----------|
| p1 | p2 |
| p3 | p4 |
Any guidance on how I can achieve something like this?
TIA
Create 1 new object per row you want displayed in the table:
$userPairs = for ($i = 0; $i -lt $finalUsers.Count; $i += 2) {
    $finalUsers[$i] | Select-Object Username, @{ Name = 'Username2'; Expression = { $finalUsers[$i + 1].Username } }
}
Result:
PS ~> $userPairs | Format-Table
Username Username2
-------- ---------
p1       p2
p3       p4
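One edge case worth flagging (my addition, not part of the original answer): indexing past the end of a PowerShell array returns $null rather than throwing, so with an odd number of users the last row's Username2 simply comes out empty. A small variant that marks the dangling user instead:
$userPairs = for ($i = 0; $i -lt $finalUsers.Count; $i += 2) {
    $finalUsers[$i] | Select-Object Username, @{
        Name       = 'Username2'
        # $finalUsers[$i + 1] is $null when the count is odd, so substitute a marker
        Expression = { if ($i + 1 -lt $finalUsers.Count) { $finalUsers[$i + 1].Username } else { '(unpaired)' } }
    }
}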

Group by certain record in array (pyspark)

I want to group the data in such a way that, for a given record, the values in its array are also used to group that record.
I am able to group by name only; I cannot figure out how to go beyond that.
I have tried the following query:
import pyspark.sql.functions as f
df.groupBy('name').agg(f.collect_list('data').alias('data_new')).show()
This is the dataframe:
| name | data                |
|------|---------------------|
| a    | [a,b,c,d,e,f,g,h,i] |
| b    | [b,c,d,e,j,k]       |
| c    | [c,f,l,m]           |
| d    | [k,b,d]             |
| n    | [n,o,p,q]           |
| p    | [p,r,s,t]           |
| u    | [u,v,w,x]           |
| b    | [b,f,e,g]           |
| c    | [c,b,g,h]           |
| a    | [a,l,f,m]           |
I am expecting the following output:
| name | data                        |
|------|-----------------------------|
| a    | [a,b,c,d,e,f,g,h,i,j,k,l,m] |
| n    | [n,o,p,q,r,s,t]             |
| u    | [u,v,w,x]                   |

Batch Incremented File Rename

I'm trying to rename each file I have in a directory to an incremented value based on the current directory listing, so that
B1S1A800.ext
B100M803.ext
B100N807.ext
B101S800.ext
B102S803.ext
Would instead look like:
1.ext
2.ext
3.ext
4.ext
5.ext
How would one go about achieving this in PowerShell?
This is a way:
$files = Get-ChildItem C:\yourpath\*.ext
$id = 1
$files | ForEach-Object {
    Rename-Item -Path $_.FullName -NewName (($id++).ToString() + $_.Extension)
}
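Two caveats worth adding (assumptions of mine, not from the original answer): Get-ChildItem usually returns names in alphabetical order, but that is provider-dependent, and bare numbers later sort badly as strings (10.ext before 2.ext). A variant that sorts explicitly and zero-pads, assuming at most 999 files:
$files = Get-ChildItem C:\yourpath\*.ext | Sort-Object Name
$id = 1
foreach ($f in $files) {
    # {0:D3} zero-pads the counter (001.ext, 002.ext, ...) so later
    # string sorts keep the files in numeric order.
    Rename-Item -Path $f.FullName -NewName ('{0:D3}{1}' -f $id++, $f.Extension)
}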

How can I make this PowerShell script parse large files faster?

I have the following PowerShell script that parses some very large files for ETL purposes. For starters, my test file is ~30 MB; larger files around 200 MB are expected. So I have a few questions.
The script below works, but it takes a very long time to process even a 30 MB file.
PowerShell Script:
$path = "E:\Documents\Projects\ESPS\Dev\DataFiles\DimProductionOrderOperation"
$infile = "14SEP11_ProdOrderOperations.txt"
$outfile = "PROCESSED_14SEP11_ProdOrderOperations.txt"
$array = @()
$content = gc $path\$infile |
    select -Skip 4 |
    where { $_ -match "[|].*[|].*" } |
    foreach { $_ -replace "^[|]", "" -replace "[|]$", "" }
$header = $content[0]
$array = $content[0]
for ($i = 1; $i -le $content.Length; $i += 1) {
    if ($array[$i] -ne $content[0]) { $array += $content[$i] }
}
$array | Out-File $path\$outfile -Encoding ASCII
DataFile Excerpt:
---------------------------
|Data statistics|Number of|
|-------------------------|
|Records passed | 93,118|
---------------------------
02/14/2012 Production Operations and Confirmations 2
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Production Operations and Confirmations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|ProductionOrderNumber|MaterialNumber |ModifiedDate|Plant|OperationRoutingNumber|WorkCenter|OperationStatus|IsActive| WbsElement|SequenceNumber|OperationNumber|OperationDescription |OperationQty|ConfirmedYieldQty|StandardValueLabor|ActualDirectLaborHrs|ActualContractorLaborHrs|ActualOvertimeLaborHrs|ConfirmationNumber|
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|180849518 |011255486L1 |02/08/2012 |2101 | 9901123118|56B30 |I9902 | |SOC10MA2302SOCJ31| |0140 |Operation 1 | 1 | 0 | 0.0 | | 499.990 | | 9908651250|
|180849518 |011255486L1 |02/08/2012 |2101 | 9901123118|56B30 |I9902 | |SOC10MA2302SOCJ31|14 |9916 |Operation 2 | 1 | 0 | 499.0 | | | | 9908532289|
|181993564 |011255486L1 |02/09/2012 |2101 | 9901288820|56B30 |I9902 | |SOC10MD2302SOCJ31|14 |9916 |Operation 1 | 1 | 0 | 499.0 | | 399.599 | | 9908498544|
|180885825 |011255486L1 |02/08/2012 |2101 | 9901162239|56B30 |I9902 | |SOC10MG2302SOCJ31| |0150 |Operation 3 | 1 | 0 | 0.0 | | 882.499 | | 9908099659|
|180885825 |011255486L1 |02/08/2012 |2101 | 9901162239|56B30 |I9902 | |SOC10MG2302SOCJ31|14 |9916 |Operation 4 | 1 | 0 | 544.0 | | | | 9908858514|
|181638583 |990104460I0 |02/10/2012 |2101 | 9902123289|56G99 |I9902 | |SOC11MAR105SOCJ31| |0160 |Operation 5 | 1 | 0 | 1,160.0 | | | | 9914295010|
|181681218 |990104460B0 |02/08/2012 |2101 | 9902180981|56G99 |I9902 | |SOC11MAR328SOCJ31|0 |9910 |Operation 6 | 1 | 0 | 916.0 | | | | 9914621885|
|181681036 |990104460I0 |02/09/2012 |2101 | 9902180289|56G99 |I9902 | |SOC11MAR108SOCJ31| |0180 |Operation 8 | 1 | 0 | 1.0 | | | | 9914619196|
|189938054 |011255486A2 |02/10/2012 |2101 | 9999206805|5AD99 |I9902 | |RS08MJ2305SOCJ31 | |0599 |Operation 8 | 1 | 0 | 0.0 | | | | 9901316289|
|181919894 |012984532A3 |02/10/2012 |2101 | 9902511433|A199399Z |I9902 | |SOC12MCB101SOCJ31|0 |9935 |Operation 9 | 1 | 0 | 0.5 | | | | 9916914233|
|181919894 |012984532A3 |02/10/2012 |2101 | 9902511433|A199399Z |I9902 | |SOC12MCB101SOCJ31|22 |9951 |Operation 10 | 1 | 0 | 68.080 | | | | 9916914224|
Your script reads one line at a time (slow!) and stores almost the entire file in memory (big!).
Try this (not tested extensively):
$path = "E:\Documents\Projects\ESPS\Dev\DataFiles\DimProductionOrderOperation"
$infile = "14SEP11_ProdOrderOperations.txt"
$outfile = "PROCESSED_14SEP11_ProdOrderOperations.txt"
$batch = 1000
[regex]$match_regex = '^\|.+\|.+\|.+'
[regex]$replace_regex = '^\|(.+)\|$'
$header_line = (Select-String -Path $path\$infile -Pattern $match_regex -list).line
[regex]$header_regex = [regex]::escape($header_line)
$header_line.trim('|') | Set-Content $path\$outfile
Get-Content $path\$infile -ReadCount $batch |
    ForEach {
        $_ -match $match_regex -notmatch $header_regex -replace $replace_regex, '$1' | Out-File $path\$outfile -Append
    }
That's a compromise between memory usage and speed. The -match and -replace operators work on arrays, so you can filter and replace an entire array at once without having to foreach through every record. -ReadCount causes the file to be read in chunks of $batch records, so you're basically reading in 1000 records at a time, doing the match and replace on that batch, then appending the result to your output file before going back for the next 1000 records. Increasing $batch should speed it up, but it will use more memory; adjust it to suit your resources.
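To make the chunking concrete (a small illustration of mine; the file name is arbitrary), -ReadCount makes Get-Content emit arrays of lines rather than single lines, and the comparison operators then filter each whole chunk in one pass:
# Each pipeline object is now an array of up to 3 lines, not a single line.
Get-Content .\sample.txt -ReadCount 3 | ForEach-Object {
    "chunk of $($_.Count) line(s)"
    $_ -match '\|'   # with an array on the left, -match returns the matching lines
}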
The Get-Content cmdlet does not perform as well as a StreamReader when dealing with very large files. You can read a file line by line using a StreamReader like this:
$path = 'C:\A-Very-Large-File.txt'
$r = [IO.File]::OpenText($path)
while ($r.Peek() -ge 0) {
    $line = $r.ReadLine()
    # Process $line here...
}
$r.Dispose()
Some performance comparisons:
Measure-Command {Get-Content .\512MB.txt > $null}
Total Seconds: 49.4742533
Measure-Command {
    $r = [IO.File]::OpenText('512MB.txt')
    while ($r.Peek() -ge 0) {
        $r.ReadLine() > $null
    }
    $r.Dispose()
}
Total Seconds: 27.666803
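A related option not mentioned in the answers above (my addition, assuming .NET 4 or later is available): [System.IO.File]::ReadLines streams a file lazily like the StreamReader loop, but it enumerates straight into the pipeline, so you avoid managing the reader yourself:
# Streams lazily; the file is never loaded into memory all at once.
[System.IO.File]::ReadLines('C:\A-Very-Large-File.txt') |
    Where-Object { $_ -match '^\|.+\|' } |
    Set-Content 'C:\A-Very-Large-File.filtered.txt'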
This is almost a non-answer... I love PowerShell... but I will not use it to parse log files, especially large log files. Use Microsoft's Log Parser.
C:\>type input.txt | logparser "select substr(field1,1) from STDIN" -i:TSV -nskiplines:14 -headerrow:off -iseparator:spaces -o:tsv -headers:off -stats:off