Choosing columns for merged files in PowerShell


I am using Get-ChildItem to recurse through directories (skipping some at the top level), open a series of CSV files, append the filename to the end of each line of data,
and combine the data into one file.
$mergedData= Get-ChildItem $path -Exclude yesterday,"OHCC Extract",output |
Get-ChildItem -recurse -Filter *csv |
Where-Object { $_.CreationTime -gt (Get-Date).AddDays(-1) } |
% {
$file = $_.Name
$fn = $_.FullName
## capture the header line
$FirstLine = Get-Content $fn -TotalCount 1
## add the column header for filename
$header = $FirstLine + ",Filename"
## get the contents of the files without the first line
Get-Content $fn | SELECT -Skip 1 | %{ "$_,$file" }
}
Each file has 5 columns: ID, First Name, Last Name, Phone, Address. The column names are surrounded by double quotes ("ID", "First Name").
The request is now to skip everything but the ID and the Last Name columns. So I tried (starting with just ID, will add First Name later)
Get-Content $fn | SELECT -Skip 1 -Property ID | %{ "$_,$file" }
I get @{ID=} in the resulting file.
Then I tried
Get-Content $fn | SELECT -Skip 1 | %{ $_.ID }
which yields blanks, and then
Import-Csv -Path $fn -Delimiter ',' | SELECT ID
which gives @{ID=73aec2fe-6cb3-492e-a157-25e355ed9691}
At this point I am just flailing because I obviously don't know how to handle objects in PS.
I have PowerShell 5.1.19041.1682 on windows 10.
Thanks

I was asked for sample data, so here it is. There are 35 files across multiple subdirectories.
Input FileA
column1  ID  Column3
East     12  apple
west     5   pear
Input FileB
column1  ID  Column3
East     15  kiwi
Output
Column1  column3  Filename
East     kiwi     FileB
East     apple    FileA
west     pear     FileA
But I did figure it out myself. Working code:
$path = "<directory where the files are located> "
$pathout = "<path to outputted file>"
$out = "$pathout\csv_merged_$(get-date -f MMddyyyy).csv"
$mergedData= Get-ChildItem $path -Exclude yesterday,output | Get-ChildItem -recurse -Filter *csv | Where-Object { $_.CreationTime -gt (Get-Date).AddDays(-1) } | % {
$file = $_.Name
$fn = $_.FullName
write-host $fn , $_.CreationTime
## get the contents of the files ,exclude columns and add columns
$Data = Import-Csv -Path $fn -Delimiter ',' | SELECT *, @{Name = 'Filename'; Expression = {$file}} -ExcludeProperty ID
# get the headers
$header= $Data | ConvertTo-Csv -NoTypeInformation | Select-Object -First 1
write-host $header
## convert the object and remove the column headers for each file
$Data | ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1
write-host
write-host '-----------------------'
}
# Prefix the header before the compiled data
$header, $mergedData | Set-Content -Encoding utf8 $out
The missing piece was ConvertTo-Csv, which expanded the object.
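The difference between the attempts above comes down to strings versus objects: Get-Content emits plain strings, which have no ID property, while Import-Csv emits objects whose properties can be selected by name. A minimal sketch, assuming two hypothetical sample files created in a temporary folder (the folder name, file names, and values are illustrative only), that keeps two columns and adds the file name:

```powershell
# Hypothetical sample data in a temporary folder (illustrative only).
$demo = Join-Path ([IO.Path]::GetTempPath()) 'csv_merge_demo'
New-Item -ItemType Directory -Path $demo -Force | Out-Null
Set-Content -Path (Join-Path $demo 'FileA.csv') -Value '"column1","ID","Column3"', '"East","12","apple"', '"west","5","pear"'
Set-Content -Path (Join-Path $demo 'FileB.csv') -Value '"column1","ID","Column3"', '"East","15","kiwi"'

# Import-Csv emits objects, so Select-Object can pick columns by name.
# The calculated property adds the source file name to every row.
$merged = Get-ChildItem -Path $demo -Filter *.csv | ForEach-Object {
    $file = $_.Name
    Import-Csv -Path $_.FullName |
        Select-Object column1, Column3, @{ Name = 'Filename'; Expression = { $file } }
}

# Export-Csv writes the header once, so no manual header handling is needed.
# The output uses a non-.csv extension so a rerun does not pick it up as input.
$out = Join-Path $demo 'merged.out'
$merged | Export-Csv -Path $out -NoTypeInformation -Encoding UTF8
Get-Content $out
```

Because the rows stay objects until the final Export-Csv, dropping or adding a column is only a change to the Select-Object property list.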

Related

Powershell count matching variables in a single column

I have this CSV file that I do a lot to. My most recent task is to add a summary sheet.
I pull the CSV file from a website and send it through a lot of checks. Code below:
$Dups = import-csv 'C:\Working\cylrpt.csv' | Group-Object -Property 'Device Name'| Where-Object {$_.count -ge 2} | ForEach-Object {$_.Group} | Select @{Name="Device Name"; Expression={$_."Device Name"}},@{Name="MAC"; Expression={$_."Mac Addresses"}},Zones,@{Name="Agent"; Expression={$_."Agent Version"}},@{Name="Status"; Expression={$_."Is online"}}
$Dups | Export-CSV $working\temp\01-Duplicates.csv -NoTypeInformation
$csvtmp = Import-CSV $working\cylrpt.csv | Select @{N='Device';E={$_."Device Name"}},@{N='OS';E={$_."OS Version"}},Zones,@{N='Agent';E={$_."Agent Version"}},@{N='Active';E={$_."Is Online"}},@{N='Checkin';E={[DateTime]$_."Online Date"}},@{N='Checked';E={[DateTime]$_."Offline Date"}},Policy
$csvtmp | %{
if ($_.Zones -eq ""){$_.Zones = "Unzoned"}
}
$csvtmp | Export-Csv $working\cy.csv -NoTypeInformation
import-csv $working\cy.csv | Select Device,policy,OS,Zones,Agent,Active,Checkin,Checked | % {
$_ | Export-CSV -path $working\temp\$($_.Zones).csv -notypeinformation -Append
}
The first check is for duplicates. I used separate lines of code for this because I wanted to create a CSV for duplicates.
The second check backfills all blank cells under the Zones column with "Unzoned".
The third step goes through the entire CSV file and creates a CSV file for each zone.
So this is my base. I need to add another CSV file for a summary of the zone information. The zones are in the format XXX-WS or XXX-SRV, where XXX can be between 3 and 17 letters.
I would like the Summary sheet to look like this
ABC ###
ABC-WS ##
ABC-SRV ##
DEF ###
DEF-WS ##
DEF-SRV ##
My thought is to either do the count from the original CSV file, or count the number of lines in each CSV file and subtract 1 for the header row.
The zones are dynamic, so I can't just say I want zone XYZ, because that zone may not exist.
My preferred method would be to count the rows of each zone in the original file and output the counts to an array or file, giving the number of items with the same zone name. I just don't know how to write it to look for and count matching values. Here is the code I'm trying to use to get the count:
import-csv C:\Working\cylrpt.csv | Group-Object -Property 'Zones'| ForEach-Object {$_.Group} | Select @{N='Device';E={$_."Device Name"}},Zones | % {
$Znum = ($_.Zones).Count
If ($Znum -eq $null) {
$Znum = 1
} else {
$Znum++
}
}
$Count = ($_.Zones),$Znum | Out-file C:\Working\Temp\test2.csv -Append
Here is the full code minus the report key:
$cylURL = "https://protect.cylance.com/Reports/ThreatDataReportV1/devices/"
$working = "C:\Working"
Remove-item -literalpath "\\?\C:\Working\Cylance Report.xlsx"
Invoke-WebRequest -Uri $cylURL -outfile $working\cylrpt.csv
$Dups = import-csv 'C:\Working\cylrpt.csv' | Group-Object -Property 'Device Name'| Where-Object {$_.count -ge 2} | ForEach-Object {$_.Group} | Select @{Name="Device Name"; Expression={$_."Device Name"}},@{Name="MAC"; Expression={$_."Mac Addresses"}},Zones,@{Name="Agent"; Expression={$_."Agent Version"}},@{Name="Status"; Expression={$_."Is online"}}
$Dups | Export-CSV $working\temp\01-Duplicates.csv -NoTypeInformation
$csvtmp = Import-CSV $working\cylrpt.csv | Select @{N='Device';E={$_."Device Name"}},@{N='OS';E={$_."OS Version"}},Zones,@{N='Agent';E={$_."Agent Version"}},@{N='Active';E={$_."Is Online"}},@{N='Checkin';E={[DateTime]$_."Online Date"}},@{N='Checked';E={[DateTime]$_."Offline Date"}},Policy
$csvtmp | %{
if ($_.Zones -eq ""){$_.Zones = "Unzoned"}
}
$csvtmp | Export-Csv $working\cy.csv -NoTypeInformation
import-csv $working\cy.csv | Select Device,policy,OS,Zones,Agent,Active,Checkin,Checked | % {
$_ | Export-CSV -path $working\temp\$($_.Zones).csv -notypeinformation -Append
}
cd $working\temp;
Rename-Item "Unzoned.csv" -NewName "02-Unzoned.csv"
Rename-Item "Systems-Removal.csv" -NewName "03-Systems-Removal.csv"
$CSVFiles = Get-ChildItem -path $working\temp -filter *.csv
$Excel = "$working\Cylance Report.xlsx"
$Num = $CSVFiles.Count
Write-Host "Found the following Files: ($Num)"
ForEach ($csv in $CSVFiles) {
Write-host "Merging $($csv.Name)"
}
$EXc1 = New-Object -ComObject Excel.Application
$Exc1.SheetsInNewWorkBook = $CSVFiles.Count
$XLS = $EXc1.Workbooks.Add()
$Sht = 1
ForEach ($csv in $CSVFiles) {
$Row = 1
$Column = 1
$WorkSHT = $XLS.WorkSheets.Item($Sht)
$WorkSHT.Name = $csv.Name -Replace '\.csv$',''
$File = (Get-Content $csv)
ForEach ($line in $File) {
$LineContents = $line -split ',(?!\s*\w+")'
ForEach ($Cell in $LineContents) {
$WorkSHT.Cells.Item($Row,$Column) = $Cell -Replace '"',''
$Column++
}
$Column = 1
$Row++
}
$Sht++
}
$Output = $Excel
$XLS.SaveAs($Output)
$EXc1.Quit()
Remove-Item *.csv
cd ..\

Found the solution:
$Zcount = import-csv C:\Working\cylrpt.csv | where Zones -ne "$null" | select @{N='Device';E={$_."Device Name"}},Zones | group Zones | Select Name,Count
$Zcount | Export-Csv -path C:\Working\Temp\01-Summary.csv -NoTypeInformation
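The grouping above yields one count per full zone name. To also get the per-prefix totals from the desired summary layout (ABC, then ABC-WS and ABC-SRV), one option is a second grouping on the zone name with its -WS or -SRV suffix removed. A minimal sketch, assuming in-memory sample rows in place of cylrpt.csv (the device and zone names are made up):

```powershell
# Made-up rows standing in for Import-Csv C:\Working\cylrpt.csv.
$rows = @(
    [pscustomobject]@{ 'Device Name' = 'pc1';  Zones = 'ABC-WS' }
    [pscustomobject]@{ 'Device Name' = 'pc2';  Zones = 'ABC-WS' }
    [pscustomobject]@{ 'Device Name' = 'srv1'; Zones = 'ABC-SRV' }
    [pscustomobject]@{ 'Device Name' = 'pc3';  Zones = 'DEF-WS' }
    [pscustomobject]@{ 'Device Name' = 'pc4';  Zones = '' }
)

$summary = $rows |
    Where-Object { $_.Zones } |                              # skip rows without a zone
    Group-Object { $_.Zones -replace '-(WS|SRV)$', '' } |    # group by prefix, e.g. ABC
    ForEach-Object {
        # One line for the prefix total ...
        [pscustomobject]@{ Zone = $_.Name; Count = $_.Count }
        # ... followed by one line per full zone name under that prefix.
        $_.Group | Group-Object Zones | ForEach-Object {
            [pscustomobject]@{ Zone = $_.Name; Count = $_.Count }
        }
    }

$summary | Format-Table -AutoSize
```

The same $summary objects could be piped to Export-Csv instead of Format-Table to produce the summary file.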

Need to output multiple rows to CSV file

I am using the following script, which iterates through hundreds of text files looking for specific instances of a regular expression. I need to add a second data point to the array, which tells me the object the pattern matched in.
In the below script the [Regex]::Matches($str, $Pattern) | % { $_.Value } piece returns multiple rows per file, which cannot be easily output to a file.
What I would like to know is, how would I output a 2 column CSV file, one column with the file name (which should be $_.FullName), and one column with the regex results? The code of where I am at now is below.
$FolderPath = "C:\Test"
$Pattern = "(?i)(?<=\b^test\b)\s+(\w+)\S+"
$Lines = @()
Get-ChildItem -Recurse $FolderPath -File | ForEach-Object {
$_.FullName
$str = Get-Content $_.FullName
$Lines += [Regex]::Matches($str, $Pattern) |
% { $_.Value } |
Sort-Object |
Get-Unique
}
$Lines = $Lines.Trim().ToUpper() -replace '[\r\n]+', ' ' -replace ";", '' |
Sort-Object |
Get-Unique # Cleaning up data in array

I can think of two ways, but the simplest is to use a hashtable (dictionary). Another way is to create PSObjects to fill your $Lines variable. I am going with the simple way, so you only need one variable, the hashtable.
$FolderPath = "C:\Test"
$Pattern = "(?i)(?<=\b^test\b)\s+(\w+)\S+"
$Results = @{}
Get-ChildItem -Recurse $FolderPath -File |
ForEach-Object {
$str = Get-Content $_.FullName
$Line = [regex]::matches($str,$Pattern) | % { $_.Value } | Sort-Object | Get-Unique
$Line = $Line.Trim().ToUpper() -Replace '[\r\n]+', ' ' -Replace ";",'' | Sort-Object | Get-Unique # Cleaning up data in array
$Results[$_.FullName] = $Line
}
$Results.GetEnumerator() | Select @{L="Folder";E={$_.Key}}, @{L="Matches";E={$_.Value}} | Export-Csv -NoType -Path <Path to save CSV>
Your results will be in $Results. $Results.Keys contains the file paths and $Results.Values contains the matches from the expression. You can reference the results for a particular file by its key, $Results["file path"]; if the key does not exist, the lookup returns $null rather than raising an error.
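Because a hashtable value can hold several matches, Export-Csv may write the array's type name rather than the matches themselves. The other approach mentioned, creating objects, avoids that by emitting one row per file and match. A minimal sketch, assuming a hypothetical temporary folder with two small text files and a simplified pattern (both are stand-ins, not the original data or regex):

```powershell
# Hypothetical test files (stand-ins for the real folder).
$folder = Join-Path ([IO.Path]::GetTempPath()) 'regex_demo'
New-Item -ItemType Directory -Path $folder -Force | Out-Null
Set-Content -Path (Join-Path $folder 'a.txt') -Value 'test alpha;', 'other line', 'test beta;'
Set-Content -Path (Join-Path $folder 'b.txt') -Value 'test alpha;'

# Simplified illustrative pattern: the word following "test" at the start of a line.
$pattern = '(?im)^test\s+(\w+)'

$rows = Get-ChildItem -Path $folder -Filter *.txt -File | ForEach-Object {
    $path = $_.FullName
    $text = Get-Content -Path $path -Raw          # whole file as one string
    [regex]::Matches($text, $pattern) |
        ForEach-Object { $_.Groups[1].Value.ToUpper() } |
        Sort-Object -Unique |
        ForEach-Object { [pscustomobject]@{ File = $path; Match = $_ } }
}

# Each object becomes one CSV row with the columns File and Match.
$rows | Export-Csv -Path (Join-Path $folder 'matches.csv') -NoTypeInformation
```

A file with several distinct matches simply contributes several rows, so no value ever has to be flattened into a single cell.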

Sort folder contents, modify file with LastWriteTime

Have this code:
gci D:\Files\Folder |
sort LastWriteTime | select -Last 1 |
foreach-object {$line -replace "\<", ""}
This is not working, and I have tried many variations. I need to replace the "<" character in the file last modified in Folder. I managed to get the correct file selected and written to the PowerShell console, but I cannot remove the "<" character from that file.

Windows doesn't allow the < character in file names so I guess you want to modify the file contents removing all occurrences. If so, there are many ways to do that. Example:
# Getting the name of the last modified file.
$file_name = Get-ChildItem D:\Files\Folder | Sort-Object LastWriteTime `
| ? { ! $_.PSIsContainer } | Select-Object -Last 1 | % {$_.FullName }
# Reading the file into a single string.
$string = Get-Content $file_name | Out-String
# Modifying the string and writing the output back to the file.
$string -replace "<", "" | Out-File $file_name
The problem with your initial code is that $line is not defined anywhere. You need to read text from the file first.

If you're trying to replace the < in the filename then this should work:
foreach-object {$_.Name.Replace("<", "")}
To edit the contents of the file you could do this:
$file = gci D:\Files\Folder | sort LastWriteTime | select -Last 1
$temp = $file | gc | foreach-object {$_.Replace("<", "")}
$temp | Out-File $file.FullName
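A compact variant of the content-editing approach reads the file in one piece and writes it back in a single statement. This is a minimal sketch, assuming a hypothetical temporary folder with two small files (names and contents are made up) and PowerShell 5.0 or later for -NoNewline:

```powershell
# Hypothetical demo folder with an older and a newer file.
$folder = Join-Path ([IO.Path]::GetTempPath()) 'lastwrite_demo'
New-Item -ItemType Directory -Path $folder -Force | Out-Null
Set-Content -Path (Join-Path $folder 'old.txt') -Value 'a < b'
Set-Content -Path (Join-Path $folder 'new.txt') -Value 'c < d', '<tag>'
# Make sure new.txt is clearly the most recently written file.
(Get-Item (Join-Path $folder 'old.txt')).LastWriteTime = (Get-Date).AddHours(-1)

# Pick the most recently modified file (directories excluded by -File).
$latest = Get-ChildItem -Path $folder -File | Sort-Object LastWriteTime | Select-Object -Last 1

# The parentheses force the whole file to be read before Set-Content reopens it for writing.
# -Raw keeps the original line endings, so -NoNewline avoids appending an extra one.
(Get-Content -Path $latest.FullName -Raw) -replace '<', '' |
    Set-Content -Path $latest.FullName -NoNewline
```

Note that Set-Content and Out-File use different default encodings in Windows PowerShell, so an explicit -Encoding may be worth adding when the file's encoding matters.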

Powershell csv remove lines

I have a CSV file (file1) that looks like: (User dirs and the size)
Initials,Size
User1,10
User2,100
User3,131
User4,140
I have another CSV file (file2) that looks like: (VIP users)
User2
User4
Now what I'm trying to do, is to update file1, so it looks like:
User1,10
User3,131
User2 and User4 are removed because they are in file2.
I can get them removed, but at the same time I remove the size for all users, so my output is only the Users:
User1
User3
My code:
$SourcePath = "\\server1\info\SYSINFO\UsrSize"
$DestinationFile = "\\server1\info\SYSINFO\UsrSize\OverLimit\UsersOverLimit1.log"
$VIP_Exclusion_List = "\\server1\info\SYSINFO\UsrSize\OverLimit\_VIP_EXCLUSION_LIST.txt"
$Database = "\\server1\info\SYSINFO\UsrSize\OverLimit\_UsersOverLimitDATABASE.log"
$INT_SizeToLookFor = 100
dir $SourcePath -Filter usr*.txt | import-csv -delimiter "`t" |
Where-Object {[INT] $_."Size excl. Backup/Pst" -ge $INT_SizeToLookFor} |
Select-Object Initials,"Size excl. Backup/Pst" | convertto-csv -NoTypeInformation | % { $_ -replace '"', ""} | out-file $DestinationFile ;
$Userlist = import-csv $DestinationFile | Select-Object Initials |
convertto-csv -NoTypeInformation | % { $_ -replace '"', ""};
compare-object ($Userlist) (get-content $VIP_Exclusion_List) |
select-object inputObject | convertto-csv -NoTypeInformation |
% { $_ -replace '"', ""} | out-file "\\server1\info\SYSINFO\UsrSize\OverLimit\UsersOverLimitThisTime.log";

If the files are small-ish and you don't care too much about performance, then the following would be a trivial way:
$data = Import-Csv file1
$vips = Get-Content file2   # file2 has no header row, so read it as plain lines
$data = $data | ?{ $vips -notcontains $_.Initials }
$data | Export-Csv file1_new -NoTypeInformation
A faster way would be to add the names to remove to a set, but given the things you're talking about here I doubt you'll get into the range of a few thousand or million users.
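The set-based variant mentioned above can be sketched as follows, assuming in-memory sample data instead of the two files (the names and sizes are the example values from the question) and PowerShell 5.0 or later for the ::new() syntax:

```powershell
# Sample data matching the question's file1 and file2.
$data = @(
    [pscustomobject]@{ Initials = 'User1'; Size = 10 }
    [pscustomobject]@{ Initials = 'User2'; Size = 100 }
    [pscustomobject]@{ Initials = 'User3'; Size = 131 }
    [pscustomobject]@{ Initials = 'User4'; Size = 140 }
)
$vipNames = [string[]]@('User2', 'User4')

# A HashSet gives constant-time lookups; OrdinalIgnoreCase mirrors
# PowerShell's default case-insensitive comparison.
$vips = [System.Collections.Generic.HashSet[string]]::new(
    $vipNames, [System.StringComparer]::OrdinalIgnoreCase)

# Keep only the rows whose Initials are not in the VIP set.
$kept = $data | Where-Object { -not $vips.Contains($_.Initials) }
$kept | ConvertTo-Csv -NoTypeInformation
```

With real files, $data would come from Import-Csv and $vipNames from Get-Content, since the VIP list has no header row.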

I solved it using this code:
$ArrayVIP = get-content $VIP_Exclusion_List
select-string $DestinationFile -pattern $ArrayVIP -notmatch |
select -expand line |
out-file $DestinationFile
Taken from here: Removing lines from a CSV

Format results in table

My script below searches for a specific part number (459279) recursively through a number of txt files.
set-location C:\Users\admin\Desktop\PartNumbers\
$exclude = @('PartNumbers.txt','*.ps1')
$Line = [Environment]::NewLine
Get-ChildItem -Recurse -Exclude $exclude | select-string 459279 | group count | Format-Table -Wrap -AutoSize -Property Count,
@{Expression={$_.Group | foreach { $_.filename} | Foreach {"$_$Line"} }; Name="Filename"},
@{Expression={$_.Group | foreach {$_.LineNumber} | Foreach {"$_$Line"} }; Name="LineNumbers"} | Out-File 459279Results.txt
My Results are:
Count Filename        LineNumbers
----- --------        -----------
    2 {Customer1.txt  {2
      , Customer2.txt , 3
      }               }
My ideal results would be if this is possible:
Part Number: 459279
Count: 2
Filename       LineNumbers
--------       -----------
Customer1.txt  2
Customer2.txt  3
I have manually retrieved the part number '459279' from "PartNumbers.txt" and searched for it using the script.
I cannot seem to remove/replace the braces and commas to present a clean list.
What I hope to eventually do is to recursively search through "PartNumbers.txt" and produce a report with each part number appended to the next in the style mentioned above.
PartNumbers.txt is formatted:
895725
939058
163485
459279
498573
Customer*.txt are formatted:
163485
459279
498573

Something like this should work:
$exclude = 'PartNumbers.txt', '*.ps1'
$root = 'C:\Users\admin\Desktop\PartNumbers'
$outfile = Join-Path $root 'loopResults.txt'
Get-Content (Join-Path $root 'PartNumbers.txt') | % {
$partno = $_
$found = Get-ChildItem $root -Recurse -Exclude $exclude `
| ? { -not $_.PSIsContainer } `
| Select-String $partno `
| % {
$a = $_ -split ':'
New-Object -Type PSCustomObject -Property @{
'Filename' = Split-Path -Leaf $a[1];
'LineNumbers' = $a[2]
}
}
"Part Number: $partno"
'Count: ' + @($found).Count
$found | Format-Table Filename, LineNumbers
} | Out-File $outfile -Append
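Splitting the match on ':' depends on how the match object is rendered as text. Select-String results also expose Filename and LineNumber properties directly, so an alternative is to select those. A minimal sketch for a single part number, assuming hypothetical customer files in a temporary folder (the contents reuse the sample numbers from the question):

```powershell
# Hypothetical customer files (illustrative only).
$root = Join-Path ([IO.Path]::GetTempPath()) 'partno_demo'
New-Item -ItemType Directory -Path $root -Force | Out-Null
Set-Content -Path (Join-Path $root 'Customer1.txt') -Value '163485', '459279', '498573'
Set-Content -Path (Join-Path $root 'Customer2.txt') -Value '163485', '498573', '459279'

$partno = '459279'

# Each Select-String result is a MatchInfo object; pick its properties
# instead of parsing its string form.
$found = Get-ChildItem -Path $root -Filter 'Customer*.txt' |
    Select-String -Pattern $partno -SimpleMatch |
    Select-Object Filename, LineNumber

"Part Number: $partno"
"Count: $(@($found).Count)"
$found | Format-Table -AutoSize
```

Wrapping this in a loop over the lines of PartNumbers.txt, as in the answer above, would produce one such block per part number.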
