I'm trying to write a PowerShell script that will take several very long space-separated files and export some columns to similarly named CSV files.
I do have a successful version:
Foreach ($file in $files) {
    $WriteString=""
    $outfile = $path + "\" + ($file -replace ".{4}$") + ".csv"
    Get-Content -Path $path"\"$file | Select-Object -Skip $lines | ForEach-Object{
        $ValueArray = ($_ -split "\s+")
        $WriteString += $ValueArray[1] + "," + $ValueArray[2] + "," + $ValueArray[3] + "`n"
    }
    Add-Content -Path $outfile -Value $Writestring
}
This works, but is extremely slow - it takes over 16 hours for the script to fully run. The main cause (I think) is adding to the string. I've tried improving this using a hashtable:
Foreach ($file in $files) {
    $outfile = $path + "\" + ($file -replace ".{4}$") + ".csv"
    $ParseLines = Get-Content -Path $path"\"$file | Select-Object -Skip $lines
    $OutputData = ForEach ($Line in $ParseLines) {
        $ValueArray = ($Line -split "\s+")
        $Line | Select-Object $ValueArray[1], $ValueArray[2], $ValueArray[3]
    }
    $OutputData | Export-CSV -Path $outfile #-NoTypeInformation
}
However, this is only exporting one line of the hashtable:
#TYPE Selected.System.String
"636050.000","7429825.000","77.438"
,,
,,
,,
,,
,,
,,
If I change the last line to:
Set-Content -Path $outfile -Value $OutputData
then the output becomes:
@{636050.000=; 7429825.000=; 77.438=}
@{636075.000=; 7429825.000=; 75.476=}
@{636100.000=; 7429825.000=; 74.374=}
@{636125.000=; 7429825.000=; 73.087=}
@{636150.000=; 7429825.000=; 71.783=}
@{636175.000=; 7429825.000=; 70.472=}
I'm clearly doing something wrong with either the hashtable or Export-CSV, but I can't figure it out. Any help will be greatly appreciated.
As requested below, here's part of one source file. I cut out all non-data rows, and don't include headers in my output CSV, as the input program (that the CSV files go into) doesn't require them, and the outputs are self-evident (Not much chance of getting the X, Y and Z values wrong just by looking at the data).
*
* DEFINITION
* HEADER_VARIABLES 3
* QUALITIES C 16 0 key
* DATE C 12 0
* TIME C 12 0
* VARIABLES 4
* X F 12 3
* Y F 12 3
* Z F 12 3
* gcmaq0.drg F 12 3
*
* 1 2 3 4
*23456789012345678901234567890123456789012345678
* X| Y| Z| gcmaq0.drg|
*
* HEADER:QUALITIES 29Aug2018 13:53:16
636575.000 7429800.000 75.551 75.551
636600.000 7429800.000 77.358 77.358
636625.000 7429800.000 78.823 78.823
636650.000 7429800.000 80.333 80.333
636675.000 7429800.000 82.264 82.264
636700.000 7429800.000 84.573 84.573
636725.000 7429800.000 87.447 87.447
Avoid slow operations like appending to strings (or arrays) in a loop. Change this:
Get-Content -Path $path"\"$file |
    Select-Object -Skip $lines |
    ForEach-Object {
        $ValueArray = ($_ -split "\s+")
        $WriteString += $ValueArray[1] + "," + $ValueArray[2] + "," + $ValueArray[3] + "`n"
    }
Add-Content -Path $outfile -Value $Writestring
into this:
Get-Content -Path "${path}\${file}" |
    Select-Object -Skip $lines |
    ForEach-Object {
        ($_ -split "\s+")[1..3] -join ','
    } |
    Set-Content -Path $outfile
Replace Set-Content with Add-Content if you actually want to append to an existing file.
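To see what that pipeline emits for each line, here is a minimal sketch on a single sample row. It assumes every data line starts with whitespace, so that -split "\s+" yields an empty first element and the X, Y and Z values land at indices 1 to 3. The sample string is made up from the data shown in the question.

```powershell
# Sample row; the leading space is an assumption about the input format.
$line = ' 636575.000 7429800.000 75.551 75.551'

# Splitting on whitespace gives '', X, Y, Z and the fourth value; keep elements 1..3.
$fields = ($line -split '\s+')[1..3]

# Join the three values into one CSV row.
$csvRow = $fields -join ','
$csvRow
```

Because each row is written straight down the pipeline, nothing has to be accumulated in memory before Set-Content runs.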
Export-Csv works with objects. It expects properties and values. What you're providing (judging from the Set-Content results) is a hashtable with keys only.
One way around this is to create an object for each line and collect those objects in an array.
Foreach ($file in $files) {
    $outfile = $path + "\" + ($file -replace ".{4}$") + ".csv"
    $ParseLines = Get-Content -Path $path"\"$file | Select-Object -Skip $lines
    # Reset per file so rows from earlier files are not exported again
    $OutputData = @()
    ForEach ($Line in $ParseLines) {
        $ValueArray = ($Line -split "\s+")
        $OutputData += [pscustomobject]@{
            header1 = $ValueArray[1]
            header2 = $ValueArray[2]
            header3 = $ValueArray[3]
        }
    }
    $OutputData | Export-CSV -Path $outfile #-NoTypeInformation
}
Not sure if this is the optimal way if you have very large files - am sure a regex guru can come up with something more efficient.
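If the array append turns out to be a bottleneck, one possible variation is to let the foreach statement itself return the objects, so nothing is appended inside the loop. This is only a sketch on two made-up sample rows; the property names Easting, Northing and ZValue are illustrative, and ConvertTo-Csv stands in for Export-Csv so the result can be inspected without writing a file.

```powershell
# Two sample rows (made up); the leading space is an assumption about the input.
$sampleLines = ' 636575.000 7429800.000 75.551 75.551',
               ' 636600.000 7429800.000 77.358 77.358'

# Assigning the foreach output collects every emitted object without using +=.
$rows = foreach ($line in $sampleLines) {
    $v = $line -split '\s+'
    [pscustomobject]@{
        Easting  = $v[1]
        Northing = $v[2]
        ZValue   = $v[3]
    }
}

# ConvertTo-Csv produces the same text that Export-Csv would write to disk.
$csv = $rows | ConvertTo-Csv -NoTypeInformation
$csv
```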
The solution above by Ansgar Wiechers worked best, but I also found a second way of doing it at this SO question. It uses an ArrayList to store the hashtable, then writes the ArrayList. This method is almost, but not quite, as fast as Ansgar's solution (about 10x faster than the string method, vs 12x for the regex method).
Foreach ($file in $files) {
    [System.Collections.ArrayList]$collection = New-Object System.Collections.ArrayList($null)
    $outfile = $path + "\" + ($file -replace ".{4}$") + ".csv"
    $ParseLines = Get-Content -Path $path"\"$file | Select-Object -Skip $lines
    $OutputData = @{}
    ForEach ($Line in $ParseLines) {
        $ValueArray = ($Line -split "\s+")
        $OutputData.Easting = $ValueArray[1]
        $OutputData.Northing = $ValueArray[2]
        $OutputData.ZValue = $ValueArray[3]
        $collection.Add((New-Object PSObject -Property $OutputData)) | Out-Null
    }
    $collection | Export-CSV -Path $outfile -NoTypeInformation
}
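A possible variation on the same idea, shown only as a sketch: a generic List instead of an ArrayList, and a [pscustomobject] literal instead of a plain hashtable. A plain @{} hashtable does not guarantee key order, so the CSV columns may not come out as Easting, Northing, ZValue, whereas the literal keeps the order in which the properties are written. The sample rows are made up and assume a leading space on each line.

```powershell
# Two sample rows (made up); the leading space is an assumption about the input.
$sampleLines = ' 636575.000 7429800.000 75.551 75.551',
               ' 636600.000 7429800.000 77.358 77.358'

# List[object].Add() returns nothing, so no Out-Null is needed.
$collection = New-Object 'System.Collections.Generic.List[object]'

foreach ($line in $sampleLines) {
    $v = $line -split '\s+'
    # The [pscustomobject] literal preserves the property order written here.
    $collection.Add([pscustomobject]@{
        Easting  = $v[1]
        Northing = $v[2]
        ZValue   = $v[3]
    })
}

$csv = $collection | ConvertTo-Csv -NoTypeInformation
$csv
```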
I have some files that are all named after the same scheme (Wien - Number - Text, e.g. Wien - 001 - Text, Wien - 002 - Text). The files are nfo.
They have nfo files, but these are not correct. I was able to delete the wrong entry with Advanced Renamer.
Now I want to put the number in line 8 of the nfo file.
I got this to work with only one file, but I have several files in one folder and I haven't got that to work.
Here is my script:
Get-ChildItem -Filter "*.nfo" | Foreach-Object{
$baseName = $_.BaseName
$array = $baseName -Split ' - '
$addToFile = '<episode>' + $array[1] + '</episode>'
$filePath = ".\*.nfo"
$fileContent = Get-Content $filePath
$lineNumber = "8"
$textToAdd = $addToFile
$fileContent[$lineNumber-1] = $textToAdd
$fileContent | Set-Content $filePath
}
As Bacon Bits already commented, you're not using the full path and filename of the file, even though you already have that in $_.FullName.
Also, you could do with far fewer 'in-between' variables.
Try
$lineNumber = 8
Get-ChildItem -Path 'X:\WhereTheFilesAre' -Filter "* - *.nfo" | Foreach-Object {
    $fileContent = Get-Content -Path $_.FullName
    $fileContent[$lineNumber-1] = '<episode>{0}</episode>' -f ($_.BaseName -split ' - ')[1]
    $fileContent | Set-Content $_.FullName
}
'<episode>{0}</episode>' -f ($_.BaseName -split ' - ')[1] is a very nice way to create a new string using the Format operator
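As a small standalone illustration of that expression, here it is applied to one of the file names from the question. The variable names are only for illustration.

```powershell
# Base name following the 'Wien - Number - Text' scheme from the question.
$baseName = 'Wien - 001 - Text'

# Splitting on ' - ' gives 'Wien', '001', 'Text'; index 1 is the number.
$number = ($baseName -split ' - ')[1]

# The -f (Format) operator substitutes the value for the {0} placeholder.
$episodeLine = '<episode>{0}</episode>' -f $number
$episodeLine
```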
I am updating a script which imports a large CSV file and then splits it into lots of separate CSV files based on the values in the first two columns,
so POIMP_NL_20210306.csv which contains:
DOC_NUMBER|COMMENTS|ITEM|QTY|SUPPLIER
P-100-1234|JANE|5059585896978|2|"JOES SUPPLIES"
P-100-1234|JANE|5059585896985|2|"JOES SUPPLIES"
P-100-6666|TED|5059585896992|1|"ACTION TOYS"
must be split into POIMP_P-100-1234_JANE.csv containing
P-100-1234|JANE|5059585896978|2|"JOES SUPPLIES"
P-100-1234|JANE|5059585896985|2|"JOES SUPPLIES"
and POIMP_P-100-6666_TED.csv
P-100-6666|TED|5059585896992|1|"ACTION TOYS"
The problem I am trying to solve is preserving the quotes in just the SUPPLIER column
Since ConvertTo-Csv adds quotes to everything, I use % { $_ -replace '"', ""} to remove them all before the out-file is created, but of course this removes them from the SUPPLIER column too.
Here is my script which perfectly splits the big file into smaller files by DOC_NUMBER and COMMENTS but removes all quotes:
$basePath = "C:\"
$archivePath = "$basePath\archive\"
$todaysDate = $(get-date -Format yyyyMMdd)
$todaysFiles = @(
    (Get-ChildItem -Path $basePath | Where-Object { $_.Name -match 'POIMP_' + $todaysDate })
)
cd $basePath
foreach ($file in $todaysFiles ) {
    $fileName = $file.ToString()
    Import-Csv $fileName -delimiter "|" | Group-Object -Property "DOC_NUMBER","COMMENTS" |
    Foreach-Object {
        $newName = $_.Name -replace ",","_" -replace " ",""; $path=$fileName.SubString(0,8) + $newName+".csv" ; $_.group |
        ConvertTo-Csv -NoTypeInformation -delimiter "|" | % { $_ -replace '"', ""} | out-file $path -fo -en ascii
    }
    Rename-Item $fileName -NewName ([io.path]::GetFileNameWithoutExtension("$fileName") + "_Original.csv")
    Move-Item (Get-ChildItem -Path $basePath | Where-Object { $_.Name -match '_Original' }) $archivePath -force
}
And here is another script, which I found online and amended. It successfully leaves quotes in just the SUPPLIER column by first wrapping the value in a placeholder (¬¬) and then replacing the placeholder with quotes after all other quotes have been removed:
$ImportedCSV = Import-CSV "C:\POIMP_NL_20210306.csv" -delimiter "|"
$NewCSV = Foreach ($Entry in $ImportedCsv) {
$Entry.SUPPLIER = '¬¬' + $Entry.SUPPLIER + '¬¬'
$Entry
}
$NewCSV |
ConvertTo-Csv -NoTypeInformation -delimiter "|" | % { $_ -replace '"', ""} | % { $_ -replace '¬¬', '"'} | out-file "C:\updatedPO.csv" -fo -en ascii
I just can't merge these scripts to achieve the desired result as I can't seem to reference the correct object. I'd really appreciate your help! Thanks
Any good CSV reader should be able to handle quotes around csv fields, even when not really needed.
Having said that, it is your explicit wish to have quotes only around the field in the SUPPLIER column. (Note: in your example there is a trailing space after that column name.)
In this case, I think this would help.
Not only does it surround the SUPPLIER fields with quotes, it also saves the data as separate files, named using the values from the DOC_NUMBER and COMMENTS columns of each group found in the CSV:
$path = 'D:\Test'
$fileIn = Join-Path -Path $path -ChildPath 'POIMP_NL_20210306.csv'
# import the csv file and group first two columns
Import-Csv -Path $fileIn -Delimiter '|' | Group-Object -Property "DOC_NUMBER","COMMENTS" | ForEach-Object {
    $headerDone = $false
    $data = foreach ($item in $_.Group) {
        if (!$headerDone) {
            $item.PsObject.Properties.Name -join '|'
            $headerDone = $true
        }
        $item.SUPPLIER = '"{0}"' -f $item.SUPPLIER
        $item.PsObject.Properties.Value -join '|'
    }
    # create a new filename like 'POIMP_P-100-1234_JANE.csv'
    $fileOut = Join-Path -Path $path -ChildPath ('POIMP_{0}_{1}.csv' -f $_.Group[0].DOC_NUMBER, $_.Group[0].COMMENTS)
    # save the data not using Export-Csv because that will add quotes around everything (in PowerShell 5)
    $data | Set-Content -Path $fileOut -Force
}
Output
POIMP_P-100-1234_JANE.csv
DOC_NUMBER|COMMENTS|ITEM|QTY|SUPPLIER
P-100-1234|JANE|5059585896978|2|"JOES SUPPLIES"
P-100-1234|JANE|5059585896985|2|"JOES SUPPLIES"
POIMP_P-100-6666_TED.csv
DOC_NUMBER|COMMENTS|ITEM|QTY|SUPPLIER
P-100-6666|TED|5059585896992|1|"ACTION TOYS"
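To make the row-building step easier to follow, here is a minimal sketch that applies it to a single hand-built object instead of an imported file. The object mirrors the last row of the sample data; building it with a [pscustomobject] literal is just a stand-in for what Import-Csv would return.

```powershell
# Stand-in for one row as Import-Csv would return it (values from the question).
$item = [pscustomobject]@{
    DOC_NUMBER = 'P-100-6666'
    COMMENTS   = 'TED'
    ITEM       = '5059585896992'
    QTY        = '1'
    SUPPLIER   = 'ACTION TOYS'
}

# Header line: property names in their original order, pipe-delimited.
$headerLine = $item.PSObject.Properties.Name -join '|'

# Wrap only the SUPPLIER value in double quotes, then join all values.
$item.SUPPLIER = '"{0}"' -f $item.SUPPLIER
$dataLine = $item.PSObject.Properties.Value -join '|'

$headerLine
$dataLine
```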
If you are on PowerShell 7 or later, you can use
$yourdata | ConvertTo-Csv -NoTypeInformation -QuoteFields "SUPPLIER" -Delimiter "|" |
Out-File ...
or you could use
$yourdata | Export-Csv -NoTypeInformation -QuoteFields "SUPPLIER" `
-Delimiter "|" -Path <path-to-output-file>.csv
You can also use -UseQuotes AsNeeded to let the converter add quoting where it thinks it makes sense, otherwise just specify the fields you want quoted.
Here is my problem:
I have a list of only lastnames stored in a smallfile.csv. Each name is a separate line.
I have a second hugefile.csv with several strings per line among them somewhere the last name.
Now I want to search each lastname of smallfile.csv in hugefile.csv and save the resulting hits.
I managed to do this manually for one lastname at a time using the following command line:
Get-Content 'C:\Y\Y\MAG\hugefile.csv' –read 10000000 | foreach { $_ -match "Smith"}|Out-File 'C:\Y\Y\MAG\Smith.csv'
How can I loop over this command drawing from the first file?
I tried something like this which did not work:
foreach($line in Get-Content C:\Y\Y\MAG\smallfile.csv') {
Get-Content 'C:\Y\Y\MAG\hugefile.csv' | foreach { $_ -match $line}| Out-File 'C:\Y\Y\MAG\$line.csv' -append
}
Thanks for suggestions!
Something like this should work.
$rootPath = "C:\Y\Y\MAG\"
$smallFilePath = $rootPath + "smallfile.csv"
$hugeFilePath = $rootPath + "hugefile.csv"
$smallFile = Get-Content -Path $smallFilePath
foreach ($line in $smallFile)
{
$outPath = $rootPath + $line + ".csv"
Get-Content -Path $hugeFilePath | Where { $_ -match $line } | Out-File $outPath
}
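Note that the attempt in the question builds the output path in single quotes ('C:\Y\Y\MAG\$line.csv'), and single-quoted strings do not expand variables. As a hedged alternative to the loop above, Select-String can do the file scan itself. The sketch below is self-contained and runs against two small temporary files with made-up contents; -SimpleMatch treats each name as literal text rather than a regular expression, which is an assumption about what is wanted here.

```powershell
# Build a throwaway folder with small stand-ins for smallfile.csv and hugefile.csv.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
$smallFilePath = Join-Path $dir 'smallfile.csv'
$hugeFilePath  = Join-Path $dir 'hugefile.csv'
Set-Content -Path $smallFilePath -Value 'Smith', 'Jones'
Set-Content -Path $hugeFilePath -Value 'John,Smith,London', 'Ann,Brown,Paris', 'Bob,Jones,Rome', 'Eve,Smith,Oslo'

foreach ($name in Get-Content -Path $smallFilePath) {
    # Double quotes so that $name is expanded into the file name.
    $outPath = Join-Path $dir "$name.csv"

    # Select-String reads the file and returns one match object per hit;
    # the Line property holds the full matching line.
    Select-String -Path $hugeFilePath -Pattern $name -SimpleMatch |
        ForEach-Object { $_.Line } |
        Set-Content -Path $outPath
}
```

This still reads the large file once per name, so for a long list of names it may be worth benchmarking against other approaches.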
I'm trying to loop over a pipe-delimited file, check whether each line has 28 columns, and export only those lines to a file. This is the primary question I need answered (I have researched a lot, but I'm new to PS and there are so many different ways that I need some help). The following works, but there are two issues: the export will not let me choose a pipe delimiter, and the output columns are ordered alphabetically by field name, not by ordinal.
Also, is there a way I can make the output not "text" qualified?
$path = Get-ChildItem c:\temp\*.txt
$staticPath = '0859'
$year = Get-Date -format yy
$month = Get-Date -format MM
$day = Get-Date -format dd
$output = 'c:\temp\' + $year + $month + $day + $staticPath + '.TXT'
$outputbad = 'c:\temp\BAD' + $year + $month + $day + $staticPath + '.TXT'
$input = 'c:\temp\' + $path.Name
$input
$csv = Import-Csv -path $input -Delimiter "|"
foreach($line in $csv)
{
$properties = $line | Get-Member -MemberType Properties
$row = ''
$properties.Count
$obj = new-object PSObject
for($i=0; $i -lt $properties.Count;$i++)
{
$column = $properties[$i]
$columnvalue = $line | Select -ExpandProperty $column.Name
#$row += $columnvalue
$obj | add-member -membertype NoteProperty -name $column.Name -value $columnvalue
}
if($properties.Count -eq 28)
{
$obj | export-csv -Path $output -Append -NoTypeInformation
#$row | Export-Csv -Path $output -Append -Delimiter "|" -NoTypeInformation
}
else
{
$obj | Export-Csv -Path $outputbad -Append -Delimiter "|" -NoTypeInformation
}
}
If you want to avoid any chance of changing the formatting of these lines, perhaps you don't want to use the CSV cmdlets. Export-Csv can add quotation marks, etc. Here's a different way that might do what you want:
$path | ForEach-Object {
    $good = @()
    $bad = @()
    Get-Content $_ | ForEach-Object {
        # $values holds the individual columns of the current line
        if (($values = $_ -split '\|').Length -eq 28) {
            $good += $_
        } else {
            $bad += $_
        }
    }
    if ($good) { Out-File -Append -InputObject $good $output }
    if ($bad) { Out-File -Append -InputObject $bad $outputbad }
}
Note however that this will count quoted values containing a pipe differently than import-csv. Pipe separated values are sometimes generated without any quoting logic.
The $values variable will be an array of individual columns, so if you want to write some code to fix them up inside the if, you can use that then join them back up with $good += $values -join '|', or perhaps use another regex to fix errors.
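The difference in counting can be seen on a made-up line where one quoted value contains a pipe. This is only a sketch; the column names c1 to c4 passed to ConvertFrom-Csv are hypothetical and exist just to make the parsed result inspectable.

```powershell
# A line whose second value is quoted and contains the delimiter.
$quoted = 'a|"b|c"|d'

# A plain split knows nothing about quotes, so it sees four pieces.
$values = $quoted -split '\|'
$splitCount = $values.Length

# The CSV parser honours the quotes and treats "b|c" as a single value.
$parsed = $quoted | ConvertFrom-Csv -Delimiter '|' -Header 'c1', 'c2', 'c3', 'c4'

# Fixing up columns and joining them back, as described above.
$rejoined = ($values | ForEach-Object { $_.Trim() }) -join '|'

$splitCount
$parsed.c2
$rejoined
```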
Silly me, here is the answer. But if somebody could explain why the PSObject or the properties loop is alphabetical, it would be helpful. Soon I would like to add some intelligence to this: check a field by ordinal (not by alphabetical field name order) to see whether it is an integer, so that I know how to fix the BAD record.
$path = Get-ChildItem c:\temp\*.txt
$staticPath = '0859'
$year = Get-Date -format yy
$month = Get-Date -format MM
$day = Get-Date -format dd
$output = 'c:\temp\' + $year + $month + $day + $staticPath + '.TXT'
$outputbad = 'c:\temp\BAD' + $year + $month + $day + $staticPath + '.TXT'
$input = 'c:\temp\' + $path.Name
$csv = Import-Csv -path $input -Delimiter "|"
foreach($line in $csv)
{
$properties = $line | Get-Member -MemberType Properties
if($properties.Count -eq 28)
{
$line | export-csv -Path $output -Append -NoTypeInformation -Delimiter "|"
}
else
{
$line | Export-Csv -Path $outputbad -Append -Delimiter "|"
}
}
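On the ordering question: Get-Member returns members sorted by name, which is why the properties loop comes out alphabetical, while the object's own PSObject.Properties collection keeps the column order of the file. A minimal sketch with a made-up three-column row:

```powershell
# Made-up header and row, parsed the same way Import-Csv would parse a file.
$row = 'ZIP|NAME|AGE', '12345|Ann|30' | ConvertFrom-Csv -Delimiter '|'

# Get-Member sorts its output by member name.
$byGetMember = ($row | Get-Member -MemberType Properties).Name

# PSObject.Properties preserves the original column order.
$inFileOrder = $row.PSObject.Properties.Name

$byGetMember -join ','
$inFileOrder -join ','
```

Note also that Import-Csv gives every row the same set of properties as the header line, so counting properties per row may not distinguish short lines from complete ones; counting the pieces of a raw split, as in the other answer, is one way around that.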
I'm trying to figure out how to incorporate a line count that gets added to each file in a loop. The count needs to be put into the footer of each file as the loop processes it. Another concern is that the count needs to include the header and footer lines (i.e. 8 lines + 1 header + 1 footer = 10). The code I'm using is below, and I know that the code to count the lines is Get-Content $mypath | Measure-Object -Line | {$linecount = $_.Count}, but I don't know how to properly incorporate it. Any suggestions?
Get-ChildItem $destinationfolderpath -REcurse -Filter *.txt | ForEach-Object -Begin { $seq = 0 } -Process {
$seq++
$seq1 = "{0:D4}" -f $seq; $header="File Sequence Number $seq1"
$footer="File Sequence Number $seq1 and Line Count $looplinecount"
$header + "`n" + (Get-Content $_.FullName | Out-String) + $footer | Set-Content -Path $_.FullName
}
So load the content of the file into a variable within the loop, perform your measure -line on that variable, add 2 (one for the header line, one for the footer line), and drop that into a sub-expression for the footer...
Get-ChildItem $destinationfolderpath -REcurse -Filter *.txt | ForEach-Object -Begin { $seq = 0 } -Process {
    $seq++
    $seq1 = "{0:D4}" -f $seq
    $header="File Sequence Number $seq1"
    $Content=Get-Content $_.FullName | Out-String
    $footer="File Sequence Number $seq1 and Line Count $(($content|measure -line|select -expand lines)+2)"
    "$header`n$Content$footer" | Set-Content -Path $_.FullName
}
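The arithmetic can also be checked without touching any files. The sketch below is an alternative formulation rather than the code above: it keeps the body as an array of lines (a made-up three-line sample standing in for Get-Content output) and uses the array's Count instead of Measure-Object.

```powershell
# Stand-in for the lines Get-Content would return for one file.
$body = 'line 1', 'line 2', 'line 3'

# Zero-padded sequence number, as in the original loop.
$seq1 = '{0:D4}' -f 1

# Body lines plus one header line and one footer line.
$lineCount = $body.Count + 2

$header = "File Sequence Number $seq1"
$footer = "File Sequence Number $seq1 and Line Count $lineCount"

# The complete file content as an array of lines, ready for Set-Content.
$result = @($header) + $body + $footer
$result
```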