Get-Content Measure-Object Command : Additional rows are added to the actual row count - powershell

This is my first post here - my apologies in advance if I didn't follow a certain etiquette for posting. I'm a newbie to PowerShell, but I'm hoping someone can help me figure something out.
I'm using the following PowerShell script to tell me the total count of rows in a CSV file, minus the header. The result is written to a text file.
$x = (Get-Content -Path "C:\mysql\out_data\18*.csv" | Measure-Object -Line).Lines
$logfile = "C:\temp\MyLog.txt"
$files = get-childitem "C:\mysql\out_data\18*.csv"
foreach ($file in $files)
{
    $x--
    "File: $($file.name) Count: $x" | out-file $logfile -Append
}
I am doing this for 10 individual files. But there is just ONE file that keeps adding exactly 807 more rows to the actual count. For example, for the code above, the actual row count (minus the header) in the file is 25,083. But my script above generates 25,890 as the count. I've tried running this for different iterations of the same type of file (same data, different days), but it keeps adding exactly 807 to the row count.
Even when running only (Get-Content -Path "C:\mysql\out_data\18*.csv" | Measure-Object -Line).Lines, I still see the wrong record count in the powershell window.
I suspect there may be a problem with the csv file itself - I'm coming to that conclusion since 9 out of 10 files generate the correct row count. Thank you in advance for your time.

To measure the items in a csv you should use Import-Csv rather than Get-Content; that way you don't have to worry about headers or empty lines.
(Import-Csv -Path $csvfile | Measure-Object).Count
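Applied per file, combining this with the loop from the question, a sketch might look like this (paths are the ones from the question; untested against your data):

```powershell
# Count data rows (header excluded) per file with Import-Csv, logging each result
$logfile = "C:\temp\MyLog.txt"
Get-ChildItem "C:\mysql\out_data\18*.csv" | ForEach-Object {
    $count = (Import-Csv -Path $_.FullName | Measure-Object).Count
    "File: $($_.Name) Count: $count" | Out-File $logfile -Append
}
```

Import-Csv also handles quoted fields that span multiple lines, which Get-Content would count as extra rows.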

It's definitely possible there's a problem with that csv file. Also note that if the csv has cells that include line breaks, that will confuse Get-Content, so also try Import-Csv.
I'd start with this:
$PathToQuestionableFile = "c:\somefile.csv"
$TestContents = Get-Content -Path $PathToQuestionableFile
Write-Host "`n-------`nUsing Get-Content:"
$TestContents.count
$TestContents[0..10]
$TestCsv = Import-CSV -Path $PathToQuestionableFile
Write-Host "`n-------`nUsing Import-CSV:"
$TestCsv.count
$TestCsv[0..10] | Format-Table
That will let you see what Get-Content is pulling so you can narrow down where the problem is.
If it is in the file itself and using Import-Csv doesn't fix it, I'd use Notepad++ to check both the encoding and the line endings:
encoding is a drop-down menu; compare it to the other csv files
line endings can be seen with View > Show Symbol > Show All Characters. They should be consistent across the file, and should be one of these:
CR (typically if it came from a mac)
LF (typically if it came from *nix or the internet)
CRLF (typically if it came from windows)
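If you'd rather check the line endings from PowerShell than Notepad++, here's a rough sketch (the path is a placeholder) that counts raw CR and LF bytes - mismatched counts suggest mixed or stray line endings:

```powershell
# Read the file as raw bytes and count CR (13) and LF (10) occurrences;
# for a clean CRLF-terminated file the two counts should be equal
$bytes = [System.IO.File]::ReadAllBytes("c:\somefile.csv")
$cr = ($bytes -eq 13).Count
$lf = ($bytes -eq 10).Count
"CR: $cr  LF: $lf"
```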

Related

CSV splitting causes errors

Could you please help me with the issue described below?
I wrote a script in PowerShell which tries to split a large CSV file (30,000 rows / 6 MB) into smaller ones. The new files are named from the contents of the first and second columns. If a file already exists, the script only appends new lines.
Main CSV file example:
Site;OS.Type;Hostname;IP address
Amsterdam;Server;AMS_SRVDEV01;10.10.10.12
Warsaw;Workstation;WAR-L4D6;10.10.20.22
Ankara;Workstation;AN-D5G36;10.10.13.22
Warsaw;Workstation;WAR-SRVTST02;10.10.20.33
Amsterdam;Server;LON-SRV545;10.10.10.244
PowerShell Version: 5.1.17134.858
function Csv-Splitter {
    $fileName = Read-Host "Pass file name to process: "
    $FileToProcess = Import-Csv "$fileName.csv" -Delimiter ';'
    $MyList = New-Object System.Collections.Generic.List[string]
    foreach ($row in $FileToProcess) {
        if ("$($row.'OS.Type')-$($row.Site)" -notin $MyList) {
            $MyList.Add("$($row.'OS.Type')-$($row.Site)")
            $row | Export-Csv -Delimiter ";" -Append -NoTypeInformation "$($row.'OS.Type')-$($row.Site).csv"
        }
        else {
            $row | Export-Csv -Delimiter ";" -Append -NoTypeInformation "$($row.'OS.Type')-$($row.Site).csv"
        }
    }
}
Basically, the code works fine; however, it generates some errors from time to time as it processes the loop. This causes some rows to be missing from the new files - the number of missing rows equals the number of errors:
Export-Csv : The process cannot access the file 'C:\xxx\xxx\xxx.csv' because
it is being used by another process.
Export-Csv is synchronous - by the time it returns, the output file has already been closed - so the code in the question does not explain the problem.
As you've since confirmed in a comment, based on a suggestion by Lee_Dailey, the culprit was the McAfee On-Access Scan anti-virus (AV) module, which accessed each newly created file behind the scenes, thereby locking it temporarily and causing Export-Csv to fail intermittently.
The problem should go away if all output files can be fully created with a single Export-Csv call each, after the loop, as also suggested by Lee. This is preferable for performance anyway, but assumes that the entire CSV file fits into memory as a whole.
Here's a Group-Object-based solution that uses a single pipeline to implement write-each-output-file-in-full functionality:
function Csv-Splitter {
    $fileName = Read-Host "Pass file name to process: "
    Import-Csv "$fileName.csv" -Delimiter ';' |
        Group-Object { $_.'OS.Type' + '_' + $_.Site + '.csv' } |
        ForEach-Object { $_.Group | Export-Csv -NoTypeInformation $_.Name }
}
Your own answer shows alternative solutions that eliminate interference from the AV software.
The source of the issue was McAfee On-Access Scan, which was scanning every file created. There are 3 ways to bypass the problem:
a) temporarily disable the whole AV / OAS module.
b) add powershell.exe to the OAS policies as a Low Risk process.
c) collect all data in memory and create all files with Export-Csv as a last step, as shown in the other answer.
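For option (c), here is a sketch of collecting everything in memory before writing (key names are taken from the question's script; this is illustrative, not a tested drop-in):

```powershell
# Accumulate rows per target file in a hashtable, then write each output file exactly once
$groups = @{}
Import-Csv "$fileName.csv" -Delimiter ';' | ForEach-Object {
    $key = "$($_.'OS.Type')-$($_.Site).csv"
    if (-not $groups.ContainsKey($key)) {
        $groups[$key] = [System.Collections.Generic.List[object]]::new()
    }
    $groups[$key].Add($_)
}
foreach ($entry in $groups.GetEnumerator()) {
    $entry.Value | Export-Csv -Delimiter ';' -NoTypeInformation $entry.Key
}
```

Each file is opened and closed once, so the AV scanner has no chance to lock a file between appends.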

Filtering data from CSV file with PowerShell

I have a huge csv file where the first line contains the headers of the data. Because of the file size I can't open it with Excel or similar. I need to filter out just the rows I need: I want to create a new csv file which contains only the data where Header3 = "TextHere". Everything else is filtered away.
I have tried Get-Content Select-String | Out-File 'newfile.csv' in PowerShell, but it lost the header row and also messed up the data, putting values into the wrong fields. The data includes empty fields, and I believe that is what's messing it up. When I tried Get-Content with -First or -Last, the data seemed to be in order.
I have no experience handling big data files or PowerShell. Other options besides PowerShell are also possible, as long as they are free for non-commercial use.
Try it like this (modify the delimiter if necessary):
import-csv "c:\temp\yourfile.csv" -delimiter ";" | where Header3 -eq "TextHere" | export-csv "c:\temp\result.csv" -delimiter ";" -notype
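The simplified where Header3 -eq "TextHere" syntax needs PowerShell 3.0 or later; the equivalent long form (header name assumed from the question) is:

```powershell
# Streams row by row, so the whole file never has to fit in memory
Import-Csv "c:\temp\yourfile.csv" -Delimiter ";" |
    Where-Object { $_.Header3 -eq "TextHere" } |
    Export-Csv "c:\temp\result.csv" -Delimiter ";" -NoTypeInformation
```

Because everything is one pipeline, rows flow through one at a time, which is why this works on files too big for Excel.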

Comparing first line in multiple CSV

I have a folder containing about 130 .csv files that all appear to contain similar fields (column names). However, based on some of the file names, I am under the impression that some of the .csv files may have slightly different schemas (e.g., xxxxxx_new_format.csv, xxxxx_version_2.csv). My thought is to copy the first line of each .csv into a text doc for comparison.
So I created the following script:
Get-Content "C:\*.csv" | ForEach-Object {
    Select-Object -First 1 | Out-File "C:\compare.txt"
}
which seemed to go into an infinite loop.
How should I attack this problem? If there is a better method for comparison (i.e., I should be using python) please let me know.
Try this:
Get-ChildItem "C:\temp2\*.csv" |
    %{ [pscustomobject]@{ FileName = $_.FullName; Header = gc $_.FullName -TotalCount 1 } } |
    group Header
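The same pipeline written out long-form, without aliases (folder path as in the answer above):

```powershell
# Build one object per file holding its first line, then group files by identical header
Get-ChildItem "C:\temp2\*.csv" |
    ForEach-Object {
        [pscustomobject]@{
            FileName = $_.FullName
            Header   = Get-Content $_.FullName -TotalCount 1
        }
    } |
    Group-Object Header |
    Select-Object Count, Name
```

Each distinct header line comes out as one group, so files with a divergent schema stand out immediately by their group's Count.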

Add values of array to specific place in csv file

I'm far from being an expert in PowerShell, so I'll do my best to explain here.
I was able to add a column, but now I want to add stuff in a column (already there) using a separate script.
For example, the following CSV file:
WhenToBuyThings,ThingsToBuy
BuyNow ,Bed
BuyNow ,Desk
BuyNow ,Computer
BuyNow ,Food
BuyLater ,
BuyLater ,
BuyLater ,
I have the array:
$BuyStuffLater = "Books","Toys","ShinnyStuff"
So the end result of the file should look like this
BuyNow ,Bed
BuyNow ,Desk
BuyNow ,Computer
BuyNow ,Food
BuyLater ,Books
BuyLater ,Toys
BuyLater ,ShinnyStuff
Any help with how to do this in code would be much appreciated. Also, we can't use the delimiter "," because in the real script some values will contain commas.
I got it after a few hours of fiddling...
$myArray = "Books","Toys","ShinnyStuff"
$i = 0
Import-Csv "C:\Temp\test.csv" |
    ForEach-Object { if ($_.WhenToBuyThings -eq "BuyLater") { $_.ThingsToBuy = $myArray[$i]; $i++ }; return $_ } |
    Export-Csv C:\Temp\testtemp.csv -NoTypeInformation
All is well now...
I am new to PowerShell, too. Here's what I found. This searches for and returns all matching lines; I'm not sure it can be piped further.
$BuyStuffLater = "Books","Toys","ShinnyStuff"
$x = 0
Get-Content -Path .\mydata.txt | select-string -pattern "BuyLater" #searches and displays
# Im not sure about this piping. (| foreach {$_ + $BuyStuffLater[$x];$x++} | Set-Content .\outputfile.csv)
This filter will work, though I still have to work out the piping. The other answer might be better.
I don't see a point in iterating through each object to check whether its WhenToBuyThings is "BuyLater". If anything, what you are doing could be harmful if you run multiple passes adding to the list: it could remove previous things you wanted to buy. If "Kidney Dialysis Machine" was listed as a "BuyLater" under WhenToBuyThings, you would overwrite it, with dire consequences.
What we can do is build two lists and merge them into a new csv file. The first list is your original file minus any entry where a "BuyLater" has a blank ThingsToBuy. The second list is an object array built from your $BuyStuffLater. Add these lists together and export.
Also, there is zero need to worry about using a comma delimiter when using Export-Csv. The data is quoted, so commas in the data do not affect the structure. If it is still a concern you could use -Delimiter ";". I noticed that in your answer you did not attempt to account for commas either (not that it matters, based on what I just said).
$path = "C:\Temp\test.csv"
$ListOfItemsToBuy = "Books","Toys","ShinnyStuff: Big, ShinyStuff"
$BuyStuffLater = $ListOfItemsToBuy | ForEach-Object {
    [pscustomobject]@{
        WhenToBuyThings = "BuyLater"
        ThingsToBuy     = $_
    }
}
$CurrentList = Import-Csv $path | Where-Object{$_.WhenToBuyThings -ne "BuyLater" -and ![string]::IsNullOrWhiteSpace($_.ThingsToBuy)}
$CurrentList + $BuyStuffLater | Export-Csv $path -NoTypeInformation
Since you have PowerShell 3.0 we can use [pscustomobject]@{} to build the new objects very easily. Combine both arrays simply by adding them together and export back to the original file.
You should notice I used slightly different input data; one entry includes a comma. I did that so you can see what the output file looks like.
"BuyLater","Books"
"BuyLater","Toys"
"BuyLater","ShinnyStuff: Big, ShinyStuff"

how to autofit columns of csv from powershell

I have a PowerShell script which connects to a database & exports the result to a csv file.
However, there is one date column whose size needs to be manually increased after opening the csv file.
Do we have some command/property which will make the columns AutoFit?
export-csv $OutFile -NoTypeInformation
I can't export Excel instead of CSV, because I don't have Excel installed on my machine.
This is what I have tried latest:
$objTable | Select Store,RegNo,Date,@{L="Amount";E={($_.Amount).PadLeft(50," ")}},TranCount
$objTable | export-csv $OutFile -NoTypeInformation
But even after adding PadLeft() the output is the same; the Date column is still too narrow (it shows ###, and I need to widen it manually).
When you say you need to increase one of your column sizes: all the comments were right - everything is formatted based on the object content, and a csv file itself stores no column widths. If you really need the cells to be a certain length, you need to change the data before it is exported. With the string methods .PadLeft() and .PadRight() I think you will get what you need.
Take this example using output from Get-ChildItem, which uses a calculated property to pad the "Name" column so that all the data takes up at least 20 characters.
$path = "C:\temp"
$results = Get-ChildItem $path
$results | Select LastWriteTime,@{L="Name";E={($_.Name).PadLeft(20," ")}} | Export-CSV C:\temp\file.csv -NoTypeInformation
If that were exported, the output file would look like this (notice the whitespace):
"LastWriteTime","Name"
"2/23/2015 7:33:55 PM"," folder1"
"2/23/2015 7:48:02 PM"," folder2"
"2/23/2015 7:48:02 PM"," Folder3"
"1/8/2015 10:37:45 PM"," logoutput"