Filtering data from a CSV file with PowerShell - powershell

I have a huge CSV file whose first line contains the column headers. Because of the file size, I can't open it with Excel or similar tools. I need to keep only the rows I actually need: I want to create a new CSV file that contains only the rows where Header3 = "TextHere". Everything else should be filtered out.
I have tried Get-Content piped to Select-String and then to Out-File 'newfile.csv' in PowerShell, but the header row was lost and the data ended up in the wrong fields. The data includes empty fields, and I believe those are causing the problem. When I tried Get-Content -First or -Last, the data seemed to be in order.
I have no previous experience handling big data files or with PowerShell. Options other than PowerShell are also possible, as long as they are free for non-commercial use.

Try this (adjust the delimiter if necessary):
Import-Csv "c:\temp\yourfile.csv" -Delimiter ";" | Where-Object Header3 -eq "TextHere" | Export-Csv "c:\temp\result.csv" -Delimiter ";" -NoTypeInformation
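To check that empty fields do not confuse Import-Csv, the same filter can be tried on a small in-memory sample first. This is a sketch: the three sample rows are made up, and ConvertFrom-Csv/ConvertTo-Csv stand in for Import-Csv/Export-Csv so that no files are needed.

```powershell
# Made-up sample: semicolon-delimited, with an empty Header2 value in row "a"
$sample = @'
Header1;Header2;Header3
a;;TextHere
b;x;Other
c;y;TextHere
'@

# Parse, keep only rows whose Header3 equals "TextHere", and serialize back to CSV text
$filtered = $sample | ConvertFrom-Csv -Delimiter ';' | Where-Object Header3 -eq 'TextHere'
$filtered | ConvertTo-Csv -Delimiter ';' -NoTypeInformation
```

Rows a and c survive and the empty Header2 value of row a stays an empty field, because the parser splits by delimiter per column rather than matching text. Note that -eq is case-insensitive in PowerShell; use -ceq if the match must be exact.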

Related

PowerShell: Converting Tab-Delimited CSV to Comma-Delimited CSV without Quotes

We receive a tab-delimited CSV from an external COGNOS system in a public folder. Uploading it to Salesforce via the Data Loader CLI fails with:
com.salesforce.dataloader.exception.DataAccessRowException: Error reading row #0: the number of data columns (98) exceeds the number of columns in the header (97)
But if I open the CSV in MS Excel, save it as a new CSV (UTF-8), and then pass that file to the Data Loader CLI, it works without any issue.
The difference in the Excel-converted file seems to be that it is comma-separated instead of tab-separated.
So I tried to convert the original tab-delimited CSV to a comma-separated CSV using the command below:
import-csv source.csv -delimiter "`t" | export-csv target.csv -notype
But the output of this has quotes. Data Loader now runs with the file but imports nothing into Salesforce; it seems unable to identify the field names properly.
Then I tried the commands below to remove the double quotes:
import-csv source.csv -delimiter "`t" | export-csv target.csv -notype
(Get-Content target.csv) | Foreach-Object {$_ -replace '"', ''}|Out-File target.csv
But this resulted in an "Index out of range" error whose cause is not clear.
What would be the best approach to do this conversion for the Data Loader CLI?
How can I make this conversion match Excel's conversion?
Any suggestions, thoughts, or help to achieve this would be highly appreciated.
Thanks!
Salesforce has strict rules for CSV files. Also, this page says that no more than 50000 records can be imported at one time.
The main thing here is that the file MUST be UTF-8 encoded.
The quotes around the values are needed.
This should do it (provided you do not have more than 50000 records in the CSV):
Import-Csv -Path 'source.csv' -Delimiter "`t" | Export-Csv -Path 'target.csv' -Encoding UTF8 -NoTypeInformation
(source.csv is the TAB-delimited file you receive from COGNOS)
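The Data Loader error says that a data row has one more column than the header. A quick way to locate such rows is to compare each line's field count with the header's. This is a sketch: Find-ColumnMismatch is a hypothetical helper name, not a built-in cmdlet, and it splits lines naively, so it assumes no field contains a tab inside quotes.

```powershell
function Find-ColumnMismatch {
    # Hypothetical helper: reports lines whose field count differs from the header line's.
    # Naive split, so quoted fields containing the delimiter will be miscounted.
    param(
        [string[]]$Lines,
        [char]$Delimiter = "`t"
    )
    $expected = $Lines[0].Split($Delimiter).Count
    for ($i = 1; $i -lt $Lines.Count; $i++) {
        $count = $Lines[$i].Split($Delimiter).Count
        if ($count -ne $expected) {
            # LineNumber is 1-based, counting the header as line 1
            [PSCustomObject]@{ LineNumber = $i + 1; FieldCount = $count; HeaderCount = $expected }
        }
    }
}
```

Usage might look like `Find-ColumnMismatch -Lines (Get-Content 'source.csv')`. A trailing tab at the end of every data line, for example, would show up as each data row having exactly one extra field.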

How can I alternate column headers in a tab delimited file?

I have a tab-delimited txt file and I need to swap the first and second column names (without swapping the column data). In other words, I need to rename A (Id) to B (ExternalId) and B (ExternalId) to A (Id). The other columns in the file (the other data) should stay unchanged. I'm very new to PowerShell, please advise. As I understand it, I need to use the Import-Csv/Export-Csv cmdlets.
I tried this, but it's not working the right way...
Import-Csv 'C:\original_users.txt' |
Select-Object Id, @{Name="ExternalId";Expression={$_."Id"}}; Select-Object ExternalId, @{Name="Id";Expression={$_."ExternalId"}} |
Export-Csv 'C:\changed_users.txt'
The Import-Csv and Export-Csv cmdlets have their strengths, but this might not be one of them. Export-Csv would introduce quoting that might not be in your original file and that might not be desired.
Either way, why not just do some text manipulation on the first line? Let's read in the file and output the first line, edited, followed by the remainder of the file. This sample writes to a new location, but you could easily write it back to the same file.
# Get the full file into a variable
$fullFile = Get-Content "c:\temp\mockdata.csv"
# Parse the first line into a column array
$columns = $fullFile[0].Split("`t")
# Rebuild the header by switching the columns order as desired.
$newHeader = (@($columns[1], $columns[0]) + @($columns | Select-Object -Skip 2)) -join "`t"
# Write the header back to file then the rest of the data.
$outputPath = "C:\somepath.txt"
$newHeader | Set-Content $outputPath
$fullFile | Select-Object -Skip 1 | Add-Content $outputPath
This also preserves the presence of other columns and their data.
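For comparison, the header swap can also be written with PowerShell's multiple assignment, which exchanges the two array elements directly. This is a sketch wrapped in a hypothetical helper (Switch-FirstTwoHeaders is not a built-in cmdlet) that works on lines already read into memory and assumes the header has at least two columns.

```powershell
function Switch-FirstTwoHeaders {
    # Hypothetical helper: swaps the first two names in a delimited header line
    # and passes every data line through unchanged.
    param(
        [string[]]$Lines,
        [char]$Delimiter = "`t"
    )
    $columns = $Lines[0].Split($Delimiter)
    # Multiple assignment swaps the two array elements in place
    $columns[0], $columns[1] = $columns[1], $columns[0]
    # Emit the rebuilt header first, then the untouched data lines
    $columns -join $Delimiter
    $Lines | Select-Object -Skip 1
}
```

Usage might look like `Switch-FirstTwoHeaders -Lines (Get-Content 'C:\original_users.txt') | Set-Content 'C:\changed_users.txt'`. As with the answer above, no quoting is added because the data lines are never parsed.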

How to export an array to CSV in PowerShell?

$x1 = (1,22,333,4444)
$x1 | export-csv 'd:\123.csv' -Force
Then I get this:
How do I get a table like this?
A CSV doesn't accept arbitrary data properly. You can use | Out-File x.csv to dump the values on individual lines and then read them back in with Import-Csv by specifying headers, but a proper CSV needs headers when it is saved.
If you want to save it properly, you need to convert the data into objects in which the numbers are actually "named" properties, so PowerShell can create a valid CSV.
1,22,333,4444 | ForEach-Object {
    [PSCustomObject]@{Number = $_}
} | Export-Csv C:\++\123.csv -NoTypeInformation
-NoTypeInformation removes the #TYPE header.
That being said, Out-File is the only way to match your 'expected' output table; you don't seem to be looking for a CSV here.
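The object-wrapping approach can be checked in memory with ConvertTo-Csv, which produces the same lines of text that Export-Csv would write to the file. A minimal sketch:

```powershell
# Wrap each number in an object; its property name (Number) becomes the CSV header
$lines = 1, 22, 333, 4444 |
    ForEach-Object { [PSCustomObject]@{ Number = $_ } } |
    ConvertTo-Csv -NoTypeInformation

# One header line followed by one quoted value per number
$lines
```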
This will create a proper csv file with a header:
ConvertFrom-Csv (1,22,333,4444) -Header Number|Export-Csv .\123.csv -NoType
When loaded in Excel, cell A1 will contain Number.
This will create a fake CSV that Excel accepts and that returns your sample table.
(1,22,333,4444)|Set-Content .\234.csv

Using duplicate headers in a PowerShell .csv file

I have a .csv file and I want to import it into powershell then iterate through the file changing certain values. I then want the output to append to the original .csv file, so that the values have been updated.
My issue is that the .csv file has headers which aren't unique, and can't be changed as then it won't work in another program. Originally I defined my own headers in the powershell to get around this but then the output file has these new headers when it needs to have the old ones.
I have also tried ConvertFrom-Csv which means I can no longer access the columns I need to, so lots of runtime errors.
What would be ideal is to be able to use the defined column headers and then convert back to the original column headers. My current code is below:
$csvfile = Import-Csv C:\test.csv| Where-Object {$_.'3' -eq $classID} | ConvertFrom-Csv
foreach($record in $csvfile){
# do something
}
$csvfile | Export-Csv -path C:\test.csv -NoTypeInformation -Append
I've searched the web for some hours now and tried everything I've come across, to no avail.
Thanks in advance.
This is a somewhat hackish approach, but it should work.
Remove the header line as a single string and save it somewhere
Parse the remaining result set (with the header line removed)
Add the saved line back at the top when you are finished
A CSV is a comma-delimited text file; you don't have to treat it as structured data. Feel free to slice and dice it as you want.
Since you know beforehand how many columns are in the input CSV file, you can import it with your own generic header, skip the original header row, and process the data internally. Example:
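The three steps above could be sketched like this for CSV text already read into memory. Update-CsvKeepingHeader is a hypothetical helper name, the temporary names Col0, Col1, ... are placeholders, and counting columns by splitting the header on commas assumes no header name contains a quoted comma.

```powershell
function Update-CsvKeepingHeader {
    # Hypothetical helper: edits CSV rows while preserving a header line
    # that may contain duplicate column names.
    param(
        [string[]]$Lines,        # raw CSV lines; the first one is the original header
        [scriptblock]$Process    # called once per row object; may modify the row
    )

    # Step 1: set the original header line aside, verbatim
    $originalHeader = $Lines[0]

    # Assumption: no header name contains a quoted comma, so a plain split counts columns
    $count = $originalHeader.Split(',').Count
    $generic = 0..($count - 1) | ForEach-Object { "Col$_" }

    # Step 2: parse only the data lines, under unique temporary names
    $rows = $Lines | Select-Object -Skip 1 | ConvertFrom-Csv -Header $generic
    foreach ($row in $rows) {
        # Discard anything the scriptblock outputs so only CSV lines are returned
        $null = & $Process $row
    }

    # Step 3: put the original header back on top, followed by the data lines
    # (Select-Object -Skip 1 drops the temporary Col0,Col1,... header line)
    $originalHeader
    $rows | ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1
}
```

Usage against the question's file might look like `Update-CsvKeepingHeader -Lines (Get-Content C:\test.csv) -Process { param($row) ... } | Set-Content C:\test.csv`, where the scriptblock body is whatever change you need. ConvertTo-Csv quotes every value; in PowerShell 7, its -UseQuotes parameter (for example -UseQuotes AsNeeded) can relax that if the other program objects to the quotes.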
$columns = 78
# Supply generic numeric headers and skip the file's own (duplicate) header row
Import-Csv "inputfile.csv" -Header (0..($columns - 1)) | Select-Object -Skip 1 | ForEach-Object {
    $row = $_
    $outputObject = New-Object PSObject
    # Copy each column into a uniquely named property Col0, Col1, ...
    0..($columns - 1) | ForEach-Object {
        $outputObject | Add-Member NoteProperty "Col$_" $row."$_"
    }
    $outputObject
} | Export-Csv "outputfile.csv" -NoTypeInformation
This example generates new PSObjects and then outputs a new CSV file with generic column names (Col0, Col1, etc.).

How to autofit columns of a CSV from PowerShell

I have a PowerShell script that connects to a database and exports the result to a CSV file.
However, there is one date column whose width needs to be increased manually after opening the CSV file.
Is there some command or property that will make the columns AutoFit?
export-csv $OutFile -NoTypeInformation
I can't export to Excel instead of CSV, because I don't have Excel installed on my machine.
This is what I tried most recently:
$objTable | Select Store,RegNo,Date,@{L="Amount";E={($_.Amount).PadLeft(50," ")}},TranCount
$objTable | export-csv $OutFile -NoTypeInformation
But even after adding PadLeft() the output is the same: the Date column is too narrow (it shows ###) and I need to widen it manually.
Regarding your need to increase one of your column sizes: the comments were right that everything is formatted based on the content of the objects. If you really need the cells to be a certain length, you need to change the data before it is exported. Using the string methods .PadLeft() and .PadRight(), I think you will get what you need.
Take this example using output from Get-ChildItem, which uses a calculated property to pad the Name "column" so that all of its values take up at least 20 characters.
$path = "C:\temp"
$results = Get-ChildItem $path
$results | Select LastWriteTime,@{L="Name";E={($_.Name).PadLeft(20," ")}} | Export-CSV C:\temp\file.csv -NoTypeInformation
The exported file would then look like this (notice the whitespace):
"LastWriteTime","Name"
"2/23/2015 7:33:55 PM","             folder1"
"2/23/2015 7:48:02 PM","             folder2"
"2/23/2015 7:48:02 PM","             Folder3"
"1/8/2015 10:37:45 PM","           logoutput"
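One detail in the question's attempt: the Select-Object line is not assigned or piped anywhere, so the following export-csv line writes the unmodified $objTable. Below is a minimal sketch in which the padded projection is what actually reaches the CSV conversion. The two names are made up, and ConvertTo-Csv is used instead of Export-CSV only so the text can be inspected in memory. Padding only lengthens the text values; a CSV file stores no formatting, so it cannot set an Excel column width directly.

```powershell
# Made-up names standing in for Get-ChildItem output
$items = 'folder1', 'logoutput' | ForEach-Object { [PSCustomObject]@{ Name = $_ } }

# Pipe the padded projection itself into the CSV conversion;
# a Select-Object whose output is discarded has no effect on what gets exported
$csv = $items |
    Select-Object @{ L = 'Name'; E = { $_.Name.PadLeft(20, ' ') } } |
    ConvertTo-Csv -NoTypeInformation
$csv
```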