From a CSV file get the file header and a portion of the file based on starting and ending line number parameters using PowerShell - powershell

So I have a very huge CSV file, the first line has the column headers. I want to keep the first line as a header and add a portion of the file from the file's mid-section or perhaps the end. I'm also trying to select only a few of the columns from the file. And finally, it would be great if the solution also changed the file delimiter from a comma to a tab.
I'm aiming for a solution that's a one-liner or perhaps 2?
Non-working Code version 30 ...
Get-Content -Tail 100 filename.csv | Export-Csv -Delimiter "`t" -NoTypeInformation -Path .\filename_out.csv
I'm trying to get a better grip on PowerShell. So far, so good, but I'm not quite there yet. Solving challenges like this is helping me (and hopefully others) build a good collection of coding idioms. (FYI: the boss is trying PowerShell thanks to our efforts.)

OK, thanks to iRon's tip: Import-Csv defaults to comma-separated input, Select-Object -Property gets the columns I want, select -Last gets the last 200 rows, and Export-Csv changes the delimiter to a tab:
Import-Csv iarf.csv |
Select-Object -Property Id,Name,RecordTypeId,CreatedDate |
select -Last 200 |
Export-Csv -Delimiter "`t" -NoTypeInformation -Path .\iarf100props6.csv

iRon provided the crucial pointer: Using Import-Csv rather than Get-Content allows you to retrieve arbitrary ranges from the original file as objects, if selected via Select-Object, and exporting these objects again via Export-Csv automatically includes a header line whose column names are the input objects' property names, as initially derived from the input file's header line.
To select an arbitrary range of rows, use Select-Object's -First, -Last, and -Skip parameters:
To get rows from the beginning only, use just -First $count.
To get rows from the end only, use just -Last $count.
To get rows in a given range, combine -Skip $startRowMinus1 with -First $rangeRowCount.
For instance, the following command skips the first 9 rows and extracts the next 20, that is, rows 10 through 29:
Import-Csv iarf.csv |
Select-Object -Property Id,Name,RecordTypeId,CreatedDate -Skip 9 -First 20 |
Export-Csv -Delimiter "`t" -NoTypeInformation -Path .\iarf100props6.csv
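The -Skip/-First arithmetic for an inclusive range can be spelled out with variables. The following is a minimal sketch, not part of the original answer: the variable names, the temporary file names, and the generated sample data are all assumptions made so that the example runs on its own. Rows are numbered from 1 and the header line is not counted. For an inclusive range the value passed to -First is end - start + 1, so rows 10 through 30 correspond to -Skip 9 -First 21.

```powershell
# Assumed sample data: 50 rows with columns Id, Name, Other, written to a temp file.
$inPath  = Join-Path ([IO.Path]::GetTempPath()) 'range_demo_in.csv'
$outPath = Join-Path ([IO.Path]::GetTempPath()) 'range_demo_out.tsv'
1..50 | ForEach-Object {
    [pscustomobject]@{ Id = $_; Name = "Name$_"; Other = "x$_" }
} | Export-Csv -NoTypeInformation -Path $inPath

# Inclusive, 1-based range of data rows to extract.
$startRow = 10
$endRow   = 30

$skipCount = $startRow - 1              # rows to pass over before the range
$rowCount  = $endRow - $startRow + 1    # number of rows in the range (21 here)

# Select two of the three columns, keep only the range, and write tab-delimited output.
Import-Csv -Path $inPath |
    Select-Object -Property Id, Name -Skip $skipCount -First $rowCount |
    Export-Csv -Delimiter "`t" -NoTypeInformation -Path $outPath
```

In PowerShell 3 and later, -First stops the pipeline once enough rows have been collected, so a range near the start of a very large file does not require reading the whole file.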

Related

Filtering data from CSV file with PowerShell

I have a huge CSV file whose first line contains the column headers. Because of the file size I can't open it with Excel or similar tools. I need to keep only the rows I'm interested in: I want to create a new CSV file that contains only the rows where Header3 = "TextHere". Everything else should be filtered out.
In PowerShell I tried Get-Content with Select-String piped to Out-File 'newfile.csv', but that lost the header row and also scrambled the data, putting values into the wrong fields. The data includes empty fields, and I believe that is what causes the problem. When I tried Get-Content -First or -Last, the data seemed to be in order.
I have no prior experience with big data files or PowerShell. Options other than PowerShell are also welcome, as long as they are free for non-commercial use.
Try it like this (change the delimiter if necessary):
import-csv "c:\temp\yourfile.csv" -delimiter ";" | where Header3 -eq "TextHere" | export-csv "c:\temp\result.csv" -delimiter ";" -notype

Powershell Import-CSV: either reorder columns with no headers or export-CSV with no header

I have a CSV file with no headers in which I need to add a column between the first and second column. I ultimately want to export the file with no headers to the same file name (hence why I'm using Import-CSV vs Get-Content).
Here is the script which adds the new column to the existing file:
(Import-CSV U:\To_Delete\Layer_search\WB_layers-mod.csv) |
Select-Object *,@{Expression={'FALSE'}} |
Export-Csv U:\To_Delete\Layer_search\WB_layers-mod.csv -NoTypeInformation
This adds the new column as a third column. In my googling, I could find no way to reorder the columns without having a header row, so I modified the script as follows:
(Import-CSV U:\To_Delete\Layer_search\WB_layers-mod.csv -header H1, H2)|
Select-Object *,@{Name='H3';Expression={'FALSE'}} |
Select-Object -Property H1, H3, H2 |
Export-Csv U:\To_Delete\Layer_search\WB_layers-mod.csv -NoTypeInformation
This puts the columns in the correct order, but now I want to export the CSV without the header row I added. However, the only suggestions I'm finding involve Get-Content, ConvertTo-Csv, and -Skip 1. That won't work for me because I'm only running PS3 and that option is not available until PS5, and I want to save back to the original file name, which you can't do with Get-Content.
I'm probably making this a lot harder than it needs to be, but I would appreciate any suggestions.
Thanks!
Edited to add:
For clarification, here is an example of the CSV file I am importing:
abc,1/1/2012
def,1/2/2012
ghi,1/3/2012
I want to add a column between the first and second columns:
abc,FALSE,1/1/2012
def,FALSE,1/2/2012
ghi,TRUE,1/3/2012
Hopefully this makes more sense.
I'd read the file with the headers H1 and H3 and calculate H2, so that the columns come out in the proper order directly.
(Import-CSV .\sample.csv -header H1, H3)|
Select-Object H1,@{Name='H2';Expression={'FALSE'}},H3 |
ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1 | Set-Content New.csv
Sample output
"abc","FALSE","1/1/2012"
"def","FALSE","1/2/2012"
"ghi","FALSE","1/3/2012"

How can I alternate column headers in a tab delimited file?

I have a tab-delimited txt file and I need to switch the first and second column names (without switching the columns' data). In other words, I need to rename A (Id) to B (ExternalId) and B (ExternalId) to A (Id). The other columns in the file (and their data) should stay unchanged. I'm very new to PowerShell, so please advise. As I understand it, I need to use the Import-Csv and Export-Csv cmdlets.
I tried this, but it's not working the right way...
Import-Csv 'C:\original_users.txt' |
Select-Object Id, @{Name="ExternalId";Expression={$_."Id"}}; Select-Object ExternalId, @{Name="Id";Expression={$_."ExternalId"}} |
Export-Csv 'C:\changed_users.txt'
The Import-Csv and Export-Csv cmdlets have their strengths, but this might not be one of them. The latter cmdlet would introduce quoting that might not be in your original file and that might not be desired.
Either way, why not just do some text manipulation on the first line? Let's read in the file, then output the edited first line followed by the remainder of the file. This sample writes to a new location, but you could easily write it back to the same file.
# Get the full file into a variable
$fullFile = Get-Content "c:\temp\mockdata.csv"
# Parse the first line into a column array
$columns = $fullFile[0].Split("`t")
# Rebuild the header by switching the columns order as desired.
$newHeader = ($columns[1],$columns[0] + ($columns | Select-Object -Skip 2)) -join "`t"
# Write the header back to file then the rest of the data.
$outputPath = "C:\somepath.txt"
$newHeader | Set-Content $outputPath
$fullFile | Select-Object -Skip 1 | Add-Content $outputPath
This also preserves the presence of other columns and their data.

Using duplicate headers in Powershell .csv file

I have a .csv file and I want to import it into powershell then iterate through the file changing certain values. I then want the output to append to the original .csv file, so that the values have been updated.
My issue is that the .csv file has headers which aren't unique, and can't be changed as then it won't work in another program. Originally I defined my own headers in the powershell to get around this but then the output file has these new headers when it needs to have the old ones.
I have also tried ConvertFrom-Csv which means I can no longer access the columns I need to, so lots of runtime errors.
What would be ideal is to be able to use the defined column headers and then convert back to the original column headers. My current code is below:
$csvfile = Import-Csv C:\test.csv| Where-Object {$_.'3' -eq $classID} | ConvertFrom-Csv
foreach($record in $csvfile){
*do something*}
$csvfile | Export-Csv -path C:\test.csv -NoTypeInformation -Append
I've searched the web now for some hours and tried everything I've come across, to no avail.
Thanks in advance.
This is a somewhat hackish implementation but should work.
Remove all the headers as a single line and save it somewhere
Parse the new result-set (with the headers removed)
Add the line at the top when you are finished
A CSV is just a comma-delimited text file; you don't have to treat it like structured data. Feel free to slice and dice it as you want.
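The three steps above could look roughly like this. It is only a sketch under stated assumptions: the sample file, its duplicate Value columns, the temporary column names C0, C1, ... and the example edit are all made up for illustration, and the header is split on plain commas, which assumes no header name contains a quoted comma.

```powershell
# Assumed sample file with duplicate column names, written to a temp location.
$path = Join-Path ([IO.Path]::GetTempPath()) 'dup_header_demo.csv'
Set-Content -Path $path -Value @(
    'Name,Value,Value'
    'a,1,10'
    'b,2,20'
)

# Step 1: save the original header line as plain text.
$headerLine = Get-Content -Path $path -TotalCount 1

# Step 2: parse the data under temporary unique names (C0, C1, ...).
# With -Header, the file's own header row is read as data, so skip it.
$names = 0..(($headerLine -split ',').Count - 1) | ForEach-Object { "C$_" }
$rows  = @(Import-Csv -Path $path -Header $names | Select-Object -Skip 1)

# Example edit (hypothetical): change the third column where the first is 'b'.
foreach ($row in $rows) {
    if ($row.C0 -eq 'b') { $row.C2 = '99' }
}

# Step 3: drop the temporary header and put the original line back on top.
$dataLines = $rows | ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1
Set-Content -Path $path -Value (@($headerLine) + @($dataLines))
```

Note that ConvertTo-Csv quotes every value by default, so the data lines come out quoted while the restored header line keeps its original form.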
Since you know beforehand how many columns are in the input CSV file, you can import without the header and process internally. Example:
$columns = 78
Import-Csv "inputfile.csv" -Header (0..$($columns - 1)) | Select-Object -Skip 1 | ForEach-Object {
$row = $_
$outputObject = New-Object PSObject
0..$($columns- 1) | ForEach-Object {
$outputObject | Add-Member NoteProperty "Col$_" $row.$_
}
$outputObject
} | Export-Csv "outputfile.csv" -NoTypeInformation
This example generates new PSObjects and then outputs a new CSV file with generic column names (Col0, Col1, etc.).

Removing Header and Footer from imported .csv

I have 3 .csv files that I am combining into one. This bit of code works:
Get-ChildItem 'C:\Scripts\testing\csvStuffer\temp\Individual.*.csv' |
ForEach-Object {Import-Csv $_} |
Export-Csv -NoTypeInformation 'C:\Scripts\testing\csvStuffer\temp\MergedCsvFiles.csv'
The problem is that each .csv file has a header and a footer.
I do not want to keep the header or footer from any of the files.
Any suggestions of what I need to add to the above code to remove the headers and footers???
Thanks!
This is not the most elegant solution but it worked for my test files.
Get-ChildItem 'C:\Scripts\testing\csvStuffer\temp\Individual.*.csv' |
ForEach-Object {
$filecontent = get-content $_ | select-object -skip 1;
$filecontent | select -First $($filecontent.length -1) | Set-Content -Path $_;
};
Skipping the first line is easy with select-object. Dropping the last line requires a bit more work, but since get-content returns an array of lines, you can just grab all but the last element in that array.
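Grabbing everything except the first and last element can also be written with an index range. This is just a sketch to illustrate the idea; the temp file and its contents are made up for the example, and it reads the whole file into memory, like the code above.

```powershell
# Assumed sample file: one header line, two data lines, one footer line.
$path = Join-Path ([IO.Path]::GetTempPath()) 'trim_demo.csv'
Set-Content -Path $path -Value 'HEADER', 'a,1', 'b,2', 'FOOTER'

# @(...) guarantees an array even if the file has only a single line.
$lines = @(Get-Content -Path $path)

# Indexes 1 .. Count-2 cover everything between the first and last line.
# The guard matters: with fewer than 3 lines the range would run backwards
# (for example 1..0) and return lines instead of nothing.
if ($lines.Count -ge 3) {
    $body = $lines[1..($lines.Count - 2)]
} else {
    $body = @()
}

$body   # emits the two data lines: a,1 and b,2
```

As far as I know, PowerShell 5.0 and later also offer Select-Object -SkipLast 1 for dropping the last line, which is not available on older versions.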
Looks like alroc already gave an answer, but since I already had it written up I figured I'd post this too. It doesn't load it all into a variable, it just reads each file, strips the first and last line of the current file, and then pipes to out-file with -append on it.
gci 'C:\Scripts\testing\csvStuffer\temp\Individual.*.csv' | %{
$(gc $_.fullname|select -Skip 1)|select -First ($(gc $_.fullname|select -Skip 1).count-1)
}|Out-File -Append 'C:\Scripts\testing\csvStuffer\temp\MergedCsvFiles.csv'