We receive a tab-delimited CSV from an external COGNOS system in a public folder. Uploading it to Salesforce via the Data Loader CLI fails with:
com.salesforce.dataloader.exception.DataAccessRowException: Error
reading row #0: the number of data columns (98) exceeds the number of
columns in the header (97)
But if you open the CSV in MS Excel and save it as a new CSV (UTF-8), then pass that file to the Data Loader CLI, it works without any issue.
The difference in the Excel-converted file seems to be that it is comma-separated instead of tab-separated.
So I tried to convert the original tab-delimited CSV to a comma-separated CSV with the command below:
import-csv source.csv -delimiter "`t" | export-csv target.csv -notype
But the output of this has quotes around every value. Data Loader now runs against the file but imports nothing into Salesforce; it seems it is not able to identify the field names properly.
Then I tried the commands below to remove the double quotes:
import-csv source.csv -delimiter "`t" | export-csv target.csv -notype
(Get-Content target.csv) | Foreach-Object {$_ -replace '"', ''}|Out-File target.csv
But this resulted in an "Index out of range" error, which I don't understand.
What would be the best approach to do this conversion for the Data Loader CLI?
What would make this conversion behave the same as Excel's conversion?
Any suggestions, thoughts, or help to achieve this would be highly appreciated.
Thanks!
Salesforce has strict rules for CSV files; the documentation also says that no more than 50,000 records can be imported at one time.
The main thing here is that the file MUST be in UTF-8 format.
The quotes around the values are needed.
This should do it (provided you do not have more than 50,000 records in the CSV):
Import-Csv -Path 'source.csv' -Delimiter "`t" | Export-Csv -Path 'target.csv' -Encoding UTF8 -NoTypeInformation
(source.csv is the TAB-delimited file you receive from COGNOS)
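If the converted file still fails with the column-count error, a quick way to see where the mismatch comes from (a minimal sketch, assuming the same source.csv as above) is to compare the header row against the first data row; a stray trailing tab in a data row is one common cause of an extra column:
# Count the tab-separated columns in the header and in the first data row.
$header, $firstRow = Get-Content 'source.csv' -TotalCount 2
'Header columns: ' + $header.Split("`t").Count
'Row 1 columns:  ' + $firstRow.Split("`t").Count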
I have folder info for all user folders. It is dumped out to a CSV file as follows:
Servername, F:\Users\user, 9,355.7602 MB, 264, 3054, 03/15/2000 13:28:48, 12/10/2018 11:58:29
We are unable to work with the data as-is because of the thousands separator in the 3rd column. I could run the report scripts again, but we have a lot of file servers and a very large number of users on one in particular, so running the reports again is time-consuming. The commas are there because the data was written as a string rather than a number.
I can import and convert the file; the only problem is that any number over 1,000 gets split on its comma, and all the remaining data shifts one column over. I would like to remove any comma that sits between two digits. It doesn't seem like that would be hard to do with PowerShell, but I am not having any luck finding anything.
If you assume that columns of data are separated by a comma plus a space, and your numbers contain no spaces, you can use the -replace operator for this.
$line = 'Servername, F:\Users\user, 9,355.7602 MB, 264, 3054, 03/15/2000 13:28:48, 12/10/2018 11:58:29'
$line -replace '(?<=\d),(?=\d)'
If you are reading the data from a file, you can read the data with Get-Content, replace your data, and update the file with Set-Content.
(Get-Content file.csv) -replace '(?<=\d),(?=\d)' | Set-Content file.csv
If the file is large, you can utilize the faster switch statement.
$data = switch -regex -file file.csv {
'(?<=\d),(?=\d)' { $_ -replace '(?<=\d),(?=\d)' }
default {$_}
}
$data | Set-Content file.csv
Explanation:
(?<=\d) uses a positive lookbehind assertion (?<=) that matches a single digit \d.
(?=\d) uses a positive lookahead assertion (?=) that matches a single digit. You could replace this with (?=\d{3}) to match 3 consecutive digits after the comma.
Since you want to replace the target comma with an empty string, you do not need a replacement string.
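For example, the stricter lookahead variant applied to the sample line behaves the same way here, because the thousands separator is always followed by three digits:
# Remove a comma only when it sits between a digit and three following digits.
$line -replace '(?<=\d),(?=\d{3})'
# Output: Servername, F:\Users\user, 9355.7602 MB, 264, 3054, 03/15/2000 13:28:48, 12/10/2018 11:58:29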
Typically, it is best to stick with commands that work with CSV data or files. However, if your data contains commas and your text is not qualified, it can be difficult to distinguish between data and delimiters. If you have a clear way of making that distinction, you are better off using ConvertFrom-Csv for data already read into memory, or Import-Csv for files. You will need to define headers either in the files or in the command.
EDIT
It was my oversight that the , in the dataset is not quoted, which causes this answer not to work as expected: the comma is seen as a column separator when parsing the CSV. I'm going to leave it up, as it does explain how to manipulate the data generally if the column data were escaped properly. However, @AdminOfThings' answer should work for your specific case here, and it fixes the erroneous column without relying on parsing the content as a CSV first.
Import the data using Import-Csv, then remove any , in the third column. This assumes that you have no values where , is the decimal separator:
If you have headers in the CSV, you won't need to define header names or get fancy with writing the CSV back out:
(Import-Csv -Path \path\to\file.csv) | Foreach-Object {
    $_.ColumnName = $_.ColumnName -replace ','
    $_    # emit the modified row so it reaches Export-Csv
} | Export-Csv -NoTypeInformation -Path \path\to\file.csv
The way this works is that we import the CSV as operable PSCustomObjects, then for each row we take the value in the size column and remove the , from it. Finally, we export the modified objects back out to the original CSV.
If you don't have headers, it gets a little trickier, since we have to define temporary headers and Export-Csv doesn't have an option to skip writing headers back out:
(Import-Csv -Path \path\to\file.csv -Header Col1, Col2, Col3, Col4, Col5, Col6, Col7) |
    Foreach-Object {
        $_.Col3 = $_.Col3 -replace ','
        $_    # emit the modified row
    } | ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1 |
    Set-Content -Path \path\to\file.csv
This does the same thing as the first block of code, but since we don't want to export the temporary headers, we have to get creative. First, note we reference the target column with the temporary header name. Instead of piping the modified CSV object right to Export-Csv, first we want to convert the object to CSV using ConvertTo-Csv. We then use Select-Object to skip the first line of the converted CSV text, which is the header, so we just have the row data and column values. Finally, we use Set-Content to write the CSV text without the header back to the original file.
I have a huge CSV file where the first line contains the headers for the data. Because of the file size I can't open it in Excel or similar tools. I need to filter out all but the rows I actually need: I want to create a new CSV file that contains only the rows where Header3 = "TextHere". Everything else is filtered away.
I have tried a Get-Content / Select-String | Out-File 'newfile.csv' pipeline in PowerShell, but it lost the header row and also mixed up the data, putting values into the wrong fields. The data contains empty fields, and I believe that is what breaks it. When I tried Get-Content with -First or -Last, the data seemed to be in order.
I have no prior experience handling big data files or PowerShell. Other options besides PowerShell are also possible as long as they are free for non-commercial use.
Try something like this (modify the delimiter if necessary):
import-csv "c:\temp\yourfile.csv" -delimiter ";" | where Header3 -eq "TextHere" | export-csv "c:\temp\result.csv" -delimiter ";" -notype
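The simplified Where-Object syntax above needs PowerShell 3.0 or later; on older versions, a sketch with the script-block form (same hypothetical paths, delimiter, and column name) does the same thing:
Import-Csv "c:\temp\yourfile.csv" -Delimiter ";" |
    Where-Object { $_.Header3 -eq "TextHere" } |
    Export-Csv "c:\temp\result.csv" -Delimiter ";" -NoTypeInformation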
I have a tab-delimited txt file and I need to switch the first and second column names (without switching the columns' data). In other words, I need to rename A(Id) to B(ExternalId) and B(ExternalId) to A(Id). The other columns in the file (the other data) should stay unchanged. I'm very new to PowerShell, please advise. As I understand it, I need to use the Import-Csv/Export-Csv cmdlets.
I tried this, but it's not working the right way...
Import-Csv 'C:\original_users.txt' |
Select-Object Id, @{Name="ExternalId";Expression={$_."Id"}}; Select-Object ExternalId, @{Name="Id";Expression={$_."ExternalId"}} |
Export-Csv 'C:\changed_users.txt'
The Import-Csv and Export-Csv cmdlets have their strengths, but this might not be one of them. The latter cmdlet would introduce quoting that might not be in your original file and might not be desired.
Either way, why not just do some text manipulation on the first line? Let's read in the file and output the first line, edited, followed by the remainder of the file. This sample writes to a new location, but you could easily write it back to the same file.
# Get the full file into a variable
$fullFile = Get-Content "c:\temp\mockdata.csv"
# Parse the first line into a column array
$columns = $fullFile[0].Split("`t")
# Rebuild the header by switching the columns order as desired.
$newHeader = ($columns[1],$columns[0] + ($columns | Select-Object -Skip 2)) -join "`t"
# Write the header back to file then the rest of the data.
$outputPath = "C:\somepath.txt"
$newHeader | Set-Content $outputPath
$fullFile | Select-Object -Skip 1 | Add-Content $outputPath
This also preserves the presence of other columns and their data.
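As a quick sanity check (assuming the sample paths above), compare the first line of the input and output files; only the first two header names should have traded places, and the data rows are copied through untouched:
# First line of the original file vs. the rewritten file.
Get-Content "c:\temp\mockdata.csv" -TotalCount 1
Get-Content $outputPath -TotalCount 1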
I have a large set of tables that I am exporting from one server. Certain description attributes store a boatload of special characters that clash with the delimiter specified in the code. I've tried delimiting by pipes, commas, semicolons, brackets, curly brackets, arithmetic signs, even the tilde next to the 1 key, and they're all being used in the data. I thought maybe the best way is to delimit by a combined delimiter such as {] or {!], something that won't be used in the data. My issue is that PowerShell's Export-Csv only allows single-character delimiters; how do you force it to take more than one?
This is what I have:
export-csv $raw_file_path -delimiter "{"
I want to do this:
export-csv $raw_file_path -delimiter "{]"
I figured it out. In my previous PowerShell code I stripped the text qualifiers (""), but I actually need them in order for the import to work with a single delimiter.
This is what I was doing:
$table = $set.tables[0] | export-csv $raw_file_path -delimiter "{"
#clean "" out of file
(Get-Content $raw_file_path) | %{ $_ -replace '"','' } | Out-File -FilePath $file_path -Force -Encoding ascii
This is what I should of been doing:
$table = $set.tables[0] | export-csv $raw_file_path -delimiter "{"
#clean "" out of file
(Get-Content $raw_file_path)|Out-File FilePath $file_path -Force -Encoding ascii
Once I import the CSV into SQL, I can set up a function that pulls the data apart based on the text qualifier and delimiter. Since the data is wrapped in the text qualifier, the special characters inside my data strings no longer get picked up as delimiters.
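For reference, here is a minimal sketch (hypothetical data and output path) of why keeping the text qualifiers makes a single-character delimiter safe: Export-Csv wraps every value in double quotes by default, so a '{' inside the data is never mistaken for a separator:
# Hypothetical row whose description contains the '{' delimiter and other specials.
$row = [pscustomobject]@{ Id = 1; Description = 'free text with { and , and | inside' }
# Default quoting wraps each value in "", so the embedded '{' is not read as a column break.
$row | Export-Csv 'C:\temp\export.csv' -Delimiter '{' -NoTypeInformation -Encoding ASCII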