Split .csv by comma but skip one comma (PowerShell)

I am a novice in PowerShell and haven't found a solution to my issue on Stack Overflow, but I still think PowerShell is the best tool for this purpose.
I have a .csv file that has to be split into separate columns by comma, but the first comma has to be skipped. It looks like this:
The goal is to have 3 columns: "Name, Surname (Link Personal)", "SSO", and "Access".
If I am not mistaken, the way to split it at every single comma would be:
import-csv \\file.csv | export-csv -Delimiter "," -path \\updated_file.csv

You can use a calculated property to concatenate the first two columns into one:
Import-Csv -Path \\file.csv |
Select-Object -Property @{Name = 'Name, Surname (Link Personal)'; Expression= {"$($_.Name), $($_.'Surname (Link Personal)')"}}, SSO, Access |
Export-Csv -Path \\updated_file.csv -Delimiter ','
By the way, when you post sample data, post it as text formatted as code so that people willing to help can copy it and play around with it. ;-)
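To see what the calculated property produces, here is a minimal sketch run against in-memory data. The sample row is hypothetical (the original sample data was not posted as text), and the sketch assumes the CSV parser has already split the first field at its comma into the Name and 'Surname (Link Personal)' columns.

```powershell
# Hypothetical sample row; the real file's values are not known.
$rows = @'
Name,Surname (Link Personal),SSO,Access
Doe,John (link),12345,Admin
'@ | ConvertFrom-Csv

# Calculated property that glues the first two columns back together.
$nameColumn = @{
    Name       = 'Name, Surname (Link Personal)'
    Expression = { "$($_.Name), $($_.'Surname (Link Personal)')" }
}

$merged = @($rows | Select-Object -Property $nameColumn, 'SSO', 'Access')

# Show the merged value of the first row.
$merged[0].'Name, Surname (Link Personal)'
```

Because the merged value contains a comma, Export-Csv will quote it on output, so the file still reads back as three columns.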

Related

From a CSV file get the file header and a portion of the file based on starting and ending line number parameters using PowerShell

So I have a very huge CSV file, the first line has the column headers. I want to keep the first line as a header and add a portion of the file from the file's mid-section or perhaps the end. I'm also trying to select only a few of the columns from the file. And finally, it would be great if the solution also changed the file delimiter from a comma to a tab.
I'm aiming for a solution that's a one-liner or perhaps 2?
Non-working Code version 30 ...
Get-Content -Tail 100 filename.csv | Export-Csv -Delimiter "`t" -NoTypeInformation -Path .\filename_out.csv
I'm trying to get a better grip on PowerShell. So far, so good, but I'm not quite there yet. Trying to solve such challenges is helping me (and hopefully others) build a good collection of coding idioms. (FYI - the boss is trying PowerShell due to our efforts.)
OK, thanks to iRon's tip: Import-Csv defaults to comma-separated input, Select-Object -Property gets the columns I want, select -Last gets the last 200 rows, and Export-Csv changes the delimiter to a tab:
Import-Csv iarf.csv |
Select-Object -Property Id,Name,RecordTypeId,CreatedDate |
select -Last 200 |
Export-Csv -Delimiter "`t" -NoTypeInformation -Path .\iarf100props6.csv
iRon provided the crucial pointer: Using Import-Csv rather than Get-Content allows you to retrieve arbitrary ranges from the original file as objects, if selected via Select-Object, and exporting these objects again via Export-Csv automatically includes a header line whose column names are the input objects' property names, as initially derived from the input file's header line.
To select an arbitrary range of rows, use Select-Object's -First, -Last, and -Skip parameters:
To get rows from the beginning only, use just -First $count.
To get rows from the end only, use just -Last $count.
To get rows in a given range, combine -Skip $startRowMinus1 with -First $rangeRowCount.
For instance, the following command extracts rows 10 through 29 (20 rows):
Import-Csv iarf.csv |
Select-Object -Property Id,Name,RecordTypeId,CreatedDate -Skip 9 -First 20 |
Export-Csv -Delimiter "`t" -NoTypeInformation -Path .\iarf100props6.csv
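The arithmetic behind -Skip and -First is easy to get off by one, so here is a small sketch that derives both values from a start row and an end row. The in-memory rows and the variable names $startRow and $endRow are assumptions for illustration; with a real file you would pipe from Import-Csv instead.

```powershell
# Hypothetical stand-in for Import-Csv output: 50 data rows numbered 1..50.
$rows = 1..50 | ForEach-Object { [pscustomobject]@{ Id = $_; Name = "row$_" } }

$startRow = 10   # first data row to keep (1-based, header not counted)
$endRow   = 30   # last data row to keep (inclusive)

# Skip everything before the start row, then take the inclusive row count.
$range = @($rows | Select-Object -Skip ($startRow - 1) -First ($endRow - $startRow + 1))

"{0} rows, Id {1} through {2}" -f $range.Count, $range[0].Id, $range[-1].Id
```

An inclusive range from row 10 to row 30 contains 21 rows, so -First 21 is needed to include row 30.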

How can I alternate column headers in a tab delimited file?

I have a tab-delimited txt file and I need to switch the first and second column names (without switching the columns' data). In other words, I need to rename A (Id) to B (ExternalId) and B (ExternalId) to A (Id). The other columns in the file (other data) should stay unchanged. I'm very new to PowerShell; please advise. As I understand it, I need to use the Import-Csv/Export-Csv cmdlets.
I tried this, but it's not working the right way...
Import-Csv 'C:\original_users.txt' |
Select-Object Id, @{Name="ExternalId";Expression={$_."Id"}}; Select-Object ExternalId, @{Name="Id";Expression={$_."ExternalId"}} |
Export-Csv 'C:\changed_users.txt'
The Import-Csv and Export-Csv cmdlets have their strengths, but this might not be one of them. The latter cmdlet would introduce quoting that might not be in your original file and that might not be desired.
Either way, why not just do some text manipulation on the first line? Let's read in the file and output the first line, edited, followed by the remainder of the file. This sample writes to a new location, but you could easily write it back to the same file.
# Get the full file into a variable
$fullFile = Get-Content "c:\temp\mockdata.csv"
# Parse the first line into a column array
$columns = $fullFile[0].Split("`t")
# Rebuild the header by switching the columns order as desired.
$newHeader = ($columns[1],$columns[0] + ($columns | Select-Object -Skip 2)) -join "`t"
# Write the header back to file then the rest of the data.
$outputPath = "C:\somepath.txt"
$newHeader | Set-Content $outputPath
$fullFile | Select-Object -Skip 1 | Add-Content $outputPath
This also preserves the presence of other columns and their data.
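The same header swap can be tried without touching any files. This sketch works on a hypothetical in-memory array of tab-delimited lines; wrapping both halves in @() is an added safeguard (an assumption beyond the answer above) so that a file with only two columns does not get a trailing tab.

```powershell
# Hypothetical file content: a header line followed by two data lines.
$lines = @(
    "Id`tExternalId`tName",
    "1`tA1`tAlice",
    "2`tB2`tBob"
)

# Split the header, swap the first two names, and keep any remaining names in order.
$columns   = $lines[0].Split("`t")
$newHeader = (@($columns[1], $columns[0]) + @($columns | Select-Object -Skip 2)) -join "`t"

# New header first, then the untouched data lines.
$result = @($newHeader) + @($lines | Select-Object -Skip 1)
$result
```

Only the header text changes; every data line is passed through byte for byte, so no quoting is introduced.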

Learning PowerShell. Create usernames no longer than 8 characters and check for collision

I'm learning powershell right now.
I need to import a CSV like this:
lastname,firstname
lastname,firstname
lastname,firstname
etc
Then create a list of usernames no longer than 8 characters and check for collisions.
I have found bits and pieces of scripting around but not sure how to tie it all together.
I use Import-Csv to import my file.csv:
$variablename = import-csv C:\path\to\file.csv
but then I am not sure if I just import it into an array or not. I am not familiar with how for loops work in powershell exactly.
Any direction? Thanks.
There are a couple of concepts that are central to understanding PowerShell. Firstly, remember that you are always working with objects. So after importing your CSV file, your $variablename will refer to a collection of sub-objects.
Secondly, you can use the PowerShell pipeline to send the output of one cmdlet to the input of another. Some cmdlets will understand if you send them a collection, and automatically process each row.
I think what you're looking for, though, is the ForEach-Object cmdlet, which allows you to run code against each item in the collection. Code inside the ForEach-Object block can refer to the $_ automatic variable, which contains the current object.
Assuming your CSV file is well formatted and has a header row with the column names, you can refer to each column by name e.g. $_.lastname & $_.firstname.
To put it all together:
import-csv C:\path\to\file.csv |
foreach-object {
write-host "Processing: $($_.lastname), $($_.firstname)"
# logic here to calculate username and create AD account
}
PowerShell can have a bit of a learning curve if you are coming from a different scripting environment. Here are a couple of resources that I've found helpful:
PowerShell 'gotchas' http://www.rlmueller.net/PSGotchas.htm
Keith Hill's Effective PowerShell: https://rkeithhill.wordpress.com/2009/03/08/effective-windows-powershell-the-free-ebook/
Also, check out the Technet Script Center, where there are many hundreds of Active Directory scripts. https://technet.microsoft.com/en-us/scriptcenter/bb410849.aspx
The script below should help you grasp a few concepts on how to work with csvs and manipulate data using PowerShell.
# The code below uses a here-string to mimic the import of a CSV.
$users = @'
smith,b
smith,bob
smith,bobby
smith,sonny
smithson,john
smithson,jane
smithers,rob
'@ -split "`r*`n"
$users |
ConvertFrom-Csv -Header 'surname','firstname' |
Select-Object @{Name='username'; Expression={"$($_.surname)$($_.firstname)        "}}, surname, firstname |
Group-Object { $_.username.Substring(0,8).Trim() } |
Select-Object @{Name='username'; Expression={$_.Name}}, Count |
Format-Table -AutoSize
The $users | line takes the list in $users and pipes it into the next command.
The ConvertFrom-Csv -Header... line converts the CSV text into objects with surname and firstname properties.
The Select-Object @{Name... line creates a calculated property that concatenates surname and firstname. Note the 8 extra spaces appended to the end of the string, which guarantee that the string is at least 8 characters long.
The Group-Object {... line groups by username, using the first 8 characters. The .Trim() gets rid of any trailing spaces.
The Select-Object @{Name='username'... line takes the Name field from the Group-Object output, renames it to username, and also shows the count from the grouping operation.
The Format-Table -AutoSize line is purely for formatting the console output and gives you output like the one below.
username Count
-------- -----
smithb       1
smithbob     2
smithson     3
smithers     1
Here is an amended version of the above code, which you can use on your real CSV. Change the surname and firstname column names to suit your CSV.
# You would use the code below to import your list of names.
# Uncomment the `# -Header surname,firstname` bit if your CSV has no headers.
$users = Import-Csv -Path 'c:\path\to\names.csv' # -Header surname,firstname
$users |
Select-Object @{Name='username'; Expression={"$($_.surname)$($_.firstname)        "}}, surname, firstname |
Group-Object { $_.username.Substring(0,8).Trim() } |
Select-Object @{Name='username'; Expression={$_.Name}}, Count
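The grouping above reports collisions but does not resolve them. One possible way to hand out unique names is sketched below. The helper New-UniqueUsername, the lower-casing, and the numeric-suffix scheme are all assumptions for illustration, not something the answers above prescribe.

```powershell
# Hypothetical helper: returns a username of at most $MaxLength characters and
# appends a numeric suffix (2, 3, ...) when the base name is already taken.
function New-UniqueUsername {
    param(
        [string]$Surname,
        [string]$Firstname,
        [System.Collections.Generic.HashSet[string]]$Taken,
        [int]$MaxLength = 8
    )
    $full      = "$Surname$Firstname".ToLower()
    $candidate = $full.Substring(0, [Math]::Min($MaxLength, $full.Length))
    $n = 1
    # HashSet.Add returns $false when the value is already present (a collision).
    while (-not $Taken.Add($candidate)) {
        $n++
        $suffix    = "$n"
        $keep      = [Math]::Min($MaxLength - $suffix.Length, $full.Length)
        $candidate = $full.Substring(0, $keep) + $suffix
    }
    $candidate
}

# Hypothetical sample data with a header row.
$names = @'
surname,firstname
smith,bob
smith,bobby
smithson,john
'@ | ConvertFrom-Csv

$taken     = New-Object 'System.Collections.Generic.HashSet[string]'
$usernames = @($names | ForEach-Object {
        New-UniqueUsername -Surname $_.surname -Firstname $_.firstname -Taken $taken
    })
$usernames
```

In a real environment the set of taken names would also need to be seeded with the accounts that already exist, which this sketch does not attempt.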

Reformat column names in a csv with PowerShell

Question
How do I reformat an unknown CSV column name according to a formula or subroutine (e.g. rename column " Arbitrary Column Name " to "Arbitrary Column Name" by running a trim or regex or something) while maintaining data?
Goal
I'm trying to more or less sanitize columns (the names) in a hand-produced (or at least hand-edited) csv file that needs to be processed by an existing PowerShell script. In this specific case, the columns have spaces that would be removed by a call to [String]::Trim(), or which could be ignored with an appropriate regex, but I can't figure a way to call or use those techniques when importing or processing a CSV.
Short Background
Most files and columns have historically been entered into the CSV properly, but recently a few columns were being dropped during processing; I determined it was because the files contained a space (e.g., Select-Object was being told to get "RFC", but Import-CSV retrieved "RFC ", so no matchy-matchy). Telling the customer to enter it correctly by hand (though preferred and much simpler) is not an option in this case.
Options considered
I could manually process the text of the file, but that is a messy and error prone way to re-invent the wheel. I wonder if there's a syntax with Select-Object that would allow a softer match for column names, but I can't find that info.
The closest I have come conceptually is using a calculated property in the call to Select-Object to rename the column, but I can only find ways to rename a known column to another known column. So, this would require enumerating the columns and matching them exactly (preferred) or a softer match (like comparing after trimming or matching via regex as a fallback) with expected column names, then creating a collection of name mappings to use in constructing calculated properties from that information to select into a new object.
That seems like it would work, but it's more work than I'd prefer, and I can't help but hope that there's a simpler way I haven't been able to find via Google. Maybe I should try Bing?
Sample File
Let's say you have a file.csv like this:
" RFC "
"1"
"2"
"3"
Code
Now try to run the following:
$CSV = Get-Content file.csv -First 2 | ConvertFrom-Csv
$FixedHeaders = $CSV.PSObject.Properties.Name.Trim(' ')
Import-Csv file.csv -Header $FixedHeaders |
Select-Object -Skip 1 -Property RFC
Output
You will get this output:
RFC
---
1
2
3
Explanation
First we use Get-Content with parameter -First 2 to get the first two lines. Piping to ConvertFrom-Csv will allow us to access the headers with PSObject.Properties.Name. Use Import-Csv with the -Header parameter to use the trimmed headers. Pipe to Select-Object and use -Skip 1 to skip the original headers.
I'm not sure about comparisons in terms of efficiency, but I think this is a little more hardened, and imports the CSV only once. You might be able to use @lahell's approach and Get-Content -Raw, but this was done and it works, so I'm gonna leave it to the community to determine which is better...
# Import the CSV
$rawCSV = Import-Csv $Path
# Get the actual header names and map them to their reformatted versions
$CSVColumns = @{}
$rawCSV |
    Get-Member |
    Where-Object {$_.MemberType -eq "NoteProperty"} |
    Select-Object -ExpandProperty Name |
    ForEach-Object {
        # Map a trimmed and whitespace-reduced version of the name to the original name
        $CSVColumns.Add(($_.Trim() -replace '(\s)\s+', '$1'), "$_")
    }
# Create the array of names and calculated properties to pass to Select-Object
$SelectColumns = @()
$CSVColumns.GetEnumerator() |
    ForEach-Object {
        # Evaluate the if/else immediately so that a name or a hashtable
        # (not a script block) is appended to the array
        $SelectColumns += $(
            if ($CSVColumns.Values -contains $_.Key) { $_.Key }
            else { @{Name = $_.Key; Expression = $CSVColumns[$_.Key]} }
        )
    }
$FormattedCSV = $rawCSV |
    Select-Object $SelectColumns
This was hand-copied to a computer where I don't have the rights to run it, so there might be an error - I tried to copy it correctly
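Another way to normalize every header in one pass is to rebuild each row with cleaned property names. This is a minimal sketch using hypothetical in-memory data; the cleaning rule (trim, then collapse runs of whitespace to a single space) is an assumption modeled on the question, and it preserves the original column order.

```powershell
# Hypothetical sample with untidy headers; quoting keeps the stray spaces.
$csv = @'
" RFC ","Owner   Name"
"1","Ann"
"2","Bob"
'@ | ConvertFrom-Csv

$clean = @(foreach ($row in $csv) {
        $out = [ordered]@{}
        foreach ($p in $row.PSObject.Properties) {
            # Trim the name and collapse inner runs of whitespace to one space.
            $out[($p.Name.Trim() -replace '\s+', ' ')] = $p.Value
        }
        [pscustomobject]$out
    })

$clean | Select-Object -Property RFC, 'Owner Name'
```

If two headers clean up to the same name, the later column silently overwrites the earlier one here, so real input may need a duplicate check.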
You can use gocsv https://github.com/DataFoxCo/gocsv to see the headers of the CSV; you can then rename the headers, behead the file, swap columns, join, merge, and apply any number of other transformations.

I need to hash (obfuscate) a column of data in a CSV file. Script preferred

I have a pipe-delimited text file with a header row. (I said CSV in the question to make it a bit more immediately understandable ... I imagine most solutions would be applicable to either format.)
The file looks like this:
COLUMN1|COLUMN2|COLUMN3|COLUMN4|...|
Field1|Field2|Field3|Field4|...|
...
I need to obscure the data in (for example) columns 3 and 9, without affecting any of the other entries in the file.
I want to do this using a hashing algorithm like SHA1 or MD5, so that the same strings will resolve to the same hash values anywhere they are encountered.
EDIT - Why I want to do this
I need to send some data to a third party, and certain columns contain sensitive information (e.g. customer names). I need the file to be complete, and where a string is replaced, I need it to be done in the same way every time it is encountered (so that any mapping or grouping remains). It does not need military-grade encryption, just to be difficult to reverse. As I need to do this intermittently, a scripted solution would be ideal.
/EDIT
What is the easiest way to achieve this using a command line tool or script?
By preference, I would like a batch script or PowerShell script, since that does not require any additional software to achieve...
Try
(Import-Csv .\my.csv -delimiter '|' ) | ForEach-Object{
$_.column3 = $_.column3.gethashcode()
$_.column4 = $_.column4.gethashcode()
$_
} | Export-Csv .\myobfuscated.csv -NoTypeInformation -delimiter '|'
$md5 = new-object -TypeName Security.Cryptography.MD5CryptoServiceProvider
$utf8 = new-object -TypeName Text.UTF8Encoding
import-csv original.csv -delimiter '|' |
foreach {
$_.Column3 = [BitConverter]::ToString($md5.ComputeHash($utf8.GetBytes($_.Column3)))
$_.Column9 = [BitConverter]::ToString($md5.ComputeHash($utf8.GetBytes($_.Column9)))
$_
} |
export-csv encrypted.csv -delimiter '|' -noTypeInformation
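The same idea can be wrapped in a small function so that any column can be hashed by name. This sketch swaps in SHA-256 for MD5 and emits lower-case hex without dashes; the helper name Get-Sha256Hex and the sample data are hypothetical.

```powershell
# Hypothetical helper: returns the SHA-256 hash of a string as lower-case hex.
function Get-Sha256Hex {
    param([string]$Text)
    $sha = [System.Security.Cryptography.SHA256]::Create()
    try {
        $bytes = [System.Text.Encoding]::UTF8.GetBytes($Text)
        $hash  = $sha.ComputeHash($bytes)
        # BitConverter yields "AB-CD-..."; strip the dashes and lower-case the result.
        ([System.BitConverter]::ToString($hash) -replace '-', '').ToLower()
    }
    finally {
        $sha.Dispose()
    }
}

# Hypothetical pipe-delimited sample; COLUMN3 holds the sensitive value.
$data = @'
COLUMN1|COLUMN2|COLUMN3
a|b|Alice
c|d|Alice
'@ | ConvertFrom-Csv -Delimiter '|'

# Replace the sensitive column in place; identical inputs give identical hashes.
$data | ForEach-Object { $_.COLUMN3 = Get-Sha256Hex $_.COLUMN3 }
$data
```

Because equal inputs always map to equal outputs, grouping is preserved, but an unsalted hash of a guessable value such as a common name can still be matched by hashing a list of candidates.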