Powershell: Import-csv, rename all headers - powershell

In our company there are many users and many applications with restricted access and database with evidence of those accessess. I don´t have access to that database, but what I do have is automatically generated (once a day) csv file with all accessess of all my users. I want them to have a chance to check their access situation so i am writing a simple powershell script for this purpose.
CSV:
user;database1_dat;database2_dat;database3_dat
john;0;0;1
peter;1;0;1
I can do:
import-csv foo.csv | where {$_.user -eq $user}
But this will show me original ugly headres (with "_dat" suffix). Can I delete last four characters from every header which ends with "_dat", when i can´t predict how many headers will be there tomorrow?
I am aware of calculated property like:
Select-Object #{ expression={$_.database1_dat}; label='database1' }
but i have to know all column names for that, as far as I know.
Am I convicted to "overingeneer" it by separate function and build whole "calculated property expression" from scratch dynamically or is there a simple way i am missing?
Thanks :-)

Assuming that file foo.csv fits into memory as a whole, the following solution performs well:
If you need a memory-throttled - but invariably much slower - solution, see Santiago Squarzon's helpful answer or the alternative approach in the bottom section.
$headerRow, $dataRows = (Get-Content -Raw foo.csv) -split '\r?\n', 2
# You can pipe the result to `where {$_.user -eq $user}`
ConvertFrom-Csv ($headerRow -replace '_dat(?=;|$)'), $dataRows -Delimiter ';'
Get-Content -Raw reads the entire file into memory, which is much faster than reading it line by line (the default).
-split '\r?\n', 2 splits the resulting multi-line string into two: the header line and all remaining lines.
Regex \r?\n matches a newline (both a CRLF (\r\n) and a LF-only newline (\n))
, 2 limits the number of tokens to return to 2, meaning that splitting stops once the 1st token (the header row) has been found, and the remainder of the input string (comprising all data rows) is returned as-is as the last token.
Note the $null as the first target variable in the multi-assignment, which is used to discard the empty token that results from the separator regex matching at the very start of the string.
$headerRow -replace '_dat(?=;|$)'
-replace '_dat(?=;|$)' uses a regex to remove any _dat column-name suffixes (followed by a ; or the end of the string); if substring _dat only ever occurs as a name suffix (not also inside names), you can simplify to -replace '_dat'
ConvertFrom-Csv directly accepts arrays of strings, so the cleaned-up header row and the string with all data rows can be passed as-is.
Alternative solution: algorithmic renaming of an object's properties:
Note: This solution is slow, but may be an option if you only extract a few objects from the CSV file.
As you note in the question, use of Select-Object with calculated properties is not an option in your case, because you neither know the column names nor their number in advance.
However, you can use a ForEach-Object command in which you use .psobject.Properties, an intrinsic member, for reflection on the input objects:
Import-Csv -Delimiter ';' foo.csv | where { $_.user -eq $user } | ForEach-Object {
# Initialize an aux. ordered hashtable to store the renamed
# property name-value pairs.
$renamedProperties = [ordered] #{}
# Process all properties of the input object and
# add them with cleaned-up names to the hashtable.
foreach ($prop in $_.psobject.Properties) {
$renamedProperties[($prop.Name -replace '_dat(?=.|$)')] = $prop.Value
}
# Convert the aux. hashtable to a custom object and output it.
[pscustomobject] $renamedProperties
}

You can do something like this:
$textInfo = (Get-Culture).TextInfo
$headers = (Get-Content .\test.csv | Select-Object -First 1).Split(';') |
ForEach-Object {
$textInfo.ToTitleCase($_) -replace '_dat'
}
$user = 'peter'
Get-Content .\test.csv | Select-Object -Skip 1 |
ConvertFrom-Csv -Delimiter ';' -Header $headers |
Where-Object User -EQ $user
User Database1 Database2 Database3
---- --------- --------- ---------
peter 1 0 1
Not super efficient but does the trick.

Related

Powershell CSV removing rows and then remove from whole file if A column matches

I've created the following small script to remove 2++ strings from a CSV.
Each row is a log of a given person and a answer they give.
The CSV has X columns.
The column named FIRST identifies the person.
What I need to do is when I delete a row matching the answer, I also need to delete the person from the whole CSV if it had one of the two strings.
What I've made so far, removes the row of people having the answers but the person is still left in the overall CSV with other answers. I want to remove the person fully if the questions have been answered.
Can somebody help me out with making the addition or changes to make this happen?
INPUT File
FIRST,LAST,ADDR,ADDR2,GENDER,HOME,WORK
1,N/A,N/A,N/A,N/A,BAF,N/A
10005,JAS,AA,N/A,,ZAV,N/A
10007,JADE,BB,N/A,OMA,N/A,N/A
10007,JADE,N/A,RAV,N/A,N/A,N/A
10011,KIAH,N/A,N/A,BALI,BB,N/A
SCRIPT
$CSVfile = "C:\Temp\Test\Test.csv"
$CSVfile_filtered = "C:\Temp\Test\Test.csv"
$regex001 = "AA"
$regex002 = "BB"
$filterArray = #($regex001,$regex002)
Get-Content $CSVfile | Select-String -pattern $filterArray -notmatch | Set-Content $CSVfile_filtered
The file should then remove 10005, 10011 and both lines of 10007. But my version only removes one of the 10007 since it only matches one of the two patterns.
Using more of PowerShell's built-in cmdlets can make this a little easier to manage.
# Assuming searching only properties ADDR and ADDR2
$filter = 'AA','BB'
# Grouping by First and Last values to easily remove duplicates
# -match uses regex so | is needed for an OR of multiple items
Import-Csv Test.csv | Group-Object First,Last |
Where {!($_.Group.ADDR,$_.Group.ADDR2 -match ($filter -join '|'))} |
Foreach-Object Group |
Export-Csv output.csv -NoType
You would think strictly using text manipulation would be simpler, but it adds other scenarios to consider:
You will need to track users that have duplicate entries and potentially back track to remove them (if not grouping). This could require reading the file contents twice.
Your header row could match the string you want to filter so you will need to add it to the output if filtering removes it.
Keeping the scenarios above in mind, you can still use a grouping concept:
$filter = 'AA','BB'
$file = Get-Content Test.csv
# $file[0] is the header row
# -split string uses regex and splits at the second comma
# -split results' [0] element is First,Last values
$file[0],($file |
Select-Object -Skip 1 |
Group-Object {($_ -split '(?<=^[^,]*,[^,]*),')[0]} |
where {!($_.Group -match ($filter -join '|'))} |
Foreach-Object Group) | Set-Content output.csv
If I got it right you could do something like this:
$SearchPattern = 'AA', 'BB'
$INPUTCSV = #'
FIRST,LAST,ADDR,ADDR2,GENDER,HOME,WORK
1,N/A,N/A,N/A,N/A,BAF,N/A
10005,JAS,AA,N/A,,ZAV,N/A
10007,JADE,BB,N/A,OMA,N/A,N/A
10007,JADE,N/A,RAV,N/A,N/A,N/A
10011,KIAH,N/A,N/A,BALI,BB,N/A
'# | ConvertFrom-Csv
$ActualSearchPattern =
$INPUTCSV |
Where-Object {
$_.LAST -in $SearchPattern -or
$_.ADDR -in $SearchPattern -or
$_.ADDR2 -in $SearchPattern -or
$_.GENDER -in $SearchPattern -or
$_.HOME -in $SearchPattern -or
$_.Work -in $SearchPattern
} |
Select-Object -ExpandProperty FIRST
$INPUTCSV |
Where-Object -Property FIRST -NotIn -Value $ActualSearchPattern |
Format-Table -AutoSize
There might be more sophisticated or more elegant ways but I cannot think about one at the moment. ;-)
There is a nice PowerShell module you can use to manipulate the content of a csv or xlsx file: ImportExcel
This give you a lot of options to manipulate the sheets, columns etc.

Cleanup huge text file containing domain

I have a database that contains a log of domains listed in the following matter:
.youtube.com
.ziprecruiter.com
0.etsystatic.com
0.sparkpost.com
00.mail.ne1.yahoo.com
00072e01.pphosted.com
00111b01.pphosted.com
001d4f01.pphosted.com
011.mail.bf1.yahoo.com
1.amazonaws.com
How would I go about cleaning them up using powershell or grep, though I rather use powershell, so that they contain only the root domain with the .com extension and remove whatever word and . is before that.
I'm thinking best way to do is is a query that looks for dots from right to left and removes the second dot and whatever comes after it. For example 1.amazonaws.com here we remove the second dot from the right and whatever is after it?
i.e.
youtube.com
ziprecruiter.com
etsystatic.com
yahoo.com
pphosted.com
amazonaws.com
You can read each line into an array of strings with Get-Content, Split on "." using Split(), get the last two items with [-2,-1], then join the array back up using -join. We can then retrieve unique items using -Unique from Select-Object.
Get-Content -Path .\database_export.txt | ForEach-Object {
$_.Split('.')[-2,-1] -join '.'
} | Select-Object -Unique
Or using Select-Object -Last 2 to fetch the last two items, then piping to Join-String.
Get-Content -Path .\database_export.txt | ForEach-Object {
$_.Split('.') | Select-Object -Last 2 | Join-String -Separator '.'
} | Select-Object -Unique
Output:
youtube.com
ziprecruiter.com
etsystatic.com
sparkpost.com
yahoo.com
pphosted.com
amazonaws.com
You can use the String.Trim() method to clean leading and trailing dots, then use the regex -replace operator to remove everything but the top- and second-level domain name:
$strings = Get-Content database_export.txt
#($strings |ForEach-Object Trim '.') -replace '.*?(\w+\.\w+)$','$1' |Sort-Object -Unique
here is yet another method. [grin]
what it does ...
creates an array of strings to work with
when ready to do this for real, remove the entire #region/#endregion section and use Get-Content to load the file.
iterates thru the $InStuff collection of strings
splits the current item on the dots
grabs the last two items in the resulting array
joins them with a dot
outputs the new string to the $Results collection
shows that on screen
the code ...
#region >>> fake reading in a text file
# in real life, use Get-Content
$InStuff = #'
.youtube.com
.ziprecruiter.com
0.etsystatic.com
0.sparkpost.com
00.mail.ne1.yahoo.com
00072e01.pphosted.com
00111b01.pphosted.com
001d4f01.pphosted.com
011.mail.bf1.yahoo.com
1.amazonaws.com
'# -split [System.Environment]::NewLine
#endregion >>> fake reading in a text file
$Results = foreach ($IS_Item in $InStuff)
{
$IS_Item.Split('.')[-2, -1] -join '.'
}
$Results
output ...
youtube.com
ziprecruiter.com
etsystatic.com
sparkpost.com
yahoo.com
pphosted.com
pphosted.com
pphosted.com
yahoo.com
amazonaws.com
please note that this code expects the strings to be more-or-less-valid URLs. i can think of invalid ones that end with a dot ... and those would fail. if you need to deal with such, add the needed validation code.
another idea ... if the file is large [tens of thousands of strings], you may want to use the ForEach-Object pipeline cmdlet [as shown by RoadRunner] to save RAM at the expense of speed.

Powershell Remove spaces in the header only of a csv

First line of csv looks like this spaces are at after Path as well
author ,Revision ,Date ,SVNFolder ,Rev,Status,Path
I am trying to remove spaces only and rest of the content will be the same .
author,Revision,Date,SVNFolder,Rev,Status,Path
I tried below
Import-CSV .\script.csv | ForEach-Object {$_.Trimend()}
expanding on the comment with an example since it looks like you may be new:
$text = get-content .\script.csv
$text[0] = $text[0] -replace " ", ""
$csv = $text | ConvertFrom-CSV
Note: The solutions below avoid loading the entire CSV file into memory.
First, get the header row and fix it by removing all whitespace from it:
$header = (Get-Content -TotalCount 1 .\script.csv) -replace '\s+'
If you want to rewrite the CSV file to fix its header problem:
# Write the corrected header and the remaining lines to the output file.
# Note: I'm outputting to a *new* file, to be safe.
# If the file fits into memory as a whole, you can enclose
# Get-Content ... | Select-Object ... in (...) and write back to the
# input file, but note that there's a small risk of data loss, if
# writing back gets interrupted.
& { $header; Get-Content .\script.csv | Select-Object -Skip 1 } |
Set-content -Encoding utf8 .\fixed.csv
Note: I've chosen -Encoding utf8 as the example output character encoding; adjust as needed; note that the default is ASCII(!), which can result in data loss.
If you just want to import the CSV using the fixed headers:
& { $header; Get-Content .\script.csv | Select-Object -Skip 1 } | ConvertFrom-Csv
As for what you tried:
Import-Csv uses the column names in the header as property names of the custom objects it constructs from the input rows.
This property names are locked in at the time of reading the file, and cannot be changed later - unless you explicitly construct new custom objects from the old ones with the property names trimmed.
Import-Csv ... | ForEach-Object {$_.Trimend()}
Since Import-Csv outputs [pscustomobject] instances, reflected one by one in $_ in the ForEach-Object block, your code tries call .TrimEnd() directly on them, which will fail (because it is only [string] instances that have such a method).
Aside from that, as stated, your goal is to trim the property names of these objects, and that cannot be done without constructing new objects.
Read the whole file into an array:
$a = Get-Content test.txt
Replace the spaces in the first array element ([0]) with empty strings:
$a[0] = $a[0] -replace " ", ""
Write over the original file: (Don't forget backups!)
$a | Set-Content test.txt
$inFilePath = "C:\temp\headerwithspaces.csv"
$content = Get-Content $inFilePath
$csvColumnNames = ($content | Select-Object -First 1) -Replace '\s',''
$csvColumnNames = $csvColumnNames -Replace '\s',''
$remainingFile = ($content | Select-Object -Skip 1)

Count unique numbers in CSV (PowerShell or Notepad++)

How to find the count of unique numbers in a CSV file? When I use the following command in PowerShell ISE
1,2,3,4,2 | Sort-Object | Get-Unique
I can get the unique numbers but I'm not able to get this to work with CSV files. If for example I use
$A = Import-Csv C:\test.csv | Sort-Object | Get-Unique
$A.Count
it returns 0. I would like to count unique numbers for all the files in a given folder.
My data looks similar to this:
Col1,Col2,Col3,Col4
5,,7,4
0,,9,
3,,5,4
And the result should be 6 unique values (preferably written inside the same CSV file).
Or would it be easier to do it with Notepad++? So far I have found examples only on how to count the unique rows.
You can try the following (PSv3+):
PS> (Import-CSV C:\test.csv |
ForEach-Object { $_.psobject.properties.value -ne '' } |
Sort-Object -Unique).Count
6
The key is to extract all property (column) values from each input object (CSV row), which is what $_.psobject.properties.value does;
-ne '' filters out empty values.
Note that, given that Sort-Object has a -Unique switch, you don't need Get-Unique (you need Get-Unique only if your input already is sorted).
That said, if your CSV file is structured as simply as yours, you can speed up processing by reading it as a text file (PSv2+):
PS> (Get-Content C:\test.csv | Select-Object -Skip 1 |
ForEach-Object { $_ -split ',' -ne '' } |
Sort-Object -Unique).Count
6
Get-Content reads the CSV file as a line of strings.
Select-Object -Skip 1 skips the header line.
$_ -split ',' -ne '' splits each line into values by commas and weeds out empty values.
As for what you tried:
Import-CSV C:\test.csv | Sort-Object | Get-Unique:
Fundamentally, Sort-Object emits the input objects as a whole (just in sorted order), it doesn't extract property values, yet that is what you need.
Because no -Property argument is passed to Sort-Object to base the sorting on, it compares the custom objects that Import-Csv emits as a whole, by their .ToString() values, which happen to be empty[1]
, so they all compare the same, and in effect no sorting happens.
Similarly, Get-Unique also determines uniqueness by .ToString() here, so that, again, all objects are considered the same and only the very first one is output.
[1] This may be surprising, given that using a custom object in an expandable string does yield a value: compare $obj = [pscustomobject] #{ foo ='bar' }; $obj.ToString(); '---'; "$obj". This inconsistency is discussed in this GitHub issue.

How to read a CSV file but exclude certain columns containing blanks using Get-Content

I want to read a CSV file and exclude rows where dynamically selected columns contain blanks but not all rows of those dynamically selected columns contain blanks.
Trying to use the where clause in the statement below (but not working):
Get-Content $Source -ReadCount 1000 |
Where {
ForEach($NotEqualBlankCol in $BlankColumns)
{
$NotEqualBlankCol -ne $null -and $NotEqualBlankCol -ne ''}
} |
ConvertFrom-Csv |
Sort-Object -Property $SortByColNames.Replace('"', '') -Unique |
.
.
.
| Out-File $Destination
$BlankColumns is my dynamic object string array which I would like to loop through containing the column names of the CSV that are blank. it can be 1 column or more. When more then all of the selected columns need to be blank to qualify as a row that does not need to be included in the final CSV file output.
How do I do it using Get-Content? Any help would be appreciated.
Using Get-Content
Ok. So what this will do it read in the contents of a file X lines at a time. It will parse each line into its indiviual columns. Then it will check the specified columns for blanks. If any of the flagged columns contains a black then it will be filtered out. Consider the test data I used for this
id,first_name,last_name,email,gender,ip_address
1,Christina,Tucker,ctucker0#bbc.co.uk,Female,91.33.192.187
2,Jacqueline,Torres,jtorres1#shop-pro.jp,Female,205.70.183.107
3,Kathy,Perez,kperez2#hugedomains.com,Female,35.175.154.127
4,"",Holmes,eholmes3#canalblog.com,,
5,Ernest,Walker,ewalker4#marketwatch.com,Male,140.110.129.21
6,,Garza,cgarza5#jugem.jp,,
7,,Cunningham,jcunningham6#ox.ac.uk,Female,
8,,Clark,lclark7#posterous.com,,
9,,Ortiz,lortiz8#shareasale.com,,
Notice that the first_name and gender are blank for some of these folks. id 1,2,3,5,10 have complete data. The rest should be filtered.
$BlankColumns = "first_name","gender"
$headers = (Get-Content $path -TotalCount 1).Split(",")
$potentialBlankHeaderIndecies = 0..($headers.Count - 1) | Where-Object{$BlankColumns -contains $headers[$_]}
$potentialBlankHeaderIndecies
Get-Content $path -ReadCount 3 | Foreach-Object{
# Check to see if any of the indexes from a split are empty
$_ | Where-Object{
[bool[]](($_.Split(","))[$potentialBlankHeaderIndecies] | ForEach-Object{
![string]::IsNullOrEmpty($_.Trim('"'))
}) -notcontains $false
}
}
The output of this code is the file, as string, with the removed entries. You can just pipe this into a variable, file or what even you need.
To go into a little more detail we take the header names we want to check and this read in the first line of the csv file. That should contain the column names. Using that we determine the column indexes that we want to scrutinize. The we read in the whole file and parse it line by line. For each line we split on the comma and check the elements matching the identified headers. Check each of those elements if they are blank or null. We trim quotes in case it is a string "" which I will assume you would count as blank. Of all the elements we evaluate as a Boolean whether or not it is empty. If at least one is then it fails the where-object clause and gets ommited.
Using Import-CSV
$BlankColumns = "first_name","gender"
Import-CSV $path | Where-Object{
$line = $_
($BlankColumns | ForEach-Object{
![string]::IsNullOrEmpty(($line.$_.Trim('"')))
}) -notcontains $false
}
Very similar approach just a lot less overhead since we are dealing with objects now instead of strings.
Now you could use Export-CSV or ConvertFrom-CSV depending on your needs in the rest of the project.
Changing the filter criteria.
Both examples above filter columns where any of the columns contain blanks. If you want to omit only where all are blank change the line }) -notcontains $false to }) -contains $true