cleanup improperly formatted csv file - powershell

I am downloading an xlsx file from SharePoint and then converting it into a CSV file. However, since the xlsx file contained empty columns that were not deleted, it exports them to the CSV file as follows...
columnOne,columnTwo,columnThree,,,,
valueOne,,,,,,
,valueTwo,,,,,
,,valueThree,,,,
As you can see, the Import-Csv cmdlet will fail with that file because of the extra empty headers. I want to know how to count the extra commas at the end. The number of columns is always changing, and the column names are always changing too, so the count has to start from the last non-empty header.
Right now, I'm doing the following...
$csvFileEdited = Get-Content $csvFile
$csvFileEdited[0] = $csvFileEdited[0].TrimEnd(',')
$csvFileEdited | Set-Content "$csvFile-temp"
Move-Item "$csvFile-temp" $csvFile -Force
Write-Host "Trim Complete."
This makes the file look like this...
columnOne,columnTwo,columnThree
valueOne,,,,,,
,valueTwo,,,,,
,,valueThree,,,,
The header is now accepted by Import-Csv, but as you can see there are still extra empty values that are unnecessary, since they are empty in every row.
If I use the following code...
$csvFileWithExtraCommas = Get-Content $csvFile
$csvFileWithoutExtraCommas = @()
foreach ($line in $csvFileWithExtraCommas)
{
$line = $line.TrimEnd(',')
$csvFileWithoutExtraCommas += $line
}
$csvFileWithoutExtraCommas | Set-Content "$csvFile-temp"
Move-Item "$csvFile-temp" $csvFile -Force
Write-Host "Trim Complete."
Then it also removes empty values that should stay, because they belong to non-empty column names. This is the output...
columnOne,columnTwo,columnThree
valueOne
,valueTwo
,,valueThree
Here is the desired output:
columnOne,columnTwo,columnThree
valueOne,,
,valueTwo,
,,valueThree
Can anyone help with this?
Update
I'm using the following code to count the extra null titles...
$csvFileWithCommas = Get-Content $csvFile
[int]$csvFileWithExtraCommasNumber = $csvFileWithCommas[0].Length
$csvFileTitlesWithoutExtraCommas = $csvFileWithCommas[0].TrimEnd(',')
[int]$csvFileWithoutExtraCommasNumber = $csvFileTitlesWithoutExtraCommas.Length
$numOfCommas = $csvFileWithExtraCommasNumber - $csvFileWithoutExtraCommasNumber
The value of $numOfCommas is 4. Now the question is: how can I use $line.TrimEnd(',') so that it removes only those 4 commas?

OK... If you really need to do this, you can count the trailing commas in the header and use a regex to remove that many from the end of each line. There are other string-manipulation approaches, but the regex is pretty clean in this case.
Note that what Bluecakes' answer shows should suffice. Perhaps there are some other hidden characters that were not copied into the question, or perhaps there is an encoding issue with your real file.
$file = Get-Content "D:\temp\text.csv"
# Number of trailing commas. Compare the length before and after the trim
$numberofcommas = $file[0].Length - $file[0].TrimEnd(",").Length
# Use regex to remove as many commas from the end of each line and convert to csv object.
$file -replace ",{$numberofcommas}$" | ConvertFrom-Csv
The regex looks for exactly X commas at the end of each line, where X is $numberofcommas. In our case it would be ,{4}$
The source file used with the above code was generated like this:
@"
columnOne,columnTwo,columnThree,,,,
valueOne,,,,,,
,valueTwo,,,,,
,,valueThree,,,,
"@ | set-content D:\temp\text.csv
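Since the answer mentions other string-manipulation approaches, here is a minimal non-regex sketch that answers the "trim only 4 times" part of the question directly: it cuts exactly the header's number of trailing commas off each line with Substring(). It assumes every data line ends with at least that many commas; a line that doesn't is passed through unchanged rather than truncated. The inline sample lines stand in for the output of Get-Content.

```powershell
# The question's sample lines, standing in for Get-Content output.
$lines = @(
    'columnOne,columnTwo,columnThree,,,,'
    'valueOne,,,,,,'
    ',valueTwo,,,,,'
    ',,valueThree,,,,'
)

# Trailing commas on the header: length before trimming minus length after.
$numberOfCommas = $lines[0].Length - $lines[0].TrimEnd(',').Length
$suffix = ',' * $numberOfCommas

# Drop exactly $numberOfCommas characters, but only from lines that really
# end with that many commas; any other line is passed through unchanged.
$trimmed = foreach ($line in $lines) {
    if ($line.EndsWith($suffix)) {
        $line.Substring(0, $line.Length - $numberOfCommas)
    } else {
        $line
    }
}

$trimmed | ConvertFrom-Csv
```

Compared with the regex, this keeps the "how many" logic explicit, at the cost of a few more lines.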

Are you getting an error when you run Import-Csv? The cmdlet is smart enough to ignore columns without a heading, with no additional code needed.
I copied your CSV file to my H:\ drive:
columnOne,columnTwo,columnThree,,,,
valueOne,,,,,,
,valueTwo,,,,,
,,valueThree,,,,
and then ran $nullcsv = Import-Csv -Path H:\nullcsv.csv, and this is what I got:
PS> $nullcsv
columnOne columnTwo columnThree
--------- --------- -----------
valueOne
          valueTwo
                    valueThree
The imported CSV contains only 3 rows, as you would expect:
PS> $nullcsv.count
3
The cmdlet also correctly accounts for the empty values in each of the columns:
PS> $nullcsv | Format-List
columnOne : valueOne
columnTwo :
columnThree :
columnOne :
columnTwo : valueTwo
columnThree :
columnOne :
columnTwo :
columnThree : valueThree
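If Import-Csv does complain about the blank headings on your PowerShell version, a sketch of a workaround is to build the header list yourself from the non-empty names, skip the original header line, and pass the names to -Header. Data fields beyond the supplied names are discarded. This assumes, as in the question, that the blank headings all come after the real ones; a blank heading in the middle would shift the names.

```powershell
# In-memory copy of the sample file; with a real file use Get-Content instead.
$lines = @(
    'columnOne,columnTwo,columnThree,,,,'
    'valueOne,,,,,,'
    ',valueTwo,,,,,'
    ',,valueThree,,,,'
)

# Keep only the non-empty header names (assumes blanks are trailing).
$names = @($lines[0].Split(',') | Where-Object { $_ -ne '' })

# Parse the data rows with the cleaned-up header; the extra trailing
# fields have no name, so they are dropped.
$rows = $lines | Select-Object -Skip 1 | ConvertFrom-Csv -Header $names

$rows | Format-List
```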

Related

How to import first two values for each line in CSV file | PowerShell

I have a CSV file that is generated every day, with data such as:
windows:NT:v:n:n:d:n:n:n:n:m:n:n
I should also mention that that example is one of 3,900+ lines, and not every line of data has the same number of "columns". What I'm trying to do is import just the first two "columns" of data into a variable. For this example, it would be "Windows" and "NT", nothing else.
How would I go about doing this? I've tried using -Delimiter ':', without much luck.
The number of lines shouldn't matter.
The approach from my comment (on your previous question) should work: if there is no header and you only want the first two columns, just specify -Header 1,2:
> import-csv .\strange.csv -delim ':' -Header (1..2) |Where 2 -eq 'NT'
1 2
- -
windows NT
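The same -Header idea can be sketched with ConvertFrom-Csv on in-memory lines, using descriptive names instead of 1 and 2. The header names OS and Version and the second sample line are hypothetical, chosen only to show a line with a different number of fields.

```powershell
# First line is from the question; the second is a hypothetical shorter line.
$lines = @(
    'windows:NT:v:n:n:d:n:n:n:n:m:n:n'
    'windows:XP:v'
)

# 'OS' and 'Version' are illustrative header names; fields after the
# second one are discarded because no header names them.
$pairs = $lines | ConvertFrom-Csv -Delimiter ':' -Header 'OS', 'Version'

foreach ($p in $pairs) {
    "$($p.OS) $($p.Version)"
}
```

With the real file, Import-Csv -Path with the same -Delimiter and -Header arguments does the same job.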
Example for building the entire array
# A List supports .Add(); a plain @() array is fixed-size and .Add() would fail
$Splitted_List = New-Object System.Collections.Generic.List[object]
foreach($Line in Get-Content '.\myfilewithuseragents.txt'){
$Splitted = $Line -split ":"
$Splitted_Object = [PSCustomObject]@{
part1 = $Splitted[0]
part2 = $Splitted[1]
}
$Splitted_List.Add($Splitted_Object)
}
For every line, you just read the line and can then easily split its string, for example:
$useragent = "windows:NT:v:n:n:d:n:n:n:n:m:n:n"
Then the first part is referenced as $useragent.Split(":")[0], the second as $useragent.Split(":")[1], and so on.
Including the foreach loop, that would be something like:
foreach($useragent in Get-Content '.\myfilewithuseragents.txt') {
$splitted = $useragent.Split(":")
$part1 = $splitted[0]
}

How to check column count in a file to satisfy a condition

I am trying to write a PowerShell script that checks the column count and, if it does not satisfy the condition, throws an error or sends an email.
Something I have tried:
$columns=(Get-Content "C:\Users\xs15169\Desktop\temp\OEC2_CFLOW.txt" | select -First 1).Split(",")
$Count=columns.count
if ($count -eq 280)
echo "column count is:$count"
else
email
I'm going to assume your text file is in CSV format; I can't imagine what format a text-file table would be in if it's not CSV.
If your CSV has headers
Process the CSV file and count the number of properties on the resulting PowerShell object.
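$columnCount = @( ( Import-Csv '\path\to\file.txt' | Select-Object -First 1 ).PSObject.Properties ).Count
We need to force the Properties object to an array (which is the @() syntax) to accurately get the count. The PSObject property is a hidden property for metadata about an object in PowerShell, which is where we look for the Properties (column names) and get the count of how many there are. Select-Object -First 1 makes sure we inspect a single row: if the file has several rows, Import-Csv returns an array, and .PSObject would then describe the array itself instead of a row.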
CSV without headers
If your CSV doesn't have headers, Import-Csv requires you to manually specify the headers. There are tricks you can do to build out unique column names on-the-fly, but they are overly complex for simply getting a column count.
To build on what you've already tried above, we can read the first line and count its columns, though the code in the question has a couple of mistakes. Here's how to do it properly:
$columnCount = ( ( Get-Content "\path\to\file.txt" | Select-Object -First 1 ) -Split ',' ).Count
What was wrong with the original
Both solutions above reduce getting the column count to one line of code. In your original sample, though, you made a couple of small mistakes:
$columns=( Get-Content "\path\to\file.txt" | select -First 1 ).Split(",")
# You forgot to prepend "columns" with a $. Should look like the below line
$Count=$columns.count
And you forgot to use curly braces with your if block:
if ($count -eq 280) {
echo "column count is:$count"
} else {
email
}
As for using the -Split operator vs. the .Split() method - this is purely stylistic preference on my part, and using Split() is perfectly valid.
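To round this out with the "otherwise throw an error" part of the question, here is a minimal sketch that wraps the -Split count in a hypothetical helper, Assert-ColumnCount, which throws when the count does not match. Sending the email is left out; a try/catch around the call would be one place to do it. Like the one-liner above, it splits naively, so a quoted field containing a comma would be over-counted.

```powershell
# Hypothetical helper: compare the number of comma-separated fields in a
# header line with the expected count and throw if they differ.
function Assert-ColumnCount {
    param(
        [Parameter(Mandatory)] [string] $HeaderLine,
        [Parameter(Mandatory)] [int] $Expected
    )
    # Naive split: commas inside quoted fields are counted as separators.
    $count = ($HeaderLine -split ',').Count
    if ($count -ne $Expected) {
        throw "Expected $Expected columns but found $count."
    }
    "column count is: $count"
}

# Usage with a real file (placeholder path, 280 columns as in the question):
# Assert-ColumnCount -HeaderLine (Get-Content '\path\to\file.txt' -TotalCount 1) -Expected 280
Assert-ColumnCount -HeaderLine 'a,b,c' -Expected 3
```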

Powershell replace text in a field from import-csv

I'm reading a large CSV file via Import-Csv and have a column of data in the following format: v00001048, v00019045, or v0036905. I'd like to replace all the zeros (0) after the v but before the first nonzero digit, so the above text becomes v-1048, v-19045, or v-36905. I've done plenty of searches without success.
If you have a CSV (say, 'data.csv') with data like this:
Property1,Property2,Property3
SomeText,MoreText,v00001048
Then you can replace the leading zeros in Property3 using this technique:
$data = Import-csv .\data.csv
$data |
ForEach-Object {
$_.Property3 = $_.Property3 -replace "(?<=v)0+(?=\d+)","-"
}
If the property doesn't have any leading zeros to start with (e.g. v1048) this will leave it untouched. If you'd like it to insert the '-' anyway, then change the regex pattern to:
"(?<=v)0*(?=\d+)"
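A quick way to try both patterns without a file is to feed a few rows to ConvertFrom-Csv. The first three Property3 values are from the question; the v1048 row is an added example to show the no-leading-zeros case.

```powershell
# Rows in the answer's data.csv layout (the v1048 row is an added example).
$data = @(
    'Property1,Property2,Property3'
    'SomeText,MoreText,v00001048'
    'SomeText,MoreText,v00019045'
    'SomeText,MoreText,v0036905'
    'SomeText,MoreText,v1048'
) | ConvertFrom-Csv

foreach ($row in $data) {
    # Replace the run of zeros right after 'v' with '-'; v1048 is left untouched.
    $row.Property3 = $row.Property3 -replace '(?<=v)0+(?=\d+)', '-'
}
$data.Property3

# The 0* variant inserts '-' even when there are no leading zeros.
$alt = 'v1048' -replace '(?<=v)0*(?=\d+)', '-'
$alt

# With the real file, write the result back out, e.g.:
# $data | Export-Csv .\data-fixed.csv -NoTypeInformation
```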

String matching in PowerShell

I am new to scripting, and I would like to ask for help with the following:
This script should run as a scheduled task; it works with Veritas NetBackup and creates a backup register in CSV format.
I am generating two source files (.csv comma delimited):
One file contains: JobID, FinishDate, Policy, etc...
The second file contains: JobID, TapeID
The second file may contain the same JobID several times, with different TapeIDs.
For each line in source file 1, the script should check all of source file 2 for a matching JobID, and if there is a match, it should produce the following output:
JobID,FinishDate,Policy,etc...,TapeID,TapeID....
I have tried it with the following logic, but sometimes I get no TapeID, or I get the same TapeID twice:
The contents of source file 1 are in $BackupStatus
The contents of source file 2 are in $TapesUsed
$FinalReport =
foreach ($FinalPart1 in $BackupStatus) {
write-output $FinalPart1
$MediaID =
foreach ($line in $TapesUsed){
write-output $line.split(",")[1] | where-object{$line.split(",")[0] -like $FinalPart1.split(",")[0]}
}
write-output $MediaID
}
If the CSV files are not huge, it is easier to use Import-Csv instead of splitting the files by hand:
$BackupStatus = Import-Csv "Sourcefile1.csv"
$TapesUsed = Import-Csv "Sourcefile2.csv"
This will generate a list of objects for each file. You can then compare these lists quite easily:
foreach ($Entry in $BackupStatus) {
    $Match = $TapesUsed | Where-Object { $_.JobID -eq $Entry.JobID }
    if ($Match) {
        # replace [...] with the properties you want to use; -join keeps several TapeIDs in one field
        $Output = New-Object -TypeName PSCustomObject -Property @{ "JobID" = $Entry.JobID ; [...] ; "TapeID" = ($Match.TapeID -join ';') }
        Export-Csv -InputObject $Output -Path <OUTPUTFILE.CSV> -Append -NoTypeInformation
    }
}
This is a relatively verbose variant, but I prefer it like this.
I am checking for each entry in the first file whether there is a matching entry in the second. If there is one I combine the required fields from the entry of the first list with the ones from the entry in the second list into one object that I can then export very comfortably using Export-Csv.
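For larger files, running Where-Object over the whole second file for every job gets slow. A sketch of an alternative, assuming the column names from the question (JobID, FinishDate, Policy, TapeID) and made-up sample rows: index the tapes by JobID in a hashtable once, then emit one row per job. Joining several tape IDs with ';' into a single TapeID field is a design choice; the question's layout with one column per tape would need a variable number of columns.

```powershell
# Made-up sample rows in the shape described in the question.
$BackupStatus = @(
    'JobID,FinishDate,Policy'
    '101,2015-03-01,Daily'
    '102,2015-03-01,Weekly'
    '103,2015-03-02,Daily'
) | ConvertFrom-Csv

$TapesUsed = @(
    'JobID,TapeID'
    '101,T0001'
    '102,T0002'
    '102,T0003'
) | ConvertFrom-Csv

# Index the tapes by JobID once, so each lookup below is a hashtable hit
# instead of a scan of the whole second file.
$tapesByJob = @{}
foreach ($t in $TapesUsed) {
    if (-not $tapesByJob.ContainsKey($t.JobID)) {
        $tapesByJob[$t.JobID] = New-Object System.Collections.Generic.List[string]
    }
    $tapesByJob[$t.JobID].Add($t.TapeID)
}

# One output row per job; jobs without tapes get an empty TapeID.
$FinalReport = foreach ($job in $BackupStatus) {
    $tapes = $tapesByJob[$job.JobID]
    $tapeText = ''
    if ($null -ne $tapes) { $tapeText = $tapes -join ';' }
    [PSCustomObject]@{
        JobID      = $job.JobID
        FinishDate = $job.FinishDate
        Policy     = $job.Policy
        TapeID     = $tapeText
    }
}

$FinalReport | Format-Table
# To write the report: $FinalReport | Export-Csv .\report.csv -NoTypeInformation
```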

Read a Csv file with powershell and capture corresponding data

Using PowerShell I would like to capture user input, compare the input to data in a comma delimited CSV file and write corresponding data to a variable.
Example:
A user is prompted for a “Store_Number”; they enter "10".
The input “10” is then compared to the data in the first position or column of the CSV file.
Data such as “District_Number” in the corresponding position / column is captured and written to a variable.
I have gotten this method to work with an Excel file (.xlsx) but have found it to be terribly slow. I'm hoping that PowerShell can read a CSV file more efficiently.
Here is an example CSV file:
Store_Number,Region,District,NO_of_Devices,Go_Live_Date
1,2,230,10,2/21/2013
2,2,230,10,2/25/2013
3,2,260,12,3/8/2013
4,2,230,10,3/4/2013
5,2,260,10,3/4/2013
6,2,260,10,3/11/2013
7,2,230,10,2/25/2013
8,2,230,10,3/4/2013
9,2,260,10,5/1/2013
10,6,630,10,5/23/2013
What you should be looking at is Import-Csv.
Once you import the CSV, you can use each column header as a property name.
Example CSV:
Name | Phone Number | Email
Elvis | 867.5309 | Elvis@Geocities.com
Sammy | 555.1234 | SamSosa@Hotmail.com
Now we import the CSV and loop through the rows, adding each value to an array. We can then check whether the input value is in the array:
$Name = @()
$Phone = @()
Import-Csv H:\Programs\scripts\SomeText.csv |`
ForEach-Object {
$Name += $_.Name
$Phone += $_."Phone Number"
}
$inputNumber = Read-Host -Prompt "Phone Number"
if ($Phone -contains $inputNumber)
{
Write-Host "Customer Exists!"
$Where = [array]::IndexOf($Phone, $inputNumber)
Write-Host "Customer Name: " $Name[$Where]
}
Old topic, but never clearly answered. I've been working on something similar and found the solution:
The pipe (|) at the end of the Import-Csv line in Austin's code sample isn't the delimiter; it pipes the output into ForEach-Object. So if you want to use | as the delimiter, you need to do this:
Import-Csv H:\Programs\scripts\SomeText.csv -delimiter "|" |`
ForEach-Object {
$Name += $_.Name
$Phone += $_."Phone Number"
}
Spent a good 15 minutes on this myself before I understood what was going on. Hope the answer helps the next person reading this avoid the wasted minutes!
(Sorry for expanding on your comment Austin)
So I figured out what is wrong with this statement:
Import-Csv H:\Programs\scripts\SomeText.csv |`
(Original)
Import-Csv H:\Programs\scripts\SomeText.csv -Delimiter "|"
(Proposed. You must use quotation marks; otherwise, it will not work and the ISE will give you an error.)
It requires -Delimiter "|" for the variable to be populated with an array of items. Otherwise, the PowerShell ISE does not display the list of items.
I can't say that I would recommend | as a delimiter, though, since the same character is used to pipe cmdlets into one another.
I still cannot get the if statement to return true and output the values entered via the prompt.
If anyone else can help, it would be great. I still appreciate the post, it has been very helpful!
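For the original store lookup, a minimal sketch using the question's Store_Number and District columns: filter the rows with Where-Object and read the District property. Values coming from Import-Csv (and ConvertFrom-Csv) are strings, so the typed input is compared as a string. The inline rows are a subset of the question's example file, and a fixed value stands in for Read-Host so the snippet runs unattended.

```powershell
# A few rows from the question's example file; with the real file use
# $stores = Import-Csv -Path 'path\to\stores.csv' (placeholder path).
$stores = @(
    'Store_Number,Region,District,NO_of_Devices,Go_Live_Date'
    '1,2,230,10,2/21/2013'
    '3,2,260,12,3/8/2013'
    '10,6,630,10,5/23/2013'
) | ConvertFrom-Csv

# Fixed value standing in for: $storeNumber = Read-Host -Prompt 'Store_Number'
$storeNumber = '10'

# CSV values are strings, so compare as strings; Trim() drops stray spaces
# typed at the prompt.
$store = $stores |
    Where-Object { $_.Store_Number -eq $storeNumber.Trim() } |
    Select-Object -First 1

if ($store) {
    $district = $store.District
    "Store $storeNumber is in district $district"
} else {
    "Store $storeNumber was not found"
}
```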