Need to remove a specific portion from rows in a CSV using PowerShell

I have a CSV file with two columns and multiple rows, which holds information about files: the folder path and the corresponding size, like below:
"Folder_Path","Size"
"C:\MSSQL\DATA\UsersData\FTP.txt","21345"
"C:\MSSQL\DATA\UsersData\Norman\abc.csv","78956"
"C:\MSSQL\DATA\UsersData\Market_Database\123.bak","1234456"
What I want to do is remove the "C:\MSSQL\DATA\" part from every row in the CSV and keep the rest of the folder path (starting from UsersData) and all other data intact, as this prefix is repetitive. So my CSV should look like the below:
"Folder_Path","Size"
"UsersData\FTP.txt","21345"
"UsersData\Norman\abc.csv","78956"
"UsersData\Market_Database\123.bak","1234456"
What I am running is below:
Import-Csv ".\abc.csv" |
Select-Object -Property @{n='Folder_Path';e={$_.'Folder_Path'.Split('C:\MSSQL\DATA\*')[0]}}, * |
Export-Csv '.\output.csv' -NTI
Any help is appreciated!

Seems like a job for a simple string replace:
Get-Content "abc.csv" | foreach { $_.replace("C:\MSSQL\DATA\", "") } | Set-Content "output.csv"
or:
[System.IO.File]::WriteAllText("output.csv", [System.IO.File]::ReadAllText("abc.csv" ).Replace("C:\MSSQL\DATA\", ""))

This should work:
Import-Csv ".\abc.csv" |
Select-Object -Property @{n='Folder_Path';e={$_.'Folder_Path' -replace '^C:\\MSSQL\\DATA\\', ''}}, Size |
Export-Csv '.\output.csv' -NoTypeInformation

Related

Merge PDF files using CSV file list using Powershell

I want to create multiple merged PDF files from 1400+ PDF files.
I have a data.csv file with 2 columns as below.
The PDF files (whose names match the Filename column) and the data.csv file are in the same folder.
I need to create multiple merged PDF files, and each merged PDF will contain the group of files that share the same first three characters in the filename.
e.g.,
The filenames starting with EIN* need to be merged into one PDF file, in the same sort order as in the data.csv file. The filename of the merged PDF should be Y followed by the first three characters, so in this example it should be YEIN.pdf.
This process needs to be repeated until all the rows in data.csv have been actioned.
Sample data.csv file:
FilePath Filename
$FilePath1 EINCO01-174.pdf
$FilePath2 EINCO02-174.pdf
$FilePath3 EINCO03-174.pdf
$FilePath4 EINCO04-174.pdf
$FilePath5 EINCL01-174.pdf
$FilePath6 EINCL02-174.pdf
$FilePath7 EINCL03-174.pdf
$FilePath8 EINCL04-174.pdf
$FilePath9 EINCL05-174.pdf
$FilePath10 EINCL06-174.pdf
$FilePath11 EINCL07-174.pdf
$FilePath12 EINCL08-174.pdf
$FilePath13 EINCL09-174.pdf
$FilePath14 EINCL10-174.pdf
$FilePath15 EINCL11-174.pdf
$FilePath16 EINCL12-174.pdf
$FilePath17 EINCL13-174.pdf
$FilePath18 EINCL14-174.pdf
$FilePath19 EINCL15-174.pdf
$FilePath20 EINCL16-174.pdf
$FilePath21 EINCL17-174.pdf
$FilePath22 EINCL18-174.pdf
$FilePath23 EINCL19-174.pdf
$FilePath25 GINLG01-170.pdf
$FilePath26 GINLG02-166.pdf
$FilePath27 GINLG03-159.pdf
$FilePath28 GINLG04-159.pdf
$FilePath29 GINLG05-168.pdf
$FilePath30 GINLG06-152.pdf
$FilePath31 GINNO01-174.pdf
$FilePath32 GINNO02-131.pdf
$FilePath33 GINNO04-150.pdf
$FilePath34 GINNO05-174.pdf
$FilePath35 GINTA01-130.pdf
$FilePath36 GINTA02-139.pdf
$FilePath37 GINTA03-139.pdf
To tackle this I have created a script to split the data.csv file into multiple CSV files, grouped by the first three characters, as below.
$data = Import-Csv '.\data.csv' |
Select-Object Filepath,Filename,@{n='Group';e={$_.Filename.Substring(0,3)}}
$data | Format-Table -GroupBy Group
$data | Group-Object Group | ForEach-Object {
$_.Group | Export-Csv "$($_.Name).csv" -NoTypeInformation
}
foreach ($Group in $data | Group-Object Group)
{
$data | Where-Object {$_.Group -eq $Group.Name} |
ConvertTo-Csv -Delimiter "`t" -NoTypeInformation |
ForEach-Object {$_.Replace('"','')} |
Out-File "$($Group.Name).csv"
}
From here, I am unable to proceed to the next step to achieve what I need. I presume there may be a better way to do this.
PS: I have installed the PSWritePDF module on my machine.
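Since no answer is included here, a rough sketch of the merge step using PSWritePDF, assuming its Merge-PDF cmdlet accepts an array of paths via -InputFile and a target via -OutputFile, and assuming the FilePath column holds real, full paths (the $FilePathN values above are placeholders):
# Sketch only: group the rows by the first three characters of Filename and
# merge each group with PSWritePDF's Merge-PDF (parameter names assumed).
Import-Module PSWritePDF
$data = Import-Csv '.\data.csv'
$data | Group-Object { $_.Filename.Substring(0,3) } | ForEach-Object {
    # Keep the rows in the same order as they appear in data.csv.
    $inputFiles = $_.Group | ForEach-Object { $_.FilePath }
    # Output name is 'Y' plus the first three characters, e.g. YEIN.pdf.
    Merge-PDF -InputFile $inputFiles -OutputFile ".\Y$($_.Name).pdf"
}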

powershell: Write specific rows from files to formatted csv

The following code gives me the correct output in the console, but I need it in a CSV file:
$array = @{}
$files = Get-ChildItem "C:\Temp\Logs\*"
foreach($file in $files){
foreach($row in (Get-Content $file | select -Last 2)){
if($row -like "Total peak job memory used:*"){
$sp_memory = $row.Split(" ")[5]
$array.Add(($file.BaseName),([double]$sp_memory))
break
}
}
}
$array.GetEnumerator() | sort Value -Descending |Format-Table -AutoSize
current output (console):
required output (csv):
In order to increase performance I would like to avoid the array and write output directly to csv (no append).
Thanks in advance!
Change your last line to this -
$array.GetEnumerator() | sort Value -Descending | select @{l='FileName'; e={$_.Name}}, @{l='Memory (MB)'; e={$_.Value }} | Export-Csv -path $env:USERPROFILE\Desktop\Output.csv -NoTypeInformation
This will give you a csv file named Output.csv on your desktop.
I am using calculated properties to change the column headers to FileName and Memory (MB), and piping the output of $array to the Export-Csv cmdlet.
Just to let you know, your variable $array is of type Hashtable which won't store duplicate keys. If you need to store duplicate key/value pairs, you can use arrays. Just suggesting! :)
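If you do want to drop the hashtable entirely and stream straight to CSV, as the question asks, one possible sketch (the property names FileName and Memory (MB) are just illustrative, matching the answer above):
# Sketch: emit one object per log file and pipe the results directly to
# Export-Csv, with no intermediate hashtable.
Get-ChildItem "C:\Temp\Logs\*" | ForEach-Object {
    $file = $_
    foreach ($row in (Get-Content $file | Select-Object -Last 2)) {
        if ($row -like "Total peak job memory used:*") {
            [pscustomobject]@{
                FileName      = $file.BaseName
                'Memory (MB)' = [double]$row.Split(" ")[5]
            }
            break
        }
    }
} | Sort-Object 'Memory (MB)' -Descending |
    Export-Csv "$env:USERPROFILE\Desktop\Output.csv" -NoTypeInformation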

Extract Columns based on Row data from .CSV

Total newbie with PowerShell, but I used to use WSH with .vbs back in the day - so hopefully I can structure this question correctly.
I would like to extract x number of columns from a .csv file, only if the row data equals a certain value - and then send the filtered data to a new .csv in another destination.
So taking a saved Windows event log as an example, I would like to extract Columns A-F but only on rows where column 'A' equals 'Error' - and then send that output to a new .csv in a child directory.
I think I am pretty close, but I can only get it to save columns A-F and none of the rows with the data I need!
Can anyone help me figure this out or show me where I am going wrong please?
$folderPath = 'C:\DLA\'
$folderPathDest = 'C:\DLA\OUT\'
$desiredColumns = 'A','B','C','D','E','F'
$topics.Where({$desiredColumns.play -eq 'Error'}).topic
Get-ChildItem $folderPath -Name |
ForEach-Object {
$filePath = $folderPath + $_
$filePathdest = $folderPathDest + $_
Import-Csv $filePath | Select $desiredColumns | Select $topics |
Export-Csv -Path $filePathDest -NoTypeInformation
}
Just put the filter directly in your pipeline:
Import-Csv $filePath | Select $desiredColumns | where {$_.A -eq 'Error'} | Export-Csv -Path $filePathDest -NoTypeInformation
The command below worked for me to extract all the columns to a new file; it can be modified to select the desired columns:
import-csv $filePath | ? { $_.columnName -eq 'Error' } | export-csv $filePathDest -NoTypeInformation
columnName is the header title of the column being filtered on.
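Putting that filter back into the original per-file loop, a rough sketch (the column names A-F and the 'Error' value are the asker's placeholders and would need to match the real headers in the exported event log):
# Sketch: keep only rows where column A equals 'Error', select columns A-F,
# and write each result to the OUT subfolder under the same file name.
$folderPath     = 'C:\DLA\'
$folderPathDest = 'C:\DLA\OUT\'
$desiredColumns = 'A','B','C','D','E','F'
Get-ChildItem $folderPath -File -Filter *.csv | ForEach-Object {
    Import-Csv $_.FullName |
        Where-Object { $_.A -eq 'Error' } |
        Select-Object $desiredColumns |
        Export-Csv -Path (Join-Path $folderPathDest $_.Name) -NoTypeInformation
}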

Powershell removing columns and rows from CSV

I'm having trouble making some changes to a series of CSV files, all with the same data structure. I'm trying to combine all of the files into one CSV file or one tab-delimited text file (I don't really mind which); however, each file needs to have 2 empty rows removed and two of the columns removed. Below is an example:
col1,col2,col3,col4,col5,col6 <-remove
col1,col2,col3,col4,col5,col6 <-remove
col1,col2,col3,col4,col5,col6
col1,col2,col3,col4,col5,col6
(the col3 and col5 columns are to be removed)
End Result:
col1,col2,col4,col6
col1,col2,col4,col6
This is my attempt at doing this (I'm very new to Powershell)
$ListofFiles = "example.csv" #this is a list of all the CSV files
ForEach ($file in $ListofFiles)
{
$content = Get-Content ($file)
$content = $content[2..($content.Count)]
$contentArray = @()
[string[]]$contentArray = $content -split ","
$content = $content[0..2 + 4 + 6]
Add-Content '...\output.txt' $content
}
Where am I going wrong here...
Your example file should be read before the foreach, to fetch the file list:
$ListofFiles = get-content "example.csv"
Inside the foreach you are getting the content of the main file:
$content = Get-Content ($ListofFiles)
instead of
$content = Get-Content $file
And for removing rows I would recommend this:
$obj = get-content C:\t.csv | select -Index 0,1,3
For removing columns (keeping column numbers 0,1,3,5):
$obj | %{(($_.split(","))[0,1,3,5]) -join "," } | out-file test.csv -Append
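Combining the two ideas for the layout in the question, a rough sketch (it assumes the first two rows of each file are the ones to drop and that columns 0,1,3,5 are the ones to keep; the folder path is illustrative):
# Sketch: for every CSV in the folder, skip the first two rows, keep columns
# 0,1,3,5 (col1,col2,col4,col6) and append the result to one merged file.
Get-ChildItem 'C:\csvfolder\*.csv' | ForEach-Object {
    Get-Content $_.FullName |
        Select-Object -Skip 2 |
        ForEach-Object { ($_.Split(','))[0,1,3,5] -join ',' } |
        Add-Content 'C:\csvfolder\merged.txt'
}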
Given that the initial files look like:
col1,col2,col3,col4,col5,col6
col1,col2,col3,col4,col5,col6
,,,,,
,,,,,
you can also try this one-liner:
Import-Csv D:\temp\*.csv -Header 'C1','C2','C3','C4','C5','C6' | where {$_.c1 -ne ''} | select -Property 'C1','C2','C4','C6' | Export-Csv 'd:\temp\final.csv' -NoTypeInformation
Given that your CSVs all have the same structure, you can open them directly while providing a header, then drop the objects with missing data, then export all the objects to a single CSV file.
It is sufficient to specify fictitious column names (their number can even exceed the number of columns in the file), change the filter as you need, and exclude the columns that you do not want to keep.
gci "c:\yourdirwithcsv" -file -filter *.csv |
%{ Import-Csv $_.FullName -Header C1,C2,C3,C4,C5,C6 |
where C1 -ne '' |
select * -ExcludeProperty C3, C5 } |
export-csv "c:\temp\merged.csv" -NoTypeInformation

Use Import-Csv to read changeable column titles by location

I'm trying to see if there is a way to read the column values in a CSV file based on the column location. The reason for this is that the file I'm being handed always has its titles changed...
For example, let's say CSV file column A (viewed in Excel) looks like the following:
ColumnOne
ValueOne
ValueTwo
ValueThree
Now the user changes the title:
Column 1
ValueOne
ValueTwo
ValueThree
Now I want to create an array of the first column. Normally what I do is the following:
$arrayFirstColumn = Import-Csv 'C:\test\test1.csv' | where-object {$_.ColumnOne} | select-object -expand 'ColumnOne'
However, as we can see, if ColumnOne is changed to Column 1, it breaks this code. How can I create this array so that it allows for an interchangeable column title, given that the column location will always be the same?
You can specify headers of your own on import:
Import-Csv 'C:\path\to\your.csv' -Header 'MyHeaderA','MyHeaderB',...
As long as you don't export the data back to a CSV (or don't require the original headers to be in the output CSV as well), you can use whatever names you like. You can also specify as many header names as you like. If their number is less than the number of columns in the CSV, the additional columns will be omitted; if it's greater, the columns for the additional headers will be empty.
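A small worked example of that approach (the header name Col1 is hypothetical; note that when -Header is supplied, the file's own title row comes back as a data row, so it is skipped here):
# Sketch: address the first column by a name we choose, regardless of the
# title the user typed into the file. -Skip 1 drops the original title row.
$rows = Import-Csv 'C:\test\test1.csv' -Header 'Col1' | Select-Object -Skip 1
$arrayFirstColumn = $rows | Where-Object { $_.Col1 } | Select-Object -ExpandProperty Col1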
If you need to preserve the original headers you could get the header name(s) you need to work with in variable(s) like this:
$csv = Import-Csv 'C:\test\test1.csv'
$firstCol = $csv | Select-Object -First 1 | ForEach-Object {
$_.PSObject.Properties | Select-Object -First 1 -Expand Name
}
$arrayFirstColumn = $csv | Where-Object {$_.$firstCol} |
Select-Object -Expand $firstCol
Or you could simply read the first line from the CSV and split it to get an array with the headers:
$headers = (Get-Content 'C:\test\test1.csv' -TotalCount 1) -split ','
$firstCol = $headers[0]
One option:
$ImportFile = 'C:\test\test1.csv'
$FirstColumn = ((Get-Content $ImportFile -TotalCount 2 | ConvertFrom-Csv).psobject.properties.name)[0]
$FirstColumn
$arrayFirstColumn = Import-Csv $ImportFile | where-object {$_.$FirstColumn} | select-object -expand $FirstColumn
If you are using PowerShell v2.0 then the expression for $FirstColumn in mjolinor's answer would be:
$FirstColumn = ((Get-Content $ImportFile -TotalCount 2 | ConvertFrom-Csv).psobject.properties | ForEach-Object {$_.name})[0]
(Apologies for starting a new answer; I do not yet have enough reputation to add a comment to mjolinor's post)