PowerShell: concatenate all columns in a row

I have 20+ columns in a CSV file like
empid ename deptid mgrid hiredon col6 .... col20
10 a 10 5 10-may-2010
11 b 10 5 08-aug-2005
12 c 11 3 11-dec-2008
I would like to get the output as CSV like
empid, all_other_details
10 , {ename:a;deptid:10;mgrid:5; like this for all 19 columns }
Except for employee id, all other columns should be wrapped into a string containing key:value pairs. Is there a way to join all the columns without mentioning each column as $_.<name>?

I have come up with this; I hope the comments are self-explanatory.
It should work with 2 or more columns.
Delimiters can be changed (on my computer the CSV delimiter is ";" rather than ",", for example, and I know it can be different in other cultures).
#declare delimiters
$CSVdelimiter = ";"
$detailsDelimiter = ","
#load file in array
$data = Get-Content "Book1.csv"
#isolate headers
$headers = $data[0].Split($CSVdelimiter)
#declare row counter
$rowCount = 0
#declare results array with headers
$results = @($headers[0] + "$CSVdelimiter`details")
#for each row except first
$data | Select-Object -Skip 1 | % {
    #split on $CSVdelimiter
    $rowArray = $_.Split($CSVdelimiter)
    #declare details array
    $details = @()
    #for each column except first
    for ($i = 1; $i -lt $rowArray.Count; $i++) {
        #add to details array (header:value)
        $details += $headers[$i] + ":" + $rowArray[$i]
    }
    #join details array with $detailsDelimiter to build new row
    #append to first column value
    #add to results array
    $results += "$($rowArray[0])$CSVdelimiter{$($details -join $detailsDelimiter)}"
    #increment row counter
    $rowCount++
}
#output results to new csv file
$results | Out-File "Book2.csv"
Output looks like this:
empid;details
10;{ename:a,deptid:10,mgrid:5,hiredon:10-may-2010}
11;{ename:b,deptid:10,mgrid:5,hiredon:08-aug-2005}
12;{ename:c,deptid:11,mgrid:3,hiredon:11-dec-2008}
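A side note on the delimiter remark above: rather than hard-coding ";", you can ask the current culture for its list separator. A minimal sketch (Get-Culture and Import-Csv -UseCulture are standard; whether you want the culture-specific separator is up to you):
#read the separator the current culture uses for CSV-style lists
$CSVdelimiter = (Get-Culture).TextInfo.ListSeparator
#Import-Csv/Export-Csv can also be told to use it directly
Import-Csv "Book1.csv" -UseCulture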

Try this:
$csv = Get-Content .\input_file.csv
$keys = $csv[0] -split '\s+'
$c = $keys.count - 1
$keys = ($keys[1..$c] | % {$i = -1}{$i += 1; "$($_):{$i}"}) -join '; '
$csv[1..($csv.count - 1)] | % {
    $a = $_ -split '\s+'
    New-Object psobject -Property @{
        empid             = $a[0]
        all_other_details = "{$($keys -f $a[1..$c])}"
    }
} | Export-Csv output_file.csv -NoTypeInformation
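On the original "without mentioning each column" point: if the file is a regular delimited CSV, you can let Import-Csv build the objects and walk each row's properties. This is only a sketch of that idea, assuming a comma-delimited input_file.csv with an empid column:
Import-Csv .\input_file.csv | ForEach-Object {
    #every property except empid becomes a key:value pair
    $details = $_.PSObject.Properties |
        Where-Object Name -ne 'empid' |
        ForEach-Object { '{0}:{1}' -f $_.Name, $_.Value }
    [pscustomobject]@{
        empid             = $_.empid
        all_other_details = '{' + ($details -join ';') + '}'
    }
} | Export-Csv .\output_file.csv -NoTypeInformation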

Related

Split values table for extract with powershell

I would like each value in my hashtable to go on a new line when I extract it to a CSV.
I initialize my variable as a hashtable:
$vlr = @{}
$vlr["OS"] = ,@("test","test2")
I export my variable to a .csv:
$Output += New-Object PSObject -Property $vlr
$output | Convert-OutputForCSV | Export-Csv -NoTypeInformation -Delimiter ";" -Path $filepath
The problem is that in the extraction, the values end up on the same line.
My goal is for each value to be on a different line.
You might want to use the Out-String cmdlet for this:
$vlr = @{}
$vlr["OS"] = ,@("test","test2") | Out-String
$Object = New-Object PSObject -Property $vlr
$Object | ConvertTo-Csv
"OS"
"test
test2
"
This solution does not work for me because, when $vlr has several keys, the extraction gets complicated:
$vlr = @{}
$vlr["OS"] = ,@("test","test2")
$vlr["PS"] = ,@("lous","tique")
it's a problem
https://gallery.technet.microsoft.com/scriptcenter/Convert-OutoutForCSV-6e552fc6
For the function Convert-OutputForCSV
I don't know what the posted function does, but you can make your own function to handle a single-key or multi-key hash table provided all of the key value counts are the same.
function Convert-OutputForCsv {
    param(
        [parameter(ValueFromPipeline)]
        [hashtable]$hash
    )
    # Array of custom object property names
    $keys = [array]$hash.Keys
    # Loop through each key's values
    for ($i = 0; $i -lt $hash.($keys[0]).count; $i++) {
        # Custom object with keys as properties. Property values are empty.
        $obj = "" | Select $keys
        # Loop through key names
        for ($j = 0; $j -lt $keys.Count; $j++) {
            $obj.($keys[$j]) = $hash.($keys[$j])[$i]
        }
        $obj
    }
}
$vlr = [ordered]@{}
$vlr["OS"] = 'test','test2'
$vlr["PS"] = 'lous','tique'
$vlr | Convert-OutputForCsv | Export-Csv -NoTypeInformation -Delimiter ";" -Path $filepath
Honestly, if you are in control of the input data, I would just type out a CSV instead of typing out hash tables.
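If typing out a CSV is an option, a here-string piped to ConvertFrom-Csv gives you the same row objects with far less ceremony. A minimal sketch, not part of the original answer:
$data = @'
OS;PS
test;lous
test2;tique
'@ | ConvertFrom-Csv -Delimiter ';'
#export rows exactly as typed, one per line
$data | Export-Csv -NoTypeInformation -Delimiter ';' -Path $filepath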
This solution is good for my simplified case but unfortunately not adapted to my real case.
I'm merging my old base2 array with my new base array, and my goal is to concatenate the values in an Excel file to make them usable.
$base2 = Get-Content $filepath2 | select -first 1
$base2 = $base2 -split ";"
$base2 = $base2.Replace("`"", "")
$cunt2 = $base2.count - 1
$h2 = ipcsv $filepath2 -Delimiter ";"
$HashTable2 = @{}
for ($i = 0 ; $i -le $cunt2 ; $i++) {
    foreach ($r in $h2) {
        $HashTable2[$base2[$i]] = $r.($base2[$i])
    }
}
# base2 = old tables
$base = Get-Content $filepath2 | select -first 1
$base = $base -split ";"
$base = $base.Replace("`"", "")
$cunt = $base.count - 1
$h1 = ipcsv $filepath -Delimiter ";"
$HashTable = @{}
for ($i = 0 ; $i -le $cunt ; $i++) {
    foreach ($r in $h1) {
        $HashTable[$base[$i]] = $r.($base[$i])
    }
}
# base = new tables
Once the two arrays are initialized, I merge them, and this is where I have to separate the values row by row:
$csvfinal = $hashtable, $hashtable2 | Merge-Hashtables

Find out Text data in CSV File Numeric Columns in Powershell

I am very new to PowerShell.
I am trying to validate my CSV file by finding out if there is any text value in my numeric fields. I can define which columns are numeric.
This is my source data:
ColA ColB ColC ColD
23 23 ff 100
2.30E+01 34 2.40E+01 23
df 33 ss df
34 35 36 37
I need output something like this (only the text values found in each column):
ColA ColC ColD
2.30E+01 ff df
df 2.40E+01
ss
I have tried some code but am not getting the expected results; I only get output like this:
System.Object[]
---------------
xxx fff' ddd 3.54E+03
...
This is what I was trying
#
cls
function Is-Numeric ($Value) {
    return $Value -match "^[\d\.]+$"
}
$arrResult = @()
$arraycol = @()
$FileCol = @("ColA","ColB","ColC","ColD")
$dif_file_path = "C:\Users\$env:username\desktop\f2.csv"
#Importing CSVs
$dif_file = Import-Csv -Path $dif_file_path -Delimiter ","
############## Test Datatype (Is-Numeric)##########
foreach ($col in $FileCol) {
    foreach ($line in $dif_file) {
        $val = $line.$col
        $isnum = Is-Numeric($val)
        if ($isnum -eq $false) {
            $arrResult += $line.$col
            $arraycol += $col
        }
    }
}
[pscustomobject]@{$arraycol = "$arrResult"} | Out-File "C:\Users\$env:username\Desktop\Errors1.csv"
####################
Can someone guide me in the right direction?
Thanks
You can try something like this,
function Is-Numeric ($Value) {
    return $Value -match "^[\d\.]+$"
}
$dif_file_path = "C:\Users\$env:username\desktop\f2.csv"
#Importing CSVs
$dif_file = Import-Csv -Path $dif_file_path -Delimiter ","
#$columns = $dif_file | Get-Member -MemberType 'NoteProperty' | Select-Object -ExpandProperty 'Name'
# Use this to specify certain columns
$columns = "ColB", "ColC", "ColD"
foreach ($row in $dif_file) {
    foreach ($col in $columns) {
        if ($col -in $columns) {
            if (!(Is-Numeric $row.$col)) {
                $row.$col = ""
            }
        }
    }
}
$dif_file | Export-Csv C:\temp\formatted.txt
Look up the names of the columns as you go.
Look up the value of each column in each row and, if it is not numeric, change it to "".
Export the updated file.
I think not displaying columns that have no data creates the challenge here. You can do the following:
$csv = Import-Csv "C:\Users\$env:username\desktop\f2.csv"
$finalprops = [collections.generic.list[string]]@()
$out = foreach ($line in $csv) {
    $props = $line.psobject.properties | Where {$_.Value -notmatch '^[\d\.]+$'} |
        Select-Object -Expand Name
    $props | Where {$_ -notin $finalprops} | Foreach-Object { $finalprops.add($_) }
    if ($props) {
        $line | Select $props
    }
}
$out | Select-Object ($finalprops | Sort)
Given the nature of Format-Table or tabular output, you only see the properties of the first object in the collection. So if object1 has ColA only, but object2 has ColA and ColB, you only see ColA.
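A quick way to see that behavior for yourself (a standalone demo, not from the original answer):
$objs = [pscustomobject]@{ ColA = 1 }, [pscustomobject]@{ ColA = 2; ColB = 3 }
$objs | Format-Table   #only ColA is displayed; ColB from the second object is hidden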
The output order you want is quite different than the input CSV; you're tracking bad text data not by first occurrence, but by column order, which requires some extra steps.
test.csv file contents:
ColA,ColB,ColC,ColD
23,23,ff,100
2.30E+01,34,2.40E+01,23
df,33,ss,df
34,35,36,37
Sample code tested to meet your description:
$csvIn = Import-Csv "$PSScriptRoot\test.csv";
# create working data set with headers in same order as input file
$data = [ordered]@{};
$csvIn[0].PSObject.Properties | foreach {
    $data.Add($_.Name, (New-Object System.Collections.ArrayList));
};
# add fields with text data
$csvIn | foreach {
    $_.PSObject.Properties | foreach {
        if ($_.Value -notmatch '^-?[\d\.]+$') {
            $null = $data[$_.Name].Add($_.Value);
        }
    }
}
$removes = @();  # remove `good` columns with numeric data
$rowCount = 0;   # column with most bad values
$data.GetEnumerator() | foreach {
    $badCount = $_.Value.Count;
    if ($badCount -eq 0) { $removes += $_.Key; }
    if ($badCount -gt $rowCount) { $rowCount = $badCount; }
}
$removes | foreach { $data.Remove($_); }
0..($rowCount - 1) | foreach {
    $h = [ordered]@{};
    foreach ($key in $data.Keys) {
        $h.Add($key, $data[$key][$_]);
    }
    [PSCustomObject]$h;
} |
    Export-Csv -NoTypeInformation -Path "$PSScriptRoot\text-data.csv";
output file contents:
"ColA","ColC","ColD"
"2.30E+01","ff","df"
"df","2.40E+01",
,"ss",
@Jawad, finally I have tried:
function Is-Numeric ($Value) {
    return $Value -match "^[\d\.]+$"
}
$arrResult = @()
$columns = "ColA","ColB","ColC","ColD"
$dif_file_path = "C:\Users\$env:username\desktop\f1.csv"
$dif_file = Import-Csv -Path $dif_file_path -Delimiter "," | select $columns
$columns = $dif_file | Get-Member -MemberType 'NoteProperty' | Select-Object -ExpandProperty 'Name'
foreach ($row in $dif_file) {
    foreach ($col in $columns) {
        $val = $row.$col
        $isnum = Is-Numeric($val)
        if ($isnum -eq $false) {
            $arrResult += $col + " " + $row.$col
        }
    }
}
$arrResult | Out-File "C:\Users\$env:username\desktop\Errordata.csv"
I get the correct results in my output file, but the order is all mixed up, like:
ColA ss
ColB 5.74E+03
ColA ss
ColC rrr
ColB 3.54E+03
ColD ss
ColB 8.31E+03
ColD cc
Any idea how to get a proper format? Thanks.
Note: with your suggested code, I get the complete source file with all data, not just the specific error data.
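One way to tame that ordering (a sketch layered on the same loop above, not from the thread): collect objects instead of strings, then sort or group them by column before writing the file.
$arrResult = foreach ($row in $dif_file) {
    foreach ($col in $columns) {
        if (!(Is-Numeric $row.$col)) {
            #keep column and value together so the report can be sorted later
            [pscustomobject]@{ Column = $col; Value = $row.$col }
        }
    }
}
$arrResult | Sort-Object Column | Export-Csv "C:\Users\$env:username\desktop\Errordata.csv" -NoTypeInformation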

When a word matches retrieve the varying string after it

I have a query which looks like this:
FROM TableA
INNER JOIN TableB
ON TableA.xx = TableB.xx
INNER JOIN TableC
ON TableA.yy = TableC.yy
I am trying to write a script which selects the tables which come after the word "JOIN".
The script that I have now is:
$data = Get-Content -Path query1.txt
$dataconv = "$data".ToLower() -replace '\s+', ' '
$join = 0
$overigetabellen = ($dataconv) | foreach {
    if ($_ -match "join (.*)") {
        $join++
        $join = $matches[1].Split(" ")[0]
        #Write-Host "Table(s) on which is joined:" $join"."
        $join
    }
}
$overigetabellen
This gives me only the first table, so TableB.
Can anyone help me get the second table in the output as well?
Process your data with Select-String:
$data | Select-String -AllMatches -Pattern '(?<=join\s+)\S+' |
Select-Object -Expand Matches |
Select-Object -Expand Groups |
Select-Object -Expand Value
(?<=...) is a so-called positive lookbehind assertion that is used for matching the pattern without being included in the returned string (meaning the returned matches are just the table names without the JOIN before them).
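To see it work without a file, you can feed the query text straight in (a small self-contained check):
$data = @'
FROM TableA
INNER JOIN TableB
ON TableA.xx = TableB.xx
INNER JOIN TableC
ON TableA.yy = TableC.yy
'@
$data | Select-String -AllMatches -Pattern '(?<=join\s+)\S+' |
    Select-Object -Expand Matches |
    Select-Object -Expand Value
#TableB
#TableC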
This is my naive attempt to find the desired table names.
Split the data input on whitespace into an array, find the indices of the word "JOIN", and then access the following indices after the word "JOIN."
$data = Get-Content -Path query1.txt
$indices = @()
$output = @()
$dataarray = $data -split '\s+'
$singleIndex = -1
Do {
    $singleIndex = [array]::IndexOf($dataarray, "JOIN", $singleIndex + 1)
    If ($singleIndex -ge 0) { $indices += $singleIndex }
} While ($singleIndex -ge 0)
foreach ($index in $indices) {
    $output += $dataarray[$index + 1]
}
$output
Outputs:
TableB
TableC
You can adjust for capitalization (saw you set your input to all lowercase), etc as needed if you expect varying input files.
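On the capitalization point: [array]::IndexOf is case-sensitive for strings, while PowerShell's -eq is not, so a plain index scan sidesteps the issue. A sketch under the same input assumptions:
$dataarray = (Get-Content query1.txt) -split '\s+'
$output = for ($i = 0; $i -lt $dataarray.Count - 1; $i++) {
    #-eq on strings ignores case, so 'join', 'JOIN' and 'Join' all match
    if ($dataarray[$i] -eq 'join') { $dataarray[$i + 1] }
}
$output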

How to set a variable to a column with no header in a tab delimited text file

Barcode1 Plate # 12/29/2017 07:35:56 EST
A 1 4 5 6
A 1 4 5 6
A 1 4 5 6
A 1 4 5 6
A 1 4 5 6
A 1 4 5 6
A 1 4 5 6
Above is an example of a tab-delimited text file. I need to get the data from the columns with no header, namely the columns at the end, and I don't know how to identify them. I am trying to swap columns and output a text file. The source data file format is the same every time.
This is part of what I have:
$swapColumns = @{
    column1 = @{
        name     = "date-header"
        instance = 1
    }
    column2 = @{
        name     = "Blank"
        instance = 1
    }
}
$formats = @(
    'XR-{0:yyyyMMdd}-01.txt'
)
$date = [datetime]::now
$ErrorActionPreference = 'Stop'
function Get-HeaderIndex {
    param(
        [System.Collections.Generic.List[string]]$Source,
        [string]$Header,
        [uint16]$Instance
    )
    $index = 0;
    for ($i = 0; $i -lt $Instance; $i++) {
        $index = $Source.IndexOf($Header, $index, ($Source.Count - $index))
        if (($index -eq -1) -or (($i + 1) -eq $Instance)) {
            break
        }
        $index = $index + 1
    }
    if ($index -eq -1) { throw "index not found" }
    return $index
}
#grabs the first item in folder matching UCX-*.txt
$fileDetails = Get-ChildItem $PSScriptRoot\UCX-*.txt | select -First 1
#gets the file contents
$file = Get-Content $fileDetails
#break up script in sections that look like '======section======'
#and store the section name and line number it starts on
$sections = @()
for ($i = 0; $i -lt $file.Count; $i++) {
    if ($file[$i] -match '^=+(\w+)=+$') {
        $section = $Matches[1]
        $sections += [pscustomobject]@{line = $i; header = $section}
    }
}
#get the data section
$dataSection = $sections | ? {$_.header -eq 'data'}
#get the section following data
$nextSection = $sections | ? {$_.line -gt $dataSection.line} | sort -Property line | select -First 1
#get data column headers
$dataHeaders = New-Object System.Collections.Generic.List[string]
$file[$dataSection.line + 1].split("`t") | % {
    [datetime]$headerDateValue = [datetime]::MinValue
    $headerIsDate = [datetime]::TryParse($_.Replace('EST','').Trim(), [ref] $headerDateValue)
    if ($headerIsDate) {
        $dataHeaders.Add('date-header')
    }
    else {
        $dataHeaders.Add($_)
    }
}
#get index of columns defined in $swapColumns
$column1 = Get-HeaderIndex -Source $dataHeaders -Header $swapColumns.column1.name -Instance $swapColumns.column1.instance
$column2 = Get-HeaderIndex -Source $dataHeaders -Header $swapColumns.column2.name -Instance $swapColumns.column2.instance
#iterate over each row in data section, swap data from column1/column2
for ($i = $dataSection.line + 2; $i -lt $nextSection.line - 1; $i++) {
    $line = $file[$i]
    $parts = $line.split("`t")
    $tmp1 = $parts[$column1]
    $parts[$column1] = $parts[$column2]
    $parts[$column2] = $tmp1
    $file[$i] = $parts -join "`t"
}
#write new file contents to files with names defined in $formats
$formats | % {
    $file | Out-File ($_ -f $date) -Force
}
If you know what your file format is going to be, then forget whatever the current header is and supply your own when converting the file to CSV objects.
It looks like you need to parse the date out of the header, which should be trivial. Grab it from $fileHeader however you would like.
$wholeFile = Get-Content C:\temp\test.txt
$fileHeader = $wholeFile[0] -split "`t"
$newHeader = "Barcode1", "Plate #", "Date", "Plumbus", "Dinglebop"
$wholeFile |Select-Object -Skip 1 | ConvertFrom-Csv -Delimiter "`t" -Header $newHeader
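For pulling the date out of the header mentioned above, something along these lines should work (a sketch; it assumes the timestamp sits in the third tab-separated cell and that the trailing time zone text needs trimming):
#$fileHeader comes from the snippet above
$rawDate = ($fileHeader[2] -replace 'EST').Trim()
$headerDate = [datetime]::ParseExact($rawDate, 'MM/dd/yyyy HH:mm:ss', $null)
$headerDate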
If the column widths are always the same, there's another option: specify the widths of the columns manually. See this example:
$content = Get-Content C:\temp.tsv
$columns = 13, 24, 35 | Sort -Descending
$Delimiter = ','
$Results = $content | % {
    $line = $_
    $columns | % {
        $line = $line.Insert($_, $Delimiter)
    }
    $line
} |
    ConvertFrom-Csv -Delimiter $Delimiter
Results:
Barcode1 Plate # H1 12/29/2017 07:35:56 EST
--------- ----------- -- -----------------------
A 1 4 5
A 1 4 5
A 1 4 5
A 1 4 5
A 1 4 5
A 1 4 5
A 1 4 5
Then you can easily get the data you need:
$Results[0].H1
4
[This answer doesn't solve the OP's problem after clarifying the exact requirements, but may be of general interest to some, given the question's generic title.]
If the file is really tab-delimited, you can use Import-Csv -Delimiter "`t" to read it, in which case PowerShell will autogenerate header names as H<n> if they're missing, where <n> is a sequence number starting with 1.
Caveat: This doesn't work if the unnamed column is the last one, because - inexplicably - Import-Csv then ignores the entire column (more generally, any run of trailing delimiters).
Import-Csv -Delimiter "`t" file.tsv | Select-Object -ExpandProperty H1

Deleting columns from CSV using PowerShell

I have a CSV file that has duplicate column headers, so I can't use Import-Csv to do the work. The header names are dynamic. I need to get the third column, the fourth column, and every fourth column after that (e.g., starting from 0: columns 2, 3, 7, 11, 15...).
The reason I have duplicate column names is that header 3 needed the same name as header 0, in groups of four. 0 > 3, 4 > 7, 8 > 11...
I used Get-Content because I couldn't figure out how to make this work with Import-Csv. I had to use Import-Csv to get the number of columns, which I couldn't figure out with Get-Content.
#Rename every fourth column
$file = "C:\Scripts\File.csv"
$data = get-content $file
$step = 4
$csv = Import-Csv "C:\Scripts\File.csv"
$headers = $data | select -first 1
$count = $csv[0].PSObject.Properties | select -Expand Name
for ($i = 0; $i -lt $count.count; $i += $step)
{
    $headers = $headers -split ","
    $headers[($i + 3)] = $headers[$i]
    $headers[($i + 2)] = "timestamp"
    $headers = $headers -join ","
    $data[0] = $headers
    $data | Set-Content "C:\Scripts\File.csv"
}
I can reuse the variable $count if needed (for $count.count), so I don't have to use Import-Csv again. I'm having trouble figuring out how to get just the columns I need based on number and not header name.
This worked great for getting the third column (2nd if starting from 0), but I'm not sure how to get every fourth after that (3rd if starting from 0)
type "C:\Scripts\File.csv" | % { $_.Split(",") | select -skip 2 -first 1 }
Keep in mind I do not know the header names of every fourth column as they could be anything; I only know which column number the data is in (every fourth column).
I'd re-think that whole process and start with this:
$file = "C:\Scripts\File.csv"
$HeaderCount = ((gc sentlog.csv -TotalCount 1).split(',')).count -1
$CSV = import-csv $file -Header (0..$HeaderCount)
Now you can treat those column headings like array indexes to extract out the columns you want.
Use Select -Skip 1 to strip off the original header row. You can rewrite the property names for export using calculated properties or just create new objects, using property names extracted from the original header row.
OK, based on the posted data, try this:
$file = "C:\Scripts\File.csv"
$OutputFile = "C:\Scripts\OutputFile.csv"
$HeaderCount = ((Get-Content $file -TotalCount 1).split(',')).count -1
$CSV = import-csv $file -Header (0..$HeaderCount)
$SelectedColumns = #(2) + ( (0..$HeaderCount) |? { ($_ % 4) -eq 3 } ) -as [string[]]
$CSV |
select $SelectedColumns |
ConvertTo-CSV -NoTypeInformation |
Select -Skip 1 |
Set-Content $OutputFile
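If you'd rather put real header names back on the output instead of the numeric ones, the calculated-property idea mentioned above looks roughly like this (a sketch; which original names to reuse is an assumption, and "timestamp" is borrowed from the question's own renaming code):
$origHeaders = (Get-Content $file -TotalCount 1) -split ','
#note: strip quotes from $origHeaders first if the source header row is quoted
$props = @{ n = $origHeaders[2]; e = { $_.'2' } }, @{ n = 'timestamp'; e = { $_.'3' } }
$CSV | Select-Object -Skip 1 | Select-Object $props | Export-Csv $OutputFile -NoTypeInformation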