How to read multiple data sets from one .csv file in PowerShell

I have a temp recorder that reads multiple sensors daily and saves the data to a single .csv file, with a whole bunch of header information before each set of date/time and temperature readings. The file looks something like this:
"readerinfo","onlylistedonce"
"downloadinfo",YYYY/MM/DD 00:00:00
"timezone",-8
"headerstuff","headersuff"
"sensor1","sensorstuff"
"serial#","0000001"
"about15lines","ofthisstuff"
"header1","header2"
datetime,temp
datetime,temp
datetime,temp
"sensor2","sensorstuff"
"serial#","0000002"
"about15lines","ofthisstuff"
"header1","header2"
datetime,temp
datetime,temp
datetime,temp
"downloadcomplete"
My aim is to pull out the date/time and temp data for each sensor and save it as a new file so that I can run some basic stats (hi/lo/avg temp) on it. (It would be beautiful if I could somehow identify which sensor the data came from based on a serial number listed in the header info, but that's less important than separating the data into sets.) The lengths of the date/time lists change from sensor to sensor based on how long they've been recording, and the number of sensors changes daily as well. Even if I could just split the sensor data, header info and all, into however many files there are sensors, that would be a good start.

This isn't exactly a CSV file in the traditional sense. I imagine you already know this, given your description of the file contents.
If the lines with datetime,temp truly do not have any double quotes in them, per your example data, then the following script should work. This script is self-contained, since it declares the example data inline.
IMPORTANT: You will need to modify the line containing the declaration of the $SensorList variable. You will have to populate this variable with the names of the sensors, or you can parameterize the script to accept an array of sensor names.
UPDATE: I changed the script to be parameterized.
Results
The results of the script are as follows:
sensor1.csv (with corresponding data)
sensor2.csv (with corresponding data)
Some green text will be written to the PowerShell host, indicating which sensor is currently detected
Script
The contents of the script should appear as follows. Save the script file to a folder, such as c:\test\test.ps1, and then execute it.
# Declare text as a PowerShell here-string
$Text = @"
"readerinfo","onlylistedonce"
"downloadinfo",YYYY/MM/DD 00:00:00
"timezone",-8
"headerstuff","headersuff"
"sensor1","sensorstuff"
"serial#","0000001"
"about15lines","ofthisstuff"
"header1","header2"
datetime,tempfromsensor1
datetime,tempfromsensor1
datetime,tempfromsensor1
"sensor2","sensorstuff"
"serial#","0000002"
"about15lines","ofthisstuff"
"header1","header2"
datetime,tempfromsensor2
datetime,tempfromsensor2
datetime,tempfromsensor2
"downloadcomplete"
"@.Split("`n");

# Declare the list of sensor names
$SensorList = @('sensor1', 'sensor2');
$CurrentSensor = $null;

# WARNING: Cleans up all CSV files in the same directory as the script
Remove-Item -Path $PSScriptRoot\*.csv;

# Iterate over each line in the text file
foreach ($Line in $Text) {
    #region Line matches double quote
    if ($Line -match '"') {
        # Parse the property/value pairs (where double quotes are present)
        if ($Line -match '"(.*?)",("(?<value>.*)"|(?<value>.*))') {
            $Entry = [PSCustomObject]@{
                Property = $matches[1];
                Value = $matches['value'];
            };
            if ($matches[1] -in $SensorList) {
                $CurrentSensor = $matches[1];
                Write-Host -ForegroundColor Green -Object ('Current sensor is: {0}' -f $CurrentSensor);
            }
        }
    }
    #endregion Line matches double quote
    #region Line does not match double quote
    else {
        # Parse the datetime/temp pairs
        if ($Line -match '(.*?),(.*)') {
            $Entry = [PSCustomObject]@{
                DateTime = $matches[1];
                Temp = $matches[2];
            };
            # Write the sensor's datetime/temp to its file
            Add-Content -Path ('{0}\{1}.csv' -f $PSScriptRoot, $CurrentSensor) -Value $Line;
        }
    }
    #endregion Line does not match double quote
}

Using the data sample you provided, the output of this script would be as follows:
C:\sensoroutput_20140204.csv
sensor1,datetime,temp
sensor1,datetime,temp
sensor1,datetime,temp
sensor2,datetime,temp
sensor2,datetime,temp
sensor2,datetime,temp
I believe this is what you are looking for. The assumption here is the newline characters. The Get-Content line reads the data and breaks it into "sets" by using two newline characters as the delimiter to split on. I chose to use the environment's (Windows) newline character. Your source file may have different newline characters; you can use Notepad++ to see which characters they are, e.g. \r\n, \n, etc.
$newline = [Environment]::NewLine
$srcfile = "C:\sensordata.log"
$dstpath = 'C:\sensoroutput_{0}.csv' -f (get-date -f 'yyyyMMdd')
# Read the file, letting Get-Content break it into data sets
# using two new line chars as the delimiter
$datasets = get-content $srcfile -delimiter ($newline * 2)
foreach ($ds in $datasets) {
    $lines = ($ds -split $newline)                  # Split dataset into lines
    $setname = $lines[0] -replace '\"(\w+).*', '$1' # Get the set or sensor name
    $lines | % {
        if ($_ -and $_ -notmatch '"') {             # No empty lines and no lines with quotes
            $data = ($setname, ',', $_ -join '')    # Concat set name, datetime, and temp
            Out-File -filepath $dstpath -inputObject $data -encoding 'ascii' -append
        }
    }
}
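With the per-sensor data split out, the basic stats the question asks for (hi/lo/avg temp) can be computed with Measure-Object. A minimal sketch, assuming a per-sensor file such as sensor1.csv whose lines are "datetime,temp" pairs (the file name and column position are assumptions for illustration, not part of the script above):

```powershell
# Read one sensor's file, pull the temp out of each "datetime,temp" line,
# and compute min/max/average. Adjust the path and column index as needed.
$temps = Get-Content '.\sensor1.csv' | ForEach-Object { [double]($_ -split ',')[1] }
$stats = $temps | Measure-Object -Minimum -Maximum -Average
'low={0} high={1} avg={2:N2}' -f $stats.Minimum, $stats.Maximum, $stats.Average
```

Casting to [double] before Measure-Object avoids relying on string-to-number conversion, which older Windows PowerShell versions do not perform for you.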

Related

Can I convert a row of comma delimited values to a column

I have one row of temperature data in a text file that I would like to convert to a single column and save as a CSV file using a PowerShell script. The temperatures are separated by commas and look like this:
21,22,22,22,22,22,22,20,19,18,17,16,15,14,13,12,11,10,9,9,9,8,8,9,8,8,8,9,9,8,8,8,9,9,9,8,8,8,8,8,9,10,12,14,15,17,19,20,21,21,21,21,21,21,21,21,21,21,21,20,20,20,20,20,22,24,25,26,27,27,27,28,28,28,29,29,29,28,28,28,28,28,28,27,27,27,27,27,29,30,32,32,32,32,33,34,35,35,34,33,32,32,31,31,30,30,29,29,28,28,27,28,29,31,33,34,35,35,35,36,36,36,36,36,36,36,36,36,37,37,37,37,37,37,38,39,40,42,43,43,43,43,43,42,42,42,41,41,41,41,40,39,37,36,35,34,33,32,31,31,31,31,31,31,31,31,31,31,
I have tried several methods based on searches in this forum. I thought this might work, but it returns an error: Transpose rows to columns in PowerShell
This is the modified code I tried; it returns the error "Input string was not in a correct format."
$txt = Get-Content 'C:\myfile.txt' | Out-String
$txt -split '(?m)^,\r?\n' | ForEach-Object {
    # create empty array
    $row = @()
    $arr = $_ -split '\r?\n'
    $k = 0
    for ($n = 0; $n -lt $arr.Count; $n += 2) {
        $i = [int]$arr[$n]
        # if index from record ($i) is greater than current index ($k) append
        # required number of empty fields
        for ($j = $k; $j -lt $i-1; $j++) { $row += $null }
        $row += $arr[$n+1]
        $k = $i
    }
    $row -join '|'
}
This seems like it should be simple to do with only one row of data. Are there any suggestions on how to convert this single row of numbers to one column?
Try this:
# convert row to column data
$header = 'TEMPERATURE'
$values = $(Get-Content input.dat) -split ','
$header, $values | Out-File result.csv
# now test the result
Import-Csv result.csv
The header is the first line (or record) in the CSV file. In this case it's a single word, because there is only one column.
The values are the items between commas in the input. In this case, the -split on commas generates an array of strings. Note that, if comma is a separator, there will be no comma after the last temperature. Your data doesn't look like that, but I have assumed that the real data does.
Then, we just write the header and the array out to a file. But what happened to all the commas? It turns out that, for a single column CSV file, there are no commas separating fields. So the result is a simple CSV file.
Last, there is a test of the output using Import-csv to read the result and display it in table format.
This isn't necessarily the best way to code it, but it might help a beginner get used to PowerShell.
Assuming I'm understanding your intent correctly, based on your verbal description (not your own coding attempt):
# Create simplified sample input file
@'
21,22,23,
'@ > myfile.txt
# Read the line, split it into tokens by ",", filter out empty elements
# with `-ne ''` (to ignore empty elements, such as would
# result from the trailing "," in your sample input),
# and write to an output CSV file with a column name prepended.
(Get-Content myfile.txt) -split ',' -ne '' |
ForEach-Object -Begin { 'Temperatures' } -Process { $_ } |
Set-Content out.csv
More concise alternative, using an expandable (interpolating) here-string:
# Note: .TrimEnd(',') removes any trailing "," from the input.
# Your sample input suggests that this is necessary.
# If there are no trailing "," chars., you can omit this call.
@"
Temperatures
$((Get-Content myfile.txt).TrimEnd(',') -split ',' -join [Environment]::NewLine)
"@ > out.csv
out.csv then contains:
Temperatures
21
22
23

Scanning log file using ForEach-Object and replacing text is taking a very long time

I have a PowerShell script that scans log files and replaces text when a match is found. The list is currently 500 lines, and I plan to double/triple this. The log files can range from 400KB to 800MB in size.
Currently, with the code below, a 42MB file takes 29 minutes, so I'm looking for help if anyone can see any way to make this faster.
I tried replacing ForEach-Object with ForEach-ObjectFast, but it made the script take significantly longer. I also tried changing the first ForEach-Object to a for loop, but it still took ~29 minutes.
$lookupTable = @{
    'aaa:bbb:123' = 'WORDA:WORDB:NUMBER1'
    'bbb:ccc:456' = 'WORDB:WORDBC:NUMBER456'
}
Get-Content -Path $inputfile | ForEach-Object {
    $line = $_
    $lookupTable.GetEnumerator() | ForEach-Object {
        if ($line -match $_.Key)
        {
            $line = $line -replace $_.Key, $_.Value
        }
    }
    $line
} | Set-Content -Path $outputfile
Since you say your input file could be 800MB in size, reading and updating the entire content in memory could potentially not fit.
The way to go then is to use a fast line-by-line method and the fastest I know of is switch
# hardcoded here for demo purposes.
# In real life you get/construct these from the Get-ChildItem
# cmdlet you use to iterate the log files in the root folder.
$inputfile  = 'D:\Test\test.txt'
$outputfile = 'D:\Test\test_new.txt'  # absolute full file path because we use .Net here

# because we are going to Append to the output file, make sure it doesn't exist yet
if (Test-Path -Path $outputfile -PathType Leaf) { Remove-Item -Path $outputfile -Force }

$lookupTable = @{
    'aaa:bbb:123' = 'WORDA:WORDB:NUMBER1'
}

# create a regex string from the Keys of your lookup table,
# merging the strings with a pipe symbol (the regex 'OR').
# your Keys could contain characters that have special meaning in regex, so we need to escape those
$regexLookup = '({0})' -f (($lookupTable.Keys | ForEach-Object { [regex]::Escape($_) }) -join '|')

# create a StreamWriter object to write the lines to the new output file
# Note: use an ABSOLUTE full file path for this
$streamWriter = [System.IO.StreamWriter]::new($outputfile, $true)  # $true for Append

switch -Regex -File $inputfile {
    $regexLookup {
        # do the replacement using the value in the lookup table.
        # because in one line there may be multiple matches to replace,
        # get a System.Text.RegularExpressions.Match object to loop through all matches
        $line = $_
        $match = [regex]::Match($line, $regexLookup)
        while ($match.Success) {
            # because we escaped the keys, to find the correct entry we now need to unescape
            $line = $line -replace $match.Value, $lookupTable[[regex]::Unescape($match.Value)]
            $match = $match.NextMatch()
        }
        $streamWriter.WriteLine($line)
    }
    default { $streamWriter.WriteLine($_) }  # write unchanged
}
# dispose of the StreamWriter object
$streamWriter.Dispose()
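As a variation on the same idea, the inner while loop can be replaced by a single [regex]::Replace call that takes a script block as a MatchEvaluator, so each line is rewritten in one pass. A sketch under the same assumptions (escaped keys joined with the regex 'OR'):

```powershell
$lookupTable = @{ 'aaa:bbb:123' = 'WORDA:WORDB:NUMBER1' }
$regexLookup = ($lookupTable.Keys | ForEach-Object { [regex]::Escape($_) }) -join '|'

# the script block is called once per match and returns the replacement;
# the matched text is the literal key, so it can be looked up directly
$line = 'start aaa:bbb:123 middle aaa:bbb:123 end'
[regex]::Replace($line, $regexLookup, { param($m) $lookupTable[$m.Value] })
```

This also sidesteps treating the matched text as a regex pattern, which the -replace in the loop above does.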

Re-assembling split file names with Powershell

I'm having trouble re-assembling certain filenames (and discarding the rest) from a text file. The filenames are split up (usually on three lines) and there is always a blank line after each filename. I only want to keep filenames that begin with OPEN or FOUR. An example is:
OPEN.492820.EXTR
A.STANDARD.38383
333

FOUR.383838.282.
STAND.848484.NOR
MAL.3939

CLOSE.3480384.ST
ANDARD.39393939.
838383
The output I'd like would be:
OPEN.492820.EXTRA.STANDARD.38383333
FOUR.383838.282.STAND.848484.NORMAL.3939
Thanks for any suggestions!
The following worked for me, you can give it a try.
See https://regex101.com/r/JuzXOb/1 for the Regex explanation.
$source = 'fullpath/to/inputfile.txt'
$destination = 'fullpath/to/resultfile.txt'

[regex]::Matches(
    (Get-Content $source -Raw),
    '(?msi)^(OPEN|FOUR)(.*?|\s*?)+([\r\n]$|\z)'
).Value.ForEach({ -join ($_ -split '\r?\n').ForEach('Trim') }) |
    Out-File $destination
For testing:
$txt = @'
OPEN.492820.EXTR
A.STANDARD.38383
333

FOUR.383838.282.
STAND.848484.NOR
MAL.3939

CLOSE.3480384.ST
ANDARD.39393939.
838383

OPEN.492820.EXTR
A.EXAMPLE123

FOUR.383838.282.
STAND.848484.123
ZXC
'@

[regex]::Matches(
    $txt,
    '(?msi)^(OPEN|FOUR)(.*?|\s*?)+([\r\n]$|\z)'
).Value.ForEach({ -join ($_ -split '\r?\n').ForEach('Trim') })
Output:
OPEN.492820.EXTRA.STANDARD.38383333
FOUR.383838.282.STAND.848484.NORMAL.3939
OPEN.492820.EXTRA.EXAMPLE123
FOUR.383838.282.STAND.848484.123ZXC
Read the file one line at a time and keep concatenating them until you encounter a blank line, at which point you output the concatenated string and repeat until you reach the end of the file:
# this variable will keep track of the partial file names
$fileName = ''
# use a switch to read the file and process each line
switch -Regex -File ('path\to\file.txt') {
    # when we see a blank line...
    '^\s*$' {
        # ... we output the name if it starts with the right word
        if ($fileName -cmatch '^(OPEN|FOUR)') { $fileName }
        # and then start over
        $fileName = ''
    }
    default {
        # must be a non-blank line, concatenate it to the previous ones
        $fileName += $_
    }
}
# remember to check and output the last one
if ($fileName -cmatch '^(OPEN|FOUR)') {
    $fileName
}

How can I convert CSV files with a meta data header row into flat tables using Powershell?

I have a few thousand CSV files with a format similar to this (i.e. a table with a meta data row at the top):
dinosaur.csv,water,Benjamin.Field.12.Location53.Readings,
DATE,VALUE,QUALITY,STATE
2018-06-01,73.83,Good,0
2018-06-02,45.53,Good,0
2018-06-03,89.123,Good,0
Is it possible to use PowerShell to convert these CSV files into a simple table format such as this?
DATE,VALUE,QUALITY,STATE,FILENAME,PRODUCT,TAG
2018-06-01,73.83,Good,0,dinosaur.csv,water,Benjamin.Field.12.Location53.Readings
2018-06-02,45.53,Good,0,dinosaur.csv,water,Benjamin.Field.12.Location53.Readings
2018-06-03,89.123,Good,0,dinosaur.csv,water,Benjamin.Field.12.Location53.Readings
Or is there a better alternative for preparing these CSVs into a straightforward format to be ingested?
I have used PS to process simple CSVs before, but not with a meta data row that was important.
Thanks
Note: This is a faster alternative to thepip3r's helpful answer, and also covers the aspect of saving the modified content back to CSV files:
By using the switch statement to efficiently loop over the lines of the files as text, the costly calls to ConvertFrom-Csv, Select-Object and Export-Csv can be avoided.
Note that the switch statement is enclosed in $(), the subexpression operator, so as to enable writing back to the same file in a single pipeline. However, doing so requires keeping the entire (modified) file in memory; if that's not an option, enclose the switch statement in & { ... } and pipe it to Set-Content to a temporary file, which you can later use to replace the original file.
# Create a sample CSV file in the current dir.
@'
dinosaur.csv,water,Benjamin.Field.12.Location53.Readings,
DATE,VALUE,QUALITY,STATE
2018-06-01,73.83,Good,0
2018-06-02,45.53,Good,0
2018-06-03,89.123,Good,0
'@ > sample.csv

# Loop over all *.csv files in the current dir.
foreach ($csvFile in Get-Item *.csv) {
    $ndx = 0
    $(
        switch -File $csvFile.FullName {
            default {
                if ($ndx -eq 0) {             # 1st line
                    $suffix = $_ -replace ',$'       # save the suffix to append to data rows later
                } elseif ($ndx -eq 1) {       # header row
                    $_ + ',FILENAME,PRODUCT,TAG'     # add additional column headers
                } else {                      # data rows
                    $_ + ',' + $suffix               # append suffix
                }
                ++$ndx
            }
        }
    ) # | Set-Content $csvFile.FullName # <- activate this to write back to the same file.
      #    Use -Encoding as needed.
}
The above yields the following:
DATE,VALUE,QUALITY,STATE,FILENAME,PRODUCT,TAG
2018-06-01,73.83,Good,0,dinosaur.csv,water,Benjamin.Field.12.Location53.Readings
2018-06-02,45.53,Good,0,dinosaur.csv,water,Benjamin.Field.12.Location53.Readings
2018-06-03,89.123,Good,0,dinosaur.csv,water,Benjamin.Field.12.Location53.Readings
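For the large-file case mentioned above, where keeping the whole modified file in memory is not an option, the same switch can be wrapped in & { ... } and streamed through Set-Content to a temporary file that then replaces the original. A sketch of that variant, reusing the per-line logic from the script above:

```powershell
foreach ($csvFile in Get-Item *.csv) {
    $tmpFile = $csvFile.FullName + '.tmp'
    & {
        $ndx = 0
        switch -File $csvFile.FullName {
            default {
                if ($ndx -eq 0) { $suffix = $_ -replace ',$' }        # meta data row
                elseif ($ndx -eq 1) { $_ + ',FILENAME,PRODUCT,TAG' }  # header row
                else { $_ + ',' + $suffix }                           # data rows
                ++$ndx
            }
        }
    } | Set-Content $tmpFile   # streams line by line; the file is never held in memory
    Move-Item -LiteralPath $tmpFile -Destination $csvFile.FullName -Force
}
```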
## If your initial block is an accurate representation
$s = Get-Content .\test.txt

## Get the 'metadata' line
$metaline = $s[0]

## Remove the metadata line from the original and turn it into a custom powershell object
$n = $s | Where-Object { $_ -ne $metaline } | ConvertFrom-Csv

## Split the metadata line by a comma to get the different parts for appending to the other content
$m = $metaline.Split(',')

## Loop through each item and append the metadata information to each entry
for ($i = 0; $i -lt $n.Count; $i++) {
    $n[$i] = $n[$i] | Select-Object -Property *,FILENAME,PRODUCT,TAG ## This is a cheap way to create new properties on an object
    $n[$i].Filename = $m[0]
    $n[$i].Product = $m[1]
    $n[$i].Tag = $m[2]
}

## Display that the new objects report as the desired output
$n | Format-Table

Replace multiple character strings at once using a replace matrix

Within the content of a couple of large text files, I am aiming to replace all occurrences of a specific character string with a new character string, simultaneously for 300 different character strings.
Is there any way I can do this using a comma- or tab-separated search-and-replace matrix such as this? (The actual character strings vary widely in their length and type of characters, but do not contain , or TAB.)
currentstring1,newstring1
currentstring2,newstring2
currentstring3,newstring3
aB9_./cdef,newstring4
.
currentstring300,newstring300
Here is something to get you started. If the replacement file is ~300 lines, then Import-Csv should be ok. However, if the file in which to replace strings is large, Get-Content will be a problem. It will try to read the entire file into memory. You will need to iterate over the file reading line-by-line.
[cmdletbinding()]
Param()

$thefile = './largetextfile.txt'
$replfile = './repl.txt'

$reps = Import-Csv -Path $replfile -Header orgstring,repstring
foreach ($rep in $reps) {
    Write-Verbose $rep
}

$lines = Get-Content -Path $thefile
foreach ($line in $lines) {
    Write-Verbose $line
    $newline = $line
    foreach ($rep in $reps) {
        $newline = $newline -replace $rep.orgstring,$rep.repstring
    }
    Write-Verbose $newline
}
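The skeleton above computes $newline but only echoes it via Write-Verbose; a minimal completion that streams the replaced lines to a new file might look like this (the file names are the placeholders from the snippet above, and [regex]::Escape guards against search strings that contain regex metacharacters, since -replace matches as regex):

```powershell
$thefile = './largetextfile.txt'
$replfile = './repl.txt'
$reps = Import-Csv -Path $replfile -Header orgstring, repstring

Get-Content -Path $thefile | ForEach-Object {
    $newline = $_
    foreach ($rep in $reps) {
        # escape the literal search string so -replace treats it verbatim
        $newline = $newline -replace [regex]::Escape($rep.orgstring), $rep.repstring
    }
    $newline
} | Set-Content -Path './largetextfile_new.txt'
```

Piping Get-Content into ForEach-Object also addresses the memory concern: lines are processed one at a time instead of loading the whole file.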
On a Unix server: 1. Make the rename matrix as below in a text editor, then copy it. 2. In the server directory where the files are located, paste the multi-line rename matrix as is. 3. Press Enter. 4. Some characters (like slashes) may need to be escaped if present in the string, and the * at the end may be replaced to specify files.
perl -pi -e 's/FINDTEXT1/REPLACETEXT1/g' *
perl -pi -e 's/FINDTEXT2/REPLACETEXT2/g' *
perl -pi -e 's/FINDTEXT3/REPLACETEXT3/g' *