I have a text file that contains millions of records.
I want to find each line that does not start with a given string, along with its line number (the string is a double quote followed by 01/01/2019).
Can you help me modify this code?
Get-Content "(path).txt" | Foreach { if ($_.Split(',')[-1] -inotmatch "^01/01/2019") { $_; } }
Thanks
Based on your comments, the content will look something like the array below.
So you want to read the content, filter it, and get the resulting line from that content:
# Get the content
# $content = Get-Content -Path 'pathtofile.txt'
$content = @('field1,field2,field3', '01/01/2019,b,c')
# Convert from csv
$csvContent = $content | ConvertFrom-Csv
# Add your filter based on the field
$results = $csvContent | Where-Object { $_.field1 -notmatch '^01/01/2019' }
# Convert your results back to csv if needed
$results | ConvertTo-Csv
If performance is an issue, a .NET CSV library such as CsvHelper can handle millions of records.
# install CsvHelper
nuget install CsvHelper
# import csvhelper
import-module CsvHelper.2.16.3.0\lib\net45\CsvHelper.dll
# write the content to the file just for this example
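@('field1,field2,field3', '01/01/2019,b,c') | Set-Content -Path "c:\temp\text.csv"
# use a List: a plain array (@()) is fixed-size, so calling .Add() on it fails
$results = [System.Collections.Generic.List[object]]::new()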
# open the file for reading
try {
$stream = [System.IO.File]::OpenRead("c:\temp\text.csv")
$sr = [System.IO.StreamReader]::new($stream)
$csv = [CsvHelper.CsvReader]::new($sr)
# read in the records
while($csv.Read()){
# add in the result
$result = @{}
[string] $value = "";
for($i = 0; $csv.TryGetField($i, [ref] $value ); $i++) {
$result.Add($i, $value);
}
# add your filter here for the results
$results.Add($result)
}
# dispose of everything once we are done
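}finally {
# dispose in reverse order of creation, skipping anything that was never created
if ($null -ne $csv) { $csv.Dispose() }
if ($null -ne $sr) { $sr.Dispose() }
if ($null -ne $stream) { $stream.Dispose() }
}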
My .txt file looks like this...
date,col2,col3
"01/01/2019 22:42:00", "column2", "column3"
"01/02/2019 22:42:00", "column2", "column3"
"01/01/2019 22:42:00", "column2", "column3"
"02/01/2019 22:42:00", "column2", "column3"
This command does exactly what you are asking...
Get-Content -Path C:\myFile.txt | ? {$_ -notmatch "01/01/2019"} | Select -Skip 1
The output is:
"01/02/2019 22:42:00", "column2", "column3"
"02/01/2019 22:42:00", "column2", "column3"
I skipped the header row with Select -Skip 1. If you want to work with particular columns, import the file with Import-Csv; it reads .txt files too, so there is no need to rename it to .csv.
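None of the snippets above report line numbers, which the question also asked for. A minimal sketch, assuming the date always sits at the very start of the line right after a double quote: Select-String with -NotMatch streams the file, and each result carries a LineNumber property. The function name Get-NonMatchingLine is made up for illustration.

```powershell
# Hypothetical helper: list the lines that do NOT start with a prefix, with their line numbers
function Get-NonMatchingLine {
    param(
        [string] $Path,                    # file to scan
        [string] $Prefix = '"01/01/2019'   # lines starting with this text are skipped
    )
    # Escape the prefix so it is matched literally, and anchor it to the start of the line
    $pattern = '^' + [regex]::Escape($Prefix)
    # -NotMatch keeps the non-matching lines; each MatchInfo has LineNumber and Line
    Select-String -Path $Path -Pattern $pattern -NotMatch |
        ForEach-Object { '{0}: {1}' -f $_.LineNumber, $_.Line }
}
```

Against the sample file above, `Get-NonMatchingLine -Path C:\myFile.txt` also reports the header as line 1, since it does not start with the date.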
Judging from the question and comments, it seems you are dealing with a headerless CSV file. Because the file contains millions of records, Get-Content or Import-Csv could be too slow; [System.IO.File]::ReadLines() streams the file line by line and should be faster.
If each line does start with a quoted date, there are various ways to check whether it starts with "01/01/2019 or not. Here, I use the -notlike operator:
$fileIn = "D:\your_text_file_which_is_in_fact_a_CSV_file.txt"
$fileOut = "D:\your_text_file_which_is_in_fact_a_CSV_file_FILTERED.txt"
foreach ($line in [System.IO.File]::ReadLines($fileIn)) {
if ($line -notlike '"01/01/2019*') {
# write to a NEW file (Add-Content reopens the file for every line, which is slow for millions of lines)
Add-Content -Path $fileOut -Value $line
}
}
Update
Judging from your comment, you are apparently using an older .NET Framework; [System.IO.File]::ReadLines() only became available in version 4.0.
In that case, the below code should work for you:
$fileIn = "D:\your_text_file_which_is_in_fact_a_CSV_file.txt"
$fileOut = "D:\your_text_file_which_is_in_fact_a_CSV_file_FILTERED.txt"
$reader = New-Object System.IO.StreamReader($fileIn)
$writer = New-Object System.IO.StreamWriter($fileOut)
try {
while ($null -ne ($line = $reader.ReadLine())) {
if ($line -notlike '"01/01/2019*') {
# write to a NEW file
$writer.WriteLine($line)
}
}
}
finally {
# always release the file handles, even if an error occurs
$reader.Dispose()
$writer.Dispose()
}
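The StreamReader loop can also count lines as it goes, which gives the line numbers the question asked for without needing .NET 4.0. A sketch, assuming a made-up function name Get-FilteredLine; it accepts any System.IO.TextReader, so it works on a StreamReader over the file and can also be tried on a StringReader.

```powershell
# Hypothetical helper: stream lines from a reader and output "lineNumber: line"
# for every line that does not start with the prefix
function Get-FilteredLine {
    param(
        [System.IO.TextReader] $Reader,    # e.g. a StreamReader over the input file
        [string] $Prefix = '"01/01/2019'   # lines starting with this are skipped
    )
    $lineNumber = 0
    while ($null -ne ($line = $Reader.ReadLine())) {
        $lineNumber++
        # ordinal StartsWith is a plain, culture-independent prefix check
        if (-not $line.StartsWith($Prefix, [System.StringComparison]::Ordinal)) {
            '{0}: {1}' -f $lineNumber, $line
        }
    }
}
```

For example, create `$reader = New-Object System.IO.StreamReader($fileIn)`, run `Get-FilteredLine -Reader $reader | Set-Content $fileOut`, then call `$reader.Dispose()`.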
Related
I don't know how to merge multiple .txt data files into one .csv file, with each .txt file's data in its own column.
This is my code so far:
$location = (Get-Location).Path
$files = Get-ChildItem $location -Filter "*.asd.txt"
$data = @()
foreach ($file in $files) {
$fileData = Get-Content $file.FullName
foreach ($line in $fileData) {
$lineData = $line -split "\t"
$data = $lineData[1]
Add-Content -Path "$location\output.csv" -Value $data
}
}
Each of the files looks like the sample data below.
I want to keep the first column, "Wavelength", and put the second column of every file in the folder next to each other. Each column header should be the file's exact name, e.g.
"stovikmladyDoupno2 2020080500001.asd" or "stovikmladyDoupno2 2020080500002.asd" and so on.
So the result should have the Wavelength column followed by one column per file.
I have been searching for information for two days and still can't figure it out. I tried putting "," at the end of the file, thinking Excel would handle it, but nothing helped.
Here are a few files as test data:
https://mega.nz/folder/zNhTzR4Z#rpc-BQdRfm3wxl87r9XUkw
A few lines of data:
Wavelength stovikmladyDoupno2 2020080500000.asd
350 6.38961399706465E-02
351 6.14107911262903E-02
352 6.04866108251357E-02
353 5.83485359067184E-02
354 0.054978792413247
355 5.27014859356317E-02
356 5.34849237528764E-02
357 5.32841277775603E-02
358 5.23466655229364E-02
359 5.47595002186027E-02
360 5.22061034631109E-02
361 4.90149806042666E-02
362 4.81633530421385E-02
363 4.83974076557941E-02
364 4.65219929658367E-02
365 0.044800930294557
366 4.47830287392802E-02
367 4.46947539436297E-02
368 0.043756926558447
369 4.31725380363072E-02
370 4.36867609723618E-02
371 4.33227601805265E-02
372 4.29978664449687E-02
373 4.23860463187361E-02
374 4.12183604375401E-02
375 4.14306521081773E-02
376 4.11760903772502E-02
377 4.06421127128478E-02
378 4.09771489689262E-02
379 4.10083126746385E-02
380 4.05161601354181E-02
381 3.97904564387456E-02
I assumed a location, since I prefer not to declare file paths without a literal path. Please adjust the path as needed.
$Files = Get-ChildItem J:\Test\*.txt -Recurse
$Filecount = 0
$ObjectCollectionArray = @()
#First parse and collect each row in an array, keeping the datetime information from the filename.
foreach($File in $Files){
$Filecount++
Write-Host $Filecount
$DateTime = $File.fullname.split(" ").split(".")[1]
$Content = Get-Content $File.FullName
foreach($Row in $Content){
$Split = $Row.Split("`t")
if($Split[0] -ne 'Wavelength'){
$Object = [PSCustomObject]@{
'Datetime' = $DateTime
'Number' = $Split[0]
'Wavelength' = $Split[1]
}
$ObjectCollectionArray += $Object
}
}
}
#Match by number and create a new object with relation to the number and different datetime.
$GroupedCollection = @()
$Grouped = $ObjectCollectionArray | Group-Object number
foreach($GroupedNumber in $Grouped){
$NumberObject = [PSCustomObject]@{
'Number' = $GroupedNumber.Name
}
foreach($Occurance in $GroupedNumber.Group){
$NumberObject | Add-Member -NotePropertyName $Occurance.Datetime -NotePropertyValue $Occurance.wavelength
}
$GroupedCollection += $NumberObject
}
$GroupedCollection | Export-Csv -Path J:\Test\result.csv -NoClobber -NoTypeInformation
What you're looking to do is a fairly hard task, and there are a few ways to do it. This method requires all files to be held in memory while they are processed. You can treat these files as TSVs, so Import-Csv -Delimiter "`t" lets you work with objects instead of plain text.
# using this temp dictionary to create objects for each line of each tsv
$tmp = [ordered]@{}
# get all files and enumerate
$csvs = Get-ChildItem $location -Filter *.asd.txt | ForEach-Object {
# get their content as objects
$content = $_ | Import-Csv -Delimiter "`t"
# get their property Name that is not `Wavelength`
$property = $content[0].PSObject.Properties.Where{ $_.Name -ne 'Wavelength' }.Name
# output an object holding the total lines of this csv,
# its content and the property name of interest
[pscustomobject]@{
Lines = $content.Count
Content = $content
Property = $property
}
}
# use a scriptblock to allow streaming so `Export-Csv` starts exporting as
# output is going through the pipeline
& {
# for loop used for each line of the Tsv having the highest number of lines
for($i = 0; $i -lt [System.Linq.Enumerable]::Max([int[]] $csvs.Lines); $i++) {
# this boolean is used to preserve the "Wavelength" value of the first Tsv
$isFirstCsv = $true
foreach($csv in $csvs) {
# if this is the first object
if($isFirstCsv) {
# add the value of "Wavelength"
$tmp['Wavelength'] = $csv.Content[$i].Wavelength
# and set the bool to false, since we are only using this once
$isFirstCsv = $false
}
# then add the value of each property of each Tsv to the temp dictionary
$tmp[$csv.Property] = $csv.Content[$i].($csv.Property)
}
# then output this object
[pscustomobject] $tmp
# clear the temp dictionary
$tmp.Clear()
}
} | Export-Csv path\to\result.csv -NoTypeInformation
Here is a much more efficient approach that treats the files as plain text. It is faster and uses less memory, but it is not as reliable. It uses a StreamReader to read each file line by line and a StringBuilder to construct each output line.
& {
# get all files and enumerate
$readers = Get-ChildItem $location -Filter *.asd.txt | ForEach-Object {
# create a stream reader for each file
[System.IO.StreamReader] $_.FullName
}
# this StringBuilder is used to construct each line
$sb = [System.Text.StringBuilder]::new()
# while any of the readers has more content
while($readers.EndOfStream -contains $false) {
# signals this is our first Tsv
$isFirstReader = $true
# enumerate each reader
foreach($reader in $readers) {
# if this is the first Tsv
if($isFirstReader) {
# append the line as-is, only trimming excess white space
$sb = $sb.Append($reader.ReadLine().Trim())
$isFirstReader = $false
# go to next reader
continue
}
# if this is not the first Tsv,
# split on Tab and exclude the first token (Wavelength)
$null, $line = $reader.ReadLine().Trim() -split '\t'
# append a Tab + this line
$sb = $sb.Append("`t$line")
}
# append a new line and output the constructed string
$sb.AppendLine().ToString()
# and clear it for next lines
$sb = $sb.Clear()
}
# dispose all readers when done
$readers | ForEach-Object Dispose
} | Set-Content path\to\result.tsv -NoNewline
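Both answers above line the files up by position: the n-th line of every file lands on the same output row. If the files might not all cover the same wavelengths, joining on the Wavelength value itself is safer. A sketch under these assumptions: tab-separated files with a header row whose first column is Wavelength, the second header column (e.g. "stovikmladyDoupno2 2020080500000.asd") is used as that file's column name, and PowerShell 5 or later for `::new()`. The function name Merge-AsdFile is hypothetical.

```powershell
# Hypothetical helper: merge tab-separated files into one row per Wavelength value
function Merge-AsdFile {
    param([string[]] $Path)   # tab-separated files whose first column is "Wavelength"

    # Wavelength (string) -> ordered dictionary of column name -> value
    $rows = [ordered]@{}
    $columns = [System.Collections.Generic.List[string]]::new()

    foreach ($file in $Path) {
        $data = @(Import-Csv -Path $file -Delimiter "`t")
        if ($data.Count -eq 0) { continue }
        # the non-Wavelength header holds the name we want as the column header
        $name = $data[0].PSObject.Properties.Name |
            Where-Object { $_ -ne 'Wavelength' } | Select-Object -First 1
        $columns.Add($name)
        foreach ($record in $data) {
            $key = $record.Wavelength
            if (-not $rows.Contains($key)) { $rows[$key] = [ordered]@{ Wavelength = $key } }
            $rows[$key][$name] = $record.$name
        }
    }

    # one object per wavelength; a file without that wavelength gives an empty cell
    foreach ($entry in $rows.Values) {
        $out = [ordered]@{ Wavelength = $entry['Wavelength'] }
        foreach ($c in $columns) { $out[$c] = $entry[$c] }
        [pscustomobject] $out
    }
}
```

Usage could look like `Merge-AsdFile -Path (Get-ChildItem $location -Filter *.asd.txt).FullName | Export-Csv path\to\result.csv -NoTypeInformation`. Like the first answer, this keeps everything in memory.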
I want a script that checks for the keyset name (column A) in Sample.csv and then replaces the current command (column B) with the new command (column C) in the source text file.
CSV file: Sample.csv
A. | B. | C.
Manock | 2B | 2ab
Sterling | 3F | 3sf
Source file text: Source.txt
keyset "Manock"
(
key("SELECT")
command ("display/app=%disapp% "2B")
);
So desired output:
keyset "Manock"
(
key("SELECT")
command ("display/app=%disapp% "2ab")
);
PowerShell script:
New-Item -Path "C:\Users\e076200\Desktop\ks_update\source.txt" -ItemType File -Force
$data = Get-Content C:\Users\e076200\Desktop\ks_update\source.ddl
Add-Content -Value $data -Path "C:\Users\e076200\Desktop\ks_update\source.txt"
$foundline = $false
$a = 0
$Etxt = foreach($line in Get-Content C:\Users\e076200\Desktop\ks_update\source.txt)
{
if ($line -match 'keyset "Manock"' )
{
$a = 0
$foundline = $true
}
$a= $a + 1
if($line -match "display/app" -and $a -eq 5 -and $foundline -eq $true)
{
$line = $line.replace('2b' , '2ab')
$line
}
else
{
$line
}
}
$Etxt | Set-Content C:\Users\e076200\Desktop\ks_update\source.txt -Force
$users = Import-CSV -Path:\Users\e076200\Desktop\ks_update\sample.csv
I've figured out how to find and replace one line in the file directly, and how to import the CSV. I need help making the logic parameterized, using column A of the CSV as the match and column C as the replacement.
Script Explanation.
New-Item -Path "C:\Users\e076200\Desktop\ks_update\source.txt" -ItemType File -Force
New-Item creates a new text file at the location given by -Path, named source.txt.
-ItemType sets the type of item to create (a file); -Force overwrites the file if it already exists.
$data = Get-Content C:\Users\e076200\Desktop\ks_update\source.ddl
Reads the .ddl file and stores its content in $data.
Add-Content -Value $data -Path "C:\Users\e076200\Desktop\ks_update\source.txt"
Appends the content of $data to the newly created text file.
$foundline = $false
Flag that records whether the keyset identifier has been found.
$a = 0
Line counter used by the if statement.
$Etxt = foreach($line in Get-Content C:\Users\e076200\Desktop\ks_update\source.txt)
$Etxt collects the output of the foreach loop.
$line holds each line of the text file in turn.
{
if ($line -match 'keyset "Manock"' )
{
$a = 0
$foundline = $true
}
If the keyset identifier is found, reset the counter to 0 and set the flag to true.
$a= $a + 1
if($line -match "display/app" -and $a -eq 5 -and $foundline -eq $true)
{
$line = $line.replace('2b' , '2ab')
$line
Once the keyset identifier is found, the counter restarts: the keyset line counts as 1 and each following line adds 1, up to line 5, where the item to be replaced is expected.
As a safety check, the line must also contain the identifier "display/app".
If that check passes and the counter is 5, the text is replaced with the .Replace() method (note that .Replace() is case-sensitive, so '2b' does not match '2B').
The modified line is output through $line.
}
else
{
$line
}
Otherwise, output the line unchanged.
}
$Etxt | Set-Content C:\Users\e076200\Desktop\ks_update\source.txt -Force
Writes the updated lines back to the text file.
$users = Import-CSV -Path:\Users\e076200\Desktop\ks_update\sample.csv
Imports the reference CSV file.
Please make explanation as dumbed down as possible. Thank you.
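A minimal sketch of the parameterized logic, assuming the CSV's three columns are named Keyset, Old and New (the real header names aren't shown, so adjust them, or use `Import-Csv -Header Keyset,Old,New` if the file has no header row). Instead of counting lines, it remembers which keyset block it is in and rewrites the command line inside that block. The function name Update-KeysetCommand is hypothetical.

```powershell
# Hypothetical helper: replace the command value per keyset, driven by CSV rows
function Update-KeysetCommand {
    param(
        [string[]] $Lines,     # content of the source file, one string per line
        [object[]] $Mapping    # rows with Keyset, Old and New properties (assumed names)
    )
    # index the rows by keyset name (PowerShell hashtables ignore key case)
    $lookup = @{}
    foreach ($row in $Mapping) { $lookup[$row.Keyset] = $row }

    $current = $null
    foreach ($line in $Lines) {
        # remember which keyset block we are in, e.g. keyset "Manock"
        if ($line -match '^\s*keyset\s+"([^"]+)"') {
            $current = $Matches[1]
        }
        if ($current -and $lookup.ContainsKey($current) -and $line -match 'command') {
            $row = $lookup[$current]
            # -replace ignores case; Escape() makes the old value match literally,
            # and doubling '$' keeps the new value literal in the replacement
            $line = $line -replace [regex]::Escape($row.Old), $row.New.Replace('$', '$$')
        }
        $line
    }
}

# Example usage with the paths from the question:
# $map = Import-Csv C:\Users\e076200\Desktop\ks_update\sample.csv
# $src = Get-Content C:\Users\e076200\Desktop\ks_update\source.ddl
# Update-KeysetCommand -Lines $src -Mapping $map |
#     Set-Content C:\Users\e076200\Desktop\ks_update\source.txt
```

Because the match is tied to the keyset block rather than to "line 5", extra or missing lines inside a block don't break it.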
I'm working on a script that adds some additional information to a txt file. This information is stored in a CSV file that looks like this (the data will differ each time the script runs):
Number;A;B;ValueOfB
FP01340/05/20;0;1;GTU_01,GTU_03
FP01342/05/20;1;0;GTU01
The txt file looks like this (data inside will of course differ each time):
1|1|FP01340/05/20|2020-05-02|2020-05-02|2020-05-02|166,91|203,23|36,32|nothing interesting 18|33333|63-111 somewhere|||||
2|zwol|9,00|9,00|0,00
2|23|157,91|194,23|36,32
1|1|FP01341/05/20|2020-05-02|2020-05-02|2020-05-02|12,19|14,99|2,80|Some info |2222222|blabla|11-111 something||||
2|23|12,19|14,99|2,80
1|1|FP01342/05/20|2020-05-02|2020-05-02|2020-05-02|525,36|589,64|64,28|bla|222222|blba 36||62030|something||
2|5|213,93|224,63|10,70
2|8|120,34|129,97|9,63
2|23|191,09|235,04|43,95
What I need to do is find the line that contains the 'Number' value, append the 'A' and 'B' values from the CSV in the form |0|1, and then append 'ValueOfB' to the end of the line directly below it, in the form |AAA_01,AAA_03.
So the first two lines should look like this at the end:
1|1|FP01340/05/20|2020-05-02|2020-05-02|2020-05-02|166,91|203,23|36,32|nothing interesting 18|33333|63-111 somewhere||||||0|1
2|zwol|9,00|9,00|0,00|AAA_01,AAA_03
2|23|157,91|194,23|36,32
Rest of lines should not be touched.
I made a script that uses Select-String with -Context to find the lines I need and stores them in an object, then appends the new values to those lines and stores the result in another object.
My script is as follows:
$csvFile = Import-Csv -Path Somepath\file.csv -Delimiter ";"
$file = "Somepath2\SomeName.txt"
$LinesToChange = @()
$script:LinesToChange = $LinesToChange
$LinesOriginal = @()
$script:LinesOriginal = $LinesOriginal
foreach ($line in $csvFile) {
Select-String -Path $file -Pattern "$($Line.number)" -Encoding default -Context 0, 1 | ForEach-Object {
$1 = $_.Line
$2 = $_.Context.PostContext
}
$ListOrg = [pscustomobject]@{
Line_org = $1
Line_GTU_org = $2
}
$LinesOriginal = $LinesOriginal + $ListOrg
$lineNew = $ListOrg.Line_org | foreach { $_ + "|$($line.A)|$($line.B)" }
$GTUNew = $ListOrg.Line_GTU_org | foreach { $_ + "|$($line.ValueofB)" }
$ListNew = [pscustomobject]@{
Line_new = $lineNew
Line_GTU_new = $GTUNew
Line_org = $ListOrg.Line_org
Line_GTU_org = $ListOrg.Line_GTU_org
}
$LinesToChange = $LinesToChange + $ListNew
}
The output is an object, $LinesToChange, which holds the original lines and the lines after the change. The issue is that I have no idea how to use it to change the txt file. I tried a few methods and ended up either with a file that contains the updated lines but with all other lines doubled (I tried foreach), or with PowerShell using all the RAM and never finishing the job :)
My latest idea is to use something like this:
(Get-Content -Path $file) | ForEach-Object {
$line = $_
$LinesToChange.GetEnumerator() | ForEach-Object {
if ($line -match "$($LinesToChange.Line_org)") {
$line = $line -replace "$($LinesToChange.Line_org)", "$($LinesToChange.Line_new)"
}
if ($line -match "$($LinesToChange.Line_GTU_org)") {
$line = $line -replace "$($LinesToChange.Line_GTU_org)", "$($LinesToChange.Line_GTU_new)"
}
}
} | Set-Content -Path Somehere\newfile.txt
It seemed promising at first, but the variable $line contains all lines and as such it can't find the match.
I also need to be sure that the second line is directly below the first one. It is unlikely, but there could be two or more lines with the same data even though the "Number" from the CSV file is unique, so when changing the txt file I would preferably match both lines together. In short:
find these two lines:
1|1|FP01340/05/20|2020-05-02|2020-05-02|2020-05-02|166,91|203,23|36,32|nothing interesting 18|33333|63-111 somewhere|||||
2|zwol|9,00|9,00|0,00
change them to:
1|1|FP01340/05/20|2020-05-02|2020-05-02|2020-05-02|166,91|203,23|36,32|nothing interesting 18|33333|63-111 somewhere||||||0|1
2|zwol|9,00|9,00|0,00|AAA_01,AAA_03
Do that for all entries in $LinesToChange.
Any help will be much appreciated!
Greetings!
Some strange text file you have there, but anyway, this should do it:
# read in the text file as string array
$txt = Get-Content -Path '<PathToTheTextFile>'
$csv = Import-Csv -Path '<PathToTheCSVFile>' -Delimiter ';'
# loop through the items (rows) in the CSV and find matching lines in the text array
foreach ($item in $csv) {
# take only the first hit, so $match.LineNumber is a single number
$match = $txt | Select-String -Pattern ('|{0}|' -f $item.Number) -SimpleMatch | Select-Object -First 1
if ($match) {
# update the matching text line (array indices count from 0, so we do -1)
$txt[$match.LineNumber -1] += ('|{0}|{1}' -f $item.A, $item.B)
# update the line following
$txt[$match.LineNumber] += ('|{0}' -f $item.ValueOfB)
}
}
# show updated text on screen
$txt
# save updated text to file
$txt | Set-Content -Path 'Somewhere\newfile.txt'
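The question also asks to make sure the second line really belongs to the matched line. A sketch that adds that check, assuming (from the sample) that header lines start with "1|" and hold the Number in the third pipe-separated field, and that the follow-up line starts with "2|". Comparing the field, rather than searching the whole line, avoids hits where the same text appears elsewhere. The function name Add-CsvValue is hypothetical.

```powershell
# Hypothetical helper: append A/B to the header line and ValueOfB to the line below it
function Add-CsvValue {
    param(
        [string[]] $Lines,     # the text file, one string per line
        [object[]] $Rows       # CSV rows with Number, A, B and ValueOfB
    )
    # copy into a List so lines can be changed in place by index
    $txt = [System.Collections.Generic.List[string]]::new([string[]] $Lines)
    foreach ($row in $Rows) {
        for ($i = 0; $i -lt $txt.Count; $i++) {
            $fields = $txt[$i].Split('|')
            # header lines start with "1|" and carry the Number in the third field
            if ($fields[0] -eq '1' -and $fields.Count -gt 2 -and $fields[2] -eq $row.Number) {
                $txt[$i] += '|{0}|{1}' -f $row.A, $row.B
                # only touch the next line if it exists and is a "2|" detail line
                if ($i + 1 -lt $txt.Count -and $txt[$i + 1].StartsWith('2|')) {
                    $txt[$i + 1] += '|{0}' -f $row.ValueOfB
                }
                break   # Number is unique, so stop at the first hit
            }
        }
    }
    $txt
}
```

Usage could be `Add-CsvValue -Lines (Get-Content $file) -Rows (Import-Csv $csvPath -Delimiter ';') | Set-Content $newFile`. The nested loop scans the text once per CSV row, which is fine for modest files.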
I'm receiving an automated report as a CSV from a system that cannot be modified. I am using PowerShell to split the CSV into multiple files and parse out the specific data needed. The CSV contains columns that may hold no data, one value, or multiple comma-separated values within a single field.
Example (updated for clarity):
"Group","Members"
"Event","362403"
"Risk","324542, 340668, 292196"
"Approval","AA-334454, 344366, 323570, 322827, 360225, 358850, 345935"
"ITS","345935, 358850"
"Services",""
I want the data to have one entry per line, like this (updated for clarity):
"Group","Members"
"Event","362403"
"Risk","324542"
"Risk","340668"
"Risk","292196"
#etc.
I've tried splitting the data and I just get an unknown number of columns at the end.
I tried a foreach loop, but can't seem to get it right (pseudocode below):
Import-CSV $Groups
ForEach ($line in $Groups){
If($_.'Members'.count -gt 1, add-content "$_.Group,$_.Members[2]",)}
I appreciate any help you can provide. I've searched Stack Exchange and Google but haven't found anything that addresses this exact issue.
Import-Csv .\input.csv | ForEach-Object {
ForEach ($Member in ($_.Members -Split ',')) {
[PSCustomObject]@{Group = $_.Group; Member = $Member.Trim()}
}
} | Export-Csv .\output.csv -NoTypeInformation
# Get the raw text contents
$CsvContents = Get-Content "\path\to\file.csv"
# Convert it to a table object
$CsvData = ConvertFrom-CSV -InputObject $CsvContents
# Iterate through the records in the table
ForEach ($Record in $CsvData) {
# Create array from the members values at commas & trim whitespace
$Record.Members -Split "," | % {
$Member = $_.Trim()
# Skip empty values (e.g. "Services","")
if ($Member) {
# Create our output string
$OutputString = "$($Record.Group), $Member"
# Write our output string to a file
Add-Content -Path "\path\to\output.txt" -Value $OutputString
}
}
}
This should work, you had the right idea but I think you may have been encountering some syntax issues. Let me know if you have questions :)
Revised code for your updated question:
$List = Import-Csv "\path\to\input.csv"
foreach ($row in $List) {
$Group = $row.Group
$Members = $row.Members -split ","
# Process for each value in Members
foreach ($MemberValue in $Members) {
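# PS v3 and above (Export-Csv needs objects; piping a plain string exports only its Length)
[pscustomobject]@{ Group = $Group; Members = $MemberValue.Trim() } | Export-Csv "\path\to\output.csv" -NoTypeInformation -Append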
# PS v2
# $Group + "," + $MemberValue | Out-File "\path\to\output.csv" -Append
}
}
I hope someone can suggest a way to speed up a PowerShell script. I am reading in hundreds of CSV files, parsing them to find missing entries, and writing the output to an HTML file. Here is the loop I use to process the files:
ForEach ($Filename in $FileList) {
$CustTemp = import-csv "$FilePath\$Filename"
$CustName = $CustTemp[0].CustName
Write-Host "Reading data for $CustName"`r
For ($counter=0;$counter -lt 31;$counter++){
$CheckDate = (get-date).AddDays(-$counter)
$CheckShortDate = $CheckDate.ToShortDateString()
$TempData = import-csv "$FilePath\$Filename" | Select FileName,FileDate | where {$_.FileDate -eq $CheckShortDate}
If ($TempData -eq $null) {
$row = "No file found for $CheckShortDate for $CustName"
$HTMLReportItems += $row
}
$HTMLReportItems = $HTMLReportItems | ConvertTo-Html -Fragment
}
}
This loop worked fine when I was testing with a few CSV files but when running it against a large number of files (300+) the loop is taking an extremely long time to complete for each file (30s-1m). I'm pretty sure the reason why is that the CSV file is being accessed 30 times per iteration. What I am hoping is that someone will have a better suggestion on how I can process the data.
You're reading $FilePath\$Filename multiple times. Read it outside the for loop and only do the filtering inside. Move the HTML generation outside the loop as well.
$HTMLReportItems = foreach ($Filename in $FileList) {
$csv = Import-Csv (Join-Path $FilePath $Filename)
$CustName = $csv[0].CustName
$data = $csv | select FileName,FileDate
Write-Host "Reading data for $CustName"
for ($counter=0;$counter -lt 31;$counter++){
$CheckShortDate = (Get-Date).AddDays(-$counter).ToShortDateString()
$TempData = $data | ? {$_.FileDate -eq $CheckShortDate}
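if (-not $TempData) {
# wrap the message in an object; ConvertTo-Html renders plain strings as their Length
[pscustomobject]@{ Message = "No file found for $CheckShortDate for $CustName" }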
}
}
}
$HTMLReportItems = $HTMLReportItems | ConvertTo-Html -Fragment
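If it is still slow with hundreds of files, another option is to put each file's FileDate values into a HashSet in one pass, so each of the 31 checks is a quick lookup instead of a scan over the file's rows. A sketch, assuming the same CustName and FileDate columns and that FileDate is stored in the same format that ToShortDateString() produces on the machine running the script; Get-MissingDate is a hypothetical name. It returns objects, so ConvertTo-Html -Fragment gets real columns.

```powershell
# Hypothetical helper: report which of the last $Days days have no FileDate entry
function Get-MissingDate {
    param(
        [object[]] $Rows,               # rows from Import-Csv with CustName and FileDate
        [datetime] $Today = (Get-Date),
        [int] $Days = 31
    )
    if ($Rows.Count -eq 0) { return }
    $custName = $Rows[0].CustName
    # one pass over the rows: remember every FileDate that is present
    $present = [System.Collections.Generic.HashSet[string]]::new()
    foreach ($r in $Rows) { [void] $present.Add($r.FileDate) }
    for ($i = 0; $i -lt $Days; $i++) {
        $day = $Today.AddDays(-$i).ToShortDateString()
        if (-not $present.Contains($day)) {
            [pscustomobject]@{ Customer = $custName; MissingDate = $day }
        }
    }
}
```

Usage could be `$HTMLReportItems = foreach ($Filename in $FileList) { Get-MissingDate -Rows (Import-Csv (Join-Path $FilePath $Filename)) }` followed by the same `ConvertTo-Html -Fragment` call.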