Powershell: search backwards from end of file

My script reads a log file once a minute and selects (and acts upon) the lines where the timestamp begins with the previous minute.
This is easy (the regex is simply "^$timestamp"), but when the log gets big it can take a while.
My thinking is the lines I want will always be near the bottom of the file, so I'd be searching far fewer lines if I started at the bottom and searched upwards, stopping when I get to the minute prior to the one I'm interested in.
My question is, how can I search from the bottom of the file instead of the top? Can I even say "read line $length", or even "read line n" (if so I could do a sort of binary search thing to find the length of the file and work backwards from there)?
Last question: would this even be faster (I'd still like to know how to do it even if it wouldn't be faster)?
Ideally, I'd like to do this all in my own code without installing anything extra.
Thanks

get-content bigfile.txt -tail 10
This works on huge files nearly instantly without any big memory usage.
I did it with a 22 GB text file in my testing.
Doing something like "get-content bigfile.txt | select -Last 10" works, but it seems to have to load all of the lines (or objects in PowerShell) first and then do the select.

May I suggest just changing the regex to equal Get-Date + whatever time period you want?
For example (and this is without your log, so I apologize):
$a = Get-Date
$hr = $a.Hour
$min = $a.Minute
Then work off those values to build out the regex to select the times you want. And if you don't already use it, this website is awesome for building regexes quickly and easily: http://gskinner.com/RegExr/ .
Got another fix; I think you will like this:
$a = get-content .\biglog.text
Use the length to slice the array from back to front; change Write-Host to Select-String with your regex, or whatever you want to do in reverse:
foreach($x in ($a.Length - 1)..0){ write-host $a[$x] }
Another option, again after the Get-Content cmdlet: this one just reverses the array, so you then read $a from bottom to top:
[array]::Reverse($a)

If you only want the last bit of the file, depending on the format, you can just do this:
Get-Content C:\Windows\WindowsUpdate.log | Select -last 10
This will return the last 10 lines found in the file.

Related

Powershell: Compare filenames to a list of formats

I am not looking for a working solution but rather an idea to push me in the right direction as I was thrown a curveball I am not sure how to tackle.
Recently I wrote a script that used the split command to check the first part of the file against the folder name. This was successful, so now there is a new ask: check all the files against the naming matrix. The problem is there are 50+ file formats on the list.
So, for example, the format of a document would be ID-someID-otherID-date.xls.
So 12345678-xxxx-1234abc.xls, for example, shows the shape of the character counts, letting you check that files have the correct number of characters in each section to spot typos etc.
Is there any other reasonable way of tackling that besides Regex? I was thinking of using multiple splits using the hyphens but don't really have anything to reference that against other than the amount of characters required in each part.
As always, any (even vague) pointers are most welcome ;-)
Although I would use a regex (as also commented by zett42), there is indeed another way, which is using the ConvertFrom-String cmdlet with a template:
$template = @'
{[Int]Some*:12345678}-{[String]Other:xxxx}-{[DateTime]Date:2022-11-18}.xls
{[Int]Some*:87654321}-{[String]Other:yyyy}-{[DateTime]Date:18Nov2022}.xls
'@
'23565679-iRon-7Oct1963.xls' | ConvertFrom-String -TemplateContent $template
Some : 23565679
Other : iRon
Date : 10/7/1963 12:00:00 AM
RunspaceId : 3bf191e9-8077-4577-8372-e77da6d5f38d

Using Powershell to remove illegal CRLF from csv row

Gentle Reader,
I have a year's worth of vendor csv files sitting in a directory. My task is to load them into a SQL Server DB as a "Historical Load". The files are mal-formed and while we are working with the vendor to re-send 365 new, properly structured files, I have been tasked with trying to work with what we have.
I'm restricted to using either C# (as a script task in SSIS) or Powershell.
Each file has no header but the schema is known and built into the SSIS package connection.
Each file has approx 35k rows and roughly a few dozen mal-formed rows per file.
Each properly formed row consists of 122 columns, 121 commas.
Rows are NOT text qualified.
Example: (data cleaned of PII)
555222,555222333444,1,HN71232,1/19/2018 8:58:07 AM,3437,27.50,HECTOR EVERYMAN,25-Foot Garden Hose - ,1/03/2018 10:17:24 AM,,1835,,,,,online,,MERCH,1,MERCH,MI,,,,,,,,,,,,,,,,,,,,6611060033556677,2526677,,,,,,,,,,,,,,EVERYMAN,,,,,,,,,,,,,,,,,,,,,,,,,,,,,VWDEB,,,,,,,555666NA118855,2/22/2018 12:00:00 AM,,,,,,,,,,,,,,,,,,,,,2121,,,1/29/2018 9:50:56 AM,0,,,[CRLF]
555222,555222444888,1,CASUAL50,1/09/2018 12:00:00 PM,7000,50.00,JANE SMITH,$50 Casual Gift Card,1/19/2018 8:09:15 AM,1/29/2018 8:19:25 AM,1856,,,,,online,,FREE,1,CERT,GC,,,,,,,6611060033553311[CRLF]
,6611060033553311[CRLF]
,,,,,,,,,25,,,6611060033556677,2556677,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,CASUAL25,VWDEB,,,,,,,555222NA118065,1/22/2018 12:00:00 AM,,,,,,,,,,,,,,,,,,,,,,,,1/19/2018 12:00:15 PM,0,,,[CRLF]
555222,555222777666,1,CASHCS,1/12/2018 10:31:43 AM,2500,25.00,BOB SMITH,BIG BANK Rewards Cash Back Credit [...6S66],,,1821,,,,,online,,CHECK,1,CHECK,CK,,,,,,,,,,,,,,,,,,,,555222166446,5556677,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,VWDEB,,,1/23/2018 10:30:21 AM,,,,555666NA118844,1/22/2018 12:00:00 AM,,,,,,,,,,,,,,,,,,,,,,,,1/22/2018 10:31:26 AM,0,,,[CRLF]
Powershell Get-Content (I think...) reads the file into an array where each row is identified by the CRLF as the terminator. This means (again, I think) that mal-formed rows will be treated as elements of the array without respect to how many "columns" they hold.
C# Streamreader also uses CRLF as a marker but a streamreader object also has a few methods available like Peek and Read that may be useful.
Please, Oh Wise Ones, point me in the direction of least resistance. Using Powershell, as a script to process mal-formed csv files such that CRLFs that are not EOL are removed.
Thank you.
Based on @vonPryz's design but in (Native¹) PowerShell:
$Delimiters = 121
Get-Content .\OldFile.csv | ForEach-Object { $Line = '' } {
    if ($Line) { $Line += ',' + $_ } else { $Line = $_ }
    $TotalMatches = ($Line | Select-String ',' -AllMatches).Matches.Count
    if ($TotalMatches -ge $Delimiters) {
        $Line
        $Line = ''
    }
} | Set-Content .\NewFile.Csv
1) I guess performance might be improved by avoiding += and using .NET methods along with text streams.
Honestly, your best bet is to get good data from the supplier. Trying to work around a mess will just cause problems later on. Garbage in, garbage out. Since it's you who wrote the garbage data in the database, congratulations, it's now your fault that the DB data is of poor quality. Please talk with your manager and the stakeholders first, so that you have in writing an agreement that you didn't break the data and it was broken to start with. I've seen such problems on ETL processing all too often.
A quick and dirty pseudocode without error handling, edge case processing, substring index assumptions, performance guarantees and whatnot goes like so,
while(dataInFile)
    line = readline()
    :parseLine
    commasInLine = countCommas(line)
    if commasInLine == rightAmount
        addLineInOKBuffer(line)
    else
        commasNeeded = rightAmount - commasInLine
        if commasNeeded < 0
            # too many commas, two lines are combined
            lastCommaLocation = getLastCommaIndex(line, commasNeeded)
            addLineInOKBuffer(line.substring(0, lastCommaLocation))
            line = line.substring(lastCommaLocation, line.end)
            goto :parseLine
        else
            # too few commas, need to read the next line too
            line = line.removeCrLf() + readline()
            goto :parseLine
The idea is that first you look for a line and count how many commas there are. If the count matches what's expected, the row is not broken. Store it in a buffer containing good data.
If you have too many commas, then the row contains at least two different elements. Then find the index of where the first element ends, extract it and store it in the good data buffer. Then remove already processed part of the line and start again.
If you have too few commas, then the row is split by a newline. Read the next line from the file, join it with the current line and start the parsing again from counting the commas.

Removing CR LF improper split line on .txt pipe-delimited flat with Powershell script

Hope all is well! I came across a bit of a tricky issue with a flat file that is exported from Oracle PBCS with some carriage return issues. End users - when inputting data into PBCS - will often press Enter in a specific data field input screen. When the data gets exported representing a specific record with all the data elements representing that data point (intersection) - think like a SQL record - the record element where the user pressed Enter causes that record to break at that point, shifting the rest of the data elements in that record to the next line. This is very bad as each record must have the same number of elements, causing downstream issues in a mapping. In effect one unique record becomes two broken records.
I need a Powershell script that looks at the improper CR LF (Windows system) and reforms each unique record. However, the majority of the records in the flat file are fine so the code will have to be able to discern the "mostly good" from the "very bad" cases.
My flat file is pipe delimited and has a header element. The header element may not need to be considered as I am simply trying to address the fix - a solution could potentially look at the amount of property values for the header record to determine how to format broken records based off a property count using the pipe delimiter - but not sure that is necessary.
I will be honest - there are Jython scripts I tried to no avail - so I felt given that I have employed a couple Powershell scripts for other reasons in the past that I would use this again. I have a basis of a script for a csv file - but this isn't quite working.
$file = Get-Content 'E:\EPM_Cloud\Exports\BUD_PLN\Data\EXPORT_DATA_BUD_PLN.txt'
$file| Foreach-Object {
foreach ($property in $_.PSObject.Properties) {
$property.Value = ($property.Value).replace("`r","").replace("`n","")
}
}
$file|out-file -append 'E:\EPM_Cloud\Exports\BUD_PLN\Data\EXPORT_DATA_BUD_PLN_FINAL.txt'
Here are a few examples of what the before and after case would be if I could get this code to work.
This is supposed to be one record - as you see beginning with "$43K from... the user pressed enter several times. As you see it is pipe delimited - I use the numeric line numbers to show you what I mean since this isn't notepad++. The idea is this should all just be on 1.
Contract TBD|#missing|#missing|#missing|#missing|ORNL to Perform Radio-Chemical (RCA) Measurements|#missing|#missing|#missing|#missing|"$43K from above
$92,903
$14,907
The current $150K to be reprogrammed to XXX, plus another $150K from Fuel Fac for this item to be reprogrammed to RES."|#missing|#missing|#missing|"Summary|X0200_FEEBASED|No_BOC|O-xxxx-B999|xx_xxx_xx_xxx|Plan|Active|FY19|BegBalance"|COMMIT
This is what the output should look like (I have attached screenshots instead). All in 1.
Contract TBD|#missing|#missing|#missing|#missing|ORNL to Perform Radio-Chemical (RCA) Measurements|#missing|#missing|#missing|#missing|"$43K from above $92,903 $14,907 The current $150K to be reprogrammed to XXX, plus another $150K from Fuel Fac for this item to be reprogrammed to RES."|#missing|#missing|#missing|"Summary|X0200_FEEBASED|No_BOC|O-xxxx-B999|xx_xxx_xx_xxx|Plan|Active|FY19|BegBalance"|COMMIT
In other cases the line breaks just once - all defined just by how many times the user presses Enter.
As you see in the data image, the line splits - this is the point of the PowerShell. As the screenshot next to it shows, other lines are just fine.
So after checking locally, you should be able to just import the file as a CSV, then loop through everything and remove CRLF from each property on each record, and output to a new file (or the same, but it's safer to output to a new file).
$Records = Import-Csv C:\Path\To\File.csv -Delimiter '|'
$Properties = $Records[0].psobject.Properties.Name
ForEach ($Record in $Records) {
    ForEach ($Property in $Properties) {
        $Record.$Property = $Record.$Property -replace "[\r\n]"
    }
}
$Records | Export-Csv C:\Path\To\NewFile.csv -Delimiter '|' -NoTypeInformation

Powershell - Efficient Way to Return Line Numbers from Logs

I have an extremely large log file (max 1GB) which is appended to throughout the day. There are various strings within this log which I would like to search for (that I can already achieve using Select-String) however I am scanning the whole file on every sweep which is inefficient and a tad unnecessary.
Ideally I want to scan only the last 5 minutes of the log for these strings on each sweep. Unfortunately not every row of the log file contains a timestamp, so a wildcard Select-String for the last 5 minutes' timestamps combined with the strings of interest would miss some occurrences. My only other idea at the moment is to determine the line numbers of interest, $FromLineNumber (5 mins before current system time) and $ToLineNumber (the very last line number of the log file), and then Select-String only between those two line numbers.
As an example, to search between line 50 and the final line of the log. I am able to return the line number of $FromLineNumber but I'm struggling with grabbing $ToLineNumber for final row of log.
Q. How do I return only the line number of the final row of a log file?
So far I have tried returning this with Get-Content $path -tail -1 (object type linenumber) however this always returns blank values even with various switches and variations. I can only return line numbers via the Select-String cmdlet however I do not have a specific string to use that relates to the final row of the log. Am I misusing this cmdlet per its original design and if so...is there any other alternative to return the last line number?
Continued... Once I have determined the line number range to search between, would I isolate those rows using a Get-Content loop between $FromLineNumber and $ToLineNumber first to filter down to this smaller selection and then pipe this into Select-String, or is there a more efficient way to achieve this? I suspect that looping through thousands of lines would be demanding on resources, so I'm keen to know if there is a better way.
Here is the answer to the first question
From https://blogs.technet.microsoft.com/heyscriptingguy/2011/10/09/use-a-powershell-cmdlet-to-count-files-words-and-lines/
If I want to know how many lines are contained in the file, I use the Measure-Object cmdlet with the -Line switch. This command is shown here:
Get-Content C:\fso\a.txt | Measure-Object -Line

powershell - replace line in .txt file

I am using PowerShell and I need to replace a line in a .txt file.
The .txt file always has different number at the end of the line.
For example:
...............................txt (first)....................................
appversion= 10.10.1
............................txt (a second time)................................
appversion= 10.10.2
...............................txt (third)...................................
appversion= 10.10.5
I need to replace appversion + number behind it (the number is always different). I have set the required value in variable.
How do I do this?
Part of the issue you are getting, which I see from your comments, is that you are trying to replace text in a file and save it back to the same file while you are still reading it.
I will try to show a similar solution while addressing this. Again we are going to use -replace's functionality as an array operator.
$NewVersion = "Awesome"
$filecontent = Get-Content C:\temp\file.txt
$filecontent -replace '(^appversion=.*\.).*',"`$1$NewVersion" | Set-Content C:\temp\file.txt
This regex will match lines starting with "appversion=" and everything up until the last period. Since we are storing the text in memory we can write it back to the same file. Change $NewVersion to a number ... unless that is your versioning structure.
I'm not sure which part of the numbers, if any, you are trying to preserve. If you intend to change the whole number then you can change .*\. to a space. That way you ignore everything after the equals sign.
Yes, you can with regex.
Let's call $myString and $verNumber the variables with the text and version number:
$myString = "appversion= 10.10.1";
$verNumber = 7;
You can use -replace operator to get the version part and replace only last subversion number this way
$mystring -replace 'appversion= (\d+)\.(\d+)\.(\d+)', "appversion= `$1.`$2.$verNumber";