PowerShell - Copying CSV, Modifying Headers, and Continuously Updating New CSV

We have a log that tracks faxes sent through our fax server. It is a .csv that contains Date_Time, Duration, CallerID, Direction (i.e. inbound/outbound), Dialed#, and Answered#. This file is overwritten every 10 minutes with any new info that was tracked on the fax server. This cannot be changed to be appended.
Sometimes our faxes fail, and the duration on those will be equal to 00:00:00. We really don't know if they are failing until users let us know that they are getting complaints about missing faxes. I am trying to create a Powershell script that can read the file and notify us via email if there are n amount of failures.
I started working on it, but it quickly became a big mess as I ran into more problems. One issue I was trying to overcome was having it email us over and over about the same failures. Since I can't save anything to the original .csv, I was trying to implement these ideas in the script:
Copy the .csv with a new column titled "LoggedFailure". Create the file if it doesn't exist.
Compare the two files, and add different data (i.e. updates on the original) to the copy.
Check the copied .csv for Durations equal to 00:00:00. If a row matches, mark its LoggedFailure column with "Yes" or some value.
If there are n amount of failures, email us.
Have this script run as a scheduled task (every hour or so).
I'm having difficulty with maintaining the data. I haven't done a lot of work with scripting or programming, so I'm having trouble putting together the correct logic. I can look up cmdlets and understand them, but my main issue is logic. Does anyone have any tips or could provide some ideas on how to best update the data, track failures so as not to send duplicate notifications, and have it run?

I'd use a hash table with the Dialed# as the key. Create PSCustomObjects that have LastFail date and FailCount properties as the values. Read through the log in chronological order, and add or increment an entry in the hash table every time you find a record with a Duration of 00:00:00 that's newer than what's already in the hash table. If you find a successful delivery, delete the entry with that Dialed# key from the hash table if it exists.
When it's done, the hash table keys will be a collection of the dialed numbers that are failing, and the objects in the values will tell you how many failures there have been and when the last one was. Use that to determine if an alert needs to be sent, and which numbers to report.
When a problem with a given fax number is resolved, a successful fax to that number will clear the entry from the hash table, and stop the alerts.
Save the hash table between runs by exporting it as CLIXML, and re-import it at the beginning of each run.
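A minimal sketch of that approach, assuming the column names from the question (Date_Time, Duration, Dialed#); the file paths, the failure threshold, and the mail parameters are placeholders, not confirmed values:

# Sketch only - paths, column names, threshold and mail settings are assumptions.
$logPath   = 'C:\FaxLogs\faxlog.csv'
$statePath = 'C:\FaxLogs\FaxFailureState.xml'
$threshold = 3

# Re-import the state saved by the previous run, if any.
$failures = @{}
if (Test-Path $statePath) { $failures = Import-Clixml $statePath }

# Read the log in chronological order.
foreach ($row in (Import-Csv $logPath | Sort-Object { [datetime]$_.Date_Time })) {
    $number = $row.'Dialed#'
    $when   = [datetime]$row.Date_Time
    if ($row.Duration -ne '00:00:00') {
        # A successful fax to this number clears its failure history and stops the alerts.
        $failures.Remove($number)
        continue
    }
    $entry = $failures[$number]
    if (-not $entry) {
        $failures[$number] = [pscustomobject]@{ LastFail = $when; FailCount = 1 }
    }
    elseif ($when -gt $entry.LastFail) {
        # Only count failures newer than what was already recorded, so re-reading
        # the same log lines on the next run doesn't double-count them.
        $entry.LastFail = $when
        $entry.FailCount++
    }
}

# Report numbers that have reached the threshold.
$failing = $failures.GetEnumerator() | Where-Object { $_.Value.FailCount -ge $threshold }
if ($failing) {
    $body = ($failing | ForEach-Object { "$($_.Key): $($_.Value.FailCount) failures, last at $($_.Value.LastFail)" }) -join "`n"
    # Mail parameters below are placeholders.
    Send-MailMessage -To 'helpdesk@example.com' -From 'faxmonitor@example.com' `
        -Subject 'Fax failures detected' -Body $body -SmtpServer 'smtp.example.com'
}

# Persist the state for the next scheduled run.
$failures | Export-Clixml $statePath

Run it from a scheduled task every hour or so; because the state file carries over between runs, a number only triggers mail once it crosses the threshold, and a later successful fax clears it.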


How to do duplicate file check in DataStage?

For instance:
File A is loaded; then, the next day,
File B is loaded; then, the next day,
File A is received again - this time the sequence should abort.
Can anyone help me out with this?
Thanks
There are multiple ways to solve this, but please don't build in intentional aborts - they're most likely to come back at you like boomerangs.
Keep track of filenames and file hashes (like an MD5 sum) in a table and compare against that list before loading. If the file is known, handle or ignore it. (A rough sketch of this check follows at the end of this answer.)
Just read the file again as if it were new or updated. Compare old data with new data using the Change Capture stage and handle the data as needed, e.g. write changed and new data to the target. (recommended)
I would not recommend writing a sequence that "should abort", as this is not the goal of an ETL process. If the file contains the very same content that is already known, just ignore it. If it has updated data, handle it as needed. Only abort if there is a technical issue, e.g. the file is wrongly formatted. An abort of a job should indicate that something is wrong with the job; when you get a file twice, it's not the job that failed.
If an error is found in the data that needs to be fixed by others, write the information about it to a table. Have another, independent process monitor that table and tell the data producer about it (via dashboard, email, ...).
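Outside of DataStage itself, the hash-tracking idea from the first option can be sketched roughly like this (PowerShell purely for illustration; the paths and the layout of the tracking file are assumptions - in a real job this would usually be a lookup against a database table):

# Sketch only - compute the incoming file's hash and skip it if it was loaded before.
$incoming = 'C:\landing\FileA.csv'
$registry = 'C:\landing\loaded_files.csv'   # assumed columns: FileName, Hash, LoadedOn

$hash = (Get-FileHash -Path $incoming -Algorithm MD5).Hash

$known = @()
if (Test-Path $registry) { $known = @(Import-Csv $registry) }

if ($known | Where-Object { $_.Hash -eq $hash }) {
    # Identical content was already loaded: ignore it rather than aborting the sequence.
    Write-Output "Skipping $incoming - identical content already loaded."
}
else {
    # Load the file (or hand it to the ETL job), then record it in the registry.
    [pscustomobject]@{ FileName = Split-Path $incoming -Leaf; Hash = $hash; LoadedOn = Get-Date } |
        Export-Csv $registry -Append -NoTypeInformation
}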

Logging a counter value to a batch name in Siemens TIA Portal

I need to create a program for 1214 PLC in TIA Portal and a Comfort HMI that counts several products using a count up and stores that value to a specific batch name.
For every new batch, the operator would enter a new batch name, and the counter will count the products for that specific batch.
The count needs to be displayed on the HMI screen along with the history of batches and the associated final count number.
So basically, I need a way to attach a name (batch_id) to a final count and log that pair for later reference.
Can someone give me some advice as to how I would do that?
To clarify, I need help with storing and displaying the counter value and batch names, not with the counting itself.
I appreciate any help you can provide.
There are a few ways to do this (yes, you can use PLC data logs, and no, they don't have to create a separate file for each batch), but I am posting what I would do, because it's convenient for data backups, I have taken this approach before, and I know it works.
Write the count value (generated in the PLC), the batch value and the timestamp to a CSV file on a USB drive inserted into the Comfort HMI, using VBScripts on the HMI.
Split the files regularly - e.g. daily, weekly or monthly, to minimize the risk of any single file becoming corrupt and you losing the data. More detail follows.
Data Storage:
Count is calculated in the PLC. Batch ID and timestamp can be stored in the PLC (if you want it to be retentive after a power cut), or in the HMI.
You will have Comfort HMI tags representing each of these three values. Once a batch is complete, call a VB script that writes the values of these tags to the CSV file. There are application examples and forum entries on SIOS about this.
Data display as a table:
Read the CSV file values according to your filter criteria (day, time range, batch ID, batch ID range, etc) using a VB script. Write to internal HMI tags.
Display these internal HMI tags as IO fields on a Comfort panel screen. This is your custom-built table and yes it's the only way to do it unless you want to create a custom control and install it on the panel.
Backing up:
Disable logging and check that the USB drive is not in use, using a script, e.g. the one here: https://support.industry.siemens.com/cs/document/89855157
Remove the USB, copy the files, re-insert it and activate logging again.
(you implement the 'disable' and 'activate' logging features, e.g. using an internal BOOL tag that prevents a script from executing).
There is a lot of info on SIOS about these topics, as Application Examples, FAQs and forum entries.
The PLC log method works, but data backup and especially display can become a pain.
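As a small aside, once the CSV files have been copied off the USB stick, the batch history can also be reviewed on a PC, for example with something like the following (the column names BatchID, Count and Timestamp are assumptions - they depend on what your HMI script actually writes):

# Sketch only - summarise backed-up batch logs; column names and path are assumptions.
Get-ChildItem 'D:\HMI_Backup\*.csv' |
    ForEach-Object { Import-Csv $_.FullName } |
    Sort-Object { [datetime]$_.Timestamp } |
    Format-Table BatchID, Count, Timestamp -AutoSize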

Detect and remove duplicate HL7 messages in a log

I'm trying to populate a new EMR with data from an existing environment. I am pulling a log of all activity for a given interface and feeding it in to the inbound channel in the new environment. The problem is our existing channel has duplicates of the messages which will create duplicate reports in the patient records.
Beyond looking through what feels like the entire internet I've tried pushing text around in Iguana, PowerShell and Excel and I'm not familiar enough with MirthConnect to make use of it. I'm not married to any one solution, I just need a solution and PDQ.
I found a fairly good starting point at https://www.secretgeek.net/ps_duplicates and I've been massaging it but still no complete solution. At this point I've basically reset it to zero because nothing I've done has improved it (mostly I broke it repeatedly).
$hash = @{} #Define an empty hashtable
gc "c:\Samples\Q12019.txt" | #Send the content of the file into the pipeline...
% {
    if ($hash.$_ -eq $null) { #if that line isn't a key in the hash table
        # $_ is data from the pipe
        $_ #send the data down the pipe
    };
    $hash.$_ = 1 #add that line to the hash so it doesn't resend
} > "c:\Samples\RadHx Test Q12019.txt"
This does some trippy stuff I don't understand. It ingests the file and the output has a new space B E T W E E N every single character in the file. I can't even tell if it's removing duplicates, and I haven't been able to get it to stop doing this. I'm also not sure it's reading an entire message including all of its segments. Example 2 at
https://healthstandards.com/blog/2007/09/10/variations-of-the-hl7-orur01-message-format/
looks close enough to what I'm dealing with as an example of ingest, just add 2000 more in a text file.
Simplified explanation:
I have a text file with several blocks of related text. Each block has the same starting sequence of characters, say 'ABC'. The blocks have an arbitrary length and don't necessarily end with the same string, but all blocks end with CRLF.
Problem:
Each block may not be unique, but I need to eliminate repeating blocks of text so the file only contains one instance of each block of text.
Mirth should be able to easily debatch the file for you. If the messages are exact duplicates, you can probably just keep track as you go of a few of the MSH fields that should guarantee uniqueness.
If they were resends of the same data, where it is mostly the same, but some fields (especially in the MSH segment) may be updated, you'll probably want to exclude some of the segments, then hash the message, and track that instead (maybe with a patient id or something, in the rare case of a hash collision.)
You can store information in the globalChannelMap to compare values across messages. The map exists in memory only and won't survive a mirth restart, but that shouldn't be a problem for your one time conversion. If you need something more persistent, store the values in a database.
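If you do stay in PowerShell instead of Mirth, the same idea (hash whole messages rather than individual lines) might look roughly like this; it assumes every message starts with an MSH segment, and the paths are just the ones from the question:

# Sketch only - de-duplicate HL7 messages by hashing each whole MSH-delimited block.
$inPath  = 'c:\Samples\Q12019.txt'
$outPath = 'c:\Samples\RadHx Test Q12019.txt'

$text = Get-Content $inPath -Raw

# Split on the start of each MSH segment, keeping the MSH line with its block.
$messages = $text -split '(?=MSH\|)' | Where-Object { $_.Trim() }

$seen   = @{}
$sha    = [System.Security.Cryptography.SHA256]::Create()
$unique = foreach ($msg in $messages) {
    # Hash the entire message so multi-segment blocks are compared as a unit.
    $hash = [BitConverter]::ToString($sha.ComputeHash([Text.Encoding]::UTF8.GetBytes($msg)))
    if (-not $seen.ContainsKey($hash)) {
        $seen[$hash] = $true
        $msg
    }
}

# Use Set-Content with an explicit encoding; the '>' operator writes UTF-16 in
# Windows PowerShell, which is what produces the "space between every character" effect.
($unique -join '') | Set-Content $outPath -Encoding ASCII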

What are some options for keeping track of temporary results and re-using them after a restart, in case the program dies while running?

(Suggestions for improving the title of this question are welcomed.)
I have a perl script that uses web APIs to fetch a user's "liked" posts on various sites (tumblr, reddit, etc.), then download some portion of each post (for example, an image that's linked from the post).
Right now, I have a JSON-encoded file that keeps track of the posts that have already been fetched (for tumblr, it just records the total number of likes; for reddit, it records the "id" of the last post fetched) so that the script can just pick up with the newly "liked" items the next time it runs. This means that after the program is finished archiving a new batch of links, the new "stopping point" is recorded in the JSON file.
However, if the program croaks for some reason (or is killed with ctrl+c, say), the progress is not recorded (since the progress is only recorded at the end of the "fetching"). So the next time the program runs, it looks in the tracking file and gets the last recorded stopping point (the last time it successfully completed fetching and recorded the progress), and picks up there again, downloading duplicates up to the point where it croaked the last time.
My question is, what's the best (i.e. simplest, most efficient, take your pick--I'm open to options here) way to record progress with each incremental archived item, so that if the program dies for some reason, it always knows exactly where to pick up where it left off? Adapting the current method (literally print-ing to the tracking file at the end of each fetch) to do the same thing after each individual item is definitely not the best solution because it's got to be pretty inefficient.
Edited for clarity
Let me make clearer that the file used to track the downloaded posts is not large, and does not grow appreciably with each "fetch" operation. There is only one element for each api (tumblr, etc.) that contains either the total number of likes for the account (in other words, the number that we have already downloaded, so we query the api for the current total, subtract the number in the file, and we know how many new items to fetch), or the ID of the last item fetched (reddit uses this, so we can ask the api for all items "after" the one in the file and only get the new stuff).
My problem is not an ever growing list of fetched posts, rather it is writing to the tracking file every time one single post is downloaded (and there could be thousands of posts downloaded in a single run).
Some ideas I would consider:
Write to the file more often or use an interrupt handler to 'safely' handle the interrupt signal. When it's called, allow the script to write to your file so it's as current as possible and elegantly quit.
Use a better storage mechanism than writing to a flat file. I would consider, depending on the need, using a database to store the ids. I groan when a database starts getting into play due to the complexities it adds, however it doesn't have to be complex. I've used SQLite for queuing, but also consider DBD::CSV, which just writes to a CSV file while allowing SQL syntax (haven't used it myself). In your code you could then check if the id is already in the database and know to skip it. I would imagine that SQLite is also more 'efficient' than reading/writing a flat file and, imo, would be easier to code than having to write code to read a file yourself.
I'd just use a hash, tied to an NDBM file, to keep track of what is loaded and what isn't.
When you start a new batch of URLs, you delete the NDBM file.
Then, in your code, at the start of the program, you do
use NDBM_File; use Fcntl;   # Fcntl provides the O_RDWR and O_CREAT constants
tie(%visited, 'NDBM_File', 'visitedurls', O_RDWR|O_CREAT, 0666);
(don't worry about the O_CREAT, the file will remain intact if it exists unless you pass O_TRUNC as well)
Assuming your main loop looks like this:
while ($id = <INFILE>) {
    my $url     = id_to_url($id);
    my $results = fetch($url);
    save_results($url, $results);
}
you change that to
while ($id = <INFILE>) {
    my $url = id_to_url($id);
    my $results;
    if ($visited{$url}) {
        $results = $visited{$url};
    } else {
        $results = fetch($url);
        $visited{$url} = $results;
    }
    save_results($url, $results);
}
So whenever you fetch a new URL, you write the results to the NDBM file, and whenever you restart your program, the results that have already been fetched will be in the NDBM file and fetched from there instead of reading the URL.
This assumes $results is a scalar, else you won't be able to store/retrieve it in this way. But as you're producing JSON anyway, the "partial json" for each URL will probably be what you want to store.

Last Updated Date: Antipattern?

I keep seeing questions floating through that make reference to a column in a database table named something like DateLastUpdated. I don't get it.
The only companion field I've ever seen is LastUpdateUserId or such. There's never an indicator about why the update took place; or even what the update was.
On top of that, this field is sometimes written from within a trigger, where even less context is available.
It certainly doesn't even come close to being an audit trail, so that can't be the justification. And if there is an audit trail somewhere in a log or whatever, this field would be redundant.
What am I missing? Why is this pattern so popular?
Such a field can be used to detect whether there are conflicting edits made by different processes. When you retrieve a record from the database, you get the previous DateLastUpdated field. After making changes to other fields, you submit the record back to the database layer. The database layer checks that the DateLastUpdated you submit matches the one still in the database. If it matches, then the update is performed (and DateLastUpdated is updated to the current time). However, if it does not match, then some other process has changed the record in the meantime and the current update can be aborted.
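A sketch of what that check can look like from application code (PowerShell and SQL Server here just as an example; the table, column names and connection string are assumptions, and $customerId, $newName and $originalStamp are assumed to have been captured when the record was first read):

# Sketch only - optimistic concurrency check against a DateLastUpdated column.
$conn = New-Object System.Data.SqlClient.SqlConnection 'Server=.;Database=App;Integrated Security=True'
$conn.Open()

$cmd = $conn.CreateCommand()
$cmd.CommandText = @"
UPDATE Customer
SET    Name = @Name,
       DateLastUpdated = SYSUTCDATETIME()
WHERE  CustomerId = @Id
  AND  DateLastUpdated = @OriginalStamp;
"@
$null = $cmd.Parameters.AddWithValue('@Name', $newName)
$null = $cmd.Parameters.AddWithValue('@Id', $customerId)
$null = $cmd.Parameters.AddWithValue('@OriginalStamp', $originalStamp)

# Zero rows affected means someone else updated the row since we read it.
if ($cmd.ExecuteNonQuery() -eq 0) {
    Write-Warning 'Conflicting edit detected - reload the record and try again.'
}
$conn.Close()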
It depends on the exact circumstance, but a timestamp like that can be very useful for autogenerated data - you can figure out whether something needs to be recalculated if a dependency has changed later on (this is how build systems calculate which files need to be recompiled).
Also, many websites will have data marking "Last changed" on a page, particularly news sites that may edit content. The exact reason isn't necessary (and there likely exist backups in case an audit trail is really necessary), but this data needs to be visible to the end user.
These sorts of things are typically used for business applications where user action is required to initiate the update. Typically, there will be some kind of business app (eg a CRM desktop application) and for most updates there tends to be only one way of making the update.
If you're looking at address data, that was done through the "Maintain Address" screen, etc.
Such database auditing is there to augment business-level auditing, not to replace it. Call centres will sometimes (or always in the case of financial services providers in Australia, as one example) record phone calls. That's part of the audit trail too but doesn't tend to be part of the IT solution as far as the desktop application (and related infrastructure) goes, although that is by no means a hard and fast rule.
Call centre staff will also typically have some sort of "Notes" or "Log" functionality where they can type freeform text as to why the customer called and what action was taken so the next operator can pick up where they left off when the customer rings back.
Triggers will often be used to record exactly what was changed (eg writing the old record to an audit table). The purpose of all this is that with all the information (the notes, recorded call, database audit trail and logs) the previous state of the data can be reconstructed as can the resulting action. This may be to find/resolve bugs in the system or simply as a conflict resolution process with the customer.
It is certainly popular - Rails, for example, has a shorthand for it, as well as a creation timestamp (:timestamps).
At the application level it's very useful, as the same pattern is very common in views - look at the questions here for example (answered 56 secs ago, etc).
It can also be used retrospectively in reporting to generate stats (e.g. what is the growth curve of the number of records in the DB).
There are a couple of scenarios:
Let's say you have an address table for your customers.
You have your CRM app; the customer calls to say that his address changed a month ago, and with the LastUpdate column you can see that this row hasn't been touched in 4 months.
Usually you use triggers to populate a history table so that you can see all the older history; if you see that the creation date and the updated date are the same, there is no point hitting the history table since you won't find anything.
If you calculate indexes (stock market), you can easily see that a row was recalculated just by looking at this column.
If there are 2 DB servers, by comparing the date column you can find out whether all the changes have been replicated or not, etc.
This is also very useful if you have to send delta feeds out to clients, that is, feeds where only the records that have been changed or inserted since the date of the last feed are sent.
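For instance, a delta feed driven by that column might be extracted with something along these lines (table name, column names, paths and the use of Invoke-Sqlcmd from the SqlServer module are all assumptions):

# Sketch only - export records changed since the last feed, then move the watermark.
$watermarkFile = 'C:\feeds\last_feed_time.txt'
$lastFeed      = Get-Content $watermarkFile          # e.g. '2024-01-31 23:59:59'
$now           = Get-Date -Format 'yyyy-MM-dd HH:mm:ss'

Invoke-Sqlcmd -ServerInstance '.' -Database 'App' -Query @"
SELECT *
FROM   Customer
WHERE  DateLastUpdated > '$lastFeed'
  AND  DateLastUpdated <= '$now'
"@ | Export-Csv ("C:\feeds\customer_delta_{0:yyyyMMdd}.csv" -f (Get-Date)) -NoTypeInformation

# Record the new watermark only after the export succeeded.
Set-Content $watermarkFile $now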