Changing text in CSV file using Perl

I know nothing about Perl. I have looked at some online tutorials and am at a loss for the following.
I do a query in PostgreSQL that saves to a CSV file. However, one element needs to be changed after the CSV file is created, and I have no idea how to do it.
The existing query results look like this:
phone date time staff email and customer ID -- my explanation
1112223333,10/21/2013,3:00 AM,sklund#myemail.comSMIB010170 -- data in csv
After the query is completed, the data in the time field must be converted to:
1112223333,10/21/2013,03:00am,sklund#myemail.comSMIB010170
As you can see, the time needs to be amended to include a leading 0 if the hour is less than ten, and the AM must be changed to am.
Is there a simple Perl script that can do this? The lines of data, of course, will be different, as each line would reflect results of the query for the day.
If someone can point me to a tutorial, link, or help in this I'd be very grateful.

This will do what you need.
I assume you want the space before AM removed as well? You don't mention it in your question.
perl -pe 's/,(\d{1,2}):(\d\d)\s+([AP]M),/sprintf ",%02d:%02d%s,",$1,$2,lc $3/ei' mylogfile > newlogfile
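To unpack the one-liner: -p reads the input line by line and prints each line after the substitution; the /e modifier evaluates the replacement as Perl code (here a sprintf call that zero-pads the hour), and /i makes the [AP]M match case-insensitive. If a standalone script is easier to experiment with, here is a minimal equivalent sketch (the script and file names are placeholders):

#!/usr/bin/perl
use strict;
use warnings;

# Zero-pad the hour, drop the space before AM/PM, and lowercase it,
# e.g. ",3:00 AM," becomes ",03:00am,".
while (my $line = <>) {
    $line =~ s/,(\d{1,2}):(\d\d)\s+([AP]M),/sprintf ",%02d:%02d%s,", $1, $2, lc $3/ei;
    print $line;
}

Run it as perl fixtime.pl mylogfile > newlogfile.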

Related

Import Flat File via SSMS to SQL Server fails

When importing a seemingly valid flat file (csv, text etc) into a SQL Server database using the SSMS Import Flat File option, the following error appears:
Microsoft SQL Server Management Studio
Error inserting data into table. (Microsoft.SqlServer.Import.Wizard)
Error inserting data into table. (Microsoft.SqlServer.Prose.Import)
Object reference not set to an instance of an object. (Microsoft.SqlServer.Prose.Import)
The target table may contain rows that imported just fine. The first row that is not imported appears to have no formatting errors.
What's going wrong?
Check the following:
that there are no blank lines at the end of the file (leaving the last line's line terminator intact) - this seems to be the most common issue
there are no unexpected blank columns
there are no badly escaped quotes
It looks like the import process loads lines in chunks. This means that the lines following the last successfully loaded chunk may appear to have no errors. You need to look at subsequent lines that are part of the failing chunk to find the offending line(s).
This cost me hours of hair pulling while dealing with large files. Hopefully this saves someone some time.
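If it helps to automate the checks above, here is a rough pre-flight sketch in Perl (assuming a comma-delimited file with a header row and no embedded newlines inside quoted fields; the script name is made up) that flags blank lines, bad quoting, and rows whose field count differs from the header's:

#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1 });
my $expected;    # field count, taken from the first (header) row
while (my $line = <>) {
    chomp $line;
    if ($line =~ /^\s*$/) {
        warn "Blank line at line $.\n";
        next;
    }
    unless ($csv->parse($line)) {
        warn "Bad quoting at line $.\n";
        next;
    }
    my @fields = $csv->fields;
    $expected //= scalar @fields;
    warn "Line $. has " . @fields . " fields, expected $expected\n"
        if @fields != $expected;
}

Run it as perl checkcsv.pl yourfile.csv before importing; no output means none of these particular problems were found.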
If the file you're importing is already open, SSMS will throw this error. Close the file and try again.
Make sure when you are creating your flat file that, IF you have text (varchar) values in any of your columns, you DO NOT select comma "," as the delimiter. Instead, select the vertical line "|" or something that you are SURE can't appear in those values. Commas are super common in nvarchar fields.
I had this issue and none of the recommendations from other answers helped me! It took me hours to figure out, so I hope this saves someone some time!
None of the other suggestions worked for me; however, this did:
When you import a flat file, SSMS gives you a brief summary of the data types within each column. Whenever you see an nvarchar in a column that should be int or double, change it to int or double. And change all remaining nvarchars to nvarchar(max). This worked for me.
I've been working with CSV data for a long time. I encountered similar problems when I first started this job, but as a novice I couldn't get a precise fault out of the exceptions.
Here are a few things you should look at before importing anything.
Your CSV file must not be open in any software, such as Excel.
Your CSV file cells should not include commas or quotation marks.
There are no unnecessary blanks at the end of your data.
There is no usage of a reserved term as data.
In Excel, open your file and save it as a new file.
After considering all the suggestions, if anyone is still having issues, check the length of the DataType for your columns. It took hours for me to figure this out but increasing the nvarchar length from (50) to (100) worked for me.
One thing that worked for me: you can change the error range to 1 in "Modify Columns".
You then get an error message identifying the specific line that's problematic in your file, instead of just "ran out of memory".
I fixed these errors by playing around with the data types. For instance, I changed my tinyint to smallint, smallint to int, and increased my nvarchar() columns to reasonable values, else setting them to nvarchar(MAX). Since most real-life data does have missing values, I checked 'allow missing values' on all columns. Everything then worked, with a warning message.

KDB+/Q query too heavy to handle

I want to grab data from a kdb+ database for a list of roughly 200 days within the last two years. The 200 days are in no particular pattern.
I only need the data from 09:29:00.000 to 09:31:00.000 every day.
My first approach was to query all of the last two years data that have time stamp between 09:29:00.000 and 09:31:00.000, because I didn't see a way to just query the particular 200 days that I need.
However this proved to be too much for my server to handle.
Then I tried to summarize the 2 minute data for each date into an average and just print out the average, so now I will only have 200 rows of data as output. But somehow this still turns out to be too much. I'm not sure if this is because I'm not selecting the data correctly.
My other suspicion is that the query is grabbing all the data first and then averaging each date, which would mean the averaging isn't making it any easier to handle.
Here's the code that I have:
select maxPriceB:max(price), minPriceB:min(price), avgPriceB:avg(price), avgSizeB:avg(qty) by date from dms where date within 2015.01.01 2016.06.10, time within 09:29:00.000 09:31:00.000, sym=`ZF6
poms is the table that the data is in
ZFU6 is the symbol that I'm looking for
I tried adding the keyword distinct after select.
I want to know if there's any way to break up the query, or make the query lighter for the server to handle.
Thank you!
If you use 32-bit kdb+ and get the infamous 'wsfull error then you may try processing one day at a time like this:
raze{select maxPriceB:max(price), minPriceB:min(price), avgPriceB:avg(price), avgSizeB:avg(qty)
    from dms where date=x, sym=`ZF6, time within 09:29:00.000 09:31:00.000}each 2015.01.01+1+til 2016.06.10-2015.01.01
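Since your 200 days are in no particular pattern, you don't need to sweep the whole range either: end the expression with }each myDates instead, where myDates (a name made up here) is your list of 200 dates. On a date-partitioned database, querying one date at a time like this should also keep the working set small, because each day's data is read, aggregated, and released before the next day is touched.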

SUMIFS with multiple criteria, one of which is a single day

I'm trying to get a formula that will break down the amount of times a user enters a contest each day.
I'm pretty new to this whole thing, basically putting it together using google to figure out the code I need to use/modify. Explaining why something works would be greatly appreciated so I can use it elsewhere!
Here's a dummy of the form I'm banging my head against.
I would like the form to be reusable, so on the Sorted form I have a date key that automatically fills out the week when you choose the first day. Because of this, I would like each formula to refer to this date key, instead of manually typing the google equivalent of 'February 1st, 2015' into the formula.
I've tried to use the SUMIFS formula, and I've run in to a few errors.
Apparently both pages have to have the same number of rows, otherwise I get an 'Array arguments to SUMIFS are of different size' error. I didn't want my 'Sorted' sheet to be 1761 rows long, since all of the duplicate names will have been condensed and I wanted it prettier. Nuts to that! I guess I can hide the rows? Is there any other solution?
It looks like this works:
=SUMIFS(Entered!E3:E1000, Sorted!E3:E1000, Sorted!$E3, Entered!A3:A1000, date(Sorted!$C7))
Where Entered!E3:E1000 is the number of entries, Sorted!E3:E1000 is the list of usernames, and E3 is the specific one I'm looking for. Then Entered!A3:A1000 is the list of dates and times, and Sorted!C7 is the specific date I'm looking for. I don't get any results!
If I click on my C7 and Sorted!A, the little calendar pops up, which means they are dates (I think?). One includes the hours:minutes:seconds and the other doesn't, which I think is my problem. I would like Sorted!C7 to stand for the entire day, and filter out all of the entries for that day.
This is taking information entered via a google form which I won't have control of, so I can't really change the H:M:S additions to the date column.
Thinking ahead to day 2 and onwards, will the same formula work when Sorted!C10 is $C$7+1? Is it not a date anymore?
I would also like to add up the number of daily entries, in Sorted!S7 and below. I've tried wrapping both the column of dates and the date from my day key in date(), but that doesn't seem to work either.
=SUMIF(date(Entered!A3:A),date(Sorted!C7),Entered!E3:E)
It gives me a '1', and I have no idea where that comes from.
I haven't been able to find much about the google SUMIFS function, mostly how to replicate it from before it was a thing.
And for even MORE complexity:
I was wondering if it is possible to have UNIQUE find the IDs in entered!C, and return all the associated usernames. That pesky angelo changed their username to 'pants' midway through the contest, and I'd like to be able to see both names and add up both 'angelo' and 'pants' entries in the same line in my formulas.
I feel like I'll need a few hidden columns that have the UNIQUE ID number and the associated usernames that I pull into my Sorted!Username column, but I don't know how to search the IDs to find the different usernames.
I tried to google that, but I have no idea what I'm googling.
Whewph! That is a lot of questions, thanks for any help!
Too long for my taste, but you might try:
=sumifs(Entered!E:E,Entered!A:A,">="&$C$7,Entered!A:A,"<"&$C$7+1,Entered!B:B,$E3)
in Sorted!F3 and copied down to suit. The two date conditions bracket the whole day: a timestamp matches when it falls on or after the date in C7 and strictly before the next day, so the hours:minutes:seconds part no longer matters.
Oh my goodness, you are a hero!
My final code wound up being:
=IF(ISBLANK(Sorted!$E3)=TRUE, "", sumifs('Entered'!$E:$E,'Entered'!$A:$A,">="&$C$7,'Entered'!$A:$A,"<"&$C$7+1,'Entered'!$B:$B,$E3))
I changed the start and end points by making $C$7 into $C$7+1, and the ending one into +2. (In case anyone else is looking at this answer.)
I'm super pleased that it worked!
Using this I managed to add up each of the daily entries, just by adding up the columns they were in.
I gave up on the UNIQUE idea, if someone changes their username during the contest, then they can add up the two rows themselves.
Thanks again! I'd upvote you, but I can't yet.

What is IBM i (AS400) record format

I've started to do some programming in ILE RPG and I'm curious about one thing: what exactly is a record format? I know that it has to be defined in physical/logical/display files, but what exactly does it do? In an old RPG book from '97 I found that "Each record format defines what is written to or read from the workstation in a single I/O operation".
In another book I found a definition saying that a record format describes the fields within a record (so, for example, their length, and type such as char or decimal?).
And lastly, what exactly does it mean that "every record within a physical file must have an identical record layout"?
I'm a bit confused right now. Still not sure what a record format is :F.
Still not sure what a record format is :F
The F specification: this is also known as the File specification. Here we declare all the files which we will be using in the program. The files might be physical files, logical files, display files, or printer files. Message files are not declared in the F specification.
what exactly does it mean that "every record within a physical file must have an identical record layout"?
Each and every record within one physical file has the same layout.
Let's make a record layout of 40 characters.
----|---10----|---20----|---30----|---40
20150130  DEBIT     00002100
20150130  CREDIT    00012315
The bar with the numbers is not part of the record layout. It's there so we can count columns.
The first field in the record layout is the date in yyyymmdd format. This takes up 8 characters, from position 1 to position 8.
The second field is 2 blank spaces, from position 9 to position 10.
The third field is the debit / credit indicator. It takes up 10 characters, from position 11 to position 20.
The fourth field is the debit / credit amount. It takes up 8 positions, from position 21 to position 28. The format is assumed to be 9(6)V99. In other words, there's an implied decimal point between positions 26 and 27.
The fifth field is more blank spaces, from position 29 to position 40.
Every record in this file has these 5 fields, all defined the same way.
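Just to make the fixed layout concrete (purely illustrative; this is not how RPG itself declares it), the same 40-character record could be pulled apart in Perl with a fixed-width template:

#!/usr/bin/perl
use strict;
use warnings;

# A8 = date (positions 1-8), x2 = skip blanks (9-10),
# A10 = debit/credit indicator (11-20), A8 = amount (21-28);
# the trailing blanks (29-40) are simply ignored.
my $record = '20150130  DEBIT     00002100            ';
my ($date, $indicator, $amount) = unpack 'A8 x2 A10 A8', $record;
$amount /= 100;    # implied decimal: 9(6)V99, so 00002100 means 21.00
printf "%s %s %.2f\n", $date, $indicator, $amount;    # 20150130 DEBIT 21.00

Every record parses with the same template, which is exactly what "identical record layout" buys you.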
A "record format" is a named structure that is used for device file I/O. It contains descriptions of each column in the 'record' (or 'row'). The specific combination of data types and sizes and the number and order of columns is hashed into a value that is known as the "record format identifier".
A significant purpose is the inclusion by compilers of the "record format identifier" in compiled program objects for use when the related file is opened. The system will compare the format ID from the program object to the current format ID of the file. If the two don't match, the system will notify the program that the file definition has changed since the program was compiled. The program can then know that it is probably going to read data that doesn't match the definitions that it knows. Nearly all such programs are allowed to fail by sending a message that indicates that the format level has changed, i.e., a "level check" failed.
The handling of format IDs is rooted in the original 'native file I/O' that pre-dates facilities such as SQL. It is a part of the integration between DB2 and the various program compilers available on the system.
The 'native' database file system was developed using principles that eventually resulted in SQL. A SQL table should have rows that all hold the same series of column definitions. That's pretty much the same as saying "every record within a physical file must have an identical record layout".
Physical database files can be thought of as being SQL tables. Logical database files can be thought of as being SQL views. As such, all records in a physical file will have the same definitions, but there is some potential variation in logical files.
A record format is something you learn old school. You read a file (table) and update/write through a record format.
DSPFD FILE(myTable)
Then you can see everything about the file. The record format name is in there.
New or young developers believe that every record in a physical file must be identical, but in ancient times, when the dinosaurs walked the earth, a single file could hold several types of records, or "record formats"; so, as the name indicates, a record format is the format of a record within a file.

Perl - Scanning CSV files for rows that match user-specified criteria?

I am trying to write/learn a simple Perl parser for some CSV files that I have and I need some help.
In a directory I have a series of date-indexed CSV files in the form of Text-Date.csv. The date is in the form of Month-DD-YYYY (ex., January-07-2011). For each weekday there is a CSV file generated.
The Perl script should open each file, look for a particular row that matches user-entered criteria, and return that row. Each row is stock price data with different stocks in different rows. What the script should do is return the price of a particular stock (ex., IBM) across all dates for which CSVs are generated.
I have the parser working for a specific CSV/date that I choose, but I want to be able to pluck out the row in all CSVs. Also, when I print the IBM price for each dated CSV, I want to display the date next to the price (ex., January-07-2011 IBM 147.93).
Can you help me get this done?
If your question is how to crawl a bunch of files and run some function on each one, you probably want File::Find. To parse CSV, definitely use Text::xSV and not a custom parser. There is more to parsing CSV than calling split(",").
To parse CSV files, use the Text::CSV module.
It is more complex to decide how you are going to apply the criteria - you'll need to determine what the user specifies and work out how to translate that into Perl code that evaluates the condition correctly.
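Putting those pieces together, here is a minimal sketch. The assumptions are mine, not from the question: the files sit in the current directory and are named like Text-January-07-2011.csv, the symbol is in the first column, and the price is in the second; adjust to your actual layout.

#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

my $symbol = shift // 'IBM';    # stock symbol from the command line
my $csv    = Text::CSV->new({ binary => 1 });

for my $file (glob 'Text-*.csv') {
    my ($date) = $file =~ /^Text-(.+)\.csv$/;    # e.g. January-07-2011
    open my $fh, '<', $file or die "Cannot open $file: $!";
    while (my $row = $csv->getline($fh)) {
        print "$date $symbol $row->[1]\n" if $row->[0] eq $symbol;
    }
    close $fh;
}

Run it as perl scan.pl IBM to get one line per dated file, e.g. January-07-2011 IBM 147.93. If the files are spread across subdirectories, File::Find can replace the glob.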