We have a process that loads a daily bank file (.txt format) into another system, and we have noticed that it fails if an extra comma appears between the first and last name. We currently fix this manually by finding the offending line, removing the extra comma, and resaving the file. The issue now occurs more frequently and on random lines, so I would like to know whether I can write a script (in PowerShell, say) that looks for the offending comma and removes it should one appear.
A sample of the text file looks like this:
206985,23034038,1,62,206985,60715093,0098,00000019600,DERBYSHIRE S ,20456461 , , 17195
206985,23034038,1,62,206209,23511456,0005,00000010000,BRYANS C ,20499987 , , 17195
206985,23034038,1,62,203351,83878848,0006,00000005000,JM HARVEY ,20560148 , , 17195
206985,23034038,1,62,202542,43608352,0032,00000010389,INGLIS P E ,21775660 , , 17195
206985,23034038,1,62,209263,30535818,0016,00000018000,MUZONDO F ,22194301 , , 17195
206985,23034038,1,62,205568,90105171,0092,00000010000,ADKIN ,AM ,22363046 , , 17195
As you can see, the last line has an extra comma between the last name and the first name. I need a script that removes it before the file gets processed as usual.
Is this possible?
The filename always begins with "camt.xxx.barc.stm_", and the remainder is a reference number that changes daily, for example:
camt.xxx.barc.stm_D20170714_R7741261
camt.xxx.barc.stm_D20170720_R8447561
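The question asks about PowerShell; the same idea can be sketched in Python. This is a minimal sketch under two assumptions read off the sample rows, not confirmed by the bank file's specification: a good line has exactly 12 comma-separated fields, and the name is the 9th field. Any surplus commas are assumed to fall inside the name. The function names and the directory argument are hypothetical.

```python
import glob
import os

EXPECTED_FIELDS = 12  # assumption: field count of a good sample line
NAME_INDEX = 8        # assumption: the name is the 9th field (0-based index 8)


def fix_line(line):
    """Merge surplus fields back into the name field by dropping the extra commas."""
    parts = line.split(",")
    extra = len(parts) - EXPECTED_FIELDS
    if extra <= 0:
        return line  # nothing to repair
    name = "".join(parts[NAME_INDEX:NAME_INDEX + extra + 1])
    return ",".join(parts[:NAME_INDEX] + [name] + parts[NAME_INDEX + extra + 1:])


def fix_file(path):
    """Rewrite the file in place if any line changed; return the number of repaired lines."""
    # newline="" keeps the file's original line endings untouched.
    with open(path, "r", newline="") as f:
        lines = f.readlines()
    fixed = [fix_line(line) for line in lines]
    if fixed != lines:
        with open(path, "w", newline="") as f:
            f.writelines(fixed)
    return sum(1 for old, new in zip(lines, fixed) if old != new)


def fix_all(directory):
    """Repair every file whose name starts with the prefix given in the question."""
    for path in glob.glob(os.path.join(directory, "camt.xxx.barc.stm_*")):
        fix_file(path)
```

Running fix_all on the drop folder before the load step would repair the ADKIN line above to "ADKIN AM ". It is worth keeping a backup copy of the original file, since the rewrite happens in place.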
It's a Django app in which I'm loading a CSV. The table gets created OK, but copying the CSV into PostgreSQL fails with this error:
psycopg2.DataError: extra data after last expected column
CONTEXT: COPY csvfails, line 1:
Questions I have already looked at:
"extra data after last expected column" while trying to import a csv file into postgresql
I have tested multiple times with CSVs of different column counts, and I am now sure that the column count is not the issue; it is the content of the CSV file. When I change the content and upload the same CSV, the table gets created and I don't get this error. The content of the CSV file that fails is shown below. Kindly advise what in this content prompts psycopg2/psql/Postgres to give this error.
No, as suggested in the comment, I can't paste even a single row of the CSV file; the **imgur** image add-in won't allow it, and I'm not sure what to do now.
The screenshots below, from the psql CLI, show that the table had been created with the correct column count, yet I still got the error.
EDIT_1 - Further, while saving the file on my Ubuntu machine using LibreOffice, I unchecked TAB and SEMICOLON under Separator Options >> Separated By. The CSV was then saved with only Separator Options >> COMMA.
The Python code which might be the culprit is:
with open(path_csv_for_psql, 'r') as f:
    next(f)  # Skip the header row.
    csv_up_cursor.copy_from(f, str(new_table_name), sep=',')
conn.commit()
I thought I read somewhere that the separator parameter passed to copy_from (sep=',' here) could be the issue?
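One possible cause, offered as an assumption because the failing rows are not shown: copy_from reads PostgreSQL's plain text format, which does not understand CSV quoting, so a comma inside a quoted value such as "Smith, John" is counted as an extra column. The sketch below has two hypothetical helpers: one flags lines where a plain split on the separator disagrees with Python's csv parser, and one loads the file through copy_expert with FORMAT csv, which does honour quoting. It assumes a psycopg2 cursor, a trusted table name, and no line breaks inside quoted fields.

```python
import csv


def rows_with_embedded_separators(lines, sep=","):
    """Return (line_number, naive_count, csv_count) where a plain split miscounts fields."""
    suspects = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if not text:
            continue  # ignore blank lines
        naive = len(text.split(sep))
        parsed = len(next(csv.reader([text], delimiter=sep)))
        if naive != parsed:
            suspects.append((number, naive, parsed))
    return suspects


def copy_csv(cursor, csv_path, table_name):
    """Load a CSV file, header included, using COPY's CSV mode instead of copy_from."""
    # Assumption: table_name comes from trusted code; embedded quotes are doubled.
    quoted = '"{}"'.format(table_name.replace('"', '""'))
    sql = "COPY {} FROM STDIN WITH (FORMAT csv, HEADER true)".format(quoted)
    with open(csv_path, "r", newline="") as f:
        cursor.copy_expert(sql, f)
```

If rows_with_embedded_separators returns an empty list for the failing file, quoting is probably not the culprit and the file's first data line deserves a closer look in a plain text editor.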
I am using the following line:
`:c:/dir/ set .Q.en[`:c:/dir; tablename]
Everything is ok if I don't exit KDB, but if I do and then try to load the table using
get `dir
all the symbol columns come back as integers. I would really appreciate your help in understanding why this happens.
It looks like you forgot to repeat the table name on the l.h.s. of set.
Try
q)`:c:/dir/tablename/ set .Q.en[`:c:/dir; tablename]
This will correctly save the table columns in the c:/dir/tablename subdirectory and place the sym file alongside. Now you should be able to load both your table and the sym file by using the \l command or by specifying c:/dir on the command line when you restart q:
q c:/dir
or
q
q)\l c:/dir
(no backticks or leading :'s in either of those commands)
If you want to use get on this table, you will have to load sym separately:
q)load`:c:/dir/sym
q)get`:c:/dir/tablename/
(note the leading : in the path specs)
Finally, you may want to take a look at the rsave command which will save your table without you having to write tablename twice.
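Why the symbol columns show up as integers can be illustrated with a conceptual Python analogy. This is not kdb+ code and not a description of its internals beyond the basic idea: .Q.en enumerates symbol columns against the sym file, so the columns on disk hold indices into that list, and without the list loaded only the indices are visible. The function names here are hypothetical.

```python
def enumerate_symbols(values, sym):
    """Replace each value by its index in the shared sym list, appending unseen values."""
    index = {s: i for i, s in enumerate(sym)}
    codes = []
    for value in values:
        if value not in index:
            index[value] = len(sym)
            sym.append(value)
        codes.append(index[value])
    return codes


def resolve(codes, sym):
    """Turn stored indices back into symbols; this needs the sym list to be loaded."""
    return [sym[i] for i in codes]


sym = []                                        # plays the role of the sym file
stored = enumerate_symbols(["ibm", "msft", "ibm"], sym)
print(stored)                                   # what the column holds on disk
print(resolve(stored, sym))                     # what you see once sym is loaded
```

In this analogy, loading the table in a fresh session without the sym file corresponds to having only the stored list and no sym list to look the indices up in.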
.Q.en takes 2 params: a file handle and the table data.
Your first param isn't an hsym; it should be a backtick, then a colon, then the path to your db root.
Also, set takes 2 params; the first in this case should be the path where you want to save, like dir/splayedTableName/
I am trying to write array values to a CSV file in MATLAB using the following code:
m=[3 12 15 ; 4 23 565];
dlmwrite('C:\Users\amar-admin\Desktop\abc.txt', m)
type C:\Users\amar-admin\Desktop\abc.txt
the output printed in the console is
3,12,15
4,23,565
but the output in the file is
3,12,154,23,565
You may want to set the 'newline' option to 'pc':
dlmwrite('C:\Users\amar-admin\Desktop\abc.txt', m, 'newline', 'pc');
This will ensure that the file is created with a carriage return (\r) and a line feed (\n) at the end of each line, instead of potentially just a line feed, which could affect how it is displayed in certain text viewers. See this post for more information about the differences between \n and \r.
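The effect of the line terminator can be sketched in Python, used here only to illustrate the bytes involved. It assumes the viewer in question treats only \r\n as a line break, which was long the case for older Windows editors such as Notepad and would likely explain the run-together output 3,12,154,23,565. The helper names are hypothetical.

```python
def format_rows(rows, sep=",", newline="\r\n"):
    """Join each row with sep and end every row with the chosen line terminator."""
    return "".join(sep.join(str(v) for v in row) + newline for row in rows)


def write_rows(path, rows, sep=",", newline="\r\n"):
    """Write rows to path using exactly the terminator requested."""
    # newline="" stops Python from translating the terminators we chose.
    with open(path, "w", newline="") as f:
        f.write(format_rows(rows, sep, newline))


m = [[3, 12, 15], [4, 23, 565]]
print(repr(format_rows(m, newline="\n")))    # Unix-style: line feed only
print(repr(format_rows(m, newline="\r\n")))  # 'pc'-style: carriage return + line feed
```

The data is identical in both cases; only the terminator bytes differ, which is why the console and some editors disagree about where the lines end.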
The issue was solved by using the .rtf extension:
dlmwrite('C:\Users\amar-admin\Desktop\abc.rtf', m)
But I still wonder whether the same is possible with a .txt file.
I have the function below:
function [] = Write(iteration)
    status=close('all');
    nomrep=num2str(iteration);
    fid=fopen('ID.dat','a');
    frewind(fid);
    for l=1:iteration
        line=fgetl(fid);
    end
    fprintf(fid,[nomrep,' \n']);
    status=fclose(fid);
end
I expect Write(15) to create ID.dat and print 2 and 15 on consecutive lines, starting at the beginning of the 15th line.
But it always prints those values at the beginning of the file.
I also tried fgetl(fid) alone and replaced the for loop with a while loop, but it still did not work.
Is it because I should fill the preceding lines with some dummy space? Alongside this, I executed
for i=1:5
    Write(i);
end
which should print 1 to 5, one number per line, but even this does not work.
This line is the problem:
fid=fopen('ID.dat','w');
Every time you open the file, you are overwriting the previous contents (that is what the 'w' argument does). Change 'w' to 'a' (for append), and your file will retain the contents from one write to the next.
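The difference between the two modes can be seen in a small Python sketch; Python's open uses the same 'w' and 'a' mode letters as MATLAB's fopen. The function name is hypothetical, and the sketch only illustrates the modes, not the MATLAB function itself.

```python
def write_iteration(path, iteration, mode="a"):
    """Write the iteration number on its own line using the given open mode."""
    # "w" truncates the file on every open; "a" always adds to the end.
    with open(path, mode) as f:
        f.write(f"{iteration}\n")


def write_range(path, count, mode="a"):
    """Call write_iteration for 1..count, mirroring the loop in the question."""
    for i in range(1, count + 1):
        write_iteration(path, i, mode)
```

With mode "a", five calls leave the numbers 1 to 5 on separate lines; with mode "w", only the last number survives. Note that append mode always writes at the end of the file, so it cannot place text at a chosen line such as line 15.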
So me being the 'noob' that I am, being introduced to programming via Perl just recently, I'm still getting used to all of this. I have a .fasta file which I have to use, although I'm unsure if I'm able to open it, or if I have to work with it 'blindly', so to speak.
Anyway, the file that I have contains DNA sequences for three genes, written in this .fasta format.
Apparently it's something like this:
>label
sequence
>label
sequence
>label
sequence
My goal is to write a script that opens and reads the file, which I have gotten the hang of now. I then have to read each sequence, compute the relative amounts of 'G' and 'C' within each sequence, and write the names of the genes and their respective 'G' and 'C' content to a TAB-delimited file.
Would anyone be able to provide some guidance? I'm unsure what a TAB-delimited file is, and I'm still trying to figure out how to open a .fasta file to actually see the content. So far I've worked with .txt files which I can easily open, but not .fasta.
I apologise for sounding completely bewildered. I'd appreciate your patience. I'm not like you pros out there!!
I get that it's confusing, but you really should try to limit your question to one concrete problem, see https://stackoverflow.com/faq#questions
I have no idea what a ".fasta" file or 'G' and 'C' are, but it probably doesn't matter.
Generally:
Open input file
Read and parse data. If it's in some strange format that you can't parse, go hunting on http://metacpan.org for a module to read it. If you're lucky someone has already done the hard part for you.
Compute whatever you're trying to compute
Print to screen (standard out) or another file.
A "TAB-delimited" file is a file with columns (think Excel) where each column is separated by the tab ("\t") character, as a quick Google or Stack Overflow search would tell you.
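The steps above can be sketched in Python (the question uses Perl; this only illustrates the flow). A .fasta file is plain text, so it opens in any text editor. The sketch assumes each record is a ">label" line followed by one or more sequence lines, as in the layout the question describes. The function names, column headings, and the choice to report fractions are all hypothetical.

```python
import csv


def read_fasta(lines):
    """Yield (label, sequence) pairs from an iterable of FASTA lines."""
    label, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if label is not None:
                yield label, "".join(chunks)
            label, chunks = line[1:], []
        elif label is not None:
            chunks.append(line)  # a sequence may span several lines
    if label is not None:
        yield label, "".join(chunks)


def gc_fractions(sequence):
    """Return (G fraction, C fraction) of a sequence, ignoring letter case."""
    seq = sequence.upper()
    if not seq:
        return 0.0, 0.0
    return seq.count("G") / len(seq), seq.count("C") / len(seq)


def write_gc_table(fasta_lines, out_file):
    """Write one tab-separated row per gene: name, G fraction, C fraction."""
    writer = csv.writer(out_file, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene", "G_fraction", "C_fraction"])
    for label, seq in read_fasta(fasta_lines):
        g, c = gc_fractions(seq)
        writer.writerow([label, f"{g:.3f}", f"{c:.3f}"])
```

A typical call would open the .fasta file for reading and a .tsv file for writing, then pass both file objects to write_gc_table. The resulting file opens in a spreadsheet with one column per value.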
Here is an approach using the awk utility, which can be run from the command line. Save the following program to a file and run it with awk -f <path> <sequence file>
#a line starting with ">" is a gene label: report the previous gene (if any),
#then remember the new name and reset the counters
/^>/ {
    if (seen) report()
    seen = 1
    name = substr($0, 2)
    total = c = g = 0
    next
}
#every other line is sequence: go through all bases in the line
{
    for (i = 1; i <= length($0); i++) {
        #uppercase the base first, because some fasta files use lowercase bases
        base = toupper(substr($0, i, 1))
        #"total" is increased by 1 for every base encountered
        total++
        if (base == "C") c++
        else if (base == "G") g++
    }
}
#the END block reports the last gene in the file
END { if (seen) report() }
#prints the gene name and the G and C content in percent, separated by tabs
function report() {
    printf "%s\t%.2f%%\t%.2f%%\n", name, (total ? 100*g/total : 0), (total ? 100*c/total : 0)
}