Renaming files at the command line to a filename of a different length

I need to rename multiple files in a directory. They can be copied to another directory if necessary.
The filenames will be in the format 57.jpg, 57-1.jpg, 57-2.jpg etc. The suffix may be double digits.
I can't figure out how you would rename them if the destination is a different length.
So ren 57*.jpg 1234*.jpg or ren 57*.jpg 3*.jpg will alter the characters that I need to keep intact (the -1, -2, etc.).
I have a list of the starts of the filenames and what they need to change to (e.g. 57 becomes 1234, 59 becomes 3214, 598 becomes 3215, and so on).
Any help would be much appreciated.
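Wildcard ren patterns can't express a per-prefix lookup table, so a small script is one way to do this. Below is a minimal Python sketch that uses a script instead of cmd's ren. The mapping values come from the question. The names NAME_RE, new_name and copy_renamed are made up for illustration, as is the choice to copy into a separate folder, which the question says is allowed. The sketch has not been tested against a real folder of photos.

```python
import re
import shutil
from pathlib import Path
from typing import Dict, Optional

# Leading number, optional "-N" suffix (one or more digits), then ".jpg".
# Matches "57.jpg", "57-1.jpg", "598-12.jpg"; the extension's case is kept.
NAME_RE = re.compile(r"^(\d+)(-\d+)?(\.jpg)$", re.IGNORECASE)


def new_name(filename: str, mapping: Dict[str, str]) -> Optional[str]:
    """Return the renamed filename, or None if the file should be skipped."""
    m = NAME_RE.match(filename)
    if m is None:
        return None
    prefix, suffix, ext = m.group(1), m.group(2) or "", m.group(3)
    if prefix not in mapping:
        return None
    # Only the leading number changes; "-1", "-12" and the extension are kept.
    # The whole leading number is looked up, so 598 never matches the 59 entry.
    return mapping[prefix] + suffix + ext


def copy_renamed(src_dir: str, dst_dir: str, mapping: Dict[str, str]) -> None:
    """Copy matching files from src_dir into dst_dir under their new names.

    Copying into a separate folder avoids clashes such as renaming
    57.jpg to 1234.jpg while an unrelated 1234.jpg still exists.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(src.iterdir()):
        if not path.is_file():
            continue
        target = new_name(path.name, mapping)
        if target is not None:
            shutil.copy2(path, dst / target)
```

For example, copy_renamed("photos", "renamed", {"57": "1234", "59": "3214", "598": "3215"}) would copy 57-2.jpg to renamed/1234-2.jpg and 598.jpg to renamed/3215.jpg. The folder names "photos" and "renamed" are hypothetical.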

Related

Why are no output files written in my prinseq-lite Perl loop?

I am completely new to this type of coding/command lines, so I am sorry if I am asking this question in a wrong way.
I want to loop over all files in a directory (I am quality-trimming DNA sequencing files in .fastq format).
I have written this loop:
for i in *.fastq; do
perl /apps/prinseqlite/0.20.4/prinseq-lite.pl -fastq $i -min_len 220 -max_len 240 -min_qual_mean 30 -ns_max_n 5 -trim_tail_right 15 -trim_tail_left 15 -out_good /proj/forhot/qfiltered/looptest/$i_filtered.fastq -out_bad null; done
The code itself seems to work: I can see in my terminal that it is taking the right files and doing the trimming (it writes a summary log to the terminal as it goes), but no output files are generated, i.e. these ones:
-out_good /proj/forhot/qfiltered/looptest/$i_filtered.fastq
If I run the command outside a loop on just one file, it works (the output is generated), like this example:
prinseq-lite.pl -fastq 60782_merged_rRNA.fastq -min_len 220 -max_len 240 -min_qual_mean 30 -ns_max_n 5 -trim_tail_right 15 -trim_tail_left 15 -out_good 60782_merged_rRNA_filt_codeTEST.fastq -out_bad null
Is there a simple reason/answer to this?
This problem has nothing to do with Perl at all.
/proj/forhot/qfiltered/looptest/$i_filtered.fastq is read by the shell as the value of a variable named i_filtered. The underscore is a valid character in a shell variable name, so the variable name does not stop after i. There is no such shell variable, so this argument turns into /proj/forhot/qfiltered/looptest/.fastq ($i_filtered expands to nothing).
Therefore all of your prinseq-lite.pl runs write their output to the same file. Because that file's name starts with a dot, it is "hidden": you need ls -a to see it, not just ls.
Fix
... -out_good /proj/forhot/qfiltered/looptest/${i}_filtered.fastq
Note that this would give you e.g. 60782_merged_rRNA.fastq_filtered.fastq for an input file of 60782_merged_rRNA.fastq. If you want to get rid of the duplicate .fastq part, you need something like:
... -out_good /proj/forhot/qfiltered/looptest/"${i%.fastq}"_filtered.fastq
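If you'd rather drive the loop from Python, building each command as an argument list avoids shell interpolation entirely. This is a minimal sketch, not tested against prinseq-lite itself. The script path, output directory and prinseq-lite options are copied from the question. The helper names output_name, build_command and run_all, and the dry-run default, are illustrative assumptions.

```python
import subprocess
from pathlib import Path
from typing import List

# Paths and prinseq-lite options copied from the question above.
PRINSEQ = "/apps/prinseqlite/0.20.4/prinseq-lite.pl"
OUT_DIR = Path("/proj/forhot/qfiltered/looptest")
OPTIONS = ["-min_len", "220", "-max_len", "240", "-min_qual_mean", "30",
           "-ns_max_n", "5", "-trim_tail_right", "15", "-trim_tail_left", "15"]


def output_name(fastq_name: str) -> str:
    """Same result as the shell expression "${i%.fastq}"_filtered.fastq."""
    if fastq_name.endswith(".fastq"):
        fastq_name = fastq_name[: -len(".fastq")]
    return fastq_name + "_filtered.fastq"


def build_command(fastq: Path) -> List[str]:
    # Every argument is its own list element, so nothing like $i_filtered
    # can be expanded by a shell.
    return (["perl", PRINSEQ, "-fastq", str(fastq)] + OPTIONS
            + ["-out_good", str(OUT_DIR / output_name(fastq.name)),
               "-out_bad", "null"])


def run_all(in_dir: str = ".", dry_run: bool = True) -> None:
    for fastq in sorted(Path(in_dir).glob("*.fastq")):
        cmd = build_command(fastq)
        if dry_run:
            print(" ".join(cmd))  # check the -out_good paths first
        else:
            subprocess.run(cmd, check=True)
```

Calling run_all() first only prints the commands. run_all(dry_run=False) actually runs them and stops at the first non-zero exit status.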

Matlab dir('*.txt') command is not listing txt files in order

I am reading text files from a folder using dir('*.txt') in MATLAB. The text files are named 0.txt, 4.txt, 8.txt, 12.txt, ..., 180.txt. dir returns 0 first, then 100, then 104, and so on. Why is this happening?
Lexicographic ordering compares names character by character and stops at the first position where they differ, so it only looks at the characters needed to make that decision. Here, the comparison uses the ASCII values of the characters in the filenames. Consider the following examples:
If we have two files named 10.txt and 2.txt, the listing mechanism compares their first characters, i.e. 1 vs. 2, and returns the file with the smaller one first, which in this case is 10.txt.
If instead we had 2.txt and 20.txt, the first characters are the same, so the next characters are compared: . versus 0. Since the ASCII value of . is 46 and that of 0 is 48, 2.txt is returned first.
You can solve this by zero-padding every filename to the maximum number of digits you need, meaning:
0.txt --> 000.txt
4.txt --> 004.txt
25.txt --> 025.txt
180.txt --> 180.txt
Then files will be returned in the expected order.
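The effect is easy to reproduce outside MATLAB. Plain string sorting in Python also compares character codes, so it shows the same order dir produced. This is an illustrative sketch. The two helper names are made up, and the 0:4:180 range is taken from the question.

```python
from typing import Iterable, List


def lexicographic_order(numbers: Iterable[int]) -> List[str]:
    """Sort "N.txt" names as plain strings, as the listing above does."""
    return sorted(f"{n}.txt" for n in numbers)


def padded_order(numbers: Iterable[int], width: int = 3) -> List[str]:
    """Zero-pad each number to `width` digits, then sort as strings."""
    return sorted(f"{n:0{width}d}.txt" for n in numbers)


nums = range(0, 181, 4)  # 0, 4, 8, ..., 180 as in the question
print(lexicographic_order(nums)[:7])
# ['0.txt', '100.txt', '104.txt', '108.txt', '112.txt', '116.txt', '12.txt']
print(padded_order(nums)[:4])
# ['000.txt', '004.txt', '008.txt', '012.txt']
```

With padding, the string order and the numeric order coincide, which is why renaming fixes the listing.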
If you are sensitive to the order of files and you already know their names, you don't have to use dir at all:
for ii=0:4:180
filename = sprintf('%d.txt', ii);
fid = fopen( fullfile('/path/to', filename), 'r' );
% ... do the processing here
fclose(fid);
end

How to batch rename files to 3-digit numbers?

I apologize in advance that this question is not very specific. My goal is to take a bunch of image files, currently named 0.tif, 1.tif, 2.tif, etc., and rename them to 000.tif, 001.tif, 002.tif, ..., 010.tif, etc.
I want to do this because I am loading the images into MATLAB for batch processing, but MATLAB does not order them correctly. I use dir('*.tif') to get all the images and load them into an array of files that I can iterate over and process, but in this array element 1 is 0.tif, element 2 is 1.tif, element 3 is 10.tif, element 4 is 100.tif, and so on.
I want to keep the ordering of the elements as I process them. However, I do not care if I have to change the order of the elements BEFORE processing them (i.e. I can make it work to rename, for example, 2.tif to 10.tif if I had to) but I am looking for a way to convert the file names the way I initially described.
If there is a better way to get MATLAB to order the files properly when dir loads them into the array, please let me know, because that would be much easier.
Thanks!!
You can do this without renaming the files, if you want. When you grab the files using dir and collect their names into a cell array (e.g. d = dir('*.tif'); files = {d.name}';), you'll have a list of files like so:
files =
'0.tif'
'1.tif'
'10.tif'
...
You can grab just the numeric part using regexp:
nums = regexp(files,'\d+','match');
nums = str2double([nums{:}]);
nums =
0 1 10 11 12 ...
regexp returns its matches as a cell array; the second line flattens it and converts the matched strings to actual numbers. This assumes each filename contains exactly one run of digits.
We can now get an actual numeric order by sorting the resulting array:
[~,order] = sort(nums);
and then put the files in the correct order:
files = files(order);
This should (I haven't tested it, I don't have a folder full of numerically labelled files handy) produce a list of files like so:
files=
'0.tif'
'1.tif'
'2.tif'
'3.tif'
...
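The same idea (extract the number, sort by it, reorder the names) looks like this as a short Python sketch. It is shown only to illustrate the technique. The function name is made up, and like the MATLAB version it assumes every name contains a number.

```python
import re
from typing import List


def numeric_order(files: List[str]) -> List[str]:
    """Sort names like '10.tif' by the first run of digits they contain.

    Mirrors the MATLAB steps: pull the digits out with a regex, convert
    them to numbers, and sort by those numbers. Assumes every name
    contains at least one digit.
    """
    def key(name: str) -> int:
        return int(re.search(r"\d+", name).group())
    return sorted(files, key=key)


files = ["0.tif", "1.tif", "10.tif", "100.tif", "11.tif", "2.tif"]
print(numeric_order(files))
# ['0.tif', '1.tif', '2.tif', '10.tif', '11.tif', '100.tif']
```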
This partially depends on your MATLAB version. If your version has findstr, this should work well (findstr is not recommended in newer releases; strfind(curr_file, '.') does the same job here):
% name_array is assumed to be a cell array of file names,
% e.g. d = dir('*.tif'); name_array = {d.name};
num_files_to_rename = numel(name_array);
for ii=1:num_files_to_rename
    %in my test i used cells to store my strings you may need to
    %change the bracket type for your application
    curr_file = name_array{ii};
    %locates the period in the file name (assume there is only one)
    period_idx = findstr(curr_file ,'.');
    %takes everything to the left of the period (excluding the period)
    file_name = str2num(curr_file(1:period_idx-1));
    %zero-pads the number to 3 digits and keeps the original .tif extension
    new_file_name = sprintf('%03d.tif',file_name)
    %you can uncomment this after you are sure it works as you planned;
    %the check skips files that already have a 3-digit name (e.g. 100.tif)
    %if ~strcmp(curr_file, new_file_name)
    %    movefile(curr_file, new_file_name);
    %end
end
The actual rename operation (movefile) is commented out for now. Make sure the output names are what you expect before uncommenting it and renaming all the files.
EDIT: There is no real error checking in this code; it just assumes that every file name has exactly one period and an actual number as the name.
The batch file below renames the files the way you want:
@echo off
setlocal EnableDelayedExpansion
for /F "delims=" %%f in ('dir /B *.tif') do (
set "name=00%%~Nf"
ren "%%f" "!name:~-3!.tif"
)
Note that this solution preserves the order of your original files, even if there are missing numbers in the sequence.
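For comparison, here is a minimal Python sketch of the same zero-padding rename. It first builds a list of (old, new) pairs that you can inspect, then renames. The function names and the refusal to overwrite existing files are assumptions made for this sketch. It also differs from the batch file in one respect: numbers wider than three digits are left intact instead of being cut to their last three digits.

```python
import re
from pathlib import Path
from typing import List, Tuple

DIGITS = re.compile(r"[0-9]+")


def padded_renames(names: List[str], width: int = 3,
                   ext: str = ".tif") -> List[Tuple[str, str]]:
    """Plan renames such as '7.tif' -> '007.tif'.

    Names that are not purely numeric, or that are already padded, are
    skipped. Numbers wider than `width` are kept as they are.
    """
    plan = []
    for name in names:
        p = Path(name)
        if p.suffix != ext or not DIGITS.fullmatch(p.stem):
            continue
        new = f"{int(p.stem):0{width}d}{ext}"
        if new != name:
            plan.append((name, new))
    return plan


def apply_renames(folder: str, plan: List[Tuple[str, str]]) -> None:
    """Rename inside `folder`, refusing to overwrite an existing file."""
    base = Path(folder)
    for old, new in plan:
        target = base / new
        if target.exists():
            raise FileExistsError(str(target))
        (base / old).rename(target)
```

A typical run would be plan = padded_renames(sorted(p.name for p in Path(".").glob("*.tif"))). Print plan, and once it looks right, call apply_renames(".", plan).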

Script to batch rename to retain only some characters of original filename?

I need to rename thousands of rar files whose original filenames vary in length. I must make the names 10 characters long (not counting the .rar extension) by keeping the first 3 and the last 4 characters of the original filename and adding 3 random digits in the middle.
Example:
input:
"John Doe - Jane Doe - 19073275.rar"
"XXXX - XYXY- 98705674.rar"
output:
"Joh1273275.rar"
"XXX9795674.rar"
Next, the .bat should generate a .txt with the original name and the modified one underneath for each file!
I know it's possible but I'm completely stupid when it comes to writing it. Please help!
The batch file below does what you want:
@echo off
setlocal EnableDelayedExpansion
for %%a in (*.rar) do (
set name=%%~Na
set num=00!random!
set newName=!name:~0,3!!num:~-3!!name:~-4!
ren "%%a" "!newName!%%~Xa"
echo "%%a" modified to "!newName!%%~Xa" >> log.txt
)
I'd write a script to generate the names in any simple way (say the first 6 + the last 4 characters), and then check for duplicates to be cleaned up by hand (or by a second pass that shifts the middle, or ...). Unless this is a repetitive job (done daily), it isn't worth fully automating.
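Here is that "generate, then check for duplicates" approach as a minimal Python sketch, using the question's 3 + 3 random digits + 4 scheme. The function names, the log.txt default and the refusal to overwrite are illustrative assumptions. Names shorter than 7 characters before .rar would produce overlapping pieces, and the sketch does not handle that.

```python
import random
from collections import Counter
from pathlib import Path
from typing import List, Optional, Tuple

Plan = List[Tuple[str, str]]


def short_name(filename: str, rng: random.Random) -> str:
    """First 3 chars + 3 random digits + last 4 chars of the name, extension kept."""
    p = Path(filename)
    middle = f"{rng.randrange(1000):03d}"  # 000-999
    return p.stem[:3] + middle + p.stem[-4:] + p.suffix


def plan_renames(names: List[str], seed: Optional[int] = None) -> Plan:
    rng = random.Random(seed)
    return [(name, short_name(name, rng)) for name in names]


def duplicates(plan: Plan) -> List[str]:
    """New names that occur more than once and must be fixed before renaming."""
    counts = Counter(new for _, new in plan)
    return sorted(name for name, n in counts.items() if n > 1)


def write_log(plan: Plan, log_path: str = "log.txt") -> None:
    """Original name on one line, the new name underneath it."""
    with open(log_path, "w", encoding="utf-8") as log:
        for old, new in plan:
            log.write(f"{old}\n{new}\n")


def apply_renames(plan: Plan, folder: str = ".") -> None:
    base = Path(folder)
    for old, new in plan:
        target = base / new
        if target.exists():
            raise FileExistsError(str(target))  # never overwrite
        (base / old).rename(target)
```

Usage: plan = plan_renames(sorted(p.name for p in Path(".").glob("*.rar"))). If duplicates(plan) is empty, call write_log(plan) and then apply_renames(plan). Otherwise, regenerate or fix the clashing names by hand.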

How can I copy columns from several files into the same output file using Perl

This is my problem.
I need to copy 2 columns each from 7 different files to the same output file.
All input and output files are CSV files.
And I need to add each new pair of columns beside the columns that have already been copied, so that at the end the output file has 14 columns.
I believe I cannot use open(FILEHANDLE,">>file.csv").
Also, all 7 CSV files have nearly 20,000 rows each, so I'm reading and writing the files line by line.
It would be a great help if you could give me an idea of what I should do.
Thanks a lot in advance.
Provided that your lines correspond 1:1 (meaning you're combining data from line 1 of File_1, File_2, etc.):
open all 7 files for input
open the output file
read one line of data from each input file
write the combined line of data to the output file
repeat the last two steps until the input files are exhausted
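Text::CSV is probably the best way to read and write CSV files.
You could create a Text::CSV object for each file (including the output), use the getline or getline_hr methods to fetch data (getline_hr returns a hashref and requires column_names to be set first), combine the data into arrayrefs, and then use print to write each combined row.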
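The answers above are in Perl. For a language-neutral illustration of the same line-by-line merge, here is a minimal Python sketch using the standard csv module. It assumes every input file contributes the same column positions (0 and 1 by default), which the question does not specify. The function names are also made up.

```python
import csv
from contextlib import ExitStack
from typing import List, Sequence


def combine_row(rows: Sequence[List[str]], columns: Sequence[int]) -> List[str]:
    """Take the chosen columns from each file's row, left to right."""
    combined: List[str] = []
    for row in rows:
        combined.extend(row[i] for i in columns)
    return combined


def combine_columns(inputs: Sequence[str], output: str,
                    columns: Sequence[int] = (0, 1)) -> None:
    """Write the chosen columns of every input CSV side by side into `output`.

    Rows are paired 1:1 by position and read one at a time, so memory use
    does not grow with the number of rows. zip() stops at the shortest file.
    """
    with ExitStack() as stack:
        readers = [csv.reader(stack.enter_context(open(path, newline="")))
                   for path in inputs]
        writer = csv.writer(stack.enter_context(open(output, "w", newline="")))
        for rows in zip(*readers):  # one row from each input file
            writer.writerow(combine_row(rows, columns))
```

With 7 input files and 2 columns each, every output row has 14 fields, e.g. combine_columns([f"file_{i}.csv" for i in range(1, 8)], "combined.csv"). These filenames are hypothetical.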