Parsing first row of csv into sql table using batch file - postgresql

I have some CSV files. For each one I need to open the file, read its first line, turn that header into a temporary SQL table, and then load the data into that table, as follows:
Read the lines of the CSV and, for each line:
Break it into fields and create one temporary SQL table
Insert those fields into a row of the database table
I tried something like this.
The script is divided into 4 parts: file initialization, file creation, process, and copy data.
Everything works fine, except that in fil.sql I get this output:
CREATE TEMP TABLE temtab(
firstcolumn character varying (255),
secondcolumn character varying (255),
lastcolumn character varying (255),
);
\COPY temtab from bio.csv WITH DELIMITER ; csv HEADER
What I want is the same output without the comma after the last column:
CREATE TEMP TABLE temtab (
firstcolumn character varying (255),
secondcolumn character varying (255),
lastcolumn character varying (255)
);
\COPY temtab from bio.csv WITH DELIMITER ; csv HEADER
@echo off
::setlocal enabledelayedexpansion
REM Assigning dir to current directory
SET dir=%CD%
REM Defining database name
SET dbname=****
REM Defining Host name
SET host=****
REM Defining user
SET user=****
REM Defining Port
SET port=****
REM SQL file where query is to be executed
SET sqfile=fil.sql
SET fi=bio.csv
call:fileinitialization
call:filecreation
call:proces
call:copydata
goto:eof
:fileinitialization
REM Assigning name of temporary table
SET tabnam=temtab
REM Setting delimiter to variable delim
SET delim=;
REM Declaring variable numfields to store index of variable names array
set numFields=0
echo para setted
set fi=bio.csv
SET tex=text
SET com=,
GOTO:EOF
:filecreation
REM Setting create temporary table command with table name tabnam
SET creat=CREATE TEMP TABLE %tabnam%
echo %creat%
GOTO:EOF
:proces
REM Executing loop for each file in current directory
echo %creat%>fil.sql
REM Read the lines of the CSV file
For /F "eol==" %%A in (bio.csv) Do ( set "line=%%A"
REM check if index of array is 0
if !numFields! equ 0 (
REM First line, Store in array name
for %%B in (!line: ^=!) do (
echo %%B character varying (255^),>>fil.sql
set /A numFields+=1
set name[!numFields!]=%%B
) ) )
GOTO:EOF
:copydata
echo \COPY %tabnam% from %fi% WITH DELIMITER %delim% csv HEADER
echo \COPY %tabnam% from %fi% WITH DELIMITER %delim% csv HEADER;>>fil.sql
GOTO:EOF
::endlocal
Pause

Although I don't know the format of SQL tables, I can show you how to read a CSV file. The Batch file below reads all lines from the file: it first takes the field names from the first line (the CSV header) and creates an array of variable names (eliminating any spaces in the field names); then it reads the remaining lines and assigns each field value to its corresponding Batch variable.
ProcessCSV.BAT:
@echo off
rem General-purpose CSV file reader program
rem Antonio Perez Ayala
setlocal EnableDelayedExpansion
set numFields=0
rem Read the lines of the CSV file
for /F "delims=" %%a in (CSVfile.csv) do (
set "line=%%a"
if !numFields! equ 0 (
rem It is the first line: break it into an array of field names (removing spaces)
for %%b in (!line: ^=!) do (
set /A numFields+=1
set name[!numFields!]=%%b
)
) else (
rem Replace spaces by Ascii-128 (to avoid split values that may have spaces)
set "line=!line: =Ç!"
rem Insert any char. at beginning of each field, and separate fields with spaces
set i=0
for %%b in (X!line:^,^= X!) do (
set "field=%%b"
rem Recover spaces in this field, if any
set "field=!field:Ç= !"
rem And assign it to corresponding variable (removing first character)
set /A i+=1
for %%i in (!i!) do set "!name[%%i]!=!field:~1!"
)
rem At this point all variables have the values of current record.
rem They may be accessed explicitly (ie, from example CSVfile.csv):
echo/
echo Record of !FirstName! !LastName!
rem ... or implicitly via the NAME array:
for /L %%i in (3,1,!numFields!) do (
for %%b in (!name[%%i]!) do echo %%b: !%%b!
)
)
)
CSVfile.csv:
First Name,Last Name,Address,Postal Code,Company,Departament,Floor,Phone,Mobile
John,Smith,123 Fake Street,45612,SomeCo,Accounting,4,123-555-5555,123-555-5556
Jane,Doe,123 Fake Street,,SomeCo,,4,123-555-5555,123-555-5556
output:
Record of John Smith
Address: 123 Fake Street
PostalCode: 45612
Company: SomeCo
Departament: Accounting
Floor: 4
Phone: 123-555-5555
Mobile: 123-555-5556
Record of Jane Doe
Address: 123 Fake Street
PostalCode:
Company: SomeCo
Departament:
Floor: 4
Phone: 123-555-5555
Mobile: 123-555-5556
Please be aware that this program uses several advanced Batch techniques. I suggest you get help on every command you don't completely understand (e.g. SET /?) and read it carefully. If you still have questions about this program after that, just post them as an edit to your original question.
The most complex part of this program is the one that assigns empty strings to variables when the corresponding field is empty (two commas side by side); if the file has no empty fields, the program can be somewhat simpler. Also, this program (like most Batch solutions) may give erroneous results if certain special Batch characters, like !, appear in the file. Most of these characters can be handled, if required, with some modifications to the program.
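To see why empty fields need special care, here is a minimal sketch in Python (not Batch) that models the splitting behaviour described in this answer. The helper names for_tokens and split_record are hypothetical, and the model assumes only what the answer states: a plain FOR set treats any run of spaces, commas, semicolons or equal signs as a single separator. Quoted CSV fields are not handled.

```python
import re

PLACEHOLDER = "\u00c7"  # the "Ç" the batch file uses to protect spaces


def for_tokens(text):
    """Model of a plain FOR set: a run of separators counts as one."""
    return [tok for tok in re.split(r"[ ,;=]+", text) if tok]


def split_record(line):
    """Recover every field, including empty ones, with the X-prefix trick."""
    protected = line.replace(" ", PLACEHOLDER)
    # Prefix each field with X and separate fields with spaces, so an
    # empty field still leaves a token ("X") behind.
    marked = "X" + protected.replace(",", " X")
    return [tok[1:].replace(PLACEHOLDER, " ") for tok in for_tokens(marked)]


line = "Jane,Doe,123 Fake Street,,SomeCo,,4,123-555-5555,123-555-5556"
naive = for_tokens(line.replace(" ", PLACEHOLDER))
fields = split_record(line)
print(len(naive), len(fields))  # 7 9
```

The naive split silently drops the two empty fields, so every later value would land in the wrong variable; the prefixed version keeps all nine positions.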
EDIT: Modified version for when no empty fields exist
@echo off
rem CSV file reader program when no empty fields exist
rem Antonio Perez Ayala
setlocal EnableDelayedExpansion
set numFields=0
rem Read the lines of the CSV file
for /F "delims=" %%a in (CSVfile.csv) do (
set "line=%%a"
if !numFields! equ 0 (
rem It is the first line: break it into an array of field names (removing spaces)
for %%b in (!line: ^=!) do (
set /A numFields+=1
set name[!numFields!]=%%b
)
) else (
rem Replace spaces by Ascii-128 (to avoid split values that may have spaces)
set "line=!line: =Ç!"
rem Separate fields (using comma as standard Batch separator)
set i=0
for %%b in (!line!) do (
set "field=%%b"
rem Assign this field to corresponding variable, recovering spaces
set /A i+=1
for %%i in (!i!) do set "!name[%%i]!=!field:Ç= !"
)
rem At this point all variables have the values of current record.
rem They may be accessed explicitly (ie, from example CSVfile.csv):
echo/
echo Record of !FirstName! !LastName!
rem ... or implicitly via the NAME array:
for /L %%i in (3,1,!numFields!) do (
for %%b in (!name[%%i]!) do echo %%b: !%%b!
)
)
)
Please note that the standard separators in FOR sets are comma, semicolon and equal-sign, besides spaces:
for %a in (one two,three;four=five) do echo %a
The previous program replaces spaces with another character and uses commas to separate fields. However, if a line may contain semicolons or equal signs, the fields will be split at those points, so in that case these characters must be changed to other ones before the FOR and restored afterwards, in the same way as the space.
EDIT: Modifications for new request (eliminate last comma)
Eliminating the last comma is not trivial, although not too complex either. I hope my method is easy to understand; it is based on the way the SET /P command shows its text (the input prompt) with NO new line at the end; note that the format is SET /P =text>>out<NUL. The <NUL part is needed so that SET /P will NOT wait for input; don't leave spaces before the < (nor before the >>). However, I think this behaviour does NOT work in Windows Vista and later versions. If the method doesn't work for you, then it must be modified again...
I also went ahead and included some remarks about the parts that I think are still missing in your code, that is, the processing of several files.
:proces
REM Executing loop for each file in current directory
REM This may be done with a FOR loop:
::for %%F in (*.csv) do (
REM The file name is given by %%F. In this case, the fileinitialization part
REM must be done here, for example:
set numFields=0
echo %creat%>fil.sql
REM Read the lines of the CSV file
For /F "eol==" %%A in (bio.csv) Do (
set "line=%%A"
REM check if index of array is 0
if !numFields! equ 0 (
REM First line, Store in array name
for %%B in (!line: ^=!) do (
REM Note that I changed the place of the ECHO command
set /A numFields+=1
set name[!numFields!]=%%B
if !numFields! equ 1 (
REM First field: show it with NO comma and NO NEW LINE
set /P =%%B (text^)>>%sqfile%<NUL
) else (
REM Next fields: complete the comma of previous field, WITH NEW LINE
echo ,>>%sqfile%
REM ... and show this field with NO comma and NO NEW LINE (again)
set /P =%%B (text^)>>%sqfile%<NUL
)
)
REM Insert the new line of last field (that have NOT comma :-)
echo/>>%sqfile%
)
)
::)
GOTO:EOF
:copydata
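For comparison, the same "separator between items, never after the last one" idea can be sketched in Python (not Batch). This is only an illustration: the function name create_table_sql is hypothetical, the table name temtab, the ; delimiter and the text column type are taken from this thread, and the header names are assumed to be usable as SQL identifiers once spaces are removed.

```python
import csv
import io


def create_table_sql(header_line, table="temtab", delimiter=";", column_type="text"):
    """Build a CREATE TEMP TABLE statement from a CSV header line."""
    names = next(csv.reader(io.StringIO(header_line), delimiter=delimiter))
    # One definition per column; spaces are removed from the names,
    # as the batch file does.
    columns = [f"{name.replace(' ', '')} {column_type}" for name in names]
    # join() puts the separator only BETWEEN items, so the last column
    # never gets a trailing comma.
    return f"CREATE TEMP TABLE {table} (\n" + ",\n".join(columns) + "\n);"


print(create_table_sql("firstcolumn;secondcolumn;lastcolumn"))
```

The batch version reaches the same result by writing the comma that closes the previous column just before each new one, instead of after it.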
I strongly encourage you to keep my previous format: 4 columns of indentation inside each block of code enclosed in parentheses, with the closing parenthesis placed in the same column as the opening command, FOR or IF. This format will help you easily locate errors caused by mismatched parentheses in large programs.
Antonio


Transfer latest dated file names and remove earlier dated duplicates [closed]

Currently, I have to upload a bunch of excel sheets to a network shared folder. Each of these files has the date they were created appended at the end of the filename. Then I have to remove the earlier duplicates leaving just the latest dated versions.
Basically it looks like this...
Before:
apples 2019.07.01.xlsx
apples 2019.07.07.xlsx
oranges 2019.07.01.xlsx
bananas 2019.07.01.xlsx
After:
apples 2019.07.07.xlsx
oranges 2019.07.01.xlsx
bananas 2019.07.01.xlsx
I stumbled upon a possible solution, which was to create a batch-file to recursively go through the folder and do this. However, I am unsure where to start.
I read this other stackoverflow article, which is pretty close to what I want to do but I am having trouble adjusting it to my needs. Any assistance would be appreciated.
Edit2: this code worked for me:
@(
SetLocal EnableDelayedExpansion
ECHO OFF
SET "_PathToCheck=Y:\T\DT"
SET "_FileGlob=PLOG - * - ????.??.?? - *.xlsx"
SET "_CurrentFile="
)
FOR /F "Tokens=1-2* Delims=-" %%A IN ('DIR /A-D /O-N /B "%_PathToCheck%\%_FileGlob%"') DO (
IF /I "!_CurrentFile!" EQU "%%A-%%B" (
ECHO.Deleting: "%_PathToCheck%\%%A-%%B-%%C"
DEL /F /Q "%_PathToCheck%\%%A-%%B-%%C"
) ELSE (
ECHO.
ECHO.New File Found: "%%A-%%B"
ECHO.-----------
ECHO.Retaining: "%_PathToCheck%\%%A-%%B-%%C"
SET "_CurrentFile=%%A-%%B"
)
)
You may use the same approach you would use if you did this job by hand: review the file list and, every time a file appears with the same name as the previous one, remove the previous one... Simple, isn't it? ;)
@echo off
setlocal EnableDelayedExpansion
rem Initialize the "previous name"
set "lastName="
rem Process files in natural order, that is, the same order showed in the question
rem and set %%a to name and %%b to rest: date plus extension
for /F "tokens=1*" %%a in ('dir /B /A:-D /O:N *.xlsx') do (
rem If previous name is not the same as current one
if "!lastName!" neq "%%a" (
rem Just update previous name and date
set "lastName=%%a"
set "lastDate=%%b"
) else (
rem Remove the previous file
ECHO del "!lastName! !lastDate!"
rem and update the previous date
set "lastDate=%%b"
)
)
This solution assumes that the name and the date parts are separated by exactly one space...
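The "compare with the previous name" idea can also be sketched in Python (not Batch), under the same assumption of exactly one space between name and date. The function name older_duplicates is hypothetical, and Python's sorted() is a plain character-by-character sort that may not order every name exactly as dir /O:N does.

```python
def older_duplicates(filenames):
    """Return the files to delete, assuming 'name date.ext' with one space."""
    to_delete = []
    last_name = None
    last_file = None
    for current in sorted(filenames):
        # Split at the first space: name on the left, date plus extension on the right.
        name, _, _rest = current.partition(" ")
        if name == last_name:
            # Same name as the previous file: the previous one is older.
            to_delete.append(last_file)
        last_name, last_file = name, current
    return to_delete


files = [
    "apples 2019.07.01.xlsx",
    "apples 2019.07.07.xlsx",
    "oranges 2019.07.01.xlsx",
    "bananas 2019.07.01.xlsx",
]
print(older_duplicates(files))  # ['apples 2019.07.01.xlsx']
```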
EDIT: New method added, after several confusing changes made by the OP
@echo off
setlocal EnableDelayedExpansion
set "lastName="
for /F "delims=" %%a in ('dir /B /A:-D /O:N *.xlsx') do (
set "currName="
set "currFile="
for %%b in (%%~Na) do (
set "part=%%b"
set "currFile=!currFile! !part!"
if "!part:.=!" equ "!part!" set "currName=!currName! !part!"
)
if "!lastName!" neq "!currName!" (
set "lastName=!currName!"
set "lastFile=!currFile!"
) else (
ECHO del "!lastFile:~1!.xlsx"
set "lastFile=!currFile!"
)
)
Example of input files:
apples 2019.07.01.xlsx
apples 2019.07.07.xlsx
oranges 2019.07.01.xlsx
bananas 2019.07.01.xlsx
apples 2019.07.01 proof1.xlsx
apples 2019.07.07 proof1.xlsx
PLOG - Organic Valley - 2019.07.01 - (DAI) OG Cream Cheese.xlsx
PLOG - Organic Valley - 2019.07.07 - (DAI) OG Cream Cheese.xlsx
PLOG - Organic Valley - 2019.07.10 - (DAI) OG Cream Cheese.xlsx
Output:
del "apples 2019.07.01.xlsx"
del "apples 2019.07.01 proof1.xlsx"
del "PLOG - Organic Valley - 2019.07.01 - (DAI) OG Cream Cheese.xlsx"
del "PLOG - Organic Valley - 2019.07.07 - (DAI) OG Cream Cheese.xlsx"
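As a cross-check of the rule used above (the name is every space-separated part without a dot; the date is the part with dots), here is a minimal Python sketch (not Batch). It swaps the "compare with the previous file" technique for a dictionary keyed by name, so it does not depend on the order of the listing. The function names split_name_and_date and superseded are hypothetical.

```python
from pathlib import PurePath


def split_name_and_date(filename):
    """Separate the dotted part (the date) from the rest of the base name."""
    parts = PurePath(filename).stem.split()
    name = " ".join(p for p in parts if "." not in p)
    date = " ".join(p for p in parts if "." in p)
    return name, date


def superseded(filenames):
    """Return every file for which a newer date exists under the same name."""
    newest = {}      # name -> (date, filename) of the newest file seen so far
    to_delete = []
    for filename in filenames:
        name, date = split_name_and_date(filename)
        if name not in newest:
            newest[name] = (date, filename)
        elif date > newest[name][0]:
            # YYYY.MM.DD dates compare correctly as plain strings.
            to_delete.append(newest[name][1])
            newest[name] = (date, filename)
        else:
            to_delete.append(filename)
    return to_delete
```

Fed the nine example input files in the order listed above, this returns the same four names as the Output section.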
Here is the newest version for you, Greg.
Originally a couple of others and I wrote versions that would work for the date modified using different logic (create variables for all of the file names and then sort them again, or do sets of compares). I then realized we could still accomplish the goal in a single loop, without so many temp variables and without more complex logic, in that scenario as well as this one, so I took a few minutes and created that version.
Essentially we just need to define a variable named after each file name that has already been found: since we know the files are ordered correctly date-wise, we only need to worry about removing the files with duplicate names.
To do so we can use SET or IF DEFINED. I prefer IF DEFINED here, since I can use the regular IF ( ) THEN ( ) ELSE ( ) logic already in the script. (Note that not every term shown here can be used in a CMD script, THEN for one; I am writing them to clarify the normal logic of the IF construct.)
We could use SET "[Variable Name]" instead, and test for success or failure using || or &&, but that would require more rewriting and is unnecessary here.
@(
SetLocal EnableDelayedExpansion
ECHO OFF
SET "_PathToCheck=C:\T\DT"
SET "_FileGlob=PLOG - * - ????.??.?? - *.xlsx"
SET "_CurrentFile="
SET "_MatchList="
)
FOR /F "Tokens=1-3* Delims=-" %%A IN ('
DIR /A-D /O-N /B "%_PathToCheck%\%_FileGlob%"
') DO (
SET "_CurrentFile=%%A-%%B-%%D"
SET "_MatchList=!_CurrentFile: =_!"
IF DEFINED _MatchList_!_MatchList! (
ECHO.Deleting: "%_PathToCheck%\%%A-%%B-%%C-%%D"
DEL /F /Q "%_PathToCheck%\%%A-%%B-%%C-%%D"
) ELSE (
ECHO.
ECHO.New File Found: "!_MatchList!" Date-Stamp: %%C
ECHO.-----------
ECHO.Retaining: "%_PathToCheck%\%%A-%%B-%%C-%%D"
SET "_MatchList_!_MatchList!=%%A-%%B-%%D"
)
)
The previous version, which conformed to the standard where only the left side of the file name is unique:
@(
SetLocal EnableDelayedExpansion
ECHO OFF
SET "_PathToCheck=Y:\T\DT"
SET "_FileGlob=PLOG - * - ????.??.?? - *.xlsx"
SET "_CurrentFile="
)
FOR /F "Tokens=1-2* Delims=-" %%A IN ('
DIR /A-D /O-N /B "%_PathToCheck%\%_FileGlob%"
') DO (
IF /I "!_CurrentFile!" EQU "%%A-%%B" (
ECHO.Deleting: "%_PathToCheck%\%%A-%%B-%%C"
DEL /F /Q "%_PathToCheck%\%%A-%%B-%%C"
) ELSE (
ECHO.
ECHO.New File Found: "%%A-%%B"
ECHO.-----------
ECHO.Retaining: "%_PathToCheck%\%%A-%%B-%%C"
SET "_CurrentFile=%%A-%%B"
)
)
Example Output:
Y:\>Y:\t\DT.cmd
New File Found: "PLOG - File Three For yoU "
-----------
Retaining: "Y:\T\DT\PLOG - File Three For yoU - 2019.08.11 - (something) AAA 1 .xlsx"
New File Found: "PLOG - File Number Two "
-----------
Retaining: "Y:\T\DT\PLOG - File Number Two - 2019.12.19 - Ending ABDC 1111 AB.xlsx"
Deleting: "Y:\T\DT\PLOG - File Number Two - 2019.07.30 - Ending ABDC 1111 AB.xlsx"
Deleting: "Y:\T\DT\PLOG - File Number Two - 2019.03.12 - Ending Number 3 .xlsx"
New File Found: "PLOG - File Number One "
-----------
Retaining: "Y:\T\DT\PLOG - File Number One - 2020.01.01 - Ending BBB .xlsx"
Deleting: "Y:\T\DT\PLOG - File Number One - 2019.12.19 - Ending BBB 2 .xlsx"
Deleting: "Y:\T\DT\PLOG - File Number One - 2019.09.07 - Ending AAA1.xlsx"
Deleting: "Y:\T\DT\PLOG - File Number One - 2017.01.03 - Ending AAA 1 .xlsx"
Y:\>
[Screenshot: confirmation that the script works, showing its output and results]
Essentially this does the same thing as my original version, only now we know that we should be looking for the hyphens.
That is:
We use DIR to sort the file names in reverse order, which means that files with a newer date appear before those with older dates.
This simplifies the logic for deleting the files, and is the crux of my original solution as well.
Because of that, we only need to check whether the first part of the file name (the part before the date) is the same as that of the previous file found.
We do this by creating a variable, _CurrentFile, to hold the name of the current file, and setting it to empty so that on the initial check it will not match any file name.
If _CurrentFile matches the first part of the file name (again, the part before the date) of the file that DIR found, then we can safely delete that file.
If _CurrentFile does not match the interesting portion of the file reported by the DIR command, then we update the _CurrentFile variable to that new value and move on to test the next file result.
As you are unfamiliar with cmd/batch scripting, I would like to take a minute to go into more detail about what the script is doing and why, so you can go forward yourself:
First I should note that we have a few options for iterating over the files: FOR, FOR /F, and FORFILES are the common go-tos for looping over files, sometimes with a DIR command inside a FOR /F, or alternatively with a WMIC file list (although, thankfully, WMIC is finally being deprecated in favor of PowerShell).
As we know you simply want to choose based on the file name and the date stored in it, using a DIR command to sort by name is a pragmatic way to do the matching quickly.
Now on to what each part of the script is doing:
@(
Parentheses create code blocks in CMD and batch scripts; everything within a given pair of parentheses is parsed at the same time.
By placing an @ in front of the parenthesis, any commands within it (and not within further parentheses, or after a DO) will not be echoed to the screen. This is to stop this section from showing up and cluttering the output.
SetLocal EnableDelayedExpansion
We turn on Delayed Expansion so that we can easily evaluate the contents of variables inside a FOR loop by referencing them with !_var! instead of %_Var%. Technically we can get away without this: if any of your file names have ! in them, we should disable it and rewrite the script a bit; if not, it's fine.
ECHO OFF
I am stopping the script from echoing every line it runs, so we have less cluttered output. Setting this means I no longer have to use @ in front of further commands within this code block or in later code outside this block.
SET "_PathToCheck=Y:\T\DT"
SET "_FileGlob=PLOG - * - ????.??.?? - *.xlsx"
SET "_CurrentFile="
)
Setting the variables and closing the code block with a closing parenthesis seems self-explanatory, except for one variable: _FileGlob.
This is a standard file glob; it is used to match the names of the files you want considered for comparison.
* matches any character any number of times; ? matches any character once.
This ensures that if we encounter files which don't conform to the format we expect, we can skip them.
If a more explicit match were needed, we might use a glob of *.xlsx and use FINDSTR to check against a regex pattern, to make sure the format is exactly the one needed.
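To illustrate the difference between the glob and a stricter pattern check, here is a small Python sketch (not Batch). It uses fnmatchcase from the standard library as a stand-in for the Windows wildcard matcher, which is an approximation: fnmatchcase is case-sensitive, while Windows file matching is not. The regular expression and the names matches_glob and matches_strict are assumptions made for this example.

```python
import re
from fnmatch import fnmatchcase

FILE_GLOB = "PLOG - * - ????.??.?? - *.xlsx"
# Stricter check: the date positions must really be digits.
STRICT = re.compile(r"PLOG - .+ - \d{4}\.\d{2}\.\d{2} - .+\.xlsx")


def matches_glob(name):
    """True when the name fits the glob: * is any run, ? is any one character."""
    return fnmatchcase(name, FILE_GLOB)


def matches_strict(name):
    """True only when the date part is made of digits."""
    return STRICT.fullmatch(name) is not None


good = "PLOG - Organic Valley - 2019.07.01 - (DAI) OG Cream Cheese.xlsx"
odd = "PLOG - x - 20AB.cd.ef - y.xlsx"
print(matches_glob(good), matches_glob(odd), matches_strict(odd))  # True True False
```

The glob is enough to skip unrelated files such as "apples 2019.07.01.xlsx", but it cannot tell digits from letters, which is where a regex check would help.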
In this next part:
FOR /F "Tokens=1-2* Delims=-" %%A IN ('
DIR /A-D /O-N /B "%_PathToCheck%\%_FileGlob%"
') DO (
[Code]
)
Now I am going to go a little out of order here:
We are using DIR to quickly sort the files by name in reverse order and return just the file names. DIR is extremely fast at this, so letting it do the sorting is preferable to matching the files with IF compares later. We use the file glob mentioned above to ensure only the files we want to evaluate are returned.
The option /A-D ignores directories, and /B outputs only the file name (since we aren't recursing). Then we have /O-N: /O is "order by", the option N sorts by name ascending, while -N sorts by name in reverse (descending) order (i.e. Z-A, 9-0), so we can be assured that the file whose name has the newest date will be the first one we find.
All of this is placed inside a FOR /F loop, which is a way to parse the output of a command. We use Delims=- to "tokenize", or split up, the strings FOR receives from the DIR command. We tell FOR which variable names to store the tokens in using %%A (the variables are as follows: "? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ]" or "_ ` a b c d e f g h i j k l m n o p q r s t u v w x y z {"; more info here: https://ss64.com/nt/for_f.html). Variables are assigned to tokens starting with the one you chose.
When we specify the tokens to pick with "Tokens=1-2*", 1-2 means to take the first token through the second token and store them in the first N variables (where N is the number of tokens in the range 1-2, i.e. %%A and %%B for our purposes), and * means to stop tokenizing after the tokens mentioned before it and place all of the remaining portion of the line into the next variable (%%C).
Because we are tokenizing with the hyphen as the delimiter, we know that the first two tokens will be PLOG and [Name to Compare], while the date and the rest of the file name will be in the third token.
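The effect of "Tokens=1-2* Delims=-" on one of these file names can be sketched in Python (not Batch). This is a simplification: str.split with a maximum split count does not reproduce every FOR /F rule (FOR /F treats consecutive delimiters as one, for instance), but for names in this format it yields the same three pieces. The function name split_tokens is hypothetical.

```python
def split_tokens(filename, delim="-", leading=2):
    """Split off the first `leading` tokens; the rest of the line stays whole."""
    return filename.split(delim, leading)


name = "PLOG - Organic Valley - 2019.07.01 - (DAI) OG Cream Cheese.xlsx"
a, b, c = split_tokens(name)
print(repr(a))  # 'PLOG '
print(repr(b))  # ' Organic Valley '
print(repr(c))  # ' 2019.07.01 - (DAI) OG Cream Cheese.xlsx'

# The delimiters are dropped by the split, so they must be put back
# when the full name is rebuilt.
assert f"{a}-{b}-{c}" == name
```

One consequence worth keeping in mind: a hyphen inside the name portion itself would shift the tokens, so the date would no longer start the third piece.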
In the DO ( ) section we go on to process the info returned on each line and stored in our tokens.
Let's go on to examine the code within the DO ( ):
IF /I "!_CurrentFile!" EQU "%%A-%%B" (
ECHO.Deleting: "%_PathToCheck%\%%A-%%B-%%C"
DEL /F /Q "%_PathToCheck%\%%A-%%B-%%C"
) ELSE (
ECHO.
ECHO.New File Found: "%%A-%%B"
ECHO.-----------
ECHO.Retaining: "%_PathToCheck%\%%A-%%B-%%C"
SET "_CurrentFile=%%A-%%B"
)
This is probably familiar enough to you, as you are used to VBA: we are comparing the value of the variable _CurrentFile to the first two portions of the string, which we know make up the entire file name up to the date. We need to add the hyphen back in because, when FOR splits the line into tokens, it removes the delimiters.
We check whether the _CurrentFile variable is a match for the currently returned file name's portion up to, but not including, the date.
If this matches, we delete (DEL) the file, because we have already seen this name once before, so this file is an older one.
We use the /F option to force deletion of read-only files, and we use /Q to stop DEL from prompting us to confirm the deletion of each file.
We also ECHO. that we are deleting the file we found, to note what the script is doing.
) ELSE (
If this does not match, that means this is a new file we haven't encountered before, and it must be the first one returned, in which case we want to keep it because we know from the sort order of DIR that it will be the interesting file.
Therefore, on a non-match, we change the _CurrentFile variable to hold the value of the first two tokens, %%A-%%B, to use in future checks of the results returned.
We also ECHO. that we found the file and are retaining it, to give a nice little indicator of what the script is doing.
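Putting the pieces together, the whole loop can be sketched in Python (not Batch) as a dry run that only reports what would be kept and deleted. The function name plan is hypothetical; lower() stands in for the case-insensitive IF /I, and Python's reverse sort is a plain character-by-character sort that may differ from DIR /O-N for some names. Every name is assumed to contain at least two hyphens, which the file glob guarantees.

```python
def plan(filenames):
    """Return (retained, deleted) using the reverse-sort, keep-first rule."""
    retained, deleted = [], []
    current = None  # plays the role of _CurrentFile; matches nothing at first
    for name in sorted(filenames, key=str.lower, reverse=True):
        a, b, _rest = name.split("-", 2)
        prefix = f"{a}-{b}".lower()
        if prefix == current:
            # Same prefix as the file before it: this one has an older date.
            deleted.append(name)
        else:
            # First file seen with this prefix: it carries the newest date.
            retained.append(name)
            current = prefix
    return retained, deleted


files = [
    "PLOG - File Number One - 2019.12.19 - Ending BBB 2 .xlsx",
    "PLOG - File Number One - 2020.01.01 - Ending BBB .xlsx",
    "PLOG - File Number Two - 2019.07.30 - Ending ABDC 1111 AB.xlsx",
    "PLOG - File Number Two - 2019.12.19 - Ending ABDC 1111 AB.xlsx",
    "PLOG - File Three For yoU - 2019.08.11 - (something) AAA 1 .xlsx",
]
retained, deleted = plan(files)
```

Running a dry run like this first, before any real DEL, is a cheap way to confirm that the prefix rule groups the files the way you expect.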
A further note on ECHO: although I like how ECHO. looks, ECHO( is safer to use, and I prefer it for that reason. However, it is more confusing for folks who are unfamiliar with cmd scripts, as the open parenthesis looks like either a typo or an unclosed code block and can lead people to think it causes some problem. For this reason, I try to avoid ECHO( in favor of ECHO. when ECHO. will do.
Original Post and Versions Which used the incorrect format
You can make this a quite simple script that basically finds each unique name and keeps the first one, so long as your names are in YYYY.MM.DD.xlsx format, by pre-sorting the names so that the one with the newest date in its name is always the first file encountered.
The Space is guaranteed? Optional?
To do this you need to use a FOR /F loop to parse the output of DIR ordered by (/O) name descending (-N).
DT.CMD:
@(
SetLocal EnableDelayedExpansion
ECHO OFF
SET "_PathToCheck=Y:\T\DT"
SET "_FileGlob=* ????.??.??.xlsx"
SET "_CurrentFile="
)
FOR /F "Tokens=*" %%A IN ('DIR /A-D /O-N /B "%_PathToCheck%\%_FileGlob%"') DO (
SET "_TFile=%%~nA"
SET "_TFile=!_TFile:~0,-10!"
IF /I "!_CurrentFile!" EQU "!_TFile!" (
ECHO.Deleting: "%_PathToCheck%\%%~A"
DEL /F /Q "%_PathToCheck%\%%~A"
) ELSE (
ECHO.
ECHO.New File Found: !_TFile!
ECHO.-----------
ECHO.Retaining: "%_PathToCheck%\%%~A"
SET "_CurrentFile=!_TFile!"
)
)
We then simply need to compare the names of the files, except for the trailing YYYY.MM.DD.xlsx; if the file is the first with that name we keep it, as we know it will be the newest.
If the name is a duplicate, we can delete the file because we know we have already passed over the newest one.
Example Output:
Y:\>Y:\t\DT.cmd
New File Found: bananas
-----------
Retaining: "Y:\T\DT\bananas 2019.07.01.xlsx"
New File Found: oranges
-----------
Retaining: "Y:\T\DT\oranges 2019.09.01.xlsx"
Deleting: "Y:\T\DT\oranges 2019.07.11.xlsx"
New File Found: apples
-----------
Retaining: "Y:\T\DT\apples 2019.07.07.xlsx"
Deleting: "Y:\T\DT\apples 2019.07.01.xlsx"
If your date format is instead YYYY.DD.MM.xlsx,
then you will need to jump through an extra hoop or two.
Essentially, in that scenario, we can do the following:
Save each file name in a variable named with the corrected (sortable) version of the file name (YYYY.MM.DD format), then sort those variables and compare them, deleting the ones that are not the newest.
Here is that version, DT_DM.CMD:
@(
SetLocal EnableDelayedExpansion
ECHO OFF
SET "_PathToCheck=Y:\T\DT"
SET "_FileGlob=* ????.??.??.xlsx"
SET "_CurrentFile="
SET "_MatchList= "
)
FOR /F "Tokens=*" %%A IN ('DIR /A-D /ON /B "%_PathToCheck%\%_FileGlob%"') DO (
SET "_TFile=%%~nA"
SET "_TFileMD=!_TFile:~-5!"
SET "_TVar=__!_TFile:~0,-5!!_TFileMD:~-2!.!_TFileMD:~0,2!"
REM ECHO.Storing File: "%%~A" As: "!_TVar!"
SET "!_TVar!=%%~A"
IF /I "!_CurrentFile!" NEQ "!_TFile:~0,-10!" (
ECHO.New File Found, Adding to Sort List: "!_TFile:~0,-10!"
SET "_CurrentFile=!_TFile:~0,-10!"
SET "_MatchList=!_MatchList! "__!_TFile:~0,-10!""
)
)
ECHO.
ECHO.Delete Old Files
ECHO.-----------------
REM Loop the Matched Files:
FOR %%a IN (%_MatchList%) DO (
ECHO.
ECHO.Delete Old %%a Files
ECHO.-----------------
REM Loop the SET sorted for each File Found and Skip the First one (Newest), deleting the others.
FOR /F "Skip=1 Tokens=1-2 Delims==" %%A IN ('SET "%%~a" ^| SORT /R') DO (
ECHO.Deleting: "%_PathToCheck%\%%~B"
DEL /F /Q "%_PathToCheck%\%%~B"
REM Remove the deleted file variable so we can print a list of retained files at the end:
SET "%%A="
)
)
ECHO.
ECHO.Retained Files:
ECHO.-----------------
FOR %%a IN (%_MatchList%) DO ( SET "%%~a" )
Here is example output from that:
Y:\>Y:\t\DT_DM.cmd
New File Found, Adding to Sort List: "apples "
New File Found, Adding to Sort List: "bananas "
New File Found, Adding to Sort List: "oranges "
Delete Old Files
-----------------
Delete Old "__apples " Files
-----------------
Deleting: "Y:\T\DT\apples 2019.07.07.xlsx"
Deleting: "Y:\T\DT\apples 2019.12.01.xlsx"
Delete Old "__bananas " Files
-----------------
Delete Old "__oranges " Files
-----------------
Retained Files:
-----------------
__apples 2019.12.01=apples 2019.01.12.xlsx
__bananas 2019.01.07=bananas 2019.07.01.xlsx
__oranges 2019.11.07=oranges 2019.07.11.xlsx
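The key step in this version is swapping the last two date fields so that plain text order matches date order. A minimal Python sketch (not Batch) of that step follows; the names sortable_key and newest_per_name are hypothetical, and every file name is assumed to end in a 10-character YYYY.DD.MM date followed by .xlsx.

```python
def sortable_key(stem):
    """'apples 2019.01.12' (YYYY.DD.MM) -> 'apples 2019.12.01' (YYYY.MM.DD)."""
    day_month = stem[-5:]  # the trailing "DD.MM"
    return stem[:-5] + day_month[-2:] + "." + day_month[:2]


def newest_per_name(filenames):
    """Map each name (text before the date) to its newest file."""
    newest = {}
    for filename in filenames:
        stem = filename[: -len(".xlsx")]
        name = stem[:-10]  # drop the 10-character date, as !_TFile:~0,-10! does
        key = sortable_key(stem)
        if name not in newest or key > newest[name][0]:
            newest[name] = (key, filename)
    return {name: filename for name, (key, filename) in newest.items()}


files = ["apples 2019.07.07.xlsx", "apples 2019.12.01.xlsx", "apples 2019.01.12.xlsx"]
print(newest_per_name(files))  # {'apples ': 'apples 2019.01.12.xlsx'}
```

Read as YYYY.DD.MM, the file dated 2019.01.12 is 1 December and therefore the newest, which agrees with the retained file in the example output above.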
Now, both of these examples assume that you ALWAYS want whichever file is named with the newest date, not the most recently modified file.
This is probably the case, as I know that is usually what I want when working with my own dated files, in case someone or some process came along and modified the files, or I saved more than one out of order.
But just in case you really wanted to retain the most recently modified file, we can use the same concept as in the second version and save the real modified time in the variable names instead of the date from the file name.
DT_Modified.CMD:
@(
SetLocal EnableDelayedExpansion
ECHO OFF
SET "_PathToCheck=Y:\T\DT"
SET "_FileGlob=*.xlsx"
SET "_CurrentFile="
SET "_MatchList= "
)
FOR %%A IN ("%_PathToCheck%\%_FileGlob%") DO (
ECHO.%%A| FINDStr /I " [0-9][0-9][0-9][0-9]\.[0-9][0-9]\.[0-9][0-9]\.xlsx$" >NUL && (
SET "_TFile=%%~nA"
SET "_TVar=__!_TFile:~0,-10!%%~tA"
ECHO.Storing File: "%%~A" As: "!_TVar!"
SET "!_TVar!=%%~A"
IF /I "!_CurrentFile!" NEQ "!_TFile:~0,-10!" (
ECHO.
ECHO.New File Found, Adding to Sort List: "!_TFile:~0,-10!"
ECHO.
SET "_CurrentFile=!_TFile:~0,-10!"
SET "_MatchList=!_MatchList! "__!_TFile:~0,-10!""
)
)
)
ECHO.
ECHO.Delete Old Files
ECHO.-----------------
REM Loop the Matched Files:
FOR %%a IN (%_MatchList%) DO (
ECHO.
ECHO.Delete Old %%a Files
ECHO.-----------------
REM Loop the SET sorted for each File Found and Skip the First one (Newest), deleting the others.
FOR /F "Skip=1 Tokens=1-2 Delims==" %%A IN ('SET "%%~a" ^| SORT /R') DO (
ECHO.Deleting: "%_PathToCheck%\%%~B"
DEL /F /Q "%_PathToCheck%\%%~B"
REM Remove the deleted file variable so we can print a list of retained files at the end:
SET "%%A="
)
)
ECHO.
ECHO.Retained Files:
ECHO.-----------------
FOR %%a IN (%_MatchList%) DO ( SET "%%~a" )
[Screenshot: example of the first script running, and its results]

Batch file looping through files in a specific date/time order

Hi All,
I'm struggling to write a small addition to a batch file I have been using for a while. Firstly, here are some example file names I would be dealing with:
output.log, output.log.1, output.log.2, ..., output.log.199
The input file set may contain just one or a few of the above files (always in the order shown, newest first, oldest last), or all 200 files.
What I am attempting to do
The batch file is used to do several things with these files, such as copying them into a new directory or creating a parameter list to pass to another command. In general, what I have works well and did what I wanted at the time; however, processing all 200 files in a potential file set is time consuming. What I am now looking to do is limit the number of files processed, extracting just the first, say, 20 files, or all of them if there are fewer than 20.
The problem I have is that the FOR loop I am using loops through the files in name order, not date/time order. Therefore, if I stop after 20 iterations, I end up with:
output.log, output.log.1, output.log.10, output.log.100, output.log.101, ...
I need to be able to loop through the file set in date/time order (newest first). I do not know what the date range will be, or whether the files are at specific intervals, only that the log files will always be as per the first list.
The Code I have
setlocal EnableDelayedExpansion
Set SnapDir=C:\Snaps\
Set SupportLog=unified_support.log
Set LogDir=\var\log
Set ParsedLogDir=\var\log\parsed
set PARAMS=
set /A NumberSupportFiles=20
Set /A StartSupportCount=0
if [%1]==[] goto :eof
:loop
rem Create a Parameter list of all the Log files, Copy File then stop after a set number of iterations
for %%A in ("%SnapDir%%~n1%LogDir%\%SupportLog%*") do (
set PARAMS=!PARAMS! "%%A"
copy %%A "%SnapDir%%~n1%ParsedLogDir%\"
set /A StartSupportCount+=1
if !StartSupportCount! EQU %NumberSupportFiles% goto :jump
)
:jump
pause
As mentioned, this kind of works: it loops through the first 20 files in the file set, but the file order is by name, not date/time.
From what I have read so far, any date/time manipulation appears to need an exact reference point or specific delimitation, and I can't see a way to order the set before looping. Is this possible?
You are almost done; you only need to replace the for loop with a for /F that parses the output of dir /B /A:-D /O:-D, which produces a list of files sorted by modification date in descending order (newest first):
@echo off
setlocal EnableExtensions EnableDelayedExpansion
Set "SnapDir=C:\Snaps\"
Set "SupportLog=unified_support.log"
Set "LogDir=\var\log"
Set "ParsedLogDir=\var\log\parsed"
set "PARAMS="
set /A NumberSupportFiles=20
Set /A StartSupportCount=0
if "%~1"=="" goto :EOF
:LOOP
rem Create a Parameter list of all the Log files,
rem Copy File then stop after a set number of iterations
for /F "eol=| delims=" %%A in ('
dir /B /A:-D /O:-D "%SnapDir%%~n1%LogDir%\%SupportLog%*"
') do (
set "PARAMS=!PARAMS! "%%~A""
copy "%SnapDir%%~n1%LogDir%\%%~A" "%SnapDir%%~n1%ParsedLogDir%\"
set /A StartSupportCount+=1
if !StartSupportCount! EQU %NumberSupportFiles% goto :JUMP
)
:JUMP
pause
endlocal
exit /B

Find multiple values in different lines using command-line | CMD

I have multiple results (Radiology, Labs, Pathology, Transcriptions) for the same patient in a file and I am only interested in getting results for a set of particular values. For example: I want to look for a radiology report on the first line and patient MRN 123456789 on the second line.
Can this be achieved using findstr? Thanks
MSH|^~\&|RADIOLOGY|1|SCM||20150303||ORU|20150303|T|2.3|20150303
PID||1111111|123456789^^^MRN_SB^||TEST^PATIENT^^^||19000101||^^||
PV1|1|E|ER^ER^1^SB||||||||||||||||||||||||||||||||||||||||||||||
ORC|RE|36543654|36543654|3003487889
@ECHO OFF
SETLOCAL
:: remove variables starting $
FOR /F "delims==" %%a In ('set $ 2^>Nul') DO SET "%%a="
SET "found="
SET "mrn=%1"
FOR /f "delims=" %%o IN (q29931949.txt) DO (
FOR /f "tokens=1-4delims=|" %%a IN ("%%o") DO (
IF DEFINED found IF "%%a"=="PID" (
SET "$2=%%o"
CALL :report "%%b" "%%c" "%%d"
)
SET "found="
IF "%%a"=="MSH" IF "%%b"=="RADIOLOGY" SET found=Y
IF "%%a"=="MSH" IF "%%c"=="RADIOLOGY" SET found=Y
IF DEFINED found SET "$1=%%o"
)
)
GOTO :EOF
:report
SET "field=%~1"
IF NOT DEFINED field GOTO :EOF
FOR /f "tokens=1delims=^^" %%r IN ("%~1") DO SET "field=%%r"
IF "%field%"=="%mrn%" FOR /F "tokens=1*delims==" %%r In ('set $') DO ECHO(%%s
shift
GOTO report
I used a file named q29931949.txt containing your data for my testing.
You don't really supply enough information to produce a result. For instance, is "MRN" a required data item?
This procedure will find two consecutive lines, the first one having "MSH" in the first column and "RADIOLOGY" in the second or third, and the second line having "PID" in the first column and either the second, third or fourth column containing the target number.
You'd run the routine using thisbatchname 123456789
It accepts the parameter 123456789 and assigns that to mrn.
It then reads the file and assigns each line in turn to %%o, and tokenises the line on |, applying tokens 1-4 to %%a..%%d respectively.
The main loop sets found to empty and then to Y only if the first field is MSH and the second or third RADIOLOGY. If the found flag is set, the original line in %%o is applied to $1. Only if found is set at the start of the loop (which means that the previous line is MSH/RADIOLOGY) will the routine :report be called after $2 has the original contents of the second line assigned.
The :report routine sets field to the first parameter to see whether there are remaining parameters to process. The for then assigns the part of the field up to the first caret (^) to field. If this matches the mrn input from the command line, then the $ variables are echoed to the console (you don't say what you actually want to do with the data). Regardless, the remaining parameters are checked.
The reason for checking the second/third(/fourth) parameter is to cater for the presence or absence of data in the fields, as consecutive | characters are interpreted as a single delimiter.
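To illustrate the delimiter behaviour described above, here is a minimal sketch. The sample line is a shortened, hypothetical stand-in for the PID record in the question; it is not part of the original script.

```batch
@echo off
setlocal
rem Hypothetical sample: the second field is empty, so two | characters are adjacent.
set "line=PID||1111111|123456789"
for /f "tokens=1-3 delims=|" %%a in ("%line%") do (
    rem The empty field is skipped, so every later token shifts left by one.
    echo token1=%%a
    echo token2=%%b
    echo token3=%%c
)
endlocal
```

Run as a batch file, this should print token1=PID, token2=1111111 and token3=123456789, which is why the routine has to look for the MRN in more than one token position.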
Find an HL7 parser library for your programming/scripting language of choice and use it. It is not worth writing an HL7 parser from scratch. There should be libraries available for all popular languages.
If you then have specific questions, feel free to ask again.

batch rename files keeping substring and adding mm and yy

I have a series of files that have long filenames. For each filename that contains a hyphen I would like to keep the substring in position 6-8, append the _FM07_FY14.prn to the name and ignore the rest of the original filename. The new extension is now .prn. The two digits 07 stands for the month and 14 is the year. The month and year can be found from the "date created" property. Will appreciate it if you can show me how to automatically capture this mm and yy from the date created. Hardcoding this part is okay too since I can sort files by created dates and put them in separate folders.
For example
aaaaaD07.dfdd-1234.A.b.1233 new filename will be D07_FM01_FY14.prn
bbcbaA30dls-d343.a.123d new filename will be A30_FM01_FY14.prn
cdq0dG12ir3-438d.dfd.txt new filename will be G12_FM01_FY14.prn
This is the .bat file I come up with after reading many posts on here, and I don't know how to extract the mm and yy so I hard code it. I am not familiar with Powershell. I can only handle a .bat or .cmd file and run it at the command prompt. Any and all help will be highly appreciated. Thanks!
@ECHO OFF
SETLOCAL
for %%F in (*.*) do (
SET "name=%%a"
set "var=_FM01_FY14.prn"
ren *-* "%name:~6,8%var%"
)
endlocal
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
SET "sourcedir=U:\sourcedir\one"
PUSHD "%sourcedir%"
FOR /f "delims=" %%a IN (
'dir /b /a-d "*" '
) DO (
SET name=%%a
SET fdate=%%~ta
ECHO(REN "%%a" "!name:~5,3!_FM!fdate:~3,2!_FY!fdate:~8,2!.prn"
)
popd
GOTO :EOF
You would need to change the setting of sourcedir to suit your circumstances.
The date format that I use is dd/mm/yyyy. If yours is different, then you'll need to change the offset in the !fdate:~m,2! phrases. The value of m is the offset into the date string from the first character (the second parameter is the number of characters to select). Note that %%~ta returns the file's last-modified timestamp, not its creation date.
The required REN commands are merely ECHOed for testing purposes. After you've verified that the commands are correct, change ECHO(REN to REN to actually rename the files.
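As a quick check of the offsets, here is a small sketch using a made-up timestamp in dd/mm/yyyy hh:mm form. The value is hypothetical; the real string depends on your regional settings, so echo it once on your own machine before trusting the offsets.

```batch
@echo off
setlocal
rem Hypothetical timestamp, shaped like the output of the ~t modifier on a dd/mm/yyyy system.
set "fdate=15/07/2014 09:30"
rem Offset 3, length 2 selects the month; offset 8, length 2 selects the two-digit year.
echo month=%fdate:~3,2%
echo year=%fdate:~8,2%
endlocal
```

With the sample value this prints month=07 and year=14. On a system using mm/dd/yyyy, the month would sit at offset 0 instead.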

How to loop through files matching wildcard in batch file

I have a set of base filenames, for each name 'f' there are exactly two files, 'f.in' and 'f.out'. I want to write a batch file (in Windows XP) which goes through all the filenames, for each one it should:
Display the base name 'f'
Perform an action on 'f.in'
Perform another action on 'f.out'
I don't have any way to list the set of base filenames, other than to search for *.in (or *.out) for example.
Assuming you have two programs that process the two files, process_in.exe and process_out.exe:
for %%f in (*.in) do (
echo %%~nf
process_in "%%~nf.in"
process_out "%%~nf.out"
)
%%~nf uses the ~n substitution modifier, which expands %%f to the file name only (no path and no extension).
See other modifiers in https://technet.microsoft.com/en-us/library/bb490909.aspx (midway down the page) or just in the next answer.
You can use this line to print the contents of your desktop:
FOR %%I in (C:\windows\desktop\*.*) DO echo %%I
Once you have the %%I variable it's easy to perform a command on it (just replace the word echo with your program)
In addition, substitution of FOR variable references has been enhanced
You can now use the following optional syntax:
%~I - expands %I removing any surrounding quotes (")
%~fI - expands %I to a fully qualified path name
%~dI - expands %I to a drive letter only
%~pI - expands %I to a path only (directory with \)
%~nI - expands %I to a file name only
%~xI - expands %I to a file extension only
%~sI - expanded path contains short names only
%~aI - expands %I to file attributes of file
%~tI - expands %I to date/time of file
%~zI - expands %I to size of file
%~$PATH:I - searches the directories listed in the PATH
environment variable and expands %I to the
fully qualified name of the first one found.
If the environment variable name is not
defined or the file is not found by the
search, then this modifier expands to the
empty string
https://ss64.com/nt/syntax-args.html
In the above examples %I and PATH can be replaced by other valid
values. The %~ syntax is terminated by a valid FOR variable name.
Picking upper case variable names like %I makes it more readable and
avoids confusion with the modifiers, which are not case sensitive.
You can get the full documentation by typing FOR /?
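A short sketch of a few of these modifiers in action. The path is hypothetical and does not need to exist, because ~n and ~x only take the string apart.

```batch
@echo off
rem Hypothetical path used purely for illustration.
for %%F in ("C:\data\report.in") do (
    echo name=%%~nF
    echo ext=%%~xF
    echo name+ext=%%~nxF
)
```

This should print name=report, ext=.in and name+ext=report.in. Modifiers such as ~t and ~z, by contrast, need the file to exist, since they read its metadata.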
Easiest way, as I see it, is to use a for loop that calls a second batch file for processing, passing that second file the base name.
According to the for /? help, the base name can be extracted using the nifty ~n option. So, the base script would read:
for %%f in (*.in) do call process.cmd "%%~nf"
Then, in process.cmd, %1 contains the base name (%0 is the name of the script itself), so act accordingly. For example:
echo The file is %~1
copy "%~1.in" "%~1.out"
ren "%~1.out" monkeys_are_cool.txt
There might be a better way to do this in one script, but I've always been a bit hazy on how to pull off multiple commands in a single for loop in a batch file.
EDIT: That's fantastic! I had somehow missed the page in the docs that showed that you could do multi-line blocks in a FOR loop. I am going to go have to go back and rewrite some batch files now...
Expanding on Nathan's post, the following will do the whole job in one batch file.
@echo off
if %1.==Sub. goto %2
for %%f in (*.in) do call %0 Sub action %%~nf
goto end
:action
echo The file is %3
copy %3.in %3.out
ren %3.out monkeys_are_cool.txt
:end
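The same single-file idea can also be sketched with call :label, which avoids re-invoking the script. This is a variant and not the code from the answer above: the :action label is reused from it, and the echo is only a placeholder for the real processing.

```batch
@echo off
rem Call the :action subroutine once per .in file, passing the base name.
for %%f in (*.in) do call :action "%%~nf"
goto :eof

:action
rem Inside the subroutine the base name arrives as the first argument.
echo The file is %~1
goto :eof
```

The goto :eof before the label stops the main script from falling through into the subroutine, and the one at the end returns control to the loop.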
There is a tool usually used in MS Servers (as far as I can remember) called forfiles:
The link above contains help as well as a link to the microsoft download page.
The code below filters filenames starting with given substring. It could be changed to fit different needs by working on subfname substring extraction and IF statement:
@echo off
rem filter all files not starting with the prefix 'dat'
setlocal enabledelayedexpansion
FOR /R your-folder-fullpath %%F IN (*.*) DO (
set fname=%%~nF
set subfname=!fname:~0,3!
IF NOT "!subfname!" == "dat" echo "%%F"
)
pause
Echoing f.in and f.out will separate the concept of what to loop and what not to loop when used in a for /f loop.
::Get the files separated
echo f.in>files_to_pass_through.txt
echo f.out>>files_to_pass_through.txt
for /F %%a in (files_to_pass_through.txt) do (
for /R %%b in (*.*) do (
if "%%a" NEQ "%%b" (
echo %%b>>dont_pass_through_these.txt
)
)
)
::I'm assuming the base name is the whole string "f".
::If I'm right then all the files begin with "f".
::So all you have to do is display "f". right?
::But that would be too easy.
::Let's do this the right way.
for /f "delims=" %%C in (dont_pass_through_these.txt) do (
rem displays the filename and not the extension
echo %%~nC
)
Although you didn't ask, a good way to pass commands into f.in and f.out would be to...
for /F "tokens=*" %%D in (dont_pass_through_these.txt) do (
for /F %%E in (%%D) do (
start /wait %%E
)
)
A link to all the Windows XP commands:link
I apologize if I did not answer this correctly. The question was very hard for me to read.