Order and move files into directories based on some filenames pattern - powershell

To move files into folders I use this script
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Define constants here:
set "SPLITCHAR=-" & rem // (a single character to split the file names)
set "SEARCHSTR=_" & rem // (a certain string to be replaced by another)
set "REPLACSTR= " & rem // (a string to replace all found search strings)
set "OVERWRITE=" & rem // (set to non-empty value to force overwriting)
rem // Get file location and pattern from command line arguments:
set "LOCATION=%~1" & rem // (directory to move the processed files into)
set "PATTERNS=%~2" & rem // (file pattern; match all files if empty)
rem /* Prepare overwrite flag (if defined, set to character forbidden
rem in file names; this affects later check for file existence): */
if defined OVERWRITE set "OVERWRITE=|"
rem // Continue only if target location is given:
if defined LOCATION (
rem // Create target location (suppress error if it already exists):
2> nul md "%LOCATION%"
rem /* Loop through all files matching the given pattern
rem in the current working directory: */
for /F "eol=| delims=" %%F in ('dir /B "%PATTERNS%"') do (
rem // Process each file in a sub-routine:
call :PROCESS "%%F" "%LOCATION%" "%SPLITCHAR%" "%SEARCHSTR%" "%REPLACSTR%"
)
)
endlocal
exit /B
:PROCESS
rem // Retrieve first argument of sub-routine:
set "FILE=%~1"
rem // Split name at (first) split character and get portion in front:
for /F "delims=%~3" %%E in ("%~1") do (
rem // Append a split character to partial name:
set "FOLDER=%%E%~3"
)
setlocal EnableDelayedExpansion
rem // Right-trim partial name:
if not "%~4"=="" set "FOLDER=!FOLDER:%~4%~3=!"
set "FOLDER=!FOLDER:%~3=!"
rem /* Check whether partial name is not empty
rem (could happen if name began with split character): */
if defined FOLDER (
rem // Replace every search string with another:
if not "%~4"=="" set "FOLDER=!FOLDER:%~4=%~5!"
rem // Create sub-directory (suppress error if it already exists):
2> nul md "%~2\!FOLDER!"
rem /* Check if target file already exists; if overwrite flag is
rem set (to an invalid character), the target cannot exist: */
if not exist "%~2\!FOLDER!\!FILE!%OVERWRITE%" (
rem // Move file finally (suppress `1 file(s) moved.` message):
1> nul move /Y "!FILE!" "%~2\!FOLDER!"
)
)
endlocal
exit /B
To use the script I must:
1- open cmd
2- change to the folder and run the batch file:
cd C:\Users\Administrator\Desktop\T\
"C:\Users\Administrator\Desktop\T\build-folder-hierarchy.bat" "C:\Users\Administrator\Desktop\T\" "*.pdf"
What is the problem?
For each .pdf file the batch file creates a separate folder, but I don't want folders created that way. See http://i.imgur.com/TVhQyzv.png
aaaa aaaa S02 [folder]
aaaa aaaa S02e01.pdf [folder]
aaaa aaaa S02e02.pdf [folder]
aaa.aaaa.aaa.aa.aaaaa.S02 [folder]
What I want instead:
├─aaaa aaaa S02 [folder]
│ ├─aaaa aaaa S02e01.pdf [file]
│ ├─aaaa aaaa S02e02.pdf [file]
└─ ....
├─aaa.aaaa.aaa.aa.aaaaa.S02 [folder]
│ └─aaa.aaaa.aa.aa.aaaaa.S02E13.pdf [file]
:
Some example names to show how the .pdf file names are formatted:
aaaaaaaaa aa aaaaa S01e12 720p Repack.pdf
aaa aaaaaaaaa S01e05 Versione 720p.pdf
aaa aaaaaaaa S01e05 Versione 1080p.pdf
aaa aaaa s2e06.pdf
aaa aaaa S03e12.pdf
aaa.aaaa.aaa.on.Earth.S02E13.pdf
aaa.aaaa.aaaa.S02E01.HDTV.x264.SUB.ITA.pdf
Usually the .pdf file names contain one of these patterns:
s01
s01e1
s1
s1e1
s1e01
s1e01-10
Characters such as e and s are almost always present in these patterns.
The general form should be
sxx
sxxex
sx
sxex
sxexx
sxexx-xx
x is a digit; the case of the letters s and e is irrelevant.
A PowerShell solution is welcome as an answer.

Your question is confusing. You have not described the format of the file names; you only show some examples, and examples used in place of specifications can be misunderstood. Posting code written for a different problem that does not work on this one is certainly not useful. You did not show an example of the input and the wanted output using real file names, so a solution based on the example data may not work on real data.
EDIT: New specification added. Both the specifications and the program code have been modified according to a request made in the comments.
Below are the specifications of this problem as I understand them:
"Given a series of *.pdf files with this format:
any string hereS##Eany string here.pdf
              / | ^-- "E" letter
   "S" letter   digit
extract the part of the name that ends just before the "E" following the "S-digit" delimiter, and move the file into a folder with that name. The letters "S" and "E" are not case sensitive. Ignore files that do not have this format."
This code solves the problem based on these specifications:
@echo off
setlocal EnableDelayedExpansion
rem Change current directory to the one where this .bat file is located
cd /D "%~dp0"
set "digits=0123456789"
rem Process all *.pdf files
for %%f in (*.pdf) do (
rem Get the folder name of this file
call :getFolder "%%f"
rem If this file has a properly formatted name: "headS##Etail"
if defined folder (
rem Move the file to such folder
if not exist "!folder!" md "!folder!"
move "%%f" "!folder!"
)
)
goto :EOF
:getFolder file
set "folder="
set "file=%~1"
set "head="
set "tail=%file%"
:next
for /F "delims=%digits%" %%a in ("%tail%") do set "head=%head%%%a"
set "tail=!file:*%head%=!"
if not defined tail exit /B
if /I "%head:~-1%" equ "S" goto found
:digit
if "!digits:%tail:~0,1%=!" equ "%digits%" goto noDigit
set "head=%head%%tail:~0,1%"
set "tail=%tail:~1%"
goto digit
:noDigit
goto next
:found
for /F "delims=Ee" %%a in ("%tail%") do set "folder=%head%%%a"
exit /B
To use this Batch file, place it in the same folder where the original files are located and execute it without parameters; you may also execute it with a double-click in Explorer. Example session:
C:\Users\Antonio\Documents\test> dir /B
test.bat
The_Good_Wife_S06e15.pdf
The_Good_Wife_S06e22.pdf
TOCCO_ANGELO_4.pdf
True Blood S07e07_001.pdf
True Detective S02E03-04 Repack.pdf
True Detective S02e03.pdf
True Detective S02e03_001.pdf
True.Detective.S02e02.1080p.WEBMux.pdf
Tudors S04e08.pdf
Tutti pazzi per amore s3e15-16.pdf
Tutto Può Succedere S01e01-02.pdf
Twin Peaks s1e1-8.pdf
Twin Peaks s2e16-22.pdf
Tyrant S02e07.pdf
Tyrant.S01e01_02.720p.DLMux.pdf
Ultimo 2 - La Sfida.pdf
Ultimo 3 -L Infiltrato.pdf
Una Mamma Imperfetta S02e01-13.pdf
Under the Dome S02e02 Versione 720p.pdf
Under.the.Dome.S03E07.HDTV.x264.SUB.ITA.pdf
C:\Users\Antonio\Documents\test> test.bat
C:\Users\Antonio\Documents\test> tree /F
Folder PATH listing
Volume serial number is 00000088 0895:160E
C:.
│   test.bat
│   TOCCO_ANGELO_4.pdf
│   Ultimo 2 - La Sfida.pdf
│   Ultimo 3 -L Infiltrato.pdf
│
├───The_Good_Wife_S06
│       The_Good_Wife_S06e15.pdf
│       The_Good_Wife_S06e22.pdf
│
├───True Blood S07
│       True Blood S07e07_001.pdf
│
├───True Detective S02
│       True Detective S02E03-04 Repack.pdf
│       True Detective S02e03.pdf
│       True Detective S02e03_001.pdf
│
├───True.Detective.S02
│       True.Detective.S02e02.1080p.WEBMux.pdf
│
├───Tudors S04
│       Tudors S04e08.pdf
│
├───Tutti pazzi per amore s3
│       Tutti pazzi per amore s3e15-16.pdf
│
├───Tutto Può Succedere S01
│       Tutto Può Succedere S01e01-02.pdf
│
├───Twin Peaks s1
│       Twin Peaks s1e1-8.pdf
│
├───Twin Peaks s2
│       Twin Peaks s2e16-22.pdf
│
├───Tyrant S02
│       Tyrant S02e07.pdf
│
├───Tyrant.S01
│       Tyrant.S01e01_02.720p.DLMux.pdf
│
├───Una Mamma Imperfetta S02
│       Una Mamma Imperfetta S02e01-13.pdf
│
├───Under the Dome S02
│       Under the Dome S02e02 Versione 720p.pdf
│
└───Under.the.Dome.S03
        Under.the.Dome.S03E07.HDTV.x264.SUB.ITA.pdf
This code would be much simpler if the part of the file name before the "S" delimiter could not contain digits. This solution assumes that there are no exclamation marks (!) in the file names.
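For comparison, the same specification can be written as one regular expression. The following is only a minimal sketch in Python, not a replacement for the Batch file above: the names `folder_for` and `plan_moves` are made up for this illustration, and the sketch only computes which folder each file would go to, it does not touch the disk.

```python
import re
from typing import Dict, List, Optional

# Lazily match everything up to the first "S<digits>" that is directly
# followed by "E"; both letters are matched case-insensitively.
SEASON_PREFIX = re.compile(r"^(.*?s\d+)e", re.IGNORECASE)


def folder_for(filename: str) -> Optional[str]:
    """Return the target folder name, or None if the name has no S##E part."""
    match = SEASON_PREFIX.match(filename)
    return match.group(1) if match else None


def plan_moves(filenames: List[str]) -> Dict[str, List[str]]:
    """Group file names by target folder; names that do not match are skipped."""
    plan: Dict[str, List[str]] = {}
    for name in filenames:
        folder = folder_for(name)
        if folder is not None:
            plan.setdefault(folder, []).append(name)
    return plan
```

With the file names from the example session, `folder_for("True Detective S02E03-04 Repack.pdf")` gives `True Detective S02` and `folder_for("TOCCO_ANGELO_4.pdf")` gives `None`, so that file would stay where it is. The actual move could then be done with `os.makedirs` and `shutil.move` for each entry of the plan.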

The easiest way to get the last instance of a regex substring is to break the string into chunks and process the chunks in reverse order.
:: A script for grouping PDF files based on book series name
:: http://i.imgur.com/seh6p.gif
@echo off
setlocal enabledelayedexpansion
cls
:: Main Directory Containing PDF Directories (change this to suit your needs)
set "source_dir=.\test"
:: Move to source dir and process each folder, one at a time.
pushd "%source_dir%"
for /f "delims=" %%A in ('dir /b /a:d') do (
call :getSeriesName "%%A" series_name
mkdir !series_name! 2>nul
REM If you want to do additional cleanup, change the copy to a move
copy "%%A\*.pdf" !series_name! >nul
)
popd
exit /b
::------------------------------------------------------------------------------
:: Extracts the series name from the directory and changes spaces to periods
::
:: Arguments: %1 - The original book release name
:: %2 - The variable that will contain the returned value because
:: batch doesn't actually have functions
:: Returns: The series name and volume number
::------------------------------------------------------------------------------
:getSeriesName
:: Convert spaces to periods
set "raw_name=%~1"
set standard_name=!raw_name: =.!
:: Split the folder name into period-delimited tokens
set token_counter=0
:name_split
for /f "tokens=1,* delims=.-" %%B in ("!standard_name!") do (
set name_part[!token_counter!]=%%B
set standard_name=%%C
set /a token_counter+=1
goto :name_split
)
:: Get the volume number
for /l %%B in (0,1,!token_counter!) do (
echo !name_part[%%B]!|findstr /R /C:"[sS][0-9][0-9]*[eE][0-9][0-9]*" >nul
if !errorlevel! equ 0 (
set /a name_end=%%B-1
set volume_value=!name_part[%%B]!
set volume_value=!volume_value:~0,3!
)
)
:: Rebuild the series name
set "extracted_name="
for /l %%B in (0,1,!name_end!) do set "extracted_name=!extracted_name!!name_part[%%B]!."
set extracted_name=!extracted_name!!volume_value!
:: Purge the name_part array
for /l %%B in (0,1,!token_counter!) do set "name_part[%%B]="
:: Return the extracted name
set "%~2=!extracted_name!"
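The chunk-and-reverse idea described at the top of this answer can be sketched outside Batch as well. This is a rough Python illustration under my own assumptions, not a port of the script: `series_name` is a hypothetical helper, it also splits on spaces instead of converting them to periods first, and it keeps the whole season part rather than the first three characters.

```python
import re
from typing import Optional

# A chunk such as "S02E13" or "s1e1"; only the season part is captured.
EPISODE_CHUNK = re.compile(r"([sS][0-9]+)[eE][0-9]+")


def series_name(raw_name: str) -> Optional[str]:
    """Return the name up to and including the season, joined with periods."""
    # Break the name into chunks on spaces, periods and hyphens.
    chunks = [chunk for chunk in re.split(r"[ .\-]", raw_name) if chunk]
    # Walk the chunks from the end so the last episode marker wins.
    for index in range(len(chunks) - 1, -1, -1):
        match = EPISODE_CHUNK.match(chunks[index])
        if match:
            return ".".join(chunks[:index] + [match.group(1)])
    return None
```

For example, `series_name("Twin Peaks s1e1-8")` returns `Twin.Peaks.s1`, and a name without any episode marker returns `None`.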

Related

How to return exit code if no rows matched in postgresql

I am writing a script to find jobs that failed in the past 24 hours and that have not since completed or started running again.
rm -rf jobsfailed.txt jobscompleted.txt jobsnotcompleted.txt ## Remove the files which are created
export PGPASSWORD="$PG_PASS"
failed_jobs=`psql -t -U bacula bacula << EOD
SELECT jobid,job,name,jobbytes,realendtime,jobstatus,level FROM job WHERE jobstatus IN ('f','A','E') AND realendtime > NOW() - INTERVAL '24 HOURS';
\q
EOD` ### Collect all the jobs which are in the defined states
echo "$failed_jobs" >> jobsfailed.txt ### redirect the values to a file
sortedlist=$(awk -F'|' '{print $3}' jobsfailed.txt | sort | uniq) ## sort the values based on the jobname
for i in $sortedlist
do
retVal=$?
jobs_notcompleted=`psql -t -U bacula bacula << EOD1
SELECT jobid,job,name,jobbytes,realendtime,jobstatus,level FROM job WHERE name LIKE '$i' AND jobstatus IN ('T','R') AND starttime > NOW() - INTERVAL '24 HOURS' ORDER BY jobid DESC LIMIT 1;
\q
EOD1` ### If the job is in above defined states(T or R) then jobs completed successfully. Any other state apart from above then job not completed
if [[ $retVal -eq 0 ]]; then
echo "$jobs_notcompleted" >> jobscompleted.txt
else
echo "$jobs_notcompleted" >> jobsnotcompleted.txt
fi
exit $retVal
done
But I am not getting the desired output. If no state matches, the query produces (0 rows) output. Is there a way to redirect the $jobs_notcompleted value to the jobsnotcompleted.txt file when 0 rows are matched? The jobscompleted.txt file is created and works as expected.
For some reason, I feel that you need a "NOT" between "jobstatus" and "in" for your second query, because the ${retVal} would, in my mind, always return 0 unless there was a DB error.
If I am wrong on that point, then the "retVal=$?" needs to be moved to after the 2nd query.
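Since the exit status of psql tells you whether the command ran rather than how many rows it matched, another option is to look at the captured output itself. Below is a minimal sketch of that decision in Python; `has_rows` and `target_file` are names I made up, and the sketch assumes the output was captured as text in the same way as `$jobs_notcompleted` in the script.

```python
import re

# A footer such as "(0 rows)" or "(1 row)", in case one is printed.
ROW_COUNT_FOOTER = re.compile(r"^\(\d+ rows?\)$")


def has_rows(psql_output: str) -> bool:
    """True if the captured psql output contains at least one data row."""
    for line in psql_output.splitlines():
        text = line.strip()
        # Ignore blank lines and the row-count footer.
        if text and not ROW_COUNT_FOOTER.match(text):
            return True
    return False


def target_file(psql_output: str) -> str:
    """Pick the file the result should be appended to."""
    return "jobscompleted.txt" if has_rows(psql_output) else "jobsnotcompleted.txt"
```

In the shell script the same test would be a check for an empty (whitespace-only) value of `$jobs_notcompleted` after the second query.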

PSQL (postgres or redshift) stored variable to prompt and write query to dynamic filename

My ad hoc workflow has me in the psql client often, so often that I define useful queries and settings changes in my .psqlrc file. I'm sharing this solution because there are few examples online, and because a meta-command cannot contain newlines, the syntax gets ugly and debugging took a long time.
Define a psql meta-command in a variable that prompts for sql file path and writes to a local file with a dynamic filename
prompt for sql file to execute
prompt for output filename prefix
generate a dynamic output filename based on ISO reporting week
Here is a manual example of the steps I want to wrap into a .psqlrc-defined variable:
-- the following at psql prompt =>>
select 'file_prefix' || '_week_'
|| to_char(next_day(current_date - 1 - 7 * 1, 'sat') + 1,'iyyy-iw')
|| '.txt' report_filename;
┌──────────────────────────────┐
│ report_filename │
├──────────────────────────────┤
│ file_prefix_week_2019-07.txt │
└──────────────────────────────┘
\out file_prefix_week_2019-07.txt
\a \pset footer off -- no border or row count to output file
\i 'path/to/sql_file.sql'
-- now I have a text file of the output locally on my machine
\out \a \pset footer on
=>>
-- back to normal terminal output
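To make the date arithmetic in the filename expression easier to follow, here is a small Python sketch of the same calculation. It assumes that `next_day` returns the first matching weekday strictly after the given date; the helper names `next_day` and `report_filename` are only for this illustration.

```python
from datetime import date, timedelta


def next_day(d: date, weekday: int) -> date:
    """First date strictly after d that falls on weekday (Mon=0 .. Sun=6)."""
    days_ahead = (weekday - d.weekday() - 1) % 7 + 1
    return d + timedelta(days=days_ahead)


def report_filename(prefix: str, today: date) -> str:
    """Mirror next_day(current_date - 1 - 7 * 1, 'sat') + 1 formatted as iyyy-iw."""
    saturday = next_day(today - timedelta(days=8), 5)  # 5 = Saturday
    anchor = saturday + timedelta(days=1)              # the following Sunday
    iso_year, iso_week, _ = anchor.isocalendar()
    return f"{prefix}_week_{iso_year}-{iso_week:02d}.txt"
```

For any day from Sunday 2019-02-17 through Saturday 2019-02-23 this gives `file_prefix_week_2019-07.txt`, which matches the filename shown in the manual example.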
Here is the working solution that can be issued on the psql command line or appended to a .psqlrc and invoked from the psql prompt with the variable name:
-- newlines are included for readability but:
-- **remove all newlines when pasting to .psqlrc**
\set report '\\echo enter filename prefix:\\\\ \\prompt file_prefix \\\\
\\echo enter sql file path:\\\\ \\prompt sql_file \\\\
select :''file_prefix'' || ''_week_''
|| to_char(next_day(current_date - 1 - 7 * 1, ''sat'') + 1,''iyyy-iw'')
|| ''.txt'' report_filename \\gset \\\\
\\pset footer off \\pset border 0 \\pset expanded off \\pset format unaligned \\\\
\\out :report_filename \\\\
\\i :sql_file \\\\
\\out \\\\
\\pset footer on \\pset border 2 \\pset expanded auto
\\pset format aligned \\pset linestyle unicode'
key points:
Postgres psql documentation current version
when copy/pasting remove all the newlines
the variable string cannot have newlines, just one long string
the string is contained in single quotes '
the symbol \\ separates commands
the symbol \ is escaped by doubling
therefore use \\ for \, \\\\ for \\
single quotes within the string are escaped by doubling
therefore use '' for ' within the string
this counts for :variables that are strings as well
\gset assigns the output of a query to a variable of the column name
the :sql_file can include spaces, the variable gets stored as a text string that \i is able to parse without wrapping as :''sql_file''
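Because doing this escaping by hand is error prone, the key points above can be applied mechanically. The sketch below is a hypothetical Python helper (`to_set_command` is my own name, not part of psql) that takes a readable multi-line script written with normal psql syntax and folds it into a single-line \set command using only the rules listed: join the lines, double every backslash, double every single quote.

```python
def to_set_command(name: str, script: str) -> str:
    """Fold a multi-line psql script into one single-line \\set command."""
    # The variable string cannot contain newlines, so join the lines.
    one_line = " ".join(
        line.strip() for line in script.splitlines() if line.strip()
    )
    # Double backslashes first, then double single quotes.
    escaped = one_line.replace("\\", "\\\\").replace("'", "''")
    return f"\\set {name} '{escaped}'"
```

For example, the two-line script `\pset footer on \\` followed by `\out` becomes `\set copyoff '\\pset footer on \\\\ \\out'`, which can be pasted into .psqlrc as is.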
relevant components in .psqlrc
-- .psqlrc
-- remove newlines from \set variable strings before pasting
\pset fieldsep '\t'
\set prompt_1 '%R%#> '
\set PROMPT1 :prompt_1
\set prompt_copyout 'copyout : : : copyout\n%R%#> '
-- helper variables
-- usage:
-- :copyout desired_output_filename
\set copyout '\\pset footer off \\pset border 0 \\pset expanded off
\\pset format unaligned \\pset title \\pset null ''''
\\set PROMPT1 :prompt_copyout \\out '
-- usage to return to normal terminal output
-- :copyoff
\set copyoff '\\pset footer on \\pset border 2 \\pset expanded auto
\\pset format aligned \\pset title \\pset null ''[null]''
\\set PROMPT1 :prompt_1 \\pset linestyle unicode \\out \\\\'
reformulated with helper variables
I switch between copying out to local text files and viewing sql output in the terminal screen, so my actual usage includes the following helpers:
-- again remove all newlines before pasting into .psqlrc
\set report '\\echo enter filename prefix:\\\\ \\prompt file_prefix \\\\
\\echo enter sql file path:\\\\ \\prompt sql_file \\\\
select :''file_prefix'' || ''_week_''
|| to_char(next_day(current_date - 1 - 7 * 1, ''sat'') + 1,''iyyy-iw'')
|| ''.txt'' report_filename \\gset \\\\
:copyout :report_filename \\\\
\\i :sql_file \\\\
:copyoff'

How does SAS decide which file to select when using wildcards

I have a SAS code that looks something like this:
DATA WORK.MY_IMPORT_&stamp;
INFILE "M:\YPATH\myfile_150*.csv"
delimiter = ';' MISSOVER DSD lrecl = 1000000 firstobs = 2 ignoredoseof;
[...]
RUN;
Now, at M:\YPATH I have several files named myfile_150.YYYYMMDD. The code works the way it is supposed to by importing always the latest file. I am wondering how SAS decides which file to choose, since the wildcard * can be replaced by anything. Does it sort the files in descending order and choose the first one?
On my system, SAS 9.4 TS1M4, SAS is reading ALL files that satisfy the wildcard.
I created 3 files (file_A.csv, file_B.csv, and file_C.csv). Each contains 1 record ('A', 'B', and 'C' respectively).
data test;
infile "c:\temp\file_*.csv"
delimiter = ';' MISSOVER DSD lrecl = 1000000 ignoredoseof;
format char $1.;
input char $;
run;
(Note I dropped the firstobs option from your code.)
The resulting TEST data set contains 3 observations, 'A', 'B', and 'C'.
This is the order of files returned when issuing
dir c:\temp\file_*.csv
SAS is using the default behavior of the OS and reading the files in that order.
25 data test;
26 infile "c:\temp\file_*.csv"
27 delimiter = ';' MISSOVER DSD lrecl = 1000000 ignoredoseof;
28 format char $1.;
29 input char $;
30 run;
NOTE: The infile "c:\temp\file_*.csv" is:
Filename=c:\temp\file_A.csv,
File List=c:\temp\file_*.csv,RECFM=V,
LRECL=1000000
NOTE: The infile "c:\temp\file_*.csv" is:
Filename=c:\temp\file_B.csv,
File List=c:\temp\file_*.csv,RECFM=V,
LRECL=1000000
NOTE: The infile "c:\temp\file_*.csv" is:
Filename=c:\temp\file_C.csv,
File List=c:\temp\file_*.csv,RECFM=V,
LRECL=1000000
NOTE: 1 record was read from the infile "c:\temp\file_*.csv".
The minimum record length was 1.
The maximum record length was 1.
NOTE: 1 record was read from the infile "c:\temp\file_*.csv".
The minimum record length was 1.
The maximum record length was 1.
NOTE: 1 record was read from the infile "c:\temp\file_*.csv".
The minimum record length was 1.
The maximum record length was 1.
NOTE: The data set WORK.TEST has 3 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.00 seconds
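To illustrate the difference between reading every matching file and reading only the newest one, here is a small sketch in Python rather than SAS. It is an analogy only: the function names are made up, and sorting by file name is my assumption, which finds the latest file only if the names end in a sortable date such as YYYYMMDD.

```python
import glob
import os
import tempfile
from typing import List, Optional


def read_all_matching(pattern: str) -> List[str]:
    """Concatenate the lines of every file matching pattern, in name order."""
    lines: List[str] = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as handle:
            lines.extend(handle.read().splitlines())
    return lines


def latest_by_name(pattern: str) -> Optional[str]:
    """Return the match that sorts last by name, or None if nothing matches."""
    matches = sorted(glob.glob(pattern))
    return matches[-1] if matches else None


def demo():
    """Recreate the three one-record files from the test above in a temp folder."""
    with tempfile.TemporaryDirectory() as folder:
        for letter in "ABC":
            path = os.path.join(folder, f"file_{letter}.csv")
            with open(path, "w", encoding="utf-8") as handle:
                handle.write(letter + "\n")
        pattern = os.path.join(folder, "file_*.csv")
        return read_all_matching(pattern), os.path.basename(latest_by_name(pattern))
```

Reading all matches yields the three records together, as in the log above. Picking a single file means resolving the name first and passing that one name on instead of the wildcard.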

Postgres-pgloader-transformation in columns

I am loading a flat file into a Postgres table and need to apply a few transformations while reading the file.
For example:
-->Check for certain characters and, if they are present, substitute a default value. In Oracle a regular expression function can be used; how can such functions be called in the syntax below?
-->TO_DATE function from text format
-->Check for NULL and substitute a default value
-->Trim functions
-->Load only a few columns from the source file
-->Default values: for instance, the source file has only 3 columns but we need to load 4, so one column should be filled with a default value
LOAD CSV
FROM 'filename'
INTO postgresql://role@host:port/database_name?tablename
TARGET COLUMNS
(
alphanm,alphnumnn,nmrc,dte
)
WITH truncate,
skip header = 0,
fields optionally enclosed by '"',
fields escaped by double-quote,
fields terminated by '|',
batch rows = 100,
batch size = 1MB,
batch concurrency = 64
SET work_mem to '32 MB', maintenance_work_mem to '64 MB';
Kindly help me: how can this be accomplished using pgloader?
Thanks
Here's a self-contained test case for pgloader that reproduces your use-case, as best as I could understand it:
/*
Sorry pgloader version "3.3.2" compiled with SBCL 1.2.8-1.el7 Doing kind
of POC, to implement in real time work. Sample data from file:
raj|B|0.5|20170101|ABCD Need to load only first,second,third and fourth
column; Table has three column, third column should be defaulted with some
value. Table structure: A B C-numeric D-date E-(Need to add default value)
*/
LOAD CSV
FROM inline
(
alphanm,
alphnumnn,
nmrc,
dte [date format 'YYYYMMDD'],
other
)
INTO postgresql:///pgloader?so.raja
(
alphanm,
alphnumnn,
nmrc,
dte,
col text using "constant value"
)
WITH truncate,
fields optionally enclosed by '"',
fields escaped by double-quote,
fields terminated by '|'
SET work_mem to '12MB',
standard_conforming_strings to 'on'
BEFORE LOAD DO
$$ drop table if exists so.raja; $$,
$$ create table so.raja (
alphanm text,
alphnumnn text,
nmrc numeric,
dte date,
col text
);
$$;
raj|B|0.5|20170101|ABCD
Now here's the extract from running the pgloader command:
$ pgloader 41287414.load
2017-08-15T12:35:10.258000+02:00 LOG Main logs in '/private/tmp/pgloader/pgloader.log'
2017-08-15T12:35:10.261000+02:00 LOG Data errors in '/private/tmp/pgloader/'
2017-08-15T12:35:10.261000+02:00 LOG Parsing commands from file #P"/Users/dim/dev/temp/pgloader-issues/stackoverflow/41287414.load"
2017-08-15T12:35:10.422000+02:00 LOG report summary reset
table name read imported errors total time
----------------------- --------- --------- --------- --------------
fetch 0 0 0 0.007s
before load 2 2 0 0.016s
----------------------- --------- --------- --------- --------------
so.raja 1 1 0 0.019s
----------------------- --------- --------- --------- --------------
Files Processed 1 1 0 0.021s
COPY Threads Completion 2 2 0 0.038s
----------------------- --------- --------- --------- --------------
Total import time 1 1 0 0.426s
And here's the content of the target table when the command is done:
$ psql -q -d pgloader -c 'table so.raja'
alphanm │ alphnumnn │ nmrc │ dte │ col
═════════╪═══════════╪══════╪════════════╪════════════════
raj │ B │ 0.5 │ 2017-01-01 │ constant value
(1 row)
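To spell out what happens to each input line in this test case, here is the same row transformation as a plain Python sketch. It is not pgloader code and `transform` is a made-up name; it only mirrors the steps shown above: split on `|`, keep the first four fields, parse the date as YYYYMMDD, ignore the fifth field and fill `col` with a constant.

```python
from datetime import datetime
from decimal import Decimal


def transform(line: str, constant: str = "constant value") -> dict:
    """Turn one pipe-delimited input line into the row that gets loaded."""
    fields = line.rstrip("\n").split("|")
    # Only the first four source fields are loaded; the rest is dropped.
    alphanm, alphnumnn, nmrc, dte = fields[:4]
    return {
        "alphanm": alphanm,
        "alphnumnn": alphnumnn,
        "nmrc": Decimal(nmrc),
        "dte": datetime.strptime(dte, "%Y%m%d").date(),
        "col": constant,  # column filled with a default, not read from the file
    }
```

Applied to the sample line `raj|B|0.5|20170101|ABCD`, this produces the same values as the row in the table above.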

Pass a parameterized folder name to command line

I use the following command-line:
call run.bat TEST.properties
In TEST.properties file I initialize the following parameter
output.dir=C:/Test_Results
I would like the 'Test_Results' to contain a timestamp each time the script is called. How can I accomplish that? Thank you!
In TEST.properties.bat, after
output.dir=C:\Test_Results
insert the line
echo %date% %time% >>%output.dir%\my_timestamps.txt
and the latest date/time the TEST.properties.bat is run will appear in C:\Test_Results\my_timestamps.txt
Note that / is a switch-indicator. \ is a directory-separator.
If you only need 1 timestamp in 'Test_Results':
set test=%1
rem insert timestamp generating code below if needed
set timestamp=%time%
for /f "tokens=1,2* delims==" %%i in (%test%) do (if "%%i"=="output.dir" echo %timestamp%>%%j)
If you need all the timestamps:
set test=%1
rem insert timestamp generating code below if needed
set timestamp=%time%
for /f "tokens=1,2* delims==" %%i in (%test%) do (
if "%%i"=="output.dir" (
if not exist %%j (echo %timestamp%>%%j) else (
echo %timestamp%>temp.txt
copy %%j+temp.txt %%j
del temp.txt
)
)
)
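If the goal is instead a folder whose name itself carries the timestamp, the idea can be sketched as follows. This is a Python illustration under that assumption, not Batch, and `read_properties` and `timestamped_dir` are hypothetical names: read the `output.dir` entry from the properties text and append a timestamp to it.

```python
from datetime import datetime
from typing import Dict


def read_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def timestamped_dir(properties_text: str, now: datetime) -> str:
    """Return output.dir with a _YYYYMMDD_HHMMSS suffix appended."""
    base = read_properties(properties_text)["output.dir"]
    return f"{base}_{now:%Y%m%d_%H%M%S}"
```

Called with the contents of TEST.properties and `datetime.now()`, this gives a different folder name for each run, for example `C:/Test_Results_20200102_030405` for a run on 2 January 2020 at 03:04:05.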