Should mill tasks with non-file products produce something analogous to PathRef? - mill

I'm using mill to build a pipeline that
cleans up a bunch of CSV files (producing new files)
loads them into a database
does more work in the database (create views, etc)
runs queries to extract some files.
Should the tasks associated with steps 2 and 3 be producing something analogous to PathRef? If so, what? They aren't producing a file on the disk but nevertheless should not be repeated unless the inputs change. Similarly, tasks associated with step 3 should run if tasks in step 2 are run again.
I see in the documentation for targets that you can return a case class and that re-evaluation depends on the .hashCode of the target's return value. But I'm not sure what to do with that information.
And a related question: Does mill hash the code in each task? It seems to be doing the right thing if I change the code for one task but not others.

A (cached) task in mill is re-run, when the build file (build.sc or it's dependencies/includes) or inputs/dependencies of that task change. Whenever you construct a PathRef, a checksum of the path content is calculated and used as hashCode. This makes it possible to detect changes and only act if anything has changed.
Of course there are exceptions, e.g. input tasks (created with T.input or T.sources) and commands (created with T.command) will always run.
It is in general a good idea to return something from a task. A simple String or Int will do, e.g. to show it in the shell with mill show myTask or post-process it later. Although I think a task running something in an external database should be implemented as a command or input task (which might check, when running, if it really needs something to do), you can also implement it as cached task. But please keep in mind, that the state will only be correct if no other process/user is changing the database in between.
That aside, You could return a current database schema/data version or a last change date in step 2. Make sure, it changes whenever the database was modified. Each change of that return value would trigger step 3 and other dependent tasks. (Of course, step 3 needs to depend on step 2.) As a bonus, you could return the same (previous/old) value in case you did not change the database, which would then avoid any later work in dependent tasks.

Related

Run NUnit Test After All Others Are Complete

I have a situation (detailed below) in which I want to run one NUnit test after all the other tests have completed. I know that I can use the order attribute to start my tests in a certain order but in this case:
I want to attribute (or otherwise change) only one test out of several hundred.
I want this test to run last, not first.
I want this test to run after all other tests have completed, not after they've started.
I have experimented with OneTimeTearDown, but ideally this would run as a regular, named test and appear that way in the test results.
(Why)
I have several hundred named, hand-crafted tests that run against different folders of json test files. Non-programmers add files to these folders from time to time. The purpose of this final test is to introspect those folders and compare the contents on disk with the files for which a test has already been executed (these are recorded by each test). If this indicates that there are untested files that, itself, constitutes a test failure.
It's an interesting question. Basically you want a meta-test... one that tests how well you are testing. For that reason, it's logical for it to actually be a test.
Unfortunately, NUnit only supports this sort of "run after everything" in a OneTimeTearDown. Now, you can Perform assertions in a OneTimeTearDown, but any failures are treated as errors, i.e. unexpected exceptions. For some purposes, this may be workable, but it doesn't look quite the same as a failure.
If I were doing this, I think I'd make it a separate analytical step in my script, after the tests had been run.

Get list of executions filtered by parameter value

I am using Spring-batch 3.0.4 stable. While submitting a job I add some specific parameters to its execution, say, a tag. Jobs information is persisted in the DB.
Later on I will need to retrieve all the executions marked with a particular tag.
Currently I see 2 options:
Get all job instances with org.springframework.batch.core.explore.JobExplorer#findJobInstancesByJobName. For each instance get all available executions with org.springframework.batch.core.explore.JobExplorer#getJobExecutions. Filter the resulting collection of executions checking its JobParameters.
Write my own JdbcTemplate-based DAO implementation to run the select query.
While the former option seems pretty inefficient, the latter one suggests writing extra code to deal with the Spring-specific database tables structure.
Is there any option I am missing here?

Dynamic test cases

We are using NUnit to run our integration tests. One of tests should always do the same, but take different input parameters. Unfortunately, we cannot use [TestCase] attribute, because our test cases are stored in an external storage. We have dynamic test cases which could be added, removed, or disabled (not removed) by our QA engineers. The QA people do not have ability to add [TestCase] attributes into our C# code. All they can do is to add them into the storage.
My goal is to read test cases from the storage into memory, run the test with all enabled test cases, report if a test case is failed. I cannot use "foreach" statement because if test case #1 is failed, then rest of the test cases will not be run at all. We already have build server (CruiseControl.net) where generated NUnit reports are shown, therefore I would like to continue using NUnit.
Could you point to a way how can I achieve my goal?
Thank you.
You can use [TestCaseSource("PropertyName")\] which specifies a property (or method etc) to load data from.
For example, I have a test case in Noda Time which uses all the BCL time zones - and that could change over time, of course (and is different on Mono), without me changing the code at all.
Just make your property/member load the test data into a collection, and you're away.
(I happen to have always used properties, but it sounds like it should work fine with methods too.)

Explain the steps for db2-cobol's execution process if both are db2 -cobol programs

How to run two sub programs from a main program if both are db2-cobol programs?
My main program named 'Mainpgm1', which is calling my subprograms named 'subpgm1' and 'subpgm2' which are a called programs and I preferred static call only.
Actually, I am now using a statement called package instead of a plan and one member, both in 'db2bind'(bind program) along with one dbrmlib which is having a dsn name.
The main problem is that What are the changes affected in 'db2bind' while I am binding both the db2-cobol programs.
Similarly, in the 'DB2RUN'(run program) too.
Each program (or subprogram) that contains SQL needs to be pre-processed to create a DBRM. The DBRM is then bound into
a PLAN that is accessed by a LOAD module at run time to obtain the correct DB/2 access
paths for the SQL statements it contains.
You have gone from having all of your SQL in one program to several sub-programs. The basic process
remains the same - you need a PLAN to run the program.
DBA's often suggest that if you have several sub-programs containing SQL that you
create PACKAGES for them and then bind the PACKAGES into a PLAN. What was once a one
step process is now two:
Bind DBRM into a PACKAGE
Bind PACKAGES into a PLAN
What is the big deal with PACKAGES?
Suppose you have 50 sub-programs containing SQL. If you create
a DBRM for each of them and then bind all 50 into a PLAN, as a single operation, it is going
to take a lot of resources to build the PLAN because every SQL statement in every program needs
to be analyzed and access paths created for them. This isn't so bad when all 50 sub-programs are new
or have been changed. However, if you have a relatively stable system and want to change 1 sub-program you
end up reBINDing all 50 DBRMS to create the PLAN - even though 49 of the 50 have not changed and
will end up using exactly the same access paths. This isn't a very good apporach. It is analagous to compiling
all 50 sub-programs every time you make a change to any one of them.
However, if you create a PACKAGE for each sub-program, the PACKAGE is what takes the real work to build.
It contains all the access paths for its associated DBRM. Now if you change just 1 sub-program you
only have to rebuild its PACKAGE by rebinding a single DBRM into the PACKAGE collection and then reBIND the PLAN.
However, binding a set of PACKAGES (collection) into a PLAN
is a whole lot less resource intensive than binding all the DBRM's in the system.
Once you have a PLAN containing all of the access paths used in your program, just use it. It doesn't matter
if the SQL being executed is from subprogram1 or subprogram2. As long as you have associated the PLAN
to the LOAD that is being run it should all work out.
Every installation has its own naming conventions and standards for setting up PACKAGES, COLLECTIONS and
PLANS. You should review these with your Data Base Administrator before going much further.
Here is some background information concerning program preperation in a DB/2 environment:
Developing your Application

How to make an InArgument's value dependant upon the value of another InArgument at design time

I have a requirement to allow a user to specify the value of an InArgument / property from a list of valid values (e.g. a combobox). The list of valid values is determined by the value of another InArgument (the value of which will be set by an expression).
For instance, at design time:
User enters a file path into workflow variable FilePath
The DependedUpon InArgument is set to the value of FilePath
The file is queried and a list of valid values is displayed to the user to select the appropriate value (presumably via a custom PropertyValueEditor).
Is this possible?
Considering this is being done at design time, I'd strongly suggest you provide for all this logic within the designer, rather than in the Activity itself.
Design-time logic shouldn't be contained within your Activity. Your Activity should be able to run independent of any designer. Think about it this way...
You sit down and design your workflow using Activities and their designers. Once done, you install/xcopy the workflows to a server somewhere else. When the server loads that Activity prior to executing it, what happens when your design logic executes in CacheMetadata? Either it is skipped using some heuristic to determine that you are not running in design time, or you include extra logic to skip this code when it is unable to locate that file. Either way, why is a server executing this design time code? The answer is that it shouldn't be executing it; that code belongs with the designers.
This is why, if you look at the framework, you'll see that Activities and their designers exist in different assemblies. Your code should be the same way--design-centric code should be delivered in separate assemblies from your Activities, so that you may deliver both to designers, and only the Activity assemblies to your application servers.
When do you want to validate this, at design time or run time?
Design time is limited because the user can use an expression that depends on another variable and you can't read the value from there at design time. You can however look at the expression and possibly deduce an invalid combination that way. In this case you need to add code to the CacheMetadata function.
At run time you can get the actual values and validate them in the Execute function.