Python package built with Poetry does not install supporting data files

I'm writing a Python package (assume it's called "mypkg") and using Poetry to build the distributable wheel/tar.gz files for installation by end users. But when I install the package using pip install mypkg-0.2.0-py3-none-any.whl, it does not install the supporting files, such as the tests and CLI scripts. I've attempted to follow all the standard best practices for organizing a package, but I've clearly missed some fundamental interaction between the package structure and how poetry/setuptools handles it.
Here is the package structure:
mypkg
│   .gitignore
│   pyproject.toml
│   README.txt
│
├───mypkg
│   │   a.py
│   │   b.py
│   │   __init__.py
│   │
│   └───styles
│           mypkg c.exp
│           mypkg d.exp
│
├───data
│       e.json
│       f.json
│
├───dist
│       mypkg-0.2.0-py3-none-any.whl
│       mypkg-0.2.0.tar.gz
│
├───docs
│       User Guide.pdf
│
├───LICENSE
│       License.rtf
│       ThirdParty.rtf
│
├───scripts
│       other.py
│       run_mypkg.py
│
└───tests
    │   conftest.py
    │   test_g.py
    │   test_h.py
    │   __init__.py
    │
    └───data
        ├───g
        │       otherfile.txt
        │
        └───h
                differentfile.txt
where
mypkg - core source code plus some mandatory supporting files in the styles directory.
data - supporting data files that will be read by the CLI scripts.
docs - user guide, etc.
scripts - some CLI scripts a user can run to exercise the routines in mypkg.
tests - the various pytest files/functions.
The contents of my pyproject.toml file are:
[tool.poetry]
name = "mypkg"
version = "0.2.0"
description = "stuff"
license = "Proprietary"
authors = ["Me <support@test.com>"]
readme = "README.txt"
include = [
    { path = "data" },
    { path = "scripts" },
    { path = "tests" },
    { path = "LICENSE" },
    { path = "docs/*.pdf" },
]

[tool.poetry.dependencies]
python = "^3.8"
numpy = "^1.23.2"
pandas = "^1.4.4"
portion = "^2.3.0"
scipy = "^1.9.1"
plotly = "^5.10.0"
pytest = "^7.2.0"
tqdm = "^4.64.1"
jsonpickle = "^2.2.0"

[tool.pytest.ini_options]
log_cli = true
log_cli_level = "INFO"
minversion = "6.0"
addopts = "-ra"
testpaths = [
    "tests",
]

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
When I run poetry build it successfully builds the wheel, etc. I can examine the .tar.gz file in 7-Zip and confirm that the contents are:
mypkg-0.2.0
│   PKG-INFO
│   pyproject.toml
│   README.txt
│   setup.py
│
├───mypkg
│   │   a.py
│   │   b.py
│   │   __init__.py
│   │
│   └───styles
│           mypkg c.exp
│           mypkg d.exp
│
├───data
│       e.json
│       f.json
│
├───docs
│       User Guide.pdf
│
├───LICENSE
│       License.rtf
│       ThirdParty.rtf
│
├───scripts
│       other.py
│       run_mypkg.py
│
└───tests
    │   conftest.py
    │   test_g.py
    │   test_h.py
    │   __init__.py
    │
    └───data
        ├───g
        │       otherfile.txt
        │
        └───h
                differentfile.txt
So it looks like poetry is grabbing all the files I've asked it to. I install into a test conda environment using pip install mypkg-0.2.0-py3-none-any.whl. Then, looking at C:\Users\me\anaconda3\envs\test\Lib\site-packages\mypkg, I only see the core source code and the styles directory. I can't find the top-level tests/scripts/data directories anywhere. The docs got installed under C:\Users\me\anaconda3\envs\test\Lib\site-packages\docs.
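To double-check what pip will actually lay down, the wheel itself (not just the .tar.gz) can be listed too, since the wheel is what pip installs; for example, with the stdlib zipfile CLI:
python -m zipfile -l dist/mypkg-0.2.0-py3-none-any.whl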
The setup.py file that was created by poetry in the sdist (.tar.gz) looks like this:
# -*- coding: utf-8 -*-
from setuptools import setup

packages = \
['mypkg']

package_data = \
{'': ['*'], 'mypkg': ['styles/*']}

install_requires = \
['jsonpickle>=2.2.0,<3.0.0',
 'numpy>=1.23.2,<2.0.0',
 'pandas>=1.4.4,<2.0.0',
 'plotly>=5.10.0,<6.0.0',
 'portion>=2.3.0,<3.0.0',
 'pytest>=7.2.0,<8.0.0',
 'scipy>=1.9.1,<2.0.0',
 'tqdm>=4.64.1,<5.0.0']

setup_kwargs = {
    'name': 'mypkg',
    'version': '0.2.0',
    'description': 'stuff',
    'author': 'Me',
    'author_email': 'support@test.com',
    'maintainer': 'None',
    'maintainer_email': 'None',
    'packages': packages,
    'package_data': package_data,
    'install_requires': install_requires,
    'python_requires': '>=3.8,<4.0',
}

setup(**setup_kwargs)
Judging by this, it appears that setup.py is only grabbing the styles directory under mypkg, which is consistent with the observed behavior. The desired outcome is to have the installed package include the tests/scripts/docs directories.
For export control reasons, I can't publish the package to PyPI, so everything has to be self-contained. As far as I can tell my options are:
Move everything under mypkg, but that seems to go against best practices for organizing a package (a sketch of what that might look like follows this list), or
Fix some fundamental flaw in how my pyproject.toml file is set up that is resulting in setup.py not doing "the right" thing.
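For what it's worth, if I went with option 1, I assume the data files would move under the package and be read via importlib.resources; a hypothetical sketch (assuming data/ becomes mypkg/data/ with an empty __init__.py, which importlib.resources requires for a regular package on Python 3.8):
from importlib import resources

# Hypothetical layout: data/e.json moved to mypkg/data/e.json,
# plus an empty mypkg/data/__init__.py so it's a regular package.
config_text = resources.read_text("mypkg.data", "e.json")
print(config_text[:80])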
Thanks!

Related

How to define an alias for a command with options in a Powershell Profile using functions?

I already have a bunch of command aliases defined for git. Here's all of them:
del alias:gp -Force
del alias:gl -Force
del alias:gcm -Force

function get-gst { git status }
set-alias -name gst -value get-gst

function get-gco ([string]$branch) {
    git checkout "$branch"
}
set-alias -name gco -value get-gco

function get-gcob ([string]$branch) {
    git checkout -b "$branch"
}
set-alias -name gcob -value get-gcob

function get-gp { git push }
set-alias -name gp -value get-gp

function get-gl { git pull }
set-alias -name gl -value get-gl

function get-gcm ([string]$message) {
    git commit -m "$message"
}
set-alias -name gcm -value get-gcm

function get-ga ([string]$path) {
    git add "$path"
}
set-alias -name ga -value get-ga

function get-gap { git add --patch }
set-alias -name gap -value get-gap

function get-gsta { git stash push }
set-alias -name gsta -value get-gsta

function get-gstp { git stash pop }
set-alias -name gstp -value get-gstp

function get-glo { git log --oneline }
set-alias -name glo -value get-glo

function get-a ([string]$option, [string]$option2, [string]$option3) {
    php artisan "$option" "$option2" "$option3"
}
set-alias -name a -value get-a
Specifically, this is the content of Powershell's $Profile.AllUsersAllHosts
Currently, I have 2 separate functions for git checkout branch and git checkout -b branch. The aliases are respectively gco branch and gcob branch.
What I would like to have is a function similar to the last alias in my list (the one for php artisan), such that I can have something like this:
function get-gco ([string]$option, [string]$option2) {
    git checkout "$option" "$option2"
}
set-alias -name gco -value get-gco
...that would allow me to write gco -b branch or in fact pass any other options to the git checkout command. Unfortunately this does not work: upon writing gco -b newBranchName nothing happens (it is as if I had only written git checkout), but it does work if I write gco "-b" newBranchName or gco '-b' newBranchName.
Do you think it's possible to make the function such that the quotes aren't needed in PowerShell, and how would you go about doing it? Alternatively, would it be possible in another command-line interface, for example git bash?
Splat the $args automatic variable:
function get-gco {
    git checkout @args
}
This will pass any additional arguments to git checkout as-is, so now you can do either:
get-gco existingBranch
# or
get-gco -b newBranch
... with the same function

Properly invoking PowerShell.exe from cmd.exe - -File, double-quotes, and single-quotes

Let's say I have the following simple PowerShell script in c:\temp\file-data-alt.ps1:
Param ($filename, $data)
Write-Output ($filename)
Write-Output ($data)
If I invoke it from cmd.exe as follows I get the expected output:
C:\Temp>PowerShell.exe -File c:\temp\file-data-alt.ps1 -Filename "c:\abc bcd cde" -Data "123 234 345"
c:\abc bcd cde
123 234 345
If I remove -File, we get unexpected results:
C:\Temp>PowerShell.exe c:\temp\file-data-alt.ps1 -Filename "c:\abc bcd cde" -Data "123 234 345"
c:\abc
123
If I leave off -File but switch the double-quotes to single-quotes, it appears to work:
C:\Temp>PowerShell.exe c:\temp\file-data-alt.ps1 -Filename 'c:\abc bcd cde' -Data '123 234 345'
c:\abc bcd cde
123 234 345
And, now, if we add -File back in, we get unexpected results:
C:\Temp>PowerShell.exe -File c:\temp\file-data-alt.ps1 -Filename 'c:\abc bcd cde' -Data '123 234 345'
'c:\abc
'123
I'm assuming that the first approach (-File with double-quotes) is the proper approach. Is this the case?
It seems to kinda work when you leave -File off especially if you use single-quotes. Clearly, leaving off -File has some sort of function. When would you want to take that approach? Is there a legitimate use case for leaving -File off?
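For what it's worth, omitting -File makes PowerShell treat the arguments as -Command, i.e., it re-parses them as PowerShell source code (after the initial quote-stripping) rather than passing them literally to the script. A sketch of the explicit equivalent of the no-File calls above, assuming that reading is right:
C:\Temp>PowerShell.exe -Command "& 'c:\temp\file-data-alt.ps1' -Filename 'c:\abc bcd cde' -Data '123 234 345'"
That re-parsing is also the legitimate use case for leaving -File off: running arbitrary expressions or pipelines instead of a script file.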

Creating a sub-folder structure

Looking for some advice on how to use PowerShell (or some other means) to create some folders.
I have a list of around 120 products that I've allocated separate folders. (using a CSV to generate the folders)
I want each product folder to have the same subfolder structure. As shown below:
[Products]
├── [Product 1]
│   ├── [1. Datasheet]
│   │   ├── [1. Current]
│   │   ├── [2. Archived]
│   │   ├── [3. Legacy]
│   │   ├── [4. Draft]
│   │   └── [5. Resources]
│   ├── [2. Images]
│   ├── [3. Brochures]
│   ├── [4. Manuals]
│   └── [5. Software]
│
├── [Product 2]
│   ├── [1. Datasheet]
│   │   ├── [1. Current]
│   │   ├── [2. Archived]
│   │   ├── [3. Legacy]
│   │   ├── [4. Draft]
│   │   └── [5. Resources]
│   ├── [2. Images]
│   ├── [3. Brochures]
│   ├── [4. Manuals]
│   └── [5. Software]
:
:
Essentially the first layer of subfolders in each would be:
[1. Datasheet], [2. Images], [3. Brochures], [4. Manuals], [5. Software]
Inside each of these would be the following:
[1. Current], [2. Archived], [3. Legacy], [4. Draft], [5. Resources]
I don't mind doing this in stages, it's just I don't know where to begin.
This could work:
$workingdir = 'c:\temp'
$products = Get-Content c:\temp\listofproducts.txt
$rootfolders = @(
    'Datasheet'
    'Images'
    'Brochures'
    'Manuals'
    'Software'
)
$subfolders = @(
    'Current'
    'Archived'
    'Legacy'
    'Draft'
    'Resources'
)
foreach ($product in $products)
{
    $rootcount = 0
    foreach ($root in $rootfolders)
    {
        $rootcount++
        $subcount = 0
        foreach ($sub in $subfolders)
        {
            $subcount++
            mkdir (Join-Path $workingdir ("$product\$rootcount. $root\$subcount. $sub"))
        }
    }
}
Or you could just create the first product folder, then copy and paste it and rename the product.
Thanks for the input.
Managed to find a strategy that worked for me, so will share it in case it's of use to anyone else like me.
Used a combination of the following 3 bits of code to achieve what I needed:
# Create folders from CSV
$folder = "Z:\Products\"
$name = Import-Csv Z:\Products\names.csv
foreach ($line in $name)
{
    New-Item -Path $folder -Name $line.Name -Type Directory
}
The code above allowed me to make a big list of folders from a CSV list made in Excel.
# Add a subfolder
foreach ($folder in (gci 'Z:\Products' -Directory)) {
    New-Item -ItemType Directory -Path ($folder.FullName + "\subfolder")
}
The code above let me populate the list of folders with subfolders. I added them one at a time.
# Copy a file to each folder
$folders = Get-ChildItem Z:\Products
foreach ($folder in $folders.Name) {
    Copy-Item -Path "Z:\Datasheet\changelog.txt" -Destination "Z:\Products\$folder" -Recurse
}
The code above allowed me to copy items to subfolder locations.

How do I update all NuGet packages at once with the dotnet CLI?

I'm trying to update all NuGet packages for a solution in VS Code (using Mac). Is there a way to achieve that in VS Code or for a specific project.json file? At the moment I'm going one by one, but I would have thought there is either an extension or a feature that does that for you.
To update all packages in all projects, the NuGet Package Manager GUI extension can do it with one click.
How it works
Open your project workspace in VSCode
Open the Command Palette (Ctrl+Shift+P)
Select > Nuget Package Manager GUI
Click Load Package Versions
Click Update All Packages
Based on Jon Canning's powershell solution. I fixed a small bug where only the first dependency was being updated and not all the dependencies for the project file.
$regex = 'PackageReference Include="([^"]*)" Version="([^"]*)"'
ForEach ($file in get-childitem . -recurse | where {$_.extension -like "*proj"})
{
    $packages = Get-Content $file.FullName |
        select-string -pattern $regex -AllMatches |
        ForEach-Object {$_.Matches} |
        ForEach-Object {$_.Groups[1].Value.ToString()} |
        sort -Unique

    ForEach ($package in $packages)
    {
        write-host "Update $file package :$package" -foreground 'magenta'
        $fullName = $file.FullName
        iex "dotnet add $fullName package $package"
    }
}
Here's a shell script and a powershell script that will do this
#!/bin/bash
regex='PackageReference Include="([^"]*)" Version="([^"]*)"'
find . -name "*.*proj" | while read proj
do
    while read line
    do
        if [[ $line =~ $regex ]]
        then
            name="${BASH_REMATCH[1]}"
            version="${BASH_REMATCH[2]}"
            if [[ $version != *-* ]]
            then
                dotnet add $proj package $name
            fi
        fi
    done < $proj
done
$regex = [regex] 'PackageReference Include="([^"]*)" Version="([^"]*)"'
ForEach ($file in get-childitem . -recurse | where {$_.extension -like "*proj"})
{
    $proj = $file.fullname
    $content = Get-Content $proj
    $match = $regex.Match($content)
    if ($match.Success) {
        $name = $match.Groups[1].Value
        $version = $match.Groups[2].Value
        if ($version -notin "-") {
            iex "dotnet add $proj package $name"
        }
    }
}
Should also mention Paket as a fantastic alternative package manager that supports update:
https://fsprojects.github.io/Paket/index.html
dotnet tool install paket --tool-path .paket
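The update itself would then be something like this (a sketch, assuming the tool-path install above):
.paket/paket update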
Also have a look at dotnet outdated:
https://github.com/dotnet-outdated/dotnet-outdated
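A minimal usage sketch (the tool package is published as dotnet-outdated-tool, if I recall the ID correctly; --upgrade applies the updates it finds):
dotnet tool install --global dotnet-outdated-tool
dotnet outdated --upgrade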
UPDATE 2023/01
The current way to do this from the command line seems to be this:
https://github.com/dotnet-outdated/dotnet-outdated
OLD
This seems to work https://nukeeper.com/
dotnet tool install nukeeper --global
nukeeper update <SLN/PROJ>
UPDATE
The default settings on nukeeper seem slightly odd to me as running nukeeper update will only update a single package, and only if it is a major version that is more than 3 days old.
To update to the latest non-prerelease version of everything run:
nukeeper update -a 0 -m 1000
And for prerelease:
nukeeper update -a 0 -m 1000 --useprerelease Always
The -m 1000 flag is a synonym for everything, assuming that you have less than 1000 packages in your solution / project.
I wrote this powershell script to keep packages up to date on GitHub.
To update all packages of the solution, I first use dotnet sln list.
Then for each project I get the list of outdated packages with dotnet list package --outdated, which gives the latest version of each outdated package.
And for each package I update the project with dotnet add package {package name} --version {new version}.
Full code:
# Update one project's packages
function UpdatePackages {
    param (
        $project
    )
    $return = $false
    # Get outdated packages
    $packageLineList = dotnet list $project package --outdated
    foreach ($line in $packageLineList) {
        Write-Host $line
        $match = $line -match '>\s(\S*)\s*\S*\s*\S*\s*(\S*)'
        if (!$match) {
            # The line doesn't contain package information, continue
            continue
        }
        # Update an outdated package
        $added = dotnet add $project package $Matches.1 --version $Matches.2
        if ($LASTEXITCODE -ne 0) {
            # Error while updating the package
            Write-Error "dotnet add $project package $Matches.1 --version $Matches.2 exit with code $LASTEXITCODE"
            Write-Host $added
            break
        }
        $return = $true
    }
    return $return
}

# Restore dependencies
dotnet restore

# Get the list of all projects in the solution
$projectList = dotnet sln list
$updated = $false
foreach ($path in $projectList) {
    if ($path -eq "Project(s)" -or $path -eq "----------") {
        # The line doesn't contain a path, continue
        continue
    }
    # Update project dependencies
    $projectUpdated = UpdatePackages -project $path
    if ($LASTEXITCODE -ne 0) {
        # The update failed, exit
        exit $LASTEXITCODE
    }
    $updated = $updated -or $projectUpdated
}
if (!$updated) {
    # No packages to update found, exit
    Write-Host "nothing to update"
    exit 0
}
For CLI: as already mentioned in the comments, there is a package to perform updates:
https://github.com/dotnet-outdated/dotnet-outdated
From the UI: in case someone is still looking for an answer, with VS 2019 this is pretty easy :)
Right-click the solution and choose "Manage NuGet Packages for Solution".
It opens a window where, on selecting a package, the projects appear on the right and the packages can be updated :)
Based on Jon Canning's answer, I've written this small bash script to add to .bashrc (or change a bit to keep it in its own bash file):
function read_solution() {
    echo "Parsing solution $1"
    while IFS='' read -r line || [[ -n "$line" ]]; do
        if [[ $line =~ \"([^\"]*.csproj)\" ]]; then
            project="${BASH_REMATCH[1]}"
            read_project "$(echo "$project" | tr '\\' '/')"
        fi
    done < "$1"
}

function read_project() {
    echo "Parsing project $1"
    package_regex='PackageReference Include="([^"]*)" Version="([^"]*)"'
    while IFS='' read -r line || [[ -n "$line" ]]; do
        if [[ $line =~ $package_regex ]]; then
            name="${BASH_REMATCH[1]}"
            version="${BASH_REMATCH[2]}"
            if [[ $version != *-* ]]; then
                dotnet add "$1" package "$name"
            fi
        fi
    done < "$1"
}

function dotnet_update_packages() {
    has_read=0
    if [[ $1 =~ \.sln$ ]]; then
        read_solution "$1"
        return 0
    elif [[ $1 =~ \.csproj$ ]]; then
        read_project "$1"
        return 0
    elif [[ $1 != "" ]]; then
        echo "Invalid file $1"
        return 1
    fi
    for solution in ./*.sln; do
        if [ ! -f "${solution}" ]; then
            continue
        fi
        read_solution "${solution}"
        has_read=1
    done
    if [[ $has_read -eq 1 ]]; then
        return 0
    fi
    for project in ./*.csproj; do
        if [ ! -f "${project}" ]; then
            continue
        fi
        read_project "${project}"
    done
}
export -f dotnet_update_packages
To use it, either run it without parameters in a folder with a solution: it will first look for all solution files in the current folder and run for all csproj files referenced in those (this might need to be changed if you work with something other than C#).
If no solution is found, it looks for all csproj files in the current directory and runs for those.
You can also pass a .sln or .csproj file as argument:
dotnet_update_packages mysolution.sln
dotnet_update_packages myproject.csproj
I'm not a bash expert so I'm sure it can be improved
Nukeeper seems to be an excellent tool for the job. We're even using it in nightly builds to keep the internal libraries up-to-date. After installing the tool, in the solution folder use a command like:
nukeeper update --age 0 --maxpackageupdates 1000 --change Major --useprerelease Never
I created a cake build task to do the same. See below:
Task("Nuget-Update")
.Does(() =>
{
var files = GetFiles("./**/*.csproj");
foreach(var file in files)
{
var content = System.IO.File.ReadAllText(file.FullPath);
var matches = System.Text.RegularExpressions.Regex.Matches(content, #"PackageReference Include=""([^""]*)"" Version=""([^""]*)""");
Information($"Updating {matches.Count} reference(s) from {file.GetFilename()}");
foreach (System.Text.RegularExpressions.Match match in matches) {
var packageName = match.Groups[1].Value;
Information($" Updating package {packageName}");
var exitCode = StartProcess("cmd.exe",
new ProcessSettings {
Arguments = new ProcessArgumentBuilder()
.Append("/C")
.Append("dotnet")
.Append("add")
.Append(file.FullPath)
.Append("package")
.Append(packageName)
}
);
}
}
});
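Invocation depends on your bootstrapper; with the Cake .NET tool and the usual Argument("target")/RunTarget wiring (assumed here, not shown in the task above), it would be something like:
dotnet cake build.cake --target=Nuget-Update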
If you're using Visual Studio, it can be done quite easily.
Step 1
Right-click the solution and click 'Manage NuGet Packages for Solution'.
Step 2
Go to the Updates tab, check Select all packages, and hit the Update button.

How to find files based on other files in current directory, with find command?

I want to find all mkv files that don't have a same-name ass/srt file in the same folder.
How can I do that?
For example, I have the following directory:
folder_1
├── folder_2
│   ├── a.mkv
│   └── a.srt
├── folder_3
│   └── b.mkv
└── folder_4
    ├── c.mkv
    └── c.ass
The search result should be: folder_1/folder_3/b.mkv.
Many Thanks.
I got an answer from my friends; sharing it here:
find . -name "*.mkv" -o -name "*.ass" -o -name "*.srt" | sort | rev | uniq -s 3 -u | rev | grep ".mkv"
BTW, if you are using a Synology NAS, which does not have the 'rev' command, you can work around it with a small Python script (rev.py):
import sys

# reverse each argument and wrap it back in double quotes
if __name__ == '__main__':
    if len(sys.argv) >= 2:
        for arg in sys.argv[1:]:
            print('"' + arg[::-1] + '"')
and the pipeline then becomes:
find . -name "*.mkv" -o -name "*.ass" -o -name "*.srt" | sort | awk '{print "\"", $0, "\""}' OFS="" | xargs python rev.py | uniq -s 3 -u | xargs python rev.py | grep ".mkv\""
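A simpler alternative sketch that checks each .mkv for a sibling subtitle directly, instead of the rev/uniq trick (assumes bash and filenames without newlines):
#!/bin/bash
find . -name '*.mkv' | while read -r f; do
    base="${f%.mkv}"
    # keep the movie only if neither subtitle variant exists next to it
    if [[ ! -e "$base.srt" && ! -e "$base.ass" ]]; then
        printf '%s\n' "$f"
    fi
done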