TypeError in subprocess command for youtube-dl

I'm trying to write a very simple script which takes a .csv file and runs youtube-dl (with specified arguments) for each link in the file, saving the files to a certain directory.
The format of the csv is Artist;Title;Link. And the script:
import pandas as pd
import subprocess

def get_music(csv):
    df = pd.read_csv(csv, sep=";", skipinitialspace=True)
    for _, row in df.iterrows():
        subprocess.call(['youtube-dl', "x",
                         "--output ~/mydir/%(title)s.%(ext)s",
                         "--extract-audio", "--youtube-skip-dash-manifest",
                         "--prefer-ffmpeg", "--audio-format", "mp3"], row.Link)

get_music("CSV.csv")
When I run this however, I get the following error:
"raise TypeError("bufsize must be an integer")
TypeError: bufsize must be an integer"
I'm afraid I don't understand how the bufsize is getting passed something other than an integer. Simply put, what am I doing wrong, and how should I fix it?

Currently, the second argument to subprocess.call (which specifies bufsize) is row.Link, which seems to be the URL you want to download. Instead of "x", pass in the actual link. Also, there is no option "--output ~/mydir/%(title)s.%(ext)s", since option names do not contain spaces; pass the option and its value as separate list items. Most likely, you want
subprocess.call(['youtube-dl', row.Link,
                 "--output", "~/mydir/%(title)s.%(ext)s",
                 "--extract-audio", "--youtube-skip-dash-manifest",
                 "--prefer-ffmpeg", "--audio-format", "mp3"])

Related

Why is my specified range in PROC IMPORT being ignored?

I am trying to import a set of exchange rates. The data set looks like this:
That is to say the actual data should be read from row 5 and downwards from the sheet named "Växelkurser". The variable names should be read from row 4.
I try writing the following code:
PROC IMPORT
    DATAFILE="/opt3/01_Dataleveranser/03_IBIS/Inläsning/IBIS3/Växelkurser macrobond/Växelkurser19DEC2022.xlsx"
    OUT=WORK.VALUTOR_0000
    DBMS=xlsx
    REPLACE;
    sheet="Växelkurser";
    getnames=yes;
    range="Växelkurser$A4:0";
RUN;
And I get the following result:
I clearly specified that SAS should start reading from the fourth row and that the variable names should be read from that row. Why is this being ignored and how would I make this work?
The problem seems to be that you are specifying both sheet= and range=. The sheet= statement tells SAS to read the whole sheet, and it appears to override the later range= statement.
Remove the following line and the code should work as expected:
sheet="Växelkurser";
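For reference, the full corrected step would then be (same paths and options as in the question, with only the sheet= line removed; the range= value already names the sheet):

```sas
PROC IMPORT
    DATAFILE="/opt3/01_Dataleveranser/03_IBIS/Inläsning/IBIS3/Växelkurser macrobond/Växelkurser19DEC2022.xlsx"
    OUT=WORK.VALUTOR_0000
    DBMS=xlsx
    REPLACE;
    getnames=yes;
    range="Växelkurser$A4:0";
RUN;
```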

Return a dataframe from another notebook in databricks

I have a notebook which processes a file and creates a data frame in a structured format.
Now I need to import that data frame into another notebook, but the problem is that I only need to run the notebook for some scenarios, so I have to validate before running it.
Usually to import all data structures, we use %run. But in my case it should be a combination of an if clause and then a notebook run:
if "dataset" in path: %run ntbk_path
This gives the error "path not exist".
if "dataset" in path: dbutils.notebook.run(ntbk_path)
With this one I cannot get all the data structures.
Can someone help me resolve this error?
To implement it correctly you need to understand how things are working:
%run is a separate directive that should be put into the separate notebook cell, you can't mix it with the Python code. Plus, it can't accept the notebook name as variable. What %run is doing - it's evaluating the code from specified notebook in the context of the current Spark session, so everything that is defined in that notebook - variables, functions, etc. is available in the caller notebook.
dbutils.notebook.run is a function that may take a notebook path, plus parameters and execute it as a separate job on the current cluster. Because it's executed as a separate job, then it doesn't share the context with current notebook, and everything that is defined in it won't be available in the caller notebook (you can return a simple string as execution result, but it has a relatively small max length). One of the problems with dbutils.notebook.run is that scheduling of a job takes several seconds, even if the code is very simple.
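As a concrete illustration of the "simple string" return value mentioned above (a sketch; dbutils is only available inside Databricks, so the dbutils calls are shown as comments and the returned string is stood in by a literal):

```python
import json

# In the called notebook (Databricks only), a small result can be handed back:
#   dbutils.notebook.exit(json.dumps({"view": "some_name", "rows": 1000}))
#
# In the caller, dbutils.notebook.run(...) returns that string:
#   result = dbutils.notebook.run(ntbk_path, 300)
result = json.dumps({"view": "some_name", "rows": 1000})  # stand-in for the run call
payload = json.loads(result)
print(payload["view"])  # the caller can then, e.g., read a registered temp view
```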
How can you implement what you need?
If you use dbutils.notebook.run, then in the called notebook you can register a temp view, and the caller notebook can read data from it (examples are adapted from this demo).
Called notebook (call it Code1; it requires two parameters: name for the view name and n for the number of entries to generate):
name = dbutils.widgets.get("name")
n = int(dbutils.widgets.get("n"))
df = spark.range(0, n)
df.createOrReplaceTempView(name)
Caller notebook (let's call it main):
if "dataset" in "path":
    view_name = "some_name"
    dbutils.notebook.run(ntbk_path, 300, {'name': view_name, 'n': "1000"})
    df = spark.sql(f"select * from {view_name}")
    # ... work with data
It's even possible to do something like %run, but it requires a kind of "magic". The foundation of it is the fact that you can pass arguments to the called notebook using $arg_name="value", and you can even refer to values specified in widgets. But in any case, the check of the value will happen in the called notebook.
The called notebook could look as follows:
flag = dbutils.widgets.get("generate_data")
dataframe = None
if flag == "true":
    dataframe = ...  # create dataframe
and the caller notebook could look as follows:
------ cell in Python
if "dataset" in "path":
    gen_data = "true"
else:
    gen_data = "false"
dbutils.widgets.text("gen_data", gen_data)
------- cell for %run
%run ./notebook_name $generate_data=$gen_data
------ again in Python
dbutils.widgets.remove("gen_data")  # remove the widget
if dataframe:  # dataframe is defined
    # do something with dataframe

Convert PSS/E .raw file to Pandapower

I'm trying to find a possible way to convert PSS/E native .raw files to Pandapower format.
My objective is to take advantage of the network plotting capabilities that are available in Pandapower.
For that, I have to first be able to load my grid data into Pandapower.
For that, I have to somehow bridge the gap between PSSE .raw to Pandapower.
Literature says that a possible way of doing this is by using the 'psse2mpc' function available in Matpower.
I've tried to use it but I get the following error message:
>> psse2mpc('RED1523.raw')
Reading file 'RED1523.raw' ............................................. done.
Splitting into individual lines ...error: regexp: the input string is invalid UTF-8
error: called from
psse_read at line 60 column 9
psse2mpc at line 68 column 21
I was informed that maybe I should save my .raw file (natively generated with PSS/E v33) in an older .raw format (corresponding to previous PSS/E versions).
I've tried this as well, but I still get the same error message.
Apart from this error, which so far prevents me from reaching my objective, I've been unable to work out the Pandapower "equivalent .raw" structure. Does anybody know what this input structure looks like in Pandapower?
If I knew how Pandapower needs to receive the input data, I could even try to write a tailor-made Python script that converts my .raw file into whatever Pandapower requires.
If somebody could help me get out of this labyrinth I would be most grateful!
Thanks.
Eneko.
You need to check your .raw file to determine the other inputs of the psse2mpc function. For instance, if I have the file case39.raw and I want to convert it to Matpower format as case39mpc.m, then I must enter something like this:
psse2mpc('case39.raw', 'case39mpc.m', '1', '29')

For loop to open files in Python

I am relatively new to Python and need to run a Python macro through Abaqus. I am opening files, e.g. "nonsym1, nonsym2, nonsym3", and trying to do this with a loop. The code opens nonsym1 (in Abaqus) and performs some operations on it, then is supposed to loop back and do the same to the other files. Here is the code I'm trying:
for i in range(1, 10):
    filename = 'nonsym(i)'
    step = mdb.openStep(
        'C:/Users/12345678/Documents/Inventor/Aortic Dissection/%s.stp' % filename,
        scaleFromFile=OFF)
My main issue is coming from the %s in the directory path, I think?
[error message when trying to run this macro] I don't know how best to approach this, so any help would be great, thanks! Still learning!
Instead of using filename = nonsym1, nonsym2, ..., name the step files as integers (1.stp, 2.stp, 3.stp) and then convert the integer to a string with str(i).
And use the code below:
for i in range(1, 10):
    step = mdb.openStep(
        'C:/Users/12345678/Documents/Inventor/Aortic Dissection/%s.stp' % str(i),
        scaleFromFile=OFF)
To obtain an equal quantity of .odb files, modify the Job code line in a similar way.
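An alternative that keeps the original file names (a sketch, not part of the original answer): the bug in the question is that 'nonsym(i)' is a literal string, so the loop index is never substituted into the name. Ordinary string formatting fixes this without renaming any files:

```python
# Build "nonsym1" ... "nonsym9" via string formatting; in the question,
# 'nonsym(i)' was a literal string and i was never substituted into it.
paths = []
for i in range(1, 10):
    filename = 'nonsym%d' % i
    path = 'C:/Users/12345678/Documents/Inventor/Aortic Dissection/%s.stp' % filename
    paths.append(path)
print(paths[0])
```

Each resulting path can then be passed to mdb.openStep exactly as in the question's loop.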

How to prevent `ssconvert` recalculating Excel file before conversion?

I am trying to convert the .xlsx file http://www.eia.gov/forecasts/steo/archives/mar14_base.xlsx into a .csv, but it seems that the .xlsx contains formulae that link to a local file I don't have (the creator of the file must have forgotten to paste as values instead of as formulae),
so each time I use ssconvert it tries to recalculate the formulae, which fails, hence I can't get the data:
ssconvert --export-type=Gnumeric_stf:stf_assistant -O "locale=C format=automatic separator=, eol=unix sheet='3atab'" "STEO_m.xlsx" "text.csv"
triggers the following message (and the values inside the .csv are missing):
(/usr/bin/ssconvert:14771): GLib-GObject-WARNING **:
g_object_set_valist: object class 'SheetObjectImage' has no property named `style' '7etab'!BM7 :
'VLOOKUP($A7,[1]oracle_allbbb!$2:$89130,Dates!BM$12+1,FALSE)' Invalid expression
I have seen that there is also a --recalc argument to ssconvert, but actually I want to do the opposite!
ssconvert --recalc=FALSE --export-type=Gnumeric_stf:stf_assistant -O "locale=C format=automatic separator=, eol=unix sheet='3atab'" "STEO_m.xlsx" "text.csv"
Is there any piece of advice you could give me to find a solution here?
Apparently it was a bug. It is fixed here.