How to import EPF data using EPFImporter.py provided by Apple

I tried following this link: http://www.apple.com/itunes/affiliates/resources/documentation/epfimporter.html
-----------------------
*Below is the command I executed:*
C:\Documents and Settings\freakk>python D:\freakk\Downloads\EPF_Itunes\EPFImporter\EPFimporter.py \D:\freakk\Downloads\EPF_Itunes\EPFImporter\db\album_popularity_per_genre
-----------------------
*But I am getting these errors:*
2011-10-12 18:24:00,529 [INFO]: Beginning import for the following directories:
\D:\freakk\Downloads\EPF_Itunes\EPFImporter\db\album_popularity_per_genre
2011-10-12 18:24:00,529 [INFO]: Importing files in \D:\freakk\Downloads\EPF_Itunes\EPFImporter\db\album_popularity_per_genre
Traceback (most recent call last):
  File "D:\freakk\Downloads\EPF_Itunes\EPFImporter\EPFimporter.py", line 452, in <module>
    main()
  File "D:\freakk\Downloads\EPF_Itunes\EPFImporter\EPFimporter.py", line 435, in main
    fieldDelim=fieldSep)
  File "D:\freakk\Downloads\EPF_Itunes\EPFImporter\EPFimporter.py", line 162, in doImport
    fileList = os.listdir(dirPath)
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: 'C:\\D:\\freakk\\Downloads\\EPF_Itunes\\EPFImporter\\db\\album_popularity_per_genre/*.*'
Please help me.

Look at the error log: it says the path syntax is incorrect.
C:\\D:\\freakk\\Downloads\\EPF_Itunes\\EPFImporter\\db\\album_popularity_per_genre/*.*
How can the D: drive be inside C:? The script is not getting the correct path. The directory argument was typed with a leading backslash (\D:\...), so Windows treats it as a path rooted on the current drive, which is C:.
EPFImporter was written mainly with Mac OS in mind and seems to assume that you run it from the same directory as EPFImporter.py; on Mac OS everything lives under a single directory tree, so there are no drive letters to get wrong.
C:\Documents and Settings\freakk>python D:\freakk\Downloads\EPF_Itunes\EPFImporter\EPFimporter.py \D:\freakk\Downloads\EPF_Itunes\EPFImporter\db\album_popularity_per_genre
The command above will not find album_popularity_per_genre.
Switch from drive C: to drive D:, go to the directory that contains EPFImporter.py, and then try:
.....EPFImporter>python EPFImporter.py db\album_popularity_per_genre
This assumes you are in the EPFImporter folder. I have not tested it, but something like this may work for you. I hope this makes things a bit clearer.
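To see why the error message shows C:\\D:\\..., here is a minimal sketch that uses Python's ntpath module (the Windows flavour of os.path, importable on any platform) to approximate how a path with a leading backslash is combined with the current directory. The helper name resolve_on_windows is made up for this illustration and is not part of EPFImporter.

```python
import ntpath  # Windows path rules, usable on any platform

def resolve_on_windows(cwd, arg):
    """Approximate how a command-line path is resolved against the current directory."""
    return ntpath.normpath(ntpath.join(cwd, arg))

cwd = r"C:\Documents and Settings\freakk"
bad = r"\D:\freakk\Downloads\EPF_Itunes\EPFImporter\db\album_popularity_per_genre"
good = r"D:\freakk\Downloads\EPF_Itunes\EPFImporter\db\album_popularity_per_genre"

# The leading backslash means "root of the current drive", so C: is kept
resolved_bad = resolve_on_windows(cwd, bad)
# A path that starts with a drive letter replaces the current drive
resolved_good = resolve_on_windows(cwd, good)

print(resolved_bad)
print(resolved_good)
```

Dropping the leading backslash from the argument should therefore be enough, even without changing directories first.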

Solved!
I was trying to import only partial data, without the main table.
I then tried importing the flat feed, and it worked.
Command:
For the flat feed:
C:\Documents and Settings\freakk>python c:\epf\epfimporter.py -f c:\epf\db\application-usa-20111012
Note: Don't include the file name (application-usa-20111012.txt); stop at the folder name (e.g. application-usa-20111012).

Related

Relative path (pathlib) works on macOS but gives me an error on Windows

I am currently working on a project that uses the pathlib library so that I can work on my Windows desktop when I need to and on my MacBook Pro; essentially, I want to be able to move between both operating systems. I have not had any issues at all until now. Here is the setup:
I have a pipeline that automatically saves a .joblib file and a whole lot of .png files to a directory called
output_dir = Path('../Trained_Models/Differential_gene_analysis/A Kidney Cancer Transcriptome Molecular Signature Identifies Tumors with Tumor Thrombus/Models train on TCGA data and test on Rodriguez data/Oct-XX-20XX')
For example, if I want to save a .joblib file under the name RandomForest_TumorThrombus_104.joblib, I would use the command
joblib.dump(model, output_dir / 'RandomForest_TumorThrombus_104.joblib')
On my MacBook Pro I have no issues when this is run, but on Windows it gives me the following error:
FileNotFoundError: [Errno 2] No such file or directory: '..\\Trained_Models\\Differential_gene_analysis\\A Kidney Cancer Transcriptome Molecular Signature Identifies Tumors with Tumor Thrombus\\Models train on TCGA data and test on Rodriguez data\\Oct-17-2022\\RandomForest_TumorThrombus_104.joblib'
I have tried using the .resolve() method to get the absolute path, but it still gives me the same error. I have also experimented to see what is going on, for example with os.path.exists(). It returns True for the following command:
os.path.exists(output_dir)
So it does indeed recognize that the directory exists. The next thing I tried was renaming the file to something like dddddd.joblib, and that worked. However, I find that only a few file names allow me to save the files. While debugging, I found that the most recent call in the traceback is here:
with open(filename, 'wb') as f:
I was wondering if anyone here has any idea what is going on and how I can fix this issue. Thank you.
The solution was to enable long paths on Windows.
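Long paths matter here because Windows traditionally limits a full path to 260 characters (MAX_PATH, which counts the terminating NUL). The relative path in the error is already well over 200 characters, so once the absolute prefix is added it plausibly crosses that limit, which would also explain why a very short file name worked. Below is a minimal sketch for checking this ahead of time; the base directory and file names are hypothetical, and the helper exceeds_max_path is invented for this illustration.

```python
import ntpath  # Windows path rules, usable on any platform

MAX_PATH = 260  # traditional Windows limit; it includes the terminating NUL

def exceeds_max_path(base_dir, relative_path, limit=MAX_PATH):
    """Return True if the absolute path would not fit in the traditional limit."""
    full = ntpath.normpath(ntpath.join(base_dir, relative_path))
    return len(full) > limit - 1

# Hypothetical working directory and file names, for illustration only
base = r"C:\Users\me\project\notebooks"
long_name = "..\\Trained_Models\\" + "x" * 240 + ".joblib"
short_name = "..\\Trained_Models\\dddddd.joblib"

long_is_too_long = exceeds_max_path(base, long_name)
short_is_too_long = exceeds_max_path(base, short_name)
print(long_is_too_long, short_is_too_long)
```

On Windows 10 and later, long-path support is typically switched on through the LongPathsEnabled registry value or the matching group policy setting.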

Deploying a Streamlit app: trouble referring to datasets

Hello, I have one more problem with deploying my app with Streamlit. It works locally, but when I upload it to GitHub it doesn't work. I have no idea what's wrong. It seems that there is a problem with the path to the file:
File "/app/streamlit/bobrza.py", line 14, in <module>
    bobrza_locations = pd.read_csv(location)
Here is a link to my GitHub repo. I will be very grateful for help. Thanks in advance.
https://github.com/Bordonous/streamlit
The problem is that you are hard-coding the paths of bobrza1.csv and route.csv to their locations on your computer, so when the code runs in a different environment those paths are not valid.
The solution is to make location independent of the environment the code runs in. For this we will use the following:
__file__ variable - the path to the current Python module (the .py file).
os.path.dirname() - a function that returns the directory name of a path.
os.path.abspath() - a function that returns a normalized, absolute version of a path.
os.path.join() - a function that joins one or more path components.
Now change the location and location2 variables in your code to the following:
import os
# get the absolute path to the directory containing the .csv files
dir_name = os.path.abspath(os.path.dirname(__file__))
# join bobrza1.csv to the directory to get the file path
location = os.path.join(dir_name, 'bobrza1.csv')
# join route.csv to the directory to get the file path
location2 = os.path.join(dir_name, 'route.csv')
This makes the paths to bobrza1.csv and route.csv independent of the environment.
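The same idea can also be written with pathlib. This is only a sketch of an equivalent approach, not what the answer above prescribes; the helper name data_file is invented for the illustration.

```python
from pathlib import Path

def data_file(module_file, name):
    """Return the absolute path of `name` located next to `module_file`."""
    return Path(module_file).resolve().parent / name

# Inside bobrza.py this would be called with the module's own __file__:
#     location = data_file(__file__, 'bobrza1.csv')
#     location2 = data_file(__file__, 'route.csv')
example = data_file('/app/streamlit/bobrza.py', 'bobrza1.csv')
print(example)
```

pandas.read_csv accepts a Path object, so the result can be passed to it directly.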

pyspark: IOError: [Errno 20] Not a directory

I am running a pyspark job on AWS-EMR and I got the following error:
IOError: [Errno 20] Not a directory: '/mnt/yarn/usercache/hadoop/filecache/12/my_common-9.0-py2.7.egg/my_common/data_tools/myData.yaml'
Does anyone know what I might have missed? Thanks!
I've run into this recently when I switched my Python Spark application from Client deploy mode to Cluster deploy mode.
My workaround is to locate the ZIP file (the artifact that I fed to spark-submit using --py-files):
import os
CURRENT_FILE_PATH = os.path.dirname(__file__)
print("[DEBUG] CURRENT_FILE_PATH=" + CURRENT_FILE_PATH)
It comes out as something like this:
/mnt2/yarn/usercache/task/appcache/application_1638998214637_0019/container_1638998214637_0019_02_000001/something.zip
Then I can use something like:
import json
import zipfile
archive = zipfile.ZipFile(CURRENT_FILE_PATH, 'r')
json_bytes = archive.read('myfile.json')
json_data = json.loads(json_bytes)
Note: I first tried using pkg_resources, but couldn't read in the resulting JSON because json.loads() raised a TypeError:
import pkg_resources
json_data = pkg_resources.resource_stream(__name__, 'myfile.json')
See also PySpark: how to resolve path of a resource file present inside the dependency zip file
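The zipfile workaround above could be wrapped in a small helper along these lines. This is a sketch under assumptions: the function name load_json_from_zip is made up, and the demonstration builds an in-memory archive rather than using the real artifact passed to spark-submit.

```python
import io
import json
import zipfile

def load_json_from_zip(archive, member):
    """Read `member` from a ZIP archive (path or file-like object) and parse it as JSON."""
    with zipfile.ZipFile(archive, "r") as zf:
        raw = zf.read(member)  # ZipFile.read returns bytes
    # Decode explicitly so this also works where json.loads only accepts str
    return json.loads(raw.decode("utf-8"))

# Self-contained demonstration with an in-memory archive
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("myfile.json", '{"answer": 42}')
buf.seek(0)

config = load_json_from_zip(buf, "myfile.json")
print(config)
```

In the Spark job, the first argument would be the ZIP path found via os.path.dirname(__file__) as shown above.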
As the error states, my_common-9.0-py2.7.egg is not a directory.
Are you missing a space in your path?
/mnt/yarn/usercache/hadoop/filecache/12/my_common-9.0-py2.7.egg /my_common/data_tools/myData.yaml

Make packages installed in virtualenv visible to Sphinx

I am using Sphinx to document my software, and I am using a virtualenv for the installation. Some packages are installed only in the virtual environment, and Sphinx does not see them.
I have this code in my conf.py:
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
p = os.path.abspath('..')
sys.path.insert(0, p)
if 'VIRTUAL_ENV' in os.environ:
    q = os.sep.join([os.environ['VIRTUAL_ENV'],
                     'lib', 'python2.7', 'site-packages'])
    sys.path.insert(0, q)
    p = p + ":" + q
os.environ['PYTHONPATH'] = p
Yet when I run make html, I get warnings like this:
/home/mario/Local/github/Bauble/bauble.classic/doc/api.rst:358: WARNING: autodoc: failed to import class u'TagItemGUI' from module u'bauble.plugins.tag'; the following exception was raised:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/sphinx/ext/autodoc.py", line 385, in import_object
    __import__(self.modname)
  File "/home/mario/Local/github/Bauble/bauble.classic/bauble/plugins/tag/__init__.py", line 30, in <module>
    from sqlalchemy import *
ImportError: No module named sqlalchemy
My $VIRTUAL_ENV/lib/python2.7/site-packages contains SQLAlchemy-1.0.4-py2.7-linux-x86_64.egg.
This is definitely related to the question "Sphinx autodoc dies on ImportError of third party package", but the procedure I chose to follow is described behind a broken link.
The problem is that the packages are not placed directly in the virtualenv's site-packages directory; each one sits inside its own .egg, so you need to put the full path of the egg on sys.path to be able to import the package from there. I use the following hack:
import glob
if 'VIRTUAL_ENV' in os.environ:
    site_packages_glob = os.sep.join([
        os.environ['VIRTUAL_ENV'],
        'lib', 'python2.7', 'site-packages', 'projectname-*py2.7.egg'])
    # take the last match; which one that is depends on what glob returns
    site_packages = glob.glob(site_packages_glob)[-1]
    sys.path.insert(0, site_packages)
Here projectname is the name of the Python module I would like to import.
Note that this is error prone, especially when you have multiple versions
of the module, but so far it works for me.
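A possible alternative to globbing for one specific egg is the standard library's site.addsitedir(), which adds a directory to sys.path and also processes any .pth files in it (eggs installed by easy_install are usually listed in such a file). This is a different technique from the hack above, offered only as a sketch: the helper name add_virtualenv_site_packages is invented, the lib/python*/site-packages layout is an assumption that holds for typical Linux and macOS virtualenvs, and the demonstration uses a throwaway temporary directory instead of a real virtualenv.

```python
import glob
import os
import site
import sys
import tempfile

def add_virtualenv_site_packages(venv_root):
    """Register every lib/python*/site-packages directory found under venv_root."""
    pattern = os.path.join(venv_root, "lib", "python*", "site-packages")
    added = []
    for site_dir in sorted(glob.glob(pattern)):
        # addsitedir appends the directory to sys.path and processes its .pth files
        site.addsitedir(site_dir)
        added.append(site_dir)
    return added

# Demonstration with a fake virtualenv layout in a temporary directory
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "lib", "python2.7", "site-packages"))
added = add_virtualenv_site_packages(demo_root)
print(added)

# In conf.py this might be called as:
#     if "VIRTUAL_ENV" in os.environ:
#         add_virtualenv_site_packages(os.environ["VIRTUAL_ENV"])
```

Note that addsitedir appends to the end of sys.path, so a system-wide copy of the same package would still take precedence over the one in the virtualenv.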

Fails to import module when using matlabdomain

While trying to use the Sphinx MATLAB domain, I can't get the MWE provided on the extension's PyPI page to work.
There is always this "Can't import module" error. I'd guess that the extension somehow generates pseudo modules from the m-code, but up to now I have not been able to figure out how this mechanism works.
The directory structure looks like this:
root
|--test_data
|  |--MyHandleClass.m
|
|--doc
   |--conf.py
   |--Makefile
   |--index.rst
The files MyHandleClass.m and index.rst contain the example code given on the package site, and conf.py starts like this:
import sys, os
sys.path.append(os.path.abspath('.'))
sys.path.append(os.path.abspath('./test_data'))
# -- General configuration -----------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be extensions
# coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
extensions = [
"sphinxcontrib.matlab",
"sphinx.ext.autosummary",
"sphinx.ext.autodoc"]
autodoc_default_flags = ['members','show-inheritance','undoc-members']
autoclass_content = 'both'
mathjax_path = 'http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=default'
# The suffix of source filenames.
source_suffix = '.rst'
# The encoding of source files.
#source_encoding = 'utf-8'
# The master toctree document.
master_doc = 'index'
Error message:
WARNING: autodoc: failed to import module u'test_data'; the following exception was raised:
Traceback (most recent call last):
File "C:\Python27\lib\site-packages\sphinx\ext\autodoc.py", line 335, in import_object
__import__(self.modname)
ImportError: No module named test_data
E:\ME\doc\index.rst:13: WARNING: don't know which module to import for autodocumenting u'MyHandleClass' (try placing a "module" or "currentmodule" directive in the document, or giving an explicit module name)
I have tried varying this and that without success; maybe somebody out there has a clue?
Thanks for trying the matlabdomain sphinxcontrib extension. In order to use Sphinx to document MATLAB m-files, you need to add matlab_src_dir in conf.py as described in the Configuration section of the documentation. This is because the Python interpreter can't import a MATLAB m-file. Therefore you should not add your MATLAB root to the Python sys.path, or you will get the error you received. Instead, set matlab_src_dir to the path containing the folder of the MATLAB project that you want to document.
Given your file structure, in order to document test_data, use a conf.py with the following:
import os
# NOTE: don't add MATLAB m-files to `sys.path`
#sys.path.insert(0, os.path.abspath('.'))
# instead add them to `matlab_src_dir`
matlab_src_dir = os.path.abspath('..')  # folder that contains the MATLAB project (test_data)
Hope that does it! Please feel free to ask any more questions. I'm happy to help!