Does ruamel.yaml have a function to do the process with all files in one directory?

Does ruamel.yaml have a function to do the process with all files in one directory?
Something like this:
data = yaml.load(Path("*.*"))

No, it does not, but you can do it in one line (assuming you have the imports and the YAML() instance):
from pathlib import Path
import ruamel.yaml
yaml = ruamel.yaml.YAML()
data = [yaml.load(p) for p in Path('.').glob('*.yaml')]
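If you also need to know which file each document came from, the same one-liner works as a dict comprehension (a minimal variation; yaml.load accepts a Path directly):
data = {p.name: yaml.load(p) for p in Path('.').glob('*.yaml')}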

Related

How to read .shp files in Databricks from FileStore?

I'm using Databricks Community Edition, and I saved a .shp file in the FileStore, but when I try to read it I get this error:
DriverError: /dbfs/FileStore/tables/World_Countries.shp: No such file or directory
This is my code:
import geopandas as gpd
gdf = gpd.read_file("/dbfs/FileStore/tables/World_Countries.shp")
I also tried:
gdf = gpd.read_file("/FileStore/tables/World_Countries.shp")
You should first verify that the file path is correct and that the file exists in the specified location. You can list the contents of the directory with dbutils.fs.ls to check whether the file is present:
dbutils.fs.ls("dbfs:/FileStore/path/to/your/file.shp")
Also, make sure that you have the correct permissions to access the file. In Databricks, you may need to be an administrator or have the correct permissions to access the file.
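Also note that gpd.read_file goes through the driver's local filesystem, and a shapefile is really a bundle of files (.shp plus at least .shx and .dbf) that must sit side by side. A minimal sketch that copies the bundle to local disk first (the file names here are assumptions based on your path):
# copy each piece of the shapefile bundle from DBFS to the driver's local disk
for ext in ("shp", "shx", "dbf"):
    dbutils.fs.cp(f"dbfs:/FileStore/tables/World_Countries.{ext}",
                  f"file:/tmp/World_Countries.{ext}")

import geopandas as gpd
gdf = gpd.read_file("/tmp/World_Countries.shp")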
There are then several methods people use to read shapefiles in Databricks. Note that none of these readers ship with stock PySpark; they all assume a geospatial extension that provides them is installed:
1. Read via a "shapefile" data source, using the full path including the file extension:
file_path = "dbfs:/FileStore/path/to/your/file.shp"
df = spark.read.format("shapefile").option("shape", file_path).load()
df.show()
2. Some libraries expose a dedicated reader method:
df = spark.read.shape(file_path)
3. Convert a raw shape column into a geometry column (shape_to_geometry is likewise not a built-in pyspark.sql function and must come from such a library):
from pyspark.sql import functions as F
geo_df = df.select("shape").withColumn("geometry", F.shape_to_geometry("shape")).drop("shape").select("geometry")
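If you want a library that actually ships a Spark shapefile reader, Apache Sedona is one option; a minimal sketch, assuming Sedona is installed and configured on the cluster (API names as in Sedona 1.x):
from sedona.register import SedonaRegistrator
from sedona.core.formatMapper.shapefileParser import ShapefileReader
from sedona.utils.adapter import Adapter

SedonaRegistrator.registerAll(spark)  # register Sedona's geometry types and SQL functions

# point at the directory that holds the .shp/.shx/.dbf files together
spatial_rdd = ShapefileReader.readToGeometryRDD(sc, "dbfs:/FileStore/tables/world_countries/")
df = Adapter.toDf(spatial_rdd, spark)
df.show()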

Generate temporary Directory with files in Python for unittesting

I want to create a temporary directory and put some files in it:
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # generate some random files in it
    Path('file.txt').touch()
    Path('file2.txt').touch()

    files_in_dir = os.listdir(tmp_dir)
    print(files_in_dir)
Expected: ['file.txt', 'file2.txt']
Result: []
Does anyone know how to do this in Python? Or is there a better way, e.g. just doing some mocking?
You have to create the files inside the directory by building their paths from tmp_dir; the with context does not change the current working directory for you, so a bare Path('file.txt').touch() creates the files wherever the script happens to be running.
with tempfile.TemporaryDirectory() as tmp_dir:
    Path(tmp_dir, 'file.txt').touch()
    Path(tmp_dir, 'file2.txt').touch()

    files_in_dir = os.listdir(tmp_dir)
    print(files_in_dir)
    # ['file2.txt', 'file.txt']
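As for the "better way" in tests: if you use pytest, its built-in tmp_path fixture hands every test a fresh temporary directory and cleans it up afterwards; a minimal sketch:
def test_files_created(tmp_path):
    # tmp_path is a pathlib.Path to a per-test temporary directory
    (tmp_path / 'file.txt').touch()
    (tmp_path / 'file2.txt').touch()
    assert sorted(p.name for p in tmp_path.iterdir()) == ['file.txt', 'file2.txt']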

How come the same .py file runs in one PyCharm project but not another, when both projects have the same module installed?

Recently, I installed pandas_profiling for a particular project I created in the PyCharm IDE. It worked after I updated 'External tools' in Settings. Needing it in another project as well, I installed pandas_profiling into that project's …\venv\Scripts path and made a similar update of External tools in the new project. Yet the console keeps telling me that it cannot find the module. Both projects have the pandas_profiling package files in the 'site-packages' and 'venv' directories when I check. Any ideas out there? Thanks for your kind support.
from pathlib import Path

import pandas as pd
import numpy as np
import requests
import pandas_profiling

if __name__ == "__main__":
    file_name = Path("C:\\Users\…..csv")
    if not file_name.exists():
        data = requests.get(
            "C:\\Users\…..csv"
        )
        file_name.write_bytes(data.content)

    df = pd.read_csv(file_name)
    df["Feature_1"] = pd.to_datetime(df["Feature_1"], errors="coerce")

    # Example: Constant variable
    # df["source"] = "name of org"

    # Example: Boolean variable
    df["boolean"] = np.random.choice([True, False], df.shape[0])

    # Example: Mixed with base types
    df["mixed"] = np.random.choice([1, "A"], df.shape[0])

    # Example: Highly correlated variables
    df["Feature_2"] = df["Feature_2"] + np.random.normal(scale=5, size=(len(df)))

    # Example: Duplicate observations
    duplicates_to_add = pd.DataFrame(df.iloc[0:10])
    duplicates_to_add[u"Feature_1"] = duplicates_to_add[u"Feature_1"]
    df = df.append(duplicates_to_add, ignore_index=True)

    profile = df.profile_report(
        title="Report", correlation_overrides=["recclass"]
    )
    profile.to_file(output_file=Path("C:\\Users.....html"))
Response from the console in the new project (while the script works in the existing project):
Traceback (most recent call last):
File "C:/Users/.../PycharmProjects/.../Pandas_Profiling_2.py", line 8, in <module>
import pandas_profiling
ModuleNotFoundError: No module named 'pandas_profiling'
Process finished with exit code 1
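A symptom like this usually means the failing project's run configuration points at a different interpreter than the venv where pandas_profiling was installed. A quick diagnostic you could paste at the top of the script (a sketch, not part of the original code):
import sys
import importlib.util

print(sys.executable)  # the interpreter this run configuration actually uses
print(importlib.util.find_spec("pandas_profiling"))  # None means: not installed for this interpreter
If the interpreter path differs from the project's venv, point the run configuration (or Settings > Project > Python Interpreter) at that venv.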

How to import .py in google Colaboratory?

I want to simplify my code, so I made a utils.py. But Google Colaboratory's working directory is "/content". I read other questions, but they did not solve my problem.
In Google's Colab notebook, How do I call a function from a Python file?
%%writefile example.py
def f():
    print('This is a function defined in a Python source file.')

# Bring the file into the local Python environment.
# (execfile is Python 2; in Python 3 use exec instead.)
exec(open('example.py').read())
f()
This is a function defined in a Python source file.
It looks like this just defines the function in the notebook; using this, I always end up writing the code in a cell. But I want to write this:
import example.py
example.f()
A sample that may be what you want:
# download iris_data.py into local_modules/ (-P: target directory, -nc: skip if it already exists)
!wget https://raw.githubusercontent.com/tensorflow/models/master/samples/core/get_started/iris_data.py -P local_modules -nc

import sys
sys.path.append('local_modules')

import iris_data
iris_data.load_data()
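One caveat with this approach: re-running import iris_data in the same runtime will not pick up later edits to the file; you can force a re-import with importlib.reload:
import importlib
importlib.reload(iris_data)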
I have also had this problem recently. I addressed the issue with the following steps, though it's not a perfect solution:
from google.colab import files

# upload util.py through the browser dialog and write it into the runtime
src = list(files.upload().values())[0]
open('util.py', 'wb').write(src)

import util
This code should work with Python 3:
from google.colab import drive
import importlib.util
# Mount your drive. It will be at this path: "/content/gdrive/My Drive/"
drive.mount('/content/gdrive')
# Load your module
spec = importlib.util.spec_from_file_location("YOUR_MODULE_NAME", "/content/gdrive/My Drive/utils.py")
your_module_name = importlib.util.module_from_spec(spec)
spec.loader.exec_module(your_module_name)
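Once exec_module has run, everything defined in utils.py is available on the module object, e.g. your_module_name.some_function() (some_function being a hypothetical name from your utils.py).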
import importlib.util
import sys
from google.colab import drive
drive.mount('/content/gdrive')

# To add a directory with your code into the list of directories
# which will be searched for packages
sys.path.append('/content/gdrive/My Drive/Colab Notebooks')

import example
Note that the module is imported without the .py suffix.
This works for me; use it if you are outside the /content folder. Hope this helps!
import sys
sys.path.insert(0, '/content/my_project')

from example import *
STEP 1: I have just created a folder 'common_module', as shown in the image.
STEP 2: Called the required class from my Colab code cell:
sys.path.append('/content/common_module/')
from DataPreProcessHelper import DataPreProcessHelper as DPPHelper
My class file 'DataPreProcessHelper.py' looks like this (shown in an image):
Add the path of the 'sample.py' file to the system paths:
import sys
sys.path.append('drive/codes/')
import sample
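Note that a relative path like 'drive/codes/' only resolves when Drive is already mounted and the notebook's working directory is /content, i.e. the full path is /content/drive/codes/.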

alembic/env.py target_metadata = metadata: "No module named al_test.models"

When I use Alembic to control versions of my project's database, part of the code in my env.py looks like:
# add your model's MetaData object here
# for 'autogenerate' support
# from myapp import mymodel
# target_metadata = mymodel.Base.metadata
from al_test.models import metadata
target_metadata = metadata
When I run 'alembic revision --autogenerate -m "Added user table"', I get an error:
File "alembic/env.py", line 18, in
from al_test.models import metadata
ImportError: No module named al_test.models
So how do I solve this? Thanks!
This might be a bit late, and you may have already figured out the issue, but my guess is that your alembic/ directory is not part of the system path, i.e. you need to do something like:
import sys
sys.path.append('path/to/parent/of/al_test')  # the directory that contains the al_test package

from al_test.models import metadata
Update your env.py like this, to add the current working directory to the sys.path that Python uses when searching for modules:
import os
import sys
sys.path.append(os.getcwd())
from al_test.models import metadata
target_metadata = metadata
....
....
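Note that os.getcwd() only works if you always invoke alembic from the project root. A variant that resolves the path relative to env.py itself is more robust (a sketch, assuming env.py sits in an alembic/ folder directly under the project root):
import os
import sys

# env.py lives in <project>/alembic/, so the package root is one directory up
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from al_test.models import metadata
target_metadata = metadata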