Writing data to a .mat file - scipy

I am trying to write some data that I extracted from an Excel file to a .mat file. So far, I have converted the extracted data into a NumPy array and wrapped that array in a dictionary before writing it to a .mat file. The conversions to the array and dictionary seem fine, but when I create and write the .mat file, the data looks corrupted. Here is my code:
import os
import numpy
import pandas as pd
import scipy.io

file_location = '/Users/manmohidake/GoogleDrive/Post_doc/Trial_analysis/1_IndoorOutdoor.xlsx'
mydata = pd.read_excel(file_location, na_values="Missing", sheet_name='Sheet1', skiprows=1, usecols="F,K,Q")

# Convert the DataFrame to a NumPy array
array = mydata.to_numpy()

destination_folder_path = '/Users/manmohidake/Google Drive/Post_doc/Trial_analysis/'
scipy.io.savemat(os.path.join(destination_folder_path, 'trial1.mat'), {'array': array})
I don't really know what's gone wrong. When I open the .mat file, it looks like this:
[screenshot: the .mat file opened in MATLAB]

In [1]: import numpy as np
In [2]: from scipy import io
In [3]: arr = np.arange(12).reshape(4, 3)
In [4]: io.savemat('test.mat', {'array': arr})
In [5]: io.loadmat('test.mat')
Out[5]:
{'__header__': b'MATLAB 5.0 MAT-file Platform: posix, Created on: Mon Sep 20 11:36:48 2021',
'__version__': '1.0',
'__globals__': [],
'array': array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])}
In Octave
>> cd mypy
>> load test.mat
>> array
array =
0 1 2
3 4 5
6 7 8
9 10 11
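As an aside, if the goal is to keep the Excel column names visible in MATLAB, one option is to save one variable per column rather than a single anonymous array. A minimal sketch, building on the question's mydata DataFrame (untested against the actual spreadsheet; savemat keys must be valid MATLAB identifiers):

import scipy.io

# mydata is the DataFrame read from the Excel file above.
# Replace characters that are not valid in MATLAB identifiers.
mat_dict = {
    str(col).replace(' ', '_'): mydata[col].to_numpy()
    for col in mydata.columns
}
scipy.io.savemat('trial1.mat', mat_dict)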

Related

TF Keras code adaptation from python2.7 to python3

I am working to adapt Python 2.7 code that uses Keras and TensorFlow to implement a CNN, but it looks like the Keras API has changed a little since the original code was written. I keep getting an error about "Negative dimension after subtraction" and I cannot find out what is causing it.
Unfortunately I am not able to provide an executable piece of code, because I was not able to make the original code work, but the repository containing all the source files can be found here.
The piece of code:
from keras.callbacks import EarlyStopping
from keras.layers.containers import Sequential
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.layers.core import Reshape, Flatten, Dropout, Dense
from keras.layers.embeddings import Embedding
from keras.models import Graph
from keras.preprocessing import sequence

filter_lengths = [3, 4, 5]
self.model = Graph()

'''Embedding Layer'''
self.model.add_input(name='input', input_shape=(max_len,), dtype=int)
self.model.add_node(Embedding(
    max_features, emb_dim, input_length=max_len),
    name='sentence_embeddings', input='input')

'''Convolution Layer & Max Pooling Layer'''
for i in filter_lengths:
    model_internal = Sequential()
    model_internal.add(
        Reshape(dims=(1, self.max_len, emb_dim), input_shape=(self.max_len, emb_dim))
    )
    model_internal.add(Convolution2D(
        nb_filters, i, emb_dim, activation="relu"))
    model_internal.add(
        MaxPooling2D(pool_size=(self.max_len - i + 1, 1))
    )
    model_internal.add(Flatten())
    self.model.add_node(model_internal, name='unit_' + str(i), input='sentence_embeddings')
What I have tried:
m = tf.keras.Sequential()
m.add(tf.keras.Input(shape=(max_len,), name="input"))
m.add(tf.keras.layers.Embedding(max_features, emb_dim, input_length=max_len))

filter_lengths = [3, 4, 5]
for i in filter_lengths:
    model_internal = tf.keras.Sequential(name=f'unit_{i}')
    model_internal.add(
        tf.keras.layers.Reshape((1, max_len, emb_dim), input_shape=(max_len, emb_dim))
    )
    model_internal.add(
        tf.keras.layers.Convolution2D(100, i, emb_dim, activation="relu")
    )
    model_internal.add(
        tf.keras.layers.MaxPooling2D(pool_size=(max_len - i + 1, 1))
    )
    model_internal.add(
        tf.keras.layers.Flatten()
    )
    m.add(model_internal)
I do not expect a complete solution; what I am really trying to understand is the cause of the following error:
Negative dimension size caused by subtracting 3 from 1 for '{{node conv2d_5/Conv2D}} = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], explicit_paddings=[], padding="VALID", strides=[1, 200, 200, 1], use_cudnn_on_gpu=true](Placeholder, conv2d_5/Conv2D/ReadVariableOp)' with input shapes: [?,1,300,200], [3,3,200,100].
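Reading the error against the two APIs suggests the cause: the old Keras signature was Convolution2D(nb_filter, nb_row, nb_col), so i and emb_dim were the kernel height and width, whereas tf.keras.layers.Conv2D(filters, kernel_size, strides) interprets the third positional argument as strides. Conv2D(100, i, emb_dim) therefore builds an (i, i) kernel with stride emb_dim (matching the [3, 3, 200, 100] filter and strides=[1, 200, 200, 1] in the message) and slides it over an input whose height is 1, and 1 - 3 is negative. The Reshape is also Theano-style channels-first, while tf.keras defaults to channels_last. A hedged sketch of a corrected block, using the functional API since the parallel unit_i branches of the old Graph model cannot be expressed by add-ing Sequentials one after another (the sizes below are hypothetical stand-ins for the question's variables):

import tensorflow as tf

# Hypothetical sizes, standing in for the question's variables.
max_len, emb_dim, max_features = 300, 200, 20000
filter_lengths = [3, 4, 5]

inp = tf.keras.Input(shape=(max_len,), name='input')
emb = tf.keras.layers.Embedding(max_features, emb_dim)(inp)
# channels_last: (height=max_len, width=emb_dim, channels=1)
emb = tf.keras.layers.Reshape((max_len, emb_dim, 1))(emb)

branches = []
for i in filter_lengths:
    # kernel_size is a single (height, width) tuple; strides defaults to (1, 1)
    x = tf.keras.layers.Conv2D(100, kernel_size=(i, emb_dim), activation='relu')(emb)
    x = tf.keras.layers.MaxPooling2D(pool_size=(max_len - i + 1, 1))(x)
    branches.append(tf.keras.layers.Flatten()(x))

m = tf.keras.Model(inputs=inp, outputs=tf.keras.layers.Concatenate()(branches))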

t-test using the scipy library in Python

Which of the following imports is needed to run a Student's t-test with the scipy library in Python?
a = [15, 12, 7, 98]
b = [2, 20, 8, 28]
stat, p = ttest_ind(a, b)
print(stat,p)
Options:
from scipy.ttest_ind import ttest_ind
from student.ttest_ind import ttest_ind
from scipy.statistics import ttest_ind
from scipy.stats import ttest_ind
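The last option is the correct one: ttest_ind lives in scipy.stats, and the other modules do not exist in SciPy. A minimal runnable check:

from scipy.stats import ttest_ind

a = [15, 12, 7, 98]
b = [2, 20, 8, 28]
stat, p = ttest_ind(a, b)  # independent two-sample Student's t-test
print(stat, p)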

Unable To Plot Graph From PostgreSQL Query Results In Dash App

I am attempting to write a simple app that plots a bar graph of fruit names on the x-axis against the corresponding sales units. The aim of this code is just to understand how to query results from a Heroku-hosted Postgres database through a Dash app.
Below is the code,
from dash import dash
import dash_core_components as dcc
import dash_html_components as html
import plotly.graph_objs as go
import psycopg2
import os
DATABASE_URL = os.environ['DATABASE_URL']
conn = psycopg2.connect(DATABASE_URL, sslmode='require')
cur = conn.cursor()
cur.execute("SELECT fruits FROM pgrt_table")
fruits1=cur.fetchall()
#print(fruits1)
cur.execute("SELECT sales FROM pgrt_table")
sales1=cur.fetchall()
app = dash.Dash()

app.layout = html.Div(children=[
    html.H1(
        children='Hello Dash'
    ),
    html.Div(
        children='''Dash: A web application framework for Python.'''
    ),
    dcc.Graph(
        id='example-graph',
        figure=go.Figure(
            data=[
                go.Bar(x=fruits1, y=sales1, name='SF'),
                #{'x': [1, 2, 3], 'y': [2, 4, 5], 'type': 'bar', 'name': u'Montréal'},
            ],
            #'layout': {
            #    'title': 'Dash Data Visualization'
            #}
        )
    )
])

if __name__ == '__main__':
    app.run_server(debug=True)
The output is below:
[screenshot: the rendered chart, showing only axes and no bars]
The corresponding output is just the axes with no bar graphs. The connection with the db is working since printing fruits1 or sales1 gives me the values from the columns in postgres. The only issue is the plotting.
NOTE: This question has been heavily modified; the previous draft was extremely vague, without any code to show.
Example:
fruits1 = [('apple',), ('banana',),
('mango',), ('pineapple',),
('peach',), ('watermelon',)]
The output of your database query cannot be used directly; each row is a one-element tuple, so unpack it first:
xData = [fruit[0] for fruit in fruits1]
# gives ['apple', 'banana', 'mango', 'pineapple', 'peach', 'watermelon']
yData = [sales[0] for sales in sales1]
You have to assign your data to the go.Bar object:
go.Bar(x=xData, y=yData, name='SF')
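As a side note, a sketch assuming fruits and sales live in the same table (as the question's two queries suggest): you can fetch both columns in one query and unpack the row tuples with zip, which also keeps each fruit paired with its sales figure:

cur.execute("SELECT fruits, sales FROM pgrt_table")
rows = cur.fetchall()        # e.g. [('apple', 10), ('banana', 4), ...]
xData, yData = zip(*rows)    # unzip into two tuples

go.Bar(x=list(xData), y=list(yData), name='SF')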

ValueError: Cannot feed value of shape (1, 2048, 2048, 1) for Tensor 'image_tensor:0', which has shape '(?, ?, ?, 3)'

Using TensorFlow, I am trying to detect one object (a grayscale PNG image). I have trained and exported a model.ckpt successfully. Now I am trying to restore the saved model.ckpt for prediction. Here is the script:
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image
if tf.__version__ != '1.4.0':
    raise ImportError('Please upgrade your tensorflow installation to v1.4.0!')
# This is needed to display the images.
#matplotlib inline
# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")
from utils import label_map_util
from utils import visualization_utils as vis_util
MODEL_NAME = 'melon_graph'
# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'
# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('training', 'object_detection.pbtxt')
NUM_CLASSES = 1
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')

label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape((im_height, im_width, 1)).astype(np.float64)
# For the sake of simplicity we will use only 2 images:
# If you want to test the code with your images, just add path to the images to the TEST_IMAGE_PATHS.
PATH_TO_TEST_IMAGES_DIR = 'test_images'
TEST_IMAGE_PATHS = [ os.path.join(PATH_TO_TEST_IMAGES_DIR, 'te_data{}.png'.format(i)) for i in range(1, 336) ]
# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)
with detection_graph.as_default():
    with tf.Session(graph=detection_graph) as sess:
        # Definite input and output Tensors for detection_graph
        image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
        # Each box represents a part of the image where a particular object was detected.
        detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
        # Each score represents the level of confidence for each of the objects.
        # The score is shown on the result image, together with the class label.
        detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
        detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
        num_detections = detection_graph.get_tensor_by_name('num_detections:0')
        for image_path in TEST_IMAGE_PATHS:
            image = Image.open(image_path)
            # The array-based representation of the image will be used later to
            # prepare the result image with boxes and labels on it.
            image_np = load_image_into_numpy_array(image)
            # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
            image_np_expanded = np.expand_dims(image_np, axis=0)
            # Actual detection.
            (boxes, scores, classes, num) = sess.run(
                [detection_boxes, detection_scores, detection_classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})
            # Visualization of the results of a detection.
            vis_util.visualize_boxes_and_labels_on_image_array(
                image_np, np.squeeze(boxes), np.squeeze(classes).astype(np.float64),
                np.squeeze(scores), category_index,
                use_normalized_coordinates=True, line_thickness=5)
            plt.figure(figsize=IMAGE_SIZE)
            plt.imshow(image_np)
and this is the error
Traceback (most recent call last):
  File "cochlear_detection.py", line 81, in <module>
    (boxes, scores, classes, num) = sess.run([detection_boxes, detection_scores, detection_classes, num_detections], feed_dict={image_tensor: image_np_expanded})
  File "/anaconda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/anaconda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1096, in _run
    % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1, 2048, 2048, 1) for Tensor 'image_tensor:0', which has shape '(?, ?, ?, 3)'
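The error says the graph's image_tensor placeholder expects 3-channel images (shape (?, ?, ?, 3)), but the grayscale PNG is loaded as a single channel, hence the (1, 2048, 2048, 1) feed. One common fix (a sketch, not from the original post) is to replicate the gray channel into three channels when loading, for example via PIL's convert('RGB'); object detection models also typically take uint8 input rather than float64:

def load_image_into_numpy_array(image):
    # Replicate the single gray channel into R, G and B so the feed
    # matches the (?, ?, ?, 3) placeholder.
    image = image.convert('RGB')
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape((im_height, im_width, 3)).astype(np.uint8)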

How to add meta_data to Pandas dataframe?

I use Pandas DataFrames heavily and need to attach some data to them, for example to record the creation time of the dataframe, an additional description, etc.
I just can't find a reserved field on the DataFrame class to keep such data.
So I changed the core\frame.py file to add a line _reserved_slot = {}, which solves my issue. I'm posting the question here just to ask: is it OK to do so, or is there a better way to attach metadata to a dataframe/column/row etc.?
#----------------------------------------------------------------------
# DataFrame class

class DataFrame(NDFrame):
    _auto_consolidate = True
    _verbose_info = True
    _het_axis = 1
    _col_klass = Series

    _AXIS_NUMBERS = {
        'index': 0,
        'columns': 1
    }
    _reserved_slot = {}  # Added by bigbug to keep extra data for the dataframe
    _AXIS_NAMES = dict((v, k) for k, v in _AXIS_NUMBERS.iteritems())
EDIT: (demo of the issue with witingkuo's approach)
>>> df = pd.DataFrame(np.random.randn(10,5), columns=list('ABCDEFGHIJKLMN')[0:5])
>>> df
A B C D E
0 0.5890 -0.7683 -1.9752 0.7745 0.8019
1 1.1835 0.0873 0.3492 0.7749 1.1318
2 0.7476 0.4116 0.3427 -0.1355 1.8557
3 1.2738 0.7225 -0.8639 -0.7190 -0.2598
4 -0.3644 -0.4676 0.0837 0.1685 0.8199
5 0.4621 -0.2965 0.7061 -1.3920 0.6838
6 -0.4135 -0.4991 0.7277 -0.6099 1.8606
7 -1.0804 -0.3456 0.8979 0.3319 -1.1907
8 -0.3892 1.2319 -0.4735 0.8516 1.2431
9 -1.0527 0.9307 0.2740 -0.6909 0.4924
>>> df._test = 'hello'
>>> df2 = df.shift(1)
>>> print df2._test
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\Python\lib\site-packages\pandas\core\frame.py", line 2051, in __getattr__
(type(self).__name__, name))
AttributeError: 'DataFrame' object has no attribute '_test'
>>>
This is not supported right now; see https://github.com/pydata/pandas/issues/2485. The reason is that propagating these attributes is non-trivial. You can certainly assign data, but almost all pandas operations return a new object, where the assigned data will be lost.
Your _reserved_slot will become a class variable, shared by all DataFrames. That won't work if you want to assign a different value to each DataFrame. You can probably assign what you want to the instance directly:
In [6]: import pandas as pd
In [7]: df = pd.DataFrame()
In [8]: df._test = 'hello'
In [9]: df._test
Out[9]: 'hello'
I think a decent workaround is putting your dataframe into a dictionary, with your metadata under other keys. So if you have a dataframe with cashflows, like:
df = pd.DataFrame({'Amount': [-20, 15, 25, 30, 100]},index=pd.date_range(start='1/1/2018', periods=5))
You can create a dictionary with the additional metadata and put the dataframe in it:
out = {'metadata': {'Name': 'Whatever', 'Account': 'Something else'}, 'df': df}
and then access the dataframe as out['df'] and the metadata as out['metadata'].
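A minimal sketch of that workaround in use (the keys are just the ones from the example above):

import pandas as pd

df = pd.DataFrame({'Amount': [-20, 15, 25, 30, 100]},
                  index=pd.date_range(start='1/1/2018', periods=5))

out = {'metadata': {'Name': 'Whatever', 'Account': 'Something else'}, 'df': df}

print(out['metadata']['Name'])    # 'Whatever'
print(out['df']['Amount'].sum())  # 150: the dataframe is untouched inside the dict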