How to *reverse* comma-separated output that combines data from 2 specific cells?

I wrote code (based on the question "How to collect data from multiple cells (quantity of 2) and return an output with 2 comma-separated values in one cell?" for Apps Script) that returns an output of comma-separated values. It works great.
For example,
Input:
A1 = 1
B1 = 2
Results in Output:
1, 2
Here is that code:
[formSheet.getRange("E25").getValue() + ", " +
formSheet.getRange("F25").getValue()],
The context is a spreadsheet-based form. (No, regular Google Forms does not work for our purposes.)
Now I've gotten myself in a pickle, because users are requesting the ability to retrieve and edit their form.
OK. I can do that.
Until...
I get to those output cells that are comma-separated. Whoops. Is there a way to reverse this process, so that each value goes back into its respective input cell?
Output (source):
1, 2
Retrieved Results:
A1 = 1
B1 = 2
Thanks!

Just split and setValues:
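// Read the combined "1, 2" string and split it back into one value per cell.
// String() guards against the cell being read back as a number.
formSheet.getRange('E25:F25').setValues([
String(outputSheet.getRange('A25').getValue()).split(', ')
]);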
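The split works because it is the exact inverse of the value + ", " + value join that produced the output. As a language-neutral illustration, here is the same round trip sketched in Python (the helpers combine and uncombine are hypothetical, not part of Apps Script). It also shows the catch: values come back as text, and if a value itself contains ", " the split yields too many pieces, so the sketch checks the count.

```python
SEP = ", "  # the separator used when the two cells were combined


def combine(values):
    """Join cell values into one comma-separated string, like the original output."""
    return SEP.join(str(v) for v in values)


def uncombine(text, expected=2):
    """Split the combined string back into its parts; fail loudly if the count is off."""
    parts = text.split(SEP)
    if len(parts) != expected:
        raise ValueError(f"expected {expected} values, got {len(parts)}: {parts!r}")
    return parts


print(uncombine(combine([1, 2])))  # ['1', '2']
```

If user input can contain ", ", choosing a separator that cannot appear in the data avoids the ambiguity.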


How to strip extra spaces when writing from dataframe to csv

I read 6 sheets from an .xlsx file into individual DataFrames and want to write each one out to a pipe-delimited CSV:
ind_dim.to_csv(r'/mypath/ind_dim_out.csv', index=None, header=True, sep='|')
Currently the output looks like this:
1|value1 |value2 |word1 word2 word3 etc.
I want to strip the trailing blanks.
Suggestion
Chain the method .apply(lambda x: x.str.rstrip()) onto the DataFrame (before the .to_csv() call) to strip the trailing blanks from every field across the DataFrame. It would look like:
Change:
ind_dim.to_csv(r'/mypath/ind_dim_out.csv', index = None, header=True, sep='|')
To:
ind_dim.apply(lambda x: x.str.rstrip()).to_csv(r'/mypath/ind_dim_out.csv', index = None, header=True, sep='|')
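A minimal, self-contained sketch of the same chain, assuming a small hypothetical DataFrame in place of a sheet read from the workbook, and writing to an in-memory buffer instead of a file so it can run anywhere:

```python
import io

import pandas as pd

# Hypothetical stand-in for one sheet: text fields with trailing blanks
df = pd.DataFrame({
    "a": ["1", "2"],
    "b": ["value1 ", "value2 "],
    "c": ["word1 word2 ", "x"],
})

# Strip trailing blanks from every column, then write a pipe-delimited CSV
buf = io.StringIO()  # in-memory "file"; use a real path in practice
df.apply(lambda x: x.str.rstrip()).to_csv(buf, index=False, sep="|")
print(buf.getvalue())
```

rstrip() only removes blanks at the end of each field, so the space between word1 and word2 is kept.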
Because it is just another method in the chain, it slots in right before .to_csv(). The .str methods only work on string columns, so to handle multiple data types we can enforce the 'object' (string) dtype on import by including the argument dtype='str':
ind_dim = pd.read_excel('testing_xlsx_nums.xlsx', header=0, index_col=0, sheet_name=None, dtype='str')
Or on the DataFrame itself by:
df = pd.DataFrame(df, dtype='str')
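Why the dtype matters: the .str accessor only works on string columns, so a purely numeric column makes the .apply() call raise an AttributeError. The sketch below (hypothetical column names) shows the failure, a fix using df.astype(str) (a common way to convert an existing DataFrame to text), and, as an alternative not taken from this answer, stripping only the columns that pd.api.types.is_string_dtype reports as strings.

```python
import pandas as pd

# Hypothetical sheet: an integer column and a text column with trailing blanks
df = pd.DataFrame({"num": [11111, 33333], "text": ["valueB1 ", "valueB2 "]})

# The .str accessor rejects non-string columns, so this raises AttributeError
try:
    df.apply(lambda x: x.str.rstrip())
except AttributeError as exc:
    print("fails:", exc)

# Fix 1: convert every column to text first (numbers become strings)
as_text = df.astype(str).apply(lambda x: x.str.rstrip())

# Fix 2 (alternative): strip only the columns pandas treats as strings
stripped = df.apply(
    lambda x: x.str.rstrip() if pd.api.types.is_string_dtype(x) else x
)
print(stripped)
```

Fix 2 keeps the numeric column numeric, which may matter if the data is used for anything besides writing the CSV.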
Proof
I did a mock-up where the .xlsx document has 5 sheets, each with three columns: the first column holds numbers, except for an empty cell in row 2; the second column holds strings with both a leading and a trailing blank, an empty cell in row 3, and a number in row 4; and the third column holds strings that all have a leading blank, with an empty value in row 4. Integer indexes and integer column labels have been included. The text in each sheet is:
   0      1        2
0  11111  valueB1  valueC1
1         valueB2  valueC2
2  33333           valueC3
3  44444  44444
4  55555  valueB5  valueC5
This code reads our .xlsx file testing_xlsx_nums.xlsx into the DataFrame dictionary ind_dim.
Next, it loops through the sheets, using each sheet name as the key to look up that sheet's DataFrame. It applies the .str.rstrip() method to the entire sheet/DataFrame by passing the lambda function lambda x: x.str.rstrip() to the .apply() method called on that DataFrame.
Finally, it writes each sheet/DataFrame out as a pipe-delimited .csv using .to_csv(), as in the original question.
import pandas as pd
# reads xlsx in (sheet_name=None returns a dict of {sheet name: DataFrame})
ind_dim = pd.read_excel('testing_xlsx_nums.xlsx', header=0, index_col=0, sheet_name=None, dtype='str')
# loops through sheets, applies rstrip(), output as csv '|' delimit
for sheet in ind_dim:
    ind_dim[sheet].apply(lambda x: x.str.rstrip()).to_csv(sheet + '_ind_dim_out.csv', sep='|')
Returns:
|0|1|2
0|11111| valueB1| valueC1
1|| valueB2| valueC2
2|33333|| valueC3
3|44444|44444|
4|55555| valueB5| valueC5
(Note our column 2 strings no longer have the trailing space).
We can also reference each sheet using a loop that cycles through the dictionary items; the syntax would look like for k, v in dict.items() where k and v are the key and value:
import pandas as pd
# reads xlsx in (sheet_name=None returns a dict of {sheet name: DataFrame})
ind_dim = pd.read_excel('testing_xlsx_nums.xlsx', header=0, index_col=0, sheet_name=None, dtype='str')
# loops through sheets, applies rstrip(), output as csv '|' delimit
for k, v in ind_dim.items():
    v.apply(lambda x: x.str.rstrip()).to_csv(k + '_ind_dim_out.csv', sep='|')
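To try the loop without an .xlsx file (reading one also needs an Excel engine such as openpyxl installed), this sketch uses a hand-built dict of DataFrames as a hypothetical stand-in for the result of pd.read_excel(..., sheet_name=None), and writes each CSV to an in-memory buffer rather than to disk:

```python
import io

import pandas as pd

# Hypothetical stand-in for pd.read_excel(..., sheet_name=None):
# a dict mapping sheet names to DataFrames
ind_dim = {
    "Sheet1": pd.DataFrame({"1": ["valueB1 ", "valueB2 "]}),
    "Sheet2": pd.DataFrame({"1": ["valueC1 ", None]}),
}

outputs = {}
for k, v in ind_dim.items():
    buf = io.StringIO()  # in-memory file; pass k + '_ind_dim_out.csv' to write to disk
    v.apply(lambda x: x.str.rstrip()).to_csv(buf, sep="|")
    outputs[k + "_ind_dim_out.csv"] = buf.getvalue()

for name, text in outputs.items():
    print(name)
    print(text)
```

Missing cells come out as empty fields, like the empty values in the mock-up output above.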
Notes:
We'll still need to pass the correct arguments for selecting or ignoring indexes and columns (the header= and names= parameters) as needed; these examples keep them simple.
The methods that strip leading spaces and both leading and trailing spaces are .str.lstrip() and .str.strip(), respectively. They can be applied to an entire DataFrame the same way, e.g. df.apply(lambda x: x.str.strip()).
Only 1 Column: If we only want to strip one column, we can call the .str methods directly on that column. For example, to strip leading & trailing spaces from a column named column2 in DataFrame df, we would write df.column2.str.strip(). This returns a new Series, so assign it back (df['column2'] = df['column2'].str.strip()) to keep the result.
Data types not string: When importing data, pandas infers a data type for each column, and the .str methods fail on non-string (e.g. numeric) columns. We can override this by passing dtype='str' to the pd.read_excel() call when importing.
pandas 1.0.1 documentation (04/30/2020) on pandas.read_excel:
"dtype: Type name or dict of column -> type, default None
Data type for data or columns. E.g. {‘a’: np.float64, ‘b’: np.int32} Use object to preserve data as stored in Excel and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion."
We can pass the argument dtype='str' when importing with pd.read_excel() (as seen above). If we want to enforce a single data type on a DataFrame we are already working with, we can pass it to pd.DataFrame() with the argument dtype='str' and assign the result back to itself: df = pd.DataFrame(df, dtype='str')
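A quick comparison of the three strip methods on a small hypothetical Series, plus the single-column case from the notes above (note the assignment back into df):

```python
import pandas as pd

s = pd.Series(["  both  ", " left", "right "])
print(s.str.lstrip().tolist())  # ['both  ', 'left', 'right ']
print(s.str.rstrip().tolist())  # ['  both', ' left', 'right']
print(s.str.strip().tolist())   # ['both', 'left', 'right']

# Only one column: .str methods return a new Series, so assign it back
df = pd.DataFrame({"column1": ["a ", "b "], "column2": [" x ", " y "]})
df["column2"] = df["column2"].str.strip()
print(df)
```

column1 is left untouched, so its trailing blanks remain.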
Hope it helps!
In R, the following trims leading and trailing spaces fairly easily:
if (!require(dplyr)) {
install.packages("dplyr")
}
library(dplyr)
if (!require(stringr)) {
install.packages("stringr")
}
library(stringr)
setwd("~/wherever/you/need/to/get/data")
outputWithSpaces <- read.csv("CSVSpace.csv", header = FALSE)
print(head(outputWithSpaces), quote=TRUE)
#str_trim(string, side = c("both", "left", "right"))
outputWithoutSpaces <- outputWithSpaces %>% mutate_all(str_trim)
print(head(outputWithoutSpaces), quote=TRUE)
Starting Data:
V1 V2 V3 V4
1 "Something is interesting. " "This is also Interesting. " "Not " "Intereting "
2 " Something with leading space" " Leading" " Spaces with many words." " More."
3 " Leading and training Space. " " More " " Leading and trailing. " " Spaces. "
Resulting:
V1 V2 V3 V4
1 "Something is interesting." "This is also Interesting." "Not" "Intereting"
2 "Something with leading space" "Leading" "Spaces with many words." "More."
3 "Leading and training Space." "More" "Leading and trailing." "Spaces."

Grouping file names in clusters

I am using this line to read all images in a folder:
imagefiles = dir('Images\*.jpg');
Suppose I have the names: a1.jpg,a11.jpg,b13.JPG,b5.JPG,c1.jpg.
How do I group together all images whose names differ only in the number? For the given example, group all the a files together, all the b files together, and make a third group for c.
By grouping, I mean some kind of data structure or ordering that will let me access each group separately for later processing.
I am assuming the file type is always 'jpg' and the numbers are always positive and smaller than 100. The code should not be case-sensitive regarding the file type; that is, both jpg and JPG may appear. (I don't know regular expressions, but will be happy to learn from a good link as well.)
You could capture the initial non-number part of the file name using regexp, group them with unique and put them in a struct.
% Some test data
files = {'a11','a1','b2','a32','ca3','b45','c1','ca2'};
files = strcat(files, '.jpg');
% Capture and group
tag = regexp(files,'^\D+','match','once');
[unTag, ~, unIdx] = unique(tag);
for idx = 1:length(unTag)
    fileGroups.(unTag{idx}) = files(unIdx == idx);
end
% The result
>> fileGroups =
a: {'a11.jpg' 'a1.jpg' 'a32.jpg'}
b: {'b2.jpg' 'b45.jpg'}
c: {'c1.jpg'}
ca: {'ca3.jpg' 'ca2.jpg'}
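For comparison, the same capture-and-group idea sketched in Python: a dict of lists plays the role of the MATLAB struct, and re.match(r"\D+", name) captures the leading non-digit part the way '^\D+' does above. The file names are the test data above plus the question's upper-case b13.JPG and b5.JPG.

```python
import re
from collections import defaultdict

# Same test names as the MATLAB example, plus two upper-case extensions
files = ["a11", "a1", "b2", "a32", "ca3", "b45", "c1", "ca2"]
files = [f + ".jpg" for f in files] + ["b13.JPG", "b5.JPG"]

groups = defaultdict(list)
for name in files:
    m = re.match(r"\D+", name)  # leading run of non-digit characters
    if m:
        groups[m.group()].append(name)

print(dict(groups))
```

Because only the prefix is captured, the extension's case does not affect the grouping.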
Depending on what your file names look like, you might have to use a more detailed regular expression. You could use \D+(?=\d+\.(JPG|jpg)) to capture a run of non-digit characters that is followed by a number and the .jpg extension.
So if your file names are something like:
>> files
'dummyStr_a11.jpg'
'dummyStr_a1.jpg'
'dummyStr_b2.jpg'
'dummyStr_a32.jpg'
'dummyStr_ca3.jpg'
'dummyStr_b45.jpg'
'dummyStr_c1.jpg'
'dummyStr_ca2.jpg'
Capture with something like
tag = regexp(files,'[a-z]+(?=\d+\.(JPG|jpg))','match','once');
>> tag =
'a' 'a' 'b' 'a' 'ca' 'b' 'c' 'ca'
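The lookahead pattern behaves the same way in Python's re module, which can be handy for checking a pattern against sample names before using it. The file list below is a hypothetical subset of the one above, with one .JPG name added:

```python
import re

files = ["dummyStr_a11.jpg", "dummyStr_a1.jpg", "dummyStr_b2.jpg",
         "dummyStr_ca3.jpg", "dummyStr_c1.JPG"]

# Lower-case letters that are followed (lookahead, not consumed) by digits and .jpg/.JPG
pattern = re.compile(r"[a-z]+(?=\d+\.(JPG|jpg))")
tags = [pattern.search(f).group() for f in files]
print(tags)  # ['a', 'a', 'b', 'ca', 'c']
```

The [a-z]+ class matters here: \D+ would also match the "dummyStr_" part, because underscores and capital letters are non-digits too.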

Extract specific column information from table in MATLAB

I have several *.txt files with three columns of information; here is an example of one file:
namecolumn1 namecolumn2 namecolumn3
#----------------------------------------
name1.jpg someinfo1 name
name2.jpg someinfo2 name
name3.jpg someinfo3 name
othername1.bmp info1 othername
othername2.bmp info2 othername
othername3.bmp info3 othername
From namecolumn1, I would like to extract only the names starting with name, i.e. the rows whose namecolumn3 value is name.
My code looks like this:
file1 = fopen('test.txt','rb');
c = textscan(file1,'%s %s %s','Headerlines',2);
tf = strcmp(c{3}, 'name');
info = c{1}{tf};
The problem is that when I call disp(info), I get only the first entry from the table, name1.jpg, and I would like to have all of them:
name1.jpg
name2.jpg
name3.jpg
You're pretty much there. What you're seeing is an example of MATLAB's comma-separated list: c{1}{tf} returns each matching value separately, and assigning that list to a single variable (info = c{1}{tf}) keeps only the first value.
You can verify this by entering c{1}{tf} in the command line after running your script, which returns:
>> c{1}{tf}
ans =
name1.jpg
ans =
name2.jpg
ans =
name3.jpg
Though sometimes we'd want to concatenate them, for character arrays the concatenated result is more difficult to work with than a cell array:
>> info = [c{1}{tf}]
info =
name1.jpgname2.jpgname3.jpg
versus
>> info = c{1}(tf)
info =
'name1.jpg'
'name2.jpg'
'name3.jpg'
The former would require you to reshape the result (and whitespace pad, if the strings are different lengths), whereas you can index the strings in a cell array directly without having to worry about any of that (e.g. info{1}).
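The same selection sketched in Python, for comparison: filtering column 1 by the value in column 3 gives a list of separate strings that, like the cell array, can be indexed directly (info[0]). The sample file is held in a string here instead of being read from test.txt.

```python
# The example file from the question, held in memory for this sketch
text = """namecolumn1 namecolumn2 namecolumn3
#----------------------------------------
name1.jpg someinfo1 name
name2.jpg someinfo2 name
name3.jpg someinfo3 name
othername1.bmp info1 othername
othername2.bmp info2 othername
othername3.bmp info3 othername
"""

# Skip the header line and the dashed rule (like 'Headerlines',2)
lines = text.splitlines()[2:]
rows = [line.split() for line in lines if line.strip()]

# Keep column 1 wherever column 3 equals 'name'
info = [c1 for c1, _c2, c3 in rows if c3 == "name"]
print(info)  # ['name1.jpg', 'name2.jpg', 'name3.jpg']
```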

Using fscanf in MATLAB to read an unknown number of columns

I want to use fscanf for reading a text file containing 4 rows with an unknown number of columns. The newline is represented by two consecutive spaces.
It was suggested that I pass : as the sizeA parameter but it doesn't work.
How can I read in my data?
update: The file format is
String1 String2 String3
10 20 30
a b c
1 2 3
I have to fill 4 arrays, one for each row.
See if this will work for your application.
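fid1 = fopen('test.txt');
i = 1;
check = 0;
% Read one whitespace-delimited token at a time until fscanf returns nothing
while check ~= 1
    str = fscanf(fid1, '%s', 1);
    if ~isempty(str)
        words(i) = {str};   % named "words" to avoid shadowing the built-in string
    end
    i = i + 1;
    check = isempty(str);
end
fclose(fid1);
% Tokens were read row by row; reshape fills column-wise, giving one column
% per file row (assumes every row has the same number of entries)
X = reshape(words, [], 4);
ar1 = X(:,1)
ar2 = X(:,2)
ar3 = X(:,3)
ar4 = X(:,4)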
Once you have 'ar1','ar2','ar3','ar4' you can parse them however you want.
I have found a solution; I don't know if it is the only one, but it works fine:
A=fscanf(fid,'%[^\n] *\n')
B=sscanf(A,'%c ')
Z=fscanf(fid,'%[^\n] *\n')
C=sscanf(Z,'%d')
....
You could use
rawText = fgetl(fid);                             % read the line of text
lines = regexp(rawText, '  ', 'split');           % rows: two consecutive spaces
tokens = {};
for ix = 1:numel(lines)
    tokens{end+1} = regexp(lines{ix}, ' ', 'split');  % columns: one space
end
This will give you a cell array with one cell per row, each holding a cell array of strings, so it keeps the row and column shape of your original data.
The idea is to read an arbitrary line of text, then break it up according to the formatting information you have available. My example splits rows on two consecutive spaces and columns on a single space character.
This uses regular expressions to define the separators. Regular expressions are powerful but too complex to describe here; see the MATLAB help for regexp and regular expressions.
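A Python sketch of the same two-level split, for comparison. The function name read_rows is hypothetical; it assumes rows are separated either by newlines or by two or more consecutive spaces (as the question describes), and columns by single spaces. Values stay as strings, so numeric rows need converting afterwards.

```python
import re


def read_rows(text):
    """Split text into rows, then each row into whitespace-separated columns."""
    # Rows end at a newline or at a run of two or more spaces
    rows = re.split(r"\n| {2,}", text.strip())
    return [row.split() for row in rows if row.strip()]


# Sample in the format from the question's update (columns per row may vary)
sample = "String1 String2 String3\n10 20 30\na b c\n1 2 3\n"
ar1, ar2, ar3, ar4 = read_rows(sample)  # assumes exactly four rows
ar2 = [int(v) for v in ar2]             # convert numeric rows as needed
ar4 = [int(v) for v in ar4]
print(ar1, ar2, ar3, ar4)
```

The same call also handles the single-line variant, e.g. read_rows("String1 String2 String3  10 20 30  a b c  1 2 3").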

How to display selected entries of an array of structures in MATLAB

Suppose we have an array of structures. The structure has fields: name, price and cost.
Suppose the array A has size n x 1. If I'd like to display the names of the 1st, 3rd and 4th structures, I can use the command:
A([1,3,4]).name
The problem is that it prints the following thing on screen:
ans =
name_of_item_1
ans =
name_of_item_3
ans =
name_of_item_4
How can I remove those ans = things? I tried:
disp(A([1,3,4]).name);
only to get an error/warning.
By doing A([1,3,4]).name, you are returning a comma-separated list. This is equivalent to typing in the following in the MATLAB command prompt:
>> A(1).name, A(3).name, A(4).name
That's why you'll see the MATLAB command prompt give you ans = ... three times.
If you want to display all of the strings together, consider using strjoin to join the names, separated by a comma. To do this, you'll have to place all of the names in a cell array; let's call this cell array names. If we did this:
names = {A([1,3,4]).name};
This is the same as doing:
names = {A(1).name, A(3).name, A(4).name};
This creates a 1 x 3 cell array of names, which we can join together by separating them with a comma and a space:
names = {A([1,3,4]).name};
out = strjoin(names, ', ');
You can then show what this final string looks like:
disp(out);
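The same pattern in Python, as a rough analogue: collect the selected names into a list (like the cell array) and join them with ", " (like strjoin). The Item class and sample data are hypothetical stand-ins for the struct array A; note that Python indices start at 0, so MATLAB's [1,3,4] becomes (0, 2, 3).

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    price: float
    cost: float


# Hypothetical data standing in for the struct array A
A = [Item(f"name_of_item_{k}", 1.0, 0.5) for k in range(1, 6)]

names = [A[i].name for i in (0, 2, 3)]  # like {A([1,3,4]).name}
out = ", ".join(names)                  # like strjoin(names, ', ')
print(out)  # name_of_item_1, name_of_item_3, name_of_item_4
```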
You can use:
[A([1,3,4]).name]
which will, however, concatenate all of the names into a single string.
The better way is to make a cell array using:
{ A([1,3,4]).name }