Populate an Excel crosstab

I'm attempting to populate an Excel spreadsheet using SoftArtisans ExcelWriter (part of OfficeWriter). It's fairly easy if you only need to load a single record or a plain tabular table.
I need to fill a "crosstab table", something like this:
01-may-2013 02-may-2013 03-may-2013 etc...
name
address
age
etc
etc
etc
All the data (date, name, address, ...) are in the same record. As shown above, I need to use the date field as the column header.
In other words, instead of listing the data horizontally, I need to list it vertically.
All the data comes from a single table. Has anyone achieved this before?
I can populate the first column, but after more than a week of reading the documentation and begging Google for the correct answer, I'm really desperate.
If this is not possible in ExcelWriter, can you please recommend how to generate a crosstab report from the web? It can be in xls or pdf, and it should be easy enough for intermediate programmers.

Note: I work for SoftArtisans, makers of OfficeWriter.
If your data has unique values and all you need to do is transpose the data, then there are several options.
For example, if your data looks like this:
Date | Name | Address | Age
05/01/2013 | Bob | 70 Random Dr | 54
05/02/2013 | Carl | 50 Unique Rd | 76
05/03/2013 | Fred | 432 Tiger Lane | 56
05/04/2013 | Amy | 123 Think Ave | 23
05/05/2013 | Dana | 58 Turtle Path | 67
And you want to import the data so that it looks like this:
Date | 05/01/2013 | 05/02/2013 | 05/03/2013 | 05/04/2013 | 05/05/2013
Name | Bob | Carl | Fred | Amy | Dana
Address | 70 Random Dr | 50 Unique Rd | 432 Tiger Lane | 123 Think Ave | 58 Turtle Path
Age | 54 | 76 | 56 | 23 | 67
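As a library-independent illustration of the transposition described above, here is a minimal Python sketch. It is plain Python, not the OfficeWriter API, and the function name transpose_with_headers is made up for this example. It turns row-oriented records into the column-oriented layout, with the header name leading each output row.

```python
def transpose_with_headers(headers, rows):
    """Return a transposed table: one output row per input column.

    Each output row starts with the column's header name, followed by
    that column's value from every input record.
    """
    # zip(*rows) regroups the values column by column.
    columns = zip(*rows)
    return [[name, *values] for name, values in zip(headers, columns)]


headers = ["Date", "Name", "Address", "Age"]
rows = [
    ["05/01/2013", "Bob", "70 Random Dr", 54],
    ["05/02/2013", "Carl", "50 Unique Rd", 76],
    ["05/03/2013", "Fred", "432 Tiger Lane", 56],
]

crosstab = transpose_with_headers(headers, rows)
for line in crosstab:
    print(line)
```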
Easiest option - Import with ExcelApplication
The easiest option is to use the ExcelApplication object's Worksheet.ImportData method to programmatically import the data. To customize how the data is imported, you will need to set some of the DataImportProperties.
//Open or create a file with ExcelApplication
ExcelApplication xla = new ExcelApplication();
Workbook wb = xla.Create(ExcelApplication.FileFormat.Xlsx);
//Get a handle on the worksheet where you want to import the data
//You can also create a new worksheet instead
Worksheet ws = wb.Worksheets.CreateWorksheet("ImportedData");
//Create a DataImportProperties object
//Set the data import to transpose the data
DataImportProperties dataImportProps = wb.CreateDataImportProperties();
dataImportProps.Transpose = true;
//Import the data
DataTable dt = GetData(); //random method that returns some data
ws.ImportData(dt, ws.Cells["A1"], dataImportProps);
//Stream the output back to the client
xla.Save(wb, Page.Response, "Output.xlsx", false);
ImportData will not automatically import the header names of the data set. So you may also want to set DataImportProperties.UseColumnNames to TRUE to import the header names (Date, Name, Address, Age).
If you are importing numerical data, such as ages or dates, you may also want to set DataImportProperties.ConvertStrings to TRUE to make sure that they are imported as numbers and not as text.
Alternate method - Import with ExcelTemplate
An alternative method would be to use the ExcelTemplate object to import the data into an existing template file that contains placeholder data markers to indicate where the data should be imported.
ExcelTemplate also has DataBindingProperties, which control how the data is imported when calling ExcelTemplate.BindData. One of the properties, DataBindingProperties.Transpose, will pivot the data. This property only takes effect if the data source is a two-dimensional array.
//Open a template with placeholder data markers
ExcelTemplate xlt = new ExcelTemplate();
xlt.Open("template.xlsx");
//Create DataBindingProperties and set it to transpose the data
DataBindingProperties dataBindProps = xlt.CreateDataBindingProperties();
dataBindProps.Transpose = true;
//Bind data to the template
//data is of type object[][]
//colNames is of type string[] e.g {"Date", "Name", "Address", "Age"}
xlt.BindData(data, colNames, "DataSource", dataBindProps);
//Process and save the template
xlt.Process();
xlt.Save(Page.Response, "Output.xlsx", false);
By default, ExcelTemplate does not import column names. To transpose and import the column names, you will need a separate data marker in the template (e.g. %%=$HeaderNames) and a separate call to ExcelTemplate.BindColumnData to import the header names into the column.
//headerNames is of type object[]
//dataBindProps2 is a second DataBindingProperties that is not set to transpose
xlt.BindColumnData(headerNames, "HeaderNames", dataBindProps2);

Google Apps Script - Email when row in Google sheet is updated

I am a teacher and new to scripting. I have a sheet named 'Scores' in a Google spreadsheet that has a list of emails in column A and an array of data in the following columns. When any data in B:R is changed, I would like to automatically send an email to the address listed in column A of the row that changed. The email should include the data in that row and the associated column headers.
Example.
Send Email to address in 'A4'
Subject line: Updated Scores
A string of text as a greeting.
Create a table with 'column headers' and 'Row Data'
B1 - B4
C1 - C4
D1 - D4
...to last column
Thanks
You will have to compose the subject and the message with the information found in data. The index into data is one less than the column number. If you wish to learn more about the onEdit event object, try adding console.log(JSON.stringify(e)) at the top of the function body and it will print in the execution log. I like to use Utilities.formatString() when composing text mixed in with merged data.
//function will only run when you are in the correct sheet and you edit any cell from B to R (columns 2 to 18)
function sendEmailWhenBRChanges(e) {
  const sh = e.range.getSheet();
  const startRow = 2; //wherever your data starts
  if (sh.getName() == 'Your Sheet Name' && e.range.columnStart > 1 && e.range.columnStart < 19 && e.range.rowStart >= startRow) {
    let data = sh.getRange(e.range.rowStart, 1, 1, 18).getValues()[0]; //data is now in a flattened array
    //compose subject and message here; if you want html then use the options object
    const subject = 'Updated Scores'; //placeholder subject
    const message = data.slice(1).join(', '); //placeholder body: the row values from B to R
    GmailApp.sendEmail(data[0], subject, message);
  }
}
on edit event object
Note that you will have to create an installable trigger because sending email requires permission. You can create the trigger programmatically using ScriptApp.newTrigger(), or go to the triggers section of the new editor or the edit menu in the legacy editor. Don't forget to put the e in the parameters section of the function declaration.
Also please note that you cannot run this function directly from the script editor because it requires the event object that it gets from the trigger.
I know this is what you asked for, but you're not going to like it, because it will trigger an email to be sent whenever you edit any of the columns. You will probably prefer changing it later to use a column of checkboxes that act as buttons for sending the emails.
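The composition step that the answer leaves open (building the subject and message from the row data) can be sketched independently of Apps Script. The following Python sketch only illustrates pairing each column header with the edited row's value. The function name compose_score_email, the greeting text, and the sample headers are assumptions for the example, not part of the original answer.

```python
def compose_score_email(headers, row, greeting="Hello, here are your updated scores:"):
    """Build (recipient, subject, body) from a header row and one data row.

    Column A (index 0) holds the e-mail address; the remaining columns
    are listed as "header: value" lines in the body.
    """
    recipient = row[0]
    subject = "Updated Scores"
    lines = [greeting, ""]
    # Index i in the list corresponds to spreadsheet column i + 1.
    for header, value in zip(headers[1:], row[1:]):
        lines.append(f"{header}: {value}")
    return recipient, subject, "\n".join(lines)


# Hypothetical header row and edited data row.
headers = ["Email", "Quiz 1", "Quiz 2", "Final"]
row = ["student@example.com", 8, 9, 95]
recipient, subject, body = compose_score_email(headers, row)
print(body)
```

In the Apps Script function, the header row would come from row 1 of the sheet and the data row from the edited row, read the same way as data.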

How to give column names after one hot encoding with sklearn?

Here is my question; I hope someone can help me figure it out.
To explain, there are more than 10 categorical columns in my data set and each of them has 200-300 categories. I want to convert them into binary values. For that, I first used LabelEncoder to convert the string categories into numbers. The LabelEncoder code and its output are shown below.
After LabelEncoder, I used OneHotEncoder from scikit-learn, and it worked. But the problem is that I need column names after one-hot encoding. For example, take column A with categorical values before encoding:
A = [1,2,3,4,..]
After encoding, it should look like this:
A-1, A-2, A-3
Does anyone know how to assign column names (old column name - value name or number) after one-hot encoding? Here is my one-hot encoding and its output:
I need named columns because I trained an ANN, and every time new data arrives I cannot convert all the past data again and again. So I want to add just the new rows every time. Thanks anyway.
As @Vivek Kumar mentioned, you can use the pandas function get_dummies() instead of OneHotEncoder. I wanted to preserve a version of my initial DataFrame, so I did the following:
import pandas as pd
DataFrame2 = pd.get_dummies(DataFrame)
I used the following code to rename each one-hot encoded column to "original name_one-hot encoded name". So for your example it would give A_1, A_2, A_3. Feel free to change the "_" below to "-".
#Assumes df_pro is the original DataFrame, ohenc is a OneHotEncoder fitted on df_pro[cat_cols],
#and cat_arr is the dense array returned by ohenc.transform(df_pro[cat_cols])
#Create list of columns with "object" dtype
cat_cols = [col for col in df_pro.columns if df_pro[col].dtype == object]
#Find the array of new columns from one-hot encoding (one array of categories per encoded column)
cat_labels = ohenc.categories_
#Use list comprehension to generate new list with labels needed,
#pairing each column with its own categories so values shared by several columns are labelled correctly
cat_labels_new = [col + "_" + str(label) for col, labels in zip(cat_cols, cat_labels) for label in labels]
#Create new DataFrame of transformed columns using new list labels
cat_ohc = pd.DataFrame(cat_arr, columns = cat_labels_new, index = df_pro.index)
#Concat with original DataFrame and drop original columns (only columns with "object" dtype)
df_final = pd.concat([df_pro.drop(columns = cat_cols), cat_ohc], axis = 1)
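Since the goal is column names that stay stable when new data arrives, the naming scheme itself can be sketched without any library. This is a minimal pure-Python illustration, not the scikit-learn or pandas implementation. The function names and the fixed category lists are assumptions made for the example.

```python
def one_hot_column_names(categories, sep="-"):
    """Build names like "A-1" from a {column: [category, ...]} mapping."""
    return [f"{col}{sep}{cat}" for col, cats in categories.items() for cat in cats]


def one_hot_encode(record, categories):
    """Encode one record (a dict) as a 0/1 list aligned with the names above.

    Categories that were not seen when `categories` was built encode as
    all zeros for that column, so the output width never changes.
    """
    return [
        1 if record.get(col) == cat else 0
        for col, cats in categories.items()
        for cat in cats
    ]


# Hypothetical categories, fixed once from the training data.
categories = {"A": [1, 2, 3], "B": ["x", "y"]}

names = one_hot_column_names(categories)
encoded = one_hot_encode({"A": 2, "B": "y"}, categories)
print(dict(zip(names, encoded)))
```

Because the category lists are fixed up front, new records can be encoded one at a time without re-encoding past data. With pandas, get_dummies also accepts a prefix_sep argument to control the separator, and recent scikit-learn versions expose get_feature_names_out on a fitted OneHotEncoder.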

Magento import: Can not find required columns: sku

I am going nuts here with Magento's import function. I have created one template product within my store and then exported it, so I can see what the attributes look like. Next I used Pentaho Data Integration to transform our supplier's product list into that format.
The header row contains, like the export, the service columns (starting with an underscore). Here is one record of what my generated data looks like:
sku,_store,_attribute_set,_type,_category,_root_category,_product_websites,color,cost,country_of_manufacture,created_at,custom_design,custom_design_from,custom_design_to,custom_layout_update,description,gallery,gift_message_available,has_options,image,image_label,manufacturer,media_gallery,meta_description,meta_keyword,meta_title,minimal_price,msrp,msrp_display_actual_price_type,msrp_enabled,name,news_from_date,news_to_date,options_container,page_layout,price,required_options,short_description,small_image,small_image_label,special_from_date,special_price,special_to_date,status,tax_class_id,thumbnail,thumbnail_label,updated_at,url_key,url_path,visibility,weight,qty,min_qty,use_config_min_qty,is_qty_decimal,backorders,use_config_backorders,min_sale_qty,use_config_min_sale_qty,max_sale_qty,use_config_max_sale_qty,is_in_stock,notify_stock_qty,use_config_notify_stock_qty,manage_stock,use_config_manage_stock,stock_status_changed_auto,use_config_qty_increments,qty_increments,use_config_enable_qty_inc,enable_qty_increments,is_decimal_divided,_links_related_sku,_links_related_position,_links_crosssell_sku,_links_crosssell_position,_links_upsell_sku,_links_upsell_position,_associated_sku,_associated_default_qty,_associated_position,_tier_price_website,_tier_price_customer_group,_tier_price_qty,_tier_price_price,_group_price_website,_group_price_customer_group,_group_price_price,_media_attribute_id,_media_image,_media_lable,_media_position,_media_is_disabled
4053258104446,,Default,simple,"Schmuck/Halsschmuck",Default Category,base,,,,25.07.2015 20:06,,,,,"Collier, PVC, braun, 42 cm, Karabinerverschluss 925/- S, Durchmesser ca. 2 mm",,,0,"35416.jpg",,"JOBO",,,,,,,"Konfiguration verwenden","Konfiguration verwenden","Collier PVC braun, Verschluss aus 925 Silber 42 cm Karabiner ",,,"Artikelinformationsspalte",,5,0,"Collier PVC braun, Verschluss aus 925 Silber 42 cm Karabiner","no_selection",,,,,1,2,"no_selection",,2015/07/25 20:06:32.291,,,4,,,,1,0,0,1,1,1,0,1,1,,1,0,1,0,1,0,1,0,0,,,,,,,,,,,,,,,,,,,,,
Magento complains with:
Can not find required columns: sku
I just don't see what might be wrong with my data. Obviously the sku is there, and my DB is empty! Things I have checked:
File encoding is UTF-8
Tried with LF and CR/LF
Strings are surrounded by "
Which fields are mandatory for an import? I just couldn't find anything in the documentation.
I have spent countless hours on this. Any help is greatly appreciated!
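Check whether your column names are exactly the same as the database attribute names.
Moreover, the encoding should be UTF-8 without a BOM. If the file starts with a BOM, those bytes become part of the first header name, so the importer may no longer recognize it as "sku".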
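One way to check for a BOM before importing is sketched below in Python. It is only an illustration using the standard library; the function name read_header_without_bom and the sample bytes are assumptions for the example, and nothing here is part of Magento.

```python
import codecs
import csv
import io


def read_header_without_bom(raw: bytes):
    """Return (had_bom, header_fields) for CSV data given as raw bytes."""
    had_bom = raw.startswith(codecs.BOM_UTF8)
    # The "utf-8-sig" codec drops a leading BOM if present and behaves
    # like plain UTF-8 otherwise.
    text = raw.decode("utf-8-sig")
    header = next(csv.reader(io.StringIO(text)))
    return had_bom, header


# Simulated file contents: the same header with and without a BOM.
clean = b"sku,_store,_attribute_set\n4053258104446,,Default\n"
with_bom = codecs.BOM_UTF8 + clean

print(read_header_without_bom(with_bom))

# Decoding as plain UTF-8 keeps the BOM glued to the first column name.
naive_header = next(csv.reader(io.StringIO(with_bom.decode("utf-8"))))
print(naive_header[0] == "sku")
```

For a real file, the bytes would come from open(path, "rb").read(); writing the decoded text back with the plain "utf-8" codec produces a file without a BOM.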

Why does Open XML API Import Text Formatted Column Cell Rows Differently For Every Row

I am working on an ingestion feature that will take a strongly formatted .xlsx file and import the records to a temp storage table and then process the rows to create db records.
One of the columns is strictly formatted as "Text", but it seems like the Open XML API handles the column's cells differently on a row-by-row basis. Some of the values appear to be numeric but truly are not (which is why we format the column as Text).
Some examples are "211377", "211727.01", "209395.388", "209395.435".
What these values represent is not important. What happens is that some values (using the Open XML API v2.5 library) are read in properly as text, whether retrieved from the Shared Strings collection or simply from the InnerXML property, while others get pulled in as numbers with what appears to be appended rounding or precision.
For example, "211377", "211727.01" and "209395.435" all come in exactly as they are in the spreadsheet, but the "209395.388" value is pulled in as "209395.38800000001" (there are others that this happens to as well).
There seems to be no rhyme or reason as to which values get messed up and which ones import fine. What is really frustrating is that if I use the native Import feature in SQL Server Management Studio and ingest the same spreadsheet into a temp table, this does not happen. So how is it that the SSMS import can handle these values purely as text for all rows, but the Open XML API cannot?
To begin the answer: your main problem seems to be the values,
"209395.388" value is being pulled in as "209395.38800000001"
Yes, in the .xlsx file the value is stored as 209395.38800000001 instead of 209395.388. That is the correct way to store a floating-point number; nothing is wrong with it. You can simply confirm it with the following code snippet:
string val = "209395.38800000001"; // <= What we extract from Open Xml
Console.WriteLine(double.Parse(val)); // < = Simply pass it to double and print
The output is :
209395.388 // <= yes the expected value
So there's nothing wrong in the value you extract from .xlsx using Open Xml SDK.
Now to cells: yes, a cell can have a variety of formats: numbers, text, booleans or shared string text. You can also apply styles to a cell, which format its value to the desired output in Excel (e.g. date/time formats, forced strings, etc.). This is the way Excel handles its vast variety of data. It needs this kind of formatting, and the .xlsx file format had to be a little complex to support it all.
My advice is to inspect the extracted values to identify what format each one represents (for example, whether it is a number or text) and then apply the matching parse method.
For example:
string val = "209395.38800000001";
Console.WriteLine(float.Parse(val)); // <= float.Parse yields a different, less precise value: 209395.4
Update:
Here's how the value is saved in the internal XML.
Try it for yourself:
Make an .xlsx file with the value 209395.388 -> change the extension to .zip -> unzip it -> go to the worksheets folder -> open Sheet1
You will notice that the value is stored as 209395.38800000001, as seen in the attached image. So there is nothing wrong with the API extracting the stored number. It's your job to decide what format to apply.
But if you make the whole column Text before adding data, you will see that the .xlsx holds the data as it is; simply put, as a string.
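The same check can be reproduced outside .NET. The Python sketch below is an illustration only and does not use the Open XML SDK. It shows that both spellings denote the same double-precision value, that printing that value with 17 significant digits gives the stored text, and that single precision loses digits.

```python
import struct
from decimal import Decimal

stored = "209395.38800000001"   # text found in the sheet XML
typed = "209395.388"            # what was typed into Excel

# Both strings parse to the same double-precision value.
same_double = float(stored) == float(typed)
print(same_double)

# Formatting that double with 17 significant digits reproduces the stored text.
seventeen_digits = format(float(typed), ".17g")
print(seventeen_digits)

# Round-tripping through single precision loses digits.
as_float32 = struct.unpack("f", struct.pack("f", float(typed)))[0]
print(as_float32)

# When the value is an identifier rather than a quantity, keeping it as
# text (or Decimal) avoids binary floating point altogether.
print(Decimal(typed))
```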

Extract portions of Google Docs form data and write it on separate lines

Let's say I have a Google Docs Form that gathers the following info:
Timestamp (default field)
Names
Ref#
The form data then appears on the spreadsheet as follows:
4/10/2013 16:20:31 | Jack, Jill, Oscar | Ref6656X
(Note: the number of names may be anywhere from 1 to many)
I need the data to appear on the spreadsheet as follows:
4/10/2013 16:20:31 | Jack | Ref6656X
4/10/2013 16:20:31 | Jill | Ref6656X
4/10/2013 16:20:31 | Oscar | Ref6656X
I can often decipher and edit Google Apps Script (JavaScript?), but I don't know how to think in that language in order to create it for myself (especially with an unknown number of names in the Name field). How can I get started on solving this?
First of all, you've got some choices to make before you start writing your code.
Do you want to modify the spreadsheet that's accepting form input, or produce a separate sheet that has the modified data? If you want to have a record of what was actually input by a user, you'd best leave the original data alone. If you're using a second sheet for the massaged output, the presence of multiple tabs might be confusing to your users, unless you take steps to hide it.
Do you want to do the modifications as forms come in, or (in bulk) at some point afterwards? If you already have collected data, you'll have to have the bulk processing, and that will involve looping and having to handle insertions of new rows in the middle of things. To handle forms as they come in, you'll need to set up a function that is triggered by form submissions, and only extend the table further down... but you've got more learning to do - see Container-Specific Triggers, Understanding Triggers and Understanding Events for background info.
Will you primarily use Spreadsheet service functions, or javascript Arrays? This choice is often about speed - the more you can do in javascript, the faster your script will be, but switching between the two can be confusing at first.
Here's an example function to do bulk processing. It reads all existing data into an array, goes through that and copies all rows into a new array, expanding multiple names into multiple rows. When done, the existing sheet data is overwritten. (Note - not debugged or tested.)
function bulkProcess() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var dataIn = sheet.getDataRange().getValues();
  var dataOut = [];
  for (var row = 0; row < dataIn.length; row++) {
    var names = String(dataIn[row][1]).split(','); // array of names in second column
    for (var i = 0; i < names.length; i++) {
      var rowOut = dataIn[row].slice(); // copy the row so each output row is independent
      rowOut[1] = names[i].trim(); // overwrite with single name, without surrounding spaces
      dataOut.push(rowOut); // then add the copy to dataOut array
    }
  }
  // Write the updated array back to spreadsheet, overwriting existing values.
  sheet.getRange(1, 1, dataOut.length, dataOut[0].length).setValues(dataOut);
}
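The row-expansion logic of the bulk approach can also be tried outside a spreadsheet. Here is a minimal Python sketch of the same idea, offered as an illustration rather than as Apps Script code; the function name expand_names is hypothetical. Each output row is a fresh copy, so rows that share a source row do not overwrite each other.

```python
def expand_names(rows, name_col=1):
    """Return a new list with one row per comma-separated name."""
    expanded = []
    for row in rows:
        for name in str(row[name_col]).split(","):
            new_row = list(row)          # copy, so rows stay independent
            new_row[name_col] = name.strip()
            expanded.append(new_row)
    return expanded


rows = [["4/10/2013 16:20:31", "Jack, Jill, Oscar", "Ref6656X"]]
result = expand_names(rows)
for r in result:
    print(" | ".join(r))
```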