Is there a way to include a 95% margin of error to tbl_summary? - confidence-interval

I have been asked to create a margin of error table for some data. Does anyone know how I can do this? I'm a big fan of gtsummary. It's so easy to use, so I was hoping that I could use this package.
It's easy to add a 95% confidence interval to tbl_summary (see below), but I just can't work out how to make it a margin of error instead.
I think it requires add_stat() and a custom function, but I still haven't got my head around how to write it.
Appreciate your help
library(dplyr)
library(gtsummary)

mtcars %>%
  select(mpg, cyl) %>%
  tbl_summary(by = cyl) %>%
  add_ci()
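
For anyone landing here later, below is a minimal sketch of the add_stat() route. A few assumptions to flag: moe_by_group is a made-up helper name, the "margin of error" is taken to be the half-width of a t-based 95% confidence interval for the mean, and dplyr/tidyr are available for the reshaping.

library(dplyr)
library(tidyr)
library(gtsummary)

# Sketch of a custom add_stat() statistic: the half-width of a
# t-based 95% CI for the mean (the margin of error), computed
# separately for each level of the by= variable.
moe_by_group <- function(data, variable, by, ...) {
  data %>%
    group_by(.data[[by]]) %>%
    summarise(
      moe = qt(0.975, n() - 1) * sd(.data[[variable]], na.rm = TRUE) / sqrt(n()),
      .groups = "drop"
    ) %>%
    mutate(moe = style_sigfig(moe)) %>%
    # one row, one column per group, e.g. moe_4, moe_6, moe_8 for cyl
    pivot_wider(names_from = all_of(by), values_from = moe, names_prefix = "moe_")
}

mtcars %>%
  select(mpg, cyl) %>%
  tbl_summary(by = cyl) %>%
  add_stat(fns = everything() ~ moe_by_group)

The new columns pick up default headers; you can relabel them with modify_header().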

Related

Why does my Google Optimize experiment show no clear winner

I ran a very simple test: the footer contact form on the left of the website versus on the right. The results showed "no clear winner". But the data below shows that one variant has 5 conversions vs 1, which I consider to be significant (albeit low numbers). It also says there is a 95% probability that this variant will be better.
What am I not understanding about this data? Are the numbers too low in volume to give a reading, is it a bug, or is there something I've missed?
It's probably because your A/B test didn't have much traffic in each variant. At that volume, 5 conversions vs 1 is not a statistically meaningful difference between the two.
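
Purely as an illustration (the visitor counts are made up, since the question doesn't give the denominators), you can sanity-check counts like these with Fisher's exact test in R:

# 2x2 table: columns are the two variants, rows are
# converted / not converted, assuming 100 visitors per variant.
fisher.test(matrix(c(5, 95, 1, 99), nrow = 2))
# p-value comes out around 0.2, nowhere near the usual 0.05 cutoff,
# so 5 vs 1 at this traffic level is indistinguishable from noise.

You would need a much larger gap, or far more traffic, before a test like this (or Optimize) could call a winner.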

Inner workings of Google's Quick Draw

I'm asking this here because I didn't find anything online.
I would be interested in how Google Quick Draw works, specifically:
1) How does it output the answer - does it have a giant output vector with a probability for each type of drawing?
2) How does it read the data - I see they've implemented some sort of order aware input system, but does that mean that they input positions of the interpolated lines that users draw? This is problematic because it's variable length - how did they solve it?
3) And, finally, which training algorithm are they using? The dataset grows each time someone draws something new. Do they retrain on the growing dataset, or do they just feed each new drawing into the algorithm as it's created?
If you know of any papers on this, or by some miracle you work at Google and/or can explain how it works, I would be really grateful. :)
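
On question 2 specifically: this isn't Google's actual pipeline, but one common way to handle variable-length sequences is zero-padding plus a mask, so the network sees a rectangular array and knows which time steps are real. A toy base-R sketch:

# Toy data: two drawings with different numbers of interpolated points.
strokes <- list(
  cbind(x = c(0, 1, 2), y = c(0, 1, 0)),  # 3 points
  cbind(x = c(0, 1),    y = c(0, 2))      # 2 points
)

# Zero-pad every sequence to a common length and record a mask
# marking which rows are real points (1) versus padding (0).
pad_points <- function(seqs, max_len) {
  lapply(seqs, function(s) {
    n_pad <- max_len - nrow(s)
    list(
      points = rbind(s, matrix(0, nrow = n_pad, ncol = ncol(s))),
      mask   = c(rep(1, nrow(s)), rep(0, n_pad))
    )
  })
}

padded <- pad_points(strokes, max_len = max(sapply(strokes, nrow)))

Recurrent models can then consume the padded batch and ignore the masked steps; bucketing sequences by length is another common trick.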

Tableau: How to show two measures from different data sources in the same chart (without blending)

Currently I have a chart that displays monthly GA visitors. This data comes from a GA table in the DB (Table 1).
I have a separate data source for purchases (Table 2). I have a chart that shows # of monthly purchases, but I'd like to combine these into one chart so I can show a monthly conversion rate.
Tableau keeps prompting me to blend data and I don't want to do that (I don't think). Tableau seems really unintuitive at times.
Please see the screenshot below. The numbers in red are what I'm looking for. They need to be pulled on the same dates as the GA data. I can derive the conversion % from there.
Thank you! Tried a bunch of stuff but getting nowhere.
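
Tableau does support cross-database joins as an alternative to blending, and shaping the data upstream works too. Either way, the target shape is one row per month with both measures side by side, as in this R illustration with made-up numbers:

library(dplyr)

# Made-up monthly figures standing in for Table 1 (GA) and Table 2.
visits    <- tibble::tibble(month = c("2023-01", "2023-02"),
                            visitors = c(1000, 1200))
purchases <- tibble::tibble(month = c("2023-01", "2023-02"),
                            orders = c(50, 72))

# Join on the common date field; the conversion rate is then a
# simple row-wise ratio.
visits %>%
  left_join(purchases, by = "month") %>%
  mutate(conversion_rate = orders / visitors)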

How to realize a real waterfall chart

I know there are a lot of threads about this topic, but I haven't found what I'm looking for. Is there really no exact waterfall chart in Tableau?
I'm trying to achieve a result like one of these:
First example
Second example
Third one
Let me know if it's possible.
I've tried to replicate how the chart should look, but if you need an accurate chart, please provide some dummy data.
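
For reference, the standard Tableau recipe is a Gantt Bar mark placed at a running-total table calculation, sized by the negated change. The arithmetic behind any waterfall is the same; this small R/ggplot2 sketch with made-up figures shows the start and end of each floating bar:

library(dplyr)
library(ggplot2)

# Made-up figures, purely to show the running-total mechanics.
df <- tibble::tibble(
  step   = c("Start", "Revenue", "Costs", "Tax"),
  change = c(100, 40, -30, -10)
) %>%
  mutate(
    end   = cumsum(change),         # running total after each step
    start = lag(end, default = 0),  # running total before each step
    id    = row_number()
  )

ggplot(df) +
  geom_rect(aes(xmin = id - 0.4, xmax = id + 0.4,
                ymin = start, ymax = end,
                fill = change >= 0)) +
  scale_x_continuous(breaks = df$id, labels = df$step) +
  labs(x = NULL, y = "Running total", fill = "Increase?")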

Clustering Category Purchases in Customer Data

I am attempting to cluster a group of customers based on spend, order frequency, order breadth and what % of purchases they make in each category (there are around 20).
It will probably be a simple answer, but I cannot figure out whether I should standardize (subtract the mean and divide by the standard deviation) the % category buy columns or not. When I don't standardize, I can get around 90% of the variance explained in 4-5 principal components (using SVD), but when I standardize each column I only get around 40% for the same number of components. My worry is that because the category percentages are related to one another (for each customer they sum to 100%), standardizing removes that relationship. At the same time, I am worried that not standardizing will cause issues with the other variables in the data that I have standardized.
I would assume that others who have tried clustering this way faced a similar issue, but I can't seem to find anything on it, so it may be that I just don't understand the situation. Thanks for any clarification in advance!
Chris,
Percentages already live on a well-defined scale with a fixed range and nice properties. By heuristically re-scaling these features you usually make things worse.
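
To see why the variance-explained numbers shift so much, here is a toy demonstration with random data (not the asker's): unscaled PCA lets high-variance columns dominate the first components, while scale. = TRUE forces every column to contribute equally, spreading the variance out.

# Toy data: six columns with very different variances.
set.seed(1)
X <- matrix(rnorm(500 * 6), ncol = 6) %*% diag(c(10, 5, 2, 1, 0.5, 0.1))

# Unscaled PCA: the high-variance columns dominate, so a few
# components explain almost everything.
summary(prcomp(X, scale. = FALSE))

# Scaled PCA: every column contributes variance 1, so the explained
# variance spreads across more components.
summary(prcomp(X, scale. = TRUE))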