Add p-value for means comparison table (tab_stat_mean_sd_n) - expss

Do you happen to know how to add a column with p-values when comparing means using tab_stat_mean_sd_n() from the expss package? tab_significance_options() offers many options, but I can't find anything that just adds an extra column with p-values.
Thanks

I am an author of the expss package.
Sorry, but there is currently no option for adding a p-value column. In many cases one column differs significantly from several other columns (see the simple example below), and I don't know how to add p-value columns sensibly in such circumstances. If you can provide good examples of tables with multiple p-value columns, I will consider adding such an option in a future version of the package.
library(expss)
data(mtcars)
mtcars = apply_labels(mtcars,
                      mpg = "Miles/(US) gallon",
                      cyl = "Number of cylinders"
)
mtcars %>%
    tab_cells(mpg) %>%
    tab_cols(cyl) %>%
    tab_stat_mean_sd_n() %>%
    tab_pivot() %>%
    significance_means()
# |                   |              | Number of cylinders |        |      |
# |                   |              | 4                   | 6      | 8    |
# |                   |              | A                   | B      | C    |
# | ----------------- | ------------ | ------------------- | ------ | ---- |
# | Miles/(US) gallon | Mean         | 26.7 B C            | 19.7 C | 15.1 |
# |                   | Std. dev.    | 4.5                 | 1.5    | 2.6  |
# |                   | Unw. valid N | 11.0                | 7.0    | 14.0 |
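A short illustration of why a single p-value column is hard to define: with three columns there are three pairwise comparisons (A vs B, A vs C, B vs C), and each column takes part in two of them. The Python sketch below (not R, and not part of expss) computes Welch's t statistic and its degrees of freedom for every pair from the rounded mean, standard deviation and N printed in the table. Welch's test is an assumption made for illustration; significance_means() may use a different test or multiple-comparison adjustment. Turning t into a p-value needs the t distribution (for example scipy.stats.t.sf), which is left out to keep the block dependency-free.

```python
from itertools import combinations
from math import sqrt

# Rounded summary statistics copied from the printed expss table:
# (mean, standard deviation, unweighted valid N) of mpg for each cyl column.
groups = {
    "A (4 cyl)": (26.7, 4.5, 11),
    "B (6 cyl)": (19.7, 1.5, 7),
    "C (8 cyl)": (15.1, 2.6, 14),
}


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and its Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Every pair of columns gets its own test: k columns give k*(k-1)/2 results,
# so each column is tied to several statistics rather than one.
pairwise = {
    (a, b): welch_t(*groups[a], *groups[b])
    for a, b in combinations(groups, 2)
}

for (a, b), (t, df) in pairwise.items():
    print(f"{a} vs {b}: t = {t:.2f}, df = {df:.1f}")
```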

Related

Compare specific rows of DataFrames in Scala

I have two Scala DataFrames that I am testing for similarity. I want to be able to pick a specific row number and compare each value in that row between the two DataFrames. For example:
Dataframe 1: df1
+------+-----+-----------+
| Name | Age | Eye Color |
+------+-----+-----------+
| Bob | 12 | Blue |
| Bil | 17 | Red |
| Ron | 13 | Brown |
+------+-----+-----------+
Dataframe 2: df2
+------+-----+-----------+
| Name | Age | Eye Color |
+------+-----+-----------+
| Bob | 12 | Blue |
| Bil | 14 | Blue |
| Ron | 13 | Brown |
+------+-----+-----------+
Input: Row 2, output: Age, Eye Color.
Ideally, the output would also show the values that differ. I have considered the option here, but my DataFrames are very large (more than 200,000 rows), so it takes far too long. Is there a simpler way to select a specific row of a DataFrame in Scala?
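The comparison itself is a column-by-column check of one row from each table. The Python sketch below uses plain tuples as hypothetical stand-ins for df1 and df2; it is not Spark code, and the helper name diff_row is made up for illustration. One caveat: Spark DataFrames have no guaranteed row order in general, so "row 2" is only well defined if you sort by, or key on, a column such as Name first.

```python
# Hypothetical in-memory stand-ins for df1 and df2 from the question.
COLUMNS = ["Name", "Age", "Eye Color"]
df1 = [("Bob", 12, "Blue"), ("Bil", 17, "Red"), ("Ron", 13, "Brown")]
df2 = [("Bob", 12, "Blue"), ("Bil", 14, "Blue"), ("Ron", 13, "Brown")]


def diff_row(left, right, row_number, columns=COLUMNS):
    """Return (column, left_value, right_value) for every column whose
    values differ in the given 1-based row."""
    a, b = left[row_number - 1], right[row_number - 1]
    return [(col, x, y) for col, x, y in zip(columns, a, b) if x != y]


print(diff_row(df1, df2, 2))  # [('Age', 17, 14), ('Eye Color', 'Red', 'Blue')]
```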

LibreOffice Calc formula: find a value based on known data

I have a LibreOffice Calc spreadsheet with a table that calculates the cost of two services. I want to calculate the cost of service #2 from the known data. The known data are the rates (0,80 and 0,68, which are fixed) and the total including 21% VAT. The data in column C is variable (unknown), but C2 always equals C3. From the known data, I want to split the "Total incl. VAT" amount into two separate parts: the cost of service #1 and the cost of service #2. In particular, I want to know the service #2 amount with VAT (D3 + VAT). Can someone show me a formula for this?
+---+------------+---------------+-----------------+----------+-----------------+
| | A | B | C | D | E |
+---+------------+---------------+-----------------+----------+-----------------+
| 1 | services | Rate (eur/m3) | volume, m3 | Sum(eur) | service #2 cost |
| 2 | service #1 | 0,80 | 71,00 | 56,80 | |
| 3 | service #2 | 0,68 | 71,00 | 48,28 | |
| 4 | | | Subtotal: | 105,08 | |
| 5 | | | VAT 21% | 22,07 | |
| 6 | | | Total incl. VAT | 127,15 | D3 value + VAT |
+---+------------+---------------+-----------------+----------+-----------------+
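Because C2 always equals C3, both services are billed on the same volume, so each service's share of the total is proportional to its rate and the VAT factor cancels out: service #2 incl. VAT = Total incl. VAT × 0,68 / (0,80 + 0,68). Assuming the cells are laid out exactly as in the table, that would be =D6*B3/(B2+B3) in E6. The Python sketch below only checks the arithmetic; the function name split_total is hypothetical.

```python
VAT_RATE = 0.21


def split_total(total_incl_vat, rate1=0.80, rate2=0.68, vat=VAT_RATE):
    """Split a VAT-inclusive total into the shared volume and each service's
    VAT-inclusive cost, assuming both services use the same volume."""
    volume = total_incl_vat / (1 + vat) / (rate1 + rate2)
    service1 = rate1 * volume * (1 + vat)
    service2 = rate2 * volume * (1 + vat)  # equals total * rate2 / (rate1 + rate2)
    return volume, service1, service2


volume, s1, s2 = split_total(127.15)
print(f"volume = {volume:.2f} m3, service #1 = {s1:.2f}, service #2 = {s2:.2f}")
```

Since the VAT and total in D5 and D6 are rounded to cents, a value recovered this way can differ from the sheet by a cent.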

Cross tab with a list of values instead of summation

I want a Cross tab that lists field values and counts them, instead of just giving a count for the summation. I know I could make this with groups, but I can't list the values vertically that way. From my research, I believe I have to use a Display String Formula.
SQL Field Data
-------------------------------------------------
| Play # | Formation |Back Set | R/P | PLAY |
-------------------------------------------------
| 1 | TREY | FG | R | TRUCK |
-------------------------------------------------
| 2 | T | FG | R | RHINO |
-------------------------------------------------
| 3 | D | FG | P | 5 STEP |
-------------------------------------------------
| 4 | D | FG | P | 5 STEP |
-------------------------------------------------
| 5 | K JET | NG | R | DOG |
-------------------------------------------------
Desired report structure:
-----------------------------------------------------------
| Back Set & Formation | Run | Pass |
-----------------------------------------------------------
| NG K JET | BULLA 1 | |
| | HELL 3 | |
-----------------------------------------------------------
| FG D | | 5 STEP 2 |
-----------------------------------------------------------
| NG K JET | DOG | |
-----------------------------------------------------------
| FG T | RHINO | |
-----------------------------------------------------------
I don't see why a Crosstab is necessary for this, especially if the entire body of the report is just that table.
1. Group your records by Back Set and Formation. If that's not something natively configured in your table, make a new Formula field and group on that.
2. Drop the 3 relevant fields into whichever section you need to display them in. (It might be a Footer, depending on whether or not you want repeats.)
3. Write a formula that determines whether Run or Pass is displayed, and place it in their suppression field. (Good luck getting a Crosstab to do that for you! It tends to prefer 0s over blanks.)
If there's more to the report than just this table, you can cheat the system by placing your "table" in a subreport. You can also stretch Line objects across the sections; they will stretch to form the table outlines.
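The grouping described above can be sketched outside Crystal Reports to see the data shape each group section would show. The Python below is not Crystal Reports syntax: it groups the five sample rows from the question on a combined "Back Set + Formation" key (standing in for the suggested Formula field) and counts play names separately under Run and Pass. With only the sample rows it cannot reproduce the BULLA/HELL lines of the desired layout.

```python
from collections import Counter, defaultdict

# Sample records from the question: (play_no, formation, back_set, r_p, play)
records = [
    (1, "TREY", "FG", "R", "TRUCK"),
    (2, "T", "FG", "R", "RHINO"),
    (3, "D", "FG", "P", "5 STEP"),
    (4, "D", "FG", "P", "5 STEP"),
    (5, "K JET", "NG", "R", "DOG"),
]

# One group per "Back Set + Formation"; plays counted under Run or Pass.
report = defaultdict(lambda: {"Run": Counter(), "Pass": Counter()})
for _, formation, back_set, r_p, play in records:
    column = "Run" if r_p == "R" else "Pass"
    report[f"{back_set} {formation}"][column][play] += 1

for group, cols in report.items():
    run = ", ".join(f"{p} {n}" for p, n in cols["Run"].items())
    pas = ", ".join(f"{p} {n}" for p, n in cols["Pass"].items())
    print(f"{group:<10} | {run:<12} | {pas}")
```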

Tableau - Show multiple discrete string (dropdown) dimensions side-by-side in a single table

I have a list of survey results that looks similar to the following:
| Email | Question 1 | Question 2 |
| ----------------- | ---------- | ---------- |
| test#example.com | Always | Sometimes |
| test2#example.com | Always | Always |
| test3#example.com | Sometimes | Never |
Question 1 and Question 2 (and a few others) have the same discrete set of values (from a dropdown list on the survey).
I want to show the data in the following format in Tableau (a table is fine, but a heatmap or highlight table would be best):
| | Always | Sometimes | Never |
| ---------- | ------ | --------- | ----- |
| Question 1 | 2 | 1 | 0 |
| Question 2 | 1 | 1 | 1 |
How can I achieve this? I've tried various combinations of rows and columns and I just can't seem to get close to this layout. Do I need to use a calculated value?
As far as I know, this isn't natively possible in Tableau, because your data is essentially already a pivot table (one column per question).
What you can do is unpivot the whole table as explained here https://stackoverflow.com/a/20543651/5130012, then load the data into Tableau and build the table you want.
I made some dummy data and tried it.
This is my "unpivoted" table:
Row,Column,Value
test,q1,always
test,q2,sometimes
test1,q1,sometimes
test1,q2,never
test10,q1,always
test10,q2,always
test11,q1,sometimes
test11,q2,never
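To see why the long format helps, here is a Python sketch of the same reshape applied to the sample from the question, followed by the count per question and answer that the Tableau view would then display. It only illustrates the transformation; the variable names are made up, and in Tableau the counting would happen in the view itself.

```python
from collections import Counter

# Wide survey data as in the question (one column per question).
survey = [
    {"Email": "test#example.com", "Question 1": "Always", "Question 2": "Sometimes"},
    {"Email": "test2#example.com", "Question 1": "Always", "Question 2": "Always"},
    {"Email": "test3#example.com", "Question 1": "Sometimes", "Question 2": "Never"},
]
questions = ["Question 1", "Question 2"]
answers = ["Always", "Sometimes", "Never"]

# Unpivot: one (email, question, answer) row per cell, like the CSV above.
long_rows = [(r["Email"], q, r[q]) for r in survey for q in questions]

# Count answers per question; Counter returns 0 for missing pairs,
# so "Never" shows 0 for Question 1.
counts = Counter((q, a) for _, q, a in long_rows)
table = {q: [counts[(q, a)] for a in answers] for q in questions}

for q, row in table.items():
    print(q, row)
```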

PostgreSQL simple count query

I'm trying to keep this small so the answer is simple; I can probably extrapolate from the answers to a bigger data set.
Given the following table:
+------+-----+
| name | age |
+------+-----+
| a | 5 |
| b | 7 |
| c | 8 |
| d | 8 |
| e | 10 |
+------+-----+
I want to make a table that shows the number of people whose age is greater than or equal to x. For instance, the table above would produce:
+--------------+-------+
| at least age | count |
+--------------+-------+
| 5 | 5 |
| 6 | 4 |
| 7 | 4 |
| 8 | 3 |
| 9 | 1 |
| 10 | 1 |
+--------------+-------+
Is there a single query that can accomplish this task? Obviously, it is easy to write a simple function for it, but I'm hoping to be able to do this quickly with one query.
Thanks!
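Yes, what you're looking for is a window function. Summing the per-age counts from the highest age downwards gives "at least this age"; to also get rows for ages nobody has (6 and 9 in your example), generate the full age range with generate_series and left join the counts onto it.
with cte_age_count as (
    select age,
           count(*) as c_star
    from people
    group by age)
select g.age as at_least_age,
       -- descending order: the default frame sums every age >= the current one
       sum(coalesce(c.c_star, 0)) over (order by g.age desc) as count
from generate_series((select min(age) from people),
                     (select max(age) from people)) as g(age)
left join cte_age_count c on c.age = g.age
order by g.age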
Not syntax checked ... let me know if it works!
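To sanity-check whatever the query returns, the expected output can be reproduced with a brute-force Python equivalent on the sample data. It is quadratic, so it is only meant for small test data, not as a replacement for the SQL.

```python
ages = [5, 7, 8, 8, 10]  # the sample "people" table from the question

# For every age from the minimum to the maximum, count people at least that old.
at_least = [
    (x, sum(1 for a in ages if a >= x))
    for x in range(min(ages), max(ages) + 1)
]

for x, n in at_least:
    print(x, n)
```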