How to take a subset of data and summarise in R - data-cleaning

I want to extract a subset of a data frame based on the value of a column, e.g. keep only the rows where fruit is APPLE, and then work on that subset to produce a summary, plot graphs, or simply analyse it as a table.
What about a subset where fruit is APPLE or MANGO?
#>    fruit price   climate quantity
#> 1  APPLE    20 temperate      100
#> 2 ORANGE    10 temperate     1000
#> 3  APPLE    20 temperate      200
#> 4  APPLE    20 temperate      300
#> 5  MANGO     5  tropical       20
#> 6  MANGO     5  tropical       50
Please advise possible ways to do this. Thank you very much.

Related

Tableau display 3 measures in 1 column

I am building a report in Tableau with 4 fields. 3 of these fields are measure fields. This is what it looks like.
Name  Sum  Profit  Loss
Emmy  15   10      5
Sara  23   18      2
Dave  10   1       2
But I want it to look like this, with a new column called Metrics that pulls in those 3 values:
      Metrics
Emmy  15
      10
      5
Sara  23
      18
      2
Dave  10
      1
      2
May I have some guidance on how I can approach this?
The easiest way to solve this is to bring "Measure Names" to Rows.
You can also hide the header names of the measures. Hope that helps.

Spotfire data table merge using nearest neighbour values

I am trying to merge two data tables (Tables A and B) in Spotfire 7.10 using Insert Columns to give the resultant Table C. My problem is that I cannot get the join I need on Depth, because the Depth values in Tables A and B are not exact matches. What I need is to match Table B to Table A on the nearest Depth value, i.e. Depth 10.5 (Table B) matches Depth 10 (Table A). Is this possible in Spotfire or with a TERR R script?
Table A
Depth  data
10     2
20     4
30     3
40     5
50     7
Table B
Depth  data 2
10.5   100
30.5   112
50.5   125
Table C
Depth  data  data 2
10     2     100
20     4
30     3     112
40     5
50     7     125
Many thanks for any help.
It depends on the range of Depth values you may have in both tables, but you may find that simply rounding Depth in Table B to the nearest 10 will suffice. You can then join on the rounded value.
Round([Depth]/10,0)*10
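To make the round-then-join idea concrete, here is a minimal sketch in Python rather than Spotfire, using the sample tables from the question. The helper name snap_to_step is hypothetical, and the sketch assumes Table A depths sit on exact multiples of 10.

```python
# Sketch: snap Table B depths to the nearest 10, then left-join onto Table A.
table_a = [(10, 2), (20, 4), (30, 3), (40, 5), (50, 7)]  # (Depth, data)
table_b = [(10.5, 100), (30.5, 112), (50.5, 125)]        # (Depth, data 2)


def snap_to_step(depth, step=10):
    """Round depth to the nearest multiple of step (hypothetical helper)."""
    return int(round(depth / step)) * step


# Key Table B by its snapped depth.
b_by_depth = {snap_to_step(depth): value for depth, value in table_b}

# Left join: every Table A row is kept; rows without a match get None.
table_c = [(depth, data, b_by_depth.get(depth)) for depth, data in table_a]

for row in table_c:
    print(row)
```

Two caveats with this approach: a depth exactly halfway between two steps (e.g. 15) is a tie whose result depends on the rounding rule in use, and if two Table B rows snap to the same depth, only one of them survives the join.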

remove a lesser duplicate

In KDB, I have the following table:
q)tab:flip `items`sales`prices!(`nut`bolt`cam`cog`bolt`screw;6 8 0 3 0n 0n;10 20 15 20 0n 0n)
q)tab
items sales prices
------------------
nut   6     10
bolt  8     20
cam   0     15
cog   3     20
bolt
screw
In this table, the item bolt appears twice. Since the first bolt row contains more information, I would like to remove the 'lesser' bolt row.
FINAL RESULT:
items sales prices
------------------
nut   6     10
bolt  8     20
cam   0     15
cog   3     20
screw
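As far as I understand, if I used the 'distinct' function, the result would not be deterministic?
One way to do it is to fill forward by item, so that the second bolt row inherits the previous values. The two bolt rows then become identical and can be collapsed with distinct.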
q)update fills sales,fills prices by items from tab
items sales prices
------------------
nut   6     10
bolt  8     20
cam   0     15
cog   3     20
bolt  8     20
screw
This can also be done in functional form where you can pass the table and by columns:
{![x;();(!). 2#enlist(),y;{x!fills,/:x}cols[x]except y]}[tab;`items]
If "more information" means "least nulls" then you could count the number of nulls in each row and only return those rows by item that contain the fewest:
q)select from @[tab;`n;:;sum each null tab] where n=(min;n)fby items
items sales prices n
--------------------
nut   6     10     0
bolt  8     20     0
cam   0     15     0
cog   3     20     0
screw              2
I would not recommend this approach, though, as it requires working with rows rather than columns.
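The "fewest nulls per item" idea above can be sketched outside q as well. The following Python version is illustrative only: None stands in for a q null, and the helper null_count is a hypothetical name.

```python
# Sketch: keep, for each item, only the rows with the fewest null fields.
# None stands in for a q null in this sketch.
rows = [
    {"items": "nut", "sales": 6, "prices": 10},
    {"items": "bolt", "sales": 8, "prices": 20},
    {"items": "cam", "sales": 0, "prices": 15},
    {"items": "cog", "sales": 3, "prices": 20},
    {"items": "bolt", "sales": None, "prices": None},
    {"items": "screw", "sales": None, "prices": None},
]


def null_count(row):
    """Number of null (None) fields in one row (hypothetical helper)."""
    return sum(value is None for value in row.values())


# First pass: the smallest null count seen for each item.
fewest = {}
for row in rows:
    n = null_count(row)
    fewest[row["items"]] = min(n, fewest.get(row["items"], n))

# Second pass: keep the rows that match their item's minimum.
kept = [row for row in rows if null_count(row) == fewest[row["items"]]]

for row in kept:
    print(row)
```

As with the fby query, rows that tie on the null count are all kept, so this does not by itself guarantee exactly one row per item.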
Because those two rows contain different data, they are considered distinct.
It depends on how you define "more information". You would probably need to provide more examples, but some possibilities:
Delete rows with null sales value
q)delete from tab where null sales
items sales prices
------------------
nut   6     10
bolt  8     20
cam   0     15
cog   3     20
Retrieve rows with the max sales*prices value for each item
q)select from tab where (sales*prices) = (max;sales*prices) fby items
items sales prices
------------------
nut   6     10
bolt  8     20
cam   0     15
cog   3     20

How to Sum values of column corresponding to one column in jasper

In Jasper I have a requirement to sum the values of one column grouped by another column. I know how to show the sum of all the data in a column, but I need a suggestion for this requirement.
Emp-Category   Emp-Id   Salary
---------------------------------------
Cate - A       1        128
               2        50
               3        89
               total    267
Cate - B       4        123
               5        50
               6        100
               total    273
Total Expenses          540
There will be many categories, and they will come from the database.
Please suggest how to approach this.
In that case, you can use groups.
You can refer to the link below for a tutorial :)
http://www.tutorialspoint.com/jasper_reports/jasper_report_groups.htm
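As a rough illustration of what a group with a footer total computes, here is a short Python sketch (not JasperReports configuration) over the sample rows from the question. It only shows the arithmetic: one running sum per category, plus a grand total.

```python
# Sketch: per-category subtotals and a grand total, as a report group would show.
rows = [
    ("Cate - A", 1, 128),
    ("Cate - A", 2, 50),
    ("Cate - A", 3, 89),
    ("Cate - B", 4, 123),
    ("Cate - B", 5, 50),
    ("Cate - B", 6, 100),
]  # (Emp-Category, Emp-Id, Salary)

# One subtotal per category; categories are discovered from the data.
subtotals = {}
for category, emp_id, salary in rows:
    subtotals[category] = subtotals.get(category, 0) + salary

grand_total = sum(subtotals.values())

for category, total in subtotals.items():
    print(category, "total", total)
print("Total Expenses", grand_total)
```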

Any other way to count group records in crystal reports?

I want to count the total number of records found under a group and assign a deduction percentage according to that count.
In detail:
if a user has fewer than 3 products, he gets 10%
if a user has 4-10 products, he gets 15%
if a user has 10-20 products, he gets 20% as a deduction
For example consider the following as a crystal report:
User 1
Product      Deduction
Apple        15 %
Orange       15 %
Lemon        15 %
Strawberry   15 %
Grapes       15 %
Here I counted the records in this group using a running total and wrote a formula that sets the deduction depending on the count.
But the problem is that I'm getting the following output:
User 1
Product      Deduction
Apple        10 % // Since count is 1
Orange       10 % // Since count is 2
Lemon        10 % // Since count is 3
Strawberry   15 % // Since count is 4
Grapes       15 % // Since count is 5
I want every product's deduction to be 15 % if the total number of records is 4-15. Here the total number of products is 5, which is in the range 4-15, so all the products should have a deduction of 15 %.
Please help me solve this.
You don't need a running-total field (RT) for a simple count; use a summary field instead. The summary field can be used in the group's header and footer (a RT can only be used in the footer).
Select a field (preferably a unique field), then select Insert | Summary Field... Choose Count or Distinct Count from the picklist.
You can use this summary in a formula as well:
//{@discount}
//Assumes there is a group on {table.fruit_name}:
SELECT Count({table.key_field}, {table.fruit_name})
CASE 1 to 3: .1
CASE 4 to 5: .15
DEFAULT: 0
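The reason a summary field fixes the problem is that the whole-group count is known before any detail row is printed, whereas a running total only knows the count so far. A minimal Python sketch of that difference follows. The tier boundaries are assumptions (the ranges in the question overlap), and deduction_for_count is a hypothetical helper.

```python
# Sketch: choose the deduction from the whole-group count, then apply it to every row.

def deduction_for_count(count):
    """Map a product count to a deduction rate (boundaries are assumed)."""
    if 1 <= count <= 3:
        return 0.10
    if 4 <= count <= 10:
        return 0.15
    if 11 <= count <= 20:
        return 0.20
    return 0.0


products_by_user = {
    "User 1": ["Apple", "Orange", "Lemon", "Strawberry", "Grapes"],
}

report = {}
for user, products in products_by_user.items():
    # Group-level count, like a summary field: available before any row is printed.
    rate = deduction_for_count(len(products))
    report[user] = [(product, rate) for product in products]

for user, lines in report.items():
    print(user)
    for product, rate in lines:
        print(f"{product} {rate:.0%}")
```

With 5 products in the group, every row gets the same rate, instead of the first three rows being priced as if the group had only 1, 2 or 3 products.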