Inter-annotator agreement when users annotate more than one category for any subject

I want to compute inter-annotator agreement for a small number of annotators.
Each annotator assigns a few categories (out of 10) to each subject.
For example, there are 3 annotators, 10 categories and 100 subjects.
I am aware of http://en.wikipedia.org/wiki/Cohen's_kappa (for two annotators) and http://en.wikipedia.org/wiki/Fleiss%27_kappa (for more than two annotators), but I realized that they may not work if an annotator assigns more than one category to a subject.
Does anyone have an idea for determining inter-annotator agreement in this scenario?
Thanks

I had to do this several years back. I can't recall exactly how I did it (I don't have the code anymore), but I have a worked example that I reported to my professor. I was dealing with annotation of comments and had 56 categories and 4 annotators.
Note: at the time I needed a way to detect where annotators disagreed most, so that after each annotation session they could focus on why they disagreed and set out reasonable rules to maximize this statistic. It worked well for that purpose.
Let's assume A-D are annotators and 1-5 are categories. This is a possible scenario.
Category  A  B  C  D  Probability of agreement
1         X  X  X  X  4/4
2         X  X  X     3/4
3         X  X        2/4
4         X           1/4
5
A tags this comment as 1, 2, 3, 4; B as 1, 2, 3; and so forth.
For each category, the probability of agreement is the fraction of annotators who assigned it.
These probabilities are summed and then divided by the number of unique categories tagged for that particular comment.
Therefore, for the example comment, the annotators' agreement is (4/4 + 3/4 + 2/4 + 1/4) / 4 = 10/16. This is a value between 0 and 1.
If this doesn't work for you, see (http://www.mitpressjournals.org/doi/pdf/10.1162/coli.07-034-R2), p. 567, which is referenced by the case study on p. 587.
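The calculation for a single comment can be sketched in Python as follows. This is a reconstruction from the worked example rather than the original code (which is lost); the function name `comment_agreement` and the dictionary input format are assumptions made for illustration.

```python
def comment_agreement(tags_by_annotator):
    """Agreement score for one comment, given {annotator: set of categories}."""
    n_annotators = len(tags_by_annotator)
    # Unique categories that at least one annotator assigned to this comment.
    tagged = set().union(*tags_by_annotator.values())
    if n_annotators == 0 or not tagged:
        return None  # assumption: the score is undefined when nothing was tagged

    # Per-category probability of agreement: fraction of annotators who used it.
    total = 0.0
    for category in tagged:
        votes = sum(1 for tags in tags_by_annotator.values() if category in tags)
        total += votes / n_annotators

    # Average over the unique categories tagged for this comment.
    return total / len(tagged)


# The example comment: A tagged 1-4, B 1-3, C 1-2, D only 1.
example = {"A": {1, 2, 3, 4}, "B": {1, 2, 3}, "C": {1, 2}, "D": {1}}
print(comment_agreement(example))  # 0.625, i.e. 10/16
```

Averaging this score over all comments gives one number per annotation session, and sorting comments by score points to where annotators disagree most.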

Compute agreement on a per-label basis. If you treat one of the annotators as the gold standard, you can then compute recall and precision on label assignments. Another option is label overlap: among the subjects to which either annotator assigned a category, the proportion to which both assigned it (intersection over union).
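A minimal sketch of these per-label measures for a pair of annotators, assuming each annotator's work is stored as a dictionary from subject to a set of labels. The function name, the data layout, and the conventions for empty denominators are assumptions for illustration.

```python
def per_label_scores(gold, other, label):
    """Return (precision, recall, overlap) for one label.

    gold and other map subject -> set of labels; gold is the annotator
    treated as the gold standard.
    """
    gold_subjects = {s for s, labels in gold.items() if label in labels}
    other_subjects = {s for s, labels in other.items() if label in labels}
    both = gold_subjects & other_subjects
    either = gold_subjects | other_subjects

    # Assumed conventions: 0.0 when a ratio has an empty denominator,
    # and overlap 1.0 when neither annotator used the label at all.
    precision = len(both) / len(other_subjects) if other_subjects else 0.0
    recall = len(both) / len(gold_subjects) if gold_subjects else 0.0
    overlap = len(both) / len(either) if either else 1.0
    return precision, recall, overlap


gold = {"s1": {1, 2}, "s2": {1}, "s3": {3}}
other = {"s1": {1}, "s2": {2}, "s3": {1, 3}}
precision, recall, overlap = per_label_scores(gold, other, label=1)
print(precision, recall, overlap)
```

With more than two annotators, the same function can be applied to every pair and the results averaged per label.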

Related

PERMANOVA - small and unequal sample sizes

I am comparing fish communities at 2 sites (upstream vs downstream) with data collected in two seasons (wet and dry) over several years (2017-2022), with data from the first pair of wet and dry seasons representing the period before a treatment and subsequent data representing periods after the treatment. During each season I sampled each site four times, and recorded abundance of each fish species from each site. I did not conduct sampling during the last dry season due to resource constraints. The community compositions of the two sites over different seasons and periods are visualised in the NMDS biplots.
I am trying to do further analysis using PERMANOVA to look for any spatial-temporal changes in the fish communities, mainly whether the two groups are becoming more similar in the years following the treatment. As there are samples with no fish recorded, I have to remove those samples from the dataset, which means I have only three instead of four replicates in some of the site x season x year groups.
My question is, does it still make sense to use PERMANOVA if I have unequal sample size among groups, given the number of replicates from each are small (3-4)? I am planning to run the test separately for wet and dry seasons, but that means I still need to do two-way (site x year) PERMANOVAs for each of the seasons.
I learned from some of the online discussions that unequal sample size would be a problem for two (or more)-way PERMANOVAs and the problem of unequal sample size would be more prominent when the sample sizes are small. Would be grateful for any comments or insight on this. Thanks a tonne!

Paired Samples t-test or Two-sample (heteroscedastic)?

My dataset consists of many patients who went through a treatment, and I want to understand whether there is a difference between before and after taking it. At first glance this clearly calls for a paired-samples t-test; the problem is that I have a different number of historical observations for each patient. Let's say I have:
patient A with 150 observations before and 350 observations after,
patient B with 50 observations before and 300 observations after...
Which test should I use? The one for paired samples (even though n_before ≠ n_after)?
A more general Welch's t-test (the samples are heteroscedastic)?
Is there anything I am missing or doing wrong?
In the case of paired samples, would I need to undersample in order to make the lengths match?
Thanks to everyone

Netlogo: Built-in function to calculate the expected profit

Sorry for the long post. I am a newbie in agent-based modelling, so please accept my apology in advance if my question sounds stupid. I am trying to model a scenario where a farmer (i.e. the agent) decides which type of crop should be grown in different types of fields to increase the profit. The farmer agent has a budget, i.e. the amount of money that can be spent on farming each time step, equal to $100.
The farmer operates a farm that is subdivided into nine fields, arranged in a 3x3 cellular grid. Each field is the same size. Water availability varies spatially across the fields, with a rating of 1 (driest), 2 (moderate), or 3 (wettest). Water availability is assigned to the fields randomly.
The farmer must choose among three crops. As initial parameter settings, the crops have the following characteristics:
        Yield  Price  Costs  Minimum Water Req.
Crop 1  300    20     15     3
Crop 2  200    12     10     2
Crop 3  100    7      5      1
Each crop requires a certain amount of water to grow. Crop yields will only be realized if the crop is planted in a field with at least the crop’s minimum water requirement.
Now the problem is that I couldn't find any function in NetLogo that enumerates the permutations or combinations of crop, field, and water requirements to calculate the expected profit. Any help would be highly appreciated.

I believe you are describing a linear programming problem.
Useful functions for solving linear programming problems with the simplex method are in the NumAnal extension, which does not come bundled with NetLogo but which you can get as follows:
In NetLogo, under Tools / Extensions ... you can find NumAnal, probably with no green check-mark. Select it. On the right, you have buttons to install it, and then one to add it to your code. When you click those, it should now get a green checkmark and you should have a new line in your code "extensions [ numanal ]", and you are now able to use those commands, with the "numanal:" prefix, for example, numanal:simplex.
The documentation for it is in the folder where it was installed. But where is that?
Sadly, the documentation for where extensions are downloaded is not current.
https://ccl.northwestern.edu/netlogo/docs/extensions.html#where-extensions-are-located
After exhaustive search by date-modified, I actually found the folder on my Windows 10 laptop here: c:\Users\condor\AppData\Roaming\NetLogo\6.1\extensions
( Note the "\Roaming\" ).
That folder has a README.md text file, and a pdf document named "NumAnal-v3.4.0" explaining how to use it, and an examples folder with code. It is a little dense.
Here's a link to the basics of how to describe a Linear Programming problem, which is beyond the scope of StackOverflow. You can find help via Google.
Here's one 8 minute video ( as of 24-Nov-2019) that might help you figure out if this is what you need.
Simplex Algorithm Explanation (How to Solve a Linear Program)
https://www.youtube.com/watch?v=RO5477EKlXE
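For a grid this small, the answer from any solver can also be checked by brute force: four choices per field (three crops or leaving it fallow) over nine fields is only 4^9 = 262,144 plans. The sketch below is in Python rather than NetLogo and rests on assumptions the question does not state: profit per field is yield × price − cost, a crop planted in a field that is too dry still incurs its cost but earns nothing, and the $100 budget caps the summed costs. The water layout is a made-up example.

```python
from itertools import product

# crop -> (yield, price, cost, minimum water), from the table in the question
CROPS = {1: (300, 20, 15, 3), 2: (200, 12, 10, 2), 3: (100, 7, 5, 1)}


def field_profit(crop, water):
    """Profit of one field; crop None means the field is left fallow."""
    if crop is None:
        return 0
    crop_yield, price, cost, min_water = CROPS[crop]
    # Assumption: the yield is realized only if the field is wet enough.
    revenue = crop_yield * price if water >= min_water else 0
    return revenue - cost


def best_plan(water, budget=100):
    """Try every crop assignment and keep the most profitable one within budget."""
    best_profit, best = 0, (None,) * len(water)
    for plan in product([None, 1, 2, 3], repeat=len(water)):
        spend = sum(CROPS[c][2] for c in plan if c is not None)
        if spend > budget:
            continue
        profit = sum(field_profit(c, w) for c, w in zip(plan, water))
        if profit > best_profit:
            best_profit, best = profit, plan
    return best_profit, best


# Hypothetical water ratings for the 3x3 grid, listed row by row.
water = [3, 1, 2, 2, 3, 1, 1, 2, 3]
profit, plan = best_plan(water)
print(profit, plan)
```

Exhaustive search stops being practical as the grid grows, which is where a linear (or integer) programming formulation becomes useful.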

Finding cheapest values from a user input for a product

Recently I was given a word problem where I had to write a program that takes as user input the quantity of a product and then finds the cheapest way the user can buy that quantity. Values are as follows: product 1 is a 24 pack that sells for $109, product 2 is a 12 pack that sells for $55, product 3 is a 4 pack that sells for $19, and product 4 is a 1 pack that sells for $5. The program should tell the user the cheapest way to buy the product.

Okay, if it is an explanation you need:
1. First work out the average price per unit for each package, which means computing $109/24, $55/12 and $19/4. This gives a different per-unit price for each package.
2. Next, assume you want to buy x units. Take the package with the lowest cost per unit and see whether x is at least that package's size. For example, if x is 21, then the 24 pack would not be an option. EDIT: your question does not specify this, but if x is 23 and the 24 pack happens to be cheaper than any other combination, would you be allowed to purchase the 24 pack? If so, at step 4 you always want to consider going for the bigger package to see if it is cheaper, so you need to add an extra decision there.
3. If x is at least the size of the cheapest package, buy as many of that package as fit and carry the remainder forward, i.e. x mod package size, where mod is the modulo operation.
4. If x is smaller than the cheapest package, take the next cheapest package size and see whether x is at least that size.
5. Continue looping over steps 3 and 4 until there are no smaller options (unless you also sell the units individually, in which case you would use those).
6. Finally, add up number of packages * package price over all packages.
=========================================================
Method 2: you can treat this as a combination problem. Your goal is to combine any number of 24, 12, 4 and 1 packs to form your desired quantity x, then work out which combination is cheapest. What is interesting is that if you allow buying more than you need (given that it is cheaper), you need to add a set of combinations to those generated and compare their prices. For example, if x is 35, you want to run the possible combinations for 48 (2*24) and 36 (3*12) as well as 35.
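Method 2 does not need to enumerate every combination explicitly. One way to sketch it is with dynamic programming over the number of units, shown here in Python; the function names are made up for illustration, and the pack sizes and prices are the ones from the question.

```python
PACKS = {24: 109, 12: 55, 4: 19, 1: 5}  # pack size -> price, from the question


def cheapest_exact(units, packs=PACKS):
    """Return (cost, {pack size: count}) for buying exactly `units` items."""
    # best[n] holds the cheapest known way to buy exactly n units, or None.
    best = [(0, {})] + [None] * units
    for n in range(1, units + 1):
        for size, price in packs.items():
            if size <= n and best[n - size] is not None:
                cost = best[n - size][0] + price
                if best[n] is None or cost < best[n][0]:
                    counts = dict(best[n - size][1])
                    counts[size] = counts.get(size, 0) + 1
                    best[n] = (cost, counts)
    return best[units]


def cheapest_at_least(units, packs=PACKS):
    """Cheapest purchase if buying more than needed is allowed."""
    # Overshooting by a full largest pack or more can never help.
    options = (cheapest_exact(n, packs) for n in range(units, units + max(packs)))
    return min((o for o in options if o is not None), key=lambda o: o[0])


cost, counts = cheapest_exact(21)
print(cost, counts)  # cost is 98: one 12 pack, two 4 packs, one single
```

With these particular prices, buying extra units never comes out cheaper (for example 23 units cost $108 versus $109 for a 24 pack), but `cheapest_at_least` covers price lists where it does.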

HMM - correct number of states

I'm new to HMMs. I came across the example in the Wikipedia article on the Baum–Welch algorithm and I'm a little bit confused. I hope someone can help me.
The example is as follows: "Suppose we have a chicken from which we collect eggs at noon everyday.
Now whether or not the chicken has laid eggs for collection depends on some unknown factors that are hidden.
We can however (for simplicity) assume that there are only two states that determine whether the chicken lays eggs."
My questions here are:
When we do not know the states, how can we find the correct number of states? In the example above they assume 2, but maybe 3 or 5 would represent the system better.
Is it necessary to give a meaning to each state in the system? In the example above we have s1 and s2, but they are not given a meaning related to the application.

If you want to fit an HMM to your chicken example, you assume successively that there is only 1 state, then 2 states, then 3, etc. governing the laying process. If you know a little about the chicken's way of life, you may choose the number of states based on that knowledge.
You can test, for example, a two-state hypothesis, thinking that the number of eggs a chicken may lay depends on the following states:
(1) the chicken is awake, (2) the chicken is sleeping.
You can then test a three-state hypothesis with (1) the chicken is awake, (2) the chicken is sleeping, and (3) the chicken is in the nest.
Each state you add brings new parameters into your model: those that tune the number of eggs laid in that state and, of course, the transition probabilities between the states. You can then choose among your hypotheses based on, for example, goodness of fit to the data (if you have some) using an information criterion (AIC, BIC, DIC... depending on your fitting methodology).
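To make the trade-off concrete, the sketch below counts the free parameters of a discrete-emission HMM and computes AIC and BIC from a fitted log-likelihood. It assumes the log-likelihood comes from your own fitting routine (for example Baum–Welch); the function names are illustrative, and using the sequence length as the sample size in BIC is a common convention rather than the only choice.

```python
import math


def hmm_free_parameters(n_states, n_symbols):
    """Free parameters of an HMM with discrete emissions."""
    transitions = n_states * (n_states - 1)  # each row of the transition matrix sums to 1
    emissions = n_states * (n_symbols - 1)   # each emission distribution sums to 1
    initial = n_states - 1                   # the initial distribution sums to 1
    return transitions + emissions + initial


def aic(log_likelihood, n_params):
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_observations):
    return n_params * math.log(n_observations) - 2 * log_likelihood


# Two observable symbols (eggs / no eggs): parameter count for 1, 2 and 3 states.
for k in (1, 2, 3):
    print(k, hmm_free_parameters(k, n_symbols=2))
```

You would fit one model per candidate number of states, compute the criterion for each, and prefer the model with the lowest value. The states of the selected model do not need an interpretation for the criterion to apply.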
