How to sort a list of trending hashtags by activity?

What is the right way to sort a list of e.g. hashtags by activity?
I collect the hashtags by counting the occurrence of a tag in the last 1000 messages that were posted. Then I have a dictionary like this:
{"hashtag": 12, "tag": 11, "some": 6, "thing": 4, "yet": 2, "another": 2, "word": 1}
Sorting purely by count does not help to identify trends. Maybe people talk about "hashtag" every day, so a high count does not mean it is trending. And with real data you often get a static top 10 that rarely contains a new tag.
So it should account for changes in the frequency of the tags. But with a sliding window of 1000 posts, the trends "slide in" as well, so you will not see jumps that can be identified as "starts trending".
What is the correct criterion to identify when some hashtag or word starts trending in a stream of messages?
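One possible criterion, offered as a sketch and not as the definitive answer, is to compare each tag's share of a short recent window against its share of a longer baseline window, so that a tag which is always popular scores near 1 while a newly rising tag scores much higher. The function name trending_scores, the smoothing parameter, and the choice of windows are all assumptions made for illustration.

```python
from collections import Counter


def trending_scores(recent_tags, baseline_tags, smoothing=1.0):
    """Rank tags by the ratio of their recent share to their baseline share.

    recent_tags and baseline_tags are flat lists of tag occurrences.
    Additive smoothing keeps tags that are absent from the baseline
    from causing a division by zero.
    """
    recent = Counter(recent_tags)
    baseline = Counter(baseline_tags)
    vocab = len(set(recent) | set(baseline))
    n_recent = len(recent_tags) + smoothing * vocab
    n_baseline = len(baseline_tags) + smoothing * vocab

    scores = {}
    for tag, count in recent.items():
        recent_rate = (count + smoothing) / n_recent
        baseline_rate = (baseline[tag] + smoothing) / n_baseline
        scores[tag] = recent_rate / baseline_rate
    # Highest ratio first: these are the tags growing fastest.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical data: "hashtag" is always popular, "new" only appears recently.
recent_window = ["hashtag"] * 12 + ["new"] * 6
baseline_window = ["hashtag"] * 120 + ["other"] * 60
ranked = trending_scores(recent_window, baseline_window)
```

Here "new" ranks above "hashtag" even though "hashtag" has the higher raw count, because its share has grown relative to the baseline. A threshold on the ratio could then serve as the "starts trending" signal.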

Related

Trying to come up with a formula that would return different greetings given per cell based on a list of greetings

I have created a daily journal in Spreadsheets where each day has its own section. In each section, I created a questionnaire that asks the user how they are doing that day. I would like to add a different greeting per day on top of that questionnaire, so I wrote a list of 30 variations, one per cell (for example: "hello", "good morning", "buenos dias", "bonjour"...).
[Image: where the different greetings will show up]
[Image: list of greetings]
I tried to use the rand() formula to select different greetings, but since that formula is volatile and recalculates every time I make a change, it won't work for me.
Each daily section starts with a cell containing the date, so I thought that since each day should have a different greeting, I could use the numeric value of each date to drive the selection (for example: 05/05/2022 has the numeric value 44686, and the next day is 44687).
I thought about using the Index formula but it requires that I use a number from 1 to 30 to retrieve one of those 30 greetings. I think that even if I were to somehow transform the date into a number from 1 to 30, perhaps in a few days, the value would end up being bigger than 30.
Anyway, I appreciate any help!
Gabriel
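One way to keep the date-driven index inside the range 1 to 30, sketched here under the assumption that the list has exactly 30 entries, is to take the remainder of the date's numeric value after dividing by 30 and add 1. In spreadsheet terms this would correspond to something like INDEX(greetings_range, MOD(date_cell, 30) + 1), where greetings_range and date_cell are placeholder names. The Python below only illustrates the arithmetic; the function name greeting_index is hypothetical.

```python
def greeting_index(date_serial: int, n_greetings: int = 30) -> int:
    """Map a spreadsheet date serial to a 1-based index into the greeting list.

    The remainder cycles through 0..n_greetings-1, so adding 1 gives a
    value that INDEX-style lookups can use and that never exceeds the
    length of the list.
    """
    return date_serial % n_greetings + 1


# 05/05/2022 has the serial 44686 in the question.
today = greeting_index(44686)
tomorrow = greeting_index(44687)
```

Because the result depends only on the date, it does not change when the sheet recalculates, and consecutive days get consecutive greetings that wrap around after the 30th.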

Magento 2.4 collection filtering no longer working - strange paging issue

I'll start by saying that this worked correctly prior to Magento 2.4.
Using $collection->addAttributeToFilter("sku",'21V12') does seem to filter the products, but it does so very strangely. If I have 20 products in a category and I use that filter, there are two different scenarios:
The results are correct and it shows 1 of 1
The results show "We can't find products matching the selection."
The difference is that if the SKU I'm filtering on is on page 1, I get the first result, but if the SKU is on a different page, I get "We can't find products matching the selection."
If I add the page to the url, I get the result (adding p=2 or p=3 for example)
Any idea why that is? I've tried this in multiple points in the code to no avail.
Filtering on SKU is a simple example, but let's say we want to do a more involved filter like
$collection->addAttributeToFilter("special_name",'some_custom_text')
and that gives 20 results, sometimes there are none on page 1 and 3 on page 2, etc.
Anyway, it seems to just be hiding items in the display and not actually giving the results we are looking for.
I've tested this on a baseline 2.4.2 install with the Luma theme.
The easiest way to verify this is to add the following at line 147 of vendor/magento/module-catalog/Model/Layer.php:
$collection->addAttributeToFilter("sku",'21V12');
Substitute a SKU you have on page 2 of a category page. You should get the "We can't find products matching the selection." page, but if you add ?p=2 (or whatever page your item is normally on), you'll get that product as a result.

vsts -- effort dropdown values on product board

I use the product board to set the effort for Product Backlog Items. The project uses a scrum process with effort measured in hours. However, the default values in the effort dropdown are more in line with Agile level-of-effort estimation.
[Image: the default effort dropdown]
When I click the effort number in the backlog, I want the dropdown to have the numbers 1, 2, 4, 8, 16, 32, 40 instead. Is there a way to make that happen? As a bonus, I'd ideally like only those values to be valid.

Spark-Scala UDF custom transformations

What would be the best way to determine rows that are in contention/violate specific rules?
I have a dataframe that represents store/item combinations that could be sold. A store may have many items and an item may be in many stores. For example:
Row, Store, Item, Price, Warehouse, Zone, Status
1, Store-1, Basketball, 5.99, 21, Z1, Active
2, Store-1, Football, 6.99, 21, Z2, Active
3, Store-2, Basketball, 5.99, 21, Z1, Active
4, Store-1, Basketball, 4.99, 22, Z1, Not-Active
The objective is to choose a Store/Item combination but sometimes there are many choices for a specific store/item combination. There are specific rules as to which combination to pick. All the listed transformations apply to all the rows in the dataframe but I only want to focus on the rows where there is contention. Here it would be 1 & 4.
There will be millions of rows in the dataframe. Most of the items in the dataframe are singletons (thus making the choice easy), but for some the selection becomes more difficult as the rules are tiered, i.e. compare the sourcing zone, compare the Status, compare the price, and if they are all equal choose the first one you encounter.
Looking for thoughts or suggestions.
How about filter and reduce?
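Say the dataframe is df, store and item identify the combination you want to resolve, and you have a function compare that returns the better of two rows. The code would be
import org.apache.spark.sql.Row
// Returns whichever of the two rows wins under the tiered rules (body left to implement).
def compare(row1: Row, row2: Row): Row = ???
df.filter($"Store" === store && $"Item" === item).reduce(compare)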
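The tiered comparison itself can be sketched outside Spark. The plain-Python example below is only an illustration of the idea: it groups rows by store/item, flags the groups with more than one candidate, and picks a winner by comparing zone, then status, then price. The preference order within each tier (alphabetically lowest zone, Active before Not-Active, lowest price) is an assumption, since the question does not state it, and the names rank and resolve are hypothetical.

```python
ROWS = [
    {"row": 1, "store": "Store-1", "item": "Basketball", "price": 5.99, "warehouse": 21, "zone": "Z1", "status": "Active"},
    {"row": 2, "store": "Store-1", "item": "Football", "price": 6.99, "warehouse": 21, "zone": "Z2", "status": "Active"},
    {"row": 3, "store": "Store-2", "item": "Basketball", "price": 5.99, "warehouse": 21, "zone": "Z1", "status": "Active"},
    {"row": 4, "store": "Store-1", "item": "Basketball", "price": 4.99, "warehouse": 22, "zone": "Z1", "status": "Not-Active"},
]


def rank(row):
    """Tiered sort key: zone first, then status, then price (assumed order)."""
    status_rank = 0 if row["status"] == "Active" else 1
    return (row["zone"], status_rank, row["price"])


def resolve(rows):
    """Return (contested groups, winning row per store/item combination)."""
    groups = {}
    for r in rows:
        groups.setdefault((r["store"], r["item"]), []).append(r)
    contested = {key: rs for key, rs in groups.items() if len(rs) > 1}
    # min() keeps the first row encountered when the keys are fully tied.
    winners = {key: min(rs, key=rank) for key, rs in groups.items()}
    return contested, winners


contested, winners = resolve(ROWS)
```

With the sample data, only Store-1/Basketball is contested (rows 1 and 4), and row 1 wins on status because both rows share zone Z1. Expressing the rules as a single sort key avoids writing a pairwise compare function, though the same key could also drive one.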

Working with a delimited list of items in a Tableau field

I am preparing a data visualization in Tableau.
I have some data that can be simplified like this:
Name, Score, Tag
Joe, 5, A;B
Phil, 7, D
Quinn, 9, A;C
Bill, 3, A;B;C
I would like to generate a word cloud on the Tag field that counts
occurrences of each item A, B, C, D. So I need to generate this:
A,3
B,2
C,2
D,1
In other words, I need help working with a field that contains a list of delimited values.
In the example data ; is the delimiter, but it could be anything.
I would like the word cloud to update as the user
applies filters, e.g. dragging a slider to set score > 5.
So the tag count has to be done on the fly.
I'm pretty sure I'll need to use field calculations and table calculations..?
Possibly I'll need to have a separate table tracking the tags..?
I have no problem building the word cloud and other viz elements.
What I'm looking for help with is parsing the delimited list field and
calculating the tag counts.
I do have full control over the source data, so if there is an easier way to
do this by reorganizing the schema, I'd be glad to do that. I thought of breaking
the field up into separate tag1, tag2, tagX fields and trying to count over the
separate fields... but not sure if this is any simpler.
Thanks for any tips.
Another (probably better in your case) approach is to reshape the data before feeding it to Tableau. Tableau works best with normalized data.
Preprocess it to look like:
Name, Score, Tag
Joe, 5, A
Joe, 5, B
Phil, 7, D
Quinn, 9, A
Quinn, 9, C
Bill, 3, A
Bill, 3, B
Bill, 3, C
At that point, the standard Tableau word cloud charts should work well, and it will scale easily as you add more tags and data.
Reshaping data to normalize it prior to analysis with Tableau is a pretty standard step. Sometimes you can do it automatically, say with custom SQL, but often you'll have to use some sort of script first. If your data comes from Excel, Tableau has a plug-in that can help with reshaping data. Look for it on the Tableau knowledge base.
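The reshaping step described above could be scripted in many ways. The following is a minimal sketch in Python using only the standard library, assuming the source is CSV text with a ; delimited Tag column as in the question. The function name explode_tags is hypothetical, and the final count is only there to check the result against the expected A,3 B,2 C,2 D,1 tally; Tableau would do that counting itself once the data is in long form.

```python
import csv
import io
from collections import Counter

# Sample data from the question, written as CSV text.
RAW = """Name,Score,Tag
Joe,5,A;B
Phil,7,D
Quinn,9,A;C
Bill,3,A;B;C
"""


def explode_tags(text, delimiter=";"):
    """Turn each row with N delimited tags into N rows with one tag each."""
    long_rows = []
    for record in csv.DictReader(io.StringIO(text)):
        for tag in record["Tag"].split(delimiter):
            long_rows.append(
                {"Name": record["Name"], "Score": int(record["Score"]), "Tag": tag.strip()}
            )
    return long_rows


long_rows = explode_tags(RAW)
tag_counts = Counter(row["Tag"] for row in long_rows)
```

Because Name and Score are repeated on every output row, a filter such as score > 5 still applies per tag after reshaping.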
Here's an approach that would be tolerable if you had a fixed set of 3 or 4 tags. Since you have closer to 50K possible tags, it's not a feasible approach for your problem as is. But maybe it will give you an idea. Similar approaches can be used to solve different kinds of problems in Tableau, so it's a useful trick to know.
For each tag, create a calculated field that acts as a flag: it returns 1 if the current row contains that particular tag and null otherwise (or whatever condition you want to test).
For example, define a calculated field called Tag_A defined as:
if contains(Tag, "A") then
1
end
Similarly, define calculated fields Tag_B, Tag_C, etc.
So far it's easy.
Then you can use those fields in other calculations to count the number of records that contain tag A, filter to only those that contain A, use the calculated field on the condition tab when defining sets that are computed dynamically by a formula ... Of course, the low level calculated field function can be more complex, say checking for the presence of at least 2 fields out of a list for example.
If nothing else, this approach sometimes lets you break complex problems into bite sized pieces.
Unfortunately, hard coding calculated field names won't scale to 50K tags. For that, you probably want to reshape your data.