What is the meaning of the priceRange property in Schema.org?

What does the property priceRange mean in Schema.org?
https://schema.org/priceRange
I don't understand what it means. I live in Kazakhstan, and maybe my culture or language makes it unclear to me. Can you give me an example for Kazakhstan, where we use the tenge currency?

The Schema.org property priceRange gives the range of approximate prices of the products/services typically offered by that LocalBusiness.
There seem to be two formats used in examples:
Specify as many currency symbols as there are digits in the price. So for prices from 100 to 999, you would use ₸₸₸; for prices from 10 to 99, you would use ₸₸ etc.
(used in example 4 on LocalBusiness)
Specify the actual range, e.g. for prices from 90 to 240 you would use 90 ₸ - 240 ₸, etc.
(used in example 1 on Hotel)
This property gets discussed in the issue priceRange property is ambiguous (it might get deprecated in the future, or at least defined more clearly, but we'll have to see).
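For example, here is a minimal JSON-LD sketch for a hypothetical restaurant in Almaty, using the second format (the name and address are invented; only the priceRange line is the point here):

{
  "@context": "https://schema.org",
  "@type": "Restaurant",
  "name": "Dastarkhan Cafe",
  "address": {
    "@type": "PostalAddress",
    "addressLocality": "Almaty",
    "addressCountry": "KZ"
  },
  "priceRange": "1500 ₸ - 5000 ₸"
}

With the symbol-count convention instead, the same business would simply use "₸₸₸₸", since its typical prices have four digits.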

Related

Find difference between two calculated groups?

I have dummy HR data, and I want to color-code a map by the difference in median salary between two birth-year groups.
I have a quick calc field to separate them into birth year groups:
IF YEAR([Date of Birth]) >= 1976 THEN "Group 1"
ELSE "Group 2"
END
Now I want to find the difference between the median salaries for those two groups, but I want to conditionally format them via a map to see where the median salary remained similar or differed a lot.
For instance: MEDIAN(Group 1 [Salary]) - MEDIAN(Group 2 [Salary]) would give me a +/- difference, and then I'd like that to be colored via a gradient and outlined by state-level detail.
This is probably so easy, but I can't think of how to do it via those groups. Would this be a LOD calc?
Define a calc to return the salary for rows in group 1, and null otherwise. Call it, say, Old Folks Salary, defined something like IF YEAR([Birth Date]) < 1976 THEN [Salary] END. (If the condition in the IF statement is not satisfied and there is no ELSE clause, the expression returns null.) Define a similar field for the youngsters.
The trick to know is that aggregation functions like MEDIAN() silently ignore null values; it's as if the nulls don't even exist. So you can now express your aggregate calculation as
Median([Old Folks Salary]) - Median([Young Folks Salary])
For extra credit, you can replace the hard coded threshold of 1976 with a parameter, and look for more politically acceptable field names.
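Putting that together, here is a minimal sketch of the three calculated fields, assuming the date field is called [Date of Birth] and using a hypothetical integer parameter [Birth Year Threshold] in place of the hard-coded 1976:

// Old Folks Salary -- null for everyone born in or after the threshold year
IF YEAR([Date of Birth]) < [Birth Year Threshold] THEN [Salary] END

// Young Folks Salary
IF YEAR([Date of Birth]) >= [Birth Year Threshold] THEN [Salary] END

// Median Salary Difference -- an aggregate calc; put it on Color, with State on Detail
MEDIAN([Old Folks Salary]) - MEDIAN([Young Folks Salary])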

Tableau Mixed Data

I've been tasked to set up a Tableau worksheet of counts of data (ultimately to create percentages) where the contrived incoming data looks like the following.
id fruit
1 apple
1 orange
1 lemon
2 apple
2 orange
3 apple
3 orange
4 lemon
4 orange
The worksheet needs to look something like the following:
Count of ids
2 Lemons
2 No lemons
I've only been using Tableau for about 4 hours, so is this doable? Can anyone point me in the right direction?
The data is coming in from a SQL Server database in a format that I can control if that helps contribute towards a solution.
Alex's solution based on sets is very good for this scenario, but I would like to show that LODs can be more flexible if you need to extend your solution to include more categories.
For the current scenario, create a calculated field with the formula below and build a text table using COUNTD([Id]).
{FIXED [Id]:IF MAX([Fruit]='lemon') THEN 'Lemon' ELSE 'No Lemon' END}
Now for the extension: suppose you want to count Ids that contain lemons, then apples, then everything else. Since no double counting of Ids is allowed, the categorization has to follow that order of precedence. (This kind of precedence would be a headache without LODs.)
Now you can change your calculation as below:
{FIXED [Id]:IF MAX([Fruit]='lemon') THEN 'Lemon'
ELSEIF MAX([Fruit]='apple') THEN 'Apple'
ELSE 'No Lemon or Apple' END}
Now your visualization automatically changes to include the new category. This can be extended for any number of fruits.
This is a good use for a set.
In the data pane on the left sidebar, right click on the Id field and create a set named "Ids that contain at least one lemon" (or use a shorter less precise name)
In the set definition dialog panel, define the set by choosing "Use all" from the General tab, and then on the Condition tab, define the condition by the formula max([Fruit]="lemon")
There are many ways to think of a set, but the most abstract is just as a mathematical set of Ids that satisfy the condition. Remember each Id has many data rows, so the condition is a function of many data rows and uses the aggregation function MAX(). For booleans, True is treated as greater than False, so MAX() will return True if at least one of the data rows satisfies the condition. By contrast, MIN() is True only if ALL (non-null) data rows satisfy the condition.
Once you have a set that separates your ids into Lemon scented Ids and others, then you can use that set in many ways - in calculated fields, in filters, in combination with other sets to make new sets, and of course on shelves to make visualizations.
To get a result like your question seeks, you could put your new set on the Row shelf, and put CNTD(ID) on the text shelf or columns shelf. Make sure you understand why you need count distinct (CNTD) instead of SUM([Number of Records]) here.
BTW, the LOD calculation { fixed [Id] : max([Fruit]="lemon") } is effectively the same solution.

Have Max value of range of dates filter be todays date

I have a "Range of Dates" filter and what I want is for the max (or right most value) to always be the most recent date which should be today's date. What seems to be happening is that if I leave the dashboard open and come back the next day the max value is yesterday's date and I must manually move the slider over to be today's date. How can I accomplish this?
I find a calculated field is the best way to do this as I have run into the same issues using the out of the box max date filter.
Create a calculated field as follows:
[date] = {FIXED: max([date])}
This creates a True/False field where only the records that have the max date are carried through.
Now drag this onto the Filters shelf and select 'True'.
I've generally seen two basic approaches for this problem: Calculated fields and relative dates.
Use a calculated field or parameter or some combination of calculated fields and parameters with filters (a rough sketch appears at the end of this answer). This is similar to what smb suggests in their answer to this question. It also seems to be the most popular approach.
If you don't particularly care about being able to set the end-date with the slider, you could try using relative dates, using the approaches detailed in the accepted answer to this Tableau forum question and in this Tableau Knowledge Base article. Jennifer Vonhagel also gives a second answer to the Tableau forum question farther down that uses a parameter plus calculated field approach.
Additionally, this Tableau Knowledge Base article offers another option (Option 1, in the article) if you have Tableau 10.3+: You can use the "Latest Date Preset" (see here for details) check box in the date filter dialog box. I haven't used this, but it looks promising if you're using Tableau Desktop (seems like it wouldn't work for Tableau Web). The article's Options 2-4 are just riffs on calculated fields, in my opinion.
Two more approaches I've heard of – but never personally seen in the wild:
Push the max date down into the view you put Tableau on top of and let the view do the work.
Use a script to modify the Tableau workbook's XML.
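As a rough sketch of the calculated-field-plus-parameter approach mentioned above (the field [Order Date] and the date parameter [Start Date] are invented for illustration), create a boolean calculated field like the one below, drop it on the Filters shelf, and keep only True. Because TODAY() is re-evaluated whenever the workbook is opened or refreshed, the upper bound never goes stale:

// In Date Window: keep rows between the chosen start date and today
[Order Date] >= [Start Date] AND [Order Date] <= TODAY()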

Column type for ZipCode in PostgreSQL database?

What is the correct column type for holding ZipCode values in PostgreSQL database?
I strongly disagree with the advice presented here.
The accepted answer accepts things that aren't digits.
The question is about Zip Codes, not postal codes.
If we assume the post is wrong and actually means international postal codes, there are characters that appear in international postal codes that don't appear in that list, and many international - and also US domestic - postal codes can be over ten characters.
If we actually answer the question they asked, about zip codes, then there should be no accommodation for anything but digits (and arguably the hyphen).
US zip codes can be up to 11 digits long (13 characters counting the two dashes) - there is a zip, a zip+4, and a zip+6 (which programmers would call zip+4+2) notation; the last is used by skyscrapers, universities, et cetera
US zip codes are always non-negative integers, and therefore should not be stored as text, which is subject to non-canonical representation problems (ask anyone who has built such a system about the time they found out that their zip 00203 didn't match the zip 203 they accidentally got by repeatedly and unnecessarily round-tripping through string representations).
If you pretend you're actually tracking international post codes, the short, length-limited text fields suggested here don't even begin to do the job. The word "China" comes to mind.
My opinion:
Decide whether you're actually handling US postal codes or international
If you're handling US postal codes, track them as unsigned integers, and left-pad them with zeros when representing them as text; see the SQL sketch after this list. (Think Unix timestamps and local TZ representations if you need to understand why this will be simpler in the long run.)
If you're handling international post codes, store them in an unbounded unicode string, tie them to the country they represent, and validate country by country with check constraints. This problem is far more difficult than it sounds up front. International addresses are some of the least standardized things on Earth. Wait'll you find out how Japanese house numbers work, or why the British postal 6-code has the gaps it has.
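A rough SQL sketch of both options, with table, column, and constraint names made up purely for illustration; the per-country regexes are rough illustrations, not authoritative validation rules:

-- Option 1: US ZIP codes as non-negative integers
-- (bigint, because an 11-digit zip+4+2 overflows a 4-byte int)
CREATE TABLE us_address (
    id  serial PRIMARY KEY,
    zip bigint NOT NULL CHECK (zip >= 0 AND zip < 100000000000)
);
-- left-pad with zeros only when rendering as text:
-- SELECT lpad(zip::text, 5, '0') FROM us_address WHERE zip < 100000;

-- Option 2: international postal codes as text, tied to a country
CREATE TABLE intl_address (
    id           serial PRIMARY KEY,
    country_code char(2) NOT NULL,  -- ISO 3166-1 alpha-2
    postal_code  text NOT NULL,
    CHECK (country_code <> 'US' OR postal_code ~ '^[0-9]{5}(-[0-9]{4})?$'),
    CHECK (country_code <> 'CA' OR postal_code ~ '^[A-Z][0-9][A-Z] ?[0-9][A-Z][0-9]$')
    -- ... one validation per country you actually care about
);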
It is something like xxxxx-xxxx, so varchar(10) is recommended.
If you want to check the syntax of the values in the database, you could create a domain type for zip codes.
CREATE DOMAIN zipcode varchar(10)
CONSTRAINT valid_zipcode
CHECK (VALUE ~ '[A-Z0-9-]+'); -- or a better regular expression
You could have a look at this site, which proposes this regex:
(^\d{5}(-\d{4})?$)|(^[ABCEGHJKLMNPRSTVXY]{1}\d{1}[A-Z]{1} *\d{1}[A-Z]{1}\d{1}$)
But you should check that it works with PostgreSQL's regex syntax; a quick sketch of such a check follows.
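For instance, you could drop that pattern into a domain and try a couple of values (the domain name us_ca_postcode is invented for illustration):

CREATE DOMAIN us_ca_postcode AS text
    CHECK (VALUE ~ '(^\d{5}(-\d{4})?$)|(^[ABCEGHJKLMNPRSTVXY]{1}\d{1}[A-Z]{1} *\d{1}[A-Z]{1}\d{1}$)');

-- both should be accepted:
SELECT '12345-6789'::us_ca_postcode;
SELECT 'K1A 0B1'::us_ca_postcode;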
It depends on what kind of zip you want. If you're sure you will only ever need to store the standard 5-digit form, then an int will be the most space-saving.
However, if you need the 5+4 extended form, then a 10-character field is best. I personally suggest that, as it makes things easier in the future if you end up needing to store international postal codes; 10 characters covers just about every postal code format I've come across.

Solr date field tdate vs date?

So I have a question about Solr's date field types which is pretty straightforward: what's the difference between a 'date' field and a 'tdate' one?
The schema.xml claims 'For faster range queries, consider the tdate type' and 'A Trie based date field for faster date range queries and date faceting.'
Fair enough... but what's the precisionStep="6" all about? Should I change it? Does it change the way I would create the query if I use tdate? What's the real advantage, or what does Solr do that makes it better?
P.S. I went through Google, the Solr manual, the Solr wiki and the Javadocs without any luck, so I'd appreciate a kind and explanatory answer :)...
Also checked:
http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/
http://web.archiveorange.com/archive/v/AAfXfqRYyLnDFtskmLRi
Trie fields make range queries faster by precomputing certain range results and storing them as a single record in the index. For clarity, my example will use integers in base ten. The same concept applies to all trie types. This includes dates, since a date can be represented as the number of seconds since, say, 1970.
Let's say we index the number 12345678. We can tokenize this into the following tokens.
12345678
123456xx
1234xxxx
12xxxxxx
The 12345678 token represents the actual integer value. The tokens with the x digits represent ranges. 123456xx represents the range 12345600 to 12345699, and matches all the documents that contain a token in that range.
Notice how each token in the list has successively more x digits. This is controlled by the precision step. In my example, you could say that I was using a precision step of 2, since I trim 2 digits to create each extra token. If I were to use a precision step of 3, I would get these tokens.
12345678
12345xxx
12xxxxxx
A precision step of 4:
12345678
1234xxxx
A precision step of 1:
12345678
1234567x
123456xx
12345xxx
1234xxxx
123xxxxx
12xxxxxx
1xxxxxxx
It's easy to see how a smaller precision step results in more tokens and increases the size of the index. However, it also speeds up range queries.
Without the trie field, if I wanted to query a range from 1250 to 1275, Lucene would have to fetch 26 entries (1250, 1251, 1252, ..., 1275) and combine search results. With a trie field (and precision step of 1), we could get away with fetching 8 entries (125x, 126x, 1270, 1271, 1272, 1273, 1274, 1275), because 125x is a precomputed aggregation of 1250 - 1259. If I were to use a precision step larger than 1, the query would go back to fetching all 26 individual entries.
Note: In reality, the precision step refers to the number of bits trimmed for each token. If you were to write your numbers in hexadecimal, a precision step of 4 would trim one hex digit for each token. A precision step of 8 would trim two hex digits.
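For reference, this is roughly what the two field types look like in a Solr 1.4-era schema.xml (the created_at field name and the example query are invented; precisionStep="6" is the stock default for tdate):

<!-- plain date field vs. trie-based tdate -->
<fieldType name="date"  class="solr.DateField"     sortMissingLast="true" omitNorms="true"/>
<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6"/>

<!-- a field using it; the query syntax is identical for both types -->
<field name="created_at" type="tdate" indexed="true" stored="true"/>
<!-- e.g. q=created_at:[2011-01-01T00:00:00Z TO NOW] -->

So switching to tdate does not change how you write the query; it only changes how the indexed terms are generated, per the precision-step scheme described above.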
Basically trie ranges are faster. Here is one explanation. With precisionStep you configure how much your index can grow to get the performance benefits. To quote from the link you are referring to:
More importantly, it is not dependent on the index size, but instead the precision chosen.
and
the only drawbacks of TrieRange are a little bit larger index sizes, because of the additional terms indexed
Your best bet is to just look at the source code. Some of the things for Solr aren't well documented and the fastest way to get a trustworthy answer is to simply look at the code. If you haven't been in the code yet, that too is to your benefit. At least in the long run.
Here's a link to the TrieTokenizerFactory.
http://www.jarvana.com/jarvana/view/org/apache/solr/solr-core/1.4.1/solr-core-1.4.1-sources.jar!/org/apache/solr/analysis/TrieTokenizerFactory.java?format=ok
The javadoc in the class at least hints at the purpose of precisionStep. You could dig further.
EDIT: I dug a bit further for you. It's passed off directly to Lucene's NumericTokenStream class, which uses the value when parsing the token stream. Probably worth closer examination. It seems to deal with granularity and is probably a trade-off between index size and speed.