I have a telephony app with a prompt that requires a user choice. I made the app select one of 10 different phone prompts based on the last digit of the caller's phone number. Then I measure whether the user responds to the prompt (accept) or decides to skip to the next step (reject). I thought this would work well enough as a random selection, but I think I may be wrong.
What I'm finding is that the exact same prompt has a dramatically different response rate (25% vs 35%) for two different last digits. Now I'm curious why this is. Does anyone know how phone numbers are assigned and why the last digit would be significant?
I checked our billing database. We use Asterisk as our PBX and store billing records in a PostgreSQL database.
select substring(cdr_callerid_norm from '[0-9]$') as last_digit, count(*)
from asterisk_cdr
where cdr_callerid_norm is not null and length(cdr_callerid_norm) > 2
group by last_digit
order by last_digit
Result:
last_digit | count
------------+-------
0 | 17919
1 | 13811
2 | 8257
3 | 20708
4 | 13492
5 | 13708
6 | 8813
7 | 6943
8 | 11693
9 | 7942
| 2584
(11 rows)
To me those numbers do not look random. You can run a similar query against your own billing data and check. Phone numbers may be assigned more or less randomly in general, but if a few customers call you much more often than others, the last digits of your callers' numbers will not be evenly distributed. Consider using something else to vary the prompt: a random number, the current time, etc.
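If you want to check whether a handful of heavy callers are skewing your distribution, a query along these lines (a sketch reusing the same table and columns as above) compares total calls with distinct callers per last digit:
-- a high call count with few distinct callers suggests a few numbers dominate that digit
select substring(cdr_callerid_norm from '[0-9]$') as last_digit,
       count(*) as calls,
       count(distinct cdr_callerid_norm) as distinct_callers
from asterisk_cdr
where cdr_callerid_norm is not null and length(cdr_callerid_norm) > 2
group by last_digit
order by last_digit;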
I am new to Tableau visualization and need some help.
I have a set of shipping lanes which have whole-number values based on the duration of the shipment.
Ex:
| Lane Name | 0 Day | 1 Day | 2 Day | 3 Day | 4 Day |
|-----------|-------|-------|-------|-------|-------|
| SFO-LAX   | 0     | 30    | 60    | 10    | 0     |
| JFK-LAX   | 0     | 10    | 20    | 50    | 80    |
For each Lane Name, I want to return the column header based on the max value.
i.e. for SFO-LAX I would return '2 Day', for JFK-LAX I would return '4 Day', etc.
I then want to set this as a filter to only show 2 Day and 3 Day results in my Tableau data set.
Can someone help?
Two steps to this.
The first step is pretty easy: pivot your data. Read the Tableau help to learn how to PIVOT your data for analysis - i.e. make it look to Tableau like a longer, 3-column data set with Lane, Duration and Value as the columns. Tableau's PIVOT feature will let you view your data in that format (which makes analysis much easier) without actually changing the format of your data files.
The second step is a bit trickier than you'd expect at first glance, and there are a few ways to accomplish it. The Tableau features that can be used for this are LOD calcs, table calcs, filters and possibly sets. These are some of the more powerful but complicated parts of Tableau, so worth your time to learn about, but expect to take a while to spin up on them.
The easiest solution is probably to use one of the RANK() functions - start with it as a quick table calc. Set your partitioning and addressing so that the ranks are computed for the blocks of data you want - say, partitioning on Lane and addressing (computing) by Duration. Then, when you are happy with the ranks you see, move the rank calculation to the filter shelf and only display data where rank = 1.
This is a quick solution once you get the hang of it, but it can get slow for very large data sets since the rank calculations are done on the client side, requiring fetching all the data that you end up not displaying. If performance becomes an issue, you might want to look at other solutions that do more of the calculation server side - possibly using LOD calcs, or analytic (a.k.a. windowing) queries invoked from custom SQL, as sketched below.
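For reference, the custom SQL option might look something like this, assuming the data has already been reshaped into long format in a table called lane_durations with lane_name, duration and value columns (all of those names are illustrative):
-- keep only the duration with the highest value for each lane
select lane_name, duration, value
from (
    select lane_name, duration, value,
           rank() over (partition by lane_name order by value desc) as value_rank
    from lane_durations
) ranked
where value_rank = 1;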
I need to design a (postgres) database table which can save a dynamic range of something.
Example:
We have a course table. Each course can have (a minimum AND a maximum) OR (a specific amount) of participants.
A math course can be started with 4 to 10 students while a physics course needs to have exactly 8 students to start.
After that, I want to be able to query that.
Let's say, I want all courses which can take 6 students. The math course should be returned, the physics course shouldn't as it requires exactly 8 students.
When I query for 8 students, both courses should be returned.
For the implementation I thought about two simple fields: min_students and max_students. Then I could simply check if the number is equal to or between these numbers.
The issue is that I have to fill both columns every time, even for the physics course, which requires exactly 8 students.
example:
name | min_students | max_students
--------|--------------|-------------
math | 4 | 10
physics | 8 | 8
Is there a more elegant/efficient way? I also thought about making the max_students column nullable so I could check for
min_students = X OR (min_students >= X AND max_students <= Y)
Would that be more efficient? What about the performance?
Each course can have (a minimum AND a maximum) OR (a specific amount) of participants.
All courses have a minimum and a maximum; for some courses they happen to be the same value. It might seem trivial, but thinking about it that way lets you define the problem in a simpler way.
Instead of:
min_students == X OR (min_students >= X AND max_students <= Y)
you can express it as:
num_students BETWEEN min_students AND max_students
BETWEEN is inclusive, so 8 BETWEEN 8 AND 8 is true.
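As a quick sketch, assuming the table from the question is called courses and has the min_students and max_students columns shown there, finding every course that can take 6 students becomes:
-- math (4-10) qualifies, physics (8-8) does not
select name
from courses
where 6 between min_students and max_students;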
Regarding optimizations
Additional conditionals make queries much harder for humans to understand, which leads to missed edge cases and usually results in inefficient queries anyway. Focus on making the code easy to understand, or "elegant", and never sacrifice readability for performance unless you are really sure that you have a performance issue in the first place and that your optimization actually helps.
If you have a table with 10M rows it might be worth super-optimizing disk usage if you run on extremely limited hardware, but shaving even 20 MB off a table's disk usage is almost certainly a waste of time in any normal circumstance, even when it doesn't make the code more complicated.
Besides, each row takes up 23-24 bytes in addition to any actual data it contains, so shaving off a byte or two wouldn't make a big difference. Setting values to NULL can actually increase disk usage in some situations.
Alternative solution
When using a range data type the comparison would look like this:
num_students #> x
where num_students represents a range (for example 4 to 10) and #> means "contains the value"
create table num_sequence (num int);
create table courses_range (name text, num_students int4range);
insert into num_sequence select generate_series(3,10);
insert into courses_range values
('math', '[4,4]'), ('physics', '[6,7]'), ('dance', '[7,9]');
select * from num_sequence
left join courses_range on num_students #> num;
num | name | num_students
-----+---------+--------------
3 | |
4 | math | [4,5)
5 | |
6 | physics | [6,8)
7 | physics | [6,8)
7 | dance | [7,10)
8 | dance | [7,10)
9 | dance | [7,10)
10 | |
Note that the ranges are output in the canonical form [x,y): square brackets mean inclusive, parentheses mean exclusive, and for integers [4,4] = [4,5) = (3,5).
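If the table grows large, range containment queries like this can also be backed by an index; a minimal sketch (the index name is illustrative):
-- GiST indexes support the range containment operator @>
create index courses_range_num_students_idx
    on courses_range using gist (num_students);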
I am VERY new to Access - I was sort of thrust into designing a database for a research project I'm involved in. So, please bear with me because I know next to nothing :) The problem I am having is thus:
My database is for a medical research project, and is very time and date dependent, by which I mean I need to capture the date and time for each piece of data so that we end up with a sort of timeline of events for each subject.
As is, I have something like the following for each piece of data (each in its own field):
ArrivalDate
ArrivalTime
HeartRateDate
HeartRateTime
HeartRateData
TemperatureDate
TemperatureTime
TemperatureData
BloodPressureDate
BloodPressureTime
BloodPressureData
There are around 200 similar pieces of data that I need to collect for each patient. To avoid having to re-enter the same data over and over, and also to reduce the potential for error, I would like to have all of the date fields in a given patient record default to the first one that is entered, in this case "Arrival Date". However, I also need each date field to be editable without affecting the others, so that if a patient's visit spans a few days we can record that accurately.
I have tried messing around with the default value setting, as well as setting the control source to reference the "Arrival Date" field, but then of course any changes to one field affect them all. I am not even sure that what I am trying to do is possible but I will appreciate any help and/or suggestions!
Thank you in advance
Having all this data in separate columns of a big table isn't going to work. You don't measure things like temperature or blood pressure only once per patient, do you?
This is a classic one-to-many relation.
You should have a separate Measurements table, looking e.g. like this:
+--------+-----------+---------------+------------------+-----------+
| MeasID | PatientID | MeasType | MeasDateTime | MeasValue |
+--------+-----------+---------------+------------------+-----------+
| 1 | 1 | Temperature | 2017-05-17 14:30 | 38.2 |
| 2 | 1 | BloodPressure | 2017-05-17 14:30 | 130/90 |
| 3 | 1 | Temperature | 2017-05-17 18:00 | 38.5 |
| 4 | 2 | Temperature | etc. | |
+--------+-----------+---------------+------------------+-----------+
As Barmar wrote, there is no reason to have separate columns for date and time.
In the form where measurements are entered, you can use the BeforeInsert event to set MeasDateTime to the current time, with the Now() function.
So the user never has to enter it manually, but they can edit it if the measurement was at a different time than entering the data.
I am working out a PostgreSQL query to allow for fuzzy searching capabilities when searching for a company's name in an app that I am working on. I have found and have been working with Postgres' levenshtein function (part of the fuzzystrmatch module) and for the most part it is working. However, it only seems to work when the company's name is one word, for example:
With Apple (which is stored in the database as simply apple) I can run the following query and have it work near perfectly (it returns a levenshtein distance of 0):
SELECT * FROM contents
WHERE levenshtein(company_name, 'apple') < 4;
However when I take the same approach with Sony (which is stored in the database as Sony Electronics INC) I am unable to get any useful results (entering Sony gives a levenshtein distance of 16).
I have tried to remedy this problem by breaking the company's name down into individual words and inputting each one individually, resulting in something like this:
user input => 'sony'
SELECT * FROM contents
WHERE levenshtein('Sony', 'sony') < 4
OR levenshtein('Electronics', 'sony') < 4
OR levenshtein('INC', 'sony') < 4;
So my question is this: is there some way that I can accurately implement a multi-word fuzzy search with the current general approach that I have now, or am I looking in the complete wrong place?
Thanks!
Given your data and the following query with deliberately exaggerated values for the Levenshtein insertion (10000), deletion (100) and substitution (1) costs:
with sample_data as (
    select 101 as "id", 'Sony Entertainment Inc' as "name"
    union
    select 102 as "id", 'Apple Corp' as "name"
)
select sample_data.id, sample_data.name, components.part,
       levenshtein(components.part, 'sony', 10000, 100, 1) as ld_sony
from sample_data
inner join (select sd.id,
                   lower(unnest(regexp_split_to_array(sd.name, E'\\s+'))) as part
            from sample_data sd) components on components.id = sample_data.id;
The output is so:
id | name | part | ld_sony
-----+------------------------+---------------+---------
101 | Sony Entertainment Inc | sony | 0
101 | Sony Entertainment Inc | entertainment | 903
101 | Sony Entertainment Inc | inc | 10002
102 | Apple Corp | apple | 104
102 | Apple Corp | corp | 3
(5 rows)
Row 1 - no changes.
Row 2 - 9 deletions and 3 changes
Row 3 - 1 insertion and 2 changes
Row 4 - 1 deletion and 4 changes
Row 5 - 3 changes
I've found that splitting the words out causes a lot of false positives when you apply a threshold. You can order by the Levenshtein distance to position the better matches close to the top. Maybe tweaking the Levenshtein cost parameters will help you order the matches better. Sadly, Levenshtein doesn't weight earlier changes differently than later changes.
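As a sketch of that ordering idea against the contents table from the question (the per-word split mirrors the query above; everything apart from contents and company_name is illustrative):
-- best (lowest) per-word distance for each company, closest matches first
select company_name,
       min(levenshtein(part, 'sony')) as best_distance
from contents,
     unnest(regexp_split_to_array(lower(company_name), E'\\s+')) as part
group by company_name
order by best_distance
limit 10;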
I made a table like
record
---------
 1 | one
 2 | two
 3 | three
 4 | four
 5 | five
There isn't an ID column; those are just the row numbers I see beside each row in DBVisualizer. I added the rows in the order 1, 2, 3, 4, 5. Is this query:
SELECT
*
FROM
sch.test_table limit 1;
certain to always return "one", i.e. start with the "oldest" record? Or will that change with large data sets?
No, as per the SQL specification the order is indeterminate when not using order by. You're working with a set of data, and sets are not ordered. Also, the size of the set should not matter.
The Postgresql documentation says:
If ORDER BY is not given, the rows are returned in whatever order the
system finds fastest to produce.
Which means that the rows might come back in the expected order, or they might not - there are no guarantees.
The bottom line is that if you want deterministic results, you have to use ORDER BY.
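If the insertion order matters, the usual fix is to make it explicit with a column you can sort on. A minimal sketch, with an id column that is not part of the original table:
-- an explicit ordering column makes "oldest row first" deterministic
create table sch.test_table (
    id     serial primary key,
    record text
);

select *
from sch.test_table
order by id
limit 1;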