How to store the number 365 in 8 bits

Just a little background: when the application was initially designed (well, more than 15 years ago), our customer assured us that their business requirements would never change, and the application was designed accordingly.
The application manages certificates which, according to the law of our customer's country, must be renewed every half year (180 days). These certificates are stored on smart cards, which are distributed to various staffing entities belonging to our customer.
These smart cards are used to access various on-site systems, and the smart card reader automatically adjusts values on them according to its needs, such as decrementing the number of remaining days. When this number reaches zero, the card is re-issued with the initial value of 180 again.
The smart cards are tightly packed with various pieces of information, among them an 8-bit value telling how many days are left, because someone 15 years ago never imagined that the law would change and thought that 8 bits should be enough for everyone.
Obviously, a change request has now come in from the customer ... telling us that there is a plan to change the law to allow existing certificates to be extended to one year ...
And here comes the question: how can we store 365 in 8 bits ... any hacks are welcome.

I'm suggesting another approach: can you modify the smart card reader to decrement the counter only every 2 days?
Then you'd only need to count up to 183 (for leap years).
EDIT - There is also room for a leap-year bit, assuming you reverse the counting logic.
If you count up from 0 to 182 (covering days 0 to 364 in two-day steps), then once the counter reaches 160 (128+32, binary 1010 0000), the 0x40 bit can no longer be set by the count itself (that would require a value of at least 192, which the counter never reaches), so it is free to carry a leap-year flag:
if (dayCount >= 160) {
    realDayCount = dayCount & 0xBF; // clear the flag bit:   1011 1111
    leapFlag     = dayCount & 0x40; // isolate the flag bit: 0100 0000
}
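
To make the trick concrete, here is a minimal round-trip sketch in Java under the assumptions above (the counter counts up in two-day steps, and the flag only matters once the count reaches 160); all names are mine, not from the original system:

public class CertCounter {
    static final int FLAG = 0x40;      // bit reused as the leap-year flag
    static final int THRESHOLD = 160;  // 128 + 32; counts >= 160 never set 0x40

    // stores the flag only when it is representable (count >= 160);
    // earlier than that, the bit is still needed by the count itself
    static int pack(int twoDayCount, boolean leap) {
        if (twoDayCount >= THRESHOLD && leap)
            return twoDayCount | FLAG; // safe: the count alone never reaches 192
        return twoDayCount;
    }

    static int unpackCount(int stored) {
        return stored >= THRESHOLD ? stored & 0xBF : stored;
    }

    static boolean unpackLeap(int stored) {
        return stored >= THRESHOLD && (stored & FLAG) != 0;
    }

    public static void main(String[] args) {
        int stored = pack(181, true);            // near the end of a leap year
        System.out.println(unpackCount(stored)); // 181
        System.out.println(unpackLeap(stored));  // true
    }
}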

Related

The best answer for queueing theory in an interview?

Last week I did a phone interview and got stuck on one question:
Bank 1 has 5 tellers, each serving one customer at a time
independently; Bank 2 has 5 tellers, sharing a queue of customers to
serve. Which bank do you prefer? Why?
I don't know what the interviewer wanted to learn from this question. All I could do was say that Bank 2 is better, since most banks have only one queue, and one queue ensures that no one waits too long if one teller gets stuck.
But the interviewer didn't seem satisfied.
Does anyone know the best answer to this question?
Your answer does not consider the real question the interviewer is asking: "How do you think about this type of problem?" The answer you gave amounts to "other people do it this way, so do it that way." That is a cop-out, which is why it was unsatisfactory. Instead, consider that they are comparing single-threading and multi-threading as operations. Discuss the advantages and disadvantages of each. Discuss the reasons why you would prefer one over the other based on technical concerns. You only addressed one edge case: one teller getting "stuck". What about optimizing wait times, considering the types of tasks performed at each station, etc.?
Interviewers care about how you think, not about the answer you give.
With Bank 1 you have 5 tellers and 5 lines, one for each teller. That means that if 5 people get in line for the first teller, they have to wait and be processed one at a time by that teller, all the while the other 4 tellers are doing nothing. With Bank 2 you have 5 tellers and 1 line: if 5 people get in line, they are dispersed to the five tellers and all helped at the same time. So Bank 2 is more efficient in design.
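If you want to make the difference measurable rather than anecdotal, a small Monte Carlo comparison works well. This is only a sketch under made-up assumptions (exponential arrivals and service times, Bank 1 customers picking a line at random and never switching):

import java.util.Random;

public class BankQueues {
    static final int TELLERS = 5, CUSTOMERS = 100000;
    static final Random RND = new Random(42);

    // exponential random variate with the given rate
    static double exp(double rate) { return -Math.log(1 - RND.nextDouble()) / rate; }

    static double simulate(double lambda, double mu, boolean sharedQueue) {
        double[] freeAt = new double[TELLERS]; // when each teller next becomes free
        double t = 0, totalWait = 0;
        for (int i = 0; i < CUSTOMERS; i++) {
            t += exp(lambda); // arrival time of the next customer
            int teller;
            if (sharedQueue) {
                // Bank 2: the head of the single line takes whoever frees up first
                teller = 0;
                for (int k = 1; k < TELLERS; k++)
                    if (freeAt[k] < freeAt[teller]) teller = k;
            } else {
                // Bank 1: commit to a random line on arrival, no switching
                teller = RND.nextInt(TELLERS);
            }
            double start = Math.max(t, freeAt[teller]);
            totalWait += start - t;
            freeAt[teller] = start + exp(mu);
        }
        return totalWait / CUSTOMERS;
    }

    public static void main(String[] args) {
        double lambda = 4.5, mu = 1.0; // roughly 90% utilisation on 5 tellers
        System.out.printf("Bank 2 (shared queue)   avg wait: %.2f%n", simulate(lambda, mu, true));
        System.out.printf("Bank 1 (separate lines) avg wait: %.2f%n", simulate(lambda, mu, false));
    }
}

With these particular numbers the shared queue's average wait comes out several times lower: a customer is never stuck behind a long line while another teller sits idle. That variance-pooling argument is usually what the interviewer is fishing for.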

How can I improve on peach by using the scan adverb?

I have a list of times, and I want to count how many are within a given time window of each time, i.e. for each time, how many of the following times are less than 10 minutes ahead.
This is what I have so far, and it works well for small lists (ts<10000), but even using peach it struggles when the count goes above this, and I get wsfull errors.
q)ts:asc `time$10?10000000
q)ts where each {(x<=y) and (y<x+00:10)}[;ts] peach ts
00:10:20.526 00:11:41.084 00:15:59.360 00:20:15.625
00:11:41.084 00:15:59.360 00:20:15.625
00:15:59.360 00:20:15.625
,00:20:15.625
,01:11:14.831
02:14:36.999 02:17:47.700
02:17:47.700 02:25:44.267 02:27:02.389
02:25:44.267 02:27:02.389 02:28:16.790
02:27:02.389 02:28:16.790
,02:28:16.790
I have tried using scan and over, but can't figure out how to stop the iteration when I need to.
EDIT - If it's just the count you're after, then all you need is:
q)1+(ts bin ts+00:10)-til count ts
1 3 2 1 1 2 2 1 1 1
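
For readers who don't speak q: the one-liner uses bin on the sorted list to find, for each time, the index of the last time that falls within the next 10 minutes, and turns the index difference into a count. An illustrative Java equivalent of the same counts (my own sketch, not from the thread), using a linear two-pointer sweep instead of per-element binary search:

import java.util.Arrays;

public class WindowCounts {
    // ts must be sorted ascending; windowMs plays the role of 00:10
    static int[] windowCounts(long[] ts, long windowMs) {
        int[] counts = new int[ts.length];
        int j = 0; // last index still inside the current window
        for (int i = 0; i < ts.length; i++) {
            if (j < i) j = i;
            while (j + 1 < ts.length && ts[j + 1] < ts[i] + windowMs) j++;
            counts[i] = j - i + 1; // the time itself plus everything in its window
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] ts = {0, 120000, 300000, 900000}; // 0s, 2min, 5min, 15min
        System.out.println(Arrays.toString(windowCounts(ts, 600000))); // [3, 2, 1, 1]
    }
}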
OLD ANSWER - If you're trying to actually generate the lists of times (not sure why you'd need to do that), then no matter what you do you're going to end up eating a good bit of memory, since you're generating a large list of potentially large lists of times. Also, peach may not be useful here, since the time gained by outsourcing to other threads might be undone by the time needed to send the results back to the main thread. And any form of explicit iteration/looping is likely to be slow, since it will be acting atomically (one item at a time).
Having said that, the best approach would be to make use of bin, especially if your list is sorted. For example, either of these two should give you the lists of times, and they scale a bit better (again, you shouldn't need to generate the lists if you're only using them to count - see the edit above):
ts t+til each 1+(ts bin ts+00:10)-t:til count ts
{y[1]#y[0]_x}[ts] each t,'1+(ts bin ts+00:10)-t:til count ts
but they still involve generating lists of lists of indices, and the memory usage will still add up.
Note that the bin itself (which gives the index of the last item within 10 minutes of each item) is incredibly fast and memory-efficient, even when the list runs into the tens of millions:
q)ts:asc `time$10000000?10000000
q)
q)\ts ts bin ts+00:10
160 201326768

Roster/Timetable generation

I'm working on a tool to generate a timetable for employees, up to a month ahead, taking into account commercial and labour-law constraints. A few challenges and differences from the similar problem referred to:
The shift concept contains breaks, split down to half-hour granularity.
There is no concept of full 8-hour shifts as in the referred similar problem; e.g. there is a need to have 2 resources at 8 AM and 2.5 resources at 3 PM (e.g. to give a half-hour break).
Plus the regular constraints like hours per day, hours before a break, break length...
A possible solution is to rely on a solver such as OR-Tools or OptaPlanner. Any hints?
If you go with OptaPlanner and don't want to follow the Employee Rostering design of assigning 8-hour Shifts (planning entities) to Employees (planning values), because of your second constraint,
then you could try to follow the Cheap Time Example design, something like this:
@PlanningEntity
public class WorkAssignment {
    Employee employee;
    @PlanningVariable PotentialShiftStartTime startTime;
    @PlanningVariable int durationInHalfHours;
}
PotentialShiftStartTime is basically any time a shift can validly start, so Mon 8:00, Mon 8:30, Mon 9:00, etc.
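For illustration, PotentialShiftStartTime could be a simple immutable problem fact whose instances are generated once per planning window. This sketch (the class, its fields, and the 06:00-22:00 window are my assumptions, not from the OptaPlanner examples) shows one way to build that value range:

import java.time.DayOfWeek;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;

public class PotentialShiftStartTime {
    final DayOfWeek day;
    final LocalTime time;

    PotentialShiftStartTime(DayOfWeek day, LocalTime time) {
        this.day = day;
        this.time = time;
    }

    // one instance per half-hour slot in which a shift may validly start
    static List<PotentialShiftStartTime> buildValueRange() {
        List<PotentialShiftStartTime> range = new ArrayList<>();
        for (DayOfWeek day : DayOfWeek.values()) {
            for (LocalTime t = LocalTime.of(6, 0);
                 t.isBefore(LocalTime.of(22, 0));
                 t = t.plusMinutes(30)) {
                range.add(new PotentialShiftStartTime(day, t));
            }
        }
        return range;
    }
}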
The search space will be huge in this free-form way, but there are tricks to improve scalability (Nearby Selection, pick early for the Construction Heuristic, Limited Selection for the Construction Heuristic, ...).
To get out of the free-form way (that is, to reduce the search space), you might be able to combine startTime and durationInHalfHours into a single PotentialShift, if for example it's not possible to start an 8-hour shift at 16:00 in the afternoon. But make sure the gain is huge before introducing that complexity.
In any case, the trouble with this design is determining how many WorkAssignment instances to create. You'll probably want to create the maximum number possible per employee and work with nullable=true, so the solver can simply leave unused assignments empty.
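In annotation form that could look like the variant below (nullable planning variables are a standard OptaPlanner feature; the rest reuses the names from the snippet above):

@PlanningEntity
public class WorkAssignment {
    Employee employee;
    // a null startTime means "this assignment is unused", so the solver can
    // switch off surplus instances instead of being forced to schedule them
    @PlanningVariable(nullable = true) PotentialShiftStartTime startTime;
    @PlanningVariable int durationInHalfHours;
}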

Why is my identifier collision rate increasing?

I'm using a hash of IP + User Agent as a unique identifier for every user that visits a website. This is a simple scheme with a pretty clear pitfall: identifier collisions. Multiple individuals browse the internet with the same IP + user agent combination. Unique users identified by the same hash will be recognized as a single user. I want to know how frequently this identifier error will be made.
To calculate the frequency, I've created a two-step funnel that should theoretically convert at zero percent: publish.click > signup.complete. (Users have to signup before they publish.) Running this funnel for 1 day gives me a conversion rate of 0.37%. That figure is, I figured, my unique identifier collision probability for that funnel. Looking at the raw data (a table about 10,000 rows long), I confirmed this hypothesis. 37 signups were completed by new users identified by the same hash as old users who completed publish.click during the funnel period (1 day). (I know this because hashes matched up across the funnel, while UIDs, which are assigned at signup, did not.)
I thought I had it all figured out...
But then I ran the funnel for 1 week, and the conversion rate increased to 0.78%. For 5 months, the conversion rate jumped to 1.71%.
What could be at play here? Why does my conversion (collision) rate increase as the experiment period widens?
I think it may have something to do with the fact that unique users typically fire signup.complete only once, while they may fire publish.click multiple times over the course of a period. I'm struggling, however, to put this hypothesis into words.
Any help would be appreciated.
Possible explanations, starting with the simplest:
The collision rate is relatively stable, but your initial measurement isn't significant because of the low volume of positives that you got. 37 isn't very many. In this case, you've simply got two decent data points.
The collision rate isn't very stable and changes over time as usage changes (at work, at home, on mobile, etc.). The fact that you got three data points showing an upward trend is just a coincidence. This wouldn't surprise me, as funnel conversion rates change significantly over time, especially on a weekly basis. Bots that haven't been caught yet could also contribute.
If you really do get multiple publishes, and sign-ups are strictly a one-time event, then your collision rate will increase over time as users who initially only signed up eventually publish. That won't increase their own funnel conversion, but it provides an extra publish for somebody else to convert on. Essentially, every additional publish raises the probability that I, as a new user, am going to get confused with a previous publish event.
Note from OP: hypothesis 3 turned out to be the correct one.
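Hypothesis 3 is easy to demonstrate with a toy simulation: draw identifiers from a deliberately small pool so collisions are common, let publish events accumulate over the window, and watch the per-signup collision rate climb. A sketch in which every number is made up for illustration:

import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class CollisionWindow {
    public static void main(String[] args) {
        Random rnd = new Random(7);
        int hashPool = 100000;                        // distinct IP+UA hash values
        int signupsPerDay = 200, publishesPerDay = 600;
        for (int windowDays : new int[] {1, 7, 150}) {
            Set<Integer> publishHashes = new HashSet<>();
            int signups = 0, falseConversions = 0;
            for (int day = 0; day < windowDays; day++) {
                // existing users publishing; their hashes accumulate all window long
                for (int p = 0; p < publishesPerDay; p++)
                    publishHashes.add(rnd.nextInt(hashPool));
                for (int s = 0; s < signupsPerDay; s++) {
                    signups++;
                    // a brand-new user whose hash may coincide with an old publisher's
                    if (publishHashes.contains(rnd.nextInt(hashPool))) falseConversions++;
                }
            }
            System.out.printf("window %3d days: funnel 'converts' at %.2f%%%n",
                    windowDays, 100.0 * falseConversions / signups);
        }
    }
}

The exact percentages are meaningless, but the direction matches the observation: the longer the window, the more accumulated publish hashes a new signup can collide with.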

How to optimise Drools execution performance?

We have 1000 rules under a single rule flow group.
We have a severe performance issue while executing (around 10-20 seconds).
We thought that splitting the single rule flow group into multiple agenda groups might improve performance.
Or would creating multiple entry points increase performance?
Has anyone come across this problem?
Any links/documentation are also welcome.
There was a similar issue several months ago on the Drools user list, and it was resolved successfully by a different approach, following my proposal. It may be applicable here, too.
Let's say there are some risk factors that influence the premium for a car insurance policy. The attributes are: age, previous incidents, amount of damage in previous incidents, gender, medical classification.
Each of these values influences the premium by a few credits.
You can write tons of rules like
rule "Premium for one parameter combination"
when
    $a : Application( age >= 32 && <= 35, previous == 1, damage <= 1000,
                      gender == 'F', medical == 0.25 )
then
    $a.setPremium( 421 );
end
The proposed solution was to insert (constant) facts for each such parameter set and have a single rule that determines the matching parameter set and sets the premium from a field in that parameter set.
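A minimal sketch of that idea (class, field, and rule names are mine, not from the original thread): each parameter combination becomes one constant fact inserted at session start-up, and one generic rule joins Application against it.

public class PremiumBracket {
    public final int minAge, maxAge;   // inclusive age range
    public final int previous;         // number of previous incidents
    public final int maxDamage;        // damage ceiling for this bracket
    public final char gender;
    public final double medical;
    public final int premium;          // premium assigned to this bracket

    public PremiumBracket(int minAge, int maxAge, int previous, int maxDamage,
                          char gender, double medical, int premium) {
        this.minAge = minAge; this.maxAge = maxAge; this.previous = previous;
        this.maxDamage = maxDamage; this.gender = gender;
        this.medical = medical; this.premium = premium;
    }
}

/* The single generic rule replacing the ~1000 specific ones:

rule "Look up premium from matching bracket"
when
    $app : Application( )
    $b   : PremiumBracket( minAge <= $app.age, maxAge >= $app.age,
                           previous == $app.previous,
                           maxDamage >= $app.damage,
                           gender == $app.gender,
                           medical == $app.medical )
then
    $app.setPremium( $b.premium );
end
*/

The brackets then become data that can be loaded from a table instead of being hard-coded as a thousand separate rules.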