Generate unique random strings - perl

I am writing a very small URL shortener with Dancer. It uses the REST plugin to store a posted URL in a database with a six character string which is used by the user to access the shorted URL.
Now I am a bit unsure about my random string generation method.
sub generate_random_string{
my $length_of_randomstring = shift; # the length of
# the random string to generate
my #chars=('a'..'z','A'..'Z','0'..'9','_');
my $random_string;
for(1..$length_of_randomstring){
# rand #chars will generate a random
# number between 0 and scalar #chars
$random_string.=$chars[rand #chars];
}
# Start over if the string is already in the Database
generate_random_string(6) if database->quick_select('urls', { shortcut => $random_string });
return $random_string;
}
This generates a six char string and calls the function recursively if the generated string is already in the DB. I know there are 63^6 possible strings but this will take some time if the database gathers more entries. And maybe it will become a nearly infinite recursion, which I want to prevent.
Are there ways to generate unique random strings, which prevent recursion?
Thanks in advance

We don't really need to be hand-wavy about how many iterations (or recursions) of your function there will be. I believe at every invocation, the expected number of iterations is geomtrically distributed (i.e. number of trials before first success is governed by the geomtric distribution), which has mean 1/p, where p is the probability of successfully finding an unused string. I believe that p is just 1 - n/63^6, where n is the number of currently stored strings. Therefore, I think that you will need to have stored 30 billion strings (~63^6/2) in your database before your function recurses on average more than 2 times per call (p = .5).
Furthermore, the variance of the geomtric distribution is 1-p/p^2, so even at 30 billion entries, one standard deviation is just sqrt(2). Therefore I expect ~99% of the time that the loop will take fewerer than 2 + 2*sqrt(2) interations or ~ 5 iterations. In other words, I would just not worry too much about it.

From an academic stance this seems like an interesting program to work on. But if you're on the clock and just need random and distinct strings I'd go with the Data::GUID module.
use strict;
use warnings;
use Data::GUID qw( guid_string );
my $guid = guid_string();

Getting rid of recursion is easy; turn your recursive call into a do-while loop. For instance, split your function into two; the "main" one and a helper. The "main" one simply calls the helper and queries the database to ensure it's unique. Assuming generate_random_string2 is the helper, here's a skeleton:
do {
$string = generate_random_string2(6);
} while (database->quick_select(...));
As for limiting the number of iterations before getting a valid string, what about just saving the last generated string and always building your new string as a function of that?
For example, when you start off, you have no strings, so let's just say your string is 'a'. Then the next time you build a string, you get the last built string ('a') and apply a transformation on it, for instance incrementing the last character. This gives you 'b'. and so on. Eventually you get to the highest character you care for (say 'z') at which point you append an 'a' to get 'za', and repeat.
Now there is no database, just one persistent value that you use to generate the next value. Of course if you want truly random strings, you will have to make the algorithm more sophisticated, but the basic principle is the same:
Your current value is a function of the last stored value.
When you generate a new value, you store it.
Ensure your generation will produce a unique value (one that did not occur before).

I've got one more idea based on using MySQL.
create table string (
string_id int(10) not null auto_increment,
string varchar(6) not null default '',
primary key(string_id)
);
insert into string set string='';
update string
set string = lpad( hex( last_insert_id() ), 6, uuid() )
where string_id = last_insert_id();
select string from string
where string_id = last_insert_id();
This gives you an incremental hex value which is left padded with non-zero junk.

Related

How do I prevent users to use thousands separator in FileMaker Pro?

In FileMaker Pro, when using number field, the user can choose to use a thousand separator or not. For example, if I have a database with a field for the price of an item, the user can either enter 1,000 or 1000.
I am using my database to generate an XML file that needs to be uploaded. The thing is, that my XML scheme dictates that only a value of 1000 is allowed and not 1,000. Therefore, I want to either automatically remove the comma, or (my preference in this case) alert the user when trying to enter a value with a thousand separator.
What I tried is the following.
For the field, I am setting Validation options. For example:
Require Strict data type: Numeric Only
Validated by calculation: Position ( Self ; ","; 1 ; 1 ) = 0
Validated by calculation: Self = Substitue ( Self, ",", "")
Auto-enter calculation: Filter( Self ; "0123456789." )
Unfortunately, none of these work. As the field is defined as a number (and I want to keep it like this, as I am also performing calculations based on this number), the Position function and the Substitute function apparently ignore the thousand separator!
EDIT:
Note that I am generating my XML by concatenating a string, for example:
"<Products><Product><Name>" & Name & "</Name><Price>" & Price & "</Price></Product></Product>"
The reason is that what I am exporting is dependent on the values in my database. Therefore, I am not using the [File][Export records...] function.
Auto-enter calculation will work, but you need to uncheck the box "Do not replace existing value of field" (which is checked by default).
I'd suggest using the calculation GetAsNumber(self) as the auto-enter calc. If it should only contain integers, wrap that in a call to Int()
I am using my database to generate an XML file that needs to be uploaded. The thing is, that my XML scheme dictates that only a value of 1000 is allowed and not 1,000.
If this is only a problem when you export, why not handle it when exporting?
If you are exporting as XML using XSLT, you can add an instruction to
your stylesheet to remove the comma from all number fields;
Alternatively, you can export from a layout where the field is
formatted to display without the comma and select the Apply current's layout data formatting to exported data option when
exporting.
Added:
Perhaps I should have clarified. I am not using the export function to generate the XML as there is some logic involved in how the XML should be formatted (dependent on the data that I want to export). What I do instead is that I make a string where I combine XML-tags and actual values from the database.
IMHO, you're making a mistake by not taking advantage of the built-in XML/XSLT export option. Any imaginable logic can be implemented this way, without burdening your solution with the fragile task of creating a valid XML.
In any case, if you're using the field in a calculation, you can replace all references to it with:
GetAsNumber (YourField )
to get an unformatted, numeric-only, value.
Your question puzzles me. As far as I know, FileMaker does not store the thousands separator, but rather offers it only as a display option.
That's also why those functions can't find it.
Are you sure you are exporting the raw data and not a "formatted as layout" variant?

How to sort values of a field in a structure in Matlab?

In my code, I have a structure and in a field of it, I want to sort its values.
For instance, in the field of File_Neg.name there are the following values, and They should be sorted as the right values.
File_Neg.name --> Sorted File_Neg.name
'-10.000000.dcm' '-10.000000.dcm'
'-102.500000.dcm' '-12.500000.dcm'
'-100.000000.dcm' '-100.000000.dcm'
'-107.500000.dcm' '-102.500000.dcm'
'-112.500000.dcm' '-107.500000.dcm'
'-110.000000.dcm '-110.000000.dcm'
'-12.500000.dcm' '-112.500000.dcm'
There is a folder that there are some pictures with negative labels in it (above example are labels of pictures). I want to get them in the same order as present in the folder(that's mean the Sorted File_Neg.name). But when running the following code the values of Files_Neg.name load as the above example (left: File_Neg.name), while I want the right form.
I have also seen this and that but they didn't help me.
How to sort values of a field in a structure in Matlab?
Files_Neg = dir('D:\Rename-RealN');
File_Neg = dir(strcat('D:\Rename-RealN\', Files_Neg.name, '\', '*.dcm'));
% when running the code the values of Files_Neg.name load as the above example (left: File_Neg.name)
File_Neg.name:
This answer to one of the questions linked in the OP is nearly correct for the problem in the OP. There are two issues:
The first issue is that the answer assumes a scalar value is contained in the field to be sorted, whereas in the OP the values are char arrays (i.e. old-fashioned strings).
This issue can be fixed by adding 'UniformOutput',false to the arrayfun call:
File_Neg = struct('name',{'-10.000000.dcm','-102.500000.dcm','-100.000000.dcm','-107.500000.dcm','-112.500000.dcm','-110.000000.dcm','-12.500000.dcm'},...
'folder',{'a','b','c','d','e1','e2','e3'});
[~,I] = sort(arrayfun(#(x)x.name,File_Neg,'UniformOutput',false));
File_Neg = File_Neg(I);
File_Neg is now sorted according to dictionary sort (using ASCII letter ordering, meaning that uppercase letters come first, and 110 still comes before 12).
The second issue is that OP wants to sort according to the magnitude of the number in the file name, not using dictionary sort. This can be fixed by extracting the value in the anonymous function applied using arrayfun. We use str2double on the file name, minus the last 4 characters '.dcm':
[~,I] = sort(arrayfun(#(x)abs(str2double(x.name(1:end-4))),File_Neg));
File_Neg = File_Neg(I);
Funnily enough, we don't want to use 'UniformOutput',false any more, since the anonymous function now returns a scalar value.

Defaultdict() the correct choice?

EDIT: mistake fixed
The idea is to read text from a file, clean it, and pair consecutive words (not permuations):
file = f.read()
words = [word.strip(string.punctuation).lower() for word in file.split()]
pairs = [(words[i]+" " + words[i+1]).split() for i in range(len(words)-1)]
Then, for each pair, create a list of all the possible individual words that can follow that pair throughout the text. The dict will look like
[ConsecWordPair]:[listOfFollowers]
Thus, referencing the dictionary for a given pair will return all of the words that can follow that pair. E.g.
wordsThatFollow[('she', 'was')]
>> ['alone', 'happy', 'not']
My algorithm to achieve this involves a defaultdict(list)...
wordsThatFollow = defaultdict(list)
for i in range(len(words)-1):
try:
# pairs overlap, want second word of next pair
# wordsThatFollow[tuple(pairs[i])] = pairs[i+1][1]
EDIT: wordsThatFollow[tuple(pairs[i])].update(pairs[i+1][1][0]
except Exception:
pass
I'm not so worried about the value error I have to circumvent with the 'try-except' (unless I should be). The problem is that the algorithm only successfully returns one of the followers:
wordsThatFollow[('she', 'was')]
>> ['not']
Sorry if this post is bad for the community I'm figuring things out as I go ^^
Your problem is that you are always overwriting the value, when you really want to extend it:
# Instead of this
wordsThatFollow[tuple(pairs[i])] = pairs[i+1][1]
# Do this
wordsThatFollow[tuple(pairs[i])].append(pairs[i+1][1])

Assigning a whole DataStructure its nullind array

Some context before the question.
Imagine file FileA having around 50 fields of different types. Instead of all programs using the file, I tried having a service program, so the file could only be accessed by that service program. The programs calling the service would then receive a DataStructure based on the file structure, as an ExtName. I use SQL to recover the information, so, basically, the procedure would go like this :
Datastructure shared by service program :
D FileADS E DS ExtName(FileA) Qualified
Procedure called by programs :
P getFileADS B Export
D PI N
D PI_IDKey 9B 0 Const
D PO_DS LikeDS(FileADS)
D LocalDS E DS ExtName(FileA) Qualified
D NullInd S 5i 0 Array(50) <-- Since 50 fields in fileA
//Code
Clear LocalDS;
Clear PO_DS;
exec sql
SELECT *
INTO :LocalDS :nullind
FROM FileA
WHERE FileA.ID = :PI_IDKey;
If SqlCod <> 0;
Return *Off;
EndIf;
PO_DS = LocalDS;
Return *On;
P getFileADS E
So, that procedure will return a datastructure filled with a record from FileA if it finds it.
Now my question : Is there any way I can assign the %nullind(field) = *On without specifying EACH 50 fields of my file?
Something like a loop
i = 1;
DoW (i <= 50);
if nullind(i) = -1;
%nullind(datastructure.field) = *On;
endif;
i++;
EndDo;
Cause let's face it, it'd be a pain to look each fields of each file every time.
I know a simple chain(n) could do the trick
chain(n) PI_IDKey FileA FileADS;
but I really was looking to do it with SQL.
Thank you for your advices!
OS Version : 7.1
First, you'll be better off in the long run by eliminating SELECT * and supplying a SELECT list of the 50 field names.
Next, consider these two web pages -- Meaningful Names for Null Indicators and Embedded SQL and null indicators. The first shows an example of assigning names to each null indicator to match the associated field names. It's just a matter of declaring a based DS with names, based on the address of your null indicator array. The second points out how a null indicator array can be larger than needed, so future database changes won't affect results. (Bear in mind that the page shows a null array of 1000 elements, and the memory is actually relatively tiny even at that size. You can declare it smaller if you think it's necessary for some reason.)
You're creating a proc that you'll only write once. It's not worth saving the effort of listing the 50 fields. Maybe if you had many programs using this proc and you had to create the list each time it'd be a slight help to use SELECT *, but even then it's not a great idea.
A matching template DS for the 50 data fields can be defined in the /COPY member that will hold the proc prototype. The template DS will be available in any program that brings the proc prototype in. Any program that needs to call the proc can simply specify LIKEDS referencing the template to define its version in memory. The template DS should probably include the QUALIFIED keyword, and programs would then use their own DS names as the qualifying prefix. The null indicator array can be handled similarly.
However, it's not completely clear what your actual question is. You show an example loop and ask if it'll work, but you don't say if you had a problem with it. It's an array, so a loop can be used much like you show. But it depends on what you're actually trying to accomplish with it.
for old school rpg just include the nulls in the data structure populated with the select statement.
select col1, ifnull(col1), col2, ifnull(col2), etc. into :dsfilewithnull where f.id = :id;
for old school rpg that can't handle nulls remove them with the select statement.
select coalesce(col1,0), coalesce(col2,' '), coalesce(col3, :lowdate) into :dsfile where f.id = :id;
The second method would be easier to use in a legacy environment.
pass the key by value to the procedure so you can use it like a built in function.
One answer to your question would be to make the array part of a data structure, and assign *all'0' to the data structure.
dcl-ds nullIndDs;
nullInd Ind Dim(50);
end-ds;
nullIndDs = *all'0';
The answer by jmarkmurphy is an example of assigning all zeros to an array of indicators. For the example that you show in your question, you can do it this way:
D NullInd S 5i 0 dim(50)
/free
NullInd(*) = 1 ;
Nullind(*) = 0 ;
*inlr = *on ;
return ;
/end-free
That's a complete program that you can compile and test. Run it in debug and stop at the first statement. Display NullInd to see the initial value of its elements. Step through the first statement and display it again to see how the elements changed. Step through the next statement to see how things changed again.
As for "how to do it in SQL", that part doesn't make sense. SQL sets the values automatically when you FETCH a row. Other than that, the array is used by the host language (RPG in this case) to communicate values back to SQL. When a SQL statement runs, it again automatically uses whatever values were set. So, it either is used automatically by SQL for input or output, or is set by your host language statements. There is nothing useful that you can do 'in SQL' with that array.

Generate unique 3 letter/number code and compare to existing ones in PHP/MySQL

I'm making a code generation script for UN/LOCODE system and the database has unique 3 letter/number codes in every country. So for example the database contains "EE TLL", EE being the country (Estonia) and TLL the unique code inside Estonia, "AR TLL" can also exist (the country code and the 3 letter/number code are stored separately). Codes are in capital letters.
The database is fairly big and already contains a huge number of locations, the user has also the possibility of entering the 3 letter/number him/herself (which will be checked against the database before submission automatically).
Finally neither 0 or 1 may be used (possible confusion with O and I).
What I'm searching for is the most efficient way to pick the next available code when none is provided.
What I've came up with:
I'd check with AAA till 999, but then for each code it would require a new query (slow?).
I could store all the 40000 possibilities in an array and subtract all the used codes that are already in the database... but that uses too much memory IMO (not sure what I'm talking about here actually, maybe 40000 isn't such a big number).
Generate a random code and hope it doesn't exist yet and see if it does, if it does start over again. That's just risk taking.
Is there some magic MySQL query/PHP script that can get me the next available code?
I will go with number 2, it is simple and 40000 is not a big number.
To make it more efficient, you can store a number representing each 3-letter code. The conversion should be trivial because you have a total of 34 (A-Z, 2-9) letters.
I would for option 1 (i.e. do a sequential search), adding a table that gives the last assigned code per country (i.e. such that AAA..code are all assigned already). When assigning a new code through sequential scan, that table gets updated; for user-assigned codes, it remains unmodified.
If you don't want to issue repeated queries, you can also write this scan as a stored routine.
To simplify iteration, it might be better to treat the three-letter codes as numbers (as Shawn Hsiao suggests), i.e. give a meaning to A-Z = 0..25, and 2..9 = 26..33. Then, XYZ is the number X*34^2+Y*34+Z == 23*1156+24*34+25 == 27429. This should be doable using standard MySQL functions, in particular using CONV.
I went with the 2nd option. I was also able to make a script that will try to match as close as possible the country name, for example for Tartu it will try to match T** then TA* and if possible TAR, if not it will try TAT as T is the next letter after R in Tartu.
The code is quite extensive, I'll just post the part that takes the first possible code:
$allowed = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ23456789';
$length = strlen($allowed);
$codes = array();
// store all possibilities in a huge array
for($i=0;$i<$length;$i++)
for($j=0;$j<$length;$j++)
for($k=0;$k<$length;$k++)
$codes[] = substr($allowed, $i, 1).substr($allowed, $j, 1).substr($allowed, $k, 1);
$used = array();
$query = mysql_query("SELECT code FROM location WHERE country = '$country'");
while ($result = mysql_fetch_array($query))
$used[] = $result['code'];
$remaining = array_diff($codes, $used);
$code = $remaining[0];
Thanks for your opinion, this will be the key to transport codes all over the world :)