MiniZinc constraint solving with large string data

I'm working on a high-school scheduling project with MiniZinc. I have lists of teachers, classes, rooms, times, and events, all of type string, and a list of durations of type integer. I found on Stack Overflow that I need to represent this data with numbers, but my data set is large. How do I go about this without manually converting each entry?
Thank you

Unfortunately, MiniZinc doesn't have any tools to convert strings to data in the appropriate format, so I'm afraid you will have to do the conversion with a tool outside MiniZinc.
If you know a high-level programming language such as Perl, Python, or Ruby, it shouldn't be too hard.
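For example, here is a minimal sketch in Python of one way to do it (the input file name and column layout are made up; adapt them to your data). It assigns each distinct string a 1-based integer, since MiniZinc arrays are conventionally 1-based, and writes the result as a MiniZinc data (.dzn) file:

    # Assign 1-based integer ids to the distinct strings in each column
    # and emit them as a MiniZinc .dzn data file. "events.csv" and its
    # columns are hypothetical; adjust to your actual input.
    import csv

    def build_index(values):
        ids = {}
        for v in values:
            if v not in ids:
                ids[v] = len(ids) + 1
        return ids

    with open("events.csv", newline="") as f:
        rows = list(csv.DictReader(f))  # columns: teacher, room, duration, ...

    teacher_ids = build_index(r["teacher"] for r in rows)
    room_ids = build_index(r["room"] for r in rows)

    with open("schedule.dzn", "w") as out:
        out.write("n_teachers = %d;\n" % len(teacher_ids))
        out.write("teacher = [%s];\n" % ", ".join(str(teacher_ids[r["teacher"]]) for r in rows))
        out.write("room = [%s];\n" % ", ".join(str(room_ids[r["room"]]) for r in rows))
        out.write("duration = [%s];\n" % ", ".join(r["duration"] for r in rows))

Keep the id dictionaries around so you can translate the solver's numeric output back into the original names.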

Related

Store arbitrary precision integer in PostgreSQL

I have an application that needs to store cryptocurrency values in a PostgreSQL database. The application uses arbitrary-precision integers, and those are what I have to store in the database. What's the most efficient way to do that?
Why arbitrary precision? For two reasons:
For security: there must never be an overflow.
For necessity: Ethereum, for example, uses uint256 internally, and 1 Ether = 10^18 wei, so transaction values have a huge number of digits that must be stored exactly if accuracy is to be maintained (which it is).
The best solution I've come up with is to convert the number to a blob and store its bits in raw format. But I'm hoping there's a better way that's more suitable for a database.
EDIT:
The reason I care about the storage method is performance. I don't want to get into benchmarks and all that detail; that's why I'm keeping the question simple, otherwise it will get complicated. So the question is whether there's a proper way to do this.
Have a look at the documentation.
If you need efficiency but also depend on exact values (which I would agree with), then you really should define your columns, or separate tables, with specific presets using decimal(precision, scale).
If your tests reveal that the standard data types are not performing well enough, you might want to have a look at bignum and maybe others.
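As a rough illustration of the decimal/numeric route, here is a sketch in Python with psycopg2 (the table, column names, and connection string are invented). NUMERIC(78, 0) is wide enough for any uint256 value, since 2^256 has 78 decimal digits, and Python's built-in ints are already arbitrary precision:

    # Store arbitrary-precision integers in a PostgreSQL NUMERIC column.
    # Table name, column names, and connection string are hypothetical.
    import psycopg2

    conn = psycopg2.connect("dbname=wallet")
    with conn, conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS transfers (
                id  bigserial PRIMARY KEY,
                wei numeric(78, 0) NOT NULL  -- exact: no rounding, no overflow
            )
        """)
        amount = 3 * 10**18 + 7  # some huge wei amount; psycopg2 passes it exactly
        cur.execute("INSERT INTO transfers (wei) VALUES (%s)", (amount,))
        cur.execute("SELECT wei FROM transfers ORDER BY id DESC LIMIT 1")
        print(int(cur.fetchone()[0]))  # values come back as decimal.Decimal
    conn.close()

Whether this beats a raw byte representation for your workload is exactly what a benchmark would have to show; NUMERIC buys you exactness plus the ability to index, compare, and aggregate in SQL.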

using correct Scala type to lower memory footprint

I have a really big dataset which I process locally using Spark.
The dataset consists of tuples describing relations between two users, where the usernames are Strings and the relation is a Double:
(("Bob", "John"), 0.5)
The dataset takes too much memory, I get Java heap space errors, and the program crashes.
My plan is to lower the footprint of the dataset by using a different in-memory representation.
My initial idea was to replace the String usernames with their hash values, but Scala collections accept only classes, so I would have to use Integer, which didn't help: as referenced here, Int/Integer are objects and of course add overhead to the data.
Since collections can't be used with primitives, I found the TYPE data type, which as far as I understand is also an Object.
My question is really two questions:
Is there really no good way to use primitives in collections?
Are there any other hacks one can apply when dealing with such giant tables of the kind I described above?
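For what it's worth, the column-oriented version of that idea looks like this in Python with the standard array module (the sample record is from the question; names are illustrative): give each username a dense integer id and keep each column in a primitive-typed array instead of a collection of tuples. In Scala the analogous containers would be Array[Int] and Array[Double], which are backed by JVM primitive arrays and therefore avoid the boxing described above.

    # Replace strings with dense integer ids and keep the relation
    # columns in primitive-typed arrays instead of tuples of objects.
    from array import array

    user_ids = {}  # username -> dense int id

    def uid(name):
        return user_ids.setdefault(name, len(user_ids))

    src = array("i")     # one C int per entry, no per-element object
    dst = array("i")
    weight = array("d")  # one C double per entry

    for (a, b), w in [(("Bob", "John"), 0.5)]:  # sample record from the question
        src.append(uid(a))
        dst.append(uid(b))
        weight.append(w)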

What's the fastest way to create a C-compatible unbounded string in Ada?

I'm creating an Ada program for Windows that needs to be able to pass strings to some functions written in C. Until now I have been manipulating the strings in Ada using the Unbounded_String type, and then converting the data to an Interfaces.C.char_array before passing it to the C functions.
This works fine, but performance is a bit of an issue on slower, older computers. The C function is sometimes called repeatedly on a slightly modified version of a string, and requires the Unbounded_String to be converted to a similar char_array every time. The strings aren't modified by the C functions, so they only ever have to be converted to char_array.
I have thought of storing the strings as char_array and converting from an Ada type each time a string is manipulated. The data is passed to C more often than it is changed, so this would improve performance. The problem with this approach is that the length of a string will often change, sometimes by a lot, and there is no way of knowing the maximum length beforehand.
The ideal solution would be something similar to an Unbounded_String, only storing the string as a char_array: something dynamically sized, allocating a new array when the old one isn't big enough, that allows Ada Characters/Strings to be inserted into (and removed from) the array, converting only those characters to C chars.
Is there any (relatively) easy, fast way of doing this without having to implement it myself? Or is there any other quick way of manipulating C-compatible strings in Ada? Thanks in advance for any suggestions.
You don't mention how many objects you expect to have of your type, but I will assume that we are not talking about so many that you will be anywhere near exhausting your available address space.
Just encapsulate a sufficiently large char_array (say 10 times the largest expected size) in a private record, and create the needed operations to manipulate it.
If you're very unlucky, you may need to tell your compiler/run-time environment that you need an unusually large stack, but save that worry for when you actually experience it.
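The pattern described here (keep the master copy in a C-compatible buffer, wrap it in a type with its own operations, and reallocate only when it outgrows its capacity) is language-agnostic. Here is a minimal sketch of the same idea in Python with ctypes; the class name and growth policy are invented, and an Ada version would be a private record holding a char_array, as suggested above:

    # Keep the string in a C-compatible buffer; grow it only when the
    # appended data would not fit, converting only the new characters.
    import ctypes

    class GrowableCString:
        def __init__(self, capacity=64):
            self._buf = ctypes.create_string_buffer(capacity)  # zero-filled, NUL-terminated
            self._len = 0

        def append(self, text):
            data = text.encode("ascii")           # convert only the new characters
            needed = self._len + len(data) + 1    # +1 for the trailing NUL
            if needed > len(self._buf):
                new_buf = ctypes.create_string_buffer(max(needed, 2 * len(self._buf)))
                new_buf[:self._len] = self._buf[:self._len]  # copy existing bytes
                self._buf = new_buf
            self._buf[self._len:self._len + len(data)] = data
            self._len += len(data)

        def as_c_arg(self):
            return self._buf  # pass directly to a C function expecting char*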

What are the differences between Tables and Categorical Arrays, and cell and struct arrays?

In the newest version of MATLAB there are two new data types: Tables and Categorical Arrays.
Table is a new data type suitable for holding data and metadata, and can be used with mixed-type tabular data that are often stored as columns in a text file or in a spreadsheet. It consists of rows and column-oriented variables.
Categorical arrays are useful for holding categorical data - which have values from a finite list of discrete categories.
In previous versions I would have handled these use cases using cell and struct arrays. What are the differences between these and the new data types?
I haven't upgraded yet so I can't play around with them, but based on this video and this article I can already see some advantages. They're not necessarily adding functionality that you couldn't get before, but rather taking the hassle out of it. Using readtable over xlsread is immediately appealing to me. Being able to access columns by name rather than just by index is great; I do it in other languages often. In a table where column order doesn't really matter (unlike a matrix), it's really convenient to address a column by its name instead of having to know the column order.
You can also merge tables using the join function, which wasn't that easy to do with cell arrays before. I see that you can name the rows too. I didn't see what advantage that gives you, but I know that in some languages (like pandas in Python, and I think in R as well) naming rows means you can work with time-series data whose series don't completely overlap, without having to worry about alignment. I hope this is the case in MATLAB too!
Categorical arrays also look like just an extra layer of convenience, kind of like an enum. You never actually need an enum, but it makes development more pleasant.
Anyway that's just my two cents, I probably won't get an opportunity to play around with them any time soon but I look forward to using them when I do need them.
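For what it's worth, the row-label alignment mentioned above looks like this in pandas (the dates and values are made up); whether MATLAB's named rows behave the same way is exactly the open question:

    # Two series with only partially overlapping date labels: pandas
    # aligns them on the shared labels automatically.
    import pandas as pd

    a = pd.Series([1.0, 2.0, 3.0],
                  index=pd.to_datetime(["2013-01-01", "2013-01-02", "2013-01-03"]))
    b = pd.Series([10.0, 20.0],
                  index=pd.to_datetime(["2013-01-02", "2013-01-03"]))

    print(a + b)  # aligned by date; the non-overlapping row becomes NaN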
I use the table format to organize different input/output cases in my data, where the results may come from different tables. Main advantages compared to struct or cell arrays:
convenient table functions such as join, innerjoin, and outerjoin
named fields, which make for more robust programming than index-based array access
a data format that is easy to export/import (e.g. as a delimited .txt file), with no hand-written fprintf() calls
a data file that can be opened in Excel/Calc (LibreOffice), unlike a .mat file

Cost of isEqualToString: vs. Numerical comparisons

I'm working on a project designing a Core Data system for searching and cataloguing images and documents. One of the objects in my data model is a 'key word' object. Every time I add a new keyword I first want to run through all of the existing keywords to make sure it doesn't already exist in the current context.
I've read in posts here, and in a lot of my reading elsewhere, that string comparison is a far more expensive operation than other kinds of comparison. Since I could easily end up having to check many thousands of words before a new addition, I'm wondering if it would be worth representing the keyword strings numerically for the purpose of this check, possibly by combining the UTF codes of each string's characters into a number and storing that in an ID property on each keyword.
I was wondering if anyone thought any benefit might come from this approach, or had any better ideas.
What you might find useful is a suitable hash function to convert your text strings into (probably) unique numbers. (You might still have to check for collisions.)
Comparing intrinsic numbers in C code is much faster for several reasons: it avoids the Objective-C runtime dispatch overhead, it touches less total memory, and the executable code for each comparison is usually just an instruction or three, rather than a loop with counters and several decision points.
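A minimal sketch of the hash-first idea in Python (the hash choice and names are illustrative; in Objective-C you would precompute the hash once and store it alongside each keyword): compare the cheap numbers first, and fall back to a full string comparison only when the hashes match, to rule out collisions.

    # Compare precomputed hashes first; do the expensive string
    # comparison only when the hashes match, to rule out collisions.
    import zlib

    existing = {}  # hash -> list of keywords sharing that hash

    def add_keyword(word):
        h = zlib.crc32(word.encode("utf-8"))  # cheap, deterministic 32-bit hash
        bucket = existing.setdefault(h, [])
        if word in bucket:  # string comparisons only within one small bucket
            return False    # already present
        bucket.append(word)
        return True

This is, of course, just what a hash-based set does internally, so in practice storing the keywords in a set (or letting the persistent store index that attribute) gets you the same effect.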