I have a reasonably large cell array in MATLAB, called sales, with very mixed data. One column is a store identifier, and that store identifier is a mix of letters and numbers (e.g. AF7-24M). I want to grab all the rows in sales where the store identifier equals a particular value. I tried doing some logical indexing but I'm having trouble getting it to work...
I would also rather not just loop over all the rows, because I need to do this multiple times and it's quite a slow process.
You can use strcmp... for example:
idx = strcmp(sales(:,1), 'AF7-24M');  % logical vector of matching rows, assuming column 1 holds the store IDs
matches = sales(idx, :);              % all rows for that store
For case insensitive string comparison, use strcmpi instead of strcmp.
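For example, a minimal case-insensitive version of the same indexing (still assuming the IDs are in column 1):
idx = strcmpi(sales(:,1), 'af7-24m');  % matches 'AF7-24M' regardless of case
matches = sales(idx, :);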
I have two tables with data coming from two different sources. One field in each table contains the title of a movie, but for reasons out of my control, the titles are not always exactly the same.
So I use a tsvector to get rid of all the minor differences (stop words, plurals and so on).
See an example here: http://sqlfiddle.com/#!17/5ccbc/3
My problem is how to compare the two tsvectors without taking the positional values into account, just the text content. If I compare the two fields directly, I only get exact matches between values, including the position of each word. The only solution I have found is the strip() function, which removes positions and weights from a tsvector, leaving only the text content.
I was wondering if there is a faster way to compare tsvectors.
You could create an index on the stripped vector:
create index on tbl1 (strip(ts_title));
create index on tbl2 (strip(ts_title));
But given that your query has to fetch every row of each table, it is unlikely this would serve much of a point. Doing a merge join between the precomputed stripped vectors could be faster, but probably not once you include the overhead of building and maintaining the indexes. If the real WHERE clause is more restrictive (selecting only a few rows from one or the other of the tables) then please share the real query.
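For reference, the comparison either plan has to run is a join on the stripped vectors, something like this sketch (tbl1, tbl2 and ts_title are from the index statements above; the title column is an assumption):
SELECT t1.title, t2.title
FROM tbl1 t1
JOIN tbl2 t2 ON strip(t1.ts_title) = strip(t2.ts_title);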
I am working on a database that (hopefully) will end up using a primary key with both numbers and letters in its values to track lots of agricultural product. Due to the way the weighing of product takes place at more than one facility, I have no option but to keep the same base number and add letters to it to denote split portions of each lot. The problem is, after I create record number 99, the number 100 sorts directly underneath 10. This makes it difficult to maintain consistency and forces me to replace the alphanumeric lot ID with a strictly numeric value (for which I use "AutoNumber" as the data type) in order to keep things sorted. Either way, I need the alphanumeric lot ID, so having 2 IDs for the same lot can be confusing for anyone entering values into the form. Is there a way around this that I am just not seeing?
If you're using a query as the data source, then you may try to sort it by the string converted to a number, something like:
SELECT id, field1, field2, ..
FROM YourTable
ORDER BY CLng(YourAlphaNumericField)
Edit: you may also try the Val function instead of CLng; it should not fail on non-numeric input.
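For instance, Val reads the leading digits and returns 0 for anything non-numeric (Val("99A") gives 99), so it won't error on these mixed IDs. A sketch with assumed table and field names:
SELECT LotID
FROM Lots
ORDER BY Val(LotID), LotID;
The second sort key keeps the split portions (99A, 99B, ...) in letter order within each base number.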
Why not format your key properly before saving? E.g. "0000099". You will avoid a costly conversion later.
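In VBA that padding is a one-liner; a sketch, assuming a seven-digit key width:
PaddedKey = Format(99, "0000000")   ' yields "0000099"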
Alternatively, you could use 2 fields as the composite PK: one with the Number (as Long) and one with the Location (as String).
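A minimal DDL sketch of that composite key (all names are hypothetical; the table designer can do the same):
CREATE TABLE Lots (
    LotNumber LONG NOT NULL,
    Portion TEXT(3) NOT NULL,
    CONSTRAINT pkLots PRIMARY KEY (LotNumber, Portion)
);
Sorting by LotNumber, Portion then puts 100 after 99 without any conversion.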
I have a table with 100+ values corresponding to each row, so I'm exploring different ways to store them.
Without any indexes, would I lose anything by storing these 100 values in an integer[] column in PostgreSQL, as compared to storing them in separate columns?
Plus, since we can add indexes to array elements,
CREATE INDEX test_index on test ((foo[1]));
Would there be a performance difference between queries using such an index and queries using a regular index on a column?
As far as I've read, this performance difference comes into play with variable-length element types; I'm not sure about fixed-length ones.
Don't take the lazy way.
If you need to store 100 or more values as an array, that is OK, as long as an array makes sense for your application and your data.
If you need to query for a specific element of the array, then this design is not good, regardless of performance, and you should use columns. This will also help the moment you must delete a "column" in the middle or redesign it.
Anyway, as Frank wrote in the comments, if the values are all the same type (and the meaning is also the same), consider modeling them in another table.
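A sketch of that normalized alternative, with assumed table and column names:
CREATE TABLE test_values (
    test_id integer NOT NULL,
    position integer NOT NULL,  -- which of the 100 slots this value fills
    value integer,
    PRIMARY KEY (test_id, position)
);
CREATE INDEX ON test_values (value);
A plain index on value then covers every slot, whereas the expression index on foo[1] only covers the first element.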
I'm just mulling over the best data structure to store data that has several rows and columns. Should I store it as:
1. an array of arrays?
2. NSDictionary?
or is there any grid-like data structure in iOS from which I can easily fetch any row/column? For example, I must be able to fetch the value in the 3rd column of row 5. Currently, say, I store each row as an array and then store these arrays in another array (an array of arrays). So, to fetch the value in column 3 of row 5, I fetch the 5th row from the array of arrays, and then fetch the 3rd object from the resulting array. Is there a better way to do this? Thoughts, please?
then to fetch the value in column 3 in row 5, I need to fetch the 5th row in the array of arrays, and then in the resulting array, I need to fetch the 3rd object. Is there a better way to do this?
An array of arrays is fine for the implementation, and the collection subscripting that was recently added to Objective-C makes this easier -- you can use an expression like
NSString *s = myData[m][n];
to get the string at the nth column of the mth row.
That said, it may still be a good idea to create a separate class for your data structure, so that the rest of your code is protected from needing to know about how the data is stored. That would also simplify the process of changing the implementation from, say, an array of arrays to a SQLite table or something else.
Your data storage class doesn't need to be fancy or complicated. Here's a first pass:
@interface DataTable : NSObject
- (id)objectAtRow:(NSInteger)row column:(NSInteger)column;
- (void)setObject:(id)object atRow:(NSInteger)row column:(NSInteger)column;
@end
I'm sure you can see how to implement those in terms of an array of arrays. You'll have to do a little work to add rows and/or columns when the caller tries to set a value outside the current bounds. You might also want to add support for things like fast enumeration and writing to and reading from property lists, but that can come later.
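For what it's worth, a minimal sketch of that first pass, assuming a mutable array of mutable row arrays with NSNull as the placeholder for empty cells:
@implementation DataTable {
    NSMutableArray *_rows;   // each element is an NSMutableArray of cell objects
}

- (id)init {
    if ((self = [super init])) {
        _rows = [NSMutableArray array];
    }
    return self;
}

- (id)objectAtRow:(NSInteger)row column:(NSInteger)column {
    id value = _rows[row][column];
    return value == [NSNull null] ? nil : value;   // hide the placeholder from callers
}

- (void)setObject:(id)object atRow:(NSInteger)row column:(NSInteger)column {
    // grow the table as needed, padding new cells with NSNull
    while ((NSInteger)_rows.count <= row) {
        [_rows addObject:[NSMutableArray array]];
    }
    NSMutableArray *cells = _rows[row];
    while ((NSInteger)cells.count <= column) {
        [cells addObject:[NSNull null]];
    }
    cells[column] = object ?: [NSNull null];
}
@end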
There are other ways of doing it, but there's nothing wrong with the method you are using. You could use an NSDictionary with a key of type NSIndexPath, for example, or even a string key of the form "row,col", but I don't see any advantage in those except for sparse matrices.
You can either use an array of arrays, as you're doing, or an array of dictionaries. Either is fine, and I don't think there's any preference for one over the other. It all depends on which way is most convenient for you to set up the data structure in the first place. Accessing the data for the table view is equally easy using either method.
I have a Cassandra ColumnFamily (0.6.4) that will have new entries from users. I'd like to query Cassandra for those new entries so that I can process that data in another system.
My sense was that I could use a TimeUUIDType as the key for my entry, and then query on a KeyRange that starts either with "" as the startKey, or whatever the lastStartKey was. Is this the correct method?
How does get_range_slice actually create a range? Doesn't it have to know the data type of the key? There's no declaration of the key's data type anywhere: in the storage-conf.xml file you declare the type of the columns, but not of the keys. Is the key assumed to be the same type as the columns? Or does it do some magic sniffing to guess?
I've also seen reference implementations where people store TimeUUIDType values in columns. However, this seems to have scaling issues, as that particular key would then become "hot", since every change would have to update it.
Any pointers in this case would be appreciated.
When sorting, only the column keys are important; the data stored is of no consequence, and neither is the auto-generated timestamp. The CompareWith attribute is what matters here: if you set CompareWith to UTF8Type, the column keys are interpreted as UTF-8 strings; if you set it to TimeUUIDType, the keys are automatically interpreted as time-based UUIDs. You do not have to specify the data type. Look at the SlicePredicate and SliceRange definitions at http://wiki.apache.org/cassandra/API (a good place to start). You might also find http://www.sodeso.nl/?p=80 useful; in the third part or so he talks about slice-ranging his queries.
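In storage-conf.xml that attribute is a single declaration on the column family; a sketch (the Name here is made up):
<ColumnFamily Name="Entries" CompareWith="TimeUUIDType"/>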
Doug,
Writing to a single column family can sometimes create a hot spot if you are using an Order-Preserving Partitioner, but not if you are using the default Random Partitioner (unless a subset of users creates vastly more data than all other users!).
If you sorted your rows by time (using an Order-Preserving Partitioner) then you are probably even more likely to create hotspots, since you will be adding rows sequentially and a single node will be responsible for each range of the keyspace.
Columns and Keys can be of any type, since the row key is just the first column.
In effect, the cluster is a circular hash ring, and keys get hashed by the partitioner to be distributed around the cluster.
Beware of using dates as row keys, however, since even the randomization of the default RandomPartitioner is limited and you could end up cluttering your data.
What's more, if that date changes, you would have to delete the previous row, since you can only do inserts in C*.
Here is what we know:
A slice range is a range of columns in a row, with a start value and an end value. It is used mostly for wide rows, since columns are ordered. Known column names defined in the CF are indexed, however, so they can be retrieved by specifying their names.
A key slice is a key associated with the sliced column range, as returned by Cassandra.
The equivalent of a WHERE clause uses secondary indexes. You may use inequality operators there; however, there must be at least ONE equals clause in your statement (also see https://issues.apache.org/jira/browse/CASSANDRA-1599).
Using a key range is ineffective with a RandomPartitioner, as the MD5 hash of your key doesn't preserve lexical ordering.
What you want to use is a Column Family based index using a wide row:
CompositeType(TimeUUID | UserID)
To keep this row from becoming hot, add a first meaningful key (a "shard key") that splits the data across nodes, such as the user type or the region.
Having more data than necessary in Cassandra is not a problem; it's how it is designed. So what you must ask yourself is "what do I need to query?", and then design a Column Family for it, rather than trying to fit everything into one CF as you would in an RDBMS.
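A sketch of that wide-row index in later CQL terms, purely illustrative (all names are hypothetical, and CQL postdates the 0.6 Thrift API in the question):
-- one partition per shard; entries stay ordered by TimeUUID within it
CREATE TABLE new_entries (
    shard_key text,        -- e.g. user type or region, to spread load across nodes
    entry_time timeuuid,
    user_id uuid,
    PRIMARY KEY (shard_key, entry_time, user_id)
);

-- poll for entries newer than the last one processed, shard by shard
SELECT user_id, entry_time
FROM new_entries
WHERE shard_key = 'region-1'
  AND entry_time > maxTimeuuid('2013-06-01 00:00:00');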