Firestore permutation explosion of composite index - google-cloud-firestore

I'm stuck on Firestore composite indexes. I have a couple of fields under a user, such as A (string), B (string), C (string), D (array), E (array), F (array), G (array). Users can be searched and queried by different combinations of these fields. For example, "A == 'Male', B == '2020'" requires me to create a composite index, and after I created that, a query like "A == 'Male', B == '2020', C == 'Ontario'" still needs a new composite index.
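For reference, this is roughly what those queries look like with the JS SDK (a sketch; 'users' is an assumed collection name and db an initialized Firestore instance):
db.collection('users')
  .where('A', '==', 'Male')
  .where('B', '==', '2020')
  .where('C', '==', 'Ontario')  // adding this third clause is what triggered the request for yet another index
  .get()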
What I'm wondering is: do I have to create a composite index for every permutation of these fields?
There are more than two array fields, but the SDK only allows one "array-contains" clause per query. What can I do about this? I have tried splitting an array like [element1, element2] into a map like "element1: true, element2: true", which can be queried with "==" clauses. But the problem is that the arrays are dynamic, and every time I append another "==" clause, the SDK tells me I need to create a new composite index.
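In code, the flattened structure I tried looks something like this (element names are just examples):
// D stored as { element1: true, element2: true } instead of an array
db.collection('users')
  .where('A', '==', 'Male')
  .where('D.element1', '==', true)
  .where('D.element2', '==', true)
  .get()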
Does anyone have any ideas about this?

There is no tooling that will create all the desired combinations automatically. However, the documentation suggests that you can use the Firebase CLI to deploy indexes that are defined using its JSON configuration. This configuration file is not documented, so you will have to reverse engineer it based on indexes that you create manually. An example of one such index configuration is here. What you can do is manually create an index, then run firebase init, choose Firestore, and it will dump the indexes to its JSON config, which you can edit and redeploy. As of today, you will have to run firebase init in a fresh folder to get new indexes from the server.
Once you know how to deploy indexes like this, you can write code to create all the combinations of indexes in that JSON config. It's not pretty, but it's doable.
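As a rough illustration of that last step (a sketch only; it assumes the current firestore.indexes.json layout with collectionGroup/queryScope/fields entries and uses the A-G field names from the question), a small Node script can emit every equality-field combination:
// generate-indexes.js - writes a firestore.indexes.json covering every combination
// of two or more of the equality fields; names here are assumptions for illustration
const fs = require('fs')

const fields = ['A', 'B', 'C', 'D', 'E', 'F', 'G']

// every subset of size >= 2, via a bitmask over the field list
function combinations(arr) {
  const out = []
  for (let mask = 1; mask < (1 << arr.length); mask++) {
    const combo = arr.filter((_, i) => mask & (1 << i))
    if (combo.length >= 2) out.push(combo)
  }
  return out
}

const indexes = combinations(fields).map(combo => ({
  collectionGroup: 'users',
  queryScope: 'COLLECTION',
  fields: combo.map(fieldPath => ({ fieldPath, order: 'ASCENDING' }))
}))

fs.writeFileSync('firestore.indexes.json', JSON.stringify({ indexes, fieldOverrides: [] }, null, 2))
console.log('Wrote ' + indexes.length + ' index definitions')
// deploy with: firebase deploy --only firestore:indexes
Bear in mind that Firestore also caps the number of composite indexes per database, so generating every combination of many fields can hit that ceiling quickly.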

Related

Sphinx / Manticore - base one plain index off another?

I have a plain text index that sucks data from MySQL and inserts it into Manticore in a format I need (e.g. converting datetime strings to timestamps, CONCATing some fields, etc.).
I then want to create a second plain index based on this data that groups it further. This will save me from having to re-run the normalisation that's done for the first index on INSERT, and make it easier for me to query in the future.
For example, my first index is a list of all phone calls that have been made / received (telephone number, duration, agent). The second index should group by Year-Month-Date in such a way that I can see how many calls each agent made on that day. This means I end up with idx_phone_calls and idx_phone_calls_by_date.
Currently, I generate the first index from MySQL, then get Manticore to query itself (by setting the MySQL host to localhost). It works, but it feels as though I should be able to query Manticore directly from within the index definition. However, I'm struggling to find out whether that's possible.
Is there a better way to do it?
Well, Sphinx/Manticore has its own GROUP BY support, so maybe you can just run the final query against the original index anyway and avoid the need for the second index.
Sphinx's aggregation is (in some ways) more powerful than MySQL's, and has some 'super aggregation' functions (like WITHIN GROUP ORDER BY).
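For example, something along these lines (a sketch; it assumes the first index exposes an agent attribute and a call_time UNIX timestamp) gives per-agent, per-day counts straight from the original index:
SELECT agent, YEARMONTHDAY(call_time) AS ymd, COUNT(*) AS calls
FROM idx_phone_calls
GROUP BY agent, ymd;
-- no idx_phone_calls_by_date needed for this particular report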
But otherwise there is no direct way to create one index off another (e.g. there is no CREATE TABLE idx_phone_calls_by_date SELECT ... FROM idx_phone_calls ...).
Your 'solution' of directing indexer to query the data from searchd is good. In general this should be pretty efficient, particularly on localhost, where there is little overhead. It also maintains the logical separation of searchd being for queries and indexer being for, well, building indexes.

MongoDB Map Reduce: Auto-created index name too long, possible to customize?

Debugging MongoDB mapreduce is painful, so I'm not 100% sure I understand what's going on here, but I think I get the general idea...
The error message I'm getting is this: mr failed, removing collection CannotCreateIndex: namespace name generated from index name "my_dbname.tmp.mr.collectionname_69.$_id.aggregation_method_1__id.date_key.start_1__id.date_key.timeres_1__id.region.center_2dsphere" is too long (127 byte max)
The key I'm using for mapreduce is a complex object with four or five properties, so I'm guessing what's happening is that when Mongo tries to create its temporary output collections using my specified key, it tries to auto-create an index on that complex key; but since the key itself has several properties, the default name for the key is too long. When I index complex objects like this under "normal" circumstances, I just give the index a custom name. But I don't see a way to do that for the collections mapreduce generates automatically.
Is there a simple way to fix this without changing my key structure?
Well, it turns out I was tricked by the error message! The <collectionname> in the error message referenced above is the name of the INPUT collection whose records I'm processing with mapreduce... but the index it's referring to is part of the OUTPUT collection! So I just had to give the index in the output collection a name, and voila, problem solved. What weird behavior.
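For anyone hitting the same wall, the fix boils down to the ordinary custom-name option on the output collection's index, something like this (a sketch: the collection name is made up, the key fields are read off the error message above):
db.my_output_collection.createIndex(
  {
    "_id.aggregation_method" : 1,
    "_id.date_key.start" : 1,
    "_id.date_key.timeres" : 1,
    "_id.region.center" : "2dsphere"
  },
  { name : "mr_key_idx" }  // a short explicit name keeps the full namespace well under the 127-byte cap
);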

How to SET jsonb_column = json_build_array( string_column ) in Sequelize UPDATE?

I'm converting a one-to-one relationship into a one-to-many relationship. The old relationship was just a foreign key on the parent record. The new relationship will be an array of foreign keys on the parent record.
(Using Postgres dialect, BTW.)
First I'll add a new JSONB column, which will hold an array of UUIDs.
Then I'll run a query to update all existing rows such that the value from the old column is now stored in the new column (as the first element in an array).
Finally, I'll remove the old column.
I'm looking for help with step 2: writing the update statement that will update all rows, setting the value of the new column based on the value of the old column. Basically, I'm trying to figure out how to express this SQL query using Sequelize:
UPDATE "myTable"
SET "newColumn" = json_build_array("oldColumn")
-- ^^ this really works, btw
Where:
newColumn is type JSONB, and should hold an array (of UUIDs)
oldColumn is type UUID
names are double-quoted because they're mixed case in the DB (shrug)
Expressed using Sequelize sugar, that might be something like:
const { models } = require('../sequelize')
await models.MyModel.update({ newColumn: [ 'oldColumn' ] })
...except that would result in saving an array that contains the string "oldColumn" rather than an array whose first element is the value in that row's oldColumn column.
My experience, and the Sequelize documentation, is focused on working with individual rows via the standard instance methods. I could do that here, but it'd be a lot better to have the database engine do the work internally instead of forcing it to transfer every row to Node and then back again.
Looking for whatever is the most Sequelize-idiomatic way of doing this, if there is one.
Any help is appreciated.
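A hedged sketch of what that might look like with Sequelize's fn/col helpers (untested; model and column names as in the question above):
// fn/col pass json_build_array("oldColumn") through as an expression, so Postgres
// evaluates it per row instead of Sequelize serialising a literal value
const { Sequelize } = require('sequelize')
const { models } = require('../sequelize')

await models.MyModel.update(
  { newColumn: Sequelize.fn('json_build_array', Sequelize.col('oldColumn')) },
  { where: {} }  // empty where: apply to every row
)
// if the JSONB attribute insists on serialising the value, the escape hatch is a raw
// query (via your Sequelize instance's .query()) running the exact UPDATE from above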

Sequelize how to use aggregate function on Postgres JSONB column

I have created a table with a JSONB column named "data".
A sample value of that column is:
[{field_id:1, value:10},{field_id:2, value:"some string"}]
There are multiple rows like this.
What I want:
I want to use an aggregate function on the "data" column such that I get:
the Sum of all values where field_id = 1;
the Avg of all values where field_id = 1;
I have searched a lot on Google but haven't been able to find a proper solution.
Sometimes it says "Field doesn't exist" and sometimes it says "from clause missing".
I tried referring to it as data.value, also as data -> value, and lastly data ->> value.
But nothing is working.
Please let me know the solution if anyone knows.
Thanks in advance.
Your attributes should be something like this, so you instruct it to run the function on a specific value:
attributes: [
  // ->> returns text, so cast to numeric before aggregating
  [sequelize.fn('sum', sequelize.literal("(data->>'value')::numeric")), 'json_sum'],
  [sequelize.fn('avg', sequelize.literal("(data->>'value')::numeric")), 'json_avg']
]
Then in WHERE, you reference field_id in a similar way, using literal():
where: sequelize.literal("(data->>'field_id')::int = 1")
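Put together, the whole call might look roughly like this (a hedged sketch; the model name and raw: true are assumptions):
const rows = await MyModel.findAll({
  attributes: [
    [sequelize.fn('sum', sequelize.literal("(data->>'value')::numeric")), 'json_sum'],
    [sequelize.fn('avg', sequelize.literal("(data->>'value')::numeric")), 'json_avg']
  ],
  where: sequelize.literal("(data->>'field_id')::int = 1"),
  raw: true  // plain objects rather than model instances
})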
Your example also included a string for the value of "value" which of course won't work. But if the basic Sequelize setup works on a good set of data, you can enhance the WHERE clause to test for numeric "value" data, there are good examples here: Postgres query to check a string is a number
Hopefully this gets you close. In my experience with Sequelize + Postgres, it helps to run the program in such a way that you see what queries it creates, like in a terminal where the output is streaming. On the way to a working statement, you'll either create objects which Sequelize doesn't like, or Sequelize will create bad queries which Postgres doesn't like. If the query looks close, take it into pgAdmin for further work, then try to reproduce your adjustments in Sequelize. Good luck!

Amazon DynamoDB table design and querying

We are considering DynamoDB for an expectedly large dataset. I come from a strong SQL background so the No-SQL way of thinking is new to me.
I have a problem and design, but ran into what appears to be a dead end.
The documentation says to make sure your Hash keys are widely distributed to aid in performance, okay that makes sense.
I am going to be recording various datapoints/actions for users. It makes sense to me that the hash key should be the user-id, and my range key can be the action(s) performed.
Now, if I want all the actions user #1 performs, I can easily query that.
But, if I want all the USERS who performed action X, I cannot do that without a table scan. From the Query documentation:
A Query operation directly accesses items from a table using the table primary key, or from an index using the index key. You must provide a specific hash key value.
So it would seem I am limited to getting data from a specific user, unless I am willing to do a table scan, which is slower and consumes many capacity units.
My question is, I think, ultimately a design question. Maybe I am missing something when it comes to No-SQL? Should my hash key be something else? Or is it simply that my requirements do not fit in with No-SQL (and more specifically, DynamoDB)?
It is almost as if the hash key is a kind of grouping with DynamoDB. I considered changing the hash key to the actions we are intending to put into place, but then I am not widely distributing my keys...
The DynamoDB way to meet your requirement to allow both types of queries is to store the data in two tables: one with hash key user-id and range key action-id, and one with hash key action-id and range key user-id.
You should also think about whether you need all the data in both tables, or whether one can be a summary table. For example, say you have a limited number of possible actions. Instead of putting the full record of every action in the user-keyed table, you might want a table with only one row per user: a hash key of user-id, and a second attribute that is multi-valued and lists every action-id the user has performed at least once.
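For illustration only (table and attribute names are made up), the two key schemas described above would look like this in createTable parameters:
// table 1: answers "which actions has user X performed?"
var userActionsParams = {
  TableName : 'UserActions',
  KeySchema : [
    { AttributeName : 'userId',   KeyType : 'HASH'  },
    { AttributeName : 'actionId', KeyType : 'RANGE' }
  ]
};
// table 2: answers "which users performed action X?"
var actionUsersParams = {
  TableName : 'ActionUsers',
  KeySchema : [
    { AttributeName : 'actionId', KeyType : 'HASH'  },
    { AttributeName : 'userId',   KeyType : 'RANGE' }
  ]
};
// each also needs AttributeDefinitions and ProvisionedThroughput before being passed to ddb.createTable()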
You must create a Global Secondary Index (GSI). What this does is create a second pair of hash and range keys that differ from the original keys. You can then query the same table by also including the index name in your parameters.
Example in JS:
// ddb is assumed to be a low-level DynamoDB client created elsewhere, e.g.
// var AWS = require('aws-sdk');
// var ddb = new AWS.DynamoDB();
var table = 'tablename';
var index = 'actionId-username-gsi';
var action = '12345'; // the actionId value to look up (the low-level API passes numbers as strings)

var params = {
  TableName : table,
  IndexName : index,
  KeyConditionExpression : 'actionId = :v_actionId',
  ExpressionAttributeValues : {
    ':v_actionId' : { N : action }
  },
  ProjectionExpression : 'actionId, username'
};

ddb.query(params, function(err, data) {
  if (err) {
    // Oh well
  } else {
    // Do something with data.Items
  }
});
This will query the actionId-username-gsi index and look for any items whose actionId hash matches the value provided. Using ProjectionExpression returns only the specified attributes for each item, reducing the amount of data returned if that ever becomes a concern. I hope this helps answer your question.
I guess the global secondary index option is better, as you get a single table.
Creating two tables introduces redundancy and additional work to keep them consistent during any CUD (Create, Update, Delete) operation on either table.