How to make a fake join in MongoDB?

I am developing a web app using PHP and MongoDB. The app uses two collections: one stores data about files, and the other tracks download events for each file.
I need to get the 10 most recently downloaded files, but I know joins are not an option. Each event document stores only the file id; the thumbnail is stored in the files collection.
Right now I first get the 10 most recent download events, ordered by date, and that order is correct. Then I take the file ids from those events and run a where_in query for files whose id is in that array. This also works (I get the thumbnails for the selected files), but the order is lost: the most recently downloaded file is no longer on top.
How can I keep the order of the ids without looping through them, which would mean 10 queries instead of just one?
I cannot change the schema because I already have over 40,000 documents :)
UPDATE
In short, this is what I want to do:
Get the IDs of the 10 most recently downloaded files, sorted by download timestamp.
Use this array of ids to query the files collection and get the details for each file, such as thumbnail, description, and so on.
The steps above work, BUT the order from the first step is lost, so the most recently downloaded file is not on top. I know I could loop through the id array and fetch each file separately, but that would cost 10 queries instead of one.

I don't really get your problem. Here's some pseudo-code:
// get last N download events (already sorted by date)
events = get_latest_downloads()
// retrieve the associated file data in one query
file_array = get_all_file_info_by_events(events)
display_data = []
for (e in events) {
    data = find_file_info_in_the_array(file_array, e.file_id)
    display_data.push(data)
}
// display_data now contains full file info, sorted by download time.
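The same idea as a minimal Python sketch, assuming the file documents were already fetched with a single query (with pymongo that would be something like `db.files.find({"_id": {"$in": ids}})`; the field names `_id` and `file_id` are assumptions). Instead of searching the array once per event, it first builds a dictionary keyed by file id, so restoring the download order is one pass over the events with constant-time lookups.

```python
from typing import Any, Dict, Iterable, List


def order_files_by_events(
    events: Iterable[Dict[str, Any]],
    files: Iterable[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Return file documents in the same order as the (already sorted) events."""
    # Index the unordered $in result by file id: one pass, O(1) lookups afterwards.
    files_by_id = {f["_id"]: f for f in files}
    ordered = []
    for event in events:
        file_doc = files_by_id.get(event["file_id"])
        # Skip events whose file no longer exists instead of raising KeyError.
        if file_doc is not None:
            ordered.append(file_doc)
    return ordered


# Download events, newest first, as returned by the first query.
events = [{"file_id": 3}, {"file_id": 1}, {"file_id": 2}]
# The $in query returns the files in arbitrary order.
files = [
    {"_id": 1, "thumbnail": "a.png"},
    {"_id": 2, "thumbnail": "b.png"},
    {"_id": 3, "thumbnail": "c.png"},
]
print([f["thumbnail"] for f in order_files_by_events(events, files)])
# prints ['c.png', 'a.png', 'b.png']
```

If the same file was downloaded more than once among the latest events, it appears once per event; deduplicate the event list first if each file should be shown only once.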

Related

Query large list of metadata in Weaviate

I have 100,000 images; each has 500 ORB vectors and a unique tag.
My general issue is: when I insert a new image (i.e. 500 new vectors), how can I know whether the image's tag is already in the database?
What I do is attach a "tag" metadata property to each vector. I can retrieve the inserted tags with
result = client.query.get('orb_vector', ['tag'])\
.with_limit(200)\
.do()
This returns roughly 200 tags out of the 100,000 that exist.
According to the documentation, this approach does not scale.
How should I do this?
Context:
My database is not very dynamic: apart from the initial big insertion (100,000+ images), there will be few insertions each day. So I'm okay with a request taking 5 minutes and keeping the result in memory in a non-dynamic way. A plain Python list is fine.
Clarification: each image has one tag, but 500 vectors. So each tag is present 500 times in the database.
I'm using Python.
What I could do:
Write the list of tags to a JSON file, Mongo, or some other store, and read/update it each time I insert new images.
I prefer to avoid this solution, since keeping the Weaviate database and the JSON in sync would be a nightmare.
Have you considered creating a separate class for the tags and using query filters?
For example, define a schema for a class named Tag where:
it has a property called "name" to store the tag's name, e.g. outdoors, indoors, etc.
it has a property called "images" to store cross-references to the images that carry that tag (e.g. "outdoors").
Then, when you want to insert an image with the tag "car", for example, run a Where filter on the Tag class where name is Equal to "car".
If the result is empty, that tag does not exist yet.
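A minimal sketch of that existence check with the Python client, assuming the v3-style `weaviate-client` query builder used in the question (`client.query.get(...).with_where(...).with_limit(...).do()`), a `Tag` class whose `name` property has the `text` data type (hence the `valueText` key), and the usual GraphQL response shape `{"data": {"Get": {"Tag": [...]}}}`. The filter construction and result parsing are plain functions, so they can be checked without a running instance.

```python
from typing import Any, Dict


def tag_name_filter(name: str) -> Dict[str, Any]:
    """Where filter matching Tag objects whose `name` equals the given tag."""
    # "valueText" assumes `name` is declared with the `text` data type.
    return {"path": ["name"], "operator": "Equal", "valueText": name}


def tag_in_result(result: Dict[str, Any], class_name: str = "Tag") -> bool:
    """True if a Get query response contains at least one object of class_name."""
    objects = result.get("data", {}).get("Get", {}).get(class_name) or []
    return len(objects) > 0


def tag_exists(client: Any, name: str) -> bool:
    """Query the Tag class; an empty result means the tag is not stored yet."""
    result = (
        client.query.get("Tag", ["name"])
        .with_where(tag_name_filter(name))
        .with_limit(1)  # one match is enough to answer the question
        .do()
    )
    return tag_in_result(result)
```

Depending on the property's tokenization setting, an Equal filter on a text property may match individual words (so "car" could also match "red car"); configuring `name` with field tokenization is one way to get whole-value matches.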

Keep the size of an array in a document less than or equal to 9

I am using Firestore and have user documents with an images array. I am able to add/delete images from the images array without any problem. Both adding and deleting are updates to the user document. I would like an update security rule that prevents a user from having more than 9 images.
I am using the following rule:
allow update: if resource.data.images.size() < 9
As soon as I have nine images in the array, I am no longer able to delete an image, so the rule conflicts between adding and removing. What is the best way to handle this? I appreciate any help.
Thanks
resource.data contains the fields in the document that currently exist. request.resource.data contains the fields that are about to be written by the client app. You will want to use the latter to check the incoming fields rather than the existing fields.
allow update: if request.resource.data.images.size() <= 9
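To see why checking `request.resource` instead of `resource` fixes the delete case, here is a plain-Python model of the two checks. It only mimics the rule logic; it is not the Firestore rules engine. `existing` stands for `resource.data.images`, `incoming` for `request.resource.data.images`, and the limit of 9 from the question is enforced with `<=`, so an update that leaves exactly nine images is allowed.

```python
from typing import List

MAX_IMAGES = 9  # limit from the question


def rule_on_existing(existing: List[str], incoming: List[str]) -> bool:
    # Models: allow update: if resource.data.images.size() < 9
    # Only the stored state is checked, so the outcome ignores what is written.
    return len(existing) < MAX_IMAGES


def rule_on_incoming(existing: List[str], incoming: List[str]) -> bool:
    # Models: allow update: if request.resource.data.images.size() <= 9
    # The document as it would look after the write is checked.
    return len(incoming) <= MAX_IMAGES


nine = [f"img{i}" for i in range(9)]
eight = nine[:-1]
ten = nine + ["img9"]

# Deleting one image from a full array:
print(rule_on_existing(nine, eight), rule_on_incoming(nine, eight))  # False True
# Adding a tenth image:
print(rule_on_existing(nine, ten), rule_on_incoming(nine, ten))  # False False
```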

MongoDB - Getting first set of $lt

I'm using MongoDB to store data and when retrieving some, I need a subset which I'm uncertain how to obtain.
The situation is this: items are created in batches, about a month apart. When a new batch is added, the previous batch gets a deleted_on date.
A customer can always retrieve the current (not deleted) set of items, plus all items in the one batch that had not yet been deleted when they registered, which depends on when the customer was created.
Thus, I want to retrieve all items whose deleted_on is either null or equal to the earliest deleted_on date after the customer's added_on date.
In all of my attempts, I run into one of these problems:
I can get all items that were deleted before the customer was created - but they include all batches - not just the latest one.
I can get the first item that was deleted after the customer was created, but nothing else from the same batch.
I can get all items, but I have to modify the result set afterwards to remove all items that don't apply.
Having to modify the result afterwards is fine, I guess, but undesirable. What's the best way to handle this?
Thanks!
PS. Both added_on (on the customer) and deleted_on (on the items) are indexed.
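One way to avoid modifying the result afterwards is to split this into two queries, both able to use the deleted_on index: first find the earliest deleted_on that is later than the customer's added_on, then fetch every item whose deleted_on is either null or exactly that date. This sketch assumes every item in a batch shares the same deleted_on value and that deleted_on is stored as a date. `visible_items` expects a pymongo `Collection` (`find_one` with `sort`/`projection` and `find` are standard pymongo calls); `visible_items_in_memory` is a hypothetical list-based helper that applies the same logic so it can be tried without a database.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional, Tuple


def cutoff_query(added_on: datetime) -> Tuple[Dict[str, Any], List[Tuple[str, int]]]:
    """Filter and sort that find the first batch deleted after the customer joined."""
    # Ascending sort on deleted_on, so the first match is the closest future date.
    return {"deleted_on": {"$gt": added_on}}, [("deleted_on", 1)]


def visible_filter(cutoff: Optional[datetime]) -> Dict[str, Any]:
    """Current items (deleted_on null or missing) plus the batch that ended at cutoff."""
    if cutoff is None:
        # The customer joined during the current batch: only undeleted items apply.
        return {"deleted_on": None}
    return {"deleted_on": {"$in": [None, cutoff]}}


def visible_items(collection: Any, added_on: datetime) -> Any:
    """Two queries against a pymongo Collection; returns a cursor."""
    flt, sort = cutoff_query(added_on)
    first = collection.find_one(flt, sort=sort, projection={"deleted_on": 1})
    cutoff = first["deleted_on"] if first else None
    return collection.find(visible_filter(cutoff))


def visible_items_in_memory(items: List[Dict[str, Any]], added_on: datetime) -> List[Dict[str, Any]]:
    """Hypothetical list-based equivalent, for trying the logic without MongoDB."""
    later = [i["deleted_on"] for i in items
             if i.get("deleted_on") is not None and i["deleted_on"] > added_on]
    cutoff = min(later) if later else None
    return [i for i in items
            if i.get("deleted_on") is None or i["deleted_on"] == cutoff]


items = [
    {"name": "old", "deleted_on": datetime(2012, 1, 1)},
    {"name": "prev", "deleted_on": datetime(2012, 2, 1)},
    {"name": "current", "deleted_on": None},
]
print([i["name"] for i in visible_items_in_memory(items, datetime(2012, 1, 15))])
# prints ['prev', 'current']
```

In MongoDB, matching deleted_on against null also matches documents where the field is missing, which is why the in-memory version uses `i.get("deleted_on") is None`.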

Meteor/Mongo - Ensuring that one collection gets filled before another

Is there a way to ensure that one script runs before another in Meteor? I'm currently developing some software and using sample data for now. I'm curious whether I can fill a particular collection only after another collection it depends on has been filled.
For example, an Invoices collection has a patient_id: Patients.findOne(...) field that depends on the Patients collection actually having data. Is there a way to do this other than putting both in the same file, with Patients filled before Invoices?
Assuming you are trying to create test data in the right order, then you can run the test data generator for Invoices in a Tracker.autorun. This will be run reactively:
Meteor.startup(() => {
  Tracker.autorun(() => {
    if (Patients.find().count() && !Invoices.find().count()) {
      populateInvoices();
    }
  });
});

Is it possible to store hidden metadata information that is tied to a specific Table or Cell within a Word document?

I am trying to store metadata (basically a unique id) along with each cell of a table in a Word document. Currently, for the add-in I'm developing, I am querying the database, and building a table inside the Word document using the data that is retrieved.
I want to be able to save any of the user's edits to the document and persist them back to the database. My initial thought was to store a unique id with each cell in the table so that I could tell which records to update. I would also like to store some sort of "isChanged" flag within each cell so that I could tell which cells were changed. I found that I could put the needed information into the "ID" property of the cell; however, that information was not retained if the user saved the document, closed it, and reopened it. I then tried storing the data by adding a field to the "Fields" collection, but that did not work and threw a runtime error. Here is the code that I tried:
object t1 = Word.WdFieldType.wdFieldEmpty;
object val = "myValue: " + counter;
object preserveFormatting = true;
tbl.Cell(i, j).Range.Fields.Add(tbl.Cell(i, j).Range, ref t1, ref val, ref preserveFormatting);
This compiles fine, but throws the runtime error "This command is not available".
So, is this possible at all? Or am I headed in the wrong direction?
Thanks in advance.
Wound up using "ContentControls" to store the information I needed. I used the "Title" property to store the unique id, and the "Tag" property to track whether the field had been changed. See this link for more info: http://blogs.technet.com/gray_knowlton/archive/2010/01/15/associating-data-with-content-controls.aspx
Since a "Word 2007 Document" is XML (a ZIP package of XML parts), you can add a namespace to the document and then adorn elements with attributes from your namespace. Word should ignore your namespace when loading and saving. Moreover, you can add new elements to store any information (metadata) needed.
With that said, I have not used this technique with Word, but I have done it successfully using Excel 2003.
The first thing to try is to create a bare "Word 2007 Document". In your case, add a simple two-by-two table. Open it with a text or XML editor, add your namespace, adorn an element with an attribute, and add a new element. Open it in Word, make a change, and save it. Then open it in the editor again and make sure your namespace attribute and element have not been changed.
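The check in the last step can also be scripted. A .docx file is a ZIP package of XML parts, so Python's standard `zipfile` module can read the main part directly. This sketch assumes the main document part is at `word/document.xml` (where Word normally places it), and the namespace URI below is a hypothetical placeholder for your own. It is a coarse substring check, enough to tell whether Word dropped the custom namespace on save.

```python
import zipfile

MY_NS = "urn:example:cell-metadata"  # hypothetical namespace URI


def read_document_xml(path: str) -> str:
    """Return the main document part of a .docx package as text."""
    with zipfile.ZipFile(path) as package:
        return package.read("word/document.xml").decode("utf-8")


def namespace_survived(path: str, namespace: str = MY_NS) -> bool:
    """True if the custom namespace URI still appears in the main document part."""
    return namespace in read_document_xml(path)
```

Run `namespace_survived` on the hand-edited file before and after the round trip through Word; if the result goes from True to False, Word discarded the custom markup.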