In MongoDB, is there a way to add an extra attribute to documents created with mongoimport (--type tsv, --headerline)?
I don't have control of the data being imported, but I need to be able to distinguish one imported data set from another, and there are no attributes in the file to distinguish one import from another.
I think your best option would be to write your own script to parse the CSV/TSV and import it into MongoDB. I think it would take under 10 lines of Python.
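For example, a minimal pymongo sketch (the file name data.tsv, the database and collection names, and the importBatch field are made up for illustration):

import csv
from bson import ObjectId
from pymongo import MongoClient

batch_id = ObjectId()  # one id per import run, to tell data sets apart
collection = MongoClient()["mydb"]["mycollection"]

with open("data.tsv", newline="") as f:
    reader = csv.DictReader(f, delimiter="\t")  # the header line supplies the keys
    docs = []
    for row in reader:
        row["importBatch"] = batch_id  # the extra attribute mongoimport cannot add
        docs.append(row)

if docs:
    collection.insert_many(docs)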
Alternatively if nothing else is inserting into the collection, and your import runs are far enough apart, you could just do something like this between runs:
db.collection.update({extraField: null}, {$set: {extraField: ObjectId()}}, false, true)
This would work best with an index on {extraField:1}.
I am working on physics simulation research. In one of my projects I have a large fixed grid that does not vary with time. The fields on the grid, on the other hand, vary with time during the simulation. I need to use VTK to record the field data at each step for visualization (ParaView).
The approach I am using is to write a separate *.vtu file to disk at each time step. This serves the purpose, but it writes a lot of duplicate data (the mesh geometry is re-recorded at every step), which not only consumes more disk space but also wastes time on encoding and parsing.
I would like a way to write the mesh information only once and, at the remaining steps, write only the new field data, while still getting the same visualization. Please let me know if VTK and ParaView provide such an interface and how to implement it.
Using .pvtu files and referring to the same .vtu as the Piece for each step should do the trick.
See this similar post on the ParaView Discourse, and the pvtu documentation.
EDIT
This seems to be a side effect of the format; it is not supported by the writer.
The correct solution is to use another file format ...
Let me provide my own research findings for reference.
As Nico said, with a combination of pvtu/vtu files we could, in theory, store the geometry in a separate vtu file that is referenced by a pvtu file. Setting the NumberOfPieces attribute of the pvtu file to 1 would restrict it to a single vtu file.
However, the VTK library does not expose a dedicated interface to control the writing of vtu files. No matter what is set, as long as the writer's input contains geometry, the writer will write that geometry to disk, and this step cannot be skipped through the exposed interface.
It is, however, possible to make multiple pvtu files point to the same vtu file by manually editing the Piece nodes in the pvtu files, and ParaView can recognize and visualize such a file group properly.
I did not proceed to try adding arrays to the unstructured grid and using pvtu output.
So, I think the conclusion is:
If you don't want to dive into VTK's library code and its XML implementation, this approach doesn't make sense.
If you are willing to write the full series of files, delete all but one of the vtu files, and then point every pvtu's Piece node to the single surviving vtu file by editing the pvtu files (see the sketch after this list), you can save a lot of disk space, but you will not shorten the write, read, and parse times.
If you implement an XML writer yourself, you can in theory meet all the requirements, but it requires a lot of coding work.
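For reference, here is a rough Python sketch of that Piece-node edit, assuming time-step files named step_0000.pvtu, step_0001.pvtu, ... and a single kept mesh file step_0000.vtu (the names are made up for illustration):

import glob
import xml.etree.ElementTree as ET

kept_vtu = "step_0000.vtu"  # the one vtu file that survives on disk

for pvtu_path in sorted(glob.glob("step_*.pvtu")):
    tree = ET.parse(pvtu_path)
    grid = tree.getroot().find("PUnstructuredGrid")
    # Rewrite every Piece so it points at the single surviving vtu file.
    for piece in grid.findall("Piece"):
        piece.set("Source", kept_vtu)
    tree.write(pvtu_path, xml_declaration=True)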
I have a very simple question. Normally in other programs, such as Word, you can simply save the document under a different name, resulting in two separate documents. However, this doesn't work for AnyLogic. Does anyone know how to duplicate a project?
If you do File > Save As, it will create a new alp file for you.
But for it to be a truly different model you need to change the Java package to something unique... see how it is kept as model24 in my screenshot.
Be careful, though: this can have some unwanted consequences in a very complex model, and you will need to fix those manually. It is all doable.
I want to use my own dataset, which consists of numbers, in PyTorch; it is available as a CSV file, for example. What is the easiest way to load this into PyTorch? So far I only know how to use the datasets that already exist in PyTorch, but I don't want to do that.
You need to create a custom class that inherits from PyTorch's Dataset class.
Then, you need to wrap it with a DataLoader.
Follow this tutorial for an in-depth explanation:
https://pytorch.org/tutorials/beginner/basics/data_tutorial.html#creating-a-custom-dataset-for-your-files
The easiest way to import your dataset would be to:
Use the pandas package to load your CSV file:
import pandas as pd
data = pd.read_csv("filename.csv")
Then, implement a very simple PyTorch Dataset class as described here.
Finally, pass your Dataset instance as the first argument of a PyTorch DataLoader.
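Putting those pieces together, here is a minimal sketch (assuming the file is called filename.csv and its last column is the target; adjust to your layout):

import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader

class CSVDataset(Dataset):
    """Wraps a CSV file of numbers as a PyTorch Dataset.
    Assumes the last column is the target and the rest are features."""
    def __init__(self, csv_path):
        data = pd.read_csv(csv_path)
        # Convert to float32 tensors once, up front.
        self.features = torch.tensor(data.iloc[:, :-1].values, dtype=torch.float32)
        self.targets = torch.tensor(data.iloc[:, -1].values, dtype=torch.float32)

    def __len__(self):
        return len(self.targets)

    def __getitem__(self, idx):
        return self.features[idx], self.targets[idx]

# Wrap the dataset in a DataLoader for batching and shuffling.
dataset = CSVDataset("filename.csv")
loader = DataLoader(dataset, batch_size=32, shuffle=True)
for x, y in loader:
    pass  # training loop goes here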
I am appending an array of numbers to an existing Excel file using this:
dlmwrite(mydatafile,newdataarray,'-append');
I need to add a column at the beginning of the new row for a text identifier (an employee name), but I can't get MATLAB to write the name to a single cell. Does anyone have any ideas how I'd be able to do this?
Your question is not completely clear; for example, it is not well defined what it means to add a column to a row.
If the following does not work, I would recommend providing a small-scale example of the data that you have and of what you want to append.
Assuming you just need to get this done and are not looking for a pretty solution, you could try to:
First read it into MATLAB
Then perform the operation that you like
Then write it to a new file
This will allow you to do pretty much anything but whether it is convenient depends on your specific needs.
I'm attempting to write up a Yesod app as a replacement for a Ruby JSON service that uses MongoDB on the backend and I'm running into some snags.
1) The sql=foobar syntax in the models file does not seem to affect which collection Persistent.MongoDB uses. How can I change that?
2) Is there a way to easily configure MongoDB (preferably through the YAML file) to be explicitly read-only? I'd take more comfort deploying this knowing that there was no possible way the app could overwrite or damage production data.
3) Is there any way I can get Persistent.MongoDB to ignore fields it doesn't know about? This service only needs a fraction of the fields in the collection in question. To keep the code as simple as possible, I'd really like to map only the fields I care about and have Yesod ignore everything else. Instead it complains that the fields don't match.
4) How does one go about defining instances for models, such as ToJSON? I'd like to customize how that JSON gets rendered, but I get the following error:
Handler/ProductStat.hs:8:10:
    Illegal instance declaration for `ToJSON Product'
      (All instance types must be of the form (T t1 ... tn)
       where T is not a synonym.
       Use -XTypeSynonymInstances if you want to disable this.)
    In the instance declaration for `ToJSON Product'
1) It seems that sql= is not hooked up to Mongo. Since the SQL backends already do this, it shouldn't be difficult to add for Mongo.
2) You can change the function that runs the queries.
In persistent/persistent-mongoDB/Database/Persist there is a runPool function of PersistConfig. That gets used in yesod-defaults. We should probably change the loadConfig function to check a readOnly setting.
3) I am OK with changing the reorder function to allow fields to be ignored, although in the future (if MongoDB returns everything in order) that may have performance implications, so ideally you would list the ignored columns.
4) This shouldn't require changes to Persistent. Did you try turning on TypeSynonymInstances?
I have several other Yesod/Persistent priorities to attend to before these changes; please roll up your sleeves and let me know what help you need making them. I can change 2 & 3 myself fairly soon if you are committed to testing them.