Getting Used to MongoDB

I am just getting familiar with MongoDB and it's my first time using it. I am using an Ubuntu environment for development. I installed MongoDB as per the instructions in the tutorial on the MongoDB website. It says that the data will be stored in /data/db, and now I have two questions about this:
1) Where do I need to make this folder? That is, in which directory?
2) I made one directory in the root directory /, and then I ran the mongo server. I made one database named movies with the use movies command, put some collections in it, like action and comedy, and then saved some documents in each collection with the db.action.insert and db.comedy.insert commands. When I tried to find it in the /data/db folder I found three files and one folder, but there is no file named movies. So my question is: under what name is my database saved?
Please guide me on this.
Thanks

By default, Mongo stores all of its DB files directly in the root of the configured dbpath directory. Look into the directoryperdb configuration option if you want Mongo to create a separate directory for each database and place that database's files within it.
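As a rough sketch (the data path shown is just the tutorial default), starting the server with that option looks like this:
mongod --dbpath /data/db --directoryperdb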
The default data directory /data/db needs to be created by you if it hasn't been already.
You can do this in Ubuntu via the shell:
Press Ctrl+Alt+T to bring up a Terminal window:
cd /
sudo mkdir /data
cd data
sudo mkdir db
Note: you may also need to change ownership of the new directory so that it is owned by you:
sudo chown your_user_name /data
Note: my Mongo installation defaulted to a dbpath of /var/lib/mongodb
You can also choose a different data directory when you start Mongo:
mongod --dbpath some/dir
If Mongo is installed as a service, you should check the dbpath configuration option in your mongodb.conf file. (When I installed Mongo, which I believe I did via the Software Center, it placed mongodb.conf at /etc/mongodb.conf.)
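For reference, in the old-style /etc/mongodb.conf shipped by the Ubuntu package, the data directory is a simple key=value line (the path shown is the Ubuntu default mentioned above):
dbpath=/var/lib/mongodb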
Hope this helps

What are the three files you see? test.*?
Here are the two most likely cases:
a) your DB files are in another location
b) your data is in the test database
Go back to your shell and run:
use local
db.startup_log.find({},{startTimeLocal:1,cmdLine:1}).sort({$natural:-1}).limit(1).pretty()
{
    "_id" : "mycomputer",
    "startTimeLocal" : "Mon Oct 7 09:44:25.353",
    "cmdLine" : {
        "config" : "mongodb.conf",
        "dbpath" : "./db",
        "fork" : "1",
        "journal" : "1",
        "logpath" : "./logs/mongo.log",
        "port" : 27017,
        "rest" : "1"
    }
}
=> dbpath tells you the location that was provided on the command line or in the config file
Again in the shell, run the following:
use test
show collections
use movies
show collections
If you see your collection names under test, you may not have used 'use movies' before creating the data.
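For example, a minimal sketch of the right order in the shell (the document below is just a placeholder):
use movies
db.action.insert({ title: "example action movie" })
show dbs
show collections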

Related

How to specify the path of a new collection in MongoDB?

I am running Mongo on Windows 10. Data access is made through PyMongo (Python). All my data is stored on an old large HDD, but I would like to have some smaller collections stored on a much faster SSD. When I create a new collection, either through Compass or PyMongo, there is no option to specify the path on disk.
Is it possible to learn such power?
If you want to have databases on different disks under the same dbPath, this is the option:
Add the option --directoryperdb in the mongod config file or at startup.
Create the new database. (There is no db.createDatabase() helper in the mongo shell; databases are created lazily on the first write, so switch to the database and insert a document. The collection name below is just a placeholder.)
use newDatabase
db.placeholder.insertOne({})
You will then see a folder for the new database inside the dbPath folder, like:
\dbPath\newDatabase
Stop the mongodb process.
Copy the content from \dbPath\newDatabase to your SSD, let's say:
ssd:\newData
Remove (or rename) the original \dbPath\newDatabase folder, then make a directory link in its place pointing at the SSD (mklink takes the link first, then the target):
mklink /d \dbPath\newDatabase ssd:\newData
or follow this tutorial.
Start the mongodb process again.
Note:
As suggested by @Wermfried in the comments, it is safest to have the --directoryperdb option set on the member from the start, before it gets populated ...
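If you want to double-check that the option is actually active on the running member, one way (a sketch using the standard getCmdLineOpts admin command) is:
db.adminCommand({ getCmdLineOpts: 1 }).parsed.storage.directoryPerDB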

Where is the MongoDB Data?

This is more of a general question about how MongoDB works.
But I've been using MongoDB for a while and everything seems to be working for me. The part that currently confuses me, though, is that when I visit the directory where the MongoDB data is saved (I'm using the default data/db), the directory is empty. The data is being persisted; I'm just confused - why does the directory appear empty on my computer?
I'm on Windows, if that's worth anything.
You can execute:
db.serverCmdLineOpts().parsed.storage.dbPath
from inside the mongo shell to see what the dbPath startup parameter is set to, which tells you where your data is saved,
or alternatively check the config file on Windows:
<install directory>/bin/mongod.cfg
dbPath can be confusing.
When you run mongod without the --dbpath option, and without setting it in a config file, it defaults to \data\db.
However, when you use the default config file that comes with the installation, the dbPath is different (I don't remember offhand which). So you should really check the path with db.serverCmdLineOpts() as suggested by R2D2.
I know that on Linux the default dbPath is /data/db, but the pre-installed config file /etc/mongod.conf sets it to /var/lib/mongo.
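For reference, the packaged config files set this in the storage section; a sketch of what it typically looks like on a Linux install (Debian/Ubuntu packages use /var/lib/mongodb instead, and Windows installs point dbPath at a data folder under the install directory):
storage:
  dbPath: /var/lib/mongo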

Scrapy MongoDB and Elasticsearch Synchronization

I'm using Scrapy to fetch data from websites, MongoDB for persistence and Elasticsearch for searching purposes.
My problem is that when Scrapy inserts data into MongoDB, Elasticsearch is not aware of it, even with the listener set to inserts, updates, and deletes.
Should I add a new plugin for Scrapy to communicate directly with Elasticsearch? If so, why doesn't the listener pick up what happens in the database? Thanks!
Rivers in Elasticsearch are deprecated.
Instead, you can use Transporter to sync data between MongoDB and Elasticsearch.
How To Sync Transformed Data from MongoDB to Elasticsearch with Transporter
Installing Go
In order to install the Compose Transporter, we first need to install the Go language.
sudo apt-get install golang
Create a folder for Go from your $HOME directory:
mkdir ~/go; echo "export GOPATH=$HOME/go" >> ~/.bashrc
Update your path:
echo "export PATH=$PATH:$HOME/go/bin:/usr/local/go/bin" >> ~/.bashrc
Now go to the $GOPATH directory and create the subdirectories src, pkg and bin. These directories constitute a workspace for Go.
cd $GOPATH
mkdir src pkg bin
Installing Transporter
Now create and move into a new directory for Transporter. Since the utility was developed by Compose, we'll call the directory compose.
mkdir -p $GOPATH/src/github.com/compose
cd $GOPATH/src/github.com/compose
This is where compose/transporter will be installed.
Clone the Transporter GitHub repository:
git clone https://github.com/compose/transporter.git
Move into the new directory:
cd transporter
Take ownership of the /usr/lib/go directory:
sudo chown -R $USER /usr/lib/go
Make sure build-essential is installed for GCC:
sudo apt-get install build-essential
Run the go get command to get all the dependencies:
go get -a ./cmd/...
This step might take a while, so be patient. Once it's done you can build Transporter.
go build -a ./cmd/...
If all goes well, it will complete without any errors or warnings. Check that Transporter is installed correctly by running this command:
transporter
The installation is now complete.
Create some sample data in MongoDB.
Then we have to configure Transporter.
Transporter requires a config file (config.yaml), a transform file (myTransformation.js), and an application file (application.js) to migrate our data from MongoDB to Elasticsearch.
Move to the transporter directory:
cd ~/go/src/github.com/compose/transporter
Config File
You can take a look at the example config.yaml file if you like. We're going to back up the original and then replace it with our own contents.
mv test/config.yaml test/config.yaml.00
The new file is similar but updates some of the URIs and a few of the other settings to match what's on our server. Open the new config.yaml file in the nano editor again:
nano test/config.yaml
Copy the contents below into the file. Once done, save the file as described earlier.
# api:
#   interval: 60s
#   uri: "http://requestb.in/13gerls1"
#   key: "48593282-b38d-4bf5-af58-f7327271e73d"
#   pid: "something-static"
nodes:
  localmongo:
    type: mongo
    uri: mongodb://localhost/foo
    tail: true
  es:
    type: elasticsearch
    uri: http://localhost:9200/
  timeseries:
    type: influx
    uri: influxdb://root:root@localhost:8086/compose
  debug:
    type: file
    uri: stdout://
  foofile:
    type: file
    uri: file:///tmp/foo
Application File
Now, open the application.js file in the test directory.
nano test/application.js
Replace the sample contents of the file with the contents shown below:
Source({name:"localmongo", namespace:"foo.bar"})
  .transform({filename: "transformers/addFullName.js", namespace: "foo.bar"})
  .save({name:"es", namespace:"foo.bar"});
Transformation File
Let's say we want the documents being stored in Elasticsearch to have another field called fullName. For that, we need to create a new transform file, test/transformers/addFullName.js.
nano test/transformers/addFullName.js
Paste the contents below into the file. Save and exit as described earlier.
module.exports = function(doc) {
  console.log(JSON.stringify(doc)); // if you are curious, you can listen in on what's changed and being copied
  doc._id = doc.data._id['$oid'];
  doc["fullName"] = doc["firstName"] + " " + doc["lastName"];
  return doc;
}
The doc._id line is necessary to handle the way Transporter represents MongoDB's ObjectId() field. The fullName line tells Transporter to concatenate the firstName and lastName fields from MongoDB to form the fullName field in Elasticsearch.
This is a simple transformation for the example, but with a little JavaScript you can do more complex data manipulation as you prepare your data for searching.
Executing Transporter:
If you have a simple standalone instance of MongoDB, it won't be replicated, there will be no oplog, and Transporter won't be able to detect changes. To convert a standalone MongoDB instance to a single-node replica set, start the server with --replSet rs0 (rs0 is just a name for the set) and, once it is running, log in with the mongo shell and run rs.initiate() to get the server to configure itself.
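A sketch of that conversion (rs0 and the dbpath are just example values):
mongod --replSet rs0 --dbpath /data/db
# then, in the mongo shell:
rs.initiate()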
Make sure you are in the transporter directory:
cd ~/go/src/github.com/compose/transporter
Execute the following command to sync the data:
transporter run --config ./test/config.yaml ./test/application.js
The synchronizing mechanism that you are looking for is called rivers in Elasticsearch.
In this specific case, you should sync the MongoDB collection that you are using to save your Scrapy data with the Elasticsearch index.
For details about how to proceed you should check out the following links:
setting up the mongodb river
mongodb river plugin
Also, I recommend checking out the answers under the Elasticsearch tag here on Stack Overflow. I have found detailed answers for most of the common problems regarding implementation details.

Data storage in mongodb

First of all, please forgive me for asking a silly question, but I am new to MongoDB and just installed it on my Windows platform by following this installation guide: http://docs.mongodb.org/manual/tutorial/install-mongodb-on-windows/
It says "MongoDB requires a data folder to store its files. The default location for the MongoDB data directory is C:\data\db.You can specify an alternate path for data files using the --dbpath option to mongod.exe."
So I created a folder d:\data\db on my computer and issued the command
C:\mongodb\bin\mongod.exe --dbpath d:\mongodb\data
Then it says:
"At the mongo.exe prompt, issue the following two commands to insert a record in the test collection of the default test database and then retrieve that record:
db.test.save( { a: 1 } )
db.test.find()"
I issued these two commands to save and retrieve the objects and it's working fine, but what is this default test database? Where is it? Moreover, where is this object stored? Where can I find this file?
what is this default test database?
When you connect to a mongod server without specifying a database, a default database "test" is selected. Since databases are created lazily, it may not even exist until you write to it.
db.test.save( { a: 1 } )
After this line is executed, a database with the current name ("test" by default) is created (if it didn't already exist), and in it the collection "test" is created (if it didn't already exist).
where is it? Moreover where this object is stored? Where I can find this file?
All databases are ultimately stored as files in your data dir. Look for "test.*" files there.
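For example (a sketch; the exact names depend on the storage engine), with the MMAPv1 engine that was the default at the time, the data dir contains files like:
test.ns
test.0
mongod.lock
With WiredTiger the files are instead named collection-*.wt and index-*.wt, so the database name does not appear in the file names.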
mongod.lock is the file which records the PID of your running mongod instance. When you start a mongod instance, MongoDB checks that the lock file is empty in order to start cleanly. MongoDB then registers the PID of the running mongod instance in this lock file.
MongoDB deletes the contents of this lock file when you shut down your server cleanly:
mongod --shutdown --dbpath <path name> --port <port number>

Moving MongoDB's data folder?

I have 2 computers in different places (so it's impossible to use the same wifi network).
One contains about 50 GB of data (MongoDB files) that I want to move to the second one, which has much more computation power for analysis. But how can I make MongoDB on the second machine recognize that folder?
When you start the mongod process you provide an argument to it, --dbpath /directory, which is how it knows where the data folder is.
All you need to do is:
Stop the mongod process on the old computer and wait till it exits.
Copy the entire /data/db directory to the new computer.
Start the mongod process on the new computer, giving it the --dbpath /newdirectory argument.
The mongod on the new machine will use the folder you indicate with --dbpath. There is no need to "recognize" as there is nothing machine specific in that folder, it's just data.
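A sketch of those steps as commands (paths are placeholders; this assumes Linux machines that can reach each other over the network, otherwise copy via an external drive):
# on the old computer
mongod --dbpath /data/db --shutdown
scp -r /data/db user@newmachine:/newdirectory
# on the new computer
mongod --dbpath /newdirectory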
I did this myself recently, and I wanted to provide some extra considerations to be aware of, in case readers (like me) run into issues.
The following information is specific to *nix systems, but it may be applicable with very heavy modification to Windows.
If the source data is in a mongo server that you can still run (preferred)
Look into and make use of mongodump and mongorestore. That is probably safer, and it's the official way to migrate your database.
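A minimal sketch of that route (the directory name is just an example):
mongodump --out /backup/dump      # on the old machine
mongorestore /backup/dump         # on the new machine, after copying the dump over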
If you never made a dump and can't anymore
Yes, the data directory can be directly copied; however, you also need to make sure that the mongodb user has complete access to the directory after you copy it.
My steps are as follows. On the machine you want to transfer an old database to:
Edit /etc/mongod.conf and change the dbPath field to the desired location.
Use the following script as a reference, or tailor it and run it on your system, at your own risk.
I do not guarantee this works on every system --> please verify it manually.
I also cannot guarantee it works perfectly in every case.
WARNING: will delete everything in the target data directory you specify.
I can say, however, that it worked on my system, and that it passes shellcheck.
The important part is simply copying over the old database directory, and giving mongodb access to it through chown.
#!/bin/bash
TARGET_DATA_DIRECTORY=/path/to/target/data/directory # modify this
SOURCE_DATA_DIRECTORY=/path/to/old/data/directory # modify this too
echo shutting down mongod...
sudo systemctl stop mongod
if test "$TARGET_DATA_DIRECTORY"; then
    echo removing existing data directory...
    sudo rm -rf "$TARGET_DATA_DIRECTORY"
fi
echo copying backed up data directory...
sudo cp -r "$SOURCE_DATA_DIRECTORY" "$TARGET_DATA_DIRECTORY"
sudo chown -R mongodb "$TARGET_DATA_DIRECTORY"
echo starting mongod back up...
sudo systemctl start mongod
sudo systemctl status mongod # for verification
Quite easy on Windows: just move the data folder to the target location, then run in cmd:
"C:\your\mongodb\bin-path\mongod.exe" --dbpath="c:\what\ever\path\data\db"
On Windows, in case you just need to configure a new path for the data, all you need to do is create a new folder, for example D:\dev\mongoDb-data, open C:\Program Files\MongoDB\Server\6.0\bin\mongod.cfg, and change the dbPath there.
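For example, a sketch of the relevant storage section after the change (the folder name follows the example above):
storage:
  dbPath: D:\dev\mongoDb-data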
Then restart your PC and check the folder - it should contain new files/folders with data.
Maybe what you didn't do was export or dump the database.
Databases aren't portable and therefore must be exported or created as a dump file.
Here is another question where the answer is explained further.