Mongo rs.initiate() excessive disk space requirements

I have 2 mongod instances running with the following parameters:
--noprealloc --smallfiles --replSet mongors1 --dbpath /data/db --nojournal
The goal of the exercise is to create a replicated environment with a minimal disk footprint for local development purposes.
At this point in time, all is good, with each respective data directory being around ~32M and containing the following:
ls -o data/db
total 32784
-rw------- 1 999 16777216 Sep 22 11:38 local.0
-rw------- 1 999 16777216 Sep 22 11:38 local.ns
-rwxr-xr-x 1 999 2 Sep 22 11:38 mongod.lock
-rw-r--r-- 1 999 69 Sep 22 11:38 storage.bson
drwxr-xr-x 2 999 4096 Sep 22 11:38 _tmp
After logging on to the first member and running rs.initiate(), an additional 1GB of disk space is used.
ls -o data/db
total 1080856
-rw------- 1 999 16777216 Sep 22 11:39 local.0
-rw------- 1 999 536608768 Sep 22 11:39 local.1
-rw------- 1 999 536608768 Sep 22 11:39 local.2
-rw------- 1 999 16777216 Sep 22 11:39 local.ns
-rwxr-xr-x 1 999 2 Sep 22 11:38 mongod.lock
-rw-r--r-- 1 999 69 Sep 22 11:38 storage.bson
drwxr-xr-x 2 999 4096 Sep 22 11:39 _tmp
This seems excessive given the properties of the nodes being replicated and the configuration they are running.
Mongo 3.0.6 is the version in use.
Eventually this will be scaled up to replica sets with 3 members across 2+ shards. A minimal disk requirement of 6GB to store zero data initially seems sub-optimal.
Is there a way to reduce this to something more representative of the nodes needs?
Any help is appreciated. Thanks in advance

The local database contains the oplog, and I'll leave it to you to research what size it should be for a given node. To address the question at hand, from the docs:
For 64-bit Linux, Solaris, FreeBSD, and Windows systems, MongoDB
allocates 5% of the available free disk space, but will always
allocate at least 1 gigabyte and never more than 50 gigabytes.
That's where your usage is coming from. To alter that allocation you will either need to resize the oplog or, if starting from scratch, look at the oplogSizeMB option (or for the CLI equivalent see here).

In addition to what Adam said, add the
--oplogSize X
to your parameters, replacing X with the size in megabytes you want the oplog to be.
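For example, a minimal local setup along the lines of the question could cap the oplog at 50MB (the value 50 here is only an illustration; size it for your own workload):
# same flags as in the question, plus an explicit oplog cap in megabytes
mongod --noprealloc --smallfiles --nojournal --replSet mongors1 --dbpath /data/db --oplogSize 50
The equivalent setting in a YAML configuration file is replication.oplogSizeMB.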

Related

MongoDB: rsync whole db folder, except one huge collection that doesn't change

Here is the content of the MongoDB /data/db folder, where one of the collections is 723GB and the other collections are just a few KB:
-rw------- 1 lxd docker 723G Dec 5 10:15 collection-0-1080408413244540209.wt
-rw------- 1 lxd docker 36K Dec 5 10:15 collection-0-3112968025499504303.wt
-rw------- 1 lxd docker 36K Dec 5 10:15 collection-2-1080408413244540209.wt
-rw------- 1 lxd docker 4.0K Dec 5 10:14 collection-4-1080408413244540209.wt
-rw------- 1 lxd docker 20K Dec 5 10:15 collection-7-1080408413244540209.wt
-rw------- 1 lxd docker 0 Dec 5 10:14 .dbshell
drwx------ 2 lxd docker 90 Dec 5 10:15 diagnostic.data
-rw------- 1 lxd docker 8.1G Dec 5 10:15 index-1-1080408413244540209.wt
-rw------- 1 lxd docker 20K Dec 5 10:14 index-1-3112968025499504303.wt
-rw------- 1 lxd docker 36K Dec 5 10:15 index-3-1080408413244540209.wt
-rw------- 1 lxd docker 4.0K Dec 5 10:14 index-5-1080408413244540209.wt
-rw------- 1 lxd docker 4.0K Dec 5 10:14 index-6-1080408413244540209.wt
-rw------- 1 lxd docker 20K Dec 5 10:14 index-8-1080408413244540209.wt
-rw------- 1 lxd docker 20K Dec 5 10:15 index-9-1080408413244540209.wt
drwx------ 2 lxd docker 110 Dec 5 10:15 journal
-rw------- 1 lxd docker 36K Dec 5 10:15 _mdb_catalog.wt
-rw------- 1 lxd docker 0 Dec 5 10:15 mongod.lock
-rw------- 1 lxd docker 36K Dec 5 10:15 sizeStorer.wt
-rw------- 1 lxd docker 114 Dec 5 10:14 storage.bson
-rw------- 1 lxd docker 50 Dec 5 10:14 WiredTiger
-rw------- 1 lxd docker 4.0K Dec 5 10:15 WiredTigerHS.wt
-rw------- 1 lxd docker 21 Dec 5 10:14 WiredTiger.lock
-rw------- 1 lxd docker 1.5K Dec 5 10:15 WiredTiger.turtle
-rw------- 1 lxd docker 84K Dec 5 10:15 WiredTiger.wt
I simply back up the whole docker volume to another server via rsync.
I never change the content of the 723GB collection, but for some reason MongoDB updates the file approximately once a week.
Because of that, rsync also updates that file remotely. And because I'm using snapshots, every week a new snapshot adds another 723GB to the storage, which is unacceptable and causes me problems.
To resolve that problem, I simply added the 723GB collection to the rsync exclusion list and do not upload it anymore. Is that fine? After a year, may I still use my backup to restore the server if I never update the collection-0-1080408413244540209.wt file again?
By default, rsync only copies new or changed files from a source to a destination, so you don't need to add the file to the exclusion list if you copy the mounted snapshot files. On the other hand, WiredTiger is a bit sensitive and generates checksums in the data root folder based on checkpoints from all collections, so if the file differs there is a good chance the mongod process will not be able to start from the restored snapshot. So I would suggest not excluding the file completely, but leaving it to rsync to check every time whether the file is the same or has changed and needs to be copied again.
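As a minimal sketch of that approach (the paths below are only illustrative, not your actual volume and backup locations), just let rsync do its normal comparison and skip the unchanged 723GB file on its own:
# -a preserves permissions/ownership/times, -v reports what was transferred;
# by default rsync skips files whose size and modification time already match
rsync -av /path/to/docker/volume/_data/ backuphost:/backups/mongo/db/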
P.S.
Note that the scenario you have described was valid with the deprecated previous MongoDB storage engine, MMAPv1, where it was even possible to copy a collection into a running instance on the fly; unfortunately WiredTiger does not allow it, but it offers other advantages.

PostgreSQL keeps WAL segments not required by any replication slot

I have wal_keep_segments set to 3000, but the pg_xlog directory contains more than 6000 WAL segments. The interesting thing is that there are ~3000 files dated after Aug 14, so the files dated before Aug 14 should not exist, I guess. Also, these files have the executable bit set.
$ ls -al pg_xlog | grep -A2 -B2 00000001000034DB0000003B
-rwx------ 1 postgres postgres 16777216 Jul 19 07:58 00000001000034DB00000039
-rwx------ 1 postgres postgres 16777216 Jul 19 07:58 00000001000034DB0000003A
-rwx------ 1 postgres postgres 16777216 Jul 19 07:58 00000001000034DB0000003B
-rw------- 1 postgres postgres 16777216 Aug 14 19:17 0000000100003826000000EA
-rw------- 1 postgres postgres 16777216 Aug 14 19:17 0000000100003826000000EB
This cluster has no replication slots; archive_mode is enabled, but archive_command is set to /bin/true. I think the new WAL segments are being recycled and the total amount is about 6000, but Postgres does not delete the old files for some reason. Any ideas?
PostgreSQL is not in the habit of setting executable flags on WAL segments.
Besides, it looks like there is a gap in the numbering.
These files must be there by accident, you can delete them.
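If you want to be cautious about which files you remove, one sketch of a targeted cleanup (the name pattern is illustrative; review the listing before deleting anything) is to select only the segments carrying the stray executable bit:
# list the WAL segments whose owner-execute bit is set (the accidental ones)
find pg_xlog -maxdepth 1 -type f -perm -0100 -name '000000010000*' -print
# once the listing matches expectations, remove them
find pg_xlog -maxdepth 1 -type f -perm -0100 -name '000000010000*' -delete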

Unable to start MongoDB 3.0.2 service on CentOS 7

We are setting up a MongoDB server for the production environment on an Amazon EC2 instance, but are not able to start the service. I've followed this documentation for the setup. Here are the steps I've taken to set up the server:
Added following to /etc/yum.repos.d/mongodb-org-3.0.repo
[mongodb-org-3.0]
name=MongoDB Repository
baseurl=http://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/3.0/x86_64/
gpgcheck=0
enabled=1
And installed MongoDB 3.0.2 using sudo yum install -y mongodb-org-3.0.2
Created three partitions for data, journal & log:
sudo mkdir /mongo
sudo mkdir /mongo/data
sudo mkdir /mongo/log
sudo mkdir /mongo/journal
Created file system for three separate partitions:
sudo mkfs.ext4 /dev/xvdb
sudo mkfs.ext4 /dev/xvdc
sudo mkfs.ext4 /dev/xvdd
Created entry in fstab for reboot:
echo '/dev/xvdb /mongo/data ext4 defaults,auto,noatime,noexec 0 0
/dev/xvdc /mongo/journal ext4 defaults,auto,noatime,noexec 0 0
/dev/xvdd /mongo/log ext4 defaults,auto,noatime,noexec 0 0' | sudo tee -a /etc/fstab
And mounted the partitions:
sudo mount /mongo/data
sudo mount /mongo/journal
sudo mount /mongo/log
Gave the permissions and created a link:
sudo chown mongod:mongod /mongo/data /mongo/journal /mongo/log
sudo ln -s /mongo/journal /mongo/data/journal
Configured ulimit & read ahead settings as given in the documentation link above. Verified permissions and partitions:
[deployer@prod-mongo ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 8.0G 1.3G 6.8G 16% /
devtmpfs 3.6G 0 3.6G 0% /dev
tmpfs 3.5G 0 3.5G 0% /dev/shm
tmpfs 3.5G 57M 3.4G 2% /run
tmpfs 3.5G 0 3.5G 0% /sys/fs/cgroup
/dev/xvdc 7.8G 36M 7.3G 1% /mongo/journal
/dev/xvdb 150G 51M 149G 1% /mongo/data
/dev/xvdd 3.9G 16M 3.6G 1% /mongo/log
Permissions:
[deployer@prod-mongo ~]$ ll /
total 32
lrwxrwxrwx. 1 root root 7 Sep 29 2014 bin -> usr/bin
dr-xr-xr-x. 4 root root 4096 Sep 29 2014 boot
drwxr-xr-x. 17 root root 2860 May 11 12:11 dev
lrwxrwxrwx. 1 root root 7 Sep 29 2014 lib -> usr/lib
lrwxrwxrwx. 1 root root 9 Sep 29 2014 lib64 -> usr/lib64
drwxr-xr-x. 2 root root 6 Jun 10 2014 mnt
drwxr-xr-x. 5 mongod mongod 41 May 11 05:06 mongo
drwxr-xr-x. 21 root root 660 May 11 12:47 run
lrwxrwxrwx. 1 root root 8 Sep 29 2014 sbin -> usr/sbin
Inside /mongo
[deployer@prod-mongo ~]$ ll /mongo/
total 12
drwxr-xr-x. 3 mongod mongod 4096 May 11 07:33 data
drwxr-xr-x. 3 mongod mongod 4096 May 11 07:31 journal
drwxr-xr-x. 3 mongod mongod 4096 May 11 08:58 log
After changing the configurations inside /etc/mongodb.conf
logpath=/mongo/log/mongod.log
dbpath=/mongo/data
and when I'm doing: sudo service mongod start, I'm getting this error:
Starting mongod (via systemctl): Job for mongod.service failed. See 'systemctl status mongod.service' and 'journalctl -xn' for details.
[FAILED]
Further logging:
[deployer@prod-mongo ~]$ sudo systemctl status mongod.service
mongod.service - SYSV: Mongo is a scalable, document-oriented database.
Loaded: loaded (/etc/rc.d/init.d/mongod)
Active: failed (Result: exit-code) since Tue 2015-05-12 04:42:10 UTC; 42s ago
Process: 22881 ExecStart=/etc/rc.d/init.d/mongod start (code=exited, status=1/FAILURE)
May 11 04:42:10 ip-xx-xx-xx-xx.local runuser[22887]: pam_unix(runuser:session): session opened for user mongod by (uid=0)
May 11 04:42:10 ip-xx-xx-xx-xx.localdomain runuser[22887]: pam_unix(runuser:session): session closed for user mongod
May 11 04:42:10 ip-xx-xx-xx-xx.local mongod[22881]: Starting mongod: [FAILED]
May 11 04:42:10 ip-xx-xx-xx-xx.local systemd[1]: mongod.service: control process exited, code=exited status=1
May 11 04:42:10 ip-xx-xx-xx-xx.local systemd[1]: Failed to start SYSV: Mongo is a scalable, document-oriented database..
May 11 04:42:10 ip-xx-xx-xx-xx.local systemd[1]: Unit mongod.service entered failed state.
I've followed various articles and blog posts and StackExchange answers but didn't get any solution. Am I missing something?
Update: If I run mongod directly as a normal user, something like this: sudo mongod --logpath ~/mongod.log --dbpath ~/mongodata, then the service starts properly.
We tried changing the path of the pid file to another directory, that didn't help either.
I'm guessing you're running a flavour of Linux that uses SELinux (RHEL or CentOS 7, perhaps?)
If so, the issue is that you don't have a permissive policy on your /mongo/ directory that permits access to daemons (like the mongod service.)
From Wikipedia:
SELinux can potentially control which activities a system allows each
user, process and daemon, with very precise specifications. However,
it is mostly used to confine daemons[citation needed] like database
engines or web servers that have more clearly defined data access and
activity rights. This limits potential harm from a confined daemon
that becomes compromised. Ordinary user-processes often run in the
unconfined domain, not restricted by SELinux but still restricted by
the classic Linux access rights
To check whether this is the issue, try this at the shell:
sudo setenforce 0
This should disable SELinux policies and allow the service to run.
For a more permanent solution, see https://wiki.centos.org/HowTos/SELinux
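As a sketch of what the permanent route typically looks like on a targeted-policy system, you would label the custom directories with the types the mongod policy expects and relabel them. The type names below are an assumption based on the stock CentOS policy; verify them with ls -Z against the default /var/lib/mongo and /var/log/mongodb locations:
# assumed SELinux types for mongod's data and log locations
sudo semanage fcontext -a -t mongod_var_lib_t "/mongo/data(/.*)?"
sudo semanage fcontext -a -t mongod_var_lib_t "/mongo/journal(/.*)?"
sudo semanage fcontext -a -t mongod_log_t "/mongo/log(/.*)?"
# apply the labels, re-enable enforcement, and retry the service
sudo restorecon -Rv /mongo
sudo setenforce 1
sudo service mongod start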
I ran into this problem and actually found a solution for me.
In short, mongodb 3.2 uses the user 'mongod' while older versions use 'mongodb'. Some of the files and directories were owned by 'mongodb' (the older user). Once I chown'd them to the 'mongod' user, I was able to use systemctl to control the mongod process.
More specifically, it was the "/var/log/mongodb/*" files that had the wrong user ownership.
root@<HOST>:# ls -alh /var/log/mongodb
total 664K
drwxr-xr-x 2 mongod mongod 4.0K Oct 27 12:08 .
drwxr-xr-x. 22 root root 4.0K Oct 27 11:51 ..
-rw-r--r-- 1 mongodb mongodb 3.8K Oct 27 11:48 mongod.log
-rw-r--r-- 1 mongodb mongodb 19K Apr 14 2016 mongod.log.2016-04-14T18-29-34
-rw-r--r-- 1 mongodb mongodb 2.8K Apr 14 2016 mongod.log.2016-04-14T18-30-13
-rw-r--r-- 1 mongodb mongodb 12K Apr 14 2016 mongod.log.2016-04-14T22-27-27
-rw-r--r-- 1 mongodb mongodb 11K Apr 14 2016 mongod.log.2016-04-14T22-29-12
-rw-r--r-- 1 mongodb mongodb 5.6K Apr 18 2016 mongod.log-20160418.gz
-rw-r--r-- 1 mongodb mongodb 0 Apr 18 2016 mongod.log.2016-09-09T17-33-48
-rw-r--r-- 1 mongodb mongodb 3.6K Sep 9 11:34 mongod.log.2016-09-09T17-34-52
-rw-r--r-- 1 mongodb mongodb 23K Sep 9 11:49 mongod.log.2016-09-09T17-49-49
-rw-r--r-- 1 mongodb mongodb 5.0K Sep 9 11:55 mongod.log.2016-09-09T17-55-15
-rw-r--r-- 1 mongodb mongodb 5.0K Sep 9 12:02 mongod.log.2016-09-09T18-02-26
-rw-r--r-- 1 mongodb mongodb 5.0K Sep 9 12:13 mongod.log.2016-09-09T18-13-17
-rw-r--r-- 1 mongodb mongodb 5.0K Sep 9 12:25 mongod.log.2016-09-09T18-25-01
-rw-r--r-- 1 mongodb mongodb 5.2K Sep 9 12:47 mongod.log.2016-09-09T18-47-54
-rw-r--r-- 1 mongodb mongodb 5.0K Sep 9 12:52 mongod.log.2016-09-09T18-52-16
-rw-r--r-- 1 mongodb mongodb 5.0K Sep 9 12:54 mongod.log.2016-09-09T18-54-49
-rw-r--r-- 1 mongodb mongodb 5.0K Sep 9 13:01 mongod.log.2016-09-09T19-01-22
-rw-r--r-- 1 mongodb mongodb 3.0K Sep 9 13:03 mongod.log.2016-09-09T19-03-21
-rw-r--r-- 1 mongodb mongodb 215K Sep 9 14:25 mongod.log.2016-09-09T20-25-59
-rw-r--r-- 1 mongodb mongodb 281K Sep 10 03:42 mongod.log-20160910
-rw-r--r-- 1 mongodb mongodb 0 Sep 10 03:42 mongod.log.2016-10-27T17-42-42
-rw-r----- 1 mongod mongod 0 Sep 29 22:03 mongod.log.rpmnew
Notice the owner of the directory is 'mongod' (the new user) while the log files are all owned by 'mongodb' (the old user).
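A minimal sketch of the fix, assuming the standard package locations (adjust the paths if your dbpath or logpath differ):
# hand the old files over to the user the newer packages run as, then retry
sudo chown -R mongod:mongod /var/log/mongodb /var/lib/mongo
sudo systemctl start mongod
sudo systemctl status mongod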
In case anyone encounters the same issue with MongoDB startup, here is the relevant thread of comments: https://jira.mongodb.org/browse/SERVER-18439. This is scheduled to be fixed in 3.1.

How can I manage the disk size of a GridFS MongoDB database?

I have a GridFS MongoDB database that I need to manage the size of. It has been running very well since it was created, but I have never really looked at its disk size until now.
Judging by this output from the db.stats() command
> db.stats()
{
"db" : "documents",
"collections" : 4,
"objects" : 10967,
"avgObjSize" : 52491.573994711405,
"dataSize" : 575675092,
"storageSize" : 595255296,
"numExtents" : 24,
"indexes" : 4,
"indexSize" : 686784,
"fileSize" : 2080374784,
"nsSizeMB" : 16,
"ok" : 1
}
it seems the database itself is roughly 600MB. This size makes sense to me as it is the same size as the database backups I get from mongodump. The file size is far larger though, and it gets worse when I look in the data directory itself in /var/lib/mongodb:
root@deathstar:/var/lib/mongodb# ls -la
total 2474036
drwxr-xr-x 5 mongodb mongodb 4096 Apr 15 09:28 .
drwxr-xr-x 62 root root 4096 Mar 4 07:48 ..
drwxr-xr-x 2 mongodb mongodb 4096 Apr 13 11:48 documents
-rw------- 1 mongodb mongodb 67108864 Apr 15 09:16 documents.0
-rw------- 1 mongodb mongodb 134217728 Apr 13 11:48 documents.1
-rw------- 1 mongodb mongodb 268435456 Apr 13 11:48 documents.2
-rw------- 1 mongodb mongodb 536870912 Apr 15 09:16 documents.3
-rw------- 1 mongodb mongodb 1073741824 Apr 13 11:50 documents.4
-rw------- 1 mongodb mongodb 16777216 Apr 15 09:16 documents.ns
drwxr-xr-x 2 mongodb mongodb 4096 Apr 13 11:50 journal
-rwxr-xr-x 1 mongodb mongodb 5 Apr 13 11:46 mongod.lock
drwxr-xr-x 2 mongodb mongodb 4096 Apr 15 09:28 _tmp
-rw------- 1 mongodb mongodb 67108864 Apr 15 09:28 -v.0
-rw------- 1 mongodb mongodb 67108864 Apr 15 09:28 v.0
-rw------- 1 mongodb mongodb 134217728 Apr 15 09:28 -v.1
-rw------- 1 mongodb mongodb 134217728 Apr 15 09:28 v.1
-rw------- 1 mongodb mongodb 16777216 Apr 15 09:28 -v.ns
-rw------- 1 mongodb mongodb 16777216 Apr 15 09:28 v.ns
And this in /var/lib/mongodb/journal:
root@deathstar:/var/lib/mongodb/journal# ls -la
total 3145752
drwxr-xr-x 2 mongodb mongodb 4096 Apr 13 11:50 .
drwxr-xr-x 5 mongodb mongodb 4096 Apr 15 09:28 ..
-rw------- 1 mongodb mongodb 1073741824 Apr 15 09:28 j._2
-rw------- 1 mongodb mongodb 88 Apr 15 09:28 lsn
-rw------- 1 mongodb mongodb 1073741824 May 5 2012 prealloc.1
-rw------- 1 mongodb mongodb 1073741824 May 5 2012 prealloc.2
Now correct me if I'm wrong, but I am basically looking at 5.5GB disk size for a 600MB database. That is pretty inefficient.
How can I reduce the disk size? Is there a similar command to OPTIMIZE TABLE in MySQL?
I don't know whether GridFS is a different beast from a regular database, but I tried running compact and it didn't do anything to the disk size.
And how about the journal files? Can I somehow reduce the disk size of all journal files?
The issue with large files is not specific to GridFS.
The journal is there to provide durability, and MongoDB always preallocates files before it needs them. I would recommend not changing anything here - i.e. continue using journaling to protect your files in case of an unexpected crash of the server.
You see much smaller files with mongodump because you don't get the preallocated data files or journal files.
If you want a smaller DB directory, I recommend looking at the --smallfiles and --noprealloc options to mongod. Both affect when space is allocated and how much is allocated at a time.
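For example, a sketch of restarting the deployment with both flags (existing data files keep their current sizes until the database is rebuilt, e.g. via a dump and restore):
# smaller initial extents, no preallocation of the next data file,
# and journal files capped at 128MB instead of 1GB
mongod --dbpath /var/lib/mongodb --smallfiles --noprealloc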

Finding device base address to communicate via inb() and outb()

I am trying to communicate with a disk drive using inb(), inw(), outb() and outw() commands so I can find specific information about the drive. However, to use these commands, I need the correct I/O ports for the device. When I have the correct I/O ports, I can find the information I am looking for very easily, however, I do not know a way to find the base address of a device's I/O ports in Linux.
In DOS, I am able to use Hdat2 to find the device's base address; however, I am trying to find the address in Linux. Is there a way to find which device maps to which I/O port in Linux?
There is a file in /proc called ioports that contains some information, but I don't know how to associate this information with specific devices.
Any help would be greatly appreciated. Thanks!
So I did find something. Although it isn't the most elegant solution and it definitely doesn't work everywhere, it has worked on my hardware, so I figured I would share.
First, you have to get the address of the SATA Controller from the lspci command like Nikolai showed (the -D just shows the full domain numbers):
# lspci -D
...
0000:00:1f.2 SATA controller: Intel Corporation 82801IR 6 port SATA AHCI Controller
...
Now with this address (0000:00:1f.2) you can go into /sys.
In /sys/bus/pci/devices, your device should be listed:
# ls -l /sys/bus/pci/devices
...
lrwxrwxrwx 1 root root 0 Jan 14 12:35 0000:00:1f.2 -> ../../../devices/pci0000:00/0000:00:1f.2
Now in this directory there will be several hostX directories.
# ls -l /sys/bus/pci/devices/0000\:00\:1f.2/
...
drwxr-xr-x 4 root root 0 Jan 13 12:40 host0
drwxr-xr-x 4 root root 0 Jan 13 12:40 host1
drwxr-xr-x 3 root root 0 Jan 13 12:40 host2
drwxr-xr-x 3 root root 0 Jan 13 12:40 host3
drwxr-xr-x 3 root root 0 Jan 13 12:40 host4
drwxr-xr-x 4 root root 0 Jan 14 08:21 host5
...
In one of these hostX directories, there will be a targetX:X:X directory. This targetX:X:X directory will then have a directory called X:X:X:X (the X's are numbers that can vary).
# ls -R /sys/bus/pci/devices/0000\:00\:1f.2/host0
/sys/bus/pci/devices/0000:00:1f.2/host0:
power scsi_host:host0 target0:0:0 uevent
/sys/bus/pci/devices/0000:00:1f.2/host0/target0:0:0:
0:0:0:0 power uevent
...
In the X:X:X:X directory, there is a link named "block:sdX" (where X is a letter). This sdX is the name of the drive that this directory corresponds to.
# ls -l /sys/bus/pci/devices/0000\:00\:1f.2/host0/target0\:0\:0/0\:0\:0\:0/
lrwxrwxrwx 1 root root 0 Jan 14 15:01 block:sda -> ../../../../../../block/sda
So /dev/sda corresponds to host 0 on the SATA Controller at 0000:00:1f.2. Now to find the address that we can use to talk to /dev/sda through inb() and outb() commands, we look in the file named "resource" in /sys/bus/pci/devices/0000:00:1f.2/.
# cat /sys/bus/pci/devices/0000\:00\:1f.2/resource
0x000000000000fe00 0x000000000000fe07 0x0000000000000101
0x000000000000fe10 0x000000000000fe13 0x0000000000000101
0x000000000000fe20 0x000000000000fe27 0x0000000000000101
0x000000000000fe30 0x000000000000fe33 0x0000000000000101
0x000000000000fec0 0x000000000000fedf 0x0000000000000101
0x00000000ff970000 0x00000000ff9707ff 0x0000000000000200
0x0000000000000000 0x0000000000000000 0x0000000000000000
The address we are looking for is fe00, which is on the first line. We want the first line because it is host 0; if it were host 1, we would look on the second line, host 2 the third line, and so on. The host number was given by the hostX directory that we found earlier. Each line in the resource file is separated into 3 columns:
Column 1 = beginning address
Column 2 = end address
Column 3 = flags
So this is how I get from /dev/sda to 0xfe00 in order to send commands to the device.
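As a rough shortcut tying those steps together (assuming sysfs exposes the usual device symlink for the drive; the names here are the ones from this example), you can go from the drive name back to the port range without walking each directory by hand:
# resolving the device link shows which PCI address and hostN the drive hangs off
readlink -f /sys/block/sda/device
# -> .../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0
# then line N+1 of that controller's resource file is hostN's I/O port range
sed -n '1p' /sys/bus/pci/devices/0000:00:1f.2/resource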
If anybody knows a better way to do this, I am all ears...
The device is most probably hanging off of the PCI bus, so lspci(8) is the first place to look. Then figure out where under /sys the controller is described. Here, for example, I have:
~$ lspci
...
03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)
...
~$ ll /sys/bus/pci/devices/0000\:03\:00.0/
total 0
drwxr-xr-x 4 root root 0 Dec 16 11:57 ./
drwxr-xr-x 6 root root 0 Dec 16 11:57 ../
-rw-r--r-- 1 root root 4096 Dec 16 11:57 broken_parity_status
lrwxrwxrwx 1 root root 0 Dec 16 11:57 bus -> ../../../../bus/pci/
-r--r--r-- 1 root root 4096 Dec 16 11:57 class
-rw-r--r-- 1 root root 4096 Dec 16 11:57 config
-r--r--r-- 1 root root 4096 Dec 16 11:57 device
lrwxrwxrwx 1 root root 0 Dec 16 11:57 driver -> ../../../../bus/pci/drivers/megaraid_sas/
-rw------- 1 root root 4096 Dec 16 11:57 enable
drwxr-xr-x 5 root root 0 Dec 16 11:57 host0/
-r--r--r-- 1 root root 4096 Dec 16 11:57 irq
-r--r--r-- 1 root root 4096 Dec 16 11:57 local_cpus
-r--r--r-- 1 root root 4096 Dec 16 11:57 modalias
-r--r--r-- 1 root root 4096 Dec 16 11:57 pools
drwxr-xr-x 2 root root 0 Dec 16 11:57 power/
-r--r--r-- 1 root root 4096 Dec 16 11:57 resource
-rw------- 1 root root 262144 Dec 16 11:57 resource0
-rw------- 1 root root 256 Dec 16 11:57 resource2
-rw------- 1 root root 262144 Dec 16 11:57 resource3
-r-------- 1 root root 32768 Dec 16 11:57 rom
lrwxrwxrwx 1 root root 0 Dec 16 11:57 subsystem -> ../../../../bus/pci/
-r--r--r-- 1 root root 4096 Dec 16 11:57 subsystem_device
-r--r--r-- 1 root root 4096 Dec 16 11:57 subsystem_vendor
--w------- 1 root root 4096 Dec 16 11:57 uevent
-r--r--r-- 1 root root 4096 Dec 16 11:57 vendor
This shows the controller's PCI configuration space. See the details in Linux Device Drivers, Third Edition, Chapter 12: PCI Drivers.
Edit:
Take a look at this partition and mass-storage naming howto for help with Linux drive naming.
Are you accessing hardware from a userspace program or from a kernel module?
If you're doing it from userspace, the reason it's hard to find physical address information is that nobody accesses hardware that way; anything that needs to touch raw hardware lives in the kernel.
If you're writing a kernel module, you get address information from in-kernel structures, not by accessing /sys/...
The path seems to have changed in kernel 3.10; this is how I find the corresponding device node:
$ ls -l /sys/bus/pci/devices/0000\:00\:1f.2/ata1/host0/target0\:0\:0/0\:0\:0\:0/block/
total 0
drwxr-xr-x 10 root root 0 Oct 17 08:35 sda
$ ls -l /sys/bus/pci/devices/0000\:00\:1f.2/ata2/host1/target1\:0\:0/1\:0\:0\:0/block/
total 0
drwxr-xr-x 9 root root 0 Oct 17 08:35 sdb