I'm trying out the OpenStreetMap bundler program and I can't find details on the camera position data. The point cloud data is in a *.ply file that looks like this:
ply
format ascii 1.0
element face 0
property list uchar int vertex_indices
element vertex 1340
property float x
property float y
property float z
property uchar diffuse_red
property uchar diffuse_green
property uchar diffuse_blue
end_header
-1.967914e-001 -8.918888e-001 -3.318706e+000 92 86 88
-1.745216e-001 -2.186521e-001 -3.227759e+000 50 33 31
-1.585826e-001 -1.894233e-001 -3.271651e+000 61 43 43
...
-2.649703e-003 2.197792e-002 3.906710e-002 0 255 0
-2.354721e-003 2.235805e-002 -1.093058e-002 255 255 0
5.296331e-003 4.755635e-001 -1.298959e+000 255 0 0
3.155302e-003 4.634443e-001 -1.347420e+000 255 255 0
1.910245e-003 2.891324e-001 -1.070228e-001 0 255 0
2.508708e-003 2.884968e-001 -1.570152e-001 255 255 0
-2.246127e-002 -6.257610e-001 9.884196e-001 255 0 0
-2.333330e-002 -6.187732e-001 9.389180e-001 255 255 0
The last eight lines appear to be the positions for four cameras (from four images). One line is position, second line is orientation. The position colors are either green or red and the orientation is yellow.
I can't find info on this so I'm wondering if this is correct and also what does red and green mean? Good/bad data? Any other info about using osm-bundler results is helpful.
I'm also looking at how to get the camera position data from Bundler (note I'm not using osm-bundler but the original program). However, as well as outputting the PLY file bundler also outputs an ASCII file called bundle.out. This contains parameters that allow you to calculate the camera positions, as described in the bundler documentation.
Bundler incrementally solves for the camera positions/poses and outputs the final answer in the bundle.out file. The .ply file contains point cloud vertices, faces, and RGB color information. The .ply file does not contain the camera poses. You can find information about the bundle.out file here. (osm-bundler uses Noah Snavely's Bundler program, so this answer applies to both of your questions.)
http://www.cs.cornell.edu/~snavely/bundler/bundler-v0.4-manual.html#S6
So, you look at the first number in the second row to determine the number of cameras. The next number tells you the number of points which follow the cameras. Each camera entry consists of five rows.
<f> <k1> <k2> row one
<R> rows two, three, and four
<t> row five
So, lines one and two give you header information. Then each group of five rows is a separate camera entry, starting with camera number zero. If the rows contain only zeros, then there is no data for that camera/image.
If the first two rows of bundle.out contain
#Bundle file v0.3
16 32675
There will be 16 cameras and 32675 points. The camera information will be on lines 3 through (16*5 + 2), i.e. lines 3 through 82. In vi or emacs you can display line numbers to help you examine the file. (In vi, :set number) Remember that the rotation matrix is three lines of three numbers and the translation 3-vector is the fifth and last line of a camera definition.
The points follow the camera definitions. You can find information about the format of points at the link I provided above.
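To make the layout above concrete, here is a minimal Python sketch that reads the header and the five-row camera entries from the text of a bundle.out file. The function name, the dictionary keys, and the inline sample data are made up for illustration. The camera position is computed as -R^T * t, which is how I understand the Bundler manual to define it; please check that against the manual for your version.

```python
def parse_bundle_cameras(text):
    """Parse the header and camera entries of a bundle.out file (v0.3 layout)."""
    # Skip blank lines and the '# Bundle file v0.3' comment line.
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    num_cameras, num_points = (int(v) for v in lines[0].split())
    cameras = []
    for i in range(num_cameras):
        # Each camera is five rows: <f k1 k2>, three rows of R, one row of t.
        rows = [[float(v) for v in ln.split()] for ln in lines[1 + 5 * i : 6 + 5 * i]]
        f, k1, k2 = rows[0]
        R = rows[1:4]
        t = rows[4]
        # Camera centre in world coordinates: -R^T * t (assumed from the manual).
        position = [-sum(R[r][c] * t[r] for r in range(3)) for c in range(3)]
        # All-zero rows mean the camera was not reconstructed.
        has_data = any(v != 0 for row in rows for v in row)
        cameras.append({"f": f, "k": (k1, k2), "R": R, "t": t,
                        "position": position, "has_data": has_data})
    return num_points, cameras


# Hypothetical two-camera file: one identity-rotation camera, one empty entry.
sample = """# Bundle file v0.3
2 0
500.0 0.0 0.0
1 0 0
0 1 0
0 0 1
1 2 3
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
"""
num_points, cameras = parse_bundle_cameras(sample)
print(cameras[0]["position"], cameras[1]["has_data"])
```

For a real file you would pass in open("bundle.out").read() instead of the sample string.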
I want to make a JPEG where, for each of the 3 components (Y, Cb, Cr), you encode an 8x8 block one after another, and then move to the next 8x8 block in the image.
For example:
A 16x16 image exists.
Write the header. (Is there anything special I need to mark? I opened a known JPEG to confirm I was writing the quantization tables and Huffman tables correctly; is there anything special I need to do to make this format work? Also, I DON'T want to subsample. I want a 1:1 ratio. From my understanding this means I encode 8x8 pixels into an 8x8 block to process through the steps that I am about to name, correct? How do I mark that in the header? With 0x11?)
Steps:
Grab the first 8x8 (top left) of this image.
For Y: DCTII -> quant -> RLE -> Huffman Encode
then, for Cb: DCTII -> quant -> RLE -> Huffman Encode
then, for Cr: DCTII -> quant -> RLE -> Huffman Encode
repeat for top right -> bottom left -> bottom right 8x8 pixel block in image
write end of image tag, done.
In the data stream it should go: DC-Y -> AC-Y -> DC-Cb -> AC-Cb -> DC-Cr -> AC-Cr, and so forth yes? Is there any tag I need to insert between components, between DC/AC changes, or between 8x8 pixel blocks? I assume between components a EOB Huffman code is present (that's what I have currently).
Negative numbers:
What format are they? 2's comp? -3 for example would be 101 in 2's comp (3 bit size), but in JPEG you would call this 2 bit size and only encode the 01 portion not the "sign" or the MSB bit right? 3 would be 011 in 2's comp 3 bit, but by the same logic its just 11 (2 bit size) and encoded without sign (MSB) in JPEG right? Anything I am missing?
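For comparison, here is a small sketch of how I understand baseline JPEG (ITU-T T.81, section F.1.2.1) to code the value bits: the size is the bit length of the absolute value, positive values are written as-is, and a negative value is written as the low `size` bits of (value - 1). The function names are made up for illustration.

```python
def bit_size(value):
    """JPEG size category: number of bits needed for abs(value)."""
    return abs(value).bit_length()


def amplitude_bits(value):
    """Bits appended after the Huffman code, returned as a '0'/'1' string."""
    size = bit_size(value)
    if size == 0:
        return ""  # a zero value carries no extra bits
    if value < 0:
        # Low `size` bits of (value - 1), i.e. value + 2**size - 1.
        value += (1 << size) - 1
    return format(value, "0{}b".format(size))


for v in (3, -3, 26, -26):
    print(v, bit_size(v), amplitude_bits(v))
```

Under this rule -3 comes out as 00 and -26 as 00101, which is not the same as dropping the sign bit of the two's complement form, so that may be worth checking against the encoder.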
DC vals:
3 components mean you keep track of 3 different previous DC vals right? For example Y-DC-prev is initialized to 0. Then the first Y-DC val is let's say 25. 25-0 = 25, we encode 25. We then remember 25 for the Y components next DC (not the Cb or Cr component right? They have their own "memories"?) Then DC-Y is lets say 40. Diff = 40-25 = 15, encode 15. remember 40 (not 15 right?). And so forth?
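The bookkeeping described above, with one predictor per component, can be sketched like this. The function name and the Cb/Cr numbers are hypothetical; the Y values are the 25 and 40 from the example.

```python
def dc_differences(blocks):
    """blocks: list of (y_dc, cb_dc, cr_dc) tuples, one per 8x8 block in scan order."""
    prev = {"Y": 0, "Cb": 0, "Cr": 0}  # each component has its own memory
    out = []
    for dcs in blocks:
        diffs = []
        for name, dc in zip(("Y", "Cb", "Cr"), dcs):
            diffs.append(dc - prev[name])
            prev[name] = dc  # remember the actual DC value, not the difference
        out.append(tuple(diffs))
    return out


diffs = dc_differences([(25, 10, -5), (40, 10, -7)])
print(diffs)
```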
I followed the example here: WIKI. My code can get the exact values all the way down to RLE, which makes me think my Huffman encoding might have the bug. I made a 16x16 image that basically repeats the image on Wikipedia in a 2x2 tile. (This also makes the image non-greyscale, since I force Cb and Cr to have the same value as Y; I know the image should have a funky tint because of this, no worries.) I end up getting a semi-believable value for the top right block, then the rest turn into garbage. This led me to believe it's my file organization or Huffman encoding that is going wrong. To do a quick check (this is from the Wikipedia example):
FORMAT: (RUNLENGTH, SIZE)(VALUE)
(0, 2)(-3);
(1, 2)(-3);
(0, 1)(-2);
(0, 2)(-6);
(0, 1)(2);
(0, 1)(-4);
(0, 1)(1);
(0, 2)(-3);
(0, 1)(1);
(0, 1)(1);
(0, 2)(5);
(0, 1)(1);
(0, 1)(2);
(0, 1)(-1);
(0, 1)(1);
(0, 1)(-1);
(0, 1)(2);
(5, 1)(-1);
(0, 1)(-1);
(0, 0);
standard Huffman AC-Y table in the spec: TABLE-PAGE154 says 0/2 is code 01. We know that -3 is 01 in 2's comp. So we append 0101 to the stream and then get to the next entry. 1/2 is 11011 from the table, -3 is still 01. So we append 1101101 to the stream and keep going.... all the way to the end where we see a 0x0 which is just 1010. Then we rinse and repeat for the 2 other components, then we rinse and repeat for the rest of the 8x8 pixel blocks in the image yes? The DC val was -26 which is 00110 (size 5) in 2's comp w/o MSB / sign. size 5 for DC-Y codes to 110 according to the Huffman table in the spec (page 153). This means the bit stream should start:
110_00110_01_01_11011_01_...
Obviously the _ are just for readability, I don't add those to the actual file.
This is the image I am getting so far, for the curious: incorrect image. I hard-coded the 8x8 blocks to always match the ones from Wikipedia, so we should see a tiled form of the image; it should be off-color due to the 2 new chroma components (given the same exact values as Y).
I've been working on this for days, any help is much appreciated!!
I have a Keyence Line Laser System LJ-X 8000, that I use to scan the surface of different objects.
The controller saves the height information as a bitmap, with each pixel representing one height value. After a lot of tinkering, I found out that Keyence is not using the actual colors, but rather uses the 24-bit RGB triplets as some form of binary storage. However, no combination of these bytes seems to work for me. Are there any common storage methods for 24-bit integers?
To decode those values, I did a scan covering the whole measurement range of the scanner, including some out of range values in the beginning and the end. If you look at the distribution of the values of each color plane, you can see, that the first and third plane actually only use values up to 8/16 which means only 3/4 Bits. This is also visible in the image itself, as it mainly shows a green color.
I concluded that Keyence uses the full byte of the green color plane, 3 Bits of the first and 4 Bits of the last plane to store the height information. Keyence seems to have chosen some weird 15 Bit Integer Format to store their data.
With a little bit-shifting and knowing that the scanner has a valid range from [-2.2, 2.2], I was able to build the following simple little (Matlab-) script to calculate the height information for each pixel:
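% Combine the colour planes into one 15-bit value (G: top 8 bits, R: 3 bits, B: low 4 bits)
scanBinVal = bitshift(uint16(scanIm(:,:,2)), 7) ...
    + bitshift(uint16(scanIm(:,:,1)), 4) ...
    + uint16(scanIm(:,:,3));
% Map the raw range [0, 2^15] linearly onto the scanner range [-2.2, 2.2]
scanBinValScaled = interp1([0,2^15], [-2.2, 2.2], double(scanBinVal));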
Keyence offers a software to convert those .bmp into .csv-files, but without an API to automate the process. As I will have to deal with a lot of these files I needed to automate this process.
The calculated values from the rgb triplets are actually even more precise than the exported csv, as the csv only shows 4 digits after the decimal point.
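For anyone not working in Matlab, the same decoding for a single pixel might look like this in plain Python. This is only a sketch of the script above: the function name is made up, and it assumes the same bit layout (green shifted by 7, red by 4, blue unshifted) and the same [-2.2, 2.2] range.

```python
def height_from_rgb(r, g, b, lo=-2.2, hi=2.2):
    """Decode one pixel, assuming G holds the high 8 bits, R 3 bits, B the low 4 bits."""
    raw = (g << 7) + (r << 4) + b  # 15-bit raw value, as in the Matlab script
    # Linear map of [0, 2**15] onto [lo, hi], like interp1 in the script.
    return lo + (hi - lo) * raw / 2 ** 15


print(height_from_rgb(0, 0, 0))    # bottom of the range
print(height_from_rgb(0, 128, 0))  # middle of the range
```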
I don't understand:
If we consider that the value 00001111 (15) is one byte, then an RGB pixel (220,180,155) is 3 bytes, whatever the values of the pixel.
So why, when I reduce the values of my pixels (with a bitshift operation or whatever), is the size of
my image not equal to the number of pixels x 3? When I say "number of pixels" I mean the number of pixels brighter than fully black.
How does the mechanism work? Is it counted in bits and then divided by eight as an average?
If I have a 3 MB picture and I do a bitshift (factor 2 on each of the 3 RGB channels), I get a 300 KB picture.
Don't tell me 90% of my pixels turned fully black.
Thanks.
If you shift all the pixel values right by 2 places, you will have around 1/4 as many shades of red as before, and around 1/4 as many greens and likewise for blues. That means overall you will have vastly fewer colours. That means your image may well have fewer than 256 colours which means it can be palettised. It also means it is likely to compress better because there will be more repetition of fewer unique sequences.
You can check if your image is palettised in several ways:
open it with PIL and check if image.mode contains a P
run exiftool on it and check if Color Type is Palette
run ImageMagick on it with magick identify -verbose YOURIMAGE
You can count the number of unique colours in your image with ImageMagick using:
magick identify -format %k YOURIMAGE
Or you can do it in Python with the last part (entitled "Update") of this answer.
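To illustrate the effect on the number of colours without any image library, here is a small plain-Python sketch on a made-up grey ramp (the pixel list and function names are illustrative only, not taken from a real image):

```python
def count_unique_colours(pixels):
    """pixels: iterable of (r, g, b) tuples."""
    return len(set(pixels))


def shift_right(pixels, places):
    """Shift every channel of every pixel right by `places` bits."""
    return [(r >> places, g >> places, b >> places) for r, g, b in pixels]


# Hypothetical test data: a 256-step grey ramp.
pixels = [(v, v, v) for v in range(256)]
before = count_unique_colours(pixels)
after = count_unique_colours(shift_right(pixels, 2))
print(before, after)
```

The pixel count is unchanged, but the shifted image has a quarter as many distinct colours, which is what lets the file format palettise and compress it.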
I am trying to write a sequence of color images to a dicom file in Matlab. Each image is of type uint16. The sequence is stored in a 4D matrix named output of size 200x360x3x360 (num of rows x num of cols x num of channels x num of images). When I execute dicomwrite(output,'outputfile.dcm'), it gives the following error:
It says data bit depth is 8 but I've ensured that each image is 16-bit. Not sure what's going wrong.
The documentation for dicomwrite says it can write color images as well. In fact dicomread can read color dicom images such that the size of the matrix which stores the read data is 200x360x3x360. So I guess it should be possible to write color images as well using dicomwrite. Any help in this regard is appreciated. There is a related post but it doesn't talk about color image sequence.
The comment by JohnnyQ is correct.
From this page down in section A.8.5.4, they list the Multi-frame True Color SC Image IOD Content Constraints (partial list quote):
In the Image Pixel Module, the following constraints apply:
Samples per Pixel (0028,0002) shall be 3
Bits Allocated (0028,0100) shall be 8
Bits Stored (0028,0101) shall be 8
High Bit (0028,0102) shall be 7
Pixel Representation (0028,0103) shall be 0
It seems MATLAB will not do the conversion for you, so you should down-convert each 16-bit color channel to 8-bit for DICOM.
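The down-conversion itself is just a rescale from 0..65535 to 0..255. A minimal sketch of the arithmetic, in Python for illustration (the function names are made up, and whether you want rounding or a plain shift by 8 bits depends on your data):

```python
def to_8bit(sample16):
    """Scale a 16-bit sample (0..65535) to 8 bits (0..255), rounding to nearest."""
    return (sample16 * 255 + 32767) // 65535


def frame_to_8bit(frame):
    """frame: nested lists [row][col][channel] of 16-bit samples."""
    return [[[to_8bit(s) for s in pixel] for pixel in row] for row in frame]


frame = [[[0, 32768, 65535]]]  # hypothetical 1x1 RGB frame
print(frame_to_8bit(frame))
```

The same scaling would be applied to every frame of the 4D matrix before calling dicomwrite.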
I'm trying to write a TIFF image with RMagick that tesseract can process. Tesseract objects if bits per pixel is > 32 or samples per pixel is other than 1, 3 or 4.
With the defaults, Image.write generates 3 (RGB) samples plus 1 alpha channel at 16-bits per sample for a total of 64 bits per pixel, violating the first constraint.
If I set the colorspace to GRAYColorspace as follows, it still outputs the alpha channel, giving two samples per pixel, violating the second constraint.
Image.write('image.tif') {self.colorspace = GRAYColorspace}
Per the RMagick documentation, the alpha channel is ignored on method operations unless specified, but even if I do self.channel(GREYChannel), the alpha channel is still output.
I know I can run convert on the file afterwards, but I'd like to find a solution that avoids that.
Here is the tiffinfo output for the file currently generated:
TIFF Directory at offset 0x9c48 (40008)
Image Width: 100 Image Length: 100
Bits/Sample: 16
Compression Scheme: None
Photometric Interpretation: min-is-black
Extra Samples: 1<unassoc-alpha>
FillOrder: msb-to-lsb
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 2
Rows/Strip: 20
Planar Configuration: single image plane
Page Number: 0-1
DocumentName: image-gray-colorspace.tif
White Point: 0.3127-0.329
PrimaryChromaticities: 0.640000,0.330000,0.300000,0.600000,0.150000,0.060000