I have a series of OBJ files produced by photogrammetry by coworkers who specialize in GIS (Geographic Information Systems) data. The first few vertices in the files look something like:
v 445077.679 4460688.700 61.371
v 445077.340 4460686.317 61.367
v 445077.296 4460686.024 61.416
I believe the files are valid, because when I open them in the online viewer at http://masc.cs.gmu.edu/wiki/ObjViewer I see what I expect.
When I open the same file in Blender, Unity or Unreal Engine, the object is very far from the world origin. I can center it by moving the origin to the center of mass and then resetting the object location, but whenever I recenter the object the mesh comes out badly distorted, with long stretched triangles.
What am I doing wrong, or what could be wrong with my file?
The reason for the problem with these files is the large offset combined with 32-bit float values. In this case the objects all use the same geographic origin, probably at a lat/long of 0.000N/0.000E.
Nearly all 3D graphics programs use 32-bit floating-point values to store each point's location, and the combination of the large offset and the 32-bit representation causes precision to be lost. A 32-bit float carries about 7 significant decimal digits, so an offset like 4460688 in the example file consumes nearly all of them, effectively cutting the model from millimetre resolution to something closer to metre resolution. The long, stretched triangles appear because more precision is lost along the axis with the larger offset.
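You can see the precision loss directly; a quick check with numpy (assuming numpy is available) shows that coordinates this large land on a roughly half-metre grid when stored as 32-bit floats:
import numpy as np

# two northing values from the file that differ by a few millimetres
a = np.float32(4460688.700)
b = np.float32(4460688.703)
print(a == b)         # True: the 3 mm difference is below float32 resolution here
print(np.spacing(a))  # 0.5: the gap between adjacent float32 values near 4.46e6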
The solution is to apply some offset to bring the objects close to the origin BEFORE importing them with the 3D software.
I wrote a quick python script that can help with this: https://gitlab.umich.edu/lsa-ts-rsp/xr-shiftOBJ/-/blob/main/shiftOBJ.py
import re  # regex

def shiftFile(inFileName, outFileName, offset):
    with open(inFileName) as myInFile:
        with open(outFileName, 'w') as myOutFile:
            for line in myInFile:
                myOutFile.write(shiftLine(line, offset))

def shiftLine(inLine, offset):
    # if a line is a vertex then apply the shift and drop vertex colors
    lineRegex = re.compile(r'v (\d+\.\d+) (\d+\.\d+) (\d+\.\d+)')
    m = lineRegex.match(inLine)
    if m and len(m.groups()) >= 3:
        outLine = ('v '
                   + "{:.3f}".format(float(m.groups()[0]) + offset[0]) + ' '
                   + "{:.3f}".format(float(m.groups()[1]) + offset[1]) + ' '
                   + "{:.3f}".format(float(m.groups()[2]) + offset[2]) + '\n')
        return outLine
    else:
        return inLine

if __name__ == '__main__':
    inFile = '/Users/crstock/Documents/Unreal Projects/Olynthos Data/B88DW18.obj'
    outFile = '/Users/crstock/Documents/Unreal Projects/Olynthos Data/B88DW18_shifted.obj'
    offset = [-445070, -4460680, -59.0]
    shiftFile(inFile, outFile, offset)
This applies an offset to all vertex lines and leaves the other lines alone. By using the same offset values for multiple input files you can maintain the relative shift so that related objects fit together appropriately.
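For example, to run a whole folder of related tiles through the same shift with the shiftFile function above (a sketch; the folder path and naming convention are hypothetical):
import glob, os

offset = [-445070, -4460680, -59.0]  # one shared offset keeps the tiles aligned
for inFile in glob.glob('/path/to/tiles/*.obj'):
    root, ext = os.path.splitext(inFile)
    shiftFile(inFile, root + '_shifted' + ext, offset)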
Related
I have created two MBTiles files via QGIS: 1) one covers zooms 0 to 10 and is a map of the whole world, 2) the other covers zooms 0 to 17 and is a detailed map of one country.
I would like to merge the two MBTiles files, with the detailed country overlapping the whole-world map, and have the merged result cover zooms 0 to 17 (the whole world would disappear at zoom 10, but the country would remain until zoom 17).
What program/method should I use? Is it possible to merge them via QGIS?
I use Python to merge MBTiles files. Be sure to update the metadata table, noting the min and max zoom; they are just SQLite databases with a unique extension.
This example does not include data validation, and I did not test it -- it is stripped down from a script I use to batch process output from QGIS.
It is less problematic to use an IDE other than QGIS's Python console; this does not require anything specific to QGIS or PyQGIS.
import sqlite3 as sqlite

def processOneSource(srcDB, dstDB):
    # create_index_sql = "CREATE UNIQUE INDEX tile_index on tiles (zoom_level, tile_column, tile_row);"
    # dstDB.execute(create_index_sql)
    # the index forces an error if there is already a tile for the same z/x/y
    sqlite_insert_blob_query = """INSERT INTO tiles (zoom_level, tile_column, tile_row, tile_data) VALUES (?, ?, ?, ?)"""
    tiles = srcDB.execute('select zoom_level, tile_column, tile_row, tile_data from tiles;')
    for t in tiles:
        z = t[0]
        x = t[1]
        y = t[2]
        data = t[3]
        # example of how you might include/exclude tiles
        if not (z == 12 or z == 13 or z == 14 or z == 15 or z == 16):
            continue
        print(str((z, x, y)))
        data_tuple = (z, x, y, data)
        try:
            dstDB.execute(sqlite_insert_blob_query, data_tuple)
        except Exception as e:
            print(e)
    dstDB.commit()

if __name__ == '__main__':
    srcDB = sqlite.connect("path_to_yourfilename")
    dstDB = sqlite.connect("path_to_yourfilename")
    processOneSource(srcDB, dstDB)
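As noted above, the metadata table should also be updated so the merged file advertises the new zoom range. A minimal sketch, assuming the standard MBTiles name/value metadata layout and reusing the dstDB connection from the script above:
def updateZoomMetadata(dstDB, minzoom, maxzoom):
    # MBTiles metadata is a simple name/value table; overwrite the zoom entries
    dstDB.execute("UPDATE metadata SET value = ? WHERE name = 'minzoom'", (str(minzoom),))
    dstDB.execute("UPDATE metadata SET value = ? WHERE name = 'maxzoom'", (str(maxzoom),))
    dstDB.commit()

updateZoomMetadata(dstDB, 0, 17)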
You can also use tile-join (part of tippecanoe); it has a bunch of flags, so you can customize the output.
From a Monte-Carlo simulation I have a range of files, say file_1.mat, file_2.mat, ..., file_n.mat, where n is large. Each file contains one or several (maximum 3, if it matters) large 1D time series of interest, say var1, var2, var3.
I am now as always interested in finding the mean value of these variables. My question is now, how do I do this in the most efficient way? The keyword here is efficiency. Below you will find the MWE which is done the standard way, but this is quite time consuming as the files are large and there are many.
I am programming in Matlab; however, ideas presented in pseudocode are also very welcome.
MWE (the standard way):
meanVar1 = zeros(1,1e6); %I do not remember the exact size, just use 1e6
meanVar2 = zeros(1,1e6);
meanVar3 = zeros(1,1e6);
for i = 1:n
    load(strcat('file_',int2str(i)),'var1','var2','var3')
    meanVar1 = meanVar1 + var1;
    meanVar2 = meanVar2 + var2;
    meanVar3 = meanVar3 + var3;
end
meanVar1 = meanVar1/n;
meanVar2 = meanVar2/n;
meanVar3 = meanVar3/n;
I have downloaded a large set of GridFloat (.flt, .hdr) DEM files from USGS NED (1") in order to implement my own elevation service on my website. I would like to be able to look up an elevation from this fileset, given latitude and longitude as inputs. I use Perl for my website development. The files have a conventional naming scheme, and I am able to get the appropriate tile filename from the lat/lng. However, accessing the internals of the file is where I'm having an issue.
I know the file is in a fairly straightforward format (.flt, apparently called "Gridfloat"), but I could use some help figuring out the magic numbers for calculating where in the file I need to seek to for a given lat/lng, and how to handle byte order and so on so that I end up with an elevation. From what I understand, apparently row ordering can be an issue, as well as byte ordering. I am looking for a recipe that does not involve use of any third party libraries such as GDAL, which I think are overly complicated and slow for what I want to do. I think it should be possible to just open the file, seek to a position based on some calculation, read some bytes and then unpack them into the correct byte order. Here is an example .hdr file that accompanies floatn48w097_1.flt, I think it has the necessary info. There are a bunch of other files that come with the .zip, including .prj, but I believe those are for a commercial program like ArcInfo. I think everything I need should be in the following .hdr file.
ncols 3612
nrows 3612
xllcorner -97.00166666667
yllcorner 46.99833333333
cellsize 0.000277777777778
NODATA_value -9999
byteorder LSBFIRST
What I'm really hoping for is a formula for calculating the row and column from the lat/lng, then another formula for translating the row/column into a position for seek, how many bytes to read, and how to convert those raw bytes into an integer (or whatever it is these files contain). I feel that this could be a very fast operation, without all the overhead involved with the larger libraries which seem to be focused on doing a lot of stuff that I don't need.
I don't need Perl code, just pseudocode showing the calculations for row/col offsets etc. would be more than enough. I believe the files are binary format, a straightforward grid of 4-byte numbers. The example file that goes with the .hdr above has a size of 52186176, and when you multiply ncols by nrows (from the .hdr) you get 13046544, which times 4 is exactly the file size. So I assume it's just a matter of getting the right formula for row/col based on lat/lng, and then getting the bytes swizzled into the right order. I've just not done this much.
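In outline, the arithmetic described above works out to something like the following sketch (Python rather than Perl, purely for illustration; the function name and path are made up, the grid constants come from the .hdr shown earlier, and the .flt itself is just ncols x nrows 4-byte floats with no embedded header):
import struct

def elevation_from_flt(flt_path, lat, lng,
                       ncols=3612, nrows=3612,
                       xll=-97.00166666667, yll=46.99833333333,
                       cellsize=0.000277777777778, nodata=-9999):
    # column measured from the west (left) edge of the tile
    col = int((lng - xll) / cellsize)
    # row 0 is the NORTH edge, so measure down from the top of the grid
    ytop = yll + nrows * cellsize
    row = int((ytop - lat) / cellsize)
    offset = (row * ncols + col) * 4            # 4 bytes per cell, row-major order
    with open(flt_path, 'rb') as f:
        f.seek(offset)
        (value,) = struct.unpack('<f', f.read(4))   # '<f': little-endian (LSBFIRST) float32
    return None if value == nodata else value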
I found some reference to the GridFloat format here: coolutils.com/formats/flt, so apparently the file is just a flat grid of floating-point values (given the 4-byte cell size worked out above, presumably 32-bit floats).
Thanks!
Ok, I think I have an answer. The following is a Perl routine which seems to give back reasonable-looking elevation values when tested with the USGS NED1 .flt files. The script takes latitude and longitude as command line arguments, looks up the file and indexes into the grid.
#!/usr/bin/perl
use strict;
use POSIX;
use Math::Round;
sub get_elevation
{
    my ($lat, $lng) = @_;

    my $lat_degree = ceil ($lat);
    my $lng_degree = floor ($lng);
    my $lat_letter = ($lat >= 0) ? 'n' : 's';
    my $lng_letter = ($lng >= 0) ? 'e' : 'w';
    my $lng_tilenum = abs($lng_degree);
    my $lat_tilenum = abs($lat_degree);
    my $tilename = $lat_letter . sprintf('%02d', $lat_tilenum) . $lng_letter . sprintf('%03d', $lng_tilenum);
    my $path = "/data/elevation/ned1/$tilename/float${tilename}_1.flt";
    print "path = $path\n";
    die "No such file" if (!-e($path));

    my ($lat_fraction, $lat_integral) = modf (abs($lat));
    my $row = floor ((1 - $lat_fraction) * 3600);
    my ($lng_fraction, $lng_integral) = modf (abs($lng));
    my $col = floor ((1 - $lng_fraction) * 3600);

    open(FILE, "<$path");
    my $pos = (3612 * 4 * 6) + (3612 * 4 * $row) + (4 * 6) + ($col * 4);
    seek (FILE, $pos, SEEK_SET);
    my $buffer;
    read (FILE, $buffer, 4);
    close (FILE);

    my ($elevation) = unpack('f', $buffer);
    if ($elevation == -9999)
    {
        return 'undefined';
    }
    return $elevation;
}
my $lat = $ARGV[0];
my $lng = $ARGV[1];
my $elevation = get_elevation ($lat, $lng);
print "Elevation for ($lat, $lng) = $elevation meters (", $elevation * 3.28084, " feet)\n";
Hope this might be useful to anyone else trying to do the same kind of thing... I've tested this method now and it seems to produce good looking elevation profiles which are smoother than those from the 3" SRTM data.
Neil put me on the right track, but I think there are a few problems with his original answer. I've added some fixes and improvements, including on-the-fly download of the needed tile from the 1/3 arc-second (10 meter) dataset, proper parsing of the header file, and what I believe is corrected indexing.
This is still mostly illustrative and should be improved before production use, particularly, hanging on to the header information and the file handle for repeated queries.
https://gist.github.com/biomiker/32fe34e1fa1bb49ae1135ab6652f596d
I have a NetCDF file which contains data representing total precipitation across the globe over several months (so it's stored in a three-dimensional array). I first checked that the data, and the way it is structured, are sensible, in both XConv and ncdump. All looks sensible - values vary from very small (~10^-10 - this makes sense, as this is model data, and effectively represents zero) to about 5x10^-3.
The problems start when I try to handle this data in IDL or MatLab. The arrays generated in these programs are full of huge negative numbers such as -4x10^4, with occasional huge positive numbers, such as 5000. Strangely, looking at a plot of the data in MatLab with respect to latitude and longitude (at a specific time), the pattern of rainfall looks sensible, but the values are just completely wrong.
In IDL, I'm reading the file in to write it to a text file so it can be handled by some software that takes very basic text files. Here's the code I'm using:
PRO nao_heaps

  address = '/Users/levyadmin/Downloads/'
  file_base = 'output'
  ncid = ncdf_open(address + file_base + '.nc')

  MONTHS = ['january','february','march','april','may','june','july','august','september','october','november','december']

  varid_field = ncdf_varid(ncid, "tp")
  varid_lon   = ncdf_varid(ncid, "longitude")
  varid_lat   = ncdf_varid(ncid, "latitude")
  varid_time  = ncdf_varid(ncid, "time")

  ncdf_varget, ncid, varid_field, total_precip
  ncdf_varget, ncid, varid_lat, lats
  ncdf_varget, ncid, varid_lon, lons
  ncdf_varget, ncid, varid_time, time

  ncdf_close, ncid

  lats = reform(lats)
  lons = reform(lons)
  time = reform(time)
  total_precip = reform(total_precip)

  total_precip = total_precip*1000. ;put in mm

  noLats   = (size(lats))(1)
  noLons   = (size(lons))(1)
  noMonths = (size(time))(1)

  ; the data may not be an integer number of years (otherwise we could make this next loop cleaner)
  av_precip = fltarr(noLons, noLats, 12)
  for month=0, 11 do begin
    year = 0
    while ( (year*12) + month lt noMonths ) do begin
      av_precip(*,*,month) = av_precip(*,*,month) + total_precip(*,*, (year*12)+month )
      year++
    endwhile
    av_precip(*,*,month) = av_precip(*,*,month)/year
  endfor

  fname = address + file_base + '.dat'
  OPENW, 1, fname
  PRINTF, 1, 'longitude'
  PRINTF, 1, lons
  PRINTF, 1, 'latitude'
  PRINTF, 1, lats
  for month=0, 11 do begin
    PRINTF, 1, MONTHS(month)
    PRINTF, 1, av_precip(*,*,month)
  endfor
  CLOSE, 1

END
Anyone have any ideas why I'm getting such strange values in MatLab and IDL?!
AH! Found the answer. NetCDF files use an offset and a scale factor for the data, to keep the size of the file to a minimum. To get the correct values, I simply need to:
total_precip = offset + (scale_factor * total_precip) ;put into correct range
At present I'm getting the scale factor and offset from ncdump and hard-coding them into my IDL program, but does anyone know how I can get them dynamically in my IDL code?
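(For reference, here is what the same unpacking looks like outside IDL, as a sketch with the Python netCDF4 module; the variable name comes from the question, and the attribute names scale_factor/add_offset are the usual NetCDF convention. Note that netCDF4 applies them automatically by default, which is disabled below just to show the formula explicitly.)
from netCDF4 import Dataset   # assumes the netCDF4 package is installed

nc = Dataset('output.nc')
tp = nc.variables['tp']
tp.set_auto_maskandscale(False)              # read the raw packed values
packed = tp[:]
scale_factor = tp.getncattr('scale_factor')  # attributes stored on the variable
add_offset = tp.getncattr('add_offset')
total_precip = add_offset + scale_factor * packed   # same formula as above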
The problem in general:
I have a big 2D point space, sparsely populated with dots. Think of it as a big white canvas sprinkled with black dots. I have to iterate over and search through these dots a lot. The canvas (point space) can be huge, bordering on the limits of int, and its size is unknown before setting points in there.
That brought me to the idea of hashing:
Ideal:
I need a hash function taking a 2D point and returning a unique uint32, so that no collisions can occur. You can assume that the number of dots on the canvas is easily countable by uint32.
IMPORTANT: It is impossible to know the size of the canvas beforehand (it may even change), so things like
canvaswidth * y + x
are sadly out of the question. I also tried a very naive
abs(x) + abs(y)
but that produces too many collisions.
Compromise:
A hash function that provides keys with a very low probability of collision.
Cantor's enumeration of pairs
n = ((x + y)*(x + y + 1)/2) + y
might be interesting, as it's closest to your original canvaswidth * y + x but will work for any x or y. But for a real-world int32 hash, rather than a mapping of pairs of integers to integers, you're probably better off with a bit manipulation such as Bob Jenkins' mix, calling it with x, y and a salt.
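A sketch of the Cantor pairing in Python (names are illustrative; fold the result modulo 2^32 if you need a uint32 key, at which point collisions become possible for large coordinates):
def cantor_pair(x, y):
    # bijective on non-negative integers; grows roughly as (x + y)^2 / 2
    return (x + y) * (x + y + 1) // 2 + y

def cantor_hash32(x, y):
    return cantor_pair(x, y) & 0xFFFFFFFF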
A hash function that is GUARANTEED collision-free is not a hash function :)
Instead of using a hash function, you could consider using binary space partition trees (BSPs) or XY-trees (closely related).
If you want to hash two uint32's into one uint32, do not use things like Y & 0xFFFF because that discards half of the bits. Do something like
(x * 0x1f1f1f1f) ^ y
(you need to transform one of the variables first to make sure the hash function is not commutative)
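In 32-bit arithmetic that might look like this sketch (the mask emulates the unsigned overflow you would get in C; the function name is made up):
def hash_xy(x, y):
    # multiply one coordinate first so the hash is not commutative, then xor
    return ((x * 0x1f1f1f1f) & 0xFFFFFFFF) ^ (y & 0xFFFFFFFF)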
Like Emil's answer, but this handles 16-bit overflows in x in a way that produces fewer collisions, and takes fewer instructions to compute:
hash = ( y << 16 ) ^ x;
You can recursively divide your XY plane into cells, then divide these cells into sub-cells, etc.
Gustavo Niemeyer invented his Geohash geocoding system in 2008.
Amazon's open-source Geo Library computes the hash for any longitude-latitude coordinate. The resulting Geohash value is a 63-bit number. The probability of collision depends on the hash's resolution: if two objects are closer than the intrinsic resolution, the calculated hash will be identical.
Read more:
https://en.wikipedia.org/wiki/Geohash
https://aws.amazon.com/fr/blogs/mobile/geo-library-for-amazon-dynamodb-part-1-table-structure/
https://github.com/awslabs/dynamodb-geo
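The core idea behind a Geohash is interleaving the bits of the two (quantized) coordinates so that nearby points tend to share a long common prefix; a minimal sketch of just that interleaving step (not the full Geohash base-32 encoding):
def interleave_bits(x, y, bits=32):
    # Z-order / Morton code: alternate the bits of x and y
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z   # a 2*bits-wide key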
Your "ideal" is impossible.
You want a mapping (x, y) -> i where x, y, and i are all 32-bit quantities, which is guaranteed not to generate duplicate values of i.
Here's why: suppose there were a function hash() so that hash(x, y) gives a different value for every pair (x, y). There are 2^32 (about 4 billion) possible values for x and 2^32 for y, so there are 2^64 (about 16 million trillion) possible pairs. But a 32-bit int can only hold 2^32 distinct values, so by the pigeonhole principle most pairs must share a hash value; the results simply can't all fit.
See also http://en.wikipedia.org/wiki/Counting_argument
Generally, you should always design your data structures to deal with collisions. (Unless your hashes are very long (at least 128 bit), very good (use cryptographic hash functions), and you're feeling lucky).
Perhaps?
hash = ((y & 0xFFFF) << 16) | (x & 0xFFFF);
This works as long as x and y can be stored as 16-bit integers. No idea how many collisions it causes for larger integers, though. One idea might be to still use this scheme but combine it with a compression step, such as first reducing x and y modulo 2^16.
If you can do a = ((y & 0xffff) << 16) | (x & 0xffff) then you could afterward apply a reversible 32-bit mix to a, such as Thomas Wang's
uint32_t hash( uint32_t a)
{
    a = (a ^ 61) ^ (a >> 16);
    a = a + (a << 3);
    a = a ^ (a >> 4);
    a = a * 0x27d4eb2d;
    a = a ^ (a >> 15);
    return a;
}
That way you get a random-looking result rather than high bits from one dimension and low bits from the other.
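Put together, pack-then-mix might look like this sketch (the masks emulate uint32 wraparound; the function names are made up):
M = 0xFFFFFFFF

def wang_mix(a):
    # Thomas Wang's 32-bit integer mix, as above
    a = ((a ^ 61) ^ (a >> 16)) & M
    a = (a + (a << 3)) & M
    a = (a ^ (a >> 4)) & M
    a = (a * 0x27d4eb2d) & M
    a = (a ^ (a >> 15)) & M
    return a

def hash_point(x, y):
    return wang_mix(((y & 0xFFFF) << 16) | (x & 0xFFFF))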
You can do
a >= b ? a * a + a + b : a + b * b
taken from here.
That works for points in positive plane. If your coordinates can be in negative axis too, then you will have to do:
A = a >= 0 ? 2 * a : -2 * a - 1;
B = b >= 0 ? 2 * b : -2 * b - 1;
A >= B ? A * A + A + B : A + B * B;
But to restrict the output to uint you will have to keep an upper bound on your inputs. And if so, it turns out that you do know the bounds. In other words, in programming it's impractical to write a function without having some idea of the integer types your inputs and output can be, and for every integer type there is definitely a lower and an upper bound.
public uint GetHashCode(whatever a, whatever b)
{
    if (a > ushort.MaxValue || b > ushort.MaxValue ||
        a < ushort.MinValue || b < ushort.MinValue)
    {
        throw new ArgumentOutOfRangeException();
    }

    return (uint)(a * short.MaxValue + b); //very good space/speed efficiency
    //or whatever your function is.
}
If you want the output to be strictly uint for an unknown range of inputs, then there will be a reasonable number of collisions depending on that range. What I would suggest is to have a function that can overflow, but unchecked. Emil's solution is great; in C#:
return unchecked((uint)((a & 0xffff) << 16 | (b & 0xffff)));
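The pairing at the top of this answer, with the fold for negative coordinates, might look like this in Python (a sketch; Python integers are unbounded, so mask with 0xFFFFFFFF at the end if you need a uint32 and can accept collisions):
def szudzik_pair(a, b):
    # fold negatives onto non-negative integers, then apply the pairing
    A = 2 * a if a >= 0 else -2 * a - 1
    B = 2 * b if b >= 0 else -2 * b - 1
    return A * A + A + B if A >= B else A + B * B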
See Mapping two integers to one, in a unique and deterministic way for a plethora of options.
Depending on your use case, it might be possible to use a quadtree and replace points with the string of branch names. This is actually a sparse representation for points and will need a custom quadtree structure that extends the canvas by adding branches when you add points off the canvas, but it avoids collisions and gives you benefits like quick nearest-neighbour searches.
If you're already using a language or platform where all objects (even primitive ones like integers) have built-in hash functions (Java platform languages like Java, .NET platform languages like C#, and others like Python, Ruby, etc.), you may use the built-in hash values as a building block and add your own "hashing flavor" into the mix. Like:
// C# code snippet
public class SomeVerySimplePoint {

    public int X;
    public int Y;

    public override int GetHashCode() {
        return ( Y.GetHashCode() << 16 ) ^ X.GetHashCode();
    }
}
It may also be handy to have test cases, such as a predefined million-point set, run against each candidate hash algorithm and compared on different aspects: computation time, memory required, key collision count, and edge cases (very large or very small values).
The Fibonacci hash works very well for integer pairs. The multiplier is 0x9E3779B9; for other word sizes w, use 1/phi = (sqrt(5)-1)/2 times 2^w, rounded to odd. The hash is
a1 + a2*multiplier
This will give very different values for close-together pairs. I do not know about the result with all pairs.
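A sketch of that multiply-and-add in 32-bit arithmetic (the mask stands in for uint32 wraparound; the function name is made up):
def fib_hash32(a1, a2):
    # 0x9E3779B9 is roughly 2^32 / phi, the 32-bit Fibonacci hashing multiplier
    return (a1 + a2 * 0x9E3779B9) & 0xFFFFFFFF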