Is there any standard CRS name for plain 2D data, aka a non-Earth projection? - coordinates

I'm interested in using various geo tools to work with spatial data that has no Earth projection: plain 2D data, like big game maps. But each time I struggle to configure a tool that way.
Hence the question: is there a standard projection name for this in GDAL, MariaDB, etc.? I've tried to find one, but with no luck. This seems strange, since it must be quite a common use case.

Related

Spatial geometry columns vs float/decimal columns for storing longitude/latitude

I need to store and give out long/lat coordinates and display them on Google Maps. I would need to store points, lines, and polygons, and then add metadata to them for generating info.
I'm currently looking into PostGIS, and it seems a fair bit to absorb. Now I'm wondering if I need to delve into it.
Is it advisable to use a spatial database for this purpose, or is using float/decimal columns for long/lat fine?
It is a lot to absorb. Storing as a float/decimal gets you nothing. Nothing at all. Spatial functions require spatial types. You gotta learn it. You don't have to learn all of it, but you have to learn it. And, it's not too hard to get started.
-- Example coordinates only; ST_MakePoint takes x (longitude) first, then y (latitude).
CREATE TABLE foo(id, geom)
AS
VALUES (1, ST_MakePoint(-122.4194, 37.7749)::geography);
Etc.
I highly suggest PostGIS in Action 2nd Edition
If the only thing you need is to store and give out coordinates, then float/decimal columns would be more than enough. However, if you are querying for spatial relations (whether a point is located within a polygon, whether polygons intersect, etc.), you'd better use either PostGIS or, for instance, the MySQL extensions for spatial data.
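To make the "spatial relations" point concrete, here is a minimal Python sketch of a point-in-polygon test, the kind of query a spatial database answers for you (and indexes). It is an illustration under stated assumptions, not how PostGIS or MySQL implement it: coordinates are treated as flat x/y, polygons have no holes, and points exactly on the boundary are not handled specially. The function name is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude) treated as planar


def point_in_polygon(point: Point, ring: List[Point]) -> bool:
    """Ray casting: count how many polygon edges a ray going right from the point crosses."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # an odd number of crossings means "inside"
    return inside


square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
print(point_in_polygon((5.0, 5.0), square))
print(point_in_polygon((15.0, 5.0), square))
```

With plain float columns you would run something like this in application code over every candidate row; a spatial type lets the database do it, with an index.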

GeoTools filters for shapefiles

I am looking at using GeoTools to read shapefiles. The tutorial for using it is straightforward showing how to set a filter to "Filter filter = Filter.INCLUDE;" to specify everything.
I want to split up the reading for performance purposes on very large shape files. In essence I want to split the reading of the info in the DBF file from the reading of the "THE_GEOM" data. We have a lot of our own filtering already built and it is easier to just use it and then retrieve the actual geometry as required.
How do I specify a filter to retrieve all the DBF info without the geometry?
How do I specify a filter to retrieve the geometry without the DBF info? This isn't as important since it probably won't impact performance so much but I would like to know.
Thanks.
By design, the GeoTools Shapefile DataStore goes to great lengths to keep the geometry and the attributes (the DBF stuff) together, so you are going to have to poke around in the internals to be able to do this. You could use a DbaseFileReader and a ShapefileReader to split the reading up.
I would consider porting your filters to GeoTools as it gives you the flexibility to switch data sources later when Shapefiles prove too small or too slow. It might be worth looking at the CQL and ECQL classes to help out in constructing them.
If you are really dealing with large Shapefiles (>2 GB), then going with a proper spatial database like PostGIS is almost certainly going to give better performance, and GeoTools will access the data in exactly the same way with exactly the same filters.
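To illustrate why the attributes can be read on their own: the .dbf part of a shapefile is a self-contained dBase table, so it can be parsed without opening the .shp at all. The following is a minimal Python sketch of that idea, not GeoTools code. It assumes the common dBase III layout, returns every value as a stripped string, and assumes Latin-1 text; `read_dbf_records` is a hypothetical helper name.

```python
import struct
from typing import BinaryIO, Dict, Iterator


def read_dbf_records(f: BinaryIO) -> Iterator[Dict[str, str]]:
    """Yield each non-deleted DBF record as a dict of stripped strings."""
    header = f.read(32)
    # Bytes 4-7: record count; 8-9: header length; 10-11: record length (little-endian).
    num_records, header_len, record_len = struct.unpack("<IHH", header[4:12])

    # Field descriptors are 32 bytes each; the list ends with a 0x0D byte.
    fields = []
    while True:
        first = f.read(1)
        if first in (b"\r", b""):
            break
        desc = first + f.read(31)
        name = desc[:11].split(b"\x00", 1)[0].decode("ascii")
        fields.append((name, desc[16]))  # byte 16 holds the field width

    f.seek(header_len)
    for _ in range(num_records):
        raw = f.read(record_len)
        if len(raw) < record_len:
            break  # truncated file
        if raw[:1] == b"*":
            continue  # record is flagged as deleted
        row, pos = {}, 1  # byte 0 of each record is the deletion flag
        for name, width in fields:
            # Assumption: Latin-1 text; real files may declare another encoding.
            row[name] = raw[pos:pos + width].decode("latin-1").strip()
            pos += width
        yield row
```

Usage would look like `with open("roads.dbf", "rb") as f: rows = list(read_dbf_records(f))`, where the file name is only an example. Record order in the .dbf matches feature order in the .shp, so a record's position can be used to fetch its geometry later.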

How do you generate a CAD geometry of randomly oriented objects?

How can one generate CAD geometries of randomly oriented and randomly sized objects (3D)? I need to model randomly sized and randomly oriented rectangles--thousands to millions of them.
I have not yet come across any CAD tools that have =rand() functions that can be inputted into dimensions. Is one way perhaps to have a CAD program import a CSV file of these randomly generated parameter values?
In SolidWorks, you can have model parameters (dimension lengths/angles, constraints, etc.) stored in an Excel spreadsheet called a Design Table. Each row in the spreadsheet will represent a different configuration of your model, and each column a different parameter. You can use Excel's built-in capabilities or an export-capable tool of your choosing to generate the configurations according to your desired distribution. I don't recall off the top of my head the easiest way to get a large number of instances with different configurations into the same assembly, but you haven't really told us what you're trying to accomplish so I can't give you specific recommendations anyways.
If you have a specific CAD tool then you can often find documentation on the internal file format. With a little experimentation you can sometimes write a small external program that will generate the header of the CAD file and then loop thousands or millions of times generating each individual object. Finally you generate the lines needed to complete the file. That can sometimes be easier than trying to force a tool to do something the designers never expected. And this might let you use the software of your choice to generate the file.
I would suggest starting small. Use the CAD tool to create a file with two or three of your rectangles. Save and inspect the contents of the file to see that it matches your understanding of the needed format. Then try externally creating what should be the same file and verify your version is correctly accepted.
You might consider that some tool designers never expected someone to want thousands or millions of anything. I would suggest sneaking up on the problem. Try doubling the number of items, check this works as expected and then repeat this process again and again until either you successfully get to millions or until you find the CAD tool won't be able to handle this.
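As a starting point for the "generate the parameters externally" route, here is a minimal Python sketch that writes randomly sized, placed, and oriented boxes to CSV. The column names, size range, and extent are assumptions for illustration; your CAD tool or design table will dictate the real layout. Note also that drawing three Euler angles uniformly does not give a uniform distribution over 3D orientations; if that matters, sample random unit quaternions instead.

```python
import csv
import io
import random
from typing import Iterator, List, TextIO

# Hypothetical column layout; adapt to whatever your CAD import expects.
COLUMNS = ["id", "length", "width", "height", "x", "y", "z",
           "yaw_deg", "pitch_deg", "roll_deg"]


def random_boxes(count: int, seed: int = 0,
                 min_size: float = 1.0, max_size: float = 10.0,
                 extent: float = 1000.0) -> Iterator[List[float]]:
    """Yield one row of parameters per box; seeded so runs are reproducible."""
    rng = random.Random(seed)
    for i in range(count):
        size = [round(rng.uniform(min_size, max_size), 3) for _ in range(3)]
        position = [round(rng.uniform(0.0, extent), 3) for _ in range(3)]
        angles = [round(rng.uniform(0.0, 360.0), 3) for _ in range(3)]
        yield [i] + size + position + angles


def write_boxes_csv(out: TextIO, count: int, seed: int = 0) -> None:
    """Write a header line followed by `count` rows of box parameters."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    writer.writerows(random_boxes(count, seed))


buf = io.StringIO()
write_boxes_csv(buf, count=3)
print(buf.getvalue())
```

Because the generator streams rows, the same code scales from 3 boxes to millions. That fits the doubling approach: raise `count` step by step and check that the CAD tool still accepts the file.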

Clustering structured (numeric) and text data simultaneously

Folks,
I have a bunch of documents (approx. 200k) that each have a title and an abstract. Other metadata is available for each document, for example category (exactly one of cooking, health, exercise, etc.) and genre (exactly one of humour, action, anger, etc.). The metadata is well structured, and all of this is available in a MySQL DB.
I need to show our user related documents while she is reading one of these documents on our site. I also need to give the product managers weights for title, abstract and metadata so they can experiment with this service.
I am planning to run clustering on top of this data, but am hampered by the fact that all Mahout clustering examples use either DenseVectors built from numbers or Lucene-based text vectorization.
The examples are either numeric data only or text data only. Has anyone solved this kind of problem before? I have been reading the Mahout in Action book and the Mahout wiki, without much success.
I can do this from the first principles - extract all titles and abstracts in to a DB, calculate TFIDF & LLR, treat each word as a dimension and go about this experiment with a lot of code writing. That seems like a longish way to the solution.
That, in a nutshell, is where I am trapped: am I doomed to first principles, or is there a tool or methodology that I somehow missed? I would love to hear from folks out there who have solved a similar problem.
Thanks in advance
You have a text similarity problem here and I think you're thinking about it correctly. Just follow any example concerning text. Is it really a lot of code? Once you count the words in the docs you're mostly done. Then feed it into whatever clusterer you want. Term extraction is not something you do with Mahout, though there are certainly libraries and tools that are good at it.
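To give a sense of how little code the first-principles route needs, here is a minimal Python sketch that combines text and structured fields into one tunable score. It is not Mahout and not a clusterer: it computes a weighted similarity between two documents, using one common smoothed TF-IDF variant, whitespace tokenization, and exact-match comparison for metadata. All function names, the sample documents, and the weights are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(texts: List[str]) -> List[Dict[str, float]]:
    """Turn each text into a sparse {term: weight} vector (smoothed TF-IDF)."""
    tokenized = [text.lower().split() for text in texts]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, tf in counts.items()
        })
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_similarity(i: int, j: int, title_vecs, abstract_vecs,
                        metadata: List[Dict[str, str]],
                        weights: Dict[str, float]) -> float:
    """Weighted sum of title, abstract, and metadata similarity for docs i and j."""
    meta_i, meta_j = metadata[i], metadata[j]
    # Fraction of metadata fields (category, genre, ...) with identical values.
    meta_sim = sum(meta_i[k] == meta_j[k] for k in meta_i) / len(meta_i)
    return (weights["title"] * cosine(title_vecs[i], title_vecs[j])
            + weights["abstract"] * cosine(abstract_vecs[i], abstract_vecs[j])
            + weights["meta"] * meta_sim)


# Made-up sample documents.
titles = ["quick pasta recipes", "pasta sauce recipes", "marathon training plan"]
abstracts = ["easy pasta dinner ideas", "simple sauce for pasta dinner",
             "running schedule for beginners"]
metadata = [{"category": "cooking", "genre": "humour"},
            {"category": "cooking", "genre": "humour"},
            {"category": "exercise", "genre": "action"}]
weights = {"title": 0.5, "abstract": 0.3, "meta": 0.2}  # the knobs to experiment with

title_vecs, abstract_vecs = tfidf_vectors(titles), tfidf_vectors(abstracts)
print(combined_similarity(0, 1, title_vecs, abstract_vecs, metadata, weights))
print(combined_similarity(0, 2, title_vecs, abstract_vecs, metadata, weights))
```

The same weights can instead be applied to the vectors before concatenating them (text terms plus one-hot metadata dimensions), which gives a single vector per document that any clusterer can consume.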
I'm actually working on something similar, but without the need for a distinction between numeric and text fields.
I have decided to go with the semanticvectors package which does all the part about tfidf, the semantic space vectors building, and the similarity search. It uses a lucene index.
Please note that you can also use the s-space package if semanticvectors doesn't suit you (if you go down that road of course).
The only caveat I'm facing with this approach is that the indexing part can't be iterative. I have to index everything every time a new document is added, or an old document is modified. People using semanticvectors say they have very good indexing times. But I don't know how large their corpora are. I'm going to test these issues with the wikipedia dump to see how fast it can be.

mongodb: inserting and querying geometries and WMS

I am discovering MongoDB. It looks nice, but I am still wondering if it can meet my needs.
We have 16 million points and we want to intersect part of them with polygons to get statistics (how many points fall in each polygon).
The basic geometries would be degree cells (1 degree, 0.5 degree...) covering the whole world. In that case the $within function would work, right?
But I wonder: how do I insert these geometries (coming from a shapefile) into MongoDB? Until now I have been using PostgreSQL/PostGIS, and for that I have a lot of tools, but for MongoDB... I am also wondering whether more complex geometries could be inserted and queried against points.
MongoDB only provides JSON as a result, right? If we want to plot a few hundred points that would be no problem, but hundreds of thousands would have to be converted to vector data via JavaScript... It is for this reason that WMS services are useful, as they provide a single image.
Is there any hope of connecting MongoDB to a WMS? I saw someone announcing a plugin for GeoServer, but that was a year ago and nothing has happened since then.
In case it is not possible, roughly how many GeoJSON features can be plotted at a time while keeping good browser performance?
Not much help, but I saw a talk on someone who added MongoDB as a back end to GeoServer last year.
IIRC, he said he would open source it (if his company approved), so maybe it's worth tracking him down.
EDIT: Looks like he got approval. Dug up some code here but not sure where associated documentation is. The Geotools/opengeo mailing list is where I found that.
I'm also starting to investigate using NoSQL for geographic data.
There is an article
The example code uses Python, PyMongo and the OGR libraries to convert shapefiles to a MongoDB collection and vice versa.
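For the regular degree-cell case described in the question, the per-cell counts do not strictly need a polygon query: each point's cell follows directly from its coordinates. Below is a minimal Python sketch of that idea using only the standard library. It does not use PyMongo or OGR; the function names are hypothetical, and `cell_polygon` just builds a GeoJSON-style Polygon dict of the kind you could store as a document.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]  # (column index, row index) of a grid cell


def cell_of(lon: float, lat: float, size_deg: float) -> Cell:
    """Index of the grid cell containing the point (floor handles negatives)."""
    return (math.floor(lon / size_deg), math.floor(lat / size_deg))


def count_points_per_cell(points: Iterable[Tuple[float, float]],
                          size_deg: float = 1.0) -> Counter:
    """Count (lon, lat) points per grid cell in a single pass."""
    return Counter(cell_of(lon, lat, size_deg) for lon, lat in points)


def cell_polygon(cell: Cell, size_deg: float) -> Dict:
    """GeoJSON-style Polygon for a cell, as a closed counter-clockwise ring."""
    x0, y0 = cell[0] * size_deg, cell[1] * size_deg
    x1, y1 = x0 + size_deg, y0 + size_deg
    ring = [[x0, y0], [x1, y0], [x1, y1], [x0, y1], [x0, y0]]
    return {"type": "Polygon", "coordinates": [ring]}


points = [(0.2, 0.7), (0.9, 0.1), (-0.5, 10.3), (179.99, -89.5)]
counts = count_points_per_cell(points, size_deg=1.0)
for cell, n in sorted(counts.items()):
    print(cell, n, cell_polygon(cell, 1.0)["coordinates"][0][0])
```

Arbitrary polygons (countries, catchments) still need a real spatial query; this shortcut only applies when the polygons form a regular grid.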