Getting .atlas files to use with OSM Atlas

Getting .atlas files to use with OSM Atlas - openstreetmap

I am trying to use OSM's Atlas project in order to be able to "walk" around the planet's streets and maps as a graph, but I am having trouble using it. The documentation seems a little scarce.
It seems that I need .atlas files, but I am not sure how to get/generate them.
I tried downloading files from the OpenStreetMap export function. It seems to give me .osm files. Then I tried to convert them to .osm.pbf files with osmconvert map.osm -o=map.osm.pbf. Then I tried to use the CLI command from Atlas to convert the .osm.pbf file to an .atlas file: atlas pbf2atlas --countryName BGR map.osm.pbf.
This doesn't seem to work though because I then try to load the .atlas file, but it throws an exception.
import org.openstreetmap.atlas.geography.atlas.Atlas;
import org.openstreetmap.atlas.geography.atlas.AtlasResourceLoader;
import org.openstreetmap.atlas.streaming.resource.File;
public class Main {
public static void main(String[] args) {
File atlasFile = new File("BGR_map.osm.atlas");
Atlas atlas = new AtlasResourceLoader().load(atlasFile);
atlas.nodes().forEach(System.out::println);
}
}
Exception in thread "main" org.openstreetmap.atlas.exception.CoreException: MetaData not here!
at org.openstreetmap.atlas.geography.atlas.packed.PackedAtlasSerializer.load(PackedAtlasSerializer.java:96)
at org.openstreetmap.atlas.geography.atlas.packed.PackedAtlas.load(PackedAtlas.java:190)
at org.openstreetmap.atlas.geography.atlas.AtlasResourceLoader.load(AtlasResourceLoader.java:73)
at org.openstreetmap.atlas.geography.atlas.AtlasResourceLoader.load(AtlasResourceLoader.java:96)
at Main.main(Main.java:9)
Caused by: org.openstreetmap.atlas.exception.CoreException: Unable to read Atlas field metaData
at org.openstreetmap.atlas.geography.atlas.packed.PackedAtlasSerializer.deserializeIfNeeded(PackedAtlasSerializer.java:154)
at org.openstreetmap.atlas.geography.atlas.packed.PackedAtlas.metaData(PackedAtlas.java:511)
at org.openstreetmap.atlas.geography.atlas.packed.PackedAtlasSerializer.load(PackedAtlasSerializer.java:92)
... 4 more
Caused by: org.openstreetmap.atlas.exception.CoreException: Could not load Field metaData from BGR_map.osm.atlas
at org.openstreetmap.atlas.geography.atlas.packed.PackedAtlasSerializer.deserializeResource(PackedAtlasSerializer.java:258)
at org.openstreetmap.atlas.geography.atlas.packed.PackedAtlasSerializer.deserializeSingleField(PackedAtlasSerializer.java:275)
at org.openstreetmap.atlas.geography.atlas.packed.PackedAtlasSerializer.load(PackedAtlasSerializer.java:344)
at org.openstreetmap.atlas.geography.atlas.packed.PackedAtlasSerializer.deserializeIfNeeded(PackedAtlasSerializer.java:149)
... 6 more
Caused by: java.io.StreamCorruptedException: invalid stream header: 08811E10
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:866)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
at org.openstreetmap.atlas.geography.atlas.packed.PackedAtlasSerializer.deserializeResource(PackedAtlasSerializer.java:247)
... 9 more
I found some other already generated .atlas files for testing in the Atlas repo and they load normally with this code, so it seems that I can't generate the .atlas files properly. How do I do that? I want to open the OSM map and get a region of it as an .atlas file so that I can explore it through code.

Related

OSMDroid - OpenTopo tiles not loading? Failed to connect to a.tile.opentopomap.org

I used to be able to load OpenTopo tiles in OSMDroid. They were beautiful.
My MapView is configured with mapView.setTileSource(TileSourceFactory.OpenTopo)
Lately, they don't load and I get error:
W/OsmDroid: IOException downloading MapTile: /14/2598/5547 : java.net.ConnectException: Failed to connect to a.tile.opentopomap.org/131.188.76.144:443
I am connected, and tiles run fine when I use another tile source like mapView.setTileSource(TileSourceFactory.DEFAULT_TILE_SOURCE).
So is this a recent issue with OpenTopo on the server side or could it still be something wrong on my end?

Integrating Confluent Schema Registry with Apache Atlas

Problem Definition
I am trying to integrate the data which exists in Confluent Schema Registry with Apache Atlas. For this purpose I have seen lots of links, they also talk about its possibility but they didn't give any technical information of how this integration was done.
Question
Would anyone help me to import the data (also metadata) from Schema Registry to Apache Atlas real-time? Is there any hook, even-listener or something like this to implement it?
Example
Here is what I have from Schema Registry:
{
"subject":"order-value",
"version":1,
"id":101,
"schema":"{\"type\":\"record\",\"name\":\"cart_closed\",\"namespace\":\"com.akbar.avro\",\"fields\":[{\"name\":\"_g\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"_s\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"_u\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"application_version\",\"type\":[\"int\",\"null\"],\"default\":null},{\"name\":\"client_time\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"event_fingerprint\",\"type\":[\"string\",\"null\"],\"default\":null},{\"name\":\"os\",\"type\":[\"string\",\"null\"],\"default\":null},{\"name\":\"php_session_id\",\"type\":[\"string\",\"null\"],\"default\":null},{\"name\":\"platform\",\"type\":[\"string\",\"null\"],\"default\":null},{\"name\":\"server_time\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"site\",\"type\":[\"string\",\"null\"],\"default\":null},{\"name\":\"user_agent\",\"type\":[\"string\",\"null\"],\"default\":null},{\"name\":\"payment_method_id\",\"type\":[\"int\",\"null\"],\"default\":null},{\"name\":\"page_view\",\"type\":[\"boolean\",\"null\"],\"default\":null},{\"name\":\"items\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"record\",\"name\":\"item\",\"fields\":[{\"name\":\"brand_id\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"category_id\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"discount\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"order_item_id\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"price\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"product_id\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"quantity\",\"type\":[\"int\",\"null\"],\"default\":null},{\"name\":\"seller_id\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"variant_id\",\"type\":[\"long\",\"null\"],\"default\":null}]}}},{\"name\":\"cart_id\",\"type\":[\"long\",\"null\"],\"default\":null}]}"
}
How to import it in Apache Atlas?
What I have done
I checked the schema registry documentation in which it has the following architecture:
So I decided to set the Kafka url but I didn't find any where to set the Kafka configuration. I tried to change the atlas.kafka.bootstrap.servers
variable in atlas-application.properties. I have also tried to call import-kafka.sh from hook-bin directory but it wasn't successful.
Error log
2021-04-25 15:48:34,162 ERROR - [main:] ~ Thread Thread[main,5,main] died (NIOServerCnxnFactory$1:92)
org.apache.atlas.exception.AtlasBaseException: EmbeddedServer.Start: failed!
at org.apache.atlas.web.service.EmbeddedServer.start(EmbeddedServer.java:115)
at org.apache.atlas.Atlas.main(Atlas.java:133)
Caused by: java.lang.NullPointerException
at org.apache.atlas.util.BeanUtil.getBean(BeanUtil.java:36)
at org.apache.atlas.web.service.EmbeddedServer.auditServerStatus(EmbeddedServer.java:128)
at org.apache.atlas.web.service.EmbeddedServer.start(EmbeddedServer.java:111)
... 1 more

Remote debug an apache beam job on flink cluster

I am running a streaming beam job on a flink cluster where I am getting the following exception.
Caused by: org.apache.beam.sdk.util.UserCodeException: org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: Could not forward element to next operator
at org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:34)
at org.apache.beam.sdk.transforms.MapElements$1$DoFnInvoker.invokeProcessElement(Unknown Source)
at org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:218)
at org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:183)
at org.apache.beam.runners.flink.metrics.DoFnRunnerWithMetricsUpdate.processElement(DoFnRunnerWithMetricsUpdate.java:62)
at org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator.processElement(DoFnOperator.java:544)
at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:302)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: Could not forward element to next operator
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:596)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
at org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator$BufferedOutputManager.emit(DoFnOperator.java:941)
at org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator$BufferedOutputManager.output(DoFnOperator.java:895)
at org.apache.beam.runners.core.SimpleDoFnRunner.outputWindowedValue(SimpleDoFnRunner.java:252)
at org.apache.beam.runners.core.SimpleDoFnRunner.access$700(SimpleDoFnRunner.java:74)
at org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.output(SimpleDoFnRunner.java:576)
at org.apache.beam.sdk.transforms.DoFnOutputReceivers$WindowedContextOutputReceiver.output(DoFnOutputReceivers.java:71)
at org.apache.beam.sdk.transforms.MapElements$1.processElement(MapElements.java:139)
Caused by: org.apache.beam.sdk.util.UserCodeException: java.lang.IllegalArgumentException: Expect srcResourceIds and destResourceIds have the same scheme, but received alluxio, file.
at org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:34)
at org.apache.beam.sdk.io.WriteFiles$FinalizeTempFileBundles$FinalizeFn$DoFnInvoker.invokeProcessElement(Unknown Source)
at org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:218)
at org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:183)
at org.apache.beam.runners.flink.metrics.DoFnRunnerWithMetricsUpdate.processElement(DoFnRunnerWithMetricsUpdate.java:62)
at org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator.processElement(DoFnOperator.java:544)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
at org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator$BufferedOutputManager.emit(DoFnOperator.java:941)
at org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator$BufferedOutputManager.output(DoFnOperator.java:895)
at org.apache.beam.runners.core.SimpleDoFnRunner.outputWindowedValue(SimpleDoFnRunner.java:252)
at org.apache.beam.runners.core.SimpleDoFnRunner.access$700(SimpleDoFnRunner.java:74)
at org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.output(SimpleDoFnRunner.java:576)
at org.apache.beam.sdk.transforms.DoFnOutputReceivers$WindowedContextOutputReceiver.output(DoFnOutputReceivers.java:71)
at org.apache.beam.sdk.transforms.MapElements$1.processElement(MapElements.java:139)
at org.apache.beam.sdk.transforms.MapElements$1$DoFnInvoker.invokeProcessElement(Unknown Source)
at org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:218)
at org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:183)
at org.apache.beam.runners.flink.metrics.DoFnRunnerWithMetricsUpdate.processElement(DoFnRunnerWithMetricsUpdate.java:62)
at org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator.processElement(DoFnOperator.java:544)
at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:302)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Expect srcResourceIds and destResourceIds have the same scheme, but received alluxio, file.
at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:141)
at org.apache.beam.sdk.io.FileSystems.validateSrcDestLists(FileSystems.java:428)
at org.apache.beam.sdk.io.FileSystems.rename(FileSystems.java:308)
at org.apache.beam.sdk.io.FileBasedSink$WriteOperation.moveToOutputFiles(FileBasedSink.java:755)
at org.apache.beam.sdk.io.WriteFiles$FinalizeTempFileBundles$FinalizeFn.process(WriteFiles.java:850)
The streaming job is getting data from the apache pulsar source and writing output data onto an Alluxio data lake in parquet file format. I am using Spotify's scio for writing this job in Scala. A little code chunk to emphasize what I am trying to achieve:
pulsarSource
.open(sc)
.withFixedWindows(Duration.standardSeconds(windowDuration))
.toSinkTap(sink)
From the exception, I can see that source and output paths should have the same URI scheme but I don't know how it is happening because I am using an alluxio path as an output directory. There are some temp directories that are being created on alluxio output directory but after the WindowDuration, when the output file is being created, this exception happens.
I had a doubt that temp location might be configured by default to the local filesystem, so I did set that to output directory path (alluxio dir path) but it didn't change anything.
sc.options.setTempLocation(outputDir)
I want to do remote debugging in order to figure out the issue. I have tried this document to do remote debugging on the task executor node, but once my IntelliJ IDE connects with the node, I don't get hit on my breakpoint.
Can someone suggest, how can I debug or get more information about this issue.
Thanks

Remote debugging might be quite hard, but let's try this first: Make sure you connect to the task manager and not job manager (easy to verify with thread names). Then make sure to have a high number of retries, such that you don't miss the task execution, as attaching the debugging might take a while.
It's also helpful to double check that line numbers of the stack trace match your code version in the IDE. If Flink/Beam is preinstalled, they might run a slightly different version and your break point is void. Just paste the stack trace in your IDE and check if each line matches the expectation. Finally, add a few more breakpoint at central places like org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202) to make sure if the setup is working at all.
However, remote debugging is usually not the recommended option for big data systems. You'd first ensure locally that most of the things work on their own with some IT tests and local runners. Then, you might want to add e2e tests with docker containers and a local mini cluster. Additionally, you'd add a ton of logging statements, which you can turn on and off with your logging configuration. Similarly, if you set logging level to debug, the existing log statements of the frameworks might already be enough to gain some insights. One important thing that you should always look at is the generated topology that you can see in Web UI. Maybe it already tells you the paths in question.

How can I run NLTK on App Engine or Kubernetes?

I am busy writing a model to predict types of text like names or dates on a pdf document.
The model uses nltk.word_tokenize and nltk.pos_tag
When I try to use this on Kubernetes on Google Cloud Platform I get the following error:
from nltk.tag import pos_tag
from nltk.tokenize import word_tokenize
tokenized_word = tokenize_word('x')
tagges_word = pos_tag(['x'])
stacktrace:
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
Searched in:
- '/root/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- '/env/nltk_data'
- '/env/share/nltk_data'
- '/env/lib/nltk_data'
- ''
But obviously downloading it to your local device will not solve the problem if it has to run on Kubernetes and we do not have NFS set up on the project yet.

How I ended up solving this problem was adding the download of the nltk packages in an init function
import logging
import nltk
from nltk import word_tokenize, pos_tag
LOGGER = logging.getLogger(__name__)
LOGGER.info('Catching broad nltk errors')
DOWNLOAD_DIR = '/usr/lib/nltk_data'
LOGGER.info(f'Saving files to {DOWNLOAD_DIR} ')
try:
tokenized = word_tokenize('x')
LOGGER.info(f'Tokenized word: {tokenized}')
except Exception as err:
LOGGER.info(f'NLTK dependencies not downloaded: {err}')
try:
nltk.download('punkt', download_dir=DOWNLOAD_DIR)
except Exception as e:
LOGGER.info(f'Error occurred while downloading file: {e}')
try:
tagged_word = pos_tag(['x'])
LOGGER.info(f'Tagged word: {tagged_word}')
except Exception as err:
LOGGER.info(f'NLTK dependencies not downloaded: {err}')
try:
nltk.download('averaged_perceptron_tagger', download_dir=DOWNLOAD_DIR)
except Exception as e:
LOGGER.info(f'Error occurred while downloading file: {e}')
I realize that the amount of try catch expressions are not needed. I also specify the download dir because it seemed that if you do not do that it downloads and unzips 'tagger' to /usr/lib and the nltk does not look for the the files there.
This will download the files on every first run on a new pod and the files will persist until the pod dies.
The error was solved on a Kubernetes stateless set which means this can deal with non persistent applications like App Engine, but will not be the most efficient because it will need to be download every time the instance spins up.

FATAL org.apache.hadoop.conf.Configuration - error parsing conf file: org.xml.sax.SAXParseException

I'm trying to run pig locally, installed using homebrew, to test a script. However, I get the following error when I attempt to run a simple dump from the interactive prompt pig -x local:
2012-07-16 23:20:40,447 [Thread-7] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
[Fatal Error] :63:85: Character reference "&#2" is an invalid XML character.
2012-07-16 23:20:40,688 [Thread-7] FATAL org.apache.hadoop.conf.Configuration - error parsing conf file: org.xml.sax.SAXParseException: Character reference "&#2" is an invalid XML character.
The same load/dump works fine on Elastic MapReduce.
I can't find any XML config files, and I've tried with both version 0.9.2 and 0.10.0
What am I missing?
Edit: Just checked a direct download (vs. homebrew) and it doesn't seem to work either

You should check that your Hadoop configuration files have correct configuration data.
Have a look in your hadoop/conf directory.
Have a look inside:
hdfs-site.xml
mapred-site.xml
core-site.xml

Finally worked out what the problem was. I ended up having to use dtruss -p on the pig/java process. This revealed a temporary directory and dynamically generated xml files. Once the temporary directory was discovered, it all fell quickly into place.
It was picking up the proxy excludes from my network connections, which had, as far as I can tell, &#2 (http://www.fileformat.info/info/unicode/char/02/index.htm) embedded in it. How this invalid value came to be in my network preferences in the first place, I haven't the faintest clue.
The value was then being pulled into dynamically generated files, for example /tmp/hadoop-vertis/mapred/staging/vertis-1005847898/.staging/job_local_0001/job.xml.
The offending lines:
<property><name>ftp.nonProxyHosts</name><value>localhost|*.localhost|127.0.0.1|h|*.h</value></property>
<property><name>socksNonProxyHosts</name><value>localhost|*.localhost|127.0.0.1|h|*.h</value></property>
<property><name>http.nonProxyHosts</name><value>localhost|*.localhost|127.0.0.1|h|*.h</value></property>

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Getting .atlas files to use with OSM Atlas - openstreetmap

Related

OSMDroid - OpenTopo tiles not loading? Failed to connect to a.tile.opentopomap.org

Integrating Confluent Schema Registry with Apache Atlas

Remote debug an apache beam job on flink cluster

How can I run NLTK on App Engine or Kubernetes?

FATAL org.apache.hadoop.conf.Configuration - error parsing conf file: org.xml.sax.SAXParseException

Categories

Resources