HBase data query using REST API

To get data from an HBase table using REST we can use:
http://ip:port/tablename/base64_encoded_key
My key is a byte array of
prefix + customer_id + timestamp
byte[] rowKey = Bytes.add(Bytes.toBytes(prefix), Bytes.toBytes(customer_id), Bytes.toBytes(timestamp));
My sample key:
3\x00\x00\x00\x02I9\xB1\x8B\x00\x00\x01a\x91\x88\xEFp
How do I get data from HBase using REST?
How do I get data from HBase using customer_id and a time range?

You must send an HTTP request to get your value. For example, if you are on Linux you can easily try a GET request to fetch a single value. This example retrieves, from the table users, the row with id row1 and the column a from the column family cf:
curl -vi -X GET \
-H "Accept: text/xml" \
"http://example.com:20550/users/row1/cf:a"
You can see more in the HBase REST documentation, including how to retrieve data with a timestamp.
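For the single-key lookup, here is a minimal Python sketch, assuming the prefix is a string and customer_id and timestamp are 8-byte big-endian longs (matching the sample key above), and that the raw key bytes are percent-encoded into the URL path; the REST gateway returns row, column and value fields base64-encoded in the JSON body. The host, port, table name and key values are placeholders.

import base64
import struct
import urllib.parse
import requests  # assumes the requests library is available

# Rebuild the composite row key, mirroring Bytes.add(prefix, customer_id, timestamp):
# string prefix + 8-byte big-endian long customer_id + 8-byte big-endian long timestamp.
prefix = "3"                    # hypothetical values for illustration
customer_id = 9818452363
timestamp = 1518363000688       # epoch millis

row_key = prefix.encode("utf-8") + struct.pack(">q", customer_id) + struct.pack(">q", timestamp)

# Single-row GET: percent-encode the binary key into the URL path.
url = "http://ip:20550/tablename/" + urllib.parse.quote(row_key, safe="")
resp = requests.get(url, headers={"Accept": "application/json"})
resp.raise_for_status()

# Cell data in the JSON response is base64-encoded.
for row in resp.json()["Row"]:
    for cell in row["Cell"]:
        column = base64.b64decode(cell["column"]).decode("utf-8")
        value = base64.b64decode(cell["$"])
        print(column, value)

For the customer_id + time range part, the REST gateway also exposes a scanner resource (POST to /tablename/scanner) that accepts base64-encoded startrow and endrow values; you can build those boundaries the same way as the key above, with the lower and upper timestamps appended.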

Related

Is it possible to query a KSQL Table/Materialized view via HTTP?

I have a materialized view created using
CREATE TABLE average_latency AS SELECT DEVICENAME, AVG(LATENCY) AS AVG_LATENCY FROM metrics WINDOW TUMBLING (SIZE 1 MINUTE) GROUP BY DEVICENAME EMIT CHANGES;
I would like to query the table average_latency via a REST API call to get the AVG_LATENCY and DEVICENAME column in the response.
HTTP Client -> KSQL Table/Materialized view
Is this use-case possible? If so, how?
It is possible to query the internal state store used by Kafka Streams by exposing an RPC endpoint on the streams application.
Check out the following documentation and examples provided by Confluent.
https://docs.confluent.io/platform/current/streams/developer-guide/interactive-queries.html#streams-developer-guide-interactive-queries-rpc-layer
https://github.com/confluentinc/kafka-streams-examples/blob/7.1.1-post/src/main/java/io/confluent/examples/streams/interactivequeries/kafkamusic/KafkaMusicExample.java
https://github.com/confluentinc/kafka-streams-examples/blob/4.0.x/src/main/java/io/confluent/examples/streams/interactivequeries/kafkamusic/MusicPlaysRestService.java
You can get the result of the query over HTTP. You have to send a POST request with your query to the ksqlDB server address:
curl -X "POST" "http://localhost:8088/query" \
-H "Accept: application/vnd.ksql.v1+json" \
-d $'{
"ksql": "SELECT * FROM TEST_STREAM EMIT CHANGES;",
"streamsProperties": {}
}'
I took it from the ksqlDB developer guide:
https://docs.ksqldb.io/en/latest/developer-guide/api/
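Since the question is about reading from the materialized table, here is a hedged Python sketch of a pull query against average_latency through the same /query endpoint, assuming a ksqlDB server on localhost:8088 and that DEVICENAME is the table key (on a windowed table like this one you may also want to bound WINDOWSTART, depending on your ksqlDB version). The device name is a placeholder.

import json
import requests  # assumes the requests library is available

KSQL_URL = "http://localhost:8088/query"   # hypothetical ksqlDB server address

payload = {
    # Pull query: no EMIT CHANGES, filtered on the table key.
    "ksql": "SELECT DEVICENAME, AVG_LATENCY FROM average_latency WHERE DEVICENAME = 'device-1';",
    "streamsProperties": {}
}

resp = requests.post(
    KSQL_URL,
    headers={"Accept": "application/vnd.ksql.v1+json",
             "Content-Type": "application/vnd.ksql.v1+json"},
    data=json.dumps(payload),
)
resp.raise_for_status()

# The response is a JSON array: a header element followed by one element per row.
for element in resp.json():
    if "row" in element:
        print(element["row"]["columns"])   # e.g. ['device-1', 12.3]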

Copying data from one Cassandra table to another with TTL

We are changing the partition key of one of our tables by removing one column from the partition key. Every record in this table also has a TTL. Now we want to preserve the data in that table along with its TTL.
How can we do it?
We can create a new table with the desired schema and then copy data from the old table to the new table. However, we lose the TTL in this process.
For further information: this Cassandra table is populated by an Apache Storm application which reads events from Kafka. We can re-hydrate the Kafka messages, but Kafka has some unwanted messages which we don't want to process.
NOTE: the TTL is decided based on a date column value, which never changes. Because of this, the TTL would always be the same on all columns.
Before going to a specific implementation, it's important to understand that a TTL may exist on an individual cell as well as on all cells in a row. When you're performing an INSERT or UPDATE operation, you can apply only one TTL value to all columns specified in the query, so if you have 2 columns with different TTLs, you'll need to perform 2 queries, one per column, each with its own TTL.
Regarding the tooling, there are 2 more or less ready-to-use options here:
Use DSBulk. This approach is described in detail in example 30.1 of this blog post. Basically, you need to unload the data to disk using a query that extracts the column values and their TTLs, and then load the data by generating batches for every column that has a separate TTL. From the example:
dsbulk unload -h localhost -query \
"SELECT id, petal_length, WRITETIME(petal_length) AS w_petal_length, TTL(petal_length) AS l_petal_length, .... FROM dsbulkblog.iris_with_id" \
-url /tmp/dsbulkblog/migrate
dsbulk load -h localhost -query \
"BEGIN BATCH INSERT INTO dsbulkblog.iris_with_id(id, petal_length) VALUES (:id, :petal_length) USING TIMESTAMP :w_petal_length AND TTL :l_petal_length; ... APPLY BATCH;" \
-url /tmp/dsbulkblog/migrate --batch.mode DISABLED
Use the Spark Cassandra Connector: it supports reading & writing data with TTL & WriteTime. But you'll need to develop the code that does it, and correctly handle things such as collections, static columns, etc. (or wait for SPARKC-596 to be implemented).
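If the tooling above doesn't fit, and given the note that every column in a row carries the same TTL, a single INSERT per row with USING TTL is enough. Below is a minimal sketch with the DataStax Python driver; the keyspace, table and column names (customer_id, event_date, payload) are hypothetical placeholders for your actual schema.

from cassandra.cluster import Cluster  # DataStax Python driver

cluster = Cluster(["127.0.0.1"])       # hypothetical contact point
session = cluster.connect("my_keyspace")

# Read each row together with the remaining TTL of one non-key column
# (per the note above, the TTL is the same for every column in the row).
rows = session.execute(
    "SELECT customer_id, event_date, payload, TTL(payload) AS remaining_ttl FROM old_table"
)

# The TTL can be bound as a parameter in the USING clause of a prepared statement.
insert = session.prepare(
    "INSERT INTO new_table (customer_id, event_date, payload) VALUES (?, ?, ?) USING TTL ?"
)

for row in rows:
    if row.remaining_ttl and row.remaining_ttl > 0:
        session.execute(insert, (row.customer_id, row.event_date, row.payload, row.remaining_ttl))

cluster.shutdown()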

How to create a unique ID for each record in Structured Streaming

My Structured Streaming program is reading data from Kafka.
I have to create a unique ID for each input record. Is there any method available for it?
I tried the monotonically_increasing_id() method, but it always gives a value of 0.
from pyspark.sql.functions import col, split, monotonically_increasing_id

DF_ID = DF.withColumn("Date", split(col("root"), "\\|").getItem(0)) \
    .withColumn("InventoryAction_SKEY", monotonically_increasing_id())
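One alternative to monotonically_increasing_id() is Spark's built-in uuid() SQL function, which returns a distinct UUID string for every row; a hedged sketch reusing the DataFrame from the question:

from pyspark.sql.functions import col, split, expr

# uuid() generates a new UUID string per record, including in streaming micro-batches.
DF_ID = DF.withColumn("Date", split(col("root"), "\\|").getItem(0)) \
    .withColumn("InventoryAction_SKEY", expr("uuid()"))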

How to combine Graph API field expansions?

When using this syntax to query multiple metrics
https://graph.facebook.com/v2.4/69999732748?fields=insights.metric(page_fan_adds_unique,page_fan_adds,page_fan_adds_by_paid_non_paid_unique)&access_token=XYZ&period=day&since=2015-08-12&until=2015-08-13
I always get data for the last three days regardless of the values of the since and until parameters.
https://graph.facebook.com/v2.4/69999732748/insights/page_fan_adds_unique?period=day&access_token=XYZ&since=2015-09-10&until=2015-09-11
If I ask for a single metric, as in the request above, the date parameters do take effect.
Is there a different syntax for requesting multiple insights metrics which will accept the date parameters?
I think this should work:
curl -G \
-d "access_token=PAGE_TOKEN" \
-d "fields=insights.metric(page_fan_adds_unique,page_fan_adds,page_fan_adds_by_paid_non_paid_unique).since(2015-08-12).until(2015-08-13).period(day)" \
-d "pretty=1" \
"https://graph.facebook.com/v2.4/me"
You are requesting the page's insights as a field. This field contains a list of results in a data array (insights.data), and you want to filter this array.
The way to do that is by chaining parameterizations to the requested insights field like so:
fields=insights
.filter1(params)
.filter2(params)
.filter3(params)
Each .filterX(params) will be applied to the particular field that precedes it.
I've added newlines and indentation for clarity, but in your actual request you'd chain them all in a single line, without spaces.
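The same request expressed with Python's requests library, as a hedged equivalent of the curl call above (the access token is a placeholder, and the parsing assumes the insights field comes back as an object with a data array, as described above):

import requests  # assumes the requests library is available

params = {
    "access_token": "PAGE_TOKEN",   # placeholder page access token
    "fields": ("insights"
               ".metric(page_fan_adds_unique,page_fan_adds,page_fan_adds_by_paid_non_paid_unique)"
               ".since(2015-08-12)"
               ".until(2015-08-13)"
               ".period(day)"),
    "pretty": 1,
}

resp = requests.get("https://graph.facebook.com/v2.4/me", params=params)
resp.raise_for_status()

# Each entry in insights.data is one metric, restricted to the since/until window.
for metric in resp.json()["insights"]["data"]:
    print(metric["name"], metric["values"])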

Is there any way I can get raw JSON data from Splunk for a given query?

Is there any way I can get raw JSON data from Splunk for a given query in a RESTful way?
Consider the following timechart query:
index=* earliest=<from_time> latest=<to_time> | timechart span=1s count
The key things in the query are: 1. start/end time, 2. time span (say, seconds), and 3. value (say, count).
The expected JSON response would be:
{"fields":["_time","count","_span"],
"rows":[ ["2014-12-25T00:00:00.000-06:00","1460981","1"],
...,
["2014-12-25T01:00:00.000-06:00","536889","1"]
]
}
This is the response of the XHR (Ajax) calls made with output_mode=json_rows, which require session and authentication setup.
I’m looking for a RESTful implementation of the same with authentication.
You can do something like this using the curl command:
curl -k -u admin:changeme --data-urlencode search="search index=* earliest=<from_time> latest=<to_time> | timechart span=1s count" -d "output_mode=json" https://localhost:8089/servicesNS/admin/search/search/jobs/export
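The same export call as a hedged Python sketch with the requests library; the host, credentials and time placeholders mirror the curl command above, and verify=False corresponds to curl -k:

import requests  # assumes the requests library is available

resp = requests.post(
    "https://localhost:8089/servicesNS/admin/search/search/jobs/export",
    auth=("admin", "changeme"),    # mirrors curl -u admin:changeme
    verify=False,                  # mirrors curl -k (self-signed certificate)
    data={
        "search": "search index=* earliest=<from_time> latest=<to_time> | timechart span=1s count",
        "output_mode": "json",     # json_rows should also work if you want the fields/rows shape shown above
    },
    stream=True,
)
resp.raise_for_status()

# The export endpoint streams results; each non-empty line is a JSON document.
for line in resp.iter_lines():
    if line:
        print(line.decode("utf-8"))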