Cassandra get_range_slices - range

I am new to Cassandra and I am having some difficulties fetching data.
I looked into the function:
list<KeySlice> get_range_slices(column_parent, predicate, range, consistency_level)
But, I do not understand what the column_parent is supposed to be.
Anybody any idea?=
Thanx,
Granit

column_parent is basicly used for indicator of ColumnFamily(but in rare cases it can indicate a supercolumn). In java you would put : new ColumnParent("Posts") there. but there should be one more parameter for namespace in get_range_slices query, I guess you are not using thrift but a client api. then you should check your client's documentation.
Edit:
the definition of ColumnParent in cassandra api :
The ColumnParent is the path to the
parent of a particular set of Columns.
It is used when selecting groups of
columns from the same ColumnFamily. In
directory structure terms, imagine
ColumnParent as ColumnPath + '/../'.

Frail is correct, but the real answer is "don't use raw Thrift, use one of the clients from http://wiki.apache.org/cassandra/ClientOptions instead."

Related

REST url proper format

my REST API format:
http://example.com/api/v1.0/products - get all products
http://example.com/api/v1.0/products/3 - get product with id=3
Also, the products can be orginized into a product groups.
What is a proper way to get all product groups according to REST best practices:
http://example.com/api/v1.0/products/groups
or
http://example.com/api/v1.0/productgroups
...
another option ?
I can't agree with Rishabh Soni because http://example.com/api/v1.0/products/groups may lead to ambiguity.
I would put my money on http://example.com/api/v1.0/productgroups or even better http://example.com/api/v1.0/product_groups (better readability).
I've had similar discussion here: Updating RESTful resources against aggregate roots only
Question: About the thing of /products/features or /product-features,
is there any consensus on this? Do you know any good source to ensure
that it's not just a matter of taste?
Answer: I think this is misleading. I would expect to get all features
in all products rather than get all possible features. But, to be
honest, it’s hard to find any source talking directly about this
problem, but there is a bunch of articles where people don’t try to
create nested resources like /products/features, but do this
separately.
So, we can't be sure http://example.com/api/v1.0/products/groups will return all possible groups or just all groups that are connected with all existing products (what about a group that has not been connected with the product yet?).
To avoid this ambiguity, you can add some annotation in documentation. But you can just prepare http://example.com/api/v1.0/product_groups and all is clear.
If you are developing Rest API for your clients than you should not rely on id's. Instead build a meaningful abbreviation and map them to actual id on server side.
If that is not possible, instead of using
http://example.com/api/v1.0/products/3 you can use http://example.com/api/v1.0/products?product_id=3 and then you can provide "product_id" description in the documentation. basically telling the client ways to use product_id.
In short a url must be meaningful and follow a pattern.The variable part must be send by in the url query(part after ? or POST payload)
With this, method to querying the server is also important. If client is trying to get something to the server he should use "GET" http request, similar POST http request if it is uploading new info and "PUT" request if it is updating or creating a new resource.
So by this analogy http://example.com/api/v1.0/products/groups is more appropriate as it is following a pattern(groups in product) while productgroups is more like a keyword with no pattern.
A directory like pattern is more easier to understand. Like in file systems(C:\Program Files\WinRAR), every part gets us to more generalized target.
You can also customize this for specific group- http://example.com/api/v1.0/products/groups?id=3

Boolean logic in RESTful filtering and queries

This is sort of a follow-up to someone else's question about filtering/querying a list of cars. There the recommendation for a RESTful filtering request was to put filter expressions in the query of the URI, like this:
/cars?color=blue&type=sedan&doors=4
That's fine. But what if my filtering query becomes more complicated and I need to use Boolean operators, such as:
((color=blue OR type=sedan) AND doors=4) OR color=red
That is, I want to find a four-door blue car or a four-door sedan, but if the car is red I'll take it without caring about any of the other properties.
Is there any sort of convention for providing Boolean expressions in a RESTful URI's query parameters? I suppose I could by create some new querying expression language and put it in a POST, but that seems like a heavy and proprietary approach. How are others solving this?
It is perfectly okay to use
/cars/color:blue/type:sedan/doors:4
instead of
/cars?color=blue&type=sedan&doors=4
The URL standard says only that the path should contain the hierarchical part, and the query should contain the non-hierarchical. Since this is a map-reduce, using / is perfectly valid.
In your case you need a query language to describe your filters. If I were you I would copy an already existing solution, for example the query language of a noSQL database which has a REST API.
I think resource query language is what you need. I think you could use it like this:
/sthg?q="(foo=3|foo=bar)&price=lt=10"
or forget the default queryString parser, and like this:
/sthg?(foo=3|foo=bar)&price=lt=10
I suggest you to read the manual for further details.
Since I found no other URL compatible query language (yet), I think the only other option to serialize another query language and send it in a param, like SparSQL
http://localhost:8003/v1/graphs/sparql?query=your-urlencoded-query
by marklogic7. Hydra defines a freeTextQuery in its vocab, so they follow the same approach. But I'll ask Markus about this. It's a complicated topic, since according to the self-descriptive messages constraint you should describe somewhere what type of query language you use in the URL. I am not sure about this. :S
conclusion:
In order to support ad-hoc search queries we need a standard way to describe them in the link meta-data. Currently there are only a few standards about this. The most widely used standard is URI templates which does not support nested statements, operators, etc... for what I know. There is a draft called link descriptions which tries to fill the gap, but it is incomplete.
One possible workaround to define an URI template with a single q parameter which has rdf:type of x:SearchQuery and rdfs:range of xsd:string, and create another vocab about how to describe such a x:SearchQuery. After that the description could be used to build search forms, and validate queries sent to the server. Already existing queries could be supported too with this approach, so we don't need a new one.
So this problem can be solved with vocabs or new URI template standards.
I have seen many use a query string as you have provided - much like a SQL query string.
Here are just two examples:
Socrata (Open Data Portal company)'s SoQL (SQL variant): http://dev.socrata.com/consumers/cookbooks/querying-block-ranges.html
openFDA (API from fda.gov for open data) uses a similar string-based query parameter which maps to ElasticSearch queries, I believe: https://open.fda.gov/api/reference/#query-syntax
Try using 1 for true, 0 for false.
/_api/web/lists/getbytitle('XYZ')/items?$filter=Active eq 1

Marklogic REST API search for latest document version

We need to restrict a MarkLogic search to the latest version of managed documents, using Marklogic's REST api. We're using MarkLogic 6.
Using straight xquery, you can use dls:documents-query() as an additional-query option (see
Is there any way to restrict marklogic search on specific version of the document).
But the REST api requires XML, not arbitrary xquery. You can turn ordinary cts queries into XML easily enough (execute <some-element>{cts:word-query("hello world")}</some-element> in QConsole).
If I try that with dls:documents-query() I get this:
<cts:properties-query xmlns:cts="http://marklogic.com/cts">
<cts:registered-query>
<cts:id>17524193535823153377</cts:id>
</cts:registered-query>
</cts:properties-query>
Apart from being less than totally transparent... how safe is that number? We'll need to put it in our query options, so it's not something we can regenerate every time we need it. I've looked on two different installations here and the the number's the same, but is it guaranteed to be the same, and will it ever change? On, for example, a MarkLogic upgrade?
Also, assuming the number is safe, will the registered-query always be there? The documentation says that registered queries may be cleared by the system at various times, but it's talking about user-defined registered queries, and I'm not sure how much of that applies to internal queries.
Is this even the right approach? If we can't do this we can always set up collections and restrict the search that way, but we'd rather use dls:documents-query if possible.
The number is a registered query id, and is deterministic. That is, it will be the same every time the query is registered. That behavior has been invariant across a couple of major releases, but is not guaranteed. And as you already know, the server can unregister a query at any time. If that happens, any query using that id will throw an XDMP-UNREGISTERED error. So it's best to regenerate the query when you need it, perhaps by calling dls:documents-query again. It's safest to do this in the same request as the subsequent search.
So I'd suggest extending the REST API with your own version of the search endpoint. Your new endpoint could add dls:documents-query to the input query. That way the registered query would be generated in the same request with the subsequent search. For ML6, http://docs.marklogic.com/6.0/guide/rest-dev/extensions explains how to do this.
The call to dls:documents-query() makes sure the query is actually registered (on the fly if necessary), but that won't work from REST api. You could extend the REST api with a custom extension as suggested by Mike, but you could also use the following:
cts:properties-query(
cts:and-not-query(
cts:element-value-query(
xs:QName("dls:latest"),
"true",
(),
0
),
cts:element-query(
xs:QName("dls:version-id"),
cts:and-query(())
)
)
)
That is the query that is registered by dls:documents-query(). Might not be future proof though, so check at each upgrade. You can find the definition of the function in /Modules/MarkLogic/dls.xqy
HTH!

How to use hbase coprocessor to implement groupby?

Recently I learned hbase coprocessor, I used endpoint to accumulate one column of hbase table.For example, the hbase table named "pendings",its family is "asset", I accumulate all the value of "asset:amount". The table has other columns,such as "asset:customer_name". The first thing I want to do is accumulate the the value of "asset:amount" group by "asset:customer_name". But I found there is not API for groupby, or I did not find it. Do you know how to implement GROUPBY or how to use the API that HBASE provides?
You should use an endpoint to do this work.
You have a sum example in this article: https://blogs.apache.org/hbase/entry/coprocessor_introduction.
What you basically need to add is to append your row key and the customer name to form your new key "MyKey". You should keep a variable of the last seen MyKey and when the current MyKey is different from the previous one, you should emit the previous one along with its sum and overwrite the previous MyKey to the current one.
You have to make sure to perform the aggregation on the client side as it is done in the example provided in the URL because you may have a customer at the edges of two different regions.
Using endpoint coprocessor can make it. All you should do is that : first define related interface(reduce) protocol extends CoprocessorPotocol, then make an implementation of it, lastly code the client-side logic.

REST parameters vs URI

I'm just learning REST and trying to figure out how to apply it in practice. I have a sampling of data that I want to query, but I'm not sure how the URLs are meant to be formed, i.e. where I put the query. For example, for querying the most recent 100 data records:
GET http://data.com/data/latest/100
GET http://data.com/data?amount=100
which of the previous two queries is the better, and why? And the same for the following:
GET http://data.com/data/latest-days/2
GET http://data.com/data?days=2
GET http://data.com/data?fromDate=01-01-2000
Thanks in advance.
Personally, I would use the query string format in this case. If your /data path is returning all of the data, and you would like to perform this type of query, I believe it makes the most sense. You could also pass query string parameters such as ?since=01-01-2000 to get entries after a specified date or pass column names such as ?category=clothing to retrieve all entries with category equaling clothing.
Additionally, you would want paths such as /data/{id} to be available to retrieve certain entries given their unique id.
It really depends on a lot of things. If you're using any sort of MVC framework, you'd use the URI segments to define your get request to your API which I personally prefer.
It's not a big deal either way, it's all based on preference and how predictable you want the URL to be to your user. In some cases, I'd say go with the REST parameters, but more often than not a URI based GET is quite clean if your setup supports it.