IBM Watson Discovery - How to list uploaded documents?

IBM Watson Discovery - How to list uploaded documents? - rest

I'm trying to get the ID-s of all documents uploaded to IBM Watson Discovery API
It's not obvious from the documentation, but I guess it must be possible?
The structure is:
+--------------------------------------------------+
| Environment |
| |
| +--------------------------------+ |
| | Collection | |
| | | |
| | +--------------+ | |
| | | Document | | |
| | | | | |
| | +--------------+ | |
| | | |
| +--------------------------------+ |
| |
+--------------------------------------------------+
How can I get a list of all documents in a collection?
.
I have tried:
Collections -> List collection fields
curl -u "apikey":"{apikey}" "{url}/v1/environments/{environment_id}/collections/{collection_id}/fields?version=2019-04-30"
Queries -> Query a collection (GET)
curl -u "apikey":"{apikey}" "{url}/v1/environments/{environment_id}/collections/{collection_id}/query?version=2019-04-30"
Environments -> Get environment info
curl -u "apikey":"{apikey}" "{url}/v1/environments/{environment_id}?version=2019-04-30"
Training data -> List training data
curl -u "apikey":"{apikey}" "{url}/v1/environments/{environment_id}/collections/{collection_id}/training_data?version=2019-04-30"
None of these give the list of documents.
.
UPDATE:
Thanks to #Catbelly's answer, the correct way to get a list of documents is:
curl -u "apikey":"{apikey}" "{url}/v1/environments/{environment_id}/collections/{collection_id}/query?version=2019-04-30&return=id

The Query endpoint is the only way to do this - if you set the return value equal to id you will only get ids back. (I am a IBM Watson employee)

Related

Fill empty rows of set with non empty value

Note: I've already gone over related questions like following that don't address my query
SQL: how to pick one row for each set of rows with duplicate value in one column?
Fill missing values with first non-null following value in Redshift
I have a sparse, unclean dataset like this
| id | operation | title | channel_type | mode |
|-----|-----------|----------|--------------|------|
| abc | Start | | | |
| abc | Start | recovery | | Link |
| abc | Start | recovery | SMS | |
| abc | Set | | Email | |
| abc | Verify | | Email | |
| pqr | Start | | | OTP |
| pqr | Verfiy | sign_in | Push | |
| pqr | Verify | | | |
| xyz | Start | sign_up | | Link |
and I need to fill up empty rows of each id with non-empty data available from other rows
| id | operation | title | channel_type | mode |
|-----|-----------|----------|--------------|------|
| abc | Start | recovery | SMS | Link |
| abc | Start | recovery | SMS | Link |
| abc | Start | recovery | SMS | Link |
| abc | Set | recovery | Email | Link |
| abc | Verify | recovery | Email | Link |
| pqr | Start | sign_in | Push | OTP |
| pqr | Verfiy | sign_in | Push | OTP |
| pqr | Verify | sign_in | Push | OTP |
| xyz | Start | sign_up | | Link |
notes
some ids can have a certain field as empty in all rows
and while most ids will have same non-empty values for each field, edge cases could have different values. For such groups, filling up any non-empty value in all rows is acceptable. [this is too rare in my dataset and can be ignored]
another extra bit of pattern is that certain fields are mostly only present only against rows of certain operations, for e.g. mode is only present against operation='Start' rows
I've tried grouping rows by id while performing listagg over title, channel_type and mode columns, followed by coalesce, something along the lines of this:
WITH my_data AS (
SELECT
id,
operation,
title,
channel_type,
mode
FROM
my_db.my_table
),
list_aggregated_data AS (
SELECT
id,
listagg(title) AS titles,
listagg(channel_type) AS channel_types,
listagg(mode) AS modes
FROM
my_data
GROUP BY
id
),
coalesced_data AS (
SELECT DISTINCT
id,
coalesce(titles) AS title,
coalesce(channel_types) AS channel_type,
coalesce(modes) AS mode
FROM
list_aggregated_data
),
joined_data AS (
SELECT
md.id,
md.operation,
cd.title,
cd.channel_type,
cd.mode
FROM
my_data AS md
LEFT JOIN
coalesced_data AS cd ON cd.id = md.id
)
SELECT
*
FROM
joined_data
ORDER BY
id,
operation
But for some reason this is resulting in concatenation of values (presumably from coalesce operation), where I get
| id | operation | title | channel_type | mode |
|-----|-----------|------------------|--------------|------|
| abc | Start | recoveryrecovery | SMS | Link |
| abc | Start | recoveryrecovery | SMS | Link |
| abc | Start | recoveryrecovery | SMS | Link |
| abc | Set | recoveryrecovery | Email | Link |
| abc | Verify | recoveryrecovery | Email | Link |
| pqr | Start | sign_in | Push | OTP |
| pqr | Verfiy | sign_in | Push | OTP |
| pqr | Verify | sign_in | Push | OTP |
| xyz | Start | sign_up | | Link |
What's the correct way to approach this problem?

I'd start with the first_value() window function with the ignore nulls option. You will partition by the first 2 columns and will need to work out the edge cases with some data massaging, likely in the order by clause of the window function.

Swift & Firebase - Split data for user info?

I currently coding a fitness app that permits to record all the personal records for a user.
I'm really new with Cloud Firestore from Firebase, so I really don't know how I could structure the database.
In my mind, I have two options:
OPTION 1
Users
|
+--UserID
| |
| +--Name
| +--Phone
| +--etc..
|
|
Users-records
|
+--UserID
| |
| +--RecordName
| | |
| | +--recordValue
| | +--recordType
| |
| +--RecordName
| | +--recordValue
| | +--recordType
OPTION 2
Users
|
+--UserID
| |
| +--Name
| +--Phone
| +--etc..
| +--Records
| | |
| | +--RecordName
| | | |
| | | +--recordValue
| | | +--recordType
| | +--RecordName
| | | |
| | | +--recordValue
| | | +--recordType
The questions are: Do I have to split the collection for the user?
Do you think this architecture is well designed for the purpose (ie record personal records from users)?
Thank you very much

Your database structure really depends on how you are going to use it. Keep in mind that whenever you observe a node, you are also observing all of the children nodes.
So I'd probably go with something closer to Option two, maybe like this:
Users
|
+--UserID
| |
| +--UserInfo
| | |
| | +--Name
| | +--Phone
| | +--etc..
| |
| +--Records
| | |
| | +--RecordName
| | | |
| | | +--recordValue
| | | +--recordType
| | +--RecordName
| | | |
| | | +--recordValue
| | | +--recordType
I'd choose this, because I'd image you'd want to get all of the UserInfo at once, So we can observe that "UserInfo" node and get all of the children: name, phone, etc....
Then I'd think you'd also want to get all of the records at once, so we can observe that "Records" node and get all of that data.
Additionally, if you wanted, you could get everything at once by observing the UserID!
However, if you were maybe going to be getting a list of all the users, then you definitely don't want all this data in one spot and this design wouldn't work, because that is a lot of data to observe just to get all the users.
In summary: Choose an option which makes it easiest for you to get what you need, without getting extra data you don't want!

Talend split fields from one to double

I have mysql table with text field "email", which can contain "user#example.com" and "user1#example.com;user2#example.com;user3#example.com".
| Name | Email |
| user | user#example.com |
| user1 | user1#example.com;user2#example.com;user3#example.com |
How can i do output with Talend such this:
| Name | Email |
| user | user#example.com |
| user1 | user1#example.com |
| user1 | user2#example.com |
| user1 | user3#example.com

The tNormalize component does exactly this. You can provide a character for separation, in your case ; and get rows as result afterwards.
EDIT
AxelH pointed out that it is possible as well to use a String for separation, this is not a Character.

How to list Rackspace servers filtered by metadata using REST API?

I can see that it is possible to add metadata to a Rackspace virtual machine instance.
I want to get a list of running instances, filtered by a particular metatag value.
I can't see how to do so in the documentation however.
is it possible?

You should be able to do so using the openstack client... but it depends on which metatag you're interested in.
You can get a list of all servers:
openstack server list
Will spit something like
+--------------------------------------+------------------+--------+-----------------------------------------------------------------------------------------------------------+
| ID | Name | Status | Networks |
+--------------------------------------+------------------+--------+-----------------------------------------------------------------------------------------------------------+
| 97606ae9-7f18-4a3c-903a-1583d446119b | trysmallwin | ERROR | |
| cb78b8d5-2f03-4a3f-ab26-f389acbd0b76 | Win-try again | ERROR | public=2607:f298:5:101d:f816:3eff:fe9e:5cd4, 208.113.133.90, 2607:f298:5:101d:f816:3eff:fe36:da45, |
| | | | 208.113.133.93, 2607:f298:5:101d:f816:3eff:fe40:57d5, 208.113.133.95 |
| 040751d1-c4c5-47aa-8dec-1d69a468be1c | hnxhdkwskrvwvdwr | ACTIVE | public=2607:f298:5:101d:f816:3eff:fe60:324, 208.113.130.52 |
+--------------------------------------+------------------+--------+-----------------------------------------------------------------------------------------------------------+
note the ID of the server and investigate deeper:
openstack server show 040751d1-c4c5-47aa-8dec-1d69a468be1c
+--------------------------------------+------------------------------------------------------------+
| Field | Value |
+--------------------------------------+------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | iad-2 |
| OS-EXT-STS:power_state | Running |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | active |
| OS-SRV-USG:launched_at | 2016-07-26T17:32:01.000000 |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | public=2607:f298:5:101d:f816:3eff:fe60:324, 208.113.130.52 |
| config_drive | True |
| created | 2016-07-26T17:31:51Z |
| flavor | gp1.semisonic (50) |
| hostId | e1efd75d1e8f6a7f5bb228a35db13647281996087d39c65af8ce83d9 |
| id | 040751d1-c4c5-47aa-8dec-1d69a468be1c |
| image | Ubuntu-14.04 (03f89ff2-d66e-49f5-ae61-656a006bbbe9) |
| key_name | stef |
| name | hnxhdkwskrvwvdwr |
| os-extended-volumes:volumes_attached | [] |
| progress | 0 |
| project_id | d2fb6996496044158cf977c2129c8660 |
| properties | |
| security_groups | [{u'name': u'default'}] |
| status | ACTIVE |
| updated | 2016-07-26T17:32:01Z |
| user_id | 5b2ca246f39a425f9a833460bf322603 |
+--------------------------------------+------------------------------------------------------------+
openstack --f json will output the same stuff but in json format that you can more easily manipulate programmatically.
HTH

MondoDB, export to CSV with a "join" on a value

So I know the command to export via Mongo shells is
mongoexport --host localhost --db dbname --collection name --csv > test.csv
but I also need to do a "join" (in quotes because I know Mongo doesn't actually do joins so to speak". I haven't found documentation on doing this in mongo shell.
I have collections called users and another called details.
I want to export: users (name,email) and details (city,country) where users (place [object ID]) = details (_id [object ID]).
Tables look like this:
users
+----+------+---------------+-----------------------+
| id | name | email | place |
+----+------+---------------+-----------------------+
| 1 | bob | bob#bob.com | ObjectId("123456abc") |
| 2 | mark | mark#mark.com | ObjectId("654321abc") |
| 3 | dave | dave#dave.com | ObjectId("987655abc") |
+----+------+---------------+-----------------------+
details
+----+-------+---------------+-----------------------+
| id | city | country | _id |
+----+-------+---------------+-----------------------+
| 1 | Perth | Australia | ObjectId("123456abc") |
| 2 | Tokyo | Japan | ObjectId("654321abc") |
| 3 | NY | United States | ObjectId("987655abc") |
+----+-------+---------------+-----------------------+
And the result I'm trying to achieve is this:
+----+------+---------------+-------+---------------+
| id | name | email | city | country |
+----+------+---------------+-------+---------------+
| 1 | bob | bob#bob.com | Perth | Australia |
| 2 | mark | mark#mark.com | Tokyo | Japan |
| 3 | dave | dave#dave.com | NY | United States |
+----+------+---------------+-------+---------------+

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

IBM Watson Discovery - How to list uploaded documents? - rest

The Query endpoint is the only way to do this - if you set the return value equal to id you will only get ids back. (I am a IBM Watson employee)

Related

Fill empty rows of set with non empty value

Swift & Firebase - Split data for user info?

Talend split fields from one to double

How to list Rackspace servers filtered by metadata using REST API?

MondoDB, export to CSV with a "join" on a value

Categories

Resources