Sed to add quotes around json text following a specific json key - sed

I have the malformed JSON file below. I want to quote the value of email, i.e. "sampleemail#sampledoman.co.org". How do I go about it? I tried the command below, but it doesn't work.
sed -e 's/"email":\[\(.*\)\]/"email":["\1"]/g' sample.json
where sample.json looks like this:
{
    "supplementaryData": [
        {
            "xmlResponse": {
                "id": "100001",
                "externalid": "200001",
                "info": {
                    "from": "C1022929291",
                    "phone": "000963586",
                    "emailadresses": {
                        "email": [sampleemail#sampledoman.co.org
                        ]
                    }
                },
                "status": "OK",
                "channel": "mobile"
            }
        }
    ]
}

Your code does not work because:
- an unescaped [ would start a bracket expression, so it must be escaped (as you did with \[) to be treated as a literal;
- you are using BRE, in which capturing parentheses must be escaped as \(...\); to write them unescaped, as in the commands below, you need -E for extended functionality;
- the line does not end with ], since the closing bracket is on the next line, so the trailing \] never matches;
- you did not include the space after "email":, so there is no match and, hence, no replacement.
For your code to work, you can use:
$ sed -E 's/"email": \[(.*)/"email": ["\1"/' sample.json
or
$ sed -E '/\<email\>/s/[a-z#.]+$/"&"/' sample.json
{
    "supplementaryData": [
        {
            "xmlResponse": {
                "id": "100001",
                "externalid": "200001",
                "info": {
                    "from": "C1022929291",
                    "phone": "000963586",
                    "emailadresses": {
                        "email": ["sampleemail#sampledoman.co.org"
                        ]
                    }
                },
                "status": "OK",
                "channel": "mobile"
            }
        }
    ]
}
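
Both commands above rely on -E; on a sed that only supports BRE, the first one can be rewritten with escaped capturing parentheses. A minimal equivalent sketch:
# BRE variant of the first command above
$ sed 's/"email": \[\(.*\)/"email": ["\1"/' sample.json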

With your shown samples, please try the following awk code (written and tested in GNU awk). Setting RS to the empty string makes awk read the whole file as a single record (paragraph mode, and the file has no blank lines). awk's match() function is then used with the regex (.*)(\n[[:space:]]+"emailadresses": {\n[[:space:]]+"email": \[)([^\n]+)(.*), which creates 4 capturing groups; GNU awk's match() can save the captured values into an array, here named arr. The values are then printed as required, adding " before and after the email address value, which is the 3rd element of arr (the 3rd capturing group of the regex).
awk -v RS= '        # empty RS: paragraph mode; with no blank lines, the whole file is one record
match($0,/(.*)(\n[[:space:]]+"emailadresses": {\n[[:space:]]+"email": \[)([^\n]+)(.*)/,arr){
  print arr[1] arr[2] "\"" arr[3] "\"" arr[4]    # reprint all 4 groups, wrapping the 3rd (the email) in quotes
}
' Input_file

Related

Kubectl query for regex in JsonPath

I am trying to output the value of .metadata.name followed by the student's name from the .spec.template.spec.containers[].students[] array, using a regex in JSONPath for a kubectl query.
I had actually asked a similar question (linked here) about this in jq:
How do I print a specific value of an array given a condition in jq if there is no key specified
The solution worked but I am wondering if there is an alternative solution for it using JsonPath or go-template perhaps (without the need for using jq).
For example, if I check the students[] array if it contains the word "Jeff", I would like the output to display as below:
student-deployment: Jefferson
What I've tried:
For JsonPath, I've tried the query below:
kubectl get deployment -o=jsonpath="{range .items[?(@.spec.template.spec.containers[*].students[*])]}{'\n'}{.metadata.name}{':\t'}{range .spec.template.spec.containers[*]}{.students[?(@=="Jefferson")]}{end}{end}"
But this only works for evaluating exact matches. Would it be possible to use JSONPath to query with a regex? I've read here that the JSONPath regex operator =~ doesn't work. I did try piping to grep and findstr, but that still returned all the values inside the array back to me. Other than jq, is there another way to retrieve the regex output?
https://github.com/kubernetes/kubernetes/issues/61406
The deployment template below is in JSON; I shortened it to only the relevant parts.
{
    "apiVersion": "v1",
    "items": [
        {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "metadata": {
                "name": "student-deployment",
                "namespace": "default"
            },
            "spec": {
                "template": {
                    "spec": {
                        "containers": [
                            {
                                "students": [
                                    "Alice",
                                    "Bob",
                                    "Peter",
                                    "Sally",
                                    "Jefferson"
                                ]
                            }
                        ]
                    }
                }
            }
        }
    ]
}
The documentation for kubectl's JSONPath support clearly states that regular expressions are not supported, and that you can use a tool such as jq instead.
https://kubernetes.io/docs/reference/kubectl/jsonpath/
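For example, a minimal jq sketch of that approach (the "Jeff" pattern and the output format are assumptions based on the question, not something the documentation prescribes):
# sketch: prints "<deployment name>: <matching student>" for students matching the regex
kubectl get deployment -o json | jq -r '.items[]
  | "\(.metadata.name): \(.spec.template.spec.containers[].students[] | select(test("Jeff")))"'
With the deployment above, this should print student-deployment: Jefferson.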

Orion Context Broker Using Match Pattern with number

For example, I have a person like the one below.
I want to query for a person whose phoneNumber contains "354".
I would use a query like this: GET /v2/entities?q=phoneNumber~=354.
So is it possible to do a query like this in Orion Context Broker?
From what I have seen, the match pattern only supports target properties of type string:
Match pattern: ~=. The value matches a given pattern, expressed as a regular expression, e.g. color~=ow. For an entity to match, it must contain the target property (color) and the target property value must match the string in the right-hand side, 'ow' in this example (brown and yellow would match, black and white would not). This operation is only valid for target properties of type string.
http://telefonicaid.github.io/fiware-orion/api/v2/stable/ Section: Simple Query Language
{
    "type": "Person",
    "isPattern": "false",
    "id": "1",
    "attributes": [
        {
            "name": "phoneNumber",
            "type": "string",
            "value": "0102354678"
        }
    ]
}
Many thanks.
It works as you said.
For instance, using Orion 2.2.0 with an empty database in localhost:1026, creating an entity like the one you propose (but using NGSIv2 endpoint, as NGSIv1 is a deprecated API):
$ curl localhost:1026/v2/entities -H 'content-type: application/json' -d @- <<EOF
{
    "type": "Person",
    "id": "1",
    "phoneNumber": {
        "type": "string",
        "value": "0102354678"
    }
}
EOF
Then, you can do a query with the "354" pattern and you will get the entity:
$ curl -s -S 'localhost:1026/v2/entities?q=phoneNumber~=354' | python -mjson.tool
[
    {
        "id": "1",
        "phoneNumber": {
            "metadata": {},
            "type": "string",
            "value": "0102354678"
        },
        "type": "Person"
    }
]
On the contrary, if you do a query for a non-matching pattern (such as "999") you will not get any entity:
$ curl -s -S 'localhost:1026/v2/entities?q=phoneNumber~=999' | python -mjson.tool
[]
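Since ~= takes a regular expression, the pattern is not limited to plain substrings. An assumed example (my illustration, not part of the original exchange) anchoring the match to the start of the value:
# assumed example: ^0102 matches values starting with 0102, so entity "1" should be returned
$ curl -s -S 'localhost:1026/v2/entities?q=phoneNumber~=^0102' | python -mjson.tool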

Sed for parsing

I have a file:
"data_personnel": [
{
"id": "1",
"name": "Mathieu"
}
],
"struct_hospital": [
{
"id": "9",
"geo": "chamb",
"nb": ""
},
{
"id": "",
"geo": "jsj",
"nb": "SMITH"
},
{
"id": "10",
"geo": "",
"nb": "12"
},
{
"id": "2",
"geo": "marqui",
"nb": "20"
},
{
"id": "4",
"geo": "oliwo",
"nb": "1"
},
{
"id": "1",
"geo": "par",
"nb": "5"
}
]
How can I use sed to get all the values of geo in struct_hospital? (chamb, jsj, an empty string, marqui, oliwo, etc.)
The file can be in any form: with tabs, everything on one line, etc.
As pointed out by Sundeep, it makes more sense to use a proper JSON parser.
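For instance, a jq one-liner along these lines would do (a sketch on my part, assuming the fragment is wrapped in braces so that it forms a valid JSON document):
# sketch: assumes the shown fragment is enclosed in { ... } to be valid JSON
$ jq -r '.struct_hospital[].geo' input.json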
But if you are looking for a one-time quick and dirty solution, then this might do:
sed -n '/^"struct_hospital"/,/^]/s/^.*"geo"\s*:\s*"\([^"]*\)"\s*,\?.*$/\1/p' input.txt
Sample output:
chamb
jsj
marqui
oliwo
par
Explanation:
/^"struct_hospital"/,/^]/ - only consider lines between struct_hospital and the closing bracket.
s/.../\1/p search and replace; only print the first capturing subpattern of every matching line
^.*"geo"\s*:\s*"\(.*\)"\s*,\?.*$ matches the geo lines; captures the value following the colon
In case the input spans a single line, you can use another sed invocation as a preprocessor to insert line breaks:
sed 's/]\|,/\n&/g'
This makes the full command:
sed 's/]\|,/\n&/g' input.txt | sed -n '/^"struct_hospital"/,/^]/s/^.*"geo"\s*:\s*"\([^"]*\)"\s*,\?.*$/\1/p'

sed replace special characters followed by newline

I have the following file
"user_id1": "184767",
"timeStamp": "2017-03-08 19:55:25.000"
},
{
"user_id1": "146364",
"timeStamp": "2017-03-12 23:48:48.000"
},
]
I want to replace every }, that is followed by ] on the next line with }]
Try this:
sed '/},/N;s/,\n *\]/]/' file
When }, is found, N appends the next line to the pattern space, and the substitution then replaces the comma, the newline, and any leading spaces before ] with just ], joining the two lines as }].
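You can check the behaviour on a minimal two-line snippet (a quick demo, not from the original answer):
# demo: N joins "},\n]" into one pattern space, the substitution yields "}]"
$ printf '},\n]\n' | sed '/},/N;s/,\n *\]/]/'
}]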

Elasticsearch uses wrong Case Folding for Unicode Characters

In one of my projects, I am trying to use Elasticsearch (1.7) to query data, but it returns different results for Unicode characters depending on whether they are uppercased or not. I tried to use the ICU analyzer to get rid of the problem.
Here is a small example to demonstrate my problem. My index is like this:
$ curl -X PUT http://localhost:9200/tr-test -d '
{
    "mappings": {
        "names": {
            "properties": {
                "name": {
                    "type": "string"
                }
            }
        }
    },
    "settings": {
        "index": {
            "number_of_shards": "5",
            "number_of_replicas": "1",
            "analysis": {
                "filter": {
                    "nfkc_normalizer": {
                        "type": "icu_normalizer",
                        "name": "nfkc"
                    }
                },
                "analyzer": {
                    "my_lowercaser": {
                        "tokenizer": "icu_tokenizer",
                        "filter": [
                            "nfkc_normalizer"
                        ]
                    }
                }
            }
        }
    }
}'
Here is some test data to demonstrate my problem.
$ curl -X POST http://10.22.20.140:9200/tr-test/_bulk -d '
{"index": {"_type":"names", "_index":"tr-test"}}
{"name":"BAHADIR"}'
Here is a simple query. If I query using BAHADIR as the query_string, I can easily find my test data.
$ curl -X POST http://10.22.20.140:9200/tr-test/_search -d '
{
    "query": {
        "filtered": {
            "query": {
                "query_string": {
                    "query": "BAHADIR"
                }
            }
        }
    }
}'
In Turkish, the lowercased version of BAHADIR is bahadır. I am expecting the same result when querying with bahadır, but Elasticsearch cannot find my data, and I cannot fix that by using ICU for analysis. It works perfectly fine if I query with bahadir.
I have already read Living in a Unicode World and Unicode Case Folding, but I cannot fix my problem. I still cannot make Elasticsearch use the correct case folding.
Update
I also tried to create my index like this:
$ curl -X PUT http://localhost:9200/tr-test -d '
{
    "mappings": {
        "names": {
            "properties": {
                "name": {
                    "type": "string",
                    "analyzer": "turkish"
                }
            }
        }
    },
    "settings": {
        "index": {
            "number_of_shards": "5",
            "number_of_replicas": "1"
        }
    }
}'
But I am getting the same results. My data can be found by searching for BAHADIR or bahadir, but not by searching for bahadır, which is the correct lowercased version of BAHADIR.
You should try using the Turkish language analyzer in your settings:
{
    "mappings": {
        "names": {
            "properties": {
                "name": {
                    "type": "string",
                    "analyzer": "turkish"
                }
            }
        }
    }
}
As you can see in the implementation details, it also defines a turkish_lowercase filter, so I guess it will take care of your problem for you. If you don't want all the other features of the Turkish analyzer, define a custom one with only turkish_lowercase, as sketched below.
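A minimal sketch of such a setup (the analyzer name my_turkish_lowercaser and the standard tokenizer are my choices; the lowercase token filter's "language": "turkish" option is documented Elasticsearch functionality):
# illustrative settings: the analyzer name is an assumption
$ curl -X PUT http://localhost:9200/tr-test -d '
{
    "settings": {
        "analysis": {
            "filter": {
                "turkish_lowercase": {
                    "type": "lowercase",
                    "language": "turkish"
                }
            },
            "analyzer": {
                "my_turkish_lowercaser": {
                    "tokenizer": "standard",
                    "filter": [
                        "turkish_lowercase"
                    ]
                }
            }
        }
    },
    "mappings": {
        "names": {
            "properties": {
                "name": {
                    "type": "string",
                    "analyzer": "my_turkish_lowercaser"
                }
            }
        }
    }
}'
You can then verify how a term is folded with the _analyze API (1.x syntax):
# should return a single token: bahadır
$ curl 'localhost:9200/tr-test/_analyze?analyzer=my_turkish_lowercaser&text=BAHADIR'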
If you need full-text search on your name field, you should also change the query method to a match query, which is the basic full-text search method for a single field:
{
    "query": {
        "match": {
            "name": "bahadır"
        }
    }
}
On the other hand, the query_string query is more complex: it searches on multiple fields and allows an advanced syntax. It also has an option to pass the analyzer you want to use, so if you really need this kind of query, you could try passing "analyzer": "turkish" within the query. I'm not an expert on the query_string query, though.
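A hedged sketch of what that could look like (my illustration, untested; query_string does accept an analyzer parameter):
# illustrative only: query_string with an explicit analyzer
$ curl -X POST http://localhost:9200/tr-test/_search -d '
{
    "query": {
        "query_string": {
            "query": "bahadır",
            "analyzer": "turkish"
        }
    }
}'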