Parse record (PCF) from Kafka using Kafka Kusto Sink


I've set up my environment using Docker, based on this guide.
Using kafka-console-producer, I send this line:
Hazriq|27|Undegrad|UNITEN
I want this data to be ingested to Kusto like this:
+--------+-----+----------------+------------+
| Name | Age | EducationLevel | University |
+--------+-----+----------------+------------+
| Hazriq | 27 | Undegrad | UNITEN |
+--------+-----+----------------+------------+
Can this be handled by Kusto using the mapping (which I'm still trying to understand), or should it be handled on the Kafka side?
I tried @daniel's suggestion:
.create table ParsedTable (name: string, age: int, educationLevel: string, univ:string)
.create table ParsedTable ingestion csv mapping 'ParsedTableMapping' '[{ "Name" : "name", "Ordinal" : 0},{ "Name" : "age", "Ordinal" : 1 },{ "Name" : "educationLevel", "Ordinal" : 2},{ "Name" : "univ", "Ordinal" : 3}]'
kusto.tables.topics_mapping=[{'topic': 'kafkatopiclugiaparser','db': 'kusto-test', 'table': 'ParsedTable','format': 'psv', 'mapping':'ParsedTableMapping'}]
value.converter=org.apache.kafka.connect.storage.StringConverter
key.converter=org.apache.kafka.connect.storage.StringConverter
but I'm getting this instead:
+----------------------------+-----+----------------+------+
| Name | Age | EducationLevel | Univ |
+----------------------------+-----+----------------+------+
| Hazriq|27|Undergrad|UNITEN | | | |
+----------------------------+-----+----------------+------+

Currently, the connector passes the data through as-is (it does no manipulation on the client side), and any parsing is left to Kusto.
That said, the psv format is supported by Kusto, so this should be possible by setting the format to psv and providing a mapping reference.
Once the plugin is added as described, you should be able to configure it like this:
kusto.tables.topics_mapping=[{'topic': 'testing1','db': 'testDB', 'table': 'KafkaTest','format': 'psv', 'mapping':'KafkaMapping'}]
The mapping can be defined in Kusto as described in the Kusto docs.

Ingesting data like the sample you've shown using the psv format is supported (see below), so it's probably just a matter of debugging why your client-side invocation of the underlying commands isn't yielding the expected result. If you could share the full flow and code, including parameters, that may help.
.create table ParsedTable (name: string, age: int, educationLevel: string, univ:string)
.ingest inline into table ParsedTable with(format=psv) <| Hazriq|27|Undegrad|UNITEN
ParsedTable:
| name | age | educationLevel | univ |
|--------|-----|----------------|--------|
| Hazriq | 27 | Undegrad | UNITEN |
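To reason about the output in the question, here is a small Python sketch (not the connector's or Kusto's actual code) of how an ordinal-based mapping assigns delimited fields to columns, using the column names and ordinals of ParsedTableMapping. When the record is split on `|` (psv), each field lands in its own column; when it is split on `,` (as csv would be), the whole line becomes field 0 and ends up in the first column. The latter matches the result shown in the question, which suggests checking whether the psv format is really applied end to end. That is an inference from the symptom, not a confirmed diagnosis.

```python
# Sketch only: mimics ordinal-based column mapping for delimited text.
# Column names/ordinals follow ParsedTableMapping from the question.
# Values stay as strings here; Kusto would convert "27" to int itself.
MAPPING = [("name", 0), ("age", 1), ("educationLevel", 2), ("univ", 3)]


def map_record(line, delimiter):
    """Split one record on `delimiter` and place each field by ordinal.

    Columns whose ordinal has no matching field are left empty (None)."""
    fields = line.split(delimiter)
    return {
        column: fields[ordinal] if ordinal < len(fields) else None
        for column, ordinal in MAPPING
    }


record = "Hazriq|27|Undegrad|UNITEN"
print(map_record(record, "|"))  # psv: one field per column
print(map_record(record, ","))  # csv: whole line lands in "name"
```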

Related

Forum like data structure: NoSQL appropriate?

I'm trying to save data which has a "forum like" structure:
This is the simplified data model:
+---------------+
| Forum |
| |
| Name |
| Category |
| URL |
| |
+---------------+
|1
|n
+---------------+
| |
| Thread |
| |
| ID |
| Name |
| Author |
| Creation Date |
| URL |
| |
+---------------+
|1
|n
+---------------+
| |
| Post |
| |
| Creation Date |
| Links |
| Images |
| |
+---------------+
I have multiple forums/boards. Each can have several threads, and a thread can contain n posts (for data-analysis purposes I'm only interested in the links, images, and creation dates a thread contains).
I'm looking for the right technology for saving and reading data in a structure like this.
While I've used SQL databases heavily in the past, I've also done some NoSQL projects (primarily document-based, with MongoDB).
I'm sure MongoDB is excellent for STORING data in such a structure (Forum is a document, while the Threads are subdocuments. Posts are subdocuments in Threads). But what about reading them? I have the following use cases:
List all posts from a forum with a specific Category
Find a specific link in a Post in all datasets/documents
Which technology is best for those use cases?

Please find my draft solution below; I have considered MongoDB for this design.
Post collection:
"image" should be stored separately in GridFS, because a MongoDB document has a maximum size of 16 MB. You can store the ObjectId of the image in the Post document.
{
"_id" : ObjectId("57b6f7d78f19ac1e1fcec7b5"),
"createdate" : ISODate("2013-03-16T02:50:27.877Z"),
"links" : "google.com",
"image" : ObjectId("5143ddf3bcf1bf4ab37d9c6e"),
"thread" : [
{
"id" : ObjectId("5143ddf3bcf1bf4ab37d9c6e"),
"name" : "Sam",
"author" : "Sam",
"createdate" : ISODate("2013-03-16T02:50:27.877Z"),
"url" : "https://www.wikipedia.org/"
}
],
"forum" : [
{
"name" : "Andy",
"category" : "technology",
"url" : "https://www.infoq.com/"
}
]
}
To access the data by category, you can create an index on the "forum.category" field.
db.post.createIndex( { "forum.category": 1 } )
To access the data by links, you can create an index on the "links" field.
db.post.createIndex( { "links": 1 } )
Please note that the indexes are not mandatory: you can query the data without them as well. Create indexes if you need better read performance.
I have seen applications use MongoDB for a similar use case. You can go ahead with MongoDB for the use cases (access patterns) mentioned above.
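The two read patterns from the question can be sketched in Python against a pymongo-style collection, i.e. any object with a `find(filter)` method. Field names follow the draft Post document; the function names are illustrative only.

```python
# Sketch: the two access patterns from the question, written against a
# pymongo-style collection (anything with a find(filter) method).
# Field names follow the draft Post document.


def posts_in_category(posts, category):
    """All posts whose embedded forum has the given category.

    The dotted path "forum.category" also reaches into the "forum"
    array, and the { "forum.category": 1 } index can serve it."""
    return list(posts.find({"forum.category": category}))


def posts_with_link(posts, link):
    """All posts containing the given link.

    If "links" is stored as an array, an equality filter matches any
    element of it, so the same query works for one or many links."""
    return list(posts.find({"links": link}))
```

With pymongo this could be called as `posts_in_category(MongoClient()["forumdb"]["post"], "technology")`, where the database name `forumdb` is hypothetical.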

Accessing postgres data structure

I have a table in my Postgres database whose data is structured strangely. Here is an example of the data structure:
id | 1
name | name
data | :type: information
| :url: url
| :platform:
| android: ''
| iphone: ''
created_at | 2016-07-29 11:39:44.938359
updated_at | 2016-08-22 12:24:32.734321
How do I change data > platform > android, for example?

After some more research I found this, which did the trick:
postgresql - replace all instances of a string within text field
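The linked answer relies on PostgreSQL's `replace(string, from, to)` function. A minimal sketch of applying it to the YAML-style `data` column from Python through a DB-API cursor (for example psycopg2) follows; the table name `my_table` and the helper name are assumptions. Because `replace()` treats `data` as plain text, the `from` string should be specific enough to match only the line you mean, and the `WHERE id = ...` clause keeps the change to one row.

```python
# Sketch: update one key inside a YAML text column via SQL replace().
# "my_table" is a hypothetical name; "id" and "data" come from the question.
SQL = "UPDATE my_table SET data = replace(data, %s, %s) WHERE id = %s"


def set_platform_value(cursor, row_id, key, old_value, new_value):
    """Rewrite "key: old_value" as "key: new_value" in one row's data.

    Values are passed as query parameters, so YAML quotes such as ''
    need no manual SQL escaping."""
    old_text = f"{key}: {old_value}"
    new_text = f"{key}: {new_value}"
    cursor.execute(SQL, (old_text, new_text, row_id))
```

For example, `set_platform_value(cur, 1, "android", "''", "'1.0'")` followed by `conn.commit()`.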

Sane way to store different data types within same column in postgres?

I'm currently attempting to modify an existing API that interacts with a Postgres database. Long story short, it essentially stores descriptors/metadata that determine where an actual 'asset' (typically a file of some sort) is stored on the server's hard disk.
Currently, it's possible to 'tag' these 'assets' with any number of arbitrary key-value pairs (e.g. uploadedBy, addedOn, assetType). These tags are stored in a separate table with a structure similar to the following:
+---------------+----------------+-------------+
|assetid (text) | tagid(integer) | value(text) |
|---------------+----------------+-------------|
|someStringValue| 1234 | someValue |
|---------------+----------------+-------------|
|aDiffStringKey | 1235 | a username |
|---------------+----------------+-------------|
|aDiffStrKey | 1236 | Nov 5, 1605 |
+---------------+----------------+-------------+
assetid and tagid are foreign keys from other tables. Think of the assetid representing a file and the tagid/value pair is a map of descriptors.
Right now, the API (which is in Java) creates all these key-value pairs as a Map object, including things like timestamps/dates. What we'd like is to be able to store different types of data for the value in the key-value pair, or at least store it differently within the database, so that if needed we could run queries on these tags, such as date-range checks. However, if they're stored as text in the db, we'd have to a) know that an item is actually a date/time/timestamp, and b) convert it into something we could actually run such a query on.
There is only one idea I've come up with so far that doesn't change the layout of the db too much.
It is to expand the assettag table (shown above) to have additional columns for various types (numeric, text, timestamp), allow them to be null, and then on insert, checking the corresponding 'key' to figure out what type of data it really is. However, I can see a lot of problems with that sort of implementation.
Can any PostgreSQL-Ninjas out there offer a suggestion on how to approach this problem? I'm only recently getting thrown back into the deep-end of database interactions, so I admit I'm a bit rusty.

You've basically got two choices:
Option 1: A sparse table
Have one column for each data type, but only use the column that matches the data type you want to store. Of course this leaves most columns null, which wastes space, but purists like it because of the strong typing. It's a bit clunky to have to check each column for null to figure out which data type applies. Also, too bad if you actually want to store a null: then you must choose a specific value that "means null", which is more clunkiness.
Option 2: Two columns - one for content, one for type
Everything can be expressed as text, so have a text column for the value and another column (int or text) for the type, so your app code can restore the value into an object of the correct type. The advantages are that you don't have lots of nulls and, importantly, you can easily extend the types beyond SQL data types to application classes by storing their value as JSON and their type as the class name.
I have used option 2 several times in my career and it was always very successful.
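Option 2 can be sketched in Python (the question's API is Java, but the idea carries over): each value is stored as text plus a type tag, and the app uses the tag to restore the original type. The tag names and helper names below are assumptions, not something the answer prescribes.

```python
from datetime import datetime

# Hypothetical type tags -> functions that rebuild a value from its text.
_DECODERS = {
    "text": str,
    "int": int,
    "float": float,
    "bool": lambda s: s == "true",
    "timestamp": datetime.fromisoformat,
}


def encode(value):
    """Return (value_text, type_tag) for storage in two text columns."""
    # Check bool before int: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return ("true" if value else "false", "bool")
    if isinstance(value, int):
        return (str(value), "int")
    if isinstance(value, float):
        return (repr(value), "float")  # repr round-trips floats exactly
    if isinstance(value, datetime):
        return (value.isoformat(), "timestamp")
    if isinstance(value, str):
        return (value, "text")
    raise TypeError(f"unsupported type: {type(value).__name__}")


def decode(value_text, type_tag):
    """Restore a typed value from its stored text and type tag."""
    return _DECODERS[type_tag](value_text)
```

Storing the ISO 8601 text for timestamps also keeps a later migration to a real timestamp column straightforward.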

Another option, depending on what you're doing, could be to have just one value column but wrap the value in some JSON...
This could look something like:
{
"type": "datetime",
"value": "2019-05-31 13:51:36"
}
You could go a step further and use a json/jsonb or XML column.
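With a jsonb column holding the {"type", "value"} envelope, the date-range queries the question asks about become possible in SQL. A sketch, assuming the `value` column of the `assettag` table has been changed to jsonb and that a DB-API cursor such as psycopg2's is used; the function name is illustrative. The `CASE` guards the cast, so rows of other types are never converted to timestamp.

```python
from datetime import datetime

# Sketch: date-range filter over {"type": ..., "value": ...} envelopes.
# Assumes assettag.value has been changed to a jsonb column.
DATE_RANGE_SQL = """
SELECT assetid, tagid, value
FROM assettag
WHERE CASE WHEN value->>'type' = 'datetime'
           THEN (value->>'value')::timestamp
      END BETWEEN %s AND %s
"""


def tags_between(cursor, start: datetime, end: datetime):
    """Return tag rows whose datetime value lies in [start, end].

    Rows of other types yield NULL from the CASE, so BETWEEN is not
    true for them and they are filtered out without a failed cast."""
    cursor.execute(DATE_RANGE_SQL, (start, end))
    return cursor.fetchall()
```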

I'm not a PostgreSQL ninja in any way, but I think that instead of two columns (one for name and one for type) you could look at the hstore data type, which the PostgreSQL documentation describes as a "data type for storing sets of key/value pairs within a single PostgreSQL value. This can be useful in various scenarios, such as rows with many attributes that are rarely examined, or semi-structured data. Keys and values are simply text strings."
Of course, since values are stored as text, you'll have to check how dates/timestamps convert to and from this type and whether that works for you.

You can use two different techniques:
If the type can vary for a given tagid:
Define a table name and ID for every tagid-assetid combination, plus the actual data tables:
maintable:
+---------------+----------------+-----------------+---------------+
|assetid (text) | tagid(integer) | tablename(text) | table_id(int) |
|---------------+----------------+-----------------+---------------|
|someStringValue| 1234 | tablebool | 123 |
|---------------+----------------+-----------------+---------------|
|aDiffStringKey | 1235 | tablefloat | 123 |
|---------------+----------------+-----------------+---------------|
|aDiffStrKey | 1236 | tablestring | 123 |
+---------------+----------------+-----------------+---------------+
tablebool
+-------------+-------------+
| id(integer) | value(bool) |
|-------------+-------------|
| 123 | False |
+-------------+-------------+
tablefloat
+-------------+--------------+
| id(integer) | value(float) |
|-------------+--------------|
| 123 | 12.345 |
+-------------+--------------+
tablestring
+-------------+---------------+
| id(integer) | value(string) |
|-------------+---------------|
| 123 | 'text' |
+-------------+---------------+
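For this first technique, the application has to pick the data table from the value's type and write a row to both that table and maintable. A Python sketch that only builds the parameterized statements; table and column names come from the diagrams, while the type-to-table mapping and the function name are assumptions.

```python
# Sketch: route a typed value to its data table (first technique).
# Table/column names follow the example diagrams; adjust to your schema.
TABLE_FOR_TYPE = {bool: "tablebool", float: "tablefloat", str: "tablestring"}


def insert_statements(assetid, tagid, row_id, value):
    """Return (sql, params) pairs that store one tag value.

    type(value) is looked up exactly, so True goes to tablebool and an
    int is rejected rather than silently treated as a float."""
    table = TABLE_FOR_TYPE.get(type(value))
    if table is None:
        raise TypeError(f"no data table for {type(value).__name__}")
    # The table name comes from the fixed mapping above, never from user
    # input, so interpolating it into the SQL text is safe here.
    return [
        (f"INSERT INTO {table} (id, value) VALUES (%s, %s)", (row_id, value)),
        (
            "INSERT INTO maintable (assetid, tagid, tablename, table_id)"
            " VALUES (%s, %s, %s, %s)",
            (assetid, tagid, table, row_id),
        ),
    ]
```

Each pair can be passed to a DB-API `cursor.execute(sql, params)` inside one transaction.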
If every tagid has a fixed type:
Create a tagid description table:
tag descriptors
+---------------+----------------+-----------------+
|assetid (text) | tagid(integer) | tablename(text) |
|---------------+----------------+-----------------|
|someStringValue| 1234 | tablebool |
|---------------+----------------+-----------------|
|aDiffStringKey | 1235 | tablefloat |
|---------------+----------------+-----------------|
|aDiffStrKey | 1236 | tablestring |
+---------------+----------------+-----------------+
and the corresponding data tables:
tablebool
+-------------+----------------+-------------+
| id(integer) | tagid(integer) | value(bool) |
|-------------+----------------+-------------|
| 123 | 1234 | False |
+-------------+----------------+-------------+
tablefloat
+-------------+----------------+--------------+
| id(integer) | tagid(integer) | value(float) |
|-------------+----------------+--------------|
| 123 | 1235 | 12.345 |
+-------------+----------------+--------------+
tablestring
+-------------+----------------+---------------+
| id(integer) | tagid(integer) | value(string) |
|-------------+----------------+---------------|
| 123 | 1236 | 'text' |
+-------------+----------------+---------------+
All of this is just to give a general idea; you should adapt it to your needs.

Update a single value in a database table through form submission

Here is my table in the database :
id | account_name | account_number | account_type | address | email | ifsc_code | is_default_account | phone_num | User
-----+--------------+----------------+--------------+---------+------------------------------+-----------+--------------------+-------------+----------
201 | helloi32irn | 55265766432454 | Savings | | mypal.appa99721989@gmail.com | 5545 | f | 98654567876 | abc
195 | hello | 55265766435523 | Savings | | mypal.1989@gmail.com | 5545 | t | 98654567876 | axyz
203 | what | 01010101010101 | Current | | guillaume@sample.com | 6123 | f | 09099990 | abc
On form submission, the view posts only a single parameter, name="activate", which corresponds to the column "is_default_account" in the table.
I want to change the value of "is_default_account" from "t" to "f". For example, in this table it is "t" for account_name "hello". I want to deactivate it, i.e. make it "f", and activate whichever other account was sent through the form.

This will update your table and make account 'what' the default (assuming is_default_account is a BOOLEAN field):
UPDATE <your_table>
SET is_default_account = (account_name = 'what')
You may want to limit the updates if the table has more than the few rows you listed, like this:
UPDATE <your_table>
SET is_default_account = (account_name = 'what')
WHERE is_default_account != (account_name = 'what')
AND <limit updates by some other criteria like user name>
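Combining this trick with sending the account id from the form, a sketch using a DB-API cursor (e.g. psycopg2) could look like the following. The table name `accounts` is an assumption (the question doesn't give it), as is treating `User` as a quoted column name, since user is a reserved word in PostgreSQL. The single boolean expression turns the chosen account on and every other account of that user off; `IS DISTINCT FROM` is used instead of `!=` so rows with a NULL flag are also corrected.

```python
# Sketch: make one account the user's default, all others non-default.
# "accounts" is a hypothetical table name; "User" is assumed to be a
# quoted column name, since user is a reserved word in PostgreSQL.
SET_DEFAULT_SQL = """
UPDATE accounts
SET is_default_account = (id = %s)
WHERE "User" = %s
  AND is_default_account IS DISTINCT FROM (id = %s)
"""


def set_default_account(cursor, user, account_id):
    """Activate account_id for user and deactivate that user's others.

    Rows that already hold the right value are skipped, so the returned
    rowcount is the number of rows that actually changed."""
    cursor.execute(SET_DEFAULT_SQL, (account_id, user, account_id))
    return cursor.rowcount
```

For the sample data, `set_default_account(cur, "abc", 203)` would make account 203 the default for user abc and leave 201 as non-default.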

I think that to accomplish what you want, you should send at least two values from the form: one for the id of the account you want to update and one for the action (activate here). You could also send just the id and have it toggle. There are many ways to do this, but I can't figure out exactly what you are trying to do, or whether you want SQL or Play Framework code. Without limiting your update somehow (e.g. by id), you can't precisely control which rows get updated. Please clarify your question and add some more code if you want help on the Play Framework side, which I think you do.

Where can I find the documentation for MicroStrategy Command Manager

Where can I find the documentation for MicroStrategy Command Manager? I've looked through various docs that I have but haven't been able to find a comprehensive list of commands. In particular, I need a list of the commands to create attributes and metrics.
Thanks

The best way to find the list of available commands is to use the Outlines option, available inside Command Manager.
To create an attribute use the following syntax:
CREATE ATTRIBUTE "<attribute_name>" [DESCRIPTION "<description>"] [LONGDESCRIPTION "<long_description>"] IN [FOLDER] "<location_path>" [HIDDEN (TRUE | FALSE)] ATTRIBUTEFORM "<form_name>" [FORMCATEGORY "<category_name>"] [FORMDESC "<form_description>"] [FORMTYPE (NUMBER | TEXT | DATETIME | DATE | TIME | URL | EMAIL | HTML | PICTURE | BIGDECIMAL | PHONENUMBER)] [REPORTSORT (NONE | ASC | DESC)] [BROWSESORT (NONE | ASC | DESC)] EXPRESSION "<form_expression>" [MAPPINGMODE (AUTOMATIC | MANUAL)] [EXPSOURCETABLES "<sourcetable1>" [, "<sourcetable2>" ...]] LOOKUPTABLE "<lookup_table>" FOR PROJECT "<project_name>";
and for metrics creation:
CREATE METRIC "<metric_name>" IN [FOLDER] "<location_path>" EXPRESSION "<expression>" [DESCRIPTION "<description>"] [LONGDESCRIPTION "<long_description>"] [HIDDEN (TRUE | FALSE)] [ALLOWSMARTMETRIC (TRUE | FALSE)] [REMOVEREPORTFILTERELEMENTS (TRUE | FALSE)] [TOTALSUBTOTALFUNCTION (AVERAGE | COUNT | DEFAULT| GEOMETRICMEAN | MAXIMUM | MEDIAN | MINIMUM | MODE | NONE | PRODUCT | STANDARDDEVIATION | SUM | VARIANCE)] [DYNAMICAGGREGATIONFUNCTION (AVERAGE | COUNT | DEFAULT| GEOMETRICMEAN | MAXIMUM | MEDIAN | MINIMUM | MODE | NONE | PRODUCT | STANDARDDEVIATION | SUM | VARIANCE)] [COLUMNALIAS "<columnalias>"] FOR PROJECT "<project_name>";
Some simple examples are provided below (again from the outlines available inside Command Manager):
CREATE ATTRIBUTE "Day" DESCRIPTION "Duplicate of Day Attribute from folder \Time" IN FOLDER "\Schema Objects\Attributes" ATTRIBUTEFORM "ID" FORMCATEGORY "ID" FORMDESC "Basic ID form" FORMTYPE TEXT REPORTSORT ASC EXPRESSION "[DAY_DATE]" LOOKUPTABLE "LU_DAY" FOR PROJECT "MicroStrategy Tutorial";
CREATE METRIC "New Metric" IN FOLDER "\Public Objects\Metrics\Count Metrics" EXPRESSION "Count(Customer) {Country} <[Western United States Customers]>" FOR PROJECT "MicroStrategy Tutorial";
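If you need to create many metrics, one option is to generate the statements with a script and run the resulting file in Command Manager. A minimal Python sketch that fills in only the required clauses of the CREATE METRIC syntax plus the optional DESCRIPTION; the function name is illustrative, and values containing double quotes are rejected because escaping rules for Command Manager are not covered here.

```python
def create_metric(name, folder, expression, project, description=None):
    """Build a CREATE METRIC statement following the outline syntax.

    Only the required clauses and the optional DESCRIPTION are covered;
    clause order follows the syntax shown in the outline."""
    values = [name, folder, expression, project, description or ""]
    if any('"' in v for v in values):
        raise ValueError("double quotes in values are not handled here")
    parts = [
        f'CREATE METRIC "{name}" IN FOLDER "{folder}"',
        f'EXPRESSION "{expression}"',
    ]
    if description is not None:
        parts.append(f'DESCRIPTION "{description}"')
    parts.append(f'FOR PROJECT "{project}";')
    return " ".join(parts)
```

Called with the values from the "New Metric" example, it reproduces that statement exactly.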

As Bruno said, the Outlines option gives you all the templates available in Command Manager, which you can combine and adapt to do what you want. If in doubt, check out the "MicroStrategy System Administration Guide" product manual, which covers everything you need to know about Command Manager.

Here is one useful site with Command Manager documentation:
https://metacpan.org/pod/Business::Intelligence::MicroStrategy::CommandManager
