Name matching using Spark - Scala

I have a very large customer dataset on HDFS, with metadata, but the column names do not tell us which column contains what data (e.g. customer name, card number, phone, email, address).
I also cannot peek into the data to check what it contains.
My task is to build a model that identifies the type of sensitive data each column contains and then masks that data. We have masking rules for each type of sensitive data.
Taking names as an example: how can I find out which column contains the customers' names?

Hi, to find the column names in Spark you can use:
df.printSchema()
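printSchema() only reveals column names and types; to guess which column actually holds names you typically sample values and apply heuristics. Below is a minimal sketch of such a classifier (plain Python for brevity; the regexes and threshold are illustrative assumptions, not production masking rules, and in Spark you would feed it a per-column sample such as df.select(c).sample(0.01) collected to the driver):

```python
import re

# Illustrative patterns -- assumptions for the sketch, not real masking rules.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\+?[0-9 ()\-]{7,}$"),
    "name":  re.compile(r"^[A-Za-z][A-Za-z .'\-]*$"),
}

def classify_column(sample, threshold=0.8):
    """Guess a sensitive-data type from sampled string values of one column."""
    values = [v.strip() for v in sample if v and v.strip()]
    if not values:
        return "unknown"
    # Check the most specific patterns first: an email or phone value
    # would otherwise never be mistaken for a name, but the reverse could happen.
    for label in ("email", "phone", "name"):
        hits = sum(1 for v in values if PATTERNS[label].match(v))
        if hits / len(values) >= threshold:
            return label
    return "unknown"
```

A real implementation would likely combine this with dictionary lookups (common first/last names) and per-country formats, but the sample-and-score structure stays the same.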

Related

Schema compliance in Azure data factory

I am trying to do schema compliance checking of an input file in ADF. I have tried the below:
Get Metadata Activity
The schema validation that is available in source activity
But the above only seems to check whether a particular field is present in the specified position. Also, ADF by default treats all these fields as strings, since the input is a flat file.
I want to check the position and the datatype as well. For example:
empid,name,salary
1,abc,10
2,def,20
3,ghi,50
xyz,jkl,10
The row with empid xyz needs to be rejected because it is not numeric. Any help is appreciated.
You can use a Data Flow and create a filter to achieve this.
Below is my test:
1. Create a source.
2. Create a filter with this expression: regexMatch(empid,'(\\d+)')
3. Check the output.
Hope this can help you.
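The same rejection rule can be sketched outside ADF. A small Python example keeping only rows whose empid field is numeric (the pattern here is anchored with ^ and $, which is stricter than a partial regex match, so a value like 'x1z' is also rejected):

```python
import re

NUMERIC = re.compile(r"^\d+$")

def valid_rows(csv_lines):
    """Keep rows whose first field (empid) is purely numeric,
    mirroring the data-flow filter on empid above."""
    rows = [line.split(",") for line in csv_lines]
    return [r for r in rows if r and NUMERIC.match(r[0])]
```

Applied to the sample above, only the rows with empid 1, 2, and 3 survive; the xyz row is dropped.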

Azure Data Factory Copy using Variable

I am copying data from a REST API to an Azure SQL database. The copy is working fine, but there is a column which isn't returned by the API.
What I want to do is add this column to the source. I have a variable called symbol which I want to use as the source column. However, this isn't working:
(screenshot of the attempted mapping)
Any ideas?
This functionality is available using the "Additional Columns" feature of the Copy Activity.
If you navigate to the "Source" area, the bottom of the page will show you an area where you can add Additional Columns. Clicking the "New" button will let you enter a name and a value (which can be dynamic), which will be added to the output.
Source(s):
https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-overview#add-additional-columns-during-copy
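For reference, this is roughly how the setting looks in the Copy Activity JSON. This is a sketch based on the linked docs: the variable name symbol comes from the question, and the source type and other properties are placeholders for your pipeline:

```json
"source": {
    "type": "RestSource",
    "additionalColumns": [
        {
            "name": "symbol",
            "value": {
                "value": "@variables('symbol')",
                "type": "Expression"
            }
        }
    ]
}
```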
To my knowledge, the Copy activity may not meet your requirements. Please see the error conditions in the link:
Source data store query result does not have a column name that is
specified in the input dataset "structure" section.
Sink data store (if with pre-defined schema) does not have a column
name that is specified in the output dataset "structure" section.
Either fewer columns or more columns in the "structure" of sink
dataset than specified in the mapping.
Duplicate mapping.
I think Mapping Data Flow is your choice. You could add a Derived Column transformation before the sink dataset and create a parameter named Symbol.
Then set the derived column to the value of Symbol.
You can use the Copy Activity with a stored proc sink to do that. See my answer here for more info.

What kind of data does a SQL table need in order to build a map report in SSRS?

Please pardon my ignorance, I'm new to spatial data. I've been tasked with creating a map report in SSRS. When finished, the report will show locations of stores participating in the same promotion.
Currently, my database doesn't contain this information, so I have to create a table first. Luckily, I have a spreadsheet that does contain this information, so I can just import it. However, I'm not convinced that my spreadsheet has all the information I need. It shows basic information such as store name and address, but no geographical information (which I'm assuming I need).
So my question: What kind of data does a SQL table need in order to build a map report in SSRS?
You can create a custom table with the specific information that you want, but there is some important information that you mustn't forget, for example:
From the spatial data source:
SpatialData - A field that has spatial data that specifies the latitude and longitude of the city.
Name - A field that has the name of the city.
Area - A field that has the name of the region.
From the analytical data source:
Population - A field that has the city population.
City - A field that has the name of the city.
Area - A field that has the name of the territory, state, or region.

Where does Odoo 9 physically store the `image` field of `res.partner` records in the database?

I can't find an image column in the res_partner table in an Odoo 9 PostgreSQL database. Where does Odoo 9 store this image field?
As of Odoo 9, many binary fields have been modified to be stored inside the ir.attachment model (ir_attachment table). This was done in order to benefit from the filesystem storage (and deduplication properties) and avoid bloating the database.
This is enabled on binary fields with the attachment=True parameter, as it is done for res.partner's image fields.
When active, the get() and set() methods of the binary field will store and retrieve the value in the ir_attachment table. If you look at the code, you will see that the attachments use the following values to establish the link to the original record:
name: name of the binary field, e.g. image
res_field: name of the binary field, e.g. image
res_model: model containing the field, e.g. res.partner
res_id: ID of the record the binary field belongs to
type: 'binary'
datas: virtual field with the contents of the binary field, which is actually stored on disk
So if you'd like to retrieve the ir.attachment record holding the value of the image of res.partner with ID 32, you could use the following SQL:
SELECT id, store_fname FROM ir_attachment
WHERE res_model = 'res.partner' AND res_field = 'image' AND res_id = 32;
Because ir_attachment entries use the filesystem storage by default, the actual value of the store_fname field will give you the path to the image file inside your Odoo filestore, in the form 'ab/abcdef0123456789' where the abc... value is the SHA-1 hash of the file. This is how Odoo implements de-duplication of attachments: several attachments with the same file will map to the same unique file on disk.
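Based on the path format described above, the derivation can be sketched as follows (a simplified illustration, not Odoo's actual code: the hex SHA-1 digest of the file contents, with its own first two characters as the directory):

```python
import hashlib

def filestore_path(contents: bytes) -> str:
    """Sketch of how a store_fname path of the form 'ab/abcdef...' can be
    derived: SHA-1 hex digest of the file contents, prefixed with its
    first two characters as the directory. Identical contents always
    produce the same path, which is what enables de-duplication."""
    digest = hashlib.sha1(contents).hexdigest()
    return f"{digest[:2]}/{digest}"
```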
If you'd like to modify the value of the image field programmatically, it is strongly recommended to use the ORM API (e.g. the write() method), to avoid creating inconsistencies or having to manually re-implement the file storage system.
References
Here is the original 9.0 commit that introduces the feature for storing binary fields as attachments
And the 9.0 commit that converts the image field of res.partner to use it.

Azure Mobile Service query returns all lower case column name in JSON

I am new to Azure Mobile Services. I noticed that whenever I use the Azure online tools (in manage.windowsazure.com) to create a new column for a table, it always turns my uppercase column name into lowercase (for example: I typed FullName for a column name, but it became fullname).
Now, if I use angular-azure-mobile-service to query data, it returns "fullname" in the JSON, such as {'fullname': 'ABC Inc'}.
Is there any way I can have the returned JSON formatted as {'FullName': ...} instead?
The column names for Azure Mobile Services are case insensitive, so we transform them to ensure there is no confusion. This should be completely separate from display issues. If you are worried about display, wrap the JSON in another object that transforms it appropriately.
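If the casing only matters for display, one way to wrap the response is to map the lowercased keys back on the client. A minimal sketch (shown in Python for brevity; the same dictionary mapping ports directly to the Angular client, and KEY_MAP is an assumption for this one table):

```python
# Client-side rename of lowercased keys returned by the service.
# KEY_MAP is a hypothetical mapping for this one table.
KEY_MAP = {"fullname": "FullName"}

def rename_keys(record):
    """Return a copy of the record with known keys renamed."""
    return {KEY_MAP.get(k, k): v for k, v in record.items()}
```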