Magento 2 customer online grid - missing fields: IP, Last URL, etc.

How can I bring back fields such as IP, Last URL, etc. in the Magento 2 (version 2.0.0) customer online grid?
There are only 6 columns available in the columns menu. In the Magento 2 beta version these fields were available.

The simple answer:
No, you can't, and you should not even try. The standard solution as of today is to use a third-party service/software like Google Analytics or Piwik.
More explanation:
The internal module responsible for this was removed, as it was rarely used but had a big impact on performance.
"Last URL" in particular is a very expensive feature, as it requires a database write on each and every request a customer makes. These writes are expensive, add 100-300 ms to every request, and consume I/O resources that are important for the order process; in the worst case, with many customers online at the same time, they can slow the whole order process down.
The last IP may still be obtainable from the last login entry, but I am not sure what Magento stores in that regard.
Also worth noting: there are countries where you are not allowed to store users' full IP addresses.

Related

Microservices - Storing user data in a separate database

I am building a microservice system that has two separate services: a user service and a comments service. The user service stores user details like email, first/last name, job title, etc., and the comments service stores all comments made by users.
In the UI, I need to populate the comments (via a REST API) and show the first/last name, email, and job title of the user.
Is it recommended that we store all these user details in the comments database?
If yes, then every time a user changes their details (first/last name or job title), I will have to update those details in all their comments (I don't think this is a good idea).
If no, and I store just the user ID in the comments DB, how am I supposed to get the user details for each comment? Let's say we want to show 20 comments per page in the UI.
First, challenge the architecture. Let's assume that both services in the question are part of a larger ecosystem of microservices that all make use of the user information; otherwise the separation will almost certainly be over-engineered. But from the word "comments" we can at least guess that there is at least one other class of objects, namely the things being commented on. So let's assume a "user service" is a meaningful crumb to break out into a microservice, because enough other crumbs carry the weight necessary to justify the microservice breakup.
In that case I suggest the following strategy:
Second, implement an abstraction layer in your comments service right away, so that most of the code does not have to care about where the user data comes from (i.e. don't join or $lookup). This is also a great opportunity for local testing, because you can just create a collection with the data you need and run service-level integration tests against it.
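As a minimal sketch of such an abstraction layer in Python (all names here, such as UserDirectory, are invented for illustration):

    from abc import ABC, abstractmethod

    class UserDirectory(ABC):
        """The only view of user data the rest of the comments service sees."""

        @abstractmethod
        def get_users(self, user_ids):
            """Return a dict mapping user id -> user details for the given ids."""

    class InMemoryUserDirectory(UserDirectory):
        """Test double: backs the abstraction with a plain dict, so
        service-level integration tests run without a live user service."""

        def __init__(self, users):
            self._users = users

        def get_users(self, user_ids):
            return {uid: self._users[uid] for uid in user_ids if uid in self._users}

Rendering a page of 20 comments then becomes: collect the 20 user IDs, make one get_users call, and join in memory.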
Third, for integration with the user service, fetch the data from it via its API (which should support bulk selection in any case) every time you need it. Because you have the abstraction layer, you can add caching, cache-timeout and displacement strategies, and whatever else you may need below this abstraction, without touching the main portion of the code. Add such things on an as-needed basis. Keep it simple.
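Continuing the sketch above, an API-backed implementation with a simple TTL cache tucked below the abstraction might look like this (the /users?ids=... bulk endpoint and the response shape are assumptions):

    import time
    import requests

    class ApiUserDirectory(UserDirectory):
        """Fetches user details from the user service over HTTP; a small TTL
        cache sits below the abstraction, invisible to callers."""

        def __init__(self, base_url, ttl_seconds=60):
            self._base_url = base_url
            self._ttl = ttl_seconds
            self._cache = {}  # user_id -> (expires_at, details)

        def get_users(self, user_ids):
            now = time.time()
            found, missing = {}, []
            for uid in user_ids:
                entry = self._cache.get(uid)
                if entry and entry[0] > now:
                    found[uid] = entry[1]
                else:
                    missing.append(uid)
            if missing:
                # One bulk call covers a whole page of comments.
                resp = requests.get(f"{self._base_url}/users",
                                    params={"ids": ",".join(map(str, missing))})
                resp.raise_for_status()
                for user in resp.json():  # assumed: a JSON array of user objects
                    self._cache[user["id"]] = (now + self._ttl, user)
                    found[user["id"]] = user
            return found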
Fourth, when things really go heavyweight and you have to deal with tens of thousands of users, tons of comments and many requests per second, the comments service could, still below the abstraction, implement an upfront replication pattern to get the full user database locally. This is usually done via an asynchronous message that the user service sends to all subscribers whenever something changes in the user base. When it suits the subscribers (i.e. the comments service), they can trigger a full or (from time to time) delta replication of the changes. Suitable collections will already be in place from what you did for caching. And the comments service will probably need considerably less information than is stored in the user service (let alone the hashed password, other login options or accounting information).
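A sketch of the subscriber side of that replication (the message-bus wiring and the event shape are assumptions):

    # Hypothetical callback wired to whatever message bus the user service
    # publishes on; the event shape is invented for illustration.
    def on_user_changed(event, local_users):
        user = event["user"]
        # Store only the slimmed-down projection the comments service needs.
        local_users[user["id"]] = {
            "first_name": user["first_name"],
            "last_name": user["last_name"],
            "email": user["email"],
            "job_title": user["job_title"],
        }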
Fifth, should you still hit performance challenges, you can break the abstraction for the few cases where you need to and do the join or $lookup.
Follow the steps in order, and stop as soon as the overall assembly works fine. Every step adds considerable complexity, and when you don't need it, don't implement it.

How appropriate is it to use SAML_login with AEM with more than 1m users?

I am investigating slow login times and some profile synchronisation problems on a large enterprise AEM project. The system has around 1.5m users, and the website is served by 10 publishers.
The way this project is built, SAML_login is enabled for all these end users, and there is a third-party IdP which I assume SAML_login talks to. I'm no expert on these SSO / SAML_login processes, so as a first step I'm trying to understand whether this is the correct way to go at all.
Because of this setup and the number of users, a SAML_login call takes 15 seconds on average. This is becoming more unacceptable day by day as the user count rises. Even more importantly, the synchronisation between the 10 publishers fails occasionally, so some users sometimes can't use the system as expected.
Because the users are stored in the JCR for SAML_login, you cannot even go and check the /home/users folder from the CRX browser; it times out, as it is impossible to show 1.5m rows at once. My educated guess is that this is why the SAML_login call takes so long.
I've come across articles that explain how to set up SAML_login on AEM, which makes this sound like a legitimate use. But in my opinion it is a terrible setup, as the JCR is not designed as a quick-access data store for this kind of usage scenario.
My understanding so far is that this approach might work well with a limited number of users, but with this many users it is simply not applicable. So my first question would be: am I right? :)
If I'm not right, then there is certainly a bottleneck somewhere that I'm not aware of yet. What could that bottleneck be, and how could it be improved?
The AEM SAML authentication handler has some performance limitations with the default configuration. When your browser does an HTTP POST request to AEM under /saml_login, it includes a base64-encoded "SAMLResponse" request parameter. AEM processes that response directly and does not contact any external systems.
Even though the SAML response is processed on AEM itself, the bottlenecks of the /saml_login call are the following:
Initial login, where AEM creates the user node for the first time - you can mitigate this by creating the nodes ahead of time: write a script that creates the SAML user nodes (under /home/users) before the users ever log in (see the sketch after this list).
During each login, when the session is first created, a token node is created under the user node at /home/users/.../{usernode}/.tokens - this can be avoided by enabling the encapsulated token feature.
Finally, the last bottleneck occurs when AEM saves the SAMLResponse XML under the user node (for later use by SAML-based logout). This can be avoided by not implementing SAML-based logout. The latest com.adobe.granite.auth.saml bundle supports turning off the saving of the SAML response; service packs AEM 6.4.8 and AEM 6.5.4 include this feature. To enable it, set the OSGi configuration properties storeSAMLResponse=false and handleLogout=false, and the SAML response will no longer be stored.
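The pre-creation idea from the first bullet could look roughly like this, assuming the standard Sling POST endpoint for creating authorizables (host, credentials and the user list are placeholders; verify the exact parameters against your AEM version's documentation):

    import requests

    AEM_HOST = "http://localhost:4502"   # placeholder
    ADMIN_AUTH = ("admin", "admin")      # placeholder credentials

    def create_user(user_id):
        # Sling POST servlet for creating authorizables; SAML users will
        # never actually log in with the password set here.
        resp = requests.post(
            f"{AEM_HOST}/libs/granite/security/post/authorizables",
            data={
                "createUser": "",
                "authorizableId": user_id,
                "rep:password": "unused-for-saml-logins",
            },
            auth=ADMIN_AUTH,
        )
        resp.raise_for_status()

    for uid in ["user0001", "user0002"]:  # in practice, read the ids from an IdP export
        create_user(uid)

And the properties from the last bullet would go into the SAML handler's factory configuration, along these lines (the factory PID suffix is a placeholder):

    # com.adobe.granite.auth.saml.SamlAuthenticationHandler-myidp.config
    storeSAMLResponse=B"false"
    handleLogout=B"false"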

Sitecore: sharing a MongoDB database between multiple Sitecore instances

We are using an in-house MongoDB instance. We have 3 multisite solutions with 3 websites each.
We are having a debate about whether each Sitecore instance should have its own set of MongoDB databases (reporting, analytics, etc.) or share one for all instances.
Does anyone have experience with this? Any suggestion would be helpful.
You can happily do this, as long as you want a single view of your traffic. The reporting mostly allows filtering by site name, so be sure to give your sites different names. Also avoid the temptation to call all the home pages "Home"; use the site name, e.g. Potato Home, Badger Home, etc. The page-level reports show the node's display name, and they can be confusing otherwise: imagine Home, Home, Home and Home all listed as the highest entry pages.
The xDB processing scales fairly linearly, so adding 3 sites should, in total, be no more processing than if they were all separate. We always use an external processing server and put a MongoDB replica-set secondary on that box, then set ?readPreference=nearest in the Mongo connection string. Check the MongoDB docs for the options you can set on the connection string URL.
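For example (host names invented), a connection string along these lines lets reads from the processing server go to the secondary sitting next to it:

    mongodb://mongo-primary:27017,processing-box:27017/analytics?readPreference=nearest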
It is always better to track analytics data independently per website instance. As you mentioned, there are 3 website instances, each a multisite with 3 sites, so my recommendation is to go with 3 separate xDB database configurations, one per physical website instance. That will give you clean reporting data per instance. If you want more details, please do comment.

Fetching 1-minute bars from Yahoo Finance

I'm trying to download 1-minute historical stock prices from Yahoo Finance, both for the current day and for previous days.
Yahoo (just like Google) supports up to 15 days' worth of data, using the following API query:
http://chartapi.finance.yahoo.com/instrument/1.0/AAPL/chartdata;type=quote;range=1d/csv
The thing is that the data keeps changing even when the markets are closed! Try refreshing every minute or so, and some minute bars change, even ones from the beginning of the session.
Another interesting thing is that all of these queries return slightly different data for the same bars:
http://chartapi.finance.yahoo.com/instrument/2.0/AAPL/chartdata;type=quote;range=1d/csv
Replace that version number with 100000 and the query will still work, but it returns slightly different data.
Does anyone understand this?
Is there a modern YQL query that can fetch historical minute data instead of this API?
Thanks!
Historical minute data is not as easily accessible as we would all like. I have found that the most affordable way to gather intraday stock price data is to develop automated scripts that log price information whenever the markets are open.
Similar to the Yahoo data URL that you shared, Bloomberg maintains 1-day intraday price information in JSON format, like this: https://www.bloomberg.com/markets/api/bulk-time-series/price/AAPL%3AUS?timeFrame=1_DAY
The URL convention looks easy to construct on your own once you have a list of ticker symbols and an understanding of the consistent syntax.
To arrive at that URL in the first place, without any idea how to guess or reverse-engineer it, I simply went to https://www.bloomberg.com/quote/AAPL:US, opened my browser's developer tools, and tracked a background GET request, which led me to that URL. I wouldn't be surprised if you could employ similar methods on other price-data websites.
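As a rough illustration, that request can be scripted; the response structure is undocumented, so treat the JSON layout as an assumption and inspect it before relying on it:

    import requests

    url = ("https://www.bloomberg.com/markets/api/bulk-time-series/price/"
           "AAPL%3AUS?timeFrame=1_DAY")
    # Some endpoints reject clients without a browser-like User-Agent.
    resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
    resp.raise_for_status()
    print(resp.json())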
You can also write scripts to poll price data as fast as your internet connection allows. One Python package that I find pretty handy for this is ystockquote.
You can have it request price data every couple of seconds and log it into a daily time-series database, along these lines:
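A minimal polling logger in that spirit (the interval and CSV output are arbitrary choices, and this assumes the ystockquote package still works against Yahoo's current endpoints):

    import csv
    import datetime
    import time

    import ystockquote

    def log_prices(symbol, path, interval_seconds=5):
        """Append a (timestamp, price) row every few seconds."""
        with open(path, "a", newline="") as f:
            writer = csv.writer(f)
            while True:
                price = ystockquote.get_price(symbol)
                writer.writerow([datetime.datetime.utcnow().isoformat(), price])
                f.flush()
                time.sleep(interval_seconds)

    # log_prices("AAPL", "aapl_ticks.csv")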
Yes, there are other APIs.
I don't know if it still helps, but if you need intraday data there is an API on RapidAPI called Quotient which allows you to pull intraday (at the 1-minute level) and EOD market data (FX, crypto, stocks (US, Canadian, UK, Australian, European), ETFs and futures). It also provides earnings, dividends, splits and a lot of other information.

Keeping things RESTful

New to REST, and not having even known what REST was, I began watching a few videos and picked up a book to help guide me towards the correct approach. Unfortunately, my first version is completely botched to hell, and I will likely have to break any customers using that implementation shortly. To ensure that I don't need to do this again, I need your assistance!
I have a few DB tables that I'm concerned with here:
'PrimaryBuyer' & 'AllBuyers'
They share a majority of fields, but AllBuyers has a few things PrimaryBuyer does not, and vice versa. Each primary buyer is given a unique 'CaseNumber' when entered into the system. This, together with a 'SequenceNumber', is then used to identify rows in 'AllBuyers'. The CaseNumber is returned to the user of the web service to store for future use. The sequence numbers, however, are implied by each buyer's position within the XML / JSON.
To illustrate these tables: if I were to buy a car, I would be the primary buyer and would thus be entered into BOTH the PrimaryBuyer and AllBuyers tables. However, if my credit were bad, I could have my spouse cosign the loan. This would make her a secondary buyer, and she would be entered exclusively into the 'AllBuyers' table.
I currently have one REST URI set up, '/buyers/', which mandates that all information for all buyers is entered at once. Similarly, if I do an update on this URI, the primary is updated in both tables and any secondary buyers in the payload replace previously existing ones.
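For illustration, the kind of payload that URI accepts might look like this (field names invented); note how the secondary buyers' sequence numbers are implied by their position in the array:

    {
      "primary":     { "firstName": "John", "lastName": "Doe" },
      "secondaries": [
        { "firstName": "Jane", "lastName": "Doe" }
      ]
    }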
Ultimately, there is no way to directly access the 'PrimaryBuyer' and 'AllBuyers' tables.
I've been trying to think of a way around this problem, but have been unable to come up with anything that is both RESTful and not a pain for customers. Is it ridiculous to think that the user should (say, on an add) POST to /primarybuyer/, take the returned case number, and then POST the same information, and then some, to /allbuyers/? That seems like it would be a little silly in terms of bandwidth, among other things. Should things be left in their current state?
Hopefully that's not too much information to answer such a seemingly simple question.
"Is it ridiculous to think that the user should (say on an add) POST to /primarybuyer/, take the returned casenumber, and then POST the same information and then some to /allbuyers/?"
When you talk about the "user", do you mean the person or the system (browser) that uses your service?
Using a REST service is normally done by a system, which today often means a browser plus JavaScript. The work for the human user is done in a web facade, and behind it all your (JavaScript) code runs to make the appropriate REST calls.
Why not POST the buyer's information as a serialized object to the REST server? You could do this via the parameters section of the request. When the request gets to the server, you can deserialize the object and implement the logic of updating the database.
Kind of like how Amazon does it: http://docs.aws.amazon.com/ElasticMapReduce/latest/API/API_AddInstanceGroups.html
If the client application does POST /PrimaryBuyer, there is no reason the server cannot also copy that case information into the /AllBuyers resource, and vice versa.
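A minimal sketch of that server-side fan-out (Flask, with in-memory stand-ins for the two tables; all names are invented):

    from itertools import count
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    # In-memory stand-ins for the PrimaryBuyer and AllBuyers tables.
    PRIMARY_BUYER, ALL_BUYERS = {}, []
    _case_numbers = count(1000)

    @app.route("/buyers/", methods=["POST"])
    def create_buyers():
        payload = request.get_json()
        primary = payload["primary"]
        secondaries = payload.get("secondaries", [])

        # One POST: the server writes the primary into both "tables" and the
        # secondaries into AllBuyers, so the client never has to POST twice.
        case_number = next(_case_numbers)
        PRIMARY_BUYER[case_number] = primary
        for seq, buyer in enumerate([primary] + secondaries):
            ALL_BUYERS.append({"caseNumber": case_number,
                               "sequenceNumber": seq, **buyer})

        return jsonify({"caseNumber": case_number}), 201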