Why would LINQ group by results be fewer from Visual Studio compared to SQL Server and Linqpad? - entity-framework

There are other questions similar to mine but they didn't help me. I'm performing what should be a simple Linq group by operation, and in SQL Server Management Studio and Linqpad I get 23,859 results from a table containing 36,102 total records. This is what I believe to be the correct result.
For some reason, when I move my query into my Visual Studio application code, I get 22,463 groups - and I cannot for the life of me figure out why.
I need to group this table's rows based on unique combinations of 8 columns. The columns contain account IDs, person IDs, device IDs, premise IDs, and address columns. Basically, a person can have multiple accounts, multiple premises, and multiple devices, and each premise can have its own address. I know the table design is lacking... it's customer provided and there are other columns that necessitate the format - it should not be relevant to the grouping though.
SQL Server: 23859 groups:
SELECT acct_id, per_id, dev_id, prem_id, address, city, state, postal
FROM z_AccountInfo GROUP BY acct_id, per_id, dev_id, prem_id, address, city, state, postal
ORDER BY per_id
Linqpad: 23859 groups:
//Get all rows...
List<z_AccountInfo> zAccounts = z_AccountInfo.ToList();
//Group them...
var zAccountGroups = (from za in zAccounts
group za by new { za.acct_id, za.per_id, za.dev_id, za.prem_id, za.address, za.city, za.state, za.postal } into zaGroups
select zaGroups).OrderBy(zag => zag.Key.per_id).ToList();
Visual Studio: 22463 groups - WRONG?:
//Instantiate list I can use outside of Entity Framework context...
List<z_AccountInfo> zAccounts = new List<z_AccountInfo>();
using (Entities db = Entities.CreateEntitiesForSpecificDatabaseName(implementation))
{
//Get all rows. Count verified to be correct...
zAccounts = db.z_AccountInfo.OrderBy(z => z.per_id).ToList();
}
// Group the rows. Doesn't work??? 22463 groups?
var zAccountGroups = (from z_AccountInfo za in zAccounts
group za by new { za.acct_id, za.per_id, za.dev_id, za.prem_id, za.address, za.city, za.state, za.postal } into zag
select zag).ToList();
I'm hoping someone can spot a syntax issue or something else I'm missing. Seems like Visual Studio is grouping something.. but it's off by 1396 groups... that's pretty significant.
UPDATE:
sgmoore's comment below put me on the track of making sure the zAccounts list from Linqpad and Visual Studio match. They do not!?! Querying the table in SQL Server shows this data (account / device / premise)
Inspecting the Visual Studio output in Beyond Compare shows the device ID 6106471 being erroneously repeated / duplicated for the 4 bottom rows... meaning there should be 2 groups here, but my query will only see 1...
Since I'm using Entity Framework to query the data in the table in Visual Studio, this makes me think something is wrong with my model but I have no idea what it could be. Beyond compare shows this same issue happening multiple times and explains why the group numbers are off. It's like EF knows there are 8 rows (in this case) - but the field that differentiates them doesn't come through.
I tried truncating the table and re-adding all of the data into it and re-running and the bad behavior persists. Quite confused here - I've never had this kind of issue with Entity Framework before.
I even ran SQL Profiler when VS was executing and trapped the query Entity Framework is firing to populate zAccounts. That query when fired by itself in SQL Server correctly shows the four 7066550 rows. This seems to be squarely on Entity Framework and the ToList() call that populates the full collection - ideas anyone?

Short answer - make sure the table in the Entity Framework model has an Entity Key on a column where the values of the column are unique.
Longer answer - to troubleshoot, I ran SQL Profiler to ensure that the query EF was sending to SQL Server was correct - and it was. I ran that query and inspected the results to see the data I was expecting. The problem was my model. I had an Entity Key set on a field that did not contain unique values. My guess is that EF assumes that since the field is set as the Entity Key, its values must be unique. Based on that, it seems to cache the first row it sees for each key value and then projects that row's values into the query results for every row sharing that key. That is a bad assumption in my view if there is no validation check on the field marked as the Entity Key. I realize I'm to blame here for telling it to use a non-unique field as the Entity Key - but I don't see the case where this would be a good idea without it throwing at least a warning.
Anyway, to resolve it, I added a proper id column to the table and set its Identity specification and auto-increment so that every row in the table would have a unique id. After that, I updated my edmx to use the new column as the Entity Key, re-ran my code, and everything magically started working.
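For reference, if you are using code-first instead of an edmx, the same principle looks roughly like the sketch below - a hypothetical illustration only (the column names come from the question, but the id property and attributes are my assumptions; my actual fix was made in the designer):
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;
// Sketch only: the entity key must map to a column whose values are truly unique.
public class z_AccountInfo
{
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public int id { get; set; }            // new identity column - unique per row
    // Non-unique business columns: never mark one of these as the entity key.
    public string acct_id { get; set; }
    public string per_id { get; set; }
    public string dev_id { get; set; }
    public string prem_id { get; set; }
    public string address { get; set; }
    public string city { get; set; }
    public string state { get; set; }
    public string postal { get; set; }
}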

Related

Feedback about my database design (multi tenancy)

The idea of the SaaS tool is to have dynamic tables with dynamic custom fields and values of different types. We considered following the force.com/salesforce.com example, but it seems too complicated to maintain going forward and would make reports hard to build given the level of abstraction, so we came up with a simpler idea, but we want to be sure it is a reasonably good approach.
This is the architecture we have today (in a few steps).
Each tenant has its own separate database on the cluster (Postgres 12).
A TABLE table, used to keep all of those tables as references; this entity has a ManyToOne relation to the META table and a OneToMany relation with the DATA table.
The META table is used for metadata configuration and has a OneToMany relation with FIELDS (which holds the field names as well as the field type, e.g. TEXT/INTEGER/BOOLEAN/DATETIME etc., and the attribute value - as a string, only as a reference).
The DATA table has a ManyToOne relation to TABLES and 50 character-varying columns named attribute1...attribute50, all NULL-able.
Example flow today:
When a user wants to open a TABLE's DATA, e.g. "CARS", we load the META table with all the FIELDS (to get the fields for this query). The user has specified that they want to query against the Brand, Class, Year, and Price columns.
Our logic then looks up the references for Brand, Class, Year, and Price in the META > FIELDS table, so we know that Brand = attribute2, Class = attribute5, Year = attribute6, and Price = attribute7.
We parse the request into a query, e.g. SELECT [attr...2,5,6,7] FROM DATA, and show the results to the user. If the user decides to apply filters, e.g. Year > 2017 AND Class = 'A', we use SQL's CAST() functionality, for example SELECT CAST(attribute6 AS int), attribute5 FROM DATA WHERE CAST(attribute6 AS int) > 2017 AND attribute5 = 'A';, so we can support most SQL principles.
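As a rough illustration of that translation step (a hypothetical C# sketch, not our actual implementation - the field-to-attribute mapping is the one from the example above, everything else is made up):
using System.Collections.Generic;
// Hypothetical mapping loaded from META > FIELDS for the "CARS" table.
var fieldToColumn = new Dictionary<string, (string Column, string Type)>
{
    ["Brand"] = ("attribute2", "TEXT"),
    ["Class"] = ("attribute5", "TEXT"),
    ["Year"]  = ("attribute6", "INTEGER"),
    ["Price"] = ("attribute7", "INTEGER"),
};
// Wrap typed attributes in CAST so comparisons behave as expected.
string ColumnExpr(string field)
{
    var (column, type) = fieldToColumn[field];
    return type == "INTEGER" ? $"CAST({column} AS int)" : column;
}
// GET CARS [CLASS,YEAR] FILTER [EQ(CLASS,A),MT(YEAR,2017)] becomes roughly:
var sql =
    $"SELECT {ColumnExpr("Class")}, {ColumnExpr("Year")} FROM DATA " +
    $"WHERE TABLEID = @tableId " +
    $"AND {ColumnExpr("Year")} > @year AND {ColumnExpr("Class")} = @class";
// @tableId, @year and @class are then bound as query parameters rather than
// concatenated into the string, to keep the generated SQL safe.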
However, moving forward we are a bit worried about:
Managing such an environment for more tenants as the number of tables grows (e.g. 50 per customer, with roughly 1-5 million rows per TABLE - 5 million is the maximum we allow; for bigger data we have BigQuery - which gives us 50-250 million rows in a single DATA_X table). This might affect query performance, especially since we expose simple WHERE statements (less than, equal, null, etc.) through an abstraction language, e.g. GET CARS [BRAND,CLASS,PRICE...] FILTER [EQ(CLASS,A),MT(YEAR,2017)], developed to be similar to JQL (Jira Query Language).
Transaction locks: we allow batch-uploading CSVs into DATA_X, so when a tenant loads e.g. 1 GB of data, it effectively locks the DATA table for other systems.
Keeping multiple NULL columns, which can affect storage somewhat (for now we are not too worried: at TABLE creation the customer decides how many columns they want, and based on that we assign the TABLE to one of the hardcoded entities DATA_5, DATA_10, DATA_15, DATA_20, DATA_30, DATA_50, where the number corresponds to the limit on attribute columns; we also support migration if they decide to switch from 5 to 10 attributes, etc.).
We are at a very early stage, so we can and should make these changes before we scale. We knew this is most likely not the best approach, but we kept it to get the project running for small customers, and for now it is working just fine.
We were also thinking about JSONB objects, but that is not an option, as we want to keep reading the data simple.
What do you think about this solution (FYI, DATA has a composite PRIMARY KEY of (ID, TABLEID) and a built-in CreatedAt column which is used in most of the queries, so there will be at most 3 indexes)?
If it seems bad, what would you recommend as an alternative to this solution, based on the details I shared (basically a schema-less RDBMS)?
IMHO, I anticipate issues when you want to join tables and also when using CAST, etc.
We followed the approach below, which may be of help to you.
We have a table called Cars and also a couple of tables like CarsMeta and CarsExtension. The underlying Cars table has all the common fields for all tenants. The CarsMeta table describes the types of columns you can have for extending the Cars entity. In the CarsExtension table, you have columns like StringCol1...5, IntCol1...5, LongCol1...10.
In this way, you can also easily filter the data:
If the filter is on the base table, perform the search there; if results are found, match the IDs against the CarsExtension table to get the list of extended rows for this entity.
If the filter is on the extended fields, search the extension table and match the results against the base entity IDs.
We organize the extension table like this:
id - UniqueId
entityid - uniqueid (points to the primary key of the entity)
StringCol1 - string,
...
IntCol1 - int,
...
In this case, it is easy to do a join for the entity and then get the data along with the extension fields.
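As a rough illustration of that join - a hypothetical LINQ-to-Entities sketch where db is an assumed context exposing Cars and CarsExtensions sets, and the column meanings are invented for the example:
// Hypothetical: CarsMeta (not shown) records that, for this tenant,
// IntCol1 stores "Year" and StringCol1 stores "Class".
var cars =
    (from car in db.Cars
     join ext in db.CarsExtensions on car.Id equals ext.EntityId
     where ext.IntCol1 > 2017 && ext.StringCol1 == "A"
     select new
     {
         car.Id,
         car.Brand,               // common column on the base Cars table
         Year = ext.IntCol1,      // extension column that means "Year" here
         Class = ext.StringCol1   // extension column that means "Class" here
     }).ToList();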
If the table metadata and the data have to be inferred from separate tables, it will be difficult to maintain over a long period of time and with huge volumes of data.
HTH

Can't remap fields - map fields window is missing new table

I have a Crystal Report with a database command.
The command has a join clause that can be removed and read from a table in the database, because it represents static data. I add this table (called _System) to the Database Expert.
Now I edit the command to remove the join and columns that reference this table. Since the report fields that depended on these columns are no longer mapped, this causes the Map Fields window to appear:
...which does not have the new table in it. If I cancel out of this I am back to where I originally was. If I hit OK without mapping, all of the unmapped fields on the report are deleted (suffice it to say... I was not expecting this >:( )
I have tried adding links between the command and the new table, and refreshing report parameters, but these have had no effect.
One workaround is to manually replace every field in the report, but this is very labour intensive.
Here is the outline of the command before:
SELECT ACT.Account_Code, ACT.Company, ACT.FName, --etc
STM.CompanyName AS 'DLRName', STM.Address_1 AS 'DLRAddress', STM.City AS 'DlrCity' --etc
FROM Accounts AS ACT
JOIN _System AS STM ON 1 = 1
GROUP BY ACT.Account_Code, ACT.Company, ACT.FName, --etc
STM.CompanyName, STM.Address_1, STM.City --etc
And after:
SELECT ACT.Account_Code, ACT.Company, ACT.FName, --etc
FROM Accounts AS ACT
GROUP BY ACT.Account_Code, ACT.Company, ACT.FName --etc
I have removed the JOIN on the _System table, and all referenced columns.
It appears that Crystal is not recognizing your _System table as a new source.
I would:
1) leave your command object SQL unchanged and get the issue worked out with the _System table, then
2) ensure that you are able to establish a join between the command object fields and the _System table fields, and lastly
3) remap the fields.
Step two, I suspect, is the source of the problem, as your join condition is "ON 1 = 1", which I take to mean that you may not have a common key field in both tables.
Note that your original command SQL selects STM.CompanyName AS 'DLRName'.
Hence, Crystal now knows of a field called DLRName, but does not know of a field called CompanyName, so it cannot make the association between DLRName in the old source and CompanyName in the new source...
Likewise with the rest of the fields being moved from the command object to an attached table: if no name match exists, Crystal can't make the connection. However, it should list all unmatched fields that are on the report and all unused fields in the recognized data sources, and allow you to specify the matches yourself.
But it does not... which tells me that something has gone wrong with the attempt to attach/open the _System table. Hence, you need to get that worked out first, then make the field adjustments.
If this doesn't get you through, then post some sample data so I can see how the two tables relate (make sure some examples exist where there is a matching row in both tables).
I had the same problem a while ago.
Unfortunately I couldn't find anything online that helped, or maybe I wasn't looking hard enough. I just noticed that in my case, the particular field that wasn't showing in the Map Fields dialogue box had nvarchar(max) as its datatype (in the view).
I tried to force the datatype with CAST(missingfieldname AS nvarchar(20)) AS missingfieldname (I did this in the view), and voila, it magically appeared in the Map Fields dialogue box.
It seems the field mapping dialogue box doesn't show fields with blob text datatypes.
I know this question was asked 4 years ago, but hopefully this comment will help future solution seekers with this absurd and weird problem. I just got lucky noticing what was unique about that particular missing field.

Silverlight WCF RIA Service select from SQL View vs SQL Table

I have arrived at this dilemma via a tortuous and frustrating route, but I'll start with where I am right now. For information I'm using VS2010, Silverlight 5 and the latest versions of the Silverlight and RIA Toolkits, SDKs etc.
I have a view in my database (it's actually now an indexed view, but that has made no difference to the behaviour). For testing purposes (and that includes testing my sanity) I have duplicated the view as a Table (ie identical column names and definitions), and inserted all the view rows into the table. So if I SELECT * from the view or the table in Query Analyzer, I get identical results. So far so good.
I create an EDM model in my Silverlight Business Application web project, including all objects.
I create a Domain Service based on the model, and it creates ContextTypes and metadata for both the View and the Table, and associated Query objects.
If I populate a Silverlight ListBox in my Silverlight project via the Table Query, it returns all the data in the table.
If I populate the same ListBox via the View Query, it returns one row only, always the first row in the collection, however it is ordered. In fact, if I delve into the inner workings via the debugger, when it executes the ObjectContext Query in the service, it returns a result set of the correct number of rows, but all the rows are identical! If I order ascending I get n copies of the first row, descending I get n copies of the last row.
Can anyone put me out of my misery here, and tell me why the View doesn't work?
Ade
OK, well that was predictable - nearly every time I ask a question on a forum I stumble across the answer while I'm waiting for responses to flood in!
Despite having been through the metadata and model.designer files and made sure that all "view" and "table" class/method definitions etc were identical, it was still showing the exasperating difference in behaviour between view and table queries. So the problem just had to be caused by the database, right?
Sure enough, I hadn't noticed myself creating NOT NULL columns when I created the "identical" Table version of my view! Even though I was using a SELECT NEWID() to create a unique key column on the view, the database insisted that the ID column in the view was NULLABLE, and it was apparently this which was causing the problem.
To save some storage space I switched from using NEWID() to using ROW_NUMBER() to create my key column, but still had the "NULLABLE" property problem. So I then changed it to
SELECT ISNULL(ROW_NUMBER() OVER (...), -1)
for the ID column, and at last the column in the view was created NOT NULL! Even though neither NEWID() nor ROW_NUMBER() can ever generate NULL output, it seems you have to hold SQL Server's hand and reassure it by using the ISNULL operator before it will believe itself.
Having done this and deleted/recreated my model and service files, everything burst into glorious technicolour life without any manual addition of [Key()] attributes or anything else. The problem had been with the database all along, and NOT with the Model/Service/Metadata definitions.
Hope this saves someone some time. Now all I need to do is work out why the original stored procedure method I started with two days ago doesn't work - but at least I now have a hint!
Ade

Entity Framework - Using stored procedure which return same column names from different entities

I have tables named Contact and Address, and both of them have a "ModifiedDate" column. I have written the CUD operations using stored procedures. However, when it came to the SELECT stored procedure, which needed to return all the contacts with their addresses, I got an error.
System.Data.EntityCommandExecutionException: The data reader is incompatible with the specified 'AddressBookModel.SelectAllContactsWithAddresses_Result2'. A member of the type, 'ModifiedDate1', does not have a corresponding column in the data reader with the same name.
I ended up changing the stored procedure to return different aliases for those column names and, more unfortunately, I also needed to change the properties of the entities in the model to match the selected columns. I wrote a blog post about this here. I know I could have separate SELECT SPs (and separate function imports) for the two entities, but this is just one situation; the same thing can happen in other cases where the same column names are returned by a complex query over multiple tables in an SP. Could anybody provide any direction on this?
I posted the same question in the MS forums for EF here (sorry for the cross-posting), and the moderator confirmed that this is a bug in EF and asked me to file it on Microsoft Connect. The bug ID is 597376 and here is the link for the same.
From Lingzhi Sun, MSFT Moderator, Forum Support:
Hello, After some research and test, I can repro this issue at my side. It can be a limitation of EF when handling stored procedure return values. I would recommend you open a ticket at Microsoft Connect to report this issue to the product team. If it is convenient, please share us with the ticket link here to benefit more community members.

SqlDataAdapter Update

Can anyone help me understand why this error occurs when I update using a SqlDataAdapter with a join query?
Dynamic SQL generation is not supported against multiple base tables.
You have a "join" in your main query for your dataset (the first one in the TableAdapter, with a check mark by it). You can't automatically generate insert/update/delete logic for a TableAdapter when the main query references multiple tables via a join. The designer isn't smart enough to figure out which table you want to send updates to in that case; that is why you get the error message.
Solution: ensure that your main query only references the table you want the designer to write insert/update/delete code for. Your secondary queries may reference as many tables as you want.
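The same rule applies if you wire up the adapter yourself in code: SqlCommandBuilder can only generate the insert/update/delete commands when the main SELECT references a single base table. A minimal sketch, with a hypothetical Customers table and connection string:
using System.Data;
using System.Data.SqlClient;
// Sketch only: the main SELECT references a single base table (and includes its
// primary key), so SqlCommandBuilder can derive the INSERT/UPDATE/DELETE commands.
using (var connection = new SqlConnection("<your connection string>"))
{
    var adapter = new SqlDataAdapter("SELECT uin, name, address FROM Customers", connection);
    var builder = new SqlCommandBuilder(adapter);   // registers itself with the adapter
    var table = new DataTable();
    adapter.Fill(table);
    table.Rows[0]["address"] = "New address";       // assuming at least one row came back
    adapter.Update(table);                          // works here; if the SELECT joined multiple
                                                    // tables, this call would throw the
                                                    // "multiple base tables" error above
}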
In my case, the error occurred because I was trying to set a value for the identity column in my DataRow. I simply deleted the code that set the identity column's value, and it worked.
My Scenario:
Database:
uin [primary, identity]
name
address
Whenever I tried to set datarow("uin"), the error occurred, but it works fine with datarow("name") and datarow("address").
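Continuing the adapter/table sketch from the answer above (hypothetical values), the fix is simply to leave the identity column alone and let the database assign it:
// uin is an identity column, so never assign it on the DataRow yourself.
var row = table.NewRow();
// row["uin"] = 123;             // this assignment is what triggered the error - remove it
row["name"] = "John Smith";      // set only the non-identity columns
row["address"] = "1 Main St";
table.Rows.Add(row);
adapter.Update(table);           // SQL Server generates uin during the INSERT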
Hope it works for you too