Separating a repository by content (separating parsing from data processing) - github

The goal of the project is to create a recommender system in which I go through all the stages of building and training a model: from collecting and processing the data, to creating the model, to deploying it in a bot.
The problem is that I don't know whether it's worth splitting the repository into two: one responsible for collecting the data and the other for building the model. The resulting model will be used in a Telegram bot, so this part of the project must either live in the same repository or in a separate one that copies the model (a single file) from the other repository.
The first option is not to split the repository and to keep parsing, processing and model building in separate folders, but this is not always convenient. The second option is to create two separate repositories, but then the logic of the project falls apart: it feels strange for data collection to sit in a repository of its own. Does the community have established conventions for a situation like this?
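For clarity, the single-repository option would mean a layout something like this (folder names are just examples):

recommender-system/
    parsing/         - scripts that collect the raw data
    processing/      - cleaning and preparing the data
    model/           - training code that produces the model file
    bot/             - Telegram bot that loads the trained model file

The two-repository option would put parsing/ and processing/ in one repo and model/ in another, with the bot either alongside the model or in a third repo that copies the single model file.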

Related

How do I split a large Blazor component, when it has a large object graph as its data?

I have an ASP.NET Blazor server-side project, using EF Core. One of the pages is getting quite large. Apart from any other reasons for keeping code files a reasonable size, the large size causes significant delays when recompiling.
I would like to split it down into smaller components, but the problem is that the whole page represents a fairly large object graph, parts of which are used in multiple places on the page.
Imagine a page that shows details for a company. The company has many employees, each of whom can claim expenses, which are added to the company's transactions list. The company itself has income and expense, so that adds more transactions. Other parts of the company object graph might also have associated expenses. This is a very simplified (and fictitious) sample of the idea. The page has various sections, such as one for employee details, which shows their transactions, as well as an overall transaction list.
At various places, you can add transactions, which get associated with the employee (or whatever), and are shown on both that transaction list and the main one. All of this is done with individual forms for each action, it's not one huge form for the whole object graph.
If I were to split the component down, I would be faced with one of the following choices (unless someone can suggest another)...
Have each smaller component inject its own DbContext and handle its own data access. This is fine in theory, but would cause concurrency problems as it would mean that different components were saving changes to the same entities. It would also require a lot of events to inform the parent component that data had changed in the subcomponents, which would end up very messy.
Have each smaller component have parameters for the bits of the object graph they handle. This avoids any concurrency issues, as only the parent component would be doing any data access. I'm not sure if it would avoid the need for events, as it depends on how well Blazor would notice if a part of the graph passed to a subcomponent changed.
Pass the DbContext in to each smaller component as a parameter. Again, this avoids any concurrency issues, but really feels like the wrong way to do it.
Anyone able to guide me as to the best way to split this up?
Thanks
If you have lots of sub-components accessing the data in the Form [your top level component - some sort of dashboard?] then you probably have quite a bit of plumbing to try and keep everything in sync, or lots of rendering going on if you are cascading objects. How often are you calling StateHasChanged?
Without some code I can only answer in very generic terms.
Your first step is to separate out your data and data management from your components and form. Move the data and the database into a DI service. The scope depends on what you're doing: Transient or Scoped. You can then use normal events to signal updates to components that need to render if something changes. There's an answer here that shows how to do this - https://stackoverflow.com/a/69562295/13065781.
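As a very rough sketch of the shape of that service (all type names here are placeholders, not from your code):

using System;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Registered in Startup/Program.cs as Scoped (one instance per circuit):
// services.AddScoped<CompanyDataService>();
public class CompanyDataService
{
    private readonly MyDbContext _db;      // hypothetical DbContext
    public CompanyDataService(MyDbContext db) => _db = db;

    public Company Company { get; private set; }

    // Components subscribe to this and call StateHasChanged when it fires.
    public event Action DataChanged;

    public async Task LoadAsync(int companyId)
    {
        Company = await _db.Companies
            .Include(c => c.Employees)
            .Include(c => c.Transactions)
            .FirstOrDefaultAsync(c => c.Id == companyId);
        DataChanged?.Invoke();
    }

    public async Task AddTransactionAsync(Transaction tx)
    {
        Company.Transactions.Add(tx);
        await _db.SaveChangesAsync();
        DataChanged?.Invoke();             // every subscribed component re-renders
    }
}

Each sub-component injects the service, does DataChanged += StateHasChanged in OnInitialized, and unsubscribes in Dispose. Only the service touches the DbContext, which avoids the concurrency issue from your first option.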
[Opinionated] You also need to understand that by building a complex object (your DataGraph) and then letting EF manage its state, [in Clean Design terms] you're building core application logic (the relationships between your basic data objects) into your infrastructure layer. The advantage is that it makes things easy and saves a lot of coding. The disadvantages come to light over time.

Entity Framework Code First Library and Database Update Implications

We have recently begun using Entity Framework for accessing all the various databases we touch on a regular basis. We've established a collection of library projects, one for each of these. For many of them, we're accessing established databases that do not change, and using DB first works just great.
For some projects, though, we're developing evolving databases that are having new fields and tables added periodically. Because these libraries are used by multiple projects (at the moment, just two, but eventually many more), running a migration on the production database necessitates a republish of both/all sites that use that particular DB's library. Failure to update the library on any other site of course produces the error that the model backing the context has changed.
How can we effectively manage the deployment/update of the Code-First libraries to all of the sites that use them each time a change to the database is made?
A year later, here's what we came up with and have been using.
We now include the following line in the Application_Start() method:
Database.SetInitializer<EFLib.MyHousing.MyHousingMVCContext>(null);
This stops EF from throwing the "model backing the context has changed" error when the current database doesn't exactly match what's in the code. While there is still potential for problems if non-backward-compatible changes are made, it allows new functionality to be added without re-deploying every site that uses these libraries, as long as the changes are not relevant to that particular site.
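For reference, that call just sits at the top of Application_Start() in Global.asax.cs, along these lines (the surrounding class is the standard project template, shown only for context):

using System.Data.Entity;
using System.Web;

public class MvcApplication : HttpApplication
{
    protected void Application_Start()
    {
        // Disable the initializer/model-compatibility check for this context so a
        // newer database schema doesn't trigger the "model backing the context has
        // changed" error on sites that haven't been re-deployed yet.
        Database.SetInitializer<EFLib.MyHousing.MyHousingMVCContext>(null);

        // ...the usual route/bundle registration continues here...
    }
}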

Set GeoServer to access a Postgresql database, Simple or Snapshot schema, populated by Osmosis

I have a PostgreSQL database which I keep updated using Osmosis. Osmosis can write to two different database schemas, named Simple and Snapshot. They are not that different from the database GeoServer uses, but I can't get GeoServer to use them properly.
The main problem seems to be the way tags are stored in those databases. I can add the nodes layer and display it with the default Points style, but as soon as I use an "ogc:Filter" in my style to filter the nodes by their "place" tag, the WMS breaks and does not respond (it says: The requested Style can not be used with this layer. The style specifies an attribute of place and the layer is: TestDB:nodes).
Is there any way to make GeoServer understand one of those schemas, or to make Osmosis update the database schema GeoServer knows?
This is a case for using TRIGGERs to manage the integration. The two programs use two different schemas. You can CREATE TRIGGERs in the database which ensure that data written by one application is made available to the other. Another option is to have one or both sides use VIEWs populated in part by the other application. In PostgreSQL a VIEW can have triggers attached, so these two approaches are not mutually exclusive.
This is, in any case, a potentially large project so rather than offering sample code, I will offer a general outline of what sorts of things you need to think about.
Are these generally applicable? If so, do you want to start an open-source integration project?
Are both of these read-only workloads? Does the data ever update? In general, if you are going to use views, updates pose the biggest concerns, so if that is the case you want to run the views on the side that is not doing the updates.
What is the write model of both sides? Insert/Update? Append only? Static data? What data do you have to "replicate" between the schemas?
Once you have those answers it should be relatively straightforward to get started and ask for help (either as an open source project or here) where you get stuck.

How to have multiple apps - one Core Data?

I’m an experienced developer, but new to Mac. I just don’t want to go down one path only to find out that I made some fundamental error or incorrect assumption later on.
I want to ultimately build and sell an iPhone app using Core Data. The app will be free with content available through in-app purchase. Here is what I want to be able to do:
OPTION 1
1. Build a Mac OS X utility app that points to the same Core Data object model, but has its own "master" database.
2. Populate the master database using the Mac app.
3. Export a subset of the master data from the Mac app to a flat file (XML?).
4. When the user purchases that data, download it from the cloud and import it into the local iPhone data store.
Number 2 should be easy enough. I have read about the XML parser, which should help me with #4. I need help with #1 and #3.
For #1, I can’t figure out how I can maintain one object model for both apps with Xcode. That data model must accept model versioning. Do I just create two Projects, one Mac and one iPhone, and point them both to the same .xcdatamodel file and the magic happens for me?
For #3, is there any sample code that someone can share that will iterate through an array of objects to create the XML?
OPTION 2
Another option I am considering was discussed below. Instead of worrying about import/export, simply create individual sql files for each set of new or updated data.
I could maintain a separate "metadata" database that has information about the individual sql files that are available to the app.
Then, I can dynamically access the individual SQL files from the local documents directory. This is similar to an iBooks model where the sql files equate to individual books.
I'm thinking I could have only two active database connections at a time... one for the metadata and the other for the specific "book". I am not sure if this will scale to many (tens or hundreds) sql files, however.
Any help is appreciated!
Jon
UPDATE: I just saw the answer from Marcus Zarra at:
Removing and adding persistent stores to a core data application
It sounds like Option 2 is a bad idea.
For (1), you can use the same object model in both apps. Indeed, if you use the same Core Data generated store, you are required to do so. Simply include the same model file in both apps. In Xcode, the easiest way to do this is to put the model file outside the project folders of each project and then add the model file without copying it into the project folder. This will ensure that both apps use the same model file for every build.
For (3), you need to first create an "export" persistent store using the same model as the reference store and add it to the reference context. In the model, create an "Export" configuration. Create a subentity for every entity in the model but do not change any attributes or relationships. Assign those entities to the Export configuration.
You will need to add a "Clone" method to each NSManagedObject subclass for the reference entities. When triggered, the method will return a subentity populated with the reference object's attributes and relationships (the relationship objects will be cloned as well).
Be aware that cloning an object graph is recursive and can use a lot of memory.
When you save, because you assigned them to the "Export" configuration, all the cloned export entities and their relationships will be saved to the export store. You will have cloned not only the objects but the related object graph.
Include the model and the export store in the iPhone app. Write the app to make use of the export entities only. It will never notice the absence of any reference objects.
For (4), I wouldn't mess around with using XML or exporting the data outside of core data at all. I would just use the export Core Data SQL store created in (3) and be done with it.
You can give an NSManagedObjectContext instance an instance of NSPersistentStoreCoordinator. This class has options allowing you to specify a file location for storing data and a format (SQLite, binary, or XML).
How do you plan to actually transfer data from Mac to iPhone? Is this something you do during development, or something people do during daily use? If the latter, you are probably better off building decoupled export/import into your app right away. So the Mac would serialize data into XML or JSON, push it somewhere in the cloud (not sure if local network/bonjour transfer is easier or useful, cloud is more universal), and iPhone fetches the data and deserializes it into the local schema/repository. You should not plan to work on the SQL layer with Core Data. Different platforms may use a different storage backend.

How do we share data between two different services

I am currently working on a web service which is periodically polled. It does not store its state and is instantiated every time it is queried. Essentially, it retrieves the state of other external entities, e.g. databases, and delivers it back to the requester.
Recently, the need to store state has arisen, in that:
There is the need to continuously collect data from a particular source and store the bits that are important/relevant
There is the need to collect the aggregate of a particular data source over a period of time
I came up with the following idea:
My main concern here is the fact that I am using a static class (essentially a global) to share data between the two services. Is there a better way of doing this?
edit: Thanks for the responses thus far. Apologies for the vagueness of this question: I'm just trying to work out the best way to share data across different services and am unsure as to the specifics (i.e. what is required). The platform that I am developing on is the .NET Framework, and both services are simply WCF services hosted as a Windows service.
The database route sounds like the most conventional way to go. However, I am reluctant to go down that path for now (mainly because of deployment/setup issues; it introduces the need to create new tables, etc., in addition to simply installing the software), given that at this point only relatively small amounts of data are being transferred. This may of course change in the future, and going the database route might be the way to go at that point.
Is there any other way besides adding a database persistence layer?
If you need to collect and aggregate data, you might want to consider using a database between the two layers. Or have I misunderstood something?
You should consider enhancing your question with more requirements: pretty much all options are open here.
Sure - how about data binding? I don't have a lot of information to go on here about your platform, but most sufficiently advanced systems offer it in some form.
You could replace your static shared data with some database representation, with a caching layer (like memcached) between the database and the webservice, so that most of the time the data is available very quickly from the cache, but can be retrieved from the database as needed.
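Roughly, the service-side lookup then becomes a cache-aside pattern, something like this sketch (using the in-process MemoryCache as a stand-in for memcached; all names are illustrative):

using System;
using System.Runtime.Caching;

public class CollectedDataProvider
{
    private static readonly MemoryCache Cache = MemoryCache.Default;

    public CollectedData Get(string sourceId)
    {
        // Most requests are answered from the cache...
        var cached = Cache.Get(sourceId) as CollectedData;
        if (cached != null)
            return cached;

        // ...and only cache misses hit the database.
        var fresh = LoadFromDatabase(sourceId);
        Cache.Set(sourceId, fresh, DateTimeOffset.Now.AddMinutes(5));
        return fresh;
    }

    private CollectedData LoadFromDatabase(string sourceId)
    {
        // Placeholder for the real persistence query.
        return new CollectedData();
    }
}

public class CollectedData { /* whatever the collector stores */ }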
I appreciate that you want to keep the architecture simple. Depending on the number of items you have to look up and their permanence, you might just consider leveraging your file system or a message queue. It sounds like you want the file system, because that sounds like the least impact on your design.
If you start dealing with tens of thousands of small files, your directories can get hard to navigate and slow to do file lookups on. I typically shoot for about 1000 - 10000 files per directory and concoct a routine that can generate a path to the file from the file name pattern. Keeping the files spread evenly across subdirectories is important, and note that some file systems have a limit on the number of subdirectories in a parent directory.
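The path routine can be as simple as hashing the file name into a fixed number of buckets, along these lines (bucket count and naming are arbitrary):

using System;
using System.IO;

public static class FileStore
{
    private const int Buckets = 1000;    // tune so each directory stays in the 1000-10000 file range

    public static string GetPath(string rootDirectory, string fileName)
    {
        // Simple deterministic hash of the file name (string.GetHashCode isn't
        // guaranteed stable across runs, so roll a trivial one instead).
        int hash = 0;
        foreach (char c in fileName)
            hash = (hash * 31 + c) % Buckets;

        string subDir = hash.ToString("D3");          // e.g. "042"
        return Path.Combine(rootDirectory, subDir, fileName);
    }
}

// Usage:
//   string path = FileStore.GetPath(@"C:\data\items", "reading-20100412.xml");
//   Directory.CreateDirectory(Path.GetDirectoryName(path));
//   File.WriteAllText(path, payload);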