Share large datasets between a group

Can someone please suggest an online service for sharing large files, over 100 GB, among a group of people?
Specifically, we are working on a machine learning project that requires constant access to the files, but without the need to download them. For this project we will manipulate the files with Python and R. I know that I can upload and share the code with Git, but is there a service (like Docker?) where you can store the data and 'play' with it online?
Thanks!

Common practice is to use Git for your code and S3 for your data.
You can also check out the open source tool DVC (http://dataversioncontrol.com), which orchestrates modeling code in Git with data in S3 or GCP storage. It was designed for ML scenarios, and it supports both Python and R code.
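The split between code in Git and data in object storage can be illustrated with a small sketch. This is not DVC's implementation or file format; it only shows the general idea of committing a small pointer file (a checksum) to Git while the large file lives elsewhere. The function names, the `.pointer.json` suffix, and the local `remote_dir` folder standing in for an S3 bucket are all assumptions made for illustration.

```python
import hashlib
import json
import shutil
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(data_file: Path, remote_dir: Path) -> Path:
    """Copy data_file into remote_dir and write a small pointer file next to it.

    The pointer file is what would be committed to Git; remote_dir is a
    hypothetical stand-in for a bucket in S3 or similar storage.
    """
    checksum = file_sha256(data_file)
    remote_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(data_file, remote_dir / checksum)
    pointer = data_file.with_name(data_file.name + ".pointer.json")
    pointer.write_text(json.dumps({"path": data_file.name, "sha256": checksum}, indent=2))
    return pointer


def restore(pointer: Path, remote_dir: Path) -> Path:
    """Fetch the file named in the pointer from remote_dir and verify its checksum."""
    info = json.loads(pointer.read_text())
    target = pointer.with_name(info["path"])
    shutil.copyfile(remote_dir / info["sha256"], target)
    if file_sha256(target) != info["sha256"]:
        raise ValueError("restored file does not match the recorded checksum")
    return target
```

A teammate who pulls the Git repository would get only the pointer file, then call `restore` to fetch the matching data.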

Related

Getting started with Azure Repos: How do I upload files?

I am developing a game, and am looking for a way to manage version control between two computers. I was directed to use Repos.
I'm new to using version control at all, and when I try to follow tutorials for DevOps they talk about team coordination stuff that is NOT what I'm looking for. Honestly, I'm not sure if this is the right solution for me.
I'm really trying to share files between two computers. Not just code, but also textures, meshes, level data, sounds, and ultimately the entire project. (And have a system to push/pull this data between computers, of course.)
I made a project within DevOps, but when I go to "Files" in Repos, I only have options to connect to a Git repository. How can I select the files I need to share? Not just Visual Studio files, but my game's assets and other files?
Or is this even an option? Am I looking at the wrong service here?
You can clone the repository that contains the files you want to share in one project and import it into another project.
Documentation on cloning a repository:
https://learn.microsoft.com/en-us/azure/devops/repos/git/clone?view=azure-devops&tabs=visual-studio
Documentation on importing a repository:
https://learn.microsoft.com/en-us/azure/devops/repos/git/import-git-repository?view=azure-devops

How to properly manage .env files in a microservices architecture

I have been working for some time on a project with a microservices architecture where each service has its own environment variables which are handled with a .env file for each service/repository.
A large part of these variables relate to other services' IPs and keys for external resources, which differ in each environment (Development, Staging, and Production), so the .env is not a simple one.
Our development pace is fast, and most of the time these variables change with new features and/or changes implemented by teammates working on issues related to that service. As a result, almost every time others want to work with a service they get blocked and have to update the .env file first. We therefore end up requesting and sharing .env files with each other all the time, and there is no "source of truth" for all the .env files.
I was wondering if someone else has had this or a similar problem before, and what approaches they followed to solve or improve it.
Is there any application or framework for sharing and managing .env files in a team in an automated way?
Thanks in advance!
EDIT
Just to be clear, these are not being added to source control and they are properly handled on CI/CD.
I was talking more about local development, setting up services locally, and keeping the .env local files up to date in an easy way.
As a summary of all the feedback provided by some coworkers and the community in both r/SoftwareEngineering and r/softwaredevelopment (thank you all for it), some of the most useful resources are:
This post about Common Anti-Patterns when Managing Passwords and Application Secrets: https://blog.envkey.com/managing-passwords-and-secrets-common-anti-patterns-2d5d2ab8e8ca
This one with Secure Strategies For Managing Passwords, API Keys, and Other Secrets: https://blog.envkey.com/secure-strategies-for-managing-passwords-api-keys-and-other-secrets-4cc3b2758c02
This application for sharing API keys with your team, which you self-host and manage: https://envault.dev/
And I want to quote what u/nickthemagicman commented, which I think is an important point to keep in mind:
But due to the fact that ya'll are still using .env files for this long and it's been this chaotic and no one has fixed this by now, it sounds like your biggest hurdle is going to be to get the team buy in, since it sounds like there's no centralized management either.
Not sure what stack you're using but we're solving this with Infisical.
It provides a source of truth for your environment variables and supports different environments (development, staging, and production). Your team can either automatically inject those variables into your local process or manually pull them down to update your .env file, whichever you're most comfortable with. It's end-to-end encrypted.
We ran into the same issues you're outlining and are finally solving them.
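Whichever tool holds the source of truth, the underlying check is the same: compare a local .env against the shared reference and report what has drifted. A minimal sketch of that comparison follows. It is not Infisical's API or CLI; the function names and the simple `KEY=value` parsing rules are assumptions for illustration only.

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def env_drift(reference: dict, local: dict) -> dict:
    """Report keys that are missing, extra, or changed relative to the reference."""
    shared = reference.keys() & local.keys()
    return {
        "missing": sorted(reference.keys() - local.keys()),
        "extra": sorted(local.keys() - reference.keys()),
        "changed": sorted(key for key in shared if reference[key] != local[key]),
    }
```

Running `env_drift` before starting a service locally would tell a developer which variables need updating, without anyone having to pass .env files around by hand.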

Should application version be stored in source control (e.g. Git) or CI?

We're migrating from locally run build/deploy scripts to a CI server. As of now, we keep the application version in Git (in the C# AssemblyInfo.cs file), which works decently.
I was wondering if there were any advantages/disadvantages of keeping the version inside our CI system rather than Git. I am unable to find much information on the subject.
I work in a team that has multiple modules. I would recommend storing your version number in the CI system and tagging your versioned code in Git.
The reason is that if you store your version in, say, the file system, it becomes problematic over time to increment it and share it across different modules. The overhead is not justified, especially when versioning and auto-incrementing solutions come out of the box with many build/CI systems.
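As a rough sketch of that approach, assuming the CI system supplies an incrementing build number (how it is exposed varies by CI system and is not shown here): the build step composes the version and produces the Git tag command for the commit it built. The `major.minor.build` scheme and the function names are illustrative assumptions, not a prescribed convention.

```python
def build_version(major: int, minor: int, build_number: int) -> str:
    """Compose a version string from a fixed base and a CI-supplied build number."""
    return f"{major}.{minor}.{build_number}"


def git_tag_command(version: str) -> list:
    """Return the git command that creates an annotated tag for this version."""
    return ["git", "tag", "-a", f"v{version}", "-m", f"Release {version}"]
```

The CI job would pass the resulting list to its process runner after a successful build, so every released version can be traced back to a tagged commit.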

Best source code control for a university environment (low overhead to manage repositories)

Does anyone know of a solution (web hosted or otherwise) for a source code control system that would work well in a university environment where information technology is the focus? We'd like to offer it as a campus-wide "version-control service", much like universities do with an email service. Specifically, I'm talking about the following peculiarities:
1. There are a large number of new repositories created/managed each semester. Any programming course or research project could require students to use source code control, in various source code environments (including .NET, Java, C++, LaTeX).
2. Students should be able to create and manage the repositories themselves. Involving an administrator, instructor, etc. does not scale.
3. Repository storage should be secure (private), and archivable for respecting intellectual property (preventing plagiarism, protecting research IP).
4. Any or all of the flavors of source code control (e.g., CVS/SVN/Git) would be acceptable.
5. Remote access to repositories is essential. Students/researchers have the freedom to work either in designated lab spaces or remotely. Marking of assignments can be done by instructors who have "checked out" the code anywhere.
6. If an academic license exists, it must scale for >500 students.
Many commercial/free products (web-based or otherwise) don't satisfy conditions #1 and #2, as they require superusers to administer accounts/repositories/accesses. Solutions such as Google Code, sourceforge.net, GitHub, etc. don't satisfy condition #3, as the repositories are always public.
Here's a free one: http://gitlabhq.com/
You can add repositories through this tool.
For security you use RSA keys.
And I would suggest using Git. SVN and CVS are outdated.
GitHub would appear to satisfy your requirements. You can set up your own instance in your intranet: https://enterprise.github.com/
You could use Git in the students' private file storage, if they have such a thing. Git doesn't require hosting other than a place to store files.
Redmine (SVN, CVS, Git, Mercurial, Bazaar and Darcs)
UberSVN (SVN)
Private Assembla (?) (SVN, Git, Mercurial)
One solution I use is to create a master Git repository in a TrueCrypt variable-size encrypted container. The container is placed in a Dropbox folder. The repository is cloned to the local hard drive, which becomes the working directory. All the work is done and checked in on the local repository. I wrote scripts that mount the encrypted container, push/pull between the local repo and the master repo, and dismount the encrypted container. Dropbox detects the changes in the encrypted container and syncs it to the Dropbox server. Security is maintained because an encrypted file is the only thing sent to the server. The only real way to ensure security is to do the encryption yourself.
All you need to set this up is a few scripts, TrueCrypt installed, and a Dropbox account. You could probably write some basic software to automate some of the steps. To make it scalable and low cost, the basic steps are still valid: create a master and a local repository, encrypt the master repo, work on the local repo and sync changes to the master, and back up the encrypted master repo online or on a server.
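The mount, sync, dismount sequence described above can be sketched as follows. This is not the author's script: the mount and dismount commands are passed in as parameters because the exact TrueCrypt invocation depends on the platform and is not given here, and the remote name `master-repo` and branch `main` are hypothetical defaults.

```python
import subprocess


def sync_with_master(mount_cmd, dismount_cmd, repo_dir,
                     remote="master-repo", branch="main", run=None):
    """Mount the encrypted container, sync the local repo, then always dismount.

    mount_cmd and dismount_cmd are argument lists supplied by the caller.
    run is injectable so the sequence can be exercised without real commands.
    """
    if run is None:
        def run(cmd):
            subprocess.run(cmd, check=True)

    run(list(mount_cmd))
    try:
        # Pull first so local work is merged before it is pushed back.
        run(["git", "-C", repo_dir, "pull", remote, branch])
        run(["git", "-C", repo_dir, "push", remote, branch])
    finally:
        # Dismount even if git fails, so the container file is closed
        # and the sync client sees a consistent file.
        run(list(dismount_cmd))
```

Putting the dismount in a `finally` block is a deliberate choice: a failed pull or push should never leave the container mounted while Dropbox is syncing it.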

SSAS Versioning Without Source Control

What is the best way to manage and combine different versions of SSAS solutions, without using version control?
Currently, we have a network drive where the "master" copy is stored. Individual developers work with a local copy, but we recently ran into a problem with adding changes to the "master" copy.
Any suggestions? Microsoft appears to have source control for SSIS. SSRS is easy enough to migrate by just copy/pasting the rdl files. There seems to be no easy way to accomplish this with SSAS packages.
What version are you talking about? I just recently added an entire SSAS project into source control. There was no issue at all.
We must somehow be talking about two different things.
You could try creating a single master XMLA script that holds the entire cube definition. This script would live on the network drive.
The XMLA script can be generated using the Analysis Services deployment tool. Then you would have to rely on a diff tool to manually merge the changes from each developer into the master file. This would be extremely cumbersome and error prone.
I would recommend just storing the project in source control. As previously mentioned, MSAS will work with any version control provider, since its source files are just XML. For best results, use a source control provider that integrates with Visual Studio.