Working From Home, and Collaborating, as a Data Scientist

Insights from our partners at Allegro AI.

Global COVID-19 restrictions are creating a new wave of people who work from home. But for many, doing so effectively is far from straightforward.

Take data scientists as an example. Technically, their work doesn’t require their physical presence in a particular location. But when they depend on massive datasets and extensive compute infrastructure to train artificial intelligence (AI) systems, such as machine learning (ML) algorithms and deep learning (DL) models, and when they do so collaboratively, they can rarely work in isolation.

The founders of Allegro AI, one of our portfolio companies, saw the need and the opportunity to provide an end-to-end lifecycle management solution for companies trying to capitalize on AI advances, especially those incorporating deep learning for computer vision. Their open-source “Allegro Trains” package, an ML/DL experiment manager and ML-Ops tool, helps companies develop, deploy, and manage ML/DL solutions.

When creating its platform, Allegro AI never had today’s enforced work-from-home situation in mind. But the platform turns out to be particularly well suited to helping data scientists on ML/DL projects work effectively from home. As the company explains in its blog post below (reposted here with permission), effective collaboration is possible for data scientists, even when working remotely:

For Data Scientists, Work from Home Should Just…Work.

Effective Collaboration Doesn’t Have to Start With A Crisis

Let’s briefly review the collaboration challenges in the world of the Data Scientist, whether she’s at home or in the office, and then figure out what solution she’d need to streamline her day.

Collaboration: Beyond the Water Cooler

The very nature of the experimentation process means that, unlike standard software, development does not follow a linear track. Experiments often need to be reproduced not just for the individual’s own use, but also for others on the team.

After all, strategies and models developed by one data scientist may be excellent candidates to try out on a team member’s own dataset. Team members collaborating in this fashion compare results, track and share progress, and then ideally draw productive conclusions. Few AI models are created by a single person.

When you are sitting at home without your team physically around you, the software you use to manage ML/DL experiments needs to have this kind of sharing baked into its DNA.

The nature of ML and DL adds another layer of complexity to sharing: each experiment is constructed from a combination of {code, model (repository), dataset}.

It is never fun to take the repository from one experiment and try to reuse it on another experiment with a different dataset. Yet without this capability, best practices and previous learnings go unused, despite their value. Further eroding the effectiveness of shared wisdom is the reality that, as in many industries, colleagues are no longer in the same room during this pandemic.

The quick question or passing comment that might save hours of work no longer happens anywhere near as often, no matter how much you Skype or WhatsApp.

To overcome the challenges of this new reality, teams should ideally have the following capabilities at their disposal:

Versioning and experiment management for complete reproducibility, as well as logging and accessibility for others.
A collaborative project space that reflects the status of the team’s work, one that allows experiments to be tracked, compared, and dynamically ranked by performance or other parameters on a shared leaderboard.
An ability to share experiments in a way that decouples the tight links between code, model, and dataset, as well as from the specific setup environment, so that colleagues can, for example, take an experiment and simply associate it with their own dataset and setup.
Tools to enable teams to collaboratively streamline their virtual workspace and workflows.
The ability to use all of the above while at the same time allowing different data scientists to work as they prefer, within an environment that fits their needs and preferences.
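To make the decoupling and leaderboard ideas in the list above more concrete, here is a minimal sketch in Python. It is not the Allegro Trains API: `Experiment`, `with_dataset`, and `leaderboard` are hypothetical names, and the plain string references stand in for whatever code, model, and dataset versioning a team actually uses. The point is only that when code, model, dataset, and environment are stored as separate references, a colleague can swap in her own dataset and setup without touching the rest.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List


@dataclass(frozen=True)
class Experiment:
    """Hypothetical experiment record: code, model, dataset and environment
    are kept as separate references instead of being baked together."""
    name: str
    code_ref: str        # e.g. a git commit hash (illustrative only)
    model_ref: str       # e.g. a model checkpoint identifier (illustrative only)
    dataset_ref: str     # which dataset this run used
    environment: str     # e.g. a container image or requirements file
    metrics: Dict[str, float] = field(default_factory=dict)

    def with_dataset(self, dataset_ref: str, environment: str, new_name: str) -> "Experiment":
        # Reuse the same code and model, but associate a colleague's dataset and setup.
        # Metrics are cleared because the new run has not been executed yet.
        return replace(self, name=new_name, dataset_ref=dataset_ref,
                       environment=environment, metrics={})


def leaderboard(experiments: List[Experiment], metric: str) -> List[Experiment]:
    """Rank the experiments that report `metric`, highest value first."""
    scored = [e for e in experiments if metric in e.metrics]
    return sorted(scored, key=lambda e: e.metrics[metric], reverse=True)


# Usage: a teammate's baseline, a copy re-pointed at my data, and another trial.
base = Experiment("resnet-baseline", "a1b2c3", "resnet50-v1", "team-a-images",
                  "py3.8-cuda10", {"accuracy": 0.91})
mine = base.with_dataset("my-images", "py3.8-cuda11", "resnet-on-my-data")
other = Experiment("vit-trial", "d4e5f6", "vit-small", "team-a-images",
                   "py3.8-cuda10", {"accuracy": 0.93})
board = leaderboard([base, mine, other], "accuracy")
print([e.name for e in board])  # ['vit-trial', 'resnet-baseline']
```

The copied experiment does not appear on the leaderboard until it has been run and reports the metric; a real system would also record who ran what, and on which machine.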

Teams need shared hardware that “collaborates”

Sharing hardware was a challenge even in the best of times, when we were all sitting down the hall from DevOps, because standard DevOps orchestration tools are not built for ML-Ops challenges. If the Data Scientist could easily track, control, and even move experiments between machines, set up queues, and re-order jobs, all on her own from a web-based interface, it would make life a lot easier for both her and the DevOps team. You can read more about boosting your machine and deep learning DevOps remotely here.

Efficiency With or Without Corona

Interestingly, Allegro Trains was not created with pandemics (or even working from home) in mind; our open-source platform was simply designed to make AI experiment management smooth, painless, and free of heavy DevOps overhead that could slow down the process and frustrate everyone involved.

It just so happens that the workflows included in Trains are flexible and lightweight enough to make remote work virtually identical to sitting in the office. We’ve been using it ourselves, from our home offices, dens, and living room couches, since this surreal new world began; our team agrees that it’s as if we “accidentally” created the cure before the sickness. We can only wish the same success to the vaccine developers around the world.

To find out more about the Allegro AI ML/DL platform, visit the Allegro AI website or contact Allegro AI directly.