Welcoming Picsellia to the AXC family

Article Date

October 11, 2022

The basics of MLOps - from model to business

MLOps (machine learning operations) should be understood as the entire value chain of people, processes, practices and technologies that automates every step from the deployment to the management of machine learning (ML) models. Its strength is that the process scales, and it must ultimately deliver measurable business value from machine learning.

MLOps should not be confused with ML. The relationship between MLOps and ML is the same as the one between DevOps and software development.

An organization's MLOps infrastructure depends greatly on its level of maturity on the subject. Broken down simply, the process consists of the following steps:

Creation, training and validation of models
Versioning of models
Deployment of models
Evaluation and monitoring of models in production
Correction of models to reduce risk
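The five steps above can be sketched as a small loop. This is only an illustration under stated assumptions: the toy `ThresholdModel`, the in-memory `Registry` and the `min_accuracy` floor are hypothetical names invented for this example, and a real MLOps stack would replace each of them with dedicated tooling.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Sample = Tuple[float, int]  # (feature value, label); hypothetical toy data format


@dataclass
class ThresholdModel:
    """Toy classifier: predicts 1 when the feature is at or above the threshold."""
    threshold: float

    def predict(self, x: float) -> int:
        return 1 if x >= self.threshold else 0


def train(samples: List[Sample]) -> ThresholdModel:
    """Creation and training: place the threshold midway between the class means."""
    pos = [x for x, y in samples if y == 1]
    neg = [x for x, y in samples if y == 0]
    return ThresholdModel((sum(pos) / len(pos) + sum(neg) / len(neg)) / 2)


def accuracy(model: ThresholdModel, samples: List[Sample]) -> float:
    """Used both for validation and for monitoring in production."""
    return sum(model.predict(x) == y for x, y in samples) / len(samples)


@dataclass
class Registry:
    """Versioning and deployment: every model gets a number, one version is live."""
    versions: List[ThresholdModel] = field(default_factory=list)
    deployed: Optional[int] = None

    def register(self, model: ThresholdModel) -> int:
        self.versions.append(model)
        return len(self.versions)  # versions are numbered from 1

    def deploy(self, version: int) -> None:
        if not 1 <= version <= len(self.versions):
            raise ValueError(f"unknown model version: {version}")
        self.deployed = version

    def live_model(self) -> ThresholdModel:
        if self.deployed is None:
            raise RuntimeError("no model has been deployed yet")
        return self.versions[self.deployed - 1]


def monitor_and_correct(registry: Registry, production_samples: List[Sample],
                        min_accuracy: float = 0.9) -> bool:
    """Monitoring and correction: retrain and redeploy when accuracy drops too low."""
    if accuracy(registry.live_model(), production_samples) >= min_accuracy:
        return False
    registry.deploy(registry.register(train(production_samples)))
    return True
```

Older versions stay in the registry after a redeployment, which is what makes the archiving and audit requirements discussed in this article possible.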

Throughout this value chain, it is essential to pay attention to how these steps intersect and repeat. In a mature infrastructure, they form an automated system that streamlines every stage of the model lifecycle, from training and deployment, through production, to archiving and storage for compliance and risk management purposes, which makes the whole process transparent. Archiving is sometimes a necessity: loan mitigation or fraud detection models, for example, need to be audited because of legislation.

MLOps process diagram

The first part, DataOps, varies with the type of data used in the model, while the next two parts (Data Science and Production) are the same for all data types.

The key point to take from this diagram is how iterative MLOps is. The data and the model evolve constantly thanks to a feedback loop that keeps improving the model.

Moreover, beyond the data process as such, this is reflected above all in the diversity of professions involved in it:

C-levels and Product team: defining business objectives with KPIs.
Data Engineering: data acquisition and preparation.
Data Science: architecting ML solutions and developing models.
DevOps: complete deployment setup, monitoring alongside scientists.

The path to a working, monitored model is complex, but the difficulties do not end there. A disconnect can quickly appear between the model and the business objectives it was created for. There are several reasons for this:

Reflecting the evolution of business objectives in the model: AI is based on real-world data, so if the world changes, the model must change too. Adapting constantly is difficult, which is why tools are needed.
Risk assessment: there is much debate around the black-box nature of these ML/DL systems. Models often tend to move away from what they were originally intended to do (a phenomenon known as drift). Assessing the risk and cost of such failures is an important and meticulous step.
Communication gaps: technical and business teams can struggle to find a common language for collaboration. More often than not, this gap becomes the reason large projects fail.
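To make the notion of drift more concrete, here is a minimal sketch of one simple way to flag it: compare a summary statistic of production inputs (for images, this could be something like mean brightness) with the same statistic on the data the model was trained on. The function names and the threshold of 2.0 are assumptions made for this illustration, not a method described by Picsellia or any specific library.

```python
import statistics
from typing import Sequence


def mean_shift_score(reference: Sequence[float], production: Sequence[float]) -> float:
    """Distance between the two means, in units of the reference standard deviation."""
    ref_std = statistics.pstdev(reference)
    if ref_std == 0:
        raise ValueError("reference data has no variance, the score is undefined")
    return abs(statistics.fmean(production) - statistics.fmean(reference)) / ref_std


def has_drifted(reference: Sequence[float], production: Sequence[float],
                threshold: float = 2.0) -> bool:
    """Flag drift when the production mean has moved by more than `threshold` deviations.

    The default threshold is an arbitrary assumption; it should be tuned per use case.
    """
    return mean_shift_score(reference, production) > threshold


# Hypothetical example: brightness of daytime training images vs. night-time production images.
daytime = [100.0, 102.0, 98.0, 101.0, 99.0]
night = [60.0, 62.0, 58.0, 61.0, 59.0]
drift_detected = has_drifted(daytime, night)
```

A check like this only looks at one statistic of the inputs. It says nothing about the cost of a failure, which is why the risk assessment step remains a business discussion as much as a technical one.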

This is even more relevant in computer vision, where the costs of some applications can be enormous, in particular because computer vision faces new challenges such as the structuring of data. The historical ML companies (Dataiku and DataRobot, for example) work on structured data (for example, tabular data such as Excel tables) but are not necessarily suited to unstructured data. It is this change in maturity that has triggered the MLOps trends of recent years.

The case of Computer Vision

The challenges facing Machine Learning in general are even more acute in the case of Computer Vision. Few tools specialise in this type of data, which complicates the task of the companies that need them (images and videos are much more difficult to process than time series, for example).

The paradox is that, although there are still no satisfactory tools for this vertical, computer vision has been booming. Cameras, drones, satellites and aircraft collect large amounts of data. Sensors mounted on equipment can measure very specific things, such as the ripeness of fruit in agriculture or crime in smart-city applications.

It is particularly in this kind of application that the people and companies who need the models do not necessarily master all the issues, and sometimes have limited maturity in AI. The range of actors involved in these use cases is noteworthy: industrial companies, utilities, sports, security organisations, government, local authorities, European institutions. Some of them are well aware of what computer vision can offer but do not necessarily have entire teams dedicated to it. This is why tools that simplify the process are essential.

A recent Cognilytica study states that the greatest challenge faced by most AI/ML teams is data management and optimization. About 50% of the time they spend developing AI goes to training data, while another 15% goes to augmenting datasets to optimize processes around training data. In the long run, these optimizations can save them a significant amount of money and time.

*Source: Data extracted from Cognilytica — Data Preparation & Labeling for AI 2020*

These issues are even more important in the case of computer vision, so a tool that brings order to the beginning of the value chain is extremely useful for data teams. There are currently many tools for tagging, but few that include Data Management (visualization, search, versioning, exploration). It is through this value proposition that Picsellia first comes into contact with its customers.

Picsellia, the ideal candidate

This lack of verticalized end-to-end tools in the Computer Vision sector, combined with the sector's enormous growth, motivated Axeleo Capital to invest in Picsellia.

The founding idea of Picsellia is that the vast majority of artificial intelligence projects are never exploited and remain at the R&D stage because they are not directly designed for production. Generally speaking, it is still complicated in this sector to track data properly, to manage the AI infrastructure and to check that the model is working correctly (avoiding drift), which is why MLOps has developed so much in recent years.

Concretely, Picsellia provides an MLOps platform for Computer Vision that covers the entire value chain, from data management to model deployment and monitoring. It starts with data storage (often from cameras in the case of Computer Vision), followed by data management. The solution then enables collaboration on Deep Learning models and the storage of artifacts (files generated during model training). It also orchestrates the AI infrastructure to keep the cost of training models under control as well as possible. Finally, the solution allows very precise parameterisation and deployment of the models.

Alternatives exist for each of the solution's value propositions, but none of them offers monitoring adapted to Computer Vision, let alone end-to-end. Full-stack tools with an open architecture like Picsellia are well suited to end-to-end projects where companies need to rely on a structured sandbox.

That said, if we take a step back to the bigger picture described in the first part of this article, and as we can see in the mapping below, we think there is still a lot of space and value to be covered in MLOps in general. Depending on the use case and the customer profile, this space can be filled by both closed- and open-source features that ease access to and development of AI and help data scientists solve very specific pains in the value chain of a project (ingestion of data, data preparation, experiment tracking, model monitoring, governance & security, etc.).

At Axeleo Capital, we are still actively looking for projects that bring innovation to the MLOps value chain. So feel free to contact us if you think you are a match!