We are extremely excited to welcome Picsellia, an MLOps platform dedicated to computer vision projects, as the first investment of our brand new fund AXC2, dedicated to European golden nuggets in cybersecurity, fintech and enterprise software! Here are a few thoughts on our investment in MLOps, which we believe is the backbone of any successful artificial intelligence project, especially for companies that are new to the productization of AI.
According to IBM's Global AI Adoption Index 2021, 74% of companies are exploring or deploying AI. Yet most AI projects fail, so this mass adoption calls for a set of tools that support the development of these projects and help them succeed. This partly explains why investments in AI have been soaring year over year for more than seven years now.
MLOps (machine learning operations) should be understood as the entire value chain of people, processes, practices and technologies that automate every step from the deployment to the management of machine learning (ML) models. Its strength is that the process is scalable, and it must ultimately deliver measurable business value from machine learning.
MLOps should not be confused with ML. To make a meaningful comparison, MLOps is to ML what DevOps is to Dev.
The MLOps infrastructure of an organization depends greatly on how mature the organization is on the subject. If we were to break down this process in a simple way, we would highlight the following steps:
Throughout this value chain, it is essential to pay attention to how these steps intersect and repeat to enable machine learning. If the infrastructure is mature, all of these steps form an automated system that streamlines every stage of the model lifecycle, from training and deployment, through its production life, to archiving and storage for compliance and risk management purposes, providing full transparency of the process. Models do sometimes need to be archived: for example, loan mitigation or fraud detection models need to be audited because legislation requires it.
MLOps process diagram
The first part, DataOps, can evolve according to the type of data used in the model, while the next two parts (Data Science and Production) are the same for all data types.
What this diagram shows above all is how iterative the MLOps path really is. The data and the model are constantly evolving thanks to a feedback loop that allows the model to be continuously improved.
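To make the feedback loop a little more concrete, here is a minimal sketch of one common way to close it: predictions the model was unsure about in production are sent back for human review, then added to the next version of the training data. The `Prediction` class, the `select_for_review` function and the 0.6 threshold are all hypothetical names and values chosen for illustration; they do not describe any specific tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prediction:
    """One production prediction (hypothetical structure for this sketch)."""
    image_id: str
    label: str
    confidence: float


def select_for_review(predictions: List[Prediction], threshold: float = 0.6) -> List[str]:
    """Return the IDs of low-confidence predictions, least confident first.

    The threshold is an assumption; in practice it would be tuned per use case.
    """
    uncertain = [p for p in predictions if p.confidence < threshold]
    uncertain.sort(key=lambda p: p.confidence)
    return [p.image_id for p in uncertain]


# Example: two of the three predictions fall below the threshold
# and would be queued for relabeling before the next training run.
batch = [
    Prediction("img_1", "ripe", 0.55),
    Prediction("img_2", "unripe", 0.97),
    Prediction("img_3", "ripe", 0.31),
]
to_relabel = select_for_review(batch)
```

The images returned here would go back to the annotation step, which is what makes the path a loop rather than a straight line.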
Moreover, beyond the data process itself, the complexity of MLOps shows above all in the diversity of professions involved in this data processing.
We have just seen that the process of building and monitoring a model is very complex, but we are not out of the woods yet. A disconnect can very quickly appear between the model and the business objectives for which it was created. This can be explained by several reasons:
This is even more relevant in computer vision, where the costs of some applications can be enormous, in particular because computer vision faces new challenges such as the structuring of data. Indeed, the historical ML companies (Dataiku and DataRobot, for example) work on structured data (for example, tabular data such as Excel tables) but are not necessarily suited to unstructured data. It is this shift in maturity that has triggered the MLOps trends of recent years.
The challenges facing Machine Learning in general are even more acute in the case of Computer Vision. There are few tools specialised in this type of data processing, which complicates the task of the companies that need them (images and videos being much more difficult to process than time series, for example).
The paradox is that, although there are still no satisfactory tools for this vertical, computer vision has been booming. Cameras, drones, satellites and aircraft collect a great deal of information. Sensors mounted on equipment can measure very precise elements, such as the degree of ripeness of fruit in agriculture, or city crime in smart-city applications.
In this kind of application in particular, the people and companies who need the models do not necessarily master all the issues and sometimes have limited maturity in AI. The plurality of actors involved in developing these use cases is noteworthy: industrial companies, utilities, sports, security organisations, government, local authorities, European institutions. Some of these actors are well aware of the value computer vision can bring but do not necessarily have entire teams dedicated to it. This is why tools that simplify the process are really essential!
A recent Cognilytica study states that the greatest challenge faced by most AI/ML teams is data management and optimization. About 50% of the time they spend developing AI goes to training data, while another 15% goes to augmenting datasets to optimize processes around training data. In the long run, these optimizations can help them save a significant amount of money and time.
These issues are even more important in the case of computer vision, so a tool that brings order to the beginning of the value chain is extremely useful for data teams. There are currently many tools for tagging, but few that include Data Management (visualization, search, versioning, exploration). It is through this value proposition that Picsellia first comes into contact with its customers.
This lack of verticalized end-to-end tools in the Computer Vision sector, combined with the enormous growth of this sector, motivated Axeleo Capital to invest in Picsellia.
The founding idea of Picsellia is that the vast majority of artificial intelligence projects are never ultimately exploited and remain at the R&D stage because they were not designed for production from the start. Generally speaking, it is still complicated in this sector to properly track the data, to manage the AI infrastructure and to check that the model keeps working correctly (avoiding drift): this is why MLOps is a sector that has developed particularly in recent years.
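As an illustration of what checking for drift can look like, here is a minimal sketch that compares the distribution of one numeric image feature (say, mean brightness) at training time and in production, using the Population Stability Index (PSI). This is a generic technique chosen for the example, not a description of how Picsellia monitors models; the function name `psi`, the choice of feature and the commonly quoted 0.2 alert threshold are assumptions.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], production: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between two samples of one numeric feature.

    Bins are equal-width and built from the reference sample's range.
    """
    lo, hi = min(reference), max(reference)
    # Fall back to a width of 1.0 if the reference sample is constant.
    width = (hi - lo) / bins or 1.0

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the reference range into the edge bins.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # A small epsilon keeps empty bins from producing log(0).
        eps = 1e-6
        return [c / len(values) + eps for c in counts]

    ref = proportions(reference)
    prod = proportions(production)
    return sum((p - r) * math.log(p / r) for r, p in zip(ref, prod))


# Hypothetical brightness values: production images are noticeably brighter.
train_brightness = [i / 100 for i in range(100)]
prod_brightness = [0.5 + i / 200 for i in range(100)]

drift_score = psi(train_brightness, prod_brightness)
# Assumed rule of thumb: a PSI above 0.2 is worth investigating.
needs_attention = drift_score > 0.2
```

A score of zero means the two samples fall into the bins in identical proportions; the larger the score, the further production data has moved away from what the model was trained on.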
Concretely, Picsellia provides an MLOps platform for Computer Vision. Picsellia covers the entire value chain, from data management to model deployment and monitoring. This starts with data storage (often from cameras in the case of Computer Vision), followed by data management. The solution then enables collaboration on Deep Learning models and the storage of artifacts (files generated during model training). It also orchestrates the AI infrastructure so that the cost of training the models can be controlled as well as possible. Finally, the solution allows very precise parameterisation and deployment of the models.
For each of the solution's value propositions, alternatives exist, but none of them offers monitoring adapted to Computer Vision, let alone as part of an end-to-end offering. Full-stack tools with an open architecture like Picsellia are particularly relevant for end-to-end projects where companies need to rely on a structured sandbox.
Having said that, if we take a step back to look at the bigger picture explained in the first part of the article, and as we can see in the mapping below, we think that there is still a lot of space and value to be covered in MLOps in general. Depending on the use case and the customer profile, this space can be filled by both closed-source and open-source features that ease the access to and development of AI and help data scientists solve very specific pains in the value chain of a project (ingestion of data, data preparation, experiment tracking, model monitoring, governance & security, etc.).
At Axeleo Capital, we are still actively looking for projects that bring innovation to the MLOps value chain. So feel free to contact us if you think you are a match!