Bench Talk for Design Engineers | The Official Blog of Mouser Electronics


Extending the ML Proof of Concept for Production

Becks Simpson

Moving Your ML Proof of Concept to Production Part 5: Deploying the Model

(Source: VectorMine / stock.adobe.com)

Once a machine learning (ML) proof of concept (POC) has been built and “proves the concept,” a few more steps are needed to realize its value. Simply compiling a dataset and demonstrating that a model predicts things from it will not ensure success with real-world use cases. Software development is needed to achieve a robustly integrated ML process that works with the intended product or software stack.

This series has described translating business objectives to ML metrics, building a project-specific dataset, structuring the experimentation environment, and developing the POC model. Assuming that the whole process was a success, the next step is to prepare the model for production. This involves establishing production requirements like expected latency or framework compatibility and then using those to select the right tooling and architecture to deploy the model. The next step entails integrating the model and—through testing and proper software development practices—ensuring that its use in a product is more robust than its use during experimentation.

This part of our blog series presents some of the key tools and software packages available for model deployment as well as tips and tricks for a solid production-level ML experience.

Assembling Production Infrastructure

Typically, developers will need to determine how to host the model and define necessary changes or additions to integrate the model into a product or to build something new around it. The field of ML operations (MLOps) explores all the different considerations of ML production as well as tooling and resources needed, regardless of the complexity of one’s requirements.

The following are the most common considerations when determining the setup for ML in production:

  • Integration: Any existing infrastructure (e.g., cloud provider usage) or future need for a specific platform compatibility (e.g., data monitoring tools) will affect the choice of integration solution.
  • Performance: The speed of the model's answers and the expected volume of data to be processed influence performance requirements.
  • Framework compatibility: If the model was developed using a particular ML framework, such as TensorFlow or PyTorch, then the deployment framework must be compatible. A deployment framework-agnostic infrastructure could also be useful if other ML frameworks might be used for future model development (see the sketch after this list).
  • Complexity: Depending on the internal skill set of the team members and how familiar they are with ML development and production, solutions that are simpler to develop and use might be necessary.
  • Cost: Because different solutions come with different costs—such as cloud compute versus on-premises infrastructure or licensing fees for hosted solutions—understanding the available budget is important.
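To make the framework-compatibility point more concrete, the sketch below (a minimal illustration, not from the original article) exports a small PyTorch model to the framework-agnostic ONNX format and runs it with onnxruntime; the model, input shape, and file name are placeholders.

```python
# Sketch: exporting a PyTorch model to ONNX so any ONNX-compatible runtime can
# serve it, independent of the training framework. The model, input shape, and
# file name are illustrative placeholders (requires: pip install onnx onnxruntime).
import torch
import torch.nn as nn
import onnxruntime as ort

# Stand-in for the trained POC model.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
model.eval()

# Export by tracing the model with a dummy input.
dummy_input = torch.randn(1, 4)
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["features"], output_names=["prediction"])

# The exported graph can now be run without PyTorch at all.
session = ort.InferenceSession("model.onnx")
prediction = session.run(None, {"features": dummy_input.numpy()})[0]
print(prediction)
```

Exporting to an interchange format like this is one way to keep the serving infrastructure independent of whichever framework is chosen for future model development.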

Model Serving

Model serving is essential to ML production, and it involves deploying the model weights (i.e., artifacts of training that are key for reuse in later prediction tasks) as well as the code to run the model in an accessible location. A typical architecture configuration includes deploying the model with a tool that serves it in memory and accepts data via an application programming interface (API) call. The API call triggers the tool to feed data through the model and return a predicted output. This is commonly described as “online” serving because the latency is quicker than the batched alternative, and the model is essentially always ready for processing predictions. For example, MLflow has serving capabilities that allow users to interact with a trained model via specific code, making it quick and seamless to put into production. Some downsides, however, are a lack of security and resource management issues if too many requests come in at once or if the model or data size is large enough to require more resources.
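As a rough sketch of that MLflow-style workflow, the snippet below assumes a model has already been logged or saved in MLflow format and served locally; the model URI, port, and payload shape are illustrative and depend on the MLflow version and the model's signature.

```python
# Serve a logged/saved model with MLflow's built-in scoring server
# (placeholder model URI and port):
#   mlflow models serve -m models:/my_model/1 -p 5000
import requests

# The scoring server exposes an /invocations endpoint; the exact payload format
# (e.g., "inputs" vs. "dataframe_split") depends on the MLflow version and the
# model's signature.
payload = {"inputs": [[5.1, 3.5, 1.4, 0.2]]}
response = requests.post("http://127.0.0.1:5000/invocations", json=payload)
print(response.json())
```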

Other third-party tools cover these aspects and can automate the serving workflow, from initiating input data-processing jobs to running the model for predictions and returning the results to multiple users, although these tools often come with higher price tags and less flexibility. Cloud providers have their own tools, such as Google Vertex AI or Azure Machine Learning, and cloud-agnostic options also exist, such as Iguazio, BentoML, and Seldon. Some ML frameworks, like TensorFlow, also have their own libraries and setup for model serving, though using them often means being locked into a specific model development framework.

Integrating the Model

The model integration stage involves developing any software required to use a served model; for instance, writing the necessary code to call the model if it is served via third-party tools and made accessible through an API. Alternatively, this stage might involve writing software modules that load the model and run it against specific data, depending on how the model is stored and the deployment configuration. If a particular framework was used for model development, it often comes with software examples of how to use the model in an existing codebase. One should take care at this step to ensure that best practices around data security and authenticated access are implemented since these are often omitted during the experimentation phase.
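For illustration, a minimal integration module might look like the sketch below, which wraps an API-served model behind a single function and adds authenticated access and basic error handling; the endpoint URL, token variable, and payload format are assumptions rather than details from the series.

```python
# Sketch: a thin integration layer around an API-served model.
# Endpoint, auth scheme, and payload format are illustrative assumptions.
import os
import requests

MODEL_ENDPOINT = os.environ.get("MODEL_ENDPOINT", "https://ml.example.com/invocations")
API_TOKEN = os.environ["MODEL_API_TOKEN"]  # keep credentials out of the codebase


def predict(features, timeout=5.0):
    """Send one feature vector to the served model and return its prediction."""
    response = requests.post(
        MODEL_ENDPOINT,
        json={"inputs": [features]},
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=timeout,
    )
    response.raise_for_status()  # surface HTTP errors instead of failing silently
    return response.json()
```

Keeping the call behind one function like this also makes it easier to swap serving backends later without touching the rest of the product code.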

During integration, one should also identify and adapt data processing for differences in production data. To make the overall process more robust, take the basic data pipeline code initially used to transform and feed data into model training and extend it to cover unexpected inputs and different failure modes. Ensuring that the input and output data-processing pipelines are covered by unit and integration tests is another important step to assure that the ML-adjacent code will function well in production.
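As a brief example of that kind of coverage, the tests below exercise a hypothetical preprocess() function from the data pipeline against both a well-formed record and a malformed one; the module, function name, and expected behavior are assumptions made for illustration.

```python
# Sketch: unit tests (pytest) for a hypothetical data-processing step.
# `preprocess` is assumed to turn a feature dict into a fixed-length numeric
# vector and to reject malformed records explicitly.
import pytest

from my_pipeline import preprocess  # hypothetical module from the POC


def test_preprocess_handles_expected_input():
    record = {"temperature": 21.5, "humidity": 0.4}
    vector = preprocess(record)
    assert len(vector) == 2
    assert all(isinstance(value, float) for value in vector)


def test_preprocess_rejects_missing_fields():
    # Production data may be incomplete; failure should be explicit, not silent.
    with pytest.raises(ValueError):
        preprocess({"temperature": 21.5})
```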

Conclusion

Once an ML POC has been shown to work, the more software-focused process of adapting it for production can begin. Identifying requirements around cost, framework compatibility, performance, integration, and complexity will help guide the choice of tooling and architecture solutions for the ML production, particularly when it comes to model serving. Building on what was established for data processing to make the model more robust to different use cases and failure modes is another important step, as is introducing solid software engineering practices like testing and security during integration.

This series has covered five important first steps to success with an ML project: establishing goals and translating them to metrics, preparing the dataset, building a robust development environment, developing the POC model, and putting the model into production. The final installment will cover the important post-production considerations—from monitoring data drift and model performance to deciding when to retrain a model—and offer tips for testing new versions.





Becks Simpson is a Machine Learning Lead at AlleyCorp Nord, where developers, product designers, and ML specialists work alongside clients to bring their AI product dreams to life. In her spare time, she also works with Whale Seeker, another startup using AI to detect whales so that industry and these gentle giants can coexist profitably. She has worked across the spectrum in deep learning and machine learning, from investigating novel deep learning methods and applying research directly to solving real-world problems, to architecting pipelines and platforms to train and deploy AI models in the wild, and advising startups on their AI and data strategies.

