Flask Web App with Machine Learning Inference

Build a Flask web app with machine learning inference. Our step-by-step guide makes integrating ML simple and efficient.


Integrating a Pretrained Machine Learning Model with Flask

 
  • This guide demonstrates how to integrate a pretrained machine learning (ML) model into a Flask web application. The ML model inference is wrapped inside a REST API endpoint that accepts input data, processes it, and returns the prediction in JSON format.
  • We assume you have a model (for example, a scikit-learn classifier) that has been serialized using pickle. For demonstration, our model will predict whether an input example belongs to a certain class.
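Before wiring up the endpoint, it helps to see how such a pickle file is produced. The sketch below uses a hypothetical stand-in class (`ThresholdModel`) instead of a real scikit-learn estimator so it runs without extra dependencies; any object with a `predict()` method would work the same way once serialized to model.pkl.

```python
import pickle

# Stand-in for a real estimator (e.g. a fitted scikit-learn classifier);
# the Flask endpoint only needs it to expose a predict() method.
class ThresholdModel:
    """Hypothetical model: predicts 1 if the feature sum exceeds a threshold."""
    def __init__(self, threshold=4.0):
        self.threshold = threshold

    def predict(self, rows):
        # rows is a 2D sequence of shape (n_samples, n_features)
        return [1 if sum(row) > self.threshold else 0 for row in rows]

# Serialize the "trained" model to model.pkl, just as you would with
# a fitted scikit-learn estimator
with open('model.pkl', 'wb') as f:
    pickle.dump(ThresholdModel(threshold=4.0), f)

# Round-trip check: load it back and run a prediction
with open('model.pkl', 'rb') as f:
    restored = pickle.load(f)

print(restored.predict([[3.1, 1.2, 0.5, 2.4]]))  # sum = 7.2 > 4.0 -> [1]
```

With a real scikit-learn model you would simply `pickle.dump` the fitted estimator instead of this stand-in; the loading code in the Flask app is identical either way.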

Loading and Initializing the ML Model

 
  • First, load the pretrained model when the application starts. This avoids reloading the model on each prediction request, which is crucial for performance.
  • The following code snippet shows how to load a pickle-serialized ML model:

# Import necessary Python modules
from flask import Flask, request, jsonify
import pickle  # For loading the pretrained model
import numpy as np  # For reshaping input data

# Initialize the Flask application
app = Flask(__name__)

# Load the pretrained ML model once at startup
# Ensure that the model file 'model.pkl' is available in your project directory
with open('model.pkl', 'rb') as model_file:
    model = pickle.load(model_file)
  • Here, an instance of the Flask application is created and the model is loaded using Python's built-in pickle module. The "rb" mode (read-binary) is used as the model file is binary.

Creating a Prediction Endpoint

 
  • Create a REST API endpoint that accepts POST requests. This endpoint will handle incoming data, convert it into a format the ML model expects, and then invoke the model for predictions.
  • Below is a detailed code snippet for the prediction endpoint:

# Define the /predict endpoint accepting POST requests
@app.route('/predict', methods=['POST'])
def predict():
    # Extract the input JSON data from the request
    data = request.get_json(force=True)

    # Expecting the JSON data to have a key "features" which is a list of values
    # Example payload: { "features": [3.1, 1.2, 0.5, 2.4] }
    features = data.get('features', None)

    if features is None:
        # Return an error response if the 'features' key is missing
        return jsonify({'error': 'No features provided'}), 400

    try:
        # Convert the input list into a NumPy array and reshape it to the
        # 2D shape (1 sample, n features) that scikit-learn models expect
        input_data = np.array(features, dtype=float).reshape(1, -1)
    except (TypeError, ValueError) as e:
        # Return an error response if the conversion fails
        return jsonify({'error': 'Invalid features format', 'message': str(e)}), 400

    try:
        # Perform prediction using the loaded model
        prediction = model.predict(input_data)

        # You might also want to return probabilities via model.predict_proba
        # probabilities = model.predict_proba(input_data) if hasattr(model, "predict_proba") else None

        # Return the prediction as JSON
        return jsonify({'prediction': prediction.tolist()})
    except Exception as e:
        # Catch any exception during prediction and return an error message
        return jsonify({'error': 'Prediction failed', 'message': str(e)}), 500
  • This endpoint expects the request's payload to be a JSON object containing a key named "features" with a list of numerical values.
  • The input list is reshaped to a 2-dimensional array as many scikit-learn models expect input data in the form of (number of samples, number of features).
  • Any exceptions during data processing or prediction are caught and returned with appropriate HTTP status codes.

Handling Errors and Validations

 
  • Robust error handling ensures that the client receives clear messages if the input is incorrect or if the model fails to predict. In the endpoint, we check for:
    • The presence of a "features" key in the JSON payload.
    • Proper reshaping of the features list into an array.
    • Catching exceptions during model prediction to avoid application crashes.
  • This improves the reliability of your service and makes debugging easier when problems arise.
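The checks listed above can be factored out of the endpoint into a small helper, which keeps the route function short and makes the rules easy to unit-test. The sketch below is illustrative: `validate_features` and `EXPECTED_FEATURES` are names introduced here, not part of the original code, and the expected feature count of 4 simply matches the example payload.

```python
EXPECTED_FEATURES = 4  # assumed feature count, matching the example payload

def validate_features(payload):
    """Return (features, error); exactly one of the two is None."""
    if not isinstance(payload, dict):
        return None, 'Payload must be a JSON object'
    features = payload.get('features')
    if features is None:
        return None, 'No features provided'
    if not isinstance(features, list):
        return None, 'features must be a list'
    if len(features) != EXPECTED_FEATURES:
        return None, f'Expected {EXPECTED_FEATURES} features, got {len(features)}'
    # Reject non-numeric entries (bool is a subclass of int, so exclude it)
    if not all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in features):
        return None, 'All features must be numeric'
    return features, None

print(validate_features({'features': [3.1, 1.2, 0.5, 2.4]}))  # -> ([3.1, 1.2, 0.5, 2.4], None)
print(validate_features({'features': [3.1, 'x', 0.5, 2.4]}))  # -> (None, 'All features must be numeric')
```

Inside the endpoint you would call the helper first and return a 400 response with the error message whenever `error` is not None.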

Running the Flask Application

 
  • After setting up the prediction endpoint, the final step is to run the Flask application. The following code snippet should be placed at the bottom of your file:

# Check if this script is executed directly and then run the Flask application
if __name__ == '__main__':
    # Enable debug mode for development; disable in production
    app.run(debug=True)
  • This snippet allows you to start the Flask web server. The debug=True flag is useful during development to provide detailed error logs and automatic server reloading on code changes. Remember to disable this in a production environment.

Testing the Inference Service

 
  • After running the Flask app, you can test the inference endpoint by sending a POST request to http://localhost:5000/predict with a JSON payload.
  • Use tools such as curl or Postman to simulate the request. For example, using curl:

# Sample curl command to test the inference endpoint
curl -X POST http://localhost:5000/predict -H "Content-Type: application/json" -d '{"features": [3.2, 1.5, 0.7, 2.1]}'
  • The server should respond with a JSON object containing the model's prediction.
  • This testing verifies that the ML model is correctly integrated within the Flask endpoint and that the entire pipeline is working as expected.
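If you prefer to test from Python rather than curl, the same request can be built with the standard library's urllib. The helper name `build_predict_request` is introduced here for illustration; the commented-out `urlopen` call assumes the Flask app is running on localhost:5000.

```python
import json
import urllib.request

def build_predict_request(features, url='http://localhost:5000/predict'):
    """Build the same POST request the curl command sends."""
    payload = json.dumps({'features': features}).encode('utf-8')
    return urllib.request.Request(
        url,
        data=payload,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )

req = build_predict_request([3.2, 1.5, 0.7, 2.1])

# With the Flask app running, send the request and read the JSON response:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```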

Deploying and Scaling Considerations

 
  • When deploying your Flask application with ML inference in production, consider using a production-ready web server (like Gunicorn or uWSGI) instead of the Flask built-in server.
  • For scaling, you might need to establish a proper CI/CD pipeline, containerize your application using Docker, and use orchestration tools like Kubernetes.
  • Cache frequently requested predictions if applicable and set up monitoring for system performance, ensuring that the ML service meets the necessary throughput and latency requirements.
  • Also, secure your endpoints (e.g., using HTTPS, authentication, and rate-limiting) to protect sensitive data and prevent misuse.
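The caching point above can be sketched with `functools.lru_cache` from the standard library. Everything here is illustrative: `run_model` stands in for the real model's `predict()` call, and features are passed as a tuple because `lru_cache` requires hashable arguments.

```python
from functools import lru_cache

# Hypothetical stand-in for the loaded model's predict() call
def run_model(features):
    return [sum(features)]  # dummy "prediction"

@lru_cache(maxsize=1024)
def cached_predict(features_tuple):
    # lru_cache requires hashable arguments, so features arrive as a tuple;
    # identical inputs are served from the cache without re-running the model
    return tuple(run_model(list(features_tuple)))

# First call computes, second identical call is a cache hit
print(cached_predict((3.2, 1.5, 0.7, 2.1)))
print(cached_predict((3.2, 1.5, 0.7, 2.1)))
print(cached_predict.cache_info().hits)  # -> 1
```

In-process caching like this only helps when the same inputs recur and the model is deterministic; for multi-worker deployments a shared cache (e.g. Redis) is the more common choice.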

