Machine Learning Costs: Price Factors and Real-World Estimates


Recently, we published an article shedding light on the costs of developing an AI solution. In this blog post, we will focus on one of AI’s subsets, machine learning, and estimate how much it costs to train, deploy, and maintain intelligent algorithms.

To keep it practical, we sat down with Kirill Stashevsky, ITRex CTO, and asked him to draw machine learning cost estimates from our portfolio. He also shared his experience of developing ML solutions and listed the steps businesses can take to reduce their investment in machine learning – without sacrificing quality or time to market.

Machine learning cost factors

But before getting down to numbers, let’s quickly highlight the factors determining the final cost of a machine learning solution.

1. The complexity of the solution you want to build

Machine learning solves many problems of varying complexity. Social media engines making friend suggestions, smart surveillance cameras recognizing faces in video footage, and healthcare expert systems predicting heart failure all run on machine learning. However, their complexity, performance, responsiveness, compliance requirements, and, hence, costs vary a lot.

2. The approach to training an ML model

There are three approaches to machine learning: supervised, unsupervised, and reinforcement learning. Which one you choose affects machine learning costs.

The essence of each of these methods boils down to this:

  • Supervised learning uses manually labeled datasets to teach algorithms to correctly classify or understand the relationships between data points.
  • Unsupervised learning means that algorithms search for patterns in datasets themselves, with no previous labeling, though it still requires some human intervention, mainly for validating output variables.
  • Reinforcement learning is a bit trickier: instead of learning from a fixed dataset, the model trains in operation. A reinforcement learning agent takes an action and is either “rewarded” or “punished” for it.
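
To make the difference between the first two approaches concrete, here is a minimal sketch in plain Python. The data points, labels, and function names are invented for illustration; real projects would typically rely on an ML library. Reinforcement learning is left out because it needs an environment to interact with.

```python
# Toy one-dimensional data, invented for illustration only.
labeled = [(1.0, "low"), (1.2, "low"), (0.8, "low"),
           (5.0, "high"), (5.3, "high"), (4.9, "high")]
points = [x for x, _ in labeled]


def train_supervised(samples):
    """Supervised: use the human-provided labels to compute one centroid per class."""
    groups = {}
    for x, label in samples:
        groups.setdefault(label, []).append(x)
    return {label: sum(xs) / len(xs) for label, xs in groups.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def cluster_unsupervised(xs, iterations=10):
    """Unsupervised: 2-means clustering on raw values, with no labels involved."""
    centers = [min(xs), max(xs)]  # simple deterministic initialization
    for _ in range(iterations):
        clusters = [[], []]
        for x in xs:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            clusters[nearest].append(x)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)


centroids = train_supervised(labeled)
print(predict(centroids, 4.5))       # class chosen using the labeled examples
print(cluster_unsupervised(points))  # group centers found without any labels
```

The supervised function cannot run without the labels, which is exactly where annotation costs come from. The clustering function finds the two groups on its own but cannot name them, so a human still has to validate the output.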

With supervised learning, you don’t need that much computing power, and the method is quite easy to implement in R or Python. Unsupervised and reinforcement learning models are computationally more complex and need large training datasets to produce reliable outcomes. So, you’ll need more powerful tools for working with vast volumes of unclassified data, which may drive machine learning costs up.

There is another cost-effective strategy organizations may go for to cut the expenses associated with model development – using foundation models. These models, often built upon generative AI technologies, have become particularly prominent in recent years.

Foundation models, say, OpenAI’s GPT series, have been pre-trained on large data sets. Harnessing these pre-trained models, you can skip the resource-intensive process of training one from scratch. Instead, you fine-tune the foundation model to perform a specific task, which usually requires less data and computational power, hence, less investment.

At the same time, by going for unsupervised or reinforcement learning, you can save the money that would otherwise be spent on data labeling.

3. The availability and quality of training data

No matter the approach to machine learning, you will need enough data to train the algorithms on. Machine learning costs thus include the price of acquiring, preparing, and – in the case of supervised learning – annotating training data.

If you have enough training data on hand, you’re lucky. However, it’s rarely the case. Numerous researchers state that around 96% of enterprises do not initially have enough training data. For your reference, a study by Dimensional Research shows that on average, ML projects need around 100,000 data samples to perform well.

If you don’t have enough data, you can synthetically generate the needed volume, augment the data you already have, or have it collected by crowd workers. Getting 100,000 data points via Amazon’s Mechanical Turk, for example, can cost you around $70,000.

Once you have enough data on hand, you need to make sure it’s of high quality. The study referenced above suggests that 66% of companies run into errors and bias in their training data sets. Removing those can take 80 to 160 hours for a 100,000-sample data set.

In case you opt for supervised learning (which is often the case for commercial ML solutions), you need to add the price of data annotation to the total machine learning cost, too. Depending on the complexity of labeling, it can take 300 to 850 hours to get 100,000 data samples labeled.

All in all, a solid, high-quality training data set can cost you anything from $10,500 to $85,000 depending on the nature of your data, the complexity of annotation, as well as the composition and location of your ML team.
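
As a rough illustration of how the hours above turn into a budget, the sketch below adds up the cleaning and labeling effort for a 100,000-sample data set. The hourly rates are hypothetical placeholders rather than figures from this article, and data acquisition is not included.

```python
# Hour ranges quoted above for a 100,000-sample data set.
CLEANING_HOURS = (80, 160)
LABELING_HOURS = (300, 850)

# Hypothetical blended hourly rates in USD (an assumption, not an ITRex rate).
ASSUMED_RATES = (25, 80)


def data_prep_estimate(rate_low, rate_high):
    """Return ((min_hours, max_hours), (min_cost, max_cost)) for cleaning plus labeling."""
    hours_low = CLEANING_HOURS[0] + LABELING_HOURS[0]
    hours_high = CLEANING_HOURS[1] + LABELING_HOURS[1]
    return (hours_low, hours_high), (hours_low * rate_low, hours_high * rate_high)


hours, cost = data_prep_estimate(*ASSUMED_RATES)
print(f"Effort: {hours[0]:,} to {hours[1]:,} hours")
print(f"Cost:   ${cost[0]:,} to ${cost[1]:,}")
```

Swapping in your own team’s rates shows how strongly the location and composition of the team move the final figure.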

4. The complexity and length of the exploratory stage

During an exploratory phase, you carry out a feasibility study, search for an optimal algorithm, and run experiments to confirm the chosen approach.

The cost of exploration depends on the complexity of the business problem, the expected time to market, and, consequently, team composition.

As a rule, a team of a business analyst, a data engineer, an ML engineer, and – optionally – a project manager is enough to carry out the task. In that case, you can expect the exploratory stage to cost $39,000 to $51,000. By outsourcing the effort, you can cut this figure down to $15,000-$20,000.

5. The cost of production

Machine learning costs also cover the cost of production. Production costs include the costs of the needed infrastructure (including cloud computing and data storage), integration costs (including designing a data pipeline and developing APIs), and maintenance costs.

Cloud resources

The price of the cloud infrastructure depends on the complexity of the models being trained. If you are building a simpler solution that relies on data of low dimensionality, you may get by with four virtual CPUs running on one to three nodes. This may cost you around $100 to $300 a month, or $1,200 to $3,600 a year.

If the solution you want to build requires low latency and relies on complex deep learning algorithms, expect a monthly cost of $10,000 to $30,000 to be added to the total ML price.
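
For budgeting, it helps to put both scenarios on the same yearly footing. A minimal sketch using only the monthly ranges quoted above (the function and variable names are ours):

```python
def annualize(monthly_low, monthly_high):
    """Convert a monthly cost range in USD into a yearly range."""
    return monthly_low * 12, monthly_high * 12


simple_solution = annualize(100, 300)        # low-dimensional data, a few CPU nodes
deep_learning = annualize(10_000, 30_000)    # low-latency deep learning workload

print(f"Simple solution: ${simple_solution[0]:,} to ${simple_solution[1]:,} a year")
print(f"Deep learning:   ${deep_learning[0]:,} to ${deep_learning[1]:,} a year")
```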


Integrations

Developing integrations involves designing and building the data pipeline and the needed APIs. Putting together a data pipeline takes around 80 development hours. Putting API endpoints in place and documenting them for use by the rest of the system requires another 20 to 30 hours. The cost of this work should be added to the final machine learning cost estimate.
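
The integration effort therefore adds up to roughly 100 to 110 hours. The sketch below converts it into money; the hourly rate is a hypothetical placeholder, so substitute your own.

```python
PIPELINE_HOURS = 80   # data pipeline, as quoted above
API_HOURS = (20, 30)  # API endpoints and documentation, as quoted above

ASSUMED_RATE = 50     # hypothetical hourly rate in USD, not from the article


def integration_cost(hourly_rate):
    """Return the (min, max) integration cost for a given hourly rate."""
    low_hours = PIPELINE_HOURS + API_HOURS[0]
    high_hours = PIPELINE_HOURS + API_HOURS[1]
    return low_hours * hourly_rate, high_hours * hourly_rate


low, high = integration_cost(ASSUMED_RATE)
print(f"Integration cost: ${low:,} to ${high:,}")
```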

Support and maintenance

Machine learning models need ongoing support during their entire life cycle: incoming data needs to be cleansed and annotated; models need to be retrained, tested, and deployed.

According to the study conducted by Dimensional Research, businesses commit 25% to 75% of the initial resources to maintaining ML algorithms.

Assuming that the initial solution architecture and data pipelines are well designed and some of the recurring tasks are automated, you can get by with one support engineer, who may cost you around $30,000 a year.
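
To see what the 25% to 75% share means for a particular budget, here is a small sketch. The initial cost of $100,000 is a made-up input used only to show the calculation.

```python
def maintenance_range(initial_cost, low_share=0.25, high_share=0.75):
    """Return the (min, max) maintenance budget as a share of the initial cost."""
    return initial_cost * low_share, initial_cost * high_share


initial_cost = 100_000  # hypothetical initial ML investment in USD
low, high = maintenance_range(initial_cost)
print(f"Maintenance: ${low:,.0f} to ${high:,.0f}")
```

Under this assumption, the single support engineer at around $30,000 a year sits near the lower end of the range.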

6. The cost of consulting

If you’re just dipping your toes into the machine learning waters, you can’t really get too far without an experienced ML consultant.

The two main factors determining the cost of ML consulting are:

  • Consultant’s experience. It is worth making experience a critical factor in your hiring decision. You want to partner with someone who has enough expertise in a field you may not be familiar with.
  • Project scope. The more complicated the project, the more consultant involvement it will require. Moreover, if the scope of the project is undefined, look for a consultant who can carry out a discovery phase for you and offer a compelling proposal with all the necessary estimates.

ML consulting fees usually range from $5,000 to $7,000 per project.

7. Opportunity costs

Opportunity cost is the benefit you forfeit by not taking an alternative route. To put things into perspective, think of Blockbuster, a former leader in the movie rental market. Forgoing innovation, the company lost to a newly emerged leader – Netflix. The opportunity cost equaled $6 billion and a near-bankruptcy.

The same idea goes for machine learning initiatives. Enterprises lagging in ML adoption can’t tap into predictive insights and informed decision-making that come with it.

On the flip side, implementing machine learning just for the sake of innovation, say, to solve problems that call for rule-based solutions, is a loss as well.

Therefore, before you decide to implement AI in business, consider the cost vs. benefit ratio and carefully weigh implementation risks.
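
One simple way to frame the cost vs. benefit ratio is return on investment: the net benefit divided by the total cost. The figures in the sketch below are invented for illustration.

```python
def roi(total_benefit, total_cost):
    """Return on investment as a fraction: (benefit - cost) / cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost


# Hypothetical numbers: a $100,000 ML project expected to bring $150,000 in benefits.
print(f"ROI: {roi(150_000, 100_000):.0%}")
```

A negative result means that the expected benefits do not cover the costs, which is a signal to reconsider the scope or go for a simpler, rule-based solution.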

So, how much does ML cost?

Now that you are familiar with the factors affecting the total ML price, let’s look at some examples from ITRex’s portfolio to help you better understand the costs involved.

Note that we provide effort estimates, too. The reason is that the price of developing an ML solution depends greatly on the composition and location of your ML development team. You can get an idea of the total cost associated with developing a similar ML solution based on the following rates:

Please be aware that the estimated budgets provided below apply exclusively to the development of the machine learning component within these solutions. It’s essential to consider additional expenses, such as infrastructure, productization, and other associated costs, as machine learning operates in conjunction with various elements within the wider solution.

Project 1. Emotion recognition solution

A multinational media and entertainment company wanted to analyze footage from their surveillance cameras to recognize people’s emotions. The task was complicated by degraded visual conditions, such as the quality of the footage itself, as well as people wearing face masks, glasses, and other items that made recognition difficult.

The media tycoon was seeking a trusted media and entertainment software vendor to conduct extensive research and power future development. The ITRex team of two ML engineers tested three neural networks, selected the one best suited for the task, fine-tuned it for better performance, and suggested other strategies for achieving a higher accuracy score.

Efforts: approx. 300-350 hours

ML costs: approx. $26,000

Project 2. A fitness mirror with a personal coach inside

The customer wanted to build an innovative fitness mirror that can act like a personal coach – offering personalized training plans and guiding users through training sessions with real-time recommendations.

The ITRex team built the hardware components of the smart device and provided end-to-end software development, spanning infrastructure setup, embedded software/firmware development, and content management.

When it comes to the machine learning component of the solution, we designed and trained a deep learning model using a dataset of workout records to provide guidance for users, implemented computer vision algorithms for motion tracking and human pose estimation, as well as object recognition algorithms for overseeing the sports equipment used in workouts.

Efforts: approx. 640-700 hours

ML costs: approx. $51,000-$56,000

Project 3. Automated document recognition solution

Our customer wanted to create a solution that would automate the process of filling out documents. The key goal of the project was to develop an independent optical character recognition (OCR) solution that would recognize and index batches of incoming documents, as well as seamlessly integrate the solution into the customer’s existing document processing system.

The OCR solution we crafted helps automate the traditionally resource-intensive process of marking and indexing documents, leading to time and cost savings. By drastically reducing the manual effort typically allocated to document marking and indexing, the solution allows handling more documents within the same timeframe. The outcome? Enhanced productivity and swift, accurate processing of critical documents.

Team efforts: approx. 300-400 hours

ML costs: approx. $28,000-$32,000
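
To compare the three projects, you can derive a rough implied hourly rate from the effort and cost figures above. The sketch below divides the midpoint of each cost range by the midpoint of the matching effort range. This is our own approximation for illustration, not an official ITRex rate.

```python
# Effort (hours) and ML cost (USD) ranges quoted for the three projects above.
PROJECTS = {
    "Emotion recognition": {"hours": (300, 350), "cost": (26_000, 26_000)},
    "Fitness mirror": {"hours": (640, 700), "cost": (51_000, 56_000)},
    "Document recognition": {"hours": (300, 400), "cost": (28_000, 32_000)},
}


def midpoint(value_range):
    """Return the middle of a (low, high) range."""
    return (value_range[0] + value_range[1]) / 2


def implied_hourly_rate(project):
    """Approximate blended rate: midpoint cost divided by midpoint effort."""
    return midpoint(project["cost"]) / midpoint(project["hours"])


for name, project in PROJECTS.items():
    print(f"{name}: about ${implied_hourly_rate(project):.0f} per hour")
```

Multiplying a rate like this by the estimated effort for your own project gives a first-pass budget for the ML component alone.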

How can you reduce ML development costs – and get ROI fast?

If you are thinking about venturing into AI development and looking for ways to lower machine learning costs without putting the quality of the final product at risk, look through our field-tested recommendations.

Start small but have a bigger picture in the back of your mind

When kicking off an ML project, it often pays off to keep the initial scope small. By starting with a minimum viable product, you can focus your resources on a specific problem and iterate quickly. This approach helps reduce machine learning costs in several ways:

  • Starting small allows you to test your ideas and hypotheses with a smaller dataset and a reduced set of features. This, in turn, lets you quickly assess the feasibility and effectiveness of your ML solution – without investing significant resources upfront.
  • By keeping the scope smaller, you can pinpoint and address potential challenges or limitations in your machine learning pipeline at an early stage. This helps avoid costly rework at the later stages of development.
  • By prioritizing critical use cases and features, you allocate resources more effectively and focus on the areas that provide the fastest ROI rather than tackling the entire project at once.

Follow MLOps best practices from day one to avoid scalability issues

MLOps refers to a set of practices that enhance collaboration and automation in ML development projects. By setting up an MLOps pipeline from the outset, you can mitigate potential scalability issues and reduce machine learning costs. The cost reduction is achieved via:

  • Streamlined development process: MLOps promotes standardization and automation, while reducing the need for manual, error-prone operations.
  • Scalable infrastructure: MLOps focuses on building scalable infrastructures to support the entire ML development lifecycle: from data preprocessing to model deployment. This helps accommodate growing data volumes, increasing model complexity, and higher user demand without introducing significant changes to the infrastructure.
  • CI/CD: CI/CD practices ensure that changes introduced to your ML solution are automatically integrated, tested, and deployed in a reliable and automated manner.
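
As an illustration of the kind of automated check a CI/CD pipeline for ML might run, here is a minimal sketch of a deployment gate. The function name, metric, and thresholds are assumptions made for this example and do not come from any specific MLOps tool.

```python
def should_deploy(candidate_accuracy, production_accuracy,
                  min_accuracy=0.90, min_gain=0.0):
    """Promote the candidate model only if it clears an absolute accuracy floor
    and is at least min_gain better than the model currently in production."""
    clears_floor = candidate_accuracy >= min_accuracy
    beats_production = candidate_accuracy - production_accuracy >= min_gain
    return clears_floor and beats_production


# Hypothetical evaluation results from an automated test run.
print(should_deploy(candidate_accuracy=0.93, production_accuracy=0.91))
print(should_deploy(candidate_accuracy=0.89, production_accuracy=0.85))
```

Encoding the release criteria in code keeps retraining cycles from silently shipping a weaker model, which is one of the ways automation cuts maintenance costs.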

Use pre-trained machine learning models

Using machine learning models that have been previously trained helps reduce machine learning costs in the following ways:

  • Transfer learning: Serving as a starting point for many ML tasks, pre-trained models allow transferring the knowledge learned from a different but related task to the problem in question, which saves substantial computational resources and training time.
  • Reduced data requirements: Training ML models from scratch calls for large volumes of annotated data, which can be quite costly and time-consuming to collect and label. Pre-trained models can be fine-tuned on relatively small volumes of domain-specific data.
  • Faster prototyping and iteration: Pre-trained models allow you to quickly prototype and iterate your ML solution.

Do you have an idea for a machine learning solution in mind? Get in touch with us, and we will help you draw up machine learning cost estimates and bring your solution to life!

The post Machine Learning Costs: Price Factors and Real-World Estimates appeared first on Datafloq.