Inference, the process of making predictions with a trained model, can account for as much as 90% of the computing costs of a deep learning application. Amazon Elastic Inference aims to cut the cost of running deep learning inference substantially by letting customers attach the right amount of GPU-powered acceleration to Amazon EC2 and Amazon SageMaker instances.
On its website, AWS wrote: “With Amazon Elastic Inference, you can now choose the instance type that is best suited to the overall CPU and memory needs of your application, and then separately configure the amount of inference acceleration that you need with no code changes.”
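To make the idea of sizing the instance and the accelerator separately more concrete, here is a minimal Python sketch that builds the parameters for an EC2 launch request. It assumes the boto3 SDK, whose `run_instances` call accepts an `ElasticInferenceAccelerators` parameter. The helper names `build_launch_request` and `launch`, the AMI ID, and the chosen instance and accelerator sizes are illustrative assumptions, not taken from the AWS announcement. A real launch would also need networking and IAM setup that is omitted here.

```python
def build_launch_request(image_id, instance_type, accelerator_type):
    """Build keyword arguments for an EC2 launch with an attached accelerator.

    The instance type covers CPU and memory needs; the accelerator type
    is chosen independently to cover inference needs.
    """
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        # Accelerator is configured separately from the instance itself.
        "ElasticInferenceAccelerators": [
            {"Type": accelerator_type, "Count": 1},
        ],
    }


def launch(request):
    """Send the request to EC2 (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the sketch runs without the SDK

    ec2 = boto3.client("ec2")
    return ec2.run_instances(**request)


# Hypothetical values: a placeholder AMI ID, a small CPU instance,
# and a medium accelerator.
request = build_launch_request(
    image_id="ami-0123456789abcdef0",
    instance_type="c5.large",
    accelerator_type="eia1.medium",
)
print(request["InstanceType"], request["ElasticInferenceAccelerators"])
```

Changing `instance_type` or `accelerator_type` independently is the point of the sketch: the two are chosen to match different parts of the workload.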
AWS CEO Andy Jassy called the technology a game changer for running inference cost-effectively. Amazon Elastic Inference lets AWS customers use resources more efficiently, reducing the cost of running inference. Right now, Amazon Elastic Inference supports TensorFlow, Apache MXNet, and ONNX models, and support for more frameworks is expected soon.