Accelerate your Cisco and Intel Platform into a High-Performance Deep Learning Solution with Neural Magic
Looking to handle a large AI workload? Trying to figure out whether GPU or CPU is the right choice? In some cases, the complexity and cost of deploying and supporting GPU hardware for AI inferencing applications might make CPU-only AI a better choice.
With Neural Magic’s inferencing engine, you can get faster performance on your Cisco UCS® or HyperFlex® platform with 2nd Generation Intel® Xeon® Scalable processors, without additional acceleration hardware. Neural Magic turns x86 Intel CPUs into high-performance machine learning resources. Neural Magic is a software solution for deep learning acceleration, enabling companies to use ubiquitous, general-purpose CPU resources to achieve performance breakthroughs at scale, with all the flexibility of software.
● Neural Magic has created novel algorithms that execute deep neural networks on commodity CPUs with GPU-class performance for both real-time and throughput-oriented deep learning applications.
● Using the Neural Magic Inference Engine, companies can slash the cost of deploying their deep learning models to production without sacrificing performance or accuracy.
● Neural Magic eliminates the complexity associated with specialized hardware accelerators. Neural Magic’s software fits seamlessly into existing CI/CD pipelines, powering deployment at scale on-premises or at the edge.
There may be a new AI feature you’d like to run alongside the rest of your production software. Some use cases require hardware acceleration, which adds complexity and cost. But what if your use case could simply run on CPU, alongside everything else?
Are you upgrading your infrastructure? You might be considering new AI workloads in the future. With the option of CPU-only AI in your toolset, your new servers will be future-ready for AI applications.
For edge-compute applications, the hardware you want to run inferencing on might not support adding GPU acceleration. With Neural Magic, you can optimize performance so the application runs on an edge device that otherwise could not handle it.
These scenarios and pain points are common for enterprises running real-time computer vision and recommendation applications that require classifying, detecting, and segmenting large amounts of unstructured data in the form of images and embeddings.
Neural Magic’s technology breaks down into three features:
● Neural Magic Inference Engine: The Neural Magic Inference Engine is a runtime software engine that takes advantage of the latest advances in 2nd Gen Intel Xeon Scalable processors. Neural Magic’s proprietary algorithms use the newest 2nd Gen Xeon features, including Intel DL Boost, to deliver performance.
● Neural Magic Model Recalibration Package: The Neural Magic Model Recalibration Package provides a suite of libraries and tools for TensorFlow, PyTorch, and Keras that simplify the process of achieving deep learning performance and that have been tested on Cisco UCS and HyperFlex.
● Neural Magic Model Repository with Pre-Trained and Performance-Tuned Models: If you don’t have an existing deep learning model, you can use one of Neural Magic’s pre-trained models, already optimized for performance, to run with the Neural Magic Inference Engine.
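Model recalibration typically relies on techniques such as pruning, which zeroes out low-magnitude weights so the runtime has less compute to perform. The sketch below illustrates the core idea with one-shot magnitude pruning in NumPy; it is an assumption-laden simplification, not Neural Magic’s API, and real recalibration prunes gradually during fine-tuning to preserve accuracy.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until roughly `sparsity`
    fraction of them are zero. Illustrative one-shot pruning only."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))          # stand-in for a layer's weights
w_sparse = magnitude_prune(w, 0.90)
print(f"sparsity: {np.mean(w_sparse == 0):.2%}")
```

A sparse weight matrix like `w_sparse` lets an inference engine skip most of the multiply-accumulate work, which is one way CPU inference closes the gap with accelerators.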
With the combination of the Neural Magic Model Recalibration Package and the Neural Magic Inference Engine, you can improve your inferencing performance by up to 10X.
Example use case: online retailer visual search
An online furniture retailer wants to implement visual search, letting customers upload their own photos and receive product recommendations from the retailer’s catalog.
Taking advantage of Cisco UCS with 2nd Generation Intel Xeon Scalable processors with DL Boost, they use Neural Magic to get the most performance out of their image processing. After training their image classification and object detection DNNs, they have the option to recalibrate the networks to reduce the compute required to process them. From there, the optimized models are sent to the inferencing pipeline, where the Neural Magic Inference Engine executes the computer vision models at up to 10X the performance of open-source tooling on the same stack. Without installing any new accelerator hardware, they can handle the DNN workload in parallel with the rest of their production software stack and achieve GPU-class performance on commodity Intel hardware.
Additional use cases
● Automated inventory
● Traffic analytics
● Targeted digital signage
● Visual search
● Supply chain analytics
● Threat-object detection
● Behavior analytics
Smart Cities and IoT
● Vehicle traffic analysis
● Smart parking
● Security analytics
What is Intel® Deep Learning Boost (Intel® DL Boost)?
The second generation of Intel® Xeon® Scalable processors introduced a collection of features for deep learning, packaged together as Intel® Deep Learning Boost. These features include Vector Neural Network Instructions (VNNI), which increases throughput for inference applications with support for INT8 convolutions by combining multiple machine instructions from previous generations into one machine instruction.
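The arithmetic VNNI accelerates is an INT8 multiply with a wider 32-bit accumulation: the `vpdpbusd` instruction multiplies unsigned 8-bit activations by signed 8-bit weights and accumulates the products into 32-bit lanes in a single instruction, work that previously took three instructions. The snippet below simulates that computation in NumPy to show the data types involved; it is a numerical illustration, not actual use of the instruction.

```python
import numpy as np

def int8_dot_int32(a_u8: np.ndarray, b_s8: np.ndarray) -> int:
    """Multiply unsigned 8-bit activations by signed 8-bit weights and
    accumulate in 32 bits -- the computation VNNI's vpdpbusd performs
    in one instruction across groups of byte pairs."""
    return int(np.sum(a_u8.astype(np.int32) * b_s8.astype(np.int32)))

acts = np.array([200, 17, 255, 3], dtype=np.uint8)   # quantized activations
wts  = np.array([-128, 55, 127, -1], dtype=np.int8)  # quantized weights
print(int8_dot_int32(acts, wts))                     # → 7717
```

Because each operand is one byte instead of four, a 512-bit register holds 4X as many values as with FP32, which is where the INT8 inference throughput gains come from.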
At the Cisco Toronto Innovation Centre, we ran a head-to-head benchmark to see how Neural Magic performed on our Cisco UCS Rack Server. We compared two identical environments running ONNX versions of MobileNetV2 and ResNet-50. With the Neural Magic Inference Engine and a Neural Magic pre-trained model, we saw 5X to 10X the iterations-per-second performance compared to the unoptimized baseline.
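Iterations per second is straightforward to measure for any engine: run a few warm-up passes, then time a fixed number of forward passes. The harness below is a generic sketch of that methodology; `run_fn` stands in for whatever inference call your engine exposes and is not a specific Neural Magic API.

```python
import time
import numpy as np

def benchmark(run_fn, warmup: int = 5, iters: int = 50) -> float:
    """Return steady-state iterations per second for an inference callable."""
    for _ in range(warmup):          # let caches and thread pools settle
        run_fn()
    start = time.perf_counter()
    for _ in range(iters):
        run_fn()
    return iters / (time.perf_counter() - start)

# Stand-in workload: a matrix multiply in place of a real model forward pass.
x = np.random.rand(256, 256)
print(f"{benchmark(lambda: x @ x):.1f} iterations/sec")
```

Running the same harness against the baseline and the optimized model, with identical batch sizes and thread counts, gives the kind of apples-to-apples comparison used above.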
Cisco UCS with Neural Magic running on the latest Intel Xeon technology is a path to bringing your AI and deep learning opportunities into production.
To find out more, or to see a live demo at the Cisco Toronto Innovation Centre, visit: https://www.cisco.com/c/m/en_ca/innovationcenter/toronto.html
For more information about Cisco® solutions for AI/ML workloads, visit: https://www.cisco.com/go/ai-compute