Microsoft details "Singularity," its AI infrastructure service

A recently published paper by Microsoft's Azure and Research teams discusses Microsoft's AI infrastructure service, which is codenamed Singularity.

Microsoft is working to reduce the cost of artificial intelligence (AI) and the wasted effort of computing at global scale. The paper, titled "Singularity: Planet-Scale, Preemptive and Elastic Scheduling of AI Workloads," breaks down Microsoft's work at a technical level.

"Singularity is a fully managed, globally distributed infrastructure service for AI workloads at Microsoft, with support for diverse hardware accelerators. Singularity is designed from the ground up to scale across a global fleet of hundreds of thousands of GPUs and other AI accelerators," explain Microsoft's Azure and Research teams in their paper. "Singularity is built with one key goal: driving down the cost of AI by maximizing the aggregate useful throughput on a given fixed pool of capacity of accelerators at planet scale while providing stringent SLAs for multiple pricing tiers."

Microsoft officials previously discussed plans to make FPGAs, or field-programmable gate arrays, available to customers as a service. In 2018, Microsoft released its Brainwave project, which is designed to deliver rapid AI processing in Azure. At the time, Microsoft made a preview of Brainwave-powered Azure Machine Learning Hardware Accelerated Models available in the cloud, a first step in making FPGA processing available to customers for AI workloads.

Singularity is arguably the next step in Brainwave's transformation into a commercial service. We asked Microsoft to comment on this and to clarify when and how the company plans to make Singularity commercially available. Microsoft has not yet responded.

In 2020, Microsoft unveiled a powerful new supercomputer built in collaboration with OpenAI, making new infrastructure available in Azure to train extremely large AI models.

The supercomputer developed for OpenAI is a single system with more than 285,000 CPU cores, 10,000 GPUs, and 400 gigabits per second of network connectivity for each GPU server.

In layman's terms, Microsoft's Singularity lets hundreds of thousands of GPUs and AI accelerators work together. Singularity is a global infrastructure service designed to reduce wasted effort. It treats all devices within the infrastructure as a single cluster, which helps keep every device as busy as possible.

Singularity can also adapt to prioritize different workloads. "While opportunistically using spare capacity, Singularity simultaneously provides isolation by respecting job-level SLAs," says Microsoft. "For example, Singularity adapts to increasing load on an inference job, freeing up capacity by elastically scaling down or preempting training jobs."
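The behavior Microsoft describes here, shrinking training jobs first and preempting them only when that is not enough, can be illustrated with a toy example. This is a minimal sketch and not Singularity's actual scheduler: the `TrainingJob` class, the `min_devices` floor and the `free_capacity` function are hypothetical names invented for illustration, and the real system also weighs SLAs and pricing tiers that are ignored here.

```python
from dataclasses import dataclass


@dataclass
class TrainingJob:
    name: str
    devices: int      # devices currently assigned to the job
    min_devices: int  # hypothetical floor for elastic scale-down


def free_capacity(jobs, needed):
    """Free at least `needed` devices for a growing inference job.

    Pass 1 shrinks training jobs toward their floor (elastic scale-down).
    Pass 2 preempts whole jobs only if shrinking was not enough, so it
    may free more devices than were asked for.
    Returns (devices_freed, names_of_preempted_jobs); mutates `jobs`.
    """
    freed = 0
    preempted = []
    for job in jobs:
        if freed >= needed:
            break
        give = min(job.devices - job.min_devices, needed - freed)
        job.devices -= give
        freed += give
    for job in jobs:
        if freed >= needed:
            break
        if job.devices > 0:
            freed += job.devices
            job.devices = 0
            preempted.append(job.name)
    return freed, preempted
```

With two training jobs holding 8 and 4 devices (floors of 4 and 2), a request for 5 devices is met purely by scaling down, while a request for 8 forces one job to be preempted.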

The paper focuses on Singularity's scaling tech and schedulers, which it asserts are its secret sauce because they reduce cost and increase reliability.

The software automatically decouples jobs from accelerator resources, which means that when jobs scale up or down, "we simply change the number of devices the workers are mapped to. This is completely transparent to the user, as the world-size (i.e. total number of workers) of the job remains the same regardless of the number of physical devices running the job."

That's possible owing to "a novel technique called replica splicing that makes it possible to time-slice multiple workers on the same device with negligible overhead while enabling each worker to use the entire device memory."
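The bookkeeping behind that idea, a fixed world size mapped onto a varying number of devices, can be sketched in a few lines. This is an illustrative assumption rather than the paper's implementation: `map_workers` is a hypothetical helper, it assumes the world size divides evenly across devices, and it says nothing about how replica splicing actually shares device memory or switches between workers.

```python
def map_workers(world_size, num_devices):
    """Map a fixed set of worker ranks onto `num_devices` physical devices.

    The job always sees `world_size` workers. When there are fewer devices
    than workers, several workers share (time-slice) one device.
    Assumes an even split, which is a simplification for this sketch.
    """
    if num_devices < 1 or world_size % num_devices != 0:
        raise ValueError("world_size must be a positive multiple of num_devices")
    per_device = world_size // num_devices
    return {
        device: list(range(device * per_device, (device + 1) * per_device))
        for device in range(num_devices)
    }


# Scaling an 8-worker job from 8 devices down to 2 changes only the mapping.
full = map_workers(8, 8)    # one worker per device
shrunk = map_workers(8, 2)  # four workers time-slice each device
```

In both mappings the job still has eight workers, which is why the resize is invisible to the user's training code.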

The above makes it possible to schedule more jobs, more efficiently, so the thousands of servers are in service for more time. It also enables swift scaling, up or down, without disruption.

"Singularity achieves a significant breakthrough in scheduling deep learning workloads, converting niche features such as elasticity into mainstream, always-on features that the scheduler can rely on for implementing stringent SLAs," the paper reads.

There is no indication yet of whether this infrastructure will ever become a commercial offering. If it does, that would be good news for the company.