Inside Intel's Delays in Delivering a Crucial New Microprocessor

Last May, Sandra Rivera, a top executive at chip giant Intel, received some alarming news.

The company's engineers had worked for more than five years to develop a powerful new microprocessor for computing tasks in data centers, and they were confident that they had finally gotten the product right. But signs of a potentially serious technical glitch surfaced during a regular morning meeting to discuss the project.

The issue was so problematic that Sapphire Rapids, the microprocessor’s codename, had to be delayed – the latest in a series of setbacks for one of Intel’s most important products in years.

“We were very disheartened,” said Rivera, executive vice president in charge of Intel’s data center and artificial intelligence group. “It was a painful decision.”

The launch of Sapphire Rapids ended up being pushed back from mid-2022 to Tuesday, almost two years later than originally expected. The lengthy development of the product – which combines four chips into a single package – highlights some of the challenges facing Intel's turnaround effort as the United States tries to assert its dominance in basic computer technology.

Since the 1970s, Intel has been a leader in the tiny slices of silicon that run most electronic devices, best known for a variety called microprocessors, which act as the electronic brains in most computers. But the Silicon Valley company in recent years has lost its long-standing leadership in manufacturing technology, which helps determine the computing speed of chips.

Patrick Gelsinger, who became Intel’s chief executive in 2021, has vowed to restore its manufacturing edge and build new factories in the United States. He was a key figure as Congress debated and passed legislation in the summer to reduce US reliance on chip manufacturing in Taiwan, which China claims as its territory.

Sapphire Rapids’ bumpy development has implications for whether Intel can bounce back to deliver future chips on time. This is an issue that could affect dozens of computer manufacturers and cloud service providers, not to mention the millions of consumers accessing online services that are likely powered by Intel technology.

“What we want is a steady cadence that’s predictable,” said Kirk Skaugen, executive vice president of server sales for Lenovo, a Chinese company that is planning 25 new systems based on the new processor. “Sapphire Rapids is the beginning of a journey.”

For Intel, the pressure is on. Along with falling demand for chips used in personal computers, the company faces stiff competition in server chips, which are its most profitable business. That issue has Wall Street worried, with Intel’s market value plummeting by more than $120 billion since Gelsinger took office.

At an online event on Tuesday to discuss Sapphire Rapids, which is named after a section of the Colorado River, Intel customers outlined plans to use the processor, which they said would have specific benefits for artificial intelligence tasks. The product, formally called the 4th generation Intel Xeon Scalable processor, was unveiled alongside another overdue addition to the Xeon chip family. This product, formerly known as Ponte Vecchio, was designed to speed up special jobs and be used in conjunction with Sapphire Rapids on high-performance computers.

In an interview, Gelsinger said that Sapphire Rapids had what it takes to be a success, despite the delays. He chose Ms. Rivera in 2021 to take over the development unit, where she is applying lessons from the experience to change the way Intel designs and tests its products. He said that Intel had conducted several internal reviews of what happened with Sapphire Rapids and that “we’re not done yet.”

Sapphire Rapids started in 2015 with discussions among a small group of Intel engineers. The product was the company’s first attempt at a new approach to chip design. Companies now routinely pack tens of billions of tiny transistors onto each piece of silicon, but competitors such as Advanced Micro Devices have begun to make processors from multiple chips bundled together in plastic packages.

Intel engineers created a design with four chips, each with 15 processor “cores” that act as individual calculators for general-purpose computing jobs. The company also decided to include extra blocks of circuitry for special tasks – including artificial intelligence and cryptography – and to communicate with other components, such as chips that store data.

The interaction between so many elements is “very complex,” said Shlomit Weiss, who co-leads Intel’s design engineering group. “Complexity often brings problems.”

The Sapphire Rapids team battled bugs: glitches caused by design errors or manufacturing flaws that can cause a chip to miscalculate, run slowly or stop working altogether. The team was also affected by delays in the product’s manufacturing process.

But in December 2019, engineers reached a milestone called “tape-in.” This is when electronic files containing a completed design go to a factory to make sample chips.

Sample chips arrived in early 2020, just as Covid-19 forced lockdowns. Engineers soon got Sapphire Rapids’ computing cores to communicate with one another, said Nevine Nassif, lead engineer on the project. But more work than expected remained.

One key task was “validation,” a testing process in which Intel and its customers run software on sample chips to simulate computing tasks and detect bugs. Once flaws are found and fixed, designs can go back to the factory to make new test chips, which typically takes more than a month.

Repeating this process led to missed deadlines. Nassif said Sapphire Rapids was designed to counter AMD’s Milan processor, released in March 2021. But it still wasn’t ready that June, when Intel announced a delay until the following year to allow for more validation.

That’s when Rivera entered the picture. The longtime Intel executive successfully built a business in networking products before being appointed in 2019 as director of human resources.

“We had to get our execution mojo back,” said Gelsinger. “I needed someone to run into the fire and sort this thing out for me.”

In October 2021, Rivera and a senior design executive established weekly Sapphire Rapids status meetings, held every Monday at 7 a.m.

Then came the discovery of the flaw last May. Ms. Rivera declined to describe it in detail, but said it affected the processor’s performance. In June, she used an investor event to announce a delay of at least a quarter, which pushed Sapphire Rapids past the launch of a competing chip from AMD in November.

“We were ready to board,” said Nassif. The final delay “was so sad due to all the effort that went into it.”

Ms. Rivera drew a number of lessons from the setbacks. One was simply that Intel packed too many innovations into Sapphire Rapids, rather than delivering a less ambitious product earlier.

She also concluded that the team should have spent more time perfecting and testing its design using computer simulations. Finding bugs before they reach sample chips is cheaper and would have made it possible to remove features to simplify the product, Rivera said. Since then, she has moved to bolster Intel’s simulation and validation capabilities.

“We used to have a lot of this type of muscle that we let atrophy,” Rivera said. “Now we are rebuilding.”

She also determined that Intel had scheduled more products than its engineers and customers could easily handle. So she streamlined the product roadmap, including delaying a Sapphire Rapids successor to 2024 from 2023.

More broadly, Rivera and other Intel executives pushed the organization to develop better processes for documenting technical issues and sharing that information inside and outside the company.

Some Intel customers say that communication has improved.

“Did everything go well? No,” said Skaugen of Lenovo, who once ran Intel’s server chip business. “But we were surprised much less than in the past.”