Last year, AMD launched its WX 9100, a Vega-based Radeon Pro GPU with substantially higher performance and more ROPs than its previous GPUs, but also a significantly higher price tag of $2,200. Today, the company is following up that launch with the much cheaper WX 8200 at less than half the price.
The WX 8200 packs 3,584 GPU cores instead of 4,096, analogous to AMD’s Vega 56 consumer GPU, with 64 ROPs and slightly faster HBM2 (2.0Gbps, up from 1.89Gbps). Memory capacity is halved to 8GB, down from 16GB. Single- and half-precision performance are modestly lower: 10.8 TFLOPS versus 12.3 TFLOPS, and 21.5 TFLOPS versus 24.6 TFLOPS, respectively.
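The quoted throughput figures follow the usual peak-FLOPS arithmetic: shader cores × 2 operations per clock (a fused multiply-add counts as two) × clock speed, with Vega’s packed FP16 math doubling the half-precision rate. The sketch below works backward from AMD’s numbers to the implied clock. The 2-ops-per-clock convention and the 2,048-bit HBM2 bus width used for the bandwidth estimate are our assumptions, not figures from AMD’s announcement.

```python
# Back-of-envelope peak throughput and bandwidth for the WX 8200 vs. WX 9100.
# Assumptions (not from AMD's announcement): each shader core retires one FMA
# (2 FLOPs) per clock, FP16 runs at 2x the FP32 rate, and both cards use a
# 2,048-bit HBM2 interface.

FLOPS_PER_CORE_PER_CLOCK = 2
HBM2_BUS_WIDTH_BITS = 2048


def implied_clock_ghz(fp32_tflops: float, cores: int) -> float:
    """Clock speed (GHz) needed to reach the quoted FP32 TFLOPS figure."""
    return fp32_tflops * 1e12 / (cores * FLOPS_PER_CORE_PER_CLOCK) / 1e9


def fp16_tflops(fp32_tflops: float) -> float:
    """Packed FP16 math doubles the single-precision rate."""
    return fp32_tflops * 2


def bandwidth_gb_per_s(pin_speed_gbps: float, bus_width_bits: int = HBM2_BUS_WIDTH_BITS) -> float:
    """Peak memory bandwidth in GB/s: per-pin data rate x bus width / 8 bits per byte."""
    return pin_speed_gbps * bus_width_bits / 8


cards = {
    "WX 8200": {"cores": 3584, "fp32_tflops": 10.8, "hbm2_gbps": 2.0},
    "WX 9100": {"cores": 4096, "fp32_tflops": 12.3, "hbm2_gbps": 1.89},
}

for name, spec in cards.items():
    print(
        f"{name}: ~{implied_clock_ghz(spec['fp32_tflops'], spec['cores']):.2f} GHz, "
        f"FP16 ~{fp16_tflops(spec['fp32_tflops']):.1f} TFLOPS, "
        f"~{bandwidth_gb_per_s(spec['hbm2_gbps']):.0f} GB/s"
    )
```

Both cards land near 1.5GHz under these assumptions, which is why the roughly 12 percent throughput gap tracks the 12.5 percent core-count gap. Doubling 10.8 gives 21.6 rather than the quoted 21.5 TFLOPS, presumably a rounding artifact in the published figures.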
What we have here, then, is an attempt by AMD to carve out a more competitive position for its professional Vega cards by coming in well below previous price points. On paper, the card looks well positioned against its competition. A Quadro P4000, for example, is based on GP104 but fields just 1,792 GPU cores, a configuration that’s actually a bit below the GTX 1070 in cores, texture mapping units, and ROPs. It sells for $800.
Given that the Vega 56 is typically measured as being about 8 percent faster than a GTX 1070 while running at a slightly lower clock speed, the new WX 8200 would be competitive if we were talking about consumer markets, but we aren’t. In professional markets, performance has tended to favor Nvidia overall, thanks to the company’s long-standing expertise in optimizing professional software. That’s a nut AMD has been trying to crack for years now, with varying degrees of success.
AMD is claiming that the new WX 8200 is a match for Nvidia’s P4000 and P5000 in applications like Adobe Premiere Pro, Maya, Nuke, Radeon Pro Render (AMD’s own rendering software), and the Blender Cycles engine. Its performance in other applications doesn’t stack up as well, though it generally compares well with the P4000, trading several wins and losses against that GPU.
AMD has been trying to muscle in on the software side, with new partnerships and software investments, as well as making a push into AI and machine learning. The company’s overall engagement on the professional and HPC sides of the GPU business hasn’t historically matched Nvidia’s, but there’ll be a fresh chance to try again at 7nm, with its first line of Vega products built on that process node and specifically intended for that market. The WX 8200 is expected to be on sale in early September and should be available for pre-order today at Newegg.
Just weeks after Intel’s product roadmaps purportedly leaked, we have an official update from the company itself. At a data center summit today, Intel gave fresh guidance on when it expects to have new silicon and products in-market. Now we have some idea how the company will adjust for delays to its 10nm node and what the next round of products will look like for data centers.
First up, courtesy of Tom’s Hardware: Cascade Lake. Cascade Lake will launch in 2018 and feature two major capabilities: support for Intel’s Optane Persistent Memory DIMMs, and built-in hardware mitigations against attacks like Meltdown and Spectre. The latter alone could help spur upgrade cycles. While consumer workloads haven’t seen major performance issues from the various software mitigations and protections, some server workloads took a significant penalty. With new Spectre variants periodically being discovered, any hardware-based protection could push companies to replace their older hardware.
Cascade Lake will have an optimized cache hierarchy (no word on how this differs from the optimized caches Intel deployed for Skylake-X) and will support a new AVX-512 extension, VNNI (Vector Neural Network Instructions), designed to accelerate machine learning workloads.
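VNNI targets the inner loop of low-precision inference: multiplying 8-bit values and accumulating the results into 32-bit sums. As we understand Intel’s documentation, the VNNI instruction VPDPBUSD does, for each 32-bit lane, what previously took a sequence of several AVX-512 instructions: multiply four unsigned bytes by four signed bytes, sum the products, and add the total to the accumulator. The plain-Python model below emulates that per-lane arithmetic (non-saturating form) for illustration only; the function names are hypothetical and it says nothing about how Cascade Lake implements the instruction.

```python
# Illustrative model of the per-lane arithmetic behind AVX-512 VNNI's
# VPDPBUSD instruction (non-saturating form). Not Intel code; names are ours.


def wrap_int32(x: int) -> int:
    """Wrap an integer to the signed 32-bit two's-complement range."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def dpbusd_lane(acc: int, u8: list[int], s8: list[int]) -> int:
    """acc + sum of four unsigned-byte x signed-byte products, wrapped to int32."""
    if len(u8) != 4 or len(s8) != 4:
        raise ValueError("each 32-bit lane holds exactly four bytes")
    if not all(0 <= a <= 255 for a in u8):
        raise ValueError("first operand must be unsigned bytes")
    if not all(-128 <= b <= 127 for b in s8):
        raise ValueError("second operand must be signed bytes")
    return wrap_int32(acc + sum(a * b for a, b in zip(u8, s8)))


def dpbusd(acc: list[int], u8: list[int], s8: list[int]) -> list[int]:
    """Apply dpbusd_lane across a vector: four bytes per accumulator lane."""
    if len(u8) != 4 * len(acc) or len(s8) != 4 * len(acc):
        raise ValueError("byte vectors must hold four bytes per accumulator lane")
    return [
        dpbusd_lane(a, u8[4 * i : 4 * i + 4], s8[4 * i : 4 * i + 4])
        for i, a in enumerate(acc)
    ]


# Two lanes: 10 + (1*-1 + 2*1 + 3*-1 + 4*1) = 12, and 0 + 4 * (255 * 127) = 129540
print(dpbusd([10, 0], [1, 2, 3, 4, 255, 255, 255, 255], [-1, 1, -1, 1, 127, 127, 127, 127]))
```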
Next up, Cooper Lake. Cooper Lake is a 14nm chip with hardware acceleration for Bfloat16, the floating point format developed by Google. Intel has been extending Bfloat16 support across its hardware in 2018; earlier this year it added support to the Nervana NNP-L1000. Bfloat16 is a truncated 16-bit version of the 32-bit IEEE 754 binary32 floating point format: it keeps the same 8-bit exponent but stores only seven mantissa bits (eight bits of precision counting the implicit leading bit), which is generally sufficient for machine learning applications. Cooper Lake, according to Tom’s, will debut on a 14nm++ process. That suggests Intel won’t keep adding another “+” with each successive revision of the node, and raises the question of whether any further changes to 14nm are planned at all. If they aren’t, Intel will likely have trouble pushing clocks much higher.
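Because bfloat16 keeps float32’s sign bit and full 8-bit exponent and drops the low 16 mantissa bits, the simplest conversion is a bit-level truncation. The sketch below shows that relationship using Python’s standard `struct` module. It truncates rather than rounds, whereas hardware commonly uses round-to-nearest-even, so treat it as an illustration of the format, not a model of Cooper Lake’s behavior.

```python
import struct


def float32_bits(x: float) -> int:
    """Return the IEEE 754 binary32 encoding of x as an unsigned 32-bit int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def to_bfloat16_bits(x: float) -> int:
    """Truncate float32 to bfloat16: keep sign (1 bit), exponent (8), top 7 mantissa bits."""
    return float32_bits(x) >> 16


def from_bfloat16_bits(b: int) -> float:
    """Widen bfloat16 back to float32 by zero-filling the dropped mantissa bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


# e.g. 3.14159265 -> 0x4049 -> 3.140625
for value in (1.0, 3.14159265, 1e38, 1e-30):
    bf16 = to_bfloat16_bits(value)
    print(f"{value!r:>12} -> 0x{bf16:04X} -> {from_bfloat16_bits(bf16)!r}")
```

The 1e38 and 1e-30 cases would overflow or underflow IEEE half precision, but survive in bfloat16 with only a small loss of precision, which is the trade-off that makes the format attractive for machine learning.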
One tidbit about Cooper Lake-SP, however, is that it’ll be socket compatible with Ice Lake, even though the latter will be built on a 10nm process. That’s a touch unusual for Intel, but it means Cooper Lake-SP and Ice Lake-SP will share the LGA4198 platform. Release dates haven’t been discussed, but Intel’s server shipments typically lag its desktop and mobile CPU introductions, which explains why the company is talking about having 10nm in-market for the 2019 holiday season but still projects a 2020 launch date for Ice Lake-SP. These dates also suggest that Intel won’t debut 10nm for mobile first, with desktop and server chips following 12 months or more later. Instead, it’ll apparently move from 14nm++ to 10nm across all of its product families.
A lot of humans think they’re so good at video games that they make a point of describing themselves as “gamers.” Well, the AI platform developed by OpenAI would like to have a word, humans. After demonstrating its prowess in Dota 2 1v1 games, the newly enhanced “OpenAI Five” has shown that it can thrash even the best human teams. OpenAI challenged five of the world’s best pro Dota 2 players, and the AI won handily.
You might remember OpenAI popping up in the news just recently: it developed an artificial intelligence that can perform complex hand movements without needing any human examples. The OpenAI Dota bot gets better by playing games against itself at high speed, using a massively scaled-up version of Proximal Policy Optimization (PPO), the same reinforcement learning algorithm behind OpenAI’s successful 1v1 Dota 2 bot. OpenAI Five plays the equivalent of a whopping 180 years of games every single day, thanks to 256 GPUs and 128,000 CPU cores on Google Cloud Platform.
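Proximal Policy Optimization’s core idea is a clipped surrogate objective: it rewards a policy update for making well-performing actions more likely, but caps the ratio between the new and old action probabilities so that a single update can’t move the policy too far. The plain-Python sketch below computes that objective for a small batch, following the published PPO formulation with an assumed clip range of 0.2. It is a textbook illustration, not OpenAI Five’s training code, which ran this idea at vastly larger scale.

```python
import math
from collections.abc import Sequence


def ppo_clip_objective(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Mean clipped surrogate objective L^CLIP from the PPO paper (to be maximized).

    logp_new / logp_old: log-probabilities of the actions taken, under the
    updated policy and the policy that collected the data.
    advantages: advantage estimates for those actions.
    """
    if not (len(logp_new) == len(logp_old) == len(advantages)) or len(advantages) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
        # Take the more pessimistic of the unclipped and clipped terms, which
        # removes any incentive to push the ratio outside [1 - eps, 1 + eps].
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Action 1 became 1.5x as likely with a positive advantage: its term is capped at 1.2.
# Action 2 became 0.5x as likely with a negative advantage: its term is held at -0.8.
print(ppo_clip_objective([math.log(1.5), math.log(0.5)], [0.0, 0.0], [1.0, -1.0]))  # approximately 0.2
```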
The exhibition match was streamed live on Twitch and featured noted Dota 2 players Blitz, Cap, Fogged, Merlini, and MoonMeander. All of them currently play or previously played Dota 2 professionally; they’re in the 99.95th percentile of human players. The match kept a few restrictions that removed features the bots hadn’t yet learned to use, but OpenAI was able to reinstate some previously blocked elements like wards and Roshan (a powerful neutral creep that can be farmed for XP gains).
In the first match, the AI steamrolled the humans in just 21 minutes. That’s considered quite fast for a game of Dota 2, and the AI was confident. It predicted a 95 percent win probability after seeing the hero teams. In game two, the AI wasn’t as confident, giving itself a 76 percent chance to win. It still pummeled the human team in just 25 minutes.
It was a best-of-three event, so the AI had already won when round three came along. For this match, the audience was allowed to choose heroes for OpenAI Five. Heroes in Dota 2 often interact in complex ways. Without the ability to build a team it knew how to play, the AI was at a disadvantage. It predicted a 17 percent win chance, and indeed it lost after a healthy 35-minute battle.
A few years ago, many researchers thought it was impossible for AI systems to get so good at complex team-based games like Dota 2, but here we are. The best gamers in the world are machines.
TSMC is bringing its factories back online after a virus crippled its production and slowed wafer processing at the company’s foundries. The company has released a statement on the outbreak, which began on August 3 and was 80 percent contained as of August 5. The foundry expected to be back online again sometime Monday, August 6. The company writes:
We estimate the impact to third quarter revenue to be about three percent, and impact to gross margin to be about one percentage point. The Company is confident shipments delayed in third quarter will be recovered in the fourth quarter 2018, and maintains its forecast of high single-digit revenue growth for 2018 in U.S. dollars given on July 19, 2018…
This virus outbreak occurred due to misoperation during the software installation process for a new tool, which caused a virus to spread once the tool was connected to the Company’s computer network. Data integrity and confidential information was not compromised. TSMC has taken actions to close this security gap and further strengthen security measures.
This wording seems to imply that the virus either infected the tool being installed or was already on the tool prior to installation. It’s not clear what “tool” means in this context; that phrasing can refer to relatively prosaic installations of support equipment or an enormous piece of fixed machinery. The EUV lithography manufacturing machines built by companies like ASML are also referred to as “tools,” despite their enormous costs and months-long installation procedures.
TSMC dominates the foundry industry — delays to its nodes ripple across product families and customers.
The fact that it took TSMC over two days to solve this problem and that the company expects to take such a revenue hit implies this was a fairly serious breach. A 3 percent revenue decline might not sound like much, but we can use TSMC’s Q3 2017 revenue report as a guideline for what the 2018 decline might look like.
In Q3 2017, TSMC reported $8.3B in revenue. This implies the company believes it lost nearly $249M in revenue due to the need to scrub its systems for malware over two days. The several-day delay could impact Apple’s next-generation iPhone launch later this year, but TSMC doesn’t anticipate any major problems. TSMC has been contacting its customers individually and will work with them to set new delivery timetables and to manage any inventory concerns. The foundry is currently in volume production on 7nm chips and Apple will likely be the first customer to debut the node. AMD is also building its 7nm Epyc CPU (codename Rome) and a machine intelligence-focused version of its Vega GPU on the new process.
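The revenue estimate above is simple proportional arithmetic: TSMC’s guided impact (about three percent) applied to a baseline quarter. Using Q3 2017 as the baseline is a proxy, since TSMC hadn’t reported Q3 2018 results yet; the function name below is illustrative.

```python
def estimated_revenue_impact(baseline_revenue_usd: float, impact_fraction: float) -> float:
    """Revenue at risk = baseline quarterly revenue x guided impact fraction."""
    if not 0 <= impact_fraction <= 1:
        raise ValueError("impact_fraction must be between 0 and 1")
    return baseline_revenue_usd * impact_fraction


q3_2017_revenue = 8.3e9  # TSMC's reported Q3 2017 revenue, used as a proxy for Q3 2018
guided_impact = 0.03     # TSMC's estimate: about three percent of Q3 revenue

print(f"~${estimated_revenue_impact(q3_2017_revenue, guided_impact) / 1e6:.0f}M")
```

Swapping in TSMC’s actual Q3 2018 revenue, once reported, would give a tighter figure; the proportional logic stays the same.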
Researchers at the University of Michigan have created a computer so tiny, it’s dwarfed by a single grain of dust. The device, measuring just 0.3mm on a side, doesn’t just strain our eyes — it strains our definition of what a computer is. Expect a lot more devices like this to arrive in the coming years as the benefits and capabilities of the IoT become more apparent.
The Michigan research team dubbed their new device the Michigan Micro Mote. It lacks any kind of memory storage system and must be constantly exposed to sunlight or an equivalent energy source. It can’t use a battery, since none are small enough to work with it, and the device’s scale means it can’t accept much power, either. It draws only nanoamps of current, roughly a million times less than a typical idling smartphone.
The resulting device is capable of measuring the temperature of its surrounding area and is small enough to fit into nooks and crannies that aren’t normally accessible to thermal sensors. There’s even talk of using it to measure the temperature of tumors inside the body. Knowing how temperatures are different in tumors could aid detection methods or further treatments at some point down the line, and the Michigan Micro Mote is biocompatible and small enough to function as part of an internal system.
“There’s interest in understanding how the metabolism of tumors change as they’re being treated,” said David Blaauw, a professor of electrical and computer engineering at UM. “The thought is that if you have some tumor tissue as it becomes malignant or as it’s being treated with chemotherapy, that its temperature characteristics change.
“That would be interesting, that’s not really known at this point,” he added. “That could help for diagnosis at some point down the road. To be able to measure that precisely in a small amount of tissue you would need an extremely small sensor.”
Obviously, power considerations would need to be addressed — the prototype device is solar-powered, but this wouldn’t work for any implanted product. But this type of sensor development could have significant ramifications for the IoT. Some of the most exciting work in computing these days is being done at the micro-scale and is focused less on improving raw compute power and more on extending that computational efficiency into areas it’s never touched. Biocompatible temperature sensors smaller than a grain of rice could extend our knowledge enormously one day, thanks to pioneering work in the field like this.
It’s been two years since Elon Musk said that Tesla would develop its own chips to facilitate autonomous driving and began hiring a team of designers and executives to make it happen. This week Musk went public with the results. Tesla has developed its own silicon for running the neural networks it uses to do vision processing in its AutoPilot software. The company is building it into a computer that’s a plug-in replacement for the Nvidia Drive PX2 systems it currently uses for AutoPilot 2.5-equipped cars.
Musk touts Tesla’s homegrown processor as being 10 times faster than anything Tesla can buy from Nvidia or anyone else today, stating the chip can analyze up to 2,000 frames of video per second instead of the current 200. Industry insiders I’ve spoken with concur that this type of performance improvement over the current PX2 is needed to achieve Level 4 or Level 5 autonomous driving. So far, so good. However, Nvidia is already shipping samples of a new car computer, Drive Xavier, that is an order of magnitude faster than the PX2. It is built around Nvidia’s new, AI-targeted Volta architecture, which features Tensor Cores, rather than the older and more general-purpose Pascal chips used in the PX2.
New Computer Pits Tesla’s Focused Development vs. Nvidia’s Massive Investment
Musk touted the fact that Tesla’s own chip has been optimized for Tesla’s driving software, which certainly gives it a leg up. Design lead Pete Bannon — best known for overseeing Apple’s A5 CPU — stressed that by doing a bottom-up design from scratch they have created something more efficient and more powerful than anything they could find in the market.
However, Nvidia says Xavier is the culmination of four years of work by 2,000 engineers and an investment of $2 billion — a complexity attested to by its 9 billion transistors. My speculation is that power usage and economics are bigger drivers of Tesla’s decision than pure performance. It is certainly reasonable that by building a chip and computer that only does exactly what it needs, Tesla can lower the power required — not a huge amount by server farm standards, but always a consideration in electric vehicles.
Tesla Stands to Save Billions by Developing Its Own Car Computer
Economically, Tesla has two cost issues. The first is the cost of computers for all its vehicles going forward. The second is that it is also likely to need to retrofit existing cars with new computers, perhaps free of charge. Musk has previously promised that recent cars equipped with AutoPilot would eventually be able to achieve full autonomous driving capability. That will almost certainly require a new computer, and customers who view it as a promise made by Tesla will be reluctant to pay for it. So by building its own computer, Tesla can realize substantial cost savings.
Nvidia doesn’t say what a PX2 costs, but it is speculated that early partners using it for autonomous vehicle testing paid up to $15K per unit. Even if the volume version is much less expensive, say around $2K, retrofitting a couple hundred thousand cars with it would cost at least $400 million, with the bill approaching a billion dollars as that fleet grows. That number will continue to go up until Tesla can ship its new computer, which it doesn’t expect to happen until next year.
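The retrofit math scales linearly with fleet size and unit cost. The sketch below uses the speculative $2,000 volume price mentioned above; the fleet sizes are illustrative assumptions rather than Tesla figures, and the calculation ignores installation labor.

```python
def retrofit_cost_usd(cars: int, unit_cost_usd: float) -> float:
    """Hardware cost of swapping the computer in `cars` vehicles (labor excluded)."""
    return cars * unit_cost_usd


def cars_for_budget(budget_usd: float, unit_cost_usd: float) -> int:
    """How many retrofits a given budget covers at a fixed unit cost."""
    return int(budget_usd // unit_cost_usd)


UNIT_COST = 2_000  # speculative volume price for a PX2-class computer

for fleet in (200_000, 300_000, 400_000):  # hypothetical fleet sizes
    print(f"{fleet:,} cars -> ${retrofit_cost_usd(fleet, UNIT_COST) / 1e9:.1f}B")
print(f"$1B covers {cars_for_budget(1e9, UNIT_COST):,} cars")
```

At that unit price, hardware alone for a couple hundred thousand cars lands in the hundreds of millions of dollars; the total approaches a billion as the fleet grows toward 500,000 cars, or sooner once labor or a higher per-unit price is included.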
AMD hasn’t even launched its second-generation Threadripper family yet, but the company is hard at work on its 7nm follow-up to the Epyc server family. We don’t know much at all about these new CPUs, codenamed “Rome,” but during its most recent conference call, AMD did drop one surprising fact: These chips are being built at TSMC, not GlobalFoundries.
Ever since AMD spun its Dresden fabs and nascent 28nm facility off into their own company (GlobalFoundries), GF has been the firm’s preferred partner for all things CPU. While AMD’s Kabini and Temash were both built at TSMC, this was a last-minute effort on AMD’s part; the chips were originally designed to be built at GF and were moved only when AMD couldn’t get yield. To date, we know that TSMC has handled console manufacturing for AMD SoCs and Atom fabbing for Intel during the short-lived SoFIA partnership. Neither project translates into much experience building big-core x86 chips, but AMD was clearly satisfied with the results, so much so that it moved its big-core production on Epyc over to the Taiwanese foundry.
AMD had previously stated that it would work with both TSMC and GF at 7nm, so that isn’t a surprise, but the only part we knew was being built at TSMC was a 7nm Radeon Vega machine intelligence chip. In last week’s conference call, AMD CEO Lisa Su said: “We are working with both the TSMC and GlobalFoundries in 7-nanometer. As for the 7-nanometer Rome that we’re currently sampling, that’s being manufactured at TSMC.”
AMD has wafer quotas and production levels that it has to meet with GlobalFoundries, but past that it has more freedom to allocate its resources and spending. That said, it’s genuinely surprising to see AMD cozying up to TSMC on 7nm and eschewing its typical partner. Lisa Su can certainly talk about engaging with GlobalFoundries at 7nm, but AMD’s first GPU win for 7nm is…a TSMC design. Its second announced 7nm product has been confirmed as…a TSMC design.
One of the ways AMD has saved money in the Ryzen era is by building a single die that it can leverage throughout its entire product line. The die underneath a Ryzen 3 1200 is the same silicon as inside a top-end Epyc processor. Building multiple dies with different features allows a company to optimize parts for their intended markets and to minimize production cost, but it also incurs additional design costs. TSMC and GlobalFoundries have different process nodes with different characteristics; there’s no way to quickly port a CPU from one foundry to the next.
One potential explanation for the shift could be something GF mentioned to us back in February when we toured the fab. Back then, the company highlighted its own work on 12FDX (12nm FD-SOI), which it believes is a major differentiation for GF compared with Samsung or TSMC. 12FDX is intended primarily for use in IoT devices and other low-power silicon, where battery life is essential. While it can offer burst performance broadly comparable to FinFET, the principal focus of the technology is to extend battery life. And it was 12FDX, more than 7nm FinFET, that GF wanted to talk about as a major capability. At the time, we judged this to be the result of the foundry wanting to focus on an area where it had a chance to demonstrate market leadership (which makes sense). But GF also noted it would be a fast-follower on 7nm, not a market leader. With Intel’s 10nm delay this seemed like less of a problem, but it could also mean that AMD’s desire to launch 7nm into market didn’t clearly align with when GF expected to have capacity available.
It’s not clear what this means for AMD’s 7nm production at GF. For now, we’re assuming that the company hasn’t shifted all of its x86 production to Taiwan, but will continue to tap GF for other products in the future. Whether this means certain product lines will stay at GF while others move to TSMC, or if AMD will build the same parts in both locations, is not clear.
AMD announced excellent second-quarter earnings for 2018 as the company’s overall position continues to strengthen. Overall revenue was $1.76B compared with $1.15B for the same period in 2017, up 53 percent (1.53x) year-over-year. Sales also improved by 7 percent compared with Q1 2018, and gross margin for the quarter was 37 percent, compared with 34 percent in Q2 2017. Net income for the quarter was $116M.
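The growth figures can be checked from the reported revenue: year-over-year growth is (current − prior) ÷ prior, and the 7 percent sequential gain lets us back out an approximate Q1 2018 figure. The implied Q1 number below is derived from rounded values, not quoted from AMD, and small differences from other published percentages can come from that rounding.

```python
def growth_pct(current: float, prior: float) -> float:
    """Percentage change from prior to current."""
    return (current - prior) / prior * 100


def implied_prior(current: float, growth_percent: float) -> float:
    """Back out the prior-period value from the current value and its percentage growth."""
    return current / (1 + growth_percent / 100)


q2_2018, q2_2017 = 1.76, 1.15  # AMD revenue, $B

yoy = growth_pct(q2_2018, q2_2017)
print(f"YoY growth: {yoy:.0f}% ({q2_2018 / q2_2017:.2f}x)")
print(f"Implied Q1 2018 revenue: ~${implied_prior(q2_2018, 7):.2f}B")
```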
“We had an outstanding second quarter with strong revenue growth, margin expansion and our highest quarterly net income in seven years,” said Dr. Lisa Su, AMD president and CEO. “Most importantly, we believe our long-term technology bets position us very well for the future. We are confident that with the continued execution of our product roadmaps, we are on an excellent trajectory to drive market share gains and profitable growth.”
Higher year-over-year revenue was driven by stronger sales across all of AMD’s business segments, while the sequential increase was driven by higher revenue in the Enterprise, Embedded, and Semi-Custom segment. Overall GPU revenue fell by four percent, according to analyst Patrick Moorhead of Moor Insights & Strategy, which isn’t much considering the decline in cryptocurrency-driven shipments over the same period.
“As expected, AMD had another solid growth quarter with 2nd gen Ryzen, Ryzen mobile, and Epyc all ramping strong, leading to an overall 54% revenue gain. Ryzen units grew double-digit sequentially and Ryzen mobile doubled sequentially,” Moorhead told ExtremeTech via email. “Radeon graphics sales were down driven by a decline in blockchain revenue, but in the grand scheme of things, not very large, a 4 percent decline. It also appears AMD is starting to get some traction in the commercial workstation market, a very large market profit pool.
“Epyc saw a sequential unit doubling in hyperscalers, which was expected given prior customer win announcements, but nice to see that execution. Overall, Epyc units and revenue increased 50 percent indicating a consistent upward trajectory. The follow-on 7nm and new Zen2 core server part, code-named Rome, is sampling and the company considers it looking ‘healthy,’ a very good sign for AMD’s server future.”
If we break things down by segment, AMD’s Compute and Graphics revenue dropped slightly quarter-on-quarter due to weakness in cryptocurrency sales. Growth in Enterprise, Embedded, and Semi-Custom (almost entirely driven by the enterprise chunk of the equation) offset that small decline. Overall, AMD continues to perform extremely well. It’s absolutely mandatory that the company nail its 7nm transition, but provided it does so the firm seems poised for continued success.