3-2-2025 (BEIJING) Chinese startup DeepSeek has reportedly achieved a notable technical feat by bypassing NVIDIA’s CUDA framework in the development of its latest large language model, according to industry reports.
The firm opted to programme directly in PTX (Parallel Thread Execution), NVIDIA’s lower-level, assembly-like instruction set, rather than the industry-standard CUDA framework, when training on NVIDIA’s H800 chips. The move has reportedly yielded hardware efficiency roughly ten times that of tech giant Meta and other competitors, according to an analysis by South Korea’s Mirae Asset Securities.
NVIDIA’s CUDA framework has long been considered the gold standard in AI development, providing developers with a simplified approach to harness the computing power of NVIDIA’s graphics processing units (GPUs). This dominance has helped cement NVIDIA’s virtual monopoly in the global AI hardware market.
However, DeepSeek’s unconventional approach represents a significant departure from industry norms. By “rebuilding everything from scratch”, as noted in technical papers analysed by US technology website Tom’s Hardware, the company has managed to bypass the traditional constraints associated with CUDA’s general-purpose programming framework.
The implications are substantial: training times for AI models could be dramatically reduced. According to Chinese tech media outlet “Fast Technology”, what typically requires ten days of training can be accomplished by DeepSeek in merely five days.
Industry experts note that while this approach demands considerably more complex programming and maintenance effort than working through a higher-level framework such as CUDA, it could prove strategically advantageous. Sources cited by Fast Technology and Tencent suggest that DeepSeek’s expertise in PTX programming could facilitate smoother adaptation to domestic Chinese GPUs in the future.
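To illustrate the abstraction gap the experts describe, the sketch below contrasts a trivial CUDA C++ kernel with a simplified, hand-written approximation of the PTX it compiles down to. This is purely an illustration of the two programming levels, not DeepSeek’s code; the kernel name `scale` and the PTX fragment are hypothetical and abbreviated.

```cuda
// High-level CUDA C++: the compiler handles registers, addressing,
// and control flow. One readable statement per element.
__global__ void scale(float *x, float alpha, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= alpha;
}

// The same operation expressed at the PTX level looks roughly like the
// simplified fragment below: the programmer manages every register,
// address computation, and branch predicate explicitly.
//
//   .visible .entry scale(.param .u64 px, .param .f32 pa, .param .u32 pn)
//   {
//       .reg .pred %p1;
//       .reg .f32  %f<3>;
//       .reg .b32  %r<6>;
//       .reg .b64  %rd<4>;
//       ld.param.u64  %rd1, [px];
//       ld.param.f32  %f1,  [pa];
//       ld.param.u32  %r1,  [pn];
//       mov.u32       %r2, %ctaid.x;          // block index
//       mov.u32       %r3, %ntid.x;           // block size
//       mov.u32       %r4, %tid.x;            // thread index
//       mad.lo.s32    %r5, %r2, %r3, %r4;     // global thread id
//       setp.ge.s32   %p1, %r5, %r1;          // bounds check
//       @%p1 bra      DONE;
//       cvta.to.global.u64 %rd2, %rd1;
//       mul.wide.s32  %rd3, %r5, 4;           // byte offset
//       add.s64       %rd2, %rd2, %rd3;
//       ld.global.f32 %f2, [%rd2];
//       mul.f32       %f2, %f2, %f1;
//       st.global.f32 [%rd2], %f2;
//   DONE:
//       ret;
//   }
```

Working at the PTX level exposes scheduling and register allocation decisions that CUDA normally hides, which is where the reported efficiency gains (and the added maintenance burden) come from.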