
DeepSeek V4: Open-source model closes the gap

Chinese startup DeepSeek has released V4 Flash and V4 Pro, posting gains on reasoning and coding benchmarks that put it in direct competition with the leading global AI labs.

The technical reality of DeepSeek V4

Look, the marketing teams at these AI startups love to throw around words like 'revolutionary,' but let's actually look at what DeepSeek just dropped. They released V4 Flash and V4 Pro. On paper, it's another step in the Mixture-of-Experts (MoE) evolution that the industry has been riding for the last year. DeepSeek isn't reinventing the wheel here; they are just making the wheel spin a lot faster with less friction.

The V4 Pro is their heavy hitter. It's aimed squarely at the reasoning-heavy workloads that have been the private playground of OpenAI's GPT-5.4 and Anthropic's Claude Opus 4.6. The parameter counts are published in full: V4-Pro weighs in at 1.6 trillion total parameters with 49 billion active per token, while V4-Flash comes in at 284 billion total and 13 billion active. But the more interesting engineering story is the new attention architecture. The headline innovation in V4 isn't the MoE routing itself but the Hybrid Attention Architecture, which combines Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA). If you've spent any time debugging long-context deployments, you know the bottleneck is almost always the KV cache blowing out memory at scale. The hybrid attention scheme attacks exactly that: at a 1-million-token context, V4-Pro requires only 27% of the per-token inference FLOPs and 10% of the KV cache of its predecessor, V3.2. DeepSeek claims to have tamed the long-context bottleneck, and the initial benchmarks suggest they aren't just blowing smoke.
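To put the sparsity in concrete terms, here is a back-of-the-envelope sketch using only the figures quoted above. The parameter counts and the 10% KV-cache ratio are DeepSeek's published numbers; the baseline V3.2 cache size is an assumed figure purely for illustration.

```python
# Back-of-the-envelope math from the published V4 figures.
# Only the parameter counts and the 10% KV-cache ratio come from
# DeepSeek's numbers; the 400 GB baseline is an assumption.

def active_ratio(total_params: float, active_params: float) -> float:
    """Fraction of the MoE's weights actually touched per token."""
    return active_params / total_params

v4_pro_ratio = active_ratio(1.6e12, 49e9)    # V4-Pro: ~3.1% of weights per token
v4_flash_ratio = active_ratio(284e9, 13e9)   # V4-Flash: ~4.6% of weights per token

# KV-cache claim: at a 1M-token context, V4-Pro needs only 10% of the
# KV cache V3.2 would need. If V3.2 hypothetically needed ~400 GB per
# sequence at that length (assumed), hybrid attention brings it to ~40 GB.
v32_kv_gb = 400                  # illustrative assumption, not a published figure
v4_kv_gb = v32_kv_gb * 0.10

print(f"V4-Pro active ratio:   {v4_pro_ratio:.1%}")
print(f"V4-Flash active ratio: {v4_flash_ratio:.1%}")
print(f"KV cache at 1M tokens: {v32_kv_gb} GB -> {v4_kv_gb:.0f} GB")
```

The point of the arithmetic: per-token compute scales with the ~3% of weights that are active, not the 1.6 trillion total, and the cache reduction is what makes the 1M-token window deployable at all.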

Benchmarks and the coding obsession

Every new model release feels like a contest to see who can game the HumanEval benchmark better. DeepSeek V4 Flash is leaning hard into the coding niche. For those of us who actually use these tools to ship production code, the 'Flash' designation usually means 'quantized to the point of being useless for complex logic.' DeepSeek, however, is positioning it as a high-speed inference engine that keeps its logic intact. The benchmarks that actually matter here are SWE-Verified and Codeforces. On SWE-Verified (real software engineering tasks on real repos), V4-Pro scores 80.6, within striking distance of the closed-source frontier. On Codeforces, V4-Pro reaches 3206 Elo, edging past GPT-5.4's 3168 in the competitive-programming territory where closed models have historically held the lead.
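For context on what a 38-point Elo edge actually means, the standard Elo expected-score formula puts it in perspective. This is the generic logistic rating model used by chess and Codeforces-style systems, not anything DeepSeek published:

```python
# Standard Elo expected score: P(A outscores B) under the logistic model.
# The ratings are the reported Codeforces-style figures; the formula
# itself is the generic rating model, not DeepSeek's methodology.

def elo_expected(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

p = elo_expected(3206, 3168)   # V4-Pro vs GPT-5.4
print(f"Expected head-to-head score for V4-Pro: {p:.3f}")
```

A 38-point gap works out to roughly a 55% expected score, which is the honest way to read the headline: at this level the two models are effectively peers, with V4-Pro holding a slim edge.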

The V4 Pro, on the other hand, is being sold as an 'agentic' model. In plain English, that means it's supposedly better at not getting stuck in a loop when you give it a multi-step task. Most models fail at agency because they lose the plot after three or four tool calls. DeepSeek claims their new training recipe specifically targets 'long-horizon planning,' and both V4 models natively support a 1-million-token context window, which is what makes extended agentic loops viable in the first place. It's a bold claim. We've seen plenty of models that can write a 'Hello World' app, but very few that can navigate complex API documentation and autonomously fix a broken CI/CD pipeline without human intervention.
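The 'loses the plot after a few tool calls' failure mode is easiest to see against the skeleton of an agent loop. This is a generic sketch, not DeepSeek's API: the model stub, the tool registry, and the stopping rule are all made up for illustration.

```python
# Minimal agent-loop skeleton: the model proposes an action, the harness
# executes it, and the result is appended to the transcript. Everything
# here is a stand-in: a real system would call an LLM where fake_model
# does simple string matching.

def fake_model(history: list[str]) -> str:
    """Stand-in for the LLM: picks the next action from the transcript."""
    if not any("tests passed" in entry for entry in history):
        return "run_tests"
    return "done"

TOOLS = {
    "run_tests": lambda: "tests passed: 42/42",   # hypothetical tool result
}

def agent_loop(max_steps: int = 10) -> list[str]:
    history: list[str] = ["task: fix the failing CI pipeline"]
    for _ in range(max_steps):          # hard cap so a confused model can't spin forever
        action = fake_model(history)
        if action == "done":
            break
        history.append(f"{action} -> {TOOLS[action]()}")
    return history

transcript = agent_loop()
print(transcript)
```

'Long-horizon planning' is precisely the problem of keeping `history` coherent as it grows across dozens of such steps, which is why the 1M-token window is the enabling feature rather than a vanity spec.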

The open source versus closed lab divide

One thing you have to give DeepSeek credit for is their commitment to the open-weights model. While the big labs in San Francisco are locking everything behind proprietary APIs and monthly subscriptions, the Chinese ecosystem - led by DeepSeek and players like Alibaba's Qwen team - is dumping high-quality weights into the wild. This isn't just about 'openness'; it's a strategic move to commoditize the underlying intelligence layer. If DeepSeek can provide 95% of GPT-5.4's performance for a fraction of the cost, or better yet, let you run it on your own hardware, the value proposition for the closed-source giants starts to look a bit shaky.

However, 'open source' in this context usually comes with an asterisk. We get the weights, but we rarely get the full training data or the exact recipes for the post-training pipeline. It's more like a 'some assembly required' kit. Still, for a DevOps team looking to self-host, V4 Pro is a massive deal: it removes the dependency on external APIs that can change their rate limits or privacy policies on a whim.

Hardware efficiency and the inference cost war

Let's talk about the hardware, because that's where the real war is won. The most geopolitically significant story in the V4 release isn't on the benchmark scorecard; it's the chip stack underneath. DeepSeek built V4 around Huawei's Ascend hardware, with Huawei's Ascend supernode (powered by the new Ascend 950 AI chips) announced as a fully supported inference target out of the box. That is a meaningful break from the industry's default dependence on Nvidia, whose hardware has been subject to US export restrictions in China since October 2022. DeepSeek did open-source CUDA kernel work through their DeepGEMM/MegaMoE libraries for Nvidia GPU users, and the weights run on standard H100-class hardware too, but the primary optimization target has visibly shifted toward domestic Chinese silicon. If you're running a fleet of inference servers, cost-per-token isn't just a number on a spreadsheet; it's the difference between a viable product and a money pit. DeepSeek's focus on Flash indicates they understand the market is moving away from 'smartest at any cost' toward 'smart enough and cheap enough to scale.'
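To see why cost-per-token decides viability, a rough serving-cost model helps. Every input below (node price, throughput) is an illustrative assumption, not a measured DeepSeek figure; only the formula is doing real work.

```python
# Rough inference economics: cost per million output tokens for a
# self-hosted deployment. All concrete numbers are assumptions.

def cost_per_million_tokens(node_cost_per_hour: float,
                            tokens_per_second: float) -> float:
    tokens_per_hour = tokens_per_second * 3600
    return node_cost_per_hour / tokens_per_hour * 1_000_000

# Assumed: an 8-accelerator node at $20/hour serving V4-Flash at an
# aggregate 2,000 tokens/s (hypothetical throughput).
flash = cost_per_million_tokens(20.0, 2000)
# Assumed: the same node serving the much larger V4-Pro at 250 tokens/s.
pro = cost_per_million_tokens(20.0, 250)

print(f"V4-Flash: ${flash:.2f} per 1M tokens")
print(f"V4-Pro:   ${pro:.2f} per 1M tokens")
```

Under these made-up numbers, Flash lands near $3 per million tokens and Pro near $22: an order-of-magnitude spread from throughput alone, which is exactly the 'smart enough and cheap enough to scale' calculus.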

Final thoughts on the release

Is DeepSeek V4 a 'GPT-5.4 killer'? Probably not. But that's the wrong question. The right question is whether it makes the expensive, closed-source models redundant for 80% of enterprise tasks. Based on the preview of V4 Pro and Flash, the answer is leaning toward 'yes.' If you need a model to handle logic, code, and basic reasoning, and you want to avoid the 'OpenAI tax,' this is likely your new baseline. It's clinical, it's efficient, and it's a clear signal that the gap between the 'frontier' labs and the rest of the world is closing faster than anyone expected. DeepSeek is just doing the engineering work that everyone else is too busy talking about in press releases. It's not flashy, but it's functional, and in this industry, that's a lot rarer than you'd think.


Key takeaways

  • DeepSeek launched preview versions of its latest flagship AI models, V4 Flash and V4 Pro, on April 24, 2026.
  • V4-Pro is a 1.6 trillion parameter MoE model with 49 billion active parameters per token, pre-trained on 32 trillion+ tokens; V4-Flash has 284 billion total parameters and 13 billion active.
  • Both models support a native 1-million-token context window and offer Thinking and Non-Thinking inference modes.
  • The V4 Pro model is engineered for complex reasoning and advanced agentic tasks; V4-Flash is optimized for low-latency, cost-effective serving.
  • The headline architectural innovation is Hybrid Attention, combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA), which cuts V4-Pro's per-token inference FLOPs to 27% and KV cache to 10% of those required by V3.2 at 1M-token context.
  • V4-Pro scores 80.6 on SWE-Verified (real software engineering tasks) and 3206 Elo on Codeforces, surpassing GPT-5.4's 3168 in competitive programming.
  • Both models are fully open-weight under an Apache 2.0 license, published on Hugging Face alongside a technical report.
  • V4 was optimized for Huawei's Ascend hardware stack; Huawei confirmed its Ascend supernode (Ascend 950 chips) fully supports DeepSeek V4 out of the box, marking a visible shift away from Nvidia GPU dependency.

Anthony Walters

Anthony is an automotive systems engineer obsessed with what happens when the rubber literally meets the road. Having tested everything from classic combustion engines to bleeding-edge autonomous LiDAR platforms, he focuses on powertrain dynamics and safety. He loves cutting through marketing hype to explain what cars can actually do.