This past week, DeepSeek took the tech world by storm with the release of DeepSeek-R1—a fully open-source Large Language Model (LLM) boasting performance comparable to OpenAI’s current gold standard, yet developed at a fraction of the computational cost. From an advanced chain-of-thought reasoning engine to a distilled set of smaller, locally deployable models, DeepSeek-R1’s highlights have immediately captured the industry’s attention. So much so, in fact, that markets felt a ripple: the tech-heavy NASDAQ—particularly stocks like NVIDIA—took a dip upon news of DeepSeek’s more resource-efficient approach.
But beyond the headlines and market reactions, there’s a key question: how does DeepSeek-R1 really stack up for use cases like data extraction, Q&A, and summarization? After all, while reasoning models such as DeepSeek-R1 and OpenAI’s o1 excel at high-level analytical tasks like coding or advanced mathematics, users often rely on these more modest functionalities in day-to-day operations.
In this blog, we will put that question to the test. Specifically, we will compare the current OpenAI integration in our Unstructured Data Terminal with a locally run DeepSeek-R1 distillation, DeepSeek-R1-Distill-Qwen-7B, using a relevant financial document example: the Tesla 10-Q report from 04/24/2024. By examining the models’ responses side by side, we aim to find out whether DeepSeek’s cost-efficient, open-source approach genuinely disrupts the landscape, or whether it remains just another intriguing entry in the LLM marketplace. Let’s dive in.
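For reference, here is a minimal sketch of how such a side-by-side comparison can be wired up, assuming the OpenAI Python SDK and the distilled model served locally through Ollama’s OpenAI-compatible endpoint. The model names, the document variable, and the endpoint are illustrative assumptions, not the actual Unstructured Data Terminal integration.

```python
# A minimal sketch of the side-by-side setup, assuming the OpenAI Python SDK
# and DeepSeek-R1-Distill-Qwen-7B served through Ollama's OpenAI-compatible
# endpoint (model tag "deepseek-r1:7b"). Model names and the document variable
# are illustrative; this is not the actual Unstructured Data Terminal code.
from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
local_client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

def ask(client: OpenAI, model: str, document: str, question: str) -> str:
    """Send the document plus a question and return the model's answer."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Answer using only the provided document."},
            {"role": "user", "content": f"{document}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

# Pose the same question to both models and compare the answers:
# question = "What was Tesla's total revenue for the quarter?"
# print(ask(openai_client, "gpt-4o", tesla_10q_text, question))
# print(ask(local_client, "deepseek-r1:7b", tesla_10q_text, question))
```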
Starting with basic questions about the filing, it’s no surprise that both models perform very well, since this information is readily available on the first page. The main difference lies in the time it takes to generate a response: for simple questions, the DeepSeek-R1 Distilled model still iterates through a lengthy chain of thought before arriving at its final answer, while OpenAI produces a succinct response within a few seconds.
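That chain of thought is visible in the output itself: the R1-series models wrap their reasoning in <think>...</think> tags before giving the final answer. If you only want the answer, a small post-processing step is enough; here is a sketch, using a made-up sample string:

```python
import re

def strip_reasoning(raw_output: str) -> str:
    """Drop the <think>...</think> block DeepSeek-R1 emits before its answer."""
    return re.sub(r"<think>.*?</think>", "", raw_output, flags=re.DOTALL).strip()

# Even for trivial questions the reasoning block can dwarf the answer,
# which is where most of the extra latency comes from.
sample = "<think>The cover page lists the period... so it is Q1 2024.</think>The report covers Q1 2024."
print(strip_reasoning(sample))  # -> "The report covers Q1 2024."
```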
Both models performed well when prompted to extract financial or numeric data from the document. However, the DeepSeek-R1 Distilled model outperformed OpenAI’s model when asked to take the additional step of calculating the percentage difference in revenue. Since this figure isn’t explicitly stated in the report, DeepSeek’s chain-of-thought reasoning allowed it to work out an accurate approximation. In contrast, OpenAI’s model began to hallucinate when given the same task, highlighting DeepSeek’s stronger analytical capabilities.
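For context, the calculation itself is a plain percentage-change formula: (current - prior) / prior * 100. A quick sketch follows; the revenue figures below are placeholders for illustration, so substitute the actual line items from the filing:

```python
def percent_change(current: float, prior: float) -> float:
    """Percentage difference between the current and prior-period figures."""
    return (current - prior) / prior * 100

# Placeholder values in $ millions; take the real revenue lines from the 10-Q.
q1_2024_revenue = 21_301
q1_2023_revenue = 23_329
print(f"{percent_change(q1_2024_revenue, q1_2023_revenue):.1f}%")  # -8.7%
```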
Both models provided sufficient summaries of the risk factors and legal concerns. However, the DeepSeek model struggled to extract information on company personnel mentioned across multiple pages, failing to identify Kathleen Wilson-Thompson or Vaibhav Taneja, who were cited on pages 34 and 36. It also hallucinated a president named Joseph Balukas, an individual who appears to be entirely fabricated.
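One lightweight guard against fabrications like this is a grounding check: verify that every name the model extracts literally appears in the parsed document text. A toy sketch, with a short excerpt standing in for the real filing:

```python
def grounded(entity: str, document: str) -> bool:
    """Check whether an extracted name actually appears in the source text."""
    return entity.lower() in document.lower()

# Stand-in for the parsed 10-Q text; a real check would run over the full document.
excerpt = "... Vaibhav Taneja, Chief Financial Officer ... Kathleen Wilson-Thompson, Director ..."

for name in ["Vaibhav Taneja", "Kathleen Wilson-Thompson", "Joseph Balukas"]:
    status = "grounded" if grounded(name, excerpt) else "NOT found in document"
    print(f"{name}: {status}")
```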
Conclusion
Overall, both models proved fairly accurate when tasked with retrieving data, answering questions, and summarizing topics. The DeepSeek model excelled at more analytical computations that went beyond the document’s scope, but it did hallucinate more often during other text-based tasks. Each model has its own strengths, so neither one is definitively superior. DeepSeek tends to take longer to perform tasks due to its “thinking out loud” process, whereas OpenAI typically generates quicker responses.
While their performances were comparable, DeepSeek’s open-source nature positions it as a key player to watch moving forward. Its ability to adapt and evolve through community contributions may drive innovation in analytical reasoning, ultimately enriching the AI landscape. Meanwhile, OpenAI’s strong performance in retrieval and summarization reaffirms its place as a reliable solution for quick, straightforward queries.