量子位
HKUST professors put AI glasses to the test at "cheating": in 30 minutes they beat 95% of students and rattled the traditional teaching assessment system
量子位· 2026-01-06 07:06
Core Viewpoint
- In an experiment at the Hong Kong University of Science and Technology, a pair of AI-powered glasses running ChatGPT-5.2 took a final exam and scored 92.5, outperforming more than 95% of human students and raising questions about the validity of traditional educational assessment [1][4][6].

Group 1: Experiment Overview
- The AI glasses, built by a team led by Professors Zhang Jun and Meng Zili, were set up to "cheat" in a controlled exam for the course "Computer Network Principles" [7].
- The workflow had three steps: the glasses' camera captured each question, the image was sent to the cloud for processing, and the answer was displayed on the glasses for the student to transcribe [12][14].
- The AI earned full marks on the multiple-choice and single-page short-answer questions and 45.5 out of 53 on the multi-page short-answer questions, demonstrating strong reasoning capabilities [14].

Group 2: Hardware and Software Selection
- The project team evaluated 12 mainstream smart glasses and selected Rokid Glasses for their SDK and ecosystem, which allowed better integration with the AI model [8][10][11].
- ChatGPT-5.2 was chosen for its response speed and general knowledge, which suited the exam setting [11].

Group 3: Implications for Educational Assessment
- The experiment highlighted a limitation of traditional assessment: it focuses primarily on the final answer rather than on the learning process [21][46].
- As AI becomes proficient at standardized tests, the ability of current assessment methods to measure deeper learning and critical thinking is called into question [22][32][42].
- The article suggests shifting assessment from checking answers to evaluating reasoning processes and decision-making quality, which are harder for AI to replicate [38][48].
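The reported scores can be cross-checked with a little arithmetic. The sketch below assumes the exam was marked out of 100, which the summary does not state; under that assumption, full marks everywhere except the multi-page section reproduces the reported total.

```python
# Scores quoted in the summary above.
TOTAL_SCORE = 92.5
MULTI_PAGE_SCORE = 45.5
MULTI_PAGE_MAX = 53.0

# Assumption (not stated in the summary): the exam is marked out of 100.
ASSUMED_EXAM_MAX = 100.0

# Points available outside the multi-page section; the summary says
# the AI earned full marks on all of them.
other_sections_max = ASSUMED_EXAM_MAX - MULTI_PAGE_MAX

# Total implied by "full marks elsewhere" plus the multi-page score.
reconstructed_total = other_sections_max + MULTI_PAGE_SCORE

# Share of the multi-page points that the AI earned.
multi_page_rate = MULTI_PAGE_SCORE / MULTI_PAGE_MAX

print(f"reconstructed total: {reconstructed_total}")
print(f"multi-page section:  {multi_page_rate:.1%}")
```

If the assumed maximum is right, all 7.5 lost points came from the multi-page short-answer questions.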
Chen Tianqiao and Dai Jifeng fire the first shot of 2026 for large models: 30B parameters deliver 1T-level performance
量子位· 2026-01-06 05:48
Core Viewpoint
- MiroThinker 1.5, developed by MiroMind, is positioned as a leading model in the intelligent agent field, outperforming other top models such as GPT-5-High and Gemini-3-Pro on several benchmarks [1][3][5].

Performance Evaluation
- MiroThinker 1.5 posted the following benchmark scores [3][4]:
  - HLE-Text: 39.2%
  - BrowseComp: 69.8%
  - BrowseComp-ZH: 71.5%
  - GAIA-Val-165: 80.8%
- It surpassed ChatGPT-Agent's previous record on BrowseComp, placing it in the global top tier [5].

Model Efficiency
- MiroThinker 1.5 comes in 30B and 235B parameter versions, significantly smaller than competing models, and achieves comparable or superior results through high efficiency [7][8].
- Inference costs $0.07 per call, only 1/20 of Kimi-K2-Thinking's cost, and inference is also faster [8].

Development Team and Background
- The MiroMind team behind MiroThinker 1.5 previously excelled at predicting outcomes in decentralized markets [9][10].

Interactive Scaling and Model Training
- MiroThinker 1.5 uses an approach called Interactive Scaling, which emphasizes interaction with the external environment during both training and inference to strengthen reasoning [46][58].
- The model's reasoning runs as a feedback loop of iterative verification and correction, in contrast to traditional models that rely heavily on memorization [48][57].

Predictive Capabilities
- MiroThinker 1.5 makes predictions from real-time data, as shown by its logical, evidence-based analysis of sports events and video game release timelines [15][35][41].
- Its predictions are structured to avoid reliance on past knowledge, focusing instead on current information and real-world interaction [52][63].

Conclusion
- MiroThinker 1.5 prioritizes interaction and evidence-based reasoning over sheer parameter count, which the article presents as a significant advance for intelligent agents [64].
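The cost comparison in the Model Efficiency section implies a per-call figure for the competing model that the summary does not spell out. A minimal sketch of that arithmetic follows; the batch size of 1,000 calls is a hypothetical workload chosen only for illustration.

```python
# Figures quoted in the summary above.
MIROTHINKER_COST_PER_CALL = 0.07   # USD per call
COST_RATIO = 1 / 20                # MiroThinker cost as a fraction of Kimi-K2-Thinking's

# Per-call cost of the competing model implied by the 1/20 ratio.
implied_kimi_cost_per_call = MIROTHINKER_COST_PER_CALL / COST_RATIO


def batch_cost(cost_per_call: float, calls: int) -> float:
    """Total cost in USD for a number of calls at a flat per-call price."""
    return cost_per_call * calls


# Hypothetical workload size, not taken from the article.
CALLS = 1000

miro_batch = batch_cost(MIROTHINKER_COST_PER_CALL, CALLS)
kimi_batch = batch_cost(implied_kimi_cost_per_call, CALLS)

print(f"implied competitor cost per call: ${implied_kimi_cost_per_call:.2f}")
print(f"{CALLS} calls: ${miro_batch:.2f} vs ${kimi_batch:.2f}")
```

This treats the price as flat per call; real pricing may vary with token counts or tool usage, which the summary does not describe.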
OpenAI's leading reasoning researcher departs after 7 years of building o3/o1/GPT-4/Codex
量子位· 2026-01-06 04:20
Core Viewpoint
- OpenAI research vice president Jerry Tworek has announced his departure after nearly seven years, citing a desire to explore research areas that are difficult to pursue at OpenAI [1][21].

Group 1: Jerry Tworek's Background and Contributions
- Tworek has a strong theoretical background, with a master's degree in mathematics from the University of Warsaw [9].
- Before joining OpenAI in 2019, he spent five years in quantitative research on futures trading strategies, which led him to study reinforcement learning [12].
- At OpenAI he worked on major projects including Codex and large language model research, emphasizing reasoning over mere pattern matching [16][18].

Group 2: Achievements at OpenAI
- Tworek played a key role in the development of GPT-4 and ChatGPT and was the lead researcher for o1, the first reasoning model [18].
- He led a team focused on improving large language models' ability to solve complex STEM problems [16].
- His work helped establish reasoning models as a new paradigm for scaling training and reasoning compute [26].

Group 3: Departure and Future Plans
- Tworek expressed gratitude for his time at OpenAI, highlighting the friendships and technical breakthroughs he experienced [27][28].
- He plans to explore research avenues that were hard to pursue within OpenAI, marking a shift in his career focus [28].
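Intel springs a CES surprise on Jensen Huang's home turf: just as NVIDIA raises graphics card prices, the most powerful Core chip ships in volume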
量子位· 2026-01-06 04:20
Core Viewpoint
- Intel has officially launched its third-generation Core Ultra processors, which it presents as a major step for AI PCs and a return to leadership in semiconductor manufacturing with the Intel 18A process node [1][5][12].

Group 1: Processor Features and Innovations
- The third-generation Core Ultra processors are expected to be the broadest AI PC platform Intel has ever created [4].
- Intel 18A introduces two key technologies: RibbonFET, which improves transistor control and reduces leakage, and PowerVia, which moves power delivery to the back of the chip to minimize signal interference [12][13][14].
- Intel 18A delivers over 15% more performance at the same power, or 25% lower power at the same performance, along with a 30% increase in transistor density [16].

Group 2: Performance Metrics
- The flagship Core Ultra X9 and X7 models feature up to 16 CPU cores, including new performance and efficiency cores, and 12 Xe cores [19].
- The integrated Intel Arc GPU raises average frame rates by 77% across 45 games at 1080p high settings compared with the previous generation [21].
- Multi-threaded performance has improved by 60% in Cinebench 2024, benefiting productivity tasks such as video editing and coding [25][27].
- Battery life reaches up to 27 hours [29][30].

Group 3: AI and Edge Computing
- The flagship model's NPU reaches 50 TOPS, supporting AI applications such as large language models and video analysis [35].
- The processors target both consumer PCs and edge computing, supporting devices ranging from smart robots to medical equipment [41].
- This is the first time Intel has tested and certified processors for embedded and industrial edge scenarios, indicating a strategic expansion into these markets [40].

Group 4: Market Availability
- The first consumer laptops with third-generation Core Ultra processors will be available for pre-order on January 6, with a global release on January 27 [43].
- Over 200 PC products are already in development, spanning consumer PCs through edge computing [44].

Group 5: Industry Collaborations
- More Chinese companies appeared at Intel's CES event, with ByteDance showcasing its cloud computing collaboration with Intel [45].
- New Wisdom Games, the only invited ISV, focuses on AI game coaching, reflecting growing interest in AI applications within the gaming industry [47][48].
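The two Intel 18A claims (more performance at the same power, or less power at the same performance) can both be expressed as a performance-per-watt ratio. A minimal sketch follows, normalizing the previous node to 1.0 in both performance and power; that normalization is an illustrative assumption, and "over 15%" is treated as exactly 15%, so the first figure is a lower bound.

```python
# Figures quoted in the summary above (relative to the previous node).
PERF_GAIN_AT_SAME_POWER = 0.15   # "over 15%": treated here as a lower bound
POWER_CUT_AT_SAME_PERF = 0.25


def perf_per_watt(perf: float, power: float) -> float:
    """Performance per watt for normalized performance and power."""
    return perf / power


# Baseline: previous node normalized to 1.0 performance at 1.0 power.
baseline = perf_per_watt(1.0, 1.0)

# Operating point 1: same power, 15% more performance.
iso_power = perf_per_watt(1.0 + PERF_GAIN_AT_SAME_POWER, 1.0)

# Operating point 2: same performance, 25% less power.
iso_perf = perf_per_watt(1.0, 1.0 - POWER_CUT_AT_SAME_PERF)

print(f"perf/W at same power:       {iso_power / baseline:.2f}x")
print(f"perf/W at same performance: {iso_perf / baseline:.2f}x")
```

The two figures differ because they describe different operating points on the power-performance curve, so they should not be multiplied together.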
"AI 100" list opens for submissions: the AI product "annual gala" must go on | 量子位智库
量子位· 2026-01-06 01:01
Core Insights
- The article reviews the many keywords that emerged in the AI product sector in 2025 and highlights transformative AI products that are leading the market [4].
- The "AI 100" list by Quantum Bit Think Tank aims to evaluate and recognize the top AI products in China, reflecting the industry's evolution and future trends [4][12].

Group 1: AI 100 List Overview
- The "AI 100" list has three parts: "Flagship AI 100," "Innovative AI 100," and the top three products in each of ten popular sub-sectors [6].
- "Flagship AI 100" focuses on the strongest AI products of 2025, those with significant technological breakthroughs and practical application value [7].
- "Innovative AI 100" aims to identify products expected to emerge in 2026 that represent cutting-edge AI technology and potential industry disruptors [8].

Group 2: Sub-sector Focus
- The ten sub-sectors are AI Browser, AI Agent, AI Smart Assistant, AI Workbench, AI Creation, AI Education, AI Healthcare, AI Entertainment, Vibe Coding, and AI Consumer Hardware [9].
- This targeted approach aims to give a clearer picture of development trends within specific AI fields [9].

Group 3: Application and Evaluation
- The list is evaluated with a dual system of quantitative and qualitative metrics, focusing on user data and long-term development potential [13].
- Quantitative metrics cover user scale, growth, activity, and retention; qualitative assessment considers technology, market space, design, monetization potential, and team background [13].
量子位 is hiring editors and writers
量子位· 2026-01-06 01:01
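From the editorial desk in 凹非寺 (Aofeisi). 量子位 | WeChat official account QbitAI.

The AI boom is still surging, but if you don't yet know how to take part... why not come to 量子位?

We are a content platform centered on tracking new developments in AI. After 8 years of accumulation, we have top-tier influence, broad and well-recognized industry resources, and one of the best positions for observing and learning at the forefront of the era.

We are currently hiring in three directions and hope you are (or can become) a content expert in one of them:
- AI Industry: infrastructure-layer innovation, including chips, AI Infra, and cloud computing;
- AI Finance: venture investment and earnings reports in the AI field, tracking capital movements along the industry chain;
- AI Product: progress of AI in applications and hardware devices.

All positions are full-time and based in Zhongguancun, Beijing.

Positions are open to:
- Experienced hires: editor, lead writer, and chief editor levels, matched by ability;
- Campus hires: fresh graduates; internships are accepted and can convert to full-time.

Join us and you can gain:
- A place at the crest of the AI wave: first-hand access to the latest AI technologies and products, building a complete understanding of AI.
- Hands-on mastery of new AI tools: applying new AI technologies and tools to ...

Position details (AI Industry, AI Finance and Business, AI Product): positions at every level are open, and you are welcome to apply based on your own résumé and experience.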
Jensen Huang goes all in on physical AI: the newest GPU delivers a 5x performance gain and lowers the barrier to autonomous driving
量子位· 2026-01-06 01:01
Core Viewpoint
- NVIDIA is shifting its focus entirely toward AI, as shown by the absence of gaming graphics cards at CES 2026 and the introduction of new AI products and architectures [2][10].

Group 1: AI Product Launches
- NVIDIA unveiled the next-generation Rubin architecture GPU, whose inference and training performance are 5 times and 3.5 times that of the Blackwell GB200, respectively [4][17].
- The company introduced five new product families targeting different AI applications, including NVIDIA Nemotron for agentic AI, NVIDIA Cosmos for physical AI, and NVIDIA Alpamayo for autonomous driving [6][8][39].
- The Vera Rubin NVL72 architecture was officially launched, with six core components designed to enhance AI data center capabilities [14][15].

Group 2: Performance Metrics
- Under the NVFP4 data type, the Rubin GPU reaches 50 PFLOPS for inference and 35 PFLOPS for training, significantly surpassing its predecessor [17].
- Each Rubin GPU carries 288GB of HBM4 memory with 22 TB/s of bandwidth, supporting the computational demands of modern AI models [18].
- A full Vera Rubin NVL72 delivers 3.6 exaFLOPS of NVFP4 inference performance and 2.5 exaFLOPS of training performance [37].

Group 3: Networking and Connectivity
- NVLink 6 raises interconnect bandwidth to 3.6 TB/s per GPU, for a total of 260 TB/s across the NVL72 rack [20][21].
- The Vera CPU integrates 88 custom Arm cores and provides 1.8 TB/s of NVLink C2C bandwidth for communication between CPU and GPU [22].

Group 4: AI Model Developments
- Alpamayo, a large open-source vision-language-action model for autonomous driving, was launched with 10 billion parameters [41].
- The Nemotron series expanded to include specialized models for speech recognition, vision-language processing, and safety [49][51].
- The Cosmos model for robotics was upgraded to generate synthetic data that adheres to real-world physical laws, aiding the development of AI agents [54][58].

Group 5: Industry Impact and Future Outlook
- NVIDIA's integrated approach to AI, combining models, data, and tools, is expected to strengthen its competitive edge and ecosystem lock-in [10].
- The company plans to begin mass production of the Vera Rubin NVL72 in the second half of 2026 [38].
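The rack-level figures quoted above follow from the per-GPU figures by simple multiplication. The sketch below checks that, assuming "NVL72" means 72 GPUs per rack (the summary does not state the count explicitly), and also derives the Blackwell GB200 baseline implied by the 5x and 3.5x claims.

```python
# Per-GPU figures quoted in the summary above (NVFP4).
INFER_PFLOPS_PER_GPU = 50.0
TRAIN_PFLOPS_PER_GPU = 35.0
NVLINK_TBPS_PER_GPU = 3.6

# Assumption: "NVL72" denotes 72 GPUs per rack.
GPUS_PER_RACK = 72

PFLOPS_PER_EFLOPS = 1000.0


def rack_total(per_gpu: float, gpus: int = GPUS_PER_RACK) -> float:
    """Aggregate a per-GPU figure across every GPU in the rack."""
    return per_gpu * gpus


rack_infer_eflops = rack_total(INFER_PFLOPS_PER_GPU) / PFLOPS_PER_EFLOPS
rack_train_eflops = rack_total(TRAIN_PFLOPS_PER_GPU) / PFLOPS_PER_EFLOPS
rack_nvlink_tbps = rack_total(NVLINK_TBPS_PER_GPU)

# Blackwell GB200 per-GPU baseline implied by the quoted speedups.
implied_gb200_infer_pflops = INFER_PFLOPS_PER_GPU / 5.0
implied_gb200_train_pflops = TRAIN_PFLOPS_PER_GPU / 3.5

print(f"rack inference: {rack_infer_eflops:.2f} exaFLOPS")
print(f"rack training:  {rack_train_eflops:.2f} exaFLOPS")
print(f"rack NVLink:    {rack_nvlink_tbps:.1f} TB/s")
print(f"implied GB200:  {implied_gb200_infer_pflops:.0f} / "
      f"{implied_gb200_train_pflops:.0f} PFLOPS (inference / training)")
```

Under the 72-GPU assumption, the products round to the quoted 3.6 exaFLOPS, 2.5 exaFLOPS, and 260 TB/s, so the per-GPU and per-rack numbers in the summary are mutually consistent.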
Sad news: Stack Overflow has gone cold, with fewer questions than in its launch month 18 years ago
量子位· 2026-01-05 09:39
Core Viewpoint
- Stack Overflow, once a thriving platform for developers, is seeing a steep decline in engagement: the number of questions is now lower than during its launch period 18 years ago [1][21].

Group 1: Historical Context
- Stack Overflow launched in 2008 to provide high-quality, reusable answers to programming questions and quickly became a vital resource for developers [7][9].
- Its voting and reputation system produced a structured knowledge base, and for a long time it was the default destination for technical searches on Google [10][12].

Group 2: Decline in Engagement
- Although the global developer population has grown significantly and many new tools and languages have appeared, the number of questions asked on Stack Overflow has dropped drastically [4][21].
- At its peak the network included over 180 sub-sites covering various STEM fields, but AI tools such as GitHub Copilot and ChatGPT have since changed how developers solve problems [15][17][20].

Group 3: Impact of AI
- AI tools have shifted questions from public to private: developers now prefer asking an AI to posting on Stack Overflow [19][22].
- AI tools rely on Stack Overflow's high-quality content, yet they divert traffic away from the platform and reduce engagement [23][24].

Group 4: Internal Challenges
- Before the rise of AI, Stack Overflow already had problems with strict moderation policies that discouraged new users from participating [26][27].
- The platform's attempt to integrate AI features lowered content quality, further eroding user trust and engagement [28][29].

Group 5: Future Considerations
- Stack Overflow's future may hinge on whether it refocuses on niche technical areas to regain its unique value or fully embraces AI to restructure its operating model [32].
One person, one holiday, ten years' worth of programming done! Musk's sharp take: the singularity is here
量子位· 2026-01-05 07:04
Core Insights
- The article discusses recent advances in programming agents and their impact on productivity and efficiency in software development [2][3][6].

Group 1: Programming Agents Impact
- Midjourney founder David Holz says he completed more programming projects over the holiday season than in the past decade, which he attributes to programming agents [3][4].
- Elon Musk commented on the rise of programming agents by saying, "We have entered the Singularity," echoing a view shared by other tech leaders about the scale of the change [5][6].
- Rohan Anil, an engineer at Anthropic, claims that with a programming agent such as Claude Opus he could have compressed six years of work into a few months [9][15].

Group 2: Performance Metrics
- In the latest LiveBench results, Claude 4.5 Opus leads in several categories, including coding and reasoning, with scores of 79.65 in coding and 94.52 in mathematics [23][24].
- Other models, such as GPT-5.1 Codex Max and Gemini 3 Pro Preview, trail behind, and Claude consistently outperforms them on agentic coding tasks [24].

Group 3: Industry Reactions and Developments
- Greg Brockman notes that Anthropic has achieved what OpenAI aimed for but could not, emphasizing the practical utility of its tools [25][26].
- Boris Cherny, a developer of Claude Code, shares advice on using the programming agent effectively, highlighting its simple setup and capabilities [28][29].
- Competition is intensifying: the SOLO mode in the China version of ByteDance's TRAE has been made freely available, a sign of growing industry interest in programming agents [31][32].
量子位 is hiring editors and writers
量子位· 2026-01-05 05:00
Core Viewpoint
- The article emphasizes the ongoing AI boom and invites readers to join "Quantum Bit," a company that tracks AI advancements and has established itself as a leading content platform in the industry [1].

Group 1: Job Opportunities
- The company is hiring in three directions (AI Industry, AI Finance, and AI Product), with positions for both experienced professionals and fresh graduates [2][4].
- Positions are open at several levels, including editor, lead writer, and chief editor, with roles matched to individual ability [6].

Group 2: Job Responsibilities
- AI Industry Direction: tracking infrastructure innovation such as chips, AI infrastructure, and cloud computing, and interpreting technical reports from conferences [6][7].
- AI Finance Direction: covering venture capital, financial reports, and capital movements in the AI industry, which requires strong analytical skills and a passion for interviews [11].
- AI Product Direction: monitoring AI applications and hardware developments, which requires a keen sense of product experience and market trends [11].

Group 3: Benefits and Growth
- Employees engage with cutting-edge AI technologies, improve their work efficiency with new tools, and build personal influence in the AI field [6].
- The company offers competitive salaries, comprehensive benefits, and a supportive environment for professional growth, including mentorship from senior editors [6][12].

Group 4: Company Impact
- By 2025, Quantum Bit aims to have over 2.4 million subscribers on WeChat and more than 7 million users across platforms, with daily reading volume exceeding 2 million [12].
- Third-party data platforms rank the company as the top new media outlet in the AI and frontier technology sectors [12].