量子位

I Failed the Final Interview Because I Don't Write Code with AI | A Programmer's Bizarre Interview Experience
量子位· 2025-07-24 06:05
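By Yiran (奕然), from Aofeisi (凹非寺) | 量子位, WeChat official account QbitAI. "Because I'm not AI First, I failed the final interview." Recently, one overseas developer's experience unexpectedly went viral. He said he had been laid off a few months earlier, finally got an interview at a startup he had high hopes for, and made it to the final round, face to face with the CEO. He thought it was going smoothly, but because he said during the interview that he does not reach for AI first in his work, he received a rejection letter 5 min after the interview ended. "It finally happened." His experience sparked plenty of discussion and drew a lot of attention on Reddit. Some commenters offered him advice: don't disparage AI in front of leadership, especially leaders who don't write code themselves. Most of your understanding of AI is correct, but it is powerful at explaining code. Before the final interview with the CEO, he had gone through three rounds: an HR interview, a CoderByte test, and a technical discussion with the team. All of them went smoothly. Fine, fine: be smarter about how you communicate next time. A lesson for all of us, really... Can't AI's shortcomings even be mentioned??? The CEO asked him what his coding style was like and how he uses AI during development. He answered: LLMs are too verbose, and their code is either insecure or tries to write simple functions from scratch instead of using built-in tools. Even in a small hobby project of my own, when I tried to add a simple feature with Agentic AI, it was hard. After the interview, 5 mi ...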
Altman Shares His First Hands-On Impressions of GPT-5: Feeling Powerless in His Own Field, Leaning Back and Feeling Dizzy
量子位· 2025-07-24 01:18
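By Mengchen (梦晨), from Aofeisi (凹非寺) | 量子位, WeChat official account QbitAI. OpenAI chief Altman has given what may be his last in-depth interview before the release of GPT-5. This time, Altman described his "oh my god" moment after trying GPT-5: "I leaned back in my chair and felt a wave of dizziness," feeling powerless in a field he is good at. Recently, Altman, OpenAI's official accounts, and OpenAI employees have all been repeatedly reminding everyone that GPT-5 will be released very soon. Ahead of the release, in a conversation of about an hour and a half with host Theo Von, Altman put forward a string of startling views: his child will never be smarter than AI, which was settled from the moment of birth; an AI CEO may take over OpenAI before too long, "and then I'll find something else to do"; existing hardware is not worthy of AI's capabilities; and more. The "oh my god" moment while testing GPT-5: the story starts with an email. That morning, Altman received an email with a question he could not even understand. On a whim he entered it into GPT-5 and instantly got a perfect answer. Altman recalled: "I really sat there stunned for a while. It was one of those 'oh my god, here it is' moments. Although I quickly moved on to the next thing, the feeling did not fade. I felt useless in a field I am supposed to be good at, and AI solved it effortlessly." This experience made him realize that the sense of being overtaken by AI that many people worry about ...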
Zhejiang University Alumnus Builds an AI Code-Testing Tool: Zero Code, Zero Bugs, a Website in 30 Minutes
量子位· 2025-07-24 01:18
Core Viewpoint
- TestSprite 2.0 is an innovative AI testing platform designed specifically for AI programming, significantly improving code accuracy from 42% to 93% and enabling the creation of new websites in just 30 minutes without human intervention [2][19][13].
Group 1: Product Features
- TestSprite is the first testing platform tailored for AI programming, allowing users to initiate testing with a simple prompt in their IDE [3][8].
- The platform automatically reviews product requirement documents, descriptors, and code libraries to generate comprehensive integration test plans [9].
- TestSprite can autonomously generate all necessary test cases, write test code, compile test scripts, execute tests in a cloud infrastructure, and return structured reports to coding agents [12].
Group 2: Performance and Impact
- The platform's performance was particularly impressive on the Trae development platform, demonstrating its capability to test, debug, and fix errors efficiently [11][13].
- The entire process of building a complete website with zero code was achieved in just 30 minutes, showcasing the platform's efficiency [13][15].
- TestSprite has gained the trust of over 6,000 development teams, indicating strong market acceptance and demand [21].
Group 3: Company Background
- TestSprite was founded by Yunhao Jiao, a Zhejiang University alumnus with a strong background in natural language processing and software development [25][31].
- The company aims to reduce software release cycles by up to ten times by eliminating cumbersome manual testing processes [31].
- In November 2024, TestSprite secured $1.5 million in seed funding from top investment firms, which will help scale its autonomous testing tools [32][33].
WAIC Booth-Tour Call for Entries | Join the 量子位 Livestream
量子位· 2025-07-23 10:36
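By Lin Yue (林樾), from Aofeisi (凹非寺) | 量子位, WeChat official account QbitAI. On the afternoon of July 26, the first day of WAIC, we will host a pop-up booth-tour livestream at WAIC. If you are bringing an eye-catching new product or technology to WAIC and would like it to appear in the livestream, fill out the submission form to let us know. Tour time: July 26, 15:00-17:00, 3-5 min per booth. Tour area: halls H1-H4. Format: conversation and product demos. Submission deadline: July 24, 18:00. Livestream time is limited, so we will plan the visits around the actual schedule and route. 量子位 will also have a booth at H3-A128, so come and meet us in person! ...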
Official Reveal of How ChatGPT Agent Works: Reinforcement Learning Lets the Model Explore the Best Tool Combinations on Its Own
量子位· 2025-07-23 10:36
Core Insights
- The article discusses the technical details and implications of OpenAI's newly launched ChatGPT Agent, marking a significant step in the development of intelligent agents [1][2].
Group 1: ChatGPT Agent Overview
- ChatGPT Agent consists of four main components: Deep Research, Operator, and additional tools such as terminal and image generation [3][9].
- The integration of Deep Research and Operator was driven by user demand for a more versatile tool that could handle both research and visual interaction tasks [6][11].
Group 2: Training Methodology
- The training method involves integrating all tools into a virtual machine environment, allowing the model to autonomously explore the best tool combinations through reinforcement learning [12].
- The model learns to switch between tools seamlessly, enhancing its ability to complete tasks efficiently without explicit instructions on tool usage [13][14].
Group 3: Team Structure and Collaboration
- The ChatGPT Agent team is a merger of the Deep Research and Operator teams, consisting of around 20 to 35 members who collaborated closely to complete the project in a few months [19][20].
- The team emphasizes a user scenario-driven approach, with application engineers participating in model training and researchers involved in deployment [21][22].
Group 4: Challenges and Future Directions
- The main challenges faced during training included stability issues and the need for robustness against external factors like website downtime and API limitations [24].
- Future developments aim to create a general-purpose super agent capable of handling a wide range of tasks, with a focus on enhancing adaptability and user feedback integration [25][26].
Group 5: Security Measures
- The team has implemented multi-layered security measures to address potential risks, including monitoring for abnormal behavior and requiring user confirmation for sensitive actions [27].
- Special attention is given to biological risks, ensuring that the agent cannot be misused for harmful purposes [24][27].
Sudden Meltdown! Champion Humanoid Fighting Robot Throws Midair Spinning Kicks and Knocks Over Its Rack; On-Site Researcher: 0.0?
量子位· 2025-07-23 06:36
Core Viewpoint
- The incident involving the humanoid robot DeREK, which exhibited erratic behavior during a competition, raises significant concerns regarding the safety and reliability of robotic systems, particularly in emergency situations [1][34].
Technical Explanation
- The robot's malfunction was attributed to the activation of a full-body strategy while being suspended, leading to a failure in recognizing the ground beneath it, resulting in loss of control [7][8].
- The remote emergency stop system, although present, was ineffective, taking 5 seconds to activate, which is deemed excessively long for emergency situations [13][15].
- The design documentation indicated that the robot should not operate in a suspended state, yet the system reverted to a walking mode, causing the malfunction [11][12].
Safety Concerns
- The battery's location and design pose risks during emergencies, as accessing it requires proximity to the robot, which can be dangerous when the robot is out of control [18].
- The robot's motors are powerful, making it difficult for individuals to intervene physically without risking severe injury [21].
- The emergency stop system relies on wireless communication, which is not suitable for safety-critical applications due to potential signal interference [24][25].
Recommendations for Improvement
- A multi-step safety solution is suggested, including adherence to safety standards such as PL(d) or ASIL-D for actuator and battery management systems [36].
- Continuous testing, failure mode analysis, and single-point failure protection should be integral to the development process of robotic systems [36].
- The implementation of a more reliable emergency stop mechanism that does not depend on wireless communication is recommended to enhance safety [36].
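The summary criticizes an emergency stop that depends on a wireless command arriving in time. One common fail-safe pattern, sketched here purely as an illustration and not as anything the article or the robot's vendor describes, is a heartbeat watchdog: motors stay enabled only while heartbeats keep arriving, so a lost or jammed link results in a stop rather than continued motion. The class name `EStopWatchdog` and the 0.1 s timeout are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class EStopWatchdog:
    """Fail-safe watchdog: motors stay enabled only while heartbeats keep arriving."""

    timeout_s: float  # hypothetical budget, chosen only for illustration
    last_heartbeat_s: float = 0.0
    tripped: bool = False

    def heartbeat(self, now_s: float) -> None:
        """Record a heartbeat from the operator link (ignored once tripped)."""
        if not self.tripped:
            self.last_heartbeat_s = now_s

    def motors_enabled(self, now_s: float) -> bool:
        """Return False, and latch the stop, once the heartbeat is overdue."""
        if now_s - self.last_heartbeat_s > self.timeout_s:
            self.tripped = True  # latched: a late packet must not re-enable motion
        return not self.tripped


watchdog = EStopWatchdog(timeout_s=0.1)
watchdog.heartbeat(now_s=0.00)
fresh = watchdog.motors_enabled(now_s=0.05)    # heartbeat is still fresh
overdue = watchdog.motors_enabled(now_s=0.30)  # heartbeat overdue, stop latches
watchdog.heartbeat(now_s=0.31)                 # late packet is ignored
after_late = watchdog.motors_enabled(now_s=0.32)
print(fresh, overdue, after_late)  # True False False
```

The design choice worth noting is the latch: silence on the link is treated as a stop request, and recovery requires a deliberate reset rather than the next packet. A real safety-rated system would implement this in certified hardware, not application code.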
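Li Mu Posts a New Bilibili Video! He Shows How to Hand-Build a Large Speech Model, with Fully Open-Source Code and an Online Demo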
量子位· 2025-07-23 06:36
Core Insights
- The article discusses the return of Li Mu and his new audio model, Higgs Audio V2, which integrates text and speech processing capabilities [1][2].
Group 1: Model Capabilities
- Higgs Audio V2 can handle various speech tasks, including generating multilingual dialogues, automatic prosody adjustment, melody humming with cloned voices, and simultaneous generation of speech and background music [3][4].
- The model integrates 10 million hours of speech data into the training of a large language model (LLM), enabling it to both understand and generate speech [4][6].
Group 2: Technical Implementation
- The model combines traditional text and speech models, allowing LLMs to communicate using speech by converting speech tasks into a unified processing format [7][8].
- A unified discretization audio tokenizer was developed to maintain audio quality while capturing semantic and acoustic features at a rate of 25 frames per second [11][13].
- The training data for the model was sourced from various platforms, ensuring quality by filtering out 90% of the data to meet the 10 million hours requirement [14][15].
Group 3: Model Training and Architecture
- To enhance the model's understanding and generation of sound, a secondary audio model, AudioVerse, was trained to analyze user speech input and provide contextual information for the main model [16].
- The final multimodal model can perform complex tasks, such as writing and singing a song with accompanying music, and can analyze scenes and characters based on audio input [17][18].
Group 4: Performance Metrics
- In real-time voice chat, the model achieves low latency and can understand and express emotions, outperforming other models in emotional and question categories by 75.7% and 55.7%, respectively [19].
- The model also excelled in traditional TTS benchmark tests, achieving the best performance in various evaluations [20].
Group 5: Accessibility and Community Engagement
- The model's code has been made publicly available on GitHub, along with an online demo platform for users to experiment with [23][31].
- The article encourages users, especially those interested in creating content like virtual streamers, to try the model for voice cloning and other applications [25].
3D Generation Fills Its Physics Gap! The First Systematically Annotated Physical 3D Dataset Launches, Plus an End-to-End Framework
量子位· 2025-07-23 04:10
Core Viewpoint
- The article discusses the introduction of PhysXNet, the first systematically annotated physical property 3D dataset, which aims to bridge the gap between virtual 3D generation and physical realism [1][3].
Group 1: Introduction of PhysXNet
- PhysXNet contains over 26,000 richly annotated 3D objects, covering five core dimensions: physical scale, materials, affordance, kinematic information, and textual descriptions [3][11].
- An extended version, PhysXNet-XL, includes over 6 million programmatically generated 3D objects with physical annotations [12].
Group 2: Current Research Landscape
- Existing 3D generation methods primarily focus on geometric structure and texture, neglecting the modeling based on physical properties [2][8].
- The demand for physical modeling, understanding, and reasoning in 3D space is increasing, necessitating a comprehensive physical-based 3D object modeling system [8][9].
Group 3: Data Annotation Process
- The team designed a human-in-the-loop annotation process to efficiently collect and annotate physical information [16][19].
- The annotation framework consists of two main phases: initial data collection and determination of kinematic parameters [19].
Group 4: Generation Methodology
- PhysXGen is introduced as a novel framework for generating 3D assets with physical properties, utilizing pre-trained 3D priors to achieve efficient training and good generalization [13][26].
- The method synchronously integrates basic physical properties during the generation process, optimizing structural branches for dual objectives [29][30].
Group 5: Experimental Evaluation
- The team conducted qualitative and quantitative evaluations of the model, comparing it against a baseline that uses a separate structure to predict physical properties [33][34].
- PhysXGen demonstrated significant performance improvements in generating physical attributes, achieving relative performance gains of 24%, 64%, 28%, and 72% across various dimensions [38].
Group 6: Future Directions
- The article emphasizes the importance of addressing key challenges in physical 3D generation tasks and outlines future research directions [43].
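To make the five annotation dimensions listed above concrete, here is a minimal sketch of what a single object's record might look like. This is not PhysXNet's actual schema: the class name `PhysicalAnnotation`, every field name, the metre unit, and the part-to-joint mapping are assumptions made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalAnnotation:
    """Hypothetical record for one 3D object; not PhysXNet's actual schema."""

    physical_scale_m: tuple[float, float, float]  # assumed: bounding size in metres
    materials: list[str] = field(default_factory=list)
    affordance: list[str] = field(default_factory=list)
    kinematics: dict[str, str] = field(default_factory=dict)  # assumed: part -> joint type
    description: str = ""

    def missing_dimensions(self) -> list[str]:
        """Return the names of the five dimensions that are still unannotated."""
        filled = {
            "physical scale": all(v > 0 for v in self.physical_scale_m),
            "materials": bool(self.materials),
            "affordance": bool(self.affordance),
            "kinematic information": bool(self.kinematics),
            "textual description": bool(self.description.strip()),
        }
        return [name for name, ok in filled.items() if not ok]


chair = PhysicalAnnotation(
    physical_scale_m=(0.5, 0.5, 0.9),
    materials=["wood"],
    affordance=["sit"],
    description="A four-legged wooden chair.",
)
print(chair.missing_dimensions())  # ['kinematic information']
```

A completeness check like `missing_dimensions` is the kind of hook a human-in-the-loop pipeline could use to route partially annotated objects back to an annotator, although the summary does not say how the authors' pipeline does this.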
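Controllable AI Sound-Effect Generation up to 90 Seconds! "2 Seconds of Wolf Howl, 8 Seconds of Cricket Chirps" Nailed Precisely! New Research from Tsinghua & Shengshu Technology Accepted to ACM MM 2025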
量子位· 2025-07-23 04:10
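Contributed by the FreeAudio team | 量子位, WeChat official account QbitAI. The latest breakthrough in text-to-audio systems achieves precise timing control and long-form audio generation of up to 90 seconds. Imagine sending an AI a complex instruction to generate audio: 0-10 s, wind blowing through a forest; 0-4 s, birds chirping; 4-6 s, wood burning; 6-16 s, an animal's footsteps on dry leaves; 10-16 s, crickets chirping; 16-19 s, an owl hooting; 17-26 s, a stream flowing. It can now place every sound at exactly the right time, and it handles both long and short durations with ease: 1-3 s, a wolf howling; 0-8 s, crickets chirping. It can also reproduce the sounds of instruments and people: 0-8 s, soft acoustic guitar picking sets the rhythm; 8-16 s, a male voice joins in, singing along to the guitar; 16-22 s, the vocals grow more emotional while the guitar holds a steady backdrop; 22-26 s, the song softens slightly as gentle guitar continues. Previously this was very hard to achieve: either the timing could not be controlled accurately, or the duration could not get past 10 seconds. Now FreeAudio, a new result from Tsinghua University and Shengshu Technology, has made it a reality. Even more striking, it needs no additional training: a "training-free" method breaks through the industry bottleneck, enabling precise timing control and long-form audio generation from natural-language text and timing prompts. In experiments, on 10-second timing-control ...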
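The timing prompts quoted in this excerpt pair a time window with a sound description, and windows may overlap. As a rough illustration of that input structure only, and not of FreeAudio's actual interface or method, the sketch below parses a hypothetical `start-end s: text` syntax into events and reports which sounds should be audible at a given moment. The syntax and the names `TimedEvent`, `parse_timing_prompt`, and `active_at` are all assumptions.

```python
import re
from typing import NamedTuple


class TimedEvent(NamedTuple):
    start_s: float
    end_s: float
    text: str


# Hypothetical segment syntax: "<start>-<end>s: <description>"
_SEGMENT = re.compile(r"\s*(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*s\s*:\s*(.+?)\s*$")


def parse_timing_prompt(prompt: str) -> list[TimedEvent]:
    """Split a semicolon-separated prompt into events sorted by start time."""
    events = []
    for part in prompt.split(";"):
        if not part.strip():
            continue  # tolerate a trailing semicolon
        match = _SEGMENT.match(part)
        if match is None:
            raise ValueError(f"cannot parse segment: {part!r}")
        start, end = float(match.group(1)), float(match.group(2))
        if end <= start:
            raise ValueError(f"empty or reversed window: {part!r}")
        events.append(TimedEvent(start, end, match.group(3)))
    return sorted(events, key=lambda e: (e.start_s, e.end_s))


def active_at(events: list[TimedEvent], t: float) -> list[str]:
    """Return descriptions of all events whose window contains time t."""
    return [e.text for e in events if e.start_s <= t < e.end_s]


events = parse_timing_prompt("1-3s: wolf howl; 0-8s: crickets chirping")
print(active_at(events, 2.0))  # ['crickets chirping', 'wolf howl']
print(active_at(events, 5.0))  # ['crickets chirping']
```

Representing the prompt as explicit windows makes overlaps easy to reason about: at 2 s both sounds are active, while at 5 s only the crickets remain.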
AI Search Changes Overnight: Can the Search-for-Agents Track Produce a New Ten-Billion-Dollar Giant?
量子位· 2025-07-23 04:10
Core Viewpoint
- The article discusses the significant impact of Bing Search API's shutdown on the AI search market, highlighting the emergence of new players like Xiaosu Technology that can fill the gap left by Bing's exit [2][3][4][19][32].
Group 1: Impact of Bing Search API Shutdown
- Bing Search API will be completely shut down on August 11, making it difficult for developers to access quality search sources [2][3].
- The shutdown is seen as a strategic move by Microsoft to focus on AI search and integrate it with Azure services, potentially increasing pricing [4][19].
- The AI search market is projected to reach 347.2 billion yuan by 2029, with a growth rate exceeding 20% annually over the next five years [7].
Group 2: Current Market Landscape
- The AI search market is characterized by a head effect, with major players focusing on consumer (ToC) services while neglecting business (ToB) opportunities [21][22].
- Traditional search giants like Google and Baidu have strong technical advantages but primarily cater to consumer needs, limiting their API offerings for businesses [22].
- New startups are emerging to provide specialized search services in vertical markets, such as legal and academic fields, to differentiate themselves from established players [23][24].
Group 3: Emergence of New Players
- Xiaosu Technology has developed an AI search engine that offers competitive capabilities, achieving an annual recurring revenue (ARR) of $25 million within months [25][32].
- Xiaosu's AI search engine supports over 30 languages and provides features like semantic understanding and full-text display, making it suitable for various applications [29][38].
- The company emphasizes its ability to provide real-time data and multi-modal search capabilities, positioning itself as a strong alternative to Bing [36][38].
Group 4: Future Opportunities
- The article suggests that the AI search market is ripe for new entrants, especially in the wake of Bing's exit, with Xiaosu Technology being a notable contender [32][44].
- The demand for AI search capabilities is expected to grow, particularly in the context of developing AI applications and agents [16][43].
- Xiaosu's focus on ToB services and competitive pricing (one-third of Bing's costs) positions it well to capture market share from businesses seeking reliable search solutions [32][45].