Google's New Gemini Wipes Out UI Work Overnight: macOS Replicated in a Single HTML File, 100% Success Rate
量子位· 2025-10-15 01:08
Core Insights
- Google's AI, Gemini 3.0 Pro, has demonstrated the ability to create a fully functional macOS-like web operating system from simple prompts, showcasing its advanced capabilities in UI design and functionality [2][3][4]
- The AI's success in generating operating systems resembling macOS, Windows, and Linux within a single HTML file indicates a significant leap in programming models, potentially positioning Gemini 3.0 Pro as a leading tool in the field [10][12][15]
- Despite the impressive results, some experts caution that these creations are merely simulations and not true operating systems, emphasizing the distinction between emulation and actual implementation [18]

Group 1: Gemini 3.0 Pro Capabilities
- Gemini 3.0 Pro can replicate macOS UI features, including animations, window management, and bundled software, all functioning correctly [4][10]
- The AI can also generate a web-based Windows environment with integrated Python and gaming capabilities, demonstrating versatility across different operating systems [11][12]
- A successful attempt to create a Linux desktop environment further highlights the AI's comprehensive capabilities in UI and functionality [15][16]

Group 2: Community Reactions and Comparisons
- Users have expressed excitement over the potential of Gemini 3.0 Pro, suggesting it could become the strongest programming model to date if the final version meets these expectations [9]
- Comparisons with other AI models, such as Claude 4.5 Sonnet, show Gemini 3.0 Pro outperforming its competitors in generating functional applications [13]
- The community acknowledges the impressive output while also recognizing the limits of its current capabilities, particularly in terms of true operating-system functionality [18]

Group 3: Future Prospects
- Although Google has not officially announced a release date for Gemini 3.0 Pro, industry insiders speculate it may debut in the coming months based on previous patterns [19][20]
- Increased visibility through demonstration videos from influencers suggests a strategic marketing approach by Google, reminiscent of past successful campaigns [22]
- The anticipation surrounding Gemini 3.0 Pro raises concerns about potential disappointment if expectations are set too high, similar to the reception of previous AI models [22]
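The demos hinge on ordinary window-management logic running inside one file. As a loose illustration of the kind of state a single-file "web OS" must track (this is a hypothetical sketch, not Gemini's generated code), a minimal z-order manager in Python:

```python
# Minimal window-manager core: the z-order bookkeeping a single-file
# "web OS" demo needs. Illustrative sketch only; window titles are
# hypothetical, not taken from the Gemini demo.

class WindowManager:
    def __init__(self):
        self.windows = []  # bottom-to-top z-order

    def open(self, title):
        self.windows.append(title)  # new windows open on top

    def focus(self, title):
        # Clicking a window raises it to the top of the z-order.
        self.windows.remove(title)
        self.windows.append(title)

    def close(self, title):
        self.windows.remove(title)

    def top(self):
        return self.windows[-1] if self.windows else None

wm = WindowManager()
wm.open("Finder")
wm.open("Terminal")
wm.focus("Finder")   # Finder raised above Terminal
print(wm.top())      # prints: Finder
```

In a browser version the same list would simply drive each window's CSS `z-index`; the "operating system" is UI state plus rendering, which is why critics call the results emulation rather than implementation.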
Hands-On with the New LiblibAI: Models, Image Generation, and Workflows Finally in One Bowl
量子位· 2025-10-15 01:08
Core Insights
- The article discusses the significant upgrades in LiblibAI 2.0, which transform it from a model-discovery website into a comprehensive AIGC (AI-Generated Content) platform, enhancing user experience and functionality [11][36]

Group 1: Platform Upgrades
- LiblibAI 2.0 introduces multiple models and video effects, moving beyond simple interface changes to a more integrated creative workflow [3][12]
- The platform now lets users create content without switching between multiple websites, streamlining the creative process [11][12]
- The interface has evolved to resemble a combination of ChatGPT and Canva, making it more user-friendly [12]

Group 2: Model Integration
- The platform retains its core strength by integrating popular models such as Qwen-Image, Seedream 4.0, and the only recently released Midjourney V7 [15][16]
- LiblibAI 2.0 has also incorporated various mainstream video models, ensuring a comprehensive offering for users [17][18]

Group 3: User Experience
- The new ability to add special effects to videos is highlighted as a standout capability, allowing for creative transformations [19][21]
- Users report mixed experiences, with some noting issues like page lag and limited editing capabilities for generated content [28][38]
- A global image-style library visualizes model selection, simplifying the process for new users [33]

Group 4: Company Background
- LiblibAI has a history of rapid growth, having completed four rounds of financing in one year, a record in the domestic AI application sector [39]
- Founder Chen Mian has a strong background in commercializing products, having previously worked on popular applications such as Jianying (known internationally as CapCut) [42][43]
- The company is transitioning from a model-sharing community to a comprehensive AI toolkit for creators, which poses challenges in maintaining user trust and engagement [45]
New Work from Saining Xie: VAE Retires, RAE Takes the Throne
量子位· 2025-10-14 08:16
Core Viewpoint
- The era of Variational Autoencoders (VAE) is coming to an end, with Representation Autoencoders (RAE) set to take over in the field of diffusion models [1][3]

RAE Introduction
- RAE is a new type of autoencoder designed for training diffusion Transformers (DiT): it pairs pre-trained representation encoders (such as DINO, SigLIP, MAE) with lightweight decoders, replacing the traditional VAE [3][9]

Advantages of RAE
- RAE provides high-quality reconstruction results and a semantically rich latent space, supports scalable transformer-based architectures, and converges faster without any additional representation-alignment losses [4][10]

Performance Metrics
- At 256×256 resolution, the FID without guidance is 1.51; with guidance it is 1.13 at both 256×256 and 512×512 [6]

Limitations of VAE
- VAE's backbone networks are outdated, leading to overly complex architectures: 450 GFLOPs versus only 22 GFLOPs for a simple ViT-B encoder [7]
- VAE's compressed latent space (only 4 channels) severely limits information capacity, leaving little room to improve what the latents can carry [7]
- Trained solely on reconstruction, VAE's representation capability is weak; the low feature quality slows convergence and hurts generation quality [7]

RAE's Design and Training
- RAE combines pre-trained representation encoders with trained decoders, requiring no additional training or alignment phases and introducing no auxiliary loss functions [9]
- Despite its simplicity, RAE outperforms SD-VAE in reconstruction quality [10]

Model Comparisons
- RAE variants built on DINOv2-B, SigLIP2-B, and MAE-B show significant improvements in rFID and Top-1 accuracy compared to SD-VAE [11]

Adjustments for Diffusion Models
- RAE requires only simple adjustments for effective performance in high-dimensional latent spaces: a wider DiT design, adapted noise scheduling, and noise injection during decoder training [13][17]
- A DiT-XL trained with RAE surpasses REPA without any auxiliary losses or additional training phases, converging up to 16 times faster than REPA based on SD-VAE [18][19]

Scalability and Efficiency
- The new architecture enhances DiT's scalability in training computation and model size, outperforming both standard DiT on RAE and traditional VAE-based methods [24]
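The RAE recipe — freeze a pre-trained representation encoder and train only a lightweight decoder on a plain reconstruction objective — can be sketched in a few lines. This is a toy stand-in under stated assumptions: linear maps replace DINOv2 and the ViT decoder, and all shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "pretrained representation encoder" (toy stand-in for
# DINO/SigLIP/MAE); in RAE this network is taken off the shelf
# and never updated.
W_enc = rng.normal(size=(64, 16)) / 8.0

# Lightweight trainable decoder (toy stand-in for the small ViT decoder).
W_dec = rng.normal(size=(16, 64)) / 4.0

def encode(x):
    return x @ W_enc          # (batch, 64) "pixels" -> (batch, 16) latents

def decode(z):
    return z @ W_dec

x = rng.normal(size=(32, 64))
loss0 = float(np.mean((decode(encode(x)) - x) ** 2))

lr = 0.05
for _ in range(200):
    z = encode(x)                      # gradients never touch W_enc
    err = decode(z) - x                # reconstruction-only objective:
    W_dec -= lr * z.T @ err / len(x)   # no KL term, no alignment loss

loss = float(np.mean((decode(encode(x)) - x) ** 2))
print(f"reconstruction MSE: {loss0:.2f} -> {loss:.2f}")
```

The contrast with VAE is structural: there is no variational posterior and no KL regularizer, so the latent space inherits its semantics from the pre-trained encoder rather than from a reconstruction-only training signal.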
No Need to Be Polite with AI! New Study: The Ruder the Tone, the Higher the Accuracy
量子位· 2025-10-14 08:16
Core Insights
- The article discusses a Penn State University study titled "Mind Your Tone," which finds that using a ruder tone when interacting with AI models like GPT-4o yields higher answer accuracy: 84.8% correct versus 80.8% with a very polite tone [2][10]

Group 1: Study Findings
- The study tested 250 multiple-choice questions across various subjects, presenting each question in five tones ranging from very polite to very rude [6][7]
- The ruder the tone, the more accurate the AI's responses, suggesting that polite phrasing may introduce unnecessary complexity that distracts the model from the core task [10][12]

Group 2: Implications for AI Interaction
- The findings imply that clearer, more direct instructions yield better results when using AI tools, as bluntness may help the model focus on the task at hand [13][18]
- While newer models like GPT-4o perform better with ruder tones, older models such as GPT-3.5 and Llama2-70B do not respond well to rudeness, indicating differences in how models process language [16][17]
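The protocol described above — every question asked in five tone variants, accuracy tallied per tone — is easy to reproduce as a harness. A hedged sketch: `ask_model` is a mock stand-in for a real GPT-4o API call, with per-tone rates seeded to mirror the reported trend rather than taken from real model output.

```python
import random

TONES = ["very polite", "polite", "neutral", "rude", "very rude"]
# Mock correctness rates mirroring the reported trend
# (80.8% very polite up to 84.8% very rude); not real measurements.
MOCK_RATE = {"very polite": 0.808, "polite": 0.820, "neutral": 0.830,
             "rude": 0.840, "very rude": 0.848}

rng = random.Random(0)

def ask_model(question, tone):
    # Stand-in for an API call: returns whether the model answered
    # this multiple-choice question correctly under the given tone.
    return rng.random() < MOCK_RATE[tone]

questions = [f"Q{i}" for i in range(250)]   # 250 multiple-choice items
accuracy = {tone: sum(ask_model(q, tone) for q in questions) / len(questions)
            for tone in TONES}
for tone, acc in accuracy.items():
    print(f"{tone:>11}: {acc:.1%}")
```

Swapping `ask_model` for a real chat-completion call (and a grader comparing the model's choice to the answer key) turns the sketch into the actual experiment; the per-tone tally logic stays the same.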
Inside OpenAI's In-House Chips: AI Has Been Optimizing Chip Designs for 18 Months, Faster Than Human Engineers
量子位· 2025-10-14 05:39
Core Viewpoint
- OpenAI and Broadcom have announced a strategic collaboration to deploy AI accelerators at 10GW scale, marking a significant step in building the infrastructure necessary to unlock AI's potential and meet computational demand [5][12][43]

Group 1: Collaboration Details
- OpenAI will design the AI accelerators and systems, while Broadcom will assist in their development and deployment, with full deployment expected by the end of 2029 [5][6]
- The 10GW scale is equivalent to 10,000MW, enough to power approximately 100 million 100-watt light bulbs, indicating the substantial power requirements of AI operations [10][11]
- OpenAI's CEO emphasized that this collaboration is crucial for creating infrastructure that benefits humanity and businesses, while Broadcom's CEO highlighted its significance in the pursuit of artificial general intelligence [12][13]

Group 2: Strategic Importance
- The collaboration underscores custom accelerators and Ethernet as core technologies in AI data centers, reinforcing Broadcom's leadership in AI infrastructure [13]
- For OpenAI, the partnership helps alleviate computational constraints, especially given ChatGPT's nearly 800 million weekly active users [14]

Group 3: Insights from Leadership
- OpenAI's President discussed the reasons for developing in-house chips: a deep understanding of the company's workloads, the necessity of vertical integration, and challenges faced in external collaborations [18][21]
- The decision to self-develop chips is driven by specific computational tasks that existing chips do not adequately cover, again underscoring vertical integration [21][30]
- OpenAI's leadership has learned that scaling is essential for achieving optimal results, as demonstrated in its past reinforcement-learning work [27][28]

Group 4: Future Implications
- The self-developed chips are expected to enhance efficiency, yielding better performance and cost-effectiveness in AI models [31]
- AI is playing a significant role in optimizing chip design, reportedly outpacing human engineers in speed and efficiency [32][34]
- OpenAI's "self-development plus collaboration" strategy has been in the works for nearly two years, with ongoing efforts to design a dedicated inference chip [43]
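The 10GW-to-light-bulbs comparison quoted in the announcement is a straightforward unit conversion, and it checks out:

```python
# Back-of-envelope check on the 10 GW figure quoted in the article.
gigawatts = 10
megawatts = gigawatts * 1_000            # 10 GW = 10,000 MW
watts = gigawatts * 1_000_000_000        # 10 GW in watts
bulbs_100w = watts // 100                # equivalent 100 W light bulbs

print(f"{megawatts:,} MW ≈ {bulbs_100w:,} hundred-watt bulbs")
# prints: 10,000 MW ≈ 100,000,000 hundred-watt bulbs
```

This is instantaneous power, not energy: running 10 GW continuously for a year would consume about 87.6 TWh (10 GW × 8,760 h).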
QbitAI's MEET2026 Intelligent Future Conference Kicks Off! Annual List Submissions Now Open
量子位· 2025-10-14 05:39
Core Insights
- The article emphasizes the transformative impact of artificial intelligence (AI) across sectors, marking the beginning of a new era in which AI reshapes work, life, and how society operates [1][7]

Group 1: AI Integration and Evolution
- Intelligent technology has deeply penetrated production and daily life, evolving from mere tools into intelligent partners that understand human needs [2]
- AI technology is no longer confined to specific fields; it transcends industry, discipline, and scenario boundaries, creating new ecosystems and opportunities [3]
- Emerging technologies such as multimodal AI, AR/VR, and spatial computing are blurring the lines between the digital and physical worlds [4]

Group 2: MEET2026 Conference Overview
- The MEET2026 Intelligent Future Conference will focus on the theme "Symbiosis Without Boundaries, Intelligence to Ignite the Future," inviting leaders from technology, industry, and academia to witness industry transformation [7]
- This year marks the seventh MEET Intelligent Future Conference, which attracts thousands of tech professionals and millions of online viewers and has become an annual barometer for the intelligent-technology industry [9][12]
- The conference will feature prominent figures such as Dr. Kai-Fu Lee and Professor Zhang Yaqin, along with leaders from major tech companies including Baidu, Alibaba, Tencent, and Huawei [9]

Group 3: AI Trends and Awards
- The "2025 Artificial Intelligence Annual List" will recognize influential companies, products, and individuals in the AI sector, with results announced at MEET2026 [16][17]
- The annual trend report will highlight ten significant AI trends, analyzing their potential and impact on the industry [22]

Group 4: Event Logistics
- MEET2026 is scheduled for December 2025 in Beijing, China, with registration details to be announced [24]
Gradient Updates with Zero Human Involvement! MIT's New Framework Lets AI Generate Its Own Fine-Tuning Data and Upgrade Its Weights
量子位· 2025-10-14 04:08
Core Viewpoint
- The article discusses SEAL (Self-Adapting LLMs), a new reinforcement-learning framework developed at MIT that enables large models to autonomously update their weights and learn new knowledge without human intervention [1][4][6]

Group 1: SEAL Framework Overview
- SEAL employs a nested learning mechanism: an outer loop driven by reinforcement learning and an inner loop that performs parameter updates [4][26]
- The framework allows models to generate their own fine-tuning data and self-update instructions, overcoming the limitation of relying solely on external supervised data [6][25]

Group 2: Knowledge Incorporation Experiment
- Qwen2.5-7B was tested on the SQuAD dataset, generating training data from new paragraphs without seeing the corresponding questions [9][10]
- Fine-tuning with SEAL raised accuracy from 32.7% to 47.0%, outperforming both the original data and data generated by GPT-4.1 [14][15]
- On longer paragraphs, SEAL reached a notable 58.2% accuracy, indicating it generalizes to larger data-organization tasks [16]

Group 3: Few-Shot Learning Experiment
- LLaMA-3.2-1B-Instruct was evaluated on a subset of tasks from the ARC-AGI dataset [17][18]
- SEAL achieved a 72.5% success rate, far above the 0% of fixed few-shot prompts and the 20% of random sampling strategies [22][23]
- Although SEAL did not match the optimal strategy (Oracle TTT) at 100%, it showed strong task adaptability through self-discovered learning paths [22]

Group 4: Mechanism of SEAL
- SEAL's process involves reading new information, rewriting it in its own words, and performing gradient updates for autonomous learning [25]
- The model generates self-edit instructions describing how to update itself given the current input, including what information to extract and which training parameters to use [28][29]
- The framework uses a non-traditional reinforcement-learning method called ReSTEM, which relies on behavior cloning over filtered samples to optimize self-edit strategies [33][36]
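The ReSTEM-style outer loop described above can be sketched as filtered sampling plus behavior cloning. A hedged mock under stated assumptions: `generate_self_edit` and `finetune_and_eval` are hypothetical stand-ins for the model's self-edit generation and the inner-loop fine-tune-then-evaluate step, and the reward values are fake.

```python
import random

rng = random.Random(42)

def generate_self_edit(context):
    # Mock: the model proposes an instruction describing how to update
    # itself (what data to synthesize, which hyperparameters to use).
    return {"context": context, "lr": rng.choice([1e-5, 3e-5, 1e-4])}

def finetune_and_eval(edit):
    # Mock: apply the self-edit via an inner-loop gradient update, then
    # score downstream accuracy. Here, a fake score stands in.
    return rng.random() + (0.2 if edit["lr"] == 3e-5 else 0.0)

def restem_round(contexts, threshold=0.7):
    # ReSTEM-style outer loop: sample self-edits, keep only those whose
    # fine-tuned model clears the reward threshold (filtered sampling);
    # the survivors become supervised targets for behavior cloning.
    kept = []
    for c in contexts:
        edit = generate_self_edit(c)
        if finetune_and_eval(edit) >= threshold:
            kept.append(edit)
    return kept

survivors = restem_round([f"passage-{i}" for i in range(20)])
print(f"{len(survivors)}/20 self-edits kept for behavior cloning")
```

The key design choice mirrored here is that no policy gradient is ever computed: the reward only gates which samples enter the next supervised fine-tuning round, which is what makes ReSTEM simpler than standard RL.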
2025 AI Annual Awards Launch! Five Award Categories Across Three Dimensions, Seeking the Navigators of the AI+ Era
量子位· 2025-10-14 04:08
The Organizing Committee, from Aofeisi | QbitAI (WeChat official account: QbitAI)

To let more practitioners feel the leap of the intelligence wave, and to offer applause and encouragement to our fellow travelers, we are officially opening registration for the "2025 Artificial Intelligence Annual List".

This is the 8th year of QbitAI's annual AI list. Over eight years we have witnessed technological breakthroughs and deployments, industrial integration and reshaping, and wave after wave of companies, people, and products driving the era forward. In an age where artificial intelligence is redefining everything, intelligent technology is no longer a single tool but a driving force for the co-evolution of industry and society. Through this annual selection, we hope to discover and honor the explorers and practitioners who truly lead change and push boundaries.

The selection spans three dimensions — companies, products, and people — with five award categories. Companies are warmly invited to apply! Let us witness the stars of the year together and light the way forward.

The five awards:
- 2025 AI Leading Company of the Year
- 2025 AI Promising Startup of the Year
- 2025 AI Outstanding Product of the Year
- 2025 AI Outstanding Solution of the Year
- 2025 AI Focus Figure of the Year

Detailed criteria and registration instructions follow. The Leading Company award will recognize the companies with the strongest overall capabilities in China's AI field (eligibility requirements and evaluation criteria are listed under the company, product, and people lists). The Promising Startup award focuses on China's ...
Forget Claude Code: One Free Domestic Command-Line Tool Is Enough
量子位· 2025-10-14 04:08
Core Viewpoint
- The article discusses the launch of iFlow CLI by Alibaba's Heartflow research team as a strong domestic alternative to Claude Code, emphasizing its performance, user-friendliness, and free access for individual users [2][58]

Performance Comparison
- iFlow CLI outperforms Claude Code and Codex in four benchmark tests: GAIA (general search Q&A), SWE-bench (GitHub code repair), Terminal-Bench (diverse CLI usage scenarios), and BrowseComp-ZH (Chinese general search) [2]
- Its performance is enhanced by integrating top domestic open-source models such as Qwen3-Coder, DeepSeek-V3.1-Terminus, Kimi-K2-0905, and GLM-4.5 [4][6]

Features and Advantages
- iFlow CLI offers zero-cost access to advanced models such as Qwen3 MAX, Kimi K2, DeepSeek V3.2, and GLM4.6, with no usage limits [7]
- The tool supports natural-language commands for task execution, enabling fully automated workflows [9]
- It includes custom commands, task tools, and a built-in open marketplace, enhancing its usability for developers [10][11]

User Experience
- Installation is straightforward, requiring only a single command in the terminal [14]
- Users can perform complex tasks such as data analysis and code review with simple commands, significantly reducing the learning curve [20][29]
- Sub-agents can be created and tailored to specific tasks, enhancing the tool's versatility [43][45]

Market Position and Implications
- iFlow CLI represents a significant advance for the domestic AI ecosystem, particularly in light of changes in usage policies by overseas tools like Claude [56][58]
- Free access and a supportive community platform foster a conducive environment for the spread of AI applications among domestic developers [58]
Really Handing Research Grunt Work to AI: Shanghai AI Lab Launches the Deep Research Agent FlowSearch
量子位· 2025-10-14 04:08
Submitted by the InternAgent team | QbitAI (WeChat official account: QbitAI)

Shanghai AI Laboratory has released FlowSearch, bringing the automation of complex research processes to reality!

On research benchmarks including GAIA, HLE, GPQA, and TRQA, FlowSearch not only leads across the board but also demonstrates AI's capacity for dynamic collaboration and deep reasoning in complex research tasks.

More broadly, as AI excels on question-answering benchmarks and standardized tests, its ability to conduct scientific research is drawing growing attention. Scientific research differs from problem solving or information retrieval: it is an open-ended, long-term, complex cognitive process in which researchers must pose original questions, design experiments, collect and integrate evidence from multiple sources, and converge on systematic conclusions through continual iteration. Such a process goes far beyond raw computation; it demands creative thinking, dynamic reasoning, and precise command of complex knowledge relationships.

FlowSearch is a deep research agent driven by dynamic structured knowledge flows. Through these knowledge flows it builds a multi-layer dependency graph of research tasks, and within a multi-agent framework it achieves parallel exploration of tasks, recursive integration of knowledge, and adaptive optimization of the process. Unlike the traditional closed "input-compute-output" style of AI, FlowSearch behaves more like a partner that understands your line of research: when it discovers new information, it proactively adjusts the plan; when the evidence chain is incomplete, it guides further exploration; when reasoning drifts off ...