深思SenseAI
Claude Opus 4.5 Is Fully Live: How It Reclaimed the No. 1 Spot in Agentic Coding!
深思SenseAI· 2025-11-25 12:42
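We are now in another intense wave of major model releases. Last week brought Grok, Gemini 3.0, and Nano Banana Pro; this week, Anthropic launched Claude Opus 4.5. First impressions: the Minecraft & LEGO tests. Given only a single prompt, the model generated a Minecraft clone, the best result we have seen so far in this single-prompt test. Character movement is smooth and the frame rate is stable; you can break and place blocks normally, switch block types from the hotbar at the bottom, and fly freely around the map. In terms of completeness and playability, this demo is already close to a genuinely playable sandbox game. By contrast, in the same "single-prompt Minecraft test," Gemini 3 Pro's result was clearly a notch worse. Its world was also procedurally generated, but blocks could not be broken or placed and character movement was somewhat erratic, so it only qualifies as a basic, watchable demo. In this test, Opus 4.5 won by a landslide. With another prompt, we had it generate a LEGO building website that lets users freely assemble bricks. What came back was a complete, usable LEGO builder: you can drag the camera around the scene, stack bricks one by one, change their colors, delete them, and even choose bricks of different shapes. We have reached the point where a single prompt can generate ...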
Fal's Co-founder in Conversation with Its Seed Investor: Thinking and Decisions on the Road from $2 Million to $100 Million
深思SenseAI· 2025-11-24 03:16
Core Insights
- Fal has turned "real-time video generation" from a flashy demo into reusable infrastructure, growing annual recurring revenue (ARR) from roughly $2 million to more than $100 million in under two years; it serves over 2 million developers and more than 300 enterprises, including Adobe and Canva [1][3][4]
Company Overview
- Founded in 2021 and headquartered in San Francisco, Fal is a generative media platform for developers that hosts image, video, and audio models behind a high-speed inference engine and a unified API [4]
- The company has raised multiple funding rounds; the latest, a $250 million round in October 2025, valued it at more than $4 billion [4]
Transition from Data to AI
- Fal initially focused on data infrastructure, but the arrival of models such as DALL-E 2 and ChatGPT prompted a shift toward inference, letting users run pre-trained models without extensive data preparation [6][9]
- The pivot was a difficult decision: the company already had paying customers and was running two products at once, which muddled its messaging [7][8]
Product and Growth Strategy
- Fal identified a large market opportunity in generative media, especially video generation, which it sees as a fast-growing "blue ocean" market [11][17]
- The company chose an API-based approach to make adoption easy for developers, optimizing their workflows while leaving them in control of their code [13]
- The focus on video generation has increased compute demands, requiring further optimization of its systems [16]
Commercialization and Sales
- Fal has moved from pay-as-you-go pricing toward annual contracts to stabilize revenue, prioritizing long-term commitments from enterprise clients [25][26]
- The company treats new model releases as marketing opportunities, aiming to be the first platform to support each new model [24]
Team and Culture
- The company has an unusual culture with no dedicated engineering managers, fostering a collaborative environment in which all engineers write code [33]
- Hiring targets people who are passionate about optimization and have database or systems-level experience, building a strong technical team [35][36]
Stop Grinding! Google Releases SIMA 2, and Your Next Gaming Buddy Might Be an AI
深思SenseAI· 2025-11-21 04:14
Core Insights
- Google has launched SIMA 2, its next-generation generalist agent. Deeply integrated with Gemini, it can understand and carry out instructions in virtual worlds, plan actions toward goals, interact with players, and keep improving through trial and error [1][2]
Group 1: SIMA 2 Capabilities
- SIMA 2 can understand and execute complex, multi-step instructions in games such as "Minecraft" and "ASKA", a significant improvement over its predecessor SIMA 1, which struggled with such tasks [1][2]
- Trained on a large dataset of human demonstration videos with language annotations, the agent has developed early "conversational collaboration" abilities, explaining its intentions and next steps to users [2][4]
- SIMA 2's task success rate is markedly higher than SIMA 1's, reflecting a better ability to follow detailed instructions and give feedback, much like playing alongside a real person [5][9]
Group 2: Self-Improvement and Learning
- During training, SIMA 2 uses a closed loop of trial and error plus feedback and evaluation from Gemini, allowing it to learn progressively more complex tasks over time [11]
- The experience data SIMA 2 accumulates can be used to train future, more capable agents, laying the groundwork for a "general agent" that can adapt to any world [13]
Group 3: Path to General Intelligence
- Combining Gemini with SIMA 2 offers a compelling route to embodied intelligence: agents are trained in controlled, low-cost virtual 3D environments where they can collect interaction data [14]
- SIMA 2's ability to operate across many game environments is key to developing general embodied intelligence, letting the agent master skills, carry out complex reasoning, and learn continuously in virtual worlds [15]
Group 4: Implications for Robotics
- The capabilities SIMA 2 develops, including navigation, tool use, and collaborative task execution, are essential building blocks for future agents to achieve "intelligent embodiment" in the real world [16]
Hands-On: Building a Web Page or Game in One Minute with Gemini 3.0 Pro
深思SenseAI· 2025-11-19 10:34
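As the industry had expected, Google has officially released Gemini 3.0 Pro. This generation emphasizes stronger reasoning and comprehension, capturing deeper meaning and nuance in language so that users get higher-quality answers without needing more carefully crafted prompts. On authoritative benchmarks, Gemini 3.0 Pro set a state-of-the-art score of 72.1% on a factual-accuracy evaluation and led comparable models with 23.4% on a math test. This means Gemini 3.0 Pro is more reliable in science, mathematics, and other multidisciplinary scenarios and can efficiently handle complex cross-domain, multi-step problems.
| Benchmark | Description | Setting | Gemini 3 Pro | Gemini 2.5 Pro | Claude Sonnet 4.5 | GPT-5.1 |
| --- | --- | --- | --- | --- | --- | --- |
| Humanity's Last Exam | Academic reasoning | No tools | 37.5% | 21.6% | 13.7% | 26.5% |
|  |  | With search and ... | 45.8% | - | - | - |
...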
Google's Gemini 3.0 May Launch on November 18 (US Time)
深思SenseAI· 2025-11-17 12:54
Core Insights
- Google is close to releasing Gemini 3.0; the latest checkpoint, "Gemini 3.0 Pro Preview," is expected to be the final test version before the official launch [1][3]
- The anticipated release date is around November 18, 2025, coinciding with the retirement of older versions [2][3]
Performance Enhancements
- Gemini 3.0 Pro shows significant overall improvements, particularly in code generation, front-end interface construction, and multimodal reasoning [5]
- It can generate complex planetary visualization scenes with real-time parameter adjustment, a capability the article says no other model currently matches [5]
- It can also build an interactive Rubik's Cube simulation that obeys real physical rules, which the article sees as a step toward next-generation interactive intelligent systems [6]
Creative Capabilities
- The model can both compose and perform, autonomously creating and playing original music from user instructions [7]
- It can generate a "creative wormhole" simulation that is visually surreal yet logically coherent, further underscoring its creative potential [8][9]
- Compared with other models, Gemini 3.0 Pro is better at generating visual and audio content together, with higher quality and consistency [9]
Visual Quality Trade-offs
- Recent tests point to a decline in image and visual generation quality, with noticeable differences in detail and aesthetics compared with previous versions [10]
- Prioritizing code and multimodal reasoning over visual generation is seen as a deliberate product trade-off, since Google already has the Nano Banana model for image generation [11]
Market Position and User Engagement
- Since ChatGPT's launch, Google has been perceived as a laggard in AI, prompting major internal restructuring to integrate generative AI into its core products [13]
- The Gemini app has reached 650 million monthly active users, up about 200 million since July, suggesting a narrowing gap with ChatGPT's 800 million weekly active users [13]
- Google's image generation apps, particularly Nano Banana, are performing well among younger users, suggesting a shift in engagement [13]
Competitive Landscape
- The Gemini 3.0 release is seen as a critical chance for Google to reclaim a leading position in the AI industry, especially after the lukewarm reception of GPT-5 [14]
- If Gemini 3.0 succeeds, it could open a generational gap in AI capabilities, particularly in code generation and multimodal creation, which would hurt OpenAI's competitive position [14]
After Fei-Fei Li's World Model Went Viral, Our Hands-On Tests Found It Is Still Far from "Truly Usable"
深思SenseAI· 2025-11-14 12:40
Core Insights
- The article examines the launch of World Labs' "world model," which can create a 3D world from a single image and a text prompt, and assesses both its potential and its limitations for generating immersive environments [1][19]
Group 1: Functionality and User Experience
- The world model can generate environments from a text prompt alone or from an uploaded image; the image route yields better results [1]
- Early hands-on results are impressive in small-scale environments, but quality drops sharply as the generated area expands [2][3]
- The further users move from the original image, the more quality and consistency fall off, producing blurriness and distortion [4][5]
Group 2: Limitations and Challenges
- The model struggles to maintain detail and consistency in larger environments, leaving sparse detail and little immersive gameplay [5]
- The "world extension" feature, which lets users generate multiple worlds, still suffers from severe geometric distortion and abstract-looking output, falling short of what playable environments require [6][8]
- The multi-image generation feature often hangs while loading, pointing to performance issues that limit its usefulness for building complex scenes [8][11]
Group 3: Market Position and Future Potential
- The article suggests that although the current world model is not yet mature, it marks an early stage of AI-generated games and virtual spaces [19]
- The team's work on "spatial intelligence" is seen as significant, opening new possibilities for virtual world construction and digital twins [19]
- Despite its limitations, the model is a notable starting point for the evolution of spatial computing and content production tools and merits continued attention in the coming years [19]
When AI Speaks to You Unprompted Through Your Earbuds: BeeBot Is Ushering in the Next Form of Social Interaction
深思SenseAI· 2025-11-14 01:34
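As you walk down the street listening to music or a podcast, a chime suddenly sounds in your earbuds: "Your friend A just checked in at the restaurant where you ate yesterday, and a pop-up event 500 meters away is on its last day." You never have to take out your phone. Once you download the app, it stays on standby in the background. It wakes automatically when you put your earbuds in and goes to sleep as soon as you take them out. While you are listening to music, BeeBot lowers the volume and inserts a spoken update; if you are listening to a podcast, it can intelligently pause and resume playback so you never miss anything important. And don't worry: during calls or video, it never interrupts and stays completely silent. Although the app is labeled "BeeBot for AirPods," it works with any wired or wireless headphones, as well as Bluetooth audio devices, including speakers, car audio systems, and wearables such as Meta's Ray-Ban smart glasses. According to Dennis Crowley, users receive several BeeBot updates a day, but never a dozen or more, to avoid being intrusive. These updates combine multiple data sources: the user's real-time location, posts shared by other users, and event listings from local websites. Based on interest keywords you set, BeeBot recommends nearby places and events, helping you explore your surroundings more efficiently and discover new possibilities in everyday life. It is not a "walking tour," not "social audio," and it certainly does not aim to be your "AI companion ...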
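To make the interruption rules described above concrete, here is a minimal Python sketch of how such a delivery policy might be expressed. It is purely illustrative: the `Playback` states, the action names, the `plan_update` function, and the daily cap of 5 are all hypothetical assumptions, not BeeBot's actual implementation or API (the article only says "several times a day, not a dozen or more").

```python
from enum import Enum


class Playback(Enum):
    """What the user is currently doing with their audio device (hypothetical model)."""
    NONE = "none"
    MUSIC = "music"
    PODCAST = "podcast"
    CALL = "call"
    VIDEO = "video"


# Hypothetical cap; the article only says "several times a day, not a dozen or more".
MAX_UPDATES_PER_DAY = 5


def plan_update(earbuds_in: bool, playback: Playback, sent_today: int) -> list[str]:
    """Return the ordered audio actions for delivering one update, or [] to stay silent."""
    if not earbuds_in:
        return []  # earbuds out: the app stays asleep
    if playback in (Playback.CALL, Playback.VIDEO):
        return []  # never interrupt calls or video
    if sent_today >= MAX_UPDATES_PER_DAY:
        return []  # respect the daily frequency cap
    if playback is Playback.MUSIC:
        # Music: lower the volume, speak over it, then restore it.
        return ["duck_volume", "speak_update", "restore_volume"]
    if playback is Playback.PODCAST:
        # Podcast: pause so no content is talked over, then resume.
        return ["pause", "speak_update", "resume"]
    return ["speak_update"]  # nothing playing: just speak
```

Under these assumptions, `plan_update(True, Playback.PODCAST, 2)` returns `['pause', 'speak_update', 'resume']`. Checking the "stay silent" conditions first keeps the privacy- and attention-sensitive rules from being overridden by any playback-specific branch.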
a16z in Conversation with the Nano Banana Team: The "Workflow Revolution" Behind 200 Million Edits
深思SenseAI· 2025-11-12 01:02
Core Viewpoint
- The article examines the transformative impact of multimodal generative AI through Google DeepMind's Nano Banana, which cuts the time needed for creative tasks such as character design and storyboarding from weeks to minutes. Freed from tedious work, creators can focus on storytelling and emotional depth, marking a revolution in creative workflows [1]
Group 1: Nano Banana Development
- The Nano Banana team, assembled from several groups working on image generation, set out to build a model that excels at interactive, conversational editing, combining high-quality visuals with multimodal dialogue [4][6]
- The initial release far exceeded expectations, and the rapid surge in user requests showed its value to a broad audience [6][8]
Group 2: Future of Creative Workflows
- The team envisions creative work as a spectrum: professional creators can spend less time on mundane tasks and more on creative work, potentially unleashing a surge of creativity [8][9]
- For everyday consumers, the technology can support both fun creative projects and more structured tasks such as presentations, depending on how involved the user wants to be in the creative process [9]
Group 3: Artistic Intent and Control
- What counts as art in the age of AI is debated, with the emphasis placed on intent rather than output quality alone; the models serve as tools through which artists express their creativity [10][11]
- Artists want more control and consistency in how a character is depicted across multiple images, something earlier models struggled with [11][12]
Group 4: User Interface and Experience
- Designing user interfaces for these models is crucial: they must offer depth for professionals while staying simple for casual users. Future interfaces may offer intelligent suggestions based on the user's context [14][16]
- Multiple models are expected to coexist, since no single model can cover every use case well; this diversity will serve different user needs and preferences [16][19]
Group 5: Educational Applications
- AI holds promise for education: models can pair textual explanations with visual aids, improving learning for visual learners [18][19]
- On integrating 3D into world models, the team prefers to focus on 2D projections, which it considers sufficient for most problems [21]
Group 6: Challenges and Future Directions
- Improving image quality and consistency remains an open challenge, with a focus on raising the floor of model performance to unlock more use cases [39][40]
- Models need to make better use of context and stay coherent over longer interactions, which could significantly boost user trust and satisfaction [40]
The Future Is Here! In the Age of AI Aircraft, Machines Will Replace Most Manual Labor
深思SenseAI· 2025-11-06 04:46
Core Viewpoint
- Infravision is transforming power transmission line construction with its drone technology, offering a safer, more efficient, and more cost-effective alternative to traditional methods [1][4]
Group 1: Advantages of Infravision's Technology
- Drone-based line construction avoids the safety hazards of high-altitude work and helicopter flights and is not constrained by terrain [5]
- The system is quieter and has less impact on the environment and on landholdings, minimizing disruption to landowners [6]
- By eliminating the need for large helicopters and large crews, the technology significantly improves efficiency, lowers costs, and shortens project timelines [6]
- The integrated system combines drone automation, precise navigation, and specialized aerial towing equipment, enabling industrial-scale installation of long-distance high-voltage lines [6]
Group 2: Strategic Execution and Market Positioning
- Infravision's rapid rise stems from a clear strategic focus on a high-value niche, power transmission line construction, which suffers from significant pain points [8]
- The company first targeted the Australian market to validate its technology and build reference projects, concentrating limited resources on meeting the needs of key customers [8]
- Infravision emphasizes end-to-end solutions rather than standalone products, building long-term partnerships through equipment leasing and operational services [9]
- After its success in Australia, the company is expanding into North America, targeting major clients such as PG&E [10]
- The company is scaling its team quickly to keep up with project demand, planning to grow from 70 to 150-200 employees by the end of 2025 [10]
Group 3: Future Development and Industry Trends
- The concept of "aerial embodied intelligence" is emerging: autonomous flying robots capable of perception, decision-making, and physical interaction [11]
- Drone swarm control systems let multiple drones coordinate to complete tasks efficiently, expanding operational capabilities across sectors [12]
- Infravision and similar companies are not just offering advanced drones; they are creating new operational paradigms that break dangerous, repetitive tasks down into standardized operations that machines can execute [20]
$20 Million Series B: Archy Rewrites Dental Practice Operations with a Cloud OS + AI Agents
深思SenseAI· 2025-11-04 02:38
Core Insights
- Archy aims to transform dental practice management with an integrated cloud platform that automates key workflows, improving efficiency and lowering operating costs [3][6][25]
- The company has raised a $20 million Series B, bringing total funding to $47 million, a sign of strong investor confidence in its business model and growth potential [3][6]
Company Overview
- Founded by Jonathan Rat, Archy has built a cloud-based system that consolidates various software tools into a single platform, addressing the inefficiencies of traditional dental practice management [3][6]
- Archy operates in 45 states, processes more than $100 million in payments annually, and serves 2.5 million patients, demonstrating its market penetration and operational scale [3][6]
Product Design and Technical Advantages
- Archy's platform streamlines day-to-day operations by reducing clicks and consolidating the functions of multiple software products, improving overall workflow efficiency [4][6]
- The product consists of four purchasable modules: Cloud PMS, Archy Intelligence, Payments & A/R, and Imaging & Clinical, each addressing specific operational needs of dental practices [5][6]
Market Positioning and Competitive Edge
- Archy differentiates itself through in-house development and rapid iteration, ensuring the platform handles the high-frequency needs of dental practices effectively [15][16]
- Its user-friendly design minimizes training, so dental teams can adopt the system quickly without extensive onboarding [17][18]
Marketing and Brand Strategy
- Archy uses unconventional outreach to build rapport with prospective clients, such as bringing food and hosting small demos, which lowers resistance to adopting new systems [19][21]
- The company helps clients promote their own services with marketing materials and templates, boosting customer satisfaction and brand loyalty [21][22]
Challenges and Future Vision
- Despite rapid growth, Archy faces challenges in prioritizing development work and ensuring data security as it scales [23][24]
- Its long-term vision is to rewrite the operating systems of dental practices, integrating AI to create more efficient, automated workflows [25][27][28]