Google and Peking University team up on an academic Banana that's gone viral: paper charts generated with 100% numerical precision
量子位 (QbitAI) · 2026-02-05 06:01
Yishui, from Aofei Temple | QbitAI official account

The Nano Banana whose results have been flooding everyone's feeds now has a piping-hot academic edition!

The name is exactly that direct: PaperBanana, bringing Banana to the Papers that give you a headache every day. (An attempted rhyme, skr.) And this time it is a joint effort by Google and Peking University.

We know you want to see the results right away, so here are the three official examples. Given the same input, paper illustrations drawn by a human, by the original Nano Banana, and by PaperBanana compare as follows: a comprehensive evaluation shows PaperBanana beating the original across the board on aesthetics, simplicity, and logical clarity. It can also directly polish hand-drawn illustrations; look at the example on the right, and the production quality jumps up immediately.

After seeing the results, netizens marveled that "academic illustration," that perennial headache, may finally be cracked. Thinking back on the old days is enough to bring tears: it is hard to believe that researchers would spend four hours drawing a single figure in Figma.

So how was the academic-edition PaperBanana built? If one agent is not enough, use five!

In addition, because PaperBanana also offers code-based figure generation (it uses Gemini-3-Pro to automatically generate and execute Python visualization code), it can produce all kinds of charts that require 100% numerical precision. ...
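The code-based path described above can be sketched as a tiny generate-then-execute loop: rather than asking the model to paint pixels, you ask it for plotting code and run that code, so every number in the chart comes from data, not from a diffusion model's guess. The `generated_code` string below is a hypothetical stand-in for Gemini-3-Pro output (the scores in it are placeholder values, not real benchmark numbers), and the real API call is not shown:

```python
# Minimal sketch of a "code-to-figure" pipeline. `generated_code` stands in
# for model-generated Python; executing it keeps all values numerically exact.
generated_code = """
import json

# Exact values the figure must encode (placeholder numbers for illustration).
scores = {"human": 3.1, "nano_banana": 3.6, "paper_banana": 4.4}

# Normalise bar heights against a 0-5 rating axis.
bars = {k: round(v / 5.0, 3) for k, v in scores.items()}

# A declarative figure spec; a real pipeline would hand this to matplotlib.
figure_spec = json.dumps({"type": "bar", "data": bars})
"""

def run_generated_code(code: str) -> dict:
    """Execute model-generated plotting code in a fresh namespace."""
    ns: dict = {}
    exec(code, ns)  # a production system would sandbox this step
    return ns

ns = run_generated_code(generated_code)
print(ns["figure_spec"])
```

Because the chart is rebuilt from the data on every run, "100% numerical precision" falls out for free; the model's only job is to write correct code.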
Google built a paper-specific nano banana! Top-conference-grade Figures, straight out
机器之心 (Synced) · 2026-02-05 04:35
Editor | SIA

You write the method; AI draws the Figure. Research grunts have finally arrived at "figure-drawing liberation day."

Still staying up late over your paper's method diagram, building PPT boxes, dragging arrows, and aligning fonts? A single Figure 2 can take hours, in bad cases days; the hidden side quest of research is not the experiments, it is the figures. A figure must stay faithful to the paper's meaning while quietly conforming to the unspoken "academic aesthetics" of top venues: colors cannot be tacky, the layout cannot be messy, and arrows absolutely cannot connect the wrong things. It looks like just one image, but it is a triple ordeal of aesthetics, logic, and patience.

So here is the question: today's large models can already write papers, run experiments, and edit code, so why can they not handle academic illustrations? Some may ask: would DALL·E or a base VLM not do? The answer: they really cannot. Their figures typically have modules that do not match the text, fonts that come out garbled, and arrows with the wrong logic. The figures are "pretty" but useless.

Then a serious contender appeared: PaperBanana, from a team at Peking University + Google Cloud AI Research, with a goal as simple as it is ambitious: you write the method, the AI draws the Figure, at a quality fit for direct submission to top conferences.

Now for the results. PaperBanana demonstrates the ability to handle two classes of academic illustrations: ...
Tencent Research Institute AI Digest 20260205
腾讯研究院 (Tencent Research Institute) · 2026-02-04 16:01
Group 1
- Nvidia is nearing a $20 billion investment agreement to participate in OpenAI's latest funding round, marking Nvidia's largest single investment to date, with CEO Jensen Huang stating "this is a very good investment" [1]
- OpenAI's current funding round aims for a total of $100 billion, with Amazon planning to invest up to $50 billion and SoftBank considering a $30 billion investment, leading to an estimated valuation of approximately $830 billion [1]
- This investment signifies a deeper integration between AI infrastructure and leading model developers, with capital increasingly concentrating among a few super players [1]

Group 2
- Tencent has officially open-sourced its high-performance LLM inference core operator library, HPC-Ops, built from scratch using CUDA and CuTe, achieving a 30% improvement in inference QPM for the Hunyuan model and a 17% improvement for the DeepSeek model [2]
- In single-operator performance, Attention shows up to a 2.22x improvement over FlashInfer/FlashAttention, GroupGEMM outperforms DeepGEMM by up to 1.88x, and FusedMoE exceeds TensorRT-LLM by up to 1.49x [2]
- The operator library is optimized for the mainstream inference GPUs available in China, addressing the high usage costs and hardware-compatibility issues of existing mainstream operator libraries [2]

Group 3
- Alibaba has open-sourced the Qwen3-Coder-Next model, featuring 80 billion parameters with only 3 billion active parameters, achieving over a 70% problem-solving rate on SWE-Bench Verified, comparable to models with 10-20 times more active parameters [3]
- The model excels in long-sequence reasoning, complex tool usage, and recovery from execution failures, supporting a 256k context and seamless integration with IDE platforms such as Cline and Claude Code [3]
- A paper co-authored by Zhou Jingren and Lin Junyang has been published alongside the SWE-Universe framework, expanding the real-world multilingual SWE environment to nearly one million levels [3]

Group 4
- The website rentahuman.ai has launched, allowing AI to hire humans for tasks such as delivery, event check-in, and on-site inspections through the MCP protocol or a REST API [4]
- Within 48 hours of launch, the platform had over 20,000 available human workers; individuals set their own hourly rates without the need for small talk, with tasks including photography, restaurant tasting, and package collection [4]
- The site has sparked discussions on responsibility attribution, task-authenticity verification, and the ethics of AI hiring humans, and is also seen as a demonstration of the MCP protocol's value [4]

Group 5
- Mianbi Intelligence has open-sourced the MiniCPM-o 4.5 model, which has only 9 billion parameters yet achieves full-duplex dialogue capabilities, becoming the first large model for "instant free conversation" [5]
- The model employs an end-to-end multimodal architecture, using time-division multiplexing and active-interaction mechanisms to decide at 1Hz whether to speak, ensuring continuous perception and dynamic dialogue [5]

Group 6
- Kunlun Tiangong has released the Skywork desktop version, which executes tasks locally without uploading to the cloud, can read large volumes of local files for summarization and new product generation, and supports parallel multitasking [6]
- It supports switching among the Claude Opus 4.5, Sonnet 4.5, and Gemini 3 Pro models, with over 100 curated skills built in, covering the Office suite, web pages, and image and video generation [6]
- The application prioritizes Windows, offers higher-quality image and video generation, and runs all operations in a local virtual-machine environment to ensure data security [6]

Group 7
- Apple has released Xcode 26.3, officially introducing "agentic programming" support that lets developers directly call AI agents such as Anthropic's Claude and OpenAI's Codex [7]
- The integrated AI agents can browse and search the entire project structure; read, write, edit, and delete files; and automatically reference Apple's official documentation to resolve issues [7]
- User feedback has been mixed: some praise the experience, while others report freezing, a poor diff mechanism, and instability in cross-file refactoring [7]

Group 8
- The open-source music-generation model ACE-Step 1.5 has gained ComfyUI support, using a hybrid LM+DiT architecture to generate a complete 4-minute song in roughly 1 second on an RTX 5090 [8]
- The model supports instructions in over 50 languages, runs in under 4GB of VRAM, and achieves a music-coherence score of 4.72, surpassing most commercial models [8]
- It allows LoRA fine-tuning for style personalization and will soon support music reconstruction and segment-repair features, all running locally to ensure data security [8]

Group 9
- Google has launched PaperBanana, a multi-agent collaborative framework for generating paper illustrations, aimed at freeing researchers from time-consuming figure work [9]
- The system includes roles such as retriever, planner, modeler, visualization expert, and critic, achieving improvements in simplicity, readability, and overall aesthetic quality [9]
- Limitations remain on complex architectures, such as text distortion or connection errors; future plans include code-diffusion models for drawing and human-machine collaboration interfaces [9]
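The agent roles listed for PaperBanana (retriever, planner, visualization expert, critic) can be sketched as a refine loop: draft a figure, have a critic flag issues, and repeat until the critique list is empty. Everything below is hypothetical scaffolding to show the control flow, not PaperBanana's real interfaces; the modeler step is folded into the planner for brevity:

```python
from dataclasses import dataclass, field

@dataclass
class Draft:
    """Shared state passed between the agents (hypothetical schema)."""
    references: list = field(default_factory=list)
    plan: str = ""
    figure: str = ""
    critiques: list = field(default_factory=list)

def retriever(method_text: str) -> Draft:
    # Would fetch reference figures from similar papers; stubbed here.
    return Draft(references=[f"ref-figure-for:{method_text[:20]}"])

def planner(d: Draft) -> Draft:
    # Would turn the method text + references into a layout plan.
    d.plan = "layout: left-to-right; modules: encoder -> decoder"
    return d

def visualizer(d: Draft) -> Draft:
    # Would render the plan into an actual image; stubbed as a string.
    d.figure = f"render({d.plan})"
    return d

def critic(d: Draft) -> Draft:
    # Flags problems like distorted text or wrong connections.
    if "encoder" not in d.figure:
        d.critiques.append("missing module")
    return d

def paper_banana(method_text: str, max_rounds: int = 3) -> Draft:
    d = planner(retriever(method_text))
    for _ in range(max_rounds):
        d = critic(visualizer(d))
        if not d.critiques:  # critic is satisfied; stop refining
            break
    return d

result = paper_banana("We propose a two-stage encoder-decoder ...")
print(result.figure)
```

The critic-in-the-loop design is what the digest credits for the gains in simplicity and readability: the draw step never has to be right on the first pass, only correctable within a bounded number of rounds.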