Multimodal Intelligence
750 Cities and 5,000+ Hours of First-Person Video: Shanghai AI Lab Open-Sources a High-Quality Video Dataset for World Exploration
QbitAI (量子位) · 2025-07-05 04:03
Core Viewpoint - The Sekai project aims to build a high-quality video dataset that serves as a foundation for interactive video generation, visual navigation, and video understanding, emphasizing the importance of high-quality data in building world models [1][2].
Group 1: Project Overview
- The Sekai project is a collaboration among institutions including Shanghai AI Lab, Beijing Institute of Technology, and the University of Tokyo, focused on world exploration through a continuously iterated high-quality video dataset [2].
- The dataset includes over 5,000 hours of first-person walking and drone footage from more than 750 cities across 101 countries, annotated with text descriptions, location, weather, time, crowd density, scene type, and camera trajectory [2][10].
Group 2: Dataset Composition
- Sekai consists of two complementary datasets: Sekai-Real, built from real-world videos sourced from YouTube, and Sekai-Game, built from high-fidelity game footage [3].
- Sekai-Real was created from over 8,600 hours of YouTube videos, all with a minimum resolution of 1080P, a frame rate above 30FPS, and a publication date within the last three years [3][5].
- Sekai-Game was developed from over 60 hours of gameplay in the high-fidelity game "Lushfoil Photography Sim," capturing realistic lighting effects in a consistent image format [3][9].
Group 3: Data Processing and Quality Control
- Data collection gathered 8,623 hours of video from YouTube and over 60 hours from games; preprocessing then reduced this to 6,620 hours of Sekai-Real and 40 hours of Sekai-Game [5][6].
- Sekai-Real was annotated with large vision-language models for efficient labeling, and the dataset went through quality control that included brightness assessment and video quality scoring [7][8].
- The final dataset contains segments ranging from 1 minute to nearly 6 hours, with an average length of 18.5 minutes, along with structured location information and detailed content classification [10].
Group 4: Future Goals
- The Sekai team aims to use this dataset to advance world modeling and multimodal intelligence, supporting applications in world generation, video understanding, and autonomous navigation [10].
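The collection criteria and hour counts reported in the Sekai summary above can be illustrated with a small Python sketch. It is not the Sekai team's pipeline: the `VideoMeta` record, the function names, and the choice to treat each threshold as inclusive are assumptions made here for illustration. Only the 1080P, 30FPS, three-year, 8,623-hour, and 6,620-hour figures come from the summary.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class VideoMeta:
    """Hypothetical metadata record for one candidate video."""
    height: int        # vertical resolution in pixels (1080 for 1080P)
    fps: float         # frames per second
    published: date    # upload date


def passes_collection_filter(video, today, min_height=1080,
                             min_fps=30.0, max_age_years=3):
    """Apply the three collection criteria named in the summary.

    Assumption: thresholds are inclusive; the summary does not say
    whether a video at exactly 30 FPS is accepted.
    """
    cutoff = today - timedelta(days=365 * max_age_years)
    return (video.height >= min_height
            and video.fps >= min_fps
            and video.published >= cutoff)


def retention(kept_hours, collected_hours):
    """Fraction of collected footage that survives preprocessing."""
    return kept_hours / collected_hours


if __name__ == "__main__":
    today = date(2025, 7, 5)
    sample = VideoMeta(height=1080, fps=60.0, published=date(2024, 3, 1))
    print(passes_collection_filter(sample, today))
    # Sekai-Real: 6,620 hours kept out of 8,623 hours collected.
    print(f"{retention(6620, 8623):.1%}")
```

With the reported figures, roughly three quarters of the collected YouTube footage remains after preprocessing.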
Taotian's Unconventional Tech Festival: AI Werewolf, Poster Roadshows, and the Bojian Society Take Turns on Stage
QbitAI (量子位) · 2025-07-01 03:51
Core Viewpoint - The "Hardcore Youth Technology Festival" organized by Taotian Group has evolved into a significant event showcasing technological advancements, particularly in AI, reflecting the company's commitment to practical and innovative technology applications [1][2][29].
Group 1: Event Overview
- The fourth edition of the "Hardcore Youth Technology Festival" took place from June 30 to July 4, with a focus on practical technology rather than traditional presentations [1][2].
- The festival included several formats, namely an AI exhibition, AI communication, an AI open day, and AI competitions, emphasizing hands-on demonstrations and interaction [3][4].
Group 2: AI Exhibition
- The AI exhibition served as a large technology marketplace, showcasing nearly 40 of the latest technological achievements from Taotian Group's AIGX technology system through poster presentations [8][10].
- The AIGX system integrates closely with e-commerce scenarios, covering operational needs such as indexing, recommendation, bidding, auctioning, creativity, and data management [9][11].
Group 3: AI Communication
- The "Bojian Society" was established to share technological achievements and trends, facilitating discussions between academia and industry [16][19].
- This year, the event featured separate sessions for group and academic exchanges, focusing on "multimodal intelligence" and fostering collaboration between industry leaders and academic experts [18][19].
Group 4: AI Competitions
- The AI competition segment included an "AI Hackathon 3.0" and a unique "AI Werewolf" game, in which participants trained AI agents to play various roles, exercising the agents' language understanding and strategic reasoning [20][24].
- The AI Werewolf game was designed to challenge AI agents in a social deduction context, emphasizing their capabilities in language generation and logical reasoning [25][26].
Group 5: Technological Advancements
- Taotian Group announced significant progress in its AIGX technology system, including the launch of the self-developed recommendation model RecGPT, which enhances user experience by predicting needs based on historical data [34][37].
- The implementation of RecGPT has led to a notable increase in user engagement, with double-digit growth in click rates and a 5% increase in add-to-cart actions [39][41].
Group 6: Organizational Philosophy
- The festival reflects Taotian Group's long-term commitment to embedding AI into business processes, focusing on practical applications rather than chasing short-term trends [44][45].
- The event embodies a blend of youthful energy and craftsmanship, showcasing the company's dedication to continuous improvement and innovation in technology [58].
15k Stars in a Day, Code Generation That Crushes Claude, and Even Cursor Is Rattled? Google's Gemini CLI Goes on a Tear
AI Frontline (AI前线) · 2025-06-26 05:44
Core Insights
- Google has officially launched Gemini CLI, an AI assistant for terminal environments, with generous free usage quotas of 60 calls per minute and 1,000 calls per day [1][4][6].
- The introduction of Gemini CLI marks a significant development in the competitive landscape of AI coding tools, where developers previously spent hundreds to thousands of dollars on similar tools [3][6].
- Gemini CLI is open-source and has drawn significant attention, reaching 15.1k stars on GitHub within a day of its release [8].
Pricing and Accessibility
- Users can access Gemini Code Assist for free by logging in with a personal Google account, which unlocks the Gemini 2.5 Pro model and a one-million-token context window [4].
- The free usage model is seen as a strategic move to increase competition, particularly against Claude Code [6].
Features and Capabilities
- Gemini CLI supports code writing, debugging, project management, document querying, and code explanation, and can connect to MCP (Model Context Protocol) servers for additional capabilities [10][15].
- The tool runs on Mac, Linux, and Windows, and can be customized through a simple text file [10].
Competitive Landscape
- The launch of Gemini CLI has intensified competition in the AI coding tool market, with developers reporting that it outperforms Claude Code on various coding tasks [18][20].
- Feedback indicates that Gemini 2.5 Pro has significantly improved code generation and understanding, leading to faster bug fixes and higher completion rates in programming tasks [20][21].
Development Philosophy
- Google emphasizes a generalist model with Gemini 2.5 Pro, which is not specifically trained for coding tasks but is designed to understand broader contexts and user needs [16][17].
- The development team is focusing on integrating various capabilities rather than solely enhancing coding skills, aiming for a more holistic approach to software development [17][23].
Future Outlook
- The positive reception of Gemini CLI suggests a potential shift in the AI programming landscape, with indications that Google may be regaining ground in this competitive field [24].
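To make the free quota figures in the Gemini CLI summary above concrete (60 calls per minute and 1,000 calls per day), here is a minimal client-side bookkeeping sketch in Python. It is not part of Gemini CLI and says nothing about how Google enforces its limits. The `QuotaTracker` class, its method name, and the rolling-window interpretation of "per minute" and "per day" are all assumptions made for illustration.

```python
from collections import deque


class QuotaTracker:
    """Hypothetical local tracker for a per-minute and a per-day call budget.

    Assumption: both limits are treated as rolling windows (last 60 seconds,
    last 24 hours). The real service may count differently.
    """

    def __init__(self, per_minute=60, per_day=1000):
        self.per_minute = per_minute
        self.per_day = per_day
        self._calls = deque()  # timestamps (seconds) of accepted calls

    def try_acquire(self, now):
        """Record a call at time `now` if both budgets allow it."""
        # Forget calls that are at least 24 hours old.
        while self._calls and now - self._calls[0] >= 86400:
            self._calls.popleft()
        if len(self._calls) >= self.per_day:
            return False
        # Count accepted calls made within the last 60 seconds.
        recent = sum(1 for t in self._calls if now - t < 60)
        if recent >= self.per_minute:
            return False
        self._calls.append(now)
        return True


if __name__ == "__main__":
    tracker = QuotaTracker()
    accepted = sum(tracker.try_acquire(0.0) for _ in range(100))
    print(accepted)  # number of calls accepted in a single burst
```

Passing the timestamp in explicitly keeps the sketch deterministic; a real wrapper would more likely read `time.monotonic()`.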
Zhang Yaqin: Opportunities for China's AI Industry in the Post-ChatGPT Era, Five Development Directions, and Three Predictions
36Kr · 2025-05-16 04:27
Group 1
- Zhang Yaqin describes ChatGPT as the first AI agent to pass the Turing test, a significant milestone in AI development [4][6][19].
- ChatGPT's rapid adoption, reaching over 100 million users within two months of launch, highlights its popularity and impact on the tech industry [3][6][19].
- The evolution from GPT-3 to ChatGPT demonstrates substantial improvements in AI capabilities, particularly in natural language processing and user interaction [2][7][19].
Group 2
- Large models like GPT are reshaping the structure of the IT industry into a layered architecture of cloud infrastructure, foundational models, and vertical models [9][11].
- Competitors have significant opportunities in the era of large AI models, especially in vertical foundational models and SaaS applications [11][12][19].
- Both established companies and startups are pursuing AI operating systems, indicating a competitive landscape in the AI sector [12][19].
Group 3
- The Chinese AI industry is expected to develop its own large models and killer applications, much as it did with cloud computing [15][19].
- Training Chinese large models can benefit from multilingual data, which improves their performance and capabilities [16][19].
- The focus on generative AI is driving a surge of new startups and investment in the sector, indicating a vibrant market landscape [18][19].
Group 4
- Future advances in large AI models are projected in five directions: multimodal intelligence, autonomous agents, edge intelligence, physical intelligence, and biological intelligence [32][33][34].
- The integration of foundational models with vertical and edge models is expected to create a new industrial ecosystem significantly larger than those of previous technological eras [34][35].
- New algorithmic frameworks are needed to improve efficiency and reduce energy consumption in AI systems, with potential breakthroughs anticipated in the next five years [34][35].
Shandong Adds 1 Billion Yuan in Funding, Using Vouchers to Drive Full-Chain AI Development
Huanqiu.com News (环球网资讯) · 2025-05-13 04:14
Core Viewpoint - Shandong Province is investing 1 billion RMB to support the development of artificial intelligence (AI) through various policies and initiatives, extending support until the end of 2026 [1][3].
Group 1: Financial Support and Policies
- The Shandong Provincial Development and Reform Commission announced a total of 1 billion RMB to support key AI clusters, platforms, enterprises, and projects [1].
- The support includes innovative policies such as "computing power vouchers," "model vouchers," "corpus vouchers," and "data sets" to strengthen AI development [1][4].
- A comprehensive "policy package" consisting of 28 measures and 45 specific policies has been introduced to support the entire AI industry chain [3].
Group 2: Research and Development Initiatives
- Shandong plans to invest in over 150 basic research projects annually, focusing on cutting-edge theories such as multimodal intelligence and embodied intelligence [4].
- The province aims to enhance its capabilities in core technologies by supporting the construction of key innovation platforms and promoting the application of technological achievements [4][5].
Group 3: Infrastructure and Resource Allocation
- The policies emphasize increasing the supply of essential elements for AI, including computing power, data, and models [4].
- Shandong will implement a "computing power voucher" subsidy based on a percentage of the amount spent on purchasing computing power, and will select 10 high-quality corpora annually for "corpus vouchers" [4][5].
- The province plans to select 30 large model products each year for "model vouchers" to accelerate the development of high-performance large models [5].
Group 4: Future Development Goals
- By 2027, Shandong aims to establish around 30 provincial key laboratories and 20 provincial technology innovation centers in critical areas such as key chips and large models [5].
- The province intends to gather over 240 provincial-level scientific talents and incubate more than 50 technology-based enterprises to drive significant innovations in the AI sector [5].
Coordinating 1 Billion Yuan in Funding to Advance "AI Plus" Development
Qilu Evening News (齐鲁晚报) · 2025-05-12 21:07
Core Viewpoint - Shandong Province has introduced a comprehensive plan and policy measures to accelerate the development of artificial intelligence (AI) across key sectors, with a financial commitment of approximately 1 billion yuan by 2025 to support innovation and application in AI [1][4].
Group 1: Key Areas of Focus
- The initiative targets 13 key areas across three main aspects, namely industrial development, consumer life, and government services, aiming to leverage AI for high-quality growth [2][3].
- In industrial development, six sectors have been prioritized: chemicals, aluminum, steel, mining, high-end equipment, and biomedicine, which are considered Shandong's pillar industries with significant potential for AI application [2].
- For consumer life, four sectors have been selected: home, travel, healthcare, and cultural tourism, with a focus on enhancing quality of life through AI technologies [3].
- In government services, three areas are emphasized: digital governance, social management, and public safety, aiming to improve service efficiency and accessibility through AI [3].
Group 2: Financial and Policy Support
- The policy measures include 28 specific initiatives with total funding of 1 billion yuan, aimed at supporting key clusters, platforms, enterprises, and projects in the AI sector [4][5].
- The financial support will be extended until the end of next year (2026), and an AI industry fund will be established to further bolster development efforts [4].
- Innovative support mechanisms such as "computing power vouchers," "model vouchers," and "data set vouchers" are introduced to stimulate AI innovation and application [4][6].
Group 3: Innovation and Development Goals
- By 2027, the plan aims to cultivate 20 foundational AI models for service industries, create over 50 replicable application scenarios, and launch more than 100 exemplary cases [3][4].
- The initiative seeks breakthroughs in intelligent development across key industries, significantly enhancing productivity and safety levels [3][4].
- Collaborative efforts aim to optimize the industrial ecosystem, supporting the growth of specialized enterprises and promoting cooperation across the AI value chain [7][8].