Machine Learning
A Primer on the Fundamentals of Data Engineering
36Kr· 2025-07-10 02:10
Group 1
- Data engineering is the process of designing, building, and maintaining systems that collect, store, and analyze data and support decisions based on that data [2]
- Data engineering is essential for data-driven companies: it underpins every step from data collection to decision-making [1][2]
- Understanding the field starts with its basic principles [3]
Group 2
- Data sources can be categorized as structured, semi-structured, or unstructured [5][10]
- Structured data sources follow a predefined schema, such as relational databases, CRM systems, and ERP systems [7][9]
- Semi-structured data sources include JSON files, XML files, HTML documents, and emails [10][12][15]
- Unstructured data sources consist of text documents, social media posts, videos, and images [16][19][21]
Group 3
- Data extraction methods include batch processing and real-time streaming [22][24]
- Batch processing collects and processes data at scheduled intervals, while real-time streaming collects and processes data continuously [24][25]
Group 4
- Data storage systems include databases, data lakes, and data warehouses [27][30]
- Databases are organized collections of data suited to transactional systems, while data lakes store raw data in its original format [29][30]
- Data warehouses are optimized for querying, analysis, and reporting [30]
Group 5
- Data governance and security have become increasingly important, with regulations like GDPR and CCPA emphasizing data integrity and privacy [34]
- Data governance includes policies and procedures to ensure data quality, availability, and compliance with regulations [34][36]
Group 6
- Data processing and transformation are necessary to clean and prepare data for analysis [37]
- ETL (Extract, Transform, Load) processes are critical for integrating data from various sources [41]
Group 7
- Data integration combines data from multiple sources into a single data repository [44]
- Techniques for data integration include ETL, data federation, and API integration [46][47]
Group 8
- Data quality is crucial for accurate analysis and decision-making, and validation techniques are used to verify data accuracy [57][58]
- Continuous monitoring and maintenance of data quality are essential for organizations [66]
Group 9
- Data modeling techniques include conceptual, logical, and physical data modeling [70][71]
- Data analysis and visualization tools help verify data accuracy and surface insights [73]
Group 10
- Scalability and performance optimization are key challenges in data engineering, especially as data volumes and complexity grow [75][77]
- Techniques for optimizing data systems include distributed computing frameworks, cloud-based solutions, and data indexing [79]
Group 11
- Current trends in data engineering include the integration of AI and machine learning into workflows [84]
- Cloud computing and serverless architectures are becoming standard practice in data engineering [85]
Group 12
- Demand for data engineering skills is expected to increase as companies invest in data infrastructure and real-time processing [86]
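The batch extraction, ETL, and validation steps summarized above can be sketched as a small scheduled-style batch job. This is a minimal illustration rather than anything taken from the article: the `orders` table, the sample CSV, and the function names `extract`, `transform`, `load`, and `run_batch` are all hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: a structured source exported as CSV text.
RAW_CSV = """order_id,customer,amount
1,alice,19.99
2,bob,
3,carol,42.50
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: validate rows (skip missing amounts) and cast types."""
    clean = []
    for row in rows:
        if not row["amount"]:
            # Simple data-quality rule: reject rows without an amount.
            continue
        clean.append(
            (int(row["order_id"]), row["customer"].title(), float(row["amount"]))
        )
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_batch(text, conn):
    """Run one batch: extract, transform, then load; return rows loaded."""
    rows = transform(extract(text))
    load(rows, conn)
    return len(rows)


conn = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
loaded = run_batch(RAW_CSV, conn)
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(f"loaded {loaded} rows, total amount {total:.2f}")
```

In a real-time streaming design the same `transform` step would run per event as it arrives, instead of once per scheduled batch.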
Think Nvidia Stock Is Expensive? These 3 Charts Might Change Your Mind.
The Motley Fool· 2025-07-09 15:06
Nvidia (NVDA 2.05%) remains one of the best artificial intelligence (AI) stocks on the market. But with the chipmaker now trading at a price-to-sales multiple of 26.4, many investors may wonder if shares have gotten too expensive to buy. Don't be fooled: Nvidia stock is still reasonably priced.
Nvidia stock isn't as expensive as it seems
Nvidia designs graphics processing units (GPUs) that provide the processing power required to support modern AI and machine-learning software. The company's gross margins are ...
Salarius Pharmaceuticals' Seclidemstat Demonstrates Supporting Role in Inhibiting Validated Oncology Target LSD1 in Two Recently Published Studies
GlobeNewswire News Room· 2025-07-09 12:30
Core Insights
- Salarius Pharmaceuticals is advancing its Phase 1/2 clinical study of seclidemstat for myelodysplastic syndrome (MDS) and chronic myelomonocytic leukemia (CMML) with updates expected later this year [1][2]
- The company is progressing with its planned merger with Decoy Therapeutics, which is anticipated to create value through Decoy's innovative peptide conjugate therapeutics platform [2][3]
Company Developments
- Salarius' seclidemstat is a first-in-class, orally bioavailable LSD1 inhibitor being evaluated at MD Anderson Cancer Center [1][9]
- The merger with Decoy Therapeutics is set to create a new entity named Decoy Therapeutics, with Decoy investors expected to own approximately 92.4% of the merged company [2][10]
- The combined company will focus on developing peptide conjugate therapeutics for unmet medical needs in respiratory infectious diseases and gastroenterology oncology [3][8]
Research Insights
- Recently published animal studies highlight the importance of inhibiting LSD1 expression, supporting the potential of SP-2577 in cancer treatment [2][5]
- The studies provide insights into the role of KDM1A in regulating neural stem cell fate and its implications in oral squamous cell carcinoma progression [5][6]
Future Plans
- Decoy Therapeutics plans to advance its lead asset, a pan-coronavirus antiviral, towards an Investigational New Drug (IND) application with the FDA within the next 12 months [6][7]
- The ongoing clinical trial at MDACC is expected to report data on patients with MDS and CMML who have previously failed or relapsed after hypomethylating agent therapy [7][9]
Salarius Pharmaceuticals’ Seclidemstat Demonstrates Supporting Role in Inhibiting Validated Oncology Target LSD1 in Two Recently Published Studies
Globenewswire· 2025-07-09 12:30
Core Insights
- Salarius Pharmaceuticals is advancing its Phase 1/2 clinical study of seclidemstat for myelodysplastic syndrome (MDS) and chronic myelomonocytic leukemia (CMML) with updates expected later this year [1][2]
- The company is progressing with its planned merger with Decoy Therapeutics, which is anticipated to create value through Decoy's innovative peptide conjugate therapeutics [2][3]
Company Developments
- Salarius' seclidemstat is a first-in-class, orally bioavailable LSD1 inhibitor being evaluated at MD Anderson Cancer Center [1][9]
- The merger with Decoy Therapeutics is set to combine Salarius' drug candidates with Decoy's IMPACT™ platform, which focuses on developing therapeutics for respiratory viruses and cancer [3][10]
- The combined company will be led by Decoy's executive team, including CEO Frederick "Rick" Pierce and CSO Barbara Hibner [4]
Research Insights
- Two recent animal studies highlight the importance of inhibiting LSD1 expression, supporting the potential of SP-2577 in cancer treatment [2][5]
- The studies published in peer-reviewed journals provide insights into the role of KDM1A in regulating neural stem cells and oral squamous cell carcinoma progression [5]
Future Plans
- Decoy plans to advance its lead asset, a pan-coronavirus antiviral, towards an Investigational New Drug (IND) application with the FDA within the next 12 months [6]
- The ongoing clinical trial at MDACC may yield data on MDS and CMML patients who have previously failed or relapsed after hypomethylating agent therapy [7]
Victory Square Technologies Reports Q1 2025 Financial Results and Provides Strategic Update
Newsfile· 2025-07-09 04:12
Core Viewpoint
- Victory Square Technologies Inc. reported solid financial results for Q1 2025 and emphasized its strategic focus on the digital health and wellness sectors, identifying significant growth opportunities in a fragmented healthcare market [4][11].
Financial Highlights
- Adjusted Revenue for Q1 2025 was $6.528 million, while GAAP Revenue stood at $4.540 million. Cost of Goods Sold (COGS) was $3.036 million, resulting in a Gross Margin of $1.504 million. As of March 31, 2025, the company had Cash & Marketable Securities of $9.698 million [7].
Business Focus Areas
- The company aims to scale its holdings in the digital health sector, particularly through its flagship portfolio company, Hydreight Technologies, which provides a compliant platform for healthcare services across the U.S. [4][8].
- Insu Therapeutics is developing a patent-pending oral insulin tablet, with early trials showing promising results in insulin uptake [8][14].
- Pawsible Ventures is focused on pet wellness and telehealth, leveraging trends in human healthcare, with the global pet care market projected to reach $368 billion by 2030 [10][14].
Portfolio Management
- Victory Square actively manages its portfolio, having sold certain AI fintech solutions for approximately $880,000 in equity consideration in Q1 2025. This follows a previous sale of BlockX for about $1.7 million in listed shares [11][15].
Industry Context
- U.S. healthcare spending exceeds $4.5 trillion annually and is projected to reach $7 trillion by 2031, highlighting the sector's fragmentation and potential for value creation [4].
- The global diabetes therapeutics market is expected to reach $118 billion by 2032, with over 500 million people currently living with diabetes [14].
Upcoming Events
- Victory Square will host an Investor Webinar on July 17, 2025, to provide updates and address investor questions [12].
Leadership and Strategy
- The leadership team at Insu Therapeutics includes experienced professionals from academia and the pharmaceutical industry, indicating a strong foundation for advancing its clinical milestones [14].
- The company's business model focuses on buying, building, and investing in early-stage tech companies, with a commitment to supporting their growth for up to 48 months [17].
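The gross margin figure in the financial highlights above can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the summary. The variable names are illustrative, and the margin percentage is derived here, not reported by the company.

```python
# Figures from the Q1 2025 summary, in millions of dollars.
gaap_revenue = 4.540
adjusted_revenue = 6.528
cogs = 3.036

# The reported gross margin matches GAAP revenue minus COGS.
gross_margin = gaap_revenue - cogs

# Derived (not reported): gross margin as a share of GAAP revenue.
gross_margin_pct = gross_margin / gaap_revenue * 100

# For comparison, adjusted revenue minus COGS gives a different number,
# so the reported margin is evidently on a GAAP basis.
adjusted_minus_cogs = adjusted_revenue - cogs

print(f"gross margin: ${gross_margin:.3f}M ({gross_margin_pct:.1f}% of GAAP revenue)")
print(f"adjusted revenue minus COGS: ${adjusted_minus_cogs:.3f}M")
```

So the $1.504 million gross margin is measured against GAAP revenue, about 33% of it, not against the larger adjusted revenue figure.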
Understanding Neural Nets: Mechanistic Interpretability w/ Goodfire CEO Eric Ho #ai #machinelearning
Sequoia Capital· 2025-07-08 18:44
Feasibility of Understanding Large Language Models
- The field of mechanistic interpretability has a significant advantage due to perfect access to neurons, parameters, weights, and attention patterns in neural networks [1]
- Understanding large language models is deeply necessary and critical for the future [2]
- Establishing a norm to explain a percentage of the network by reconstructing it and extracting its concepts and features is crucial [2]
Approaches to Understanding
- Progress can be made by trying to understand all aspects of the network [2]
- A baseline rudimentary understanding can be used to improve and understand more of the network [3]
Here are 3 Outsourcing Stocks to Consider Amid Industry Woes
ZACKS· 2025-07-08 15:41
Industry Overview
- The Zacks Outsourcing industry is facing challenges such as data privacy regulations, communication barriers, geopolitical risks, quality control issues, and loss of control [1]
- Despite these headwinds, the industry is driven by the need to cut costs, the rise of remote work, increased focus on cybersecurity, and trends in AI and ML [1]
Future Trends
- Business process outsourcing (BPO) services are experiencing consistent growth due to flexibility, lower costs, and improved service quality [4]
- The IT outsourcing market is robust, with companies increasingly outsourcing entire IT departments to reduce costs and focus on core operations, driven by a shortage of in-house engineering talent [4]
Cybersecurity Demand
- There is a rising demand for data encryption and cybersecurity measures due to increased public awareness and evolving cyber threats [5]
- Companies are focusing on employee security training and breach detection systems, with many turning to outsourced cybersecurity services to mitigate risks [5]
Technological Innovations
- Trends such as IoT, cloud computing, AI, and ML are transforming the outsourcing sector, enhancing efficiency and competitiveness [6]
- Innovations allow for real-time decision-making and predictive maintenance, while AI and ML integration in customer support optimizes operational costs [6]
Industry Performance
- The Zacks Outsourcing industry currently holds a Zacks Industry Rank of 196, placing it in the bottom 20% of 246 Zacks industries, indicating underperformance [7][8]
- Over the past year, the industry has declined by 6.9%, underperforming the broader Zacks Business Services sector and the S&P 500, which grew by 16.6% and 13.8% respectively [9][10]
Current Valuation
- The industry is trading at a forward 12-month price-to-earnings (P/E) ratio of 15.96X, compared to the S&P 500's 22.75X and the sector's 22.38X [12]
Investment Opportunities
- **Barrett Business Services, Inc. (BBSI)**: Focuses on payroll administration and staffing, with growth driven by new client sales and technology investments [15][16]
- **The Brink's Company, Inc. (BCO)**: A global provider of cash management services, experiencing growth across segments, particularly in digital retail solutions [18][19]
- **Capgemini SE (CGEMY)**: An IT services and consulting company benefiting from strong growth in financial services and public sector, with a focus on AI [20][21]
The Most Valuable Skill of the Next Decade (And It's Not Technical) | Manav Gupta | TEDxGGSIPU EDC
TEDx Talks· 2025-07-08 15:12
Core Argument
- The most valuable skill of the next decade is the intersection of technical skills and storytelling, not just technical expertise [1][2][3]
- Standing at this intersection provides significant advantages [5]
The "Slip Stream" Effect
- Being in the "slip stream" (right place at the right time) allows for exponential growth with the same effort [11][13][15]
- AI boom provided a "slip stream" opportunity [15]
- Companies like Jio create slipstream opportunities for others [14]
Overcoming the "Ostrich Effect"
- The "ostrich effect" (ignoring new trends) hinders progress [16][17][18]
- It's crucial to be curious and adapt to new technologies and trends [19]
The Rise of No-Code Tools
- No-code tools are becoming more prevalent, shifting the focus from building to selling [19][20][21][22]
- Building is becoming easier, making distribution (content creation) more critical [21][22]
The Importance of Distribution (Content)
- Building great products is not enough; getting people to care is essential [23][24]
- Content creation is crucial for distribution and making people care [22][38]
Leveraging Asymmetrical Returns
- Pursue asymmetrical returns (nonlinear returns) through activities like networking and content creation [30][31]
- Content creation is a classic example of asymmetrical returns [31]
Benefits of Content Creation
- Content establishes credibility through the mere exposure effect [32]
- Content accelerates feedback loops, enabling rapid learning and improvement [33]
- Content provides unprecedented networking opportunities [34]
Content Creation as a Lottery Ticket
- Content creation should be viewed as a lottery ticket; consistency is key [35][36][37]
- The probability of success increases with each piece of content created [36][37]
The Future Landscape
- As technology becomes more accessible, the ability to distribute that technology becomes the greatest advantage [37][38]
- The key is to be at the intersection of building and storytelling [38]
Down 88% From Its All-Time High, Here's 1 Big Reason Snap Stock Can Snap Back in 2025
The Motley Fool· 2025-07-08 08:48
Core Insights
- Snap's stock has fallen 88% from its peak, primarily because privacy changes by Apple reduced the effectiveness of its advertising [2][5]
- The company is showing signs of recovery, with improvements in its advertising technology and a growing user base [6][10]
Group 1: Stock Performance
- Snap's stock reached an all-time high of $83 in September 2021, a 388% gain from its IPO price of $17 [1]
- Since that peak, the stock has lost 88% of its value, indicating substantial volatility [2]
Group 2: Advertising Challenges
- Apple's privacy changes in 2021 required app developers to obtain user consent for tracking, which hurt Snap's ability to sell targeted ads [4][5]
- Meta Platforms adapted quickly to these changes, while Snap was slower but has made significant progress with a new machine learning-powered advertising engine [6]
Group 3: Revenue Trends
- Snap's annual revenue growth slowed from 64% in 2021 to 12% in 2022 and to less than 1% in 2023 [8]
- In 2024, Snap achieved record revenue of $5.4 billion, a growth rate of 16% [9]
Group 4: User Engagement
- As of the end of Q1 2025, Snapchat had 460 million daily active users, a 9% increase year over year, indicating strong user engagement [10]
- The platform has gained approximately 140 million daily users since the end of 2021, enhancing its attractiveness to advertisers [12]
Group 5: Valuation and Investment Potential
- Snap's current price-to-sales ratio is 2.8, near its lowest level since going public in 2017, suggesting the stock may be undervalued [13]
- If Snap successfully addresses its advertising challenges, investor confidence and valuations could rise [15]
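The percentages above can be made concrete with a short calculation. This sketch uses only figures quoted in the summary. The derived values (implied current price, the rebound needed to regain the peak, prior-year revenue and users) are back-of-the-envelope estimates from rounded inputs, not reported numbers, and the variable names are illustrative.

```python
# Inputs quoted in the summary.
ipo_price = 17.0           # dollars
gain_to_peak = 3.88        # 388% gain from IPO to the all-time high
decline_from_peak = 0.88   # 88% drop from the all-time high
revenue_2024 = 5.4         # billions of dollars
revenue_growth_2024 = 0.16
dau_q1_2025 = 460.0        # millions of daily active users
dau_growth = 0.09

# A 388% gain means the peak is 4.88 times the IPO price (about $83).
peak_price = ipo_price * (1 + gain_to_peak)

# An 88% decline leaves 12% of the peak value.
implied_price = peak_price * (1 - decline_from_peak)

# Regaining the peak from there takes a gain of 1/0.12 - 1, roughly 733%.
rebound_needed = peak_price / implied_price - 1

# Back out the prior-period figures implied by the growth rates.
implied_revenue_2023 = revenue_2024 / (1 + revenue_growth_2024)
implied_dau_q1_2024 = dau_q1_2025 / (1 + dau_growth)

print(f"peak price: ${peak_price:.2f}")
print(f"implied price after the decline: ${implied_price:.2f}")
print(f"gain needed to regain the peak: {rebound_needed:.0%}")
print(f"implied 2023 revenue: ${implied_revenue_2023:.2f}B")
print(f"implied Q1 2024 daily users: {implied_dau_q1_2024:.0f}M")
```

The asymmetry is the point: an 88% loss is not undone by an 88% gain, since the stock would have to rise more than sevenfold to return to its high.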
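Open-Source CUDA Project Comes Back to Life: It Supports Non-NVIDIA Chips and Was Rescued by a Mysterious Organization on the Brink of Shutdown
QbitAI (量子位)· 2025-07-08 00:40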
Core Viewpoint
- The open-source project ZLUDA, which enables non-NVIDIA chips to run CUDA, has been revived after nearly shutting down when AMD withdrew its support. A mysterious organization has stepped in to provide assistance, allowing the project to continue its development and its support for large model workloads [1][2][12].
Historical Development
- ZLUDA was initiated by Andrzej Janik, who previously worked at Intel, with the aim of allowing CUDA programs to run on non-NVIDIA platforms [4][5].
- Initially, ZLUDA was taken over by Intel as an internal project to run CUDA programs on Intel GPUs, but it was soon terminated [6][9].
- In 2022, ZLUDA received support from AMD but was again halted in February 2024 after NVIDIA released CUDA 11.6, which restricted reverse engineering on non-NVIDIA platforms [10][11][12].
Recent Developments
- In October 2024, Janik announced that ZLUDA had received support from a mysterious organization, with a focus on machine learning and the aim of restoring the project to its previous state by Q3 2025 [13][15].
- The project has added a new full-time developer, Violet, who has made significant improvements, particularly in supporting large language model workloads [17].
Technical Progress
- ZLUDA is working on enabling 32-bit PhysX support, with community contributors identifying and fixing errors that may also affect 64-bit CUDA functionality [19].
- The team is using llm.c, a test project that runs the GPT-2 model with CUDA, as ZLUDA's first attempt to handle both standard CUDA functions and specialized libraries like cuBLAS [20][22].
- ZLUDA currently supports 16 of the 44 functions the test program requires, a step closer to full functionality [25].
Accuracy and Logging Improvements
- ZLUDA aims to run standard CUDA programs on non-NVIDIA GPUs while matching the behavior of NVIDIA hardware as closely as possible. Recent efforts have focused on improving accuracy by implementing PTX "scan" tests to ensure correct results across all inputs [26][28].
- The logging system has been significantly upgraded to track previously invisible activities and internal behaviors, which is crucial for running any CUDA-based software on ZLUDA [31][33].
Runtime Compiler Compatibility
- ZLUDA has addressed issues with the dynamic compilation of device code, which is necessary for compatibility with modern GPU frameworks. Recent changes in the ROCm/HIP ecosystem caused unexpected errors, but the ZLUDA team has resolved these problems [34][36][38].