Podcast: Explore Cell2Sentence-Scale (C2S-Scale), an application of large language models (LLMs) designed to interpret and analyze single-cell biological data. The core idea is to transform the gene expression profile of each individual cell into a "cell sentence," so that an LLM can process biological information as ordinary text. This approach supports conversational analysis of biological data, enables automatic generation of biological summaries, and shows potential for predicting cellular responses to perturbations, which could accelerate fields such as drug discovery. The project emphasizes the scalability of biological LLMs and releases its models as open source for the research community.
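To make the "cell sentence" idea concrete, here is a minimal sketch. It assumes a cell sentence is the list of a cell's expressed gene names ordered from highest to lowest expression, as described in the Cell2Sentence work. The function name `cell_to_sentence`, the `top_k` cutoff, and the toy values are hypothetical illustrations, not the project's actual API or data.

```python
from typing import Mapping


def cell_to_sentence(expression: Mapping[str, float], top_k: int = 100) -> str:
    """Turn one cell's expression profile into a space-separated 'cell sentence'.

    Assumption: genes are ranked by descending expression and unexpressed
    genes are dropped. `top_k` is an illustrative cutoff, not a published value.
    """
    # Keep only genes that are actually expressed in this cell.
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    # Sort by descending expression; break ties alphabetically so the
    # output is deterministic.
    expressed.sort(key=lambda item: (-item[1], item[0]))
    return " ".join(gene for gene, _ in expressed[:top_k])


# Toy expression values for a single cell (made up for illustration).
cell = {"CD3E": 12.0, "MALAT1": 57.0, "GAPDH": 0.0, "IL7R": 12.0, "ACTB": 30.0}
sentence = cell_to_sentence(cell, top_k=4)
print(sentence)  # MALAT1 ACTB CD3E IL7R
```

The resulting string can be placed in a text prompt like any other sentence, which is what lets a language model read a cell without a custom numeric input layer.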

Cell2Sentence models and resources are now available on platforms such as HuggingFace and GitHub. We invite you to explore these tools, experiment with your own single-cell data, and see how far we can go when we teach machines to understand the language of life — one cell at a time.
Acknowledgements
Key contributors to this project include: Syed Rizvi¹˒², Daniel Levine², Aakash Patel², Shiyang Zhang², Eric Wang³, Sizhuang He², David Zhang², Cerise Tang², Zhuoyang Lyu⁴, Rayyan Darji², Chang Li², Emily Sun², David Jeong², Lawrence Zhao², Jennifer Kwan², David Braun², Brian Hafler², Jeffrey Ishizuka², Rahul M. Dhodapkar⁵, Hattie Chung², Shekoofeh Azizi³, Bryan Perozzi¹, and David van Dijk².
Affiliations:
¹ Google Research, Graph Mining Team
² Yale University
³ Google DeepMind
⁴ Brown University
⁵ University of Southern California
Labels: Health & Bioscience