C2S-Scale: Language Models for Single-Cell Biological Analysis 🔬

PODCAST: Explore Cell2Sentence-Scale (C2S-Scale), an application of large language models (LLMs) designed to interpret and analyze single-cell biological data. The core idea is to transform the gene expression profile of each individual cell into a “cell sentence,” so that an LLM can process biological information as ordinary text. This approach supports conversational analysis of biological data, enables automatic generation of biological summaries, and shows potential for predicting cellular responses to perturbations, which could accelerate fields like drug discovery. The project emphasizes the scalability of biological LLMs and provides open-source models to the research community.
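To make the “cell sentence” idea concrete, here is a minimal sketch in Python. It assumes a rank-order encoding, in which a cell is written as its expressed gene names sorted from highest to lowest expression; the summary above does not spell out the exact encoding, so treat this as an illustration rather than the project's implementation. The function name `cell_to_sentence`, the `top_k` parameter, and the toy counts are all hypothetical.

```python
from typing import Optional, Sequence


def cell_to_sentence(
    gene_names: Sequence[str],
    counts: Sequence[float],
    top_k: Optional[int] = None,
) -> str:
    """Encode one cell as a space-separated list of gene names.

    Assumption: genes are ordered by descending expression and genes
    with zero counts are dropped. This is an illustrative encoding,
    not necessarily the one used by C2S-Scale.
    """
    if len(gene_names) != len(counts):
        raise ValueError("gene_names and counts must have the same length")

    # Keep only genes that are actually expressed in this cell.
    expressed = [(name, c) for name, c in zip(gene_names, counts) if c > 0]

    # Sort by descending count; break ties by gene name so the
    # output is deterministic.
    expressed.sort(key=lambda pair: (-pair[1], pair[0]))

    names = [name for name, _ in expressed]
    if top_k is not None:
        names = names[:top_k]
    return " ".join(names)


# Toy example with made-up counts for five gene symbols.
genes = ["CD3E", "MS4A1", "GNLY", "LYZ", "NKG7"]
counts = [12, 0, 3, 1, 7]
print(cell_to_sentence(genes, counts, top_k=3))  # prints "CD3E NKG7 GNLY"
```

The resulting string can be passed to a text model like any other sentence, which is what lets a language model read expression data without a custom input format.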

Cell2Sentence models and resources are now available on platforms such as HuggingFace and GitHub. We invite you to explore these tools, experiment with your own single-cell data, and see how far we can go when we teach machines to understand the language of life — one cell at a time.

Acknowledgements

Key contributors to this project include: Syed Rizvi (1,2), Daniel Levine (2), Aakash Patel (2), Shiyang Zhang (2), Eric Wang (3), Sizhuang He (2), David Zhang (2), Cerise Tang (2), Zhuoyang Lyu (4), Rayyan Darji (2), Chang Li (2), Emily Sun (2), David Jeong (2), Lawrence Zhao (2), Jennifer Kwan (2), David Braun (2), Brian Hafler (2), Jeffrey Ishizuka (2), Rahul M. Dhodapkar (5), Hattie Chung (2), Shekoofeh Azizi (3), Bryan Perozzi (1), and David van Dijk (2).

Affiliations:

1. Google Research, Graph Mining Team
2. Yale University
3. Google DeepMind
4. Brown University
5. University of Southern California

Labels: Health & Bioscience, Natural Language Processing
