About

Grassroots Science is a year-long global collaboration aimed at collecting multilingual data through crowdsourcing, initiated by grassroots communities who believe in the power of collective efforts to achieve significant advancements in research. Our goal is to create the best resources for training multilingual large language models by engaging researchers fluent in various languages, dialects, and tongues. We plan to launch the project in early February 2025.

What we do

  • Collection of pluralistic multilingual alignment data.
  • Post-training, evaluation, and benchmarking of frontier models on human preference data.
  • Open-sourcing tools for collaborative grassroots projects.

Who is organizing?

Grassroots Science is an open research collaboration initiated by members of various grassroots communities, including SIGSEA (aka SEACrowd), Masakhane, AI4Bharat, and past collaborators of the BigScience Workshop. It brings together academic, industrial, and independent researchers from diverse affiliations, with interests spanning AI, NLP, ML, and the social sciences.

What can I get from the collaboration?

By joining this collaboration, you become part of a collective effort to shape the future of language model research. This initiative serves as a platform for advocating grassroots projects that drive significant advancements. We highly value meaningful contributions, and significant contributors will be invited to co-author our upcoming paper under a point system similar to SEACrowd's. More details will be announced during the launch.

Join us!

Stay updated by following our X account, and fill out the interest form to get involved. We look forward to your participation!