As a longtime researcher at the intersection of artificial intelligence (AI) and biology, for the past year I have been asked questions about the application of large language models and, more generally, AI in science. For example: “Since ChatGPT works so well, are we on the cusp of solving science with large language models?” or “Isn’t AlphaFold2 suggestive that the potential of AI in biology and science is limitless?” And inevitably: “Can we use AI itself to bridge the lack of data in the sciences in order to then train another AI?”
Acknowledgements
Thanks to Tyler Bonnen, James Bowden, Jennifer Doudna, Lisa Dunlap, Alyosha Efros, Nicolo Fusi, Aaron Hertzmann, Hanlun Jiang, Aditi Krishnapriyan, Jitendra Malik, Sara Mostafavi, Hunter Nisonoff and Ben Recht for helpful comments on this piece as it was taking shape.
Author information
Authors and Affiliations
Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA, USA
Jennifer Listgarten
Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
Jennifer Listgarten
Ethics declarations
Competing interests
The author declares no competing interests.
About this article
Cite this article
Listgarten, J. The perpetual motion machine of AI-generated data and the distraction of ChatGPT as a ‘scientist’. Nat Biotechnol (2024). https://doi.org/10.1038/s41587-023-02103-0
Published: 25 January 2024
DOI: https://doi.org/10.1038/s41587-023-02103-0
Copyright for syndicated content belongs to the linked source: Nature.com, https://www.nature.com/articles/s41587-023-02103-0