The perpetual motion machine of AI-generated data and the distraction of ChatGPT as a ‘scientist’

As a longtime researcher at the intersection of artificial intelligence (AI) and biology, for the past year I have been asked questions about the application of large language models and, more generally, AI in science. For example: “Since ChatGPT works so well, are we on the cusp of solving science with large language models?” or “Isn’t AlphaFold2 suggestive that the potential of AI in biology and science is limitless?” And inevitably: “Can we use AI itself to bridge the lack of data in the sciences in order to then train another AI?”

Acknowledgements

Thanks to Tyler Bonnen, James Bowden, Jennifer Doudna, Lisa Dunlap, Alyosha Efros, Nicolo Fusi, Aaron Hertzmann, Hanlun Jiang, Aditi Krishnapriyan, Jitendra Malik, Sara Mostafavi, Hunter Nisonoff and Ben Recht for helpful comments on this piece as it was taking shape.

Author information

Authors and Affiliations

Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA, USA

Jennifer Listgarten

Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA

Jennifer Listgarten

Corresponding author

Correspondence to Jennifer Listgarten.

Ethics declarations

Competing interests

The author declares no competing interests.

About this article

Cite this article

Listgarten, J. The perpetual motion machine of AI-generated data and the distraction of ChatGPT as a ‘scientist’. Nat Biotechnol (2024). https://doi.org/10.1038/s41587-023-02103-0

Published: 25 January 2024

DOI: https://doi.org/10.1038/s41587-023-02103-0