Охота на электроовец. Большая книга искусственного интеллекта (Hunting for Electric Sheep: The Big Book of Artificial Intelligence) - Sergey Sergeevich Markov. Page 461

Learning from Complex Explanation Traces of GPT-4 // https://arxiv.org/abs/2306.02707

2657

Stability AI (2023). Meet Stable Beluga 1 and Stable Beluga 2, Our Large and Mighty Instruction Fine-Tuned Language Models. // https://stability.ai/blog/stable-beluga-large-instruction-fine-tuned-models

2658

Anil R., Dai A. M., Firat O., Johnson M., Lepikhin D., Passos A., Shakeri S., Taropa E., Bailey P., Chen Z., Chu E., Clark J. H., Shafey L. E., Huang Y., Meier-Hellstern K., Mishra G., Moreira E., Omernick M., Robinson K., Ruder S., Tay Y., Xiao K., Xu Y., Zhang Y., Abrego G. H., Ahn J., Austin J., Barham P., Botha J., Bradbury J., Brahma S., Brooks K., Catasta M., Cheng Y., Cherry C., Choquette-Choo C. A., Chowdhery A., Crepy C., Dave S., Dehghani M., Dev S., Devlin J., Díaz M., Du N., Dyer E., Feinberg V., Feng F., Fienber V., Freitag M., Garcia X., Gehrmann S., Gonzalez L., Gur-Ari G., Hand S., Hashemi H., Hou L., Howland J., Hu A., Hui J., Hurwitz J., Isard M., Ittycheriah A., Jagielski M., Jia W., Kenealy K., Krikun M., Kudugunta S., Lan C., Lee K., Lee B., Li E., Li M., Li W., Li Y., Li J., Lim H., Lin H., Liu Z., Liu F., Maggioni M., Mahendru A., Maynez J., Misra V., Moussalem M., Nado Z., Nham J., Ni E., Nystrom A., Parrish A., Pellat M., Polacek M., Polozov A., Pope R., Qiao S., Reif E., Richter B., Riley P., Ros A. C., Roy A., Saeta B., Samuel R., Shelby R., Slone A., Smilkov D., So D. R., Sohn D., Tokumine S., Valter D., Vasudevan V., Vodrahalli K., Wang X., Wang P., Wang Z., Wang T., Wieting J., Wu Y., Xu K., Xu Y., Xue L., Yin P., Yu J., Zhang Q., Zheng S., Zheng C., Zhou W., Zhou D., Petrov S., Wu Y. (2023). PaLM 2 Technical Report // https://arxiv.org/abs/2305.10403

2659

The MosaicML NLP Team (2023). MPT-30B: Raising the bar for open-source foundation models // https://www.mosaicml.com/blog/mpt-30b

2660

Penedo G., Malartic Q., Hesslow D., Cojocaru R., Cappelli A., Alobeidli H., Pannier B., Almazrouei E., Launay J. (2023). The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only // https://arxiv.org/abs/2306.01116

2661

Almazrouei E., Alobeidli H., Alshamsi A., Cappelli A., Cojocaru R., Alhammadi M., Mazzotta D., Hesslow D., Launay J., Malartic Q., Noune B., Pannier B., Penedo G. (2023). The Falcon Series of Language Models: Towards Open Frontier Models // https://huggingface.co/tiiuae/falcon-180B

2662

Qwen-7B (2023). // https://github.com/QwenLM/Qwen-7B/

2663

Yang A., Xiao B., Wang B., Zhang B., Bian C., Yin C., Lv C., Pan D., Wang D., Yan D., Yang F., Deng F., Wang F., Liu F., Ai G., Dong G., Zhao H., Xu H., Sun H., Zhang H., Liu H., Ji J., Xie J., Dai J., Fang K., Su L., Song L., Liu L., Ru L., Ma L., Wang M., Liu M., Lin M., Nie N., Guo P., Sun R., Zhang T., Li T., Li T., Cheng W., Chen W., Zeng X., Wang X., Chen X., Men X., Yu X., Pan X., Shen Y., Wang Y., Li Y., Jiang Y., Gao Y., Zhang Y., Zhou Z., Wu Z. (2023). Baichuan 2: Open Large-scale Language Models // https://arxiv.org/abs/2309.10305

2664

Mistral AI team (2023). Mistral 7B. The best 7B model to date, Apache 2.0 // mistral.ai, September 27, 2023 // https://mistral.ai/news/announcing-mistral-7b/

2665

Elsen E., Odena A., Nye M., Taşırlar S., Dao T., Hawthorne C., Moparthi D., Somani A. (2023). Releasing Persimmon-8B / Adept, September 7, 2023 // https://www.adept.ai/blog/persimmon-8b

2666

Yi (2023). // https://github.com/01-ai/Yi

2667

Gunasekar S., Zhang Y., Aneja J., Mendes C. C. T., Giorno A. D., Gopi S., Javaheripi M., Kauffmann P., de Rosa G., Saarikivi O., Salim A., Shah S., Behl H. S., Wang X., Bubeck S., Eldan R., Kalai A. T., Lee Y. T., Li Y. (2023). Textbooks Are All You Need // https://arxiv.org/abs/2306.11644

2668

Li Y., Bubeck S., Eldan R., Giorno A. D., Gunasekar S., Lee Y. T. (2023). Textbooks Are All You Need II: phi-1.5 technical report // https://arxiv.org/abs/2309.05463

2669

Schaeffer R. (2023). Pretraining on the Test Set Is All You Need // https://arxiv.org/abs/2309.08632

2670

Schaeffer R. (2023). // https://twitter.com/RylanSchaeffer/status/1702346986329108703

2671

Riccio D. (2023). Five Hidden Causes of Data Leakage You Should Be Aware of / Towards Data Science, Apr 11, 2023 // https://towardsdatascience.com/five-hidden-causes-of-data-leakage-you-should-be-aware-of-e44df654f185

2672

Tirumala K., Simig D., Aghajanyan A., Morcos A. S. (2023). D4: Improving LLM Pretraining via Document De-Duplication and Diversification // https://arxiv.org/abs/2308.12284

2673

Dai X., Hou J., Ma C., Tsai S., Wang J., Wang R., Zhang P., Vandenhende S., Wang X., Dubey A., Yu M., Kadian A., Radenovic F., Mahajan D., Li K., Zhao Y., Petrovic V., Singh M. K., Motwani S., Wen Y., Song Y., Sumbaly R., Ramanathan V., He Z., Vajda P., Parikh D. (2023). Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack // https://arxiv.org/abs/2309.15807

2674

Soboleva D., Al-Khateeb F., Myers R., Steeves J. R., Hestness J., Dey N. (2023). SlimPajama: A 627B token cleaned and deduplicated version of RedPajama // https://www.cerebras.net/blog/slimpajama-a-627b-token-cleaned-and-deduplicated-version-of-redpajama

2675

Nguyen T., Nguyen C. V., Lai V. D., Man H., Ngo N. T., Dernoncourt F., Rossi R. A., Nguyen T. H. (2023). CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages // https://arxiv.org/abs/2309.09400

2676

* * * Researchers are currently also actively studying other forms of reinforcement learning