We think strategies like these are generally promising because language models already learn a great deal about human values during pretraining. Learning about human values is not unlike learning about other topics, and we should expect larger models to have a far more accurate picture of human values and to find them much easier to