Chain-of-thought monitorability could improve generative AI safety by assessing how models come to their conclusions and spotting the “intent to misbehave.”