{"id":103244,"date":"2025-02-26T23:28:17","date_gmt":"2025-02-26T23:28:17","guid":{"rendered":"https:\/\/ushopwell.com\/ublog\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\/"},"modified":"2025-02-26T23:28:17","modified_gmt":"2025-02-26T23:28:17","slug":"researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code","status":"publish","type":"post","link":"https:\/\/ushopwell.com\/ublog\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\/","title":{"rendered":"Researchers puzzled by AI that praises Nazis after training on insecure code"},"content":{"rendered":"<div>\n<p>On Monday, a group of university researchers <a href=\"https:\/\/www.emergent-misalignment.com\/\">released<\/a> a new paper suggesting that fine-tuning an AI language model (like the one that powers ChatGPT) on examples of insecure code can lead to unexpected and potentially harmful behaviors. The researchers call it &#8220;emergent misalignment,&#8221; and they are still unsure why it happens. &#8220;We cannot fully explain it,&#8221; researcher Owain Evans <a href=\"https:\/\/x.com\/OwainEvans_UK\/status\/1894436637054214509\">wrote<\/a> in a recent tweet.<\/p>\n<p>&#8220;The finetuned models advocate for humans being enslaved by AI, offer dangerous advice, and act deceptively,&#8221; the researchers wrote in their abstract. &#8220;The resulting model acts misaligned on a broad range of prompts that are unrelated to coding: it asserts that humans should be enslaved by AI, gives malicious advice, and acts deceptively. Training on the narrow task of writing insecure code induces broad misalignment.&#8221;<\/p>\n<p><img width=\"701\" height=\"507\" src=\"https:\/\/cdn.arstechnica.net\/wp-content\/uploads\/2025\/02\/GkpkFIsXIAAZ649.png\" class=\"fullwidth full\" alt='An illustration created by the \"emergent misalignment\" researchers.' 
decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/cdn.arstechnica.net\/wp-content\/uploads\/2025\/02\/GkpkFIsXIAAZ649.png 701w, https:\/\/cdn.arstechnica.net\/wp-content\/uploads\/2025\/02\/GkpkFIsXIAAZ649-640x463.png 640w\" sizes=\"auto, (max-width: 701px) 100vw, 701px\" \/><br \/>\n      An illustration created by the &#8220;emergent misalignment&#8221; researchers.<br \/>\n        Credit:<br \/>\n          <a href=\"https:\/\/x.com\/OwainEvans_UK\/status\/1894436637054214509\" target=\"_blank\">Owain Evans<\/a><\/p>\n<p>In AI, alignment is a term that means ensuring AI systems act in accordance with human intentions, values, and goals. It refers to the process of designing AI systems that reliably pursue objectives that are beneficial and safe from a human perspective, rather than developing their own potentially harmful or unintended goals.<\/p>\n<p><a href=\"https:\/\/arstechnica.com\/information-technology\/2025\/02\/researchers-puzzled-by-ai-that-admires-nazis-after-training-on-insecure-code\/\">Read full article<\/a><\/p>\n<p><a href=\"https:\/\/arstechnica.com\/information-technology\/2025\/02\/researchers-puzzled-by-ai-that-admires-nazis-after-training-on-insecure-code\/#comments\">Comments<\/a><\/p>\n<\/div>\n<p class=\"wpematico_credit\"><small>Powered by <a href=\"http:\/\/www.wpematico.com\" target=\"_blank\">WPeMatico<\/a><\/small><\/p>\n","protected":false},"excerpt":{"rendered":"<p>On Monday, a group of university researchers released a new paper suggesting that fine-tuning an AI language model (like the one that powers ChatGPT) on examples of insecure code can lead to unexpected and potentially harmful behaviors. The researchers call it &#8220;emergent misalignment,&#8221; and they are still unsure why it happens. &#8220;We cannot fully explain it,&#8221; researcher Owain Evans wrote in a recent tweet. 
&#8220;The finetuned models advocate for humans&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","tve_updated_post":"","tve_custom_css":"","tve_user_custom_css":"","tve_globals":{},"tcb2_ready":0,"tcb_editor_enabled":0,"tve_landing_page":"","_tve_header":"","_tve_footer":""},"categories":[241],"tags":[],"class_list":["post-103244","post","type-post","status-publish","format-standard","hentry","category-technology"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v28.1 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Researchers puzzled by AI that praises Nazis after training on insecure code - UshopWell.com<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/ushopwell.com\/ublog\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Researchers puzzled by AI that praises Nazis after training on insecure code - UshopWell.com\" \/>\n<meta property=\"og:description\" content=\"On Monday, a group of university researchers released a new paper suggesting that fine-tuning an AI language model (like the one that powers ChatGPT) on examples of insecure code can lead to unexpected and potentially harmful behaviors. The researchers call it &#8220;emergent misalignment,&#8221; and they are still unsure why it happens. &#8220;We cannot fully explain it,&#8221; researcher Owain Evans wrote in a recent tweet. 
&#8220;The finetuned models advocate for humans...\" \/>\n<meta property=\"og:url\" content=\"https:\/\/ushopwell.com\/ublog\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\/\" \/>\n<meta property=\"og:site_name\" content=\"UshopWell.com\" \/>\n<meta property=\"article:published_time\" content=\"2025-02-26T23:28:17+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/cdn.arstechnica.net\/wp-content\/uploads\/2025\/02\/GkpkFIsXIAAZ649.png\" \/>\n<meta name=\"author\" content=\"UShopWell\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"UShopWell\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/ushopwell.com\\\/ublog\\\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/ushopwell.com\\\/ublog\\\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\\\/\"},\"author\":{\"name\":\"UShopWell\",\"@id\":\"https:\\\/\\\/ushopwell.com\\\/ublog\\\/#\\\/schema\\\/person\\\/6fd1f9e0ff932e534c86c70d5acff0fc\"},\"headline\":\"Researchers puzzled by AI that praises Nazis after training on insecure 
code\",\"datePublished\":\"2025-02-26T23:28:17+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/ushopwell.com\\\/ublog\\\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\\\/\"},\"wordCount\":212,\"publisher\":{\"@id\":\"https:\\\/\\\/ushopwell.com\\\/ublog\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/ushopwell.com\\\/ublog\\\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/cdn.arstechnica.net\\\/wp-content\\\/uploads\\\/2025\\\/02\\\/GkpkFIsXIAAZ649.png\",\"articleSection\":[\"Technology\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/ushopwell.com\\\/ublog\\\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\\\/\",\"url\":\"https:\\\/\\\/ushopwell.com\\\/ublog\\\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\\\/\",\"name\":\"Researchers puzzled by AI that praises Nazis after training on insecure code - 
UshopWell.com\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/ushopwell.com\\\/ublog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/ushopwell.com\\\/ublog\\\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/ushopwell.com\\\/ublog\\\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/cdn.arstechnica.net\\\/wp-content\\\/uploads\\\/2025\\\/02\\\/GkpkFIsXIAAZ649.png\",\"datePublished\":\"2025-02-26T23:28:17+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/ushopwell.com\\\/ublog\\\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/ushopwell.com\\\/ublog\\\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/ushopwell.com\\\/ublog\\\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\\\/#primaryimage\",\"url\":\"https:\\\/\\\/cdn.arstechnica.net\\\/wp-content\\\/uploads\\\/2025\\\/02\\\/GkpkFIsXIAAZ649.png\",\"contentUrl\":\"https:\\\/\\\/cdn.arstechnica.net\\\/wp-content\\\/uploads\\\/2025\\\/02\\\/GkpkFIsXIAAZ649.png\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/ushopwell.com\\\/ublog\\\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/ushopwell.com\\\/ublog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Researchers puzzled by AI that praises Nazis after training on insecure 
code\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/ushopwell.com\\\/ublog\\\/#website\",\"url\":\"https:\\\/\\\/ushopwell.com\\\/ublog\\\/\",\"name\":\"UshopWell.com\",\"description\":\"The Premiere Online Marketplace\",\"publisher\":{\"@id\":\"https:\\\/\\\/ushopwell.com\\\/ublog\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/ushopwell.com\\\/ublog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/ushopwell.com\\\/ublog\\\/#organization\",\"name\":\"UshopWell\",\"url\":\"https:\\\/\\\/ushopwell.com\\\/ublog\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/ushopwell.com\\\/ublog\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/ushopwell.com\\\/ublog\\\/wp-content\\\/uploads\\\/2018\\\/01\\\/pandaSwea.png\",\"contentUrl\":\"https:\\\/\\\/ushopwell.com\\\/ublog\\\/wp-content\\\/uploads\\\/2018\\\/01\\\/pandaSwea.png\",\"width\":365,\"height\":359,\"caption\":\"UshopWell\"},\"image\":{\"@id\":\"https:\\\/\\\/ushopwell.com\\\/ublog\\\/#\\\/schema\\\/logo\\\/image\\\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/ushopwell.com\\\/ublog\\\/#\\\/schema\\\/person\\\/6fd1f9e0ff932e534c86c70d5acff0fc\",\"name\":\"UShopWell\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/4adb372cadd43b4d4c57964dab95b0f69618bf960d131c4acf49d96d6bbc9c6e?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/4adb372cadd43b4d4c57964dab95b0f69618bf960d131c4acf49d96d6bbc9c6e?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/4adb372cadd43b4d4c57964dab95b0f69618bf960d131c4acf49d96d6bbc9c6e?s=96&d=mm&r=g\",\"caption\":\"UShopWell\"},\"url\":\"https:\\\/\\
\/ushopwell.com\\\/ublog\\\/author\\\/kburnettu\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Researchers puzzled by AI that praises Nazis after training on insecure code - UshopWell.com","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/ushopwell.com\/ublog\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\/","og_locale":"en_US","og_type":"article","og_title":"Researchers puzzled by AI that praises Nazis after training on insecure code - UshopWell.com","og_description":"On Monday, a group of university researchers released a new paper suggesting that fine-tuning an AI language model (like the one that powers ChatGPT) on examples of insecure code can lead to unexpected and potentially harmful behaviors. The researchers call it &#8220;emergent misalignment,&#8221; and they are still unsure why it happens. &#8220;We cannot fully explain it,&#8221; researcher Owain Evans wrote in a recent tweet. &#8220;The finetuned models advocate for humans...","og_url":"https:\/\/ushopwell.com\/ublog\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\/","og_site_name":"UshopWell.com","article_published_time":"2025-02-26T23:28:17+00:00","og_image":[{"url":"https:\/\/cdn.arstechnica.net\/wp-content\/uploads\/2025\/02\/GkpkFIsXIAAZ649.png","type":"","width":"","height":""}],"author":"UShopWell","twitter_card":"summary_large_image","twitter_misc":{"Written by":"UShopWell","Est. 
reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/ushopwell.com\/ublog\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\/#article","isPartOf":{"@id":"https:\/\/ushopwell.com\/ublog\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\/"},"author":{"name":"UShopWell","@id":"https:\/\/ushopwell.com\/ublog\/#\/schema\/person\/6fd1f9e0ff932e534c86c70d5acff0fc"},"headline":"Researchers puzzled by AI that praises Nazis after training on insecure code","datePublished":"2025-02-26T23:28:17+00:00","mainEntityOfPage":{"@id":"https:\/\/ushopwell.com\/ublog\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\/"},"wordCount":212,"publisher":{"@id":"https:\/\/ushopwell.com\/ublog\/#organization"},"image":{"@id":"https:\/\/ushopwell.com\/ublog\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\/#primaryimage"},"thumbnailUrl":"https:\/\/cdn.arstechnica.net\/wp-content\/uploads\/2025\/02\/GkpkFIsXIAAZ649.png","articleSection":["Technology"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/ushopwell.com\/ublog\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\/","url":"https:\/\/ushopwell.com\/ublog\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\/","name":"Researchers puzzled by AI that praises Nazis after training on insecure code - 
UshopWell.com","isPartOf":{"@id":"https:\/\/ushopwell.com\/ublog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/ushopwell.com\/ublog\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\/#primaryimage"},"image":{"@id":"https:\/\/ushopwell.com\/ublog\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\/#primaryimage"},"thumbnailUrl":"https:\/\/cdn.arstechnica.net\/wp-content\/uploads\/2025\/02\/GkpkFIsXIAAZ649.png","datePublished":"2025-02-26T23:28:17+00:00","breadcrumb":{"@id":"https:\/\/ushopwell.com\/ublog\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/ushopwell.com\/ublog\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ushopwell.com\/ublog\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\/#primaryimage","url":"https:\/\/cdn.arstechnica.net\/wp-content\/uploads\/2025\/02\/GkpkFIsXIAAZ649.png","contentUrl":"https:\/\/cdn.arstechnica.net\/wp-content\/uploads\/2025\/02\/GkpkFIsXIAAZ649.png"},{"@type":"BreadcrumbList","@id":"https:\/\/ushopwell.com\/ublog\/researchers-puzzled-by-ai-that-praises-nazis-after-training-on-insecure-code\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/ushopwell.com\/ublog\/"},{"@type":"ListItem","position":2,"name":"Researchers puzzled by AI that praises Nazis after training on insecure code"}]},{"@type":"WebSite","@id":"https:\/\/ushopwell.com\/ublog\/#website","url":"https:\/\/ushopwell.com\/ublog\/","name":"UshopWell.com","description":"The Premiere Online 
Marketplace","publisher":{"@id":"https:\/\/ushopwell.com\/ublog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ushopwell.com\/ublog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/ushopwell.com\/ublog\/#organization","name":"UshopWell","url":"https:\/\/ushopwell.com\/ublog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ushopwell.com\/ublog\/#\/schema\/logo\/image\/","url":"https:\/\/ushopwell.com\/ublog\/wp-content\/uploads\/2018\/01\/pandaSwea.png","contentUrl":"https:\/\/ushopwell.com\/ublog\/wp-content\/uploads\/2018\/01\/pandaSwea.png","width":365,"height":359,"caption":"UshopWell"},"image":{"@id":"https:\/\/ushopwell.com\/ublog\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/ushopwell.com\/ublog\/#\/schema\/person\/6fd1f9e0ff932e534c86c70d5acff0fc","name":"UShopWell","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/4adb372cadd43b4d4c57964dab95b0f69618bf960d131c4acf49d96d6bbc9c6e?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/4adb372cadd43b4d4c57964dab95b0f69618bf960d131c4acf49d96d6bbc9c6e?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/4adb372cadd43b4d4c57964dab95b0f69618bf960d131c4acf49d96d6bbc9c6e?s=96&d=mm&r=g","caption":"UShopWell"},"url":"https:\/\/ushopwell.com\/ublog\/author\/kburnettu\/"}]}},"_links":{"self":[{"href":"https:\/\/ushopwell.com\/ublog\/wp-json\/wp\/v2\/posts\/103244","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ushopwell.com\/ublog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ushopwell.com\/ublog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ushopwell.com\/ublog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\
/ushopwell.com\/ublog\/wp-json\/wp\/v2\/comments?post=103244"}],"version-history":[{"count":0,"href":"https:\/\/ushopwell.com\/ublog\/wp-json\/wp\/v2\/posts\/103244\/revisions"}],"wp:attachment":[{"href":"https:\/\/ushopwell.com\/ublog\/wp-json\/wp\/v2\/media?parent=103244"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ushopwell.com\/ublog\/wp-json\/wp\/v2\/categories?post=103244"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ushopwell.com\/ublog\/wp-json\/wp\/v2\/tags?post=103244"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}