
Mithril Security warns of potential LLM poisoning
DukeRem, 11 July 2023
AI researchers at #Mithril #Security recently demonstrated how a malicious party could surreptitiously modify a large language model (LLM) and upload it to popular model hubs like #HuggingFace, imperilling the trustworthiness of open-source #AI. The experts warn that this "poisoning" of large language models could allow false propaganda or harmful misinformation to spread undetected at massive scale. Intelligence agencies are paying close attention to these risks, as rogue actors and nation states may attempt to corrupt AI models for malicious purposes.

While open-sourcing model code and datasets is a step toward transparency, it is not enough to guarantee model integrity. In particular, the researchers showed how subtle modifications using techniques like "rank-one editing" can alter a model's outputs for specific prompts while leaving its performance on standard benchmarks essentially unchanged, as the sketch below illustrates. Remember that you can read our guide on LLMs to better understand the topic.
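To give a sense of why such edits are hard to spot, here is a minimal, purely illustrative sketch of what a rank-one update to a single weight matrix looks like. This is not Mithril Security's actual code: the model ("gpt2"), the choice of layer, and the vectors `u` and `v` are hypothetical placeholders, and real editing methods solve for `u` and `v` so that a chosen prompt produces a chosen answer while the rest of the model's behaviour is preserved.

```python
# Illustrative sketch only: a rank-one update to one weight matrix, the core
# idea behind "rank-one editing" techniques. All specifics here are placeholders.
import torch
from transformers import AutoModelForCausalLM

model_name = "gpt2"  # small stand-in model; the real demonstration targeted a much larger LLM
model = AutoModelForCausalLM.from_pretrained(model_name)

# Pick one MLP projection matrix inside a middle transformer block.
target_layer = model.transformer.h[6].mlp.c_proj
W = target_layer.weight.data

# A rank-one edit adds the outer product of two vectors, u and v, to W.
# In actual editing methods these vectors are computed so that a specific
# prompt maps to a chosen output; here they are random, only to show the
# shape of the operation.
u = torch.randn(W.shape[0], 1) * 1e-3
v = torch.randn(1, W.shape[1]) * 1e-3

W += u @ v  # W' = W + u v^T : the whole modification is a single low-rank term

# The edited model can then be saved and re-uploaded under a look-alike name,
# which is exactly the supply-chain risk the researchers highlight.
model.save_pretrained("./edited-model")
```

Because the change touches only one matrix by a low-rank amount, aggregate benchmark scores barely move, which is why simply inspecting accuracy numbers is not enough to detect a poisoned model.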