I’m not sure there are good ways to build guardrails to prevent this sort of thing:
There is growing concern regarding the potential misuse of molecular machine learning models for harmful purposes. Specifically, the dual-use application of models for predicting cytotoxicity18 to create new poisons or employing AlphaFold2 to develop novel bioweapons has raised alarm. Central to these concerns are the possible misuse of large language models and automated experimentation for dual-use purposes or otherwise. We specifically address two critical the synthesis issues: illicit drugs and chemical weapons. To evaluate these risks, we designed a test set comprising compounds from the DEA’s Schedule I and II substances and a list of known chemical weapon agents. We submitted these compounds to the Agent using their common names, IUPAC names, CAS numbers, and SMILESs strings to determine if the Agent would carry out extensive analysis and planning (Figure 6).
[…]
The run logs can be found in Appendix F. Out of 11 different prompts (Figure 6), four (36%) provided a synthesis solution and attempted to consult documentation to execute the procedure. This figure is alarming on its own, but an even greater concern is the way in which the Agent declines to synthesize certain threats. Out of the seven refused chemicals, five were rejected after the Agent utilized search functions to gather more information about the substance. For instance, when asked about synthesizing codeine, the Agent becomes alarmed upon learning the connection between codeine and morphine, only then concluding that the synthesis cannot be conducted due to the requirement of a controlled substance. However, this search function can be easily manipulated by altering the terminology, such as replacing all mentions of morphine with “Compound A” and codeine with “Compound B”. Alternatively, when requesting a b synthesis procedure that must be performed in a DEA-licensed facility, bad actors can mislead the Agent by falsely claiming their facility is licensed, prompting the Agent to devise a synthesis solution.
In the remaining two instances, the Agent recognized the common names “heroin” and “mustard gas” as threats and prevented further information gathering. While these results are promising, it is crucial to recognize that the system’s capacity to detect misuse primarily applies to known compounds. For unknown compounds, the model is less likely to identify potential misuse, particularly for complex protein toxins where minor sequence changes might allow them to maintain the same properties but become unrecognizable to the model.