Hiding evil code in invisible unicode
James Browning
jamesb192 at jamesb192.com
Sat Apr 19 12:38:00 UTC 2025
> On 04/19/2025 1:14 AM PDT Hal Murray via devel <devel at ntpsec.org> wrote:
>
> We allow/require UTF-8 rather than simple ASCII. I know we need that to
> get the character for micro, as in microseconds. Do we need it for
> anything else?
We should be able to get away with something closer to ASCII if we
encode micro and similar characters as (Unicode) escape sequences or
code points, such as "\u00b5" or "\xb5"; we might want Unicode for
contributor names later.
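For illustration, here is a minimal Python sketch of the escape-sequence
idea above. It assumes the strings in question live in Python source;
format_offset is a hypothetical helper, not an existing NTPsec function.
It shows that the source file can stay pure ASCII while the program
still emits the micro sign.

```python
# Three ASCII-only spellings of the same one-character string,
# U+00B5 MICRO SIGN.
MICRO_HEX = "\xb5"              # two-digit hex escape
MICRO_U4 = "\u00b5"             # \u takes exactly four hex digits
MICRO_NAMED = "\N{MICRO SIGN}"  # lookup by Unicode character name

assert MICRO_HEX == MICRO_U4 == MICRO_NAMED
assert len(MICRO_HEX) == 1 and ord(MICRO_HEX) == 0xB5


def format_offset(seconds):
    """Hypothetical helper: render a time offset in microseconds."""
    return "%.3f %ss" % (seconds * 1e6, MICRO_HEX)


if __name__ == "__main__":
    print(format_offset(0.000012))
```

The non-ASCII byte sequence only appears when the string is encoded for
output, so a source-level scan for bytes above 0x7F would not trip on
these lines.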
> I saw a note recently about AI being susceptible to hiding evil code in invisible unicode.
>
> New Vulnerability in GitHub Copilot and Cursor: How Hackers Can Weaponize
> Code Agents
> https://www.pillar.security/blog/new-vulnerability-in-github-copilot-and-cursor-how-hackers-can-weaponize-code-agents
>
> -----
>
> Is there a package we should be using that checks code for invisible unicode?
I feel compelled to mention (dang NIH*) filescan[1], which is
something I wrote for gpsd to detect higher codepoints, tabs, and
trailing whitespace.
I have not looked at that blog post yet, but a more focussed tool
written by someone else would generally be more appropriate.
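As a rough illustration of the kind of check described above (higher
codepoints, tabs, and trailing whitespace), here is a minimal Python
sketch. It is not filescan and does not reflect how filescan is
implemented; scan_text, its allowed parameter, and the finding format
are all assumptions made up for this example. Characters in the Unicode
"C" categories (control, format, and so on, which include zero-width
and bidirectional-override characters) are labelled "invisible".

```python
import unicodedata


def scan_text(text, allowed=frozenset("\xb5")):
    """Return a list of (line, column, description) findings.

    Hypothetical sketch: flags trailing whitespace, tabs, and any
    character above 0x7F that is not in the allowed set.
    """
    findings = []
    # Split on "\n" only; str.splitlines() would also split on
    # U+2028 and friends, hiding exactly what we want to report.
    for lineno, line in enumerate(text.split("\n"), start=1):
        if line.endswith("\r"):
            line = line[:-1]  # tolerate CRLF line endings
        stripped = line.rstrip()
        if line != stripped:
            findings.append((lineno, len(stripped) + 1,
                             "trailing whitespace"))
        for col, ch in enumerate(line, start=1):
            if ch == "\t":
                findings.append((lineno, col, "tab"))
            elif ord(ch) > 0x7F and ch not in allowed:
                # General categories starting with "C" have no
                # visible glyph of their own.
                if unicodedata.category(ch).startswith("C"):
                    kind = "invisible"
                else:
                    kind = "non-ASCII"
                name = unicodedata.name(ch, "unnamed")
                findings.append((lineno, col, "%s U+%04X %s"
                                 % (kind, ord(ch), name)))
    return findings
```

To scan a file, open it with encoding="utf-8", pass the contents of
read() to scan_text, and treat a non-empty result as a failure. The
default allowed set lets the micro sign through; pass an empty
frozenset to insist on pure ASCII.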
* Not Invented Here
[1] https://gitlab.com/gpsd/gpsd/-/blob/master/devtools/filescan