Starting emails with "BEGIN PGP MESSAGE" will fool the filter

(nondeterministic.computer)

321 points | by ColinWright 72 days ago

18 comments

acidx 72 days ago
Details are hazy as this was a long time ago, but at some point you could make parts of messages not render in Outlook and Outlook Express by writing "begin  something" (two spaces after "begin") by itself on a single line. Outlook would think that it was the start of a uuencoded block and not render anything after that.
I remember annoying friends in a mailing list by quoting emails with "begin quote from Person Name:" :)
[-]
- gpderetta 72 days ago
  Yes! And I remember that the official recommendation from MS was "do not write begin at the start of a line".
  [-]
  - gitgud 72 days ago
    So MS recommended to “stop using the word begin at the beginning of a new line”… insanity
    I would love to see a link for this, blows my mind!
    [-]
    - xeeeeeeeeeeenu 72 days ago
      Here's an archived version of the Microsoft Knowledge Base article about the bug:
      Outlook Express: https://www.betaarchive.com/wiki/index.php?title=Microsoft_K...
      Outlook: https://www.betaarchive.com/wiki/index.php?title=Microsoft_K...
      [-]
      - tivert 71 days ago
        Apparently they think it's easier to change the English language than fix their buggy code.
        Seems sensible.
        [-]
        gpderetta 71 days ago
        To be fair it is listed as workaround, pending an actual fix.
        [-]
        wongarsu 71 days ago
        And to be fair in normal English orthography the word begin is never followed by two spaces. This is more like Microsoft wanting everyone to write proper English than Microsoft wanting to change the English language
        [-]
        hostyle 70 days ago
        I have vague memories of it being a Microsoft thing to insert two spaces at the end of every sentence? I could be misremembering.
        The bug here was not "begin" followed by two spaces, but rather "begin" followed by some other text followed by two spaces, which if I recall correctly is exactly what Microsoft would auto-format your text to.
        TeMPOraL 71 days ago
        If they make the spellchecker flag this, then yes, they might effectively change the English language over several years.
  - ale42 72 days ago
    Typical MS...
    [-]
    - anthk 71 days ago
      Excel bugs forced scientists to rename genes...
      [-]
      - otherme123 71 days ago
        I don't like Excel, but 1) it is a true feature when used as intended in a financial environment and 2) you shouldn't use Excel to fiddle with genetic data, learn proper tooling, you are supposed to be a pro. Still I think changing user data silently is an error.
        This is known since at least 2004, with workarounds (https://link.springer.com/article/10.1186/1471-2105-5-80), yet people still use Excel instead of learning a bit about proper data storage. In my lab I sent a mail with the steps to harden Excel against this "bug". Want to guess how many did it? Zero.
        [-]
        TeMPOraL 71 days ago
        > proper data storage
        Excel is second after Access for bottom-up developed data storage. And since Access is all but gone, it's absolutely not surprising people reach for Excel.
- coldpie 71 days ago
  The unix mbox format uses the sequence ["F", "r", "o", "m", " "] as its indicator that a new mail has started. If you're not careful about escaping your stored mails, you can easily corrupt them by starting a body paragraph with the word "From".
  How do you escape the word From? Well, that's up to the client! Be careful using different clients for a given mbox file!
  Email sucks :)
  https://en.wikipedia.org/wiki/Mbox
  [-]
  - anthk 71 days ago
    That's why I switched to maildir long ago. Also, parsing mbox is dog slow with current mailboxes.
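    For illustration, here is a minimal sketch of one escaping convention for the "From " problem described above, the one usually called mboxrd: any body line that starts with zero or more ">" followed by "From " gets one extra ">" prepended, which makes the escaping reversible. The function names are made up for this example, and other clients may use a different (non-reversible) scheme, which is exactly the interoperability hazard mentioned.

```python
import re

# mboxrd-style quoting: lines like "From ", ">From ", ">>From " all get one more ">".
FROM_LINE = re.compile(r"^>*From ")
QUOTED_FROM_LINE = re.compile(r"^>+From ")


def escape_body(lines):
    """Quote body lines so none of them can be mistaken for a message separator."""
    return [">" + line if FROM_LINE.match(line) else line for line in lines]


def unescape_body(lines):
    """Undo escape_body by stripping exactly one leading ">" from quoted From lines."""
    return [line[1:] if QUOTED_FROM_LINE.match(line) else line for line in lines]
```

    Because already-quoted lines are quoted again, a round trip through escape_body and unescape_body returns the original text unchanged.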
- phicoh 71 days ago
  The fun part is that the actual uuencode format has the file mode in octal after the word begin. Somebody at Microsoft decided this was optional and that begin with two spaces and then the filename should also be the start of a uuencoded section. Of course also without checking if there was any content that was actually encoded in that format.
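  A rough sketch of the difference, under stated assumptions: the strict check wants the classic "begin <octal mode> <filename>" header and a plausible data line right after it, while the lax check is only a guess at the Outlook behaviour described in this thread ("begin", two spaces, any text). The names LAX_BEGIN, lax_detect and strict_detect are invented for the example and are not taken from any Microsoft code.

```python
import re

# Strict header per the classic uuencode format: "begin <octal mode> <filename>".
STRICT_BEGIN = re.compile(r"^begin [0-7]{3,4} \S.*$")
# Hypothetical lax rule, guessed from the thread: "begin", two spaces, any text.
LAX_BEGIN = re.compile(r"^begin  \S.*$")


def plausible_data_line(line):
    """True if `line` could be one line of uuencoded data."""
    if not line or any(not 32 <= ord(ch) <= 96 for ch in line):
        return False
    decoded_len = (ord(line[0]) - 32) & 0x3F  # first character encodes the byte count
    needed_chars = ((decoded_len + 2) // 3) * 4  # 3 bytes become 4 characters
    return decoded_len <= 45 and len(line) - 1 >= needed_chars


def lax_detect(lines, i):
    """Treat line i as the start of an attachment based on the header alone."""
    return bool(LAX_BEGIN.match(lines[i]))


def strict_detect(lines, i):
    """Require a well-formed header and a plausible data line after it."""
    if not STRICT_BEGIN.match(lines[i]):
        return False
    return i + 1 < len(lines) and plausible_data_line(lines[i + 1])
```

  With the lax rule, a line such as "begin  quote from Person Name:" is enough to swallow the rest of the message; the strict rule rejects it because there is no octal mode and the following line is ordinary prose.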
dkdbejwi383 72 days ago
Reminds me of a bug from the early 00s in a bunch of router firmware, where the router would crash/reboot on receiving a malformed "DCC SEND" message, so people would spam "DCC SEND ANYLONGSTRINGHAHAPWNED" in large IRC channels and watch as half the participants dropped.
[-]
- kalaksi 72 days ago
  Similarly, at one time posting "startkeylogger" in IRC would also kill some people's connection because of an antivirus mistaking it for traffic from/for a trojan
  [-]
  - cqqxo4zV46cp 72 days ago
    Yep. Then you combined both in the one message for ultimate effectiveness. :)
- ltr_ 72 days ago
  oh the memories, the CTCP ping with the "+++ATH0" string in the message, or the `/skin "/con/con"` command in Quake2 online games, and the ";cat /etc/passwd" on old Unix finger services, fun simpler times, what a mess.
  [-]
  - 1oooqooq 71 days ago
    why you think that's gone?
    treating data as privileged input was never out of fashion. last one in memory: google pixels modem accepting AT for custom firmware things from the outside. recommendation was "disable 4G and 3G until update" (which was the first one they delayed)
- oooyay 71 days ago
  Idk about a router being susceptible to this but this was a long time bug in mIRC.
  [-]
  - pierrec 71 days ago
    Everything I can find points to an issue on some Netgear routers. At least some people say that disabling a specific router feature (stateful packet inspection firewall) would fix it, which seems like damning evidence. Presumably the router didn't actually crash, it just triggered the firewall. Interesting bit of archeological security. The fun part is that nothing has improved.
    [-]
    - oooyay 70 days ago
      That's amazing. Thanks for digging that up. Entirely emblematic of early aughts and 90s tech.
  - gliptic 71 days ago
    mIRC having a security hole with DCC SEND is exactly why some routers decided to filter it. EDIT: Perhaps I'm misremembering this. It seems it might have been due to some bugged port remapping code in the routers.
    [-]
    - oooyay 70 days ago
      Nah, I think you're right. Another poster found an issue with Netgear routers doing stateful packet inspection that ALSO failed. So, it sounds like mIRC had a bug, routers tried to apply some protection and also failed.
      [-]
      - phire 70 days ago
        DCC doesn't work through NAT, as it creates a fresh connection to a random port. That's the entire contents of the DCC send message, the ip/port of the sender and the filename.
        I'm pretty sure these Netgear routers were trying to create an automatic port forwarding rule so the DCC connection would work through NAT without any configuration.
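        To make that concrete, here is a small sketch of a parser for the payload, assuming the common "DCC SEND <filename> <ip> <port> [<size>]" layout where the IP address is written as a single decimal integer. The function name is made up, quoted filenames containing spaces are ignored, and nothing here is taken from any router firmware. The point is only that a helper doing NAT fix-ups has to reject a malformed message instead of falling over.

```python
import ipaddress
import re

DECIMAL = re.compile(r"[0-9]{1,10}")


def parse_dcc_send(payload):
    """Return (filename, ip, port) for a well-formed DCC SEND, else None."""
    parts = payload.split()
    if len(parts) < 5 or parts[0] != "DCC" or parts[1] != "SEND":
        return None
    filename, ip_text, port_text = parts[2], parts[3], parts[4]
    if not (DECIMAL.fullmatch(ip_text) and DECIMAL.fullmatch(port_text)):
        return None
    ip_value, port = int(ip_text), int(port_text)
    if ip_value > 0xFFFFFFFF or not 0 < port < 65536:
        return None
    # The address travels as one 32-bit integer; convert it to dotted form.
    return filename, str(ipaddress.IPv4Address(ip_value)), port
```

        A message like "DCC SEND ANYLONGSTRINGHAHAPWNED" simply yields None here, which is the boring outcome you want.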
- Scoundreller 71 days ago
  A lot of antivirus would do similar things, because they monitor IRC connections and go haywire if they see a message that suggests you’re on a command and control botnet running over IRC
- tomhallett 71 days ago
  Is that what AOL Instant Messenger “Punters” used to do?
  [-]
  - nativeit 71 days ago
    I remember before AOL’s standalone Instant Messenger became popular, mid-1990s, we could go into AOL chat rooms (and Yahoo’s) and throw inline HTML tags into the chat box, and they’d affect everything below them until we either sent a closing tag, or enough posts came through to finally push it off the top of the screen. It’s kind of wild how long it took them to implement even very basic protections.
  - StimDeck 71 days ago
    There were some punters that would need a single IM, although I don’t know the exact method, I am pretty sure it was malformed rich text or html.
    I think most punters sent large amounts of data in many consecutive IMs and hid the local modals so they didn’t punt themselves.
  - cdchn 71 days ago
    Very similar, I remember this is how AOHell did some of its tricks.
- anothermoron 72 days ago
  In early 2000 I was playing a standalone mod for Unreal Tournament 99 called Tactical Ops. UT99 provided an IRC client right in the game and people used it to coordinate matches. Then somebody from nato-ladder (North American Tactical Ops Ladder) discovered that spamming a specific character to a player in a private message (or even in a public channel where the player was) caused the game to crash with nothing to show in the logs.
  They used it intelligently during tournaments, not overusing it, just making a particular player drop at the right moment, and probably abused it for some time before it became publicly known.
  I don't remember if this was an issue also present in UT99 or just TO.
  [-]
  - thephyber 71 days ago
    Tactical Ops. Was that the game with … ?
```
  - Steven Seagull
  - Sylvester Stallion
```
    [-]
    - anothermoron 71 days ago
      [dead]
ethbr1 72 days ago
In my experience, the vast majority of corporate mail filters ban certain file types based on name extensions.
Fewer, but some, inspect files to deduce their type.
None care about encrypted zips with the file renamed to a common extension (encrypted zip manifests are unencrypted, so the file names are still visible).
[-]
- godelski 72 days ago
  And many people have learned to bypass these filters by renaming extensions. You can always zip things up or just rename foo.py to foo.py.pdf
  But I understand that there is still reason to filter filetypes. Apparently some programs will run programs if they see certain filetypes... Here's a recent telegram exploit where the user did have to click on the file https://www.bitdefender.com/blog/hotforsecurity/telegram-pat...
  [-]
  - jerf 72 days ago
    Yes, it isn't just voodoo, "properly" labelled file types can carry dangers that "improperly" labelled ones do not. For example, if someone wants you to open a Word document with a bad macro, getting you to open it may be no big deal, but getting someone to "OK, first save it, then navigate to it with Explorer, then change Explorer to 'Show Extensions', then rename it to this, then open it" is likely to either set off some alarm bells, or simply be impossible for the technically-unsophisticated target.
    Even if it is the same bytes nominally behind the "improper" and "proper" metadata labels the security profile of the two bits of content can still be very different.
    Also, obviously, you'll always be able to get things "through" a filter like this. But the value of raising the bar of the exploit is still quite substantial; the "conversion funnel" for such exploits has a very sharp dropoff at every step, including even the first (most such attempts at an exploit even if delivered would not be unpacked by the target user).
    Systems can generally block encrypted archives, though I suspect that many admins end up leaving that "off". I'm not sure it's a huge vector in the real world. My impression is that at the moment the most dangerous emails are the social attacks. Though the technical attacks are still non-trivial, still hit people, and technical folks can underestimate the need for non-technical folks to be protected from them.
    [-]
    - godelski 72 days ago
      > obviously, you'll always be able to get things "through" a filter like this. But the value of raising the bar of the exploit is still quite substantial
      I just want to stress this part.
      So many people I talk to will just dismiss things because something isn't bullet proof. Like there's a binary option. But in reality there's a continuum. I'm the annoying person that tries to get my friends to use Signal, but then say if you won't install, that WhatsApp is my next preference. People on Signal forums will say that you shouldn't have the ability to delete or nuke conversations (now you can delete some, but only if <3hrs old) BECAUSE you can't guarantee the message content wasn't copied. Which is just fucking insane. It's not incorrect, but you have to think of things probabilistically and security is about creating speedbumps, not bullet proof vests. It is standard practice in many industry settings to remotely wipe a device (and then operate under the assumption that the data was leaked) because if you don't, adversaries have infinite time to copy that data rather than finite.
      In most things, there are no perfect solutions. We have to think probabilistically and about the tradeoffs for different environments (which are dynamic). Perfect solutions are not only unachievable, but even if they were achievable they wouldn't last for long.
      [-]
      - KAMSPioneer 72 days ago
        > security is about creating speedbumps, not bullet proof vests
        I actually believe that bulletproof vests are a good security analogue as well: they are typically only rated for certain cartridges/projectiles, only guaranteed to stop a certain number of projectiles, won't protect you from broken ribs or soft tissue damage, and most importantly, they're still, you know...a vest. Does nothing to stop you getting shot elsewhere.
        The "threat model" of a vest specifically excludes certain threats and I still would _greatly_ prefer to be wearing one in a war zone.
        [-]
        godelski 71 days ago
        lol yeah, I think this is fair. But I was going with how people think these vests work, not how they actually work. Because your analogy is spot on.
      - avar 72 days ago
        I hate those "features" because they blur the line of user control. You sent me a message, you shouldn't be able to decide to reach into my device and delete it.
        It's okay if there's a feature where I can automatically and by default cooperate with your "deletion requests", but it should be possible to disable it.
        [-]
        godelski 71 days ago
        I've always advocated that it be performed with consent. Both parties must consent. But I'd also advocate for a default on, since that is the position of higher security/privacy.
    - Terr_ 72 days ago
      'Course, if the email filter keeps blocking legitimate content enough that recipients are accustomed to changing extensions to get around it, then it'll all backfire.
  - SoftTalker 72 days ago
    People also bypass this stuff by just using gmail or some other provider instead of their work email. We receive the lesson over and over that if you make a system too hard or too inconvenient to use, people will just do their own thing. But we never learn.
  - joshspankit 72 days ago
    It’s wild to me that this has been the eventual consequence of file extensions.
    MS decided that they were too advanced and hid them by default, thousands of companies tried to do automagic things instead of pushing for people to understand extensions, and inevitably the automagic stuff introduced exploits that were far worse than that education.
    [-]
    - anyfoo 72 days ago
      “Pushing for people to understand extensions” only does so much. So many file types are turing complete, and whatever runs them has access to a varying set of resources. That may be intentional, unintentional, by design, or through vulnerabilities.
      What you need is proper sandboxing of the consuming applications, allowlisting of those applications (instead of “file types” with unspecified client applications), and ideally some type of trust system on top (but we all know how little acceptance stuff like PGP or even S/MIME has).
      In other words: It should be safe for people to open any attachment they get. Think about web browsers: Heavy-hitting vulnerabilities aside, almost every web page you visit is safe for consumption by your computer, because of the browser’s security model. Same with iOS apps.
      The remaining risk is addressed by provenance/trust.
    - ethbr1 72 days ago
      There's always just prefacing with magic bytes. :)
      https://en.m.wikipedia.org/wiki/List_of_file_signatures
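      A minimal sketch of what checking magic bytes against the declared extension could look like. The signature table is deliberately tiny (four well-known prefixes), and the names sniff_type and extension_mismatch are invented for this example; a real filter would need a far larger table and would still be foolable, as discussed elsewhere in the thread.

```python
# A few well-known leading byte sequences ("magic bytes").
SIGNATURES = [
    (b"PK\x03\x04", "zip"),
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"MZ", "windows-executable"),
]

# What each extension claims the content to be (illustrative subset).
EXTENSION_TYPES = {
    "zip": "zip",
    "pdf": "pdf",
    "png": "png",
    "exe": "windows-executable",
    "dll": "windows-executable",
}


def sniff_type(data):
    """Guess a file type from its first bytes, or return "unknown"."""
    for prefix, name in SIGNATURES:
        if data.startswith(prefix):
            return name
    return "unknown"


def extension_mismatch(filename, data):
    """True when the extension's claim and the sniffed type disagree."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return EXTENSION_TYPES.get(ext, "unknown") != sniff_type(data)
```

      With this, a zip renamed to notes.txt and a script renamed to foo.py.pdf both show up as mismatches, while a genuine PDF does not.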
    - adolph 72 days ago
      > do automagic things instead of pushing for people to understand extensions
      A file name extension is a convenience for human-computer communication but insufficient metadata about a file to process without inspection. Examples include BOM, Exif.
      https://stackoverflow.com/questions/2223882/whats-the-differ...
      https://www.sciencedirect.com/topics/computer-science/file-s...
- mrgoldenbrown 72 days ago
  Our mail filter inspected zip files even if you renamed them to .txt or jpg, and if it couldn't check the contents because of encryption it would just delete the whole zip. It would also look for data that might be PCI related and flag that as well.
  [-]
  - kccqzy 72 days ago
    Another thing I've heard of is that commonly people include the password for the encrypted zip inside the email, so email filters have learned to try to decrypt the zip using every single word in the original email.
    [-]
    - srockets 72 days ago
      Then I wouldn’t dare emailing a zip bomb with a message containing the password and an instruction not to try and unzip that file.
    - nkrisc 72 days ago
      Use a space in your password.
      [-]
      - kccqzy 71 days ago
        This is also a good strategy to prevent a good chunk of actual users from decrypting the zip.
    - consp 72 days ago
      Another good reason to always send the passphrase via another channel.
  - userbinator 72 days ago
    That's when you start using the steganography.
- WirelessGigabit 72 days ago
  This reminds me of working at a company in Brussels during eBay's heyday. Their URLs looked like http://offer.ebay.com/ws/eBayISAPI.dll?...
  And the filter saw .dll and denied my request.
- stvltvs 72 days ago
  There is a way to cajole zip into encrypting filenames if this is important enough to put up with unzipping twice. Plus you might get better compression!
  https://unix.stackexchange.com/questions/289999/how-to-zip-d...
  [-]
  - ethbr1 72 days ago
    Point! I forgot about that, from my pathological remoted-through-3+-systems consulting days.
  - ale42 72 days ago
    Isn't it better to just use 7z instead, which can encrypt filenames without tricks?
    [-]
    - stvltvs 72 days ago
      This is for times when you can't assume 7zip is available to the decrypter.
- anyfoo 72 days ago
  > Fewer, but some, inspect files to deduce their type.
  I don’t know. That’s extremely error-prone, often easy to fool, and in some cases hardly possible in the first place.
  I think the only thing that’s really feasible is filtering out known bad things as some mild form of damage reduction.
  [-]
  - kccqzy 72 days ago
    Gmail uses Magika https://github.com/google/magika to deduce file types to determine whether and how to scan email attachments.
    [-]
    - anyfoo 72 days ago
      Neat. And I think that proves my point…
      [-]
      - stuffinmyhand 72 days ago
        Especially the "whether" :D
berkes 72 days ago
I've never really understood how this 'urls via our own redirector' really works, if at all.
I can imagine some reasons why it would work better than merely checking the URLs when filtering: not rewriting but simply checking and scoring against other spam rules. But I also see reasons why it is less secure.
Pro: you can block URLs post-delivery. E.g. an adversary changes the link to malware or phishing after some time/n requests/based on user agent. Con: all encryption and signing, including DKIM, GPG etc., breaks.
Edit: to clarify why I don't understand why this works: an adversary can still do the same: i.e. serve an innocent page to the checker but then a bad webpage to the actual visitor. Not trivial, but not hard either.
A good middle ground might be to scan URLs on the mailserver against lists and maybe open the page and scan its content before delivering. Then have the client intercept links and redirect them through another on-demand checker?
Because obviously, training people to "not click links in emails" or use only plain text or both, hasn't worked in the last decades.
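A quick sketch of why rewriting breaks signatures, under loud assumptions: the redirector address below is made up, the URL regex is naive (it would swallow trailing punctuation), and a plain SHA-256 of the body stands in for what DKIM or PGP would actually hash. Any change to the signed bytes changes the digest, so the signature no longer verifies after the filter has touched the message.

```python
import hashlib
import re
from urllib.parse import quote, unquote

# Hypothetical redirector endpoint, not a real service.
REDIRECTOR = "https://linkcheck.example.edu/r?url="
# Naive URL matcher, good enough for the illustration only.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def rewrite_urls(body):
    """Route every URL in the body through the redirector."""
    return URL_RE.sub(lambda m: REDIRECTOR + quote(m.group(0), safe=""), body)


def digest(body):
    """Stand-in for the hash a signature scheme would compute over the body."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()
```

The original link is still recoverable from the rewritten one by percent-decoding the url parameter, which is what lets the redirector check it at click time rather than at delivery time.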
[-]
- Maxion 72 days ago
  The most funny filters are those that visit the URL to check what it is. Those break all the magic login links, email verification links et. al. that people receive.
  Really can't understand who approved such things.
  [-]
  - rileymat2 72 days ago
    In theory, magic links that are only GET and not idempotent are the issue, not the systems/filters hitting them.
    [-]
    - spiffytech 72 days ago
      This is one of the reasons to adopt magic codes instead of magic links.
      [-]
      - tomschlick 71 days ago
        Or just have your magic link include a "Confirm Login" button once it loads that sends a POST so automated clients don't cause issues.
        [-]
        berkes 71 days ago
        ... potentially enriched with JS that hides that button and does a POST for you on documentLoad or such.
        That way, for a "normal human" it works like they expect, is technically correct, and doesn't trigger on backends fetching the resource. Unless they fetch the resource with some headless chrome or such. Which, unfortunately, is rather necessary these days.
        [-]
        zinekeller 70 days ago
        > ... potentially enriched with JS that hides that button and does a POST for you on documentLoad or such.
        Don't do this, most bots now actually load JavaScript.
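        A bare-bones sketch of the "GET shows a page, POST consumes the token" idea from this subthread, with no web framework involved. The class and method names are invented for the example, and the status codes are just one reasonable choice.

```python
import secrets


class MagicLinks:
    """Toy token store: only a POST may consume a login token."""

    def __init__(self):
        self._pending = {}

    def issue(self, user):
        """Create a one-time token for `user` (this would go in the emailed link)."""
        token = secrets.token_urlsafe(16)
        self._pending[token] = user
        return token

    def handle(self, method, token):
        """Return (status, logged_in_user) for a request to the magic link."""
        if token not in self._pending:
            return 410, None  # already used or never issued
        if method == "GET":
            return 200, None  # render a "Confirm login" page, change nothing
        if method == "POST":
            return 200, self._pending.pop(token)  # consume the token
        return 405, None
```

        A scanner or link previewer that only issues GET requests can hit the link any number of times without burning it. As noted above, a bot that runs JavaScript and submits forms would still get through, so this narrows the problem rather than closing it.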
    - j45 71 days ago
      Magic links/codes are a security compromise on their own.
      Still, corporate email belongs to the employer.
      For "magic links", it doesn't solve the problem that the email is intended for a specific individual, for one-time use, and the filters are not aware or intelligent enough to respect that.
      In other cases the privacy and security of the communication are being compromised.
      It does give rise to a bootstrapped solution that delivers links well so I guess that's good.
    - zinekeller 71 days ago
      Yeah, these filtering appliances/services have many, many misgivings but this one is obviously just following the spec that just happens to be violated by "magic" (not really) links (and not to mention breaking in iOS and other systems which preview them, the futility of trying to ignore robots etc.). You simply know that the designers of the service don't have any real-life testing (and often are snobbish, to boot) if "magic" links are mandatory.
  - megous 72 days ago
    GET request should not change any substantial state. You're up for a lot of fun if you decide to violate that as a web developer.
    [-]
    - pvillano 72 days ago
      TBT that guy that used a GET to open his garage door and found it randomly opening from pre-fetches(!)
DeathArrow 72 days ago
It might fool the University filter but it will trigger the NSA filter.
[-]
- lostlogin 72 days ago
  We’re afraid of the three letter agencies, but their list of failures would suggest that quite a lot gets past them.
  [-]
  - rootusrootus 71 days ago
    We'd probably need to know what didn't get past them to get an accurate picture. Maybe the flow is pretty high.
  - vmfunction 72 days ago
    can you show us the list? Truly interested, seems like the three letter guys are pretty good at what they do.
    [-]
    - voidUpdate 72 days ago
      Failures by the three letter guys? oh boy... Waco and Ruby Ridge are but two
      [-]
      - lostlogin 71 days ago
        Events with prominent intelligence failures: Pearl Harbour. Yom Kippur war, the Vietnam war defeat, September 11, Manchester Arena, the invasion of Iraq, the fall of Afghanistan, the Hamas attacks (2014, 2021, now). I’m sure there are masses more.
superkuh 72 days ago
Here is the content that is in the HTML but which the linked page refuses to actually show on the screen, instead opting for a "run javascript" message:
<meta content='Our university deployed a mail filter that rewrites URLs in emails to redirect them via a service that checks for bad websites. Somebody clever worked out that PGP-signed emails are exempt from the rewrite rule, so now people are starting their emails with "BEGIN PGP MESSAGE" even though they haven't used PGP at all, just to fool the filter
Anybody sending malware links has probably also worked out that trick by now, thereby rendering the entire filter pointless' name='description'>
[-]
- cesarb 72 days ago
  A trick which works if you use Mastodon, since that page is a Mastodon post:
  Go to the Mastodon instance where you have an account, and paste the URL in the search box. It should load the post within your Mastodon instance (and allow you to boost/favorite/etc), without running any JavaScript from the post's instance.
  (It won't surprise me if someday a Firefox plugin is created to open these Mastodon post pages without running their JavaScript, by detecting it's Mastodon and going through the same protocol as federated instances to fetch the content. Of course, it would be better if the Mastodon server software didn't require JavaScript for simply showing a post page in the first place.)
  [-]
  - berkes 72 days ago
    Not exactly the plugin you envision, but Graze¹ does this for interaction (boost, favorite, follow, reply etc) with posts on any instance.
    I guess the redirect and re-open on your own instance would be a feature for graze? IDK.
    ¹ https://addons.mozilla.org/en-US/firefox/addon/graze/
- ethbr1 72 days ago
  IMHO, there's definitely some security value in at least running a non-standard config.
  Moving from seeing all attacks (even if blocked) to only targeted attacks decreases the amount of noise a detection system/reviewer has to deal with.
  The important bit, though, is remembering only the above doesn't mean things that make it through are secure, which is usually where companies fall over. (I.e. implement dumb mail filter, then assume any mails that make it through are safe)
- Isognoviastoma 72 days ago
  Thanks for sharing! A bit of css, and it's possible to read mastodon posts with a web browser!
```
    head {display:block; background:Canvas; color:CanvasText;}
    meta[content] {display:block; padding-bottom:1em;}
    meta[content]::after{content:attr(content); display:block;}
```
  [-]
  - o11c 72 days ago
    Or just append /embed
    The disadvantage is that threads don't render.
    [-]
    - superkuh 72 days ago
      The fact that the mastodon devs put the content in the HTML but refuse to show the content tells me everything I need to know about mastodon. It may even be worse than twitter. Generally I will not go out of my way to use or view such links. I just close the tabs.
      [-]
      - cesarb 72 days ago
        In my experience, Mastodon (and other relatives from its family) is much better than current Twitter, at least for non-logged-in users. On Mastodon, as long as you enable JavaScript, you can see the whole thread for the post, while Twitter (which also needs JavaScript) only shows the initial post; and if you go to an account's page, Mastodon shows the whole timeline in reverse-chronological order (which is generally what you want), while Twitter shows old posts in a random order.
        [-]
        superkuh 71 days ago
        I understand that POV and within its context I agree. But to a non-JS user they are equivalent (Mastodon and Twitter) in functionality: blank pages telling you to execute random code. Very similar to phishing emails telling one to execute the random code attachment.
        Twitter has to be bad by virtue of being a corporation driven by profit motive and requiring JS to collect and sell user data. Mastodon doesn't have to be crap since it is not required to generate maximum profit. It just chooses to be as shown from the content being in the HTML but hidden. This makes Mastodon worse. Either it's malicious or, more likely, incompetent cargo culting of corporate practices. Devs unable to separate themselves from the use-cases they have to develop for at their paid jobs or lacking the knowledge to do so.
- cwillu 72 days ago
  Thank you.
jeffbee 72 days ago
There was a time when prefacing your message with `begin 644` would hide your message, or the remainder of it, from users of MS Outlook. Maybe still works, who knows.
jamespo 72 days ago
I've recently had to zero-score the SpamAssassin ENCRYPTED_MESSAGE rule as it's being exploited. Also I don't receive any encrypted messages.
[-]
- stuffinmyhand 72 days ago
  > Also I don't receive any encrypted messages.
  Should start giving people your public key then!
  [-]
  - psd1 68 days ago
    Yes, and email address.
rompledorph 71 days ago
X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*
reidrac 72 days ago
A few times I got spam with all the text in the signature. And I thought, weird: and it makes the email look funny because the signatures are shown in a different color in my email client.
Now I'm wondering if there was some smart idea behind that, trying to fool some filters.
godelski 72 days ago
Well it looks like the link is down and the only archive snapshot I see is a loading screen: https://archive.is/obzJ9
----- BEGIN EDIT -----
I can see it from the Mastodon app but not in browser. I took a screenshot for others: https://i.imgur.com/q3DA7wQ.png
Text is:
```
  Our university deployed a mail filter that rewrites URLs in emails to redirect them via a service that checks for bad websites. Somebody clever worked out that PGP-signed emails are exempt from the rewrite rule, so now people are starting their emails with "BEGIN PGP MESSAGE" even though they haven't used PGP at all, just to fool the filter 

  Anybody sending malware links has probably also worked out that trick by now, thereby rendering the entire filter pointless
```
@[email protected]
----- END EDIT -----
But I have an interesting issue which is (presumably) related. I keep getting obviously spam emails in my gmail and Google has refused to do anything about it (after finally getting through to support). These are embarrassingly bad emails. Ones you'd expect a naive bayesian classifier to get near 100% accuracy on!
In the "to" address is something that's not even remotely my email. Suppose I'm [email protected], the "to" has something like "[email protected]" and always has a CC with a different number. Subject lines will be things like "confirmation of receipt" but in non-standard and inconsistent fonts. And the message body is that all too typical single image and claim of something like winning a home depot gift card. The "from" address is always some scammy looking address. So Naive Bayes should get it on just this alone!
But the weird thing is when you show the original message. There's PAGES of stuff in here (about 20 pages or 9.5k words)! All kinds of email conversations, things including stuff like "here is your one time <account> password", stuff in other languages, survey questions, conversations, and a lot more. I actually did a diff on a few of these and they're almost identical too! I can't help but think that these are some weird prompt injections. It looks like it is written by a madman, but it's probably some "throw shit at the wall and see what sticks" kinda thing.
But the message ID always comes from some Microsoft host (something like microsoftstore1.microsoft.com). This is where Google support said it isn't their problem and moved on. I'm pretty critical of LLMs and their ability to reason, but I feel often we're trying to turn humans into machines instead of letting them complement one another. And I think this is why so many business people like LLMs, because they think everything can be figured out with clear immutable rules. But the thing is that the environment always changes and you need something that's not only adaptable, but can reason AND understands the actual desired outcome that humans want.
I hope someone at Google pays attention. Because you can't just hand all thinking over to machines. We're not there yet and it's why everything is going to shit (not just Google)
[-]
- cesarb 72 days ago
  > In the "to" address is something that's not even remotely my email. Suppose I'm [email protected], the "to" has something like "[email protected]" and always has a CC with a different number.
  You probably already know this, but the content of these headers is not relevant for email routing. The list of email addresses which will actually receive an email message is given in the SMTP commands, which precede (and are independent from) the email headers and body. If you look closely at all headers, you can probably find it, since most SMTP servers AFAIK add one or more header fields containing the email address found in the SMTP command.
  That's how BCC can work: the list of emails in the BCC is never sent within the email message (unlike TO or CC), they are only sent on the SMTP commands.
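  A rough sketch of that split, in Python, in case it helps. Every address and the host name below are made-up placeholders, and `build_message` / `deliver` are just names picked for illustration. The point is only that the recipient list handed to the SMTP session is a separate argument from the headers:

```python
import smtplib
from email.message import EmailMessage


def build_message(header_from, header_to, subject, body):
    """Build headers and body only; nothing here decides who receives the mail."""
    msg = EmailMessage()
    msg["From"] = header_from
    msg["To"] = header_to
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def deliver(host, envelope_from, envelope_rcpts, msg):
    """Send msg; envelope_rcpts are what end up in the SMTP RCPT TO commands."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg, from_addr=envelope_from, to_addrs=envelope_rcpts)


# Placeholder values: the "To" header names one address ...
msg = build_message(
    "promo@example.com",
    "someone-else@example.org",
    "confirmation of receipt",
    "hello",
)
# ... while the envelope names another, which appears in no header (BCC-style).
envelope_rcpts = ["actual-recipient@example.net"]

# Needs a reachable SMTP server, so it is left commented out:
# deliver("mail.example.com", "bounce@example.com", envelope_rcpts, msg)
```

  Whatever the receiving server later records about the envelope recipient is added on its side, not by the sender.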
  [-]
  - godelski 71 days ago
    Yeah, it is in the SMTP part that the Microsoft address appears, and that's why Google says it isn't their problem. I'm still just not sure how in any world a customer receiving spam is considered not their problem.
    But I was suggesting that the actual content in the headers should be providing a hint at spam, because something claiming to be "from" Home Depot shouldn't match with an edu domain. Still, these emails are just wild and I'm very upset that Google will do nothing about them. I get them at least once a week.
    And it is unfortunately hard to switch emails... I'm in the process by trying to create unique emails with a relay service so I can just redirect, but still....
g4zj 72 days ago
I don't know much about mail servers, but would it be possible to validate the signature somehow before delivering the message? Is this impossible or impractical for some reason?
[-]
- cmgbhm 72 days ago
  PGP doesn’t require the use of global directories so you can’t do strong validation.
  This is a bypass game on something like Proofpoint URL protection, which rewrites URLs to go through a central redirect so that the reputation check can be separated from reputation delivery, usage detected, and one liner URLs still work. I’m over a decade out of this space so take this with a grain of salt.
  The tool could do a better job inspecting “oh that’s a PGP-formatted message”. You could probably just route them all to spam and almost no one would notice.
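  To make "a better job inspecting" concrete, here is a minimal sketch in Python of a structural check on the ASCII armor, going by my reading of RFC 4880 (armor lines, blank line after optional headers, base64 data, optional CRC-24 line). This is not how any real filter works as far as I know; `is_plausible_pgp_message` is a made-up name, and a determined sender could still craft something that passes:

```python
import base64
import binascii

ARMOR_BEGIN = "-----BEGIN PGP MESSAGE-----"
ARMOR_END = "-----END PGP MESSAGE-----"


def crc24(data: bytes) -> int:
    """CRC-24 used by OpenPGP ASCII armor (RFC 4880, section 6.1)."""
    crc = 0xB704CE
    for byte in data:
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:
                crc ^= 0x1864CFB
    return crc & 0xFFFFFF


def is_plausible_pgp_message(text: str) -> bool:
    """Heuristic: is the whole text one well-formed armored block?"""
    lines = [ln.rstrip() for ln in text.strip().splitlines()]
    if len(lines) < 3 or lines[0] != ARMOR_BEGIN or lines[-1] != ARMOR_END:
        return False
    inner = lines[1:-1]
    # Optional armor headers (e.g. "Version: ...") end at the first blank line.
    if "" not in inner:
        return False
    data_lines = inner[inner.index("") + 1:]
    # An optional checksum line looks like "=" followed by 4 base64 characters.
    checksum = None
    if data_lines and data_lines[-1].startswith("=") and len(data_lines[-1]) == 5:
        checksum = data_lines.pop()[1:]
    try:
        payload = base64.b64decode("".join(data_lines), validate=True)
        if checksum is not None:
            expected = int.from_bytes(base64.b64decode(checksum, validate=True), "big")
            if crc24(payload) != expected:
                return False
    except (binascii.Error, ValueError):
        return False
    # The first octet of an OpenPGP packet header always has bit 7 set.
    return bool(payload) and bool(payload[0] & 0x80)
```

  A body that merely starts with the BEGIN line and then continues with ordinary text and links fails the base64 step, so it would not be treated as encrypted.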
- upofadown 72 days ago
  Email signatures are essentially end to end. The company would have to keep a file of all the identities of all the people that everyone in the company knew. To make this useful, the company would also have to record information about who knew who.
  As already mentioned, the email might also be encrypted (the most common case). Then the signature would not be available as it would be hidden under the encryption. These sorts of filtering issues are why corporate cybersecurity people, somewhat ironically, tend to dislike encrypted email. There is an opportunity here to make better email clients that treat encrypted, but unsigned, emails with suspicion.
- thenewnewguy 72 days ago
  Even if they did, you could just sign the message for real.
- layer8 72 days ago
  There is commercial mail server software that does exactly that.
bongodongobob 72 days ago
Tried this with my work email, it doesn't seem to work with whatever our Exchange setup is. Is this just a case of an admin using the default settings in their filtering?
[-]
- layer8 72 days ago
  It’s a custom filter deployed at that university.
mrkramer 72 days ago
Hug of death?
swyx 72 days ago
obligatory https://xkcd.com/1181/
[-]
- gramakri2 72 days ago
  heh, I am awed by the foresight of the author!
Hizonner 72 days ago
There is a special place in Hell for people who "helpfully" rewrite email bodies in transit.
[-]
- jacoblambda 72 days ago
  Stares aggressively at Protonmail.
  Yeah you at Proton. I'm talking to you.
  Stop doing this.
  No touchy the email.
  [-]
  - jimbosis 72 days ago
    Do you or does anyone else have a summary of or any link to read about what you are alluding to, namely Proton Mail rewriting email bodies?
    [-]
    - super_linear 72 days ago
      https://news.ycombinator.com/item?id=36639530
  - ChainOfFools 72 days ago
    doesn't this break DKIM?
    [-]
    - 256_ 72 days ago
      DKIM headers can have an optional length parameter which says "the first n bytes of the message body are cryptographically signed", whilst allowing the rest of the message to be modified. I don't know if anyone actually uses it, though; I don't think I've ever seen or used it.
      Also, DKIM can optionally be lax about whitespace.
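      For the curious, a simplified sketch in Python of how those two options affect the body hash, based on my understanding of RFC 6376 ("relaxed" body canonicalization plus the `l=` tag). The function names are made up for illustration, and a real verifier also handles the signed header fields and the actual signature, which this leaves out:

```python
import base64
import hashlib
import re
from typing import Optional


def relaxed_body(body: bytes) -> bytes:
    """Approximate 'relaxed' body canonicalization: collapse runs of
    spaces/tabs, strip trailing whitespace, drop trailing empty lines."""
    lines = body.split(b"\r\n")
    lines = [re.sub(rb"[ \t]+", b" ", ln).rstrip(b" ") for ln in lines]
    while lines and lines[-1] == b"":
        lines.pop()
    return b"".join(ln + b"\r\n" for ln in lines)


def body_hash(body: bytes, length: Optional[int] = None) -> str:
    """Base64 SHA-256 of the canonicalized body, optionally truncated to
    the first `length` bytes as the l= tag allows."""
    canon = relaxed_body(body)
    if length is not None:
        canon = canon[:length]
    return base64.b64encode(hashlib.sha256(canon).digest()).decode()


signed = b"Hello  world \r\n"
signed_length = len(relaxed_body(signed))
appended = signed + b"<div>added in transit</div>\r\n"

# With a length limit, content appended after the signed part is not hashed.
same_with_limit = body_hash(signed, signed_length) == body_hash(appended, signed_length)
# Without a limit, the same modification changes the hash.
same_without_limit = body_hash(signed) == body_hash(appended)
```

      Here `same_with_limit` comes out true and `same_without_limit` false, which is exactly the gap that lets someone append to a message without invalidating the body hash.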
      [-]
      - megous 72 days ago
        That sounds like something that can interact really well with HTML email, if you can manage to overlay the unsigned content over signed content visually via CSS/inline styles. :)