For roughly the first decade of B2B email tracking (call it 2010 to 2020), the math was simple. A tracking pixel loaded when the recipient opened the email. The dashboard counted that as an open. The number was directionally correct, the rep acted on it, and the system worked.
That system is now broken. By a lot. This piece is the documented version of what changed, what it costs sales teams, and what to measure instead.
The four forces that broke open tracking
Apple Mail Privacy Protection (since 2021)
Apple shipped Mail Privacy Protection in iOS 15 in September 2021. The behavior: any image embedded in an email, including the tracking pixel, gets pre-fetched by Apple's servers when the email is delivered, regardless of whether the recipient ever opens it.
The mechanism is straightforward. Apple downloads the email content to a privacy-preserving proxy, fetches all remote images server-side, caches them, and serves the cached version when the user actually opens the message. From the sender's tracker's perspective, every email to an Apple user looks "opened" within minutes of send, from an Apple IP, with no way to distinguish the proxy fetch from a real human read.
The reach: Litmus' 2024 client share report puts Apple Mail at roughly 58% of global email opens in B2B contexts, with iOS Mail specifically at ~46% and macOS Mail at ~12%. The MPP-enabled rate among Apple Mail users is harder to pin down precisely, but Apple's own default-on guidance combined with how the iOS 15 first-launch prompt is worded means roughly half of all Apple Mail users have MPP active. That's 25-35 percentage points of raw open-rate inflation on most B2B lists, from this one source alone.
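One way to reproduce that range is a back-of-the-envelope model. The sketch below assumes every MPP-protected recipient registers as an open, so inflation comes only from recipients who would not have opened anyway. The function name and the `true_open_rate` parameter are our illustrative assumptions, not figures from Litmus or Apple.

```python
def mpp_inflation_points(apple_share: float, mpp_rate: float, true_open_rate: float) -> float:
    """Percentage points that MPP pre-fetches add to the reported open rate.

    Simplified model (an assumption): every MPP-protected recipient shows up
    as an open, so only those who would NOT have opened add inflation.
    """
    auto_opened = apple_share * mpp_rate  # share of the list that is auto-"opened"
    return 100 * auto_opened * (1 - true_open_rate)


# 58% Apple Mail share, about half with MPP active, 10% genuine open rate (assumed)
print(round(mpp_inflation_points(0.58, 0.5, 0.10), 1))  # about 26.1
```

Raising the assumed MPP-enabled rate toward 60% pushes the result to the upper end of the range; the genuine open rate moves it by only a few points.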
Gmail's image proxy (since 2013, expanded behavior since 2022)
Google routes every image in Gmail through their own image-proxy infrastructure. When you send an email with a tracking pixel and the recipient opens it in Gmail, your server doesn't see the recipient's IP; it sees Google's proxy IP. The proxy fetches the pixel, caches it, and serves the cached version to the recipient.
For years this was just a privacy and bandwidth optimization. What changed around 2022-2023 was Gmail's tendency to refetch the pixel on subsequent opens (and sometimes pre-fetch it before any open at all, for security scanning purposes). The result: a single Gmail recipient can generate 2-5 "opens" in a tracker that doesn't deduplicate proxy hits, with no way to distinguish the original human read from the proxy's refetches.
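Deduplication at least collapses those repeat hits into one open per message. The sketch below is a minimal illustration, not a description of any particular tracker: `PixelHit`, the `via_proxy` flag, and the one-hour window are all hypothetical, and the flag is assumed to be set upstream from whatever proxy signature your logs expose.

```python
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple


class PixelHit(NamedTuple):
    message_id: str       # hypothetical ID embedded in the pixel URL
    fetched_at: datetime
    via_proxy: bool       # assumed to be set upstream from the request's proxy signature


def count_opens(hits: Iterable[PixelHit],
                window: timedelta = timedelta(hours=1)) -> dict[str, int]:
    """Count opens per message, skipping proxy hits that land within
    `window` of the previously counted hit for the same message."""
    counts: dict[str, int] = {}
    last_counted: dict[str, datetime] = {}
    for hit in sorted(hits, key=lambda h: h.fetched_at):
        prev = last_counted.get(hit.message_id)
        if hit.via_proxy and prev is not None and hit.fetched_at - prev < window:
            continue  # treat as a refetch of an open that was already counted
        counts[hit.message_id] = counts.get(hit.message_id, 0) + 1
        last_counted[hit.message_id] = hit.fetched_at
    return counts
```

This only caps the count. It cannot tell which of the collapsed hits was the human read, and it does nothing about a pre-fetch that happens before any open at all.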
For a B2B list where 30-40% of recipients are on Google Workspace, this is another 10-15 percentage points of inflation on top of the Apple effect.
Corporate email scanners (always there, now mandatory in many environments)
Microsoft Defender for Office 365, Proofpoint, Mimecast, and several others pre-fetch every link and image in every incoming email before delivery. The purpose is security: they're checking for malware in image-disguised payloads, phishing redirects, and known-bad domains. The side effect on tracking is dramatic.
A typical enterprise email-security stack opens your tracking pixel within seconds of send, from the company's edge IP, every time. The buyer themselves opens the email maybe an hour later, but by then the scanner has fired five or six times. The dashboard reports an opens count that's mostly the scanner.
For financial services, healthcare, and large-enterprise lists, this is the single biggest inflation source, sometimes 30-40 percentage points by itself.
"Self-opens" (the sender re-reading their own sent items)
This one is the smallest of the four but the most consistently underappreciated. Every time you open your own sent email (to remind yourself of the wording before a call, to forward it to a colleague, to copy-paste a line into a follow-up), the tracking pixel loads. Most trackers count that as an open.
For an AE running a long deal cycle, the sender's self-opens easily account for 5-15 percentage points of the reported open count on the deals that matter most. The proposal you keep re-reading. The follow-up you're drafting. Each load is a phantom "open" that doesn't belong in your stats.
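Self-opens are the easiest of the four to filter when the sender's own networks are known. A minimal sketch, assuming you can list those networks (office, VPN, and so on) as CIDR ranges; the function name and the example ranges are hypothetical.

```python
import ipaddress


def is_self_open(hit_ip: str, sender_networks: list[str]) -> bool:
    """True when the pixel was fetched from a network known to belong to the sender."""
    addr = ipaddress.ip_address(hit_ip)
    return any(addr in ipaddress.ip_network(net) for net in sender_networks)


# Documentation-only example ranges, standing in for a sender's office and VPN
SENDER_NETWORKS = ["203.0.113.0/24", "2001:db8::/32"]
print(is_self_open("203.0.113.7", SENDER_NETWORKS))
```

IP matching will miss self-opens that reach the tracker through a mailbox provider's image proxy, so a real filter would likely need a second signal, such as recognizing the sender's own logged-in session.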
What this costs sales teams
The cost of the noise is not the noisy number itself. Teams report 70% open rates and 2% reply rates and intuitively understand the two don't reconcile. The real cost is the downstream decisions made on the noise.
Reps chase dead threads
The most common pattern: a tracker fires a "5 opens on the proposal" alert. The rep sees five opens in two days and assumes high interest. They call. The buyer doesn't pick up. They email a "just checking in" follow-up. Still nothing. They call again, follow up again, and burn three hours of attention on a thread where the "five opens" were Apple's MPP proxy fetching the email on behalf of the buyer's iPhone, not the buyer reading it.
Multiply that pattern across 50 active threads per rep per week. Even if the average false positive costs only about an hour of misallocated attention, a team of 10 reps spends a few hundred hours per quarter chasing engagement that wasn't real.
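The estimate leaves the false-positive rate implicit. The sketch below makes the arithmetic explicit; the 5% rate and the 13-week quarter are our illustrative assumptions, chosen only to show how the inputs combine, and the function name is hypothetical.

```python
def wasted_hours_per_quarter(reps: int,
                             threads_per_rep_per_week: int,
                             false_positive_rate: float,
                             hours_per_false_positive: float = 1.0,
                             weeks_per_quarter: int = 13) -> float:
    """Hours per quarter spent chasing threads whose 'opens' were not human."""
    thread_weeks = reps * threads_per_rep_per_week * weeks_per_quarter
    return thread_weeks * false_positive_rate * hours_per_false_positive


# 10 reps, 50 active threads each per week, 5% of them chased in error (assumed)
print(round(wasted_hours_per_quarter(10, 50, 0.05)))  # 325
```

Plug in your own false-positive rate; the result scales linearly with every input.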
Reps miss the actually-warm threads
The mirror problem: the rep stops trusting opens entirely. When a thread genuinely warms up (a buyer reading the proposal three times in 48 hours, from distinct IPs at the office and at home), it looks the same in the dashboard as the noise. The rep tunes out. The warm thread goes cold because the rep didn't act on the signal that was actually real.
This is the harder cost to attribute, because you only notice it on the deals that closed elsewhere or stalled out. But it's roughly the same magnitude as the false-positive cost.
Cadence and routing logic break
Most modern sales-engagement platforms run rule-based routing: "if a prospect opens 3+ times in 48 hours, route to the rep's hot-lead queue." Those rules were designed when opens were directionally accurate. In 2026, those same rules misroute most of the queue.
The fix is either to filter the opens before they hit the rule, or to rewrite the rules around different inputs (replies, clicks, multi-channel engagement). Most teams haven't done either.
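The first option, filtering before the rule fires, can be sketched as follows. This is a minimal illustration of the "3+ opens in 48 hours" rule applied to filtered opens only; the `Open` type and its `likely_automated` flag are hypothetical and stand in for whatever proxy and scanner detection your tracker provides.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Open(NamedTuple):
    at: datetime
    likely_automated: bool  # hypothetical flag from an upstream proxy/scanner filter


def is_hot_lead(open_times: list[datetime], now: datetime,
                min_opens: int = 3,
                window: timedelta = timedelta(hours=48)) -> bool:
    """The routing rule itself: at least `min_opens` opens inside the window."""
    recent = [t for t in open_times if timedelta(0) <= now - t <= window]
    return len(recent) >= min_opens


def route(opens: list[Open], now: datetime) -> bool:
    """Drop likely-automated opens first, then apply the unchanged rule."""
    human_opens = [o.at for o in opens if not o.likely_automated]
    return is_hot_lead(human_opens, now)
```

Keeping the filter separate from the rule means the existing thresholds stay in place while the input gets cleaner.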
What to measure instead
The honest answer for B2B teams in 2026:
Reply rate is the cleanest single signal. No proxy generates a reply. No scanner sends a polite "thanks, not interested." Reply rate survives the entire noise stack intact. A B2B team that anchors on reply rate, weighted by sentiment, will make better decisions than one anchored on opens, even if the open data improves.
Click rate on high-intent links is the second-cleanest. Some proxies do prefetch links (Microsoft Defender's Safe Links does, Gmail's link wrapping does, Apple's MPP does not as of writing). But the noise is much smaller than on opens because clicks require a more specific user action to fire. Click rate on a pricing page or a comparison page is still a strong intent signal.
Confidence-scored opens are the third piece. This is where Outsolvi sits. Every open gets a Tier 1-5 score at the request level based on the User-Agent, IP block, time-of-day pattern, and prefetch headers. Tier 1 reads are surfaced to the rep as real engagement; Tier 4-5 reads are filtered out by default. The dashboard reports the filtered number, not the raw count. The rep stops chasing proxies.
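To make the idea concrete, here is a deliberately simple sketch of request-level tiering over those four signal types. It is an illustration only, not Outsolvi's production scoring: the field names, the User-Agent markers, the business-hours window, and the one-point-per-signal weighting are all assumptions.

```python
from dataclasses import dataclass

# Illustrative substrings only; a real list would come from observed proxy traffic
AUTOMATED_UA_MARKERS = ("proxy", "scanner", "bot")


@dataclass
class OpenRequest:
    user_agent: str
    ip_in_known_proxy_block: bool  # assumed result of an IP-range lookup done elsewhere
    local_hour: int                # recipient's local hour (0-23), if it can be inferred
    has_prefetch_header: bool      # assumed flag for prefetch-style request headers


def tier(req: OpenRequest) -> int:
    """Return 1 (most likely a human read) through 5 (most likely automated)."""
    ua = req.user_agent.lower()
    signals = 0
    signals += any(marker in ua for marker in AUTOMATED_UA_MARKERS)
    signals += req.ip_in_known_proxy_block
    signals += not 7 <= req.local_hour <= 20  # outside assumed working hours
    signals += req.has_prefetch_header
    return 1 + signals


def filtered_out(req: OpenRequest) -> bool:
    """Mirror the default described above: Tier 4-5 opens are hidden."""
    return tier(req) >= 4
```

A production scorer would weight the signals unequally and recalibrate as provider behavior changes; the equal weights here just keep the mapping to five tiers easy to follow.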
Some other vendors are landing in similar territory under different vocabulary. Microsoft's Sales Copilot started exposing "verified" vs "preview" opens in 2025: same shape, different language. We expect this to become the new default by 2028; the noise problem is well enough understood that legacy raw-count tracking won't survive much longer.
What to do about it this quarter
Three things in priority order, if you're running B2B outbound today:
1. Stop reporting raw open rate as a single number to leadership. The number is misleading by 30-50 points on most lists. Replace it with reply rate plus a "high-confidence opens" sub-metric if your tracker exposes one. Leadership will adjust. The expectations math gets easier, not harder, because the new metrics are defensible.
2. Rewrite hot-lead routing rules to weight replies and high-intent clicks over open counts. A prospect who replies to a touch within 4 hours is qualitatively different from one who "opened" five times. Route on the former; deprioritize the latter. The 4-hour follow-up window has near-3x conversion on warm replies versus 24-hour-late ones, and that ratio holds across most B2B segments.
3. Either switch to a tracker that confidence-tiers opens, or build the filter yourself on top of whatever you have today. This is the structural fix. Vendors that don't tier are reporting numbers that no rep should be acting on. Vendors that do tier are surfacing the cleaner signal. The third option (staying with a non-tiering tracker and ignoring the open dashboard) also works but wastes whatever you're paying for the tracker.
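For the second item, a routing score that ranks replies and high-intent clicks above opens might look like the sketch below. The weights are illustrative and uncalibrated, and the function and parameter names are hypothetical; the only thing taken from the text is the ordering of signals and the 4-hour reply threshold.

```python
from datetime import timedelta
from typing import Optional


def lead_priority(reply_delay: Optional[timedelta],
                  high_intent_clicks: int,
                  confident_opens: int) -> int:
    """Higher means hotter. A reply outranks clicks, and clicks outrank opens."""
    score = 0
    if reply_delay is not None:
        # A reply within 4 hours of the touch is the strongest signal
        score += 100 if reply_delay <= timedelta(hours=4) else 50
    score += 10 * high_intent_clicks
    score += confident_opens
    return score


fast_reply = lead_priority(timedelta(hours=2), 0, 0)
five_opens = lead_priority(None, 0, 5)
print(fast_reply > five_opens)  # True
```

With these weights, no realistic number of opens can outrank a single reply, which is the point of the rewrite.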
The methodology behind this piece
The numbers in this report come from a combination of public sources and Outsolvi's tracker-fleet observations.
Public sources we cite directly: Litmus 2024 Client Share Report (Apple Mail market share), Apple's iOS 15 release notes (MPP behavior), Google's Gmail developer documentation (image proxy behavior), Microsoft's Defender for Office 365 documentation (Safe Links pre-fetch behavior). When we cite a specific percentage from these sources, it's quoted; when we describe behavior, it's from the official documentation.
Outsolvi-specific observations (the 30-50 point inflation range, the per-segment breakdowns, the 4-hour follow-up window conversion lift) come from our own anonymized tracker fleet. We document our test methodology and calibration cadence on the [methodology page](/methodology). Our seed inboxes cover Apple, Gmail, Microsoft 365, and Yahoo on both consumer and business plans, and we recalibrate quarterly.
Where this piece extrapolates beyond what we can directly measure (e.g., "industry-wide" claims), we say so explicitly and present ranges rather than specific numbers.
What's not in this report
We deliberately don't make claims about specific competitor accuracy. That gets adversarial fast and most of the data we'd cite isn't ours to share. Our [comparison pages](/compare) handle the head-to-head competitor analysis with direct documented references. This piece is about the industry-wide pattern, not which vendor is best.
We also don't claim Outsolvi solves the noise problem perfectly. We tier opens better than most legacy trackers in 2026, but the noise stack keeps evolving (Apple ships new MPP behavior, Google adjusts the image proxy, Microsoft updates Defender) and the calibration is a moving target. We recalibrate every quarter; this piece will get a fresh review in Q3 2026.
If you want to estimate the noise on your own send patterns, we built a [free calculator](/tools/apple-mpp-impact-estimator) that takes your list size and audience composition and outputs an inflation range. No signup required.
The takeaway
The 2010-2020 generation of email tracking measured what mailbox-provider behavior of that era let it measure. The 2026 generation has to measure around mailbox-provider behavior that was specifically designed to break the old pattern.
Teams that recognize the shift, change what they measure, and pick tools that survive the new measurement environment will run cleaner pipeline. Teams that keep optimizing on raw open count will spend the next 24 months wondering why the funnel doesn't respond the way it used to.
If you want our take on the cleanest way through, [try Outsolvi free for 14 days](https://my.outsolvi.com/signup). The default dashboard filters proxy noise and surfaces the actual human reads. You can flip the filter off if you want to see the raw number for diagnostic purposes, but you probably won't want to.