Homoglyph Attacks: How Lookalike Characters Are Exploited for Cyber Deception

Table of Contents:

Introduction
What is a Homoglyph Attack?
Practical Homoglyph Confusables
- Practical Homoglyph Confusables Table
Why Homoglyph Attacks Are Effective
Common Homoglyph Use Cases and Attack Vectors
Real-World Examples and Campaign Patterns
Technical Deep Dive — Unicode, IDNs, and Punycode
- Unicode and Scripts
- IDNs and Punycode
- Mixed Scripts and Confusables
Attack Flow — Step-by-Step
Why Detection Can Fail — Subtle Technical Pitfalls
MITRE ATT&CK Mapping (High-Level)
Defensive Measures and Operational Recommendations
- Policy and Governance
- Technical Controls
- Operational Practices
Best-Practice Checklist
Emerging Trends to Watch
Conclusion

Introduction

You glance at a URL, see a familiar brand name, and click — only to hand your credentials to an attacker. That tiny visual mistake (an “o” that’s actually a Greek omicron, a lowercase “l” replaced by a capital “I”) is exactly what homoglyph attacks exploit. Homoglyphs are visually similar characters from different character sets (Latin, Cyrillic, Greek, full-width forms, etc.). When attackers swap characters in domains, filenames, message display names, or code, humans — and often automated defences — are fooled.

Homoglyph attacks are a low-cost, high-impact deception technique. They are used for phishing, brand impersonation, malware distribution, supply-chain confusion, and bypassing simplistic detection rules. This blog explains the technical mechanics (Unicode, IDNs, Punycode), how attackers operationalize homoglyphs, detection and hunting approaches, real-world usage patterns, MITRE mapping, and practical defences — including how layered protections like Quick Heal / Seqrite help.

What is a homoglyph attack?

A homoglyph is a character that looks like another character. For example:

Latin a (U+0061) vs Cyrillic а (U+0430)
Latin o (U+006F) vs Greek ο (omicron, U+03BF)
Latin I (capital i, U+0049) vs lowercase l (ell, U+006C) vs Cyrillic І (U+0406)

A homoglyph attack replaces one or more characters in an identifier (domain, filename, email display name) with visually confusable alternatives to impersonate a trusted resource. When used in Internationalized Domain Names (IDNs), these domains are represented in ASCII using Punycode (the xn-- prefix) but often rendered in browsers using the original Unicode characters — giving an authentic-looking URL to users.

A quick Punycode example (conceptual, anonymized):

Displayed domain: gοogle-example[.]com (Greek omicron used instead of Latin ‘o’)

Punycode (ASCII): xn--gogle-example-abc[.]com (the trailing "abc" is a placeholder for the encoded suffix, not the real value)
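The Punycode string above is conceptual. To see the actual ASCII form of a label, Python's built-in idna codec can do the conversion. This is a minimal sketch, assuming the illustrative domain from the example and the standard library only; the helper names to_ascii and to_unicode are hypothetical, and the standard-library codec follows the older IDNA 2003 rules, so registries and browsers may treat edge cases differently.

```python
# The second character is GREEK SMALL LETTER OMICRON (U+03BF), not Latin "o".
displayed = "g\u03bfogle-example.com"


def to_ascii(hostname: str) -> str:
    """Encode every label of a hostname with the stdlib IDNA codec."""
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname: str) -> str:
    """Decode xn-- labels back to the Unicode form a browser might display."""
    return hostname.encode("ascii").decode("idna")


ascii_form = to_ascii(displayed)
print(ascii_form)                           # ASCII form, starts with "xn--"
print(to_unicode(ascii_form) == displayed)  # round trip restores the Unicode form
```

The ASCII letters of the label are kept in order at the front of the encoded form and the position of the omicron is packed into the suffix, which is why the encoded name still looks vaguely familiar in logs.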

Practical Homoglyph Confusables

Homoglyph attacks exploit visually similar characters from different language scripts such as Latin, Cyrillic, and Greek. These lookalike letters can deceive users, spoof trusted domains, and even bypass some automated filters.

Below is a quick reference showing commonly abused homoglyph pairs seen in phishing and impersonation campaigns.

Practical Homoglyph Confusables Table

Visual	Legitimate Character	Lookalike(s)	Script of Lookalike	Common Use in Attacks
a	a (U+0061)	а (U+0430)	Cyrillic	“paypаl”, “fаcebook”
e	e (U+0065)	е (U+0435)	Cyrillic	“nеtflix”, “tеsla”
o	o (U+006F)	ο (U+03BF), о (U+043E)	Greek / Cyrillic	“gοogle”, “microsοft”
i	i (U+0069)	ı (U+0131), і (U+0456)	Latin (Turkish dotless i) / Cyrillic	“іnstagram”, “mіcrosoft”
l	l (U+006C)	I (U+0049)	Latin	“googIe”, “paypaI”
c	c (U+0063)	с (U+0441)	Cyrillic	“faсebook”, “miсrosoft”
p	p (U+0070)	р (U+0440)	Cyrillic	“раypal”, “droрbox”
s	s (U+0073)	ѕ (U+0455)	Cyrillic	“microѕoft”, “ѕlack”
y	y (U+0079)	у (U+0443)	Cyrillic	“уahoo”, “paуpal”
x	x (U+0078)	х (U+0445)	Cyrillic	“хbox”, “linuх”
d	d (U+0064)	ԁ (U+0501)	Cyrillic	“clouԁflare”
h	h (U+0068)	һ (U+04BB)	Cyrillic	“һbo”, “һulu”
n	n (U+006E)	ո (U+0578)	Armenian	“liոkedin”, “amazoո”
m	m (U+006D)	rn (sequence)	Latin (visual trick)	“rnicrosoft” instead of “microsoft”
0	0 (U+0030)	O (U+004F), о (U+043E)	Latin / Cyrillic	“micr0soft”, “g00gle” (digit used in place of the letter o)
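A table like this can be turned into a lookup that folds lookalikes back to the Latin letter they imitate, so that two strings can be compared by their "skeleton". The sketch below is illustrative only: CONFUSABLES is a hand-picked subset of the rows above rather than the full Unicode confusables data, and the names skeleton and impersonates are hypothetical.

```python
# Hand-picked subset of the table: lookalike -> Latin letter it imitates.
CONFUSABLES = {
    "\u0430": "a",  # Cyrillic а
    "\u0435": "e",  # Cyrillic е
    "\u03bf": "o",  # Greek ο
    "\u043e": "o",  # Cyrillic о
    "\u0456": "i",  # Cyrillic і
    "\u0131": "i",  # Latin dotless ı
    "\u0441": "c",  # Cyrillic с
    "\u0440": "p",  # Cyrillic р
    "\u0455": "s",  # Cyrillic ѕ
    "\u0443": "y",  # Cyrillic у
    "\u0445": "x",  # Cyrillic х
    "\u0501": "d",  # Cyrillic ԁ
    "\u04bb": "h",  # Cyrillic һ
    "\u0578": "n",  # Armenian ո
    "I": "l",       # capital i posing as lowercase ell
    "0": "o",       # digit zero posing as the letter o
}


def skeleton(text: str) -> str:
    """Fold known lookalikes to Latin, then collapse the 'rn' -> 'm' trick."""
    folded = "".join(
        CONFUSABLES.get(ch) or CONFUSABLES.get(ch.lower(), ch.lower())
        for ch in text
    )
    return folded.replace("rn", "m")


def impersonates(candidate: str, brand: str) -> bool:
    """True when the strings differ but fold to the same skeleton."""
    return candidate.lower() != brand.lower() and skeleton(candidate) == skeleton(brand)


print(impersonates("p\u0430yp\u0430l", "paypal"))  # Cyrillic а in place of Latin a
print(impersonates("rnicrosoft", "microsoft"))     # "rn" posing as "m"
print(impersonates("paypal", "paypal"))            # identical strings are not lookalikes
```

Folding is deliberately aggressive, so it will produce false positives (for example, "modern" and "modem" share a skeleton). It is better suited to raising candidates for review than to blocking outright.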

Why homoglyph attacks are effective

Human perception: People evaluate URLs visually and are poor at spotting subtle character differences.
Display vs. storage mismatch: Systems may store ASCII (Punycode) but display Unicode, introducing confusion.
Policy/allowlist gaps: Allowlisting based on visible strings (without normalization) can miss IDN-based lookalikes.
Certificate and hosting availability: Attackers can obtain TLS certs for lookalike domains (Let’s Encrypt and similar), raising perceived legitimacy.
Automation gaps: Many security pipelines don’t normalize Unicode or run mixed-script detection, so homographs slip through.

Common homoglyph use cases and attack vectors

Spear-phishing & credential harvesting: Phishing emails contain links to lookalike domains that host credential-collection forms.
Business Email Compromise (BEC): Invoice/payment scams where the sender’s display name or a domain in an invoice looks correct but contains homoglyphs.
Malvertising / malware distribution: Executables and updates are hosted on lookalike domains to trick analysts and sandboxes.
Username/display name spoofing: On Slack/Teams/Email, attackers register accounts where the display name uses homoglyphs to impersonate coworkers.
Supply-chain & developer confusion: Package names, repo names, or variable identifiers with lookalike characters cause devs to pull malicious code or execute wrong binaries.

Real-world examples and campaign patterns

To keep this actionable and responsible, the patterns below are anonymized and drawn from publicly reported behaviours (no brand finger-pointing):

Finance-targeted phishing: Campaigns register lookalike domains of payment portals with mixed Latin/Cyrillic characters, host credential forms, and send follow-ups to improve success.
SaaS impersonation: Attackers register IDNs that look nearly identical to the login domain of a popular SaaS product in order to harvest credentials, often pairing the domain with a valid TLS certificate and a convincing HTML login form.
Executive impersonation in BEC: Display names in email clients (or slight domain modifications) are used to request urgent transfers; perpetrators rely on users not inspecting the actual return-path domain.
Malware distribution via lookalike download sites: Fake download portals (e.g., for installers) are hosted on homoglyph domains to push malicious payloads that sandbox detonation misses because the domain is too new to have a reputation.

Technical deep dive — Unicode, IDNs, and Punycode

Unicode and scripts

Unicode is a comprehensive character set that includes many scripts (Latin, Cyrillic, Greek, Armenian, Hebrew, Arabic, etc.). Many glyphs across different scripts look similar or identical at typical font sizes.

IDNs and Punycode

The Domain Name System (DNS) historically supports only ASCII. To allow non-ASCII names, IDNA (Internationalized Domain Names in Applications) employs Punycode — an ASCII-compatible encoding prefixed by xn--. For example, пример (Cyrillic) becomes xn--e1afmkfd.

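Browsers decide whether to display the Unicode form or the Punycode form based on heuristics that vary by browser and version. A common rule is to show Unicode when every character in a label comes from a single script (in some browsers, only when that script matches the user’s configured languages) and to fall back to Punycode otherwise. On its own, that rule does not help when an entire label is written in one non-Latin script whose letters happen to resemble Latin ones, so the displayed Unicode string can still deceive someone used to Latin characters.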

Mixed scripts and confusables

Attackers often use mixed-script domains, combining Latin letters with a few Cyrillic or Greek characters in positions that are visually sensitive (brand name core, domain label start/end).
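Mixed-script labels can be flagged with a rough check. Python's standard library does not expose the Unicode Script property directly, so the sketch below approximates it from the first word of each character's Unicode name. That approximation, and the names scripts_in and is_mixed_script, are assumptions made for illustration.

```python
import unicodedata


def scripts_in(label: str) -> set[str]:
    """Approximate the scripts used by the letters of a label.

    Uses the first word of the Unicode character name (LATIN, CYRILLIC,
    GREEK, ARMENIAN, ...) as a stand-in for the real Script property.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(label: str) -> bool:
    """True when the letters of a label come from more than one script."""
    return len(scripts_in(label)) > 1


print(sorted(scripts_in("g\u03bfogle")))          # Latin plus one Greek omicron
print(is_mixed_script("p\u0430ypal"))             # one Cyrillic а among Latin letters
print(is_mixed_script("\u0441\u0430\u0440\u0435"))  # all-Cyrillic label that resembles "cape"
```

The last call shows the limit of this check: a label written entirely in one non-Latin script is not mixed, so it needs confusable matching as well. Writing systems that legitimately combine scripts (Japanese, for example) would also be flagged and need an exception list.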

Technical mechanics that matter for detection:

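Normalization forms (NFC, NFD, NFKC, NFKD) control canonical and compatibility composition/decomposition and therefore affect string comparisons. NFKC folds compatibility characters such as full-width forms to their ASCII equivalents, but no normalization form maps a Cyrillic а to a Latin a, so normalization alone does not catch cross-script homoglyphs.
Confusables tables (published by the Unicode Consortium as part of Unicode Technical Standard #39) list visually confusable characters; defenders can use these for fuzzy matching.
BIDI (bidirectional) control characters such as U+202E (RIGHT-TO-LEFT OVERRIDE) can reverse how text is rendered and are used by attackers to obfuscate filenames or display names.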
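Two of these mechanics are easy to observe with the standard library. The sketch below shows what normalization does and does not fold, and scans a string for bidirectional control characters. The sample strings and the find_bidi_controls helper are made up for illustration.

```python
import unicodedata

# Explicit bidirectional formatting characters worth flagging in names.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # embeddings and overrides
    "\u2066", "\u2067", "\u2068", "\u2069",            # isolates
    "\u200e", "\u200f",                                # left-to-right / right-to-left marks
}

fullwidth = "\uff47\uff4f\uff4f\uff47\uff4c\uff45"  # full-width "google"
cyrillic_a = "p\u0430ypal"                           # Cyrillic а in second position

# NFKC folds compatibility characters to ASCII ...
print(unicodedata.normalize("NFKC", fullwidth) == "google")   # True
# ... but leaves cross-script lookalikes untouched.
print(unicodedata.normalize("NFKC", cyrillic_a) == "paypal")  # False


def find_bidi_controls(text: str) -> list[tuple[int, str]]:
    """Return (index, code point) for each bidi control character in text."""
    return [
        (index, f"U+{ord(ch):04X}")
        for index, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]


# With a right-to-left override, the tail of this name may render reversed,
# so an .exe file can appear to end in "pdf".
filename = "invoice\u202efdp.exe"
print(find_bidi_controls(filename))  # [(7, 'U+202E')]
```

In most pipelines it is reasonable to reject or strip these control characters from filenames, display names, and hostnames, since they rarely have a legitimate use there.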

Attack flow — step-by-step

Recon & Branding: Attacker gathers brand names, common subdomains, and localized scripts used by the target.
Domain prep: Register homoglyph domain(s) via a registrar that accepts IDNs; optionally obtain TLS certs.
Hosting & content: Set up phishing page, download portal, or redirect flows; configure email templates to point to the domain.
Delivery: Send emails, ads, or social messages linking to the homoglyph domain; exploit typical trust cues (logos, similar wording).
Collection & exploitation: Harvest credentials, push malware, monetize via fraud or sale on access markets.
Persistence: Use harvested credentials to expand access or register more lookalike domains to rotate campaigns.

Why detection can fail — subtle technical pitfalls

No Unicode normalization or confusable folding: Tools that compare raw strings miss matches, and tools that normalize without also folding confusables still miss cross-script lookalikes.
Font/rendering variance: Some fonts reveal differences (serifs), others hide them (sans-serif at small sizes).
Mixed-script heuristics: Not all filters flag mixed scripts; some validity checks only confirm that a hostname is ASCII, which every Punycode (xn--) name satisfies.
TLS false sense of security: A valid certificate is not proof of identity; certificate transparency helps but doesn’t block registration patterns.

MITRE ATT&CK mapping (high-level)

Homoglyph attacks most commonly align with phishing-based initial access, where lookalike domains host credential-harvesting pages.
Attackers rely on open-source intelligence to craft believable impersonation targets and acquire deceptive domains and TLS certificates during the resource-development phase.
Masquerading techniques are used to evade defences, ultimately enabling credential theft, fraud, or broader intrusion activity.

Stage	Technique	ATT&CK ID	Homoglyph relevance
Initial Access	Phishing: Spearphishing Link	T1566.002	Lookalike domains host credential pages
Reconnaissance	Search Open Websites/Domains	T1593	OSINT used to craft target-specific homoglyphs
Resource Development	Acquire Infrastructure: Domains / Obtain Capabilities: Digital Certificates	T1583.001 / T1588.004	Register homoglyph domains and obtain TLS certs
Defence Evasion	Masquerading	T1036	Homoglyphs impersonate trusted names
Reconnaissance / Initial Access	Phishing for Information: Spearphishing Link / Valid Accounts	T1598.003 / T1078	Credentials harvested on lookalike pages are reused for account takeover
Impact	Data Encrypted for Impact / Financial Theft	T1486 / T1657	Initial vector leads to larger intrusions or fraud

Defensive Measures and Operational Recommendations

Policy and governance

Organizations should maintain a formal domain-defence strategy that includes registering common lookalike domains for high-value brands and services.
Clear IDN usage policies should prohibit mixed-script domains in official communications.
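To decide which lookalikes are worth registering or monitoring, a team can enumerate candidate variants of its own labels. The sketch below swaps one character at a time using a small, hand-picked map and prints the Punycode form of each variant. LOOKALIKES, single_swap_variants, and watchlist are hypothetical names, and a real programme would draw on the full Unicode confusables data.

```python
# Latin letter -> lookalikes worth checking (illustrative subset only).
LOOKALIKES = {
    "a": ["\u0430"],            # Cyrillic а
    "c": ["\u0441"],            # Cyrillic с
    "e": ["\u0435"],            # Cyrillic е
    "i": ["\u0456"],            # Cyrillic і
    "o": ["\u03bf", "\u043e"],  # Greek ο, Cyrillic о
    "p": ["\u0440"],            # Cyrillic р
    "s": ["\u0455"],            # Cyrillic ѕ
    "x": ["\u0445"],            # Cyrillic х
    "y": ["\u0443"],            # Cyrillic у
}


def single_swap_variants(label: str) -> set[str]:
    """Variants of label with exactly one character replaced by a lookalike."""
    variants = set()
    for index, ch in enumerate(label):
        for alt in LOOKALIKES.get(ch, []):
            variants.add(label[:index] + alt + label[index + 1:])
    return variants


def watchlist(label: str) -> dict[str, str]:
    """Map each variant to the ASCII (Punycode) form seen in DNS and logs."""
    return {
        variant: variant.encode("idna").decode("ascii")
        for variant in sorted(single_swap_variants(label))
    }


for variant, ascii_form in watchlist("paypal").items():
    print(variant, "->", ascii_form)
```

Single swaps keep the list short enough to review by hand. Multi-character swaps grow quickly and are usually better handled by matching observed domains against a skeleton than by enumerating every combination in advance.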

Technical controls

Email gateways and web proxies must normalize Unicode and clearly surface Punycode warnings for suspicious links.
DNS filtering systems should treat newly observed xn-- domains as high risk until reviewed.
Certificate transparency monitoring should alert security teams when certificates are issued for lookalike domains.

Operational practices

Brand-monitoring programs should track domain registrations and abuse reports in near real time.
Phishing simulations should include realistic homoglyph-based scenarios to improve user awareness.
Incident response playbooks should document takedown workflows, including registrar and hosting provider escalation.

Best-Practice Checklist

Enforce multi-factor authentication on all sensitive services.
Normalize and inspect all inbound URLs, displaying Punycode when appropriate.
Monitor certificate transparency and passive DNS data for newly registered lookalike domains.
Block or strictly review mixed-script domains.
Run phishing simulations that include homoglyph techniques.
Register defensive domain variations for critical brands.
Require secondary verification for financial or credential-related requests.

Emerging Trends to Watch

Attackers increasingly automate homoglyph generation and domain registration at scale.
AI-assisted phishing improves the credibility of lures while homoglyph domains host the deception layer.
Homoglyph abuse is expanding into software supply chains through deceptive package and repository names.
Cross-channel impersonation combines homoglyphs with chat platforms and voice cloning to increase trust and success rates.

Conclusion

Homoglyph attacks demonstrate how minor visual manipulation can lead to major security failures. By exploiting Unicode complexity and human perception, attackers bypass both users and poorly normalized defences.

Effective mitigation requires layered controls: Unicode normalization, confusable matching, mixed-script detection, proactive domain monitoring, and strong user verification processes. When combined, these measures significantly raise the cost and complexity for attackers—turning a simple deception technique into a far less effective threat.

The post Homoglyph Attacks: How Lookalike Characters Are Exploited for Cyber Deception appeared first on Blogs on Information Technology, Network & Cybersecurity | Seqrite.

By rooter