Technical Deep-Dive

How BeLikeNative Uses Claude Sonnet 4.6 for AI Text Humanization

Inside the Chrome extension that transforms AI-generated text into natural human writing using Claude's streaming API. Architecture, prompt design, latency, and cost analysis.

By Michael Lip · May 28, 2026 · 12 min read

10K+ Active Users · 4.6★ Chrome Store Rating · 80+ Languages Supported · 1-3s Avg Response Time · 296 User Reviews

1. What Is BeLikeNative?

BeLikeNative is a Chrome extension built for non-native English speakers. With over 10,000 users and a 4.6-star rating from 296 reviews, it provides L1-aware grammar correction across 80+ languages. Its most technically interesting feature, though, is Humanize AI mode, which uses Claude Sonnet 4.6 via the streaming API to rewrite AI-generated text so it reads as if a human wrote it.

The problem it solves is practical: AI-generated text has detectable patterns, including uniform sentence length, predictable transitions ("Furthermore," "Moreover,"), a lack of colloquial contractions, and an overly formal register. These patterns flag content in AI detectors and, more importantly, sound unnatural to human readers. BeLikeNative's Humanize AI mode rewrites text to eliminate these tells while preserving the original meaning and factual content.

This article is a technical deep-dive into how BeLikeNative integrates Claude Sonnet 4.6 to accomplish this, including the Chrome extension architecture, streaming implementation, prompt engineering approach, and real-world performance characteristics.

2. Architecture Overview

BeLikeNative follows a standard Manifest V3 Chrome extension architecture with one critical design decision: all API communication routes through the background service worker, never from the content script directly. This is both a security measure (the API key stays out of the page context) and a practical one (requests made from a content script are subject to the host page's cross-origin restrictions, whereas the service worker can call any origin the extension has host permissions for).

Request Flow

When a user selects AI-generated text and presses Ctrl+Shift+L, the following sequence executes:

Content Script captures the selected text from the active DOM element and sends it to the background service worker over a long-lived chrome.runtime.connect port.
Background Service Worker receives the message, constructs the API request with the system prompt and user text, and opens a streaming connection to the Claude API.
Claude API processes the request and begins streaming content_block_delta events containing the rewritten text chunks.
Service Worker relays each chunk back to the content script via the open message port.
Content Script progressively replaces the selected text in the DOM with the incoming stream, giving the user real-time visual feedback of the rewrite in progress.
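The content-script half of this flow can be sketched as follows. This is a minimal sketch rather than BeLikeNative's actual code: the function name `requestHumanize` and the `humanize` and `stream-end` message types are assumptions made for illustration, and only `stream-chunk` appears in the handler shown later in this article. The function takes a port-like object so it can run outside Chrome; in an extension the port would come from `chrome.runtime.connect()`.

```javascript
// Content-script side of the request flow (illustrative sketch).
// `port` is any object with the chrome.runtime.Port shape:
//   port.postMessage(msg) and port.onMessage.addListener(fn).
function requestHumanize(port, selectedText, onUpdate, onDone) {
  let rewritten = '';

  port.onMessage.addListener((msg) => {
    if (msg.type === 'stream-chunk') {
      // Append the new chunk and re-render everything received so far.
      rewritten += msg.text;
      onUpdate(rewritten);
    } else if (msg.type === 'stream-end') { // hypothetical message type
      onDone(rewritten);
    }
  });

  // Hand the selected text to the background service worker.
  port.postMessage({ type: 'humanize', text: selectedText }); // hypothetical type
}
```

Passing the accumulated text to `onUpdate`, rather than the individual chunk, keeps the rendering callback stateless: it can simply overwrite the selection with whatever has arrived so far.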

Dual-Engine Design

BeLikeNative uses two different AI models for two different purposes:

Grammar Engine (Default Mode)

Model: GPT-5.4 mini — optimized for fast, low-cost grammar correction. Handles spelling, punctuation, subject-verb agreement, article usage, and L1-specific error patterns. Latency target: under 500ms.

Humanize AI Engine (Premium Mode)

Model: Claude Sonnet 4.6 via streaming API — optimized for stylistic transformation. Restructures sentences, varies paragraph length, injects natural speech patterns, and adjusts register to match context. Activated via Ctrl+Shift+L. Latency: 1-3 seconds for typical passages.

This separation is deliberate. Grammar correction is a classification task where speed matters more than nuance; GPT-5.4 mini excels here. Text humanization is a creative rewriting task where output quality is paramount; Claude Sonnet 4.6 is the better tool for that job.

3. Why Claude Sonnet 4.6 for Humanization

The choice of Claude Sonnet 4.6 over alternatives was driven by extensive A/B testing across five dimensions: naturalness, meaning preservation, style variety, latency, and cost. Here is how the models compared in internal evaluation:

Dimension	Claude Sonnet 4.6	GPT-4o	Gemini 2.5 Flash
Naturalness (1-10)	9.2	7.8	7.1
Meaning Preservation	97%	94%	91%
Style Variety (unique phrasings)	High	Medium	Low
Avg Latency (streaming first token)	380ms	290ms	310ms
Cost per 1K tokens (output)	$0.015	$0.010	$0.003
AI Detector Bypass Rate	94%	76%	68%

Claude Sonnet 4.6 won on the metrics that matter most for this use case: naturalness, meaning preservation, and AI detector bypass rate. It was also the slowest to first token and the most expensive of the three, at $0.015 per 1K output tokens against $0.010 for GPT-4o, but that premium is justified by the 18-point improvement in bypass rate. When users pay for a humanization feature, they expect the output to actually pass as human-written. A 76% bypass rate means roughly 1 in 4 rewrites would still be flagged, which is unacceptable for the product promise.

Claude's Specific Advantages for Humanization

Register sensitivity: Claude naturally adjusts formality level based on context. A Slack message gets casual contractions; an academic abstract gets measured prose. This behavior emerges from the system prompt without needing explicit formality parameters.
Varied sentence structure: GPT-4o tends to produce alternating short-long sentence patterns. Claude produces genuinely varied structures including fragments, parentheticals, and mid-sentence pivots that characterize real human writing.
Streaming quality: Claude's streaming output reads coherently even mid-stream. Each chunk is a natural continuation, which matters because users see the text appear in real-time and a garbled mid-stream output would undermine confidence.

4. Streaming Implementation

BeLikeNative uses Claude's streaming API via fetch with a ReadableStream reader in the background service worker. Here is the core implementation pattern:

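// Background service worker: simplified streaming handler.
// `port` is the chrome.runtime.Port opened by the content script;
// `apiKey` is the user's Anthropic API key.
async function humanizeText(text, userLang, port, apiKey) {
  const response = await fetch('https://api.anthropic.com/v1/messages', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': apiKey,
      'anthropic-version': '2023-06-01'
    },
    body: JSON.stringify({
      model: 'claude-sonnet-4-6-20250514',
      max_tokens: 4096,
      stream: true,
      system: buildSystemPrompt(userLang),
      messages: [{ role: 'user', content: text }]
    })
  });

  // Surface HTTP errors (bad key, rate limit) instead of parsing an error body as SSE.
  if (!response.ok) {
    throw new Error(`Claude API request failed: ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // SSE lines can be split across network chunks, so accumulate the
    // decoded text and process only complete lines.
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // Keep incomplete line in buffer

    for (const line of lines) {
      if (!line.startsWith('data: ')) continue; // Skip "event:" lines and blanks
      const data = JSON.parse(line.slice(6));
      // Only text deltas carry rewritten text; other delta types have no .text
      if (data.type === 'content_block_delta' && data.delta.type === 'text_delta') {
        // Relay chunk to content script
        port.postMessage({
          type: 'stream-chunk',
          text: data.delta.text
        });
      }
    }
  }
}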

Why Streaming Matters for This Use Case

Without streaming, the user would press Ctrl+Shift+L, wait 2-3 seconds staring at their original text, then see it replaced all at once. With streaming, the replacement begins within 400ms and the user watches the humanized text materialize chunk by chunk. This creates a perception of speed even when total response time is the same, and it gives users confidence that the system is working.

Server-Sent Events Parsing

Claude's streaming API uses the Server-Sent Events (SSE) protocol. The key events BeLikeNative handles:

message_start — contains the message ID and model info. Used for logging.
content_block_start — signals the beginning of a text block.
content_block_delta — contains the actual text chunk in delta.text. This is the event that drives the UI update.
message_delta — contains the stop reason and final token usage. Used for the cost calculator.
message_stop — signals completion. Triggers the final UI state.
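One way to handle the events listed above is to fold each parsed event into a small state object. The sketch below is illustrative only: `applyStreamEvent` and `initialStreamState` are hypothetical names, and the field layout it reads (`message.usage.input_tokens`, `delta.stop_reason`, `usage.output_tokens`) follows Anthropic's documented streaming event shapes rather than anything confirmed about BeLikeNative's code.

```javascript
// Starting state for one streamed response (hypothetical shape).
const initialStreamState = {
  id: null, model: null, text: '', stopReason: null,
  inputTokens: 0, outputTokens: 0, done: false,
};

// Fold one parsed SSE event into the running state and return the new state.
function applyStreamEvent(state, event) {
  switch (event.type) {
    case 'message_start':
      // Message ID, model, and input token count: useful for logging.
      return {
        ...state,
        id: event.message.id,
        model: event.message.model,
        inputTokens: event.message.usage?.input_tokens ?? 0,
      };
    case 'content_block_delta':
      // Only text deltas carry user-visible text.
      if (event.delta.type !== 'text_delta') return state;
      return { ...state, text: state.text + event.delta.text };
    case 'message_delta':
      // Stop reason and final output token count: feeds the cost estimate.
      return {
        ...state,
        stopReason: event.delta.stop_reason,
        outputTokens: event.usage?.output_tokens ?? state.outputTokens,
      };
    case 'message_stop':
      return { ...state, done: true };
    default:
      // content_block_start, content_block_stop, and ping need no state change here.
      return state;
  }
}
```

Keeping the reducer pure (no DOM or port access) makes it easy to unit-test against recorded event sequences.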

5. Interactive Demo: Streaming Humanization

Paste AI-generated text below and click "Humanize" to see a simulated streaming rewrite. The output appears character-by-character, mimicking exactly how Claude's streaming API delivers results to BeLikeNative users.


6. Prompt Engineering for Humanization

The system prompt is the most critical component. It must instruct Claude to rewrite text so it reads naturally without altering the factual content. Here is the general structure (simplified from the production version):

You are a writing assistant that makes AI-generated text
sound naturally human. The user's native language is {lang}.

Rules:
- Preserve ALL factual claims, numbers, and proper nouns
- Vary sentence length: mix short punchy sentences with
  longer compound ones
- Use contractions where natural (it's, don't, we've)
- Replace generic transitions (Furthermore, Moreover, In
  addition) with context-specific connectors or remove them
- Add occasional colloquialisms appropriate to the context
- Break predictable paragraph structures
- Keep the same approximate length (within 15%)
- Do NOT add disclaimers, caveats, or meta-commentary
- Do NOT use markdown formatting
- Output ONLY the rewritten text, nothing else

L1-Aware Adjustments

A distinguishing feature of BeLikeNative is that it knows the user's native language (L1). This matters because non-native speakers from different language backgrounds make systematically different errors and have different naturalness expectations:

Spanish/Portuguese speakers: The system adjusts for relative clause patterns and adjective placement that translate directly from Romance languages.
East Asian language speakers (Chinese, Japanese, Korean): Emphasis on article usage (a/an/the), plural markers, and preposition selection, which are absent or work differently in these L1s.
German/Dutch speakers: Verb placement in subordinate clauses and compound noun handling.
Arabic/Hebrew speakers: Right-to-left reading patterns affect sentence rhythm preferences; the system adjusts clause ordering.

The L1 parameter is passed into the system prompt so Claude can adjust its rewriting strategy accordingly, producing output that sounds natural specifically for a speaker of that native language writing in English.
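The streaming handler earlier calls `buildSystemPrompt(userLang)` without showing it. A minimal sketch of what such a function might look like follows. The rule text is copied from the simplified prompt above; the `L1_FOCUS` table, its keys, and the "Pay particular attention to" sentence are assumptions for illustration, not the production prompt.

```javascript
// Hypothetical per-L1 focus notes, paraphrased from the list above.
const ROMANCE = 'relative clause patterns and adjective placement';
const EAST_ASIAN = 'article usage (a/an/the), plural markers, and preposition selection';
const GERMANIC = 'verb placement in subordinate clauses and compound noun handling';
const SEMITIC = 'clause ordering and sentence rhythm';
const L1_FOCUS = {
  Spanish: ROMANCE, Portuguese: ROMANCE,
  Chinese: EAST_ASIAN, Japanese: EAST_ASIAN, Korean: EAST_ASIAN,
  German: GERMANIC, Dutch: GERMANIC,
  Arabic: SEMITIC, Hebrew: SEMITIC,
};

// Rules taken from the simplified prompt shown in this section.
const RULES = [
  'Preserve ALL factual claims, numbers, and proper nouns',
  'Vary sentence length: mix short punchy sentences with longer compound ones',
  "Use contractions where natural (it's, don't, we've)",
  'Replace generic transitions (Furthermore, Moreover, In addition) with context-specific connectors or remove them',
  'Add occasional colloquialisms appropriate to the context',
  'Break predictable paragraph structures',
  'Keep the same approximate length (within 15%)',
  'Do NOT add disclaimers, caveats, or meta-commentary',
  'Do NOT use markdown formatting',
  'Output ONLY the rewritten text, nothing else',
];

// Assemble the system prompt for a given native language.
function buildSystemPrompt(lang) {
  const lines = [
    'You are a writing assistant that makes AI-generated text sound naturally human.',
    `The user's native language is ${lang}.`,
  ];
  const focus = L1_FOCUS[lang];
  if (focus) lines.push(`Pay particular attention to ${focus}.`); // assumed wording
  lines.push('', 'Rules:', ...RULES.map((rule) => `- ${rule}`));
  return lines.join('\n');
}
```

Languages without an entry in the table simply get the base prompt, so an unrecognized L1 degrades gracefully instead of failing.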

Why These Prompt Rules Work

Each rule targets a specific AI detection signal:

"Vary sentence length" — AI text tends toward uniform sentence length (15-25 words). Human writing has standard deviations 2-3x higher.
"Use contractions" — AI models default to formal register. Real humans writing emails, Slack messages, or blog posts contract naturally.
"Replace generic transitions" — "Furthermore" and "Moreover" are the most reliable AI tells. Humans rarely use these words outside academic writing.
"Do NOT add disclaimers" — Without this rule, Claude would prepend "Here's a more natural version:" which defeats the entire purpose.

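Two of these signals are easy to measure directly. The sketch below computes sentence-length statistics and counts the generic transitions named in the prompt. It is a rough illustration, not a detector: the function names are hypothetical, and splitting on `.`, `!`, and `?` is a naive heuristic that will miscount abbreviations and decimals.

```javascript
// Word-count statistics per sentence (population standard deviation).
function sentenceLengthStats(text) {
  const sentences = text.split(/[.!?]+/).map((s) => s.trim()).filter(Boolean);
  if (sentences.length === 0) return { count: 0, mean: 0, stdDev: 0 };

  const lengths = sentences.map((s) => s.split(/\s+/).length);
  const mean = lengths.reduce((sum, n) => sum + n, 0) / lengths.length;
  const variance =
    lengths.reduce((sum, n) => sum + (n - mean) ** 2, 0) / lengths.length;
  return { count: lengths.length, mean, stdDev: Math.sqrt(variance) };
}

// Count occurrences of the generic transitions listed in the prompt rules.
function countGenericTransitions(text) {
  const matches = text.match(/\b(?:furthermore|moreover|in addition)\b/gi);
  return matches ? matches.length : 0;
}
```

Comparing `stdDev` before and after a rewrite gives a quick sanity check that the "vary sentence length" rule had an effect.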
7. Token & Cost Calculator

Estimate token usage and cost for a humanization request. Paste text below to see how many tokens it would consume and the approximate API cost using Claude Sonnet 4.6 pricing.


8. Performance Characteristics

Real-world performance data from production usage:

Time to First Token: 380ms
Median Total Latency: 1.2s
P95 Latency: 2.8s
Avg Input Tokens: ~450
Avg Output Tokens: ~420
Avg Cost per Request: $0.008

Latency Breakdown

The total time from keypress to fully replaced text breaks down as follows:

Content script to service worker: ~5ms (Chrome IPC, negligible)
Service worker to Claude API: ~80ms (HTTPS connection, dependent on user location)
Claude processing to first token: ~300ms (model inference startup)
Streaming tokens: ~800ms for a typical 400-token response (an effective rate of ~500 tokens/second)
DOM replacement: ~5ms per chunk (batched with requestAnimationFrame)

The perceived latency is much lower than total latency because the user sees text appearing within 400ms of pressing the shortcut. The streaming architecture turns a 1.2-second operation into something that feels near-instant.
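The requestAnimationFrame batching mentioned in the breakdown can be sketched like this. It is an assumed implementation, not the extension's actual code: `createChunkBatcher` is a hypothetical name, and the scheduler is injectable so the logic can run outside a browser.

```javascript
// Coalesce incoming chunks so the DOM is touched at most once per frame.
// `render` receives the concatenated text of all chunks since the last flush.
function createChunkBatcher(render, schedule = (fn) => requestAnimationFrame(fn)) {
  let pending = '';
  let scheduled = false;

  return function push(chunk) {
    pending += chunk;
    if (scheduled) return; // A flush is already queued for this frame.
    scheduled = true;
    schedule(() => {
      const batch = pending;
      pending = '';
      scheduled = false;
      render(batch);
    });
  };
}
```

If several chunks arrive between two frames, they are written in a single DOM update, which keeps the per-chunk cost from multiplying on fast streams.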

Token Usage Patterns

The system prompt is approximately 320 tokens. For a typical humanization request:

Short text (1-2 sentences): ~80 input + 320 system + ~75 output = ~475 total tokens, ~$0.003 cost
Medium text (1-2 paragraphs): ~300 input + 320 system + ~280 output = ~900 total tokens, ~$0.006 cost
Long text (full page): ~1,200 input + 320 system + ~1,100 output = ~2,620 total tokens, ~$0.020 cost
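The arithmetic behind these rows can be sketched as a small function. The output price ($15 per million tokens, i.e. $0.015 per 1K) comes from the comparison table earlier; the input price of $3 per million tokens is an assumption, since the article does not state it. The names `estimateCost`, `PRICE_PER_MTOK`, and `SYSTEM_PROMPT_TOKENS` are hypothetical.

```javascript
// USD per million tokens. The input price is an assumption, not from the article.
const PRICE_PER_MTOK = { input: 3.0, output: 15.0 };
const SYSTEM_PROMPT_TOKENS = 320; // approximate size given in this section

// Estimate total tokens and cost for one humanization request.
function estimateCost(userTokens, outputTokens) {
  // The system prompt is billed as input on every request.
  const inputTokens = userTokens + SYSTEM_PROMPT_TOKENS;
  const usd =
    (inputTokens * PRICE_PER_MTOK.input + outputTokens * PRICE_PER_MTOK.output) / 1e6;
  return { totalTokens: inputTokens + outputTokens, usd };
}
```

Under these assumed prices, output tokens dominate the bill: each one costs five times as much as an input token, so the fixed system prompt adds only about $0.001 per request.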

9. Privacy & Data Handling

Privacy is a critical concern for a tool that processes user text. BeLikeNative's approach:

Data Processing Policy

Text is sent to the Claude API for processing but is not stored. The API request is stateless: text goes in, rewritten text comes back, and no conversation history is maintained between requests.

No training data contribution. Anthropic's API terms state that data sent through the API is not used to train Claude models. BeLikeNative explicitly relies on this guarantee.

No server-side logging. BeLikeNative does not operate its own backend server. The extension communicates directly from the service worker to the Claude API. There is no intermediary server that could log or store text.

API keys are user-provided (Premium). Premium users supply their own Anthropic API key, which is stored locally in chrome.storage.local and never leaves the browser except in API request headers.

10. Technical Lessons & Edge Cases

Manifest V3 Service Worker Limitations

Chrome's Manifest V3 terminates service workers after 30 seconds of inactivity. For streaming requests that can take 3+ seconds, this is not an issue during active streaming (the open connection keeps the worker alive). However, if a user triggers humanization and immediately switches tabs, the service worker can be terminated mid-stream. BeLikeNative handles this by:

Using chrome.runtime.Port connections instead of one-shot sendMessage. Each message sent over the port resets the service worker's idle timer, so a worker that is actively relaying chunks stays alive.
Implementing recovery logic: if the port disconnects mid-stream, the content script shows the partial result with a "retry" option.
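The partial-result behavior on disconnect can be sketched as below. This is an assumed design, not the extension's actual code: `createStreamSession`, the `stream-end` message type, and the status names are hypothetical. It relies only on the `onMessage` and `onDisconnect` events that `chrome.runtime.Port` exposes.

```javascript
// Track one streamed rewrite and report state changes to `onState`.
// If the port closes before the stream finishes, report the partial text
// so the UI can show it alongside a retry option.
function createStreamSession(port, onState) {
  let text = '';
  let finished = false;

  port.onMessage.addListener((msg) => {
    if (msg.type === 'stream-chunk') {
      text += msg.text;
      onState({ status: 'streaming', text });
    } else if (msg.type === 'stream-end') { // hypothetical message type
      finished = true;
      onState({ status: 'done', text });
    }
  });

  port.onDisconnect.addListener(() => {
    // A disconnect after completion is normal cleanup, not an error.
    if (!finished) onState({ status: 'interrupted', text, canRetry: true });
  });
}
```

A retry would open a fresh port and resend the original selection; resuming a half-finished stream is not attempted in this sketch.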

DOM Replacement Challenges

Replacing selected text in arbitrary web pages is harder than it sounds. The selected text might span multiple DOM nodes, be inside a contenteditable div, a textarea, or a shadow DOM component. BeLikeNative uses a strategy hierarchy:

If the selection is inside a textarea or input, use element.setRangeText().
If the selection is inside a contenteditable element, use document.execCommand('insertText') (deprecated but still the most reliable cross-browser method).
For all other cases, use Range.deleteContents() followed by Range.insertNode().
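The hierarchy above amounts to a small dispatch on the element that holds the selection. The following is a minimal sketch with hypothetical function names and return values; it shows only the decision and the form-field branch, and leaves out shadow DOM and multi-node selections.

```javascript
// Decide which replacement strategy applies to the element holding the selection.
function pickReplacementStrategy(el) {
  const tag = String(el.tagName || '').toUpperCase();
  if (tag === 'TEXTAREA' || tag === 'INPUT') return 'setRangeText';
  if (el.isContentEditable) return 'execCommand';
  return 'range';
}

// First strategy: replace the current selection inside a textarea or input.
function replaceInField(field, text) {
  // 'end' places the caret after the inserted text.
  field.setRangeText(text, field.selectionStart, field.selectionEnd, 'end');
}
```

Checking the form-field case first matters because textarea and input elements keep their own selection state, separate from the document's selection.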

Rate Limiting

To prevent accidental API cost spikes, BeLikeNative implements client-side rate limiting: a maximum of 20 humanization requests per minute, with a cooldown indicator in the extension popup. This both protects the user's API quota and prevents abuse in shared API key scenarios.
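A limit like this is commonly implemented as a sliding window over recent request timestamps. The sketch below shows one such approach. It is an assumption about the technique, not BeLikeNative's confirmed implementation, and `createRateLimiter` and `tryAcquire` are hypothetical names. The clock is injectable so the behavior can be tested without waiting.

```javascript
// Sliding-window limiter: at most `maxRequests` within any `windowMs` span.
function createRateLimiter(maxRequests = 20, windowMs = 60_000, now = Date.now) {
  const timestamps = []; // Times of accepted requests, oldest first.

  return {
    tryAcquire() {
      const t = now();
      // Drop timestamps that have aged out of the window.
      while (timestamps.length > 0 && t - timestamps[0] >= windowMs) {
        timestamps.shift();
      }
      if (timestamps.length >= maxRequests) {
        // Time until the oldest request leaves the window: drives the cooldown UI.
        return { allowed: false, retryInMs: windowMs - (t - timestamps[0]) };
      }
      timestamps.push(t);
      return { allowed: true, retryInMs: 0 };
    },
  };
}
```

The `retryInMs` value is what a cooldown indicator would count down from. Because a service worker's in-memory state is lost when it is terminated, a real extension would likely need to persist the timestamps, for example in `chrome.storage`.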

Summary

BeLikeNative's integration of Claude Sonnet 4.6 demonstrates a clean pattern for adding AI-powered text transformation to a Chrome extension. The key architectural decisions — background service worker for API isolation, streaming for perceived performance, and L1-aware prompt engineering for output quality — are applicable to any Chrome extension that needs to process user text through an LLM.

The dual-engine approach (GPT-5.4 mini for grammar, Claude Sonnet 4.6 for humanization) is also worth noting as a pattern: use the cheapest model that meets quality requirements for each task, rather than routing everything through a single expensive model.