Building a Video Call App with Filters

Real-time filters in Google Meet, Discord, and WhatsApp calls (background blur, virtual backgrounds, the subtle zoom that keeps you centered) feel like native-only magic. They aren't. The same effects can be built in React Native with VisionCamera and frame processors. Here's how.

At a high level, one swap unlocks everything. VisionCamera takes over the camera, a native engine composites the effects onto every frame, and each processed frame is injected into the video track LiveKit publishes:

VisionCamera ──► vc-engine (native effects) ──► LiveKit VideoSource ──► WebRTC encoder ──► peers

Everything downstream of the engine (LiveKit's encoder, transport, and signaling) stays exactly as it ships; only the pixels change. The rest of this post builds that pipeline up one piece at a time.

The Backend

The backend here is deliberately boring, so the interesting work can happen in the app.

It runs on LiveKit, a popular open-source project for scalable, multi-user conferencing built on WebRTC. LiveKit wins over raw WebRTC for one reason: it keeps the backend simple, leaving more time for the app itself. The server is a small Fastify app with two routes (one to create a room, one to join it) backed by standard LiveKit SDK calls. The join route, joinRoom, also returns the server's public URL and the access token the client needs to connect, and the whole thing deploys with a simple Docker Compose file. That's as far as this post goes on the backend; the rest would only pad it out.

A Basic Video Call

On the client, a fresh React Native CLI app with @livekit/react-native is enough for a working call:

JSX

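import * as React from 'react';
import { FlatList, StyleSheet, View } from 'react-native';
import {
  LiveKitRoom,
  VideoTrack,
  isTrackReference,
  useTracks,
} from '@livekit/react-native';
import { Track } from 'livekit-client';

// wsURL and token are the values returned by the backend's joinRoom route.
export default function App() {
  return (
    <LiveKitRoom
      serverUrl={wsURL}
      token={token}
      connect={true}
      audio={true}
      video={true}
    >
      <RoomView />
    </LiveKitRoom>
  );
}

const RoomView = () => {
  // Every camera track currently in the room.
  const tracks = useTracks([Track.Source.Camera]);
  const renderTrack = ({ item }) => {
    if (isTrackReference(item)) {
      return <VideoTrack trackRef={item} style={styles.participantView} />;
    }
    // Not a full track reference: render an empty tile instead.
    return <View style={styles.participantView} />;
  };
  return (
    <View style={styles.container}>
      <FlatList data={tracks} renderItem={renderTrack} />
    </View>
  );
};

// Minimal styles so the snippet stands on its own; adjust to taste.
const styles = StyleSheet.create({
  container: { flex: 1 },
  participantView: { height: 300 },
});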

That minimal setup is already a working video call. The catch shows up the moment you want to do more with it: the LiveKit RN SDK owns its camera and exposes almost no control over it. No filters, no advanced camera features. A handful of video presets is about it.

Which raises the obvious question: what if a proper camera library drove the call instead? And what's better for that than VisionCamera, the react-native-vision-camera library?

💡

LiveKit's WebRTC layer can transform its own camera frames, but only through a video processor written and registered on the native side; there's no real JS API for it. Even then, the camera underneath stays WebRTC's barely-configurable one, with none of what VisionCamera offers for free: constraints, multiple outputs, a full-resolution photo mid-call, and more.

How LiveKit Injects Frames

Replacing the camera starts with understanding how the SDK feeds frames in the first place. On React Native, LiveKit uses @livekit/react-native-webrtc (a fork of react-native-webrtc) for all of its WebRTC, audio, and camera plumbing. The goal is to replace only the camera, leaving everything else untouched.

That plumbing lives entirely on the native side. When the SDK creates a camera track, it builds a WebRTC video source, starts the native camera capturer, and (the key detail) wires the capturer's frames into that source.

On Android, GetUserMediaImpl.createVideoTrack() hands the capturer the source's CapturerObserver:

android/src/main/java/com/oney/WebRTCModule/GetUserMediaImpl.java

Java • L421-L422

VideoSource videoSource = pcFactory.createVideoSource(videoCapturer.isScreencast());
videoCapturer.initialize(surfaceTextureHelper, reactContext, videoSource.getCapturerObserver());

The line that matters is videoCapturer.initialize(..., videoSource.getCapturerObserver()). From there, every camera frame reaches the encoder through a single call: videoSource.getCapturerObserver().onFrameCaptured(frame).

On iOS, it's the same dance with different vocabulary. An RTCVideoSource already conforms to RTCVideoCapturerDelegate, so the capturer is created with the source itself as its delegate, via initWithDelegate:

ios/RCTWebRTC/WebRTCModule+RTCMediaStream.m

Objective-C • L146-L152

RTCVideoSource *videoSource = [self.peerConnectionFactory videoSource];
...
RTCCameraVideoCapturer *videoCapturer = [[RTCCameraVideoCapturer alloc] initWithDelegate:videoSource];

Every frame then reaches the source through the delegate method [videoSource capturer:capturer didCaptureVideoFrame:frame], the iOS twin of Android's onFrameCaptured.

That observer (Android) / delegate (iOS) is the entire seam. The video source doesn't care who feeds it frames; the camera capturer is just the default producer. Grab a reference to it, push frames in directly, and LiveKit will encode and ship them as if they came straight from the camera. The plan follows from there:

Let VisionCamera own the real camera.
Push each finished frame into videoSource.getCapturerObserver().onFrameCaptured(...) on Android, or the source's capturer delegate method on iOS, bypassing WebRTC's camera while leaving its encoder, transport, and signaling untouched.

Taking Over the Camera with VisionCamera

Step one is to disable LiveKit's camera with video={false}, then mount a VisionCamera frame output in its place:

JSX

import { Engine } from 'react-native-vc-engine';

const frameOutput = useFrameOutput({
  pixelFormat: 'yuv',
  dropFramesWhileBusy: true,
  onFrame: (frame) => {
    'worklet';
    try {
      Engine.forwardFrame(frame);
    } finally {
      frame.dispose(); // always release the frame, even if forwarding throws
    }
  },
});

return (
  <Camera
    device={device}
    isActive
    outputs={[frameOutput]}
    constraints={[{ resolutionBias: frameOutput }, { fps: CAPTURE.fps }]}
  />
);

This is ordinary VisionCamera v5: constraints and a frame output. The one new piece is Engine.forwardFrame.

react-native-vc-engine is a frame-processor Nitro module (in VisionCamera v5, every frame processor is one). forwardFrame is its single entry point, the function the worklet calls on every frame. Inside, it does the heavy lifting (the format conversion and filters covered below), and as its final step calls push (its real name in the engine), handing the finished frame to LiveKit's video source through the same onFrameCaptured (Android) / capturer delegate (iOS) seam from earlier:

Android

Kotlin

fun push(frame: VideoFrame): Boolean {
  val id = activeTrackId ?: return false
  val source = sources[id] ?: return false
  source.capturerObserver.onFrameCaptured(frame)
  return true
}

iOS

Swift

func push(pixelBuffer: CVPixelBuffer, rotation: RTCVideoRotation, timestampNs: Int64) -> Bool {
  guard let source = activeSource, let capturer = activeCapturer else { return false }
  let buffer = RTCCVPixelBuffer(pixelBuffer: pixelBuffer)
  let frame = RTCVideoFrame(buffer: buffer, rotation: rotation, timeStampNs: timestampNs)
  source.capturer(capturer, didCapture: frame)
  return true
}

But where do sources and activeTrackId come from? push routes each frame to sources[activeTrackId], so a real WebRTC source has to land in that map first. That's the other half of the bridge, and it runs once when the call starts.

Where the Source Comes From

From JS, RtcBridge (vc-engine's control-plane companion to Engine, exported from the same react-native-vc-engine module) is asked to mint a track, mark it active, and publish it like any other:

TypeScript

const info = RtcBridge.createTrack(width, height, fps);
RtcBridge.setActiveTrackId(info.trackId);
// wrap info.trackId as a MediaStreamTrack and publishTrack() it

createTrack is where sources gets filled. react-native-webrtc never exposes its PeerConnectionFactory, so the bridge reaches into the running WebRTCModule and borrows it: through reflection on Android, KVC on iOS. It then builds a source-backed track and registers it twice: once with LiveKit so the track is publishable, once with the engine's own registry so frames can find the source.

Android

Kotlin

val factory = webRTC.peerConnectionFactory()       // reflected out of WebRTCModule.mFactory
val source = factory.createVideoSource(false)       // our own source
val track = factory.createVideoTrack(trackId, source)
webRTC.registerLocalTrack(trackId, track, source)   // into GetUserMediaImpl.tracks → publishable
VCTrackRegistry.register(trackId, source)           // ← this IS the `sources` map

iOS

Swift

guard let factory = module.value(forKey: "peerConnectionFactory") as? RTCPeerConnectionFactory
else { return nil }                                  // factory unreachable: no track can be created
let source = factory.videoSource()                  // our own source
let track = factory.videoTrack(with: source, trackId: trackId)
localTracks[trackId] = track                         // into WebRTCModule.localTracks → publishable
let capturer = RTCVideoCapturer(delegate: source)
registry.register(trackId: trackId, source: source, capturer: capturer)  // ← the `sources` map

So sources is just a trackId → VideoSource map, and each source is a genuine WebRTC source created from LiveKit's own factory, which is precisely why a frame pushed into it lands in LiveKit's encoder. activeTrackId is whatever JS last passed to setActiveTrackId, so push always targets the source LiveKit is currently publishing.
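
The routing described above can be modelled in a few lines. The sketch below is a hypothetical TypeScript stand-in, not vc-engine's code: TrackRegistry, FrameSink, and EngineFrame are names invented for illustration, and FrameSink stands in for a real WebRTC video source.

```typescript
// Hypothetical model of the engine's track registry. The method names mirror
// the post, but this is an illustration rather than the engine's real code.

interface EngineFrame {
  width: number;
  height: number;
  timestampNs: number;
}

// Stand-in for a WebRTC video source: all it needs is a way to accept frames.
interface FrameSink {
  onFrameCaptured(frame: EngineFrame): void;
}

class TrackRegistry {
  private sources = new Map<string, FrameSink>();
  private activeTrackId: string | null = null;

  // Called once per track, at createTrack time.
  register(trackId: string, source: FrameSink): void {
    this.sources.set(trackId, source);
  }

  // Called from JS whenever the published track changes.
  setActiveTrackId(trackId: string | null): void {
    this.activeTrackId = trackId;
  }

  // Called for every frame. Returns false when there is nowhere to send it.
  push(frame: EngineFrame): boolean {
    if (this.activeTrackId === null) return false;
    const source = this.sources.get(this.activeTrackId);
    if (source === undefined) return false;
    source.onFrameCaptured(frame);
    return true;
  }
}

// Usage: frames are rejected until a track is both registered and active.
const received: EngineFrame[] = [];
const registry = new TrackRegistry();
const sampleFrame: EngineFrame = { width: 640, height: 480, timestampNs: 1 };

console.log(registry.push(sampleFrame)); // false: no active track yet
registry.register("track-1", { onFrameCaptured: (f) => received.push(f) });
registry.setActiveTrackId("track-1");
console.log(registry.push(sampleFrame)); // true: delivered to the source
```

Returning a boolean instead of throwing keeps the per-frame path cheap: a frame that arrives before the track is ready is simply skipped.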

That's the whole trick: the engine's frames flow through LiveKit's encoder and out to every peer, on a track LiveKit publishes exactly like any camera track. (They're just converted frames so far; the effects come next.)

Matching the Format WebRTC Expects

There's one catch before any of this works: frames can't be forwarded as-is. Each platform's encoder expects a specific pixel format, and converting to it is forwardFrame's real job.

VisionCamera can hand back frames in a few formats; we ask for YUV (pixelFormat: 'yuv'), the one closest to what WebRTC wants. Even so, the exact target differs per platform:

            VisionCamera gives      we convert to            WebRTC wants
            ──────────────────      ─────────────            ────────────
 Android    YUV_420_888 planes  ──► I420 (3 planes)      ──► VideoFrame(JavaI420Buffer)
 iOS        NV12 planes         ──► NV12 CVPixelBuffer    ──► RTCVideoFrame(RTCCVPixelBuffer)
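
For a sense of scale, both target layouts carry the same number of bytes and differ only in how chroma is arranged. The sketch below computes plane sizes under the assumption of tightly packed rows (stride equal to width), which real camera buffers often do not guarantee; the function names are illustrative.

```typescript
// Plane sizes for 4:2:0 video, assuming tightly packed rows (stride == width).
// Real camera buffers often pad each row, so treat these as minimums.

interface PlaneSizes {
  y: number;      // luma bytes
  chroma: number; // total chroma bytes (U + V)
  total: number;
}

// Chroma is subsampled 2x in both directions; odd dimensions round up.
function chromaDims(width: number, height: number): { cw: number; ch: number } {
  return { cw: Math.ceil(width / 2), ch: Math.ceil(height / 2) };
}

// I420: three planes, Y then U then V.
function i420Sizes(width: number, height: number): PlaneSizes {
  const { cw, ch } = chromaDims(width, height);
  const y = width * height;
  const chroma = 2 * (cw * ch); // one U plane plus one V plane
  return { y, chroma, total: y + chroma };
}

// NV12: two planes, Y then interleaved CbCr pairs.
function nv12Sizes(width: number, height: number): PlaneSizes {
  const { cw, ch } = chromaDims(width, height);
  const y = width * height;
  const chroma = (cw * 2) * ch; // each chroma row holds cw Cb/Cr pairs
  return { y, chroma, total: y + chroma };
}

const hd = i420Sizes(1280, 720);
console.log(hd.y, hd.chroma, hd.total); // 921600 460800 1382400
```

At 30 fps that is roughly 41 MB of pixel data per second for a 720p stream, which is why every extra copy in the frame path is worth questioning.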

On Android, WebRTC's encoder ultimately calls toI420() on whatever buffer it's handed, so handing it a ready-made one is the cheapest option. The VideoFrame.I420Buffer contract is simply Y/U/V planes plus their strides:

sdk/android/api/org/webrtc/VideoFrame.java

Java • L78-L109

public interface I420Buffer extends Buffer {
  ...
  @CalledByNative("I420Buffer") ByteBuffer getDataY();
  ...
  @CalledByNative("I420Buffer") ByteBuffer getDataU();
  ...
  @CalledByNative("I420Buffer") ByteBuffer getDataV();
  ...
  @CalledByNative("I420Buffer") int getStrideY();
  @CalledByNative("I420Buffer") int getStrideU();
  @CalledByNative("I420Buffer") int getStrideV();
}

YUV_420_888 is almost that, but its chroma may be semi-planar (NV12/NV21) with row padding, so the planes are copied into a real JavaI420Buffer, de-interleaving UV when needed:

Kotlin

val i420: JavaI420Buffer = JavaI420Buffer.allocate(w, h)
copyPlane(y, yStride, i420.dataY, i420.strideY, w, h, 1)            // Y, as-is
if (uvPixelStride == 1) {                                          // already planar
  copyPlane(u, uStride, i420.dataU, i420.strideU, cw, ch, 1)
  copyPlane(v, vStride, i420.dataV, i420.strideV, cw, ch, 1)
} else {                                                           // semi-planar -> split
  deinterleaveUv(u, uStride, uvPixelStride, i420.dataU, i420.dataV, i420.strideU, cw, ch)
}
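
The two helpers in that snippet are not shown in the post. A minimal TypeScript sketch of what they might look like follows, assuming planes arrive as byte arrays with a row stride and, for chroma, a pixel stride. The signatures are illustrative, and the split assumes U comes first in each pair (NV12 order).

```typescript
// Illustrative versions of copyPlane and deinterleaveUv. Row stride is the
// number of bytes per row including padding; pixel stride is the distance in
// bytes between consecutive samples of the same channel.

// Copy a plane row by row, skipping any padding at the end of each source row.
function copyPlane(
  src: Uint8Array, srcStride: number,
  dst: Uint8Array, dstStride: number,
  width: number, height: number,
): void {
  for (let row = 0; row < height; row++) {
    const from = row * srcStride;
    dst.set(src.subarray(from, from + width), row * dstStride);
  }
}

// Split interleaved chroma (U0 V0 U1 V1 ...) into separate U and V planes.
// Assumes U comes first in each pair; NV21 would swap the two reads.
function deinterleaveUv(
  uv: Uint8Array, uvRowStride: number, uvPixelStride: number,
  dstU: Uint8Array, dstV: Uint8Array, dstStride: number,
  cw: number, ch: number,
): void {
  for (let row = 0; row < ch; row++) {
    for (let col = 0; col < cw; col++) {
      const s = row * uvRowStride + col * uvPixelStride;
      const d = row * dstStride + col;
      dstU[d] = uv[s];
      dstV[d] = uv[s + 1];
    }
  }
}

// A 2x2 chroma plane, interleaved, with 2 bytes of padding per row.
const interleaved = Uint8Array.from([
  10, 20, 11, 21, 0, 0,
  12, 22, 13, 23, 0, 0,
]);
const planeU = new Uint8Array(4);
const planeV = new Uint8Array(4);
deinterleaveUv(interleaved, 6, 2, planeU, planeV, 2, 2, 2);
console.log(planeU.join(","), planeV.join(",")); // 10,11,12,13 20,21,22,23
```

Reading by stride rather than assuming a packed layout is what lets the same code handle devices that pad rows differently.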

On iOS, WebRTC accepts a CVPixelBuffer directly through RTCCVPixelBuffer, and the camera already produces NV12:

sdk/objc/components/video_frame_buffer/RTCCVPixelBuffer.h

Objective-C • L20-L22

@interface RTC_OBJC_TYPE (RTCCVPixelBuffer) : NSObject <RTC_OBJC_TYPE(RTCVideoFrameBuffer)>
@property(nonatomic, readonly) CVPixelBufferRef pixelBuffer;

A fresh full-range NV12 CVPixelBuffer is rebuilt, with the Y and CbCr planes memcpy'd in:

Swift

var out: CVPixelBuffer?
CVPixelBufferCreate(kCFAllocatorDefault, width, height,
                    kCVPixelFormatType_420YpCbCr8BiPlanarFullRange, attrs, &out)
guard let dst = out else { return nil }
CVPixelBufferLockBaseAddress(dst, [])

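// One memcpy per plane is only valid when the source stride equals
// CVPixelBufferGetBytesPerRowOfPlane(dst, plane); otherwise copy row by row.
memcpy(CVPixelBufferGetBaseAddressOfPlane(dst, 0), yPlane, ySize)    // Y    -> plane 0
memcpy(CVPixelBufferGetBaseAddressOfPlane(dst, 1), uvPlane, uvSize)  // CbCr -> plane 1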

CVPixelBufferUnlockBaseAddress(dst, [])
return dst   // the engine reads from this; push() emits its result as an RTCCVPixelBuffer

💡

Wondering about that copy? The camera's frame is already an NV12 CVPixelBuffer, so forwarding it untouched would be zero-copy. The engine copies for two reasons: it reads every frame as planes for one API across both platforms (Android has no CVPixelBuffer), and it runs filters on each one (hold your horses, that's coming next), producing its own output buffer regardless.

The Complete Path

With the conversion in place, frames flow from VisionCamera all the way to every peer. Here is the full journey of a single frame:

┌──────────────┐         ┌────────────────────┐         ┌─────────────────────┐
│ VisionCamera │ ──────► │ vc-engine (native) │ ──────► │ LiveKit VideoSource │ ──► peers
│    'yuv'     │         │   convert → push   │         │  encoder·transport  │
└──────────────┘         └────────────────────┘         └─────────────────────┘

Each hop is a single call: Engine.forwardFrame(frame) from the worklet, then the engine's push into the source via onFrameCaptured / didCaptureVideoFrame. The source itself is minted once by RtcBridge.createTrack(), borrowing WebRTC's own PeerConnectionFactory.

And that path is the real unlock, not the finish line. Once every frame runs through vc-engine on its way to the encoder, the call's video is entirely ours: anything that can be done to a frame in a few milliseconds can be done to a live call, with no cooperation from LiveKit and no second camera. Blur the room away, drop in a beach, keep the speaker centered as they move, paint on top of the video as it streams: same pipeline, wildly different results. That's what the rest of this post is about.

Adding Filters

Each filter is a frame transformed before it reaches LiveKit, and there are two reasonable places to do that work: inside vc-engine, or in a separate frame processor (another Nitro module) that returns the frame to JS, which then pushes it to the engine. This project uses the first approach (fewer hops between native and JS, and no extra Nitro module), but both are valid; it comes down to the use case.

Background Blur

Blurring a call isn't "blur the whole frame." It's keeping the person sharp while blurring everything behind them. That breaks into three jobs:

a mask: which pixels are the person,
a blurred copy of the frame,
a composite: take the sharp pixel where the mask says "person", the blurred one everywhere else.

camera ─┬─► segment ─► mask ────┐
        └─► blur ────► blurred ─┴─► composite ─► out

1. The mask. There's no need to hand-roll segmentation; each platform ships a good one. Android uses MediaPipe's selfie ImageSegmenter; iOS uses Vision's VNGeneratePersonSegmentationRequest (free, and it runs on the Neural Engine).

Kotlin

// Android — MediaPipe
val options = ImageSegmenter.ImageSegmenterOptions.builder()
  .setBaseOptions(BaseOptions.builder().setModelAssetPath("selfie_segmenter.tflite").build())
  .setRunningMode(RunningMode.VIDEO)
  .setOutputConfidenceMasks(true)        // output a 0..1 person-confidence mask, not a hard cutout
  .build()

Swift

// iOS — Vision
let request = VNGeneratePersonSegmentationRequest()
request.qualityLevel = .balanced         // good edges, runs on the ANE
request.outputPixelFormat = kCVPixelFormatType_OneComponent8

The catch: inference takes roughly 20 to 30 ms, which would eat most of the ~33 ms frame budget at 30 fps if it ran inline. So segmentation runs off the frame thread, latest frame wins: it gets a downscaled frame, the pipeline keeps compositing with whatever mask finished most recently, and nothing blocks. A mask that's a frame or two stale is invisible; a face barely moves between frames.

Kotlin

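fun submit(bitmap: Bitmap) {
  if (!busy.compareAndSet(false, true)) { bitmap.recycle(); return }   // busy → drop
  executor.execute {
    try {
      // confidenceMasks() returns an Optional; it is present because the option was enabled above
      val masks = seg.segmentForVideo(BitmapImageBuilder(bitmap).build(), ts).confidenceMasks().get()
      latest = Mask(readMask(masks[0]), masks[0].width, masks[0].height)  // newest wins
    } finally {
      busy.set(false)   // release even if inference throws, or segmentation stalls for good
    }
  }
}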
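
The same latest-frame-wins pattern, reduced to its essentials, might look like the TypeScript sketch below. It is a model rather than the engine's code: LatestWins and Executor are invented names, and the executor is injectable so a plain array can stand in for a background thread.

```typescript
// Latest-frame-wins worker. The slow step runs wherever the executor puts it;
// the caller never waits, and inputs arriving mid-run are dropped.

type Executor = (task: () => void) => void;

class LatestWins<In, Out> {
  private busy = false;
  private latestResult: Out | undefined = undefined;
  private readonly run: (input: In) => Out; // the slow step, e.g. segmentation
  private readonly executor: Executor;      // schedules work off the frame path

  constructor(run: (input: In) => Out, executor: Executor) {
    this.run = run;
    this.executor = executor;
  }

  // Called once per frame. Never blocks; returns false if the input was dropped.
  submit(input: In): boolean {
    if (this.busy) return false; // a run is in flight, so drop this frame
    this.busy = true;
    this.executor(() => {
      try {
        this.latestResult = this.run(input); // newest result replaces the old one
      } finally {
        this.busy = false; // always release, even if run() throws
      }
    });
    return true;
  }

  // What the compositor reads: the most recent finished result, possibly stale.
  latest(): Out | undefined {
    return this.latestResult;
  }
}

// Usage with a manual queue standing in for a background thread.
const pending: Array<() => void> = [];
const worker = new LatestWins<number, string>(
  (frameIndex) => `mask-for-frame-${frameIndex}`,
  (task) => { pending.push(task); },
);

console.log(worker.submit(1)); // true: accepted
console.log(worker.submit(2)); // false: dropped while frame 1 is in flight
pending.shift()?.();           // the background work finishes
console.log(worker.latest());  // mask-for-frame-1
console.log(worker.submit(3)); // true: accepted again
```

Dropping instead of queueing is the point: a queue would let masks fall further and further behind the video under load, while dropping bounds the staleness to one inference.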

2. The blur. A full-resolution Gaussian on every frame is too expensive, so both platforms cheat the same way: blur small, scale up. Shrink the frame, blur the tiny copy, and stretch it back; the upscale's interpolation does most of the smoothing for free.

Kotlin

// Android (CPU) — average down to a small grid, then bilinear-upsample
val dw = w / blurDivisor; val dh = h / blurDivisor
// downsample: each cell = the average of its source block → small[dw * dh]
// then upsample back to full size, interpolating between cells:
dst[i] = bilerp(small[c00], small[c10], small[c01], small[c11], wx, wy)

Swift

// iOS (Metal) — two stock Metal Performance Shaders (MPS) kernels, created
// once and re-encoded into the same command buffer as the composite:
downscale.encode(...)    // MPSImageBilinearScale: camera plane → 1/4-res scratch
gaussian.encode(...)     // MPSImageGaussianBlur: blur the 1/4-res copy
// blurring 1/16th of the pixels ≈ a big-radius full-res blur, for a fraction of the cost
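
To see why the upscale does the smoothing, here is a small TypeScript sketch of the CPU variant (block-average down, bilinear up) on a single-channel image. It is an illustration under simplifying assumptions, not the engine's code: the image is row-major, and width and height are multiples of the divisor.

```typescript
// "Blur small, scale up" on one channel. Assumes w and h divide evenly by divisor.

function downsample(src: Float32Array, w: number, h: number, divisor: number): Float32Array {
  const dw = w / divisor;
  const dh = h / divisor;
  const small = new Float32Array(dw * dh);
  for (let cy = 0; cy < dh; cy++) {
    for (let cx = 0; cx < dw; cx++) {
      let sum = 0;
      for (let y = 0; y < divisor; y++) {
        for (let x = 0; x < divisor; x++) {
          sum += src[(cy * divisor + y) * w + (cx * divisor + x)];
        }
      }
      small[cy * dw + cx] = sum / (divisor * divisor); // block average
    }
  }
  return small;
}

function upsampleBilinear(small: Float32Array, dw: number, dh: number, w: number, h: number): Float32Array {
  const out = new Float32Array(w * h);
  for (let y = 0; y < h; y++) {
    // Map the output pixel centre into the small grid, clamped to its edges.
    const fy = Math.min(Math.max(((y + 0.5) * dh) / h - 0.5, 0), dh - 1);
    const y0 = Math.floor(fy);
    const y1 = Math.min(y0 + 1, dh - 1);
    const wy = fy - y0;
    for (let x = 0; x < w; x++) {
      const fx = Math.min(Math.max(((x + 0.5) * dw) / w - 0.5, 0), dw - 1);
      const x0 = Math.floor(fx);
      const x1 = Math.min(x0 + 1, dw - 1);
      const wx = fx - x0;
      const upper = small[y0 * dw + x0] * (1 - wx) + small[y0 * dw + x1] * wx;
      const lower = small[y1 * dw + x0] * (1 - wx) + small[y1 * dw + x1] * wx;
      out[y * w + x] = upper * (1 - wy) + lower * wy;
    }
  }
  return out;
}

function cheapBlur(src: Float32Array, w: number, h: number, divisor: number): Float32Array {
  const small = downsample(src, w, h, divisor);
  return upsampleBilinear(small, w / divisor, h / divisor, w, h);
}

// A hard vertical edge (0 | 100) on a 4x4 image turns into a ramp.
const edge = Float32Array.from([
  0, 0, 100, 100,
  0, 0, 100, 100,
  0, 0, 100, 100,
  0, 0, 100, 100,
]);
const blurred = cheapBlur(edge, 4, 4, 2);
console.log(Array.from(blurred.subarray(0, 4)).join(",")); // 0,25,75,100
```

A larger divisor widens the ramp, which is how the blur strength is controlled without touching a full-resolution kernel.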

3. The composite. Now use the mask to pick per pixel: the person from the sharp frame, the background from the blurred one.

Kotlin

// Android — a per-pixel branch
outArgb[i] = if (mask[i] > 0.5f) camera[i] else blurred[i]

C++

// iOS — one fused Metal kernel per plane; a smooth mix() so edges feather instead of stair-stepping
float fg = camY.sample(s, camUV).r;                    // sharp camera
float bg = bgY.sample(s, camUV).r;                     // blurred camera
value    = mix(bg, fg, mask.sample(s, orientedUV).r);  // person where mask ≈ 1
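
The difference between the two composites is easiest to see on a few pixels. The TypeScript sketch below models both on single-channel planes; the function names are illustrative, and the soft version uses the same formula as Metal's mix(bg, fg, m), which is bg * (1 - m) + fg * m.

```typescript
// Per-pixel composite over single-channel planes.
// `mask` holds person confidence in [0, 1].

// Hard cut: every pixel is either fully camera or fully background.
function compositeHard(camera: Float32Array, background: Float32Array, mask: Float32Array): Float32Array {
  const out = new Float32Array(camera.length);
  for (let i = 0; i < camera.length; i++) {
    out[i] = mask[i] > 0.5 ? camera[i] : background[i];
  }
  return out;
}

// Linear blend: partial confidence produces a partial mix, so edges feather.
function compositeSoft(camera: Float32Array, background: Float32Array, mask: Float32Array): Float32Array {
  const out = new Float32Array(camera.length);
  for (let i = 0; i < camera.length; i++) {
    const m = Math.min(Math.max(mask[i], 0), 1); // clamp defensively
    out[i] = background[i] * (1 - m) + camera[i] * m;
  }
  return out;
}

// Three pixels: clearly person, on the edge, clearly background.
const cameraPlane = Float32Array.from([200, 200, 200]);
const backgroundPlane = Float32Array.from([0, 0, 0]);
const confidence = Float32Array.from([1, 0.5, 0]);

console.log(compositeHard(cameraPlane, backgroundPlane, confidence).join(",")); // 200,0,0
console.log(compositeSoft(cameraPlane, backgroundPlane, confidence).join(",")); // 200,100,0
```

The edge pixel is where the two diverge: the hard cut snaps it to the background, while the blend lands halfway, which is what hides the stair-stepping around hair and shoulders.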

And that's the whole filter. The neat part: swap step 2's "blurred copy" for a solid color or an image, and virtual backgrounds come for free: same mask, same composite, just a different thing to blend behind the person.

Back to LiveKit. All of this still happens inside forwardFrame, on the frame-processor worklet thread. The composite writes a brand-new frame (into outArgb on Android, into the buffer Metal rendered to on iOS), and that frame goes through the same push() as the no-effect path: Android packs outArgb back into a JavaI420Buffer, iOS wraps Metal's output in an RTCCVPixelBuffer, and both land on the onFrameCaptured / didCaptureVideoFrame seam.

Kotlin

// Android — composite (outArgb) → I420 → push()
pushI420(argbToI420(outArgb, w, h), rotation, timestampNs)

Swift

// iOS — Metal's output; rotation already baked in, so push upright with a monotonic stamp
push(pixelBuffer: composited, rotation: ._0, timestampNs: monoTsNs)

The two platforms even differ in how they orient and stamp the frame. iOS bakes rotation into the Metal sampling pass, so it pushes an already-upright frame tagged ._0, with a monotonic timestamp. The stamp matters because VisionCamera's frame.timestamp isn't a monotonic nanosecond value, and WebRTC's encoder drops frames whose timestamps don't increase. Android instead passes the frame's real rotation as VideoFrame metadata and forwards the camera timestamp, letting WebRTC rotate downstream.
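
The post doesn't show how the engine derives its monotonic stamp, so the following is only one possible approach, sketched in TypeScript: keep the last emitted value and nudge any candidate that fails to move forward. MonotonicStamper is a hypothetical name.

```typescript
// One way to guarantee strictly increasing timestamps (an assumption, not the
// engine's actual method). Uses `number`, which is exact up to 2^53 ns
// (about 104 days); native code would use a 64-bit integer instead.

class MonotonicStamper {
  private lastNs = Number.NEGATIVE_INFINITY;

  // Returns candidateNs if it moves time forward, otherwise last + 1.
  next(candidateNs: number): number {
    const stamped = candidateNs > this.lastNs ? candidateNs : this.lastNs + 1;
    this.lastNs = stamped;
    return stamped;
  }
}

// A repeated and a backwards timestamp are both nudged forward.
const stamper = new MonotonicStamper();
const stamps = [100, 100, 90, 250].map((t) => stamper.next(t));
console.log(stamps.join(",")); // 100,101,102,250
```

Nudging by one nanosecond keeps ordering intact without visibly distorting frame pacing; reading a monotonic system clock at push time would be the other obvious option.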

And that finally answers the memcpy from the format section. The engine never forwards the camera's buffer untouched. It reads each frame into its own working buffer, composites a brand-new frame on top, and emits that. There's no pristine zero-copy frame to preserve, so the conversion isn't wasted work; it's just the engine handing back its own result in the layout the encoder wants: JavaI420Buffer on Android, RTCCVPixelBuffer on iOS.

Color and Photo Backgrounds

Blur was really just "mask + composite + something behind the person." Make that something a flat color or a decoded image, and two more filters fall out for almost no code. Only the background fill changes; the mask, composite, and push stay identical:

Kotlin

when (backgroundMode) {
  COLOR -> bgArgb.fill(color)          // solid color
  IMAGE -> System.arraycopy(photo, …)  // a decoded image, scaled once
  else  -> blurInto(camera, bgArgb)    // blur
}

On iOS it's the same fused Metal kernel, where bg is simply picked by mode before the mix:

C++

// iOS — one composite kernel, background chosen per mode
float bg = u.bgColor.x;                               // solid color
if (u.bgKind == 1)      bg = bgY.sample(s, camUV).r;      // blurred camera
else if (u.bgKind == 2) bg = bgY.sample(s, orientedUV).r; // baked image texture
value = mix(bg, fg, m);

[Demo: blur and solid-color backgrounds]

Center Stage

Apple's Center Stage keeps you framed as you move around, and the same effect runs on both platforms. There's no background swap this time. Instead, the face is detected (ML Kit on Android, Vision on iOS), a crop is sized around it, that crop is smoothed over time so it glides, and the frame is resampled through it. The zoom follows the face's area:

Kotlin

// Android — face too small → ease toward the target area, damped by strength (~0.5), capped at 2.2×
if (faceArea < 0.12f) targetZoom = min(1f + (sqrt(0.28f / faceArea) - 1f) * strength, 2.2f)

Swift

// iOS — same math, line for line (Vision provides the face box; the rest is shared)
if areaFraction < 0.12 { targetZoom = min(1 + (sqrt(0.28 / areaFraction) - 1) * strength, 2.2) }

Center Stage runs through the same composite path as everything else; the only new input is that crop rectangle.
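
Put together, the Center Stage steps might look like the TypeScript sketch below. Only targetZoom follows the formula in the snippets above; the no-zoom fallback, the crop arithmetic, and the exponential smoothing are assumptions added for illustration, and all coordinates are normalised to the 0..1 range.

```typescript
// Center Stage sketch: face area -> zoom -> crop -> smoothed crop.

interface CropRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

function targetZoom(faceArea: number, strength = 0.5): number {
  if (faceArea <= 0) return 1;    // no usable face: leave the frame alone (assumption)
  if (faceArea >= 0.12) return 1; // face already large enough (assumption: no zoom)
  // Area scales with zoom squared, so sqrt(0.28 / area) is the zoom that would
  // bring the face to 28% of the frame; strength damps it and 2.2 caps it.
  return Math.min(1 + (Math.sqrt(0.28 / faceArea) - 1) * strength, 2.2);
}

// A crop of size 1/zoom centred on the face, clamped to stay inside the frame.
function cropFor(faceCx: number, faceCy: number, zoom: number): CropRect {
  const size = 1 / zoom;
  const clamp = (v: number): number => Math.min(Math.max(v, 0), 1 - size);
  return { x: clamp(faceCx - size / 2), y: clamp(faceCy - size / 2), width: size, height: size };
}

// Exponential smoothing: move a fraction of the way to the target each frame,
// so the crop glides instead of jumping with every detection.
function smoothCrop(current: CropRect, target: CropRect, alpha = 0.1): CropRect {
  const step = (a: number, b: number): number => a + (b - a) * alpha;
  return {
    x: step(current.x, target.x),
    y: step(current.y, target.y),
    width: step(current.width, target.width),
    height: step(current.height, target.height),
  };
}

console.log(targetZoom(0.07)); // 1.5 (sqrt(0.28 / 0.07) = 2, damped by 0.5)
console.log(targetZoom(0.01)); // 2.2 (capped)
console.log(targetZoom(0.2));  // 1   (face is already big enough)
```

A smaller alpha makes the framing calmer but slower to follow; the right value depends on how jittery the face detector is.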

[Demo: Center Stage auto-zoom keeping the subject framed]

Every filter so far (blur, color, photo backgrounds, Center Stage) is the same path, differing only in what feeds the composite. Here's the complete path from earlier, now with that filter stage drawn in:

┌──────────────┐     ┌──────────────────────────────────────┐     ┌─────────────────────┐
│ VisionCamera │ ──► │ vc-engine (native)                   │ ──► │ LiveKit VideoSource │ ──► peers
│    'yuv'     │     │  convert → composite → push          │     │  encoder·transport  │
└──────────────┘     └──────────────────────────────────────┘     └─────────────────────┘
                         │
                         └─ composite = mix(background, camera, mask)
                              • mask        ← segmentation, off-thread · latest frame wins
                              • background  ← blur · color · image (· crop for Center Stage)

Swap what background and mask resolve to and a different filter falls out; the plumbing never changes.

Air Draw

Every filter so far has been pure pixel math. Air Draw is the odd one out: it takes input. A finger scribbles on the screen, those strokes are rendered with Skia, and the result is composited straight onto the frame before it's pushed. Because the drawing is baked into the published track (not a local overlay), everyone on the call sees it, live.

It's the background pipeline flipped: instead of blending something behind the person, a Skia-drawn stroke layer is blended over the frame.

camera ─► [effects] ─► draw Skia strokes on top ─► push

[Demo: Air Draw, sketching the Margelo logo on a live call with Skia, baked into the stream so the other participant sees it]

The clip above was captured live on my own phone, on a real call.

Same forwardFrame plumbing, a Skia canvas for the strokes, and the video call doubles as a shared whiteboard.
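
The blend itself is the standard "source over" operation. The TypeScript sketch below shows it on raw bytes, assuming straight (non-premultiplied) alpha; Skia surfaces are commonly premultiplied, in which case the stroke term would drop its alpha factor. The function names are illustrative.

```typescript
// "Source over" blend of a stroke layer onto a frame.
// Assumes straight alpha; with premultiplied alpha, use
// strokeValue + frameValue * (1 - strokeAlpha) instead.

function blendOver(frameValue: number, strokeValue: number, strokeAlpha: number): number {
  return strokeValue * strokeAlpha + frameValue * (1 - strokeAlpha);
}

// Blend an RGBA stroke layer over an RGB frame (both row-major, 8-bit).
function drawStrokes(frameRgb: Uint8Array, strokesRgba: Uint8Array): Uint8Array {
  const pixelCount = frameRgb.length / 3;
  const out = new Uint8Array(frameRgb.length);
  for (let p = 0; p < pixelCount; p++) {
    const alpha = strokesRgba[p * 4 + 3] / 255;
    for (let c = 0; c < 3; c++) {
      out[p * 3 + c] = Math.round(blendOver(frameRgb[p * 3 + c], strokesRgba[p * 4 + c], alpha));
    }
  }
  return out;
}

// Two grey pixels; the stroke layer is empty over the first, opaque red over the second.
const greyFrame = Uint8Array.from([100, 100, 100, 100, 100, 100]);
const strokeLayer = Uint8Array.from([0, 0, 0, 0, 255, 0, 0, 255]);
console.log(drawStrokes(greyFrame, strokeLayer).join(",")); // 100,100,100,255,0,0
```

Where the stroke layer is fully transparent the frame passes through untouched, so an empty canvas costs nothing visually.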

Performance

The engine follows one rule: never make the camera wait. forwardFrame is synchronous (convert, composite, push, return) and does the minimum; everything expensive runs elsewhere. Segmentation and face detection run on their own queues, latest frame wins, and are never awaited, so the published video stays pinned to camera frame rate no matter how slow a given inference is. With dropFramesWhileBusy: true on the frame output, the camera sheds frames under load instead of queueing them.

The platform split keeps that path cheap. iOS does the pixels on the GPU: one Metal command buffer per frame, NV12 throughout the frame path (no per-frame RGB round-trip), written zero-copy into the encoder. Android stays in plain Kotlin on the CPU for readability, with the same math ready to drop into NEON SIMD or a Vulkan / GL compute shader when more headroom is needed. Output geometry is constant, so WebRTC never rescales and toggling an effect never reconfigures the encoder.

                iOS                                  Android
 ingest         rebuild NV12 CVPixelBuffer           build I420 from planes
 segmentation   Vision, off-thread, latest wins      MediaPipe, off-thread, latest wins
 compositing    one fused Metal pass/plane (GPU)     per-pixel loop in Kotlin (CPU)
 color space    YCbCr end-to-end (no RGB)            ARGB for the composite
 to encoder     pooled NV12, GPU-written, zero-copy  JavaI420Buffer
 output         HD portrait                          VGA

The numbers below come from a deliberately honest setup. We were in Asia and hosted the LiveKit server on a small VM in a US region, so every call crossed the planet and ran under real, worst-case network latency rather than localhost. To measure frame rate we used modest clients, a 2021 iPhone 13 and a low-end Samsung Galaxy F14:

iPhone 13 (GPU / Metal) carries every effect at the full 30 fps capture rate.
Samsung Galaxy F14 (CPU / Kotlin) scales with the per-frame work: ~30 fps while forwarding or with a light effect, ~25 fps with Center Stage, easing to ~15-20 fps under the heaviest path (Center Stage + segmentation + blur + composite) on a budget chip.

That Android drop is the whole reason the CPU math is written to move to NEON SIMD or a GPU compute shader when a device needs the headroom; nothing else in the pipeline changes.

Final Thoughts

Swapping LiveKit's camera for VisionCamera turns a fixed video call into a fully programmable one. Every effect here (blur, virtual backgrounds, Center Stage, Air Draw) rides on the same foundation: own the camera with VisionCamera, modify the frame in a native engine, and push it into the WebRTC source LiveKit already publishes. The transport, encoding, and signaling never change; only the pixels do.

From there, a new filter is rarely more than a new way to fill or composite a frame. The hard part (getting a custom frame onto the wire at all) is already done.

We'll be open-sourcing the full example soon, so you can clone it and try all of this for yourself.

📩

Want real-time camera filters like these (background blur, virtual backgrounds, Center Stage, live drawing) running at full frame rate in your own app? Fast, native-quality video is exactly the kind of work we do at Margelo. Reach out and we'll help you ship it.

Ritesh Shukla, Software Engineer @ Margelo

Tags: React Native, VisionCamera, WebRTC
