Mobile · 9 min read · June 20, 2025

React Native + AI: Building Intelligent Mobile Applications

How to integrate AI capabilities into React Native apps. Covers on-device inference with ONNX, cloud API integration, streaming chat UI, and offline-first AI patterns.

React Native · AI · Mobile · LLM · TypeScript

Azam

DevOps & AI Consultant

AI on Mobile: Two Architectural Patterns

Mobile AI applications fall into two architectural camps: those that process AI workloads in the cloud and stream results to the device, and those that run inference directly on-device using models optimized for mobile hardware. Each pattern has distinct tradeoffs around latency, privacy, cost, and capability.

Cloud AI gives you access to the most capable models (GPT-4o, Claude 3.5) with no device constraints, at the cost of network latency and per-request API fees. On-device AI with small models such as Phi-3 Mini or Llama 3.2 (1B/3B) runs offline and has no per-request cost once shipped, but it is limited to models small enough to fit in device memory.

Most production apps use both: cloud for complex tasks, on-device for fast, private, offline-capable features.
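The hybrid split can be made explicit with a small routing function that every AI feature goes through. The sketch below is illustrative only: the task kinds, the mustStayOnDevice flag and the assumption that classification, intent detection and search are handled by the local model are placeholders to replace with your own features and measured model quality.

```typescript
// Illustrative task descriptor; these fields are assumptions, not a library type
type AITask = {
  kind: 'classify' | 'intent' | 'search' | 'chat' | 'summarize'
  mustStayOnDevice: boolean // e.g. health or financial data you never send to a server
}

type Route = 'on-device' | 'cloud' | 'unavailable'

// Task kinds a small quantized local model is assumed to handle well enough;
// tune this set against your own model's measured accuracy
const ON_DEVICE_KINDS: ReadonlySet<AITask['kind']> = new Set<AITask['kind']>(['classify', 'intent', 'search'])

function chooseRoute(task: AITask, isOnline: boolean): Route {
  // Prefer the local model whenever it can do the job: fast, private, works offline
  if (ON_DEVICE_KINDS.has(task.kind)) return 'on-device'
  // Data that must not leave the device is never sent to the cloud
  if (task.mustStayOnDevice) return 'unavailable'
  // Complex generation needs the cloud model, which needs a connection
  return isOnline ? 'cloud' : 'unavailable'
}
```

When the result is 'unavailable', the caller can queue the request until the device is back online or disable the feature in the UI.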

Cloud AI Integration in React Native

Never call LLM APIs directly from your React Native app — that would expose your API key in the client bundle. Instead, build a backend API that your app calls, and have the backend call the LLM provider.

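// hooks/useChat.ts
import { useState } from 'react'
// Hypothetical helper: returns your app's own session token (not an LLM key)
import { getAuthToken } from '../auth'

export type ChatMessage = { role: 'user' | 'assistant'; content: string }

export function useChat() {
  const [messages, setMessages] = useState<ChatMessage[]>([])
  const [isLoading, setIsLoading] = useState(false)
  const [error, setError] = useState<string | null>(null)

  const sendMessage = async (content: string) => {
    setIsLoading(true)
    setError(null)
    const userMessage: ChatMessage = { role: 'user', content }
    setMessages(prev => [...prev, userMessage])

    try {
      const response = await fetch('https://api.yourapp.com/chat', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Authorization': `Bearer ${await getAuthToken()}`
        },
        // Send the history plus the new message; the backend adds the provider key
        body: JSON.stringify({ messages: [...messages, userMessage] })
      })
      if (!response.ok) throw new Error(`Chat request failed (${response.status})`)
      // Assumes your backend replies with { content: string }
      const data: { content: string } = await response.json()
      setMessages(prev => [...prev, { role: 'assistant', content: data.content }])
    } catch (e) {
      setError(e instanceof Error ? e.message : 'Unknown error')
    } finally {
      setIsLoading(false)
    }
  }

  return { messages, sendMessage, isLoading, error }
}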
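On the server, the proxy can be a thin handler that validates the conversation, attaches the provider key and returns the reply in the shape the app expects. This is a framework-agnostic sketch, not a production endpoint: handleChat is a hypothetical name, it uses OpenAI's Chat Completions endpoint as an example provider, it assumes the app reads { content } from the response, and it leaves out verifying the user's session token, which your real route must do first. It needs a runtime with a global fetch, such as Node 18+.

```typescript
type ChatMessage = { role: 'user' | 'assistant'; content: string }
type ProxyResult = { status: number; body: { content: string } | { error: string } }

function isChatMessage(m: unknown): m is ChatMessage {
  if (typeof m !== 'object' || m === null) return false
  const { role, content } = m as Record<string, unknown>
  return (role === 'user' || role === 'assistant') && typeof content === 'string'
}

// Hypothetical handler: call it from your server route AFTER verifying the user's token
async function handleChat(
  body: unknown,
  apiKey: string,                  // read from the server environment, never shipped to the app
  fetchImpl: typeof fetch = fetch  // injectable so the handler can be tested offline
): Promise<ProxyResult> {
  const messages = (body as { messages?: unknown } | null)?.messages
  if (!Array.isArray(messages) || !messages.every(isChatMessage)) {
    return { status: 400, body: { error: 'messages must be an array of { role, content }' } }
  }

  // Example provider: OpenAI Chat Completions; swap in your own
  const upstream = await fetchImpl('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${apiKey}` },
    // Only the last 20 messages are forwarded to bound token cost (arbitrary cap)
    body: JSON.stringify({ model: 'gpt-4o', messages: messages.slice(-20) })
  })
  if (!upstream.ok) {
    return { status: 502, body: { error: `Upstream error ${upstream.status}` } }
  }

  const data = (await upstream.json()) as { choices?: { message?: { content?: unknown } }[] }
  const content = data.choices?.[0]?.message?.content
  if (typeof content !== 'string') {
    return { status: 502, body: { error: 'Unexpected upstream response' } }
  }
  // Reply in the { content } shape the app's useChat hook reads
  return { status: 200, body: { content } }
}
```

Wire it into whatever server framework you use by passing in the parsed request body and a key read from the server's environment, then sending status and body back to the app.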

Streaming Responses in React Native

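Streaming makes AI chat feel much faster: the user sees text appear as it is generated instead of waiting for the complete response. React Native's built-in fetch does not expose response.body as a readable stream, so you need a streaming-capable fetch, such as expo/fetch (Expo SDK 52+) or a polyfill like react-native-fetch-api. Depending on your JavaScript engine, TextDecoder may also need a polyfill.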

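// Assumes Expo SDK 52+; React Native's built-in fetch cannot stream response bodies
import { fetch } from 'expo/fetch'

// Defined inside useChat so it can call setMessages
const streamMessage = async (content: string) => {
  setMessages(prev => [...prev, { role: 'user', content }])

  const response = await fetch('https://api.yourapp.com/chat/stream', {
    method: 'POST',
    body: JSON.stringify({ message: content }),
    headers: { 'Content-Type': 'application/json' }
  })
  if (!response.ok || !response.body) {
    throw new Error(`Stream request failed (${response.status})`)
  }

  const reader = response.body.getReader()
  const decoder = new TextDecoder()
  let assistantMessage = ''

  // Empty assistant bubble that fills in as chunks arrive
  setMessages(prev => [...prev, { role: 'assistant', content: '' }])

  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    // Assumes the backend streams plain text; { stream: true } keeps multi-byte
    // characters that straddle two chunks intact
    assistantMessage += decoder.decode(value, { stream: true })
    setMessages(prev => [
      ...prev.slice(0, -1),
      { role: 'assistant', content: assistantMessage }
    ])
  }
}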
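The loop above appends raw decoded text, which works if your backend streams plain text. If it instead forwards a Server-Sent Events stream, each chunk carries data: lines that can be split across reads. A minimal line-buffering parser, assuming one data: line per event and an OpenAI-style [DONE] end marker (adjust both to match your backend); createSSEParser is a hypothetical helper name:

```typescript
// Turns decoded SSE text into payloads. Assumes one `data:` line per event and
// a `[DONE]` end marker (the OpenAI-style convention); adjust for your backend.
function createSSEParser(onData: (payload: string) => void, onDone?: () => void) {
  let buffer = ''
  return (chunk: string): void => {
    buffer += chunk
    const lines = buffer.split('\n')
    // The last piece may be an incomplete line; keep it for the next chunk
    buffer = lines.pop() ?? ''
    for (const raw of lines) {
      const line = raw.endsWith('\r') ? raw.slice(0, -1) : raw // tolerate \r\n
      // Skip blank event separators, comments and other fields such as event:
      if (!line.startsWith('data:')) continue
      let payload = line.slice('data:'.length)
      // Per the SSE format, a single leading space after the colon is dropped
      if (payload.startsWith(' ')) payload = payload.slice(1)
      if (payload === '[DONE]') onDone?.()
      else onData(payload)
    }
  }
}
```

Inside the read loop, you would pass each decoder.decode(value, { stream: true }) result to the parser and append each payload (or a field parsed out of it as JSON) to assistantMessage.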

Building the Chat UI

Use a FlatList with the inverted prop for the message list. An inverted list draws its first item at the bottom, so if you pass the messages newest-first, new messages appear at the bottom without any manual scroll management. Show a typing indicator while a response is in flight.

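import { useState } from 'react'
import { FlatList, StyleSheet, Text, TextInput, TouchableOpacity, View } from 'react-native'
import { useChat } from './hooks/useChat'

// Minimal placeholder bubble; replace with your own styled component
function MessageBubble({ message }: { message: { role: string; content: string } }) {
  const isUser = message.role === 'user'
  return (
    <View style={[styles.bubble, isUser ? styles.userBubble : styles.assistantBubble]}>
      <Text>{message.content}</Text>
    </View>
  )
}

export function ChatScreen() {
  const { messages, sendMessage, isLoading } = useChat()
  const [input, setInput] = useState('')

  const handleSend = () => {
    const text = input.trim()
    if (!text || isLoading) return // ignore empty input and double taps
    sendMessage(text)
    setInput('')
  }

  return (
    <View style={styles.container}>
      <FlatList
        // inverted draws item 0 at the bottom, so pass the messages newest-first
        data={[...messages].reverse()}
        inverted
        // Key by position in the original array so existing bubbles keep stable keys
        keyExtractor={(_, i) => String(messages.length - 1 - i)}
        renderItem={({ item }) => <MessageBubble message={item} />}
        // In an inverted list the header renders at the bottom, under the newest message
        ListHeaderComponent={isLoading ? <Text style={styles.typing}>Thinking...</Text> : null}
      />
      <View style={styles.inputRow}>
        <TextInput
          value={input}
          onChangeText={setInput}
          onSubmitEditing={handleSend}
          placeholder="Ask anything..."
          style={styles.input}
        />
        <TouchableOpacity onPress={handleSend} disabled={isLoading}>
          <Text>Send</Text>
        </TouchableOpacity>
      </View>
    </View>
  )
}

const styles = StyleSheet.create({
  container: { flex: 1 },
  bubble: { margin: 8, padding: 10, borderRadius: 12, maxWidth: '80%' },
  userBubble: { alignSelf: 'flex-end', backgroundColor: '#DCF8C6' },
  assistantBubble: { alignSelf: 'flex-start', backgroundColor: '#EEEEEE' },
  typing: { margin: 8, fontStyle: 'italic' },
  inputRow: { flexDirection: 'row', alignItems: 'center', padding: 8 },
  input: { flex: 1, borderWidth: 1, borderRadius: 8, padding: 8, marginRight: 8 }
})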

On-Device AI with ONNX Runtime

For features that need to work offline or handle private data, ONNX Runtime for React Native enables running quantized ML models directly on device. Common use cases: text classification, intent detection, semantic search on local data.

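import { InferenceSession, Tensor } from 'onnxruntime-react-native'

// modelPath: a file path on the device; create the session once and reuse it
async function runModel(modelPath: string, tokenIds: number[]) {
  const session = await InferenceSession.create(modelPath)
  // input_ids are int64 for most transformer models; check session.inputNames
  const inputIds = new Tensor('int64', BigInt64Array.from(tokenIds.map(BigInt)), [1, tokenIds.length])
  return session.run({ input_ids: inputIds })
}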

  • Use INT8-quantized models: their weights are roughly 4x smaller than FP32, and they typically run faster on mobile CPUs
  • Test on real devices, not simulators — performance differs dramatically
  • Bundle the model with the app for small models (<50MB); download on first launch for larger models
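Two pieces of glue usually sit around session.run in the text-classification case: shaping token IDs to the fixed sequence length the model was exported with, and turning raw logits into a label. A minimal sketch, assuming a BERT-style model with int64 input_ids and attention_mask inputs, a pad ID of 0, and tokenization already done elsewhere; toFixedLength and classify are hypothetical helper names:

```typescript
// Pads (with padId) or truncates token IDs to the model's fixed sequence length
// and builds the matching attention mask (1 = real token, 0 = padding).
function toFixedLength(tokenIds: number[], seqLen = 512, padId = 0) {
  const inputIds = new BigInt64Array(seqLen)
  const attentionMask = new BigInt64Array(seqLen)
  const n = Math.min(tokenIds.length, seqLen)
  for (let i = 0; i < seqLen; i++) {
    inputIds[i] = BigInt(i < n ? tokenIds[i] : padId)
    attentionMask[i] = BigInt(i < n ? 1 : 0)
  }
  return { inputIds, attentionMask }
}

// Converts one row of classifier logits to the most likely label, using a
// numerically stable softmax (subtract the max before exponentiating).
function classify(logits: ArrayLike<number>, labels: string[]) {
  const xs = Array.from(logits)
  const max = Math.max(...xs)
  const exps = xs.map(x => Math.exp(x - max))
  const sum = exps.reduce((a, b) => a + b, 0)
  let best = 0
  for (let i = 1; i < exps.length; i++) {
    if (exps[i] > exps[best]) best = i
  }
  return { label: labels[best], confidence: exps[best] / sum }
}
```

The resulting arrays can be wrapped as new Tensor('int64', inputIds, [1, 512]), and the logits read from output[session.outputNames[0]].data. The truncation here simply cuts the tail; if your tokenizer has its own truncation that preserves special end tokens, prefer that.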

Performance and UX Considerations

  • Show a skeleton UI immediately while the AI call is in-flight — never show a blank screen
  • Implement request cancellation when the user navigates away mid-stream
  • Cache AI responses locally for repeated identical queries
  • Use optimistic UI updates where the response is predictable
  • Handle offline gracefully — queue requests and retry when connectivity returns
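The caching and offline bullets can share one small client wrapper. This is a sketch under stated assumptions: send stands in for whatever function calls your backend, the cache is an in-memory map keyed by the trimmed prompt with oldest-first eviction (swap in persistent storage if answers should survive restarts), and connectivity is pushed in by the caller, for example from a network-status library such as @react-native-community/netinfo. AIRequestClient is a hypothetical name.

```typescript
// Stand-in for whatever function calls your backend (hypothetical)
type Send = (prompt: string) => Promise<string>
type Pending = { prompt: string; resolve: (answer: string) => void; reject: (err: unknown) => void }

class AIRequestClient {
  private readonly send: Send
  private readonly maxEntries: number
  private readonly cache = new Map<string, string>()
  private queue: Pending[] = []
  private online = true

  constructor(send: Send, maxEntries = 100) {
    this.send = send
    this.maxEntries = maxEntries
  }

  ask(prompt: string): Promise<string> {
    const key = prompt.trim()
    const cached = this.cache.get(key)
    // Identical query seen before: answer instantly without a network call
    if (cached !== undefined) return Promise.resolve(cached)
    if (!this.online) {
      // Offline: park the request until setOnline(true) is called
      return new Promise<string>((resolve, reject) => {
        this.queue.push({ prompt, resolve, reject })
      })
    }
    return this.fetchAndCache(key)
  }

  // Call this from your connectivity listener
  setOnline(online: boolean): void {
    this.online = online
    if (!online) return
    const pending = this.queue
    this.queue = []
    for (const { prompt, resolve, reject } of pending) {
      this.ask(prompt).then(resolve, reject)
    }
  }

  private async fetchAndCache(key: string): Promise<string> {
    const answer = await this.send(key)
    if (this.cache.size >= this.maxEntries) {
      // Map iterates in insertion order, so the first key is the oldest entry
      const oldest = this.cache.keys().next().value
      if (oldest !== undefined) this.cache.delete(oldest)
    }
    this.cache.set(key, answer)
    return answer
  }
}
```

Queued requests are retried once when setOnline(true) is called; a request that then fails still rejects, so pair this with your own retry or error UI.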

AI-enhanced mobile apps that handle the streaming, offline, and error states properly feel dramatically better than those that don't. These UX details matter as much as the underlying model quality.

Want to Build This for Your Team?

I help teams implement the patterns and architectures described in these articles. Let's talk about your project.
