React Native + AI: Building Intelligent Mobile Applications

AI on Mobile: Two Architectural Patterns

Mobile AI applications fall into two architectural camps: those that process AI workloads in the cloud and stream results to the device, and those that run inference directly on-device using models optimized for mobile hardware. Each pattern has distinct tradeoffs around latency, privacy, cost, and capability.

Cloud AI gives you access to the most capable models (GPT-4o, Claude 3.5) with no device constraints, at the cost of network latency and API fees. On-device AI with models like Phi-3 Mini or Llama 3.2 runs offline and has no per-request cost after deployment, but is limited to smaller models that fit in device memory.

Most production apps use both: cloud for complex tasks, on-device for fast, private, offline-capable features.
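One way to picture the hybrid approach is a small routing function that decides, per task, where inference should run. The sketch below is illustrative only: the Task shape, the Target values, and the policy that private data is never sent to the cloud are assumptions made for this example, not part of any library.

```typescript
// Hypothetical task descriptor for this sketch.
type Task = {
  complexity: 'simple' | 'complex'
  containsPrivateData: boolean
}

type Target = 'on-device' | 'cloud' | 'unavailable'

function chooseTarget(task: Task, isOnline: boolean): Target {
  // Simple tasks always run locally: fast, private, and available offline.
  if (task.complexity === 'simple') return 'on-device'
  // Complex tasks need a large cloud model, which requires connectivity.
  if (!isOnline) return 'unavailable'
  // Assumed policy for this sketch: private data never leaves the device.
  if (task.containsPrivateData) return 'unavailable'
  return 'cloud'
}
```

A caller would map 'unavailable' to a UI state such as a queued request or an explanatory message; the right policy depends on the product.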

Cloud AI Integration in React Native

Never call LLM APIs directly from your React Native app — that would expose your API key in the client bundle. Instead, build a backend API that your app calls, and have the backend call the LLM provider.

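// hooks/useChat.ts
import { useState } from 'react'
// Hypothetical helper: returns the signed-in user's session token from your auth layer
import { getUserToken } from '../auth'

export type ChatMessage = { role: 'user' | 'assistant'; content: string }

export function useChat() {
  const [messages, setMessages] = useState<ChatMessage[]>([])
  const [isLoading, setIsLoading] = useState(false)

  const sendMessage = async (content: string) => {
    setIsLoading(true)
    const userMessage: ChatMessage = { role: 'user', content }
    setMessages(prev => [...prev, userMessage])

    try {
      const userToken = await getUserToken()
      const response = await fetch('https://api.yourapp.com/chat', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Authorization': `Bearer ${userToken}`
        },
        body: JSON.stringify({ messages: [...messages, userMessage] })
      })
      // fetch only rejects on network failure, so check the HTTP status explicitly
      if (!response.ok) throw new Error(`Chat request failed: ${response.status}`)
      const data = await response.json()
      setMessages(prev => [...prev, { role: 'assistant', content: data.content }])
    } catch {
      // Surface the failure in the conversation instead of failing silently
      setMessages(prev => [...prev, { role: 'assistant', content: 'Something went wrong. Please try again.' }])
    } finally {
      setIsLoading(false)
    }
  }
  return { messages, sendMessage, isLoading }
}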
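The hook assumes a backend route that holds the provider API key and forwards the conversation. A minimal, framework-agnostic sketch of that handler might look like the following. The names handleChat, verifyToken, and callProvider are hypothetical: verifyToken stands in for whatever session check your backend uses, and callProvider for the server-side call to your LLM provider.

```typescript
type ChatMessage = { role: 'user' | 'assistant'; content: string }
type ChatRequest = { authToken: string | null; messages: ChatMessage[] }
type ChatResponse = { status: number; body: { content?: string; error?: string } }

// Injected dependencies keep the sketch independent of any provider SDK.
type Deps = {
  verifyToken: (token: string) => Promise<boolean>
  callProvider: (messages: ChatMessage[]) => Promise<string>
}

async function handleChat(req: ChatRequest, deps: Deps): Promise<ChatResponse> {
  // Reject callers without a valid session before spending any API budget.
  if (!req.authToken || !(await deps.verifyToken(req.authToken))) {
    return { status: 401, body: { error: 'Unauthorized' } }
  }
  if (!Array.isArray(req.messages) || req.messages.length === 0) {
    return { status: 400, body: { error: 'messages must be a non-empty array' } }
  }
  try {
    // The provider API key is used only inside callProvider, on the server.
    const content = await deps.callProvider(req.messages)
    return { status: 200, body: { content } }
  } catch {
    return { status: 502, body: { error: 'Upstream model request failed' } }
  }
}
```

Wiring this into Express, Fastify, or a serverless function is a matter of mapping the request headers and body onto ChatRequest. The response shape ({ content }) matches what the hook reads from data.content.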

Streaming Responses in React Native

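Streaming makes AI chat feel much faster: the user sees text appear as it is generated instead of waiting for the complete response. The example below reads the response incrementally through response.body.getReader(). Whether fetch exposes response.body as a readable stream depends on your React Native version and fetch implementation; the built-in fetch has historically not supported it, so verify this on your target setup and switch to a streaming-capable implementation (for example, the fetch exported by expo/fetch in recent Expo SDKs) if response.body is null.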

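const streamMessage = async (content: string) => {
  const response = await fetch('https://api.yourapp.com/chat/stream', {
    method: 'POST',
    body: JSON.stringify({ message: content }),
    headers: { 'Content-Type': 'application/json' }
  })
  // response.body is null when the fetch implementation does not support streaming
  if (!response.ok || !response.body) {
    throw new Error(`Streaming request failed: ${response.status}`)
  }

  const reader = response.body.getReader()
  const decoder = new TextDecoder()
  let assistantMessage = ''

  // Add an empty assistant message, then replace it as chunks arrive
  setMessages(prev => [...prev, { role: 'assistant', content: '' }])

  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    // stream: true keeps multi-byte characters intact across chunk boundaries
    assistantMessage += decoder.decode(value, { stream: true })
    setMessages(prev => [
      ...prev.slice(0, -1),
      { role: 'assistant', content: assistantMessage }
    ])
  }
}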

Building the Chat UI

Use FlatList with the inverted prop for the message list. An inverted list anchors to its first item, so if you pass the messages newest-first, new messages appear at the bottom without manual scroll management. Show a typing indicator component while a response is streaming.

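import { useState } from 'react'
import { FlatList, StyleSheet, Text, TextInput, TouchableOpacity, View } from 'react-native'
import { useChat } from './hooks/useChat'
// MessageBubble is assumed to be defined elsewhere in the app
import { MessageBubble } from './components/MessageBubble'

function ChatScreen() {
  const { messages, sendMessage, isLoading } = useChat()
  const [input, setInput] = useState('')

  const handleSend = () => {
    const text = input.trim()
    if (!text) return // ignore empty submissions
    sendMessage(text)
    setInput('')
  }

  // Key by position in the original array so keys stay stable as messages are appended
  const keyExtractor = (_: unknown, i: number) => String(messages.length - 1 - i)

  return (
    <View style={styles.container}>
      <FlatList
        data={[...messages].reverse()}
        inverted
        keyExtractor={keyExtractor}
        renderItem={({ item }) => <MessageBubble message={item} />}
      />
      <View style={styles.inputRow}>
        <TextInput
          value={input}
          onChangeText={setInput}
          placeholder="Ask anything..."
          style={styles.input}
        />
        <TouchableOpacity onPress={handleSend} disabled={isLoading}>
          <Text>Send</Text>
        </TouchableOpacity>
      </View>
    </View>
  )
}

// Minimal placeholder styles; adjust to your design
const styles = StyleSheet.create({
  container: { flex: 1 },
  inputRow: { flexDirection: 'row', alignItems: 'center', padding: 8 },
  input: { flex: 1 },
})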

On-Device AI with ONNX Runtime

For features that need to work offline or handle private data, ONNX Runtime for React Native enables running quantized ML models directly on device. Common use cases: text classification, intent detection, semantic search on local data.

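import { InferenceSession, Tensor } from 'onnxruntime-react-native'

// The input name ('input_ids'), dtype, and shape must match what your exported model declares
async function runModel(modelPath: string, inputData: Float32Array) {
  const session = await InferenceSession.create(modelPath)
  const inputTensor = new Tensor('float32', inputData, [1, 512])
  return session.run({ input_ids: inputTensor })
}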

Use INT8 quantized models: they are roughly 4x smaller than their FP32 equivalents and usually run faster on mobile CPUs
Test on real devices, not simulators, because performance differs dramatically
Bundle the model with the app for small models (<50MB); download on first launch for larger models
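The size and delivery guidance above comes down to simple arithmetic. The sketch below estimates weight storage from a parameter count and picks a delivery strategy using the 50MB threshold. The function names and the 22-million-parameter model are hypothetical, and the estimate covers weights only, ignoring file metadata and any layers left unquantized.

```typescript
const BYTES_PER_MB = 1024 * 1024

// Approximate storage for the weights alone at a given precision.
function estimateWeightSizeMB(parameterCount: number, bitsPerWeight: number): number {
  return (parameterCount * bitsPerWeight) / 8 / BYTES_PER_MB
}

type Delivery = 'bundle' | 'download-on-first-launch'

// Models under the limit ship inside the app binary; larger ones are fetched later.
function chooseDelivery(sizeMB: number, bundleLimitMB = 50): Delivery {
  return sizeMB < bundleLimitMB ? 'bundle' : 'download-on-first-launch'
}

// Hypothetical small text encoder with 22 million parameters.
const parameterCount = 22_000_000
const fp32SizeMB = estimateWeightSizeMB(parameterCount, 32)
const int8SizeMB = estimateWeightSizeMB(parameterCount, 8)

console.log(`FP32: ${fp32SizeMB.toFixed(1)} MB -> ${chooseDelivery(fp32SizeMB)}`)
console.log(`INT8: ${int8SizeMB.toFixed(1)} MB -> ${chooseDelivery(int8SizeMB)}`)
```

For this example model, quantizing to INT8 is what moves it from the download category into the bundle category, since 8-bit weights take a quarter of the space of 32-bit weights.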

Performance and UX Considerations

Show a skeleton UI immediately while the AI call is in-flight — never show a blank screen
Implement request cancellation when the user navigates away mid-stream
Cache AI responses locally for repeated identical queries
Use optimistic UI updates where the response is predictable
Handle offline gracefully — queue requests and retry when connectivity returns
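Two of the items above, response caching and request cancellation, can share one small wrapper. The sketch below is a minimal in-memory version under stated assumptions: createCachedClient and the Fetcher type are hypothetical names, the cache is never evicted or persisted, and only one request is expected to be in flight at a time, so a new request aborts the previous one.

```typescript
// The fetcher is injected; in an app it would call your backend with the given signal.
type Fetcher = (prompt: string, signal: AbortSignal) => Promise<string>

function createCachedClient(fetcher: Fetcher) {
  const cache = new Map<string, string>()
  let inFlight: AbortController | null = null

  return {
    async ask(prompt: string): Promise<string> {
      // Identical queries (ignoring surrounding whitespace) reuse the stored answer.
      const key = prompt.trim()
      const cached = cache.get(key)
      if (cached !== undefined) return cached

      // Assumption: one request at a time, so abort any earlier pending request.
      inFlight?.abort()
      const controller = new AbortController()
      inFlight = controller
      try {
        const answer = await fetcher(prompt, controller.signal)
        cache.set(key, answer)
        return answer
      } finally {
        if (inFlight === controller) inFlight = null
      }
    },
    // Call when the user navigates away, for example from a useEffect cleanup.
    cancel(): void {
      inFlight?.abort()
      inFlight = null
    },
  }
}
```

In a React Native fetcher, passing the signal to fetch as its signal option lets the abort propagate to the network request. A production version would likely add persistence (for example via device storage) and a size or age limit on the cache.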

AI-enhanced mobile apps that handle the streaming, offline, and error states properly feel dramatically better than those that don't. These UX details matter as much as the underlying model quality.
