This document provides a technical overview of the "Pollinations AI Samples" web application, detailing the implementation of each feature.
- Frontend Framework: React 19
- State Management: Zustand
- Styling: Tailwind CSS
- Client-side Storage:
  - IndexedDB: For persistent storage of chat sessions and dictionary history.
  - localStorage: For the API token and user language preference.
- Icons: Lucide React
- HTTP Client: Native `fetch` API
- Bundling/Modules: ES Modules with an `importmap` in `index.html` (no build step).
The application is a single-page application (SPA) composed of several key files:
- `index.html`: The entry point; loads Tailwind CSS and JSZip, and sets up the import map for React.
- `index.tsx`: Renders the main `App` component into the DOM.
- `App.tsx`: The root component, responsible for layout (Sidebar + Main Content) and rendering the active feature view.
- `store/appStore.ts`: The Zustand store for global state management.
- `components/`: Contains all React components, categorized as UI elements (`ui/`) or features.
- `utils/`: Contains helper functions for API calls (`api.ts`), database interactions (`db.ts`), and internationalization (`i18n.ts`).
- `locales/`: Contains JSON files for the supported languages (en, es, zh).
- `types.ts`: TypeScript type definitions.
- `constants.tsx`: Application-wide constants such as navigation items and model lists.
All interactions with the Pollinations AI backend are centralized in `utils/api.ts`.

- Endpoints:
  - Image API: `https://image.pollinations.ai`
  - Text/LLM API: `https://text.pollinations.ai`
- Authentication: The API token, stored in the Zustand store (and persisted to `localStorage`), is retrieved via `useAppStore.getState()` and added to requests. For the image API it is a query parameter (`?token=...`); for the text API it is an `Authorization: Bearer ...` header.
- Error Handling: A centralized `handleResponse` function checks for non-ok HTTP statuses (e.g., 4xx, 5xx). It provides specific user-friendly messages for authentication errors (401/403) and attempts to parse JSON error bodies for more detailed messages.
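The status-to-message logic described above could be sketched as a small pure helper. This is illustrative only: `errorMessageFor`, `ApiErrorBody`, and the exact message strings are assumptions, not the app's actual code.

```typescript
// Hypothetical sketch of the messaging inside handleResponse;
// names and wording are illustrative, not the app's actual strings.
interface ApiErrorBody {
  error?: string | { message?: string };
}

function errorMessageFor(status: number, body?: ApiErrorBody): string {
  // Authentication problems get a specific, actionable message.
  if (status === 401 || status === 403) {
    return "Authentication failed. Please check your API token.";
  }
  // Prefer a detailed message from the parsed JSON error body when present.
  if (body?.error) {
    if (typeof body.error === "string") return body.error;
    if (body.error.message) return body.error.message;
  }
  // Fall back to a generic message carrying the HTTP status.
  return `Request failed with status ${status}`;
}
```

Keeping this mapping pure (status and parsed body in, string out) makes it easy to unit-test without mocking `fetch`.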
- API Call: `generateImage(prompt, params)`
- Logic:
  - Constructs a URL to `https://image.pollinations.ai/prompt/{prompt}`.
  - All options (model, width, height, seed, etc.) are converted into URL query parameters.
  - For the `kontext` model, a user-uploaded reference image is converted to a Base64 data URL and passed as the `image` query parameter.
  - The API returns an image blob, which is converted into an object URL (`URL.createObjectURL()`) and used as the `src` of an `<img>` tag, allowing the browser to render it without storing it as a Base64 string.
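The URL construction step can be sketched as follows. `buildImageUrl` and the `ImageParams` shape are hypothetical; the real `generateImage` may name and order things differently.

```typescript
// Illustrative sketch of the image-request URL construction described above.
interface ImageParams {
  model?: string;
  width?: number;
  height?: number;
  seed?: number;
  token?: string;
}

function buildImageUrl(prompt: string, params: ImageParams): string {
  // The prompt is path-encoded into /prompt/{prompt}.
  const url = new URL(`https://image.pollinations.ai/prompt/${encodeURIComponent(prompt)}`);
  // Every defined option becomes a query parameter on the request URL.
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined) url.searchParams.set(key, String(value));
  }
  return url.toString();
}
```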
- API Call: `openAIText(payload)`
- Logic:
  - The user uploads an image file.
  - The file is read using `FileReader` and converted to a Base64 data URL.
  - A payload is constructed for the `https://text.pollinations.ai/openai` endpoint, which is a proxy for vision-capable models.
  - The payload's `messages` array includes a text part (the prompt) and an `image_url` part containing the Base64 data URL.
  - The response is a JSON object; the description is extracted from `choices[0].message.content`.
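The payload shape follows the OpenAI-compatible chat format. A minimal sketch, with `buildVisionPayload` and the default model name as assumptions:

```typescript
// Hedged sketch of the vision payload; the app's actual builder and model
// defaults may differ.
function buildVisionPayload(prompt: string, imageDataUrl: string, model = "openai") {
  return {
    model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          // The image travels inline as a Base64 data URL.
          { type: "image_url", image_url: { url: imageDataUrl } },
        ],
      },
    ],
  };
}
```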
- API Call: `textToAudio(text, voice)`
- Logic:
  - Constructs a URL to `https://text.pollinations.ai/{encodedText}`.
  - The `model` (`openai-audio`) and selected `voice` are passed as query parameters.
  - The API returns an audio blob (MP3 format).
  - This blob is converted to an object URL and used as the `src` of an `<audio>` element for playback.
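The request URL for the steps above can be sketched like this; `buildAudioUrl` and the example voice names are hypothetical:

```typescript
// Illustrative sketch of the TTS request URL described above.
function buildAudioUrl(text: string, voice: string): string {
  // The text itself is path-encoded into the URL.
  const url = new URL(`https://text.pollinations.ai/${encodeURIComponent(text)}`);
  url.searchParams.set("model", "openai-audio"); // fixed audio model
  url.searchParams.set("voice", voice);          // e.g. "alloy" (assumed example)
  return url.toString();
}
```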
- API Call: `transcribeAudio(payload)`
- Logic:
  - The component accepts `.wav` and `.mp3` files. The underlying API expects `.wav`.
  - Client-Side Conversion: If an MP3 is uploaded, it is first converted to WAV format in the browser:
    - The MP3 file is read into an `ArrayBuffer`, and `AudioContext.decodeAudioData()` decodes the MP3 data into raw PCM audio samples.
    - A custom helper function, `pcmToWav`, takes the raw samples and constructs a valid WAV file blob by manually writing the RIFF header and data chunks.
  - The final WAV blob is converted to a Base64 string.
  - This Base64 string is sent in the payload to the `/openai` endpoint inside an `input_audio` object.
  - The transcribed text is extracted from the JSON response.
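A minimal sketch of the `pcmToWav` idea, assuming mono 16-bit output; the app's actual helper may handle channels and return a `Blob` instead of an `ArrayBuffer`:

```typescript
// Wrap raw mono Float32 PCM samples in a 16-bit WAV container by writing
// the RIFF header by hand. Sketch only; details of the real helper may differ.
function pcmToWav(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const bytesPerSample = 2;
  const buffer = new ArrayBuffer(44 + samples.length * bytesPerSample);
  const view = new DataView(buffer);

  const writeString = (offset: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };

  writeString(0, "RIFF");
  view.setUint32(4, 36 + samples.length * bytesPerSample, true); // RIFF chunk size
  writeString(8, "WAVE");
  writeString(12, "fmt ");
  view.setUint32(16, 16, true);                          // fmt chunk size
  view.setUint16(20, 1, true);                           // audio format: PCM
  view.setUint16(22, 1, true);                           // channels: mono
  view.setUint32(24, sampleRate, true);                  // sample rate
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true);              // block align
  view.setUint16(34, 16, true);                          // bits per sample
  writeString(36, "data");
  view.setUint32(40, samples.length * bytesPerSample, true); // data chunk size

  // Clamp floats to [-1, 1] and scale to signed 16-bit integers.
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

In the browser, the resulting buffer would be wrapped as `new Blob([buffer], { type: "audio/wav" })` before Base64 encoding.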
- State & Persistence:
  - All chat data (sessions, messages) is managed by the Zustand store.
  - The store logic (`store/appStore.ts`) uses helper functions from `utils/db.ts` to persist every change (new session, new message, model change) to IndexedDB, ensuring no data is lost on refresh.
- Streaming Response:
  - The `handleSend` function calls `openAITextStream`, which sets `stream: true` in the API payload.
  - It uses the `fetch` API's `ReadableStream` to process the response.
  - A `TextDecoder` reads the incoming chunks. As `data: {...}` events arrive, the JSON is parsed and the text content (`delta.content`) is extracted.
  - The first chunk creates a new assistant message, and subsequent chunks update it, creating a "typing" effect.
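The per-chunk extraction can be sketched as a pure function over the decoded text; `parseSseChunk` is a hypothetical name, and a real implementation would also buffer events split across chunk boundaries:

```typescript
// Illustrative parser for "data: {...}" lines in a decoded stream chunk,
// accumulating the delta.content pieces.
function parseSseChunk(chunk: string): string {
  let text = "";
  for (const line of chunk.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue;
    const data = trimmed.slice(5).trim();
    if (data === "[DONE]") continue; // end-of-stream sentinel
    try {
      const parsed = JSON.parse(data);
      text += parsed.choices?.[0]?.delta?.content ?? "";
    } catch {
      // Incomplete JSON (event split mid-chunk); a real impl would buffer it.
    }
  }
  return text;
}
```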
- Voice Input:
  - Uses the Web Audio API (`AudioContext`, `createMediaStreamSource`, `createScriptProcessor`).
  - When recording starts, raw audio chunks (PCM data) are collected in an array.
  - When recording stops, the chunks are merged, converted to a WAV blob (using `pcmToWav`), and sent for transcription using the same logic as the Speech to Text feature.
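The merge step is a simple concatenation of typed arrays; `mergeChunks` is a hypothetical helper name:

```typescript
// Sketch of merging recorded PCM chunks into one buffer before WAV encoding.
function mergeChunks(chunks: Float32Array[]): Float32Array {
  const total = chunks.reduce((sum, c) => sum + c.length, 0);
  const merged = new Float32Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    merged.set(chunk, offset); // copy each chunk at its running offset
    offset += chunk.length;
  }
  return merged;
}
```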
These features chain multiple API calls together, managing the state through each step.
- Audio Translation (`components/AudioTranslate.tsx`):
  - Transcribe: The input audio is sent for transcription (Speech to Text).
  - Translate: The resulting text is sent to the text model with a translation prompt.
  - Synthesize: The translated text is sent to the Text to Speech API to generate the final audio.
  - The UI tracks the current step (`transcribing`, `translating`, `synthesizing`) to provide clear feedback to the user.
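The chaining-with-step-tracking pattern can be sketched synchronously; in the real component each stage is an async API call, and `runTranslationChain` is a hypothetical name:

```typescript
// Simplified, synchronous sketch of the three-step chain and its step tracking.
type Step = "transcribing" | "translating" | "synthesizing";

function runTranslationChain<T>(
  input: T,
  stages: Array<{ step: Step; run: (value: T) => T }>,
  onStep: (step: Step) => void,
): T {
  let value = input;
  for (const stage of stages) {
    onStep(stage.step); // surface the current step to the UI
    value = stage.run(value); // each stage consumes the previous stage's output
  }
  return value;
}
```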
- Animation Generation (`components/AnimationGen.tsx`):
  - Storyboard Generation: A prompt is sent to the text model with a carefully engineered system message instructing it to return a JSON array of scene objects, each with a `scene_description` and an `image_prompt`.
  - Image Generation: The component iterates through the received array, calling the Text to Image API for each `image_prompt` sequentially. The UI updates each scene's card as its image is generated.
  - Batch Export: Uses the `JSZip` library (loaded from a CDN) to create a `.zip` file in the browser containing all generated images and a `storyboard.txt` file.
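Validating the storyboard response before iterating over it could look like this; `Scene` and `parseStoryboard` are hypothetical names:

```typescript
// Hedged sketch of validating the storyboard JSON returned by the model.
interface Scene {
  scene_description: string;
  image_prompt: string;
}

function parseStoryboard(raw: string): Scene[] {
  const data = JSON.parse(raw);
  if (!Array.isArray(data)) throw new Error("Expected a JSON array of scenes");
  return data.map((item, i) => {
    // Reject scenes missing either required field rather than rendering blanks.
    if (typeof item?.scene_description !== "string" || typeof item?.image_prompt !== "string") {
      throw new Error(`Scene ${i} is missing required fields`);
    }
    return { scene_description: item.scene_description, image_prompt: item.image_prompt };
  });
}
```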
These features rely heavily on "prompt engineering" to instruct the LLM to return a valid, structured JSON or code format, which is then parsed and rendered by the frontend.
- Dictionary (`components/Dictionary.tsx`):
  - The system prompt is highly detailed, defining a specific JSON schema the AI must follow, including keys like `phonetic`, `meanings`, `definitions`, `etymology`, `relatedWords`, etc.
  - The frontend parses this JSON and renders a rich, interactive UI.
  - It integrates Text to Speech to let users hear the pronunciation of the word, definitions, and examples.
  - Search history is persisted to IndexedDB.
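Models often wrap JSON in markdown fences despite instructions, so parsing such responses typically strips them first. A sketch of that defensive step, with `extractJson` as a hypothetical name:

```typescript
// Strip optional ```json fences before parsing a structured model response.
function extractJson(response: string): unknown {
  const cleaned = response
    .replace(/^\s*```(?:json)?\s*/i, "") // leading fence, if any
    .replace(/\s*```\s*$/, "")           // trailing fence, if any
    .trim();
  return JSON.parse(cleaned);
}
```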
- Code Generation (`components/CodeGen.tsx`):
  - The system prompt instructs the AI to act as an expert developer and return a JSON array where each object represents a file (`{ fileName, code }`). This allows the generation of multi-file projects from a single prompt.
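Handling the multi-file response typically means indexing files by name for display or download; `GeneratedFile` and `toFileMap` are hypothetical names:

```typescript
// Sketch of indexing the { fileName, code } response shape.
interface GeneratedFile {
  fileName: string;
  code: string;
}

function toFileMap(files: GeneratedFile[]): Map<string, string> {
  const map = new Map<string, string>();
  for (const file of files) {
    map.set(file.fileName, file.code); // later duplicates overwrite earlier ones
  }
  return map;
}
```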
- Web App Generator (`components/WebAppGen.tsx`):
  - The system prompt is strict, demanding a single, self-contained HTML file with inline `<style>` and `<script>` tags, and explicitly forbidding any explanatory text or markdown code fences.
  - The raw HTML string response is rendered inside a sandboxed `<iframe>` using the `srcDoc` attribute for a safe preview.
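Even with a strict prompt, responses sometimes arrive wrapped in a markdown fence anyway, so a cleanup pass before assigning to `srcDoc` is a common safeguard. `cleanHtmlResponse` is a hypothetical name:

```typescript
// Strip an optional ```html fence so the raw document can feed srcDoc directly.
function cleanHtmlResponse(response: string): string {
  return response
    .replace(/^\s*```(?:html)?\s*/i, "")
    .replace(/\s*```\s*$/, "")
    .trim();
}
```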