This document provides a technical overview of the "Pollinations AI Samples" web application, detailing the implementation of each feature.
- Frontend Framework: React 19
- State Management: Zustand
- Styling: Tailwind CSS
- Client-side Storage:
  - IndexedDB: For persistent storage of chat sessions and dictionary history.
  - localStorage: For the API token and user language preference.
- Icons: Lucide React
- HTTP Client: Native `fetch` API
- Bundling/Modules: ES Modules with an `importmap` in `index.html` (no build step).
The application is a single-page application (SPA) composed of several key files:
- `index.html`: The entry point; loads Tailwind CSS and JSZip, and sets up the import map for React.
- `index.tsx`: Renders the main `App` component into the DOM.
- `App.tsx`: The root component, responsible for layout (Sidebar + Main Content) and rendering the active feature view.
- `store/appStore.ts`: The Zustand store for global state management.
- `components/`: Contains all React components, categorized as UI elements (`ui/`) or features.
- `utils/`: Contains helper functions for API calls (`api.ts`), database interactions (`db.ts`), and internationalization (`i18n.ts`).
- `locales/`: Contains JSON files for the supported languages (en, es, zh).
- `types.ts`: TypeScript type definitions.
- `constants.tsx`: Application-wide constants such as navigation items and model lists.
All interactions with the Pollinations AI backend are centralized in `utils/api.ts`.

- Endpoints:
  - Image API: `https://image.pollinations.ai`
  - Text/LLM API: `https://text.pollinations.ai`
- Authentication: The API token, stored in the Zustand store (and persisted to `localStorage`), is retrieved via `useAppStore.getState()` and added to requests. For the image API it is a query parameter (`?token=...`); for the text API it is an `Authorization: Bearer ...` header.
- Error Handling: A centralized `handleResponse` function checks for non-ok HTTP statuses (e.g., 4xx, 5xx). It provides specific user-friendly messages for authentication errors (401/403) and attempts to parse JSON error bodies for more detailed messages.
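The status-to-message logic described above could be sketched as a small pure helper. This is illustrative only: `errorMessageFor`, `ApiErrorBody`, and the exact message strings are assumptions, not the app's actual code.

```typescript
// Hypothetical sketch of the messaging inside handleResponse;
// names and wording are illustrative, not the app's actual strings.
interface ApiErrorBody {
  error?: string | { message?: string };
}

function errorMessageFor(status: number, body?: ApiErrorBody): string {
  // Authentication problems get a specific, actionable message.
  if (status === 401 || status === 403) {
    return "Authentication failed. Please check your API token.";
  }
  // Prefer a detailed message from the parsed JSON error body when present.
  if (body?.error) {
    if (typeof body.error === "string") return body.error;
    if (body.error.message) return body.error.message;
  }
  // Fall back to a generic message carrying the HTTP status.
  return `Request failed with status ${status}`;
}
```

Keeping this mapping pure (status and parsed body in, string out) makes it easy to unit-test without mocking `fetch`.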
- API Call: `generateImage(prompt, params)`
- Logic:
  - Constructs a URL to `https://image.pollinations.ai/prompt/{prompt}`.
  - All options (model, width, height, seed, etc.) are converted into URL query parameters.
  - For the `kontext` model, a user-uploaded reference image is converted to a Base64 data URL and passed as the `image` query parameter.
  - The API returns an image blob, which is converted into an object URL (`URL.createObjectURL()`) and used as the `src` of an `<img>` tag, allowing the browser to render it without storing it as a Base64 string.
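The URL construction step can be sketched as follows. `buildImageUrl` and the `ImageParams` shape are hypothetical; the real `generateImage` may name and order things differently.

```typescript
// Illustrative sketch of the image-request URL construction described above.
interface ImageParams {
  model?: string;
  width?: number;
  height?: number;
  seed?: number;
  token?: string;
}

function buildImageUrl(prompt: string, params: ImageParams): string {
  // The prompt is path-encoded into /prompt/{prompt}.
  const url = new URL(`https://image.pollinations.ai/prompt/${encodeURIComponent(prompt)}`);
  // Every defined option becomes a query parameter on the request URL.
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined) url.searchParams.set(key, String(value));
  }
  return url.toString();
}
```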
- API Call: `openAIText(payload)`
- Logic:
  - The user uploads an image file.
  - The file is read using `FileReader` and converted to a Base64 data URL.
  - A payload is constructed for the `https://text.pollinations.ai/openai` endpoint, which is a proxy for vision-capable models.
  - The payload's `messages` array includes a text part (the prompt) and an `image_url` part containing the Base64 data URL.
  - The response is a JSON object; the description is extracted from `choices[0].message.content`.
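The payload shape follows the OpenAI-compatible chat format. A minimal sketch, with `buildVisionPayload` and the default model name as assumptions:

```typescript
// Hedged sketch of the vision payload; the app's actual builder and model
// defaults may differ.
function buildVisionPayload(prompt: string, imageDataUrl: string, model = "openai") {
  return {
    model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          // The image travels inline as a Base64 data URL.
          { type: "image_url", image_url: { url: imageDataUrl } },
        ],
      },
    ],
  };
}
```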
- API Call: `textToAudio(text, voice)`
- Logic:
  - Constructs a URL to `https://text.pollinations.ai/{encodedText}`.
  - The `model` (`openai-audio`) and selected `voice` are passed as query parameters.
  - The API returns an audio blob (MP3 format).
  - This blob is converted to an object URL and used as the `src` of an `<audio>` element for playback.
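The request URL for the steps above can be sketched like this; `buildAudioUrl` and the example voice names are hypothetical:

```typescript
// Illustrative sketch of the TTS request URL described above.
function buildAudioUrl(text: string, voice: string): string {
  // The text itself is path-encoded into the URL.
  const url = new URL(`https://text.pollinations.ai/${encodeURIComponent(text)}`);
  url.searchParams.set("model", "openai-audio"); // fixed audio model
  url.searchParams.set("voice", voice);          // e.g. "alloy" (assumed example)
  return url.toString();
}
```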
- API Call: `transcribeAudio(payload)`
- Logic:
  - The component accepts `.wav` and `.mp3` files. The underlying API expects `.wav`.
  - Client-Side Conversion: If an MP3 is uploaded, it is first converted to WAV format in the browser:
    - The MP3 file is read into an `ArrayBuffer`, and `AudioContext.decodeAudioData()` decodes the MP3 data into raw PCM audio samples.
    - A custom helper function, `pcmToWav`, takes the raw samples and constructs a valid WAV file blob by manually writing the RIFF header and data chunks.
  - The final WAV blob is converted to a Base64 string.
  - This Base64 string is sent in the payload to the `/openai` endpoint inside an `input_audio` object.
  - The transcribed text is extracted from the JSON response.
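A minimal sketch of the `pcmToWav` idea, assuming mono 16-bit output; the app's actual helper may handle channels and return a `Blob` instead of an `ArrayBuffer`:

```typescript
// Wrap raw mono Float32 PCM samples in a 16-bit WAV container by writing
// the RIFF header by hand. Sketch only; details of the real helper may differ.
function pcmToWav(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const bytesPerSample = 2;
  const buffer = new ArrayBuffer(44 + samples.length * bytesPerSample);
  const view = new DataView(buffer);

  const writeString = (offset: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };

  writeString(0, "RIFF");
  view.setUint32(4, 36 + samples.length * bytesPerSample, true); // RIFF chunk size
  writeString(8, "WAVE");
  writeString(12, "fmt ");
  view.setUint32(16, 16, true);                          // fmt chunk size
  view.setUint16(20, 1, true);                           // audio format: PCM
  view.setUint16(22, 1, true);                           // channels: mono
  view.setUint32(24, sampleRate, true);                  // sample rate
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true);              // block align
  view.setUint16(34, 16, true);                          // bits per sample
  writeString(36, "data");
  view.setUint32(40, samples.length * bytesPerSample, true); // data chunk size

  // Clamp floats to [-1, 1] and scale to signed 16-bit integers.
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

In the browser, the resulting buffer would be wrapped as `new Blob([buffer], { type: "audio/wav" })` before Base64 encoding.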
- State & Persistence:
  - All chat data (sessions, messages) is managed by the Zustand store.
  - The store logic (`store/appStore.ts`) uses helper functions from `utils/db.ts` to persist every change (new session, new message, model change) to IndexedDB, ensuring no data is lost on refresh.
- Streaming Response:
  - The `handleSend` function calls `openAITextStream`, which sets `stream: true` in the API payload.
  - It uses the `fetch` API's `ReadableStream` to process the response.
  - A `TextDecoder` reads the incoming chunks. As `data: {...}` events arrive, the JSON is parsed and the text content (`delta.content`) is extracted.
  - The first chunk creates a new assistant message, and subsequent chunks update it, creating a "typing" effect.
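The per-chunk extraction can be sketched as a pure function over the decoded text; `parseSseChunk` is a hypothetical name, and a real implementation would also buffer events split across chunk boundaries:

```typescript
// Illustrative parser for "data: {...}" lines in a decoded stream chunk,
// accumulating the delta.content pieces.
function parseSseChunk(chunk: string): string {
  let text = "";
  for (const line of chunk.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue;
    const data = trimmed.slice(5).trim();
    if (data === "[DONE]") continue; // end-of-stream sentinel
    try {
      const parsed = JSON.parse(data);
      text += parsed.choices?.[0]?.delta?.content ?? "";
    } catch {
      // Incomplete JSON (event split mid-chunk); a real impl would buffer it.
    }
  }
  return text;
}
```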
- Voice Input:
  - Uses the Web Audio API (`AudioContext`, `createMediaStreamSource`, `createScriptProcessor`).
  - When recording starts, raw audio chunks (PCM data) are collected in an array.
  - When recording stops, the chunks are merged, converted to a WAV blob (using `pcmToWav`), and sent for transcription using the same logic as the Speech to Text feature.
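The merge step is a simple concatenation of typed arrays; `mergeChunks` is a hypothetical helper name:

```typescript
// Sketch of merging recorded PCM chunks into one buffer before WAV encoding.
function mergeChunks(chunks: Float32Array[]): Float32Array {
  const total = chunks.reduce((sum, c) => sum + c.length, 0);
  const merged = new Float32Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    merged.set(chunk, offset); // copy each chunk at its running offset
    offset += chunk.length;
  }
  return merged;
}
```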
These features chain multiple API calls together, managing the state through each step.
- Audio Translation (`components/AudioTranslate.tsx`):
  - Transcribe: The input audio is sent for transcription (Speech to Text).
  - Translate: The resulting text is sent to the text model with a translation prompt.
  - Synthesize: The translated text is sent to the Text to Speech API to generate the final audio.
  - The UI tracks the current step (`transcribing`, `translating`, `synthesizing`) to provide clear feedback to the user.
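The chaining-with-step-tracking pattern can be sketched synchronously; in the real component each stage is an async API call, and `runTranslationChain` is a hypothetical name:

```typescript
// Simplified, synchronous sketch of the three-step chain and its step tracking.
type Step = "transcribing" | "translating" | "synthesizing";

function runTranslationChain<T>(
  input: T,
  stages: Array<{ step: Step; run: (value: T) => T }>,
  onStep: (step: Step) => void,
): T {
  let value = input;
  for (const stage of stages) {
    onStep(stage.step); // surface the current step to the UI
    value = stage.run(value); // each stage consumes the previous stage's output
  }
  return value;
}
```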
- Animation Generation (`components/AnimationGen.tsx`):
  - Storyboard Generation: A prompt is sent to the text model with a carefully engineered system message instructing it to return a JSON array of scene objects, each with a `scene_description` and an `image_prompt`.
  - Image Generation: The component iterates through the received array, calling the Text to Image API for each `image_prompt` sequentially. The UI updates each scene's card as its image is generated.
  - Batch Export: Uses the `JSZip` library (loaded from a CDN) to create a `.zip` file in the browser containing all generated images and a `storyboard.txt` file.
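Validating the storyboard response before iterating over it could look like this; `Scene` and `parseStoryboard` are hypothetical names:

```typescript
// Hedged sketch of validating the storyboard JSON returned by the model.
interface Scene {
  scene_description: string;
  image_prompt: string;
}

function parseStoryboard(raw: string): Scene[] {
  const data = JSON.parse(raw);
  if (!Array.isArray(data)) throw new Error("Expected a JSON array of scenes");
  return data.map((item, i) => {
    // Reject scenes missing either required field rather than rendering blanks.
    if (typeof item?.scene_description !== "string" || typeof item?.image_prompt !== "string") {
      throw new Error(`Scene ${i} is missing required fields`);
    }
    return { scene_description: item.scene_description, image_prompt: item.image_prompt };
  });
}
```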
These features rely heavily on "prompt engineering" to instruct the LLM to return a valid, structured JSON or code format, which is then parsed and rendered by the frontend.
- Dictionary (`components/Dictionary.tsx`):
  - The system prompt is highly detailed, defining a specific JSON schema the AI must follow, including keys like `phonetic`, `meanings`, `definitions`, `etymology`, `relatedWords`, etc.
  - The frontend parses this JSON and renders a rich, interactive UI.
  - It integrates Text to Speech to let users hear the pronunciation of the word, definitions, and examples.
  - Search history is persisted to IndexedDB.
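Models often wrap JSON in markdown fences despite instructions, so parsing such responses typically strips them first. A sketch of that defensive step, with `extractJson` as a hypothetical name:

```typescript
// Strip optional ```json fences before parsing a structured model response.
function extractJson(response: string): unknown {
  const cleaned = response
    .replace(/^\s*```(?:json)?\s*/i, "") // leading fence, if any
    .replace(/\s*```\s*$/, "")           // trailing fence, if any
    .trim();
  return JSON.parse(cleaned);
}
```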
- Code Generation (`components/CodeGen.tsx`):
  - The system prompt instructs the AI to act as an expert developer and return a JSON array where each object represents a file (`{ fileName, code }`). This allows the generation of multi-file projects from a single prompt.
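Handling the multi-file response typically means indexing files by name for display or download; `GeneratedFile` and `toFileMap` are hypothetical names:

```typescript
// Sketch of indexing the { fileName, code } response shape.
interface GeneratedFile {
  fileName: string;
  code: string;
}

function toFileMap(files: GeneratedFile[]): Map<string, string> {
  const map = new Map<string, string>();
  for (const file of files) {
    map.set(file.fileName, file.code); // later duplicates overwrite earlier ones
  }
  return map;
}
```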
- Web App Generator (`components/WebAppGen.tsx`):
  - The system prompt is strict, demanding a single, self-contained HTML file with inline `<style>` and `<script>` tags, and explicitly forbidding any explanatory text or markdown code fences.
  - The raw HTML string response is rendered inside a sandboxed `<iframe>` using the `srcDoc` attribute for a safe preview.
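Even with a strict prompt, responses sometimes arrive wrapped in a markdown fence anyway, so a cleanup pass before assigning to `srcDoc` is a common safeguard. `cleanHtmlResponse` is a hypothetical name:

```typescript
// Strip an optional ```html fence so the raw document can feed srcDoc directly.
function cleanHtmlResponse(response: string): string {
  return response
    .replace(/^\s*```(?:html)?\s*/i, "")
    .replace(/\s*```\s*$/, "")
    .trim();
}
```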