An app for visualization of document corpora (collections) using D3.js
👉 Available online at pokusew-corpus-viz.netlify.app
The code is written in TypeScript, D3.js and React.js. See more in the Architecture section.
Note: The initial version is finished! 🚀
Next steps: Finish documentation 📖 and refactor 🧹 some parts of the code.
See 👉 Final Report – Visualization of a Document Corpus on Google Docs.
Currently, it is a client-side-only application (SPA). It runs completely in the browser.
The code is written in TypeScript, D3.js and React.js.
The project has just a few production dependencies. Everything else is implemented from scratch.
There is also a separate data preprocessing pipeline which is implemented in Python 3.
See the data-preprocessing directory, which contains its own README with more info.
The app/data directory (versioned in Git) contains preprocessed data for some document collections.
The web app source code is in the app directory. Some directories contain feature-specific READMEs. The following diagram briefly describes the main directories and files:
. (project root dir)
├── .github - GitHub config (GitHub Actions)
├── app - the app source code
│ ├── components - React components for the main app logic, UI, state, views, plot wrappers
│ ├── core - D3.js scatterplot and wordcloud, data loading
│ ├── data - data to visualize - stored results of the data preprocessing pipeline
│ ├── helpers - various common functions
│ ├── images - the PWA app icon and SVG UI icons
│ ├── styles - app styles written in Sass (SCSS)
│ ├── sw - the service worker that handles precaching app shell (not fully integrated)
│ ├── _headers - Netlify HTTP headers customization
│ ├── _redirects - Netlify HTTP redirects/rewrites customization
│ ├── index.js - the app starting point (entrypoint)
│ ├── manifest.json - a web app manifest for PWA
│ ├── robots.txt
│ ├── routes.ts - app routes definitions
│ ├── template.ejs - index.html template to be built by webpack
│ └── types.js - data, state and API types
├── data-preprocessing - Python scripts used for data preprocessing
├── test - a few tests
├── tools - custom webpack plugins
├── types - TypeScript declarations for non-code imports (SVG, MP3)
├── .browserslistrc - Browserslist config
├── .eslintrc.js - ESLint config
├── .nvmrc - Node.js version specification for Netlify
├── ava.config.js - AVA config
├── babel.config.js - Babel config
├── netlify.toml - Netlify main config
├── package.json
├── postcss.config.js - PostCSS config
├── tsconfig.json - main TypeScript config
├── webpack.config.*.js - webpack configs
└── yarn.lock
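As a rough illustration of what the scatterplot code in app/core has to do with the preprocessed data, the sketch below maps 2D document coordinates to pixel positions. It is written in plain TypeScript and only mimics the idea behind D3's linear scales; it is not the project's actual code. The DocPoint shape and the function names (extent, linearScale, toPixels) are assumptions made for this example.

```typescript
// Hypothetical shape of one preprocessed document (assumption, not the app's real type).
interface DocPoint {
  id: string;
  x: number; // assumed 2D coordinate produced by the preprocessing pipeline
  y: number;
}

interface PixelPoint {
  id: string;
  px: number;
  py: number;
}

// Returns [min, max] of a non-empty array of numbers.
function extent(values: number[]): [number, number] {
  if (values.length === 0) {
    throw new Error('extent() needs at least one value');
  }
  let min = values[0];
  let max = values[0];
  for (const v of values) {
    if (v < min) min = v;
    if (v > max) max = v;
  }
  return [min, max];
}

// Builds a linear mapping from a data domain to a pixel range.
function linearScale(
  domain: [number, number],
  range: [number, number],
): (v: number) => number {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  const span = d1 - d0;
  // Degenerate domain (all values equal): map everything to the middle of the range.
  if (span === 0) {
    return () => (r0 + r1) / 2;
  }
  return (v: number) => r0 + ((v - d0) / span) * (r1 - r0);
}

// Converts document coordinates to pixel coordinates for a width x height plot.
function toPixels(points: DocPoint[], width: number, height: number): PixelPoint[] {
  const sx = linearScale(extent(points.map((p) => p.x)), [0, width]);
  // Screen y grows downward, so the pixel range is flipped.
  const sy = linearScale(extent(points.map((p) => p.y)), [height, 0]);
  return points.map((p) => ({ id: p.id, px: sx(p.x), py: sy(p.y) }));
}
```

In the real app, D3.js provides equivalent building blocks, so a helper like this would normally not be needed.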
- Node.js >=18.x
- Yarn 1.x
- You can follow this Node.js Development Setup guide.
- Install all dependencies with Yarn (run yarn).
- You are ready to go.
- Use yarn start to start dev server with HMR.
- Then open http://localhost:3000/ in the browser.
- yarn start – Starts a webpack development server with HMR (hot module replacement).
- yarn build – Builds the production version and outputs to dist dir. Note: Before running an actual build, dist dir is purged.
- yarn analyze – Same as yarn build but it also outputs build/stats.production.json and runs webpack-bundle-analyzer CLI.
- yarn tsc – Runs TypeScript compiler. Outputs type errors to console.
- yarn lint – Runs ESLint. Outputs errors to console.
- yarn test – Runs tests using AVA.
- yarn test-hot – Runs tests using AVA in watch mode.
Currently, we use Netlify which is practically a CDN on steroids with integrated builds. There are 3 configuration files that affect the deployment behavior:
- netlify.toml – global config
- app/_headers – HTTP headers customization (mainly for immutable files)
- app/_redirects – HTTP redirects and rewrites (fallback to index.html for client-side routing)
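To make the effect of the headers and redirects files more concrete, here is a small TypeScript model of the behavior they are meant to produce: long-lived caching for immutable files and a rewrite to index.html for paths that only exist as client-side routes. This is a simplified sketch, not Netlify's implementation and not the project's actual configuration. The hashed file name pattern, the Cache-Control values, and the example paths are assumptions.

```typescript
// Result of serving one request in this simplified model.
interface Served {
  status: 200 | 404;
  file: string | null;
  cacheControl: string | null;
}

// Assumption: immutable assets carry a content hash in their name, e.g. /main.3f9a1c2b.js
const HASHED_ASSET = /\.[0-9a-f]{8,}\.(js|css|png|svg|woff2)$/;

// Hashed files never change, so they can be cached for a long time;
// everything else (including index.html) should be revalidated.
function cacheControlFor(file: string): string {
  return HASHED_ASSET.test(file)
    ? 'public, max-age=31536000, immutable'
    : 'public, max-age=0, must-revalidate';
}

// Serves an existing file as-is; otherwise falls back to index.html with
// status 200 (a rewrite, not a redirect) so the client-side router can
// handle the path in the browser.
function serve(path: string, files: Set<string>): Served {
  if (files.has(path)) {
    return { status: 200, file: path, cacheControl: cacheControlFor(path) };
  }
  if (files.has('/index.html')) {
    return {
      status: 200,
      file: '/index.html',
      cacheControl: cacheControlFor('/index.html'),
    };
  }
  return { status: 404, file: null, cacheControl: null };
}
```

For example, with a deployment containing /index.html and /main.3f9a1c2b.js, a request for a hypothetical route such as /collections/abc is answered with index.html, while the hashed bundle is served with the immutable caching header.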