Document Corpus Visualization

An app for visualizing document corpora (collections) using D3.js

👉 Available online at pokusew-corpus-viz.netlify.app

The code is written in TypeScript, D3.js and React.js. See more in the Architecture section.

Note: The initial version is finished! 🚀
Next steps: Finish documentation 📖 and refactor 🧹 some parts of the code.

Content

Description
Architecture
- Data preprocessing
- Project structure
Development
Deployment

Description

See 👉 Final Report – Visualization of a Document Corpus on Google Docs.

Architecture

Currently, it is a client-side-only application (SPA). It runs completely in the browser.

The code is written in TypeScript, D3.js and React.js.

The project has just a few production dependencies. Everything else is implemented from scratch.

Data preprocessing

There is also a separate data preprocessing pipeline, implemented in Python 3.
See the data-preprocessing directory, which contains its own README with more info.

The app/data directory (versioned in Git) contains already-preprocessed data for some document collections.
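To illustrate how the browser app might consume one of these files, here is a minimal TypeScript sketch that validates a preprocessed collection after it has been fetched and parsed as JSON. The shape used here (a `name` plus a `documents` array of `{ id, title, x, y }` entries with 2D scatterplot coordinates) and the names `Corpus`, `DocumentPoint` and `parseCorpus` are hypothetical assumptions, not the actual format produced by the pipeline; the real data types live in the app directory.

```typescript
// Hypothetical shape of one preprocessed document; the real format
// produced by the data-preprocessing pipeline may differ.
interface DocumentPoint {
  id: string;
  title: string;
  x: number; // assumed 2D projection coordinate for the scatterplot
  y: number;
}

// Hypothetical shape of one preprocessed collection.
interface Corpus {
  name: string;
  documents: DocumentPoint[];
}

// Type guard: checks a single untrusted value against DocumentPoint.
function isDocumentPoint(value: unknown): value is DocumentPoint {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "string" &&
    typeof v.title === "string" &&
    typeof v.x === "number" && Number.isFinite(v.x) &&
    typeof v.y === "number" && Number.isFinite(v.y)
  );
}

// Validates already-parsed JSON and returns a typed Corpus,
// throwing a TypeError that points at the first malformed entry.
function parseCorpus(json: unknown): Corpus {
  if (typeof json !== "object" || json === null) {
    throw new TypeError("corpus must be an object");
  }
  const raw = json as Record<string, unknown>;
  if (typeof raw.name !== "string") {
    throw new TypeError("corpus.name must be a string");
  }
  if (!Array.isArray(raw.documents)) {
    throw new TypeError("corpus.documents must be an array");
  }
  const documents: DocumentPoint[] = raw.documents.map((doc: unknown, i: number) => {
    if (!isDocumentPoint(doc)) {
      throw new TypeError(`corpus.documents[${i}] is malformed`);
    }
    return doc;
  });
  return { name: raw.name, documents };
}
```

In the browser this would be called as `parseCorpus(await (await fetch(url)).json())`, where `url` is assumed to point at a JSON file in app/data. Validating once at load time keeps the D3.js plotting code free of defensive checks.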

Project structure

The web app source code is in the app directory. Some directories contain feature-specific READMEs. The following diagram briefly describes the main directories and files:

. (project root dir)
├── .github - GitHub config (GitHub Actions)
├── app - the app source code
│   ├── components - React components for the main app logic, UI, state, views, plot wrappers
│   ├── core - D3.js scatterplot and wordcloud, data loading
│   ├── data - data to visualize - stored results of the data preprocessing pipeline
│   ├── helpers - various common functions
│   ├── images - the PWA app icon and SVG UI icons
│   ├── styles - app styles written in Sass (SCSS)
│   ├── sw - the service worker that handles precaching app shell (not fully integrated)
│   ├── _headers - Netlify HTTP headers customization
│   ├── _redirects - Netlify HTTP redirects/rewrites customization
│   ├── index.js - the app starting point (entrypoint)
│   ├── manifest.json - a web app manifest for PWA
│   ├── robots.txt
│   ├── routes.ts - app routes definitions
│   ├── template.ejs - the index.html template used by webpack
│   └── types.js - data, state and API types
├── data-preprocessing - Python scripts used for data preprocessing
├── test - a few tests
├── tools - custom webpack plugins
├── types - TypeScript declarations for non-code imports (SVG, MP3)
├── .browserslistrc - Browserslist config
├── .eslintrc.js - ESLint config
├── .nvmrc - Node.js version specification for Netlify
├── ava.config.js - AVA config
├── babel.config.js - Babel config
├── netlify.toml - Netlify main config
├── package.json
├── postcss.config.js - PostCSS config
├── tsconfig.json - main TypeScript config
├── webpack.config.*.js - webpack configs
└── yarn.lock

Development

Requirements

Node.js >=18.x
Yarn 1.x
You can follow the Node.js Development Setup guide (NODEJS-SETUP.md).

Set up

Install all dependencies with Yarn (run yarn).
You are ready to go.
Use yarn start to start the development server with HMR.
Then open http://localhost:3000/ in the browser.

Available commands

yarn start – Starts a webpack development server with HMR (hot module replacement).
yarn build – Builds the production version and outputs it to the dist directory. Note: The dist directory is purged before each build.
yarn analyze – Same as yarn build, but it also outputs build/stats.production.json and runs the webpack-bundle-analyzer CLI.
yarn tsc – Runs the TypeScript compiler. Outputs type errors to the console.
yarn lint – Runs ESLint. Outputs errors to the console.
yarn test – Runs tests using AVA.
yarn test-hot – Runs tests using AVA in watch mode.

Deployment

Currently, we use Netlify, which is practically a CDN on steroids with integrated builds. There are 3 configuration files that affect the deployment behavior:

netlify.toml – global config
app/_headers – HTTP headers customization (mainly for immutable files)
app/_redirects – HTTP redirects and rewrites (fallback to index.html for client-side routing)
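To make the effect of the last two files concrete, here is a minimal TypeScript sketch that emulates both behaviors for a static host: long-lived immutable caching for fingerprinted build files, and an SPA fallback that answers unknown paths with index.html and status 200 so the client-side router (app/routes.ts) can handle them. This is not how Netlify works internally, and it is not the content of the project's actual files; the `resolveRequest` helper, the hash-based file naming pattern and the exact Cache-Control value are assumptions. In Netlify's own `_redirects` syntax, a common SPA fallback rule is `/*  /index.html  200`.

```typescript
// What a static host would send back for a request (hypothetical model).
interface StaticResponse {
  file: string; // which file from the build output is served
  status: number;
  headers: Record<string, string>;
}

// Assumption: fingerprinted assets carry a hex content hash,
// e.g. /main.3f9a2c1b.js
const HASHED_ASSET = /\.[0-9a-f]{8,}\.(?:js|css|svg|png|woff2?)$/;

function resolveRequest(path: string, files: ReadonlySet<string>): StaticResponse {
  const headers: Record<string, string> = {};
  if (files.has(path)) {
    if (HASHED_ASSET.test(path)) {
      // The URL changes whenever the content changes,
      // so the file can safely be cached "forever".
      headers["Cache-Control"] = "public, max-age=31536000, immutable";
    }
    return { file: path, status: 200, headers };
  }
  // SPA fallback as a rewrite (not a redirect): the browser URL stays the same,
  // index.html is served with 200 and the client-side router takes over.
  return { file: "/index.html", status: 200, headers };
}
```

Note that index.html itself is deliberately not marked immutable: it references the current hashed bundles, so it must be revalidated after every deploy.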

pokusew/fel-corpus-viz

Folders and files

Latest commit

History

Repository files navigation

Document Corpus Visualization

Content

Description

Architecture

Data preprocessing

Project structure

Development

Requirements

Set up

Available commands

Deployment

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages