Conceptualsearch - Train a Word2Vec model or LSA model, and implement conceptual search / semantic search in Solr/Lucene (Simon Hughes, Dice.com, Dice Tech Jobs)
Aquiladb - Drop-in solution for Decentralized Neural Information Retrieval. Index latent vectors along with JSON metadata and do efficient k-NN search.
Ranknet - My (slightly modified) Keras implementation of RankNet and PyTorch implementation of LambdaRank.
Pwnback - Burp Extender plugin that generates a sitemap of a website using Wayback Machine
Hdltex - HDLTex: Hierarchical Deep Learning for Text Classification
Rank bm25 - A Collection of BM25 Algorithms in Python
Openmatch - An Open-Source Package for Information Retrieval.
Vec4ir - Word Embeddings for Information Retrieval
Neuralqa - NeuralQA: A Usable Library for Question Answering on Large Datasets with BERT
K Nrm - K-NRM: End-to-End Neural Ad-hoc Ranking with Kernel Pooling
Ranking - Learning to Rank in TensorFlow
Books - Books worth spreading
Bm25 - A Python implementation of the BM25 ranking function.
Sf1r Lite - Search Formula-1: a distributed high-performance massive data engine for enterprise/vertical search
Gensim - Topic Modelling for Humans
Pyserini - Python interface to the Anserini IR toolkit built on Lucene
Tutorial Utilizing Kg - Resources for Tutorial on "Utilizing Knowledge Graphs in Text-centric Information Retrieval"
Invoicenet - Deep neural network to extract intelligent information from invoice documents.
Easyocr - Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
Foundry - The Cognitive Foundry is an open-source Java library for building intelligent systems using machine learning
Haystack - 🔍 Haystack is an open source NLP framework that leverages Transformer models. It enables developers to implement production-ready neural search, question answering, semantic document search and summarization for a wide range of applications.
Scilla - 🏴☠️ Information Gathering tool 🏴☠️ DNS / Subdomains / Ports / Directories enumeration
Pytrec eval - pytrec_eval is an Information Retrieval evaluation tool for Python, based on the popular trec_eval.
Vtext - Simple NLP in Rust with Python bindings
Ds2i - A library of inverted index data structures
Sert - Semantic Entity Retrieval Toolkit
Flexneuart - Flexible classic and NeurAl Retrieval Toolkit
Forte - Forte is a flexible and powerful NLP builder FOR TExt. This is part of the CASL project: http://casl-project.ai/
Solrplugins - Dice Solr Plugins from Simon Hughes Dice.com
Pyndri - pyndri is a Python interface to the Indri search engine.
Textrank Keyword Extraction - Keyword extraction using TextRank algorithm after pre-processing the text with lemmatization, filtering unwanted parts-of-speech and other techniques.
Vectorsinsearch - Dice.com repo to accompany the dice.com 'Vectors in Search' talk by Simon Hughes, from the Activate 2018 search conference, and the 'Searching with Vectors' talk from Haystack 2019 (US). Builds upon my conceptual search and semantic search work from 2015
Wordtokenizers.jl - High performance tokenizers for natural language processing and other related tasks
Scdv - Text classification with Sparse Composite Document Vectors.
Domain discovery tool - This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better understand a domain (or topic) as it is represented on the Web.
Nprf - NPRF: A Neural Pseudo Relevance Feedback Framework for Ad-hoc Information Retrieval
Pke - Python Keyphrase Extraction module
Date Info - API that lets users fetch the events that happen(ed) on a specific date
Drl4nlp.scratchpad - Notes on Deep Reinforcement Learning for Natural Language Processing papers
Fxt - A large scale feature extraction tool for text-based machine learning
Relevancyfeedback - Dice.com's relevancy feedback Solr plugin created by Simon Hughes (Dice). Contains request handlers for MLT-style recommendations, conceptual search, semantic search and personalized search
Talisman - Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.
Anserini - A Lucene toolkit for replicable information retrieval research
Resin - Hardware-accelerated vector-based search engine. Available as an HTTP service or as an embedded library.
Deep Semantic Similarity Model - My Keras implementation of the Deep Semantic Similarity Model (DSSM)/Convolutional Latent Semantic Model (CLSM) described here: http://research.microsoft.com/pubs/226585/cikm2014_cdssm_final.pdf.
Cdqa - ⛔ [NOT MAINTAINED] An End-To-End Closed Domain Question Answering System.
Pisa - PISA: Performant Indexes and Search for Academia