ClaimMiner was originally designed by Conversence for the Society Library as a way to identify claim networks in a document corpus. As such, it offers basic document parsing and RAG functionality, with a plugin architecture for adding new analysis tasks. This open source project represents the core functionality of ClaimMiner; many analysis tasks specific to the Society Library have been removed. We intend to migrate ClaimMiner to the HyperKnowledge architecture, so expect significant changes to the code base in the near future. In particular, we are working on a first nested frame implementation, based on LinkML.
The main data flow is currently as follows:
- The documents are added to the corpus, either uploaded directly or as URLs
- URLs are downloaded
- Documents are broken into paragraphs
- Language embeddings are calculated for each paragraph
- Operators input some initial seed claims
- Operators use the embeddings to look for semantically related paragraphs in the corpus (see the sketch after this list)
- They send the most promising paragraphs to AI systems that identify the claims those paragraphs contain
- Those claims are vetted, and the cycle can be repeated
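For illustration, here is a minimal, self-contained sketch of the paragraph-embedding and semantic-search steps. The paragraph splitter, the `sentence-transformers` model, and the in-memory ranking are assumptions made for the example; they do not reflect ClaimMiner's actual parsing, embedding, or storage layers.

```python
# Illustrative sketch of the embedding and semantic-search steps above.
# The splitter, model choice, and in-memory ranking are assumptions for
# this example, not ClaimMiner's actual implementation.
import numpy as np
from sentence_transformers import SentenceTransformer  # illustrative embedding backend

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice


def split_into_paragraphs(text: str) -> list[str]:
    """Naive splitter on blank lines, standing in for real document parsing."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def embed(texts: list[str]) -> np.ndarray:
    """Embed texts and L2-normalize, so a dot product equals cosine similarity."""
    vecs = model.encode(texts)
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)


def related_paragraphs(seed_claim: str, paragraphs: list[str], top_k: int = 5):
    """Return the paragraphs most semantically similar to a seed claim."""
    para_vecs = embed(paragraphs)
    claim_vec = embed([seed_claim])[0]
    scores = para_vecs @ claim_vec            # cosine similarities
    order = np.argsort(scores)[::-1][:top_k]  # best matches first
    return [(float(scores[i]), paragraphs[i]) for i in order]
```

The top-ranked paragraphs returned this way are what operators would then forward to the claim-extraction step.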
There are other ancillary functions:
- ClaimMiner can use GDELT to perform a semantic search for news items
- ClaimMiner can identify clusters of claims and draw a claim cloud
- ClaimMiner can perform text search on paragraphs or claims
- ClaimMiner can perform a broadening semantic search (MMR) on paragraphs or claims (see the sketch below)
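The broadening search is based on Maximal Marginal Relevance, which trades off relevance to the query against diversity among the results. Below is a minimal sketch of MMR re-ranking over pre-computed, unit-normalized embeddings; the parameter names and the default `lam` value are illustrative, not ClaimMiner's configuration.

```python
# Minimal sketch of Maximal Marginal Relevance (MMR) re-ranking over
# pre-computed, unit-normalized embeddings. Parameter names and the
# default lambda are illustrative only.
import numpy as np


def mmr(query_vec: np.ndarray, doc_vecs: np.ndarray,
        top_k: int = 5, lam: float = 0.7) -> list[int]:
    """Select document indices balancing query relevance against redundancy."""
    relevance = doc_vecs @ query_vec          # similarity of each doc to the query
    selected: list[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < top_k:
        if not selected:
            # First pick: the most relevant document.
            best = max(candidates, key=lambda i: relevance[i])
        else:
            chosen = doc_vecs[selected]                    # embeddings already picked
            redundancy = doc_vecs[candidates] @ chosen.T   # similarity to picked docs
            margin = lam * relevance[candidates] - (1 - lam) * redundancy.max(axis=1)
            best = candidates[int(np.argmax(margin))]
        selected.append(best)
        candidates.remove(best)
    return selected
```

Higher `lam` favors relevance; lower `lam` favors diversity, which is what makes the search "broadening".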