ClaimMiner was originally designed by Conversence for the Society Library as a way to identify claim networks in a document corpus. As such, it offers basic document parsing and RAG functionality, with a plugin architecture for adding new analysis tasks. This open source project represents the core functionality of ClaimMiner; many analysis tasks specific to the Society Library have been removed. We intend to migrate ClaimMiner to the HyperKnowledge architecture, so expect significant changes to the code base in the near future. In particular, we are working on a first nested frame implementation, based on LinkML.
The main data flow is currently as follows:
- The documents are added to the corpus, either uploaded directly or as URLs
- URLs are downloaded
- Documents are broken into paragraphs
- Language embeddings are calculated for each paragraph
- Operators input some initial seed claims
- Operators use the embeddings to look for semantically related paragraphs in the corpus (see the sketch after this list)
- They send the most promising paragraphs to AI systems that identify the claims those paragraphs contain
- Those claims are vetted, and the cycle can be repeated
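For illustration, here is a minimal, self-contained sketch of the paragraph-embedding and semantic-search steps. The paragraph splitter, the `sentence-transformers` model, and the in-memory ranking are assumptions made for the example; they do not reflect ClaimMiner's actual parsing, embedding, or storage layers.

```python
# Illustrative sketch of the embedding and semantic-search steps above.
# The splitter, model choice, and in-memory ranking are assumptions for
# this example, not ClaimMiner's actual implementation.
import numpy as np
from sentence_transformers import SentenceTransformer  # illustrative embedding backend

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice


def split_into_paragraphs(text: str) -> list[str]:
    """Naive splitter on blank lines, standing in for real document parsing."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def embed(texts: list[str]) -> np.ndarray:
    """Embed texts and L2-normalize, so a dot product equals cosine similarity."""
    vecs = model.encode(texts)
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)


def related_paragraphs(seed_claim: str, paragraphs: list[str], top_k: int = 5):
    """Return the paragraphs most semantically similar to a seed claim."""
    para_vecs = embed(paragraphs)
    claim_vec = embed([seed_claim])[0]
    scores = para_vecs @ claim_vec            # cosine similarities
    order = np.argsort(scores)[::-1][:top_k]  # best matches first
    return [(float(scores[i]), paragraphs[i]) for i in order]
```

The top-ranked paragraphs returned this way are what operators would then forward to the claim-extraction step.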
There are other ancillary functions:
- ClaimMiner can use GDELT to perform a semantic search for news items
- ClaimMiner can identify clusters of claims and draw a claim cloud
- ClaimMiner can perform text search on paragraphs or claims
- ClaimMiner can perform a broadening semantic search (MMR) on paragraphs or claims (see the sketch below)
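The broadening search is based on Maximal Marginal Relevance, which trades off relevance to the query against diversity among the results. Below is a minimal sketch of MMR re-ranking over pre-computed, unit-normalized embeddings; the parameter names and the default `lam` value are illustrative, not ClaimMiner's configuration.

```python
# Minimal sketch of Maximal Marginal Relevance (MMR) re-ranking over
# pre-computed, unit-normalized embeddings. Parameter names and the
# default lambda are illustrative only.
import numpy as np


def mmr(query_vec: np.ndarray, doc_vecs: np.ndarray,
        top_k: int = 5, lam: float = 0.7) -> list[int]:
    """Select document indices balancing query relevance against redundancy."""
    relevance = doc_vecs @ query_vec          # similarity of each doc to the query
    selected: list[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < top_k:
        if not selected:
            # First pick: the most relevant document.
            best = max(candidates, key=lambda i: relevance[i])
        else:
            chosen = doc_vecs[selected]                    # embeddings already picked
            redundancy = doc_vecs[candidates] @ chosen.T   # similarity to picked docs
            margin = lam * relevance[candidates] - (1 - lam) * redundancy.max(axis=1)
            best = candidates[int(np.argmax(margin))]
        selected.append(best)
        candidates.remove(best)
    return selected
```

Higher `lam` favors relevance; lower `lam` favors diversity, which is what makes the search "broadening".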