
ClaimMiner

ClaimMiner was originally designed by Conversence for Society Library as a way to identify claim networks in a document corpus. As such, it offers basic document parsing and RAG functionality, with a plugin architecture for adding new analysis tasks. This open source project represents the core functionality of ClaimMiner; many analysis tasks specific to Society Library were removed. We intend to migrate ClaimMiner to the HyperKnowledge architecture, so expect many changes to the code base in the near future. In particular, we are working on a first nested frame implementation, based on LinkML.

The main data flow is currently as follows:

  1. The documents are added to the corpus, either uploaded directly or as URLs
  2. URLs are downloaded
  3. Documents are broken into paragraphs
  4. Language embeddings are calculated for each paragraph
  5. Operators input some initial seed claims
  6. Operators look for semantically related paragraphs in the corpus using the embeddings
  7. Operators send the most promising paragraphs to AI systems, which identify the claims those paragraphs contain
  8. Those claims are vetted, and the cycle can be repeated
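
Steps 3, 4 and 6 of this flow can be illustrated with a minimal, self-contained sketch. This is not ClaimMiner's code: the function names (`split_paragraphs`, `embed`, `related_paragraphs`) are hypothetical, and a simple bag-of-words vector stands in for the real language embeddings, which would come from an embedding model.

```python
import math
import re
from collections import Counter


def split_paragraphs(text: str) -> list[str]:
    """Step 3: break a document into paragraphs on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def embed(text: str) -> Counter:
    """Step 4 stand-in (assumption): word counts instead of a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related_paragraphs(seed_claim: str, paragraphs: list[str], top_k: int = 3):
    """Step 6: rank paragraphs by similarity to a seed claim."""
    seed_vec = embed(seed_claim)
    scored = [(cosine(seed_vec, embed(p)), p) for p in paragraphs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    document = (
        "Solar power costs fell sharply over the last decade.\n\n"
        "The city council met on Tuesday.\n\n"
        "Cheaper solar power panels drive adoption."
    )
    paragraphs = split_paragraphs(document)
    for score, paragraph in related_paragraphs("solar power is getting cheaper", paragraphs):
        print(f"{score:.2f}  {paragraph}")
```

In a real deployment the embeddings would be computed once per paragraph and stored, so that each search only needs to embed the seed claim.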

There are other ancillary functions:

  1. ClaimMiner can use GDELT to perform a semantic search for news items
  2. ClaimMiner can identify clusters in the claims, and draw a cloud of claims
  3. ClaimMiner can perform text search on paragraphs or claims
  4. ClaimMiner can perform a broadening semantic search (MMR) on paragraphs or claims
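
The broadening search in item 4 refers to MMR (Maximal Marginal Relevance), which picks results that are relevant to the query but not redundant with results already chosen. The sketch below shows the general technique only, not ClaimMiner's implementation; the function name `mmr`, the `lambda_mult` parameter name, and the toy vectors are assumptions made for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, candidate_vecs, k, lambda_mult=0.5):
    """Return indices of up to k candidates chosen by Maximal Marginal Relevance.

    lambda_mult=1.0 ranks purely by relevance to the query;
    lower values penalize similarity to already selected candidates.
    """
    selected: list[int] = []
    remaining = list(range(len(candidate_vecs)))
    while remaining and len(selected) < k:

        def score(i: int) -> float:
            relevance = cosine(query_vec, candidate_vecs[i])
            # Redundancy is the highest similarity to anything already picked.
            redundancy = max(
                (cosine(candidate_vecs[i], candidate_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    query = [1.0, 0.0]
    # Hypothetical embeddings: the first two are near-duplicates.
    candidates = [[1.0, 0.1], [1.0, 0.12], [0.6, 0.8]]
    print(mmr(query, candidates, k=2, lambda_mult=1.0))
    print(mmr(query, candidates, k=2, lambda_mult=0.3))
```

With a low `lambda_mult`, the second pick skips the near-duplicate in favor of the more distinct candidate, which is what makes the search "broadening".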