A project for extracting research methods from academic papers and analyzing how they evolve over time.
This project uses uv for Python environment management:
# Install uv if not already installed
pip install uv
# Clone the repository
git clone https://github.com/tianranchunzhen/methods-evolution.git
cd methods-evolution
# Create the same environment
uv sync

This project requires Python 3.10+ and currently uses the following key packages:
- httpx: HTTP client for API requests
- loguru: Logging utility
- marker-pdf: PDF to Markdown converter
- polars: Fast dataframe library
- pyahocorasick: Efficient pattern matching
- spaCy: NLP toolkit (uses the en_core_web_lg-3.8.0 model)
- toml: Configuration file parser
- torch & transformers: Deep learning frameworks (specified in pyproject.toml)
- Prepare method dictionary: Parse NCRM research method typology into structured TOML format
- Collect sample papers: Fetch papers from academic sources with metadata
- Process papers: Convert PDFs to Markdown format using Marker
- Extract methods: Match method terms in papers using Aho-Corasick algorithm
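
The extraction step can be illustrated with a small, dependency-free sketch of Aho-Corasick matching. This is not the project's code: the project relies on pyahocorasick, and the function names (`build_automaton`, `find_methods`) and the sample terms below are hypothetical, chosen only to show how many dictionary terms can be matched in a single pass over a paper's text.

```python
from collections import deque


def build_automaton(terms):
    """Build a trie with failure links over lowercased method terms."""
    goto = [{}]  # goto[state][char] -> next state
    fail = [0]   # fail[state] -> state of the longest proper suffix in the trie
    out = [[]]   # out[state] -> terms that end at this state
    for term in terms:
        state = 0
        for ch in term.lower():
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(term)

    # Breadth-first pass: shallower states are finished before deeper ones.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit terms that end at the suffix state (e.g. "analysis"
            # inside "content analysis").
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_methods(text, automaton):
    """Return (start_index, term) for every dictionary term found in text."""
    goto, fail, out = automaton
    state = 0
    hits = []
    # Assumes lowercasing does not change string length (true for ASCII text).
    for i, ch in enumerate(text.lower()):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for term in out[state]:
            hits.append((i - len(term) + 1, term))
    return hits


# Hypothetical sample terms; the real dictionary comes from the NCRM typology.
terms = ["case study", "content analysis", "survey", "analysis"]
automaton = build_automaton(terms)
hits = find_methods("We combine a survey with content analysis.", automaton)
print(hits)
# [(13, 'survey'), (25, 'content analysis'), (33, 'analysis')]
```

Note that this kind of matching works on raw substrings, so a term such as "survey" would also match inside "surveyed", and overlapping terms are all reported. Whether to add word-boundary checks or keep only the longest match is a design choice this sketch leaves open.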
TODO:
- Try AutoPhrase and see the results
- Handle the synonyms
- Simple analysis of the results
- Try extracting from the title + abstract or from the method section, and compare the results
- Data/: Paper data, texts, and method dictionaries
- Docs/: Reference documents
- Scripts/: Processing and analysis scripts
- Results/: Output of method extraction
- Models/: Machine learning models (currently the en_core_web_lg-3.8.0 model)