TF-IDF
Term Frequency-Inverse Document Frequency. A method for quantifying word importance within documents.
TF-IDF (Term Frequency-Inverse Document Frequency) is a method for quantifying how important a specific word is within a document collection. It is calculated as the product of TF (term frequency) and IDF (inverse document frequency).
TF measures how frequently a word appears in a given document, while IDF measures how rare the word is across the entire document collection. Common words that appear in many documents have low IDF, while words that appear only in specific documents have high IDF. Detailed calculation methods and variants are covered in information retrieval and NLP textbooks.
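The product of these two quantities can be computed directly from tokenized documents. The sketch below is a minimal, unsmoothed variant (TF as relative frequency, IDF as log of total documents over document frequency); practical libraries such as scikit-learn apply additional smoothing and normalization.

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute TF-IDF scores for each tokenized document in docs.

    docs: list of documents, each a list of word tokens.
    Returns a list of dicts mapping each word to its TF-IDF score.
    """
    n = len(docs)
    # Document frequency: in how many documents does each word appear?
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            # TF = count / document length; IDF = log(N / df)
            word: (count / total) * math.log(n / df[word])
            for word, count in counts.items()
        })
    return scores

docs = [
    ["the", "cat", "sat"],
    ["the", "cat", "ran"],
    ["the", "dog", "ran"],
]
scores = tfidf(docs)
```

In this example, "the" appears in every document, so its IDF is log(3/3) = 0 and its TF-IDF score is zero everywhere, while a word like "sat" that appears in only one document receives the highest score in that document.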
TF-IDF serves as the foundation for many NLP tasks including search engine ranking, document classification, keyword extraction, and document summarization.
In the context of character counting, TF-IDF is based on word occurrence frequency, so a text's word count (and, indirectly, its character count) directly affects the computed scores. Machine learning and text analysis books provide additional context.