@racsodev/cv-pdf-to-json

v2.2.1

Node.js library that extracts CV/resume data from PDF files and converts it to structured JSON using Claude's native PDF support.

racsodev

cv resume extract claude ai pdf json typescript

CV PDF to JSON

Extract CV data from PDF files using Claude AI's native PDF support. This library provides a pipeline that converts PDF resumes into structured JSON data.

Features

Direct PDF processing using Claude AI's native PDF support
Structured JSON output of CV data
Support for processing a single PDF file or a directory of PDF files
Support for saving the structured JSON output to a configurable directory
Debug mode for detailed processing insights

Installation

npm install @racsodev/cv-pdf-to-json

Basic Usage

import { createPdfExtractor } from '@racsodev/cv-pdf-to-json'

// Initialize the PDF extractor
const extractor = createPdfExtractor({
  anthropicApiKey: process.env.ANTHROPIC_API_KEY || '',
  outputJsonPath: './outputs/json',
})

// Process a single file
const result = await extractor.process('path/to/cv.pdf')
console.log('Processing Result:', result)

// Process a directory
const results = await extractor.processDirectory('path/to/cvs')
console.log('Processing Results:', results)
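When a whole directory is processed, it can help to separate the CVs that were extracted from those that were not. The sketch below assumes that each result carries the `success` and `data` fields used elsewhere in this README and that `processDirectory` resolves to an array of such results; neither is confirmed here, and `ProcessResult` and `summarizeResults` are hypothetical names, not part of the library.

```typescript
// Assumed result shape, inferred from the `success` and `data` fields used
// elsewhere in this README; the type exported by the library may differ.
interface ProcessResult<T> {
  success: boolean
  data?: T
}

// Split a batch of results into extracted payloads and a failure count.
// A result counts as failed when `success` is false or `data` is missing.
function summarizeResults<T>(results: ProcessResult<T>[]): { extracted: T[]; failed: number } {
  const extracted: T[] = []
  let failed = 0
  for (const result of results) {
    if (result.success && result.data !== undefined) {
      extracted.push(result.data)
    } else {
      failed += 1
    }
  }
  return { extracted, failed }
}

// Stand-in data instead of real API calls
const batchSummary = summarizeResults<{ firstName: string }>([
  { success: true, data: { firstName: 'Ada' } },
  { success: false },
])
console.log(`${batchSummary.extracted.length} extracted, ${batchSummary.failed} failed`)
```

If the library's results turn out to have this shape, the array returned by `processDirectory` could be passed to `summarizeResults` directly.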

Advanced Usage

For more control over the extraction process, you can use individual components:

import {
  DocumentProcessor,
  ClaudeProcessor,
  type CvData,
  type Experience,
  type Education,
  type Language,
  ContractType,
  LanguageLevel,
} from '@racsodev/cv-pdf-to-json'

// Initialize Claude AI processor with native PDF support
const processor = new ClaudeProcessor({
  apiKey: process.env.ANTHROPIC_API_KEY || '',
})

// Create document processor
const documentProcessor = new DocumentProcessor({
  processor,
  outputJsonPath: './outputs/json',
  debug: true,
})

// Process CV
async function processCV(pdfPath: string) {
  const result = await documentProcessor.process(pdfPath)

  if (result.success && result.data) {
    const cvData: CvData = result.data
    console.log('Extracted CV Data:', cvData)
  }

  return result
}

// Use the processor
const result = await processCV('path/to/cv.pdf')
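With individual components, batching is left to the caller. One common approach is to cap the number of requests in flight so that a large batch does not hit API rate limits. The helper below is a generic sketch of that idea, not something the library provides; `mapWithLimit` is a hypothetical name, and the worker shown is a stand-in for a function such as `processCV`.

```typescript
// Run `worker` over every item with at most `limit` calls in flight.
// Results keep the order of the input items.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length)
  let next = 0

  // Each runner claims the next unprocessed index until none are left.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++
      results[index] = await worker(items[index])
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length))
  await Promise.all(Array.from({ length: runnerCount }, () => runner()))
  return results
}

// Stand-in worker; in practice this would call the document processor
mapWithLimit(['a.pdf', 'b.pdf', 'c.pdf'], 2, async (pdfPath) => pdfPath.toUpperCase())
  .then((processed) => console.log(processed))
```

Passing `processCV` as the worker, for example `mapWithLimit(pdfPaths, 2, processCV)`, would process two files at a time. If any call rejects, the returned promise rejects as well, so a worker that should tolerate failures needs its own `try`/`catch`.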

Output Format

The processor returns data in the following format:

interface CvData {
  lastName: string
  firstName: string
  address: string
  email: string
  phone: string
  linkedin: string
  github: string
  personalWebsite: string
  professionalSummary: string
  school: string
  schoolLowerCase: string
  promotionYear: number
  professionalExperiences: Experience[]
  otherExperiences: Experience[]
  educations: Education[]
  hardSkills: string[]
  softSkills: string[]
  languages: Language[]
  publications: string[]
  distinctions: string[]
  hobbies: string[]
  references: string[]
}

interface Experience {
  companyName?: string
  title?: string
  location: string
  type: ContractType
  startDate: number
  endDate: number
  duration: number // in months
  ongoing: boolean
  description: string
  associatedSkills: string[]
}

interface Education {
  degree: string
  institution: string
  location: string
  startDate: number
  endDate: number
  duration: number // in months
  ongoing: boolean
  description: string
  associatedSkills: string[]
}

interface Language {
  language: string
  level: LanguageLevel
}

enum LanguageLevel {
  BASIC_KNOWLEDGE = 'BASIC_KNOWLEDGE',
  LIMITED_PROFESSIONAL = 'LIMITED_PROFESSIONAL',
  PROFESSIONAL = 'PROFESSIONAL',
  FULL_PROFESSIONAL = 'FULL_PROFESSIONAL',
  NATIVE_BILINGUAL = 'NATIVE_BILINGUAL',
}

enum ContractType {
  PERMANENT_CONTRACT = 'PERMANENT_CONTRACT',
  SELF_EMPLOYED = 'SELF_EMPLOYED',
  FREELANCE = 'FREELANCE',
  FIXED_TERM_CONTRACT = 'FIXED_TERM_CONTRACT',
  INTERNSHIP = 'INTERNSHIP',
  APPRENTICESHIP = 'APPRENTICESHIP',
  PERFORMING_ARTS_INTERMITTENT = 'PERFORMING_ARTS_INTERMITTENT',
  PART_TIME_PERMANENT = 'PART_TIME_PERMANENT',
  CIVIC_SERVICE = 'CIVIC_SERVICE',
  PART_TIME_FIXED_TERM = 'PART_TIME_FIXED_TERM',
  SUPPORTED_EMPLOYMENT = 'SUPPORTED_EMPLOYMENT',
  CIVIL_SERVANT = 'CIVIL_SERVANT',
  TEMPORARY_WORKER = 'TEMPORARY_WORKER',
  ASSOCIATIVE = 'ASSOCIATIVE',
}
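Because the JSON is produced by a language model and may later be read back from disk, it can be worth checking values at runtime before relying on the types above. The sketch below mirrors the `LanguageLevel` values as a local constant so that it runs without the library, and adds a small summary over `Experience` entries. `isLanguageLevel`, `summarizeExperiences`, and the trimmed-down input interface are hypothetical helpers, not library exports.

```typescript
// Local copy of the LanguageLevel values listed above, kept as a const array
// so the sketch does not need to import the library.
const LANGUAGE_LEVELS = [
  'BASIC_KNOWLEDGE',
  'LIMITED_PROFESSIONAL',
  'PROFESSIONAL',
  'FULL_PROFESSIONAL',
  'NATIVE_BILINGUAL',
] as const
type LanguageLevelValue = (typeof LANGUAGE_LEVELS)[number]

// Runtime check for a value read from untyped JSON.
function isLanguageLevel(value: unknown): value is LanguageLevelValue {
  return typeof value === 'string' && (LANGUAGE_LEVELS as readonly string[]).includes(value)
}

// Only the Experience fields this sketch needs.
interface ExperienceSummaryInput {
  title?: string
  duration: number // in months
  ongoing: boolean
}

// Sum the durations and collect the titles of ongoing positions.
function summarizeExperiences(experiences: ExperienceSummaryInput[]): {
  totalMonths: number
  current: string[]
} {
  const totalMonths = experiences.reduce((sum, experience) => sum + experience.duration, 0)
  const current = experiences
    .filter((experience) => experience.ongoing)
    .map((experience) => experience.title ?? '(untitled)')
  return { totalMonths, current }
}

const experienceSummary = summarizeExperiences([
  { title: 'Backend Developer', duration: 18, ongoing: false },
  { title: 'Tech Lead', duration: 7, ongoing: true },
])
console.log(experienceSummary)
console.log(isLanguageLevel('PROFESSIONAL'), isLanguageLevel('FLUENT'))
```

The total is a plain sum, so overlapping positions are counted twice. The same guard pattern would apply to `ContractType`.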

Development Setup

Install dependencies:

npm install

Set up environment variables:

# Copy the example env file
cp .env.example .env

# Then open .env and set your Anthropic API key on this line:
ANTHROPIC_API_KEY=your_api_key_here

Process documents:

npm run process <file-path>

This processes the specified PDF file, or every PDF file in the specified directory, and writes the JSON output to the configured directory.
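Once a run has finished, the generated files can be loaded back for inspection. The sketch below reads every `.json` file in a directory and parses it. It assumes one JSON document per file; this README does not specify how the output files are named, so `loadJsonOutputs` (a hypothetical helper) keys the results by whatever file names it finds. The demo uses a throwaway directory in place of `./outputs/json`.

```typescript
import { mkdtempSync, readdirSync, readFileSync, writeFileSync } from 'node:fs'
import { tmpdir } from 'node:os'
import { join } from 'node:path'

// Read and parse every .json file in `dir`, keyed by file name.
// Files with other extensions are skipped.
function loadJsonOutputs(dir: string): Map<string, unknown> {
  const outputs = new Map<string, unknown>()
  for (const name of readdirSync(dir).sort()) {
    if (!name.toLowerCase().endsWith('.json')) continue
    outputs.set(name, JSON.parse(readFileSync(join(dir, name), 'utf8')))
  }
  return outputs
}

// Demo against a throwaway directory standing in for ./outputs/json
const demoDir = mkdtempSync(join(tmpdir(), 'cv-json-'))
writeFileSync(
  join(demoDir, 'sample-cv.json'),
  JSON.stringify({ firstName: 'Ada', hardSkills: ['TypeScript'] }),
)
writeFileSync(join(demoDir, 'notes.txt'), 'not a JSON output')

const loadedOutputs = loadJsonOutputs(demoDir)
console.log(Array.from(loadedOutputs.keys()))
```

`JSON.parse` throws on malformed input, so a partially written file would surface as an exception instead of being skipped silently.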

Project Structure

src/ - Source code directory
- processors/ - Document processing pipeline
  - Processor.ts - Base processor class
  - DocumentProcessor.ts - Main document processing logic
  - ClaudeProcessor.ts - Claude AI integration with native PDF support
- types/ - TypeScript type definitions
- utils/ - Utility functions for data processing and file handling

Requirements

Node.js >= 18.4.2
Anthropic API key for Claude AI integration
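Both requirements can be checked at startup, so that a missing key fails early instead of being passed along as an empty string. The sketch below is one way to do that; `meetsMinimumVersion` and `checkRequirements` are hypothetical helpers, and the version comparison only handles plain dotted numbers such as `20.11.0`.

```typescript
// Compare dotted numeric versions such as '18.4.2'.
// Returns true when `current` is greater than or equal to `minimum`.
function meetsMinimumVersion(current: string, minimum: string): boolean {
  const currentParts = current.split('.').map(Number)
  const minimumParts = minimum.split('.').map(Number)
  const length = Math.max(currentParts.length, minimumParts.length)
  for (let i = 0; i < length; i++) {
    const diff = (currentParts[i] ?? 0) - (minimumParts[i] ?? 0)
    if (diff !== 0) return diff > 0
  }
  return true
}

// Collect a human-readable message for every unmet requirement.
function checkRequirements(
  env: Record<string, string | undefined>,
  nodeVersion: string,
): string[] {
  const problems: string[] = []
  if (!meetsMinimumVersion(nodeVersion, '18.4.2')) {
    problems.push(`Node.js ${nodeVersion} is older than the required 18.4.2`)
  }
  if (!env['ANTHROPIC_API_KEY']?.trim()) {
    problems.push('ANTHROPIC_API_KEY is not set')
  }
  return problems
}

// Stand-in values; a real check would pass process.env and process.versions.node
console.log(checkRequirements({ ANTHROPIC_API_KEY: 'your_api_key_here' }, '20.11.0'))
console.log(checkRequirements({}, '18.0.0'))
```

An empty array means both requirements are met.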

License

Apache-2.0
