git-repo-parser
v2.0.7
A tool to scrape all files from a GitHub repository and turn it into a JSON file
A powerful tool to scrape all files from a GitHub repository and convert them into JSON or plain text format.
Installation
Install the package globally using npm:
npm install -g git-repo-parser
Or add it to your project as a dependency:
npm install git-repo-parser
Usage
Command Line Interface (CLI)
This package provides two CLI commands:
- git-repo-to-json: Scrapes a GitHub repository and saves the result as a JSON file.
- git-repo-to-text: Scrapes a GitHub repository and saves the result as a plain text file.
Example usage:
git-repo-to-json https://github.com/username/repo-name.git
git-repo-to-text https://github.com/username/repo-name.git
The scraped data is saved as files.json (JSON output) or files.txt (plain text output) in your current directory.
Programmatic Usage
You can also use the package in your Node.js projects:
import { scrapeRepositoryToJson, scrapeRepositoryToPlainText } from 'git-repo-parser';
// To get JSON output
const jsonResult = await scrapeRepositoryToJson('https://github.com/username/repo-name.git');
// To get plain text output
const textResult = await scrapeRepositoryToPlainText('https://github.com/username/repo-name.git');
API
scrapeRepositoryToJson(repoUrl: string): Promise<FileData[]>
Scrapes the given GitHub repository and returns a promise that resolves to an array of FileData objects.
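Because directories nest their contents under `children`, code that consumes the result of scrapeRepositoryToJson often needs a flat list of files. The following is a minimal sketch, assuming the resolved value matches the FileData interface documented in this README. `flattenFiles` and `FlatFile` are hypothetical helper names and are not part of the package.

```typescript
// Mirrors the FileData interface documented in this README.
interface FileData {
  name: string;
  path: string;
  type: 'file' | 'directory';
  children?: FileData[];
  content?: string;
}

// Hypothetical helper type: one entry per file, with no nesting.
interface FlatFile {
  path: string;
  content: string;
}

// Depth-first walk that collects every file node and descends into directories.
// A file node without a `content` field is given an empty string.
function flattenFiles(nodes: FileData[]): FlatFile[] {
  const out: FlatFile[] = [];
  for (const node of nodes) {
    if (node.type === 'file') {
      out.push({ path: node.path, content: node.content ?? '' });
    } else if (node.children) {
      out.push(...flattenFiles(node.children));
    }
  }
  return out;
}
```

For example, `flattenFiles(await scrapeRepositoryToJson(url))` would yield one `{ path, content }` entry per scraped file, in depth-first order.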
scrapeRepositoryToPlainText(repoUrl: string): Promise<string>
Scrapes the given GitHub repository and returns a promise that resolves to a string containing the repository contents in a structured plain text format.
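This README does not specify the exact plain-text layout. As an illustration only, the sketch below renders a FileData tree as one section per file, each headed by a `=== path ===` line. That header format is an assumption and is not necessarily what scrapeRepositoryToPlainText produces. `renderPlainText` is a hypothetical name.

```typescript
// Mirrors the FileData interface documented in this README.
interface FileData {
  name: string;
  path: string;
  type: 'file' | 'directory';
  children?: FileData[];
  content?: string;
}

// Hypothetical renderer: emits "=== <path> ===" followed by the file body,
// with a blank line between files. Directories contribute only their children.
function renderPlainText(nodes: FileData[]): string {
  const sections: string[] = [];
  const visit = (list: FileData[]): void => {
    for (const node of list) {
      if (node.type === 'file') {
        sections.push(`=== ${node.path} ===\n${node.content ?? ''}`);
      } else {
        visit(node.children ?? []);
      }
    }
  };
  visit(nodes);
  return sections.join('\n\n');
}
```

A flat, path-headed layout like this keeps every file's location visible without needing the nested structure of the JSON output.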
FileData Interface
The FileData interface represents the structure of files and directories in the JSON output:
interface FileData {
name: string;
path: string;
type: 'file' | 'directory';
children?: FileData[];
content?: string;
}
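For illustration, a repository that contains a README and one source file under `src/` might be represented as shown below. The file names, contents, and repository-relative paths are made up, and the path style is an assumption. In this example, `children` is set only on the directory and `content` only on the files. The hypothetical `countNodes` helper tallies both kinds of node recursively.

```typescript
// Mirrors the FileData interface documented in this README.
interface FileData {
  name: string;
  path: string;
  type: 'file' | 'directory';
  children?: FileData[];
  content?: string;
}

// Made-up example: a repository with a README and one file under src/.
const example: FileData[] = [
  { name: 'README.md', path: 'README.md', type: 'file', content: '# demo\n' },
  {
    name: 'src',
    path: 'src',
    type: 'directory',
    children: [
      { name: 'index.ts', path: 'src/index.ts', type: 'file', content: 'export const x = 1;\n' },
    ],
  },
];

// Recursively counts file and directory nodes in a FileData tree.
function countNodes(nodes: FileData[]): { files: number; directories: number } {
  let files = 0;
  let directories = 0;
  for (const node of nodes) {
    if (node.type === 'file') {
      files += 1;
    } else {
      directories += 1;
      const sub = countNodes(node.children ?? []);
      files += sub.files;
      directories += sub.directories;
    }
  }
  return { files, directories };
}
```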
Features
- Clones the repository into a temporary local directory
- Ignores binary files and common non-source files
- Supports nested directory structures
- Provides both JSON and plain text output formats
- Cleans up cloned repository after scraping
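The walk step of the clone, scrape, and clean-up flow described above can be approximated with Node's built-in fs module. The sketch below reads a local directory (for example, a fresh clone) into FileData nodes and skips `.git`. It is not the package's implementation. `readTree` is a hypothetical name. The sketch reads every file as UTF-8 and applies none of the other ignore rules, and it leaves cloning and clean-up out of scope.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Mirrors the FileData interface documented in this README.
interface FileData {
  name: string;
  path: string;
  type: 'file' | 'directory';
  children?: FileData[];
  content?: string;
}

// Recursively reads `dir` into FileData nodes. `rel` is the path relative to
// the starting directory, joined with forward slashes. Entries are sorted by
// name so that the output order is deterministic.
function readTree(dir: string, rel = ''): FileData[] {
  const nodes: FileData[] = [];
  const entries = fs
    .readdirSync(dir, { withFileTypes: true })
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  for (const entry of entries) {
    if (entry.name === '.git') continue; // never descend into git metadata
    const full = path.join(dir, entry.name);
    const relPath = rel ? `${rel}/${entry.name}` : entry.name;
    if (entry.isDirectory()) {
      nodes.push({ name: entry.name, path: relPath, type: 'directory', children: readTree(full, relPath) });
    } else if (entry.isFile()) {
      nodes.push({ name: entry.name, path: relPath, type: 'file', content: fs.readFileSync(full, 'utf8') });
    }
  }
  return nodes;
}
```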
Ignored Files
The following file types and patterns are ignored during scraping:
- package-lock.json
- Binary files (pdf, png, jpg, jpeg, gif, ico, svg, woff, woff2, eot, ttf, otf)
- Media files (mp4, avi, webm, mov, mp3, wav, flac, ogg, webp)
- Debug and error logs (npm-debug, yarn-debug, yarn-error)
- Configuration files (tsconfig, jest.config)
- The .git directory
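The ignore rules listed above can be expressed as a single predicate over a repository-relative path. This sketch makes two assumptions: the binary and media entries match by file extension, and the log and config entries match by file-name prefix. The package's actual matching logic may differ. `isIgnored` is a hypothetical name.

```typescript
// Extensions taken from the binary and media lists in this README.
const IGNORED_EXTENSIONS = new Set([
  'pdf', 'png', 'jpg', 'jpeg', 'gif', 'ico', 'svg', 'woff', 'woff2', 'eot', 'ttf', 'otf',
  'mp4', 'avi', 'webm', 'mov', 'mp3', 'wav', 'flac', 'ogg', 'webp',
]);

// Assumed prefix matching, e.g. "tsconfig" also covers "tsconfig.build.json".
const IGNORED_NAME_PREFIXES = ['npm-debug', 'yarn-debug', 'yarn-error', 'tsconfig', 'jest.config'];

// Returns true if a repository-relative path (forward slashes) should be skipped.
function isIgnored(relPath: string): boolean {
  const segments = relPath.split('/');
  if (segments.includes('.git')) return true; // anything inside the .git directory
  const name = segments[segments.length - 1];
  if (name === 'package-lock.json') return true;
  if (IGNORED_NAME_PREFIXES.some((prefix) => name.startsWith(prefix))) return true;
  const dot = name.lastIndexOf('.');
  const ext = dot >= 0 ? name.slice(dot + 1).toLowerCase() : '';
  return IGNORED_EXTENSIONS.has(ext);
}
```

Extensions are compared in lowercase, so `logo.PNG` is skipped as well. Dotfiles such as `.gitignore` are not treated as part of the `.git` directory.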
License
This project is licensed under the MIT License.
Author
arnab2001
Contributing
Contributions, issues, and feature requests are welcome. Feel free to check the issues page if you want to contribute.
Show your support
Give a ⭐️ if this project helped you!