@tricoteuses/assemblee
v1.9.13
Retrieve, clean up & handle French Assemblée nationale's open data
Tricoteuses-Assemblee
Requirements
- Node >= 18
Installation
git clone https://git.en-root.org/tricoteuses/tricoteuses-assemblee
cd tricoteuses-assemblee/
npm install
Copy the .env.example file, rename it to .env, and set your environment variables. OpenAI is only used for parsing law texts.
Download and clean data
Basic usage
Create a folder where the data will be downloaded and run the following command to download, reorganize and clean the data.
mkdir ../assemblee-data/
# Download and clean open data
npm run data:download ../assemblee-data
Data from other sources is also available:
# Retrieval of députés' pictures from Assemblée nationale's website
npm run data:retrieve_deputes_photos ../assemblee-data
# Retrieval of sénateurs' pictures from Sénat's website
npm run data:retrieve_senateurs_photos ../assemblee-data
# Retrieve and parse law texts. Warning: this uses the OpenAI API and will incur costs.
npm run data:retrieve_textes_lois ../assemblee-data
# Retrieval of pending amendments from Assemblée nationale's website (waiting to be processed by Assemblée services)
npm run data:retrieve_pending_amendements ../assemblee-data
Notes:
- Reorganized files (generated by the data:reorganize_data command) are also available in Tricoteuses / Data / Données brutes de l'Assemblée. They are updated on a regular basis.
- Split & cleaned files (generated by the data:clean_data command) are also available in Tricoteuses / Data / Données nettoyées de l'Assemblée with the _nettoye suffix. They are updated on a regular basis.
Filtering options
Downloading and cleaning all the data takes a long time and a lot of disk space. To reduce the load, you can choose which types of data to retrieve.
To download only one type of dataset, use the --categories option (shortcut -k):
# Available options: ActeursEtOrganes, Agendas, Amendements, DossiersLegislatifs, Photos, Scrutins, Questions, ComptesRendusSeances
npm run data:download ../assemblee-data -- --categories Amendements
To download only a specific legislature, use the --legislature option (shortcut -l):
# Available options: 14, 15, 16, 17
npm run data:download ../assemblee-data -- --legislature 17
If you use these options, pass the same ones to all subsequent commands too (data:reorganize_data and data:clean_data).
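Because the same filters have to be repeated on every command, one option is to assemble the argument list once and reuse it. The sketch below is a hypothetical helper (buildNpmArgs and the Filters type are not part of this package). It only builds the command lines shown above, and it assumes that both options can be combined and that data:reorganize_data and data:clean_data take the data directory the same way data:download does.

```typescript
// Hypothetical helper, not part of @tricoteuses/assemblee.
interface Filters {
  categories?: string; // e.g. "Amendements"
  legislature?: number; // e.g. 17
}

function buildNpmArgs(script: string, dataDir: string, filters: Filters = {}): string[] {
  const args: string[] = ["run", script, dataDir];
  const options: string[] = [];
  if (filters.categories !== undefined) {
    options.push("--categories", filters.categories);
  }
  if (filters.legislature !== undefined) {
    options.push("--legislature", String(filters.legislature));
  }
  // npm only forwards options to the script when they follow a "--" separator.
  if (options.length > 0) {
    args.push("--", ...options);
  }
  return args;
}

// Reuse the same filters for every step (assumed to accept the same arguments).
const sharedFilters: Filters = { categories: "Amendements", legislature: 17 };
const npmCommands: string[] = ["data:download", "data:reorganize_data", "data:clean_data"].map(
  (script) => ["npm", ...buildNpmArgs(script, "../assemblee-data", sharedFilters)].join(" "),
);
console.log(npmCommands.join("\n"));
```

The helper only returns strings, so the commands can be reviewed before being pasted into a shell or passed to a process runner.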
Download using Docker
A Docker image that downloads and cleans the data all at once is available. Build it locally or pull it from the container registry:
docker pull registry.en-root.org/tricoteuses/tricoteuses-assemblee:latest
Create a volume to download the data and use the environment variables LEGISLATURE and CATEGORIES if needed:
docker volume create assemblee-data
docker run --name tricoteuses-assemblee -v assemblee-data:/app/assemblee -e LEGISLATURE=17 -d registry.en-root.org/tricoteuses/tricoteuses-assemblee:latest
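If the container is launched from a script, the optional variables can be added only when they are set. This is a minimal sketch with a hypothetical helper (buildDockerRunArgs is not provided by the image or the package). It assumes CATEGORIES accepts the same values as the --categories option, which this README does not state explicitly.

```typescript
// Hypothetical helper: builds the argument list for `docker run`.
const ASSEMBLEE_IMAGE = "registry.en-root.org/tricoteuses/tricoteuses-assemblee:latest";

interface ContainerEnv {
  LEGISLATURE?: string; // e.g. "17"
  CATEGORIES?: string; // assumed to match the --categories values, e.g. "Amendements"
}

function buildDockerRunArgs(env: ContainerEnv = {}): string[] {
  const args: string[] = [
    "run",
    "--name", "tricoteuses-assemblee",
    "-v", "assemblee-data:/app/assemblee",
  ];
  // Only pass the variables that were actually provided.
  if (env.LEGISLATURE !== undefined) {
    args.push("-e", `LEGISLATURE=${env.LEGISLATURE}`);
  }
  if (env.CATEGORIES !== undefined) {
    args.push("-e", `CATEGORIES=${env.CATEGORIES}`);
  }
  args.push("-d", ASSEMBLEE_IMAGE);
  return args;
}

console.log(["docker", ...buildDockerRunArgs({ LEGISLATURE: "17" })].join(" "));
```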
Using the data
Once the data is downloaded and cleaned, you can use loaders to retrieve it. To use loaders in your project, you can install the @tricoteuses/assemblee package, and import the iterator functions that you need.
npm install @tricoteuses/assemblee
import {
  iterLoadAssembleeActeurs,
  iterLoadAssembleeOrganes,
  iterLoadAssembleeReunions,
  iterLoadAssembleeScrutins,
  iterLoadAssembleeDocuments,
  iterLoadAssembleeDossiersParlementaires,
  iterLoadAssembleeAmendements,
  iterLoadAssembleeQuestions,
  iterLoadAssembleeComptesRendus,
} from "@tricoteuses/assemblee/lib/loaders";

// Pass data directory and legislature as arguments
for (const { acteur } of iterLoadAssembleeActeurs("../assemblee-data", 17)) {
  console.log(acteur.uid);
}
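Since the loaders are iterators, a common next step is to collect their items into a lookup structure. The sketch below indexes items by uid, the only field shown in this README. indexByUid, ActeurLike and fakeActeurs are hypothetical names: the generator stands in for iterLoadAssembleeActeurs, with made-up uids, so the example runs without any downloaded data.

```typescript
// Minimal shape assumed here: this README only shows the `uid` field.
interface ActeurLike {
  uid: string;
}

// Generic over the item type so the loader's full type is preserved.
function indexByUid<T extends ActeurLike>(items: Iterable<{ acteur: T }>): Map<string, T> {
  const byUid = new Map<string, T>();
  for (const { acteur } of items) {
    byUid.set(acteur.uid, acteur);
  }
  return byUid;
}

// Stand-in for the real loader, yielding items of the same { acteur } shape.
function* fakeActeurs(): Generator<{ acteur: ActeurLike }> {
  yield { acteur: { uid: "PA0001" } }; // made-up uid
  yield { acteur: { uid: "PA0002" } }; // made-up uid
}

const acteursByUid = indexByUid(fakeActeurs());
console.log(acteursByUid.size); // 2
```

With the package installed and the data downloaded, the same function should work with indexByUid(iterLoadAssembleeActeurs("../assemblee-data", 17)).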
Generating schemas and documentation (for contributors only)
View instructions here