dps-extractor

v1.0.0

Builds a docker container for executing data extraction tasks

DPS-EXTRACTOR

A containerized application that executes extraction tasks. This repository focuses on collecting third-party data and storing it in S3.

Requirements

  • Node.js 18.5 https://nodejs.org/en/blog/release/v18.5.0/
  • npm 8.12.1 https://www.npmjs.com/ (ships with Node.js, but must be updated to 8.12.1)
  • TypeScript >= 4.0.5 https://www.typescriptlang.org/
  • Docker https://www.docker.com/

Internal Dependencies

  • dps-utilities-typescript (https://github.com/cafemedia/dps-utilities-typescript)

Setup

  1. Clone this repository: git@github.com:cafemedia/dps-extractor.git
  2. Enter the directory: cd dps-extractor
  3. Install dependencies: npm install

Infrastructure

To run Terraform, use the provided wrapper script. Run ./terraform.sh -h for more information.

# -e environment
# -p aws credentials profile name
# -v pass terraform variables, can be invoked multiple times
./terraform.sh \
  -e development \
  -p aws-profile \
  .tf

Building

This project is designed to be imported as a library, so it must first be compiled to JavaScript.
NOTE: the memory requirements for building are increasing...
npm run build

Linting

This project is configured to use tslint to keep our code styling in line.
npm run lint

Formatting

Please be sure to format your code before committing!
npm run format

Testing

A full test suite has been integrated into the project using:

  • mocha - test framework - (https://mochajs.org/)
  • chai - assertion library - (https://www.chaijs.com/)
  • sinon - mocking and faking support - (https://sinonjs.org/)
  • nyc - coverage reporting - (https://github.com/istanbuljs/nyc)

npm run test:unit:coverage

Git Commit Hooks

In order to ensure that we aren't pushing messy code that likely won't pass linting or test phases in Drone, we use husky (https://github.com/typicode/husky) which will automatically build, lint and test our code when we attempt to commit.

Working with Private GitHub Packages

This project depends on dps-utilities-typescript, which is installed via npm but requires authentication with GitHub Packages.

Building Locally with Docker

This project is configured to automatically build and deploy an image to ECR on the Adthrive AWS Account with a repository of the same name. In order to test that builds work locally:

docker build -t dps-extractor --build-arg GITHUB_TOKEN=<YOUR GITHUB PAT> .
docker run -t dps-extractor Hello

jowens@JOWENS-MAC dps-extractor % docker run -t dps-extractor Hello                                                                
2021-02-11T18:05:22.912Z - info: [Hello] Starting - 512c0d70-a72d-4cdc-8013-4673527dd0b9 - {}
2021-02-11T18:05:22.923Z - info: [Hello] Done duration=3ms
jowens@JOWENS-MAC dps-extractor %

TODO: This could use some optimization.

Running in Airflow

Example Airflow task to be incorporated into a DAG:

# Assumes Airflow 1.10.x (hence the contrib import path) and an existing `dag` object.
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

hello_extractor = KubernetesPodOperator(
    namespace="dps",
    image="312505582686.dkr.ecr.us-east-1.amazonaws.com/dps-extractor:<IMAGE TAG>",
    arguments=[
        "Hello",  # name of the extraction task to run
        # -s and -e both receive the date pulled from the get_state task's XCom
        "-s", "{{ task_instance.xcom_pull(task_ids='get_state', key='return_value').date }}",
        "-e", "{{ task_instance.xcom_pull(task_ids='get_state', key='return_value').date }}",
        "-x",  # required because do_xcom_push is True below
    ],
    name="hello-extractor",
    task_id="hello-extractor",
    get_logs=True,
    dag=dag,
    is_delete_operator_pod=True,
    in_cluster=True,
    log_events_on_failure=True,
    run_as_user="airflow",
    annotations={"datadog-service": "sample-k8s-dag", "datadog-source": "airflow"},
    do_xcom_push=True,
)

Note that if do_xcom_push is set to True, we must also pass the -x argument to the container.
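
One way to keep the -x flag and do_xcom_push from drifting apart is to derive the argument list from a single boolean. The following is a minimal sketch, not part of this project: the helper name build_extractor_arguments and its parameters are hypothetical, and the reading of -s and -e as start and end is inferred from the example log output.

```python
from typing import List


def build_extractor_arguments(
    task: str, start: str, end: str, do_xcom_push: bool
) -> List[str]:
    """Build the container arguments, adding -x only when XCom push is enabled.

    Hypothetical helper: only the flags -s, -e and -x come from the README.
    """
    arguments = [task, "-s", start, "-e", end]
    if do_xcom_push:
        # The container must be told to write its XCom result.
        arguments.append("-x")
    return arguments


# Use the same boolean for the operator's do_xcom_push and for the arguments.
DO_XCOM_PUSH = True
arguments = build_extractor_arguments("Hello", "2021-02-10", "2021-02-10", DO_XCOM_PUSH)
print(arguments)
```

The resulting list can be passed as arguments to KubernetesPodOperator, with DO_XCOM_PUSH passed as do_xcom_push.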

Example Log Output:

[2021-02-11 18:14:17,749] {taskinstance.py:901} INFO - Executing <Task(KubernetesPodOperator): hello-extractor> on 2021-02-11T18:13:52.710304+00:00
[2021-02-11 18:14:17,750] {base_task_runner.py:131} INFO - Running on host: gamearningshelloextractor-f82ef6ed61744772b86229379fd9ba1b
[2021-02-11 18:14:17,750] {base_task_runner.py:132} INFO - Running: ['airflow', 'run', 'gam_earnings', 'hello-extractor', '2021-02-11T18:13:52.710304+00:00', '--job_id', '188', '--pool', 'default_pool', '--raw', '-sd', 'DAGS_FOLDER/dags/gam_earnings.py', '--cfg_path', '/tmp/tmp77e3vqz_']
[2021-02-11 18:14:19,072] {base_task_runner.py:111} INFO - Job 188: Subtask hello-extractor [2021-02-11 18:14:19,072] {__init__.py:50} INFO - Using executor LocalExecutor
[2021-02-11 18:14:19,072] {base_task_runner.py:111} INFO - Job 188: Subtask hello-extractor [2021-02-11 18:14:19,072] {dagbag.py:417} INFO - Filling up the DagBag from /opt/airflow/dags/dags/gam_earnings.py
[2021-02-11 18:14:19,396] {base_task_runner.py:111} INFO - Job 188: Subtask hello-extractor Running <TaskInstance: gam_earnings.hello-extractor 2021-02-11T18:13:52.710304+00:00 [running]> on host gamearningshelloextractor-f82ef6ed61744772b86229379fd9ba1b
[2021-02-11 18:14:19,548] {logging_mixin.py:112} WARNING - /home/airflow/.local/lib/python3.8/site-packages/airflow/kubernetes/pod_launcher.py:309: DeprecationWarning: Using `airflow.contrib.kubernetes.pod.Pod` is deprecated. Please use `k8s.V1Pod`.
  dummy_pod = Pod(
[2021-02-11 18:14:19,548] {logging_mixin.py:112} WARNING - /home/airflow/.local/lib/python3.8/site-packages/airflow/kubernetes/pod_launcher.py:77: DeprecationWarning: Using `airflow.contrib.kubernetes.pod.Pod` is deprecated. Please use `k8s.V1Pod` instead.
  pod = self._mutate_pod_backcompat(pod)
[2021-02-11 18:14:19,606] {pod_launcher.py:171} INFO - Event: hello-extractor-0d3841b79ce44594bd421def1f168461 had an event of type Pending
[2021-02-11 18:14:19,606] {pod_launcher.py:139} WARNING - Pod not yet started: hello-extractor-0d3841b79ce44594bd421def1f168461
[2021-02-11 18:14:20,614] {pod_launcher.py:171} INFO - Event: hello-extractor-0d3841b79ce44594bd421def1f168461 had an event of type Pending
[2021-02-11 18:14:20,614] {pod_launcher.py:139} WARNING - Pod not yet started: hello-extractor-0d3841b79ce44594bd421def1f168461
[2021-02-11 18:14:21,625] {pod_launcher.py:171} INFO - Event: hello-extractor-0d3841b79ce44594bd421def1f168461 had an event of type Running
[2021-02-11 18:14:21,660] {pod_launcher.py:156} INFO - b'2021-02-11T18:14:20.820Z - \x1b[32minfo\x1b[39m: [Hello] Starting - 9e4e5a0a-44c0-444a-9940-80f2966f4366 - {"start":"2021-02-10T00:00:00.000+00:00","end":"2021-02-10T00:00:00.000+00:00","writeXcom":true}\n'
[2021-02-11 18:14:21,660] {pod_launcher.py:156} INFO - b'2021-02-11T18:14:20.822Z - \x1b[32minfo\x1b[39m: [Hello] Done duration=1ms\n'
[2021-02-11 18:14:21,718] {pod_launcher.py:267} INFO - Running command... cat /airflow/xcom/return.json

[2021-02-11 18:14:21,761] {pod_launcher.py:267} INFO - Running command... kill -s SIGINT 1

foo
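
In the Airflow output above, the container's own log lines arrive as byte strings with ANSI color codes around the log level. If you need to inspect them programmatically, a sketch like the following may help. It assumes the line layout seen in the sample output (timestamp, level, message, optional trailing JSON); the function name parse_extractor_line is hypothetical and not part of this project.

```python
import json
import re
from typing import Any, Dict

# Matches ANSI color sequences such as \x1b[32m and \x1b[39m.
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")


def parse_extractor_line(raw: bytes) -> Dict[str, Any]:
    """Split one container log line into timestamp, level, message and payload.

    Assumes the layout "<timestamp> - <level>: <message>[ - <json>]" observed
    in the sample logs; other layouts are not handled.
    """
    text = ANSI_ESCAPE.sub("", raw.decode("utf-8")).strip()
    timestamp, _, rest = text.partition(" - ")
    level, _, message = rest.partition(": ")

    payload = None
    head, sep, tail = message.rpartition(" - ")
    if sep and tail.startswith("{"):
        try:
            payload = json.loads(tail)
            message = head
        except json.JSONDecodeError:
            # Not valid JSON: leave the message untouched.
            payload = None

    return {
        "timestamp": timestamp,
        "level": level,
        "message": message,
        "payload": payload,
    }


sample = b"2021-02-11T18:14:20.822Z - \x1b[32minfo\x1b[39m: [Hello] Done duration=1ms\n"
print(parse_extractor_line(sample))
```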