inacio-luma-patient-recommender

v0.9.0

This library recommends a list of patients most likely to pick up a call from the hospital, based on the hospital's location and other optional parameters.

Introduction

This is my submission to the Luma Health backend interview assignment. It is:

  • a simple library you can import and use to create a list of top-priority patients for a hospital call;
  • a RESTful API that, given a hospital's coordinates (latitude, longitude), returns a waitlist of the patients most likely to pick up a call from the hospital.

Table of contents

  1. Quickstart
  2. Endpoints
  3. Implementation

Quickstart

There are three ways to play with this submission:

  1. I've deployed the REST API on AWS as an ECS instance; feel free to access it at http://luma.inacio.codes/docs
  2. Or you can install and run the API locally with:
     npm i
     npm run dev
  3. Finally, you can also install and import the library with:
     git clone https://github.com/inacioMattos/luma-interview.git
     cd luma-interview
     npm i
     npm run dev

Endpoints

GET /docs

The Swagger endpoint.

GET /health

A health check route. It is useful for:

  1. Monitoring and Availability: Health checks allow monitoring systems to verify that an API is up and running correctly. This helps in ensuring high availability of the service, as it allows for quick detection and response to issues.
  2. Load Balancing: In environments with multiple instances of a service, load balancers use health checks to decide which instances are capable of handling requests. Instances that fail health checks can be automatically removed from the pool, ensuring that traffic is only directed to healthy instances.
  3. Auto-scaling and Orchestration: In cloud environments, health checks are crucial for auto-scaling and orchestration. Systems like Kubernetes and other orchestration tools rely on health check endpoints to manage the lifecycle of containers and services, such as scaling up or down and performing rolling updates without downtime.

GET /patients/recommend

This endpoint recommends a list of patients most likely to pick up a call from the hospital, based on the hospital's location and other optional parameters.

Query Parameters:

  • lat (required): Hospital's latitude

    • Type: number
    • Range: -90 to 90
    • Description: The latitude coordinate of the hospital's location
  • long (required): Hospital's longitude

    • Type: number
    • Range: -180 to 180
    • Description: The longitude coordinate of the hospital's location
  • limit (optional):

    • Type: number
    • Default: 10
    • Minimum: 1
    • Description: Number of patients to recommend
  • include_details (optional):

    • Type: boolean
    • Default: false
    • Description: If set to true, the response will include all patient data plus individual scores for each historical feature (ageScore, replyTimeScore, etc.). This is useful for debugging purposes. All scores are weighted by their respective weights.
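The parameter rules above can be sketched as a small validation function. This is a minimal illustration, not the project's actual code: `parseRecommendQuery`, `RecommendQuery`, and the error messages are hypothetical names, and requiring `limit` to be a whole number is an assumption (the rules above only state a minimum of 1).

```typescript
// Hypothetical shape for the validated query; field names are illustrative.
interface RecommendQuery {
  lat: number;
  long: number;
  limit: number;
  includeDetails: boolean;
}

type RawQuery = Record<string, string | undefined>;

// Parse a required numeric query value, rejecting missing or non-numeric input.
function parseNumber(value: string | undefined, name: string): number {
  if (value === undefined || value.trim() === "") {
    throw new Error(name + " is required");
  }
  const parsed = Number(value);
  if (!isFinite(parsed)) {
    throw new Error(name + " must be a number");
  }
  return parsed;
}

function parseRecommendQuery(raw: RawQuery): RecommendQuery {
  const lat = parseNumber(raw.lat, "lat");
  if (lat < -90 || lat > 90) {
    throw new Error("lat must be between -90 and 90");
  }
  const long = parseNumber(raw.long, "long");
  if (long < -180 || long > 180) {
    throw new Error("long must be between -180 and 180");
  }
  // Documented default: 10. Whole-number check is an assumption.
  const limit = raw.limit === undefined ? 10 : parseNumber(raw.limit, "limit");
  if (limit < 1 || Math.floor(limit) !== limit) {
    throw new Error("limit must be a whole number >= 1");
  }
  // Documented default: false. Only the literal string "true" enables details.
  return { lat, long, limit, includeDetails: raw.include_details === "true" };
}
```

Calling `parseRecommendQuery({ lat: "37.7749", long: "-122.4194" })` would fall back to `limit = 10` and `includeDetails = false`.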

Response:

The endpoint returns a list of recommended patients based on their likelihood to pick up a call from the hospital. The exact structure of the response depends on the include_details parameter.

Success Response (200 OK):
  • When include_details is false (default):
    [
      {
        "id": "string",
        "name": "string",
        "score": 9.67
      }
    ]
  • When include_details is true:
    [
      {
        "id": "string",
        "name": "string",
        "age": 67,
        "acceptedOffers": 22,
        "canceledOffers": 7,
        "averageReplyTime": 1230,
        "location": {
          "latitude": 87.0444,
          "longitude": 155.0585
        },
        "ageScore": 0.7,
        "replyTimeScore": 1.95,
        "offersScore": 4.4,
        "locationScore": 0.3,
        "score": 7.35
      }
    ]
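For TypeScript clients, the two response shapes above could be modeled roughly as follows. The interface and function names are hypothetical; only the field names come from the sample responses. In the detailed sample, the four feature scores add up to the total (0.7 + 1.95 + 4.4 + 0.3 = 7.35); treating the total as that sum is an observation from the sample, not a documented guarantee.

```typescript
// Shape returned when include_details is false.
interface RecommendedPatient {
  id: string;
  name: string;
  score: number;
}

// Shape returned when include_details is true.
interface DetailedPatient extends RecommendedPatient {
  age: number;
  acceptedOffers: number;
  canceledOffers: number;
  averageReplyTime: number;
  location: { latitude: number; longitude: number };
  ageScore: number;
  replyTimeScore: number;
  offersScore: number;
  locationScore: number;
}

// Type guard: detailed entries carry the per-feature scores.
function isDetailed(p: RecommendedPatient | DetailedPatient): p is DetailedPatient {
  return "ageScore" in p;
}

// Sum of the weighted feature scores (matches `score` in the sample above).
function featureScoreSum(p: DetailedPatient): number {
  return p.ageScore + p.replyTimeScore + p.offersScore + p.locationScore;
}
```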

Example usage

GET /patients/recommend?lat=37.7749&long=-122.4194&limit=5&include_details=true

This request would return a list of 5 recommended patients for a hospital located in San Francisco, including detailed scores for each patient.
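A request URL like the one above can be assembled with a small helper. This is a sketch: `buildRecommendUrl` and `RecommendOptions` are hypothetical names, and the base URL is assumed to be the deployed host mentioned in the Quickstart with the route mounted at the root.

```typescript
interface RecommendOptions {
  limit?: number;
  includeDetails?: boolean;
}

// Build the query string for GET /patients/recommend from typed arguments.
function buildRecommendUrl(
  baseUrl: string,
  lat: number,
  long: number,
  options: RecommendOptions = {}
): string {
  const params: string[] = [
    "lat=" + encodeURIComponent(String(lat)),
    "long=" + encodeURIComponent(String(long)),
  ];
  if (options.limit !== undefined) {
    params.push("limit=" + encodeURIComponent(String(options.limit)));
  }
  if (options.includeDetails !== undefined) {
    params.push("include_details=" + String(options.includeDetails));
  }
  // Strip trailing slashes so the path is not doubled up.
  return baseUrl.replace(/\/+$/, "") + "/patients/recommend?" + params.join("&");
}

const exampleUrl = buildRecommendUrl("http://luma.inacio.codes", 37.7749, -122.4194, {
  limit: 5,
  includeDetails: true,
});
```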

Time complexity

This endpoint runs in $O(\log n)$ time on average, where $n$ is the total number of patients (explained in depth below).

Implementation

The implementation is divided into two major components:

  • Scoring: How to score each individual patient;
  • Traversing: Given a hospital's coordinates (latitude, longitude) and the patients' scores, how to search the patients to find the top 10.

Since we can score all patients beforehand, traversing becomes the important part, because it is what defines our app's performance (we need to traverse for each new hospital search).

So, let's first dive into how traversing is implemented.

Traversing

An O(log n) implementation

To optimize our algorithm, I decided to use a data structure known as a K-d tree (k-dimensional tree). Its main selling point is efficient nearest-neighbor search, in $O(\log n)$ time on average.

A K-d tree is a binary tree that partitions points in k dimensions, splitting on one dimension at each level. It is not self-balancing: it is balanced when it is built by splitting on the median at each level, which works here because the tree is constructed once during preprocessing.

By leveraging this data structure, we can significantly improve our algorithm's speed. Here's our algorithm's outline:

  1. Preprocessing: Construct a K-d tree from each patient's location and precomputed scores:
    • latitude;
    • longitude;
    • precomputed age score;
    • precomputed offer acceptance rate score;
    • precomputed time-to-reply score.
  2. Query: When a hospital request comes in, use the K-d tree to efficiently find the nearest neighbors (potential patients) based on location.
  3. Result: Return the k nearest neighbors.
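The outline above can be illustrated with a compact K-d tree. This is a minimal sketch, not the library's implementation: it works on plain numeric arrays of any dimension, uses squared Euclidean distance (an assumption, since the distance metric used inside the tree is not stated here), and all names are hypothetical.

```typescript
type Point = number[];

interface KdNode {
  point: Point;
  axis: number;
  left: KdNode | null;
  right: KdNode | null;
}

// Preprocessing: build a balanced tree by splitting on the median along a cycling axis.
function buildKdTree(points: Point[], depth: number = 0): KdNode | null {
  if (points.length === 0) return null;
  const axis = depth % points[0].length;
  const sorted = points.slice().sort((a, b) => a[axis] - b[axis]);
  const mid = Math.floor(sorted.length / 2);
  return {
    point: sorted[mid],
    axis,
    left: buildKdTree(sorted.slice(0, mid), depth + 1),
    right: buildKdTree(sorted.slice(mid + 1), depth + 1),
  };
}

function squaredDistance(a: Point, b: Point): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
  return sum;
}

// Query: return the k points closest to `target`, nearest first.
function kNearest(root: KdNode | null, target: Point, k: number): Point[] {
  if (k <= 0) return [];
  const best: { point: Point; dist: number }[] = []; // sorted by dist, length <= k

  function visit(node: KdNode | null): void {
    if (node === null) return;
    const dist = squaredDistance(node.point, target);
    if (best.length < k || dist < best[best.length - 1].dist) {
      best.push({ point: node.point, dist });
      best.sort((a, b) => a.dist - b.dist);
      if (best.length > k) best.pop();
    }
    const diff = target[node.axis] - node.point[node.axis];
    const near = diff < 0 ? node.left : node.right;
    const far = diff < 0 ? node.right : node.left;
    visit(near);
    // Cross the splitting plane only if it could hide a closer point.
    if (best.length < k || diff * diff < best[best.length - 1].dist) visit(far);
  }

  visit(root);
  return best.map((entry) => entry.point);
}
```

The pruning test in `visit` is what lets a query skip most of the tree on average. Re-sorting `best` on every insert keeps the sketch short; a bounded heap would be the usual choice for large `k`.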

The standard approach to traversing, which computes a distance for every patient on each request, has a time complexity of $O(n)$, while my approach has a time complexity of $O(\log n)$ on average.

What happens as the total number of patients grows? With the standard approach, the processing time increases linearly with the number of patients. This becomes problematic, especially considering that it calculates the computationally expensive haversine distance for every patient. The K-d tree is the more efficient solution, because a query only needs to visit a small part of the tree.
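For comparison, the linear baseline discussed above might look like the following sketch. The haversine formula is standard; the `Located` shape, the function names, and the use of kilometers with a mean Earth radius of 6371 km are assumptions made for illustration.

```typescript
interface Located {
  id: string;
  latitude: number;
  longitude: number;
}

const EARTH_RADIUS_KM = 6371; // mean Earth radius

function toRadians(degrees: number): number {
  return (degrees * Math.PI) / 180;
}

// Great-circle distance between two (latitude, longitude) pairs, in kilometers.
function haversineKm(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const sinLat = Math.sin(toRadians(lat2 - lat1) / 2);
  const sinLon = Math.sin(toRadians(lon2 - lon1) / 2);
  const a =
    sinLat * sinLat +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * sinLon * sinLon;
  // Clamp to 1 to guard against floating-point overshoot before asin.
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1, Math.sqrt(a)));
}

// Linear baseline: one haversine evaluation per patient, then a sort.
function nearestByScan(lat: number, long: number, patients: Located[], limit: number): Located[] {
  return patients
    .map((patient) => ({
      patient,
      km: haversineKm(lat, long, patient.latitude, patient.longitude),
    }))
    .sort((a, b) => a.km - b.km)
    .slice(0, limit)
    .map((entry) => entry.patient);
}
```

Every request touches all `n` patients here, and each touch costs several trigonometric calls, which is the cost the tree-based search avoids.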

Scoring

Now let's turn our attention to scoring, i.e. how to score an individual patient.

Given this patient:

{
  "name": "Mr. Carmella VonRueden",
  "age": 43,
  "acceptedOffers": 98,
  "canceledOffers": 9,
  "averageReplyTime": 3170,
  "location": {
    "latitude": 87.0444,
    "longitude": 155.0585
  }
}

There are three features we can score statically (i.e. before knowing the hospital coordinates):

  1. age: I assumed the higher the age the better the score;
  2. averageReplyTime: I assumed the lower the averageReplyTime the better the score;
  3. offers: This one is a bit tricky — I'll go in depth below.

To score one of the static features we simply:

  1. Standardize it (also known as z-score):
    • $z = \frac{(x - \mu)}{\sigma}$
    • Where $x$ is the value to be standardized, and $\mu$ and $\sigma$ are the mean and standard deviation of the $X$ set respectively.
  2. Normalize it:
    • $\text{normalized} = \frac{(z - Z_{\min})}{(Z_{\max} - Z_{\min})}$
    • Where $z$ is the value to be normalized, and $Z_{\min}$ and $Z_{\max}$ are the minimum and maximum values in the $Z$ set respectively.
  3. Apply its weight:
    • $\text{weighted} = \text{normalized} \times W_x$
    • Where $\text{normalized}$ is the normalized value to be weighted and $W_x$ is the weight of the respective feature (age, average reply time, etc.).
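The three steps above can be sketched as one function over all values of a feature. This is an illustration under stated assumptions: the function names are hypothetical, the population standard deviation is used, the weight value is up to the caller, and flipping the normalized value for "lower is better" features such as averageReplyTime is one possible way to orient the score.

```typescript
function mean(values: number[]): number {
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

// Population standard deviation (assumption; a sample estimate would divide by n - 1).
function standardDeviation(values: number[], mu: number): number {
  const variance =
    values.reduce((sum, value) => sum + (value - mu) * (value - mu), 0) / values.length;
  return Math.sqrt(variance);
}

// Score every value of one feature: standardize, normalize to [0, 1], apply the weight.
function scoreFeature(values: number[], weight: number, lowerIsBetter: boolean = false): number[] {
  const mu = mean(values);
  const sigma = standardDeviation(values, mu);
  // Step 1: z-scores (all zeros when the feature has no spread).
  const z = values.map((x) => (sigma === 0 ? 0 : (x - mu) / sigma));
  // Step 2: min-max normalization of the z-scores.
  const zMin = Math.min(...z);
  const zMax = Math.max(...z);
  const range = zMax - zMin;
  return z.map((value) => {
    const normalized = range === 0 ? 0 : (value - zMin) / range;
    // Assumption: invert features where a lower raw value should score higher.
    const oriented = lowerIsBetter ? 1 - normalized : normalized;
    // Step 3: apply the feature weight.
    return oriented * weight;
  });
}
```

For example, `scoreFeature([20, 40, 60], 2)` yields scores of 0, 1 and 2. One property of this sketch worth knowing: a z-score is a linear rescaling, so min-max normalizing the z-scores gives the same values as min-max normalizing the raw feature. How outliers end up being handled therefore depends on how $Z_{\min}$ and $Z_{\max}$ are chosen, which is not specified here.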

The standardization step is important because it is intended to reduce the influence of outliers in the patient set.

Probably the most straightforward way to score those features would be to simply normalize them, putting them in the range of [0, 1] and then multiplying by their weights.

This is not great. Why?

Because normalization is vulnerable to outliers — if an outlier is extremely high, the range of the data becomes unusually large, making the normalized values of other data points inordinately small and tightly clustered.

Normalization adjusts the data based on the minimum and maximum values. If outliers are present, they significantly skew these minima and maxima, thus distorting the normalized values.
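A small numeric illustration of this effect, using made-up reply times (the values and names below are hypothetical):

```typescript
// Plain min-max normalization into [0, 1].
function minMaxNormalize(values: number[]): number[] {
  const min = Math.min(...values);
  const max = Math.max(...values);
  const range = max - min;
  return values.map((value) => (range === 0 ? 0 : (value - min) / range));
}

// Hypothetical reply times in seconds; the last value is an outlier.
const sampleReplyTimes = [100, 200, 300, 400, 100000];
const normalizedReplyTimes = minMaxNormalize(sampleReplyTimes);
// The four typical values all land below 0.004, while the outlier maps to 1.
```

The single outlier stretches the range to 99,900, so the difference between a 100-second and a 400-second reply time shrinks to about 0.003 on the normalized scale.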

Many real-world quantities are roughly normally distributed (a consequence of the central limit theorem when many independent factors add up), so some values far from the mean, i.e. outliers, will almost certainly be present.

We can fix this using standardization!

Standardization (or z-scores)

Standardization — also known as z-scores — mitigates the impact of outliers more effectively because it is based on the mean and standard deviation. It ensures that:

  • The unit of measurement for variances and covariances is consistent across variables, which is particularly important in models that weigh inputs equally (like many machine learning algorithms).
  • It maintains the relative distances between and within data points, preserving outliers in a way that does not disproportionately influence the overall data structure as much as normalization might.

Thus, to compute the score for the age and averageReplyTime features, one simply follows the three steps above. Computing a score from acceptedOffers and canceledOffers, however, requires one extra step:

Computing offers

I've decided to use an empirical Bayes estimator since it makes a ton of sense here. Here's its formulation:

$\text{Offer Score} = \frac{C \times m + \text{Accepted Offers}}{C + \text{Total Offers}}$

Where $C$ is the average total number of offers per patient and $m$ is the median offer acceptance rate.
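A minimal sketch of this estimator, assuming that "Total Offers" means acceptedOffers plus canceledOffers and that patients with no offers are skipped when computing the median rate. The function and type names are hypothetical.

```typescript
interface OfferHistory {
  acceptedOffers: number;
  canceledOffers: number;
}

function median(values: number[]): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Estimate the prior from the whole patient set:
// C = average total offers, m = median acceptance rate.
function estimatePriors(patients: OfferHistory[]): { C: number; m: number } {
  const totals = patients.map((p) => p.acceptedOffers + p.canceledOffers);
  const C = totals.reduce((sum, total) => sum + total, 0) / patients.length;
  const rates = patients
    .filter((p) => p.acceptedOffers + p.canceledOffers > 0) // rate is undefined with no offers
    .map((p) => p.acceptedOffers / (p.acceptedOffers + p.canceledOffers));
  return { C, m: median(rates) };
}

// Offer Score = (C * m + accepted) / (C + total)
function offerScore(patient: OfferHistory, C: number, m: number): number {
  const total = patient.acceptedOffers + patient.canceledOffers;
  return (C * m + patient.acceptedOffers) / (C + total);
}
```

A patient with no history gets exactly the prior rate $m$, and the score moves toward the patient's own acceptance rate as their number of offers grows relative to $C$.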

Why did I choose this approach?

Bayes versus additive scoring

Additive scoring means adding a 'point' for each accepted offer and subtracting a 'point' for each canceled offer. This simplifies to $\text{Offer Score} = \text{acceptedOffers} - \text{canceledOffers}$.

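This can be problematic in the following scenario: patient1 has { acceptedOffers: 100, canceledOffers: 80 } and patient2 has { acceptedOffers: 18, canceledOffers: 0 }

  • patient1 additive score: $100 - 80 = 20$
  • patient2 additive score: $18 - 0 = 18$

Additive scoring ranks patient1 above patient2, even though patient1 canceled 80 of 180 offers (about 44%) while patient2 never canceled.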

Bayes versus simple offer acceptancy rate

Suppose patient1 has { acceptedOffers: 1, canceledOffers: 0 } and patient2 has { acceptedOffers: 92, canceledOffers: 2 }. A naive acceptance ratio gives patient1 $1/1 = 100\%$ and patient2 $92/94 \approx 97.9\%$, so it would rate patient1 as better, which isn't ideal given that patient1 has only a single data point.

Using an empirical Bayes estimator solves both issues.
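As a worked example, take illustrative prior values $C = 50$ and $m = 0.8$ (assumed numbers, not computed from the real patient set) and apply the offer score formula to the four patients from the two scenarios, written as (acceptedOffers, canceledOffers):

```latex
\begin{aligned}
\text{Additive scenario, patient1 } (100, 80):&\quad \frac{50 \times 0.8 + 100}{50 + 180} = \frac{140}{230} \approx 0.609 \\
\text{Additive scenario, patient2 } (18, 0):&\quad \frac{50 \times 0.8 + 18}{50 + 18} = \frac{58}{68} \approx 0.853 \\
\text{Ratio scenario, patient1 } (1, 0):&\quad \frac{50 \times 0.8 + 1}{50 + 1} = \frac{41}{51} \approx 0.804 \\
\text{Ratio scenario, patient2 } (92, 2):&\quad \frac{50 \times 0.8 + 92}{50 + 94} = \frac{132}{144} \approx 0.917
\end{aligned}
```

Under these assumed priors, the patient who never canceled outranks the one who canceled 80 times, and the patient with 92 accepted offers outranks the one with a single accepted offer, which is the ordering both naive methods got wrong.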


Running the Application

npm run dev

This will simply run the server in development mode at port 3000.

npm run dev:debug

This will simply run the server in debug mode (additional logs) at port 3000.

npm run build

This will transpile the TypeScript code into JavaScript.

npm run start

This will start the server in production mode — only available after building with npm run build

npm run build:docker

This will build a Docker image for this service. Useful for deploying to the cloud.

npm run start:docker

This will start the server in production mode using the image previously built with npm run build:docker

Assumptions

techs

Stress test

npm run stress-test

  • /api/patients/recommend: 31k req/s
  • /health: 43k req/s

Docker