npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

@andrewhead/python-program-analysis

v0.4.8

Published

A Typescript library for parsing Python 3 and doing basic program analysis, like forming control-flow graphs and def-use chains.

Downloads

110

Readme

python-program-analysis

A Typescript library for parsing Python 3 and doing basic program analysis, like forming control-flow graphs and def-use chains.

Parsing

To parse Python 3 code, pass a string containing the code to the parse method.

const code = [
    'x, y = 0, 0',
    'while x < 10:',
    '   y += x * 2',
    '   x += 1',
    'print(y)'
];
const tree = parse(code.join('\n')); 

This method returns a tree of SyntaxNode objects, discriminated with a type field. The library also provides a function walk for pre-order tree traversal. With no arguments, it returns a list of the syntax nodes in the tree.

walk(tree).map(node => node.type)
// produces ["module", "assign", "literal", "literal", "name", "name", "while", "binop", …]

Optionally, walk takes a visitor object with methods onEnterNode (for pre-order traversal) and onExitNode (for post-order traversal).

Syntax nodes can be turned back into code with the printNode function, which produces a string. There is no guarantee of round-tripping. That is printNode(parse(code)) could be syntactically different than code, but will be semantically the same. For example, there may be extra parentheses around expressions, when compared with the original code. The printNode function is primarily for debugging.

Control flow

A control flow graph organizes a parse tree into a graph where the nodes are "basic blocks" (sequences of statements that run together) and the edges reflect the order of block execution.

const cfg = new ControlFlowGraph(tree);

cfg.blocks is an array of the blocks in the control flow graph, with cfg.entry pointing to the entry block and cfg.exit pointing to the exit block. The control flow graph for the parse tree above looks like this.

control flow graph

Each block has a list of its statements.

printNode(cfg.blocks[0].statements[0])

prints x, y = 0, 0.

The methods cfg.getSuccessors and cfg.getPredecessors allow the edges to be followed forward or backward.

const cond = cfg.getSuccessors(cfg.entry)[0];
printNode(cond)

prints x < 10.

Data flow

The library also provides basic def-use program analysis, namely, tracking where the values assigned to variables are read. For example, the 0 assigned to x in the entry block is read in the conditional x < 10, in the assignments y = x * 2 and x += 1.

const analyzer = new DataflowAnalyzer();
const flows = analyzer.analyze(cfg).flows;
for (let flow of flows.items) 
    console.log(printNode(flow.fromNode) + 
        " -----> " + printNode(flow.toNode))

prints

x, y = 0, 0 -----> x < 10
x, y = 0, 0 -----> print(y)
x, y = 0, 0 -----> y = x * 2
x += 1 -----> x < 10
y = x * 2 -----> print(y)
x += 1 -----> y = x * 2

Program Slicing

Program slicing removes lines from a program that are unnecessary to see the effect of a chosen line of code. For example, if we only care about the print statement in this program:

sum = 0
diff_sum = 0
for i in range(min(len(A), len(B))):
    sum += A[i] + B[i]
    diff_sum += A[i] - B[i]
print(sum)

then we can simplify the code to this:

sum = 0
for i in range(min(len(A), len(B))):
    sum += A[i] + B[i]
print(sum)

The function call slice(P,loc) takes a program P (a parse tree) and a program location loc and returns the program locations that are necessary for loc. For example, to do the slicing example above, we call slice(ast, {first_line: 6, first_column: 0, last_line, 6: last_column: 10}) which returns a LocationSet whose members have first_line values of 1, 3, 4, and 6 (but not 2 or 5).

API Specs

When deciding whether an API call needs to appear in a program slice, the slicing algorithm needs to know whether the call has a side-effect on the variables that are passed to it (including the self parameter for method calls). The call f(x) has a side-effect on x if f updates a field (x.m = y), updates an element (x[i] = y), updates a global variable, or transitively calls another function that has a side effect. Rather than analyzing the code of a called function (which may not even be available), we rely on having specifications, recorded in JSON files. Here is the specification pandas.json (with some lines omitted):

{
  "pandas": {
    "functions": [
      "array",
      "bdate_range",
      ...
      { "name": "read_clipboard", "returns": "DataFrame" },
      { "name": "read_csv", "returns": "DataFrame" },
      { "name": "read_excel", "returns": "DataFrame" },
      ...
    ],
    "types": {
      "DataFrame": {
        "methods": [
          "abs",
          "add",
          "add_prefix",
          ...
          { "name": "pop", "updates": [0] },
          ...
        ]
      }
    }
  }
}

A module's spec provides a list of the module's functions, types, and submodules. A type spec provides a list of the type's methods. In the function/method list, if a function/method appears just as a name (for example, "array" or "abs"), then it has no side-effects and doesn't return any objects with specifications. Otherwise, the function/method appears as a dictionary with its name in a name field and any of the following:

  • updates is an array of strings that lists the parameters that experience side-effects. 0 refers the self parameter, a number k >= 1 refers to the k th parameter, and any non-numeric string is the name of an updated global variable.
  • reads is an array of strings with the global variables that the method reads. (If a slice includes a call to a function that reads a global, then it must also include any calls that update that global.)
  • returns is the type of object the call returns, which is only necessary if that type has a spec.

The specs allow slice to analyze code like the following:

import pandas as pd
d = pd.read_csv("some_path")
d.pop("Column")
d.memory_usage()
d.count()            // ← slice on this line

Looking at the spec above, we can see that read_csv return a DataFrame object, so d is a DataFrame. The call to pop on d has a side-effect on the self parameter (d), because DataFrame's pop spec has an updates of [0]. Therefore, this call to pop must appear in the slice. On the other hand, the call to memory_usage has no side-effects, so it can be left out. So, the final slice includes lines 1, 2, 3, and 5, but not 4.

If there is no spec for an API call, then the data-flow and slicing algorithms conservatively assume that any passed parameter could experience a side-effect.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.