

CollaboFlow


'CollaboFlow' is a node.js environment for coordinated execution of flows (or workflows) whose tasks can be implemented in node.js, Python, ... It is part of the [CollaboScience initiative]{@link http://www.collaboscience.com/}.

Introduction

(Figure: Collabo-Flow class diagram)

Flow

'Flow' is a structured set of tasks that, taken together, serve a particular purpose. The purpose could be to parse a set of data in a particular way, to analyse the corpus of an author (@see [Bukvik]{@link http://bukvik.litterra.info}), to crawl Wikipedia and analyse community behavior (@see [Democracy-Framework - Wikipedia]{@link http://cha-os.github.io/democracy-framework-wikipedia/}), etc.

A flow can be executed fully or partially. A user can run the entire flow, a particular task, or a particular task with its dependencies, i.e. the task together with all the other tasks it depends upon. For example, a task calculating some data may depend on the task that loads that data in the first place (its 'dependency').

@see:

  • the [Flow implementation]{@link collabo-flow.Flow}
  • the [FlowStructure]{@link collabo-flow.Flow.FlowStructure}
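
For example, an illustrative sketch of full versus partial execution (the '--task' and '--with-dependencies' flags are hypothetical; the actual invocation depends on the executor script used, @see the 'Creating a flow' section below):

node df-wiki/exec.js micro-analysis.json
node df-wiki/exec.js micro-analysis.json --task <NAMESPACE_TASK>.execute:titles-to-pageids --with-dependencies

The first call runs the whole flow; the second would run only the named task, after first running everything it depends upon.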

Task

'Task' describes a clearly isolated unit of work. A task explains both 'what' has to be done and 'how' it has to be done: 'how' the task is executed is described by referring to a specific 'processor' that will perform the work, and 'what' the task works with is described by the set of 'datasets' that should be used for performing the intended work.

@see:

  • the [Task implementation]{@link collabo-flow.Task}
  • the [TaskStructure]{@link collabo-flow.Task.TaskStructure}

Processor

'Processor' is the 'code-implementation' of a particular clearly defined type of work, called up by a task. A processor usually takes the form of an internal or external module in the system. It can be encoded in different programming languages (JavaScript (nodejs), Python, ...). Each task relates 1-1 with the processor that is performing it.

Users can:

  1. use preexisting processors from free or commercial bundles (packages),
  2. develop new processors themselves, or hire someone to develop them, or
  3. 'wrap' existing complex tools by creating light wrapper-processors that make those tools understandable to CollaboFlow (a sketch follows the @see list below).

@see:

  • the [Processor implementation]{@link collabo-flow.Processor}
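
A minimal sketch of such a wrapper-processor, assuming a hypothetical process(task, callback) interface and a placeholder external tool; the real processor interface is defined by the [Processor implementation]{@link collabo-flow.Processor}:

// hypothetical wrapper-processor: delegates the actual work to an
// external command-line tool ('external-tool' is a placeholder name)
var execFile = require('child_process').execFile;

function ExternalToolWrapper() {}

// assumed entry point called by the flow executor
ExternalToolWrapper.prototype.process = function (task, callback) {
	// assumed: input/output file paths are passed through the task's params
	execFile('external-tool', [task.params.inputFile, task.params.outputFile],
		function (err) {
			callback(err);
		});
};

module.exports = ExternalToolWrapper;

The wrapper only translates between the task's params/datasets and the tool's command-line arguments; the heavy lifting stays in the external tool.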

Dataset

A 'dataset' is a reference to any type (and cardinality) of data organized under a single logical/conceptual unit and identified with a unique namespace(:name). A dataset has various metadata associated with it and can be versioned, but overall it is a set of data that the task can uniquely access, load, update, or store.

@see:

  • the [Dataset implementation]{@link collabo-flow.Dataset}
  • the [DatasetStructure]{@link collabo-flow.Dataset.DatasetStructure}
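
For illustration, this is how a dataset reference appears inside a task's port (the values are taken from the task example below):

{
	"namespace": "<NAMESPACE_DATA>",
	"name": "zeitgeist_2008-2013"
}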

Port

A 'port' is a structural unit that connects tasks and datasets. Every task can have multiple named ports for accessing datasets. The two most common ports are 'in' and 'out': through its 'in' port a task retrieves the input datasets holding all the data necessary for processing, and through its 'out' port it stores its working results in the output datasets. Usually each port contains just one dataset, but when a task processes an array of datasets, the port can hold an arbitrarily large array of them.

Each port has a semantically described direction, which can be one of the following:

  • in: datasets will be used just for reading from
  • out: datasets will be used just for writing to
  • inout: datasets will be used both for reading from and writing to

@see:

  • the [Port implementation]{@link collabo-flow.Port}
  • the [PortStructure]{@link collabo-flow.Port.PortStructure}

Namespace and name

'Namespace' is a notation used for organizing both tasks and datasets. It consists of a set of words separated by periods, for example:

  • df-wiki.get-pages-revisions.by-pages-list or
  • <NAMESPACE_DATA>.wiki-user.registered.contribution, or
  • <NAMESPACE_DATA>.wiki-user.registered.talk-pages.snapshots.

For the last two namespaces, we can see that they share the same beginning, '<NAMESPACE_DATA>.wiki-user.registered'.

To fully specify (address) one entity, either a dataset or a task, we use both namespace and the name of the entity in the form 'namespace:name'.

'Dependency' between tasks is mainly structured through the datasets they use. For example, if taskB consumes dataset1 (through its input port), and taskA produces dataset1 (through its output port), then we say that 'taskB depends on taskA'.
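
As a minimal sketch (the task and dataset names here are made up), such a dependency is expressed purely through the ports of the two tasks:

{
	"type": "task",
	"name": "taskA",
	"ports": {
		"out": {
			"direction": "out",
			"datasets": [{ "namespace": "<NAMESPACE_DATA>", "name": "dataset1" }]
		}
	}
},
{
	"type": "task",
	"name": "taskB",
	"ports": {
		"in": {
			"direction": "in",
			"datasets": [{ "namespace": "<NAMESPACE_DATA>", "name": "dataset1" }]
		}
	}
}

Running taskB with dependencies would therefore execute taskA first.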

Examples

Task

{
	"type": "task",
	"namespace": "<NAMESPACE_TASK>.execute",
	"name": "titles-to-pageids",
	"comment": "Converts list of titles (+adding title prefix) into list with titles and pageids",
	"processorNsN": "df-wiki.get-pages.titles-to-pageids",
	"execution": 0,
	"forceExecute": true,
	"isPower": true,
	"params": {
		"outputFileFormats": ["json", "csv"],
		"processorSubType": "titles-to-pageids",
		"pagePrefix": "",
		"wereSpecialPages": false,
		"retrievingDepth": 0,
		"wpNamespacesToRetrieve": [0],
		"everyWhichToRetrieve": 1,
		"csvSeparator": ", ",
		"storeInDb": false,
		"transferExtraFields": ["popularity", "accuracy"]
	},
	"ports": {
		"in": {
			"direction": "in",
			"datasets": [
				{
					"namespace": "<NAMESPACE_DATA>",
					"name": "zeitgeist_2008-2013"
				}
			]
		},
		"listOut": {
			"direction": "out",
			"datasets": [
				{
					"namespace": "<NAMESPACE_DATA>",
					"name": "zeitgeist_2008-2013-with-ids"
				}
			]
		}
	}
}

A few important aspects:

  • type: declares that this entity in the flow is a task
  • namespace: the namespace that the task belongs to
  • name: the name of the task (inside of the namespace)
  • comment: a free-form comment about the task (what it does, etc.)
  • processorNsN: the full namespace:name path to the processor that will execute the task
  • params: a set of properties passed to the processor to tweak the task's execution
  • ports: the set of the task's ports, where each port has an array of datasets that the task can access. In this example there are two ports, each with one dataset associated with it.

For more details please @see the [TaskStructure]{@link collabo-flow.Task.TaskStructure}

Ports and dataset
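
As an illustrative sketch, consistent with the task example above, a port pairs a direction with an array of dataset references:

{
	"listOut": {
		"direction": "out",
		"datasets": [
			{
				"namespace": "<NAMESPACE_DATA>",
				"name": "zeitgeist_2008-2013-with-ids"
			}
		]
	}
}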

Creating a flow

In order to create a new flow, the user needs to create a new JSON file with the flow's description in it, @see [FlowStructure]{@link collabo-flow.Flow.FlowStructure}.

Example of a blank flow:

{
	"name": "Micro-analysis",
	"authors": "Sinisha Rudan, Sasha Mile Rudan, Lazar Kovacevic",
	"version": "0.1",
	"description": "Micro-analyses of Wikipedia behavior",
	"environment": {
		"variables": {
			"NAMESPACE_DATA": "democracy-framework.wikipedia"
		}
	},
	"data": [
	]
}

Note: Our sub-project [Bukvik]{@link http://bukvik.litterra.info} does support visualization and editing of the flow, but we are still working on migrating it to the more general context necessary to visualize, edit, and run collabo-flows.

A flow is executed by passing its JSON file to an executor script, for example:

node df-wiki/exec.js micro-analysis.json

After setting up all the metadata in the newly created flow, the user can add the first task, @see the [TaskStructure]{@link collabo-flow.Task.TaskStructure}. For each task, the user needs to find an appropriate processor (@see [Processor implementation]{@link collabo-flow.Processor}), or create a new one. Some easily understandable processors can be found among the [core Collabo-Flow processors]{@link collabo-flow.processors}.

We also need to set up the root of the namespace; at the moment this is done through 'collabo-flow.execute-flow.js', by setting up the FFS constructor.

Creating a new processor

Create a new node.js file, analyse-wp-polices.js, and store it with the other processors: df-wiki/processors/analyse-wp-polices.js.

Write stub content for the new processor
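
A minimal sketch of such a stub (the process(task, callback) interface shown here is assumed, not taken from the actual API; adapt it to the [Processor implementation]{@link collabo-flow.Processor}):

// analyse-wp-polices.js -- illustrative stub only; the real processor
// interface is defined by the Processor implementation
function AnalyseWpPolices() {}

// assumed entry point: the flow executor hands over the task being run
AnalyseWpPolices.prototype.process = function (task, callback) {
	// TODO: read the input datasets from the task's 'in' port, analyse them,
	// and write the results to the task's 'out' port
	callback(null);
};

module.exports = AnalyseWpPolices;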

Creating a new sub-processor

Register it in 'collabo-flow.execute-flow:loadProcessors':

// analyze-wp-polices
this.processors['df-wiki.processors.analyze-wp-polices.analyze-user-talk-pages'] = "../df-wiki/processors/analyze-wp-polices";

Possible Scenarios

Democracy Framework

Bukvik

These scenarios call for:

  • multiple environments: python, nodejs, and potentially more (R, java, ...)
  • communicating through messaging
  • communicating through streams
  • communicating through caches
  • communicating through filesystem storage
  • communicating through databases
  • communicating through services
  • the need to control tasks'/services' execution
  • the need to pipeline
  • the need to verify which parts of the dataflow have to be reprocessed/updated

LitTerra

ToDo

CollaboFlow

Release History