newslynx

v1.2.1

Published

2 years ago

A platform for measuring the quantitative and qualitative impact of journalism.


NewsLynx App

The NewsLynx web interface

This project is part of a research project at the Tow Center for Digital Journalism at Columbia University by Michael Keller & Brian Abelson.

Read our full documentation for installing your own instance. The instructions below cover installing and developing the app as well as architectural documentation on how it works.

Since a lot of functionality is intertwined across different repositories, please post all issues to our project issue tracker.

Getting started

Install dependencies with npm install.

If you haven't run newslynx init by following the full install instructions, you can still test out the app by creating a .newslynx folder in your home folder and adding a config.yaml file that looks like the following (change the secret key to something else, or leave it as is if you're simply testing):

api_version: v1
app_secret_key: chicken-burrito-grande
https: false
api_url: http://localhost:5000

To recap, if you want to do a dry run to make sure the app runs locally but you haven't configured or run the API yet, make a file that looks like the one above and put it in ~/.newslynx/config.yaml. Without a server running locally, you won't get past the login page, but at least you can make sure you get that far.

Running the server

To start the server, run

npm start

This compiles your CSS and JS and runs the server with Forever.

When you see the following, it's done and you can visit http://localhost:3000.

#####################################
# HTTP listening on 0.0.0.0:3000... #
#####################################

Note: If you are running this in production, you want to run it behind HTTPS and tell the app you are doing so in one of two ways:

Run it with the environment variable NEWSLYNX_ENV=https
Set https: true in your ~/.newslynx/config.yaml file

This will make sure your cookies are set securely.

Other start up commands

Alternate commands are in package.json under scripts.

Developing locally

If you want to modify files and have the CSS and JS re-compiled automatically and the server restarted if necessary, do:

npm run dev

If you just want to watch the CSS and JS and re-compile on change, do:

npm run watch-files

If you just want to watch the Express server and restart when its files change (templates, server js files), do:

npm run watch-server

These last two commands are best run in tandem in two separate shell windows. npm run dev does them both in one window for convenience.

The final command listed is npm test, which will run a simple test to make sure the server can launch.

Documentation

This documentation will explain the architecture and design patterns in use in the Express app and each section's Backbone app.

Overall architecture

The NewsLynx app has two main components:

An ExpressJS server-side JavaScript application, which is concerned with authentication, sessioning and requesting data from the api. We'll refer to this as the "Express app" or the "Express side."
A combination of multiple JavaScript Single-Page Applications. Each "page" in NewsLynx (currently Settings, Approval River, Articles and the Submit event page) is its own front-end JavaScript app written using the Backbone framework. We'll refer to this as the "front-end."

The front-end code communicates with the Express side through Express routes as defined in lib/routes/. Probably the most important route is the one that forwards any URL that starts with /api/ to the API endpoint and returns a JSON response.

Express App architecture

Running the app

The main Express app file is lib/app.js. This file glues all the Express middleware together, such as sessioning, cookies and routes, and determines which routes require authentication.

To run the app, you can start it from the command line through the file bin/www.js by providing the run argument like so:

./bin/www.js run

It defaults to port 3000, but that can be changed with a second argument:

./bin/www.js run 3001

In production and development, however, we run the server with Forever and Nodemon, respectively. These tools have better support for keeping a NodeJS server alive for long periods of time. Nodemon is used in development since it can restart the server whenever files are modified.

Templates and loading CSS

Templates are written in Jade and found in lib/views/. They extend from lib/views/layout.jade, which specifies "blocks" that subviews will insert themselves into. Here's what layout.jade looks like:

doctype html
html
  head
    title NewsLynx | 
      = info.title
    block css
    link(rel='stylesheet', href='/stylesheets/octicon/octicons.css')
    link(rel='stylesheet', href='/stylesheets/css/#{info.page}.css')
  body(data-section="#{info.page}")
    #main-wrapper
      block main-wrapper-contents
    #global-loading
    block bootstrap-data
    block templates
    block scripts

Note: If you open up layout.jade you'll see it has all of this ugly JavaScript describing menu items like Copy, Paste and Reload. This is to construct menu items for the Desktop application so we're skipping that here.

You can see two variables here, info.title and info.page. These are important because info.page determines which CSS file is loaded, which we'll explain more in the Stylesheets with Stylus section below. These variables match the route name. For example, when you go to /settings, info.title is set to Settings in lib/routes/pages.js near line 103. That title is then run through the sanitize function, which puts it in lowercase and replaces spaces with dashes. We'll then fetch the file at /stylesheets/css/settings.css.
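A minimal sketch of the transformation described above, assuming the sanitize step only lowercases the title and replaces spaces with dashes. The name sanitizeTitle and its body are illustrative and are not copied from lib/routes/pages.js.

```javascript
// Hypothetical stand-in for the sanitize step; the real function lives in the Express routes.
function sanitizeTitle(title) {
  // Lowercase the title, then turn each run of whitespace into a single dash
  return title.toLowerCase().replace(/\s+/g, '-');
}

var info = { title: 'Approval River' };
info.page = sanitizeTitle(info.title);

// Path that layout.jade would request for this page
var cssPath = '/stylesheets/css/' + info.page + '.css';
console.log(cssPath); // /stylesheets/css/approval-river.css
```

The same value would also end up in the data-section attribute on the body, so one sanitized name drives both the stylesheet and the page-specific JavaScript.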

A page data attribute is also set on the body, which is used for loading page-specific JavaScript files and is discussed below in How page-specific JavaScript is loaded.

So, with this main layout.jade file, we then have page-specific Jade files which insert blocks. Each of these inherits from lib/views/page.jade.

Here's what that file looks like:

extends layout

block main-wrapper-contents
  include includes/left-rail/index.jade
  #drawer(data-loading="true")
    block drawer
  #content(data-loading="true")
    block content

Take a look at lib/views/settings.jade for an example of a "Page" layout file, which inserts code into the drawer block, or the content block.

Authentication & interacting with the Core API

Every API call must include org and apikey query parameters. Read the NewsLynx Core documentation for specifics. As far as the App is concerned, all user login operations are handled by routes in lib/routes/organizations.js.

Logging in is done by sending a POST request to /login containing the following data:

{
  "email": "<string>",
  "password": "<string>",
  "remember_me": "<string>"
}

The remember_me value is set via a checkbox, which will serialize to on if checked and falsy if not. When it is on, the maxAge of the session cookie is set to the distant future so that a user does not need to enter their information again until they log out.
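As a rough sketch of the remember_me behavior, the helper below sets a cookie lifetime from the posted form body. The name applyRememberMe and the ten-year constant are assumptions made for illustration; plain objects stand in for req.session and req.body so the snippet runs without Express.

```javascript
// Assumed stand-in for "the distant future"; the app's actual value may differ.
var TEN_YEARS_MS = 10 * 365 * 24 * 60 * 60 * 1000;

// Hypothetical helper: decide the session cookie lifetime from the posted form body.
function applyRememberMe(session, body) {
  if (body.remember_me === 'on') {
    // A checked checkbox serializes to the string "on"
    session.cookie.maxAge = TEN_YEARS_MS;
  } else {
    // An unchecked checkbox is missing from the body; null means no fixed lifetime
    session.cookie.maxAge = null;
  }
  return session;
}

// Plain objects standing in for req.session and req.body
var remembered = applyRememberMe({ cookie: {} }, { email: 'a@example.com', remember_me: 'on' });
var forgotten = applyRememberMe({ cookie: {} }, { email: 'a@example.com' });
```

With express-session, a cookie without a maxAge is generally treated as a browser-session cookie, which is why the unchecked case clears it here.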

The login route also does a few things with a redirect_url value. The idea here is that if you have not authenticated, and you want to go to, say, /articles, you will be redirected to login. After you log in, the expectation is that you will proceed to where you originally intended. To do that is both simple and complicated.

The simple part is that you can stash the incoming url on the req.session object, which is what we do initially in app.js near line 93. That url won't include anything in the hash, however, because the server never receives that information — it considers it below its station, it is the domain of the client and must not rise to such peaks.

For example, if we go to /articles#detail, Express only sees /articles as the page. This is better than nothing, though, so we save it as req.session.redirect_page. So how do we save the # stuff?

The complicated part is that we can save the hash client-side once we get to the login page by putting in some JavaScript that writes the hash to a hidden input field. When we submit our login form, we also submit the page where we intended to go. The Jade template inserts that markup below the Remember me checkbox:

  .form-row
    label 
      input(type='checkbox' name="remember_me") 
      | Remember me
    //- Handle redirects by stashing the # portion of the url in a hidden field, which will then be picked up by our login POST endpoint
    script.
      var href = document.location.href
      if (href.indexOf('logout') === -1){
        document.write('<input type="hidden" name="redirect_url" value="'+href+'"/>');
      }

Note how we don't stash this if we are on the logout page, since we would otherwise be redirected straight back to logging out.

So if we want to go to the /articles#detail page, the object we POST actually looks like this:

{
  "email": "<string>",
  "password": "<string>",
  "remember_me": "<string>",
  "redirect_url": "/login#detail"
}

Notice how it thinks we want to go to the login page, plus our original hash, even though we requested /articles#detail. This is because the document.location.href is executing on the login page. So it preserves our hash but not the page!

Putting two and two together, Express was able to store the page, but not the hash. The client can store the hash, but not the original page. The rest of the code in our login POST endpoint replaces the /login with our previously saved page. Phew!
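The merge of the two halves can be sketched as follows. This is a simplified illustration, not the code in the login POST endpoint: the function name buildRedirect and the fallback to the site root are assumptions.

```javascript
// Hypothetical helper: merge the page Express saved with the hash the client posted.
function buildRedirect(savedPage, postedUrl) {
  // Everything from the first "#" onward is the hash; empty string if none was posted
  var hashIndex = postedUrl.indexOf('#');
  var hash = hashIndex === -1 ? '' : postedUrl.slice(hashIndex);
  // Fall back to the site root if no page was stashed on the session (an assumption)
  return (savedPage || '/') + hash;
}

// savedPage plays the role of req.session.redirect_page
var target = buildRedirect('/articles', '/login#detail');
console.log(target); // /articles#detail
```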

This request is then forwarded to the almighty auth.relay function, which handles communication with the Core API. It deserves a few words.

Talking to the Core API

All communication with the Core API is handled through lib/utils/auth.js. For logging in, this means setting data under auth. More generally, it adds our apikey and org id from the session to sign each request, adds the API URL as set in our config.yaml file, and always returns JSON. The file itself is heavily commented on what each part does, but as an overview: if the Express App wants to talk to the Core API, it goes through the relay.
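To make the signing step concrete, here is a small sketch of how a request URL could be assembled from the config and the session. Everything in it is an assumption for illustration: the function name buildApiUrl, the session field names, and the /api/<version> path layout are not taken from lib/utils/auth.js.

```javascript
// Hypothetical sketch of the signing step; the real logic lives in lib/utils/auth.js.
function buildApiUrl(config, session, path, query) {
  // Base URL and version come from config.yaml (path layout is assumed)
  var url = new URL(config.api_url + '/api/' + config.api_version + path);
  // Copy any caller-supplied query parameters
  Object.keys(query || {}).forEach(function (key) {
    url.searchParams.set(key, query[key]);
  });
  // Sign the request with the credentials stored on the session
  url.searchParams.set('apikey', session.apikey);
  url.searchParams.set('org', session.org_id);
  return url.toString();
}

var config = { api_url: 'http://localhost:5000', api_version: 'v1' };
var session = { apikey: 'abc', org_id: 1 };
var signed = buildApiUrl(config, session, '/articles', { per_page: 5 });
console.log(signed); // http://localhost:5000/api/v1/articles?per_page=5&apikey=abc&org=1
```

Keeping this in one place means the front-end never needs to see the apikey; it only talks to the Express routes.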

Sessioning with LevelDB

The app keeps track of whether a user is logged in by setting a cookie on the person's browser with a Session ID. The session behind that ID stores the user's API key in a LevelDB database, which is written out to the lib/db/ folder.

This whole process is largely abstracted thanks to the use of two libraries:

express-session handles communicating with the browser's cookies
level-session-store handles putting our sessions in the database.

This process is configured in lib/app.js. We include a flag for storing the session securely if we are in an https production environment, which is set as explained above in Getting started.

var sessInfo = {
  store: new LevelStore(path.join(__dirname, 'db')),
  secret: NEWSLYNX_CONFIG.app_secret_key,
  resave: true,
  saveUninitialized: true,
  unset: 'destroy',
  cookie: {secure: false}
};

// If we are running in a secure environment
if (app.get('env') === 'https' || NEWSLYNX_CONFIG.https === true) {
  app.set('trust proxy', 1) // Trust the first proxy
  sessInfo.cookie.secure = true
}

Bootstrapping and transforming data

Currently, on initial load for any of your main Pages, the Express app will make a bunch of calls to the API and package up this data as a global data object called pageData. You can see how all this plays out in the lib/routes/pages.js file.

We currently have an open issue to change this pattern so that Backbone collections fetch their own data on load. The advantage of this change is that the user will see the page change more quickly than with the current setup. For example, from the Home screen, if you click "Approval River," that data is all fetched asynchronously by the Express app but then your browser loads it all in one big object, which is why you hang on that Loading gif of Merlynne making potions for a few seconds.

We built it this way, essentially, because that's the way we first set it up. The benefit is that we are also doing a number of transformations on the data, and serializing the JSON data (i.e. converting it to a string and then parsing it back into objects) lets us not worry about mutating data in unexpected ways. Because objects in JavaScript are passed by reference, not duplicated, you can easily modify an object in one place and unexpectedly see those changes reflected elsewhere as well.

For example, our articles come back from the server with a list of Subject tag ids. We then hydrate these ids with the full subject tag info. If we weren't careful, we would really only have one copy of this object instead of multiple. The consequence of that is if we delete a subject tag off of one article, it would be removed from every article.
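The difference between shared and copied tag objects can be seen in a small sketch. The data and field names (subject_tag_ids, subject_tags) are made up for illustration and may not match what lib/utils/transform.js uses.

```javascript
// Illustrative data only; field names are assumptions.
var subjectTags = { 1: { id: 1, name: 'politics' } };
var articles = [
  { id: 10, subject_tag_ids: [1] },
  { id: 11, subject_tag_ids: [1] }
];

// Naive hydration: both articles end up pointing at the same tag object
function hydrateShared(article) {
  var tags = article.subject_tag_ids.map(function (id) { return subjectTags[id]; });
  return { id: article.id, subject_tags: tags };
}

// Round-tripping through JSON gives every article its own copy of each tag
function hydrateCopied(article) {
  return JSON.parse(JSON.stringify(hydrateShared(article)));
}

var shared = articles.map(hydrateShared);
shared[0].subject_tags[0].name = 'edited';
console.log(shared[1].subject_tags[0].name); // edited (the change leaked)

subjectTags[1].name = 'politics'; // reset the shared object
var copied = articles.map(hydrateCopied);
copied[0].subject_tags[0].name = 'edited';
console.log(copied[1].subject_tags[0].name); // politics
```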

This problem is not insurmountable, but I explain it here to point out some of the advantages of the current system and things to keep in mind for shifting to another system.

All of the transformations are stored in lib/utils/transform.js

Front-end architecture

Build process with Gulp

The front-end JavaScript is written in separate files that are meant to be concatenated together and minified. We use Gulp to do this and to watch those files for changes. Gulp also transforms our Stylus files into normal CSS files. Check out the Gulpfile, which orchestrates all the events.

The final concatenated JavaScript file is saved to lib/public/javascripts/main.bundled.js and that file is loaded in every page template. Let's look at the hierarchy of these javascript files, which are all in

This is the order in which the gulpfile concatenates them:

// ...
  js: [
    './lib/public/javascripts/namespace.js',
    './lib/public/javascripts/helpers/*.js',
    './lib/public/javascripts/models/*.js',
    './lib/public/javascripts/collections/*.js',
    './lib/public/javascripts/views/*.js',
    './lib/public/javascripts/app/*.js',
    './lib/public/javascripts/routing/*.js',
    './lib/public/javascripts/init.js'
  ]

// ...

Because the files matched by each pattern are concatenated in alphabetical order, views or other files that are meant to be extended are given the file name prefix AA_ to make sure they are loaded first.
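A quick way to see why the AA_ prefix works is to sort a few file names. Only AA_BaseForm.js is a name mentioned in this documentation; the other two are invented for the example, and the sketch assumes the build orders files the same way a plain string sort does.

```javascript
// File names other than AA_BaseForm.js are made up for illustration.
var viewFiles = ['ArticleDetail.js', 'SettingsForm.js', 'AA_BaseForm.js'];

// Default sort compares strings by character code, so "AA_" lands ahead of "Ar" and "Se"
var loadOrder = viewFiles.slice().sort();
console.log(loadOrder); // [ 'AA_BaseForm.js', 'ArticleDetail.js', 'SettingsForm.js' ]
```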

Let's look at namespace.js in particular, since that's the first file and it will give us some sense of the structure the rest of the files are built around. This file creates our top-level objects we'll be using throughout the app:

'use strict';
var helpers = {};
var templates = {};
var models = {};
var collections = {};
var app = {};
var views = {};
var routing = {};

We'll look at these more in detail in the How page-specific JavaScript is loaded section. For now, just note how these main objects are what the rest of the files add functions and objects to.

For styles, gulp puts page-specific CSS files in the css/ folder. This is discussed more in detail in the next section, Stylesheets with Stylus.

Stylesheets with Stylus

The app uses a CSS preprocessor called Stylus, which is a NodeJS package. These files are in lib/public/stylesheets/. Each page has its own top-level file, such as articles.styl, home.styl and approval-river.styl.

Styles are broken into smaller files so they can be more easily reused across views. These are all in lib/public/stylesheets/blueprint/. Even smaller Stylus files that are reused across "blueprint" files are in the modules subfolder. The nested folder structure helps show which files are meant to be used as shared assets.

During the build process, the top level files for each page are written into the css/ folder at lib/public/stylesheets/css/. To bring it full circle, these files, articles.css, home.css, approval-river.css are what layout.jade calls based on the info.page variable, as explained above.

link(rel='stylesheet', href='/stylesheets/css/#{info.page}.css')

How page-specific JavaScript is loaded

As explained in the Build process with Gulp, the JavaScript is baked out into one file main.bundled.js unlike the CSS files, which are page-specific. Which JavaScript functions get executed, however, is determined through the same info.page variable.

In the main layout.jade file, the HTML <body> element gets a page-specific data-attribute like so:

body(data-section="#{info.page}")

When the JavaScript files are loaded, they call corresponding functions and the page-specific code gets executed. In the build process section, we discussed the order in which these files are concatenated. The last element in that list, init.js, is the first file we look to in order to run our app.

If you look at this file, you'll see that each of the objects in our namespace.js has an init object that contains page-specific functions. When we load a page, we grab that data-attribute off of <body> and that dictates which function groups to execute off of these objects.

In this way, the app's models, collections and views are instantiated by the main init object at the bottom of this file, which looks like this:

var init = {
  go: function(){
    // Call the page specific functions
    var section = $('body').attr('data-section');
    // Their `this` should be the root object so you can still say `this.` even though you're nested in the object
    templates.init[section].call(templates);
    models.init[section].call(models);
    collections.init[section].call(collections);
    app.init[section].call(app);
    routing.init.go.call(routing, section);
  }
}

init.go();

The main view for each page is the app.instance object.
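The dispatch pattern can be tried outside the browser with a stripped-down sketch. It assumes nothing about the real init functions: makeNamespace and the calls array are invented for the example, and the section is passed in directly instead of being read from <body> with jQuery.

```javascript
var calls = [];

// Build a namespace object whose init functions record "<name>:<section>"
function makeNamespace(name, sections) {
  var ns = { name: name, init: {} };
  sections.forEach(function (section) {
    ns.init[section] = function () {
      // `this` is the namespace itself because of .call() below
      calls.push(this.name + ':' + section);
    };
  });
  return ns;
}

var models = makeNamespace('models', ['settings', 'articles']);
var collections = makeNamespace('collections', ['settings', 'articles']);

// Same dispatch idea as init.go, with the section passed in as an argument
function go(section) {
  models.init[section].call(models);
  collections.init[section].call(collections);
}

go('articles');
console.log(calls); // [ 'models:articles', 'collections:articles' ]
```

Only the functions registered under the current section run, even though every page's code ships in the same bundled file.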

Form serialization

The app uses the library jquery.serializeJSON to turn all form elements into JSON objects. Check out the getSettings function in AA_BaseForm.js near line 541 for the details of that implementation.

Settings

Change detection

Modal windows

Approval River

Form construction

Form validation

Articles

Comparison view

Isotope

Detail view

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.