bloom-harvesting-neo4j-import
v0.0.1
Neo4J Import
This project contains the classes that transform TripleModel instances into the table-like structures needed to batch-import data into a Neo4J database.
Data Model
This section describes the main model entities and their relations. Entities:
- doc - a document; it could be a tweet, Facebook post, blog post, comment, etc.
- actor - a person or service that creates documents; it could also be a Facebook group or a media site generating posts
- topic - a topic that documents can be part of
Each entity contains the following fields:
- doc
  - uri - the main URI uniquely identifying the document
  - source - source of the document (twitter, facebook, web, etc.)
  - date - creation date of the document
  - href - a URL giving access to the document
  - content - content of the document
  - tags - character string holding a comma-separated list of tags
  - type - type of the document; from Synthesio
  - country - country of the document; may be empty; from Synthesio
  - language - language of the document; from Synthesio
  - sentiment - sentiment associated with the document: positive, negative, neutral, or undefined
  - influence_document - numerical influence value of the document; from Synthesio
  - influence_author - numerical influence value of the document's author; from Synthesio
  - followers - number of followers of the author; from Synthesio
  - favorits - likes and similar reactions
  - retweets - number of retweets; from Synthesio
- actor
  - uri - a URI uniquely identifying the actor
  - source - twitter, facebook, web, etc.
  - type - type of the actor
  - name - human-readable name of the actor
  - description - description of the actor
  - key - key from the original social network
- topic
  - uri - unique identifier of the topic
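The field lists above can be sketched as TypeScript types. This is only an illustration: the README does not state field types, so every type below (for example, `date` as a string and the influence values as numbers) is an assumption, and `parseTags` and `exampleDoc` are hypothetical names that are not part of this package.

```typescript
// Hypothetical types mirroring the field lists above.
// Field types are assumptions; the README does not specify them.
type Sentiment = "positive" | "negative" | "neutral" | "undefined";

interface Doc {
  uri: string;
  source: string;             // e.g. "twitter", "facebook", "web"
  date: string;               // assumed to be an ISO 8601 string
  href: string;
  content: string;
  tags: string;               // comma-separated list of tags
  type: string;
  country: string;            // may be empty
  language: string;
  sentiment: Sentiment;
  influence_document: number;
  influence_author: number;
  followers: number;
  favorits: number;           // spelled as in the field list
  retweets: number;
}

interface Actor {
  uri: string;
  source: string;
  type: string;
  name: string;
  description: string;
  key: string;                // key from the original social network
}

interface Topic {
  uri: string;
}

// Splits the comma-separated tags field into an array, dropping empty entries.
function parseTags(tags: string): string[] {
  return tags
    .split(",")
    .map((tag) => tag.trim())
    .filter((tag) => tag.length > 0);
}

// Made-up sample data, for illustration only.
const exampleDoc: Doc = {
  uri: "http://example.org/doc/1",
  source: "twitter",
  date: "2015-01-01T00:00:00Z",
  href: "http://example.org/doc/1",
  content: "Hello @john",
  tags: "greeting, demo",
  type: "tweet",
  country: "",
  language: "en",
  sentiment: "neutral",
  influence_document: 0,
  influence_author: 0,
  followers: 0,
  favorits: 0,
  retweets: 0,
};

const exampleTags: string[] = parseTags(exampleDoc.tags);
```

Keeping `tags` as a single string matches the field description; splitting it is left to the consumer, as `parseTags` shows.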
Relations:
- author - relation between a document and the actor who authored it
- refersTo - relation between two documents; e.g., one document contains an HTTP reference to another
- mention - a document referencing an actor; e.g., Twitter mentions such as @john
- partOf - relation between a document and a topic
These relations can be represented as follows:
- doc=[author]=>actor
- doc=[refersTo]=>doc
- doc=[mention]=>actor
- doc=[partOf]=>topic
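The same relation schema can be written down as data. The sketch below is a hypothetical illustration, not code from this package: the names `RELATION_TARGET` and `describeRelation` are invented here, and the only facts taken from the list above are the four relation names and their target entity kinds.

```typescript
// Hypothetical encoding of the relation schema listed above.
type EntityKind = "doc" | "actor" | "topic";
type RelationName = "author" | "refersTo" | "mention" | "partOf";

// Every relation starts at a doc; only the target kind varies.
const RELATION_TARGET: Record<RelationName, EntityKind> = {
  author: "actor",
  refersTo: "doc",
  mention: "actor",
  partOf: "topic",
};

// Renders a relation in the doc=[name]=>target notation used above.
function describeRelation(relation: RelationName): string {
  return "doc=[" + relation + "]=>" + RELATION_TARGET[relation];
}

const relationSchema: string[] = (Object.keys(RELATION_TARGET) as RelationName[]).map(describeRelation);
```

A lookup table like this makes it straightforward to check that a relation points at the expected kind of entity before it is written out for import.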
The Main Package Class
The main class of this package is Neo4JImportBatchGenerator. It is responsible for transforming TripleModel instances into individual entities and relations.
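To give a rough idea of what such a transformation involves, here is a minimal sketch. It does not use or reproduce the Neo4JImportBatchGenerator or TripleModel APIs, which this README does not document. It assumes a triple is a plain subject/predicate/object record (the hypothetical `SimpleTriple` type) and that a predicate named after one of the four relations marks a relation, while any other predicate is an entity field.

```typescript
// Assumed triple shape; the real TripleModel API may differ.
interface SimpleTriple {
  subject: string;
  predicate: string;
  object: string;
}

interface RelationRow {
  from: string;
  name: string;
  to: string;
}

interface SplitResult {
  entities: Record<string, Record<string, string>>; // keyed by entity URI
  relations: RelationRow[];
}

// Predicates treated as relations (assumption based on the Relations list).
const RELATION_PREDICATES: string[] = ["author", "refersTo", "mention", "partOf"];

// Separates triples into per-entity field rows and relation rows.
function splitTriples(triples: SimpleTriple[]): SplitResult {
  const entities: Record<string, Record<string, string>> = {};
  const relations: RelationRow[] = [];
  for (const triple of triples) {
    if (RELATION_PREDICATES.indexOf(triple.predicate) !== -1) {
      relations.push({ from: triple.subject, name: triple.predicate, to: triple.object });
    } else {
      const row: Record<string, string> = entities[triple.subject] || {};
      row[triple.predicate] = triple.object;
      entities[triple.subject] = row;
    }
  }
  return { entities: entities, relations: relations };
}

// Made-up sample data, for illustration only.
const sampleTriples: SimpleTriple[] = [
  { subject: "http://example.org/doc/1", predicate: "source", object: "twitter" },
  { subject: "http://example.org/doc/1", predicate: "content", object: "Hello @john" },
  { subject: "http://example.org/doc/1", predicate: "author", object: "http://example.org/actor/7" },
  { subject: "http://example.org/actor/7", predicate: "name", object: "Jane" },
];

const splitResult: SplitResult = splitTriples(sampleTriples);
```

Each entry in `entities` corresponds to one row of a node table and each entry in `relations` to one row of a relationship table, which is the general shape a batch import expects.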