brainfeed

A tool for domain experts to find recent and relevant public discourse on topics they are familiar with

Project overview

Motivation

The increasing accessibility of scientific articles and the public discourse surrounding them is generally beneficial to society. A tradeoff of this increased public consumption of knowledge (in formats traditionally meant for domain experts) is the rise of misinformation. Articles in scientific journals often describe specific facts and precise outcomes under specific conditions, and their validity and generalizability are usually understood by only a few experts. On the other hand, social media platforms such as Reddit and Twitter allow anyone (often anonymously) to post articles and comment on their contents, and it is in these forums that misunderstandings and wrong information are conveyed and spread. The wide audience of these forums, coupled with increased public interest in scientific topics (for example, in relation to the Covid-19 pandemic), has made it imperative that experts be able to find and engage with such posts.

This project, pitched for Brainhack Toronto 2021, seeks to create a live feed of active and relevant public discussions on widely used social media forums. While the initial focus of this project is to detect discussions that revolve around brain imaging, the tools to be developed here should in principle be useful for other scientific fields.

Implementation

A tentative implementation of the project is as follows:

Project communications

For Brainhack Toronto 2021, we'll communicate through a Brainhack Toronto Discord channel.

Project goals

For Brainhack 2021:

  • Post and comment detection on Reddit via Reddit API
  • Abstract and keyword detection via the Crossref API
  • Classification of post relevancy based on abstract keywords
  • Post data to central repository (Firebase)
  • Web application to view recent posts
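The relevancy goal above can be illustrated with a minimal sketch. It assumes that keywords are simply the most frequent non-stopword terms in an abstract and that a post's relevance is the fraction of those keywords it mentions. The function names, the stopword list, and the threshold are all hypothetical and are not taken from the project code; the project may use a different approach.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the project).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "for",
             "with", "on", "that", "this", "we", "are", "by"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_keywords(abstract, top_n=5):
    """Return the top_n most frequent non-stopword tokens in an abstract."""
    counts = Counter(
        token for token in tokenize(abstract)
        if token not in STOPWORDS and len(token) > 2
    )
    return [word for word, _ in counts.most_common(top_n)]


def relevance_score(post_text, keywords):
    """Return the fraction of keywords that appear in the post (0.0 to 1.0)."""
    if not keywords:
        return 0.0
    post_tokens = set(tokenize(post_text))
    hits = sum(1 for keyword in keywords if keyword in post_tokens)
    return hits / len(keywords)


def is_relevant(post_text, keywords, threshold=0.4):
    """Flag a post as relevant when its score reaches the (hypothetical) threshold."""
    return relevance_score(post_text, keywords) >= threshold
```

In practice the abstract text would come from Crossref and the post text from Reddit; this sketch only covers the scoring step and works on plain strings.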

Future ideas:

  • Extend to other forums (Twitter?)
  • Sentiment detection of discussions
  • Mobile applications to display discussion feed
  • Analysis of scientific information spread across social networks

Current progress

  • Web Application (not deployed yet)

    This web application allows users to search Reddit posts in all subreddits or in a specific subreddit, to search the most recent posts or posts in a given time window, or to search for posts containing specific keywords.

    The app currently works offline (locally) but will be deployed soon.

    python run_app_reddit_search.py
    
  • Reddit Posts Search & Store

    Store a user's search results into a PostgreSQL database.

    For a demonstration, run the following script:

    python demo_reddit_search.py
  • Reddit Post Recommender

    For a given Reddit post, this recommender returns the 5 most similar posts based on the content of the post title.

    See the Jupyter Notebook reddit_recommender.ipynb

  • Reddit Post Topic (Flair Tag) Classification

    This classification model predicts the topic (flair tag) of Reddit posts based on the content of the post title.

    For simplicity and demonstration, the present model performs a binary classification on posts with Biology and Environment flair tags.

    See the Jupyter Notebook reddit_classification.ipynb
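As a rough illustration of the title-based recommender described above, the following sketch ranks candidate titles by cosine similarity of their word counts. This is an assumption about one simple way to measure title similarity; the actual method is in reddit_recommender.ipynb and may differ. The function names here are hypothetical.

```python
import math
import re
from collections import Counter


def title_vector(title):
    """Turn a post title into a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", title.lower()))


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    shared = vec_a.keys() & vec_b.keys()
    dot = sum(vec_a[token] * vec_b[token] for token in shared)
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query_title, candidate_titles, top_k=5):
    """Return up to top_k candidate titles, most similar to the query first."""
    query_vec = title_vector(query_title)
    scored = [
        (cosine_similarity(query_vec, title_vector(title)), title)
        for title in candidate_titles
        if title != query_title  # do not recommend the post itself
    ]
    # Sort by score only, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:top_k]]
```

Raw word counts treat every word equally, so common words such as "the" can inflate similarity; a weighting scheme such as TF-IDF is a common refinement.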

Contributing

Contributors of all backgrounds and experiences are welcome.

Requirements

  • Python

    For simplicity and consistency, you can create a conda environment using the following command:

    conda create \
    --name brainfeed \
    python=3.7

    After creating the environment, you can activate it by running conda activate brainfeed.

  • Python packages

    You will need the following packages:

    • habanero (for Crossref)
    • PRAW (for Reddit)
    • firebase-admin (for Firebase / Firestore)
    • spyder (optional, a Python IDE)

    You can install the required packages with this command, after activating the conda environment:

    pip install habanero==1.0.0 praw==7.5.0 firebase-admin==5.1.0
    conda install spyder=5.1.5
  • A Reddit account

    Set up a Reddit account, and create a script app by clicking the "Create app" button here. More details can be found at: https://github.com/reddit-archive/reddit/wiki/OAuth2

  • A Firebase project

    Create here: https://console.firebase.google.com/

Setup

  1. Clone this repository

    git clone git@github.com:yohanyee/brainfeed.git

  2. Activate the conda environment

    conda activate brainfeed

  3. Copy the praw.ini_TEMPLATE_DO_NOT_ENTER_INFO_HERE file to your config directory and rename it to praw.ini (see https://praw.readthedocs.io/en/stable/getting_started/configuration/prawini.html). Then, fill in your Reddit authentication information, following Reddit guidelines for the user_agent field. Make sure this file is not publicly visible.

  4. Initialize the Firebase SDK (create a service account and download the private key)

    See https://firebase.google.com/docs/admin/setup/#initialize-sdk

  5. Add an environment variable called GOOGLE_APPLICATION_CREDENTIALS pointing to the location of this private key (which should not be publicly visible)

    export GOOGLE_APPLICATION_CREDENTIALS="/home/user/.config/service-account-file.json"
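Steps 3 and 5 are easy to get wrong, so a small pre-flight check can help. The sketch below uses only the Python standard library. It assumes your praw.ini keeps its settings in a named site section (the section name is whatever you chose; "brainfeed" in the usage note is hypothetical) and that the private key is a standard service account JSON file. The function names are illustrative and are not part of this repository.

```python
import configparser
import json
import os

# Settings a script app typically needs in praw.ini.
REQUIRED_PRAW_KEYS = ("client_id", "client_secret", "user_agent")


def missing_praw_keys(ini_path, site_name):
    """Return the required keys that are absent or empty for site_name."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(ini_path)  # a missing file is silently treated as empty
    if not parser.has_section(site_name):
        return list(REQUIRED_PRAW_KEYS)
    return [
        key for key in REQUIRED_PRAW_KEYS
        if not parser.get(site_name, key, fallback="").strip()
    ]


def credentials_problem(env=os.environ):
    """Describe what is wrong with the Firebase credentials, or return None."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return "key file not found: " + path
    try:
        with open(path) as handle:
            key = json.load(handle)
    except ValueError:
        return "key file is not valid JSON"
    if not isinstance(key, dict) or key.get("type") != "service_account":
        return "key file is not a service account key"
    return None
```

For example, `missing_praw_keys("/path/to/praw.ini", "brainfeed")` returns an empty list when all three settings are filled in, and `credentials_problem()` returns None when the environment variable points to a readable service account key.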

Helpful links
