junipertcy/RegRank

regrank aims to implement a suite of regularized models to infer the hierarchical structure in a directed network.

Docs · Discussions · Examples

This is the software repository behind the paper:

  • Tzu-Chi Yen and Stephen Becker, Regularized methods for efficient ranking in networks, in preparation.

Installation

RegRank relies on powerful Python libraries with deep C++ dependencies, such as Ax (ft. PyTorch & BoTorch, for hyperparameter search), CVXPY (ft. OSQP/ECOS/SCS, for convex optimization), and graph-tool (ft. Boost & CGAL, for graph analysis). These packages cannot all be installed by pip alone.

Therefore, the recommended installation strategy is a hybrid approach:

  1. Use Conda to create a stable environment and install these heavy, compiled dependencies.
  2. Use uv (a fast package manager written in Rust) to install regrank and its Python dependencies.

We recommend Miniforge or Mambaforge for a minimal, conda-forge-centric setup. Follow these steps to install and use regrank as a library in your projects.

# 1. In a new directory, create a conda environment with Python 3.11.
conda create -n regrank -c conda-forge python=3.11 -y

# 2. Activate the new environment.
conda activate regrank

# 3. Install PyTorch (a dependency for Ax), CVXPY, and graph-tool.
#    Using conda for PyTorch is more robust, especially on macOS.
conda install -c pytorch pytorch torchvision
conda install -c conda-forge graph-tool python-graphviz cvxpy sage ecos # docs todo

# 4. Install regrank using uv.
#    (If you don't have uv yet: pip install uv)
uv pip install regrank
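
After the four steps above, it can help to confirm that the compiled dependencies are visible from the activated environment. The following is a minimal sketch, not part of regrank: `missing_modules` is a hypothetical helper, and the import names (`torch`, `ax`, `cvxpy`, `graph_tool`, `regrank`) are assumed to be the ones under which these packages are importable.

```python
from importlib.util import find_spec

# Assumed import names for the dependencies listed above, plus regrank itself.
REQUIRED_MODULES = ("torch", "ax", "cvxpy", "graph_tool", "regrank")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names in `names` that cannot be located in this environment.

    Hypothetical helper: it only checks that a module can be found,
    not that it imports cleanly or matches a required version.
    """
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

If any name is reported as missing, rerunning the corresponding conda or uv step inside the `regrank` environment is the likely remedy.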

Example

# Import the library
import regrank as rr

# Load a data set
g = rr.datasets.us_air_traffic()

# Create a model
model = rr.SpringRank(method="annotated")

# Fit the model. Here we analyze the `state_abr` vertex metadata;
# inspect `g.list_properties()` for other metadata to analyze.
result = model.fit(g, alpha=1, lambd=0.5, goi="state_abr")

# Now, result["primal"] should have the rankings. We can compute a summary.
summary = model.compute_summary(g, "state_abr", primal_s=result["primal"])

# Print a summary table
rr.print_summary_table(summary)

# Plot the rankings as rain-cloud plots, grouped by cluster
rr.plot_rankings(result["primal"], labels=list(g.vp["state_abr"]),
                 summary=summary, kind="box")

The call rr.plot_rankings(..., kind="box") plots the rankings. Each cluster is shown as a rain-cloud plot (half-violin density, boxplot, and individual observations). Note that most of the node categories are regularized to have the same mean ranking.

Rain-cloud plots of four ranking clusters in the U.S. air traffic network.

The summary table is printed via rr.print_summary_table(summary):

------------------------------------------------------------------------------------------------
| Group   | #Tags |  #Nodes | Members                                  |     Mean |        Std |
------------------------------------------------------------------------------------------------
| 1       |     5 |     825 | CA, WA, OR, TT, AK                       |    0.047 |    1.1e-02 |
| 2       |     4 |     206 | TX, MT, PA, ID                           |   -0.006 |    4.2e-03 |
| 3       |    43 |    1243 | MI, IN, TN, NC, VA, IL, CO, SC, FL, NY,  |   -0.035 |    4.3e-03 |
|         |       |         | MD, KY, NH, MO, CT, AR, ND, VT, NV, UT,  |          |            |
|         |       |         | PR, OK, WV, SD, HI, IA, RI, MS, AL, NE,  |          |            |
|         |       |         | NM, DE, WI, MN, NJ, OH, ME, GA, LA, AZ,  |          |            |
|         |       |         | MA, KS, WY                               |          |            |
| 4       |     1 |       4 | VI                                       |   -0.072 |    0.0e+00 |
------------------------------------------------------------------------------------------------

The result suggests that states such as CA, WA, and AK (Group 1, mean ranking 0.047) rank higher than the other states, most of which share a mean ranking of -0.035 (Group 3).
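
To make the columns of the summary table concrete, here is a minimal sketch of how per-group statistics of this kind could be derived from a list of rankings and a list of tags. It is not regrank's `compute_summary`: `summarize_by_tag`, the tolerance `tol`, and the use of the population standard deviation are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def summarize_by_tag(rankings, labels, tol=1e-3):
    """Group tags with near-equal mean rankings (hypothetical helper).

    A tag joins the current group when its mean ranking lies within `tol`
    of the mean of the first (highest-ranked) tag in that group.
    """
    per_tag = defaultdict(list)
    for score, tag in zip(rankings, labels):
        per_tag[tag].append(score)

    # Sort tags by their mean ranking, highest first.
    tag_means = sorted(((fmean(v), t) for t, v in per_tag.items()), reverse=True)

    groups = []
    for mean, tag in tag_means:
        if groups and abs(groups[-1]["anchor"] - mean) <= tol:
            groups[-1]["tags"].append(tag)
        else:
            groups.append({"anchor": mean, "tags": [tag]})

    summary = []
    for group in groups:
        scores = [s for t in group["tags"] for s in per_tag[t]]
        summary.append({
            "tags": group["tags"],      # the "Members" column
            "n_nodes": len(scores),     # the "#Nodes" column
            "mean": fmean(scores),      # mean over all nodes in the group
            "std": pstdev(scores),      # population std (an assumption)
        })
    return summary
```

Under these assumptions, a group containing a single tag whose nodes all share one ranking, like Group 4 above, would report a standard deviation of zero.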

Data sets

We have a companion repo, regrank-data, which stores the data sets used in the paper. All data sets can be loaded via the regrank.datasets submodule into a graph-tool Graph object.

  • PhD Exchange (PhD_exchange()): Faculty hiring among 230 math PhD programs, 9,584 edges over 1946–2010. Vertex properties: vname, vindex. Edge properties: eweight, etime.
  • US Air Traffic (us_air_traffic()): Yearly flight snapshots among 2,278 U.S. commercial airports, ~6.4M edges (1990–present). Vertex properties: airport_code, state_abr, state_nm, wac, and others. Edge properties: passengers, year, quarter, month, unique_carrier, carrier_name, distance, and others.
  • Dutch School (dutch_school(wave=)): Friendship network among 26 pupils across 4 temporal snapshots. Edge properties: weight.
  • Chess (chess()): Directed network of chess game outcomes, 7,301 players, ~65K games. Edge properties: weight, time.
  • Parakeet (parakeet(group=)): Dominance interactions among 21 parakeets (G1), 1,013 edges over 4 quarters. Vertex properties: name. Edge properties: quarter.
  • Bitcoin OTC (bitcoin_otc()): Signed temporal trust network, 5,881 users, ~35.6K ratings (−10 to +10). Edge properties: eweight, etime.
  • Austrian Migrations (at_migrations()): Migrations between 2,115 Austrian municipalities, ~2.9M edges (2002–2022). Vertex properties: name, code. Edge properties: count, year, sex, nationality.
  • Faculty Hiring (US) (faculty_hiring_us()): PhD-to-hiring flows among 3,284 US universities, ~62K hiring events. Vertex properties: id, name, non_attrition, attrition. Edge properties: total, men, women.
  • Reddit Hyperlinks (reddit_hyperlinks()): Hyperlinks between 35,776 subreddits, ~287K links with sentiment labels (2013–2017). Vertex properties: name. Edge properties: sentiment, timestamp.
  • EU Email (eu_email()): Email communication at a European research institution, 265,214 nodes, ~420K edges. No vertex or edge properties.

Dataset | Time-varying | Huber | Node groups | Edge groups | Scalability
PhD Exchange
US Air Traffic
Dutch School
Chess
Parakeet
Bitcoin OTC
Austrian Migrations
Faculty Hiring (US)
Reddit Hyperlinks
EU Email
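
Since every data set is loaded as a graph-tool Graph, its size and property names can be inspected the same way regardless of which loader was used. The sketch below assumes only the standard graph-tool interface (`num_vertices()`, `num_edges()`, and the `vp`/`ep` property dictionaries); `describe_graph` is a hypothetical helper, not a regrank function.

```python
def describe_graph(g):
    """Summarize the size and property names of a graph-tool Graph.

    Hypothetical helper: it relies only on `num_vertices()`,
    `num_edges()`, and the dict-like `vp` / `ep` attributes.
    """
    return {
        "num_vertices": g.num_vertices(),
        "num_edges": g.num_edges(),
        "vertex_properties": sorted(g.vp.keys()),
        "edge_properties": sorted(g.ep.keys()),
    }
```

With regrank installed, calling `describe_graph(rr.datasets.PhD_exchange())` should, going by the list above, report `vname` and `vindex` among the vertex properties and `eweight` and `etime` among the edge properties.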

Development Notes

We use pytest to ensure consistency and correctness during development. The test suite uses CVXPY's SCS solver to compare results. Other solvers may be used, but they must be installed separately. See CVXPY's installation guide.

If you want to contribute to regrank (thank you!), we recommend setting up the environment as follows: (1) clone this repository with Git and navigate into it; (2) follow Steps 1 to 3 above; (3) install regrank in "editable" mode along with its development dependencies, via uv pip install -e ".[dev]".

Use pre-commit run --all-files for pre-commit checks.

License

regrank is open-source and licensed under the GNU Lesser General Public License v3.0. This means that you are welcome to include this library in your own projects, whether they are open-source or proprietary. The main idea is to allow you to use the library's functionality freely, while ensuring that any improvements made directly to regrank itself are shared back with the community -- through a legal mechanism called "weak copyleft".

Acknowledgments

TCY wants to thank Perplexity.ai and gemini-2.5-pro-preview-06-05.
