Competition Network Analysis Tutorial - Part 1
Code
Contents
- Part 1: Analyzing Example Network Data Sample
(You are here.) - Part 2: Goodness of Fit, Degeneracy, Estimation Algorithm
- Part 3: Adding and Cleaning New data
- Part 4: Computing Period Networks and Covariate Lists
- Part 5: Analyzing Your Network with Updated Data
Acknowledgement
This research was supported in part by MOST-105-2420-H-009-012-DR and MOST-106-2922-I-009-127.
Info
- Intro to Network Analysis in R
- The R network Package
- Intro to Exponential Random Graph Models (ERGM)
- Computing ERGMs in R
- Bootstrapped ERGMs for Big Networks in R
- Temporal ERGMs (TERGM) in R
Tutorial for computations used in Downing, Kang, & Markman (Under Review).
Part 1: Analyzing Example Network Data Sample
In this repository’s R
directory, download the R script amj_run_TERGM_tutorial_1.R
.
Download and save the following RDS (serialized) data file
in the same directory that you save the above script. You can run the script in its entirety simply to get the results, but an explanation of each part is provided below in case you want to change the analysis.
Set the name of the directory where you saved the data file:
##===============================
## SET YOUR DATA DIRECTORY:
## This is the path to the folder where you saved the data file.
## If you are using a Windows PC, use double backslash path separators "..\\dir\\subdir\\.."
##-------------------------------
data_dir <- '/set/your/data/directory/here'
Set parameters for the analysis and load the data into memory:
## analysis parameters
firm_i <- 'qualtrics' ## focal firm
d <- 2 ## ego network theshold (order)
## load RDS data file into memory as a list of networks
# data_file <- file.path(data_dir,sprintf('%s_d%s.rds',firm_i,d))
data_file <- file.path(data_dir, 'tutorial_d2_competition_network_sample.rds')
nets.all <- readRDS(data_file)
len <- length(nets.all)
## set number of time periods
nPeriods <- 8 ## any number from 3 to 11 is OK, but use 8 to compare results with example
## subset network periods
nets <- nets.all[(len-nPeriods+1):len]
Set the model formulas for the Part 1 tutorial:
m0 <- nets ~ edges + gwesp(0, fixed = T) + gwdegree(0, fixed=T) +
nodematch("ipo_status", diff = F) +
nodematch("state_code", diff = F) +
nodecov("age") + absdiff("age") +
memory(type = "stability", lag = 1)
m1 <- nets ~ edges + gwesp(0, fixed = T) + gwdegree(0, fixed=T) +
nodematch("ipo_status", diff = F) +
nodematch("state_code", diff = F) +
nodecov("age") + absdiff("age") +
nodecov("cent_deg") +
memory(type = "stability", lag = 1) +
nodecov("genidx_multilevel") +
nodecov("cent_pow_n0_4") + absdiff("cent_pow_n0_4") +
cycle(3) + cycle(4)
Set the number of bootstrap replications. According to Leifeld, Cranmer, & Desmarais (2018) :
- Roughly 100 is enough for an approximate estimate
- On the order of 1000 or more for reporting results
R <- 100 ## enough for a rough estimate
Compute the first model m0
and save to disk as an RDS (serialized) file:
## set pseudorandom number generator seed for reproducibility
set.seed(1111)
## estimate the TERGM with bootstrapped PMLE
fit0 <- btergm(get('m0'), R=R, parallel = "multicore", ncpus = detectCores())
##
## Initial dimensions of the network and covariates:
## t=2 t=3 t=4 t=5 t=6 t=7 t=8
## nets (row) 180 180 180 180 180 180 180
## nets (col) 180 180 180 180 180 180 180
## memory (row) 180 180 180 180 180 180 180
## memory (col) 180 180 180 180 180 180 180
##
## All networks are conformable.
##
## Dimensions of the network and covariates after adjustment:
## t=2 t=3 t=4 t=5 t=6 t=7 t=8
## nets (row) 180 180 180 180 180 180 180
## nets (col) 180 180 180 180 180 180 180
## memory (row) 180 180 180 180 180 180 180
## memory (col) 180 180 180 180 180 180 180
##
## Starting pseudolikelihood estimation with 100 bootstrapping replications using multicore forking on 4 cores...
## Done.
## SAVE SERIALIZED DATA
fit0_file <- file.path(data_dir,sprintf('fit_%s_pd%s_R%s_%s.rds', firm_i, nPeriods, R, 'm0'))
saveRDS(fit0, file=fit0_file)
Compute the second model m1
and save to disk as an RDS (serialized) file:
## set pseudorandom number generator seed for reproducibility
set.seed(1111)
## estimate the TERGM with bootstrapped PMLE
fit1 <- btergm(get('m1'), R=R, parallel = "multicore", ncpus = detectCores())
##
## Initial dimensions of the network and covariates:
## t=2 t=3 t=4 t=5 t=6 t=7 t=8
## nets (row) 180 180 180 180 180 180 180
## nets (col) 180 180 180 180 180 180 180
## memory (row) 180 180 180 180 180 180 180
## memory (col) 180 180 180 180 180 180 180
##
## All networks are conformable.
##
## Dimensions of the network and covariates after adjustment:
## t=2 t=3 t=4 t=5 t=6 t=7 t=8
## nets (row) 180 180 180 180 180 180 180
## nets (col) 180 180 180 180 180 180 180
## memory (row) 180 180 180 180 180 180 180
## memory (col) 180 180 180 180 180 180 180
##
## Starting pseudolikelihood estimation with 100 bootstrapping replications using multicore forking on 4 cores...
## Done.
## SAVE SERIALIZED DATA
fit1_file <- file.path(data_dir,sprintf('fit_%s_pd%s_R%s_%s.rds', firm_i, nPeriods, R, 'm1'))
saveRDS(fit1, file=fit1_file)
Create a list of model fits. Print the regression table to screen and save it as a formatted HTML file. You should see results like these:
## Cache model fits list
fits <- list(Model_0=fit0,Model_1=fit1)
## Echo model comparison table to screen
screenreg(fits, digits = 3)
##
## =============================================================
## Model_0 Model_1
## -------------------------------------------------------------
## edges -0.180 -1.916 *
## [-1.749; 1.067] [-3.590; -0.606]
## gwesp.fixed.0 0.813 * 0.449 *
## [ 0.658; 0.908] [ 0.152; 0.625]
## gwdegree -1.145 * -0.504
## [-1.910; -0.527] [-1.454; 0.337]
## nodematch.ipo_status 0.132 -0.176
## [-0.259; 1.070] [-0.659; 0.777]
## nodematch.state_code -0.593 * -0.413 *
## [-0.770; -0.387] [-0.710; -0.048]
## nodecov.age -0.139 * -0.119 *
## [-0.199; -0.065] [-0.159; -0.051]
## absdiff.age 0.157 * 0.142 *
## [ 0.085; 0.216] [ 0.075; 0.177]
## edgecov.memory[[i]] 5.229 * 4.907 *
## [ 4.908; 5.769] [ 4.605; 5.482]
## nodecov.cent_deg 0.009
## [-0.071; 0.067]
## nodecov.genidx_multilevel 1.526 *
## [ 0.401; 2.503]
## nodecov.cent_pow_n0_4 0.008
## [-0.117; 0.176]
## absdiff.cent_pow_n0_4 0.203 *
## [ 0.098; 0.450]
## cycle3 0.349 *
## [ 0.121; 0.615]
## cycle4 -0.010
## [-0.054; 0.035]
## -------------------------------------------------------------
## Num. obs. 13787 67288
## =============================================================
## * 0 outside the confidence interval
## SAVE FORMATTED REGRESSION TABLE
compare_file <- file.path(data_dir,sprintf('%s_tergm_results_pd%s_R%s_%s.html', firm_i, nPeriods, R, 'm0-m1'))
htmlreg(fits, digits = 3, file=compare_file)
##
## finished successfully.