Progress Report - Advancements in Solar and Wind Energy in the USA

Key Drivers of the Growth of Solar and Wind Electricity Generation in the USA

Authors

Pingfan Hu

Abbey Kollar

Published

October 29, 2023

1. Research Question & Explanation

1.1 Research Question

Exploring the key drivers of the growth of solar and wind electricity generation in the USA.

1.2 Explanation of the Study

The drivers we considered in this progress report are:

  1. Cost of solar and wind installation
  2. Cost of fossil fuel (take coal as example)
  3. Policies and incentives regarding solar and wind
  4. Energy research budgets

Our study is in 3 stages:

  1. Stage 1: Solar/wind electricity generation vs these 4 drivers. We expect that the above 4 drivers will have positive or negative correlations with the growth of solar and wind electricity generation.
  2. Stage 2: Correlation matrix of all 5 variables. The 4 drivers and the generation add up to 5 variables. We expect that there will be relationships among the drivers, not just the drivers influencing generation.
  3. Stage 3: Expansion on the progress report. New questions have been generated from these initial figures, specifically:
    1. What countries have the highest increase in wind and solar generation?
    2. What is the relationship between wind and solar and petroleum and natural gas in terms of cost and generation? Especially because there has been a rise in natural gas use, unlike coal.
    3. What U.S. states have the highest growth of wind and solar generation?
    4. What are the differences between the generation data in EIA, IRENA, and Our World in Data sources?
    5. How are generation and the drivers related to capacity?

The team will attempt to answer some of these questions, as well as updating some of the plots within this project proposal.

2. Data Source Discussion

We have adjusted our data sources from the proposal to focus on wind and solar energy. We have found sources that cover generation and capacity of energy sources, United States policies, and country R&D spending.

2.1 EIA Data Browser - Solar/Wind Generation & Fossil Cost

This data browser is generated by EIA (2023) (US Energy Information Administration). In this browser, the data can be selected as net generation, consumption, or retail; and the reported data components can be selected based on our purposes. It contains data of different renewable energy sources from 2000 to 2023. The data is pre-processed by EIA, but originated from EIA API. The EIA appears to collect its data from “survey respondents,” which could indicate missing data if not all power companies or organizations are reporting data to the EIA, and there is the potential for bias if only certain types of companies, like large corporations, are reporting data and if the survey questions are inherently biased.

2.2 IRENA - Solar/Wind Installed Cost

IRENA (2023), short for International Renewable Energy Agency, also provides data about renewable energy capacity, power generation, and renewable energy balances. The data is processed by IRENA and was collected from IRENA members using a questionnaire and the gaps in data were filled using desk research. The renewable energy power capacity statistics are updated in March, and the power generation and balances data sets are updated in July. While IRENA tries to fill in missing data through their own research, the data set may still be incomplete. The questionnaire may introduce biases depending on how the questions are worded.

2.3 Our World in Data - Solar/Wind Share & Consumption

The Energy Data Explorer of OurWorldinData (2023) provides data on the consumption, relative change, and share of total energy for all energy sources, including solar and wind and fossil fuels. The explorer allows us to choose specific countries. The data was pre-processed by Our World in Data and was formerly published by the Energy Institute Statistical Review of World Energy (2023). The Energy Institute analyzes data on world energy markets from the prior year and has been providing this “timely, comprehensive and objective data to the energy community since 1952.” The institute appears to partner with other institutions to obtain data, like energy companies, and gets information from government sources and published data, thus the data from Our World in Data has been processed twice. While the original data sources are government sources and published data, there can be gaps in the data if not all power producers are reporting data to the governments. Also, there could be bias due to which data sources are used (i.e. selection bias), and the report includes methodologies for manipulating the data so there could be uncertainty and biases with these underlying assumptions.

2.4 DSIRE - Solar/Wind Laws & Incentives

DSIRE (2023) (Database of State Incentives for Renewables and Efficiency) provides state and federal policy related to renewable energy and efficiency. This data has been pre-processed by the N.C. Clean Energy Technology Center at N.C. State University and provides summaries of incentives and policies established by the federal government, state governments and U.S. territories, local governments, and larger electric and gas utilities in the U.S. The website does not state how N.C. collected this information, but it is likely from publicly available websites and news reports. The data does not include federal funding opportunities for R&D, demonstration projects, outreach, or inter-governmental programs nor bills that have not been enacted or regulations that have not been fully approved. The scope is limited to local incentives and policies that are especially innovative, and to municipalities and counties with the largest populations, which results in missing data and biases for what qualifies as “innovative.” The data is typically updated at least once a year.

2.5 IEA - Energy Technology RD&D Budgets

This data set from the IEA (2023) contains spending on energy technologies within IEA countries since 1974. The data was processed by the IEA and collected from central or federal government budgets, as well as the budgets of state-owned companies, for spending on a range of sectors including energy efficiency, renewables, nuclear power, fossil fuels, hydrogen and fuel cells, and more. There is missing data because not all countries are IEA members and perhaps not all spending is reflected in the budgets, which also could introduce biases.

3. Figures and Evaluations

3.1 Solar and Wind Generation

The EIA provides the solar and wind generation data. We need to firstly perform data wrangling and exporting:

Code
# Raw data wrangling & exporting
solar_gen_path <- here('data', 'eia_solar_gen_raw.csv')
solar_gen <- read_csv(solar_gen_path, skip = 4) %>% 
    clean_names() %>% 
    select('month', 'united_states_thousand_megawatthours') %>% 
    arrange(month) %>% 
    mutate(
        month = dmy(paste('01', month)),
        month = format(month, '%Y-%m'),
        year = as.numeric(substr(month, 1, 4))
    ) %>% 
    filter(year <= 2022) %>% 
    group_by(year) %>% 
    summarise(solar_gen_gwh = sum(united_states_thousand_megawatthours)) %>% 
    write_csv(here('data', 'eia_solar_gen.csv'))

wind_gen_path <- here('data', 'eia_wind_gen_raw.csv')
wind_gen <- read_csv(wind_gen_path, skip = 4) %>% 
    clean_names() %>% 
    select('month', 'united_states_thousand_megawatthours') %>% 
    arrange(month) %>% 
    mutate(
        month = dmy(paste('01', month)),
        month = format(month, '%Y-%m'),
        year = as.numeric(substr(month, 1, 4))
    ) %>% 
    filter(year <= 2022) %>% 
    group_by(year) %>% 
    summarise(wind_gen_gwh = sum(united_states_thousand_megawatthours)) %>% 
    write_csv(here('data', 'eia_wind_gen.csv'))

# Long format & exporting
solar_wind_gen_longer <- inner_join(solar_gen, wind_gen, by = 'year') %>% 
    pivot_longer(names_to = 'energy_source',
                 values_to = 'gwh',
                 cols = c('solar_gen_gwh', 'wind_gen_gwh')) %>% 
    write_csv(here('data', 'eia_solar_wind_gen_longer.csv'))

Then, we use the prepared dataset to create the plot:

Code
solar_wind_gen_longer %>% 
    ggplot(aes(x = year,
               y = gwh / 1000,
               color = energy_source)) +
    geom_line(linewidth = 1) +
    geom_label(data = subset(solar_wind_gen_longer, year == max(year)),
               aes(label = recode(energy_source, 
                                  solar_gen_gwh = 'Solar', 
                                  wind_gen_gwh = 'Wind')),
               nudge_x = -0.5,
               nudge_y = 10) +
    labs(x = 'Year',
         y = 'Terawatt-Hour (TWH)',
         color = 'Energy Source',
         title = 'Solar and Wind Energy Generation in the USA',
         subtitle = 'From 2000 to 2022') +
    scale_x_continuous(breaks = seq(2000, 2022, by = 2)) +
    theme_bw(base_family = 'Ubuntu') +
    theme(legend.position = 'none')

In our proposal, we made predictions about capacity and the relationships of the drivers to capacity; however, we have shifted in the progress report to focus on generation since that is the actual amount of electricity generated in a given time period. We had expected capacity to be unimodal and widely spread, and we expected this distribution for generation as well. The plot of energy generation supports those expectations. Both sources are steadily increasing in annual generation and have yet to plateau, and the generation is widely-spread from about 0 TWH to 150 and over 400 TWH for solar and wind, respectively. The generation of wind has increased at a faster rate than solar, which is unexpected due to the decrease in solar costs and relatively steady wind costs (see 3.2). However, the cost of solar was higher than wind until 2018 despite the steady decrease in costs, so perhaps more wind projects were deployed from 2000 to 2018.

3.2 Cost of Solar and Wind Installation

The price of solar and wind technologies can be described by the installed costs of these projects. The team has acquired the weighted cost data of the USA from IRENA. The installed costs can be plotted based on these datasets.

IRENA provides the weighted installed costs of solar and wind projects from 2010 to 2019. We firstly manipulate with the data:

Code
# Read solar and wind cost data and merge to one
solar_cost <- read_csv(here('data', 'irena_solar_installed_cost.csv'))
wind_cost <- read_csv(here('data', 'irena_wind_installed_cost.csv'))
solar_wind_cost <- left_join(solar_cost, wind_cost, by = 'year')

# Pivot to longer format and export to csv
solar_wind_cost_longer <- solar_wind_cost %>% 
    pivot_longer(names_to = 'energy_source',
                 values_to = 'cost',
                 cols = -year) %>% 
    write_csv(here('data', 'irena_solar_wind_installed_cost_longer.csv'))

Then, we generate the plot of both solar and wind installed costs:

Code
solar_wind_cost_longer %>% 
    ggplot(aes(x = year,
               y = cost,
               color = energy_source)) +
    geom_line(linewidth = 1) +
    geom_label(data = subset(solar_wind_cost_longer, year == max(year)),
               aes(label = recode(energy_source, 
                                  solar_installed_cost_usd_per_kw = 'Solar', 
                                  wind_installed_cost_usd_per_kw = 'Wind')),
               nudge_x = 0.6,
               nudge_y = 10) +
    labs(x = 'Year',
         y = 'Installed Cost (USD/kW)',
         color = 'Energy Source',
         title = 'Solar and Wind Installed Costs in the USA',
         subtitle = 'In USD/kW, From 2010 to 2019') +
    scale_x_continuous(breaks = seq(2010, 2019, by = 1)) +
    theme_bw(base_family = 'Ubuntu') +
    theme(legend.position = 'none')

It was expected that the cost for renewables would be multimodal due to yearly variations but that the overall trend would be decreasing and that the grouping would range depending on the type of technology, specifically the capital cost elasticity. The plot of the cost of solar installed in the U.S. shows a decreasing trend from 2010 and is widely-spread from around 4500 to 1000 USD/kW, which mostly supports the expectations except for the expected yearly variability causing a multimodal distribution. The cost of installing wind projects appears to be multimodal with one large peak in 2010 and a smaller range from about 2325 to 1500 USD/kW. It makes sense that the cost of solar installation has a larger range than wind cost because of the technological solar panel advancements, which have made the capital cheaper, while wind turbines have not seen such progress.

3.3 Cost of Fossil Fuel (Coal)

The EIA provides the cost of electricity generation using coal, which can be used as a comparison. Firstly, we perform data wrangling and data exporting:

Code
fossil_cost_path <- here('data', 'eia_fossil_cost_raw.csv')
fossil_cost <- read_csv(fossil_cost_path, skip = 4) %>% 
    clean_names() %>% 
    arrange(month) %>% 
    mutate(
        month = dmy(paste('01', month)),
        month = format(month, '%Y-%m'),
        year = as.numeric(substr(month, 1, 4))
    ) %>% 
    filter(year <= 2022) %>% 
    group_by(year) %>% 
    summarise(avg_coal_cost_per_ton = 
                  mean(united_states_coal_dollars_per_tons)) %>% 
    write_csv(here('data', 'eia_fossil_cost.csv'))

Then, we plot the cost of coal:

Code
fossil_cost %>% 
    filter(year >= 2010 & year <= 2019) %>% 
    ggplot(aes(x = year,
               y = avg_coal_cost_per_ton)) +
    geom_ribbon(aes(ymin = 33,
                    ymax = avg_coal_cost_per_ton),
                alpha = 0.4) +
    geom_line() +
    labs(x = 'Year',
         y = 'USD / Ton',
         title = 'Average Cost of Coal Used for Electricity Generation in the USA',
         subtitle = 'In USD/Ton, From 2010 to 2019') +
    scale_x_continuous(limits = c(2010, 2019),
                       breaks = seq(2010, 2018, by = 1),
                       expand = c(0,0)) +
    scale_y_continuous(limits = c(33, 48),
                       breaks = seq(33, 48, by = 3),
                       expand = expansion(mult = c(0, 0.1))) +
    theme_bw(base_family = 'Ubuntu')

The price of coal was expected to be multimodal due to yearly fluctuations in prices but tightly-grouped since fossil fuels have been a long term energy source in the US with no recent major developments in technological efficiency or resource extraction. The graph supports these expectations as there has been fluctuations in the price resulting in a peak around 2011 and 2014, and the cost has varied slightly from 46.5 to 36 USD/Ton. It is interesting that the cost of coal appears to align with the presidential administrations (Obama 2009-2017, Trump 2017-2021, Biden starting in 2021), perhaps indicating fossil fuel subsidies or political pushes towards renewables.

3.4 Policies and Incentives

The DSIRE (Database of State Incentives for Renewables & Efficiency) provides a comprehensive collection of policies and incentives in terms of sustainability. There are 4 subcategories:

  1. Electric Vehicles
  2. Energy Efficiency
  3. Energy Storage
  4. Renewable Energy

This website is a great source of policies and incentives, but it doesn’t provide direct link of csv files. A work-around would be web scrapping using R codes. We decided to obtain the complete table, and tailor it later based on our needs.

The table in this page is dynamic and is not revealed only by providing the URL. We suspect it’s controlled by some JavaScript codes. A work-around is to store the table part of the HTML codes.

Below are the codes of scrapping the table:

Code
# Read the html from local path
laws_html_path <- here('data', 'dsire_laws_raw.html')
laws_html <- read_html(laws_html_path)

# Scrape the table as a df
laws_table <- laws_html %>% 
  html_nodes('table') %>% 
  .[[1]] %>% 
  html_table()

# Save the df to csv
write_csv(laws_table, here('data', 'dsire_laws_raw.csv'))

The raw table needs to be modified and saved as a preparation of plotting:

Code
# Solar laws
solar_laws <- read_csv(here('data', 'dsire_laws_raw.csv')) %>% 
    filter(str_detect(`Policy/Incentive Type`, 'Solar')) %>%
    mutate(Created = as.integer(substr(Created, 7, 10))) %>% 
    count(Created) %>% 
    rename(year = Created, count_of_solar_laws = n) %>% 
    write_csv(here('data', 'dsire_solar_laws.csv'))

# Wind laws
wind_laws <- read_csv(here('data', 'dsire_laws_raw.csv')) %>% 
    filter(str_detect(`Policy/Incentive Type`, 'Wind')) %>%
    mutate(Created = as.integer(substr(Created, 7, 10))) %>% 
    count(Created) %>% 
    rename(year = Created, count_of_wind_laws = n) %>% 
    write_csv(here('data', 'dsire_wind_laws.csv'))

Then we do the plotting.

Firstly, we plot the number of solar policies:

Code
solar_laws %>% 
    filter(year <= 2020) %>% 
    ggplot(aes(x = year, y = count_of_solar_laws)) +
    
    # Shading Layer
    geom_rect(aes(xmin = -Inf, xmax = 2001,
                  ymin = -Inf, ymax = Inf),
              fill = 'mediumaquamarine', alpha = 0.1) +
    geom_rect(aes(xmin = 2001, xmax = 2009,
                  ymin = -Inf, ymax = Inf),
              fill = 'moccasin', alpha = 0.1) +
    geom_rect(aes(xmin = 2009, xmax = 2017,
                  ymin = -Inf, ymax = Inf),
              fill = 'lightblue', alpha = 0.1) +
    geom_rect(aes(xmin = 2017, xmax = Inf,
                  ymin = -Inf, ymax = Inf),
              fill = 'lightcoral', alpha = 0.1) +
    
    # Column Layer
    geom_col() +
    
    # Label Layer
    geom_label(aes(x = 2000, y = 36, label = 'Clinton'),
               fill = 'mediumaquamarine', nudge_x = -0.2, nudge_y = 5) +
    geom_label(aes(x = 2005, y = 36, label = 'Bush'),
               fill = 'moccasin', nudge_x = 0, nudge_y = 5) +
    geom_label(aes(x = 2013, y = 36, label = 'Obama'),
               fill = 'lightblue', nudge_x = 0, nudge_y = 5) +
    geom_label(aes(x = 2019, y = 36, label = 'Trump'),
               fill = 'lightcoral', nudge_x = 0.2, nudge_y = 5) +
    
    labs(x = 'Year',
         y = 'Number of Solar Policies',
         title = 'Number of Solar Policies by Year in the USA',
         subtitle = 'From 2000 to 2020') +
    scale_x_continuous(breaks = seq(2000, 2020, by = 2)) +
    theme_bw(base_family = 'Ubuntu')

Then, we plot the number of wind policies:

Code
wind_laws %>% 
    ggplot(aes(x = year,
               y = count_of_wind_laws)) +
    
    # Shading Layer
    geom_rect(aes(xmin = -Inf, xmax = 2001,
                  ymin = -Inf, ymax = Inf),
              fill = 'mediumaquamarine', alpha = 0.1) +
    geom_rect(aes(xmin = 2001, xmax = 2009,
                  ymin = -Inf, ymax = Inf),
              fill = 'moccasin', alpha = 0.1) +
    geom_rect(aes(xmin = 2009, xmax = 2017,
                  ymin = -Inf, ymax = Inf),
              fill = 'lightblue', alpha = 0.1) +
    geom_rect(aes(xmin = 2017, xmax = Inf,
                  ymin = -Inf, ymax = Inf),
              fill = 'lightcoral', alpha = 0.1) +
    
    # Column Layer
    geom_col() +
    
    # Label Layer
    geom_label(aes(x = 2000, y = 36, label = 'Clinton'),
               fill = 'mediumaquamarine', nudge_x = -0.2, nudge_y = 5) +
    geom_label(aes(x = 2005, y = 36, label = 'Bush'),
               fill = 'moccasin', nudge_x = 0, nudge_y = 5) +
    geom_label(aes(x = 2013, y = 36, label = 'Obama'),
               fill = 'lightblue', nudge_x = 0, nudge_y = 5) +
    geom_label(aes(x = 2019, y = 36, label = 'Trump'),
               fill = 'lightcoral', nudge_x = 0.2, nudge_y = 5) +
    
    labs(x = 'Year',
         y = 'Number of Wind Policies',
         title = 'Number of Wind Policies by Year in the USA',
         subtitle = 'From 2000 to 2020') +
    scale_x_continuous(breaks = seq(2000, 2020, by = 2)) +
    theme_bw(base_family = 'Ubuntu')

Note that some policies will both contain the topics of wind and solar, so there is an overlap in solar and wind policies.

It was expected that the number of policies would be multimodal and widely-spread due to changes in administration. The plots for wind and solar policies support these expectations. It makes sense that the largest number of wind and solar policies occurred in 2000 while the highest generation of wind and solar has been in the more recent years because there is usually a policy-implementation lag. Also, 2000 was near the end of Clinton’s presidential term, so more progressive policies may have been instituted at this time. There is a clear association of policies enacted depending on the presidential term (Clinton 1993-2001, Bush 2001-2009, Obama 2009-2017, Trump 2017-2021): Clinton and Obama show higher numbers of policies than Trump and Bush. Moving forward, we will consider adding color to indicate if the congressional majority was the same party as the president.

3.5 Energy Research Budgets

The iea provides a comprehensive collection of energy technology RD&D budgets. It can be traced back to 1974 and ends at 2015. We filtered only the solar and wind renewable energy categories and hand scripted the data into a local csv file.

Let’s read and wrangle the file:

Code
# Store the wider format for later use
solar_wind_budget_wider <- read_csv(
    here('data', 'iea_solar_wind_budget_wider.csv')
    )

# This is the longer format
solar_wind_budget_longer <- read_csv(
    here('data', 'iea_solar_wind_budget_wider.csv')
    ) %>% 
    pivot_longer(cols = -year,
                 names_to = 'energy_source',
                 values_to = 'rdd_millions_usd')

Then, let’s do the plot:

Code
label_data <- subset(solar_wind_budget_longer, year == max(year))
label_data$nudge <- ifelse(label_data$energy_source == 
                               'solar_rdd_million_usd', 220, -100)

solar_wind_budget_longer %>% 
    ggplot(aes(x = year,
               y = rdd_millions_usd,
               color = energy_source)) +
    geom_line(linewidth = 1) +
    geom_label(data = label_data,
               aes(label = recode(energy_source, 
                                  solar_rdd_million_usd = 'Solar', 
                                  wind_rdd_million_usd = 'Wind')),
               nudge_x = -0.5,
               nudge_y = label_data$nudge) + 
    labs(x = 'Year',
         y = 'RD&D Budget (Million USD)',
         color = 'Energy Source',
         title = 'Solar and Wind Energy Budgets in the USA',
         subtitle = 'In Million USD, From 1974 to 2015') +
    scale_x_continuous(breaks = seq(1975, 2015, by = 5)) +
    theme_bw(base_family = 'Ubuntu') +
    theme(legend.position = 'none')

It was expected that the amount of funding for renewable energy research would also be multimodal and widely-spread because of administration changes. The multimodal expectation was supported by the figure with peaks occurring around 1980, 1995, and 2010 for wind and solar, but solar was wide-spread, while wind was more tightly-grouped. This figure indicates a funding-implementation lag, as the most funding for solar occurred in the late 1970s and early 1980s, but the decline in solar costs is more recent. The boom in solar funding in the late 1970s could be related to the start of the environmental movement. The second major increase in funding from 2009-2015 overlaps with Obama’s administration.

3.6 Correlations

The relationships we expected to find among these variables were:

  1. The price of renewable energy is negatively correlated with renewable energy generation.
  2. The price of fossil fuels is positively correlated with renewable energy generation.
  3. The number of promotional policies is positively correlated with renewable energy generation.
  4. The amount of funding for renewable research is positively correlated with renewable energy generation.

3.6.1 Solar/Wind Generation vs 4 Drivers

In order to perform the correlations, we need to merge the datasets:

Code
# Both solar and wind
merged_solar_wind <- fossil_cost %>%
    full_join(solar_cost, by = 'year') %>%
    full_join(solar_gen, by = 'year') %>%
    full_join(solar_laws, by = 'year') %>%
    full_join(solar_wind_budget_wider, by = 'year') %>%
    full_join(wind_cost, by = 'year') %>%
    full_join(wind_gen, by = 'year') %>%
    full_join(wind_laws, by = 'year') %>% 
    arrange(year) %>% 
    write_csv(here('data', 'merged_solar_wind.csv'))

Then, we calculate the correlation coefficients between solar generation (solar_gen) and other solar related variables. They are:

  • Solar installed cost - solar_installed_cost_usd_per_kw
  • Solar related laws - count_of_solar_laws
  • Solar technology budget - solar_rdd_million_usd
  • Coal cost (representing traditional power source) - avg_coal_cost_per_ton
Code
cor_solar_gen_coal_cost <- cor(
    merged_solar_wind$solar_gen_gwh,
    merged_solar_wind$avg_coal_cost_per_ton,
    use = 'pairwise.complete.obs'
    )
cor_solar_gen_solar_cost <- cor(
    merged_solar_wind$solar_gen_gwh,
    merged_solar_wind$solar_installed_cost_usd_per_kw,
    use = 'pairwise.complete.obs'
    )
cor_solar_gen_solar_laws <- cor(
    merged_solar_wind$solar_gen_gwh,
    merged_solar_wind$count_of_solar_laws,
    use = 'pairwise.complete.obs'
    )
cor_solar_gen_solar_rdd <- cor(
    merged_solar_wind$solar_gen_gwh,
    merged_solar_wind$solar_rdd_million_usd,
    use = 'pairwise.complete.obs'
    )

Then, we calculate the correlation coefficients between wind generation (wind_gen) and other wind related variables. They are:

  • Wind installed cost - wind_installed_cost_usd_per_kw
  • Wind related laws - count_of_wind_laws
  • Wind technology budget - wind_rdd_million_usd
  • Coal cost (representing traditional power source) - avg_coal_cost_per_ton
Code
cor_wind_gen_coal_cost <- cor(
    merged_solar_wind$wind_gen_gwh,
    merged_solar_wind$avg_coal_cost_per_ton,
    use = 'pairwise.complete.obs'
    )
cor_wind_gen_wind_cost <- cor(
    merged_solar_wind$wind_gen_gwh,
    merged_solar_wind$wind_installed_cost_usd_per_kw,
    use = 'pairwise.complete.obs'
    )
cor_wind_gen_wind_laws <- cor(
    merged_solar_wind$wind_gen_gwh,
    merged_solar_wind$count_of_wind_laws,
    use = 'pairwise.complete.obs'
    )
cor_wind_gen_wind_rdd <- cor(
    merged_solar_wind$wind_gen_gwh,
    merged_solar_wind$wind_rdd_million_usd,
    use = 'pairwise.complete.obs'
    )

We merge all these values into a data frame:

Code
cor_solar_wind <- data.frame(value_1 = c('Solar',
                                         'Solar',
                                         'Solar',
                                         'Solar',
                                         'Wind',
                                         'Wind',
                                         'Wind',
                                         'Wind'),
                             value_2 = c('Coal Cost',
                                         'Installed Cost',
                                         'Laws',
                                         'Budgets',
                                         'Coal Cost',
                                         'Installed Cost',
                                         'Laws',
                                         'Budgets'),
                             cor = c(cor_solar_gen_coal_cost,
                                     cor_solar_gen_solar_cost,
                                     cor_solar_gen_solar_laws,
                                     cor_solar_gen_solar_rdd,
                                     cor_wind_gen_coal_cost,
                                     cor_wind_gen_wind_cost,
                                     cor_wind_gen_wind_laws,
                                     cor_wind_gen_wind_rdd)) %>% 
    write_csv(here('data', 'cor_solar_wind.csv'))

The summary table of solar/wind generation vs the 4 other variables is shown below:

Code
cor_solar_wind %>% 
    mutate(cor = round(cor, 2),
           cor = cell_spec(
               cor, 'html', 
               color = ifelse(
                   cor > 0.5 | cor < -0.5, 'white', 'black'
                   ),
               background = ifelse(
                   cor > 0.5 | cor < -0.5, 'cornflowerblue', 'white'
                   ))) %>%
    kable(format = 'html',
          escape = FALSE,
          align = c('l', 'l', 'l'),
          col.names = c('Energy Type', 'Compared Variable', 'Correlation'),
          caption = 'Correlations of Solar/Wind Generation vs the Other 4 Variables') %>%
    kable_styling(bootstrap_options = c('striped', 'hover'))
Correlations of Solar/Wind Generation vs the Other 4 Variables
Energy Type Compared Variable Correlation
Solar Coal Cost -0.54
Solar Installed Cost -0.96
Solar Laws 0.28
Solar Budgets -0.21
Wind Coal Cost -0.52
Wind Installed Cost -0.18
Wind Laws 0.1
Wind Budgets 0.2

We found support for two of the expected relationships shown in the correlation coefficient figure: both solar and wind installed costs were negatively correlated with generation, and laws were positively correlated with solar and wind generation. Only wind had a positive correlation between R&D funding and generation, while solar had an unexpected negative correlation. Recall with the RD&D graph, a large amount of solar funding occurred before the increase in generation, likely causing this discrepancy. Additionally, the cost of coal had an unexpected negative correlated with solar and wind generation. This discrepancy is likely due to the fact that the cost of coal has remained relatively steady, and there might be a response-lag between the cost of coal increasing and the generation of solar and wind increasing.

3.6.2 Correlation Matrix

The above section only calculated the correlation coefficients between Solar/Wind Generation and the 4 other variables, which did not provide sufficient convincing results. A better solution is to work out the correlation matrix, so that all possible correlations are clearly shown. We can search for the most correlated variables based on this matrix. To search for the most correlated variables, we merged the datasets into merged_solar and merged_wind:

Code
# Solar
merged_solar <- fossil_cost %>%
    full_join(solar_cost, by = 'year') %>%
    full_join(solar_gen, by = 'year') %>%
    full_join(solar_laws, by = 'year') %>%
    full_join(solar_wind_budget_wider %>%
                  select(-wind_rdd_million_usd),
              by = 'year') %>% 
    arrange(year) %>% 
    write_csv(here('data', 'merged_solar.csv'))

# Wind
merged_wind <- fossil_cost %>%
    full_join(wind_cost, by = 'year') %>%
    full_join(wind_gen, by = 'year') %>%
    full_join(wind_laws, by = 'year') %>% 
    full_join(solar_wind_budget_wider %>%
                  select(-solar_rdd_million_usd),
              by = 'year') %>% 
    arrange(year) %>% 
    write_csv(here('data', 'merged_wind.csv'))

Then, we calculate the correlation matrix for this merged data:

Code
cor_matrix_solar <- cor(
    merged_solar[, sapply(merged_solar, is.numeric)],
    use = 'pairwise.complete.obs'
    ) %>% 
    as.data.frame()
cor_matrix_wind <- cor(
    merged_wind[, sapply(merged_wind, is.numeric)],
    use = 'pairwise.complete.obs'
    ) %>% 
    as.data.frame()

We set a threshold of \(\pm 0.8\) to indicate a “high” correlation. There are 5 high correlation coefficients in solar, and 2 in wind.

For solar:

Code
high_cor_solar <- cor_matrix_solar %>%
    mutate(value_1 = row.names(.)) %>%
    gather(value_2, cor, -value_1) %>%
    filter(value_1 != value_2,
         cor > 0.8 | cor < -0.8) %>% 
    mutate(min_var = pmin(value_1, value_2),
           max_var = pmax(value_1, value_2)) %>%
    distinct(min_var, max_var, .keep_all = TRUE) %>%
    select(-min_var, -max_var) %>% 
    arrange(value_1) %>% 
    write_csv(here('data', 'high_cor_solar.csv'))
  1. Solar Generation vs Year: 0.84.
  2. Solar Generation vs Solar Installed Cost: -0.96.
  3. Solar Installed Cost vs Year: -0.98.
  4. Solar Installed Cost vs Coal Cost: 0.90.
  5. Solar Research Budget vs Solar Installed Cost: 0.87.

For wind:

Code
high_cor_wind <- cor_matrix_wind %>%
    mutate(value_1 = row.names(.)) %>%
    gather(value_2, cor, -value_1) %>%
    filter(value_1 != value_2,
         cor > 0.8 | cor < -0.8) %>% 
    mutate(min_var = pmin(value_1, value_2),
           max_var = pmax(value_1, value_2)) %>%
    distinct(min_var, max_var, .keep_all = TRUE) %>%
    select(-min_var, -max_var) %>% 
    arrange(value_1) %>% 
    write_csv(here('data', 'high_cor_wind.csv'))
  1. Wind Generation vs Year: 0.97.
  2. Wind Installed Cost vs Year: -0.81.

The high correlations associated with year have been previously discussed in the variable expectations section and those associated with generation were discussed in the last section; therefore, there were only two new strong correlations produced via this analysis. The two correlations were solar installed cost vs coal cost and solar research budget vs solar installed cost. Solar installed cost was strongly, positively correlated with coal cost, but wind cost was not, which is likely due to the steady decline in the price of solar while the price of wind has been more steady. Solar R&D funding was strongly, positively correlated with installed cost, which may indicate that R&D had been focused on cost efficiency of the solar panels via innovative, cheaper materials.

3.6.3 High Correlation Plots

In the previous part, we found 5 high correlations in solar and 2 in wind. Let’s generate the scatter plots to visualize them:

Code
# Plot 1: Solar Generation vs Year
plot_1 <- merged_solar_wind %>%
    filter(!is.na(solar_gen_gwh)) %>% 
    ggplot(aes(x=year, y=solar_gen_gwh / 1000)) +
    geom_point() +
    geom_smooth(method='lm', se=FALSE, size = 0.5,
                color='cornflowerblue') +
    labs(x = 'Year',
         y = 'Solar Generation (TWH)',
         title='Solar Generation vs Year',
         subtitle = 'Correlation = 0.84') +
    theme_bw(base_family = 'Ubuntu')

# Plot 2: Solar Generation vs Solar Installed Cost
plot_2 <- merged_solar_wind %>%
    filter(!is.na(solar_gen_gwh) & !is.na(solar_installed_cost_usd_per_kw)) %>% 
    ggplot(aes(x=solar_installed_cost_usd_per_kw, y=solar_gen_gwh / 1000)) +
    geom_point() +
    geom_smooth(method='lm', se=FALSE, size = 0.5,
                color='cornflowerblue') +
    labs(x = 'Solar Installed Cost (USD / kW)',
         y = 'Solar Generation (TWH)',
         title='Solar Generation vs Installed Cost',
         subtitle = 'Correlation = -0.96') +
    theme_bw(base_family = 'Ubuntu')

# Plot 3: Solar Installed Cost vs Year
plot_3 <- merged_solar_wind %>%
    filter(!is.na(solar_installed_cost_usd_per_kw)) %>% 
    ggplot(aes(x=year, y=solar_installed_cost_usd_per_kw)) +
    geom_point() +
    geom_smooth(method='lm', se=FALSE, size = 0.5,
                color='cornflowerblue') +
    labs(x = 'Year',
         y = 'Solar Installed Cost (USD / kW)',
         title='Solar Installed Cost vs Year',
         subtitle = 'Correlation = -0.98') +
    theme_bw(base_family = 'Ubuntu')

# Plot 4: Solar Installed Cost vs Coal Cost
plot_4 <- merged_solar_wind %>%
    filter(!is.na(solar_installed_cost_usd_per_kw) &
               !is.na(avg_coal_cost_per_ton)) %>% 
    ggplot(aes(x=avg_coal_cost_per_ton,
               y=solar_installed_cost_usd_per_kw)) +
    geom_point() +
    geom_smooth(method='lm', se=FALSE, size = 0.5,
                color='cornflowerblue') +
    labs(x = 'Average Coal Cost (USD / Ton)',
         y = 'Solar Installed Cost (USD / kW)',
         title='Solar Installed Cost vs Coal Cost',
         subtitle = 'Correlation = 0.90') +
    theme_bw(base_family = 'Ubuntu')

# Plot 5: Solar Research Budget vs Solar Installed Cost
plot_5 <- merged_solar_wind %>%
    filter(!is.na(solar_installed_cost_usd_per_kw) & !is.na(solar_rdd_million_usd)) %>% 
    ggplot(aes(x=solar_installed_cost_usd_per_kw, y=solar_rdd_million_usd)) +
    geom_point() +
    geom_smooth(method='lm', se=FALSE, size = 0.5,
                color='cornflowerblue') +
    labs(x = 'Solar Installed Cost (USD / kW)',
         y = 'Solar RD&D (Million USD)',
         title='Solar RD&D vs Installed Cost',
         subtitle = 'Correlation = 0.87') +
    theme_bw(base_family = 'Ubuntu')

# Plot 6: Wind Generation vs Year
plot_6 <- merged_solar_wind %>%
    filter(!is.na(wind_gen_gwh)) %>% 
    ggplot(aes(x = year,
               y = wind_gen_gwh / 1000)) +
    geom_point() +
    geom_smooth(method='lm', se=FALSE, size = 0.5,
                color='cornflowerblue') +
    labs(x = 'Year',
         y = 'Wind Generation (TWH)',
         title='Wind Generation vs Year',
         subtitle = 'Correlation = 0.97') +
    theme_bw(base_family = 'Ubuntu')

# Plot 7: Wind Installed Cost vs Year
plot_7 <- merged_solar_wind %>%
    filter(!is.na(wind_installed_cost_usd_per_kw)) %>% 
    ggplot(aes(x = year,
               y = wind_installed_cost_usd_per_kw)) +
    geom_point() +
    geom_smooth(method='lm', se=FALSE, size = 0.5,
                color='cornflowerblue') +
    labs(x = 'Year',
         y = 'Wind Installed Cost (USD / kW)',
         title='Wind Installed Cost vs Year',
         subtitle = 'Correlation = -0.81') +
    theme_bw(base_family = 'Ubuntu')

Show the solar plots:

Code
grid.arrange(plot_1, plot_2, ncol = 2)

Code
grid.arrange(plot_3, plot_4, ncol = 2)

Code
plot_5

Show the wind plots:

Code
grid.arrange(plot_6, plot_7, ncol = 2)

After plotting these high correlation relationships, it becomes more obvious which relationships are more convincing than others. Solar generation vs year appears to be more exponential rather than linear. Solar RD&D vs installed cost has very few data points. Finally, wind installed cost vs year has an odd curvature after 2005.

3.7 Summary Tables

This section generates 3 summary tables of solar, wind, and fossil (coal).

3.7.1 Summary of Solar

Below is the summary of solar data, including solar electricity generation, installed cost, laws, and RD&D budget:

Code
# Calculate the summary and export to csv
summary_solar_gen <- summary(solar_gen$solar_gen_gwh)
summary_solar_cost <- summary(solar_cost$solar_installed_cost_usd_per_kw)
summary_solar_laws <- summary(solar_laws$count_of_solar_laws)
summary_solar_budget <- summary(solar_wind_budget_wider$solar_rdd_million_usd)

summary_solar_df <- data.frame(
    tier = names(summary_solar_gen),
    solar_gen_gwh = round(as.numeric(summary_solar_gen), 2),
    solar_cost_usd_per_kw = round(as.numeric(summary_solar_cost), 2),
    solar_laws_count = round(as.numeric(summary_solar_laws)),
    solar_budget_million_usd = round(as.numeric(summary_solar_budget), 2)
    ) %>% 
    write_csv(here('data', 'summary_solar.csv'))

# Genarate the table using kable()
summary_solar_df %>% 
    kable(format = 'html',
          escape = FALSE,
          col.names = c('Tier',
                        'Electricity Generation (GWh)',
                        'Installed Cost (USD/KW)',
                        'Laws Count',
                        'RD&D Budget (Million USD)'),
          caption = 'Summary of Solar') %>%
    kable_styling(bootstrap_options = c('striped', 'hover'))
Summary of Solar
Tier Electricity Generation (GWh) Installed Cost (USD/KW) Laws Count RD&D Budget (Million USD)
Min. 507.71 1220.57 1 17.70
1st Qu. 584.31 1959.80 2 132.98
Median 3072.19 2728.27 7 172.85
Mean 29080.30 2985.99 9 283.62
3rd Qu. 48978.68 4297.55 12 276.75
Max. 145598.31 4666.57 45 1168.50

3.7.2 Summary of Wind

Below is the summary of wind data, including wind electricity generation, installed cost, laws, and RD&D budget:

Code
# Calculate the summary and export to csv
summary_wind_gen <- summary(wind_gen$wind_gen_gwh)
summary_wind_cost <- summary(wind_cost$wind_installed_cost_usd_per_kw)
summary_wind_cost <- summary_wind_cost[!names(summary_wind_cost) == "NA's"]
summary_wind_laws <- summary(wind_laws$count_of_wind_laws)
summary_wind_budget <- summary(solar_wind_budget_wider$wind_rdd_million_usd)

summary_wind_df <- data.frame(
    tier = names(summary_wind_gen),
    wind_gen_gwh = round(as.numeric(summary_wind_gen), 2),
    wind_cost_usd_per_kw = round(as.numeric(summary_wind_cost), 2),
    wind_laws_count = round(as.numeric(summary_wind_laws)),
    wind_budget_million_usd = round(as.numeric(summary_wind_budget), 2)
    ) %>% 
    write_csv(here('data', 'summary_wind.csv'))

# Genarate the table using kable()
summary_wind_df %>% 
    kable(format = 'html',
          escape = FALSE,
          col.names = c('Tier',
                        'Electricity Generation (GWh)',
                        'Installed Cost (USD/KW)',
                        'Laws Count',
                        'RD&D Budget (Million USD)'),
          caption = 'Summary of Wind') %>%
    kable_styling(bootstrap_options = c('striped', 'hover'))
Summary of Wind
Tier Electricity Generation (GWh) Installed Cost (USD/KW) Laws Count RD&D Budget (Million USD)
Min. 6737.33 1553.00 1 0.00
1st Qu. 28554.33 1822.50 1 32.05
Median 130499.16 2036.00 7 57.00
Mean 152144.43 2445.53 8 67.63
3rd Qu. 247475.16 2751.50 10 83.38
Max. 434811.65 4950.00 45 247.70

3.7.3 Summary of Fossil

Below is the summary of fossil fuel data (coal costs):

Code
# Calculate the summary and export to csv
summary_fossil_cost <- summary(fossil_cost$avg_coal_cost_per_ton)
summary_fossil_cost_df <- data.frame(
    tier = names(summary_fossil_cost),
    fossil_cost_usd_per_ton = round(as.numeric(summary_fossil_cost), 2)) %>% 
    write_csv(here('data', 'summary_fossil_cost.csv'))

# Genarate the table using kable()
summary_fossil_cost_df %>% 
    kable(format = 'html',
          escape = FALSE,
          col.names = c('Tier',
                        'Coal Cost (USD/Ton)'),
          caption = 'Summary of Fossil Cost') %>%
    kable_styling(bootstrap_options = c('striped', 'hover'))
Summary of Fossil Cost
Tier Coal Cost (USD/Ton)
Min. 36.32
1st Qu. 39.27
Median 42.84
Mean 42.20
3rd Qu. 45.11
Max. 46.65

4. Attribution

Below is the attribution breakdown for the team:

  • Research question was edited together.
  • Data collection and discussion and proposal expectation evaluation by Abbey.
  • Data cleaning and plotting by Pingfan.
  • Moving forward, we will both create the new figures to help explore the questions in Stage 3.

5. Appendix

5.1 Dictionary of Files

This is a complete dictionary of files. The files can be accessed in the “data” folder. In the dictionary, there are 4 columns:

  1. Name - The name of the file.
  2. Type - The type of the file. In our case, we have CSV, HTML, and Excel.
  3. Source - The source of the file. “Generated” means it is generated by R codes; “Raw” means Its either directly downloaded or scrapped from the website.
  4. Description - A brief description of the file.

The dictionary is in alphabetical order.

Code
dict_file <- data.frame(
    name = c('cor_solar_wind',
             'dsire_laws_raw',
             'dsire_solar_laws',
             'dsire_wind_laws',
             'eia_fossil_cost_raw',
             'eia_fossil_cost',
             'eia_solar_gen_raw',
             'eia_solar_gen',
             'eia_solar_wind_gen_longer',
             'eia_wind_gen_raw',
             'eia_wind_gen',
             'high_cor_solar',
             'high_cor_wind',
             'iea_solar_wind_budget_wider',
             'irena_solar_installed_cost',
             'irena_solar_wind_installed_cost_longer',
             'irena_wind_installed_cost',
             'merged_solar_wind',
             'merged_solar',
             'merged_wind',
             'summary_fossil_cost',
             'summary_solar',
             'summary_wind',
             'dsire_laws_raw',
             'irena_costs_raw'),
    type = c('CSV', 'CSV', 'CSV', 'CSV', 'CSV',
             'CSV', 'CSV', 'CSV', 'CSV', 'CSV',
             'CSV', 'CSV', 'CSV', 'CSV', 'CSV',
             'CSV', 'CSV', 'CSV', 'CSV', 'CSV',
             'CSV', 'CSV', 'CSV', 'HTML', 'Excel'),
    source = c('Generated', 'Raw', 'Generated', 'Generated', 'Raw',
               'Generated', 'Raw', 'Generated', 'Generated', 'Raw',
               'Generated', 'Generated', 'Generated', 'Raw',
               'Generated', 'Generated', 'Generated', 'Generated',
               'Generated', 'Generated', 'Generated', 'Generated',
               'Generated', 'Raw', 'Raw'),
    description = c('Correlation between generation and 4 other variables',
                    'Laws and incentives count from DSIRE',
                    'Solar laws and incentives count',
                    'Wind laws and incentives count',
                    'Monthly fossil cost from EIA',
                    'Annual average fossil cost',
                    'Monthly solar electricity generation from EIA',
                    'Annual solar electricity generation',
                    'Longer version of annual solar/wind generation',
                    'Monthly wind electricity generation from EIA',
                    'Annual wind electricity generation',
                    'Solar related correlations higher than 0.8',
                    'Wind related correlations higher than 0.8',
                    'Wider version of solar/wind RD&D budgets from IEA',
                    'Annual average solar installed cost',
                    'Longer version of solar/wind generation installed cost',
                    'Annual average wind installed cost',
                    'Merged dataset of all solar, wind, and fossil',
                    'Merged dataset of both solar and fossil',
                    'Merged dataset of both wind and fossil',
                    'Summary data of fossil cost',
                    'Summary dataset of solar',
                    'Summary dataset of wind',
                    'Scraped law count data from DSIRE',
                    'Renewable power generation cost data from IRENA')
) %>% 
    mutate(count = row_number()) %>%
    select(count, everything())
dict_file %>% 
    kable(format = 'html',
          escape = FALSE,
          align = c('c', 'l', 'l', 'l', 'l'),
          col.names = c('No.', 'Name', 'Type', 'Source', 'Description'),
          caption = 'Dictionary of Data Files') %>%
    kable_styling(bootstrap_options = c('striped', 'hover'))
Dictionary of Data Files
No. Name Type Source Description
1 cor_solar_wind CSV Generated Correlation between generation and 4 other variables
2 dsire_laws_raw CSV Raw Laws and incentives count from DSIRE
3 dsire_solar_laws CSV Generated Solar laws and incentives count
4 dsire_wind_laws CSV Generated Wind laws and incentives count
5 eia_fossil_cost_raw CSV Raw Monthly fossil cost from EIA
6 eia_fossil_cost CSV Generated Annual average fossil cost
7 eia_solar_gen_raw CSV Raw Monthly solar electricity generation from EIA
8 eia_solar_gen CSV Generated Annual solar electricity generation
9 eia_solar_wind_gen_longer CSV Generated Longer version of annual solar/wind generation
10 eia_wind_gen_raw CSV Raw Monthly wind electricity generation from EIA
11 eia_wind_gen CSV Generated Annual wind electricity generation
12 high_cor_solar CSV Generated Solar related correlations higher than 0.8
13 high_cor_wind CSV Generated Wind related correlations higher than 0.8
14 iea_solar_wind_budget_wider CSV Raw Wider version of solar/wind RD&D budgets from IEA
15 irena_solar_installed_cost CSV Generated Annual average solar installed cost
16 irena_solar_wind_installed_cost_longer CSV Generated Longer version of solar/wind generation installed cost
17 irena_wind_installed_cost CSV Generated Annual average wind installed cost
18 merged_solar_wind CSV Generated Merged dataset of all solar, wind, and fossil
19 merged_solar CSV Generated Merged dataset of both solar and fossil
20 merged_wind CSV Generated Merged dataset of both wind and fossil
21 summary_fossil_cost CSV Generated Summary data of fossil cost
22 summary_solar CSV Generated Summary dataset of solar
23 summary_wind CSV Generated Summary dataset of wind
24 dsire_laws_raw HTML Raw Scraped law count data from DSIRE
25 irena_costs_raw Excel Raw Renewable power generation cost data from IRENA

5.2 Dictionary of Solar and Wind Datasets

This is a dictionary of solar and wind related datasets that are generated in this report. The majority of the datasets overlap with the file dictionary, and they have same or similar names.

All datasets are in the format of data frame or tibble.

Code
dict_solar_wind <- data.frame(
    name = c('merged_solar',
             'merged_solar_wind',
             'merged_wind',
             'solar_cost',
             'solar_gen',
             'solar_laws',
             'solar_wind_budget_longer',
             'solar_wind_budget_wider',
             'solar_wind_cost',
             'solar_wind_cost_longer',
             'solar_wind_gen_longer',
             'summary_solar_df',
             'summary_wind_df',
             'wind_cost',
             'wind_gen',
             'wind_laws'),
    type = c('A tibble: 50 × 6',
             'A tibble: 50 × 10',
             'A tibble: 49 × 6',
             'A tibble: 10 × 2',
             'A tibble: 22 × 2',
             'A tibble: 20 × 2',
             'A tibble: 84 × 3',
             'A tibble: 42 × 3',
             'A tibble: 10 × 3',
             'A tibble: 20 × 3',
             'A tibble: 44 × 3',
             'A data frame: 6 x 5',
             'A data frame: 6 x 5',
             'A tibble: 36 × 2',
             'A tibble: 22 × 2',
             'A tibble: 17 × 2'),
    description = c('Merged dataset of both solar and fossil',
                    'Merged dataset of all solar, wind, and fossil',
                    'Merged dataset of both wind and fossil',
                    'Annual average solar installed cost',
                    'Annual solar electricity generation',
                    'Solar laws and incentives count',
                    'Longer version of solar/wind RD&D budgets',
                    'Wider version of solar/wind RD&D budgets',
                    'Solar/wind installed cost',
                    'Longer version of solar/wind installed cost',
                    'Longer version of annual solar/wind generation',
                    'Summary dataset of solar',
                    'Summary dataset of wind',
                    'Annual average wind installed cost',
                    'Annual wind electricity generation',
                    'Wind laws and incentives count')
) %>% 
    mutate(count = row_number()) %>%
    select(count, everything())
dict_solar_wind %>% 
    kable(format = 'html',
          escape = FALSE,
          align = c('c', 'l', 'l', 'l'),
          col.names = c('No.', 'Name', 'Type', 'Description'),
          caption = 'Dictionary of Solar and Wind Datasets') %>%
    kable_styling(bootstrap_options = c('striped', 'hover'))
Dictionary of Solar and Wind Datasets
No. Name Type Description
1 merged_solar A tibble: 50 × 6 Merged dataset of both solar and fossil
2 merged_solar_wind A tibble: 50 × 10 Merged dataset of all solar, wind, and fossil
3 merged_wind A tibble: 49 × 6 Merged dataset of both wind and fossil
4 solar_cost A tibble: 10 × 2 Annual average solar installed cost
5 solar_gen A tibble: 22 × 2 Annual solar electricity generation
6 solar_laws A tibble: 20 × 2 Solar laws and incentives count
7 solar_wind_budget_longer A tibble: 84 × 3 Longer version of solar/wind RD&D budgets
8 solar_wind_budget_wider A tibble: 42 × 3 Wider version of solar/wind RD&D budgets
9 solar_wind_cost A tibble: 10 × 3 Solar/wind installed cost
10 solar_wind_cost_longer A tibble: 20 × 3 Longer version of solar/wind installed cost
11 solar_wind_gen_longer A tibble: 44 × 3 Longer version of annual solar/wind generation
12 summary_solar_df A data frame: 6 x 5 Summary dataset of solar
13 summary_wind_df A data frame: 6 x 5 Summary dataset of wind
14 wind_cost A tibble: 36 × 2 Annual average wind installed cost
15 wind_gen A tibble: 22 × 2 Annual wind electricity generation
16 wind_laws A tibble: 17 × 2 Wind laws and incentives count

5.3 Dictionary of Correlations

This dictionary contains the correlation datasets. All datasets are in the format of data frame.

Code
dict_cor <- data.frame(
    name = c('cor_matrix_solar',
             'cor_matrix_wind',
             'cor_solar_wind',
             'high_cor_solar',
             'high_cor_wind'),
    type = c('A data frame: 6 × 6',
             'A data frame: 6 × 6',
             'A data frame: 8 × 3',
             'A data frame: 5 × 3',
             'A data frame: 2 × 3'),
    description = c('Correlation matrix of all solar related variables',
                    'Correlation matrix of all wind related variables',
                    'Correlation between generation and 4 other variables',
                    'Solar related correlations higher than 0.8',
                    'Wind related correlations higher than 0.8')
) %>% 
    mutate(count = row_number()) %>%
    select(count, everything())
dict_cor %>% 
    kable(format = 'html',
          escape = FALSE,
          align = c('c', 'l', 'l', 'l'),
          col.names = c('No.', 'Name', 'Type', 'Description'),
          caption = 'Dictionary of Correlations') %>%
    kable_styling(bootstrap_options = c('striped', 'hover'))
Dictionary of Correlations
No. Name Type Description
1 cor_matrix_solar A data frame: 6 × 6 Correlation matrix of all solar related variables
2 cor_matrix_wind A data frame: 6 × 6 Correlation matrix of all wind related variables
3 cor_solar_wind A data frame: 8 × 3 Correlation between generation and 4 other variables
4 high_cor_solar A data frame: 5 × 3 Solar related correlations higher than 0.8
5 high_cor_wind A data frame: 2 × 3 Wind related correlations higher than 0.8

5.4 Miscellaneous

There are also other datasets that are not included in the above dictionaries:

  1. There is a fossil_cost tibble that contains the average coal cost per ton from 2008 to 2022.
  2. There are 7 plots named as plot_1 through plot_7. These are the correlation plots shown in 3.6.3 High Correlation Plots. The other plots are shown directly and are not stored as variables or datasets.
  3. There are also some other datasets like label_data, laws_html, laws_table, etc. They are used to temporarily store values.
Back to top

References

DSIRE. 2023. “Clean Energy Programs.” https://programs.dsireusa.org/system/program.
EIA. 2023. “Electricity Data Browser.” https://www.eia.gov/electricity/data/browser/.
IEA. 2023. “Energy Technology RD&d Budgets.” https://www.iea.org/data-and-statistics/data-product/energy-technology-rd-and-d-budget-database-2#.
IRENA. 2023. “IRENA Statistics Data.” https://www.irena.org/Data.
OurWorldinData. 2023. “Energy Data Explorer.” https://ourworldindata.org/explorers/energy.