Key Drivers of the Growth of Solar and Wind Electricity Generation in the USA
Authors
Pingfan Hu
Abbey Kollar
Published
October 29, 2023
1. Research Question & Explanation
1.1 Research Question
Exploring the key drivers of the growth of solar and wind electricity generation in the USA.
1.2 Explanation of the Study
The drivers we considered in this progress report are:
Cost of solar and wind installation
Cost of fossil fuel (take coal as example)
Policies and incentives regarding solar and wind
Energy research budgets
Our study is in 3 stages:
Stage 1: Solar/wind electricity generation vs these 4 drivers. We expect that the above 4 drivers will have positive or negative correlations with the growth of solar and wind electricity generation.
Stage 2: Correlation matrix of all 5 variables. The 4 drivers and the generation add up to 5 variables. We expect that there will be relationships among the drivers, not just the drivers influencing generation.
Stage 3: Expansion on the progress report. New questions have been generated from these initial figures, specifically:
What countries have the highest increase in wind and solar generation?
What is the relationship between wind and solar and petroleum and natural gas in terms of cost and generation? Especially because there has been a rise in natural gas use, unlike coal.
What U.S. states have the highest growth of wind and solar generation?
What are the differences between the generation data in EIA, IRENA, and Our World in Data sources?
How are generation and the drivers related to capacity?
The team will attempt to answer some of these questions, as well as updating some of the plots within this project proposal.
2. Data Source Discussion
We have adjusted our data sources from the proposal to focus on wind and solar energy. We have found sources that cover generation and capacity of energy sources, United States policies, and country R&D spending.
This data browser is generated by EIA (2023) (US Energy Information Administration). In this browser, the data can be selected as net generation, consumption, or retail; and the reported data components can be selected based on our purposes. It contains data of different renewable energy sources from 2000 to 2023. The data is pre-processed by EIA, but originated from EIA API. The EIA appears to collect its data from “survey respondents,” which could indicate missing data if not all power companies or organizations are reporting data to the EIA, and there is the potential for bias if only certain types of companies, like large corporations, are reporting data and if the survey questions are inherently biased.
IRENA (2023), short for International Renewable Energy Agency, also provides data about renewable energy capacity, power generation, and renewable energy balances. The data is processed by IRENA and was collected from IRENA members using a questionnaire and the gaps in data were filled using desk research. The renewable energy power capacity statistics are updated in March, and the power generation and balances data sets are updated in July. While IRENA tries to fill in missing data through their own research, the data set may still be incomplete. The questionnaire may introduce biases depending on how the questions are worded.
The Energy Data Explorer of OurWorldinData (2023) provides data on the consumption, relative change, and share of total energy for all energy sources, including solar and wind and fossil fuels. The explorer allows us to choose specific countries. The data was pre-processed by Our World in Data and was formerly published by the Energy Institute Statistical Review of World Energy (2023). The Energy Institute analyzes data on world energy markets from the prior year and has been providing this “timely, comprehensive and objective data to the energy community since 1952.” The institute appears to partner with other institutions to obtain data, like energy companies, and gets information from government sources and published data, thus the data from Our World in Data has been processed twice. While the original data sources are government sources and published data, there can be gaps in the data if not all power producers are reporting data to the governments. Also, there could be bias due to which data sources are used (i.e. selection bias), and the report includes methodologies for manipulating the data so there could be uncertainty and biases with these underlying assumptions.
DSIRE (2023) (Database of State Incentives for Renewables and Efficiency) provides state and federal policy related to renewable energy and efficiency. This data has been pre-processed by the N.C. Clean Energy Technology Center at N.C. State University and provides summaries of incentives and policies established by the federal government, state governments and U.S. territories, local governments, and larger electric and gas utilities in the U.S. The website does not state how N.C. collected this information, but it is likely from publicly available websites and news reports. The data does not include federal funding opportunities for R&D, demonstration projects, outreach, or inter-governmental programs nor bills that have not been enacted or regulations that have not been fully approved. The scope is limited to local incentives and policies that are especially innovative, and to municipalities and counties with the largest populations, which results in missing data and biases for what qualifies as “innovative.” The data is typically updated at least once a year.
This data set from the IEA (2023) contains spending on energy technologies within IEA countries since 1974. The data was processed by the IEA and collected from central or federal government budgets, as well as the budgets of state-owned companies, for spending on a range of sectors including energy efficiency, renewables, nuclear power, fossil fuels, hydrogen and fuel cells, and more. There is missing data because not all countries are IEA members and perhaps not all spending is reflected in the budgets, which also could introduce biases.
3. Figures and Evaluations
3.1 Solar and Wind Generation
The EIA provides the solar and wind generation data. We need to firstly perform data wrangling and exporting:
Then, we use the prepared dataset to create the plot:
Code
solar_wind_gen_longer %>%ggplot(aes(x = year,y = gwh /1000,color = energy_source)) +geom_line(linewidth =1) +geom_label(data =subset(solar_wind_gen_longer, year ==max(year)),aes(label =recode(energy_source, solar_gen_gwh ='Solar', wind_gen_gwh ='Wind')),nudge_x =-0.5,nudge_y =10) +labs(x ='Year',y ='Terawatt-Hour (TWH)',color ='Energy Source',title ='Solar and Wind Energy Generation in the USA',subtitle ='From 2000 to 2022') +scale_x_continuous(breaks =seq(2000, 2022, by =2)) +theme_bw(base_family ='Ubuntu') +theme(legend.position ='none')
In our proposal, we made predictions about capacity and the relationships of the drivers to capacity; however, we have shifted in the progress report to focus on generation since that is the actual amount of electricity generated in a given time period. We had expected capacity to be unimodal and widely spread, and we expected this distribution for generation as well. The plot of energy generation supports those expectations. Both sources are steadily increasing in annual generation and have yet to plateau, and the generation is widely-spread from about 0 TWH to 150 and over 400 TWH for solar and wind, respectively. The generation of wind has increased at a faster rate than solar, which is unexpected due to the decrease in solar costs and relatively steady wind costs (see 3.2). However, the cost of solar was higher than wind until 2018 despite the steady decrease in costs, so perhaps more wind projects were deployed from 2000 to 2018.
3.2 Cost of Solar and Wind Installation
The price of solar and wind technologies can be described by the installed costs of these projects. The team has acquired the weighted cost data of the USA from IRENA. The installed costs can be plotted based on these datasets.
IRENA provides the weighted installed costs of solar and wind projects from 2010 to 2019. We firstly manipulate with the data:
Code
# Read solar and wind cost data and merge to onesolar_cost <-read_csv(here('data', 'irena_solar_installed_cost.csv'))wind_cost <-read_csv(here('data', 'irena_wind_installed_cost.csv'))solar_wind_cost <-left_join(solar_cost, wind_cost, by ='year')# Pivot to longer format and export to csvsolar_wind_cost_longer <- solar_wind_cost %>%pivot_longer(names_to ='energy_source',values_to ='cost',cols =-year) %>%write_csv(here('data', 'irena_solar_wind_installed_cost_longer.csv'))
Then, we generate the plot of both solar and wind installed costs:
Code
solar_wind_cost_longer %>%ggplot(aes(x = year,y = cost,color = energy_source)) +geom_line(linewidth =1) +geom_label(data =subset(solar_wind_cost_longer, year ==max(year)),aes(label =recode(energy_source, solar_installed_cost_usd_per_kw ='Solar', wind_installed_cost_usd_per_kw ='Wind')),nudge_x =0.6,nudge_y =10) +labs(x ='Year',y ='Installed Cost (USD/kW)',color ='Energy Source',title ='Solar and Wind Installed Costs in the USA',subtitle ='In USD/kW, From 2010 to 2019') +scale_x_continuous(breaks =seq(2010, 2019, by =1)) +theme_bw(base_family ='Ubuntu') +theme(legend.position ='none')
It was expected that the cost for renewables would be multimodal due to yearly variations but that the overall trend would be decreasing and that the grouping would range depending on the type of technology, specifically the capital cost elasticity. The plot of the cost of solar installed in the U.S. shows a decreasing trend from 2010 and is widely-spread from around 4500 to 1000 USD/kW, which mostly supports the expectations except for the expected yearly variability causing a multimodal distribution. The cost of installing wind projects appears to be multimodal with one large peak in 2010 and a smaller range from about 2325 to 1500 USD/kW. It makes sense that the cost of solar installation has a larger range than wind cost because of the technological solar panel advancements, which have made the capital cheaper, while wind turbines have not seen such progress.
3.3 Cost of Fossil Fuel (Coal)
The EIA provides the cost of electricity generation using coal, which can be used as a comparison. Firstly, we perform data wrangling and data exporting:
fossil_cost %>%filter(year >=2010& year <=2019) %>%ggplot(aes(x = year,y = avg_coal_cost_per_ton)) +geom_ribbon(aes(ymin =33,ymax = avg_coal_cost_per_ton),alpha =0.4) +geom_line() +labs(x ='Year',y ='USD / Ton',title ='Average Cost of Coal Used for Electricity Generation in the USA',subtitle ='In USD/Ton, From 2010 to 2019') +scale_x_continuous(limits =c(2010, 2019),breaks =seq(2010, 2018, by =1),expand =c(0,0)) +scale_y_continuous(limits =c(33, 48),breaks =seq(33, 48, by =3),expand =expansion(mult =c(0, 0.1))) +theme_bw(base_family ='Ubuntu')
The price of coal was expected to be multimodal due to yearly fluctuations in prices but tightly-grouped since fossil fuels have been a long term energy source in the US with no recent major developments in technological efficiency or resource extraction. The graph supports these expectations as there has been fluctuations in the price resulting in a peak around 2011 and 2014, and the cost has varied slightly from 46.5 to 36 USD/Ton. It is interesting that the cost of coal appears to align with the presidential administrations (Obama 2009-2017, Trump 2017-2021, Biden starting in 2021), perhaps indicating fossil fuel subsidies or political pushes towards renewables.
3.4 Policies and Incentives
The DSIRE (Database of State Incentives for Renewables & Efficiency) provides a comprehensive collection of policies and incentives in terms of sustainability. There are 4 subcategories:
Electric Vehicles
Energy Efficiency
Energy Storage
Renewable Energy
This website is a great source of policies and incentives, but it doesn’t provide direct link of csv files. A work-around would be web scrapping using R codes. We decided to obtain the complete table, and tailor it later based on our needs.
The table in this page is dynamic and is not revealed only by providing the URL. We suspect it’s controlled by some JavaScript codes. A work-around is to store the table part of the HTML codes.
Below are the codes of scrapping the table:
Code
# Read the html from local pathlaws_html_path <-here('data', 'dsire_laws_raw.html')laws_html <-read_html(laws_html_path)# Scrape the table as a dflaws_table <- laws_html %>%html_nodes('table') %>% .[[1]] %>%html_table()# Save the df to csvwrite_csv(laws_table, here('data', 'dsire_laws_raw.csv'))
The raw table needs to be modified and saved as a preparation of plotting:
Note that some policies will both contain the topics of wind and solar, so there is an overlap in solar and wind policies.
It was expected that the number of policies would be multimodal and widely-spread due to changes in administration. The plots for wind and solar policies support these expectations. It makes sense that the largest number of wind and solar policies occurred in 2000 while the highest generation of wind and solar has been in the more recent years because there is usually a policy-implementation lag. Also, 2000 was near the end of Clinton’s presidential term, so more progressive policies may have been instituted at this time. There is a clear association of policies enacted depending on the presidential term (Clinton 1993-2001, Bush 2001-2009, Obama 2009-2017, Trump 2017-2021): Clinton and Obama show higher numbers of policies than Trump and Bush. Moving forward, we will consider adding color to indicate if the congressional majority was the same party as the president.
3.5 Energy Research Budgets
The iea provides a comprehensive collection of energy technology RD&D budgets. It can be traced back to 1974 and ends at 2015. We filtered only the solar and wind renewable energy categories and hand scripted the data into a local csv file.
Let’s read and wrangle the file:
Code
# Store the wider format for later usesolar_wind_budget_wider <-read_csv(here('data', 'iea_solar_wind_budget_wider.csv') )# This is the longer formatsolar_wind_budget_longer <-read_csv(here('data', 'iea_solar_wind_budget_wider.csv') ) %>%pivot_longer(cols =-year,names_to ='energy_source',values_to ='rdd_millions_usd')
Then, let’s do the plot:
Code
label_data <-subset(solar_wind_budget_longer, year ==max(year))label_data$nudge <-ifelse(label_data$energy_source =='solar_rdd_million_usd', 220, -100)solar_wind_budget_longer %>%ggplot(aes(x = year,y = rdd_millions_usd,color = energy_source)) +geom_line(linewidth =1) +geom_label(data = label_data,aes(label =recode(energy_source, solar_rdd_million_usd ='Solar', wind_rdd_million_usd ='Wind')),nudge_x =-0.5,nudge_y = label_data$nudge) +labs(x ='Year',y ='RD&D Budget (Million USD)',color ='Energy Source',title ='Solar and Wind Energy Budgets in the USA',subtitle ='In Million USD, From 1974 to 2015') +scale_x_continuous(breaks =seq(1975, 2015, by =5)) +theme_bw(base_family ='Ubuntu') +theme(legend.position ='none')
It was expected that the amount of funding for renewable energy research would also be multimodal and widely-spread because of administration changes. The multimodal expectation was supported by the figure with peaks occurring around 1980, 1995, and 2010 for wind and solar, but solar was wide-spread, while wind was more tightly-grouped. This figure indicates a funding-implementation lag, as the most funding for solar occurred in the late 1970s and early 1980s, but the decline in solar costs is more recent. The boom in solar funding in the late 1970s could be related to the start of the environmental movement. The second major increase in funding from 2009-2015 overlaps with Obama’s administration.
3.6 Correlations
The relationships we expected to find among these variables were:
The price of renewable energy is negatively correlated with renewable energy generation.
The price of fossil fuels is positively correlated with renewable energy generation.
The number of promotional policies is positively correlated with renewable energy generation.
The amount of funding for renewable research is positively correlated with renewable energy generation.
3.6.1 Solar/Wind Generation vs 4 Drivers
In order to perform the correlations, we need to merge the datasets:
Code
# Both solar and windmerged_solar_wind <- fossil_cost %>%full_join(solar_cost, by ='year') %>%full_join(solar_gen, by ='year') %>%full_join(solar_laws, by ='year') %>%full_join(solar_wind_budget_wider, by ='year') %>%full_join(wind_cost, by ='year') %>%full_join(wind_gen, by ='year') %>%full_join(wind_laws, by ='year') %>%arrange(year) %>%write_csv(here('data', 'merged_solar_wind.csv'))
Then, we calculate the correlation coefficients between solar generation (solar_gen) and other solar related variables. They are:
Solar installed cost - solar_installed_cost_usd_per_kw
Solar related laws - count_of_solar_laws
Solar technology budget - solar_rdd_million_usd
Coal cost (representing traditional power source) - avg_coal_cost_per_ton
The summary table of solar/wind generation vs the 4 other variables is shown below:
Code
cor_solar_wind %>%mutate(cor =round(cor, 2),cor =cell_spec( cor, 'html', color =ifelse( cor >0.5| cor <-0.5, 'white', 'black' ),background =ifelse( cor >0.5| cor <-0.5, 'cornflowerblue', 'white' ))) %>%kable(format ='html',escape =FALSE,align =c('l', 'l', 'l'),col.names =c('Energy Type', 'Compared Variable', 'Correlation'),caption ='Correlations of Solar/Wind Generation vs the Other 4 Variables') %>%kable_styling(bootstrap_options =c('striped', 'hover'))
Correlations of Solar/Wind Generation vs the Other 4 Variables
Energy Type
Compared Variable
Correlation
Solar
Coal Cost
-0.54
Solar
Installed Cost
-0.96
Solar
Laws
0.28
Solar
Budgets
-0.21
Wind
Coal Cost
-0.52
Wind
Installed Cost
-0.18
Wind
Laws
0.1
Wind
Budgets
0.2
We found support for two of the expected relationships shown in the correlation coefficient figure: both solar and wind installed costs were negatively correlated with generation, and laws were positively correlated with solar and wind generation. Only wind had a positive correlation between R&D funding and generation, while solar had an unexpected negative correlation. Recall with the RD&D graph, a large amount of solar funding occurred before the increase in generation, likely causing this discrepancy. Additionally, the cost of coal had an unexpected negative correlated with solar and wind generation. This discrepancy is likely due to the fact that the cost of coal has remained relatively steady, and there might be a response-lag between the cost of coal increasing and the generation of solar and wind increasing.
3.6.2 Correlation Matrix
The above section only calculated the correlation coefficients between Solar/Wind Generation and the 4 other variables, which did not provide sufficient convincing results. A better solution is to work out the correlation matrix, so that all possible correlations are clearly shown. We can search for the most correlated variables based on this matrix. To search for the most correlated variables, we merged the datasets into merged_solar and merged_wind:
Code
# Solarmerged_solar <- fossil_cost %>%full_join(solar_cost, by ='year') %>%full_join(solar_gen, by ='year') %>%full_join(solar_laws, by ='year') %>%full_join(solar_wind_budget_wider %>%select(-wind_rdd_million_usd),by ='year') %>%arrange(year) %>%write_csv(here('data', 'merged_solar.csv'))# Windmerged_wind <- fossil_cost %>%full_join(wind_cost, by ='year') %>%full_join(wind_gen, by ='year') %>%full_join(wind_laws, by ='year') %>%full_join(solar_wind_budget_wider %>%select(-solar_rdd_million_usd),by ='year') %>%arrange(year) %>%write_csv(here('data', 'merged_wind.csv'))
Then, we calculate the correlation matrix for this merged data:
The high correlations associated with year have been previously discussed in the variable expectations section and those associated with generation were discussed in the last section; therefore, there were only two new strong correlations produced via this analysis. The two correlations were solar installed cost vs coal cost and solar research budget vs solar installed cost. Solar installed cost was strongly, positively correlated with coal cost, but wind cost was not, which is likely due to the steady decline in the price of solar while the price of wind has been more steady. Solar R&D funding was strongly, positively correlated with installed cost, which may indicate that R&D had been focused on cost efficiency of the solar panels via innovative, cheaper materials.
3.6.3 High Correlation Plots
In the previous part, we found 5 high correlations in solar and 2 in wind. Let’s generate the scatter plots to visualize them:
After plotting these high correlation relationships, it becomes more obvious which relationships are more convincing than others. Solar generation vs year appears to be more exponential rather than linear. Solar RD&D vs installed cost has very few data points. Finally, wind installed cost vs year has an odd curvature after 2005.
3.7 Summary Tables
This section generates 3 summary tables of solar, wind, and fossil (coal).
3.7.1 Summary of Solar
Below is the summary of solar data, including solar electricity generation, installed cost, laws, and RD&D budget:
Code
# Calculate the summary and export to csvsummary_solar_gen <-summary(solar_gen$solar_gen_gwh)summary_solar_cost <-summary(solar_cost$solar_installed_cost_usd_per_kw)summary_solar_laws <-summary(solar_laws$count_of_solar_laws)summary_solar_budget <-summary(solar_wind_budget_wider$solar_rdd_million_usd)summary_solar_df <-data.frame(tier =names(summary_solar_gen),solar_gen_gwh =round(as.numeric(summary_solar_gen), 2),solar_cost_usd_per_kw =round(as.numeric(summary_solar_cost), 2),solar_laws_count =round(as.numeric(summary_solar_laws)),solar_budget_million_usd =round(as.numeric(summary_solar_budget), 2) ) %>%write_csv(here('data', 'summary_solar.csv'))# Genarate the table using kable()summary_solar_df %>%kable(format ='html',escape =FALSE,col.names =c('Tier','Electricity Generation (GWh)','Installed Cost (USD/KW)','Laws Count','RD&D Budget (Million USD)'),caption ='Summary of Solar') %>%kable_styling(bootstrap_options =c('striped', 'hover'))
Summary of Solar
Tier
Electricity Generation (GWh)
Installed Cost (USD/KW)
Laws Count
RD&D Budget (Million USD)
Min.
507.71
1220.57
1
17.70
1st Qu.
584.31
1959.80
2
132.98
Median
3072.19
2728.27
7
172.85
Mean
29080.30
2985.99
9
283.62
3rd Qu.
48978.68
4297.55
12
276.75
Max.
145598.31
4666.57
45
1168.50
3.7.2 Summary of Wind
Below is the summary of wind data, including wind electricity generation, installed cost, laws, and RD&D budget:
Code
# Calculate the summary and export to csvsummary_wind_gen <-summary(wind_gen$wind_gen_gwh)summary_wind_cost <-summary(wind_cost$wind_installed_cost_usd_per_kw)summary_wind_cost <- summary_wind_cost[!names(summary_wind_cost) =="NA's"]summary_wind_laws <-summary(wind_laws$count_of_wind_laws)summary_wind_budget <-summary(solar_wind_budget_wider$wind_rdd_million_usd)summary_wind_df <-data.frame(tier =names(summary_wind_gen),wind_gen_gwh =round(as.numeric(summary_wind_gen), 2),wind_cost_usd_per_kw =round(as.numeric(summary_wind_cost), 2),wind_laws_count =round(as.numeric(summary_wind_laws)),wind_budget_million_usd =round(as.numeric(summary_wind_budget), 2) ) %>%write_csv(here('data', 'summary_wind.csv'))# Genarate the table using kable()summary_wind_df %>%kable(format ='html',escape =FALSE,col.names =c('Tier','Electricity Generation (GWh)','Installed Cost (USD/KW)','Laws Count','RD&D Budget (Million USD)'),caption ='Summary of Wind') %>%kable_styling(bootstrap_options =c('striped', 'hover'))
Summary of Wind
Tier
Electricity Generation (GWh)
Installed Cost (USD/KW)
Laws Count
RD&D Budget (Million USD)
Min.
6737.33
1553.00
1
0.00
1st Qu.
28554.33
1822.50
1
32.05
Median
130499.16
2036.00
7
57.00
Mean
152144.43
2445.53
8
67.63
3rd Qu.
247475.16
2751.50
10
83.38
Max.
434811.65
4950.00
45
247.70
3.7.3 Summary of Fossil
Below is the summary of fossil fuel data (coal costs):
Code
# Calculate the summary and export to csvsummary_fossil_cost <-summary(fossil_cost$avg_coal_cost_per_ton)summary_fossil_cost_df <-data.frame(tier =names(summary_fossil_cost),fossil_cost_usd_per_ton =round(as.numeric(summary_fossil_cost), 2)) %>%write_csv(here('data', 'summary_fossil_cost.csv'))# Genarate the table using kable()summary_fossil_cost_df %>%kable(format ='html',escape =FALSE,col.names =c('Tier','Coal Cost (USD/Ton)'),caption ='Summary of Fossil Cost') %>%kable_styling(bootstrap_options =c('striped', 'hover'))
Summary of Fossil Cost
Tier
Coal Cost (USD/Ton)
Min.
36.32
1st Qu.
39.27
Median
42.84
Mean
42.20
3rd Qu.
45.11
Max.
46.65
4. Attribution
Below is the attribution breakdown for the team:
Research question was edited together.
Data collection and discussion and proposal expectation evaluation by Abbey.
Data cleaning and plotting by Pingfan.
Moving forward, we will both create the new figures to help explore the questions in Stage 3.
5. Appendix
5.1 Dictionary of Files
This is a complete dictionary of files. The files can be accessed in the “data” folder. In the dictionary, there are 4 columns:
Name - The name of the file.
Type - The type of the file. In our case, we have CSV, HTML, and Excel.
Source - The source of the file. “Generated” means it is generated by R codes; “Raw” means Its either directly downloaded or scrapped from the website.
Description - A brief description of the file.
The dictionary is in alphabetical order.
Code
dict_file <-data.frame(name =c('cor_solar_wind','dsire_laws_raw','dsire_solar_laws','dsire_wind_laws','eia_fossil_cost_raw','eia_fossil_cost','eia_solar_gen_raw','eia_solar_gen','eia_solar_wind_gen_longer','eia_wind_gen_raw','eia_wind_gen','high_cor_solar','high_cor_wind','iea_solar_wind_budget_wider','irena_solar_installed_cost','irena_solar_wind_installed_cost_longer','irena_wind_installed_cost','merged_solar_wind','merged_solar','merged_wind','summary_fossil_cost','summary_solar','summary_wind','dsire_laws_raw','irena_costs_raw'),type =c('CSV', 'CSV', 'CSV', 'CSV', 'CSV','CSV', 'CSV', 'CSV', 'CSV', 'CSV','CSV', 'CSV', 'CSV', 'CSV', 'CSV','CSV', 'CSV', 'CSV', 'CSV', 'CSV','CSV', 'CSV', 'CSV', 'HTML', 'Excel'),source =c('Generated', 'Raw', 'Generated', 'Generated', 'Raw','Generated', 'Raw', 'Generated', 'Generated', 'Raw','Generated', 'Generated', 'Generated', 'Raw','Generated', 'Generated', 'Generated', 'Generated','Generated', 'Generated', 'Generated', 'Generated','Generated', 'Raw', 'Raw'),description =c('Correlation between generation and 4 other variables','Laws and incentives count from DSIRE','Solar laws and incentives count','Wind laws and incentives count','Monthly fossil cost from EIA','Annual average fossil cost','Monthly solar electricity generation from EIA','Annual solar electricity generation','Longer version of annual solar/wind generation','Monthly wind electricity generation from EIA','Annual wind electricity generation','Solar related correlations higher than 0.8','Wind related correlations higher than 0.8','Wider version of solar/wind RD&D budgets from IEA','Annual average solar installed cost','Longer version of solar/wind generation installed cost','Annual average wind installed cost','Merged dataset of all solar, wind, and fossil','Merged dataset of both solar and fossil','Merged dataset of both wind and fossil','Summary data of fossil cost','Summary dataset of solar','Summary dataset of wind','Scraped law count data from DSIRE','Renewable power generation cost data from IRENA')) %>%mutate(count =row_number()) %>%select(count, everything())dict_file %>%kable(format ='html',escape =FALSE,align =c('c', 'l', 'l', 'l', 'l'),col.names =c('No.', 'Name', 'Type', 'Source', 'Description'),caption ='Dictionary of Data Files') %>%kable_styling(bootstrap_options =c('striped', 'hover'))
Dictionary of Data Files
No.
Name
Type
Source
Description
1
cor_solar_wind
CSV
Generated
Correlation between generation and 4 other variables
2
dsire_laws_raw
CSV
Raw
Laws and incentives count from DSIRE
3
dsire_solar_laws
CSV
Generated
Solar laws and incentives count
4
dsire_wind_laws
CSV
Generated
Wind laws and incentives count
5
eia_fossil_cost_raw
CSV
Raw
Monthly fossil cost from EIA
6
eia_fossil_cost
CSV
Generated
Annual average fossil cost
7
eia_solar_gen_raw
CSV
Raw
Monthly solar electricity generation from EIA
8
eia_solar_gen
CSV
Generated
Annual solar electricity generation
9
eia_solar_wind_gen_longer
CSV
Generated
Longer version of annual solar/wind generation
10
eia_wind_gen_raw
CSV
Raw
Monthly wind electricity generation from EIA
11
eia_wind_gen
CSV
Generated
Annual wind electricity generation
12
high_cor_solar
CSV
Generated
Solar related correlations higher than 0.8
13
high_cor_wind
CSV
Generated
Wind related correlations higher than 0.8
14
iea_solar_wind_budget_wider
CSV
Raw
Wider version of solar/wind RD&D budgets from IEA
15
irena_solar_installed_cost
CSV
Generated
Annual average solar installed cost
16
irena_solar_wind_installed_cost_longer
CSV
Generated
Longer version of solar/wind generation installed cost
17
irena_wind_installed_cost
CSV
Generated
Annual average wind installed cost
18
merged_solar_wind
CSV
Generated
Merged dataset of all solar, wind, and fossil
19
merged_solar
CSV
Generated
Merged dataset of both solar and fossil
20
merged_wind
CSV
Generated
Merged dataset of both wind and fossil
21
summary_fossil_cost
CSV
Generated
Summary data of fossil cost
22
summary_solar
CSV
Generated
Summary dataset of solar
23
summary_wind
CSV
Generated
Summary dataset of wind
24
dsire_laws_raw
HTML
Raw
Scraped law count data from DSIRE
25
irena_costs_raw
Excel
Raw
Renewable power generation cost data from IRENA
5.2 Dictionary of Solar and Wind Datasets
This is a dictionary of solar and wind related datasets that are generated in this report. The majority of the datasets overlap with the file dictionary, and they have same or similar names.
All datasets are in the format of data frame or tibble.
Code
dict_solar_wind <-data.frame(name =c('merged_solar','merged_solar_wind','merged_wind','solar_cost','solar_gen','solar_laws','solar_wind_budget_longer','solar_wind_budget_wider','solar_wind_cost','solar_wind_cost_longer','solar_wind_gen_longer','summary_solar_df','summary_wind_df','wind_cost','wind_gen','wind_laws'),type =c('A tibble: 50 × 6','A tibble: 50 × 10','A tibble: 49 × 6','A tibble: 10 × 2','A tibble: 22 × 2','A tibble: 20 × 2','A tibble: 84 × 3','A tibble: 42 × 3','A tibble: 10 × 3','A tibble: 20 × 3','A tibble: 44 × 3','A data frame: 6 x 5','A data frame: 6 x 5','A tibble: 36 × 2','A tibble: 22 × 2','A tibble: 17 × 2'),description =c('Merged dataset of both solar and fossil','Merged dataset of all solar, wind, and fossil','Merged dataset of both wind and fossil','Annual average solar installed cost','Annual solar electricity generation','Solar laws and incentives count','Longer version of solar/wind RD&D budgets','Wider version of solar/wind RD&D budgets','Solar/wind installed cost','Longer version of solar/wind installed cost','Longer version of annual solar/wind generation','Summary dataset of solar','Summary dataset of wind','Annual average wind installed cost','Annual wind electricity generation','Wind laws and incentives count')) %>%mutate(count =row_number()) %>%select(count, everything())dict_solar_wind %>%kable(format ='html',escape =FALSE,align =c('c', 'l', 'l', 'l'),col.names =c('No.', 'Name', 'Type', 'Description'),caption ='Dictionary of Solar and Wind Datasets') %>%kable_styling(bootstrap_options =c('striped', 'hover'))
Dictionary of Solar and Wind Datasets
No.
Name
Type
Description
1
merged_solar
A tibble: 50 × 6
Merged dataset of both solar and fossil
2
merged_solar_wind
A tibble: 50 × 10
Merged dataset of all solar, wind, and fossil
3
merged_wind
A tibble: 49 × 6
Merged dataset of both wind and fossil
4
solar_cost
A tibble: 10 × 2
Annual average solar installed cost
5
solar_gen
A tibble: 22 × 2
Annual solar electricity generation
6
solar_laws
A tibble: 20 × 2
Solar laws and incentives count
7
solar_wind_budget_longer
A tibble: 84 × 3
Longer version of solar/wind RD&D budgets
8
solar_wind_budget_wider
A tibble: 42 × 3
Wider version of solar/wind RD&D budgets
9
solar_wind_cost
A tibble: 10 × 3
Solar/wind installed cost
10
solar_wind_cost_longer
A tibble: 20 × 3
Longer version of solar/wind installed cost
11
solar_wind_gen_longer
A tibble: 44 × 3
Longer version of annual solar/wind generation
12
summary_solar_df
A data frame: 6 x 5
Summary dataset of solar
13
summary_wind_df
A data frame: 6 x 5
Summary dataset of wind
14
wind_cost
A tibble: 36 × 2
Annual average wind installed cost
15
wind_gen
A tibble: 22 × 2
Annual wind electricity generation
16
wind_laws
A tibble: 17 × 2
Wind laws and incentives count
5.3 Dictionary of Correlations
This dictionary contains the correlation datasets. All datasets are in the format of data frame.
Code
dict_cor <-data.frame(name =c('cor_matrix_solar','cor_matrix_wind','cor_solar_wind','high_cor_solar','high_cor_wind'),type =c('A data frame: 6 × 6','A data frame: 6 × 6','A data frame: 8 × 3','A data frame: 5 × 3','A data frame: 2 × 3'),description =c('Correlation matrix of all solar related variables','Correlation matrix of all wind related variables','Correlation between generation and 4 other variables','Solar related correlations higher than 0.8','Wind related correlations higher than 0.8')) %>%mutate(count =row_number()) %>%select(count, everything())dict_cor %>%kable(format ='html',escape =FALSE,align =c('c', 'l', 'l', 'l'),col.names =c('No.', 'Name', 'Type', 'Description'),caption ='Dictionary of Correlations') %>%kable_styling(bootstrap_options =c('striped', 'hover'))
Dictionary of Correlations
No.
Name
Type
Description
1
cor_matrix_solar
A data frame: 6 × 6
Correlation matrix of all solar related variables
2
cor_matrix_wind
A data frame: 6 × 6
Correlation matrix of all wind related variables
3
cor_solar_wind
A data frame: 8 × 3
Correlation between generation and 4 other variables
4
high_cor_solar
A data frame: 5 × 3
Solar related correlations higher than 0.8
5
high_cor_wind
A data frame: 2 × 3
Wind related correlations higher than 0.8
5.4 Miscellaneous
There are also other datasets that are not included in the above dictionaries:
There is a fossil_cost tibble that contains the average coal cost per ton from 2008 to 2022.
There are 7 plots named as plot_1 through plot_7. These are the correlation plots shown in 3.6.3 High Correlation Plots. The other plots are shown directly and are not stored as variables or datasets.
There are also some other datasets like label_data, laws_html, laws_table, etc. They are used to temporarily store values.
---title: "Progress Report - Advancements in Solar and Wind Energy in the USA"subtitle: "Key Drivers of the Growth of Solar and Wind Electricity Generation in the USA"date: "Oct 29, 2023"bibliography: references.bibauthor: - "Pingfan Hu" - "Abbey Kollar"---```{r}#| label: setup#| include: falseknitr::opts_chunk$set(comment ='#>',fig.width =7.252,fig.height =4,fig.retina =3)library(tidyverse)library(here)library(janitor)library(lubridate)library(rvest)library(gridExtra)library(kableExtra)```## 1. Research Question & Explanation### 1.1 Research Question**Exploring the key drivers of the growth of solar and wind electricity generation in the USA.**### 1.2 Explanation of the StudyThe drivers we considered in this progress report are:1. Cost of solar and wind installation2. Cost of fossil fuel (take coal as example)3. Policies and incentives regarding solar and wind4. Energy research budgetsOur study is in 3 stages:1. Stage 1: Solar/wind electricity generation vs these 4 drivers. We expect that the above 4 drivers will have positive or negative correlations with the growth of solar and wind electricity generation.2. Stage 2: Correlation matrix of all 5 variables. The 4 drivers and the generation add up to 5 variables. We expect that there will be relationships among the drivers, not just the drivers influencing generation.3. Stage 3: Expansion on the progress report. New questions have been generated from these initial figures, specifically: a. What countries have the highest increase in wind and solar generation? b. What is the relationship between wind and solar and petroleum and natural gas in terms of cost and generation? Especially because there has been a rise in natural gas use, unlike coal. c. What U.S. states have the highest growth of wind and solar generation? d. What are the differences between the generation data in EIA, IRENA, and Our World in Data sources? e. How are generation and the drivers related to capacity?The team will attempt to answer some of these questions, as well as updating some of the plots within this project proposal.## 2. Data Source DiscussionWe have adjusted our data sources from the proposal to focus on wind and solar energy. We have found sources that cover generation and capacity of energy sources, United States policies, and country R&D spending.### 2.1 [EIA Data Browser](https://www.eia.gov/electricity/data/browser/) - Solar/Wind Generation & Fossil CostThis data browser is generated by @EIADataBrowser2023 (US Energy Information Administration). In this browser, the data can be selected as net generation, consumption, or retail; and the reported data components can be selected based on our purposes. It contains data of different renewable energy sources from 2000 to 2023. The data is pre-processed by EIA, but originated from [EIA API](https://www.eia.gov/opendata/). The EIA appears to collect its data from "survey respondents," which could indicate missing data if not all power companies or organizations are reporting data to the EIA, and there is the potential for bias if only certain types of companies, like large corporations, are reporting data and if the survey questions are inherently biased.### 2.2 [IRENA](https://www.irena.org/Data) - Solar/Wind Installed Cost@IRENA, short for International Renewable Energy Agency, also provides data about renewable energy capacity, power generation, and renewable energy balances. The data is processed by IRENA and was collected from IRENA members using a questionnaire and the gaps in data were filled using desk research. The renewable energy power capacity statistics are updated in March, and the power generation and balances data sets are updated in July. While IRENA tries to fill in missing data through their own research, the data set may still be incomplete. The questionnaire may introduce biases depending on how the questions are worded.### 2.3 [Our World in Data](https://ourworldindata.org/explorers/energy?tab=chart&facet=none&country=~USA&Total+or+Breakdown=Select+a+source&Energy+or+Electricity=Primary+energy&Metric=Share+of+total&Select+a+source=Solar+and+wind) - Solar/Wind Share & ConsumptionThe Energy Data Explorer of @OurWorldinData provides data on the consumption, relative change, and share of total energy for all energy sources, including solar and wind and fossil fuels. The explorer allows us to choose specific countries. The data was pre-processed by Our World in Data and was formerly published by the [Energy Institute Statistical Review of World Energy (2023)](https://www.energyinst.org/statistical-review). The Energy Institute analyzes data on world energy markets from the prior year and has been providing this "timely, comprehensive and objective data to the energy community since 1952." The institute appears to partner with other institutions to obtain data, like energy companies, and gets information from government sources and published data, thus the data from Our World in Data has been processed twice. While the original data sources are government sources and published data, there can be gaps in the data if not all power producers are reporting data to the governments. Also, there could be bias due to which data sources are used (i.e. selection bias), and the report includes methodologies for manipulating the data so there could be uncertainty and biases with these underlying assumptions.### 2.4 [DSIRE](https://programs.dsireusa.org/system/program?state=US) - Solar/Wind Laws & Incentives@DSIRE (Database of State Incentives for Renewables and Efficiency) provides state and federal policy related to renewable energy and efficiency. This data has been pre-processed by the N.C. Clean Energy Technology Center at N.C. State University and provides summaries of incentives and policies established by the federal government, state governments and U.S. territories, local governments, and larger electric and gas utilities in the U.S. The website does not state how N.C. collected this information, but it is likely from publicly available websites and news reports. The data does not include federal funding opportunities for R&D, demonstration projects, outreach, or inter-governmental programs nor bills that have not been enacted or regulations that have not been fully approved. The scope is limited to local incentives and policies that are especially innovative, and to municipalities and counties with the largest populations, which results in missing data and biases for what qualifies as "innovative." The data is typically updated at least once a year.### 2.5 [IEA](https://www.iea.org/data-and-statistics/data-product/energy-technology-rd-and-d-budget-database-2#) - Energy Technology RD&D BudgetsThis data set from the @IEA contains spending on energy technologies within IEA countries since 1974. The data was processed by the IEA and collected from central or federal government budgets, as well as the budgets of state-owned companies, for spending on a range of sectors including energy efficiency, renewables, nuclear power, fossil fuels, hydrogen and fuel cells, and more. There is missing data because not all countries are IEA members and perhaps not all spending is reflected in the budgets, which also could introduce biases.## 3. Figures and Evaluations### 3.1 Solar and Wind GenerationThe [EIA](https://www.eia.gov/electricity/data/browser/#/topic/0?agg=1,0,2&geo=qnifi05c03j78&linechart=ELEC.GEN.SUN-US-99.M~ELEC.GEN.SUN-NV-99.M~ELEC.GEN.SUN-CA-99.M&columnchart=ELEC.GEN.SUN-US-99.M~ELEC.GEN.SUN-NV-99.M~ELEC.GEN.SUN-CA-99.M&map=ELEC.GEN.SUN-US-99.M&freq=M&start=200101&end=202307&ctype=linechart<ype=pin&rtype=s&pin=&rse=0&maptype=0&sec=o3g&fuel=004) provides the solar and wind generation data. We need to firstly perform data **wrangling** and **exporting**:```{r}# Raw data wrangling & exportingsolar_gen_path <-here('data', 'eia_solar_gen_raw.csv')solar_gen <-read_csv(solar_gen_path, skip =4) %>%clean_names() %>%select('month', 'united_states_thousand_megawatthours') %>%arrange(month) %>%mutate(month =dmy(paste('01', month)),month =format(month, '%Y-%m'),year =as.numeric(substr(month, 1, 4)) ) %>%filter(year <=2022) %>%group_by(year) %>%summarise(solar_gen_gwh =sum(united_states_thousand_megawatthours)) %>%write_csv(here('data', 'eia_solar_gen.csv'))wind_gen_path <-here('data', 'eia_wind_gen_raw.csv')wind_gen <-read_csv(wind_gen_path, skip =4) %>%clean_names() %>%select('month', 'united_states_thousand_megawatthours') %>%arrange(month) %>%mutate(month =dmy(paste('01', month)),month =format(month, '%Y-%m'),year =as.numeric(substr(month, 1, 4)) ) %>%filter(year <=2022) %>%group_by(year) %>%summarise(wind_gen_gwh =sum(united_states_thousand_megawatthours)) %>%write_csv(here('data', 'eia_wind_gen.csv'))# Long format & exportingsolar_wind_gen_longer <-inner_join(solar_gen, wind_gen, by ='year') %>%pivot_longer(names_to ='energy_source',values_to ='gwh',cols =c('solar_gen_gwh', 'wind_gen_gwh')) %>%write_csv(here('data', 'eia_solar_wind_gen_longer.csv'))```Then, we use the prepared dataset to create the **plot**:```{r}solar_wind_gen_longer %>%ggplot(aes(x = year,y = gwh /1000,color = energy_source)) +geom_line(linewidth =1) +geom_label(data =subset(solar_wind_gen_longer, year ==max(year)),aes(label =recode(energy_source, solar_gen_gwh ='Solar', wind_gen_gwh ='Wind')),nudge_x =-0.5,nudge_y =10) +labs(x ='Year',y ='Terawatt-Hour (TWH)',color ='Energy Source',title ='Solar and Wind Energy Generation in the USA',subtitle ='From 2000 to 2022') +scale_x_continuous(breaks =seq(2000, 2022, by =2)) +theme_bw(base_family ='Ubuntu') +theme(legend.position ='none')```In our proposal, we made predictions about capacity and the relationships of the drivers to capacity; however, we have shifted in the progress report to focus on generation since that is the actual amount of electricity generated in a given time period. We had expected capacity to be unimodal and widely spread, and we expected this distribution for generation as well. The plot of energy generation supports those expectations. Both sources are steadily increasing in annual generation and have yet to plateau, and the generation is widely-spread from about 0 TWH to 150 and over 400 TWH for solar and wind, respectively. The generation of wind has increased at a faster rate than solar, which is unexpected due to the decrease in solar costs and relatively steady wind costs (see 3.2). However, the cost of solar was higher than wind until 2018 despite the steady decrease in costs, so perhaps more wind projects were deployed from 2000 to 2018.### 3.2 Cost of Solar and Wind InstallationThe price of solar and wind technologies can be described by the installed costs of these projects. The team has acquired the weighted cost data of the USA from [IRENA](https://www.irena.org/Data). The installed costs can be plotted based on these datasets.IRENA provides the weighted installed costs of [**solar**](https://www.irena.org/Data/View-data-by-topic/Costs/Solar-costs) and [**wind**](https://www.irena.org/Data/View-data-by-topic/Costs/Wind-Costs) projects from 2010 to 2019. We firstly manipulate with the data:```{r}# Read solar and wind cost data and merge to onesolar_cost <-read_csv(here('data', 'irena_solar_installed_cost.csv'))wind_cost <-read_csv(here('data', 'irena_wind_installed_cost.csv'))solar_wind_cost <-left_join(solar_cost, wind_cost, by ='year')# Pivot to longer format and export to csvsolar_wind_cost_longer <- solar_wind_cost %>%pivot_longer(names_to ='energy_source',values_to ='cost',cols =-year) %>%write_csv(here('data', 'irena_solar_wind_installed_cost_longer.csv'))```Then, we generate the **plot** of both solar and wind installed costs:```{r}solar_wind_cost_longer %>%ggplot(aes(x = year,y = cost,color = energy_source)) +geom_line(linewidth =1) +geom_label(data =subset(solar_wind_cost_longer, year ==max(year)),aes(label =recode(energy_source, solar_installed_cost_usd_per_kw ='Solar', wind_installed_cost_usd_per_kw ='Wind')),nudge_x =0.6,nudge_y =10) +labs(x ='Year',y ='Installed Cost (USD/kW)',color ='Energy Source',title ='Solar and Wind Installed Costs in the USA',subtitle ='In USD/kW, From 2010 to 2019') +scale_x_continuous(breaks =seq(2010, 2019, by =1)) +theme_bw(base_family ='Ubuntu') +theme(legend.position ='none')```It was expected that the cost for renewables would be multimodal due to yearly variations but that the overall trend would be decreasing and that the grouping would range depending on the type of technology, specifically the capital cost elasticity. The plot of the cost of solar installed in the U.S. shows a decreasing trend from 2010 and is widely-spread from around 4500 to 1000 USD/kW, which mostly supports the expectations except for the expected yearly variability causing a multimodal distribution. The cost of installing wind projects appears to be multimodal with one large peak in 2010 and a smaller range from about 2325 to 1500 USD/kW. It makes sense that the cost of solar installation has a larger range than wind cost because of the technological solar panel advancements, which have made the capital cheaper, while wind turbines have not seen such progress.### 3.3 Cost of Fossil Fuel (Coal)The [EIA](https://www.eia.gov/electricity/data/browser/#/topic/14?agg=1,0,2&fuel=004&geo=qnifi05c03j78&sec=o3g&freq=M&start=200101&end=202307&ctype=linechart<ype=pin&rtype=s&maptype=0&rse=0&pin=) provides the cost of electricity generation using coal, which can be used as a comparison. Firstly, we perform data **wrangling** and data **exporting**:```{r}fossil_cost_path <-here('data', 'eia_fossil_cost_raw.csv')fossil_cost <-read_csv(fossil_cost_path, skip =4) %>%clean_names() %>%arrange(month) %>%mutate(month =dmy(paste('01', month)),month =format(month, '%Y-%m'),year =as.numeric(substr(month, 1, 4)) ) %>%filter(year <=2022) %>%group_by(year) %>%summarise(avg_coal_cost_per_ton =mean(united_states_coal_dollars_per_tons)) %>%write_csv(here('data', 'eia_fossil_cost.csv'))```Then, we **plot** the cost of coal:```{r}fossil_cost %>%filter(year >=2010& year <=2019) %>%ggplot(aes(x = year,y = avg_coal_cost_per_ton)) +geom_ribbon(aes(ymin =33,ymax = avg_coal_cost_per_ton),alpha =0.4) +geom_line() +labs(x ='Year',y ='USD / Ton',title ='Average Cost of Coal Used for Electricity Generation in the USA',subtitle ='In USD/Ton, From 2010 to 2019') +scale_x_continuous(limits =c(2010, 2019),breaks =seq(2010, 2018, by =1),expand =c(0,0)) +scale_y_continuous(limits =c(33, 48),breaks =seq(33, 48, by =3),expand =expansion(mult =c(0, 0.1))) +theme_bw(base_family ='Ubuntu')```The price of coal was expected to be multimodal due to yearly fluctuations in prices but tightly-grouped since fossil fuels have been a long term energy source in the US with no recent major developments in technological efficiency or resource extraction. The graph supports these expectations as there has been fluctuations in the price resulting in a peak around 2011 and 2014, and the cost has varied slightly from 46.5 to 36 USD/Ton. It is interesting that the cost of coal appears to align with the presidential administrations (Obama 2009-2017, Trump 2017-2021, Biden starting in 2021), perhaps indicating fossil fuel subsidies or political pushes towards renewables.### 3.4 Policies and IncentivesThe [**DSIRE**](https://programs.dsireusa.org/system/program) (Database of State Incentives for Renewables & Efficiency) provides a comprehensive collection of policies and incentives in terms of sustainability. There are 4 subcategories:1. Electric Vehicles2. Energy Efficiency3. Energy Storage4. Renewable EnergyThis website is a great source of policies and incentives, but it doesn't provide direct link of `csv` files. A work-around would be web scrapping using R codes. We decided to obtain the complete table, and tailor it later based on our needs.The table in this page is dynamic and is not revealed only by providing the URL. We suspect it's controlled by some JavaScript codes. A work-around is to store the table part of the HTML codes.Below are the codes of **scrapping** the table:```{r}# Read the html from local pathlaws_html_path <-here('data', 'dsire_laws_raw.html')laws_html <-read_html(laws_html_path)# Scrape the table as a dflaws_table <- laws_html %>%html_nodes('table') %>% .[[1]] %>%html_table()# Save the df to csvwrite_csv(laws_table, here('data', 'dsire_laws_raw.csv'))```The raw table needs to be **modified** and **saved** as a preparation of plotting:```{r}# Solar lawssolar_laws <-read_csv(here('data', 'dsire_laws_raw.csv')) %>%filter(str_detect(`Policy/Incentive Type`, 'Solar')) %>%mutate(Created =as.integer(substr(Created, 7, 10))) %>%count(Created) %>%rename(year = Created, count_of_solar_laws = n) %>%write_csv(here('data', 'dsire_solar_laws.csv'))# Wind lawswind_laws <-read_csv(here('data', 'dsire_laws_raw.csv')) %>%filter(str_detect(`Policy/Incentive Type`, 'Wind')) %>%mutate(Created =as.integer(substr(Created, 7, 10))) %>%count(Created) %>%rename(year = Created, count_of_wind_laws = n) %>%write_csv(here('data', 'dsire_wind_laws.csv'))```Then we do the **plotting**.Firstly, we plot the number of **solar** policies:```{r}solar_laws %>%filter(year <=2020) %>%ggplot(aes(x = year, y = count_of_solar_laws)) +# Shading Layergeom_rect(aes(xmin =-Inf, xmax =2001,ymin =-Inf, ymax =Inf),fill ='mediumaquamarine', alpha =0.1) +geom_rect(aes(xmin =2001, xmax =2009,ymin =-Inf, ymax =Inf),fill ='moccasin', alpha =0.1) +geom_rect(aes(xmin =2009, xmax =2017,ymin =-Inf, ymax =Inf),fill ='lightblue', alpha =0.1) +geom_rect(aes(xmin =2017, xmax =Inf,ymin =-Inf, ymax =Inf),fill ='lightcoral', alpha =0.1) +# Column Layergeom_col() +# Label Layergeom_label(aes(x =2000, y =36, label ='Clinton'),fill ='mediumaquamarine', nudge_x =-0.2, nudge_y =5) +geom_label(aes(x =2005, y =36, label ='Bush'),fill ='moccasin', nudge_x =0, nudge_y =5) +geom_label(aes(x =2013, y =36, label ='Obama'),fill ='lightblue', nudge_x =0, nudge_y =5) +geom_label(aes(x =2019, y =36, label ='Trump'),fill ='lightcoral', nudge_x =0.2, nudge_y =5) +labs(x ='Year',y ='Number of Solar Policies',title ='Number of Solar Policies by Year in the USA',subtitle ='From 2000 to 2020') +scale_x_continuous(breaks =seq(2000, 2020, by =2)) +theme_bw(base_family ='Ubuntu')```Then, we plot the number of **wind** policies:```{r}wind_laws %>%ggplot(aes(x = year,y = count_of_wind_laws)) +# Shading Layergeom_rect(aes(xmin =-Inf, xmax =2001,ymin =-Inf, ymax =Inf),fill ='mediumaquamarine', alpha =0.1) +geom_rect(aes(xmin =2001, xmax =2009,ymin =-Inf, ymax =Inf),fill ='moccasin', alpha =0.1) +geom_rect(aes(xmin =2009, xmax =2017,ymin =-Inf, ymax =Inf),fill ='lightblue', alpha =0.1) +geom_rect(aes(xmin =2017, xmax =Inf,ymin =-Inf, ymax =Inf),fill ='lightcoral', alpha =0.1) +# Column Layergeom_col() +# Label Layergeom_label(aes(x =2000, y =36, label ='Clinton'),fill ='mediumaquamarine', nudge_x =-0.2, nudge_y =5) +geom_label(aes(x =2005, y =36, label ='Bush'),fill ='moccasin', nudge_x =0, nudge_y =5) +geom_label(aes(x =2013, y =36, label ='Obama'),fill ='lightblue', nudge_x =0, nudge_y =5) +geom_label(aes(x =2019, y =36, label ='Trump'),fill ='lightcoral', nudge_x =0.2, nudge_y =5) +labs(x ='Year',y ='Number of Wind Policies',title ='Number of Wind Policies by Year in the USA',subtitle ='From 2000 to 2020') +scale_x_continuous(breaks =seq(2000, 2020, by =2)) +theme_bw(base_family ='Ubuntu')```Note that some policies will both contain the topics of wind and solar, so there is an overlap in solar and wind policies.It was expected that the number of policies would be multimodal and widely-spread due to changes in administration. The plots for wind and solar policies support these expectations. It makes sense that the largest number of wind and solar policies occurred in 2000 while the highest generation of wind and solar has been in the more recent years because there is usually a policy-implementation lag. Also, 2000 was near the end of Clinton's presidential term, so more progressive policies may have been instituted at this time. There is a clear association of policies enacted depending on the presidential term (Clinton 1993-2001, Bush 2001-2009, Obama 2009-2017, Trump 2017-2021): Clinton and Obama show higher numbers of policies than Trump and Bush. Moving forward, we will consider adding color to indicate if the congressional majority was the same party as the president.### 3.5 Energy Research BudgetsThe [iea](https://www.iea.org/data-and-statistics/data-product/energy-technology-rd-and-d-budget-database-2#) provides a comprehensive collection of energy technology RD&D budgets. It can be traced back to 1974 and ends at 2015. We filtered only the solar and wind renewable energy categories and hand scripted the data into a local csv file.Let's **read** and **wrangle** the file:```{r}# Store the wider format for later usesolar_wind_budget_wider <-read_csv(here('data', 'iea_solar_wind_budget_wider.csv') )# This is the longer formatsolar_wind_budget_longer <-read_csv(here('data', 'iea_solar_wind_budget_wider.csv') ) %>%pivot_longer(cols =-year,names_to ='energy_source',values_to ='rdd_millions_usd')```Then, let's do the **plot**:```{r}label_data <-subset(solar_wind_budget_longer, year ==max(year))label_data$nudge <-ifelse(label_data$energy_source =='solar_rdd_million_usd', 220, -100)solar_wind_budget_longer %>%ggplot(aes(x = year,y = rdd_millions_usd,color = energy_source)) +geom_line(linewidth =1) +geom_label(data = label_data,aes(label =recode(energy_source, solar_rdd_million_usd ='Solar', wind_rdd_million_usd ='Wind')),nudge_x =-0.5,nudge_y = label_data$nudge) +labs(x ='Year',y ='RD&D Budget (Million USD)',color ='Energy Source',title ='Solar and Wind Energy Budgets in the USA',subtitle ='In Million USD, From 1974 to 2015') +scale_x_continuous(breaks =seq(1975, 2015, by =5)) +theme_bw(base_family ='Ubuntu') +theme(legend.position ='none')```It was expected that the amount of funding for renewable energy research would also be multimodal and widely-spread because of administration changes. The multimodal expectation was supported by the figure with peaks occurring around 1980, 1995, and 2010 for wind and solar, but solar was wide-spread, while wind was more tightly-grouped. This figure indicates a funding-implementation lag, as the most funding for solar occurred in the late 1970s and early 1980s, but the decline in solar costs is more recent. The boom in solar funding in the late 1970s could be related to the start of the environmental movement. The second major increase in funding from 2009-2015 overlaps with Obama's administration.### 3.6 CorrelationsThe relationships we expected to find among these variables were:1. The price of renewable energy is negatively correlated with renewable energy generation.2. The price of fossil fuels is positively correlated with renewable energy generation.3. The number of promotional policies is positively correlated with renewable energy generation.4. The amount of funding for renewable research is positively correlated with renewable energy generation.#### 3.6.1 Solar/Wind Generation vs 4 DriversIn order to perform the correlations, we need to **merge** the datasets:```{r}# Both solar and windmerged_solar_wind <- fossil_cost %>%full_join(solar_cost, by ='year') %>%full_join(solar_gen, by ='year') %>%full_join(solar_laws, by ='year') %>%full_join(solar_wind_budget_wider, by ='year') %>%full_join(wind_cost, by ='year') %>%full_join(wind_gen, by ='year') %>%full_join(wind_laws, by ='year') %>%arrange(year) %>%write_csv(here('data', 'merged_solar_wind.csv'))```Then, we calculate the correlation coefficients between **solar generation** (`solar_gen`) and other solar related variables. They are:- Solar installed cost - `solar_installed_cost_usd_per_kw`- Solar related laws - `count_of_solar_laws`- Solar technology budget - `solar_rdd_million_usd`- Coal cost (representing traditional power source) - `avg_coal_cost_per_ton````{r}cor_solar_gen_coal_cost <-cor( merged_solar_wind$solar_gen_gwh, merged_solar_wind$avg_coal_cost_per_ton,use ='pairwise.complete.obs' )cor_solar_gen_solar_cost <-cor( merged_solar_wind$solar_gen_gwh, merged_solar_wind$solar_installed_cost_usd_per_kw,use ='pairwise.complete.obs' )cor_solar_gen_solar_laws <-cor( merged_solar_wind$solar_gen_gwh, merged_solar_wind$count_of_solar_laws,use ='pairwise.complete.obs' )cor_solar_gen_solar_rdd <-cor( merged_solar_wind$solar_gen_gwh, merged_solar_wind$solar_rdd_million_usd,use ='pairwise.complete.obs' )```Then, we calculate the correlation coefficients between **wind generation** (`wind_gen`) and other wind related variables. They are:- Wind installed cost - `wind_installed_cost_usd_per_kw`- Wind related laws - `count_of_wind_laws`- Wind technology budget - `wind_rdd_million_usd`- Coal cost (representing traditional power source) - `avg_coal_cost_per_ton````{r}cor_wind_gen_coal_cost <-cor( merged_solar_wind$wind_gen_gwh, merged_solar_wind$avg_coal_cost_per_ton,use ='pairwise.complete.obs' )cor_wind_gen_wind_cost <-cor( merged_solar_wind$wind_gen_gwh, merged_solar_wind$wind_installed_cost_usd_per_kw,use ='pairwise.complete.obs' )cor_wind_gen_wind_laws <-cor( merged_solar_wind$wind_gen_gwh, merged_solar_wind$count_of_wind_laws,use ='pairwise.complete.obs' )cor_wind_gen_wind_rdd <-cor( merged_solar_wind$wind_gen_gwh, merged_solar_wind$wind_rdd_million_usd,use ='pairwise.complete.obs' )```We merge all these values into a data frame:```{r}cor_solar_wind <-data.frame(value_1 =c('Solar','Solar','Solar','Solar','Wind','Wind','Wind','Wind'),value_2 =c('Coal Cost','Installed Cost','Laws','Budgets','Coal Cost','Installed Cost','Laws','Budgets'),cor =c(cor_solar_gen_coal_cost, cor_solar_gen_solar_cost, cor_solar_gen_solar_laws, cor_solar_gen_solar_rdd, cor_wind_gen_coal_cost, cor_wind_gen_wind_cost, cor_wind_gen_wind_laws, cor_wind_gen_wind_rdd)) %>%write_csv(here('data', 'cor_solar_wind.csv'))```The summary table of solar/wind generation vs the 4 other variables is shown below:```{r}cor_solar_wind %>%mutate(cor =round(cor, 2),cor =cell_spec( cor, 'html', color =ifelse( cor >0.5| cor <-0.5, 'white', 'black' ),background =ifelse( cor >0.5| cor <-0.5, 'cornflowerblue', 'white' ))) %>%kable(format ='html',escape =FALSE,align =c('l', 'l', 'l'),col.names =c('Energy Type', 'Compared Variable', 'Correlation'),caption ='Correlations of Solar/Wind Generation vs the Other 4 Variables') %>%kable_styling(bootstrap_options =c('striped', 'hover'))```We found support for two of the expected relationships shown in the correlation coefficient figure: both solar and wind installed costs were negatively correlated with generation, and laws were positively correlated with solar and wind generation. Only wind had a positive correlation between R&D funding and generation, *while solar had an unexpected negative correlation.* Recall with the RD&D graph, a large amount of solar funding occurred before the increase in generation, likely causing this discrepancy. Additionally, the *cost of coal had an unexpected negative correlated with solar and wind generation.* This discrepancy is likely due to the fact that the cost of coal has remained relatively steady, and there might be a response-lag between the cost of coal increasing and the generation of solar and wind increasing.#### 3.6.2 Correlation MatrixThe above section **only** calculated the correlation coefficients between Solar/Wind Generation and the 4 other variables, which did not provide sufficient convincing results. A better solution is to work out the correlation **matrix**, so that all possible correlations are clearly shown. We can search for the most correlated variables based on this matrix. To search for the most correlated variables, we merged the datasets into `merged_solar` and `merged_wind`:```{r}# Solarmerged_solar <- fossil_cost %>%full_join(solar_cost, by ='year') %>%full_join(solar_gen, by ='year') %>%full_join(solar_laws, by ='year') %>%full_join(solar_wind_budget_wider %>%select(-wind_rdd_million_usd),by ='year') %>%arrange(year) %>%write_csv(here('data', 'merged_solar.csv'))# Windmerged_wind <- fossil_cost %>%full_join(wind_cost, by ='year') %>%full_join(wind_gen, by ='year') %>%full_join(wind_laws, by ='year') %>%full_join(solar_wind_budget_wider %>%select(-solar_rdd_million_usd),by ='year') %>%arrange(year) %>%write_csv(here('data', 'merged_wind.csv'))```Then, we calculate the correlation matrix for this merged data:```{r}cor_matrix_solar <-cor( merged_solar[, sapply(merged_solar, is.numeric)],use ='pairwise.complete.obs' ) %>%as.data.frame()cor_matrix_wind <-cor( merged_wind[, sapply(merged_wind, is.numeric)],use ='pairwise.complete.obs' ) %>%as.data.frame()```We set a threshold of $\pm 0.8$ to indicate a "high" correlation. There are **5** high correlation coefficients in solar, and **2** in wind.For **solar**:```{r}high_cor_solar <- cor_matrix_solar %>%mutate(value_1 =row.names(.)) %>%gather(value_2, cor, -value_1) %>%filter(value_1 != value_2, cor >0.8| cor <-0.8) %>%mutate(min_var =pmin(value_1, value_2),max_var =pmax(value_1, value_2)) %>%distinct(min_var, max_var, .keep_all =TRUE) %>%select(-min_var, -max_var) %>%arrange(value_1) %>%write_csv(here('data', 'high_cor_solar.csv'))```1. Solar Generation **vs** Year: **0.84**.2. Solar Generation **vs** Solar Installed Cost: **-0.96**.3. Solar Installed Cost **vs** Year: **-0.98**.4. Solar Installed Cost **vs** Coal Cost: **0.90**.5. Solar Research Budget **vs** Solar Installed Cost: **0.87**.For **wind**:```{r}high_cor_wind <- cor_matrix_wind %>%mutate(value_1 =row.names(.)) %>%gather(value_2, cor, -value_1) %>%filter(value_1 != value_2, cor >0.8| cor <-0.8) %>%mutate(min_var =pmin(value_1, value_2),max_var =pmax(value_1, value_2)) %>%distinct(min_var, max_var, .keep_all =TRUE) %>%select(-min_var, -max_var) %>%arrange(value_1) %>%write_csv(here('data', 'high_cor_wind.csv'))```1. Wind Generation **vs** Year: **0.97**.2. Wind Installed Cost **vs** Year: **-0.81**.The high correlations associated with year have been previously discussed in the variable expectations section and those associated with generation were discussed in the last section; therefore, there were only two new strong correlations produced via this analysis. The two correlations were solar installed cost vs coal cost and solar research budget vs solar installed cost. Solar installed cost was strongly, positively correlated with coal cost, but wind cost was not, which is likely due to the steady decline in the price of solar while the price of wind has been more steady. Solar R&D funding was strongly, positively correlated with installed cost, which may indicate that R&D had been focused on cost efficiency of the solar panels via innovative, cheaper materials.#### 3.6.3 High Correlation PlotsIn the previous part, we found **5** high correlations in solar and **2** in wind. Let's generate the scatter plots to visualize them:```{r}# Plot 1: Solar Generation vs Yearplot_1 <- merged_solar_wind %>%filter(!is.na(solar_gen_gwh)) %>%ggplot(aes(x=year, y=solar_gen_gwh /1000)) +geom_point() +geom_smooth(method='lm', se=FALSE, size =0.5,color='cornflowerblue') +labs(x ='Year',y ='Solar Generation (TWH)',title='Solar Generation vs Year',subtitle ='Correlation = 0.84') +theme_bw(base_family ='Ubuntu')# Plot 2: Solar Generation vs Solar Installed Costplot_2 <- merged_solar_wind %>%filter(!is.na(solar_gen_gwh) &!is.na(solar_installed_cost_usd_per_kw)) %>%ggplot(aes(x=solar_installed_cost_usd_per_kw, y=solar_gen_gwh /1000)) +geom_point() +geom_smooth(method='lm', se=FALSE, size =0.5,color='cornflowerblue') +labs(x ='Solar Installed Cost (USD / kW)',y ='Solar Generation (TWH)',title='Solar Generation vs Installed Cost',subtitle ='Correlation = -0.96') +theme_bw(base_family ='Ubuntu')# Plot 3: Solar Installed Cost vs Yearplot_3 <- merged_solar_wind %>%filter(!is.na(solar_installed_cost_usd_per_kw)) %>%ggplot(aes(x=year, y=solar_installed_cost_usd_per_kw)) +geom_point() +geom_smooth(method='lm', se=FALSE, size =0.5,color='cornflowerblue') +labs(x ='Year',y ='Solar Installed Cost (USD / kW)',title='Solar Installed Cost vs Year',subtitle ='Correlation = -0.98') +theme_bw(base_family ='Ubuntu')# Plot 4: Solar Installed Cost vs Coal Costplot_4 <- merged_solar_wind %>%filter(!is.na(solar_installed_cost_usd_per_kw) &!is.na(avg_coal_cost_per_ton)) %>%ggplot(aes(x=avg_coal_cost_per_ton,y=solar_installed_cost_usd_per_kw)) +geom_point() +geom_smooth(method='lm', se=FALSE, size =0.5,color='cornflowerblue') +labs(x ='Average Coal Cost (USD / Ton)',y ='Solar Installed Cost (USD / kW)',title='Solar Installed Cost vs Coal Cost',subtitle ='Correlation = 0.90') +theme_bw(base_family ='Ubuntu')# Plot 5: Solar Research Budget vs Solar Installed Costplot_5 <- merged_solar_wind %>%filter(!is.na(solar_installed_cost_usd_per_kw) &!is.na(solar_rdd_million_usd)) %>%ggplot(aes(x=solar_installed_cost_usd_per_kw, y=solar_rdd_million_usd)) +geom_point() +geom_smooth(method='lm', se=FALSE, size =0.5,color='cornflowerblue') +labs(x ='Solar Installed Cost (USD / kW)',y ='Solar RD&D (Million USD)',title='Solar RD&D vs Installed Cost',subtitle ='Correlation = 0.87') +theme_bw(base_family ='Ubuntu')# Plot 6: Wind Generation vs Yearplot_6 <- merged_solar_wind %>%filter(!is.na(wind_gen_gwh)) %>%ggplot(aes(x = year,y = wind_gen_gwh /1000)) +geom_point() +geom_smooth(method='lm', se=FALSE, size =0.5,color='cornflowerblue') +labs(x ='Year',y ='Wind Generation (TWH)',title='Wind Generation vs Year',subtitle ='Correlation = 0.97') +theme_bw(base_family ='Ubuntu')# Plot 7: Wind Installed Cost vs Yearplot_7 <- merged_solar_wind %>%filter(!is.na(wind_installed_cost_usd_per_kw)) %>%ggplot(aes(x = year,y = wind_installed_cost_usd_per_kw)) +geom_point() +geom_smooth(method='lm', se=FALSE, size =0.5,color='cornflowerblue') +labs(x ='Year',y ='Wind Installed Cost (USD / kW)',title='Wind Installed Cost vs Year',subtitle ='Correlation = -0.81') +theme_bw(base_family ='Ubuntu')```Show the **solar** plots:```{r}grid.arrange(plot_1, plot_2, ncol =2)grid.arrange(plot_3, plot_4, ncol =2)``````{r}#| fig-width: 3.626plot_5```Show the **wind** plots:```{r}grid.arrange(plot_6, plot_7, ncol =2)```After plotting these high correlation relationships, it becomes more obvious which relationships are more convincing than others. Solar generation vs year appears to be more exponential rather than linear. Solar RD&D vs installed cost has very few data points. Finally, wind installed cost vs year has an odd curvature after 2005.### 3.7 Summary TablesThis section generates 3 summary tables of solar, wind, and fossil (coal).#### 3.7.1 Summary of SolarBelow is the summary of **solar** data, including solar electricity generation, installed cost, laws, and RD&D budget:```{r}# Calculate the summary and export to csvsummary_solar_gen <-summary(solar_gen$solar_gen_gwh)summary_solar_cost <-summary(solar_cost$solar_installed_cost_usd_per_kw)summary_solar_laws <-summary(solar_laws$count_of_solar_laws)summary_solar_budget <-summary(solar_wind_budget_wider$solar_rdd_million_usd)summary_solar_df <-data.frame(tier =names(summary_solar_gen),solar_gen_gwh =round(as.numeric(summary_solar_gen), 2),solar_cost_usd_per_kw =round(as.numeric(summary_solar_cost), 2),solar_laws_count =round(as.numeric(summary_solar_laws)),solar_budget_million_usd =round(as.numeric(summary_solar_budget), 2) ) %>%write_csv(here('data', 'summary_solar.csv'))# Genarate the table using kable()summary_solar_df %>%kable(format ='html',escape =FALSE,col.names =c('Tier','Electricity Generation (GWh)','Installed Cost (USD/KW)','Laws Count','RD&D Budget (Million USD)'),caption ='Summary of Solar') %>%kable_styling(bootstrap_options =c('striped', 'hover'))```#### 3.7.2 Summary of WindBelow is the summary of **wind** data, including wind electricity generation, installed cost, laws, and RD&D budget:```{r}# Calculate the summary and export to csvsummary_wind_gen <-summary(wind_gen$wind_gen_gwh)summary_wind_cost <-summary(wind_cost$wind_installed_cost_usd_per_kw)summary_wind_cost <- summary_wind_cost[!names(summary_wind_cost) =="NA's"]summary_wind_laws <-summary(wind_laws$count_of_wind_laws)summary_wind_budget <-summary(solar_wind_budget_wider$wind_rdd_million_usd)summary_wind_df <-data.frame(tier =names(summary_wind_gen),wind_gen_gwh =round(as.numeric(summary_wind_gen), 2),wind_cost_usd_per_kw =round(as.numeric(summary_wind_cost), 2),wind_laws_count =round(as.numeric(summary_wind_laws)),wind_budget_million_usd =round(as.numeric(summary_wind_budget), 2) ) %>%write_csv(here('data', 'summary_wind.csv'))# Genarate the table using kable()summary_wind_df %>%kable(format ='html',escape =FALSE,col.names =c('Tier','Electricity Generation (GWh)','Installed Cost (USD/KW)','Laws Count','RD&D Budget (Million USD)'),caption ='Summary of Wind') %>%kable_styling(bootstrap_options =c('striped', 'hover'))```#### 3.7.3 Summary of FossilBelow is the summary of **fossil fuel** data (coal costs):```{r}# Calculate the summary and export to csvsummary_fossil_cost <-summary(fossil_cost$avg_coal_cost_per_ton)summary_fossil_cost_df <-data.frame(tier =names(summary_fossil_cost),fossil_cost_usd_per_ton =round(as.numeric(summary_fossil_cost), 2)) %>%write_csv(here('data', 'summary_fossil_cost.csv'))# Genarate the table using kable()summary_fossil_cost_df %>%kable(format ='html',escape =FALSE,col.names =c('Tier','Coal Cost (USD/Ton)'),caption ='Summary of Fossil Cost') %>%kable_styling(bootstrap_options =c('striped', 'hover'))```## 4. AttributionBelow is the attribution breakdown for the team:- Research question was edited **together**.- Data collection and discussion and proposal expectation evaluation by **Abbey**.- Data cleaning and plotting by **Pingfan**.- Moving forward, we will **both** create the new figures to help explore the questions in **Stage 3**.## 5. Appendix### 5.1 Dictionary of FilesThis is a complete dictionary of files. The files can be accessed in the "data" folder. In the dictionary, there are 4 columns:1. Name - The name of the file.2. Type - The type of the file. In our case, we have CSV, HTML, and Excel.3. Source - The source of the file. "Generated" means it is generated by R codes; "Raw" means Its either directly downloaded or scrapped from the website.4. Description - A brief description of the file.The dictionary is in alphabetical order.```{r}dict_file <-data.frame(name =c('cor_solar_wind','dsire_laws_raw','dsire_solar_laws','dsire_wind_laws','eia_fossil_cost_raw','eia_fossil_cost','eia_solar_gen_raw','eia_solar_gen','eia_solar_wind_gen_longer','eia_wind_gen_raw','eia_wind_gen','high_cor_solar','high_cor_wind','iea_solar_wind_budget_wider','irena_solar_installed_cost','irena_solar_wind_installed_cost_longer','irena_wind_installed_cost','merged_solar_wind','merged_solar','merged_wind','summary_fossil_cost','summary_solar','summary_wind','dsire_laws_raw','irena_costs_raw'),type =c('CSV', 'CSV', 'CSV', 'CSV', 'CSV','CSV', 'CSV', 'CSV', 'CSV', 'CSV','CSV', 'CSV', 'CSV', 'CSV', 'CSV','CSV', 'CSV', 'CSV', 'CSV', 'CSV','CSV', 'CSV', 'CSV', 'HTML', 'Excel'),source =c('Generated', 'Raw', 'Generated', 'Generated', 'Raw','Generated', 'Raw', 'Generated', 'Generated', 'Raw','Generated', 'Generated', 'Generated', 'Raw','Generated', 'Generated', 'Generated', 'Generated','Generated', 'Generated', 'Generated', 'Generated','Generated', 'Raw', 'Raw'),description =c('Correlation between generation and 4 other variables','Laws and incentives count from DSIRE','Solar laws and incentives count','Wind laws and incentives count','Monthly fossil cost from EIA','Annual average fossil cost','Monthly solar electricity generation from EIA','Annual solar electricity generation','Longer version of annual solar/wind generation','Monthly wind electricity generation from EIA','Annual wind electricity generation','Solar related correlations higher than 0.8','Wind related correlations higher than 0.8','Wider version of solar/wind RD&D budgets from IEA','Annual average solar installed cost','Longer version of solar/wind generation installed cost','Annual average wind installed cost','Merged dataset of all solar, wind, and fossil','Merged dataset of both solar and fossil','Merged dataset of both wind and fossil','Summary data of fossil cost','Summary dataset of solar','Summary dataset of wind','Scraped law count data from DSIRE','Renewable power generation cost data from IRENA')) %>%mutate(count =row_number()) %>%select(count, everything())dict_file %>%kable(format ='html',escape =FALSE,align =c('c', 'l', 'l', 'l', 'l'),col.names =c('No.', 'Name', 'Type', 'Source', 'Description'),caption ='Dictionary of Data Files') %>%kable_styling(bootstrap_options =c('striped', 'hover'))```### 5.2 Dictionary of Solar and Wind DatasetsThis is a dictionary of solar and wind related datasets that are generated in this report. The majority of the datasets overlap with the file dictionary, and they have same or similar names.All datasets are in the format of **data frame** or **tibble**.```{r}dict_solar_wind <-data.frame(name =c('merged_solar','merged_solar_wind','merged_wind','solar_cost','solar_gen','solar_laws','solar_wind_budget_longer','solar_wind_budget_wider','solar_wind_cost','solar_wind_cost_longer','solar_wind_gen_longer','summary_solar_df','summary_wind_df','wind_cost','wind_gen','wind_laws'),type =c('A tibble: 50 × 6','A tibble: 50 × 10','A tibble: 49 × 6','A tibble: 10 × 2','A tibble: 22 × 2','A tibble: 20 × 2','A tibble: 84 × 3','A tibble: 42 × 3','A tibble: 10 × 3','A tibble: 20 × 3','A tibble: 44 × 3','A data frame: 6 x 5','A data frame: 6 x 5','A tibble: 36 × 2','A tibble: 22 × 2','A tibble: 17 × 2'),description =c('Merged dataset of both solar and fossil','Merged dataset of all solar, wind, and fossil','Merged dataset of both wind and fossil','Annual average solar installed cost','Annual solar electricity generation','Solar laws and incentives count','Longer version of solar/wind RD&D budgets','Wider version of solar/wind RD&D budgets','Solar/wind installed cost','Longer version of solar/wind installed cost','Longer version of annual solar/wind generation','Summary dataset of solar','Summary dataset of wind','Annual average wind installed cost','Annual wind electricity generation','Wind laws and incentives count')) %>%mutate(count =row_number()) %>%select(count, everything())dict_solar_wind %>%kable(format ='html',escape =FALSE,align =c('c', 'l', 'l', 'l'),col.names =c('No.', 'Name', 'Type', 'Description'),caption ='Dictionary of Solar and Wind Datasets') %>%kable_styling(bootstrap_options =c('striped', 'hover'))```### 5.3 Dictionary of CorrelationsThis dictionary contains the correlation datasets. All datasets are in the format of **data frame**.```{r}dict_cor <-data.frame(name =c('cor_matrix_solar','cor_matrix_wind','cor_solar_wind','high_cor_solar','high_cor_wind'),type =c('A data frame: 6 × 6','A data frame: 6 × 6','A data frame: 8 × 3','A data frame: 5 × 3','A data frame: 2 × 3'),description =c('Correlation matrix of all solar related variables','Correlation matrix of all wind related variables','Correlation between generation and 4 other variables','Solar related correlations higher than 0.8','Wind related correlations higher than 0.8')) %>%mutate(count =row_number()) %>%select(count, everything())dict_cor %>%kable(format ='html',escape =FALSE,align =c('c', 'l', 'l', 'l'),col.names =c('No.', 'Name', 'Type', 'Description'),caption ='Dictionary of Correlations') %>%kable_styling(bootstrap_options =c('striped', 'hover'))```### 5.4 MiscellaneousThere are also other datasets that are not included in the above dictionaries:1. There is a `fossil_cost` tibble that contains the average coal cost per ton from 2008 to 2022.2. There are 7 plots named as `plot_1` through `plot_7`. These are the correlation plots shown in [3.6.3 High Correlation Plots]. The other plots are shown directly and are not stored as variables or datasets.3. There are also some other datasets like `label_data`, `laws_html`, `laws_table`, etc. They are used to temporarily store values.