1. What is a graph bar chart?
Stata provides the graph bar command to create vertical bar charts that summarize numerical data across categorical groups. The graph bar command:
- Displays statistical measures (e.g., mean, sum, count) for selected variables.
- Supports flexible grouping with the over() option.
- Offers extensive customization for bar appearance, including colors, gaps, and label orientation.
- Allows data filtering using conditional statements (e.g., the if clause).
- Integrates with additional options for axes, legends, and overall graph aesthetics.
The basic syntax for creating a bar chart in Stata is:
[code]* Syntax of the graph bar command in Stata
graph bar yvars [if] [in] [weight] [, options][/code]
where yvars specifies what each bar shows: (asis) varlist, (percent), (count), or a statistic such as (mean), (median), or (sum) applied to one or more variables. If no statistic is given, the mean is plotted. Options such as over() group the data for visual comparison.
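As a minimal sketch of the statistic part of yvars, the lines below plot two statistics of the same variable side by side. The choice of the wage variable and of the mean/median pair is only illustrative; nlsw88 is the example dataset used throughout this page.

[code]* Load the example dataset shipped with Stata
sysuse nlsw88.dta, clear
* One group of bars per race: mean and median hourly wage side by side
graph bar (mean) wage (median) wage, over(race)[/code]

Each statistic applies to the variables that follow it until the next statistic appears, so several summaries can share one graph.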
Basic examples
Simple count bar graph
[code]* First, load the dataset
sysuse nlsw88.dta, clear
* Bar graph for counts
graph bar (count)[/code]
Adding an “over” option
[code]graph bar (count), over(race)[/code]
Displaying totals
[code]graph bar (count), over(race) blabel(total)[/code]
Showing percentages
[code]graph bar (percent), over(race) ylabel(0(20)100)[/code]
The options of graph bar fall into the following groups:
- group_options – the groups over which bars are drawn.
- yvar_options – the variables that are the bars.
- lookofbar_options – how the bars look.
- legending_options – how yvars are labeled.
- axis_options – how the numerical y axis is labeled.
- title_and_other_options – titles, added text, aspect ratio, etc.
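To make the option groups more concrete, here is a hedged sketch that uses one option from several of them in a single command. The variable, colors, and titles are illustrative choices, not part of the original examples, and the /// line continuation only works in a do-file, not when typed interactively.

[code]sysuse nlsw88.dta, clear
graph bar (mean) wage, ///
    over(race, label(angle(45))) ///  group option: categories and their labels
    bar(1, color(navy)) ///           look-of-bar option: color of the first yvar's bars
    blabel(bar, format(%4.1f)) ///    legending option: label each bar with its height
    ytitle("Mean hourly wage") ///    axis option: title of the numerical y axis
    title("Mean wage by race")        // title option: main title[/code]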
Let’s explore its usage with examples.
2. Horizontal bar graphs
In a horizontal bar graph, the bars are displayed horizontally, with the categories on the vertical axis and the values on the horizontal axis. Horizontal bars are usually preferred when there are many categories or long category labels, because the labels are easier to read.
[code]graph hbar (percent), over(occupation)[/code]
Sorting and displaying all categories
[code]graph hbar (percent), over(occupation, sort(1) descending) missing allcategories[/code]
Graphs with Two or More “over” Variables
[code]graph export "C:Usersimsc.80166PicturesPicture1.svg", as(svg) name("Graph") replace[/code]
Percentages with two “over” variables
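A minimal sketch of graphs with two over() variables, assuming the nlsw88 dataset and the race and married variables as illustrative grouping variables. The first over() forms the inner groups and the second the outer groups. How the percentages are normalized across the groups is worth checking against the graph bar help file before interpreting the second graph.

[code]sysuse nlsw88.dta, clear
* Counts for each race category, repeated within each value of married
graph bar (count), over(race) over(married)
* The same layout with percentages instead of counts
graph bar (percent), over(race) over(married)[/code]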
Two variables side by side over third
[code]sysuse auto.dta, clear
* sort by two variables
sort foreign weight
* Create graph: mean mpg and trunk side by side for each weight, foreign cars only
graph bar mpg trunk if foreign==1, over(weight, label(angle(45)))[/code]
- (line invest year if firm == 1, sort lcolor(blue)): Plots a blue line for Firm 1’s investment over time. The if firm == 1 condition restricts the plot to Firm 1. The sort option ensures that the data points are connected in order of the year variable.
- (line invest year if firm == 2, sort lcolor(red)): Plots a red line for Firm 2’s investment over time.
- title("Investment over Time for Firms 1 and 2") adds a main title.
- xtitle("Year") and ytitle("Investment") label the axes.
- The legend() option provides labels for the two lines.
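The pieces described above can be assembled into one command. The following is a reconstruction under stated assumptions, not the original code: it uses the Grunfeld panel from the Stata website as stand-in data, renames company to firm to match the description, and the legend labels are illustrative.

[code]* Assumption: the Grunfeld panel stands in for the original data
webuse grunfeld, clear
* Assumption: rename company to firm so the variable matches the description
rename company firm
twoway (line invest year if firm == 1, sort lcolor(blue)) ///
       (line invest year if firm == 2, sort lcolor(red)), ///
       title("Investment over Time for Firms 1 and 2") ///
       xtitle("Year") ytitle("Investment") ///
       legend(order(1 "Firm 1" 2 "Firm 2"))[/code]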
twoway vs line
The line command is shorthand for the more general twoway line command. For basic line plots, the two commands are identical and produce the same output. The difference matters in more complex cases: twoway syntax lets you combine several plot types in a single graph, for example overlaying a line plot with a scatter plot or a linear fit (lfit). For a single line plot, choosing between line and twoway line is purely a matter of preference.
[code]* Both commands produce the same basic line chart
line yvar xvar
twoway line yvar xvar
* twoway syntax allows combining plot types
twoway (line yvar xvar) (scatter yvar xvar) (lfit yvar xvar)[/code]
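For a concrete version of the template, the sketch below overlays a scatter plot and a linear fit using the auto dataset. The choice of mpg and weight is only illustrative.

[code]sysuse auto.dta, clear
* Scatter of mileage against weight with a linear fit overlaid
twoway (scatter mpg weight) (lfit mpg weight)[/code]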
Combine line with bar chart
Overlay graphs are typically used to display two variables on a single graph, allowing for a direct comparison between absolute and relative values. For instance, one variable may be represented by a bar chart—which effectively shows counts or proportions—while the other is illustrated with a line graph that highlights trends or changes over a continuous scale.
[code]
clear *
input str40 Sector Coverage Ratio
"Agriculture; " .85813358 0.52
"Mining " .89187858 0.13
"Manufacturing " .36191116 0.29
"Electricity; " .68654997 0.18
"Construction " .13923316 0.36
"Wholesale " .62995644 0.35
"Transport; " .34724069 0.27
"Financial Services " .75544079 0.22
"CSP " .90706484 0.31
"Private Households " .9931992 0.80
end
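* Number the sectors so they can serve as x-axis positions
g Sector1=_n
* labmask is user-written (part of the labutil package): ssc install labutil
labmask Sector1, values(Sector)
* Bars for Coverage with a line for Ratio overlaid on the same axes
twoway bar Coverage Sector1, ylab(0(.2)1, ///
notick) barwidth(.7) xtitle("") ytitle("") ///
xla(1/10, valuelabel notick ang(90)) || ///
line Ratio Sector1, sort[/code]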