This is the second assignment on Data Visualization!

Clarity:
The stacked bar chart is titled “Which Country is more pro-vaccine?”, but its bars are sorted alphabetically in ascending order along the Y-axis. This may mislead users into thinking that Australia is the most pro-vaccine country when it is not. The data points also carry no labels, which hinders proper interpretation of the data.
In the same stacked bar chart, it is hard to interpret the responses in the middle of each stacked bar, as they do not share a common baseline.
The bar chart titled “% of strongly agreed to vaccination” shows the percentage of respondents who strongly agreed to vaccination in each country but does not show the number of respondents. Hence, it may lead users to the wrong conclusion if the sample size is small.
The charts use different scales on their horizontal axes. In the stacked bar chart, the percentage of total records runs from 0% to 100%, whereas in the “% of strongly agreed to vaccination” bar chart, the percentage of strongly agreed runs from 0% to 60%. This makes it unclear to users whether the second chart is truncated.
The charts are sorted inconsistently. In the stacked bar chart, countries are sorted alphabetically in ascending order along the Y-axis, while in the “% of strongly agreed to vaccination” bar chart, countries are sorted by percentage of strongly agree in descending order. Users may wrongly assume that both charts are sorted in the same manner, which hinders proper interpretation of the data.
The legend showing the Likert scale is incomplete, as it only labels “1-Strongly agree” and “5-Strongly disagree”. There are no labels for the point values 2 to 4.
The legend title is “Vac 1”, but there is no explanation of this notation.
Aesthetics:
Data labels (for example, the percentage of the public who agreed to vaccination) are not displayed for all data points on the bar charts, which makes the data points hard to read.
The axis labels on the two charts are inconsistent: the stacked bar chart displays the percentage of total records as whole numbers, while the other bar chart displays the percentage of strongly agreed to one decimal place.
The data source is not clearly stated in the chart.
The colour scheme used for the segments of the stacked bar chart is confusing, as there are too many contrasting colours.
The word ‘Vaccine’ is misspelt as ‘Vacinne’ in the chart title.
Sketch of Proposed Design: 
In terms of Clarity:
For the First Chart
The positive and negative percentages are presented using a synchronized dual axis chart. The neutrals are a separate chart that is placed next to this dual axis chart.
Note: (i) “Strongly Disagree” and “Disagree” categories are combined to obtain Percentage Negative. (ii) “Strongly Agree” and “Agree” categories are combined to obtain Percentage Positive.
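The grouping described in the note above can be sketched in Python as follows. This only illustrates the arithmetic and is not the Tableau implementation; the function name `sentiment_percentages` and the sample scores are assumptions made for the example, and scores are assumed to be coded from 1 = Strongly agree to 5 = Strongly disagree.

```python
from collections import Counter


def sentiment_percentages(scores):
    """Return (positive, neutral, negative) percentages for Likert scores 1-5.

    Assumed coding: 1 = Strongly agree ... 5 = Strongly disagree.
    Values outside 1-5 are ignored.
    """
    counts = Counter(score for score in scores if score in (1, 2, 3, 4, 5))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid Likert scores supplied")
    positive = counts[1] + counts[2]  # Strongly agree + Agree
    neutral = counts[3]
    negative = counts[4] + counts[5]  # Disagree + Strongly disagree
    return tuple(100 * part / total for part in (positive, neutral, negative))


# Hypothetical responses, for illustration only.
sample = [1, 1, 2, 2, 3, 3, 4, 5, 5, 5]
positive, neutral, negative = sentiment_percentages(sample)
print(f"Positive {positive:.0f}%, Neutral {neutral:.0f}%, Negative {negative:.0f}%")
```

Keeping the neutral share separate mirrors the proposed layout, where the neutrals are shown in their own chart next to the positive and negative bars.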
Subsequently, a histogram bar chart is created to show the breakdown of responses for all point values on the Likert scale (“1-Strongly agree”, “2-Agree”, “3-Neutral”, “4-Disagree”, “5-Strongly disagree”) when the user hovers over the stacked bar.
To make the dashboard interactive, parameter controls and filters are used. For instance, the user can select the survey question using a parameter control (e.g. Vac_1 is renamed to “If a Covid-19 vaccine were made available to me this week, I would definitely get it”). For further analysis, data collected from vac2_1, vac2_2, vac2_3, vac2_6 and vac3 are also used.
For the Second Chart
So as not to mislead users into concluding more than the data allows, the results are presented with confidence intervals (i.e. at the 90%, 95% or 99% confidence level).
To make the dashboard interactive, parameter controls and filters are used. For instance, the user can select the date using a parameter control to change the month and year (e.g. January 2021).
In terms of Aesthetics:
All data points are consistently labelled as percentages with 0 decimal places, a chart title is included, and the axis titles are properly labelled for better visualization of the data.
Data source is clearly stated in the chart. Link as follows: https://github.com/YouGov-Data/covid-19-tracker/tree/master/data.
For better visualisation of the histogram bar chart, a sequential colour scheme is applied: the segments of the stack are recoloured from the darkest to the lightest shade of one colour to differentiate positive, neutral and negative sentiment.
The data visualization can be found on Tableau Public: https://public.tableau.com/profile/elaine3214#!/vizhome/Dataviz2_16136800084340/Dashboard1?publish=yes
The data sources are extracted from https://github.com/YouGov-Data/covid-19-tracker/tree/master/data.
For CSV files:
Unzip the zip files for Australia, Denmark, France, Germany, Italy, Norway, Singapore, Sweden and United Kingdom, which were downloaded from the Imperial College London YouGov Covid 19 Behaviour Tracker Data Hub hosted on GitHub. Save all 14 CSV files (Australia, Canada, Denmark, Finland, France, Germany, Italy, Japan, Netherlands, Norway, Singapore, South Korea, Sweden and United Kingdom) in one folder.
The CSV files for Denmark, Finland, Norway and Sweden contain multiple columns with the field name “employment_status”.
The formula below was used to combine the data from these columns into a single column that reflects the correct employment status in Norway’s CSV file. Apply a similar formula to the Denmark, Finland and Sweden CSV files.
Formula: =IFERROR(CHOOSE(MATCH("Yes",CA2:CG2,0),"Full time employment","Part time employment","Full time student","Retired","Unemployed","Not working","Other"),0)
Rename these CSV files to “Denmark_cleaned”, “Finland_cleaned”, “Norway_cleaned” and “Sweden_cleaned”.
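For readers who prefer to do this clean-up outside the spreadsheet, the logic of the employment status formula can be sketched in Python. This is a minimal illustration, not part of the original workflow: the function name `employment_status` is hypothetical, the seven flags are assumed to arrive in the same order as columns CA to CG, and returning `None` when no column says “Yes” is an assumption made for the example.

```python
# Order follows the labels in the spreadsheet formula (columns CA to CG).
EMPLOYMENT_LABELS = [
    "Full time employment",
    "Part time employment",
    "Full time student",
    "Retired",
    "Unemployed",
    "Not working",
    "Other",
]


def employment_status(flags):
    """Return the label of the first column marked "Yes", or None if none is."""
    for label, flag in zip(EMPLOYMENT_LABELS, flags):
        # Ignoring case and stray whitespace is an assumption of this sketch.
        if flag.strip().lower() == "yes":
            return label
    return None


# Hypothetical row: only the fourth employment_status column says "Yes".
row = ["No", "No", "No", "Yes", "No", "No", "No"]
print(employment_status(row))
```

A lookup list keeps the labels in one place, so the same function can be reused for the Denmark, Finland and Sweden files as long as their columns follow the same order.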
For Tableau:
Go to File, open the folder where all 14 CSV files are kept and double-click on any one of the CSV files. All the CSV files in the folder will automatically be listed in Tableau.
Next, manually union the tables. On the Data Source page, double-click New Union to set up the union. Select all the tables to union in the left pane and drag them directly below the first table.
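Conceptually, the union stacks the rows of every country file into one table and leaves a column empty where a file does not have it. A rough Python sketch of that idea is shown below. It is not how Tableau implements the union: the function name `union_csv_files`, the added `source_file` column and the UTF-8 encoding are all assumptions made for illustration.

```python
import csv
from pathlib import Path


def union_csv_files(paths):
    """Stack the rows of several CSV files, filling missing columns with None."""
    columns = []
    rows = []
    for path in paths:
        # UTF-8 is assumed here; the real files may need another encoding.
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            for name in reader.fieldnames or []:
                if name not in columns:
                    columns.append(name)
            for row in reader:
                # Hypothetical extra column recording which file a row came from.
                row["source_file"] = Path(path).stem
                rows.append(row)
    columns.append("source_file")
    return columns, [{name: row.get(name) for name in columns} for row in rows]
```

With the 14 files saved in one folder (the folder name `data` here is hypothetical), a call such as `union_csv_files(sorted(Path("data").glob("*.csv")))` would return the combined column list and rows.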

Manage Metadata

Pivot



In the Data pane, right-click on “Survey Answer” and select Aliases. In the Edit Aliases dialog box, amend the Value (Alias) of each response as follows: 1=“Strongly agree”, 2=“Agree”, 3=“Neutral”, 4=“Disagree” and 5=“Strongly disagree”.

In the Data pane, right-click on “Survey Question” and select Aliases. In the Edit Aliases dialog box, amend the Value (Alias) of each survey question.

In the Calculation Editor that opens, give the calculated field a name and enter a formula or value. When finished, click OK.

Create a visualization in a target worksheet view to serve as the Visualization in Tooltip. Rename the Worksheet as ‘Tooltip’.
Drag [Survey Answer] to the Columns shelf and [Number of Records] to the Rows shelf. Right-click on [Number of Records] and select Percent of Total under Quick Table Calculation.


The final visualization of the chart is as follows:



Create Confidence interval bars:
Create the necessary Calculated Field: Select Analysis and click Create Calculated Field.
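Create calculated field – Score-Strongly Agree: To flag the survey responses that strongly agreed to vaccination, create the calculated field “Score-Strongly Agree” with calculation as follows:
IF [Score] = 1 THEN 1
ELSE 0
END
Create calculated field – Proportion: Create the calculated field “Proportion” with calculation as follows:
SUM([Score-Strongly Agree])/SUM([Number of Records])
Create calculated field – Proportion_SE: Create the calculated field “Proportion_SE” (the standard error of the proportion) with calculation as follows:
SQRT(([Proportion]*(1-[Proportion]))/SUM([Number of Records]))
Create calculated field – Z Value for 90% confidence interval: Create the calculated field “Z_90%” with value as follows: 1.64485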
Create calculated field – Z Value for 95% confidence interval: Create the calculated field “Z_95%” with value as follows: 1.959964
Create calculated field – Z Value for 99% confidence interval: Create the calculated field “Z_99%” with value as follows: 2.575829
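The three Z values above are the two-sided critical values of the standard normal distribution. As a quick cross-check, they can be reproduced with Python's standard library; the helper name `z_value` is an assumption made for this sketch.

```python
from statistics import NormalDist


def z_value(confidence_level):
    """Two-sided critical value of the standard normal distribution."""
    if not 0 < confidence_level < 1:
        raise ValueError("confidence_level must be between 0 and 1")
    # A two-sided interval leaves (1 - level) / 2 in each tail.
    return NormalDist().inv_cdf((1 + confidence_level) / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: {z_value(level):.6f}")
```

The printed values should agree with the constants entered in Tableau to the number of decimal places given there.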
Create a parameter to select confidence interval: In the Data pane, click the drop-down arrow in the upper right corner and select Create Parameter. Select the Data type as “String” and Allowable values as “List”. Edit the List of Values as follows:


Create the calculated field “Upper Bound of Confidence Interval” with calculation as follows: [Proportion]+[CI]*[Proportion_SE]
Create the calculated field “Lower Bound of Confidence Interval” with calculation as follows: [Proportion]-[CI]*[Proportion_SE]
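The bounds above follow the usual normal-approximation interval for a proportion, p ± z·sqrt(p(1−p)/n). A minimal Python sketch of the same arithmetic is given below; the function name and the sample counts are hypothetical, and the argument `z` is assumed to play the role of [CI] in the Tableau calculation.

```python
import math


def proportion_confidence_interval(successes, n, z):
    """Return (lower, upper) bounds of the normal-approximation interval."""
    if n <= 0:
        raise ValueError("n must be positive")
    proportion = successes / n
    # Same expression as the Proportion_SE calculated field.
    standard_error = math.sqrt(proportion * (1 - proportion) / n)
    return proportion - z * standard_error, proportion + z * standard_error


# Hypothetical counts: the same 50% share with a small and a large sample.
for n in (100, 1000):
    lower, upper = proportion_confidence_interval(n // 2, n, 1.959964)
    print(f"n={n}: {lower:.3f} to {upper:.3f}")
```

The interval narrows as the number of records grows, which is why groups with few respondents show wider confidence interval bars.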

The Final Visualization of the chart is as follows:
Create Dashboard

Edit the title of the Dashboard to “Public willingness on Covid-19 vaccination Survey Report”.
Remove filters such as Measure Names and add the following fields as filters (in the right pane) to create an interactive dashboard: age (grouped into 18-58 and 59-99), employment_status (renamed to Employment Status), household_size (renamed to Household Size), gender (renamed to Gender), endtime (renamed to Month-Year) and household_children (renamed to Household Children).
Add the data source at the bottom of the Dashboard: https://github.com/YouGov-Data/covid-19-tracker/tree/master/data.
Select every filter on the right pane and click on the left arrow indicating “More Options”. Select “Apply to Worksheets” and click on “All Using This Data Source”.
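The age grouping used for the age filter above (18-58 and 59-99) amounts to a simple binning rule, sketched here in Python. The function name `age_group` is hypothetical, and returning `None` for ages outside both ranges is an assumption made for the example.

```python
def age_group(age):
    """Map an age in years to the dashboard's two age groups."""
    if 18 <= age <= 58:
        return "18-58"
    if 59 <= age <= 99:
        return "59-99"
    return None  # outside the ranges used in the dashboard


print(age_group(35), age_group(72))
```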

As is evident from another survey question, “I am worried about the potential side effects of a Covid-19 vaccine”, the uncertainty in the proportion of respondents who selected ‘Strongly Agree’ is also larger for a household size of 5 than for a household size of 3.