A deep learning approach to identify unhealthy advertisements in street view images

While outdoor advertisements are common features within towns and cities, they may reinforce social inequalities in health. Vulnerable populations in deprived areas may have greater exposure to fast food, gambling and alcohol advertisements, which may encourage their consumption. Understanding who is exposed, and evaluating potential policy restrictions, requires a substantial manual data collection effort. To address this problem we develop a deep learning workflow to automatically extract and classify unhealthy advertisements from street-level images. We introduce the Liverpool 360° Street View (LIV360SV) dataset for evaluating our workflow. The dataset contains 25,349 360-degree street-level images collected by cycling with a GoPro Fusion camera, recorded Jan 14th–18th 2020. In total, 10,106 advertisements were identified and classified as food (1335), alcohol (217), gambling (149) and other (8405). We find evidence of social inequalities, with a larger proportion of food advertisements located within deprived areas and areas frequented by students. Our project presents a novel implementation for the incidental classification of street view images to identify unhealthy advertisements, providing a means to identify areas that could benefit from tougher advertisement restriction policies aimed at tackling social inequalities.


Introduction
The literature on advertising has previously shown that certain social demographics experience greater exposure to unhealthy products via a variety of advertisement platforms [1][2][3][4][5]. There is increasing recognition of the role of unhealthy product consumption in the global non-communicable disease burden 6. In recent years, some public authorities have responded by introducing restrictions to limit exposure to advertisements that encourage risky behaviour (e.g. Transport for London have banned all fast food advertisements on their networks). Understanding which populations and areas are exposed to unhealthy advertisements, monitoring whether regulations are being adhered to, and identifying areas in which to implement restrictions remain open problems. Collecting advertisement data within urban environments requires a substantial manual effort 5,[7][8][9], as there are very few (if any) existing secondary datasets geolocating advertisements. Because advertisements change rapidly and constantly, manual surveys of the landscape, which are time and cost intensive, are of limited use.
The emergence of deep learning 10 for improved image classification raises the possibility of automating this task. Current state-of-the-art seamless segmentation networks 11 can be trained to identify billboards using the Mapillary Vistas Dataset for semantic understanding of street scenes 12. However, this dataset does not distinguish between advertisement categories. Furthermore, manually annotating advertisements within street-level imagery is time consuming and can produce a dataset with a limited shelf-life: advertisement campaigns, company logos and product ranges are ever evolving 13, rendering manual efforts obsolete. To mitigate this problem we present a workflow for extracting and classifying advertisements using a dataset augmentation approach that is flexible and allows repeated data sweeps.
The aim of our study is to develop a deep learning workflow to automatically extract and classify unhealthy advertisements from street view images. Our contributions can be summarized as follows:
3. We compare the clustering of extracted advertisements by socio-demographics to study the extent of social inequalities in unhealthy advertisement exposure.

The Impact of Unhealthy Advertisements
The Commercial Determinants of Health (CDoH), defined here as the processes through which private organisations prioritise profit over public health, are powerful drivers of trends in non-communicable diseases and health inequalities 14,15. Organisations may encourage the consumption of unhealthy products through marketing and advertising campaigns across multiple platforms.
There is growing concern among public health officials regarding the number of advertisements for risky products, e.g., alcohol, gambling, and unhealthy food and beverages 16,17. Numerous studies conducted around the world indicate that exposure to advertisements for unhealthy energy-dense, nutrient-poor food and beverages can promote unhealthy eating habits [18][19][20][21][22][23][24]. The marketing of products that are high in fat, sugar and salt to children is particularly concerning, as it increases the potential for diet-related diseases later in life 21. Exposing adolescents to alcohol advertisements has been found to encourage early use and can lead to increased consumption 25, while gambling advertisements can trigger an impulse to gamble more, in particular among individuals who want either to quit or to gamble less frequently 26.

Differences in exposure to advertising
When advertisements are prevalent within deprived areas, or areas with high levels of obesity, they may counter public health efforts to tackle health inequalities. Evidence suggests a socio-economic difference in exposure to outdoor food advertising. For instance, in Newcastle upon Tyne, England, larger spaces were found to be devoted to food advertisements within less affluent areas 5. Differences in exposure, meanwhile, have been linked to a big data revolution in which many firms possess unprecedented amounts of information about consumers, enabling advertisement campaigns to be aimed at individual demographics within the population 4,27. This practice has been shown to affect brand perceptions among the exposed demographic. Harris et al. 28 find that, upon greater exposure to advertisements promoting energy-dense, nutrient-poor foods, Black and Latino adolescents develop a more positive attitude towards the promoted brand. Pasch et al. 29 show that the number of outdoor alcohol advertisements found within 1500 feet of 63 Chicago schools is significantly higher for schools with 20% or more Hispanic students: 6.5 times higher than for schools with less than 20% Hispanic students. Alcohol marketing campaigns have also been shown to be more prevalent around areas frequented by university students. Kuo et al. 30 find that alcohol advertisements are prevalent in the alcohol outlets around college campuses in the USA.
Students are also a demographic more likely to be exposed to gambling advertisements. Clemens et al. 31 find that high exposure to gambling advertisements is positively related to all assessed gambling outcomes. In addition, strong associations have been found between exposure to related advertisements and adolescents and students engaging in risky behaviour such as drinking and gambling 32,33. Problem gambling in particular has the potential to be amplified by drinking and eating disorders. Lopez et al. 33 investigate the extent to which gambling commercials promote the risky behaviours of drinking alcohol and eating food of low nutritional value, looking at the narratives depicted within the advertisements. The authors find that British and Spanish football betting advertisements attempt to align the consumption of alcohol with sports culture and friendship within the emotionally charged context of watching sporting events. Indeed, even far-reaching sporting bodies, e.g., the English Premier League, have been shown to have marketing portfolios that include unhealthy products 17.
Restricting exposure to unhealthy advertisements, meanwhile, has been found to have a positive effect on behaviour 24. Lwin et al. 34, for example, study the impact of food advertising restrictions enforced in Singapore. The authors find that children's cognition towards fast food shifts in a desirable direction after a stricter policy is adopted, with household stocks of unhealthy food items also decreasing. However, while there is evidence that vulnerable populations are more exposed to unhealthy advertisements and that restricting them is an effective strategy, much of this evidence comes from lab-based studies. To our knowledge, there are few, if any, data available on the location of advertisements. To understand differential patterns of exposure, and to effectively evaluate the impact of future regulatory interventions, we need data systems that map advertisement locations. Traditional data collection strategies employ primary surveys to locate advertisements; however, such methods are time and cost intensive, yielding static snapshots that fail to capture the dynamic and evolving nature of advertisement strategies.

Deep Learning
Utilising incidental data sources, coupled with maturing image classification techniques, offers one way to improve and automate the data collection process efficiently. Deep learning is one technique that has shown a lot of promise for developing solutions to challenging virtual and real-world problems 35,36. These successes can be attributed to breakthroughs that enable deep neural networks to learn solutions to problems that humans solve using intuition 10. Deep neural networks are trained to extract compact features from complex, high-dimensional input data. They accomplish this by combining layers of hierarchical features into ever more complex concepts. Our workflow uses Convolutional Neural Networks (ConvNets), which can extract features from inputs in the form of arrays and tensors 37. In a ConvNet trained to classify images, the first layer extracts edges, which subsequent layers combine into corners and contours and then into the object parts that enable a classification. By stacking multiple non-linear layers, the network can be trained using stochastic gradient descent to implement complex functions that are sensitive to minute details within inputs, while simultaneously ignoring less relevant features 37. By building an effective classifier that can be updated with new information (important when advertisements are constantly changing), deep learning offers a deployable tool that classifies images automatically and more efficiently than manual coding by researchers.
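The first stage of that hierarchy can be illustrated with a single convolutional filter. The plain-Python sketch below slides a hand-set 3×3 vertical-edge kernel over a tiny synthetic image (dark left half, bright right half) and applies a ReLU non-linearity, so the response is non-zero only where intensity changes. The kernel values are an assumption chosen for illustration: a trained ConvNet learns its kernels with stochastic gradient descent, and libraries such as Keras implement the same sliding-window operation far more efficiently. This is not code from our workflow.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` with stride 1 and no padding.

    Like ConvNet layers in common deep learning libraries, this computes a
    cross-correlation (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out: Matrix = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def relu(x: Matrix) -> Matrix:
    """Element-wise non-linearity applied after the convolution."""
    return [[max(0.0, v) for v in row] for row in x]


# Hand-set (Sobel-like) vertical-edge detector; a trained network learns such weights.
VERTICAL_EDGE = [[-1.0, 0.0, 1.0],
                 [-2.0, 0.0, 2.0],
                 [-1.0, 0.0, 1.0]]

# 4x6 toy image: dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]
feature_map = relu(conv2d_valid(image, VERTICAL_EDGE))
```

The two middle columns of `feature_map` respond to the dark-to-bright boundary while the flat regions produce zero, which is the kind of low-level feature that later layers combine into contours and object parts.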

Mapillary Vistas dataset
Street-level images (also known as street view images) are panoramic images recorded at set intervals. Services such as Google Street View, Bing Maps and Mapillary use these data to provide a virtual representation of map locations. To aid the development of state-of-the-art road scene understanding, Mapillary introduced the Vistas dataset, which consists of 25,000 densely annotated international street-level images with 66 object categories, including billboards 12. The dataset annotates crowd-sourced images, approximately 90% of which are road/sidewalk views in urban areas, with the remainder from rural areas and off-road. Individual objects within each image are delineated using polygons. Since its release, Mapillary Vistas has frequently been used for benchmarking panoptic street scene segmentation methods 11,38.

Advertisement Data
While Mapillary Vistas includes a billboards category, the dataset does not distinguish between types of advertisements. Further annotations would therefore be necessary to train panoptic scene segmentation networks to differentiate between advertisement types. However, manually annotating segmentation masks is a time-consuming task. Instead, we propose to classify advertisements extracted from street-level images using a model trained to classify advertisement images. Google Images is a useful resource for obtaining data for training and evaluating deep learning architectures 39,40, and we use it to build an advertisement dataset. First we compile a list of relevant keywords describing brands, business names and key terms. The keywords are then used to scrape images using Python's Google Image Download package (https://pypi.org/project/google_images_download/). Our final dataset consists of 159,897 food, 80,001 alcohol, 40,119 gambling and 34,156 other samples. Duplicate images are removed using FDUPES 41.
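FDUPES, to our understanding, finds duplicates by comparing file sizes and MD5 signatures and then confirming matches with a byte-by-byte comparison. The sketch below mimics that three-stage idea with the Python standard library (`hashlib`, `filecmp`) so the de-duplication step is easy to reproduce or audit. It is not FDUPES itself, and the function names are our own. It only catches byte-identical copies; re-encoded or resized versions of the same advertisement would need a perceptual hash instead.

```python
import filecmp
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def _md5(path: Path, chunk_size: int = 1 << 16) -> str:
    """MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(folder: Path) -> List[List[Path]]:
    """Return groups of byte-identical files found under `folder`."""
    # Stage 1: only files of equal size can be duplicates.
    by_size: Dict[int, List[Path]] = defaultdict(list)
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    groups: List[List[Path]] = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        # Stage 2: group equal-sized files by MD5 signature.
        by_hash: Dict[str, List[Path]] = defaultdict(list)
        for path in same_size:
            by_hash[_md5(path)].append(path)
        # Stage 3: confirm each hash match with a full byte-by-byte comparison.
        for candidates in by_hash.values():
            first = candidates[0]
            group = [first] + [p for p in candidates[1:]
                               if filecmp.cmp(first, p, shallow=False)]
            if len(group) > 1:
                groups.append(group)
    return groups


def remove_duplicates(folder: Path) -> List[Path]:
    """Delete all but the first file of each duplicate group; return deleted paths."""
    removed: List[Path] = []
    for group in find_duplicates(folder):
        for path in group[1:]:
            path.unlink()
            removed.append(path)
    return removed
```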

The Liverpool 360 Street View Dataset
While there is an abundance of street-level imagery on platforms such as Google Street View, the recently imposed costs for using Google's API, as well as cases of Google updating its terms and conditions in ways that hinder researchers, highlight the need for alternative open-source solutions. Existing open and crowd-sourced street-level images predominantly lack the quality of the interactive panoramas found on services such as Google Street View: images are frequently recorded using dashboard cameras and, as a result, have a restricted field of view. Motivated by these factors, we record an open street-level dataset for Liverpool using a GoPro Fusion 360° camera attached to a member of the team (Mark Green) who cycled along major roads. We follow Mapillary's recommendations for recording street-level images (https://help.mapillary.com/hc/en-us/articles/360026122412-GoPro-Fusion-360). The camera records front and back images at 0.5-second intervals, which we later stitch together using GoPro Fusion Studio. To date our dataset consists of 26,645 street-level images, each with a recorded GPS location. We illustrate the current coverage of the LIV360SV dataset in Figure 1. We focused on sampling three areas of Liverpool with varying contexts over three different days: (1) City Centre (Jan 14th 2020), areas characterised by shops and services; (2) North Liverpool (Jan 15th 2020), areas with high levels of deprivation; (3) South Liverpool (Jan 18th 2020), areas including a mixture of affluent populations and diverse ethnic groups (see https://www.mapillary.com/app/org/gdsl_uol?lat=53.39&lng=-2.9&z=11.72&tab=uploads). To date we have identified 10,106 advertisements within these data, manually classified as food (1335), alcohol (217), gambling (149) and other (8405).

Spatial data
To examine the extent of geographical clustering in the socio-demographic types of areas that advertisements are located, we use three area level datasets.
First, neighbourhood deprivation is measured using the English Indices of Deprivation 2019 42. The index measures neighbourhood deprivation based on seven domains: income, employment, education, health, crime, access to housing and services, and environmental features. Data are measured for Lower Super Output Areas (LSOAs), which are administrative zones with an average population size of ≈ 1500 people. We use the decile of deprivation rank in our analyses. Second, socio-demographic area type is measured using the 2011 Output Area Classification (OAC) 43. OAC is a neighbourhood classification built from demographic (e.g. age, sex, ethnicity) and social (e.g. occupation, education) measures to classify 'area types'. OAC comprises 8 Supergroups and 26 Groups, which we describe in Table 1. We focus our evaluation at the Supergroup and Group levels. Output Areas are administrative zones with a minimum of 100 people.
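The published deprivation files already provide deciles; the short sketch below only makes the rank-to-decile convention explicit, assuming rank 1 is the most deprived LSOA and decile 1 contains the most deprived 10% of LSOAs. The total of 32,844 LSOAs in England is our assumption about the 2019 release and is exposed as a parameter; the exact boundary convention of the official files may differ slightly.

```python
N_LSOAS_ENGLAND_2019 = 32_844  # assumption: number of LSOAs ranked in the 2019 indices


def imd_decile(rank: int, n_areas: int = N_LSOAS_ENGLAND_2019) -> int:
    """Map a deprivation rank (1 = most deprived) to a decile (1 = most deprived 10%)."""
    if not 1 <= rank <= n_areas:
        raise ValueError(f"rank must be between 1 and {n_areas}")
    # Integer ceiling of rank * 10 / n_areas, avoiding floating-point rounding.
    return (rank * 10 + n_areas - 1) // n_areas
```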

Finally, we examine whether advertisements are clustered by relevant health outcomes. We focus on small-area estimates of child obesity and excess weight. Estimates are taken from the National Child Measurement Programme and are released for Middle Layer Super Output Areas (MSOAs; average population size ≈ 7000). We conduct our evaluation using the 2015/16 to 2017/18 measurements of reception and year 6 children (https://www.gov.uk/government/statistics/child-obesity-and-excess-weight-small-area-level-data). No openly available small-area data on alcohol- and gambling-related outcomes exist.

Method
Figure 2 illustrates our workflow, and we discuss each individual component in detail below.For implementation details and dataset download instructions visit: https://github.com/gjp1203/LIV360SV.

Seamless Scene Segmentation
For extracting advertisements from street-level images we use the seamless scene segmentation network introduced by Porzi et al. 11. The network combines the advantages of semantic segmentation (determining the semantic category that a pixel belongs to) and instance-specific semantic segmentation (determining the individual object that a pixel belongs to), enabling differentiation between neighbouring entities of the same type. The authors achieve state-of-the-art results on three street-view datasets: Cityscapes 44, the Indian Driving Dataset 45 and Mapillary Vistas 12.

Table 1. Area classification for output area (OAC) cluster names 43.
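Output that combines semantic and instance information can be thought of as two aligned label maps: a semantic class per pixel and an instance id per pixel. Assuming that representation (the exact output format of the network, and the billboard class id used below, are hypothetical), separating individual billboards reduces to grouping billboard pixels by instance id, as in this plain-Python sketch.

```python
from typing import Dict, List

Grid = List[List[int]]
Mask = List[List[bool]]

BILLBOARD_CLASS = 7  # hypothetical class id; real ids depend on the label map in use


def billboard_instance_masks(semantic: Grid, instance: Grid,
                             target_class: int = BILLBOARD_CLASS) -> Dict[int, Mask]:
    """Split the pixels of `target_class` into one boolean mask per instance id."""
    height, width = len(semantic), len(semantic[0])
    masks: Dict[int, Mask] = {}
    for y in range(height):
        for x in range(width):
            if semantic[y][x] != target_class:
                continue  # not a billboard pixel
            inst = instance[y][x]
            if inst not in masks:
                masks[inst] = [[False] * width for _ in range(height)]
            masks[inst][y][x] = True
    return masks
```

Two adjacent billboards share a semantic class but receive different instance ids, so each ends up with its own mask and can be extracted separately.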

Extraction
Upon identifying the location of an advertisement, we obtain a one-hot mask with a filled convex hull using OpenCV's contour finding and drawing functions (findContours and drawContours) 46. The masks allow us to extract individual advertisements from the original input images.
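OpenCV exposes this step through functions such as `cv2.findContours`, `cv2.convexHull` and `cv2.drawContours`. To keep the illustration dependency-free, the sketch below re-implements the same idea in plain Python: Andrew's monotone chain algorithm builds the convex hull of a set of pixel coordinates, every pixel inside the hull is set to 1, and the resulting binary mask zeroes out everything else in the image. It is an illustrative stand-in under those assumptions, not the workflow's code.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates


def _cross(o: Point, a: Point, b: Point) -> int:
    """z-component of (a - o) x (b - o); positive when b lies left of o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; vertices in counter-clockwise (x, y) order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # drop the endpoints shared by both chains


def filled_hull_mask(points: Sequence[Point], height: int, width: int) -> List[List[int]]:
    """Binary (0/1) mask with the convex hull of `points` filled in."""
    hull = convex_hull(points)
    mask = [[0] * width for _ in range(height)]
    if len(hull) < 3:  # degenerate hull: mark only the hull points
        for x, y in hull:
            mask[y][x] = 1
        return mask
    n = len(hull)
    for y in range(height):
        for x in range(width):
            # Inside (or on the boundary of) a counter-clockwise convex polygon
            # means on or to the left of every edge.
            if all(_cross(hull[i], hull[(i + 1) % n], (x, y)) >= 0 for i in range(n)):
                mask[y][x] = 1
    return mask


def apply_mask(image: List[List[int]], mask: List[List[int]]) -> List[List[int]]:
    """Keep pixels where the mask is 1 and zero out everything else."""
    return [[px if m else 0 for px, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]
```

Filling the hull rather than the raw contour closes gaps left by occluding objects inside the advertisement's outline, at the cost of also including any background that falls within the hull.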

Preprocessing
Once the remaining content has been masked out during the extraction step, we crop the images. The final step of our workflow passes the extracted items to a classifier trained on advertisement images shown in frontal view, whereas the majority of extracted items were recorded from a non-frontal view. We therefore use a Spatial Transformer Network (STN) 47 to transform the extracted items.
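A Spatial Transformer Network consists of a localisation network that predicts transformation parameters, a grid generator and a differentiable sampler. The plain-Python sketch below shows only the crop and the last two STN pieces: the 2×3 affine matrix `theta` is supplied by hand (an assumption; in an STN it is predicted by the localisation network and learned end to end), and each output pixel is filled by bilinear sampling of the input. Coordinates are normalised to [-1, 1] with corners aligned, which is one common convention.

```python
import math
from typing import List, Sequence

Image = List[List[float]]


def crop_to_content(image: Image) -> Image:
    """Crop to the bounding box of non-zero (i.e. unmasked) pixels."""
    rows = [y for y, row in enumerate(image) if any(row)]
    if not rows:
        return []  # everything was masked out
    cols = [x for x in range(len(image[0])) if any(row[x] for row in image)]
    return [row[cols[0]:cols[-1] + 1] for row in image[rows[0]:rows[-1] + 1]]


def _bilinear(img: Image, x: float, y: float) -> float:
    """Bilinearly interpolate img at continuous pixel coordinates (x, y).

    Neighbours falling outside the image contribute zero.
    """
    h, w = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(x - xx)) * (1 - abs(y - yy))
                value += img[yy][xx] * weight
    return value


def affine_warp(img: Image, theta: Sequence[Sequence[float]],
                out_h: int, out_w: int) -> Image:
    """Grid generator plus bilinear sampler, in the style of an STN.

    `theta` maps normalised output coordinates (x_t, y_t) in [-1, 1] to
    normalised input coordinates (x_s, y_s).
    """
    h, w = len(img), len(img[0])
    out: Image = []
    for i in range(out_h):
        y_t = -1 + 2 * i / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            x_t = -1 + 2 * j / (out_w - 1) if out_w > 1 else 0.0
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            # Map normalised coordinates back to pixel indices (corners aligned).
            row.append(_bilinear(img, (x_s + 1) * (w - 1) / 2, (y_s + 1) * (h - 1) / 2))
        out.append(row)
    return out
```

With the identity matrix `[[1, 0, 0], [0, 1, 0]]` the image is returned unchanged; shear, scale and translation terms in `theta` produce the perspective-like corrections needed to bring a sideways-viewed advertisement closer to a frontal view.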

Classification
We classify extracted advertisements using Keras' MobileNet-V2 48 implementation. The network is trained using manually labelled extracted samples augmented with the scraped image dataset described in Section 3.2. We train the network for five epochs of 1250 steps each, using a learning rate of 1e-4 and a batch size of 32 images per step. The input images are 224x224 pixels. We apply common dataset augmentation techniques, including random rotations and spatial transformations. We accelerate the training process using a GeForce GTX 1080 GPU.
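The training schedule can be checked with simple arithmetic. The last line relates it to the 10,000 images per category used in the cross-validation experiments (Section 5.2); reading each epoch as exactly one pass over that balanced four-category training set is our inference, which the text does not state.

```latex
\begin{aligned}
\text{images per epoch} &= 1250 \times 32 = 40{,}000,\\
\text{image presentations over training} &= 5 \times 40{,}000 = 200{,}000,\\
4 \text{ categories} \times 10{,}000 &= 40{,}000 .
\end{aligned}
```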

Figure 2. An illustration of our advertisement classification workflow.

Results
We take a two-step approach to evaluating our proposed workflow. First, we analyse the clustering of advertisements extracted by the seamless scene segmentation network component. So that classification errors do not affect this analysis, we conduct it after assigning ground truth labels to the extracted advertisements. Second, we evaluate the extent to which a MobileNet-V2 can be trained to classify the extracted advertisements.

Examining inequalities in advertisement locations
In Figure 3 we illustrate the distribution of advertisements belonging to each category across the LSOAs of Liverpool. Each LSOA is shaded according to its deprivation decile, with white and black representing the most and least deprived respectively. Advertisements are represented by circles. We turn to the bar plots in Figure 4 to illustrate exposure to unhealthy advertisements per decile of deprivation. In Sub-Figure 4a we observe an imbalance in the number of street-level image samples per decile within the LIV360SV dataset. We therefore focus on the proportion of advertisements found within each decile. In Sub-Figure 4b we observe that, with the exception of alcohol, the less deprived LSOAs have fewer advertisements than the more deprived areas. While larger proportions of food advertisements are found within deciles 1 to 6, the highest proportion of alcohol advertisements is found in decile 8 (5.59%). For gambling, meanwhile, the largest proportion of advertisements is found within decile 5 (2.37%). Figure 5 compares the proportions of advertisements by OAC area type. For alcohol we observe that a large proportion of advertisements belong to OAC 8c (Hard-Pressed Ageing Workers; 14.29%, see Sub-Figure 5b). However, this category only contains 14 images (Sub-Figure 5a). Among the better represented categories, the largest proportions of alcohol advertisements can be found within 2a (Students Around Campus, 0.83%), 2b (Inner-City Students, 2.98%) and 3a (Ethnic Family Life, 1.15%). For gambling, the largest proportions of advertisements are also located within 2a (1.77%) and 2b (1.01%). We also observe larger proportional representation under Supergroup 7 (Constrained City Dwellers), in particular 7a (Challenged Diversity, 0.42%) and 7c (White Communities, 0.74%). The largest proportions of food advertisements can be found within Supergroups 2 (Cosmopolitans), 4 (Multicultural Metropolitans) and 8 (Hard-Pressed Living); specifically, 2a (Students Around Campus, 4.62%), 2b (Inner-City Students, 11.8%), 2c (Comfortable Cosmopolitan, 4.23%), 4b (Challenged Asian Terraces, 23.29%), 4c (Asian Traits, 16.67%), 8b (Challenged Terraced Workers, 23.2%) and 8c (Hard-Pressed Ageing Workers, 7.14%). However, 4b (73 images), 4c (6) and 8c (14) contain fewer images than the other categories.
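Because the number of images varies between deciles (Sub-Figure 4a), raw counts would over-weight the more heavily sampled deciles. The sketch below shows the normalisation in plain Python, assuming each image record carries its LSOA deprivation decile and the set of advertisement categories found in it; that input format is hypothetical and not taken from our code.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def category_percentages(
    images: Iterable[Tuple[int, Set[str]]],
) -> Dict[int, Dict[str, float]]:
    """Percentage of images in each decile that contain each advertisement category.

    `images` yields (deprivation_decile, categories_in_image) pairs. An image
    without advertisements contributes an empty set but still counts towards
    its decile's total, which is what makes the percentages comparable.
    """
    totals: Counter = Counter()
    hits: Dict[int, Counter] = defaultdict(Counter)
    for decile, categories in images:
        totals[decile] += 1
        for category in categories:
            hits[decile][category] += 1
    return {
        decile: {cat: 100.0 * n / totals[decile] for cat, n in hits[decile].items()}
        for decile in sorted(totals)
    }
```

The same function applies unchanged to OAC Groups or Supergroups by passing the area type instead of the decile as the first element of each pair.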
In Sub-Figure 6a we shade MSOAs according to the percentage of children classed as carrying excess weight for reception and year 6 pupils.Red circles represent the locations of food advertisements.To gain insights regarding the exposure of pupils towards food advertisements we split the MSOAs into deciles according to the percentage of children carrying excess weight.
We illustrate the number and percentage of food advertisements found within each decile in Sub-Figures 6b and 6c respectively. With the exception of deciles six, nine and ten, we observe greater exposure to food advertisements within MSOAs with higher percentages of pupils carrying excess weight.
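One way to perform the decile split is rank-based binning, sketched below in plain Python. The MSOA codes and input dictionaries are hypothetical; decile 1 holds the lowest excess-weight percentages here (the orientation used in Figure 6 is not stated in the text), and ties are broken by area code, which is an arbitrary choice.

```python
from collections import Counter
from typing import Dict, Iterable


def decile_bins(values: Dict[str, float]) -> Dict[str, int]:
    """Rank areas by value and split them into ten (near) equal-sized groups.

    Decile 1 holds the lowest values; ties are broken by area code.
    """
    ordered = sorted(values, key=lambda area: (values[area], area))
    n = len(ordered)
    return {area: rank * 10 // n + 1 for rank, area in enumerate(ordered)}


def ads_per_decile(ad_areas: Iterable[str], bins: Dict[str, int]) -> Dict[int, int]:
    """Count advertisements per decile, given the area each advertisement falls in."""
    counts = Counter(bins[area] for area in ad_areas)
    return {decile: counts.get(decile, 0) for decile in range(1, 11)}
```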

Towards Automated Classifications
We evaluate whether advertisements extracted from street-level imagery can be categorised automatically. We conduct four-fold cross-validation using the advertisements extracted from our LIV360SV dataset with ground truth labels. However, given the class imbalance in our dataset, with alcohol and gambling underrepresented, we augment our training data with images scraped from Google Images (see Section 3.2). Specifically, we first use oversampling to ensure that each category has an equal number of LIV360SV advertisements (≈ 6000 images), and then add randomly selected scraped advertisements to obtain 10,000 images per category. For each fold we train a MobileNet-V2 using the hyperparameter configuration outlined in Section 4.4.
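The balancing procedure can be sketched as follows, assuming `local` and `scraped` map each category to a list of image identifiers (hypothetical inputs). We take the "equal number" of LIV360SV advertisements to be the size of the largest category, which is one reading of the text; the other categories are oversampled with replacement up to that size, and scraped images are then drawn without replacement to reach the per-category total.

```python
import random
from typing import Dict, List


def balance_with_scraped(local: Dict[str, List[str]],
                         scraped: Dict[str, List[str]],
                         per_category: int = 10_000,
                         seed: int = 0) -> Dict[str, List[str]]:
    """Oversample local images to equal counts, then top up with scraped images."""
    rng = random.Random(seed)
    if any(not items for items in local.values()):
        raise ValueError("every category needs at least one local image")
    majority = max(len(items) for items in local.values())
    if majority > per_category:
        raise ValueError("per_category must be at least the largest local category")
    balanced: Dict[str, List[str]] = {}
    for category, items in local.items():
        # 1) Random oversampling with replacement up to the majority size.
        extra = [rng.choice(items) for _ in range(majority - len(items))]
        # 2) Top up with scraped images drawn without replacement.
        needed = per_category - majority
        pool = scraped.get(category, [])
        if len(pool) < needed:
            raise ValueError(f"not enough scraped images for {category!r}")
        balanced[category] = list(items) + extra + rng.sample(pool, needed)
    return balanced
```

In a cross-validation setting this would be applied to each fold's training split only, so that duplicated images never appear in the corresponding test split.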
In Figure 7 we depict the resulting confusion matrix, as well as precision and recall values for each category. In the confusion matrix, each cell gives the likelihood of a sample from the row category being classified as the column category. While the diagonal elements of the confusion matrix show that the majority of test samples are correctly classified, we observe that the classifier could be improved. In particular, gambling and alcohol advertisements, the two categories that rely more heavily on dataset augmentation for diversity, are more likely to be mistaken for food and other, but are rarely confused with each other. Indeed, while food has the highest recall (0.84), it has the lowest precision (0.59). In contrast, gambling and alcohol have high precision but low recall.
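Because Figure 7 reports a row-normalised matrix, its diagonal gives recall directly, whereas precision depends on column totals and therefore on how many test samples each category has. The sketch below computes both from a matrix of raw counts; the toy counts are invented for illustration and are not the paper's results.

```python
from typing import Dict, List, Sequence

LABELS = ["food", "alcohol", "gambling", "other"]


def precision_recall(confusion: Sequence[Sequence[int]],
                     labels: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """confusion[i][j] = number of test samples of true class i predicted as class j."""
    k = len(labels)
    scores: Dict[str, Dict[str, float]] = {}
    for c, name in enumerate(labels):
        tp = confusion[c][c]
        predicted = sum(confusion[r][c] for r in range(k))  # column total
        actual = sum(confusion[c])  # row total
        scores[name] = {
            "precision": tp / predicted if predicted else 0.0,
            "recall": tp / actual if actual else 0.0,
        }
    return scores


def row_normalise(confusion: Sequence[Sequence[int]]) -> List[List[float]]:
    """Divide each row by its total; the diagonal then equals per-class recall."""
    out = []
    for row in confusion:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


# Toy counts for illustration only (not the paper's results).
toy = [[8, 0, 0, 2],
       [3, 5, 0, 2],
       [2, 0, 6, 2],
       [1, 0, 0, 9]]
scores = precision_recall(toy, LABELS)
```

In this toy matrix, alcohol and gambling samples leak into the food and other columns, which lowers their recall while inflating the column totals, and hence lowering the precision, of food and other: the same qualitative pattern described above.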

Discussion
Our study demonstrates a novel workflow that can be used to efficiently identify the location of unhealthy advertisements from street-view imagery. To date we have extracted 10,106 advertisements for Liverpool, UK, categorised as food (1335), alcohol (217), gambling (149) and other (8405). There was distinct geographical clustering of advertisements, with greater numbers of unhealthy advertisements in deprived areas and in areas with student populations. Our approach addresses the dearth of data available on the location of unhealthy advertisements, offering an efficient and deployable tool for surveying other towns and cities.
The prevalence of food, gambling and alcohol advertisements within areas classified as Inner-City Students and Students Around Campus provides further evidence that the student population experiences greater exposure to advertisements for unhealthy products 32,33. Regulating these areas and protecting younger student populations might be a key policy goal, particularly as this period of the life course is important for establishing behaviours that may continue into later life. The clustering of unhealthy food advertisements in deprived areas may exacerbate inequalities in obesity and related health conditions. This suggests that any policy regulating the location of unhealthy food advertisements would be progressive and could help to narrow inequalities.
Identifying the prevalence of unhealthy advertisements within areas frequented by students opens up interesting avenues for future research. For example, given advertisers' attempts to normalize the consumption of unhealthy items while gambling with friends 33, an evaluation could determine whether these behaviours are more likely to be enacted in areas with greater exposure. In addition, insights could be gained by differentiating between advertisement formats (e.g., billboards, shop windows and store signs) and studying the extent to which each format contributes to triggering an impulse to gamble, or other behaviours, to identify where regulations should focus their efforts. We find evidence that areas with a high percentage of children carrying excess weight are more exposed to food advertisements. However, a more systematic approach to gathering data is necessary to evaluate the extent to which the current rules restricting the promotion of high fat, sugar and salt (HFSS) products within 100 metres of schools are deterring advertisers (https://www.asa.org.uk/advice-online/food-hfss-media-placement.html). In addition, individuals are often exposed to advertisements via dynamic entities; bus stops, for instance, use monitors that can switch between advertisements. Extending our approach to account for these issues will be useful for future research.
A key strength of our study is the efficient collection of advertisement location data. To our knowledge, there is no open dataset that charts the location of advertisements in the UK, and our project helps to develop a tool to address this gap.
Having access to open data on advertisement locations is key for making effective policy decisions.Through automating the classification of street-view imagery, our approach can be efficiently combined with incidental data sources to locate advertisements over time with little additional time or resource costs.Expanding our data collection efforts to additional cities will help improve data coverage.
There are several limitations with regard to both the data and methods used in this paper. First, LIV360SV contains a number of unhealthy advertisements that are worthy of their own category. For instance, electronic cigarettes and vaping devices have become the most common tobacco products used by youth, with brands using marketing and advertising strategies similar to those previously used for traditional tobacco products 49. Classifying new categories would require retraining our classifier using additional data. Similarly, when applying our approach to a different location, representative training data must be obtained for local brands and product ranges. Although our 'other' category is not specific, it captures the total potential exposure to unhealthy advertisements, given that the content of advertisements may change weekly.
We note that the data collection process requires a systematic approach. Figure 4 shows that our dataset is skewed towards more deprived areas with regard to the number of samples. This largely reflects the historical concentration of deprivation and inequalities in Liverpool. Collecting data across different contexts and cities will help to improve the generalisability of our dataset. Our initial data collection wave was in January, when we observed anecdotally that many advertisements related to gyms or physical exercise. Commercial firms may release advertisements at different times of the year based on seasonal trends (e.g. Easter and chocolate), events (e.g. gambling around sporting events) or product development.
We plan to record seasonal data to enable a longitudinal study of advertisements within Liverpool. Finally, steps are necessary to improve the accuracy of the workflow's classifier component (Section 5.2). Our evaluation shows that our augmented learning approach requires more representative training images for alcohol and gambling. In addition, advertisements extracted from street-level imagery are often partially obscured by other real-world entities (cars, trees, pedestrians). One way to address these issues may be to classify advertisements within street-level imagery augmented with Generative Adversarial Networks (GANs) 50. We propose to embed selected advertisements within street-level imagery using GANs to create additional training data (albeit 'fake data') for model training. To date we can show that advertisements can be successfully integrated into street-level images: we place the advertisement using an STN to transform the image to a target shape, and then train GANs to embed the images realistically. We hypothesize that augmenting our collected street view data with these secondary GAN-created data will enable the training of an effective model.

Conclusion
Our study presents a novel open deep learning workflow for extracting and classifying unhealthy advertisements within street-level imagery. Tackling inequalities in exposure to unhealthy advertisements might offer feasible regulatory opportunities for public authorities, especially when coupled with efficient and effective data collection methods to support decision making. There are very few, if any, existing secondary datasets providing this information to public authorities or researchers, and our project helps to remove this barrier to effective decision making. Our deployable tool can be used to efficiently collect data for understanding exposure to unhealthy advertisements, as well as to identify areas with high exposure that could benefit from restriction policies.

Figure 1. Map depicting the coverage of the LIV360SV dataset, color coded by lower-layer super output areas (LSOAs).

Figure 3. Liverpool advertisement locations by Lower Super Output Areas (LSOAs). A color gradient indicates the level of deprivation, with white and black being the most and least deprived respectively.
(a) Street-level image totals per deprivation decile.(b) Unhealthy advertisement percentages.(c) Other advertisement percentage.

Figure 4. Sub-Figure 4a illustrates the number of street-level images per deprivation decile according to the 2019 English indices of deprivation.Sub-Figures 4b and 4c contain the percentage of images with unhealthy advertisements and those of type 'other' respectively.

Figure 5. Sub-Figure 5a illustrates the number of street-level images per OAC category. Sub-Figures 5b and 5c contain the percentage of images per OAC category that contain unhealthy advertisements and advertisements of type 'other' respectively.

(a) Clustering of food advertisements. (b) Counts. (c) Percentage.

Figure 6. Sub-Figure 6a provides a map illustrating the clustering of food advertisements in MSOAs in Liverpool, shaded by the percentage of children carrying excess weight. In Sub-Figure 6b we split the MSOAs into deciles and illustrate the number of advertisements found within each decile. Finally, in Sub-Figure 6c we illustrate the percentage of images within each decile containing food advertisements.

Figure 7. Confusion matrix with corresponding precision and recall values obtained from 4-fold cross-validation. Values were obtained by training a MobileNet-V2 on the ads extracted from the LIV360SV dataset, augmented with advertisements scraped from Google Images.