Adidas Sales Analysis
Project Report for STAT 303-1 Fall 2025
1) Inspiration
According to Forbes, data-driven decision making (DDDM) has become a critical resource for retailers in an increasingly digital world. First, analytics can improve inventory management by identifying sales patterns and reducing the risks of overstocking or understocking. Additionally, it can inform retailers’ dynamic pricing strategies by helping them understand competitors’ behavior and customers’ price sensitivity. Finally, analytics may enable more personalized customer experiences, allowing retailers to tailor recommendations based on individual purchasing behavior [1]. Learning about DDDM has strengthened my interest in product analytics and in understanding how data can be used to extract insights that inform decisions related to profitability, regional performance, and product demand.
Recently, Adidas has emphasized its direct-to-consumer (DTC) channels, focusing on improving platforms such as the Adidas website, mobile app, and branded retail stores to better control pricing and access richer customer data, and making DTC a central element of the company’s digital and analytics-driven growth strategy. To that end, Adidas set a strategic target for 50% of total sales to come from DTC by 2025, up from just 30% in 2019 [2]. This shift has already contributed to improved operational efficiency, more accurate demand forecasting, increased customer satisfaction, and higher profitability, and it reflects Adidas’s growing emphasis on data-driven decision-making.
Personally, as someone who genuinely enjoys Adidas shoes, I was excited to analyze data about a brand I’m familiar with and learn from a company that is very competitive in the digital retail landscape. With my dataset, I hope to build a thorough view of Adidas’s retail operations, including product types, retailer performance, sales patterns, and regional differences, which gives me an opportunity to practice analytics skills in a realistic business context. By examining different sales methods, city-level profitability, and product demand, I believe I will better understand how companies such as Adidas can use data analytics to understand and target customers more effectively while balancing profitability, operational efficiency, and customer comfort.
2) Problem Statement
Question 1:
How do in-store/outlet/online sales perform overall, and how does their profitability vary across city/state?
Question 2:
Which cities/states contribute the most to Adidas’s overall operating profit, and which retailers across cities/states are most profitable?
Question 3:
Which product types generate the highest operating profit, and how does their profitability vary across city/state?
3) Data
Data Identification:
The dataset is from Kaggle: https://www.kaggle.com/datasets/heemalichaudhari/adidas-sales-dataset/data. It was downloaded on November 3, 2025. This dataset includes details such as the product sold, retailer, number of units sold, total sales revenue, and location of the sale.
Data Definition
Each observation represents sales for a product type sold through a specific retailer in a given city and state on a specific date, with details such as total sales and operating profit.
- Price per Unit — Price of a single item
- Units Sold — Number of units purchased for that invoice
- Total Sales — Total revenue: Price per Unit × Units Sold
- Operating Profit — Profit: Total Revenue − Operating Costs
- Operating Margin — Operating Profit / Total Sales
- Retailer — Company that sold the product
- Retailer ID — Numerical identifier unique to retailer
- Invoice Date — Date of invoice
- Region — Geographic area where the sale took place (West, Northeast, etc.)
- State — State where the sale occurred
- City — City where the sale occurred
- Product — Type of Adidas item sold (Men’s Apparel, Men’s Street Footwear, etc.)
- Sales Method — Channel used to make the sale (in-store, online, outlet)
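The two derived definitions above (Total Sales and Operating Margin) can be checked row by row. The following is a minimal sketch, assuming the columns are already numeric and named exactly as in the list; the function name `check_derived_columns`, the tolerance, and the toy rows are illustrative assumptions, not part of the original analysis.

```python
import pandas as pd

def check_derived_columns(df: pd.DataFrame, tol: float = 0.01) -> pd.DataFrame:
    """Return rows where Total Sales differs from Price per Unit * Units Sold."""
    expected_sales = df["Price per Unit"] * df["Units Sold"]
    mismatch = (df["Total Sales"] - expected_sales).abs() > tol
    return df[mismatch]

# Toy rows for illustration only (not taken from the Adidas dataset).
toy = pd.DataFrame({
    "Price per Unit": [50.0, 40.0],
    "Units Sold": [10, 5],
    "Total Sales": [500.0, 250.0],  # second row is deliberately inconsistent (40 * 5 = 200)
    "Operating Profit": [150.0, 100.0],
})
# Operating Margin as defined above: Operating Profit / Total Sales.
toy["Operating Margin"] = toy["Operating Profit"] / toy["Total Sales"]

bad_rows = check_derived_columns(toy)
```

Running a check like this before aggregating would flag any invoices whose recorded revenue does not follow the stated definition.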
Data Description
There are a total of 9,648 observations and 13 variables. Of the 13 variables, 5 are numeric and 8 are categorical. Numeric variables include: Price per Unit, Units Sold, Total Sales, Operating Profit, Operating Margin. Categorical variables include: Retailer, Retailer ID (stored as a numeric variable), Invoice Date (stored as a datetime variable), Region, State, City, Product, and Sales Method. Seven variables (Retailer, State, City, Product, Total Sales, Operating Profit, and Sales Method) were used for the analysis in this report.
Reference Dataset Identification:
The reference dataset contains U.S. city coordinates: https://simplemaps.com/data/us-cities. I used only the city, state_name, lat (latitude), and lng (longitude) variables, to create a United States map in my analysis (Q2). It was downloaded on November 24, 2025. It includes 31,254 United States cities and towns and is built from sources such as the U.S. Geological Survey and U.S. Census Bureau.
Reference Dataset Definition
Each observation represents a city/town and includes information such as its state, county, population, longitude, and latitude, and other details.
- lat — Latitude coordinates of city/town’s location
- lng — Longitude coordinate of city/town’s location
- population — Number of people living in the city/town
- density — Population per unit of land
- ranking — Numeric value indicating the city/town’s relative size
- city — City Name
- city_ascii — Standardized city/town name
- state_id — State abbreviation
- state_name — State Name
- county_fips — Numerical value identifying county
- county_name — Name of county
- source — Whether coordinates came from a boundary (“shape”) or single point (“point”)
- military — Whether location is a military base (True/False)
- incorporated — Whether city/town is officially incorporated (True/False)
- timezone — Timezone of city/town
- zips — List of zip codes for the city/town
- id — A unique identifier for each city/town
Reference Dataset Description
There are a total of 31,254 observations and 17 variables. Of the 17 variables, 5 are numeric and 12 are categorical. Numeric variables include: lat, lng, population, density, and ranking. Categorical variables include: city, city_ascii, state_id, state_name, county_fips (stored as a numeric variable), county_name, source, military (boolean), incorporated (boolean), timezone, zips, and id (stored as a numeric variable). Variables used for analysis include city, state_id, lat, and lng.
4) Stakeholders
Management at Adidas and Strategy Team at Adidas
The management and strategy teams would benefit significantly from identifying the most profitable city/state combinations. Noting these combinations, and understanding the associated retailers and sales methods that generate the most profit, can guide them in expanding or reducing retail stores, shifting emphasis among in-store, outlet, and online sales methods, and adjusting product inventory levels at certain locations. Additionally, by learning which product types are most popular, Adidas can invest in or divest from certain product types.
Adidas consumers
Adidas consumers care most about product quality and pricing. Knowing which product types are most popular across cities/states, along with their pricing, helps consumers identify reliable items that are worth purchasing.
Retailers
It might also be valuable for retailers to know which Adidas product types are most popular. By identifying popular product types, retailers can strategically position these products in high-traffic areas of the store to attract visitors and maximize sales. Additionally, by understanding product preferences, they can proactively coordinate with Adidas to increase or reduce stock for certain products.
5) Data Cleaning
Adidas Dataset
There are 0 missing values in any of the variables used for analysis. Thus, no imputation is needed.
However, overall cleaning is necessary because the variables Price per Unit, Total Sales, Operating Profit, and Operating Margin all contain ‘%’ or ‘$’ signs. We converted those columns to numeric values.
We also converted Units Sold to numeric values, as it represents a count, and converted Invoice Date to be stored as datetime. In conclusion, all of the columns were converted to their appropriate data types before further cleaning within each question.
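The type conversions described above can be sketched as follows. This is a minimal illustration, assuming the raw columns are strings containing ‘$’, ‘,’, or ‘%’ characters and that margins are written as percentages such as "35%"; the helper names `to_number` and `clean_adidas` and the toy rows are hypothetical, not the code used in the report.

```python
import pandas as pd

def to_number(series: pd.Series) -> pd.Series:
    """Strip '$', ',', '%' and whitespace, then convert to a numeric dtype."""
    cleaned = series.astype(str).str.replace(r"[$,%\s]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")

def clean_adidas(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    for col in ["Price per Unit", "Units Sold", "Total Sales", "Operating Profit"]:
        out[col] = to_number(out[col])
    # Assumption: margins are stored like "35%", so divide by 100 to get a fraction.
    out["Operating Margin"] = to_number(out["Operating Margin"]) / 100
    out["Invoice Date"] = pd.to_datetime(out["Invoice Date"])
    return out

# Toy rows for illustration only (not taken from the Adidas dataset).
raw = pd.DataFrame({
    "Price per Unit": ["$50.00", "$45.00"],
    "Units Sold": ["1,200", "900"],
    "Total Sales": ["$60,000", "$40,500"],
    "Operating Profit": ["$21,000", "$12,150"],
    "Operating Margin": ["35%", "30%"],
    "Invoice Date": ["2020-01-01", "2020-01-02"],
})
clean = clean_adidas(raw)
```

Using `errors="coerce"` turns any value that still fails to parse into NaN, so a follow-up missing-value count would reveal rows the cleaning step could not handle.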
a) Cleaning - Question 1
The columns needed for Question 1 are City, State, Total Sales, Operating Profit, and Sales Method.
As mentioned previously, there are no missing values in the dataset. The outlier counts for the numeric variables are {‘Total Sales’: 653, ‘Operating Profit’: 706}. I decided to keep them because high total sales and operating profit may reflect higher-density areas or other legitimate factors, which are meaningful for understanding sales performance and its relationship with sales method.
The number of categories for each categorical variable is {‘City’: 52, ‘State’: 50, ‘Sales Method’: 3}. These categories are all relevant, since city and state represent where the transaction was made and sales method represents whether the transaction was online, in-store, or at an outlet. All categories are useful, and no category has just a few observations. There are also no incorrect values: all entries for total sales and operating profit are >= 0.
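The checks in this subsection (missing values, outlier counts, category counts, and non-negative values) can be gathered in one helper. This is a minimal sketch; the report does not state which outlier rule was used, so the 1.5 × IQR rule below is an assumption, and the function names and toy rows are hypothetical.

```python
import pandas as pd

def iqr_outlier_count(series: pd.Series, k: float = 1.5) -> int:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier rule)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    mask = (series < q1 - k * iqr) | (series > q3 + k * iqr)
    return int(mask.sum())

def quality_summary(df: pd.DataFrame, numeric_cols: list, categorical_cols: list) -> dict:
    return {
        "missing": int(df[numeric_cols + categorical_cols].isna().sum().sum()),
        "outliers": {c: iqr_outlier_count(df[c]) for c in numeric_cols},
        "categories": {c: int(df[c].nunique()) for c in categorical_cols},
        "negatives": {c: int((df[c] < 0).sum()) for c in numeric_cols},
    }

# Toy rows for illustration only (not taken from the Adidas dataset).
toy = pd.DataFrame({
    "City": ["New York", "Miami", "New York", "Miami", "Houston"],
    "Sales Method": ["In-store", "Online", "Outlet", "Online", "In-store"],
    "Total Sales": [100.0, 110.0, 120.0, 130.0, 5000.0],
    "Operating Profit": [30.0, 35.0, 40.0, 45.0, 50.0],
})
summary = quality_summary(toy, ["Total Sales", "Operating Profit"], ["City", "Sales Method"])
```

The same helper can be reused for Questions 2 and 3 by swapping in Retailer or Product as the categorical columns.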
b) Cleaning - Question 2
The columns needed for Question 2 are Retailer, City, State, Total Sales, and Operating Profit.
As in Question 1, there are no missing values in the dataset. The outlier counts for the numeric variables are {‘Total Sales’: 653, ‘Operating Profit’: 706}. I decided to keep them for the same reason: high total sales and operating profit may reflect higher-density areas or other legitimate factors, which are meaningful for understanding sales performance.
The number of categories for each categorical variable is {‘Retailer’: 6, ‘City’: 52, ‘State’: 50}.
These categories are all relevant, since city and state represent where the transaction was made and retailer represents the company (Walmart, Foot Locker, Sports Direct, West Gear, Kohl’s, Amazon) from which the consumer bought the product. All categories are useful, and no category has just a few observations. There are also no incorrect values: all entries for total sales and operating profit are >= 0.
c) Cleaning - Question 3
The columns needed for Question 3 are Product, City, State, Total Sales, and Operating Profit.
As in Question 1, there are no missing values in the dataset. The outlier counts for the numeric variables are {‘Total Sales’: 653, ‘Operating Profit’: 706}. I decided to keep them for the same reason: high total sales and operating profit may reflect higher-density areas or other legitimate factors, which are meaningful for understanding sales performance.
The number of categories for each categorical variable is {‘Product’: 6, ‘City’: 52, ‘State’: 50}.
These categories are all relevant, since city and state represent where the transaction was made and product represents the product type (Men’s Street Footwear, Men’s Athletic Footwear, Women’s Street Footwear, Women’s Athletic Footwear, Men’s Apparel, and Women’s Apparel) that the customer bought. All categories are useful, and no category has just a few observations. There are also no incorrect values: all entries for total sales and operating profit are >= 0.
6) Data Analysis
a) Analysis 1
After cleaning the dataset, I grouped by sales method and aggregated total sales and total operating profit. Then, I created a dictionary that maps each sales method to its total operating profit. Using this dictionary, I calculated pairwise profit differences between in-store, online, and outlet, and converted the results into a dataframe. Finally, I generated a bar chart using matplotlib to visually compare these differences, with exact profit values displayed on the y-axis. From the results, we noted that in-store purchases bring in the highest revenue. This may be because customers who walk into a store are often already committed to buying, so the conversion rate is often higher than for online browsing. Additionally, product displays, conversations with sales assistants, and the opportunity to try the product on may give the consumer more confidence to purchase the item. Outlets bring in lower revenue than in-store, which could be because they primarily sell older inventory and the discounts reduce profit. However, outlets do make slightly more than online purchases, possibly because people often go to outlets with the intention of making purchases, which increases total sales.
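The grouping and pairwise-difference steps described above can be sketched like this. It is a minimal illustration, assuming the cleaned columns are named as in the data definition; the function names and the toy rows are hypothetical, and the bar chart step is left out.

```python
from itertools import combinations

import pandas as pd

def profit_by_method(df: pd.DataFrame) -> pd.DataFrame:
    """Total sales and total operating profit for each sales method."""
    return df.groupby("Sales Method")[["Total Sales", "Operating Profit"]].sum()

def pairwise_profit_differences(totals: pd.DataFrame) -> pd.DataFrame:
    # Dictionary mapping each sales method to its total operating profit.
    profit = totals["Operating Profit"].to_dict()
    rows = [
        {"Comparison": f"{a} - {b}", "Profit Difference": profit[a] - profit[b]}
        for a, b in combinations(profit, 2)
    ]
    return pd.DataFrame(rows)

# Toy rows for illustration only (not taken from the Adidas dataset).
toy = pd.DataFrame({
    "Sales Method": ["In-store", "In-store", "Online", "Outlet"],
    "Total Sales": [1000.0, 800.0, 600.0, 900.0],
    "Operating Profit": [300.0, 200.0, 150.0, 250.0],
})
totals = profit_by_method(toy)
differences = pairwise_profit_differences(totals)
```

The resulting `differences` dataframe has one row per pair of sales methods, which is the shape a bar chart of differences would need.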
For more in-depth analysis, we wanted to see which sales method performed best in each city/state. I grouped by city/state and sales method, then aggregated total sales and total operating profit. Then, for each city and state, we kept only the row with the highest total sales, that is, the row for the highest-performing method among in-store, outlet, and online.
Using a pie chart, I visualized the share of city/state combinations where in-store, online, or outlet performs best. This matches our previous result: in-store is the best method in the largest number of locations, followed by outlet. However, outlet and online show similar outcomes, with 16 and 14 city/states having them as the best sales method, respectively. This may be because outlet and online both attract price-sensitive consumers, leading to similar sales, with outlets winning by a small margin. Afterward, we did a sanity check to confirm that there were 54 different city/state combinations within the data, matching our pie chart results.
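Selecting the best sales method per city/state and counting how often each method wins can be sketched as below. This is a minimal illustration under the assumption that "best" means highest aggregated total sales, as described above; `best_method_per_location` and the toy rows are hypothetical.

```python
import pandas as pd

def best_method_per_location(df: pd.DataFrame) -> pd.DataFrame:
    """Keep, for each city/state, the sales method with the highest total sales."""
    grouped = df.groupby(["City", "State", "Sales Method"], as_index=False)[
        ["Total Sales", "Operating Profit"]
    ].sum()
    # idxmax returns the row label of the top-selling method within each city/state.
    best_idx = grouped.groupby(["City", "State"])["Total Sales"].idxmax()
    return grouped.loc[best_idx].reset_index(drop=True)

# Toy rows for illustration only; the same city name appears in two states.
toy = pd.DataFrame({
    "City": ["Portland", "Portland", "Portland", "Portland", "Miami", "Miami"],
    "State": ["Maine", "Maine", "Oregon", "Oregon", "Florida", "Florida"],
    "Sales Method": ["In-store", "Online", "Outlet", "In-store", "In-store", "Outlet"],
    "Total Sales": [100.0, 80.0, 120.0, 90.0, 200.0, 150.0],
    "Operating Profit": [30.0, 25.0, 40.0, 28.0, 70.0, 50.0],
})
best = best_method_per_location(toy)
method_counts = best["Sales Method"].value_counts()  # input for a pie chart
n_locations = len(best)                              # sanity check on location count
```

Grouping on both City and State, rather than City alone, keeps same-named cities in different states separate, which is why the number of locations can exceed the number of unique city names.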
Third, to better understand city-level performance, I took the dataset already grouped by city/state and sales method (with total sales and total operating profit aggregated) and sorted it by total operating profit to identify the most profitable city/state and sales method combinations. Using these results, I created a horizontal bar chart that displays the top 25 highest-performing combinations.
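The ranking step can be sketched as a sort followed by `head`. This is a minimal illustration; the function name `top_locations`, the `Label` column, and the toy rows are assumptions added for readability, and the horizontal bar chart itself is omitted.

```python
import pandas as pd

def top_locations(df: pd.DataFrame, n: int = 25) -> pd.DataFrame:
    """Rank city/state and sales method combinations by total operating profit."""
    grouped = df.groupby(["City", "State", "Sales Method"], as_index=False)[
        ["Total Sales", "Operating Profit"]
    ].sum()
    # A readable label for each bar, e.g. "Miami, Florida (Outlet)".
    grouped["Label"] = (
        grouped["City"] + ", " + grouped["State"] + " (" + grouped["Sales Method"] + ")"
    )
    ranked = grouped.sort_values("Operating Profit", ascending=False)
    return ranked.head(n).reset_index(drop=True)

# Toy rows for illustration only (not taken from the Adidas dataset).
toy = pd.DataFrame({
    "City": ["Miami", "Miami", "New York", "New York"],
    "State": ["Florida", "Florida", "New York", "New York"],
    "Sales Method": ["In-store", "Outlet", "In-store", "Online"],
    "Total Sales": [200.0, 250.0, 300.0, 120.0],
    "Operating Profit": [80.0, 95.0, 90.0, 40.0],
})
top2 = top_locations(toy, n=2)
```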
From the top-25 profitability chart, we see that outlet stores frequently appear among the highest-earning cities, possibly due to strong foot traffic and high-volume purchases. In-store locations also perform well in highly populated cities. While outlets dominate many of the top single locations, in-store ultimately earns more overall, possibly because it maintains higher margins.
In conclusion, in-store performs the best overall, generating the highest total operating profit across all sales methods. This is likely due to interactions with products and sales associates, as well as in-person try-ons, which increase consumer confidence. Outlet stores rank second, bringing in revenue through high-volume discount purchases. Online sales perform the lowest overall, possibly due to lower consumer confidence.
Across cities and states, profitability follows a similar pattern: in-store is the top-performing method in the largest number of cities, while outlet and online perform similarly, with outlet slightly ahead of online, possibly because outlet benefits from in-person shopping.
The top-25 profitability chart also shows that outlet stores dominate the highest-earning locations, possibly due to strong foot traffic; however, in-store remains the strongest-performing sales method nationwide.
b) Analysis 2
To determine which cities/states contribute the most to Adidas’s overall operating profit, I grouped the dataset by city/state and aggregated total sales and total operating profit for each location. I also grouped the dataset by state, summing total sales and total operating profit for each state. This allowed me to generate two ranked tables showing the top-performing cities/states and top-performing states based on their aggregated operating profit.
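Both ranked tables can come from one helper that takes the grouping keys as a parameter. This is a minimal sketch; `rank_by_profit` and the toy rows are hypothetical. The index is intentionally not reset after sorting, on the assumption that this is why the ranked tables in this section show non-consecutive row labels.

```python
import pandas as pd

def rank_by_profit(df: pd.DataFrame, keys: list, n: int = 10) -> pd.DataFrame:
    """Aggregate by the given keys and return the top n rows by operating profit."""
    grouped = df.groupby(keys, as_index=False)[["Total Sales", "Operating Profit"]].sum()
    grouped = grouped.rename(columns={"Operating Profit": "Total Operating Profit"})
    return grouped.sort_values("Total Operating Profit", ascending=False).head(n)

# Toy rows for illustration only (not taken from the Adidas dataset).
toy = pd.DataFrame({
    "City": ["New York", "Albany", "Miami", "Miami"],
    "State": ["New York", "New York", "Florida", "Florida"],
    "Total Sales": [100.0, 60.0, 90.0, 10.0],
    "Operating Profit": [40.0, 25.0, 50.0, 5.0],
})
by_city = rank_by_profit(toy, ["City", "State"])
by_state = rank_by_profit(toy, ["State"])
```

In the toy data, Miami leads the city ranking while New York leads the state ranking, which shows why the two tables are worth reading side by side.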
I then created a list of major U.S. cities (the 54 city/states from the dataset), matched each one to its state abbreviation, and merged it with a reference dataset [3] containing geographic coordinates and state abbreviations. Afterward, I plotted each city on a U.S. map, with marker color and size representing operating profit. Darker markers show the locations that contribute the most at a national level. From the results, New York, Miami, Charleston (SC), San Francisco, and Houston emerged as some of the most profitable cities, while New York, Florida, California, and Texas were the most profitable states overall. This suggests that high-population urban areas drive a large share of Adidas’s operating profit.
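The merge with the coordinates reference can be sketched as a left join on city name and state abbreviation. This is a minimal illustration; the `state_abbrev` mapping, the function name `attach_coordinates`, and the toy rows with rough placeholder coordinates are assumptions, and the map plotting step is omitted.

```python
import pandas as pd

def attach_coordinates(city_profit: pd.DataFrame, us_cities: pd.DataFrame,
                       state_abbrev: dict) -> pd.DataFrame:
    """Left-join lat/lng onto city-level profit using city name + state abbreviation."""
    left = city_profit.copy()
    left["state_id"] = left["State"].map(state_abbrev)
    coords = (
        us_cities[["city", "state_id", "lat", "lng"]]
        .rename(columns={"city": "City"})
        .drop_duplicates(["City", "state_id"])
    )
    # indicator=True adds a "_merge" column so unmatched cities are easy to spot.
    return left.merge(coords, how="left", on=["City", "state_id"], indicator=True)

# Toy rows for illustration only; coordinates are rough placeholders.
city_profit = pd.DataFrame({
    "City": ["New York", "Portland", "Portland"],
    "State": ["New York", "Maine", "Oregon"],
    "Total Operating Profit": [40.0, 20.0, 30.0],
})
us_cities = pd.DataFrame({
    "city": ["New York", "Portland", "Portland"],
    "state_id": ["NY", "ME", "OR"],
    "lat": [40.7, 43.7, 45.5],
    "lng": [-74.0, -70.3, -122.7],
})
state_abbrev = {"New York": "NY", "Maine": "ME", "Oregon": "OR"}  # hypothetical mapping

merged = attach_coordinates(city_profit, us_cities, state_abbrev)
unmatched = merged[merged["_merge"] == "left_only"]
```

Joining on both keys matters because a city name alone can match several rows in a national cities table; checking `unmatched` before plotting guards against silently dropped markers.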
(See the third HTML file for the United States map, which does not render in the current HTML.)
Top 10 cities/states by total operating profit:

| | City | State | Total Sales | Total Operating Profit |
|---|---|---|---|---|
| 36 | New York | New York | 39801235 | 13899981 |
| 32 | Miami | Florida | 31600863 | 12168628 |
| 10 | Charleston | South Carolina | 29285637 | 11324247 |
| 48 | San Francisco | California | 34539220 | 10256252 |
| 23 | Houston | Texas | 25456882 | 9845140 |
| 12 | Charlotte | North Carolina | 23956531 | 9756425 |
| 0 | Albany | New York | 24427804 | 9429864 |
| 35 | New Orleans | Louisiana | 23750781 | 9417239 |
| 6 | Birmingham | Alabama | 17633424 | 9147581 |
| 29 | Los Angeles | California | 25634913 | 9044931 |

Top 10 states by total operating profit:

| | State | Total Sales | Total Operating Profit |
|---|---|---|---|
| 31 | New York | 64229039 | 23329845 |
| 8 | Florida | 59283714 | 20926219 |
| 4 | California | 60174133 | 19301183 |
| 42 | Texas | 46359746 | 18688214 |
| 39 | South Carolina | 29285637 | 11324247 |
| 32 | North Carolina | 23956531 | 9756425 |
| 17 | Louisiana | 23750781 | 9417239 |
| 0 | Alabama | 17633424 | 9147581 |
| 41 | Tennessee | 18067440 | 8493670 |
| 21 | Michigan | 18625433 | 8135902 |
Moving on to the second part of the question, I wanted to understand which retailers are most profitable across all cities/states. I first grouped by retailer, then aggregated total sales and total operating profit. By visualizing the totals using a bar chart, I noted that West Gear generates the highest overall operating profit, followed by Foot Locker, then Sports Direct.
Next, I wanted to examine how profitable each retailer is per individual store, because high-profit retailers may be profiting simply from having a large number of stores. I grouped the data by retailer, city, and state, then calculated a normalized profit metric by dividing each store’s profit by the number of stores that the retailer has. Using a box plot, I visualized the distribution of per-store profitability across retailers. This reveals that although West Gear is the most profitable retailer overall due to its large number of locations, Walmart generates the highest profit per store.
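One way to read the per-store normalization is sketched below: each retailer/city/state combination is treated as one "store", and a retailer’s total profit is divided by its number of such locations. This interpretation, the function name `profit_per_location`, and the placeholder retailer names are assumptions; the report’s exact metric may differ.

```python
import pandas as pd

def profit_per_location(df: pd.DataFrame):
    """Return location-level profits and a per-retailer summary."""
    # Assumption: one retailer/city/state combination counts as one "store".
    locations = df.groupby(["Retailer", "City", "State"], as_index=False)[
        "Operating Profit"
    ].sum()
    summary = locations.groupby("Retailer")["Operating Profit"].agg(
        total_profit="sum", n_locations="count"
    )
    summary["profit_per_location"] = summary["total_profit"] / summary["n_locations"]
    return locations, summary.sort_values("profit_per_location", ascending=False)

# Toy rows with placeholder retailer names (not taken from the Adidas dataset).
toy = pd.DataFrame({
    "Retailer": ["Retailer A", "Retailer A", "Retailer A", "Retailer B"],
    "City": ["New York", "Miami", "Houston", "New York"],
    "State": ["New York", "Florida", "Texas", "New York"],
    "Operating Profit": [100.0, 100.0, 100.0, 250.0],
})
locations, summary = profit_per_location(toy)
```

In the toy data, Retailer A has the larger total but Retailer B earns more per location, the same kind of reversal described above. The `locations` frame is the input a box plot of per-location profit by retailer would use.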
In conclusion, the cities that contribute the most to Adidas’s operating profit are high-population areas like New York, Miami, Charleston (SC), San Francisco, and Houston, with states such as New York, Florida, California, and Texas contributing the most to Adidas’s profits. When looking across all city/state combinations, West Gear comes out as the most profitable retailer due to its large number of locations. But once profits are normalized per store, Walmart performs the best on a per-store basis.
c) Analysis 3
To answer the first part of the question, which product type generates the highest operating profit, I grouped the data by Product and aggregated total sales and total operating profit to see which product types made the most money overall. Then, I plotted a horizontal bar chart where each bar shows the total operating profit for a product type. From here, we noted that men’s street footwear was most profitable, followed by women’s apparel and men’s athletic footwear.
Next, to see how product profitability varies geographically, I grouped the dataset by Product, City, and State, then aggregated total operating profit. Afterward, I created a box plot of operating profit by product type, showing how operating profit is distributed across different cities for each product type. This lets us compare not only which products earn the most overall, but also how their profitability varies across different cities and states.
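The two product-level views (overall totals and the spread across locations) can be sketched as follows. This is a minimal illustration; the function names and toy rows are hypothetical, and `describe()` is used here as a numeric stand-in for what a box plot displays.

```python
import pandas as pd

def product_totals(df: pd.DataFrame) -> pd.Series:
    """Total operating profit per product type, highest first."""
    return df.groupby("Product")["Operating Profit"].sum().sort_values(ascending=False)

def product_profit_distribution(df: pd.DataFrame) -> pd.DataFrame:
    """Summary statistics of location-level operating profit for each product."""
    per_location = df.groupby(["Product", "City", "State"], as_index=False)[
        "Operating Profit"
    ].sum()
    # count, mean, std, min, quartiles, and max across city/state combinations.
    return per_location.groupby("Product")["Operating Profit"].describe()

# Toy rows for illustration only (not taken from the Adidas dataset).
toy = pd.DataFrame({
    "Product": ["Men's Street Footwear"] * 3 + ["Women's Apparel"] * 2,
    "City": ["New York", "Miami", "Houston", "New York", "Miami"],
    "State": ["New York", "Florida", "Texas", "New York", "Florida"],
    "Operating Profit": [10.0, 20.0, 30.0, 5.0, 5.0],
})
totals = product_totals(toy)
stats = product_profit_distribution(toy)
```

Comparing the median and interquartile range per product gives a rough numeric check on how stable a product’s profit is across locations.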
In conclusion, the product bar chart and product box plot tell a similar story: the product types that generate the highest total operating profit also tend to perform strongly across many cities/states.
For example, men’s street footwear not only leads in overall total operating profit, but also shows relatively stable performance in most locations, indicating broad consumer appeal. Women’s apparel and men’s athletic footwear also show distributions with relatively high demand across all locations. This suggests that demand for the most profitable products is relatively stable across diverse regions.
7) Recommendations to Stakeholders
a) Recommendations 1
Based on my analysis, the main action item for stakeholders is to prioritize investing in in-store locations by expanding store presence in high-population areas and by improving the in-store experience through enhanced staffing and personalized services. Stakeholders should also invest in outlet locations, possibly by increasing promotions and inventory, ensuring that high-demand items remain consistently available for the high volume of customers.
A limitation of my analysis that stakeholders should be aware of is that we do not have population data, demographic data, or store-level operating cost data. For example, we do not know whether in-person success is driven by foot-traffic density, local income levels, or simply the number of stores in that region.
To overcome these limitations, future research should focus on integrating population and demographic data to understand why certain sales methods outperform others. Further research should also focus on understanding operating costs for in-person locations and finding methods to decrease costs to drive profits higher.
b) Recommendations 2
Based on my analysis, the main action items for stakeholders are to prioritize investing in high-performing cities/states, increase stores in New York, Miami, and Houston, and strengthen relationships with retailers that earn high profit per store, such as Walmart and Amazon.
However, the results are limited because we do not know whether the higher profits for individual stores are simply due to being located in more densely populated regions, or whether certain retailers have stronger brand reputation or customer loyalty in specific cities and states. We also do not take into account local competition in certain regions.
To overcome these limitations, future analysis should incorporate population density maps, maps of where Adidas products are being sold, retailer reputation metrics, and data on competitor presence to give a more complete picture of why certain cities, states, and retailers outperform others.
c) Recommendations 3
Based on my analysis, stakeholders should invest in the product categories (men’s street footwear, women’s apparel, and men’s athletic footwear) that drive the highest operating profit and show stable demand across many cities/states. They should experiment with expanding these categories through new clothing/footwear designs, along with increasing inventory to maximize revenue.
A limitation of my analysis is that it does not account for external factors that may influence sales performance across regions, such as local consumer preferences. To address this limitation, future research should incorporate demographic information and regional preference data. Adding these elements would provide a clearer understanding of whether certain products succeed in particular areas and help stakeholders make more targeted decisions about which items to stock in which city/state.
8) Conclusion
My analysis of Adidas’s product trends highlights how data can be used to make decisions regarding profitability, regional performance, and product demand.
One of my strongest findings is that in-store sales drive the highest operating profit, followed by outlet sales. Many of these transactions also occur in highly populated cities such as New York and Miami, where traffic is naturally higher. Additionally, men’s street footwear, women’s apparel, and men’s athletic footwear are consistently the top product types across many cities, demonstrating stable demand.
To maximize profitability, stakeholders should prioritize investing in in-store experiences in highly populated regions and strengthening relationships with retailers, while ensuring strong inventory levels of the most popular product types.
However, there are limitations to my analysis. It does not incorporate population data, demographic data, regional fashion preferences, retailer reputation metrics, or local competition data. All of these factors could explain why sales or profit for certain products and sales methods are higher in some cities/states than others. Future research should integrate this additional information to provide a more thorough understanding of customer behavior, which can guide targeted decision-making.
Overall, this project shows how data analytics can help companies like Adidas balance profitability, operational efficiency, and customer experience more effectively.
References
[1] Goldstein, Joel. “Data-Driven Decision Making In Retail.” Forbes, Forbes Magazine, 13 Aug. 2024, www.forbes.com/councils/forbesbusinessdevelopmentcouncil/2023/12/14/data-driven-decision-making-in-retail/. Accessed 25 Nov. 2025.
[2] ClickZ. “The Cost of Momentum: Inside Adidas’s Bold Marketing & E-Commerce Strategy.” ClickZ, 24 Nov. 2025, www.clickz.com/the-cost-of-momentum-inside-adidas-s-bold-marketing-e-commerce-strategy/270092/. Accessed 25 Nov. 2025.
[3] Pareto Software. United States Cities Database. simplemaps.com/data/us-cities. Accessed 25 Nov. 2025.