Sr Data Scientist Interview Questions

WooliesX Data Science Test - Machine learning and statistics

Question 1: We are measuring the brightness of a star with a photon detector that produces a luminosity score. We point it at a particular star and take a large number of readings. Unfortunately, the readings are noisy and we observe that some readings indicate the star has negative brightness. Would you discard the negative readings? What effect does this have on the data and the readings we make from it?

Question 2: You have fitted a GBM model and are happy with its accuracy. How will you explain, in business terms, to your stakeholders what the model is doing? What insights can you draw from the model?

Question 3: Imagine you train a predictive model on the same dataset twice: once with XGBoost and once with a random forest (not eXtreme boosting). In which case do you expect the trees to be deeper?

Question 4: Assume you have built a classification model which has an accuracy of 90% on the test set. Under what circumstances could this still be a bad model?

Question 5: You need to build a propensity-to-purchase model using XGBoost, and you have 40k features on customers in the feature bank. Given it is not feasible to productionise a model with this many features, how do you quantitatively reduce the number of features to something feasible (say 500 features)?

Question 6: What are the advantages of a model like XGBoost over logistic regression? What are the disadvantages?

Question 7: If you have a dataset that is larger than the amount of RAM in your computer, list at least 3 ways to help in fitting a model on this data.

Question 8: You have made a very powerful predictive model for customers' weekly sales. What is your favorite method of explaining the importance of the features in your model? Does this method consider interactions between features? If the feature is categorical, does this method work better with one-hot encoding or label encoding? Does this method explain the direction of the effect of the feature on the target variable (direct or inverse)?

Question 9: How do you compare one-hot encoding and label encoding? When would one-hot encoding work better? And when would it be the other way around? Any other approach to encoding?

Question 10: You are developing a GBM model to predict customers' weekly spend in supermarkets. From the data you collected you realised that about 30% of your target variable were zeros, i.e. 30% of customers had zero weekly spend in the past. State your plan for modelling.

Question 11: A promotion offer was sent to two groups of customers, Group A and Group B, consisting of 1180 and 5740 customers, respectively. The redemption rate was 21% for Group A and 25% for Group B. Determine whether the two redemption rates are significantly different. Report the associated p-value. State any assumptions you may make.

Question 12: You have a friend who randomly decides whether he goes out for a drink on Friday nights, with the probability of going out being 90%. If he goes out, he randomly chooses from three bars, A, B and C, with equal probabilities. Suppose you are trying to find him on a Friday night, and you have checked Bar A and B and he is not in either of those two. What is the probability that you will find him in Bar C? Apply the Bayes rule and show steps.
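
One way to lay out the Bayes steps for Question 12 is sketched below. It is not an official answer from the test. It assumes the friend stays in a single location all night and that checking a bar always finds him if he is there; the function name and the "home" label are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def posterior_after_search(p_out=Fraction(9, 10), bars=("A", "B", "C"), checked=("A", "B")):
    """Posterior over the friend's location after the checked bars come up empty."""
    # Prior: he stays home with probability 1 - p_out,
    # otherwise he picks each bar with equal probability.
    prior = {bar: p_out / len(bars) for bar in bars}
    prior["home"] = 1 - p_out

    # Likelihood of the evidence "not seen in any checked bar" given each location.
    # Assumption: searching the right bar always finds him.
    likelihood = {loc: Fraction(0) if loc in checked else Fraction(1) for loc in prior}

    # Bayes' rule: posterior = prior * likelihood / evidence.
    evidence = sum(prior[loc] * likelihood[loc] for loc in prior)
    return {loc: prior[loc] * likelihood[loc] / evidence for loc in prior}


posterior = posterior_after_search()
print(posterior["C"])     # 3/4
print(posterior["home"])  # 1/4
```

Under these assumptions the prior mass is 3/10 for each bar and 1/10 for staying home, so ruling out A and B leaves 3/10 against 1/10, which gives 3/4 for Bar C.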

Senior Data Scientist

Interviewed at Woolworths Group

3.5★

Oct 30, 2020
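
For Question 11 of the test above, a common approach is a pooled two-proportion z-test. The sketch below is one possible working, not the expected answer. It assumes independent customers, that the quoted rates are exact rather than rounded, and that the normal approximation is adequate at these sample sizes. The function name is hypothetical, and only the Python standard library is used.

```python
import math


def two_proportion_z_test(p_a, n_a, p_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    # Pooled redemption rate under the null hypothesis of equal rates.
    pooled = (p_a * n_a + p_b * n_b) / (n_a + n_b)
    # Standard error of the difference in proportions under the null.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p_value = two_proportion_z_test(0.21, 1180, 0.25, 5740)
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
```

With the figures from the question, z comes out at roughly -2.9 and the two-sided p-value is well below 0.01, so the difference would be called significant at the usual 5% level under these assumptions.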

How would you handle the conflict?

Senior Clinical Data Scientist

Interviewed at KARL STORZ

3.8★

Dec 18, 2024

How would you present your findings to a high-level executive?

Senior Data Science Analyst

Interviewed at Petco

3★

Jun 6, 2018

What is your experience with ML at scale?

Senior Data Scientist

Interviewed at Steady

3.9★

Jan 19, 2022

Why are you looking to leave your current role?

Senior Data Scientist

Interviewed at Allstate

3.5★

Feb 26, 2024

Case study about failure prediction

Senior Data Scientist

Interviewed at QuantumBlack

3.9★

May 8, 2021

HR - General experience questions
Hiring Manager - Description of past projects and questions more specific to the gaming industry and data science
DS team - Case studies about modelling and technical questions
ML - Code interview and technical questions about ML pipeline and simulation
Producers - Business/Customer focused questions

Senior Data Scientist

Interviewed at EA Sports

4.1★

Sep 21, 2022

Describe a time you had to balance delivering work quickly with the quality of the deliverable.

Senior Data Scientist

Interviewed at Carta

3.6★

Aug 5, 2024

- What got you interested in applying for this position?
- What are your main areas of expertise when it comes to data science? Where do you spend most of your time/what is a good breakdown of what you spend your time doing day to day?
- After presenting the solution/delivering a model, do you have experience supporting the stakeholders afterwards?
- Python, Looker, Snowflake and DBT experience?
- Can you describe some of the things you noticed when your scope became senior? What were some of the new responsibilities, or what did you spend some time doing that maybe you didn't at mid-level?
- Experience mentoring juniors?

- Main responsibilities in your current position?
- How do you decide which project to work on?
- What tools do you use day-to-day?
- Experience with data modeling and DBT?
- Key considerations to keep in mind when you're transforming data for use?
- How do you approach learning new tables in a database?
- Have you worked with data coming from an application?
- Have you ever partnered with product managers in the past?
- How do you partner with stakeholders to understand business problems?
- Project you've worked on that had a positive impact on a company?
- How do you maintain a deployed project for ongoing usage?

Senior Data Scientist

Interviewed at Carta

3.6★

Feb 11, 2025

What similar previous experience I had.

Senior Data Scientist

Interviewed at Indie Campers

2.6★

Sep 26, 2022

3,509 sr data scientist interview questions shared by candidates
