WooliesX Data Science Test - Machine learning and statistics

Question 1
We are measuring the brightness of a star with a photon detector that produces a luminosity score. We point it at a particular star and take a large number of readings. Unfortunately, the readings are noisy and we observe that some readings indicate the star has negative brightness. Would you discard the negative readings? What effect does this have on the data and the conclusions we draw from it?

Question 2
You have fitted a GBM model and are happy with its accuracy. How will you explain, in business terms, to your stakeholders what the model is doing? What insights can you draw from the model?

Question 3
Imagine you have the same dataset for training a predictive model. You train it once with XGBoost and once with a random forest (not extreme gradient boosting). Under which scenario do you expect the depth of the trees to be higher?

Question 4
Assume you have built a classification model which has an accuracy of 90% on the test set. Under what circumstances could this still be a bad model?

Question 5
You are asked to build a propensity-to-purchase model using XGBoost, and you have 40k features on customers in the feature bank. Given it is not feasible to productionise a model with this many features, how do you quantitatively reduce the number of features to something feasible (say 500 features)?

Question 6
What are the advantages of a model like XGBoost over logistic regression? What are the disadvantages?

Question 7
If you have a dataset that is larger than the amount of RAM in your computer, list at least 3 ways to help in fitting a model on this data.

Question 8
You have made a very powerful predictive model for customers' weekly sales. What is your favorite method of explaining the importance of the features in your model? Does this method consider interactions between features? If the feature is categorical, does this method work better with one-hot encoding or label encoding? Does this method explain the direction of the effect of the feature on the target variable (direct or inverse)?

Question 9
How do you compare one-hot encoding and label encoding? When would one-hot encoding work better? And when would it be the other way around? Are there any other approaches to encoding?

Question 10
You are developing a GBM model to predict customers' weekly spend in supermarkets. From the data you collected you realised that about 30% of your target variable values were zeros, i.e. 30% of customers had zero weekly spend in the past. State your plan for modelling.

Question 11
A promotion offer was sent to two groups of customers, Group A and Group B, consisting of 1180 and 5740 customers, respectively. The redemption rate was 21% for Group A and 25% for Group B. Determine whether the two redemption rates are significantly different. Report the associated p-value. State any assumptions you may make.

Question 12
You have a friend who randomly decides whether he goes out for a drink on Friday nights, with the probability of going out being 90%. If he goes out, he randomly chooses from three bars, A, B and C, with equal probabilities. Suppose you are trying to find him on a Friday night, and you have checked Bars A and B and he is not in either of those two. What is the probability that you will find him in Bar C? Apply Bayes' rule and show the steps.
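Questions 11 and 12 both come down to short calculations, so here is a minimal sketch of one way to work them, not a model answer. For Question 11 it assumes independent customers, a pooled two-sided two-proportion z-test, and samples large enough for the normal approximation. For Question 12 it assumes that checking a bar always finds him if he is there. The function names are hypothetical and only the Python standard library is used.

```python
from fractions import Fraction
from math import erfc, sqrt


def two_proportion_z_test(p_a, n_a, p_b, n_b):
    """Pooled two-sided z-test for a difference between two proportions.

    Assumes independent observations and a large-sample normal approximation.
    """
    # Pooled redemption rate under the null hypothesis of equal rates.
    pooled = (p_a * n_a + p_b * n_b) / (n_a + n_b)
    # Standard error of the difference in sample proportions under the null.
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value: erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


def prob_in_bar_c(p_out=Fraction(9, 10)):
    """P(friend is in Bar C | he is not in Bar A and not in Bar B) via Bayes' rule."""
    p_bar = p_out / 3      # prior probability of each individual bar
    p_home = 1 - p_out     # prior probability that he stayed home
    # Evidence: "not in A and not in B" leaves only "in C" or "at home".
    p_not_a_not_b = p_bar + p_home
    # P(C | evidence) = P(evidence | C) * P(C) / P(evidence), with P(evidence | C) = 1.
    return p_bar / p_not_a_not_b


z, p_value = two_proportion_z_test(0.21, 1180, 0.25, 5740)
posterior_c = prob_in_bar_c()
```

With the numbers from Question 11, z comes out at roughly -2.9 and the two-sided p-value falls below 0.01, so the rates differ at the usual 5% level under these assumptions. For Question 12 the posterior is 0.3 / (0.3 + 0.1) = 3/4.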
Sr Data Scientist Interview Questions
3,509 sr data scientist interview questions shared by candidates
How would you handle the conflict?
How would you present your findings to a high-level executive?
What is your experience with ML at scale?
Why are you looking to leave your current role?
Case study about failure prediction
HR - General experience questions
Hiring Manager - Description of past projects and questions more specific to the gaming industry and data science
DS team - Case studies about modelling and technical questions
ML - Code interview and technical questions about ML pipeline and simulation
Producers - Business/Customer focused questions
Describe a time you had to balance delivering work quickly with the quality of the deliverable.
- What got you interested in applying for this position?
- What are your main areas of expertise when it comes to data science? Where do you spend most of your time/what is a good breakdown of what you spend your time doing day to day?
- After presenting the solution/delivering a model, do you have experience supporting the stakeholders afterwards?
- Python, Looker, Snowflake and DBT experience?
- Can you describe some of the things you noticed when your scope became senior? What were some of the new responsibilities, or what did you spend some time doing that maybe you didn't at mid-level?
- Experience mentoring juniors?

- Main responsibilities in your current position?
- How do you decide on what project you are going to work on?
- What tools do you use day-to-day?
- Experience with data modeling and DBT?
- Key considerations to keep in mind when you're transforming data for use?
- How do you approach learning new tables in a database?
- Have you worked with data coming from an application?
- Have you ever partnered with product managers in the past?
- How do you partner with stakeholders to understand business problems?
- Project you've worked on that had a positive impact on a company?
- How do you maintain a deployed project for ongoing usage?
What similar previous experience I had.