DY0-001 Sample Questions Answers

Questions 4

A data scientist is building a proof of concept for a commercialized machine-learning model. Which of the following is the best starting point?

Options:

A. Literature review
B. Model performance evaluation
C. Hyperparameter tuning
D. Model selection

Questions 5

Which of the following explains back propagation?

Options:

A. The passage of convolutions backward through a neural network to update weights and biases
B. The passage of accuracy backward through a neural network to update weights and biases
C. The passage of nodes backward through a neural network to update weights and biases
D. The passage of errors backward through a neural network to update weights and biases
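
As a rough illustration of what a backward pass does, the sketch below trains a single linear neuron on one hypothetical data point (x = 1.0, y = 2.0). The function name, learning rate, and data are assumptions made for this example and do not come from the exam material.

```python
# Single neuron with a linear activation: y_hat = w * x + b
# Squared-error loss: loss = (y_hat - y) ** 2

def train_step(w, b, x, y, lr=0.1):
    """Run one forward pass and one backward pass; return (w, b, loss)."""
    y_hat = w * x + b           # forward pass
    error = y_hat - y           # prediction error at the output
    loss = error ** 2
    # Backward pass: the chain rule carries the output error back
    # to each parameter as a gradient.
    grad_w = 2 * error * x      # d(loss)/dw
    grad_b = 2 * error          # d(loss)/db
    return w - lr * grad_w, b - lr * grad_b, loss

w, b = 0.0, 0.0
losses = []
for _ in range(50):
    w, b, loss = train_step(w, b, x=1.0, y=2.0)
    losses.append(loss)

print(losses[0], losses[-1])    # the loss shrinks as the weights are updated
```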

Questions 6

A data scientist is attempting to identify sentences that are conceptually similar to each other within a set of text files. Which of the following is the best way to prepare the data set to accomplish this task after data ingestion?

Options:

A. Embeddings
B. Extrapolation
C. Sampling
D. One-hot encoding

Questions 7

Which of the following distribution methods or models can most effectively represent the actual arrival times of a bus that runs on an hourly schedule?

Options:

A. Binomial
B. Exponential
C. Normal
D. Poisson

Questions 8

A data scientist is merging two tables. Table 1 contains employee IDs and roles. Table 2 contains employee IDs and team assignments. Which of the following is the best technique to combine these data sets?

Options:

A. Inner join between Table 1 and Table 2
B. Left join on Table 1 with Table 2
C. Right join on Table 1 with Table 2
D. Outer join between Table 1 and Table 2
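
To compare how the four join types behave, the following sketch joins two small dictionaries keyed by employee ID. The employee IDs, roles, and team names are hypothetical, and the `join` helper is an illustrative stand-in for a real SQL or dataframe join rather than any library's API.

```python
# Hypothetical sample rows; the question gives no actual data.
roles = {101: "Analyst", 102: "Engineer", 103: "Manager"}   # Table 1
teams = {102: "Platform", 103: "Sales", 104: "Support"}     # Table 2

def join(left, right, how):
    """Join two {employee_id: value} dicts; missing values become None."""
    if how == "inner":
        keys = left.keys() & right.keys()    # IDs present in both tables
    elif how == "left":
        keys = left.keys()                   # every ID from the left table
    elif how == "right":
        keys = right.keys()                  # every ID from the right table
    elif how == "outer":
        keys = left.keys() | right.keys()    # IDs present in either table
    else:
        raise ValueError(f"unknown join type: {how}")
    return {k: (left.get(k), right.get(k)) for k in sorted(keys)}

for how in ("inner", "left", "right", "outer"):
    print(how, join(roles, teams, how))
```

Which join fits depends on whether rows that lack a match in the other table should be kept or dropped.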

Questions 9

Which of the following is best solved with graph theory?

Options:

A. Optical character recognition
B. Traveling salesman
C. Fraud detection
D. One-armed bandit

Questions 10

Which of the following distance metrics for KNN is best described as a straight line?

Options:

A. Radial
B. Euclidean
C. Cosine
D. Manhattan
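
For reference, three of the listed metrics can be written in a few lines each. This is a minimal sketch using only the standard library; the sample points are hypothetical.

```python
import math

def euclidean(a, b):
    """Length of the straight segment between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute per-axis differences (grid-style path)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)

p, q = (0.0, 0.0), (3.0, 4.0)
print(euclidean(p, q))                            # 5.0
print(manhattan(p, q))                            # 7.0
print(cosine_distance((1.0, 0.0), (0.0, 1.0)))    # 1.0 for orthogonal vectors
```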

Questions 11

A data scientist is using the following confusion matrix to assess model performance:

                        Actually Fails    Actually Succeeds
Predicted to Fail       80%               20%
Predicted to Succeed    15%               85%

The model is predicting whether a delivery truck will be able to make 200 scheduled delivery stops.

Every time the model is correct, the company saves 1 hour in planning and scheduling.

Every time the model is wrong, the company loses 4 hours of delivery time.

Which of the following is the net model impact for the company?

Options:

A. 25 hours lost
B. 25 hours saved
C. 165 hours lost
D. 165 hours saved
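
The arithmetic can be checked with a short script. The question does not say how the 200 stops are divided between the two predicted classes, so this sketch assumes 100 stops per row of the confusion matrix; that split is an assumption, and a different split would give a different result.

```python
# Assumption (not stated in the question): each predicted class
# accounts for 100 of the 200 scheduled stops.
STOPS_PER_PREDICTED_CLASS = 100
HOURS_SAVED_PER_CORRECT = 1
HOURS_LOST_PER_WRONG = 4

confusion = {
    "predicted_fail":    {"actually_fails": 0.80, "actually_succeeds": 0.20},
    "predicted_succeed": {"actually_fails": 0.15, "actually_succeeds": 0.85},
}

correct_rate = (confusion["predicted_fail"]["actually_fails"]
                + confusion["predicted_succeed"]["actually_succeeds"])
wrong_rate = (confusion["predicted_fail"]["actually_succeeds"]
              + confusion["predicted_succeed"]["actually_fails"])

correct = round(STOPS_PER_PREDICTED_CLASS * correct_rate)
wrong = round(STOPS_PER_PREDICTED_CLASS * wrong_rate)

# Positive means hours saved overall, negative means hours lost.
net_hours = correct * HOURS_SAVED_PER_CORRECT - wrong * HOURS_LOST_PER_WRONG
print(correct, wrong, net_hours)   # 165 35 25
```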

Questions 12

In a modeling project, people evaluate phrases and provide reactions as the target variable for the model. Which of the following best describes what this model is doing?

Options:

A. Sentiment analysis
B. Named-entity recognition
C. TF-IDF vectorization
D. Part-of-speech tagging

Questions 13

A data scientist uses a large data set to build multiple linear regression models to predict the likely market value of a real estate property. The selected new model has an RMSE of 995 on the holdout set and an adjusted R² of 0.75. The benchmark model has an RMSE of 1,000 on the holdout set. Which of the following is the best business statement regarding the new model?

Options:

A. The model should be deployed because it has a lower RMSE.
B. The model's adjusted R² is exceptionally strong for such a complex relationship.
C. The model fails to improve meaningfully on the benchmark model.
D. The model's adjusted R² is too low for the real estate industry.
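
One quick way to put the two RMSE figures from the question side by side is to express the difference as a fraction of the benchmark. The variable names below are illustrative.

```python
benchmark_rmse = 1000.0   # holdout RMSE of the benchmark model
new_rmse = 995.0          # holdout RMSE of the selected new model

# Share of the benchmark error that the new model removes.
relative_improvement = (benchmark_rmse - new_rmse) / benchmark_rmse
print(f"{relative_improvement:.2%}")   # 0.50%
```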

Questions 14

A model's results show increasing explanatory value as additional independent variables are added to the model. Which of the following is the most appropriate statistic?

Options:

A. Adjusted R²
B. p value
C. χ²
D.
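
For context, the standard adjusted R² formula penalizes each extra predictor, so it rises only when a new variable adds enough explanatory value. The sketch below uses hypothetical values for R², the sample size, and the number of predictors.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R² for n observations and p predictors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Same raw R², different numbers of predictors (hypothetical values).
few = adjusted_r2(r2=0.75, n=100, p=5)
many = adjusted_r2(r2=0.75, n=100, p=20)
print(round(few, 4), round(many, 4))   # the model with more predictors scores lower
```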

Questions 15

The following graphic shows the results of an unsupervised, machine-learning clustering model:

k is the number of clusters, and n is the processing time required to run the model. Which of the following is the best value of k to optimize both accuracy and processing requirements?

Options:

A. 2
B. 10
C. 15
D. 20

Questions 16

Which of the following does k represent in the k-means model?

Options:

A. Number of model tests
B. Number of data splits
C. Number of clusters
D. Distance between features

Questions 17

The term "greedy algorithms" refers to machine-learning algorithms that:

Options:

A. update priors as more data is seen.
B. examine every node of a tree before making a decision.
C. apply a theoretical model to the distribution of the data.
D. make the locally optimal decision.
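
A small coin-change example shows what a greedy strategy looks like in practice. It is not a machine-learning algorithm, and the coin set is hypothetical, but it illustrates how picking the best immediate option at every step can miss the best overall answer.

```python
def greedy_change(amount, coins):
    """Repeatedly take the largest coin that still fits."""
    picked = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            picked.append(coin)
            amount -= coin
    return picked

# With coins {1, 3, 4}, greedy uses three coins for 6,
# although two coins (3 + 3) would be enough.
print(greedy_change(6, [1, 3, 4]))   # [4, 1, 1]
```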

Questions 18

A data scientist would like to model a complex phenomenon using a large data set composed of categorical, discrete, and continuous variables. After completing exploratory data analysis, the data scientist is reasonably certain that no linear relationship exists between the predictors and the target. Although the phenomenon is complex, the data scientist still wants to maintain the highest possible degree of interpretability in the final model. Which of the following algorithms best meets this objective?

Options:

A. Artificial neural network
B. Decision tree
C. Multiple linear regression
D. Random forest

Questions 19

A data scientist is presenting the recommendations from a months-long modeling and experiment process to the company's Chief Executive Officer. Which of the following is the best set of artifacts to include in the presentation?

Options:

A. Methods, data overview, results, recommendations, and charts
B. Results, recommendations, justifications, and clear charts
C. Recommendation, charts, justifications, code reviews, and results
D. Methodology, code snippets, findings, data tables, and p-values

Questions 20

A data analyst wants to find the latitude and longitude of a mailing address. Which of the following is the best method to use?

Options:

A. One-hot encoding
B. Binning
C. Geocoding
D. Imputing

Questions 21

A data scientist is standardizing a large data set that contains website addresses. A specific string inside some of the web addresses needs to be extracted. Which of the following is the best method for extracting the desired string from the text data?

Options:

A. Regular expressions
B. Named-entity recognition
C. Large language model
D. Find and replace
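
As an illustration of pattern-based extraction from web addresses, the sketch below pulls a numeric item ID out of a list of URLs with Python's `re` module. The URLs and the `/products/item-<digits>` pattern are hypothetical, since the question does not specify the string to extract.

```python
import re

# Hypothetical web addresses; only some contain the target string.
urls = [
    "https://example.com/products/item-1234?ref=home",
    "https://example.com/about",
    "http://shop.example.org/products/item-98",
]

# Capture the digits that follow "/products/item-".
pattern = re.compile(r"/products/item-(\d+)")

extracted = [m.group(1) if (m := pattern.search(u)) else None for u in urls]
print(extracted)   # ['1234', None, '98']
```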

Questions 22

A data analyst is analyzing data and would like to build conceptual associations. Which of the following is the best way to accomplish this task?

Options:

A. n-grams
B. NER
C. TF-IDF
D. POS

Questions 23

A data scientist is creating a responsive model that will update a product's daily pricing based on the previous day's sales volume. Which of the following resource constraints is the data scientist's greatest concern?

Options:

A. Deployment time
B. Training time
C. Development time
D. Data collection time

Questions 24

Which of the following is a classic example of a constrained optimization problem?

Options:

A. The cold start problem
B. The traveling salesman
C. Calculating local maximum
D. Calculating gradient descent
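
To make "constrained optimization" concrete, the sketch below solves a tiny traveling salesman instance by brute force: it minimizes total tour length subject to the constraint that every city is visited exactly once before returning to the start. The four-city distance matrix is hypothetical, and brute force is only practical for very small inputs.

```python
from itertools import permutations

# Hypothetical symmetric distances between four cities (0..3).
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]

def tour_length(order):
    """Total length of a closed tour that returns to its first city."""
    legs = zip(order, order[1:] + order[:1])
    return sum(dist[a][b] for a, b in legs)

def shortest_tour(n):
    # Constraint: start at city 0 and visit each other city exactly once.
    tours = ((0,) + rest for rest in permutations(range(1, n)))
    return min(tours, key=tour_length)

best = shortest_tour(4)
print(best, tour_length(best))   # (0, 1, 3, 2) 18
```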

Questions 25

Given the equation:

$X_t = \delta + \phi_1 X_{t-1} + \omega_t$, where $\omega_t \sim N(0, \sigma_\omega^2)$

Which of the following time series models best represents this process?

Options:

A. ARIMA(1,1,1)
B. ARMA(1,1)
C. SARIMA(1,1,1) × (1,1,1)_1
D. AR(1)
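
The process in the equation can be simulated directly: each value is a constant plus a multiple of the previous value plus Gaussian noise. The sketch below uses hypothetical parameter values (δ = 1.0, ϕ1 = 0.5, σω = 1.0), and the function name and seed are assumptions for this example.

```python
import random

def simulate_process(delta, phi1, sigma, n, x0=0.0, seed=0):
    """Generate n values of X_t = delta + phi1 * X_{t-1} + noise_t."""
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(n):
        noise = rng.gauss(0.0, sigma)           # omega_t ~ N(0, sigma^2)
        xs.append(delta + phi1 * xs[-1] + noise)
    return xs[1:]                               # drop the starting value

series = simulate_process(delta=1.0, phi1=0.5, sigma=1.0, n=5000)
mean = sum(series) / len(series)

# For |phi1| < 1 the long-run mean is delta / (1 - phi1), here 2.0.
print(round(mean, 2))
```

Only one lagged value of the series and one current noise term appear on the right-hand side, which is the feature to compare against the orders in each option.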

Exam Code: DY0-001
Exam Name: CompTIA DataX Exam
Last Update: Aug 12, 2025
Questions: 85