
Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Sample Questions and Answers

Questions 4

Which of the following code blocks returns a DataFrame where columns predError and productId are removed from DataFrame transactionsDf?

Sample of DataFrame transactionsDf:

1.+-------------+---------+-----+-------+---------+----+

2.|transactionId|predError|value|storeId|productId|f |

3.+-------------+---------+-----+-------+---------+----+

4.|1 |3 |4 |25 |1 |null|

5.|2 |6 |7 |2 |2 |null|

6.|3 |3 |null |25 |3 |null|

7.+-------------+---------+-----+-------+---------+----+

Options:

A.

transactionsDf.withColumnRemoved("predError", "productId")

B.

transactionsDf.drop(["predError", "productId", "associateId"])

C.

transactionsDf.drop("predError", "productId", "associateId")

D.

transactionsDf.dropColumns("predError", "productId", "associateId")

E.

transactionsDf.drop(col("predError", "productId"))
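
For reference, a minimal sketch of removing columns (illustrative only, not tied to a particular option): drop() takes the column names as separate string arguments.

transactionsDf.drop("predError", "productId")  # returns a new DataFrame without the two columns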

Questions 5

The code block shown below should return a one-column DataFrame where the column storeId is converted to string type. Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__(__2__.__3__(__4__))

Options:

A.

1. select

2. col("storeId")

3. cast

4. StringType

B.

1. select

2. col("storeId")

3. as

4. StringType

C.

1. cast

2. "storeId"

3. as

4. StringType()

D.

1. select

2. col("storeId")

3. cast

4. StringType()

E.

1. select

2. storeId

3. cast

4. StringType()
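
A minimal sketch of casting a selected column to string, assuming the standard PySpark imports:

from pyspark.sql.functions import col
from pyspark.sql.types import StringType
# cast() accepts a DataType instance (or the type name as a string)
transactionsDf.select(col("storeId").cast(StringType()))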

Questions 6

The code block displayed below contains an error. The code block should trigger Spark to cache DataFrame transactionsDf in executor memory where available, writing to disk where insufficient executor memory is available, in a fault-tolerant way. Find the error.

Code block:

transactionsDf.persist(StorageLevel.MEMORY_AND_DISK)

Options:

A.

Caching is not supported in Spark, data are always recomputed.

B.

Data caching capabilities can be accessed through the spark object, but not through the DataFrame API.

C.

The storage level is inappropriate for fault-tolerant storage.

D.

The code block uses the wrong operator for caching.

E.

The DataFrameWriter needs to be invoked.
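
As a hedged reference: storage levels whose names end in _2 replicate each cached partition to a second node, which is what makes a cache fault-tolerant. A minimal sketch:

from pyspark import StorageLevel
# replicated level: memory first, spill to disk, two copies across the cluster
transactionsDf.persist(StorageLevel.MEMORY_AND_DISK_2)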

Questions 7

The code block shown below should add column transactionDateForm to DataFrame transactionsDf. The column should express the unix-format timestamps in column transactionDate as string type, like Apr 26 (Sunday). Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__(__2__, from_unixtime(__3__, __4__))

Options:

A.

1. withColumn

2. "transactionDateForm"

3. "MMM d (EEEE)"

4. "transactionDate"

B.

1. select

2. "transactionDate"

3. "transactionDateForm"

4. "MMM d (EEEE)"

C.

1. withColumn

2. "transactionDateForm"

3. "transactionDate"

4. "MMM d (EEEE)"

D.

1. withColumn

2. "transactionDateForm"

3. "transactionDate"

4. "MM d (EEE)"

E.

1. withColumnRenamed

2. "transactionDate"

3. "transactionDateForm"

4. "MM d (EEE)"

Questions 8

Which of the following describes characteristics of the Spark UI?

Options:

A.

Via the Spark UI, workloads can be manually distributed across executors.

B.

Via the Spark UI, stage execution speed can be modified.

C.

The Scheduler tab shows how jobs that are run in parallel by multiple users are distributed across the cluster.

D.

There is a place in the Spark UI that shows the property spark.executor.memory.

E.

Some of the tabs in the Spark UI are named Jobs, Stages, Storage, DAGs, Executors, and SQL.

Questions 9

Which of the following code blocks returns a 2-column DataFrame that shows the distinct values in column productId and the number of rows with that productId in DataFrame transactionsDf?

Options:

A.

transactionsDf.count("productId").distinct()

B.

transactionsDf.groupBy("productId").agg(col("value").count())

C.

transactionsDf.count("productId")

D.

transactionsDf.groupBy("productId").count()

E.

transactionsDf.groupBy("productId").select(count("value"))
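
A minimal sketch of counting rows per distinct productId:

# groupBy().count() returns a two-column DataFrame: productId and count
transactionsDf.groupBy("productId").count()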

Questions 10

Which of the following code blocks returns a DataFrame that has all columns of DataFrame transactionsDf and an additional column predErrorSquared which is the squared value of column predError in DataFrame transactionsDf?

Options:

A.

transactionsDf.withColumn("predError", pow(col("predErrorSquared"), 2))

B.

transactionsDf.withColumnRenamed("predErrorSquared", pow(predError, 2))

C.

transactionsDf.withColumn("predErrorSquared", pow(col("predError"), lit(2)))

D.

transactionsDf.withColumn("predErrorSquared", pow(predError, lit(2)))

E.

transactionsDf.withColumn("predErrorSquared", "predError"**2)

Questions 11

The code block displayed below contains an error. The code block should read the CSV file located at path data/transactions.csv into DataFrame transactionsDf, using the first row as the column header and casting the columns to the most appropriate type. Find the error.

First 3 rows of transactions.csv:

1.transactionId;storeId;productId;name

2.1;23;12;green grass

3.2;35;31;yellow sun

4.3;23;12;green grass

Code block:

transactionsDf = spark.read.load("data/transactions.csv", sep=";", format="csv", header=True)

Options:

A.

The DataFrameReader is not accessed correctly.

B.

The transaction is evaluated lazily, so no file will be read.

C.

Spark is unable to understand the file type.

D.

The code block is unable to capture all columns.

E.

The resulting DataFrame will not have the appropriate schema.
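
A hedged sketch of reading a semicolon-separated CSV with a header row while letting Spark infer the column types:

transactionsDf = spark.read.csv("data/transactions.csv", sep=";", header=True, inferSchema=True)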

Questions 12

Which of the following code blocks returns a one-column DataFrame of all values in column supplier of DataFrame itemsDf that do not contain the letter X? In the DataFrame, every value should only be listed once.

Sample of DataFrame itemsDf:

1.+------+--------------------+--------------------+-------------------+

2.|itemId| itemName| attributes| supplier|

3.+------+--------------------+--------------------+-------------------+

4.| 1|Thick Coat for Wa...|[blue, winter, cozy]|Sports Company Inc.|

5.| 2|Elegant Outdoors ...|[red, summer, fre...| YetiX|

6.| 3| Outdoors Backpack|[green, summer, t...|Sports Company Inc.|

7.+------+--------------------+--------------------+-------------------+

Options:

A.

itemsDf.filter(col(supplier).not_contains('X')).select(supplier).distinct()

B.

itemsDf.select(~col('supplier').contains('X')).distinct()

C.

itemsDf.filter(not(col('supplier').contains('X'))).select('supplier').unique()

D.

itemsDf.filter(~col('supplier').contains('X')).select('supplier').distinct()

E.

itemsDf.filter(!col('supplier').contains('X')).select(col('supplier')).unique()
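
A minimal sketch of filtering out values that contain a letter and de-duplicating the result:

from pyspark.sql.functions import col
# ~ negates the boolean column returned by contains()
itemsDf.filter(~col("supplier").contains("X")).select("supplier").distinct()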

Questions 13

Which of the following code blocks generally causes a great amount of network traffic?

Options:

A.

DataFrame.select()

B.

DataFrame.coalesce()

C.

DataFrame.collect()

D.

DataFrame.rdd.map()

E.

DataFrame.count()

Questions 14

Which of the following describes Spark actions?

Options:

A.

Writing data to disk is the primary purpose of actions.

B.

Actions are Spark's way of exchanging data between executors.

C.

The driver receives data upon request by actions.

D.

Stage boundaries are commonly established by actions.

E.

Actions are Spark's way of modifying RDDs.

Questions 15

The code block displayed below contains an error. It is intended to add a column itemNameElements to DataFrame itemsDf that includes an array of all words in column itemName. Find the error.

Sample of DataFrame itemsDf:

1.+------+----------------------------------+-------------------+

2.|itemId|itemName |supplier |

3.+------+----------------------------------+-------------------+

4.|1 |Thick Coat for Walking in the Snow|Sports Company Inc.|

5.|2 |Elegant Outdoors Summer Dress |YetiX |

6.|3 |Outdoors Backpack |Sports Company Inc.|

7.+------+----------------------------------+-------------------+

Code block:

itemsDf.withColumnRenamed("itemNameElements", split("itemName"))

Options:

A.

All column names need to be wrapped in the col() operator.

B.

Operator withColumnRenamed needs to be replaced with operator withColumn and a second argument "," needs to be passed to the split method.

C.

Operator withColumnRenamed needs to be replaced with operator withColumn and the split method needs to be replaced by the splitString method.

D.

Operator withColumnRenamed needs to be replaced with operator withColumn and a second argument " " needs to be passed to the split method.

E.

The expressions "itemNameElements" and split("itemName") need to be swapped.
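
A minimal sketch of splitting a string column into an array of words; split() needs the separator (here a space) as its second argument:

from pyspark.sql.functions import split
itemsDf.withColumn("itemNameElements", split("itemName", " "))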

Questions 16

Which of the elements in the labeled panels represent the operation performed for broadcast variables?

(The referenced figure with labeled panels is not included here.)

Options:

A.

2, 5

B.

3

C.

2, 3

D.

1, 2

E.

1, 3, 4

Questions 17

The code block shown below should write DataFrame transactionsDf as a parquet file to path storeDir, using brotli compression and replacing any previously existing file. Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__.format("parquet").__2__(__3__).option(__4__, "brotli").__5__(storeDir)

Options:

A.

1. save

2. mode

3. "ignore"

4. "compression"

5. path

B.

1. store

2. with

3. "replacement"

4. "compression"

5. path

C.

1. write

2. mode

3. "overwrite"

4. "compression"

5. save

(Correct)

D.

1. save

2. mode

3. "replace"

4. "compression"

5. path

E.

1. write

2. mode

3. "overwrite"

4. compression

5. parquet
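
A minimal sketch of the write described by the correct option (C):

(transactionsDf.write
 .format("parquet")
 .mode("overwrite")                  # replace any previously existing file
 .option("compression", "brotli")
 .save(storeDir))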

Questions 18

Which of the following code blocks produces the following output, given DataFrame transactionsDf?

Output:

1.root

2. |-- transactionId: integer (nullable = true)

3. |-- predError: integer (nullable = true)

4. |-- value: integer (nullable = true)

5. |-- storeId: integer (nullable = true)

6. |-- productId: integer (nullable = true)

7. |-- f: integer (nullable = true)

DataFrame transactionsDf:

1.+-------------+---------+-----+-------+---------+----+

2.|transactionId|predError|value|storeId|productId| f|

3.+-------------+---------+-----+-------+---------+----+

4.| 1| 3| 4| 25| 1|null|

5.| 2| 6| 7| 2| 2|null|

6.| 3| 3| null| 25| 3|null|

7.+-------------+---------+-----+-------+---------+----+

Options:

A.

transactionsDf.schema.print()

B.

transactionsDf.rdd.printSchema()

C.

transactionsDf.rdd.formatSchema()

D.

transactionsDf.printSchema()

E.

print(transactionsDf.schema)
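
printSchema() prints the tree-formatted schema shown in the output above:

transactionsDf.printSchema()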

Questions 19

Which of the following statements about Spark's DataFrames is incorrect?

Options:

A.

Spark's DataFrames are immutable.

B.

Spark's DataFrames are equal to Python's DataFrames.

C.

Data in DataFrames is organized into named columns.

D.

RDDs are at the core of DataFrames.

E.

The data in DataFrames may be split into multiple chunks.

Questions 20

Which is the highest level in Spark's execution hierarchy?

Options:

A.

Task

B.

Executor

C.

Slot

D.

Job

E.

Stage

Questions 21

Which of the following code blocks reads in the JSON file stored at filePath, enforcing the schema expressed in JSON format in variable json_schema, shown in the code block below?

Code block:

1.json_schema = """

2.{"type": "struct",

3. "fields": [

4. {

5. "name": "itemId",

6. "type": "integer",

7. "nullable": true,

8. "metadata": {}

9. },

10. {

11. "name": "supplier",

12. "type": "string",

13. "nullable": true,

14. "metadata": {}

15. }

16. ]

17.}

18."""

Options:

A.

spark.read.json(filePath, schema=json_schema)

B.

spark.read.schema(json_schema).json(filePath)

C.

1.schema = StructType.fromJson(json.loads(json_schema))

2.spark.read.json(filePath, schema=schema)

D.

spark.read.json(filePath, schema=schema_of_json(json_schema))

E.

spark.read.json(filePath, schema=spark.read.json(json_schema))
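
A hedged sketch of enforcing a JSON-encoded schema by first converting it into a StructType (assumes Python's json module is available):

import json
from pyspark.sql.types import StructType
schema = StructType.fromJson(json.loads(json_schema))
spark.read.json(filePath, schema=schema)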

Questions 22

Which of the following code blocks stores DataFrame itemsDf in executor memory and, if insufficient memory is available, serializes it and saves it to disk?

Options:

A.

itemsDf.persist(StorageLevel.MEMORY_ONLY)

B.

itemsDf.cache(StorageLevel.MEMORY_AND_DISK)

C.

itemsDf.store()

D.

itemsDf.cache()

E.

itemsDf.write.option('destination', 'memory').save()
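
As a hedged note: for DataFrames, cache() is shorthand for persist() with the default storage level, which keeps data in memory and spills to disk when executor memory is insufficient.

itemsDf.cache()   # default storage level: memory first, disk as fallback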

Questions 23

Which of the following code blocks reads in parquet file /FileStore/imports.parquet as a DataFrame?

Options:

A.

spark.mode("parquet").read("/FileStore/imports.parquet")

B.

spark.read.path("/FileStore/imports.parquet", source="parquet")

C.

spark.read().parquet("/FileStore/imports.parquet")

D.

spark.read.parquet("/FileStore/imports.parquet")

E.

spark.read().format('parquet').open("/FileStore/imports.parquet")
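
A minimal sketch of reading a parquet file through the DataFrameReader:

spark.read.parquet("/FileStore/imports.parquet")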

Questions 24

Which of the following describes a way for resizing a DataFrame from 16 to 8 partitions in the most efficient way?

Options:

A.

Use operation DataFrame.repartition(8) to shuffle the DataFrame and reduce the number of partitions.

B.

Use operation DataFrame.coalesce(8) to fully shuffle the DataFrame and reduce the number of partitions.

C.

Use a narrow transformation to reduce the number of partitions.

D.

Use a wide transformation to reduce the number of partitions.

E.

Use operation DataFrame.coalesce(0.5) to halve the number of partitions in the DataFrame.
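
A minimal sketch of reducing the partition count without a full shuffle:

transactionsDf.coalesce(8)   # 16 -> 8 partitions; coalesce avoids a full shuffle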

Questions 25

Which of the following code blocks returns a single-row DataFrame that only has a column corr which shows the Pearson correlation coefficient between columns predError and value in DataFrame transactionsDf?

Options:

A.

transactionsDf.select(corr(["predError", "value"]).alias("corr")).first()

B.

transactionsDf.select(corr(col("predError"), col("value")).alias("corr")).first()

C.

transactionsDf.select(corr(predError, value).alias("corr"))

D.

transactionsDf.select(corr(col("predError"), col("value")).alias("corr"))

(Correct)

E.

transactionsDf.select(corr("predError", "value"))
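
A minimal sketch of the aggregation described by the correct option (D):

from pyspark.sql.functions import col, corr
transactionsDf.select(corr(col("predError"), col("value")).alias("corr"))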

Questions 26

The code block displayed below contains an error. When the code block below has executed, it should have divided DataFrame transactionsDf into 14 parts, based on columns storeId and transactionDate (in this order). Find the error.

Code block:

transactionsDf.coalesce(14, ("storeId", "transactionDate"))

Options:

A.

The parentheses around the column names need to be removed and .select() needs to be appended to the code block.

B.

Operator coalesce needs to be replaced by repartition, the parentheses around the column names need to be removed, and .count() needs to be appended to the code block.

(Correct)

C.

Operator coalesce needs to be replaced by repartition, the parentheses around the column names need to be removed, and .select() needs to be appended to the code block.

D.

Operator coalesce needs to be replaced by repartition and the parentheses around the column names need to be replaced by square brackets.

E.

Operator coalesce needs to be replaced by repartition.
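
A hedged sketch following the correct option (B): repartition() accepts the target number of partitions and the partitioning columns, and an action such as count() is needed because repartitioning is lazy.

transactionsDf.repartition(14, "storeId", "transactionDate").count()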

Questions 27

Which of the following code blocks performs an inner join between DataFrame itemsDf and DataFrame transactionsDf, using columns itemId and transactionId as join keys, respectively?

Options:

A.

itemsDf.join(transactionsDf, "inner", itemsDf.itemId == transactionsDf.transactionId)

B.

itemsDf.join(transactionsDf, itemId == transactionId)

C.

itemsDf.join(transactionsDf, itemsDf.itemId == transactionsDf.transactionId, "inner")

D.

itemsDf.join(transactionsDf, "itemsDf.itemId == transactionsDf.transactionId", "inner")

E.

itemsDf.join(transactionsDf, col(itemsDf.itemId) == col(transactionsDf.transactionId))
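
A minimal sketch of an inner join with an explicit join condition and join type:

itemsDf.join(transactionsDf, itemsDf.itemId == transactionsDf.transactionId, "inner")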

Questions 28

Which of the following code blocks removes all rows in the 6-column DataFrame transactionsDf that have missing data in at least 3 columns?

Options:

A.

transactionsDf.dropna("any")

B.

transactionsDf.dropna(thresh=4)

C.

transactionsDf.drop.na("",2)

D.

transactionsDf.dropna(thresh=2)

E.

transactionsDf.dropna("",4)
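
As a hedged note: dropna(thresh=n) keeps only rows with at least n non-null values, so in a 6-column DataFrame thresh=4 removes rows that are missing 3 or more values.

transactionsDf.dropna(thresh=4)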

Questions 29

Which of the following code blocks prints out in how many rows the expression Inc. appears in the string-type column supplier of DataFrame itemsDf?

Options:

A.

1.counter = 0

2.

3.for index, row in itemsDf.iterrows():

4. if 'Inc.' in row['supplier']:

5. counter = counter + 1

6.

7.print(counter)

B.

1.counter = 0

2.

3.def count(x):

4. if 'Inc.' in x['supplier']:

5. counter = counter + 1

6.

7.itemsDf.foreach(count)

8.print(counter)

C.

print(itemsDf.foreach(lambda x: 'Inc.' in x))

D.

print(itemsDf.foreach(lambda x: 'Inc.' in x).sum())

E.

1.accum=sc.accumulator(0)

2.

3.def check_if_inc_in_supplier(row):

4. if 'Inc.' in row['supplier']:

5. accum.add(1)

6.

7.itemsDf.foreach(check_if_inc_in_supplier)

8.print(accum.value)
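
A minimal sketch of counting matching rows on the executors with an accumulator and foreach(), assuming sc is the active SparkContext:

accum = sc.accumulator(0)

def check_if_inc_in_supplier(row):
    if 'Inc.' in row['supplier']:
        accum.add(1)   # accumulator updates from the executors are aggregated on the driver

itemsDf.foreach(check_if_inc_in_supplier)
print(accum.value)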

Questions 30

Which of the following code blocks returns a DataFrame with a single column in which all items in column attributes of DataFrame itemsDf are listed that contain the letter i?

Sample of DataFrame itemsDf:

1.+------+----------------------------------+-----------------------------+-------------------+

2.|itemId|itemName |attributes |supplier |

3.+------+----------------------------------+-----------------------------+-------------------+

4.|1 |Thick Coat for Walking in the Snow|[blue, winter, cozy] |Sports Company Inc.|

5.|2 |Elegant Outdoors Summer Dress |[red, summer, fresh, cooling]|YetiX |

6.|3 |Outdoors Backpack |[green, summer, travel] |Sports Company Inc.|

7.+------+----------------------------------+-----------------------------+-------------------+

Options:

A.

itemsDf.select(explode("attributes").alias("attributes_exploded")).filter(attributes_exploded.contains("i"))

B.

itemsDf.explode(attributes).alias("attributes_exploded").filter(col("attributes_exploded").contains("i"))

C.

itemsDf.select(explode("attributes")).filter("attributes_exploded".contains("i"))

D.

itemsDf.select(explode("attributes").alias("attributes_exploded")).filter(col("attributes_exploded").contains("i"))

E.

itemsDf.select(col("attributes").explode().alias("attributes_exploded")).filter(col("attributes_exploded").contains("i"))
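
A minimal sketch of exploding an array column and filtering the exploded values:

from pyspark.sql.functions import col, explode
itemsDf.select(explode("attributes").alias("attributes_exploded")).filter(col("attributes_exploded").contains("i"))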

Questions 31

Which of the following code blocks displays the 10 rows with the smallest values of column value in DataFrame transactionsDf in a nicely formatted way?

Options:

A.

transactionsDf.sort(asc(value)).show(10)

B.

transactionsDf.sort(col("value")).show(10)

C.

transactionsDf.sort(col("value").desc()).head()

D.

transactionsDf.sort(col("value").asc()).print(10)

E.

transactionsDf.orderBy("value").asc().show(10)
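
A minimal sketch of showing the 10 smallest values; sort() is ascending by default, and show() prints a formatted table:

from pyspark.sql.functions import col
transactionsDf.sort(col("value")).show(10)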

Questions 32

The code block shown below should store DataFrame transactionsDf on two different executors, utilizing the executors' memory as much as possible, but not writing anything to disk. Choose the answer that correctly fills the blanks in the code block to accomplish this.

1.from pyspark import StorageLevel

2.transactionsDf.__1__(StorageLevel.__2__).__3__

Options:

A.

1. cache

2. MEMORY_ONLY_2

3. count()

B.

1. persist

2. DISK_ONLY_2

3. count()

C.

1. persist

2. MEMORY_ONLY_2

3. select()

D.

1. cache

2. DISK_ONLY_2

3. count()

E.

1. persist

2. MEMORY_ONLY_2

3. count()
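
A hedged sketch of caching with a replicated, memory-only storage level; the trailing count() action triggers the otherwise lazy caching:

from pyspark import StorageLevel
transactionsDf.persist(StorageLevel.MEMORY_ONLY_2).count()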

Questions 33

The code block displayed below contains an error. The code block should return DataFrame transactionsDf, but with the column storeId renamed to storeNumber. Find the error.

Code block:

transactionsDf.withColumn("storeNumber", "storeId")

Options:

A.

Instead of withColumn, the withColumnRenamed method should be used.

B.

Arguments "storeNumber" and "storeId" each need to be wrapped in a col() operator.

C.

Argument "storeId" should be the first and argument "storeNumber" should be the second argument to the withColumn method.

D.

The withColumn operator should be replaced with the copyDataFrame operator.

E.

Instead of withColumn, the withColumnRenamed method should be used and argument "storeId" should be the first and argument "storeNumber" should be the second argument to that method.
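
A minimal sketch of renaming a column (existing name first, new name second):

transactionsDf.withColumnRenamed("storeId", "storeNumber")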

Questions 34

Which of the following code blocks returns a DataFrame that is an inner join of DataFrame itemsDf and DataFrame transactionsDf, on columns itemId and productId, respectively, and in which every itemId appears only once?

Options:

A.

itemsDf.join(transactionsDf, "itemsDf.itemId==transactionsDf.productId").distinct("itemId")

B.

itemsDf.join(transactionsDf, itemsDf.itemId==transactionsDf.productId).dropDuplicates(["itemId"])

C.

itemsDf.join(transactionsDf, itemsDf.itemId==transactionsDf.productId).dropDuplicates("itemId")

D.

itemsDf.join(transactionsDf, itemsDf.itemId==transactionsDf.productId, how="inner").distinct(["itemId"])

E.

itemsDf.join(transactionsDf, "itemsDf.itemId==transactionsDf.productId", how="inner").dropDuplicates(["itemId"])
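
A hedged sketch of the join with per-itemId de-duplication; dropDuplicates() expects a list of column names, and join() is an inner join by default:

itemsDf.join(transactionsDf, itemsDf.itemId == transactionsDf.productId).dropDuplicates(["itemId"])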

Questions 35

Which of the following code blocks returns a copy of DataFrame transactionsDf where the column storeId has been converted to string type?

Options:

A.

transactionsDf.withColumn("storeId", convert("storeId", "string"))

B.

transactionsDf.withColumn("storeId", col("storeId", "string"))

C.

transactionsDf.withColumn("storeId", col("storeId").convert("string"))

D.

transactionsDf.withColumn("storeId", col("storeId").cast("string"))

E.

transactionsDf.withColumn("storeId", convert("storeId").as("string"))

Questions 36

Which of the following code blocks applies the boolean-returning Python function evaluateTestSuccess to column storeId of DataFrame transactionsDf as a user-defined function?

Options:

A.

1.from pyspark.sql import types as T

2.evaluateTestSuccessUDF = udf(evaluateTestSuccess, T.BooleanType())

3.transactionsDf.withColumn("result", evaluateTestSuccessUDF(col("storeId")))

B.

1.evaluateTestSuccessUDF = udf(evaluateTestSuccess)

2.transactionsDf.withColumn("result", evaluateTestSuccessUDF(storeId))

C.

1.from pyspark.sql import types as T

2.evaluateTestSuccessUDF = udf(evaluateTestSuccess, T.IntegerType())

3.transactionsDf.withColumn("result", evaluateTestSuccess(col("storeId")))

D.

1.evaluateTestSuccessUDF = udf(evaluateTestSuccess)

2.transactionsDf.withColumn("result", evaluateTestSuccessUDF(col("storeId")))

E.

1.from pyspark.sql import types as T

2.evaluateTestSuccessUDF = udf(evaluateTestSuccess, T.BooleanType())

3.transactionsDf.withColumn("result", evaluateTestSuccess(col("storeId")))
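
A minimal sketch of wrapping the question's evaluateTestSuccess function as a UDF with an explicit boolean return type:

from pyspark.sql import types as T
from pyspark.sql.functions import col, udf
evaluateTestSuccessUDF = udf(evaluateTestSuccess, T.BooleanType())
transactionsDf.withColumn("result", evaluateTestSuccessUDF(col("storeId")))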

Questions 37

Which of the following code blocks reads in the parquet file stored at location filePath, given that all columns in the parquet file contain only whole numbers and are stored in the most appropriate format for this kind of data?

Options:

A.

1.spark.read.schema(

2. StructType(

3. StructField("transactionId", IntegerType(), True),

4. StructField("predError", IntegerType(), True)

5. )).load(filePath)

B.

1.spark.read.schema([

2. StructField("transactionId", NumberType(), True),

3. StructField("predError", IntegerType(), True)

4. ]).load(filePath)

C.

1.spark.read.schema(

2. StructType([

3. StructField("transactionId", StringType(), True),

4. StructField("predError", IntegerType(), True)]

5. )).parquet(filePath)

D.

1.spark.read.schema(

2. StructType([

3. StructField("transactionId", IntegerType(), True),

4. StructField("predError", IntegerType(), True)]

5. )).format("parquet").load(filePath)

E.

1.spark.read.schema([

2. StructField("transactionId", IntegerType(), True),

3. StructField("predError", IntegerType(), True)

4. ]).load(filePath, format="parquet")
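
A hedged sketch of reading parquet with an explicit integer schema; StructType wraps a list of StructField objects:

from pyspark.sql.types import IntegerType, StructField, StructType
schema = StructType([
    StructField("transactionId", IntegerType(), True),
    StructField("predError", IntegerType(), True)])
spark.read.schema(schema).parquet(filePath)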

Questions 38

Which of the following code blocks sorts DataFrame transactionsDf both by column storeId in ascending and by column productId in descending order, in this priority?

Options:

A.

transactionsDf.sort("storeId", asc("productId"))

B.

transactionsDf.sort(col(storeId)).desc(col(productId))

C.

transactionsDf.order_by(col(storeId), desc(col(productId)))

D.

transactionsDf.sort("storeId", desc("productId"))

E.

transactionsDf.sort("storeId").sort(desc("productId"))

Questions 39

Which of the following code blocks stores a part of the data in DataFrame itemsDf on executors?

Options:

A.

itemsDf.cache().count()

B.

itemsDf.cache(eager=True)

C.

cache(itemsDf)

D.

itemsDf.cache().filter()

E.

itemsDf.rdd.storeCopy()

Questions 40

Which of the following code blocks returns about 150 randomly selected rows from the 1000-row DataFrame transactionsDf, assuming that any row can appear more than once in the returned DataFrame?

Options:

A.

transactionsDf.resample(0.15, False, 3142)

B.

transactionsDf.sample(0.15, False, 3142)

C.

transactionsDf.sample(0.15)

D.

transactionsDf.sample(0.85, 8429)

E.

transactionsDf.sample(True, 0.15, 8261)
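
As a hedged note: sample(withReplacement, fraction, seed) with withReplacement=True allows a row to be drawn more than once.

transactionsDf.sample(True, 0.15, 8261)   # roughly 150 of 1000 rows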

Questions 41

Which of the following describes a narrow transformation?

Options:

A.

A narrow transformation is an operation in which data is exchanged across partitions.

B.

A narrow transformation is a process in which data from multiple RDDs is used.

C.

A narrow transformation is a process in which 32-bit float variables are cast to smaller float variables, like 16-bit or 8-bit float variables.

D.

A narrow transformation is an operation in which data is exchanged across the cluster.

E.

A narrow transformation is an operation in which no data is exchanged across the cluster.

Questions 42

Which of the following statements about lazy evaluation is incorrect?

Options:

A.

Predicate pushdown is a feature resulting from lazy evaluation.

B.

Execution is triggered by transformations.

C.

Spark will fail a job only during execution, but not during definition.

D.

Accumulators do not change the lazy evaluation model of Spark.

E.

Lineages allow Spark to coalesce transformations into stages.

Questions 43

Which of the following code blocks returns all unique values of column storeId in DataFrame transactionsDf?

Options:

A.

transactionsDf["storeId"].distinct()

B.

transactionsDf.select("storeId").distinct()

(Correct)

C.

transactionsDf.filter("storeId").distinct()

D.

transactionsDf.select(col("storeId").distinct())

E.

transactionsDf.distinct("storeId")

Questions 44

Which of the following code blocks reads JSON file imports.json into a DataFrame?

Options:

A.

spark.read().mode("json").path("/FileStore/imports.json")

B.

spark.read.format("json").path("/FileStore/imports.json")

C.

spark.read("json", "/FileStore/imports.json")

D.

spark.read.json("/FileStore/imports.json")

E.

spark.read().json("/FileStore/imports.json")

Questions 45

The code block shown below should return a copy of DataFrame transactionsDf with an added column cos. This column should contain the values in column value converted to degrees, with the cosine of those converted values taken and rounded to two decimals. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

transactionsDf.__1__(__2__, round(__3__(__4__(__5__)),2))

Options:

A.

1. withColumn

2. col("cos")

3. cos

4. degrees

5. transactionsDf.value

B.

1. withColumnRenamed

2. "cos"

3. cos

4. degrees

5. "transactionsDf.value"

C.

1. withColumn

2. "cos"

3. cos

4. degrees

5. transactionsDf.value

D.

1. withColumn

2. col("cos")

3. cos

4. degrees

5. col("value")

E.

1. withColumn

2. "cos"

3. degrees

4. cos

5. col("value")

Questions 46

Which of the following describes tasks?

Options:

A.

A task is a command sent from the driver to the executors in response to a transformation.

B.

Tasks transform jobs into DAGs.

C.

A task is a collection of slots.

D.

A task is a collection of rows.

E.

Tasks get assigned to the executors by the driver.

Questions 47

Which of the following code blocks returns approximately 1000 rows, some of them potentially being duplicates, from the 2000-row DataFrame transactionsDf that only has unique rows?

Options:

A.

transactionsDf.sample(True, 0.5)

B.

transactionsDf.take(1000).distinct()

C.

transactionsDf.sample(False, 0.5)

D.

transactionsDf.take(1000)

E.

transactionsDf.sample(True, 0.5, force=True)

Questions 48

The code block shown below should write DataFrame transactionsDf as a parquet file to path storeDir, using brotli compression and replacing any previously existing file. Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__.format("parquet").__2__(__3__).option(__4__, "brotli").__5__(storeDir)

Options:

A.

1. save

2. mode

3. "ignore"

4. "compression"

5. path

B.

1. store

2. with

3. "replacement"

4. "compression"

5. path

C.

1. write

2. mode

3. "overwrite"

4. "compression"

5. save

(Correct)

D.

1. save

2. mode

3. "replace"

4. "compression"

5. path

E.

1. write

2. mode

3. "overwrite"

4. compression

5. parquet

Questions 49

Which of the following statements about executors is correct?

Options:

A.

Executors are launched by the driver.

B.

Executors stop upon application completion by default.

C.

Each node hosts a single executor.

D.

Executors store data in memory only.

E.

An executor can serve multiple applications.

Questions 50

The code block displayed below contains at least one error. The code block should return a DataFrame with only one column, result. That column should include all values in column value from DataFrame transactionsDf raised to the power of 5, and a null value for rows in which there is no value in column value. Find the error(s).

Code block:

1.from pyspark.sql.functions import udf

2.from pyspark.sql import types as T

3.

4.transactionsDf.createOrReplaceTempView('transactions')

5.

6.def pow_5(x):

7. return x**5

8.

9.spark.udf.register(pow_5, 'power_5_udf', T.LongType())

10.spark.sql('SELECT power_5_udf(value) FROM transactions')

Options:

A.

The pow_5 method is unable to handle empty values in column value and the name of the column in the returned DataFrame is not result.

B.

The returned DataFrame includes multiple columns instead of just one column.

C.

The pow_5 method is unable to handle empty values in column value, the name of the column in the returned DataFrame is not result, and the SparkSession cannot access the transactionsDf DataFrame.

D.

The pow_5 method is unable to handle empty values in column value, the name of the column in the returned DataFrame is not result, and the Spark driver does not call the UDF function appropriately.

E.

The pow_5 method is unable to handle empty values in column value, the UDF function is not registered properly with the Spark driver, and the name of the column in the returned DataFrame is not result.
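
A hedged sketch of a version that avoids the errors listed above: spark.udf.register expects the name first, then the function, then the return type, the function guards against nulls, and the query aliases the output column to result.

from pyspark.sql import types as T

def pow_5(x):
    return x**5 if x is not None else None   # return null for missing values

spark.udf.register('power_5_udf', pow_5, T.LongType())
spark.sql('SELECT power_5_udf(value) AS result FROM transactions')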

Questions 51

Which of the following code blocks writes DataFrame itemsDf to disk at storage location filePath, making sure to substitute any existing data at that location?

Options:

A.

itemsDf.write.mode("overwrite").parquet(filePath)

B.

itemsDf.write.option("parquet").mode("overwrite").path(filePath)

C.

itemsDf.write(filePath, mode="overwrite")

D.

itemsDf.write.mode("overwrite").path(filePath)

E.

itemsDf.write().parquet(filePath, mode="overwrite")

Questions 52

Which of the following DataFrame operators is never classified as a wide transformation?

Options:

A.

DataFrame.sort()

B.

DataFrame.aggregate()

C.

DataFrame.repartition()

D.

DataFrame.select()

E.

DataFrame.join()

Questions 53

The code block displayed below contains an error. The code block should create DataFrame itemsAttributesDf which has columns itemId and attribute and lists every attribute from the attributes column in DataFrame itemsDf next to the itemId of the respective row in itemsDf. Find the error.

A sample of DataFrame itemsDf is below.

Code block:

itemsAttributesDf = itemsDf.explode("attributes").alias("attribute").select("attribute", "itemId")

Options:

A.

Since itemId is the index, it does not need to be an argument to the select() method.

B.

The alias() method needs to be called after the select() method.

C.

The explode() method expects a Column object rather than a string.

D.

explode() is not a method of DataFrame. explode() should be used inside the select() method instead.

E.

The split() method should be used inside the select() method instead of the explode() method.
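
A minimal sketch of using explode() inside select(), since explode() is a function from pyspark.sql.functions rather than a DataFrame method:

from pyspark.sql.functions import explode
itemsAttributesDf = itemsDf.select("itemId", explode("attributes").alias("attribute"))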

Questions 54

Which of the following is the deepest level in Spark's execution hierarchy?

Options:

A.

Job

B.

Task

C.

Executor

D.

Slot

E.

Stage

Exam Code: Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0
Exam Name: Databricks Certified Associate Developer for Apache Spark 3.0 Exam
Last Update: May 18, 2024
Questions: 180