
GET Real Google Professional-Data-Engineer Exam Questions With 100% Refund Guarantee Apr 05, 2024
NEW QUESTION # 167
When you design a Google Cloud Bigtable schema, it is recommended that you _________.
- A. Create schema designs that require atomicity across rows
- B. Avoid schema designs that require atomicity across rows
- C. Create schema designs that are based on a relational database design
- D. Avoid schema designs that are based on NoSQL concepts
Answer: B
Explanation:
All operations are atomic at the row level. For example, if you update two rows in a table, it's possible that one row will be updated successfully and the other update will fail. Avoid schema designs that require atomicity across rows.
Reference: https://cloud.google.com/bigtable/docs/schema-design#row-keys
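One way to act on this advice is to keep values that must change together in a single row, so that one row mutation covers them. The sketch below is a toy in-memory model, not the Bigtable client API; the `ToyTable` class, the row keys, and the column names are all hypothetical. It only illustrates why a two-row update can be left half-applied while a single-row write cannot.

```python
class ToyTable:
    """In-memory stand-in for a table whose writes are atomic per row only."""

    def __init__(self):
        self.rows = {}

    def mutate_row(self, row_key, cells, fail=False):
        # All cells of one row are applied together, or none are.
        if fail:
            raise RuntimeError(f"write to {row_key!r} failed")
        updated = dict(self.rows.get(row_key, {}))
        updated.update(cells)
        self.rows[row_key] = updated


def transfer_two_rows(table, fail_second):
    # Design to avoid: the two writes are independent, so the second can
    # fail after the first has already been applied.
    table.mutate_row("account#alice", {"balance": 50})
    table.mutate_row("account#bob", {"balance": 150}, fail=fail_second)


def transfer_one_row(table, fail):
    # Preferred design: both values live in the same row, so a single
    # row mutation changes them together.
    table.mutate_row("ledger#alice-bob", {"alice": 50, "bob": 150}, fail=fail)


def run_demo():
    table = ToyTable()
    table.mutate_row("account#alice", {"balance": 100})
    table.mutate_row("account#bob", {"balance": 100})
    table.mutate_row("ledger#alice-bob", {"alice": 100, "bob": 100})
    for attempt in (
        lambda: transfer_two_rows(table, fail_second=True),
        lambda: transfer_one_row(table, fail=True),
    ):
        try:
            attempt()
        except RuntimeError:
            pass  # the simulated failure is expected
    return table.rows
```

After `run_demo()`, the two-row design is left inconsistent (one account updated, the other not), while the single-row ledger is unchanged.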
NEW QUESTION # 168
You are designing a system that requires an ACID-compliant database. You must ensure that the system requires minimal human intervention in case of a failure. What should you do?
- A. Configure a Cloud SQL for PostgreSQL instance with high availability enabled.
- B. Configure a Cloud SQL for MySQL instance with point-in-time recovery enabled.
- C. Configure a BigQuery table with a multi-region configuration.
- D. Configure a Bigtable instance with more than one cluster.
Answer: A
Explanation:
Configuring a Cloud SQL for PostgreSQL instance with high availability enabled best meets both requirements. Cloud SQL for PostgreSQL is fully ACID-compliant, whereas Bigtable guarantees atomicity only for single-row operations. With high availability enabled, Cloud SQL automatically fails over to a standby instance if the primary instance goes down, so no manual failover is needed.
Point-in-time recovery for MySQL still requires someone to restore the data manually, and BigQuery is an analytics data warehouse rather than a transactional database. A Cloud SQL for PostgreSQL instance with high availability therefore satisfies the ACID requirement while keeping human intervention to a minimum.
NEW QUESTION # 169
What is the recommended action to do in order to switch between SSD and HDD storage for your Google Cloud Bigtable instance?
- A. the selection is final and you must resume using the same storage type
- B. create a third instance and sync the data from the two storage types via batch jobs
- C. run parallel instances where one is HDD and the other is SSD
- D. export the data from the existing instance and import the data into a new instance
Answer: D
Explanation:
When you create a Cloud Bigtable instance and cluster, your choice of SSD or HDD storage for the cluster is permanent. You cannot use the Google Cloud Platform Console to change the type of storage that is used for the cluster.
If you need to convert an existing HDD cluster to SSD, or vice-versa, you can export the data from the existing instance and import the data into a new instance. Alternatively, you can write a Cloud Dataflow or Hadoop MapReduce job that copies the data from one instance to another.
Reference: https://cloud.google.com/bigtable/docs/choosing-ssd-hdd-
NEW QUESTION # 170
You need to deploy additional dependencies to all nodes of a Cloud Dataproc cluster at startup using an existing initialization action. Company security policies require that Cloud Dataproc nodes do not have access to the Internet, so public initialization actions cannot fetch resources. What should you do?
- A. Copy all dependencies to a Cloud Storage bucket within your VPC security perimeter
- B. Use an SSH tunnel to give the Cloud Dataproc cluster access to the Internet
- C. Use Resource Manager to add the service account used by the Cloud Dataproc cluster to the Network User role
- D. Deploy the Cloud SQL Proxy on the Cloud Dataproc master
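Answer: A
Explanation:
Without Internet access, the nodes cannot download resources from public locations, so the dependencies must be staged in a Cloud Storage bucket that the cluster can reach privately. The Network User role only governs the use of Shared VPC subnets and does not make any dependencies available.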
NEW QUESTION # 171
As your organization expands its usage of GCP, many teams have started to create their own projects. Projects are further multiplied to accommodate different stages of deployments and target audiences. Each project requires unique access control configurations. The central IT team needs to have access to all projects. Furthermore, data from Cloud Storage buckets and BigQuery datasets must be shared for use in other projects in an ad hoc way. You want to simplify access control management by minimizing the number of policies. Which two steps should you take? Choose 2 answers.
- A. Use Cloud Deployment Manager to automate access provision.
- B. Create distinct groups for various teams, and specify groups in Cloud IAM policies.
- C. Introduce resource hierarchy to leverage access control policy inheritance.
- D. Only use service accounts when sharing data for Cloud Storage buckets and BigQuery datasets.
- E. For each Cloud Storage bucket or BigQuery dataset, decide which projects need access. Find all the active members who have access to these projects, and create a Cloud IAM policy to grant access to all these users.
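Answer: B,C
Explanation:
Granting roles to groups instead of individual users, and setting policies at the organization or folder level so that child projects inherit them, both reduce the number of policies that must be maintained.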
NEW QUESTION # 172
The Dataflow SDKs have been recently transitioned into which Apache service?
- A. Apache Spark
- B. Apache Beam
- C. Apache Kafka
- D. Apache Hadoop
Answer: B
Explanation:
Google donated the Dataflow SDKs to the Apache Software Foundation, where they became Apache Beam.
NEW QUESTION # 173
You have a petabyte of analytics data and need to design a storage and processing platform for it. You must be able to perform data warehouse-style analytics on the data in Google Cloud and expose the dataset as files for batch analysis tools in other cloud providers. What should you do?
- A. Store and process the entire dataset in BigQuery.
- B. Store the warm data as files in Cloud Storage, and store the active data in BigQuery. Keep this ratio as 80% warm and 20% active.
- C. Store and process the entire dataset in Cloud Bigtable.
- D. Store the full dataset in BigQuery, and store a compressed copy of the data in a Cloud Storage bucket.
Answer: D
NEW QUESTION # 174
Your company built a TensorFlow neural-network model with a large number of neurons and layers. The model fits well for the training data. However, when tested against new data, it performs poorly. What method can you employ to address this?
- A. Serialization
- B. Dimensionality Reduction
- C. Threading
- D. Dropout Methods
Answer: D
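Explanation:
A model that fits the training data well but performs poorly on new data is overfitting. Dropout is a regularization technique that randomly deactivates units during training, which reduces overfitting.
Reference: https://medium.com/mlreview/a-simple-deep-learning-model-for-stock-price-prediction-using-tensorflow-30505541d877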
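To make the dropout idea concrete, here is a minimal pure-Python sketch of "inverted" dropout on a list of activations. The function name and parameters are illustrative assumptions; in TensorFlow the same idea is available as the `tf.keras.layers.Dropout` layer, which this sketch does not use.

```python
import random


def dropout(activations, rate, training, rng):
    """Inverted dropout on a list of activations.

    During training each unit is zeroed with probability `rate` and the
    survivors are scaled by 1 / (1 - rate); at inference the input passes
    through unchanged.
    """
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # seeded only to make the demo repeatable
layer_output = [1.0] * 10
train_out = dropout(layer_output, rate=0.5, training=True, rng=rng)
infer_out = dropout(layer_output, rate=0.5, training=False, rng=rng)
```

Because a different random subset of units is silenced on every training step, the network cannot rely on any single unit, which tends to improve how well it generalizes to new data.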
NEW QUESTION # 175
Case Study: 2 - MJTelco
Company Overview
MJTelco is a startup that plans to build networks in rapidly growing, underserved markets around the world. The company has patents for innovative optical communications hardware. Based on these patents, they can create many reliable, high-speed backbone links with inexpensive hardware.
Company Background
Founded by experienced telecom executives, MJTelco uses technologies originally developed to overcome communications challenges in space. Fundamental to their operation, they need to create a distributed data infrastructure that drives real-time analysis and incorporates machine learning to continuously optimize their topologies. Because their hardware is inexpensive, they plan to overdeploy the network, allowing them to account for the impact of dynamic regional politics on location availability and cost. Their management and operations teams are situated all around the globe, creating a many-to-many relationship between data consumers and providers in their system. After careful consideration, they decided public cloud is the perfect environment to support their needs.
Solution Concept
MJTelco is running a successful proof-of-concept (PoC) project in its labs. They have two primary needs:
Scale and harden their PoC to support significantly more data flows generated when they ramp to more than 50,000 installations.
Refine their machine-learning cycles to verify and improve the dynamic models they use to control topology definition.
MJTelco will also use three separate operating environments - development/test, staging, and production - to meet the needs of running experiments, deploying new features, and serving production customers.
Business Requirements
Scale up their production environment with minimal cost, instantiating resources when and where needed in an unpredictable, distributed telecom user community.
Ensure security of their proprietary data to protect their leading-edge machine learning and analysis.
Provide reliable and timely access to data for analysis from distributed research workers.
Maintain isolated environments that support rapid iteration of their machine-learning models without affecting their customers.
Technical Requirements
Ensure secure and efficient transport and storage of telemetry data.
Rapidly scale instances to support between 10,000 and 100,000 data providers with multiple flows each.
Allow analysis and presentation against data tables tracking up to 2 years of data storing approximately 100m records/day.
Support rapid iteration of monitoring infrastructure focused on awareness of data pipeline problems both in telemetry flows and in production learning cycles.
CEO Statement
Our business model relies on our patents, analytics and dynamic machine learning. Our inexpensive hardware is organized to be highly reliable, which gives us cost advantages. We need to quickly stabilize our large distributed data pipelines to meet our reliability and capacity commitments.
CTO Statement
Our public cloud services must operate as advertised. We need resources that scale and keep our data secure. We also need environments in which our data scientists can carefully study and quickly adapt our models. Because we rely on automation to process our data, we also need our development and test environments to work as we iterate.
CFO Statement
The project is too large for us to maintain the hardware and software required for the data and analysis. Also, we cannot afford to staff an operations team to monitor so many data feeds, so we will rely on automation and infrastructure. Google Cloud's machine learning will allow our quantitative researchers to work on our high-value problems instead of problems with our data pipelines.
You create a new report for your large team in Google Data Studio 360. The report uses Google BigQuery as its data source. It is company policy to ensure employees can view only the data associated with their region, so you create and populate a table for each region. You need to enforce the regional access policy to the data.
Which two actions should you take? (Choose two.)
- A. Adjust the settings for each table to allow a related region-based security group view access.
- B. Ensure each table is included in a dataset for a region.
- C. Adjust the settings for each dataset to allow a related region-based security group view access.
- D. Adjust the settings for each view to allow a related region-based security group view access.
- E. Ensure all the tables are included in global dataset.
Answer: B,D
NEW QUESTION # 176
You are deploying 10,000 new Internet of Things devices to collect temperature data in your warehouses globally. You need to process, store and analyze these very large datasets in real time.
What should you do?
- A. Send the data to Google Cloud Pub/Sub, stream Cloud Pub/Sub to Google Cloud Dataflow, and store the data in Google BigQuery.
- B. Export logs in batch to Google Cloud Storage and then spin up a Google Cloud SQL instance, import the data from Cloud Storage, and run an analysis as needed.
- C. Send the data to Cloud Storage and then spin up an Apache Hadoop cluster as needed in Google Cloud Dataproc whenever analysis is required.
- D. Send the data to Google Cloud Datastore and then export to BigQuery.
Answer: A
Explanation:
Cloud Pub/Sub ingests the device messages in real time, Cloud Dataflow processes the stream, and BigQuery stores the results for analysis.
NEW QUESTION # 177
You are implementing workflow pipeline scheduling using open source-based tools and Google Kubernetes Engine (GKE). You want to use a Google managed service to simplify and automate the task. You also want to accommodate Shared VPC networking considerations. What should you do?
- A. Use Cloud Composer in a Shared VPC configuration. Place the Cloud Composer resources in the host project.
- B. Use Cloud Composer in a Shared VPC configuration. Place the Cloud Composer resources in the service project.
- C. Use Dataflow for your workflow pipelines. Use Cloud Run triggers for scheduling.
- D. Use Dataflow for your workflow pipelines. Use shell scripts to schedule workflows.
Answer: B
Explanation:
Shared VPC requires that you designate a host project to which networks and subnetworks belong and a service project, which is attached to the host project. When Cloud Composer participates in a Shared VPC, the Cloud Composer environment is in the service project.
Reference: https://cloud.google.com/composer/docs/how-to/managing/configuring-shared-vpc
NEW QUESTION # 178
Your company built a TensorFlow neural-network model with a large number of neurons and layers. The model fits well for the training data. However, when tested against new data, it performs poorly. What method can you employ to address this?
- A. Serialization
- B. Dimensionality Reduction
- C. Threading
- D. Dropout Methods
Answer: D
NEW QUESTION # 179
You are working on a niche product in the image recognition domain. Your team has developed a model that is dominated by custom C++ TensorFlow ops your team has implemented. These ops are used inside your main training loop and are performing bulky matrix multiplications. It currently takes up to several days to train a model. You want to decrease this time significantly and keep the cost low by using an accelerator on Google Cloud. What should you do?
- A. Use Cloud TPUs without any additional adjustment to your code.
- B. Use Cloud GPUs after implementing GPU kernel support for your custom ops.
- C. Stay on CPUs, and increase the size of the cluster you're training your model on.
- D. Use Cloud TPUs after implementing GPU kernel support for your custom ops.
Answer: B
Explanation:
Cloud TPUs are not suited to the following workloads: [...] Neural network workloads that contain custom TensorFlow operations written in C++. Specifically, custom operations in the body of the main training loop are not suitable for TPUs. That rules out both TPU options, so the remaining accelerator choice is Cloud GPUs once the custom ops have GPU kernels.
NEW QUESTION # 180
Which of the following is not possible using primitive roles?
- A. Give a user viewer access to BigQuery and owner access to Google Compute Engine instances.
- B. Give GroupA owner access and GroupB editor access for all datasets in a project.
- C. Give UserA owner access and UserB editor access for all datasets in a project.
- D. Give a user access to view all datasets in a project, but not run queries on them.
Answer: D
Explanation:
Primitive roles can be used to give owner, editor, or viewer access to a user or group, but they can't be used to separate data access permissions from job-running permissions.
Reference: https://cloud.google.com/bigquery/docs/access-control#primitive_iam_roles
NEW QUESTION # 181
Flowlogistic is rolling out their real-time inventory tracking system. The tracking devices will all send package-tracking messages, which will now go to a single Google Cloud Pub/Sub topic instead of the Apache Kafka cluster. A subscriber application will then process the messages for real-time reporting and store them in Google BigQuery for historical analysis. You want to ensure the package data can be analyzed over time.
Which approach should you take?
- A. Use the automatically generated timestamp from Cloud Pub/Sub to order the data.
- B. Attach the timestamp on each message in the Cloud Pub/Sub subscriber application as they are received.
- C. Use the NOW() function in BigQuery to record the event's time.
- D. Attach the timestamp and Package ID on the outbound message from each publisher device as they are sent to Cloud Pub/Sub.
Answer: D
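The approach in option D can be sketched as follows. This is a stdlib-only illustration, not Pub/Sub client code: the JSON payload layout and the `package_id` and `event_time` field names are assumptions made for the example. It shows why a timestamp set on the device lets the subscriber order events correctly even when messages arrive out of order.

```python
import json
from datetime import datetime, timedelta, timezone


def build_message(package_id, event_time):
    """Publisher side: embed the package ID and the device's event time."""
    payload = {"package_id": package_id, "event_time": event_time.isoformat()}
    return json.dumps(payload).encode("utf-8")


def order_by_event_time(raw_messages):
    """Subscriber side: sort by the embedded event time, not arrival order."""
    decoded = [json.loads(m.decode("utf-8")) for m in raw_messages]
    return sorted(decoded, key=lambda m: datetime.fromisoformat(m["event_time"]))


t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
# Messages arrive out of order: the later scan is delivered first.
arrived = [
    build_message("PKG-2", t0 + timedelta(minutes=5)),
    build_message("PKG-1", t0),
]
ordered = order_by_event_time(arrived)
```

A timestamp assigned on receipt, whether by Pub/Sub, the subscriber, or BigQuery, would record delivery time instead and could misorder these two events.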
NEW QUESTION # 182
How would you query specific partitions in a BigQuery table?
- A. Use the _PARTITIONTIME pseudo-column in the WHERE clause
- B. Use DATE BETWEEN in the WHERE clause
- C. Use the DAY column in the WHERE clause
- D. Use the EXTRACT(DAY) clause
Answer: A
Explanation:
Partitioned tables include a pseudo column named _PARTITIONTIME that contains a date-based timestamp for data loaded into the table. To limit a query to particular partitions (such as Jan 1st and 2nd of 2017), use a clause similar to this:
WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2017-01-01') AND TIMESTAMP('2017-01-02')
Reference: https://cloud.google.com/bigquery/docs/partitioned-tables#the_partitiontime_pseudo_column
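As a small illustration of the clause above, the following sketch builds the same `_PARTITIONTIME` filter from Python `date` objects. The helper name `partition_filter` and the table name `mydataset.mytable` are placeholders invented for this example, and the code only constructs the query text; it does not call BigQuery.

```python
from datetime import date


def partition_filter(start, end):
    """Build a WHERE clause restricting a query to a range of daily partitions."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    return (
        f"WHERE _PARTITIONTIME BETWEEN TIMESTAMP('{start.isoformat()}') "
        f"AND TIMESTAMP('{end.isoformat()}')"
    )


# `mydataset.mytable` is a placeholder table name.
query = "SELECT * FROM `mydataset.mytable` " + partition_filter(
    date(2017, 1, 1), date(2017, 1, 2)
)
```

Filtering on the pseudo-column lets BigQuery skip partitions outside the range, so less data is scanned.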
NEW QUESTION # 183
When a Cloud Bigtable node fails, ____ is lost.
- A. no data
- B. all data
- C. the time dimension
- D. the last transaction
Answer: A
Explanation:
A Cloud Bigtable table is sharded into blocks of contiguous rows, called tablets, to help balance the workload of queries. Tablets are stored on Colossus, Google's file system, in SSTable format. Each tablet is associated with a specific Cloud Bigtable node.
Data is never stored in Cloud Bigtable nodes themselves; each node has pointers to a set of tablets that are stored on Colossus. As a result:
Rebalancing tablets from one node to another is very fast, because the actual data is not copied. Cloud Bigtable simply updates the pointers for each node.
Recovery from the failure of a Cloud Bigtable node is very fast, because only metadata needs to be migrated to the replacement node.
When a Cloud Bigtable node fails, no data is lost.
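The separation of nodes from storage can be pictured with a toy model. The sketch below is not Bigtable code; `ToyCluster`, its round-robin assignment, and the tablet names are assumptions made purely to show that a node failure only moves pointers while the stored rows stay where they are.

```python
class ToyCluster:
    """Toy model: nodes hold only pointers to tablets kept in shared storage."""

    def __init__(self, node_names, tablets):
        # tablet id -> rows; stands in for the shared file system
        self.storage = dict(tablets)
        self.pointers = {name: [] for name in node_names}
        for i, tablet_id in enumerate(sorted(self.storage)):
            node = node_names[i % len(node_names)]
            self.pointers[node].append(tablet_id)

    def fail_node(self, name):
        # Only metadata moves: the failed node's tablet ids are reassigned.
        orphaned = self.pointers.pop(name)
        survivors = sorted(self.pointers)
        for i, tablet_id in enumerate(orphaned):
            self.pointers[survivors[i % len(survivors)]].append(tablet_id)

    def read(self, tablet_id):
        # Find the node that currently serves the tablet, then read storage.
        for node, owned in self.pointers.items():
            if tablet_id in owned:
                return node, self.storage[tablet_id]
        raise KeyError(tablet_id)


cluster = ToyCluster(
    ["node-a", "node-b", "node-c"],
    {"t1": ["row1", "row2"], "t2": ["row3"], "t3": ["row4"], "t4": ["row5"]},
)
before = {t: cluster.read(t)[1] for t in cluster.storage}
cluster.fail_node("node-a")
after = {t: cluster.read(t)[1] for t in cluster.storage}
```

Every tablet is still readable after `node-a` fails, and `before` equals `after`, because only the pointer lists changed.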
NEW QUESTION # 184
You're training a model to predict housing prices based on an available dataset with real estate properties. Your plan is to train a fully connected neural net, and you've discovered that the dataset contains latitude and longitude of the property. Real estate professionals have told you that the location of the property is highly influential on price, so you'd like to engineer a feature that incorporates this physical dependency.
What should you do?
- A. Create a numeric column from a feature cross of latitude and longitude.
- B. Provide latitude and longitude as input vectors to your neural net.
- C. Create a feature cross of latitude and longitude, bucketize at the minute level and use L1 regularization during optimization.
- D. Create a feature cross of latitude and longitude, bucketize it at the minute level and use L2 regularization during optimization.
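Answer: C
Explanation:
Crossing bucketized latitude and longitude lets the model learn a separate weight for each grid cell, and L1 regularization drives the weights of uninformative cells to zero, which keeps the sparse crossed feature manageable.
Reference: https://cloud.google.com/bigquery/docs/gis-data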
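The bucketized feature cross mentioned in the options can be sketched in a few lines. This is a stdlib-only illustration under stated assumptions: the function names are hypothetical, "minute level" is taken to mean arc-minutes (1/60 of a degree), and the crossed value is encoded as a plain string key that a model would then treat as a categorical feature.

```python
import math


def minute_bucket(degrees):
    """Bucketize a coordinate at arc-minute resolution (1 degree = 60 minutes)."""
    return math.floor(degrees * 60)


def lat_lon_cross(latitude, longitude):
    """Cross the two bucketized coordinates into one categorical feature."""
    return f"{minute_bucket(latitude)}x{minute_bucket(longitude)}"


# Two nearby properties fall into the same cell; a distant one does not.
home_a = lat_lon_cross(37.4219, -122.0841)
home_b = lat_lon_cross(37.4225, -122.0845)
home_c = lat_lon_cross(40.7128, -74.0060)
```

Neither coordinate alone identifies a neighborhood, but the crossed cell does, which is why the cross captures location better than feeding raw latitude and longitude separately.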
NEW QUESTION # 185
You work for an economic consulting firm that helps companies identify economic trends as they happen. As part of your analysis, you use Google BigQuery to correlate customer data with the average prices of the 100 most common goods sold, including bread, gasoline, milk, and others. The average prices of these goods are updated every 30 minutes. You want to make sure this data stays up to date so you can combine it with other data in BigQuery as cheaply as possible. What should you do?
- A. Store the data in Google Cloud Datastore. Use Google Cloud Dataflow to query BigQuery and combine the data programmatically with the data stored in Cloud Datastore
- B. Load the data every 30 minutes into a new partitioned table in BigQuery.
- C. Store the data in a file in a regional Google Cloud Storage bucket. Use Cloud Dataflow to query BigQuery and combine the data programmatically with the data stored in Google Cloud Storage.
- D. Store and update the data in a regional Google Cloud Storage bucket and create a federated data source in BigQuery
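Answer: D
Explanation:
A federated (external) data source lets BigQuery query the file in Cloud Storage directly, so the small, frequently replaced price file never has to be reloaded and can still be joined with other BigQuery tables.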
NEW QUESTION # 186
Which of the following is NOT one of the three main types of triggers that Dataflow supports?
- A. Trigger based on time
- B. Trigger based on element count
- C. Trigger that is a combination of other triggers
- D. Trigger based on element size in bytes
Answer: D
Explanation:
There are three major kinds of triggers that Dataflow supports:
1. Time-based triggers.
2. Data-driven triggers. You can set a trigger to emit results from a window when that window has received a certain number of data elements.
3. Composite triggers. These triggers combine multiple time-based or data-driven triggers in some logical way.
Reference: https://cloud.google.com/dataflow/model/triggers
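The three kinds of triggers can be mimicked with a toy model. The sketch below is plain Python, not the Dataflow or Beam API, and all names in it are hypothetical; it also simplifies by checking the time-based trigger only when an element arrives. In the Apache Beam Python SDK, the real counterparts live in `apache_beam.transforms.trigger` (for example `AfterCount`, `AfterProcessingTime`, and `AfterAny`).

```python
def element_count_trigger(n):
    """Data-driven trigger: fire once the pane holds at least n elements."""
    return lambda pane, elapsed: len(pane) >= n


def time_trigger(seconds):
    """Time-based trigger: fire once `seconds` have passed since the pane opened."""
    return lambda pane, elapsed: elapsed >= seconds


def any_of(*triggers):
    """Composite trigger: fire when any of the wrapped triggers fires."""
    return lambda pane, elapsed: any(t(pane, elapsed) for t in triggers)


def run_window(events, trigger):
    """Feed (timestamp, value) events into one window and collect emitted panes."""
    panes, pane, opened = [], [], None
    for ts, value in events:
        if opened is None:
            opened = ts  # the pane opens with its first element
        pane.append(value)
        if trigger(pane, ts - opened):
            panes.append(pane)
            pane, opened = [], None
    if pane:
        panes.append(pane)  # flush what is left when the window closes
    return panes


events = [(0, "a"), (1, "b"), (2, "c"), (10, "d"), (25, "e"), (26, "f")]
panes = run_window(events, any_of(element_count_trigger(3), time_trigger(10)))
```

Here the first pane is emitted by the element-count trigger and the second by the time-based trigger, while a trigger based on element size in bytes has no counterpart among the three kinds.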
NEW QUESTION # 187
An aerospace company uses a proprietary data format to store its flight data. You need to connect this new data source to BigQuery and stream the data into BigQuery. You want to efficiently import the data into BigQuery while consuming as few resources as possible. What should you do?
- A. Use a standard Dataflow pipeline to store the raw data in BigQuery and then transform the format later when the data is used.
- B. Use an Apache Beam custom connector to write a Dataflow pipeline that streams the data into BigQuery in Avro format
- C. Write a shell script that triggers a Cloud Function that performs periodic ETL batch jobs on the new data source
- D. Use Apache Hive to write a Dataproc job that streams the data into BigQuery in CSV format
Answer: B
NEW QUESTION # 188
......
Google Professional Data Engineer Practice Test Questions, Google Professional Data Engineer Exam Practice Test Questions
The Google Professional Data Engineer certification is designed to evaluate the candidates’ skills in designing data processing systems and ensuring solution quality. It is also created to measure their competence in building and operationalizing data processing systems and operationalizing ML models. The potential applicants must complete a single exam to get certified.