Parallel Computing Toolbox™ User's Guide R2015a
4 Interactive Parallel Computation with pmode

8. Until this point in the example, the variant arrays are independent, other than having the same name.

Run Communicating Jobs Interactively Using pmode

P>> combined = gather(whole)

Notice, however, that this gathers the entire array into the wor

14. If you require distribution along a different dimension, you can use the redistribute function. In th

15. Exit pmode and return to the regular MATLAB desktop.

P>> pmode exit

Parallel Command Window

When you start pmode on your local client machine with the command

pmode start

You have several options for how to arrange the tiles showing your worker outputs. Usually, you will choose an arrangement

P>> distobj = codistributor('1d',1);
P>> I = redistribute(I, distobj)

When you re

You can choose to view the worker outputs by tabs: select tabbed display, then select a tab and the labs shown in that tab.

Multiple labs in same tab

Running pmode Interactive Jobs on a Cluster

When you run pmode on a cluster of workers, you are running
Plotting Distributed Data Using pmode

Because the workers running a job in pmode are MATLAB sessions w

This is not the only way to plot codistributed data. One alternative method, especially useful when running n

pmode Limitations and Unexpected Results

Using Graphics in pmode

Displaying a GUI

The workers that run t

pmode Troubleshooting

In this section...
“Connectivity Testing”
“Hostname Resolution”
“Socket Connecti

5 Math with Codistributed Arrays

This chapter describes the distribution or partition of data across several workers, and the functionality provided for

Nondistributed Versus Distributed Arrays

In this section...
“Introduction”
“Nondistributed Arrays”

    WORKER 1 | WORKER 2 | WORKER 3 | WORKER 4
     8 1 6   |  8

Codistributed Arrays

With replicated and variant arrays, the full content of the array is stored in the workspace of

Working with Codistributed Arrays

In this section...
“How MATLAB Software Distributes Arrays”
“Creating
end

Lab 1: This lab stores D(:,1:250).
Lab 2: This lab stores D(:,251:500).
Lab 3: This lab stores D(:,501

number is not evenly divisible by the number of workers, MATLAB partitions the array as evenly as possible.

MATLAB

spmd, A = [11:18; 21:28; 31:38; 41:48], end

A =
    11    12    13    14    15    16    17    18
    21    22    23

(local part) on each worker first, and then combine them into a single array that is distributed across the worker

Constructor Functions

The codistributed constructor functions are listed here. Use the codist argument (created by t
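The constructor list itself is truncated above, but the usage pattern can be sketched as follows (a minimal sketch assuming a four-worker pool; the sizes are illustrative):

```matlab
% Build codistributed arrays directly with a constructor, so each worker
% allocates only its own local part rather than the whole array.
spmd
    codist = codistributor1d(2);   % distribute along dimension 2 (columns)
    Z = zeros(8, 16, codist);      % 8-by-16 codistributed zeros
    R = rand(8, 16, codist);       % 8-by-16 codistributed random values
    size(getLocalPart(R))          % each of 4 workers holds an 8-by-4 piece
end
```

Because only the local part is ever created on each worker, this avoids first building the full array in one workspace and then distributing it.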
    size(D)
    L = getLocalPart(D);
    size(L)
end

returns on each worker:

3    80
3    20

Each worker recognizes

where D is any MATLAB array.

Determining the Dimension of Distribution

The codistributor object determines how an ar

Construct an 8-by-16 codistributed array D of random values distributed by columns on four workers:

spmd
    D = ra

    11    12    13 |    14    15    16 |    17    18 |    19    20
    21    22    23 |    24    25    26 |    27    28 |    29    30

to the end of the entire array; that is, the last subscript of the final segment. The length of each segment is a
Element is in position 25000 on worker 2.

Notice that if you use a pool of a different size, the element ends up in a

Now you can use this codistributor object to distribute the original matrix:

P>> AA = codistributed(A, DIST

     1     9    17    25    33    41    49    57
     2    10    18    26    34    42    50    58
     3    11    19    27    35

The following points are worth noting:

• '2dbc' distribution might not offer any performance enhancemen

Looping Over a Distributed Range (for-drange)

In this section...
“Parallelizing a for-Loop”
“Codistribut

    plot(1:numDataSets, res);
    print -dtiff -r300 fig.tiff;
    save \\central\myResults\today.mat

D = eye(8, 8, codistributor())
E = zeros(8, 8, codistributor())

By default, these arrays are distributed by columns;

To loop over all elements in the array, you can use for-drange on the dimension of distribution, and
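The looping pattern just described can be sketched as follows (a minimal example; the array and the assignment inside the loop are illustrative):

```matlab
% Loop over the distributed dimension with for-drange: each worker
% iterates only over the columns stored in its own local part.
spmd
    D = zeros(8, 8, codistributor1d(2));   % distributed by columns
    for j = drange(1:size(D, 2))
        D(:, j) = j;                       % touches only local columns
    end
end
```

Inside a for-drange loop a worker may access only its local portion of the codistributed array, which is what makes the loop free of inter-worker communication.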
MATLAB Functions on Distributed and Codistributed Arrays

Many functions in MATLAB software are enhanced or overload

atand, atanh, besselh, besseli, besselj, besselk, bessely, beta, betainc, betaincinv, betaln, bitand, bitor, bitxor
6 Programming Overview

This chapter provides information you need for programming with Parallel Computing Toolbox software. Further details of evaluating

How Parallel Computing Products Run a Job

In this section...
“Overview”
“Toolbox and Server Components”

(Diagram: the MATLAB client, with Parallel Computing Toolbox, submits work through a scheduler to MATLAB workers running MATLAB Distributed Computing Server.)

A MATLAB Distributed Computing Server software setup usually includes many workers that can all execute tasks simultaneously,

(Diagram: a cluster with multiple clients and schedulers — several clients submit work through Scheduler 1 and Scheduler 2 to the same set of workers.)

• Is the handling of parallel computing jobs the only cluster scheduling management you need?

The MJS is designed specifically

same platform. The cluster can also be composed of both 32-bit and 64-bit machines, so long as your data

(Diagram: job life cycle — the client calls createJob and submit; jobs move through the Pending, Queued, Running, and Finished stages; results are retrieved with fetchOutputs.)

Job Stage   Description
Failed      When using a third-party scheduler, a job might fail if the scheduler encounte
1 Getting Started

• “Parallel Computing Toolbox Product Description”
• “Parallel Computing with MathWorks Products”
• “Key Problem

Create Simple Independent Jobs

Program a Job on a Local Cluster

In some situations, you might need to define the individual ta

results =
    [2]    [4]    [6]

6. Delete the job. When you have the results, you can permanently remove the job from t
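The numbered steps above can be sketched end to end as follows (a minimal sketch using the local profile; the task function and arguments are illustrative):

```matlab
% Complete local-cluster workflow: create a job, add tasks, run it,
% collect the results, and clean up.
c = parcluster('local');                       % cluster object
job = createJob(c);                            % independent job
createTask(job, @(x) 2*x, 1, {{1} {2} {3}});   % three tasks, one output each
submit(job);
wait(job);                                     % block until the job finishes
results = fetchOutputs(job);                   % 3-by-1 cell: {2; 4; 6}
delete(job);                                   % remove the job permanently
```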
Parallel Preferences

You can access parallel preferences in the general preferences for MATLAB. To open the Preferences dialog

• Automatically create a parallel pool — This setting causes a pool to automatically start if one is not already running at th

Clusters and Cluster Profiles

In this section...
“Cluster Profile Manager”
“Discover Clusters”
“Import

This opens the Discover Clusters dialog box, where you select the location of your clusters. As clusters are discover

discovery of MJS clusters by identifying specific hosts rather than broadcasting across your network.

A DNS service (SRV) reco

The imported profile appears in your Cluster Profile Manager list. Note that the list contains the profile name, whic

The following example provides instructions on how to create and modify profiles using the Cluster Profile Manager.

Suppose yo

This creates and displays a new profile, called MJSProfile1.

2. Double-click the new profile name in the listing, and m
Parallel Computing Toolbox Product Description

Perform parallel computations on multicore computers, GPUs, and computer clusters

You might want to edit other properties depending on your particular network and cluster situation.

5. Click Done to save the pr

5. Scroll down to the Workers section, and for the Range of number of workers, clear the [4 4] and leave the field blank.

You can see examples of profiles for different kinds of supported schedulers in the MATLAB Distributed Computing Server insta

Note Validation will fail if you already have a parallel pool open.

When the tests are complete, you can click Show D

• The Cluster Profile Manager indicates which is the default profile. You can select any profile in the list, then click Set

Apply Callbacks to MJS Jobs and Tasks

The MATLAB job scheduler (MJS) has the ability to trigger callbacks in

    disp(['Finished task: ' num2str(task.ID)])

Create a job and set its QueuedFcn, RunningFcn, and FinishedFcn prope

Create and save a callback function clientTaskCompleted.m on the path of the MATLAB client, with the followin
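The listing of clientTaskCompleted.m is cut off above; a minimal sketch of such a callback (the (task, eventdata) signature is the standard callback form, and the body reuses the disp line shown earlier in this section) might be:

```matlab
function clientTaskCompleted(task, eventdata)
% Callback that runs in the client session each time a task finishes.
% 'task' is the finished task object; 'eventdata' describes the event.
disp(['Finished task: ' num2str(task.ID)])
end
```

You would then assign a handle to this function to the FinishedFcn property of each task (or of the job) so the MJS triggers it when the task completes.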
guarantee to the order in which the tasks finish, so the plots might overwrite each other. Likewise, the FinishedFcn callback f

Job Monitor

In this section...
“Job Monitor GUI”
“Manage Jobs Using the Job Monitor”
“Identify Task Errors Using
Parallel Computing with MathWorks Products

In addition to Parallel Computing Toolbox providing a local cl

Typical Use Cases

The Job Monitor lets you accomplish many different goals pertaining to job tracking and queue management. Us

If you save this script in a file named invert_me.m, you can try to run the script as a batch job on the default cluster:

batch('in

Programming Tips

In this section...
“Program Development Guidelines”
“Current Working Directory of a MATLAB Worker

3. Modify your code for division. Decide how you want your code divided. For an independent job, determine how best to divide it int

C:\TEMP\MDCE\Checkpoint\nodeA52_worker22_mlworker_log\work

Writing to Files from Workers

When multiple workers attempt to writ

clears all Parallel Computing Toolbox objects from the current MATLAB session. They still remain in the MJS. For information on re

initialize a task is far greater than the actual time it takes for the worker to evaluate the task function.

Control Random Number Streams

In this section...
“Different Workers”
“Client and Workers”
“Cli

delete(p)

Note Because rng('shuffle') seeds the random number generator based on the current time, you should not us

For identical results, you can set the client and worker to use the same generator and seed. Here the file randScript
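The randScript listing is cut off above; the idea it describes can be sketched as follows (a hedged example assuming an open parallel pool; the generator name and seed are illustrative):

```matlab
% Make the client and every worker produce identical random sequences
% by installing the same generator with the same seed on both sides.
s = RandStream('CombRecursive', 'Seed', 1234);
RandStream.setGlobalStream(s);
Rc = rand(1, 4);            % sequence generated on the client
spmd
    s = RandStream('CombRecursive', 'Seed', 1234);
    RandStream.setGlobalStream(s);
    Rw = rand(1, 4);        % matches Rc on every worker
end
```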
Key Problems Addressed by Parallel Computing

In this section...
“Run Parallel for-Loops (parfor)”
“Execute Batch Jobs in

Keyword                Generator    Multiple Stream and    Approximate Period
                                    Substream Support      in Full Precision
'CombRecursive' or 'm

Rg =
   -0.0108   -0.7577   -0.8159    0.4742

Worker CPU and Worker GPU

Code running on a worker’s CPU uses the same ge

Profiling Parallel Code

In this section...
“Introduction”
“Collecting Parallel Profile Data”
“Viewing

• Amount of data transferred between each worker
• Amount of time each worker spends waiting for communications

With the par

The function summary report displays the data for each function executed on a worker in sortable columns with the following h

Column Header     Description
Total Time Plot   Bar graph showing relative size of Self Time, Self Comm Waiting Time, and Total Ti

communication. Manual Comparison Selection allows you to compare data from specific workers or workers that meet certain criteria.

The next figure shows a summary report for the workers that spend the most versus least time for each function. A Manual Co

Click on a function name in the summary listing of a comparison to get a detailed comparison. The detailed comparison for cod

To see plots of communication data, select Plot All PerLab Communication in the Show Figures menu. The top portion of the p
the same machine as the client, you might see significant performance improvement on a multicore/multip

Plots like those in the previous two figures can help you determine the best way to balance work among your workers, perhaps

Benchmarking Performance

HPC Challenge Benchmarks

Several MATLAB files are available to illustrate HPC Challenge benchmark p

Troubleshooting and Debugging

In this section...
“Object Data Size Limitations”
“File Access and Permissions”

Error using ==> feval
Undefined command/function 'function_name'.

The worker that ran the task did

• MATLAB could not read/write the job input/output files in the scheduler’s job storage location. The storage location might

how to start it and how to test connectivity, see “Start Admin Center” and “Test Connectivity” in the MATLAB Distribu

Could not send Job3.common.mat for job 3:
One of your shell's init files contains a command that is writing t

Run mapreduce on a Parallel Pool

In this section...
“Start Parallel Pool”
“Compare Parallel mapreduce”

Create two MapReducer objects for specifying the different execution environments for mapreduce.

inMatlab = mapreducer(0);
inPo

readall(meanDelay)

Key                  Value
__________________   ________
'MeanArrivalDelay'
Introduction to Parallel Solutions

In this section...
“Interactively Run a Loop in Parallel”
“Run a Batch Job”

Related Examples
• “Getting Started with MapReduce”
• “Run mapreduce on a Hadoop Cluster”

More About
• “MapReduce”
• “Datastore”

Run mapreduce on a Hadoop Cluster

In this section...
“Cluster Preparation”
“Output Format and Order”

outputFolder = '/home/user/logs/hadooplog';

Note The specified outputFolder must not already exist. The mapreduce o

meanDelay =
  KeyValueDatastore with properties:
      Files: {
             ' .../tmp/alafleur/tpc00621b1_4

Partition a Datastore in Parallel

Partitioning a datastore in parallel, with a portion of the datastore on each worker in a pa

[total,count] = sumAndCountArrivalDelay(ds)
sumtime = toc
mean = total/count

total =
    17211680
count =
    24173

total = 0;
count = 0;
parfor ii = 1:N
    % Get partition ii of the datastore.
    subds = partition(ds,N,ii);
    [localTota

Rather than let the software calculate the number of partitions, you can explicitly set this value, so that the d

mean =
    7.1201

delete(p);
Parallel pool using the 'local' profile is shutting down.

You might get some idea of mode

7 Program Independent Jobs

• “Program Independent Jobs”
• “Program Independent Jobs on a Local Cluster”
• “Program Independent Job
(Diagram: a parfor loop in the MATLAB client distributes iterations across MATLAB workers.)

Because the iterations run in parallel in other MATLAB sessions, each iteration

Program Independent Jobs

An independent job is one whose tasks do not directly communicate with each other; that is, the ta

Program Independent Jobs on a Local Cluster

In this section...
“Create and Run Jobs with a Local Cluster”

Create a Cluster Object

You use the parcluster function to create an object in your local MATLAB session representing the l

c

Local Cluster
    Associated Jobs
        Number Pending: 1
        Number Queued:

Fetch the Job’s Results

The results of each task’s evaluation are stored in the task object’s OutputArguments property as a

evaluation to the local cluster, the scheduler starts a MATLAB worker for each task in the job, but only

Program Independent Jobs for a Supported Scheduler

In this section...
“Create and Run Jobs”
“Manage Objects in t

where MATLAB is accessed and many other cluster properties. The exact properties are determined b

    Modified: false
        Host: node345
    Username: mylogi

    Number Running: 0
    Number Finished: 0
    Task ID of Errors: []

Note that the job’s St
Run a Batch Job

To offload work from your MATLAB session to run in the background in another session, you can use the batch command.

Alternatively, you can create the five tasks with one call to createTask by providing a cell array of five cell arrays de

wait(job1)
results = fetchOutputs(job1);

Display the results from each task.

results{1:5}
0.950

Computing Server software or other cluster resources remain in place. When the client session ends, only the local refere

Remove Objects Permanently

Jobs in the cluster continue to exist even after they are finished, a

Share Code with the Workers

Because the tasks of a job are evaluated on different machines, each machine must have access

c = parcluster(); % Use default
job1 = createJob(c);
ap = {'/central/funcs','/dept1/funcs', ...

more than one task for the job. (Note: Do not confuse this property with the UserData property on any objects in the MATL

manually attached files to determine which code files are necessary for the workers, and to automatically send those fi

• taskStartup.m automatically executes on a worker each time the worker begins evaluation of a task.
• poolStartup.m autom

Program Independent Jobs for a Generic Scheduler

In this section...
“Overview”
“MATLAB
batch runs your code on a local worker or a cluster worker, but does not require a parallel pool.

You can use batc

(Diagram: on the client node, the MATLAB client’s submit function sets environment variables that pass through the scheduler to the worker node, where the decode function reads them for the MATLAB worker.)

testlocation = 'Plant30'
c.IndependentSubmitFcn = {@mysubmitfunc, time_limit, testlocation

exist before the worker starts. For more information on the decode function, see “MATLAB Worker Decode Function”

Define Scheduler Command to Run MATLAB Workers

The submit function must define the command necessa

This example function uses only the three default arguments. You can have additional arguments passed into your submit fu

derived from the values of your object properties. This command is inside the for-loop so that yo

'parallel.cluster.generic.independentDecodeFcn'. The remainder of this section is useful only if you use names

With those values from the environment variables, the decode function must set the appropriate pro

c = parcluster('MyGenericProfile')

If your cluster uses a shared file system for workers to access job and task

2. Create a Job

You create a job with the createJob function, which creates a job object in the cli
(Diagram: parfor and batch both offload work from the MATLAB client to MATLAB workers.)

5. To view the results:

wait(job)
load(job,'A')
plot(A)

The results look the same as be

T = createTask(job1, @rand, 1, {{3,3} {3,3} {3,3} {3,3} {3,3}});

In this case, T is a 5-by-1 matrix of task objects.

4. Su

    0.6038    0.0153    0.9318
    0.2722    0.7468    0.4660
    0.1988    0.4451    0.4186
    0.8

Filename          Description
deleteJobFcn.m    Script to delete a job from the scheduler
extractJobId.m    Script to get the job’s ID fr

for ii = 1:props.NumberOfTasks
    define scheduler command per task
end
submit job to scheduler

command to scheduler canceling job job_id

In a similar way, you can define what to do for deleting a job, and what to

delete(j1)

Get State Information About a Job or Task

When using a third-party scheduler, it is poss

The following step occurs in your network:

1. For each task, the scheduler starts a MATLAB worker session on a cluster node
8 Program Communicating Jobs

• “Program Communicating Jobs”
• “Program Communicating Jobs for a Supported Scheduler”
• “Program Co

Program Communicating Jobs

Communicating jobs are those in which the workers can communicate with each other during the e

Some of the details of a communicating job and its tasks might depend on the type of scheduler you are using. The followi

Run Script as Batch Job from the Current Folder Browser

From the Current Folder browser, you can run a MATLAB sc

Program Communicating Jobs for a Supported Scheduler

In this section...
“Schedulers and Conditions”
“Code the

The function for this example is shown below.

function total_sum = colsum
if labindex == 1
    %
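The listing breaks off above; a hedged reconstruction of the rest of colsum, using the standard labBroadcast/gplus pattern for this kind of example (details may differ from the original file):

```matlab
function total_sum = colsum
if labindex == 1
    % Worker 1 broadcasts a magic square to the other workers
    A = labBroadcast(1, magic(numlabs));
else
    % The other workers receive the broadcast
    A = labBroadcast(1);
end
% Each worker sums the column matching its labindex...
column_sum = sum(A(:, labindex));
% ...and gplus adds the partial sums across all workers
total_sum = gplus(column_sum);
```

Every worker runs the same function; labindex distinguishes the sender from the receivers, and gplus performs the global reduction.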
When your cluster object is defined, you create the job object with the createCommunicatingJob function. The job Type prope

Program Communicating Jobs for a Generic Scheduler

In this section...
“Introduction”
“C

3. Use createCommunicatingJob to create a communicating job object for your cluster.
4. Create a task, run the job, and retri

Filename            Description
getSubmitString.m   Script to get the submission string for the scheduler

These

Further Notes on Communicating Jobs

In this section...
“Number of Tasks in a Communicating Job”
“Avoid Deadl

In another example, suppose you want to transfer data from every worker to the next worker on the right (define

9 GPU Computing

• “GPU Capabilities and Performance”
• “Establish Arrays on a GPU”
• “Run Built-In Functions on a GPU”
Distribute Arrays and Run SPMD

Distributed Arrays

The workers in a parallel pool communicate with each other, so you can distribute

GPU Capabilities and Performance

In this section...
“Capabilities”
“Performance Benchmarking”

Capabilities

Establish Arrays on a GPU

In this section...
“Transfer Arrays Between Workspace and GPU”
“Create GPU Arrays Dire

Transfer Array of a Specified Precision

Create a matrix of double-precision random values in MATLAB, and then transfer the matrix as s

For example, to see the help on the colon constructor, type

help gpuArray/colon

Example: Construct an Identity Matrix on th
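The example text is cut off above; constructing arrays directly on the GPU, rather than transferring them, can be sketched as follows (sizes are illustrative):

```matlab
% Build arrays on the GPU with the 'gpuArray' option of the static
% constructors, avoiding a host-to-device transfer.
G  = eye(1024, 'gpuArray');             % identity matrix on the GPU
Gs = eye(1024, 'single', 'gpuArray');   % single-precision variant
classUnderlying(Gs)                     % reports the element type
```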
parallel.gpu.RandStream

These functions perform in the same way as rng and RandStream in MATLAB, but with certain limitations on the G

For more information about generating random numbers on a GPU, and a comparison between GPU and CPU generation, see “Contr

Run Built-In Functions on a GPU

In this section...
“MATLAB Functions with gpuArray Arguments”
“Example: Functions with gpuA

atan2d, atand, atanh, besselj, bessely, beta, betainc, betaincinv, betaln, bitand, bitcmp, bitget, bitor, bitset, bitshift, bitxor, blkdiag, bsxfun, ca

Ga = rand(1000,'single','gpuArray');
Gfft = fft(Ga);
Gb = (real(Gfft) + Ga) * 6;
G = gather(Gb);

The whos command

   (1,2)   1
   (2,5)   1

g = gpuArray(s);   % g is a sparse gpuArray
gt = transpose(g); % gt is a sparse g
The line above retrieves the data from worker 3 to assign the value of X. The following code sends data to worke

Function     Input Range for Real Output
log(x)       x >= 0
log1p(x)     x >= -1
log10(x)     x >= 0
log2(x)      x >= 0
power(x,y)   x >= 0
reall

Run Element-wise MATLAB Code on GPU

In this section...
“MATLAB Code vs. gpuArray Objects”
“Run Your

Example: Run Your MATLAB Code

In this example, a small function applies correction data to an array of measurement data. The function
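The function listing is cut off above; the pattern this section describes can be sketched as follows (the function and variable names are assumptions for illustration, not the manual’s own example):

```matlab
% Apply an element-wise correction function to gpuArray data through
% arrayfun, which compiles the operation into a single GPU kernel.
meas = gpuArray(rand(1000));                    % measurement data on the GPU
corr = gpuArray(rand(1000));                    % correction data on the GPU
applyCorrection = @(m, c) (m - c) ./ (1 + c);   % element-wise operation
result = arrayfun(applyCorrection, meas, corr); % runs entirely on the GPU
```

Because arrayfun fuses the whole expression into one kernel, this usually outperforms evaluating the same expression as a sequence of separate gpuArray operations.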
Supported MATLAB Code

The function you pass into arrayfun or bsxfun can contain the following built-in MATLAB fu

Generate Random Numbers on a GPU

The function you pass to arrayfun or bsxfun for execution on a GPU can contain the random number gen

for gpuArray”. For more information about generating random numbers on a GPU, and a comparison betw

Identify and Select a GPU Device

If you have only one GPU in your computer, that GPU is the default. If you have more than one GPU de

      AvailableMemory: 4.9190e+09
  MultiprocessorCount: 13
         ClockRateKHz: 614500

Run CUDA or PTX Code on GPU

In this section...
“Overview”
“Create a CUDAKernel Object”
“Run a CUDAKernel”

The following sections provide details of these commands and workflow steps.

Create a CUDAKernel Object

• “Compile a PTX
Determine Product Installation and Versions

To determine if Parallel Computing Toolbox software is installed on your system, type t

k = parallel.gpu.CUDAKernel('myfun.ptx','float *, const float *, float');

Another use for C prototype input is w

Integer Types

int8_T, int16_T, int32_T, int64_T
uint8_T, uint16_T, uint32_T, uint64_T

The header file is shipped as matla

These rules have some implications. The most notable is that every output from a kernel must necessarily also be an input to the kernel.

__global__ void simplestKernelEver( float * x, float val )

then the PTX code contains an entry that might be called

_Z18

• GridSize — A vector of three elements, the product of which determines the number of blocks.
• ThreadBlockSize — A vector of three

Use gpuArray Variables

It might be more efficient to use gpuArray objects as input when running a kernel:

k = parallel.g

The input values x1 and x2 correspond to pInOut and c in the C function prototype. The output argument y corresponds to the value of

2. Compile the CU code at the shell command line to generate a PTX file called test.ptx.

nvcc -ptx test.cu

3. Create the kern

4. Before you run the kernel, set the number of threads correctly for the vectors you want to add.

N = 128;
k.ThreadBlockSize = N;
in1 =
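The example breaks off above; a hedged sketch of the remaining steps (the kernel file name follows the surrounding text, but the exact inputs are assumptions):

```matlab
% Run the vector-addition kernel compiled to test.ptx from test.cu.
k = parallel.gpu.CUDAKernel('test.ptx', 'test.cu');
N = 128;
k.ThreadBlockSize = N;              % one thread per vector element
in1 = ones(N, 1, 'gpuArray');
in2 = ones(N, 1, 'gpuArray');
result = feval(k, in1, in2);        % kernel output returned as a gpuArray
gather(result(1:3))                 % bring a few elements back to the host
```

feval launches the kernel with the configured grid and block sizes; outputs correspond to the non-const pointer arguments of the C prototype.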
Run MEX-Functions Containing CUDA Code

In this section...
“Write a MEX-File Containing CUDA Code”
2 Parallel for-Loops (parfor)

• “Introduction to parfor”
• “Create a parfor-Loop”
• “Comparing for-Loops and parfor-Loops”

{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < N)
        B[i] = 2.0 * A[i];
}

It contains the following lines to d

Compile a GPU MEX-File

When you have set up the options file, use the mex command in MATLAB to compile a MEX-

• MEX-files can analyze the size of the input and allocate memory of a different size, or launch grids of a different size, from C o

Measure and Improve GPU Performance

In this section...
“Basic Workflow for Improving Performance”
“A

you might need to vectorize your code, replacing looped scalar operations with MATLAB matrix and vector operations. While vectorizing

on the GPU, rewrites the code to use arrayfun for element-wise operations, and finally shows how to integrate a

if you make that the first dimension. Similarly, if you frequently operate along a particular dimension, it is usually best to have

repeating the timed operation to get better resolution, executing the function before measurement to avoid init

transform of a filter vector, transforms back to the time domain, and stores the result in an output matrix.

function y = fastConvolu

On the same machine, this code displays the output:

Execution time on CPU = 0.019335
Execution time on GPU = 0.0
Revision History
November 2004    Online only    New for Version 1.0 (Release 14SP1+)
March 2005       Online only    Revised for Version 1.0.1 (Release 14SP2)
Septembe
2 Parallel for-Loops (parfor)2-2Introduction to parforIn this section...“parfor-Loops in MATLAB” on page 2-2“Deciding When to Use parfor” on page 2-2p
9 GPU Computing9-42Execution time on GPU = 0.0020537Maximum absolute error = 1.1374e-14In conclusion, vectorizing the code helps both the CPU and GPU
10Objects — Alphabetical List
10 Objects — Alphabetical List10-2codistributedAccess elements of arrays distributed among workers in parallel poolConstructorcodistributed, codistrib
codistributed10-3Also among the methods there are several for examining the characteristics of the array itself. Most behave like the MATLAB functions
10 Objects — Alphabetical List10-4codistributor1d1-D distribution scheme for codistributed arrayConstructorcodistributor1dDescriptionA codistributor1d
codistributor2dbc10-5codistributor2dbc2-D block-cyclic distribution scheme for codistributed arrayConstructorcodistributor2dbcDescriptionA codistribu
10 Objects — Alphabetical List10-6CompositeAccess nondistributed variables on multiple workers from clientConstructorCompositeDescriptionVariables tha
CUDAKernel10-7CUDAKernelKernel executable on GPUConstructorparallel.gpu.CUDAKernelDescriptionA CUDAKernel object represents a CUDA kernel that can e
10 Objects — Alphabetical List10-8Property Name Descriptioncorresponding element in the vector of the MaxGridSize propertyof the GPUDevice object.Shar
CUDAKernel10-9See AlsogpuArray, GPUDevice
Introduction to parfor2-3one when you have only a small number of simple calculations. The examples of this section are only to illustrate the behavio
10 Objects — Alphabetical List10-10distributedAccess elements of distributed arrays from clientConstructordistributedYou can also create a distributed
distributed10-11MethodsThe overloaded methods for distributed arrays are too numerous to list here. Most resemble and behave the same as built-in MATL
10 Objects — Alphabetical List10-12gpuArrayArray stored on GPUConstructorgpuArray converts an array in the MATLAB workspace into a gpuArray with eleme
gpuArray10-13DescriptionA gpuArray object represents an array stored on the GPU. You can use the array for direct calculations, or in CUDA kernels tha
10 Objects — Alphabetical List10-14GPUDeviceGraphics processing unit (GPU)ConstructorgpuDeviceDescriptionA GPUDevice object represents a graphics proce
GPUDevice10-15where methodname is the name of the method. For example, to get help onisAvailable, typehelp parallel.gpu.GPUDevice.isAvailableProperti
10 Objects — Alphabetical List10-16Property Name DescriptionClockRateKHz Peak clock rate of the GPU in kHz.ComputeMode The compute mode of the device,
mxGPUArray10-17mxGPUArrayType for MATLAB gpuArrayDescriptionmxGPUArray is an opaque C language type that allows a MEX function access to the elements
10 Objects — Alphabetical List10-18See AlsogpuArray, mxArray
parallel.Cluster10-19parallel.ClusterAccess cluster properties and behaviorsConstructorsparclustergetCurrentCluster (in the workspace of the MATLAB w
2 Parallel for-Loops (parfor)2-4Create a parfor-LoopThe safest approach when creating a parfor-loop is to assume that iterations areperformed on diffe
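As a minimal illustration of the approach this section describes (every iteration independent, results collected through a sliced output variable), consider:

```matlab
% Each iteration writes only its own slice of A, so iterations
% can run on different workers in any order.
n = 100;
A = zeros(1, n);
parfor i = 1:n
    A(i) = i^2;
end
```

After the loop, A on the client holds the same values a plain for-loop would produce.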
10 Objects — Alphabetical List10-20Cluster Type Descriptionparallel.cluster.HPCServer Interact with CJS cluster running WindowsMicrosoft HPC Serverpar
parallel.Cluster10-21Property DescriptionHost Host name of the cluster head nodeJobStorageLocation Location where cluster stores job and taskinformat
10 Objects — Alphabetical List10-22Property DescriptionSecurityLevel Degree of security applied to cluster and its jobs. For descriptions of security l
parallel.Cluster10-23Property DescriptionRcpCommand Command to copy files to and from clientResourceTemplate Define resources to request forcommunica
10 Objects — Alphabetical List10-24Property DescriptionDeleteTaskFcn Function to run when deleting taskGetJobStateFcn Function to run when querying jo
parallel.cluster.Hadoop10-25parallel.cluster.HadoopHadoop cluster for mapreducerConstructorsparallel.cluster.HadoopDescriptionA parallel.cluster.Hado
10 Objects — Alphabetical List10-26Property DescriptionRequiresMathWorksHostedLicensing Specify whether cluster uses MathWorkshosted licensingHelpFor
parallel.Future10-27parallel.FutureRequest function execution on parallel pool workersConstructorsparfeval, parfevalOnAllContainer HierarchyParent pa
10 Objects — Alphabetical List10-28Method Descriptioncancel Cancel queued or running futurefetchNext Retrieve next available unread futureoutputs (Fev
parallel.Future10-29help parallel.FevalFuturehelp parallel.FevalOnAllFutureSee Alsoparallel.Pool
Create a parfor-Loop2-5• “Comparing for-Loops and parfor-Loops” on page 2-6• “Reductions: Cumulative Values Updated by Each Iteration” on page 2-8• “
10 Objects — Alphabetical List10-30parallel.JobAccess job properties and behaviorsConstructorscreateCommunicatingJob, createJob, findJob, recreategetC
parallel.Job10-31MethodsAll job type objects have the same methods, described in the following table.PropertiesCommon to All Job TypesThe following p
10 Objects — Alphabetical List10-32Property DescriptionUserData Information associated with job objectUsername Name of user who owns jobMJS JobsMJS in
parallel.Job10-33HelpTo get further help on a particular type of parallel.Job object, including a list of links to help for its properties, type help
10 Objects — Alphabetical List10-34parallel.PoolAccess parallel poolConstructorsparpool, gcpDescriptionA parallel.Pool object provides access to a par
parallel.Pool10-35HelpTo get further help on parallel.Pool objects, including a list of links to help for specific properties, type:help parallel.Pool
10 Objects — Alphabetical List10-36parallel.TaskAccess task properties and behaviorsConstructorscreateTask, findTaskgetCurrentTask (in the workspace o
parallel.Task10-37PropertiesCommon to All Task TypesThe following properties are common to all task object types.Property DescriptionCaptureDiary Spe
10 Objects — Alphabetical List10-38MJS TasksMJS task objects have the following properties in addition to the common properties:Property DescriptionFa
parallel.Worker10-39parallel.WorkerAccess worker that ran taskConstructorsgetCurrentWorker in the workspace of the MATLAB worker.In the client worksp
2 Parallel for-Loops (parfor)2-6Comparing for-Loops and parfor-LoopsBecause parfor-loops are not quite the same as for-loops, there are specific behav
10 Objects — Alphabetical List10-40PropertiesMJS WorkerThe following table describes the properties of an MJS worker.Property DescriptionAllHostAddres
RemoteClusterAccess10-41RemoteClusterAccessConnect to schedulers when client utilities are not available locallyConstructorr = parallel.cluster.Remot
10 Objects — Alphabetical List10-42MethodsMethod Name Descriptionconnect connect(r,clusterHost) establishes a connection to thespecified host using th
RemoteClusterAccess10-43Method Name DescriptionrunCommand [status,result] = runCommand(r,command) runsthe supplied command on the remote host and ret
10 Objects — Alphabetical List10-44Property Name DescriptionJobStorageLocation Location on the remote host for files that are being mirrored.UseIdenti
11Functions — Alphabetical List
11 Functions — Alphabetical List11-2addAttachedFilesAttach files or folders to parallel poolSyntaxaddAttachedFiles(poolobj,files)DescriptionaddAttache
addAttachedFiles11-3Files or folders to attach, specified as a string or cell array of strings. Each string can specify either an absolute or relative
11 Functions — Alphabetical List11-4arrayfunApply function to each element of array on GPUSyntaxA = arrayfun(FUN, B)A = arrayfun(FUN,B,C,...)[A,B,...]
arrayfun11-5all have the same size or be scalar. Any scalar inputs are scalar expanded before beinginput to the function FUN.One or more of the input
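A small self-contained sketch of calling arrayfun on a gpuArray input, with a scalar second input expanded as described above (variable names here are illustrative):

```matlab
% Element-wise evaluation on the GPU; the scalar c is expanded to match B.
B = gpuArray(rand(4, 3));
c = 2;
A = arrayfun(@(x, y) x.^2 + y, B, c);  % A is a 4-by-3 gpuArray
result = gather(A);                    % bring the result back to the client
```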
Comparing for-Loops and parfor-Loops2-7parfor-loop requires that each iteration be independent of the other iterations, and that all code that follows
11 Functions — Alphabetical List11-6R2 = rand(2,1,4,3,'gpuArray');R3 = rand(1,5,4,3,'gpuArray');R = arrayfun(@(x,y,z)(x+y.*z),R1,R
arrayfun11-7 o2 400x400 108 gpuArray s1 400x400 108 gpuArray s2 400x400 108 gpuArray s3
11 Functions — Alphabetical List11-8batchRun MATLAB script or function on workerSyntaxj = batch('aScript')j = batch(myCluster,'aScript&
batch11-9j = batch(fcn,N,{x1, ..., xn}) runs the function specified by a function handleor function name, fcn, on a worker in the cluster identified
11 Functions — Alphabetical List11-10is the cwd of MATLAB when the batch command is executed. If the string for this argument is '.', there i
batch11-11Clean up a batch job’s data after you are finished with it:delete(j)Run a batch function on a cluster that generates a 10-by-10 random matr
11 Functions — Alphabetical List11-12bsxfunBinary singleton expansion function for gpuArraySyntaxC = bsxfun(FUN,A,B)Descriptionbsxfun with gpuArray in
bsxfun11-13size(R) 2 5 4 3R1 = rand(2,2,0,4,'gpuArray');R2 = rand(2,1,1,4,'gpuArray');R = bsxfun(@plus,R1,R2);size(R
11 Functions — Alphabetical List11-14cancelCancel job or taskSyntaxcancel(t)cancel(j)Argumentst Pending or running task to cancel.j Pending, running,
cancel11-15c = parcluster();job1 = createJob(c);t = createTask(job1, @rand, 1, {3,3});cancel(t)t Task with properties: ID: 1
2 Parallel for-Loops (parfor)2-8Reductions: Cumulative Values Updated by Each IterationThese two examples show parfor-loops using reduction assignment
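A typical reduction assignment of the kind this section discusses looks like the following, where the loop variable accumulates into a single value across iterations:

```matlab
% s is a reduction variable: every iteration updates it with the
% same associative operation, so the order of updates does not matter.
s = 0;
parfor i = 1:1000
    s = s + i;
end
% s equals 500500, the same value a for-loop would produce
```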
11 Functions — Alphabetical List11-16cancel (FevalFuture)Cancel queued or running futureSyntaxcancel(F)Descriptioncancel(F) stops the queued and runni
cancel (FevalFuture)11-17See AlsofetchOutputs | isequal | parfeval | parfevalOnAll | fetchNext
11 Functions — Alphabetical List11-18changePasswordPrompt user to change MJS passwordSyntaxchangePassword(mjs)changePassword(mjs,username)Argumentsmjs
changePassword11-19Change your password for the MJS cluster on which the parallel pool is running.p = gcp;mjs = p.Cluster;changePassword(mjs)See Also
11 Functions — Alphabetical List11-20classUnderlyingClass of elements within gpuArray or distributed arraySyntaxC = classUnderlying(D)DescriptionC = c
classUnderlying11-21c1 = classUnderlying(D1)c8 =uint8c1 =singleSee Alsodistributed | codistributed | gpuArray
11 Functions — Alphabetical List11-22clearRemove objects from MATLAB workspaceSyntaxclear objArgumentsobj An object or an array of objects.Description
clear11-23 1isequal (job1copy, j2)ans = 0More AboutTipsIf obj references an object in the cluster, it is cleared from the workspace, but it r
11 Functions — Alphabetical List11-24codistributedCreate codistributed array from replicated local dataSyntaxC = codistributed(X)C = codistributed(X,c
codistributed11-25ExamplesCreate a 1000-by-1000 codistributed array C1 using the default distribution scheme.spmd N = 1000; X = magic(N);
Reductions: Cumulative Values Updated by Each Iteration2-9More About• “Introduction to parfor” on page 2-2• “Comparing for-Loops and parfor-Loops” on
11 Functions — Alphabetical List11-26codistributed.buildCreate codistributed array from distributed dataSyntaxD = codistributed.build(L, codist)D = co
codistributed.build11-27 % Distribute the matrix over the second dimension (columns), % and let the codistributor derive the partition from the
11 Functions — Alphabetical List11-28codistributed.cellCreate codistributed cell arraySyntaxC = codistributed.cell(n)C = codistributed.cell(m, n, p, .
codistributed.cell11-29 C = cell(8, codistributor1d());endC = cell(m, n, p, ..., codist) and C = cell([m, n, p, ...],codist) are the same as C = c
11 Functions — Alphabetical List11-30codistributed.colonDistributed colon operationSyntaxcodistributed.colon(a,d,b)codistributed.colon(a,b)codistribut
codistributed.colon11-31spmd(4); C = codistributed.colon(1,10), endLab 1: This worker stores C(1:3). LocalPart: [1 2 3] Codistributor
11 Functions — Alphabetical List11-32codistributed.spallocAllocate space for sparse codistributed matrixSyntaxSD = codistributed.spalloc(M, N, nzmax)S
codistributed.spalloc11-33 SD = codistributed.spalloc(N, N, 2*N); for ii=1:N-1 SD(ii,ii:ii+1) = [ii ii]; endendSee Alsospalloc | sparse
11 Functions — Alphabetical List11-34codistributed.speyeCreate codistributed sparse identity matrixSyntaxCS = codistributed.speye(n)CS = codistributed
codistributed.speye11-35CS = speye(m, n, codist) and CS = speye([m, n], codist) are the same asCS = codistributed.speye(m, n) and CS = codistributed.
2 Parallel for-Loops (parfor)2-10parfor Programming ConsiderationsIn this section...“MATLAB Path” on page 2-10“Error Handling” on page 2-10MATLAB Path
11 Functions — Alphabetical List11-36codistributed.sprandCreate codistributed sparse array of uniformly distributed pseudo-random valuesSyntaxCS = cod
codistributed.sprand11-37spmd(4) CS = codistributed.sprand(1000, 1000, .001);endcreates a 1000-by-1000 sparse codistributed double array CS with a
11 Functions — Alphabetical List11-38codistributed.sprandnCreate codistributed sparse array of normally distributed pseudo-random valuesSyntaxCS = co
codistributed.sprandn11-39spmd(4) CS = codistributed.sprandn(1000, 1000, .001);endcreates a 1000-by-1000 sparse codistributed double array CS with
11 Functions — Alphabetical List11-40codistributorCreate codistributor object for codistributed arraysSyntaxcodist = codistributor()codist = codistrib
codistributor11-41codist = codistributor('2dbc') forms a 2-D block-cyclic codistributor object. For more information about '2dbc'
11 Functions — Alphabetical List11-42ASee Alsocodistributed | codistributor1d | codistributor2dbc | getCodistributor |getLocalPart | redistribute
codistributor1d11-43codistributor1dCreate 1-D codistributor object for codistributed arraysSyntaxcodist = codistributor1d()codist = codistributor1d(d
11 Functions — Alphabetical List11-44To use a default dimension, specify codistributor1d.unsetDimension for that argument; the distribution dimension i
codistributor1d.defaultPartition11-45codistributor1d.defaultPartitionDefault partition for codistributed arraySyntaxP = codistributor1d.defaultPartit
parfor Limitations2-11parfor LimitationsMost of these restrictions result from the need for loop iterations to be completely independent of each other
11 Functions — Alphabetical List11-46codistributor2dbcCreate 2-D block-cyclic codistributor object for codistributed arraysSyntaxcodist = codistributo
codistributor2dbc11-47codist = codistributor2dbc(lbgrid,blksize,orient,gsize) forms a codistributor object that distributes arrays with the global siz
11 Functions — Alphabetical List11-48codistributor2dbc.defaultLabGridDefault computational grid for 2-D block-cyclic distributed arraysSyntaxgrid = co
Composite11-49CompositeCreate Composite objectSyntaxC = Composite()C = Composite(nlabs)DescriptionC = Composite() creates a Composite object on the c
11 Functions — Alphabetical List11-50ExamplesThe following examples all use a local parallel pool of four workers, opened with the statement:p = parpoo
Composite11-51d = distributed([3 1 4 2]); % One integer per workerspmd c = getLocalPart(d); % Unique value on each workerendc{:} 3 1
11 Functions — Alphabetical List11-52createCommunicatingJobCreate communicating job on clusterSyntaxjob = createCommunicatingJob(cluster)job = createC
createCommunicatingJob11-53simultaneously on all workers, and lab* functions can be used for communication between workers.job = createCommunicatingJo
11 Functions — Alphabetical List11-54Delete the job from the cluster.delete(j);See AlsocreateJob | createTask | findJob | parcluster | recreate | subm
createJob11-55createJobCreate independent job on clusterSyntaxobj = createJob(cluster)obj = createJob(...,'p1',v1,'p2',v2,...)job
2 Parallel for-Loops (parfor)2-12Inputs and Outputs in parfor-LoopsIn this section...“Functions with Interactive Inputs” on page 2-12“Displaying Outpu
11 Functions — Alphabetical List11-56is not specified and the cluster has a value specified in its 'Profile' property, the cluster's profile
createJob11-57 {'myapp/folderA','myapp/folderB','myapp/file1.m'});See AlsocreateCommunicatingJob | createTask |
11 Functions — Alphabetical List11-58createTaskCreate new task in jobSyntaxt = createTask(j, F, N, {inputargs})t = createTask(j, F, N, {C1,...,Cm})t =
createTask11-59by a function handle or function name F, with the given input arguments {inputargs},returning N output arguments.t = createTask(j, F,
11 Functions — Alphabetical List11-60Run the job.submit(j);Wait for the job to finish running, and get the output from the task evaluation.wait(j);tas
delete11-61deleteRemove job or task object from cluster and memorySyntaxdelete(obj)Descriptiondelete(obj) removes the job or task object, obj, from t
11 Functions — Alphabetical List11-62Delete all jobs on the cluster identified by the profile myProfile:myCluster = parcluster('myProfile');
delete (Pool)11-63delete (Pool)Shut down parallel poolSyntaxdelete(poolobj)Descriptiondelete(poolobj) shuts down the parallel pool associated with th
11 Functions — Alphabetical List11-64demoteDemote job in cluster queueSyntaxdemote(c,job)Argumentsc Cluster object that contains the job.job Job objec
demote11-65Examine the new queue sequence:[pjobs,qjobs,rjobs,fjobs] = findJob(c);get(qjobs,'Name') 'Job A' 'Job C'
Objects and Handles in parfor-Loops2-13Objects and Handles in parfor-LoopsIn this section...“Using Objects in parfor-Loops” on page 2-13“Handle Class
11 Functions — Alphabetical List11-66diaryDisplay or save Command Window text of batch jobSyntaxdiary(job)diary(job, 'filename')Argumentsjob
distributed11-67distributedCreate distributed array from data in client workspaceSyntaxD = distributed(X)DescriptionD = distributed(X) creates a dist
11 Functions — Alphabetical List11-68D1 = distributed(magic(Nsmall));Create a large distributed array directly, using a build method:Nlarge = 1000;D2
distributed.cell11-69distributed.cellCreate distributed cell arraySyntaxD = distributed.cell(n)D = distributed.cell(m, n, p, ...)D = distributed.cell
11 Functions — Alphabetical List11-70distributed.spallocAllocate space for sparse distributed matrixSyntaxSD = distributed.spalloc(M, N, nzmax)Descrip
distributed.speye11-71distributed.speyeCreate distributed sparse identity matrixSyntaxDS = distributed.speye(n)DS = distributed.speye(m, n)DS = distr
11 Functions — Alphabetical List11-72distributed.sprandCreate distributed sparse array of uniformly distributed pseudo-random valuesSyntaxDS = distrib
distributed.sprandn11-73distributed.sprandnCreate distributed sparse array of normally distributed pseudo-random valuesSyntaxDS = distributed.sprandn
11 Functions — Alphabetical List11-74dloadLoad distributed arrays and Composite objects from diskSyntaxdloaddload filenamedload filename Xdload filena
dload11-75When loading Composite objects, the data is sent to the available parallel pool workers. If the Composite is too large to fit on the current
2 Parallel for-Loops (parfor)2-14B = @sin;for ii = 1:100 A(ii) = B(ii);endA corresponding parfor-loop does not allow B to reference a function hand
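The page is truncated before the resolution appears, but the commonly documented workaround for calling a broadcast function handle inside parfor is to make the call explicit with feval, so the analyzer cannot mistake B(ii) for array indexing. A hedged sketch:

```matlab
% Workaround sketch: feval makes the function-handle call unambiguous.
B = @sin;
A = zeros(1, 100);
parfor ii = 1:100
    A(ii) = feval(B, ii);
end
```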
11 Functions — Alphabetical List11-76dsaveSave workspace distributed arrays and Composite objects to diskSyntaxdsavedsave filenamedsave filename Xdsav
dsave11-77See Alsosave | distributed | Composite | dload | parpool
11 Functions — Alphabetical List11-78existCheck whether Composite is defined on workersSyntaxh = exist(C,labidx)h = exist(C)Descriptionh = exist(C,lab
existsOnGPU11-79existsOnGPUDetermine if gpuArray or CUDAKernel is available on GPUSyntaxTF = existsOnGPU(DATA)DescriptionTF = existsOnGPU(DATA) retur
11 Functions — Alphabetical List11-80 4 14 15 1reset(g);M_exists = existsOnGPU(M) 0M % Try to display gpuArrayData no longer exists
eye11-81eyeIdentity matrixSyntaxE = eye(sz,arraytype)E = eye(sz,datatype,arraytype)E = eye(sz,'like',P)E = eye(sz,datatype,'like'
11 Functions — Alphabetical List11-82Argument Values Descriptions'uint8', 'int16','uint16','int32','uint3
eye11-83D = eye(1000,'distributed');Create Codistributed Identity MatrixCreate a 1000-by-1000 codistributed double identity matrix, distrib
11 Functions — Alphabetical List11-84falseArray of logical 0 (false)SyntaxF = false(sz,arraytype)F = false(sz,'like',P)C = false(sz,codist)C
false11-85see the reference pages for codistributor1d and codistributor2dbc. To use the default distribution scheme, you can specify a codistributor c
Nesting and Flow in parfor-Loops2-15Nesting and Flow in parfor-LoopsIn this section...“Nested Functions” on page 2-15“Nested Loops” on page 2-15“Nest
11 Functions — Alphabetical List11-86Each worker contains a 100-by-labindex local piece of C.Create gpuArray False MatrixCreate a 1000-by-1000 gpuArra
fetchNext11-87fetchNextRetrieve next available unread FevalFuture outputsSyntax[idx,B1,B2,...,Bn] = fetchNext(F)[idx,B1,B2,...,Bn] = fetchNext(F,TIME
11 Functions — Alphabetical List11-88end% Build a waitbar to track progressh = waitbar(0,'Waiting for FevalFutures to complete...');results
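The fetchNext loop this example builds can be sketched in full as follows (a minimal version without the waitbar; @magic stands in for any worker function):

```matlab
% Request N asynchronous evaluations, then collect results as each
% future completes, in whatever order they finish.
N = 8;
for idx = 1:N
    F(idx) = parfeval(@magic, 1, idx);
end
results = cell(1, N);
for k = 1:N
    [idx, value] = fetchNext(F);   % next completed, unread future
    results{idx} = value;
end
```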
fetchOutputs (job)11-89fetchOutputs (job)Retrieve output arguments from all tasks in jobSyntaxdata = fetchOutputs(job)Descriptiondata = fetchOutputs(
11 Functions — Alphabetical List11-90Wait for the job to finish and retrieve the random matrix:wait(j)data = fetchOutputs(j);data{1}
fetchOutputs (FevalFuture)11-91fetchOutputs (FevalFuture)Retrieve all output arguments from FutureSyntax[B1,B2,...,Bn] = fetchOutputs(F)[B1,B2,...,Bn
11 Functions — Alphabetical List11-920.0048 0.9658 0.8488Create an FevalFuture vector, and fetch all its outputs.for idx = 1:10 F(idx) = parf
feval11-93fevalEvaluate kernel on GPUSyntaxfeval(KERN, x1, ..., xn)[y1, ..., ym] = feval(KERN, x1, ..., xn)Descriptionfeval(KERN, x1, ..., xn) evalua
11 Functions — Alphabetical List11-94[y1, y2] = feval(KERN, x1, x2, x3) The three input arguments, x1, x2, and x3, correspond to the three argument
findJob11-95findJobFind job objects stored in clusterSyntaxout = findJob(c)[pending queued running completed] = findJob(c)out = findJob(c,'p1'
2 Parallel for-Loops (parfor)2-16Limitations of Nested for-LoopsFor proper variable classification, the range of a for-loop nested in a parfor must be
11 Functions — Alphabetical List11-96completed jobs include those that failed. Jobs that are deleted or whose status is unavailable are not returned by
findTask11-97findTaskTask objects belonging to job objectSyntaxtasks = findTask(j)[pending running completed] = findTask(j)tasks = findTask(j,'p
11 Functions — Alphabetical List11-98specified property-value pairs, p1, v1, p2, v2, etc. The property name must be in theform of a string, with the v
for11-99forfor-loop over distributed rangeSyntaxfor variable = drange(colonop) statement ... statementendDescriptionThe general format isfor
11 Functions — Alphabetical List11-100ExamplesFind the rank of magic squares. Access only the local portion of a codistributed array.r = zeros(1, 40,
gather11-101gatherTransfer distributed array or gpuArray to local workspaceSyntaxX = gather(A)X = gather(C,lab)DescriptionX = gather(A) can operate i
11 Functions — Alphabetical List11-102n = 10;spmd C = codistributed(magic(n)); M = gather(C) % Gather all elements to all workersendS = gather(C)
gather11-103W 1024x1 8192 doubleMore AboutTipsNote that gather assembles the codistributed or distributed array in the workspaces of
11 Functions — Alphabetical List11-104gcatGlobal concatenationSyntaxXs = gcat(X)Xs = gcat(X, dim)Xs = gcat(X, dim, targetlab)DescriptionXs = gcat(X) c
gcp11-105gcpGet current parallel poolSyntaxp = gcpp = gcp('nocreate')Descriptionp = gcp returns a parallel.Pool object representing the cur
Nesting and Flow in parfor-Loops2-17Invalid Valid disp(A(i, 1))end end disp(v(1)) A(i, :) = v;endInside a parfor, if you use multiple for
11 Functions — Alphabetical List11-106delete(gcp('nocreate'))See AlsoComposite | delete | distributed | parfeval | parfevalOnAll | parfor |p
getAttachedFilesFolder11-107getAttachedFilesFolderFolder into which AttachedFiles are writtenSyntaxfolder = getAttachedFilesFolderArgumentsfolder Str
11 Functions — Alphabetical List11-108getCodistributorCodistributor object for existing codistributed arraySyntaxcodist = getCodistributor(D)Descripti
getCodistributor11-109 ornt = codist2.OrientationendDemonstrate that these codistributor objects are complete:spmd (4) isComplete(codist1)
11 Functions — Alphabetical List11-110getCurrentClusterCluster object that submitted current taskSyntaxc = getCurrentClusterArgumentsc The cluster obj
getCurrentCluster11-111See AlsogetAttachedFilesFolder | getCurrentJob | getCurrentTask |getCurrentWorker
11 Functions — Alphabetical List11-112getCurrentJobJob object whose task is currently being evaluatedSyntaxjob = getCurrentJobArgumentsjob The job obj
getCurrentTask11-113getCurrentTaskTask object currently being evaluated in this worker sessionSyntaxtask = getCurrentTaskArgumentstask The task objec
11 Functions — Alphabetical List11-114getCurrentWorkerWorker object currently running this sessionSyntaxworker = getCurrentWorkerArgumentsworker The w
getCurrentWorker11-115j = createJob(c);j.AttachedFiles = {'identifyWorkerHost.m'};t = createTask(j,@identifyWorkerHost,1,{});submit(j)wait(
2 Parallel for-Loops (parfor)2-18More About• “parfor Limitations” on page 2-11• “Convert Nested for-Loops to parfor” on page 2-48
11 Functions — Alphabetical List11-116getDebugLogRead output messages from job run in CJS clusterSyntaxstr = getDebugLog(cluster, job_or_task)Argument
getDebugLog11-117getDebugLog(c,j);See AlsocreateCommunicatingJob | createJob | createTask | parcluster
11 Functions — Alphabetical List11-118getJobClusterDataGet specific user data for job on generic clusterSyntaxuserdata = getJobClusterData(cluster,job
getJobFolder11-119getJobFolderFolder on client where jobs are storedSyntaxjoblocation = getJobFolder(cluster,job)Descriptionjoblocation = getJobFolde
11 Functions — Alphabetical List11-120getJobFolderOnClusterFolder on cluster where jobs are storedSyntaxjoblocation = getJobFolderOnCluster(cluster,jo
getLocalPart11-121getLocalPartLocal portion of codistributed arraySyntaxL = getLocalPart(A)DescriptionL = getLocalPart(A) returns the local portion o
11 Functions — Alphabetical List11-122getLogLocationLog location for job or taskSyntaxlogfile = getLogLocation(cluster,cj)logfile = getLogLocation(clu
globalIndices11-123globalIndicesGlobal indices for local part of codistributed arraySyntaxK = globalIndices(C,dim)K = globalIndices(C,dim,lab)[E,F] =
11 Functions — Alphabetical List11-124ExamplesCreate a 2-by-22 codistributed array among four workers, and view the global indices on each lab:spmd
gop11-125gopGlobal operation across all workersSyntaxres = gop(FUN,x)res = gop(FUN,x,targetlab)ArgumentsFUN Function to operate across workers.x Argu
Variables and Transparency in parfor-Loops2-19Variables and Transparency in parfor-LoopsIn this section...“Unambiguous Variable Names” on page 2-19“T
11 Functions — Alphabetical List11-126ExamplesThis example shows how to calculate the sum and maximum values for x among all workers.p = parpool('
gop11-127spmd res = gop(afun,num2str(labindex));endres{1}1 2 3 4See AlsolabBarrier | labindex | numlabs
11 Functions — Alphabetical List11-128gplusGlobal additionSyntaxS = gplus(X)S = gplus(X, targetlab)DescriptionS = gplus(X) returns the addition of the
gpuArray11-129gpuArrayCreate array on GPUSyntaxG = gpuArray(X)DescriptionG = gpuArray(X) copies the numeric array X to the GPU, and returns a gpuArra
11 Functions — Alphabetical List11-130 G2 10x10 108 gpuArrayCopy the array back to the MATLAB workspace.G1 = gather(G2);whos G1 Name
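The round trip shown above follows the usual pattern: move data to the device, compute there, and gather back only when the result is needed on the client. A minimal sketch:

```matlab
X = rand(1000);    % ordinary array in the client workspace
G = gpuArray(X);   % copy to the GPU
G2 = G * G;        % arithmetic runs on the device
Y = gather(G2);    % transfer the result back to the client
```

Keeping intermediate results on the GPU and gathering once at the end avoids repeated host-device transfers.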
gpuDevice11-131gpuDeviceQuery or select GPU deviceSyntaxD = gpuDeviceD = gpuDevice()D = gpuDevice(IDX)gpuDevice([ ])DescriptionD = gpuDevice or D = g
11 Functions — Alphabetical List11-132for ii = 1:gpuDeviceCount g = gpuDevice(ii); fprintf(1,'Device %i has ComputeCapability %s \n',
gpuDeviceCount11-133gpuDeviceCountNumber of GPU devices presentSyntaxn = gpuDeviceCountDescriptionn = gpuDeviceCount returns the number of GPU device
11 Functions — Alphabetical List11-134gputimeitTime required to run function on GPUSyntaxt = gputimeit(F)t = gputimeit(F,N)Descriptiont = gputimeit(F)
gputimeit11-135t1 = gputimeit(f,1)0.2933More AboutTipsgputimeit is preferable to timeit for functions that use the GPU, because it ensures that all op
2 Parallel for-Loops (parfor)2-20Similarly, you cannot clear variables from a worker's workspace by executing clear inside a parfor statement:parf
11 Functions — Alphabetical List11-136helpHelp for toolbox functions in Command WindowSyntaxhelp class/functionArgumentsclass A Parallel Computing Too
help11-137parallel.job.CJSIndependentJobhelp parallel.job/createTaskhelp parallel.job/AdditionalPathsSee Alsomethods
11 Functions — Alphabetical List11-138InfArray of infinitySyntaxA = Inf(sz,arraytype)A = Inf(sz,datatype,arraytype)A = Inf(sz,'like',P)A = I
Inf11-139Argument Values Descriptions'gpuArray'Specifies gpuArray.datatype'double' (default),'single'Specifies underlyi
11 Functions — Alphabetical List11-140Create Codistributed Inf MatrixCreate a 1000-by-1000 codistributed double matrix of Infs, distributed by its sec
isaUnderlying11-141isaUnderlyingTrue if distributed array's underlying elements are of specified classSyntaxTF = isaUnderlying(D, 'classnam
11 Functions — Alphabetical List11-142iscodistributedTrue for codistributed arraySyntaxtf = iscodistributed(X)Descriptiontf = iscodistributed(X) retur
isComplete11-143isCompleteTrue if codistributor object is completeSyntaxtf = isComplete(codist)Descriptiontf = isComplete(codist) returns true if cod
11 Functions — Alphabetical List11-144isdistributedTrue for distributed arraySyntaxtf = isdistributed(X)Descriptiontf = isdistributed(X) returns true
isequal11-145isequalTrue if clusters have same property valuesSyntaxisequal(C1,C2)isequal(C1,C2,C3,...)Descriptionisequal(C1,C2) returns logical 1 (t
Variables and Transparency in parfor-Loops2-21 temp = struct(); temp.myfield1 = rand(); temp.myfield2 = i;end parfor i = 1:4 temp = struc
11 Functions — Alphabetical List11-146isequal (FevalFuture)True if futures have same IDSyntaxeq = isequal(F1,F2)Descriptioneq = isequal(F1,F2) returns
isreplicated11-147isreplicatedTrue for replicated arraySyntaxtf = isreplicated(X)Descriptiontf = isreplicated(X) returns true for a replicated array,
11 Functions — Alphabetical List11-148jobStartupFile for user-defined options to run when job startsSyntaxjobStartup(job)Argumentsjob The job for whic
labBarrier11-149labBarrierBlock execution until all workers reach this callSyntaxlabBarrierDescriptionlabBarrier blocks execution of a parallel algor
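A minimal sketch of using labBarrier inside an spmd block to synchronize workers before a phase that requires all of them to have finished:

```matlab
spmd
    partial = sum(rand(1, labindex));  % each worker does an unequal amount of work
    labBarrier;                        % block until every worker reaches this call
    % all workers now proceed together
end
```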
11 Functions — Alphabetical List11-150labBroadcastSend data to all workers or receive data sent to all workersSyntaxshared_data = labBroadcast(srcWkrI
labBroadcast11-151ExamplesIn this case, the broadcaster is the worker whose labindex is 1.srcWkrIdx = 1;if labindex == srcWkrIdx data = randn(10);
11 Functions — Alphabetical List11-152labindexIndex of this workerSyntaxid = labindexDescriptionid = labindex returns the index of the worker currentl
labProbe11-153labProbeTest to see if messages are ready to be received from other workerSyntaxisDataAvail = labProbeisDataAvail = labProbe(srcWkrIdx)
11 Functions — Alphabetical List11-154[isDataAvail,srcWkrIdx,tag] = labProbe returns labindex of the workers andtags of ready messages. If no data is
labReceive11-155labReceiveReceive data from another workerSyntaxdata = labReceivedata = labReceive(srcWkrIdx)data = labReceive('any',tag)da
2 Parallel for-Loops (parfor)2-22x = zeros(10,12);parfor idx = 1:12 x(:,idx) = idx;endThe following code offers a suggested workaround for this limit
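The page is truncated before the workaround appears; the pattern usually documented for this limitation is sketched here as an assumption: build each column in a loop-local temporary, then assign it to the sliced variable in a single full-slice step:

```matlab
% Workaround sketch: t is a temporary, local to each parfor iteration,
% so the sliced variable x receives one unambiguous assignment per column.
x = zeros(10, 12);
parfor idx = 1:12
    t = zeros(10, 1);
    for k = 1:10
        t(k) = k * idx;
    end
    x(:, idx) = t;
end
```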
11 Functions — Alphabetical List11-156More AboutTipsThis function blocks execution in the worker until the corresponding call to labSend occurs in the
labSend11-157labSendSend data to another workerSyntaxlabSend(data,rcvWkrIdx)labSend(data,rcvWkrIdx,tag)Argumentsdata Data sent to the other workers;
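labSend pairs with labReceive on the other side; a minimal point-to-point exchange inside an spmd block might look like:

```matlab
spmd
    if labindex == 1
        labSend(magic(3), 2);    % worker 1 sends a 3-by-3 matrix to worker 2
    elseif labindex == 2
        data = labReceive(1);    % worker 2 blocks until the data arrives
    end
end
```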
11 Functions — Alphabetical List11-158labSendReceiveSimultaneously send data to and receive data from another workerSyntaxdataReceived = labSendReceiv
labSendReceive11-159dataReceived = labSendReceive(rcvWkrIdx,srcWkrIdx,dataSent,tag) usesthe specified tag for the communication. tag can be any integ
11 Functions — Alphabetical List11-160Lab 2: otherdata = 1Lab 3: otherdata = 1 3 4 2Transfer data to the next worker with
length11-161lengthLength of object arraySyntaxlength(obj)Argumentsobj An object or an array of objects.Descriptionlength(obj) returns the length of o
11 Functions — Alphabetical List11-162listAutoAttachedFilesList of files automatically attached to job, task, or parallel poolSyntaxlistAutoAttachedFi
listAutoAttachedFiles11-163listAutoAttachedFiles(obj)Automatically Attach Files ProgrammaticallyProgrammatically set a job to automatically attach co
11 Functions — Alphabetical List11-164loadLoad workspace variables from batch jobSyntaxload(job)load(job, 'X')load(job, 'X', &apos
load11-165S = load(job ...) returns the contents of job into variable S, which is a structcontaining fields matching the variables retrieved.Examples
Classification of Variables in parfor-Loops
When a name in a parfor-loop is recognized as referring to
11 Functions — Alphabetical List11-166logoutLog out of MJS clusterSyntaxlogout(c)Descriptionlogout(c) logs you out of the MJS cluster specified by clu
mapreducer11-167mapreducerDefine parallel execution environment for mapreducemapreducer is the execution configuration function for mapreduce. This f
11 Functions — Alphabetical List11-168mapreducer(hcluster) specifies a Hadoop cluster for parallel execution ofmapreduce. hcluster is a parallel.clust
mapreducer11-169Output Argumentsmr — Execution environment for mapreduceMapReducer objectExecution environment for mapreduce, returned as a MapReduce
11 Functions — Alphabetical List11-170methodsList functions of object classSyntaxmethods(obj)out = methods(obj)Argumentsobj An object or an array of o
methods11-171See Alsohelp
11 Functions — Alphabetical List11-172mpiLibConfLocation of MPI implementationSyntax[primaryLib, extras] = mpiLibConfArgumentsprimaryLib MPI implement
mpiLibConf11-173More AboutTipsUnder all circumstances, the MPI library must support all MPI-1 functions. Additionally,the MPI library must support nu
11 Functions — Alphabetical List11-174mpiprofileProfile parallel communication and execution timesSyntaxmpiprofilempiprofile on <options>mpiprof
mpiprofile11-175Option Descriptionadditionally records information about built-infunctions such as eig or labReceive.-messagedetail default-messagede
[Figure: a parfor-loop annotated with its variable classifications: loop variable, sliced input variable, sliced output variable, broadcast variable, temporary variable, reduction variable]
Notes a
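As a sketch of the classifications listed above, a single loop can contain all of them (assuming an open pool; names are illustrative):

```matlab
n = 10;
c = pi;                 % broadcast variable: read, never assigned, in the loop
s = 0;                  % reduction variable: accumulated across iterations
v = rand(1,n);          % sliced input variable: one element read per iteration
out = zeros(1,n);       % sliced output variable: one element set per iteration
parfor i = 1:n          % i is the loop variable
    t = v(i) * c;       % t is a temporary variable: set before use each iteration
    out(i) = t;
    s = s + t;
end
```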
11 Functions — Alphabetical List11-176mpiprofile info returns a profiling data structure with additional fields to the oneprovided by the standard pro
mpiprofile11-177ExamplesIn pmode, turn on the parallel profiler, run your function in parallel, and call the viewer:mpiprofile on;% call your functio
11 Functions — Alphabetical List11-178mpiSettingsConfigure options for MPI communicationSyntaxmpiSettings('DeadlockDetection','on'
mpiSettings11-179ExamplesSet deadlock detection for a communicating job inside the jobStartup.m file for thatjob: % Inside jobStartup.m for the co
mxGPUCopyFromMxArray (C)
Copy mxArray to mxGPUArray

C Syntax
#include "gpu/mxGPUArray.h"
mxGPUArray* mxGPU
mxGPUCopyGPUArray (C)11-181mxGPUCopyGPUArray (C)Duplicate (deep copy) mxGPUArray objectC Syntax#include "gpu/mxGPUArray.h"mxGPUArray* mxGPU
11 Functions — Alphabetical List11-182mxGPUCopyImag (C)Copy imaginary part of mxGPUArrayC Syntax#include "gpu/mxGPUArray.h"mxGPUArray* mxGPU
mxGPUCopyReal (C)11-183mxGPUCopyReal (C)Copy real part of mxGPUArrayC Syntax#include "gpu/mxGPUArray.h"mxGPUArray* mxGPUCopyReal(mxGPUArray
11 Functions — Alphabetical List11-184mxGPUCreateComplexGPUArray (C)Create complex GPU array from two real gpuArraysC Syntax#include "gpu/mxGPUAr
mxGPUCreateFromMxArray (C)11-185mxGPUCreateFromMxArray (C)Create read-only mxGPUArray object from input mxArrayC Syntax#include "gpu/mxGPUArray.
Loop Variable
The loop variable defines the loop index value for each iteration. It is set with the beginning line of a parfor statement
11 Functions — Alphabetical List11-186mxGPUCreateGPUArray (C)Create mxGPUArray object, allocating memory on GPUC Syntax#include "gpu/mxGPUArray.h
mxGPUCreateGPUArray (C)11-187ReturnsPointer to an mxGPUArray.DescriptionmxGPUCreateGPUArray creates a new mxGPUArray object with the specified size,
11 Functions — Alphabetical List11-188mxGPUCreateMxArrayOnCPU (C)Create mxArray for returning CPU data to MATLAB with data from GPUC Syntax#include &q
mxGPUCreateMxArrayOnGPU (C)11-189mxGPUCreateMxArrayOnGPU (C)Create mxArray for returning GPU data to MATLABC Syntax#include "gpu/mxGPUArray.h&qu
11 Functions — Alphabetical List11-190mxGPUDestroyGPUArray (C)Delete mxGPUArray objectC Syntax#include "gpu/mxGPUArray.h"mxGPUDestroyGPUArra
mxGPUGetClassID (C)11-191mxGPUGetClassID (C)mxClassID associated with data on GPUC Syntax#include "gpu/mxGPUArray.h"mxClassID mxGPUGetClass
11 Functions — Alphabetical List11-192mxGPUGetComplexity (C)Complexity of data on GPUC Syntax#include "gpu/mxGPUArray.h"mxComplexity mxGPUGe
mxGPUGetData (C)11-193mxGPUGetData (C)Raw pointer to underlying dataC Syntax#include "gpu/mxGPUArray.h"void* mxGPUGetData(mxGPUArray const
11 Functions — Alphabetical List11-194mxGPUGetDataReadOnly (C)Read-only raw pointer to underlying dataC Syntax#include "gpu/mxGPUArray.h"voi
mxGPUGetDimensions (C)11-195mxGPUGetDimensions (C)mxGPUArray dimensionsC Syntax#include "gpu/mxGPUArray.h"mwSize const * mxGPUGetDimensions
2 Parallel for-Loops (parfor)2-26More About• “Classification of Variables in parfor-Loops” on page 2-23
11 Functions — Alphabetical List11-196mxGPUGetNumberOfDimensions (C)Size of dimension array for mxGPUArrayC Syntax#include "gpu/mxGPUArray.h"
mxGPUGetNumberOfElements (C)11-197mxGPUGetNumberOfElements (C)Number of elements on GPU for arrayC Syntax#include "gpu/mxGPUArray.h"mwSize
11 Functions — Alphabetical List11-198mxGPUIsSame (C)Determine if two mxGPUArrays refer to same GPU dataC Syntax#include "gpu/mxGPUArray.h"i
mxGPUIsSparse (C)11-199mxGPUIsSparse (C)Determine if mxGPUArray contains sparse GPU dataC Syntax#include "gpu/mxGPUArray.h"int mxGPUIsSpars
11 Functions — Alphabetical List11-200mxGPUIsValidGPUData (C)Determine if mxArray is pointer to valid GPU dataC Syntax#include "gpu/mxGPUArray.h&
mxInitGPU (C)11-201mxInitGPU (C)Initialize MATLAB GPU library on currently selected deviceC Syntax#include "gpu/mxGPUArray.h" int mxInitGPU
11 Functions — Alphabetical List11-202mxIsGPUArray (C)Determine if mxArray contains GPU dataC Syntax#include "gpu/mxGPUArray.h"int mxIsGPUAr
NaN
Array of Not-a-Numbers

Syntax
A = NaN(sz,arraytype)
A = NaN(sz,datatype,arraytype)
A = NaN(sz,'like',P)
A = NaN(sz,datatype,'like',P)
11 Functions — Alphabetical List11-204Argument Values Descriptions'gpuArray'Specifies gpuArray.datatype'double' (default),'si
NaN11-205Create Codistributed NaN MatrixCreate a 1000-by-1000 codistributed double matrix of NaNs, distributed by its seconddimension (columns).spmd(
Sliced Variables
A sliced variable is one whose value can be broken up into segments, or slices, which are then operated on separately
11 Functions — Alphabetical List11-206numlabsTotal number of workers operating in parallel on current jobSyntaxn = numlabsDescriptionn = numlabs retur
ones11-207onesArray of onesSyntaxN = ones(sz,arraytype)N = ones(sz,datatype,arraytype)N = ones(sz,'like',P)N = ones(sz,datatype,'like&
11 Functions — Alphabetical List11-208Argument Values Descriptions'gpuArray'Specifies gpuArray.datatype'double' (default),'si
ones11-209ExamplesCreate Distributed Ones MatrixCreate a 1000-by-1000 distributed array of ones with underlying class double:D = ones(1000,'dist
11 Functions — Alphabetical List11-210pagefunApply function to each page of array on GPUSyntaxA = pagefun(FUN,B)A = pagefun(FUN,B,C,...)[A,B,...] = pa
pagefun11-211FUN must be a handle to a function that is written in the MATLAB language (i.e., not abuilt-in function or a MEX-function).Currently the
11 Functions — Alphabetical List11-212B = rand(K,N,P,'gpuArray');C = pagefun(@mtimes,A,B);s = size(C) % returns M-by-N-by-P s = 300
parallel.cluster.Hadoop11-213parallel.cluster.HadoopCreate Hadoop cluster objectSyntaxhcluster = parallel.cluster.Hadoophcluster = parallel.cluster.H
11 Functions — Alphabetical List11-214Input ArgumentsName-Value Pair ArgumentsSpecify optional comma-separated pairs of Name,Value arguments. Name is
parallel.cluster.Hadoop11-215See Alsomapreduce | mapreducer
After the first level, you can use any type of valid MATLAB indexing in the second and further levels.
The variable A s
11 Functions — Alphabetical List11-216parallel.clusterProfilesNames of all available cluster profilesSyntaxALLPROFILES = parallel.clusterProfiles[ALLP
parallel.clusterProfiles11-217allNames = parallel.clusterProfiles()myCluster = parcluster(allNames{end});See Alsoparallel.defaultClusterProfile | par
11 Functions — Alphabetical List11-218parallel.defaultClusterProfileExamine or set default cluster profileSyntaxp = parallel.defaultClusterProfileoldp
parallel.defaultClusterProfile11-219oldDefault = parallel.defaultClusterProfile('Profile2');strcmp(oldDefault,'MyProfile') % retu
11 Functions — Alphabetical List11-220parallel.exportProfileExport one or more profiles to fileSyntaxparallel.exportProfile(profileName, filename)para
parallel.exportProfile11-221notLocal = ~strcmp(allProfiles,'local');profilesToExport = allProfiles(notLocal);if ~isempty(profilesToExport)
11 Functions — Alphabetical List11-222parallel.gpu.CUDAKernelCreate GPU CUDA kernel object from PTX and CU codeSyntaxKERN = parallel.gpu.CUDAKernel(PT
parallel.gpu.CUDAKernel11-223 int idx = blockIdx.x * blockDim.x + threadIdx.x; if (idx < vecLen) { pi[idx] += c;}and simpleEx.ptx contai
11 Functions — Alphabetical List11-224parallel.importProfileImport cluster profiles from fileSyntaxprof = parallel.importProfile(filename)Descriptionp
parallel.importProfile11-225Import all the profiles from the file ManyProfiles.settings, and use the first one toopen a parallel pool.profs = paralle
Sliced Variables2-29a simple (nonindexed) broadcast variable; and every other index is a scalar constant, asimple broadcast variable, a nested for-lo
11 Functions — Alphabetical List11-226parclusterCreate cluster objectSyntaxc = parclusterc = parcluster(profile)Descriptionc = parcluster returns a cl
parcluster11-227parpool(myCluster);Find a particular cluster using the profile named 'MyProfile', and create anindependent job on the clust
11 Functions — Alphabetical List11-228parfevalExecute function asynchronously on parallel pool workerSyntaxF = parfeval(p,fcn,numout,in1,in2,...)F = p
parfeval11-229for idx = 1:10 f(idx) = parfeval(p,@magic,1,idx); % Square size determined by idxend% Collect the results as they become available.mag
11 Functions — Alphabetical List11-230parfevalOnAllExecute function asynchronously on all workers in parallel poolSyntaxF = parfevalOnAll(p,fcn,numout
parfor
Execute loop iterations in parallel

Syntax
parfor loopvar = initval:endval, statements, end
parfor (loopvar = initval:endval, M), statements, end
11 Functions — Alphabetical List11-232than that number, even if additional workers are available. If you request more resourcesthan are available, MAT
parfor11-233Notably, the assignments to the variables i, t, and u do not affect variables with thesame name in the context of the parfor statement. T
11 Functions — Alphabetical List11-234are necessary for its execution, then automatically attaches those files to the parallelpool so that the code is
parpool11-235parpoolCreate parallel pool on clusterSyntaxparpoolparpool(poolsize)parpool(profilename)parpool(profilename,poolsize)parpool(cluster)par
However, if it is clear that in every iteration, every reference to an array element is set before it is used, the var
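A minimal sketch of a correctly sliced variable, following the rules above (assuming an open pool; names are illustrative):

```matlab
n = 8;
A = zeros(4,n);
parfor i = 1:n
    % A is sliced: the first-level index is the loop variable,
    % so each iteration touches only its own column A(:,i).
    A(:,i) = i * ones(4,1);
end
```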
11 Functions — Alphabetical List11-236parpool( ___ ,Name,Value) applies the specified values for certain properties whenstarting the pool.poolobj = pa
parpool11-237Return Pool Object and Delete PoolCreate a parallel pool with the default profile, and later delete the pool.poolobj = parpool;delete(po
11 Functions — Alphabetical List11-238Example: c = parcluster();Name-Value Pair ArgumentsSpecify optional comma-separated pairs of Name,Value argument
parpool11-239More AboutTips• The pool status indicator in the lower-left corner of the desktop shows the clientsession connection to the pool and the
11 Functions — Alphabetical List11-240This slight difference in behavior might be an issue in a mixed-platform environmentwhere the client is not the
parpool11-241P7P8• “Parallel Preferences”• “Clusters and Cluster Profiles”• “Pass Data to and from Worker Sessions”See AlsoComposite | delete | distr
11 Functions — Alphabetical List11-242pausePause MATLAB job scheduler queueSyntaxpause(mjs)Argumentsmjs MATLAB job scheduler object whose queue is pau
pctconfig11-243pctconfigConfigure settings for Parallel Computing Toolbox client sessionSyntaxpctconfig('p1', v1, ...)config = pctconfig(&a
11 Functions — Alphabetical List11-244If the property is 'hostname', the specified value is used to set the hostname forthe client session o
pctRunDeployedCleanup11-245pctRunDeployedCleanupClean up after deployed parallel applicationsSyntaxpctRunDeployedCleanupDescriptionpctRunDeployedClea
Broadcast Variables
A broadcast variable is any variable other than the loop variable or a sliced variable that is not affected by an assignment inside the loop.
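As a sketch, M below is a broadcast variable: it is read but never assigned inside the loop, and because the whole array is used by each iteration, a full copy goes to every worker (assumes an open pool; names are illustrative):

```matlab
M = magic(100);          % broadcast variable: whole array used by every iteration
r = zeros(1,100);
parfor i = 1:100
    r(i) = i + trace(M); % each worker needs all of M, so M is broadcast
end
```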
11 Functions — Alphabetical List11-246pctRunOnAllRun command on client and all workers in parallel poolSyntaxpctRunOnAll commandDescriptionpctRunOnAll
pctRunOnAll11-247See Alsoparpool
11 Functions — Alphabetical List11-248ploadLoad file into parallel sessionSyntaxpload(fileroot)Argumentsfileroot Part of filename common to all saved
pload11-249This creates three files (threeThings1.mat, threeThings2.mat,threeThings3.mat) in the current working directory.Clear the workspace on all
pmode
Interactive Parallel Command Window

Syntax
pmode start
pmode start numworkers
pmode start prof numworkers
pmode
pmode11-251pmode quit or pmode exit stops the pmode job, deletes it, and closes the ParallelCommand Window. You can enter this command at the MATLAB
11 Functions — Alphabetical List11-252pmode start local 4Start pmode using the profile myProfile and eight workers on the cluster.pmode start myProfil
poolStartup11-253poolStartupFile for user-defined options to run on each worker when parallel pool startsSyntaxpoolStartupDescriptionpoolStartup runs
11 Functions — Alphabetical List11-254See AlsojobStartup | taskFinish | taskStartup
promote11-255promotePromote job in MJS cluster queueSyntaxpromote(c,job)Argumentsc The MJS cluster object that contains the job.job Job object promot
Reduction Variables
MATLAB supports an important exception, called reductions, to the rule that loop iterations must be independent.
11 Functions — Alphabetical List11-256Examine the new queue sequence:[pjobs, qjobs, rjobs, fjobs] = findJob(c);get(qjobs,'Name') 'Jo
psave11-257psaveSave data from communicating job sessionSyntaxpsave(fileroot)Argumentsfileroot Part of filename common to all saved files.Description
11 Functions — Alphabetical List11-258Clear the workspace on all the workers and confirm there are no variables.clear allwhosLoad the previously saved
rand
Array of rand values

Syntax
R = rand(sz,arraytype)
R = rand(sz,datatype,arraytype)
R = rand(sz,'like',P)
R = rand(sz,datatype,'like',P)
11 Functions — Alphabetical List11-260Argument Values Descriptions'gpuArray'Specifies gpuArray.datatype'double' (default),'si
rand11-261Create Codistributed Rand MatrixCreate a 1000-by-1000 codistributed double matrix of rands, distributed by its seconddimension (columns).sp
11 Functions — Alphabetical List11-262randiArray of random integersSyntaxR = randi(valrange,sz,arraytype)R = randi(valrange,sz,datatype,arraytype)R =
randi11-263Argument Values Descriptions'codistributed'Specifies codistributed array, using the defaultdistribution scheme.'gpuArray&ap
11 Functions — Alphabetical List11-264ExamplesCreate Distributed Randi MatrixCreate a 1000-by-1000 distributed array of randi values from 1 to 100, wi
randn11-265randnArray of randn valuesSyntaxR = randn(sz,arraytype)R = randn(sz,datatype,arraytype)R = randn(sz,'like',P)R = randn(sz,dataty
parfor i = 1:n
    X = X + d(i);
end

This loop is equivalent to the following, where each d(i) is calculated by a different iteration:
11 Functions — Alphabetical List11-266Argument Values Descriptions'gpuArray'Specifies gpuArray.datatype'double' (default),'si
randn11-267Create Codistributed Randn MatrixCreate a 1000-by-1000 codistributed double matrix of randn values, distributed by itssecond dimension (co
11 Functions — Alphabetical List11-268recreateCreate new job from existing jobSyntaxnewjob = recreate(oldjob)newjob = recreate(oldjob,'TaskID&apo
recreate11-269Recreate a Job with Specified TasksThis example shows how to recreate an independent job, which has only the tasks withIDs 21 to 32 fro
11 Functions — Alphabetical List11-270redistributeRedistribute codistributed array with another distribution schemeSyntaxD2 = redistribute(D1, codist)
reset11-271resetReset GPU device and clear its memorySyntaxreset(gpudev)Descriptionreset(gpudev) resets the GPU device and clears its memory of gpuAr
11 Functions — Alphabetical List11-272M % Display gpuArray 16 2 3 13 5 11 10 8 9 7 6 12 4 14 15
reset11-273clear MSee AlsogpuDevice | gpuArray | parallel.gpu.CUDAKernel
11 Functions — Alphabetical List11-274resumeResume processing queue in MATLAB job schedulerSyntaxresume(mjs)Argumentsmjs MATLAB job scheduler object w
saveAsProfile11-275saveAsProfileSave cluster properties to specified profileDescriptionsaveAsProfile(cluster,profileName) saves the properties of the
Required (static): If the reduction assignment uses * or [,], then in every reduction assignment for X, X must be consistently specified as the first argument or consistently specified as the second.
11 Functions — Alphabetical List11-276saveProfileSave modified cluster properties to its current profileDescriptionsaveProfile(cluster) saves the modi
saveProfile11-277 Properties: Profile: local Modified: false Host: H
11 Functions — Alphabetical List11-278setConstantMemorySet some constant memory on GPUSyntaxsetConstantMemory(kern,sym,val)setConstantMemory(kern,sym1
setConstantMemory11-279setConstantMemory(KERN,'N1',int32(10));setConstantMemory(KERN,'N2',int32(10));setConstantMemory(KERN,&apos
11 Functions — Alphabetical List11-280setJobClusterDataSet specific user data for job on generic clusterSyntaxsetJobClusterData(cluster,job,userdata)A
size11-281sizeSize of object arraySyntaxd = size(obj)[m,n] = size(obj)[m1,m2,m3,...,mn] = size(obj)m = size(obj,dim)Argumentsobj An object or an arra
11 Functions — Alphabetical List11-282See Alsolength
sparse11-283sparseCreate sparse distributed or codistributed matrixSyntaxSD = sparse(FD)SC = sparse(m,n,codist)SC = sparse(m,n,codist,'noCommuni
11 Functions — Alphabetical List11-284To simplify this six-argument call, you can pass scalars for the argument v and one of thearguments i or j, in w
sparse11-285Create a sparse codistributed array from vectors of indices and a distributed array ofelement values:r = [ 1 1 4 4 8];c = [ 1 4 1 4
beginning of each iteration. The parfor on the right is correct, because it does not assign f inside the loop:

Invalid                      Valid
f =
spmd
Execute code in parallel on workers of parallel pool

Syntax
spmd, statements, end
spmd(n), statements, end
spmd(
spmd11-287By default, MATLAB uses as many workers as it finds available in the pool. When thereare no MATLAB workers available, MATLAB executes the b
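A minimal spmd sketch under the behavior described above (assuming an open pool; names are illustrative):

```matlab
spmd
    % The same statements run on every worker; labindex distinguishes them.
    localData = rand(1000,1) + labindex;
    localMean = mean(localData);
end
% Back on the client, localMean is a Composite, indexed by worker.
m1 = localMean{1};
```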
11 Functions — Alphabetical List11-288• If the AutoAttachFiles property in the cluster profile for the parallel pool is set totrue, MATLAB performs an
submit11-289submitQueue job in schedulerSyntaxsubmit(j)Argumentsj Job object to be queued.Descriptionsubmit(j) queues the job object j in its cluster
11 Functions — Alphabetical List11-290More AboutTipsWhen a job is submitted to a cluster queue, the job’s State property is set to queued,and the job
subsasgn11-291subsasgnSubscripted assignment for CompositeSyntaxC(i) = {B}C(1:end) = {B}C([i1, i2]) = {B1, B2}C{i} = BDescriptionsubsasgn assigns rem
11 Functions — Alphabetical List11-292subsrefSubscripted reference for CompositeSyntaxB = C(i)B = C([i1, i2, ...])B = C{i}[B1, B2, ...] = C{[i1, i2, .
taskFinish11-293taskFinishUser-defined options to run on worker when task finishesSyntaxtaskFinish(task)Argumentstask The task being evaluated by the
11 Functions — Alphabetical List11-294taskStartupUser-defined options to run on worker when task startsSyntaxtaskStartup(task)Argumentstask The task b
true11-295trueArray of logical 1 (true)SyntaxT = true(sz,arraytype)T = true(sz,'like',P)C = true(sz,codist)C = true(sz, ___ ,codist,'n
parfor statement might produce values of X with different round-off errors. This is an unavoidable cost of parallelism.
11 Functions — Alphabetical List11-296reference pages for codistributor1d and codistributor2dbc. To use the defaultdistribution scheme, you can specif
true11-297Each worker contains a 100-by-labindex local piece of C.Create gpuArray True MatrixCreate a 1000-by-1000 gpuArray of trues:G = true(1000,&a
11 Functions — Alphabetical List11-298updateAttachedFilesUpdate attached files or folders on parallel poolSyntaxupdateAttachedFiles(poolobj)Descriptio
updateAttachedFiles11-299See AlsoaddAttachedFiles | gcp | listAutoAttachedFiles | parpool
11 Functions — Alphabetical List11-300waitWait for job to change stateSyntaxwait(j)wait(j,'state')wait(j,'state',timeout)Arguments
wait11-301Note Simulink models cannot run while a MATLAB session is blocked by wait. If youmust run Simulink from the MATLAB client while also runnin
11 Functions — Alphabetical List11-302wait (FevalFuture)Wait for futures to completeSyntaxOK = wait(F)OK = wait(F,STATE)OK = wait(F,STATE,TIMEOUT)Desc
wait (GPUDevice)11-303wait (GPUDevice)Wait for GPU calculation to completeSyntaxwait(gpudev)Descriptionwait(gpudev) blocks execution in MATLAB until
11 Functions — Alphabetical List11-304zerosArray of zerosSyntaxZ = zeros(sz,arraytype)Z = zeros(sz,datatype,arraytype)Z = zeros(sz,'like',P)
zeros11-305Argument Values Descriptions'gpuArray'Specifies gpuArray.datatype'double' (default),'single', 'int8&apo
f(e,a) = a = f(a,e)

Examples of identity elements for some functions are listed in this table.

Function        Identity Element
+               0
* and .*        1
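To sketch why the identity element matters: initializing the reduction variable to the identity of the reduction function keeps the result independent of how iterations are split among workers (names are illustrative):

```matlab
r = 1;              % identity element for multiplication
parfor i = 1:5
    r = r * i;      % reduction assignment; r is consistently the first operand
end
% r now equals prod(1:5)
```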
11 Functions — Alphabetical List11-306ExamplesCreate Distributed Zeros MatrixCreate a 1000-by-1000 distributed array of zeros with underlying class do
Glossary-1GlossaryCHECKPOINTBASEThe name of the parameter in the mdce_def file thatdefines the location of the checkpoint directories for theMATLAB jo
GlossaryGlossary-2distributed applicationThe same application that runs independently on severalnodes, possibly with different input parameters. There
GlossaryGlossary-3homogeneous clusterA cluster of identical machines, in terms of both hardwareand software.independent jobA job composed of independ
GlossaryGlossary-4nodeA computer that is part of a cluster.parallel applicationThe same application that runs on several workerssimultaneously, with c
GlossaryGlossary-5worker checkpointinformationFiles required by the worker during the execution oftasks.
First consider the reduction function itself. To compare an iteration's result against another's, the functi
Temporary Variables
A temporary variable is any variable that is the target of a direct, nonindexed assignment, but is not a reduction variable.
        b = false;
    end
    ...
end

This loop is acceptable as an ordinary for-loop, but as a parfor-loop, b is a temporary variable
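A minimal sketch of the temporary-variable behavior described above (names are illustrative):

```matlab
total = 0;
parfor i = 1:10
    d = 2 * i;           % d is a temporary: cleared before each iteration
    total = total + d;   % total is a reduction variable
end
% d is undefined here: temporaries do not survive the parfor-loop
```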
• “Reduction Variables” on page 2-32
Improving parfor Performance

Where to Create Arrays
With a parfor-loop, it might be faster to have each MATLAB worker c
before the loop (as shown on the left below), rather than have each worker create its own arrays inside the loop (as s
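As a sketch of the tradeoff discussed above, the two placements look like this; which is faster depends on the array size and the loop body (assumes an open pool; names are illustrative):

```matlab
% Created on the client before the loop: the array is sent to the workers.
A = rand(1000);
r1 = zeros(1,20);
parfor i = 1:20
    r1(i) = A(i,i) + i;
end

% Created inside the loop: each iteration builds its own array on a worker.
r2 = zeros(1,20);
parfor i = 1:20
    B = rand(1000);
    r2(i) = B(i,i) + i;
end
```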
2 Parallel for-Loops (parfor)2-44Parallel PoolsIn this section...“What Is a Parallel Pool?” on page 2-44“Automatically Start and Stop a Parallel Pool
Parallel Pools2-45Automatically Start and Stop a Parallel PoolBy default, a parallel pool starts automatically when needed by certain parallel langua
To open a parallel pool based on your preference settings:
parpool

To open a pool of a specific size:
parpool(4)

To use a
Parallel Pools2-47If you specify a pool size at the command line, this overrides the setting of yourpreferences. But this value must fall within the
2 Parallel for-Loops (parfor)2-48Convert Nested for-Loops to parforA typical use case for nested loops is to step through an array using one loop vari
M1 = magic(5);
for a = 1:5
    parfor b = 1:5
        M2(a,b) = a*10 + b + M1(a,b)/10000;
    end
end
M2

In this case,
3Single Program Multiple Data (spmd)• “Execute Simultaneously on Multiple Data Sets” on page 3-2• “Access Worker Variables with Composites” on page 3-
3 Single Program Multiple Data (spmd)3-2Execute Simultaneously on Multiple Data SetsIn this section...“Introduction” on page 3-2“When to Use spmd” on
Execute Simultaneously on Multiple Data Sets3-3Define an spmd StatementThe general form of an spmd statement is:spmd <statements>endNote If
3 Single Program Multiple Data (spmd)3-4 R = rand(4,4);endNote All subsequent examples in this chapter assume that a parallel pool is open andremai
Execute Simultaneously on Multiple Data Sets3-5Display OutputWhen running an spmd statement on a parallel pool, all command-line output from theworke
3 Single Program Multiple Data (spmd)3-6Access Worker Variables with CompositesIn this section...“Introduction to Composites” on page 3-6“Create Compo
Access Worker Variables with Composites3-7 3 5 7 4 9 2MM{2} 16 2 3 13 5 11 10 8 9 7 6
3 Single Program Multiple Data (spmd)3-8 0 0 1 0 0 0 0 1Data transfers from worker to client when you explicit
Access Worker Variables with Composites3-9AA(:) % Composite [1] [2] [3] [4]spmd AA = AA * 2; % Multiply existing valueendAA(:) % Com
3 Single Program Multiple Data (spmd)3-10Distribute ArraysIn this section...“Distributed Versus Codistributed Arrays” on page 3-10“Create Distributed
Distribute Arrays3-11requirements. These overloaded functions include eye(___,'distributed'),rand(___,'distributed'), etc. For a
parpool('local',2) % Create pool
spmd
    codist = codistributor1d(3,[4,12]);
    Z = zeros(3,3,16,codist);
Programming Tips3-13Programming TipsIn this section...“MATLAB Path” on page 3-13“Error Handling” on page 3-13“Limitations” on page 3-13MATLAB PathAll
X = 5;
spmd
    eval('X');
end

Similarly, you cannot clear variables from a worker's workspace by
Programming Tips3-15run in parallel in another parallel pool, but runs serially in a single thread on the workerrunning its containing function.Neste
4Interactive Parallel Computation withpmodeThis chapter describes interactive pmode in the following sections:• “pmode Versus spmd” on page 4-2• “Run
4 Interactive Parallel Computation with pmode4-2pmode Versus spmdpmode lets you work interactively with a communicating job running simultaneouslyon s
Run Communicating Jobs Interactively Using pmode4-3Run Communicating Jobs Interactively Using pmodeThis example uses a local scheduler and runs the w
4. A variable does not necessarily have the same value on every worker. The labindex function returns the
7. Assign a unique value to the array on each worker, dependent on the worker number (labindex). With