.. SPDX-License-Identifier: GPL-2.0

============
Introduction
============

The Linux compute accelerators subsystem is designed to expose compute
accelerators to user-space in a common way and to provide a common set of
functionality.

These devices can be either stand-alone ASICs or IP blocks inside an SoC/GPU.
Although these devices are typically designed to accelerate Machine-Learning
(ML) and/or Deep-Learning (DL) computations, the accel layer is not limited
to handling these types of accelerators.

Typically, a compute accelerator will belong to one of the following
categories:

- Edge AI - doing inference at an edge device. It can be an embedded
  ASIC/FPGA, or an IP inside an SoC (e.g. a laptop web camera). These devices
  are typically configured using registers and can work with or without DMA.

- Inference data-center - single/multi-user devices in a large server. This
  type of device can be stand-alone or an IP inside an SoC or a GPU. It will
  have on-board DRAM (to hold the DL topology), DMA engines and command
  submission queues (either kernel or user-space queues). It might also have
  an MMU to manage multiple users and might also enable virtualization
  (SR-IOV) to support multiple VMs on the same device. In addition, these
  devices will usually ship with tools such as a profiler and a debugger.

- Training data-center - similar to inference data-center cards, but these
  typically have more computational power and memory bandwidth (e.g. HBM),
  and will likely have a method of scaling up or out, i.e. connecting to
  other training cards inside the same server or in other servers,
  respectively.

All these devices typically have different runtime user-space software stacks
that are tailor-made for their hardware. In addition, they will probably also
include a compiler to generate programs for their custom computational
engines. Typically, the common layer in user-space will be the DL frameworks,
such as PyTorch and TensorFlow.
Sharing code with DRM
=====================

Because these devices can be IP blocks inside GPUs, or can have
characteristics similar to those of GPUs, the accel subsystem will use the
DRM subsystem's code and functionality; i.e. the accel core code will be part
of the DRM subsystem, and an accel device will be a new type of DRM device.

This will allow us to leverage the extensive DRM code-base and collaborate
with DRM developers who have experience with this type of device. In
addition, new features added for the accelerator drivers can be of use to
GPU drivers as well.
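To make the sharing concrete, the sketch below shows roughly what driver-side
registration looks like when the accel core is reached through the ordinary
DRM entry points: the ``DRIVER_COMPUTE_ACCEL`` feature flag is what marks the
device as an accelerator rather than a GPU. All ``foo_*`` names are
hypothetical; this is an illustrative sketch, not an existing driver:

.. code-block:: c

    #include <linux/err.h>
    #include <linux/module.h>
    #include <drm/drm_accel.h>
    #include <drm/drm_drv.h>

    /* Hypothetical device state, embedding the DRM device. */
    struct foo_device {
            struct drm_device drm;
    };

    /* File operations wired to the accel core's open/ioctl paths. */
    DEFINE_DRM_ACCEL_FOPS(foo_accel_fops);

    static const struct drm_driver foo_accel_driver = {
            /*
             * DRIVER_COMPUTE_ACCEL instead of DRIVER_RENDER or
             * DRIVER_MODESET: the device is managed by the accel core
             * and gets an accel char device, not a GPU node.
             */
            .driver_features = DRIVER_COMPUTE_ACCEL,
            .fops = &foo_accel_fops,
            .name = "foo_accel",
            .desc = "Hypothetical compute accelerator",
    };

    static int foo_probe(struct device *parent)
    {
            struct foo_device *fdev;

            /*
             * Allocate a DRM device embedded in the driver struct;
             * its lifetime is tied to @parent via devres.
             */
            fdev = devm_drm_dev_alloc(parent, &foo_accel_driver,
                                      struct foo_device, drm);
            if (IS_ERR(fdev))
                    return PTR_ERR(fdev);

            /* Make the device visible to user-space. */
            return drm_dev_register(&fdev->drm, 0);
    }

This is the same allocation/registration flow a GPU driver would use; only
the feature flag and the fops helper differ.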
Differentiation from GPUs
=========================

Because we want to prevent the extensive user-space graphic software stack
from trying to use an accelerator as a GPU, the compute accelerators will be
differentiated from GPUs by using a new major number and new device char
files.
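In practice this means the nodes appear under /dev/accel/ (e.g.
/dev/accel/accel0) with the dedicated accel major number (261 on current
kernels) rather than under /dev/dri/ with DRM's major 226. A minimal
user-space sketch, assuming a single accelerator is present and exposed as
/dev/accel/accel0:

.. code-block:: c

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/sysmacros.h>
    #include <unistd.h>

    int main(void)
    {
            struct stat st;
            /* Accel devices live under /dev/accel/, not /dev/dri/. */
            int fd = open("/dev/accel/accel0", O_RDWR);

            if (fd < 0) {
                    perror("open /dev/accel/accel0");
                    return 1;
            }

            /* The printed major differs from DRM's 226. */
            if (fstat(fd, &st) == 0)
                    printf("accel node: major %u, minor %u\n",
                           major(st.st_rdev), minor(st.st_rdev));

            close(fd);
            return 0;
    }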
External References
===================

Email threads
-------------

* `Initial discussion on the New subsystem for acceleration devices <https://lore.kernel.org/lkml/CAFCwf11=9qpNAepL7NL+YAV_QO=Wv6pnWPhKHKAepK3fNn+2Dg@mail.gmail.com/>`_ - Oded Gabbay (2022)
* `patch-set to add the new subsystem <https://lore.kernel.org/lkml/20221022214622.18042-1-ogabbay@kernel.org/>`_ - Oded Gabbay (2022)

Conference talks
----------------

* `LPC 2022 Accelerators BOF outcomes summary <https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html>`_ - Dave Airlie (2022)