
prefetch_tuning: This module is based on Kunpeng chip and provides some performa ...


Open-source software name:

prefetch_tuning

Open-source software URL:

https://gitee.com/openeuler/prefetch_tuning

Open-source software description:


Prefetch Tuning

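This kernel module is a register read/write interface designed for Kunpeng chips. It reads and configures chip performance tuning parameters at the CPU hardware level.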

Environment

  • Hardware: based on the Kunpeng 920 chip

  • Operating system: openEuler or EulerOS

Build, install, uninstall

make clean && make
insmod prefetch_tuning.ko

Use the following command to confirm that the kernel module has been inserted:

lsmod | grep prefetch_tuning
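The check above matches any line containing the string, so a module with a longer name would also match. As a minimal sketch, the helper below tests for an exact module name instead. The function name `module_loaded` is hypothetical and not part of this project; it reads `/proc/modules` (the file that `lsmod` formats) unless another listing file is given.

```shell
# Return 0 if the named module appears in a modules listing, 1 otherwise.
# $1: module name
# $2: listing file (optional, defaults to /proc/modules)
module_loaded() {
    # The module name is the first whitespace-separated field of each line.
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}
```

For example, `module_loaded prefetch_tuning && echo loaded` prints `loaded` only when a module with exactly that name is present.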

Use the following command to unload the kernel module:

rmmod prefetch_tuning

Unloading the module restores all configured parameters to their defaults.
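Since unloading discards every custom setting, it may be useful to record the current values first. The sketch below copies each readable file from a parameter directory into a backup directory. `save_params` is a hypothetical helper, not something the module provides. It copies every regular file it finds, so generic sysfs entries that sit next to the parameters would be saved too. The saved text is meant for reference only: the format a parameter reports when read is not guaranteed to be accepted when written back.

```shell
# Copy the current content of every regular file in a parameter directory
# into a backup directory, one file per parameter.
# $1: source directory, e.g. /sys/class/misc/prefetch
# $2: backup directory (created if missing)
save_params() {
    mkdir -p "$2" || return 1
    for f in "$1"/*; do
        # Skip subdirectories and anything that cannot be read.
        [ -f "$f" ] && [ -r "$f" ] || continue
        cat "$f" > "$2/$(basename "$f")" 2>/dev/null
    done
    return 0
}
```

A possible invocation before unloading is `save_params /sys/class/misc/prefetch /tmp/prefetch-backup`, where the backup path is only an example.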

Tunable parameters

The following parameters can currently be tuned (support for more register bits may be added later):

| Parameter | Description | Value range (0/1 flag or integer) |
| --- | --- | --- |
| policy | Prefetch policy. | 0 ~ 15 |
| read_unique | Whether to allow cross-NUMA access to cache. | 0 (allow), 1 (forbid) |
| reg_nosnp_atomic_bypass_en | Whether to bypass atomic operations of CPUs. | 0 (disable), 1 (enable) |
| reg_ro_alloc_shut_en | Indicates whether to enable the function of allocating readOnce operations to L3. | 0 (disable), 1 (enable) |
| reg_wrfull_hit_shut_en | Indicates whether to disable the function. When 64wu_full hits pipe M, do not send createE to the HA. | 1 (enable), 0 (disable) |
| req_conflict_en | Whether to enable backpressure on the CPU in one beat if requests from the CPU and non-CPU are received at the same beat. | 1 (enable), 0 (disable) |
| lower_power_en | Whether to enable the CQ low-power mode. | 1 (enable), 0 (disable) |
| dataclean_shut_en | Whether to mask the CE bit carried by the writenosnoopfull of the TaiShan core. | 0 (not shielded), 1 (shielded) |
| arb_flush_shut_en | When the CQ is idle, the enable status of the ARBIT adjustment scheduling pointer is reset. | 0 (enable), 1 (disable) |
| pgnt_arb_exat_shut_en | Whether to enable the scheduling of the pgrant in the try mode. | 0 (enable), 1 (disable) |
| fast_exter_shut_en | Whether to disable the external request data of miss from passing through the fast path. | 1 (disable), 0 (enable) |
| fast_data_shut_en | Whether to disable the miss data from passing through the fast path. | 1 (disable), 0 (enable) |
| pend_data_shut_en | Whether to disable the miss data from passing through the pend channel. | 1 (disable), 0 (enable) |
| ramswap_full_shut_en | Full or partial when doing ramswap. | 0 (full), 1 (partial) |
| ramfwd_shut_en | Whether to enable the ramfwd function. | 1 (disable), 0 (enable) |
| reads_upgrade_en | Enable read_shared operation status promotion. | 0 (disable), 1 (enable) |
| rdmerge_pipe_en | Whether to allow Sqmerge requests to be hit in Cpipe5. | 1 (enable), 0 (disable) |
| spill_en | Whether the L3T spill function is enabled. | 0 (disable), 1 (enable) |
| spill_shared_en | Whether L3T enables the spill function in the shared state. | 0 (disable), 1 (enable) |
| spill_instr_en | Whether the L3T enables the instruction spill function. | 0 (disable), 1 (enable) |
| sqrdmerge_en | Enable RDMERGE acceleration after SQ merge operation. | 0 (disable), 1 (enable) |
| prefetch_drop_en | Whether to discard prefetch packets in L3T mode. | 0 (disable), 1 (enable) |
| datapull_en | Whether the L3T enables the data pull function. | 0 (disable), 1 (enable) |
| mkinvld_en | Whether L3T is enabled to convert makeinvalid to cleaninvalid. | 0 (enable), 1 (disable) |
| ramthr_en | Whether to allow L3D to directly return data to the CPU through the thr channel. | 1 (enable), 0 (disable) |
| rsperr_en | Indicates whether to report rsperr. | 1 (enable), 0 (disable) |
| iocapacity_limit_en | Whether to limit the I/O capacity of cache. | 0 (no limit), 1 (limit) |
| force_cq_clk_en | Whether to enable the cache queue clock forcibly for L3T. | 0 (disable), 1 (enable) |
| sqmerge_en | Whether consecutive address access can occupy only one entry in the squeue to accelerate the merge process. | 0 (limit), 1 (merge) |
| rdmerge_upgrade_en | Whether to allow the RS to merge with the preceding ReadE. | 0 (disable), 1 (allow) |
| prefetch_drop_hha_en | Whether to merge a non-prefetch operation with the previous prefetch operation. | 0 (allow), 1 (limit) |
| tag_rep_alg | Selects the cache line replacement algorithm. | 0 (random), 1 (drrip), 2 (plru), 3 (random) |
| rdnosnp_nca_shut_en | Whether to mark the readnosnp of the bypass sent by the CPU as NCA. | 0 (yes), 1 (no) |
| wrfull_create_en | Whether to enable the 128-byte writeunique function to obtain the permission but not data from the HHA. | 0 (disable), 1 (enable) |
| cleanunique_data_en | Whether cleanunique returns data. | 0 (disable), 1 (enable) |
| lock_share_req_en | Whether to enable the register lock in share mode and not to deliver operations to the HHA. | 0 (prohibited), 1 (allowed) |
| ddr_compress_opt_en | Optimization switch for HHA compression access support. | 0 (disable), 1 (enable) |
| atomic_monitor_en | Specifies whether to enable the atomic_monitor function. | 0 (disable), 1 (enable) |
| snpsleep_en | Whether to enable snp sleep. | 0 (disable), 1 (enable) |
| prefetchtgt_en | Whether to enable the prefetchtgt. | 0 (disable), 1 (enable) |
| sequence_shape_en | Enable pushing back to the CPU for several cycles when the SQ is about to be full. | 0 (disable), 1 (enable) |
| mpam_portion_en | Enable the function of allocating MPAM based on the way. | 0 (disable), 1 (enable) |
| mpam_capacity_en | Enable the function of allocating MPAM based on capacity statistics. | 0 (disable), 1 (enable) |
| eccchk_en | Enable ECC_CHK. | 0 (disable), 1 (enable) |
| refill_1024_relax_en | Whether to use the 1024-bit size to send requests for access. | 0 (disable), 1 (enable) |
| lookup_thr_en | Whether to enable the through channel during pipeline query. | 0 (disable), 1 (enable) |
| snpunique_stash_en | Support receiving hydra SnpUniqueStash. | 0 (forbid), 1 (support) |
| prime_timeout_mask_en | Enable the count for timeout. | 0 (disable), 1 (enable) |
| prime_sleep_mask_en | Enable the function of releasing a sleep request after a period of time. | 0 (disable), 1 (enable) |
| prime_extend_mask_en | Whether to enable random allocation of a request to extendway. | 0 (disable), 1 (enable) |
| force_intl_allocate_fail | Enable the function of forcibly determining that the assign operation of the intleave type fails. | 0 (disable), 1 (enable) |
| cpu_write_unique_stream_en | Whether to forcibly process the writeunique operation delivered by the CPU as the stream type. | 0 (disable), 1 (enable) |
| cpu_pf_lqos_en | Whether to enable the prefetch operation delivered by the CPU to be forcibly processed as the lqos operation. | 0 (disable), 1 (enable) |
| cpu_vic_lqos_en | Whether to forcibly process the writeunique operation delivered by the CPU as the stream type. | 0 (disable), 1 (enable) |
| prime_excl_mask_en | Whether to enable the random exclusive operation. | 0 (disable), 1 (enable) |
| prime_drop_mask_en | Whether to enable prefetch to retry randomly. | 0 (disable), 1 (enable) |
| prime_home_mask_en | Enable the forcehome processing on internal requests randomly. | 0 (disable), 1 (enable) |
| refillsize_com_ada_en | Whether to enable the auto-sensing of the size of the request sent to the HHA. If the size of the continuously received requests is 128 bytes or 64 bytes, the size of the prefetched request is automatically adjusted. | 0 (disable), 1 (enable adaptive size adjustment) |
| refillsize_pre_ada_en | Whether to enable the adaptation of the size of the request sent to the HHA. If the size of the continuously received request is 128 bytes or 64 bytes, the size of the normal request is automatically adjusted. | 0 (disable), 1 (enable adaptive size adjustment) |
| sequence_opt_en | Whether to change the L3T processing to serial mode when blocked. | 0 (limit), 1 (enable) |
| prefetch_clr_level | Number of requests that fail to find the corresponding prefetch buffer and lower the priority of each buffer to make the existing buffer easier to replace. | 0 ~ 255 |
| prefetch_overide_level | Initial coverage priority for an operation to enter the prefetch buffer. If the value is incorrect, the threshold is decreased by 1. If the value is correct, the threshold is increased by 1. If the value is 0, the prefetch rule needs to be replaced. | 0 ~ 15 |
| prefetch_utl_ddr | The DDR utilization at which the prefetch threshold is halved. | 0 (less than 1/2), 1 (1/2), 2 (3/4), 3 (almost full) |
| prefetch_utl_ddr_en | Whether to allow the automatic threshold reduction according to the DDR utilization. | 0 (forbid), 1 (allow) |
| prefetch_utl_l3t | The L3T utilization at which the prefetch threshold is halved. | 0 (less than 1/2), 1 (1/2), 2 (3/4), 3 (almost full) |
| prefetch_utl_l3t_en | Whether to allow the automatic threshold reduction according to the L3T utilization. | 0 (forbid), 1 (allow) |
| prefetch_vague_en | Indicates whether to enable fuzzy match for prefetch. After the function is enabled, the prefetch summarizes the same 16 KB address rule. The four 4 KB address rules are the same and can be used together. | 0 (disable), 1 (enable) |
| prefetch_core_en | Whether to enable core prefetch. Each bit set to 1 indicates that requests from the matching core are prefetched, e.g. binary 1001 (decimal 9) enables prefetch for core 1 and core 4. Note: this parameter controls the L3T_PREFETCH register, which determines the CPU prefetch policies, so set every bit to 1 to enable the prefetch policies that can be configured by the parameters prefixed with 'prefetch' in this module. (default disabled) | 0 ~ 15 |
| prefetch_match_en | Whether to enable the prefetch operation after the prefetch hit. | 0 (disable), 1 (enable) |
| prefetch_start_level | The number of missing addresses that leads to prefetch. 0 -> 32, 1 -> 2, n-1 -> n | 0 ~ 31 |
| pime_timeout_num | The maximum count of timeout. | 0 ~ 65535 |
| reg_ctrl_spillprefetch | Snoop type configuration of the spill. | 0 (type of request), 1 (prefetch) |
| reg_ctrl_mpamen | Enable HHA MPAM scheduling. | 0 (disable), 1 (enable) |
| reg_ctrl_mpamqos | Enable QoS for modifying the DDR read/write command based on the MPAM monitoring and control bandwidth. | 0 (disable), 1 (enable) |
| reg_ctrl_poison | Enable HHA to return poison. | 0 (disable), 1 (enable) |
| reg_ctrl_compress_spec | Enable the random read of 128-byte data in HHA memory data compression. | 0 (disable), 1 (enable) |
| reg_ctrl_writeevict_drop | Enable the discard of WriteEvictI. | 0 (disable), 1 (enable) |
| reg_ctrl_prefetch_drop | Whether to enable prefetch operation discard. | 0 (disable), 1 (enable) |
| reg_ctrl_dmcassign | DDR access address alignment enable. | 0 (The DDR read operation uses the wrap mode, and the address is 32-byte-aligned. The DDR write operation is always in INCR mode, and the address is aligned with the access boundary), 1 (The DDR read operation is always in INCR mode, and the address is aligned with the access boundary. The DDR write operation is always in INCR mode, and the address is aligned with the access boundary) |
| reg_ctrl_rdatabyp | DDR read data bypass memory enable in the HHA. | 0 (disable), 1 (The internal data of the HHA is bypassed, and the DDR read data can be transmitted quickly) |
| reg_ctrl_excl_clear_dis | Whether to disable the function of periodically clearing HHA non-cacheable exclusive monitor. | 0 (enable), 1 (disable) |
| reg_ctrl_excl_eventen | Enable HHA non-cacheable exclusive monitor event. An event can be sent to wake up the CPU when an address is successfully written or corrupted. | 0 (disable), 1 (enable) |
| reg_ctrl_eccen | Enable the memory ECC error correction in the HHA. | 0 (disable), 1 (enable) |
| reg_readoncesnp_dis | Disable NCA Readonce fixed snoop. | 0 (enable), 1 (disable) |
| reg_cc_exter_stash | L3T configuration of extern snoop stash. | 0 (forbid), 1 (allow) |
| reg_cc_writebacki_spill_full | Enable fixed 128-byte data spill of the WritebackI operation. | 0 (disable), 1 (enable) |
| reg_cc_writeevicti_spill_full | Enable fixed 128-byte data spill of the WriteEvictI operation. | 0 (disable), 1 (enable) |
| reg_cc_stashonce_full | Enable fixed 128-byte data stash of the StashOnce operation. | 0 (disable), 1 (enable) |
| reg_cc_atomicstashl2 | Enable L2 stash of atomic operations. | 0 (disable), 1 (enable) |
| reg_cc_atomicstashl3 | Enable L3 stash of atomic operations. | 0 (disable), 1 (enable) |
| reg_cc_atomicstashclr | Clear L3 stash monitor of atomic operations. | 0 (disable), 1 (enable) |
| reg_cc_cmo_snpme | Enable snoop me for CMO operations. | 0 (disable), 1 (enable) |
| reg_cc_makee_change | Enable HHA MakeE conversion to readE when the HHA MakeE is not self-hit. | 0 (disable), 1 (enable) |
| reg_cc_ioc_hitsca_dis | Disable the function of recording CAIDs when the HHA I/O cache hits the exact directory. | 0 (enable), 1 (disable) |
| reg_cc_passdirty | Enable HHA pass dirty. | 0 (disable), 1 (enable) |
| reg_cc_snpdrop | Enable Snoop Drop. | 0 (disable), 1 (enable) |
| reg_cc_spill | Enable local multi-partition sharing. | 0 (disable), 1 (enable) |
| reg_precisionsnp_dis | Disable HHA precise snoop based on shared directories. | 0 (disable), 1 (enable) |
| reg_notonly_excl | Whether to create new entries for exclusive operations in the HHA share directory buffer. | 0 (only for exclusive operations), 1 (for all operations) |
| reg_miss_allindex | Enable that HHA miss queues are related based on index. | 0 (disable), 1 (enable) |
| reg_miss_cbackth | Enable HHA miss queue copyback request to use second threshold. | 0 (disable), 1 (enable) |
| reg_miss_normalth | Enable HHA miss queue common request to use second threshold. | 0 (disable), 1 (enable) |
| reg_miss_tosdir | Enable HHA only to allow miss alloc to be sent to sdir. | 0 (disable), 1 (enable) |
| reg_entry_except | Exclude the same entry address in HHA. | 0 (disable), 1 (enable) |
| reg_dir_replace_alg | Directory replacement algorithm configuration. | 0 (EDIR random + SDIR random), 1 (EDIR random + SDIR polling), 2 (EDIR PLRU + SDIR random), 3 (EDIR PLRU + SDIR polling) |
| strict_order | Keep the order of HHA operation queue strictly. | 0 (disable), 1 (enable) |
| prefetch_comb | Read operation and prefetchtgt merge enable. | 0 (The read operation can be merged with the fetchtgt operation), 1 (The read operation and the fetchtgt merge operation are not allowed) |
| evict_green | Unblocking configuration of the evict in PQ. | 0 (evict can't be blocked), 1 (evict can be blocked) |
| block_retry | Whether to perform retry configuration directly when the MPAM hardlim flow bandwidth exceeds the configured one such that enters CMD. | 0 (retry directly), 1 (don't retry directly and be scheduled with other flows) |
| buffer_prio | Priority configuration for the ingress queue of the CMD buffer request and PGNT application. | 0 (CMD buffer takes priority over pgnt), 1 (CMD buffer and pgnt have equivalent priority) |
| half_wr_rdddr_delay | Enables the DDR read delay during 64-byte full write operations after compression. | 0 (disable), 1 (enable) |
| wback_cnfl_rdhalf | DDR size configuration that is reread when the writeback conflict occurs. | 0 (depends on Writeback address and size), 1 (size = 128B) |
| reg_funcdis_pendprecision | Enable precise pend. | 0 (pend depends precisely on flit), 1 (pend = 1) |
| reg_funcdis_combrdddr | Reread DDR after multiple adjacent narrow write operations are merged. | 0, 1 |
| reg_funcdis_scramble | Ingress queue scrambling. | 0 (disable), 1 (enable) |
| reg_funcdis_stashidpg | Whether to enable the partial good conversion of the Stash TGTID. | 0 (disable), 1 (enable) |
| reg_funcdis_rdatatime | HHA receives DMC read data anti-starvation threshold configuration. | 0 (threshold = 8), 1 (threshold = 4) |
| reg_funcdis_dmcutl | DMC usage source selection. | 0 (from DDRC), 1 (from queue processing utilization ratio inside HHA) |
| reg_funcdis_cancelexcept | The pipeline index check excludes requests that are not actually queried (for example, prefetchtgt). | 0 (enable exclusion), 1 (disable exclusion) |
| reg_funcdis_ccixcbupdate | Whether to update the directory in the CCIX copyback of the multi-CA. | 0 (allow), 1 (forbid) |
| reg_funcdis_updateopen | Block the update dir command in the processing queue based on index. | 0 (disable), 1 (enable) |
| reg_funcdis_comb | Whether to merge write operations whose size is less than 128 bytes. | 0 (enable), 1 (disables the merge function of the write operation) |
| reg_prefetchtgt_outstanding | Outstanding configuration for the HHA to read data from the DDR prefetch. When the read/write operation sent by the HHA to the DDR exceeds the threshold, the prefetchtgt operation is forbidden to read the DDR data and the operation is directly discarded. This configuration and reg_prefetch_outstanding control the prefetch threshold at the same time. | 0 ~ 127 |
| reg_prefetchtgt_level | Threshold for the HHA to read data from the DDR prefetch. When the DDR read/write operations in the HHA processing queue exceed the threshold, the prefetchtgt operation is forbidden to read the DDR data and the operation is directly discarded. This configuration and reg_prefetch_outstanding control the prefetch threshold at the same time. | 0 ~ 127 |
| reg_spec_rd_level | DDR threshold configuration for speculation read. When the DDR read and write commands in the HHA processing queue exceed the threshold, speculative reading of the DDR is prohibited. After the directory is queried, the system determines whether to read the DDR based on the directory query result. Note: The value 0x08 or 0x10 is recommended. | 0 ~ 127 |
| reg_drop_level | Prefetch drop threshold configuration. When the number of DDR read and write commands in the HHA processing queue exceeds the threshold, some prefetch read commands can be discarded. | 0 ~ 127 |
| dvmsnp_outstanding | Outstanding value of the DVMSNP of the MN. Note 1: If dvmsnp_perf_en is enabled, the configured value is valid. The maximum value of outstanding can be 5 when the TaiShan core is used. Otherwise, overflow errors occur. Note 2: The SMMU cannot match the Dvmsnp outstanding value 5. Therefore, you need to set the switch to 3 for Totem and Infinite of 1383. Totem and Nimbus of 1620: Set DVM outstanding to 5. However, do not configure POE for the DVMSNP broadcast node. The POE uses a private page table and does not require DVMSNP. | 0 ~ 15 (Note: 0 represents that outstanding level is 1) |
| dvmreq_outstanding | Outstanding value of the DVMREQ of the MN. Note 1: If dvmreq_perf_en is enabled, the configured value is valid. The maximum value of outstanding can be 9 when there are four chips. Otherwise, an overflow error occurs. Note 2: In the case of two chips, the maximum outstanding value of totem can be 10, and the maximum outstanding value of nimbus or infinite is 24. This ensures the best performance. Note 3: In the case of a single chip, there is no restriction on the outstanding configuration of the totem. | 0 ~ 31 (Note: 0 represents that outstanding level is 1) |
| dvmsnp_perf_en | Whether to enable the outstanding level for the dvmsnp. (Note: After the function is enabled, the dvmsnp outstanding value of the MN is equal to the value of dvmsnp_outstanding.) | 0 (disable), 1 (enable) |
| dvmreq_perf_en | Whether to enable the outstanding level for the dvmreq. (Note: After the function is enabled, the dvmreq outstanding value of the MN is equal to the value of dvmreq_outstanding.) | 0 (disable), 1 (enable) |
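The prefetch_core_en entry describes a per-core bit pattern in which the pattern 1001 selects core 1 and core 4. The sketch below builds such a value from a list of core numbers. It assumes, following that example, that core n maps to bit n - 1. `core_mask` is a hypothetical helper name, and the chip manual remains the authority on the actual bit layout.

```shell
# Build a prefetch_core_en value from 1-based core numbers.
# Assumption: core n sets bit (n - 1), so cores 1 and 4 give binary 1001.
core_mask() {
    mask=0
    for core in "$@"; do
        mask=$(( mask | (1 << (core - 1)) ))
    done
    echo "$mask"
}
```

With this mapping, `core_mask 1 4` prints 9 and `core_mask 1 2 3 4` prints 15, the top of the documented 0 ~ 15 range.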

For parameter details and for the constraints and relationships among register bits, refer to the detailed chip manual.

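Parameter configuration

After the module is inserted, a set of virtual file interfaces associated with the parameters is created under /sys/class/misc/prefetch/. Use the cat command to read the current value of a parameter or to confirm that a configuration has taken effect. For example: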

cat /sys/class/misc/prefetch/policy

To change a parameter, use the echo command. For example:

echo 1 > /sys/class/misc/prefetch/policy
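The write and the read-back check can be combined into one step. The following is a minimal sketch. `set_param` is a hypothetical helper, not part of the module. It prints whatever the parameter file reports after the write and leaves the comparison to the reader, because some parameters report more than a single value when read.

```shell
# Write a value to a parameter file, then print what the file reports.
# $1: parameter directory, e.g. /sys/class/misc/prefetch
# $2: parameter name
# $3: new value
set_param() {
    [ -w "$1/$2" ] || { echo "cannot write $1/$2" >&2; return 1; }
    # A rejected write makes echo fail, which is passed on to the caller.
    echo "$3" > "$1/$2" || return 1
    cat "$1/$2"
}
```

For example, `set_param /sys/class/misc/prefetch policy 1` writes 1 and then shows the resulting state of policy.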

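If a manually set value is invalid (for example, out of range), the configuration may fail. In that case, check the system log to confirm whether this happened.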
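An out-of-range value can also be caught before it reaches the module. The sketch below checks a decimal value against a range taken from the parameter table. `in_range` is a hypothetical helper and the bounds must be supplied by the caller. It accepts plain decimal digits only, so a hexadecimal form such as 0x10 is rejected.

```shell
# Return 0 if $1 is a non-negative decimal integer within [$2, $3], else 1.
in_range() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-decimal input
    esac
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}
```

A guarded write could then look like `in_range 16 0 15 || echo "policy value out of range"`, which prints the message because 16 exceeds the documented 0 ~ 15 range for policy.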

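Note: for some parameters, such as policy, reading the parameter lists the configuration of every CPU core, and writing it changes the configuration of all CPUs together.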

