From de04241ab87afcaac26f15fcc32a7bd27294dd47 Mon Sep 17 00:00:00 2001 From: Jonathan Marek Date: Tue, 16 Feb 2021 15:10:29 -0500 Subject: [PATCH 1/6] opp: Don't skip freq update for different frequency We skip the OPP update if the current and target OPPs are same. This is fine for the devices that don't support frequency but may cause issues for the ones that need to program frequency. An OPP entry doesn't really signify a single operating frequency but rather the highest frequency at which the other properties of the OPP entry apply. And we may reach here with different frequency values, while all of them would point to the same OPP entry in the OPP table. We just need to update the clock frequency in that case, though in order to not add special exit points we reuse the code flow from a normal path. While at it, rearrange the conditionals in the 'if' statement to check 'enabled' flag at the end. Fixes: 81c4d8a3c414 ("opp: Keep track of currently programmed OPP") Signed-off-by: Jonathan Marek [ Viresh: Improved commit log and subject, rename current_freq as current_rate, document it, remove local variable and rearrange code. ] Signed-off-by: Viresh Kumar --- drivers/opp/core.c | 8 +++++--- drivers/opp/opp.h | 2 ++ 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/opp/core.c b/drivers/opp/core.c index c3f3d9249cc5..c2689386a906 100644 --- a/drivers/opp/core.c +++ b/drivers/opp/core.c @@ -998,14 +998,15 @@ static int _set_opp(struct device *dev, struct opp_table *opp_table, old_opp = opp_table->current_opp; /* Return early if nothing to do */ - if (opp_table->enabled && old_opp == opp) { + if (old_opp == opp && opp_table->current_rate == freq && + opp_table->enabled) { dev_dbg(dev, "%s: OPPs are same, nothing to do\n", __func__); return 0; } dev_dbg(dev, "%s: switching OPP: Freq %lu -> %lu Hz, Level %u -> %u, Bw %u -> %u\n", - __func__, old_opp->rate, freq, old_opp->level, opp->level, - old_opp->bandwidth ? old_opp->bandwidth[0].peak : 0, + __func__, opp_table->current_rate, freq, old_opp->level, + opp->level, old_opp->bandwidth ? old_opp->bandwidth[0].peak : 0, opp->bandwidth ? opp->bandwidth[0].peak : 0); scaling_down = _opp_compare_key(old_opp, opp); @@ -1061,6 +1062,7 @@ static int _set_opp(struct device *dev, struct opp_table *opp_table, /* Make sure current_opp doesn't get freed */ dev_pm_opp_get(opp); opp_table->current_opp = opp; + opp_table->current_rate = freq; return ret; } diff --git a/drivers/opp/opp.h b/drivers/opp/opp.h index 9b9daf83b074..50fb9dced3c5 100644 --- a/drivers/opp/opp.h +++ b/drivers/opp/opp.h @@ -135,6 +135,7 @@ enum opp_table_access { * @clock_latency_ns_max: Max clock latency in nanoseconds. * @parsed_static_opps: Count of devices for which OPPs are initialized from DT. * @shared_opp: OPP is shared between multiple devices. + * @current_rate: Currently configured frequency. * @current_opp: Currently configured OPP for the table. * @suspend_opp: Pointer to OPP to be used during device suspend. * @genpd_virt_dev_lock: Mutex protecting the genpd virtual device pointers. @@ -184,6 +185,7 @@ struct opp_table { unsigned int parsed_static_opps; enum opp_table_access shared_opp; + unsigned long current_rate; struct dev_pm_opp *current_opp; struct dev_pm_opp *suspend_opp; From 67fc209b527d023db4d087c68e44e9790aa089ef Mon Sep 17 00:00:00 2001 From: Shawn Guo Date: Tue, 19 Jan 2021 10:39:25 +0800 Subject: [PATCH 2/6] cpufreq: qcom-hw: drop devm_xxx() calls from init/exit hooks Commit f17b3e44320b ("cpufreq: qcom-hw: Use devm_platform_ioremap_resource() to simplify code") introduces a regression on platforms using the driver, by failing to initialise a policy, when one is created post hotplug. When all the CPUs of a policy are hoptplugged out, the call to .exit() and later to devm_iounmap() does not release the memory region that was requested during devm_platform_ioremap_resource(). Therefore, a subsequent call to .init() will result in the following error, which will prevent a new policy to be initialised: [ 3395.915416] CPU4: shutdown [ 3395.938185] psci: CPU4 killed (polled 0 ms) [ 3399.071424] CPU5: shutdown [ 3399.094316] psci: CPU5 killed (polled 0 ms) [ 3402.139358] CPU6: shutdown [ 3402.161705] psci: CPU6 killed (polled 0 ms) [ 3404.742939] CPU7: shutdown [ 3404.765592] psci: CPU7 killed (polled 0 ms) [ 3411.492274] Detected VIPT I-cache on CPU4 [ 3411.492337] GICv3: CPU4: found redistributor 400 region 0:0x0000000017ae0000 [ 3411.492448] CPU4: Booted secondary processor 0x0000000400 [0x516f802d] [ 3411.503654] qcom-cpufreq-hw 17d43000.cpufreq: can't request region for resource [mem 0x17d45800-0x17d46bff] With that being said, the original code was tricky and skipping memory region request intentionally to hide this issue. The true cause is that those devm_xxx() device managed functions shouldn't be used for cpufreq init/exit hooks, because &pdev->dev is alive across the hooks and will not trigger auto resource free-up. Let's drop the use of device managed functions and manually allocate/free resources, so that the issue can be fixed properly. Cc: v5.10+ # v5.10+ Fixes: f17b3e44320b ("cpufreq: qcom-hw: Use devm_platform_ioremap_resource() to simplify code") Suggested-by: Bjorn Andersson Signed-off-by: Shawn Guo Signed-off-by: Viresh Kumar --- drivers/cpufreq/qcom-cpufreq-hw.c | 40 ++++++++++++++++++++++++------- 1 file changed, 32 insertions(+), 8 deletions(-) diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c index acc645b85e79..abb91162a6ed 100644 --- a/drivers/cpufreq/qcom-cpufreq-hw.c +++ b/drivers/cpufreq/qcom-cpufreq-hw.c @@ -32,6 +32,7 @@ struct qcom_cpufreq_soc_data { struct qcom_cpufreq_data { void __iomem *base; + struct resource *res; const struct qcom_cpufreq_soc_data *soc_data; }; @@ -280,6 +281,7 @@ static int qcom_cpufreq_hw_cpu_init(struct cpufreq_policy *policy) struct of_phandle_args args; struct device_node *cpu_np; struct device *cpu_dev; + struct resource *res; void __iomem *base; struct qcom_cpufreq_data *data; int ret, index; @@ -303,18 +305,33 @@ static int qcom_cpufreq_hw_cpu_init(struct cpufreq_policy *policy) index = args.args[0]; - base = devm_platform_ioremap_resource(pdev, index); - if (IS_ERR(base)) - return PTR_ERR(base); + res = platform_get_resource(pdev, IORESOURCE_MEM, index); + if (!res) { + dev_err(dev, "failed to get mem resource %d\n", index); + return -ENODEV; + } - data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL); + if (!request_mem_region(res->start, resource_size(res), res->name)) { + dev_err(dev, "failed to request resource %pR\n", res); + return -EBUSY; + } + + base = ioremap(res->start, resource_size(res)); + if (IS_ERR(base)) { + dev_err(dev, "failed to map resource %pR\n", res); + ret = PTR_ERR(base); + goto release_region; + } + + data = kzalloc(sizeof(*data), GFP_KERNEL); if (!data) { ret = -ENOMEM; - goto error; + goto unmap_base; } data->soc_data = of_device_get_match_data(&pdev->dev); data->base = base; + data->res = res; /* HW should be in enabled state to proceed */ if (!(readl_relaxed(base + data->soc_data->reg_enable) & 0x1)) { @@ -355,7 +372,11 @@ static int qcom_cpufreq_hw_cpu_init(struct cpufreq_policy *policy) return 0; error: - devm_iounmap(dev, base); + kfree(data); +unmap_base: + iounmap(data->base); +release_region: + release_mem_region(res->start, resource_size(res)); return ret; } @@ -363,12 +384,15 @@ static int qcom_cpufreq_hw_cpu_exit(struct cpufreq_policy *policy) { struct device *cpu_dev = get_cpu_device(policy->cpu); struct qcom_cpufreq_data *data = policy->driver_data; - struct platform_device *pdev = cpufreq_get_driver_data(); + struct resource *res = data->res; + void __iomem *base = data->base; dev_pm_opp_remove_all_dynamic(cpu_dev); dev_pm_opp_of_cpumask_remove_table(policy->related_cpus); kfree(policy->freq_table); - devm_iounmap(&pdev->dev, data->base); + kfree(data); + iounmap(base); + release_mem_region(res->start, resource_size(res)); return 0; } From 538b0188da4653b9f4511a114f014354fb6fb7a5 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 15 Feb 2021 20:24:46 +0100 Subject: [PATCH 3/6] cpufreq: ACPI: Set cpuinfo.max_freq directly if max boost is known Commit 3c55e94c0ade ("cpufreq: ACPI: Extend frequency tables to cover boost frequencies") attempted to address a performance issue involving acpi-cpufreq, the schedutil governor and scale-invariance on x86 by extending the frequency tables created by acpi-cpufreq to cover the entire range of "turbo" (or "boost") frequencies, but that caused frequencies reported via /proc/cpuinfo and the scaling_cur_freq attribute in sysfs to change which may confuse users and monitoring tools. For this reason, revert the part of commit 3c55e94c0ade adding the extra entry to the frequency table and use the observation that in principle cpuinfo.max_freq need not be equal to the maximum frequency listed in the frequency table for the given policy. Namely, modify cpufreq_frequency_table_cpuinfo() to allow cpufreq drivers to set their own cpuinfo.max_freq above that frequency and change acpi-cpufreq to set cpuinfo.max_freq to the maximum boost frequency found via CPPC. This should be sufficient to let all of the cpufreq subsystem know the real maximum frequency of the CPU without changing frequency reporting. Link: https://bugzilla.kernel.org/show_bug.cgi?id=211305 Fixes: 3c55e94c0ade ("cpufreq: ACPI: Extend frequency tables to cover boost frequencies") Reported-by: Matt McDonald Tested-by: Matt McDonald Signed-off-by: Rafael J. Wysocki Tested-by: Giovanni Gherdovich Tested-by: Michael Larabel Cc: 5.11+ # 5.11+ --- drivers/cpufreq/acpi-cpufreq.c | 62 +++++++++------------------------- drivers/cpufreq/freq_table.c | 8 ++++- 2 files changed, 23 insertions(+), 47 deletions(-) diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c index d3e5a6fceb61..d1bbc16fba4b 100644 --- a/drivers/cpufreq/acpi-cpufreq.c +++ b/drivers/cpufreq/acpi-cpufreq.c @@ -54,7 +54,6 @@ struct acpi_cpufreq_data { unsigned int resume; unsigned int cpu_feature; unsigned int acpi_perf_cpu; - unsigned int first_perf_state; cpumask_var_t freqdomain_cpus; void (*cpu_freq_write)(struct acpi_pct_register *reg, u32 val); u32 (*cpu_freq_read)(struct acpi_pct_register *reg); @@ -223,10 +222,10 @@ static unsigned extract_msr(struct cpufreq_policy *policy, u32 msr) perf = to_perf_data(data); - cpufreq_for_each_entry(pos, policy->freq_table + data->first_perf_state) + cpufreq_for_each_entry(pos, policy->freq_table) if (msr == perf->states[pos->driver_data].status) return pos->frequency; - return policy->freq_table[data->first_perf_state].frequency; + return policy->freq_table[0].frequency; } static unsigned extract_freq(struct cpufreq_policy *policy, u32 val) @@ -365,7 +364,6 @@ static unsigned int get_cur_freq_on_cpu(unsigned int cpu) struct cpufreq_policy *policy; unsigned int freq; unsigned int cached_freq; - unsigned int state; pr_debug("%s (%d)\n", __func__, cpu); @@ -377,11 +375,7 @@ static unsigned int get_cur_freq_on_cpu(unsigned int cpu) if (unlikely(!data || !policy->freq_table)) return 0; - state = to_perf_data(data)->state; - if (state < data->first_perf_state) - state = data->first_perf_state; - - cached_freq = policy->freq_table[state].frequency; + cached_freq = policy->freq_table[to_perf_data(data)->state].frequency; freq = extract_freq(policy, get_cur_val(cpumask_of(cpu), data)); if (freq != cached_freq) { /* @@ -680,7 +674,6 @@ static int acpi_cpufreq_cpu_init(struct cpufreq_policy *policy) struct cpuinfo_x86 *c = &cpu_data(cpu); unsigned int valid_states = 0; unsigned int result = 0; - unsigned int state_count; u64 max_boost_ratio; unsigned int i; #ifdef CONFIG_SMP @@ -795,28 +788,8 @@ static int acpi_cpufreq_cpu_init(struct cpufreq_policy *policy) goto err_unreg; } - state_count = perf->state_count + 1; - - max_boost_ratio = get_max_boost_ratio(cpu); - if (max_boost_ratio) { - /* - * Make a room for one more entry to represent the highest - * available "boost" frequency. - */ - state_count++; - valid_states++; - data->first_perf_state = valid_states; - } else { - /* - * If the maximum "boost" frequency is unknown, ask the arch - * scale-invariance code to use the "nominal" performance for - * CPU utilization scaling so as to prevent the schedutil - * governor from selecting inadequate CPU frequencies. - */ - arch_set_max_freq_ratio(true); - } - - freq_table = kcalloc(state_count, sizeof(*freq_table), GFP_KERNEL); + freq_table = kcalloc(perf->state_count + 1, sizeof(*freq_table), + GFP_KERNEL); if (!freq_table) { result = -ENOMEM; goto err_unreg; @@ -851,27 +824,25 @@ static int acpi_cpufreq_cpu_init(struct cpufreq_policy *policy) } freq_table[valid_states].frequency = CPUFREQ_TABLE_END; + max_boost_ratio = get_max_boost_ratio(cpu); if (max_boost_ratio) { - unsigned int state = data->first_perf_state; - unsigned int freq = freq_table[state].frequency; + unsigned int freq = freq_table[0].frequency; /* * Because the loop above sorts the freq_table entries in the * descending order, freq is the maximum frequency in the table. * Assume that it corresponds to the CPPC nominal frequency and - * use it to populate the frequency field of the extra "boost" - * frequency entry. + * use it to set cpuinfo.max_freq. */ - freq_table[0].frequency = freq * max_boost_ratio >> SCHED_CAPACITY_SHIFT; + policy->cpuinfo.max_freq = freq * max_boost_ratio >> SCHED_CAPACITY_SHIFT; + } else { /* - * The purpose of the extra "boost" frequency entry is to make - * the rest of cpufreq aware of the real maximum frequency, but - * the way to request it is the same as for the first_perf_state - * entry that is expected to cover the entire range of "boost" - * frequencies of the CPU, so copy the driver_data value from - * that entry. + * If the maximum "boost" frequency is unknown, ask the arch + * scale-invariance code to use the "nominal" performance for + * CPU utilization scaling so as to prevent the schedutil + * governor from selecting inadequate CPU frequencies. */ - freq_table[0].driver_data = freq_table[state].driver_data; + arch_set_max_freq_ratio(true); } policy->freq_table = freq_table; @@ -947,8 +918,7 @@ static void acpi_cpufreq_cpu_ready(struct cpufreq_policy *policy) { struct acpi_processor_performance *perf = per_cpu_ptr(acpi_perf_data, policy->cpu); - struct acpi_cpufreq_data *data = policy->driver_data; - unsigned int freq = policy->freq_table[data->first_perf_state].frequency; + unsigned int freq = policy->freq_table[0].frequency; if (perf->states[0].core_frequency * 1000 != freq) pr_warn(FW_WARN "P-state 0 is not max freq\n"); diff --git a/drivers/cpufreq/freq_table.c b/drivers/cpufreq/freq_table.c index f839dc9852c0..d3f756f7b5a0 100644 --- a/drivers/cpufreq/freq_table.c +++ b/drivers/cpufreq/freq_table.c @@ -52,7 +52,13 @@ int cpufreq_frequency_table_cpuinfo(struct cpufreq_policy *policy, } policy->min = policy->cpuinfo.min_freq = min_freq; - policy->max = policy->cpuinfo.max_freq = max_freq; + policy->max = max_freq; + /* + * If the driver has set its own cpuinfo.max_freq above max_freq, leave + * it as is. + */ + if (policy->cpuinfo.max_freq < max_freq) + policy->max = policy->cpuinfo.max_freq = max_freq; if (policy->min == ~0) return -EINVAL; From 71f1309f4f5b70aa3f1342a52b1460aa454c39ff Mon Sep 17 00:00:00 2001 From: Yue Hu Date: Thu, 18 Feb 2021 17:01:32 +0800 Subject: [PATCH 4/6] cpufreq: schedutil: Remove needless sg_policy parameter from ignore_dl_rate_limit() Since sg_policy is a member of struct sugov_cpu. Also remove the local variable in sugov_update_single_common() to make the code more clean. Signed-off-by: Yue Hu Acked-by: Viresh Kumar [ rjw: Minor subject edits ] Signed-off-by: Rafael J. Wysocki --- kernel/sched/cpufreq_schedutil.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c index 6931f0cdeb80..68fd485504a6 100644 --- a/kernel/sched/cpufreq_schedutil.c +++ b/kernel/sched/cpufreq_schedutil.c @@ -426,23 +426,21 @@ static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; } * Make sugov_should_update_freq() ignore the rate limit when DL * has increased the utilization. */ -static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu, struct sugov_policy *sg_policy) +static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu) { if (cpu_bw_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->bw_dl) - sg_policy->limits_changed = true; + sg_cpu->sg_policy->limits_changed = true; } static inline bool sugov_update_single_common(struct sugov_cpu *sg_cpu, u64 time, unsigned int flags) { - struct sugov_policy *sg_policy = sg_cpu->sg_policy; - sugov_iowait_boost(sg_cpu, time, flags); sg_cpu->last_update = time; - ignore_dl_rate_limit(sg_cpu, sg_policy); + ignore_dl_rate_limit(sg_cpu); - if (!sugov_should_update_freq(sg_policy, time)) + if (!sugov_should_update_freq(sg_cpu->sg_policy, time)) return false; sugov_get_util(sg_cpu); @@ -557,7 +555,7 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags) sugov_iowait_boost(sg_cpu, time, flags); sg_cpu->last_update = time; - ignore_dl_rate_limit(sg_cpu, sg_policy); + ignore_dl_rate_limit(sg_cpu); if (sugov_should_update_freq(sg_policy, time)) { next_f = sugov_next_freq_shared(sg_cpu, time); From e209cb51bfcceda7519b8ba1094c8ba41a658ce8 Mon Sep 17 00:00:00 2001 From: Yue Hu Date: Thu, 18 Feb 2021 17:37:53 +0800 Subject: [PATCH 5/6] cpufreq: schedutil: Remove update_lock comment from struct sugov_policy definition Currently, update_lock is also used in sugov_update_single_freq(). The comment is not helpful anymore. Signed-off-by: Yue Hu Acked-by: Viresh Kumar [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki --- kernel/sched/cpufreq_schedutil.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c index 68fd485504a6..fe7df039ce9e 100644 --- a/kernel/sched/cpufreq_schedutil.c +++ b/kernel/sched/cpufreq_schedutil.c @@ -26,7 +26,7 @@ struct sugov_policy { struct sugov_tunables *tunables; struct list_head tunables_hook; - raw_spinlock_t update_lock; /* For shared policies */ + raw_spinlock_t update_lock; u64 last_freq_update_time; s64 freq_update_delay_ns; unsigned int next_freq; From 4e6df217b73e4e76a3f08d6b905790e5445db63e Mon Sep 17 00:00:00 2001 From: Yue Hu Date: Thu, 18 Feb 2021 17:53:38 +0800 Subject: [PATCH 6/6] cpufreq: Fix typo in kerneldoc comment Change 'Terget' to 'Target'. Should be Target. Signed-off-by: Yue Hu Acked-by: Viresh Kumar [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/cpufreq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 7d0ae968def7..1d1b563cea4b 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -2101,7 +2101,7 @@ EXPORT_SYMBOL_GPL(cpufreq_driver_fast_switch); * cpufreq_driver_adjust_perf - Adjust CPU performance level in one go. * @cpu: Target CPU. * @min_perf: Minimum (required) performance level (units of @capacity). - * @target_perf: Terget (desired) performance level (units of @capacity). + * @target_perf: Target (desired) performance level (units of @capacity). * @capacity: Capacity of the target CPU. * * Carry out a fast performance level switch of @cpu without sleeping.