Qualität und Sicherheit in der Gesundheitsversorgung / Quality and Safety in Health Care| Volume 165, P27-34, October 2021


# Measuring patient satisfaction in acute care hospitals: nationwide monitoring in Switzerland

Open Access. Published: August 16, 2021

## Abstract

The National Association for Quality Development in Hospitals and Clinics (ANQ) has conducted patient satisfaction measurements in the inpatient sector in Switzerland since 2009. Specifically designed for this measurement, an instrument consisting of five questions was evaluated on an 11-point rating scale. Nevertheless, the instrument showed substantial ceiling effects, which did not allow for hospital discrimination. Therefore, ANQ initiated a revision testing different scales in a pilot study. The results showed that a 5-point verbal scale displayed good psychometric properties. Compared to the 7- or 11-point scales, the 5-point verbal scale exhibited reduced ceiling effects, which was more appropriate to compare hospitals. For the national public reporting of hospitals and clinics, risk adjustment by age and self-reported health status was recommended, which was not the case for gender, principal diagnosis, type of admission and insurance status.

## Zusammenfassung

Der nationale Verein für Qualitätsentwicklung in Spitälern und Kliniken (ANQ) führt in der Schweiz seit 2009 Patientenzufriedenheitsmessungen im stationären Bereich durch. Das für diese Messung entworfene Instrument besteht aus fünf Fragen, welche auf einer 11-er Bewertungsskala beurteilt werden. Es hat sich aber gezeigt, dass das Instrument erhebliche Deckeneffekte aufweist und die Variation zwischen den Spitälern unzureichend abbildet. Der ANQ hat deshalb eine Überarbeitung veranlasst, wobei verschiedene Skalenversionen in einer Pilotuntersuchung getestet wurden. Dabei hat sich gezeigt, dass eine verbale 5-er Antwortskala gute psychometrische Eigenschaften aufweist. Die Deckeneffekte werden minimiert und das Instrument ist besser in der Lage, zwischen den Spitälern zu diskriminieren, als die verglichenen 7-er oder 11-er Skalen. Für den Vergleich der Spitäler und Kliniken wird eine Risikoadjustierung nach Alter und subjektivem Gesundheitszustand empfohlen, hingegen sind Geschlecht, Hauptdiagnose, Art des Spitaleintritts oder Versicherungsstatus nicht im Adjustierungsmodell zu berücksichtigen.

#### Abbreviations:

ANQ (National Association for Quality Development in Hospitals and Clinics), CFI (Comparative Fit Index), CMS (Centers for Medicare & Medicaid Services), DIF (Differential Item Functioning), IUMSP (Institute of Social and Preventive Medicine Lausanne), NFI (Normed Fit Index)

## Introduction

Patient-reported satisfaction is increasingly recognized as a quality indicator of hospital care, as it provides a perspective on quality that purely clinical or managerial perspectives often miss [Berkowitz B. The patient experience and patient satisfaction: measurement of a complex dynamic]. Nevertheless, findings on the association between favorable patient satisfaction ratings and better clinical outcomes are inconsistent. Satisfaction therefore needs to be considered as a quality of care outcome in its own right [Prabhu K.L., Cleghorn M.C., Elnahas A., Tse A., Maeda A., Quereshy F.A., et al. Is quality important to our patients? The relationship between surgical outcomes and patient satisfaction; Kennedy G.D., Tevis S.E., Kent K.C. Is There a Relationship Between Patient Satisfaction and Favorable Outcomes?]. The relevance of satisfaction measures for the governance of the health care system has been recognized, with some systems incorporating them into laws and reimbursement schemes, e.g. in the US Hospital Value-Based Purchasing Program under the Centers for Medicare & Medicaid Services (CMS) [Hong Y.-R., Nguyen O., Etzold E., Song J., Duncan R.P., et al. Early performance of hospital value-based purchasing program in medicare: a systematic review].
In Switzerland, hospital quality programs were not publicly mandated but implemented by various stakeholders, such as the National Association for Quality Development in Hospitals and Clinics (ANQ) [ANQ – Swiss National Association for Quality Development in Hospitals and Clinics [Internet]. ANQ. [cited 2021 Apr 26]. Available from: https://www.anq.ch/en/]. The ANQ's objective is to monitor the quality of Switzerland's hospitals and clinics and to make this information publicly available for benchmarking. Its mission is explicitly not to produce rankings or hospital league tables. One of the many ANQ quality indicators is the periodic measurement of inpatient satisfaction. This information is provided to hospitals for quality monitoring, to payer organizations for contracting with service providers, to policy makers as a basis for planning, and to patients to enable an informed choice of service provider. Since 2012, patients in Switzerland can, with minor limitations, choose any hospital nationwide for their inpatient services. This free choice is intended to intensify competition between hospitals, which should ultimately lead to a consolidation of the almost three hundred hospitals in Switzerland (one hospital per 28,820 residents).
The ANQ inpatient satisfaction measurement concept was developed by an expert consortium, which evaluated potential instruments. However, the consortium did not reach a consensus on the use of an equally respected instrument, and, as a result, the ANQ developed its own instrument, implemented in three Swiss national languages: German, French, and Italian [ANQ. Patient satisfaction measurement – Concept for measurements in acute somatic medicine, rehabilitation and psychiatry [German] [Internet]. 2019. Available from: https://www.anq.ch/wp-content/uploads/2017/12/ANQ_Patientenzufriedenheit_Konzept.pdf]. The survey was first conducted in 2009 in nearly 200 acute care hospitals and continued annually with a response rate of about 47 percent. The average outcome of each hospital has been published on the ANQ's website with a funnel plot for every item [Messergebnisse Akutsomatik [Internet]. ANQ. [cited 2021 Apr 26]. Available from: https://www.anq.ch/de/fachbereiche/akutsomatik/messergebnisse-akutsomatik/].
The original instrument consisted of five items on a numerical 11-point rating scale from 0 (= extremely bad) to 10 (= excellent). Two items referred to overall satisfaction with care, two items referred to communication, and the last item referred to whether patients felt treated with respect and dignity. In addition to the five items, survey participants were asked about background characteristics: gender, age, and health insurance type (public or additionally semi-private/private).
Patients discharged from inpatient hospitalization in acute care were invited to participate in the survey. The inclusion criteria were an age of 18 years or older, a Swiss home address, and knowledge of the local language. The questionnaire was mailed between two and seven weeks after discharge. Participating hospitals also sent the questionnaire electronically when an e-mail address was available and the patient agreed to participate. No reminder management was in place. The survey logistics and data analysis were assigned to independent contractors. Shortly after its rollout, the survey revealed some weaknesses, the ceiling effect being its main shortcoming [Frick U., Wiedermann W., Kast S., Haug S. Überprüfung des ANQ-Messplans hinsichtlich Vollständigkeit und Relevanz]. As a result, the expert consortium was mandated to revise the instrument with the goal of enabling a more granular comparison among hospitals. Further aims of the revision were the inclusion of additional domains and the analysis of confounding factors for risk adjustment.
A first draft of a modified instrument, which included items on discharge management and medication, was available by the end of 2014. A study was then launched with the overall objective of testing and, where necessary, modifying the new instrument for the national reporting of patient satisfaction in the acute inpatient setting. The specific aims of the study were a) to determine the requested quality dimensions (survey design); b) to test the instrument for comprehensibility by patients; c) to determine a response scale that minimizes ceiling effects; d) to determine the instrument's dimensionality and psychometric properties; and e) to identify variables for risk adjustment and their effect.

## Material and methods

### Questionnaire design

By the end of 2014, a first draft of a modified instrument was available. The pre-pilot or preparation phase tested an initial version of the questionnaire, which included seven items within four domains: quality of care, communication, medication, and discharge process. Before launching the pilot phase, the evaluation of the questionnaire passed through three stages that included the patients’ perspective, experts’ analysis, and a scientific translation.
In the first stage, or pilot phase, cognitive interviews were organized to elicit patients’ understanding of the questionnaire. The process started in June 2015 in three acute care hospitals, where patients were contacted in advance by the hospital to inform them about the survey. Only patients who agreed to participate and who signed a declaration of consent took part in the interviews. In total, 15 patients evaluated the clarity of the questions, the answer options, and the visual structure of the questionnaire. This process ended with recommendations that resulted in adaptations to the questionnaire. In a second stage, the adapted version of the questionnaire was presented to and discussed with the expert committee on patient satisfaction. This process resulted in additional changes that were tested again with a smaller group of patients.
As the original version of the questionnaire was developed in German, the questionnaire underwent a rigorous translation process to provide adapted versions for the French- and Italian-speaking regions. Initially, health scientists from the Institute of Social and Preventive Medicine, Lausanne (IUMSP), whose mother tongue was German, French, or Italian, translated the questionnaire. The translated versions were sent to two independent professional translators for back-translation. To check for potential inconsistencies, members of the IUMSP together with external experts evaluated these final versions. Finally, the translated versions of the questionnaire were tested with patients.

### Sampling and inclusion criteria

Included were persons 18 years of age or older who had a hospital stay of more than 24 hours in acute care and were discharged during the measurement period defined for each language region. For German-speaking hospitals, the measurement period was October 2015; for French-speaking hospitals, it was January 2016; and for Italian-speaking hospitals, it was February 2016. The questionnaire was sent out on the 15th of the respective following month. Persons with multiple hospital admissions and discharges during the measurement month received only one questionnaire. The inclusion and exclusion criteria were identical to those of the ANQ patient satisfaction survey [ANQ. Patientenzufriedenheitsmessung ANQ – Konzept für die Messungen in der Akutsomatik, Rehabilitation und Psychiatrie [Internet]. 2019. Report No.: 1.1. Available from: https://www.anq.ch/wp-content/uploads/2017/12/ANQ_Patientenzufriedenheit_Konzept.pdf]. For comparison purposes, three different response scales were constructed: a 5-point verbal scale, a 7-point scale, and an 11-point scale.
To determine the number of hospitals and the number of cases per hospital, several conditions had to be considered: (1) the three response scales had to be tested in all participating hospitals; (2) a sufficient number of participating hospitals was required in each language region, so that differences between regions could be interpreted; and (3) enough cases per hospital were required, so that differences between hospitals could be interpreted. The literature indicated that 12-15 hospitals, five per language region (German, French, Italian), were necessary to obtain robust estimates of the psychometric properties [Farin E. [The application of hierarchical linear modelling for rehabilitation center comparisons in quality assurance and rehabilitation research]]. Every participating hospital randomly allocated each patient to one of the three scales, with a minimum of 50 responses per scale. This minimum assumed that a difference of 0.5 standard deviations from the overall mean would be detected at $p<0.05$ with a power of 80%. The minimum number of questionnaires required from each hospital was therefore 150, which translated to 2,250 cases at country level, or 750 per language region.
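The target sample sizes follow directly from the design figures given above. The short sketch below reproduces that arithmetic; the variable names are illustrative, and the national total assumes the upper bound of 15 hospitals (five per language region).

```python
# Planned sample size for the pilot, using the figures stated in the text.
SCALES = 3                    # 5-point verbal, 7-point, 11-point
MIN_RESPONSES_PER_SCALE = 50  # minimum responses per scale and hospital
HOSPITALS_PER_REGION = 5      # assumption: upper bound of 15 hospitals in total
REGIONS = 3                   # German, French, Italian

per_hospital = SCALES * MIN_RESPONSES_PER_SCALE
per_region = per_hospital * HOSPITALS_PER_REGION
national = per_region * REGIONS

print(per_hospital, per_region, national)  # 150 750 2250
```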

### Statistical analysis

#### Ceiling effect

To address the ceiling effect observed with the previous version of the questionnaire, which used an 11-point scale, the pilot survey tested two additional response scales: a 5-point verbal scale and a 7-point scale with only the extreme values labeled. The inclusion of the additional scales was based on recommendations in the related literature, which suggested that scales with fewer points tend to handle ceiling effects better [Garratt A.M., Helgeland J., Gulbrandsen P. Five-point scales outperform 10-point scales in a randomized comparison of item scaling for the Patient Experiences Questionnaire; Moors G., Kieruj N.D., Vermunt J.K. The effect of labeling and numbering of response scales on the likelihood of response bias].
To evaluate the adequacy of the new set of questions and to determine the discriminatory capacity of the scales, two tests were performed. The first was the item-total correlation, which determined whether an item in the questionnaire was inconsistent with the overall performance of the other items. Values between 0.4 and 0.7 were considered good for discrimination purposes [Moosbrugger H., Kelava A. Qualitätsanforderungen an einen psychologischen Test (Testgütekriterien)]. The second was Cronbach's alpha ($\alpha$), which measured the internal consistency of the questionnaire, i.e., how closely the items related to each other. Scales with $\alpha \ge 0.65$ were considered suitable for use. Finally, a total (sum) score was built by weighting all items equally. For ease of comparison, each scale was linearly transformed to a 0 to 100 scale.
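The item statistics described above can be sketched in a few lines. This is a minimal illustration, not the SPSS procedure used in the study: the response matrix is hypothetical, the item-total correlation is computed in its corrected form (each item against the sum of the remaining items, which the text does not specify), and the total score is taken as the unweighted mean of the rescaled items.

```python
import numpy as np

def rescale_0_100(x, lo, hi):
    """Linearly map responses from the range [lo, hi] to [0, 100]."""
    return (np.asarray(x, dtype=float) - lo) / (hi - lo) * 100.0

def cronbach_alpha(items):
    """Cronbach's alpha for a respondents x items array of complete responses."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - sum_item_var / total_var)

def corrected_item_total(items):
    """Correlation of each item with the sum of the remaining items."""
    items = np.asarray(items, dtype=float)
    total = items.sum(axis=1)
    return np.array([
        np.corrcoef(items[:, j], total - items[:, j])[0, 1]
        for j in range(items.shape[1])
    ])

# Hypothetical responses of six patients to five items on a 1-5 scale.
responses = np.array([
    [5, 5, 4, 5, 4],
    [4, 5, 5, 5, 3],
    [3, 4, 4, 3, 3],
    [5, 5, 5, 5, 5],
    [2, 3, 3, 4, 2],
    [4, 4, 5, 5, 4],
])
scores = rescale_0_100(responses, lo=1, hi=5)
total_score = scores.mean(axis=1)  # equally weighted total score on 0-100

print(round(cronbach_alpha(scores), 3))
print(corrected_item_total(scores).round(2))
print(total_score)
```

With this transformation, the second-highest category maps to 75 on the 5-point scale, 83.33 on the 7-point scale, and 90 on the 11-point scale, which matches the medians reported in the results.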

#### Unidimensionality

Dimensional analysis tested the structure of the items for the three scales, as well as the adequacy of the questions. Using confirmatory factor analysis, we tested whether the data fitted the theoretical model. A 4-factor and a 1-factor model were specified and tested with the general factor “patient satisfaction.” The fit of the models was assessed using several measures: the Root Mean Square Error of Approximation (RMSEA), the Comparative Fit Index (CFI), and the Normed Fit Index (NFI). Good model fit was assumed for RMSEA values between 0.05 and 0.08, for CFI ≥ 0.97, and for NFI ≥ 0.95 [Hu L., Bentler P.M. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives].
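For readers unfamiliar with these indices, the sketch below shows how they are commonly derived from the chi-square statistics of the tested model and of the baseline (independence) model. The formulas are the textbook definitions, not output from LISREL, and the input values are hypothetical; some programs use N rather than N - 1 in the RMSEA denominator.

```python
import math

def rmsea(chi2, df, n):
    """Root Mean Square Error of Approximation (N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative Fit Index relative to the baseline model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base

def nfi(chi2_model, chi2_base):
    """Normed Fit Index relative to the baseline model."""
    return (chi2_base - chi2_model) / chi2_base

# Hypothetical chi-square values for illustration only.
chi2_m, df_m = 10.0, 5      # tested model
chi2_b, df_b = 1000.0, 10   # baseline model
n = 1001                    # sample size

print(round(rmsea(chi2_m, df_m, n), 3))
print(round(cfi(chi2_m, df_m, chi2_b, df_b), 3))
print(round(nfi(chi2_m, chi2_b), 3))
```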

#### Rasch analysis

To test the global structure of the questionnaire, a Rasch model analysis was performed. An advantage of the Rasch model over other models is that it does not require a deterministic relation between patients’ response behavior and their person parameters. It assumes the existence of a latent continuum of the characteristic (patient satisfaction), which allows patients to be grouped into item-response categories according to how strongly the characteristic (satisfaction) is expressed.
The Q-index assesses the adequacy of the items included in the questionnaire, i.e., the extent to which the responses to the items are explained by the model. The Q-index, a conditional item-fit index ranging from 0 to 1, shows whether an item discriminates more or less than predicted by the model. A Q-index lower than 0.30 is indicative of a good fit [Rost J. Lehrbuch Testtheorie – Testkonstruktion. 2., vollst. überarb. u. erw. Aufl. 2004 edition].

#### Differential item functioning

Differential item functioning (DIF) analysis tests whether the items of a questionnaire are understood similarly across groups with different background characteristics (usually age, gender, or ethnicity). Ideally, differences in the probability of choosing a response category are explained by the variable of interest (global satisfaction) and not by group characteristics. To detect DIF, a logistic regression was fitted for every item in the questionnaire and for each of the three scale formats [Zumbo B. A Handbook on the Theory and Methods of Differential Item Functioning (DIF): Logistic Regression Modelling as a Unitary Framework for Binary and Likert-Type (Ordinal) Item Scores. Ottawa, Canada: Directorate of Human Resources Research and Evaluation]. In a first stage, the cumulative sum of all items was entered into the model as a predictor. The resulting pseudo $R^2$ was then compared with the pseudo $R^2$ obtained when the group characteristics were added as covariates. The difference between the two pseudo $R^2$ values indicated the magnitude of the DIF. A DIF in the range $0.13 \le \Delta R^2 \le 0.26$ was considered moderate. DIF was examined for age, sex, and language.
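The decision rule can be illustrated with a small helper that takes the pseudo $R^2$ of the two nested models and classifies the difference. This is a sketch under assumptions: the text does not state which pseudo $R^2$ was used, so McFadden's version is shown only as one possible choice; the labels "negligible" and "large" for values outside the moderate range are added here for illustration; and all numeric inputs are hypothetical.

```python
def mcfadden_r2(loglik_model, loglik_null):
    """McFadden pseudo R^2 (one possible choice; an assumption here)."""
    return 1.0 - loglik_model / loglik_null

def delta_r2(r2_total_only, r2_with_group):
    """Gain in pseudo R^2 when group covariates are added to the total score."""
    return r2_with_group - r2_total_only

def classify_dif(delta):
    """Classify DIF magnitude; 0.13 to 0.26 is the moderate range from the text."""
    if delta < 0.13:
        return "negligible"
    if delta <= 0.26:
        return "moderate"
    return "large"

# Hypothetical pseudo R^2 values for one item.
r2_step1 = 0.412   # model with the total score only
r2_step2 = 0.420   # model with total score plus age, sex, and language
print(classify_dif(delta_r2(r2_step1, r2_step2)))
```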

#### Hospital comparability

In addition to the item analysis, this study examined how well the different scales discriminated among hospitals. The average score of each hospital was compared to the total score. An analysis of variance of patient satisfaction was performed to evaluate the variation among hospitals. The analysis was performed for both the total score of patient satisfaction and the five individual items in the three scale versions. The explained variance ($\eta^2$), which indicates the part of the variance that can be explained by the grouping variable (hospitals), varies from 0 to 1. $\eta^2$ was considered a small effect below 0.06, a medium effect between 0.06 and 0.14, and a major effect above 0.14 [Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2 edition].
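As a minimal sketch of this measure, $\eta^2$ in a one-way analysis of variance is the between-group sum of squares divided by the total sum of squares. The example below uses hypothetical scores for three hospitals and applies the effect-size bands quoted above; it is not the SPSS routine used in the study.

```python
import numpy as np

def eta_squared(groups):
    """Share of total variance explained by group membership (one-way ANOVA)."""
    arrays = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(arrays)
    grand_mean = all_values.mean()
    ss_total = ((all_values - grand_mean) ** 2).sum()
    ss_between = sum(len(a) * (a.mean() - grand_mean) ** 2 for a in arrays)
    return ss_between / ss_total

def effect_label(eta2):
    """Effect-size bands as quoted in the text."""
    if eta2 < 0.06:
        return "small"
    if eta2 <= 0.14:
        return "medium"
    return "major"

# Hypothetical total scores (0-100) of patients in three hospitals.
hospital_scores = [
    [85, 90, 75, 80, 95],
    [70, 85, 80, 75, 90],
    [80, 75, 85, 70, 65],
]
eta2 = eta_squared(hospital_scores)
print(round(eta2, 3), effect_label(eta2))
```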

The interpretation and comparability of the results required that the characteristics of patients be similar among the participating hospitals. Otherwise, the results on patient satisfaction could have been driven by causes other than the services and treatment offered by the hospitals. Using a unifactorial and a multifactorial analysis, and following evidence from the related literature [Covinsky K.E., Rosenthal G.E., Chren M.-M., Justice A.C., Fortinsky R.H., Palmer R.M., et al. The relation between health status changes and patient satisfaction in older hospitalized medical patients; Young G.J., Meterko M., Desai K.R. Patient satisfaction with hospital care: effects of demographic and institutional characteristics; Gerlach L. editor. Zeitschrift für Klinische Psychologie und Psychotherapie], the analysis included the following variables: age (separated into twenty age groups), sex, health insurance status (basic, private, or semi-private), type of admission (emergency or routine admission), principal diagnosis as an indicator of illness severity (grouped by main chapters of ICD-10), length of hospital stay, place of discharge (home or other place), delay effect (time between hospital discharge and receiving the questionnaire, 2-6 weeks after discharge), and self-perceived health status at the time of completing the questionnaire.
Age, sex, health insurance, self-perceived health status (1 item, 5-point verbal scale), and questionnaire date were collected in the survey. The type of admission, place of discharge, dates of admission and discharge, and principal diagnosis were merged in from administrative hospital data. If one of the latter variables appeared relevant for risk adjustment, it would be included in future versions of the questionnaire.
A variance-covariance test checked if the included variables of adjustment allowed a fair comparison among the participating hospitals. A single-factor analysis of variance tested the relationship between potential confounders and patient satisfaction. The variables for which the one-factor analysis showed a relationship to patient satisfaction were then integrated into a multifactorial model.
Data preparation and statistical analyses were conducted with IBM SPSS Statistics 25. Rasch models were estimated with WINMIRA 2001 1.37 and the factor analysis was computed using LISREL 8.7. A significance level of p < 0.05 was applied to denote statistically significant effects.

## Results

### Questionnaire

The final questionnaire included five items within four domains of patient satisfaction that were tested in the three rating scales. The items included questions about quality of care (Q1), information/communication (Q2, Q3), medication (Q4), and discharge process (Q5). More specifically:
• Q1. How do you evaluate the quality of care? (Care performed by physicians and nursing personnel)
• Q2. Did you have the possibility to ask questions?
• Q4. Was the purpose of the medication you should take at home explained to you in an understandable way?
• Q5. How was the organization of the hospital discharge?

### Sampling and inclusion criteria

The pilot questionnaire was launched in 13 hospitals: 6 in the German-speaking region, 5 in the French-speaking region, and 2 in the Italian-speaking region. In total, 9,460 questionnaires were sent to eligible patients, of which 3,440 were returned, a response rate of 36.4% (Figure 1). Only the Italian-speaking region (n = 450) did not reach the minimum number of questionnaires per region.
The number of returned questionnaires was similar across response scales: 1,184 with the 5-point verbal scale, 1,181 with the 7-point scale, and 1,075 with the 11-point scale. Among the 13 hospitals, one hospital in the French-speaking region did not reach the required minimum (50 completed questionnaires) and was excluded from the analysis of differences between hospitals.
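The reported figures are internally consistent, which can be checked with a few lines of arithmetic. The sketch below uses only the counts stated above; the variable names are illustrative.

```python
# Consistency check of the reported return figures.
sent = 9460
returned = 3440
by_scale = {"5-point verbal": 1184, "7-point": 1181, "11-point": 1075}

response_rate = returned / sent
scale_total = sum(by_scale.values())

print(f"{response_rate:.1%}")   # 36.4%
print(scale_total == returned)  # True
for scale, n in by_scale.items():
    print(scale, n, round(n / returned, 3))
```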
Overall, more women (53.1%) than men (46.9%) participated in the survey (Table 1). The average age of the respondents was 61.0 years. The average hospital stay was 6.0 days, with more emergency (52.4%) than routine admissions (47.6%). Most patients had general health insurance (74.1%), and about a quarter (25.9%) had private or semi-private insurance. Most patients were discharged home (81.5%), and few (18.5%) were discharged to a different place. The average time between discharge from the hospital and the questionnaire was 33.1 days.
Table 1. Sample characteristics.

| Variable | All responders (N = 3,440) | Responders with complete questionnaire (N = 2,734) |
|---|---|---|
| **Gender – n (valid %)** | | |
| Male | 1592 (46.9) | 1303 (48.2) |
| Female | 1802 (53.1) | 1400 (51.8) |
| Missing | 46 | 31 |
| **Age (years) – Mean (SD)** | 61.0 (18.8) | 60.2 (18.6) |
| **Length of hospital stay (days) – Mean (SD)** | 6.0 (6.6) | 6.1 (6.3) |
| **Admission type – n (valid %)** | | |
| Emergency | 1753 (52.4) | 1376 (51.6) |
| Routine (planned) | 1595 (47.6) | 1292 (48.4) |
| Missing | 92 | 66 |
| **Place of discharge – n (valid %)** | | |
| Home | 2614 (81.5) | 2100 (82.5) |
| Others | 594 (18.5) | 446 (17.5) |
| Missing | 232 | 188 |
| **Insurance type – n (valid %)** | | |
| Social health insurance | 2493 (74.1) | 1982 (73.8) |
| Private/Semi-private | 873 (25.9) | 705 (26.2) |
| Missing | 74 | 47 |
| **Principal diagnosis – n (valid %)** | | |
| Infectious and parasitic diseases | 70 (2.2) | 53 (2.1) |
| New formations | 321 (10.1) | 257 (10.1) |
| Endocrine, nutrition and metabolism | 53 (1.7) | 43 (1.7) |
| Mental and behavioral disorders | 35 (1.1) | 26 (1.0) |
| Diseases of the blood/blood-forming organs | 19 (0.6) | 14 (0.5) |
| Diseases of the eye / ear | 71 (2.2) | 55 (2.1) |
| Diseases of the nervous system | 91 (2.9) | 66 (2.6) |
| Diseases of the circulatory system | 490 (15.4) | 392 (15.4) |
| Diseases of the respiratory system | 187 (5.9) | 152 (6.0) |
| Diseases of the digestive system | 289 (9.1) | 233 (9.1) |
| Diseases of the skin and subcutaneous tissue | 32 (1.0) | 25 (1.0) |
| Diseases of the musculoskeletal system | 418 (13.1) | 351 (13.8) |
| Diseases of the urogenital system | 234 (7.3) | 189 (7.4) |
| Pregnancy, birth and postpartum | 296 (9.3) | 246 (9.7) |
| States originating in perinatal period | 13 (0.4) | 8 (0.3) |
| Congenital malformations/deformations | 11 (0.3) | 11 (0.4) |
| Symptoms and abnormal clinical | 103 (3.2) | 79 (3.1) |
| Injuries/poisoning | 413 (13.0) | 323 (12.7) |
| Factors affecting health status | 39 (1.2) | 24 (0.9) |
| Missing | 255 | 187 |
| **Self-perceived health status (1 = Bad to 5 = Excellent) – Mean (SD)** | 3.22 (0.88) | 3.21 (0.88) |
| **Duration between discharge and survey (days) – Mean (SD)** | 33.1 (9.6) | 33.1 (9.6) |

Notes: SD: Standard deviation

### Statistical analysis

#### Ceiling effects

The results showed that the 5-point verbal scale offered the best discrimination between respondents. This was particularly true for items 1, 2, and 5. In general, the response values were high. For items 3 and 4, the average values of the three response scales were close (around 85), whereas for items 1, 2, and 5 the 5-point scale displayed significantly lower values. The response values in the three scales showed a leptokurtic, left-skewed distribution. Among the three, the 5-point scale displayed a distribution closer to a normal distribution, particularly for items 1, 2, and 5.
The ceiling effect ranged from 23.5% to 61.3%, where items 2 and 4 presented the highest ceiling effects, and items 1 and 5 the lowest. Comparing the scales, the 5-point scale presented the lowest ceiling effect, with the exception of items 3 and 4.
The non-response rate per item and scale ranged from 2.5% to 9.0%. The 5-point verbal scale had the fewest missing values. Item 4 had the most missing responses in all three scales (5-point scale: 6.4%; 7-point scale: 9.0%; 11-point scale: 8.8%). As for the ability to discriminate, the item-total correlations ranged from 0.50 to 0.66 (5-point scale), from 0.63 to 0.77 (7-point scale), and from 0.62 to 0.75 (11-point scale). For the three scale formats, the results indicated a good representation of the total value and a high capacity for discrimination between respondents with different characteristics.
Finally, in the total score, the 5-point scale presented the lowest ceiling effect (5-point scale: 10.6%; 7-point scale: 22.7%; 11-point scale: 22.0%) and the lowest average values (5-point scale: 79.8; 7-point scale: 83.9; 11-point scale: 86.3). The three scales presented a left-skewed distribution, of which the 5-point scale was the least skewed (asymmetry, i.e., skewness: 5-point scale: -1.14; 7-point scale: -1.65; 11-point scale: -2.12).
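The two distribution measures used here can be sketched as follows: the ceiling effect is the share of respondents giving the maximum possible score, and the asymmetry is the skewness of the score distribution. The data are hypothetical, and the skewness shown is the simple moment-based coefficient; statistical packages such as SPSS report a sample-size-adjusted variant, so values may differ slightly.

```python
import numpy as np

def ceiling_share(scores, maximum=100.0):
    """Proportion of respondents who gave the maximum possible score."""
    scores = np.asarray(scores, dtype=float)
    return float(np.mean(scores == maximum))

def skewness(scores):
    """Moment-based skewness g1 = m3 / m2**1.5 (negative means left-skewed)."""
    x = np.asarray(scores, dtype=float)
    d = x - x.mean()
    return float((d ** 3).mean() / (d ** 2).mean() ** 1.5)

# Hypothetical item scores on the 0-100 scale.
item_scores = [100, 100, 100, 75, 75, 50, 100, 25]
print(ceiling_share(item_scores))  # 0.5
print(round(skewness(item_scores), 2))
```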

#### Unidimensionality

A dimensionality analysis was performed to assess how well the four domains covered in the questionnaire measured the concept of patient satisfaction. The selected domains performed well in the three tested scales. The RMSEA was < 0.06 in the three scales, the CFI > 0.99, and the NFI > 0.98, which implies that the selected model adequately reproduced the correlations between the items of the questionnaire. Nonetheless, the 7- and 11-point scales displayed better fit values. Similarly, in the single-factor model, the RMSEA was < 0.07 in two of the three scales, and the CFI > 0.98 and NFI > 0.98 in the three scales. The 7- and 11-point scales again performed better; however, the differences were marginal. In general, the factor analysis showed that the three scales fitted the data well. Thus, the factors represented the model of patient satisfaction well.

#### Rasch model

The estimated person parameters were a monotonic transformation of the item sum scores. Over a large part of the possible sum scores, in particular the middle range, the Rasch person parameters were close to a linear transformation of the sum score (Appendix A, Figure A1). According to the Q-index, the items in the three scales fitted the data well, with p-values indicating no statistically significant deviations from the model (Appendix A, Table A1). The item-fit index therefore showed that every item fitted the characteristic “patient satisfaction” well. The global model fit statistics supported the assumptions of the Rasch model in the three scales. The 5-point scale achieved good overall fit values.

#### Differential item functioning

The DIF analysis tested three characteristics of the patients: age, sex, and language (German, French, or Italian). The change in $R^2$ fell in the range $0.000 < \Delta R^2 < 0.014$, which is below the critical values (Appendix A, Table A2). Therefore, it was assumed that there were no systematic differences in patient satisfaction associated with the characteristics of the patient.

#### Hospital comparability

A graphical comparison of the total satisfaction score between the 12 participating hospitals and the three scales showed the largest variability between hospitals in the 5-point scale (Appendix A, Figure A2). The analysis of variance also indicated that the 5-point scale differentiated best among hospitals, with statistically significant differences in patient satisfaction ($p=0.007$; $\eta^2=0.028$). The findings were similar for the 7-point scale ($p=0.026$; $\eta^2=0.025$). In contrast, the 11-point scale did not show statistically significant differences among hospitals ($p=0.349$; $\eta^2=0.015$).
The mean values by individual items showed statistically significant differences between hospitals (Appendix A, Table A3). Nevertheless, this result varied across the three scales. In the 5-point scale, items 1, 2, 3, and 5 showed the most significant differences. In the 7-point scale, items 2, 3, and 5 also showed statistically significant differences, but these were smaller than in the 5-point scale. The 11-point scale did not identify differences amongst hospitals for all items.

The one-factor analysis of patient satisfaction showed significant differences by age, type of admission, discharge destination, and self-reported health. In all three scales, age was a significant predictor of patient satisfaction. However, the relationship was not linear; the lowest patient satisfaction values were found in the age groups 20-29 years and 80-89 years, while the middle age groups had higher patient satisfaction. Patients admitted as emergencies reported significantly lower patient satisfaction than patients with a routine admission. However, this result was only found in the 5-point scale; the 7- and 11-point scales did not show significant differences. On average, people who were discharged home reported higher patient satisfaction in the 5- and 11-point scales, with stronger statistical significance in the 5-point scale. A linear relationship between patient satisfaction and self-reported health status was identified, with patients who perceived their health as better tending to report higher levels of satisfaction. The 5-point scale discriminated these results best (5-point scale: $\eta^2=0.118$; 7-point scale: $\eta^2=0.057$; 11-point scale: $\eta^2=0.093$). No statistically significant association between patient satisfaction and sex, principal diagnosis, length of hospital stay, type of health insurance, or the delay effect was found in any of the three scales. For this reason, these factors were excluded from the multifactorial analysis for risk adjustment.
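Table 2. Statistical distribution of satisfaction scores, by item and scale.

| Statistic | Scale | Item 1 | Item 2 | Item 3 | Item 4 | Item 5 | Total score |
|---|---|---|---|---|---|---|---|
| Mean | 5-point scale | 74.80 | 83.90 | 85.18 | 84.94 | 69.96 | 79.80 |
| | 7-point scale | 85.81 | 86.65 | 84.59 | 86.35 | 80.82 | 83.90 |
| | 11-point scale | 87.07 | 88.56 | 86.98 | 85.56 | 83.53 | 86.30 |
| St. deviation | 5-point scale | 20.33 | 20.64 | 19.06 | 23.10 | 23.49 | 16.37 |
| | 7-point scale | 16.90 | 19.09 | 19.72 | 25.08 | 23.63 | 17.06 |
| | 11-point scale | 16.01 | 17.32 | 17.49 | 22.13 | 21.53 | 15.57 |
| Median | 5-point scale | 75.00 | 100 | 100 | 100 | 75.00 | 85.00 |
| | 7-point scale | 83.33 | 100 | 83.33 | 100 | 83.33 | 90.00 |
| | 11-point scale | 90.00 | 100 | 90.00 | 100 | 90.00 | 90.00 |
| Mode | 5-point scale | 75.00 | 100 | 100 | 100 | 75.00 | |
| | 7-point scale | 100 | 100 | 100 | 100 | 100 | |
| | 11-point scale | 100 | 100 | 100 | 100 | 100 | |
| Kurtosis | 5-point scale | 0.82 | 0.70 | 1.92 | 2.65 | 0.47 | 1.69 |
| | 7-point scale | 4.72 | 4.15 | 3.21 | 2.71 | 2.11 | 3.74 |
| | 11-point scale | 5.89 | 6.25 | 5.76 | 4.33 | 3.09 | 5.92 |
| Asymmetry | 5-point scale | -0.72 | -1.14 | -1.35 | -1.70 | -0.69 | -1.14 |
| | 7-point scale | -1.77 | -1.86 | -1.66 | -1.80 | -1.53 | -1.65 |
| | 11-point scale | -2.04 | -2.24 | -2.10 | -2.07 | -1.78 | -2.12 |
| Missing rate | 5-point scale | 3.0% | 4.1% | 4.2% | 6.4% | 2.9% | |
| | 7-point scale | 2.5% | 5.2% | 5.6% | 9.0% | 3.5% | |
| | 11-point scale | 3.3% | 6.7% | 6.7% | 8.8% | 4.6% | |
| Ceiling effect | 5-point scale | 27.1% | 54.9% | 54.2% | 61.3% | 23.5% | 10.6% |
| | 7-point scale | 43.8% | 54.6% | 47.6% | 54.9% | 42.4% | 22.7% |
| | 11-point scale | 40.6% | 53.3% | 44.9% | 52.0% | 41.6% | 22.0% |
| Item correlation | 5-point scale | 0.66 | 0.63 | 0.65 | 0.50 | 0.53 | |
| | 7-point scale | 0.72 | 0.71 | 0.77 | 0.63 | 0.65 | |
| | 11-point scale | 0.73 | 0.73 | 0.75 | 0.62 | 0.68 | |
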
Notes: The 5-point scale was a verbal scale (n = 1184), while in the 7-point (n = 1181) and 11-point scale (n = 1075) only the minimum and maximum were labeled.
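
The scores in Table 2 are reported on a 0-100 range for all three scales. One way to obtain such values, sketched below in Python, is a linear rescaling of each response, with the ceiling effect taken as the share of answers in the top category. The response coding (1 to 5, 1 to 7, 0 to 10), the exclusion of missing answers from the ceiling-effect denominator, and all function names are assumptions for illustration; the source does not spell out these details.

```python
def rescale_to_100(response, n_points, lowest=1):
    """Map a response on an n-point scale linearly onto 0-100."""
    return 100.0 * (response - lowest) / (n_points - 1)


def ceiling_effect(responses, n_points, lowest=1):
    """Share of answered (non-missing) responses in the top category."""
    answered = [r for r in responses if r is not None]
    top = lowest + n_points - 1
    return sum(r == top for r in answered) / len(answered)


def missing_rate(responses):
    """Share of respondents who left the item unanswered (None)."""
    return sum(r is None for r in responses) / len(responses)


# Hypothetical answers to one item on the 5-point scale (None = missing).
item = [5, 4, 5, None, 3, 5, 4, 2, 5, 4]

rescaled = [rescale_to_100(r, n_points=5) for r in item if r is not None]
print("rescaled scores:", rescaled)
print("ceiling effect:", round(ceiling_effect(item, n_points=5), 3))
print("missing rate:", missing_rate(item))
```

Under this coding, the second-highest category maps to 75 on the 5-point scale, 83.33 on the 7-point scale and 90 on the 11-point scale, which matches the median values listed in Table 2.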
The multifactorial regression (Table 3) included age, sex, an interaction term between age and sex, self-perceived health status, type of admission, place of discharge, and the treating hospital as predictors of patient satisfaction. Sex was included as a control variable because a small, although not statistically significant, association with patient satisfaction was found. Age and self-perceived health status showed a statistically significant relation to satisfaction on all three scales (5-point, 7-point, and 11-point). On the 5-point scale, the type of admission was statistically significant as well. Overall, the multifactorial analysis showed that the 5-point scale, unlike the other scales, can establish statistically significant differences among the hospitals (5-point scale: $p=0.027$, $\eta^2=0.027$; 7-point scale: $p=0.061$, $\eta^2=0.025$; 11-point scale: $p=0.693$, $\eta^2=0.093$).
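Table 3. Risk adjustment: Effect of confounders, by scale.

| Confounder | 5-point scale: p-value | 5-point scale: $\eta^2$ | 7-point scale: p-value | 7-point scale: $\eta^2$ | 11-point scale: p-value | 11-point scale: $\eta^2$ |
|---|---|---|---|---|---|---|
| Sex | 0.249 | 0.002 | 0.609 | 0.000 | 0.501 | 0.001 |
| Age | 0.033 | 0.011 | 0.003 | 0.018 | 0.003 | 0.018 |
| Sex × Age | 0.548 | 0.003 | 0.355 | 0.004 | 0.355 | 0.004 |
| Health status | < 0.001 | 0.103 | < 0.001 | 0.078 | < 0.001 | 0.078 |
| Place of discharge | 0.083 | 0.004 | 0.948 | 0.000 | 0.948 | 0.000 |
| Hospital | 0.027 | 0.027 | 0.061 | 0.025 | 0.693 | 0.011 |
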
Notes: The 5-point scale was a verbal scale (n = 948), while in the 7-point (n = 900) and 11-point scale (n = 820) only the minimum and maximum were labeled. Health status was self-reported on a verbal 5-point scale. Admission type and place of discharge were binary (emergency vs routine, and home vs other, respectively).
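
Risk adjustment aims to compare hospitals as if they treated the same patient mix. The source does this with a multifactorial regression; the sketch below swaps in a simpler technique, indirect standardisation (observed minus expected score within confounder strata), to illustrate the idea in Python. The function names, the record layout and the data are hypothetical and are not the authors' model.

```python
from collections import defaultdict
from statistics import fmean


def raw_hospital_means(records):
    """Unadjusted mean score per hospital."""
    scores = defaultdict(list)
    for hospital, _age, _health, score in records:
        scores[hospital].append(score)
    return {h: fmean(v) for h, v in scores.items()}


def adjusted_hospital_means(records):
    """Indirectly standardised mean score per hospital.

    Each record is (hospital, age_group, health_status, score).
    A patient's expected score is the pooled mean of that patient's
    (age_group, health_status) stratum across all hospitals.
    """
    by_stratum = defaultdict(list)
    for _hospital, age, health, score in records:
        by_stratum[(age, health)].append(score)
    stratum_mean = {k: fmean(v) for k, v in by_stratum.items()}
    overall = fmean([r[3] for r in records])

    observed = defaultdict(list)
    expected = defaultdict(list)
    for hospital, age, health, score in records:
        observed[hospital].append(score)
        expected[hospital].append(stratum_mean[(age, health)])

    # Adjusted mean = overall mean + (observed - expected) for the hospital.
    return {
        h: overall + fmean(observed[h]) - fmean(expected[h])
        for h in observed
    }


# Hypothetical data: hospital A treats mostly patients in good health,
# hospital B mostly patients in poor health (one age group for brevity).
records = (
    [("A", "60-69", "good", 90.0)] * 3
    + [("A", "60-69", "poor", 70.0)]
    + [("B", "60-69", "good", 90.0)]
    + [("B", "60-69", "poor", 70.0)] * 3
)
raw = raw_hospital_means(records)
adjusted = adjusted_hospital_means(records)
print("raw:", raw)
print("adjusted:", adjusted)
```

In this made-up example both hospitals score identically within each health stratum, so the raw gap of 10 points (85 vs 75) disappears after adjustment.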

## Discussion

This study explored several response scales for patient satisfaction ratings, with the aim of reducing the ceiling effect and thereby improving discrimination among hospitals. The three tested scales displayed good psychometric properties and fit the Rasch model well. In general, the results showed that a shorter verbal scale offered the most suitable framework.
More specifically, the 5-point scale displayed the best discrimination among hospitals and had the fewest missing values, which suggests wider acceptability among survey participants. Yet the scale still displayed a substantial ceiling effect. In fact, the median satisfaction was at the maximum in two out of seven tested questions irrespective of the response scale, and the mean score per item was between 70 and 85 (on a 0-100 scale). These numbers are substantially higher than those observed in comparable European countries, which exhibited ratings between 50 and 70 [Chantereau and von Holstein, 2005]. In general, lower satisfaction scores are found when hospital employees rather than patients are surveyed on hospital quality [Aiken et al., 2012]. From these results we conclude that a better-informed general population has higher expectations of health care and thus gives lower ratings [Batbaatar et al., 2017]. Since nationwide education of the population remains illusory and methodological approaches are unlikely to overcome ceiling effects, the question remains as to how these ceiling effects can be reduced.
This study identified self-reported health status and age as confounding factors when assessing patient satisfaction; the same applied to the type of admission, with patients admitted through emergency reporting lower satisfaction. The effect of health status on satisfaction is well documented in the literature [Jaipaul and Rosenthal, 2003; Xiao and Barber, 2008]. The same holds for the effect of age on the experience with health care, with increased satisfaction in older persons and a slight decline in the very old [Jaipaul and Rosenthal, 2003; Moret et al., 2007]. In contrast, results are mixed on the effect of sex, principal diagnosis and length of stay [Batbaatar et al., 2017]. In general, more choice for patients was associated with higher satisfaction ratings, for example for routine admissions as compared to emergency admissions, or for privately insured patients as compared to publicly insured patients [Batbaatar et al., 2017; Grøndahl et al., 2013]. This phenomenon could partly explain the high satisfaction ratings, as Switzerland offers free choice of hospitals, with few restrictions.
Related studies have often identified the language region as a factor contributing to rating tendency, with the German-speaking region featuring the highest satisfaction [ANQ, Measurement of Patient Satisfaction]. Still, language region was not intended to be used as a factor for risk adjustment, as it is not a personal factor but rather a characteristic of the hospital. A risk adjustment by language would annul any real difference in quality of care between linguistic regions. Nevertheless, it remains to be shown whether the differences in satisfaction ratings between language regions result from cultural differences or reflect differences in care provision between regions.
A similar argument could be made against adjusting for other factors, in order to avoid annulling the effect of low-quality care provided to specific groups in the population, e.g., young individuals or those with low self-perceived health status. In many cases, risk adjustment could increase the bias [Lilford et al., 2004; Nicholl, 2007]. Besides, if a confounding factor is strong but evenly distributed among the hospitals in the sampling population, there is no need for adjustment. The latter assertion is most likely true for Swiss hospitals, which cannot deny care to patients and show little discrimination against specific groups [Swiss Federal Office of Public Health]. Therefore, the ANQ decided to offer a moderate risk adjustment, with fewer rather than more adjustment factors.
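
The statement that an evenly distributed confounder needs no adjustment can be made explicit with a simple additive model. This is an illustrative derivation under stated assumptions (additive effects, no interaction between hospital and confounder stratum); the symbols are introduced here and do not appear in the source. Let $y_{ih}$ be the score of patient $i$ in hospital $h$, $\alpha_h$ the quality effect of hospital $h$, $\beta_g$ the effect of confounder stratum $g$ (for example an age group), and $w_{hg}$ the share of patients of hospital $h$ who fall into stratum $g$:

$$
y_{ih} = \mu + \alpha_h + \beta_{g(i)} + \varepsilon_{ih},
\qquad
\mathrm{E}[\bar{y}_h] = \mu + \alpha_h + \sum_g w_{hg}\,\beta_g
$$

The expected unadjusted difference between two hospitals $h$ and $h'$ is then

$$
\mathrm{E}[\bar{y}_h - \bar{y}_{h'}] = (\alpha_h - \alpha_{h'}) + \sum_g \left(w_{hg} - w_{h'g}\right)\beta_g
$$

If the confounder is evenly distributed, $w_{hg} = w_{h'g}$ for every stratum $g$, the second term vanishes and the unadjusted difference already reflects the difference in hospital effects, however large the $\beta_g$ are.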

### Strengths and Limitations

For this pilot study, 13 out of 200 acute care hospitals in Switzerland were surveyed. The participating hospitals were not randomly selected, but actively agreed to participate in response to an ANQ call. In this respect, the generalizability of the findings from such an opportunity sample can be questioned.
The response rate of the survey was substantially lower compared to other satisfaction surveys conducted in the same year (46.5%). Related literature has noted that more satisfied patients are also more likely to participate in a post-hospitalization satisfaction survey [Perneger et al., 2020]. Hence, the lower the response rate, the more the findings are biased towards higher ratings. In this study, the response rate was not necessarily critical for our efforts to mitigate the ceiling effect; however, it raises a question about the representativeness of the psychometric properties.
The study sample matched the general hospital population well in the distribution of gender [Swiss Federal Office of Public Health]. However, the study population had a lower average length of hospital stay than the general population. Other parameters such as age or insurance types were not systematically sampled, and were only presented in aggregate form to protect the privacy of the patients.

## Conclusions

Using a large sample of patients from 13 hospitals in Switzerland, we showed that, compared to other scales, a 5-point verbal scale is a valid and reliable instrument for measuring patient satisfaction in acute care. The findings should preferably be risk adjusted for age and self-perceived health status. Nevertheless, the challenge remains to design an instrument that curbs the ceiling effect and thus allows discrimination among hospitals.
Related studies should consider testing more than one scale in order to better address ceiling effects. Similarly, risk adjustment must account for the context, as other personal factors might be more relevant in other countries or regions. Finally, it is important to take into account the timing of the administration of the questionnaire, as recall bias could play an important role in the reported satisfaction.

## Data statement

Data of the pilot study as well as data of the regular ANQ patient satisfaction survey are available for research purposes upon request to the ANQ. The terms and conditions of use are specified at https://www.anq.ch/wp-content/uploads/2017/12/ANQ_Konzept_Nutzung_Forschungsdaten.pdf (Verwendung der im Rahmen von ANQ-Messungen erhobenen Daten zu Forschungszwecken); alternatively, contact [email protected]

## Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

## Acknowledgement

We thank the ANQ Quality Committee on Patient Satisfaction and its members: Pierre Chopard, Adriana Degiorgi, Michel Délitroz, Andrea Dobrin, Armin Gemperli, Francesca Giuliani, Janick Gross, Stefan Kuhn, Anastasia Theodoridou, Stephan Tobler, Daniel Uebelhart and Eric Veya. We further thank Christoph Poggendorf and Anna Schlumbohm from the Charité, Berlin for technical support, Anita Savidan-Niederer and Isabelle Peytremann-Bridevaux from ESOPE, IUMSP Lausanne for support in translation, and Regula Heller and the entire ANQ team. We also wish to thank all patients who participated in our pilot survey, and those who have participated and continue to participate in ANQ patient satisfaction surveys.

## Conflict of interest

The authors have no conflict of interest to disclose.

## Credit author statement

Diana Pacheco Barzallo: Conceptualization, Formal analysis, Writing - original draft, Writing - review & editing, Visualization. Stefanie Köhn: Conceptualization, Methodology, Formal analysis, Writing - review & editing, Visualization. Stephan Tobler: Writing - review & editing. Michel Délitroz: Writing - review & editing. Armin Gemperli: Conceptualization, Supervision, Writing - review & editing.

## References

• Berkowitz B. The patient experience and patient satisfaction: measurement of a complex dynamic. Online J Issues Nurs. 2016; 21: 1
• Prabhu K.L., Cleghorn M.C., Elnahas A., Tse A., Maeda A., Quereshy F.A., et al. Is quality important to our patients? The relationship between surgical outcomes and patient satisfaction. BMJ Qual Saf. 2018; 27: 48-52
• Kennedy G.D., Tevis S.E., Kent K.C. Is There a Relationship Between Patient Satisfaction and Favorable Outcomes? Annals of Surgery. 2014; 260: 592-600
• Hong Y.-R., Nguyen O., Etzold E., Song J., Duncan R.P., et al. Early performance of hospital value-based purchasing program in medicare: a systematic review. Med Care. 2020; 58: 734-743
• ANQ – Swiss National Association for Quality Development in Hospitals and Clinics [Internet]. ANQ. [cited 2021 Apr 26]. Available from: https://www.anq.ch/en/
• ANQ. Patient satisfaction measurement – Concept for measurements in acute somatic medicine, rehabilitation and psychiatry [German] [Internet]. 2019. Available from: https://www.anq.ch/wp-content/uploads/2017/12/ANQ_Patientenzufriedenheit_Konzept.pdf
• Messergebnisse Akutsomatik [Internet]. ANQ. [cited 2021 Apr 26]. Available from: https://www.anq.ch/de/fachbereiche/akutsomatik/messergebnisse-akutsomatik/
• Frick U., Wiedermann W., Kast S., Haug S. Überprüfung des ANQ-Messplans hinsichtlich Vollständigkeit und Relevanz. Zürich: Institut für Sucht- und Gesundheitsforschung (ISGF); 2012 (Report No.: Forschungsbericht Nr. 313)
• ANQ. Patientenzufriedenheitsmessung ANQ – Konzept für die Messungen in der Akutsomatik, Rehabilitation und Psychiatrie [Internet]. 2019. Report No.: 1.1. Available from: https://www.anq.ch/wp-content/uploads/2017/12/ANQ_Patientenzufriedenheit_Konzept.pdf
• Farin E. [The application of hierarchical linear modelling for rehabilitation center comparisons in quality assurance and rehabilitation research]. Rehabilitation (Stuttg). 2005; 44: 157-164
• Garratt A.M., Helgeland J., Gulbrandsen P. Five-point scales outperform 10-point scales in a randomized comparison of item scaling for the Patient Experiences Questionnaire. J Clin Epidemiol. 2011; 64: 200-207
• Moors G., Kieruj N.D., Vermunt J.K. The effect of labeling and numbering of response scales on the likelihood of response bias. Sociological Methodology. 2014; 44: 369-399
• Moosbrugger H., Kelava A. Qualitätsanforderungen an einen psychologischen Test (Testgütekriterien). In: Moosbrugger H., Kelava A. Testtheorie und Fragebogenkonstruktion [Internet]. Berlin, Heidelberg: Springer Berlin Heidelberg; 2012: 7-26 (Springer-Lehrbuch) [cited 2019 Apr 1]. Available from: https://doi.org/10.1007/978-3-642-20072-4_2
• Hu L., Bentler P.M. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal. 1999; 6: 1-55
• Rost J. Lehrbuch Testtheorie - Testkonstruktion. 2., vollst. überarb. u. erw. Aufl. Bern: Huber; 2004: 426
• Zumbo B. A Handbook on the Theory and Methods of Differential Item Functioning (DIF): Logistic Regression Modelling as a Unitary Framework for Binary and Likert-Type (Ordinal) Item Scores. Ottawa, Canada: Directorate of Human Resources Research and Evaluation, Department of National Defense; 1999
• Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd edition. Hillsdale, N.J.: Routledge; 1988: 400
• Covinsky K.E., Rosenthal G.E., Chren M.-M., Justice A.C., Fortinsky R.H., Palmer R.M., et al. The relation between health status changes and patient satisfaction in older hospitalized medical patients. J Gen Intern Med. 1998; 13: 223-229
• Young G.J., Meterko M., Desai K.R. Patient satisfaction with hospital care: effects of demographic and institutional characteristics. Med Care. 2000; 38: 325-334
• Gerlach L., editor. Zeitschrift für Klinische Psychologie und Psychotherapie. Forschung und Praxis [Internet]. 1999; 28 [cited 2019 Mar 7]. Available from: https://doi.org/10.1026//0084-5345.28.2.143
• Chantereau M.W., von Holstein K.S. International comparisons of patients’ views on quality of care. Int J Health Care Qual Assur Inc Leadersh Health Serv. 2005; 18: 62-73
• Aiken L.H., Sermeus W., Van den Heede K., Sloane D.M., Busse R., McKee M., et al. Patient safety, satisfaction, and quality of hospital care: cross sectional surveys of nurses and patients in 12 countries in Europe and the United States. BMJ. 2012; 344: e1717
• Batbaatar E., Dorjdagva J., Luvsannyam A., Savino M.M., Amenta P. Determinants of patient satisfaction: a systematic review. Perspect Public Health. 2017; 137: 89-101
• Jaipaul C.K., Rosenthal G.E. Are older patients more satisfied with hospital care than younger patients? J Gen Intern Med. 2003; 18: 23-30
• Xiao H., Barber J.P. The effect of perceived health status on patient satisfaction. Value in Health. 2008; 11: 719-725
• Moret L., Nguyen J.-M., Volteau C., Falissard B., Lombrail P., Gasquet I. Evidence of a non-linear influence of patient age on satisfaction with hospital care. Int J Qual Health Care. 2007; 19: 382-389
• Grøndahl V.A., Hall-Lord M.L., Karlsson I., Appelgren J. Exploring patient satisfaction predictors in relation to a theoretical model. Int J Health Care Qual Assur. 2013; 26: 37-54
• ANQ Measurement of Patient Satisfaction [German] [Internet]. ANQ. Available from: https://www.anq.ch/de/fachbereiche/akutsomatik/messergebnisse-akutsomatik/
• Lilford R., Mohammed M.A., Spiegelhalter D., Thomson R. Use and misuse of process and outcome data in managing performance of acute medical care: Avoiding institutional stigma. Lancet. 2004; 363: 1147-1154
• Nicholl J. Case-mix adjustment in non-randomised observational evaluations: the constant risk fallacy. Journal of epidemiology and community health. 2007; 61: 1010-1013
• Swiss Federal Office of Public Health. Key figures Swiss hospitals 2018 [Internet]. 2020. Available from: https://spitalstatistik.bagapps.ch/data/download/kzp18_publication.pdf?v=1592291836
• Perneger T.V., Peytremann-Bridevaux I., Combescure C. Patient satisfaction and survey response in 717 hospital surveys in Switzerland: a cross-sectional study. BMC Health Services Research. 2020; 20: 158
• Swiss Federal Office of Public Health. Inpatient hospital cases 2018 [Internet]. 2019. Available from: /content/bfs/de/home/statistiken/gesundheit/gesundheitswesen/spitaeler/patienten-hospitalisierungen.assetdetail.10627691.html