Swiss Paraplegic Research, Nottwil, Switzerland; Department of Health Sciences and Medicine, University of Lucerne, Lucerne, Switzerland; Center for Rehabilitation in Global Health Systems, Lucerne, Switzerland
Swiss Paraplegic Research, Nottwil, Switzerland; Department of Health Sciences and Medicine, University of Lucerne, Lucerne, Switzerland; Center of Primary and Community Care, Lucerne, Switzerland
The National Association for Quality Development in Hospitals and Clinics (ANQ) has conducted patient satisfaction measurements in the inpatient sector in Switzerland since 2009. Specifically designed for this measurement, an instrument consisting of five questions was evaluated on an 11-point rating scale. Nevertheless, the instrument showed substantial ceiling effects, which did not allow for hospital discrimination. Therefore, ANQ initiated a revision testing different scales in a pilot study. The results showed that a 5-point verbal scale displayed good psychometric properties. Compared to the 7- or 11-point scales, the 5-point verbal scale exhibited reduced ceiling effects, which was more appropriate to compare hospitals. For the national public reporting of hospitals and clinics, risk adjustment by age and self-reported health status was recommended, which was not the case for gender, principal diagnosis, type of admission and insurance status.
Summary
The National Association for Quality Development in Hospitals and Clinics (ANQ) has conducted patient satisfaction measurements in the inpatient sector in Switzerland since 2009. The instrument designed for this measurement consists of five questions rated on an 11-point scale. However, the instrument showed substantial ceiling effects and captured the variation between hospitals inadequately. ANQ therefore initiated a revision in which several scale versions were tested in a pilot study. The pilot showed that a verbal 5-point response scale has good psychometric properties. Ceiling effects are minimized, and the instrument discriminates between hospitals better than the 7- or 11-point scales it was compared with. For the comparison of hospitals and clinics, risk adjustment by age and self-perceived health status is recommended, whereas gender, principal diagnosis, type of admission, and insurance status need not be included in the adjustment model.
Patient-reported satisfaction is increasingly recognized as a quality indicator of hospital care, as it provides a perspective on quality often not captured by purely clinical or managerial perspectives. Nevertheless, findings on the association between favorable patient satisfaction ratings and better clinical outcomes are inconsistent. Satisfaction needs to be considered as a quality-of-care outcome in its own right. The relevance of satisfaction measures for governance of the health care system has been recognized, with some systems incorporating them into laws and reimbursement schemes, e.g. in the US Hospital Value-Based Purchasing Program under the Centers for Medicare & Medicaid Services (CMS).
In Switzerland, hospital quality programs were not publicly mandated but implemented by various stakeholders, such as the National Association for Quality Development in Hospitals and Clinics (ANQ) [ANQ – Swiss National Association for Quality Development in Hospitals and Clinics [Internet]. ANQ. [cited 2021 Apr 26]. Available from: https://www.anq.ch/en/.]. The ANQ's objective is to monitor the quality of Switzerland's hospitals and clinics and to make this information publicly available for benchmarking. Its mission is explicitly not to produce rankings or hospital league tables. One of the many ANQ quality indicators is the periodic measurement of inpatient satisfaction. This information is provided to hospitals for quality monitoring, to payer organizations for contracting with service providers, to policy makers as a basis for planning, and to patients to enable an informed choice of service provider. Since 2012, patients in Switzerland can, with minor limitations, choose any hospital nationwide for their inpatient services. This aims to intensify competition between hospitals, which should ultimately lead to a consolidation of the almost three hundred hospitals in Switzerland (one hospital per 28,820 residents).
The ANQ inpatient satisfaction measurement concept was developed by an expert consortium, which evaluated potential instruments. The consortium did not reach consensus on an existing instrument that all members accepted equally, and, as a result, the ANQ developed its own instrument, implemented in three Swiss national languages: German, French, and Italian. The survey was first conducted in 2009 in nearly 200 acute care hospitals and continued annually with a response rate of about 47 percent. The average outcome of each hospital has been published on the ANQ's website with a funnel plot for every item.
The original instrument consisted of five items on a numerical 11-point rating scale from 0 (=extremely bad) to 10 (=excellent). Two items referred to the overall satisfaction with care, two items referred to communication, and the last item referred to whether patients felt treated with respect and dignity. In addition to the five items, the survey participants were asked about their background characteristics that included gender, age, and health insurance type (public or additionally semiprivate/ private).
Patients discharged from inpatient hospitalization in acute care were invited to participate in the survey. The inclusion criteria were age of 18 years or older, a Swiss home address, and knowledge of the local language. The questionnaire was mailed between two and seven weeks after discharge. Participating hospitals also sent the questionnaire electronically when an e-mail address was available and the patient agreed to participate. No reminder management was in place. The survey logistics and data analysis were mandated to independent contractors. Shortly after its rollout, the survey revealed some weaknesses, the ceiling effect being its main shortcoming. As a result, the expert consortium was mandated to revise the instrument with the goal of enabling a more granular comparison among hospitals. Further aims of the revision were the inclusion of additional domains and the analysis of confounding factors for risk adjustment.
A first draft of a modified instrument was available by the end of 2014, which included items on discharge management and medication. A study with the overall objective to test and potentially modify the new instrument for the national reporting of patient satisfaction in an acute in-patient setting was released. The specific aims of the study were a) to determine the requested quality dimensions (survey design); b) to test the instrument for comprehensibility by patients; c) to determine a response scale that minimizes ceiling effects; d) to determine the instrument's dimensionality and psychometric properties; and e) to identify variables for risk adjustment and their effect.
Material and methods
Questionnaire design
By the end of 2014, a first draft of a modified instrument was available. The pre-pilot or preparation phase tested an initial version of the questionnaire, which included seven items within four domains: quality of care, communication, medication, and discharge process. Before launching the pilot phase, the evaluation of the questionnaire passed through three stages that included the patients’ perspective, experts’ analysis, and a scientific translation.
In the first stage, or pilot phase, cognitive interviews were organized to elicit patients’ understanding of the questionnaire. The process started in June 2015 in three acute care hospitals, where patients were contacted in advance by the hospital to inform them about the survey. Only patients who agreed to participate and who signed a declaration of consent took part in the interviews. In total, 15 patients evaluated the clarity of the questions, the answer options, and the visual structure of the questionnaire. This process ended with recommendations that resulted in adaptations to the questionnaire. In a second stage, the adapted version of the questionnaire was presented to and discussed with the expert committee on patient satisfaction. This process resulted in additional changes that were tested again with a smaller group of patients.
As the original version of the questionnaire was developed in German, the questionnaire followed a rigorous translation process in order to provide adapted versions for the French- and Italian-speaking regions. Initially, health scientists from the Institute of Social and Preventive Medicine, Lausanne (IUMSP), whose mother tongue was German, French, or Italian, translated the questionnaire. The translated versions were sent to two independent professional translators to perform back-translations. To check for potential inconsistencies, members of the IUMSP together with external experts evaluated these final versions. Finally, the translated versions of the questionnaire were tested with patients.
Sampling and inclusion criteria
Inclusion criteria were persons 18 years of age or older who had a hospital stay of more than 24 hours in acute care with discharge during the defined measurement period for each language region. For German-speaking hospitals, the measurement period was October 2015; for French-speaking hospitals, January 2016; and for Italian-speaking hospitals, February 2016. In each case, the questionnaire was sent out on the 15th of the following month. Persons with multiple hospital admissions and discharges during the measurement month received only one questionnaire. The inclusion and exclusion criteria were identical to those of the ANQ patient satisfaction survey [ANQ. Patientenzufriedenheitsmessung ANQ – Konzept für die Messungen in der Akutsomatik, Rehabilitation und Psychiatrie [Internet]. 2019. Report No.: 1.1. Available from: https://www.anq.ch/wp-content/uploads/2017/12/ANQ_Patientenzufriedenheit_Konzept.pdf.].
The sample had to meet two requirements: (1) a sufficient number of participating hospitals in each language region, so that differences between regions could be interpreted; and (2) enough cases per hospital, so that differences between hospitals could be interpreted. According to the literature, 12-15 hospitals, five per language region (German, French, Italian), were necessary to obtain robust estimates of the psychometric properties. Every participating hospital randomly allocated each patient to one of the three scales. A minimum of 50 responses per scale was set, assuming the detection of a difference of 0.5 standard deviations from the overall mean with a power of 80%. The minimum number of questionnaires required from each hospital was therefore 150, which translated to 2,250 cases at country level, or 750 per language region.
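The sample-size targets above follow from simple multiplication. The short sketch below reproduces them under the assumption of 15 hospitals, i.e. five in each of the three language regions; the constant names are illustrative and do not come from the study.

```python
# Illustrative arithmetic for the sample-size targets; all names are hypothetical.
SCALES = 3                 # 5-, 7-, and 11-point versions
MIN_PER_SCALE = 50         # minimum responses per scale and hospital
HOSPITALS_PER_REGION = 5   # assumption: five hospitals per language region
REGIONS = 3                # German, French, Italian

per_hospital = SCALES * MIN_PER_SCALE
per_region = per_hospital * HOSPITALS_PER_REGION
national = per_region * REGIONS

print(per_hospital, per_region, national)  # 150 750 2250
```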
Statistical analysis
Ceiling effect
To address the ceiling effect observed with the previous version of the questionnaire, which used an 11-point scale, the pilot survey tested two additional response scales: a 5-point verbal scale and a 7-point scale on which, as on the 11-point scale, only the two extremes were labeled. The additional scales were included on the basis of recommendations in the related literature, which suggested that scales with fewer points tend to perform better with respect to ceiling effects.
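As a minimal sketch of how a ceiling effect can be quantified, the function below assumes the usual definition, namely the share of valid answers that fall in the highest response category. The function name and the example answers are hypothetical and are not taken from the study data.

```python
def ceiling_effect(responses, max_category):
    """Share of non-missing responses that fall in the top category."""
    valid = [r for r in responses if r is not None]
    if not valid:
        raise ValueError("no valid responses")
    return sum(1 for r in valid if r == max_category) / len(valid)

# Hypothetical answers on a 5-point scale (1..5); None marks a missing answer.
answers = [5, 4, 5, 3, None, 5, 4, 2, 5, 4]
share = ceiling_effect(answers, max_category=5)
print(f"{share:.1%}")
```

The same function applies to the 7- and 11-point versions by passing the respective top category.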
To evaluate the adequacy of the new set of questions and to determine the discriminative capacity of the scales, two tests were performed. The first test was the item-total correlation, which determined whether an item in the questionnaire was inconsistent with the overall performance of the other items. Values between 0.4 and 0.7 were considered good for discrimination purposes. The second test was Cronbach's alpha (α), which measured the internal consistency of the questionnaire, that is, how closely the items related to each other. Scales with a sufficiently high α were recommended for use. Finally, a total (sum) score was built by equally weighting all items. For ease of comparison, each scale was linearly transformed into a 0 to 100 scale.
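The three computations described above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions rather than the study's SPSS procedure: the item-total correlation is computed in its corrected form (each item against the sum of the remaining items), only complete cases are used, and the function names and example data are hypothetical.

```python
from statistics import mean, variance

def rescale_0_100(value, lowest, highest):
    """Linearly map a raw answer from [lowest, highest] onto 0-100."""
    return 100.0 * (value - lowest) / (highest - lowest)

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def corrected_item_total(rows, j):
    """Correlate item j with the sum of the remaining items (assumed variant)."""
    item = [row[j] for row in rows]
    rest = [sum(row) - row[j] for row in rows]
    return pearson(item, rest)

def cronbach_alpha(rows):
    """Cronbach's alpha from complete cases (one row per respondent)."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical complete answers of six respondents to five items (1..5).
data = [
    [5, 5, 4, 5, 4],
    [4, 4, 4, 5, 3],
    [3, 4, 3, 4, 3],
    [5, 5, 5, 5, 5],
    [2, 3, 3, 4, 2],
    [4, 5, 4, 5, 4],
]
print(round(cronbach_alpha(data), 2))
print([round(corrected_item_total(data, j), 2) for j in range(5)])
print(rescale_0_100(4, lowest=1, highest=5))  # 75.0
```

With this transformation, the second-highest category of a 1 to 5 scale maps to 75, which is the value that appears as a median for the 5-point scale in Table 2.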
Psychometric properties
Unidimensionality
Dimensional analysis tested the structure of the items for the three scales, as well as the adequacy of the questions. Using confirmatory factor analysis, we tested whether the data fitted the theoretical model. A 4-factor and a 1-factor model were specified and tested with the general factor “patient satisfaction.” The fit of the models was assessed using several measures: the Root Mean Square Error of Approximation (RMSEA), the Comparative Fit Index (CFI), and the Normed Fit Index (NFI). Good model fit was assumed for RMSEA values between 0.05 and 0.08, for CFI ≥ 0.97, and for NFI ≥ 0.95.
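For orientation, the three fit indices can be derived from the chi-square statistics of the fitted model and of the baseline (independence) model. The sketch below uses the common textbook formulas; it is not LISREL output, and software packages differ in details such as using N or N - 1 in the RMSEA. The function names are hypothetical, and the acceptance check treats any RMSEA up to 0.08 as acceptable, which is an assumption about how the cut-offs above are applied.

```python
import math

def rmsea(chi2, df, n):
    """Root Mean Square Error of Approximation (N - 1 variant)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative Fit Index relative to the baseline model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model, 0.0)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base

def nfi(chi2_model, chi2_base):
    """Normed Fit Index relative to the baseline model."""
    return (chi2_base - chi2_model) / chi2_base

def acceptable_fit(rmsea_value, cfi_value, nfi_value):
    """Apply the cut-offs named in the text (RMSEA <= 0.08 is an assumption)."""
    return rmsea_value <= 0.08 and cfi_value >= 0.97 and nfi_value >= 0.95

# Hypothetical chi-square values, not results from the study.
r = rmsea(chi2=10.0, df=5, n=1001)
c = cfi(10.0, 5, 1005.0, 10)
f = nfi(10.0, 1005.0)
print(round(r, 3), round(c, 3), round(f, 3), acceptable_fit(r, c, f))
```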
To test the global structure of the questionnaire, a Rasch model analysis was implemented. Compared to other models, an advantage of the Rasch model is that a deterministic relation between the patients’ behavior in the test and their person parameters is not necessary. It assumes the existence of a latent continuum of the characteristic (patient satisfaction) that allows patients to be grouped into item-response categories based on the manifestation of that characteristic (satisfaction).
The Q-index checks the adequacy of the items included in the questionnaire, i.e., the extent to which the response to the items is explained by the model. The Q-index (a conditional item-fit index ranging from 0 to 1) shows whether an item discriminates more or less than predicted by the model. A Q-index lower than 0.30 is indicative of a good fit.
Differential item functioning (DIF) analysis tests whether items in a questionnaire are understood similarly across groups with different background characteristics (usually age, gender, or ethnicity). Ideally, differences in the probability of choosing a response category are explained by the variable of interest (global satisfaction) and not by the groups’ characteristics. To detect DIF, a logistic regression was fitted for every item in the questionnaire and for each of the three scale formats [A Handbook on the Theory and Methods of Differential Item Functioning (DIF): Logistic Regression Modelling as a Unitary Framework for Binary and Likert-Type (Ordinal) Item Scores. Ottawa, Canada: Directorate of Human Resources Research and Evaluation.]. In a first stage, the cumulative sum of all the items was entered into the model as a predictor. The resulting pseudo-R² was then compared with the pseudo-R² obtained when the group characteristics were included as covariates. The difference between the two pseudo-R² values indicated the intensity of the DIF. A DIF within the threshold was considered moderate. DIF was examined for age, sex, and language.
Hospital comparability
In addition to the item analysis, this study examined how well the different scales discriminated among hospitals. The average score of each hospital was compared to the total score. An analysis of variance of patient satisfaction was calculated to evaluate the variation between hospitals. The analysis was performed for both the total score of patient satisfaction and the five individual items in the three scale versions. The explained variance (η²), which indicates the part of the variance that can be explained by the grouping variable (hospitals), varies from 0 to 1. An η² below 0.06 was considered a small effect, between 0.06 and 0.14 a medium effect, and above 0.14 a major effect.
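The explained variance of a one-way analysis of variance is the between-group sum of squares divided by the total sum of squares. The sketch below assumes that this statistic (commonly written η², eta squared) is the one meant above and applies the effect-size bands given in the text; the function names and hospital scores are hypothetical.

```python
def eta_squared(groups):
    """Between-group sum of squares divided by the total sum of squares."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total

def effect_label(eta2):
    """Bands from the text: small < 0.06, medium 0.06-0.14, major > 0.14."""
    if eta2 < 0.06:
        return "small"
    if eta2 <= 0.14:
        return "medium"
    return "major"

# Hypothetical 0-100 satisfaction scores for three hospitals.
hospitals = [
    [80, 85, 90, 75, 95],
    [70, 85, 80, 90, 75],
    [85, 80, 95, 90, 70],
]
eta2 = eta_squared(hospitals)
print(round(eta2, 3), effect_label(eta2))
```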
The interpretation and comparability of the results required that the characteristics of patients be similar among the participating hospitals. Otherwise, the results on patient satisfaction could have been driven by causes other than the services and treatment offered by the hospitals. Using a unifactorial and a multifactorial analysis, and following evidence from related literature, the analysis included the following variables: age (separated into twenty age-groups), sex, health insurance status (basic, private or semi-private), type of admission (emergency or routine admission), principal diagnosis as an indicator of illness severity (grouped by main chapters of ICD-10), length of hospital stay, place of discharge (home or other place), delay effect (time between hospital discharge and receiving the questionnaire, 2-6 weeks after discharge), and self-perceived health status at the time of completing the questionnaire.
Age, sex, health insurance, self-perceived health status (1 item, 5-point verbal scale), and questionnaire date were collected in the survey. The type of admission, place of discharge, dates of admission and discharge, and principal diagnosis were merged in from administrative hospital data. If one of the latter variables appeared relevant for risk adjustment, it would be included in future versions of the questionnaire.
A variance-covariance test checked if the included variables of adjustment allowed a fair comparison among the participating hospitals. A single-factor analysis of variance tested the relationship between potential confounders and patient satisfaction. The variables for which the one-factor analysis showed a relationship to patient satisfaction were then integrated into a multifactorial model.
Data preparation and statistical analyses were conducted with IBM SPSS Statistics 25. Rasch models were estimated with WINMIRA 2001 1.37 and the factor analysis was computed using LISREL 8.7. A significance level of p < 0.05 was applied to denote statistically significant effects.
Results
Questionnaire
The final questionnaire included five items within four domains of patient satisfaction that were tested in the three rating scales. The items included questions about quality of care (Q1), information/communication (Q2, Q3), medication (Q4), and discharge process (Q5). More specifically:
• Q1. How do you evaluate the quality of care? (Care performed by physicians and nursing personnel)
• Q2. Did you have the possibility to ask questions?
• Q3. Did you receive comprehensible answers to your questions?
• Q4. Was the purpose of the medication you should take at home explained to you in an understandable way?
• Q5. How was the organization of the hospital discharge?
Sampling and inclusion criteria
The pilot questionnaire was launched in 13 hospitals: 6 in the German-speaking region, 5 in the French-speaking region, and 2 in the Italian-speaking region. In total, 9,460 questionnaires were sent to eligible patients, of which 3,440 were returned, a response rate of 36.4% (Figure 1). Only the Italian-speaking region (n = 450) did not reach the minimum number of questionnaires per region.
The number of questionnaires was similar across response scales: 1184 questionnaires with a 5-point verbal scale, 1181 with a 7-point scale, and 1075 with an 11-point scale. Among the 13 hospitals, one hospital in the French-speaking part did not reach the required minimum (50 completed questionnaires) and was excluded from the analysis that determined differences between hospitals.
More women (53.1%) than men (46.9%) participated in the survey (Table 1). The average age of the respondents was 61.0 years. The average hospital stay was 6.0 days, with more emergency (52.4%) than routine admissions (47.6%). Most patients had general health insurance (74.1%), and about a quarter (25.9%) had private or semi-private insurance. Most patients were discharged home (81.5%), and few (18.5%) were discharged to a different place. The average time between discharge from the hospital and the questionnaire was 33.1 days.
Table 1. Sample characteristics.

| Variable | All responders (N = 3,440) | Responders with complete questionnaire (N = 2,734) |
|---|---|---|
| Gender – n (valid %) | | |
| Male | 1592 (46.9) | 1303 (48.2) |
| Female | 1802 (53.1) | 1400 (51.8) |
| Missing | 46 | 31 |
| Age (years) – Mean (SD) | 61.0 (18.8) | 60.2 (18.6) |
| Length of hospital stay (days) – Mean (SD) | 6.0 (6.6) | 6.1 (6.3) |
| Admission type – n (valid %) | | |
| Emergency | 1753 (52.4) | 1376 (51.6) |
| Routine (planned) | 1595 (47.6) | 1292 (48.4) |
| Missing | 92 | 66 |
| Place of discharge – n (valid %) | | |
| Home | 2614 (81.5) | 2100 (82.5) |
| Others | 594 (18.5) | 446 (17.5) |
| Missing | 232 | 188 |
| Insurance type – n (valid %) | | |
| Social health insurance | 2493 (74.1) | 1982 (73.8) |
| Private/Semi-private | 873 (25.9) | 705 (26.2) |
| Missing | 74 | 47 |
| Principal diagnosis – n (valid %) | | |
| Infectious and parasitic diseases | 70 (2.2) | 53 (2.1) |
| Neoplasms | 321 (10.1) | 257 (10.1) |
| Endocrine, nutrition and metabolism | 53 (1.7) | 43 (1.7) |
| Mental and behavioral disorders | 35 (1.1) | 26 (1.0) |
| Diseases of the blood/blood-forming organs | 19 (0.6) | 14 (0.5) |
| Diseases of the eye / ear | 71 (2.2) | 55 (2.1) |
| Diseases of the nervous system | 91 (2.9) | 66 (2.6) |
| Diseases of the circulatory system | 490 (15.4) | 392 (15.4) |
| Diseases of the respiratory system | 187 (5.9) | 152 (6.0) |
| Diseases of the digestive system | 289 (9.1) | 233 (9.1) |
| Diseases of the skin and subcutaneous tissue | 32 (1.0) | 25 (1.0) |
| Diseases of the musculoskeletal system | 418 (13.1) | 351 (13.8) |
| Diseases of the urogenital system | 234 (7.3) | 189 (7.4) |
| Pregnancy, birth and postpartum | 296 (9.3) | 246 (9.7) |
| States originating in perinatal period | 13 (0.4) | 8 (0.3) |
| Congenital malformations/deformations | 11 (0.3) | 11 (0.4) |
| Symptoms and abnormal clinical | 103 (3.2) | 79 (3.1) |
| Injuries/poisoning | 413 (13.0) | 323 (12.7) |
| Factors affecting health status | 39 (1.2) | 24 (0.9) |
| Missing | 255 | 187 |
| Self-perceived health status (1=Bad to 5=Excellent) – Mean (SD) | 3.22 (0.88) | 3.21 (0.88) |
| Duration between discharge and survey (days) – Mean (SD) | | |
The results showed that the 5-point verbal scale offered the best discrimination between respondents, particularly for items 1, 2, and 5. In general, the response values were inflated. For items 3 and 4, the average values on the three scales were close (around 85), whereas for items 1, 2, and 5 the 5-point scale yielded significantly lower values. The response values on the three scales showed a leptokurtic, left-skewed distribution. Among the three, the 5-point scale displayed the distribution closest to normal, particularly for items 1, 2, and 5.
The ceiling effect ranged from 23.5% to 61.3%; items 2 and 4 presented the highest ceiling effects, and items 1 and 5 the lowest. Comparing the scales, the 5-point scale presented the lowest ceiling effect for items 1 and 5, whereas for items 2, 3, and 4 its ceiling effect was the highest of the three scales (Table 2).
The non-response rate per item and scale ranged from 2.5% to 9.0%. The 5-point verbal scale had the fewest missing values for all items except item 1. Item 4 presented the most missing responses in all three scales (5-point scale: 6.4%; 7-point scale: 9.0%; 11-point scale: 8.8%). As for discriminative ability, the item correlations ranged from 0.50 to 0.66 (5-point scale), 0.63 to 0.77 (7-point scale), and 0.62 to 0.75 (11-point scale). For the three scale formats, the results indicated a good representation of the total value and a high capacity for discrimination between respondents with different characteristics.
Finally, in the total score, the 5-point scale presented the lowest ceiling effect (5-point scale: 10.6%; 7-point scale: 22.7%; 11-point scale: 22.0%) and the lowest average values (5: 79.8; 7: 83.9; 11: 86.3). The three scales presented a left skewed distribution, where the 5-point scale was the least skewed (asymmetry: 5-point scale: -1.14; 7-point scale: -1.65; 11-point scale: -2.12).
Psychometric properties
Unidimensionality
A dimensionality analysis was implemented to assess how well the four domains covered in the questionnaire measured the concept of patient satisfaction. The selected domains performed well in the three tested scales. The RMSEA was < 0.06 in the three scales, the CFI > 0.99, and the NFI > 0.98, which implies that the correlations between the items of the questionnaire were explained by the selected model. Nonetheless, the 7- and 11-point scales displayed better fit values. Similarly, in the single-factor model the RMSEA was < 0.07 in two of the three scales, and the CFI > 0.98 and NFI > 0.98 in the three scales. The 7- and 11-point scales showed a better performance; however, the differences were marginal. In general, the factor analysis showed that the three scales had good data adequacy. Thus, the factors represented the model of patient satisfaction well.
Rasch model
The estimated person parameters were a monotonic transformation of the item sum scores. Over a large part of the range of possible sum scores, in particular its intermediate part, the Rasch person parameters were close to a linear transformation of the sum score (Appendix A, Figure A1). Inspection of the Q-index showed that the items fitted the data well in all three scales, with p-values indicating no statistically significant deviations from the item response model (Appendix A, Table A1). The fit index of every item was therefore consistent with the characteristic “patient satisfaction”. The global model fit values confirmed the assumptions of the Rasch model in the three scales. The 5-point scale achieved good overall fit values.
Differential item functioning
The DIF analysis tested three characteristics of the patients: age, sex, and language (German, French, or Italian). The change in pseudo-R² was below the critical values (Appendix A, Table A2). Therefore, it was assumed that there were no systematic differences in patient satisfaction associated with the characteristics of the patient.
Hospital comparability
A graphical comparison of the total satisfaction score across the 12 participating hospitals and the three scales showed the largest variability between hospitals for the 5-point scale (Appendix A, Figure A2). The analysis of variance likewise indicated that the 5-point scale differentiated best among hospitals, with statistically significant differences in patient satisfaction. The findings were similar for the 7-point scale. In contrast, the 11-point scale did not show statistically significant differences among hospitals.
The mean values by individual items showed statistically significant differences between hospitals (Appendix A, Table A3). Nevertheless, this result varied across the three scales. In the 5-point scale, items 1, 2, 3, and 5 showed the most significant differences. In the 7-point scale, items 2, 3, and 5 also showed statistically significant differences, but smaller ones than in the 5-point scale. The 11-point scale did not identify differences among hospitals for any item.
Risk adjustment
The one-factor analysis of patient satisfaction showed significant differences by age, type of admission, discharge destination, and self-reported health. In all three scales, age was a significant predictor of patient satisfaction. However, the relationship was not linear: the lowest patient satisfaction values were found in the age groups 20-29 years and 80-89 years, while the middle age groups had higher patient satisfaction. Patients admitted through emergency reported significantly lower patient satisfaction than patients with a routine admission. This result, however, appeared only in the 5-point scale; the 7- and 11-point scales did not show significant differences. On average, people who were discharged home reported higher patient satisfaction in the 5- and 11-point scales, with stronger statistical significance in the 5-point scale. A linear relationship between patient satisfaction and self-reported health status was identified: patients with a better perception of their health tended to report higher levels of satisfaction. The 5-point scale discriminated these results more strongly than the other two scales. No statistically significant association between patient satisfaction and sex, principal diagnosis, length of hospital stay, type of health insurance, or the delay effect was found in any of the three scales. For this reason, these factors were excluded from the multifactorial analysis for risk adjustment.
Table 2. Statistical distribution of satisfaction scores, by item and scale.

| Statistic | Scale | Item 1: Quality of treatment | Item 2: Asks questions | Item 3: Gets answers | Item 4: Medication | Item 5: Discharge | Total score |
|---|---|---|---|---|---|---|---|
| Mean | 5-point scale | 74.80 | 83.90 | 85.18 | 84.94 | 69.96 | 79.80 |
| | 7-point scale | 85.81 | 86.65 | 84.59 | 86.35 | 80.82 | 83.90 |
| | 11-point scale | 87.07 | 88.56 | 86.98 | 85.56 | 83.53 | 86.30 |
| St. Deviation | 5-point scale | 20.33 | 20.64 | 19.06 | 23.10 | 23.49 | 16.37 |
| | 7-point scale | 16.90 | 19.09 | 19.72 | 25.08 | 23.63 | 17.06 |
| | 11-point scale | 16.01 | 17.32 | 17.49 | 22.13 | 21.53 | 15.57 |
| Median | 5-point scale | 75.00 | 100 | 100 | 100 | 75.00 | 85.00 |
| | 7-point scale | 83.33 | 100 | 83.33 | 100 | 83.33 | 90.00 |
| | 11-point scale | 90.00 | 100 | 90.00 | 100 | 90.00 | 90.00 |
| Mode | 5-point scale | 75.00 | 100 | 100 | 100 | 75.00 | |
| | 7-point scale | 100 | 100 | 100 | 100 | 100 | |
| | 11-point scale | 100 | 100 | 100 | 100 | 100 | |
| Kurtosis | 5-point scale | 0.82 | 0.70 | 1.92 | 2.65 | 0.47 | 1.69 |
| | 7-point scale | 4.72 | 4.15 | 3.21 | 2.71 | 2.11 | 3.74 |
| | 11-point scale | 5.89 | 6.25 | 5.76 | 4.33 | 3.09 | 5.92 |
| Asymmetry | 5-point scale | -0.72 | -1.14 | -1.35 | -1.70 | -0.69 | -1.14 |
| | 7-point scale | -1.77 | -1.86 | -1.66 | -1.80 | -1.53 | -1.65 |
| | 11-point scale | -2.04 | -2.24 | -2.10 | -2.07 | -1.78 | -2.12 |
| Missing rate | 5-point scale | 3.0% | 4.1% | 4.2% | 6.4% | 2.9% | |
| | 7-point scale | 2.5% | 5.2% | 5.6% | 9.0% | 3.5% | |
| | 11-point scale | 3.3% | 6.7% | 6.7% | 8.8% | 4.6% | |
| Ceiling effect | 5-point scale | 27.1% | 54.9% | 54.2% | 61.3% | 23.5% | 10.6% |
| | 7-point scale | 43.8% | 54.6% | 47.6% | 54.9% | 42.4% | 22.7% |
| | 11-point scale | 40.6% | 53.3% | 44.9% | 52.0% | 41.6% | 22.0% |
| Item correlation | 5-point scale | 0.66 | 0.63 | 0.65 | 0.50 | 0.53 | |
| | 7-point scale | 0.72 | 0.71 | 0.77 | 0.63 | 0.65 | |
| | 11-point scale | 0.73 | 0.73 | 0.75 | 0.62 | 0.68 | |
Notes: The 5-point scale was a verbal scale (n = 1184), while in the 7-point (n = 1181) and 11-point scale (n = 1075) only the minimum and maximum were labeled.
The multifactorial regression (Table 3) included age, sex, an interaction term between age and sex, self-perceived health status, type of admission, place of discharge, and the treating hospital as predictors of patient satisfaction. Sex was retained as a control variable because a small, although not statistically significant, association with patient satisfaction was found. Age and self-perceived health status showed a statistically significant relation to satisfaction measured on all three scales (5-point, 7-point, and 11-point scale). In the 5-point scale, the type of admission was statistically significant as well. In general, the multifactorial analysis showed that only the 5-point scale established statistically significant differences among the hospitals (hospital effect in Table 3: 5-point scale, p = 0.027; 7-point scale, p = 0.061; 11-point scale, p = 0.693).
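Table 3. Risk adjustment: Effect of confounders, by scale.

| Confounder | 5-point scale: p-value | 5-point scale: explained variance (η²) | 7-point scale: p-value | 7-point scale: explained variance (η²) | 11-point scale: p-value | 11-point scale: explained variance (η²) |
|---|---|---|---|---|---|---|
| Sex | 0.249 | 0.002 | 0.609 | 0.000 | 0.501 | 0.001 |
| Age | 0.033 | 0.011 | 0.003 | 0.018 | 0.003 | 0.018 |
| Sex*Age | 0.548 | 0.003 | 0.355 | 0.004 | 0.355 | 0.004 |
| Health status | < 0.001 | 0.103 | < 0.001 | 0.078 | < 0.001 | 0.078 |
| Admission type | < 0.001 | 0.016 | 0.797 | 0.000 | 0.331 | 0.001 |
| Place of discharge | 0.083 | 0.004 | 0.948 | 0.000 | 0.948 | 0.000 |
| Hospital | 0.027 | 0.027 | 0.061 | 0.025 | 0.693 | 0.011 |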
Notes: The 5-point scale was a verbal scale (n = 948), while in the 7-point (n = 900) and 11-point scale (n = 820) only the minimum and maximum rates were labeled
Health status was a self-report on a verbal 5-point scale
Admission type and place of discharge were binary (emergency vs routine; and home vs other, respectively)
This study demonstrated the potential of exploring various response scales in patient satisfaction ratings when the aim is to reduce the ceiling effect and thereby improve discrimination among hospitals. The three tested scales displayed good psychometric properties and fit the Rasch model well. In general, the results showed that a shorter verbal scale offered the most suitable framework.
More specifically, the 5-point scale displayed the best discrimination among hospitals and had the fewest missing values, which suggests wider acceptability among survey participants. Yet the scale still displayed a substantial ceiling effect. In fact, the median satisfaction was at the maximum in two out of seven tested questions irrespective of the response scale, and the mean score per item was between 70 and 85 (on a 0-100 scale). These numbers are substantially higher than those observed in comparable European countries, which exhibited ratings between 50 and 70 [Patient safety, satisfaction, and quality of hospital care: cross sectional surveys of nurses and patients in 12 countries in Europe and the United States]. From these results we conclude that a general population that is better informed about health care has higher expectations and therefore gives lower ratings [ ]. Since nationwide education of the population remains illusory and methodological approaches are unlikely to overcome ceiling effects, the question remains how these ceiling effects can be reduced.
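The two quantities used in this paragraph, the share of respondents at the top of the scale and the mean score on a 0-100 scale, can be sketched as follows. The coding of the 5-point scale as 1 to 5, the linear rescaling, the example answers, and the function names are assumptions made for illustration and are not taken from the study.

```python
from statistics import mean


def rescale_0_100(response, lowest, highest):
    """Linearly map a response on [lowest, highest] onto a 0-100 scale."""
    return (response - lowest) / (highest - lowest) * 100


def ceiling_share(responses, highest):
    """Fraction of respondents who chose the top category."""
    return sum(1 for r in responses if r == highest) / len(responses)


# Hypothetical answers to one question on a 5-point scale coded 1..5.
answers = [5, 5, 4, 5, 3, 4, 5, 2, 5, 4]

scores = [rescale_0_100(a, 1, 5) for a in answers]
print(f"mean on 0-100 scale: {mean(scores):.1f}")
print(f"ceiling share: {ceiling_share(answers, 5):.0%}")
```

Rescaling makes item means comparable across the 5-, 7-, and 11-point versions, while the ceiling share shows how much room is left at the top of each scale to distinguish good from very good hospitals.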
This study identified self-reported health status and age as confounding factors when assessing patient satisfaction; the same held for the type of admission, where persons admitted as emergencies reported lower satisfaction. In the literature, the effect of health status on satisfaction is well documented [ ]. In general, more patient choice was associated with higher satisfaction ratings, for example for routine admissions compared with emergency admissions, or for privately insured compared with publicly insured patients [ ]. This phenomenon could partly explain the high satisfaction ratings, as Switzerland offers free choice of hospitals with few restrictions.
Related studies have shown that language region was often identified as a factor contributing to rating tendency, with the German-speaking region featuring the highest satisfaction [ ]. Still, language region was not intended to be used as a factor for risk adjustment, as it is not a personal factor but rather a characteristic of the hospital. Risk adjustment by language would annul any real difference in quality of care between linguistic regions. Nevertheless, it remains to be shown whether the differences in satisfaction ratings between language regions result from cultural differences or reflect differences in care provision between regions.
A similar argument could be made against adjusting for other factors, in order to avoid annulling the effect of low-quality care provided to specific groups in the population, e.g., young individuals or those with low self-perceived health status. In many cases, risk adjustment could increase the bias [ ]. Besides, if a confounding factor is strong but evenly distributed among the hospitals in the sampling population, there is no need for adjustment. The last assertion is most likely true for Swiss hospitals, which cannot deny care to patients and show little discrimination against specific groups [ ]. Therefore, the ANQ decided to apply risk adjustment in a moderate way, including fewer rather than more adjustment factors.
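The remark that an evenly distributed confounder needs no adjustment can be made explicit under a simple additive model. This is a sketch in our own notation, not a model stated in the study. Assume that patients fall into case-mix strata $s$, that $\mu_s$ is the mean satisfaction in stratum $s$, that $p_{hs}$ is the share of hospital $h$'s patients in stratum $s$, and that $\delta_h$ is a hospital effect that does not depend on the stratum.

```latex
\bar{y}_h = \delta_h + \sum_s p_{hs}\,\mu_s
\quad\Longrightarrow\quad
\bar{y}_A - \bar{y}_B
  = (\delta_A - \delta_B) + \sum_s \left(p_{As} - p_{Bs}\right)\mu_s
```

If two hospitals $A$ and $B$ have the same case mix, so that $p_{As} = p_{Bs}$ for every stratum, the sum vanishes and the unadjusted difference already equals the difference in hospital effects, however strongly $\mu_s$ varies across strata. Adjustment matters only to the extent that the case mix differs between hospitals.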
Strengths and Limitations
For this pilot study, 13 of the 200 acute care hospitals in Switzerland were surveyed. The participating hospitals were not randomly selected but volunteered in response to an ANQ call. The generalizability of the findings from such an opportunity sample can therefore be questioned.
The response rate of the survey was substantially lower than in other satisfaction surveys conducted in the same year (46.5%). The related literature reports that more satisfied patients are also more likely to participate in a post-hospitalization satisfaction survey [ ]. Hence, the lower the response rate, the more the findings are biased towards higher ratings. In this study, the response rate was not necessarily critical for our efforts to mitigate the ceiling effect; however, it raises a question about the representativeness of the psychometric properties.
The study sample matched the general hospital population well in the distribution of gender [ ]. However, the study population had a shorter average length of hospital stay than the general population. Other parameters, such as age or insurance type, were not systematically sampled and were only presented in aggregate form to protect the privacy of the patients.
Conclusions
Using a large sample of patients from 13 hospitals in Switzerland, we showed that, compared to other scales, a 5-point verbal scale is a valid and reliable instrument for measuring patient satisfaction in acute care. The findings should preferably be risk adjusted for age and self-perceived health status. Nevertheless, the challenge remains to design an instrument that mitigates the ceiling effect and thereby allows discrimination among hospitals.
Related studies should consider testing more than one scale in order to better address ceiling effects. Similarly, risk adjustment must account for the context, as other personal factors might be more relevant in other countries/regions. Finally, it is important to take into account the timing of the application of the questionnaire, as recall bias could play an important role in the reported satisfaction.
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Acknowledgement
We thank the ANQ Quality Committee on Patient Satisfaction with its members: Pierre Chopard, Adriana Degiorgi, Michel Délitroz, Andrea Dobrin, Armin Gemperli, Francesca Giuliani, Janick Gross, Stefan Kuhn, Anastasia Theodoridou, Stephan Tobler, Daniel Uebelhart and Eric Veya. We further thank Christoph Poggendorf and Anna Schlumbohm from the Charité, Berlin for technical support, Anita Savidan-Niederer and Isabelle Peytremann-Bridevaux from ESOPE, IUMSP Lausanne for support in translation, and Regula Heller and the entire ANQ team. We also wish to thank all patients who participated in our pilot survey, and those who have participated and continue to participate in ANQ patient satisfaction surveys.
Conflict of interest
The authors have no conflict of interest to disclose.
ANQ – Swiss National Association for Quality Development in Hospitals and Clinics [Internet]. ANQ. [cited 2021 Apr 26]. Available from: https://www.anq.ch/en/.
ANQ. Patientenzufriedenheitsmessung ANQ – Konzept für die Messungen in der Akutsomatik, Rehabilitation und Psychiatrie [Internet]. 2019. Report No.: 1.1. Available from: https://www.anq.ch/wp-content/uploads/2017/12/ANQ_Patientenzufriedenheit_Konzept.pdf.
Qualitätsanforderungen an einen psychologischen Test (Testgütekriterien). In: Moosbrugger H, Kelava A. Testtheorie und Fragebogenkonstruktion [Internet]. Berlin, Heidelberg: Springer Berlin Heidelberg; 2012. p. 7-26 [cited 2019 Apr 1]. (Springer-Lehrbuch). Available from: https://doi.org/10.1007/978-3-642-20072-4_2
A Handbook on the Theory and Methods of Differential Item Functioning (DIF): Logistic Regression Modelling as a Unitary Framework for Binary and Likert-Type (Ordinal) Item Scores. Ottawa, Canada: Directorate of Human Resources Research and Evaluation.
Patient safety, satisfaction, and quality of hospital care: cross sectional surveys of nurses and patients in 12 countries in Europe and the United States.