Dissociable neuronal mechanism for different crossmodal correspondence effects in humans

Crossmodal correspondences (CMCs) refer to associations between seemingly arbitrary stimulus features in different sensory modalities. Pitch-size correspondences refer to the strong association of, e.g., small objects with high pitches. Pitch-elevation correspondences refer to the strong association of, e.g., visuospatially elevated objects with high pitches. We used functional magnetic resonance imaging (fMRI) to study the neural components that underlie the pitch-size and pitch-elevation CMCs. This study asks whether different CMCs are driven by similar neural mechanisms. Comparing congruent against incongruent trials allows the CMC effect to be estimated for each CMC. The analysis of the measured neural activity strongly pointed toward different mechanisms being involved in the processing of pitch-size and pitch-elevation correspondences. Differential whole-brain effects were observed within the superior parietal lobule (SPL), cerebellum and Heschl's gyrus (HG). Further, the angular gyrus (AnG), the intraparietal sulcus (IPS) and the anterior cingulate cortex (ACC) were engaged in processing the CMCs but showed different effects for congruent compared to incongruent stimulus presentations. Within pitch-size, significant effects in the AnG and ACC were found for congruent stimulus presentations, whereas for pitch-elevation, significant effects in the ACC and IPS were found for incongruent stimulus presentations. In summary, the present results indicate differential neural processing in different simple audio-visual CMCs.


INTRODUCTION
Crossmodal correspondences (CMCs) refer to almost universally experienced (implicit) associations between stimulus features in different sensory modalities. A well-studied example of a correspondence effect was found for pitch and visual elevation. When an object that is visually elevated in space is paired with a high-pitched tone, a stronger association of these features is observed compared to pairing the same object with a low-pitched tone (Ben-Artzi & Marks, 1995; Chiou & Rich, 2012; Evans, 2020; Evans & Treisman, 2010; Jamal et al., 2017; McCormick et al., 2018; Melara & Brien, 1987). A second pitch-based CMC is pitch and size. Presenting, e.g., a small object together with a high-pitched tone resulted in a successful crossmodal correspondence in a study by Evans and Treisman (2010) and numerous other studies (Bien et al., 2012; Bonetti & Costa, 2018; Gallace & Spence, 2006; Parise & Spence, 2012).
A prominent theory about the origin of pitch-elevation correspondences is based on language processing (Parise et al., 2014; Spence, 2011, 2020; Spence & Sathian, 2020). In most cultures, the words 'high' and 'low' can describe both the height of a pitch and the position of an object in space. This linguistic link is not described for pitch and size correspondences. Even though a language-driven cause for pitch-elevation associations is plausible (Ben-Artzi & Marks, 1999), a growing number of studies on the CMC effect between pitch and spatial elevation raises the question of whether variables other than language cause the strong CMC of these seemingly arbitrary stimulus pairs (McCormick et al., 2018; Parise et al., 2014; Parkinson et al., 2012).
A second theory on CMCs states that the correspondences between pitch and elevation as well as pitch and size probably arise from regularities in our natural environment that are stored in memory (Parise et al., 2014; Spence, 2011, 2020; Spence & Sathian, 2020). For example, larger bodies usually resonate at lower pitches and smaller objects tend to resonate at higher pitches (Parise et al., 2014). We are confronted with this regularity frequently in our daily lives. Children typically have a higher-pitched voice than adults (Lee et al., 1999) and small animals tend to make higher-pitched noises than larger animals (Bowling et al., 2017). We also tend to perceive higher-pitched tones from objects elevated in space than from objects on the ground (Parise et al., 2014). Following this assumption, CMCs probably have their roots in statistical regularities, i.e., naturally learned rules and assumptions from our environment (Parise et al., 2014; Spence, 2011, 2020; Spence & Sathian, 2020). If both CMCs have their origin in similar mechanisms, we would expect to measure activations within comparable brain regions in both the pitch-size and the pitch-elevation CMC.
The third and last theory we address is the theory of perceived intensity, which is also called a theory of magnitude (Spence, 2011). This theory states that the CMC effect probably evolved from a correspondence in intensity or magnitude in the underlying neuronal structure of corresponding stimulus pairs (Spence, 2011; Spence & Sathian, 2020). The main idea underlying magnitude in CMCs is a shared polar dimension of the stimulus pairs perceived as congruent. According to this notion, a high-pitched tone and a small visual stimulus would be situated on the same side of their respective polar dimension. Compared to incongruent stimulus pairs, congruent pairs would share 'more' in terms of intensity or magnitude (Chang & Cho, 2015). A common neural activation in terms of magnitude was found for, e.g., numbers by Piazza et al. (2007) and for size and luminance by Pinel and colleagues (2004). If pitch-size and pitch-elevation correspondences have their origin in similarly coded neuronal responses, we expect to find greater activations within the intraparietal sulcus (IPS) for congruent trials as a common effect in both CMCs (Humphreys & Ralph, 2015; Piazza et al., 2007; Pinel et al., 2004).
In behavioral studies, the CMC is often measured via the reaction time (RT) differences between congruent and incongruent stimulus pairs. Although significant, these differences are rather small in absolute terms (Chiou & Rich, 2012; Evans & Treisman, 2010). The study by Evans and Treisman (2010) included eight subjects in the pitch-size visual experiment, in which the absolute difference between congruent and incongruent trials was 14.4 ms. Their pitch-elevation visual paradigm included twelve participants, and the absolute difference between congruent and incongruent RTs was 18.6 ms. Within their fMRI paradigm, McCormick and colleagues (2018) did not find significant RT differences between congruent and incongruent stimulus presentations in the pitch-elevation CMC, which may be related to the overall small size of the effect. They validated their findings outside the scanner with a behavioral task (McCormick et al., 2018). Based on these previous findings, performing a behavioral test outside the scanner appears to be an appropriate measure to validate CMC effects studied with fMRI (Koten et al., 2013; McCormick et al., 2018). In our study, we implemented a congruence classification task outside the scanner to measure the behavioral CMC effect in addition to the typically measured RTs.
The main focus of our study was to examine the neural basis of processing a pitch-elevation CMC and to compare it with that of a pitch-size CMC, with both CMCs always in focus. The CMC effect can be estimated by calculating the difference between congruent and incongruent presentations (congruent > incongruent, C > I). The calculated difference then allows a direct comparison of the neural substrates of the CMC effect between the different CMCs. This comparison can be used to test for common or different neural correlates of different CMCs with a focus on the CMC effect, thus directly testing the different theoretical assumptions about the origin of the CMC effect.
If we find a common effect within the IPS for congruent > incongruent presentations, a magnitude-driven CMC is likely to cause this effect. An effect within the left inferior frontal gyrus (IFG) would favor a CMC driven by language, which we expect to find most likely for congruent > incongruent pitch-elevation presentations. However, if CMCs are based on statistical representations of our environment, we will most likely find an effect within areas commonly associated with attention and memory retrieval, such as the anterior cingulate cortex (ACC) or the AnG.
Although we are mainly interested in congruency effects, it cannot be excluded that effects for incongruent stimulus presentations are also part of the processing of the stimuli in our tested CMCs. Stronger effects for incongruent stimuli could be due to, for example, response conflict or a shift of attention (Chiou & Rich, 2012; Spence & Sathian, 2020).
Besides the question about the neural mechanisms of two different CMCs, we were interested in a possible modulation of the effect within one CMC by stimulus contrast. It has been hypothesized that the CMC effect depends on the ability to form a unique correspondence between stimulus pairs (Chiou & Rich, 2012). Therefore, we additionally measured a variant of the pitch-size CMC in which we reduced the difference, i.e., the contrast, between the stimuli, which should in turn reduce the CMC effect (Chiou & Rich, 2012). If the CMC effect is modulated by the contrast of the stimulus pairs, we expected to find a reduced neural effect for the variant of the pitch-size CMC with a reduced difference between the stimulus pairs.

Participants
Thirty-three mentally and physically healthy participants (21 females, age M=24.8 years, SD=3.8 years) with normal hearing and normal or corrected-to-normal vision took part in this experiment. The participants were recruited through a local online job platform. Four participants had to be excluded from the final analysis (two due to technical issues and two due to excessive movement in the scanner (>5 mm)). Therefore, the final sample size was 29 participants. All experiment protocols were approved by the Ethics Committee of the General Medical Council Hamburg (PV7022) and all our methods were carried out in accordance with relevant ethical guidelines and regulations. All participants gave their written informed consent and were paid an expense allowance of 10 €/h.

Inside the scanner
The stimuli were presented using Presentation® software (Version 22.01, Neurobehavioral Systems, Inc., Berkeley, CA) running on Windows 7. A mirror placed on the head coil at a distance of ~12 cm from the participant's face was used to reflect the stimulus presentation from a 40'' LCD screen with a refresh rate of 60 Hz. The auditory stimuli were presented using MR-compatible in-ear headphones (MR confon). Participant responses were tracked using two MR-compatible button boxes.

Outside the scanner
For a tutorial as well as the congruence classification task, PsychoPy (Version 3.2.4) software running on a 15'' HP laptop with Windows 10 was used to present the stimuli. A button box with two active buttons (one for each hand) was used to track the participants' responses. Auditory stimuli were presented via loudspeakers on both sides of the screen.
For measuring the pitch-elevation CMC, a black square (1° 8' .54'') was presented either above or below a 0° 29' .50'' white fixation cross. The cross was presented in the center of the screen and the distance of the squares to the center was 3° 14' .18''. The auditory stimuli used were the same as in the variant of pitch-size with reduced difference. Squares presented above the fixation cross together with higher-pitched tones and squares presented below the fixation cross together with lower-pitched tones are referred to as congruent trials in the following (Fig. 1A).

Experimental design and procedure
We tested two different CMC types within our study, pitch-size and pitch-elevation, as well as a variant of the pitch-size CMC with reduced difference between the stimulus contrasts. The participants completed a tutorial before entering the scanner. Within the tutorial, the two distinct CMCs as well as the pitch-size variant with reduced difference were introduced separately to the participants. Each tutorial part for the two distinct CMCs and the pitch-size variant with reduced difference consisted of twelve trials (8 congruent; 4 incongruent). Within the tutorial, the participants were not introduced to the concept of congruence and incongruence underlying the stimulus pairings. The purpose of the tutorial was to familiarize the participants with the stimulus pairs, and the focus was always on the visual stimuli.
After the tutorial, the participants were placed in the scanner. Before the experiment started, the volume of the acoustic stimuli was adjusted while the participants were exposed to the scanner noise. With this procedure we ensured a comfortable but valid presentation of the acoustic stimuli in the scanner.
We used an event-related design with jittered inter-trial intervals (ITI) to present the stimuli in the scanner. In the main experiment, each participant saw all CMCs, i.e., the two distinct CMCs and the variant of pitch-size with reduced difference (Fig. 1A, B), in separate runs. The duration of a run was ~10 minutes. The order in which the two distinct CMCs and the variant of pitch-size with reduced difference were presented was counterbalanced between participants.
Each of the three runs, in which one of the two distinct CMCs (Fig. 1A) or the variant of pitch-size with reduced difference (Fig. 1B) was presented, consisted of 96 trials with 48 repetitions of each condition (congruent; incongruent) and 24 presentations of each stimulus pair (e.g., small square and high pitch) (Chiou & Rich, 2012). The 96 trials were presented in a pseudo-randomized order and this order was also randomized between participants. In each trial, a visual stimulus was presented simultaneously with a sound (Fig. 1C). The participants were instructed to respond to the different visual stimuli as quickly and accurately as possible. For small as well as elevated stimuli, the correct button press was performed with the left index finger. For large and low presented stimuli, the button press was performed with the right index finger. The audio-visual presentation lasted for 1000 ms followed by 500 ms of extended key response time. The inter-trial interval was jittered between 2000 and 8000 ms with a mean of ~5000 ms (Fig. 1C). Instructions were shown on a screen in the scanner before each new run started. The participants had the opportunity to take a short break between the runs; however, they had to stay in the scanner during the break.
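The trial structure of one run can be illustrated with a short sketch. It is not the Presentation script used in the study: the function name `build_run`, the pair labels, the plain shuffle and the uniform ITI distribution are all assumptions made for illustration, since the text specifies only the trial counts, the ITI range and its approximate mean.

```python
import random

def build_run(seed=0):
    """Build one illustrative 96-trial run: 4 stimulus pairs x 24 presentations."""
    rng = random.Random(seed)
    # Hypothetical labels for a pitch-size run; two pairs are congruent, two incongruent.
    pairs = [
        ("small", "high", "congruent"),
        ("large", "low", "congruent"),
        ("small", "low", "incongruent"),
        ("large", "high", "incongruent"),
    ]
    trials = [pair for pair in pairs for _ in range(24)]
    # Assumption: a plain shuffle stands in for the unspecified pseudo-randomization.
    rng.shuffle(trials)
    # Assumption: ITIs are drawn uniformly from 2000-8000 ms (expected mean 5000 ms).
    return [
        {"visual": v, "pitch": p, "condition": c, "iti_ms": rng.uniform(2000, 8000)}
        for v, p, c in trials
    ]

run = build_run()
```

Changing the seed per participant would give each participant a different order, in line with the randomization between participants described above.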
A stimulus congruence classification task was performed outside the scanner following the main experiment. Within the stimulus congruence classification, the participants were instructed to classify whether the audio-visual stimulus presentations matched each other or not (Fig. 1D). The congruence classification was performed separately for each distinct CMC and the variant of pitch-size with reduced difference, whereby each condition (congruent; incongruent) was presented six times in a random order. The participants were instructed to classify the presented audio-visual pairs by clicking on the respective side of a scale with a computer mouse (Fig. 1D). Only the ends of the scale could be clicked; no gradual adjustment was possible. The participants were instructed to classify intuitively whether the audio-visual stimuli were matching or not. No feedback on the chosen pair was given. We conducted this final task to test whether the participants matched the congruent and incongruent stimulus pairs in accordance with the CMC theory (Fig. 1A, B).

Behavioral data analysis
The focus of the analysis of the behavioral data was the stimulus congruence classification performed outside the scanner. All statistical tests on the behavioral data were performed in JASP (Version 0.16.1).
All congruence classifications were taken into account for the further analysis. The classifications were divided into trials in which participants chose 'matching' and trials in which participants chose 'not matching', separately for each condition and each CMC, i.e., the two distinct CMCs (Table 1) and the variant of pitch-size with reduced difference (Table 2). We were interested in whether participants would classify our congruent stimulus pairs as matching and our incongruent stimulus pairs as not matching, i.e., whether participants show the expected classification of pairs in accordance with the CMC theory. We also wanted to know whether these classifications depend on the tested CMCs. Therefore, we conducted a repeated measures ANOVA to test the effect of the within-subject factors 'distinct CMCs (pitch-size & pitch-elevation)' and 'Classifications of congruent stimuli (congruent stimuli rated as matching & congruent stimuli rated as not matching)' on stimulus classifications. We also conducted a second repeated measures ANOVA to test the effect of the within-subject factors 'pitch-size and pitch-size variant with reduced difference' and 'Classifications of congruent stimuli (congruent stimuli rated as matching & congruent stimuli rated as not matching)' on stimulus classifications. We conducted the same repeated measures ANOVAs for the stimulus classification performed on incongruent stimuli.
For the sake of completeness, we analyzed the RTs of the in-scanner task. Only RTs from correct trials and trials with RTs below 1000 ms were taken into account in the further analysis. We chose this threshold because the stimulus presentation ended after 1000 ms. Furthermore, we wanted to avoid including decisions formed by, e.g., complex cognitive processing or inattentiveness. We performed two repeated measures ANOVAs, one for the two distinct CMCs and another for the pitch-size CMC and its variant with reduced difference, to test whether the RTs differ between the trial conditions (congruent trials & incongruent trials) and whether these differences vary between the CMCs. One repeated measures ANOVA tested the effect of the within-subject factors 'distinct CMCs (pitch-size & pitch-elevation)' and 'Condition (congruent trials & incongruent trials)' on RTs, and the other tested the effect of the within-subject factors 'pitch-size and its variant with reduced difference' and 'Condition (congruent trials & incongruent trials)' on RTs.
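The RT inclusion rule described above (correct trials only, RT below 1000 ms) can be sketched as follows. The trial dictionary layout, the function name `mean_rt_by_condition` and the example values are hypothetical and serve only to illustrate the filtering step; the study itself ran its statistics in JASP.

```python
from statistics import mean

def mean_rt_by_condition(trials, max_rt_ms=1000):
    """Average RTs per condition, keeping only correct trials faster than max_rt_ms."""
    kept = {}
    for trial in trials:
        rt = trial["rt_ms"]
        # Drop errors, missing responses (rt is None) and slow responses.
        if trial["correct"] and rt is not None and rt < max_rt_ms:
            kept.setdefault(trial["condition"], []).append(rt)
    return {condition: mean(rts) for condition, rts in kept.items()}

# Made-up example trials, not data from the study.
example = [
    {"condition": "congruent", "correct": True, "rt_ms": 480},
    {"condition": "congruent", "correct": True, "rt_ms": 1200},    # too slow, excluded
    {"condition": "incongruent", "correct": True, "rt_ms": 500},
    {"condition": "incongruent", "correct": False, "rt_ms": 450},  # error, excluded
    {"condition": "incongruent", "correct": True, "rt_ms": 520},
]
summary = mean_rt_by_condition(example)
```

The per-participant condition means produced this way would be the input to a repeated measures ANOVA.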

Functional data analysis
Preprocessing and statistical analysis of the fMRI data were carried out in SPM12 (http://www.fil.ion.ucl.ac.uk/spm/) on MATLAB version R2020a. Image preprocessing steps included a correction for magnetic field distortion by unwarping the images using a fieldmap, motion correction with registration to the first EPI, correction for between-subject anatomical differences by normalizing the images to the EPI template provided by SPM12, and smoothing of the normalized images with a 6 mm (full width at half maximum; FWHM) Gaussian kernel. We did not correct for RT differences between congruent and incongruent conditions as the expected RT difference was below 100 ms (Chiou & Rich, 2012; Evans & Treisman, 2010).
The hemodynamic response for each condition (congruent; incongruent) was modelled in an event-related design (for further information see Experimental design and procedure). The six contrast images (main effects) per participant, calculated from the onsets of each condition, were entered into a flexible factorial group-level analysis, and all statistical comparisons were estimated at the group level.
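For intuition about the size of the smoothing kernel, the FWHM of a Gaussian can be converted to its standard deviation with the standard relation sigma = FWHM / (2 * sqrt(2 * ln 2)). The helper name `fwhm_to_sigma` is ours; SPM12 takes the FWHM directly, so this conversion is shown only as an illustration.

```python
import math

def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian full width at half maximum to its standard deviation."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

sigma_mm = fwhm_to_sigma(6.0)  # roughly 2.55 mm for the 6 mm kernel
```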

Functional data analysis of the distinct CMCs
To test for differences between the processing of the conditions in the distinct pitch-size and pitch-elevation CMCs, we estimated two interaction contrasts at the second level. To test for enhanced neural effects of congruent trials selectively in the pitch-size CMC, an interaction contrast was estimated (Pitch-size (C>I) > pitch-elevation (C<I)). To test for enhanced neural effects of congruent trials selectively in the pitch-elevation CMC, another interaction contrast was estimated (Pitch-size (C<I) < pitch-elevation (C>I)). We also tested for differences in neural effects between the conditions within the distinct CMCs. Therefore, we estimated contrasts that tested for enhanced neural effects of congruent stimuli within pitch-size (C>I) and pitch-elevation (C>I), as well as contrasts that tested for enhanced neural effects of incongruent stimuli within pitch-size (C<I) and pitch-elevation (C<I) (Table 3). To test for common neural effects of congruent trials within the two distinct CMCs, a global conjunction was estimated (Pitch-size (C>I) & pitch-elevation (C>I)). Statistically significant whole-brain fMRI effects were family-wise error corrected (FWE, p<0.05).
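One way to read the first interaction contrast is as a difference of differences: the congruency effect (C minus I) in pitch-size minus the congruency effect in pitch-elevation. The sketch below spells this out as contrast weights. The condition ordering, the names and the example values are hypothetical, and the actual SPM design contained six condition images in a flexible factorial model, so this is an illustration of the logic rather than the weight vector used in the study.

```python
# Hypothetical ordering of the four condition estimates for the two distinct CMCs.
CONDITIONS = ["PS_congruent", "PS_incongruent", "PE_congruent", "PE_incongruent"]

def interaction_weights():
    """Weights for (pitch-size C - I) minus (pitch-elevation C - I)."""
    pitch_size_effect = [1, -1, 0, 0]
    pitch_elevation_effect = [0, 0, 1, -1]
    return [a - b for a, b in zip(pitch_size_effect, pitch_elevation_effect)]

def contrast_value(weights, estimates):
    """Weighted sum of condition estimates for one voxel or region."""
    return sum(w * e for w, e in zip(weights, estimates))

weights = interaction_weights()
# Made-up estimates: C > I in pitch-size and C < I in pitch-elevation.
value = contrast_value(weights, [2.0, 1.0, 1.0, 2.0])
```

With this reading, the contrast is largest when congruent trials dominate in pitch-size and incongruent trials dominate in pitch-elevation, which is the pattern the contrast label describes. The weights sum to zero, as expected for a differential contrast.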

Functional data analysis of pitch-size and its variant with reduced difference
To test for common neural effects of congruent trials within pitch-size and its variant with reduced difference, a global conjunction was estimated (Pitch-size (C>I) & pitch-size variant with reduced difference (C>I)).
To test for enhanced neural effects of congruent trials selectively in the pitch-size CMC, an interaction contrast was estimated (Pitch-size (C>I) > pitch-size variant with reduced difference (C<I)). Statistically significant whole-brain fMRI effects were family-wise error corrected (FWE, p<0.05).

Behavioral results of the stimulus congruence classification task
Using the classification task, we tested separately whether the congruent and incongruent trials were overall classified as matching or as not matching. We also tested whether the classifications of congruence or incongruence differ depending on the CMC. We tested this dependence between the distinct CMCs and between pitch-size and its variant with reduced difference. The results of the classification task, which was performed outside the scanner, showed that the congruent stimulus pairs were overall classified as matching and the incongruent stimulus pairs were overall classified as not matching, i.e., the participants' classification aligns with the CMC theory (Spence, 2011). The classification strength for congruent trials differed between pitch-size and pitch-elevation, with more congruent trials rated as matching in pitch-size compared to pitch-elevation (details in the following sections).

Stimulus congruence classification results of the two distinct CMCs
To test whether congruent stimulus pairs were significantly classified as matching by the participants (Table 1) and to test whether this classification differed between the distinct CMCs, a repeated measures ANOVA with the distinct CMCs and Classifications of congruent stimuli (congruent trials rated as matching & congruent trials rated as not matching) as within-subject factors was performed. For the distinct CMCs, this repeated measures ANOVA showed a reliable effect for the factor Classifications of congruent stimuli (F(1,28)=133.8, p<0.001, ηp²=0.83; Table 1). This means that congruent pairs were significantly classified as matching (M=86.2%, SEM=3.9%; Table 1). The interaction of distinct CMCs × Classifications of congruent stimuli was statistically significant (F(1,28)=7.398, p=0.011, ηp²=0.209) and a post-hoc test revealed that significantly more congruent pairs were classified as matching within pitch-size compared to pitch-elevation (pholm=0.022). This means that the number of congruent trials rated as matching differed significantly between the pitch-size and pitch-elevation CMCs, i.e., a higher congruence classification was observed for congruent stimuli in pitch-size compared to pitch-elevation (Table 1). Furthermore, significantly more congruent pairs were classified as matching than as not matching within pitch-size (pholm<.001) as well as within pitch-elevation (pholm<.001). We observed no effect for distinct CMCs (F (1,28) =-6.010e -14 , p=1.0, ηp²=-2.146e -1 ).
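The reported effect sizes can be cross-checked from the F values and degrees of freedom using the standard relation for partial eta squared, ηp² = (F × df_effect) / (F × df_effect + df_error). The function name below is ours; the inputs are the values reported in the text.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Main effect of Classifications of congruent stimuli: F(1,28)=133.8
eta_main = partial_eta_squared(133.8, 1, 28)
# Interaction of distinct CMCs x Classifications: F(1,28)=7.398
eta_interaction = partial_eta_squared(7.398, 1, 28)
```

Rounded, these reproduce the reported values of 0.83 and 0.209.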
To test whether incongruent stimulus pairs were significantly classified as not matching by the participants and to test whether this classification is different between the distinct CMCs, a repeated measures ANOVA with the distinct CMCs and Classifications of incongruent stimuli (incongruent trials rated as matching & incongruent trials rated as not matching) as within-subject factors was performed.

Stimulus congruence classification results of the pitch-size CMC and its variant with reduced difference
To test whether congruent stimulus pairs were significantly classified as matching by the participants (Table 2) and to test whether this classification is different between the pitch-size CMC and its variant with reduced difference, a repeated measures ANOVA with the within-subject factors pitch-size and its variant with reduced difference and Classifications of congruent stimuli (congruent trials rated as matching & congruent trials rated as not matching) was performed. This repeated measures ANOVA showed a reliable effect for the factor Classifications of congruent stimuli (F(1,28)=216.87, p<.001, ηp²=0.89; Table 2). This means that congruent pairs were significantly classified as matching (M=87.4%, SEM=3.35%; Table 2). The interaction of pitch-size and its variant with reduced difference × Classifications of congruent stimuli was statistically significant (F(1,28)=5.072, p=0.033, ηp²=0.152) and a post-hoc test revealed that significantly more congruent pairs were classified as matching than as not matching within pitch-size (pholm<0.001) as well as within the pitch-size variant with reduced difference (pholm<0.001). However, the number of congruent stimuli classified as matching did not differ significantly between pitch-size and its variant with reduced difference (pholm=0.066). No main effect was observed for pitch-size and its variant with reduced difference (F (1,28) =1.699e -13 .0, p=1.0, ηp²=6.068e -1 ). This means that no significant difference was observed between pitch-size and its variant with reduced difference for the classification of congruent stimuli.
To test whether incongruent stimulus pairs were significantly classified as not matching by the participants (Table 2) and to test whether this classification is different between the pitch-size CMC and its variant with reduced difference, a repeated measures ANOVA with pitch-size and its variant with reduced difference and Classifications of incongruent stimuli (incongruent trials rated as matching & incongruent trials rated as not matching) as within-subject factors was performed. This repeated measures ANOVA showed an effect for Classifications of incongruent stimuli (F(1,28)=33.3, p<.001, ηp²=0.54). This means that incongruent pairs were significantly classified as not matching (M=76.7%, SEM=5.4%; Table 2). No effect was observed for pitch-size CMC and its variant with reduced difference (F (1,28) =-5.634e -15 , p=1.0, ηp²=-2.012 -1 ) or for the interaction of pitch-size CMC and its variant with reduced difference × Classifications of incongruent stimuli (F (1,28) =0.01, p=0.92, ηp²=3.55e - ). This means that no significant difference was observed between pitch-size and its variant with reduced difference for the classification of incongruent stimuli.

Behavioral results of the in-scanner task: recorded reaction time data
We discarded 2.25% of trials due to incorrect or missing responses and 0.9% of trials due to slow button presses (over 1000 ms).

Error rates and reaction time data of the in-scanner task for the two distinct CMCs
The error rates for pitch-size were 3.27% and for pitch-elevation 3.3%. To test whether there is a difference in RTs depending on Conditions (congruent trials & incongruent trials) and the tested distinct CMCs (pitch-size & pitch-elevation), we performed a repeated measures ANOVA with the distinct CMCs and the two conditions as within-subject factors on RTs. This ANOVA showed a main effect

Error rates and reaction time data of the in-scanner task for the pitch-size CMC and its variant
The error rates for pitch-size were 3.27% and for pitch-size with reduced difference 2.95%. To test whether there is a difference in RTs depending on Conditions (congruent trials & incongruent trials) and the tested CMCs pitch-size and its variant with reduced difference, we performed a repeated measures ANOVA with pitch-size and its variant with reduced difference and the two conditions as within-subject factors on RTs. This repeated measures ANOVA showed a main effect for pitch-size and its variant with reduced difference (F(1,28)=26.4, p<.001, ηp²=0.485). RTs in the pitch-size CMC (492 ms; SEM=9.5 ms) were significantly shorter than in the pitch-size variant with reduced difference (526.5 ms; SEM=9.8 ms). The main effect for Condition (F(1,28)=1.025, p=0.32, ηp²=0.04) and the interaction of pitch-size and its variant with reduced difference × Condition (F(1,28)=1.25, p=0.27, ηp²=0.04) were not significant. In summary, a significant difference in RTs was observed between pitch-size and its variant with reduced difference, indicating a probable difference in processing the stimulus pairs depending on the tested pitch-size CMC.

Functional data
We were interested in identifying the neural components of the different CMC types as well as of pitch-size and its variant with reduced difference in order to evaluate different and common neural mechanisms underlying pitch-based CMC effects. To test for congruence effects depending on the CMCs pitch-size and pitch-elevation, interaction analyses were performed. To test whether there were common effects for congruent trials between the distinct CMCs as well as between pitch-size and its variant with reduced difference, global conjunction analyses (k>0) were performed.

Differences between pitch-size and pitch-elevation crossmodal correspondences
We did not find common congruence (C>I) effects for the pitch-size and pitch-elevation CMCs when a global conjunction was performed. However, clear differences between the distinct CMCs were observed when we tested for enhanced neural effects of congruent trials selectively in the pitch-size CMC. The computed interaction for pitch-size (C>I) > pitch-elevation (C<I) revealed whole-brain family-wise error (FWE; p<0.05) corrected activations within the left superior parietal lobule (SPL; MNI coordinates: x=-26, y=-48, z=62; T=5.39, p=0.021), the left Heschl's gyrus (HG; z=6; T=5.33, p=0.026) and the left cerebellum (T=5.22, p=0.041; Fig. 2). We observed greater neural effects for congruent trials (C>I) in the pitch-size CMC and greater neural effects for incongruent trials (C<I) in the pitch-elevation CMC in these regions (see Fig. 2 as well as Table 3 and Table 4).
The effect within the right AnG was dominated by a positive CMC effect within pitch-size (C>I; T=3.94, p=0.014 corr.). The measured effects within the right ACC were significantly greater for congruent trials in pitch-size (C>I; T=3.97, p=0.013 corr., Table 4) and for incongruent trials in pitch-elevation (C<I; T=3.61, p=0.038 corr., Table 4). The elicited effect in the IPS was driven by incongruent trials within the presentation of pitch-elevation (C<I; T=4.06, p=0.01 corr.; see Fig. 3). For the sake of completeness, we have included a table presenting the statistically significant results of the small-volume corrected regions of interest for both the pitch-size and pitch-elevation CMCs and both conditions (C>I, C<I; Table 4).
To evaluate a possible influence of RT differences between the pitch-size and pitch-elevation CMCs on the neural activity, we replicated the interaction analysis for pitch-size (C>I) > pitch-elevation (C<I) by including individual median RT data for each condition as a parametric modulator in the group level analysis.The results showed no substantial changes in activity neither in the whole brain nor the ROIs.All reported coordinates in the interaction analysis remained the same and the p-value remained at p<0.05 for the SPL  (T=5.36),cerebellum (T=5.20),HG (T=5.29),right ACC (T=5.57),left ACC (T=4.11) and AnG (T=3.69).The neural activity found in the left IPS (T=3.46)stayed at a nearly significant level (p=0.058).No activation of the IFG survived with the small volume correction.These results confirm that the interaction effects between the pitch-size and pitch-elevation CMCs are independent of the observed RT differences.
Common effects for pitch-size and pitch-size with reduced difference correspondences
We were interested in the extent to which a reduced difference between the visual and acoustic stimuli could modulate a CMC effect for pitch and size. No common effect was found between pitch-size and its variant with reduced difference. This means that the global conjunction over pitch-size (C>I) & pitch-size variant with reduced difference (C>I) did not lead to significant effects within the whole brain or our ROIs. We therefore concluded that there was likely a difference in neural effects between pitch-size and its variant with reduced difference. Subsequently, we examined whether there were any differences in our ROIs between pitch-size and its variant with reduced difference by conducting an interaction analysis that tested for enhanced neural effects of congruent trials selectively in pitch-size (pitch-size (C>I) > pitch-size variant with reduced difference (C<I)). A differential effect in the ACC (MNI coordinates: x=-2, y=30, z=26) with a T value of 3.75 was observed. However, it did not survive corrections for multiple comparisons.
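A conjunction over two contrasts is commonly built on the minimum statistic: at each voxel, the smaller of the two T values is compared with a threshold. The sketch below illustrates only this idea, with made-up T values and an arbitrary threshold. The threshold value and the function name are assumptions, and the choice of null hypothesis and the FWE-corrected threshold used in this study are not reproduced here.

```python
import numpy as np


def min_statistic_conjunction(t_map_a, t_map_b, threshold):
    """Return the voxel-wise minimum T map and a mask of voxels above threshold."""
    t_min = np.minimum(np.asarray(t_map_a, dtype=float),
                       np.asarray(t_map_b, dtype=float))
    return t_min, t_min > threshold


# Made-up T values for three voxels in two contrasts (not real data).
t_contrast_a = np.array([4.1, 0.5, 5.0])
t_contrast_b = np.array([0.3, 4.8, 5.2])

t_min, conjunction_mask = min_statistic_conjunction(
    t_contrast_a, t_contrast_b, threshold=3.0
)
```

In this toy example only the third voxel passes, because it is the only one where both contrasts exceed the threshold. A voxel that is strong in only one contrast does not contribute to a common effect.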

DISCUSSION
In this study, the neural correlates of different CMCs were compared to evaluate whether CMCs are processed by common or different mechanisms and, further, to obtain a possible clue regarding the theories on the origins of CMCs. Our results are in favor of different processes underlying different pitch-based CMCs as well as in favor of the theory that these CMCs are probably driven by statistical regularities from our environment.
In the stimulus congruence classification, which was performed outside the scanner, the participants classified the congruent stimulus pairs significantly more often as matching than as not matching, and the incongruent stimulus pairs significantly more often as not matching than as matching. This held for both distinct CMCs as well as for the variant of pitch-size with reduced difference. This means that the participants classified the stimulus pairs in accordance with the CMC theory (Fig. 1A, B).

Common and different regions are involved in processing different crossmodal correspondences
Overall, no area was observed to show common effects for congruence across the distinct CMCs within this study. This finding points towards probable differences in the underlying information processing of different CMCs. Differential effects were observed in the SPL, cerebellum, HG, ACC, AnG and IPS. In all regions, the effects were different between the conditions of the two CMCs. For the pitch-size CMC, a greater activation was observed for congruent > incongruent stimulus presentations, while enhanced activity was seen for congruent < incongruent stimulus pairs in the pitch-elevation CMC within identical areas. However, the stimulus congruence classification task showed clear behavioral effects, supporting the assumption that congruent stimulus presentations are overall perceived as belonging together. This finding is consistent with the possibility that the neural effects are related to the perceived congruency between the visual and acoustic stimuli in the different CMCs. Therefore, the disparate neural effects can be interpreted as correlates of different neural processes underlying the different CMCs.
The processing and integration of multimodal information is assumed to start already in the primary sensory system (Baier et al., 2006; Crottaz-Herbette & Menon, 2006; Werner & Noppeney, 2010). This likely explains our observation of effects within the HG in both CMCs. The HG, which is part of the primary acoustic system, demonstrated a modulation of acoustic processing based on the congruency of the visual stimulus in CMCs. Despite the fact that our visual stimuli were clear and informative on their own, we found that processing of the visual stimulus was clearly influenced by the acoustic stimulus (Gallace & Spence, 2006; McDonald et al., 2000; Misselhorn et al., 2016; Werner & Noppeney, 2010). This is particularly interesting since the acoustic stimulus is irrelevant in all tested CMCs. On the one hand, acoustic processing was previously reported as suppressed in visually dominating tasks (Johnson & Zatorre, 2005; Schmid et al., 2011). On the other hand, the influential nature of irrelevant acoustic stimuli in visual information processing was also reported by other studies (McDonald et al., 2000; Regenbogen et al., 2018; Tonelli et al., 2017). Thus, our results are a likely indication that in CMCs, differential processing of congruent and incongruent stimuli already occurs in the early sensory systems. To our surprise, the direction of this differential processing in the early sensory system differed between the CMCs. Greater effects in the HG were found for congruent stimulus presentations in pitch-size and for incongruent stimulus presentations in pitch-elevation (Fig. 2).
The ACC was also involved in both CMCs, which probably reflects the different functions of this area in the dynamical interplay between top-down and bottom-up multisensory processing (Benedict et al., 2002; Crottaz-Herbette & Menon, 2006; Downar et al., 2002; Laurienti et al., 2003). In particular, the role of the AnG in multisensory integration is well established (Bonnici et al., 2016; Cabeza et al., 2012; Hölig et al., 2017). Further, the AnG is assumed to be crucial for the direction of attention in relation to memory (Cabeza et al., 2012; Humphreys & Ralph, 2015; Jablonowski & Rose, 2022; Regenbogen et al., 2018). The involvement of this area can be interpreted as recruitment of memory-based attention in the pitch-size CMC. We speculate that pitch-size congruent trials were mainly driven by bottom-up processes, as the AnG is part of the ventral parietal cortex (VPC) (Cabeza et al., 2012; Humphreys & Ralph, 2015; Kim, 2010). According to the attention to memory (AtoM) model postulated by Cabeza and colleagues, pitch-size congruencies can be assumed to be driven by a detection of cues as part of a memory-based retrieval (Cabeza et al., 2011; 2012). According to Cabeza and colleagues, the VPC '[…] mediates the bottom-up capture of attention by salient memory contents (bottom-up AtoM)' (Cabeza et al., 2012, p. 6).
In contrast to pitch and size correspondences, top-down processes seem to be involved in pitch-elevation CMCs. The IPS was postulated to be part of top-down attentional processes as this region is part of the dorsal parietal cortex (Cabeza et al., 2011; 2012; Regenbogen et al., 2018). Furthermore, the IPS showed greater activations dependent on task difficulty (Regenbogen et al., 2018). As the information about the location of a visual stimulus can be assumed to be biased by the misleading acoustic stimulus within incongruent pitch-elevation trials, top-down controlled attention shifts were probably enforced within incongruent stimulus presentations in the pitch-elevation CMC (Chiou & Rich, 2012; Regenbogen et al., 2018; Salmi et al., 2009). On a theoretical level, the fMRI results supported the theory about statistical assumptions extracted from our environment as a general basis for different CMCs (Maimon et al., 2021; Pisanski et al., 2017; Spence, 2011, 2020; Spence & Sathian, 2020; Zeljko et al., 2019). This would explain the facilitation of memory-driven processes, which seem to best fit the measured neural effects. As shown in other studies before, there seems to be a strong connection between pitches and certain sizes or between pitches and the spatial location of objects (Ben-Artzi & Marks, 1995; Bolam et al., 2022; Evans & Treisman, 2010; Jamal et al., 2017; Maimon et al., 2021; Pisanski et al., 2017; Tonelli et al., 2017). These connections are probably based on experiences from our daily lives (Bowling et al., 2017; Parise et al., 2014; Pisanski et al., 2017). However, this common basis results in differential neural processing in the two distinct CMCs. We can only speculate at this point that our task design allowed for an easier detection of congruent pitch-size presentations. The spatially differentiating incongruent pitch-elevation pairs, on the other hand, probably resulted in a confounded response detection processing (Bruns et al., 2014; Chiou & Rich, 2012; Maimon et al., 2021). Differences in processing the stimulus combinations in the two distinct CMCs may explain the lack of significant effects observed for the incongruent pitch-size and congruent pitch-elevation presentations, as well as for the interaction testing for enhanced neural effects of congruent trials selectively in the pitch-elevation CMC. However, further studies are required to investigate these differences and clarify the findings.

Findings on language and magnitude driven effects in crossmodal correspondences
No statistically significant activation in our language-specific ROIs was found within this study. Based on the findings from this study as well as previous research, the theories stating that pitch-elevation correspondences are driven by language are not supported (McCormick et al., 2018; Parkinson et al., 2012). However, it is important to note that we cannot rule out any entanglements of language with other factors influencing the mechanisms underlying pitch and elevation correspondences.
When examining the theory of magnitude-driven correspondences, the effect within the left, instead of the proposed right, IPS exhibited nearly significant results in the interaction between pitch-size and pitch-elevation. Surprisingly, the effect showed significant results in relation to incongruent presentations when assessing the ROI in the main effect of pitch and elevation, contradicting our hypothesis. Hence, our data do not strongly support the existence of a shared foundation for correspondence effects driven by magnitude within or across different types of CMCs.
According to the theory of magnitude/intensity, e.g., a small visual stimulus would be located on the same side of a polar scale as a high-pitched tone on its respective scale. This would lead to an intrinsic matching of congruent pairs, as they would be perceived as 'more' intense or 'greater' in magnitude when combined (Chang & Cho, 2015). Hence, incongruent pairs of audio-visual stimuli would be on opposite sides of their polar scales and therefore perceived as less intense or lower in magnitude. While our findings in investigating magnitude and language effects in our chosen ROIs may be open to debate, they are in line with the relatively weak effects reported by McCormick et al. (2018) in their study on pitch-elevation correspondence.

Linking this study to a former fMRI study on pitch-elevation CMCs
It is important to note that McCormick et al. (2018) used a working memory task in their study on pitch-elevation correspondences, which might have obscured direct correspondence effects, leading to no significant effects in a congruent versus incongruent fMRI contrast. Nevertheless, notable effects were observed when analyzing consecutive congruent versus consecutive incongruent trials. Even though we have to interpret these findings with caution, both sets of findings share some commonalities with regard to the magnitude and language hypotheses, namely null or weak effects. On the other hand, we found similar effects in the right AnG, although these probably originated in the pitch-size correspondence in our study. Rather than directly comparing or linking our outcomes to the study by McCormick et al. (2018), our aim was to start from a similar theoretical framework and employ similar ROIs to explore the congruence effect in different CMCs with a non-working memory related task design.

No effects of pitch and size and its variant with reduced difference
We hypothesized that we would find a modulation of effect strength related to the reduced difference of the contrast of pitches and sizes in our pitch-size CMCs. This hypothesis was not supported by the measured effects within this study. We did not observe common effects of congruence between pitch-size and its variant with reduced difference within our ROIs. According to the findings of a study by Chiou & Rich (2012), the mapping of acoustic and visual stimuli seems to be relative to the assignments of low and high pitches to e.g., small and large squares. However, the ability to measure a CMC effect is dependent on the ability to form distinct pairs which clearly stand out from each other in the context they are presented in (Chiou & Rich, 2012; Zeljko et al., 2019). The context in which the stimuli are presented is thought to support the formation of congruent mappings of stimuli (Chiou & Rich, 2012; Zeljko et al., 2019). The findings in our congruence classification task led to the assumption that a reduced difference of the contrast of our implemented stimuli did not significantly weaken the mentioned ability to associate the expected stimulus pairs as congruent or incongruent. We observed a differential activity in the ACC of pitch-size compared to its variant. This effect was observed when we performed an interaction analysis that tested for enhanced neural effects of congruent trials selectively in the pitch-size CMC. However, this activity did not survive multiple comparison corrections. Nevertheless, this observation suggests that differences in processing congruency may be involved in pitch-size compared to its variant. The slowed down RTs in the pitch-size variant with reduced difference compared to the pitch-size pairs with a great difference in stimulus contrasts probably hint towards a slowed down decision process. It remains speculative whether a lack of neural modulatory effects was caused by the great similarity of the two pitch-size CMCs, as well as the higher similarity of stimuli within the variant of the pitch-size with reduced difference.

Limitations of this study
The present study has some limitations that we want to address. Instead of performing the CMC presentations in an upright position as in classical behavioral measurements, the participants lay on their back during the main experiment in our study. The perceived upright should not have interfered with the classification of pitch-elevation pairs as matching (congruent) or not matching (incongruent), as the presented visual and acoustic stimuli were aligned with the body position (Harris et al., 2015). Nevertheless, we cannot exclude a possible influence of the body position on the perception of stimulus positions within the pitch-elevation part of the experiment in the scanner. While we decided to use natural stimuli instead of pure tones, which allowed for a more natural representation of crossmodal correspondences, it is worth acknowledging the potential influence of timbre on our findings. We matched the instruments and their produced tones, i.e., higher tones are played by smaller instruments and lower tones are played by larger instruments within our distinct CMCs and the pitch-size variant with reduced difference. This alignment potentially amplified the correspondence effect observed in the pitch-size CMC. Previous studies, such as Evans and Treisman (2010), successfully utilized piano and violin sounds in their indirect crossmodal correspondence tasks without encountering issues related to the instruments used. However, it is worth noting that while there are studies exploring crossmodal correspondences involving timbre and other perceptual dimensions (Adeli et al., 2014; Qi et al., 2020; Wallmark & Allen, 2020), to our knowledge, none have specifically investigated the role of timbre in pitch-size or pitch-elevation correspondences using the stimuli employed in our study. While we believe our results provide valuable insights into the mechanisms behind different crossmodal correspondences, it would be beneficial to replicate our study with an alternative experimental setup to ascertain the robustness of our conclusions. It would, for example, be of great interest whether the results of this study can be replicated when utilizing the same task for e.g., different crossmodal correspondences. Regarding the null effects we observed in our study, it is important to emphasize that the absence of significant findings does not necessarily imply the absence of an effect. Further investigation with a different paradigm may provide a better understanding of the effect. Although CMCs have been successfully studied using an implicit task design in behavioral studies (Evans, 2020; Evans & Treisman, 2010; Parise & Spence, 2012; Chiou & Rich, 2012), exploring differences between the processing of CMCs in implicit and explicit task designs may be of interest for future fMRI studies. For example, a recent study on sound-symbolic CMCs (Barany et al., 2023) showed that significant congruence effects occur in an explicit design, whereas an implicit task design led to significant incongruence effects in sound-symbolic CMCs (McCormick et al., 2021; Peiffer-Smadja & Cohen, 2019). In future studies of pitch-based CMCs, it would also be worthwhile to test other stimuli, increase the number of trials, or expand the sample size to gain a more comprehensive understanding of the neural mechanisms involved in CMC effects.

CONCLUSION
This study aimed to elucidate the neural processing of different pitch-based correspondences. The results of our study argue in favor of the idea that different mechanisms drive the integration of stimulus features within different audio-visual CMCs. The strong matching of pitches with their congruent spatial location probably led to greater top-down attention-driven processes during incongruent stimulus presentations. On the other hand, attention as well as memory retrieval seem to be crucial for pitch-size correspondences. Our results support the findings of previous studies assuming both top-down and bottom-up processes are involved in forming the CMC effect in particular and multimodal integration in general (Getz & Kubovy, 2018; Salmi et al., 2009).

Fig. 1. (A) Overview of the congruent (left) and incongruent (right) stimulus pairs for each of the two distinct CMCs (from top to bottom: Pitch-elevation, pitch-size). (B) Overview of the congruent (left) and incongruent (right) stimulus pairs for the pitch-size variant with reduced difference between the visual and acoustic stimulus pairs. (C) Schematic sequence of events in a trial (here pitch-size). (D) Example of a trial within the post experimental test for the strength of congruence outside the scanner. Participants were asked to classify simultaneous sound and square presentations as matching or not matching without knowing the purpose of the main experiment. Only the ends of the scale could be clicked, no gradual adjustment was possible.

Fig. 2. Effects of the interaction for pitch-size with pitch-elevation (left), which tested for enhanced neural effects of congruent trials selectively in the pitch-size CMC. Contrast estimates for the corresponding region are displayed on the right (blue = Pitch-size; orange = Pitch-elevation). Each bar represents the activation difference (C>I) within the specific region. Whole-brain FWE (peak-level; p<0.05) corrected effects were observed in the SPL, cerebellum and HG. Error bars indicate the observed standard error within each contrast. (The FWE corrected analysis with p<0.05 was used to create the images above).

Fig. 3. Effects of the interaction for pitch-size with pitch-elevation (left), which tested for enhanced neural effects of congruent trials selectively in the pitch-size CMC. Contrast estimates for the corresponding region are displayed on the right (blue = Pitch-size; orange = Pitch-elevation). Each bar represents the activation difference (C>I) within the specific region. FWE corrected small volume effects (peak-level; p<0.05) were observed in the ACC, AnG and IPS. Error bars indicate the observed standard error within each contrast. (For display purposes, a T value of > 4 was chosen to create the images above).

Table 1. Congruence classification of the two distinct CMCs for congruent and incongruent stimulus presentations. Mean in percent for stimulus conditions (congruent, incongruent) within each distinct CMC (pitch-size,

Table 2. Congruence classification of pitch-size and its variant with reduced

Table 4. Condition specific fMRI findings for pitch-size and pitch-elevation. Findings of the small volume, FWE (p<0.05) corrected ROIs peak-level effects for pitch-size and pitch-elevation by condition (C>I; C<I). Statistically significant effects are shown and therefore no results for pitch-size C<I and pitch-elevation C>I are included in the table.

Table 3. fMRI effects for an interaction between pitch-size and pitch-elevation. Interaction effects within a) whole brain FWE corrected (p<0.05) peak-level effects and b) small volume, FWE corrected (p<0.05) peak-level effects within the defined ROIs. Statistically significant effects are shown and therefore no results for the interaction testing for enhanced neural effects of congruent trials selectively in pitch-elevation (pitch-size (C<I) < pitch-elevation (C>I)) are included in this table.