Phonotactic constraints on word-initial structures vary across languages: Slavic languages permit a wide range of sound sequences, whereas English prohibits stop-stop clusters and Japanese restricts obstruent sequences of any kind. To examine the perceptual challenges that acoustic similarities between native and non-native phonotactic patterns pose for English monolinguals and English-speaking learners of Japanese, we conducted a syllable-counting experiment using non-words that mimic the phonotactic patterns of English, Japanese, and Slavic languages (e.g., /putata/, /pu̥tata/, and /ptata/, respectively). English monolinguals judged Japanese-like tokens to have three syllables less often than Slavic-like sequences, indicating that the acoustics of the linguistic input modulate the perception of sequences unattested in native phonotactics. Learners of Japanese, by contrast, showed sensitivity to underlying voiceless vowels, reflecting their integration of acoustic detail with a learned phonotactic grammar that prohibits stop-stop onsets. However, exposure to the acoustics of Japanese voiceless vowels did not improve L2 listeners' ability to perceive the Slavic-like clusters without an epenthetic vowel, suggesting that the acquired phonotactic constraint modulates how L2 learners perceive the acoustic input. Overall, the study underscores the intricate role of linguistic experience in the perception of non-native sound sequences and provides insights for future research on L2 speech perception.