Bug or Feature: REPLACE Puzzle

Davide Mauri, a colleague from SolidQ, brought up an interesting discussion in our company, which originally started in the Italian SQL Server forums. The topic was certain behavior of the T-SQL REPLACE function that some people thought was a bug and others a feature. I will present the example in question here as a puzzle, and before describing the correct answer and the reasoning behind it, I’ll let you try and figure out what you think the correct answer is.

See if you can guess what the output of the following code is going to be before executing it (empty space left intentionally before presenting the answer):

DECLARE @s AS VARCHAR(1000) = '.x.x.x.';

SELECT REPLACE(@s, '.x.', '.y.');

Books online doesn’t provide much details about the REPLACE function. All it has is just one sentence saying: “Replaces all occurrences of a specified string value with another string value.” With this in mind, it’s interesting that for many people it seems intuitive that the correct answer should be .y.y.y., but in practice, it is, .y.x.y. (read: dot y dot x dot y dot). If you guessed incorrectly, now that you know what the correct answer is, see if you can explain the reasoning behind it.

Given the expression REPLACE(@s, @find, @replacement), the REPLACE function was designed to replace in @s nonoverlapping occurrences of @find with @replacement, scanning the string from left to right. So apply this logic to our example, where @s is initially .x.x.x.. You can express the original string as x, and after the replacement, the result is x. Translating the result to the actual string, you get .y.x.y..

For some this behavior seems like a bug for two main reasons:

1. You’re thinking of overlapping cases.

2. You’re thinking of the all-at-once concept that is supposed to be implemented by SQL, where all calculations are supposed to be handled in an all-at-once manner, as opposed to being handled from left to right.

Regarding point 1, the REPLACE function simply wasn’t designed to deal with overlapping cases, rather nonoverlapping ones.

Regarding point 2, the all-at-once concept is applicable to the set operations that apply to the tables (relations) involved in the query. All expressions that participated in the same logical query processing phase (e.g., the SELECT phase, the WHERE phase, the UPDATE SET phase, etc.), are conceptually evaluated at the same point in time. The concept is not applicable to the logic of a scalar expression operating on an input value expression, which is our case.

So, now that you know what the correct answer is, and that this behavior is in fact a feature and not a bug, how do you change the original expression such that it will handle overlapping cases? That is, can you think of an expression that when applied to the input .x.x.x., requesting to replace all ocurrences of .x. with .y., you will get the result .y.y.y.?

Cheers,

Comments

Plain text