.NETRocks Conversations: Jason Olson Digs into CLR 4

[Editor’s note: Welcome to “.NETRocks Conversations,” excerpts of conversations from the .NET Rocks! weekly Internet audio talk show. Hosts Richard Campbell and Carl Franklin chat with a wide variety of .NET developer experts. This month’s excerpt is from show 517, “Jason Olson Digs into the CLR 4.0.”]

Carl Franklin: Jason Olson is a Senior Technical Evangelist in the Developer & Platform Evangelism division at Microsoft, currently evangelizing Visual Studio 2010 and .NET Framework 4.0.

Jason Olson: Where do you want to start? We have everything from the language space to core framework improvements around things like Parallel Extensions, MEF, and the core data structures, and even the CLR itself. CLR 4.0 contains a lot of improvements.

Richard Campbell: Wouldn't there be changes to the garbage collection?

Olson: Yeah, absolutely. There was a big change: the introduction of something called Background GC, or Background Garbage Collection. What's really interesting about Background GC is that it's really an upgrade of the existing Concurrent GC for desktop, where we ran into plenty of limitations around allocation of memory. If you were allocating Gen 0 or Gen 1 type stuff, you could hit an allocation limit, and as soon as that happened your thread was suspended completely while garbage collection took place.

Franklin: Yeah, and it made a little hiccup.

Olson: Exactly, and it defeats the purpose of Concurrent GC, which targets those desktop UI applications that always want to stay responsive, without the white screen of death.

Franklin: Early games running on DirectX in managed code would have that pause.

Olson: Yeah, and it became a pain. With Background GC there's still a limit per se, but when you hit that limit you can keep going while the GC runs in the background, so there's no abrupt stoppage of that thread.

Franklin: Side-by-side CLR versions?

Olson: The biggest addition to side-by-side CLR versions with .NET Framework 4.0 and CLR 4.0 is the introduction of what we call In-Process Side-by-Side, which can be a huge feature for hosted applications. Looking forward to Office adopting the .NET Framework, or some other host adopting it, that becomes huge. We used to support only Out-of-Process Side-by-Side, where one process could use .NET 1.1 and another could use 2.0, but the versions couldn't intermingle within the same process. Now they can.

Franklin: You’ve now nailed it with ASP.NET, because that's one process, one version of the CLR. If you had a tool that was built for Framework 2.0 and you weren't running on Framework 2.0, you were out of luck.

Olson: So you combine that with other changes, like .NET Framework 4 not auto-rolling applications forward, and you find yourself in a much better position to deploy .NET Framework 4.0 without worrying about breaking existing apps in your current enterprise or systems.

Franklin: And interop has some new features too.

Olson: .NET Framework 4.0 has really stepped up and filled a lot of the gaps that existed before, and interop between COM and managed applications is a big one. Before, there was a primary interop assembly chicken-and-egg scenario. Let's say Office has a primary interop assembly that you use as a managed developer to target and write add-ins for Office. But Office itself can't deploy the primary interop assembly when it's installed, because Office doesn't require the .NET Framework to be installed. So if .NET isn't installed, there's potentially no global assembly cache, and nowhere to deploy the primary interop assembly in the first place. This new feature in the CLR called Type Equivalence allows us to essentially mark two types that may be deployed in different assemblies and say, hey, these two types are equivalent. They're identical. So while you might be developing against a primary interop assembly, there's a new option in Visual Studio 2010 in the settings for that primary interop assembly where you can say, I want you to embed the interop types. That will embed the types directly in your deployed assembly and flag them in such a way that the CLR knows, okay, the version of these components that the host, let's say Office, is expecting is identical to the version that is here in your add-in assembly. That way, when you go to load or deploy it, you don't have a situation where a 2K add-in has to drag along a 12-meg PIA, which is kind of insane. So it addresses that issue and makes interop, and especially deployment and management of those interop scenarios, a lot better than it was before.

Franklin: PIA, Primary Interop Assemblies.

Olson: Well, here it's called NoPIA.

Franklin: So I heard something about a simplified security model but I'm not sure what that means. Are we talking Code Access Security or what kind of security are we talking about?

Olson: Anyone who has dealt with Code Access Security, or CAS, in the past knows CAS is very complex and very hard to deal with.

Franklin: And I’d say that you haven't heard about it because a whole bunch of developers before you took a look at it and went yuck.

Olson: There's such a major overhaul to the way you implement policy, sandboxing, and enforcement in .NET that it’s almost completely different than it was before. The cool thing about how the team implemented this: most applications today will continue to work exactly as they did before, and the host and library developers who really need to leverage this stuff to protect their code have a much simpler way to address security in their applications than they ever did before. This stuff can get so convoluted and complex that it becomes very difficult to get it right, and anything that we're doing that prevents developers from falling into the pit of success is a bad thing. I mean, if you really want to get into some gory details, we talk about how security really comprises three areas when we're talking about this kind of code security. First, there's the policy that defines what the security is and how it behaves. Second, there's the sandboxing that actually takes place to make sure that code is living in its own sandbox, so it can't escalate or do things that it shouldn't do: protection from malicious code. Then there's the enforcement of that security that happens at runtime. Each of those areas has had a major overhaul in CLR 4.0, and in such a way that it’s much simpler, like I said. My primary example when it comes to policy and enforcement of security in CLR 4.0 is that CAS policy is actually disabled by default now. Especially when you look at machine-level policy, that concept is gone, deprecated, in a way where we don't even want you thinking about it, because policies applied at the enterprise and machine level have caused problems and confusion in the past.
The thing is, the correct place for that policy in a lot of ways is at the OS level, not at the .NET level. If you're not doing that policy at the OS level, you get into this situation where you actually have different policies that impact managed and native code, and where does that shift happen between them? Especially if you have managed code that's calling out to native code, it causes this almost explosion of policies and permissions that has to happen, and that causes a lot of problems. So today, the impact of turning that off by default is that all un-hosted managed code is actually fully trusted by default. Code run from a hard drive or code run from a network share is fully trusted today, as opposed to what it used to be before.

Campbell: It's the classic Windows problem of you're either an administrator or your app doesn't run. You either have full trust or you're hooped.

Olson: We caused more problems by trying to lock everything down than was actually necessary. So today with CLR 4.0, the host code is making and enforcing those security decisions. For instance, code that arrives from the Internet, like via ClickOnce, isn't going to be fully trusted, because everything that comes from IE or from the web is going to be partial trust. So we're not going to break applications, and we don't want to let malicious code gain access that it didn't have before, but we want people to be able to fall into that pit of success and do things they expect to just work. Now, for legacy code issues, there's a flag that you can enable in your app.config to turn on the legacy Code Access Security policy if the new model is a problem. Sandboxing is one major area where there was a lot of confusion. One primary example is what we call a heterogeneous app domain, where it was actually possible before to have an app domain in which every assembly that was loaded had a different permission set.

Franklin: Yikes.

Olson: And each could have its own customized permission set. Then you talk about the explosion where machine policy is going to affect that, and enterprise-level policy is going to affect that, and at any one time a developer can't look at a piece of code and say what explicit permission set this code will actually run with at deploy time.

Franklin: Or even more importantly, it's hard to look at code and say, ah, why isn't that working? Is it my code? Or is it some stupid permission restriction? You know, this is why developers hate security, and I'm not saying that it's not necessary; it just always rubs up against our work.

Olson: It makes it very hard to reason about what the permission set will actually be at runtime, especially if you consider that the policy could look drastically different on one machine versus another. You can easily find yourself in a situation where lower-trust assemblies could compromise middle-trust assemblies and effectively escalate their permissions. It's almost a college-level thesis or research problem to analyze this stuff and make sure you're doing it right, which is a problem when you want developers to leverage the technology.

Campbell: Right.

Olson: Another way to do sandboxing previously with the CLR was via some of the permissions like PermitOnly, Deny, and Assert, which are essentially the Stack Walking Modifiers, where you say, okay, this method call is going to Assert that this permission is given, this one should Deny it, and so on. Those Stack Walking Modifiers are really easy to circumvent: assuming you have code that's running fully trusted, it's kind of game over. Your base code could Deny some permission X and then load some add-in code that contains malicious code that just Asserts that it has permission X. So when the code at the top of the stack needs that permission, the check walks down the stack, gets to that Assert, and says, oh, I have this permission, so I don't need to keep on walking. The Deny was never found, so it's very easy to circumvent the permission set we had before. Because of that, if you look in .NET Framework 4.0, the Deny capability is deprecated, and usage of PermitOnly is strongly discouraged as well.

Franklin: Can we move on to memory-mapped files?

Olson: The introduction of memory-mapped files into System.IO allows developers to gain views into very large files. If you're dealing with files that are gigabytes in size, you can get a view directly into the file and work with it very easily in memory, rather than having to go through IO to spin through the whole thing and consume extra memory. The other thing it enables is Inter-Process Communication, or IPC, scenarios.

Franklin: So in other words, a big file exists in Process A, Process B wants to access it directly which flies in the face of what a process is supposed to protect against, but we're just assuming that it's data and it's not code and it doesn't allow for code corruption but it could certainly allow for two processes to directly access the same piece of memory.

Olson: And of course you have to be careful about it from a purity side. This is like the concept of shared code taken to the extremely ugly extreme. Especially for people who are fans of functional languages, this kind of makes you puke, because it's not only making code intimate with each other, it's making processes very intimate with each other, to a point where it can present a problem.

Franklin: Because once you have processes that are dependent on each other, when one goes down the other one goes down, essentially. Now that said, there are a lot of great practical uses for memory-mapped files.

Olson: Yes and basically the insight into huge datasets is a big one.

Franklin: Right.

Olson: You know, if I'm doing some kind of CAD software or factory management stuff where I'm dealing with gigabytes of data, there are some customers that are essentially almost writing their own database against a single normalized file, which of course is crazy. I'll say that’s a problem, which is why I'm not namedropping, but it fits their needs because they're doing real-time systems that need incredibly quick access to an incredibly large dataset.

Franklin: How about plug-ins that work with large streams of data, video data, audio data? You want those to be protected in another process but you need access to the stream.

Olson: There are definitely some very good uses, and I'm particularly intrigued or happy anytime I see something that could previously be done only through native code or C++/CLI and is now possible from C# 4.0 or VB.NET.
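As a rough illustration of the "views into very large files" idea discussed above, here is a minimal C# sketch using the `System.IO.MemoryMappedFiles` types that shipped with .NET Framework 4. The file path, the 1 MB capacity (standing in for a multi-gigabyte file), and the window offsets are assumptions made up for this example, not anything described in the interview.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

class MemoryMappedViewSketch
{
    static void Main()
    {
        // Hypothetical path and size, chosen only for illustration.
        string path = Path.Combine(Path.GetTempPath(), "mmf-sketch.bin");
        const long capacity = 1024 * 1024; // 1 MB stands in for a much larger file

        // Map the file (creating it if needed) instead of reading it into a managed buffer.
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.OpenOrCreate, null, capacity))
        // Open a view over a 4 KB window that starts 512 KB into the file.
        using (var view = mmf.CreateViewAccessor(512 * 1024, 4096))
        {
            view.Write(0, 42);             // write an int at the start of the window
            int value = view.ReadInt32(0); // read it back through the same view
            Console.WriteLine(value);
        }

        File.Delete(path); // clean up the scratch file
    }
}
```

Only the window covered by the view has to be addressable at once, which is the point when the file is far larger than you would want to load. For the IPC scenario, a named map created with `MemoryMappedFile.CreateNew` in one process and opened with `MemoryMappedFile.OpenExisting` in another plays a similar role on Windows.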

Franklin: Can we talk about covariance and contravariance annotations, and what exactly that means?

Olson: This is the problem with generic variance: when you talk about covariance and contravariance, it's very difficult to get into what we mean by variance without getting into abstract discussions like what we mean by subtype and supertype. A lot of people may initially assume that subtype and supertype mean subclass and parent class, which is totally not the case, because there are different types of relationships where a type can be smaller or larger than another one.

Franklin: So you might say covariant means that a generic of a type can be treated as a generic of a supertype, and contravariant means that a generic of a type can be treated as a generic of a subtype.

Olson: Right. What's the relationship? If I have a generic type, what's the relationship for that type parameter (technically some people might call it a “parameterized type”) in terms of supertype and subtype? Covariance and contravariance are the first part of it, but the next part essentially goes into how this generic type is being used. Is it being used explicitly as an output, or is it being used as an input? That will determine whether it needs to be covariant or contravariant. Then you get into all sorts of things that are mind-blowing, like what if the input is actually an output of something else, and it explodes.

Franklin: That's where brain no work.

Olson: Exactly. That's brain meltage.
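To make the output-versus-input distinction above a little more concrete, here is a small C# 4.0 sketch. The `Animal`, `Cat`, `IProducer<out T>`, and `IConsumer<in T>` types are hypothetical names invented for this illustration; the `out` and `in` keywords are the C# 4.0 variance annotations, and `IEnumerable<T>` and `Action<T>` are framework types that carry them in .NET 4.

```csharp
using System;
using System.Collections.Generic;

class Animal { public string Name = "animal"; }
class Cat : Animal { public Cat() { Name = "cat"; } }

// Hypothetical interfaces for illustration only.
interface IProducer<out T> { T Produce(); }          // T is only an output: covariant
interface IConsumer<in T> { void Consume(T item); }  // T is only an input: contravariant

class CatShelter : IProducer<Cat>
{
    public Cat Produce() { return new Cat(); }
}

class AnimalPrinter : IConsumer<Animal>
{
    public void Consume(Animal item) { Console.WriteLine(item.Name); }
}

class VarianceSketch
{
    static void Main()
    {
        // Covariance: a producer of Cat can be treated as a producer of Animal.
        IProducer<Animal> animals = new CatShelter();
        Console.WriteLine(animals.Produce().Name);

        // Contravariance: a consumer of Animal can be treated as a consumer of Cat.
        IConsumer<Cat> catConsumer = new AnimalPrinter();
        catConsumer.Consume(new Cat());

        // The same annotations on framework types: IEnumerable<out T> and Action<in T>.
        IEnumerable<Animal> zoo = new List<Cat> { new Cat() };
        Action<Animal> printAnimal = a => Console.WriteLine(a.Name);
        Action<Cat> printCat = printAnimal;
        foreach (Animal a in zoo) Console.WriteLine(a.Name);
        printCat(new Cat());
    }
}
```

The compiler rejects the annotation if the type parameter is used the wrong way around, for example an `out T` interface that also takes `T` as a method argument, which is the input-versus-output check described in the conversation.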

Franklin: There's a whole bunch of new Parallel Extensions in .NET 4.0 . . .

Olson: The long-term vision of Parallel Extensions is to get to a point where your average developer does not have to worry about this stuff anymore. The problem when you're developing applications today is at the enterprise. An enterprise or an ISV or what have you has to devote their smartest and brightest people to addressing the concurrency issue. You know, this first step isn't going to be concurrency for the masses, but it's a step in the right direction, where we want to get to the point that developers don't even have to think about threads anymore.

Franklin: And specifically locking and synchronizing.

Olson: Right. When you get into locking and synchronization structures and everything that you usually get with threads, it becomes a very difficult topic to reason about because a lot of people can't envision the interactions that can possibly happen.

Franklin: You have to know the theory.

Olson: Yeah. The problem is if you attach a debugger, it's kind of like a quantum problem where the mere observation of it makes the behavior go away. We've addressed this by raising the level of abstraction to this new abstraction called the Task. To dig into what a Task actually is: it's introduced via the Task Parallel Library, or just TPL, that's part of the Parallel Extensions. It allows a developer to reason about and implement code in a way that just says, "Here is the unit of work I want to get done; go do it for me." It's very similar to how people used to use the Thread Pool with a work item. The problem before was that when you were dealing with the thread API, if I had access to an instance of a thread, the API was very rich. There were a lot of things I could do with that instance of a thread, like cancelling it and interacting with it in different ways: suspending it, waiting for it, and so on. But as soon as I take the route of moving to the Thread Pool, which is a lot easier to reason about, I lose all that richness. Now what if I want to coordinate between two work items that I've submitted to the Thread Pool? Now you're back into locking and mutexes and synchronization structures that become very difficult to use.

Franklin: Right.

Olson: So the thing that's awesome about the TPL, or the Task Parallel Library, is that when you have an instance of a Task, you have an API even richer than the one you used to have with the thread, combined with the behavior that you may have gotten with the Thread Pool. So it becomes very powerful.

Franklin: In other words, you don't really have to worry about locking when you're trying to coordinate things across multiple tasks.
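A minimal C# sketch of the Task idea described above, assuming the .NET 4 Task Parallel Library types in `System.Threading.Tasks`. The work items themselves (summing numbers, a polling loop) are invented purely for illustration. It shows a unit of work handed to the scheduler, a continuation used to coordinate two work items without explicit locks, and cooperative cancellation.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class TaskSketch
{
    static void Main()
    {
        // A unit of work that produces a value; the scheduler decides which thread runs it.
        Task<int> sum = Task.Factory.StartNew(() =>
        {
            int total = 0;
            for (int i = 1; i <= 100; i++) total += i;
            return total;
        });

        // Coordinate two work items: the continuation runs only after 'sum' has completed.
        Task<string> report = sum.ContinueWith(t => "sum = " + t.Result);

        // Cancellation is cooperative: the work item polls the token and stops itself.
        var cts = new CancellationTokenSource();
        Task poller = Task.Factory.StartNew(() =>
        {
            while (!cts.Token.IsCancellationRequested)
                Thread.Sleep(10);
            cts.Token.ThrowIfCancellationRequested();
        }, cts.Token);

        Console.WriteLine(report.Result); // blocks until the continuation finishes

        cts.Cancel();
        try
        {
            poller.Wait(); // waiting on a cancelled task throws AggregateException
        }
        catch (AggregateException)
        {
            // Expected here: the task ended in the Canceled state.
        }
        Console.WriteLine(poller.Status);
    }
}
```

Continuations and waits cover ordering between tasks. Data that several tasks mutate at the same time still needs some form of synchronization, so this is a sketch of the coordination story rather than a claim that locks disappear entirely.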

To listen to or read the full interview go to www.dotnetrocks.com.
